E N D
CLUE Framework First Draft IETF - 81 July, 2011 Allyn Romanow (allyn@cisco.com) Mark Duckworth (mark.duckworth@polycom.com ) Andy Pepperell (apeppere@cisco.com) Brian Baldino (bbaldino@cisco.com )
R R C C L L Multiple Media Streams Video and Audio London Video and Audio R L Dallas R Video and Audio L Paris
Extensible • Simple • Usable now Challenges • Current functionality • Practical to implement • Future functionality
What’s Needed? • MEDIA CAPTURE DESCRIPTION • CHOOSING STREAMS
Process • Consumer sends hints to provider • Provider sends capabilities • Consumer chooses streams (Not negotiated in the strict sense, 2 one-way)
Structure of Information Capture Sets Media Capture Audio or Video Media Capture Audio or Video Media Capture Audio or Video Simultaneous Transmission Set Attributes Encode Group
Media Capture Description Mark Duckworth
Media Capture & Attributes Capture Sets Media Capture Audio or Video Media Capture Audio or Video Media Capture Audio or Video Simultaneous Transmission Set Encoding Group Attributes
Attributes • Audio attributes • Purpose (role) • Main • Presentation • Mixed – true/false • Channel Format • Linear array • Stereo • Mono • Linear position • 0 to 100 • Video attributes • Purpose (role) • Main • Presentation • Composed – true/false • Auto switched • True/false • Spatial scale • Image width EXTENSIBILITY
Capture Scene Capture Scene VC0 VC1 VC2 VC2 Three cameras Cameras VC3 VC4 People VC1 Two cameras, moved & zoomed out VC5 VC0 Switched (based on voice) with composed PiP
Capture Set Capture Set Rows VC0 VC1 VC2 (VC0, VC1, VC2) (VC3, VC4) (VC5) (AC0) Three cameras VC3 VC4 Two cameras, moved and zoomed out VC5 Each alternative representation of a Capture Scene is a row in a Capture Set Switched (based on voice), composed PiP
Video Capture Adjacency people cameras VC1 right VC1 right left VC0 VC0 left Capture Set: (VC0, VC1) Other capture set rows
Matching Audio with Video • Same capture scene • Video adjacency matches audio sound stage
Matching Audio with Video Spatial extent of video VC0 VC1 VC2 Stereo Left Right Linear Array 0 50 100 Spatial extent of audio
Choosing Streams Andy Pepperell
Basic message flow Media Stream Consumer Media Stream Provider Consumer capability advertisement Media capture advertisement Consumer configuration of provider’s streams
Capabilities Sent by Consumer Media Stream Consumer Physical factors e.g. number of screens Consumer capability advertisement User preferences Software limitations e.g. media capture attributes known
Advertisement Sent by Provider Media Stream Provider Consumercapabilityadvertisement Media capture advertisement Provider fixed characteristics e.g. number of cameras Dynamic factors e.g. whether presentation source present
Configure Msg Sent by Consumer Media Stream Consumer Provider capture advertisement simultaneous transmission set + encoding groups Stream configure message Consumer’s fixed characteristics e.g. number of screens Dynamic factors e.g. change of user preferences
Provider Capture Advertisement Captures and attributes Simultaneous transmission sets Capture sets Encoding groups
Simultaneous Transmission Sets Right VC2 VC3 Center camera can do either regular or zoomed VC1 Center People VC0 Left (VC0, VC1, VC2) (VC0, VC3, VC2)
Encoding Groups Media Stream Provider Encoding group Encoding group Encoding Group
Encoding Group Structure Media stream provider Encoding group Encoding group Encoding group Encode 1 Encode 2 Encode 3
Sample Encoding Group <=2 encodes, <= 1080p30 Bandwidth trade-off between encodes & group as a whole EG0: maxMbps = 489600, maxBandwidth=6000000 • ENC0: maxWidth=1920, maxHeight=1080, maxFrameRate=60, maxMbps=244800, maxBandwidth=4000000 • ENC1: maxWidth=1920, maxHeight=1080, maxFrameRate=60, maxMbps=244800, maxBandwidth=4000000
Examples Brian Baldino