Streaming Media Lecture 33 Streaming Audio & Video April 20, 2005 Classes of applications: streaming stored video/audio streaming live video/audio real-time interactive video/audio Examples: distributed MP3 player webcast video conference VoIP surveillance games... Streaming Media Continuous, arbitrarily long stream of data. Characteristics: typically delay-sensitive delay jitter loss tolerant: drop/lose some `frames example: MPEG user-specific quality needs: image size, resolutions, color depth, frame rate,... Real-Time Streaming audio/video is considered a class of real-time applications. Real-time: correctness is important (data, order,...) TIME is important: frame replay rate of 25fps a frame needs to be displayed every 40ms jitter is variation: 38ms, 41ms, 43ms,... early arrival? late arrival? Note: Fall course on Real-Time Systems Streaming Stored Multimedia Streaming Stored Multimedia Like VCR: Cumulative data pause rewind fast-forward Delays: initial delay (up to 10s ok) command delay (1-2s) 1. video recorded 2. video sent network delay streaming: at this time, client playing out early part of video, while server still sending later part of video 3. video received, played out at client time 1
Streaming Live Multimedia Examples: conference, webcast, tele-teaching sports events Internet radio talk show Streaming: playback buffer playback can lag several seconds Interactivity: no fast-forward pause, rewind possible Real-Time Interactive MM Examples: video conference IP telephony interactive virtual worlds End-to-end delay characteristics: audio: < 150ms good, < 400ms OK higher delays noticeable, impair interactivity Multimedia over Internet TCP/UDP/IP: no delay guarantees! Internet multimedia applications use application-level techniques to ensure (as best as possible) QoS levels Consider a phone application at 1Mbps and an FTP application sharing a 1.5 Mbps link. Bursts of FTP can congest the router and cause audio packets to be dropped. Want to give priority to audio over FTP. PRINCIPLE 1: Marking of packets is needed for router to distinguish between different classes; and new router policy to treat packets accordingly. Applications misbehave (audio sends packets at a rate higher than 1Mbps assumed above). PRINCIPLE 2: provide protection (isolation) for one class from other classes. Require Policing Mechanisms to ensure sources adhere to bandwidth requirements; Marking and Policing need to be done at the edges: Alternative to Marking and Policing: allocate a set portion of bandwidth to each application flow; can lead to inefficient use of bandwidth if one of the flows does not use its allocation. PRINCIPLE 3: While providing isolation, it is desirable to use resources as efficiently as possible. 2
Cannot support traffic beyond link capacity. PRINCIPLE 4: Need a Call Admission Process; application flow declares its needs, network may block call if it cannot satisfy the needs. Integrated Services (IntServ): architecture for providing QoS guarantees in IP networks for individual flows fundamental changes to the Internet to reserve end-to-end bandwidths components: admission control resource reservation routing classifier and route selection packet scheduling Problems with that approach: scalability/complexity: maintaining per-flow router state difficult with large number of flows flexible service models: IntServ has only two classes (controlled, guaranteed), we would like qualitative service classes Differentiated Services (DiffServ): simple functions in network core, relatively complex functions at edge routers (or hosts) edge: packet classification, packet marking, traffic conditioning core: forwarding flows are aggregated into classes that receive treatment depending on class parameters don t define define service classes, provide functional components to build service classes Do fine-grained enforcement only at the edge of the network. Typically slower links at edges E.g., mail sorting in post office Label packets with a field. E.g., a priority stamp The core of the network uses only the type field for QoS management. Small number of types with well defined forwarding behavior Can be handled fast One of the main motivations for Diffserv is scalability. Keep the core of the network simple. Packet is marked in the Type of Service (TOS) in IPv4, and Traffic Class in IPv6. 6 bits used for Differentiated Service Code Point (DSCP) and determine PHB that the packet will receive. 2 bits are currently unused. 3
Forwarding: according to Per-Hop- Behavior or PHB specified for the particular packet class; such PHB is strictly based on class marking (no other header fields can be used to influence PHB). BIG ADVANTAGE: No state info to be maintained by routers! Content Distribution Networks Challenging to stream large files (e.g., video) from single origin server in real time origin server in North America Solution: replicate content at hundreds of servers throughout Internet CDN distribution node content downloaded to CDN servers ahead of time placing content close to user avoids impairments (loss, delay) of sending content over long paths CDN server CDN server CDN server typically in edge/access network in S. America CDN server in Europe in Asia Multicast/Broadcast Internet Multimedia duplicate R1 duplicate creation/transmission R1 R2 R2 duplicate R3 R4 R3 R4 (a) (b) Source-duplication versus in-network duplication. (a) source duplication, (b) in-network duplication Simplest approach: audio/video stored as file, shipped as HTTP object, received at client, then passed to player. No streaming : long delays until playout. Streaming vs Downloading Download: receive entire file before playback begins long startup delays QoS? file sizes: 100s MBytes - several GBytes Streaming: play media file while being received reasonable start-up delays QoS? buffering Progressive Download Browser GETs meta file. Browser launches player, passes meta file. Player contacts server. Player downloads video/audio. 4
Streaming Server Client Buffering Cumulative data constant bit rate video transmission variable network delay client video reception buffered video constant bit rate video playout at client Architecture allows for non-http protocol between server and player. TCP or UDP. client playout delay Client-side buffering, playout delay compensate for network-added delay, delay jitter. time Client Buffering variable fill rate, x(t) buffered video constant drain rate, d Client-side buffering, playout delay compensate for network-added delay, delay jitter. TCP or UDP UDP server sends at rate appropriate for client (oblivious to network congestion!) often send rate = encoding rate = constant rate then, fill rate = constant rate - packet loss short playout delay (2-5 seconds) to compensate for network delay jitter error recover: time permitting TCP send at maximum possible rate under TCP fill rate fluctuates due to TCP congestion control larger playout delay: smooth TCP delivery rate HTTP/TCP passes more easily through firewalls Real-Time Streaming Protocol HTTP does not target multimedia content no commands for fast forward, etc. RTSP: RFC 2326 client-server application layer protocol. for user to control display: rewind, fast forward, pause, resume, repositioning, etc What it doesn t do: does not define how audio/video is encapsulated for streaming over network does not restrict how streamed media is transported; it can be transported over UDP or TCP does not specify how the media player buffers audio/video Out of Band Control FTP uses an out-of-band control channel: a file is transferred over one channel. control information (directory changes, file deletion, file renaming, etc.) is sent over a separate TCP connection. the out-of-band and in-band channels use different port numbers. RTSP messages are also sent out-of-band: the RTSP control messages use different port numbers than the media stream, and are therefore sent out-of-band. the media stream, whose packet structure is not defined by RTSP, is considered in-band. if the RTSP messages were to use the same port numbers as the media stream, then RTSP messages would be said to be interleaved with the media stream. 5
RTSP Example Scenario: metafile communicated to web browser browser launches player player sets up an RTSP control connection, data connection to streaming server Meta File Example <title>twister</title> <session> <group language=en lipsync> <switch> <track type=audio e="pcmu/8000/1" src = "rtsp://audio.example.com/twister/audio.en/lofi"> <track type=audio e="dvi4/16000/2" pt="90 DVI4/8000/1" src="rtsp://audio.example.com/twister/audio.en/hifi"> </switch> <track type="video/jpeg" src="rtsp://video.example.com/twister/video"> </group> </session> RTSP Session RTSP Operation Each RTSP has a session identifier, which is chosen by the server. The client initiates the session with the SETUP request, and the server responds to the request with an identifier. The client repeats the session identifier for each request, until the client closes the session with the TEARDOWN request. RTSP port number is 554. RTSP can be sent over UDP or TCP. Each RTSP message can be sent over a separate TCP connection. rtsp://media.example/com/twister/audiotrack identifies audio stream, can be controlled with RTSP over TCP connection on port 554 on host media.example.com RTSP Exchange Example C: SETUP rtsp://audio.example.com/twister/audio RTSP/1.0 Transport: rtp/udp; compression; port=3056; mode=play S: RTSP/1.0 200 1 OK Session 4231 C: PLAY rtsp://audio.example.com/twister/audio.en/lofi RTSP/1.0 Session: 4231 Range: npt=0- C: PAUSE rtsp://audio.example.com/twister/audio.en/lofi RTSP/1.0 Session: 4231 Range: npt=37 C: TEARDOWN rtsp://audio.example.com/twister/audio.en/lofi RTSP/1.0 Session: 4231 RTSP Streaming Caching Caching of RTSP response messages makes little sense. But desirable to cache media streams closer to client. Much of HTTP/1.1 cache control has been adopted by RTSP. cache control headers can be put in RTSP SETUP requests and responses: If-modified-since:, Expires:, Via:, Cache-Control: Proxy cache may hold only segments of a given media stream. proxy cache may start serving a client from its local cache, and then have to connect to origin server and fill missing material, hopefully without introducing gaps at client. When origin server is sending a stream through client, and stream passes through a proxy, proxy can use TCP to obtain the stream; but proxy still sends RTSP control messages to origin server. S: 200 3 OK 6
Packet Loss Network loss: IP datagram lost due to network congestion (router buffer overflow) Delay loss: IP datagram arrives too late for playout at receiver delays: processing, queueing in network; endsystem (sender, receiver) delays tolerable delay depends on the application How can packet loss be handled? Receiver-Based Packet Loss Recovery Generate replacement packet packet repetition interpolation other sophisticated schemes Works when audio/video stream exhibits short-term self-similarity Works for relatively low loss rates (e.g., < 5%) Typically, breaks down on bursty losses Forward Error Correction FEC For every group of n packets generate k redundant packets Send out n+k packets, increasing the bandwidth by factor k/n. Can reconstruct the original n packets provided at most k packets are lost from the group Works well at high loss rate (for a proper choice of k) Handles bursty packet losses Cost: increase in transmission cost (bandwidth) FEC Example Piggyback lower quality stream Example: send lower resolution audio stream as the redundant information Whenever there is non-consecutive loss, the receiver can conceal the loss. Can also append (n- 1)st and (n-2)nd lowbit rate chunk Interleaving: Recovery from Packet Loss Interleaving Re-sequence packets before transmission Better handling of burst losses Results in increased playout delay Recovery from Packet Loss Receiver-based repair of damaged audio streams produce a replacement for a lost packet that is similar to the original can give good performance for low loss rates and small packets (4-40 msec) simplest: repetition more complicated: interpolation 7