The Real-time Transport Protocol (RTP) is used for media transport.
It comprises two parts, with separate data and control channels:
- RTP – media payload formats
- RTP Control Protocol (RTCP)
  - Source description and caller identity, reception quality, codec control
Payload formats
Codec-specific packet formats; application-level framing; robust, but complex. Each frame is packetised for independent use, for low latency.
RTP payload formats define how compressed audio/visual data is formatted into RTP packets.
Goal: each packet should be independently usable
If a packet arrives, it should be possible to decode all the data it contains. This is not always possible, but it is desirable.
Naïve packetisation can lead to inter-packet dependencies, where a packet arrives but can’t be decoded because some previous packet, on which it depends, was lost.
Extensions
- Reception quality and user experience monitoring
- Codec control and other feedback
- Circuit breakers and congestion control
Timing Recovery
Unfortunately, due to the nature of the Internet, we cannot guarantee that packets are received in the same order, or with the same spacing, as they were sent. We have to account for this, especially for media.
This is done as follows:
Where we calculate the playout buffering delay as such:
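One common way to size the playout buffer, sketched here as an assumption rather than the calculation from the lecture, is to track the interarrival jitter with the running estimator defined in RFC 3550 and then buffer for some multiple of it. The function names, the multiplier `k = 3`, and the sample trace are all hypothetical choices made for this example.

```python
def update_jitter(jitter, prev, cur):
    """One step of the RFC 3550 interarrival jitter estimator.

    prev and cur are (arrival_time, rtp_timestamp) pairs in the same units.
    """
    # Change in relative transit time between two consecutive packets.
    d = (cur[0] - prev[0]) - (cur[1] - prev[1])
    # Exponentially smoothed average with gain 1/16, as in RFC 3550.
    return jitter + (abs(d) - jitter) / 16.0


def playout_delay(packets, k=3.0):
    """Assumed policy: buffer for k times the estimated jitter."""
    jitter = 0.0
    for prev, cur in zip(packets, packets[1:]):
        jitter = update_jitter(jitter, prev, cur)
    return k * jitter


# Hypothetical trace in milliseconds: packets sent every 20 ms,
# arriving with uneven spacing.
trace = [(0, 0), (25, 20), (41, 40), (70, 60)]
delay_ms = playout_delay(trace)
```

With perfectly regular arrivals the estimate stays at zero, so no extra buffering is added; the more uneven the arrivals, the larger the delay. A real receiver would also reorder packets by sequence number before playout.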

General Structure

RTP Packet
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronisation source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                    [CSRC identifier list]                     |
|                        (4 * CC octets)                        |
|                        CC may be zero                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+
|     defined by signalling     |    header extension length    |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  |
|                       header extension                        |  | OPTIONAL
|                 format defined by signalling                  |  | (if X=1)
|                                                               |  |
|                                                               |  |
|                                                               |  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -+
|                            Payload                            |
|          (variable format and length, depends on PT)          |
|                                                               |
|                                                               |
|                                                               |
|               +-------.........---------------+---------------+
|               |Padding (PadCnt octets, if P=1)|PadCnt (if P=1)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
RTP packets are carried in UDP. The header information carries:
- Sequence number and timestamp
  - Allows the receiver to reconstruct ordering and timing
- Source identifiers
  - Who sent this packet – needed for multiparty calls
- Payload format identifier
  - Does the packet contain audio or video?
  - Which compression algorithm is used?
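To make the header layout concrete, here is a minimal sketch of reading the fixed 12-byte RTP header and the optional CSRC list from a received UDP payload. The function name `parse_rtp_header`, the dictionary keys, and the sample packet are assumptions made for this example; header extensions and padding are not handled.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header and CSRC list (no extension handling)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
    # all in network byte order.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F  # number of 32-bit CSRC identifiers that follow
    csrc_end = 12 + 4 * cc
    if len(packet) < csrc_end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = [int.from_bytes(packet[i:i + 4], "big")
             for i in range(12, csrc_end, 4)]
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": cc,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }


# Hypothetical packet: V=2, PT=96, seq=1000, timestamp=160, SSRC=0x1234ABCD.
example = struct.pack("!BBHII", 0x80, 96, 1000, 160, 0x1234ABCD) + b"payload"
header = parse_rtp_header(example)
```

The receiver would then use `sequence_number` and `timestamp` to restore ordering and timing, `ssrc` to tell senders apart, and `payload_type` to pick the decoder.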
Signalling is needed to establish the direct P2P connection. This is provided by:
SDP
Session Description Protocol
This is used to:
- specify the details of the transport connections to be set up,
- exchange the set of candidate IP addresses on which the peers can be reached, to set up the peer-to-peer connection,
- specify the media formats they want to use:
  - Is it just audio, or is it audio and video?
  - Which compression algorithms are to be used?
- specify the timing of the session, the security parameters, and any other parameters.
SDP Offer/Answer
Interactive sessions require negotiation
- An offer to communicate: lists codecs, options and addressing details, identity of caller
- The answer subsets codecs and options to those mutually acceptable, supplies addressing details, and confirms willingness to communicate
- ICE algorithm (→ Lecture 2) probes NAT bindings, establishes path
- Audio and video data flows
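The "answer subsets the offer" step can be sketched as a simple intersection. This is only an illustration of the idea, not the full offer/answer procedure: the function name `make_answer` and the codec lists are hypothetical, and real answers also carry addressing details and options.

```python
def make_answer(offered_codecs, supported_codecs):
    """Keep the offered codecs the answerer also supports.

    The offerer's preference order is preserved.
    """
    supported = set(supported_codecs)
    accepted = [codec for codec in offered_codecs if codec in supported]
    if not accepted:
        # Nothing in common: the session cannot be set up with these codecs.
        raise ValueError("no mutually acceptable codec")
    return accepted


# Hypothetical offer and local capabilities.
offer = ["opus", "G722", "PCMU"]
answer = make_answer(offer, ["PCMU", "opus"])
```

Here `answer` keeps only the codecs both sides can use, in the order the offerer listed them.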
Format
It’s essentially a set of key-value pairs, one per line. The keys are all single letters, the values are more complex, and each key is separated from its value by an equals sign.
And, as we see in the example:
- It starts with a version number, v=0.
- There is an originator line: the session was originated by Jane Doe, who had IP address 10.47.16.5.
- It is a seminar about the Session Description Protocol.
- It has the email address of Jane Doe, who set up the call.
- It has their IP address and the times that the session is active.
- It is receive-only: it is broadcast, so the listener just receives the data.
- It is sent using audio and video media.
- It specifies the ports and some details of the video compression scheme, and so on.
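Because the format is one `key=value` pair per line, a session description can be read with very little code. The sketch below uses a shortened description modelled on the well-known example in the SDP specification (RFC 4566); the exact values should be treated as illustrative, and `parse_sdp` is a hypothetical helper, not a full SDP parser.

```python
SAMPLE_SDP = """\
v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
e=j.doe@example.com (Jane Doe)
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
"""


def parse_sdp(text):
    """Return a list of (key, value) pairs, one per non-empty line."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first "=" only; values may contain further "=" signs.
        key, sep, value = line.partition("=")
        if sep != "=" or len(key) != 1:
            raise ValueError(f"malformed SDP line: {line!r}")
        pairs.append((key, value))
    return pairs


pairs = parse_sdp(SAMPLE_SDP)
# The first token of each m= line names the media type.
media = [value.split()[0] for key, value in pairs if key == "m"]
```

A list of pairs is used rather than a dictionary because keys such as `m` and `a` repeat, and their order matters: an `a=` line applies to the `m=` line that precedes it.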