Interactive applications have been around for a long time, and come in many forms:

  • Telephony
  • Voice-over-IP (VoIP)
  • Video Conferencing

Requirements

Requirements for interactive applications are determined by the task and by human perception:

  • Phone call or video conference
    • One-way mouth-to-ear delay ~150ms maximum for telephony
    • Video conferences want to lip-sync audio and video
    • Audio should be no more than 15ms ahead of, or 45ms behind, video

  • Lecture style
    • Mostly unidirectional with occasional questions → can tolerate much higher latency

  • Distributed music performance
    • One-way latency ≪50ms desirable
    • Speed of sound: ~15ms to go from one side of a large orchestra to the other
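
The lip-sync tolerance above can be written as a simple range check. This is a minimal sketch: the function and constant names are illustrative, and it assumes both arguments are the playout times of audio and video that were captured at the same instant.

```python
# Thresholds taken from the requirements above; all names are illustrative.
MAX_AUDIO_LEAD_MS = 15.0  # audio may play at most 15 ms ahead of video
MAX_AUDIO_LAG_MS = 45.0   # audio may play at most 45 ms behind video


def lip_sync_ok(audio_play_ms: float, video_play_ms: float) -> bool:
    """Return True if audio and video playout times are within tolerance."""
    # Negative offset: audio plays early. Positive offset: audio plays late.
    offset = audio_play_ms - video_play_ms
    return -MAX_AUDIO_LEAD_MS <= offset <= MAX_AUDIO_LAG_MS
```

The asymmetric window reflects the figures in the list: audio that arrives late relative to video is tolerated more readily than audio that arrives early.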

Media Encoding

Speech Encoding

Typically operates on 20ms packets:

  • Data rate of tens of kilobits per second
  • Background noise packets are encoded at much lower quality
  • Highly loss tolerant: can conceal around 10-20% random packet loss without the listener noticing
  • Burst losses are less well concealed
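
To make the packet sizes concrete, the arithmetic can be sketched as below. The 24 kbps rate in the usage note is an illustrative assumption, not a figure from these notes, and the 40-byte overhead assumes IPv4 (20 bytes), UDP (8 bytes), and a basic RTP header (12 bytes) with no extensions.

```python
def speech_packet_stats(bitrate_bps: int, frame_ms: int = 20,
                        header_bytes: int = 40):
    """Return (packets per second, payload bytes per packet, on-the-wire bit rate).

    header_bytes defaults to IPv4 + UDP + RTP = 20 + 8 + 12 (an assumption).
    """
    packets_per_second = 1000 / frame_ms
    payload_bytes = bitrate_bps * frame_ms / 1000 / 8
    wire_bps = (payload_bytes + header_bytes) * 8 * packets_per_second
    return packets_per_second, payload_bytes, wire_bps
```

For example, `speech_packet_stats(24_000)` gives 50 packets per second with 60-byte payloads and 40 kbps on the wire, so under these assumptions the headers add a sizeable fraction to the codec rate.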

Video Encoding

Video frame rate/resolution is highly variable:

  • High-definition H.264 is around 2-4Mbps
  • Frame rates from 25-60fps are common
  • I-frames take tens of packets; P-frames a single packet or a few packets
  • Not very loss tolerant
    • No scene changes to reset decoder state to a known good value
  • Retransmissions possible in some cases; forward error correction more typical
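
One way to see why multi-packet frames are fragile is to estimate how often a frame loses at least one packet. The sketch below assumes independent random packet loss, which is a simplification (real losses are often bursty), and the function name is illustrative.

```python
def frame_loss_probability(packet_loss_rate: float, packets_in_frame: int) -> float:
    """Probability that at least one packet of a frame is lost.

    Assumes each packet is lost independently with the same probability.
    """
    return 1.0 - (1.0 - packet_loss_rate) ** packets_in_frame
```

Under this model, with 1% packet loss a single-packet P-frame is damaged about 1% of the time, while an I-frame spread over 30 packets is damaged roughly 26% of the time.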

General Transmission Path

  1. Frames of media data are captured periodically
  2. Codec compresses media frames
  3. Compressed frames fragmented into packets
    1. Transmitted using RTP inside UDP packets
    2. RTP adds timing and sequencing, source identification, payload identification
  4. Transmitted over the network
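
Steps 3 and 3.2 can be sketched as follows. This builds only the fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries). The function names, the 1200-byte payload limit, and payload type 96 are illustrative assumptions, and setting the marker bit on the last packet of a frame follows the usual convention for video rather than anything stated in these notes.

```python
import struct

RTP_VERSION = 2


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int, marker: bool = False) -> bytes:
    """Prepend a fixed 12-byte RTP header to one payload."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def packetize_frame(frame: bytes, first_seq: int, timestamp: int, ssrc: int,
                    payload_type: int = 96, max_payload: int = 1200):
    """Fragment one compressed frame into RTP packets.

    All fragments share the frame's timestamp; sequence numbers increase by one.
    """
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    packets = []
    for n, chunk in enumerate(chunks):
        last = n == len(chunks) - 1  # marker bit flags the final fragment
        packets.append(build_rtp_packet(chunk, first_seq + n, timestamp, ssrc,
                                        payload_type, marker=last))
    return packets


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header fields of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}
```

Each resulting packet would then be sent as the body of a UDP datagram. The sequence number gives ordering and loss detection, the timestamp gives timing, the SSRC identifies the source, and the payload type identifies the codec.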

General Reception Path

  1. UDP packets containing RTP data arrive
    1. Separated according to sender
  2. Channel coder repairs loss using forward error correction
    1. Additional packets sent along with the media, to allow some repair without needing retransmission
  3. Playout buffer used to reconstruct order, smooth timing
  4. Media is decompressed, packet loss concealed, and clock skew corrected
  5. Recovered media is rendered to user
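
The playout buffer in step 3 can be sketched as a small priority queue keyed on sequence number. This is a simplified illustration, not a production design: the class name is hypothetical, sequence-number wraparound is ignored, and the playout delay is left to the caller, who is assumed to call `pop_next` once per playout interval.

```python
import heapq


class PlayoutBuffer:
    """Reorders packets by sequence number and releases one per playout tick."""

    def __init__(self, first_seq: int):
        self._heap = []
        self._next_seq = first_seq

    def insert(self, seq: int, payload: bytes) -> None:
        # Packets whose playout slot has already passed are useless: drop them.
        if seq >= self._next_seq:
            heapq.heappush(self._heap, (seq, payload))

    def pop_next(self):
        """Return the next payload in order, or None if it is lost or late."""
        # Discard duplicates of packets that were already played out.
        while self._heap and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        payload = None
        if self._heap and self._heap[0][0] == self._next_seq:
            payload = heapq.heappop(self._heap)[1]
        self._next_seq += 1
        return payload
```

A `None` result is the point at which the decoder in step 4 would apply loss concealment. A larger buffer absorbs more reordering and jitter, at the cost of added latency.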

Forward Error Correction (FEC)

  • Retransmission possible, but often takes too long: the packet should have been played out before the retransmission arrives
  • Forward error correction (FEC) often used instead
    • Additional FEC packets are sent along with the original data
    • Contain error correcting codes, e.g., the Exclusive-OR (XOR) of the original packets; many different FEC schemes exist
    • If some original packets are lost but the FEC packets arrive, the original data can be reconstructed
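
The XOR scheme mentioned above can be sketched as a single parity packet over a group of packets. The function names are illustrative, and the sketch assumes all packets in the group have equal length; practical schemes also protect packet lengths and headers. One parity packet can repair at most one loss per group.

```python
from typing import List


def xor_parity(packets: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for packet in packets:
        if len(packet) != len(parity):
            raise ValueError("this sketch assumes equal-length packets")
        for i, byte in enumerate(packet):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(received: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing packet of a group.

    XORing the parity with every packet that did arrive cancels them out,
    leaving only the bytes of the packet that was lost.
    """
    return xor_parity(received + [parity])
```

For example, with a group `[p1, p2, p3]` and `parity = xor_parity([p1, p2, p3])`, losing `p2` in transit still allows `recover_missing([p1, p3], parity)` to return `p2` without waiting for a retransmission.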