VoIP: How to Effectively Encapsulate Voice in IP Packets

VoIP converts voice signals from a telephone into digital signals that can be transmitted over the Internet. VoIP is becoming more popular every day.

Nowadays it is common to find leased lines and VPN connections between company branches which are used to transport voice traffic in addition to data.

However, voice has strict real-time requirements in terms of delay, jitter, and bandwidth.

To carry voice effectively over a connectionless, unreliable protocol such as IP, certain mechanisms need to be in place. An upper-layer protocol needs to provide the sequencing and timing information that lets the receiver reconstruct the voice stream, and different compression algorithms need to exist to satisfy different bandwidth requirements.

All of these things are crucial in designing and operating a VoIP infrastructure, and this is exactly what we'll talk about today.

Barriers to Overcome

Traditional voice networks are based on Time Division Multiplexing, a technique that requires both accurate and timely processing.

Voice traffic has strict requirements in terms of delay, jitter and packet loss. Circuit switched telephone exchanges were able to fulfill those requirements and provide reliable and guaranteed voice services.

The need to carry voice traffic over the data network has driven changes in voice architecture, both in call-control signaling and in media transport.

The big question is: How can connection-oriented voice traffic be integrated into a connectionless IP network while still providing a reliable service?

VoIP protocols were developed to answer this question.

Signaling protocols have been enhanced to address call-control requirements over IP networks. In addition, media transport protocols have been designed to deliver voice packets in order and on time while saving bandwidth.

Relating VoIP Protocols with the OSI Model

Figure 1: VoIP Protocols and the OSI Model

Voice media packets use RTP over UDP for transport; this is a constant attribute. UDP itself offers no ordering or delivery guarantees, so RTP runs on top of it and adds sequence numbers and timestamps, which give the receiver a mechanism for reordering and synchronizing media packets.

Moreover, RTCP (RTP Control Protocol) runs alongside RTP and provides a mechanism for controlling RTP by monitoring QoS parameters on running sessions.
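To make the sequencing idea concrete, here is a minimal Python sketch of the fixed 12-byte RTP header and of reordering by sequence number. The function names are illustrative, not from any library. It assumes no header extension and no CSRC entries, and it ignores sequence-number wraparound, which a real receiver would have to handle.

```python
import struct

# Fixed RTP header: flags byte, marker/payload-type byte,
# 16-bit sequence number, 32-bit timestamp, 32-bit SSRC (12 bytes total).
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a 12-byte RTP header (version 2, no padding/extension/CSRC)."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Decode the fixed header fields from the start of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def reorder(packets):
    """Sort packets by sequence number (simplification: no wraparound)."""
    return sorted(packets, key=lambda p: parse_rtp_header(p)["sequence"])


if __name__ == "__main__":
    # Payload type 18 is the static assignment for G.729;
    # 40 ms of audio at an 8000 Hz clock advances the timestamp by 320.
    arrived = [pack_rtp_header(18, s, s * 320, ssrc=1) + b"\x00" * 40
               for s in (3, 1, 2)]
    print([parse_rtp_header(p)["sequence"] for p in reorder(arrived)])
```

Note that nothing here retransmits a missing packet; the receiver can only detect the gap in the sequence numbers.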


  • RTP provides a transport mechanism suited to real-time traffic such as voice.

    The RTP header carries a payload-type field that identifies the voice codec, a sequence number, and a timestamp; these fields also allow QoS parameters to be monitored.

  • RTP uses sequence numbering for reordering packets arriving at the receiver side.

    It does not use retransmission of packets since it makes no sense in real-time traffic to retransmit an expired sample of traffic.

  • RTP uses timestamps derived from the sender's sampling clock so that the receiver can buffer packets, smooth out jitter and delay variation, and play the voice back continuously at the correct pace.

  • RTCP uses a separate flow from RTP. It is transported over UDP as well, and its purpose is to monitor the quality of the data transmission.

    Like RTP, RTCP uses a separate UDP port for each direction: one port for the transmit direction and another for the receive direction. By convention, RTP uses an even port number and RTCP uses the next higher odd port.

  • RTCP collects statistics on a given media connection. It counts packet loss, jitter, round trip delay, etc.

    These statistics can be evaluated by special-purpose applications and appropriate measures can be taken to sustain or even increase the quality of service. Choosing a different compression method or even increasing the bandwidth are a couple of possible measures that can be taken.
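As an illustration of the statistics mentioned above, the following Python sketch computes a running interarrival jitter estimate using the smoothing formula from RFC 3550, plus a simple loss fraction from the received sequence numbers. The function names and the list-based inputs are assumptions made for the example. A real RTCP implementation works incrementally on live packets and handles sequence-number wraparound, which this sketch does not.

```python
def interarrival_jitter(send_ts, recv_ts):
    """Running jitter estimate, in the same time units as the inputs.

    For consecutive packets, D is the change in transit time:
        D = (R_j - R_i) - (S_j - S_i)
    and the estimate is smoothed as J = J + (|D| - J) / 16.
    """
    jitter = 0.0
    for i in range(1, len(send_ts)):
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


def loss_fraction(received_seqs):
    """Fraction of packets missing between the lowest and highest sequence seen."""
    unique = set(received_seqs)
    expected = max(unique) - min(unique) + 1
    return (expected - len(unique)) / expected


if __name__ == "__main__":
    # Packets sent every 20 ms; the third one arrives 8 ms late.
    sent = [0, 20, 40, 60]
    received = [5, 25, 53, 65]
    print(interarrival_jitter(sent, received))
    print(loss_fraction([1, 2, 4, 5]))
```

A monitoring application could watch these two values per session and react, for example by switching codecs, when they exceed a threshold it considers acceptable.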

VoIP Bandwidth

The total bandwidth consumed depends on the compression codec. The codec can be negotiated per call session or preconfigured in advance.

A few codec types are presented below along with their nominal codec rates:

  • g723ar63 with 48-byte payload (Annex A: includes VAD, codec rate = 6300 bps)
  • g723r63 with 48-byte payload (codec rate = 6300 bps)
  • g729br8 with 40-byte payload (Annex B: includes VAD, codec rate = 8000 bps)
  • g729r8 with 40-byte payload (codec rate = 8000 bps)

To be able to correctly plan the bandwidth requirements of the WAN link for carrying voice traffic, the following formula needs to be considered:

  • Total Bandwidth = ([Layer2_overhead + IP_UDP_RTP_overhead + payload_size] / payload_size) * codec_rate

The IP, UDP and RTP headers have an effectively constant size for voice packets: IP has a 20-byte header (without options), UDP has an 8-byte header, and RTP has a 12-byte header (without extensions).

The variable terms in the formula are the Layer 2 overhead and the payload size. The latter depends on the codec used, while the former depends on the link-layer protocol used, e.g. Ethernet, HDLC, Frame Relay, PPP, etc.

Let's calculate the Bandwidth requirements for the HDLC data link protocol (approximately 6-byte header size) for the codec types presented above.

  • g723r63
  • TB = [(6+20+8+12+48)/48] * 6300 ≈ 12.3 kbps

  • g729r8
  • TB = [(6+20+8+12+40)/40] * 8000 = 17.2 kbps
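The same arithmetic can be scripted so that other codecs or link types are easy to try. This is a minimal Python sketch of the formula above; the function and parameter names are chosen for the example, and the 6-byte HDLC overhead is the approximate figure used in this article.

```python
IP_UDP_RTP_OVERHEAD = 20 + 8 + 12  # bytes: IP + UDP + RTP headers


def voip_bandwidth_bps(codec_rate_bps, payload_bytes, layer2_bytes,
                       header_bytes=IP_UDP_RTP_OVERHEAD):
    """Per-call bandwidth in bps: (total packet size / payload size) * codec rate."""
    packet_bytes = layer2_bytes + header_bytes + payload_bytes
    return packet_bytes / payload_bytes * codec_rate_bps


if __name__ == "__main__":
    HDLC_OVERHEAD = 6  # approximate, as assumed in the text
    for name, rate, payload in (("g723r63", 6300, 48), ("g729r8", 8000, 40)):
        kbps = voip_bandwidth_bps(rate, payload, HDLC_OVERHEAD) / 1000
        print(f"{name}: {kbps:.1f} kbps")
```

Changing `layer2_bytes` to the overhead of another link-layer protocol gives the corresponding estimate for that link type.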

When codecs with VAD (Voice Activity Detection) are used, extra bandwidth savings are achieved. Approximate per-call bandwidth estimates are presented below.

Keep in mind that this is only an approximation and not a fixed per-call value. It is a general observation derived from a sample of more than 20 simultaneous calls on a specific WAN link.

  • g723r63 with VAD
  • TB ≈ 8.6 kbps

  • g729r8 with VAD
  • TB ≈ 12 kbps

RTP Header Compression

Real-time Transport Protocol (RTP) is used for carrying packetized audio and video traffic over an IP network. It was designed to serve the transport requirements of such real-time traffic.

RTP has a minimum 12-byte header which, combined with the IP header (20 bytes) and the UDP header (8 bytes), adds up to 40 bytes of headers per packet. That is a lot of overhead: with a 40-byte voice payload, half of every packet is header, so sending it uncompressed is inefficient.

By applying RTP header compression (CRTP), the IP/UDP/RTP header in an RTP data packet is reduced from 40 bytes to approximately 2 to 5 bytes, as shown in Figure 2 below.

RTP header compression is a hop-by-hop scheme; therefore, every device along the transmission path that applies it must support the scheme. Details on CRTP can be found in RFC 2508.

Figure 2: RTP Header Compression (CRTP)
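To get a feel for what the compression is worth, the sketch below plugs compressed header sizes of 2 and 5 bytes into the bandwidth formula from the previous section. It is a back-of-the-envelope estimate under this article's assumptions (6-byte HDLC overhead, constant compressed header size), not a measurement. The function names are illustrative.

```python
def bandwidth_bps(codec_rate_bps, payload_bytes, layer2_bytes, ip_udp_rtp_bytes):
    """Per-call bandwidth: (total packet size / payload size) * codec rate."""
    total = layer2_bytes + ip_udp_rtp_bytes + payload_bytes
    return total / payload_bytes * codec_rate_bps


def crtp_comparison(codec_rate_bps, payload_bytes, layer2_bytes=6):
    """Estimate bandwidth with the full 40-byte header and with CRTP (2 to 5 bytes)."""
    return {
        "uncompressed": bandwidth_bps(codec_rate_bps, payload_bytes, layer2_bytes, 40),
        "crtp_2_bytes": bandwidth_bps(codec_rate_bps, payload_bytes, layer2_bytes, 2),
        "crtp_5_bytes": bandwidth_bps(codec_rate_bps, payload_bytes, layer2_bytes, 5),
    }


if __name__ == "__main__":
    # g729r8: 8000 bps codec rate, 40-byte payload, HDLC link
    for label, bps in crtp_comparison(8000, 40).items():
        print(f"{label}: {bps / 1000:.1f} kbps")
```

For g729r8 over HDLC, the estimate drops from about 17.2 kbps per call to somewhere between 9.6 and 10.2 kbps.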

Choosing the Right Compression Algorithm

Choosing the right codec to use is not that simple.

If bandwidth is not an issue, then the traditional G.711 codec is the best choice. Because it does not compress the voice signal, it offers the highest voice quality of the codecs discussed here.

If bandwidth and processing resources are a concern, then a compromise needs to be made between the CPU resources available, the voice quality, and the additional delay introduced by the codec.

In practice, G.729 and G.723 are the most popular compression schemes used.



Stelios Antoniou

(CCNA, NET+, MOUS) holds a BSc in Electronic Engineering and an MSc in Communication Networks. He has over three years of experience in teaching MS Office applications, networking courses and GCE courses in Information Technology. Stelios is currently working as a VoIP Engineer in a Telecom company, where he uses his knowledge in practice. He has successfully completed training on CCNP topics, Linux and IMS. His enthusiasm, ambition and knowledge motivate him to offer his best. Stelios has written many articles covering Cisco CCENT, CCNA, and CCNP.