Microsoft DirectX 9.0

Jitter Buffers

Microsoft® DirectPlay® Voice features a jitter buffer, an adaptive buffering algorithm that provides optimal voice quality with the least amount of latency.

On busy networks, individual packets of voice data might arrive in a different order from the one in which they were encoded on the host computer. Because voice data is sequential in nature, these incoming packets must be queued for a time so that delayed packets have an opportunity to arrive and be played back in order.
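The queuing described above can be sketched in a few lines of C++. This is an illustrative model only, not DirectPlay Voice code: the class name ReorderQueue, its methods, and the idea of tagging each packet with a sequence number are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical reorder queue (not part of DirectPlay). Packets are stored by
// sequence number and released strictly in order once `depth` are queued.
class ReorderQueue {
public:
    explicit ReorderQueue(std::size_t depth) : depth_(depth) {}

    // Store a packet that arrived from the network, in any order.
    // Packets older than the current playback position are dropped.
    void Push(std::uint32_t seq, std::string payload) {
        if (seq >= next_seq_) {
            pending_.emplace(seq, std::move(payload));
        }
    }

    // Play one slot. Returns true and fills *out if the next packet in
    // sequence is available. Returns false while the queue is still filling,
    // or when the expected packet is missing (an audible gap).
    bool PopNext(std::string* out) {
        if (!started_) {
            if (pending_.size() < depth_) return false;  // still filling
            started_ = true;
        }
        bool found = false;
        std::map<std::uint32_t, std::string>::iterator it = pending_.find(next_seq_);
        if (it != pending_.end()) {
            *out = std::move(it->second);
            pending_.erase(it);
            found = true;
        }
        ++next_seq_;  // advance even on a gap so playback does not stall
        return found;
    }

private:
    std::size_t depth_;
    std::uint32_t next_seq_ = 0;
    bool started_ = false;
    std::map<std::uint32_t, std::string> pending_;
};
```

In this model a larger depth gives late packets more time to arrive before their slot is played, at the cost of a longer wait before playback starts.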

If the jitter buffer is set to maximize the quality of voice communication, it takes longer for the required number of voice packets to arrive and be queued for play. The result is voice latency, and the effect is that voice communication is not heard in real time. Instead, the voice data might be heard anywhere from a fraction of a second to several seconds after it was recorded. This can introduce problems during cooperative game play because events can occur in the game but players will not be able to communicate information based on those events in real time. For example, if a player in a first-person shooter is about to be attacked from behind and a teammate attempts to warn the player, the voice communication might not be heard until after the player has been attacked.

If the jitter buffer is set to reduce latency, the number of packets required to fill the queue is reduced, and the voice communication is heard much closer to the actual time it was recorded. However, it is possible that not all sequential packets will arrive in time and, as a result, voice data will be missing from the buffer when it is played. The voice communication will then have a "broken-up" quality.

The DirectPlay jitter buffer works in two steps to provide the best quality of voice communication with the least amount of latency. First, network conditions are monitored to determine the amount of lag or network congestion. Second, the size of the jitter buffer, or queue, is adjusted dynamically to keep latency as low as possible while keeping voice breakup to a minimum.
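The following C++ sketch shows one way that monitoring and resizing could fit together. It is not the algorithm DirectPlay Voice uses, which this topic does not describe. The class name AdaptiveDepth, the use of a moving average of arrival-time deviation as the congestion measure, and the 1/8 smoothing weight are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical adaptive sizer (NOT DirectPlay's actual algorithm).
// It estimates jitter from packet arrival gaps and converts the estimate
// into a number of frames to hold in the queue.
class AdaptiveDepth {
public:
    AdaptiveDepth(double frame_ms, int min_frames, int max_frames)
        : frame_ms_(frame_ms), min_frames_(min_frames), max_frames_(max_frames) {}

    // Report the time in milliseconds between two consecutive arrivals.
    // On an ideal network every gap equals the frame duration.
    void OnArrivalGap(double gap_ms) {
        const double deviation = std::fabs(gap_ms - frame_ms_);
        // Exponential moving average; the 1/8 weight is an arbitrary choice.
        jitter_ms_ += (deviation - jitter_ms_) / 8.0;
    }

    // Frames to queue: one frame plus enough frames to cover the estimated
    // jitter, rounded to the nearest frame and clamped to the allowed range.
    int TargetFrames() const {
        const int wanted = 1 + static_cast<int>(std::lround(jitter_ms_ / frame_ms_));
        return std::min(std::max(wanted, min_frames_), max_frames_);
    }

private:
    double frame_ms_;
    int min_frames_;
    int max_frames_;
    double jitter_ms_ = 0.0;
};
```

With 20-millisecond frames, for example, a steady stream keeps the target at one frame, while sustained 100-millisecond gaps push the estimate toward 80 milliseconds of jitter and the target toward five frames.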

The default behavior of the DirectPlay Voice jitter buffer is to adjust automatically to network conditions. You can manually adjust how closely the algorithm tracks network conditions by using the dwBufferAggressiveness and dwBufferQuality members of the DVCLIENTCONFIG structure. The higher the level of "aggressiveness," the more closely the algorithm monitors network conditions. In general, the higher the quality value, the higher the quality of the voice, but the higher the latency as well. The lower the quality value, the lower the latency but the lower the quality of the voice. You can set these two members when you call IDirectPlayVoiceClient::Connect to connect to a session, and at any time during the session by calling IDirectPlayVoiceClient::SetClientConfig.
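As a rough illustration, the sketch below maps an application-level preference onto the two members named above. To stay compilable without the DirectX SDK, it uses a stand-in structure, JitterConfigSketch, that copies only the two member names from DVCLIENTCONFIG. The stand-in, the helper functions, and the 0.0 to 1.0 preference scale are hypothetical. The valid range is passed in as parameters rather than hard-coded. In an SDK build it would presumably come from the DVBUFFERQUALITY_MIN/DVBUFFERQUALITY_MAX and DVBUFFERAGGRESSIVENESS_MIN/DVBUFFERAGGRESSIVENESS_MAX constants, which should be checked against the SDK header.

```cpp
#include <algorithm>
#include <cstdint>

// Stand-in for the two jitter-related members of DVCLIENTCONFIG so that the
// sketch compiles without the DirectX SDK. The real structure has additional
// members that are not modeled here.
struct JitterConfigSketch {
    std::uint32_t dwBufferQuality;
    std::uint32_t dwBufferAggressiveness;
};

// Map a preference in [0.0, 1.0] onto [min_value, max_value], rounding to
// the nearest integer. Out-of-range preferences are clamped.
// Assumes max_value >= min_value.
inline std::uint32_t ScalePreference(double preference,
                                     std::uint32_t min_value,
                                     std::uint32_t max_value) {
    const double p = std::min(std::max(preference, 0.0), 1.0);
    const double span = static_cast<double>(max_value - min_value);
    return min_value + static_cast<std::uint32_t>(p * span + 0.5);
}

// quality: 0.0 favors low latency, 1.0 favors voice quality.
// aggressiveness: 0.0 tracks network changes loosely, 1.0 tracks them closely.
inline void ApplyPreference(JitterConfigSketch& cfg,
                            double quality,
                            double aggressiveness,
                            std::uint32_t min_value,
                            std::uint32_t max_value) {
    cfg.dwBufferQuality = ScalePreference(quality, min_value, max_value);
    cfg.dwBufferAggressiveness = ScalePreference(aggressiveness, min_value, max_value);
}
```

In a real application the same two assignments would be made on a DVCLIENTCONFIG structure, which is then passed to IDirectPlayVoiceClient::Connect or IDirectPlayVoiceClient::SetClientConfig as described above.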

It is important to choose a level of aggressiveness that suits the network conditions under which your game application runs. Selecting a high level of aggressiveness during times of steady network performance can cause the algorithm to misinterpret a transitory problem and overcompensate for a problem that might not exist.

© 2002 Microsoft Corporation. All rights reserved.