csperkins.org

Robust Audio Tool

Sender based repair of damaged audio streams

Unless some form of resource reservation protocol (e.g. RSVP) is used, an IP based network, such as the Internet or the Mbone, will occasionally lose packets. These lost packets result in broken up audio, which rapidly becomes unintelligible as the loss rate increases. RAT implements two sender based repair schemes to recover from this problem: redundant transmission and interleaving.

Redundant transmission piggy-backs a more heavily compressed copy of each packet onto the following packet. If the original packet is lost, the redundant copy can be used in its place. Because the redundant copy is very heavily compressed, sound quality suffers, but the result is still better than having no audio to play out in place of the lost packet. There is a tradeoff between the amount of compression used for the redundant copy (and hence the bandwidth overhead of the stream) and the quality of the repaired audio.
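The piggy-backing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RAT's implementation: the `Packet` type and the functions `encode_primary`, `encode_redundant`, `packetise` and `receive` are hypothetical names, and the two "codecs" are stand-ins (the redundant one simply keeps one byte in four to mimic heavier compression).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int                    # index of the unit carried as primary data
    primary: bytes              # normal-quality encoding of unit `seq`
    redundant: Optional[bytes]  # heavily compressed copy of unit `seq - 1`


def encode_primary(unit: bytes) -> bytes:
    # Stand-in for the main codec (assumption): pass the unit through.
    return unit


def encode_redundant(unit: bytes) -> bytes:
    # Stand-in for a heavier codec (assumption): keep every 4th byte.
    return unit[::4]


def packetise(units: List[bytes]) -> List[Packet]:
    """Attach a compressed copy of the previous unit to each packet."""
    packets = []
    for seq, unit in enumerate(units):
        prev = encode_redundant(units[seq - 1]) if seq > 0 else None
        packets.append(Packet(seq, encode_primary(unit), prev))
    return packets


def receive(packets: List[Packet], n_units: int) -> List[Optional[tuple]]:
    """Per unit, return ("primary", data), ("redundant", data) or None."""
    out: List[Optional[tuple]] = [None] * n_units
    for p in packets:
        out[p.seq] = ("primary", p.primary)
        # Use the piggy-backed copy only if the previous unit is missing.
        if p.redundant is not None and out[p.seq - 1] is None:
            out[p.seq - 1] = ("redundant", p.redundant)
    return out
```

In this sketch a single lost packet is covered by the copy carried in its successor, whereas two consecutive losses still leave one unit unrepaired.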

Redundant audio encoding

Redundant transmission was developed by UCL and INRIA Sophia-Antipolis, as part of the MICE/MERCI multimedia conferencing projects. It is discussed further in the following papers:

As an alternative to redundant transmission, recent versions of RAT provide the option to send interleaved audio. Units of audio data are resequenced before transmission, so that originally adjacent units are separated by a guaranteed distance in the transmitted stream, and returned to their original order at the receiver. Interleaving disperses the effect of packet losses. If, for example, units are 5ms in length and packets 20ms (i.e. 4 units per packet), then the first packet could contain units 1, 5, 9, 13; the second packet would contain units 2, 6, 10, 14; and so on. The loss of a single packet from an interleaved stream therefore results in multiple small gaps in the reconstructed stream, as opposed to the single large gap which would occur in a non-interleaved stream.

Interleaving

Although interleaving does not reduce the amount of loss observed, it does significantly improve the perceived quality of an audio stream. The obvious disadvantage of interleaving is that it increases latency. This limits the use of this technique for interactive applications, although it performs well for non-interactive use. The major advantage of interleaving is that it does not increase the bandwidth requirements of a stream.
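The example with 5ms units and 20ms packets can be reproduced with a small block interleaver. The sketch below assumes a block of 16 units spread over 4 packets, which matches the unit numbers given above; the function names `interleave` and `deinterleave` are illustrative and are not taken from RAT.

```python
def interleave(units, units_per_packet):
    """Split one block of units into packets, taking units at a fixed stride."""
    n_packets = len(units) // units_per_packet
    if n_packets * units_per_packet != len(units):
        raise ValueError("block length must be a multiple of units_per_packet")
    # Packet k takes units k, k + n_packets, k + 2 * n_packets, ...
    return [units[k::n_packets] for k in range(n_packets)]


def deinterleave(packets, units_per_packet):
    """Restore the original order; a lost packet (None) leaves None gaps."""
    n_packets = len(packets)
    out = [None] * (n_packets * units_per_packet)
    for k, packet in enumerate(packets):
        if packet is not None:
            out[k::n_packets] = packet
    return out


units = list(range(1, 17))          # units 1..16, 5ms each
packets = interleave(units, 4)      # 4 packets of 4 units (20ms each)
packets[1] = None                   # lose the second packet
rebuilt = deinterleave(packets, 4)  # units 2, 6, 10 and 14 are now gaps
```

The latency cost is visible in the same example: with 5ms units, the sender in this sketch cannot emit the first packet until unit 13 has been captured, 65ms into the block, compared with 20ms for a non-interleaved packet holding units 1 to 4.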

Receiver based repair of damaged audio streams

Receiver based recovery schemes rely on producing a replacement for a lost packet which is similar to the original. This is possible since audio signals, and in particular speech, exhibit large amounts of short-term self similarity. As such, these techniques work for relatively small loss rates (less than 15%), and for small packets (4-40ms). When the loss length approaches the length of a phoneme (5-100ms) these techniques break down, since whole phonemes may be missed by the listener.

It is, therefore, clear that receiver based repair schemes are not a substitute for sender-based repair, but rather work in tandem with it. A sender-based scheme is used to repair most losses, leaving a small number of isolated gaps to be repaired. Once the effective loss rate has been reduced in this way, receiver based repair forms a cheap and effective means of patching over the remaining loss.

A number of receiver based repair schemes are implemented in RAT:

A simple form of receiver based recovery is silence substitution. The gap left by a lost packet is filled with silence, to maintain the timing relationship between the surrounding packets. It is only effective with short packet lengths (less than 4ms) and low loss rates (less than 2%), making it suitable for striped audio with narrow and distributed stripes over low loss paths.

The performance of silence substitution degrades rapidly as packet sizes increase, and quality is unacceptably bad for the 40ms packet size in common use in network audio conferencing tools. Despite this, the use of silence substitution is widespread, primarily because it is simple to implement.
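Silence substitution is simple enough to show in full. In the sketch below a lost packet is represented by `None` and each packet is a list of PCM samples; both conventions, and the function name `silence_substitution`, are assumptions made for illustration.

```python
def silence_substitution(packets, samples_per_packet):
    """Concatenate packets, filling each lost packet (None) with zeros."""
    out = []
    for packet in packets:
        if packet is None:
            # Keep the timing of later packets by inserting silence.
            out.extend([0] * samples_per_packet)
        else:
            out.extend(packet)
    return out
```

The output always has the same length as the original stream, which is what preserves the timing relationship between the surrounding packets.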

Packet repetition replaces lost packets with copies of the packets that arrived immediately before the loss. It has low computational complexity and performs reasonably well. The subjective quality of repetition is improved by gradually fading repeated units. The GSM system, for example, advocates repeating the first 20ms at the same amplitude, then fading the repeated signal to zero amplitude over the next 320ms.

The use of repetition with fading is a good compromise between the poor performance of silence substitution, and the more complex pattern matching scheme.
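Repetition with fading can be sketched as follows, using the GSM-style figures quoted above (20ms held at full amplitude, then a fade to zero over 320ms). The linear shape of the fade, the 20ms packet length, and the names `fade_gain` and `repeat_with_fade` are assumptions; the text does not specify them.

```python
def fade_gain(t_ms, hold_ms=20.0, fade_ms=320.0):
    """Gain applied t_ms into a loss: hold, then fade linearly to zero."""
    if t_ms < hold_ms:
        return 1.0
    return max(0.0, 1.0 - (t_ms - hold_ms) / fade_ms)


def repeat_with_fade(packets, packet_ms=20.0):
    """Replace each lost packet (None) with a faded copy of the last good one.

    Packets are lists of float samples. A loss before any packet has
    arrived cannot be repaired and is left as None.
    """
    out = []
    last_good = None
    lost_run = 0  # number of consecutive packets lost so far
    for packet in packets:
        if packet is not None:
            last_good, lost_run = packet, 0
            out.append(list(packet))
        elif last_good is None:
            out.append(None)
        else:
            n = len(last_good)
            start_ms = lost_run * packet_ms
            out.append([s * fade_gain(start_ms + i * packet_ms / n)
                        for i, s in enumerate(last_good)])
            lost_run += 1
    return out
```

With these parameters the first repeated packet is played at full amplitude, and a run of losses longer than 340ms decays to silence.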

Pattern matching repair uses audio before and after the loss to interpolate a suitable signal to cover the loss. It performs somewhat better than packet repetition, but is significantly more computationally intensive.
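To give a feel for why pattern matching costs more than repetition, here is a deliberately simplified, one-sided variant. It is not the scheme used in RAT: it uses only the audio before the loss, whereas the scheme described above also uses audio after it. The function name `pattern_match_repair`, the template length, and the sum-of-absolute-differences measure are all assumptions chosen for illustration.

```python
def pattern_match_repair(history, gap_len, template_len=4):
    """Fill a gap of gap_len samples that follows `history`.

    The last template_len samples before the gap are used as a template.
    Earlier history is searched for the closest match, and the samples
    that followed that match are returned as the replacement.
    """
    last_start = len(history) - template_len - gap_len
    if gap_len < 1 or template_len < 1 or last_start < 0:
        raise ValueError("not enough history for this gap and template")
    template = history[-template_len:]

    def mismatch(start):
        window = history[start:start + template_len]
        return sum(abs(a - b) for a, b in zip(window, template))

    # Exhaustive search over every candidate position: this is the
    # expensive step compared with simply repeating the last packet.
    best = min(range(last_start + 1), key=mismatch)
    return history[best + template_len:best + template_len + gap_len]
```

On a strongly periodic signal, such as voiced speech, the samples that followed the best match tend to resemble the missing ones, which is the short-term self similarity that receiver based repair relies on.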

Adaptive Scheduling Protection

Current general purpose operating systems, such as Unix and Windows 95, do not provide adequate support for real-time services in their scheduling algorithms. RAT uses a novel adaptive algorithm, where the DMA driven audio playout is used to "cushion" the system against scheduling anomalies. This is described in the following paper:

Secure Conferencing

RAT allows for secure conferencing, whereby media streams and participant identity information can be encrypted using triple-DES. Other encryption algorithms could easily be added.

Security Preferences

Improved Statistics and diagnostic features

Like other RTP-based audio tools, RAT provides reception quality statistics and user information for all participants in a conference. In addition, it has a graphical display of the loss to/from each participant, making diagnosis of problems a simple matter:

Reception Quality Matrix

Conference coordination bus

RAT implements a conference coordination message bus, whereby the user interface and media engine are separated, and communicate via an IPC mechanism. This allows for complete control of RAT by another process operating on the same host. Advantages of this split approach include:

Transcoder operation

When the bandwidth available is not the same for all participants in a conference, or when some participants do not have multicast capable access, the RAT transcoder/gateway may be used. The transcoder connects two multicast groups, or one multicast group and a single unicast host. RTP packets received from either group are transcoded into the format specified for the other group, multiple sources are mixed together, and the resulting stream is transmitted to the other group. This allows a different codec to be used in each group, so the bandwidth requirements of the two sides can differ.
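The mixing step can be illustrated with a short sketch. It assumes that each source has already been decoded to a frame of 16-bit linear PCM samples of equal length, and that overflow is handled by clipping; neither detail is stated above, and the function name `mix` is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix(frames):
    """Sum time-aligned frames sample by sample, clipping to 16 bits."""
    if not frames:
        return []
    n = len(frames[0])
    if any(len(frame) != n for frame in frames):
        raise ValueError("all frames must have the same length")
    return [max(INT16_MIN, min(INT16_MAX, sum(samples)))
            for samples in zip(*frames)]
```

In a transcoder built this way, the mixed frame would then be re-encoded with the codec chosen for the destination group before transmission.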