Summary

Recommendation ITU-T G.711.1 describes an ITU-T G.711 embedded wideband speech and audio coding algorithm operating at 64, 80 and 96 kbit/s.

The encoder input and decoder output are sampled at 16 kHz by default, but 8-kHz sampling is also supported. With 16-kHz sampling, the ITU-T G.711.1 coder can encode signals with a bandwidth of 50-7000 Hz at 80 and 96 kbit/s. With 8-kHz sampling, operating at 64 and 80 kbit/s, the output has a bandwidth of 50 to 4000 Hz (the bandwidth of the narrowband signal output from the decoder is determined by the built-in split-band filterbank, which has a cut-off frequency of 4000 Hz). At 64 kbit/s, Recommendation ITU-T G.711.1 is compatible with Recommendation ITU-T G.711; hence an efficient deployment in existing ITU-T G.711-based voice over IP (VoIP) infrastructures is foreseen. The coder operates on 5 ms frames, has a maximum algorithmic delay of 11.875 ms, and has a worst-case computational complexity of 8.70 weighted million operations per second (WMOPS).
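The frame length and bit rates above fix the payload size of each frame. The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not taken from the reference C code.

```python
FRAME_MS = 5  # frame length stated in this Summary


def bytes_per_frame(bit_rate_kbit_s: int, frame_ms: int = FRAME_MS) -> int:
    """Payload bytes carried by one frame at the given bit rate."""
    bits = bit_rate_kbit_s * frame_ms  # kbit/s multiplied by ms gives bits
    if bits % 8 != 0:
        raise ValueError("frame does not contain a whole number of bytes")
    return bits // 8


def samples_per_frame(sample_rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Input samples consumed per frame at the given sampling rate."""
    return sample_rate_hz * frame_ms // 1000


# Payload sizes for the three bit rates named in the Summary.
FRAME_BYTES = {rate: bytes_per_frame(rate) for rate in (64, 80, 96)}
```

Under these figures a 5 ms frame carries 40, 50 or 60 bytes, and covers 80 input samples at 16 kHz or 40 samples at 8 kHz.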

The encoder produces an embedded bitstream structured in three layers corresponding to the three available bit rates: 64, 80 and 96 kbit/s. The bitstream can be truncated at the decoder side or by any component of the communication system to adjust the bit rate to the desired value. However, since the bitstream does not indicate which layers it contains, an implementation requires out-of-band signalling to convey which layers are available.
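Truncation of an embedded frame can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the Recommendation: it assumes the layers of one 5 ms frame are concatenated with the 64 kbit/s core first, and that the target rate is known to the truncating component through out-of-band signalling. The exact layer ordering, and which layer combination an 80 kbit/s frame carries, are not described in this Summary and should be taken from the main body.

```python
# Bytes per 5 ms frame at each bit rate (kbit/s * 5 ms / 8 bits per byte).
FRAME_BYTES = {64: 40, 80: 50, 96: 60}


def truncate_frame(frame: bytes, target_rate: int) -> bytes:
    """Drop trailing layers so the frame matches target_rate (kbit/s).

    Assumes (hypothetically) that the core layer comes first, so the
    leading bytes of a frame always form a valid lower-rate frame.
    """
    if target_rate not in FRAME_BYTES:
        raise ValueError(f"unsupported bit rate: {target_rate}")
    size = FRAME_BYTES[target_rate]
    if len(frame) < size:
        raise ValueError("frame is already below the target rate")
    return frame[:size]
```

Because the frame carries no indication of its own layer content, the receiver in this sketch would still need to be told the resulting rate by other means.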

The underlying algorithm has a three‑layer coding structure: log companded pulse code modulation (PCM) of the lower band including noise feedback, embedded PCM extension with adaptive bit allocation for enhancing the quality of the base layer in the lower band, and weighted vector quantization coding of the higher band based on modified discrete cosine transformation (MDCT).
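For readers unfamiliar with log companding, the textbook continuous mu-law curve is sketched below. This is background only: ITU-T G.711 specifies segmented 8-bit A-law and mu-law code tables rather than this continuous formula, and the sketch does not cover noise feedback, the embedded PCM extension, or the MDCT-based higher-band coding. The function names are illustrative.

```python
import math

MU = 255.0  # compression parameter of the mu-law characteristic


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

Small amplitudes are stretched and large amplitudes compressed, so a uniform quantizer applied after compression yields finer resolution for quiet signals.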

Annex A defines an alternative implementation of the ITU-T G.711.1 algorithm using floating-point arithmetic to facilitate its use on hardware optimized for floating-point operations. The accompanying floating-point C-code is fully interoperable with the fixed-point C-code and provides equivalent quality.

Annex B contains the RTP payload format, capability identifiers and parameters for signalling of ITU‑T G.711.1 capabilities using Recommendation ITU‑T H.245. The packet format is fully compatible with the corresponding ITU‑T G.711.1 RTP definitions to allow seamless interoperability.

Annex C describes an algorithm that applies the ITU-T G.711.0 lossless compression algorithm to ITU-T G.711.1. As Recommendation ITU-T G.711.0 is more efficient when applied to large frame sizes, as many ITU-T G.711.1 frames as ITU-T G.711.0 can support are encoded together to achieve an efficient compression rate. The use of this extension introduces no quality degradation when compared to Recommendation ITU-T G.711.1, as it is a lossless encoding of the ITU-T G.711 portion of the ITU-T G.711.1 bitstream. Furthermore, there is no additional algorithmic delay; the overall delay is that of ITU-T G.711.1 plus the selected packet size minus five milliseconds. The extension keeps the same robustness against packet losses as Recommendation ITU-T G.711.1 and introduces no error propagation in case of frame errors. A bitstream produced by this scheme can easily be transcoded to ITU-T G.711.1 or ITU-T G.711.0 at minimal complexity.
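The delay rule stated above can be worked through numerically. The sketch below only restates that rule; the packet sizes passed in are illustrative, since this Summary does not list which packet sizes are supported.

```python
G711_1_DELAY_MS = 11.875  # maximum algorithmic delay of ITU-T G.711.1
FRAME_MS = 5              # ITU-T G.711.1 frame length


def annex_c_delay_ms(packet_ms: int) -> float:
    """Delay = ITU-T G.711.1 delay + packet size - 5 ms, per the text."""
    if packet_ms < FRAME_MS or packet_ms % FRAME_MS != 0:
        raise ValueError("packet must hold a whole number of 5 ms frames")
    return G711_1_DELAY_MS + packet_ms - FRAME_MS


def frames_per_packet(packet_ms: int) -> int:
    """Number of ITU-T G.711.1 frames grouped into one packet."""
    return packet_ms // FRAME_MS
```

With a packet of one frame the delay equals that of ITU-T G.711.1 itself; a hypothetical 20 ms packet would group four frames and give 26.875 ms.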

Annex D describes a scalable superwideband (SWB, 50-14000 Hz) speech and audio coding algorithm operating from 96 to 112 kbit/s with the ITU-T G.711.1 80 kbit/s core, and from 112 to 128 kbit/s with the ITU-T G.711.1 96 kbit/s core. The ITU-T G.711.1 superwideband extension codec is interoperable with both ITU-T G.711 and ITU-T G.711.1. The output of the ITU-T G.711.1 SWB coder has a bandwidth of 50-14000 Hz. The coder operates with 5 ms frames, has an algorithmic delay of 12.8125 ms and a worst-case complexity of 21.498 MOPS. By default, the encoder input and decoder output are sampled at 32 kHz. The superwideband encoder produces an embedded bitstream structured in two layers corresponding to two available bit rates, from 96 to 112 kbit/s or from 112 to 128 kbit/s, with a step size of 16 kbit/s, depending on the chosen ITU-T G.711.1 core. The bitstream can be truncated at the decoder side or by any component of the communication system to instantaneously adjust the bit rate to the desired value with no need for out-of-band signalling. In the ITU-T G.711.1 80 kbit/s mode or 96 kbit/s mode, ITU-T G.711.1 SWB is fully interoperable with ITU-T G.711.1. The underlying algorithm includes three main parts: higher band enhancements, bandwidth extension (BWE) and transform coding in the modified discrete cosine transform (MDCT) domain based on algebraic vector quantization (AVQ).
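The relationship between the core rate and the superwideband rates can be sketched as below. It assumes only what the text states: two enhancement layers, each adding 16 kbit/s on top of the chosen core. The names are illustrative.

```python
SWB_STEP_KBIT_S = 16   # step size between superwideband bit rates
SWB_LAYERS = 2         # number of superwideband layers
CORE_RATES = (80, 96)  # ITU-T G.711.1 core rates usable with Annex D


def swb_bit_rates(core_rate: int) -> list[int]:
    """Superwideband bit rates (kbit/s) available on a given core."""
    if core_rate not in CORE_RATES:
        raise ValueError(f"unsupported core rate: {core_rate}")
    return [core_rate + SWB_STEP_KBIT_S * k for k in range(1, SWB_LAYERS + 1)]
```

Dropping both superwideband layers leaves the core rate, which is the case in which the text describes full interoperability with ITU-T G.711.1.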

Annex E describes a proposed draft of an alternative implementation of ITU-T G.711.1 Annex D based on floating-point arithmetic. While Annex D provides a bit-exact, fixed-point specification with the fixed-point C source code available from the ITU-T, an alternative floating-point implementation is useful for platforms equipped with floating-point processors. This floating-point implementation was found to be fully interoperable with Annex D in all configurations, including the cross configurations.

Annex F describes a stereo extension of the wideband codec ITU-T G.711.1 and its superwideband extension, ITU-T G.711.1 Annex D. It is optimized for the transmission of stereo signals with limited additional bitrate, while keeping full compatibility with both codecs. Annex F operates from 96 to 160 kbit/s: five superwideband stereo bitrates from 112 to 160 kbit/s and two wideband stereo bitrates at 96 and 128 kbit/s. The wideband stereo modes are backward compatible with legacy ITU-T G.711 and ITU-T G.711.1, while the superwideband modes offer backward compatibility with mono narrowband ITU-T G.711, mono wideband ITU-T G.711.1 and superwideband ITU-T G.711.1 Annex D. The stereo codec operates on 5 ms frames with an algorithmic delay of 18.125 ms for wideband stereo and 19.0625 ms for superwideband stereo. The encoder input and decoder output are sampled at 16 kHz and 32 kHz for the wideband and superwideband operating modes, respectively. The underlying algorithm includes three main parts: stereo parameter analysis and down-mix, both at the encoder, and stereo synthesis at the decoder. The first stereo extension layer is a 16 kbit/s layer comprising the basic stereo parameters: the whole wideband inter-channel time difference/inter-channel phase difference/inter-channel coherence, the sub-band inter-channel level differences and the low-frequency sub-band inter-channel phase differences. The second stereo layer is also a 16 kbit/s layer. In this last layer, the inter-channel phase differences of a larger bandwidth are transmitted, which allows the stereo image to be further improved. The bitstream can be truncated by the decoder, or by any component of the communication system, to instantaneously adjust the bitrate to the desired value, including narrowband ITU-T G.711, wideband ITU-T G.711.1 and superwideband ITU-T G.711.1 Annex D bitrates, with no need for out-of-band signalling.

Appendix I describes a supplementary postfilter for use in the decoder. This postfilter enhances the quality of the decoded signal when a legacy ITU-T G.711 bitstream, or only the basic log companded PCM part of the ITU-T G.711.1 bitstream, is available. It is intended for end-user terminals; its use in tandem scenarios (such as in a signal mixer or bitstream translator) should be avoided.

Appendices II and III provide information on frame size selection and on decoding of the ITU‑T G.711.0 bitstream part of ITU-T G.711.1 LLC bitstreams, respectively.

Appendix IV to ITU-T G.711.1 defines a coding scheme for mid-side (MS) stereo using ITU-T G.711.1 Annex D (ITU-T G.711.1-SWB). By introducing mid-side stereo coding into stereo terminals, interoperability with monaural devices can be obtained at very low complexity. The basic coding scheme is as follows: the two channels of the left-right (LR) stereo signal are converted to MS channels, and each channel is then independently encoded using ITU-T G.711.1-SWB; at the decoder side, the MS channels of the bitstream are decoded separately, and the decoded MS signals are then converted back to LR channels. The LR-MS conversion and its inverse are conducted in a conventional way. On the encoder side, two additional arithmetic operations per sample are required for the LR-MS conversion, and on the decoder side one additional operation per sample is required for the MS-LR conversion. In an implementation using the STL2009 basic operators (see Recommendation ITU-T G.191 (2010)), the conversion complexity amounts to about 0.2 WMOPS in total. The coding algorithm for each channel is identical to the one in ITU-T G.711.1 Annex D.
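The LR-MS conversion and its inverse can be sketched as below. The text only says the conversion is conventional, so the half-sum and half-difference form used here is an assumption, chosen because it matches the stated operation counts (an add or subtract plus a halving per sample at the encoder, a single add or subtract per sample at the decoder). Floating-point samples are used for clarity; fixed-point rounding is not modelled, and the function names are illustrative.

```python
def lr_to_ms(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert left/right channels to mid/side (assumed half-sum form)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_to_lr(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Convert mid/side channels back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

In this form the mid channel is an ordinary mono down-mix, which is why a monaural device could decode the mid channel on its own.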

ANSI C source code is provided for the algorithms specified in the main body of this Recommendation and in Annexes A, C, D, E and F. These ANSI C source codes are an integral part of this Recommendation.

A non-exhaustive set of test signals for use with these ANSI C source codes is also provided as an electronic attachment to this Recommendation. It should be noted that some of the test vectors are too voluminous for distribution with the source code, in particular for Annex F. The test vectors can be downloaded for free from the ITU web site at: http://itu.int/net/itu-t/sigdb/speaudio/Gseries.htm#G.711.1.