CCITT SG XV
Working Party XV/1
Specialist Group on Coding for
Visual Telephone

Doc. No.: 299 Jan. 88

Title: AEG 64 kbit/s Video Codec with 8 DSP's

Source: FRG

## 1. The Algorithm

A key element of coders that reach the high reduction factors needed for video telephony over ISDN at a 64 kbit/s transmission rate is movement compensation and interpolation. The most widely used technique today is block matching. It is very efficient but has two main disadvantages:

- The integer displacements cause high-frequency artefacts at the block borders of the estimated picture when neighbouring blocks have different displacements.
- The block matching vectors cannot be used for movement adaptive interpolation of skipped fields because these vectors are in general not the "true" movement vectors.
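For reference, the plain block matching that this contribution takes as its baseline can be sketched as follows. This is a minimal illustration, not the AEG implementation: the sum-of-absolute-differences criterion, the full search, the search range and the function names `sad` and `block_match` are all assumptions made for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n*n block of `cur` at (bx, by)
    and the block of `ref` displaced by the integer vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def block_match(cur, ref, bx, by, n=16, search=7):
    """Full search over integer displacements in [-search, search].
    The matching criterion (SAD) and the search range are assumptions."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would leave the reference picture.
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Toy pictures: a ramp, and the same ramp shifted by 2 pixels in x and 1 in y.
ref = [[x + 100 * y for x in range(12)] for y in range(12)]
cur = [[ref[y + 1][x + 2] if y + 1 < 12 and x + 2 < 12 else 0
        for x in range(12)] for y in range(12)]
print(block_match(cur, ref, 4, 4, n=4, search=2))  # prints ((2, 1), 0)
```

The search simply minimises the matching error per block, so nothing ties the result to the physical movement or to the vectors of neighbouring blocks. That is the origin of both disadvantages listed above.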

In the implemented AEG codec we use a regression analysis to discard erroneous estimates of block displacements, 2D prediction of the vectors to save transmission bits, and nonlinear filtering of the displacement field to avoid high-frequency artefacts at the block borders. For typical test sequences ("Miss America", "Claire", "Alexis") this technique yields a gain of up to 6 dB in S/N ratio of the reconstructed pictures compared with simple block matching, and a gain of 1.5 dB compared with block matching with filtering in the loop. A further major advantage is that the displacement field can be used directly to interpolate fields that were not coded, giving smooth movement reproduction on the receiver side without further calculation or delay.
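The text does not say which nonlinear filter is applied to the displacement field. As an illustration only, the sketch below uses a component-wise 3\*3 median over the grid of block vectors, one common nonlinear choice that removes isolated outlier vectors while keeping genuine motion boundaries. The function name `median_filter_field` and the border handling are assumptions, not the AEG design.

```python
from statistics import median_low


def median_filter_field(field):
    """Component-wise 3*3 median of a 2D grid of (dx, dy) block vectors.
    Border blocks use whichever neighbours exist (an assumption)."""
    rows, cols = len(field), len(field[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            neigh = [field[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            # median_low always returns a value present in the data,
            # so the filtered components stay on the original vector grid.
            fx = median_low([v[0] for v in neigh])
            fy = median_low([v[1] for v in neigh])
            out_row.append((fx, fy))
        out.append(out_row)
    return out


# A uniform field with one erroneous vector in the centre.
field = [[(1, 0), (1, 0), (1, 0)],
         [(1, 0), (7, -5), (1, 0)],
         [(1, 0), (1, 0), (1, 0)]]
print(median_filter_field(field)[1][1])  # prints (1, 0)
```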

## 2. The architecture of the hardware

This technique of motion estimation, compensation and interpolation is implemented in the AEG codec for ISDN video telephony at 64 or 48 kbit/s. The spatial resolution is 288 \* 352 pixels for luminance and 2\*144\*176 for chrominance, as proposed by Study Group XV of CCITT for video conference applications. The coded field frequency is fixed at 8.33 or 10 Hz. Movement compensation and quantizing of the residual prediction errors are based on a 16\*16 block raster. The residual prediction errors are coded with a cosine transform, adaptive classification and quantizing using different scanning classes.
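The figures above give a feel for the reduction factor involved. The following short calculation is a rough sketch under two assumptions that are not stated in the text: 8 bits per sample, and the full channel rate available for video. It also assumes the 16\*16 raster is applied to the luminance picture.

```python
LUMA = (288, 352)      # lines, pixels per line (from the text)
CHROMA = (144, 176)    # each of the two chrominance components (from the text)
BLOCK = 16             # block raster (from the text)
BITS_PER_SAMPLE = 8    # assumption, not stated in the text


def samples_per_picture():
    """Luminance samples plus two chrominance components."""
    return LUMA[0] * LUMA[1] + 2 * CHROMA[0] * CHROMA[1]


def luma_blocks():
    """Number of 16*16 blocks covering the luminance picture."""
    return (LUMA[0] // BLOCK) * (LUMA[1] // BLOCK)


def reduction_factor(rate_hz, channel_bps):
    """Uncoded bit rate at the coded picture rate, divided by the channel rate."""
    return samples_per_picture() * BITS_PER_SAMPLE * rate_hz / channel_bps


print(samples_per_picture())        # 152064 samples per coded picture
print(luma_blocks())                # 396 blocks (18 * 22)
print(reduction_factor(10, 64000))  # roughly 190
```

Under these assumptions the coder has to reduce the data by a factor of roughly 190 at 10 Hz and 64 kbit/s, measured against the already subsampled coded picture rate.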



An implementation with DSP's requires great effort to reduce the computational power needed for displacement search, calculation of the motion compensated estimated picture (with subpel accuracy!) and the transform. We reduced the load of the number crunching algorithms from an initial figure of 200-300 Mops down to 60 Mops without sacrificing accuracy in estimating displacements or transforming blocks (even in the worst cases of a fast pan or a scene cut).
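To indicate why subpel accuracy adds to the load, the sketch below forms a motion compensated prediction block for a fractional displacement. The text does not specify the interpolation filter used in the codec. Bilinear interpolation is assumed here purely for illustration, and `sample_bilinear` and `predict_block` are hypothetical names.

```python
import math


def sample_bilinear(ref, x, y):
    """Value of `ref` at the fractional position (x, y), bilinearly
    interpolated from the four surrounding pixels. Positions outside the
    picture are clamped to the border (an assumption)."""
    height, width = len(ref), len(ref[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * ref[y0][x0] + fx * ref[y0][x1]
    bottom = (1 - fx) * ref[y1][x0] + fx * ref[y1][x1]
    return (1 - fy) * top + fy * bottom


def predict_block(ref, bx, by, dx, dy, n=16):
    """Prediction for the n*n block at (bx, by), taken from `ref` at the
    possibly fractional displacement (dx, dy)."""
    return [[sample_bilinear(ref, bx + x + dx, by + y + dy) for x in range(n)]
            for y in range(n)]


ref = [[x + 100 * y for x in range(8)] for y in range(8)]
print(predict_block(ref, 2, 2, 0.5, 0.0, n=2))  # [[202.5, 203.5], [302.5, 303.5]]
```

With this filter each predicted pixel costs several multiplications and additions instead of a single memory read, which is part of why the prediction step counts towards the computational budget.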

The figure shows the architecture of the hardware. The control bus is a VME Bus; the fast bus between the memory (4 fields) and the DSP's has a transfer rate of 16 Mbyte/s. The DSP type is ADSP 2100 from Analog Devices. They operate in the pixel domain and calculate block attributes (e.g. the block displacements), while the microcomputer (MC 68020, 25 MHz) operates in the picture and block domain, controls the whole system and performs the coding for transmission.



At present our codec works with 8 DSP's for coding and 4 for decoding. Announced new products will make it possible to perform coding and decoding in one system with 4 or even 2 DSP's in the near future. No pre- or post-processing is used at present, and the movement adaptive interpolation is not yet implemented in the hardware codec. The tape produced on January 19, 1988 shows a real time scene from our lab at 8.33 coded pictures/s, while the simulation results were produced with the same algorithms but include the movement adaptive interpolation and a post filtering of the reconstructed pictures.