Document AVC-355R
CCITT SGXV
Working Party XV/1
October 1, 1992
Experts Group for ATM Video Coding

SOURCE  : CHAIRMAN OF THE EXPERTS GROUP FOR ATM VIDEO CODING
TITLE   : REPORT OF THE EIGHTH MEETING IN TARRYTOWN (Sep.28 - Oct.1, 1992)
Purpose : Report

-----------------

PART I  GENERAL

The eighth meeting of the Experts Group was held at Tarrytown Hilton Inn in Tarrytown, USA, at the kind invitation of David Sarnoff Research Center and IBM. It consisted of two parts;

- CCITT sole sessions : Sep. 28,
- Joint sessions with ISO/IEC JTC1/SC29/WG11 (MPEG): Sep. 29 - Oct. 1.

The list of participants appears at the end of this report.

At the opening of the joint sessions, Dr. C. Gonzales made a welcoming address on behalf of the hosting organization. The Experts Group appreciated the support of the hosting organizations in providing meeting facilities, D1 tape demonstration equipment and secretarial services.

Before starting discussions, we confirmed the following objectives of this meeting;

Sole sessions
- To prepare for the joint sessions in issues of CCITT's concern, particularly low delay, cell loss resilience, compatibility
- To prepare for the Ipswich meeting

Joint sessions
- To elaborate Test Model
- To prepare for the London meeting

PART II  SOLE SESSIONS

1. Date

September 28, 13:00 - 19:00

2. Documentation (TD-2)

For the CCITT sole sessions, 34 AVC-numbered documents and 5 temporary documents were made available as in Annex 1.

3. Tape demonstration

The meeting reviewed a number of D1 tape demonstrations as listed in Annex 2.

4. Low delay mode

4.1 Prediction (AVC-327,339,340; TD-5)

One of the main issues regarding the low delay mode is to clarify what predictions are required for it, since prediction is a primary source of coding efficiency improvement in this mode. Three documents reported experimental results on improvements obtained by "intelligent" predictions such as FAMC, SVMC, DUAL and DUAL'.
Mr. Yukitake provided a summary table (TD-5) listing SNR improvements, indicating that those new intelligent predictions bring significant improvements (more than 0.5 dB) in some sequences such as Flower Garden, Mobile & Calendar and Cheer Leaders, compared to Field/Frame adaptive MC.

It is a common understanding of the meeting, however, that SNR evaluation alone may mislead us. In particular, different coding controls produce different pictures without direct correspondence to SNR values. After the demonstration tapes had been reviewed, Chairman asked the participants for their impressions. Roughly 1/4 of them voted that they had observed impressive improvements and 3/4 that they had not. The matter was left for the impression of the larger population and for consideration in the context of the general prediction scheme in the joint sessions.

Chairman raised a related question of whether this additional prediction mode can be optional in the encoder and mandatory in the decoder, as H.261 motion compensation is. This question is generalized into what modes are inside the "maximum core" and what are outside. The members are requested to study how to apply the generic coding (H.26X) to our communication terminals and to feed the results back into the structure of the coding standard. It was pointed out that in communication applications, a two-way channel may allow negotiation of the operational mode between two terminals at the start of a call. It was pointed out at the same time that B-ISDN applications will need one-way distribution in the form of broadcast or multicast.

4.2 Steady state delay (AVC-327,329)

Intra slice/column is compared to intra picture in the two documents. The meeting concluded that the former is clearly favorable in delay performance at the same coding efficiency. Leaky prediction has been identified as an alternative to smooth out the number of bits generated throughout sequences, but there was no input in this respect.
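
The delay advantage of intra slice/column refresh over intra picture refresh can be illustrated with a toy encoder-buffer model. This is only a sketch under stated assumptions, not the Test Model rate control: the function name, the refresh period of 11 pictures and the bit counts are hypothetical, chosen so that both schemes spend the same total number of bits per refresh period.

```python
def peak_buffer_delay(bits_per_picture, drain_per_picture, cycles=3):
    """Peak encoder-buffer occupancy, expressed in picture periods.

    Toy model (an assumption, not the Test Model): each picture's bits
    enter the buffer at once, then the constant-rate channel removes
    `drain_per_picture` bits during one picture period.
    """
    occupancy = 0
    peak = 0
    for _ in range(cycles):
        for bits in bits_per_picture:
            occupancy += bits
            peak = max(peak, occupancy)
            occupancy = max(0, occupancy - drain_per_picture)
    return peak / drain_per_picture


# Hypothetical numbers: the channel carries 100 units per picture period.
DRAIN = 100
REFRESH_PERIOD = 11

# Intra picture refresh: one large picture followed by small predicted ones.
intra_picture = [600] + [50] * (REFRESH_PERIOD - 1)
# Intra slice refresh: the intra cost is spread evenly over all pictures.
intra_slice = [100] * REFRESH_PERIOD

# Same total bits, so the same average rate in this toy comparison.
assert sum(intra_picture) == sum(intra_slice)

delay_intra_picture = peak_buffer_delay(intra_picture, DRAIN)
delay_intra_slice = peak_buffer_delay(intra_slice, DRAIN)
print(delay_intra_picture, delay_intra_slice)
```

In this model the buffering delay is set by the largest burst of bits, so spreading the intra refresh over many pictures lowers the delay without changing the average rate. The real ratio depends on the sequence and on the rate control.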

4.3 Scene change handling (AVC-327,328,330)

AVC-328 showed that if the transmission buffer has a capacity equivalent to twice the frame time (buffering delay = 1 frame time), graceful scene changes are achievable without picture skipping. AVC-327 showed that picture skipping can maintain low delay during the steady state, though some temporal degradation is observed at a scene change. The meeting concluded that the standard should allow occasional picture skipping in the low delay mode.

AVC-330 discussed the following items to include picture skipping in the standard;

- Buffering (modification of VBV specifications)
- Picture header (to be sent with a flag for skipping)
- Temporal reference (to indicate source picture number)

The meeting accepted the proposals and agreed to put them forward for the consideration of the joint sessions.

4.4 Modified TM specifications (AVC-335,348)

It has been found that, with the current rate control for the low delay mode, the number of bits decreases as time passes. AVC-335 reports that two solutions were tested and that they worked equally well. The meeting concluded that the choice can be left to the Low Delay Adhoc group, chaired by Mr. T. Yukitake, which may take into account other factors, if any.

AVC-348 provided clarifications for the low delay mode experiment; handling of the first picture for the intra picture case and skipped pictures. The meeting supported this proposal, leaving the skipping order issue for further experiments.

5. Cell loss resilience

5.1 Overview (AVC-351)

Mr. M. Biggar, Chairman of Adhoc group for "ATM, Packet Loss and General Error Resilience", provided a progress report covering possible error resilience techniques and their characteristics. Our objective is to answer the question of what elements should be included for cell loss resilience in the standard.

5.2 Spatial and temporal localization (AVC-332)

AVC-332 classified cell loss resilient techniques into spatial localization and temporal localization. The meeting agreed to the view that both should be provided by the standard for the single layer coding. Spatial localization techniques are for reducing the corrupted area in a picture, and temporal localization techniques are for reducing the time the corrupted pictures are displayed. The two localizations may similarly be mapped to the layered coding, but this needs further consideration.

5.3 Spatial localization (AVC-332,333)

Both documents showed that the following techniques are effective for spatial localization;

- concealment, and
- structured packing.

The latter requires definition of new syntactic element(s) for an Absolute MacroBlock (AMB), which has an absolute macroblock address and MQUANT, and resets motion vector coding. The choice is left for further study. It should also be studied whether short slices are applicable for the same purpose and how the multimedia multiplex structure affects the applicability of this technique.

5.4 Temporal localization (AVC-332,333)

Both documents compared

- intra slice or intra picture, and
- leaky prediction

as a method for temporal localization. Though either technique has been proved effective, leaky prediction is concluded to be a little better as far as recovery from cell loss is concerned.

It has been demonstrated that by use of both spatial and temporal localization techniques, disturbance due to cell loss of 10E-3 can be controlled to an acceptable extent.

5.5 Leaky prediction (AVC-331,332,333,349,350; AVC-329)

Various experiments were carried out for the leaky prediction defined in Core Experiment No.6. From the coding efficiency point of view, the leaky prediction has been found almost competitive with the perfect prediction with cyclic intra refreshing. The leaky prediction, however, has two inherent problems.
One is residual error due to limited arithmetic accuracy in the prediction loop (limit cycle). AVC-332,349 provided analysis of the problem, and furthermore AVC-349 demonstrated this problem by coding a still picture and gave a recommended solution by use of a pseudo-random auxiliary signal. The other problem is noisy background, as pointed out in AVC-331,333. This is caused by the fact that the leaky prediction output converges to 128 as time passes but recovers when a prediction error exceeds a quantizer threshold. A solution for this problem has not yet been found.

With respect to channel hopping, the leaky prediction has been confirmed to provide graceful recovery.

Chairman raised the question of whether leaky prediction makes intra refresh, used to cope with the IDCT mismatch, unnecessary. This awaits further study.

5.6 Layered coding (AVC-345)

AVC-345 provided a simulation guideline for layered coding experiments to study its cell loss resilience property. The meeting supported this guideline. Mr. Dunstan indicated Australia will present experimental results at the next meeting.

5.7 Generic coding and adaptation to media (AVC-351)

H.26X is intended to be generic across a wide range of applications. On the other hand, error resilience should be media specific; e.g. ATM networks require cell loss resilience, STM networks require random/burst error resilience. We should provide answers to the following questions;

- At what level can bitstreams be interchangeable across applications?
- What side information is required for interchanging bitstreams?

Chairman suggested a case study for transporting picture data retrieved from a disk to a communication terminal through an ATM network. It is felt that an illustrative model will help us.

FEC needs study as to whether it should be applied at the separate level of audio, video, data, etc. or at the multiplexed level. It is envisaged that different transport media need different FECs.

6. H.261/H.26X compatibility (AVC-334,341,347,352)

AVC-334,341 provided experimental results to compare the two compatible coding structures;

- prediction from the base layer, and
- prediction of the prediction error.

From these experimental results, as well as from consideration for flexibility of the structure (i.e. the relation between the upper layer picture and the base layer one is not fixed as long as upconversion methods for the base layer picture are defined), the meeting decided to adopt the prediction from the base layer. This structure allows HDTV/TV compatibility, smooth PAN/SCAN etc. where an exact 2:1 relationship of resolution does not appear. Mr. Parke stressed that upconversion can be covered by a generic method.

Other experimental results in AVC-334,347,352 confirmed that embedded coding outperforms simulcast at the total bit rate of 4 Mbit/s.

Chairman requested members to consider whether this functionality should be inside or outside the "maximum core", in other words, whether all the H.26X coder/decoders should have this functionality. The notion of "flexible layering" was recalled in this respect.

7. Scalability (AVC-343,344,346,353)

Experimental results on the following topics were briefly reviewed;

- frequency pyramid,
- scalable side information,
- interlace-in-interlace extraction,
- frequency scanning.

We did not prepare specific comments of the Experts Group for this topic.

8. Prediction (AVC-337,338; AVC-339; AVC-329,331,333,349,350)

The meeting briefly reviewed available documents for various prediction methods. The question is what prediction modes are appropriate for the generic standard. This is left for the consideration of the joint sessions (see Section 4.1).

9. Quantization

There was no input from the Experts Group members on this topic.

10. Video clock recovery (AVC-336)

AVC-336 proposed provision of an 8 bit field in the picture header for video clock recovery.
There were comments that there are cases where a common network clock is not available at the transmitter and receiver, and other cases where pre-recorded materials are retrieved, so that the common clock cannot be used at the time of coding. The meeting suspended discussion until the next meeting, and decided that AVC-336 be submitted to the joint sessions as an informational document.

11. PSTN videophone (AVC-342)

European countries expressed that it is appropriate to develop standards for the PSTN videotelephony complementing ISDN ones and suggested guidelines. Japan stated that they will submit a paper to the WPXV/1 meeting in November, addressing interworking between PSTN and ISDN videophones. Mr. Schaphorst informed the meeting that T1A1 is establishing a related project, subject to final approval.

The meeting considered what we should recommend to Working Party XV/1 as a continuation of the New Jersey meeting in July. Chairman proposed to recommend organizing a rapporteurs (experts) group for this task, covering both short term and long term study items. This matter will be finalized at the next meeting in Ipswich.

12. Work plan (TD-3)

Mr. Biggar provided comments requesting that clear objectives be set up for the Experts Group. Since the framework for H.26X is being formed, it is time to review and clarify our future work plan. Members are requested to consider this toward the next meeting.

13. Status report (TD-4)

Members are requested to review the draft and return comments to Chairman by October 10.

14. Preparation for the joint sessions

14.1 Documents

All the contributions to this meeting were put forward to the joint sessions except AVC-342. See also Section 10.

14.2 Reporters

The meeting appointed reporters of the joint sessions as follows;

                         Adhoc          CCITT EG
Adhoc Group              Chair          Reporter
------------------------------------------------------
Overall                  D. LeGall      S. Okubo
Low delay                T. Yukitake    T. Yukitake
Cell loss resilience     S. Dunstan     S. Dunstan
Compatibility            A. Puri        I. Parke
Scalability              E. Viscito     O. Poncin
Prediction               H. Watanabe    H. Watanabe
Quantization             N. Wells       G. Eude

Their roles are;

- to reflect the discussion of the sole sessions for consideration of the joint sessions
- to provide a report containing conclusions, discussed items, action points, etc. by Oct. 9 at the latest.

15. Others

1) Editor for the common text

The meeting appreciated that Mr. M. Biggar had volunteered to be an Editor representing the Experts Group.

2) Next meeting

- Ipswich meeting of the Experts Group (Oct. 28 - 30)
- IVS Technical Session in Ipswich (Oct. 26 - 27)

PART III  JOINT SESSIONS WITH MPEG

1. Date

September 28 - October 1

2. Documentation

The list of documents considered during the joint sessions is attached as Annex 3.

3. Video Test Model (S. Okubo)

3.1 Work plan

The following was confirmed toward freezing technical specifications at the Sydney meeting in March 1993;

- New technical possibilities will no longer be accepted after the London meeting, and the members should focus instead on resolving and refining the core experiments.
- In the spirit of the convergence process, proposals that are not actively worked on should be dropped from the list of core experiments and resolved by taking the older alternative.
- As many of the members as possible are requested to devote their energy to solving the outstanding problems, e.g. scalability and compatibility. Current activities should be reviewed at a joint Video/Requirements session in London to identify those areas that need more work.
- An additional WG11 meeting will be held during January 25-29, 1993 in Torino.

3.2 Patent statements

Members are requested to submit their patent statements relevant to the current phase of work at an early occasion, if possible by the London meeting.

3.3 Profile

We began to study how to structure the generic standard. "Profile" has been thought to assist this work, but it needs clear definition for our particular case.
A summary of discussion on this topic is contained in Annex 4. Use of the word "profile" should be avoided until clarification is obtained.

3.4 Liaison toward CCIR

Since CCIR TG11/4 is going to meet during October 13-15, and it will deal with comments on the requirements listing, it was decided to send a provisional updated version of the integrated requirements listing which is now being worked out by the Adhoc Group established in Rio.

4. Low delay (T. Yukitake)

4.1 General

The group met twice this week: once jointly with the prediction ad-hoc group and once on its own.

4.2 Prediction modes for low delay mode (Core experiment No.1; Joint meeting with prediction group)

- What prediction modes are appropriate for the low delay mode? Do we need special prediction modes (S-FAMC, SVMC, Dual') or not?

1) Among the three coding structures preferable for low delay profile, field structure with M=2 has the best performance, frame structure with M=1 is the next and field structure with M=1 the last. The coding performance can be improved at the cost of basic delay.

2) In terms of SNR, the introduction of special predictions can improve the coding efficiency. This is very desirable for low delay, because we can save the basic delay and hardware cost while getting better image quality. For example, the SNR of field structure M=2 with fi/fr prediction is the same as that of frame structure M=1 with special prediction. By introducing the special prediction, we do not need to use the bi-directional prediction, which makes the hardware simple and the delay low. The necessity of the special prediction mode was discussed, and no firm conclusion could be reached. However, the feeling of the meeting seems to be positive.

3) These three special predictions seem to solve the same problems in similar ways. Therefore we should make an effort to unify these predictions or select one of them.
To unify them or select one prediction, a new core experiment (L10 defined in the prediction ad-hoc group) is carried out towards the London meeting.

4.3 Intra picture vs Intra slice/column (Core Experiment No.2)

1) The coding efficiency of intra slice and intra picture is almost the same in terms of SNR.

2) The delay of intra slice is more than 10 times shorter than that of intra picture.

3) The coding efficiency of intra slice and intra column is almost the same in terms of SNR.

4) In case of low delay profile, the low delay ad-hoc group recommends using intra slice or column instead of intra picture.

4.4 Introduction of skipped pictures (Core experiment No.3)

1) Picture skipping is a useful technique to keep the delay low at the cost of temporal smoothness just after a scene change. The introduction of skipped pictures expands the freedom of encoder control, so the low delay ad-hoc group recommends introducing picture skipping.

2) In order to keep low delay except for transient periods, the decoder should identify dropped pictures so that it does not take time to display them. We have two choices;

- to send only their picture headers
- to send no data at all

In case of the picture header transmission, we need a 1 bit indication in the picture header to distinguish between the case of picture skipping and the case where a picture is coded with almost no coded data. We discussed these two methods and concluded that we did not have a strong opinion in favor of either. This item is still open. Some discussions are needed at the London meeting.

3) In MPEG-1, TR is defined as indicating the display picture order. In case of low delay profile, it should be defined as an indication of the source picture order, because some pictures are not displayed. It is noted that if all pictures are displayed, the source picture order is equivalent to the display picture order.

4) To allow picture skipping, VBV specifications were modified.
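
Item 3) above can be illustrated with a short sketch of how a temporal reference that counts source pictures lets a decoder detect skipped pictures even when no data at all is sent for them. The function names are hypothetical, and a single running count modulo 1024 (the range of the 10-bit temporal_reference field of MPEG-1) with no reordering (M=1) is an assumption made for illustration, not the agreed syntax.

```python
TR_MODULUS = 1024  # temporal_reference is a 10-bit field in MPEG-1


def assign_temporal_references(coded_flags):
    """TR of each coded picture when TR counts *source* pictures.

    coded_flags[i] is True if source picture i is coded and False if
    the encoder skipped it. Assumes M=1 (no picture reordering).
    """
    return [i % TR_MODULUS for i, coded in enumerate(coded_flags) if coded]


def skipped_between(prev_tr, curr_tr):
    """Number of source pictures skipped between two received pictures."""
    return (curr_tr - prev_tr - 1) % TR_MODULUS


# Hypothetical scene change: source pictures 2 and 3 are skipped.
coded = [True, True, False, False, True, True]
trs = assign_temporal_references(coded)
gaps = [skipped_between(a, b) for a, b in zip(trs, trs[1:])]
print(trs, gaps)
```

Because the gap is computed modulo 1024, the count remains correct across a wrap of the temporal reference, provided fewer than 1024 consecutive pictures are skipped.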

4.5 New experiments

Core experiment No.1 concerning prediction modes is re-defined as L10. A new experiment, which is concerned with the skipped picture order in field structure with M=2, is defined.

5. Cell loss resilience (S. Dunstan)

5.1 Introduction

The second meeting of the Ad-hoc Group on ATM, Packet Loss and General Error Resilience was held at Tarrytown in September/October.

5.2 Transmission media

One of the Ad-hoc group's aims is to recognize at an early stage what syntax changes, if any, might be required to support MPEG over different transmission media. The media of interest need to be identified, along with their error characteristics. Currently only B-ISDN ATM is being investigated, and satellite transmission has been mentioned.

5.3 Inputs

There were five inputs relevant to the Tarrytown meeting, concerning

- leaky prediction and cell loss error recovery,
- consideration of layered coding cell loss resilience.

1) Error localisation

Two documents classified the localization of the effects of cell loss as

- spatial localization
- temporal localization

Spatial localization relates to the number of macro blocks before resynchronization of the decoder occurs. Two possibilities to control this are;

- resynchronize on the next slice start code
- resynchronize at the next macro block having an absolute address (structured packing)

Results have shown that structured packing results in only a small change in coding efficiency. However, future work is required to investigate;

- syntax changes required to implement new macro block types having an absolute address,
- implementation of structured packing. It is not clear how alignment of the structured packing format could be achieved in, for example, AAL type 1.
- the impact of the System Layer upon the structured packing method

Temporal localization relates to the number of frames that an error is allowed to propagate. Two possibilities of control are;

- intra frame or intra slice,
- leaky predictor.
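
How a leaky predictor limits temporal error propagation can be sketched for a single stationary pixel. The prediction rule used here, pulling the reference toward mid-grey 128 by a leak factor, is an assumption consistent with the remark elsewhere in this report that the leaky prediction output converges to 128; it is not the Core Experiment definition. Quantization and limited arithmetic accuracy are ignored, so the limit cycle problem is not modelled, and all names and values are hypothetical.

```python
MEAN = 128  # mid-grey level toward which the leaky prediction decays


def leaky_prediction(reference, leak):
    """Assumed form of leaky prediction; leak = 1.0 is perfect prediction."""
    return MEAN + leak * (reference - MEAN)


def mismatch_after_loss(leak, pictures, initial_error=64.0, source=200.0):
    """Encoder/decoder mismatch per picture after one corrupted reference."""
    encoder_ref = source
    decoder_ref = source + initial_error  # reference damaged by cell loss
    history = []
    for _ in range(pictures):
        # Unquantized residual computed by the encoder and received intact.
        residual = source - leaky_prediction(encoder_ref, leak)
        encoder_ref = leaky_prediction(encoder_ref, leak) + residual
        decoder_ref = leaky_prediction(decoder_ref, leak) + residual
        history.append(abs(decoder_ref - encoder_ref))
    return history


leaky = mismatch_after_loss(leak=0.875, pictures=3)
perfect = mismatch_after_loss(leak=1.0, pictures=3)
print(leaky, perfect)
```

In this model the mismatch shrinks by the leak factor every picture, whereas with perfect prediction it persists until an intra refresh arrives.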

Two documents conclude that leaky prediction provides superior subjective performance to that of intra slice, though further refinement is required to improve the image quality in stationary conditions. The performance of leaky prediction as a method of temporal localization of errors should be further tested.

2) Layered coding

A document discussed the design of an experiment to allow a fair comparison between the cell loss resilience performance of a layered coder to that of a single layer coder. Possible relationships between the two cases, when both layers of the layered coder were subject to cell loss, were presented.

5.4 Generic coding and media adaptation

The MPEG-2 standard provides for generic coding. Specific media may require specific adaptation, e.g. prioritised encoding of the MPEG bitstream. Transfer of an MPEG bitstream across different media should avoid coding and recoding of the bitstream. A system model is required to identify demarcation between generic coding and media specific adaptation.

The impact of the System Layer should be considered. For example, the audio and video signals may require different adaptation. Also the layers of a scalable coder may require separate virtual channels on B-ISDN.

5.5 Frequency scanning

It is noted that the frequency scanning method may have desirable cell loss resilience properties, because of the ordering of the DCT coefficients within the slice. The application of priority to different parts of the slice components is simple in this method. Further work should be done to investigate the error performance of frequency scanning methods.

5.6 Future work

The following are proposed as core experiments;

- cell loss resilience of frequency scanning method,
- evaluation of leaky prediction as a temporal error localization method, where sophisticated concealment techniques are used.

Results of the following work will be presented at the London meeting;

- structured packing versus slice resynchronization with concealment,
- refinement of leaky prediction method,
- performance of layered coding versus single layer coding under conditions of cell loss.

5.7 Request for information

Information is requested from the appropriate bodies concerning the following;

- error characteristics of satellite transmission,
- impact of Systems Layer on media adaptation,
- other media of interest.

6. Compatibility (I. Parke)

The relevant documents in this group were MPEG92/421,430,458,462,464,465,485,492,495,506,509.

The discussion focused on three topics: review of core experiment results, improvements to compatible prediction modes and interlace-to-interlace conversions.

The core experiment on comparison of prediction of prediction error and prediction from the base picture was reported in documents 92/430, 92/458 and 92/492. These showed that prediction from the base picture was more efficient, though in the case of field structured pictures the difference was small. The group decided to adopt the prediction from the base picture as the compatible prediction method.

Improvements to the compatible prediction modes are documented in 92/485, 92/495 and 92/506. The compatible prediction mode was improved by weighting with the normal TM2 prediction. The weighting used in 92/485 and 92/495 was a simple averaging. Document 92/506 described a more general structure with more weightings. The technique has been given the name 'spatial-temporal weighted prediction'. The spatial prediction is from MPEG-1 and the temporal prediction is the TM2 motion compensated prediction. There is a proposed core experiment to study suitable weightings.

The group agreed to unify the syntax for the compatible experiments. This has led to a proposal for a new set of macroblock type tables for when a macroblock is coded compatibly.
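
The idea of spatial-temporal weighted prediction can be sketched as a per-macroblock blend of the upconverted base-layer (spatial) prediction and the motion compensated (temporal) prediction. The candidate weight set, the sum-of-absolute-differences selection rule and the sample values below are hypothetical; the report only states that 92/485 and 92/495 used simple averaging and that 92/506 allowed more weightings.

```python
def weighted_prediction(spatial, temporal, weight):
    """Blend two predictions; weight is the share given to the spatial one."""
    return [weight * s + (1.0 - weight) * t for s, t in zip(spatial, temporal)]


def sad(a, b):
    """Sum of absolute differences between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_weight(source, spatial, temporal, candidates=(0.0, 0.5, 1.0)):
    """Pick the candidate weight with the smallest prediction error.

    The candidate set and the SAD criterion are assumptions for
    illustration, not the weightings studied in the core experiment.
    """
    return min(
        candidates,
        key=lambda w: sad(source, weighted_prediction(spatial, temporal, w)),
    )


# Hypothetical samples from one row of a macroblock.
source = [100, 104, 108, 112]
spatial = [96, 100, 104, 108]     # upconverted base-layer prediction
temporal = [104, 108, 112, 116]   # motion compensated prediction

averaged = weighted_prediction(spatial, temporal, 0.5)  # simple averaging
chosen = best_weight(source, spatial, temporal)
print(averaged, chosen)
```

With a small set of candidate weights, the encoder's choice can be signalled by a short code per macroblock.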
There is also a 2 bit weight code associated with compatible macroblocks.

There was discussion on other syntax issues such as pan vectors, windowing and upconversion tables. The group was unable to resolve these issues. An interim solution was to define a code word with 3 fields that indicate the compatible standard, the picture format and the subsampling ratios. The group will continue to work on improving this.

On interlace-to-interlace compatible coding, Columbia University presented results on motion compensated deinterlacing. They had tried using the low resolution vectors for this. Their conclusion was that this did not give acceptable results, and they were now looking to use the high resolution vectors.

A further technique on interlace-to-interlace conversion is described in 92/509. Here a 'spatial-temporal' technique is used without motion compensation. This gave better results than the existing technique documented in TM2. A core experiment has been proposed to compare the technique of 92/509 with that in TM2. Further solutions are also requested.

The group has much to do. The core experiments proposed are to improve the compatible prediction modes and to find a good solution for interlace-to-interlace compatibility.

7. Scalability (O. Poncin)

7.1 Status Report

Ten core experiments on scalability had been defined during the Rio meeting. They are contained in Appendix I of TM2. The status of each of them was reviewed at Tarrytown. Among those 10 core experiments:

- one (I.2) was completed and led to a new core experiment proposal (I.12),
- one (I.5) was not yet started,
- the remaining 8 core experiments are in progress; their results are expected by the London meeting.

The main topics which the scalability ad-hoc group is dealing with are;

- to extract an interlaced downsampled signal from an interlaced source,
- to reduce the drift effect in the lower resolution layers,
- to improve the coding efficiency in as many layers as possible,
- to adapt the amount and the accuracy of the transmitted information layer by layer,
- to define a rate control layer by layer,
- to compare block scanning coding and frequency scanning coding.

The prediction, scalability and compatibility ad-hoc groups met jointly on Wednesday morning (from 11 to 13). Two recommendations were issued by the scalability group.

The cell loss resilience, scalability and compatibility ad-hoc groups met jointly on Thursday morning (from 9 to 11). The frequency scanning technique was presented to ATM people. This technique was found promising for cell loss resilience; a new core experiment on that topic was decided. Other people expressed their wish to achieve the same goal by keeping the usual block scanning technique.

7.2 Recommendations

The scalability ad-hoc group recommends that MPEG/Video consider;

1) the proposal for core experiment I.11 concerning the comparison of several codec options which represent different methods of frequency domain scalable coding,
2) the proposal for core experiment I.12 concerning the use of adaptive inter-scale prediction in a situation where the lower layer rate is controlled.

The scalability ad-hoc group recommends that the MPEG/field-frame prediction ad-hoc group;

1) study the use of special prediction modes in scalable systems
2) clarify the level at which motion compensation modes will be specified or adapted.

7.3 Statement on convergence in scalability

The requirement for very high quality in all layers of a scalable system may conflict with the desire for low implementation complexity. This has led to the investigation of two classes of solutions. The possibility of a unified solution is still being considered.

8. Prediction (H. Watanabe)

8.1 Introduction

Many documents and video tapes were presented in the field/frame prediction adhoc group at the Tarrytown meeting. This report describes the results of discussions and conclusions of the adhoc group.

8.2 Document list related to the prediction group at the Tarrytown meeting

422  Watanabe   Meeting announcement, agenda and call for contributions
432  Yukitake   Simulation results on prediction and DCT mode coupling for S-FAMC
433  Yukitake   Simulation results on S-FAMC and Dual'
434  Noguchi    Results of some prediction experiments
437  Nagata     Results of Core Experiments on Simplified FAMC, Dual-Prime and
438  Nagata     Test Model Simplification
440  Takahashi  Simulation Results on Global Motion Compensation Core Experiment
441  Sugiyama   Results of Prediction Experiments
442  Sugiyama   Results of Leaky Prediction Experiments
448  Yagasaki   Simulation results on TM2 core experiment of prediction mode No.8
453  Nakajima   Results of prediction core experiments on TM2 (SFAMC,SVMC,DUAL')
456  Odaka      Simulation results on prediction core experiment No.3 (Dual-prime) in TM2
459  Kameyama   Comparison of Prediction Modes and Simplified Test Model
461  Yukitake   Clarification of Appendix H in TM2
463  Yu         Results of core experiments H2.1 and L7
467  Nishikawa  Core experiments: TM prediction methods and simulation results
471  CCITT/J    Coding efficiency of leaky prediction
493  Reibman    Leaky prediction: Eliminating the limit cycle
494  Reibman    Leaky prediction: Experiments results
499  Corset     Additional tests on FAMC compensation mode
505  Madec      Results on leaky prediction experiments
510  Watanabe   Intermediate report of the adhoc group on field/frame prediction experiment
511  Paik       A proposal for switching the coding mode (Intra/Inter) on a block basis
512  Paik       A proposal for specification of 8x8 motion vectors
518  Savatier   Simulation results on prediction modes
519  Savatier   Reference fields for forward prediction of p-fields
520  Savatier   Correction to test model
521  Savatier   Frame prediction in field-pictures
523  Wong       TM2 Errors
536  Koster     Tarrytown recommendations of test model editing

8.3 Allocation of the contribution

8.3.1 Core experiments

L.1  Simplified FAMC;             431, 432, 433, 434, 437, 459, 499
L.2  SVMC;                        453, 518
L.3  Dual';                       433, 456, 459, 467
L.4  Global MC;                   440
L.5  Leaky prediction 1;          442
L.6  Leaky prediction 2;          471, 493, 494, 505
L.7  Reverse order prediction;    463
L.8  Simplification of Test Model; 448, 459, 467
L.9  16x8 MB;                     441
L.10 Special prediction mode;     Annex 5.3
L.11 8x8 Motion vectors;          512
L.12 8x8 Intra/Inter;             511

8.3.2 Others

Document #422, 510, 519, 520, 521, 523

8.4 Circulated e-mails after 8/15

L.1 Simplified FAMC;              Koster(8/26), Savatier(8/26,8/28,9/1), Yukitake(9/1)
L.2 SVMC;                         Yukitake(8/21), Savatier(8/28)
L.8 Reverse order prediction;     Anastassiou (9/23)
L.9 Simplification of Test Model; Yagasaki(9/22)

8.5 Discussion

8.5.1 Special Predictions (Simplified FAMC, SVMC, Dual')

- Joint meeting with Low delay group (See Annex 5.1)
- Three candidates seem to have similar improvement, effective to constant vertical motion
- Difference from field/frame adaptive is not definitely significant by tape-viewing
- Purpose is recognized to supply high performance mode for low delay application
- Agreed to merge three into single mode
- Prepare new core experiment (L10, see Annex 5.3)
- We will choose one among five sub-candidates in L10 core experiments in terms of the subjective quality and hardware complexity in the future meeting. Simulation at M=1 has a priority. Test should be carried out for field-picture and frame-picture.

8.5.2 Global MC
- Tape demonstration by Matsushita
- 0.2 to 0.3 dB improvement over I-P structure in I-picture
- Needs more experimenters

8.5.3 Leaky prediction 1
- Tape demonstration by JVC
- The role of this prediction is that of a loop filter, which differs from the error resilience purpose
- Needs more experimenters

8.5.4 Leaky prediction 2
- Tape demonstration by AT&T, CCETT
- Limit cycle problem is solved by the AT&T proposal
- Intra slice has an advantage over leaky prediction in terms of coding gain
- Delay of leaky prediction and of intra slice has not been compared
- Ask ATM group how to decide whether to include it in TM2

8.5.5 Reverse order prediction
- Tape demonstration by Columbia Univ.
- Field coding with Dual gives sufficient quality
- Dual can be replaced by SVMC or Dual'
- Needs more experimenters

8.5.6 TM Simplification
- Tape demonstration by SONY
- Coupling of MC mode and DCT mode gives simplicity
- Information to define "decoder's level" (TM2, p. 124) according to the hardware complexity

8.5.7 16x8 MB
- Tape demonstration by JVC
- Needs unified syntax in TM2
- See Annex 5.2

8.5.8 Specification of 8x8 motion vectors
- New core experiment (L11) (MPEG92/512)
- Decision rule: first 16x16 Inter or 8x8 Inter, next Inter or Intra, both a posteriori decisions
- A posteriori decision needs more discussion at the London meeting
- Necessary information for field coding will be provided by Mr. Paik (wpaik@gi.com)

8.5.9 8x8 Inter/Intra decision
- New core experiment (L12) (MPEG92/511)
- Current intra macroblock can be partly inter-coded
- Intra_block_pattern is used to specify intra blocks in a macroblock at the macroblock layer

8.5.10 Answer to TM2 editorial group
- See Annex 5.2

8.5.11 Joint meeting with Compatibility and Frequency Scalability group

1) Prediction and compatibility
- Compatibility, Spatial Scalability group may use either field-picture or frame-picture
- 16x16 / 16x8 motion compensation block is desirable for field coding
- Field/frame adaptive prediction can be used for the frame-picture case
- Needs complete syntax for 16x8 macroblock in TM2

2) Prediction and frequency scalability
- Frequency scalability group uses frame-picture and field/frame adaptive prediction
- New high performance prediction needs to be checked as to whether it fits several layers or not
- The level of the layer that determines prediction modes should be notified to the scalability group
- Current level is macroblock layer

8.6 Summary
- Core Exp. L1, 2, 3 (FAMC, SVMC, Dual') should be merged into a new core experiment (L10).
- Core Exp. L4, 5, 7 (Global MC, Leaky prediction 1, Reverse order prediction) need another experimenter.
- Core Exp. L11, L12 (8x8 motion vector, 8x8 intra/inter) are new core experiments.
- Core Exp. L6 (Leaky prediction 2) should be treated in ATM group.
- We fixed some ambiguity of TM, but still need more elegant syntax.
- For Core Exp. L10, members are requested to consider matching with scalability and compatibility.

8.7 Recommendations of Frame/Field prediction adhoc group
- Core Exp. L1, 2, 3 (FAMC, SVMC, Dual') should be merged into a new core experiment (L10). The adhoc group will choose one among five sub-candidates in the L10 core experiment in terms of subjective quality and hardware complexity at a future meeting. Simulation at M=1 has priority. Tests should be carried out for field-picture and frame-picture. Members are requested to consider matching with scalability and compatibility.
- Core Exp. L4, 5, 7 (Global MC, Leaky prediction 1, Reverse order prediction) need another experimenter.
- Core Exp. L11, L12 (8x8 motion vector, 8x8 intra/inter) are identified as new core experiments.
- Core Exp.
L6 (Leaky prediction 2) should be treated in ATM group.
- The adhoc group will fix ambiguities of the description for prediction in TM.
- No more core experiments will be accepted after the London meeting.

The current core experiments are;
L.4  Global MC;
L.5  Leaky prediction 1;
L.6  Leaky prediction 2;
L.7  Reverse order prediction;
L.8  Simplification of Test Model;
L.9  16x8 MB;
L.10 Special prediction mode;
L.11 8x8 Motion vectors;
L.12 8x8 Intra/Inter;

9. Quantization (G. Eude)

The Ad-hoc group on quantisation had several sessions during its meeting in Tarrytown and 18 documents were presented. The results of the different core experiments were first reported and then compared in order to propose modified or new core experiments for the London meeting. Accompanying video tapes were viewed.

The group also discussed how to decide whether to include or discard new techniques. One proposal was that a new technique should satisfy at least one of the following features;
- visibly better quality,
- bit saving around 10%,
- less hardware,
- syntax for generic standard.
The discussions during the meeting showed that not many proposals meet this rule(!). Further consideration seems needed.

The technical topics were as follows:

1) New proposal "vector quantisation"
In Doc MPEG/92/525, a "vector quantisation" method is proposed to transmit the transformed coefficients. A vector is chosen to best fit the quantized coefficients with a "cost function". The coefficients which belong to the selected vector are 1-D variable length encoded. It has been pointed out that the matching method could be adapted to perceptual criteria. A new core experiment has been defined. As this technique implies dropping of coefficients, TM2 will be compared by using the same dropping.

2) VLC proposal
The effectiveness of an alternative VLC for the INTRA coded pictures has been demonstrated - with MQUANT less than 8, the bit saving is about 10% for the INTRA coded pictures (MPEG/92/452).
A second new proposal (MPEG/92/427), correlated to the "vector quantisation" coefficients coding, consists in using switched 1-D VLCs at the slice level. A core experiment will be defined. Modified UVLC, as proposed in MPEG/92/504, also has high efficiency in I pictures. The other advantages of this technique seem to be: frequency scalability, SNR scalability and self-adaptability. A higher hardware cost has to be checked by the implementation group. In MPEG/92/450 it is concluded that CBP and "first coefficient trick" are not necessary (the gain is less than 0.1 dB for all the sequences).

3) Range extension and precision
- INTRA DC: it has been agreed that some applications need to increase the precision of the intra DC to 9 bits. Requirement for more than 9-bit precision seems related to more than 8-bit input signals (papers for discussion are requested).
- transmitted coefficients (MPEG/92/449): the results presented show that the range of +/-256 is exceeded on all sequences with MQUANT=1 or 2. Effect on bitrate is very small (MPEG/92/435). Current TM2 solution was accepted to fix this issue.
- MQUANT: the ad-hoc group reached a consensus to recommend an alternative 5-bit nonlinear law for the control of quantization stepsize (as described in MPEG/92/508). The law is fixed, simple in definition and selected at the picture layer (hardware impact should be checked).

4) Scanning (MPEG/92/435, 436, 460 & 480)
Considering the improvements obtained by adaptively scanning according to DCT mode (max in SNR +0.3 dB which was impossible to see on the pictures), the ad-hoc group recommended dropping this mode from the TM. Zigzag/vertical scan adaptation experiments showed some improvements, particularly for field-based coding. It has been concluded that this mode needs a suitable "a priori" decision criterion and strong support to be considered further.
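
As an illustration of the zigzag/vertical scan comparison in item 4), the following is a minimal sketch and not part of TM2. The zigzag order is the conventional 8x8 one; this report does not define the "vertical scan", so the column-by-column order used here, the helper names and the sample block are all assumptions made for illustration only.

```python
N = 8  # DCT block size

def zigzag_order(n=N):
    """Return (row, col) pairs in the conventional zigzag order."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + col == s, top-right first.
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are traversed bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def vertical_order(n=N):
    """Hypothetical vertical scan: column by column, top to bottom (assumption)."""
    return [(r, c) for c in range(n) for r in range(n)]

def scan_length(block, order):
    """Number of positions scanned up to and including the last non-zero coefficient."""
    last = 0
    for i, (r, c) in enumerate(order, start=1):
        if block[r][c] != 0:
            last = i
    return last

ZIGZAG = zigzag_order()
VERTICAL = vertical_order()

# Illustrative block (assumption): non-zero coefficients only in the first column,
# i.e. energy concentrated in vertical frequencies.
block = [[0] * N for _ in range(N)]
for r, value in enumerate([40, 12, 7, 3]):
    block[r][0] = value

print("zigzag scan length:  ", scan_length(block, ZIGZAG))    # 10
print("vertical scan length:", scan_length(block, VERTICAL))  # 4
```

For this sample block the zigzag scan runs 10 positions before reaching the last non-zero coefficient, while the assumed vertical scan needs 4. A block with mainly horizontal detail would favour the other order, which is consistent with the need for a decision criterion noted above.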

5) Non 8x8 DCT block coding
- DCT blocksize adaptation: the results obtained by using adaptive 8x8/8x1 DCT selection give up to 0.5 dB improvement (MPEG/92/435, 444 and 445). It has been agreed that 8x4 DCT adaptation, which does not work as well as 8x1 DCT, will not be considered further. Comparisons with non-DCT modes are needed on new sequences including vertical and horizontal scrolling text (as can be seen at the end of movies).
- Non transform coding: the two proposals NTC1 and NTC2 (MPEG/92/451 & 478) gave improvement in SNR and in subjective picture quality as well (in particular in moving text). The main issues are the prediction method and the quantisation (linear or not). A refinement of the existing core experiment was defined in order to arrive at a single compromise solution. For this core experiment new sequences including text will be used.
In order to be sure that "new tools" are really needed, the performance must be compared with the use of MQUANT (and/or BQUANT) control based on the same "edge-detection" criteria.

6) Quantization
BQUANT: Results do not show improvement by using BQUANT and more results are requested.
Weighting matrix: there was no result on this topic. New matrix descriptions and selection criterion will be provided by AT&T (new inputs are needed to keep this option in TM).

7) List of "new" or "modified" core experiments
- vector quantization: TCE, CNET, AT&T, GI, PHILIPS, Matsushita
- VLC adaptation on MB basis: AT&T, GI, Sony, TCE
- modified UVLC: HHI, Belgacom, UCL, Siemens
- non-transform coding: Sony, Matsushita, AT&T, JVC, University Hannover
- MQUANT control: GCT, Sony, AT&T, University Hannover, CNET.

END

Participants of the eighth meeting of Experts Group for ATM Video Coding
(28 September - 1 October 1992, Tarrytown)

Australia
J. Arnold            University of New South Wales
S. Dunstan           Siemens Ltd                     (CM)
T. Sikora            Monash University

Belgium
Mr. O. Poncin        Belgacom                        CM

USA
Mr. B.G. Haskell     AT&T Bell Labs
Mr. S. Kumar         Wiltel
Mr.
A. Luthra            Tektronix
Mr. D. Hein          VideoTelecom
Mr. D. Raychaudhuri  David Sarnoff
Mr. R. Schaphorst    DIS                             CM
Mr. A. Tabatabai     Bellcore                        CM
Mr. X. Yuan          PictureTel

France
Mr. G. Eude          CNET
Mr. J. Guichard      CNET                            CM

Japan
Mr. K. Asai          Mitsubishi
Mr. W. Fujikawa      Matsushita
Mr. Y. Nakajima      KDD                             (CM)
Mr. S. Nogaki        NEC
Mr. S. Okubo         NTT                             Chairman
Mr. K. Sakai         Fujitsu
Mr. H. Watanabe      NTT                             (CM)
Mr. T. Yukitake      Matsushita Communication

Norway
Mr. H. Sandgrind     NTA                             CM

Netherlands
Mr. A. Koster        PTT Research                    (CM)

UK
Mr. I. Parke         BT                              (CM)

CM: Coordinating Member
(CM): Substitute for CM