Annex 5 to Doc. AVC-355R

Frame/field prediction Adhoc Group

Annex 5.1 Frame/Field Prediction, Low Delay Adhoc Group Joint Meeting Report

Title: Discussion on Core Experiments L1, L2, L3, H1

Several prediction techniques that accommodate interlace structure are proposed as core experiments L1, L2 and L3. These are (1) FAMC, (2) SVMC and (3) Dual'. They are also listed as a special mode to improve picture quality in the low delay core experiment (H1). Simulation results for these prediction modes were demonstrated, and the techniques were recognized as having the same level of performance in terms of SNR and of picture quality judged by tape viewing.

1. Tape demonstration

   TI           FAMC, Dual', TM simplification
   Toshiba      FAMC, Dual'
   Sharp        FAMC
   Mitsubishi   Dual'
   LEP          FAMC
   KDD          SVMC, FAMC, Dual'
   Matsushita   FAMC
   TCE          SVMC
   GCT          FAMC

2. Discussion

1) Purpose: The special prediction mode is intended mainly for the low delay application, to improve picture quality especially at M=1. It is theoretically effective when a sequence has slow vertical motion with a constant velocity.

2) Necessity: The majority of the group could distinguish the difference in picture quality between the special prediction modes and field/frame adaptive prediction. However, some members could not recognize significant differences. Thus, agreement on the necessity of this mode is not unanimous.

3. Conclusion

1) Evaluation: The differences in coding efficiency between FAMC, SVMC and Dual' are very small. No agreement was reached on selecting the best of the three.

2) Action towards the London meeting: The two adhoc groups agreed to merge the three candidates (FAMC, SVMC, Dual') into a new single core experiment that combines their good features. The decision on adoption into the TM syntax will be based on the recognition of its significance at the London meeting.

Annex 5.2 Resolution of the issues raised by Test model editing group and MPEG92/523

1.
Inconsistent use of horizontal_f and forward_f_code

"forward_f_code" and "backward_f_code" indicate "horizontal_f"; they are therefore interpreted as "forward_f_hori_code" and "backward_f_hori_code". "forward_f_vert_code" and "backward_f_vert_code" should exist in the picture layer after "extension start code". The maximum number of f_codes is 4. (andria@nyquist.bellcore.com)

2. The motion estimation and compensation range for field-pictures

Per field, the horizontal motion compensation range is half that of frame-pictures. The vertical range is a quarter of the value given on page 21 of TM2.

3. The definition of the search pseudo c-code in SVMC

Modifications are supplied by Mr. Nakajima (KDD, nakajima@spg.elb.kddlabs.co.jp).

4. Clarification of the use of SVMC and FAMC in field pictures

Modifications are supplied by Mr. Nakajima (KDD, nakajima@spg.elb.kddlabs.co.jp) and Mr. Yukitake (Matsushita, yukiatke@adl.mci.mei.co.jp).

5. Clarification of the use of 16x8 field macroblock

We use "field_motion_type" in the macroblock layer to specify the 16x8 motion compensation block. If "field_motion_type" is "10", 16x8 motion compensation is used instead of SFAMC.

   "field_motion_type" code   prediction type   motion_vector_count   mv_format
   10                         16x8              2                     field

"sub_MB_type" should be added just after "backward_reference_field". This is a 1-bit flag, uimsbf.

   sub_MB_type=="0" indicates the condition written in Core Exp. L9 3(1).
   sub_MB_type=="1" indicates the condition written in Core Exp. L9 3(2).

6. Default values of "forward_reference_fields" and "backward_reference_fields"

Default values are set to "11" in both cases, which means that Field 1 and Field 2 are both used for prediction.

7. Syntax on progressive material

Frame structure and frame prediction may be used for progressive material. However, this does not prohibit the use of frame/field adaptive prediction or field structure.

8. "temporal_reference" is needed for the decoder to obtain the frame/field distance. Its unit is "Frame".

9.
Ambiguity in the definition of the noMC mode in field-pictures (P-field pictures only)

If two previous fields are allowed as reference fields, the noMC mode does not specify which of the reference fields is to be used for the prediction. Thus, the noMC mode shall refer to the reference field of the same parity as the target field. If a (0,0) MV is to be used with the reference field of the opposite parity, the noMC mode cannot be used. The (0,0) MV must be explicitly transmitted (after the appropriate selection bit).

Annex 5.3 Special Prediction Modes (Ver. 2)

1. Definitions

SVMC3 is SVMC without FAMC. The latest document is used (TM-2 Erratum), with the modifications and corrections described in this document.

2. Temporal Scaling of the Motion Vector

Scaling of the motion vector is done in the same manner for all the special prediction modes (FAMC, SVMC3 and DUAL-PRIME).

The transmitted motion vector (x, y) corresponds to a prediction from the same-parity field. The horizontal coordinate is in 1/2-pel units. The vertical coordinate is in 1/2-pel units or 1/4-pel units (depending on the mode and on the experiment performed).

If the same parity reference frame is at a distance of 2*k fields from the predicted field, the coordinates (x', y') of the "scaled-motion-vector" used for accessing the different-parity field are computed as follows:

   x' = (x * m * K) // 32          (x and x' are integers)
   y' = ((y * m * K) // 32) + e    (y and y' are integers)
   K  = 16 // k                    (k is integer)
   m  = field-distance between the predicted field and the
        different-parity field (m is integer and can be negative)

The "e" is an adjustment necessary to reflect the vertical shift between the lines of field 1 and field 2. For example, line 1 of field 2 is in fact located 1/2 line under line 1 of field 1.
If the vertical unit is 1/4-pel, "e" is defined as follows:

   e = -2 if the reference field corresponding to the scaled vector is field 2
   e = +2 if the reference field corresponding to the scaled vector is field 1

If the vertical unit is 1/2-pel, "e" is defined as follows:

   e = -1 if the reference field corresponding to the scaled vector is field 2
   e = +1 if the reference field corresponding to the scaled vector is field 1

3. Reference Fields for SVMC3 and DUAL-PRIME

The reference fields used for SVMC3 and DUAL-PRIME are not always contiguous in time. Those modes can now be used in all cases of Field-structure P-Pictures.

When SVMC3 or DUAL-PRIME is used in the second P-Field of a P-Picture, the first P-Field is used as a reference (different-parity) field. SVMC3 and DUAL-PRIME can be used with reversed order prediction of P-Fields (in this case, m = -1).

4. Decision for Field-based Prediction

In order to take advantage of the various special prediction modes, the decision rule must be modified for Field-based prediction. It has been noted by various members that quality is improved by choosing Field-based prediction less often, to the benefit of another special prediction mode, particularly in B-Pictures. For example, even in cases where Field-based prediction has an MSE slightly better than any of the other prediction modes, it may cost a significant overhead to transmit two field-vectors (four in B-Frames).

Until further improvement, we propose to use the following decision rule in core experiments involving one of the special prediction modes:

- Field-based prediction is chosen
     if MSE_field + 8 < MSE_best_of_other_modes   in B-pictures
     if MSE_field < MSE_best_of_other_modes       in P-pictures

  where MSE = Mean Square Error PER PEL of predicted MB

5. Concise Specification of SVMC3

The transmitted motion vector is scaled with the specified rule to obtain motion vectors originating from each of the reference fields and pointing to the predicted field.
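As an illustration of the scaling rule of section 2, a minimal sketch in Python is given here. It assumes that "//" denotes integer division with rounding to the nearest integer, half values being rounded away from zero; this is the usual convention of MPEG video documents but is not restated in this annex. The names div_round, scale_vector, ref_is_field2 and quarter_pel_vertical are hypothetical and are not part of TM-2.

```python
def div_round(a: int, b: int) -> int:
    """The document's "a // b": nearest integer, halves away from zero (assumed)."""
    if b <= 0:
        raise ValueError("divisor must be positive")
    q, r = divmod(abs(a), b)
    if 2 * r >= b:          # remainder of one half or more rounds the magnitude up
        q += 1
    return q if a >= 0 else -q


def scale_vector(x: int, y: int, k: int, m: int,
                 ref_is_field2: bool, quarter_pel_vertical: bool) -> tuple[int, int]:
    """Scale the transmitted same-parity vector (x, y) to the different-parity field.

    k: the same-parity reference lies 2*k fields from the predicted field.
    m: field distance to the different-parity field (may be negative).
    ref_is_field2: True if the field addressed by the scaled vector is field 2.
    quarter_pel_vertical: True if y is in 1/4-pel units, False for 1/2-pel.
    """
    K = div_round(16, k)
    step = 2 if quarter_pel_vertical else 1   # half a line in the vertical unit
    e = -step if ref_is_field2 else step
    return div_round(x * m * K, 32), div_round(y * m * K, 32) + e
```

With k = 1 and m = 1 the vector is halved before "e" is added, since the different-parity field is then at half the temporal distance of the same-parity field.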
When the reference field and the predicted field are of the same parity, the motion vector is used directly (no scaling is necessary).

5.1 Forward Prediction

5.1.1 Forward Prediction of the pels of Field 1 (16Hx8V)

The coordinates (x'1, y'1) of the scaled motion vector are computed as specified, with m = m1.

A 16Hx8V prediction block is obtained from reference field 1 with the vector originating from this field. Vertical interpolation is 1/4-pel linear interpolation. Horizontal interpolation is 1/2-pel as usual. As in the "usual" case, horizontal and vertical interpolation are done in a single step, involving only one division (in this case by 8). An example is given in Figure 1 and Figure 2.

A 16Hx8V prediction block is obtained from reference field 2 with the vector originating from this field. Vertical interpolation is 1/4-pel linear interpolation. Horizontal interpolation is 1/2-pel as usual.

The selection of the prediction is done according to the SVMC3 type:

- Near-field: The prediction block used is the one corresponding to the reference field closest to the predicted field (in time axis).
- Same-parity: The prediction block used is the one corresponding to the reference field of the same parity as the predicted field.
- Dual: The prediction block used is obtained by averaging the two prediction blocks from field 1 and field 2. The averaging is done as in "Interpolation-mode" in B-Pictures.

5.1.2 Prediction of the pels of Field 2 (16Hx8V)

The coordinates (x'2, y'2) of the scaled motion vector are computed as specified, with m = m2. For the rest, the prediction is done as in 5.1.1.

5.2 Backward Prediction

The forward rule is simply transposed.

5.3 Averaged Prediction in B-Pictures

5.3.1 Averaged Prediction of the pels of Field 1 (16Hx8V)

The predictor blocks for forward prediction are computed as in sections 5.1.1 and 5.1.2. The predictor blocks for backward prediction are computed as in 5.2.
The selection of the prediction is done according to the SVMC3 type:

- Near-near: The prediction block used is obtained by averaging the prediction blocks from the closest forward and backward reference fields (in time axis).
- Same-near: The prediction block used is obtained by averaging the prediction block from the same parity forward reference field and the prediction block from the closest backward reference field (in time axis).
- Near-same: The prediction block used is obtained by averaging the prediction block from the same parity backward reference field and the prediction block from the closest forward reference field (in time axis).
- Same-same: The prediction block used is obtained by averaging the prediction block from the same parity forward reference field and the prediction block from the same parity backward reference field.

Note that the four SVMC3 averaged modes and the SVMC3 dual mode are extremely similar. Only the choice of the two reference fields differs. The averaging is always done as in "Interpolation-mode" in B-Pictures. The other SVMC3 modes are equivalent to field-based prediction with 1/4-pel vertical accuracy.

5.4. Chrominance

The motion vector used for chrominance is obtained from the luminance SVMC3 motion vector with precisely the same rule as in the case of field-based prediction (for 4:2:0: divide each coordinate by 2 as described in section 5.2.2.1. of TM-2). The rules of prediction are the same as for luminance.

5.5. Motion estimation of SVMC3

Search is done by a local refinement around several candidate motion vectors resulting from a first search. A candidate motion vector can be the result of a full-pel accuracy search. The local search covers 5Vx5H = 25 motion vectors and is done on reconstructed pictures. For each of those, all the candidate SVMC3 prediction blocks must be evaluated. For local search, the vertical step is 1/4-pel, and the horizontal step is 1/2-pel.
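The 5Vx5H local search can be sketched as follows in Python. This is an illustration only: it assumes that the 25 positions form a window centred on the candidate vector (offsets -2 to +2 in each direction), which the text does not state explicitly. Vectors are held as (horizontal in 1/2-pel units, vertical in 1/4-pel units), so one unit is one search step. The function names and the cost callable are hypothetical; cost stands for whatever prediction error measure the encoder uses, taken over all candidate SVMC3 prediction blocks for the vector.

```python
from typing import Callable, Iterable

Vector = tuple[int, int]  # (x in 1/2-pel units, y in 1/4-pel units)


def local_search_candidates(cx: int, cy: int) -> list[Vector]:
    """Return the 5V x 5H = 25 vectors around the candidate (cx, cy).

    Assumption: the window is centred on the candidate vector.
    """
    return [(cx + dx, cy + dy) for dy in range(-2, 3) for dx in range(-2, 3)]


def refine(starts: Iterable[Vector], cost: Callable[[Vector], float]) -> Vector:
    """Pick the lowest-cost vector among the local windows of all starting points."""
    pool = {v for s in starts for v in local_search_candidates(*s)}
    # Sorting makes the choice deterministic when several vectors tie.
    return min(sorted(pool), key=cost)
```

For example, refine([(6, 0)], cost) evaluates the 25 vectors from (4, -2) to (8, 2) and returns the one for which cost is smallest.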
In the case of Frame-Pictures, the candidate motion vectors used as starting points of the local search are:

- The Frame motion vector (result of Frame-based search).
- The Field motion vector (result of Field-based search) from the closest reference field to the predicted field of same parity. In this case, the vertical coordinate must be multiplied by two to have 1/4-pel vertical field accuracy.
     If forward: from field 2 to field 2
     If backward: from field 1 to field 1
- Optionally, the other field motion vectors (scaled appropriately) could be used as candidate motion vectors.

In the case of Field-Pictures, the candidate motion vectors used as starting points of the local search are the field motion vectors (result of Field-based search), scaled to the field-distance corresponding to same-parity. In this case, the vertical coordinate of the field vectors must be multiplied by two to have a candidate motion vector with 1/4-pel vertical field accuracy.

6. Concise Specification of DUAL-PRIME

In DUAL-PRIME prediction, a single motion vector like that of SVMC3, with 1/2 pixel precision, and one very small differential motion vector called DMV are transmitted per macroblock. To obtain motion vectors originating from each of the reference fields and pointing to the predicted field of the different parity, the transmitted motion vector is scaled and DMV is added as follows:

   x' = ((x * m * K) // 32) + dmv_horizontal        (x and x' are integers)
   y' = ((y * m * K) // 32) + dmv_vertical + e      (y and y' are integers)

The variables (x, y), (x', y'), K, m, and e have already been defined above, and the vertical unit is 1/2-pel. dmv_horizontal and dmv_vertical are the horizontal and vertical components of DMV with 1/2 pixel precision. These values are restricted to the range from -1 to +1. Note that the same DMV is used for the two scaled motion vectors in the frame picture, as illustrated in Figure 3.
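The DUAL-PRIME derivation of the different-parity vector can be sketched as follows in Python. As an assumption, "//" is taken to be integer division with rounding to the nearest integer, half values rounded away from zero, as is usual in MPEG video documents. The names div_round, dual_prime_vector and ref_is_field2 are hypothetical and serve only this illustration.

```python
def div_round(a: int, b: int) -> int:
    """The document's "a // b": nearest integer, halves away from zero (assumed)."""
    if b <= 0:
        raise ValueError("divisor must be positive")
    q, r = divmod(abs(a), b)
    if 2 * r >= b:
        q += 1
    return q if a >= 0 else -q


def dual_prime_vector(x: int, y: int, k: int, m: int,
                      ref_is_field2: bool, dmv: tuple[int, int]) -> tuple[int, int]:
    """Derive (x', y') for the different-parity reference field.

    (x, y): transmitted same-parity vector, both coordinates in 1/2-pel units.
    k, m:   temporal distances as defined for the scaling rule.
    dmv:    (dmv_horizontal, dmv_vertical), each restricted to -1, 0 or +1.
    """
    dmv_h, dmv_v = dmv
    if not (-1 <= dmv_h <= 1 and -1 <= dmv_v <= 1):
        raise ValueError("DMV components must lie in the range -1 to +1")
    K = div_round(16, k)
    e = -1 if ref_is_field2 else 1   # vertical unit is 1/2-pel in DUAL-PRIME
    return (div_round(x * m * K, 32) + dmv_h,
            div_round(y * m * K, 32) + dmv_v + e)
```

For instance, with (x, y) = (6, 4), k = 1, m = 1, a field 2 reference and DMV = (1, -1), the scaled part is (3, 2) and the resulting vector is (4, 0).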
When the reference field and the predicted field are of the same parity, the motion vector is used directly (no scaling is necessary, and the addition of DMV is not necessary).

6.1 Forward Prediction

6.1.1 Forward Prediction of the pels of Field 1 (16Hx8V)

The coordinates (x'1, y'1) of the scaled motion vector are computed as specified, with m = m1.

A 16Hx8V prediction block is obtained from reference field 1 with the vector originating from this field. Both horizontal and vertical interpolation are 1/2-pel linear interpolation, as usual for a field motion vector. As in the "usual" case, horizontal and vertical interpolation are done in a single step.

A 16Hx8V prediction block is obtained from reference field 2 with the vector originating from this field. Both vertical and horizontal interpolation are 1/2-pel linear interpolation as usual.

The prediction block used is obtained by averaging the two prediction blocks from field 1 and field 2. The averaging is done as in "Interpolation-mode" in B-Pictures.

6.1.2 Prediction of the pels of Field 2 (16Hx8V)

The coordinates (x'2, y'2) of the scaled motion vector are computed as specified, with m = m2. For the rest, the prediction is done as in 6.1.1.

6.2 Backward Prediction

The forward rule is simply transposed.

6.3 Prediction mode in B-Pictures

The averaging mode is inhibited in DUAL-PRIME. Only the forward/backward prediction is used in B-pictures.

6.4. Chrominance

From the DUAL-PRIME motion vector, four field motion vectors for luminance, from reference field 1/field 2 to predicted field 1/field 2, can be obtained. The corresponding four chrominance vectors are obtained with precisely the same rule as in the case of field-based prediction (for 4:2:0: divide each coordinate by 2 as described in section 5.2.2.1. of TM-2). The rules of prediction are the same as for luminance.

6.5. Motion estimation of DUAL-PRIME

The motion estimation of DUAL-PRIME is carried out by the following two steps.
The first step is to obtain four candidate motion vectors as follows. First, four field motion vectors with half-pel accuracy, from reference field 1/field 2 to predicted field 1/field 2, are searched by the normal field motion vector search method defined in TM2, except that original pictures are used in the half-pel refinement. Then, these vectors are appropriately scaled if the parity of the predicted field is opposite to that of the reference field.

The second step is to evaluate the prediction errors, using local decoded pictures, for the possible combinations of the four candidate motion vectors obtained in the first step and the 3Vx3H = 9 candidates for DMV, and to select the best combination of motion vector and DMV.

7. Core Experiment L-10

L-10.1. Frame + Field + DUAL-PRIME

L-10.2. Frame + Field + SVMC3

L-10.3. Frame + Field + SVMC3-1/2-pel

Same as L-10.2, except that all motion vectors involved have only 1/2-pel vertical accuracy. The motion vector transmitted is field-type, as in DUAL-PRIME. However, the rule for selecting the PMVs is the same as for SVMC3, i.e., the same as for Frame-prediction. Scaling of the PMV vertical coordinate is done as for field vectors.

L-10.4 Frame + Field + DUAL-PRIME + SVMC3

L-10.5 Frame + Field + DUAL-PRIME + (SVMC3 - dual)

The dual mode of SVMC3 is not used, since DUAL-PRIME may replace it advantageously. However, no significant hardware simplification is expected from implementing L-10.5 rather than L-10.4.

Among the modes in the above core experiment, mode selection is decided by MSE.

Annex 5.4 Selection Process of 8x8 motion vector for L-11

1. For each block within a macroblock, count the number of bits (n16) required for interframe coding using the 16x16 motion vector and the number of bits (n8) required for interframe coding using the 8x8 motion vector. "n8" should include the overhead required for the 8x8 motion vector. "n16" does not include the overhead for the 16x16 motion vector. Choose the 8x8 motion vector if "n8" is smaller than "n16".

2. Add all the bits ("n8" or "n16" depending on the decision) for all blocks in the macroblock, and add the overhead for the 16x16 motion vector. This number is "nDPCM". Compare this number (nDPCM) with the number of bits required for intraframe coding of the macroblock (nPCM). Choose interframe coding if "nDPCM" is smaller than "nPCM".
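
The two steps above can be sketched as follows in Python. This is an illustration under stated assumptions only: the per-block bit counts are supplied by the caller, and the function and parameter names (choose_block_vectors, choose_macroblock_mode, mv16_overhead, n_pcm) are hypothetical. Following the text literally, the 16x16 vector overhead is always added to nDPCM, and ties go to the 16x16 vector and to intraframe coding because the comparisons are strict.

```python
def choose_block_vectors(n8: list[int], n16: list[int]) -> list[str]:
    """Step 1: per block, choose the 8x8 vector only if n8 is smaller than n16.

    n8 already includes the 8x8 vector overhead; n16 excludes the 16x16 overhead.
    """
    if len(n8) != len(n16):
        raise ValueError("n8 and n16 must have one entry per block")
    return ["8x8" if a < b else "16x16" for a, b in zip(n8, n16)]


def choose_macroblock_mode(n8: list[int], n16: list[int],
                           mv16_overhead: int, n_pcm: int) -> tuple[str, int]:
    """Step 2: sum the chosen per-block bits, add the 16x16 vector overhead
    to obtain nDPCM, and choose interframe coding only if nDPCM < nPCM.

    Returns the mode ("inter" or "intra") together with nDPCM.
    """
    choices = choose_block_vectors(n8, n16)
    n_dpcm = mv16_overhead + sum(
        a if c == "8x8" else b for a, b, c in zip(n8, n16, choices)
    )
    return ("inter" if n_dpcm < n_pcm else "intra"), n_dpcm
```

For example, with n8 = [30, 50, 40, 45], n16 = [35, 45, 40, 60] and an overhead of 10 bits, blocks 1 and 4 use 8x8 vectors and nDPCM = 30 + 45 + 40 + 45 + 10 = 170.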