MPEG
MPEG stands for "Moving Picture Experts Group", a group established in
1988 to develop a standard for recording video and audio in
VHS quality on CDs (352 x 288 plus CD audio at 1.5 Mbit/s).
For picture coding, MPEG uses a method similar to JPEG,
but additionally introduces a bidirectional B-frame, which can draw
information both from previously decoded and from later I- or P-frames.
MPEG is also used in video recorders, so that films can be played
forwards, backwards, and with random access.
This page was copied from
http://www.rasip.fer.hr/research/compress/algorithms/adv/mpeg/index.html
MPEG
The purpose of this page is to give you a quick look
at the MPEG standard, and to give some directions to those who decide to dig
deeper into it.
MPEG is an acronym for Moving Picture Experts Group, a group formed jointly under ISO
(the International Organization for Standardization) and the IEC (the International
Electrotechnical Commission). MPEG was later given formal status within ISO/IEC.
The topics covered in the three parts of the MPEG standard are the coding of
video and audio, including the synchronization of audio and video bitstreams with
multiple interleaved video sequences.
These three parts of the MPEG standard are:
Part 1: System aspects
Part 2: Video compression
Part 3: Audio compression
There are different versions of MPEG, for example MPEG-1, MPEG-2, and MPEG-4.
The most important differences between them are data rate and applications.
MPEG-1 has data rates on the order of 1.5 Mbit/s, MPEG-2 on the order of 10 Mbit/s,
and MPEG-4 targets the lowest data rates, down to about 64 kbit/s.
A video stream is a sequence of individual
frames. Every frame is a still image; shown together, one after another, the
frames become a motion picture, usually at a rate close to 30 frames per second.
Frames are digitized in a standard RGB format, 24 bits per pixel (8 for red,
8 for green, 8 for blue).
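To see why compression is needed at all, the raw bitrate of such a stream can be sketched with a quick back-of-the-envelope calculation in Python, using the 352 x 288 resolution and frame rate mentioned above (the figures are illustrative):

```python
# Size of an uncompressed video stream at the MPEG-1 target resolution,
# 24-bit RGB, 30 frames per second.
width, height = 352, 288
bits_per_pixel = 24          # 8 bits each for R, G, B
frames_per_second = 30

bits_per_frame = width * height * bits_per_pixel
raw_bitrate = bits_per_frame * frames_per_second   # bits per second

print(bits_per_frame)        # 2433024 bits per frame
print(raw_bitrate / 1e6)     # ~73 Mbit/s, versus MPEG-1's ~1.5 Mbit/s target
```

The roughly 50:1 gap between the raw rate and the 1.5 Mbit/s target is what the techniques below have to close.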
The MPEG algorithm operates on images represented in the
YUV color space (Y Cr Cb). The YUV format also represents images with 24 bits per
pixel: 8 bits for the luminance information (Y alone provides a monochrome
picture) and 8 bits for each of the two chrominance components (U and V, which
together carry the equivalent of color hue and saturation in the
picture). YUV can be compressed more efficiently than RGB because the YUV
format is subsampled. All luminance information is retained, but the
chrominance information is subsampled 2:1 in both the horizontal and the vertical
direction. Thus there are effectively 2 bits per pixel for each of the U and V components.
This subsampling does not drastically affect quality because the eye is more
sensitive to luminance than to chrominance information. Subsampling is a lossy
step.
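The effect of this 2:1 chroma subsampling on storage can be sketched with a small Python calculation (the 352 x 288 frame size is just an example):

```python
# Effective bits per pixel after 2:1 chroma subsampling
# in both the horizontal and vertical directions.
width, height = 352, 288

y_samples = width * height                 # full-resolution luminance
u_samples = (width // 2) * (height // 2)   # chroma halved in each direction
v_samples = u_samples

total_bits = 8 * (y_samples + u_samples + v_samples)
bits_per_pixel = total_bits / (width * height)
print(bits_per_pixel)   # 12.0 -- half of the 24 bits/pixel needed for RGB
```

Each chroma plane keeps only a quarter of the samples (8 bits / 4 = 2 bits per pixel each), which is where the "2 bits per pixel" figure above comes from.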
Video Stream Data Hierarchy
Video sequence
A video sequence includes one or more
groups of pictures; it begins with a sequence header and ends with an
end-of-sequence code.
In MPEG, the video stream breaks the sequence of
images into a series of layers, each containing progressively finer groupings of samples.
These layers are:
Group Of Pictures
is a header and a series of one or more pictures.
Picture
is the primary coding unit of a video sequence. A picture consists of three
rectangular matrices representing luminance (Y) and two chrominance (Cb and Cr)
values. The Y matrix has an even number of rows and columns, and
the Cb and Cr matrices are one-half the size of the Y matrix in each direction
(horizontal and vertical).
Slice
An MPEG picture is composed of slices, where each slice is a sequence of macroblocks in
raster-scan order. Slices can stretch from one macroblock row to the next. This
slice structure allows great flexibility in error handling and in discovering
changes in coding parameters.
Macroblock
is the basic coding block of an MPEG picture. It consists of a 16x16 array of
luminance (Y) samples together with one 8x8 block of samples for each chrominance
component (Cb and Cr). The Y part consists of four 8x8 blocks of samples.
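A small sketch of the macroblock arithmetic just described (the 352 x 288 picture size is only an example):

```python
# One 4:2:0 macroblock: four 8x8 luminance blocks plus
# one 8x8 block for each of the two chrominance components.
y_blocks = 4
chroma_blocks = 2            # one Cb, one Cr
samples_per_block = 8 * 8

samples = (y_blocks + chroma_blocks) * samples_per_block
print(samples)               # 384 samples per macroblock

# Macroblocks in a 352x288 picture (both dimensions divisible by 16):
macroblocks = (352 // 16) * (288 // 16)
print(macroblocks)           # 22 * 18 = 396
```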
The most important property MPEG exploits is the
similarity between neighbouring pictures.
The MPEG standard defines three types of
pictures.
First are the Intra
Pictures (I-Pictures).
An Intra Picture is encoded as a single image,
using only information from that picture. Since an I-Picture uses only transform coding,
it provides a low compression rate; in most cases it needs approximately two bits per
coded pixel.
Image blocks have a great deal of spatial redundancy, so
MPEG tries to cut this huge amount of data.
The block is first transformed from the spatial
domain into a frequency domain using the Discrete
Cosine Transform (DCT). The DCT separates
the signal into independent frequency bands, and the resulting coefficients are
then quantized. You can imagine quantization as roughly ignoring lower-order bits
(just a little more complicated than that).
Quantization
is the only lossy part of the whole compression process other than subsampling.
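The DCT-plus-quantization step can be sketched in plain Python. This is a naive, illustrative implementation; real encoders use fast DCT algorithms and per-coefficient quantization matrices rather than the single uniform step size assumed here:

```python
import math

def dct_2d(block):
    """Naive 8x8 2-D DCT-II, as used for transform coding of intra blocks."""
    n = 8
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            cu = 1 / math.sqrt(2) if u == 0 else 1.0
            cv = 1 / math.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = 0.25 * cu * cv * s
    return out

def quantize(coeffs, step=16):
    """Uniform quantization: this is where information is thrown away."""
    return [[round(c / step) for c in row] for row in coeffs]

# A perfectly flat block: all energy ends up in the single DC coefficient.
flat = [[128] * 8 for _ in range(8)]
coeffs = quantize(dct_2d(flat))
print(coeffs[0][0])   # 64 -- the DC term; every AC coefficient quantizes to 0
```

A flat block compresses to a single number, which illustrates how the DCT concentrates the energy of smooth regions into a few low-frequency coefficients.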
Afterwards, the resulting data is run-length
encoded in a zig-zag
ordering to optimize compression. This zig-zag ordering produces longer runs of
zeros by taking advantage of the fact that there should be less high-frequency
information (more zeros as one zig-zags from the upper left corner towards the
lower right corner of the 8x8 block). The coefficient in the upper left corner of
the block, called the DC coefficient, is encoded relative to the DC coefficient
of the previous block (DPCM coding).
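The zig-zag scan and run-length step can be sketched as follows; `zigzag_indices` and `run_length_encode` are illustrative helper names, not part of any MPEG API:

```python
def zigzag_indices(n=8):
    """(row, col) pairs in the JPEG/MPEG zig-zag scan order: along
    anti-diagonals from the upper left towards the lower right,
    alternating direction on each diagonal."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_length_encode(values):
    """(run_of_zeros, value) pairs for each nonzero coefficient."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

# A quantized block with only a few low-frequency coefficients left:
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 64, 5, -3

scanned = [block[r][c] for r, c in zigzag_indices()]
print(scanned[:4])                 # [64, 5, -3, 0]
print(run_length_encode(scanned))  # [(0, 64), (0, 5), (0, -3)]
```

The scan places the three surviving coefficients first, so the remaining 61 zeros collapse into one long run, which is exactly the situation the run-length coder is designed for.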
Second are
the forward
Predicted Pictures (P-Pictures).
A P-Picture is a nonintra picture. A nonintra
picture uses information that is displaced in time: a P-Picture is coded with
reference to a previous image, which can be either an I- or a P-Picture. Typically,
the picture to be encoded is similar to the reference picture
except that it is shifted a little.
Most of the changes between the reference picture and the
picture to be encoded can be represented as translations of small picture regions,
so the technique used here is called motion-compensated
prediction.
Each macroblock in a P-Picture can be encoded
either as an I-macroblock or as a P-macroblock. An I-macroblock is encoded just
like a macroblock in an I-frame. A P-macroblock is encoded as a 16x16 area of
the past reference picture, plus an error term (the difference between the two
macroblocks).
A Motion
Vector is used to specify the 16x16
area of the reference frame. A motion vector of (0, 0) means that the 16x16 area is
in the same position as the macroblock being encoded; other motion vectors are
relative to that position. There is usually no perfect match for the macroblock in
the reference picture, so the closest match is searched for. The error term is
finally encoded using the DCT, quantization, and
run-length encoding.
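A minimal sketch of exhaustive block matching, assuming a tiny 4x4 "macroblock" and a small search window rather than the real 16x16 blocks and normative search ranges:

```python
def sad(ref, cur, dx, dy, bx, by, n=4):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference area displaced by the candidate vector (dx, dy)."""
    return sum(abs(ref[by + dy + y][bx + dx + x] - cur[by + y][bx + x])
               for y in range(n) for x in range(n))

def best_motion_vector(ref, cur, bx, by, search=2, n=4):
    """Exhaustively test every in-bounds vector in the search window
    and keep the one with the smallest SAD."""
    candidates = [(dx, dy)
                  for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)
                  if 0 <= by + dy and by + dy + n <= len(ref)
                  and 0 <= bx + dx and bx + dx + n <= len(ref[0])]
    return min(candidates, key=lambda v: sad(ref, cur, v[0], v[1], bx, by, n))

# Reference frame with a bright 4x4 patch; in the current frame the
# patch has moved one pixel right and one pixel down.
ref = [[0] * 12 for _ in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        ref[y][x] = 200
cur = [[0] * 12 for _ in range(12)]
for y in range(5, 9):
    for x in range(5, 9):
        cur[y][x] = 200

print(best_motion_vector(ref, cur, bx=5, by=5))   # (-1, -1)
```

The vector (-1, -1) points from the block being encoded back to where the patch sat in the reference frame, leaving an all-zero error term for the transform coder.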
Third are Bi-directionally
Predicted Pictures (B-Pictures).
A B-Picture is also a nonintra picture. It is
encoded relative to the past reference picture, the future reference picture,
or both. The future reference picture is the closest following
reference picture (I or P). The encoding of B-Pictures is similar to that of P-Pictures,
except that motion vectors may also refer to areas in the future reference picture.
For macroblocks that use both past and future reference pictures, the two 16x16
areas are averaged.
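The averaging of the two matched areas can be sketched on a toy 2x2 "macroblock"; real B-macroblocks are 16x16, and the rounded integer average used here is one common convention:

```python
# Bidirectional prediction: average the matched area from the past
# reference with the matched area from the future reference, then
# transform-code only the (usually small) residual.
past   = [[100, 104], [96, 100]]   # matched area in the past reference
future = [[104, 108], [100, 104]]  # matched area in the future reference
actual = [[102, 106], [98, 103]]   # block being encoded (toy 2x2)

prediction = [[(p + f + 1) // 2 for p, f in zip(pr, fr)]
              for pr, fr in zip(past, future)]
residual = [[a - p for a, p in zip(ar, pr)]
            for ar, pr in zip(actual, prediction)]
print(prediction)   # [[102, 106], [98, 102]]
print(residual)     # [[0, 0], [0, 1]]
```

Because the averaged prediction sits between the two references, the residual is nearly zero, which is why B-Pictures achieve the highest compression of the three picture types.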