MPEG-2
MPEG (Moving Picture Experts Group)
- 1988 working group within ISO
- borrows significantly from JPEG (lossy)
- encoder is algorithmic and adaptive
- decoder is "dumb"
- perfect for broadcasting (few encoders vs. many decoders)
- MPEG defines the bitstream - not encoders/decoders
Pathway to compression
- spatial and temporal redundancy
- pixel values are not independent
- correlated to neighbors in the same frame
- correlated to neighbors across frames
- psychovisual redundancy
- human eye has a limited response to fine spatial detail
- human eye less sensitive to detail around object edges
Spatial redundancy
- transform to the frequency domain
- Fourier analysis - any periodic waveform can be reproduced by
adding together an arbitrary number of harmonically related
sinusoids of various amplitudes and phases
- does not by itself result in compression (usually increases the amount of data)
- samples are not periodic
STFT (short-time Fourier Transform)
- break up the continuous time domain with windows
- rectangular
- # of frequencies depends on size of window
- wrap samples into ring to make "continuous"
- DFT (discrete Fourier transform)
- we want # input samples == # of frequency coefficients
- FFT is a fast way to compute DFT
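A minimal sketch of the DFT point above, in plain Python: a naive O(N^2) transform that turns N input samples into N complex coefficients. The function name dft and the test signal are illustrative assumptions, and an FFT would compute the same values faster.

```python
import cmath
import math


def dft(samples):
    """Naive DFT: N input samples -> N complex frequency coefficients."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# One window holding exactly one cycle of a cosine, sampled 8 times.
window = [math.cos(2 * math.pi * t / 8) for t in range(8)]
coeffs = dft(window)

# Only bin 1 and its mirror image, bin 7, carry energy.
magnitudes = [round(abs(c), 6) for c in coeffs]
print(magnitudes)
```

Because the window holds a whole number of cycles, wrapping it into a ring is seamless and the energy lands in a single frequency pair.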
DCT (Discrete Cosine Transform)
- special case of DFT where the sine components are eliminated
- repeat original samples in time-reversed order
- perform DFT (8x8 block of pixels)
- produces as many useful coefficients as input samples
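The mirroring construction can be checked numerically. The sketch below works on a single row of 8 samples rather than a full 8x8 block, uses an unnormalized DCT-II, and all function names and sample values are illustrative assumptions. It appends the time-reversed samples, takes a DFT, removes a half-sample phase shift, and compares the result with a directly computed cosine transform.

```python
import cmath
import math


def dft(samples):
    """Naive DFT, enough for a 16-point demonstration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dct_via_mirrored_dft(samples):
    """Unnormalized DCT-II obtained from the DFT of the mirrored samples."""
    n = len(samples)
    mirrored = list(samples) + list(reversed(samples))  # length 2N, symmetric
    spectrum = dft(mirrored)
    result = []
    for k in range(n):
        # Undo the half-sample offset; what remains is purely real (cosines only).
        shifted = spectrum[k] * cmath.exp(-1j * math.pi * k / (2 * n))
        result.append(shifted.real / 2)
    return result


def dct_direct(samples):
    """Unnormalized DCT-II computed straight from its cosine definition."""
    n = len(samples)
    return [
        sum(samples[t] * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n))
        for k in range(n)
    ]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel values
via_dft = dct_via_mirrored_dft(row)
direct = dct_direct(row)
print([round(v, 3) for v in via_dft])
```

The 16-point DFT yields only 8 distinct useful values, so the count of coefficients matches the count of input samples.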
Compression?
- 64 pixels -> 64 coefficients
- but
- most coefficients will be 0 or close to 0
- statistically, the further from the top-left, the
smaller the magnitude
- compression
- run-length coding
- Huffman coding gain
- quantization of coefficient wordlengths
- amount weighted according to visibility by a human observer
- not reversible in the decoder (lossy)
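A rough sketch of the quantize, scan, and run-length steps on one 8x8 coefficient block. The step-size rule, the coefficient values, and the "EOB" end-of-block marker are assumptions made for illustration and are not the MPEG-2 default matrices or syntax. The Huffman stage that would follow is left out.

```python
def step_size(r, c):
    # Hypothetical weighting: coarser steps further from the top-left corner.
    return 8 + 4 * (r + c)


def quantize(block):
    return [[round(block[r][c] / step_size(r, c)) for c in range(8)] for r in range(8)]


def dequantize(levels):
    return [[levels[r][c] * step_size(r, c) for c in range(8)] for r in range(8)]


def zigzag_order(n=8):
    """(row, col) pairs ordered along anti-diagonals, alternating direction."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_length(scan):
    """Encode as (zeros skipped, nonzero value) pairs plus an end-of-block marker."""
    pairs, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")  # trailing zeros are implied by the marker
    return pairs


# Hypothetical DCT output: energy concentrated near the top-left.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 624, -45, 31
block[1][1], block[2][0], block[0][3] = 12, -9, -40
block[3][3], block[7][7] = 5, 3

levels = quantize(block)
scan = [levels[r][c] for r, c in zigzag_order()]
pairs = run_length(scan)
restored = dequantize(levels)
print(pairs)
print(block[0][1], "->", restored[0][1])
```

The 64 coefficients shrink to six pairs and a marker, and the restored value differs from the original, which is the lossy part.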
Sequences
- try to predict the next picture from a previous picture
- Send a difference picture
- difference between coded picture and next picture
- original picture not available at the decoder
- should also contain spatial redundancy
- encode the difference picture before sending
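The reason for differencing against the coded picture can be shown with a toy model. In this sketch a "picture" is a short list of pixel values and the lossy spatial coder is replaced by a plain quantizer with a hypothetical step of 4. The encoder rebuilds what the decoder will see and predicts from that, so the error does not accumulate from picture to picture.

```python
STEP = 4  # hypothetical quantizer step standing in for the lossy spatial coder


def lossy_code(values):
    return [round(v / STEP) for v in values]


def lossy_decode(levels):
    return [level * STEP for level in levels]


def encode_sequence(pictures):
    """Code each picture as a difference from the previously *decoded* picture."""
    reference = [0] * len(pictures[0])  # all-zero start stands in for an intra picture
    coded = []
    for picture in pictures:
        difference = [p - r for p, r in zip(picture, reference)]
        levels = lossy_code(difference)
        coded.append(levels)
        # Mirror the decoder so both sides hold the same reference.
        reference = [r + d for r, d in zip(reference, lossy_decode(levels))]
    return coded


def decode_sequence(coded):
    reference = [0] * len(coded[0])
    decoded = []
    for levels in coded:
        reference = [r + d for r, d in zip(reference, lossy_decode(levels))]
        decoded.append(reference)
    return decoded


pictures = [[10, 20, 30, 40], [11, 21, 33, 47], [13, 25, 37, 55]]
decoded = decode_sequence(encode_sequence(pictures))
worst_error = max(
    abs(o - d) for orig, dec in zip(pictures, decoded) for o, d in zip(orig, dec)
)
print(decoded, worst_error)
```

The worst-case error stays within half a quantizer step for every picture in the sequence.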
Motion-compensated inter-frame prediction
- divide screen into areas called macroblocks (16x16)
- each macroblock steered by a motion vector
- vector gives offset in another picture to find pixel data
- fetch from another picture
- motion vector overhead can account for 1/3 of bitrate in
a "high-action" sequence
- practical search ranges between +/-15 and +/-32 pixels
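One common way to pick a motion vector is an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The notes do not say which search the encoder uses, so treat this as an assumed illustration. The demo uses an 8x8 block and a +/-4 search on a synthetic 24x24 frame to keep it quick, while the defaults reflect the 16x16 macroblock and +/-15 range mentioned above.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a current block and a reference block."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def find_motion_vector(cur, ref, cx, cy, size=16, search=15):
    """Exhaustive search; returns ((dx, dy), cost) for the best match in ref."""
    height, width = len(ref), len(ref[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference picture
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic reference picture, then the same content moved 2 right and 1 down.
ref = [[(7 * x + 13 * y + x * y) % 251 for x in range(24)] for y in range(24)]
cur = [
    [ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(24)]
    for y in range(24)
]

vector, cost = find_motion_vector(cur, ref, cx=8, cy=8, size=8, search=4)
print(vector, cost)
```

The vector points back to where the block's pixels sit in the reference picture, so the decoder only has to fetch them.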
Types of pictures
- I or 'intra' pictures
- coded w/o reference to any other pictures
- just reduce spatial redundancy
- access points in the bitstream where decoding can begin
- P or 'predictive' pictures
- use previous I or P for motion compensation
- each block is either predicted or intra-coded
- B or 'bidirectionally-predictive' pictures
- can use the previous and/or next I/P for motion compensation
- can cause a reorder from natural display order
- prediction is a fetch operation
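The reorder caused by B pictures can be sketched as follows: a B picture cannot be decoded until the later I or P picture it refers to has arrived, so that picture is sent first. The picture labels and the function name are illustrative, and trailing B pictures are simply flushed at the end, which is a simplification.

```python
def display_to_coded_order(display):
    """Move each I/P picture ahead of the B pictures that precede it in display order."""
    coded, pending_b = [], []
    for picture in display:
        if picture.startswith("B"):
            pending_b.append(picture)  # hold until the next I/P has been sent
        else:
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)  # simplification: flush any B pictures left over
    return coded


display_order = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded_order = display_to_coded_order(display_order)
print(coded_order)
```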
Sequences of pictures
- a sequence may consist of almost any pattern of I, P, and B pictures
- in a typical sequence, relative sizes of pictures are
Summary
- lossless compression generally limited to around 2:1
- compression "sweet spot" of MPEG is between 8:1 and 30:1
- standard VHS - approx 1.5 Mbit/sec
- broadcast NTSC - approx 3 Mbit/sec
- sports/high temporal activity - approx 5-6 Mbit/sec
- Betacam (90 Mbit/sec) - approx 10 Mbit/sec
DBS
- broadcast multiple channels over a transponder
- 23 Mbit/sec available
- mix/match to balance entropy
- 1997 - 5-6 channels per transponder
- 1999 - 8-10 channels per transponder
- 2001 - 11+ channels per transponder
- take advantage of variable bit rates
- statistical multiplexing
- probability that all channels reach peak entropy at once is
very small
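A back-of-the-envelope sketch of the numbers above. The 23 Mbit/sec figure comes from the notes. The 10% peak probability and the assumption that channels peak independently are made up purely to illustrate why statistical multiplexing works.

```python
TRANSPONDER_MBPS = 23.0  # from the notes


def average_rate(channels):
    """Average share of the transponder per channel, in Mbit/sec."""
    return TRANSPONDER_MBPS / channels


def prob_all_peak(p_peak, channels):
    """Chance that every channel peaks at once, assuming independent channels."""
    return p_peak ** channels


for channels in (5, 8, 11):
    print(channels, "channels:", round(average_rate(channels), 2), "Mbit/sec each")

# Hypothetical: each channel is at peak demand 10% of the time.
print(prob_all_peak(0.10, 8))
```

With more channels the average share drops toward 2 Mbit/sec, which only works because the channels rarely demand their peak rate together.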
DVDs
- 12cm dia disc, short laser wavelength, finer track pitch, better optics
- approx. 5GB of storage -> movie at 5Mbit/sec
- disc reduced to 0.6mm (can glue 2 together for 1.2mm disc)
- approx 10GB of storage -> movie at 10Mbit/sec
- film performs well with MPEG
- frame rate of 24Hz vs 30Hz (20% savings)
- film source progressive (no interlacing for motion estimation)
- pre-digital source oversampled (very high quality signal)
- focusing, motion blur, etc from film more amenable to the transform
and quantization of MPEG
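The DVD figures can be sanity-checked with simple arithmetic. This sketch assumes 1 GB = 10^9 bytes and a constant bit rate, neither of which is stated in the notes.

```python
def playing_minutes(capacity_gb, rate_mbps):
    """Playing time for a disc of the given capacity at a constant bit rate."""
    bits = capacity_gb * 1e9 * 8  # assumes decimal gigabytes
    return bits / (rate_mbps * 1e6) / 60


def frame_rate_saving(film_hz=24, video_hz=30):
    """Fraction of pictures saved by coding film at its native frame rate."""
    return 1 - film_hz / video_hz


print(round(playing_minutes(5, 5), 1))    # single layer at 5 Mbit/sec
print(round(playing_minutes(10, 10), 1))  # two layers at 10 Mbit/sec
print(round(frame_rate_saving(), 2))
```

Both disc configurations give a little over two hours, so the doubled capacity can be spent on doubling the bit rate for a feature-length film.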