Video compression is a wonderful world of mathematics, computer science, and arcane mystery. The good news is you don’t need to be a scholar, engineer, or wizard to understand the basics. In this subsection we’ll explore video compression so you can make more informed choices when selecting codecs.
As mentioned earlier, Spatial/Intraframe Compression only involves data within individual frames. To achieve data savings, this method of compression doesn’t save the chroma data of every pixel. Rather, pixels are gathered into groups, called macroblocks, and an average chroma value is assigned to each one.
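To make the averaging idea concrete, here’s a minimal Python sketch. It is not how any real codec is implemented; it just assumes a chroma plane stored as an 8-bit NumPy array and fixed square blocks:

```python
import numpy as np

def average_chroma_blocks(chroma, block_size=16):
    """Replace each block_size x block_size region of a chroma plane
    with its average value, a toy model of spatial compression."""
    h, w = chroma.shape
    out = chroma.astype(np.float32)
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = out[y:y + block_size, x:x + block_size]
            block[:] = block.mean()  # one chroma value for the whole block
    return out.astype(chroma.dtype)

# A fake 64x64 chroma plane of random 8-bit values, just for demonstration.
chroma = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
averaged = average_chroma_blocks(chroma, block_size=16)
```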
The exact shape of macroblocks differs between codecs. Some codecs only use macroblocks of a fixed size, while others allow some flexibility up to a certain size. In their simplest form, macroblocks are square or rectangular grids of pixels, like the example pictured above.
However, more advanced codecs can sample blocks of almost any shape or size. In the above image, we can see how HEVC utilizes highly flexible blocks. This provides a much cleaner image than more rigid forms of spatial compression.
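For illustration, here’s a toy quadtree-style splitter in the spirit of HEVC’s flexible coding tree. Real encoders make these decisions with rate-distortion math; the `min_size` and `tol` values here are made-up stand-ins:

```python
import numpy as np

def split_blocks(plane, y, x, size, min_size=8, tol=10.0, out=None):
    """Quadtree-style splitting: keep a block whole if it looks
    uniform, otherwise split it into four smaller blocks."""
    if out is None:
        out = []
    block = plane[y:y + size, x:x + size].astype(np.float32)
    if size <= min_size or block.std() < tol:
        out.append((y, x, size))  # uniform enough: treat as one block
    else:
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                split_blocks(plane, y + dy, x + dx, half, min_size, tol, out)
    return out

# A flat image collapses into one large block; a detailed one splits finely.
flat = np.zeros((64, 64), dtype=np.uint8)
print(len(split_blocks(flat, 0, 0, 64)))  # prints 1
```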
Macroblocking causes the most visible quality loss in images with fine color gradients.
In the above example, notice how the sky fades over a subtle gradient of contrasting colors. This version of the image is very lightly compressed, so the gradient is smooth.
But poor-quality spatial compression can make the gradient choppy and irregular, resulting in banding between colors. This is a direct side effect of throwing away color data, where a large number of differently colored pixels are instead all assigned the same color.
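You can reproduce the banding effect in a few lines of Python. This is a deliberately crude quantizer, not any codec’s actual pipeline:

```python
import numpy as np

# A smooth horizontal gradient: 256 pixels fading from 0 to 255.
gradient = np.linspace(0, 255, 256)

# Crudely quantize it down to only 8 distinct levels.
levels = 8
banded = np.round(gradient / 255 * (levels - 1)) * (255 / (levels - 1))

print(np.unique(gradient).size)  # 256 distinct values: a smooth fade
print(np.unique(banded).size)    # 8 distinct values: visible bands
```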
Despite this issue, intraframe compression remains popular across many codecs and applications. ProRes, DNxHD, and some ALL-I (All-Intraframe) capture codecs found on newer prosumer cameras all use intraframe compression.
The other category of compression we are concerned with is Temporal/Interframe compression. This method employs the same sort of compression techniques as intraframe compression, but across multiple frames, rather than just one. That means that not every frame contains 100% of the information used to display it. This might seem confusing, but it’s actually quite simple.
In the above shot, the subject is seated in front of a static background. Obviously, as the interview progresses, there will be motion on screen. The subject’s face will change as they speak, their hands will move, and they will wiggle a bit in their seat. But as a whole, most of the image (the background) will remain largely unchanged from frame to frame. Even a good portion of the subject’s body will stay the same in short bursts. So, in reality, a lot of the information in each frame would be redundant if every frame were stored in full.
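Here’s a rough sketch of how an encoder might detect that redundancy, assuming two grayscale frames stored as NumPy arrays; the `tol` threshold is a hypothetical stand-in for a real encoder’s far more sophisticated decision logic:

```python
import numpy as np

def changed_blocks(prev, curr, block_size=16, tol=2.0):
    """Find which blocks actually changed between two frames.
    Unchanged blocks could simply be referenced from the previous frame."""
    changed = []
    h, w = curr.shape
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            a = prev[y:y + block_size, x:x + block_size].astype(np.float32)
            b = curr[y:y + block_size, x:x + block_size].astype(np.float32)
            if np.abs(a - b).mean() > tol:  # differs enough to re-encode
                changed.append((y, x))
    return changed
```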
So interframe codecs eliminate this redundant data by saving only what changes from frame to frame. In this case, that’s the subject’s face, and whatever hand gestures or body movements they make. How exactly does this work?
This complex process begins after the interframe codec breaks up an image into macroblocks. At this point, an intraframe codec would just encode the image, but interframe codecs are different. Instead of directly encoding the values of every block, the compression algorithm searches for similar blocks in adjacent frames. If it finds one, it then references that block in the new frame.
And this works even if the blocks between frames are slightly different. If the referenced block sits in a new position (because of camera or subject movement), it can still be reused: a motion vector tells the decoder where the reference block should appear in the new frame, all without encoding a new block.
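A minimal sketch of that search, again assuming grayscale NumPy frames, might look like this. Real encoders use much faster strategies than this exhaustive sum-of-absolute-differences loop:

```python
import numpy as np

def find_motion_vector(ref, block, y, x, search=8):
    """Exhaustively search the reference frame within +/- search pixels
    of (y, x) for the best match, using sum of absolute differences."""
    bs = block.shape[0]
    best_sad, best_vec = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + bs > ref.shape[0] or rx + bs > ref.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = ref[ry:ry + bs, rx:rx + bs].astype(np.int32)
            sad = np.abs(cand - block.astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec  # the motion vector: where the best match was found
```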
Now, it wouldn’t make sense for the codec to use every frame in a clip for this sort of compression. After all, scene changes have practically no details in common, so it would be fruitless to look for similar blocks in those situations. This is why temporal compression groups similar frames together for more efficient encoding and decoding.
Thus, interframe codecs are known as Group of Pictures, or long-GOP, codecs. These groups are made up of three different types of frames – I-Frames, P-Frames, and B-Frames.
I-Frames are Intraframe-Coded images. That means that these frames only receive spatial compression. They serve as the reference frames for the rest of the interframe sampling.
P-Frames are Predictive-Coded images. These are frames that rely on a reference frame that comes from earlier in the sequence. Any blocks that are unchanged between a P-Frame and its reference frame are stored only once, in the reference, and simply reused when the P-Frame is displayed.
B-Frames are Bidirectionally-Predictive-Coded images. These are similar to P-Frames, but can use data from images earlier or later in the sequence. The algorithm can search for blocks in either I or P frames to fill in the gaps.
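To tie the three frame types together, here’s a toy walk through a hypothetical IBBPBBP group. Actual encoders choose these patterns adaptively and can reference multiple frames:

```python
# A hypothetical IBBPBBP group of pictures, in display order.
gop = ["I", "B", "B", "P", "B", "B", "P"]

for i, frame_type in enumerate(gop):
    if frame_type == "I":
        refs = "nothing (fully self-contained, intraframe only)"
    elif frame_type == "P":
        refs = "the previous I- or P-Frame"
    else:  # B-Frame
        refs = "the nearest I/P frames both before and after it"
    print(f"frame {i}: {frame_type}-Frame references {refs}")
```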
Depending on the compression settings, the number of frames in a group will vary. In general, the more an image stays the same, the more frames can be put in a group without visible artifacts. There can be lots of P and B frames between each I-Frame, or only two or three. It is also sometimes possible for users to set how many frames these groups can contain, which gives more technical control over the image.
Lots of codecs use interframe compression, such as H.264, MPEG-4, AVCHD, and XDCAM. Interframe compression is appealing for use in capture codecs because it allows you to store more footage on a given storage device, so you can shoot longer on set or in the field without changing drives.