First, for background, here is an overview of the most common video compression standard in use today, H264: https://www.maketecheasier.com/how-video-compression-works/
H265 has improved on this, but for various reasons, it was a bit of a flop. It is in use and will continue to grow, but royalty issues are a problem. To try and address these, three new video standards have been approved by MPEG: https://ottverse.com/vvc-evc-lcevc-mpeg-video-codecs/
A consortium of companies has also endorsed a royalty-free codec called AV1. Here though, I will be talking mostly about the EVC baseline used at a resolution of 512p. Codec efficiency does differ at that resolution, which is called SD resolution, but the gap is not big: only about 20% between worst and best (not including H264). SD is not used much these days, as people are moving to HD (720p) or Full HD (1080p). The trend continues, with 4K now becoming common on streaming services like Netflix, and the push is on for 8K. At these higher resolutions, codecs differ a great deal in both efficiency and complexity. The bitrate for 4K over the internet has basically been solved, but 8K, up until now, has been a problem. It is hoped the new codecs will help to tame it to something like 20 Mbps. 4K is about 15 Mbps, although Netflix recommends 25 Mbps to be on the safe side. However, with new technology like shot-based encoding, Netflix has made it even more efficient: https://hackaday.com/2020/09/16/dec…laining-optimized-shot-based-encoding-for-4k/
LCEVC and baseline EVC
Here I will be talking about LCEVC and baseline EVC. Baseline EVC at 512p is close to the efficiency of the most efficient codec, VVC, but has very low complexity and no royalties. It is mostly an arbitrary choice; any codec will do and not make much difference to efficiency.
The secret sauce of LCEVC rests on the observation that traditional video codecs do not compress the high-frequency part of video well. To address this, a company called V-Nova came up with LCEVC. A lot of information is available at: https://www.lcevc.org/
For an overview, see: https://www.lcevc.org/how-lcevc-works/
The video is downscaled by 2 twice (by 4 overall), and the smallest version is encoded using a standard codec (EVC baseline in this writeup, but any codec will do). That base stream is transmitted, along with two layers of differences. To restore the original, the decoder upscales the decoded base by 2, then adds in the difference between the original downscaled by 2 and that upscaled picture. The corrected picture is upscaled by 2 again, and a second set of differences is added to get the original. The scheme is flexible: you can skip the second downscale if you wish, or add extra downscaling steps (although I do not believe V-Nova has done a 3 downscale version yet, IMHO it would be advantageous with 8K, as explained later). To make it more efficient, they developed some ‘tricky’ techniques, such as M-Prediction and using the previous frame to guess the difference. You can get the detail from the patent: https://patents.google.com/patent/WO2020188273A1/en
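The layering described above can be sketched in a few lines of Python. This is only an illustration under assumptions of mine, not V-Nova's implementation: the downscale is a plain 2x2 average, the upscale is pixel repetition, a coarse quantiser (the hypothetical `fake_base_codec`) stands in for the base codec, and the differences are kept losslessly. Real LCEVC transforms, quantises and entropy-codes the differences, so its reconstruction is not bit-exact as it is here.

```python
import numpy as np

def downscale2(img):
    """Halve each dimension by averaging 2x2 blocks (assumed filter)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale2(img):
    """Double each dimension by repeating pixels (assumed filter)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def fake_base_codec(img, step=8.0):
    """Hypothetical stand-in for a lossy base codec: coarse quantisation."""
    return np.round(img / step) * step

def encode(frame):
    """Return the base picture plus two layers of differences."""
    half = downscale2(frame)            # original downscaled by 2
    quarter = downscale2(half)          # original downscaled by 4
    base = fake_base_codec(quarter)     # what the base codec would deliver
    residual1 = half - upscale2(base)   # correction at half resolution
    corrected_half = upscale2(base) + residual1
    residual2 = frame - upscale2(corrected_half)  # correction at full resolution
    return base, residual1, residual2

def decode(base, residual1, residual2):
    """Upscale, correct, upscale again, correct again."""
    corrected_half = upscale2(base) + residual1
    return upscale2(corrected_half) + residual2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Frame dimensions must be divisible by 4 for two clean halvings.
    frame = rng.integers(0, 256, size=(16, 16)).astype(np.float64)
    base, r1, r2 = encode(frame)
    print(base.shape, r1.shape, r2.shape)
    print(np.allclose(decode(base, r1, r2), frame))
```

Note that the encoder computes the second layer of differences against the picture the decoder will actually have (the corrected half-resolution picture), not against its own clean downscale, so errors from the base codec do not accumulate.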
I have read it – it does take a while but is the only way to get exactly what is going on.
Since the base encoding is done at a lower resolution, it is computationally quick. Slower presets that give better compression can be used, and the overall encode is still quicker. Because the ‘upscaling’ is done using 2x2 or 4x4 blocks, the upscale can use many concurrent threads. Generally speaking, it adds little to the decoding time (with one exception I will mention later).
Even using just one downscale (I would expect better results with 2 or 3 downscales), performance has been impressive: https://www.lcevc.org/wp-content/up…rmance-of-LCEVC-Meeting-MPEG-134-May-2021.pdf
With 8K content, one should use three downscales. The encoder can then run the base codec at 512p. At that resolution, the difference between the performance of codecs is minimal compared to the enhancement data, especially with the newer codecs. For example, royalty-free baseline EVC is only about 15% worse than the most efficient but complex VVC, with all its royalty woes. At the cost of increased processing, one can use task-aware upscaling (TAU) and downscaling, giving smaller corrections after upscaling and greater efficiency: https://cv.snu.ac.kr/research/taid/
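To make the three-downscale arrangement concrete, here is a small sketch of the picture size at each layer. It assumes 8K means 7680x4320 and that each downscale halves both dimensions; the function name `layer_sizes` is mine. Under those assumptions the base layer comes out at 960x540, which is in the neighbourhood of the 512p mentioned above, and it carries 1/64 of the pixels of the full picture.

```python
def layer_sizes(width, height, downscales):
    """List (width, height) from full resolution down to the base layer."""
    sizes = [(width, height)]
    for _ in range(downscales):
        width, height = width // 2, height // 2  # each step halves both axes
        sizes.append((width, height))
    return sizes

if __name__ == "__main__":
    sizes = layer_sizes(7680, 4320, 3)  # assumed 8K dimensions
    full_pixels = sizes[0][0] * sizes[0][1]
    for w, h in sizes:
        print(f"{w}x{h}: 1/{full_pixels // (w * h)} of the full-resolution pixels")
```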
Experiments done by Samsung with its Scalenet technology (a version of task-aware upscaling) show that when it is used to go from 4K to 8K, it is very, very difficult to tell the difference (I have heard it has a VMAF of nearly 100 and PSNR of a bit below 40). Since it is so hard to tell native 8K from 4K upscaled this way, the third layer will not be doing much work; it will all be in the first and second layers. If task-aware upscaling is also used for the other upscales, those layers will be more efficient as well. The downside is the amount of processing power required, but processing power is still getting cheaper, so it will likely happen. Still, even without the assistance of superresolution, reductions in bitrates of 40% to 50% are entirely possible. See: https://8kassociation.com/lcevc-licensing-offers-different-model-to-kickstart-8k-market/
A VMAF of 93 and above is generally considered to be, for all practical purposes, indistinguishable from the original, and H265 with LCEVC achieved this at about 20 Mbps. 8K transmission is entirely possible now at bitrates nearly everyone has available. With per-scene encoding, Netflix has made it even more efficient: https://hackaday.com/2020/09/16/dec…laining-optimized-shot-based-encoding-for-4k/
Expect even 20 Mbps to be reduced significantly. Companies are working on integrating LCEVC with Content Adaptive Encoding (CAE), which lowers the bitrate even further: https://blog.beamr.com/2019/09/11/cabr-content-adaptive-rate-control/
Harmonic managed to get 8K at about 25 Mbps using just CAE, with no LCEVC. It is working on combining it with LCEVC at the moment. Combining all these ‘tricks’ means that in the future, distributing 8K content will not be an issue at all. Even now, it is not really an issue; we just need the content to make it worthwhile, and some new infrastructure.