In video streaming and content delivery, ensuring the best video quality has never been more critical. Today’s viewers expect high-quality video, especially for premium products such as SVOD (Subscription Video on Demand) services or live-streamed sports. As video usage keeps growing across platforms, from social media to high-end streaming services, encoders must balance compression efficiency with viewer satisfaction, essentially making video encoding a cost vs. quality trade-off.
Traditional metrics like the Peak Signal-to-Noise Ratio (PSNR) fall short in the low-bitrate realm of content delivery because they do not accurately reflect how humans perceive visual quality: minor distortions that are imperceptible to the eye can alter results dramatically. This is where perceptual quality measurements come into play. By modelling how humans actually perceive video quality, encoders can be tuned for the best possible end-user experience. By focusing on perceived visual quality, service providers can reduce overall bandwidth costs, minimize storage needs, and deliver excellent quality without over-allocating resources. In this landscape, Netflix's Video Multimethod Assessment Fusion (VMAF) has emerged as the de facto industry standard. VMAF combines multiple perceptual quality features with a machine learning model to provide a score that closely aligns with subjective human ratings, making it indispensable for benchmarking and tuning video encoders.
Optimizing an encoder for VMAF, however, starts with the fundamental need to measure it accurately and efficiently. There are several ways to compute VMAF scores today, but each comes with its own trade-offs. For instance, the standalone command-line VMAF binary requires raw YUV files for both the reference (source) and distorted (encoded) video, making it cumbersome for complex setups and impractical in real-life video workflows. Alternatively, tools like FFmpeg with libvmaf integrated offer an easier interface, allowing arbitrary input formats and piping in encoded streams directly. While FFmpeg simplifies the process, it is still inherently an offline workflow: you encode first, then analyze separately, which can introduce delays and inefficiencies in iterative tuning cycles.
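To make the offline FFmpeg route concrete, here is a minimal sketch that only assembles the command line for FFmpeg's libvmaf filter. The file names are placeholders, and it assumes an FFmpeg build with libvmaf enabled. The filter reads the distorted stream as the first input and the reference as the second.

```python
import shlex

def build_libvmaf_command(distorted, reference, log_path="vmaf.json"):
    """Build an FFmpeg command line that scores `distorted` against `reference`."""
    # libvmaf expects the distorted stream first and the reference second.
    filter_spec = f"libvmaf=log_path={log_path}:log_fmt=json"
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", filter_spec,
        "-f", "null", "-",  # discard the video output, keep only the scores
    ]

# Placeholder file names, used only for illustration.
cmd = build_libvmaf_command("encoded.mp4", "source.mov")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would execute the measurement, but only after the encode has finished, which is exactly the two-step workflow described above.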
Adding to these challenges, VMAF is a full-reference metric, meaning it requires access to the reference source for comparison. In real-world scenarios, however, you often don't have the identical source readily available, or you need to reapply pre-processing steps to the original, like deinterlacing, cropping, or filtering, to match the encoder's input. Things get even more complicated when the source material is also edited (for example, by removing pre-roll footage or ads) or comes from a live source, which is difficult to capture and align with the encoded video. This can easily lead to errors, inconsistencies, and wasted time aligning everything post-encoding. If reference and distorted material are misaligned by just a single frame, the VMAF measurements become meaningless.
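The effect of a one-frame offset can be illustrated with a toy experiment. The sketch below uses PSNR as a stand-in for VMAF, since any full-reference metric compares frames pairwise, and the "video" is synthetic: a bright bar moving across a one-dimensional frame. None of the numbers are measured data.

```python
import math
import random

def psnr(ref, dist, peak=255.0):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def make_frame(index, width=64):
    """Synthetic 1-D frame: a bright bar that moves 4 pixels per frame."""
    start = (index * 4) % width
    return [200 if start <= x < start + 8 else 50 for x in range(width)]

def mean_psnr(ref_frames, dist_frames):
    """Average per-frame PSNR over the frames both sequences have in common."""
    scores = [psnr(r, d) for r, d in zip(ref_frames, dist_frames)]
    return sum(scores) / len(scores)

random.seed(0)
reference = [make_frame(i) for i in range(10)]
# "Encoded" video: the reference plus mild noise of at most +/-2 levels.
distorted = [[p + random.randint(-2, 2) for p in frame] for frame in reference]

aligned = mean_psnr(reference, distorted)
# Drop the first reference frame to simulate a one-frame offset.
shifted = mean_psnr(reference[1:], distorted)
print(f"aligned: {aligned:.1f} dB, shifted by one frame: {shifted:.1f} dB")
```

With correct alignment the score reflects only the mild noise. With the offset, it mostly reflects the motion between neighbouring frames rather than the encoding quality.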
After a considerable amount of technical and customer research, the idea for vScore was born, and it was released this month alongside our Codec SDK 16.0. A central part of vScore is VMAF, which we have deeply integrated into our HEVC encoder. The key advantages of tightly integrating quality measurements with the encoding are perfect frame synchronization and the avoidance of additional decoding steps, which eliminates the mismatches that can occur in post-encoding workflows. This streamlines video workflows where quality assurance has already been implemented, decreasing processing time and computational overhead.
Whether you are in a production environment or rapid prototyping, vScore makes quality evaluation easier, more reliable, and seamlessly part of the actual encoding routine. Beyond VMAF (available in both CPU and CUDA-accelerated modes), vScore also includes classics like PSNR and SSIM plus our novel VMAF proxy called VMAF-E. This lightweight metric uses encoder-derived data and simple image features to deliver a fast VMAF estimate, ideal for scenarios where full VMAF computation might be too resource-intensive (live streams), or GPU acceleration is not available (e.g. on cheap cloud instances).
Because vScore is integrated in our encoder, it is easy to use: quality measurements are enabled through the quality_metric bitmask setting in the encoder configuration, as listed in Table 1. By setting multiple bits, users can select several quality metrics at once.
Bit | Value | Measurement | Meaning |
- | MC_QUALITY_METRIC_NONE | - | No measurement (default) |
0 | MC_QUALITY_METRIC_PSNR | PSNR | Enable PSNR measurement |
1 | MC_QUALITY_METRIC_SSIM | SSIM | Enable SSIM measurement |
2 | MC_QUALITY_METRIC_VMAF | VMAF | Enable VMAF measurement |
3 | MC_QUALITY_METRIC_VMAF_E | VMAF-E | Enable VMAF-E measurement |
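As a rough illustration of how such a bitmask combines, the following sketch mirrors the bit positions from Table 1 in a Python `IntFlag`. The class and member names are hypothetical, and the numeric values are an assumption derived from the bit column (bit n corresponds to `1 << n`). The SDK headers remain the authoritative source for the actual constants.

```python
from enum import IntFlag

class QualityMetric(IntFlag):
    """Hypothetical mirror of the quality_metric bits listed in Table 1."""
    NONE = 0
    PSNR = 1 << 0    # bit 0
    SSIM = 1 << 1    # bit 1
    VMAF = 1 << 2    # bit 2
    VMAF_E = 1 << 3  # bit 3

# Select PSNR and VMAF-E together by OR-ing their bits.
mask = QualityMetric.PSNR | QualityMetric.VMAF_E
print(int(mask))                   # numeric value of the combined mask
print(QualityMetric.VMAF in mask)  # check whether a given metric is selected
```

Under these assumptions, PSNR plus VMAF-E yields the value 9, and full VMAF is not part of that selection.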
For VMAF, the extremely fast CUDA acceleration provided by NVIDIA can be enabled using two separate settings: vmaf_hw_acceleration to switch it on/off and vmaf_hw_acc_idx to select a specific CUDA-enabled device if multiple GPUs are present. Of course, you can also explicitly choose between the built-in HD and 4K VMAF models (vmaf_v0.6.1 and vmaf_4k_v0.6.1) as well as the “no enhancement gain” (NEG) variants. If you don’t specify anything, vScore will choose the best fitting model according to the source resolution automatically. Further, since we are video coding engineers, it will default to using the NEG models, as they measure the pure effect of compression without taking additional image enhancements into account.
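The automatic model choice can be pictured with a small heuristic. This is only a sketch. The resolution threshold is an assumption, not vScore's documented rule. The NEG model names follow the file naming used in the open-source libvmaf distribution (a `neg` suffix), and the identifiers the encoder expects may differ.

```python
def pick_vmaf_model(width, height, neg=True):
    """Pick a VMAF model name from the source resolution (illustrative heuristic)."""
    # Assumption: anything larger than 1920x1080 is treated as 4K material.
    is_4k = width > 1920 or height > 1080
    base = "vmaf_4k_v0.6.1" if is_4k else "vmaf_v0.6.1"
    # Assumption: NEG variants carry a "neg" suffix, as in libvmaf's model files.
    return base + "neg" if neg else base

print(pick_vmaf_model(1920, 1080))             # HD source, NEG model by default
print(pick_vmaf_model(3840, 2160, neg=False))  # 4K source, standard model
```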
If you are experimenting with our sample encoder, a command line example to produce VMAF measurements looks like this:
Turn on CUDA hardware acceleration to get much faster VMAF computations:
In a similar fashion, you can get VMAF-E measurements by enabling the fourth bit of the bitmask:
As can be seen, the VMAF-E scores are slightly different compared to the real VMAF scores. As mentioned above, VMAF-E is a trained metric, designed to very closely resemble the real VMAF but at much lower computational complexity. This also means that some prediction error can be expected from the metric. We have trained VMAF-E with a large number of different HD and 4K video sequences, all encoded at various bitrates. On an independent test set, we measured an average VMAF-E prediction error of ± 2.4 VMAF scores at the frame level and ± 1.6 VMAF scores on sequences. Stay tuned for a future blog post on VMAF-E, where we go into more details regarding the accuracy of the metric.
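One plausible way to read the two error figures is as a mean absolute error, computed once over individual frames and once over per-sequence averages. That interpretation is an assumption, and the scores below are made up purely to show the calculation. The sketch also shows why a sequence-level error tends to be smaller: per-frame over- and under-estimates partly cancel when averaged.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def mean_abs_error(reference_scores, estimated_scores):
    """Mean absolute difference between two equally long score lists."""
    return mean([abs(r - e) for r, e in zip(reference_scores, estimated_scores)])

# Made-up per-frame scores for two short sequences (not measured data).
vmaf = {"seq_a": [92.0, 90.5, 88.0, 91.0], "seq_b": [75.0, 78.5, 80.0, 77.0]}
vmaf_e = {"seq_a": [94.0, 89.0, 90.5, 89.5], "seq_b": [73.0, 80.0, 78.5, 79.5]}

# Frame level: compare every frame individually.
frame_ref = [s for name in vmaf for s in vmaf[name]]
frame_est = [s for name in vmaf for s in vmaf_e[name]]
frame_mae = mean_abs_error(frame_ref, frame_est)

# Sequence level: compare the per-sequence averages.
seq_mae = mean_abs_error(
    [mean(vmaf[name]) for name in vmaf],
    [mean(vmaf_e[name]) for name in vmaf],
)
print(f"frame-level error: {frame_mae:.3f}, sequence-level error: {seq_mae:.3f}")
```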
When using our C/C++ API directly, you can also receive per-frame VMAF and VMAF-E data as soon as computations finish using the auxinfo callback handling. Check the SDK documentation for more information on how to do this.
One of the most exciting aspects of vScore is its ability to enable live, online VMAF measurements during the encoding process itself. Unlike traditional offline analysis, this real-time capability provides immediate feedback on quality metrics as frames are processed. Because VMAF is computationally demanding on a CPU, we offer two alternatives that make perceptual quality measurements much more practical: CUDA-accelerated VMAF measurements on a GPU, and VMAF-E, a fast estimation of the real VMAF score that uses a lightweight neural network. VMAF-E is well suited to situations where no GPU is present.
VMAF-E enables on-the-fly monitoring of perceptual quality in live streaming setups, where, for example, engineers can spot quality degradations instantly, leading to proactive adjustments and enhanced observability. This also opens the door to innovative applications, such as dynamic bitrate adaptation based on live VMAF scores, quality alerts in automated pipelines, or even encoding optimizations that respond to real-time quality changes.
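A quality alert of this kind could be as simple as a rolling average over the most recent per-frame scores. The sketch below is a hypothetical consumer of such scores, not part of the SDK. The class name, threshold, window size, and sample scores are all illustrative assumptions.

```python
from collections import deque

class QualityMonitor:
    """Raise an alert when the rolling mean score drops below a threshold."""

    def __init__(self, threshold=80.0, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # keeps only the newest scores

    def update(self, score):
        """Add one per-frame score; return True if the window mean is too low."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        window_mean = sum(self.scores) / len(self.scores)
        # Wait for a full window so a single early frame cannot trigger an alert.
        return window_full and window_mean < self.threshold

monitor = QualityMonitor(threshold=80.0, window=3)
stream = [90.0, 88.0, 85.0, 70.0, 65.0, 60.0]  # made-up per-frame scores
alerts = [i for i, score in enumerate(stream) if monitor.update(score)]
print(alerts)  # indices of frames at which an alert fires
```

Averaging over a window trades reaction time for stability: a larger window suppresses alerts caused by a single difficult frame but reports a sustained drop later.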
Of course, measuring perceptual quality does not come for free: some CPU resources must be allocated to these measurements, which slows down the encoding process. Tables 2 through 4 compare encoding performance and quality measurements for three types of systems (desktop, laptop and server) at different encoder performance levels (5, 15, and 25):
Quality metric | Performance 5: FPS | Performance 5: % change vs. baseline | Performance 15: FPS | Performance 15: % change vs. baseline | Performance 25: FPS | Performance 25: % change vs. baseline | Average % change vs. baseline |
None (baseline) | 230.1 | 0.0 | 97.1 | 0.0 | 31.9 | 0.0 | 0.0 |
PSNR | 219.3 | -4.7 | 93.3 | -3.9 | 29.3 | -8.2 | -5.6 |
SSIM | 63.8 | -72.3 | 53.9 | -44.5 | 27.0 | -15.2 | -44.0 |
VMAF | 19.9 | -91.4 | 18.8 | -80.6 | 15.1 | -52.7 | -74.9 |
VMAF-CUDA | 192.5 | -16.3 | 90.1 | -7.2 | 29.9 | -6.3 | -9.9 |
VMAF-E | 208.3 | -9.5 | 95.2 | -2.0 | 27.4 | -14.1 | -8.5 |
Quality metric | Performance 5: FPS | Performance 5: % change vs. baseline | Performance 15: FPS | Performance 15: % change vs. baseline | Performance 25: FPS | Performance 25: % change vs. baseline | Average % change vs. baseline |
None (baseline) | 279.0 | 0.0 | 117.6 | 0.0 | 21.0 | 0.0 | 0.0 |
PSNR | 269.3 | -3.5 | 116.6 | -0.9 | 20.9 | -0.5 | -1.6 |
SSIM | 54.9 | -80.3 | 49.8 | -57.7 | 20.4 | -2.9 | -46.9 |
VMAF | 17.7 | -93.7 | 17.2 | -85.4 | 14.7 | -30.0 | -69.7 |
VMAF-CUDA | 276.1 | -1.0 | 117.3 | -0.3 | 21.0 | 0.0 | -0.4 |
VMAF-E | 233.5 | -19.9 | 111.1 | -5.5 | 20.9 | -0.5 | -8.6 |
Quality metric | Performance 5: FPS | Performance 5: % change vs. baseline | Performance 15: FPS | Performance 15: % change vs. baseline | Performance 25: FPS | Performance 25: % change vs. baseline | Average % change vs. baseline |
None (baseline) | 313.7 | 0.0 | 140.4 | 0.0 | 17.8 | 0.0 | 0.0 |
PSNR | 273.0 | -13.0 | 137.4 | -2.1 | 17.6 | -1.1 | -5.4 |
SSIM | 30.5 | -90.3 | 30.6 | -78.2 | 16.6 | -7.0 | -58.5 |
VMAF | 8.0 | -97.5 | 8.1 | -94.3 | 7.9 | -55.6 | -82.4 |
VMAF-CUDA | 298.6 | -4.8 | 139.2 | -0.9 | 17.4 | -2.5 | -2.7 |
VMAF-E | 254.6 | -18.9 | 134.6 | -4.1 | 17.1 | -3.8 | -8.9 |
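The percentage columns in the tables above express the change in frames per second relative to the baseline run without any quality metric. As a quick check, this sketch recomputes the PSNR row of Table 2 from its FPS values. Averaging the already rounded percentages is an assumption about how the last column was derived.

```python
def percent_change(baseline_fps, measured_fps):
    """Relative change in encoding speed versus the baseline, in percent."""
    return (measured_fps - baseline_fps) / baseline_fps * 100.0

# FPS values taken from Table 2, keyed by encoder performance level.
baseline = {5: 230.1, 15: 97.1, 25: 31.9}
psnr = {5: 219.3, 15: 93.3, 25: 29.3}

changes = {
    level: round(percent_change(baseline[level], psnr[level]), 1)
    for level in baseline
}
average = round(sum(changes.values()) / len(changes), 1)
print(changes, average)
```

The results match the PSNR row: -4.7, -3.9 and -8.2 percent, with an average of -5.6 percent.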
Compared to the baseline, where no quality measurements are made, CPU-based VMAF measurements slow down the encoding process by more than 80% at performance levels 5 and 15 on all three systems. If a VOD video workflow has VMAF measurements already in place and successful content delivery depends on hitting a VMAF target or minimum VMAF score, this slowdown might be acceptable and perhaps even reduced due to process simplification and elimination of I/O. However, CPU-bound VMAF measurements are unacceptably slow for any time-critical process like encoding news clips or live streaming. With a CUDA-enabled GPU present, the cost of VMAF measurements becomes minor, with an average slowdown below 10% on every system tested. If you do not happen to have a dedicated NVIDIA GPU in your system, VMAF-E is the solution, offering fast VMAF estimates at good precision with an average decrease in encoding speed of only about 8.5% to 8.9%.
With vScore, we aim to make perceptual video quality more easily observable than before. Instead of spending time engineering VMAF or FFmpeg command lines and Python scripts for offline quality analysis, measurements can now be made on the fly while encoding with little additional configuration and low impact on encoding speed. Depending on the available hardware and performance requirements, users can choose between VMAF measurements on CPU or GPU and a fast VMAF estimation suitable for live streaming as well as novel applications that, for example, instantly adapt the bitrate to the measured perceptual quality.
Want to learn more about vScore or try it out? Download the free demo or contact us today.