Will LCEVC-enhanced x265 contribute to sustainability in streaming?

Feb 27, 2023

We evaluated the LCEVC codec from V-NOVA with regard to compression efficiency and encoding complexity by comparing LCEVC-enhanced x265 to native x265.

We found the encoding complexity of LCEVC-enhanced x265 with 2D scaling to be less than 40% of that of native x265.
VMAF measurements showed impressive bitrate savings compared to native x265 at the same VMAF score. However, we found the subjective visual quality produced by LCEVC to be slightly below that produced by x265 for the same VMAF score, so in real-world use cases the bitrate savings may not be as large as the VMAF measurements indicate.

All results and conclusions presented here are based on tests on a limited selection of source material and encoder operating points. We do not make any claims regarding the validity of our findings for other source materials and/or other encoder operating points.

Introduction

At Eyevinn we are always looking to stay up to date with new developments in video and streaming technologies. LCEVC is an interesting new video codec standardized by MPEG, with an implementation available from V-NOVA.

Besides promising better video quality or lower bitrate compared to the generation of codecs currently in widespread use (i.e. HEVC), it also promises lower encoding complexity. With ever more focus on sustainability in the streaming field, the latter makes this codec especially interesting.

About LCEVC

LCEVC, or formally MPEG-5 Part 2 Low Complexity Enhancement Video Coding, is a new MPEG codec standard. An implementation of the codec is available from V-NOVA.

LCEVC is an enhancement codec, meaning it uses a base stream encoded with any existing codec together with an LCEVC-specific enhancement stream. The base stream is encoded at a lower resolution, either half vertical and half horizontal resolution (2D scaling mode), or full vertical and half horizontal resolution (1D scaling mode). The enhancement stream contains data that allows the decoder to do a better job of scaling up to full resolution than conventional upscaling algorithms.
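To make the two scaling modes concrete, the small Python sketch below computes the base-layer resolution for a given output resolution. The function name is our own, and the plain integer halving is an assumption that holds for even dimensions such as those used in this evaluation; an actual encoder may handle rounding differently.

```python
def base_resolution(width: int, height: int, scaling_mode: str) -> tuple[int, int]:
    """Return (width, height) of the base layer for the given output size."""
    if scaling_mode == "2D":
        # Half horizontal and half vertical resolution
        return width // 2, height // 2
    if scaling_mode == "1D":
        # Half horizontal resolution, full vertical resolution
        return width // 2, height
    raise ValueError(f"unknown scaling mode: {scaling_mode!r}")


for mode in ("2D", "1D"):
    print(mode, base_resolution(1920, 1080, mode))
```

For a 1920x1080 output, the base encoder therefore works on a quarter of the pixels in 2D mode and half of the pixels in 1D mode.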

The resulting bitstream is backwards compatible in the sense that the base layer can be decoded by a decoder that is not aware of LCEVC.

LCEVC promises

Backwards compatible on the bitstream level
Leverage existing hardware decoders
Better quality/lower bitrate compared to current codec generation
Works with any codec, so it could in theory also improve next-generation codecs (VVC, AV1)

Test method

When evaluating a video codec, there are of course many possible test scenarios using different sources, different codec settings, and different priorities regarding encoding speed/bitrate/quality. For this evaluation, we have limited the scope to what we believe to be a good starting point for evaluating a codec for use in VOD OTT streaming. This means we have used codec settings that prioritise visual quality over encoding speed, and mainly looked at higher resolutions/bitrates. We chose to use x265 as the base codec. LCEVC with x265 as the base codec seems to be the most likely scenario for broad adoption in the current device landscape, with hardware support for HEVC decoding being widespread across all types of devices.

The test clips we have selected represent a range of different challenges for the encoder, but are by no means a complete representation of all relevant content types.

We have left the rate-control performance of LCEVC outside the scope of this evaluation, but hope to be able to look at it in the future. Rate-control performance is of course a very important aspect of a codec when used in real-life scenarios. We have only evaluated performance with 8-bit output.

We compared the following encoding methods. For each, RD curves were produced and BD metrics calculated. The CPU time used for each transcode was measured with GNU time and used as a measure of encoding complexity.

LCEVC-enhanced x265 with LCEVC operating in 2D scaling mode (called LCEVC-2D below)
LCEVC-enhanced x265 with LCEVC operating in 1D scaling mode (called LCEVC-1D below)
Native x265 encoded at full resolution (called x265 below)
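As an illustration of the CPU-time measurement, the sketch below records the user plus system CPU time consumed by a child process. Note that this is a stand-in using Python's os.times() on a Unix-like system, not the GNU time tool we actually used, and the function name is our own.

```python
import os
import subprocess
import sys


def cpu_seconds(cmd: list[str]) -> float:
    """Run cmd and return the user + system CPU seconds used by child processes."""
    before = os.times()
    subprocess.run(cmd, check=True)
    after = os.times()
    return ((after.children_user - before.children_user)
            + (after.children_system - before.children_system))


# Stand-in workload; in an evaluation like ours this would be an ffmpeg command line.
print(cpu_seconds([sys.executable, "-c", "sum(range(10**6))"]))
```

CPU time rather than wall-clock time is the relevant figure here, since it is largely independent of how many cores the encoder happens to use in parallel.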

Transcoding settings

We transcoded to 8-bit yuv420p in all cases. We used CRF rate control for the x265 encodings, and the corresponding LCEVC_PCRF rate control for the LCEVC encodings. We transcoded with CRF values ranging from 22 to 36.

The x265 settings below were used for both native x265 transcoding and LCEVC-enhanced x265 transcoding.

+------------+----------+
| parameter  | value    |
+------------+----------+
| keyint     | 50       |
| min-keyint | 50       |
| preset     | veryslow |
| scenecut   | 0        |
| open-gop   | 0        |
+------------+----------+

For LCEVC, we also applied the following settings. All except ctu, which is an x265 setting, are LCEVC-specific.

+---------------------+----------+
| parameter           | value    |
+---------------------+----------+
| lcevc_preset        | 0        |
| scaling_mode_level0 | 2D / 1D  |
| preset              | veryslow |
| scenecut            | 0        |
| dc_dithering_type   | None     |
| ctu                 | 32       |
+---------------------+----------+

Example command lines

/ffmpeg/ffmpeg -i /video/svtopencontent_forest_lake_10s.y4m \
  -y -vf scale=1920:1080 \
  -c:v libx265 -pix_fmt yuv420p \
  -preset veryslow \
  -x265-params 'crf=26:scenecut=0:keyint=50:min-keyint=50:open-gop=0' \
  -f mp4 \
  /output/svtopencontent_forest_lake_10s/x265_1920x1080/1920x1080_0_CRF_26.mp4

/ffmpeg/ffmpeg -i /video/svtopencontent_forest_lake_10s.y4m \
  -y -vf scale=1920:1080 \
  -c:v lcevc_hevc -g 50 -base_encoder x265 \
  -pix_fmt yuv420p \
  -eil_params 'lcevc_preset=0;rc_pcrf=26;scaling_mode_level0=2D;preset=veryslow;scenecut=0;dc_dithering_type=None;min-keyint=50;open-gop=0;ctu=32' \
  -f mp4 \
  /output/svtopencontent_forest_lake_10s/lcevc_1920x1080_2dscaling/1920x1080_0_RCPCRF_26.mp4
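The native x265 command line above can be generated for the whole CRF range with a short Python sketch like the one below. The flags are copied from the example; the helper name, the plain "ffmpeg" binary name and the CRF step of 2 are assumptions made for illustration.

```python
import shlex


def x265_command(source: str, out_dir: str, crf: int) -> list[str]:
    """Build the ffmpeg argument list for one native x265 encode at the given CRF."""
    params = f"crf={crf}:scenecut=0:keyint=50:min-keyint=50:open-gop=0"
    return [
        "ffmpeg", "-i", source,
        "-y", "-vf", "scale=1920:1080",
        "-c:v", "libx265", "-pix_fmt", "yuv420p",
        "-preset", "veryslow",
        "-x265-params", params,
        "-f", "mp4",
        f"{out_dir}/1920x1080_0_CRF_{crf}.mp4",
    ]


# CRF 22 to 36 as stated in the text; the step of 2 is an assumption.
CRF_VALUES = range(22, 37, 2)
commands = [
    x265_command("/video/svtopencontent_forest_lake_10s.y4m",
                 "/output/svtopencontent_forest_lake_10s/x265_1920x1080", crf)
    for crf in CRF_VALUES
]
for cmd in commands:
    print(shlex.join(cmd))
```

Building the command as an argument list avoids shell quoting problems and makes it easy to pass the same list to a timing wrapper.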

Sources

We decided to use seven 10s clips from the SVT natural complexity test suite, together with a 10s clip from the Blender Foundation's Big Buck Bunny. All encodes used 25fps sources except big_bucks_bunny, which was 24fps.

The following clips were used:

fireside
forest_lake
midnight_sun
smithy
smoke_sauna
waterfall
water_flyover
big_bucks_bunny

For details on the sources and how they were prepared, see section “Source file details” below.

Software

For transcoding, we used ffmpeg binaries provided by V-NOVA. The ffmpeg binary is linked with libx265 version 3.5. We also used libvmaf provided by V-NOVA for the VMAF measurements.

Evaluation of compression efficiency and visual quality

We plotted rate-distortion curves (vmaf/bitrate) for each of the test clips and each transcoding mode LCEVC-2D/LCEVC-1D/x265, and calculated BD-metrics.

Since VMAF scores do not always tell the whole story about visual quality, we also compared visual quality manually. For this subjective visual comparison, for each source and transcoding mode we selected the encoded clip with a VMAF score as close as possible to 90. This means the VMAF scores of the compared clips could differ by one or two points, which we regarded as insignificant.
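The selection step can be sketched in a few lines of Python. The data structure and the VMAF values below are hypothetical and only illustrate the idea of picking, per source and mode, the encode whose VMAF score is closest to the target.

```python
def closest_to_target(encodes: list[dict], target: float = 90.0) -> dict:
    """Return the encode whose VMAF score is closest to the target."""
    return min(encodes, key=lambda e: abs(e["vmaf"] - target))


# Hypothetical results for one source and one transcoding mode.
encodes = [
    {"crf": 26, "vmaf": 93.4},
    {"crf": 28, "vmaf": 91.8},
    {"crf": 30, "vmaf": 89.1},
    {"crf": 32, "vmaf": 85.7},
]
print(closest_to_target(encodes))
```

Because the CRF values are discrete, the selected clips for different modes will generally not land on exactly the same VMAF score.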

Due to time constraints, we did not make a comparison of the subjective quality at matching bitrates. This would of course also have been a relevant comparison.

Results

Subjective Visual Quality

For each test source and encoding mode, we selected the encoded file with the VMAF score closest to 90 and compared those against each other to see if there were any perceived differences in visual quality that were not captured by VMAF. The files selected for comparison are listed below.

It should be noted that we transcoded with LCEVC dithering disabled. According to the documentation, dithering can have a negative impact on the VMAF score but a positive impact on subjective visual quality. We ran some tests with dithering enabled, but did not see any noticeable visual impact except in the ‘smoke_sauna’ clip, where we found that dithering had a visible effect but did not improve quality.

1920x1080

In general, we found that the VMAF scores were reasonably accurate in the sense that the compared files appeared to be close in quality when viewed from a distance of 3H (three times the height of the screen), which is the viewing distance VMAF is tuned for. However, we also found that the LCEVC encodes displayed some minor visual artifacts not present in the HEVC encodes that were just noticeable when viewed from a distance of 3H. Staircase artifacts were visible along some diagonal edges, and ringing artifacts and noise were present in some cases. In some cases there was also a shift in color/brightness, with some areas in the LCEVC encode appearing brighter than in the original source. In most cases we did not see a significant difference in visual quality between LCEVC-2D and LCEVC-1D.

As noted above, we compared files with similar VMAF score, so the bitrates of the compared files were significantly lower for LCEVC compared to HEVC.

The CRF values used for the compared files are listed in the table below together with VMAF and bitrate (bitrates in kbit/s).

+-----------------+-----------------------+------------------------------+------------------------------+
| Source          | x265 crf/bitrate/vmaf | LCEVC-1D rcpcrf/bitrate/vmaf | LCEVC-2D rcpcrf/bitrate/vmaf |
+-----------------+-----------------------+------------------------------+------------------------------+
| fireside        | 30 / 1080 / 90.96     | 32 / 761 / 88.5              | 32 / 786 / 91.71             |
| forest_lake     | 28 / 18224 / 91.8     | 30 / 17243 / 92.8            | 28 / 13372 / 90.11           |
| smithy          | 28 / 8640 / 90.28     | 30 / 8339 / 91.58            | 30 / 7248 / 88.95            |
| smoke_sauna     | 28 / 981 / 89.7       | 30 / 760 / 90.17             | 30 / 592 / 88.47             |
| waterfall       | 30 / 14077 / 91.25    | 32 / 10062 / 86.54           | 30 / 10670 / 88.82           |
| water_flyover   | 32 / 12726 / 94.2     | 32 / 11533 / 92.59           | 32 / 7370 / 87.72            |
| big_bucks_bunny | 30 / 724 / 90.26      | 32 / 540 / 89.02             | 30 / 552 / 90.90             |
| midnight_sun    | 28 / 15006 / 90.20    | 30 / 12323 / 88.99           | 28 / 11653 / 89.28           |
+-----------------+-----------------------+------------------------------+------------------------------+
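To put the bitrates in the table into perspective, the Python sketch below computes the bitrate reduction of the LCEVC-2D file relative to the x265 file for each source, using the bitrates from the table. Since the VMAF scores of the compared files are similar but not identical, the resulting percentages should be read as rough indications only, not as a substitute for the BD-rate figures.

```python
# (x265 bitrate, LCEVC-2D bitrate) in kbit/s, taken from the table above.
BITRATES = {
    "fireside": (1080, 786),
    "forest_lake": (18224, 13372),
    "smithy": (8640, 7248),
    "smoke_sauna": (981, 592),
    "waterfall": (14077, 10670),
    "water_flyover": (12726, 7370),
    "big_bucks_bunny": (724, 552),
    "midnight_sun": (15006, 11653),
}


def saving_percent(anchor_kbps: float, test_kbps: float) -> float:
    """Bitrate reduction of the test encode relative to the anchor, in percent."""
    return (1 - test_kbps / anchor_kbps) * 100


for source, (x265_kbps, lcevc_2d_kbps) in BITRATES.items():
    print(f"{source:16s} {saving_percent(x265_kbps, lcevc_2d_kbps):5.1f} %")
```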

big_bucks_bunny
Close in quality, perhaps slightly sharper details for x265.

fireside
x265 somewhat better, some staircasing visible in flame edges for LCEVC.

forest_lake
LCEVC-2D lower quality than x265 with visible noise in water. LCEVC-1D and x265 appear close in quality.

midnight_sun
x265 somewhat better, with staircasing artifacts visible in foreground branches and wind turbine blades for LCEVC. LCEVC-1D slightly sharper details in water compared to LCEVC-2D.

smithy
LCEVC seems to have slightly sharper details in the smithy. LCEVC has a visible shift in brightness in the foliage, especially visible on the left side.

smoke_sauna
x265 somewhat better. LCEVC appears to have some loss of texture in the smoke, some staircasing in the flames, and some ringing in the edge above the flames.

waterfall
x265 somewhat better with slightly sharper details. LCEVC displays some staircasing artifacts in the branches on the right, and a shift in brightness in the foliage on the left.

water_flyover
No perceived difference in quality.

Sample screenshots

Some screenshots are available for the reader to compare the visual quality. It is important to note that many of the visual artifacts are in areas with fast movement, and because of that are more noticeable in a still picture than in the video. The timestamps of the screenshots have been selected to make them representative of the overall impression of the visual quality of the videos. The screenshots have been extracted from the encodes we used for the subjective visual quality comparison; for details of encoding parameters, bitrates etc., see above.

fireside (t=0.88)

forest_lake (t=1.28)

midnight_sun (t=1.0)

smithy (t=1.0)

smoke_sauna (t=2.8)

waterfall (t=0.68)

water_flyover (t=0.68)

big_bucks_bunny (t=2.35)

Sample videos

The videos available below are those we used for the subjective visual quality comparison; for details of encoding parameters, bitrates etc., see above. The videos representing the LCEVC-enhanced x265 encodes have been transcoded to HEVC with x265 in lossless mode [1], which means they should be visually identical to the actual LCEVC encodes, but are much larger in size.

  Source            x265                                                                                                   LCEVC-1D                                                                                                      LCEVC-2D                                                                                                     
 ----------------- ------------------------------------------------------------------------------------------------------ ------------------------------------------------------------------------------------------------------------- ------------------------------------------------------------------------------------------------------------- 
  fireside          https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_fireside_10s_x265.mp4           https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_fireside_10s_1d_lossless.mp4           https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_fireside_10s_2d_lossless.mp4          
  forest_lake       https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_forest_lake_10s_x265.mp4        https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_forest_lake_10s_1d_lossless.mp4        https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_forest_lake_10s_2d_lossless.mp4       
  midnight_sun      https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_midnight_sun_10s_x265.mp4       https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_midnight_sun_10s_1d_lossless.mp4       https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_midnight_sun_10s_2d_lossless.mp4      
  smithy            https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smithy_10s_x265.mp4             https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smithy_10s_1d_lossless.mp4             https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smithy_10s_2d_lossless.mp4            
  smoke_sauna       https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smoke_sauna_10s_x265.mp4        https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smoke_sauna_10s_1d_lossless.mp4        https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_smoke_sauna_10s_2d_lossless.mp4       
  waterfall         https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_waterfall_10s_x265.mp4          https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_waterfall_10s_1d_lossless.mp4          https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_waterfall_10s_2d_lossless.mp4         
  water_flyover     https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_water_flyover_10s_x265.mp4      https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_water_flyover_10s_1d_lossless.mp4      https://testcontent.eyevinn.technology/lcevc-evaluation/svtopencontent_water_flyover_10s_2d_lossless.mp4     
  big_bucks_bunny   https://testcontent.eyevinn.technology/lcevc-evaluation/big_buck_bunny_1080p24_10s_from_170_x265.mp4   https://testcontent.eyevinn.technology/lcevc-evaluation/big_buck_bunny_1080p24_10s_from_170_1d_lossless.mp4   https://testcontent.eyevinn.technology/lcevc-evaluation/big_buck_bunny_1080p24_10s_from_170_2d_lossless.mp4

1280x720

The general trend was the same as for 1920x1080, although the visual differences between HEVC and LCEVC were somewhat larger than what we saw for 1920x1080. Also, in most cases there was a noticeable difference between LCEVC-2D and LCEVC-1D, with the latter being superior in quality.

3840x2160

Due to time constraints, we unfortunately were unable to do a visual comparison of the 3840x2160 encodes.

Compression efficiency

We used the measured VMAF score together with bitrate to estimate compression efficiency of LCEVC-enhanced x265 compared to native x265.

Rate-Distortion Curves

BD-metrics

We calculated Bjøntegaard-Delta (BD) metrics with the bjontegaard Python library, using piecewise cubic Hermite interpolation. The metrics compare LCEVC with 2D scaling against x265. It should be noted, though, that in a few cases at HD resolution LCEVC with 1D scaling performed better than LCEVC with 2D scaling.
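For readers unfamiliar with BD-rate, the pure-Python sketch below shows the principle: the logarithm of the bitrate is interpolated as a function of quality for both encoders, and the average difference over the overlapping quality range is converted to a percentage. Note that this sketch uses simple piecewise-linear interpolation instead of the piecewise cubic Hermite interpolation of the bjontegaard library, so it will not reproduce our numbers exactly. The function names and RD points are hypothetical.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation of ys over strictly ascending xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"{x} is outside the measured range")


def _integral(xs, ys, lo, hi):
    """Exact integral of the piecewise-linear curve over [lo, hi]."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    vals = [_interp(k, xs, ys) for k in knots]
    return sum((b - a) * (va + vb) / 2
               for a, b, va, vb in zip(knots, knots[1:], vals, vals[1:]))


def bd_rate(anchor, test):
    """BD-rate in percent; anchor and test are lists of (bitrate, vmaf) points."""
    aq, ar = zip(*sorted((q, math.log(r)) for r, q in anchor))
    tq, tr = zip(*sorted((q, math.log(r)) for r, q in test))
    lo, hi = max(aq[0], tq[0]), min(aq[-1], tq[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    avg_log_diff = (_integral(tq, tr, lo, hi) - _integral(aq, ar, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1) * 100


# Hypothetical RD points (bitrate in kbit/s, VMAF); not measured data.
x265_points = [(2000, 80.0), (4000, 88.0), (8000, 94.0), (16000, 97.0)]
# A test encoder reaching the same VMAF at 70% of the bitrate.
lcevc_points = [(0.7 * r, q) for r, q in x265_points]
print(round(bd_rate(x265_points, lcevc_points), 2))
```

A negative BD-rate means the test encoder needs less bitrate than the anchor for the same quality score, which is how the tables in this section should be read.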

HD resolution (1280x720)

+=====================================+=============+=========+
| source file                         | bd-rate (%) | bd-vmaf |
+=====================================+=============+=========+
| svtopencontent_fireside_10s         | -30.48      | 5.94    |
| svtopencontent_forest_lake_10s      | -31.11      | 7.11    |
| svtopencontent_smithy_10s           | -30.10      | 3.82    |
| svtopencontent_smoke_sauna_10s      | -31.49      | 5.50    |
| svtopencontent_waterfall_10s        | -20.55      | 5.21    |
| svtopencontent_water_flyover_10s    | -1.42       | 0.10    |
| big_buck_bunny_1080p24_10s_from_170 | -25.78      | 4.42    |
| svtopencontent_midnight_sun_10s     | -22.12      | 5.21    |
+-------------------------------------+-------------+---------+

Full HD resolution (1920x1080)

+=====================================+=============+=========+
| source file                         | bd-rate (%) | bd-vmaf |
+=====================================+=============+=========+
| svtopencontent_fireside_10s         | -32.25      | 4.64    |
| svtopencontent_forest_lake_10s      | -20.48      | 4.22    |
| svtopencontent_smithy_10s           | -10.69      | 1.00    |
| svtopencontent_smoke_sauna_10s      | -33.44      | 4.58    |
| svtopencontent_waterfall_10s        | -16.58      | 3.15    |
| svtopencontent_water_flyover_10s    | -18.15      | 2.31    |
| big_buck_bunny_1080p24_10s_from_170 | -27.39      | 3.42    |
| svtopencontent_midnight_sun_10s     | -20.04      | 4.23    |
+-------------------------------------+-------------+---------+

UHD resolution (3840x2160)

+==================================+=============+=========+
| source file                      | bd-rate (%) | bd-vmaf |
+==================================+=============+=========+
| svtopencontent_fireside_10s      | -41.09      | 4.04    |
| svtopencontent_forest_lake_10s   | -26.46      | 3.42    |
| svtopencontent_smithy_10s        | -30.53      | 2.90    |
| svtopencontent_smoke_sauna_10s   | -45.68      | 4.20    |
| svtopencontent_waterfall_10s     | -25.55      | 2.64    |
| svtopencontent_water_flyover_10s | -23.74      | 1.35    |
| svtopencontent_midnight_sun_10s  | -26.81      | 4.29    |
+----------------------------------+-------------+---------+

Encoding Complexity

We calculated the relative encoding complexity for each source file based on the CPU time used for each encoding. We used linear interpolation to approximate the CPU time used for a given VMAF score, and from that calculated the relative encoding complexity of LCEVC-1D and LCEVC-2D compared to x265. The mean relative complexity was calculated for each source file.

+=====================================+==========+==============+==============+
| source file                         | cpu x265 | cpu LCEVC 1D | cpu LCEVC 2D |
+=====================================+==========+==============+==============+
| svtopencontent_fireside_10s         | 1.00     | 0.71         | 0.36         |
| svtopencontent_forest_lake_10s      | 1.00     | 0.69         | 0.34         |
| svtopencontent_smithy_10s           | 1.00     | 0.72         | 0.40         |
| svtopencontent_smoke_sauna_10s      | 1.00     | 0.65         | 0.35         |
| svtopencontent_waterfall_10s        | 1.00     | 0.63         | 0.35         |
| svtopencontent_water_flyover_10s    | 1.00     | 0.71         | 0.33         |
| big_buck_bunny_1080p24_10s_from_170 | 1.00     | 0.70         | 0.37         |
| svtopencontent_midnight_sun_10s     | 1.00     | 0.63         | 0.36         |
+-------------------------------------+----------+--------------+--------------+
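The interpolation behind the table above can be sketched as follows. The CPU time is linearly interpolated as a function of VMAF for each encoder, and the ratio between the two is averaged over a set of VMAF scores. The function names, the measurement points and the choice of VMAF scores to average over are assumptions made for illustration.

```python
def interpolate(x, points):
    """Linearly interpolate y at x from a list of (x, y) points."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError(f"{x} is outside the measured range")


def relative_complexity(anchor, test, vmaf_scores):
    """Mean ratio of test to anchor CPU time at the given VMAF scores."""
    ratios = [interpolate(v, test) / interpolate(v, anchor) for v in vmaf_scores]
    return sum(ratios) / len(ratios)


# Hypothetical (VMAF, CPU seconds) measurements; not data from this evaluation.
x265_cpu = [(80.0, 100.0), (90.0, 200.0), (95.0, 300.0)]
lcevc_2d_cpu = [(80.0, 40.0), (90.0, 80.0), (95.0, 120.0)]
print(relative_complexity(x265_cpu, lcevc_2d_cpu, [82.0, 88.0, 93.0]))
```

Comparing CPU time at equal VMAF rather than at equal CRF matters because the same CRF value does not yield the same quality for the different encoding modes.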

Conclusions

We found the encoding complexity of LCEVC-enhanced x265 operating in 2D scaling mode to be less than 40% of that of native x265, which is a very significant reduction. With LCEVC operating in 1D scaling mode, the encoding complexity was roughly 60–70% of that of native x265.

The compression gains measured with VMAF as the indicator of visual quality were also impressive, being in the range of roughly 10–35% for 1920x1080 encodes. However, when comparing encoded videos of equal VMAF score (~90), we found the general trend to be that the visual quality of the LCEVC encodes was slightly below that of native x265, with some just noticeable artifacts being present. Most common was staircasing in diagonal edges, which was visible in roughly half of the videos.
In two of the videos there was also a slight shift in brightness in some areas compared to the source video. In one of these cases (waterfall) the shift in brightness did not have a big impact on the visual quality; in the other case (smithy) the result was somewhat unpleasant visually.
In some cases other artifacts such as noise and ringing were just noticeable.

For 1920x1080 encodes, we found LCEVC-2D and LCEVC-1D to be very close in visual quality in most cases, although in at least one case (forest_lake) LCEVC-1D had better quality. This leads us to believe that care must be taken when selecting between 2D and 1D scaling mode, since there is a relevant difference in encoding complexity and compression efficiency, while visual quality was close in most but not all cases.

For 1280x720, we found LCEVC-1D to have better visual quality than LCEVC-2D in general, leading us to believe that 1D scaling mode might be a better choice in this case.

Overall, our impression is that in real-world use cases the compression gains compared to native x265 are probably somewhat lower than what is indicated by the VMAF-based BD-rate measurements, although additional investigation would be needed to find out how big this difference is.

It should also be noted that we did not test the rate-control performance of LCEVC-enhanced x265, which would also be an important factor for real-world use cases.

For more information about this evaluation contact Gustav Grusell at Eyevinn Technology. Eyevinn Technology are vendor independent experts on sustainable video streaming. Contact us at sales@eyevinn.se if you want to know more on how we can help you reduce your service’s carbon footprint.

Source file details

SVT Open Content

The content of the test clips from natural complexity is described in some detail here. We extracted 10s clips from the file natural_complexity_JPEG2000_SDR_3840x2160p50_YUV444_12bit_Lossless.mov[2] with the script below.

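#!/usr/bin/env bash

input=natural_complexity_JPEG2000_SDR_3840x2160p50_YUV444_12bit_Lossless.mov

# One name per consecutive 10 s segment of the source file
names=(waterfall smoke_sauna forest_lake midnight_sun smithy water_flyover fireside)
for i in "${!names[@]}"
do
    ss=$((i*10))  # start offset in seconds
    # Stream copy (no re-encode) of the 10 s segment starting at $ss
    ffmpeg -y -ss "$ss" -i "$input" -t 10 -c:v copy "svtopencontent_${names[$i]}_10s.mov"
done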

The clips were transcoded to 25fps raw YUV (y4m) to use as input for our encodes, as shown below. We chose to use raw YUV as input since the source files are quite heavy to decode.

for f in *.mov; do ffmpeg -i "$f" -vf fps=25 -strict -1 "${f%.mov}.y4m"; done

Big Buck Bunny

A 10s clip extracted from Big Buck Bunny[3] was also used. It was extracted with the command below.

ffmpeg -ss 170 -i big_buck_bunny_1080p24.y4m -t 10 big_buck_bunny_1080p24_10s_from_170.y4m

[1] The command line used was `ffmpeg -y -c:v lcevc_hevc -i INPUT.mp4 -c:v libx265 -preset slow -x265-params 'lossless=1' OUTPUT.mp4`
[2] ftp://svtopencontent.svt.se/pub/svt_videotestsuite_natural_complexity/REC709/natural_complexity_JPEG2000_SDR_3840x2160p50_YUV444_12bit_Lossless.mov
[3] https://media.xiph.org/video/derf/y4m/big_buck_bunny_1080p24.y4m.xz