How to: Encode video with FFMPEG using NVENC

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Anarion, Dec 10, 2016.

Anarion Ancient Guru
Messages: 13,599 | Likes Received: 387 | GPU: GeForce RTX 3060 Ti

So I just noticed that FFMPEG supports NVENC now. Actually it has supported it for a while, but it's now enabled in the Zeranoe FFMPEG builds. You'll need to download FFMPEG for this to work, obviously, so grab yourself the latest 64-bit release build.

Now that you have it, extract the ffmpeg binaries to some folder. Also create an empty Input folder in the same place and put your source files in it.

Then create encode.cmd (or whatever you want to name it) in that folder and copy & paste the following into it:

Code:

@echo OFF

SET input_folder=Input
SET output_folder=Output
SET ffmpeg_path=ffmpeg.exe

if not exist "%output_folder%" mkdir "%output_folder%"

REM Settings
SET ext=mkv
SET format=matroska
REM SET videofilter=-pix_fmt yuv444p
REM SET resolution=-sws_flags lanczos -s 1280x720
SET encoder=h264_nvenc
REM SET encoder=hevc_nvenc
SET preset=hq
SET cq=20
SET sample=-sample_fmt s16
SET khz=-ar 48000
REM SET audiofilter=-af aresample=resampler=soxr:precision=28:dither_method=shibata %sample% %khz%
SET videoencoder=-c:v %encoder% -rc constqp -global_quality %cq% -preset %preset% -rc-lookahead 32 -g 600
SET audioencoder=-c:a flac -compression_level 12
REM Settings end

SET params=-i "%%~f" -map_metadata -1 %resolution% %videofilter% %audiofilter% %audioencoder% %videoencoder% -f %format% "%output_folder%\%%~nf.%ext%"

FOR %%f IN (%input_folder%\*.*) DO (
    IF EXIST "%output_folder%\%%~nf.%ext%" (
        echo.
        echo *************************************
        echo Deleting: %output_folder%\%%~nf.%ext%
        echo *************************************
        echo.
        del /F "%output_folder%\%%~nf.%ext%"
    )
    "%ffmpeg_path%" %params%
)

echo.
echo *************************************
echo Done!
echo *************************************
echo.
pause

That script will automatically process every file in your Input folder and create an Output folder for the new files.

A few things to note:

REM comments out a line, so if you want to change the encoder to hevc_nvenc (H.265/HEVC), add REM before SET encoder=h264_nvenc and remove the REM before SET encoder=hevc_nvenc.

If you want lossless encoding, use preset=lossless. cq=number controls quality; a lower number means better quality. -rc constqp enables constant QP (constant quantizer) rate control, which works like a constant quality mode. In my opinion it is really, really handy and I always use it over fixed bitrate modes. It's really great to see that NVENC supports this mode, and on top of that it even supports lossless encoding and the yuv444p format. NVENC's constant quality mode also works surprisingly well, quality-wise.

You can also play with the -temporal-aq 1 switch (works for AVC) and the -spatial_aq 1 switch (works for HEVC). Add them after -preset %preset%. For AVC you can enable B-frames with the -bf switch (-b on its own is ffmpeg's bitrate option). NVIDIA recommended three B-frames for optimal quality in one of their PDFs (switch: -bf 3).

In this example the GOP size (-g) is 600. You can adjust it manually for optimal results: target framerate x 10, so -g 600 for 60 fps.
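As a quick sanity check on that rule of thumb, here is a minimal Python sketch (not part of the original script; gop_size is a hypothetical helper name) that turns a frame rate into a -g value. It assumes the "framerate x 10" rule, i.e. roughly one keyframe every ten seconds:

```python
def gop_size(fps, seconds=10.0):
    """Return a GOP length (for ffmpeg's -g) covering `seconds` of video at `fps`."""
    # Round to a whole number of frames and never go below 1.
    return max(1, round(fps * seconds))


if __name__ == "__main__":
    for fps in (24, 30, 60):
        print(f"{fps} fps: -g {gop_size(fps)}")
```

For 60 fps this gives 600, the value hard-coded in the batch script.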

I've added a bunch of other settings there too but commented them out. They are pretty self-explanatory. However, if your source material is lossless RGB and you want the absolute best quality, use preset=lossless and uncomment SET videofilter=-pix_fmt yuv444p

In this example script the container is MKV, the audio codec is FLAC and the video codec is H.264/AVC.
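
For anyone who would rather not use a batch file, the same command line can be assembled in Python. This is only a sketch under assumptions: build_command is a made-up helper, it uses ffmpeg's -y overwrite flag instead of deleting old output first, and it covers just the settings that are not commented out in the script above.

```python
from pathlib import Path


def build_command(src, out_dir="Output", encoder="h264_nvenc",
                  preset="hq", cq=20, gop=600, ffmpeg="ffmpeg"):
    """Build the ffmpeg argument list the batch script would run for one file."""
    src = Path(src)
    dst = Path(out_dir) / f"{src.stem}.mkv"
    return [
        ffmpeg,
        "-y",                                   # overwrite existing output
        "-i", str(src),
        "-map_metadata", "-1",                  # drop source metadata
        "-c:a", "flac", "-compression_level", "12",
        "-c:v", encoder,
        "-rc", "constqp", "-global_quality", str(cq),
        "-preset", preset,
        "-rc-lookahead", "32",
        "-g", str(gop),
        "-f", "matroska",
        str(dst),
    ]


if __name__ == "__main__":
    # Print one command per file in Input; pass each list to
    # subprocess.run() to actually encode.
    input_dir = Path("Input")
    if input_dir.is_dir():
        for source in sorted(input_dir.glob("*")):
            print(" ".join(build_command(source)))
```

Switching to HEVC is then just build_command(path, encoder="hevc_nvenc").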
Last edited: Dec 13, 2016

Anarion, Dec 10, 2016

#1
CrazyBaldhead Guest

Useful, thanks.

CrazyBaldhead, Dec 10, 2016

#2
JohnLai Guest
Messages: 136 | Likes Received: 7 | GPU: ASUS GTX 970 3.5+0.5GB

.....You are missing:

rc-lookahead (up to 32 frames)
spatial_aq (for H264 and HEVC)
temporal_aq (for H264 only) [Pick one between spatial or temporal adaptive quantization, it works with CQP rate control too, nvidia confirmed it]

Next...you forgot to use hardware accelerated DECODING, known as CUVID...

Then...you forgot to use the high quality hardware accelerated NVIDIA Performance Primitives (NPP) RESIZER for resizing (nearest neighbour, linear, cubic, cubic2p_bspline, cubic2p_catmullrom, cubic2p_b05c03, supersampling, lanczos)

JohnLai, Dec 10, 2016

#3
Anarion Ancient Guru

No, I didn't forget. With these settings rc-lookahead does absolutely nothing. Try it.

There's a reason why I didn't include spatial_aq and temporal_aq. It's simpler to adjust the constant QP setting instead: spatial_aq and temporal_aq increase the file size (and thus the quality), but you can get the same effect by changing the constant QP value. This is something that one can play with, and it's a matter of preference. The source material might also make a difference.

By the way, -temporal-aq 1 switch works for AVC and -spatial_aq 1 works for HEVC, at least with the build I have.

No, I didn't forget to accelerate the decoding; I simply chose not to. Why? No freaking point, since the whole idea was to use a high quality lossless source file, like something encoded with the UtVideo codec. Try decoding that with hardware.... Besides... Keep reading....

And when it comes to libnpp...

Code:

configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib

Uh ohh... Find me --enable-cuvid and --enable-libnpp. You can't? Neither can I. For decoding you should use DXVA2.
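
Rather than eyeballing that configuration line, you can check it programmatically. A minimal sketch, assuming you paste in the configuration text that ffmpeg prints in its banner (missing_features is a hypothetical helper, and the sample string here is shortened):

```python
def missing_features(configuration, wanted=("--enable-cuvid", "--enable-libnpp")):
    """Return the configure flags from `wanted` that the build does not list."""
    flags = set(configuration.split())
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    # Shortened sample of the configuration line quoted in this post.
    sample = "--enable-gpl --enable-version3 --enable-dxva2 --enable-nvenc --enable-libx264"
    print(missing_features(sample))
```

An empty list would mean the build has both CUVID decoding and the NPP scaler compiled in.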
Last edited: Dec 10, 2016

Anarion, Dec 10, 2016

#4
JohnLai Guest

[@Anarion]
rc-lookahead enables adaptive GOP (adaptive I and B frame insertion/placement ; I for hevc and I/B for H264) (Scenechange behaviour)

In case of AQ mode, it tends to provide better efficiency per bitrate. (For nvenc usage though, it adds 20-25% more space usage if this is what you mean) Note, do not turn on AQ for film grain material, it is a nightmare where file size is insane.

High quality lossless source decoding? Can't argue with this since cuvid only outputs surface format in NV12 before converting it to RGBA.......

Cuvid and libnpp ---> maybe compile ffmpeg source https://github.com/jb-alvarado/media-autobuild_suite

Or simply use Rigaya transcoder? https://drive.google.com/drive/folders/0BzA4dIFteM2dS1ZUT1FjTnF3Q0E
You need to extract the NPP library to the same location as the NVEncC executable.
Here is an English translation of its help output:

NVEncC (x64) 3.01 by rigaya [NVENC API v7.0], build Oct 11 2016 22:02:59
reader: raw, avs, vpy, avcuvid [H.264/AVC, MPEG1, MPEG2]
Usage: NVEncC.exe [Options] -i <input file> -o <output file>

Input can be avs, raw YUV, YUV4MPEG2(y4m).
When Input is in raw format, fps and input-res are required.

Output format will be in raw H.264/AVC or H.265/HEVC ES.

Example:
NVEncC -i "<avsfilename>" -o "<outfilename>"
avs2pipemod -y4mp "<avsfile>" | NVEncC --y4m -i - -o "<outfilename>"

Information Options:
-h,-? --help print help
-v,--version print version info
--check-device show DeviceId for GPUs available on system
--check-hw [<int>] check NVEnc codecs for specified DeviceId
if unset, will check DeviceId #0
--check-features [<int>] check for NVEnc Features for specified DeviceId
if unset, will check DeviceId #0
--check-environment check for Environment Info
--check-avversion show dll version
--check-codecs show codecs available
--check-encoders show audio encoders available
--check-decoders show audio decoders available
--check-formats show in/out formats available
--check-protocols show in/out protocols available
--check-filters show filters available

Basic Encoding Options:
-d,--device <int> set DeviceId used in NVEnc (default:0)

-i,--input <filename> set input filename
-o,--output <filename> set output filename

Input formats (auto detected from extension if not set)
--raw set input as raw format
--y4m set input as y4m format
--avs set input as avs format
--vpy set input as vpy format
--vpy-mt set input as vpy(mt) format
--avcuvid [<string>] use libavformat + cuvid for input
this enables full hw transcode and resize.
avcuvid mode could be set as a option
- native (default)
- cuda
--avsw set input to use avcodec + sw decoder
--input-analyze <int> set time (sec) which reader analyze input file.
default: 5 (seconds).
could be only used with avcuvid/avsw reader.
use if reader fails to detect audio stream.
--video-track <int> set video track to encode in track id
1 (default) highest resolution video track
2 next high resolution video track
...
-1 lowest resolution video track
-2 next low resolution video track
...
--video-streamid <int> set video track to encode in stream id
--audio-source <string> input extra audio file
--audio-file [<int>?][<string>:]<string>
extract audio into file.
could be only used with avcuvid/avsw reader.
below are optional,
in [<int>?], specify track number to extract.
in [<string>?], specify output format.
--trim <int>:<int>[,<int>:<int>]...
trim video for the frame range specified.
frame ranges should not overlap each other.
--seek [<int>:][<int>:]<int>[.<int>] (hh:mm:ss.ms)
skip video for the time specified,
seek will be inaccurate but fast.
--input-format <string> set input format of input file.
this requires use of avcuvid/avsw reader.
-f,--output-format <string> set output format of output file.
if format is not specified, output format will
be guessed from output file extension.
set "raw" for H.264/ES output.
--audio-copy [<int>[,...]] mux audio with video during output.
could be only used with
avcuvid/avsw reader and avcodec muxer.
by default copies all audio tracks.
"--audio-copy 1,2" will extract
audio track #1 and #2.
--audio-codec [<int>?]<string>
encode audio to specified format.
in [<int>?], specify track number to encode.
--audio-bitrate [<int>?]<int>
set encode bitrate for audio (kbps).
in [<int>?], specify track number of audio.
--audio-ignore-decode-error <int> (default: 10)
set numbers of continuous packets of audio decode
error to ignore, replaced by silence.
--audio-ignore-notrack-error ignore error when audio track is unfound.
--audio-samplerate [<int>?]<int>
set sampling rate for audio (Hz).
in [<int>?], specify track number of audio.
--audio-resampler <string> set audio resampler.
swr (swresampler: default), soxr (libsoxr)
--audio-stream [<int>?][<string1>][:<string2>][,[<string1>][:<string2>]][..
set audio streams in channels.
in [<int>?], specify track number to split.
in <string1>, set input channels to use from source stream.
if unset, all input channels will be used.
in <string2>, set output channels to mix.
if unset, all input channels will be copied without mixing.
example1: --audio-stream FL,FR
splitting dual mono audio to each stream.
example2: --audio-stream :stereo
mixing input channels to stereo.
example3: --audio-stream 5.1,5.1:stereo
keeping 5.1ch audio and also adding downmixed stereo stream.
usable symbols
mono = FC
stereo = FL + FR
2.1 = FL + FR + LFE
3.0 = FL + FR + FC
3.0(back) = FL + FR + BC
3.1 = FL + FR + FC + LFE
4.0 = FL + FR + FC + BC
quad = FL + FR + BL + BR
quad(side) = FL + FR + SL + SR
5.0 = FL + FR + FC + SL + SR
5.1 = FL + FR + FC + LFE + SL + SR
6.0 = FL + FR + FC + BC + SL + SR
6.0(front) = FL + FR + FLC + FRC + SL + SR
hexagonal = FL + FR + FC + BL + BR + BC
6.1 = FL + FR + FC + LFE + BC + SL + SR
6.1(front) = FL + FR + LFE + FLC + FRC + SL + SR
7.0 = FL + FR + FC + BL + BR + SL + SR
7.0(front) = FL + FR + FC + FLC + FRC + SL + SR
7.1 = FL + FR + FC + LFE + BL + BR + SL + SR
7.1(wide) = FL + FR + FC + LFE + FLC + FRC + SL + SR
--audio-filter [<int>?]<string>
set audio filter.
in [<int>?], specify track number of audio.
--chapter-copy copy chapter to output file.
--chapter <string> set chapter from file specified.
--sub-copy [<int>[,...]] copy subtitle to output file.
these could be only used with
avcuvid/avsw reader and avcodec muxer.
below are optional,
in [<int>?], specify track number to copy.

--avsync <string> method for AV sync (default: through)
through ... assume cfr, no check but fast
forcecfr ... check timestamp and force cfr.
-m,--mux-option <string1>:<string2>
set muxer option name and value.
these could be only used with
avcuvid/avsw reader and avcodec muxer.
--input-res <int>x<int> set input resolution
--crop <int>,<int>,<int>,<int> crop pixels from left,top,right,bottom
left crop is unavailable with avcuvid reader
--output-res <int>x<int> set output resolution
--fps <int>/<int> or <float> set framerate

-c,--codec <string> set output codec
h264 (or avc), h265 (or hevc)
--profile <string> set codec profile
H.264: baseline, main, high(default), high444
HEVC : main, main10, main444
--level <string> set codec level
- H.264: auto(default), 1, 1b, 1.1, 1.2, 1.3
2, 2.1, 2.2, 3, 3.1, 3.2, 4, 4.1, 4.2
5, 5.1, 5.2
- HEVC: auto(default), 1, 2, 2.1, 3, 3.1, 4
4.1, 5, 5.1, 5.2, 6, 6.1, 6.2
--output-depth <int> set output bit depth ( 8(default), 10 )
--sar <int>:<int> set SAR ratio
--dar <int>:<int> set DAR ratio

--cqp <int> or encode in Constant QP mode
<int>:<int>:<int> Default: <I>:<P>:<B>=<20>:<23>:<25>
--vbr <int> set bitrate for VBR mode (kbps)
--vbr2 <int> set bitrate for VBR2 mode (kbps)
--cbr <int> set bitrate for CBR mode (kbps)
Default: 7500 kbps

--vbr-quality <int> set target quality for VBR mode (0-51, 0 = auto)
--max-bitrate <int> set Max Bitrate (kbps) / Default: 17500 kbps
--qp-init <int> or set initial QP
<int>:<int>:<int> Default: auto
--qp-max <int> or set max QP
<int>:<int>:<int> Default: unset
--qp-min <int> or set min QP
<int>:<int>:<int> Default: unset
--lookahead <int> enable lookahead and set lookahead depth (1-32)
Default: 16 frames
--gop-len <int> set GOP Length / Default: 0 frames
--strict-gop avoid GOP len fluctuation
--no-i-adapt disable adapt. I frame insertion on lookahead mode
--no-b-adapt disable adapt. B frame insertion on lookahead mode
Default: off
-b,--bframes <int> set B frames / Default 1074913776 frames
--ref <int> set Ref frames / Default 3 frames
--enable-ltr enable LTR (Long Term Reference pictures)
--aq enable spatial adaptive quantization
--aq-temporal enable temporal adaptive quantization (FOR H.264 ONLY)
--aq-strength <int> set aq strength (weak 1 - 15 strong)
FOR H.264 ONLY, Default: auto(= 0)
--mv-precision <string> set MV Precision / Default: Q-pel
Q-pel (High Quality)
half-pel
full-pel (Low Quality)
--vbv-bufsize <int> set vbv buffer size (kbit) / Default: auto
--vpp-deinterlace <string> set deinterlace mode / Default: none
none, bob, adaptive (normal)
available only with avcuvid reader
--vpp-resize <string> default, nn, npp_linear, cubic
cubic_bspline, cubic_catmull, cubic_b05c03
super, lanczos, bilinear, spline36
default: default
--vpp-gauss <int> disabled, 3, 5, 7
default: disabled
--vpp-knn [<param1>=<value>][,<param2>=<value>][...] enable denoise filter by K-nearest neighbor.
params
radius=<int> set radius of knn (default=3)
strength=<float> set strength of knn (default=0.08, 0.0-1.0)
lerp=<float> set balance of orig and blended pixel (default=0.20)
lower value results strong denoise.
th_lerp=<float> set threshold for detecting edge (default=0.80, 0.0-1.0)
higher value will preserve edge.
--vpp-pmd [<param1>=<value>][,<param2>=<value>][...] enable denoise filter by pmd.
params
apply_count=<int> set count to apply pmd denoise (default=2)
strength=<float> set strength of pmd (default=100.00, 0.0-100.0)
threshold=<float> set threshold of pmd (default=100.00, 0.0-255.0)
lower value will preserve edge.
--vpp-delogo <string> set delogo file path
--vpp-delogo-select <string> set target logo name or auto select file
or logo index starting from 1.
--vpp-delogo-pos <int>:<int> set delogo pos offset
--vpp-delogo-depth <int> set delogo depth [default:16]
--vpp-delogo-y <int> set delogo y param
--vpp-delogo-cb <int> set delogo cb param
--vpp-delogo-cr <int> set delogo cr param
--vpp-perf-monitor check duration of each filter.
may decrease overall transcode performance.
--videoformat <string> undef, ntsc, component, pal, secam, mac
default: undef
--colormatrix <string> undef, auto, bt709, smpte170m, bt470bg
smpte240m, YCgCo, fcc, GBR, bt2020nc
bt2020c
default: undef
--colorprim <string> undef, auto, bt709, smpte170m, bt470m
bt470bg, smpte240m, film, bt2020
default: undef
--transfer <string> undef, auto, bt709, smpte170m, bt470m
bt470bg, smpte240m, linear, log100, log316
iec61966-2-4, bt1361e, iec61966-2-1
bt2020-10, bt2020-12, smpte-st-2084
smpte-st-428, arib-srd-b67
default: undef
--fullrange set fullrange
--output-buf <int> buffer size for output in MByte
default 8 MB (0-128)
--max-procfps <int> limit encoding performance to lower resource usage.
default:0 (no limit)
--output-thread <int> set output thread num
-1: auto (= default)
0: disable (slow, but less memory usage)
1: use one thread

--log <string> set log file name
--log-level <string> set log level
debug, info(default), warn, error
--log-framelist <string> output frame info of avcuvid reader to path

H.264/AVC
--tff same as --interlaced tff
--bff same as --interlaced bff
--interlaced <string> interlaced encoding
tff, bff
--cabac use CABAC
--cavlc use CAVLC (no CABAC)
--bluray for bluray / Default: off
--lossless for lossless / Default: off
--(no-)deblock enable(disable) deblock filter

H.265/HEVC
--cu-max <int> set max CU size
--cu-min <int> set min CU size
8, 16, 32 are available
warning: it is not recommended to use --cu-max or --cu-min,
leaving it auto will enhance video quality.
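
Going by the option list above, a full hardware transcode with NVEncC could be assembled roughly like this. Treat it as a sketch: nvencc_command is a hypothetical helper, the flags are taken only from the NVEncC 3.01 help text quoted here and have not been checked against other versions, and the lanczos resizer presumably needs the NPP library mentioned earlier.

```python
def nvencc_command(src, dst, codec="hevc", cqp=(20, 23, 25),
                   lookahead=32, output_res=None):
    """Build an NVEncC argument list for a CUVID-decoded, constant-QP encode."""
    if not 1 <= lookahead <= 32:
        raise ValueError("the help text lists a lookahead depth of 1-32")
    cmd = [
        "NVEncC.exe",
        "--avcuvid",                            # libavformat + cuvid reader
        "-i", src,
        "-o", dst,
        "-c", codec,
        "--cqp", ":".join(str(q) for q in cqp), # I:P:B quantizers
        "--lookahead", str(lookahead),
    ]
    if output_res:
        # Resize on the GPU, e.g. output_res="1280x720".
        cmd += ["--output-res", output_res, "--vpp-resize", "lanczos"]
    cmd.append("--audio-copy")                  # mux all audio tracks unchanged
    return cmd


if __name__ == "__main__":
    print(" ".join(nvencc_command("in.mkv", "out.mkv", output_res="1280x720")))
```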

JohnLai, Dec 11, 2016

#5
Anarion Ancient Guru

I've read that the hq preset should enable B-frames on Pascal, but that's definitely not the case currently. The same goes for rc-lookahead, which does absolutely nothing, at least when using constant quality mode. The output is the same, and there are no B-frames. There are either bugs in the FFMPEG integration or some things just don't work yet on NVIDIA's end.

AQ modes increase quality, but they also increase the bitrate when using the same constant quality setting. You can generally get pretty much the same result and size if you change the constant quality factor. It's something you'd probably want to use with constant or variable bitrate mode. With constant quality mode it doesn't hurt, but in this case, to keep things simple, I left them out since hevc_nvenc doesn't work if you give it both AQ options. In a perfect world a setting like that would improve the quality without increasing the bitrate (it still might, to a point, so it definitely makes sense to use it with constant or variable bitrate mode).

The whole point of this post was to keep things simple, hence the commonly used Zeranoe builds, and not to explain how to compile your own FFMPEG with --enable-nonfree and the rest. If I understand correctly, redistributing an FFMPEG built with --enable-nonfree is forbidden, and --enable-nonfree is needed if you want --enable-cuvid and --enable-libnpp.

I had full range options in the video filter, but NVENC doesn't seem to support that, so the video file would end up being too dark. It also looks like it's best not to do the Rec. 709 conversion either. Those were copy-paste leftovers from a batch file that I use for libx264.

Last edited: Dec 11, 2016

Anarion, Dec 11, 2016

#6
JohnLai Guest

Pascal doesn't support B-frames for HEVC encoding (hardware limitation).

RC-Lookahead works. I verified it using an HEVC bitstream analyzer.
Let's say one sets an I:P:B QP of 20:23:25 [ignore the 25 since there is no B-frame support].
It correctly designates a new scene transition as an I-frame using the lower QP of 20, and the next frame is designated as a P-frame using a QP of 23 (with some intra-coded blocks for the delta change between those two frames).

If you mean CQ......it only works with VBR, not CQP.

*By the way, you do realize that ffmpeg's nvenc uses a default quantizer value of 28, no? Presets don't change the default quantizer value. And CQ in VBR mode only works if the INITIAL QP is set to 1:1:1. Since the default initial QP corresponds to 28, your CQ=20 basically does nothing in the first place (not to mention it doesn't work in CQP mode).

EDIT:
As for your BT.709 color issue......actually...it is NVENC's fault...
https://devtalk.nvidia.com/default/...hnologies/nvenc-hevc-with-full-range-colors-/
The user's analysis of the YCbCr values is correct.

Last edited: Dec 11, 2016

JohnLai, Dec 11, 2016

#7
Anarion Ancient Guru

I wasn't talking about HEVC.

Again, there wasn't any image quality difference, at least with FFMPEG 3.2 and these settings. Try it and compare frames; I've used various samples. There are definitely bugs and some weird things with NVENC at the moment.

Did you even look at what settings I use? -rc constqp with -global_quality is basically what -crf does with libx264, as far as the outcome goes....... I can guarantee you that this setting works as intended. If you wonder why I happened to name that variable cq, don't worry about it. It's just a batch variable name.

Last edited: Dec 11, 2016

Anarion, Dec 11, 2016

#8
Martigen Master Guru
Messages: 535 | Likes Received: 254 | GPU: GTX 1080Ti SLI

I don't know what you two are arguing about but I'm getting the popcorn.

*munch*

Martigen, Dec 11, 2016

#9

Andy_K likes this.
JohnLai Guest

Anarion said: ↑

I wasn't talking about HEVC.

Again, there wasn't any image quality difference. At least with FFMPEG 3.2.

Do you even look what settings I use? -rc constqp with -global_quality is basically what -crf does with libx264....... I can guarantee you that this setting works as intended.

Oh, I see, I didn't see the -global_quality flag.

Anyway, I made a mistake about the ffmpeg nvenc default quantizer value: the default is 26, not 28.
Since you set the -global_quality flag to 20, I guess it is fine.

Now this explains why you didn't notice any difference with lookahead:

rc->rateControlMode = NV_ENC_PARAMS_RC_CONSTQP;
rc->constQP.qpInterB = avctx->global_quality;
rc->constQP.qpInterP = avctx->global_quality;
rc->constQP.qpIntra = avctx->global_quality;

Basically, you are using the same quantizer for I, P and B frames.

When -global_quality is used with x264, it maps to x264 --crf, where x264 adjusts the I, P and B quantizers accordingly.
In the x264 CRF code there are parameters (ipratio & pbratio) under which the I-frame quantizer should be lower than the P-frame's by 3 and the B-frame quantizer higher than the P-frame's by 2. Then again, there is the adaptive quantization feature, which will vary these scales depending on source complexity.
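
To make the difference concrete, here is a small sketch of the two behaviours described above. It is a simplification based only on the numbers in this post (I = P - 3, B = P + 2); real x264 derives these from ratios and adaptive quantization, and both function names are made up for illustration:

```python
QP_MIN, QP_MAX = 0, 51  # QP range for 8-bit H.264


def constqp_as_described(global_quality):
    """Every frame type gets the same QP, as in the nvenc code quoted above."""
    return {"I": global_quality, "P": global_quality, "B": global_quality}


def offset_qps(p_qp, i_offset=-3, b_offset=2):
    """Derive I and B QPs from the P-frame QP using fixed offsets (simplified)."""
    def clamp(qp):
        return max(QP_MIN, min(QP_MAX, qp))
    return {"I": clamp(p_qp + i_offset), "P": clamp(p_qp), "B": clamp(p_qp + b_offset)}


if __name__ == "__main__":
    print(constqp_as_described(20))
    print(offset_qps(23))
```

With P = 23 the offsets give 20:23:25, which happens to match the NVEncC --cqp default shown earlier in the thread.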

No idea why the ffmpeg developers think that setting I, P and B to the same quantizer value is a good idea.

Moving on to the HQ preset issue.
This is weird: no B-frames for Pascal H.264 encoding? It should default to 3 B-frames for the HQ preset.
Quick question: are you using the latest driver?
How about manually setting gop_size (the general formula is GOP length = 10 x frame rate), max_b_frames (the NVENC max is 4 B-frames for H.264) and refs (the amount depends on the H.264 level?)

Edit: Pardon my grammatical mistakes, English isn't my primary language.

Last edited: Dec 11, 2016

JohnLai, Dec 11, 2016

#10
Anarion Ancient Guru

I just realised that you could ditch -global_quality and use, say, -rc vbr with -qmin and -qmax instead (the outcome is not quite the same, obviously). But... Even with -rc vbr the result is exactly the same as far as B-frames go: there are none. Still, it's a bit weird that -rc constqp requires you to use -global_quality. It would be great to get it working. This needs some testing...

It looks like it's not currently possible to manually set gop_size and max_b_frames.

I haven't checked what the situation is with nightly builds.

EDIT:
:facepalm: It looks like there are more settings (http://developer.download.nvidia.co...with-NVIDIA-Acceleration-on-Ubuntu_UG_v01.pdf) than what ffmpeg -h encoder=h264_nvenc shows. Now I got B-frames to work. Use the slow preset, then the -bf (int) switch for B-frames and the -g (int) switch for GOP. Apparently the -bf switch will automatically enable lookahead. Need to do some more testing...

EDIT:
With the -bf switch, B-frames work with -rc constqp too (slow preset). Now... It looks like -rc vbr_2pass with -qmin, -qmax, -bf and preset slow (and maybe even -b:v set to something sane to improve quality) might be the best choice. Though -rc constqp is not necessarily a bad idea, since it will use the global_quality setting for everything (B, I, P). EDIT: Actually, it's not a good idea to use vbr_2pass and try to use it like a constant quality mode. Also, there doesn't seem to be any difference between the slow and hq presets.

Last edited: Dec 13, 2016

Anarion, Dec 11, 2016

#11
JohnLai Guest

Oh....defaults are:
static const AVCodecDefault defaults[] = {
    { "b", "2M" },
    { "qmin", "-1" },
    { "qmax", "-1" },
    { "qdiff", "-1" },
    { "qblur", "-1" },
    { "qcomp", "-1" },
    { "g", "250" },
    { "bf", "0" },
    { NULL },
};

Now that explains why there are 0 B-frames for ffmpeg h264 encoding.

The problem lies with the initial QP (for VBR and VBR2) and the QP for CQP. If only there were a way to edit the default quantizer value so the lookahead could work its magic.

NVIDIA proposes -preset slow -cq 10 -g 150 on its website, but -cq is not working as intended, and a GOP size of 150 is somewhat too low.

For H264 encoding, it would be CQP or vbr_2pass (if you use VBR/VBR2Pass mode, better set the bitrate manually, since ffmpeg's default bitrate of 2 Mbps is ridiculously low), temporal_aq, 32 frames for rc-lookahead, GOP at framerate x 10, B-frames at 4, refs at 4.
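
A minimal sketch of how the settings listed above could be assembled into an ffmpeg command line, written in Python purely for illustration. The helper name build_nvenc_args, the 20M bitrate and the file names are assumptions and not something from the posts above. The -temporal-aq spelling is also an assumption about how the build names that option, and option availability differs between ffmpeg builds, so it is worth checking ffmpeg -h encoder=h264_nvenc first.

```python
def build_nvenc_args(fps, bitrate="20M"):
    """Return h264_nvenc video options following the suggestions above.

    fps     -- source frame rate, used to derive the GOP size
    bitrate -- assumed target bitrate; the ffmpeg default of 2M is too low
    """
    gop = round(fps * 10)  # GOP at framerate x 10
    return [
        "-c:v", "h264_nvenc",
        "-preset", "slow",
        "-rc", "vbr_2pass",
        "-b:v", bitrate,          # set the bitrate manually
        "-temporal-aq", "1",      # assumed option spelling, see lead-in
        "-rc-lookahead", "32",
        "-g", str(gop),
        "-bf", "4",
        "-refs", "4",
    ]


if __name__ == "__main__":
    # Hypothetical input/output names, for illustration only.
    cmd = ["ffmpeg", "-i", "input.mkv"] + build_nvenc_args(29.97) + ["-c:a", "copy", "output.mkv"]
    print(" ".join(cmd))
```

Building the list once keeps the GOP tied to the frame rate instead of hard-coding a value such as 250 or 600.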

JohnLai, Dec 11, 2016

#12
Anarion Ancient Guru

Aye.

While setting -qmin should pretty much override the default 2M bitrate, it doesn't look like that's the case for B-frames, for example. So -rc constqp with global_quality isn't such a bad idea, and it's starting to make sense to me.

By the way, if you use rc-lookahead 32 this happens:
[h264_nvenc @ 0000000002575c00] Defined rc_lookahead requires more surfaces, increasing used surfaces 32 -> 42
so I guess it would automatically use 42?

Anarion, Dec 11, 2016

#13
JohnLai Guest

Anarion said: ↑

Though, -rc constqp is not necessarily a bad idea since it will use global_quality setting for everything (b, i, p).

Technically, using the same quantizer value for I, P and B frames is a bad idea from a quality/efficiency perspective.
You want P and B frames to refer to a very high quality I-frame (low quantizer), while the B-frames themselves use a high quantizer.
Hmmm... maybe Wikipedia can help me explain: https://en.wikipedia.org/wiki/Inter_frame

Anarion said: ↑

Aye.

While setting -qmin should pretty much override the default 2M bitrate, it doesn't look like that's the case for B-frames, for example. So -rc constqp with global_quality isn't such a bad idea, and it's starting to make sense to me.

By the way, if you use rc-lookahead 32 this happens:
[h264_nvenc @ 0000000002575c00] Defined rc_lookahead requires more surfaces, increasing used surfaces 32 -> 42
so I guess it would automatically use 42?

Not sure if setting -qmin will do the trick.
-qmin and -qmax are only applicable to VBR.
Priority order for VBR:
1) Bitrate
2) Initial QP
3) -qmin and -qmax

About the surfaces: yes, ffmpeg will automatically increase the required surfaces.
Extra info: the NVENC SDK mentions a default value of 16, with 32 frames being the maximum.

Don't forget to set the reference frames value too.

JohnLai, Dec 11, 2016

#14
Anarion Ancient Guru

Yeah, but in this case (-rc constqp) it would be an efficiency issue (wasting bitrate) while still giving the best quality. With -rc vbr_2pass and -qmin and -qmax, quality can end up rather subpar (for B-frames, for example) unless one raises the default bitrate. Then again, if you use too high a bitrate...

Anarion, Dec 11, 2016

#15
JohnLai Guest

Quality per bitrate?

Here is some data using different quantizer values for CQP rate control.

DATA 1
frame type IDR 19
frame type I 19, avgQP 20.00, total size 2.54 MB
frame type P 442, avgQP 20.00, total size 12.66 MB
frame type B 1723, avgQP 20.00, total size 24.10 MB
Total Size 39.3 MB

DATA 2
frame type IDR 19
frame type I 19, avgQP 16.00, total size 3.87 MB
frame type P 442, avgQP 19.00, total size 15.69 MB
frame type B 1723, avgQP 21.00, total size 18.75 MB
Total Size 38.31 MB

DATA 3
frame type IDR 19
frame type I 19, avgQP 15.00, total size 4.31 MB
frame type P 442, avgQP 18.00, total size 18.01 MB
frame type B 1723, avgQP 20.00, total size 20.88 MB
Total Size 43.2 MB

DATA 4
frame type IDR 19
frame type I 19, avgQP 20.00, total size 2.54 MB
frame type P 442, avgQP 23.00, total size 7.76 MB
frame type B 1723, avgQP 25.00, total size 11.13 MB
Total size 21.43 MB

Which one do you prefer? The first data set or the second?
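
The totals above can be cross-checked with a short script. This is only a sketch: the numbers are copied from the four data sets, while the dictionary layout and the summarize helper are made up for illustration. It shows that DATA 2 comes out about 1 MB (roughly 2.5%) smaller than DATA 1 even though its I-frames use QP 16 instead of 20, because the B-frames, which make up most of the file, get slightly cheaper.

```python
# frame type -> (frame count, average QP, total size in MB), copied from the post
DATA = {
    "DATA 1": {"I": (19, 20, 2.54), "P": (442, 20, 12.66), "B": (1723, 20, 24.10)},
    "DATA 2": {"I": (19, 16, 3.87), "P": (442, 19, 15.69), "B": (1723, 21, 18.75)},
    "DATA 3": {"I": (19, 15, 4.31), "P": (442, 18, 18.01), "B": (1723, 20, 20.88)},
    "DATA 4": {"I": (19, 20, 2.54), "P": (442, 23, 7.76), "B": (1723, 25, 11.13)},
}


def summarize(frames):
    """Return (total size in MB, share of the total per frame type)."""
    total = sum(size for _, _, size in frames.values())
    shares = {ftype: size / total for ftype, (_, _, size) in frames.items()}
    return total, shares


if __name__ == "__main__":
    for name, frames in DATA.items():
        total, shares = summarize(frames)
        qps = "/".join(str(qp) for _, qp, _ in frames.values())
        print(f"{name}: QP I/P/B {qps}, total {total:.2f} MB, "
              f"B-frame share {shares['B']:.0%}")
```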

JohnLai, Dec 11, 2016

#16
Anarion Ancient Guru

DATA1 vs. DATA2? DATA2.

But in this case, assuming that -cq, -qmin and -qmax are all set to the same value and one leaves the bitrate at the FFMPEG default, the results are pretty horrible for I and B frames (with NVENC through FFMPEG, that is; I wonder if it's a bug, since those settings seem to affect only P frames). So as for what is best for consistent quality and best efficiency... ¯\_(ツ)_/¯ Maybe just whack the bitrate sky high and trust -cq, -qmin and -qmax... Considering the NVENC quirks when used through FFMPEG, what would you do?

Last edited: Dec 12, 2016

Anarion, Dec 11, 2016

#17
chumanga1 Member Guru

Messages:

116

Likes Received:

0

GPU:

GTX770

How about encoding speed at max quality? NVIDIA says it reaches 200 fps on Maxwell and 300 fps on Pascal at 1080p in quality mode. Does it really achieve such speeds for a single encode?
If true, that's massive, because Polaris only does 56 fps.

chumanga1, Dec 12, 2016

#18
Anarion Ancient Guru

If the source is encoded with something like UtVideo or Lagarith, then decoding is the bottleneck. If it's completely uncompressed video, then IO read speed can be the bottleneck. If you capture the content directly (e.g. gameplay), then the capturing can be a bottleneck. It can achieve really fast speeds. Obviously, quality is still not on par with good encoders like x264, but speed is on another level.

NVENC lossless encoding is really fast too, but in that case IO write speeds can limit the encoding speed.
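
To put a rough number on the IO point, here is a back-of-the-envelope calculation in Python. It assumes 8-bit 4:2:0 video (1.5 bytes per pixel) at 1080p and uses the 200 and 300 fps figures quoted in the question; the function name is made up for illustration, and real sources (4:4:4, higher bit depth) would need proportionally more.

```python
def raw_rate_mb_per_s(width, height, fps, bytes_per_pixel=1.5):
    """Data rate of uncompressed video in MB/s (1 MB = 10**6 bytes).

    bytes_per_pixel=1.5 corresponds to 8-bit 4:2:0 (assumed here).
    """
    return width * height * bytes_per_pixel * fps / 1e6


if __name__ == "__main__":
    for fps in (200, 300):
        rate = raw_rate_mb_per_s(1920, 1080, fps)
        print(f"1080p at {fps} fps needs about {rate:.0f} MB/s of read speed")
```

Under these assumptions the encoder would need roughly 622 MB/s at 200 fps and 933 MB/s at 300 fps, which is more than a typical single hard disk can sustain.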

Anarion, Dec 12, 2016

#19
chumanga1 Member Guru

Most of my source footage is AVC at around 50 Mbit/s, so IO is not a bottleneck concern. I will mostly use the ASIC encoder for specific situations where the source codec is broken for the Vegas editor: recording with Quick Sync in OBS Studio makes the footage incompatible with Vegas, so I want a fast encoder just to make it compatible. My i7 can do x264 veryfast at 100 fps. NVENC quality approaches that of x264 veryfast, and if it can deliver 2-3x the performance it will be a good alternative to my i7 for encodes where I use enough bitrate to keep quality.

On another point, I would now like to use FFmpeg for the encoding part of my Vegas rendering via the frameserver, since Debug frameserver doesn't work with x64 tools like StaxRip, which has all the Rigaya ASIC encoders bundled inside. Rigaya has an x86 version, but that software does not transcode audio from AviSynth, so FFmpeg does the trick and will be nice to use.

I am just trying out Nvenc1, which can only do 60 fps at HQ, and it does a good job for Vegas rendering. The SonyAVC+GPU encoding acceleration template apparently uses the GPU for motion estimation, which boosts speed but makes output quality very bad at lower bitrates; even NVENC itself keeps quality better. In one specific render, NVENC improved rendering time by 40% over SonyAVC.

The only big downside of NVIDIA is the crappy OpenCL performance in Vegas for video features like compositing and video FX on Kepler. Some say OpenCL was improved for Maxwell and Pascal, but I can't find out whether that holds for Vegas video processing: in the past NVIDIA already did well in some OpenCL benchmarks but always performed badly at video processing (Vegas only). At least NVENC's encoding speed-up can make it worthwhile against AMD, which has a crappy VCE. Even my 2012 NVENC is on par with the latest Polaris VCE in speed and quality for AVC at 1080p.

chumanga1, Dec 14, 2016

#20
