How to: Encode video with FFMPEG using NVENC

Discussion in 'Videocards - NVIDIA GeForce Drivers Section' started by Anarion, Dec 10, 2016.

  1. Anarion

    Anarion Ancient Guru

    Messages:
    13,599
    Likes Received:
    387
    GPU:
    GeForce RTX 3060 Ti
    So I just noticed that FFMPEG supports NVENC now. Actually it has supported it for a while, but it's now enabled in the Zeranoe FFMPEG builds. You'll need to download FFMPEG for this to work, obviously, so grab yourself the latest 64-bit release build.

    Now that you have it, extract the ffmpeg binaries to some folder. Also create an empty Input folder.

    Then create encode.cmd (or whatever you want to name it) and copy and paste the following into it:
    Code:
    @echo OFF
    SET input_folder=Input
    SET output_folder=Output
    SET ffmpeg_path=ffmpeg.exe
    if not exist "%output_folder%" mkdir "%output_folder%"
    
    REM Settings
    
    SET ext=mkv
    SET format=matroska
    
    REM SET videofilter=-pix_fmt yuv444p
    REM SET resolution=-sws_flags lanczos -s 1280x720
    SET encoder=h264_nvenc
    REM SET encoder=hevc_nvenc
    SET preset=hq
    SET cq=20
    SET sample=-sample_fmt s16
    SET khz=-ar 48000
    REM SET audiofilter=-af aresample=resampler=soxr:precision=28:dither_method=shibata %sample% %khz%
    SET videoencoder=-c:v %encoder% -rc constqp -global_quality %cq% -preset %preset% -rc-lookahead 32 -g 600
    SET audioencoder=-c:a flac -compression_level 12
    
    REM Settings end
    
    SET params=-i "%%~f" -map_metadata -1 %resolution% %videofilter% %audiofilter% %audioencoder% %videoencoder% -f %format% "%output_folder%\%%~nf.%ext%"
    
    
    
    FOR %%f IN (%input_folder%\*.*) DO (
    
    IF EXIST "%output_folder%\%%~nf.%ext%" (
    echo. 
    echo *************************************
    echo Deleting: %output_folder%\%%~nf.%ext%
    echo *************************************
    echo. 
    del /F "%output_folder%\%%~nf.%ext%"
    )
    
    "%ffmpeg_path%" %params%
    
    )
    
    echo. 
    echo *************************************
    echo Done!
    echo *************************************
    echo. 
        
    pause
    
    That script will automatically process every file in your Input folder and create an Output folder for the new files.

    A few things to note:
    • REM comments out a line, so if you want to change the encoder to hevc_nvenc (H.265/HEVC), add REM before SET encoder=h264_nvenc and remove REM from SET encoder=hevc_nvenc.
    • If you want lossless encoding, use preset=lossless. cq=number controls quality; a lower number means better quality. -rc constqp enables constant QP rate control, which behaves like a constant quality mode and in my opinion is really, really handy. I always use it over fixed bitrate modes. It's really great to see that NVENC supports this mode, and on top of that it even supports lossless encoding and the yuv444p format. NVENC's constant quality mode also works surprisingly well, quality-wise.
    • You can also play with the -temporal-aq 1 switch (works for AVC) and the -spatial_aq 1 switch (works for HEVC). Add them after -preset %preset%. For AVC you can enable B-frames with the -bf switch. NVIDIA recommended using three B-frames in one of their PDFs for optimal quality (switch: -bf 3).
    • In this example the GOP size (-g) is 600. You can adjust it manually for optimal results: target framerate x 10, so for 60 fps use -g 600.
    • I've added a bunch of other settings there too but commented them out. They are pretty self-explanatory. However, if your source material is lossless RGB and you want the absolute best quality, use preset=lossless and uncomment SET videofilter=-pix_fmt yuv444p.
    • In this example script the container is MKV, the audio codec is FLAC and the video codec is H.264/AVC.
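For anyone who would rather not use a batch file, the same loop can be sketched in Python. This is only a rough equivalent under assumptions, not a drop-in replacement: the function names (gop_size, build_command, encode_folder) are made up for illustration, ffmpeg is assumed to be on the PATH, -y is used instead of the explicit delete step, and the encoder flags are simply the ones from encode.cmd.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def gop_size(fps: float) -> int:
    """GOP length from the rule of thumb in the post: target framerate x 10."""
    return round(fps * 10)


def build_command(src: Path, out_dir: Path, encoder: str = "h264_nvenc",
                  quality: int = 20, preset: str = "hq", fps: float = 60.0,
                  ffmpeg: str = "ffmpeg") -> list[str]:
    """Build roughly the argument list that encode.cmd passes to ffmpeg."""
    dst = out_dir / (src.stem + ".mkv")
    return [
        ffmpeg, "-y",                       # assumption: overwrite instead of del
        "-i", str(src),
        "-map_metadata", "-1",
        "-c:a", "flac", "-compression_level", "12",
        "-c:v", encoder, "-rc", "constqp",
        "-global_quality", str(quality),
        "-preset", preset,
        "-rc-lookahead", "32",
        "-g", str(gop_size(fps)),
        "-f", "matroska", str(dst),
    ]


def encode_folder(in_dir: str = "Input", out_dir: str = "Output") -> None:
    """Encode every file in in_dir, like the FOR loop in the batch script."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for src in sorted(Path(in_dir).iterdir()):
        if src.is_file():
            subprocess.run(build_command(src, out), check=True)
```

Calling encode_folder() from the folder that contains Input would process each file in turn. Passing the arguments as a list avoids quoting problems with file names that contain spaces.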
     
    Last edited: Dec 13, 2016
  2. Useful, thanks.
     
  3. JohnLai

    JohnLai Guest

    Messages:
    136
    Likes Received:
    7
    GPU:
    ASUS GTX 970 3.5+0.5GB
    .....You are missing:

    rc-lookahead (up to 32 frames)
    spatial_aq (for H264 and HEVC)
    temporal_aq (for H264 only) [Pick either spatial or temporal adaptive quantization; it works with CQP rate control too, NVIDIA confirmed it]

    Next...you forgot to use hardware-accelerated DECODING, known as CUVID...

    Then...you forgot to use the high-quality hardware-accelerated NVIDIA Performance Primitives RESIZER for resizing (nearest neighbour, linear, cubic, cubic2p_bspline, cubic2p_catmullrom, cubic2p_b05c03, supersampling, lanczos)
     
  4. Anarion

    Anarion Ancient Guru

    No, I didn't forget. With these settings rc-lookahead does absolutely nothing. Try it.

    There's a reason why I didn't include spatial_aq and temporal_aq. It's simpler to adjust the constqp quality setting instead: spatial_aq and temporal_aq increase the file size, and thus the quality, but you can get the same effect by changing the constqp value. This is something that one can play with and it's a matter of preference. Also, the source material might make a difference.

    By the way, the -temporal-aq 1 switch works for AVC and -spatial_aq 1 works for HEVC, at least with the build I have.

    No, I didn't forget to accelerate the decoding; I simply chose not to. Why? There's no freaking point, since the whole idea was to use a high-quality lossless source file, like something encoded with the UtVideo codec. Try decoding that with hardware.... Besides... Keep reading....

    And when it comes to libnpp...
    Code:
    configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-dxva2 --enable-libmfx --enable-nvenc
    --enable-avisynth --enable-bzlib --enable-libebur128 --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv
    --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme --enable-libgsm
    --enable-libilbc --enable-libmodplug --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb
    --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy
    --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc
    --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs
    --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib
    
    Uh ohh... Find me --enable-cuvid and --enable-libnpp. You can't? Neither can I. For decoding you should use DXVA2.
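The same check can be done without reading the configuration line by eye. Below is a minimal sketch, assuming you paste in the configuration text printed by ffmpeg; the helper name has_flag is made up for illustration. It also undoes console hard-wrapping, which can split a flag such as "--ena" / "ble-avisynth" across two lines.

```python
import re


def has_flag(config_text: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole configure flag in config_text."""
    # Join hard-wrapped console lines so split flags become whole again.
    unwrapped = config_text.replace("\r", "").replace("\n", "")
    # Reject longer flags that merely start with `flag`
    # (e.g. "--enable-libvo" inside "--enable-libvo-amrwbenc").
    pattern = re.escape(flag) + r"(?![A-Za-z0-9_]|-[A-Za-z0-9])"
    return re.search(pattern, unwrapped) is not None


def missing_flags(config_text: str, wanted: list) -> list:
    """List the flags from `wanted` that the build was not configured with."""
    return [flag for flag in wanted if not has_flag(config_text, flag)]
```

For the build quoted above, missing_flags(text, ["--enable-cuvid", "--enable-libnpp"]) would report both flags as missing.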
     
    Last edited: Dec 10, 2016

  5. JohnLai

    JohnLai Guest

    [@Anarion]
    rc-lookahead enables adaptive GOP (adaptive I- and B-frame insertion/placement; I for HEVC and I/B for H264), i.e. scene change behaviour.

    As for AQ modes, they tend to give better efficiency per bitrate. (For NVENC, though, they add 20-25% more space usage, if that is what you mean.) Note: do not turn on AQ for film grain material; the file size becomes insane.

    High quality lossless source decoding? Can't argue with this since cuvid only outputs surface format in NV12 before converting it to RGBA.......

    Cuvid and libnpp ---> maybe compile ffmpeg source https://github.com/jb-alvarado/media-autobuild_suite

    Or simply use Rigaya transcoder? https://drive.google.com/drive/folders/0BzA4dIFteM2dS1ZUT1FjTnF3Q0E
    Need to extract NPP library to same location as Nvenc executable
    Here an english translation
     
  6. Anarion

    Anarion Ancient Guru

    I've read that the hq preset should enable B-frames on Pascal, but that's definitely not the case currently. The same goes for rc-lookahead, which does absolutely nothing, at least when using constant quality mode. Output is the same, and there are no B-frames. There are either bugs in the FFMPEG integration or some things just don't work yet at NVIDIA's end.

    AQ modes increase quality, but they also increase the bitrate when using the same constant quality setting. You can generally get pretty much the same result and size if you change the constant quality factor. It's something you'd probably like to use with constant or variable bitrate mode. With constant quality mode it doesn't hurt, but in this case, to keep things simple, I left them out since hevc_nvenc doesn't work if you give both AQ options. In a perfect world a setting like that would improve the quality without increasing the bitrate (it still might, to a point, so it definitely makes sense to use it with constant or variable bitrate mode).

    The whole point of this post was to keep things simple, thus the commonly used Zeranoe builds, and not how to compile your own FFMPEG with --enable-nonfree and the rest. If I understand correctly redistributing FFMPEG with --enable-nonfree is forbidden. --enable-nonfree is needed if you want to --enable-cuvid and --enable-libnpp.

    I had full range options in the video filter but NVENC doesn't seem to support that so the video file would end up being too dark. Also it looks like it's best to not make the rec.709 conversion either. Copy paste leftovers from batch file that I use for libx264.
     
    Last edited: Dec 11, 2016
  7. JohnLai

    JohnLai Guest


    Pascal doesn't support B-frames for HEVC encoding. (Hardware limitation)

    RC-Lookahead works. I verified it using an HEVC bitstream analyzer.
    Let's say one sets an I:P:B QP of 20:23:25 [ignore 25 since there is no B-frame support].
    It correctly designates a new scene transition as an I-frame using the lower QP of 20, and the next frame is designated as a P-frame using a QP of 23 (with some intra-coded blocks for the delta between those two frames).

    If you mean CQ......it only works with VBR, not CQP.


    *By the way, you do realize that ffmpeg nvenc uses a default quantizer value of 28, no? Presets don't change the default quantizer value. And CQ in VBR mode only works if the INITIAL QP is set to 1:1:1. Since the default initial QP corresponds to 28, your CQ=20 basically does nothing in the first place (not to mention it doesn't work in CQP mode).

    EDIT:
    As for your BT709 color issue......actually...it is NVENC's fault...
    https://devtalk.nvidia.com/default/...hnologies/nvenc-hevc-with-full-range-colors-/
    The user analysis of YCbCr value is correct.
     
    Last edited: Dec 11, 2016
  8. Anarion

    Anarion Ancient Guru

    I wasn't talking about HEVC.

    Again, there wasn't any image quality difference. At least with FFMPEG 3.2 and these settings. Try it and compare frames, I've used various samples. There are definitely bugs and some weird things with NVENC at the moment.

    Did you even look at what settings I use? -rc constqp with -global_quality is basically what -crf does with libx264, the outcome that is....... I can guarantee you that this setting works as intended. If you wonder why I just happened to name that variable cq, don't bother worrying about it. It's just a batch variable name.
     
    Last edited: Dec 11, 2016
  9. Martigen

    Martigen Master Guru

    Messages:
    535
    Likes Received:
    254
    GPU:
    GTX 1080Ti SLI
    I don't know what you two are arguing about but I'm getting the popcorn.

    *munch*
     
  10. JohnLai

    JohnLai Guest


    Oh, I see, I didn't see the -global_quality flag.

    Anyway, I made a mistake about the ffmpeg nvenc default quantizer value: the default is 26, not 28.
    Since you set the -global_quality flag to 20, I guess it is fine.

    Now this explains why you didn't notice any difference with lookahead:

    rc->rateControlMode = NV_ENC_PARAMS_RC_CONSTQP;
    rc->constQP.qpInterB = avctx->global_quality;
    rc->constQP.qpInterP = avctx->global_quality;
    rc->constQP.qpIntra = avctx->global_quality;


    Basically, you are using the same quantizer for I, P , B frames.

    When -global_quality is used for x264, it links to x264 --crf, where x264 adjusts the I, P and B quantizers accordingly.
    In the x264 crf code there are parameters (ipratio & pbratio) that make the I-frame quantizer lower than the P-frame quantizer by about 3 and the B-frame quantizer higher than the P-frame quantizer by about 2. Then again, there is the adaptive quantization feature, which varies these scales depending on source complexity.
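To put rough numbers on those offsets: x264's defaults are ipratio 1.40 and pbratio 1.30, and in H.264 the quantizer step size doubles every 6 QP, so a ratio r corresponds to about 6 * log2(r) QP. The sketch below only illustrates that arithmetic. The helper names are made up, and real x264 rate control does far more than apply two fixed offsets.

```python
import math


def qp_offset(ratio: float) -> float:
    """QP difference equivalent to a quantizer step ratio (step doubles every 6 QP)."""
    return 6.0 * math.log2(ratio)


def frame_qps(p_qp: float, ipratio: float = 1.40, pbratio: float = 1.30) -> dict:
    """Illustrative per-frame-type QPs around a given P-frame QP."""
    return {
        "I": p_qp - qp_offset(ipratio),   # I-frames get a lower (better) QP
        "P": float(p_qp),
        "B": p_qp + qp_offset(pbratio),   # B-frames get a higher (cheaper) QP
    }


if __name__ == "__main__":
    for frame_type, qp in frame_qps(23).items():
        print(f"{frame_type}: {qp:.2f}")
```

With the default ratios the offsets come out at roughly 2.9 and 2.3 QP, which matches the "by 3" and "by 2" figures quoted in the post.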

    No idea why the ffmpeg developers think that setting I, P and B to the same quantizer value is a good idea.

    Moving on to the HQ preset issue.
    This is weird, no B-frames for Pascal H264 encoding? The HQ preset should default to 3 B-frames.
    Quick question: are you using the latest driver?
    How about manually setting gop_size (the general formula is GOP length = 10 x frame rate), max_b_frames (the NVENC max is 4 B-frames for H264) and refs (the amount depends on the H264 level?)

    Edit: Pardon my grammatical mistake, english isn't my primary language. :)
     
    Last edited: Dec 11, 2016

  11. Anarion

    Anarion Ancient Guru

    I just realised that you could ditch -global_quality and instead use, say, -rc vbr together with -qmin and -qmax (the outcome is not quite the same, obviously). But... even with -rc vbr the outcome is exactly the same. No B-frames. Still, it's a bit weird that -rc constqp requires you to use -global_quality. It would be great to get it working. This needs some testing...

    It looks like it's not currently possible to manually set gop_size and max_b_frames.

    I haven't checked what's the situation with nightly builds.

    EDIT:
    :facepalm: It looks like there are more settings (http://developer.download.nvidia.co...with-NVIDIA-Acceleration-on-Ubuntu_UG_v01.pdf) than what ffmpeg -h encoder=h264_nvenc shows. Now I got B-frames to work. Use the slow preset, then the -bf (int) switch for B-frames and the -g (int) switch for GOP size. Apparently the -bf switch automatically enables lookahead. Need to do some more testing...

    EDIT:
    With the -bf switch, B-frames work with -rc constqp too, slow preset. Now... it looks like -rc vbr_2pass with -qmin, -qmax, -bf and preset slow (and maybe even -b:v set to something sane to improve quality) might be the best choice. Though, -rc constqp is not necessarily a bad idea since it will use the global_quality setting for everything (B, I, P). EDIT: Actually, it's not a good idea to use vbr_2pass and try to use it like a constant quality mode. Also, there doesn't seem to be any difference between the slow and hq presets.
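One way to check whether an encode actually contains B-frames, rather than comparing screenshots, is to count picture types with ffprobe. This is a sketch under assumptions: ffprobe is assumed to be on the PATH, and the function names are made up for illustration. The parsing is split out so it can be tried on pasted output as well.

```python
import subprocess
from collections import Counter


def count_frame_types(ffprobe_output: str) -> Counter:
    """Count picture types (I, P, B) from ffprobe output with one type per line."""
    counts = Counter()
    for line in ffprobe_output.splitlines():
        pict_type = line.strip().strip(",")   # tolerate stray CSV separators
        if pict_type:
            counts[pict_type] += 1
    return counts


def probe_frame_types(path: str, ffprobe: str = "ffprobe") -> Counter:
    """Run ffprobe on the first video stream and count its frame types."""
    result = subprocess.run(
        [ffprobe, "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True,
    )
    return count_frame_types(result.stdout)
```

If probe_frame_types("Output/clip.mkv") returns no "B" entry, the encoder did not emit any B-frames for that file. The path here is only an example.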
     
    Last edited: Dec 13, 2016
  12. JohnLai

    JohnLai Guest

    Oh....defaults are:
    static const AVCodecDefault defaults[] = {
    { "b", "2M" },      /* video bitrate */
    { "qmin", "-1" },
    { "qmax", "-1" },
    { "qdiff", "-1" },
    { "qblur", "-1" },
    { "qcomp", "-1" },
    { "g", "250" },     /* GOP size */
    { "bf", "0" },      /* max B-frames */
    { NULL },
    };


    Now that explains why there are 0 B-frames for ffmpeg h264 encoding.

    The problem lies with the initial QP (for VBR & VBR2) and the QP for CQP. If only there were a way to edit the default quantizer value so the lookahead could work its magic.

    NVIDIA proposes -preset slow -cq 10 -g 150 on its website, but -cq is not working as intended. And a GOP size of 150 is somehow too low.

    For H264 encoding, it would be CQP or vbr_2pass (if you use VBR/VBR2Pass mode, better set the bitrate manually; ffmpeg's default bitrate is 2 Mbps, which is ridiculously low), temporal_aq, 32 frames for rc-lookahead, GOP at framerate x 10, B-frames at 4, refs at 4.
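The H264 recommendation can be written down as a small argument builder. This is a sketch, not a verified recipe: the function name is hypothetical, the option spellings are the ones used in this thread plus ffmpeg's generic -bf (B-frame count) and -refs (reference frames), and some of them may differ between ffmpeg builds.

```python
def h264_nvenc_args(fps: float, rc: str = "constqp", quality: int = 20,
                    bitrate: str = None) -> list:
    """Build h264_nvenc arguments for either CQP or two-pass VBR."""
    if rc not in ("constqp", "vbr_2pass"):
        raise ValueError("rc must be 'constqp' or 'vbr_2pass'")
    args = ["-c:v", "h264_nvenc", "-preset", "slow", "-rc", rc]
    if rc == "constqp":
        args += ["-global_quality", str(quality)]
    else:
        # The 2M default bitrate is far too low, so insist on an explicit value.
        if bitrate is None:
            raise ValueError("set a bitrate explicitly for vbr_2pass")
        args += ["-b:v", bitrate]
    args += [
        "-temporal-aq", "1",
        "-rc-lookahead", "32",
        "-g", str(round(fps * 10)),   # GOP at framerate x 10
        "-bf", "4",
        "-refs", "4",
    ]
    return args
```

For example, h264_nvenc_args(60) gives a CQP command line with -g 600, while h264_nvenc_args(30, rc="vbr_2pass", bitrate="20M") gives a two-pass VBR one with -g 300.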
     
  13. Anarion

    Anarion Ancient Guru

    Aye.

    While setting -qmin should pretty much override the default 2M bitrate, it doesn't look like that's the case for B-frames, for example. So -rc constqp with global_quality isn't such a bad idea and it's starting to make sense to me.

    By the way, if you use rc-lookahead 32 this happens:
    [h264_nvenc @ 0000000002575c00] Defined rc_lookahead requires more surfaces, increasing used surfaces 32 -> 42
    so I guess it would automatically use 42?
     
  14. JohnLai

    JohnLai Guest

    Technically, using the same quantizer value for I, P and B frames is a bad idea from a quality/efficiency perspective.
    You want P and B frames to refer to a very high quality I-frame (low quantizer) and B-frames to use a high quantizer.
    Hmmm......maybe Wikipedia can help me explain: https://en.wikipedia.org/wiki/Inter_frame


    Not sure if setting -qmin will do the trick.
    -qmin and -qmax are only applicable to VBR.
    Priority order for VBR:
    1) Bitrate
    2) Initial QP
    3) -qmin and -qmax


    About the surfaces: yes, ffmpeg will automatically increase the required surfaces.
    Extra info: the NVENC SDK mentions a default value of 16, with 32 frames being the maximum.

    Don't forget to set the reference frames value too.
     
  15. Anarion

    Anarion Ancient Guru

    Yeah, but in this case (-rc constqp) it would be an efficiency issue (wasting bitrate) while still giving the best quality. With -rc vbr_2pass and -qmin & -qmax, quality can end up being rather subpar (for B-frames, for example) unless one raises the default bitrate. Then again, if you use too high a bitrate...
     

  16. JohnLai

    JohnLai Guest

    Quality per bitrate?

    Here is some data using different quantizer values for CQP rate control.

    DATA 1
    frame type IDR 19
    frame type I 19, avgQP 20.00, total size 2.54 MB
    frame type P 442, avgQP 20.00, total size 12.66 MB
    frame type B 1723, avgQP 20.00, total size 24.10 MB
    Total Size 39.3 MB

    DATA 2
    frame type IDR 19
    frame type I 19, avgQP 16.00, total size 3.87 MB
    frame type P 442, avgQP 19.00, total size 15.69 MB
    frame type B 1723, avgQP 21.00, total size 18.75 MB
    Total Size 38.31 MB

    DATA 3
    frame type IDR 19
    frame type I 19, avgQP 15.00, total size 4.31 MB
    frame type P 442, avgQP 18.00, total size 18.01 MB
    frame type B 1723, avgQP 20.00, total size 20.88 MB
    Total Size 43.2 MB

    DATA 4
    frame type IDR 19
    frame type I 19, avgQP 20.00, total size 2.54 MB
    frame type P 442, avgQP 23.00, total size 7.76 MB
    frame type B 1723, avgQP 25.00, total size 11.13 MB
    Total size 21.43 MB

    Which one do you prefer? The first data set or the second? :)
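The four data sets are easier to compare side by side. The sketch below just re-adds the per-frame-type sizes quoted above and expresses each total relative to DATA 1. The numbers are copied from the post; the dictionary layout and function names are made up for illustration.

```python
# (avgQP, total size in MB) per frame type, copied from the data sets above.
DATA = {
    "DATA 1": {"I": (20, 2.54), "P": (20, 12.66), "B": (20, 24.10)},
    "DATA 2": {"I": (16, 3.87), "P": (19, 15.69), "B": (21, 18.75)},
    "DATA 3": {"I": (15, 4.31), "P": (18, 18.01), "B": (20, 20.88)},
    "DATA 4": {"I": (20, 2.54), "P": (23, 7.76), "B": (25, 11.13)},
}


def total_size(name: str) -> float:
    """Sum of the I, P and B sizes of one data set, in MB."""
    return round(sum(size for _, size in DATA[name].values()), 2)


def relative_to(name: str, baseline: str = "DATA 1") -> float:
    """Size change versus the baseline, in percent (negative means smaller)."""
    base = total_size(baseline)
    return (total_size(name) - base) / base * 100.0


if __name__ == "__main__":
    for name in DATA:
        print(f"{name}: {total_size(name):.2f} MB ({relative_to(name):+.1f}%)")
```

The totals agree with the ones quoted in the post. DATA 2 comes out slightly smaller than DATA 1 even though its I and P frames use lower quantizers, which is the efficiency argument being made here.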
     
  17. Anarion

    Anarion Ancient Guru

    DATA1 vs. DATA2? DATA2.

    But in this case, assuming that -cq, -qmin and -qmax are all the same and one leaves the bitrate at the FFMPEG default, the results are pretty horrible for I and B frames (with NVENC when using FFMPEG, that is. I wonder if it's a bug: those settings seem to affect only P frames). So as for what's best for consistent quality and best efficiency... ¯\_:)_/¯ Maybe just whack the bitrate sky high and trust -cq -qmin -qmax... Considering the NVENC quirks when used through FFMPEG, what would you do?
     
    Last edited: Dec 12, 2016
  18. chumanga1

    chumanga1 Member Guru

    Messages:
    116
    Likes Received:
    0
    GPU:
    GTX770
    How about encoding speed at max quality? NVIDIA says 200fps for Maxwell and 300fps for Pascal at 1080p in quality mode. Does it really achieve such speeds for a single encode?
    If true, that's massive, because Polaris only does 56fps.
     
  19. Anarion

    Anarion Ancient Guru

    If the source is encoded with something like UtVideo or Lagarith then decoding is the bottleneck. If it's completely uncompressed video then it's possible that IO read speed can be the bottleneck. If you capture the content directly (i.e. gameplay) then the capturing can be a bottleneck. It can achieve really fast speeds, obviously quality still is not on par with good encoders like x264 (but speed is on another level).

    NVENC lossless encoding is really fast too but in that case IO write speeds can limit the encoding speed.
     
  20. chumanga1

    chumanga1 Member Guru

    Most of my source footage is AVC at ~50 Mbit/s, so IO is not a bottleneck concern. I will mostly use the ASIC encoder for specific situations where the source codec is broken for the Vegas editor: recording with Quicksync in OBS-Studio makes the footage incompatible with Vegas, so I want a fast encoder just to make it compatible. My i7 can do x264 veryfast at 100fps. NVENC's quality approaches x264 veryfast, and if it can deliver 2-3x the performance it will be a good alternative to my i7 for encodes where I use enough bitrate to keep quality.

    On another point, I would now like to use FFmpeg for the encoding part of my Vegas rendering with a frameserver, since Debug frameserver doesn't work with x64 tools like Staxrip, which has all of Rigaya's ASIC encoders bundled inside. Rigaya has x86 versions, but that software doesn't transcode audio from avisynth, so FFmpeg does that trick and will be nice to use.

    I'm just trying it out using Nvenc1, which can only do 60fps at HQ, and it does a good job for Vegas rendering. The SonyAVC + GPU encoding acceleration template apparently uses the GPU for motion estimation during encoding, which boosts speed but makes output quality very bad at lower bitrates; even NVENC itself keeps quality better. In one specific render, using NVENC improved rendering time by 40% over SonyAVC.

    The only big downside of NVIDIA is the crappy OpenCL performance they have in Vegas for video features like compositing and video FX with Kepler. Some say they improved OpenCL for Maxwell and Pascal, but I can't find out whether that's true for Vegas video processing, since in the past NVIDIA already did well in some OpenCL benchmarks but always performed badly at video processing (Vegas only). At least NVENC's encoding boost can make it worthwhile against AMD, which has the crappy VCE. Even my 2012 NVENC is on par with Polaris's latest VCE in speed and quality for AVC at 1080p.
     
