In 2018, we introduced a highly customizable rendering technology called the Scriptable Render Pipeline (SRP). Part of it is a new low-level engine rendering loop called the SRP Batcher, which can speed up CPU-side rendering by 1.2x to 4x, depending on the Scene. Let’s see how to get the most out of this feature!


This video shows the worst case scenario for Unity: each object is dynamic and uses a different material (color, texture). This scene shows many similar meshes but it would run the same with one different mesh per object (so GPU instancing can’t be used). The speedup is about 4x on PlayStation 4 (this video is PC, Dx11).

NOTE: when we talk about a 4x speedup, we’re talking about the CPU rendering code (the “RenderLoop.Draw” and “ShadowLoop.Draw” profiler markers). We’re not talking about the global frame rate (FPS).

Unity and Materials

The Unity editor has a really flexible rendering engine. You can modify any Material property at any time during a frame. Plus, Unity was historically built for non-constant buffers, supporting graphics APIs such as DirectX 9. However, these nice features have some drawbacks. For example, there is a lot of work to do when a DrawCall uses a new Material. So basically, the more Materials you have in a Scene, the more CPU time is required to set up GPU data.

Standard Unity rendering workflow

During the inner render loop, when a new Material is detected, the CPU collects all properties and sets up different constant buffers in the GPU memory. The number of GPU buffers depends on how the Shader declares its CBUFFERs.

How SRP Batcher works

When we made the SRP technology, we had to rewrite some low-level engine parts. We saw a great opportunity to natively integrate some new paradigms, such as GPU data persistence. We aimed to speed up the general case where a Scene uses a lot of different Materials, but very few Shader variants.

Now, low-level render loops can make material data persistent in the GPU memory. If the Material content does not change, there is no need to set up and upload the buffer to the GPU. Plus, we use a dedicated code path to quickly update Built-in engine properties in a large GPU buffer. Now the new flow chart looks like:

SRP Batcher rendering workflow.

Here, the CPU only handles the built-in engine properties, labeled “object matrix transform”. All Materials have persistent CBUFFERs located in GPU memory, which are ready to use. To sum up, the speedup comes from two different things:

  • The content of each Material is now persistent in GPU memory
  • A dedicated code path manages a large “per object” GPU CBUFFER
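
To make the first point concrete, here is a minimal sketch of the “upload only when the Material changed” idea. It is a toy model in Python, not engine code: the Material and GpuMaterialCache classes and the version counter are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Material:
    """Hypothetical material: a name, some properties and a change counter."""
    name: str
    properties: dict
    version: int = 0  # bumped whenever a property changes

    def set_property(self, key, value):
        self.properties[key] = value
        self.version += 1


@dataclass
class GpuMaterialCache:
    """Toy stand-in for persistent per-material buffers in GPU memory."""
    uploaded_versions: dict = field(default_factory=dict)
    upload_count: int = 0

    def bind(self, material):
        # Upload only if this material was never uploaded or changed since.
        if self.uploaded_versions.get(material.name) != material.version:
            self.uploaded_versions[material.name] = material.version
            self.upload_count += 1


def render_frames(cache, materials, frames):
    """Bind every material once per frame, as a render loop would."""
    for _ in range(frames):
        for material in materials:
            cache.bind(material)


materials = [Material(f"mat{i}", {"color": i}) for i in range(100)]
cache = GpuMaterialCache()

render_frames(cache, materials, frames=10)
uploads_static = cache.upload_count  # one upload per material, then reuse

materials[0].set_property("color", 999)
render_frames(cache, materials, frames=1)
uploads_after_change = cache.upload_count  # only the edited material is re-uploaded
```

In this model, ten frames with 100 unchanged materials cost 100 uploads rather than 1,000, and editing one material adds a single upload.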

How to enable SRP Batcher

Your project must be using either the Lightweight Render Pipeline (LWRP), the High Definition Render Pipeline (HDRP), or your own custom SRP. To activate the SRP Batcher in HDRP or LWRP, just use the checkbox in the SRP Asset Inspector.


If you want to enable or disable the SRP Batcher at runtime, for example to benchmark its performance benefits, you can also toggle the global variable GraphicsSettings.useScriptableRenderPipelineBatching from C# code.

SRP Batcher compatibility

For an object to be rendered through the SRP Batcher code path, there are two requirements:

  1. The object must be in a mesh. It cannot be a particle or a skinned mesh.
  2. You must use a Shader that is compatible with the SRP Batcher. All Lit and Unlit Shaders in HDRP and LWRP fit this requirement.

For a Shader to be compatible with the SRP Batcher:

  • All built-in engine properties must be declared in a single CBUFFER named “UnityPerDraw”. For example, unity_ObjectToWorld, or unity_SHAr.
  • All Material properties must be declared in a single CBUFFER named “UnityPerMaterial”.
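
The two rules above can be modeled as a small check. This is only a sketch of the rule as stated here, not Unity’s actual compatibility checker. The function name, the dictionary layout and the “$Globals” key (standing for variables declared outside any CBUFFER) are assumptions for illustration.

```python
def srp_batcher_compatibility(cbuffers, builtin_props, material_props):
    """Return (compatible, reasons) for a shader described as plain data.

    cbuffers maps a CBUFFER name to the list of variables declared in it.
    Variables declared outside any CBUFFER are listed under "$Globals".
    """
    # Record every CBUFFER in which each variable is declared.
    declared_in = {}
    for cbuffer_name, variables in cbuffers.items():
        for variable in variables:
            declared_in.setdefault(variable, []).append(cbuffer_name)

    reasons = []
    for prop in builtin_props:
        if declared_in.get(prop) != ["UnityPerDraw"]:
            reasons.append(f"built-in property {prop} is not in UnityPerDraw")
    for prop in material_props:
        if declared_in.get(prop) != ["UnityPerMaterial"]:
            reasons.append(f"material property {prop} is not in UnityPerMaterial")
    return (not reasons, reasons)


# Material properties left as loose globals: not compatible.
loose = {
    "UnityPerDraw": ["unity_ObjectToWorld"],
    "$Globals": ["_Color1", "_Color2"],
}
loose_ok, loose_reasons = srp_batcher_compatibility(
    loose, ["unity_ObjectToWorld"], ["_Color1", "_Color2"])

# Same properties moved into UnityPerMaterial: compatible.
fixed = {
    "UnityPerDraw": ["unity_ObjectToWorld"],
    "UnityPerMaterial": ["_Color1", "_Color2"],
}
fixed_ok, fixed_reasons = srp_batcher_compatibility(
    fixed, ["unity_ObjectToWorld"], ["_Color1", "_Color2"])
```

The _Color1 and _Color2 names are placeholders. In a real project, the Shader Inspector remains the authority on compatibility.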

You can see the compatibility status of a Shader in the Inspector panel. This compatibility section is only displayed if your Project is SRP based.

In any given Scene, some objects are SRP Batcher compatible, some are not. But the Scene is still rendered properly. Compatible objects will use SRP Batcher code path, and others still use the standard SRP code path.

The Art of profiling

SRPBatcherProfiler.cs

If you want to measure the speed increase with SRP Batcher in your specific Scene, you can use the SRPBatcherProfiler.cs C# script. Just add the script to your Scene. While the script is running, you can toggle the overlay display with the F8 key. You can also turn SRP Batcher ON and OFF during play with the F9 key. If you enable the overlay in PLAY mode (F8), you should see a lot of useful information:

Here, all times are measured in milliseconds (ms). These measurements show the CPU time spent in Unity SRP rendering loops.

NOTE: timing means the cumulative time of all “RenderLoop.Draw” and “Shadows.Draw” markers called during a frame, whichever thread owns them. When you see “1.31ms SRP Batcher code path”, maybe 0.31ms is spent on the main thread, and 1ms is spread over all of the graphics jobs.

Overlay information

In this table, you can see a description of each setting in the Overlay visible in PLAY mode, from top to bottom:

NOTE: We hesitated to add FPS at the bottom of the overlay because you should be very careful about FPS metrics when optimizing. First, FPS is not linear, so seeing FPS increase by 20% doesn’t immediately tell you how much you optimized your scene. Second, FPS is global over the frame. FPS (or global frame timing) depends on many things other than rendering, like C# gameplay, Physics, Culling, etc.
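
A quick worked example of why FPS is not linear. The frame rates below are arbitrary illustration values, not measurements from any of the scenes in this post.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


def ms_saved(fps_before, fps_after):
    """Milliseconds saved per frame when going from one frame rate to another."""
    return frame_ms(fps_before) - frame_ms(fps_after)


# The same +20% FPS corresponds to very different time savings.
saved_low = ms_saved(30, 36)     # 33.33 ms -> 27.78 ms per frame
saved_high = ms_saved(100, 120)  # 10.00 ms ->  8.33 ms per frame

print(f"30 -> 36 FPS saves {saved_low:.2f} ms per frame")
print(f"100 -> 120 FPS saves {saved_high:.2f} ms per frame")
```

Going from 30 to 36 FPS saves about 5.56 ms per frame, while going from 100 to 120 FPS saves about 1.67 ms. That is why the overlay reports milliseconds.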

You can get SRPBatcherProfiler.cs from a SRP Batcher project template on GitHub.

Various scenes benchmark

Here are some screenshots of Unity scenes with SRP Batcher OFF and ON, to show the speedup in various situations.

Book of the Dead, HDRP, PlayStation 4. 1.47x speedup. Please note that FPS doesn’t change, because this scene is GPU bound. You get 12ms left to do other things on the CPU side. The speedup is almost the same on PC.

FPS Sample, HDRP, PC DirectX 11. 1.23x speedup. Please note there is still 1.67ms going to the standard code path because of SRP Batcher incompatibility. In this case, the cause is skinned meshes and a few particles rendered using Material Property Blocks.

Boat Attack, LWRP, PlayStation 4. 2.13x speedup.

Supported Platforms

SRP Batcher works on almost all platforms. Here is a table showing each platform and the minimum Unity version required. Unity 2019.2 is currently in open alpha.

Some words about VR

The SRP Batcher fast code path is supported in VR, but only with “SinglePassInstanced” mode. Enabling VR won’t add any CPU time (thanks to SinglePassInstanced mode).

Common questions

How do I know I’m using SRP Batcher the best way possible?

Use SRPBatcherProfiler.cs, and first check that SRP Batcher is ON. Then, look at “Standard code path” timing. This should be close to 0, and all timing should be spent in “SRP Batcher code path”. Sometimes, it’s normal that some time is spent in the standard code path if your scene is using a few skinned meshes or particles. Check out our SRP Batcher Benchmark project on GitHub.

SRPBatcherProfiler shows similar timing regardless of whether SRP Batcher is ON or OFF. Why?

First, you should check that almost all rendering time goes through the new code path (see above). If it does, and the numbers are still similar, then look at the “flush” number. This “flush” number should decrease a lot when the SRP Batcher is ON. As a rule of thumb, a reduction by a factor of 10 is really nice, and by a factor of 2 is decent. If the flush count does not decrease much, it means you still have a lot of Shader variants. Try to reduce their number. If you wrote a lot of different Shaders, try to make an “uber” one with more parameters. Having tons of different material parameters is then free.

Global FPS didn’t change when I enabled the SRP Batcher. Why?

Check the two questions above. If SRPBatcherProfiler shows that “CPU Rendering time” is twice as fast, and the FPS did not change, then the CPU rendering part is not your bottleneck. It does not mean you’re not CPU bound – instead, maybe you’re using too much C# gameplay or too many physics elements. Anyway, if “CPU Rendering time” is twice as fast, it’s still positive. You probably noticed on the top video that even with 3.5x speedup, the scene is still at 60FPS. That’s because we have VSYNC turned ON. SRP Batcher really saved 6.8ms on the CPU side. Those milliseconds could be used for another task. It can also just save some battery life on mobile.
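
As a rough check of those numbers, assume the 3.5x speedup and the 6.8ms saving describe the same measurement (an assumption, not something the video states). The CPU rendering times before and after then follow from simple arithmetic:

```python
def render_times_from_saving(saved_ms, speedup):
    """Derive CPU rendering time before and after from a saving and a speedup.

    With before / after = speedup and before - after = saved_ms:
        before = saved_ms * speedup / (speedup - 1)
    """
    before = saved_ms * speedup / (speedup - 1.0)
    after = before / speedup
    return before, after


before_ms, after_ms = render_times_from_saving(6.8, 3.5)

# Frame budget with VSYNC at 60 Hz. This sketch only looks at CPU rendering
# time and ignores everything else that happens during a frame.
VSYNC_BUDGET_MS = 1000.0 / 60.0
both_fit_in_budget = before_ms < VSYNC_BUDGET_MS and after_ms < VSYNC_BUDGET_MS

print(f"before: {before_ms:.2f} ms, after: {after_ms:.2f} ms")
print(f"both under {VSYNC_BUDGET_MS:.2f} ms: {both_fit_in_budget}")
```

Under that assumption, rendering drops from roughly 9.5ms to roughly 2.7ms. Both fit in a 16.67ms frame, which is consistent with the displayed frame rate staying at 60 FPS.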

How to check SRP Batcher efficiency

It’s important to understand what a “batch” is in the SRP Batcher context. Traditionally, people tend to reduce the number of DrawCalls to optimize the CPU rendering cost. The real reason for that is that the engine has to set up a lot of things before issuing the draw. The real CPU cost comes from that setup, not from the GPU DrawCall itself (which is just a few bytes to push into the GPU command buffer). SRP Batcher doesn’t reduce the number of DrawCalls. It just reduces the GPU setup cost between DrawCalls.

You can see that on the following workflow:

On the left is the standard SRP rendering loop. On the right is the SRP Batcher loop. In the SRP Batcher context, a “batch” is just a sequence of “Bind”, “Draw”, “Bind”, “Draw”… GPU commands.

In standard SRP, the slow SetShaderPass is called for each new material. In SRP Batcher context, the SetShaderPass is called for each new shader variant.

To get maximum performance, you need to keep those batches as large as possible. So you need to avoid any shader variant change, but you can use any number of different Materials if they’re using the same shader.
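
The difference between the two loops can be sketched with a toy model: given an ordered list of draws, count how often the expensive setup would run in each loop, and how long the resulting SRP Batcher batches are. The draw list and names below are made up for illustration. This is a counting model of the description above, not the engine’s code.

```python
from itertools import groupby


def count_setups(draws):
    """draws is an ordered list of (material, shader_variant) pairs.

    Returns (standard_setups, batcher_setups, batch_sizes):
    - standard loop: one setup for each run of draws sharing a material
    - SRP Batcher loop: one setup for each run sharing a shader variant
    - batch_sizes: number of draws in each of those shader variant runs
    """
    standard_setups = sum(1 for _ in groupby(draws, key=lambda d: d[0]))
    batch_sizes = [len(list(run)) for _, run in groupby(draws, key=lambda d: d[1])]
    return standard_setups, len(batch_sizes), batch_sizes


# Six draws, each with its own material, using two shader variants.
draws = [
    ("mat0", "variantA"),
    ("mat1", "variantA"),
    ("mat2", "variantA"),
    ("mat3", "variantB"),
    ("mat4", "variantB"),
    ("mat5", "variantA"),
]

standard_setups, batcher_setups, batch_sizes = count_setups(draws)
print(standard_setups, batcher_setups, batch_sizes)
```

In this model, the standard loop pays for six setups, while the SRP Batcher loop pays for three, with batches of 3, 2 and 1 draws. The last draw forms its own batch only because a different variant came just before it.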

You can use Unity Frame Debugger to look at the SRP Batcher “batches” length. Each batch is an event in frame debugger called “SRP Batch”, as you can see here:

See the SRP Batch event on the left. See also the size of the batch, which is the number of Draw Calls (109 here). That’s a pretty efficient batch. You also see the reason why the previous batch had been broken (“Node use different shader keywords”). It means the shader keywords used for that batch are different than the keywords in the previous batch. It means that the shader variant has changed, and we have to break the batch.

In some scenes, some batches can be really small, like this one:

Batch size is only 2. It probably means you have too many different shader variants. If you’re creating your own SRP, try to write a generic “uber” shader with a minimum of keywords. You don’t have to worry about how many material parameters you put in the “property” section.

NOTE: SRP Batcher information in Frame Debugger requires Unity 2018.3 or higher.

Write your own SRP with compatible shader

Note: This section is made for advanced users writing their own Scriptable Render Loop and shader library. LWRP or HDRP users can skip this section, as all shaders we provide are already SRP Batcher compatible.

If you’re writing your own render loop, your shaders have to follow some rules in order to go through the SRP Batcher code path.

“Per Material” variables

First, all “per material” data should be declared in a single CBUFFER named “UnityPerMaterial”. What is “per material” data? Typically, it is every variable you declare in the “shader property” section, that is, every variable your artists can tweak in the Material Inspector. For instance, let’s look at a simple shader like:

If you compile this shader, the shader inspector panel will show you:

To fix that, just declare all your “per material” data like this:

“Per Object” variables

SRP Batcher also needs a very special CBUFFER named “UnityPerDraw”. This CBUFFER should contain all Unity built-in engine variables.

The variable declaration order inside of “UnityPerDraw” CBUFFER is also important. All variables should respect some layout we call “Block Feature”. For instance, the “Space Position block feature” should contain all those variables, in that order:

You don’t have to declare block features you don’t need. All built-in engine variables in “UnityPerDraw” should be float4 or float4x4. On mobile, people may want to use real4 (a 16-bit floating-point value) to save some GPU bandwidth. Not all UnityPerDraw variables can use “real4”. Please refer to the “Could be real4” column.

Here is a table describing all possible block features you could use in the “UnityPerDraw” CBUFFER:

NOTE: If one of the variables of a feature block is declared as real4 (half), then all other potential variables of that feature block should also be declared as real4.
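
The type rules for a feature block can be written down as a small check. This sketch encodes only the rules stated in this section. The variable names and the could_be_real4 flags are hypothetical placeholders, not Unity’s actual feature blocks.

```python
ALLOWED_TYPES = {"float4", "float4x4", "real4"}


def check_feature_block(variables):
    """variables is a list of (type_name, variable_name, could_be_real4).

    Returns a list of problems; an empty list means the block follows the
    rules described in this section.
    """
    problems = []
    for type_name, name, could_be_real4 in variables:
        if type_name not in ALLOWED_TYPES:
            problems.append(f"{name}: type {type_name} is not float4, float4x4 or real4")
        elif type_name == "real4" and not could_be_real4:
            problems.append(f"{name}: may not be declared as real4")

    # If any eligible variable is real4, every eligible variable must be real4.
    eligible = [(t, n) for t, n, could_be_real4 in variables if could_be_real4]
    if any(t == "real4" for t, _ in eligible):
        for type_name, name in eligible:
            if type_name != "real4":
                problems.append(
                    f"{name}: should be real4 because another variable in this block is")
    return problems


# Hypothetical block mixing real4 and float4 among eligible variables.
mixed_block = [
    ("float4x4", "example_Matrix", False),
    ("real4", "example_ParamA", True),
    ("float4", "example_ParamB", True),
]
# The same block declared consistently.
consistent_block = [
    ("float4x4", "example_Matrix", False),
    ("real4", "example_ParamA", True),
    ("real4", "example_ParamB", True),
]

mixed_problems = check_feature_block(mixed_block)
consistent_problems = check_feature_block(consistent_block)
```

The Shader Inspector performs the real layout check. A model like this is only useful for reasoning about the rule.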

HINT 1: Always check the compatibility status of a new shader in the Inspector. We check for several potential errors (UnityPerDraw layout declaration, etc.) and display why it’s not compatible.

HINT 2: When writing your own SRP shader you can refer to LWRP or HDRP package to look at their UnityPerDraw CBUFFER declaration for inspiration.

Future

We continue to improve SRP Batcher by increasing batch size in some rendering passes (especially Shadow and Depth passes).

We’re also working on adding automatic GPU instancing usage with SRP Batcher. We started with the new DOTS renderer used in our MegaCity demo. The speedup in the Unity editor is quite impressive, going from 10 to 50 FPS.

MegaCity in-editor with SRP Batcher & DOTS renderer. The difference in performance is so huge that even global frame rate speeds up by a factor of five.

NOTE: To be precise, this massive speedup when enabling the SRP Batcher is editor only, due to editor currently not using Graphics Jobs. Speedup in Standalone player mode is something like x2.

MegaCity in Editor. If you could play the video at 60hz you would feel the speed up when enabling SRP Batcher.

NOTE: SRP Batcher with DOTS renderer is still experimental and in active development.

24 Comments

  1. dfon nan fedex

  2. Another weekend exploring new optimization solutions. Keep em coming Unity!

  3. The option doesn’t show on 2018.3.7 for me, what am I missing? I have installed the LWRP package (obviously)

  4. yeah, ok…. but please, remember to make public a tool when the tool is finished (not a preview), integrated, good documented (with examples of practique use and not just a “maybe you can use this in this way, understand it by yourself”), designed for a massive use and not only for some programmers, thanks.

    1. Setriakor Nyomi

      March 6, 2019 6:05 am

      I disagree. The preview programme is very important. It allows us to give feedback about upcoming features while they’re still in development and vastly shortens the dev cycle. Why get the opinions of only a handful of developers in a room when you can get it from 10’s of thousands all over the world?

      1. Well, is my opinion, I want to made videogames, not to be a Unity tester.

        They changed and replaced a lot of stuff on Unity 2018.3 and 2019 with tools that are on development (like the nested prefabs that broken with all the previous prefab workflow, I hate that and we don´t have alternatives) and they are showing some of that un-finished tools (like the SRP) as the principal characteristics of the newer versions, that tools are undocumented, aren’t full tested and have a lot of instability. Great if that tools are on preview, but please don’t sell it like a finished tools. Take your time, finished the tool´s development and launch a real Unity update, not this.

  5. I see only half of article.

  6. In the per object variables section, you write: Please refer to the “Could be real4” column.
    However, this column does not exist in the table.

  7. John KP @ Mindshow

    March 1, 2019 7:08 pm

    The SRP Batch code path isn’t running for me when VR is enabled in my project. This was replicated in a clean project with just art and unmodifed code as well:

    2019.1 b4
    Unmodified version of LWRP
    VR Enabled (Single Pass Instanced)
    SRP Batching Enabled in Lightweight Render Pipeline Asset.
    Graphics Jobs Enabled
    Windows Standalone
    DX11
    All materials in scene are using the Lit shader provided by LWRP.

    Result:
    SRP Batching Profiler (and frame debugger) reveal that the SRP Batches aren’t happening.

    If VR is disabled (or if frame debugger is sampled while play mode is inactive in editor)
    SRP Batching works as expected.

    Is there something I’m overlooking?

  8. Great article!
    By the way, if I see this blog from japan,
    the url is redirected to “blogs.unity3d.com/jp/2019/02/28/srp-batcher-speed-up-your-rendering/” and the content looks broken.

  9. The SRP Batcher seems to break my Statistics and Frame Debugger windows. Stats shows 6 batches and 1k tris for a scene with 1k batches and 800k tris with it disabled. The frame debugger has no Depth Prepass or GBuffer at all with the batcher enabled.
    This is on Unity 2018.3.6f1 with HDRP 4.10.0.

  10. Very cool!
    Is Unity working on a similar render approach as Frostbite/Unreal? I got this from a post:
    “The new mesh processor works by uploading the entire scene data to gpu memory. Instead of setting uniform parameters per-drawcall, it just uploads everything to GPU buffers and then refers back to those though indices. This allows unreal to do a lot more multithreading in the renderer, even in DX11 or phones, and it also makes unreal able to automatically instance everything possible.”

    No idea how this is called. Maybe not the best place to ask but anyway, is Unity doing something similar or researching?

    1. Er, that is what is happening here. GPU data persistence.

      1. No, it’s not quite right (partially). Epic changed a big chunk of renderer using Data-Oriented design along with an automatic instancing.

        1. They are working on a ECS based renderer. So I think, they would be refactoring their mesh drawing pipeline.
          Also where did you get this “EPIC changed a big chunk of renderer using Data-Oriented design”. I know they refactored and removed the legacy mesh drawing system, but I dont think its DOD-based.

    2. You are well come. I’m a user, ex UDK. I think what you are looking for is this:

      [Unity ECS: https://unity.com/dots#burst-compiler]

      Unity is working on this since 2017 2018 (I think) and so there are tutorials on YouTube.

      Explanation: [https://www.youtube.com/watch?v=d9Z4EUZ5apo]

      Here there is a Unity example that they are working on:

      [https://unity.com/megacity]

      And a Unity tutorial:

      [https://unity3d.com/learn/tutorials/topics/scripting/introduction-ecs?_ga=2.87982238.788035148.1551051194-2090915164.1475000191]

      So is not a job system for the mesh. Is also for the engine itself. To do this Unity is migrating to a special C# now called HPC# to get the performance of C++ in multithreading and using this memory layout that gives to the GPU all mesh, textures, animation inline order. The slogan for Unity engine improvement is: performance by default.

      [https://unity.com/dots]

      That will be introduce in this 2019 probably at GDC now in march.

    3. Here in Unity, there are 4 or 5 rendering engines: a traditional Unity5, a Desktop HDRP unity high definition render pipeline, the lightweight render pipeline LWRP for movies, other for VR with low performance and a custom rendering pipeline SRP. This article refers to this last one. I think is for advanced users or industries that want to migrate to Unity (and I presume If you are an expert you can put Unreal rendering in it if you wish).

  11. Awesome post. Thanks !

    Does that mean that we have to stop using MaterialPropertyBlocks and actually set the changing property directly on the material ?
    Or you working on a new MPB ?

    Thanks !

  12. Awesome work as usual! Excited about all these great improvements being made these past few years, the future looks bright for Unity.

    > If you could play the video at 60hz you would feel the speed up when enabling SRP Batcher.

    This is a 60fps video though, so 99.9% of people should be able to see it at 60hz!

  13. Great work, is this available on the Vulkan/DX12 APIs as I think these API’s are also more threadable and could potentially have greater gains in performance?

  14. The images labelled “SRP Batcher rendering workflow” and “standard pipeline rendering workflow” are identical, which seems to be a mistake. Otherwise whats the point of differentiating when they are the same?

    1. Isaac, these definitely a mistake. I’m sure Arnaud will fix it soon

    2. Community Team

      February 28, 2019 2:10 pm

      Thanks for noticing, and sorry for the confusion, we’ve just fixed it!

      1. First video in the post dont work in Firefox. :)

        Batcher works for OpenGL ES 3.1+
        Is there plans to make Batcher work on OpenGL ES 3.0? if no what is best workflow to create game for gles 3.0 and gles 3.1 for best performance?