Dissecting the Unreal Rendering Architecture (03): Rendering Mechanism

 

 

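3.1 Overview and Fundamentals

3.1.1 Overview of the Rendering Mechanism

This article explains how UE organizes the objects in a scene into individual draw calls, which optimizations and processing steps it applies along the way, and how the scene renderer renders the whole scene. The main topics are:

  • The mesh drawing flow.
  • Dynamic and static rendering paths.
  • The scene renderer.
  • The underlying concepts and optimization techniques.
  • Code walkthroughs of the core classes and interfaces.

Later sections cover each of these in detail.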

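3.1.2 Fundamentals of the Rendering Mechanism

As usual, before getting into the main topic, here is a review of the basic concepts and types this article relies on.

Type: Description
UPrimitiveComponent: The primitive component, the base class of every object that can be rendered or takes part in physics simulation. It is the finest granularity at which CPU-side culling operates.
FPrimitiveSceneProxy: The primitive scene proxy, the renderer-side counterpart of a UPrimitiveComponent. It mirrors the UPrimitiveComponent's state on the rendering thread.
FPrimitiveSceneInfo: Internal renderer state (tied to the FRendererModule implementation) that brings together a UPrimitiveComponent and its FPrimitiveSceneProxy. It exists only in the renderer module, so the engine module is unaware of it.
FScene: The renderer-module counterpart of UWorld. Only objects added to FScene are visible to the renderer. The rendering thread owns all FScene state (the game thread must not modify it directly).
FSceneView: Describes a single view into an FScene. One FScene can have several views; in other words, a scene can be drawn by multiple views, and multiple views can be drawn at the same time. A new view instance is created every frame.
FViewInfo: The renderer's internal representation of a view. It exists only in the renderer module and is not visible to the engine module.
FSceneRenderer: Created every frame; encapsulates per-frame temporary data. Its subclasses FDeferredShadingSceneRenderer (deferred shading scene renderer) and FMobileSceneRenderer (mobile scene renderer) are the default renderers for PC and mobile respectively.
FMeshBatchElement: The data for a single mesh, holding part of what is needed to render it: vertices, indices, UniformBuffer, various flags, and so on.
FMeshBatch: Holds a group of FMeshBatchElement entries that share the same material and vertex buffer.
FMeshDrawCommand: A complete description of all state and data for one draw call in a pass: shader bindings, vertex data, index data, cached PSO, and so on.
FMeshPassProcessor: The mesh pass processor. It processes the meshes in the scene that a pass cares about, converting each FMeshBatch into one or more FMeshDrawCommand objects.

Note that of the types above, only UPrimitiveComponent is a game-thread object; the rest belong to the rendering thread.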

 

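3.2 Mesh Drawing Pipeline

3.2.1 Overview of the Mesh Drawing Pipeline

Anyone who has learned a graphics API such as OpenGL or DirectX has probably seen code like the following (drawing triangles with OpenGL as the example):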

void DrawTriangle()
{
    // Build the vertex and index data.
    float vertices[] = {
         0.5f,  0.5f, 0.0f,  // top right
         0.5f, -0.5f, 0.0f,  // bottom right
        -0.5f, -0.5f, 0.0f,  // bottom left
        -0.5f,  0.5f, 0.0f   // top left 
    };
    unsigned int indices[] = {
        0, 1, 3,  // first Triangle
        1, 2, 3   // second Triangle
    };
    
    // Create and bind the GPU-side resources.
    unsigned int VBO, VAO, EBO;
    glGenVertexArrays(1, &VAO);
    glGenBuffers(1, &VBO);
    glGenBuffers(1, &EBO);
    glBindVertexArray(VAO);

    glBindBuffer(GL_ARRAY_BUFFER, VBO);
    glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, EBO);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
    glEnableVertexAttribArray(0);

    glBindBuffer(GL_ARRAY_BUFFER, 0); 
    glBindVertexArray(0); 

    // Clear the background.
    glClearColor(0.2f, 0.3f, 0.3f, 1.0f);
    glClear(GL_COLOR_BUFFER_BIT);
    
    // Draw the indexed triangles (shaderProgram is assumed to have been compiled and linked elsewhere).
    glUseProgram(shaderProgram);
    glBindVertexArray(VAO);
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, 0);
}

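The Hello Triangle example above goes through roughly three stages: building the data on the CPU, creating and binding the GPU-side resources, and issuing the draw call. For a simple application, or for learning graphics, calling the graphics API directly keeps things short and to the point. A commercial game engine, however, has to render complex scenes (hundreds to thousands of draw calls, hundreds of thousands or even millions of triangles) at tens of frames per second, so it cannot simply issue graphics API calls directly like this.

Before it actually calls the graphics API, a commercial engine performs a lot of work and optimization: occlusion culling, dynamic and static batching, dynamic instancing, caching state and commands, generating intermediate commands that are later translated into graphics API commands, and so on.

In UE 4.21 and earlier, this was done by the mesh draw pipeline shown below:

The mesh drawing flow in UE 4.21 and earlier.

Roughly, at render time the renderer walks all FPrimitiveSceneProxy objects that passed the visibility tests and uses their interface to collect FMeshBatch objects. Each render pass then iterates over those FMeshBatch objects and uses the pass's DrawingPolicy to turn them into RHI command lists, which are finally translated into commands for the underlying graphics API and submitted to the GPU for execution.

To enable further rendering optimizations, UE 4.22 substantially refactored the mesh drawing pipeline on top of this. It dropped the inefficient DrawingPolicy in favor of FMeshPassProcessor and inserted a new concept, FMeshDrawCommand, between FMeshBatch and the RHI commands, so that draw commands can be sorted, cached, and merged more aggressively and with more control:

The new mesh drawing flow after the UE 4.22 refactor, which adds new concepts and operations such as FMeshDrawCommand and FMeshPassProcessor.

The refactor had two main goals:

  • Support real-time ray tracing (RTX). Ray tracing needs to traverse every object in the scene, so shader resources for the whole scene have to be kept around.
  • A GPU-driven rendering pipeline. With culling done on the GPU, the CPU does not know per-frame visibility, yet it cannot rebuild draw commands for the entire scene every frame either, or real-time rendering would be out of reach.

To get there, the refactored pipeline caches much more aggressively:

  • Draw commands for static primitives are built when the primitive is added to the scene, then cached.
  • The RHI layer is allowed to do as much preprocessing as possible.
    • Shader Binding Table entries.
    • Graphics Pipeline State.
  • Static meshes no longer rebuild their draw commands every frame.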
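
The caching strategy in the list above can be sketched in a few lines. This is a minimal, self-contained illustration and not engine code: SketchScene, MeshBatch, DrawCommand, and the way the command fields are derived are all hypothetical stand-ins. What it tries to show is that static primitives pay the build cost once, when they are added, while only dynamic batches are rebuilt each frame.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for illustration; not the engine's types.
struct MeshBatch   { uint32_t MaterialId; uint32_t VertexFactoryId; };
struct DrawCommand { uint32_t PipelineStateId; uint32_t ShaderBindingsId; };

struct SketchScene
{
    std::vector<DrawCommand> CachedStaticCommands; // built once, reused every frame
    int NumBuilds = 0;                             // how many commands were built so far

    // Stand-in for the (expensive) conversion from a mesh batch to a draw command.
    DrawCommand BuildCommand(const MeshBatch& Batch)
    {
        ++NumBuilds;
        return DrawCommand{ Batch.MaterialId * 31u + Batch.VertexFactoryId, Batch.MaterialId };
    }

    // Static primitives: build the command when added to the scene, then cache it.
    void AddStaticPrimitive(const MeshBatch& Batch)
    {
        CachedStaticCommands.push_back(BuildCommand(Batch));
    }

    // Per frame: reuse the cached commands and build only the dynamic ones.
    std::vector<DrawCommand> BuildFrameCommands(const std::vector<MeshBatch>& DynamicBatches)
    {
        std::vector<DrawCommand> Frame = CachedStaticCommands;
        for (const MeshBatch& Batch : DynamicBatches)
        {
            Frame.push_back(BuildCommand(Batch));
        }
        return Frame;
    }
};
```

With three static primitives and one dynamic batch, two frames trigger five builds in total (three at add time, one per frame), rather than eight.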

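After the refactor, in most scenes the DepthPass and BasePass need several times fewer draw calls, and a very large number of commands are cached:

Rendering statistics for a Fortnite test scene under the old and new mesh drawing pipelines. The new flow cuts the number of draw calls substantially and caches a very large number of commands.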
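
One plausible way draw calls drop once commands are explicit objects is to sort them by state and merge neighbours that are identical into a single instanced draw. The sketch below illustrates that general idea only; it is an assumption about the mechanism, not the engine's implementation, and DrawCommand, StateKey, and MergeByState are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command: StateKey stands for everything that must match
// (pipeline state, shader bindings, buffers) for two draws to be merged.
struct DrawCommand
{
    uint64_t StateKey;
    uint32_t NumInstances;
};

// Sort by state, then fold runs of equal keys into one instanced draw.
std::vector<DrawCommand> MergeByState(std::vector<DrawCommand> Commands)
{
    std::stable_sort(Commands.begin(), Commands.end(),
        [](const DrawCommand& A, const DrawCommand& B) { return A.StateKey < B.StateKey; });

    std::vector<DrawCommand> Merged;
    for (const DrawCommand& Command : Commands)
    {
        if (!Merged.empty() && Merged.back().StateKey == Command.StateKey)
        {
            Merged.back().NumInstances += Command.NumInstances; // same state: one more instance
        }
        else
        {
            Merged.push_back(Command); // new state: new draw call
        }
    }
    return Merged;
}
```

Five single-instance commands with keys 7, 3, 7, 3, 9 collapse into three draws, two of which carry two instances each.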

The rest of this section analyzes the refactored mesh drawing flow.

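3.2.2 From FPrimitiveSceneProxy to FMeshBatch

The previous article explained that FPrimitiveSceneProxy is the rendering-thread mirror of the game thread's UPrimitiveComponent. FMeshBatch is new in this section: it contains all the information a pass needs in order to draw, and it decouples mesh passes from FPrimitiveSceneProxy, so an FPrimitiveSceneProxy does not know which passes will draw it.

The main declarations of FMeshBatch and FMeshBatchElement are: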

// Engine\Source\Runtime\Engine\Public\MeshBatch.h

// A mesh batch element: the data needed for a single mesh within an FMeshBatch.
struct FMeshBatchElement
{
    // The mesh's UniformBuffer; must be null when GPU Scene is in use.
    FRHIUniformBuffer* PrimitiveUniformBuffer;
    // The CPU-side data behind the mesh's UniformBuffer.
    const TUniformBuffer<FPrimitiveUniformShaderParameters>* PrimitiveUniformBufferResource;
    // Index buffer.
    const FIndexBuffer* IndexBuffer;

    union 
    {
        uint32* InstanceRuns;
        class FSplineMeshSceneProxy* SplineMeshSceneProxy;
    };
    // User data.
    const void* UserData;
    void* VertexFactoryUserData;

    FRHIVertexBuffer* IndirectArgsBuffer;
    uint32 IndirectArgsOffset;

    // Primitive ID mode: PrimID_FromPrimitiveSceneInfo (GPU Scene mode) or PrimID_DynamicPrimitiveShaderData (each mesh has its own UniformBuffer).
    // May only be modified by the renderer.
    EPrimitiveIdMode PrimitiveIdMode : PrimID_NumBits + 1;
    uint32 DynamicPrimitiveShaderDataIndex : 24;

    uint32 FirstIndex;
    /** When 0, IndirectArgsBuffer will be used. */
    uint32 NumPrimitives;

    // Number of instances.
    uint32 NumInstances;
    uint32 BaseVertexIndex;
    uint32 MinVertexIndex;
    uint32 MaxVertexIndex;
    int32 UserIndex;
    float MinScreenSize;
    float MaxScreenSize;

    uint32 InstancedLODIndex : 4;
    uint32 InstancedLODRange : 4;
    uint32 bUserDataIsColorVertexBuffer : 1;
    uint32 bIsSplineProxy : 1;
    uint32 bIsInstanceRuns : 1;

    // Returns the number of primitives.
    int32 GetNumPrimitives() const
    {
        if (bIsInstanceRuns && InstanceRuns)
        {
            int32 Count = 0;
            for (uint32 Run = 0; Run < NumInstances; Run++)
            {
                Count += NumPrimitives * (InstanceRuns[Run * 2 + 1] - InstanceRuns[Run * 2] + 1);
            }
            return Count;
        }
        else
        {
            return NumPrimitives * NumInstances;
        }
    }
};


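// A mesh batch.
struct FMeshBatch
{
    // The FMeshBatchElement entries in this batch share the same material and vertex buffer.
    // TInlineAllocator<1> reserves inline storage for one element, so the common single-element case needs no heap allocation.
    TArray<FMeshBatchElement,TInlineAllocator<1> > Elements; 
    const FVertexFactory* VertexFactory; // Vertex factory.
    const FMaterialRenderProxy* MaterialRenderProxy; // Material used for rendering.

    uint16 MeshIdInPrimitive; // Id of this mesh within its primitive; gives meshes of the same primitive a stable sort order.
    int8 LODIndex; // Mesh LOD index, used for smooth LOD transitions.
    uint8 SegmentIndex; // Sub-mesh (section) index.
    
    // Culling flags.
    uint32 ReverseCulling : 1;
    uint32 bDisableBackfaceCulling : 1;

    // Flags tying the batch to specific render passes.
    uint32 CastShadow        : 1; // Whether to render in shadow passes.
    uint32 bUseForMaterial    : 1; // Whether to render in passes that need the material.
    uint32 bUseForDepthPass : 1; // Whether to render in the depth pass.
    uint32 bUseAsOccluder    : 1; // Whether this batch acts as an occluder.
    uint32 bWireframe        : 1; // Whether to render as wireframe.

    uint32 Type : PT_NumBits; // Primitive type, e.g. PT_TriangleList (default), PT_LineList, ...
    uint32 DepthPriorityGroup : SDPG_NumBits; // Depth priority group, e.g. SDPG_World (default), SDPG_Foreground

    // Other flags and data.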
    const FLightCacheInterface* LCI;
    FHitProxyId BatchHitProxyId;
    float TessellationDisablingShadowMapMeshSize;
    
    uint32 bCanApplyViewModeOverrides : 1;
    uint32 bUseWireframeSelectionColoring : 1;
    uint32 bUseSelectionOutline : 1;
    uint32 bSelectable : 1;
    uint32 bRequiresPerElementVisibility : 1;
    uint32 bDitheredLODTransition : 1;
    uint32 bRenderToVirtualTexture : 1;
    uint32 RuntimeVirtualTextureMaterialType : RuntimeVirtualTexture::MaterialType_NumBits;
    
    (......)
    
    // Utility methods.
    bool IsTranslucent(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsDecal(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsDualBlend(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool UseForHairStrands(ERHIFeatureLevel::Type InFeatureLevel) const;
    bool IsMasked(ERHIFeatureLevel::Type InFeatureLevel) const;
    int32 GetNumPrimitives() const;
    bool HasAnyDrawCalls() const;
};

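As shown, an FMeshBatch records a group of FMeshBatchElement entries that share the same material and vertex factory (see the figure below), plus pass-specific flags and other data, all for use and further processing in the later stages of mesh rendering.

An FMeshBatch owns a group of FMeshBatchElement entries, one vertex factory, and one material instance. All FMeshBatchElement entries in the same FMeshBatch share the same material and vertex buffer (which can be thought of as the vertex factory). In most cases, though, an FMeshBatch has only one FMeshBatchElement.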
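
The primitive-count logic in FMeshBatchElement::GetNumPrimitives is easier to follow with concrete numbers. The sketch below mirrors the arithmetic of the declaration quoted earlier in a standalone form. BatchElementSketch and CountPrimitives are hypothetical names, and the reading of InstanceRuns as inclusive [first, last] pairs is inferred from the `+ 1` in that code rather than from engine documentation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in holding only the fields the count depends on.
struct BatchElementSketch
{
    uint32_t NumPrimitives = 0;             // primitives (e.g. triangles) per instance
    uint32_t NumInstances = 1;              // instance count, or number of runs when bIsInstanceRuns is set
    const uint32_t* InstanceRuns = nullptr; // pairs of inclusive [first, last] instance indices
    bool bIsInstanceRuns = false;
};

int32_t CountPrimitives(const BatchElementSketch& Element)
{
    if (Element.bIsInstanceRuns && Element.InstanceRuns)
    {
        int32_t Count = 0;
        for (uint32_t Run = 0; Run < Element.NumInstances; ++Run)
        {
            const uint32_t First = Element.InstanceRuns[Run * 2];
            const uint32_t Last  = Element.InstanceRuns[Run * 2 + 1];
            Count += static_cast<int32_t>(Element.NumPrimitives * (Last - First + 1));
        }
        return Count;
    }
    return static_cast<int32_t>(Element.NumPrimitives * Element.NumInstances);
}
```

For a 12-triangle mesh with the runs [0, 3] and [10, 11], the count is 12 × 4 + 12 × 2 = 72; without runs, five instances give 12 × 5 = 60.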

At the start of rendering, the scene renderer FSceneRenderer runs visibility tests and culling to discard occluded and hidden objects. Near the end of this stage it calls GatherDynamicMeshElements to collect the dynamic mesh elements of the visible FPrimitiveSceneProxy objects in the scene. The call hierarchy looks roughly like this:

// Call hierarchy shown as nested pseudocode; not compilable C++.
void FSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    bool FDeferredShadingSceneRenderer::InitViews(FRHICommandListImmediate& RHICmdList,  ...)
    {
        void FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate& RHICmdList, ...)
        {
            FSceneRenderer::GatherDynamicMeshElements(Views, Scene, ViewFamily, DynamicIndexBuffer, DynamicVertexBuffer, DynamicReadBuffer, HasDynamicMeshElementsMasks, HasDynamicEditorMeshElementsMasks, HasViewCustomDataMasks, MeshCollector);
        }
    }
}

Now let's look at what FSceneRenderer::GatherDynamicMeshElements does:

// Engine\Source\Runtime\Renderer\Private\SceneVisibility.cpp

void FSceneRenderer::GatherDynamicMeshElements(
    TArray<FViewInfo>& InViews, 
    const FScene* InScene, 
    const FSceneViewFamily& InViewFamily, 
    FGlobalDynamicIndexBuffer& DynamicIndexBuffer,
    FGlobalDynamicVertexBuffer& DynamicVertexBuffer,
    FGlobalDynamicReadBuffer& DynamicReadBuffer,
    const FPrimitiveViewMasks& HasDynamicMeshElementsMasks, 
    const FPrimitiveViewMasks& HasDynamicEditorMeshElementsMasks, 
    const FPrimitiveViewMasks& HasViewCustomDataMasks,
    FMeshElementCollector& Collector)
{
    (......)
    
    int32 NumPrimitives = InScene->Primitives.Num();

    int32 ViewCount = InViews.Num();
    {
        // Prepare the FMeshElementCollector.
        Collector.ClearViewMeshArrays();
        for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
        {
            Collector.AddViewMeshArrays(
                &InViews[ViewIndex], 
                &InViews[ViewIndex].DynamicMeshElements,
                &InViews[ViewIndex].SimpleElementCollector,
                &InViews[ViewIndex].DynamicPrimitiveShaderData, 
                InViewFamily.GetFeatureLevel(),
                &DynamicIndexBuffer,
                &DynamicVertexBuffer,
                &DynamicReadBuffer);
        }

        const bool bIsInstancedStereo = (ViewCount > 0) ? (InViews[0].IsInstancedStereoPass() || InViews[0].bIsMobileMultiViewEnabled) : false;
        const EShadingPath ShadingPath = Scene->GetShadingPath();
        
        // Iterate over every primitive in the scene.
        for (int32 PrimitiveIndex = 0; PrimitiveIndex < NumPrimitives; ++PrimitiveIndex)
        {
            const uint8 ViewMask = HasDynamicMeshElementsMasks[PrimitiveIndex];

            if (ViewMask != 0) // Only process primitives that are not occluded or hidden.
            {
                // Don't cull a single eye when drawing a stereo pair
                const uint8 ViewMaskFinal = (bIsInstancedStereo) ? ViewMask | 0x3 : ViewMask;

                FPrimitiveSceneInfo* PrimitiveSceneInfo = InScene->Primitives[PrimitiveIndex];
                const FPrimitiveBounds& Bounds = InScene->PrimitiveBounds[PrimitiveIndex];
                // Tell the collector which FPrimitiveSceneProxy is being collected.
                Collector.SetPrimitive(PrimitiveSceneInfo->Proxy, PrimitiveSceneInfo->DefaultDynamicHitProxyId);
                // Set the per-view custom data for dynamic meshes.
                SetDynamicMeshElementViewCustomData(InViews, HasViewCustomDataMasks, PrimitiveSceneInfo);

                // Mark the start of this primitive's range in DynamicMeshEndIndices.
                if (PrimitiveIndex > 0)
                {
                    for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                    {
                        InViews[ViewIndex].DynamicMeshEndIndices[PrimitiveIndex - 1] = Collector.GetMeshBatchCount(ViewIndex);
                    }
                }
                
                // Collect the dynamic mesh elements.
                PrimitiveSceneInfo->Proxy->GetDynamicMeshElements(InViewFamily.Views, InViewFamily, ViewMaskFinal, Collector);

                // Mark the end of this primitive's range in DynamicMeshEndIndices.
                for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                {
                    InViews[ViewIndex].DynamicMeshEndIndices[PrimitiveIndex] = Collector.GetMeshBatchCount(ViewIndex);
                }
                
                // Process the mesh pass data and flags.
                for (int32 ViewIndex = 0; ViewIndex < ViewCount; ViewIndex++)
                {
                    if (ViewMaskFinal & (1 << ViewIndex))
                    {
                        FViewInfo& View = InViews[ViewIndex];
                        const bool bAddLightmapDensityCommands = View.Family->EngineShowFlags.LightMapDensity && AllowDebugViewmodes();
                        const FPrimitiveViewRelevance& ViewRelevance = View.PrimitiveViewRelevanceMap[PrimitiveIndex];

                        const int32 LastNumDynamicMeshElements = View.DynamicMeshElementsPassRelevance.Num();
                        View.DynamicMeshElementsPassRelevance.SetNum(View.DynamicMeshElements.Num());

                        for (int32 ElementIndex = LastNumDynamicMeshElements; ElementIndex < View.DynamicMeshElements.Num(); ++ElementIndex)
                        {
                            const FMeshBatchAndRelevance& MeshBatch = View.DynamicMeshElements[ElementIndex];
                            FMeshPassMask& PassRelevance = View.DynamicMeshElementsPassRelevance[ElementIndex];
                            // Compute which mesh passes reference this MeshBatch and update the view's per-pass counts.
                            ComputeDynamicMeshRelevance(ShadingPath, bAddLightmapDensityCommands, ViewRelevance, MeshBatch, View, PassRelevance, PrimitiveSceneInfo, Bounds);
                        }
                    }
                }
            }
        }
    }

    (......)
    
    // Run the collector's pending tasks.
    MeshCollector.ProcessTasks();
}
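
The view-mask test and the DynamicMeshEndIndices bookkeeping in GatherDynamicMeshElements can be isolated in a small sketch. Everything here is a simplified assumption made for illustration: ViewOutput and GatherSketch are hypothetical names, a collected batch is represented by the index of the primitive that produced it, and each primitive is assumed to emit a fixed number of batches per view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-view output.
struct ViewOutput
{
    std::vector<int> Meshes;             // one entry per collected batch (stores the primitive index)
    std::vector<std::size_t> EndIndices; // EndIndices[i]: batch count right after primitive i
};

// ViewMasks[i]: bit v is set when primitive i has dynamic elements in view v.
// BatchesPerPrimitive[i]: batches primitive i emits into each view it is visible in.
void GatherSketch(const std::vector<uint8_t>& ViewMasks,
                  const std::vector<int>& BatchesPerPrimitive,
                  std::vector<ViewOutput>& Views)
{
    const std::size_t NumPrimitives = ViewMasks.size();
    for (ViewOutput& View : Views)
    {
        View.EndIndices.assign(NumPrimitives, 0);
    }

    for (std::size_t Prim = 0; Prim < NumPrimitives; ++Prim)
    {
        const uint8_t Mask = ViewMasks[Prim];
        if (Mask == 0)
        {
            continue; // not visible in any view
        }

        // Start marker: close the previous primitive's entry at the current count.
        if (Prim > 0)
        {
            for (ViewOutput& View : Views) { View.EndIndices[Prim - 1] = View.Meshes.size(); }
        }

        // Emit batches only into the views whose bit is set.
        for (std::size_t V = 0; V < Views.size(); ++V)
        {
            if (Mask & (1u << V))
            {
                Views[V].Meshes.insert(Views[V].Meshes.end(),
                                       static_cast<std::size_t>(BatchesPerPrimitive[Prim]),
                                       static_cast<int>(Prim));
            }
        }

        // End marker: batches in [EndIndices[Prim - 1], EndIndices[Prim]) belong to this primitive.
        for (ViewOutput& View : Views) { View.EndIndices[Prim] = View.Meshes.size(); }
    }
}
```

With masks {0b01, 0b00, 0b11} and batch counts {2, 5, 1}, view 0 ends up with end indices {2, 2, 3} and view 1 with {0, 0, 1}, so the third primitive's batches are found at [2, 3) in view 0 and [0, 1) in view 1.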

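As the GatherDynamicMeshElements code shows, dynamic primitive data is collected through an FMeshElementCollector, which is given a set of output arrays for each view (via AddViewMeshArrays) so that it can gather the mesh data of every visible FPrimitiveSceneProxy. The key line is PrimitiveSceneInfo->Proxy->GetDynamicMeshElements(), which gives each primitive the chance to add its visible mesh elements to the renderer's collector. The base-class FPrimitiveSceneProxy implementation of this method is empty, so the actual collection is done by subclasses; the FSkeletalMeshSceneProxy implementation serves as the example here: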

// Engine\Source\Runtime\Engine\Private\SkeletalMesh.cpp

void FSkeletalMeshSceneProxy::GetDynamicMeshElements(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, FMeshElementCollector& Collector) const
{
    GetMeshElementsConditionallySelectable(Views, ViewFamily, true, VisibilityMap, Collector);
}

void FSkeletalMeshSceneProxy::GetMeshElementsConditionallySelectable(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, bool bInSelectable, uint32 VisibilityMap, FMeshElementCollector& Collector) const
{
    (......)

    const int32 LODIndex = MeshObject->GetLOD();
    const FSkeletalMeshLODRenderData& LODData = SkeletalMeshRenderData->LODRenderData[LODIndex];

    if( LODSections.Num() > 0 && LODIndex >= SkeletalMeshRenderData->CurrentFirstLODIdx )
    {
        const FLODSectionElements& LODSection = LODSections[LODIndex];
        
        // Iterate over the sections of this LOD and add them to the collector.
        for (FSkeletalMeshSectionIter Iter(LODIndex, *MeshObject, LODData, LODSection); Iter; ++Iter)
        {
            const FSkelMeshRenderSection& Section = Iter.GetSection();
            const int32 SectionIndex = Iter.GetSectionElementIndex();
            const FSectionElementInfo& SectionElementInfo = Iter.GetSectionElementInfo();

            bool bSectionSelected = false;
            if (MeshObject->IsMaterialHidden(LODIndex, SectionElementInfo.UseMaterialIndex) || Section.bDisabled)
            {
                continue;
            }
            // Add the section identified by LODIndex and SectionIndex to the collector.
            GetDynamicElementsSection(Views, ViewFamily, VisibilityMap, LODData, LODIndex, SectionIndex, bSectionSelected, SectionElementInfo, bInSelectable, Collector);
        }
    }
    
    (......)
}

void FSkeletalMeshSceneProxy::GetDynamicElementsSection(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, const FSkeletalMeshLODRenderData& LODData, const int32 LODIndex, const int32 SectionIndex, bool bSectionSelected, const FSectionElementInfo& SectionElementInfo, bool bInSelectable, FMeshElementCollector& Collector ) const
{
    const FSkelMeshRenderSection& Section = LODData.RenderSections[SectionIndex];
    const bool bIsSelected = false;
    const bool bIsWireframe = ViewFamily.EngineShowFlags.Wireframe;

    for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
    {
        if (VisibilityMap & (1 << ViewIndex))
        {
            const FSceneView* View = Views[ViewIndex];
            
            // Allocate an FMeshBatch from the collector.
            FMeshBatch& Mesh = Collector.AllocateMesh();
            
            // Fill in the base mesh batch (including its first FMeshBatchElement).
            CreateBaseMeshBatch(View, LODData, LODIndex, SectionIndex, SectionElementInfo, Mesh);
            
            if(!Mesh.VertexFactory)
            {
                // hide this part
                continue;
            }

            Mesh.bWireframe |= bForceWireframe;
            Mesh.Type = PT_TriangleList;
            Mesh.bSelectable = bInSelectable;
            
            // Configure the first FMeshBatchElement.
            FMeshBatchElement& BatchElement = Mesh.Elements[0];
            const bool bRequiresAdjacencyInformation = RequiresAdjacencyInformation( SectionElementInfo.Material, Mesh.VertexFactory->GetType(), ViewFamily.GetFeatureLevel() );
            if ( bRequiresAdjacencyInformation )
            {
                check(LODData.AdjacencyMultiSizeIndexContainer.IsIndexBufferValid() );
                BatchElement.IndexBuffer = LODData.AdjacencyMultiSizeIndexContainer.GetIndexBuffer();
                Mesh.Type = PT_12_ControlPointPatchList;
                BatchElement.FirstIndex *= 4;
            }

            BatchElement.MinVertexIndex = Section.BaseVertexIndex;
            Mesh.ReverseCulling = IsLocalToWorldDeterminantNegative();
            Mesh.CastShadow = SectionElementInfo.bEnableShadowCasting;
            Mesh.bCanApplyViewModeOverrides = true;
            Mesh.bUseWireframeSelectionColoring = bIsSelected;
            
            (......)

            if ( ensureMsgf(Mesh.MaterialRenderProxy, TEXT("GetDynamicElementsSection with invalid MaterialRenderProxy. Owner:%s LODIndex:%d UseMaterialIndex:%d"), *GetOwnerName().ToString(), LODIndex, SectionElementInfo.UseMaterialIndex) &&
                 ensureMsgf(Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel), TEXT("GetDynamicElementsSection with invalid FMaterial. Owner:%s LODIndex:%d UseMaterialIndex:%d"), *GetOwnerName().ToString(), LODIndex, SectionElementInfo.UseMaterialIndex) )
            {
                // Add the FMeshBatch to the collector.
                Collector.AddMesh(ViewIndex, Mesh);
            }
            
            (......)
        }
    }
}

So FSkeletalMeshSceneProxy adds one FMeshBatch per section of the current LOD, and each FMeshBatch has a single FMeshBatchElement. FSceneRenderer::GatherDynamicMeshElements also contains another key call, ComputeDynamicMeshRelevance, which works out which mesh passes will reference the current MeshBatch and adds its elements to the view's per-pass counts:

// Engine\Source\Runtime\Renderer\Private\SceneVisibility.cpp

void ComputeDynamicMeshRelevance(EShadingPath ShadingPath, bool bAddLightmapDensityCommands, const FPrimitiveViewRelevance& ViewRelevance, const FMeshBatchAndRelevance& MeshBatch, FViewInfo& View, FMeshPassMask& PassMask, FPrimitiveSceneInfo* PrimitiveSceneInfo, const FPrimitiveBounds& Bounds)
{
    const int32 NumElements = MeshBatch.Mesh->Elements.Num();

    // Counts for the depth pass and the main passes.
    if (ViewRelevance.bDrawRelevance && (ViewRelevance.bRenderInMainPass || ViewRelevance.bRenderCustomDepth || ViewRelevance.bRenderInDepthPass))
    {
        PassMask.Set(EMeshPass::DepthPass);
        View.NumVisibleDynamicMeshElements[EMeshPass::DepthPass] += NumElements;

        if (ViewRelevance.bRenderInMainPass || ViewRelevance.bRenderCustomDepth)
        {
            PassMask.Set(EMeshPass::BasePass);
            View.NumVisibleDynamicMeshElements[EMeshPass::BasePass] += NumElements;

            if (ShadingPath == EShadingPath::Mobile)
            {
                PassMask.Set(EMeshPass::MobileBasePassCSM);
                View.NumVisibleDynamicMeshElements[EMeshPass::MobileBasePassCSM] += NumElements;
            }

            if (ViewRelevance.bRenderCustomDepth)
            {
                PassMask.Set(EMeshPass::CustomDepth);
                View.NumVisibleDynamicMeshElements[EMeshPass::CustomDepth] += NumElements;
            }

            if (bAddLightmapDensityCommands)
            {
                PassMask.Set(EMeshPass::LightmapDensity);
                View.NumVisibleDynamicMeshElements[EMeshPass::LightmapDensity] += NumElements;
            }


            if (ViewRelevance.bVelocityRelevance)
            {
                PassMask.Set(EMeshPass::Velocity);
                View.NumVisibleDynamicMeshElements[EMeshPass::Velocity] += NumElements;
            }

            if (ViewRelevance.bOutputsTranslucentVelocity)
            {
                PassMask.Set(EMeshPass::TranslucentVelocity);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucentVelocity] += NumElements;
            }

            if (ViewRelevance.bUsesSingleLayerWaterMaterial)
            {
                PassMask.Set(EMeshPass::SingleLayerWaterPass);
                View.NumVisibleDynamicMeshElements[EMeshPass::SingleLayerWaterPass] += NumElements;
            }
        }
    }
    
    // Counts for translucency and other passes.
    if (ViewRelevance.HasTranslucency()
        && !ViewRelevance.bEditorPrimitiveRelevance
        && ViewRelevance.bRenderInMainPass)
    {
        if (View.Family->AllowTranslucencyAfterDOF())
        {
            if (ViewRelevance.bNormalTranslucency)
            {
                PassMask.Set(EMeshPass::TranslucencyStandard);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyStandard] += NumElements;
            }

            if (ViewRelevance.bSeparateTranslucency)
            {
                PassMask.Set(EMeshPass::TranslucencyAfterDOF);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAfterDOF] += NumElements;
            }

            if (ViewRelevance.bSeparateTranslucencyModulate)
            {
                PassMask.Set(EMeshPass::TranslucencyAfterDOFModulate);
                View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAfterDOFModulate] += NumElements;
            }
        }
        else
        {
            PassMask.Set(EMeshPass::TranslucencyAll);
            View.NumVisibleDynamicMeshElements[EMeshPass::TranslucencyAll] += NumElements;
        }

        if (ViewRelevance.bDistortion)
        {
            PassMask.Set(EMeshPass::Distortion);
            View.NumVisibleDynamicMeshElements[EMeshPass::Distortion] += NumElements;
        }

        if (ShadingPath == EShadingPath::Mobile && View.bIsSceneCapture)
        {
            PassMask.Set(EMeshPass::MobileInverseOpacity);
            View.NumVisibleDynamicMeshElements[EMeshPass::MobileInverseOpacity] += NumElements;
        }
    }

    (......)
}
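
The pattern in ComputeDynamicMeshRelevance boils down to "set a bit in the batch's pass mask and bump the view's counter for that pass". A reduced sketch with only four passes follows; the enum, the Relevance fields, and ComputePassMask are hypothetical simplifications of the code above, not the engine's definitions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, reduced pass list.
enum PassSketch : uint32_t { DepthPass, BasePass, Velocity, TranslucencyAll, NumPasses };

// Hypothetical, reduced view relevance.
struct Relevance
{
    bool bDrawRelevance;
    bool bRenderInMainPass;
    bool bRenderInDepthPass;
    bool bVelocityRelevance;
    bool bHasTranslucency;
};

struct ViewCounters
{
    std::array<int, NumPasses> NumVisibleElements{}; // per-pass element counts for one view
};

// Returns the batch's pass mask and accumulates the per-pass counts on the view.
uint32_t ComputePassMask(const Relevance& R, int NumElements, ViewCounters& View)
{
    uint32_t Mask = 0;
    auto Mark = [&](PassSketch Pass)
    {
        Mask |= (1u << Pass);
        View.NumVisibleElements[Pass] += NumElements;
    };

    if (R.bDrawRelevance && (R.bRenderInMainPass || R.bRenderInDepthPass))
    {
        Mark(DepthPass);
        if (R.bRenderInMainPass)
        {
            Mark(BasePass);
            if (R.bVelocityRelevance) { Mark(Velocity); }
        }
    }
    if (R.bHasTranslucency && R.bRenderInMainPass)
    {
        Mark(TranslucencyAll);
    }
    return Mask;
}
```

Because the counters are summed before any command is built, a later stage can size its per-pass arrays up front. That appears to be what NumVisibleDynamicMeshElements is passed along for, though this reading is an inference from the code shown.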

This section's code also relies on the collector FMeshElementCollector, which gathers the visible MeshBatch information for the given views. Its declaration is:

// Engine\Source\Runtime\Engine\Public\SceneManagement.h

class FMeshElementCollector
{
public:
    // Interface for drawing points, lines, surfaces, and sprites.
    FPrimitiveDrawInterface* GetPDI(int32 ViewIndex)
    {
        return SimpleElementCollectors[ViewIndex];
    }
    // Allocate an FMeshBatch.
    FMeshBatch& AllocateMesh()
    {
        const int32 Index = MeshBatchStorage.Add(1);
        return MeshBatchStorage[Index];
    }
    
    // Add a MeshBatch to the collector. It is initialized and its related data set before being appended to the MeshBatches list.
    void AddMesh(int32 ViewIndex, FMeshBatch& MeshBatch);
    
    // Accessors.
    FGlobalDynamicIndexBuffer& GetDynamicIndexBuffer();
    FGlobalDynamicVertexBuffer& GetDynamicVertexBuffer();
    FGlobalDynamicReadBuffer& GetDynamicReadBuffer();
    uint32 GetMeshBatchCount(uint32 ViewIndex) const;
    uint32 GetMeshElementCount(uint32 ViewIndex) const;
    ERHIFeatureLevel::Type GetFeatureLevel() const;

    void RegisterOneFrameMaterialProxy(FMaterialRenderProxy* Proxy);
    template<typename T, typename... ARGS>
    T& AllocateOneFrameResource(ARGS&&... Args);
    bool ShouldUseTasks() const;
    
    // Task interface.
    void AddTask(TFunction<void()>&& Task)
    {
        ParallelTasks.Add(new (FMemStack::Get()) TFunction<void()>(MoveTemp(Task)));
    }
    void AddTask(const TFunction<void()>& Task)
    {
        ParallelTasks.Add(new (FMemStack::Get()) TFunction<void()>(Task));
    }
    void ProcessTasks();
    
protected:
    FMeshElementCollector(ERHIFeatureLevel::Type InFeatureLevel);

    // Set the FPrimitiveSceneProxy currently being collected.
    void SetPrimitive(const FPrimitiveSceneProxy* InPrimitiveSceneProxy, FHitProxyId DefaultHitProxyId)
    {
        check(InPrimitiveSceneProxy);
        PrimitiveSceneProxy = InPrimitiveSceneProxy;

        for (int32 ViewIndex = 0; ViewIndex < SimpleElementCollectors.Num(); ViewIndex++)
        {
            SimpleElementCollectors[ViewIndex]->HitProxyId = DefaultHitProxyId;
            SimpleElementCollectors[ViewIndex]->PrimitiveMeshId = 0;
        }

        for (int32 ViewIndex = 0; ViewIndex < MeshIdInPrimitivePerView.Num(); ++ViewIndex)
        {
            MeshIdInPrimitivePerView[ViewIndex] = 0;
        }
    }

    void ClearViewMeshArrays();

    // Register a view together with its output mesh arrays.
    void AddViewMeshArrays(
        FSceneView* InView, 
        TArray<FMeshBatchAndRelevance,SceneRenderingAllocator>* ViewMeshes,
        FSimpleElementCollector* ViewSimpleElementCollector, 
        TArray<FPrimitiveUniformShaderParameters>* InDynamicPrimitiveShaderData,
        ERHIFeatureLevel::Type InFeatureLevel,
        FGlobalDynamicIndexBuffer* InDynamicIndexBuffer,
        FGlobalDynamicVertexBuffer* InDynamicVertexBuffer,
        FGlobalDynamicReadBuffer* InDynamicReadBuffer);

    TChunkedArray<FMeshBatch> MeshBatchStorage; // Storage for every FMeshBatch allocated.
    TArray<TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>*, TInlineAllocator<2> > MeshBatches; // FMeshBatch entries to be rendered.
    TArray<int32, TInlineAllocator<2> > NumMeshBatchElementsPerView; // Number of MeshBatchElements collected per view.
    TArray<FSimpleElementCollector*, TInlineAllocator<2> > SimpleElementCollectors; // Collectors for simple elements such as points, lines, surfaces, and sprites.

    TArray<FSceneView*, TInlineAllocator<2> > Views; // The FSceneView instances being collected for.
    TArray<uint16, TInlineAllocator<2> > MeshIdInPrimitivePerView; // Current Mesh Id In Primitive per view
    TArray<TArray<FPrimitiveUniformShaderParameters>*, TInlineAllocator<2> > DynamicPrimitiveShaderDataPerView; // Per-view dynamic primitive data, used to update the GPU Scene.
    
    TArray<FMaterialRenderProxy*, SceneRenderingAllocator> TemporaryProxies;
    TArray<FOneFrameResource*, SceneRenderingAllocator> OneFrameResources;

    const FPrimitiveSceneProxy* PrimitiveSceneProxy; // The PrimitiveSceneProxy currently being collected.

    // Global dynamic buffers.
    FGlobalDynamicIndexBuffer* DynamicIndexBuffer;
    FGlobalDynamicVertexBuffer* DynamicVertexBuffer;
    FGlobalDynamicReadBuffer* DynamicReadBuffer;

    ERHIFeatureLevel::Type FeatureLevel;

    const bool bUseAsyncTasks; // Whether to use asynchronous tasks.
    TArray<TFunction<void()>*, SceneRenderingAllocator> ParallelTasks; // Tasks waiting to run once the dynamic mesh data has been collected.
};
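
The allocate-then-add pattern of the collector can be approximated with standard containers. The sketch below is an assumption-laden stand-in, not the engine class. CollectorSketch and MeshBatchSketch are hypothetical names. std::deque plays the role of TChunkedArray, on the assumption that both keep existing elements in place as more are appended. The per-view id assignment inside AddMesh is a guess at what "initialize and set related data" involves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

struct MeshBatchSketch
{
    uint16_t MeshIdInPrimitive = 0;
    int LODIndex = 0;
};

class CollectorSketch
{
public:
    explicit CollectorSketch(std::size_t NumViews)
        : PerViewMeshes(NumViews), NextMeshIdPerView(NumViews, uint16_t{0}) {}

    // Reset the per-view mesh ids when collection moves on to the next primitive.
    void SetPrimitive()
    {
        std::fill(NextMeshIdPerView.begin(), NextMeshIdPerView.end(), uint16_t{0});
    }

    // std::deque does not relocate existing elements on emplace_back,
    // so references handed out earlier stay valid.
    MeshBatchSketch& AllocateMesh()
    {
        Storage.emplace_back();
        return Storage.back();
    }

    // Stamp the batch with a per-view id and append it to that view's list.
    void AddMesh(std::size_t ViewIndex, MeshBatchSketch& Mesh)
    {
        Mesh.MeshIdInPrimitive = NextMeshIdPerView[ViewIndex]++;
        PerViewMeshes[ViewIndex].push_back(&Mesh);
    }

    std::size_t GetMeshBatchCount(std::size_t ViewIndex) const { return PerViewMeshes[ViewIndex].size(); }
    const MeshBatchSketch& GetMesh(std::size_t ViewIndex, std::size_t Index) const { return *PerViewMeshes[ViewIndex][Index]; }

private:
    std::deque<MeshBatchSketch> Storage;                      // owns every allocated batch
    std::vector<std::vector<MeshBatchSketch*>> PerViewMeshes; // batches to render, per view
    std::vector<uint16_t> NextMeshIdPerView;
};
```

Address stability matters here because a proxy keeps filling in the FMeshBatch reference it got from AllocateMesh while the collector may already be handing out further batches.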

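FMeshElementCollector keeps its state per view: at the start of rendering, each view registers its own output arrays with the collector through AddViewMeshArrays. Once the visible primitives have been gathered, each view holds a list of FMeshBatch entries to render, together with their bookkeeping data and state, ready for the later stages of the pipeline.

In addition, once the mesh data has been collected, FMeshElementCollector can hold a list of tasks that still need to run, which is how work done in parallel is synchronized.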
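
The AddTask / ProcessTasks pair follows a simple deferred-work pattern, sketched below with the standard library. TaskCollectorSketch is a hypothetical name, and the sketch runs the tasks one after another on the calling thread; how the engine schedules them (in parallel or not, depending on ShouldUseTasks) is not something this example tries to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

class TaskCollectorSketch
{
public:
    // Queue work discovered during collection instead of running it immediately.
    void AddTask(std::function<void()> Task)
    {
        Tasks.push_back(std::move(Task));
    }

    // Called once collection is finished: run everything that was queued.
    void ProcessTasks()
    {
        for (std::function<void()>& Task : Tasks)
        {
            Task();
        }
        Tasks.clear();
    }

    std::size_t NumPending() const { return Tasks.size(); }

private:
    std::vector<std::function<void()>> Tasks;
};
```

Queuing lets a proxy register follow-up work while it is being visited, and gives the renderer a single point after collection at which all of that work is known to be done.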

3.2.3 From FMeshBatch to FMeshDrawCommand

The previous section ended with the dynamic mesh elements collected. Immediately afterwards, SetupMeshPass is called to create the FMeshPassProcessor objects:

// Call hierarchy shown as nested pseudocode; not compilable C++.
void FSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    bool FDeferredShadingSceneRenderer::InitViews(FRHICommandListImmediate& RHICmdList,  ...)
    {
        void FSceneRenderer::ComputeViewVisibility(FRHICommandListImmediate& RHICmdList, ...)
        {
            // Gather the dynamic mesh elements.
            FSceneRenderer::GatherDynamicMeshElements(Views, Scene, ViewFamily, DynamicIndexBuffer, DynamicVertexBuffer, DynamicReadBuffer, HasDynamicMeshElementsMasks, HasDynamicEditorMeshElementsMasks, HasViewCustomDataMasks, MeshCollector);
            
            // Set up the FMeshPassProcessor objects for every view.
            for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
            {
                FViewInfo& View = Views[ViewIndex];
                if (!View.ShouldRenderView())
                {
                    continue;
                }
                
                // Set up the FMeshPassProcessor objects for this view.
                FViewCommands& ViewCommands = ViewCommandsPerView[ViewIndex];
                SetupMeshPass(View, BasePassDepthStencilAccess, ViewCommands);
            }
        }
    }
}

The logic of FSceneRenderer::SetupMeshPass, with comments, is:

void FSceneRenderer::SetupMeshPass(FViewInfo& View, FExclusiveDepthStencil::Type BasePassDepthStencilAccess, FViewCommands& ViewCommands)
{
    const EShadingPath ShadingPath = Scene->GetShadingPath();
    
    // Iterate over every pass defined in EMeshPass.
    for (int32 PassIndex = 0; PassIndex < EMeshPass::Num; PassIndex++)
    {
        const EMeshPass::Type PassType = (EMeshPass::Type)PassIndex;
        
        if ((FPassProcessorManager::GetPassFlags(ShadingPath, PassType) & EMeshPassFlags::MainView) != EMeshPassFlags::None)
        {
            (......)

            // Create the FMeshPassProcessor.
            PassProcessorCreateFunction CreateFunction = FPassProcessorManager::GetCreateFunction(ShadingPath, PassType);
            FMeshPassProcessor* MeshPassProcessor = CreateFunction(Scene, &View, nullptr);

            // Get the FParallelMeshDrawCommandPass object for this pass.
            FParallelMeshDrawCommandPass& Pass = View.ParallelMeshDrawCommandPasses[PassIndex];

            if (ShouldDumpMeshDrawCommandInstancingStats())
            {
                Pass.SetDumpInstancingStats(GetMeshPassName(PassType));
            }

            // Dispatch this pass's setup so it can run in parallel, building all of the pass's draw commands.
            Pass.DispatchPassSetup(
                Scene,
                View,
                PassType,
                BasePassDepthStencilAccess,
                MeshPassProcessor,
                View.DynamicMeshElements,
                &View.DynamicMeshElementsPassRelevance,
                View.NumVisibleDynamicMeshElements[PassType],
                ViewCommands.DynamicMeshCommandBuildRequests[PassType],
                ViewCommands.NumDynamicMeshCommandBuildRequestElements[PassType],
                ViewCommands.MeshCommands[PassIndex]);
        }
    }
}
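
SetupMeshPass relies on a lookup from pass type to a creation function and a set of flags. A table-driven sketch of that idea follows. The names (PassSketch, PassEntry, GPassTable, SetupMainViewPasses) and the flag values assigned to each pass are illustrative assumptions, not the contents of FPassProcessorManager.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

enum PassSketch { DepthPass, BasePass, CSMShadowDepth, NumPasses };

constexpr unsigned FlagNone           = 0u;
constexpr unsigned FlagMainView       = 1u << 0;
constexpr unsigned FlagCachedCommands = 1u << 1;

struct PassProcessorSketch { std::string Name; };
using CreateFunction = std::unique_ptr<PassProcessorSketch> (*)();

std::unique_ptr<PassProcessorSketch> MakeProcessor(const char* Name)
{
    std::unique_ptr<PassProcessorSketch> Processor(new PassProcessorSketch);
    Processor->Name = Name;
    return Processor;
}
std::unique_ptr<PassProcessorSketch> CreateDepth()  { return MakeProcessor("DepthPass"); }
std::unique_ptr<PassProcessorSketch> CreateBase()   { return MakeProcessor("BasePass"); }
std::unique_ptr<PassProcessorSketch> CreateShadow() { return MakeProcessor("CSMShadowDepth"); }

// One entry per pass: how to create its processor, and where the pass applies.
struct PassEntry { CreateFunction Create; unsigned Flags; };

const PassEntry GPassTable[NumPasses] = {
    { &CreateDepth,  FlagMainView },
    { &CreateBase,   FlagMainView | FlagCachedCommands },
    { &CreateShadow, FlagCachedCommands }, // assumed not to be a main-view pass in this sketch
};

// Walk every pass and create processors only for those flagged for the main view.
std::vector<std::unique_ptr<PassProcessorSketch>> SetupMainViewPasses()
{
    std::vector<std::unique_ptr<PassProcessorSketch>> Processors;
    for (int PassIndex = 0; PassIndex < NumPasses; ++PassIndex)
    {
        const PassEntry& Entry = GPassTable[PassIndex];
        if ((Entry.Flags & FlagMainView) != FlagNone)
        {
            Processors.push_back(Entry.Create());
        }
    }
    return Processors;
}
```

Keeping creation behind a function pointer means the loop never needs to know the concrete processor type of any pass.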

The EMeshPass enum used by SetupMeshPass is defined as:

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

namespace EMeshPass
{
    enum Type
    {
        DepthPass,            // Depth
        BasePass,            // Geometry / base
        SkyPass,             // Sky
        SingleLayerWaterPass, // Single-layer water
        CSMShadowDepth,     // Cascaded shadow map depth
        Distortion,         // Distortion
        Velocity,             // Velocity
        
        // Translucency-related passes
        TranslucentVelocity,
        TranslucencyStandard,
        TranslucencyAfterDOF, 
        TranslucencyAfterDOFModulate,
        TranslucencyAll, 
        
        LightmapDensity,     // Lightmap density
        DebugViewMode,        // Debug view mode
        CustomDepth,        // Custom depth
        MobileBasePassCSM,
        MobileInverseOpacity, 
        VirtualTexture,        // Virtual texture

        // Special passes available only in the editor
#if WITH_EDITOR
        HitProxy,
        HitProxyOpaqueOnly,
        EditorSelection,
#endif

        Num,
        NumBits = 5,
    };
}

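As the enum shows, UE enumerates up front every pass that might need to be drawn, and during SetupMeshPass it generates the draw commands for the passes in use, in parallel. The main logic of FParallelMeshDrawCommandPass::DispatchPassSetup, with comments, is: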

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchPassSetup(
    FScene* Scene,
    const FViewInfo& View,
    EMeshPass::Type PassType,
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess,
    FMeshPassProcessor* MeshPassProcessor,
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>& DynamicMeshElements,
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance,
    int32 NumDynamicMeshElements,
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator>& InOutDynamicMeshCommandBuildRequests,
    int32 NumDynamicMeshCommandBuildRequestElements,
    FMeshCommandOneFrameArray& InOutMeshDrawCommands,
    FMeshPassProcessor* MobileBasePassCSMMeshPassProcessor,
    FMeshCommandOneFrameArray* InOutMobileBasePassCSMMeshDrawCommands
)
{
    MaxNumDraws = InOutMeshDrawCommands.Num() + NumDynamicMeshElements + NumDynamicMeshCommandBuildRequestElements;
    
    // Fill in TaskContext, gathering the data needed to generate the mesh draw commands.
    TaskContext.MeshPassProcessor = MeshPassProcessor;
    TaskContext.MobileBasePassCSMMeshPassProcessor = MobileBasePassCSMMeshPassProcessor;
    TaskContext.DynamicMeshElements = &DynamicMeshElements;
    TaskContext.DynamicMeshElementsPassRelevance = DynamicMeshElementsPassRelevance;

    TaskContext.View = &View;
    TaskContext.ShadingPath = Scene->GetShadingPath();
    TaskContext.ShaderPlatform = Scene->GetShaderPlatform();
    TaskContext.PassType = PassType;
    TaskContext.bUseGPUScene = UseGPUScene(GMaxRHIShaderPlatform, View.GetFeatureLevel());
    TaskContext.bDynamicInstancing = IsDynamicInstancingEnabled(View.GetFeatureLevel());
    TaskContext.bReverseCulling = View.bReverseCulling;
    TaskContext.bRenderSceneTwoSided = View.bRenderSceneTwoSided;
    TaskContext.BasePassDepthStencilAccess = BasePassDepthStencilAccess;
    TaskContext.DefaultBasePassDepthStencilAccess = Scene->DefaultBasePassDepthStencilAccess;
    TaskContext.NumDynamicMeshElements = NumDynamicMeshElements;
    TaskContext.NumDynamicMeshCommandBuildRequestElements = NumDynamicMeshCommandBuildRequestElements;

    // Only apply instancing for ISR to main view passes
    const bool bIsMainViewPass = PassType != EMeshPass::Num && (FPassProcessorManager::GetPassFlags(TaskContext.ShadingPath, TaskContext.PassType) & EMeshPassFlags::MainView) != EMeshPassFlags::None;
    TaskContext.InstanceFactor = (bIsMainViewPass && View.IsInstancedStereoPass()) ? 2 : 1;

    // Set up the view-dependent data for the translucency sort keys.
    TaskContext.TranslucencyPass = ETranslucencyPass::TPT_MAX;
    TaskContext.TranslucentSortPolicy = View.TranslucentSortPolicy;
    TaskContext.TranslucentSortAxis = View.TranslucentSortAxis;
    TaskContext.ViewOrigin = View.ViewMatrices.GetViewOrigin();
    TaskContext.ViewMatrix = View.ViewMatrices.GetViewMatrix();
    TaskContext.PrimitiveBounds = &Scene->PrimitiveBounds;

    switch (PassType)
    {
        case EMeshPass::TranslucencyStandard: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_StandardTranslucency; break;
        case EMeshPass::TranslucencyAfterDOF: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_TranslucencyAfterDOF; break;
        case EMeshPass::TranslucencyAfterDOFModulate: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_TranslucencyAfterDOFModulate; break;
        case EMeshPass::TranslucencyAll: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_AllTranslucency; break;
        case EMeshPass::MobileInverseOpacity: TaskContext.TranslucencyPass = ETranslucencyPass::TPT_StandardTranslucency; break;
    }
    
    // Swap the command lists into the task context (the arrays change owner; elements are not copied).
    FMemory::Memswap(&TaskContext.MeshDrawCommands, &InOutMeshDrawCommands, sizeof(InOutMeshDrawCommands));
    FMemory::Memswap(&TaskContext.DynamicMeshCommandBuildRequests, &InOutDynamicMeshCommandBuildRequests, sizeof(InOutDynamicMeshCommandBuildRequests));

    if (TaskContext.ShadingPath == EShadingPath::Mobile && TaskContext.PassType == EMeshPass::BasePass)
    {
        FMemory::Memswap(&TaskContext.MobileBasePassCSMMeshDrawCommands, InOutMobileBasePassCSMMeshDrawCommands, sizeof(*InOutMobileBasePassCSMMeshDrawCommands));
    }
    else
    {
        check(MobileBasePassCSMMeshPassProcessor == nullptr && InOutMobileBasePassCSMMeshDrawCommands == nullptr);
    }

    if (MaxNumDraws > 0)
    {
        // Preallocate resources on the rendering thread, sized by the maximum draw count (MaxNumDraws).
        bPrimitiveIdBufferDataOwnedByRHIThread = false;
        TaskContext.PrimitiveIdBufferDataSize = TaskContext.InstanceFactor * MaxNumDraws * sizeof(int32);
        TaskContext.PrimitiveIdBufferData = FMemory::Malloc(TaskContext.PrimitiveIdBufferDataSize);
        PrimitiveIdVertexBufferPoolEntry = GPrimitiveIdVertexBufferPool.Allocate(TaskContext.PrimitiveIdBufferDataSize);
        TaskContext.MeshDrawCommands.Reserve(MaxNumDraws);
        TaskContext.TempVisibleMeshDrawCommands.Reserve(MaxNumDraws);

        const bool bExecuteInParallel = FApp::ShouldUseThreadingForPerformance()
            && CVarMeshDrawCommandsParallelPassSetup.GetValueOnRenderThread() > 0
            && GRenderingThread; // Rendering thread is required to safely use rendering resources in parallel.
        
        // When running in parallel, create the task instances and hand them to the TaskGraph system.
        if (bExecuteInParallel) 
        {
            FGraphEventArray DependentGraphEvents;
            DependentGraphEvents.Add(TGraphTask<FMeshDrawCommandPassSetupTask>::CreateTask(nullptr, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(TaskContext));
            TaskEventRef = TGraphTask<FMeshDrawCommandInitResourcesTask>::CreateTask(&DependentGraphEvents, ENamedThreads::GetRenderThread()).ConstructAndDispatchWhenReady(TaskContext);
        }
        else
        {
            QUICK_SCOPE_CYCLE_COUNTER(STAT_MeshPassSetupImmediate);
            FMeshDrawCommandPassSetupTask Task(TaskContext);
            Task.AnyThreadTask();
            FMeshDrawCommandInitResourcesTask DependentTask(TaskContext);
            DependentTask.AnyThreadTask();
        }
    }
}
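
The bookkeeping at the start of DispatchPassSetup can be illustrated with a small standalone sketch. It is a simplification under stated assumptions: DrawCommandSketch, TaskContextSketch and DispatchPassSetupSketch are hypothetical names, std::vector stands in for the engine's array types, and std::swap stands in for FMemory::Memswap. It only shows how the worst-case draw count drives the buffer size and how the command list changes owner.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for FMeshCommandOneFrameArray elements and the task context.
struct DrawCommandSketch { int Id; };

struct TaskContextSketch
{
    std::vector<DrawCommandSketch> MeshDrawCommands;
    int InstanceFactor = 1;
    std::size_t PrimitiveIdBufferDataSize = 0;
};

// Moves the caller's commands into the context and sizes the primitive id
// buffer for the worst case, following the shape of DispatchPassSetup.
inline int DispatchPassSetupSketch(TaskContextSketch& Context,
                                   std::vector<DrawCommandSketch>& InOutCommands,
                                   int NumDynamicMeshElements,
                                   int NumBuildRequestElements,
                                   bool bInstancedStereoMainViewPass)
{
    // Upper bound: cached commands plus everything that may still be generated.
    const int MaxNumDraws = int(InOutCommands.size()) + NumDynamicMeshElements + NumBuildRequestElements;
    Context.InstanceFactor = bInstancedStereoMainViewPass ? 2 : 1;

    // Exchange the containers; elements are not copied one by one.
    std::swap(Context.MeshDrawCommands, InOutCommands);

    if (MaxNumDraws > 0)
    {
        // One int32 primitive id per draw and per instanced eye.
        Context.PrimitiveIdBufferDataSize =
            std::size_t(Context.InstanceFactor) * std::size_t(MaxNumDraws) * sizeof(int32_t);
        Context.MeshDrawCommands.reserve(std::size_t(MaxNumDraws));
    }
    return MaxNumDraws;
}
```

Sizing for the worst case up front means the worker task never has to allocate while it appends commands, at the cost of some unused memory when fewer draws survive.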

The code above involves several key concepts: FMeshPassProcessor, FMeshDrawCommandPassSetupTaskContext, FMeshDrawCommandPassSetupTask, and FMeshDrawCommandInitResourcesTask. The latter three are defined and annotated below:

// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.h

// Context required by the parallel mesh draw command pass setup task (FMeshDrawCommandPassSetupTask).
class FMeshDrawCommandPassSetupTaskContext
{
public:
    // View-related data.
    const FViewInfo* View;
    EShadingPath ShadingPath;
    EShaderPlatform ShaderPlatform;
    EMeshPass::Type PassType;
    bool bUseGPUScene;
    bool bDynamicInstancing;
    bool bReverseCulling;
    bool bRenderSceneTwoSided;
    FExclusiveDepthStencil::Type BasePassDepthStencilAccess;
    FExclusiveDepthStencil::Type DefaultBasePassDepthStencilAccess;

    // Mesh pass processors.
    FMeshPassProcessor* MeshPassProcessor;
    FMeshPassProcessor* MobileBasePassCSMMeshPassProcessor;
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>* DynamicMeshElements;
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance;

    // Command-related data.
    int32 InstanceFactor;
    int32 NumDynamicMeshElements;
    int32 NumDynamicMeshCommandBuildRequestElements;
    FMeshCommandOneFrameArray MeshDrawCommands;
    FMeshCommandOneFrameArray MobileBasePassCSMMeshDrawCommands;
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator> DynamicMeshCommandBuildRequests;
    TArray<const FStaticMeshBatch*, SceneRenderingAllocator> MobileBasePassCSMDynamicMeshCommandBuildRequests;
    FDynamicMeshDrawCommandStorage MeshDrawCommandStorage;
    FGraphicsMinimalPipelineStateSet MinimalPipelineStatePassSet;
    bool NeedsShaderInitialisation;

    // Resources that must be preallocated on the rendering thread.
    void* PrimitiveIdBufferData;
    int32 PrimitiveIdBufferDataSize;
    FMeshCommandOneFrameArray TempVisibleMeshDrawCommands;

    // Data needed for sorting translucent objects.
    ETranslucencyPass::Type TranslucencyPass;
    ETranslucentSortPolicy::Type TranslucentSortPolicy;
    FVector TranslucentSortAxis;
    FVector ViewOrigin;
    FMatrix ViewMatrix;
    const TArray<struct FPrimitiveBounds>* PrimitiveBounds;

    // For logging instancing stats.
    int32 VisibleMeshDrawCommandsNum;
    int32 NewPassVisibleMeshDrawCommandsNum;
    int32 MaxInstances;
};


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

// Converts every FMeshBatch of the given EMeshPass into a set of FMeshDrawCommands. Used by FMeshDrawCommandPassSetupTask.
void GenerateDynamicMeshDrawCommands(
    const FViewInfo& View,
    EShadingPath ShadingPath,
    EMeshPass::Type PassType,
    FMeshPassProcessor* PassMeshProcessor,
    const TArray<FMeshBatchAndRelevance, SceneRenderingAllocator>& DynamicMeshElements,
    const TArray<FMeshPassMask, SceneRenderingAllocator>* DynamicMeshElementsPassRelevance,
    int32 MaxNumDynamicMeshElements,
    const TArray<const FStaticMeshBatch*, SceneRenderingAllocator>& DynamicMeshCommandBuildRequests,
    int32 MaxNumBuildRequestElements,
    FMeshCommandOneFrameArray& VisibleCommands,
    FDynamicMeshDrawCommandStorage& MeshDrawCommandStorage,
    FGraphicsMinimalPipelineStateSet& MinimalPipelineStatePassSet,
    bool& NeedsShaderInitialisation
)
{
    (......)

    // Build a FDynamicPassMeshDrawListContext instance, which receives the draw commands generated by PassMeshProcessor.
    FDynamicPassMeshDrawListContext DynamicPassMeshDrawListContext(
        MeshDrawCommandStorage,
        VisibleCommands,
        MinimalPipelineStatePassSet,
        NeedsShaderInitialisation
    );
    PassMeshProcessor->SetDrawListContext(&DynamicPassMeshDrawListContext);

    // Process dynamic mesh batches.
    {
        const int32 NumCommandsBefore = VisibleCommands.Num();
        const int32 NumDynamicMeshBatches = DynamicMeshElements.Num();
        
        // Iterate over all dynamic mesh batches.
        for (int32 MeshIndex = 0; MeshIndex < NumDynamicMeshBatches; MeshIndex++)
        {
            if (!DynamicMeshElementsPassRelevance || (*DynamicMeshElementsPassRelevance)[MeshIndex].Get(PassType))
            {
                const FMeshBatchAndRelevance& MeshAndRelevance = DynamicMeshElements[MeshIndex];
                check(!MeshAndRelevance.Mesh->bRequiresPerElementVisibility);
                const uint64 BatchElementMask = ~0ull;
                
                // Hand the FMeshBatch to PassMeshProcessor for processing.
                PassMeshProcessor->AddMeshBatch(*MeshAndRelevance.Mesh, BatchElementMask, MeshAndRelevance.PrimitiveSceneProxy);
            }
        }

        (......)
    }
    
    // Process static mesh batches whose commands must be built this frame (DynamicMeshCommandBuildRequests).
    {
        const int32 NumCommandsBefore = VisibleCommands.Num();
        const int32 NumStaticMeshBatches = DynamicMeshCommandBuildRequests.Num();

        for (int32 MeshIndex = 0; MeshIndex < NumStaticMeshBatches; MeshIndex++)
        {
            const FStaticMeshBatch* StaticMeshBatch = DynamicMeshCommandBuildRequests[MeshIndex];
            const uint64 BatchElementMask = StaticMeshBatch->bRequiresPerElementVisibility ? View.StaticMeshBatchVisibility[StaticMeshBatch->BatchVisibilityId] : ~0ull;
            
            // Hand the FStaticMeshBatch to PassMeshProcessor for processing.
            PassMeshProcessor->AddMeshBatch(*StaticMeshBatch, BatchElementMask, StaticMeshBatch->PrimitiveSceneInfo->Proxy, StaticMeshBatch->Id);
        }

        (......)
    }
}

// Task that sets up mesh draw commands in parallel: generation, sorting, and merging of the dynamic mesh draw commands.
class FMeshDrawCommandPassSetupTask
{
public:
    FMeshDrawCommandPassSetupTask(FMeshDrawCommandPassSetupTaskContext& InContext)
        : Context(InContext)
    {
    }
    
    (......)

    void AnyThreadTask()
    {
        const bool bMobileShadingBasePass = Context.ShadingPath == EShadingPath::Mobile && Context.PassType == EMeshPass::BasePass;
        const bool bMobileVulkanSM5BasePass = IsVulkanMobileSM5Platform(Context.ShaderPlatform) && Context.PassType == EMeshPass::BasePass;

        if (bMobileShadingBasePass)
        {
            (......)
        }
        else
        {
            // Generate draw commands for dynamic and static meshes (the MeshPassProcessor converts each FMeshBatch into FMeshDrawCommands).
            GenerateDynamicMeshDrawCommands(
                *Context.View,
                Context.ShadingPath,
                Context.PassType,
                Context.MeshPassProcessor,
                *Context.DynamicMeshElements,
                Context.DynamicMeshElementsPassRelevance,
                Context.NumDynamicMeshElements,
                Context.DynamicMeshCommandBuildRequests,
                Context.NumDynamicMeshCommandBuildRequestElements,
                Context.MeshDrawCommands,
                Context.MeshDrawCommandStorage,
                Context.MinimalPipelineStatePassSet,
                Context.NeedsShaderInitialisation
            );
        }

        if (Context.MeshDrawCommands.Num() > 0)
        {
            if (Context.PassType != EMeshPass::Num)
            {
                // Apply view overrides to the mesh draw commands already generated, e.g. the reverse culling mode used when rendering planar reflections.
                ApplyViewOverridesToMeshDrawCommands(
                    Context.ShadingPath,
                    Context.PassType,
                    Context.bReverseCulling,
                    Context.bRenderSceneTwoSided,
                    Context.BasePassDepthStencilAccess,
                    Context.DefaultBasePassDepthStencilAccess,
                    Context.MeshDrawCommands,
                    Context.MeshDrawCommandStorage,
                    Context.MinimalPipelineStatePassSet,
                    Context.NeedsShaderInitialisation,
                    Context.TempVisibleMeshDrawCommands
                );
            }

            // Update the sort keys.
            if (bMobileShadingBasePass || bMobileVulkanSM5BasePass)
            {
                (......)
            }
            else if (Context.TranslucencyPass != ETranslucencyPass::TPT_MAX)
            {
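                // Update the mesh sort keys with view-dependent data. The key type is FMeshDrawCommandSortKey, which holds the BasePass and translucency key layouts; translucent objects are sorted by their distance to the camera.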
                UpdateTranslucentMeshSortKeys(
                    Context.TranslucentSortPolicy,
                    Context.TranslucentSortAxis,
                    Context.ViewOrigin,
                    Context.ViewMatrix,
                    *Context.PrimitiveBounds,
                    Context.TranslucencyPass,
                    Context.MeshDrawCommands
                );
            }

            {
                QUICK_SCOPE_CYCLE_COUNTER(STAT_SortVisibleMeshDrawCommands);
                // Sort the MeshDrawCommands. FCompareFMeshDrawCommands orders by FMeshDrawCommandSortKey first, then by StateBucketId.
                Context.MeshDrawCommands.Sort(FCompareFMeshDrawCommands());
            }

            if (Context.bUseGPUScene)
            {
                // Build the GPU Scene data for this pass (mainly the primitive id buffer used to look up the data of the primitives being drawn).
                BuildMeshDrawCommandPrimitiveIdBuffer(
                    Context.bDynamicInstancing,
                    Context.MeshDrawCommands,
                    Context.MeshDrawCommandStorage,
                    Context.PrimitiveIdBufferData,
                    Context.PrimitiveIdBufferDataSize,
                    Context.TempVisibleMeshDrawCommands,
                    Context.MaxInstances,
                    Context.VisibleMeshDrawCommandsNum,
                    Context.NewPassVisibleMeshDrawCommandsNum,
                    Context.ShaderPlatform,
                    Context.InstanceFactor
                );
            }
        }
    }

    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        AnyThreadTask();
    }

private:
    FMeshDrawCommandPassSetupTaskContext& Context; // Task context.
};


// Task that initializes the resources required by the MeshDrawCommands.
class FMeshDrawCommandInitResourcesTask
{
public:

    (......)

    void AnyThreadTask()
    {
        TRACE_CPUPROFILER_EVENT_SCOPE(MeshDrawCommandInitResourcesTask);
        if (Context.NeedsShaderInitialisation)
        {
            // Initialize all bound shader resources.
            for (const FGraphicsMinimalPipelineStateInitializer& Initializer : Context.MinimalPipelineStatePassSet)
            {
                Initializer.BoundShaderState.LazilyInitShaders();
            }
        }
    }

    void DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
    {
        AnyThreadTask();
    }

private:
    FMeshDrawCommandPassSetupTaskContext& Context;
};
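
The sort-then-merge step described in the comments above can be sketched independently of the engine. This is a simplified illustration, not the actual implementation: VisibleCommandSketch, CompareSketch and SortAndMergeSketch are hypothetical names, and the merge rule (adjacent commands with the same non-negative StateBucketId collapse into one instanced draw) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reduced form of a visible draw command.
struct VisibleCommandSketch
{
    uint64_t SortKey;
    int32_t StateBucketId; // -1: not eligible for merging
};

// Orders by sort key first and by state bucket second.
struct CompareSketch
{
    bool operator()(const VisibleCommandSketch& A, const VisibleCommandSketch& B) const
    {
        if (A.SortKey != B.SortKey) return A.SortKey < B.SortKey;
        return A.StateBucketId < B.StateBucketId;
    }
};

struct InstancedDrawSketch
{
    int32_t StateBucketId;
    uint32_t NumInstances;
};

// Sorts the commands, then merges neighbours that share a valid state bucket.
inline std::vector<InstancedDrawSketch> SortAndMergeSketch(std::vector<VisibleCommandSketch> Commands)
{
    std::sort(Commands.begin(), Commands.end(), CompareSketch());

    std::vector<InstancedDrawSketch> Draws;
    for (const VisibleCommandSketch& Cmd : Commands)
    {
        const bool bCanMerge = !Draws.empty()
            && Cmd.StateBucketId != -1
            && Draws.back().StateBucketId == Cmd.StateBucketId;
        if (bCanMerge)
        {
            ++Draws.back().NumInstances;
        }
        else
        {
            Draws.push_back({Cmd.StateBucketId, 1u});
        }
    }
    return Draws;
}
```

Sorting first is what makes the merge a single linear pass: commands that can share state end up next to each other, so only the previous draw has to be inspected.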

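As this shows, FMeshDrawCommandPassSetupTask plays a central role in the mesh drawing pipeline: it covers generating, sorting, and merging the draw commands of dynamic and static meshes. The key used in the sorting stage is FMeshDrawCommandSortKey, defined as follows: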

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

// Sort key of FVisibleMeshDrawCommand.
class RENDERER_API FMeshDrawCommandSortKey
{
public:
    union 
    {
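        uint64 PackedData;    // Packed 64-bit key data

        // Sort key layout for the base (geometry) pass
        struct
        {
            uint64 VertexShaderHash        : 16; // Low bits: hash of the VS address.
            uint64 PixelShaderHash        : 32; // Middle bits: hash of the PS address.
            uint64 Masked                : 16; // High bits: whether the material is masked.
        } BasePass;

        // Sort key layout for translucent passes
        struct
        {
            uint64 MeshIdInPrimitive    : 16; // Low bits: stable id of a mesh among those sharing one primitive.
            uint64 Distance                : 32; // Middle bits: distance to the camera.
            uint64 Priority                : 16; // High bits: priority (specified by the material).
        } Translucent;
    
        // Generic sort key layout
        struct 
        {
            uint64 VertexShaderHash     : 32; // Low bits: hash of the VS address.
            uint64 PixelShaderHash         : 32; // High bits: hash of the PS address.
        } Generic;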
        } Generic;
    };
    
    // Inequality operator.
    FORCEINLINE bool operator!=(FMeshDrawCommandSortKey B) const
    {
        return PackedData != B.PackedData;
    }

    // Less-than operator, used for sorting.
    FORCEINLINE bool operator<(FMeshDrawCommandSortKey B) const
    {
        return PackedData < B.PackedData;
    }

    static const FMeshDrawCommandSortKey Default;
};
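
Why a single comparison of PackedData orders by the high field first can be shown with explicit shifts. This is a sketch under an assumption: C++ leaves bit-field layout to the implementation, so the example packs the fields by hand in the low-to-high order given in the comments above. PackBasePassKeySketch and PackTranslucentKeySketch are hypothetical helpers, not engine functions.

```cpp
#include <cassert>
#include <cstdint>

// Packs the BasePass fields: Masked in bits 48..63, PS hash in 16..47, VS hash in 0..15.
constexpr uint64_t PackBasePassKeySketch(uint16_t Masked, uint32_t PixelShaderHash, uint16_t VertexShaderHash)
{
    return (uint64_t(Masked) << 48) | (uint64_t(PixelShaderHash) << 16) | uint64_t(VertexShaderHash);
}

// Packs the translucent fields: Priority in bits 48..63, Distance in 16..47, mesh id in 0..15.
constexpr uint64_t PackTranslucentKeySketch(uint16_t Priority, uint32_t Distance, uint16_t MeshIdInPrimitive)
{
    return (uint64_t(Priority) << 48) | (uint64_t(Distance) << 16) | uint64_t(MeshIdInPrimitive);
}

// A higher priority wins even against the largest possible distance and mesh id.
static_assert(PackTranslucentKeySketch(1, 0, 0) > PackTranslucentKeySketch(0, 0xFFFFFFFFu, 0xFFFFu),
              "the high field must dominate the comparison");

// With equal Masked values, the PS hash decides before the VS hash is considered.
static_assert(PackBasePassKeySketch(0, 2, 9) < PackBasePassKeySketch(0, 3, 0),
              "the middle field must dominate the low field");
```

Because the fields occupy disjoint bit ranges, comparing the packed integers is equivalent to a lexicographic comparison from the highest field down to the lowest.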

A few additional notes on FMeshDrawCommandSortKey:

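  • Although FMeshDrawCommandSortKey can hold three key layouts (BasePass, translucent, and generic), only one of them is in effect at a time, because they share the same union storage.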

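  • The key computation logic is spread across different files and stages. For example, the BasePass key may be computed in BasePassRendering, DepthRendering, or MeshPassProcessor. The fields are computed as follows:

    | Field | Computation | Notes |
    | VertexShaderHash | PointerHash(VertexShader) | Pointer hash of the VS used by the material. |
    | PixelShaderHash | PointerHash(PixelShader) | Pointer hash of the PS used by the material. |
    | Masked | BlendMode == EBlendMode::BLEND_Masked ? 0 : 1 | Whether the material's blend mode is Masked. |
    | MeshIdInPrimitive | MeshIdInPrimitivePerView[ViewIndex] | Per-view stable id of a mesh among those sharing one primitive. |
    | Distance | (uint32)~BitInvertIfNegativeFloat(*((uint32*)&Distance)) | Distance is computed according to ETranslucentSortPolicy, then remapped so that negative distances order correctly. |
    | Priority | (not shown) | Taken directly from the translucency sort priority specified by the material. |
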
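  • operator< compares PackedData directly, so the higher the bits a field occupies, the higher its sorting priority. Concretely, the BasePass is ordered first by the Masked field, then by the PS address hash, and finally by the VS address hash; likewise, the translucent pass is ordered by the material-specified priority, then by the mesh's distance to the camera, then by the mesh id.

    Generally, when sorting meshes, the factor with the greatest performance impact is given the highest priority.

    In the BasePass, for example, masked materials can severely hurt parallelism and throughput on some GPUs, so that field sits in the highest bits; and since a PS usually has more instructions and heavier computation than a VS, placing the PS hash above the VS hash is reasonable.

    The translucent pass is a special case because of the distance between object and camera: to draw translucent objects with the correct front-to-back relationship, they must be drawn from far to near, otherwise the result is wrong. That is why distance sits directly below the explicit priority field and above the mesh id, so that among draws with the same priority, distance decides the order.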

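  • PackedData packs several fields into a single uint64, so one integer comparison is enough, which makes sorting cheaper. The conventional alternative, a chain of if-else comparisons over the individual fields, costs more CPU instructions per comparison and slows sorting down.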

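  • Changing the key layout and the related sorting logic lets you customize the sort priorities and algorithm, for example by adding sort dimensions such as textures, vertex data, or render state.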
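
The Distance field relies on turning a float into an unsigned integer whose ordering matches the float's ordering, then inverting it so that farther meshes get smaller keys. The sketch below uses the common sign-flip technique for this; it is an assumption that the engine's BitInvertIfNegativeFloat behaves the same way, and FloatToOrderedBitsSketch and TranslucentDistanceKeySketch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "the sketch assumes 32-bit floats");

// Maps a float to a uint32 that sorts in the same order as the float:
// negative values have all bits flipped, non-negative values only the sign bit.
inline uint32_t FloatToOrderedBitsSketch(float Value)
{
    uint32_t Bits;
    std::memcpy(&Bits, &Value, sizeof(Bits)); // well-defined way to read the bit pattern
    const uint32_t Mask = (Bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return Bits ^ Mask;
}

// Inverts the ordered bits so that a larger distance yields a smaller key,
// which makes an ascending sort draw far meshes before near ones.
inline uint32_t TranslucentDistanceKeySketch(float Distance)
{
    return ~FloatToOrderedBitsSketch(Distance);
}
```

Sorting the resulting keys in ascending order yields far-to-near drawing, and meshes behind the view origin (negative distance) end up last.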

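Next come two important concepts that appear repeatedly in the code above: FMeshPassProcessor and FMeshDrawCommand. FMeshPassProcessor is responsible for converting FMeshBatch into FMeshDrawCommands. They and their related types are defined and annotated below: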

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h


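// Render pipeline state that excludes render target (RT) data. Useful for a group of draw commands that do not change the RT. Its size affects the traversal performance of mesh draw commands.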
class FGraphicsMinimalPipelineStateInitializer
{
public:
    // RT-related data: pixel formats and flags.
    using TRenderTargetFormats = TStaticArray<uint8/*EPixelFormat*/, MaxSimultaneousRenderTargets>;
    using TRenderTargetFlags = TStaticArray<uint32, MaxSimultaneousRenderTargets>;

    (......)

    // Copies this state into a FGraphicsPipelineStateInitializer and returns it.
    FGraphicsPipelineStateInitializer AsGraphicsPipelineStateInitializer() const
    {    
        return FGraphicsPipelineStateInitializer
        (    BoundShaderState.AsBoundShaderState()
            , BlendState
            , RasterizerState
            , DepthStencilState
            , ImmutableSamplerState
            , PrimitiveType
            , 0
            , FGraphicsPipelineStateInitializer::TRenderTargetFormats(PF_Unknown)
            , FGraphicsPipelineStateInitializer::TRenderTargetFlags(0)
            , PF_Unknown
            , 0
            , ERenderTargetLoadAction::ENoAction
            , ERenderTargetStoreAction::ENoAction
            , ERenderTargetLoadAction::ENoAction
            , ERenderTargetStoreAction::ENoAction
            , FExclusiveDepthStencil::DepthNop
            , 0
            , ESubpassHint::None
            , 0
            , 0
            , bDepthBounds
            , bMultiView
            , bHasFragmentDensityAttachment
        );
    }
    
    (......)

    // Compute the hash of FGraphicsMinimalPipelineStateInitializer.
    inline friend uint32 GetTypeHash(const FGraphicsMinimalPipelineStateInitializer& Initializer)
    {
        //add and initialize any leftover padding within the struct to avoid unstable key
        struct FHashKey
        {
            uint32 VertexDeclaration;
            uint32 VertexShader;
            uint32 PixelShader;
            uint32 RasterizerState;
        } HashKey;
        HashKey.VertexDeclaration = PointerHash(Initializer.BoundShaderState.VertexDeclarationRHI);
        HashKey.VertexShader = GetTypeHash(Initializer.BoundShaderState.VertexShaderIndex);
        HashKey.PixelShader = GetTypeHash(Initializer.BoundShaderState.PixelShaderIndex);
        HashKey.RasterizerState = PointerHash(Initializer.RasterizerState);

        return uint32(CityHash64((const char*)&HashKey, sizeof(FHashKey)));
    }

    // Comparison operators.
    bool operator==(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator!=(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator<(const FGraphicsMinimalPipelineStateInitializer& rhs) const;
    bool operator>(const FGraphicsMinimalPipelineStateInitializer& rhs) const;

    // Render pipeline state.
    FMinimalBoundShaderStateInput    BoundShaderState;     // Bound shader state.
    FRHIBlendState*                    BlendState;            // Blend state.
    FRHIRasterizerState*            RasterizerState;     // Rasterizer state.
    FRHIDepthStencilState*            DepthStencilState;    // Depth-stencil state.
    FImmutableSamplerState            ImmutableSamplerState;    // Immutable sampler state.

    // Other state.
    bool                bDepthBounds = false;
    bool                bMultiView = false;
    bool                bHasFragmentDensityAttachment = false;
    uint8                Padding[1] = {}; // Padding added for memory alignment.

    EPrimitiveType        PrimitiveType;
};


// Id that uniquely identifies one FGraphicsMinimalPipelineStateInitializer instance, used for fast sorting.
class FGraphicsMinimalPipelineStateId
{
public:
    uint32 GetId() const
    {
        return PackedId;
    }
    
    // Validity check and comparison operators.
    inline bool IsValid() const;
    inline bool operator==(const FGraphicsMinimalPipelineStateId& rhs) const;
    inline bool operator!=(const FGraphicsMinimalPipelineStateId& rhs) const;
    
    // Get the associated FGraphicsMinimalPipelineStateInitializer.
    inline const FGraphicsMinimalPipelineStateInitializer& GetPipelineState(const FGraphicsMinimalPipelineStateSet& InPipelineSet) const
    {
        if (bComesFromLocalPipelineStateSet)
        {
            return InPipelineSet.GetByElementId(SetElementIndex);
        }

        {
            FScopeLock Lock(&PersistentIdTableLock);
            return PersistentIdTable.GetByElementId(SetElementIndex).Key;
        }
    }

    static void InitializePersistentIds();
    // Get the persistent pipeline state id for the given FGraphicsMinimalPipelineStateInitializer.
    static FGraphicsMinimalPipelineStateId GetPersistentId(const FGraphicsMinimalPipelineStateInitializer& InPipelineState);
    static void RemovePersistentId(FGraphicsMinimalPipelineStateId Id);

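    // Looks up the pipeline state id in this order: the global persistent id table, then the PassSet argument. If neither contains it, a new entry is created and added to the PassSet argument.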
    RENDERER_API static FGraphicsMinimalPipelineStateId GetPipelineStateId(const FGraphicsMinimalPipelineStateInitializer& InPipelineState, FGraphicsMinimalPipelineStateSet& InOutPassSet, bool& NeedsShaderInitialisation);


private:
    // Packed id.
    union
    {
        uint32 PackedId = 0;

        struct
        {
            uint32 SetElementIndex                   : 30;
            uint32 bComesFromLocalPipelineStateSet : 1;
            uint32 bValid                           : 1;
        };
    };

    struct FRefCountedGraphicsMinimalPipelineState
    {
        FRefCountedGraphicsMinimalPipelineState() : RefNum(0)
        {
        }
        uint32 RefNum;
    };

    static FCriticalSection PersistentIdTableLock;
    using PersistentTableType = Experimental::TRobinHoodHashMap<FGraphicsMinimalPipelineStateInitializer, FRefCountedGraphicsMinimalPipelineState>;
    // Persistent id table.
    static PersistentTableType PersistentIdTable;

    static int32 LocalPipelineIdTableSize;
    static int32 CurrentLocalPipelineIdTableSize;
    static bool NeedsShaderInitialisation;
};

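// Mesh draw command: records all resources and data needed to draw a single mesh, and should hold nothing superfluous. Data that only needs to be passed through InitViews belongs in FVisibleMeshDrawCommand instead.
// Every resource referenced by a FMeshDrawCommand must be kept alive by its owner, because FMeshDrawCommand does not manage resource lifetimes.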
class FMeshDrawCommand
{
public:
    // Resource bindings.
    FMeshDrawShaderBindings ShaderBindings;
    FVertexInputStreamArray VertexStreams;
    FRHIIndexBuffer* IndexBuffer;

    // Cached render pipeline state (PSO).
    FGraphicsMinimalPipelineStateId CachedPipelineId;

    // Draw command parameters.
    uint32 FirstIndex;
    uint32 NumPrimitives;
    uint32 NumInstances;

    // Vertex parameters, for direct draws (VertexParams) and indirect draws (IndirectArgs).
    union
    {
        struct 
        {
            uint32 BaseVertexIndex;
            uint32 NumVertices;
        } VertexParams;
        
        struct  
        {
            FRHIVertexBuffer* Buffer;
            uint32 Offset;
        } IndirectArgs;
    };

    int8 PrimitiveIdStreamIndex;

    // Parameters that are not part of the pipeline state.
    uint8 StencilRef;

    // Returns whether this command matches the given FMeshDrawCommand; matching commands can be merged into a single instanced draw.
    bool MatchesForDynamicInstancing(const FMeshDrawCommand& Rhs) const
    {
        return CachedPipelineId == Rhs.CachedPipelineId
            && StencilRef == Rhs.StencilRef
            && ShaderBindings.MatchesForDynamicInstancing(Rhs.ShaderBindings)
            && VertexStreams == Rhs.VertexStreams
            && PrimitiveIdStreamIndex == Rhs.PrimitiveIdStreamIndex
            && IndexBuffer == Rhs.IndexBuffer
            && FirstIndex == Rhs.FirstIndex
            && NumPrimitives == Rhs.NumPrimitives
            && NumInstances == Rhs.NumInstances
            && ((NumPrimitives > 0 && VertexParams.BaseVertexIndex == Rhs.VertexParams.BaseVertexIndex && VertexParams.NumVertices == Rhs.VertexParams.NumVertices)
                || (NumPrimitives == 0 && IndirectArgs.Buffer == Rhs.IndirectArgs.Buffer && IndirectArgs.Offset == Rhs.IndirectArgs.Offset));
    }

    // Get the dynamic instancing hash.
    uint32 GetDynamicInstancingHash() const
    {
        //add and initialize any leftover padding within the struct to avoid unstable keys
        struct FHashKey
        {
            uint32 IndexBuffer;
            uint32 VertexBuffers = 0;
            uint32 VertexStreams = 0;
            uint32 PipelineId;
            uint32 DynamicInstancingHash;
            uint32 FirstIndex;
            uint32 NumPrimitives;
            uint32 NumInstances;
            uint32 IndirectArgsBufferOrBaseVertexIndex;
            uint32 NumVertices;
            uint32 StencilRefAndPrimitiveIdStreamIndex;

            // Pointer address hash.
            static inline uint32 PointerHash(const void* Key)
            {
#if PLATFORM_64BITS
                // Ignoring the lower 4 bits since they are likely zero anyway.
                // Higher bits are more significant in 64 bit builds.
                return reinterpret_cast<UPTRINT>(Key) >> 4;
#else
                return reinterpret_cast<UPTRINT>(Key);
#endif
            };

            // Hash combine.
            static inline uint32 HashCombine(uint32 A, uint32 B)
            {
                return A ^ (B + 0x9e3779b9 + (A << 6) + (A >> 2));
            }
        } HashKey;

        // Fill FHashKey with the values of FMeshDrawCommand's members.
        HashKey.PipelineId = CachedPipelineId.GetId();
        HashKey.StencilRefAndPrimitiveIdStreamIndex = StencilRef | (PrimitiveIdStreamIndex << 8);
        HashKey.DynamicInstancingHash = ShaderBindings.GetDynamicInstancingHash();

        for (int index = 0; index < VertexStreams.Num(); index++)
        {
            const FVertexInputStream& VertexInputStream = VertexStreams[index];
            const uint32 StreamIndex = VertexInputStream.StreamIndex;
            const uint32 Offset = VertexInputStream.Offset;

            uint32 Packed = (StreamIndex << 28) | Offset;
            HashKey.VertexStreams = FHashKey::HashCombine(HashKey.VertexStreams, Packed);
            HashKey.VertexBuffers = FHashKey::HashCombine(HashKey.VertexBuffers, FHashKey::PointerHash(VertexInputStream.VertexBuffer));
        }

        HashKey.IndexBuffer = FHashKey::PointerHash(IndexBuffer);
        HashKey.FirstIndex = FirstIndex;
        HashKey.NumPrimitives = NumPrimitives;
        HashKey.NumInstances = NumInstances;

        if (NumPrimitives > 0)
        {
            HashKey.IndirectArgsBufferOrBaseVertexIndex = VertexParams.BaseVertexIndex;
            HashKey.NumVertices = VertexParams.NumVertices;
        }
        else
        {
            HashKey.IndirectArgsBufferOrBaseVertexIndex = FHashKey::PointerHash(IndirectArgs.Buffer);
            HashKey.NumVertices = IndirectArgs.Offset;
        }        

        // Hash the filled HashKey. HashKeys with identical data always produce the same hash, which makes it cheap to find commands that are candidates for batching.
        return uint32(CityHash64((char*)&HashKey, sizeof(FHashKey)));
    }

    (......)
    
    // Process the relevant FMeshBatch data and store it in this FMeshDrawCommand.
    void SetDrawParametersAndFinalize(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        FGraphicsMinimalPipelineStateId PipelineId,
        const FMeshProcessorShaders* ShadersForDebugging)
    {
        const FMeshBatchElement& BatchElement = MeshBatch.Elements[BatchElementIndex];

        IndexBuffer = BatchElement.IndexBuffer ? BatchElement.IndexBuffer->IndexBufferRHI.GetReference() : nullptr;
        FirstIndex = BatchElement.FirstIndex;
        NumPrimitives = BatchElement.NumPrimitives;
        NumInstances = BatchElement.NumInstances;

        if (NumPrimitives > 0)
        {
            VertexParams.BaseVertexIndex = BatchElement.BaseVertexIndex;
            VertexParams.NumVertices = BatchElement.MaxVertexIndex - BatchElement.MinVertexIndex + 1;
        }
        else
        {
            IndirectArgs.Buffer = BatchElement.IndirectArgsBuffer;
            IndirectArgs.Offset = BatchElement.IndirectArgsOffset;
        }

        Finalize(PipelineId, ShadersForDebugging);
    }

    // Store the PipelineId and the shader debugging information.
    void Finalize(FGraphicsMinimalPipelineStateId PipelineId, const FMeshProcessorShaders* ShadersForDebugging)
    {
        CachedPipelineId = PipelineId;
        ShaderBindings.Finalize(ShadersForDebugging);    
    }

    /** Submits commands to the RHI Commandlist to draw the MeshDrawCommand. */
    static void SubmitDraw(
        const FMeshDrawCommand& RESTRICT MeshDrawCommand, 
        const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
        FRHIVertexBuffer* ScenePrimitiveIdsBuffer,
        int32 PrimitiveIdOffset,
        uint32 InstanceFactor,
        FRHICommandList& CommandList, 
        class FMeshDrawCommandStateCache& RESTRICT StateCache);

    (......)
};


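// Visible mesh draw command. Stores the information needed for a mesh draw command that has been determined to be visible, for further visibility processing.
// Unlike FMeshDrawCommand, FVisibleMeshDrawCommand should store only the data needed by InitViews operations (visibility / sorting), not data related to draw submission.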
class FVisibleMeshDrawCommand
{
public:
    (......)
    
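    // The associated FMeshDrawCommand instance.
    const FMeshDrawCommand* MeshDrawCommand;
    // Sort key for sorting that is not state-based (e.g. translucent draws sorted by depth).
    FMeshDrawCommandSortKey SortKey;
    // Draw primitive id, used to fetch primitive data from the PrimitiveSceneData SRV. A valid DrawPrimitiveId can be traced back to its FPrimitiveSceneInfo instance.
    int32 DrawPrimitiveId;
    // Id of the scene primitive that produced this FVisibleMeshDrawCommand. -1 means there is no FPrimitiveSceneInfo; a valid id can be traced back to the FPrimitiveSceneInfo instance.
    int32 ScenePrimitiveId;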
    // Offset into the buffer of PrimitiveIds built for this pass, in int32's.
    int32 PrimitiveIdBufferOffset;

    // Dynamic instancing state bucket ID.
    // All draw commands with the same StateBucketId can be merged into a single instanced draw.
    // -1 means other factors are used for sorting instead of StateBucketId.
    int32 StateBucketId;

    // Needed for view overrides
    ERasterizerFillMode MeshFillMode : ERasterizerFillMode_NumBits + 1;
    ERasterizerCullMode MeshCullMode : ERasterizerCullMode_NumBits + 1;
};


// Mesh pass processor.
class FMeshPassProcessor
{
public:
    
    // The scene, view, and context data below are passed in through the constructor.
    const FScene* RESTRICT Scene;
    ERHIFeatureLevel::Type FeatureLevel;
    const FSceneView* ViewIfDynamicMeshCommand;
    FMeshPassDrawListContext* DrawListContext;

    (......)

    // Adds a FMeshBatch instance; implemented by the concrete pass subclass.
    virtual void AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId = -1) = 0;
    
    // Mesh drawing policy override settings.
    struct FMeshDrawingPolicyOverrideSettings
    {
        EDrawingPolicyOverrideFlags    MeshOverrideFlags = EDrawingPolicyOverrideFlags::None;
        EPrimitiveType                MeshPrimitiveType = PT_TriangleList;
    };
    
    (......)

    // Converts one FMeshBatch into one or more MeshDrawCommands.
    template<typename PassShadersType, typename ShaderElementDataType>
    void BuildMeshDrawCommands(
        const FMeshBatch& RESTRICT MeshBatch,
        uint64 BatchElementMask,
        const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy,
        const FMaterialRenderProxy& RESTRICT MaterialRenderProxy,
        const FMaterial& RESTRICT MaterialResource,
        const FMeshPassProcessorRenderState& RESTRICT DrawRenderState,
        PassShadersType PassShaders,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        EMeshPassFeatures MeshPassFeatures,
        const ShaderElementDataType& ShaderElementData)
    {
        const FVertexFactory* RESTRICT VertexFactory = MeshBatch.VertexFactory;
        const FPrimitiveSceneInfo* RESTRICT PrimitiveSceneInfo = PrimitiveSceneProxy ? PrimitiveSceneProxy->GetPrimitiveSceneInfo() : nullptr;

        // The FMeshDrawCommand instance that collects the shared render resources and data.
        FMeshDrawCommand SharedMeshDrawCommand;
        
        // Set the FMeshDrawCommand's stencil reference.
        SharedMeshDrawCommand.SetStencilRef(DrawRenderState.GetStencilRef());

        // The render (pipeline) state instance.
        FGraphicsMinimalPipelineStateInitializer PipelineState;
        PipelineState.PrimitiveType = (EPrimitiveType)MeshBatch.Type;
        PipelineState.ImmutableSamplerState = MaterialRenderProxy.ImmutableSamplerState;
        
        // Set up the FMeshDrawCommand's vertex data, shaders and render state.
        EVertexInputStreamType InputStreamType = EVertexInputStreamType::Default;
        if ((MeshPassFeatures & EMeshPassFeatures::PositionOnly) != EMeshPassFeatures::Default)                InputStreamType = EVertexInputStreamType::PositionOnly;
        if ((MeshPassFeatures & EMeshPassFeatures::PositionAndNormalOnly) != EMeshPassFeatures::Default)    InputStreamType = EVertexInputStreamType::PositionAndNormalOnly;

        FRHIVertexDeclaration* VertexDeclaration = VertexFactory->GetDeclaration(InputStreamType);
        SharedMeshDrawCommand.SetShaders(VertexDeclaration, PassShaders.GetUntypedShaders(), PipelineState);

        PipelineState.RasterizerState = GetStaticRasterizerState<true>(MeshFillMode, MeshCullMode);
        PipelineState.BlendState = DrawRenderState.GetBlendState();
        PipelineState.DepthStencilState = DrawRenderState.GetDepthStencilState();

        VertexFactory->GetStreams(FeatureLevel, InputStreamType, SharedMeshDrawCommand.VertexStreams);

        SharedMeshDrawCommand.PrimitiveIdStreamIndex = VertexFactory->GetPrimitiveIdStreamIndex(InputStreamType);

        // Gather the shader bindings for the VS/PS/GS and other shader stages.
        int32 DataOffset = 0;
        if (PassShaders.VertexShader.IsValid())
        {
            FMeshDrawSingleShaderBindings ShaderBindings = SharedMeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Vertex, DataOffset);
            PassShaders.VertexShader->GetShaderBindings(Scene, FeatureLevel, PrimitiveSceneProxy, MaterialRenderProxy, MaterialResource, DrawRenderState, ShaderElementData, ShaderBindings);
        }

        if (PassShaders.PixelShader.IsValid())
        {
            FMeshDrawSingleShaderBindings ShaderBindings = SharedMeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Pixel, DataOffset);
            PassShaders.PixelShader->GetShaderBindings(Scene, FeatureLevel, PrimitiveSceneProxy, MaterialRenderProxy, MaterialResource, DrawRenderState, ShaderElementData, ShaderBindings);
        }

        (......)

        const int32 NumElements = MeshBatch.Elements.Num();

        // Iterate over every FMeshBatchElement of this FMeshBatch and gather the bindings of every shader type associated with that element.
        for (int32 BatchElementIndex = 0; BatchElementIndex < NumElements; BatchElementIndex++)
        {
            if ((1ull << BatchElementIndex) & BatchElementMask)
            {
                const FMeshBatchElement& BatchElement = MeshBatch.Elements[BatchElementIndex];
                FMeshDrawCommand& MeshDrawCommand = DrawListContext->AddCommand(SharedMeshDrawCommand, NumElements);

                DataOffset = 0;
                if (PassShaders.VertexShader.IsValid())
                {
                    FMeshDrawSingleShaderBindings VertexShaderBindings = MeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Vertex, DataOffset);
                    FMeshMaterialShader::GetElementShaderBindings(PassShaders.VertexShader, Scene, ViewIfDynamicMeshCommand, VertexFactory, InputStreamType, FeatureLevel, PrimitiveSceneProxy, MeshBatch, BatchElement, ShaderElementData, VertexShaderBindings, MeshDrawCommand.VertexStreams);
                }

                if (PassShaders.PixelShader.IsValid())
                {
                    FMeshDrawSingleShaderBindings PixelShaderBindings = MeshDrawCommand.ShaderBindings.GetSingleShaderBindings(SF_Pixel, DataOffset);
                    FMeshMaterialShader::GetElementShaderBindings(PassShaders.PixelShader, Scene, ViewIfDynamicMeshCommand, VertexFactory, EVertexInputStreamType::Default, FeatureLevel, PrimitiveSceneProxy, MeshBatch, BatchElement, ShaderElementData, PixelShaderBindings, MeshDrawCommand.VertexStreams);
                }
                
                (......)

                // Resolve the primitive ids.
                int32 DrawPrimitiveId;
                int32 ScenePrimitiveId;
                GetDrawCommandPrimitiveId(PrimitiveSceneInfo, BatchElement, DrawPrimitiveId, ScenePrimitiveId);

                // Finalize the MeshDrawCommand.
                FMeshProcessorShaders ShadersForDebugging = PassShaders.GetUntypedShaders();
                DrawListContext->FinalizeCommand(MeshBatch, BatchElementIndex, DrawPrimitiveId, ScenePrimitiveId, MeshFillMode, MeshCullMode, SortKey, PipelineState, &ShadersForDebugging, MeshDrawCommand);
            }
        }
    }

protected:
    RENDERER_API void GetDrawCommandPrimitiveId(
        const FPrimitiveSceneInfo* RESTRICT PrimitiveSceneInfo,
        const FMeshBatchElement& BatchElement,
        int32& DrawPrimitiveId,
        int32& ScenePrimitiveId) const;
};

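CityHash64 is used several times above when computing key values. CityHash64 is a fast, non-cryptographic hash function that computes hash values for strings of arbitrary length. Its implementation lives in Engine\Source\Runtime\Core\Private\Hash\CityHash.cpp, and interested readers can study it on their own.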

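Other hash functions that serve a similar purpose include HalfMD5, MD5, SipHash64, SipHash128, IntHash32, IntHash64, SHA1, SHA224 and SHA256. Note that MD5 and the SHA family were designed as cryptographic hashes and are considerably more expensive to compute than CityHash.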
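
To make the "bytes in, 64-bit key out" idea concrete, here is a minimal sketch. It assumes 64-bit FNV-1a as a stand-in, which is not the CityHash64 algorithm UE uses. `Fnv1a64`, `FToyStateKey` and `HashStateKey` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for CityHash64: 64-bit FNV-1a over a byte range.
// This is NOT the algorithm used by UE; it only illustrates a fast,
// non-cryptographic hash that maps arbitrary bytes to a 64-bit key.
inline std::uint64_t Fnv1a64(const void* Data, std::size_t NumBytes)
{
    assert(Data != nullptr || NumBytes == 0);
    const unsigned char* Bytes = static_cast<const unsigned char*>(Data);
    std::uint64_t Hash = 14695981039346656037ull; // FNV-1a 64-bit offset basis
    for (std::size_t Index = 0; Index < NumBytes; ++Index)
    {
        Hash ^= Bytes[Index];
        Hash *= 1099511628211ull; // FNV 64-bit prime
    }
    return Hash;
}

// Convenience overload for text.
inline std::uint64_t Fnv1a64(std::string_view Text)
{
    return Fnv1a64(Text.data(), Text.size());
}

// Hypothetical render-state key: four uint32 fields, so there are no padding
// bytes and hashing the raw object bytes is well defined.
struct FToyStateKey
{
    std::uint32_t ShaderId;
    std::uint32_t BlendMode;
    std::uint32_t DepthMode;
    std::uint32_t VertexLayoutId;
};

// Equal state keys always hash to the same 64-bit value.
inline std::uint64_t HashStateKey(const FToyStateKey& Key)
{
    return Fnv1a64(&Key, sizeof(Key));
}
```

Two draw commands whose `FToyStateKey` fields are identical produce the same 64-bit key, which is the property a state-bucketing or sorting scheme relies on. The choice of hash only affects speed and collision quality, not that property.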

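FMeshDrawCommand stores all the information the RHI needs to draw a mesh. This information is independent of both the platform and the graphics API (stateless) and follows a data-driven design, so it can be shared across device contexts.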

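FMeshPassProcessor::AddMeshBatch is implemented by subclasses, and each subclass usually corresponds to one pass in the EMeshPass enum. Common subclasses include:

  • FDepthPassMeshProcessor: depth pass mesh processor, corresponding to EMeshPass::DepthPass.

  • FBasePassMeshProcessor: base (geometry) pass mesh processor, corresponding to EMeshPass::BasePass.

  • FCustomDepthPassMeshProcessor: custom depth pass mesh processor, corresponding to EMeshPass::CustomDepth.

  • FShadowDepthPassMeshProcessor: shadow pass mesh processor, corresponding to EMeshPass::CSMShadowDepth.

  • FTranslucencyDepthPassMeshProcessor: translucency depth pass mesh processor, with no corresponding EMeshPass.

  • FLightmapDensityMeshProcessor: lightmap density mesh processor, corresponding to EMeshPass::LightmapDensity.

  • ……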
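
The "one subclass per pass" pattern can be sketched in a few lines. This is a simplified illustration under stated assumptions, not UE code: every type and function below (`EToyMeshPass`, `FToyMeshBatch`, `FToyPassProcessor`, `CreateToyProcessor`, and the two filters) is hypothetical, and the filter rules are far simpler than the real ones.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Everything here is a hypothetical simplification; none of these are UE types.
enum class EToyMeshPass { DepthPass, BasePass };

struct FToyMeshBatch
{
    std::string Name;
    bool bTranslucent;
    bool bRenderInMainPass;
};

// Plays the role of FMeshPassProcessor: one pure virtual entry point per batch.
class FToyPassProcessor
{
public:
    virtual ~FToyPassProcessor() = default;
    virtual void AddMeshBatch(const FToyMeshBatch& Batch) = 0;

    // Names of the batches this pass decided to draw.
    std::vector<std::string> Accepted;
};

// Toy depth pass: rejects translucent batches.
class FToyDepthPassProcessor final : public FToyPassProcessor
{
public:
    void AddMeshBatch(const FToyMeshBatch& Batch) override
    {
        if (!Batch.bTranslucent)
        {
            Accepted.push_back(Batch.Name);
        }
    }
};

// Toy base pass: accepts anything flagged for the main pass.
class FToyBasePassProcessor final : public FToyPassProcessor
{
public:
    void AddMeshBatch(const FToyMeshBatch& Batch) override
    {
        if (Batch.bRenderInMainPass)
        {
            Accepted.push_back(Batch.Name);
        }
    }
};

// Hypothetical factory that maps a pass enum value to its processor.
inline std::unique_ptr<FToyPassProcessor> CreateToyProcessor(EToyMeshPass Pass)
{
    switch (Pass)
    {
    case EToyMeshPass::DepthPass: return std::make_unique<FToyDepthPassProcessor>();
    case EToyMeshPass::BasePass:  return std::make_unique<FToyBasePassProcessor>();
    }
    assert(false && "unknown pass");
    return nullptr;
}
```

Feeding the same batch to every processor lets each pass decide independently whether the batch is relevant to it, which is why the same FMeshBatch can yield draw commands in several passes.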

Different passes process an FMeshBatch differently. Take the most common one, FBasePassMeshProcessor, as an example:

// Engine\Source\Runtime\Renderer\Private\BasePassRendering.cpp

void FBasePassMeshProcessor::AddMeshBatch(const FMeshBatch& RESTRICT MeshBatch, uint64 BatchElementMask, const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy, int32 StaticMeshId)
{
    if (MeshBatch.bUseForMaterial)
    {
        (......)

        if (bShouldDraw
            && (!PrimitiveSceneProxy || PrimitiveSceneProxy->ShouldRenderInMainPass())
            && ShouldIncludeDomainInMeshPass(Material.GetMaterialDomain())
            && ShouldIncludeMaterialInDefaultOpaquePass(Material))
        {
            (......)

            // Simple forward shading path.
            if (IsSimpleForwardShadingEnabled(GetFeatureLevelShaderPlatform(FeatureLevel)))
            {
                AddMeshBatchForSimpleForwardShading(
                    MeshBatch,
                    BatchElementMask,
                    StaticMeshId,
                    PrimitiveSceneProxy,
                    MaterialRenderProxy,
                    Material,
                    LightMapInteraction,
                    bIsLitMaterial,
                    bAllowStaticLighting,
                    bUseVolumetricLightmap,
                    bAllowIndirectLightingCache,
                    MeshFillMode,
                    MeshCullMode);
            }
            // Objects that cast volumetric translucent self-shadows.
            else if (bIsLitMaterial
                && bIsTranslucent
                && PrimitiveSceneProxy
                && PrimitiveSceneProxy->CastsVolumetricTranslucentShadow())
            {
                (......)

                if (bIsLitMaterial
                    && bAllowStaticLighting
                    && bUseVolumetricLightmap
                    && PrimitiveSceneProxy)
                {
                    Process< FSelfShadowedVolumetricLightmapPolicy >(
                        MeshBatch,
                        BatchElementMask,
                        StaticMeshId,
                        PrimitiveSceneProxy,
                        MaterialRenderProxy,
                        Material,
                        BlendMode,
                        ShadingModels,
                        FSelfShadowedVolumetricLightmapPolicy(),
                        ElementData,
                        MeshFillMode,
                        MeshCullMode);
                }
                
                (......)
            }
            // Otherwise, call Process according to the light map options and quality level.
            else
            {
                static const auto CVarSupportLowQualityLightmap = IConsoleManager::Get().FindTConsoleVariableDataInt(TEXT("r.SupportLowQualityLightmaps"));
                const bool bAllowLowQualityLightMaps = (!CVarSupportLowQualityLightmap) || (CVarSupportLowQualityLightmap->GetValueOnAnyThread() != 0);

                switch (LightMapInteraction.GetType())
                {
                case LMIT_Texture:
                    if (bAllowHighQualityLightMaps)
                    {
                        const FShadowMapInteraction ShadowMapInteraction = (bAllowStaticLighting && MeshBatch.LCI && bIsLitMaterial)
                            ? MeshBatch.LCI->GetShadowMapInteraction(FeatureLevel)
                            : FShadowMapInteraction();

                        if (ShadowMapInteraction.GetType() == SMIT_Texture)
                        {
                            Process< FUniformLightMapPolicy >(
                                MeshBatch,
                                BatchElementMask,
                                StaticMeshId,
                                PrimitiveSceneProxy,
                                MaterialRenderProxy,
                                Material,
                                BlendMode,
                                ShadingModels,
                                FUniformLightMapPolicy(LMP_DISTANCE_FIELD_SHADOWS_AND_HQ_LIGHTMAP),
                                MeshBatch.LCI,
                                MeshFillMode,
                                MeshCullMode);
                        }
                            
                        (......)
                    }
                        
                    (......)
                        
                    break;
                default:
                    if (bIsLitMaterial
                        && bAllowStaticLighting
                        && Scene
                        && Scene->VolumetricLightmapSceneData.HasData()
                        && PrimitiveSceneProxy
                        && (PrimitiveSceneProxy->IsMovable()
                            || PrimitiveSceneProxy->NeedsUnbuiltPreviewLighting()
                            || PrimitiveSceneProxy->GetLightmapType() == ELightmapType::ForceVolumetric))
                    {
                        Process< FUniformLightMapPolicy >(
                            MeshBatch,
                            BatchElementMask,
                            StaticMeshId,
                            PrimitiveSceneProxy,
                            MaterialRenderProxy,
                            Material,
                            BlendMode,
                            ShadingModels,
                            FUniformLightMapPolicy(LMP_PRECOMPUTED_IRRADIANCE_VOLUME_INDIRECT_LIGHTING),
                            MeshBatch.LCI,
                            MeshFillMode,
                            MeshCullMode);
                    }
                        
                    (......)
                        
                    break;
                };
            }
        }
    }
}

// FBasePassMeshProcessor handles each light map type (shader bindings, render state, sort key, vertex data, etc.) and finally calls BuildMeshDrawCommands to convert the FMeshBatch into FMeshDrawCommands.
template<typename LightMapPolicyType>
void FBasePassMeshProcessor::Process(
    const FMeshBatch& RESTRICT MeshBatch,
    uint64 BatchElementMask,
    int32 StaticMeshId,
    const FPrimitiveSceneProxy* RESTRICT PrimitiveSceneProxy,
    const FMaterialRenderProxy& RESTRICT MaterialRenderProxy,
    const FMaterial& RESTRICT MaterialResource,
    EBlendMode BlendMode,
    FMaterialShadingModelField ShadingModels,
    const LightMapPolicyType& RESTRICT LightMapPolicy,
    const typename LightMapPolicyType::ElementDataType& RESTRICT LightMapElementData,
    ERasterizerFillMode MeshFillMode,
    ERasterizerCullMode MeshCullMode)
{
    const FVertexFactory* VertexFactory = MeshBatch.VertexFactory;

    const bool bRenderSkylight = Scene && Scene->ShouldRenderSkylightInBasePass(BlendMode) && ShadingModels.IsLit();
    const bool bRenderAtmosphericFog = IsTranslucentBlendMode(BlendMode) && (Scene && Scene->HasAtmosphericFog() && Scene->ReadOnlyCVARCache.bEnableAtmosphericFog);

    TMeshProcessorShaders<
        TBasePassVertexShaderPolicyParamType<LightMapPolicyType>,
        FBaseHS,
        FBaseDS,
        TBasePassPixelShaderPolicyParamType<LightMapPolicyType>> BasePassShaders;

    // Fetch the shaders for the given light map policy type.
    GetBasePassShaders<LightMapPolicyType>(
        MaterialResource,
        VertexFactory->GetType(),
        LightMapPolicy,
        FeatureLevel,
        bRenderAtmosphericFog,
        bRenderSkylight,
        Get128BitRequirement(),
        BasePassShaders.HullShader,
        BasePassShaders.DomainShader,
        BasePassShaders.VertexShader,
        BasePassShaders.PixelShader
        );

    // Set up the render state.
    FMeshPassProcessorRenderState DrawRenderState(PassDrawRenderState);

    SetDepthStencilStateForBasePass(
        ViewIfDynamicMeshCommand,
        DrawRenderState,
        FeatureLevel,
        MeshBatch,
        StaticMeshId,
        PrimitiveSceneProxy,
        bEnableReceiveDecalOutput);

    if (bTranslucentBasePass)
    {
        SetTranslucentRenderState(DrawRenderState, MaterialResource, GShaderPlatformForFeatureLevel[FeatureLevel], TranslucencyPassType);
    }

    // Initialize the shader's mesh material data.
    TBasePassShaderElementData<LightMapPolicyType> ShaderElementData(LightMapElementData);
    ShaderElementData.InitializeMeshMaterialData(ViewIfDynamicMeshCommand, PrimitiveSceneProxy, MeshBatch, StaticMeshId, true);

    // Compute the sort key.
    FMeshDrawCommandSortKey SortKey = FMeshDrawCommandSortKey::Default;

    if (bTranslucentBasePass)
    {
        SortKey = CalculateTranslucentMeshStaticSortKey(PrimitiveSceneProxy, MeshBatch.MeshIdInPrimitive);
    }
    else
    {
        SortKey = CalculateBasePassMeshStaticSortKey(EarlyZPassMode, BlendMode, BasePassShaders.VertexShader.GetShader(), BasePassShaders.PixelShader.GetShader());
    }

    // Convert the elements of the FMeshBatch into FMeshDrawCommands.
    BuildMeshDrawCommands(
        MeshBatch,
        BatchElementMask,
        PrimitiveSceneProxy,
        MaterialRenderProxy,
        MaterialResource,
        DrawRenderState,
        BasePassShaders,
        MeshFillMode,
        MeshCullMode,
        SortKey,
        EMeshPassFeatures::Default,
        ShaderElementData);
}

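As shown above, the main responsibilities of FMeshPassProcessor are:

  • Pass filtering. MeshBatches irrelevant to the pass are filtered out; for example, the depth pass filters out translucent objects.

  • Selecting the shaders and render state (depth, stencil, blend state, rasterizer state, etc.) required by the draw command.

  • Collecting the shader resource bindings involved in the draw command:

    • The pass's uniform buffers, such as ViewUniformBuffer and DepthPassUniformBuffer.
    • Vertex factory bindings (vertex data and indices).
    • Material bindings.
    • Pass-specific bindings related to the draw command.

  • Collecting the Draw Call parameters.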
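
The responsibilities listed above can be sketched as a single function: filter, fill the shared state once, then emit one command per element selected by the element mask. This is a rough illustration only. `FToyElement`, `FToyBatch`, `FToyDrawCommand` and `BuildDepthPassCommands` are hypothetical, and the "shader id" derivation is an arbitrary placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified counterparts of FMeshBatchElement,
// FMeshBatch and FMeshDrawCommand.
struct FToyElement
{
    std::uint32_t FirstIndex;
    std::uint32_t NumPrimitives;
};

struct FToyBatch
{
    bool bTranslucent;
    int MaterialId;
    std::vector<FToyElement> Elements;
};

struct FToyDrawCommand
{
    int ShaderId;
    bool bDepthWrite;
    std::uint32_t FirstIndex;
    std::uint32_t NumPrimitives;
};

// Toy depth-pass version of "filter, select state, emit per-element commands".
inline std::vector<FToyDrawCommand> BuildDepthPassCommands(
    const FToyBatch& Batch, std::uint64_t BatchElementMask)
{
    std::vector<FToyDrawCommand> Commands;

    // 1. Pass filtering: a depth pass skips translucent batches.
    if (Batch.bTranslucent)
    {
        return Commands;
    }

    // A 64-bit mask can address at most 64 elements.
    assert(Batch.Elements.size() <= 64);

    // 2. State shared by every element of the batch is filled in once.
    FToyDrawCommand Shared{};
    Shared.ShaderId = Batch.MaterialId * 10; // placeholder shader selection
    Shared.bDepthWrite = true;

    // 3. One command per element whose bit is set in the mask.
    for (std::size_t Index = 0; Index < Batch.Elements.size(); ++Index)
    {
        if ((1ull << Index) & BatchElementMask)
        {
            FToyDrawCommand Command = Shared;
            Command.FirstIndex = Batch.Elements[Index].FirstIndex;
            Command.NumPrimitives = Batch.Elements[Index].NumPrimitives;
            Commands.push_back(Command);
        }
    }
    return Commands;
}
```

Copying the shared command before patching in per-element data mirrors the role of SharedMeshDrawCommand in the listing above: state common to the whole batch is computed once rather than once per element.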

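At its final stage, FMeshPassProcessor::BuildMeshDrawCommands calls FMeshPassDrawListContext::FinalizeCommand. FMeshPassDrawListContext is an abstract class that exposes two basic interfaces, AddCommand and FinalizeCommand. Its derived classes are FDynamicPassMeshDrawListContext and FCachedPassMeshDrawListContext, which represent the contexts for dynamic mesh draw commands and cached mesh draw commands respectively. Their interfaces, with annotations, are as follows: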

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.h

// Mesh pass draw list context.
class FMeshPassDrawListContext
{
public:
    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) = 0;
    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) = 0;
};

// [Dynamic] mesh pass draw list context.
class FDynamicPassMeshDrawListContext : public FMeshPassDrawListContext
{
public:
    (......)

    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final
    {
        // Append the FMeshDrawCommand to the list; AddElement returns its index in the array.
        const int32 Index = DrawListStorage.MeshDrawCommands.AddElement(Initializer);
        FMeshDrawCommand& NewCommand = DrawListStorage.MeshDrawCommands[Index];
        return NewCommand;
    }

    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) override final
    {
        // Get the pipeline state id.
        FGraphicsMinimalPipelineStateId PipelineId = FGraphicsMinimalPipelineStateId::GetPipelineStateId(PipelineState, GraphicsMinimalPipelineStateSet, NeedsShaderInitialisation);

        // Process the FMeshBatch and related data, and store the result in the MeshDrawCommand.
        MeshDrawCommand.SetDrawParametersAndFinalize(MeshBatch, BatchElementIndex, PipelineId, ShadersForDebugging);

        // Create an FVisibleMeshDrawCommand and fill it with the FMeshDrawCommand and related data.
        FVisibleMeshDrawCommand NewVisibleMeshDrawCommand;
        NewVisibleMeshDrawCommand.Setup(&MeshDrawCommand, DrawPrimitiveId, ScenePrimitiveId, -1, MeshFillMode, MeshCullMode, SortKey);
        // Added straight to the TArray: the dynamic path neither merges nor instances MeshDrawCommands.
        DrawList.Add(NewVisibleMeshDrawCommand);
    }

private:
    // Storage for the FMeshDrawCommands; the underlying data structure is a TChunkedArray.
    FDynamicMeshDrawCommandStorage& DrawListStorage;
    // List of FVisibleMeshDrawCommands; the underlying data structure is a TArray. Each entry holds a pointer to an FMeshDrawCommand whose data lives in DrawListStorage.
    FMeshCommandOneFrameArray& DrawList;
    // The set of PSOs.
    FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet;
    
    bool& NeedsShaderInitialisation;
};


// [Cached] mesh pass draw list context.
class FCachedPassMeshDrawListContext : public FMeshPassDrawListContext
{
public:
    FCachedPassMeshDrawListContext(FCachedMeshDrawCommandInfo& InCommandInfo, FCriticalSection& InCachedMeshDrawCommandLock, FCachedPassMeshDrawList& InCachedDrawLists, FStateBucketMap& InCachedMeshDrawCommandStateBuckets, const FScene& InScene);

    virtual FMeshDrawCommand& AddCommand(FMeshDrawCommand& Initializer, uint32 NumElements) override final
    {
        if (NumElements == 1)
        {
            return Initializer;
        }
        else
        {
            MeshDrawCommandForStateBucketing = Initializer;
            return MeshDrawCommandForStateBucketing;
        }
    }

    virtual void FinalizeCommand(
        const FMeshBatch& MeshBatch, 
        int32 BatchElementIndex,
        int32 DrawPrimitiveId,
        int32 ScenePrimitiveId,
        ERasterizerFillMode MeshFillMode,
        ERasterizerCullMode MeshCullMode,
        FMeshDrawCommandSortKey SortKey,
        const FGraphicsMinimalPipelineStateInitializer& PipelineState,
        const FMeshProcessorShaders* ShadersForDebugging,
        FMeshDrawCommand& MeshDrawCommand) override final
    {
        FGraphicsMinimalPipelineStateId PipelineId = FGraphicsMinimalPipelineStateId::GetPersistentId(PipelineState);

        MeshDrawCommand.SetDrawParametersAndFinalize(MeshBatch, BatchElementIndex, PipelineId, ShadersForDebugging);

        if (UseGPUScene(GMaxRHIShaderPlatform, GMaxRHIFeatureLevel))
        {
            Experimental::FHashElementId SetId;
            auto hash = CachedMeshDrawCommandStateBuckets.ComputeHash(MeshDrawCommand);
            {
                FScopeLock Lock(&CachedMeshDrawCommandLock);

                (......)
                
                // Look up the id for this hash in the cached hash table, adding a new entry if none exists. This is how FMeshDrawCommands get merged.
                SetId = CachedMeshDrawCommandStateBuckets.FindOrAddIdByHash(hash, MeshDrawCommand, FMeshDrawCommandCount());
                // Increment the count.
                CachedMeshDrawCommandStateBuckets.GetByElementId(SetId).Value.Num++;

                (......)
            }

            CommandInfo.StateBucketId = SetId.GetIndex();
        }
        else
        {
            FScopeLock Lock(&CachedMeshDrawCommandLock);
            // Only one FMeshDrawCommand supported per FStaticMesh in a pass
            // Allocate at lowest free index so that 'r.DoLazyStaticMeshUpdate' can shrink the TSparseArray more effectively
            CommandInfo.CommandIndex = CachedDrawLists.MeshDrawCommands.EmplaceAtLowestFreeIndex(CachedDrawLists.LowestFreeIndexSearchStart, MeshDrawCommand);
        }

        // Store the remaining data.
        CommandInfo.SortKey = SortKey;
        CommandInfo.MeshFillMode = MeshFillMode;
        CommandInfo.MeshCullMode = MeshCullMode;
    }

private:
    FMeshDrawCommand MeshDrawCommandForStateBucketing;
    FCachedMeshDrawCommandInfo& CommandInfo;
    FCriticalSection& CachedMeshDrawCommandLock;
    FCachedPassMeshDrawList& CachedDrawLists;
    FStateBucketMap& CachedMeshDrawCommandStateBuckets; // Robin Hood hash table that automatically merges and counts FMeshDrawCommands with the same hash.
    const FScene& Scene;
};

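As shown above, in the stage from FMeshBatch to FMeshDrawCommand the renderer does a great deal of work to convert each FMeshBatch into FMeshDrawCommands and store them through the FMeshPassDrawListContext member of FMeshPassProcessor. Along the way it also collects or processes, from the various objects involved, all the data the mesh draw commands need before they enter the later rendering stages. The figure below shows these key steps: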

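A supplementary note on merging FMeshDrawCommands: in the dynamic draw path, FDynamicPassMeshDrawListContext keeps its FMeshDrawCommands in a TChunkedArray and appends the corresponding FVisibleMeshDrawCommands directly to a TArray. It neither merges FMeshDrawCommands nor dynamically instances meshes, but it can improve the robustness of state-based sorting.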
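
The storage layout of the dynamic path can be sketched as follows. This is a minimal illustration that assumes `std::deque` as a stand-in for TChunkedArray (both keep existing elements at stable addresses when new ones are appended) and `std::vector` as a stand-in for TArray. `FToyDrawCommand`, `FToyVisibleCommand` and `FToyDynamicContext` are hypothetical types.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-ins for FMeshDrawCommand and FVisibleMeshDrawCommand.
struct FToyDrawCommand
{
    int PipelineId = 0;
    std::uint32_t NumPrimitives = 0;
};

struct FToyVisibleCommand
{
    const FToyDrawCommand* Command = nullptr;
    std::uint64_t SortKey = 0;
    int StateBucketId = -1; // -1: not part of any instancing bucket
};

class FToyDynamicContext
{
public:
    // Copies the initializer into stable storage and returns the stored copy.
    FToyDrawCommand& AddCommand(const FToyDrawCommand& Initializer)
    {
        Storage.push_back(Initializer);
        return Storage.back();
    }

    // Appends one visible entry per command: no merging, no instancing.
    void FinalizeCommand(const FToyDrawCommand& Command, std::uint64_t SortKey)
    {
        assert(!Storage.empty());
        FToyVisibleCommand Visible;
        Visible.Command = &Command;
        Visible.SortKey = SortKey;
        DrawList.push_back(Visible);
    }

    const std::vector<FToyVisibleCommand>& GetDrawList() const { return DrawList; }

private:
    // std::deque never relocates existing elements on push_back, so pointers
    // stored in DrawList stay valid as more commands are added.
    std::deque<FToyDrawCommand> Storage;
    std::vector<FToyVisibleCommand> DrawList;
};
```

Because the visible list only holds pointers and sort keys, it can be sorted cheaply without moving the larger command structs, which is one reason to split storage from the visible list.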

The cached (static) draw path, FCachedPassMeshDrawListContext, relies on FStateBucketMap to merge and count commands so that they can be drawn with instancing at draw submission time.
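
The merge-and-count idea can be sketched with a standard hash map. This is an illustration under stated assumptions: `std::unordered_map` stands in for the engine's Robin Hood hash table, and `FToyDrawState`, `FToyDrawStateHash` and `FToyStateBucketMap` are hypothetical types, not the UE API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical subset of the state that must match for two draws to be merged.
struct FToyDrawState
{
    int PipelineId;
    int IndexBufferId;
    std::uint32_t FirstIndex;
    std::uint32_t NumPrimitives;

    bool operator==(const FToyDrawState& Other) const
    {
        return PipelineId == Other.PipelineId
            && IndexBufferId == Other.IndexBufferId
            && FirstIndex == Other.FirstIndex
            && NumPrimitives == Other.NumPrimitives;
    }
};

struct FToyDrawStateHash
{
    std::size_t operator()(const FToyDrawState& State) const
    {
        std::size_t Hash = std::hash<int>()(State.PipelineId);
        auto Combine = [&Hash](std::size_t Value)
        {
            Hash ^= Value + 0x9e3779b9u + (Hash << 6) + (Hash >> 2);
        };
        Combine(std::hash<int>()(State.IndexBufferId));
        Combine(std::hash<std::uint32_t>()(State.FirstIndex));
        Combine(std::hash<std::uint32_t>()(State.NumPrimitives));
        return Hash;
    }
};

class FToyStateBucketMap
{
public:
    // Returns the bucket id for State, creating the bucket on first use,
    // and increments that bucket's instance count.
    int FindOrAdd(const FToyDrawState& State)
    {
        auto It = IndexByState.find(State);
        if (It == IndexByState.end())
        {
            It = IndexByState.emplace(State, static_cast<int>(Counts.size())).first;
            Counts.push_back(0);
        }
        ++Counts[static_cast<std::size_t>(It->second)];
        return It->second;
    }

    int GetCount(int BucketId) const
    {
        assert(BucketId >= 0 && static_cast<std::size_t>(BucketId) < Counts.size());
        return Counts[static_cast<std::size_t>(BucketId)];
    }

    std::size_t NumBuckets() const { return Counts.size(); }

private:
    std::unordered_map<FToyDrawState, int, FToyDrawStateHash> IndexByState;
    std::vector<int> Counts;
};
```

At submission time, a bucket with count N can be issued as a single draw with N instances instead of N separate draws, which is the payoff of bucketing identical state at cache time.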

As an aside, UE has no dynamic batching feature like Unity's; meshes can only be merged manually at editor time (see the figure below).

How to open the Actor merge tool built into the UE editor, and a preview of its interface.

3.2.4 From FMeshDrawCommand to RHICommandList

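The previous section explained in detail how an FMeshBatch is converted into FMeshDrawCommands. This section covers the subsequent step: how FMeshDrawCommands are turned into RHICommandList commands, and what processing and optimization happens along the way.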

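Once FMeshBatches have been converted into FMeshDrawCommands, each pass has a corresponding FMeshPassProcessor, and each FMeshPassProcessor holds all the FMeshDrawCommands that the pass needs to draw, so that the renderer can trigger and render them at the appropriate time. Take the simplest pass, the PrePass (depth pass), as an example: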

void FDeferredShadingSceneRenderer::Render(FRHICommandListImmediate& RHICmdList)
{
    (......)
    
    // The conversion from FMeshBatch to FMeshDrawCommand is done in InitViews.
    InitViews(RHICmdList, BasePassDepthStencilAccess, ILCTaskData, UpdateViewCustomDataEvents);
    
    (......)
    
    // Render the PrePass (depth pass). The body of RenderPrePass is expanded inline below for illustration.
    RenderPrePass(FRHICommandListImmediate& RHICmdList, TFunctionRef<void()> AfterTasksAreStarted)
    {
        bool bParallel = GRHICommandList.UseParallelAlgorithms() && CVarParallelPrePass.GetValueOnRenderThread();
        
        (......)
        
        if(EarlyZPassMode != DDM_None)
        {
            const bool bWaitForTasks = bParallel && (CVarRHICmdFlushRenderThreadTasksPrePass.GetValueOnRenderThread() > 0 || CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() > 0);

            // Iterate over all views; the depth pass is rendered once per view.
            for(int32 ViewIndex = 0;ViewIndex < Views.Num();ViewIndex++)
            {
                const FViewInfo& View = Views[ViewIndex];

                // Set up the render resources and state of the depth pass.
                TUniformBufferRef<FSceneTexturesUniformParameters> PassUniformBuffer;
                CreateDepthPassUniformBuffer(RHICmdList, View, PassUniformBuffer);

                FMeshPassProcessorRenderState DrawRenderState(View, PassUniformBuffer);

                SetupDepthPassState(DrawRenderState);

                if (View.ShouldRenderView())
                {
                    Scene->UniformBuffers.UpdateViewUniformBuffer(View);

                    if (bParallel)
                    {
                        // Render the depth pass in parallel.
                        bDepthWasCleared = RenderPrePassViewParallel(View, RHICmdList, DrawRenderState, AfterTasksAreStarted, !bDidPrePre) || bDepthWasCleared;
                        bDidPrePre = true;
                    }
                    (......)
                }

                (......)
            }
        }
        
        (......)
    }
}

// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

// Entry point for rendering the depth pass in parallel.
bool FDeferredShadingSceneRenderer::RenderPrePassViewParallel(const FViewInfo& View, FRHICommandListImmediate& ParentCmdList, const FMeshPassProcessorRenderState& DrawRenderState, TFunctionRef<void()> AfterTasksAreStarted, bool bDoPrePre)
{
    bool bDepthWasCleared = false;

    {
        // Construct the container that stores the draw commands.
        FPrePassParallelCommandListSet ParallelCommandListSet(View, this, ParentCmdList,
            CVarRHICmdPrePassDeferredContexts.GetValueOnRenderThread() > 0, 
            CVarRHICmdFlushRenderThreadTasksPrePass.GetValueOnRenderThread() == 0 && CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() == 0,
            DrawRenderState);

        // Kick off the parallel draw.
        View.ParallelMeshDrawCommandPasses[EMeshPass::DepthPass].DispatchDraw(&ParallelCommandListSet, ParentCmdList);

        (......)
    }

    (......)

    return bDepthWasCleared;
}


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FParallelMeshDrawCommandPass::DispatchDraw(FParallelCommandListSet* ParallelCommandListSet, FRHICommandList& RHICmdList) const
{
    (......)
    
    FRHIVertexBuffer* PrimitiveIdsBuffer = PrimitiveIdVertexBufferPoolEntry.BufferRHI;
    const int32 BasePrimitiveIdsOffset = 0;

    if (ParallelCommandListSet)
    {
        (......)
        
        const ENamedThreads::Type RenderThread = ENamedThreads::GetRenderThread();

        // Gather the prerequisite tasks.
        FGraphEventArray Prereqs;
        if (ParallelCommandListSet->GetPrereqs())
        {
            Prereqs.Append(*ParallelCommandListSet->GetPrereqs());
        }
        if (TaskEventRef.IsValid())
        {
            Prereqs.Add(TaskEventRef);
        }

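        // Create at most as many parallel draw tasks as there are worker threads, further limited by Width and MinDrawsPerCommandList.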
        const int32 NumThreads = FMath::Min<int32>(FTaskGraphInterface::Get().GetNumWorkerThreads(), ParallelCommandListSet->Width);
        const int32 NumTasks = FMath::Min<int32>(NumThreads, FMath::DivideAndRoundUp(MaxNumDraws, ParallelCommandListSet->MinDrawsPerCommandList));
        const int32 NumDrawsPerTask = FMath::DivideAndRoundUp(MaxNumDraws, NumTasks);

        // Loop NumTasks times, constructing NumTasks draw task (FDrawVisibleMeshCommandsAnyThreadTask) instances.
        for (int32 TaskIndex = 0; TaskIndex < NumTasks; TaskIndex++)
        {
            const int32 StartIndex = TaskIndex * NumDrawsPerTask;
            const int32 NumDraws = FMath::Min(NumDrawsPerTask, MaxNumDraws - StartIndex);
            checkSlow(NumDraws > 0);

            FRHICommandList* CmdList = ParallelCommandListSet->NewParallelCommandList();

            // Construct an FDrawVisibleMeshCommandsAnyThreadTask and add it to the TaskGraph. TaskContext.MeshDrawCommands holds the commands generated by FMeshPassProcessor, as described in the previous section.
            FGraphEventRef AnyThreadCompletionEvent = TGraphTask<FDrawVisibleMeshCommandsAnyThreadTask>::CreateTask(&Prereqs, RenderThread).ConstructAndDispatchWhenReady(*CmdList, TaskContext.MeshDrawCommands, TaskContext.MinimalPipelineStatePassSet, PrimitiveIdsBuffer, BasePrimitiveIdsOffset, TaskContext.bDynamicInstancing, TaskContext.InstanceFactor, TaskIndex, NumTasks);
            // Add the event to ParallelCommandListSet so that completion of the depth pass's parallel draws can be tracked.
            ParallelCommandListSet->AddParallelCommandList(CmdList, AnyThreadCompletionEvent, NumDraws);
        }
    }
    
    (......)
}


// Engine\Source\Runtime\Renderer\Private\MeshDrawCommands.cpp

void FDrawVisibleMeshCommandsAnyThreadTask::DoTask(ENamedThreads::Type CurrentThread, const FGraphEventRef& MyCompletionGraphEvent)
{
    // Compute the range of draws handled by this task.
    const int32 DrawNum = VisibleMeshDrawCommands.Num();
    const int32 NumDrawsPerTask = TaskIndex < DrawNum ? FMath::DivideAndRoundUp(DrawNum, TaskNum) : 0;
    const int32 StartIndex = TaskIndex * NumDrawsPerTask;
    const int32 NumDraws = FMath::Min(NumDrawsPerTask, DrawNum - StartIndex);

    // Pass the data needed for drawing to the submission function.
    SubmitMeshDrawCommandsRange(VisibleMeshDrawCommands, GraphicsMinimalPipelineStateSet, PrimitiveIdsBuffer, BasePrimitiveIdsOffset, bDynamicInstancing, StartIndex, NumDraws, InstanceFactor, RHICmdList);

    RHICmdList.EndRenderPass();
    RHICmdList.HandleRTThreadTaskCompletion(MyCompletionGraphEvent);
}

// Submits the mesh draw commands in the given range.
void SubmitMeshDrawCommandsRange(
    const FMeshCommandOneFrameArray& VisibleMeshDrawCommands,
    const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
    FRHIVertexBuffer* PrimitiveIdsBuffer,
    int32 BasePrimitiveIdsOffset,
    bool bDynamicInstancing,
    int32 StartIndex,
    int32 NumMeshDrawCommands,
    uint32 InstanceFactor,
    FRHICommandList& RHICmdList)
{
    FMeshDrawCommandStateCache StateCache;

    // Iterate over the draw commands in the given range and submit them one by one.
    for (int32 DrawCommandIndex = StartIndex; DrawCommandIndex < StartIndex + NumMeshDrawCommands; DrawCommandIndex++)
    {
        const FVisibleMeshDrawCommand& VisibleMeshDrawCommand = VisibleMeshDrawCommands[DrawCommandIndex];
        const int32 PrimitiveIdBufferOffset = BasePrimitiveIdsOffset + (bDynamicInstancing ? VisibleMeshDrawCommand.PrimitiveIdBufferOffset : DrawCommandIndex) * sizeof(int32);
        // Submit a single MeshDrawCommand.
        FMeshDrawCommand::SubmitDraw(*VisibleMeshDrawCommand.MeshDrawCommand, GraphicsMinimalPipelineStateSet, PrimitiveIdsBuffer, PrimitiveIdBufferOffset, InstanceFactor, RHICmdList, StateCache);
    }
}

// Submits a single MeshDrawCommand to the RHICommandList.
void FMeshDrawCommand::SubmitDraw(
    const FMeshDrawCommand& RESTRICT MeshDrawCommand, 
    const FGraphicsMinimalPipelineStateSet& GraphicsMinimalPipelineStateSet,
    FRHIVertexBuffer* ScenePrimitiveIdsBuffer,
    int32 PrimitiveIdOffset,
    uint32 InstanceFactor,
    FRHICommandList& RHICmdList,
    FMeshDrawCommandStateCache& RESTRICT StateCache)
{
    (......)
    
    const FGraphicsMinimalPipelineStateInitializer& MeshPipelineState = MeshDrawCommand.CachedPipelineId.GetPipelineState(GraphicsMinimalPipelineStateSet);

    // Set the PSO and cache its id.
    if (MeshDrawCommand.CachedPipelineId.GetId() != StateCache.PipelineId)
    {
        FGraphicsPipelineStateInitializer GraphicsPSOInit = MeshPipelineState.AsGraphicsPipelineStateInitializer();
        RHICmdList.ApplyCachedRenderTargets(GraphicsPSOInit);
        SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
        StateCache.SetPipelineState(MeshDrawCommand.CachedPipelineId.GetId());
    }

    // Set the stencil reference and cache it.
    if (MeshDrawCommand.StencilRef != StateCache.StencilRef)
    {
        RHICmdList.SetStencilRef(MeshDrawCommand.StencilRef);
        StateCache.StencilRef = MeshDrawCommand.StencilRef;
    }

    // Set the vertex streams.
    for (int32 VertexBindingIndex = 0; VertexBindingIndex < MeshDrawCommand.VertexStreams.Num(); VertexBindingIndex++)
    {
        const FVertexInputStream& Stream = MeshDrawCommand.VertexStreams[VertexBindingIndex];

        if (MeshDrawCommand.PrimitiveIdStreamIndex != -1 && Stream.StreamIndex == MeshDrawCommand.PrimitiveIdStreamIndex)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, ScenePrimitiveIdsBuffer, PrimitiveIdOffset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
        else if (StateCache.VertexStreams[Stream.StreamIndex] != Stream)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, Stream.VertexBuffer, Stream.Offset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
    }

    // Set the resources bound to the shaders.
    MeshDrawCommand.ShaderBindings.SetOnCommandList(RHICmdList, MeshPipelineState.BoundShaderState.AsBoundShaderState(), StateCache.ShaderBindings);

    // Issue the draw call variant that matches the command's data to the RHICommandList.
    if (MeshDrawCommand.IndexBuffer)
    {
        if (MeshDrawCommand.NumPrimitives > 0)
        {
            RHICmdList.DrawIndexedPrimitive(
                MeshDrawCommand.IndexBuffer,
                MeshDrawCommand.VertexParams.BaseVertexIndex,
                0,
                MeshDrawCommand.VertexParams.NumVertices,
                MeshDrawCommand.FirstIndex,
                MeshDrawCommand.NumPrimitives,
                MeshDrawCommand.NumInstances * InstanceFactor
            );
        }
        else
        {
            RHICmdList.DrawIndexedPrimitiveIndirect(
                MeshDrawCommand.IndexBuffer, 
                MeshDrawCommand.IndirectArgs.Buffer, 
                MeshDrawCommand.IndirectArgs.Offset
                );
        }
    }
    else
    {
        if (MeshDrawCommand.NumPrimitives > 0)
        {
            RHICmdList.DrawPrimitive(
                MeshDrawCommand.VertexParams.BaseVertexIndex + MeshDrawCommand.FirstIndex,
                MeshDrawCommand.NumPrimitives,
                MeshDrawCommand.NumInstances * InstanceFactor);
        }
        else
        {
            RHICmdList.DrawPrimitiveIndirect(
                MeshDrawCommand.IndirectArgs.Buffer,
                MeshDrawCommand.IndirectArgs.Offset);
        }
    }
}
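
The redundant-state filtering in SubmitDraw can be isolated in a few lines. The following is only a minimal sketch under stated assumptions: DrawCommand, StateCache, RecordingCommandList, and SubmitAll are hypothetical stand-ins invented for illustration, not engine types, and only the PSO id and stencil reference are modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the engine's definitions.
struct DrawCommand {
    int32_t  PipelineId;  // plays the role of CachedPipelineId.GetId()
    uint32_t StencilRef;
};

// Remembers the last state that was actually sent to the command list.
struct StateCache {
    int32_t  PipelineId = -1;           // -1: nothing bound yet
    uint32_t StencilRef = 0xFFFFFFFFu;  // sentinel assumed unused by real commands
};

// Counts the calls that would reach the command list.
struct RecordingCommandList {
    int PipelineSets = 0;
    int StencilSets = 0;
    int Draws = 0;
};

void SubmitDraw(const DrawCommand& Cmd, RecordingCommandList& CmdList, StateCache& Cache)
{
    assert(Cmd.PipelineId >= 0);  // negative ids are reserved for "empty cache"

    // Only touch the pipeline state when it differs from the cached one.
    if (Cmd.PipelineId != Cache.PipelineId) {
        ++CmdList.PipelineSets;
        Cache.PipelineId = Cmd.PipelineId;
    }
    // Same idea for the stencil reference.
    if (Cmd.StencilRef != Cache.StencilRef) {
        ++CmdList.StencilSets;
        Cache.StencilRef = Cmd.StencilRef;
    }
    ++CmdList.Draws;
}

// Submits a whole range with one shared cache, as a per-pass loop would.
RecordingCommandList SubmitAll(const std::vector<DrawCommand>& Cmds)
{
    RecordingCommandList CmdList;
    StateCache Cache;
    for (const DrawCommand& Cmd : Cmds) {
        SubmitDraw(Cmd, CmdList, Cache);
    }
    return CmdList;
}
```

With four commands whose pipeline ids are 1, 1, 2, 2, this sketch issues four draws but only two pipeline changes, which is why sorting commands so that equal states sit next to each other pays off.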

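The engine code above, ending with FMeshDrawCommand::SubmitDraw, has laid out in detail how the PrePass (depth pass) is drawn. A few additional notes on the path from FMeshDrawCommand to RHICommandList: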

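  • Every pass goes through a process similar to the one above, so it runs several times per frame. Not every pass is enabled, however; passes can be switched on and off dynamically through the view's PassMask.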

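  • DispatchDraw and SubmitMeshDrawCommandsRange deliberately work on flattened arrays, with the following considerations in mind:

    • The FVisibleMeshDrawCommand array can be partitioned quickly from the visibility set alone, so FMeshDrawCommand submissions can be handed to the multithreaded TaskGraph system as flat ranges.
    • Sorting the FMeshDrawCommand list and adding the StateCache reduces the number of commands submitted to the RHICommandList, which lowers the cost of translating and executing it. With this step in place, Fortnite reduced its RHI execution time by 20%.
    • Cache-coherent traversal. FMeshDrawCommand is tightly packed, and the data SubmitDraw needs is stored in a lightweight, flat, and contiguous form in memory, which improves cache and prefetch hit rates. Relevant declarations:
      • TChunkedArray<FMeshDrawCommand> MeshDrawCommands;
      • typedef TArray<FVisibleMeshDrawCommand, SceneRenderingAllocator> FMeshCommandOneFrameArray;
      • TArray<FMeshDrawShaderBindingsLayout, TInlineAllocator<2>> ShaderLayouts;
      • typedef TArray<FVertexInputStream, TInlineAllocator<4>> FVertexInputStreamArray;
      • const int32 NumInlineShaderBindings = 10;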
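  • Converting MeshDrawCommandPasses into RHICommandList commands supports a parallel mode. The distribution strategy simply splits the array evenly into as many ranges as there are worker threads, and each worker executes the draw commands in its range. The benefits are that it is simple, fast, easy to understand, and friendly to the CPU cache. The drawback is that execution time can differ widely between groups, so the total time is determined by the slowest group, which lengthens the pass and lowers parallel efficiency. The author has come up with some strategies to address this:

    • Heuristic strategy. Record the execution time of every MeshDrawCommand in the previous frame; in the next frame, accumulate the times of adjacent MeshDrawCommands and close a group once its sum approaches the per-group average.
    • Balance on one or more properties of the MeshDrawCommand. For example, use triangle count or material count as the grouping criterion, so that the sum of that property is roughly equal across groups.

    These strategies add logic complexity and may lower the CPU cache hit rate, so the actual benefit has to be measured in the target environment.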

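  • FMeshDrawCommand::SubmitDraw caches the PSO and the stencil reference (and, as the code shows, also compares vertex streams and shader bindings against the StateCache), so duplicate data and commands are not submitted to the RHICommandList, which reduces CPU-GPU traffic.

    IO between the CPU and GPU and render state switches have long been a problem in real-time rendering, especially in heterogeneous CPU/GPU architectures. Reducing CPU-GPU data exchange is therefore a major lever for rendering performance. With the PSO and other state cached, extreme cases can see a several-fold speedup.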

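  • FMeshDrawCommand::SubmitDraw supports four draw modes along two axes: indexed or non-indexed, and direct or indirect.

    A brief introduction to Indirect Draw

    Before Indirect Draw, an application that wanted to draw multiple objects in one Draw Call had to use GPU instancing, which comes with many restrictions: the vertices, indices, render state, and material data must be identical, and only the transform may differ. Even if textures are packed into an atlas and material properties and mesh data are packed into a StructuredBuffer, one fatal restriction remains: every draw must use the same vertex count. Implementing a GPU Driven Rendering Pipeline therefore required breaking meshes into clusters with equal vertex counts.

    With Indirect Draw, a GPU-driven rendering pipeline becomes simpler and more efficient. The core idea is to place the resource references a mesh needs into an Argument Buffer.

    The Argument Buffers of different meshes can in turn be concatenated into a longer buffer.

    Because each mesh's data can be handled by a different GPU thread, draws of multiple meshes can be executed in parallel, which is markedly more efficient than traditional serial drawing.

    However, Indirect Draw is only supported by modern graphics APIs such as DirectX 11, DirectX 12, Vulkan, and Metal.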
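
To make the Argument Buffer idea concrete, here is a minimal CPU-side sketch. The record layout follows the field order of D3D12_DRAW_INDEXED_ARGUMENTS and VkDrawIndexedIndirectCommand; MeshRange and BuildArgumentBuffer are hypothetical names introduced for illustration, and the sketch assumes all meshes share one large vertex buffer and one large index buffer, stored back to back.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One indexed indirect draw record (five 32-bit fields, 20 bytes).
struct DrawIndexedIndirectArgs {
    uint32_t IndexCountPerInstance;
    uint32_t InstanceCount;
    uint32_t StartIndexLocation;
    int32_t  BaseVertexLocation;
    uint32_t StartInstanceLocation;
};
static_assert(sizeof(DrawIndexedIndirectArgs) == 20, "argument record must be tightly packed");

// Hypothetical description of one mesh inside the shared buffers.
struct MeshRange {
    uint32_t IndexCount;
    uint32_t VertexCount;
    uint32_t InstanceCount;
};

// Concatenates per-mesh argument records into one longer buffer.
// Unlike plain instancing, every record may use a different index and vertex count.
std::vector<DrawIndexedIndirectArgs> BuildArgumentBuffer(const std::vector<MeshRange>& Meshes)
{
    std::vector<DrawIndexedIndirectArgs> Args;
    Args.reserve(Meshes.size());

    uint32_t FirstIndex = 0;
    int32_t  BaseVertex = 0;
    uint32_t FirstInstance = 0;
    for (const MeshRange& Mesh : Meshes) {
        assert(Mesh.InstanceCount > 0);
        Args.push_back({Mesh.IndexCount, Mesh.InstanceCount, FirstIndex, BaseVertex, FirstInstance});
        FirstIndex    += Mesh.IndexCount;
        BaseVertex    += static_cast<int32_t>(Mesh.VertexCount);
        FirstInstance += Mesh.InstanceCount;
    }
    return Args;
}
```

In a real renderer this array would be uploaded to (or written directly by) the GPU, and record i would sit at byte offset i * sizeof(DrawIndexedIndirectArgs), which is presumably the kind of value that IndirectArgs.Offset carries in SubmitDraw.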

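3.2.5 From RHICommandList to the GPU

RHI stands for Rendering Hardware Interface. It is the abstraction layer over the different graphics APIs, and RHICommandList is responsible for recording API-agnostic intermediate draw commands and data.

After RHICommandList has recorded a series of intermediate draw commands, the RHI thread translates them one by one into calls to the target graphics API. Take FRHICommandList::DrawIndexedPrimitive as an example: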

// Engine\Source\Runtime\RHI\Public\RHICommandList.h

void FRHICommandList::DrawIndexedPrimitive(FRHIIndexBuffer* IndexBuffer, int32 BaseVertexIndex, uint32 FirstInstance, uint32 NumVertices, uint32 StartIndex, uint32 NumPrimitives, uint32 NumInstances)
{
    if (!IndexBuffer)
    {
        UE_LOG(LogRHI, Fatal, TEXT("Tried to call DrawIndexedPrimitive with null IndexBuffer!"));
    }

    // Bypass the RHI thread and execute immediately.
    if (Bypass())
    {
        GetContext().RHIDrawIndexedPrimitive(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
        return;
    }
    
    // Allocate the draw command and append it to the command list.
    ALLOC_COMMAND(FRHICommandDrawIndexedPrimitive)(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
}

// Declaration of FRHICommandDrawIndexedPrimitive.
FRHICOMMAND_MACRO(FRHICommandDrawIndexedPrimitive)
{
    // Data the command needs.
    FRHIIndexBuffer* IndexBuffer;
    int32 BaseVertexIndex;
    uint32 FirstInstance;
    uint32 NumVertices;
    uint32 StartIndex;
    uint32 NumPrimitives;
    uint32 NumInstances;
    
    FRHICommandDrawIndexedPrimitive(FRHIIndexBuffer* InIndexBuffer, int32 InBaseVertexIndex, uint32 InFirstInstance, uint32 InNumVertices, uint32 InStartIndex, uint32 InNumPrimitives, uint32 InNumInstances)
        : IndexBuffer(InIndexBuffer)
        , BaseVertexIndex(InBaseVertexIndex)
        , FirstInstance(InFirstInstance)
        , NumVertices(InNumVertices)
        , StartIndex(InStartIndex)
        , NumPrimitives(InNumPrimitives)
        , NumInstances(InNumInstances)
    {
    }
    
    // Entry point that executes this command.
    RHI_API void Execute(FRHICommandListBase& CmdList);
};


// Engine\Source\Runtime\RHI\Public\RHICommandListCommandExecutes.inl

// Implementation of FRHICommandDrawIndexedPrimitive::Execute.
void FRHICommandDrawIndexedPrimitive::Execute(FRHICommandListBase& CmdList)
{
    RHISTAT(DrawIndexedPrimitive);
    INTERNAL_DECORATOR(RHIDrawIndexedPrimitive)(IndexBuffer, BaseVertexIndex, FirstInstance, NumVertices, StartIndex, NumPrimitives, NumInstances);
}

// The INTERNAL_DECORATOR macro simply calls the matching method on the IRHICommandContext held by the RHICommandList.
#if !defined(INTERNAL_DECORATOR)
    #define INTERNAL_DECORATOR(Method) CmdList.GetContext().Method
#endif


// Engine\Source\Runtime\RHI\Public\RHICommandList.h

// Macro that allocates an RHI command.
#define ALLOC_COMMAND(...) new ( AllocCommand(sizeof(__VA_ARGS__), alignof(__VA_ARGS__)) ) __VA_ARGS__

// Function that allocates the memory for an RHI command.
void* FRHICommandListBase::AllocCommand(int32 AllocSize, int32 Alignment)
{
    checkSlow(!IsExecuting());
    // Allocate memory from the command memory manager.
    FRHICommandBase* Result = (FRHICommandBase*) MemManager.Alloc(AllocSize, Alignment);
    ++NumCommands;
    // Append the newly allocated command to the tail of the linked list.
    *CommandLink = Result;
    CommandLink = &Result->Next;
    return Result;
}

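As shown above, the predefined macros FRHICOMMAND_MACRO, INTERNAL_DECORATOR, and ALLOC_COMMAND take the intermediate draw commands recorded in the RHICommandList and route them through IRHICommandContext to the target graphics API, so that the draw commands can subsequently be submitted to the GPU.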
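
The record-then-execute pattern behind these macros can be sketched as a standalone toy command list. Everything below is a hypothetical simplification written for illustration: Context stands in for IRHICommandContext, CommandList for FRHICommandListBase, and a virtual Execute replaces whatever dispatch mechanism the engine really uses. What it keeps from the engine code above is placement new into arena memory plus a tail pointer (CommandLink) that appends to an intrusive linked list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for the API context: it only records what was drawn.
struct Context {
    std::vector<uint32_t> DrawnPrimitives;
    void DrawIndexedPrimitive(uint32_t NumPrimitives) { DrawnPrimitives.push_back(NumPrimitives); }
};

// Base of every recorded command; Next forms an intrusive singly linked list.
struct CommandBase {
    CommandBase* Next = nullptr;
    virtual ~CommandBase() = default;
    virtual void Execute(Context& Ctx) = 0;
};

struct DrawIndexedCommand final : CommandBase {
    uint32_t NumPrimitives;
    explicit DrawIndexedCommand(uint32_t InNumPrimitives) : NumPrimitives(InNumPrimitives) {}
    void Execute(Context& Ctx) override { Ctx.DrawIndexedPrimitive(NumPrimitives); }
};

class CommandList {
public:
    explicit CommandList(std::size_t CapacityBytes)
        : Storage(new unsigned char[CapacityBytes]), Capacity(CapacityBytes) {}
    CommandList(const CommandList&) = delete;
    CommandList& operator=(const CommandList&) = delete;
    ~CommandList() {
        // Commands live in the arena, so only their destructors are run here.
        for (CommandBase* Cmd = Root; Cmd != nullptr;) {
            CommandBase* Following = Cmd->Next;
            Cmd->~CommandBase();
            Cmd = Following;
        }
    }

    // Counterpart of ALLOC_COMMAND: construct in arena memory, then link at the tail.
    template <typename TCmd, typename... TArgs>
    TCmd* AllocCommand(TArgs&&... Args) {
        void* Memory = Allocate(sizeof(TCmd), alignof(TCmd));
        assert(Memory != nullptr && "arena exhausted");
        TCmd* Cmd = new (Memory) TCmd(std::forward<TArgs>(Args)...);
        *CommandLink = Cmd;
        CommandLink = &Cmd->Next;
        ++NumCommands;
        return Cmd;
    }

    // Replays the commands in recording order against a context.
    void Execute(Context& Ctx) {
        for (CommandBase* Cmd = Root; Cmd != nullptr; Cmd = Cmd->Next) {
            Cmd->Execute(Ctx);
        }
    }
    int Num() const { return NumCommands; }

private:
    // Bump allocation with alignment handled by std::align.
    void* Allocate(std::size_t Size, std::size_t Alignment) {
        void* Ptr = Storage.get() + Offset;
        std::size_t Space = Capacity - Offset;
        if (std::align(Alignment, Size, Ptr, Space) == nullptr) return nullptr;
        Offset = static_cast<std::size_t>(static_cast<unsigned char*>(Ptr) - Storage.get()) + Size;
        return Ptr;
    }

    std::unique_ptr<unsigned char[]> Storage;
    std::size_t Capacity;
    std::size_t Offset = 0;
    CommandBase* Root = nullptr;
    CommandBase** CommandLink = &Root;
    int NumCommands = 0;
};
```

Recording two DrawIndexedCommand instances and then calling Execute replays them in order; nothing reaches the context until Execute runs, which is what lets recording and translation happen on different threads.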

 

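3.3 Static and Dynamic Draw Paths

3.3.1 Draw Path Overview

Section 3.2 already hinted at both the static and the dynamic paths, but it mostly described the dynamic one. In fact, to optimize the drawing of static meshes, UE splits off a static draw path so that it can apply tailored performance optimizations. The static path comes in two flavors: one needs view information, and the other does not and can therefore cache more.

UE has three mesh draw paths (in the accompanying figure, orange marks data generated dynamically every frame and blue marks data generated once and then cached). The first is the dynamic draw path: everything from FPrimitiveSceneProxy to RHICommandList is created anew every frame, which is the least efficient but the most controllable. The second is the static path that needs a view: it can cache FMeshBatch data, with medium efficiency and medium controllability. The third is the static path that does not need a view: it can cache both FMeshBatch and FMeshDrawCommand, which is the most efficient but the least controllable, and it has the most conditions to satisfy.

The cached data of the static draw paths only has to be generated once, which reduces render thread time and improves runtime efficiency. Static meshes, for example, inject FStaticMeshBatch by implementing the DrawStaticElements interface, and DrawStaticElements is usually called when the SceneProxy is added to the scene.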

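3.3.2 Dynamic Draw Path

The dynamic draw path rebuilds FMeshBatch data every frame without caching it, so it is the most extensible but the least efficient. It is commonly used for particle effects, skeletal animation, procedural meshes, and any mesh whose data must be updated every frame. FMeshBatch instances are collected through the GetDynamicMeshElements interface; see section 3.2, Model Drawing Pipeline, for details.

FParallelMeshDrawCommandPass is a general-purpose mesh pass. It is recommended only for performance-critical mesh passes, because it supports only parallel and cached rendering. Using the parallel or cached path requires careful design, because no mesh draw command or shader binding data may be modified after InitViews. Section 3.2 already showed FParallelMeshDrawCommandPass code; to illustrate its usage further, here is a relatively concise example from shadow rendering: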

// Engine\Source\Runtime\Renderer\Private\ShadowRendering.h

class FProjectedShadowInfo : public FRefCountedObject
{
    (......)
    
    // Declare the FParallelMeshDrawCommandPass instance.
    FParallelMeshDrawCommandPass ShadowDepthPass;
    
    (......)
};


// Engine\Source\Runtime\Renderer\Private\ShadowDepthRendering.cpp

void FProjectedShadowInfo::RenderDepthInner(FRHICommandListImmediate& RHICmdList, FSceneRenderer* SceneRenderer, FBeginShadowRenderPassFunction BeginShadowRenderPass, bool bDoParallelDispatch)
{
    (......)

    // Parallel mode.
    if (bDoParallelDispatch)
    {
        bool bFlush = CVarRHICmdFlushRenderThreadTasksShadowPass.GetValueOnRenderThread() > 0
            || CVarRHICmdFlushRenderThreadTasks.GetValueOnRenderThread() > 0;
        FScopedCommandListWaitForTasks Flusher(bFlush);

        {
            // Build the parallel command list set that holds the generated RHICommandLists.
            FShadowParallelCommandListSet ParallelCommandListSet(*ShadowDepthView, SceneRenderer, RHICmdList, CVarRHICmdShadowDeferredContexts.GetValueOnRenderThread() > 0, !bFlush, DrawRenderState, *this, BeginShadowRenderPass);

            // Dispatch the draw commands.
            ShadowDepthPass.DispatchDraw(&ParallelCommandListSet, RHICmdList);
        }
    }
    // Non-parallel mode.
    else
    {
        ShadowDepthPass.DispatchDraw(nullptr, RHICmdList);
    }
}

Simple and convenient to use, isn't it? That is because UE does a great deal of encapsulation and detail handling behind the scenes.
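
Section 3.2.4 noted that the parallel path splits the draw commands evenly by count, and proposed balancing by last-frame cost instead. The following is a minimal sketch of both ideas, not engine code: SplitEvenly, SplitByCost, and the per-command cost array are hypothetical, and the cost-aware variant uses a simple greedy rule that keeps ranges contiguous so traversal stays cache friendly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Half-open index range [first, second) of draw commands given to one task.
using Range = std::pair<std::size_t, std::size_t>;

// Even split by command count.
std::vector<Range> SplitEvenly(std::size_t NumCommands, std::size_t NumTasks)
{
    assert(NumTasks > 0);
    std::vector<Range> Ranges;
    const std::size_t PerTask = (NumCommands + NumTasks - 1) / NumTasks;  // round up
    for (std::size_t Start = 0; Start < NumCommands; Start += PerTask) {
        Ranges.emplace_back(Start, std::min(Start + PerTask, NumCommands));
    }
    return Ranges;
}

// Cost-aware split: close a range once its accumulated cost reaches the
// per-task average; the last task takes whatever remains.
std::vector<Range> SplitByCost(const std::vector<uint64_t>& Costs, std::size_t NumTasks)
{
    assert(NumTasks > 0);
    std::vector<Range> Ranges;
    if (Costs.empty()) return Ranges;

    const uint64_t Total = std::accumulate(Costs.begin(), Costs.end(), uint64_t{0});
    const uint64_t Target = (Total + NumTasks - 1) / NumTasks;

    std::size_t Start = 0;
    uint64_t Sum = 0;
    for (std::size_t Index = 0; Index < Costs.size(); ++Index) {
        Sum += Costs[Index];
        const bool bLastGroup = (Ranges.size() + 1 == NumTasks);
        if (!bLastGroup && Sum >= Target) {
            Ranges.emplace_back(Start, Index + 1);
            Start = Index + 1;
            Sum = 0;
        }
    }
    if (Start < Costs.size()) {
        Ranges.emplace_back(Start, Costs.size());
    }
    return Ranges;
}
```

For the costs {8, 1, 1, 1, 1, 1, 1, 2} and two tasks, the even split yields ranges with total costs 11 and 5, while the cost-aware split yields 8 and 8. Whether the extra bookkeeping pays off in practice depends on how stable the per-command costs are from frame to frame.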

Besides FParallelMeshDrawCommandPass, there is an even simpler way to issue draw commands: DrawDynamicMeshPass. It only needs a view, an RHICommandList, and a lambda. Its declaration and a usage example follow:

// Engine\Source\Runtime\Renderer\Public\MeshPassProcessor.inl

// Declaration of DrawDynamicMeshPass.
template<typename LambdaType>
void DrawDynamicMeshPass(const FSceneView& View, FRHICommandList& RHICmdList, const LambdaType& BuildPassProcessorLambda, bool bForceStereoInstancingOff = false);


// Engine\Source\Runtime\Renderer\Private\DepthRendering.cpp

void FDeferredShadingSceneRenderer::RenderPrePassEditorPrimitives(FRHICommandList& RHICmdList, const FViewInfo& View, const FMeshPassProcessorRenderState& DrawRenderState, EDepthDrawingMode DepthDrawingMode, bool bRespectUseAsOccluderFlag) 
{
    (......)

    bool bDirty = false;
    if (!View.Family->EngineShowFlags.CompositeEditorPrimitives)
    {
        const bool bNeedToSwitchVerticalAxis = RHINeedsToSwitchVerticalAxis(ShaderPlatform);
        const FScene* LocalScene = Scene;

        // Call DrawDynamicMeshPass to process the depth pass.
        DrawDynamicMeshPass(View, RHICmdList,
            [&View, &DrawRenderState, LocalScene, DepthDrawingMode, bRespectUseAsOccluderFlag](FDynamicPassMeshDrawListContext* DynamicMeshPassContext)
            {
                FDepthPassMeshProcessor PassMeshProcessor(
                    LocalScene,
                    &View,
                    DrawRenderState,
                    bRespectUseAsOccluderFlag,
                    DepthDrawingMode,
                    false,
                    DynamicMeshPassContext);

                const uint64 DefaultBatchElementMask = ~0ull;
                    
                for (int32 MeshIndex = 0; MeshIndex < View.ViewMeshElements.Num(); MeshIndex++)
                {
                    const FMeshBatch& MeshBatch = View.ViewMeshElements[MeshIndex];
                    PassMeshProcessor.AddMeshBatch(MeshBatch, DefaultBatchElementMask, nullptr);
                }
            });

        (......)
    }
}
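
The calling convention of DrawDynamicMeshPass, a function template that owns a temporary draw list context and lets a caller-supplied lambda fill it, can be imitated in isolation. This is a hedged sketch only: DynamicDrawListContext, CommandList, and DrawDynamicPass are hypothetical stand-ins, and the real function's internals are not shown in this article, so nothing here should be read as its implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct DrawCommand {
    uint32_t NumPrimitives;
};

// Temporary container that the lambda fills with draw commands.
struct DynamicDrawListContext {
    std::vector<DrawCommand> Commands;
    void AddCommand(const DrawCommand& Command) { Commands.push_back(Command); }
};

// Records what was submitted.
struct CommandList {
    int NumDraws = 0;
    uint32_t SubmittedPrimitives = 0;
};

// The helper owns the context for the duration of the call: the lambda builds
// the commands, then the helper submits them and discards the context.
template <typename LambdaType>
void DrawDynamicPass(CommandList& CmdList, const LambdaType& BuildPassLambda)
{
    DynamicDrawListContext Context;  // lives only for this call
    BuildPassLambda(&Context);

    for (const DrawCommand& Command : Context.Commands) {
        ++CmdList.NumDraws;
        CmdList.SubmittedPrimitives += Command.NumPrimitives;
    }
}

// Usage in the same shape as the engine call above: the lambda receives the context pointer.
inline CommandList DrawTwoMeshes()
{
    CommandList CmdList;
    DrawDynamicPass(CmdList, [](DynamicDrawListContext* Context) {
        assert(Context != nullptr);
        Context->AddCommand({12u});
        Context->AddCommand({2u});
    });
    return CmdList;
}
```

The design choice worth noting is that the caller never manages the lifetime of the temporary command storage; it only describes which meshes go into the pass.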

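3.3.3 Static Draw Path

The static draw path can usually be cached, so it is also called the cached draw path. It applies to objects such as static meshes (which can be specified in the mesh properties panel of the UE editor; see the figure below).

A static mesh is cached when AddToScene is called on its FPrimitiveSceneInfo. The code and commentary follow: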

// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp

void FPrimitiveSceneInfo::AddToScene(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bUpdateStaticDrawLists, bool bAddToStaticDrawLists, bool bAsyncCreateLPIs)
{
    (......)

    {
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddToScene_AddStaticMeshes, FColor::Magenta);
        // Process static meshes.
        if (bUpdateStaticDrawLists)
        {
            AddStaticMeshes(RHICmdList, Scene, SceneInfos, bAddToStaticDrawLists);
        }
    }

    (......)
}

void FPrimitiveSceneInfo::AddStaticMeshes(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos, bool bAddToStaticDrawLists)
{
    LLM_SCOPE(ELLMTag::StaticMesh);

    {
        // Process static primitives in parallel.
        ParallelForTemplate(SceneInfos.Num(), [Scene, &SceneInfos](int32 Index)
        {
            SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddStaticMeshes_DrawStaticElements, FColor::Magenta);
            FPrimitiveSceneInfo* SceneInfo = SceneInfos[Index];
            // Cache the primitive's static elements.
            FBatchingSPDI BatchingSPDI(SceneInfo);
            BatchingSPDI.SetHitProxy(SceneInfo->DefaultDynamicHitProxy);
            // Call the proxy's DrawStaticElements; the collected FStaticMeshBatch instances are added to SceneInfo->StaticMeshes.
            SceneInfo->Proxy->DrawStaticElements(&BatchingSPDI);
            SceneInfo->StaticMeshes.Shrink();
            SceneInfo->StaticMeshRelevances.Shrink();

            check(SceneInfo->StaticMeshRelevances.Num() == SceneInfo->StaticMeshes.Num());
        });
    }

    {
        // Add every PrimitiveSceneInfo's FStaticMeshBatch to the scene's StaticMeshes list.
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_AddStaticMeshes_UpdateSceneArrays, FColor::Blue);
        for (FPrimitiveSceneInfo* SceneInfo : SceneInfos)
        {
            for (int32 MeshIndex = 0; MeshIndex < SceneInfo->StaticMeshes.Num(); MeshIndex++)
            {
                FStaticMeshBatchRelevance& MeshRelevance = SceneInfo->StaticMeshRelevances[MeshIndex];
                FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshIndex];

                // Add the static mesh to the scene's static mesh list.
                FSparseArrayAllocationInfo SceneArrayAllocation = Scene->StaticMeshes.AddUninitialized();
                Scene->StaticMeshes[SceneArrayAllocation.Index] = &Mesh;
                Mesh.Id = SceneArrayAllocation.Index;
                MeshRelevance.Id = SceneArrayAllocation.Index;

                // Handle per-element visibility, if required.
                if (Mesh.bRequiresPerElementVisibility)
                {
                    // Use a separate index into StaticMeshBatchVisibility, since most meshes don't use it
                    Mesh.BatchVisibilityId = Scene->StaticMeshBatchVisibility.AddUninitialized().Index;
                    Scene->StaticMeshBatchVisibility[Mesh.BatchVisibilityId] = true;
                }
            }
        }
    }

    // Cache the static MeshDrawCommands.
    if (bAddToStaticDrawLists)
    {
        CacheMeshDrawCommands(RHICmdList, Scene, SceneInfos);
    }
}

void FPrimitiveSceneInfo::CacheMeshDrawCommands(FRHICommandListImmediate& RHICmdList, FScene* Scene, const TArrayView<FPrimitiveSceneInfo*>& SceneInfos)
{
    //@todo - only need material uniform buffers to be created since we are going to cache pointers to them
    // Any updates (after initial creation) don't need to be forced here
    FMaterialRenderProxy::UpdateDeferredCachedUniformExpressions();

    SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_CacheMeshDrawCommands, FColor::Emerald);

    QUICK_SCOPE_CYCLE_COUNTER(STAT_CacheMeshDrawCommands);
    FMemMark Mark(FMemStack::Get());

    // Compute the number of batches to process in parallel (BATCH_SIZE primitives per batch).
    static constexpr int BATCH_SIZE = 64;
    const int NumBatches = (SceneInfos.Num() + BATCH_SIZE - 1) / BATCH_SIZE;

    // Per-batch worker callback.
    auto DoWorkLambda = [Scene, SceneInfos](int32 Index)
    {
        SCOPED_NAMED_EVENT(FPrimitiveSceneInfo_CacheMeshDrawCommand, FColor::Green);

        struct FMeshInfoAndIndex
        {
            int32 InfoIndex;
            int32 MeshIndex;
        };

        TArray<FMeshInfoAndIndex, TMemStackAllocator<>> MeshBatches;
        MeshBatches.Reserve(3 * BATCH_SIZE);

        // Walk this batch's range and process each PrimitiveSceneInfo.
        int LocalNum = FMath::Min((Index * BATCH_SIZE) + BATCH_SIZE, SceneInfos.Num());
        for (int LocalIndex = (Index * BATCH_SIZE); LocalIndex < LocalNum; LocalIndex++)
        {
            FPrimitiveSceneInfo* SceneInfo = SceneInfos[LocalIndex];
            check(SceneInfo->StaticMeshCommandInfos.Num() == 0);
            SceneInfo->StaticMeshCommandInfos.AddDefaulted(EMeshPass::Num * SceneInfo->StaticMeshes.Num());
            FPrimitiveSceneProxy* SceneProxy = SceneInfo->Proxy;

            // Volumetric translucent shadows must be updated every frame and cannot be cached.
            if (!SceneProxy->CastsVolumetricTranslucentShadow())
            {
                // Add all static meshes of the PrimitiveSceneInfo to the MeshBatches list.
                for (int32 MeshIndex = 0; MeshIndex < SceneInfo->StaticMeshes.Num(); MeshIndex++)
                {
                    FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshIndex];
                    // Check whether caching the MeshDrawCommand is supported.
                    if (SupportsCachingMeshDrawCommands(Mesh))
                    {
                        MeshBatches.Add(FMeshInfoAndIndex{ LocalIndex, MeshIndex });
                    }
                }
            }
        }

        // For every predefined pass, add the MeshDrawCommands generated from each static element to that pass's cached list.
        for (int32 PassIndex = 0; PassIndex < EMeshPass::Num; PassIndex++)
        {
            const EShadingPath ShadingPath = Scene->GetShadingPath();
            EMeshPass::Type PassType = (EMeshPass::Type)PassIndex;

            if ((FPassProcessorManager::GetPassFlags(ShadingPath, PassType) & EMeshPassFlags::CachedMeshCommands) != EMeshPassFlags::None)
            {
                // Declare the cached draw command info.
                FCachedMeshDrawCommandInfo CommandInfo(PassType);

                // Fetch this pass's containers from the scene to build the FCachedPassMeshDrawListContext.
                FCriticalSection& CachedMeshDrawCommandLock = Scene->CachedMeshDrawCommandLock[PassType];
                FCachedPassMeshDrawList& SceneDrawList = Scene->CachedDrawLists[PassType];
                FStateBucketMap& CachedMeshDrawCommandStateBuckets = Scene->CachedMeshDrawCommandStateBuckets[PassType];
                FCachedPassMeshDrawListContext CachedPassMeshDrawListContext(CommandInfo, CachedMeshDrawCommandLock, SceneDrawList, CachedMeshDrawCommandStateBuckets, *Scene);

                // Create the pass's FMeshPassProcessor.
                PassProcessorCreateFunction CreateFunction = FPassProcessorManager::GetCreateFunction(ShadingPath, PassType);
                FMeshPassProcessor* PassMeshProcessor = CreateFunction(Scene, nullptr, &CachedPassMeshDrawListContext);

                if (PassMeshProcessor != nullptr)
                {
                    for (const FMeshInfoAndIndex& MeshAndInfo : MeshBatches)
                    {
                        FPrimitiveSceneInfo* SceneInfo = SceneInfos[MeshAndInfo.InfoIndex];
                        FStaticMeshBatch& Mesh = SceneInfo->StaticMeshes[MeshAndInfo.MeshIndex];
                        
                        CommandInfo = FCachedMeshDrawCommandInfo(PassType);
                        FStaticMeshBatchRelevance& MeshRelevance = SceneInfo->StaticMeshRelevances[MeshAndInfo.MeshIndex];

                        check(!MeshRelevance.CommandInfosMask.Get(PassType));

                        check(!Mesh.bRequiresPerElementVisibility);
                        uint64 BatchElementMask = ~0ull;
                        // Add the MeshBatch to the PassMeshProcessor, which converts the FMeshBatch into FMeshDrawCommands internally.
                        PassMeshProcessor->AddMeshBatch(Mesh, BatchElementMask, SceneInfo->Proxy);

                        if (CommandInfo.CommandIndex != -1 || CommandInfo.StateBucketId != -1)
                        {
                            static_assert(sizeof(MeshRelevance.CommandInfosMask) * 8 >= EMeshPass::Num, "CommandInfosMask is too small to contain all mesh passes.");
                            MeshRelevance.CommandInfosMask.Set(PassType);
                            MeshRelevance.CommandInfosBase++;

                            int CommandInfoIndex = MeshAndInfo.MeshIndex * EMeshPass::Num + PassType;
                            check(SceneInfo->StaticMeshCommandInfos[CommandInfoIndex].MeshPass == EMeshPass::Num);
                            // Cache the CommandInfo in the PrimitiveSceneInfo.
                            SceneInfo->StaticMeshCommandInfos[CommandInfoIndex] = CommandInfo;
                            
                            (......)
                        }
                    }
                    // Destroy the FMeshPassProcessor.
                    PassMeshProcessor->~FMeshPassProcessor();
                }
            }
        }

        (......)
    };

    // Parallel mode.
    if (FApp::ShouldUseThreadingForPerformance())
    {
        ParallelForTemplate(NumBatches, DoWorkLambda, EParallelForFlags::PumpRenderingThread);
    }
    // Single-threaded mode.
    else
    {
        for (int Idx = 0; Idx < NumBatches; Idx++)
        {
            DoWorkLambda(Idx);
        }
    }

    FGraphicsMinimalPipelineStateId::InitializePersistentIds();
                    
    (.....)
}

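As the code above shows, a static mesh caches its FMeshBatch as soon as it is added to the scene, and may also cache the corresponding FMeshDrawCommand. The key function that decides whether FMeshDrawCommand caching is supported is SupportsCachingMeshDrawCommands, implemented as follows: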

// Engine\Source\Runtime\Engine\Private\PrimitiveSceneProxy.cpp

bool SupportsCachingMeshDrawCommands(const FMeshBatch& MeshBatch)
{
    return
        // The FMeshBatch has exactly one element.
        (MeshBatch.Elements.Num() == 1) &&

        // The vertex factory supports caching FMeshDrawCommands.
        MeshBatch.VertexFactory->GetType()->SupportsCachingMeshDrawCommands();
}


// Engine\Source\Runtime\RenderCore\Public\VertexFactory.h

bool FVertexFactoryType::SupportsCachingMeshDrawCommands() const 
{ 
    return bSupportsCachingMeshDrawCommands; 
}

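So an FMeshDrawCommand can be cached only when the FMeshBatch has exactly one element and the vertex factory it uses supports caching.

Currently only FLocalVertexFactory (UStaticMeshComponent) supports this; the other vertex factories depend on the view to set up their shader bindings.

If either condition fails, the FMeshDrawCommand cannot be cached. In more detail, the following must also hold:

  • The pass is a value of the EMeshPass::Type enum.
  • The EMeshPassFlags::CachedMeshCommands flag is passed correctly when the custom mesh pass processor is registered.
  • The mesh pass processor can set up all shader binding data without relying on FSceneView, because FSceneView is null while caching.

Note that if any data referenced by a cached draw command changes, the command must be invalidated and regenerated.

Calling FPrimitiveSceneInfo::BeginDeferredUpdateStaticMeshes invalidates the specified draw commands.

Setting Scene->bScenesPrimitivesNeedStaticMeshElementUpdate to true invalidates all cached commands. This seriously hurts performance, so avoid it or use it sparingly.

Because invalidating the cache costs rendering performance, an alternative is to put the mutable data into the pass's UniformBuffer and let the shader select different logic based on it, which removes the dependency on view-based shader bindings.

Unlike the dynamic draw path, static mesh elements are collected through the FPrimitiveSceneProxy::DrawStaticElements interface, which is implemented by concrete subclasses. Here is how the subclass FStaticMeshSceneProxy implements it: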

// Engine\Source\Runtime\Engine\Private\StaticMeshRender.cpp

void FStaticMeshSceneProxy::DrawStaticElements(FStaticPrimitiveDrawInterface* PDI)
{
    checkSlow(IsInParallelRenderingThread());
    
    // Check whether bUseViewOwnerDepthPriorityGroup is enabled (view-dependent DPG).
    if (!HasViewDependentDPG())
    {
        // Determine the DPG the primitive should be drawn in.
        uint8 PrimitiveDPG = GetStaticDepthPriorityGroup();
        int32 NumLODs = RenderData->LODResources.Num();
        //Never use the dynamic path in this path, because only unselected elements will use DrawStaticElements
        bool bIsMeshElementSelected = false;
        const auto FeatureLevel = GetScene().GetFeatureLevel();
        const bool IsMobile = IsMobilePlatform(GetScene().GetShaderPlatform());
        const int32 NumRuntimeVirtualTextureTypes = RuntimeVirtualTextureMaterialTypes.Num();

        //check if a LOD is being forced
        if (ForcedLodModel > 0) 
        {
            // Get the LOD level (index).
            int32 LODIndex = FMath::Clamp(ForcedLodModel, ClampedMinLOD + 1, NumLODs) - 1;
            const FStaticMeshLODResources& LODModel = RenderData->LODResources[LODIndex];

            // Draw all sections.
            for(int32 SectionIndex = 0; SectionIndex < LODModel.Sections.Num(); SectionIndex++)
            {
                const int32 NumBatches = GetNumMeshBatches();
                PDI->ReserveMemoryForMeshes(NumBatches * (1 + NumRuntimeVirtualTextureTypes));

                // Add the elements of every batch to the PDI.
                for (int32 BatchIndex = 0; BatchIndex < NumBatches; BatchIndex++)
                {
                    FMeshBatch BaseMeshBatch;

                    if (GetMeshElement(LODIndex, BatchIndex, SectionIndex, PrimitiveDPG, bIsMeshElementSelected, true, BaseMeshBatch))
                    {
                        (......)
                        {
                            // Submit to the PDI for drawing.
                            PDI->DrawMesh(BaseMeshBatch, FLT_MAX);
                        }
                    }
                }
            }
        } 
        
        (......)
    }
}

As shown, DrawStaticElements receives an FStaticPrimitiveDrawInterface instance that collects all static elements of the PrimitiveSceneProxy. The declarations and implementation of FStaticPrimitiveDrawInterface and its subclass FBatchingSPDI follow:

// Engine\Source\Runtime\Engine\Public\SceneManagement.h

class FStaticPrimitiveDrawInterface
{
public:
    virtual void SetHitProxy(HHitProxy* HitProxy) = 0;
    virtual void ReserveMemoryForMeshes(int32 MeshNum) = 0;

    // The PDI draw interface.
    virtual void DrawMesh(const FMeshBatch& Mesh, float ScreenSize) = 0;
};


// Engine\Source\Runtime\Renderer\Private\PrimitiveSceneInfo.cpp

class FBatchingSPDI : public FStaticPrimitiveDrawInterface
{
public:
    (......)

    // Implementation of the PDI draw interface.
    virtual void DrawMesh(const FMeshBatch& Mesh, float ScreenSize) final override
    {
        if (Mesh.HasAnyDrawCalls())
        {
            FPrimitiveSceneProxy* PrimitiveSceneProxy = PrimitiveSceneInfo->Proxy;
            PrimitiveSceneProxy->VerifyUsedMaterial(Mesh.MaterialRenderProxy);

            // Create a new FStaticMeshBatch and add it to the PrimitiveSceneInfo's StaticMeshes list.
            FStaticMeshBatch* StaticMesh = new(PrimitiveSceneInfo->StaticMeshes) FStaticMeshBatch(
                PrimitiveSceneInfo,
                Mesh,
                CurrentHitProxy ? CurrentHitProxy->Id : FHitProxyId()
                );

            const ERHIFeatureLevel::Type FeatureLevel = PrimitiveSceneInfo->Scene->GetFeatureLevel();
            StaticMesh->PreparePrimitiveUniformBuffer(PrimitiveSceneProxy, FeatureLevel);

            // Volumetric self shadow mesh commands need to be generated every frame, as they depend on single frame uniform buffers with self shadow data.
            const bool bSupportsCachingMeshDrawCommands = SupportsCachingMeshDrawCommands(*StaticMesh, FeatureLevel) && !PrimitiveSceneProxy->CastsVolumetricTranslucentShadow();

            // Fill in the relevance data.
            bool bUseSkyMaterial = Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel)->IsSky();
            bool bUseSingleLayerWaterMaterial = Mesh.MaterialRenderProxy->GetMaterial(FeatureLevel)->GetShadingModels().HasShadingModel(MSM_SingleLayerWater);
            FStaticMeshBatchRelevance* StaticMeshRelevance = new(PrimitiveSceneInfo->StaticMeshRelevances) FStaticMeshBatchRelevance(
                *StaticMesh, 
                ScreenSize, 
                bSupportsCachingMeshDrawCommands,
                bUseSkyMaterial,
                bUseSingleLayerWaterMaterial,
                FeatureLevel
            );
        }
    }

private:
    FPrimitiveSceneInfo* PrimitiveSceneInfo;
    TRefCountPtr<HHitProxy> CurrentHitProxy;
};

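The main job of FBatchingSPDI::DrawMesh is to turn each FMeshBatch submitted by the PrimitiveSceneProxy into an FStaticMeshBatch stored on the PrimitiveSceneInfo, and then to fill in the mesh's relevance data.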

 

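3.4 Summary of the Rendering Mechanism

3.4.1 Draw Pipeline Optimization Techniques

The preceding sections described in detail how UE turns primitives, step by step, from components into final draw commands. The main purpose is to improve rendering performance. In summary, the optimization techniques involved are: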

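  • Draw call merging

Because all FMeshDrawCommands are captured up front rather than submitted to the GPU immediately, there is a solid basis for merging Draw Calls. In the current version, merging is based on D3D11 features and decides from the shader bindings whether commands can be merged into one instanced call. Aggregate merging based on D3D12 is not yet implemented.

Besides merging, sorting makes similar commands execute next to each other in time, which improves CPU and GPU cache hits and reduces the number of commands issued.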

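  • Dynamic instancing

To merge two Draw Calls, they must have identical shader bindings (FMeshDrawCommand::MatchesForDynamicInstancing returns true).

Currently only cached mesh draw commands are dynamically instanced, and this is further limited by whether FLocalVertexFactory supports caching. A few special cases also prevent merging:

  • Lightmaps that produce very small textures (adjustable through the MaxLightmapRadius parameter in DefaultEngine.ini).
  • Per-component vertex colors.
  • SpeedTree wind nodes.

Use the console command r.MeshDrawCommands.LogDynamicInstancingStats 1 to inspect the benefit of dynamic instancing.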

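  • Parallel drawing

Most mesh drawing work does not run on the render thread; it is kicked off in parallel by the TaskGraph system. The parallel parts include pass context setup and dynamic command generation, sorting, and merging.

The degree of parallelism is determined by the number of CPU cores on the device. Once parallelism is enabled, there is a join phase that waits for all parallel threads to finish (FSceneRenderer::WaitForTasksClearSnapshotsAndDeleteSceneRenderer starts the wait for parallel drawing).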

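  • Cached draw commands

To raise the proportion and efficiency of caching, UE separates the drawing of dynamic and static objects into the dynamic and static draw paths. The static draw path can cache FMeshBatch and FMeshDrawCommand as soon as the primitive is added to the scene, gaining the benefit of generating once and drawing many times.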

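  • Higher cache hit rate

Both CPU and GPU caches follow the principles of temporal and spatial locality. Temporal locality means that recently accessed data is likely to hit the cache when accessed again; spatial locality means that data adjacent to what is currently being processed is likely to hit the cache, which also covers the prefetch hit rate.

UE improves the cache hit rate in the following ways: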

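  • Data-driven design rather than object-oriented design.
    • For example, the layout of FMeshDrawCommand.
  • Contiguous data storage.
    • FMeshDrawCommand is stored in a TChunkedArray.
  • Memory alignment.
    • Custom memory aligners and memory allocators are used.
  • Lightweight data structures.
  • Contiguous data access.
    • Draw commands are traversed sequentially.
  • Sorted draw commands.
    • Similar commands are placed next to each other to make full use of the cache's temporal locality.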
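
Sorting and dynamic instancing work together: once commands with matching state are adjacent, neighbors can be collapsed into a single instanced draw. The following is a minimal sketch under loud assumptions. VisibleDraw, InstancedDraw, and SortAndMerge are hypothetical names, and "same StateBucketId" is used here as a stand-in for the MatchesForDynamicInstancing test, with -1 meaning the command is not eligible for merging.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified view of one visible draw command.
struct VisibleDraw {
    uint64_t SortKey;        // assumed to group commands with similar state
    int32_t  StateBucketId;  // equal ids are assumed instancing-compatible; -1 = not mergeable
    uint32_t PrimitiveId;    // per-instance data fetched by the shader
};

// One draw after merging; NumInstances plays the role of InstanceFactor.
struct InstancedDraw {
    int32_t  StateBucketId;
    uint32_t PrimitiveIdOffset;  // first slot in the primitive id buffer
    uint32_t NumInstances;
};

std::vector<InstancedDraw> SortAndMerge(std::vector<VisibleDraw>& Draws,
                                        std::vector<uint32_t>& OutPrimitiveIds)
{
    assert(Draws.size() <= std::numeric_limits<uint32_t>::max());

    // 1. Sort so that similar commands become neighbors (stable keeps ties in order).
    std::stable_sort(Draws.begin(), Draws.end(),
        [](const VisibleDraw& A, const VisibleDraw& B) { return A.SortKey < B.SortKey; });

    // 2. Walk the sorted list once; extend the previous draw when states match.
    OutPrimitiveIds.clear();
    std::vector<InstancedDraw> Merged;
    for (const VisibleDraw& Draw : Draws) {
        const uint32_t Offset = static_cast<uint32_t>(OutPrimitiveIds.size());
        OutPrimitiveIds.push_back(Draw.PrimitiveId);

        const bool bCanMerge = !Merged.empty()
            && Draw.StateBucketId != -1
            && Merged.back().StateBucketId == Draw.StateBucketId;
        if (bCanMerge) {
            ++Merged.back().NumInstances;
        } else {
            Merged.push_back({Draw.StateBucketId, Offset, 1u});
        }
    }
    return Merged;
}
```

Each merged draw reads its instances' primitive ids from a contiguous slice of OutPrimitiveIds, which is the same role the PrimitiveIdBufferOffset argument plays in the submit loop shown in section 3.2.4.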

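3.4.2 Debugging Console Variables

The console variables related to parallel drawing are listed below, so that their performance and behavior can be tuned or debugged at runtime:

Console variable  Description
r.MeshDrawCommands.ParallelPassSetup  Toggles parallel pass setup for mesh draw commands.
r.MeshDrawCommands.UseCachedCommands  Toggles draw command caching.
r.MeshDrawCommands.DynamicInstancing  Toggles dynamic instancing.
r.MeshDrawCommands.LogDynamicInstancingStats  Logs dynamic instancing statistics; commonly used to inspect the benefit of dynamic instancing.
r.RHICmdBasePassDeferredContexts  Toggles parallel drawing of the base pass.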

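3.4.3 Limitations

UE's current model draw paths introduce many steps and concepts in order to push rendering efficiency as high as possible. But this approach is not all benefit and no cost; there is no free lunch. Overall, this rendering mechanism has the following drawbacks:

  • The system is large and complex, which raises the learning cost for beginners.
  • Refactoring and extending it is more expensive. For example, it is not quick to implement multi-pass drawing or to add a dedicated pass; doing so requires understanding and modifying the engine's low-level source code in depth.
  • UE's heavily encapsulated draw pipeline has a certain base overhead, so for simple scenes it may actually perform worse than rendering engines without such encapsulation.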

When weighing two benefits, take the greater one: this design is the result of UE's long-term trade-offs and refinement.

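Nevertheless, this draw pipeline is built for the future, catering to technologies such as virtualized textures and geometry, RDG, GPU Driven Rendering Pipeline, and real-time ray tracing.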

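3.4.4 Exercises

The first two articles had no exercises. Starting with this one, a few small exercises are assigned to help readers deepen their understanding of the UE rendering system. The exercises for this article are:

  • Concisely restate the process of the model drawing pipeline, its design concepts, and what each of them is for.

  • Explain which parts of the current model drawing pipeline could be optimized.

  • Add a Mesh Component that can draw an arbitrary number of materials.

  • Add a dedicated pass that draws the depth of translucent and Masked objects.

All of these are open-ended questions with no standard answer. Readers with ideas are welcome to reply in the comments, and the author will try to respond.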

 

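Special Notes

  • Thanks to the authors of all references. Some images come from the references and the internet; they will be removed on request if they infringe.
  • This series is the author's original work and is published only on cnblogs (部落格園). You are welcome to share a link to this article, but reposting without permission is not allowed!
  • This series is ongoing; for the complete table of contents, see the content outline.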

 

References