Dissecting the Unreal Rendering System (08): The Shader System

 

 

8.1 Overview

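A shader is a program that runs on the GPU. Depending on the execution stage, shaders are classified as vertex shaders, pixel shaders, and compute shaders, as well as geometry shaders, mesh shaders, and so on.

To work across platforms and graphics APIs, UE wraps its shaders in many layers of encapsulation and abstraction, which introduces a large number of types and concepts. On top of that, to optimize performance and improve code reuse, it adds further concepts and types such as permutations, PSO, and DDC.

Many earlier chapters touched on shader concepts, types, and code. This chapter examines the shader system in greater depth and breadth. It covers the following aspects of UE:

  • Basic shader concepts.
  • Basic shader types.
  • Shader implementation layers.
  • How shaders are used, with examples.
  • Shader implementation and principles.
  • The cross-platform mechanism for shaders.

Note that the shaders discussed in this chapter include concepts and types on both the C++ side and the GPU side.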

 

8.2 Shader Basics

This section analyzes the basic concepts and types involved in shaders and describes their relationships and how they are used.

8.2.1 FShader

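FShader represents a compiled piece of shader code together with its parameter bindings. It is the most basic, central, and frequently used type in rendering code. It is defined as follows: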

// Engine\Source\Runtime\RenderCore\Public\Shader.h

class RENDERCORE_API FShader
{
public:
    (......)

    // Modifies the compilation environment before compilation is triggered. Subclasses may provide their own version.
    static void ModifyCompilationEnvironment(const FShaderPermutationParameters&, FShaderCompilerEnvironment&) {}
    // Whether the given permutation should be compiled. Subclasses may provide their own version.
    static bool ShouldCompilePermutation(const FShaderPermutationParameters&) { return true; }
    // Checks whether the compiled result is valid. Subclasses may provide their own version.
    static bool ValidateCompiledResult(EShaderPlatform InPlatform, const FShaderParameterMap& InParameterMap, TArray<FString>& OutError) { return true; }

    // Accessors for the hashes of the various data.
    const FSHAHash& GetHash() const;
    const FSHAHash& GetVertexFactoryHash() const;
    const FSHAHash& GetOutputHash() const;

    // Saves and validates the compiled result of the shader code.
    void Finalize(const FShaderMapResourceCode* Code);

    // Data accessors.
    inline FShaderType* GetType(const FShaderMapPointerTable& InPointerTable) const { return Type.Get(InPointerTable.ShaderTypes); }
    inline FShaderType* GetType(const FPointerTableBase* InPointerTable) const { return Type.Get(InPointerTable); }
    inline FVertexFactoryType* GetVertexFactoryType(const FShaderMapPointerTable& InPointerTable) const { return VFType.Get(InPointerTable.VFTypes); }
    inline FVertexFactoryType* GetVertexFactoryType(const FPointerTableBase* InPointerTable) const { return VFType.Get(InPointerTable); }
    inline FShaderType* GetTypeUnfrozen() const { return Type.GetUnfrozen(); }
    inline int32 GetResourceIndex() const { checkSlow(ResourceIndex != INDEX_NONE); return ResourceIndex; }
    inline EShaderPlatform GetShaderPlatform() const { return Target.GetPlatform(); }
    inline EShaderFrequency GetFrequency() const { return Target.GetFrequency(); }
    inline const FShaderTarget GetTarget() const { return Target; }
    inline bool IsFrozen() const { return Type.IsFrozen(); }
    inline uint32 GetNumInstructions() const { return NumInstructions; }

#if WITH_EDITORONLY_DATA
    inline uint32 GetNumTextureSamplers() const { return NumTextureSamplers; }
    inline uint32 GetCodeSize() const { return CodeSize; }
    inline void SetNumInstructions(uint32 Value) { NumInstructions = Value; }
#else
    inline uint32 GetNumTextureSamplers() const { return 0u; }
    inline uint32 GetCodeSize() const { return 0u; }
#endif

    // Tries to return the automatically bound uniform buffer matching the given type; returns an unbound parameter if none exists.
    template<typename UniformBufferStructType>
    const TShaderUniformBufferParameter<UniformBufferStructType>& GetUniformBufferParameter() const;
    const FShaderUniformBufferParameter& GetUniformBufferParameter(const FShaderParametersMetadata* SearchStruct) const;
    const FShaderUniformBufferParameter& GetUniformBufferParameter(const FHashedName SearchName) const;
    const FShaderParametersMetadata* FindAutomaticallyBoundUniformBufferStruct(int32 BaseIndex) const;
    static inline const FShaderParametersMetadata* GetRootParametersMetadata();

    (......)

public:
    // Shader parameter bindings.
    LAYOUT_FIELD(FShaderParameterBindings, Bindings);
    // Mapping information for the shader parameter bindings.
    LAYOUT_FIELD(FShaderParameterMapInfo, ParameterMapInfo);

protected:
    LAYOUT_FIELD(TMemoryImageArray<FHashedName>, UniformBufferParameterStructs);
    LAYOUT_FIELD(TMemoryImageArray<FShaderUniformBufferParameter>, UniformBufferParameters);

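    // The following three fields are editor-only.
    // Hash of the compiled output and the resulting parameter map, used to look up matching resources.
    LAYOUT_FIELD_EDITORONLY(FSHAHash, OutputHash);
    // Hash of the vertex factory source.
    LAYOUT_FIELD_EDITORONLY(FSHAHash, VFSourceHash);
    // Hash of the shader source.
    LAYOUT_FIELD_EDITORONLY(FSHAHash, SourceHash);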

private:
    // Shader type.
    LAYOUT_FIELD(TIndexedPtr<FShaderType>, Type);
    // Vertex factory type.
    LAYOUT_FIELD(TIndexedPtr<FVertexFactoryType>, VFType);
    // Target platform and shader frequency.
    LAYOUT_FIELD(FShaderTarget, Target);
    
    // Index of this shader within FShaderMapResource.
    LAYOUT_FIELD(int32, ResourceIndex);
    // Number of shader instructions.
    LAYOUT_FIELD(uint32, NumInstructions);
    // Number of texture samplers.
    LAYOUT_FIELD_EDITORONLY(uint32, NumTextureSamplers);
    // Size of the shader code.
    LAYOUT_FIELD_EDITORONLY(uint32, CodeSize);
};

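As shown above, FShader stores the data associated with a shader, such as its parameter bindings, vertex factory type, and various compiled resources. It also provides hooks for modifying and validating compilation, along with a range of data accessors.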
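The compilation hooks in the listing (ShouldCompilePermutation, ModifyCompilationEnvironment) are static functions, so a subclass replaces them by shadowing rather than through virtual dispatch. The following is a minimal, self-contained sketch of that pattern and not engine code: FMockShader, FMockBlurCS, FMockPermutationParameters, FMockEnvironment, PrepareCompile, and the define names are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for FShaderPermutationParameters and FShaderCompilerEnvironment.
struct FMockPermutationParameters
{
    int PermutationId = 0;
    bool bIsMobile = false;
};

struct FMockEnvironment
{
    std::map<std::string, std::string> Defines;
};

// Base class: the default hooks accept every permutation and leave the environment untouched.
struct FMockShader
{
    static bool ShouldCompilePermutation(const FMockPermutationParameters&) { return true; }
    static void ModifyCompilationEnvironment(const FMockPermutationParameters&, FMockEnvironment&) {}
};

// A subclass shadows the static hooks with its own versions.
struct FMockBlurCS : FMockShader
{
    static bool ShouldCompilePermutation(const FMockPermutationParameters& Params)
    {
        return !Params.bIsMobile; // skip this shader on the (hypothetical) mobile target
    }

    static void ModifyCompilationEnvironment(const FMockPermutationParameters& Params, FMockEnvironment& Env)
    {
        Env.Defines["THREADGROUP_SIZE"] = "8";
        Env.Defines["PERMUTATION_ID"] = std::to_string(Params.PermutationId);
    }
};

// The hooks are resolved at compile time through the template argument.
// Returns false when the permutation is skipped.
template <typename TShader>
bool PrepareCompile(const FMockPermutationParameters& Params, FMockEnvironment& OutEnv)
{
    if (!TShader::ShouldCompilePermutation(Params))
    {
        return false;
    }
    TShader::ModifyCompilationEnvironment(Params, OutEnv);
    return true;
}
```

Calling PrepareCompile<FMockBlurCS> with a non-mobile permutation fills the environment with the two defines, while a mobile permutation is rejected before the environment is touched.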

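FShader is in fact a base class. Its subclasses include:

  • FGlobalShader: a global shader. Each of its subclasses has exactly one instance in memory. It is commonly used for full-screen quad drawing, post-processing, and similar work. It is defined as follows: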

    // Engine\Source\Runtime\RenderCore\Public\GlobalShader.h
    
    class FGlobalShader : public FShader
    {
    public:
        (......)
    
        FGlobalShader() : FShader() {}
        FGlobalShader(const ShaderMetaType::CompiledShaderInitializerType& Initializer);
        
        // Sets the view shader parameters.
        template<typename TViewUniformShaderParameters, typename ShaderRHIParamRef, typename TRHICmdList>
        inline void SetParameters(TRHICmdList& RHICmdList, ...);
    };
    

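    Compared with its parent class FShader, it adds the SetParameters interface for setting the view uniform buffer.

  • FMaterialShader: a material shader, that is, a shader referenced by a material and specified by FMaterialShaderType. It is a subset of the shaders produced when a material graph is instantiated. It is defined as follows: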

    // Engine\Source\Runtime\Renderer\Public\MaterialShader.h
    
    class RENDERER_API FMaterialShader : public FShader
    {
    public:
        (......)
    
        FMaterialShader() = default;
        FMaterialShader(const FMaterialShaderType::CompiledShaderInitializerType& Initializer);
    
        // Sets the view uniform buffer parameters.
        template<typename ShaderRHIParamRef>
        void SetViewParameters(FRHICommandList& RHICmdList, ...);
        // Sets pixel shader parameters that depend on the material but not on FMeshBatch.
        template< typename TRHIShader >
        void SetParameters(FRHICommandList& RHICmdList, ...);
        // Gets the shader parameter bindings.
        void GetShaderBindings(const FScene* Scene, ...) const;
    
    private:
        // Whether caching of uniform expressions is allowed.
        static int32 bAllowCachedUniformExpressions;
        // Console variable backing bAllowCachedUniformExpressions.
        static FAutoConsoleVariableRef CVarAllowCachedUniformExpressions;
    
    #if !(UE_BUILD_TEST || UE_BUILD_SHIPPING || !WITH_EDITOR)
        // Verifies that the expressions and the shader map are valid.
        void VerifyExpressionAndShaderMaps(const FMaterialRenderProxy* MaterialRenderProxy, const FMaterial& Material, const FUniformExpressionCache* UniformExpressionCache) const;
    #endif
        // Uniform buffers allocated for parameter collections.
        LAYOUT_FIELD(TMemoryImageArray<FShaderUniformBufferParameter>, ParameterCollectionUniformBuffers);
        // The material's shader uniform buffer.
        LAYOUT_FIELD(FShaderUniformBufferParameter, MaterialUniformBuffer);
    
        (......)
    };
    

Below are some of the subclasses in the FShader inheritance hierarchy:

FShader
    FGlobalShader
        TMeshPaintVertexShader
        TMeshPaintPixelShader
        FDistanceFieldDownsamplingCS
        FBaseGPUSkinCacheCS
            TGPUSkinCacheCS
        FBaseRecomputeTangentsPerTriangleShader
        FBaseRecomputeTangentsPerVertexShader
        FRadixSortUpsweepCS
        FRadixSortDownsweepCS
        FParticleTileVS
        FBuildMipTreeCS
        FScreenVS
        FScreenPS
        FScreenPSInvertAlpha
        FSimpleElementVS
        FSimpleElementPS
        FStereoLayerVS
        FStereoLayerPS_Base
            FStereoLayerPS
        FUpdateTexture2DSubresouceCS
        FUpdateTexture3DSubresouceCS
        FCopyTexture2DCS
        TCopyDataCS
        FLandscapeLayersVS
        FLandscapeLayersHeightmapPS
        FGenerateMipsCS
        FGenerateMipsVS
        FGenerateMipsPS
        FCopyTextureCS
        FMediaShadersVS
        FRGBConvertPS
        FYUVConvertPS
        FYUY2ConvertPS
        FRGB10toYUVv210ConvertPS
        FInvertAlphaPS
        FSetAlphaOnePS
        FReadTextureExternalPS
        FOculusVertexShader
        FRasterizeToRectsVS
        FResolveVS
        FResolveDepthPS
        FResolveDepth2XPS
        FAmbientOcclusionPS
        FGTAOSpatialFilterCS
        FGTAOTemporalFilterCS
        FDeferredDecalVS
        FDitheredTransitionStencilPS
        FObjectCullVS
        FObjectCullPS
        FDeferredLightPS
        TDeferredLightHairVS
        FFXAAVS
        FFXAAPS
        FMotionBlurShader
        FSubsurfaceShader
        FTonemapVS
        FTonemapPS
        FTonemapCS
        FUpscalePS
        FTAAStandaloneCS
        FSceneCapturePS
        FHZBTestPS
        FOcclusionQueryVS
        FOcclusionQueryPS
        FHZBBuildPS
        FHZBBuildCS
        FDownsampleDepthPS
        FTiledDeferredLightingCS
        FShader_VirtualTextureCompress
        FShader_VirtualTextureCopy
        FPageTableUpdateVS
        FPageTableUpdatePS
        FSlateElementVS
        FSlateElementPS
        (......)
    FMaterialShader
        FDeferredDecalPS
        FLightHeightfieldsPS
        FLightFunctionVS
        FLightFunctionPS
        FPostProcessMaterialShader
        TTranslucentLightingInjectPS
        FVolumetricFogLightFunctionPS
        FMeshMaterialShader
            FLightmapGBufferVS
            FLightmapGBufferPS
            FVLMVoxelizationVS
            FVLMVoxelizationGS
            FVLMVoxelizationPS
            FLandscapeGrassWeightVS
            FLandscapeGrassWeightPS
            FLandscapePhysicalMaterial
            FAnisotropyVS
            FAnisotropyPS
            TBasePassVertexShaderPolicyParamType
                TBasePassVertexShaderBaseType
                    TBasePassVS
            TBasePassPixelShaderPolicyParamType
                TBasePassPixelShaderBaseType
                    TBasePassPS
            FMeshDecalsVS
            FMeshDecalsPS
            TDepthOnlyVS
            TDepthOnlyPS
            FDistortionMeshVS
            FDistortionMeshPS
            FHairMaterialVS
            FHairMaterialPS
            FHairVisibilityVS
            FHairVisibilityPS
            TLightMapDensityVS
            TLightMapDensityPS
            FShadowDepthVS
            FShadowDepthBasePS
                TShadowDepthPS
            FTranslucencyShadowDepthVS
            FTranslucencyShadowDepthPS
            FVelocityVS
            FVelocityPS
            FRenderVolumetricCloudVS
            FVolumetricCloudShadowPS
            FVoxelizeVolumeVS
            FVoxelizeVolumePS
            FShader_VirtualTextureMaterialDraw
            (......)
        FSlateMaterialShaderVS
        FSlateMaterialShaderPS
        (......)

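The list above covers only part of the FShader inheritance hierarchy. It includes some shader types analyzed in earlier chapters, such as FDeferredLightPS, FFXAAPS, FTonemapPS, FUpscalePS, TBasePassPS, and TDepthOnlyPS.

FGlobalShader covers shader code for post-processing, lighting, utilities, visualization, landscape, virtual textures, and so on. Its subclasses can be VS, PS, or CS, and every CS is necessarily a subclass of FGlobalShader. FMaterialShader mainly covers shader code for meshes, dedicated passes, voxelization, and so on. Its subclasses can be VS, PS, GS, and so on, but never CS.

A newly defined FShader subclass must be declared and implemented with the help of the following macros (only the common ones are listed):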

// ------ Shader declaration and implementation macros ------

// Declares a shader of the given type (an FShader subclass); can be Global, Material, MeshMaterial, ...
#define DECLARE_SHADER_TYPE(ShaderClass,ShaderMetaTypeShortcut,...)
// Implements a shader of the given type; can be Global, Material, MeshMaterial, ...
#define IMPLEMENT_SHADER_TYPE(TemplatePrefix,ShaderClass,SourceFilename,FunctionName,Frequency)

// Declares FGlobalShader and its subclasses.
#define DECLARE_GLOBAL_SHADER(ShaderClass)
// Implements FGlobalShader and its subclasses.
#define IMPLEMENT_GLOBAL_SHADER(ShaderClass,SourceFilename,FunctionName,Frequency)

// Implements a material shader.
#define IMPLEMENT_MATERIAL_SHADER_TYPE(TemplatePrefix,ShaderClass,SourceFilename,FunctionName,Frequency)

// Other, less common macros
(......)

// ------ Example 1 ------

class FDeferredLightPS : public FGlobalShader
{
    // Declare the global shader inside the FDeferredLightPS class.
    DECLARE_SHADER_TYPE(FDeferredLightPS, Global)
    (......)
};
// Implement the FDeferredLightPS shader, associating it with its source file, entry point, and shader frequency.
IMPLEMENT_GLOBAL_SHADER(FDeferredLightPS, "/Engine/Private/DeferredLightPixelShaders.usf", "DeferredLightPixelMain", SF_Pixel);


// ------ Example 2 ------

class FDeferredDecalPS : public FMaterialShader
{
    // Declare the material shader inside the class.
    DECLARE_SHADER_TYPE(FDeferredDecalPS,Material);
    (......)
};
// Implement the FDeferredDecalPS class, associating it with its source file, entry point, and shader frequency.
IMPLEMENT_MATERIAL_SHADER_TYPE(,FDeferredDecalPS,TEXT("/Engine/Private/DeferredDecal.usf"),TEXT("MainPS"),SF_Pixel);
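One way to picture what a declare/implement macro pair does is static self-registration: the declare macro adds a static type object to the class, and the implement macro defines it so that its constructor records the source file, entry point, and frequency in a global list. The sketch below shows that idea under stated assumptions. FMockShaderType, MOCK_DECLARE_SHADER, MOCK_IMPLEMENT_SHADER, and EMockFrequency are hypothetical, and the engine's real macros expand to considerably more than this.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

enum EMockFrequency { MSF_Vertex, MSF_Pixel, MSF_Compute };

// Hypothetical stand-in for a shader type object: it ties a class name to
// its source file, entry point, and shader frequency.
struct FMockShaderType
{
    const char* Name;
    const char* SourceFile;
    const char* EntryPoint;
    EMockFrequency Frequency;

    FMockShaderType(const char* InName, const char* InFile, const char* InEntry, EMockFrequency InFrequency)
        : Name(InName), SourceFile(InFile), EntryPoint(InEntry), Frequency(InFrequency)
    {
        GetTypeList().push_back(this); // self-register during static initialization
    }

    // A function-local static avoids static-initialization-order problems.
    static std::vector<FMockShaderType*>& GetTypeList()
    {
        static std::vector<FMockShaderType*> List;
        return List;
    }

    static const FMockShaderType* FindByName(const char* InName)
    {
        for (const FMockShaderType* Type : GetTypeList())
        {
            if (std::strcmp(Type->Name, InName) == 0)
            {
                return Type;
            }
        }
        return nullptr;
    }
};

// Declares the static type object inside the class.
#define MOCK_DECLARE_SHADER(ShaderClass) \
    public: static FMockShaderType StaticType;

// Defines the static type object, which registers itself on construction.
#define MOCK_IMPLEMENT_SHADER(ShaderClass, SourceFilename, FunctionName, Frequency) \
    FMockShaderType ShaderClass::StaticType(#ShaderClass, SourceFilename, FunctionName, Frequency)

class FMockDeferredLightPS
{
    MOCK_DECLARE_SHADER(FMockDeferredLightPS)
};
MOCK_IMPLEMENT_SHADER(FMockDeferredLightPS, "/Engine/Private/DeferredLightPixelShaders.usf", "DeferredLightPixelMain", MSF_Pixel);
```

With this in place, FMockShaderType::FindByName("FMockDeferredLightPS") returns the registered entry once static initialization has run, which is why no explicit registration call is needed anywhere else.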

8.2.2 Shader Parameter

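Shader parameters are data passed from the C++ layer on the CPU into GPU shaders, where they are stored in GPU registers or video memory. The common shader parameter types are defined as follows: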

// Engine\Source\Runtime\RenderCore\Public\ShaderParameters.h

// A shader parameter bound to registers. Its type can be float1/2/3/4, an array, a UAV, and so on.
class FShaderParameter
{
    (......)
public:
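    // Binds the parameter with the given name.
    void Bind(const FShaderParameterMap& ParameterMap,const TCHAR* ParameterName, EShaderParameterFlags Flags = SPF_Optional);
    // Whether the parameter has been bound by the shader.
    bool IsBound() const;
    // Whether the parameter has been initialized.
    inline bool IsInitialized() const;

    // Data accessors.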
    uint32 GetBufferIndex() const;
    uint32 GetBaseIndex() const;
    uint32 GetNumBytes() const;

    (......)
};

// Shader resource binding (texture or sampler).
class FShaderResourceParameter
{
    (......)
public:
    // Binds the parameter with the given name.
    void Bind(const FShaderParameterMap& ParameterMap,const TCHAR* ParameterName,EShaderParameterFlags Flags = SPF_Optional);
    bool IsBound() const;
    inline bool IsInitialized() const;

    uint32 GetBaseIndex() const;
    uint32 GetNumResources() const;

    (......)
};

// A parameter type that binds a UAV or SRV resource.
class FRWShaderParameter
{
    (......)
public:
    // Binds the parameter with the given name.
    void Bind(const FShaderParameterMap& ParameterMap,const TCHAR* BaseName);

    bool IsBound() const;
    bool IsUAVBound() const;
    uint32 GetUAVIndex() const;

    // Sets buffer data on the RHI.
    template<typename TShaderRHIRef, typename TRHICmdList>
    inline void SetBuffer(TRHICmdList& RHICmdList, const TShaderRHIRef& Shader, const FRWBuffer& RWBuffer) const;
    template<typename TShaderRHIRef, typename TRHICmdList>
    inline void SetBuffer(TRHICmdList& RHICmdList, const TShaderRHIRef& Shader, const FRWBufferStructured& RWBuffer) const;

    // Sets texture data on the RHI.
    template<typename TShaderRHIRef, typename TRHICmdList>
    inline void SetTexture(TRHICmdList& RHICmdList, const TShaderRHIRef& Shader, FRHITexture* Texture, FRHIUnorderedAccessView* UAV) const;

    // Unsets the UAV on the RHI.
    template<typename TRHICmdList>
    inline void UnsetUAV(TRHICmdList& RHICmdList, FRHIComputeShader* ComputeShader) const;

    (......)
};

// Creates the shader code declaration of a uniform buffer struct for the given platform.
extern void CreateUniformBufferShaderDeclaration(const TCHAR* Name,const FShaderParametersMetadata& UniformBufferStruct, EShaderPlatform Platform, FString& OutDeclaration);

// Shader uniform buffer parameter.
class FShaderUniformBufferParameter
{
    (......)
public:
    // Modifies the compilation environment.
    static void ModifyCompilationEnvironment(const TCHAR* ParameterName,const FShaderParametersMetadata& Struct,EShaderPlatform Platform,FShaderCompilerEnvironment& OutEnvironment);

    // Binds the shader parameter.
    void Bind(const FShaderParameterMap& ParameterMap,const TCHAR* ParameterName,EShaderParameterFlags Flags = SPF_Optional);

    bool IsBound() const;
    inline bool IsInitialized() const;
    uint32 GetBaseIndex() const;

    (......)
};

// Shader uniform buffer parameter for a specific struct.
template<typename TBufferStruct>
class TShaderUniformBufferParameter : public FShaderUniformBufferParameter
{
public:
    static void ModifyCompilationEnvironment(const TCHAR* ParameterName,EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment);

    (......)
};

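As the code shows, shader parameters can bind any kind of GPU resource or data, but each class can bind only a specific kind of resource and the classes cannot be used interchangeably. For example, FRWShaderParameter can only bind a UAV or an SRV. With these types in place, concrete shader parameters can be declared in a C++ shader class using the LAYOUT_FIELD family of macros.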
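The Bind and IsBound pair that every parameter class exposes can be illustrated with a small mock: binding looks a name up in the map produced by the shader compiler and records where the parameter landed, and an optional parameter that is missing simply stays unbound. This is a sketch under assumptions, not the engine implementation. FMockParameterMap, FMockAllocation, FMockShaderParameter, the MSPF_* flags, and the choice to throw on a missing mandatory parameter are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Where the shader compiler placed a named parameter (hypothetical layout).
struct FMockAllocation
{
    uint16_t BufferIndex;
    uint16_t BaseIndex;
    uint16_t NumBytes;
};

// Hypothetical stand-in for FShaderParameterMap: compiler output mapping names to slots.
class FMockParameterMap
{
public:
    void Add(const std::string& Name, FMockAllocation Allocation) { Allocations[Name] = Allocation; }

    bool Find(const std::string& Name, FMockAllocation& Out) const
    {
        const auto It = Allocations.find(Name);
        if (It == Allocations.end())
        {
            return false;
        }
        Out = It->second;
        return true;
    }

private:
    std::map<std::string, FMockAllocation> Allocations;
};

enum EMockParameterFlags { MSPF_Optional, MSPF_Mandatory };

// Hypothetical stand-in for FShaderParameter.
class FMockShaderParameter
{
public:
    void Bind(const FMockParameterMap& Map, const std::string& Name, EMockParameterFlags Flags = MSPF_Optional)
    {
        FMockAllocation Found{};
        if (Map.Find(Name, Found))
        {
            Allocation = Found;
        }
        else if (Flags == MSPF_Mandatory)
        {
            throw std::runtime_error("missing mandatory shader parameter: " + Name);
        }
    }

    // An unbound parameter keeps a size of zero, for example when the shader never uses it.
    bool IsBound() const { return Allocation.NumBytes > 0; }
    uint32_t GetBaseIndex() const { return Allocation.BaseIndex; }
    uint32_t GetNumBytes() const { return Allocation.NumBytes; }

private:
    FMockAllocation Allocation{};
};
```

The optional default mirrors the SPF_Optional default seen in the Bind signatures above: code that sets parameters can check IsBound and skip work for parameters the compiled shader does not contain.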

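LAYOUT_FIELD is a macro that declares a shader parameter along with its type, name, initial value, bit-field size, writer function, and so on. The related definitions are: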

// Engine\Source\Runtime\Core\Public\Serialization\MemoryLayout.h

// Plain field
#define LAYOUT_FIELD(T, Name, ...)
// With an initial value
#define LAYOUT_FIELD_INITIALIZED(T, Name, Value, ...)
// Mutable, with an initial value
#define LAYOUT_MUTABLE_FIELD_INITIALIZED(T, Name, Value, ...)
// Array
#define LAYOUT_ARRAY(T, Name, NumArray, ...)
// Bit-field (mutable and plain)
#define LAYOUT_MUTABLE_BITFIELD(T, Name, BitFieldSize, ...)
#define LAYOUT_BITFIELD(T, Name, BitFieldSize, ...)
// With a writer function
#define LAYOUT_FIELD_WITH_WRITER(T, Name, Func)
#define LAYOUT_MUTABLE_FIELD_WITH_WRITER(T, Name, Func)
#define LAYOUT_WRITE_MEMORY_IMAGE(Func)
#define LAYOUT_TOSTRING(Func)

With LAYOUT_FIELD and its related macros, shader parameters of a given type can be declared in a C++ class. For example:

struct FMyExampleParam
{
    // Declare a non-virtual type.
    DECLARE_TYPE_LAYOUT(FMyExampleParam, NonVirtual);
    
    // Plain fields
    LAYOUT_FIELD(FShaderParameter, ShaderParam); // Equivalent to: FShaderParameter ShaderParam;
    LAYOUT_FIELD(FShaderResourceParameter, TextureParam); // Equivalent to: FShaderResourceParameter TextureParam;
    LAYOUT_FIELD(FRWShaderParameter, OutputUAV); // Equivalent to: FRWShaderParameter OutputUAV;
    
    // Arrays; the third argument is the maximum number of elements.
    LAYOUT_ARRAY(FShaderResourceParameter, TextureArray, 5); // Equivalent to: FShaderResourceParameter TextureArray[5];
    LAYOUT_ARRAY(int32, Ids, 64); // Equivalent to: int32 Ids[64];
    
    LAYOUT_FIELD_INITIALIZED(uint32, Size, 0); // Equivalent to: uint32 Size = 0;

    void WriteDataFunc(FMemoryImageWriter& Writer, const TMemoryImagePtr<FOtherExampleParam>& InParameters) const;
    // With a writer function.
    LAYOUT_FIELD_WITH_WRITER(TMemoryImagePtr<FOtherExampleParam>, Parameters, WriteDataFunc);
};

8.2.3 Uniform Buffer

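UE's uniform buffers involve several core concepts. At the bottom is FRHIUniformBuffer in the RHI layer, which wraps the uniform buffer (also called a constant buffer) of the various graphics APIs. It is defined as follows (implementation and debug code removed):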

// Engine\Source\Runtime\RHI\Public\RHIResources.h

class FRHIUniformBuffer : public FRHIResource
{
public:
    // Constructor.
    FRHIUniformBuffer(const FRHIUniformBufferLayout& InLayout);

    // Reference counting.
    uint32 AddRef() const;
    uint32 Release() const;
    
    // Data accessors.
    uint32 GetSize() const;
    const FRHIUniformBufferLayout& GetLayout() const;
    bool IsGlobal() const;

private:
    // Layout of the RHI uniform buffer.
    const FRHIUniformBufferLayout* Layout;
    // Size of the buffer.
    uint32 LayoutConstantBufferSize;
};

One level up is TUniformBufferRef, which references the FRHIUniformBuffer described above:

// Engine\Source\Runtime\RHI\Public\RHIResources.h

// Defines the reference type for FRHIUniformBuffer.
typedef TRefCountPtr<FRHIUniformBuffer> FUniformBufferRHIRef;


// Engine\Source\Runtime\RenderCore\Public\ShaderParameterMacros.h

// References an FRHIUniformBuffer instance of the given struct type. Note that it inherits from FUniformBufferRHIRef.
template<typename TBufferStruct>
class TUniformBufferRef : public FUniformBufferRHIRef
{
public:
    TUniformBufferRef();

    // Creates a uniform buffer from the given value and returns a typed reference to it. (Template.)
    static TUniformBufferRef<TBufferStruct> CreateUniformBufferImmediate(const TBufferStruct& Value, EUniformBufferUsage Usage, EUniformBufferValidation Validation = EUniformBufferValidation::ValidateResources);
    // Creates a [local] uniform buffer from the given value and returns it.
    static FLocalUniformBuffer CreateLocalUniformBuffer(FRHICommandList& RHICmdList, const TBufferStruct& Value, EUniformBufferUsage Usage);

    // Immediately flushes the buffer data to the RHI.
    void UpdateUniformBufferImmediate(const TBufferStruct& Value);

private:
    // Private constructor; only TUniformBuffer and TRDGUniformBuffer can create instances through it.
    TUniformBufferRef(FRHIUniformBuffer* InRHIRef);

    template<typename TBufferStruct2>
    friend class TUniformBuffer;

    friend class TRDGUniformBuffer<TBufferStruct>;
};

One more level up are TUniformBuffer and TRDGUniformBuffer, which reference FUniformBufferRHIRef. They are defined as follows:

// Engine\Source\Runtime\RenderCore\Public\UniformBuffer.h

// A render resource that references a uniform buffer.
template<typename TBufferStruct>
class TUniformBuffer : public FRenderResource
{
public:
    // Constructor.
    TUniformBuffer()
        : BufferUsage(UniformBuffer_MultiFrame)
        , Contents(nullptr){}

    // Destructor.
    ~TUniformBuffer()
    {
        if (Contents)
        {
            FMemory::Free(Contents);
        }
    }

    // Sets the contents of the uniform buffer.
    void SetContents(const TBufferStruct& NewContents)
    {
        SetContentsNoUpdate(NewContents);
        UpdateRHI();
    }
    // Zeroes the contents of the uniform buffer (allocating them first if they do not exist).
    void SetContentsToZero()
    {
        if (!Contents)
        {
            Contents = (uint8*)FMemory::Malloc(sizeof(TBufferStruct), SHADER_PARAMETER_STRUCT_ALIGNMENT);
        }
        FMemory::Memzero(Contents, sizeof(TBufferStruct));
        UpdateRHI();
    }

    // Gets the contents.
    const uint8* GetContents() const 
    {
        return Contents;
    }

    // ---- FRenderResource interface overrides ----
    
    // Initializes the dynamic RHI resource.
    virtual void InitDynamicRHI() override
    {
        check(IsInRenderingThread());
        UniformBufferRHI.SafeRelease();
        if (Contents)
        {
            // Creates the RHI resource from the raw binary contents.
            UniformBufferRHI = CreateUniformBufferImmediate<TBufferStruct>(*((const TBufferStruct*)Contents), BufferUsage);
        }
    }
    // Releases the dynamic RHI resource.
    virtual void ReleaseDynamicRHI() override
    {
        UniformBufferRHI.SafeRelease();
    }

    // Data accessors.
    FRHIUniformBuffer* GetUniformBufferRHI() const
    { 
        return UniformBufferRHI; 
    }
    const TUniformBufferRef<TBufferStruct>& GetUniformBufferRef() const
    {
        return UniformBufferRHI;
    }

    // Buffer usage flag.
    EUniformBufferUsage BufferUsage;

protected:
    // Sets the contents of the uniform buffer without updating the RHI.
    void SetContentsNoUpdate(const TBufferStruct& NewContents)
    {
        if (!Contents)
        {
            Contents = (uint8*)FMemory::Malloc(sizeof(TBufferStruct), SHADER_PARAMETER_STRUCT_ALIGNMENT);
        }
        FMemory::Memcpy(Contents,&NewContents,sizeof(TBufferStruct));
    }

private:
    // The TUniformBufferRef reference.
    TUniformBufferRef<TBufferStruct> UniformBufferRHI;
    // CPU-side contents.
    uint8* Contents;
};


// Engine\Source\Runtime\RenderCore\Public\RenderGraphResources.h

class FRDGUniformBuffer : public FRDGResource
{
public:
    bool IsGlobal() const;
    const FRDGParameterStruct& GetParameters() const;

    //////////////////////////////////////////////////////////////////////////
    // Gets the RHI resource. May only be called while the pass is executing.
    FRHIUniformBuffer* GetRHI() const
    {
        return static_cast<FRHIUniformBuffer*>(FRDGResource::GetRHI());
    }
    //////////////////////////////////////////////////////////////////////////

protected:
    // Constructor.
    template <typename TParameterStruct>
    explicit FRDGUniformBuffer(TParameterStruct* InParameters, const TCHAR* InName)
        : FRDGResource(InName)
        , ParameterStruct(InParameters)
        , bGlobal(ParameterStruct.HasStaticSlot())
    {}

private:
    const FRDGParameterStruct ParameterStruct;
    // Reference to the FRHIUniformBuffer resource.
    // Note that TUniformBufferRef<TBufferStruct> and FUniformBufferRHIRef are equivalent here.
    TRefCountPtr<FRHIUniformBuffer> UniformBufferRHI;
    FRDGUniformBufferHandle Handle;

    // Whether the buffer is bound globally (it has a static slot) rather than per shader.
    uint8 bGlobal : 1;

    friend FRDGBuilder;
    friend FRDGUniformBufferRegistry;
    friend FRDGAllocator;
};

// Templated version of FRDGUniformBuffer.
template <typename ParameterStructType>
class TRDGUniformBuffer : public FRDGUniformBuffer
{
public:
    // Data accessors.
    const TRDGParameterStruct<ParameterStructType>& GetParameters() const;
    TUniformBufferRef<ParameterStructType> GetRHIRef() const;
    const ParameterStructType* operator->() const;

private:
    explicit TRDGUniformBuffer(ParameterStructType* InParameters, const TCHAR* InName)
        : FRDGUniformBuffer(InParameters, InName)
    {}

    friend FRDGBuilder;
    friend FRDGUniformBufferRegistry;
    friend FRDGAllocator;
};

Abstracted into a UML inheritance diagram, they look like this:

classDiagram-v2

FRHIResource <|-- FRHIUniformBuffer
FUniformBufferRHIRef <|-- TUniformBufferRef
FRHIUniformBuffer <-- FUniformBufferRHIRef

class FRHIResource{

}
class FRHIUniformBuffer{
FRHIUniformBufferLayout* Layout
uint32 LayoutConstantBufferSize
}

class FUniformBufferRHIRef{
FRHIUniformBuffer* Reference
}
class TUniformBufferRef{
TUniformBufferRef(FRHIUniformBuffer* InRHIRef)
CreateUniformBufferImmediate()
CreateLocalUniformBuffer()
UpdateUniformBufferImmediate()
}

FRenderResource <|-- TUniformBuffer
TUniformBufferRef <-- TUniformBuffer

class FRenderResource{

}
class TUniformBuffer{
SetContents()
GetUniformBufferRHI()
GetUniformBufferRef()
uint8* Contents
EUniformBufferUsage BufferUsage
TUniformBufferRef<TBufferStruct> UniformBufferRHI
}

FRDGUniformBuffer <|-- TRDGUniformBuffer
FUniformBufferRHIRef <-- FRDGUniformBuffer

class FRDGUniformBuffer{
FUniformBufferRHIRef UniformBufferRHI
FRDGUniformBufferHandle Handle
}

class TRDGUniformBuffer{
GetRHIRef()
}

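A small gripe: the Mermaid text-diagram syntax offers no layout control, so the auto-generated layout is not very pretty, and on Windows the text gets cut off once the UI is scaled up. Please bear with it.

The uniform buffer types above can have their structs and struct members defined through the SHADER_PARAMETER family of macros, which are defined as follows: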

// Engine\Source\Runtime\RenderCore\Public\ShaderParameterMacros.h

// Shader Parameter Struct: begin/end.
#define BEGIN_SHADER_PARAMETER_STRUCT(StructTypeName, PrefixKeywords)
#define END_SHADER_PARAMETER_STRUCT()

// Uniform Buffer Struct: begin/end/implement.
#define BEGIN_UNIFORM_BUFFER_STRUCT(StructTypeName, PrefixKeywords)
#define BEGIN_UNIFORM_BUFFER_STRUCT_WITH_CONSTRUCTOR(StructTypeName, PrefixKeywords)
#define END_UNIFORM_BUFFER_STRUCT()
#define IMPLEMENT_UNIFORM_BUFFER_STRUCT(StructTypeName,ShaderVariableName)
#define IMPLEMENT_UNIFORM_BUFFER_ALIAS_STRUCT(StructTypeName, UniformBufferAlias)
#define IMPLEMENT_STATIC_UNIFORM_BUFFER_STRUCT(StructTypeName,ShaderVariableName,StaticSlotName)
#define IMPLEMENT_STATIC_UNIFORM_BUFFER_SLOT(SlotName)

// Global Shader Parameter Struct: begin/end/implement.
#define BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT
#define BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT_WITH_CONSTRUCTOR
#define END_GLOBAL_SHADER_PARAMETER_STRUCT
#define IMPLEMENT_GLOBAL_SHADER_PARAMETER_STRUCT
#define IMPLEMENT_GLOBAL_SHADER_PARAMETER_ALIAS_STRUCT

// Shader Parameter: single value, array.
#define SHADER_PARAMETER(MemberType, MemberName)
#define SHADER_PARAMETER_EX(MemberType,MemberName,Precision)
#define SHADER_PARAMETER_ARRAY(MemberType,MemberName,ArrayDecl)
#define SHADER_PARAMETER_ARRAY_EX(MemberType,MemberName,ArrayDecl,Precision)

// Shader Parameter: texture, SRV, UAV, sampler, and their arrays.
#define SHADER_PARAMETER_TEXTURE(ShaderType,MemberName)
#define SHADER_PARAMETER_TEXTURE_ARRAY(ShaderType,MemberName, ArrayDecl)
#define SHADER_PARAMETER_SRV(ShaderType,MemberName)
#define SHADER_PARAMETER_UAV(ShaderType,MemberName)
#define SHADER_PARAMETER_SAMPLER(ShaderType,MemberName)
#define SHADER_PARAMETER_SAMPLER_ARRAY(ShaderType,MemberName, ArrayDecl)

// A Shader Parameter Struct nested inside a Shader Parameter Struct.
#define SHADER_PARAMETER_STRUCT(StructType,MemberName)
#define SHADER_PARAMETER_STRUCT_ARRAY(StructType,MemberName, ArrayDecl)
#define SHADER_PARAMETER_STRUCT_INCLUDE(StructType,MemberName)
// References a [global] shader parameter struct.
#define SHADER_PARAMETER_STRUCT_REF(StructType,MemberName)

// Shader Parameters for RDG.
#define SHADER_PARAMETER_RDG_TEXTURE(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_TEXTURE_SRV(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_TEXTURE_UAV(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_TEXTURE_UAV_ARRAY(ShaderType,MemberName, ArrayDecl)
#define SHADER_PARAMETER_RDG_BUFFER(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_BUFFER_SRV(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_BUFFER_SRV_ARRAY(ShaderType,MemberName, ArrayDecl)
#define SHADER_PARAMETER_RDG_BUFFER_UAV(ShaderType,MemberName)
#define SHADER_PARAMETER_RDG_BUFFER_UAV_ARRAY(ShaderType,MemberName, ArrayDecl)
#define SHADER_PARAMETER_RDG_UNIFORM_BUFFER(StructType, MemberName)

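Note that a local (ordinary) Shader Parameter Struct has no implementation macro (there is no IMPLEMENT_SHADER_PARAMETER_STRUCT); only the global variant has one (IMPLEMENT_GLOBAL_SHADER_PARAMETER_STRUCT).

The following example shows how to use some of the macros above to declare various shader parameters: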
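To give a feel for why parameter structs are declared through macros at all, the sketch below uses a plain X-macro to generate both the struct members and a table of their names, offsets, and sizes from a single list. This is a different and much simpler technique than the engine's SHADER_PARAMETER macros, and it is shown only to illustrate the idea of deriving layout metadata from one declaration. FMockBlurParameters, FMockMember, and the MOCK_* macros are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One reflected member: its name, byte offset inside the struct, and size.
struct FMockMember
{
    std::string Name;
    std::size_t Offset;
    std::size_t Size;
};

// Single source of truth for the parameter list (hypothetical members).
#define MOCK_PARAMETER_LIST(X) \
    X(float, Intensity)        \
    X(int,   SampleCount)      \
    X(float, Radius)

// Expands one list entry into a member declaration.
#define MOCK_DECLARE_MEMBER(Type, Name) Type Name{};

// Expands one list entry into a metadata record.
#define MOCK_DESCRIBE_MEMBER(Type, Name) \
    Out.push_back({#Name, offsetof(FMockBlurParameters, Name), sizeof(Type)});

struct FMockBlurParameters
{
    MOCK_PARAMETER_LIST(MOCK_DECLARE_MEMBER)

    // Builds the metadata table from the same list that declared the members.
    static std::vector<FMockMember> GetMembers()
    {
        std::vector<FMockMember> Out;
        MOCK_PARAMETER_LIST(MOCK_DESCRIBE_MEMBER)
        return Out;
    }
};
```

Because the members and the metadata come from the same list, they cannot drift apart, which is the property that lets a C++ struct be matched against the layout a shader expects.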

// Define a global shader parameter struct (in a .h or a .cpp file, though usually a .h).
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FMyShaderParameterStruct, )
    // Regular single-value and array parameters.
    SHADER_PARAMETER(float, Intensity)
    SHADER_PARAMETER_ARRAY(FVector4, Vertexes, [8])
    
    // Sampler, texture, SRV, UAV
    SHADER_PARAMETER_SAMPLER(SamplerState, TextureSampler)
    SHADER_PARAMETER_TEXTURE(Texture3D, Texture3d)
    SHADER_PARAMETER_SRV(Buffer<float4>, VertexColorBuffer)
    SHADER_PARAMETER_UAV(RWStructuredBuffer<float4>, OutputTexture)
    
    // Shader parameter structs
    // Reference a shader parameter struct (global structs only).
    SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View)
    // Include a shader parameter struct (local or global).
    SHADER_PARAMETER_STRUCT_INCLUDE(FSceneTextureShaderParameters, SceneTextures)
END_GLOBAL_SHADER_PARAMETER_STRUCT()

// Implement the global shader parameter struct (in a .cpp file only).
IMPLEMENT_GLOBAL_SHADER_PARAMETER_STRUCT(FMyShaderParameterStruct, "MyShaderParameterStruct");

The shader struct above is declared and implemented on the C++ side. For it to be passed into the shader correctly, some additional C++ code is needed:

// Declare the struct.
FMyShaderParameterStruct MyShaderParameterStruct;

// Create the RHI resource.
// It can be multi-frame (UniformBuffer_MultiFrame): create it once, cache the reference, and call UpdateUniformBufferImmediate whenever the data changes.
// It can also be single-frame (UniformBuffer_SingleFrame), in which case it must be created and filled every frame.
auto MyShaderParameterStructRHI = TUniformBufferRef<FMyShaderParameterStruct>::CreateUniformBufferImmediate(MyShaderParameterStruct, EUniformBufferUsage::UniformBuffer_MultiFrame);

// Update the shader parameter struct.
MyShaderParameterStruct.Intensity = 1.0f;
(......)

// Upload the data to the RHI.
MyShaderParameterStructRHI.UpdateUniformBufferImmediate(MyShaderParameterStruct);
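The multi-frame usage described in the comments (create the GPU resource once, then push updated values into it) can be modeled without any engine dependency. The sketch below keeps a CPU-side staging copy, in the spirit of TUniformBuffer::SetContents, and uploads it to a mock GPU buffer that is created on first use and reused afterwards. FMockRHIUniformBuffer, TMockUniformBuffer, and FMockLightParameters are hypothetical, std::shared_ptr merely plays the role of the reference-counted handle, and the real engine path goes through the RHI rather than a byte vector.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an RHI uniform buffer: a byte blob pretending to live on the GPU.
struct FMockRHIUniformBuffer
{
    std::vector<unsigned char> GpuBytes;
    int UploadCount = 0;
};

// Hypothetical stand-in for TUniformBuffer: a CPU-side staging copy plus a shared GPU buffer.
template <typename TBufferStruct>
class TMockUniformBuffer
{
    static_assert(std::is_trivially_copyable<TBufferStruct>::value, "contents are copied as raw bytes");

public:
    // Copies the new value into staging memory, then pushes it to the mock RHI.
    void SetContents(const TBufferStruct& NewContents)
    {
        Contents.resize(sizeof(TBufferStruct));
        std::memcpy(Contents.data(), &NewContents, sizeof(TBufferStruct));
        UpdateRHI();
    }

    // std::shared_ptr plays the role of the reference-counted buffer handle.
    std::shared_ptr<FMockRHIUniformBuffer> GetRHI() const { return RHI; }

private:
    void UpdateRHI()
    {
        assert(Contents.size() == sizeof(TBufferStruct));
        if (!RHI)
        {
            RHI = std::make_shared<FMockRHIUniformBuffer>(); // created once, then reused
        }
        RHI->GpuBytes = Contents;
        ++RHI->UploadCount;
    }

    std::vector<unsigned char> Contents;
    std::shared_ptr<FMockRHIUniformBuffer> RHI;
};

// Example parameter struct (hypothetical).
struct FMockLightParameters
{
    float Intensity;
    int NumSamples;
};
```

Calling SetContents twice on the same TMockUniformBuffer uploads twice but keeps the same buffer object, which is the point of the multi-frame mode: holders of the handle never need to be told that the data changed.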

8.2.4 Vertex Factory

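As we know, the engine has many kinds of meshes, including static meshes, skinned skeletal meshes, procedural meshes, and landscape. Materials support all of these mesh types through the vertex factory, FVertexFactory. In practice, a vertex factory involves many kinds of data and types, including but not limited to:

  • Vertex shaders. The inputs and outputs of a vertex shader rely on the vertex factory to describe the data layout.
  • Vertex factory parameters and RHI resources. This data is passed from the C++ layer into the vertex shader for processing.
  • Vertex buffers and vertex layouts. Through the vertex layout, the inputs of the vertex buffer can be customized and extended, enabling customized shader code.
  • Geometry preprocessing. Vertex buffers, mesh resources, material parameters, and so on can all be preprocessed before the actual rendering.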

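Figure: where the vertex factory sits in the rendering hierarchy. As the figure shows, the vertex factory is a rendering-thread object that spans both the CPU and the GPU.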
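The "vertex layout" mentioned in the list above boils down to a table that says, for each shader input attribute, which stream it comes from, where it starts inside a vertex, and how far apart consecutive vertices are. The following sketch builds such a table for one interleaved vertex struct. It is an illustration under assumptions: FMockVertexElement is a trimmed-down, hypothetical analogue of a vertex element description, and FMockVertex, EMockVertexElementType, and BuildMockDeclaration are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum EMockVertexElementType : uint8_t { MVET_Float2, MVET_Float3, MVET_UByte4N };

// Hypothetical, trimmed-down description of one vertex attribute.
struct FMockVertexElement
{
    uint8_t StreamIndex;         // which vertex buffer the attribute is read from
    uint8_t Offset;              // byte offset of the attribute inside one vertex
    EMockVertexElementType Type; // data format of the attribute
    uint8_t AttributeIndex;      // input slot seen by the vertex shader
    uint16_t Stride;             // byte distance between consecutive vertices
};

// Interleaved vertex as it would sit in a single vertex buffer (stream 0).
struct FMockVertex
{
    float Position[3];
    float UV[2];
    uint8_t Color[4];
};

// Derives the layout table from the struct itself, so offsets cannot go stale.
std::vector<FMockVertexElement> BuildMockDeclaration()
{
    const uint16_t Stride = static_cast<uint16_t>(sizeof(FMockVertex));
    std::vector<FMockVertexElement> Elements;
    Elements.push_back({0, static_cast<uint8_t>(offsetof(FMockVertex, Position)), MVET_Float3, 0, Stride});
    Elements.push_back({0, static_cast<uint8_t>(offsetof(FMockVertex, UV)), MVET_Float2, 1, Stride});
    Elements.push_back({0, static_cast<uint8_t>(offsetof(FMockVertex, Color)), MVET_UByte4N, 2, Stride});
    return Elements;
}
```

Supporting a new mesh type then amounts to supplying a different table (and matching shader inputs) while the rest of the pipeline stays unchanged.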

FVertexFactory encapsulates the vertex data resources that can be linked to a vertex shader. It and its related types are defined as follows:

// Engine\Source\Runtime\RHI\Public\RHI.h

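// Vertex element.
struct FVertexElement
{
    uint8 StreamIndex;      // Stream index
    uint8 Offset;          // Offset
    TEnumAsByte<EVertexElementType> Type; // Type
    uint8 AttributeIndex;// Attribute index
    uint16 Stride;          // Stride
    // Whether the element is consumed per instance (instance index) or per vertex (vertex index). If 0, the element is repeated for every instance.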
    uint16 bUseInstanceIndex;

    FVertexElement();
    FVertexElement(uint8 InStreamIndex, ...);
    
    void operator=(const FVertexElement& Other);
    friend FArchive& operator<<(FArchive& Ar,FVertexElement& Element);
    
    FString ToString() const;
    void FromString(const FString& Src);
    void FromString(const FStringView& Src);
};

// Type of the vertex declaration element list.
typedef TArray<FVertexElement,TFixedAllocator<MaxVertexElementCount> > FVertexDeclarationElementList;


// Engine\Source\Runtime\RHI\Public\RHIResources.h

// RHI resource for a vertex declaration.
class FRHIVertexDeclaration : public FRHIResource
{
public:
    virtual bool GetInitializer(FVertexDeclarationElementList& Init) { return false; }
};

// Vertex buffer.
class FRHIVertexBuffer : public FRHIResource
{
public:
    FRHIVertexBuffer(uint32 InSize,uint32 InUsage);

    uint32 GetSize() const;
    uint32 GetUsage() const;

protected:
    FRHIVertexBuffer();

    void Swap(FRHIVertexBuffer& Other);
    void ReleaseUnderlyingResource();

private:
    // Size.
    uint32 Size;
    // Buffer usage flags, e.g. BUF_UnorderedAccess.
    uint32 Usage;
};


// Engine\Source\Runtime\RenderCore\Public\VertexFactory.h

// Vertex input stream.
struct FVertexInputStream
{
    // Vertex stream index.
    uint32 StreamIndex : 4;
    // Offset into the VertexBuffer.
    uint32 Offset : 28;
    // Vertex buffer.
    FRHIVertexBuffer* VertexBuffer;

    FVertexInputStream();
    FVertexInputStream(uint32 InStreamIndex, uint32 InOffset, FRHIVertexBuffer* InVertexBuffer);

    inline bool operator==(const FVertexInputStream& rhs) const;
    inline bool operator!=(const FVertexInputStream& rhs) const;
};

// Array of vertex input streams.
typedef TArray<FVertexInputStream, TInlineAllocator<4>> FVertexInputStreamArray;

// Vertex stream usage flags.
enum class EVertexStreamUsage : uint8
{
    Default            = 0 << 0, // Default
    Instancing        = 1 << 0, // Instanced
    Overridden        = 1 << 1, // Overridden
    ManualFetch        = 1 << 2  // Manual fetch
};

// Vertex input stream type.
enum class EVertexInputStreamType : uint8
{
    Default = 0,  // Default
    PositionOnly, // Position only
    PositionAndNormalOnly // Position and normal only
};

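// Vertex stream component.
struct FVertexStreamComponent
{
    // Vertex buffer holding the stream data. If null, no data is read from this vertex stream.
    const FVertexBuffer* VertexBuffer = nullptr;

    // Offset into the vertex buffer.
    uint32 StreamOffset = 0;
    // Offset of the data, relative to the start of each element in the vertex buffer.
    uint8 Offset = 0;
    // Stride of the data.
    uint8 Stride = 0;
    // Type of the data read from the stream.
    TEnumAsByte<EVertexElementType> Type = VET_None;
    // Vertex stream usage flags.
    EVertexStreamUsage VertexStreamUsage = EVertexStreamUsage::Default;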

    (......)
};

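// Parameter binding interface of the vertex factory used by shaders.
class FVertexFactoryShaderParameters
{
public:
    // Binds parameters to the ParameterMap. The actual logic is implemented by subclasses.
    void Bind(const class FShaderParameterMap& ParameterMap) {}

    // Gets the vertex factory's shader bindings and vertex streams. The actual logic is implemented by subclasses.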
    void GetElementShaderBindings(
        const class FSceneInterface* Scene,
        const class FSceneView* View,
        const class FMeshMaterialShader* Shader,
        const EVertexInputStreamType InputStreamType,
        ERHIFeatureLevel::Type FeatureLevel,
        const class FVertexFactory* VertexFactory,
        const struct FMeshBatchElement& BatchElement,
        class FMeshDrawSingleShaderBindings& ShaderBindings,
        FVertexInputStreamArray& VertexStreams) const {}

    (......)
};

// Class that represents a vertex factory type.
class FVertexFactoryType
{
public:
    // Type definitions.
    typedef FVertexFactoryShaderParameters* (*ConstructParametersType)(EShaderFrequency ShaderFrequency, const class FShaderParameterMap& ParameterMap);
    typedef const FTypeLayoutDesc* (*GetParameterTypeLayoutType)(EShaderFrequency ShaderFrequency);
    (......)

    // Returns the number of vertex factory types.
    static int32 GetNumVertexFactoryTypes();

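    // Returns the global list of vertex factory types.
    static RENDERCORE_API TLinkedList<FVertexFactoryType*>*& GetTypeList();
    // Returns the stored (sorted) list of vertex factory types used with materials.
    static RENDERCORE_API const TArray<FVertexFactoryType*>& GetSortedMaterialTypes();
    // Finds an FVertexFactoryType by name.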
    static RENDERCORE_API FVertexFactoryType* GetVFByName(const FHashedName& VFName);

    // Initializes the static members of FVertexFactoryType. Must be called before any VF type is created.
    static void Initialize(const TMap<FString, TArray<const TCHAR*> >& ShaderFileToUniformBufferVariables);
    static void Uninitialize();

    // Constructor / destructor.
    RENDERCORE_API FVertexFactoryType(...);
    virtual ~FVertexFactoryType();

    // Data accessors.
    const TCHAR* GetName() const;
    FName GetFName() const;
    const FHashedName& GetHashedName() const;
    const TCHAR* GetShaderFilename() const;

    // Shader parameter interface.
    FVertexFactoryShaderParameters* CreateShaderParameters(...) const;
    const FTypeLayoutDesc* GetShaderParameterLayout(...) const;
    void GetShaderParameterElementShaderBindings(...) const;

    // Flag accessors.
    bool IsUsedWithMaterials() const;
    bool SupportsStaticLighting() const;
    bool SupportsDynamicLighting() const;
    bool SupportsPrecisePrevWorldPos() const;
    bool SupportsPositionOnly() const;
    bool SupportsCachingMeshDrawCommands() const;
    bool SupportsPrimitiveIdStream() const;

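    // Hash of the type.
    friend uint32 GetTypeHash(const FVertexFactoryType* Type);
    // Hash computed from the vertex factory type's source file and its includes.
    const FSHAHash& GetSourceHash(EShaderPlatform ShaderPlatform) const;
    // Whether the material's shader type should be cached.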
    bool ShouldCache(const FVertexFactoryShaderPermutationParameters& Parameters) const;

    void ModifyCompilationEnvironment(...);
    void ValidateCompiledResult(EShaderPlatform Platform, ...);

    bool SupportsTessellationShaders() const;

    // Adds includes for the referenced uniform buffers.
    void AddReferencedUniformBufferIncludes(...);
    void FlushShaderFileCache(...);
    const TMap<const TCHAR*, FCachedUniformBufferDeclaration>& GetReferencedUniformBufferStructsCache() const;

private:
    static uint32 NumVertexFactories;
    static bool bInitializedSerializationHistory;

    // Data and flags of the vertex factory type.
    const TCHAR* Name;
    const TCHAR* ShaderFilename;
    FName TypeName;
    FHashedName HashedName;
    uint32 bUsedWithMaterials : 1;
    uint32 bSupportsStaticLighting : 1;
    uint32 bSupportsDynamicLighting : 1;
    uint32 bSupportsPrecisePrevWorldPos : 1;
    uint32 bSupportsPositionOnly : 1;
    uint32 bSupportsCachingMeshDrawCommands : 1;
    uint32 bSupportsPrimitiveIdStream : 1;
    ConstructParametersType ConstructParameters;
    GetParameterTypeLayoutType GetParameterTypeLayout;
    GetParameterTypeElementShaderBindingsType GetParameterTypeElementShaderBindings;
    ShouldCacheType ShouldCacheRef;
    ModifyCompilationEnvironmentType ModifyCompilationEnvironmentRef;
    ValidateCompiledResultType ValidateCompiledResultRef;
    SupportsTessellationShadersType SupportsTessellationShadersRef;

    // Link in the global list of vertex factory types.
    TLinkedList<FVertexFactoryType*> GlobalListLink;
    // Cache of the includes for referenced uniform buffers.
    TMap<const TCHAR*, FCachedUniformBufferDeclaration> ReferencedUniformBufferStructsCache;
    // Tracks which platforms' declarations ReferencedUniformBufferStructsCache has cached.
    bool bCachedUniformBufferStructDeclarations;
};


// ------ Vertex factory utility macros ------

// Implements the parameter type of a vertex factory.
#define IMPLEMENT_VERTEX_FACTORY_PARAMETER_TYPE(FactoryClass, ShaderFrequency, ParameterClass)

// Declares a vertex factory type.
#define DECLARE_VERTEX_FACTORY_TYPE(FactoryClass)
// Implements a vertex factory type.
#define IMPLEMENT_VERTEX_FACTORY_TYPE(FactoryClass,ShaderFilename,bUsedWithMaterials,bSupportsStaticLighting,bSupportsDynamicLighting,bPrecisePrevWorldPos,bSupportsPositionOnly)
// Implements the virtual function table of a vertex factory.
#define IMPLEMENT_VERTEX_FACTORY_VTABLE(FactoryClass)


// Vertex factory.
class FVertexFactory : public FRenderResource
{
public:
    FVertexFactory(ERHIFeatureLevel::Type InFeatureLevel);

    virtual FVertexFactoryType* GetType() const;

    // Collects the vertex data streams.
    void GetStreams(ERHIFeatureLevel::Type InFeatureLevel, EVertexInputStreamType VertexStreamType, FVertexInputStreamArray& OutVertexStreams) const
    {
        // Default vertex stream type.
        if (VertexStreamType == EVertexInputStreamType::Default)
        {
            bool bSupportsVertexFetch = SupportsManualVertexFetch(InFeatureLevel);

            // Builds an FVertexInputStream from each of the factory's streams and appends it to the output list.
            for (int32 StreamIndex = 0;StreamIndex < Streams.Num();StreamIndex++)
            {
                const FVertexStream& Stream = Streams[StreamIndex];

                if (!(EnumHasAnyFlags(EVertexStreamUsage::ManualFetch, Stream.VertexStreamUsage) && bSupportsVertexFetch))
                {
                    if (!Stream.VertexBuffer)
                    {
                        OutVertexStreams.Add(FVertexInputStream(StreamIndex, 0, nullptr));
                    }
                    else
                    {
                        if (EnumHasAnyFlags(EVertexStreamUsage::Overridden, Stream.VertexStreamUsage) && !Stream.VertexBuffer->IsInitialized())
                        {
                            OutVertexStreams.Add(FVertexInputStream(StreamIndex, 0, nullptr));
                        }
                        else
                        {
                            OutVertexStreams.Add(FVertexInputStream(StreamIndex, Stream.Offset, Stream.VertexBuffer->VertexBufferRHI));
                        }
                    }
                }
            }
        }
        // Position-only vertex stream type.
        else if (VertexStreamType == EVertexInputStreamType::PositionOnly)
        {
            // Set the predefined vertex streams.
            for (int32 StreamIndex = 0; StreamIndex < PositionStream.Num(); StreamIndex++)
            {
                const FVertexStream& Stream = PositionStream[StreamIndex];
                OutVertexStreams.Add(FVertexInputStream(StreamIndex, Stream.Offset, Stream.VertexBuffer->VertexBufferRHI));
            }
        }
        // Position-and-normal-only vertex stream type.
        else if (VertexStreamType == EVertexInputStreamType::PositionAndNormalOnly)
        {
            // Set the predefined vertex streams.
            for (int32 StreamIndex = 0; StreamIndex < PositionAndNormalStream.Num(); StreamIndex++)
            {
                const FVertexStream& Stream = PositionAndNormalStream[StreamIndex];
                OutVertexStreams.Add(FVertexInputStream(StreamIndex, Stream.Offset, Stream.VertexBuffer->VertexBufferRHI));
            }
        }
        else
        {
            // NOT_IMPLEMENTED
        }
    }
    
    // Offsets the instance data streams.
    void OffsetInstanceStreams(uint32 InstanceOffset, EVertexInputStreamType VertexStreamType, FVertexInputStreamArray& VertexStreams) const;
    
    static void ModifyCompilationEnvironment(...);
    static void ValidateCompiledResult(...);

    static bool SupportsTessellationShaders();

    // FRenderResource interface; releases RHI resources.
    virtual void ReleaseRHI();

    // Sets / gets the RHI reference of the vertex declaration.
    FVertexDeclarationRHIRef& GetDeclaration();
    void SetDeclaration(FVertexDeclarationRHIRef& NewDeclaration);

    // Returns the RHI vertex declaration for the given stream type.
    const FVertexDeclarationRHIRef& GetDeclaration(EVertexInputStreamType InputStreamType) const 
    {
        switch (InputStreamType)
        {
        case EVertexInputStreamType::Default:                return Declaration;
        case EVertexInputStreamType::PositionOnly:            return PositionDeclaration;
        case EVertexInputStreamType::PositionAndNormalOnly:    return PositionAndNormalDeclaration;
        }
        return Declaration;
    }

    // Various flags.
    virtual bool IsGPUSkinned() const;
    virtual bool SupportsPositionOnlyStream() const;
    virtual bool SupportsPositionAndNormalOnlyStream() const;
    virtual bool SupportsNullPixelShader() const;

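    // Whether primitives are rendered as camera-facing sprites.
    virtual bool RendersPrimitivesAsCameraFacingSprites() const;

    // Whether a vertex declaration is needed.
    bool NeedsDeclaration() const;
    // Whether manual vertex fetch is supported.
    inline bool SupportsManualVertexFetch(const FStaticFeatureLevel InFeatureLevel) const;
    // Returns the primitive id stream index for the given stream type.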
    inline int32 GetPrimitiveIdStreamIndex(EVertexInputStreamType InputStreamType) const;

protected:
    inline void SetPrimitiveIdStreamIndex(EVertexInputStreamType InputStreamType, int32 StreamIndex)
    {
        PrimitiveIdStreamIndex[static_cast<uint8>(InputStreamType)] = StreamIndex;
    }

    // Creates a vertex element for a vertex stream component.
    FVertexElement AccessStreamComponent(const FVertexStreamComponent& Component,uint8 AttributeIndex);
    FVertexElement AccessStreamComponent(const FVertexStreamComponent& Component, uint8 AttributeIndex, EVertexInputStreamType InputStreamType);
    // Initializes the vertex declaration.
    void InitDeclaration(const FVertexDeclarationElementList& Elements, EVertexInputStreamType StreamType = EVertexInputStreamType::Default)
    {
        if (StreamType == EVertexInputStreamType::PositionOnly)
        {
            PositionDeclaration = PipelineStateCache::GetOrCreateVertexDeclaration(Elements);
        }
        else if (StreamType == EVertexInputStreamType::PositionAndNormalOnly)
        {
            PositionAndNormalDeclaration = PipelineStateCache::GetOrCreateVertexDeclaration(Elements);
        }
        else // (StreamType == EVertexInputStreamType::Default)
        {
            // Create the vertex declaration for rendering the factory normally.
            Declaration = PipelineStateCache::GetOrCreateVertexDeclaration(Elements);
        }
    }

    // Vertex stream: the information needed to set up one vertex stream.
    struct FVertexStream
    {
        const FVertexBuffer* VertexBuffer = nullptr;
        uint32 Offset = 0;
        uint16 Stride = 0;
        EVertexStreamUsage VertexStreamUsage = EVertexStreamUsage::Default;
        uint8 Padding = 0;

        friend bool operator==(const FVertexStream& A,const FVertexStream& B);
        FVertexStream();
    };

    // Vertex streams used to render with this vertex factory.
    TArray<FVertexStream,TInlineAllocator<8> > Streams;

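    // A VF (vertex factory) can explicitly set this to false to avoid errors when it has no declaration. Mainly used by VFs that fetch data directly from buffers (such as Niagara).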
    bool bNeedsDeclaration = true;
    bool bSupportsManualVertexFetch = false;
    int8 PrimitiveIdStreamIndex[3] = { -1, -1, -1 };

private:
    // Position-only vertex streams, used to render the depth pass.
    TArray<FVertexStream,TInlineAllocator<2> > PositionStream;
    // Position-and-normal-only vertex streams.
    TArray<FVertexStream, TInlineAllocator<3> > PositionAndNormalStream;

    // RHI vertex declaration used to render the vertex factory normally.
    FVertexDeclarationRHIRef Declaration;

    // RHI resources corresponding to PositionStream and PositionAndNormalStream.
    FVertexDeclarationRHIRef PositionDeclaration;
    FVertexDeclarationRHIRef PositionAndNormalDeclaration;
};

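The listing above shows many of the types involved with vertex factories. Several of them are core classes, such as FVertexFactory, FVertexElement, FRHIVertexDeclaration, FRHIVertexBuffer, FVertexFactoryType, FVertexStreamComponent, FVertexInputStream and FVertexFactoryShaderParameters. How do they relate to one another?

To illustrate the relationships, take FStaticMeshDataType of a static mesh as an example:

FStaticMeshDataType contains several FVertexStreamComponent instances. Each FVertexStreamComponent corresponds to the index of an FVertexElement in the FVertexDeclarationElementList and the index of an FVertexStream in the FVertexInputStreamArray list.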
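
To make this mapping concrete, here is a minimal, self-contained C++ sketch of how a stream component could be turned into a vertex element plus a stream slot. The `Mini*` types and their fields are hypothetical simplifications introduced only for illustration; they are not the engine's definitions. The sketch assumes that components sharing the same buffer, stream offset and stride reuse a single stream slot, while each component still produces its own element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for FVertexStreamComponent.
struct MiniStreamComponent {
    const void* VertexBuffer;   // buffer that holds the data
    std::uint32_t StreamOffset; // offset of the stream inside the buffer
    std::uint8_t Offset;        // offset of the attribute inside one vertex
    std::uint8_t Stride;        // bytes between two consecutive vertices
};

// Hypothetical, simplified stand-in for FVertexStream.
struct MiniStream {
    const void* VertexBuffer;
    std::uint32_t Offset;
    std::uint8_t Stride;

    bool operator==(const MiniStream& Other) const {
        return VertexBuffer == Other.VertexBuffer
            && Offset == Other.Offset
            && Stride == Other.Stride;
    }
};

// Hypothetical, simplified stand-in for FVertexElement.
struct MiniElement {
    std::uint8_t StreamIndex;    // slot of the stream the element reads from
    std::uint8_t Offset;         // attribute offset inside the vertex
    std::uint8_t AttributeIndex; // matches ATTRIBUTEn on the HLSL side
};

class MiniVertexFactory {
public:
    // Registers the component's stream (reusing an identical one if present)
    // and returns the element that refers to it.
    MiniElement AccessStreamComponent(const MiniStreamComponent& Component,
                                      std::uint8_t AttributeIndex) {
        const MiniStream Stream{Component.VertexBuffer, Component.StreamOffset, Component.Stride};
        std::size_t Index = 0;
        while (Index < Streams.size() && !(Streams[Index] == Stream)) {
            ++Index;
        }
        if (Index == Streams.size()) {
            Streams.push_back(Stream);
        }
        return MiniElement{static_cast<std::uint8_t>(Index), Component.Offset, AttributeIndex};
    }

    std::size_t NumStreams() const { return Streams.size(); }

private:
    std::vector<MiniStream> Streams;
};
```

In this model, a position component in one buffer and two tangent components interleaved in a second buffer yield three elements but only two streams, which is why an element carries a stream index rather than a buffer pointer.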

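In addition, FVertexFactory is a base class. Its main built-in subclasses are:

  • FGeometryCacheVertexVertexFactory: vertex factory for geometry cache vertices, commonly used for pre-generated cloth, animation and similar mesh types.

  • FGPUBaseSkinVertexFactory: parent class for GPU-skinned skeletal meshes. Its subclasses include:

    • TGPUSkinVertexFactory: GPU skinning vertex factory whose bone weight mode can be specified.
  • FLocalVertexFactory: local vertex factory, commonly used for static meshes. It has a fairly large number of subclasses:

    • FInstancedStaticMeshVertexFactory: vertex factory for instanced static meshes.
    • FSplineMeshVertexFactory: vertex factory for spline meshes.
    • FGeometryCollectionVertexFactory: vertex factory for geometry collections.
    • FGPUSkinPassthroughVertexFactory: vertex factory for skinned skeletal meshes with Skin Cache enabled.
    • FSingleTriangleMeshVertexFactory: vertex factory for a single-triangle mesh, used for volumetric cloud rendering.
    • ……
  • FParticleVertexFactoryBase: base class of the vertex factories used for particle rendering.

  • FLandscapeVertexFactory: vertex factory used for rendering landscapes.

Besides the classes above that derive from FVertexFactory, there are also types that do not derive from FVertexFactory, such as:

  • FGPUBaseSkinAPEXClothVertexFactory: cloth vertex factory.
    • TGPUSkinAPEXClothVertexFactory: cloth vertex factory whose bone weight mode can be specified.

Other core classes have inheritance hierarchies of their own. For example, the subclasses of FVertexFactoryShaderParameters include: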

  • FGeometryCacheVertexFactoryShaderParameters
  • FGPUSkinVertexFactoryShaderParameters
  • FMeshParticleVertexFactoryShaderParameters
  • FParticleSpriteVertexFactoryShaderParameters
  • FGPUSpriteVertexFactoryShaderParametersVS
  • FGPUSpriteVertexFactoryShaderParametersPS
  • FSplineMeshVertexFactoryShaderParameters
  • FLocalVertexFactoryShaderParametersBase
  • FLandscapeVertexFactoryVertexShaderParameters
  • FLandscapeVertexFactoryPixelShaderParameters
  • ……

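In addition, some vertex factories derive their own type from FStaticMeshDataType internally, in order to reuse the data members related to static meshes.

To show how a vertex factory is used, the following takes the most common one, FLocalVertexFactory, together with CableComponent, which uses FLocalVertexFactory, as an example: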

// Engine\Source\Runtime\Engine\Public\LocalVertexFactory.h

class ENGINE_API FLocalVertexFactory : public FVertexFactory
{
public:
    FLocalVertexFactory(ERHIFeatureLevel::Type InFeatureLevel, const char* InDebugName);

    // Data type derived from FStaticMeshDataType.
    struct FDataType : public FStaticMeshDataType
    {
        FRHIShaderResourceView* PreSkinPositionComponentSRV = nullptr;
    };

    // Compilation environment modification and validation.
    static bool ShouldCompilePermutation(const FVertexFactoryShaderPermutationParameters& Parameters);
    static void ModifyCompilationEnvironment(const FVertexFactoryShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment);
    static void ValidateCompiledResult(const FVertexFactoryType* Type, EShaderPlatform Platform, const FShaderParameterMap& ParameterMap, TArray<FString>& OutErrors);

    // Data updated from the game thread via TSynchronizedResource.
    void SetData(const FDataType& InData);
    // Copies the data from another vertex factory.
    void Copy(const FLocalVertexFactory& Other);

    // FRenderResource interface.
    virtual void InitRHI() override;
    virtual void ReleaseRHI() override
    {
        UniformBuffer.SafeRelease();
        FVertexFactory::ReleaseRHI();
    }

    // Vertex color interface.
    void SetColorOverrideStream(FRHICommandList& RHICmdList, const FVertexBuffer* ColorVertexBuffer) const;
    void GetColorOverrideStream(const FVertexBuffer* ColorVertexBuffer, FVertexInputStreamArray& VertexStreams) const;
    
    // Accessors for shader parameters and other data.
    inline FRHIShaderResourceView* GetPositionsSRV() const;
    inline FRHIShaderResourceView* GetPreSkinPositionSRV() const;
    inline FRHIShaderResourceView* GetTangentsSRV() const;
    inline FRHIShaderResourceView* GetTextureCoordinatesSRV() const;
    inline FRHIShaderResourceView* GetColorComponentsSRV() const;
    inline const uint32 GetColorIndexMask() const;
    inline const int GetLightMapCoordinateIndex() const;
    inline const int GetNumTexcoords() const;
    FRHIUniformBuffer* GetUniformBuffer() const;
    
    (......)

protected:
    // Data passed in from the game thread. FDataType is a subclass of FStaticMeshDataType.
    FDataType Data;
    // Shader parameters of the local vertex factory.
    TUniformBufferRef<FLocalVertexFactoryUniformShaderParameters> UniformBuffer;
    // Index of the vertex color stream.
    int32 ColorStreamIndex;

    (......)
};

// Engine\Source\Runtime\Engine\Private\LocalVertexFactory.cpp

void FLocalVertexFactory::InitRHI()
{
    // Whether the GPU scene can be used.
    const bool bCanUseGPUScene = UseGPUScene(GMaxRHIShaderPlatform, GMaxRHIFeatureLevel);

    // Initializes the position streams and position declarations.
    if (Data.PositionComponent.VertexBuffer != Data.TangentBasisComponents[0].VertexBuffer)
    {
        // Adds a vertex declaration.
        auto AddDeclaration = [this, bCanUseGPUScene](EVertexInputStreamType InputStreamType, bool bAddNormal)
        {
            // Vertex stream elements.
            FVertexDeclarationElementList StreamElements;
            StreamElements.Add(AccessStreamComponent(Data.PositionComponent, 0, InputStreamType));

            bAddNormal = bAddNormal && Data.TangentBasisComponents[1].VertexBuffer != NULL;
            if (bAddNormal)
            {
                StreamElements.Add(AccessStreamComponent(Data.TangentBasisComponents[1], 2, InputStreamType));
            }

            const uint8 TypeIndex = static_cast<uint8>(InputStreamType);
            PrimitiveIdStreamIndex[TypeIndex] = -1;
            if (GetType()->SupportsPrimitiveIdStream() && bCanUseGPUScene)
            {
                // When the VF is used for rendering in normal mesh passes, this vertex buffer and offset will be overridden
                StreamElements.Add(AccessStreamComponent(FVertexStreamComponent(&GPrimitiveIdDummy, 0, 0, sizeof(uint32), VET_UInt, EVertexStreamUsage::Instancing), 1, InputStreamType));
                PrimitiveIdStreamIndex[TypeIndex] = StreamElements.Last().StreamIndex;
            }

            // Initializes the declaration.
            InitDeclaration(StreamElements, InputStreamType);
        };

        // Adds the PositionOnly and PositionAndNormalOnly vertex declarations; the former does not need normals.
        AddDeclaration(EVertexInputStreamType::PositionOnly, false);
        AddDeclaration(EVertexInputStreamType::PositionAndNormalOnly, true);
    }

    // List of vertex declaration elements.
    FVertexDeclarationElementList Elements;
    
    // Vertex position.
    if(Data.PositionComponent.VertexBuffer != NULL)
    {
        Elements.Add(AccessStreamComponent(Data.PositionComponent,0));
    }

    // Primitive id.
    {
        const uint8 Index = static_cast<uint8>(EVertexInputStreamType::Default);
        PrimitiveIdStreamIndex[Index] = -1;
        if (GetType()->SupportsPrimitiveIdStream() && bCanUseGPUScene)
        {
            // When the VF is used for rendering in normal mesh passes, this vertex buffer and offset will be overridden
            Elements.Add(AccessStreamComponent(FVertexStreamComponent(&GPrimitiveIdDummy, 0, 0, sizeof(uint32), VET_UInt, EVertexStreamUsage::Instancing), 13));
            PrimitiveIdStreamIndex[Index] = Elements.Last().StreamIndex;
        }
    }

    // Tangent and normal. Only the tangent and normal need vertex streams; the binormal is generated in the shader.
    uint8 TangentBasisAttributes[2] = { 1, 2 };
    for(int32 AxisIndex = 0;AxisIndex < 2;AxisIndex++)
    {
        if(Data.TangentBasisComponents[AxisIndex].VertexBuffer != NULL)
        {
            Elements.Add(AccessStreamComponent(Data.TangentBasisComponents[AxisIndex],TangentBasisAttributes[AxisIndex]));
        }
    }

    if (Data.ColorComponentsSRV == nullptr)
    {
        Data.ColorComponentsSRV = GNullColorVertexBuffer.VertexBufferSRV;
        Data.ColorIndexMask = 0;
    }

    // Vertex color.
    ColorStreamIndex = -1;
    if(Data.ColorComponent.VertexBuffer)
    {
        Elements.Add(AccessStreamComponent(Data.ColorComponent,3));
        ColorStreamIndex = Elements.Last().StreamIndex;
    }
    else
    {
        FVertexStreamComponent NullColorComponent(&GNullColorVertexBuffer, 0, 0, VET_Color, EVertexStreamUsage::ManualFetch);
        Elements.Add(AccessStreamComponent(NullColorComponent, 3));
        ColorStreamIndex = Elements.Last().StreamIndex;
    }

    // Texture coordinates.
    if(Data.TextureCoordinates.Num())
    {
        const int32 BaseTexCoordAttribute = 4;
        for(int32 CoordinateIndex = 0;CoordinateIndex < Data.TextureCoordinates.Num();CoordinateIndex++)
        {
            Elements.Add(AccessStreamComponent(
                Data.TextureCoordinates[CoordinateIndex],
                BaseTexCoordAttribute + CoordinateIndex
                ));
        }

        for (int32 CoordinateIndex = Data.TextureCoordinates.Num(); CoordinateIndex < MAX_STATIC_TEXCOORDS / 2; CoordinateIndex++)
        {
            Elements.Add(AccessStreamComponent(
                Data.TextureCoordinates[Data.TextureCoordinates.Num() - 1],
                BaseTexCoordAttribute + CoordinateIndex
                ));
        }
    }

    // Lightmap coordinates.
    if(Data.LightMapCoordinateComponent.VertexBuffer)
    {
        Elements.Add(AccessStreamComponent(Data.LightMapCoordinateComponent,15));
    }
    else if(Data.TextureCoordinates.Num())
    {
        Elements.Add(AccessStreamComponent(Data.TextureCoordinates[0],15));
    }

    // Initializes the vertex declaration.
    InitDeclaration(Elements);

    const int32 DefaultBaseVertexIndex = 0;
    const int32 DefaultPreSkinBaseVertexIndex = 0;
    if (RHISupportsManualVertexFetch(GMaxRHIShaderPlatform) || bCanUseGPUScene)
    {
        SCOPED_LOADTIMER(FLocalVertexFactory_InitRHI_CreateLocalVFUniformBuffer);
        UniformBuffer = CreateLocalVFUniformBuffer(this, Data.LODLightmapDataIndex, nullptr, DefaultBaseVertexIndex, DefaultPreSkinBaseVertexIndex);
    }
}

// Implements the parameter type of FLocalVertexFactory.
IMPLEMENT_VERTEX_FACTORY_PARAMETER_TYPE(FLocalVertexFactory, SF_Vertex, FLocalVertexFactoryShaderParameters);

// Implements FLocalVertexFactory.
IMPLEMENT_VERTEX_FACTORY_TYPE_EX(FLocalVertexFactory,"/Engine/Private/LocalVertexFactory.ush",true,true,true,true,true,true,true);

Next, let's look at how the CableComponent types use FLocalVertexFactory:

// Engine\Plugins\Runtime\CableComponent\Source\CableComponent\Private\CableComponent.cpp

class FCableSceneProxy final : public FPrimitiveSceneProxy
{
public:
    FCableSceneProxy(UCableComponent* Component)
        : FPrimitiveSceneProxy(Component)
        , Material(NULL)
        // Constructs the vertex factory.
        , VertexFactory(GetScene().GetFeatureLevel(), "FCableSceneProxy")
        (......)
    {
        // Initializes the buffers through the vertex factory.
        VertexBuffers.InitWithDummyData(&VertexFactory, GetRequiredVertexCount());
        (......)
    }

    virtual ~FCableSceneProxy()
    {
        // Releases the vertex factory.
        VertexFactory.ReleaseResource();
        (......)
    }

    // Builds the cable mesh.
    void BuildCableMesh(const TArray<FVector>& InPoints, TArray<FDynamicMeshVertex>& OutVertices, TArray<int32>& OutIndices)
    {
        (......)
    }

    // Sets the dynamic data (called on the render thread).
    void SetDynamicData_RenderThread(FCableDynamicData* NewDynamicData)
    {
        // Releases the old data.
        if(DynamicData)
        {
            delete DynamicData;
            DynamicData = NULL;
        }
        DynamicData = NewDynamicData;

        // Builds vertices from the cable points.
        TArray<FDynamicMeshVertex> Vertices;
        TArray<int32> Indices;
        BuildCableMesh(NewDynamicData->CablePoints, Vertices, Indices);

        // Fills the vertex buffer data.
        for (int i = 0; i < Vertices.Num(); i++)
        {
            const FDynamicMeshVertex& Vertex = Vertices[i];

            VertexBuffers.PositionVertexBuffer.VertexPosition(i) = Vertex.Position;
            VertexBuffers.StaticMeshVertexBuffer.SetVertexTangents(i, Vertex.TangentX.ToFVector(), Vertex.GetTangentY(), Vertex.TangentZ.ToFVector());
            VertexBuffers.StaticMeshVertexBuffer.SetVertexUV(i, 0, Vertex.TextureCoordinate[0]);
            VertexBuffers.ColorVertexBuffer.VertexColor(i) = Vertex.Color;
        }

        // Uploads the vertex buffer data to the RHI.
        {
            auto& VertexBuffer = VertexBuffers.PositionVertexBuffer;
            void* VertexBufferData = RHILockVertexBuffer(VertexBuffer.VertexBufferRHI, 0, VertexBuffer.GetNumVertices() * VertexBuffer.GetStride(), RLM_WriteOnly);
            FMemory::Memcpy(VertexBufferData, VertexBuffer.GetVertexData(), VertexBuffer.GetNumVertices() * VertexBuffer.GetStride());
            RHIUnlockVertexBuffer(VertexBuffer.VertexBufferRHI);
        }

        (......)
    }

    virtual void GetDynamicMeshElements(const TArray<const FSceneView*>& Views, const FSceneViewFamily& ViewFamily, uint32 VisibilityMap, FMeshElementCollector& Collector) const override
    {
        (......)

        for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
        {
            if (VisibilityMap & (1 << ViewIndex))
            {
                const FSceneView* View = Views[ViewIndex];
                
                // Allocates an FMeshBatch instance.
                FMeshBatch& Mesh = Collector.AllocateMesh();
                // Passes the vertex factory instance to the FMeshBatch instance.
                Mesh.VertexFactory = &VertexFactory;
                
                (......)
                
                Collector.AddMesh(ViewIndex, Mesh);
            }
        }
    }

    (......)

private:
    // Material.
    UMaterialInterface* Material;
    // Vertex and index buffers.
    FStaticMeshVertexBuffers VertexBuffers;
    FCableIndexBuffer IndexBuffer;
    // Vertex factory.
    FLocalVertexFactory VertexFactory;
    // Dynamic data.
    FCableDynamicData* DynamicData;

    (......)
};

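As the code above shows, using an existing vertex factory is not complicated. It mainly comes down to initializing it, assigning its data, and passing it to the FMeshBatch instance.

However, whether you use an existing vertex factory or a custom one, the order, types, component counts and slots of its vertex declaration must stay consistent with FVertexFactoryInput on the HLSL side. For example, the vertex declaration order in FLocalVertexFactory::InitRHI is position, tangents, color, texture coordinates and lightmap coordinates, so let's open the HLSL file that corresponds to FLocalVertexFactory (specified by macros such as IMPLEMENT_VERTEX_FACTORY_TYPE) and take a look: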

// Engine\Shaders\Private\LocalVertexFactory.ush

// Input structure corresponding to the local vertex factory.
struct FVertexFactoryInput
{
    // Position.
    float4    Position    : ATTRIBUTE0;

    // Tangents and color.
#if !MANUAL_VERTEX_FETCH
    #if METAL_PROFILE
        float3    TangentX    : ATTRIBUTE1;
        // TangentZ.w contains sign of tangent basis determinant
        float4    TangentZ    : ATTRIBUTE2;

        float4    Color        : ATTRIBUTE3;
    #else
        half3    TangentX    : ATTRIBUTE1;
        // TangentZ.w contains sign of tangent basis determinant
        half4    TangentZ    : ATTRIBUTE2;

        half4    Color        : ATTRIBUTE3;
    #endif
#endif

    // Texture coordinates.
#if NUM_MATERIAL_TEXCOORDS_VERTEX
    #if !MANUAL_VERTEX_FETCH
        #if GPUSKIN_PASS_THROUGH
            // These must match GPUSkinVertexFactory.usf
            float2    TexCoords[NUM_MATERIAL_TEXCOORDS_VERTEX] : ATTRIBUTE4;
            #if NUM_MATERIAL_TEXCOORDS_VERTEX > 4
                #error Too many texture coordinate sets defined on GPUSkin vertex input. Max: 4.
            #endif
        #else
            #if NUM_MATERIAL_TEXCOORDS_VERTEX > 1
                float4    PackedTexCoords4[NUM_MATERIAL_TEXCOORDS_VERTEX/2] : ATTRIBUTE4;
            #endif
            #if NUM_MATERIAL_TEXCOORDS_VERTEX == 1
                float2    PackedTexCoords2 : ATTRIBUTE4;
            #elif NUM_MATERIAL_TEXCOORDS_VERTEX == 3
                float2    PackedTexCoords2 : ATTRIBUTE5;
            #elif NUM_MATERIAL_TEXCOORDS_VERTEX == 5
                float2    PackedTexCoords2 : ATTRIBUTE6;
            #elif NUM_MATERIAL_TEXCOORDS_VERTEX == 7
                float2    PackedTexCoords2 : ATTRIBUTE7;
            #endif
        #endif
    #endif
#elif USE_PARTICLE_SUBUVS && !MANUAL_VERTEX_FETCH
    float2    TexCoords[1] : ATTRIBUTE4;
#endif

    (......)
};

From this we can see that the data order in the FVertexFactoryInput structure corresponds one-to-one with the vertex declaration of FLocalVertexFactory.
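
As a small worked example of that correspondence, the texture coordinate slots in the HLSL listing follow a simple rule: pairs of UV sets are packed into `float4` entries starting at ATTRIBUTE4, and an odd trailing UV set lands in the next slot as a `float2`. The C++ sketch below encodes only what the listing shows for the non-passthrough path (1, 3, 5 or 7 coordinates map to ATTRIBUTE4 through ATTRIBUTE7). The helper names are hypothetical and exist only for this illustration; `BaseTexCoordAttribute` mirrors the constant used in `InitRHI`.

```cpp
#include <cassert>

// Mirrors `BaseTexCoordAttribute = 4` from FLocalVertexFactory::InitRHI.
constexpr int BaseTexCoordAttribute = 4;

// Hypothetical helper: number of float4 PackedTexCoords4 entries declared
// for a given NUM_MATERIAL_TEXCOORDS_VERTEX value.
constexpr int NumPackedTexCoords4(int NumTexCoords) {
    return NumTexCoords > 1 ? NumTexCoords / 2 : 0;
}

// Hypothetical helper: attribute index of the trailing float2 PackedTexCoords2,
// or -1 when the count is even and no trailing float2 is declared.
constexpr int PackedTexCoords2Attribute(int NumTexCoords) {
    return (NumTexCoords % 2 == 1) ? BaseTexCoordAttribute + NumTexCoords / 2 : -1;
}
```

With three texture coordinates, for instance, one `float4` occupies ATTRIBUTE4 and the remaining `float2` occupies ATTRIBUTE5, which is the slot that `BaseTexCoordAttribute + CoordinateIndex` produces on the C++ side for the second component.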

8.2.5 Shader Permutation

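UE's shader code adopts the Uber Shader design: a single shader source file contains many macros that separate the code branches for different passes, features, Feature Levels and quality levels. On the C++ side, to make it easy to extend these macro definitions, switch them on, and set their values, UE uses the concept of the shader permutation (Shader Permutation).

Each permutation has a unique hash key. The values of that permutation are filled into the HLSL, and the corresponding shader code is compiled from it. The definitions of the core shader permutation types are analyzed below: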

// Engine\Source\Runtime\RenderCore\Public\ShaderPermutation.h

// Bool shader permutation.
struct FShaderPermutationBool
{
    using Type = bool;

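    // Number of values in this dimension.
    static constexpr int32 PermutationCount = 2;
    // Whether the permutation is multi-dimensional.
    static constexpr bool IsMultiDimensional = false;
    // Converts the bool to an int value.
    static int32 ToDimensionValueId(Type E)
    {
        return E ? 1 : 0;
    }
    // Converts to the value used in the define.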
    static bool ToDefineValue(Type E)
    {
        return E;
    }
    // Converts a permutation id back to a bool.
    static Type FromDimensionValueId(int32 PermutationId)
    {
        checkf(PermutationId == 0 || PermutationId == 1, TEXT("Invalid shader permutation dimension id %i."), PermutationId);
        return PermutationId == 1;
    }
};

// Integer shader permutation.
template <typename TType, int32 TDimensionSize, int32 TFirstValue=0>
struct TShaderPermutationInt
{
    using Type = TType;
    static constexpr int32 PermutationCount = TDimensionSize;
    static constexpr bool IsMultiDimensional = false;
    
    // Minimum and maximum values.
    static constexpr Type MinValue = static_cast<Type>(TFirstValue);
    static constexpr Type MaxValue = static_cast<Type>(TFirstValue + TDimensionSize - 1);

    static int32 ToDimensionValueId(Type E);
    static int32 ToDefineValue(Type E);
    static Type FromDimensionValueId(int32 PermutationId);
};

// Sparse integer shader permutation with a variable number of values.
template <int32... Ts>
struct TShaderPermutationSparseInt
{
    using Type = int32;
    static constexpr int32 PermutationCount = 0;
    static constexpr bool IsMultiDimensional = false;

    static int32 ToDimensionValueId(Type E);
    static Type FromDimensionValueId(int32 PermutationId);
};

// Shader permutation domain; the number of dimensions is variable.
template <typename... Ts>
struct TShaderPermutationDomain
{
    using Type = TShaderPermutationDomain<Ts...>;

    static constexpr bool IsMultiDimensional = true;
    static constexpr int32 PermutationCount = 1;

    // Constructors.
    TShaderPermutationDomain<Ts...>() {}
    explicit TShaderPermutationDomain<Ts...>(int32 PermutationId)
    {
        checkf(PermutationId == 0, TEXT("Invalid shader permutation id %i."), PermutationId);
    }

    // Sets the value of a dimension.
    template<class DimensionToSet>
    void Set(typename DimensionToSet::Type)
    {
        static_assert(sizeof(typename DimensionToSet::Type) == 0, "Unknown shader permutation dimension.");
    }
    // Gets the value of a dimension.
    template<class DimensionToGet>
    const typename DimensionToGet::Type Get() const
    {
        static_assert(sizeof(typename DimensionToGet::Type) == 0, "Unknown shader permutation dimension.");
        return DimensionToGet::Type();
    }

    // Modifies the compilation environment.
    void ModifyCompilationEnvironment(FShaderCompilerEnvironment& OutEnvironment) const {}

    // Data conversion.
    static int32 ToDimensionValueId(const Type& PermutationVector)
    {
        return 0;
    }
    int32 ToDimensionValueId() const
    {
        return ToDimensionValueId(*this);
    }
    static Type FromDimensionValueId(const int32 PermutationId)
    {
        return Type(PermutationId);
    }

    bool operator==(const Type& Other) const
    {
        return true;
    }
};


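// The macros below make it convenient to declare and set shader permutations in a shader's C++ code.

// Declares a bool shader permutation with the given name.
#define SHADER_PERMUTATION_BOOL(InDefineName)
// Declares an int shader permutation with the given name.
#define SHADER_PERMUTATION_INT(InDefineName, Count)
// Declares an int shader permutation with the given name and range.
#define SHADER_PERMUTATION_RANGE_INT(InDefineName, Start, Count)
// Declares a sparse int shader permutation with the given name.
#define SHADER_PERMUTATION_SPARSE_INT(InDefineName,...)
// Declares an enum class shader permutation with the given name.
#define SHADER_PERMUTATION_ENUM_CLASS(InDefineName, EnumName)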
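
The listing shows only the empty base case of the domain template, so it may help to see how several dimensions could collapse into one permutation id. The following is a minimal, self-contained C++ sketch, not the engine's implementation: `BoolDim`, `IntDim` and `MiniDomain2` are hypothetical names, and the mixed-radix layout (first dimension in the lowest "digit") is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bool dimension, modeled on FShaderPermutationBool.
struct BoolDim {
    using Type = bool;
    static constexpr std::int32_t PermutationCount = 2;
    static std::int32_t ToDimensionValueId(Type Value) { return Value ? 1 : 0; }
    static Type FromDimensionValueId(std::int32_t Id) { return Id == 1; }
};

// Hypothetical integer dimension with values 0..Count-1.
template <std::int32_t Count>
struct IntDim {
    using Type = std::int32_t;
    static constexpr std::int32_t PermutationCount = Count;
    static std::int32_t ToDimensionValueId(Type Value) { return Value; }
    static Type FromDimensionValueId(std::int32_t Id) { return Id; }
};

// Hypothetical two-dimension domain. The total number of permutations is the
// product of the dimension sizes, and the id is a mixed-radix number:
//   Id = IdA + CountA * IdB
template <typename A, typename B>
struct MiniDomain2 {
    static constexpr std::int32_t PermutationCount =
        A::PermutationCount * B::PermutationCount;

    typename A::Type ValueA{};
    typename B::Type ValueB{};

    // Packs both dimension values into a single permutation id.
    std::int32_t ToId() const {
        return A::ToDimensionValueId(ValueA)
             + A::PermutationCount * B::ToDimensionValueId(ValueB);
    }

    // Unpacks a permutation id back into the two dimension values.
    static MiniDomain2 FromId(std::int32_t Id) {
        MiniDomain2 Domain;
        Domain.ValueA = A::FromDimensionValueId(Id % A::PermutationCount);
        Domain.ValueB = B::FromDimensionValueId(Id / A::PermutationCount);
        return Domain;
    }
};
```

Under these assumptions, a bool dimension combined with a three-value integer dimension yields six permutations, and every id in that range round-trips through `FromId` and `ToId`. This also illustrates why each extra dimension multiplies the number of shaders to compile, and why a filter such as ShouldCompilePermutation matters.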

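If the templates and macros above look bewildering, don't worry: combined with the usage of FDeferredLightPS, shader permutations turn out to be quite simple:

// Pixel shader for deferred lights.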
class FDeferredLightPS : public FGlobalShader
{
    DECLARE_SHADER_TYPE(FDeferredLightPS, Global)

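    // Declares the shader permutation dimensions. Note that inheritance is used, and the parent class is a type defined by SHADER_PERMUTATION_xxx.
    // Note that the names passed to the parent (such as LIGHT_SOURCE_SHAPE, USE_SOURCE_TEXTURE, USE_IES_PROFILE, ...) are the macro names in the HLSL code.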
    class FSourceShapeDim        : SHADER_PERMUTATION_ENUM_CLASS("LIGHT_SOURCE_SHAPE", ELightSourceShape);
    class FSourceTextureDim        : SHADER_PERMUTATION_BOOL("USE_SOURCE_TEXTURE");
    class FIESProfileDim        : SHADER_PERMUTATION_BOOL("USE_IES_PROFILE");
    class FInverseSquaredDim    : SHADER_PERMUTATION_BOOL("INVERSE_SQUARED_FALLOFF");
    class FVisualizeCullingDim    : SHADER_PERMUTATION_BOOL("VISUALIZE_LIGHT_CULLING");
    class FLightingChannelsDim    : SHADER_PERMUTATION_BOOL("USE_LIGHTING_CHANNELS");
    class FTransmissionDim        : SHADER_PERMUTATION_BOOL("USE_TRANSMISSION");
    class FHairLighting            : SHADER_PERMUTATION_INT("USE_HAIR_LIGHTING", 2);
    class FAtmosphereTransmittance : SHADER_PERMUTATION_BOOL("USE_ATMOSPHERE_TRANSMITTANCE");
    class FCloudTransmittance     : SHADER_PERMUTATION_BOOL("USE_CLOUD_TRANSMITTANCE");
    class FAnistropicMaterials     : SHADER_PERMUTATION_BOOL("SUPPORTS_ANISOTROPIC_MATERIALS");

    // Declares the shader permutation domain, which contains all the dimensions defined above.
    using FPermutationDomain = TShaderPermutationDomain<
        FSourceShapeDim,
        FSourceTextureDim,
        FIESProfileDim,
        FInverseSquaredDim,
        FVisualizeCullingDim,
        FLightingChannelsDim,
        FTransmissionDim,
        FHairLighting,
        FAtmosphereTransmittance,
        FCloudTransmittance,
        FAnistropicMaterials>;

    // 是否需要編譯指定的著色器排列.
    static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
    {
        // Fetch the values of the shader permutation.
        FPermutationDomain PermutationVector(Parameters.PermutationId);

        // For a directional light, IES profiles and inverse-squared falloff are meaningless, so skip compiling them.
        if( PermutationVector.Get< FSourceShapeDim >() == ELightSourceShape::Directional && (
            PermutationVector.Get< FIESProfileDim >() ||
            PermutationVector.Get< FInverseSquaredDim >() ) )
        {
            return false;
        }

        // For a non-directional light, atmosphere and cloud transmittance are meaningless, so skip compiling them.
        if (PermutationVector.Get< FSourceShapeDim >() != ELightSourceShape::Directional && (PermutationVector.Get<FAtmosphereTransmittance>() || PermutationVector.Get<FCloudTransmittance>()))
        {
            return false;
        }

        (......)

        return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::SM5);
    }

    (......)
};

// Render a light.
void FDeferredShadingSceneRenderer::RenderLight(FRHICommandList& RHICmdList, ...)
{
    (......)

    for (int32 ViewIndex = 0; ViewIndex < Views.Num(); ViewIndex++)
    {
        FViewInfo& View = Views[ViewIndex];
        
        (......)
        
        if (LightSceneInfo->Proxy->GetLightType() == LightType_Directional)
        {
            (......)

            // Declare an instance of FDeferredLightPS's permutation domain.
            FDeferredLightPS::FPermutationDomain PermutationVector;
            
            // Fill in the permutation values from the current render state.
            PermutationVector.Set< FDeferredLightPS::FSourceShapeDim >( ELightSourceShape::Directional );
            PermutationVector.Set< FDeferredLightPS::FIESProfileDim >( false );
            PermutationVector.Set< FDeferredLightPS::FInverseSquaredDim >( false );
            PermutationVector.Set< FDeferredLightPS::FVisualizeCullingDim >( View.Family->EngineShowFlags.VisualizeLightCulling );
            PermutationVector.Set< FDeferredLightPS::FLightingChannelsDim >( View.bUsesLightingChannels );
            PermutationVector.Set< FDeferredLightPS::FAnistropicMaterials >(ShouldRenderAnisotropyPass());
            PermutationVector.Set< FDeferredLightPS::FTransmissionDim >( bTransmission );
            PermutationVector.Set< FDeferredLightPS::FHairLighting>(0);
            PermutationVector.Set< FDeferredLightPS::FAtmosphereTransmittance >(bAtmospherePerPixelTransmittance);
            PermutationVector.Set< FDeferredLightPS::FCloudTransmittance >(bLight0CloudPerPixelTransmittance || bLight1CloudPerPixelTransmittance);

            // Use the filled permutation to fetch the matching PS instance from the view's ShaderMap.
            TShaderMapRef< FDeferredLightPS > PixelShader( View.ShaderMap, PermutationVector );
            
            // Fill in the remaining bound shader state.
            GraphicsPSOInit.BoundShaderState.VertexDeclarationRHI = GFilterVertexDeclaration.VertexDeclarationRHI;
            GraphicsPSOInit.BoundShaderState.VertexShaderRHI = VertexShader.GetVertexShader();
            GraphicsPSOInit.BoundShaderState.PixelShaderRHI = PixelShader.GetPixelShader();

            SetGraphicsPipelineState(RHICmdList, GraphicsPSOInit);
            PixelShader->SetParameters(RHICmdList, View, LightSceneInfo, ScreenShadowMaskTexture, LightingChannelsTexture, &RenderLightParams);
            
             (......)
        }
            
        (......)
    }
}

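As this shows, a shader permutation is essentially just a key made up of a variable number of dimensions. At shader compile time, the shader compiler tries to generate shader code for every distinct permutation, although ShouldCompilePermutation can exclude permutations that make no sense. All precompiled shaders are stored in the view's ShaderMap. The value of each dimension can be produced dynamically at runtime. The resulting permutation domain is then used to look up the matching compiled shader in the view's ShaderMap, after which the shader parameters are set and rendering proceeds.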
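To make the "key with several dimensions" idea concrete, here is a minimal, self-contained sketch of how a set of dimension values could be flattened into a single permutation id and back, and how a ShouldCompilePermutation-style filter prunes the id space. This is not UE's implementation: `Dimension`, `EncodePermutation`, `DecodePermutation`, `PermutationCount`, and `CountCompiledPermutations` are hypothetical names, and the mixed-radix layout is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one permutation dimension.
struct Dimension
{
    const char* DefineName; // macro name seen by the shader source
    int Count;              // number of distinct values (2 for a bool)
};

// Flattens per-dimension values into one id with a mixed-radix encoding (assumed layout).
int EncodePermutation(const std::vector<Dimension>& Dims, const std::vector<int>& Values)
{
    assert(Dims.size() == Values.size());
    int Id = 0;
    int Stride = 1;
    for (std::size_t i = 0; i < Dims.size(); ++i)
    {
        assert(Values[i] >= 0 && Values[i] < Dims[i].Count);
        Id += Values[i] * Stride;
        Stride *= Dims[i].Count;
    }
    return Id;
}

// Recovers the per-dimension values from a flattened id.
std::vector<int> DecodePermutation(const std::vector<Dimension>& Dims, int Id)
{
    std::vector<int> Values(Dims.size());
    for (std::size_t i = 0; i < Dims.size(); ++i)
    {
        Values[i] = Id % Dims[i].Count;
        Id /= Dims[i].Count;
    }
    return Values;
}

// Total number of ids in the domain (product of all dimension sizes).
int PermutationCount(const std::vector<Dimension>& Dims)
{
    int Count = 1;
    for (const Dimension& D : Dims)
    {
        Count *= D.Count;
    }
    return Count;
}

// Counts the ids that survive a ShouldCompilePermutation-style filter.
int CountCompiledPermutations(const std::vector<Dimension>& Dims,
                              const std::function<bool(const std::vector<int>&)>& ShouldCompile)
{
    int Compiled = 0;
    const int Total = PermutationCount(Dims);
    for (int Id = 0; Id < Total; ++Id)
    {
        if (ShouldCompile(DecodePermutation(Dims, Id)))
        {
            ++Compiled;
        }
    }
    return Compiled;
}
```

The point of the sketch is that the number of candidate shaders grows as the product of all dimension sizes, which is why filtering out meaningless combinations matters.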

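It is also worth noting that the name given to a dimension's parent class (e.g. LIGHT_SOURCE_SHAPE, USE_SOURCE_TEXTURE, USE_IES_PROFILE, ...) is exactly the macro name in the HLSL code. For example, FSourceShapeDim controls LIGHT_SOURCE_SHAPE in the HLSL code: depending on its value, different code fragments are selected, which is how the different versions and branches of the shader code are produced.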
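The link between a dimension and an HLSL macro can be pictured with the sketch below. It is an illustration only: `CompileEnvironment`, `SetDefine`, `BuildEnvironment`, and `MakeDefinePrologue` are hypothetical stand-ins (not UE's FShaderCompilerEnvironment), and it assumes each dimension simply contributes one `NAME value` define before the shader source is preprocessed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compile environment: just a set of preprocessor defines.
struct CompileEnvironment
{
    std::map<std::string, std::string> Defines;

    void SetDefine(const std::string& Name, int Value)
    {
        assert(!Name.empty());
        Defines[Name] = std::to_string(Value);
    }
};

// Each (define name, value) pair of a permutation becomes one define.
CompileEnvironment BuildEnvironment(const std::vector<std::pair<std::string, int>>& Permutation)
{
    CompileEnvironment Env;
    for (const auto& Dim : Permutation)
    {
        Env.SetDefine(Dim.first, Dim.second);
    }
    return Env;
}

// Renders the defines as a "#define" prologue that would be prepended to the shader source.
std::string MakeDefinePrologue(const CompileEnvironment& Env)
{
    std::string Prologue;
    for (const auto& Define : Env.Defines)
    {
        Prologue += "#define " + Define.first + " " + Define.second + "\n";
    }
    return Prologue;
}
```

With such a prologue in place, a shader file can branch on `#if USE_IES_PROFILE` or compare `LIGHT_SOURCE_SHAPE` against a constant, and every permutation compiles to a different variant of the same source.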

 

8.3 Shader Mechanisms

This chapter analyzes some of the low-level shader mechanisms, such as how Shader Maps are stored and how shaders are compiled and cached.

8.3.1 Shader Map

A ShaderMap stores compiled shader code. There are three kinds: FGlobalShaderMap, FMaterialShaderMap, and FMeshMaterialShaderMap.

8.3.1.1 FShaderMapBase

This subsection first covers the basic types and concepts related to Shader Maps:

// Engine\Source\Runtime\Core\Public\Serialization\MemoryImage.h

// Base class of pointer tables.
class FPointerTableBase
{
public:
    virtual ~FPointerTableBase() {}
    virtual int32 AddIndexedPointer(const FTypeLayoutDesc& TypeDesc, void* Ptr) = 0;
    virtual void* GetIndexedPointer(const FTypeLayoutDesc& TypeDesc, uint32 i) const = 0;
};

// Engine\Source\Runtime\RenderCore\Public\Shader.h

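// Used to serialize, deserialize, compile, and cache a particular shader class.
// An FShaderType can manage multiple FShader instances across several dimensions, such as EShaderPlatform or permutation id.
// The number of permutations of an FShaderType is simply given by GetPermutationCount().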
class FShaderType
{
public:
    // Shader categories: global, material, mesh material, Niagara, etc.
    enum class EShaderTypeForDynamicCast : uint32
    {
        Global,
        Material,
        MeshMaterial,
        Niagara,
        OCIO,
        NumShaderTypes,
    };

    (......)

    // Static data accessors.
    static TLinkedList<FShaderType*>*& GetTypeList();
    static FShaderType* GetShaderTypeByName(const TCHAR* Name);
    static TArray<const FShaderType*> GetShaderTypesByFilename(const TCHAR* Filename);
    static TMap<FHashedName, FShaderType*>& GetNameToTypeMap();
    static const TArray<FShaderType*>& GetSortedTypes(EShaderTypeForDynamicCast Type);
    
    static void Initialize(const TMap<FString, TArray<const TCHAR*> >& ShaderFileToUniformBufferVariables);
    static void Uninitialize();

    // Constructor and destructor.
    FShaderType(...);
    virtual ~FShaderType();

    FShader* ConstructForDeserialization() const;
    FShader* ConstructCompiled(const FShader::CompiledShaderInitializerType& Initializer) const;

    bool ShouldCompilePermutation(...) const;
    void ModifyCompilationEnvironment(..) const;
    bool ValidateCompiledResult(...) const;

    // Computes a hash from the shader type's source code and its includes.
    const FSHAHash& GetSourceHash(EShaderPlatform ShaderPlatform) const;
    // Returns the hash of an FShaderType pointer.
    friend uint32 GetTypeHash(FShaderType* Ref);

    // Accessors.
    (......)

    void AddReferencedUniformBufferIncludes(FShaderCompilerEnvironment& OutEnvironment, FString& OutSourceFilePrefix, EShaderPlatform Platform);
    void FlushShaderFileCache(const TMap<FString, TArray<const TCHAR*> >& ShaderFileToUniformBufferVariables);
    void GetShaderStableKeyParts(struct FStableShaderKeyAndValue& SaveKeyVal);

private:
    EShaderTypeForDynamicCast ShaderTypeForDynamicCast;
    const FTypeLayoutDesc* TypeLayout;
    // Name.
    const TCHAR* Name;
    // Type name.
    FName TypeName;
    // Hashed name.
    FHashedName HashedName;
    // Hashed source file name.
    FHashedName HashedSourceFilename;
    // Source file name.
    const TCHAR* SourceFilename;
    // Entry point name.
    const TCHAR* FunctionName;
    // Shader frequency.
    uint32 Frequency;
    uint32 TypeSize;
    // Number of permutations.
    int32 TotalPermutationCount;

    (......)

    // Link in the global list.
    TLinkedList<FShaderType*> GlobalListLink;

protected:
    bool bCachedUniformBufferStructDeclarations;
    // Cache of the referenced uniform buffer includes.
    TMap<const TCHAR*, FCachedUniformBufferDeclaration> ReferencedUniformBufferStructsCache;
};

// Pointer table for shader maps.
class FShaderMapPointerTable : public FPointerTableBase
{
public:
    virtual int32 AddIndexedPointer(const FTypeLayoutDesc& TypeDesc, void* Ptr) override;
    virtual void* GetIndexedPointer(const FTypeLayoutDesc& TypeDesc, uint32 i) const override;

    virtual void SaveToArchive(FArchive& Ar, void* FrozenContent, bool bInlineShaderResources) const;
    virtual void LoadFromArchive(FArchive& Ar, void* FrozenContent, bool bInlineShaderResources, bool bLoadedByCookedMaterial);

    // Shader types.
    TPtrTable<FShaderType> ShaderTypes;
    // Vertex factory types.
    TPtrTable<FVertexFactoryType> VFTypes;
};

// A shader pipeline instance that holds compile-time state.
class FShaderPipeline
{
public:
    explicit FShaderPipeline(const FShaderPipelineType* InType);
    ~FShaderPipeline();

    // Adds a shader.
    void AddShader(FShader* Shader, int32 PermutationId);
    // Returns the number of shaders.
    inline uint32 GetNumShaders() const;

    // Looks up a shader.
    template<typename ShaderType>
    ShaderType* GetShader(const FShaderMapPointerTable& InPtrTable);
    FShader* GetShader(EShaderFrequency Frequency);
    const FShader* GetShader(EShaderFrequency Frequency) const;
    inline TArray<TShaderRef<FShader>> GetShaders(const FShaderMapBase& InShaderMap) const;

    // Validation.
    void Validate(const FShaderPipelineType* InPipelineType) const;
    // Processes the compiled shader code.
    void Finalize(const FShaderMapResourceCode* Code);
    
    (......)

    enum EFilter
    {
        EAll,            // All pipelines
        EOnlyShared,    // Only pipelines with shared shaders
        EOnlyUnique,    // Only pipelines with unique shaders
    };

    // Hashed type name.
    LAYOUT_FIELD(FHashedName, TypeName);
    // FShader instances for every shader frequency.
    LAYOUT_ARRAY(TMemoryImagePtr<FShader>, Shaders, SF_NumGraphicsFrequencies);
    // Permutation ids.
    LAYOUT_ARRAY(int32, PermutationIds, SF_NumGraphicsFrequencies);
};

// Contents of a shader map.
class FShaderMapContent
{
public:
    struct FProjectShaderPipelineToKey
    {
        inline FHashedName operator()(const FShaderPipeline* InShaderPipeline) 
        { return InShaderPipeline->TypeName; }
    };

    explicit FShaderMapContent(EShaderPlatform InPlatform);
    ~FShaderMapContent();

    EShaderPlatform GetShaderPlatform() const;

    // Validation.
    void Validate(const FShaderMapBase& InShaderMap);

    // Looks up a shader.
    template<typename ShaderType>
    ShaderType* GetShader(int32 PermutationId = 0) const;
    template<typename ShaderType>
    ShaderType* GetShader( const typename ShaderType::FPermutationDomain& PermutationVector ) const;
    FShader* GetShader(FShaderType* ShaderType, int32 PermutationId = 0) const;
    FShader* GetShader(const FHashedName& TypeName, int32 PermutationId = 0) const;

    // Checks whether the given shader exists.
    bool HasShader(const FHashedName& TypeName, int32 PermutationId) const;
    bool HasShader(const FShaderType* Type, int32 PermutationId) const;

    inline TArrayView<const TMemoryImagePtr<FShader>> GetShaders() const;
    inline TArrayView<const TMemoryImagePtr<FShaderPipeline>> GetShaderPipelines() const;

    // Add / find interfaces for shaders and pipelines.
    void AddShader(const FHashedName& TypeName, int32 PermutationId, FShader* Shader);
    FShader* FindOrAddShader(const FHashedName& TypeName, int32 PermutationId, FShader* Shader);
    void AddShaderPipeline(FShaderPipeline* Pipeline);
    FShaderPipeline* FindOrAddShaderPipeline(FShaderPipeline* Pipeline);

    // Removal interfaces.
    void RemoveShaderTypePermutaion(const FHashedName& TypeName, int32 PermutationId);
    inline void RemoveShaderTypePermutaion(const FShaderType* Type, int32 PermutationId);
    void RemoveShaderPipelineType(const FShaderPipelineType* ShaderPipelineType);

    // Gets the shader list.
    void GetShaderList(const FShaderMapBase& InShaderMap, const FSHAHash& InMaterialShaderMapHash, TMap<FShaderId, TShaderRef<FShader>>& OutShaders) const;
    void GetShaderList(const FShaderMapBase& InShaderMap, TMap<FHashedName, TShaderRef<FShader>>& OutShaders) const;

    // Gets the shader pipeline list.
    void GetShaderPipelineList(const FShaderMapBase& InShaderMap, TArray<FShaderPipelineRef>& OutShaderPipelines, FShaderPipeline::EFilter Filter) const;

    (.......)

    // Gets the maximum instruction count for a shader.
    uint32 GetMaxNumInstructionsForShader(const FShaderMapBase& InShaderMap, FShaderType* ShaderType) const;
    // Saves the compiled shader code.
    void Finalize(const FShaderMapResourceCode* Code);
    // Updates the hash.
    void UpdateHash(FSHA1& Hasher) const;

protected:
    using FMemoryImageHashTable = THashTable<FMemoryImageAllocator>;

    // Shader hash table.
    LAYOUT_FIELD(FMemoryImageHashTable, ShaderHash);
    // Shader types.
    LAYOUT_FIELD(TMemoryImageArray<FHashedName>, ShaderTypes);
    // Shader permutation list.
    LAYOUT_FIELD(TMemoryImageArray<int32>, ShaderPermutations);
    // Shader instance list.
    LAYOUT_FIELD(TMemoryImageArray<TMemoryImagePtr<FShader>>, Shaders);
    // Shader pipeline list.
    LAYOUT_FIELD(TMemoryImageArray<TMemoryImagePtr<FShaderPipeline>>, ShaderPipelines);
    // Platform the shaders were compiled for.
    LAYOUT_FIELD(TEnumAsByte<EShaderPlatform>, Platform);
};

// Base class of the shader map (TShaderMap).
class FShaderMapBase
{
public:
    (......)

private:
    const FTypeLayoutDesc& ContentTypeLayout;
    // ShaderMap resource.
    TRefCountPtr<FShaderMapResource> Resource;
    // ShaderMap resource code.
    TRefCountPtr<FShaderMapResourceCode> Code;
    // ShaderMap pointer table.
    FShaderMapPointerTable* PointerTable;
    // ShaderMap content.
    FShaderMapContent* Content;
    // Size of the frozen content.
    uint32 FrozenContentSize;
    // Number of frozen shaders.
    uint32 NumFrozenShaders;
};

// Shader map. Requires an FShaderMapContent type and an FShaderMapPointerTable type.
template<typename ContentType, typename PointerTableType = FShaderMapPointerTable>
class TShaderMap : public FShaderMapBase
{
public:
    inline const PointerTableType& GetPointerTable();
    inline const ContentType* GetContent() const;
    inline ContentType* GetMutableContent();

    void FinalizeContent()
    {
        ContentType* LocalContent = this->GetMutableContent();
        LocalContent->Finalize(this->GetResourceCode());
        FShaderMapBase::FinalizeContent();
    }

protected:
    TShaderMap();
    virtual FShaderMapPointerTable* CreatePointerTable();
};

// Reference to a shader pipeline.
class FShaderPipelineRef
{
public:
    FShaderPipelineRef();
    FShaderPipelineRef(FShaderPipeline* InPipeline, const FShaderMapBase& InShaderMap);

    (......)

    // Gets shaders.
    template<typename ShaderType>
    TShaderRef<ShaderType> GetShader() const;
    TShaderRef<FShader> GetShader(EShaderFrequency Frequency) const;
    inline TArray<TShaderRef<FShader>> GetShaders() const;

    // Accessors for the shader pipeline, resource, and pointer table.
    inline FShaderPipeline* GetPipeline() const;
    FShaderMapResource* GetResource() const;
    const FShaderMapPointerTable& GetPointerTable() const;

    inline FShaderPipeline* operator->() const;

private:
    FShaderPipeline* ShaderPipeline; // Shader pipeline.
    const FShaderMapBase* ShaderMap; // Shader map.
};

Many of the types above are base classes; the concrete logic is left to their subclasses.
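The core service that FShaderMapContent offers, finding a shader by type name and permutation id, can be pictured with the simplified sketch below. It deliberately swaps the frozen memory-image containers for std::unordered_map, so it only mirrors the interface shape, not the storage layout. `SimpleShader`, `ShaderKey`, `ShaderKeyHash`, and `SimpleShaderMapContent` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Placeholder for a compiled shader.
struct SimpleShader
{
    std::string DebugName;
};

// Key made of the shader type name and the permutation id.
struct ShaderKey
{
    std::string TypeName;
    int PermutationId;

    bool operator==(const ShaderKey& Other) const
    {
        return PermutationId == Other.PermutationId && TypeName == Other.TypeName;
    }
};

struct ShaderKeyHash
{
    std::size_t operator()(const ShaderKey& Key) const
    {
        // Combine the hashes of both parts of the key.
        return std::hash<std::string>()(Key.TypeName) * 31u + std::hash<int>()(Key.PermutationId);
    }
};

class SimpleShaderMapContent
{
public:
    // Returns the shader already stored for the key, or stores and returns the new one.
    SimpleShader* FindOrAddShader(const std::string& TypeName, int PermutationId, std::unique_ptr<SimpleShader> Shader)
    {
        assert(Shader != nullptr);
        const ShaderKey Key{TypeName, PermutationId};
        auto It = Shaders.find(Key);
        if (It == Shaders.end())
        {
            It = Shaders.emplace(Key, std::move(Shader)).first;
        }
        return It->second.get();
    }

    // Returns nullptr when the permutation has not been added.
    SimpleShader* GetShader(const std::string& TypeName, int PermutationId = 0) const
    {
        auto It = Shaders.find(ShaderKey{TypeName, PermutationId});
        return It != Shaders.end() ? It->second.get() : nullptr;
    }

    bool HasShader(const std::string& TypeName, int PermutationId) const
    {
        return GetShader(TypeName, PermutationId) != nullptr;
    }

    std::size_t GetNumShaders() const { return Shaders.size(); }

private:
    std::unordered_map<ShaderKey, std::unique_ptr<SimpleShader>, ShaderKeyHash> Shaders;
};
```

Note that the same type name with two different permutation ids yields two independent entries, which matches the idea that each permutation is a separately compiled shader.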

8.3.1.2 FGlobalShaderMap

FGlobalShaderMap stores and manages all compiled FGlobalShader code. Its definition and related types are shown below:

// Engine\Source\Runtime\RenderCore\Public\GlobalShader.h

// Shader meta type for the simplest shaders (not linked to a material or a vertex factory); there should be only one instance of each such shader.
class FGlobalShaderType : public FShaderType
{
    friend class FGlobalShaderTypeCompiler;
public:

    typedef FShader::CompiledShaderInitializerType CompiledShaderInitializerType;

    FGlobalShaderType(...);

    bool ShouldCompilePermutation(EShaderPlatform Platform, int32 PermutationId) const;
    void SetupCompileEnvironment(EShaderPlatform Platform, int32 PermutationId, FShaderCompilerEnvironment& Environment);
};

// Content of one global shader map section.
class FGlobalShaderMapContent : public FShaderMapContent
{
    (......)
public:
    const FHashedName& GetHashedSourceFilename();

private:
    inline FGlobalShaderMapContent(EShaderPlatform InPlatform, const FHashedName& InHashedSourceFilename);

    // Hashed source file name.
    LAYOUT_FIELD(FHashedName, HashedSourceFilename);
};

class FGlobalShaderMapSection : public TShaderMap<FGlobalShaderMapContent, FShaderMapPointerTable>
{
    (......)
    
private:
    inline FGlobalShaderMapSection();
    inline FGlobalShaderMapSection(EShaderPlatform InPlatform, const FHashedName& InHashedSourceFilename);

    TShaderRef<FShader> GetShader(FShaderType* ShaderType, int32 PermutationId = 0) const;
    FShaderPipelineRef GetShaderPipeline(const FShaderPipelineType* PipelineType) const;
};

// Global ShaderMap.
class FGlobalShaderMap
{
public:
    explicit FGlobalShaderMap(EShaderPlatform InPlatform);
    ~FGlobalShaderMap();

    // Gets the compiled shader for a shader type and permutation id.
    TShaderRef<FShader> GetShader(FShaderType* ShaderType, int32 PermutationId = 0) const;
    // Gets the compiled shader for a permutation id.
    template<typename ShaderType>
    TShaderRef<ShaderType> GetShader(int32 PermutationId = 0) const
    {
        TShaderRef<FShader> Shader = GetShader(&ShaderType::StaticType, PermutationId);
        return TShaderRef<ShaderType>::Cast(Shader);
    }
    // Gets the compiled shader for a permutation vector of the shader type.
    template<typename ShaderType>
    TShaderRef<ShaderType> GetShader(const typename ShaderType::FPermutationDomain& PermutationVector) const
    {
        return GetShader<ShaderType>(PermutationVector.ToDimensionValueId());
    }
    
    // Checks whether the given shader exists.
    bool HasShader(FShaderType* Type, int32 PermutationId) const
    {
        return GetShader(Type, PermutationId).IsValid();
    }
    
    // Gets a shader pipeline.
    FShaderPipelineRef GetShaderPipeline(const FShaderPipelineType* PipelineType) const;

    // Whether the given shader pipeline exists.
    bool HasShaderPipeline(const FShaderPipelineType* ShaderPipelineType) const
    {
        return GetShaderPipeline(ShaderPipelineType).IsValid();
    }

    bool IsEmpty() const;
    void Empty();
    void ReleaseAllSections();

    // Finds or adds a shader.
    FShader* FindOrAddShader(const FShaderType* ShaderType, int32 PermutationId, FShader* Shader);
    // Finds or adds a shader pipeline.
    FShaderPipeline* FindOrAddShaderPipeline(const FShaderPipelineType* ShaderPipelineType, FShaderPipeline* ShaderPipeline);

    // Removal interfaces.
    void RemoveShaderTypePermutaion(const FShaderType* Type, int32 PermutationId);
    void RemoveShaderPipelineType(const FShaderPipelineType* ShaderPipelineType);

    // ShaderMapSection operations.
    void AddSection(FGlobalShaderMapSection* InSection);
    FGlobalShaderMapSection* FindSection(const FHashedName& HashedShaderFilename);
    FGlobalShaderMapSection* FindOrAddSection(const FShaderType* ShaderType);
    
    // IO interfaces.
    void LoadFromGlobalArchive(FArchive& Ar);
    void SaveToGlobalArchive(FArchive& Ar);

    // Begins creating all shaders.
    void BeginCreateAllShaders();

    (......)

private:
    // Map that stores the FGlobalShaderMapSection instances.
    TMap<FHashedName, FGlobalShaderMapSection*> SectionMap;
    EShaderPlatform Platform;
};

// Array of global ShaderMaps, one per shader platform (SP_NumPlatforms is 49 here).
extern RENDERCORE_API FGlobalShaderMap* GGlobalShaderMap[SP_NumPlatforms];

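The code above involves the Content, Section, PointerTable, ShaderType, and other types and concepts of a ShaderMap. There is a lot of data with complicated relationships, but it becomes much clearer once abstracted into a UML diagram: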

classDiagram-v2
FShaderType <|-- FGlobalShaderType
FPointerTableBase <|-- FShaderMapPointerTable

FShaderMapContent <|-- FGlobalShaderMapContent

FShaderMapBase <|-- TShaderMap
TShaderMap <|-- FGlobalShaderMapSection

FShaderPipeline <-- FShaderPipelineRef

For brevity, the class diagram above shows only inheritance. With association, aggregation, and composition added, it looks like this:

classDiagram-v2
FShaderType <|-- FGlobalShaderType
FPointerTableBase <|-- FShaderMapPointerTable

FShaderMapContent <|-- FGlobalShaderMapContent

FShaderMapBase <|-- TShaderMap
TShaderMap <|-- FGlobalShaderMapSection

FShaderPipeline <-- FShaderPipelineRef

FShader o-- FShaderPipeline
class FShaderPipeline{
FShader Shaders[5]
}

FShaderPipeline <-- FShaderMapContent
FShaderType <-- FShaderMapContent
FShader o-- FShaderMapContent

class FShaderMapContent{
FHashedName ShaderTypes
FShader Shaders
FShaderPipeline ShaderPipelines
}

FShaderMapContent <-- FShaderMapBase
FShaderMapPointerTable <-- FShaderMapBase

class FShaderMapBase{
FShaderMapPointerTable* PointerTable
FShaderMapContent* Content
}

class FShaderPipelineRef{
FShaderPipeline* ShaderPipeline
}

class FGlobalShaderMapContent{
FHashedName HashedSourceFilename
}

FGlobalShaderMapSection o-- FGlobalShaderMap

class FGlobalShaderMap{
TMap<FHashedName, FGlobalShaderMapSection*> SectionMap
}
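The Section layer in the diagram can be read as a two-step lookup: first find the section for the source file that defines the shader type, then find the permutation inside that section. The sketch below illustrates this under simplifying assumptions: plain strings replace FHashedName, a string stands in for compiled code, and `SimpleShaderType`, `SimpleSection`, and `SimpleGlobalShaderMap` are hypothetical names rather than engine types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical description of a shader type: its name and the source file that defines it.
struct SimpleShaderType
{
    std::string Name;
    std::string SourceFilename;
};

// One section holds every compiled permutation that comes from one source file.
struct SimpleSection
{
    // (type name, permutation id) -> compiled code placeholder
    std::map<std::pair<std::string, int>, std::string> Shaders;
};

class SimpleGlobalShaderMap
{
public:
    // Finds the section for the type's source file, creating it on first use.
    SimpleSection& FindOrAddSection(const SimpleShaderType& Type)
    {
        assert(!Type.SourceFilename.empty());
        std::unique_ptr<SimpleSection>& Slot = SectionMap[Type.SourceFilename];
        if (!Slot)
        {
            Slot.reset(new SimpleSection());
        }
        return *Slot;
    }

    const SimpleSection* FindSection(const std::string& SourceFilename) const
    {
        auto It = SectionMap.find(SourceFilename);
        return It != SectionMap.end() ? It->second.get() : nullptr;
    }

    void AddShader(const SimpleShaderType& Type, int PermutationId, const std::string& Code)
    {
        FindOrAddSection(Type).Shaders[std::make_pair(Type.Name, PermutationId)] = Code;
    }

    // Returns nullptr when either the section or the permutation is missing.
    const std::string* GetShader(const SimpleShaderType& Type, int PermutationId = 0) const
    {
        const SimpleSection* Section = FindSection(Type.SourceFilename);
        if (!Section)
        {
            return nullptr;
        }
        auto It = Section->Shaders.find(std::make_pair(Type.Name, PermutationId));
        return It != Section->Shaders.end() ? &It->second : nullptr;
    }

    std::size_t GetNumSections() const { return SectionMap.size(); }

private:
    std::map<std::string, std::unique_ptr<SimpleSection>> SectionMap;
};
```

Grouping by source file means that shader types defined in the same file share one section, so a whole file's worth of shaders can be cached or replaced as a unit.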

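Having covered FGlobalShaderMap and its relationships with the core classes, let's look at how it is used in actual rendering. First, GlobalShader.h and GlobalShader.cpp declare and define the FGlobalShaderMap instances and the related interfaces: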

// Engine\Source\Runtime\RenderCore\Public\GlobalShader.h

// Declares the externally accessible array of FGlobalShaderMap pointers.
extern RENDERCORE_API FGlobalShaderMap* GGlobalShaderMap[SP_NumPlatforms];

// Gets the FGlobalShaderMap for the given shader platform.
extern RENDERCORE_API FGlobalShaderMap* GetGlobalShaderMap(EShaderPlatform Platform);

// Gets the FGlobalShaderMap for the given FeatureLevel.
inline FGlobalShaderMap* GetGlobalShaderMap(ERHIFeatureLevel::Type FeatureLevel)
{ 
    return GetGlobalShaderMap(GShaderPlatformForFeatureLevel[FeatureLevel]); 
}

// Engine\Source\Runtime\RenderCore\Private\GlobalShader.cpp

// Defines the FGlobalShaderMap array for all shader platforms.
FGlobalShaderMap* GGlobalShaderMap[SP_NumPlatforms] = {};

// Gets the FGlobalShaderMap.
FGlobalShaderMap* GetGlobalShaderMap(EShaderPlatform Platform)
{
    return GGlobalShaderMap[Platform];
}

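However, the code above only defines GGlobalShaderMap, and the array initially holds nothing but null pointers. The actual creation call chain is as follows: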

// Engine\Source\Runtime\Launch\Private\LaunchEngineLoop.cpp

// Engine pre-initialization.
int32 FEngineLoop::PreInitPreStartupScreen(const TCHAR* CmdLine)
{
    (......)
    
    // Whether shader compilation is enabled; it usually is.
    bool bEnableShaderCompile = !FParse::Param(FCommandLine::Get(), TEXT("NoShaderCompile"));
    
    (......)
    
    if (bEnableShaderCompile && !IsRunningDedicatedServer() && !bIsCook)
    {
        (......)
        
        // Compile the GlobalShaderMap.
        CompileGlobalShaderMap(false);
        
        (......)
    }
    
    (......)
}

// Engine\Source\Runtime\Engine\Private\ShaderCompiler\ShaderCompiler.cpp

void CompileGlobalShaderMap(EShaderPlatform Platform, const ITargetPlatform* TargetPlatform, bool bRefreshShaderMap)
{
    (......)

    // If the GlobalShaderMap for this platform has not been created yet, create it.
    if (!GGlobalShaderMap[Platform])
    {
        (......)

        // Create the FGlobalShaderMap for this platform.
        GGlobalShaderMap[Platform] = new FGlobalShaderMap(Platform);

        // Cooked mode.
        if (FPlatformProperties::RequiresCookedData())
        {
            (......)
        }
        // Uncooked mode.
        else
        {
            // Id of the FGlobalShaderMap.
            FGlobalShaderMapId ShaderMapId(Platform);

            const int32 ShaderFilenameNum = ShaderMapId.GetShaderFilenameToDependeciesMap().Num();
            const float ProgressStep = 25.0f / ShaderFilenameNum;

            TArray<uint32> AsyncDDCRequestHandles;
            AsyncDDCRequestHandles.SetNum(ShaderFilenameNum);

            int32 HandleIndex = 0;

            // Submit the DDC requests.
            for (const auto& ShaderFilenameDependencies : ShaderMapId.GetShaderFilenameToDependeciesMap())
            {
                SlowTask.EnterProgressFrame(ProgressStep);

                const FString DataKey = GetGlobalShaderMapKeyString(ShaderMapId, Platform, TargetPlatform, ShaderFilenameDependencies.Value);

                AsyncDDCRequestHandles[HandleIndex] = GetDerivedDataCacheRef().GetAsynchronous(*DataKey, TEXT("GlobalShaderMap"_SV));

                ++HandleIndex;
            }

            // Wait for each DDC request to complete and process its result.
            TArray<uint8> CachedData;
            HandleIndex = 0;
            for (const auto& ShaderFilenameDependencies : ShaderMapId.GetShaderFilenameToDependeciesMap())
            {
                SlowTask.EnterProgressFrame(ProgressStep);
                CachedData.Reset();
                
                GetDerivedDataCacheRef().WaitAsynchronousCompletion(AsyncDDCRequestHandles[HandleIndex]);
                if (GetDerivedDataCacheRef().GetAsynchronousResults(AsyncDDCRequestHandles[HandleIndex], CachedData))
                {
                    FMemoryReader MemoryReader(CachedData);
                    GGlobalShaderMap[Platform]->AddSection(FGlobalShaderMapSection::CreateFromArchive(MemoryReader));
                }
                else
                {
                    // Not found in the DDC; ignore it.
                }

                ++HandleIndex;
            }
        }

        // If any shaders were not loaded, compile them.
        VerifyGlobalShaders(Platform, bLoadedFromCacheFile);

        // Create all shaders.
        if (GCreateShadersOnLoad && Platform == GMaxRHIShaderPlatform)
        {
            GGlobalShaderMap[Platform]->BeginCreateAllShaders();
        }
    }
}

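As shown above, the FGlobalShaderMap instance is created during engine pre-initialization, after which the engine tries to read already-compiled shader data from the DDC. From then on, other modules can access and operate on the FGlobalShaderMap objects normally.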
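The uncooked path above follows a two-phase pattern: submit every cache request first, then wait on each handle in the same order, so the requests can overlap instead of running one by one. The sketch below shows only that control flow. It makes strong simplifying assumptions: `FakeDerivedDataCache` is a hypothetical in-memory stand-in, its method names are invented for illustration, and no real threads are used (the lookup is merely deferred until the wait call).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cache with a handle-based request API.
// Work is deferred until WaitForCompletion to mimic a request that finishes later.
class FakeDerivedDataCache
{
    struct Request
    {
        std::string Key;
        bool bDone;
        bool bFound;
        std::string Data;
    };

public:
    explicit FakeDerivedDataCache(std::map<std::string, std::string> InStorage)
        : Storage(std::move(InStorage)) {}

    // Submits a request and returns its handle immediately.
    std::size_t RequestAsync(const std::string& Key)
    {
        Requests.push_back(Request{Key, false, false, std::string()});
        return Requests.size() - 1;
    }

    // Completes the request identified by Handle.
    void WaitForCompletion(std::size_t Handle)
    {
        Request& R = Requests.at(Handle);
        if (R.bDone)
        {
            return;
        }
        auto It = Storage.find(R.Key);
        R.bFound = (It != Storage.end());
        if (R.bFound)
        {
            R.Data = It->second;
        }
        R.bDone = true;
    }

    // Copies the payload out; returns false on a cache miss.
    bool GetResults(std::size_t Handle, std::string& OutData) const
    {
        const Request& R = Requests.at(Handle);
        assert(R.bDone);
        if (R.bFound)
        {
            OutData = R.Data;
        }
        return R.bFound;
    }

private:
    std::map<std::string, std::string> Storage;
    std::vector<Request> Requests;
};

// Phase 1 submits every request; phase 2 collects them in the same order.
// Returns the keys that missed, i.e. the sections that would still need compiling.
std::vector<std::string> LoadSections(FakeDerivedDataCache& Cache,
                                      const std::vector<std::string>& Keys,
                                      std::map<std::string, std::string>& OutLoaded)
{
    std::vector<std::size_t> Handles;
    for (const std::string& Key : Keys)
    {
        Handles.push_back(Cache.RequestAsync(Key));
    }

    std::vector<std::string> Missing;
    for (std::size_t i = 0; i < Keys.size(); ++i)
    {
        std::string Data;
        Cache.WaitForCompletion(Handles[i]);
        if (Cache.GetResults(Handles[i], Data))
        {
            OutLoaded[Keys[i]] = Data;
        }
        else
        {
            Missing.push_back(Keys[i]);
        }
    }
    return Missing;
}
```

The keys returned as missing play the role of the sections that a later verification step would have to compile from source.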

In addition, FViewInfo also holds a pointer to an FGlobalShaderMap, which it likewise obtains through GetGlobalShaderMap:

// Engine\Source\Runtime\Renderer\Private\SceneRendering.h

class FViewInfo : public FSceneView
{
public:
    (......)
    
    FGlobalShaderMap* ShaderMap;
    
    (......)
};

// Engine\Source\Runtime\Renderer\Private\SceneRendering.cpp

void FViewInfo::Init()
{
    (......)

    ShaderMap = GetGlobalShaderMap(FeatureLevel);
    
    (......)
}

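Since most logic in the renderer module has easy access to an FViewInfo instance, it can also conveniently reach the FGlobalShaderMap instance (without even having to specify a FeatureLevel).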

8.3.1.3 FMaterialShaderMap

FMaterialShaderMap stores and manages a set of FMaterialShader instances. It and its related types are defined as follows:

// Engine\Source\Runtime\Engine\Public\MaterialShared.h

// Content of a material ShaderMap.
class FMaterialShaderMapContent : public FShaderMapContent
{
public:
    (......)

    inline uint32 GetNumShaders() const;
    inline uint32 GetNumShaderPipelines() const;

private:
    struct FProjectMeshShaderMapToKey
    {
        inline const FHashedName& operator()(const FMeshMaterialShaderMap* InShaderMap) { return InShaderMap->GetVertexFactoryTypeName(); }
    };

    // Get / add / remove operations.
    FMeshMaterialShaderMap* GetMeshShaderMap(const FHashedName& VertexFactoryTypeName) const;
    void AddMeshShaderMap(const FVertexFactoryType* VertexFactoryType, FMeshMaterialShaderMap* MeshShaderMap);
    void RemoveMeshShaderMap(const FVertexFactoryType* VertexFactoryType);

    // Ordered mesh shader maps, indexed by VFType->GetId(), for fast lookup at runtime.
    LAYOUT_FIELD(TMemoryImageArray<TMemoryImagePtr<FMeshMaterialShaderMap>>, OrderedMeshShaderMaps);
    // Material compilation output.
    LAYOUT_FIELD(FMaterialCompilationOutput, MaterialCompilationOutput);
    // Hash of the shader content.
    LAYOUT_FIELD(FSHAHash, ShaderContentHash);

    LAYOUT_FIELD_EDITORONLY(TMemoryImageArray<FMaterialProcessedSource>, ShaderProcessedSource);
    LAYOUT_FIELD_EDITORONLY(FMemoryImageString, FriendlyName);
    LAYOUT_FIELD_EDITORONLY(FMemoryImageString, DebugDescription);
    LAYOUT_FIELD_EDITORONLY(FMemoryImageString, MaterialPath);
};

// Material shader map; its parent class is TShaderMap.
class FMaterialShaderMap : public TShaderMap<FMaterialShaderMapContent, FShaderMapPointerTable>, public FDeferredCleanupInterface
{
public:
    using Super = TShaderMap<FMaterialShaderMapContent, FShaderMapPointerTable>;

    // Finds the FMaterialShaderMap instance for the given id and platform.
    static TRefCountPtr<FMaterialShaderMap> FindId(const FMaterialShaderMapId& ShaderMapId, EShaderPlatform Platform);

    (......)

    // ShaderMap interface
    // Gets shader instances.
    TShaderRef<FShader> GetShader(FShaderType* ShaderType, int32 PermutationId = 0) const;
    template<typename ShaderType> TShaderRef<ShaderType> GetShader(int32 PermutationId = 0) const;
    template<typename ShaderType> TShaderRef<ShaderType> GetShader(const typename ShaderType::FPermutationDomain& PermutationVector) const;

    uint32 GetMaxNumInstructionsForShader(FShaderType* ShaderType) const;

    void FinalizeContent();

    // Compiles the shaders for a material and caches them in the shader map.
    void Compile(FMaterial* Material,const FMaterialShaderMapId& ShaderMapId, TRefCountPtr<FShaderCompilerEnvironment> MaterialEnvironment, const FMaterialCompilationOutput& InMaterialCompilationOutput, EShaderPlatform Platform, bool bSynchronousCompile);

    // Checks whether any shaders are missing.
    bool IsComplete(const FMaterial* Material, bool bSilent);
    // Tries to add the material to an existing compilation task.
    bool TryToAddToExistingCompilationTask(FMaterial* Material);

    // Builds the list of shaders in the shader map.
    void GetShaderList(TMap<FShaderId, TShaderRef<FShader>>& OutShaders) const;
    void GetShaderList(TMap<FHashedName, TShaderRef<FShader>>& OutShaders) const;
    void GetShaderPipelineList(TArray<FShaderPipelineRef>& OutShaderPipelines) const;

    uint32 GetShaderNum() const;

    // Registers a material shader map in the global map so that it can be used by materials.
    void Register(EShaderPlatform InShaderPlatform);

    // Reference counting.
    void AddRef();
    void Release();

    // Removes all cached entries of the given shader type.
    void FlushShadersByShaderType(const FShaderType* ShaderType);
    void FlushShadersByShaderPipelineType(const FShaderPipelineType* ShaderPipelineType);
    void FlushShadersByVertexFactoryType(const FVertexFactoryType* VertexFactoryType);
    
    static void RemovePendingMaterial(FMaterial* Material);
    static const FMaterialShaderMap* GetShaderMapBeingCompiled(const FMaterial* Material);

    // Accessors.
    FMeshMaterialShaderMap* GetMeshShaderMap(FVertexFactoryType* VertexFactoryType) const;
    FMeshMaterialShaderMap* GetMeshShaderMap(const FHashedName& VertexFactoryTypeName) const;
    const FMaterialShaderMapId& GetShaderMapId() const;
    
    (......)

private:
    // Global material shader maps.
    static TMap<FMaterialShaderMapId,FMaterialShaderMap*> GIdToMaterialShaderMap[SP_NumPlatforms];
    static FCriticalSection GIdToMaterialShaderMapCS;
    // Materials currently being compiled.
    static TMap<TRefCountPtr<FMaterialShaderMap>, TArray<FMaterial*> > ShaderMapsBeingCompiled;

    // Shader map id.
    FMaterialShaderMapId ShaderMapId;
    // Id used during compilation.
    uint32 CompilingId;
    // Target platform being compiled for.
    const ITargetPlatform* CompilingTargetPlatform;

    // Reference count.
    mutable int32 NumRefs;

    // Flags.
    bool bDeletedThroughDeferredCleanup;
    uint32 bRegistered : 1;
    uint32 bCompilationFinalized : 1;
    uint32 bCompiledSuccessfully : 1;
    uint32 bIsPersistent : 1;

    (......)
};

Unlike FGlobalShaderMap, FMaterialShaderMap is additionally associated with a material and with vertex factory types. The internal contents of a single FMaterialShaderMap look like this:

FMaterialShaderMap
    FLightFunctionPixelShader - FMaterialShaderType
    FLocalVertexFactory - FVertexFactoryType
        TDepthOnlyPS - FMeshMaterialShaderType
        TDepthOnlyVS - FMeshMaterialShaderType
        TBasePassPS - FMeshMaterialShaderType
        TBasePassVS - FMeshMaterialShaderType
        (......)
    FGPUSkinVertexFactory - FVertexFactoryType
        (......)

FMaterialShaderMap is bound to a material blueprint, because it is a member of FMaterial:

// Engine\Source\Runtime\Engine\Public\MaterialShared.h

class FMaterial
{
public:
    // Gets a shader instance of the material.
    TShaderRef<FShader> GetShader(class FMeshMaterialShaderType* ShaderType, FVertexFactoryType* VertexFactoryType, int32 PermutationId, bool bFatalIfMissing = true) const;
    
    (......)
    
private:
    // Material ShaderMap for the game thread.
    TRefCountPtr<FMaterialShaderMap> GameThreadShaderMap;
    // Material ShaderMap for the rendering thread.
    TRefCountPtr<FMaterialShaderMap> RenderingThreadShaderMap;
    
    (......)
};

// Engine\Source\Runtime\Engine\Private\Materials\MaterialShared.cpp

TShaderRef<FShader> FMaterial::GetShader(FMeshMaterialShaderType* ShaderType, FVertexFactoryType* VertexFactoryType, int32 PermutationId, bool bFatalIfMissing) const
{
    // Fetch the shader from RenderingThreadShaderMap.
    const FMeshMaterialShaderMap* MeshShaderMap = RenderingThreadShaderMap->GetMeshShaderMap(VertexFactoryType);
    FShader* Shader = MeshShaderMap ? MeshShaderMap->GetShader(ShaderType, PermutationId) : nullptr;
    
    (......)

    // Return the FShader reference.
    return TShaderRef<FShader>(Shader, *RenderingThreadShaderMap);
}

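So each FMaterial has an FMaterialShaderMap (one for the game thread and one for the rendering thread). To get a shader of a given type for an FMaterial, it must be fetched from that FMaterial's FMaterialShaderMap instance, which is what links the two together.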
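The chain from material to shader can be summarized as: material, then its shader map, then the mesh shader map for a vertex factory, then the shader for a type and permutation. The sketch below models that chain with hypothetical types (`SimpleMeshShaderMap`, `SimpleMaterialShaderMap`, `SimpleMaterial`). It assumes vertex factories are identified by a small integer id used as an array index, and it adds null checks that the engine excerpt above does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-vertex-factory table: (shader type name, permutation id) -> compiled code placeholder.
struct SimpleMeshShaderMap
{
    std::map<std::pair<std::string, int>, std::string> Shaders;
};

// Hypothetical material shader map: mesh shader maps ordered by vertex factory id.
struct SimpleMaterialShaderMap
{
    std::vector<std::unique_ptr<SimpleMeshShaderMap>> OrderedMeshShaderMaps;

    // Grows the array as needed and creates the slot on first use.
    SimpleMeshShaderMap& FindOrAddMeshShaderMap(std::size_t VertexFactoryId)
    {
        if (VertexFactoryId >= OrderedMeshShaderMaps.size())
        {
            OrderedMeshShaderMaps.resize(VertexFactoryId + 1);
        }
        std::unique_ptr<SimpleMeshShaderMap>& Slot = OrderedMeshShaderMaps[VertexFactoryId];
        if (!Slot)
        {
            Slot.reset(new SimpleMeshShaderMap());
        }
        assert(Slot != nullptr);
        return *Slot;
    }

    // Direct index lookup; returns nullptr for unknown or empty slots.
    const SimpleMeshShaderMap* GetMeshShaderMap(std::size_t VertexFactoryId) const
    {
        return VertexFactoryId < OrderedMeshShaderMaps.size()
            ? OrderedMeshShaderMaps[VertexFactoryId].get()
            : nullptr;
    }
};

// Hypothetical material that owns a shader map for rendering.
struct SimpleMaterial
{
    std::shared_ptr<SimpleMaterialShaderMap> RenderingThreadShaderMap;

    const std::string* GetShader(const std::string& ShaderTypeName, std::size_t VertexFactoryId, int PermutationId) const
    {
        if (!RenderingThreadShaderMap)
        {
            return nullptr;
        }
        const SimpleMeshShaderMap* MeshMap = RenderingThreadShaderMap->GetMeshShaderMap(VertexFactoryId);
        if (!MeshMap)
        {
            return nullptr;
        }
        auto It = MeshMap->Shaders.find(std::make_pair(ShaderTypeName, PermutationId));
        return It != MeshMap->Shaders.end() ? &It->second : nullptr;
    }
};
```

Indexing the mesh shader maps by vertex factory id keeps the first step of the lookup to a single array access, which is the kind of fast runtime lookup the OrderedMeshShaderMaps comment refers to.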

8.3.1.4 FMeshMaterialShaderMap

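The previous subsections showed that FGlobalShaderMap stores and manages FGlobalShader, and FMaterialShaderMap stores and manages FMaterialShader. Correspondingly, FMeshMaterialShaderMap stores and manages FMeshMaterialShader. It is defined as follows: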

// Engine\Source\Runtime\Engine\Public\MaterialShared.h

class FMeshMaterialShaderMap : public FShaderMapContent
{
public:
    FMeshMaterialShaderMap(EShaderPlatform InPlatform, FVertexFactoryType* InVFType);

    // Begins compiling all shaders for the given material and vertex factory type.
    uint32 BeginCompile(
        uint32 ShaderMapId,
        const FMaterialShaderMapId& InShaderMapId, 
        const FMaterial* Material,
        const FMeshMaterialShaderMapLayout& MeshLayout,
        FShaderCompilerEnvironment* MaterialEnvironment,
        EShaderPlatform Platform,
        TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>>& NewJobs,
        FString DebugDescription,
        FString DebugExtension
        );

    void FlushShadersByShaderType(const FShaderType* ShaderType);
    void FlushShadersByShaderPipelineType(const FShaderPipelineType* ShaderPipelineType);

    (......)

private:
    // Vertex factory type name.
    LAYOUT_FIELD(FHashedName, VertexFactoryTypeName);
};

An FMeshMaterialShaderMap is normally not created on its own. Instead, it is attached to an FMaterialShaderMapContent and is created and destroyed along with it; see the previous subsection for details and usage.

8.3.2 Shader Compilation

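This section describes how material blueprints and usf files are compiled into shader code for the target platform. To illustrate how a single shader file is compiled, let's trace how the RecompileShaders command is handled (the case followed here compiles global shaders):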

// Engine\Source\Runtime\Engine\Private\ShaderCompiler\ShaderCompiler.cpp

bool RecompileShaders(const TCHAR* Cmd, FOutputDevice& Ar)
{
    (......)

    FString FlagStr(FParse::Token(Cmd, 0));
    if( FlagStr.Len() > 0 )
    {
        // Flush the shader file cache.
        FlushShaderFileCache();
        // Flush rendering commands.
        FlushRenderingCommands();

        // Handle the `RecompileShaders Changed` command.
        if( FCString::Stricmp(*FlagStr,TEXT("Changed"))==0)
        {
            (......)
        }
        // Handle the `RecompileShaders Global` command.
        else if( FCString::Stricmp(*FlagStr,TEXT("Global"))==0)
        {
            (......)
        }
        // Handle the `RecompileShaders Material` command.
        else if( FCString::Stricmp(*FlagStr,TEXT("Material"))==0)
        {
            (......)
        }
        // Handle the `RecompileShaders All` command.
        else if( FCString::Stricmp(*FlagStr,TEXT("All"))==0)
        {
            (......)
        }
        // Handle the `RecompileShaders <ShaderPath>` command.
        else
        {
            // Look up the FShaderTypes by file name.
            TArray<const FShaderType*> ShaderTypes = FShaderType::GetShaderTypesByFilename(*FlagStr);
            // Look up the FShaderPipelineTypes by file name as well.
            TArray<const FShaderPipelineType*> ShaderPipelineTypes = FShaderPipelineType::GetShaderPipelineTypesByFilename(*FlagStr);
            if (ShaderTypes.Num() > 0 || ShaderPipelineTypes.Num() > 0)
            {
                FRecompileShadersTimer TestTimer(TEXT("RecompileShaders SingleShader"));
                
                TArray<const FVertexFactoryType*> FactoryTypes;

                // Iterate over all active feature levels and compile for each of them.
                UMaterialInterface::IterateOverActiveFeatureLevels([&](ERHIFeatureLevel::Type InFeatureLevel) {
                    auto ShaderPlatform = GShaderPlatformForFeatureLevel[InFeatureLevel];
                    // Begin compiling the shaders for the given ShaderTypes, ShaderPipelineTypes and ShaderPlatform.
                    BeginRecompileGlobalShaders(ShaderTypes, ShaderPipelineTypes, ShaderPlatform);
                    // Finish compiling.
                    FinishRecompileGlobalShaders();
                });
            }
        }

        return 1;
    }

    (......)
}
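
The branch structure above is a case-insensitive match on the first token, with anything unrecognized treated as a shader file path. A minimal standalone sketch of that classification, using only the standard library (RecompileModeSketch, ToLowerSketch and ClassifyRecompileFlag are hypothetical names for illustration, not UE APIs):

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical enum naming the five branches of the RecompileShaders handler.
enum class RecompileModeSketch { Changed, Global, Material, All, ShaderPath };

// Lower-cases a copy of the text, standing in for the case-insensitive Stricmp comparison.
std::string ToLowerSketch(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Maps the first command token to a branch; unknown tokens fall through to the file-path branch.
RecompileModeSketch ClassifyRecompileFlag(const std::string& flag)
{
    assert(!flag.empty()); // the original only dispatches when the token is non-empty
    const std::string lowered = ToLowerSketch(flag);
    if (lowered == "changed")  return RecompileModeSketch::Changed;
    if (lowered == "global")   return RecompileModeSketch::Global;
    if (lowered == "material") return RecompileModeSketch::Material;
    if (lowered == "all")      return RecompileModeSketch::All;
    return RecompileModeSketch::ShaderPath;
}
```

With this sketch, `ClassifyRecompileFlag("GLOBAL")` selects the Global branch, while a token such as `MyShader.usf` (a made-up file name) ends up in the shader-path branch that this section traces.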

The code above calls the key function BeginRecompileGlobalShaders to start compiling the specified shaders:

void BeginRecompileGlobalShaders(const TArray<const FShaderType*>& OutdatedShaderTypes, const TArray<const FShaderPipelineType*>& OutdatedShaderPipelineTypes, EShaderPlatform ShaderPlatform, const ITargetPlatform* TargetPlatform)
{
    if (!FPlatformProperties::RequiresCookedData())
    {
        // Flush pending accesses to the existing global shaders.
        FlushRenderingCommands();

        // Compile the global shader map.
        CompileGlobalShaderMap(ShaderPlatform, TargetPlatform, false);
        
        // Verify validity.
        FGlobalShaderMap* GlobalShaderMap = GetGlobalShaderMap(ShaderPlatform);
        if (OutdatedShaderTypes.Num() > 0 || OutdatedShaderPipelineTypes.Num() > 0)
        {
            VerifyGlobalShaders(ShaderPlatform, false, &OutdatedShaderTypes, &OutdatedShaderPipelineTypes);
        }
    }
}

// Compile a single global shader map.
void CompileGlobalShaderMap(EShaderPlatform Platform, const ITargetPlatform* TargetPlatform, bool bRefreshShaderMap)
{
    (......)

    // Delete the old resources.
    if (bRefreshShaderMap || GGlobalShaderTargetPlatform[Platform] != TargetPlatform)
    {
        delete GGlobalShaderMap[Platform];
        GGlobalShaderMap[Platform] = nullptr;

        GGlobalShaderTargetPlatform[Platform] = TargetPlatform;

        // Make sure we pick up updated shader source files.
        FlushShaderFileCache();
    }

    // Create and compile the shaders.
    if (!GGlobalShaderMap[Platform])
    {
        (......)

        GGlobalShaderMap[Platform] = new FGlobalShaderMap(Platform);

        (......)

        // Check whether any shaders are missing and compile them if so.
        VerifyGlobalShaders(Platform, bLoadedFromCacheFile);

        if (GCreateShadersOnLoad && Platform == GMaxRHIShaderPlatform)
        {
            GGlobalShaderMap[Platform]->BeginCreateAllShaders();
        }
    }
}

// Check whether any shaders are missing and compile them if so.
void VerifyGlobalShaders(EShaderPlatform Platform, bool bLoadedFromCacheFile, const TArray<const FShaderType*>* OutdatedShaderTypes, const TArray<const FShaderPipelineType*>* OutdatedShaderPipelineTypes)
{
    (......)

    // Get the FGlobalShaderMap instance.
    FGlobalShaderMap* GlobalShaderMap = GetGlobalShaderMap(Platform);
    
    (......)

    // All jobs, both single and pipeline.
    TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>> GlobalShaderJobs;

    // Add the single jobs first.
    TMap<TShaderTypePermutation<const FShaderType>, FShaderCompileJob*> SharedShaderJobs;

    for (TLinkedList<FShaderType*>::TIterator ShaderTypeIt(FShaderType::GetTypeList()); ShaderTypeIt; ShaderTypeIt.Next())
    {
        FGlobalShaderType* GlobalShaderType = ShaderTypeIt->GetGlobalShaderType();
        if (!GlobalShaderType)
        {
            continue;
        }

        int32 PermutationCountToCompile = 0;
        for (int32 PermutationId = 0; PermutationId < GlobalShaderType->GetPermutationCount(); PermutationId++)
        {
            if (GlobalShaderType->ShouldCompilePermutation(Platform, PermutationId) 
                && (!GlobalShaderMap->HasShader(GlobalShaderType, PermutationId) || (OutdatedShaderTypes && OutdatedShaderTypes->Contains(GlobalShaderType))))
            {
                // If outdated shader types were passed in, remove the existing permutation first.
                if (OutdatedShaderTypes)
                {
                    GlobalShaderMap->RemoveShaderTypePermutaion(GlobalShaderType, PermutationId);
                }

                // Create a job that compiles this global shader type.
                auto* Job = FGlobalShaderTypeCompiler::BeginCompileShader(GlobalShaderType, PermutationId, Platform, nullptr, GlobalShaderJobs);
                TShaderTypePermutation<const FShaderType> ShaderTypePermutation(GlobalShaderType, PermutationId);
                // Record the job in the shared job map.
                SharedShaderJobs.Add(ShaderTypePermutation, Job);
                PermutationCountToCompile++;
            }
        }

        (......)
    }

    // Handle FShaderPipelines. For a shareable pipeline there is no need to add duplicate jobs.
    for (TLinkedList<FShaderPipelineType*>::TIterator ShaderPipelineIt(FShaderPipelineType::GetTypeList()); ShaderPipelineIt; ShaderPipelineIt.Next())
    {
        const FShaderPipelineType* Pipeline = *ShaderPipelineIt;
        if (Pipeline->IsGlobalTypePipeline())
        {
            if (!GlobalShaderMap->HasShaderPipeline(Pipeline) || (OutdatedShaderPipelineTypes && OutdatedShaderPipelineTypes->Contains(Pipeline)))
            {
                auto& StageTypes = Pipeline->GetStages();
                TArray<FGlobalShaderType*> ShaderStages;
                for (int32 Index = 0; Index < StageTypes.Num(); ++Index)
                {
                    FGlobalShaderType* GlobalShaderType = ((FShaderType*)(StageTypes[Index]))->GetGlobalShaderType();
                    if (GlobalShaderType->ShouldCompilePermutation(Platform, kUniqueShaderPermutationId))
                    {
                        ShaderStages.Add(GlobalShaderType);
                    }
                    else
                    {
                        break;
                    }
                }

                // Remove the outdated pipeline type.
                if (OutdatedShaderPipelineTypes)
                {
                    GlobalShaderMap->RemoveShaderPipelineType(Pipeline);
                }

                if (ShaderStages.Num() == StageTypes.Num())
                {
                    (......)

                    if (Pipeline->ShouldOptimizeUnusedOutputs(Platform))
                    {
                        // Make a pipeline job with all the stages
                        FGlobalShaderTypeCompiler::BeginCompileShaderPipeline(Platform, Pipeline, ShaderStages, GlobalShaderJobs);
                    }
                    else
                    {
                        for (const FShaderType* ShaderType : StageTypes)
                        {
                            TShaderTypePermutation<const FShaderType> ShaderTypePermutation(ShaderType, kUniqueShaderPermutationId);

                            FShaderCompileJob** Job = SharedShaderJobs.Find(ShaderTypePermutation);
                            auto* SingleJob = (*Job)->GetSingleShaderJob();
                            auto& SharedPipelinesInJob = SingleJob->SharingPipelines.FindOrAdd(nullptr);
                            // Register the pipeline on the shared single-shader job.
                            SharedPipelinesInJob.Add(Pipeline);
                        }
                    }
                }
            }
        }
    }

    if (GlobalShaderJobs.Num() > 0)
    {
        GetOnGlobalShaderCompilation().Broadcast();
        // Add the compile jobs to GShaderCompilingManager.
        GShaderCompilingManager->AddJobs(GlobalShaderJobs, true, false, "Globals");

        // Some platforms do not support asynchronous shader compilation.
        const bool bAllowAsynchronousGlobalShaderCompiling =
            !IsOpenGLPlatform(GMaxRHIShaderPlatform) && !IsVulkanPlatform(GMaxRHIShaderPlatform) &&
            !IsMetalPlatform(GMaxRHIShaderPlatform) && !IsSwitchPlatform(GMaxRHIShaderPlatform) &&
            GShaderCompilingManager->AllowAsynchronousShaderCompiling();

        if (!bAllowAsynchronousGlobalShaderCompiling)
        {
            TArray<int32> ShaderMapIds;
            ShaderMapIds.Add(GlobalShaderMapId);

            GShaderCompilingManager->FinishCompilation(TEXT("Global"), ShaderMapIds);
        }
    }
}
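
The permutation loop in VerifyGlobalShaders boils down to one predicate: a permutation gets a job when the type wants it compiled and it is either missing from the map or its type was flagged as outdated. Below is a minimal standalone sketch of that rule. ShaderTypeSketch, TypePermutation, CollectJobs and the two predicates are hypothetical stand-ins, not UE types, and the sketch ignores pipelines entirely.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for FGlobalShaderType: an id, a permutation count and a
// predicate playing the role of ShouldCompilePermutation.
struct ShaderTypeSketch {
    int id;
    int permutationCount;
    bool (*shouldCompile)(int permutationId);
};

using TypePermutation = std::pair<int, int>; // (shader type id, permutation id)

// Returns the permutations that need a compile job: wanted by the type, and either
// absent from the compiled set or belonging to a type flagged as outdated.
std::vector<TypePermutation> CollectJobs(const std::vector<ShaderTypeSketch>& types,
                                         const std::set<TypePermutation>& compiled,
                                         const std::set<int>& outdatedTypes)
{
    std::vector<TypePermutation> jobs;
    for (const ShaderTypeSketch& type : types) {
        const bool outdated = outdatedTypes.count(type.id) != 0;
        for (int p = 0; p < type.permutationCount; ++p) {
            const bool missing = compiled.count(TypePermutation(type.id, p)) == 0;
            if (type.shouldCompile(p) && (missing || outdated)) {
                jobs.push_back(TypePermutation(type.id, p));
            }
        }
    }
    return jobs;
}

// Two example predicates (assumptions for the sketch).
inline bool AlwaysCompile(int) { return true; }
inline bool EvenOnly(int permutationId) { return permutationId % 2 == 0; }
```

An already compiled permutation is therefore only recompiled when its type appears in the outdated set, which is how `RecompileShaders <ShaderPath>` limits the work to the affected types.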

So shader compile jobs are handled by the global object GShaderCompilingManager. Here is the definition of its type, FShaderCompilingManager:

// Engine\Source\Runtime\Engine\Public\ShaderCompiler.h

class FShaderCompilingManager
{
    (......)
    
private:
    //////////////////////////////////////////////////////
    // State shared between threads: may only be read or written while CompileQueueSection is held.
    bool bCompilingDuringGame;
    // Queue of compile jobs (jobs are removed from it once they are handed to a worker).
    TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>> CompileQueue;
    TMap<int32, FShaderMapCompileResults> ShaderMapJobs;
    int32 NumOutstandingJobs;
    int32 NumExternalJobs;
    FCriticalSection CompileQueueSection;

    //////////////////////////////////////////////////////
    // Main thread state - only accessed by the main thread.
    TMap<int32, FShaderMapFinalizeResults> PendingFinalizeShaderMaps;
    TUniquePtr<FShaderCompileThreadRunnableBase> Thread;

    //////////////////////////////////////////////////////
    // Configuration properties.
    uint32 NumShaderCompilingThreads;
    uint32 NumShaderCompilingThreadsDuringGame;
    int32 MaxShaderJobBatchSize;
    int32 NumSingleThreadedRunsBeforeRetry;
    uint32 ProcessId;
    (......)

public:
    // Accessors and setters.
    bool ShouldDisplayCompilingNotification() const;
    bool AllowAsynchronousShaderCompiling() const;
    bool IsCompiling() const;
    bool HasShaderJobs() const;
    int32 GetNumRemainingJobs() const;
    void SetExternalJobs(int32 NumJobs);

    enum class EDumpShaderDebugInfo : int32
    {
        Never                = 0,
        Always                = 1,
        OnError                = 2,
        OnErrorOrWarning    = 3
    };
    
    (......)

    // Add compile jobs.
    ENGINE_API void AddJobs(TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>>& NewJobs, bool bOptimizeForLowLatency, bool bRecreateComponentRenderStateOnCompletion, const FString MaterialBasePath, FString PermutationString = FString(""), bool bSkipResultProcessing = false);
    
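    // Cancel compile jobs.
    ENGINE_API void CancelCompilation(const TCHAR* MaterialName, const TArray<int32>& ShaderMapIdsToCancel);
    // Finish compile jobs; blocks the calling thread until the specified materials have finished compiling.
    ENGINE_API void FinishCompilation(const TCHAR* MaterialName, const TArray<int32>& ShaderMapIdsToFinishCompiling);
    // Block until all shader compilation has finished.
    ENGINE_API void FinishAllCompilation();
    // Shut down the compiling manager.
    ENGINE_API void Shutdown();
    // Process completed asynchronous results and attach them to their associated materials.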
    ENGINE_API void ProcessAsyncResults(bool bLimitExecutionTime, bool bBlockOnGlobalShaderCompletion);

    static bool IsShaderCompilerWorkerRunning(FProcHandle & WorkerHandle);
};

// Engine\Source\Runtime\Engine\Private\ShaderCompiler\ShaderCompiler.cpp

void FShaderCompilingManager::AddJobs(TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>>& NewJobs, bool bOptimizeForLowLatency, bool bRecreateComponentRenderStateOnCompletion, const FString MaterialBasePath, const FString PermutationString, bool bSkipResultProcessing)
{
    (......)
    
    // Register the jobs with GShaderCompilerStats.
    if(NewJobs.Num())
    {
        FShaderCompileJob* Job = NewJobs[0]->GetSingleShaderJob();
        if(Job) //assume that all jobs are for the same platform
        {
            GShaderCompilerStats->RegisterCompiledShaders(NewJobs.Num(), Job->Input.Target.GetPlatform(), MaterialBasePath, PermutationString);
        }
        else
        {
            GShaderCompilerStats->RegisterCompiledShaders(NewJobs.Num(), SP_NumPlatforms, MaterialBasePath, PermutationString);
        }
    }
    
    // Enqueue into the compile queue.
    if (bOptimizeForLowLatency)
    {
        int32 InsertIndex = 0;

        for (; InsertIndex < CompileQueue.Num(); InsertIndex++)
        {
            if (!CompileQueue[InsertIndex]->bOptimizeForLowLatency)
            {
                break;
            }
        }

        CompileQueue.InsertZeroed(InsertIndex, NewJobs.Num());

        for (int32 JobIndex = 0; JobIndex < NewJobs.Num(); JobIndex++)
        {
            CompileQueue[InsertIndex + JobIndex] = NewJobs[JobIndex];
        }
    }
    else
    {
        CompileQueue.Append(NewJobs);
    }

    // Increase the outstanding job count.
    FPlatformAtomics::InterlockedAdd(&NumOutstandingJobs, NewJobs.Num());

    // Increase the queued job count of each shader map.
    for (int32 JobIndex = 0; JobIndex < NewJobs.Num(); JobIndex++)
    {
        NewJobs[JobIndex]->bOptimizeForLowLatency = bOptimizeForLowLatency;
        FShaderMapCompileResults& ShaderMapInfo = ShaderMapJobs.FindOrAdd(NewJobs[JobIndex]->Id);
        ShaderMapInfo.bRecreateComponentRenderStateOnCompletion = bRecreateComponentRenderStateOnCompletion;
        ShaderMapInfo.bSkipResultProcessing = bSkipResultProcessing;
        auto* PipelineJob = NewJobs[JobIndex]->GetShaderPipelineJob();
        if (PipelineJob)
        {
            ShaderMapInfo.NumJobsQueued += PipelineJob->StageJobs.Num();
        }
        else
        {
            ShaderMapInfo.NumJobsQueued++;
        }
    }
}
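
The queueing rule in AddJobs is worth isolating: low-latency jobs are inserted after the low-latency jobs already at the head of the queue but before everything else, while ordinary jobs are appended at the tail. A minimal standalone sketch with std::vector follows. JobSketch and AddJobsSketch are hypothetical names, the sketch copies jobs by value instead of sharing references, and it does no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a compile job.
struct JobSketch {
    std::string name;
    bool lowLatency;
};

// Mirrors the ordering rule of AddJobs: low-latency batches go right behind the
// low-latency jobs already queued at the front, everything else goes to the back.
void AddJobsSketch(std::vector<JobSketch>& queue, std::vector<JobSketch> newJobs, bool optimizeForLowLatency)
{
    // The original sets this flag on the shared job objects after enqueueing;
    // with value copies it has to happen before the insert.
    for (JobSketch& job : newJobs) {
        job.lowLatency = optimizeForLowLatency;
    }

    if (optimizeForLowLatency) {
        std::size_t insertIndex = 0;
        while (insertIndex < queue.size() && queue[insertIndex].lowLatency) {
            ++insertIndex;
        }
        queue.insert(queue.begin() + static_cast<std::ptrdiff_t>(insertIndex), newJobs.begin(), newJobs.end());
    } else {
        queue.insert(queue.end(), newJobs.begin(), newJobs.end());
    }
}
```

Global shader jobs are added with bOptimizeForLowLatency set to true (see the AddJobs call in VerifyGlobalShaders), so under this rule they overtake material jobs that are already waiting while keeping their order relative to each other.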

void FShaderCompilingManager::FinishCompilation(const TCHAR* MaterialName, const TArray<int32>& ShaderMapIdsToFinishCompiling)
{
    (......)

    TMap<int32, FShaderMapFinalizeResults> CompiledShaderMaps;
    CompiledShaderMaps.Append( PendingFinalizeShaderMaps );
    PendingFinalizeShaderMaps.Empty();
    
    // Block until compilation completes.
    BlockOnShaderMapCompletion(ShaderMapIdsToFinishCompiling, CompiledShaderMaps);

    // Retry and collect potential errors.
    bool bRetry = false;
    do 
    {
        bRetry = HandlePotentialRetryOnError(CompiledShaderMaps);
    } 
    while (bRetry);

    // Process the compiled shader maps.
    ProcessCompiledShaderMaps(CompiledShaderMaps, FLT_MAX);

    (......)
}

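As shown above, the final type of a shader compile job is FShaderCommonCompileJob. Its instances are pushed into a global queue so that they can be compiled asynchronously by multiple workers. Below are the definitions of FShaderCommonCompileJob and its related types: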

// Engine\Source\Runtime\Engine\Public\ShaderCompiler.h

// Stores the common data used to compile a shader or a shader pipeline.
class FShaderCommonCompileJob
{
public:
    uint32 Id;
    // Whether compilation has finished.
    bool bFinalized;
    // Whether compilation succeeded.
    bool bSucceeded;
    bool bOptimizeForLowLatency;

    FShaderCommonCompileJob(uint32 InId);
    virtual ~FShaderCommonCompileJob();

    // Accessors.
    virtual FShaderCompileJob* GetSingleShaderJob();
    virtual const FShaderCompileJob* GetSingleShaderJob() const;
    virtual FShaderPipelineCompileJob* GetShaderPipelineJob();
    virtual const FShaderPipelineCompileJob* GetShaderPipelineJob() const;

    // Get a globally unique id for a shader compiler job.
    ENGINE_API static uint32 GetNextJobId();

private:
    // Counter for job ids.
    static FThreadSafeCounter JobIdCounter;
};

// All the input and output information for compiling a single shader.
class FShaderCompileJob : public FShaderCommonCompileJob
{
public:
    // Vertex factory of the shader; may be null.
    FVertexFactoryType* VFType;
    // Shader type.
    FShaderType* ShaderType;
    // Permutation id.
    int32 PermutationId;
    // Compilation input and output.
    FShaderCompilerInput Input;
    FShaderCompilerOutput Output;

    // Pipelines that share this job.
    TMap<const FVertexFactoryType*, TArray<const FShaderPipelineType*>> SharingPipelines;

    FShaderCompileJob(uint32 InId, FVertexFactoryType* InVFType, FShaderType* InShaderType, int32 InPermutationId);

    virtual FShaderCompileJob* GetSingleShaderJob() override;
    virtual const FShaderCompileJob* GetSingleShaderJob() const override;
};

// Information for compiling a shader pipeline.
class FShaderPipelineCompileJob : public FShaderCommonCompileJob
{
public:
    // Per-stage jobs.
    TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>> StageJobs;
    bool bFailedRemovingUnused;

    // The shader pipeline this job belongs to.
    const FShaderPipelineType* ShaderPipeline;

    FShaderPipelineCompileJob(uint32 InId, const FShaderPipelineType* InShaderPipeline, int32 NumStages);

    virtual FShaderPipelineCompileJob* GetShaderPipelineJob() override;
    virtual const FShaderPipelineCompileJob* GetShaderPipelineJob() const override;
};
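
The two GetSingleShaderJob / GetShaderPipelineJob accessors act as a cheap, RTTI-free downcast: each derived class overrides only the accessor matching its own kind. A minimal standalone sketch of the pattern follows, together with the per-job counting that AddJobs performs. All *Sketch names are hypothetical, and the assumption that the base-class accessors return nullptr is an inference, since the listing above only shows their declarations.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct SingleJobSketch;
struct PipelineJobSketch;

// Base job: assumed here to answer nullptr to both "what kind are you" queries.
struct CommonJobSketch {
    virtual ~CommonJobSketch() = default;
    virtual SingleJobSketch* GetSingleShaderJob() { return nullptr; }
    virtual PipelineJobSketch* GetShaderPipelineJob() { return nullptr; }
};

// A single-shader job identifies itself through GetSingleShaderJob.
struct SingleJobSketch : CommonJobSketch {
    SingleJobSketch* GetSingleShaderJob() override { return this; }
};

// A pipeline job owns one stage job per shader stage.
struct PipelineJobSketch : CommonJobSketch {
    std::vector<std::shared_ptr<CommonJobSketch>> stageJobs;
    PipelineJobSketch* GetShaderPipelineJob() override { return this; }
};

// Same counting rule as the NumJobsQueued update in AddJobs:
// a pipeline job counts once per stage, any other job counts once.
int CountQueuedJobs(const std::vector<std::shared_ptr<CommonJobSketch>>& jobs)
{
    int count = 0;
    for (const std::shared_ptr<CommonJobSketch>& job : jobs) {
        if (PipelineJobSketch* pipeline = job->GetShaderPipelineJob()) {
            count += static_cast<int>(pipeline->stageJobs.size());
        } else {
            ++count;
        }
    }
    return count;
}
```

Compared with dynamic_cast, this keeps the kind check to a single virtual call and works even when RTTI is disabled, at the cost of the base class having to know the names of its derived classes.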

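These jobs are added to the FShaderCompilingManager::CompileQueue queue through functions such as FShaderCompilingManager::AddJobs, and are then mainly pulled and dispatched by FShaderCompileThreadRunnable::PullTasksFromQueue (a multi-producer, multi-consumer arrangement):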

// Engine\Source\Runtime\Engine\Private\ShaderCompiler\ShaderCompiler.cpp

int32 FShaderCompileThreadRunnable::PullTasksFromQueue()
{
    int32 NumActiveThreads = 0;
    {
        // Enter the critical section so the input and output queues can be accessed.
        FScopeLock Lock(&Manager->CompileQueueSection);

        const int32 NumWorkersToFeed = Manager->bCompilingDuringGame ? Manager->NumShaderCompilingThreadsDuringGame : WorkerInfos.Num();
        // Compute the number of jobs per worker.
        const auto NumJobsPerWorker = (Manager->CompileQueue.Num() / NumWorkersToFeed) + 1;
        
        // Iterate over all WorkerInfos.
        for (int32 WorkerIndex = 0; WorkerIndex < WorkerInfos.Num(); WorkerIndex++)
        {
            FShaderCompileWorkerInfo& CurrentWorkerInfo = *WorkerInfos[WorkerIndex];

            // If this worker has no queued jobs, pull some from the manager's input queue.
            if (CurrentWorkerInfo.QueuedJobs.Num() == 0 && WorkerIndex < NumWorkersToFeed)
            {
                if (Manager->CompileQueue.Num() > 0)
                {
                    bool bAddedLowLatencyTask = false;
                    const auto MaxNumJobs = FMath::Min3(NumJobsPerWorker, Manager->CompileQueue.Num(), Manager->MaxShaderJobBatchSize);
                    
                    int32 JobIndex = 0;
                    // Don't put more than one low latency task into a batch
                    for (; JobIndex < MaxNumJobs && !bAddedLowLatencyTask; JobIndex++)
                    {
                        bAddedLowLatencyTask |= Manager->CompileQueue[JobIndex]->bOptimizeForLowLatency;
                        // Add the job from the manager's CompileQueue to this worker's QueuedJobs.
                        CurrentWorkerInfo.QueuedJobs.Add(Manager->CompileQueue[JobIndex]);
                    }

                    CurrentWorkerInfo.bIssuedTasksToWorker = false;                    
                    CurrentWorkerInfo.bLaunchedWorker = false;
                    CurrentWorkerInfo.StartTime = FPlatformTime::Seconds();
                    NumActiveThreads++;
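                    // Remove the jobs that were just taken from the manager's CompileQueue. TArray itself is not thread-safe (only the TSharedRef elements use ESPMode::ThreadSafe); the access is protected by CompileQueueSection.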
                    Manager->CompileQueue.RemoveAt(0, JobIndex);
                }
            }
            // This worker already has jobs (or is not being fed).
            else
            {
                if (CurrentWorkerInfo.QueuedJobs.Num() > 0)
                {
                    NumActiveThreads++;
                }

                // Add the finished jobs to the output queue (ShaderMapJobs).
                if (CurrentWorkerInfo.bComplete)
                {
                    for (int32 JobIndex = 0; JobIndex < CurrentWorkerInfo.QueuedJobs.Num(); JobIndex++)
                    {
                        FShaderMapCompileResults& ShaderMapResults = Manager->ShaderMapJobs.FindChecked(CurrentWorkerInfo.QueuedJobs[JobIndex]->Id);
                        ShaderMapResults.FinishedJobs.Add(CurrentWorkerInfo.QueuedJobs[JobIndex]);
                        ShaderMapResults.bAllJobsSucceeded = ShaderMapResults.bAllJobsSucceeded && CurrentWorkerInfo.QueuedJobs[JobIndex]->bSucceeded;
                    }
                    
                    (......)
                    
                    // Update NumOutstandingJobs.
                    FPlatformAtomics::InterlockedAdd(&Manager->NumOutstandingJobs, -CurrentWorkerInfo.QueuedJobs.Num());

                    // Clear the job data.
                    CurrentWorkerInfo.bComplete = false;
                    CurrentWorkerInfo.QueuedJobs.Empty();
                }
            }
        }
    }
    return NumActiveThreads;
}
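
The batch size logic in PullTasksFromQueue can be checked with small numbers. The sketch below pulls one batch off the front of a queue using the same limits: at most min(jobs per worker, queue size, max batch size) jobs, and never more than one low-latency job per batch. QueuedJobSketch and PullBatch are hypothetical names, the caller is assumed to already hold the lock, and unlike the original the per-worker share is recomputed on every call.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical stand-in for a queued compile job.
struct QueuedJobSketch {
    int id;
    bool lowLatency;
};

// Takes one batch from the front of the shared queue for a single idle worker.
std::vector<QueuedJobSketch> PullBatch(std::deque<QueuedJobSketch>& queue, int numWorkersToFeed, int maxBatchSize)
{
    assert(numWorkersToFeed > 0);
    const int queueSize = static_cast<int>(queue.size());
    const int jobsPerWorker = queueSize / numWorkersToFeed + 1;
    const int maxNumJobs = std::min({jobsPerWorker, queueSize, maxBatchSize});

    std::vector<QueuedJobSketch> batch;
    bool addedLowLatencyTask = false;
    // Stop right after the first low-latency job so it is not held up by the rest of a batch.
    for (int i = 0; i < maxNumJobs && !addedLowLatencyTask; ++i) {
        addedLowLatencyTask = queue.front().lowLatency;
        batch.push_back(queue.front());
        queue.pop_front();
    }
    return batch;
}
```

For example, with 10 ordinary jobs, 4 workers to feed and a maximum batch size of 8, the first idle worker receives 10 / 4 + 1 = 3 jobs. If the job at the head of the queue is low-latency, the batch contains only that job.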

The worker info CurrentWorkerInfo above is of type FShaderCompileWorkerInfo:

// Information about a shader compile worker.
struct FShaderCompileWorkerInfo
{
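    // Handle of the worker process; may be invalid.
    FProcHandle WorkerProcess;
    // Tracks whether tasks have been issued to the worker.
    bool bIssuedTasksToWorker;
    // Whether the worker has been launched.
    bool bLaunchedWorker;
    // Tracks whether all tasks issued to the worker have been received.
    bool bComplete;
    // Time at which the most recent batch of tasks was started.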
    double StartTime;
    
    // Jobs this worker process is responsible for compiling. (Note that the shared references use the thread-safe mode.)
    TArray<TSharedRef<FShaderCommonCompileJob, ESPMode::ThreadSafe>> QueuedJobs;

    // Constructor.
    FShaderCompileWorkerInfo();
    // Destructor; not virtual.
    ~FShaderCompileWorkerInfo()
    {
        if(WorkerProcess.IsValid())
        {
            FPlatformProcess::TerminateProc(WorkerProcess);
            FPlatformProcess::CloseProc(WorkerProcess);
        }
    }
};

That covers most of the shader compilation flow and its mechanisms; the remaining details are left for the reader to explore.

8.3.3 Shader Cross-Platform Support

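During development we write only one kind of UE-style HLSL. So how does UE compile it behind the scenes into shader instructions for the different graphics APIs (see the table below) and feature levels?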

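Graphics API | Shading language | Notes
Direct3D | HLSL (High Level Shading Language) | High-level shading language; only usable on Windows platforms
OpenGL | GLSL (OpenGL Shading Language) | Cross-platform, but its state-machine-based design is at odds with modern GPU architectures
OpenGL ES | ES GLSL | Dedicated to mobile platforms
Metal | MSL (Metal Shading Language) | Only usable on Apple systems
Vulkan | SPIR-V | SPIR-V is an intermediate language; shaders for other platforms can be translated through it conveniently and completely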

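SPIR-V is maintained by Khronos (also the creator of OpenGL and Vulkan). It is in fact a large ecosystem that includes shading languages, toolchains and runtime libraries: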

An overview of the SPIR-V ecosystem; cross-platform shaders are only one part of it.

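SPIR-V is also the cross-platform shader solution of quite a few commercial engines and renderers today. Does UE use SPIR-V as well, or did it choose something else? This section answers that question and digs into the cross-platform shader solution UE uses.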

Cross-platform shader support usually has to take the following points into account:

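  • Write once, use on multiple platforms. This is the basic requirement. Without it there is no cross-platform support to speak of, and developers' workload goes up while productivity goes down.
  • Offline compilation. Most shader compilers support this today.
  • Reflection, to produce the metadata the renderer uses at runtime, such as which index a texture is bound to and whether a uniform is actually used.
  • Specific optimizations, such as offline validation, inlining, detecting and removing unused instructions and data, merging and simplifying instructions, and choosing whether offline compilation produces an intermediate language or target machine code.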

Early on, UE considered several approaches to cross-platform shaders:

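  • Wrapping the differences between shading languages purely in macros. This might work for simple shading logic, but in practice the languages differ so much that macros can hardly abstract them, so it is not viable.
  • Compiling HLSL with FXC and then converting the bytecode. This works well, but its fatal flaw is that it cannot support Mac OS, so it was dropped.
  • A third-party cross-platform compiler. At the time (2014) there was no compiler that supported both SM5.0 syntax and Compute Shaders.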

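Given the situation at the time (around 2014), UE4.3 took inspiration from glsl-optimizer and built its own tool on top of the Mesa GLSL parser and IR: HLSLCC (HLSL Cross Compiler). HLSLCC repurposes the parser to parse SM5.0 HLSL (instead of GLSL) and implements a Mesa IR to GLSL converter (similar to glsl-optimizer). Mesa also supports IR optimization natively, so HLSLCC supports it too.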

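Diagram of the HLSLCC pipeline for GLSL. The shader compiler takes HLSL source as input, first preprocesses it with MCPP, and then HLSLCC turns it into GLSL source plus a parameter table.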

The main steps HLSLCC performs are:

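  • Preprocessing. The code is run through a C-style preprocessor. UE already preprocesses with MCPP before compiling, so this step is skipped.
  • Parsing. Through Mesa's _mesa_hlsl_parse, the HLSL is parsed into an abstract syntax tree. The lexer (lexical analysis) and the parser are generated by flex and bison respectively.
  • Compilation. _mesa_ast_to_hir compiles the AST (abstract syntax tree) into Mesa IR. In this stage the compiler performs implicit conversions, function overload resolution, instruction generation for intrinsic functions, and so on. It also generates the GLSL main entry point: it adds global declarations for the input and output variables to the IR, computes the inputs of the HLSL entry point, calls the HLSL entry point, and writes the outputs to the global output variables.
  • Optimization. Mainly through do_optimization_pass, several optimization passes are run over the IR, including function inlining, dead code elimination, constant propagation, common subexpression elimination, and so on.
  • Uniform packing. Global uniforms are packed into arrays and the mapping information is retained, so that the engine can bind parameters to the relevant portions of the uniform arrays.
  • Final optimization. After the uniforms are packed, a second round of optimization is run over the IR to simplify the code generated by the packing.
  • Generate GLSL. In the last step, the optimized IR is converted to GLSL source code. Besides emitting the definitions of all structs and uniform buffers and the source code itself, a mapping table is written into a comment at the top of the file.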
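
To make the uniform packing step concrete, here is a minimal standalone sketch: loose float uniforms are laid out back to back in one array of four-component vectors, and a name-to-range mapping is kept so a parameter can later be bound to its slice of the array. The layout rule (tight sequential packing, no alignment) and all names (UniformSketch, PackedRange, PackedUniforms, PackUniforms) are assumptions made for illustration; the real HLSLCC layout and its mapping table format are not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one loose uniform: its name and size in floats.
struct UniformSketch {
    std::string name;
    int numFloats;
};

// Where a uniform ended up inside the packed array, measured in floats.
struct PackedRange {
    int offset;
    int numFloats;
};

// Result of packing: the array size in vec4s plus the retained mapping information.
struct PackedUniforms {
    int numVec4s = 0;
    std::map<std::string, PackedRange> mapping;
};

// Packs the uniforms back to back (simplified rule, no alignment) and records the mapping.
PackedUniforms PackUniforms(const std::vector<UniformSketch>& uniforms)
{
    PackedUniforms packed;
    int cursor = 0; // next free float slot
    for (const UniformSketch& uniform : uniforms) {
        packed.mapping[uniform.name] = PackedRange{cursor, uniform.numFloats};
        cursor += uniform.numFloats;
    }
    packed.numVec4s = (cursor + 3) / 4; // round up to whole vec4s
    return packed;
}
```

With three made-up uniforms of 4, 1 and 2 floats, the packed array needs 2 vec4s, and the mapping tells the runtime that the third uniform lives at float offset 5. That retained mapping plays the role of the table the last step writes into the comment at the top of the generated GLSL.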

The source code involved in the above is under the Engine\Source\ThirdParty\hlslcc directory. The core files are:

  • ast.h
  • glcpp-parse.h
  • glsl_parser_extras.h
  • hlsl_parser.h
  • ir_optimization.h

Below are the core functions involved in the compilation stage:

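Function | Description
apply_type_conversion | Converts a value of one type to another type (if possible). A parameter controls whether the conversion is implicit or explicit.
arithmetic_result_type | A group of functions that determine the result type of applying an operation to the input values.
validate_assignment | Determines whether an rvalue can be assigned to an lvalue of a particular type. Allowed implicit conversions are applied when necessary.
do_assignment | Assigns an rvalue to an lvalue (if validate_assignment allows it).
ast_expression::hir | Converts an expression node in the AST into a set of IR instructions.
process_initializer | Applies an initializer expression to a variable.
ast_struct_specifier::hir | Builds an aggregate type to represent the declared struct.
ast_cbuffer_declaration::hir | Builds a struct for the constant buffer layout and stores it as a uniform block.
process_mul | Special-case code for handling the HLSL intrinsic mul.
match_function_by_name | Looks up a function signature based on the name and the list of input parameters.
rank_parameter_lists | Compares two parameter lists and assigns a numeric rank indicating how closely they match. It is a helper used for overload resolution: the signature with the lowest rank wins, and if any other signature has the same rank as the lowest one, the function call is declared ambiguous. A rank of zero means an exact match.
gen_texture_op | Handles method calls on the built-in HLSL texture and sampler objects.
_mesa_glsl_initialize_functions | Generates the built-in functions for the HLSL intrinsics. Most functions (such as sin and cos) generate IR code to perform the operation, but some (such as transpose and determinant) keep the function call so that the operation is deferred to the driver's GLSL compiler.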
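
The ranking rule described for rank_parameter_lists (lowest rank wins, equal lowest ranks are ambiguous, zero is an exact match) can be sketched in a few lines. This is not the HLSLCC implementation: the two-type system, the cost of one rank point per int-to-float conversion, and all names (TypeSketch, RankParameterLists, ResolveOverload) are assumptions chosen to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-type system for the sketch.
enum class TypeSketch { Int, Float };

// Returns 0 for an exact match, +1 per implicit int-to-float conversion,
// or -1 when the argument list cannot be passed at all.
int RankParameterLists(const std::vector<TypeSketch>& params, const std::vector<TypeSketch>& args)
{
    if (params.size() != args.size()) {
        return -1;
    }
    int rank = 0;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (params[i] == args[i]) {
            continue;
        }
        if (args[i] == TypeSketch::Int && params[i] == TypeSketch::Float) {
            ++rank; // assumed cost of one implicit conversion
        } else {
            return -1; // no implicit float-to-int conversion in this sketch
        }
    }
    return rank;
}

enum class Resolution { NoMatch, Ambiguous, Found };

struct ResolveResult {
    Resolution status;
    int index; // index of the winning signature, or -1
};

// Picks the signature with the lowest rank; a tie for the lowest rank is ambiguous.
ResolveResult ResolveOverload(const std::vector<std::vector<TypeSketch>>& signatures,
                              const std::vector<TypeSketch>& args)
{
    int bestRank = -1;
    int bestIndex = -1;
    bool tie = false;
    for (std::size_t i = 0; i < signatures.size(); ++i) {
        const int rank = RankParameterLists(signatures[i], args);
        if (rank < 0) {
            continue;
        }
        if (bestIndex < 0 || rank < bestRank) {
            bestRank = rank;
            bestIndex = static_cast<int>(i);
            tie = false;
        } else if (rank == bestRank) {
            tie = true;
        }
    }
    if (bestIndex < 0) return ResolveResult{Resolution::NoMatch, -1};
    if (tie) return ResolveResult{Resolution::Ambiguous, -1};
    return ResolveResult{Resolution::Found, bestIndex};
}
```

Given the candidates f(float, int) and f(int, float), a call with (int, int) ranks both at 1, so the call is reported as ambiguous; adding an exact f(int, int) candidate resolves it with rank 0.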

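From its first version in UE4.3 up to the current 4.26, HLSLCC has gone through several iterations. In UE4.22, for example, the cross-platform shader flow looked like this: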

The UE4.22 cross-platform shader diagram. Metal SL is translated from Mesa IR, while Vulkan goes through a multi-step translation: Mesa IR, GLSL, glslang, SPIR-V.

In UE4.25, the cross-platform shader flow looks like this:

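The UE4.25 cross-platform shader diagram. The biggest change is the addition of Shader Conductor, which goes through DXC to SPIR-V and then translates to Metal, Vulkan, DX and other platforms.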

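In other words, the biggest change in UE4.25 is the new Shader Conductor, which converts shaders to SPIR-V so that they can be translated for platforms such as Metal and Vulkan.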

Shader Conductor is also a third-party library, located under the engine's Engine\Source\ThirdParty\ShaderConductor directory. Its core files are:

  • ShaderConductor.hpp
  • ShaderConductor.cpp
  • Native.h
  • Native.cpp

Shader Conductor also bundles components such as DirectXShaderCompiler, SPIRV-Cross, SPIRV-Headers and SPIRV-Tools.

UE4.25's approach is essentially the same as the cross-platform shader solution of KlayGE by 叛逆者 (Minmin Gong):

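Vulkan not only has a brand-new API, it also brought a new intermediate shader format, SPIR-V. This is the most important step on the road to unified cross-platform shader compilation. Judging by the trend, more and more engines and renderers will adopt SPIR-V as their preferred cross-platform solution.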

One more small detail: although Direct3D and OpenGL agree on normalized device coordinates, their UV-space coordinates differ:

To keep shader developers from having to notice this difference, UE uses flipped images so that UV coordinates follow a single convention:

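The consequence is that textures under OpenGL are actually flipped vertically (captures of UE applications on OpenGL taken with RenderDoc confirm this), although they could simply be flipped back late in rendering. UE, however, renders upside down and folds the flip into the projection matrix: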

As a result, both the normalized device coordinates and the textures appear vertically flipped compared with D3D.
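
The two flips mentioned above are simple to state in code. The sketch below shows a vertical UV flip (v' = 1 - v) and an upside-down projection obtained by negating the part of the matrix that produces clip-space Y. It is only an illustration: Float2, Matrix4, FlipV, FlipProjectionY and ClipY are hypothetical helpers, and the row-major, column-vector convention is an assumption of this sketch, not a description of UE's matrix code.

```cpp
#include <array>
#include <cassert>

// Hypothetical UV pair.
struct Float2 {
    float u;
    float v;
};

// Vertical flip of a texture coordinate: the top row becomes the bottom row.
Float2 FlipV(Float2 uv)
{
    return Float2{uv.u, 1.0f - uv.v};
}

// Row-major 4x4 matrix applied to column vectors (convention assumed for this sketch).
using Matrix4 = std::array<std::array<float, 4>, 4>;

// Folds the upside-down rendering into the projection by negating the row
// that produces clip-space Y.
Matrix4 FlipProjectionY(Matrix4 projection)
{
    for (float& element : projection[1]) {
        element = -element;
    }
    return projection;
}

// Computes only the Y component of projection * position.
float ClipY(const Matrix4& projection, const std::array<float, 4>& position)
{
    float y = 0.0f;
    for (int column = 0; column < 4; ++column) {
        y += projection[1][column] * position[column];
    }
    return y;
}
```

Applying FlipV twice returns the original coordinate, and a point that lands at clip-space Y = 6 under some projection lands at Y = -6 under the flipped one, which is the whole effect of rendering upside down.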

8.3.4 Shader Caching

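There are two kinds of shader cache. One is offline data stored in the DDC, commonly used to speed up work in the editor and during development; see 8.3.1.2 FGlobalShaderMap for details. The other is the runtime shader cache. Early versions of UE used FShaderCache for this, but UE4.26 has removed FShaderCache and replaced it with FShaderPipelineCache.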

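FShaderPipelineCache provides a new mechanism for logging, serializing and precompiling pipeline state objects (PSOs). It caches pipeline state objects and serializes their initializers to disk, so that these states can be precompiled the next time the game runs, which can reduce hitching. Note that FShaderPipelineCache depends on FShaderCodeLibrary, Share Material Shader Code, and the RHI-side PipelineFileCache.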

Here is the definition of FShaderPipelineCache:

// Engine\Source\Runtime\RenderCore\Public\ShaderPipelineCache.h

class FShaderPipelineCache : public FTickableObjectRenderThread
{
    // Compile job struct.
    struct CompileJob
    {
        FPipelineCacheFileFormatPSO PSO;
        FShaderPipelineCacheArchive* ReadRequests;
    };

public:
    // Initialize FShaderPipelineCache.
    static void Initialize(EShaderPlatform Platform);
    // Shut down FShaderPipelineCache.
    static void Shutdown();
    // Pause/resume batched precompilation.
    static void PauseBatching();
    static void ResumeBatching();
    
    // Batch mode.
    enum class BatchMode
    {
        Background, // Maximum batch size is set by r.ShaderPipelineCache.BackgroundBatchSize.
        Fast, // Maximum batch size is set by r.ShaderPipelineCache.BatchSize.
        Precompile // Maximum batch size is set by r.ShaderPipelineCache.PrecompileBatchSize.
    };
    
    // Setters and getters.
    static void SetBatchMode(BatchMode Mode);
    static uint32 NumPrecompilesRemaining();
    static uint32 NumPrecompilesActive();
    
    static int32 GetGameVersionForPSOFileCache();
    static bool SetGameUsageMaskWithComparison(uint64 Mask, FPSOMaskComparisonFn InComparisonFnPtr);
    static bool IsBatchingPaused();
    
    // Open the pipeline file cache.
    static bool OpenPipelineFileCache(EShaderPlatform Platform);
    static bool OpenPipelineFileCache(FString const& Name, EShaderPlatform Platform);
    
    // Save/close the pipeline file cache.
    static bool SavePipelineFileCache(FPipelineFileCache::SaveMode Mode);
    static void ClosePipelineFileCache();

    // Constructor/destructor.
    FShaderPipelineCache(EShaderPlatform Platform);
    virtual ~FShaderPipelineCache();

    // Tick-related interface.
    bool IsTickable() const;
    // Per-frame tick.
    void Tick( float DeltaTime );
    bool NeedsRenderingResumedForRenderingThreadTick() const;
    
    TStatId GetStatId() const;
    
    enum ELibraryState
    {
        Opened,
        Closed
    };
    
    // State change notification.
    static void ShaderLibraryStateChanged(ELibraryState State, EShaderPlatform Platform, FString const& Name);

    // Precompile context.
    class FShaderCachePrecompileContext
    {
        bool bSlowPrecompileTask;
    public:
        FShaderCachePrecompileContext() : bSlowPrecompileTask(false) {}
        void SetPrecompilationIsSlowTask() { bSlowPrecompileTask = true; }
        bool IsPrecompilationSlowTask() const { return bSlowPrecompileTask; }
    };

    // Delegates used for signalling.
    static FShaderCachePreOpenDelegate& GetCachePreOpenDelegate();
    static FShaderCacheOpenedDelegate& GetCacheOpenedDelegate();
    static FShaderCacheClosedDelegate& GetCacheClosedDelegate();
    static FShaderPrecompilationBeginDelegate& GetPrecompilationBeginDelegate();
    static FShaderPrecompilationCompleteDelegate& GetPrecompilationCompleteDelegate();

    (......)
    
private:
    // Data used for batched precompilation.
    static FShaderPipelineCache* ShaderPipelineCache;
    TArray<CompileJob> ReadTasks;
    TArray<CompileJob> CompileTasks;
    TArray<FPipelineCachePSOHeader> OrderedCompileTasks;
    TDoubleLinkedList<FPipelineCacheFileFormatPSORead*> FetchTasks;
    TSet<uint32> CompiledHashes;
    
    FString FileName;
    EShaderPlatform CurrentPlatform;
    FGuid CacheFileGuid;
    uint32 BatchSize;
    
    FShaderCachePrecompileContext ShaderCachePrecompileContext;

    FCriticalSection Mutex;
    TArray<FPipelineCachePSOHeader> PreFetchedTasks;
    TArray<CompileJob> ShutdownReadCompileTasks;
    TDoubleLinkedList<FPipelineCacheFileFormatPSORead*> ShutdownFetchTasks;

    TMap<FBlendStateInitializerRHI, FRHIBlendState*> BlendStateCache;
    TMap<FRasterizerStateInitializerRHI, FRHIRasterizerState*> RasterizerStateCache;
    TMap<FDepthStencilStateInitializerRHI, FRHIDepthStencilState*> DepthStencilStateCache;
    
    (......)
};

The data that FShaderPipelineCache gathers for batched precompilation is saved in the project's Saved directory, with the .upipelinecache extension:

// Engine\Source\Runtime\RHI\Private\PipelineFileCache.cpp

bool FPipelineFileCache::SavePipelineFileCache(FString const& Name, SaveMode Mode)
{
    bool bOk = false;
    
    // PipelineFileCache must be enabled and PSO logging to the file cache must be on.
    if(IsPipelineFileCacheEnabled() && LogPSOtoFileCache())
    {
        if(FileCache)
        {
            // Platform name used when saving.
            FName PlatformName = FileCache->GetPlatformName();
            // Path to save to.
            FString Path = FPaths::ProjectSavedDir() / FString::Printf(TEXT("%s_%s.upipelinecache"), *Name, *PlatformName.ToString());
            // Perform the save.
            bOk = FileCache->SavePipelineFileCache(Path, Mode, Stats, NewPSOs, RequestedOrder, NewPSOUsage);
            
            (......)
        }
    }
    
    return bOk;
}
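
The file name pattern in SavePipelineFileCache is `<Name>_<PlatformName>.upipelinecache` under the project's Saved directory. A tiny standalone sketch of the same formatting with std::string follows (PipelineCachePath is a hypothetical helper, and the directory and names in the usage note are made-up examples):

```cpp
#include <cassert>
#include <string>

// Builds "<savedDir>/<name>_<platformName>.upipelinecache", mirroring the
// Printf pattern used by SavePipelineFileCache.
std::string PipelineCachePath(const std::string& savedDir, const std::string& name, const std::string& platformName)
{
    std::string path = savedDir;
    if (!path.empty() && path.back() != '/') {
        path += '/'; // stands in for the FString path-join operator
    }
    return path + name + "_" + platformName + ".upipelinecache";
}
```

For a project called MyGame on the PCD3D_SM5 shader platform, this sketch yields `MyGame/Saved/MyGame_PCD3D_SM5.upipelinecache`.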

Because this is a shader cache that takes effect at runtime, it has to be integrated into UE's runtime modules. In practice it is driven from within FEngineLoop:

int32 FEngineLoop::PreInitPreStartupScreen(const TCHAR* CmdLine)
{
    (......)
    
    {
        bool bUseCodeLibrary = FPlatformProperties::RequiresCookedData() || GAllowCookedDataInEditorBuilds;
        if (bUseCodeLibrary)
        {
            {
                FShaderCodeLibrary::InitForRuntime(GMaxRHIShaderPlatform);
            }

    #if !UE_EDITOR
            // Cooked data only - but also requires the code library - game only
            if (FPlatformProperties::RequiresCookedData())
            {
                // Initialize FShaderPipelineCache.
                FShaderPipelineCache::Initialize(GMaxRHIShaderPlatform);
            }
    #endif //!UE_EDITOR
        }
    }
    
    (......)
}

int32 FEngineLoop::PreInitPostStartupScreen(const TCHAR* CmdLine)
{
    (......)
    
    IInstallBundleManager* BundleManager = IInstallBundleManager::GetPlatformInstallBundleManager();
    if (BundleManager == nullptr || BundleManager->IsNullInterface())
    {
        (......)

        {
            // Open the game library that contains the material shaders.
            FShaderCodeLibrary::OpenLibrary(FApp::GetProjectName(), FPaths::ProjectContentDir());
            for (const FString& RootDir : FPlatformMisc::GetAdditionalRootDirectories())
            {
                FShaderCodeLibrary::OpenLibrary(FApp::GetProjectName(), FPaths::Combine(RootDir, FApp::GetProjectName(), TEXT("Content")));
            }

            // Open the pipeline file cache.
            FShaderPipelineCache::OpenPipelineFileCache(GMaxRHIShaderPlatform);
        }
    }
    
    (......)
}

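In addition, GameEngine responds at runtime to commands that resume and pause batched precompilation. Once FShaderPipelineCache is ready, the RHI layer can respond to its events and signals. Take Vulkan's FVulkanPipelineStateCacheManager as an example: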

// Engine\Source\Runtime\VulkanRHI\Private\VulkanPipeline.h

class FVulkanPipelineStateCacheManager
{
    (......)

private:
    // Delegates that track ShaderPipelineCache precompilation.
    void OnShaderPipelineCacheOpened(FString const& Name, EShaderPlatform Platform, uint32 Count, const FGuid& VersionGuid, FShaderPipelineCache::FShaderCachePrecompileContext& ShaderCachePrecompileContext);
    void OnShaderPipelineCachePrecompilationComplete(uint32 Count, double Seconds, const FShaderPipelineCache::FShaderCachePrecompileContext& ShaderCachePrecompileContext);

    (......)
};

To enable the Shader Pipeline Cache, the following two options need to be checked in the project settings (they are enabled by default):

The following console variables can be used to configure the Shader Pipeline Cache:

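Console variable | Effect
r.ShaderPipelineCache.Enabled | Enables the Shader Pipeline Cache so that existing data is loaded from disk and precompiled.
r.ShaderPipelineCache.BatchSize / BackgroundBatchSize | Sets the batch size for the different batch modes.
r.ShaderPipelineCache.LogPSO | Enables PSO logging for the Shader Pipeline Cache.
r.ShaderPipelineCache.SaveAfterPSOsLogged | Sets the expected number of logged PSOs; once that number is reached, the cache is saved automatically.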

In addition, inside GGameIni or GGameUserSettingsIni, the Shader Pipeline Cache stores its information under the section [ShaderPipelineCache.CacheFile].
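
As a rough mental model of what a batch size means here, the sketch below assumes that precompilation drains a queue of pending PSOs and compiles at most one batch per tick. FMiniPrecompileQueue, Tick and TicksToDrain are hypothetical names, and the value 50 is an arbitrary choice for illustration; none of this is the engine's actual implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Hypothetical model: every tick compiles at most BatchSize pending PSOs.
struct FMiniPrecompileQueue
{
    std::deque<int> PendingPSOs;  // ids of PSOs still waiting to be compiled
    std::size_t BatchSize = 50;   // assumed analogue of a batch-size console variable

    // Processes one batch and returns how many PSOs were handled in this tick.
    std::size_t Tick()
    {
        const std::size_t Count = std::min<std::size_t>(BatchSize, PendingPSOs.size());
        for (std::size_t i = 0; i < Count; ++i)
        {
            PendingPSOs.pop_front(); // a real implementation would create the PSO here
        }
        return Count;
    }
};

// Number of ticks needed to drain NumPSOs entries with the given batch size.
inline std::size_t TicksToDrain(std::size_t NumPSOs, std::size_t BatchSize)
{
    return (NumPSOs + BatchSize - 1) / BatchSize;
}
```

Under this model a larger batch finishes the precompile in fewer ticks but spends more time per tick, which is why separate foreground and background batch sizes are useful.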

 

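8.4 Shader Development

This chapter covers shader development examples, debugging techniques and optimization techniques.

8.4.1 Shader Debugging

While a project is still in development, it is best to switch the shader compilation options to development mode. This can be done by editing the following settings in Engine\Config\ConsoleVariables.ini:

Simply remove the semicolon in front of each console variable. Their meanings are as follows: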

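Console variable | Description
r.ShaderDevelopmentMode=1 | Provides detailed logs about shader compilation and the chance to retry on errors.
r.DumpShaderDebugInfo=1 | Saves the files of every compiled shader to disk under ProjectName/Saved/ShaderDebugInfo, including the source file, the preprocessed version, and a batch file (used to compile the preprocessed version with command-line options equivalent to those of the compiler).
r.DumpShaderDebugShortNames=1 | Shortens the saved shader paths.
r.Shaders.Optimize=0 | Disables shader optimization so that the shader's debug information is preserved.
r.Shaders.KeepDebugInfo=1 | Keeps debug information; particularly useful with frame-capture tools such as RenderDoc.
r.Shaders.SkipCompression=1 | Skips shader compression, which saves time when debugging shaders.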

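With the settings above enabled, a RenderDoc frame capture shows the shader's variables and HLSL code in full (without them only assembly instructions are shown) and allows single-step debugging, which effectively improves the efficiency of shader development and debugging.

Once r.DumpShaderDebugInfo is enabled, change any line in one of UE's built-in shaders (for example, add a space to Common.ush) and restart the UE editor. The shaders are then recompiled, and when that finishes, useful debug information is generated under ProjectName/Saved/ShaderDebugInfo:

Opening the directory of a specific material shader reveals the source file, the preprocessed version, the batch file and the hash:

In addition, after modifying certain shader files (such as BasePassPixelShader.usf), there is no need to restart the UE editor: entering the RecompileShaders command in the console recompiles the specified shader files. The variants of RecompileShaders are as follows: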

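Command | Description
RecompileShaders all | Compiles all shaders whose source has been modified, including global, material and meshmaterial shaders.
RecompileShaders changed | Compiles the shaders whose source has been modified.
RecompileShaders global | Compiles the global shaders whose source has been modified.
RecompileShaders material | Compiles the material shaders whose source has been modified.
RecompileShaders material <MaterialName> | Compiles the material with the specified name.
RecompileShaders <SourcePath> | Compiles the shader source file at the specified path.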

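Changes to the shader files must be saved before running the commands above.

In addition, to debug the shader compiler itself when building the project, set the ShaderCompileWorker solution property (Visual Studio: Build -> Configuration Manager) to Debug_Program:

ShaderCompileWorker (SCW) can then be started with a shader debugging command line of the following form: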

PathToGeneratedUsfFile -directcompile -format=ShaderFormat -ShaderType -entry=EntryPoint
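  • PathToGeneratedUsfFile is the final usf file in the ShaderDebugInfo folder.
  • ShaderFormat is the shader platform format to debug (PCD3D_SM5 in this example).
  • ShaderType is one of vs/ps/gs/hs/ds/cs, corresponding to the vertex, pixel, geometry, hull, domain and compute shader types respectively.
  • EntryPoint is the name of the entry-point function of this shader in the usf file.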
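
The argument list described above can be assembled mechanically. The helper below is a minimal sketch in standalone C++; BuildDirectCompileArgs is a hypothetical name introduced only for illustration, and it does nothing more than concatenate the four parts in the documented order.

```cpp
#include <string>

// Builds the SCW argument string from its four parts.
// ShaderType is expected to be one of vs/ps/gs/hs/ds/cs.
inline std::string BuildDirectCompileArgs(
    const std::string& UsfPath,
    const std::string& ShaderFormat,
    const std::string& ShaderType,
    const std::string& EntryPoint)
{
    return UsfPath + " -directcompile -format=" + ShaderFormat
        + " -" + ShaderType + " -entry=" + EntryPoint;
}
```

Generating the string this way keeps the flags in a fixed order when many dumped shaders have to be replayed one after another.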

For example:

<ProjectPath>\Saved\ShaderDebugInfo\PCD3D_SM5\M_Egg\LocalVF\BPPSFNoLMPolicy\BasePassPixelShader.usf -format=PCD3D_SM5 -ps -entry=Main

Set a breakpoint in the CompileD3DShader() function in D3DShaderCompiler.cpp and run SCW from the command line to see how the platform compiler is invoked:

// Engine\Source\Developer\Windows\ShaderFormatD3D\Private\D3DShaderCompiler.cpp

void CompileD3DShader(const FShaderCompilerInput& Input, FShaderCompilerOutput& Output, FShaderCompilerDefinitions& AdditionalDefines, const FString& WorkingDirectory, ELanguage Language)
{
    FString PreprocessedShaderSource;
    const bool bIsRayTracingShader = IsRayTracingShader(Input.Target);
    const bool bUseDXC = bIsRayTracingShader
        || Input.Environment.CompilerFlags.Contains(CFLAG_WaveOperations)
        || Input.Environment.CompilerFlags.Contains(CFLAG_ForceDXC);
    const TCHAR* ShaderProfile = GetShaderProfileName(Input.Target, bUseDXC);

    if(!ShaderProfile)
    {
        Output.Errors.Add(FShaderCompilerError(TEXT("Unrecognized shader frequency")));
        return;
    }

    // Set the additional defines.
    AdditionalDefines.SetDefine(TEXT("COMPILER_HLSL"), 1);

    if (bUseDXC)
    {
        AdditionalDefines.SetDefine(TEXT("PLATFORM_SUPPORTS_SM6_0_WAVE_OPERATIONS"), 1);
        AdditionalDefines.SetDefine(TEXT("PLATFORM_SUPPORTS_STATIC_SAMPLERS"), 1);
    }

    if (Input.bSkipPreprocessedCache)
    {
        if (!FFileHelper::LoadFileToString(PreprocessedShaderSource, *Input.VirtualSourceFilePath))
        {
            return;
        }

        // Cast away const, since this is a debug-only mode.
        CrossCompiler::CreateEnvironmentFromResourceTable(PreprocessedShaderSource, (FShaderCompilerEnvironment&)Input.Environment);
    }
    else
    {
        if (!PreprocessShader(PreprocessedShaderSource, Output, Input, AdditionalDefines))
        {
            return;
        }
    }

    GD3DAllowRemoveUnused = Input.Environment.CompilerFlags.Contains(CFLAG_ForceRemoveUnusedInterpolators) ? 1 : 0;

    FString EntryPointName = Input.EntryPointName;

    Output.bFailedRemovingUnused = false;
    if (GD3DAllowRemoveUnused == 1 && Input.Target.Frequency == SF_Vertex && Input.bCompilingForShaderPipeline)
    {
        // Always add SV_Position.
        TArray<FString> UsedOutputs = Input.UsedOutputs;
        UsedOutputs.AddUnique(TEXT("SV_POSITION"));

        // Output-only system-value semantics must never be removed.
        TArray<FString> Exceptions;
        Exceptions.AddUnique(TEXT("SV_ClipDistance"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance0"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance1"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance2"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance3"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance4"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance5"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance6"));
        Exceptions.AddUnique(TEXT("SV_ClipDistance7"));

        Exceptions.AddUnique(TEXT("SV_CullDistance"));
        Exceptions.AddUnique(TEXT("SV_CullDistance0"));
        Exceptions.AddUnique(TEXT("SV_CullDistance1"));
        Exceptions.AddUnique(TEXT("SV_CullDistance2"));
        Exceptions.AddUnique(TEXT("SV_CullDistance3"));
        Exceptions.AddUnique(TEXT("SV_CullDistance4"));
        Exceptions.AddUnique(TEXT("SV_CullDistance5"));
        Exceptions.AddUnique(TEXT("SV_CullDistance6"));
        Exceptions.AddUnique(TEXT("SV_CullDistance7"));
        
        DumpDebugShaderUSF(PreprocessedShaderSource, Input);

        TArray<FString> Errors;
        if (!RemoveUnusedOutputs(PreprocessedShaderSource, UsedOutputs, Exceptions, EntryPointName, Errors))
        {
            DumpDebugShaderUSF(PreprocessedShaderSource, Input);
            UE_LOG(LogD3D11ShaderCompiler, Warning, TEXT("Failed to Remove unused outputs [%s]!"), *Input.DumpDebugInfoPath);
            for (int32 Index = 0; Index < Errors.Num(); ++Index)
            {
                FShaderCompilerError NewError;
                NewError.StrippedErrorMessage = Errors[Index];
                Output.Errors.Add(NewError);
            }
            Output.bFailedRemovingUnused = true;
        }
    }

    FShaderParameterParser ShaderParameterParser;
    if (!ShaderParameterParser.ParseAndMoveShaderParametersToRootConstantBuffer(
        Input, Output, PreprocessedShaderSource,
        IsRayTracingShader(Input.Target) ? TEXT("cbuffer") : nullptr))
    {
        return;
    }

    RemoveUniformBuffersFromSource(Input.Environment, PreprocessedShaderSource);

    uint32 CompileFlags = D3D10_SHADER_ENABLE_BACKWARDS_COMPATIBILITY
        // Lay out uniform matrices as row-major to match the CPU layout.
        | D3D10_SHADER_PACK_MATRIX_ROW_MAJOR;

    if (Input.Environment.CompilerFlags.Contains(CFLAG_Debug)) 
    {
        // Add the debug flags.
        CompileFlags |= D3D10_SHADER_DEBUG | D3D10_SHADER_SKIP_OPTIMIZATION;
    }
    else
    {
        if (Input.Environment.CompilerFlags.Contains(CFLAG_StandardOptimization))
        {
            CompileFlags |= D3D10_SHADER_OPTIMIZATION_LEVEL1;
        }
        else
        {
            CompileFlags |= D3D10_SHADER_OPTIMIZATION_LEVEL3;
        }
    }

    for (int32 FlagIndex = 0; FlagIndex < Input.Environment.CompilerFlags.Num(); FlagIndex++)
    {
        // Accumulate the translated compiler flags for this shader.
        CompileFlags |= TranslateCompilerFlagD3D11((ECompilerFlags)Input.Environment.CompilerFlags[FlagIndex]);
    }

    TArray<FString> FilteredErrors;
    if (bUseDXC)
    {
        if (!CompileAndProcessD3DShaderDXC(PreprocessedShaderSource, CompileFlags, Input, EntryPointName, ShaderProfile, Language, false, FilteredErrors, Output))
        {
            if (!FilteredErrors.Num())
            {
                FilteredErrors.Add(TEXT("Compile Failed without errors!"));
            }
        }
        CrossCompiler::FShaderConductorContext::ConvertCompileErrors(MoveTemp(FilteredErrors), Output.Errors);
    }
    else
    {
        // Override the default compiler path to use the newer dll.
        FString CompilerPath = FPaths::EngineDir();
        CompilerPath.Append(TEXT("Binaries/ThirdParty/Windows/DirectX/x64/d3dcompiler_47.dll"));

        if (!CompileAndProcessD3DShaderFXC(PreprocessedShaderSource, CompilerPath, CompileFlags, Input, EntryPointName, ShaderProfile, false, FilteredErrors, Output))
        {
            if (!FilteredErrors.Num())
            {
                FilteredErrors.Add(TEXT("Compile Failed without errors!"));
            }
        }

        // Process the errors.
        for (int32 ErrorIndex = 0; ErrorIndex < FilteredErrors.Num(); ErrorIndex++)
        {
            const FString& CurrentError = FilteredErrors[ErrorIndex];
            FShaderCompilerError NewError;

            // Extract filename and line number from FXC output with format:
            // "d:\UE4\Binaries\BasePassPixelShader(30,7): error X3000: invalid target or usage string"
            int32 FirstParenIndex = CurrentError.Find(TEXT("("));
            int32 LastParenIndex = CurrentError.Find(TEXT("):"));
            if (FirstParenIndex != INDEX_NONE &&
                LastParenIndex != INDEX_NONE &&
                LastParenIndex > FirstParenIndex)
            {
                // Extract and store error message with source filename
                NewError.ErrorVirtualFilePath = CurrentError.Left(FirstParenIndex);
                NewError.ErrorLineString = CurrentError.Mid(FirstParenIndex + 1, LastParenIndex - FirstParenIndex - FCString::Strlen(TEXT("(")));
                NewError.StrippedErrorMessage = CurrentError.Right(CurrentError.Len() - LastParenIndex - FCString::Strlen(TEXT("):")));
            }
            else
            {
                NewError.StrippedErrorMessage = CurrentError;
            }
            Output.Errors.Add(NewError);
        }
    }

    const bool bDirectCompile = FParse::Param(FCommandLine::Get(), TEXT("directcompile"));
    if (bDirectCompile)
    {
        for (const auto& Error : Output.Errors)
        {
            FPlatformMisc::LowLevelOutputDebugStringf(TEXT("%s\n"), *Error.GetErrorStringWithLineMarker());
        }
    }

    ShaderParameterParser.ValidateShaderParameterTypes(Input, Output);

    if (Input.ExtraSettings.bExtractShaderSource)
    {
        Output.OptionalFinalShaderSource = PreprocessedShaderSource;
    }
}
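
The FXC error handling in the middle of the listing is easy to miss, so here it is in isolation. The sketch below reproduces the same splitting rule with std::string instead of FString; FParsedShaderError and ParseFxcError are hypothetical names, not engine types.

```cpp
#include <cstddef>
#include <string>

struct FParsedShaderError
{
    std::string FilePath;   // left empty when the line does not match the expected pattern
    std::string LineString; // for example "30,7"
    std::string Message;
};

// Splits "path(line,col): message" at the first "(" and the first "):".
inline FParsedShaderError ParseFxcError(const std::string& Error)
{
    FParsedShaderError Result;
    const std::size_t FirstParen = Error.find('(');
    const std::size_t LastParen = Error.find("):");
    if (FirstParen != std::string::npos &&
        LastParen != std::string::npos &&
        LastParen > FirstParen)
    {
        Result.FilePath = Error.substr(0, FirstParen);
        Result.LineString = Error.substr(FirstParen + 1, LastParen - FirstParen - 1);
        Result.Message = Error.substr(LastParen + 2); // keeps the leading space, like the listing
    }
    else
    {
        Result.Message = Error; // unrecognized format: keep the whole line as the message
    }
    return Result;
}
```

Lines that do not follow the FXC pattern fall through unchanged, which matches the else branch of the engine code.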

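In addition, without tools such as RenderDoc, the data to be debugged can be converted into color values in a sensible range so that its correctness can be checked visually, for example:

// Divide the world position by a range value and output the result as a color.
OutColor = frac(WorldPosition / 1000);

Combined with the RecompileShaders command, this trick is very handy and efficient.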
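
To predict which color a given position should produce, the same mapping can be evaluated on the CPU. The sketch below is a standalone approximation that assumes frac(x) = x - floor(x), as in HLSL; FDebugColor, Frac and WorldPositionToDebugColor are hypothetical helper names.

```cpp
#include <cmath>

struct FDebugColor { float R, G, B; };

// HLSL-style frac: the result always lies in [0, 1), also for negative input.
inline float Frac(float X) { return X - std::floor(X); }

// Maps a world position to a repeating color pattern with a period of Range units.
inline FDebugColor WorldPositionToDebugColor(float X, float Y, float Z, float Range = 1000.0f)
{
    return { Frac(X / Range), Frac(Y / Range), Frac(Z / Range) };
}
```

Because frac wraps, positions that differ by a multiple of the range produce the same color, so the divisor should be chosen close to the extent of the values being inspected.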

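8.4.2 Shader Optimization

Rendering optimization techniques come in many forms, from the system, architecture and engineering level down to individual statements. This section focuses on general shader optimization techniques in the UE environment.

8.4.2.1 Optimizing Permutations

Because UE adopts an uber-shader design, a single shader source file contains a large number of macro definitions, and the different values of these macros can combine into a huge amount of target code. These macros are usually controlled by permutations. Keeping the number of permutations under control therefore reduces the number of shaders to compile and the compile time, and improves run-time efficiency.

In the project settings, some options can be unchecked to reduce the number of permutations:

Note, however, that unchecking an option means the engine disables that feature. The choice has to be weighed against the actual needs of the project rather than optimizing for optimization's sake.

In addition, many built-in types of the engine's rendering modules provide a ShouldCompilePermutation interface, which the compiler uses before compiling to ask the object being compiled whether a given permutation is needed. If it returns false, the compiler skips that permutation, which reduces the number of shaders. Types that support ShouldCompilePermutation include, but are not limited to: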

  • FShader
  • FGlobalShader
  • FMaterialShader
  • FMeshMaterialShader
  • FVertexFactory
  • FLocalVertexFactory
  • FShaderType
  • FGlobalShaderType
  • FMaterialShaderType
  • Subclasses of the types above

Therefore, when adding a subclass of any of the types above, ShouldCompilePermutation deserves careful attention so that useless shader permutations are culled.
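
The effect of such a filter on the shader count can be illustrated without the engine. The sketch below enumerates a made-up permutation space of two boolean switches and one three-level quality setting; FMiniPermutation, ShouldCompileMiniPermutation and the mobile rule inside it are assumptions invented for this example only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical permutation: two boolean switches and a quality level in [0, 3).
struct FMiniPermutation
{
    bool bUseFog;
    bool bUseShadows;
    int32_t Quality;
};

// Plays the role of ShouldCompilePermutation for this toy example.
inline bool ShouldCompileMiniPermutation(const FMiniPermutation& P, bool bMobilePlatform)
{
    // Assumed rule: the mobile platform never uses shadows or the highest quality level.
    if (bMobilePlatform && (P.bUseShadows || P.Quality == 2))
    {
        return false;
    }
    return true;
}

// Enumerates every combination and keeps only those the filter accepts.
inline std::vector<FMiniPermutation> EnumeratePermutations(bool bMobilePlatform)
{
    std::vector<FMiniPermutation> Result;
    for (int32_t Fog = 0; Fog < 2; ++Fog)
    {
        for (int32_t Shadows = 0; Shadows < 2; ++Shadows)
        {
            for (int32_t Quality = 0; Quality < 3; ++Quality)
            {
                const FMiniPermutation P{ Fog != 0, Shadows != 0, Quality };
                if (ShouldCompileMiniPermutation(P, bMobilePlatform))
                {
                    Result.push_back(P);
                }
            }
        }
    }
    return Result;
}
```

Without filtering the space holds 2 x 2 x 3 = 12 permutations; with the assumed mobile rule only 4 remain, and the saving multiplies with every further dimension.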

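For materials, the Automatically Set Usage in Editor option in the material's property panel can be turned off, which prevents extra usage flags from being set during editing and adding shader permutations:

However, the benefit may be small, and a missing usage flag can stop the material from working properly (for example, no support for skinned skeletal meshes or for blend shapes).

Also, add Switch nodes with care, since they usually increase the number of permutations as well: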

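8.4.2.2 Instruction Optimization

  • Avoid if and switch branch statements.

  • Avoid for loops, especially those with a variable iteration count.

  • Reduce the number of texture samples.

  • Avoid clip/discard operations.

  • Reduce calls to complex math functions.

  • Use lower-precision floats. OpenGL ES has three precision qualifiers: highp (typically a 32-bit float), mediump (typically a 16-bit float) and lowp (typically around 10-bit fixed point). Many computations do not need high precision and can use a lower one.

  • Avoid repeated computation. Values that are the same for every pixel can be computed ahead of time or passed in from the C++ side:

    precision mediump float;
    float a = 0.9;
    float b = 0.6;
    
    varying vec4 vColor;
    
    void main()
    {
        gl_FragColor = vColor * a * b; // a * b is evaluated for every pixel, which is redundant work. Compute a * b on the C++ side and pass it into the shader instead.
    }
    
  • Defer vector computation.

    highp float f0, f1;
    highp vec4 v0, v1;
    
    v0 = (v1 * f0) * f1; // v1 * f0 yields a vector that is then multiplied by f1: two vector multiplications.
    // Change to:
    v0 = v1 * (f0 * f1); // Multiply the two scalars first, so only one vector multiplication is needed.
    
  • Make full use of vector component masks.

    highp vec4 v0;
    highp vec4 v1;
    highp vec4 v2;
    v2.xz = v0.xz * v1.xz; // Only the xz components are computed, which is faster than v2 = v0 * v1.
    
  • Avoid or reduce temporary variables.

  • Move pixel shader computation to the vertex shader where possible, for example replacing per-pixel lighting with per-vertex lighting.

  • Move computation that depends on neither the vertex nor the pixel to the CPU and pass the result in through a uniform.

  • Tiered strategies. Use algorithms of different complexity for different quality levels and platforms.

  • Vertex inputs should use a per-structure (interleaved, array-of-structures) layout rather than one array per vertex attribute. The interleaved layout helps improve the GPU cache hit rate.

  • Use compute shaders in place of the traditional VS/PS pipeline where possible. The CS pipeline is simpler and purer, lends itself to parallel computation and, combined with LDS, can improve efficiency effectively.

  • Render at reduced resolution. Some information does not need full-resolution rendering, such as blurry reflections, SSR and SSGI.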
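
The "defer vector computation" item in the list above can be checked by counting scalar multiplications on the CPU. The sketch below is only a counting model: FVec4, FMulCounter, ScaleEager and ScaleDeferred are hypothetical names, and a real shader compiler may or may not perform this regrouping on its own, since it can change floating-point rounding.

```cpp
#include <array>

using FVec4 = std::array<float, 4>;

// Counts scalar multiplications so that the two groupings can be compared.
struct FMulCounter
{
    int Count = 0;

    float Mul(float A, float B) { ++Count; return A * B; }

    // Vector-by-scalar product: four scalar multiplications.
    FVec4 Mul(const FVec4& V, float S)
    {
        return { Mul(V[0], S), Mul(V[1], S), Mul(V[2], S), Mul(V[3], S) };
    }
};

// (v * f0) * f1: two vector-by-scalar products.
inline FVec4 ScaleEager(FMulCounter& C, const FVec4& V, float F0, float F1)
{
    return C.Mul(C.Mul(V, F0), F1);
}

// v * (f0 * f1): one scalar product followed by one vector-by-scalar product.
inline FVec4 ScaleDeferred(FMulCounter& C, const FVec4& V, float F0, float F1)
{
    return C.Mul(V, C.Mul(F0, F1));
}
```

For a four-component vector the eager form costs eight multiplications and the deferred form five, which is the saving the bullet describes.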

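8.4.3 Shader Development Examples

Working through development examples helps consolidate the understanding of UE's shader system.

8.4.3.1 Adding a Global Shader

This section adds a brand-new, minimal global shader to illustrate the process and steps of adding a shader.

First, add a new shader source file, named MyTest.usf here: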

// VS entry point.
void MainVS(
    in float4 InPosition : ATTRIBUTE0,
    out float4 Output : SV_POSITION)
{
    Output = InPosition;
}


// Color variable, passed in from the C++ side.
float4 MyColor;

// PS entry point.
float4 MainPS() : SV_Target0
{
    return MyColor;
}

Next, add the VS and PS on the C++ side. Note that this listing follows the older global shader interface (ShouldCache, Serialize), not the ShouldCompilePermutation signature of the FShader declaration in 8.2.1:

#include "GlobalShader.h"

// VS, derived from FGlobalShader.
class FMyVS : public FGlobalShader
{
    DECLARE_EXPORTED_SHADER_TYPE(FMyVS, Global, /*MYMODULE_API*/);

    FMyVS() {}
    FMyVS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
        : FGlobalShader(Initializer)
    {
    }

    static bool ShouldCache(EShaderPlatform Platform)
    {
        return true;
    }
};

// Implement the VS.
IMPLEMENT_SHADER_TYPE(, FMyVS, TEXT("MyTest"), TEXT("MainVS"), SF_Vertex);


// PS, derived from FGlobalShader.
class FMyPS : public FGlobalShader
{
    DECLARE_EXPORTED_SHADER_TYPE(FMyPS, Global, /*MYMODULE_API*/);

    FShaderParameter MyColorParameter;

    FMyPS() {}
    FMyPS(const ShaderMetaType::CompiledShaderInitializerType& Initializer)
        : FGlobalShader(Initializer)
    {
        // Bind the shader parameter.
        MyColorParameter.Bind(Initializer.ParameterMap, TEXT("MyColor"), SPF_Mandatory);
    }

    static void ModifyCompilationEnvironment(EShaderPlatform Platform, FShaderCompilerEnvironment& OutEnvironment)
    {
        FGlobalShader::ModifyCompilationEnvironment(Platform, OutEnvironment);
        // Add a define.
        OutEnvironment.SetDefine(TEXT("MY_DEFINE"), 1);
    }

    static bool ShouldCache(EShaderPlatform Platform)
    {
        return true;
    }

    // Serialization.
    virtual bool Serialize(FArchive& Ar) override
    {
        bool bShaderHasOutdatedParameters = FGlobalShader::Serialize(Ar);
        Ar << MyColorParameter;
        return bShaderHasOutdatedParameters;
    }

    void SetColor(FRHICommandList& RHICmdList, const FLinearColor& Color)
    {
        // Set the color through the RHI.
        SetShaderValue(RHICmdList, RHICmdList.GetBoundPixelShader(), MyColorParameter, Color);
    }
};

// Implement the PS.
IMPLEMENT_SHADER_TYPE(, FMyPS, TEXT("MyTest"), TEXT("MainPS"), SF_Pixel);

Finally, write the rendering code that calls the custom VS and PS above:

void RenderMyTest(FRHICommandList& RHICmdList, ERHIFeatureLevel::Type FeatureLevel, const FLinearColor& Color)
{
    // Get the global shader map.
    auto ShaderMap = GetGlobalShaderMap(FeatureLevel);

    // Get the VS and PS instances.
    TShaderMapRef<FMyVS> MyVS(ShaderMap);
    TShaderMapRef<FMyPS> MyPS(ShaderMap);

    // Bound shader state.
    static FGlobalBoundShaderState MyTestBoundShaderState;
    SetGlobalBoundShaderState(RHICmdList, FeatureLevel, MyTestBoundShaderState, GetVertexDeclarationFVector4(), *MyVS, *MyPS);

    // Set the PS color.
    MyPS->SetColor(RHICmdList, Color);

    // Set the render states.
    RHICmdList.SetRasterizerState(TStaticRasterizerState<FM_Solid, CM_None>::GetRHI());
    RHICmdList.SetBlendState(TStaticBlendState<>::GetRHI());
    RHICmdList.SetDepthStencilState(TStaticDepthStencilState<false, CF_Always>::GetRHI(), 0);

    // Build the vertices of a full-screen quad.
    FVector4 Vertices[4];
    Vertices[0].Set(-1.0f, 1.0f, 0, 1.0f);
    Vertices[1].Set(1.0f, 1.0f, 0, 1.0f);
    Vertices[2].Set(-1.0f, -1.0f, 0, 1.0f);
    Vertices[3].Set(1.0f, -1.0f, 0, 1.0f);

    // Draw the quad.
    DrawPrimitiveUP(RHICmdList, PT_TriangleStrip, 2, Vertices, sizeof(Vertices[0]));
}

Once RenderMyTest is implemented, it can be called from FDeferredShadingSceneRenderer::RenderFinish to hook it into the main rendering flow:

// Console variable so that the effect can be toggled at run time.
static TAutoConsoleVariable<int32> CVarMyTest(
    TEXT("r.MyTest"),
    0,
    TEXT("Test My Global Shader, set it to 0 to disable, or to 1, 2 or 3 for fun!"),
    ECVF_RenderThreadSafe
);

void FDeferredShadingSceneRenderer::RenderFinish(FRHICommandListImmediate& RHICmdList)
{
    (......)
    
    // Custom code added here to draw over what UE rendered before.
    int32 MyTestValue = CVarMyTest.GetValueOnAnyThread();
    if (MyTestValue != 0)
    {
        FLinearColor Color(MyTestValue == 1, MyTestValue == 2, MyTestValue == 3, 1);
        RenderMyTest(RHICmdList, FeatureLevel, Color);
    }

    FSceneRenderer::RenderFinish(RHICmdList);
    
    (......)
}

The color that the logic above finally renders is determined by r.MyTest: 0 disables it, 1 shows red, 2 shows green and 3 shows blue.
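
The mapping from the console variable to the color relies on bool-to-float conversion, which is easy to overlook. The standalone sketch below mirrors the expression FLinearColor Color(MyTestValue == 1, MyTestValue == 2, MyTestValue == 3, 1); FMiniColor and ColorFromMyTestValue are hypothetical names used in place of the engine types.

```cpp
struct FMiniColor { float R, G, B, A; };

// Each comparison converts to 1.0f when true and 0.0f when false,
// so at most one of the R, G and B channels is ever lit.
inline FMiniColor ColorFromMyTestValue(int Value)
{
    return {
        static_cast<float>(Value == 1),
        static_cast<float>(Value == 2),
        static_cast<float>(Value == 3),
        1.0f
    };
}
```

One consequence of this expression is that any non-zero value other than 1, 2 or 3 still passes the MyTestValue != 0 check and therefore draws opaque black.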

8.4.3.2 Adding a Vertex Factory

The process of adding a new FVertexFactory subclass is as follows:

// FMyVertexFactory.h

// Declare the vertex factory shader parameters.
BEGIN_GLOBAL_SHADER_PARAMETER_STRUCT(FMyVertexFactoryParameters, )
    SHADER_PARAMETER(FVector4, Color)
END_GLOBAL_SHADER_PARAMETER_STRUCT()

// Declare the uniform buffer reference type.
typedef TUniformBufferRef<FMyVertexFactoryParameters> FMyVertexFactoryBufferRef;

// Index buffer.
class FMyMeshIndexBuffer : public FIndexBuffer
{
public:
    FMyMeshIndexBuffer(int32 InNumQuadsPerSide) : NumQuadsPerSide(InNumQuadsPerSide) {}

    void InitRHI() override
    {
        if (NumQuadsPerSide < 256)
        {
            IndexBufferRHI = CreateIndexBuffer<uint16>();
        }
        else
        {
            IndexBufferRHI = CreateIndexBuffer<uint32>();
        }
    }

    int32 GetIndexCount() const { return NumIndices; };

private:
    template <typename IndexType>
    FIndexBufferRHIRef CreateIndexBuffer()
    {
        TResourceArray<IndexType, INDEXBUFFER_ALIGNMENT> Indices;

        // Reserve memory for the vertex indices.
        Indices.Reserve(NumQuadsPerSide * NumQuadsPerSide * 6);

        // Build the index buffer in Morton order for better vertex reuse.
        for (int32 Morton = 0; Morton < NumQuadsPerSide * NumQuadsPerSide; Morton++)
        {
            int32 SquareX = FMath::ReverseMortonCode2(Morton);
            int32 SquareY = FMath::ReverseMortonCode2(Morton >> 1);

            bool ForwardDiagonal = false;

            if (SquareX % 2)
            {
                ForwardDiagonal = !ForwardDiagonal;
            }
            if (SquareY % 2)
            {
                ForwardDiagonal = !ForwardDiagonal;
            }

            int32 Index0 = SquareX + SquareY * (NumQuadsPerSide + 1);
            int32 Index1 = Index0 + 1;
            int32 Index2 = Index0 + (NumQuadsPerSide + 1);
            int32 Index3 = Index2 + 1;

            Indices.Add(Index3);
            Indices.Add(Index1);
            Indices.Add(ForwardDiagonal ? Index2 : Index0);
            Indices.Add(Index0);
            Indices.Add(Index2);
            Indices.Add(ForwardDiagonal ? Index1 : Index3);
        }

        NumIndices = Indices.Num();
        const uint32 Size = Indices.GetResourceDataSize();
        const uint32 Stride = sizeof(IndexType);

        // Create index buffer. Fill buffer with initial data upon creation
        FRHIResourceCreateInfo CreateInfo(&Indices);
        return RHICreateIndexBuffer(Stride, Size, BUF_Static, CreateInfo);
    }

    int32 NumIndices = 0;
    const int32 NumQuadsPerSide = 0;
};

// Vertex buffer.
class FMyMeshVertexBuffer : public FVertexBuffer
{
public:
    FMyMeshVertexBuffer(int32 InNumQuadsPerSide) : NumQuadsPerSide(InNumQuadsPerSide) {}

    virtual void InitRHI() override
    {
        const uint32 NumVertsPerSide = NumQuadsPerSide + 1;
        
        NumVerts = NumVertsPerSide * NumVertsPerSide;

        FRHIResourceCreateInfo CreateInfo;
        void* BufferData = nullptr;
        VertexBufferRHI = RHICreateAndLockVertexBuffer(sizeof(FVector4) * NumVerts, BUF_Static, CreateInfo, BufferData);
        FVector4* DummyContents = (FVector4*)BufferData;

        for (uint32 VertY = 0; VertY < NumVertsPerSide; VertY++)
        {
            FVector4 VertPos;
            VertPos.Y = (float)VertY / NumQuadsPerSide - 0.5f;

            for (uint32 VertX = 0; VertX < NumVertsPerSide; VertX++)
            {
                VertPos.X = (float)VertX / NumQuadsPerSide - 0.5f;

                DummyContents[NumVertsPerSide * VertY + VertX] = VertPos;
            }
        }

        RHIUnlockVertexBuffer(VertexBufferRHI);
    }

    int32 GetVertexCount() const { return NumVerts; }

private:
    int32 NumVerts = 0;
    const int32 NumQuadsPerSide = 0;
};

// Vertex factory.
class FMyVertexFactory : public FVertexFactory
{
    DECLARE_VERTEX_FACTORY_TYPE(FMyVertexFactory);

public:
    using Super = FVertexFactory;

    FMyVertexFactory(ERHIFeatureLevel::Type InFeatureLevel);
    ~FMyVertexFactory();

    virtual void InitRHI() override;
    virtual void ReleaseRHI() override;

    static bool ShouldCompilePermutation(const FVertexFactoryShaderPermutationParameters& Parameters);
    static void ModifyCompilationEnvironment(const FVertexFactoryShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment);
    static void ValidateCompiledResult(const FVertexFactoryType* Type, EShaderPlatform Platform, const FShaderParameterMap& ParameterMap, TArray<FString>& OutErrors);

    inline const FUniformBufferRHIRef GetMyVertexFactoryUniformBuffer() const { return UniformBuffer; }

private:
    void SetupUniformData();

    FMyMeshVertexBuffer* VertexBuffer = nullptr;
    FMyMeshIndexBuffer* IndexBuffer = nullptr;

    FMyVertexFactoryBufferRef UniformBuffer;
};


// FMyVertexFactory.cpp

#include "ShaderParameterUtils.h"

// Implement FMyVertexFactoryParameters; note that its name in the shader is MyVF.
IMPLEMENT_GLOBAL_SHADER_PARAMETER_STRUCT(FMyVertexFactoryParameters, "MyVF");


// Vertex factory shader parameters.
class FMyVertexFactoryShaderParameters : public FVertexFactoryShaderParameters
{
    DECLARE_TYPE_LAYOUT(FMyVertexFactoryShaderParameters, NonVirtual);

public:
    
    void Bind(const FShaderParameterMap& ParameterMap)
    {
    }

    void GetElementShaderBindings(
        const class FSceneInterface* Scene,
        const class FSceneView* View,
        const class FMeshMaterialShader* Shader,
        const EVertexInputStreamType InputStreamType,
        ERHIFeatureLevel::Type FeatureLevel,
        const class FVertexFactory* InVertexFactory,
        const struct FMeshBatchElement& BatchElement,
        class FMeshDrawSingleShaderBindings& ShaderBindings,
        FVertexInputStreamArray& VertexStreams) const
    {
        // Cast to FMyVertexFactory.
        FMyVertexFactory* VertexFactory = (FMyVertexFactory*)InVertexFactory;

        // Add the uniform buffer binding to the shader bindings.
        ShaderBindings.Add(Shader->GetUniformBufferParameter<FMyVertexFactoryParameters>(), VertexFactory->GetMyVertexFactoryUniformBuffer());

        // Fill in the vertex streams.
        // NOTE: InstanceDataBuffers and InstanceOffsetValue are not declared anywhere in this listing;
        // they are assumed to be supplied by the surrounding code.
        if (VertexStreams.Num() > 0)
        {
            // Look up the instance streams by stream index.
            for (int32 i = 0; i < 2; ++i)
            {
                FVertexInputStream* InstanceInputStream = VertexStreams.FindByPredicate([i](const FVertexInputStream& InStream) { return InStream.StreamIndex == i+1; });
                // Bind the vertex buffer of the instance stream.
                InstanceInputStream->VertexBuffer = InstanceDataBuffers->GetBuffer(i);
            }

            // Apply the instance offset.
            if (InstanceOffsetValue > 0)
            {
                VertexFactory->OffsetInstanceStreams(InstanceOffsetValue, InputStreamType, VertexStreams);
            }
        }
    }
};

// ----------- Vertex factory implementation -----------

FMyVertexFactory::FMyVertexFactory(ERHIFeatureLevel::Type InFeatureLevel)
{
    VertexBuffer = new FMyMeshVertexBuffer(16);
    IndexBuffer = new FMyMeshIndexBuffer(16);
}

FMyVertexFactory::~FMyVertexFactory()
{
    delete VertexBuffer;
    delete IndexBuffer;
}

void FMyVertexFactory::InitRHI()
{
    Super::InitRHI();

    // Set up the uniform data.
    SetupUniformData();

    VertexBuffer->InitResource();
    IndexBuffer->InitResource();

    // Vertex stream: position.
    FVertexStream PositionVertexStream;
    PositionVertexStream.VertexBuffer = VertexBuffer;
    PositionVertexStream.Stride = sizeof(FVector4);
    PositionVertexStream.Offset = 0;
    PositionVertexStream.VertexStreamUsage = EVertexStreamUsage::Default;

    // Simple instancing vertex stream data; its VertexBuffer is set at bind time.
    FVertexStream InstanceDataVertexStream;
    InstanceDataVertexStream.VertexBuffer = nullptr;
    InstanceDataVertexStream.Stride = sizeof(FVector4);
    InstanceDataVertexStream.Offset = 0;
    InstanceDataVertexStream.VertexStreamUsage = EVertexStreamUsage::Instancing;

    FVertexElement VertexPositionElement(Streams.Add(PositionVertexStream), 0, VET_Float4, 0, PositionVertexStream.Stride, false);

    // Vertex declaration.
    FVertexDeclarationElementList Elements;
    Elements.Add(VertexPositionElement);

    // Add the instance data vertex streams.
    for (int32 StreamIdx = 0; StreamIdx < NumAdditionalVertexStreams; ++StreamIdx)
    {
        FVertexElement InstanceElement(Streams.Add(InstanceDataVertexStream), 0, VET_Float4, 8 + StreamIdx, InstanceDataVertexStream.Stride, true);
        Elements.Add(InstanceElement);
    }

    // Initialize the declaration.
    InitDeclaration(Elements);
}

void FMyVertexFactory::ReleaseRHI()
{
    UniformBuffer.SafeRelease();
    
    if (VertexBuffer)
    {
        VertexBuffer->ReleaseResource();
    }

    if (IndexBuffer)
    {
        IndexBuffer->ReleaseResource();
    }

    Super::ReleaseRHI();
}

void FMyVertexFactory::SetupUniformData()
{
    FMyVertexFactoryParameters UniformParams;
    UniformParams.Color = FVector4(1,0,0,1);

    UniformBuffer = FMyVertexFactoryBufferRef::CreateUniformBufferImmediate(UniformParams, UniformBuffer_MultiFrame);
}

bool FMyVertexFactory::ShouldCompilePermutation(const FVertexFactoryShaderPermutationParameters& Parameters)
{
    return true;
}

void FMyVertexFactory::ModifyCompilationEnvironment(const FVertexFactoryShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
{
    OutEnvironment.SetDefine(TEXT("MY_MESH_FACTORY"), 1);
}

void FMyVertexFactory::ValidateCompiledResult(const FVertexFactoryType* Type, EShaderPlatform Platform, const FShaderParameterMap& ParameterMap, TArray<FString>& OutErrors)
{
}
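
The Morton-order loop in FMyMeshIndexBuffer is the least obvious part of the C++ side, so the sketch below reproduces it in standalone C++. ReverseMorton2 is a hypothetical re-implementation of the even-bit compaction that a function like FMath::ReverseMortonCode2 performs, and BuildMortonIndices is an illustrative name. The loop covers the whole grid only when NumQuadsPerSide is a power of two (as with the value 16 used by the factory above), because only then do the codes 0..N*N-1 map one-to-one onto the N x N quads.

```cpp
#include <cstdint>
#include <vector>

// Keeps the even bits of X and packs them together (inverse of 2D Morton interleaving).
inline uint32_t ReverseMorton2(uint32_t X)
{
    X &= 0x55555555u;
    X = (X ^ (X >> 1)) & 0x33333333u;
    X = (X ^ (X >> 2)) & 0x0f0f0f0fu;
    X = (X ^ (X >> 4)) & 0x00ff00ffu;
    X = (X ^ (X >> 8)) & 0x0000ffffu;
    return X;
}

// Emits six indices per quad, visiting the quads in Morton (Z-curve) order.
inline std::vector<uint32_t> BuildMortonIndices(uint32_t NumQuadsPerSide)
{
    std::vector<uint32_t> Indices;
    Indices.reserve(NumQuadsPerSide * NumQuadsPerSide * 6);
    for (uint32_t Morton = 0; Morton < NumQuadsPerSide * NumQuadsPerSide; ++Morton)
    {
        const uint32_t SquareX = ReverseMorton2(Morton);
        const uint32_t SquareY = ReverseMorton2(Morton >> 1);
        // The diagonal flips on every other quad, forming a checkerboard pattern.
        const bool bForwardDiagonal = ((SquareX + SquareY) % 2) != 0;

        const uint32_t Index0 = SquareX + SquareY * (NumQuadsPerSide + 1);
        const uint32_t Index1 = Index0 + 1;
        const uint32_t Index2 = Index0 + (NumQuadsPerSide + 1);
        const uint32_t Index3 = Index2 + 1;

        Indices.insert(Indices.end(), {
            Index3, Index1, bForwardDiagonal ? Index2 : Index0,
            Index0, Index2, bForwardDiagonal ? Index1 : Index3 });
    }
    return Indices;
}
```

Consecutive Morton codes stay close together in both X and Y, so neighbouring triangles share recently used vertices more often than with a plain row-by-row order.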

The C++ side is now complete, but the corresponding code also has to be written on the HLSL side:

#include "/Engine/Private/VertexFactoryCommon.ush"

// Structure interpolated from the VS to the PS.
struct FVertexFactoryInterpolantsVSToPS
{
#if NUM_TEX_COORD_INTERPOLATORS
    float4    TexCoords[(NUM_TEX_COORD_INTERPOLATORS+1)/2] : TEXCOORD0;
#endif

#if VF_USE_PRIMITIVE_SCENE_DATA
    nointerpolation uint PrimitiveId : PRIMITIVE_ID;
#endif

#if INSTANCED_STEREO
    nointerpolation uint EyeIndex : PACKED_EYE_INDEX;
#endif
};

struct FVertexFactoryInput
{
    float4    Position    : ATTRIBUTE0;

    float4 InstanceData0 : ATTRIBUTE8;
    float4 InstanceData1 : ATTRIBUTE9; 

#if VF_USE_PRIMITIVE_SCENE_DATA
    uint PrimitiveId : ATTRIBUTE13;
#endif
};

struct FPositionOnlyVertexFactoryInput
{
    float4    Position    : ATTRIBUTE0;

    float4 InstanceData0 : ATTRIBUTE8;
    float4 InstanceData1 : ATTRIBUTE9; 

#if VF_USE_PRIMITIVE_SCENE_DATA
    uint PrimitiveId : ATTRIBUTE1;
#endif
};

struct FPositionAndNormalOnlyVertexFactoryInput
{
    float4    Position    : ATTRIBUTE0;
    float4    Normal        : ATTRIBUTE2;

    float4 InstanceData0 : ATTRIBUTE8;
    float4 InstanceData1 : ATTRIBUTE9; 

#if VF_USE_PRIMITIVE_SCENE_DATA
    uint PrimitiveId : ATTRIBUTE1;
#endif
};

struct FVertexFactoryIntermediates
{
    float3 OriginalWorldPos;
    
    uint PrimitiveId;
};

uint GetPrimitiveId(FVertexFactoryInterpolantsVSToPS Interpolants)
{
#if VF_USE_PRIMITIVE_SCENE_DATA
    return Interpolants.PrimitiveId;
#else
    return 0;
#endif
}

void SetPrimitiveId(inout FVertexFactoryInterpolantsVSToPS Interpolants, uint PrimitiveId)
{
#if VF_USE_PRIMITIVE_SCENE_DATA
    Interpolants.PrimitiveId = PrimitiveId;
#endif
}

#if NUM_TEX_COORD_INTERPOLATORS
float2 GetUV(FVertexFactoryInterpolantsVSToPS Interpolants, int UVIndex)
{
    float4 UVVector = Interpolants.TexCoords[UVIndex / 2];
    return UVIndex % 2 ? UVVector.zw : UVVector.xy;
}

void SetUV(inout FVertexFactoryInterpolantsVSToPS Interpolants, int UVIndex, float2 InValue)
{
    FLATTEN
    if (UVIndex % 2)
    {
        Interpolants.TexCoords[UVIndex / 2].zw = InValue;
    }
    else
    {
        Interpolants.TexCoords[UVIndex / 2].xy = InValue;
    }
}
#endif

FMaterialPixelParameters GetMaterialPixelParameters(FVertexFactoryInterpolantsVSToPS Interpolants, float4 SvPosition)
{
    // GetMaterialPixelParameters is responsible for fully initializing the result
    FMaterialPixelParameters Result = MakeInitializedMaterialPixelParameters();

#if NUM_TEX_COORD_INTERPOLATORS
    UNROLL
    for (int CoordinateIndex = 0; CoordinateIndex < NUM_TEX_COORD_INTERPOLATORS; CoordinateIndex++)
    {
        Result.TexCoords[CoordinateIndex] = GetUV(Interpolants, CoordinateIndex);
    }
#endif    // NUM_TEX_COORD_INTERPOLATORS

    Result.TwoSidedSign = 1;
    Result.PrimitiveId = GetPrimitiveId(Interpolants);

    return Result;
}

FMaterialVertexParameters GetMaterialVertexParameters(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates, float3 WorldPosition, half3x3 TangentToLocal)
{
    FMaterialVertexParameters Result = (FMaterialVertexParameters)0;
    
    Result.WorldPosition = WorldPosition;
    Result.TangentToWorld = float3x3(1,0,0,0,1,0,0,0,1);
    Result.PreSkinnedPosition = Input.Position.xyz;
    Result.PreSkinnedNormal = float3(0,0,1);

#if NUM_MATERIAL_TEXCOORDS_VERTEX
    UNROLL
    for(int CoordinateIndex = 0; CoordinateIndex < NUM_MATERIAL_TEXCOORDS_VERTEX; CoordinateIndex++)
    {
        Result.TexCoords[CoordinateIndex] = Intermediates.OriginalWorldPos.xy;
    }
#endif  //NUM_MATERIAL_TEXCOORDS_VERTEX

    return Result;
}

FVertexFactoryIntermediates GetVertexFactoryIntermediates(FVertexFactoryInput Input)
{
    FVertexFactoryIntermediates Intermediates;

    // Get the packed instance data
    float4 Data0 = Input.InstanceData0;
    float4 Data1 = Input.InstanceData1;

    const float3 Translation = Data0.xyz;
    const float3 Scale = float3(Data1.zw, 1.0f);
    const uint PackedDataChannel = asuint(Data1.x);

    // Lod level is in first 8 bits and ShouldMorph bit is in the 9th bit
    const float LODLevel = (float)(PackedDataChannel & 0xFF);
    const uint ShouldMorph = ((PackedDataChannel >> 8) & 0x1); 

    // Calculate the world pos
    Intermediates.OriginalWorldPos = float3(Input.Position.xy, 0.0f) * Scale + Translation;

#if VF_USE_PRIMITIVE_SCENE_DATA
    Intermediates.PrimitiveId = Input.PrimitiveId;
#else
    Intermediates.PrimitiveId = 0;
#endif

    return Intermediates;
}

half3x3 VertexFactoryGetTangentToLocal(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates)
{
    return half3x3(1,0,0,0,1,0,0,0,1);
}

float4 VertexFactoryGetRasterizedWorldPosition(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates, float4 InWorldPosition)
{
    return InWorldPosition;
}

float3 VertexFactoryGetPositionForVertexLighting(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates, float3 TranslatedWorldPosition)
{
    return TranslatedWorldPosition;
}

FVertexFactoryInterpolantsVSToPS VertexFactoryGetInterpolantsVSToPS(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates, FMaterialVertexParameters VertexParameters)
{
    FVertexFactoryInterpolantsVSToPS Interpolants;

    Interpolants = (FVertexFactoryInterpolantsVSToPS)0;

#if NUM_TEX_COORD_INTERPOLATORS
    float2 CustomizedUVs[NUM_TEX_COORD_INTERPOLATORS];
    GetMaterialCustomizedUVs(VertexParameters, CustomizedUVs);
    GetCustomInterpolators(VertexParameters, CustomizedUVs);
    
    UNROLL
    for (int CoordinateIndex = 0; CoordinateIndex < NUM_TEX_COORD_INTERPOLATORS; CoordinateIndex++)
    {
        SetUV(Interpolants, CoordinateIndex, CustomizedUVs[CoordinateIndex]);
    }
#endif

#if INSTANCED_STEREO
    Interpolants.EyeIndex = 0;
#endif

    SetPrimitiveId(Interpolants, Intermediates.PrimitiveId);

    return Interpolants;
}

float4 VertexFactoryGetWorldPosition(FPositionOnlyVertexFactoryInput Input)
{
    return Input.Position;
}

float4 VertexFactoryGetPreviousWorldPosition(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates)
{
    float4x4 PreviousLocalToWorldTranslated = GetPrimitiveData(Intermediates.PrimitiveId).PreviousLocalToWorld;
    PreviousLocalToWorldTranslated[3][0] += ResolvedView.PrevPreViewTranslation.x;
    PreviousLocalToWorldTranslated[3][1] += ResolvedView.PrevPreViewTranslation.y;
    PreviousLocalToWorldTranslated[3][2] += ResolvedView.PrevPreViewTranslation.z;

    return mul(Input.Position, PreviousLocalToWorldTranslated);
}

float4 VertexFactoryGetTranslatedPrimitiveVolumeBounds(FVertexFactoryInterpolantsVSToPS Interpolants)
{
    float4 ObjectWorldPositionAndRadius = GetPrimitiveData(GetPrimitiveId(Interpolants)).ObjectWorldPositionAndRadius;
    return float4(ObjectWorldPositionAndRadius.xyz + ResolvedView.PreViewTranslation.xyz, ObjectWorldPositionAndRadius.w);
}

uint VertexFactoryGetPrimitiveId(FVertexFactoryInterpolantsVSToPS Interpolants)
{
    return GetPrimitiveId(Interpolants);
}

float3 VertexFactoryGetWorldNormal(FPositionAndNormalOnlyVertexFactoryInput Input)
{
    return Input.Normal.xyz;
}

float3 VertexFactoryGetWorldNormal(FVertexFactoryInput Input, FVertexFactoryIntermediates Intermediates)
{
    return float3(0.0f, 0.0f, 1.0f);
}

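As the listing shows, a newly added custom FVertexFactory type needs the following interfaces to be implemented in HLSL:

Interface | Description
FVertexFactoryInput | Defines the layout of the data fed into the VS; it must match the FVertexFactory type on the C++ side.
FVertexFactoryIntermediates | Stores cached intermediate data that is used across multiple vertex factory functions, such as TangentToLocal.
FVertexFactoryInterpolantsVSToPS | The vertex factory data passed from the VS to the PS.
VertexFactoryGetWorldPosition | Called from the vertex shader to obtain the world-space vertex position.
VertexFactoryGetInterpolantsVSToPS | Converts FVertexFactoryInput into FVertexFactoryInterpolantsVSToPS, computing the data that needs to be interpolated or passed to the PS before hardware rasterization interpolates it.
GetMaterialPixelParameters | Called by the PS to compute and fill the FMaterialPixelParameters structure from FVertexFactoryInterpolantsVSToPS.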
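
The GetUV and SetUV helpers of the HLSL listing pack two float2 UV sets into each float4 interpolator. The standalone C++ sketch below mirrors that packing rule so it can be checked on the CPU; FFloat2, FFloat4, NumPackedInterpolators and TPackedUVs are hypothetical names, not engine types.

```cpp
#include <array>
#include <cstddef>

struct FFloat2 { float X, Y; };
struct FFloat4 { float X, Y, Z, W; };

// N float2 UV sets need (N + 1) / 2 float4 interpolators.
constexpr std::size_t NumPackedInterpolators(std::size_t NumUVs) { return (NumUVs + 1) / 2; }

template <std::size_t NumUVs>
struct TPackedUVs
{
    std::array<FFloat4, NumPackedInterpolators(NumUVs)> TexCoords{};

    // Even indices are stored in .xy, odd indices in .zw of the same float4.
    void SetUV(std::size_t UVIndex, FFloat2 Value)
    {
        FFloat4& Slot = TexCoords[UVIndex / 2];
        if (UVIndex % 2) { Slot.Z = Value.X; Slot.W = Value.Y; }
        else             { Slot.X = Value.X; Slot.Y = Value.Y; }
    }

    FFloat2 GetUV(std::size_t UVIndex) const
    {
        const FFloat4& Slot = TexCoords[UVIndex / 2];
        return (UVIndex % 2) ? FFloat2{ Slot.Z, Slot.W } : FFloat2{ Slot.X, Slot.Y };
    }
};
```

Packing this way halves the number of interpolator registers that the UVs occupy, which matters because the count of VS-to-PS interpolators is limited.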

 

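8.5 Summary

This article has covered the basic concepts, types and mechanisms of UE's shader system. Hopefully, after working through it, readers will no longer find UE's shaders unfamiliar and will be able to apply them in real projects.

8.5.1 Questions for Reflection

As usual, this article ends with a few questions to help deepen the understanding of UE's shader system:

  • Which important subclasses exist in the FShader inheritance hierarchy? What do they do, and how do they resemble and differ from one another?
  • How are shader parameters and uniform buffers declared, implemented, used and updated to the GPU?
  • How do the storage and compilation mechanisms of shader maps work?
  • What approach does UE take to cross-platform shaders? Why? Is there a better way?
  • How can shaders be debugged or optimized more effectively?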

 

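Special Notes

  • Thanks to the authors of all the references. Some images come from the references and the internet; they will be removed on request if they infringe.
  • This series is the author's original work and is published only on cnblogs (博客園). Sharing a link to this article is welcome, but reposting without permission is not allowed.
  • The series is still in progress; for the full table of contents, see the content outline.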

 

References