.NET Performance Optimization: Time to Switch Serialization Protocols

November 7, 2022
Notes
.NET, C#, performance optimization, high performance

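Single-machine performance has always been bounded by Moore's law, and with the rise of the mobile internet, the limits of a single machine have become more and more obvious, holding back the whole industry. We cannot scale a system up (vertically) without limit, but we can scale it out horizontally as a distributed system. That sounds great, but it also brings the problem this post is about: the more distributed nodes there are, the higher the cost of communication between them.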

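Network bandwidth keeps getting scarcer, so 10 Gbps NICs have become standard on our servers
TCP/IP communication in the HTTP 1.x/2 era is inefficient, so we are about to adopt QUIC and HTTP/3
Going through the socket stack on the same machine is too slow, so we started using eBPF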
….

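Today our applications spend a great deal of time on network communication, and a large share of that time goes to serialization. Like most teams, we use JSON for the vast majority of our internal microservice communication. JSON has many advantages, the first being that it is very human-friendly: we can read it and understand it directly. But it also has serious drawbacks: serialization is slow, and the serialized strings are large.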
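To make the size problem concrete, here is a minimal sketch that encodes one record with the same field types as the DemoClass used later (the sample values are assumptions) both as JSON via System.Text.Json and as a hand-rolled binary layout written with BinaryWriter. The binary layout is purely illustrative and is not any of the formats compared in this post; it simply drops the field names and writes numbers in fixed-width form.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Sample record with the same field types as DemoClass (values assumed for illustration).
var record = new { P1 = 1, P2 = false, P3 = "Hello World 1", P4 = 1.0, P5 = 1L };

// JSON repeats every field name and writes numbers and booleans as text.
byte[] json = JsonSerializer.SerializeToUtf8Bytes(record);

// Hypothetical binary layout: fixed field order, no field names.
byte[] EncodeBinary(int p1, bool p2, string p3, double p4, long p5)
{
    using var ms = new MemoryStream();
    using (var w = new BinaryWriter(ms, Encoding.UTF8, leaveOpen: true))
    {
        w.Write(p1); // 4 bytes
        w.Write(p2); // 1 byte
        w.Write(p3); // 7-bit encoded length prefix + UTF-8 bytes
        w.Write(p4); // 8 bytes
        w.Write(p5); // 8 bytes
    }
    return ms.ToArray();
}

byte[] binary = EncodeBinary(record.P1, record.P2, record.P3, record.P4, record.P5);

Console.WriteLine($"JSON:   {json.Length} bytes -> {Encoding.UTF8.GetString(json)}");
Console.WriteLine($"Binary: {binary.Length} bytes"); // 35 bytes
```

The gap grows with the number of fields and rows, because every JSON object repeats all of its field names.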

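On a previous project I ran into a technology-selection problem: we had hundreds of millions of rows to cache in Redis, each with hundreds of fields. Stored as JSON, the data would have consumed several terabytes of memory (and deploying it as a cluster with primary/replica replication across multiple data centers multiplies that, which was far too expensive for us). So we went looking for a better serialization format than JSON.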

What's out there

There are many serialization protocols available, such as XML, JSON, Thrift, and Kryo. We picked the ones most commonly used on the .NET platform for comparison:

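JSON: a lightweight data-interchange format. It stores and represents data in a text format that is completely independent of any programming language, and its concise, clear hierarchical structure makes it an ideal data-interchange language.
Protobuf: Protocol Buffers is a language-neutral, platform-neutral, extensible mechanism for serializing structured data, usable for communication protocols, data storage, and more. It is similar to XML, but smaller, faster, and simpler.
MessagePack: an efficient binary serialization format. It lets you exchange data between many languages just like JSON, but it is faster and smaller. Small integers are encoded in a single byte, and typical short strings need only one extra byte in addition to the string itself.
MemoryPack: an efficient binary serialization format designed specifically for C# by Yoshifumi Kawai. It takes advantage of many newer .NET platform features, is code-first and works out of the box, which makes it very simple to use, and it also performs very well.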
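The size claims for Protobuf and MessagePack come from how they encode small values. The sketch below reimplements three of those rules by hand, purely for illustration: Protobuf's base-128 varint, MessagePack's positive fixint (0 to 127 stored as one byte), and MessagePack's fixstr (strings up to 31 UTF-8 bytes with a one-byte header). The helper names are hypothetical, they only cover these small-value cases, and they are not the libraries' own APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Protobuf-style base-128 varint: 7 payload bits per byte, low-order group first;
// the high bit of each byte says whether another byte follows.
byte[] EncodeVarint(ulong value)
{
    var bytes = new List<byte>();
    while (value >= 0x80)
    {
        bytes.Add((byte)(value | 0x80)); // low 7 bits plus the continuation bit
        value >>= 7;
    }
    bytes.Add((byte)value); // last byte, continuation bit clear
    return bytes.ToArray();
}

// MessagePack positive fixint: values 0..127 are stored as the byte itself.
byte[] EncodeMsgPackFixInt(int value) =>
    value is >= 0 and <= 127
        ? new[] { (byte)value }
        : throw new ArgumentOutOfRangeException(nameof(value), "this sketch only covers positive fixint");

// MessagePack fixstr: one header byte 0b101xxxxx holding the UTF-8 length (max 31), then the bytes.
byte[] EncodeMsgPackFixStr(string s)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(s);
    if (utf8.Length > 31)
        throw new ArgumentOutOfRangeException(nameof(s), "this sketch only covers fixstr");
    var result = new byte[utf8.Length + 1];
    result[0] = (byte)(0xA0 | utf8.Length);
    utf8.CopyTo(result, 1);
    return result;
}

byte[] fixStr = EncodeMsgPackFixStr("Hello World 1");
Console.WriteLine(Convert.ToHexString(EncodeVarint(1)));          // 01
Console.WriteLine(Convert.ToHexString(EncodeVarint(300)));        // AC02
Console.WriteLine(Convert.ToHexString(EncodeMsgPackFixInt(100))); // 64
Console.WriteLine($"{fixStr.Length} bytes, header {fixStr[0]:X2}"); // 14 bytes, header AD
```

A 4-byte int such as 1 or 100 therefore costs a single byte in either format, which is where much of the size advantage over fixed-width or text encodings comes from.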

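All of these are commonly used on .NET, and the last three in particular all claim to be very small and very fast. So let's see which one is actually the fastest and which produces the smallest output.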

Preparation

We prepared a DemoClass class with a few properties of different types, plus an array of a child class. Ignore the attributes on top of the classes for now.

using MemoryPack;
using MessagePack;
using ProtoBuf;

[MemoryPackable]
[MessagePackObject]  
[ProtoContract]  
public partial class DemoClass  
{  
    [Key(0)] [ProtoMember(1)] public int P1 { get; set; }  
    [Key(1)] [ProtoMember(2)] public bool P2 { get; set; }  
    [Key(2)] [ProtoMember(3)] public string P3 { get; set; } = null!;  
    [Key(3)] [ProtoMember(4)] public double P4 { get; set; }  
    [Key(4)] [ProtoMember(5)] public long P5 { get; set; }  
    [Key(5)] [ProtoMember(6)] public DemoSubClass[] Subs { get; set; } = null!;  
}  
  
[MemoryPackable]  
[MessagePackObject]  
[ProtoContract]  
public partial class DemoSubClass  
{  
    [Key(0)] [ProtoMember(1)] public int P1 { get; set; }  
    [Key(1)] [ProtoMember(2)] public bool P2 { get; set; }  
    [Key(2)] [ProtoMember(3)] public string P3 { get; set; } = null!;  
    [Key(3)] [ProtoMember(4)] public double P4 { get; set; }  
    [Key(4)] [ProtoMember(5)] public long P5 { get; set; }  
}

System.Text.Json

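The reason for choosing it is simple: it is probably one of the fastest JSON serialization frameworks on .NET today. It is very easy to use and already built into the .NET BCL; just import the System.Text.Json namespace and call its static methods to serialize and deserialize.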

using System.Text.Json;

var obj = ....;

// Serialize
var json = JsonSerializer.Serialize(obj);  

// Deserialize
var newObj = JsonSerializer.Deserialize<T>(json);
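A self-contained version of the calls above, as a sketch under one assumption: since it declares no DemoClass type, it serializes an anonymous object and reads the result back as a JsonElement; with a real class you would call JsonSerializer.Deserialize<DemoClass>(...) instead. It also shows SerializeToUtf8Bytes, which produces the UTF-8 bytes that actually go over the network or into a cache without building an intermediate string first.

```csharp
using System;
using System.Text.Json;

var obj = new { P1 = 42, P2 = true, P3 = "Hello World 42", P4 = 42.0, P5 = 42L };

// Serialize: as a .NET string, and directly as UTF-8 bytes.
string json = JsonSerializer.Serialize(obj);
byte[] utf8 = JsonSerializer.SerializeToUtf8Bytes(obj);

// Deserialize from the UTF-8 bytes; JsonElement stands in for a concrete type here.
JsonElement newObj = JsonSerializer.Deserialize<JsonElement>(utf8);

Console.WriteLine(json);
Console.WriteLine($"{utf8.Length} UTF-8 bytes");
Console.WriteLine(newObj.GetProperty("P3").GetString()); // Hello World 42
```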

Google Protobuf

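This is the most commonly used Protobuf serialization framework on .NET. It is really a toolkit: from the toolkit plus *.proto files you can generate gRPC services or the serialization code for the corresponding entities, although it is somewhat cumbersome to use.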

It requires two NuGet packages:

<!-- Google.Protobuf: serialization and deserialization helpers -->
<PackageReference Include="Google.Protobuf" Version="3.21.9" />

<!-- Grpc.Tools: generates the protobuf serialization classes and gRPC services -->
<PackageReference Include="Grpc.Tools" Version="2.50.0">  
  <PrivateAssets>all</PrivateAssets>  
  <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>  
</PackageReference>

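Because it cannot work with plain C# objects directly, we also need to create a *.proto file with the same layout as the C# classes above. It adds a DemoClassArrayProto message to make the tests easier: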

syntax="proto3";  
option csharp_namespace="DemoClassProto";  
package DemoClassProto;  
  
message DemoClassArrayProto  
{  
  repeated DemoClassProto DemoClass = 1;  
}  
  
message DemoClassProto  
{  
  int32 P1=1;  
  bool P2=2;  
  string P3=3;  
  double P4=4;  
  int64 P5=5;  
  repeated DemoSubClassProto Subs=6;  
}  
  
message DemoSubClassProto  
{  
  int32 P1=1;  
  bool P2=2;  
  string P3=3;  
  double P4=4;  
  int64 P5=5;  
}
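To see what those field numbers mean on the wire, here is a minimal hand-written encoder for one DemoSubClassProto message that follows the protobuf wire format: each field is prefixed with a key (field_number << 3) | wire_type, int32/int64/bool are varints, strings are length-delimited, double is a little-endian fixed64, and proto3 skips fields that hold their default value. It is an illustration only, not Google.Protobuf's implementation; EncodeSub and its helpers are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

var buf = new List<byte>();

void WriteVarint(ulong v)
{
    while (v >= 0x80) { buf.Add((byte)(v | 0x80)); v >>= 7; }
    buf.Add((byte)v);
}

// Field key = (field_number << 3) | wire_type (0 = varint, 1 = fixed64, 2 = length-delimited).
void WriteKey(int fieldNumber, int wireType) => WriteVarint((ulong)((fieldNumber << 3) | wireType));

byte[] EncodeSub(int p1, bool p2, string p3, double p4, long p5)
{
    buf.Clear();
    // proto3: fields equal to their default (0, false, "") are not written at all.
    if (p1 != 0) { WriteKey(1, 0); WriteVarint((ulong)(long)p1); } // int32: negatives sign-extend to 10 bytes
    if (p2) { WriteKey(2, 0); WriteVarint(1); }                   // bool: varint 1
    if (p3.Length > 0)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(p3);
        WriteKey(3, 2); WriteVarint((ulong)utf8.Length); buf.AddRange(utf8); // string: length + bytes
    }
    if (p4 != 0d)
    {
        WriteKey(4, 1);                                                      // double: fixed64
        Span<byte> tmp = stackalloc byte[8];
        BinaryPrimitives.WriteDoubleLittleEndian(tmp, p4);
        buf.AddRange(tmp.ToArray());
    }
    if (p5 != 0) { WriteKey(5, 0); WriteVarint((ulong)p5); }     // int64: varint
    return buf.ToArray();
}

byte[] msg = EncodeSub(1, false, "Hello World 1", 1.0, 1);
Console.WriteLine($"{msg.Length} bytes: {Convert.ToHexString(msg)}");
// 28 bytes: 08011A0D48656C6C6F20576F726C64203121000000000000F03F2801
```

Note that field names never appear in the output, only the numbers assigned in the .proto file, which is why those numbers must stay stable once data has been stored.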

After that, add the following to the project file so that Grpc.Tools generates the corresponding C# classes at build time:

<ItemGroup>  
    <Protobuf Include="*.proto" GrpcServices="Server" />  
</ItemGroup>

Building the project then generates the C# classes in the obj directory.

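Finally, we can serialize and deserialize with the methods below. The generic type T must be an entity generated from the *.proto file, which implements IMessage<T> (still rather cumbersome to use):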

using System.Runtime.CompilerServices;
using DemoClassProto;
using Google.Protobuf;

// Serialize: T is a message class generated from the *.proto file
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte[] GoogleProtobufSerialize<T>(T origin) where T : IMessage<T>
{
    return origin.ToByteArray();
}

// Deserialize: every generated message class exposes a static Parser
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static DemoClassArrayProto GoogleProtobufDeserialize(byte[] bytes)
{
    return DemoClassArrayProto.Parser.ParseFrom(bytes);
}

Protobuf.Net

So is there a simpler way to use protobuf on .NET? Of course there is. We only need to reference the following NuGet package:

<PackageReference Include="protobuf-net" Version="3.1.22" />

Then mark the C# class to be serialized with the ProtoContract attribute and each property to be serialized with the ProtoMember attribute:

[ProtoContract]  
public class DemoClass  
{  
    [ProtoMember(1)] public int P1 { get; set; }  
    [ProtoMember(2)] public bool P2 { get; set; }  
    [ProtoMember(3)] public string P3 { get; set; } = null!;  
    [ProtoMember(4)] public double P4 { get; set; }  
    [ProtoMember(5)] public long P5 { get; set; }  
}

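We can then serialize and deserialize directly with the static class the framework provides. Unfortunately it has no method that returns a byte[] directly, so we have to go through a MemoryStream: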

using System.IO;
using System.Runtime.CompilerServices;
using ProtoBuf;

// Serialize: writes into the caller-supplied stream
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void ProtoBufDotNet<T>(T origin, Stream stream)
{
    Serializer.Serialize(stream, origin);
}

// Deserialize: wrap the byte[] in a MemoryStream first
public static T ProtobufDotNet<T>(byte[] bytes)
{
    using var stream = new MemoryStream(bytes);
    return Serializer.Deserialize<T>(stream);
}
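If a byte[] is what you need (for example, to store the result in Redis), the MemoryStream step can be wrapped once in a small helper. The sketch below is a hypothetical helper, not part of protobuf-net: it accepts any Stream-based serializer as a delegate. A BinaryWriter stands in for the serializer so the example runs without the package; with protobuf-net the delegate would be `s => Serializer.Serialize(s, origin)`.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: runs a Stream-based serializer against a MemoryStream and returns the bytes.
byte[] SerializeToBytes(Action<Stream> serialize)
{
    using var stream = new MemoryStream();
    serialize(stream);
    return stream.ToArray(); // copies only the bytes actually written
}

// Stand-in serializer for the demo: an int plus a length-prefixed string.
byte[] bytes = SerializeToBytes(s =>
{
    using var writer = new BinaryWriter(s, Encoding.UTF8, leaveOpen: true);
    writer.Write(42);
    writer.Write("Hello World 42");
});

Console.WriteLine($"{bytes.Length} bytes"); // 19 bytes
```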

MessagePack

Here we use MessagePack-CSharp by Yoshifumi Kawai; again, we just add a NuGet package:

<PackageReference Include="MessagePack" Version="2.4.35" />

Then we only need to put the MessagePackObject attribute on the class and a Key attribute on each property to be serialized:

[MessagePackObject] 
public partial class DemoClass  
{  
    [Key(0)] public int P1 { get; set; }  
    [Key(1)] public bool P2 { get; set; }  
    [Key(2)] public string P3 { get; set; } = null!;  
    [Key(3)] public double P4 { get; set; }  
    [Key(4)] public long P5 { get; set; }
}

It is also very simple to use; just call the static class that MessagePack provides:

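using System.Runtime.CompilerServices;
using MessagePack;

// These methods are themselves named MessagePack, so inside the class the simple name
// MessagePack refers to the method; the serializer is reached through global:: instead.

// Serialize
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte[] MessagePack<T>(T origin)
{
    return global::MessagePack.MessagePackSerializer.Serialize(origin);
}

// Deserialize
public static T MessagePack<T>(byte[] bytes)
{
    return global::MessagePack.MessagePackSerializer.Deserialize<T>(bytes);
}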
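The integer Key attributes matter for size. According to the MessagePack-CSharp README, integer keys make an object serialize as a MessagePack array, while string keys produce a map that repeats every property name. The sketch below hand-encodes one DemoSubClass-like record both ways, following the MessagePack specification (fixarray/fixmap headers, positive fixint, false, fixstr, float64), to show the difference. It is an illustration only: the helper names are hypothetical, it handles only the small values used here, and it is not how the library is implemented.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

var buf = new List<byte>();

void WriteFixInt(int v)
{
    // positive fixint: 0x00-0x7f stored as-is (this sketch supports only that range)
    if (v is < 0 or > 127) throw new ArgumentOutOfRangeException(nameof(v));
    buf.Add((byte)v);
}

void WriteBool(bool v) => buf.Add(v ? (byte)0xC3 : (byte)0xC2); // true = 0xc3, false = 0xc2

void WriteFixStr(string s)
{
    byte[] utf8 = Encoding.UTF8.GetBytes(s);
    if (utf8.Length > 31) throw new ArgumentOutOfRangeException(nameof(s));
    buf.Add((byte)(0xA0 | utf8.Length)); // fixstr header 0b101xxxxx
    buf.AddRange(utf8);
}

void WriteFloat64(double v)
{
    buf.Add(0xCB); // float64 marker, followed by big-endian IEEE 754 bytes
    Span<byte> tmp = stackalloc byte[8];
    BinaryPrimitives.WriteDoubleBigEndian(tmp, v);
    buf.AddRange(tmp.ToArray());
}

// Encodes P1..P5 either as a fixarray (integer keys) or as a fixmap (string keys).
byte[] Encode(bool asMap)
{
    buf.Clear();
    buf.Add((byte)((asMap ? 0x80 : 0x90) | 5));   // fixmap or fixarray header with 5 entries
    void Name(string key) { if (asMap) WriteFixStr(key); } // map entries start with their key
    Name("P1"); WriteFixInt(1);
    Name("P2"); WriteBool(false);
    Name("P3"); WriteFixStr("Hello World 1");
    Name("P4"); WriteFloat64(1.0);
    Name("P5"); WriteFixInt(1);
    return buf.ToArray();
}

Console.WriteLine($"Integer keys (array): {Encode(false).Length} bytes"); // 27 bytes
Console.WriteLine($"String keys (map):    {Encode(true).Length} bytes");  // 42 bytes
```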

It also ships with LZ4 compression: we only need to configure the options to enable it. There are two modes, Lz4Block and Lz4BlockArray; let's try them:

// Shared options with LZ4 block compression (MessagePackCompression.Lz4BlockArray selects the other mode)
public static readonly MessagePackSerializerOptions MpLz4BOptions =
    MessagePackSerializerOptions.Standard.WithCompression(MessagePackCompression.Lz4Block);

// Serialize
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte[] MessagePackLz4Block<T>(T origin)
{
    return global::MessagePack.MessagePackSerializer.Serialize(origin, MpLz4BOptions);
}

// Deserialize: the same options must be passed so the payload is decompressed
public static T MessagePackLz4Block<T>(byte[] bytes)
{
    return global::MessagePack.MessagePackSerializer.Deserialize<T>(bytes, MpLz4BOptions);
}

MemoryPack

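MemoryPack is also by Yoshifumi Kawai, and again it is just a NuGet package. Note, however, that it currently requires Visual Studio 2022 17.3 or later and the .NET 7 SDK, because MemoryPack's code generation depends on them: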

<PackageReference Include="MemoryPack" Version="1.4.4" />

It is probably the simplest of these binary serializers to use: just add the partial keyword to the class and mark it with the MemoryPackable attribute:

[MemoryPackable]
public partial class DemoClass  
{  
    public int P1 { get; set; }  
    public bool P2 { get; set; }  
    public string P3 { get; set; } = null!;  
    public double P4 { get; set; }  
    public long P5 { get; set; }
}

Serialization and deserialization are again static method calls:

// Serialize
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte[] MemoryPack<T>(T origin)
{
    return global::MemoryPack.MemoryPackSerializer.Serialize(origin);
}

// Deserialize: Deserialize<T> returns T?, hence the null-forgiving operator
public static T MemoryPack<T>(byte[] bytes)
{
    return global::MemoryPack.MemoryPackSerializer.Deserialize<T>(bytes)!;
}

It supports Brotli compression out of the box:

using MemoryPack;
using MemoryPack.Compression;

// Serialize: MemoryPack writes into the compressor, and ToArray() returns the compressed bytes
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte[] MemoryPackBrotli<T>(T origin)
{
    using var compressor = new BrotliCompressor();
    global::MemoryPack.MemoryPackSerializer.Serialize(compressor, origin);
    return compressor.ToArray();
}

// Deserialize: decompress first, then deserialize the plain buffer
public static T MemoryPackBrotli<T>(byte[] bytes)
{
    using var decompressor = new BrotliDecompressor();
    var decompressedBuffer = decompressor.Decompress(bytes);
    return MemoryPackSerializer.Deserialize<T>(decompressedBuffer)!;
}
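Compression pays off here because serialized records like these are highly repetitive. MemoryPack's BrotliCompressor is its own API; as a rough, package-free illustration of the same idea, the sketch below compresses a repetitive payload with the BCL's System.IO.Compression.BrotliStream and then decompresses it again. The payload (JSON for 1,000 records shaped like the test data) is an assumption chosen only so the example runs, so its ratio says nothing about MemoryPack's actual numbers.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.Json;

// Repetitive sample payload shaped like the test data (1,000 records, assumed for illustration).
var records = Enumerable.Range(0, 1000)
    .Select(i => new { P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = (double)i, P5 = (long)i })
    .ToArray();
byte[] raw = JsonSerializer.SerializeToUtf8Bytes(records);

byte[] Compress(byte[] input)
{
    using var output = new MemoryStream();
    using (var brotli = new BrotliStream(output, CompressionLevel.Optimal, leaveOpen: true))
    {
        brotli.Write(input, 0, input.Length);
    } // disposing the BrotliStream flushes the final compressed block
    return output.ToArray();
}

byte[] Decompress(byte[] input)
{
    using var brotli = new BrotliStream(new MemoryStream(input), CompressionMode.Decompress);
    using var output = new MemoryStream();
    brotli.CopyTo(output);
    return output.ToArray();
}

byte[] compressed = Compress(raw);
Console.WriteLine($"raw: {raw.Length} bytes, brotli: {compressed.Length} bytes");
Console.WriteLine(Decompress(compressed).AsSpan().SequenceEqual(raw)); // True
```

The trade-off is extra CPU time on both ends, so whether compression is worth it depends on whether bandwidth or memory (as in the Redis case above) is the tighter constraint.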

Let's benchmark

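I used BenchmarkDotNet to build a benchmark that serializes and deserializes 10,000 DemoClass objects (the full source is available via the GitHub link at the end of this post). It compares serialization speed, deserialization speed, and the size of the serialized output. The test data is generated as follows: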

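using System.Linq;
using System.Text.Json;
using DemoClassProto;

public static class TestData
{
    // 10,000 DemoClass objects, each with four DemoSubClass children carrying the same values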
    public static readonly DemoClass[] Origin = Enumerable.Range(0, 10000).Select(i =>  
    {  
        return new DemoClass  
        {  
            P1 = i,  
            P2 = i % 2 == 0,  
            P3 = $"Hello World {i}",  
            P4 = i,  
            P5 = i,  
            Subs = new DemoSubClass[]  
            {  
                new() {P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = i, P5 = i,},  
                new() {P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = i, P5 = i,},  
                new() {P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = i, P5 = i,},  
                new() {P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = i, P5 = i,},  
            }  
        };  
    }).ToArray();  
  
    public static readonly DemoClassProto.DemoClassArrayProto OriginProto;  
    static TestData()
    {
        // Build the Google.Protobuf copy from the same data by round-tripping each object through JSON
        OriginProto = new DemoClassArrayProto();  
        for (int i = 0; i < Origin.Length; i++)  
        {  
            OriginProto.DemoClass.Add(  
                DemoClassProto.DemoClassProto.Parser.ParseJson(JsonSerializer.Serialize(Origin[i])));  
        }  
    }  
}
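For a quick, package-free look at the size side of the comparison, the sketch below rebuilds the same data shape with anonymous types (an assumption, since the real DemoClass relies on the attributes and source generators above) and measures the System.Text.Json payload plus a single Stopwatch timing. A single cold Stopwatch run is only indicative; BenchmarkDotNet, as used in this post, adds warm-up, repeated iterations, and statistics.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.Json;

// Same shape as TestData.Origin: 10,000 parents, each with four identical children.
var origin = Enumerable.Range(0, 10000).Select(i => new
{
    P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = (double)i, P5 = (long)i,
    Subs = Enumerable.Range(0, 4)
        .Select(_ => new { P1 = i, P2 = i % 2 == 0, P3 = $"Hello World {i}", P4 = (double)i, P5 = (long)i })
        .ToArray()
}).ToArray();

var sw = Stopwatch.StartNew();
byte[] payload = JsonSerializer.SerializeToUtf8Bytes(origin);
sw.Stop();

int objectCount = origin.Length + origin.Sum(o => o.Subs.Length);
Console.WriteLine($"{objectCount} objects, {payload.Length} bytes of JSON, " +
                  $"{sw.Elapsed.TotalMilliseconds:F1} ms (single cold run)");
```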