A .NET Memory Spike Analysis: the Backend Service of a Recruitment Site
I. Background
1. The story
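A while ago a friend reached out to me on WeChat. His program's memory was spiking periodically and he wanted to know how to fix it. After talking it through with him, the picture was this: memory normally sits at around 5G, and around certain points in time it jumps to 10G+. Sketched as a chart, that is a flat baseline of about 5G with periodic peaks above 10G.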
So the next step is to find out what that mysterious extra 5-6G actually is. Time to let windbg do the talking.
II. Windbg analysis
1. Managed or unmanaged?
From the description this is most likely a managed-layer problem, but for the sake of completeness let's still check with !address -summary and !eeheap -gc.
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 1164 7f5`58f12000 ( 7.958 TB) 99.48%
<unknown> 6924 a`6de84000 ( 41.717 GB) 97.90% 0.51%
Stack 1123 0`16340000 ( 355.250 MB) 0.81% 0.00%
Image 4063 0`1607d000 ( 352.488 MB) 0.81% 0.00%
Heap 71 0`0c9ea000 ( 201.914 MB) 0.46% 0.00%
TEB 374 0`002ec000 ( 2.922 MB) 0.01% 0.00%
Other 13 0`001c6000 ( 1.773 MB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 5423 a`87200000 ( 42.111 GB) 98.83% 0.51%
MEM_IMAGE 7033 0`1e5d6000 ( 485.836 MB) 1.11% 0.01%
MEM_MAPPED 113 0`01908000 ( 25.031 MB) 0.06% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 1164 7f5`58f12000 ( 7.958 TB) 99.48%
MEM_RESERVE 4165 8`1b873000 ( 32.430 GB) 76.11% 0.40%
MEM_COMMIT 8404 2`8b86b000 ( 10.180 GB) 23.89% 0.12%
0:000> !eeheap -gc
Number of GC Heaps: 32
------------------------------
Heap 0 (00000000004106d0)
generation 0 starts at 0x0000000082eb0e58
generation 1 starts at 0x0000000082d79b20
generation 2 starts at 0x000000007fff1000
ephemeral segment allocation context: none
segment begin allocated size
000000007fff0000 000000007fff1000 0000000083f80128 0x3f8f128(66646312)
Large object heap starts at 0x000000087fff1000
segment begin allocated size
000000087fff0000 000000087fff1000 0000000883fe4190 0x3ff3190(67056016)
0000000927ff0000 0000000927ff1000 000000092bfe2430 0x3ff1430(67048496)
0000000a81c50000 0000000a81c51000 0000000a8221c858 0x5cb858(6076504)
Heap Size: Size: 0xc53ef40 (206827328) bytes.
------------------------------
...
Heap 31 (0000000019c84130)
generation 0 starts at 0x0000000844fc5170
generation 1 starts at 0x0000000844f851f8
generation 2 starts at 0x000000083fff1000
ephemeral segment allocation context: none
segment begin allocated size
000000083fff0000 000000083fff1000 0000000845171ca0 0x5180ca0(85462176)
Large object heap starts at 0x00000008fbff1000
segment begin allocated size
00000008fbff0000 00000008fbff1000 00000008fffe2290 0x3ff1290(67048080)
000000094bff0000 000000094bff1000 000000094ea2ebb8 0x2a3dbb8(44293048)
000000096bff0000 000000096bff1000 000000096dbdec00 0x1bedc00(29285376)
Heap Size: Size: 0xd79d6e8 (226088680) bytes.
------------------------------
GC Heap Size: Size: 0x1f1986a88 (8348265096) bytes.
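To put the two outputs side by side, here is a minimal sketch that compares the MEM_COMMIT total with the GC heap total. The helper name parseWinDbgHex is my own, not something from WinDbg, and the two hex values are copied from the outputs above. Note that !address reports binary units (GiB) while the byte count from !eeheap is often read as decimal GB, so the sketch prints both.

```javascript
// Parse a WinDbg 64-bit hex value such as "2`8b86b000" or "0x1f1986a88".
// (Hypothetical helper, written for this illustration only.)
function parseWinDbgHex(text) {
    const cleaned = text.replace(/`/g, "").replace(/^0x/i, "");
    return Number.parseInt(cleaned, 16);
}

const GIB = 1024 ** 3;

// Values copied from the two command outputs above.
const committedBytes = parseWinDbgHex("2`8b86b000"); // MEM_COMMIT total
const gcHeapBytes = parseWinDbgHex("0x1f1986a88");   // GC Heap Size

// Share of committed memory that belongs to the managed heap.
const managedShare = gcHeapBytes / committedBytes;

console.log("committed: " + (committedBytes / GIB).toFixed(3) + " GiB");
console.log("GC heap:   " + (gcHeapBytes / GIB).toFixed(3) + " GiB, or "
    + (gcHeapBytes / 1e9).toFixed(2) + " GB in decimal units");
console.log("managed share of committed memory: " + (managedShare * 100).toFixed(1) + "%");
```

Whichever unit is used, the managed heap accounts for roughly three quarters of the committed memory.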
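Reading the output, of the roughly 10G of committed memory the managed heap takes up about 8.3G, so this is clearly a managed-layer problem. With the general direction known, the next step is to look inside the managed heap. Going by past experience, the program must have created a huge number of class objects, so let's run !dumpheap -stat.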
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
...
000007fe9ddd5fc0 341280 30032640 System.ServiceModel.Description.MessagePartDescription
000007fe9c4865a0 866349 41584752 System.Xml.XmlDictionaryString
000007fe9defb098 937801 45014448 System.Xml.XmlDictionaryString
000007fe9c66bd28 105052 45086880 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
000007fe9e0f4d20 113299 49050864 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
00000000003c9190 44573 618414438 Free
000007fef8f6c168 428410 1209974642 System.Char[]
000007fef8f4f1b8 2849758 1246912848 System.Object[]
000007fef8f6f058 531963 1670620873 System.Byte[]
000007fef8f6aee0 2368431 2382587716 System.String
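As fate would have it, past experience did not apply this time. The biggest consumers are all basic types: Byte[], String, Char[], and Object[]. Basic types like these are hard to track down. You either keep filtering with -min and -max, or you write a script that groups and sorts them by size. My clumsy script is below: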
"use strict";
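/*
   Group the objects of a managed heap type by size, keyed by method table (MT).
*/
let platform = 64;                    // bitness of the target process: 64 or 32
let mtlist = ["000007fef8f4f1b8"];    // MTs to analyze; replace with the MT you want to inspect
let maxlimit = 100;                   // maximum number of rows and addresses to print
function initializeScript() { return [new host.apiVersionSupport(1, 7)]; }
function log(str) { host.diagnostics.debugLog(str + "\n"); }
// Echo a debugger command, run it, and return its output lines.
function exec(str) { log("\n" + str); return host.namespace.Debugger.Utility.Control.ExecuteCommand(str); }
function invokeScript() { for (var mt of mtlist) { groupby_mtsize_inheap(mt); } }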
// Group the objects of one type by size.
function groupby_mtsize_inheap(mt) {
    var size_group = {};
    var commandText = "!dumpheap -mt " + mt;
    var output = exec(commandText);
    for (var line of output) {
        if (line == "" || line.indexOf("Address") > -1) continue;
        if (line.indexOf("Statistics") > -1) break;
        // Each row is "<address> <MT> <size>"; skip the address and MT columns.
        var size = parseInt(line.substring(Math.ceil(platform / 2) + 1).trim(), 10);
        if (!size_group[size]) size_group[size] = 0;
        size_group[size]++;
    }
    show_top10_format(mt, size_group);
}
function show_top10_format(mt, size_group) {
    var maparr = [];
    // Convert the map to an array.
    for (var size in size_group) {
        maparr.push({ "size": size, "count": size_group[size], "totalsize": (size * size_group[size]) });
    }
    // Largest total size first.
    maparr.sort(function (a, b) { return b.totalsize - a.totalsize });
    var topTotalSize = 0;
    // Print one row per size.
    for (var i = 0; i < Math.min(maparr.length, maxlimit); i++) {
        var size = maparr[i].size;
        var count = maparr[i].count;
        // Math.round takes a single argument, so totals are whole megabytes.
        var totalsize = Math.round(maparr[i].totalsize / 1024 / 1024);
        topTotalSize += totalsize;
        log("size=" + size + ",count=" + count + ",totalsize=" + totalsize + "M");
    }
    log("Total:" + topTotalSize + "M");
    // List the addresses of the objects in the largest group.
    if (maparr.length > 0) {
        var size = maparr[0].size;
        var output = exec("!dumpheap -mt " + mt + " -min 0n" + size + " -max 0n" + size + " -short").Take(maxlimit);
        for (var line of output) {
            log(line);
        }
    }
}
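The grouping idea can also be tried outside the debugger on saved output. The following is a minimal sketch, not the script above: the function name groupBySize is my own, and it splits each row on whitespace instead of using fixed column offsets. Only the first data row is taken from this dump; the other rows are made up for illustration.

```javascript
// Group "!dumpheap -mt" style rows ("<address> <mt> <size>") by object size.
function groupBySize(lines) {
    const groups = new Map();
    for (const line of lines) {
        const fields = line.trim().split(/\s+/);
        if (fields.length !== 3) continue;        // blank lines, statistics rows
        const size = Number.parseInt(fields[2], 10);
        if (Number.isNaN(size)) continue;         // the "Address MT Size" header
        groups.set(size, (groups.get(size) || 0) + 1);
    }
    // Turn the map into rows and sort by total size, largest first.
    return [...groups]
        .map(([size, count]) => ({ size, count, totalSize: size * count }))
        .sort((a, b) => b.totalSize - a.totalSize);
}

// Sample input: the first data row comes from this dump, the rest is made up.
const sampleLines = [
    "         Address               MT     Size",
    "0000000a61c51000 000007fef8f6aee0 31116490",
    "0000000000001000 000007fef8f6aee0       26", // hypothetical row
    "0000000000002000 000007fef8f6aee0       26", // hypothetical row
    "",
];

for (const row of groupBySize(sampleLines)) {
    console.log("size=" + row.size + ",count=" + row.count + ",totalsize=" + row.totalSize);
}
```

Splitting on whitespace avoids depending on the pointer width, at the cost of assuming every data row has exactly three columns.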
Next, pass in the method table address of string and look at the sorted result. The simplified output is:
!dumpheap -mt 000007fef8f6aee0
size=29285946,count=2,totalsize=56M
size=29285540,count=2,totalsize=56M
size=29285502,count=2,totalsize=56M
size=29285348,count=2,totalsize=56M
size=27455186,count=2,totalsize=52M
size=31116504,count=1,totalsize=30M
size=31116490,count=1,totalsize=30M
size=31116306,count=1,totalsize=30M
size=31115934,count=1,totalsize=30M
size=31115920,count=1,totalsize=30M
size=31115718,count=1,totalsize=30M
size=29286342,count=1,totalsize=28M
size=29285898,count=1,totalsize=28M
...
Total:1198M
As you can see, there are quite a few very large strings. So what exactly are they? I picked a few at random and exported them to txt.
0:000> !dumpheap -mt 000007fef8f6aee0 -min 0n31116490 -max 0n31116490 -short
0000000a61c51000
0:000> !do 0000000a61c51000
Name: System.String
MethodTable: 000007fef8f6aee0
EEClass: 000007fef88d3720
Size: 31116490(0x1daccca) bytes
File: C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
String: <String is invalid or too large to print>
Fields:
MT Field Offset Type VT Attr Value Name
000007fef8f6dc90 40000aa 8 System.Int32 1 instance 15558232 m_stringLength
000007fef8f6c1c8 40000ab c System.Char 1 instance 50 m_firstChar
000007fef8f6aee0 40000ac 18 System.String 0 shared static Empty
>> Domain:Value 00000000003fb620:NotInit 000000001ca30bd0:NotInit 000000001f7b21a0:NotInit 000000001f8940c0:NotInit 0000000027dc46b0:NotInit 00000000281bd720:NotInit 00000000282b7ee0:NotInit <<
0:000> .writemem D:\dumps\xxxx\string.txt 0000000a61c51000 L?0x1daccca
Writing 1daccca bytes..........
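The numbers in the !do output are consistent with each other, which is a quick sanity check before trusting the export. A .NET string stores UTF-16 characters at two bytes each, so the object size should be twice m_stringLength plus a small fixed overhead. The sketch below uses only values copied from the output above; the 26-byte overhead is simply what these two numbers imply for this dump, not a figure I am asserting for every runtime.

```javascript
// Values copied from the !do output above.
const objectSize = 0x1daccca;    // Size: 31116490(0x1daccca) bytes
const stringLength = 15558232;   // m_stringLength

const charBytes = stringLength * 2;        // UTF-16: two bytes per character
const overhead = objectSize - charBytes;   // header, length field, terminator

console.log("character data: " + charBytes + " bytes");
console.log("fixed overhead: " + overhead + " bytes");
console.log("object size:    " + (objectSize / 1024 / 1024).toFixed(1) + " MiB");
```

In other words, a single string of about 15.5 million characters costs close to 30 MiB of managed memory.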
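Judging by the content, it is the base64 encoding of a pdf. Investigating char[] and byte[] the same way shows that most of them are pdfs too. My guess is that while processing pdfs the program converts between byte[], char[], and string, so in theory most of these objects should be unrooted. In fact, !heapstat -iu also shows roughly 5.5G of unrooted objects waiting to be collected by the GC.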
0:000> !heapstat -iu
Heap Gen0 Gen1 Gen2 LOH
Heap0 17625808 1274680 47745824 140181016
...
Total 357486256 28100616 2229673376 5733004848
Free space: Percentage
Heap0 3962240 24 11211224 298616SOH: 22% LOH: 0%
Heap1 5625856 144 9857168 302152SOH: 27% LOH: 0%
...
Heap31 1448576 24 19957312 218024SOH: 25% LOH: 0%
Total 181492784 1136 431825856 5183128
Unrooted objects: Percentage
Heap0 12163928 243584 42872 137153536SOH: 18% LOH: 97%
...
Heap31 236832 239272 1435840 139770656SOH: 2% LOH: 99%
Total 164954952 7948448 29066480 5530423784
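The Total rows above can be cross-checked with a little arithmetic. This is a minimal sketch using only the numbers printed by !heapstat -iu; the object and variable names are my own.

```javascript
// "Total" rows copied from the !heapstat -iu output above (bytes).
const heapTotal = { gen0: 357486256, gen1: 28100616, gen2: 2229673376, loh: 5733004848 };
const unrooted = { gen0: 164954952, gen1: 7948448, gen2: 29066480, loh: 5530423784 };

// Sum the four generations of one row.
function sum(row) {
    return row.gen0 + row.gen1 + row.gen2 + row.loh;
}

const heapBytes = sum(heapTotal);                     // should match the !eeheap -gc total
const unrootedBytes = sum(unrooted);
const lohUnrootedShare = unrooted.loh / heapTotal.loh;

console.log("heap total:     " + heapBytes + " bytes");
console.log("unrooted total: " + unrootedBytes + " bytes");
console.log("unrooted share of LOH:  " + (lohUnrootedShare * 100).toFixed(1) + "%");
console.log("unrooted share of heap: " + (unrootedBytes / heapBytes * 100).toFixed(1) + "%");
```

The four generation totals add up to the GC heap size reported by !eeheap -gc, and almost all of the unrooted bytes sit in the LOH.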
III. Summary
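This periodic memory spike was mainly caused by the program receiving too many pdf files from upstream. These are all large objects, and converting them between char[], string, and byte[] pushed memory usage far too high within a short period.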
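To get a feel for why these conversions are expensive, here is a rough estimate of the managed memory one pdf can occupy when it exists as raw bytes, as base64 characters, and as a base64 string at the same time. This is a sketch under assumptions: the dump does not show the exact conversion chain, the 10 MiB file size is hypothetical, and object headers are ignored. The 85,000-byte figure is the documented .NET large object heap threshold.

```javascript
// Objects of 85,000 bytes or more are allocated on the large object heap.
const LOH_THRESHOLD_BYTES = 85000;

// Estimate the bytes held when one pdf is alive in three forms at once
// (assumed chain: byte[] -> base64 char[] -> base64 string).
function estimateFootprint(pdfBytes) {
    const base64Chars = 4 * Math.ceil(pdfBytes / 3);  // base64 emits 4 chars per 3 bytes
    const utf16Bytes = base64Chars * 2;               // .NET chars are 2 bytes each
    return {
        byteArray: pdfBytes,
        charArray: utf16Bytes,
        string: utf16Bytes,
        total: pdfBytes + utf16Bytes * 2,
    };
}

const pdfBytes = 10 * 1024 * 1024; // hypothetical 10 MiB pdf
const footprint = estimateFootprint(pdfBytes);

console.log("total: " + (footprint.total / 1024 / 1024).toFixed(1) + " MiB, about "
    + (footprint.total / pdfBytes).toFixed(1) + "x the file size");
console.log("every copy on the LOH: "
    + Object.values(footprint).every((bytes) => bytes >= LOH_THRESHOLD_BYTES));
```

Under these assumptions a single file costs several times its size, and every intermediate copy lands on the LOH, which fits the LOH-heavy picture seen in the dump.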
Finally, my personal suggestions:
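- For the large volume of pdfs, could third-party OSS software be used to avoid some of the unnecessary memory usage?
- Could the cleansing service apply some throttling, or spread the load across several service instances?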
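As one possible shape for the throttling idea, the sketch below caps the total number of pdf bytes that are being processed at the same time. Everything in it is hypothetical: the class name ByteBudget, the 64 MiB cap, and the file sizes are mine, and the real service is .NET, so this only illustrates the logic.

```javascript
// Admit work only while the total in-flight size stays under a cap.
class ByteBudget {
    constructor(maxBytes) {
        this.maxBytes = maxBytes;
        this.inFlight = 0;
    }

    // Reserve room for one file; returns false if it would exceed the cap.
    tryAcquire(bytes) {
        if (this.inFlight + bytes > this.maxBytes) return false;
        this.inFlight += bytes;
        return true;
    }

    // Give the reserved room back once the file has been processed.
    release(bytes) {
        this.inFlight = Math.max(0, this.inFlight - bytes);
    }
}

const MIB = 1024 * 1024;
const budget = new ByteBudget(64 * MIB);     // hypothetical cap

const first = budget.tryAcquire(40 * MIB);   // fits under the cap
const second = budget.tryAcquire(40 * MIB);  // would exceed the cap, caller should retry later
budget.release(40 * MIB);
const third = budget.tryAcquire(40 * MIB);   // fits again after the release

console.log(first, second, third);
```

A rejected caller would need to queue or retry, and a file larger than the cap would never be admitted, so a real implementation has to decide how to handle that case.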
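Later my friend told me that he solved the problem by adding filtering and optimizing some of the business flows. I am sure many of you have run into this kind of problem in practice, so feel free to leave a comment with your own solution.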