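Rethinking a classic WinForms hang

I: Background

1. The story

This article started with a dump file a friend sent me yesterday, saying his program had hung. Looking at the main thread's stack, I once again ran into a hang caused by OnUserPreferenceChanged, a true classic among classics. The thread stack is as follows: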


0:000:x86> !clrstack
OS Thread Id: 0x4eb688 (0)
Child SP       IP Call Site
002fed38 0000002b [HelperMethodFrame_1OBJ: 002fed38] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean)
002fee1c 5cddad21 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean)
002fee34 5cddace8 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
002fee48 538d876c System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle)
002fee88 53c5214a System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean)
002fee8c 538dab4b [InlinedCallFrame: 002fee8c] 
002fef14 538dab4b System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[])
002fef48 53b03bc6 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object)
002fef60 5c774708 Microsoft.Win32.SystemEvents+SystemEventInvokeInfo.Invoke(Boolean, System.Object[])
002fef94 5c6616ec Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[])
002fefe8 5c660cd4 Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr)
002ff008 5c882c98 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr)
...

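To be honest, I have seen this kind of dump at least five times between last year and this year, and I am getting tired of it. The cause is:

  • A user control was created on a thread other than the main thread. When WindowsFormsSynchronizationContext.Send later marshals a call across threads to it, the target thread (which typically runs no message loop) cannot respond to the request, so the caller hangs.

Although the cause is known, one big regret is that the dump does not reveal which control it is. I can only tell friends in general terms to look through their code for places where a worker thread creates a user control. Some found it with this information; others, for various reasons, did not, which is a pity.

So that this regret does not continue, this article gives a systematic summary, hoping to lend these friends a hand.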

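II: Solution

1. Background

I shared the details of how this problem forms in an article last year, "A UI hang analysis of a .NET lithium battery testing program for a new-energy vehicle maker" (//www.cnblogs.com/huangxincheng/p/15245554.html). Because the problem Control could not be found in the dump, that article left some regrets behind, so this one serves as a supplement.

2. Finding the breakthrough point

Those familiar with WinForms internals know that once a Control is created on a worker thread, the framework automatically equips that thread with a WindowsFormsSynchronizationContext and its underlying MarshalingControl. The source code backs this up; look at the Control constructor. The simplified source is as follows: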


public class Control : Component
{
    internal Control(bool autoInstallSyncContext)
    {
        // ... (rest of the constructor omitted)

        if (autoInstallSyncContext)
        {
            WindowsFormsSynchronizationContext.InstallIfNeeded();
        }
    }
}

public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable
{
    private Control controlToSendTo;

    private WeakReference destinationThreadRef;

    public WindowsFormsSynchronizationContext()
    {
        DestinationThread = Thread.CurrentThread;
        Application.ThreadContext threadContext = Application.ThreadContext.FromCurrent();
        if (threadContext != null)
        {
            controlToSendTo = threadContext.MarshalingControl;
        }
    }

    internal static void InstallIfNeeded()
    {
        try
        {
            SynchronizationContext synchronizationContext = AsyncOperationManager.SynchronizationContext;
            if (synchronizationContext == null || synchronizationContext.GetType() == typeof(SynchronizationContext))
            {
                AsyncOperationManager.SynchronizationContext = new WindowsFormsSynchronizationContext();
            }
        }
        finally
        {
            inSyncContextInstallation = false;
        }
    }
}

internal sealed class ThreadContext
{
    internal Control MarshalingControl
    {
        get
        {
            lock (this)
            {
                if (marshalingControl == null)
                {
                    marshalingControl = new MarshalingControl();
                }
                return marshalingControl;
            }
        }
    }
}

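Two pieces of information can be dug out of this code.

  1. Once a Control is created on a worker thread, that thread gets its own WindowsFormsSynchronizationContext installed. In this dump, for example, two such objects already exist.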

0:000:x86> !dso
OS Thread Id: 0x4eb688 (0)
ESP/REG  Object   Name
002FEC40 025a0fb0 System.Windows.Forms.WindowsFormsSynchronizationContext
...
002FEF44 0260992c System.Object[]    (System.Object[])
002FEF48 02d69164 System.Windows.Forms.WindowsFormsSynchronizationContext
...

  2. The worker thread is recorded in the internal destinationThreadRef field, from which its thread ID can be read. Let's probe 02d69164:

0:000:x86> !do 02d69164
Name:        System.Windows.Forms.WindowsFormsSynchronizationContext
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
...
533c2204  4002522        8 ...ows.Forms.Control  0 instance 02d69218 controlToSendTo
5cef92d0  4002523        c System.WeakReference  0 instance 02d69178 destinationThreadRef

0:000:x86> !DumpObj /d 02d69178
Name:        System.WeakReference
MethodTable: 5cef92d0
EEClass:     5cabf0cc
Size:        12(0xc) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
5cee2bdc  400070a        4        System.IntPtr  1 instance   111828 m_handle

0:000:x86> !do poi(111828)
Name:        System.Threading.Thread
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
5cee1638  40018ca       28         System.Int32  1 instance        9 m_ManagedThreadId 
...

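The output above shows that managed thread 9 once created a Control that it should not have. Finding that Control is the key to solving the problem, and it is also the hardest part.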
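The same fact can also be checked from inside the process. Below is a minimal sketch, not part of the original analysis: `SyncContextProbe` and its two methods are hypothetical names, and the check compares the type name as a string so that the helper needs no reference to System.Windows.Forms. It assumes you call it on the worker thread itself, before the worker method returns.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not a framework API): reports whether a thread
// currently carries the synchronization context that WinForms installs
// when a Control is constructed on it.
public static class SyncContextProbe
{
    // Compared by name so this file compiles without System.Windows.Forms.
    private const string WinFormsContextName =
        "System.Windows.Forms.WindowsFormsSynchronizationContext";

    public static bool IsWinFormsContext(SynchronizationContext context)
    {
        return context != null
            && context.GetType().FullName == WinFormsContextName;
    }

    // Intended to be called at the end of a worker method, on the worker thread.
    public static void WarnIfInstalledOnCurrentThread(string workerName)
    {
        if (IsWinFormsContext(SynchronizationContext.Current))
        {
            Console.Error.WriteLine(
                "[warn] {0}: managed thread {1} carries a WindowsFormsSynchronizationContext; " +
                "a Control was probably created on it.",
                workerName, Thread.CurrentThread.ManagedThreadId);
        }
    }
}
```

A call such as `SyncContextProbe.WarnIfInstalledOnCurrentThread("backgroundWorker1_DoWork")` as the last line of a suspicious worker method narrows the search to that method, although it does not name the Control itself.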

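3. How to find the problem Control

With my current skills I really cannot find it from the dump, but I can monitor at runtime. The breakthrough point is this: once the Control is created on a worker thread, the framework sets up a WindowsFormsSynchronizationContext and a MarshalingControl object, so we just need to intercept their construction.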
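If the source code can be changed, a similar interception can be sketched in code rather than in the debugger. This is only an illustration under assumptions: `UiThreadGuard`, `CaptureUiThread` and `EnsureUiThread` are hypothetical names, and you would have to call `EnsureUiThread` yourself from the constructors of the user controls you suspect.

```csharp
using System;
using System.Threading;

// Hypothetical guard (not a framework API): remembers the UI thread and
// throws when a guarded constructor runs on any other thread.
public static class UiThreadGuard
{
    private static int _uiThreadId = -1; // -1 means "not captured yet"

    // Call once on the UI thread, e.g. in Main before Application.Run.
    public static void CaptureUiThread()
    {
        _uiThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    public static bool IsOnUiThread
    {
        get { return Thread.CurrentThread.ManagedThreadId == _uiThreadId; }
    }

    // Call from the constructor of each control under suspicion.
    public static void EnsureUiThread(string controlName)
    {
        // Do nothing if the UI thread was never captured or we are on it.
        if (_uiThreadId == -1 || IsOnUiThread)
        {
            return;
        }

        // The exception's stack trace points at the code that created the control.
        throw new InvalidOperationException(string.Format(
            "{0} was created on managed thread {1}, but the UI thread is {2}.",
            controlName, Thread.CurrentThread.ManagedThreadId, _uiThreadId));
    }
}
```

Throwing is a deliberately loud choice for test builds; in production you might prefer to log `Environment.StackTrace` instead. The guard only covers controls whose constructors you can edit, so the breakpoint on the framework constructor remains the more general tool.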

To make this easier to explain, here is some test code that creates a Button control in the backgroundWorker1_DoWork method.


namespace WindowsFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
        {
            Button btn = new Button();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }

        private void button1_Click(object sender, EventArgs e)
        {
            backgroundWorker1.RunWorkerAsync();
        }
    }
}

Next, set a bp breakpoint on the MarshalingControl constructor so that the call stack is logged automatically, and observe whether it is hit when new Button runs.


0:007> !name2ee System_Windows_Forms_ni System.Windows.Forms.Application+MarshalingControl..ctor
Module:      5b9b1000
Assembly:    System.Windows.Forms.dll
Token:       0600554a
MethodDesc:  5b9fe594
Name:        System.Windows.Forms.Application+MarshalingControl..ctor()
JITTED Code Address: 5bb5d1a4
0:007> bp 5bb5d1a4 "!clrstack; gc"
0:007> g
OS Thread Id: 0x249c (9)
Child SP       IP Call Site
067ff2f0 5bb5d1a4 System.Windows.Forms.Application+MarshalingControl..ctor()
067ff2f4 5bb70224 System.Windows.Forms.Application+ThreadContext.get_MarshalingControl()
067ff324 5bb6fe5d System.Windows.Forms.WindowsFormsSynchronizationContext..ctor()
067ff338 5bb6fd4d System.Windows.Forms.WindowsFormsSynchronizationContext.InstallIfNeeded()
067ff364 5bb6e9a0 System.Windows.Forms.Control..ctor(Boolean)
067ff41c 5bbcd5cc System.Windows.Forms.ButtonBase..ctor()
067ff428 5bbcd531 System.Windows.Forms.Button..ctor()
067ff434 02342500 WindowsFormsApp1.Form1.backgroundWorker1_DoWork(System.Object, System.ComponentModel.DoWorkEventArgs)
067ff488 630ee649 System.ComponentModel.BackgroundWorker.OnDoWork(System.ComponentModel.DoWorkEventArgs) [f:\dd\NDP\fx\src\compmod\system\componentmodel\BackgroundWorker.cs @ 107]
067ff49c 630ee55d System.ComponentModel.BackgroundWorker.WorkerThreadStart(System.Object) [f:\dd\NDP\fx\src\compmod\system\componentmodel\BackgroundWorker.cs @ 245]
067ff6a0 7c69f036 [HelperMethodFrame_PROTECTOBJ: 067ff6a0] System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr, System.Object[], System.Object, System.Object[] ByRef)
067ff95c 6197c82c System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Messaging.IMessageSink)
067ff9b0 61978274 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.DoAsyncCall() [f:\dd\ndp\clr\src\BCL\system\runtime\remoting\remotingproxy.cs @ 760]
067ff9bc 61978238 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.ThreadPoolCallBack(System.Object) [f:\dd\ndp\clr\src\BCL\system\runtime\remoting\remotingproxy.cs @ 753]
067ff9c0 6104e7b4 System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object) [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1274]
067ff9c8 61078604 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 980]
067ffa34 61078537 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 928]
067ffa48 6104f445 System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1252]
067ffa5c 6104eb7d System.Threading.ThreadPoolWorkQueue.Dispatch() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 820]
067ffaac 6104e9db System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() [f:\dd\ndp\clr\src\BCL\system\threading\threadpool.cs @ 1161]
067ffccc 7c69f036 [DebuggerU2MCatchHandlerFrame: 067ffccc] 

The thread stack clearly traces the creation back to the Button in backgroundWorker1_DoWork, which is the root cause of the problem.
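Once the offending line is known, one possible fix is to keep the worker free of UI objects and create the control in RunWorkerCompleted. The sketch below rests on assumptions: `FixedForm` and the handler names are made up for illustration, and it relies on BackgroundWorker raising RunWorkerCompleted through the synchronization context captured when RunWorkerAsync is called, which is the UI thread's context here.

```csharp
using System;
using System.ComponentModel;
using System.Windows.Forms;

namespace WindowsFormsApp1
{
    // Hypothetical variant of the test form: the Button is created on the UI thread.
    public class FixedForm : Form
    {
        private readonly BackgroundWorker worker = new BackgroundWorker();

        public FixedForm()
        {
            worker.DoWork += Worker_DoWork;
            worker.RunWorkerCompleted += Worker_RunWorkerCompleted;

            // Start the worker from the UI thread once the form has loaded.
            Load += (sender, e) => worker.RunWorkerAsync();
        }

        // Runs on a thread-pool thread: produce data only, create no controls.
        private void Worker_DoWork(object sender, DoWorkEventArgs e)
        {
            e.Result = "Created on the UI thread";
        }

        // Raised back on the thread that called RunWorkerAsync (the UI thread here).
        private void Worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
        {
            if (e.Error != null)
            {
                return; // reading e.Result would rethrow the worker's exception
            }

            Button btn = new Button { Text = (string)e.Result, AutoSize = true };
            Controls.Add(btn);
        }
    }
}
```

Calling `Invoke` on an existing control from inside DoWork would be another way to marshal the creation to the UI thread; RunWorkerCompleted is used here only because the test code already uses a BackgroundWorker.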

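III: Summary

In my journey through more than a hundred dumps, this problem has come up really often. I wrote this supplement sincerely hoping it helps these friends find the problem Control amid their anxiety. A small kindness makes things easier for others.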
