Exploring Android OOM Problems: From Getting Started to Giving Up

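1. Introduction

Customers recently reported several OOM problems. I looked into OOM briefly a long time ago, but so much time has passed that I no longer remember most of the details.

Having run into these OOM problems, I took the opportunity to gather some material and dig into the topic, and I am recording my notes here for later reference. This article borrows heavily from classic articles found online; it stands on the shoulders of giants.

Note: the analysis below is based on the Android R source.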

 

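2. Possible Causes of OOM

Many detailed explanations can be found online. This is a brief summary; it may not be complete and is intended for study and reference only.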

 

How does Android throw an OutOfMemoryError? Searching the source code shows where it is thrown.

 

Focus on the following two cases:

✔️ OOM when a heap memory allocation fails  ==   /art/runtime/gc/heap.cc

✔️ OOM when thread creation fails           ==   /art/runtime/thread.cc

 

3. OOM: Heap Memory Allocation Failure

The source code shows that when a heap allocation fails, a characteristic log message is built, as in the following code:

void Heap::ThrowOutOfMemoryError(Thread* self, size_t byte_count, AllocatorType allocator_type) {
  ...
  std::ostringstream oss;
  size_t total_bytes_free = GetFreeMemory();
  oss << "Failed to allocate a " << byte_count << " byte allocation with " << total_bytes_free
      << " free bytes and " << PrettySize(GetFreeMemoryUntilOOME()) << " until OOM,"
      << " target footprint " << target_footprint_.load(std::memory_order_relaxed)
      << ", growth limit "
      << growth_limit_;
  // If the allocation failed due to fragmentation, print out the largest continuous allocation.
  ...
}

When this OOM occurs, logcat should show output similar to the following:

08-19 11:34:53.860 28028 28028 E AndroidRuntime: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728

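A rough reading of this logcat line: the app tried to allocate 20971536 bytes of heap memory, but only 6147912 bytes of free heap remained, and the maximum heap the app may currently allocate is 134217728 bytes (128 MB).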
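To make the fields of that message easier to work with, here is a minimal sketch that pulls the numbers out of the log line with a regular expression. The class name OomMessageParser and the returned array layout are my own assumptions for illustration; only the message format comes from the ART code shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OomMessageParser {
    // Mirrors the message layout built in Heap::ThrowOutOfMemoryError (Android R).
    private static final Pattern HEAP_OOM = Pattern.compile(
        "Failed to allocate a (\\d+) byte allocation with (\\d+) free bytes and (\\S+) until OOM,"
        + " target footprint (\\d+), growth limit (\\d+)");

    /** Returns {requestedBytes, freeBytes, targetFootprint, growthLimit}, or null if no match. */
    public static long[] parse(String message) {
        Matcher m = HEAP_OOM.matcher(message);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(4)),
            Long.parseLong(m.group(5))
        };
    }

    public static void main(String[] args) {
        String line = "java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation "
            + "with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, "
            + "growth limit 134217728";
        long[] f = parse(line);
        System.out.println("requested=" + f[0] + " free=" + f[1] + " growthLimit=" + f[3]);
        // How far the request exceeds the free heap that was left.
        System.out.println("shortfall=" + (f[0] - f[1]) + " bytes");
    }
}
```

For the sample line the request exceeds the remaining free heap by 14823624 bytes, which is why the allocation cannot succeed without the heap growing past its limit.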

 

Heap allocation can fail in two situations: 1. the app process's heap limit is exceeded, or 2. no contiguous address space of sufficient size is available.

 

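3.1 Exceeding the App Process's Memory Limit

On Android devices the Java virtual machine caps the maximum memory a single app may allocate; exceeding that value causes an OOM. Runtime.getRuntime().maxMemory() returns the memory limit the system assigns to each process. When the memory used by the process reaches this limit, an OOM occurs. This is also the most common type of OOM on Android.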

 


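Android follows these conventions:

  • /vendor/build.prop defines properties that cap the maximum memory of a single app

dalvik.vm.heapgrowthlimit: the value used by regular apps

dalvik.vm.heapsize: the value used by large-heap apps, that is, apps that set android:largeHeap="true" in AndroidManifest.xml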

 

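  • The following APIs can also be used in code to query the memory limits

ActivityManager.getMemoryClass(): the maximum heap available to a regular app, corresponding to dalvik.vm.heapgrowthlimit;

ActivityManager.getLargeMemoryClass(): the maximum heap available to a large-heap app, that is, one that sets android:largeHeap="true" in AndroidManifest.xml, corresponding to dalvik.vm.heapsize;

Runtime.getRuntime().maxMemory(): the memory limit the system assigns to the current process, equal to one of the two values above;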
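The relationship between the two properties and the limit that actually applies can be sketched in plain Java. This is an illustration only: the class HeapLimits and its method names are hypothetical, and the 128/384 values are the ones from my test platform, not values read from a device. Only the Runtime calls are real APIs.

```java
public class HeapLimits {
    /** Picks the limit that applies: heapsize for largeHeap apps, heapgrowthlimit otherwise. */
    public static int effectiveLimitMb(boolean largeHeap, int heapGrowthLimitMb, int heapSizeMb) {
        return largeHeap ? heapSizeMb : heapGrowthLimitMb;
    }

    /** Bytes the heap can still grow by before it reaches Runtime.maxMemory(). */
    public static long headroomBytes() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        // Assumed property values: heapgrowthlimit=128MB, heapsize=384MB.
        System.out.println("regular app limit    = " + effectiveLimitMb(false, 128, 384) + "MB");
        System.out.println("large-heap app limit = " + effectiveLimitMb(true, 128, 384) + "MB");
        System.out.println("current headroom     = " + headroomBytes() / 1024 / 1024 + "MB");
    }
}
```

Logging the headroom before a large allocation is a cheap way to see how close a process already is to its limit.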


 

Here is a simple piece of code that demonstrates this type of OOM:

// Requires: android.app.ActivityManager, android.content.Context, android.util.Log,
//           java.util.ArrayList, java.util.List
private void testOOMCreatHeap(Context context) {
    ActivityManager activityManager = (ActivityManager) context.getSystemService(Context.ACTIVITY_SERVICE);
    // Limits in MB: regular apps, large-heap apps, and the limit that applies to this process
    Log.d("OOM_TEST", "app maxMemory = " + activityManager.getMemoryClass() + "MB");
    Log.d("OOM_TEST", "large app maxMemory = " + activityManager.getLargeMemoryClass() + "MB");
    Log.d("OOM_TEST", "current app maxMemory = " + Runtime.getRuntime().maxMemory()/1024/1024 + "MB");
    // Keep a reference to every array so that GC cannot reclaim any of them
    List<byte[]> bytesList = new ArrayList<>();
    int count = 0;
    while (true) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Log.e("OOM-TEST", "allocate 20MB heap: " + count++ + ", total " + 20*count + "MB");
        // Request 20 MB on each iteration
        bytesList.add(new byte[1024 * 1024 * 20]);
    }
}

 

Note: on my test platform, dalvik.vm.heapgrowthlimit=128MB and dalvik.vm.heapsize=384MB

The test code above allocates 20 MB on each iteration.

 

Case 1: a regular app that does not set android:largeHeap="true" in AndroidManifest.xml. The app's Dalvik heap limit should then be dalvik.vm.heapgrowthlimit=128MB

Result:

08-19 11:34:53.555 28028 28028 D OOM_TEST: app maxMemory = 128MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: large app maxMemory = 384MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: current app maxMemory = 128MB
08-19 11:34:53.576 28028 28028 E OOM-TEST: allocate 20MB heap: 0, total 20MB
08-19 11:34:53.596 28028 28028 E OOM-TEST: allocate 20MB heap: 1, total 40MB
08-19 11:34:53.617 28028 28028 E OOM-TEST: allocate 20MB heap: 2, total 60MB
08-19 11:34:53.637 28028 28028 E OOM-TEST: allocate 20MB heap: 3, total 80MB
08-19 11:34:53.658 28028 28028 E OOM-TEST: allocate 20MB heap: 4, total 100MB
08-19 11:34:53.678 28028 28028 E OOM-TEST: allocate 20MB heap: 5, total 120MB
08-19 11:34:53.699 28028 28028 E OOM-TEST: allocate 20MB heap: 6, total 140MB
08-19 11:34:53.699 28028 28028 I com.demo: Waiting for a blocking GC Alloc
08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.732 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 73us total 19.225ms
08-19 11:34:53.733 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.766 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.767 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 71us total 33.715ms
08-19 11:34:53.767 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.767 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.792 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.792 28028 28028 I com.demo: Alloc concurrent copying GC freed 1046(50KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 57us total 25.120ms
08-19 11:34:53.792 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971532 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264080 kB)
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.808 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 62us total 15.229ms
08-19 11:34:53.808 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.835 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.836 28028 28028 I com.demo: Alloc concurrent copying GC freed 3(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 59us total 27.042ms
08-19 11:34:53.836 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.836 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.857 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.857 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 50us total 21.249ms
08-19 11:34:53.857 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264016 kB)
08-19 11:34:53.858 28028 28028 E InputEventSender: Exception dispatching finished signal.
08-19 11:34:53.858 28028 28028 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

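Explanation:

By the time of the final allocation request, more than 120 MB had already been allocated, so allocating another 20 MB would clearly exceed the 128 MB limit. GC could reclaim only a few tens of KB, so the allocation failed and OutOfMemoryError was thrown.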

 

Case 2: the app sets android:largeHeap="true" in AndroidManifest.xml. The app's Dalvik heap limit should then be dalvik.vm.heapsize=384MB

Result:

08-19 11:32:22.660 27539 27539 D OOM_TEST: app maxMemory = 128MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: large app maxMemory = 384MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: current app maxMemory = 384MB
08-19 11:32:23.048 27539 27539 E OOM-TEST: allocate 20MB heap: 18, total 380MB
08-19 11:32:23.061 27539 27553 I com.mediacodec: Clamp target GC heap from 406MB to 384MB
08-19 11:32:23.069 27539 27539 E OOM-TEST: allocate 20MB heap: 19, total 400MB
08-19 11:32:23.069 27539 27539 I com.demo: Starting a blocking GC Alloc
08-19 11:32:23.226 27539 27539 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184" (VmSize 2053220 kB)
08-19 11:32:23.226 27539 27539 E InputEventSender: Exception dispatching finished signal.
08-19 11:32:23.226 27539 27539 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

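Explanation:

By the time of the final allocation request, more than 380 MB had already been allocated, so allocating another 20 MB would clearly exceed the 384 MB limit. GC could not reclaim enough memory, so the allocation failed and OutOfMemoryError was thrown.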
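The number of iterations that succeed in both cases follows from simple division. The sketch below is a simplification under stated assumptions: it ignores every other object on the heap and uses the request size reported in the log (20971536 bytes) together with the growth limits from the two runs. The class and method names are hypothetical.

```java
public class AllocationBudget {
    /** Number of equal-sized allocations that fit under limitBytes, ignoring other heap objects. */
    public static long maxAllocations(long limitBytes, long allocationBytes) {
        return limitBytes / allocationBytes;
    }

    public static void main(String[] args) {
        long request = 20971536L;           // request size reported in the OOM message
        long regularLimit = 134217728L;     // growth limit in case 1 (128 MB)
        long largeHeapLimit = 402653184L;   // growth limit in case 2 (384 MB)
        System.out.println("case 1: " + maxAllocations(regularLimit, request) + " allocations fit");
        System.out.println("case 2: " + maxAllocations(largeHeapLimit, request) + " allocations fit");
    }
}
```

This gives 6 allocations for the 128 MB limit and 19 for the 384 MB limit, which matches the logs: the 7th and the 20th request are the ones that fail.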

 

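3.2 No Contiguous Address Space of Sufficient Size

This usually happens when the process's memory is heavily fragmented. Compared with the first type of OOM, the message carries an extra segment in roughly the following format:
"failed due to fragmentation (required contiguous free " << required_bytes << " bytes for a new buffer where largest contiguous free " << largest_continuous_free_pages << " bytes)"
The relevant code is in art/runtime/gc/allocator/rosalloc.cc, as follows: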
void RosAlloc::LogFragmentationAllocFailure(std::ostream& os, size_t failed_alloc_bytes) {
  ...
  if (required_bytes > largest_continuous_free_pages) {
    os << "; failed due to fragmentation ("
       << "required contiguous free " << required_bytes << " bytes" << new_buffer_msg
       << ", largest contiguous free " << largest_continuous_free_pages << " bytes"
       << ", total free pages " << total_free << " bytes"
       << ", space footprint " << footprint_ << " bytes"
       << ", space max capacity " << max_capacity_ << " bytes"
       << ")" << std::endl;
  }
}
 
This scenario is hard to reproduce, so no demo is given here.
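Although the real scenario is hard to trigger on a device, the condition itself is easy to illustrate with a toy model. The sketch below is not how RosAlloc works internally; it only models free memory as a list of separate free block sizes (a hypothetical representation) and applies the same comparison as the code above: the request fits in the total free space but not in the largest contiguous block.

```java
import java.util.Arrays;

public class FragmentationDemo {
    /** Sum of all free blocks, in bytes. */
    public static long totalFree(long[] freeBlocks) {
        return Arrays.stream(freeBlocks).sum();
    }

    /** Size of the largest single free block, in bytes (0 if there are none). */
    public static long largestContiguous(long[] freeBlocks) {
        return Arrays.stream(freeBlocks).max().orElse(0L);
    }

    /** True when enough memory is free in total, but no single block is large enough. */
    public static boolean failsDueToFragmentation(long[] freeBlocks, long requiredBytes) {
        return requiredBytes <= totalFree(freeBlocks)
            && requiredBytes > largestContiguous(freeBlocks);
    }

    public static void main(String[] args) {
        // Toy heap: 8 MB free in total, scattered across four 2 MB holes.
        long mb = 1024L * 1024L;
        long[] holes = {2 * mb, 2 * mb, 2 * mb, 2 * mb};
        System.out.println("total free = " + totalFree(holes) / mb + "MB");
        System.out.println("largest contiguous = " + largestContiguous(holes) / mb + "MB");
        System.out.println("4MB request fails due to fragmentation: "
            + failsDueToFragmentation(holes, 4 * mb));
    }
}
```

In this toy heap a 4 MB request fails even though 8 MB is free, because no single hole is larger than 2 MB.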
 
 

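4. OOM: Thread Creation Failure

For an analysis of how a thread (Thread) is created and how its memory is allocated on Android, see this article: //blog.csdn.net/u011578734/article/details/109331764

Creating a thread consumes a significant amount of system resources (memory, for example). The process involves both the Java layer and the native layer. The real work is done in the native layer, and the relevant code is in /art/runtime/thread.cc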

void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
    // (a great deal of code omitted here)
    
  {
    std::string msg(child_jni_env_ext.get() == nullptr ?
        StringPrintf("Could not allocate JNI Env: %s", error_msg.c_str()) :
        StringPrintf("pthread_create (%s stack) failed: %s",
                                 PrettySize(stack_size).c_str(), strerror(pthread_create_result)));
    ScopedObjectAccess soa(env);
    soa.Self()->ThrowOutOfMemoryError(msg.c_str());
  }
    
}

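A rough summary follows (the accompanying diagram was borrowed from online material; I was being lazy).

I recommend reading this article: //cloud.tencent.com/developer/article/1071770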

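4.1 Failure to Create the JNI Env

There are generally two causes.

1. File descriptor (FD) exhaustion makes JNIEnv creation fail. logcat typically shows: Too many open files … Could not allocate JNI Env

An OOM occurs when the process's fd count (obtainable with ls /proc/pid/fd | wc -l) exceeds the Max open files value specified in /proc/pid/limits.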

E/art: ashmem_create_region failed for 'indirect ref table': Too many open files
java.lang.OutOfMemoryError: Could not allocate JNI Env
        at java.lang.Thread.nativeCreate(Native Method)
        at java.lang.Thread.start(Thread.java:730)

2. Insufficient virtual memory makes JNIEnv creation fail. logcat typically shows: Could not allocate JNI Env: Failed anonymous mmap

08-19 17:51:50.662  3533  3533 E OOM_TEST: create thread : 1104
08-19 17:51:50.663  3533  3533 W com.demo: Throwing OutOfMemoryError "Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log." (VmSize 2865432 kB)
08-19 17:51:50.663  3533  3533 E InputEventSender: Exception dispatching finished signal.
08-19 17:51:50.663  3533  3533 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.start(Thread.java:887)


08-19 17:51:50.671  3533  3533 E AndroidRuntime: FATAL EXCEPTION: main
08-19 17:51:50.671  3533  3533 E AndroidRuntime: Process: com.demo, PID: 3533
08-19 17:51:50.671  3533  3533 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

 

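4.2 Failure to Create the Thread

There are generally two causes.

1. Insufficient virtual memory. logcat typically shows: mapped space: Out of memory  … pthread_create (1040KB stack) failed: Out of memory

The native layer sets the thread stack size through FixStackSize. By default, the total memory needed for a thread stack is 1M + 8k + 8k, that is, 1040k.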

//  /art/runtime/thread.cc
static size_t FixStackSize(size_t stack_size) {
    // stack_size is computed here; the default is typically 1040KB
    ...
}
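The default stack figure is plain arithmetic, and it also gives a rough upper bound on how many threads fit into a given amount of free virtual address space. The sketch below is only a back-of-the-envelope estimate: the class name is hypothetical, the 100 MB figure in main is an arbitrary assumption, and real thread creation needs additional mappings beyond the stack itself.

```java
public class ThreadStackMath {
    private static final long KB = 1024L;

    /** Default per-thread stack size from the text: 1M + 8k + 8k. */
    public static long defaultStackBytes() {
        return 1024 * KB + 8 * KB + 8 * KB;
    }

    /** Rough upper bound on threads whose stacks fit into availableBytes of address space. */
    public static long threadsThatFit(long availableBytes) {
        return availableBytes / defaultStackBytes();
    }

    public static void main(String[] args) {
        System.out.println("default stack = " + defaultStackBytes() / KB + "KB ("
            + defaultStackBytes() + " bytes)");
        // Assumed example: 100 MB of free virtual address space.
        long available = 100L * 1024 * KB;
        System.out.println("threads that fit in 100MB = " + threadsThatFit(available));
    }
}
```

The first line confirms the 1040KB that appears in the pthread_create messages; with the assumed 100 MB of free address space, at most 98 such stacks would fit.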

A typical logcat at the time of the OOM:

W/libc: pthread_create failed: couldn't allocate 1073152-bytes mapped space: Out of memory
W/tch.crowdsourc: Throwing OutOfMemoryError with VmSize  4191668 kB "pthread_create (1040KB stack) failed: Try again"
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
        at java.lang.Thread.nativeCreate(Native Method)
        at java.lang.Thread.start(Thread.java:753)

2. The number of threads exceeds the limit. logcat typically shows: pthread_create failed: clone failed: Try again

08-19 18:55:07.725 22139 22139 E OOM_TEST: create thread : 54
08-19 18:55:07.725 22139 22139 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.726 22139 22139 W com.demo: Throwing OutOfMemoryError "pthread_create (1040KB stack) failed: Try again" (VmSize 1715684 kB)
08-19 18:55:07.733 22139 22139 E InputEventSender: Exception dispatching finished signal.
08-19 18:55:07.733 22139 22139 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 18:55:07.734 22786 22786 W externalstorag: Using default instruction set features for ARM CPU variant (generic) using conservative defaults
08-19 18:55:07.734 22786 22786 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.735 22786 22786 F externalstorag: thread_pool.cc:66] pthread_create failed for new thread pool worker thread: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
   
    
08-19 18:55:07.739 22139 22139 E AndroidRuntime: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)
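The message patterns quoted in sections 3 and 4 can be collected into a small classifier for triaging logs. This is a sketch of one possible approach, not an official tool: the class, the enum names, and the idea of matching on the surrounding log text (rather than only the exception message) are my own assumptions. The substrings themselves are taken from the logs shown in this article.

```java
public class OomClassifier {
    public enum Kind {
        HEAP_LIMIT, HEAP_FRAGMENTATION,
        JNI_ENV_FD, JNI_ENV_MMAP, JNI_ENV_OTHER,
        PTHREAD_VIRTUAL_MEMORY, PTHREAD_THREAD_LIMIT, PTHREAD_OTHER,
        UNKNOWN
    }

    /** Classifies a chunk of logcat text that surrounds an OutOfMemoryError. */
    public static Kind classify(String logText) {
        if (logText.contains("Could not allocate JNI Env")) {
            if (logText.contains("Too many open files")) return Kind.JNI_ENV_FD;
            if (logText.contains("Failed anonymous mmap")) return Kind.JNI_ENV_MMAP;
            return Kind.JNI_ENV_OTHER;
        }
        if (logText.contains("pthread_create")) {
            // The libc warning printed just before the exception tells the two causes apart.
            if (logText.contains("mapped space: Out of memory")) return Kind.PTHREAD_VIRTUAL_MEMORY;
            if (logText.contains("clone failed")) return Kind.PTHREAD_THREAD_LIMIT;
            return Kind.PTHREAD_OTHER;
        }
        if (logText.contains("Failed to allocate a")) {
            return logText.contains("failed due to fragmentation")
                ? Kind.HEAP_FRAGMENTATION : Kind.HEAP_LIMIT;
        }
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        String sample = "W libc    : pthread_create failed: clone failed: Try again\n"
            + "java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again";
        System.out.println(classify(sample));
    }
}
```

Because the exception message alone ("pthread_create (1040KB stack) failed: Try again") is identical for both thread-creation causes, the classifier needs the preceding libc line to tell them apart.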

 

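4.3 Debugging Tips

  • FD limits

Run cat /proc/pid/limits to see Max open files, the maximum number of files the process may have open.

Run ls /proc/pid/fd | wc -l to see how many files the process currently has open.

 

  • Thread count limits

Run cat /proc/sys/kernel/threads-max to see the maximum number of threads the system can create.

Run echo 3000 > /proc/sys/kernel/threads-max to change this value for testing.

Run top -H to see the threads currently running on the system.

Run cat /proc/{pid}/status to see the current process's thread count (the Threads field).

 

  • Virtual memory usage

Run cat /proc/pid/status | grep Vm to see VmSize and VmPeak.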
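The same /proc files can also be read from inside the process, which is handy for logging these numbers shortly before a suspected OOM. The sketch below is an illustration under assumptions: the class ProcParsers and its method names are hypothetical, and the parsing relies on the usual Linux layout of /proc/<pid>/limits and /proc/<pid>/status. On a device, app access to some /proc entries may be restricted, so main only reads the files when they are readable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProcParsers {
    /** Soft "Max open files" limit from the text of /proc/<pid>/limits, or -1 if not found. */
    public static long maxOpenFiles(String limitsText) {
        Matcher m = Pattern.compile("(?m)^Max open files\\s+(\\d+)").matcher(limitsText);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    /** Numeric value of a field such as "Threads", "VmSize" or "VmPeak" in /proc/<pid>/status. */
    public static long statusField(String statusText, String field) {
        Matcher m = Pattern.compile("(?m)^" + Pattern.quote(field) + ":\\s+(\\d+)")
            .matcher(statusText);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    private static String read(Path p) throws IOException {
        return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits");
        Path status = Paths.get("/proc/self/status");
        if (Files.isReadable(limits) && Files.isReadable(status)) {
            String statusText = read(status);
            System.out.println("Max open files = " + maxOpenFiles(read(limits)));
            System.out.println("Threads        = " + statusField(statusText, "Threads"));
            System.out.println("VmSize (kB)    = " + statusField(statusText, "VmSize"));
            System.out.println("VmPeak (kB)    = " + statusField(statusText, "VmPeak"));
        } else {
            System.out.println("/proc/self is not readable on this system");
        }
    }
}
```

Both helpers return -1 when the field is missing or not numeric (for example when the limit is reported as unlimited), so callers can tell "not available" apart from a real value.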

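4.4 OOM Demo

The following code gives a simple demonstration. Because devices are configured with different parameters, the resulting OOM error may differ from device to device.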

private void testOOMCreatThread() {
    int count = 0;
    while (true) {
        Log.e("OOM_TEST", "create thread : " + ++count);
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }, "thread-" + count).start();
    }
}
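For experiments where crashing the process is not desirable, a bounded variant of the demo can be useful. The sketch below is my own variation, not part of the original test: it stops at a caller-supplied cap, catches the OutOfMemoryError that Thread.start() throws when the native thread cannot be created, and reports how many threads were actually started. The class name, the cap, and the use of a CountDownLatch to park the threads are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;

public class BoundedThreadDemo {
    /** Starts up to maxThreads parked threads and returns how many were actually started. */
    public static int createThreads(int maxThreads) {
        CountDownLatch release = new CountDownLatch(1);
        int started = 0;
        try {
            while (started < maxThreads) {
                Thread t = new Thread(() -> {
                    try {
                        release.await();   // park until the experiment is over
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }, "thread-" + started);
                t.setDaemon(true);
                t.start();                 // throws OutOfMemoryError if creation fails
                started++;
            }
        } catch (OutOfMemoryError e) {
            System.err.println("thread creation failed after " + started + ": " + e.getMessage());
        } finally {
            release.countDown();           // let every parked thread exit
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println("started " + createThreads(50) + " threads");
    }
}
```

Raising the cap step by step while watching the Threads and VmSize fields described in section 4.3 shows which limit a particular device hits first.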

 

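5. References and Recommended Reading

✔️ [Performance Optimization] OOM Optimization and Monitoring Approaches at Large Companies

✔️ Android Thread Creation Source Code and OOM Analysis

✔️ The Incredible OOM

✔️ Android App OutOfMemory: 1. Understanding the OOM Mechanism

✔️ Tuning VM Parameters: Configuring heapgrowthlimit/heapsize

✔️ Analysis of Thread Creation and Memory Allocation on Android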