Android OOM 问题探究 — 从入门到放弃

一、前言

最近客户反馈了一些OOM的问题,很早之前自己也有简单了解过OOM的知识,但时间久远,很多东西都记不清了。

现在遇到这个OOM问题,也即趁此搜索了一些资料,对OOM问题做一些探究,把资料记录于此,一遍后续查阅。本文内容大量借鉴参考了网络上经典文章的内容,站在巨人的肩膀上登高望远!

注:以下分析基于 Android R source

 

二、OOM问题的可能原因

网络上可以搜索到很多的解释,都很详细,我在此也做一个简单的总结,当然可能不全面,仅供学习参考

 

Android系统中,OutOfMemoryError这个错误是怎么被系统抛出的?在代码进行搜索可以看到

 

重点关注下面两点

✔️ 堆内存分配失败时的OOM  ==   /art/runtime/gc/heap.cc

✔️ 创建线程失败时的OOM     ==   /art/runtime/thread.cc

 

三、OOM — 堆内存分配失败

在source code中我们可以看到,当堆内存分配失败时,会抛出一些典型的log,如下代码

void Heap::ThrowOutOfMemoryError(Thread* self, size_t byte_count, AllocatorType allocator_type) {
  ...
  std::ostringstream oss;
  size_t total_bytes_free = GetFreeMemory();
  oss << "Failed to allocate a " << byte_count << " byte allocation with " << total_bytes_free
      << " free bytes and " << PrettySize(GetFreeMemoryUntilOOME()) << " until OOM,"
      << " target footprint " << target_footprint_.load(std::memory_order_relaxed)
      << ", growth limit "
      << growth_limit_;
  // If the allocation failed due to fragmentation, print out the largest continuous allocation.
  ...
}

在出现OOM问题时,logcat中应该会看到类似下面的信息输出

08-19 11:34:53.860 28028 28028 E AndroidRuntime: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728

上面这段logcat的大概解释:想要去分配 20971536 bytes的heap memory,但时app剩余可用的free heap只有6147912 bytes,而且当前app最大可分配的heap是134217728 bytes

 

堆内存分配失败的原因可以分两种情况:1. 超过APP进程的heap内存上限 与 2. 没有足够大小的连续地址空间

 

3.1 超过APP进程的内存上限

Android设备上java虚拟机对单个应用的最大内存分配做了约束,超出这个值就会OOM。由Runtime.getRuntime.MaxMemory()可以得到Android中每个进程被系统分配的内存上限,当进程占用内存达到这个上限时就会发生OOM,这也是Android中最常见的OOM类型。

 


Android系统有如下约定:

  • /vendor/build.prop有定义属性值来对单个应用的最大内存分配做约束

dalvik.vm.heapgrowthlimit 常规app使用的参数

dalvik.vm.heapsize 应用在AndroidManifest.xml设置了android:largeHeap="true",将会变成大应用的设置

 

  • 代码中也可以使用如下API来获取内存限制的信息

ActivityManager.getMemoryClass() 常规app最大可用的堆内存,对应 dalvik.vm.heapgrowthlimit;

ActivityManager.getLargeMemoryClass()应用在AndroidManifest.xml设置了android:largeHeap=”true”,将会变成大应用时最大可用的堆内存,对应dalvik.vm.heapsize;

Runtime.getRuntime().maxMemory()  可以得到Android中每个进程被系统分配的内存上限,等于上面的两个值之一;


 

如下是一段简单的代码来演示这种类型的OOM:

private void testOOMCreatHeap(Context context) {
    ActivityManager activityManager =(ActivityManager)context.getSystemService(Context.ACTIVITY_SERVICE);
    Log.d("OOM_TEST", "app maxMemory = " + activityManager.getMemoryClass() + "MB");
    Log.d("OOM_TEST", "large app maxMemory = " + activityManager.getLargeMemoryClass() + "MB");
    Log.d("OOM_TEST", "current app maxMemory = " + Runtime.getRuntime().maxMemory()/1024/1024 + "MB");
    List<byte[]> bytesList = new ArrayList<>();
    int count = 0;
    while (true) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        Log.e("OOM-TEST", "allocate 20MB heap: " + count++ + ", total " + 20*count + "MB");
        // 每次申请20MB内存
        bytesList.add(new byte[1024 * 1024 * 20]);
    }
}

 

注:我的测试平台 dalvik.vm.heapgrowthlimit=128MB , dalvik.vm.heapsize=384MB

上面的测试代码中,我们每次分配20MB的内存

 

情况一:常规应用,不要在AndroidManifest.xml设置android:largeHeap=”true”,此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapgrowthlimit=128MB

看运行结果:

08-19 11:34:53.555 28028 28028 D OOM_TEST: app maxMemory = 128MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: large app maxMemory = 384MB
08-19 11:34:53.556 28028 28028 D OOM_TEST: current app maxMemory = 128MB
08-19 11:34:53.576 28028 28028 E OOM-TEST: allocate 20MB heap: 0, total 20MB
08-19 11:34:53.596 28028 28028 E OOM-TEST: allocate 20MB heap: 1, total 40MB
08-19 11:34:53.617 28028 28028 E OOM-TEST: allocate 20MB heap: 2, total 60MB
08-19 11:34:53.637 28028 28028 E OOM-TEST: allocate 20MB heap: 3, total 80MB
08-19 11:34:53.658 28028 28028 E OOM-TEST: allocate 20MB heap: 4, total 100MB
08-19 11:34:53.678 28028 28028 E OOM-TEST: allocate 20MB heap: 5, total 120MB
08-19 11:34:53.699 28028 28028 E OOM-TEST: allocate 20MB heap: 6, total 140MB
08-19 11:34:53.699 28028 28028 I com.demo: Waiting for a blocking GC Alloc
08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28042 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.713 28028 28028 I com.demo: WaitForGcToComplete blocked Alloc on AddRemoveAppImageSpace for 14.279ms
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.713 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.732 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 73us total 19.225ms
08-19 11:34:53.733 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.766 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.767 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 71us total 33.715ms
08-19 11:34:53.767 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.767 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.792 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.792 28028 28028 I com.demo: Alloc concurrent copying GC freed 1046(50KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 57us total 25.120ms
08-19 11:34:53.792 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971532 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264080 kB)
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.793 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.808 28028 28028 I com.demo: Alloc young concurrent copying GC freed 4(31KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 62us total 15.229ms
08-19 11:34:53.808 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.835 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.836 28028 28028 I com.demo: Alloc concurrent copying GC freed 3(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 59us total 27.042ms
08-19 11:34:53.836 28028 28028 I com.demo: Forcing collection of SoftReferences for 20MB allocation
08-19 11:34:53.836 28028 28028 I com.demo: Starting a blocking GC Alloc
08-19 11:34:53.857 28028 28028 I com.demo: Clamp target GC heap from 146MB to 128MB
08-19 11:34:53.857 28028 28028 I com.demo: Alloc concurrent copying GC freed 6(16KB) AllocSpace objects, 0(0B) LOS objects, 4% free, 122MB/128MB, paused 50us total 21.249ms
08-19 11:34:53.857 28028 28028 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728" (VmSize 1264016 kB)
08-19 11:34:53.858 28028 28028 E InputEventSender: Exception dispatching finished signal.
08-19 11:34:53.858 28028 28028 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 6147912 free bytes and 6003KB until OOM, target footprint 134217728, growth limit 134217728
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:34:53.859 28028 28028 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

解释:

最后一次请求分配heap memory时,此时因为已经分配了120+MB的内存,如果继续分配20MB显然超过了限制的128MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

 

情况二:常规应用,在AndroidManifest.xml设置android:largeHeap=”true”,此时APP的Dalvik  heap的分配上限应该是 dalvik.vm.heapsize=384MB

看运行结果:

08-19 11:32:22.660 27539 27539 D OOM_TEST: app maxMemory = 128MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: large app maxMemory = 384MB
08-19 11:32:22.660 27539 27539 D OOM_TEST: current app maxMemory = 384MB
08-19 11:32:23.048 27539 27539 E OOM-TEST: allocate 20MB heap: 18, total 380MB
08-19 11:32:23.061 27539 27553 I com.mediacodec: Clamp target GC heap from 406MB to 384MB
08-19 11:32:23.069 27539 27539 E OOM-TEST: allocate 20MB heap: 19, total 400MB
08-19 11:32:23.069 27539 27539 I com.demo: Starting a blocking GC Alloc
08-19 11:32:23.226 27539 27539 W com.demo: Throwing OutOfMemoryError "Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184" (VmSize 2053220 kB)
08-19 11:32:23.226 27539 27539 E InputEventSender: Exception dispatching finished signal.
08-19 11:32:23.226 27539 27539 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI: java.lang.OutOfMemoryError: Failed to allocate a 20971536 byte allocation with 1900608 free bytes and 1856KB until OOM, target footprint 402653184, growth limit 402653184
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.testOOMCreatHeap(MainActivity.java:393)
08-19 11:32:23.227 27539 27539 E MessageQueue-JNI:      at com.demo.MainActivity.onClick(MainActivity.java:450)

解释:

最后一次请求分配heap memory时,此时因为已经分配了380+MB的内存,如果继续分配20MB显然超过了限制的384MB,而且此时GC并没有能回收掉任何内存,最终分配失败,抛出OutOfMemoryError

 

3.2 没有足够大小的连续地址空间

这种情况一般是进程中存在大量的内存碎片导致的,其堆栈信息会比第一种OOM堆栈多出一段类似如下格式的信息
:failed due to fragmentation (required continguous free “<< required_bytes << “ bytes for a new buffer where largest contiguous free ” << largest_continuous_free_pages << “ bytes)”
相关的代码在art/runtime/gc/allocator/rosalloc.cc中,如下
void RosAlloc::LogFragmentationAllocFailure(std::ostream& os, size_t failed_alloc_bytes) {
  ...
  if (required_bytes > largest_continuous_free_pages) {
    os << "; failed due to fragmentation ("
       << "required contiguous free " << required_bytes << " bytes" << new_buffer_msg
       << ", largest contiguous free " << largest_continuous_free_pages << " bytes"
       << ", total free pages " << total_free << " bytes"
       << ", space footprint " << footprint_ << " bytes"
       << ", space max capacity " << max_capacity_ << " bytes"
       << ")" << std::endl;
  }
}
 
这种场景比较难模拟,这里就不做演示了。
 
 

四、OOM — 创建线程失败

Android中线程(Thread)的创建及内存分配过程分析可以参见如下这篇文章://blog.csdn.net/u011578734/article/details/109331764

线程创建会消耗大量的系统资源(例如内存),创建过程涉及java层和native的处理。实质工作是在native层完成的,相关代码位于 /art/runtime/thread.cc

void Thread::CreateNativeThread(JNIEnv* env, jobject java_peer, size_t stack_size, bool is_daemon) {
    // 此处省略一万字
    
  {
    std::string msg(child_jni_env_ext.get() == nullptr ?
        StringPrintf("Could not allocate JNI Env: %s", error_msg.c_str()) :
        StringPrintf("pthread_create (%s stack) failed: %s",
                                 PrettySize(stack_size).c_str(), strerror(pthread_create_result)));
    ScopedObjectAccess soa(env);
    soa.Self()->ThrowOutOfMemoryError(msg.c_str());
  }
    
}

大概总结如下:下图借鉴了网络上的资料(偷懒了)

建议读一下这篇文章://cloud.tencent.com/developer/article/1071770

4.1 创建JNI Env 失败

一般有两种原因

1. FD溢出导致JNIEnv创建失败了,一般logcat中可以看到信息 Too many open files … Could not allocate JNI Env

当进程fd数(可以通过 ls /proc/pid/fd | wc -l 获得)突破 /proc/pid/limits中规定的Max open files时,产生OOM

E/art: ashmem_create_region failed for 'indirect ref table': Too many open files java.lang.OutOfMemoryError:Could not allocate JNI Env at java.lang.Thread.nativeCreate(Native Method) at java.lang.Thread.start(Thread.java:730)

2. 虚拟内存不足导致JNIEnv创建失败了,一般logcat中可以看到信息 Could not allocate JNI Env: Failed anonymous mmap

08-19 17:51:50.662  3533  3533 E OOM_TEST: create thread : 1104
08-19 17:51:50.663  3533  3533 W com.demo: Throwing OutOfMemoryError "Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log." (VmSize 2865432 kB)
08-19 17:51:50.663  3533  3533 E InputEventSender: Exception dispatching finished signal.
08-19 17:51:50.663  3533  3533 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.668  3533  3533 E MessageQueue-JNI:      at java.lang.Thread.start(Thread.java:887)


08-19 17:51:50.671  3533  3533 E AndroidRuntime: FATAL EXCEPTION: main
08-19 17:51:50.671  3533  3533 E AndroidRuntime: Process: com.demo, PID: 3533
08-19 17:51:50.671  3533  3533 E AndroidRuntime: java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x3, 0x22, -1, 0): Operation not permitted. See process maps in the log.
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 17:51:50.671  3533  3533 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

 

4.2 创建线程失败

一般有两种原因

1. 虚拟内存不足导致失败,一般logcat中可以看到信息 mapped space: Out of memory  … pthread_create (1040KB stack) failed: Out of memory

native层通过FixStackSize设置线程栈大小,默认情况下,线程栈所需内存总大小 = 1M + 8k + 8k,即为1040k。

//  /art/runtime/thread.cc
static size_t FixStackSize(size_t stack_size) {
    // 这里面设置计算 stack_size,一般默认1040KB
}

发生OOM时的典型logcat如下:

W/libc: pthread_create failed: couldn't allocate 1073152-bytes mapped space: Out of memory
W/tch.crowdsourc: Throwing OutOfMemoryError with VmSize  4191668 kB "pthread_create (1040KB stack) failed: Try again"
java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
        at java.lang.Thread.nativeCreate(Native Method)
        at java.lang.Thread.start(Thread.java:753)

2. 线程数量超过了限制导致失败,一般logcat中可以看到信息 pthread_create failed: clone failed: Try again

08-19 18:55:07.725 22139 22139 E OOM_TEST: create thread : 54
08-19 18:55:07.725 22139 22139 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.726 22139 22139 W com.demo: Throwing OutOfMemoryError "pthread_create (1040KB stack) failed: Try again" (VmSize 1715684 kB)
08-19 18:55:07.733 22139 22139 E InputEventSender: Exception dispatching finished signal.
08-19 18:55:07.733 22139 22139 E MessageQueue-JNI: Exception in MessageQueue callback: handleReceiveCallback
08-19 18:55:07.734 22786 22786 W externalstorag: Using default instruction set features for ARM CPU variant (generic) using conservative defaults
08-19 18:55:07.734 22786 22786 W libc    : pthread_create failed: clone failed: Try again
08-19 18:55:07.735 22786 22786 F externalstorag: thread_pool.cc:66] pthread_create failed for new thread pool worker thread: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.737 22139 22139 E MessageQueue-JNI:      at java.lang.Thread.nativeCreate(Native Method)
   
    
08-19 18:55:07.739 22139 22139 E AndroidRuntime: java.lang.OutOfMemoryError: pthread_create (1040KB stack) failed: Try again
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.nativeCreate(Native Method)
08-19 18:55:07.739 22139 22139 E AndroidRuntime:        at java.lang.Thread.start(Thread.java:887)

 

4.3 debug技巧

  • 对于FD的限制

可以执行 cat /proc/pid/limits来查看Max open files 最大打开的文件数量

可以执行 ls /proc/pid/fd | wc -l 来查看进程打开的文件数量

 

  • 对于线程数量的限制

可以执行cat /proc/sys/kernel/threads-max 查看系统最多可以创建多少线程

可以执行echo 3000 > /proc/sys/kernel/threads-max修改这个值,做测试

查看系统当前的线程数 top -H

查看当前进程线程数量cat /proc/{pid}/status

 

  • 对于虚拟内存使用情况

可以执行 cat /proc/pid/status | grep Vm查看VmSize及VmPeak

4.4 OOM演示

可以使用下面这段程序做简单演示,不过不同设备由于参数配置不同,可能会OOM error会有不同

private void testOOMCreatThread() {
    int count = 0;
    while (true) {
        Log.e("OOM_TEST", "create thread : " + ++count);
        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(10000);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }, "thread-" + count).start();
    }
}

 

五、参考及推荐阅读文章

✔️ 【性能优化】大厂OOM优化和监控方案

✔️ Android 创建线程源码与OOM分析

✔️ 不可思议的OOM

✔️ Android应用OutOfMemory — 1.OOM机制了解 

✔️ 关于虚拟机参数的调整 — heapgrowthlimit/heapsize的配置

✔️ Android中线程(Thread)的创建及内存分配过程分析