Linux Notes: A Read-Write Lock Busy Loop Caused by Pinning a Process to One Core Plus a Real-Time Thread

November 3, 2020
Notes
C, CPU affinity, linux, Linux Notes, read-write lock

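Discovering the problem

While testing a project at work, we found that after the program had been running for a while, CPU usage would hit 100%.

My first guess was an infinite loop somewhere, so I planned to trace it with gdb. To my surprise, gdb could not even attach to the process...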

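Locating the problem

After some digging, it turned out that a thread running at real-time (RT) priority was stuck in a busy loop. On top of that, the CPU affinity had been configured so that the process ran only on the first core. In that state gdb could not attach, presumably because the other threads of the process were starved on that core and never got the CPU time they needed to stop for the debugger.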

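After changing the process's CPU affinity on the spot with taskset, I found that the real-time thread eating 100% CPU was not in an ordinary infinite loop. Every time I looked, it was inside pthread_rwlock_wrlock.

Stranger still, as soon as the CPU affinity was changed, the "infinite loop" went away...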

Experiment

So I wrote a small test program:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

pthread_rwlock_t rwlock;

/* Normal-priority thread: repeatedly takes and releases the write lock. */
void *task1(void *arg)
{
    (void)arg;
    pthread_setname_np(pthread_self(), "task1");

    while (1)
    {
        printf("\r\n task1 lock \r\n");
        pthread_rwlock_wrlock(&rwlock);

        printf("\r\n task1 unlock \r\n");
        pthread_rwlock_unlock(&rwlock);

        usleep(10);
    }
}

/* Real-time thread: same loop as task1, but at the highest SCHED_RR priority. */
void *task2(void *arg)
{
    struct sched_param sparam;

    (void)arg;
    pthread_setname_np(pthread_self(), "task2");

    /* Make this the highest-priority real-time task (needs root or CAP_SYS_NICE) */
    sparam.sched_priority = sched_get_priority_max(SCHED_RR);
    if (pthread_setschedparam(pthread_self(), SCHED_RR, &sparam) != 0)
        fprintf(stderr, "task2: pthread_setschedparam failed\n");

    while (1)
    {
        printf("\r\n task2 lock \r\n");
        pthread_rwlock_wrlock(&rwlock);

        printf("\r\n task2 unlock \r\n");
        pthread_rwlock_unlock(&rwlock);

        usleep(10);
    }
}

int main(int argc, char *argv[])
{
    pthread_t t1, t2;
    cpu_set_t cpuset;

    (void)argc;
    (void)argv;

    /* Set the CPU affinity: bind the process to the first core */
    CPU_ZERO(&cpuset);
    CPU_SET(0, &cpuset);
    if (sched_setaffinity(0, sizeof(cpuset), &cpuset) != 0)
        perror("sched_setaffinity");

    pthread_rwlock_init(&rwlock, NULL);

    /* Threads created from here on inherit the single-core affinity */
    pthread_create(&t1, NULL, task1, NULL);
    sleep(3);
    pthread_create(&t2, NULL, task2, NULL);

    while (1)
        sleep(10);

    return 0;
}

The result of running it is shown in the figure below.

CPU usage really did hit 100%!

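Root cause analysis

1. "Acquiring" and "releasing" a read-write lock are not single atomic operations; a thread can be scheduled out halfway through either one.

2. The experiment shows that when task1 (non-real-time) was partway through its unlock, it had already updated part of the lock's internal state. When task2 (real-time) then tried to lock, it found that it no longer needed to block and only had to spin (busy-wait) until task1 finished the unlock.

However, because task2 is a real-time task and the whole process is bound to the first core, task1 never gets scheduled again, so task2 spins forever.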
