Pthread Concurrent Programming (2): Understanding Threads from the Bottom Up

November 14, 2022
Notes
pthread, systems programming

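Preface

This article introduces the basic building blocks of a thread and some of the basic thread mechanisms that pthread provides. On Linux and other POSIX systems, the thread implementations of many languages, such as Python and Java, are built on top of pthread threads, so a solid understanding of how pthread works greatly improves our understanding of threads in those languages. I hope it helps you understand threads in depth.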

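Basic Elements of a Thread

First we need to know the basic operations commonly used when working with threads. If you are not familiar with them, don't worry: later articles discuss them in detail.

Common basic thread operations:
- Creating a thread.
- Terminating a thread.
- Synchronizing threads.
- Scheduling threads.
- Managing data inside a thread.
- Interaction between threads and processes.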
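On Linux, all threads of a process share a single address space.
Threads in the same process also share a number of process-wide resources and kernel data structures:
- Open file descriptors.
- The current working directory.
- The user ID and group ID.
- Data in the global data segment.
- The process's code.
- Signals and signal handlers.
Private to each thread:
- The thread ID.
- Registers and stack space.
- Local variables and return addresses on the thread's stack.
- The signal mask.
- The thread's own priority.
- errno.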

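For pthread functions that return an int status, a return value of 0 means the call succeeded; on failure they return a nonzero error number directly instead of setting errno.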

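Waiting for a Thread

In the pthread implementation, every thread is in one of two states: joinable or detached. A thread started with pthread_create is joinable by default, which means other threads can synchronize with it using pthread_join.

When a thread calls pthread_join(T, ret) and the call returns, thread T has terminated and finished executing, and the system resources associated with T can be released.

If a thread is detached, the resources associated with it are released automatically and returned to the system when it terminates, so no other thread needs to call pthread_join to reclaim them.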

The signature of pthread_join is:

int pthread_join(pthread_t thread, void **retval);

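thread: the thread to wait for.
retval: if retval is not NULL, pthread_join copies the exit status of thread into the location retval points to. If the thread was canceled, PTHREAD_CANCELED is stored there.
Return values:
- EDEADLK: a deadlock was detected, for example two threads each calling pthread_join to wait for the other.
- EINVAL: the thread is not joinable; a common case is calling pthread_join on a detached thread.
- EINVAL: another thread is already waiting in pthread_join for the same thread.
- ESRCH: thread is not a valid thread, for example one that was never created with pthread_create.
- 0: the call succeeded.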

The following program calls pthread_join on a detached thread:

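#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

pthread_t t1, t2;

void* thread_1(void* arg) {

  // Detach the calling thread. pthread functions return the error
  // number directly, so report it with strerror instead of perror.
  int ret = pthread_detach(pthread_self());
  sleep(2);
  if(ret != 0)
    fprintf(stderr, "pthread_detach: %s\n", strerror(ret));
  return NULL;
}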


int main() {

  pthread_create(&t1, NULL, thread_1, NULL);
  sleep(1);
  int ret = pthread_join(t1, NULL);
  if(ret == ESRCH)
    printf("No thread with the ID thread could be found.\n");
  else if(ret == EINVAL) {
    printf("thread is not a joinable thread or Another thread is already waiting to join with this thread\n");
  }
  return 0;
}

The program prints:

$ ./join.out
thread is not a joinable thread or Another thread is already waiting to join with this thread

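Here we call pthread_join on a thread in the detached state, so the return value is EINVAL, indicating that the thread is not joinable. (This is the behaviour observed on the author's Linux system; POSIX itself leaves joining a non-joinable thread undefined.)

pthread_self() returns the ID of the calling thread, of type pthread_t, and pthread_detach(thread) puts the given thread into the detached state.

Now a second erroneous example: calling pthread_join on an invalid thread.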


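#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>

pthread_t t1, t2;

void* thread_1(void* arg) {

  // Detach the calling thread. pthread functions return the error
  // number directly, so report it with strerror instead of perror.
  int ret = pthread_detach(pthread_self());
  sleep(2);
  if(ret != 0)
    fprintf(stderr, "pthread_detach: %s\n", strerror(ret));
  return NULL;
}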


int main() {

  pthread_create(&t1, NULL, thread_1, NULL);
  sleep(1);
  int ret = pthread_join(t2, NULL);
  if(ret == ESRCH)
    printf("No thread with the ID thread could be found.\n");
  else if(ret == EINVAL) {
    printf("thread is not a joinable thread or Another thread is already waiting to join with this thread\n");
  }
  return 0;
}

The program prints:

$ ./join01.out
No thread with the ID thread could be found.

Here no thread was ever created with t2, yet the main thread calls pthread_join to wait for it, so the return value is ESRCH.

Next, an example that uses retval:

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>

void* func(void* arg)
{
  // Terminate the thread with exit code 100.
  pthread_exit((void*)100);
  return NULL; // not reached
}

int main() {
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);
  
  void* ret;
  pthread_join(t, &ret);
  // Convert the pointer-sized exit code back to an integer for printing.
  printf("ret = %ld\n", (long)(intptr_t)(ret));
  return 0;
}

The program prints:

$./understandthread/join03.out
ret = 100

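Here we pass ret to obtain the thread's exit code, and the output shows that we got the expected value.

If the thread function does not call pthread_exit to set an exit code explicitly, the thread's exit code is the function's return value, as in the following program: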

#include <stdio.h>
#include <stdint.h>
#include <pthread.h>

void* func(void* arg)
{
  // The return value becomes the thread's exit code.
  return (void*)100;
}

int main() {
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);
  
  void* ret;
  pthread_join(t, &ret);
  // Convert the pointer-sized exit code back to an integer for printing.
  printf("ret = %ld\n", (long)(intptr_t)(ret));
  return 0;
}

This program also prints 100, as expected.

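Getting a Thread's Stack Frame and PC

In a multithreaded program, each thread has its own stack and its own PC register (the address of the code being executed; on x86_64 this is the rip register). The following program reads three registers, rsp, rbp, and rip, while it runs. The two threads print different values, which indirectly shows that each thread has its own stack frames and PC.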

#include <stdio.h>
#include <pthread.h>
#include <sys/types.h>

u_int64_t rsp;
u_int64_t rbp;
u_int64_t rip;

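// Reads the return address stored at 8(%rbp), i.e. an address inside the
// caller. This relies on frame pointers being in use (e.g. compile with -O0).
void find_rip() {
  asm volatile(
    "movq 8(%%rbp), %0;"
    :"=r"(rip)::
  );
}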

void* func(void* arg) {
  printf("In func\n");
  asm volatile(             \
    "movq %%rsp, %0;"       \
    "movq %%rbp, %1;"       \
    :"=m"(rsp), "=m"(rbp):: \
  );
  find_rip();
  printf("stack frame: rsp = %p rbp = %p rip = %p\n", (void*)rsp, (void*)rbp, (void*) rip);
  return NULL;
}

int main() {
  printf("================\n");
  printf("In main\n");
  asm volatile(             \
    "movq %%rsp, %0;"       \
    "movq %%rbp, %1;"       \
    :"=m"(rsp), "=m"(rbp):: \
  );
  find_rip();
  printf("stack frame: rsp = %p rbp = %p rip = %p\n", (void*)rsp, (void*)rbp, (void*) rip);
  printf("================\n");
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);
  pthread_join(t, NULL);
  return 0;
}

The program prints:

================
In main
stack frame: rsp = 0x7ffc47096d50 rbp = 0x7ffc47096d80 rip = 0x4006c6
================
In func
stack frame: rsp = 0x7f0a60d43ee0 rbp = 0x7f0a60d43ef0 rip = 0x400634

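The output shows that the main thread and thread t are executing different functions and that their stack frames are far apart: 0x7ffc47096d80 - 0x7f0a60d43ef0 = 1038949363344 bytes, or about 968 GiB of virtual address space. The two threads are clearly using different stacks.

Thread IDs

On Linux, each pthread thread corresponds to one kernel thread, and both the kernel and pthread maintain an ID for it. gettid returns the thread ID maintained by the operating system, and pthread_self returns the one maintained by the pthread library.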

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>

void* func(void* arg) {
  // pthread_t is an unsigned integer type on Linux/glibc, so print it with %lu.
  printf("pthread id = %lu tid = %d\n", (unsigned long)pthread_self(), (int)gettid());
  return NULL;
}

int main() {
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);
  pthread_join(t, NULL);
  return 0;
}

The program prints:

pthread id = 140063790135040 tid = 161643

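Threads and Signals

The pthread library provides several functions for signal handling. With pthread_kill we can send a signal to another thread in the same process. The list below shows the signals and their numbers: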

 1) SIGHUP	 2) SIGINT	 3) SIGQUIT	 4) SIGILL	 5) SIGTRAP
 6) SIGABRT	 7) SIGBUS	 8) SIGFPE	 9) SIGKILL	10) SIGUSR1
11) SIGSEGV	12) SIGUSR2	13) SIGPIPE	14) SIGALRM	15) SIGTERM
16) SIGSTKFLT	17) SIGCHLD	18) SIGCONT	19) SIGSTOP	20) SIGTSTP
21) SIGTTIN	22) SIGTTOU	23) SIGURG	24) SIGXCPU	25) SIGXFSZ
26) SIGVTALRM	27) SIGPROF	28) SIGWINCH	29) SIGIO	30) SIGPWR
31) SIGSYS	34) SIGRTMIN	35) SIGRTMIN+1	36) SIGRTMIN+2	37) SIGRTMIN+3
38) SIGRTMIN+4	39) SIGRTMIN+5	40) SIGRTMIN+6	41) SIGRTMIN+7	42) SIGRTMIN+8
43) SIGRTMIN+9	44) SIGRTMIN+10	45) SIGRTMIN+11	46) SIGRTMIN+12	47) SIGRTMIN+13
48) SIGRTMIN+14	49) SIGRTMIN+15	50) SIGRTMAX-14	51) SIGRTMAX-13	52) SIGRTMAX-12
53) SIGRTMAX-11	54) SIGRTMAX-10	55) SIGRTMAX-9	56) SIGRTMAX-8	57) SIGRTMAX-7
58) SIGRTMAX-6	59) SIGRTMAX-5	60) SIGRTMAX-4	61) SIGRTMAX-3	62) SIGRTMAX-2
63) SIGRTMAX-1	64) SIGRTMAX

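A thread can respond to signals sent by other threads by running the signal handler. Before looking at concrete examples, note that in a multithreaded pthread program all threads share the signal handlers: if one thread changes a handler, the change affects every other thread.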

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>

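void sig(int signo) {
  // Demonstration only: snprintf is not on the POSIX list of
  // async-signal-safe functions.
  char s[1024];
  snprintf(s, sizeof(s), "signo = %d tid = %d pthread tid = %lu\n", signo, (int)gettid(), (unsigned long)pthread_self());
  write(STDOUT_FILENO, s, strlen(s));
}

void* func(void* arg) {
  printf("pthread tid = %lu\n", (unsigned long)pthread_self());
  for(;;);
  return NULL;
}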

int main() {
  signal(SIGHUP, sig);
  signal(SIGTERM, sig);
  signal(SIGSEGV, sig);
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);

  sleep(1);
  pthread_kill(t, SIGHUP);
  sleep(1);
  return 0;
}

The program prints:

pthread tid = 140571386894080
signo = 1 tid = 7785 pthread tid = 140571386894080

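In this program the main function first installs handlers for several signals, setting the handler for SIGHUP, SIGTERM, and SIGSEGV to the function sig; when a thread in the process receives one of these signals, the handler is called. The main thread then sends SIGHUP to thread t. From the signal list above, SIGHUP has the number 1, and the handler did receive that signal.

We can also set a per-thread signal mask. As mentioned earlier, each thread has its own mask, so in the following program only thread 2 responds to the SIGTERM sent by the main thread.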

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>

void sig(int signo) {
  // Demonstration only: snprintf is not on the POSIX list of
  // async-signal-safe functions.
  char s[1024];
  snprintf(s, sizeof(s), "signo = %d tid = %d pthread tid = %lu\n", signo, (int)gettid(), (unsigned long)pthread_self());
  write(STDOUT_FILENO, s, strlen(s));
}

void* func(void* arg) {
  sigset_t set;
  sigemptyset(&set);
  sigaddset(&set, SIGTERM);
  pthread_sigmask(SIG_BLOCK, &set, NULL);
  // The code above blocks SIGTERM in this thread: when the signal arrives,
  // the handler does not run immediately; it runs only after the signal
  // is unblocked.
  printf("func : pthread tid = %lu\n", (unsigned long)pthread_self());
  for(;;);
  return NULL;
}

void* func02(void* arg) {
  printf("func02 : pthread tid = %lu\n", (unsigned long)pthread_self());
  for(;;);
  return NULL;
}

int main() {
  signal(SIGTERM, sig);
  pthread_t t1;
  pthread_create(&t1, NULL, func, NULL);
  sleep(1);
  pthread_t t2;
  pthread_create(&t2, NULL, func02, NULL);
  sleep(1);
  pthread_kill(t1, SIGTERM);
  pthread_kill(t2, SIGTERM);
  sleep(2);
  return 0;
}

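This program creates two threads and installs a handler for SIGTERM. The function run by thread t1 modifies its own set of blocked signals to include SIGTERM, so when t1 receives SIGTERM the handler is not called right away; the signal is handled only after it is unblocked. Thread t2 does not block SIGTERM, so t2 runs the handler. The program prints: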

func : pthread tid = 139887896323840
func02 : pthread tid = 139887887931136
signo = 15 tid = 10652 pthread tid = 139887887931136

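From the output (judging by the pthread tid) we can see that thread t2 did call the signal handler, while thread t1 did not.

As mentioned above, all threads in a process share one set of signal handlers. If one thread installs a new handler for a signal, the other threads are affected, as in the following program: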

#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>

void sig(int signo) {
  // Demonstration only: snprintf is not on the POSIX list of
  // async-signal-safe functions.
  char s[1024];
  snprintf(s, sizeof(s), "signo = %d tid = %d pthread tid = %lu\n", signo, (int)gettid(), (unsigned long)pthread_self());
  write(STDOUT_FILENO, s, strlen(s));
}

void sig2(int signo) {
  char* s = "thread-defined\n";
  write(STDOUT_FILENO, s, strlen(s));
}

void* func(void* arg) {
  // Replaces the process-wide SIGSEGV handler installed by main.
  signal(SIGSEGV, sig2);
  printf("pthread tid = %lu\n", (unsigned long)pthread_self());
  for(;;);
  return NULL;
}

void* func02(void* arg) {
  printf("pthread tid = %lu\n", (unsigned long)pthread_self());
  for(;;);
  return NULL;
}

int main() {
  signal(SIGSEGV, sig);
  pthread_t t;
  pthread_create(&t, NULL, func, NULL);
  sleep(1);
  pthread_t t2;
  pthread_create(&t2, NULL, func02, NULL);
  sleep(1);
  pthread_kill(t2, SIGSEGV);
  sleep(2);
  return 0;
}

The program prints:

pthread tid = 140581246330624
pthread tid = 140581237937920
thread-defined

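The output shows that thread t2 ran the handler sig2, which was installed by thread t1 in func. Thread t1's change took effect for t2, which confirms that a thread that changes a signal handler affects the other threads.

Summary

This article introduced some basic properties of threads and used examples to verify them, to help us understand threads from the ground up. Threads involve far more than can be covered here, and this article only demonstrates some of it; later articles will go deeper into these mechanisms, such as thread scheduling, thread cancellation, and synchronization between threads.

More content is collected in the project: //github.com/Chang-LeHung/CSCore

Follow the public account 一無是處的研究僧 to learn more about computing (Java, Python, computer systems fundamentals, algorithms and data structures).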