Debugging a Stuck Impala Query: A Case Study

  • November 6, 2019
  • Notes

About the author: Quanlong Huang is a software engineer at Cloudera and an Apache Impala PMC member & committer. He graduated from the database lab of the networking institute in Peking University's Computer Science Department, and previously worked on Hulu's big data infrastructure team, maintaining and extending big data systems with a focus on Impala and HBase. He now works at Cloudera, focusing on Impala development.


I recently ran into a stuck query while developing. The case feels fairly representative, so I'm writing down the troubleshooting process. The same approach can be used to track down the source of a stuck query in production.

Note: the Impala version used in this post is 3.4.0-SNAPSHOT. Impala is written in C++ and Java, corresponding to the BE (Backend) and FE (Frontend) parts respectively.

Case Description

When creating a new partition with an Alter Table statement, the query never returned. On the Coordinator web UI, the query stayed in the CREATED state for more than ten minutes:

The query was run with sync_ddl=true. On a busy cluster it could genuinely take ten-plus minutes to finish, but that should not happen in my test environment.
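For reference, the statement was issued roughly like this (a sketch of an impala-shell session; SYNC_DDL is a standard Impala query option, and the host/port shown is just the default):

$ impala-shell -i localhost:21000
[localhost:21000] default> SET SYNC_DDL=true;
[localhost:21000] default> ALTER TABLE insert_inherit_permission ADD PARTITION (p1=1, p2=1, p3=1);
... (hangs: the query stays in the CREATED state on the Coordinator web UI)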

An Alter Table Add Partition statement is a DDL. With sync_ddl=true, it executes as follows:

The steps in the figure are:

(1) The client sends the query request.

(2) The Coordinator sends an ExecDdl request to Catalogd.

(3) Catalogd sends an AddPartition request to the HMS.

(4) The HMS creates the partition and creates the partition directory in HDFS.

(5) Catalogd loads the file metadata of the new partition from the NameNode.

(6) After executing the DDL, Catalogd sends the metadata update to the Statestore, which broadcasts the update to all Coordinators.

(7) Catalogd returns the topic version containing the update to the Coordinator.

(8) The Coordinator returns to the client once all Coordinators have received the update for that version.

The hang could be in any of these steps, so we have to track it down from the beginning.

Tracking Down the Hang

I. Find the stuck thread in the Coordinator

The query is submitted to the Coordinator, so we start there. First, search the logs for entries related to this query. Every query logs a line like "Analyzing …" when it is parsed, which lets us locate it:

$ grep -i analyzing /tmp/impalad_node2.INFO | grep 'ALTER TABLE insert_inherit_permission'
I1101 09:52:17.002415 20747 Frontend.java:1256] a14a8bbbaf75bf19:63c6f76f00000000] Analyzing query: ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1) db: default

My query was "ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1)", so adding a few of its keywords to the grep narrows things down quickly. In my test environment the log lives at /tmp/impalad_node2.INFO; on a CDH cluster it is usually /var/log/impalad/impalad.INFO.

The third column of the log line is the thread id, so the FE thread handling this query is 20747. We can then look at all the log lines from this thread; the last one stops at "Analysis and authorization finished":
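A simple filter pulls out every line logged by that thread (a sketch assuming the same log path as above; the thread id is the third space-delimited field of each line):

$ grep ' 20747 ' /tmp/impalad_node2.INFO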

Analysis and authorization are already done, so where is the query stuck?

First check whether jstack shows anything relevant; the nid in jstack output is the thread id in hexadecimal. In my case thread 20747 had no stack there, so we have to look at the BE threads instead. The BE is written in C++, so in theory we could attach gdb to impalad and print the thread stacks, but that suspends the whole impalad process, which may cause heartbeat timeouts and cancelled queries. To avoid disturbing the live environment, we can trigger a core dump with gcore instead.
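As an aside, the jstack check needs the thread id in hex first; a quick sketch using the pids from this example:

$ printf 'nid=0x%x\n' 20747               # -> nid=0x510b
$ jstack 19731 | grep -A 5 'nid=0x510b'   # no match here: not a JVM thread, so check the BE side

With jstack ruled out, here is the gcore session: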

$ gcore
usage:  gcore [-o filename] pid
$ sudo gcore -o impalad_core 19731
......
[New LWP 20697]
[New LWP 20747]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Installing openjdk unwinder
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in <module>
    class Types(object):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types
    nmethodp_t = gdb.lookup_type('nmethod').pointer()
gdb.error: No type named nmethod.
pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185    ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
warning: target file /proc/19731/cmdline contained unexpected null characters
Saved corefile impalad_core.19731

While dumping, the screen fills with lines like "[New LWP xxxxx]", and there may be warnings, but they can be ignored as long as the dump succeeds in the end. The resulting core file is usually large because it contains the process memory; in my development environment it was 12GB.

$ ll -h impalad_core.19731
-rw-r--r-- 1 root root 12G 11月  1 10:08 impalad_core.19731

Production machines usually have even more memory, so a core dump becomes impractical there; we'll introduce an alternative later. With the core file in hand, we can analyze it with gdb. This can itself consume a lot of memory, so in production it is best to do the analysis on a different machine.

$ gdb -c impalad_core.19731 /home/quanlong/workspace/Impala/be/build/latest/service/impalad
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/quanlong/workspace/Impala/be/build/latest/service/impalad...done.
[New LWP 19731]
[New LWP 19762]
...
[New LWP 20697]
[New LWP 20747]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/quanlong/workspace/Impala/be/build/latest/service/impalad'.
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185    ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
[Current thread is 1 (Thread 0x7f2f3fde48c0 (LWP 19731))]
Installing openjdk unwinder
Traceback (most recent call last):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 52, in <module>
    class Types(object):
  File "/usr/share/gdb/auto-load/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so-gdb.py", line 66, in Types
    nmethodp_t = gdb.lookup_type('nmethod').pointer()
gdb.error: No type named nmethod.
(gdb)

Now find thread 20747 and inspect its stack. gdb's thread numbers are not thread ids, and the thread command only switches to a given gdb thread number, so we first use "thread find 20747" to get the thread number, then switch to that thread and print its stack with bt:

(gdb) thread find 20747
Thread 285 has target id 'Thread 0x7f2e8f3f2700 (LWP 20747)'
(gdb) thread 285
[Switching to thread 285 (Thread 0x7f2e8f3f2700 (LWP 20747))]
#0  0x00007f2f3f7da87f in __libc_recv (fd=403, buf=0x10e9e000, n=131072, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
28    ../sysdeps/unix/sysv/linux/x86_64/recv.c: No such file or directory.
(gdb) bt
#0  0x00007f2f3f7da87f in __libc_recv (fd=403, buf=0x10e9e000, n=131072, flags=0) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:28
#1  0x0000000004ccd3af in apache::thrift::transport::TSocket::read(unsigned char*, unsigned int) ()
......
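If you don't know in advance which thread to look at, gdb's batch mode can dump every stack in one pass; a sketch with the same core file and binary:

$ gdb -batch -ex 'thread apply all bt' \
      -c impalad_core.19731 \
      /home/quanlong/workspace/Impala/be/build/latest/service/impalad > all_stacks.txt
$ grep -n 'LWP 20747' all_stacks.txt      # find our thread among the dumped stacks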

The full output is fairly long, as shown in the figure:

The key part is the underlined section, namely this piece:

#15 0x0000000002513982 in impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpc
#16 0x000000000250fb8c in impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpcWithRetry
#17 0x000000000250cced in impala::CatalogOpExecutor::Exec
#18 0x00000000022b9012 in impala::ClientRequestState::ExecDdlRequest
#19 0x00000000022b4253 in impala::ClientRequestState::Exec
#20 0x0000000002241cea in impala::ImpalaServer::ExecuteInternal
#21 0x000000000224131d in impala::ImpalaServer::Execute

So the Coordinator has sent an RPC to Catalogd and is waiting for the response. The stack includes source line numbers, so we can also look at the corresponding code.

Now it is confirmed that the Coordinator is stuck on the RPC; next we need to find out what Catalogd is doing.

II. Find the stuck thread in Catalogd

Nothing obviously wrong shows up in the Catalogd logs, so again we have to look at thread stacks. My catalogd is fairly idle, and only one thread in the jstack output is related to ExecDdl:

"Thread-10" #43 prio=5 os_prio=0 tid=0x0000000012324800 nid=0x510c in Object.wait() [0x00007f6808952000]     java.lang.Thread.State: TIMED_WAITING (on object monitor)          at java.lang.Object.wait(Native Method)          at org.apache.impala.catalog.CatalogServiceCatalog.waitForSyncDdlVersion(CatalogServiceCatalog.java:2635)          - locked <0x0000000722aff010> (a org.apache.impala.catalog.TopicUpdateLog)          at org.apache.impala.service.CatalogOpExecutor.execDdlRequest(CatalogOpExecutor.java:414)          at org.apache.impala.service.JniCatalog.execDdl(JniCatalog.java:167)

So the source of the hang is FE's CatalogServiceCatalog.waitForSyncDdlVersion(). The thread id is 0x510c, i.e. 20748 in decimal. Looking back at the log, the last line this thread printed is "Loaded file and block metadata for …"
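The hex-to-decimal conversion and the log lookup are one-liners; a sketch (the catalogd log path here is a guess based on my environment):

$ printf '%d\n' 0x510c                            # -> 20748
$ grep ' 20748 ' /tmp/catalogd.INFO | tail -n 1   # last line logged by that thread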

So the metadata loading for the table had already finished. The thread enters waitForSyncDdlVersion because the query uses sync_ddl=true: catalogd has to wait until the update has been sent out before it can learn the corresponding statestore topic version, which it must return to the coordinator. How metadata broadcasting and sync_ddl are implemented is a longer story that we'll cover in a follow-up post.

That concludes my diagnosis; from here we can debug against the code. In production, however, catalogd is usually busy, and you may not be able to pin down the thread in jstack directly. One idea is to determine the port of the coordinator-to-catalogd connection from the coordinator thread and use it to find the corresponding catalogd thread, but fds are shared by all threads, so a port cannot identify a thread. There is no great solution for now; fortunately, most of catalogd's logic is implemented in Java, so you can take jstack several times and compare which threads persist across snapshots, as sketched below.
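A minimal sketch of that compare-snapshots idea (the pid and the interval are placeholders):

$ jstack 23456 > stacks_1.txt
$ sleep 30
$ jstack 23456 > stacks_2.txt
$ diff stacks_1.txt stacks_2.txt    # threads absent from the diff kept the same stack and are the suspects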

Using Minidump Instead of Core Dump

The memory of a production impalad is usually large, so a core dump would also be huge and unwieldy to analyze. What we actually care about is the stack of each thread; we don't need all of the memory. Minidumps, provided by Google Breakpad, offer a better option: a minidump records only the crash reason, at most 8KB of stack memory per thread, the runtime environment (OS, CPU, etc.), and process information (loaded modules, etc.). The resulting minidump file is tiny: in this case the core dump was 12GB, while the minidump was only 3MB!

Requirements

  • Impala 2.7 or later: starting with Impala 2.7 (CDH 5.9), minidumps can be triggered externally by simply sending the process a SIGUSR1 signal.
  • Build the matching version of the Impala source: a Debug build is required to keep all symbol information, and the breakpad tools from the toolchain are needed to process the minidump. The build environment must also match the production environment, e.g. if production runs Ubuntu 16.04, build on an Ubuntu 16.04 machine as well. For build instructions, see: https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala

Usage

1. Generate the Minidump

Find the PID of the impalad process (19731 here), then trigger the minidump with "kill -s SIGUSR1 19731". On success, the path of the generated file appears in the impalad log:

Wrote minidump to /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp
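Triggering the dump and fishing the path out of the log can be done in two commands (a sketch assuming the same log location as earlier):

$ kill -s SIGUSR1 19731
$ grep 'Wrote minidump' /tmp/impalad_node2.INFO | tail -n 1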

Copy this minidump file to the machine where Impala was built.

2. Generate Symbol Information

A minidump file does not contain any symbol information, so we have to generate Breakpad-format symbols ourselves to match it; only then can we see function names (rather than just function addresses) and so on.

In the Impala build directory (referred to as IMPALA_HOME below), there is a script bin/dump_breakpad_symbols.py that generates Breakpad-format symbols. Assuming we want to put the symbols in /tmp/syms, the commands are:

$ cd $IMPALA_HOME
$ source bin/impala-config.sh
$ bin/dump_breakpad_symbols.py -f be/build/latest/service/libfesupport.so be/build/latest/service/impalad be/build/latest/util/libloggingsupport.so -d /tmp/syms
INFO:root:Processing binary file: be/build/latest/service/libfesupport.so
INFO:root:Processing binary file: be/build/latest/service/impalad
INFO:root:Processing binary file: be/build/latest/util/libloggingsupport.so

The generated symbols total 421MB, somewhat larger than the minidump itself, because they contain all the file names, function names, and so on:

$ du -sh /tmp/syms
421M    /tmp/syms
$ head /tmp/syms/impalad/0894679AAB4BBE9079BA0C593F2114890/impalad.sym
MODULE Linux x86_64 0894679AAB4BBE9079BA0C593F2114890 impalad
INFO CODE_ID 9A6794084BAB90BE79BA0C593F211489
FILE 0 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c
FILE 1 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c
FILE 2 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c
FILE 3 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c
FILE 4 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/posix/mutexes.c
FILE 5 /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/os/posix/thread_local_storage.c
FILE 6 /home/quanlong/workspace/Impala/be/generated-sources/gen-cpp/BackendGflags_types.cpp
FILE 7 /home/quanlong/workspace/Impala/be/generated-sources/gen-cpp/BackendGflags_types.h

3. Resolve the Minidump

Use the breakpad tools from the Impala toolchain. Assuming the corresponding version is IMPALA_BREAKPAD_VERSION, the command is:

$IMPALA_TOOLCHAIN/breakpad-$IMPALA_BREAKPAD_VERSION/bin/minidump_stackwalk $MINIDUMP_FILE $BREAKPAD_SYMBOLS > $RESULT_FILE

In this case:

$ $IMPALA_HOME/toolchain/breakpad-97a98836768f8f0154f8f86e5e14c2bb7e74132e-p2/bin/minidump_stackwalk /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp /tmp/syms > /tmp/resolved.txt
......
2019-11-03 08:33:22: basic_code_modules.cc:110: INFO: No module at 0x0
2019-11-03 08:33:22: basic_code_modules.cc:110: INFO: No module at 0x0
2019-11-03 08:33:22: minidump_processor.cc:326: INFO: Processed /tmp/minidumps/impalad/2d32f167-9189-4fdc-9fc4ef9d-624b3a1d.dmp
2019-11-03 08:33:22: minidump.cc:4980: INFO: Minidump closing minidump

There will be screenfuls of output along the way; all that matters is that the resolution succeeds at the end. The generated file looks like this:

Operating system: Linux
                  0.0.0 Linux 4.15.0-66-generic #75~16.04.1-Ubuntu SMP Tue Oct 1 14:01:08 UTC 2019 x86_64
CPU: amd64
     family 6 model 158 stepping 10
     1 CPU

GPU: UNKNOWN

Crash reason:  DUMP_REQUESTED
Crash address: 0x4d78597
Process uptime: not available

Thread 0 (crashed)
 0  impalad!google_breakpad::ExceptionHandler::WriteMinidump() + 0x57
    rax = 0x00007ffeed518510   rdx = 0x0000000000000000
    rcx = 0x0000000004d7857f   rbx = 0x0000000000000000
    rsi = 0x0000000000000001   rdi = 0x00007ffeed5183a8
    rbp = 0x00007ffeed518ab0   rsp = 0x00007ffeed518310
     r8 = 0x0000000000000000    r9 = 0x0000000000000024
    r10 = 0x00007f2f3f7db340   r11 = 0x00007ffeed518980
    r12 = 0x00007ffeed518970   r13 = 0x0000000002428f99
    r14 = 0x0000000000000000   r15 = 0x00007ffeed5189d0
    rip = 0x0000000004d78597
    Found by: given as instruction pointer in context
 1  impalad!google_breakpad::ExceptionHandler::WriteMinidump(std::string const&, bool (*)(google_breakpad::MinidumpDescriptor const&, void*, bool), void*) + 0xf0
    rbx = 0x00007f2f3caf0140   rbp = 0x00007ffeed518ab0
    rsp = 0x00007ffeed518960   r12 = 0x00007ffeed518970
    r13 = 0x0000000002428f99   r14 = 0x0000000000000000
    r15 = 0x00007ffeed5189d0   rip = 0x0000000004d78e60
    Found by: call frame info
 2  impalad!impala::HandleSignal [minidump.cc : 103 + 0x1e]
    rbx = 0x0000000000000000   rbp = 0x00007ffeed518af0
    rsp = 0x00007ffeed518ac0   r12 = 0x00007ffeed519250
    r13 = 0x00000000111263d0   r14 = 0x0000000000006d62
    r15 = 0x0000000000000000   rip = 0x00000000024290ec
    Found by: call frame info
 3  libpthread-2.23.so + 0x11390
    rbx = 0x0000000000000000   rbp = 0x00007ffeed519240
    rsp = 0x00007ffeed518b00   r12 = 0x00007ffeed519250
    r13 = 0x00000000111263d0   r14 = 0x0000000000006d62
    r15 = 0x0000000000000000   rip = 0x00007f2f3f7db390
    Found by: call frame info
 4  impalad!boost::thread::join_noexcept() + 0x5c
    rbp = 0x00007ffeed519260   rsp = 0x00007ffeed519250
    rip = 0x0000000003c6a13c
    Found by: previous frame's frame pointer
...

First come the environment details (OS, CPU, etc.), then the stack of every thread, including the register values of each frame. The first thread is the breakpad thread that wrote the minidump, since the crash reason is DUMP_REQUESTED (the process is in fact still alive; nothing crashed). If a production process had actually crashed, the first thread would be the one that caused the crash.

Back to the analysis of the Coordinator hang: we had found thread id 20747 from the log:

I1101 09:52:17.002415 20747 Frontend.java:1256] a14a8bbbaf75bf19:63c6f76f00000000] Analyzing query: ALTER TABLE insert_inherit_permission ADD PARTITION(p1=1, p2=1, p3=1) db: default

Unfortunately, breakpad prints no LWP information the way gdb does, and its thread numbers differ from gdb's as well. From my observation, though, the thread id (i.e. the LWP id) usually appears among the register values of the first few calls in a thread's stack, which often suffices to find the thread. The id we care about is 20747, which is 510b in hex. Searching for 510b in the output file /tmp/resolved.txt, exactly one thread has 0x000000000000510b as the rbx value in the first few calls of its stack, and that thread's stack is indeed the ExecDdl stack:

Thread 283
...
13  impalad!impala::CatalogServiceClient::recv_ExecDdl(impala::TDdlExecResponse&) [CatalogService.cpp : 1908 + 0x1f]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ecf20
    rsp = 0x00007f2e8f3ece60   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000001c319a4
    Found by: call frame info
14  impalad!impala::CatalogServiceClientWrapper::ExecDdl(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*) [catalog-service-client-wrapper.h : 47 + 0x13]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ecf90
    rsp = 0x00007f2e8f3ecf30   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000002512b51
    Found by: call frame info
15  impalad!impala::Status impala::ClientConnection<impala::CatalogServiceClientWrapper>::DoRpc<void (impala::CatalogServiceClientWrapper::*)(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*), impala::TDdlExecRequest, impala::TDdlExecResponse>(void (impala::CatalogServiceClientWrapper::* const&)(impala::TDdlExecResponse&, impala::TDdlExecRequest const&, bool*), impala::TDdlExecRequest const&, impala::TDdlExecResponse*) [client-cache.h : 243 + 0x31]
    rbx = 0x0000000000000000   rbp = 0x00007f2e8f3ed070
    rsp = 0x00007f2e8f3ecfa0   r12 = 0x0000000000000000
    r13 = 0x00000000110d1ba8   r14 = 0x0000000002512a7c
    r15 = 0x0000000000000000   rip = 0x0000000002513982
    Found by: call frame info
...
31  impalad!boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long, (impala::PromiseMode)0>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long, (impala::PromiseMode)0>*> > >::operator()() [bind_template.hpp : 20 + 0x22]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f18a0
    rsp = 0x00007f2e8f3f1860   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001fe8aa9
    Found by: call frame info
32  impalad!boost::detail::function::void_function_obj_invoker0<boost::_bi::bind_t<void, boost::_mfi::mf2<void, impala::ThriftThread, boost::shared_ptr<apache::thrift::concurrency::Runnable>, impala::Promise<unsigned long, (impala::PromiseMode)0>*>, boost::_bi::list3<boost::_bi::value<impala::ThriftThread*>, boost::_bi::value<boost::shared_ptr<apache::thrift::concurrency::Runnable> >, boost::_bi::value<impala::Promise<unsigned long, (impala::PromiseMode)0>*> > >, void>::invoke(boost::detail::function::function_buffer&) [function_template.hpp : 153 + 0xc]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f18d0
    rsp = 0x00007f2e8f3f18b0   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001fe89bc
    Found by: call frame info
33  impalad!boost::function0<void>::operator()() const [function_template.hpp : 767 + 0x11]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f1910
    rsp = 0x00007f2e8f3f18e0   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x0000000001f04e58
    Found by: call frame info
34  impalad!impala::Thread::SuperviseThread(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0xf]
    rbx = 0x000000000000510b   rbp = 0x00007f2e8f3f1c70
    rsp = 0x00007f2e8f3f1920   r12 = 0x0000000000000000
    r13 = 0x00007f2e903f9c60   r14 = 0x000000000c7b95c8
    r15 = 0x000000000c2822b0   rip = 0x000000000247bf47
    Found by: call frame info
...
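The hex conversion and the search are again one-liners (paths as in this example):

$ printf '%x\n' 20747                             # -> 510b
$ grep -n '0x000000000000510b' /tmp/resolved.txt  # register values matching the LWP id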

The stacks also carry source line numbers. If you're interested, you can download the full resolved.txt (73KB compressed) and dig in: https://github.com/stiga-huang/BlogFiles/blob/master/Debugging%20on%20Stuck%20Query/resolved.txt.gz

From this we know the Coordinator was stuck waiting for the ExecDdl RPC to return; catalogd can then be analyzed in the same way.

Summary

When an Impala query is stuck and the logs reveal nothing abnormal, thread stacks can be analyzed with a core dump or minidump for the BE part, and with jstack for the FE part. The thread id in the logs corresponds to the LWP id in gdb and to the nid (in hex) in jstack; a minidump has no obvious counterpart, but you can try matching the thread id against register values.

(Note: BE refers to the Backend, mainly Impala's C++ part; FE refers to the Frontend, mainly Impala's Java part.)

Command Recap

  • gcore [-o filename] pid
  • gdb -c core_file program_binary
  • gdb commands
    • thread find thread_id (find the gdb thread number for a given thread id)
    • thread id (switch to the thread with that number)
    • bt (print the stack)
  • jstack pid
  • kill -s SIGUSR1 pid (trigger a minidump from an Impala process)
  • $IMPALA_HOME/bin/dump_breakpad_symbols.py -f binaries -d output_symbol_dir (generate Breakpad-format symbol files)
  • $IMPALA_TOOLCHAIN/breakpad-$IMPALA_BREAKPAD_VERSION/bin/minidump_stackwalk $MINIDUMP_FILE $BREAKPAD_SYMBOLS > $RESULT_FILE (resolve a minidump)

References

  • https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Core+Dumps+on+Another+System
  • https://cwiki.apache.org/confluence/display/IMPALA/Debugging+Impala+Minidumps
  • https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala