HPL Benchmark on CentOS 7: Single-Node and Multi-Node Runs

Prerequisites: install OpenBLAS and MPICH (see the posts on the blog home page).

  • Download the tarball from the official site into /opt

    cd /opt && wget https://www.netlib.org/benchmark/hpl/hpl-2.3.tar.gz
    


  • Extract it into /opt

    tar -xzf hpl-2.3.tar.gz
    
  • Copy Make.Linux_PII_CBLAS and rename it

    cd /opt/hpl-2.3 && cp setup/Make.Linux_PII_CBLAS Make.Linux
    
  • Edit Make.Linux

    vim Make.Linux
    

    Change the following settings:

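    # Keep comments on their own lines: make keeps the whitespace
    # before a trailing '#' as part of the variable's value.
    ARCH = Linux

    # HPL installation directory
    TOPdir = /opt/hpl-2.3

    # MPICH installation directory and MPI library
    MPdir = /opt/mpich
    MPlib = $(MPdir)/lib/libmpi.a

    # OpenBLAS installation directory and BLAS library
    LAdir = /opt/OpenBLAS
    LAlib = $(LAdir)/lib/libopenblas.a

    # Compiler and compiler flags
    CC = /opt/mpich/bin/mpicc
    CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -pthread

    # Linker
    LINKER = /opt/mpich/bin/mpif77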
    

    Adjust the paths above to match your own installation directories.
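Since a wrong path in Make.Linux only shows up as a build or link error later, it can help to confirm the referenced files exist first. The helper below is a minimal sketch; the function name check_hpl_paths is hypothetical and not part of HPL.

```shell
# check_hpl_paths PATH...: print every path that does not exist and
# return non-zero if at least one is missing (hypothetical helper).
check_hpl_paths() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

With the paths used in this post, it could be called as `check_hpl_paths /opt/mpich/lib/libmpi.a /opt/OpenBLAS/lib/libopenblas.a /opt/mpich/bin/mpicc /opt/mpich/bin/mpif77`; replace them with your own locations.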

  • Build HPL

    make arch=Linux
    

    If the build succeeds, HPL.dat and xhpl are generated under /opt/hpl-2.3/bin/Linux.

  • Run HPL

    cd /opt/hpl-2.3/bin/Linux
    
    1. Single-node test

      mpiexec -np 4 ./xhpl
      
    2. Multi-node test

      Create a node file listing the hostname or IP address of each node

      vim nodes
      

      Example: [screenshot of the nodes file]
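As a rough sketch of what such a node file might contain, the helper below writes one host per line. The function name write_nodes and the hostnames node1 and node2 are placeholders, not values from the original post. MPICH's Hydra launcher also accepts entries of the form host:N to set the number of processes on a host.

```shell
# write_nodes FILE HOST...: write one host entry per line to FILE
# (hypothetical helper; host names are supplied by the caller).
write_nodes() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Example with placeholder hostnames:
#   write_nodes ./nodes node1 node2
```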

      Edit HPL.dat

      HPLinpack benchmark input file
      Innovative Computing Laboratory, University of Tennessee
      HPL.out      output file name (if any)
      6            device out (6=stdout,7=stderr,file)
      1            # of problems sizes (N)
      1200         Ns
      1            # of NBs
      232          NBs
      0            PMAP process mapping (0=Row-,1=Column-major)
      1            # of process grids (P x Q)
      1            Ps
      4            Qs
      16.0         threshold
      1            # of panel fact
      0            PFACTs (0=left, 1=Crout, 2=Right)
      1            # of recursive stopping criterium
      2            NBMINs (>= 1)
      1            # of panels in recursion
      2            NDIVs
      1            # of recursive panel fact.
      0            RFACTs (0=left, 1=Crout, 2=Right)
      1            # of broadcast
      0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
      1            # of lookahead depth
      1            DEPTHs (>=0)
      2            SWAP (0=bin-exch,1=long,2=mix)
      64           swapping threshold
      0            L1 in (0=transposed,1=no-transposed) form
      0            U  in (0=transposed,1=no-transposed) form
      1            Equilibration (0=no,1=yes)
      8            memory alignment in double (> 0)
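HPL needs at least P x Q MPI processes for each process grid, so the value passed to mpiexec -np should be no smaller than the largest product in HPL.dat (1 x 4 = 4 in the file above). The sketch below prints P x Q for each grid. The function name grid_sizes is hypothetical, and it assumes the Ps and Qs lines carry their labels as in the stock file.

```shell
# grid_sizes FILE: print P*Q for each process grid in an HPL.dat file.
# Assumes the value lines are labelled "Ps" and "Qs" as in the stock file.
grid_sizes() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # The numbers before the label are the grid values.
                if ($i == "Ps") { np = i - 1; for (j = 1; j < i; j++) p[j] = $j }
                if ($i == "Qs") { nq = i - 1; for (j = 1; j < i; j++) q[j] = $j }
            }
        }
        END {
            n = (np < nq) ? np : nq
            for (j = 1; j <= n; j++) print p[j] * q[j]
        }
    ' "$1"
}
```

Running `grid_sizes HPL.dat` before launching makes it easy to see whether the chosen -np covers every grid.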
      

      Run HPL

      mpiexec -np 4 -machinefile ./nodes ./xhpl
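To compare runs, the result lines can be pulled out of a saved log. The sketch below assumes the usual HPL result layout, where each result line starts with a T/V code beginning with WR or WC, followed by N, NB, P, Q, time and Gflops. The function name hpl_gflops and the log file name are hypothetical.

```shell
# hpl_gflops FILE: print "N NB P Q Gflops" for each result line in a
# saved HPL log. Assumes result lines have seven fields and a first
# field starting with WR or WC.
hpl_gflops() {
    awk '$1 ~ /^W[RC]/ && NF == 7 { print $2, $3, $4, $5, $7 }' "$1"
}
```

Because device out is set to 6 (stdout) in the HPL.dat above, the log could be captured with something like `mpiexec -np 4 -machinefile ./nodes ./xhpl | tee hpl.log` and then summarised with `hpl_gflops hpl.log`.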
      


    3. HPL.dat settings explained

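      HPLinpack benchmark input file                            # header lines (free-form description)
      Innovative Computing Laboratory, University of Tennessee
      HPL.out      output file name (if any)                    # file name to use when output is written to a file
      6            device out (6=stdout,7=stderr,file)          # where output goes (stdout, stderr, or a file)
      2            # of problems sizes (N)                      # how many matrix sizes to run
      1960  2048   Ns                                           # the matrix sizes themselves
      2            # of NBs                                     # how many block sizes to try
      60 80        NBs                                          # the block sizes themselves
      0            PMAP process mapping (0=Row-,1=Column-major) # how processes are mapped onto the grid
      2            # of process grids (P x Q)                   # how many process grids to try
      2   4        Ps                                           # P of each grid
      2   1        Qs                                           # Q of each grid (here 2 x 2 and 4 x 1)
      16.0         threshold                                    # threshold for the residual check
      1            # of panel fact                              # how many panel factorization variants to use
      1            PFACTs (0=left, 1=Crout, 2=Right)            # which variant: 0 left, 1 Crout, 2 right
      1            # of recursive stopping criterium            # how many recursion stopping criteria
      4            NBMINs (>= 1)                                # the stopping value (must be at least 1)
      1            # of panels in recursion                     # how many subdivision settings to use in recursion
      2            NDIVs                                        # one NDIV value, 2: each recursion splits the panel in two
      1            # of recursive panel fact.                   # how many recursive factorization variants to use
      2            RFACTs (0=left, 1=Crout, 2=Right)            # which variant: here 2, right
      1            # of broadcast                               # how many broadcast algorithms to use
      3            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) # which one (1-ring, 1-ring modified, 2-ring, 2-ring modified, long, long modified)
      1            # of lookahead depth                         # how many lookahead depths to try
      1            DEPTHs (>=0)                                 # the lookahead depth (must be at least 0)
      2            SWAP (0=bin-exch,1=long,2=mix)               # swapping algorithm (binary exchange, long, or a mix of both)
      64           swapping threshold                           # threshold used by the mixed swapping algorithm
      0            L1 in (0=transposed,1=no-transposed) form    # whether L1 is stored in transposed form
      0            U  in (0=transposed,1=no-transposed) form    # whether U is stored in transposed form
      1            Equilibration (0=no,1=yes)                   # whether to apply equilibration
      8            memory alignment in double (> 0)             # alignment, in doubles, used for memory allocation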
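The Ns values largely determine memory use, since HPL stores an N x N matrix of 8-byte doubles. A commonly cited rule of thumb is to size N so the matrix fills about 80% of the total memory across all nodes, rounded to a multiple of NB. The sketch below applies that rule. The function name suggest_n and the 0.8 default are assumptions for illustration, not part of HPL.

```shell
# suggest_n MEM_GIB NB [FRACTION]: print a problem size N such that an
# N x N matrix of 8-byte doubles uses about FRACTION (default 0.8) of
# MEM_GIB GiB, rounded down to a multiple of the block size NB.
suggest_n() {
    awk -v mem="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
        bytes = mem * 1024 * 1024 * 1024 * frac   # memory budget in bytes
        n = int(sqrt(bytes / 8))                  # 8 bytes per matrix element
        print int(n / nb) * nb                    # round down to a multiple of NB
    }'
}
```

For example, `suggest_n 8 232` gives a starting point for a machine with 8 GiB of total memory and NB = 232. Treat the result as a first guess and tune from there.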
      
