從零寫一個編譯器(十三):程式碼生成之遍歷AST

  • 2019 年 10 月 3 日
  • 筆記

項目的完整程式碼在 C2j-Compiler

前言

在上一篇完成對JVM指令的生成,下面就可以真正進入程式碼生成部分了。通常現代編譯器都是先把生成IR,再經過程式碼優化等等,最後才編譯成目標平台程式碼。但是時間水平有限,我們沒有IR也沒有程式碼優化,就直接利用AST生成Java位元組碼

入口

進行程式碼生成的入口在CodeGen,和之前解釋器一樣:先獲取main函數的頭節點,從這個節點開始,先進入函數定義,再進入程式碼塊

函數定義節點

在進入函數定義節點的時候,就要生成一個函數定義對應的Java位元組碼,即一個靜態方法(因為我們對整個C語言文件生成為一個類,main方法為public static main,其它的則是對應的靜態方法,結構體則是另外的類)

  • 對於函數定義先從節點中拿到對應的函數命和參數
  • emitArgs是用來處理參數的,根據參數生成相應的Java位元組碼
  • 如果這個函數是main的話已經是交由前面處理了,邏輯差不多(具體在start.Start中)
case SyntaxProductionInit.NewName_LP_RP_TO_FunctDecl:      root.reverseChildren();      AstNode n = root.getChildren().get(0);      String name = (String) n.getAttribute(NodeKey.TEXT);      symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);      generator.setCurrentFuncName(name);      if (name != null && !name.equals("main")) {          String declaration = name + emitArgs(symbol);          generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);          generator.setNameAndDeclaration(name, declaration);      }      copyChild(root, root.getChildren().get(0));      break;    case SyntaxProductionInit.NewName_LP_VarList_RP_TO_FunctDecl:      n = root.getChildren().get(0);      name = (String) n.getAttribute(NodeKey.TEXT);      symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);      generator.setCurrentFuncName(name);      if (name != null && !name.equals("main")) {          String declaration = name + emitArgs(symbol);          generator.emitDirective(Directive.METHOD_PUBBLIC_STATIC, declaration);          generator.setNameAndDeclaration(name, declaration);      }        Symbol args = symbol.getArgList();        if (args == null || argsList == null || argsList.isEmpty()) {          System.err.println("generate function with arg list but arg list is null");          System.exit(1);      }      break;

創建結構體和數組

數組

創建結構體和數組的節點在DefGenerate里,可以看到在這裡只處理了數組和普通變數,有關結構體的處理是在對結構體第一次使用的時候。順便提一下程式碼生成對於賦初值操作是沒有進行處理的。

  • 如果是個數組,酒直接調用ProgramGenerator直接生成創建數組的指令
  • 如果是個普通變數,就直接找到它並且賦值為0(這裡變數在隊列里的位置是根據符號表來計算的,具體可以看上一篇的getLocalVariableIndex方法)
public class DefGenerate extends BaseGenerate {      @Override      public Object generate(AstNode root) {          int production = (int) root.getAttribute(NodeKey.PRODUCTION);          ProgramGenerator generator = ProgramGenerator.getInstance();          Symbol symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);            switch (production) {              case SyntaxProductionInit.Specifiers_DeclList_Semi_TO_Def:                  Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);                  if (declarator != null) {                      if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {                          generator.createArray(symbol);                      }                  } else {                      int i = generator.getLocalVariableIndex(symbol);                      generator.emit(Instruction.SIPUSH, "" + 0);                      generator.emit(Instruction.ISTORE, "" + i);                  }                    break;                default:                  break;          }            return root;      }  }

結構體

處理結構體定義的程式碼在UnaryNodeGenerate,也就是只有在使用到結構體定義時才會進行定義

  • 先拿到當前UNARY的符號,如果instanceof ArrayValueSetter就說明是一個結構體數組,就進入getStructSymbolFromStructArray方法創建一個結構體數組,並返回當前下標的結構體對象
  • 設置當前結構體的作用域範圍
  • 對結構體作為類進行定義
  • 然後對讀取結構體的域
  • 其實可以忽略指針部分,因為程式碼生成並沒有對指針進行模擬
case SyntaxProductionInit.Unary_StructOP_Name_TO_Unary:      child = root.getChildren().get(0);      String fieldName = (String) root.getAttribute(NodeKey.TEXT);      Object object = child.getAttribute(NodeKey.SYMBOL);      boolean isStructArray = false;        if (object instanceof ArrayValueSetter) {          symbol = getStructSymbolFromStructArray(object);          symbol.addValueSetter(object);          isStructArray = true;      } else {          symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);      }        if (isStructArray) {          ArrayValueSetter vs = (ArrayValueSetter) object;          Symbol structArray = vs.getSymbol();          structArray.addScope(ProgramGenerator.getInstance().getCurrentFuncName());      } else {          symbol.addScope(ProgramGenerator.getInstance().getCurrentFuncName());      }        ProgramGenerator.getInstance().putStructToClassDeclaration(symbol);        if (isSymbolStructPointer(symbol)) {          copyBetweenStructAndMem(symbol, false);      }        Symbol args = symbol.getArgList();      while (args != null) {          if (args.getName().equals(fieldName)) {              args.setStructParent(symbol);              break;          }            args = args.getNextSymbol();      }        if (args == null) {          System.err.println("access a filed not in struct object!");          System.exit(1);      }        if (args.getValue() != null) {          ProgramGenerator.getInstance().readValueFromStructMember(symbol, args);      }        root.setAttribute(NodeKey.SYMBOL, args);      root.setAttribute(NodeKey.VALUE, args.getValue());        if (isSymbolStructPointer(symbol)) {          checkValidPointer(symbol);          structObjSymbol = symbol;          monitorSymbol = args;            GenerateBrocasterImpl.getInstance().registerReceiverForAfterExe(this);      } else {          structObjSymbol = null;      }      break;

一元操作節點

這個節點和在解釋器的有很多相同,除了有對結構體的操作,其它的也是有非常重要的作用

  • 像數字、字元串或者是變數和之前的操作都是把資訊傳遞到父節點,交由父節點處理
case SyntaxProductionInit.Number_TO_Unary:      text = (String) root.getAttribute(NodeKey.TEXT);      boolean isFloat = text.indexOf('.') != -1;      if (isFloat) {          value = Float.valueOf(text);          root.setAttribute(NodeKey.VALUE, value);      } else {          value = Integer.valueOf(text);          root.setAttribute(NodeKey.VALUE, value);      }      break;    case SyntaxProductionInit.Name_TO_Unary:      symbol = (Symbol) root.getAttribute(NodeKey.SYMBOL);      if (symbol != null) {          root.setAttribute(NodeKey.VALUE, symbol.getValue());          root.setAttribute(NodeKey.TEXT, symbol.getName());      }      break;    case SyntaxProductionInit.String_TO_Unary:      text = (String) root.getAttribute(NodeKey.TEXT);      root.setAttribute(NodeKey.VALUE, text);      break;    case SyntaxProductionInit.Unary_LB_Expr_RB_TO_Unary:      child = root.getChildren().get(0);      symbol = (Symbol) child.getAttribute(NodeKey.SYMBOL);        child = root.getChildren().get(1);      int index = 0;      if (child.getAttribute(NodeKey.VALUE) != null) {          index = (Integer) child.getAttribute(NodeKey.VALUE);      }      Object idxObj = child.getAttribute(NodeKey.SYMBOL);        try {          Declarator declarator = symbol.getDeclarator(Declarator.ARRAY);          if (declarator != null) {              Object val = declarator.getElement((int) index);              root.setAttribute(NodeKey.VALUE, val);              ArrayValueSetter setter;              if (idxObj == null) {                  setter = new ArrayValueSetter(symbol, index);              } else {                  setter = new ArrayValueSetter(symbol, idxObj);              }                root.setAttribute(NodeKey.SYMBOL, setter);              root.setAttribute(NodeKey.TEXT, symbol.getName());            }          Declarator pointer = symbol.getDeclarator(Declarator.POINTER);          if (pointer != null) {              setPointerValue(root, symbol, index);              PointerValueSetter pv = new PointerValueSetter(symbol, index);              root.setAttribute(NodeKey.SYMBOL, pv);              root.setAttribute(NodeKey.TEXT, symbol.getName());          }      } catch (Exception e) {          e.printStackTrace();          System.exit(1);      }      break;

賦值操作

  • 如果當前是一個數組,先拿到它的符號和下標
  • 如果不是結構體數組,那麼拿到下標直接用readArrayElement生成讀取數組元素的指令
  • 如果是一個符號則用getLocalVariableIndex讀取這個符號的值
  • 如果是一個常數,則直接生成IPUSH指令
  • 最後進行賦值操作,如果不是對結構體的域進行賦值就直接用getLocalVariableIndex拿到隊列位置然後生成ISTORE
  • 如果是對結構體數組的元素的域的賦值,就調用assignValueToStructMemberFromArray生成程式碼,如果只是結構體就直接調用assignValueToStructMember生成程式碼
ProgramGenerator generator = ProgramGenerator.getInstance();    if (BaseGenerate.resultOnStack) {      this.value = obj;      BaseGenerate.resultOnStack = false;  } else if (obj instanceof ArrayValueSetter) {      ArrayValueSetter setter = (ArrayValueSetter) obj;      Symbol symbol = setter.getSymbol();      Object index = setter.getIndex();      if (symbol.getSpecifierByType(Specifier.STRUCTURE) == null) {          if (index instanceof Symbol) {              ProgramGenerator.getInstance().readArrayElement(symbol, index);              if (((Symbol) index).getValue() != null) {                  int i = (int) ((Symbol) index).getValue();                  try {                      this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);                  } catch (Exception e) {                      e.printStackTrace();                  }              }          } else {              int i = (int) index;              try {                  this.value = symbol.getDeclarator(Declarator.ARRAY).getElement(i);              } catch (Exception e) {                  e.printStackTrace();              }                ProgramGenerator.getInstance().readArrayElement(symbol, index);          }      }  } else if (obj instanceof Symbol) {      Symbol symbol = (Symbol) obj;      this.value = symbol.value;      int i = generator.getLocalVariableIndex(symbol);      generator.emit(Instruction.ILOAD, "" + i);  } else if (obj instanceof Integer) {      Integer val = (Integer) obj;      generator.emit(Instruction.SIPUSH, "" + val);      this.value = obj;  }    if (!this.isStructMember()) {      int idx = generator.getLocalVariableIndex(this);      if (!generator.isPassingArguments()) {          generator.emit(Instruction.ISTORE, "" + idx);      }  } else {      if (this.getStructSymbol().getValueSetter() != null) {          generator.assignValueToStructMemberFromArray(this.getStructSymbol().getValueSetter(), this, this.value);      } else {          generator.assignValueToStructMember(this.getStructSymbol(), this, this.value);      }  }

最後

完成這部分後,對下面的程式碼

void quicksort(int A[10], int p, int r) {      int x;      int i;      i = p - 1;      int j;      int t;      int v;      v = r - 1;      if (p < r) {          x = A[r];          for (j = p; j <= v; j++) {              if (A[j] <= x) {                  i++;                  t = A[i];                  A[i] = A[j];                  A[j] = t;              }          }          v = i + 1;          t = A[v];          A[v] = A[r];          A[r] = t;          t = v - 1;          quicksort(A, p, t);          t = v + 1;          quicksort(A, t, r);      }  }    void main () {      int a[10];      int i;      int t;      printf("before quick sort:");      for(i = 0; i < 10; i++) {          t = (10 - i);          a[i] = t;          printf("value of a[%d] is %d", i, a[i]);      }      quicksort(a, 0, 9);      printf("after quick sort:");      for (i = 0; i < 10; i++) {          printf("value of a[%d] is %d", i, a[i]);      }  }

則會生成下面的Java位元組碼

.class public C2Bytecode  .super java/lang/Object    .method public static main([Ljava/lang/String;)V      sipush  10      newarray    int      astore  0      sipush  0      istore  1      sipush  0      istore  2      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "before quick sort:"      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "  "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      sipush  0      istore  1    loop0:      iload   1      sipush  10  if_icmpge branch0      sipush  10      iload   1      isub      istore  2      aload   0      iload   1      iload   2      iastore      aload   0      iload   1      iaload      istore  3      iload   1      istore  4      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "value of a["      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      iload   4      invokevirtual   java/io/PrintStream/print(I)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "] is "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      iload   3      invokevirtual   java/io/PrintStream/print(I)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "  "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      iload   1      sipush  1      iadd      istore  1  goto loop0  branch0:      aload   0      sipush  0      sipush  9      invokestatic    C2Bytecode/quicksort([III)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "after quick sort:"      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "  "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      sipush  0      istore  1    loop2:      iload   1      sipush  10  if_icmpge branch4      aload   0      iload   1      iaload      istore  3      iload   1      istore  4      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "value of a["      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      iload   4      invokevirtual   java/io/PrintStream/print(I)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "] is "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      iload   3      invokevirtual   java/io/PrintStream/print(I)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "  "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      iload   1      sipush  1      iadd      istore  1  goto loop2  branch4:      return  .end method  .method public static quicksort([III)V      sipush  2      newarray    int      astore  6      sipush  0      istore  5      sipush  1      istore  5      aload   6      iload   5      sipush  1      iastore      aload   6      sipush  1      iaload      istore  10      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "before quick sort: "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      iload   10      invokevirtual   java/io/PrintStream/print(I)V      getstatic   java/lang/System/out Ljava/io/PrintStream;      ldc "  "      invokevirtual   java/io/PrintStream/print(Ljava/lang/String;)V      sipush  0      istore  9      sipush  0      istore  3      iload   1      sipush  1      isub      istore  3      sipush  0      istore  4      sipush  0      istore  7      sipush  0      istore  8      iload   2      sipush  1      isub      istore  8      iload   1      iload   2  if_icmpge branch1        aload   0      iload   2      iaload      istore  9      iload   1      istore  4    loop1:        iload   4      iload   8  if_icmpgt ibranch1        aload   0      iload   4      iaload      iload   9  if_icmpgt ibranch2        iload   3      sipush  1      iadd      istore  3      aload   0      iload   3      iaload      istore  7      aload   0      iload   3      aload   0      iload   4      iaload      iastore      aload   0      iload   4      iload   7      iastore  ibranch2:        iload   4      sipush  1      iadd      istore  4  goto loop1    ibranch1:        iload   3      sipush  1      iadd      istore  8      aload   0      iload   8      iaload      istore  7      aload   0      iload   8      aload   0      iload   2      iaload      iastore      aload   0      iload   2      iload   7      iastore      iload   8      sipush  1      isub      istore  7      aload   0      iload   1      iload   7      invokestatic    C2Bytecode/quicksort([III)V      iload   8      sipush  1      iadd      istore  7      aload   0      iload   7      iload   2      invokestatic    C2Bytecode/quicksort([III)V  branch1:        return  .end method    .end class

小結

這篇的程式碼生成和之前解釋器的思路很相似,都是根據AST和對應的產生式來執行或者生成程式碼。

其實主要的思路是很清晰的,只是其中有太多細節容易讓人太過糾結。這個系列算作是我自己的學習筆記,到這也有十三篇了,下一篇可能寫寫總結就正式結束了。

歡迎Star!