Implementing an Android Function-Extraction Shell

January 15, 2022
Notes
Android

0x0 Preface

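I don't know where the term "function-extraction shell" originated, but as I understand it, it is a packer that replaces the code of the functions in a dex file with nop instructions and then fills the original bytecode back into the dex at runtime.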

Before function extraction:

After function extraction:

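I had wanted to write this kind of shell for a long time, and I recently finished one, which I named dpt. I am sharing the code here; feel free to play with it. Project address: //github.com/luoyesiqiu/dpt-shell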

0x1 Project structure

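The dpt code is split into two parts: proccessor and shell.

proccessor is the module that turns an ordinary apk into a packed apk. Its main functions are:

Unpack the apk
Extract the code items from the dex files in the apk and save them
Change the Application class name in AndroidManifest.xml
Generate the new apk

The flow is as follows:

The dex and so files produced by the shell module are integrated into the apk being packed. Its main functions are: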

Handle app startup
Replace dexElements
Hook the relevant functions
Call the target Application
Read the code item file
Fill the code items back in

The flow is as follows:

0x2 proccessor

The two most important pieces of logic in proccessor are the processing of AndroidManifest.xml and the extraction of code items.

(1) Processing AndroidManifest.xml

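Our processing of AndroidManifest.xml mainly consists of backing up the original Application class name and writing in the class name of the shell's proxy Application. The original name is backed up so that, once the shell has finished its own work, it can call the Application of the original apk. The proxy Application's class name is written so that our proxy Application starts as early as possible when the app launches, which lets us do preparation such as loading the dex ourselves and hooking some functions. As we know, AndroidManifest.xml inside a built apk is not stored as a plain xml file but in the binary axml format. Fortunately, many people have already written libraries for parsing and editing axml, so we can use one directly. The axml library used here is ManifestEditor.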

The code that extracts the fully qualified name of the original Application class from AndroidManifest.xml is shown below; simply call getApplicationName.


    public static String getValue(String file,String tag,String ns,String attrName){
        byte[] axmlData = IoUtils.readFile(file);
        AxmlParser axmlParser = new AxmlParser(axmlData);
        try {
            while (axmlParser.next() != AxmlParser.END_FILE) {
                if (axmlParser.getAttrCount() != 0 && !axmlParser.getName().equals(tag)) {
                    continue;
                }
                for (int i = 0; i < axmlParser.getAttrCount(); i++) {
                    if (axmlParser.getNamespacePrefix().equals(ns) && axmlParser.getAttrName(i).equals(attrName)) {
                        return (String) axmlParser.getAttrValue(i);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    public static String getApplicationName(String file) {
        return getValue(file,"application","android","name");
    }

The code that writes the Application class name is as follows:

    public static void writeApplicationName(String inManifestFile, String outManifestFile, String newApplicationName){
        ModificationProperty property = new ModificationProperty();
        property.addApplicationAttribute(new AttributeItem(NodeValue.Application.NAME,newApplicationName));

        FileProcesser.processManifestFile(inManifestFile, outManifestFile, property);

    }

(2) Extracting the CodeItem

CodeItem is the structure in a dex file that stores a function's bytecode and related data. The figure below shows roughly what a CodeItem looks like.

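Although we speak of extracting the CodeItem, what we actually extract is the insns array inside it, which holds the function's real bytecode. To extract insns we use the dx tool from the Android source tree, which makes it easy to read each part of a dex file.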
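To make the layout concrete, here is a minimal Java sketch of the fixed 16-byte header that sits in front of insns, following the field order given in the dex format (registers_size, ins_size, outs_size, tries_size, debug_info_off, insns_size). The class name CodeItemHeader and its methods are hypothetical and are not part of dpt; the byte array in main is a hand-made fragment, not a real dex file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Read-only view of the fixed 16-byte header that precedes insns in a dex code_item. */
final class CodeItemHeader {
    static final int HEADER_SIZE = 16; // four ushort fields + two uint fields

    final int registersSize;
    final int insSize;
    final int outsSize;
    final int triesSize;
    final long debugInfoOff;
    final long insnsSize; // counted in 16-bit code units, not bytes

    private CodeItemHeader(ByteBuffer buf) {
        registersSize = buf.getShort() & 0xFFFF;
        insSize = buf.getShort() & 0xFFFF;
        outsSize = buf.getShort() & 0xFFFF;
        triesSize = buf.getShort() & 0xFFFF;
        debugInfoOff = buf.getInt() & 0xFFFFFFFFL;
        insnsSize = buf.getInt() & 0xFFFFFFFFL;
    }

    /** Parses the header located at codeOff; dex data is little-endian. */
    static CodeItemHeader parse(byte[] dex, int codeOff) {
        ByteBuffer buf = ByteBuffer.wrap(dex, codeOff, HEADER_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN);
        return new CodeItemHeader(buf);
    }

    /** File offset of the first instruction of the code item at codeOff. */
    static int insnsOffset(int codeOff) {
        return codeOff + HEADER_SIZE;
    }

    /** Size of the insns array in bytes. */
    long insnsByteLength() {
        return insnsSize * 2;
    }

    public static void main(String[] args) {
        // 4 bytes of padding, then a hand-made code_item with two code units.
        byte[] dex = {0, 0, 0, 0, 3, 0, 1, 0, 2, 0, 0, 0,
                      0, 0, 0, 0, 2, 0, 0, 0, 0x12, 0, 0x0f, 0};
        CodeItemHeader h = parse(dex, 4);
        System.out.println("insns at offset " + insnsOffset(4)
                + ", " + h.insnsByteLength() + " bytes");
    }
}
```

The extraction code in this article relies on the same constant: it adds 16 to the code offset to reach insns, and multiplies the instruction count by 2 to get a byte length.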

The code below iterates over every ClassDef and over all of its functions, calling extractMethod on each one.

    public static List<Instruction> extractAllMethods(File dexFile, File outDexFile) {
        List<Instruction> instructionList = new ArrayList<>();
        Dex dex = null;
        RandomAccessFile randomAccessFile = null;
        byte[] dexData = IoUtils.readFile(dexFile.getAbsolutePath());
        IoUtils.writeFile(outDexFile.getAbsolutePath(),dexData);

        try {
            dex = new Dex(dexFile);
            randomAccessFile = new RandomAccessFile(outDexFile, "rw");
            Iterable<ClassDef> classDefs = dex.classDefs();
            for (ClassDef classDef : classDefs) {
                
                ......
                
                if(classDef.getClassDataOffset() == 0){
                    String log = String.format("class '%s' data offset is zero",classDef.toString());
                    logger.warn(log);
                    continue;
                }

                ClassData classData = dex.readClassData(classDef);
                ClassData.Method[] directMethods = classData.getDirectMethods();
                ClassData.Method[] virtualMethods = classData.getVirtualMethods();
                for (ClassData.Method method : directMethods) {
                    Instruction instruction = extractMethod(dex,randomAccessFile,classDef,method);
                    if(instruction != null) {
                        instructionList.add(instruction);
                    }
                }

                for (ClassData.Method method : virtualMethods) {
                    Instruction instruction = extractMethod(dex, randomAccessFile,classDef, method);
                    if(instruction != null) {
                        instructionList.add(instruction);
                    }
                }
            }
        }
        catch (Exception e){
            e.printStackTrace();
        }
        finally {
            IoUtils.close(randomAccessFile);
        }

        return instructionList;
    }

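If a function turns out to have no code (usually a native function), or its insns are too small to hold the return statement, it is skipped. This is the "extraction" step of the function-extraction shell.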

    private static Instruction extractMethod(Dex dex ,RandomAccessFile outRandomAccessFile,ClassDef classDef,ClassData.Method method)
            throws Exception{
        String returnTypeName = dex.typeNames().get(dex.protoIds().get(dex.methodIds().get(method.getMethodIndex()).getProtoIndex()).getReturnTypeIndex());
        String methodName = dex.strings().get(dex.methodIds().get(method.getMethodIndex()).getNameIndex());
        String className = dex.typeNames().get(classDef.getTypeIndex());
        //no code item (typically a native function)
        if(method.getCodeOffset() == 0){
            String log = String.format("method code offset is zero,name =  %s.%s , returnType = %s",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName));
            logger.warn(log);
            return null;
        }
        Instruction instruction = new Instruction();
        //16 = registers_size + ins_size + outs_size + tries_size + debug_info_off + insns_size
        int insnsOffset = method.getCodeOffset() + 16;
        Code code = dex.readCode(method);
        //defensive check: nothing to extract
        if(code.getInstructions().length == 0){
            String log = String.format("method has no code,name =  %s.%s , returnType = %s",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName));
            logger.warn(log);
            return null;
        }
        int insnsCapacity = code.getInstructions().length;
        //insns is too small to hold the return statement, skip
        byte[] returnByteCodes = getReturnByteCodes(returnTypeName);
        if(insnsCapacity * 2 < returnByteCodes.length){
            logger.warn("The capacity of insns is not enough to store the return statement. {}.{}() -> {} insnsCapacity = {}byte(s),returnByteCodes = {}byte(s)",
                    TypeUtils.getHumanizeTypeName(className),
                    methodName,
                    TypeUtils.getHumanizeTypeName(returnTypeName),
                    insnsCapacity * 2,
                    returnByteCodes.length);

            return null;
        }
        instruction.setOffsetOfDex(insnsOffset);
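        //MethodIndex here is the index into the method_ids section
        instruction.setMethodIndex(method.getMethodIndex());
        //note: this is the size of the byte array (code units * 2)
        instruction.setInstructionDataSize(insnsCapacity * 2);
        byte[] byteCode = new byte[insnsCapacity * 2];
        //back up each original code unit, then overwrite it with nop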
        for (int i = 0; i < insnsCapacity; i++) {
            outRandomAccessFile.seek(insnsOffset + (i * 2));
            byteCode[i * 2] = outRandomAccessFile.readByte();
            byteCode[i * 2 + 1] = outRandomAccessFile.readByte();
            outRandomAccessFile.seek(insnsOffset + (i * 2));
            outRandomAccessFile.writeShort(0);
        }
        instruction.setInstructionsData(byteCode);
        outRandomAccessFile.seek(insnsOffset);
        //write the return statement
        outRandomAccessFile.write(returnByteCodes);

        return instruction;
    }
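The getReturnByteCodes helper is not shown in this article. The sketch below is only a guess at what such a helper could look like, inferred from the check in the shell's LoadMethod hook, which treats 0x0012 (const/4 v0, 0), 0x0016 (const-wide/16 v0, 0) and 0x000e (return-void) as the first code unit of an extracted function. The class name ReturnStub, its method names, and the mapping from type descriptor to stub are all assumptions; the opcode values themselves come from the Dalvik instruction set.

```java
/** Hypothetical stand-in for getReturnByteCodes: picks a minimal return stub per return type. */
final class ReturnStub {
    private ReturnStub() {}

    /**
     * Returns stub bytecode for a return type given as a dex type descriptor
     * ("V", "I", "J", "Ljava/lang/String;", "[I", ...).
     */
    static byte[] forReturnType(String descriptor) {
        switch (descriptor.charAt(0)) {
            case 'V':
                // return-void
                return new byte[]{0x0e, 0x00};
            case 'J':
            case 'D':
                // const-wide/16 v0, 0 ; return-wide v0
                return new byte[]{0x16, 0x00, 0x00, 0x00, 0x10, 0x00};
            case 'L':
            case '[':
                // const/4 v0, 0 ; return-object v0
                return new byte[]{0x12, 0x00, 0x11, 0x00};
            default:
                // boolean, byte, short, char, int, float: const/4 v0, 0 ; return v0
                return new byte[]{0x12, 0x00, 0x0f, 0x00};
        }
    }

    /** Mirrors the capacity check in extractMethod: insns is measured in 2-byte code units. */
    static boolean fits(int insnsSizeInCodeUnits, byte[] stub) {
        return insnsSizeInCodeUnits * 2 >= stub.length;
    }

    public static void main(String[] args) {
        byte[] stub = forReturnType("J");
        System.out.println("stub for long needs " + stub.length + " bytes");
        System.out.println("fits in 2 code units: " + fits(2, stub));
    }
}
```

Under these assumptions a function whose insns is a single code unit can only be extracted if it returns void, which matches the capacity check in extractMethod above.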

0x3 The shell module

The shell module contains the main logic of the function-extraction shell; its functions were listed above.

(1) Hooking functions

Hooks should be installed as early as possible; dpt starts its series of hooks in the _init function.

extern "C" void _init(void) {
    dpt_hook();
}

The hook framework is Dobby, and two functions are hooked: MapFileAtAddress and LoadMethod.

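The purpose of hooking MapFileAtAddress is to change the protection of the dex as it is being mapped, so that the loaded dex is writable; only then can we fill the bytecode back into it. Others have already analyzed this in detail; see this article for specifics.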

void* MapFileAtAddressAddr = DobbySymbolResolver(GetArtLibPath(),MapFileAtAddress_Sym());
DobbyHook(MapFileAtAddressAddr, (void *) MapFileAtAddress28,(void **) &g_originMapFileAtAddress28);

Once the hook is in place, we add PROT_WRITE to the prot argument.

void* MapFileAtAddress28(uint8_t* expected_ptr,
              size_t byte_count,
              int prot,
              int flags,
              int fd,
              off_t start,
              bool low_4gb,
              bool reuse,
              const char* filename,
              std::string* error_msg){
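    int new_prot = (prot | PROT_WRITE);
    if(nullptr != g_originMapFileAtAddress28) {
        return g_originMapFileAtAddress28(expected_ptr,byte_count,new_prot,flags,fd,start,low_4gb,reuse,filename,error_msg);
    }
    // No original function to forward to: return explicitly instead of falling off the end of a non-void function
    return nullptr;
}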

Before hooking LoadMethod, we need to understand how it is reached. Why LoadMethod, and would some other function work as well?

When a class is loaded, the call chain looks like this (some steps omitted):

ClassLoader.java::loadClass -> DexPathList.java::findClass -> DexFile.java::defineClass -> class_linker.cc::LoadClass -> class_linker.cc::LoadClassMembers -> class_linker.cc::LoadMethod

In other words, loading a class ends up calling LoadMethod. Here is its prototype:

void ClassLinker::LoadMethod(const DexFile& dex_file,
                             const ClassDataItemIterator& it,
                             Handle<mirror::Class> klass,
                             ArtMethod* dst);

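This function is a gold mine. Two of its parameters are extremely useful, DexFile and ClassDataItemIterator: from them we can get the DexFile structure that contains the function being loaded, as well as information about the function itself. Take a look at the ClassDataItemIterator structure: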

  class ClassDataItemIterator{
  
  ......
  
  // A decoded version of the method of a class_data_item
  struct ClassDataMethod {
    uint32_t method_idx_delta_;  // delta of index into the method_ids array for MethodId
    uint32_t access_flags_;
    uint32_t code_off_;
    ClassDataMethod() : method_idx_delta_(0), access_flags_(0), code_off_(0) {}

   private:
    DISALLOW_COPY_AND_ASSIGN(ClassDataMethod);
  };
  ClassDataMethod method_;

  // Read and decode a method from a class_data_item stream into method
  void ReadClassDataMethod();

  const DexFile& dex_file_;
  size_t pos_;  // integral number of items passed
  const uint8_t* ptr_pos_;  // pointer into stream of class_data_item
  uint32_t last_idx_;  // last read field or method index to apply delta to
  DISALLOW_IMPLICIT_CONSTRUCTORS(ClassDataItemIterator);
};

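The most important field is code_off_: its value is the offset of the CodeItem of the function being loaded, relative to the start of the DexFile. When a function is loaded, we can therefore reach its CodeItem directly. Could another function serve as well? None of the functions in the chain above is better suited to hooking than LoadMethod, so it is the best hook point.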

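Hooking LoadMethod is slightly more complicated. The hook itself is not the hard part; the code that runs once the hook fires is. We have to support several Android versions, and LoadMethod's parameters may change from version to version; fortunately the changes are not large. So how do we read code_off_ out of the ClassDataItemIterator class? The direct approach is to compute the field offset and maintain a table of offsets in the code, but that is hard to read and error-prone. What dpt does instead is copy the ClassDataItemIterator class definition and cast the ClassDataItemIterator reference to a reference to our own copy, which makes reading the field straightforward.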

Below is what happens after LoadMethod is called: it looks up the insns stored in a map and copies them back to the right location.

void LoadMethod(void *thiz, void *self, const void *dex_file, const void *it, const void *method,
                void *klass, void *dst) {

    if (g_originLoadMethod25 != nullptr
        || g_originLoadMethod28 != nullptr
        || g_originLoadMethod29 != nullptr) {
        uint32_t location_offset = getDexFileLocationOffset();
        uint32_t begin_offset = getDataItemCodeItemOffset();
        callOriginLoadMethod(thiz, self, dex_file, it, method, klass, dst);

        ClassDataItemReader *classDataItemReader = getClassDataItemReader(it,method);


        uint8_t **begin_ptr = (uint8_t **) ((uint8_t *) dex_file + begin_offset);
        uint8_t *begin = *begin_ptr;
        // vtable(4|8) + prev_fields_size
        std::string *location = (reinterpret_cast<std::string *>((uint8_t *) dex_file +
                                                                 location_offset));
        if (location->find("base.apk") != std::string::npos) {

            //code_item_offset == 0 means a native method or a method without code
            if (classDataItemReader->GetMethodCodeItemOffset() == 0) {
                DLOGW("native method? = %s code_item_offset = 0x%x",
                      classDataItemReader->MemberIsNative() ? "true" : "false",
                      classDataItemReader->GetMethodCodeItemOffset());
                return;
            }

            uint16_t firstDvmCode = *((uint16_t*)(begin + classDataItemReader->GetMethodCodeItemOffset() + 16));
            if(firstDvmCode != 0x0012 && firstDvmCode != 0x0016 && firstDvmCode != 0x000e){
                NLOG("this method has code no need to patch");
                return;
            }

            uint32_t dexSize = *((uint32_t*)(begin + 0x20));

            int dexIndex = dexNumber(location);
            auto dexIt = dexMap.find(dexIndex - 1);
            if (dexIt != dexMap.end()) {

                auto dexMemIt = dexMemMap.find(dexIndex);
                if(dexMemIt == dexMemMap.end()){
                    changeDexProtect(begin,location->c_str(),dexSize,dexIndex);
                }


                auto codeItemMap = dexIt->second;
                int methodIdx = classDataItemReader->GetMemberIndex();
                auto codeItemIt = codeItemMap->find(methodIdx);

                if (codeItemIt != codeItemMap->end()) {
                    CodeItem* codeItem = codeItemIt->second;
                    uint8_t  *realCodeItemPtr = (uint8_t*)(begin +
                                                classDataItemReader->GetMethodCodeItemOffset() +
                                                16);

                    memcpy(realCodeItemPtr,codeItem->getInsns(),codeItem->getInsnsSize());
                }
            }
        }
    }
}
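The core of the fill-back step can be modeled without any ART internals. The Java sketch below works on a plain byte array standing in for the mapped dex and only mirrors the decision logic of the hook above: skip functions without a code item, check whether the first code unit is one of the placeholder opcodes, then copy the saved insns over the stub. The class name FillBackSketch and its methods are hypothetical; the real shell does this in native code on writable mapped memory.

```java
/** Models the fill-back decision from the LoadMethod hook on a plain byte array. */
final class FillBackSketch {
    static final int CODE_ITEM_HEADER_SIZE = 16;

    private FillBackSketch() {}

    /** Reads the first 16-bit code unit of the code item at codeOff (little-endian). */
    static int firstCodeUnit(byte[] dex, int codeOff) {
        int p = codeOff + CODE_ITEM_HEADER_SIZE;
        return (dex[p] & 0xFF) | ((dex[p + 1] & 0xFF) << 8);
    }

    /** True if the code unit is const/4 v0,0, const-wide/16 v0,... or return-void. */
    static boolean looksExtracted(int codeUnit) {
        return codeUnit == 0x0012 || codeUnit == 0x0016 || codeUnit == 0x000e;
    }

    /** Copies savedInsns over the stub; returns false when nothing was patched. */
    static boolean fillBack(byte[] dex, int codeOff, byte[] savedInsns) {
        if (codeOff == 0) {
            return false; // native or abstract: no code item at all
        }
        if (!looksExtracted(firstCodeUnit(dex, codeOff))) {
            return false; // function still has real code, leave it alone
        }
        System.arraycopy(savedInsns, 0, dex, codeOff + CODE_ITEM_HEADER_SIZE, savedInsns.length);
        return true;
    }

    public static void main(String[] args) {
        byte[] dex = new byte[24];
        dex[20] = 0x0e; // return-void stub followed by one nop
        byte[] saved = {0x62, 0x00, 0x01, 0x00};
        System.out.println("patched: " + fillBack(dex, 4, saved));
        System.out.println("patched again: " + fillBack(dex, 4, saved));
    }
}
```

One consequence of this opcode-based check is that a function whose original first instruction happens to be one of the three placeholders is copied again on a later call, which is harmless because the same bytes are written.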

(2) Loading the dex

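The dex was in fact already loaded once when the app started, so why load it again? Because the system maps the dex read-only, we cannot modify that memory. Moreover, the app's dex is loaded before our Application starts, so our code never gets a chance to observe it. That is why we load the dex again.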

    private ClassLoader loadDex(Context context){
        String sourcePath = context.getApplicationInfo().sourceDir;
        String nativePath = context.getApplicationInfo().nativeLibraryDir;

        ShellClassLoader shellClassLoader = new ShellClassLoader(sourcePath,nativePath,ClassLoader.getSystemClassLoader());
        return shellClassLoader;
    }

The custom ClassLoader:

public class ShellClassLoader extends PathClassLoader {

    private final String TAG = ShellClassLoader.class.getSimpleName();

    public ShellClassLoader(String dexPath,ClassLoader classLoader) {
        super(dexPath,classLoader);
    }

    public ShellClassLoader(String dexPath, String librarySearchPath,ClassLoader classLoader) {
        super(dexPath, librarySearchPath, classLoader);
    }
}

(3) Replacing dexElements

This step is also very important: its purpose is to make the ClassLoader load classes from the dex we have just loaded. The code is as follows:

void mergeDexElements(JNIEnv* env,jclass klass,jobject oldClassLoader,jobject newClassLoader){
    jclass BaseDexClassLoaderClass = env->FindClass("dalvik/system/BaseDexClassLoader");
    jfieldID  pathList = env->GetFieldID(BaseDexClassLoaderClass,"pathList","Ldalvik/system/DexPathList;");
    jobject oldDexPathListObj = env->GetObjectField(oldClassLoader,pathList);
    if(env->ExceptionCheck() || nullptr == oldDexPathListObj ){
        env->ExceptionClear();
        DLOGW("mergeDexElements oldDexPathListObj get fail");
        return;
    }
    jobject newDexPathListObj = env->GetObjectField(newClassLoader,pathList);
    if(env->ExceptionCheck() || nullptr == newDexPathListObj){
        env->ExceptionClear();
        DLOGW("mergeDexElements newDexPathListObj get fail");
        return;
    }

    jclass DexPathListClass = env->FindClass("dalvik/system/DexPathList");
    jfieldID  dexElementField = env->GetFieldID(DexPathListClass,"dexElements","[Ldalvik/system/DexPathList$Element;");


    jobjectArray newClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            newDexPathListObj, dexElementField));
    if(env->ExceptionCheck() || nullptr == newClassLoaderDexElements){
        env->ExceptionClear();
        DLOGW("mergeDexElements new dexElements get fail");
        return;
    }

    jobjectArray oldClassLoaderDexElements = static_cast<jobjectArray>(env->GetObjectField(
            oldDexPathListObj, dexElementField));
    if(env->ExceptionCheck() || nullptr == oldClassLoaderDexElements){
        env->ExceptionClear();
        DLOGW("mergeDexElements old dexElements get fail");
        return;
    }

    jint oldLen = env->GetArrayLength(oldClassLoaderDexElements);
    jint newLen = env->GetArrayLength(newClassLoaderDexElements);

    DLOGD("mergeDexElements oldlen = %d , newlen = %d",oldLen,newLen);

    jclass ElementClass = env->FindClass("dalvik/system/DexPathList$Element");

    jobjectArray  newElementArray = env->NewObjectArray(oldLen + newLen,ElementClass, nullptr);

    for(int i = 0;i < newLen;i++) {
        jobject elementObj = env->GetObjectArrayElement(newClassLoaderDexElements, i);
        env->SetObjectArrayElement(newElementArray,i,elementObj);
    }


    for(int i = newLen;i < oldLen + newLen;i++) {
        jobject elementObj = env->GetObjectArrayElement(oldClassLoaderDexElements, i - newLen);
        env->SetObjectArrayElement(newElementArray,i,elementObj);
    }

    env->SetObjectField(oldDexPathListObj, dexElementField,newElementArray);

    DLOGD("mergeDexElements success");
}
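The ordering in the merged array matters: DexPathList searches dexElements from the first entry to the last and returns the first match, so the elements of the newly loaded dex have to come before the old ones. The array handling in the JNI code above can be sketched in plain Java as follows; the class name ElementMerge and the strings used as elements are placeholders for illustration, not real DexPathList$Element objects.

```java
import java.util.Arrays;

/** Plain-Java model of the array merge done in mergeDexElements. */
final class ElementMerge {
    private ElementMerge() {}

    /** Returns a new array holding all of newer followed by all of older. */
    static <T> T[] mergeNewFirst(T[] newer, T[] older) {
        // copyOf keeps the runtime component type and pads the tail with null
        T[] merged = Arrays.copyOf(newer, newer.length + older.length);
        System.arraycopy(older, 0, merged, newer.length, older.length);
        return merged;
    }

    public static void main(String[] args) {
        String[] fromShellLoader = {"base.apk (writable mapping)"};
        String[] fromSystemLoader = {"base.apk (read-only mapping)"};
        String[] merged = mergeNewFirst(fromShellLoader, fromSystemLoader);
        System.out.println(Arrays.toString(merged));
    }
}
```

Placing the new elements first means class lookups resolve against the writable copy of the dex, which is the copy the LoadMethod hook patches.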

0x4 Summary

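Building this shell took quite a lot of time, and only I know how many detours I took along the way, but in the end it works. dpt has not been extensively tested; problems will be fixed as they are found.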

Tags: Android