Spark Source Code Analysis 22: Task Memory Management

  • October 3, 2019
  • Notes

?????

????????????

1.  spark???????????????????

2. In on-heap mode, an object's address can change when the JVM GC moves it; how does Spark still address that memory reliably?

3. ?????????????????????????????????

4. What MemoryConsumer is and what it does

5. ??????????????

 

Let's start with the class org.apache.spark.memory.TaskMemoryManager.

TaskMemoryManager

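TaskMemoryManager manages the memory allocated by a single task.

Most of the complexity in this class lies in encoding memory addresses into 64-bit longs.

In on-heap mode, memory is addressed by the combination of a base object and an offset within that object.

In off-heap mode, memory can be addressed directly by its raw address.

This becomes a problem when we want to store pointers to data structures inside other structures, for example record pointers inside a hashmap or a sorting buffer. Even if we used 128 bits to address memory, we could not simply store the address of the base object, because that address is not guaranteed to stay stable: GC may reorganize the heap and move the object.

The solution

To get around the address changes caused by GC, a pointer is encoded in a single 64-bit long: the upper 13 bits store a page number and the lower 51 bits store an offset within that page. The page number is used as an index into a page table, which yields the page's base object. This allows 8192 (2^13) pages to be addressed. In on-heap mode, the maximum page size is limited by the maximum size of a long[] array, so in total 8192 * (2^31 - 1) * 8 bytes, about 140 TB, can be addressed. Here 2^31 - 1 is the maximum length of a Java array, i.e. the maximum number of longs a page can hold, and each long is 64 bits (8 bytes).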
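To make the 13/51-bit layout concrete, here is a minimal sketch of the address encoding, modeled in Python rather than Spark's Java. The constants roughly mirror names in the Spark source (PAGE_NUMBER_BITS, OFFSET_BITS, PAGE_TABLE_SIZE, MAXIMUM_PAGE_SIZE_BYTES), but the functions encode, decode_page_number and decode_offset are illustrative helpers, not Spark's API. Java has to decode with the unsigned shift `>>>` because the top bit of a long may be set; Python integers are unbounded, so a plain `>>` is enough here.

```python
# Illustrative Python model of TaskMemoryManager's 64-bit address encoding (not Spark's API).
PAGE_NUMBER_BITS = 13
OFFSET_BITS = 64 - PAGE_NUMBER_BITS          # 51 bits left for the offset
PAGE_TABLE_SIZE = 1 << PAGE_NUMBER_BITS      # 8192 addressable pages
MASK_LOWER_51_BITS = (1 << OFFSET_BITS) - 1  # 0x7FFFFFFFFFFFF

# Largest on-heap page: a long[] of at most 2^31 - 1 elements, 8 bytes each (about 17 GB).
MAXIMUM_PAGE_SIZE_BYTES = ((1 << 31) - 1) * 8


def encode(page_number: int, offset_in_page: int) -> int:
    """Put the page number in the upper 13 bits and the offset in the lower 51 bits."""
    assert 0 <= page_number < PAGE_TABLE_SIZE, "invalid page number"
    return (page_number << OFFSET_BITS) | (offset_in_page & MASK_LOWER_51_BITS)


def decode_page_number(address: int) -> int:
    """Upper 13 bits; Java would use the unsigned shift >>> here."""
    return address >> OFFSET_BITS


def decode_offset(address: int) -> int:
    """Lower 51 bits."""
    return address & MASK_LOWER_51_BITS


if __name__ == "__main__":
    addr = encode(5, 1024)
    print(hex(addr), decode_page_number(addr), decode_offset(addr))
    total = PAGE_TABLE_SIZE * MAXIMUM_PAGE_SIZE_BYTES
    print(f"on-heap addressable: {total} bytes, about {total / 1e12:.1f} TB")
```

The capacity line is the same arithmetic as the 140 TB figure: 2^13 pages times (2^31 - 1) longs times 8 bytes, which is just under 2^47 bytes.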

MemoryLocation

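The class that holds these two pieces, base and offset, is MemoryLocation. It describes a location in memory: either an offset within a JVM object (on-heap) or a raw memory address, in which case base is null (off-heap).

It has a subclass, org.apache.spark.unsafe.memory.MemoryBlock.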

MemoryBlock

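MemoryBlock represents a contiguous block of memory of fixed size that starts at a MemoryLocation.

Its fields are:

the two fields inherited from MemoryLocation, base and offset;

length, the size of the block, and pageNumber, the page number of the block when it is a page allocated through TaskMemoryManager.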

 

Its methods, such as fill, operate on the underlying memory through the Platform class.

MemoryAllocator

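MemoryAllocator is the interface for allocating and freeing MemoryBlocks. TaskMemoryManager does not allocate memory itself; it obtains the actual memory behind each page from the MemoryAllocator provided by MemoryManager.

It declares two methods: allocate(size), which allocates a contiguous block of memory that is not guaranteed to be zeroed, and free(memory), which frees a block.

It has two implementations, HeapMemoryAllocator for on-heap memory and UnsafeMemoryAllocator for off-heap memory, available as the constants MemoryAllocator.HEAP and MemoryAllocator.UNSAFE.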

HeapMemoryAllocator

Full class name: org.apache.spark.unsafe.memory.HeapMemoryAllocator

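It allocates memory on the JVM heap, backed by long arrays, so a single allocation is at most (2^31 - 1) * 8 bytes, about 16 GiB (roughly 17 GB).

Fields

bufferPoolBySize: a HashMap used as a buffer pool. Its keys are (aligned) block sizes and its values are lists of weak references to freed long arrays. Because the references are weak, the JVM GC can still reclaim those arrays; a pooled array is reused only if it has not been collected yet.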

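Whether to pool

shouldPool decides whether a block of a given size goes through the buffer pool: only blocks of at least POOLING_THRESHOLD_BYTES (1 MB in the Spark source) are pooled.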

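Allocate

First the size in bytes is rounded up to whole words of 8 bytes: numWords = (size + 7) / 8, and the aligned size is numWords * 8.

If a block of the aligned size should be pooled, look up its pool in bufferPoolBySize and take weak references from it one by one. As soon as one still points to a live array (one that GC has not reclaimed), wrap that array in a MemoryBlock and return it. If every reference in the pool has already been cleared, the pool is removed.

Otherwise, create a new long array of numWords elements, wrap it in a MemoryBlock, and return it.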

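Free

First mark the block as freed by setting its pageNumber to a special page number, FREED_IN_ALLOCATOR_PAGE_NUMBER, so that a double free can be detected.

As an extra defense against use-after-free bugs, clear the block's reference to the array: base is set to null and offset to 0.

Then compute the aligned size, ((size + 7) / 8) * 8, i.e. the size rounded up to a multiple of 8 bytes.

If a block of this size should not be pooled, nothing more is done; the array is left for the JVM GC to reclaim.

Otherwise, add a weak reference to the array to the LinkedList that serves as the pool for this size, creating the list if needed. A later allocation of the same size can reuse the array, unless the JVM GC has reclaimed it first.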
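Putting the allocate and free steps together, here is a minimal Python sketch of HeapMemoryAllocator's pooling logic. It is a model, not Spark's code: array.array("q") stands in for long[], Python's weakref stands in for java.lang.ref.WeakReference, and offset 0 stands in for Platform.LONG_ARRAY_OFFSET. HeapAllocatorSketch is a hypothetical name, and the 1 MB POOLING_THRESHOLD_BYTES and the page-number constants are values taken from the Spark source as assumptions worth checking against your version. One behavioral difference: CPython frees an array as soon as its last strong reference disappears, so a pooled weak reference dies immediately unless something still holds the array, whereas the JVM only clears it when a GC actually collects the array.

```python
import array
import weakref
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional

# Assumed to match Spark's HeapMemoryAllocator / MemoryBlock; verify against your Spark version.
POOLING_THRESHOLD_BYTES = 1024 * 1024
NO_PAGE_NUMBER = -1
FREED_IN_ALLOCATOR_PAGE_NUMBER = -3


@dataclass
class MemoryBlock:
    obj: Optional[array.array]  # base object: the backing "long[]"
    offset: int                 # offset inside obj (Spark uses Platform.LONG_ARRAY_OFFSET)
    size: int                   # requested size in bytes
    page_number: int = NO_PAGE_NUMBER


class HeapAllocatorSketch:
    """Hypothetical Python model of HeapMemoryAllocator's allocate/free logic."""

    def __init__(self) -> None:
        # aligned size in bytes -> weak references to freed arrays of that size
        self.buffer_pool_by_size: Dict[int, Deque[weakref.ref]] = {}

    @staticmethod
    def should_pool(size: int) -> bool:
        return size >= POOLING_THRESHOLD_BYTES

    def allocate(self, size: int) -> MemoryBlock:
        num_words = (size + 7) // 8          # round up to whole 8-byte words
        aligned_size = num_words * 8
        if self.should_pool(aligned_size):
            pool = self.buffer_pool_by_size.get(aligned_size)
            if pool is not None:
                while pool:
                    arr = pool.popleft()()   # dereference the weak reference
                    if arr is not None:      # not garbage-collected yet: reuse it
                        return MemoryBlock(arr, 0, size)
                del self.buffer_pool_by_size[aligned_size]  # all references were dead
        return MemoryBlock(array.array("q", [0]) * num_words, 0, size)

    def free(self, block: MemoryBlock) -> None:
        assert block.page_number != FREED_IN_ALLOCATOR_PAGE_NUMBER, "page has already been freed"
        arr = block.obj
        assert arr is not None, "not an on-heap block"
        block.page_number = FREED_IN_ALLOCATOR_PAGE_NUMBER  # lets a double free be detected
        block.obj, block.offset = None, 0                    # guard against use-after-free
        aligned_size = (block.size + 7) // 8 * 8
        if self.should_pool(aligned_size):
            self.buffer_pool_by_size.setdefault(aligned_size, deque()).append(weakref.ref(arr))
        # small arrays are simply dropped and left to the garbage collector
```

For example, a 2 MB block freed while the caller still holds its array is handed back by the next allocation with the same aligned size, while a 100-byte block is never pooled.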

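Summary

Memory from the heap allocator is ultimately managed by the JVM GC. Freeing a block does not release its memory immediately; large arrays are kept only through weak references in the pool. If GC has not collected such an array by the next allocation of the same size, it is reused, which avoids allocating a large array again; otherwise a new array is created.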

UnsafeMemoryAllocator

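Full class name: org.apache.spark.unsafe.memory.UnsafeMemoryAllocator

It allocates memory outside the JVM heap.

Allocate

Memory is allocated directly through Unsafe (via Platform.allocateMemory). The returned address is used as the offset of the MemoryBlock, and since there is no base object, base is null.

Free

Memory is freed directly through Unsafe (via Platform.freeMemory). Off-heap memory is not managed by the JVM, so GC never reclaims it and it must be freed explicitly. Afterwards the offset is reset to 0 and pageNumber is set to FREED_IN_ALLOCATOR_PAGE_NUMBER, which guards against use-after-free and double frees.

Summary

The off-heap allocator allocates and frees memory directly through Java's Unsafe: base is always null, and offset is the absolute memory address.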

 

With this background, let's return to TaskMemoryManager.

Back to TaskMemoryManager

Fields

???????????????????

??????????????

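OFFSET_BITS: the number of bits left after the page number bits, i.e. the bits that hold the offset (64 - 13 = 51).

MAXIMUM_PAGE_SIZE_BYTES: the maximum page size, about 17 GB ((2^31 - 1) * 8 bytes, the largest long[] array).

pageTable: the page table, an array of MemoryBlocks indexed by page number.

allocatedPages: a BitSet recording which page numbers are in use.

memoryManager: Spark's memory manager; see the earlier post in this spark source analysis series on Spark's memory management.

taskAttemptId: the ID of the task attempt.

tungstenMemoryMode: Tungsten's memory mode, on-heap or off-heap.

consumers: the set of MemoryConsumers that have acquired memory through this TaskMemoryManager.

Methods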

???????

?????????????????

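1. Acquire execution memory (acquireExecutionMemory)

First try to get the memory from MemoryManager. If it grants less than required, call spill on the task's other MemoryConsumers to free memory, retrying the acquisition after each spill that releases something. If there is still not enough, spill the requesting consumer itself and retry once more.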
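The acquisition order (memory pool first, then the other consumers, then the requester itself) can be illustrated with a minimal Python model. ExecutionPool, Consumer and acquire_execution_memory are hypothetical stand-ins for MemoryManager, MemoryConsumer and TaskMemoryManager.acquireExecutionMemory. The real method also checks the memory mode, holds a lock, and chooses which consumer to spill in a way that varies between Spark versions; this sketch simply spills the other consumers largest-first, and each spill releases all of that consumer's memory.

```python
from typing import List


class ExecutionPool:
    """Hypothetical stand-in for MemoryManager's execution memory."""

    def __init__(self, capacity: int) -> None:
        self.free = capacity

    def acquire(self, n: int) -> int:
        got = min(n, self.free)       # grant as much as is available
        self.free -= got
        return got

    def release(self, n: int) -> None:
        self.free += n


class Consumer:
    """Hypothetical MemoryConsumer whose spill() writes everything out and frees it."""

    def __init__(self, name: str, pool: ExecutionPool) -> None:
        self.name, self.pool, self.used = name, pool, 0

    def spill(self, size: int) -> int:
        released, self.used = self.used, 0   # pretend all data went to disk
        self.pool.release(released)
        return released


def acquire_execution_memory(required: int, consumer: Consumer,
                             consumers: List[Consumer], pool: ExecutionPool) -> int:
    got = pool.acquire(required)
    # 1) Not enough: spill the *other* consumers (largest first here), retrying after each.
    if got < required:
        others = sorted((c for c in consumers if c is not consumer and c.used > 0),
                        key=lambda c: c.used, reverse=True)
        for c in others:
            if c.spill(required - got) > 0:
                got += pool.acquire(required - got)
                if got >= required:
                    break
    # 2) Still not enough: spill the requesting consumer itself, then retry once.
    if got < required and consumer.spill(required - got) > 0:
        got += pool.acquire(required - got)
    if consumer not in consumers:
        consumers.append(consumer)   # Spark keeps a HashSet of consumers
    consumer.used += got             # in Spark, MemoryConsumer updates `used` itself
    return got
```

Spilling the other consumers before the requester means a consumer only spills its own data as a last resort.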

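2. Release execution memory (releaseExecutionMemory)

Releases the given amount of execution memory held by a MemoryConsumer by delegating to MemoryManager. A MemoryConsumer calls this, for example, after spilling, to return the memory it no longer holds.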

3. Show memory usage (showMemoryUsage)

 

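4. Allocate a page (allocatePage)

Allocates a block of memory that is tracked in the page table; this is meant for the large blocks of memory used by Tungsten.

If the requested size exceeds MAXIMUM_PAGE_SIZE_BYTES, an exception is thrown. Otherwise, first acquire the memory through method 1, returning null if nothing is granted. Next, take the lowest free page number from allocatedPages, mark it as used, and ask MemoryAllocator for the actual memory. If MemoryAllocator fails with an OutOfMemoryError, the acquired memory is kept, the page number is released again, and allocatePage retries. On success, the page number is recorded in the page and the page is put into pageTable.

5. Free a page (freePage)

Remove the page from pageTable and clear its bit in allocatedPages, then call MemoryAllocator's free to release the memory, and finally call method 2 to release the execution memory.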
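allocatePage and freePage mostly come down to bookkeeping on pageTable and the allocatedPages bitmap. Below is a minimal Python sketch of just that bookkeeping, under these assumptions: a Python int plays the role of java.util.BitSet, a bytearray stands in for the MemoryBlock that MemoryAllocator would return, and the execution-memory calls and the OutOfMemoryError retry are left out (noted in comments). PageTableSketch and its methods are hypothetical names, not Spark's API.

```python
from typing import List, Optional

PAGE_NUMBER_BITS = 13
PAGE_TABLE_SIZE = 1 << PAGE_NUMBER_BITS  # 8192


class PageTableSketch:
    """Hypothetical model of TaskMemoryManager's pageTable / allocatedPages bookkeeping."""

    def __init__(self) -> None:
        self.page_table: List[Optional[bytearray]] = [None] * PAGE_TABLE_SIZE
        self.allocated_pages = 0  # bit i set <=> page number i is in use (like java.util.BitSet)

    def _next_clear_bit(self) -> int:
        """Lowest unset bit, i.e. what BitSet.nextClearBit(0) returns."""
        x = self.allocated_pages
        return (~x & (x + 1)).bit_length() - 1

    def allocate_page(self, size: int) -> int:
        # (Spark first calls acquireExecutionMemory and returns null if nothing is granted.)
        page_number = self._next_clear_bit()
        if page_number >= PAGE_TABLE_SIZE:
            raise RuntimeError(f"have already allocated a maximum of {PAGE_TABLE_SIZE} pages")
        self.allocated_pages |= 1 << page_number
        self.page_table[page_number] = bytearray(size)  # stands in for MemoryAllocator.allocate
        return page_number

    def free_page(self, page_number: int) -> None:
        assert self.page_table[page_number] is not None, "page already freed"
        self.page_table[page_number] = None             # then MemoryAllocator.free(page)
        self.allocated_pages &= ~(1 << page_number)     # the page number becomes reusable
        # (Spark then calls releaseExecutionMemory for the page size.)
```

Reusing the lowest free page number keeps page numbers dense, so the 8192-entry table is only exhausted when 8192 pages are live at the same time.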

 

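6. Encode a page number and offset (encodePageNumberAndOffset)

The upper 13 bits store the page number and the lower 51 bits store the offset. In off-heap mode the offset is an absolute address, so it is first converted to an offset relative to the page's base offset, which fits in 51 bits.

7. Decode the page number and offset (decodePageNumber and decodeOffset)

The inverse of method 6.

8. Get the base object for an encoded address (getPage): in on-heap mode, decode the page number, look the page up in pageTable, and return its base object; in off-heap mode there is no base object, so it returns null.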

 

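9. Get the offset within the page for an encoded address (getOffsetInPage)

In on-heap mode, return the decoded offset as-is.

In off-heap mode, the stored offset is relative to the page, so return the page's base offset + the decoded offset, which is the absolute memory address.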
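The off-heap branch of getOffsetInPage works only because encoding stores off-heap offsets relative to the page start. Here is a minimal Python sketch of that round trip, assuming a dict keyed by page number is an acceptable stand-in for the pageTable array. Page, encode, get_page and get_offset_in_page are hypothetical names loosely modeled on Spark's MemoryBlock, encodePageNumberAndOffset, getPage and getOffsetInPage, and the address 1 << 60 in the test is a made-up value chosen to exceed 51 bits.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OFFSET_BITS = 51
MASK_LOWER_51_BITS = (1 << OFFSET_BITS) - 1


@dataclass
class Page:
    """Hypothetical stand-in for a MemoryBlock that lives in the page table."""
    page_number: int
    base_obj: Optional[object]  # None for off-heap pages
    base_offset: int            # on-heap: offset inside base_obj; off-heap: absolute address


def encode(page: Page, offset_in_page: int, off_heap: bool) -> int:
    if off_heap:
        # An absolute address may need all 64 bits; store it relative to the page start instead.
        offset_in_page -= page.base_offset
    return (page.page_number << OFFSET_BITS) | (offset_in_page & MASK_LOWER_51_BITS)


def get_page(address: int, page_table: Dict[int, Page], off_heap: bool) -> Optional[object]:
    if off_heap:
        return None                                     # no base object off-heap
    return page_table[address >> OFFSET_BITS].base_obj


def get_offset_in_page(address: int, page_table: Dict[int, Page], off_heap: bool) -> int:
    offset = address & MASK_LOWER_51_BITS
    if not off_heap:
        return offset                                   # on-heap: use it as-is
    page = page_table[address >> OFFSET_BITS]
    return page.base_offset + offset                    # off-heap: back to an absolute address
```

Without the subtraction in encode, an absolute address above 2^51 would be truncated by the mask and could not be recovered.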

 

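10. Clean up all allocated memory (cleanUpAllAllocatedMemory)

Call MemoryAllocator to free every page still in the page table, then call MemoryManager to release all the execution memory held by this task.

11. Get this task's memory consumption (getMemoryConsumptionForThisTask)

Ask MemoryManager for the execution memory currently used by this task.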

 

Next, let's look at the consumers whose memory TaskMemoryManager manages: MemoryConsumer.

MemoryConsumer

Overview

Full class name: org.apache.spark.memory.MemoryConsumer

A memory consumer of TaskMemoryManager that supports spilling.

It is an abstract class.

Fields

taskMemoryManager: the task's memory manager

used: the amount of memory currently used, in bytes

mode: the memory mode, on-heap or off-heap

pageSize: the page size

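Methods

1. Spill data to disk to free memory; this is an abstract method implemented by subclasses.

2. Allocating and freeing memory (arrays, pages, or raw execution memory) are all delegated to the corresponding TaskMemoryManager methods.

We will look at concrete MemoryConsumer implementations later, when we analyze Shuffle.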

 

Summary

This post walked through how a Task manages its memory. Next, we will look at sort-based shuffle and its sorter, and at how the sorter acquires and uses memory.