Spark Source Code Analysis, Part 22: Task Memory Management
- October 3, 2019
- Notes
?????
????????????
1. spark???????????????????
2. ?????????????????????JVM?GC??????????????????????????????????
3. ?????????????????????????????????
4. ???MemoryConsumer????
5. ??????????????
The discussion starts with the class org.apache.spark.memory.TaskMemoryManager.
TaskMemoryManager
????????????????
????????????????
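In off-heap mode, memory can be addressed directly with a 64-bit long.
In on-heap mode, memory is addressed by a base object reference combined with an offset inside that object.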
??????????????
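This becomes a problem when pointers have to be stored inside other data structures, such as record pointers inside a hashmap or a sorting buffer. Even with 128 bits available to address memory, storing the address of the base object is not safe, because GC may relocate the object whenever the heap is reorganized.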
??????
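To stay valid when GC moves objects, an address is encoded in a single 64-bit long: the upper 13 bits store the page number and the lower 51 bits store the offset inside that page. The page number indexes the page table, which yields the base object of the page. With 13 bits the page table can hold 8192 pages, so on-heap mode can address up to 8192 * (2^31 - 1) * 8 bytes, about 140 TB. The factor 2^31 - 1 is the maximum length of the long array that backs a page, and 8 is the size of a long in bytes.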
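The 13/51-bit split can be sketched in a few lines of Java. This is an illustrative sketch rather than Spark's actual code: the class name `PagePointerSketch` and its method names are assumptions made for this example, and only the bit widths and the capacity formula come from the text above.

```java
// Illustrative sketch of packing a page number and an in-page offset into one long.
// Class and method names are hypothetical; only the 13/51 split comes from the article.
final class PagePointerSketch {
    static final int PAGE_NUMBER_BITS = 13;
    static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;      // 51
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;   // lower 51 bits set

    // Put the page number in the upper 13 bits and the offset in the lower 51 bits.
    static long encode(int pageNumber, long offsetInPage) {
        if (pageNumber < 0 || pageNumber >= (1 << PAGE_NUMBER_BITS)) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        if (offsetInPage < 0 || offsetInPage > OFFSET_MASK) {
            throw new IllegalArgumentException("offset out of range: " + offsetInPage);
        }
        return (((long) pageNumber) << OFFSET_BITS) | offsetInPage;
    }

    // Unsigned shift, so a page number with the top bit set does not turn negative.
    static int decodePageNumber(long address) {
        return (int) (address >>> OFFSET_BITS);
    }

    static long decodeOffset(long address) {
        return address & OFFSET_MASK;
    }

    // 8192 pages * (2^31 - 1) longs per page * 8 bytes per long.
    static long maxOnHeapAddressableBytes() {
        return (1L << PAGE_NUMBER_BITS) * ((1L << 31) - 1) * 8L;
    }

    public static void main(String[] args) {
        long address = encode(5, 1234L);
        System.out.println(decodePageNumber(address)); // 5
        System.out.println(decodeOffset(address));     // 1234
        System.out.println(maxOnHeapAddressableBytes());
    }
}
```

Because the encoded value carries a page number instead of an object address, it stays meaningful after GC moves the backing array; only the page table entry has to point at the current object.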
MemoryLocation
The base object and the offset are wrapped together in MemoryLocation, which can describe either an on-heap or an off-heap location.
Its subclass is org.apache.spark.unsafe.memory.MemoryBlock.
MemoryBlock
?????????????????????????????????MemoryLocation????
???????????
?????????????????????base???offset?
???? length????????????? – ?????page number?
?????????Platform????????????????????
MemoryAllocator
???????????????????????????????????TaskMemoryManager????????????????????????
??????????????
???????????????
??????????????????????????????????
HeapMemoryAllocator
Full name: org.apache.spark.unsafe.memory.HeapMemoryAllocator
It allocates on-heap memory backed by a long array, so a single allocation can be at most about 16 GB.
????
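bufferPoolsBySize: a HashMap keyed by aligned size in bytes. Each value is a list of weak references to long arrays of that size. Because the references are weak, the JVM GC can still reclaim pooled arrays that nothing else is using.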
?????????
??????????????????????
????
The requested size in bytes is first rounded up to a whole number of 8-byte words.
If the aligned size qualifies for pooling, the pool for that size is searched. When the pool holds an array that has not been garbage collected, the array is wrapped in a MemoryBlock and returned.
Otherwise a new long array with that many words is allocated and wrapped in a MemoryBlock.
????
?????????????free??????pageNumber?????page number?
After a block is freed, its base object is set to null and its offset to 0.
???????????????????????????????? * 8????size?????????????????
???????????????????????????????????????JVM?GC???
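If the size qualifies for pooling, a weak reference to the array is appended to the LinkedList in the pool for that size, so the array can be reused later unless the JVM has reclaimed it in the meantime.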
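The allocate and free paths described above can be sketched as a small size-keyed pool of weak references. This is a simplified illustration and not the Spark class itself: the class name `PooledHeapAllocatorSketch`, the 1 MB pooling threshold, and the choice to return a raw `long[]` instead of a MemoryBlock are all assumptions made for this example.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Illustrative sketch of a heap allocator that pools large long[] buffers by size.
final class PooledHeapAllocatorSketch {
    // Assumed threshold: only buffers of at least this size are worth pooling.
    static final long POOLING_THRESHOLD_BYTES = 1024 * 1024;

    // Aligned size in bytes -> weak references to free arrays of exactly that size.
    private final Map<Long, LinkedList<WeakReference<long[]>>> pools = new HashMap<>();

    // Round a byte count up to a whole number of 8-byte words.
    static long alignedSize(long sizeInBytes) {
        long numWords = (sizeInBytes + 7) / 8;
        return numWords * 8L;
    }

    synchronized long[] allocate(long sizeInBytes) {
        long aligned = alignedSize(sizeInBytes);
        if (aligned >= POOLING_THRESHOLD_BYTES) {
            LinkedList<WeakReference<long[]>> pool = pools.get(aligned);
            if (pool != null) {
                while (!pool.isEmpty()) {
                    long[] array = pool.pop().get();
                    if (array != null) {
                        return array;   // still alive, so reuse it
                    }
                }
                pools.remove(aligned);  // every pooled array was reclaimed by GC
            }
        }
        return new long[(int) (aligned / 8)];
    }

    synchronized void free(long[] array) {
        long aligned = array.length * 8L;
        if (aligned >= POOLING_THRESHOLD_BYTES) {
            pools.computeIfAbsent(aligned, k -> new LinkedList<>())
                 .add(new WeakReference<>(array));
        }
        // Small arrays are simply dropped and left to the garbage collector.
    }

    public static void main(String[] args) {
        PooledHeapAllocatorSketch allocator = new PooledHeapAllocatorSketch();
        long[] first = allocator.allocate(2 * 1024 * 1024);
        allocator.free(first);
        long[] second = allocator.allocate(2 * 1024 * 1024);
        System.out.println(first == second); // same array while 'first' is still reachable
    }
}
```

Holding the pooled arrays through weak references means the pool never prevents the JVM from reclaiming memory under pressure; the cost is that a lookup may find only cleared references and fall back to a fresh allocation.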
????????
?????????????JVM?GC?????????????????????????????????????????????????????????????????????????????????????????????????????????????
UnsafeMemoryAllocator
Full name: org.apache.spark.unsafe.memory.UnsafeMemoryAllocator
?????????
????
Memory is allocated through Unsafe. The returned address is stored as the offset, and the base object is null.
????
???????????????????????JVM?????????????????????????????????????????????????????????????????
????????
Off-heap memory is managed through Java's Unsafe API. The base object is null and the offset holds the absolute address of the block.
??????TaskMemoryManager???????????
?????TaskMemoryManager
????
???????????????????
??????????????
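OFFSET_BITS: the number of bits left for the in-page offset once the page number bits are taken out, that is, 51.
MAXIMUM_PAGE_SIZE_BYTES: about 17 GB, the largest supported page size.
pageTable: the page table, an array of pages indexed by page number.
allocatedPages: a bitmap recording which page table slots are in use.
memoryManager: the Spark memory manager; see the earlier article in this series on Spark memory management.
taskAttemptId: the task attempt id.
tungstenMemoryMode: the Tungsten memory mode, either on-heap or off-heap.
consumers: the set of memory consumers registered with this task.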
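How pageTable and allocatedPages work together can be shown with a minimal sketch. The class `PageTableSketch` and its methods are hypothetical names for this example, and pages are plain `long[]` arrays here instead of MemoryBlock instances; the sketch only illustrates handing out the lowest free page number from a bitmap.

```java
import java.util.BitSet;

// Illustrative sketch of a page table whose free slots are tracked by a bitmap.
final class PageTableSketch {
    static final int PAGE_TABLE_SIZE = 1 << 13;   // 8192 slots, matching 13 page-number bits

    private final long[][] pageTable = new long[PAGE_TABLE_SIZE][];
    private final BitSet allocatedPages = new BitSet(PAGE_TABLE_SIZE);

    // Reserve the lowest free page number and store a new page in that slot.
    synchronized int allocatePage(int numWords) {
        int pageNumber = allocatedPages.nextClearBit(0);
        if (pageNumber >= PAGE_TABLE_SIZE) {
            throw new IllegalStateException("page table is full");
        }
        allocatedPages.set(pageNumber);
        pageTable[pageNumber] = new long[numWords];
        return pageNumber;
    }

    // Clear the slot and mark the page number as reusable.
    synchronized void freePage(int pageNumber) {
        if (!allocatedPages.get(pageNumber)) {
            throw new IllegalArgumentException("page is not allocated: " + pageNumber);
        }
        pageTable[pageNumber] = null;
        allocatedPages.clear(pageNumber);
    }

    synchronized long[] getPage(int pageNumber) {
        return pageTable[pageNumber];
    }

    public static void main(String[] args) {
        PageTableSketch table = new PageTableSketch();
        int a = table.allocatePage(4);
        int b = table.allocatePage(4);
        table.freePage(a);
        System.out.println(a + " " + b + " " + table.allocatePage(2));
    }
}
```

Reusing the lowest clear bit keeps page numbers small and dense, so freed slots are filled before the table grows toward its 8192-page limit.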
????
???????
?????????????????
1. ??????
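Memory is first requested from MemoryManager. If the grant falls short, other MemoryConsumers are asked to spill so that their memory can be reused; if that is still not enough, the requesting consumer spills itself.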
2. ??????
?????????????????????????????????????????????MemoryConsumer?spill??????????????????
3. ???????
4. ?????
????????????????????????????????
?????????????????????????????????MemoryAllocator??????????????????page?????????page???MemoryAllocator????????????????????????????????????
5. ?????
This calls the MemoryAllocator's free method to release the page, then calls method 2 to release the execution memory.
6. ??????
The upper 13 bits store the page number and the lower 51 bits store the offset.
7.??????
This is the inverse of method 6.
8.???????????base???????????????????base???
9.??????????????offset
???????????????????offset???
??????????????offset + ?????????????????????????????????
10.???????
It frees the remaining pages through MemoryAllocator and then asks MemoryManager to release all memory held by this task.
11.???????????????
It asks MemoryManager for the amount of execution memory used by this task.
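The acquire-and-spill behaviour from method 1 can be sketched as follows. This is a heavily simplified illustration under stated assumptions: `SpillingAcquireSketch` and its nested `Consumer` class are hypothetical, a spill here releases everything the consumer holds, and consumers are visited in registration order, whereas the real TaskMemoryManager applies its own selection logic.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: grant memory from a fixed budget, spilling consumers when short.
final class SpillingAcquireSketch {
    // Hypothetical consumer that can give back all of its memory by spilling.
    static final class Consumer {
        final String name;
        long used;

        Consumer(String name) {
            this.name = name;
        }

        // Pretend the data went to disk and report how many bytes were released.
        long spill() {
            long released = used;
            used = 0;
            return released;
        }
    }

    private long free;
    private final List<Consumer> consumers = new ArrayList<>();

    SpillingAcquireSketch(long capacity) {
        this.free = capacity;
    }

    // Take up to 'wanted' bytes out of the free budget.
    private long take(long wanted) {
        long granted = Math.min(wanted, free);
        free -= granted;
        return granted;
    }

    synchronized long acquire(long required, Consumer requester) {
        long got = take(required);
        if (got < required) {
            // First try to reclaim memory from the other consumers.
            for (Consumer other : consumers) {
                if (other != requester && other.used > 0) {
                    free += other.spill();
                    got += take(required - got);
                    if (got >= required) {
                        break;
                    }
                }
            }
        }
        if (got < required && requester.used > 0) {
            // Last resort: the requester spills its own data.
            free += requester.spill();
            got += take(required - got);
        }
        if (!consumers.contains(requester)) {
            consumers.add(requester);
        }
        requester.used += got;
        return got;
    }

    public static void main(String[] args) {
        SpillingAcquireSketch manager = new SpillingAcquireSketch(100);
        Consumer sorter = new Consumer("sorter");
        Consumer map = new Consumer("map");
        System.out.println(manager.acquire(80, sorter)); // 80
        System.out.println(manager.acquire(50, map));    // 50, after the sorter spills
        System.out.println(sorter.used);                 // 0
    }
}
```

The return value may be smaller than the request, so a caller has to check how much it actually received before using the memory.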
Next comes a class closely tied to TaskMemoryManager: MemoryConsumer.
MemoryConsumer
???
Full name: org.apache.spark.memory.MemoryConsumer
???????????
???????
????
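taskMemoryManager: the TaskMemoryManager that serves this consumer.
used: the amount of memory currently held.
mode: the memory mode, either on-heap or off-heap.
pageSize: the page size.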
????
1. ??????????????????????
2. ??????????????????????? TaskMemoryManager ?????
????MemoryConsumer???????????????????Shuffle??????????
??
?????????Task???????????????????????????????????sort?shuffle???????????????sorter????sorter?????????????????