HashMap底層源碼剖析

2020 年 3 月 17 日
筆記

HashMap底層源碼剖析

一、HashMap底層用到的數據結構

數組+單向鏈表+紅黑樹

　　數組：數組每一項都是一個鏈表，其實就是數組和鏈表的結合體

　　單向鏈表：當法神hash碰撞時，首先會找到數組對應位置，然後1.8採用尾插入法（1.7採用頭插入法），形成一個單項鏈表結構

　　JDK1.8 紅黑樹：當數組中每項的鏈表長度大於8時，會轉換為紅黑樹

二、什麼是hash碰撞？解決方案？

　　hash碰撞：不同的key可能會產生相同的hash值；

　　方案：鏈表發，再哈希法；

　　hashMap中採用鏈表發，在ConcurrentHashMap中採用哈希法；

二、紅黑樹與二叉樹比較

　　二叉查找樹在特殊情況下也會變成線性結構，和原來鏈表有共同的問題，節點太深，查找性能慢；

　　紅黑樹相比二叉樹，在檢索的時候效率其實差不多，都是通過平衡來二分查找。但對於插入刪除等操效率提高很多。紅黑樹不像二叉樹一樣追求絕對的平衡，它允許局部很少的不完全平衡，這樣對於效率影響不大，但省去了很多沒有必要的調平衡操作，二叉樹調平衡有時候代價較大，所以二叉樹的效率不如紅黑樹；

三、為什麼採用紅黑樹

　　在平常我們用HashMap的時候，HashMap裏面存儲的key是具有良好的hash算法的key（比如String、Integer等包裝類），衝突幾率自然微乎其微，此時鏈表幾乎不會轉化為紅黑樹，但是當key為我們自定義的對象時，我們可能採用了不好的hash算法，使HashMap中key的衝突率極高，但是這時HashMap為了保證高速的查找效率，就引入了紅黑樹來優化查詢了。

四、為什麼臨界值為8

　　通過源碼我們得知HashMap源碼作者通過泊松分佈算出，當桶中結點個數為8時，出現的幾率是億分之6的，因此常見的情況是桶中個數小於8的情況，此時鏈表的查詢性能和紅黑樹相差不多，因為轉化為樹還需要時間和空間，所以此時沒有轉化成樹的必要。

　　當數據較少的時候，採用鏈表要比紅黑樹效率高，因為平衡二叉樹保持平衡需要耗費資源，那麼前期數據較少時採用鏈表，當鏈表中的數據長度大於8時，就將鏈錶轉換成紅黑樹，可以加快數據的插敘速度，官方測試8為性能最優。

五、put()底層分析

    public V put(K key, V value) {          return putVal(hash(key), key, value, false, true);      }        /**       * Implements Map.put and related methods       *       * @param hash hash for key       * @param key the key       * @param value the value to put       * @param onlyIfAbsent if true, don't change existing value       * @param evict if false, the table is in creation mode.       * @return previous value, or null if none       */      final V putVal(int hash, K key, V value, boolean onlyIfAbsent,                     boolean evict) {          Node<K,V>[] tab; Node<K,V> p; int n, i;          //判斷當前數組是否為空，如果為空要進行第一次擴容          if ((tab = table) == null || (n = tab.length) == 0)              //擴容後將擴容大小交給N              n = (tab = resize()).length;          //判斷獲取當前數組位置是否存在數據，如果為空則直接插入，否則需要代表當前位置不是空的，不是空的需要判斷          if ((p = tab[i = (n - 1) & hash]) == null)              //如果為空則創建一個新的節點添加到該位置              tab[i] = newNode(hash, key, value, null);          else {              Node<K,V> e; K k;              //判斷Hash值和Key值是否相同，如果相同則需要Value覆蓋              if (p.hash == hash &&                  ((k = p.key) == key || (key != null && key.equals(k))))                  e = p;              //判斷當前數組中存放的節點是否是樹節點，則添加樹節點即可              else if (p instanceof TreeNode)                  e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);              else {                  //循環遍歷鏈表                  for (int binCount = 0; ; ++binCount) {                      //判斷當前數組該位置的值得下一個元素是否為空，如果為空則追加到當前元素後邊                      if ((e = p.next) == null) {                          p.next = newNode(hash, key, value, null);                          //添加完畢後判斷當前鏈表節點有多少個，如果節點大於等於8則轉換為紅黑樹                          if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st                              //treeifyBin判斷當前數組是否為空，或者長度是否小於64，如果為空或者小於64，則先擴容                              treeifyBin(tab, hash);                          break;                      }                      //再次進行Key的重複判斷                      if (e.hash == hash &&                          ((k = e.key) == key || (key != null && key.equals(k))))                          break;                      p = e;                  }              }              //表明，記錄到具有相同元素的節點              if (e != null) { // existing mapping for key                  V oldValue = e.value;                  if (!onlyIfAbsent || oldValue == null)                      e.value = value;                  //這個是空函數，可以有用戶根據需要覆蓋                  afterNodeAccess(e);                  return oldValue;              }          }          ++modCount;          //判斷當前數組元素的個數和閾值進行比較，如果數量大於閾值則需要擴容          if (++size > threshold)              //默認情況下，第一次添加數據的時候，先會進行一次擴容後再添加數據，後續都是先添加數據在進行擴容              resize();          //這個是空函數，可以有用戶根據需要覆蓋          afterNodeInsertion(evict);          return null;      }

在上述的方法中，設計三種情況：

　　第一種情況，數組索引位置沒有鍵值對，處理方式就是直接把待添加鍵值對封裝成Node添加到索引位置即可；

　　第二種情況，如果數組索引位置有鍵值對，而且封裝的TreeNode節點，處理方式是調用紅黑樹的插入方法，把帶添加鍵值對添加到紅黑樹中；

　　第三種情況，同樣數組索引位置有鍵值對，但是封裝的是Node節點，處理方法就比較複雜，首先把待添加鍵值對封裝成Node節點添加到鏈表尾部，然後判斷當前鏈表長度，如果達到閾值，就判斷是擴容還是轉換為紅黑樹；

六、get()底層分析

    public V get(Object key) {          Node<K,V> e;          return (e = getNode(hash(key), key)) == null ? null : e.value;      }        /**       * Implements Map.get and related methods       *       * @param hash hash for key       * @param key the key       * @return the node, or null if none       */      final Node<K,V> getNode(int hash, Object key) {          Node<K,V>[] tab; Node<K,V> first, e; int n; K k;          //判斷數組以及數組對應位置數組元素是否為空          if ((tab = table) != null && (n = tab.length) > 0 &&              (first = tab[(n - 1) & hash]) != null) {              //用get傳遞過來的Key值和對應位置第一個元素進行比較，如果相等直接返回，如果不等則進行查找              if (first.hash == hash && // always check first node                  ((k = first.key) == key || (key != null && key.equals(k))))                  return first;              //判斷第一個元素的下一個元素是否為空              if ((e = first.next) != null) {                  //判斷當前節點是否為樹節點                  if (first instanceof TreeNode)                      //如果是樹節點，直接通過getTreeNode拿到該節點返回                      return ((TreeNode<K,V>)first).getTreeNode(hash, key);                  //循環一一對比                  do {                      if (e.hash == hash &&                          ((k = e.key) == key || (key != null && key.equals(k))))                          return e;                  } while ((e = e.next) != null);              }          }          return null;      }

七、擴容機制底層分析

　　在HashMap中，桶數組的長度均是2的冪，閾值大小為桶數組長度與負載因子的乘積。當HashMap中的鍵值對數量超過閾值時，就進行擴容；

　　擴容之後，要重新計算鍵值對的位置，並把它們移到合適的位置上去；

    /**       * Initializes or doubles table size.  If null, allocates in       * accord with initial capacity target held in field threshold.       * Otherwise, because we are using power-of-two expansion, the       * elements from each bin must either stay at same index, or move       * with a power of two offset in the new table.       *       * @return the table       */      final Node<K,V>[] resize() {          Node<K,V>[] oldTab = table;          int oldCap = (oldTab == null) ? 0 : oldTab.length;          int oldThr = threshold;          int newCap, newThr = 0;          //如果table不為空，表明已經初始化過了          if (oldCap > 0) {              //當table容量超過容量最大值，則不再擴容              if (oldCap >= MAXIMUM_CAPACITY) {                  threshold = Integer.MAX_VALUE;                  return oldTab;              }              //按舊容量或閾值的2倍計算新容量和閾值大小              else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&                       oldCap >= DEFAULT_INITIAL_CAPACITY)                  newThr = oldThr << 1; // double threshold          }          else if (oldThr > 0) // initial capacity was placed in threshold              //初始化時，將threshold的值賦值給newCap;  　　　　　    //HashMap使用threshold變量暫時保存initialCapacity參數的值              newCap = oldThr;          else {               // zero initial threshold signifies using defaults              //調用無參構造方法時，桶數組容量為默認容量；閾值為默認容量與默認負載因子              newCap = DEFAULT_INITIAL_CAPACITY;              newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);          }          //newThr為0時，按閾值計算公式進行計算          if (newThr == 0) {              float ft = (float)newCap * loadFactor;              newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?                        (int)ft : Integer.MAX_VALUE);          }          threshold = newThr;          @SuppressWarnings({"rawtypes","unchecked"})              //創建新的桶數組，桶數組的初始化也是這裡完成的              Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];          table = newTab;          if (oldTab != null) {              //如果舊的桶數組不為空，則遍歷桶數組，並將鍵值對映射到新的桶數組中              for (int j = 0; j < oldCap; ++j) {                  Node<K,V> e;                  if ((e = oldTab[j]) != null) {                      oldTab[j] = null;                      if (e.next == null)                          newTab[e.hash & (newCap - 1)] = e;                      else if (e instanceof TreeNode)                          //重新映射時，需要對紅黑樹進行拆分                          ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);                      else { // preserve order                          Node<K,V> loHead = null, loTail = null;                          Node<K,V> hiHead = null, hiTail = null;                          Node<K,V> next;                          //遍歷鏈表，並將鏈表節點按原順序進行分組                          do {                              next = e.next;                              if ((e.hash & oldCap) == 0) {                                  if (loTail == null)                                      loHead = e;                                  else                                      loTail.next = e;                                  loTail = e;                              }                              else {                                  if (hiTail == null)                                      hiHead = e;                                  else                                      hiTail.next = e;                                  hiTail = e;                              }                          } while ((e = next) != null);                          //將分組後的鏈表映射到新桶中                          if (loTail != null) {                              loTail.next = null;                              newTab[j] = loHead;                          }                          if (hiTail != null) {                              hiTail.next = null;                              newTab[j + oldCap] = hiHead;                          }                      }                  }              }          }          return newTab;      }

擴容總共做了三件事：

　　1.計算新桶數組的容量newCap和新閾值newThr

　　2.根據計算出的newCap創建新的桶數組，桶數組table也是這裡進行初始化的

　　3.將鍵值對節點重新映射到新桶數組中，如果節點是TreeNode類型，則需要拆分紅黑樹；如果是普通節點，則節點按原順序進行分組