HashMap Exploration 01: Translating the Source Comments

  • February 10, 2020
  • Notes

At the time I was curious about HashMap and ConcurrentHashMap, but almost everything I found online was someone else's source-code analysis. I wanted to see for myself what is actually in the JDK, and this series is the result; it wanders wherever the reading leads. Unless noted otherwise, the series is based on JDK 8. This first post works through the class-level comment at the top of the HashMap class.

Main Text

Hash table based implementation of the Map interface. This implementation provides all of the optional map operations, and permits null values and the null key. (The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls.) This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time.
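
Unlike Hashtable, both the null key and null values are accepted. A minimal sketch of the difference (the class name `NullKeyDemo` is my own):

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put(null, "value-for-null-key"); // the single null key is allowed
        map.put("key", null);                // null values are allowed too
        System.out.println(map.get(null));          // value-for-null-key
        System.out.println(map.containsKey("key")); // true, even though the value is null
        // java.util.Hashtable would throw NullPointerException on either put() above
    }
}
```

Because get() returns null both for a missing key and for a key mapped to null, containsKey is the way to tell the two cases apart.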

This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
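
A common way to apply this sizing rule is to divide the expected entry count by the load factor when constructing the map. The helper below is my own sketch, not part of the JDK:

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {
    // Hypothetical helper: an initial capacity large enough that `expected`
    // entries never push the map past its resize threshold (default load factor 0.75).
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1000;
        // 1000 / 0.75 rounds up to 1334; HashMap then rounds that up to the next
        // power of two (2048), so the threshold (2048 * 0.75 = 1536) is never
        // crossed and no rehash happens while inserting the 1000 entries.
        Map<Integer, Integer> map = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // 1000
    }
}
```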

If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table. Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map:

   Map m = Collections.synchronizedMap(new HashMap(...));
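
With generics, wrapping at creation time looks like the sketch below; the two threads are only there to show that the wrapper serializes concurrent puts:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SyncWrapDemo {
    public static void main(String[] args) throws InterruptedException {
        // Wrap immediately so no code ever holds a reference to the bare HashMap.
        Map<String, Integer> m = Collections.synchronizedMap(new HashMap<>());

        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) m.put("a" + i, i); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) m.put("b" + i, i); });
        t1.start(); t2.start();
        t1.join(); t2.join();

        // All 2000 distinct keys survive; with an unwrapped HashMap this count
        // could be wrong, or the concurrent resizes could corrupt the table.
        System.out.println(m.size()); // 2000
    }
}
```

Note that even with the wrapper, iterating over the map still requires synchronizing on it manually, as the Collections.synchronizedMap documentation warns.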

The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
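
The behaviour is easy to trigger deliberately; a minimal sketch:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("c", 3); // structural modification behind the iterator's back
        try {
            it.next();   // the fail-fast check fires here
        } catch (ConcurrentModificationException e) {
            System.out.println("caught ConcurrentModificationException");
        }

        // The iterator's own remove method is the one permitted way to
        // modify the map structurally during iteration:
        it = map.keySet().iterator();
        while (it.hasNext()) { it.next(); it.remove(); }
        System.out.println(map.isEmpty()); // true
    }
}
```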

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

This class is a member of the Java Collections Framework.

Implementation Notes

This map usually acts as a binned (bucketed) hash table, but when bins get too large, they are transformed into bins of TreeNodes, each structured similarly to those in java.util.TreeMap. Most methods try to use normal bins, but relay to TreeNode methods when applicable (simply by checking instanceof a node). Bins of TreeNodes may be traversed and used like any others, but additionally support faster lookup when overpopulated. However, since the vast majority of bins in normal use are not overpopulated, checking for existence of tree bins may be delayed in the course of table methods.

Tree bins (i.e., bins whose elements are all TreeNodes) are ordered primarily by hashCode, but in the case of ties, if two elements are of the same "class C implements Comparable<C>" type, then their compareTo method is used for ordering. (We conservatively check generic types via reflection to validate this; see method comparableClassFor.) The added complexity of tree bins is worthwhile in providing worst-case O(log n) operations when keys either have distinct hashes or are orderable. Thus, performance degrades gracefully under accidental or malicious usages in which hashCode() methods return values that are poorly distributed, as well as those in which many keys share a hashCode, so long as they are also Comparable. (If neither of these apply, we may waste about a factor of two in time and space compared to taking no precautions. But the only known cases stem from poor user programming practices that are already so slow that this makes little difference.)
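
The effect is easy to demonstrate with a deliberately bad hash function. `BadKey` below is my own illustration: every instance collides, but because the class is Comparable, the tree bin can still order the keys, so lookups stay logarithmic instead of degrading to a linear scan:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical worst-case key: every instance lands in the same bin.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; } // all keys collide
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == this.id;
        }
        @Override public int compareTo(BadKey other) { // breaks the hash ties
            return Integer.compare(this.id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), i);
        }
        // The single overpopulated bin has been treeified, so this lookup walks
        // a red-black tree rather than a 10,000-node linked list.
        System.out.println(map.get(new BadKey(9_999))); // 9999
    }
}
```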

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0:    0.60653066
1:    0.30326533
2:    0.07581633
3:    0.01263606
4:    0.00157952
5:    0.00015795
6:    0.00001316
7:    0.00000094
8:    0.00000006

more: less than 1 in ten million
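
The listed probabilities can be reproduced directly from the formula above (the class name is my own):

```java
public class PoissonDemo {
    // Probability that a bin holds exactly k nodes, per the comment's model:
    // Poisson distribution with mean 0.5, i.e. exp(-0.5) * 0.5^k / k!
    static double p(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) factorial *= i;
        return Math.exp(-0.5) * Math.pow(0.5, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, p(k));
        }
        // p(8) is about 0.00000006, which is why a bin actually reaching
        // TREEIFY_THRESHOLD (8) is practically a signal of a bad hashCode.
    }
}
```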

The root of a tree bin is normally its first node. However, sometimes (currently only upon Iterator.remove), the root might be elsewhere, but can be recovered following parent links (method TreeNode.root()).

All applicable internal methods accept a hash code as an argument (as normally supplied from a public method), allowing them to call each other without recomputing user hashCodes. Most internal methods also accept a "tab" argument, that is normally the current table, but may be a new or old one when resizing or converting.

When bin lists are treeified, split, or untreeified, we keep them in the same relative access/traversal order (i.e., field Node.next) to better preserve locality, and to slightly simplify handling of splits and traversals that invoke iterator.remove. When using comparators on insertion, to keep a total ordering (or as close as is required here) across rebalancings, we compare classes and identityHashCodes as tie-breakers.

The use and transitions among plain vs tree modes is complicated by the existence of subclass LinkedHashMap. See below for hook methods defined to be invoked upon insertion, removal and access that allow LinkedHashMap internals to otherwise remain independent of these mechanics. (This also requires that a map instance be passed to some utility methods that may create new nodes.)
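
Those hooks (afterNodeAccess, afterNodeInsertion, and afterNodeRemoval in the JDK 8 source) are what make features like LinkedHashMap's access order possible, which is observable from the outside:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AccessOrderDemo {
    public static void main(String[] args) {
        // accessOrder = true: the afterNodeAccess hook moves an entry to the
        // end of the internal linked list every time it is read.
        Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true);
        lru.put("a", 1);
        lru.put("b", 2);
        lru.put("c", 3);
        lru.get("a"); // "a" moves to the end of the iteration order
        System.out.println(lru.keySet()); // [b, c, a]
    }
}
```

This is the building block of LRU caches based on LinkedHashMap (combined with overriding removeEldestEntry), while HashMap itself stays free of any ordering bookkeeping.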

The concurrent-programming-like SSA-based coding style helps avoid aliasing errors amid all of the twisty pointer operations.

References

Class HashMap<K,V>
Class ConcurrentHashMap<K,V>