單鏈表實現LRU快取淘汰演算法

2020 年 2 月 10 日
筆記

LRU（Least recently used，最近最少使用）演算法根據數據的歷史訪問記錄來進行淘汰數據，其核心思想是如果數據最近被訪問過，那麼將來被訪問的幾率也更高，相反如果很長時間未被訪問，則它在最近一段時間內也不會被訪問。實現方式有很多種，這裡先介紹基於數組和單鏈表的實現方式。

基於數組的LRU

首位置保存最新訪問數據，末尾位置優先清理。

如果此數據之前已經被快取在數組中了，查找到其位置並從原位置中刪除該數據，將位置之前的數據依次右移一位，保存此數據到數組第一個元素位置上。
如果此數據沒有在快取數組中，又可以分為兩種情況：
- 如果此時快取未滿，將數組中數據全部右移一位，保存此數據到數組第一個元素位置上。
- 如果此時快取已滿，將數組最後一位數據刪除，再將數組中數據全部右移一位，保存此數據到數組第一個元素位置上。

public class LRUBasedArray<T> {      private final int DEFAUL_CAPACITY = 10;      private Integer capacity;      private Integer count;      private T[] value;      /**       * 用於元素記錄所在數組位置       */      private Map<T, Integer> holder;        public LRUBasedArray() {          this.capacity = this.DEFAUL_CAPACITY;          this.value = (T[]) new Object[capacity];          this.count = 0;          this.holder = new HashMap<T, Integer>(capacity);      }        public LRUBasedArray(Integer capacity) {          this.capacity = capacity;          this.value = (T[]) new Object[capacity];          this.count = 0;          this.holder = new HashMap<T, Integer>(capacity);      }        /**       * 快取數據       * @param data       */      public void add(T data){          if (data == null){              throw new IllegalArgumentException("該快取容器不支援null!");          }          Integer index = holder.get(data);          if (index != null){              // 向右移動              update(index);          }else {              // 是否已滿              if (isFull()){                  // 刪除，更新                  removeAndCache(data);              }else {                  // 右移元素更新                  cache(data, count);              }          }      }        /**       * 數據之前已在數組中，將數組中的對應數據更新到數組開始。       * @param index       */      private void update(Integer index){          T key = value[index];          rightOffer(index);          value[0] = key;          holder.put(key,0);      }        /**       * 向數組插入新數據       * @param data       * @param end       */      private void cache(T data, Integer end){          rightOffer(end);          value[0] = data;          holder.put(data,0);          count++;      }        /**       * 刪數組最後一位，並將新數據保存到數組開始       * @param data       */      private void removeAndCache(T data){          T key = value[--count];          holder.remove(key);          cache(data, count);      }          /**       * index左側的統一向右移動一位       * @param index       */      private void rightOffer(Integer index){          for (int i=index;i>0;i--){              value[i] = value[i-1];              holder.put(value[i],i);          }      }        /**       * 判斷數組是否已滿       * @return       */      private boolean isFull(){          return count == capacity;      }        @Override      public String toString() {          StringBuilder sb = new StringBuilder();          for (int i = 0; i < count; i++) {              sb.append(value[i]);              sb.append(" ");          }          return sb.toString();      }        public static void main(String[] args) {          LRUBasedArray<Integer> array = new LRUBasedArray();          Random random = new Random(20);          int num = 0;          for (int i=0;i<20;i++){              num = random.nextInt(20);              array.add(num);              PrintUtill.println("插入"+ num + ":");              PrintUtill.println(array.toString());          }      }    }

當然也可以首位置優先清理，末尾位置保存最新訪問數據，思想類似，這裡就不再贅述。

基於單鏈表

維護一個有序單鏈表，越靠近鏈表尾部的結點是越早之前訪問的。當有一個新的數據被訪問時，我們從鏈表頭開始順序遍歷鏈表。

如果此數據之前已經被快取在鏈表中了，我們遍歷得到這個數據對應的結點，並將其從原來的位置刪除，然後再插入到鏈表的頭部。
如果此數據沒有在快取鏈表中，又可以分為兩種情況：
- 如果此時快取未滿，則將此結點直接插入到鏈表的頭部；
- 如果此時快取已滿，則鏈表尾結點刪除，將新的數據結點插入鏈表的頭部。

public class LRUBaseLinkedList<T> {      /**       * 默認容量       */      private final int DEFAUL_CAPACITY = 10;        /**       * 頭結點       */      private SNode<T> head;      /**       * 鏈表長度       */      private Integer length;      /**       * 鏈表容量       */      private Integer capacipy;        public LRUBaseLinkedList() {          this.head = new SNode<>();          this.length = 0;          this.capacipy = DEFAUL_CAPACITY;      }        public LRUBaseLinkedList(Integer capacipy) {          this.head = new SNode<>();          this.length = 0;          this.capacipy = capacipy;      }        /**       *  快取數據       * @param data       */      public void add(T data){          SNode preNode = findPreNode(data);          if (preNode != null){              // 刪除節點              deleteElemOptim(preNode);          }else {              // 節點不存在，隊列是否已滿              if (length>=capacipy){                  // 已滿，刪除隊尾                  deleteElemAtEnd();              }          }          // 將節點插入到頭          intsertElemAtBegin(data);      }        /**       * 刪除某個結點       * @param node       */      public void deleteElemOptim(SNode node){          node.setNext(node.getNext().getNext());          length--;      }        /**       * 刪除鏈表尾部的結點       */      public void deleteElemAtEnd(){          SNode node = head;            if (node.getNext() == null) return;            while (node.getNext().getNext() != null){              node = node.getNext();          }          node.getNext().setNext(null);          length--;        }        /**       * 在頭部插入結點       * @param data       */      public void intsertElemAtBegin(T data){          SNode node = new SNode(data, head.getNext());          head.setNext(node);          length++;      }            /**       * 獲取查找到元素的前一個節點       * @param data       * @return       */      private SNode findPreNode(T data){         SNode node = head;         while (node.getNext() != null){             if (node.getNext().element.equals(data)){                 return node;             }             node = node.getNext();         }         return null;      }        private void printAll(){          SNode node = head.getNext();          while (node!=null){              PrintUtill.print(node.element+">");              node = node.getNext();          }          PrintUtill.printlnRule();        }            public class SNode<T> {          private T element;          private SNode next;            public SNode() {              this.next = null;          }            public SNode(T element, SNode next) {              this.element = element;              this.next = next;          }            public T getElement() {              return element;          }            public void setElement(T element) {              this.element = element;          }            public SNode getNext() {              return next;          }            public void setNext(SNode next) {              this.next = next;          }      }          public static void main(String[] args) {          LRUBaseLinkedList<Integer> list = new LRUBaseLinkedList<Integer>();          Random random = new Random(20);          int num = 0;          for (int i=0;i<20;i++){              num = random.nextInt(20);              list.add(num);              PrintUtill.println("插入"+ num + ":");              list.printAll();          }        }  }

延伸

什麼是快取

快取是一種提高數據讀取性能的技術，在硬體設計、軟體開發中都有著非常廣泛的應用，比如常見的 CPU 快取、資料庫快取、瀏覽器快取等等。

快取的大小有限，當快取被用滿時，哪些數據應該被清理出去，哪些數據應該被保留？這就需要快取淘汰策略來決定。

有哪些快取淘汰策略？

常見的策略有三種：先進先出策略 FIFO（First In，First Out）、最少使用策略 LFU（Least Frequently Used）、最近最少使用策略 LRU（Least Recently Used）。

什麼是鏈表

和數組一樣是一種線性表。
從記憶體結構來看，鏈表的記憶體結構是不連續的記憶體空間，是將一組零散的記憶體塊串聯起來，從而進行數據存儲的數據結構。
鏈表中的每一個記憶體塊被稱為結點Node。結點除了存儲數據外，還需記錄鏈上的下一個結點的地址，即後繼指針next。

常用鏈表

單鏈表、循環鏈表、雙向鏈表、雙向循環鏈表

單鏈表

每個結點只包含一個指針，即後繼指針。
單鏈表有兩個特殊結點，即頭結點和尾結點。頭結點用來記錄鏈表的基地址，可以遍歷得到整條鏈表。尾結點指向一個空地址 NULL，表示這是鏈表上最後一個結點。
插入和刪除結點的時間複雜度為O（1），查找的時間複雜度為O(n)。

循環鏈表

一種特殊的單鏈表。

它跟單鏈表唯一的區別是尾結點指針是指向鏈表的頭結點。
循環鏈表的優點是從鏈尾到鏈頭比較方便。適用於處理具有環型結構特點的數據，比如著名的約瑟夫問題。

雙向鏈表

結點除了存儲數據外，還有兩個指針分別指向前一個結點地址（前驅指針prev）和下一個結點地址（後繼指針next）。
頭結點的前驅指向為空，尾結點指向為空。
給定數據值查詢/刪除對應結點，雙向和單鏈表的時間複雜度均是O(n)；給定結點的查詢/刪除，雙向O(1)，單鏈表O(n)。

雙向循環鏈表

首節點的前驅指針指向尾節點，尾節點的後繼指針指向首節點。

用空間換時間的設計思想

對於執行較慢的程式，可以通過消耗更多的記憶體（空間換時間）來進行優化。消耗過多記憶體的程式，可以通過消耗更多的時間（時間換空間）來降低記憶體的消耗。

數組和鏈表比較：

鏈表插入、刪除數據效率高，時間複雜度O(1)，隨機訪問效率低，時間複雜度O(n)。
數組插入、刪除數據效率低，時間複雜度O(n)，隨機訪問效率高，時間複雜度O(1)。

數組簡單易用，在實現上使用的是連續的記憶體空間，可以藉助 CPU 的快取機制，預讀數組中的數據，所以訪問效率更高。而鏈表在記憶體中並不是連續存儲，所以對 CPU 快取不友好，沒辦法有效預讀。

鏈表本身沒有大小的限制，天然地支援動態擴容。

如果程式碼對記憶體的使用非常苛刻，那數組就更適合。

數組缺點

記憶體中沒有足夠的連續空間時，數組申請空間會失敗，導致記憶體不足(out of memory)。

數組大小固定，當不夠用時，需要擴容，一旦擴容就要進行數據複製，而這時非常費時。

鏈表缺點

記憶體消耗大，因為要消耗額外的空間存儲指針資訊。

對鏈表進行頻繁的插入和刪除操作，會導致頻繁的記憶體申請和釋放，容易造成記憶體碎片。Java語言中還可能會造成頻繁的GC（Garbage Collection，垃圾回收）。

參考資料

數據結構與演算法之美