部分常用算法分析總結

2019 年 10 月 10 日
筆記

算法與來源：

快速排序、廣度優先搜索、狄克斯特拉算法、貪婪算法、動態規劃、其它說明

來源：https://book.douban.com/subject/26979890/

快速排序

分而治之的思想

找到簡單的基線條件（遞歸退出條件）

確定縮小問題規模的方法，最終符合基線條件

例如對多個數進行求和時，考慮兩個數的求和方案，進而歸納三個數的求和時候，轉化為一個數與另兩個數（一個整體）的求和。最終確定多個數求和時候的方案。

再或者完成土地的方塊劃分問題：https://www.cnblogs.com/lshedward/p/10436874.html

快速排序的思路

分而治之的方法，首先考慮最小問題規模：1個數（排序完成，不需要排序）

2個數時：

首先尋找基準數，如第一個數字。然後以這個基準數字比較，確定另外一個數字的位置在基準左邊還是右邊。

3個數時：

首先尋找基準數，如第一個數字。然後以這個基準數字比較，確定另外兩個數字的位置在基準左邊還是右邊。若剩下兩個數字都在基準的左邊（或右邊），轉化為2個數時的問題；若左右都為一個數，轉化為1個數問題。

n個數時：

首先尋找基準數，如第一個數字。然後以這個基準數字比較，確定另外n-1個數字的位置在基準左邊還是右邊。然後遞歸縮小問題的規模。

代碼

def quick_sort(arr):      if len(arr) < 2:          return arr      else:          pivot = arr[0]          left=[i for i in arr[1:] if i<pivot]          right=[i for i in arr[1:] if i>pivot]          return quick_sort(left) + [mid] + quick_sort(right)

快速排序的複雜度與比較

參閱如：https://blog.csdn.net/iechenyb/article/details/81941022中表。

複雜度計算方法：

當n個元素排序時時候快速排序，則排序過程中，認為使用的排序層數為log(n)，每層需要的時間為n，總體時間複雜度nlog(n)

當快速排序的基準數不恰當時候，取得時間複雜度n*n

廣度優先搜索

廣度優先搜索解決的問題類似於圖結點中的最短路徑條數問題或能轉化為此類的問題。

問題提出

查找你人脈中的特定職業崗位的是否存在，並最短的問題。

參閱：https://www.jianshu.com/p/a537ffb9b1eb

如圖：找到THOM

需要的數據結構

隊列：用於判斷數據的位次，按照順序遍曆數據

散列表：查詢、插入速度快，能完成鍵值映射，以完成圖中的關係顯示。

問題思路

1：一度關係加入隊列，通過自己認識的人獲取他們的職業，對他們的職業進行判斷，完成一度關係搜索。

2：不是需要查找的人，則將二度關係插入到隊列，便於隨後搜索。

3：以此類推。

代碼

https://github.com/egonSchiele/grokking_algorithms/blob/master/06_breadth-first_search/python/01_breadth-first_search.py

因為是按照一度、二度關係依次檢索，則檢索過程中出現結果自然是最短路徑。

代碼首先構建了圖表關係，通過while然後循環查找與添加數據到隊列。

from collections import deque    def person_is_seller(name):        return name[-1] == 'm'    graph = {}  graph["you"] = ["alice", "bob", "claire"]  graph["bob"] = ["anuj", "peggy"]  graph["alice"] = ["peggy"]  graph["claire"] = ["thom", "jonny"]  graph["anuj"] = []  graph["peggy"] = []  graph["thom"] = []  graph["jonny"] = []    def search(name):      search_queue = deque()      search_queue += graph[name]      # This array is how you keep track of which people you've searched before.      searched = []      while search_queue:          person = search_queue.popleft()          # Only search this person if you haven't already searched them.          if person not in searched:              if person_is_seller(person):                  print(person + " is a mango seller!")                  return True              else:                  search_queue += graph[person]                  # Marks this person as searched                  searched.append(person)      return False  search("you")

狄克斯特拉算法

狄克斯特拉算法解決在加權圖中前往目的地的最優路徑（單向非負權）

問題提出

獲取從起點到終點的最優路徑

解決思路

1：從起點判斷到A、B的權值（更新後到A點6，到B點2）

2：以最小的權值B點判斷到A、終點的權值，並比較原先到A、終點的值，更小則做出更新（更新後到A點為5，到終點為7）

3：以A點出發，得到到終點的權值（5+1），比較原來到終點的權值（7），並更新，得到最優路徑和權值為6。

數據結構

通過散列嵌套散列，完成圖表的graph建設：

graph = {}  graph["start"] = {}  graph["start"]["a"] = 6  graph["start"]["b"] = 2    graph["a"] = {}  graph["a"]["fin"] = 1    graph["b"] = {}  graph["b"]["a"] = 3  graph["b"]["fin"] = 5    graph["fin"] = {}

最優權值表示，初始化costs並添值：

# the costs table    infinity = float("inf")  costs = {}  costs["a"] = 6  costs["b"] = 2  costs["fin"] = infinity

初始化parents表，用於建立圖最優路徑記錄

parents = {}  parents["a"] = "start"  parents["b"] = "start"  parents["fin"] = None

代碼

https://github.com/egonSchiele/grokking_algorithms/blob/master/07_dijkstras_algorithm/python/01_dijkstras_algorithm.py

processed表示已經處理的節點，將不再處理

find_lowest_cost_node函數遍歷找到此結點下對應的結點中最小的路徑權值結點

processed = []    def find_lowest_cost_node(costs):      lowest_cost = float("inf")      lowest_cost_node = None      for node in costs:          cost = costs[node]          if cost < lowest_cost and node not in processed:              lowest_cost = cost              lowest_cost_node = node      return lowest_cost_node      node = find_lowest_cost_node(costs)  # If you've processed all the nodes, this while loop is done.  while node is not None:      cost = costs[node]      # Go through all the neighbors of this node.      neighbors = graph[node]      for n in neighbors.keys():          new_cost = cost + neighbors[n]          # If it's cheaper to get to this neighbor by going through this node...          if costs[n] > new_cost:              costs[n] = new_cost              # This node becomes the new parent for this neighbor.              parents[n] = node      # Mark the node as processed.      processed.append(node)      # Find the next node to process, and loop.      node = find_lowest_cost_node(costs)    print("Cost from the start to each node:")  print(costs)

貪婪算法

貪婪算法通過局部最優解企圖獲得全局最優解，面臨NP完全問題，速度快，獲得近似最優解。

問題提出

旅行商問題

https://www.jianshu.com/p/478f6b1fe60f

思路

針對複雜很慢才能全局最優解的問題，採用每次找局部最可能，直到結束。

背包問題：當數個價值不同的大小不同的商品放入一定容量的背包，直接考慮其價值而不考慮大小，按步填充。獲取可能的最大價值。

等，都是貪婪算法。

動態規劃

使用網格來進行的算法解決問題，首先計算縮小的問題規模，進而確定最終問題規模。

問題提出

總4磅的容量裝入更高價值的物品

思路

繪製表格，找到最小的單元容量，然後一步步確定在不同容量下的最大價值，最終獲得總價值。

參閱：算法圖解書

其它

狄克斯特拉算法不能計算包含負邊權的圖，如果需要計算則需要貝爾曼–福德算法。

KNN算法的K一般取sqrt（N），N為總數量

MapReduce作為一個分佈式算法，在數據映射和歸併有良好效果。

布隆過濾器和HyperLogLog用于海量數據檢索過程中，使用概率型方法，得到極可能正確的檢索結果。