deepwalk、node2vec的高性能優化

  • 2021 年 2 月 20 日
  • AI

首先是graph的存儲問題,看了很多實現node2vec或deepwalk的library,大都是以networkx或是類似的Python class對象為graph的存儲形式進行開發,首先我們需要知道目前單機版的deepwalk和node2vec的主要的性能瓶頸在random walk部分,因為底層的word2vec基本上都是直接調用的gensim的word2vec(gensim的word2vec基於C++開發,目前新版的gensim的word2vec的fast version使用了cython代替了原來的一些邏輯實現,個人感覺從word2vec層面再去做優化太複雜了,也很難比gensim做的更好,除此之外另一個思路就是用tf.keras或torch來實現word2vec,借用gpu的算力來試圖擊敗gensim,然而看完這幾篇,我心灰意冷了:

Word2vec on GPU slower than CPU · Issue #13048 · tensorflow/tensorflowgithub.com圖標Does Gensim library support GPU acceleration?stackoverflow.com圖標//www.reddit.com/r/learnmachinelearning/comments/88eeua/why_is_gensim_word2vec_so_much_faster_than_keras/www.reddit.com

graph通常以相互引用的內存指針的方式存儲在內存中,這意味着每個節點在內存中都位於不同的位置,這種基於基於鏈表的思想,使得內存訪問的速度成為了主要的瓶頸,因為每次從一個節點移動到另一個節點,都需要在內存中尋址和查詢,一種更好的方式是把節點都打包到數組中,因為數據的內存地址一般是連續的,在內存訪問上的開銷要小很多;(這裡還沒有提到networkx本身用純python編寫導致的較低效的編譯效率問題,具體可見:

馬東什麼:為什麼python這麼慢?numba解決了什麼問題?zhuanlan.zhihu.com圖標

一個較好的解決方案就是使用scipy.sparse.csr_matrix對graph進行存儲和後續的訪問等,

csr_matrix可見下:

馬東什麼:scipy sparse 中稀疏矩陣的常見存儲方式zhuanlan.zhihu.com圖標

可以看到,通過將graph轉化為csr_matrix,利用其巧妙地存儲方式,可以大大提高節點和節點之間的訪問效率。

下面就來看一個非常nice的小眾的graph library,csr_graph和nodevectors,nodevectors基本上是對csr_graph做的封裝,底層的邏輯直接使用了csr_graph的接口api,所以直接研究csr_graph即可。

VHRanger/CSRGraphgithub.com圖標

整個csr graph的核心在於graph的重構:

class csrgraph():
    """
    This top level python class either calls external JIT'ed methods
        or methods from the JIT'ed internal graph
    """
    def __init__(self, data, nodenames=None, copy=True, threads=0):
        """
        A class for larger graphs.

        NOTE: this class tends to "steal data" by default.
            If you pass a numpy array/scipy matrix/csr graph
            Chances are this object will point to the same instance of data

        Parameters:
        -------------------

        data : The graph data. Can be one of:
            **NetworkX Graph**
            **Numpy dense matrix**
            **CSR Matrix**
            **(data, indices, indptr)**
            **CSRGraph object**

        nodenames (array of str or int) : Node names
            The position in this array should correspond with the node ID
            So if passing a CSR Matrix or raw data, it should be co-indexed
            with the matrix/raw data arrays

        copy : bool
            Wether to copy passed data to create new object
            Default behavior is to point to underlying ctor passed data
            For networkX graphs and numpy dense matrices we create a new object anyway
        threads : int
            number of threads to leverage for methods in this graph.
            WARNING: changes the numba environment variable to do it.
            Recompiles methods and changes it when changed.
            0 is numba default (usually all threads)

        TODO: add numpy mmap support for very large on-disk graphs
            Should be in a different class
            This also requires routines to read/write edgelists, etc. from disk
        
        TODO: subclass scipy.csr_matrix???
        """
        # If input is a CSRGraph, same object
        if isinstance(data, csrgraph):
            if copy:
                self.mat = data.mat.copy()
                self.names = deepcopy(data.names)
            else:
                self.mat = data.mat
                self.names = data.names
            if not nodenames:
                nodenames = self.names
            else:
                self.names = nodenames
        # NetworkX Graph input
        elif isinstance(data, (nx.Graph, nx.DiGraph)):
            mat = nx.adj_matrix(data)
            mat.data = mat.data.astype(np.float32)
            self.mat = mat
            nodenames = list(data)
        # CSR Matrix Input
        elif isinstance(data, sparse.csr_matrix):
            if copy: self.mat = data.copy()
            else: self.mat = data
        # Numpy Array Input
        elif isinstance(data, np.ndarray):
            self.mat = sparse.csr_matrix(data)
        else:
            raise ValueError(f"Incorrect input data type: {type(data).__name__}")
        # Now that we have the core csr_matrix, alias underlying arrays
        assert hasattr(self, 'mat')
        self.weights = self.mat.data
        self.src = self.mat.indptr
        self.dst = self.mat.indices
        # indptr has one more element than nnodes
        self.nnodes = self.src.size - 1
        # node name -> node ID
        if nodenames is not None:
            self.names = pd.Series(nodenames)
        else:
            self.names = pd.Series(np.arange(self.nnodes))
        # Bounds check once here otherwise there be dragons later
        max_idx = np.max(self.dst)
        if self.nnodes < max_idx:
            raise ValueError(f"""
                Out of bounds node: {max_idx}, nnodes: {self.nnodes}
            """)
        self.set_threads(threads)


    def set_threads(self, threads):
        self.threads = threads
        # Manage threading through Numba hack
        if type(threads) is not int:
            raise ValueError("Threads argument must be an int!")
        if threads == 0:
            threads = numba.config.NUMBA_DEFAULT_NUM_THREADS
        threads = str(threads)
        # If we change the number of threads, recompile
        try:
            prev_numba_value = os.environ['NUMBA_NUM_THREADS']
        except KeyError:
            prev_numba_value = threads
        if threads != prev_numba_value:
            os.environ['NUMBA_NUM_THREADS'] = threads
            _random_walk.recompile()
            _row_norm.recompile()
            _node2vec_walks.recompile()
            _node_degrees.recompile()
            _src_multiply.recompile()
            _dst_multiply.recompile()

    def __getitem__(self, node):
        """
        [] operator
        like networkX, gets names of neighbor nodes
        """
        # Get node ID from names array
        # This is O(n) by design -- we more often get names from IDs
        #    than we get IDs from names and we don't want to hold 2 maps
        # TODO : replace names with a pd.Index and use get_loc
        node_id = self.names[self.names == node].index[0]
        edges = self.dst[
            self.src[node_id] : self.src[node_id+1]
        ]
        return self.names.iloc[edges].values

    def nodes(self):
        """
        Returns the graph's nodes, in order
        """
        if self.names is not None:
            return self.names
        else:
            return np.arange(self.nnodes)

    def normalize(self, return_self=True):
        """
        Normalizes edge weights per node

        For any node in the Graph, the new edges' weights will sum to 1

        return_self : bool
            whether to change the graph's values and return itself
            this lets us call `G.normalize()` directly
        """
        new_weights = _row_norm(self.weights, self.src)
        if return_self:
            self.mat = sparse.csr_matrix((new_weights, self.dst, self.src))
            # Point objects to the correct places
            self.weights = self.mat.data
            self.src = self.mat.indptr
            self.dst = self.mat.indices
            gc.collect()
            return self
        else:
            return csrgraph(sparse.csr_matrix(
                (new_weights, self.dst, self.src)), 
                nodenames=self.names)

    def random_walks(self,
                walklen=10,
                epochs=1,
                start_nodes=None,
                normalize_self=False,
                return_weight=1.,
                neighbor_weight=1.):
        """
        Create random walks from the transition matrix of a graph
            in CSR sparse format

        Parameters
        ----------
        T : scipy.sparse.csr matrix
            Graph transition matrix in CSR sparse format
        walklen : int
            length of the random walks
        epochs : int
            number of times to start a walk from each nodes
        return_weight : float in (0, inf]
            Weight on the probability of returning to node coming from
            Having this higher tends the walks to be
            more like a Breadth-First Search.
            Having this very high  (> 2) makes search very local.
            Equal to the inverse of p in the Node2Vec paper.
        explore_weight : float in (0, inf]
            Weight on the probability of visitng a neighbor node
            to the one we're coming from in the random walk
            Having this higher tends the walks to be
            more like a Depth-First Search.
            Having this very high makes search more outward.
            Having this very low makes search very local.
            Equal to the inverse of q in the Node2Vec paper.
        threads : int
            number of threads to use.  0 is full use

        Returns
        -------
        out : 2d np.array (n_walks, walklen)
            A matrix where each row is a random walk,
            and each entry is the ID of the node
        """
        # Make csr graph
        if normalize_self:
            self.normalize(return_self=True)
            T = self
        else:
            T = self.normalize(return_self=False)
        n_rows = T.nnodes
        if start_nodes is None:
            start_nodes = np.arange(n_rows)
        sampling_nodes = np.tile(start_nodes, epochs)
        # Node2Vec Biased walks if parameters specified
        if (return_weight > 1. or return_weight < 1.
                or neighbor_weight < 1. or neighbor_weight > 1.):
            walks = _node2vec_walks(T.weights, T.src, T.dst,
                                    sampling_nodes=sampling_nodes,
                                    walklen=walklen,
                                    return_weight=return_weight,
                                    neighbor_weight=neighbor_weight)
        # much faster implementation for regular walks
        else:
            walks = _random_walk(T.weights, T.src, T.dst,
                                 sampling_nodes, walklen)
        return walks

初始化init部分的邏輯很簡單,就是把graph用scipy.sparse.csr_matrix的方式存儲,需要注意的是,csr matrix僅支持有向/無向+有權/無權的簡單圖的表示,如果是複雜的屬性圖,例如存在多重的邊關係或者是多個節點類型,則表示起來比較困難,不過常見的非GNN的graph embedding(以及排除了matapath2vec)之外大部分的graph embedding算法都是在簡單圖上運行的。

首先注意一個細節部分就是,我們要對邊的權重做標準化norm,

new_weights = _row_norm(self.weights, self.src)

其實現邏輯為:

@jit(nopython=True, parallel=True, nogil=True, fastmath=True)
def _row_norm(weights, src):
    """
    Returns the weights for normalized rows in a CSR Matrix.
    
    Parameters
    ----------
    weights : array[float]
        The data array from a CSR Matrix. 
        For a scipy.csr_matrix, is accessed by M.data
  
    src : array[int]
        The index pointer array from a CSR Matrix. 
        For a scipy.csr_matrix, is accessed by M.indptr
        
    ----------------
    returns : array[float32]
        The normalized data array for the CSR Matrix
    """
    n_nodes = src.size - 1
    res = np.empty(weights.size, dtype=np.float32)
    for i in numba.prange(n_nodes):
        s1 = src[i]
        s2 = src[i+1]
        rowsum = np.sum(weights[s1:s2])
        res[s1:s2] = weights[s1:s2] / rowsum
    return res

可以看到,這裡的邏輯就是把每一個節點的權重進行歸一化,即每一個節點的和它連接的邊的權重的權重之和為1,這裡的實現就是每一行所有非零值初一這一行的非0值的總和。

可以看到,這裡使用了numba和numba.prange進行了加速,需要注意的是,numba目前對於numpy的支持是最完善的,但是numba不支持scipy sparse!因此,我們將csr matrix的三要素:

indices,indtpr,data用三個numpy array來存儲,從而使用numba來間接優化scipy sparse,代碼中的src是indtpr,

這裡還是說明一下,如上圖,index pointers即indptr,第i個節點具有的鄰節點個數就是indptr[i+1]-indptr[1],第i個節點的鄰節點的集合的列索引就是 indptr[i:i+1],例如對於節點4來說,indptr[4:5]=3:6=[3,4,5],然後去indices中查找,indices[3,4,5]=[2,3,4],此時我們就知道了 第四行的第2,3,4列(注意,初始都是從0開始的)是有值的,因此節點4的鄰節點就是節點2和節點3(節點4和本身存在自環self loop),通過這種方式可以非常迅速地找到某個節點的鄰節點,如果要找尋二階鄰節點,我們可以通過numba.prange再寫一個循環,比如對於節點4的鄰節點[2,3],我們循環對csr matrix矩陣的第二行和第三行進行同樣的查詢就可以了,訪問非常的迅速,主要原因在於csr matrix本質上三個連續內存地址的數組,內存訪問的效率很高。

對於numba,這裡的裝飾器設置:

@jit(nopython=True, parallel=True, nogil=True, fastmath=True)

比較固定,一般設置成這樣就行,如果不涉及到並行則parallel可以不設置。

然後是核心的random walk部分:

    def random_walks(self,
                walklen=10,
                epochs=1,
                start_nodes=None,
                normalize_self=False,
                return_weight=1.,
                neighbor_weight=1.):
        """
        Create random walks from the transition matrix of a graph
            in CSR sparse format

        Parameters
        ----------
        T : scipy.sparse.csr matrix
            Graph transition matrix in CSR sparse format
        walklen : int
            length of the random walks
        epochs : int
            number of times to start a walk from each nodes
        return_weight : float in (0, inf]
            Weight on the probability of returning to node coming from
            Having this higher tends the walks to be
            more like a Breadth-First Search.
            Having this very high  (> 2) makes search very local.
            Equal to the inverse of p in the Node2Vec paper.
        explore_weight : float in (0, inf]
            Weight on the probability of visitng a neighbor node
            to the one we're coming from in the random walk
            Having this higher tends the walks to be
            more like a Depth-First Search.
            Having this very high makes search more outward.
            Having this very low makes search very local.
            Equal to the inverse of q in the Node2Vec paper.
        threads : int
            number of threads to use.  0 is full use

        Returns
        -------
        out : 2d np.array (n_walks, walklen)
            A matrix where each row is a random walk,
            and each entry is the ID of the node
        """
        # Make csr graph
        if normalize_self:
            self.normalize(return_self=True)
            T = self
        else:
            T = self.normalize(return_self=False)
        n_rows = T.nnodes
        if start_nodes is None:
            start_nodes = np.arange(n_rows)
        sampling_nodes = np.tile(start_nodes, epochs)
        # Node2Vec Biased walks if parameters specified
        if (return_weight > 1. or return_weight < 1.
                or neighbor_weight < 1. or neighbor_weight > 1.):
            walks = _node2vec_walks(T.weights, T.src, T.dst,
                                    sampling_nodes=sampling_nodes,
                                    walklen=walklen,
                                    return_weight=return_weight,
                                    neighbor_weight=neighbor_weight)
        # much faster implementation for regular walks
        else:
            walks = _random_walk(T.weights, T.src, T.dst,
                                 sampling_nodes, walklen)
        return walks

可以看到,為了實現使用numba加速,基本上需要把所有的變量都以numpy形式存儲,例如range使用np.arange,sampling_nodes = np.tile(start_nodes, epochs)等,numba目前也勉強支持list,dict這類的python數據結構,但是本身numpy的性能和擴展性都要好很多,而且也更加成熟,所以還是建議走numpy不容易各種bug。

最後就進入核心的核心了:

        if (return_weight > 1. or return_weight < 1.
                or neighbor_weight < 1. or neighbor_weight > 1.):
            walks = _node2vec_walks(T.weights, T.src, T.dst,
                                    sampling_nodes=sampling_nodes,
                                    walklen=walklen,
                                    return_weight=return_weight,
                                    neighbor_weight=neighbor_weight)
        # much faster implementation for regular walks
        else:
            walks = _random_walk(T.weights, T.src, T.dst,
                                 sampling_nodes, walklen)

首先來看下_random_walk

@(nopython=True, parallel=True, nogil=True, fastmath=True)
def _random_walk(weights, indptr, dst,
                 sampling_nodes, walklen):
    """
    Create random walks from the transition matrix of a graph 
        in CSR sparse format

    Parameters
    ----------
    weights : 1d np.array  weights表示的是csr matrix的直接取值,使用csr matrix
存儲的話,會有3個id array矩陣,分別是indices,indptr,data,含義在之前關於scipy的文章里
這裡的weights對應的是data。
        CSR data vector from a sparse matrix. Can be accessed by M.data
    indptr : 1d np.array  這裡的indptr對應的是csr matrix的indptr
        CSR index pointer vector from a sparse matrix. 
        Can be accessed by M.indptr
    dst : 1d np.array 這裡的dst對應的是csr matrix的indices
        CSR column vector from a sparse matrix. 
        Can be accessed by M.indices
    sampling_nodes : 1d np.array of int  samling nodes為我們要採樣的節點,如果沒有指定需要採樣的
特定的節點則默認是對整個graph的所有節點進行採樣
        List of node IDs to start random walks from.
        If doing sampling from walks, is equal to np.arrange(n_nodes) 
            repeated for each epoch
    walklen : int   random walks的長度,遊走的次數在更上層的函數里,用epochs來表示,這裡實際上是

        length of the random walks

    Returns
    -------
    out : 2d np.array (n_walks, walklen)
        A matrix where each row is a random walk, 
        and each entry is the ID of the node
    """
    n_walks = len(sampling_nodes) ##遊走的次數
    res = np.empty((n_walks, walklen), dtype=np.uint32) #用於存放遊走的結果序列
    n_nodes = indptr.size #節點數量
    n_edges = weights.size #邊數量
    for i in numba.prange(n_walks):#對每個節點實施下面的邏輯,這裡對每個節點的遊走採樣
#是完全獨立可並行的,所以用numba的prange來自動實現對循環的並行,非常的直觀方便
        # Current node (each element is one walk's state)
        state = sampling_nodes[i]
        for k in range(walklen-1): #遊走n次則進行n次循環,這裡的循環時順序執行的,所以無法並行
            # Write state
            res[i, k] = state # 初始節點,即start node
            # Find row in csr indptr
            start = indptr[state] #通過indptr來找到start node的鄰節點
# 前面已經提到過csr matrix的indptr如何訪問鄰節點的問題了
            end = indptr[state+1]
            # If there are edges in the node, find next step
            if start != end:#如果start!=end,這裡用於判斷是否無法繼續遊走(即遊走到的節點沒有鄰
節點或者其鄰節點在上一步已經遊走過了),如果
#可以繼續則繼續執行下面的代碼
              # transition probabilities
              p = weights[start:end] #根據indptr定位start node的鄰節點,通過weights
#和indptr返回的列索引定位到start node和其鄰節點之間的edges的權重的具體的值
              # cumulative distribution of transition probabilities
              cdf = np.cumsum(p)
              # Random draw in [0, 1] for each row
              # Choice is where random draw falls in cumulative distribution
              draw = np.random.rand()
              # Find where draw is in cdf
              # Then use its index to update state
              next_idx = np.searchsorted(cdf, draw) #上面通過簡單的三個np函數就實現了
#隨機採樣,注意這裡實現的是deepwalk的無偏簡單採樣,所以邏輯很簡單,searchsorted在這個
#下非常合適,其作用
#在於在數組a中插入數組v(並不執行插入操作),返回一個下標列表,
#這個列表指明了v中對應元素應該插入在a中那個位置上,通過這種方式用numpy就實現了隨機採樣的簡單邏輯了



              # Winner points to the column index of the next node
              state = dst[start + next_idx]#此時我們的start node發生改變,這裡涉及到下一個節點的
#尋址,可以看到,我們通過dst,即indices的列表可以非常方便的定位到下一個節點的行位置。

            # If there are no edges, this is the end of the walk
            else:
              res[i, k:] = state #存放採樣結果
              break
        # Write final states
        res[i, -1] = state#存放最終採樣結果,即最後一個採樣的節點
    return res

真是越來越愛numba了。。。主要在於相對於cython來說,numba的使用更簡單,你基本上用numpy的邏輯來寫就行,對於numpy的死忠粉來說幾乎無縫銜接。

這裡的思路也不複雜,為了方便注釋直接寫代碼里


下面i 看看node2vec的邏輯,原作者還沒有實現rejective sampling有偏採樣邏輯,具體在issues裏面有提到過,不過這裡有一個類似的版本中也是用numba實現了rejective sampling:

Accelerating node2vec with rejection samplinglouisabraham.github.io圖標//github.com/louisabraham/fastnode2vecgithub.com

感覺這兩個library可以做一個合併了。。。

import numpy as np
from numba import jit


@jit(nogil=True,nopython=True,parallel=True,fastmath=True)
def random_walk(walk_length, p, q, t):
    """sample a random walk starting from t
    """
    # Normalize the weights to compute rejection probabilities
    max_prob = max(1 / p, 1, 1 / q)
    prob_0 = 1 / p / max_prob
    prob_1 = 1 / max_prob
    prob_2 = 1 / q / max_prob

    # Initialize the walk
    walk = np.empty(walk_length, dtype=indices.dtype)
    walk[0] = t
    walk[1] = random_neighbor(t)

    for j in numba.prange(2, walk_length):
        while True:
            new_node = random_neighbor(walk[j - 1])
            r = np.random.rand()
            if new_node == walk[j - 2]:
                # back to the previous node
                if r < prob_0:
                    break
            elif is_neighbor(walk[j - 2], new_node):
                # distance 1
                if r < prob_1:
                    break
            elif r < prob_2:
                # distance 2
                break
        walk[j] = new_node
        
    return walk

我們看一下node2vec的有偏採樣的邏輯,csrgraph怎麼寫的:

@jit(nopython=True, nogil=True, parallel=True, fastmath=True)
def _node2vec_walks(Tdata, Tindptr, Tindices,
                    sampling_nodes,
                    walklen,
                    return_weight,
                    neighbor_weight):
    """
    Create biased random walks from the transition matrix of a graph 
        in CSR sparse format. Bias method comes from Node2Vec paper.
    
    Parameters
    ----------
    Tdata : 1d np.array  casrmatrix的data
        CSR data vector from a sparse matrix. Can be accessed by M.data
    Tindptr : 1d np.array casrmatrix的indptr
        CSR index pointer vector from a sparse matrix. 
        Can be accessed by M.indptr
    Tindices : 1d np.array casrmatrix的indices
        CSR column vector from a sparse matrix. 
        Can be accessed by M.indices
    sampling_nodes : 1d np.array of int 採樣節點,默認是全部節點
        List of node IDs to start random walks from.
        Is generally equal to np.arrange(n_nodes) repeated for each epoch
    walklen : int 隨機遊走的步數
        length of the random walks
    return_weight : float in (0, inf]  返回權重,對應node2vec論文中的p
        Weight on the probability of returning to node coming from
        Having this higher tends the walks to be 
        more like a Breadth-First Search.
        Having this very high  (> 2) makes search very local.
        return weights如果大於2則遊走非常接近局部
        Equal to the inverse of p in the Node2Vec paper.
    explore_weight : float in (0, inf] # 探索權重,對應node2vec論文中的q
        Weight on the probability of visitng a neighbor node
        to the one we're coming from in the random walk
        Having this higher tends the walks to be 
        more like a Depth-First Search.
        Having this very high makes search more outward.
        Having this very low makes search very local.
        Equal to the inverse of q in the Node2Vec paper.
    Returns
    -------
    out : 2d np.array (n_walks, walklen)
        A matrix where each row is a biased random walk, 
        and each entry is the ID of the node
    """
    n_walks = len(sampling_nodes)
    res = np.empty((n_walks, walklen), dtype=np.uint32)
    for i in numba.prange(n_walks):
        # Current node (each element is one walk's state)
        state = sampling_nodes[i]
        res[i, 0] = state
        # Do one normal step first
        state = _node2vec_first_step(state, Tdata, Tindices, Tindptr)
        for k in range(1, walklen-1):
            # Write state
            res[i, k] = state
            state = _node2vec_inner(
                res, i, k, state,
                Tdata, Tindices, Tindptr, 
                return_weight, neighbor_weight
            )
        # Write final states
        res[i, -1] = state
    return res

這裡實現了兩個函數,一個是_node2vec_first_step用於有偏遊走的初始化,一個是_node2vec_inner用於有偏遊走後續的實現

其中:

@jit(nopython=True, nogil=True, fastmath=True)
def _node2vec_first_step(state, Tdata, Tindices, Tindptr):
    """
    Inner code for node2vec walks
    Normal random walk step
    Comments for this logic are in _random_walk
    """
    start = Tindptr[state]
    end = Tindptr[state+1]
    p = Tdata[start:end]
    cdf = np.cumsum(p)
    draw = np.random.rand()
    next_idx = np.searchsorted(cdf, draw)
    state = Tindices[start + next_idx]
    return state

這裡和deepwalk的邏輯是一樣的,實際上有偏的定義是從初始遊走之後的第一個遊走才開始的,這一點建議對照論文的圖仔細看看:

馬東什麼:Node2vec 深入分析zhuanlan.zhihu.com圖標

也就是初始先隨機走一步,然後開始下面正式的有偏遊走:

@jit(nopython=True, nogil=True, fastmath=True)
def _node2vec_inner(
    res, i, k, state,
    Tdata, Tindices, Tindptr, 
    return_weight, neighbor_weight):
    """
    Inner loop core for node2vec walks
    Does the biased walk updating (pure function)
    
    All arguments are directly from the node2vec walks method
    """
    # Find rows in csr indptr
    prev = res[i, k-1]
    start = Tindptr[state]
    end = Tindptr[state+1]
    start_prev = Tindptr[prev]
    end_prev = Tindptr[prev+1]
    # Find overlaps and fix weights
    this_edges = Tindices[start:end]
    prev_edges = Tindices[start_prev:end_prev]
    p = np.copy(Tdata[start:end])
    ret_idx = np.where(this_edges == prev)
    p[ret_idx] = np.multiply(p[ret_idx], return_weight)
    for pe in prev_edges:
        n_idx_v = np.where(this_edges == pe)
        n_idx = n_idx_v[0]
        p[n_idx] = np.multiply(p[n_idx], neighbor_weight)
    # Get next state
    cdf = np.cumsum(np.divide(p, np.sum(p)))
    draw = np.random.rand()
    next_idx = np.searchsorted(cdf, draw)
    new_state = this_edges[next_idx]
    return new_state

這裡 思路其實是把p和q的參數和edges的權重進行相乘然後轉化為常規的deepwalk的決策過程:

    cdf = np.cumsum(np.divide(p, np.sum(p)))
    draw = np.random.rand()
    next_idx = np.searchsorted(cdf, draw)

學好numpy,做一些性能不錯的開發工作還是很ok的~一些不是很複雜的需求,用numpy+numba或者numpy+numba或者numpy+numba+cython的性能並不會比c++差。