Deepwalk: Online Learning of Social Representations

Perozzi et al.



0. Definition

  • An algorithm that learns latent representations of vertices from vertex sequences generated by random walks

1. Situation

  • Traditional approaches learn a hypothesis H that maps elements of X to labels Y.
  • Recent advancements in language modeling and deep learning.

2. Background

2.1 Random walks

A random walk captures local information from the neighbors of a vertex.
Denote a random walk rooted at vertex $v_i$ as $W_{v_i}$: a stochastic process with random variables $W_{v_i}^{1}$, $W_{v_i}^{2}$, ... , $W_{v_i}^{k+1}$ such that $W_{v_i}^{k+1}$ is a vertex chosen uniformly at random from the neighbors of $W_{v_i}^{k}$.

Properties

  1. Local exploration is easy to parallelize. Several random walkers can simultaneously explore different parts of the same graph.
  2. Relying on information obtained from short random walks makes it possible to accommodate small changes in the graph structure without the need for global recomputation.
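Such a walk generator can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the adjacency-list representation and the function name `random_walk` are assumptions for the example.

```python
import random

def random_walk(graph, start, walk_length, rng=random):
    """Uniform random walk of up to `walk_length` vertices rooted at `start`.

    `graph` is an adjacency list, e.g. {0: [1, 2], 1: [0], 2: [0]}.
    At each step, move to a uniformly random neighbor of the current vertex.
    """
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph: a triangle (0, 1, 2) with a pendant vertex 3.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(g, start=0, walk_length=5)
```

Because each walker only reads local neighborhoods, many such walks can run in parallel over the same graph, which is exactly property 1 above.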

2.2 Language modeling

The goal of language modeling is to estimate the likelihood of a specific sequence of words appearing in a corpus. Given a sequence of words $W_1^n = (w_0, w_1, ... , w_n)$ where $w_i \in V$, maximize $Pr(w_n \mid w_0, w_1, \dots , w_{n-1})$ over all the training corpus. Here, they generalize this language modeling to learn representations of vertices, introducing a mapping function $\Phi : v \in V \mapsto \mathbb{R}^{|V| \times d}$. The problem is then to estimate the likelihood:

$Pr\Big(v_i \mid \Phi(v_1), \Phi(v_2), \dots, \Phi(v_{i-1})\Big)$

However, as the walk length grows, computing this objective function becomes infeasible.
Instead, Skip-gram uses one word to predict its context, where the context is composed of the words appearing to the right of the given word as well as to the left. Finally, it removes the ordering constraint on the problem.
So, the final optimization problem is:

$\underset{\Phi}{\text{minimize}} \; -\log Pr\big(\{v_{i-w}, \dots , v_{i-1}, v_{i+1}, \dots, v_{i+w}\} \mid \Phi(v_i)\big)$

These relaxations are desirable for social representation learning. 1. The order-independence assumption better captures the sense of 'nearness' provided by random walks. 2. They speed up training time by building small models, as one vertex is given at a time.

Solving the optimization problem builds representations that capture the shared similarities in local graph structure between vertices. Vertices which have similar neighborhoods will acquire similar representations.
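The window-based, order-free objective above amounts to emitting (center, context) pairs from each walk. A minimal sketch, assuming a walk is just a list of vertex ids; the helper name `skipgram_pairs` is illustrative, not from the paper:

```python
def skipgram_pairs(walk, window):
    """Emit (center, context) pairs from a walk.

    For each position i, the context is every vertex within `window`
    steps on either side; ordering inside the window is ignored, so
    both directions produce pairs symmetrically.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# A short walk over vertex ids; each pair becomes one Skip-gram example.
pairs = skipgram_pairs([5, 1, 8, 1, 4], window=1)
```

Vertices that co-occur in many windows across many walks are pushed toward similar representations, which is how the shared local structure gets captured.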


3. Approach

DeepWalk consists of two components: first, a random walk generator, and second, an update procedure.

Their preliminary results did not show any advantage to using restarts.
With Skip-gram, this posterior distribution can be learned using several choices of classifiers. However, a model with $|V|$ labels requires a huge amount of computational resources. To speed up training, hierarchical softmax can be used to approximate the probability distribution.

Hierarchical softmax

If we assign the vertices to the leaves of a binary tree, the prediction problem turns into maximizing the probability of a specific path in the tree. If the path to vertex $u_k$ is identified by a sequence of tree nodes $(b_0, b_1, \dots, b_{\lceil \log |V| \rceil})$, where $b_0$ is the root and $b_{\lceil \log |V| \rceil} = u_k$, then

$Pr(u_k \mid \Phi(v_j)) = \prod_{l=1}^{\lceil \log |V| \rceil} Pr(b_l \mid \Phi(v_j))$

Now, $Pr(b_l \mid \Phi(v_j))$ can be modeled by a binary classifier assigned to the parent of node $b_l$. This reduces the computational complexity of calculating $Pr(u_k \mid \Phi(v_j))$ from $O(|V|)$ to $O(\log |V|)$.
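The product of per-node binary decisions can be sketched directly. This is an illustrative toy, not the paper's code: `hs_probability`, the `(node_vector, direction)` path encoding, and the specific vectors are all assumptions, with each internal node scoring the embedding via a sigmoid of a dot product.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hs_probability(phi_v, path):
    """Probability of reaching one leaf of the hierarchical-softmax tree.

    `path` lists (node_vector, direction) pairs for the internal nodes on
    the root-to-leaf path; direction is +1 for one child and -1 for the
    other. Each internal node is a binary classifier on the embedding
    `phi_v`, and the leaf probability is the product of its decisions,
    so the cost is O(log |V|) dot products instead of O(|V|).
    """
    prob = 1.0
    for node_vec, direction in path:
        score = sum(a * b for a, b in zip(phi_v, node_vec))
        prob *= sigmoid(direction * score)
    return prob
```

Since $\sigma(-x) = 1 - \sigma(x)$, the probabilities over all leaves of the tree sum to 1, which is what makes this a valid approximation of the softmax.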


4. Advantages

  1. Online algorithm : new data does not require repeating the learning process from scratch
  2. Scalable : does not require huge memory and works well on large graphs
  3. Community aware : the distance between latent representations serves as a metric for evaluating the similarity of nodes

5. Limitations

  1. Works only on undirected, unweighted graphs
  2. Requires additional time for random walk sampling (though this can be pipelined)


[github]
https://github.com/phanein/deepwalk
