1597272360

*This article walks through an implementation of Graph Convolutional Networks (GCN) using the Spektral API, a Python library for graph deep learning based on TensorFlow 2. We are going to perform semi-supervised node classification on the CORA dataset, similar to the work presented in the original GCN paper by Thomas Kipf and Max Welling (2017).*

*If you want a basic understanding of Graph Convolutional Networks, it is recommended to read the first and second parts of this series beforehand.*

The CORA citation network dataset consists of **2708 nodes**, where each node represents a document or a technical paper. The node features are a bag-of-words representation that indicates the presence of each word in the document. The vocabulary, and hence the node feature vector, contains **1433** words.
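As a rough illustration of the binary bag-of-words features described above, the sketch below builds a presence vector for one document. The five-word vocabulary and the document text are made up for the example; CORA ships its 1433-dimensional vectors precomputed.

```python
# Hypothetical 5-word vocabulary; CORA's real vocabulary has 1433 words.
vocabulary = ["graph", "neural", "network", "genetic", "bayesian"]


def bag_of_words(text, vocab):
    """Return a binary vector: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocab]


# Hypothetical document, used only to show the shape of a node feature vector.
document = "A neural network that operates on graph data"
features = bag_of_words(document, vocabulary)
print(features)  # [1, 1, 1, 0, 0]
```

Each node in the graph would carry one such vector, so the full feature matrix has one row per document and one column per vocabulary word.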

Illustration of bag-of-words as node features, via source

We will treat the dataset as an **undirected graph** where an edge indicates that one document cites the other, or vice versa. There are no edge features in this dataset. The goal of this task is to classify the nodes (the documents) into 7 different classes, which correspond to the papers’ research areas. This is a single-label multi-class classification problem with the **Single Mode** data representation setting.
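To make the "undirected" treatment concrete, here is a minimal sketch that turns directed citation pairs into a symmetric adjacency matrix. The node indices and citation pairs are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def build_adjacency(num_nodes, citations):
    """Build a symmetric 0/1 adjacency matrix from (citing, cited) index pairs."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for citing, cited in citations:
        A[citing][cited] = 1
        A[cited][citing] = 1  # drop the direction: "cites" and "is cited by" look the same
    return A


# Hypothetical citations among 4 papers: paper 0 cites 1, paper 2 cites 1, paper 3 cites 2.
citations = [(0, 1), (2, 1), (3, 2)]
A = build_adjacency(4, citations)
for row in A:
    print(row)
```

For a graph of CORA's size, a sparse matrix format would be the more practical choice, since most entries are zero.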

This implementation is also an example of transductive learning, where the neural network sees all of the data, including the test nodes, during training (though not the test labels). This is in contrast to inductive learning, the typical supervised learning setting, where the test data is kept separate during training.

Since we are going to classify documents based on their textual features, a common machine learning way to look at this problem is to see it as a supervised text classification problem. With this approach, the machine learning model learns each document’s hidden representation based only on its own features.
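One common way to realise this transductive setup is with boolean node masks: the model produces predictions for every node, but only the nodes in the training mask contribute to the loss. The sketch below assumes that approach; the probabilities, labels, and masks are hypothetical values chosen for illustration.

```python
import math


def masked_cross_entropy(probs, labels, mask):
    """Average cross-entropy over the nodes whose mask entry is True."""
    losses = [-math.log(p[y]) for p, y, m in zip(probs, labels, mask) if m]
    return sum(losses) / len(losses)


# Hypothetical predicted class probabilities for 4 nodes and 2 classes.
probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.3, 0.7]]
labels = [0, 1, 0, 1]

# All 4 nodes pass through the network, but the labels are split by mask.
train_mask = [True, True, False, False]
test_mask = [False, False, True, True]

train_loss = masked_cross_entropy(probs, labels, train_mask)  # used for the gradient
test_loss = masked_cross_entropy(probs, labels, test_mask)    # used for evaluation only
print(train_loss, test_loss)
```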

Illustration of text classification approach on a document classification problem (image by author)

This approach might work well if there are enough labeled examples for each class. Unfortunately, in real-world cases, labeling data can be expensive.

*What is another approach to solve this problem?*

Besides its own text content, a technical paper normally also cites other related papers. Intuitively, the cited papers are likely to belong to a similar research area.

In this citation network dataset, we want to leverage the citation information from each paper in addition to its own textual content. Hence, the dataset has now turned into a network of papers.

Illustration of citation network dataset with partly labeled data (image by author)

Using this configuration, we can utilize Graph Neural Networks, such as Graph Convolutional Networks (GCNs), to build a model that learns the documents’ interconnections in addition to their own textual features. The GCN model learns each node’s (or document’s) hidden representation based not only on its own features, but also on its neighboring nodes’ features. Hence, we can reduce the number of labeled examples needed and implement semi-supervised learning by utilizing the **Adjacency Matrix (A)**, which describes the nodes’ connectivity within the graph.

Another case where Graph Neural Networks might be useful is when each example does not have distinct features on its own, but the relations between the examples can enrich the feature representations.

#machine-learning #neural-networks #deep-learning #data-science #artificial-intelligence #deep learning

1595691960

In this post, we’re gonna take a close look at one of the well-known *Graph neural networks* named *GCN.* First, we’ll get the intuition to see how it works, then we’ll go deeper into the maths behind it.

Many problems are graph problems by nature. Much of the data we see in the real world is graph-structured, such as molecules, social networks, and paper citation networks.

Examples of graphs. (Picture from [1])

- Node classification: Predict a type of a given node
- Link prediction: Predict whether two nodes are linked
- Community detection: Identify densely linked clusters of nodes
- Network similarity: How similar are two (sub)networks

In the graph, we have node features (the data of nodes) and the structure of the graph (how nodes are connected).

For the former, we can easily get the data from each node. But when it comes to the structure, it is not trivial to extract useful information from it. For example, if 2 nodes are close to one another, should we treat them differently to other pairs? How about high and low degree nodes? In fact, each specific task can consume a lot of time and effort just for Feature Engineering, i.e., to distill the structure into our features.
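To give a feel for this kind of manual feature engineering, the sketch below hand-crafts two structural features, node degree and the number of common neighbors, and appends the degree to each node's own features. The graph and the feature values are hypothetical, and these two features are just examples of what one might engineer.

```python
def degree(A, i):
    """Number of neighbors of node i in a 0/1 adjacency matrix."""
    return sum(A[i])


def common_neighbors(A, i, j):
    """Number of nodes adjacent to both i and j (a simple closeness signal)."""
    return sum(1 for k in range(len(A)) if A[i][k] and A[j][k])


# Hypothetical undirected graph with edges 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical per-node features (the "data of nodes").
X = [[1, 0], [0, 1], [1, 1], [0, 0]]

degrees = [degree(A, i) for i in range(len(A))]
# Distil a bit of structure into the features by appending the degree.
engineered = [x + [d] for x, d in zip(X, degrees)]

print(degrees)                    # [2, 2, 3, 1]
print(common_neighbors(A, 0, 1))  # 1
print(engineered[2])              # [1, 1, 3]
```

Every new task tends to need a different set of such features, which is the cost that representation learning tries to avoid.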

Feature engineering on graphs. (Picture from [1])

It would be much better to somehow take both the node features and the structure as input, and let the machine figure out by itself what information is useful.

That’s why we need Graph Representation Learning.

We want the model to learn the “feature engineering” by itself. (Picture from [1])

**Paper:** **Semi-supervised Classification with Graph Convolutional Networks (2017) [3]**

**GCN** is a type of **convolutional neural network** that **can work directly on graphs** and take advantage of their structural information.

It solves the problem of classifying nodes (such as documents) in a graph (such as a citation network), where labels are only available for a small subset of nodes (semi-supervised learning).

Example of semi-supervised learning on graphs. Some nodes don’t have labels (unknown nodes).

#graph-neural-networks #graph-convolution-network #deep-learning #neural-networks

1602410400

A typical feedforward neural network takes the features of each data point as input and outputs a prediction. The network is trained using the features and the label of each data point in the training set. Such a framework has been shown to be very effective in a variety of applications, such as face identification, handwriting recognition, and object detection, where no explicit relationships exist between data points. However, in some use cases, the prediction for a data point *v*(*i*) can be determined not only by its own features but also by the features of other data points *v*(*j*), when the relationship between *v*(*i*) and *v*(*j*) is given. For example, the topic of a journal paper (e.g., computer science, physics, or biology) can be inferred from the frequency of words appearing in the paper. On the other hand, the references in a paper can also be informative when predicting its topic. In this example, we know not only the features of each individual data point (the word frequencies) but also the relationships between the data points (the citation relations). So how can we combine them to increase the accuracy of the prediction?

By applying graph convolutional networks (GCN), the features of an individual data point and its connected data points will be combined and fed into the neural network. Let’s use the paper classification problem again as an example. In a citation graph (Fig. 1), each paper is represented by a vertex in the citation graph. The edges between the vertices represent the citation relationships. For simplicity, the edges are treated as undirected. Each paper and its feature vector are denoted as *v_i* and *x_i* respectively. Following the GCN model by Kipf and Welling [1], we can predict the topics of papers using a neural network with one hidden layer with the following steps:

Figure 1. (Image by Author) The architecture of graph convolutional networks. Each vertex *v_i* represents a paper in the citation graph. *x_i* is the feature vector of *v_i*. W(0) and W(1) are the weight matrices of the 3-layer neural network. A, D, and I are the adjacency matrix, outdegree matrix, and identity matrix respectively. The horizontal and vertical propagations are highlighted in orange and blue respectively.

In the above workflow, steps 1 and 4 perform horizontal propagation, where the information of each vertex is propagated to its neighbors, while steps 2 and 5 perform vertical propagation, where the information on each layer is propagated to the next layer (see Fig. 1). For a GCN with multiple hidden layers, there will be multiple iterations of horizontal and vertical propagation. It is worth noting that each time horizontal propagation is performed, the information of a vertex is propagated one hop further on the graph. In this example, horizontal propagation is performed twice (steps 1 and 4), so the prediction for each vertex depends not only on its own features, but also on the features of all the vertices within a 2-hop distance from it. Additionally, since the weight matrices W(0) and W(1) are shared by all the vertices, the size of the neural network does not have to increase with the graph size, which makes this approach scalable.
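The two kinds of propagation can be sketched as a forward pass in plain Python. This is a minimal sketch, assuming the renormalised propagation from Kipf and Welling, D^-1/2 (A + I) D^-1/2, with ReLU on the hidden layer and softmax on the output. The helper names, the toy graph, and the weight values are hypothetical, and no training is shown.

```python
import math


def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def normalize_adjacency(A):
    """Return D^-1/2 (A + I) D^-1/2, where D holds the row sums of A + I."""
    n = len(A)
    A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def relu(M):
    return [[max(0.0, v) for v in row] for row in M]


def softmax_rows(M):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(A, X, W0, W1):
    """Forward pass of a GCN with one hidden layer."""
    A_hat = normalize_adjacency(A)
    # Horizontal propagation (mix neighbor features), then vertical (apply W0).
    H = relu(matmul(matmul(A_hat, X), W0))
    # Second horizontal + vertical propagation, giving per-vertex class probabilities.
    return softmax_rows(matmul(matmul(A_hat, H), W1))


# Hypothetical path graph 0 - 1 - 2 with 2 features per vertex and 2 classes.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W0 = [[0.5, -0.2], [0.1, 0.3]]   # shared by all vertices
W1 = [[1.0, -1.0], [0.5, 0.5]]   # shared by all vertices

Z = gcn_forward(A, X, W0, W1)
for row in Z:
    print(row)  # one probability distribution per vertex
```

Note that the shapes of W0 and W1 depend only on the feature, hidden, and class dimensions, never on the number of vertices, which is the scalability point made above.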

#classification #machine-learning #graph-convolution-network #semi-supervised-learning #graph-database

1604127900

This blog post will summarise the paper “Simplifying Graph Convolutional Networks” [1], which tries to reverse engineer Graph Convolutional Networks. So, let us evolve Graph Convolutional Networks backward.

Graphs are pervasive models of structure. They are everywhere, from social networks to chemical molecules, and many things can be represented as graphs. However, applying machine learning to these structures did not come to us directly. Most things in machine learning started from a small, simple idea or model that was made more complex over time as needed. For example, we initially had the Perceptron, which evolved into the Multi-Layer Perceptron; similarly, we had image filters that evolved into non-linear CNNs, and so on. Graph Convolutional Networks (GCNs), however, were derived directly from existing ideas and had a more complex start. Thus, to demystify GCNs, the paper reverse engineers the GCN and proposes a simplified linear model called **Simple Graph Convolution (SGC)**. When applied, SGC gives performance comparable to GCNs and is faster than even Fast-GCN.

Inputs to the Graph convolutional network are:

1. Node Labels

2. Adjacency matrix

**Adjacency matrix:** The adjacency matrix **A** is an **n x n** matrix, where n is the number of nodes, with a(i,j) = 1 if node i is connected to node j and a(i,j) = 0 otherwise. If the edges are weighted, then a(i,j) = edge weight.

**Diagonal matrix:** The diagonal (degree) matrix **D** is an n x n matrix with d(i,i) = sum of the **i**th row of the adjacency matrix.

**Input features:** X is an input feature matrix of size **n x d**, where d is the number of features per node.

Let us see how GCNs actually work before reverse engineering them.
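For reference, the SGC classifier from the paper can be written in one line using the A and X defined above. The remaining symbols follow the paper's notation rather than this post's, so treat this as a summary sketch: K is the number of propagation steps, Θ is the single learned weight matrix, and D̃ is the degree matrix of Ã = A + I.

```latex
\hat{Y}_{\mathrm{SGC}} = \operatorname{softmax}\left(S^{K} X \Theta\right),
\qquad
S = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2},
\qquad
\tilde{A} = A + I
```

Because S^K X contains no learnable parameters, it can be precomputed once, and what remains is essentially a multi-class logistic regression on the propagated features.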

#machine-learning #graph-convolution #graph-neural-networks #gcn #neural-networks

1593182280

Neural Networks have gained massive success in the last decade. However, early variants of Neural Networks could only be implemented using

#artificial-intelligence #data-science #graph-neural-networks #deep-learning #machine-learning #node

1602838800

This post will summarize the SimGNN paper, which aims for fast graph similarity computation. Graphs are structures that link different entities, called nodes, through relationships called edges. Graphs exist everywhere, from bonds between atoms to friends on Facebook; all these scenarios can be represented as a graph. One of the fundamental graph problems is finding the similarity between graphs. The similarity between graphs can be defined using these metrics:

- Graph Edit Distance
- Maximum Common Subgraph

However, the currently available algorithms for calculating these metrics have high complexity, and it is not yet practical to compute the exact GED with them for graphs having more than 16 nodes.

Some ways to compute these metrics are:

- Pruning verification Framework
- Approximating the GED in fast and heuristic ways

SimGNN follows another approach to tackle this problem, i.e., turning the similarity computation problem into a learning problem.

Before getting into how SimGNN works, we must know the requirements this model has to satisfy. They are:

- **Representation Invariant:** Different representations of the same graph should give the same results.
- **Inductive:** Should be able to predict results for unseen graphs.
- **Learnable:** Must work on different similarity metrics, such as GED and MCS.

**SimGNN Approach:** To achieve the above-stated requirements, SimGNN uses two strategies:

- Design a learnable embedding function: This maps the graph into an embedding vector, which provides a global summary of the graph. Here, the important nodes are selected and used for the embedding computation (lower time complexity).
- Pairwise node comparison: The graph-level embeddings above are too coarse, so SimGNN additionally computes pairwise similarity scores between nodes from the two graphs, from which histogram features are extracted and combined with the graph-level information (this is the time-consuming strategy).
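The second strategy can be sketched roughly as follows. This is a minimal sketch that assumes the pairwise score is a sigmoid of the inner product of two node embeddings, and that the histogram uses equal-width bins over [0, 1] and is normalised to sum to 1. The embeddings, the bin count, and the function names are hypothetical.

```python
import math


def pairwise_scores(U1, U2):
    """Sigmoid of the inner product for every node pair across the two graphs."""
    return [
        1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, v))))
        for u in U1
        for v in U2
    ]


def histogram(scores, bins):
    """Count scores in equal-width bins over [0, 1], then normalise the counts."""
    counts = [0] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)  # a score of exactly 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(scores) for c in counts]


# Hypothetical node embeddings: graph 1 has 2 nodes, graph 2 has 3 nodes.
U1 = [[1.0, 0.0], [0.0, 1.0]]
U2 = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.0]]

scores = pairwise_scores(U1, U2)   # 2 * 3 = 6 node-pair scores
hist = histogram(scores, bins=4)   # fixed-length feature, whatever the graph sizes
print(hist)
```

The number of scores grows with the product of the two graphs' node counts, which is why this strategy is the slower one, while the histogram keeps the resulting feature vector at a fixed length.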

#graph-edit-distance #machine-learning #graph-neural-networks #graph-convolution-network