How to do Deep Learning on Graphs with Graph Convolutional Networks

Machine learning on graphs is a difficult task due to the highly complex, but also informative graph structure. This post is the second in a series on how to do deep learning on graphs with Graph Convolutional Networks (GCNs), a powerful type of neural network designed to work directly on graphs and leverage their structural information. I will provide a brief recap of the previous post, but you can find the other parts of the series here:

  1. A High-Level Introduction to Graph Convolutional Networks
  2. Semi-Supervised Learning with Spectral Graph Convolutions (this)

A Brief Recap

In my previous post on GCNs, we saw a simple mathematical framework for expressing propagation in GCNs. In short, given an N × F⁰ feature matrix X and a matrix representation of the graph structure, e.g., the N × N adjacency matrix A of G, each hidden layer in the GCN can be expressed as Hⁱ = f(Hⁱ⁻¹, A) where H⁰ = X and f is a propagation rule. Each layer Hⁱ corresponds to an N × Fⁱ feature matrix where each row is a feature representation of a node.

We saw propagation rules of the form

  1. f(Hⁱ, A) = σ(AHⁱWⁱ), and
  2. f(Hⁱ, A) = σ(D⁻¹ÂHⁱWⁱ) where Â = A + I, I is the identity matrix, and D⁻¹ is the inverse degree matrix of Â.

These rules compute the feature representation of a node as an aggregate of the feature representations of its neighbors before it is transformed by applying the weights Wⁱ and activation function σ. We can make the aggregation and transformation steps more explicit by expressing propagation rules 1 and 2 above as f(Hⁱ, A) = transform(aggregate(A, Hⁱ), Wⁱ) where transform(M, Wⁱ) = σ(MWⁱ), with aggregate(A, Hⁱ) = AHⁱ for rule 1 and aggregate(A, Hⁱ) = D⁻¹ÂHⁱ for rule 2.

As we discussed in the last post, aggregation in rule 1 represents a node as the sum of its neighbors' feature representations, which has two significant shortcomings:

  • the aggregated representation of a node does not include its own features, and
  • nodes with large degrees will have large values in their feature representation while nodes with small degrees will have small values, which can lead to issues with exploding gradients and make it harder to train using algorithms such as stochastic gradient descent, which are sensitive to feature scaling.

To fix these two issues, rule 2 first enforces self-loops by adding the identity matrix to A and aggregates using the transformed adjacency matrix Â = A + I. Next, the feature representations are normalized by multiplication with the inverse degree matrix D⁻¹, turning the aggregate into a mean where the scale of the aggregated feature representation is invariant to node degree.

In the following I will refer to rule 1 as the sum rule and rule 2 as the mean rule.
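To make the two rules concrete, here is a minimal NumPy sketch of the aggregate and transform steps. It is an illustration rather than code from the previous post: the function names, the toy three-node path graph, and the identity weight matrix are all assumptions chosen for readability.

```python
import numpy as np

def aggregate_sum(A, H):
    """Sum rule: each row becomes the sum of the neighbors' feature rows."""
    return A @ H

def aggregate_mean(A, H):
    """Mean rule: add self-loops, then average over the node and its neighbors."""
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return D_inv @ A_hat @ H

def transform(M, W, sigma=np.tanh):
    """Apply the layer weights and the activation function."""
    return sigma(M @ W)

# Hypothetical toy graph: the path 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.array([[1., 0.],
              [0., 2.],
              [4., 4.]])
W = np.eye(2)  # identity weights, so only the aggregation is visible

sum_agg = aggregate_sum(A, H)
mean_agg = aggregate_mean(A, H)
layer_out = transform(mean_agg, W)
```

Node 1 has two neighbors, so its row in sum_agg is larger in scale than the rows of nodes 0 and 2, while mean_agg keeps all rows on the scale of the input features.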

Spectral Graph Convolutions

A recent paper by Kipf and Welling proposes fast approximate spectral graph convolutions using a spectral propagation rule [1]:

  f(Hⁱ, A) = σ(D⁻⁰·⁵ÂD⁻⁰·⁵HⁱWⁱ)

Compared to the sum and mean rules discussed in the previous post, the spectral rule differs only in the choice of aggregate function. Although it is somewhat similar to the mean rule in that it normalizes the aggregate using the degree matrix D raised to a negative power, the normalization is applied symmetrically, on both sides of the adjacency matrix. Let’s try it out and see what it does.

Aggregation as a Weighted Sum

We can understand the aggregation functions I’ve presented thus far as weighted sums where the aggregation rules differ only in their choice of weights. We’ll first see how we can express the relatively simple sum and mean rules as weighted sums before moving on to the spectral rule.

The Sum Rule

To see how the aggregate feature representation of the ith node is computed using the sum rule, we look at how the ith row in the aggregate is computed.

The Sum Rule as a Weighted Sum

As shown above in Equation 1a, we can compute the aggregate feature representation of the ith node as a vector-matrix product. We can formulate this vector-matrix product as a simple weighted sum, as shown in Equation 1b, where we sum over each of the N rows in X.

The contribution of the jth node in the aggregate in Equation 1b is determined by the value of the jth column of the ith row of A. Since A is an adjacency matrix, this value is 1 if the jth node is a neighbor of the ith node, and is otherwise 0. Thus, Equation 1b corresponds to summing up the feature representations of the neighbors of the ith node. This confirms the informal observations from the previous post.

In conclusion, the contribution of each neighbor depends solely on the neighborhood defined by the adjacency matrix A.
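The equivalence between the vector-matrix product and the weighted sum can be checked numerically. The snippet below is a small sketch on a made-up four-node graph (the adjacency matrix and features are assumptions, not the karate club data); it computes row i of AX in three ways that should all agree.

```python
import numpy as np

# Hypothetical undirected graph with edges 0-1, 0-2, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1., 10.],
              [2., 20.],
              [3., 30.],
              [4., 40.]])
i = 0

# Row i of the aggregate as a vector-matrix product
row_product = A[i] @ X

# The same row as a weighted sum over all N rows of X, weighted by A[i, j]
weighted_sum = sum(A[i, j] * X[j] for j in range(A.shape[0]))

# Since A[i, j] is 1 for neighbors and 0 otherwise, this is just the neighbor sum
neighbor_sum = X[A[i] == 1].sum(axis=0)
```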

The Mean Rule

To see how the mean rule aggregates node representations, we again look at how the ith row in the aggregate is computed, now using the mean rule. For simplicity, we only consider the mean rule on the “raw” adjacency matrix A and leave out the addition of the identity matrix I, which simply corresponds to adding self-loops to the graph.

The Mean Rule as a Weighted Sum

As seen in the equations above, the derivation is now slightly longer. In Equation 2a we now first transform the adjacency matrix A by multiplying it with the inverse degree matrix D⁻¹. This computation is made more explicit in Equation 2b. The inverse degree matrix is a diagonal matrix where the values along the diagonal are inverse node degrees s.t. the value at position (i, i) is the inverse degree of the ith node. Thus, we can remove one of the summation signs, yielding Equation 2c. Equation 2c can be further reduced, yielding Equations 2d and 2e.

As shown by Equation 2e, we now again sum over each of the N rows in the adjacency matrix A. As mentioned during the discussion of the sum rule, this corresponds to summing over each of the ith node’s neighbors. However, the weights in the weighted sum in Equation 2e are now guaranteed to sum to 1 because each weight is divided by the degree of the ith node. Thus, Equation 2e corresponds to a mean over the feature representations of the neighbors of the ith node.

Whereas the sum rule depends solely on the neighborhood defined by the adjacency matrix A, the mean rule also depends on node degrees.
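As a quick numerical check of this reading, the sketch below applies the mean rule to the raw adjacency matrix of a made-up four-node graph (graph and features are assumptions for illustration, and every node is assumed to have at least one neighbor so that the inverse degree exists).

```python
import numpy as np

# Hypothetical undirected graph with edges 0-1, 0-2, 2-3
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1., 10.],
              [2., 20.],
              [3., 30.],
              [4., 40.]])

D = np.diag(A.sum(axis=1))   # degree matrix
D_inv = np.linalg.inv(D)     # inverse degrees on the diagonal

i = 2
# Weight of node j in the aggregate of node i: A[i, j] / D[i, i]
weights = A[i] / D[i, i]
mean_agg_i = weights @ X

# The same row taken from the full matrix expression D^-1 A X
full_agg_i = (D_inv @ A @ X)[i]
```

Node 2 has neighbors 0 and 3, so both receive weight 0.5 and the aggregate is the plain average of their feature rows.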

The Spectral Rule

We now have a useful framework in place to analyse the spectral rule. Let’s see where it takes us!

The Spectral Rule as a Weighted Sum

As with the mean rule, we transform the adjacency matrix A using the degree matrix D. However, as shown in Equation 3a, we raise the degree matrix to the power of -0.5 and multiply it on each side of A. This operation can be broken down as shown in Equation 3b. Recall again, that degree matrices (and powers thereof) are diagonal. We can therefore simplify Equation 3b further, until we reach the expression in Equation 3e.

Equation 3e shows something quite interesting. When computing the aggregate feature representation of the ith node, we not only take into consideration the degree of the ith node, but also the degree of the jth node.

Similar to the mean rule, the spectral rule normalizes the aggregate s.t. the aggregate feature representation remains roughly on the same scale as the input features. However, the spectral rule weighs neighbors in the weighted sum higher if they have a low degree and lower if they have a high degree. This may be useful when low-degree neighbors provide more useful information than high-degree neighbors.
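The effect of the neighbor's degree is easy to see numerically. The following sketch uses a made-up five-node graph (an assumption for illustration) in which node 0 has one low-degree neighbor and one high-degree neighbor, and computes the weights A[i, j] / sqrt(D[i, i] · D[j, j]) on the raw adjacency matrix, without self-loops.

```python
import numpy as np

# Hypothetical undirected graph with edges 0-1, 0-2, 2-3, 2-4
A = np.zeros((5, 5))
for u, v in [(0, 1), (0, 2), (2, 3), (2, 4)]:
    A[u, v] = A[v, u] = 1.0

degrees = A.sum(axis=1)                  # [2, 1, 3, 1, 1]
D_inv_sqrt = np.diag(degrees ** -0.5)    # D^-0.5, still a diagonal matrix
A_spectral = D_inv_sqrt @ A @ D_inv_sqrt

i = 0
# Weight of node j in the aggregate of node i depends on both degrees
weights = A[i] / np.sqrt(degrees[i] * degrees)
```

Node 1 (degree 1) gets a weight of about 0.71 in the aggregate of node 0, while node 2 (degree 3) gets about 0.41.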

Semi-Supervised Classification with GCNs

In addition to the spectral rule, Kipf and Welling demonstrate how GCNs can be used for semi-supervised classification [1]. So far we have implicitly assumed that the entire graph is available, i.e., that we are in a transductive setting. In other words, we know all the nodes, but not all the node labels.

In all the rules we’ve seen, we aggregate over node neighborhoods, and thus nodes that share neighbors tend to have similar feature representations. This property is very useful if the graph exhibits homophily, i.e., that connected nodes tend to be similar (e.g. have the same label). Homophily occurs in many real networks, and particularly social networks exhibit strong homophily.

As we saw in the previous post, even a randomly initialized GCN can achieve good separation between the feature representations of nodes in a homophilic graph just by using the graph structure. We can take this a step further by training the GCN on the labeled nodes, effectively propagating the node label information to unlabelled nodes. This can be done as follows [1]:

  1. Perform forward propagation through the GCN.
  2. Apply the sigmoid function row-wise on the last layer in the GCN.
  3. Compute the cross entropy loss on known node labels.
  4. Backpropagate the loss and update the weight matrices W in each layer.
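The four steps above can be sketched in plain NumPy. This is a deliberately simplified stand-in, not the model built later in this post: it assumes a single spectral layer that maps straight to one logit per node, a hand-derived gradient instead of automatic differentiation, and a made-up graph of two triangles joined by one edge with only nodes 0 and 5 labeled.

```python
import numpy as np

def spectral_norm(A):
    """D^-0.5 (A + I) D^-0.5, with D the degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    return A_hat / np.sqrt(np.outer(d, d))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
X = np.eye(6)                  # one-hot node features
labeled = np.array([0, 5])     # only two nodes have known labels
y = np.array([0.0, 1.0])

M = spectral_norm(A) @ X       # the aggregation is fixed, so compute it once
w = np.zeros(X.shape[1])
losses = []
for _ in range(200):
    p = sigmoid(M @ w)                      # steps 1 and 2: forward pass + sigmoid
    p_l = p[labeled]
    loss = -np.mean(y * np.log(p_l) + (1 - y) * np.log(1 - p_l))  # step 3
    losses.append(loss)
    grad = M[labeled].T @ (p_l - y) / len(labeled)                # step 4
    w -= 1.0 * grad

preds = sigmoid(M @ w)         # predictions for all nodes, labeled or not
```

Although the loss only ever sees nodes 0 and 5, the unlabeled nodes receive predictions too, because their aggregates share weights with the labeled nodes' neighborhoods.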

Community Prediction in Zachary’s Karate Club

Let’s see how the spectral rule propagates node label information to unlabelled nodes using semi-supervised learning. As in the previous post, we will use Zachary’s Karate Club as an example.

Zachary’s Karate Club

Briefly, Zachary’s Karate Club is a small social network where a conflict arises between the administrator and instructor in a karate club. The task is to predict which side of the conflict each member of the karate club chooses. The graph representation of the network can be seen below. Each node represents a member of the karate club and a link between members indicates that they interact outside the club. The administrator and instructor are marked with A and I, respectively.

Zachary’s Karate Club

Spectral Graph Convolutions in MXNet

I implement the spectral rule in MXNet, an easy-to-use and efficient deep learning framework. The implementation is as follows:

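from mxnet import nd
from mxnet.gluon import HybridBlock
from mxnet.gluon.nn import Activation

class SpectralRule(HybridBlock):
    def __init__(self,
                 A, in_units, out_units,
                 activation, **kwargs):
        super().__init__(**kwargs)

        # Add self-loops: A_hat = A + I
        I = nd.eye(*A.shape)
        A_hat = A.copy() + I

        # Diagonal matrix holding D^-0.5 for the degrees of A_hat
        D = nd.sum(A_hat, axis=0)
        D_inv = D**-0.5
        D_inv = nd.diag(D_inv)

        # Use matrix products here; `*` on NDArrays multiplies element-wise
        A_hat = nd.dot(nd.dot(D_inv, A_hat), D_inv)

        self.in_units, self.out_units = in_units, out_units

        with self.name_scope():
            # A_hat stays fixed during training; W is learned
            self.A_hat = self.params.get_constant('A_hat', A_hat)
            self.W = self.params.get(
                'W', shape=(self.in_units, self.out_units)
            )
            if activation == 'ident':
                self.activation = lambda X: X
            else:
                self.activation = Activation(activation)

    def hybrid_forward(self, F, X, A_hat, W):
        aggregate = F.dot(A_hat, X)
        propagate = self.activation(F.dot(aggregate, W))
        return propagate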

__init__ takes as input an adjacency matrix A along with the input and output dimensionality of each node’s feature representation in the graph convolutional layer: in_units and out_units, respectively. We add self-loops to the adjacency matrix A through addition with the identity matrix I, calculate the degree matrix D, and transform the adjacency matrix A to A_hat as specified by the spectral rule. Doing this transformation in the constructor is not strictly necessary, but is more computationally efficient since the transformation would otherwise be performed during each forward pass of the layer.

Finally, in the with clause in __init__, we store two model parameters: A_hat is stored as a constant and the weight matrix W is stored as a trainable parameter.

hybrid_forward is where the magic happens. In the forward pass we execute this method with the following inputs: X, the output of the previous layer, and the parameters A_hat and W that we defined in the constructor __init__.

Building the Graph Convolutional Network

Now that we have an implementation of the spectral rule, we can stack such layers on top of each other. We use a two-layer architecture similar to the one in the previous post, where the first hidden layer has 4 units and the second hidden layer has 2 units. This architecture makes it easy to visualize the resulting 2-dimensional embeddings. It differs from the architecture in the previous post in three ways:

  • We use the spectral rule rather than the mean rule.
  • We use different activation functions: the tanh activation function is used in the first layer since the probability of dead neurons would otherwise be quite high and the second layer uses the identity function since we use the last layer to classify nodes.

Finally, we add a logistic regression layer on top of the GCN for node classification.

The Python implementation of the above architecture is as follows.

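from mxnet.gluon.nn import HybridSequential

def build_model(A, X):
    model = HybridSequential()
    with model.name_scope():
        features = build_features(A, X)
        model.add(features)
        # The last graph convolutional layer outputs 2 features per node
        classifier = LogisticRegressor(in_units=2)
        model.add(classifier)
    model.initialize()
    return model, features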

I have separated the feature learning part of the network that contains the graph convolutional layers into a features component and the classification part into the classifier component. The separate features component makes it easier to visualise the activations of these layers later. The LogisticRegressor used as the classifier is a classification layer that performs logistic regression by taking a weighted sum over the features of each node provided by the last graph convolutional layer and applying the sigmoid function on this sum.

For completeness, the code to construct the features component is

def build_features(A, X):
    # (number of units, activation) for each graph convolutional layer
    hidden_layer_specs = [(4, 'tanh'), (2, 'tanh')]
    in_units = X.shape[1]

    features = HybridSequential()
    with features.name_scope():
        for layer_size, activation_func in hidden_layer_specs:
            layer = SpectralRule(
                A, in_units=in_units, out_units=layer_size,
                activation=activation_func)
            features.add(layer)
            # The next layer consumes this layer's output
            in_units = layer_size
    return features

and the code for the LogisticRegressor is

class LogisticRegressor(HybridBlock):
    def __init__(self, in_units, **kwargs):
        super().__init__(**kwargs)
        with self.name_scope():
            self.w = self.params.get(
                'w', shape=(1, in_units)
            )
            self.b = self.params.get(
                'b', shape=(1, 1)
            )

    def hybrid_forward(self, F, X, w, b):
        # Change shape of b to comply with MXNet addition API
        # (34 is the number of nodes in the karate club graph)
        b = F.broadcast_axis(b, axis=(0, 1), size=(34, 1))
        y = F.dot(X, w, transpose_b=True) + b
        return F.sigmoid(y)

Training the GCN

The code for training the GCN model can be seen below. In brief, I initialize a binary cross entropy loss function, cross_entropy, and an SGD optimizer, trainer, to learn the network parameters. Then the model is trained for a specified number of epochs where the loss is calculated for each training example and the error is backpropagated using loss.backward(). trainer.step is then invoked to update the model parameters. After each epoch, the feature representation constructed by the GCN layer is stored in the feature_representations list which we shall inspect shortly.

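from mxnet import autograd
from mxnet.gluon import Trainer
from mxnet.gluon.loss import SigmoidBinaryCrossEntropyLoss
from mxnet.ndarray import array

def train(model, features, X, X_train, y_train, epochs):
    cross_entropy = SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)
    trainer = Trainer(
        model.collect_params(), 'sgd',
        {'learning_rate': 0.001, 'momentum': 1})

    # Feature representation before any training
    feature_representations = [features(X).asnumpy()]

    for e in range(1, epochs + 1):
        for i, x in enumerate(X_train):
            y = array(y_train)[i]
            with autograd.record():
                # Forward pass over the whole graph; keep the labeled node x
                preds = model(X)[x]
                loss = cross_entropy(preds, y)
            loss.backward()
            trainer.step(1)

        feature_representations.append(features(X).asnumpy())

    return feature_representations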

Crucially, only the instructor and administrator are labeled and the remaining nodes in the network are known, but unlabeled! The GCN can find representations for both labeled and unlabeled nodes during graph convolution and can leverage both sources of information during training to perform semi-supervised learning.

Visualizing the Features

As mentioned above, the feature representations at each epoch are stored, which allows us to see how the feature representations change during training. In the following I consider two input feature representations.

Representation 1

In the first representation, we simply use the sparse 34 × 34 identity matrix, I, as the feature matrix X. This representation has the advantage that it can be used in any graph, but results in an input parameter for each node in the network, which requires a substantial amount of memory and computational power for training on large networks and may result in overfitting. Thankfully, the karate club network is quite small. The network is trained for 5000 epochs using this representation.

Classification Errors in the Karate Club using Representation 1

By collectively classifying all nodes in the network, we get the distribution of errors in the network shown above. Here, black indicates misclassification. Although nearly half (41%) of the nodes are misclassified, the nodes that are closely connected to either the administrator or instructor (but not both!) tend to be correctly classified.

Changes in Feature Representation during Training using Representation 1

To the left, I have illustrated how the feature representation changes during training. The nodes are initially closely clustered, but as training progresses the instructor and administrator are pulled apart, dragging some nodes with them.

Although the administrator and instructor are given quite different representations, the nodes they drag with them do not necessarily belong to their community. This is because the graph convolutions embed nodes that share neighbors closely together in the feature space, but two nodes that share neighbors may not be equally connected to the administrator and instructor. In particular, using the identity matrix as the feature matrix results in highly local representations of each node, i.e., nodes that belong to the same area of the graph are likely to be embedded closely together. This makes it difficult for the network to share common knowledge between distant areas in an inductive fashion.

Representation 2

We will improve representation 1 by adding two features that are not specific to any node or area of the network, but measure the connectedness to the administrator and instructor. To this end, we compute the shortest path distance from each node in the network to both the administrator and instructor and concatenate these two features to the previous representation.
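One way to build such a feature matrix is sketched below. It is not the code used for the experiments in this post: it assumes an unweighted, connected graph given as an adjacency list, uses breadth-first search for the shortest path distances, and runs on a made-up five-node graph in which the node ids chosen as administrator and instructor are hypothetical.

```python
from collections import deque

import numpy as np

def shortest_path_lengths(adjacency, source):
    """Hop distance from source to every reachable node, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical toy graph as an adjacency list
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
admin, instructor = 0, 4   # assumed node ids, for illustration only
n = len(adjacency)

d_admin = shortest_path_lengths(adjacency, admin)
d_instr = shortest_path_lengths(adjacency, instructor)

X_1 = np.eye(n)            # representation 1: one-hot node identity
distances = np.array([[d_admin[v], d_instr[v]] for v in range(n)], dtype=float)
X_2 = np.hstack([X_1, distances])   # representation 2: identity + two distances
```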

One might perhaps consider this cheating a little bit, since we inject global information about the location of each node in the graph; information which should (ideally) be captured by the graph convolutional layers in the features component. However, the graph convolutional layers always have a local perspective and have a limited capacity to capture such information. Still, it serves as a useful tool for understanding GCNs.

Classification Errors in the Karate Club using Representation 2

As before, we collectively classify all nodes in the network and plot the distribution of errors in the network shown above. This time, only four nodes are misclassified; a significant improvement over representation 1! Upon closer inspection of the feature matrix, these nodes are either equidistant (in a shortest path sense) to the instructor and administrator or are closer to the administrator but belong in the instructor community. The GCN is trained for 250 epochs using representation 2.

Changes in Feature Representation during Training using Representation 2

As shown on the left, the nodes are again clustered quite closely together initially, but are somewhat separated into communities before training even begins! As training progresses the distance between the communities increases.

What’s Next?

In this post, I have given an in-depth explanation on how aggregation in GCNs is performed and shown how it can be expressed as a weighted sum, using the mean, sum, and spectral rules as examples. My sincere hope is that you will find this framework useful to consider which weights you might want during aggregation in your own graph convolutional network.

I have also shown how to implement and train a GCN in MXNet to perform semi-supervised classification on graphs using spectral graph convolutions with Zachary’s Karate Club as a simple example network. We saw how just using two labeled nodes, it was still possible for the GCN to achieve a high degree of separation between the two network communities in the representation space.

Although there is much more to learn about graph convolutional networks which I hope to have the time to share with you in the future, this is (for now) the final post in the series. If you are interested in further reading, I would like to conclude with the following papers which I have found quite interesting:

  1. Inductive Representation Learning on Large Graphs
     In this paper, Hamilton et al. propose several new aggregate functions that, e.g., use max/mean pooling or multi-layer perceptrons. In addition, they also propose a simple method to do mini-batch training for GCNs, greatly improving training speed.
  2. FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling
     A drawback of the mini-batch method proposed by Hamilton et al. is that the number of nodes in a batch grows exponentially in the number of aggregates performed due to their recursive nature. Chen et al. propose their FastGCN method, which addresses this shortcoming by performing batched training of graph convolutional layers independently.
  3. N-GCN: Multi-scale Graph Convolution for Semi-supervised Node Classification
     Where FastGCN addresses the problems of training recursive graph convolutional networks, N-GCN challenges the premise that GCNs need to be recursive at all! Abu-El-Haija et al. instead propose a flat architecture with multiple (N) GCNs whose outputs are concatenated together. Each GCN captures the neighborhood at different distances (based on random walk semantics), thus avoiding recursive aggregation. Thanks to Binny Mathew for bringing this to my attention.

Learn Data Science | How to Learn Data Science for Free

In this post, I describe a learning path and free online courses and tutorials that will enable you to learn data science for free.

Obtaining a master’s degree at a traditional bricks and mortar institution will set you back anywhere between $30,000 and $120,000. Even online data science degree programs don’t come cheap, costing a minimum of $9,000. So what do you do if you want to learn data science but can’t afford to pay this?

I trained into a career as a data scientist without taking any formal education in the subject. In this article, I am going to share with you my own personal curriculum for learning data science if you can’t or don’t want to pay thousands of dollars for more formal study.

The curriculum will consist of 3 main parts: technical skills, theory and practical experience. I will include links to free resources for every element of the learning path and will also be including some links to additional ‘low cost’ options. So if you want to spend a little money to accelerate your learning you can add these resources to the curriculum. I will include the estimated costs for each of these.

Technical skills

The first part of the curriculum will focus on technical skills. I recommend learning these first so that you can take a practical-first approach rather than, say, learning the mathematical theory first. Python is by far the most widely used programming language for data science. In the Kaggle Machine Learning and Data Science survey carried out in 2018, 83% of respondents said that they used Python on a daily basis. I would, therefore, recommend focusing on this language but also spending a little time on other languages such as R.

Python Fundamentals

Before you can start to use Python for data science you need a basic grasp of the fundamentals behind the language. So you will want to take a Python introductory course. There are lots of free ones out there, but I like the Codecademy ones best as they include hands-on in-browser coding throughout.

I would suggest taking the introductory course to learn Python. This covers basic syntax, functions, control flow, loops, modules and classes.

Data analysis with python

Next, you will want to get a good understanding of using Python for data analysis. There are a number of good resources for this.

To start with, I suggest taking at least the free parts of the data analyst learning path on Dataquest. Dataquest offers complete learning paths for data analyst, data scientist and data engineer. Quite a lot of the content, particularly on the data analyst path, is available for free. If you do have some money to put towards learning then I strongly suggest putting it towards paying for a few months of the premium subscription. I took this course and it provided a fantastic grounding in the fundamentals of data science. It took me 6 months to complete the data scientist path. The price varies from $24.50 to $49 per month depending on whether you pay annually or not. It is better value to purchase the annual subscription if you can afford it.

The Dataquest platform

Python for machine learning

If you have chosen to pay for the full data science course on Dataquest then you will have a good grasp of the fundamentals of machine learning with Python. If not then there are plenty of other free resources. I would focus to start with on scikit-learn which is by far the most commonly used Python library for machine learning.

When I was learning I was lucky enough to attend a two-day workshop run by Andreas Mueller, one of the core developers of scikit-learn. He has, however, published all the material from this course, and others, on this GitHub repo. These consist of slides, course notes and notebooks that you can work through. I would definitely recommend working through this material.

Then I would suggest taking some of the tutorials in the scikit-learn documentation. After that, I would suggest building some practical machine learning applications and learning the theory behind how the models work — which I will cover a bit later on.


SQL

SQL is a vital skill to learn if you want to become a data scientist as one of the fundamental processes in data modelling is extracting data in the first place. This will more often than not involve running SQL queries against a database. Again, if you haven’t opted to take the full Dataquest course then here are a few free resources to learn this skill.
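If you want to try a query before signing up anywhere, Python's built-in sqlite3 module is enough. The snippet below is a small sketch with a made-up orders table (the table, column names and rows are assumptions for illustration); it creates an in-memory database and runs a typical aggregation query against it.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 5.0), ("alice", 12.5), ("carol", 8.0)],
)

# Number of orders and total spend per customer, largest spender first
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()
```

The same SELECT, GROUP BY and ORDER BY pattern carries over to most other SQL databases, although dialects differ in the details.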

Codecademy has a free introduction to SQL course. Again, this is very practical with in-browser coding all the way through. If you also want to learn about cloud-based database querying then Google Cloud BigQuery is very accessible. There is a free tier so you can try queries for free, an extensive range of public datasets to try and very good documentation.

Codecademy SQL course


R

To be a well-rounded data scientist it is a good idea to diversify a little from just Python. I would, therefore, suggest also taking an introductory course in R. Codecademy has an introductory course on its free plan. It is probably worth noting here that, similar to Dataquest, Codecademy also offers a complete data science learning plan as part of its pro account (this costs from $31.99 to $15.99 per month depending on how many months you pay for up front). I personally found the Dataquest course to be much more comprehensive, but this may work out a little cheaper if you are looking to follow a learning path on a single platform.

Software engineering

It is a good idea to get a grasp of software engineering skills and best practices. This will help your code to be more readable and extensible both for yourself and others. Additionally, when you start to put models into production you will need to be able to write good quality well-tested code and work with tools like version control.

There are two great free resources for this. Python like you mean it covers things like the PEP8 style guide, documentation and also covers object-oriented programming really well.

The scikit-learn contribution guidelines, although written to facilitate contributions to the library, actually cover the best practices really well. This covers topics such as Github, unit testing and debugging and is all written in the context of a data science application.

Deep learning

For a comprehensive introduction to deep learning, I don’t think that you can get any better than the totally free and totally ad-free fast.ai. Their courses include an introduction to machine learning, practical deep learning, computational linear algebra and a code-first introduction to natural language processing. All their courses have a practical-first approach and I highly recommend them.

The fast.ai platform


Theory

Whilst you are learning the technical elements of the curriculum you will encounter some of the theory behind the code you are implementing. I recommend that you learn the theoretical elements alongside the practical. The way that I do this is that I first learn the code to be able to implement a technique. Let’s take KMeans as an example: once I have something working, I will then look deeper into concepts such as inertia. Again, the scikit-learn documentation contains all the mathematical concepts behind the algorithms.
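To give a flavour of what looking deeper can mean, here is a small NumPy sketch of inertia, the sum of squared distances from each point to the centroid of its assigned cluster. The four points and their cluster labels are made up for illustration, and the labels are assumed to be given rather than found by running KMeans.

```python
import numpy as np

# Hypothetical data: two tight groups of two points each
points = np.array([[0., 0.], [0., 2.], [10., 0.], [10., 2.]])
labels = np.array([0, 0, 1, 1])   # assumed cluster assignment

# Each centroid is the mean of the points assigned to that cluster
centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])

# Inertia: squared distance of every point to its own centroid, summed
inertia = ((points - centroids[labels]) ** 2).sum()
```

Every point here lies exactly one unit from its centroid, so the inertia is 4.0. A lower value means tighter clusters, which is the quantity KMeans tries to minimise.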

In this section, I will introduce the key foundational elements of theory that you should learn alongside the more practical elements.

Khan Academy covers almost all the concepts I have listed below for free. You can tailor the subjects you would like to study when you sign up and you then have a nice tailored curriculum for this part of the learning path.



Calculus

Calculus is defined by Wikipedia as “the mathematical study of continuous change.” In other words, calculus can find patterns between functions. For example, in the case of derivatives, it can help you to understand how a function changes over time.

Many machine learning algorithms utilise calculus to optimise the performance of models. If you have studied even a little machine learning you will probably have heard of Gradient descent. This functions by iteratively adjusting the parameter values of a model to find the optimum values to minimise the cost function. Gradient descent is a good example of how calculus is used in machine learning.
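Gradient descent fits in a few lines of code. The sketch below is a toy example rather than how any particular library implements it: it assumes a single parameter x, the made-up cost function (x - 3)², a fixed learning rate, and a fixed number of steps.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)
    return x

def cost(x):
    # Hypothetical cost function with its minimum at x = 3
    return (x - 3.0) ** 2

def cost_grad(x):
    # Derivative of the cost function with respect to x
    return 2.0 * (x - 3.0)

x_min = gradient_descent(cost_grad, x0=0.0)
```

Each step moves x a little in the direction that lowers the cost, so x_min ends up very close to 3. The derivative is what tells the algorithm which direction that is, which is where calculus comes in.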

What you need to know:


Derivatives

  • Geometric definition
  • Calculating the derivative of a function
  • Nonlinear functions

Chain rule

  • Composite functions
  • Composite function derivatives
  • Multiple functions


Gradients

  • Partial derivatives
  • Directional derivatives
  • Integrals

Linear Algebra

Many popular machine learning methods, including XGBoost, use matrices to store inputs and process data. Matrices, alongside vector spaces and linear equations, form the mathematical branch known as linear algebra. In order to understand how many machine learning methods work it is essential to get a good understanding of this field.

What you need to learn:

Vectors and spaces

  • Vectors
  • Linear combinations
  • Linear dependence and independence
  • Vector dot and cross products

Matrix transformations

  • Functions and linear transformations
  • Matrix multiplication
  • Inverse functions
  • Transpose of a matrix


Statistics

Here is a list of the key concepts you need to know:

Descriptive/Summary statistics

  • How to summarise a sample of data
  • Different types of distributions
  • Skewness, kurtosis, central tendency (e.g. mean, median, mode)
  • Measures of dependence, and relationships between variables such as correlation and covariance

Experiment design

  • Hypothesis testing
  • Sampling
  • Significance tests
  • Randomness
  • Probability
  • Confidence intervals and two-sample inference

Machine learning

  • Inference about slope
  • Linear and non-linear regression
  • Classification

Practical experience

The third section of the curriculum is all about practice. In order to truly master the concepts above you will need to use the skills in some projects that ideally closely resemble a real-world application. By doing this you will encounter problems to work through such as missing and erroneous data and develop a deep level of expertise in the subject. In this last section, I will list some good places you can get this practical experience from for free.

“With deliberate practice, however, the goal is not just to reach your potential but to build it, to make things possible that were not possible before. This requires challenging homeostasis — getting out of your comfort zone — and forcing your brain or your body to adapt.”, Anders Ericsson, Peak: Secrets from the New Science of Expertise

Kaggle, et al

Machine learning competitions are a good place to practice building machine learning models. They give access to a wide range of data sets, each with a specific problem to solve, and have a leaderboard. The leaderboard is a good way to benchmark how good you actually are at developing a model and where you may need to improve further.

In addition to Kaggle, there are other platforms for machine learning competitions including Analytics Vidhya and DrivenData.

Driven data competitions page

UCI Machine Learning Repository

The UCI machine learning repository is a large source of publicly available data sets. You can use these data sets to put together your own data projects. This could include data analysis and machine learning models; you could even try building a deployed model with a web front end. It is a good idea to store your projects somewhere public, such as GitHub, as this can create a portfolio showcasing your skills to use for future job applications.

UCI repository

Contributions to open source

One other option to consider is contributing to open source projects. There are many Python libraries that rely on the community to maintain them, and there are often hackathons held at meetups and conferences where even beginners can join in. Attending one of these events would certainly give you some practical experience and an environment where you can learn from others whilst giving something back at the same time. NumFOCUS is a good example of a project like this.

In this post, I have described a learning path and free online courses and tutorials that will enable you to learn data science for free. Showcasing what you are able to do in the form of a portfolio is a great tool for future job applications in lieu of formal qualifications and certificates. I really believe that education should be accessible to everyone and, certainly, for data science at least, the internet provides that opportunity. In addition to the resources listed here, I have previously published a recommended reading list for learning data science available here. These are also all freely available online and are a great way to complement the more practical resources covered above.

Thanks for reading!

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data

Downloadable PDF of Best AI Cheat Sheets in Super High Definition

Let’s begin.

Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Data Science in HD

Part 1: Neural Networks Cheat Sheets

Neural Networks Cheat Sheets

Neural Networks Basics

Neural Networks Basics Cheat Sheet

An Artificial Neural Network (ANN), popularly known as a Neural Network, is a computational model based on the structure and functions of biological neural networks. In Computer Science terms, it is like an artificial human nervous system for receiving, processing, and transmitting information.

Basically, there are 3 different layers in a neural network :

  1. Input Layer (All the inputs are fed in the model through this layer)
  2. Hidden Layers (There can be more than one hidden layer; these process the inputs received from the input layer)
  3. Output Layer (The data after processing is made available at the output layer)
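The three kinds of layers can be sketched as a single forward pass in NumPy. The layer sizes, the random weights and the ReLU activation below are assumptions chosen for illustration; a real network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    """A common activation function: negative values become zero."""
    return np.maximum(0, x)


# Hypothetical sizes: 4 input features, 5 hidden units, 2 outputs.
n_inputs, n_hidden, n_outputs = 4, 5, 2

W1 = rng.normal(size=(n_inputs, n_hidden))   # input layer -> hidden layer
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))  # hidden layer -> output layer
b2 = np.zeros(n_outputs)


def forward(x):
    hidden = relu(x @ W1 + b1)  # the hidden layer processes the inputs
    return hidden @ W2 + b2     # the output layer exposes the result


batch = rng.normal(size=(3, n_inputs))  # 3 examples fed in through the input layer
output = forward(batch)
print(output.shape)  # (3, 2)
```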

Neural Networks Graphs

Neural Networks Graphs Cheat Sheet

Many learning tasks involve graph data, which contains rich relational information among elements. For example, modeling physics systems, predicting protein interfaces, and classifying diseases require that a model learns from graph inputs. Graph reasoning models can also be used for learning from non-structural data like texts and images and reasoning on extracted structures.

Part 2: Machine Learning Cheat Sheets

Machine Learning Cheat Sheets

Machine Learning with Emojis

Machine Learning with Emojis Cheat Sheet

Machine Learning: Scikit Learn Cheat Sheet

Scikit Learn Cheat Sheet

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, and provides simple and efficient tools for data mining and data analysis. It is built on NumPy, SciPy, and matplotlib, and it is open source and commercially usable under the BSD license.

Scikit-learn Algorithm Cheat Sheet

Scikit-learn algorithm

This machine learning cheat sheet will help you find the right estimator for the job, which is the most difficult part. The flowchart will help you check the documentation and rough guide of each estimator so that you can learn more about the problems and how to solve them.

Machine Learning: Scikit-Learn Algorithm for Azure Machine Learning Studio

Scikit-Learn Algorithm for Azure Machine Learning Studio Cheat Sheet

Part 3: Data Science with Python

Data Science with Python Cheat Sheets

Data Science: TensorFlow Cheat Sheet

TensorFlow Cheat Sheet

TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.

Data Science: Python Basics Cheat Sheet

Python Basics Cheat Sheet

Python is one of the most popular data science tools due to its low and gradual learning curve and the fact that it is a fully fledged programming language.

Data Science: PySpark RDD Basics Cheat Sheet

PySpark RDD Basics Cheat Sheet

“At a high level, every Spark application consists of a driver program that runs the user’s main function and executes various parallel operations on a cluster. The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing Scala collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.” via spark.apache.org

Data Science: NumPy Basics Cheat Sheet

NumPy Basics Cheat Sheet

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
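For a quick feel of what that means in practice, here is a short sketch of a multi-dimensional array and a few of the high-level operations on it. The array contents are made up for the example.

```python
import numpy as np

# A 3 x 4 array holding the numbers 0 to 11.
grid = np.arange(12).reshape(3, 4)

column_means = grid.mean(axis=0)  # one mean per column
row_totals = grid.sum(axis=1)     # one total per row
scaled = grid * 0.5               # arithmetic applies to every element
evens = grid[grid % 2 == 0]       # boolean masks select matching elements

print(grid.shape)
print(column_means)
print(row_totals)
```

Operations such as `mean` and `sum` take an `axis` argument, so the same call works along rows or columns without writing explicit loops.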

Data Science: Bokeh Cheat Sheet

Bokeh Cheat Sheet

“Bokeh is an interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of versatile graphics, and to extend this capability with high-performance interactivity over very large or streaming datasets. Bokeh can help anyone who would like to quickly and easily create interactive plots, dashboards, and data applications.” from

Data Science: Keras Cheat Sheet

Keras Cheat Sheet

Keras is an open-source neural-network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, Theano, or PlaidML. Designed to enable fast experimentation with deep neural networks, it focuses on being user-friendly, modular, and extensible.

Data Science: Pandas Basics Cheat Sheet

Pandas Basics Cheat Sheet

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license.

Pandas Cheat Sheet: Data Wrangling in Python

Pandas Cheat Sheet: Data Wrangling in Python

Data Wrangling

The term “data wrangler” is starting to infiltrate pop culture. In the 2017 movie Kong: Skull Island, one of the characters, played by actor Marc Evan Jackson, is introduced as “Steve Woodward, our data wrangler”.

Data Science: Data Wrangling with Pandas Cheat Sheet

Data Wrangling with Pandas Cheat Sheet

“Why Use tidyr & dplyr

  • Although many fundamental data processing functions exist in R, they have been a bit convoluted to date and have lacked consistent coding and the ability to easily flow together → leads to difficult-to-read nested functions and/or choppy code.
  • R Studio is driving a lot of new packages to collate data management tasks and better integrate them with other analysis activities → led by Hadley Wickham & the R Studio team: Garrett Grolemund, Winston Chang, Yihui Xie among others.
  • As a result, a lot of data processing tasks are becoming packaged in more cohesive and consistent ways → leads to:
  • More efficient code
  • Easier to remember syntax
  • Easier to read syntax” via RStudio

Data Science: Data Wrangling with dplyr and tidyr

Data Wrangling with dplyr and tidyr Cheat Sheet

Data Science: SciPy Linear Algebra

SciPy Linear Algebra Cheat Sheet

SciPy builds on the NumPy array object and is part of the NumPy stack, which includes tools like Matplotlib, pandas and SymPy, and an expanding set of scientific computing libraries. This NumPy stack has similar users to other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack.

Data Science: Matplotlib Cheat Sheet

Matplotlib Cheat Sheet

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of matplotlib.

Pyplot is a matplotlib module which provides a MATLAB-like interface. Matplotlib is designed to be as usable as MATLAB, with the ability to use Python and the advantage that it is free.

Data Science: Data Visualization with ggplot2 Cheat Sheet

Data Visualization with ggplot2 Cheat Sheet

Data Science: Big-O Cheat Sheet

Big-O Cheat Sheet


Special thanks to DataCamp, Asimov Institute, RStudio and the open source community for their content contributions.

How to get started with Python for Deep Learning and Data Science

A step-by-step guide to setting up Python for Deep Learning and Data Science for a complete beginner

You can code your own Data Science or Deep Learning project in just a couple of lines of code these days. This is not an exaggeration; many programmers out there have done the hard work of writing tons of code for us to use, so that all we need to do is plug-and-play rather than write code from scratch.

You may have seen some of this code on Data Science / Deep Learning blog posts. Perhaps you might have thought: “Well, if it’s really that easy, then why don’t I try it out myself?”

If you’re a beginner to Python and you want to embark on this journey, then this post will guide you through your first steps. A common complaint I hear from complete beginners is that it’s pretty difficult to set up Python. How do we get everything started in the first place so that we can plug-and-play Data Science or Deep Learning code?

This post will guide you step by step through setting up Python for your Data Science and Deep Learning projects. We will:

  • Set up Anaconda and Jupyter Notebook
  • Create Anaconda environments and install packages (code that others have written to make our lives tremendously easy) like tensorflow, keras, pandas, scikit-learn and matplotlib.

Once you’ve set up the above, you can build your first neural network to predict house prices in this tutorial here:

Build your first Neural Network to predict house prices with Keras

Setting up Anaconda and Jupyter Notebook

The main programming language we are going to use is called Python, which is the most common programming language used by Deep Learning practitioners.

The first step is to download Anaconda, which you can think of as a platform for you to use Python “out of the box”.

Visit the Anaconda download page and scroll down to see this:

This tutorial is written specifically for Windows users, but the instructions for users of other Operating Systems are not all that different. Be sure to click on “Windows” as your Operating System (or whatever OS that you are on) to make sure that you are downloading the correct version.

This tutorial will be using Python 3, so click the green Download button under “Python 3.7 version”. A pop up should appear for you to click “Save” into whatever directory you wish.

Once it has finished downloading, just go through the setup step by step as follows:

Click Next

Click “I Agree”

Click Next

Choose a destination folder and click Next

Click Install with the default options, and wait for a few moments as Anaconda installs

Click Skip as we will not be using Microsoft VSCode in our tutorials

Click Finish, and the installation is done!

Once the installation is done, go to your Start Menu and you should see some newly installed software:

You should see this on your start menu

Click on Anaconda Navigator, which is a one-stop hub to navigate the apps we need. You should see a front page like this:

Anaconda Navigator Home Screen

Click on ‘Launch’ under Jupyter Notebook, which is the second panel on my screen above. Jupyter Notebook allows us to run Python code interactively on the web browser, and it’s where we will be writing most of our code.

A browser window should open up with your directory listing. I’m going to create a folder on my Desktop called “Intuitive Deep Learning Tutorial”. If you navigate to the folder, your browser should look something like this:

Navigating to a folder called Intuitive Deep Learning Tutorial on my Desktop

On the top right, click on New and select “Python 3”:

Click on New and select Python 3

A new browser window should pop up like this.

Browser window pop-up

Congratulations, you’ve created your first Jupyter notebook! Now it’s time to write some code. Jupyter notebooks allow us to write snippets of code and then run those snippets without running the full program. This lets us inspect intermediate output from our program.

To begin, let’s write code that will display some words when we run it. This function is called print. Copy and paste the code below into the grey box on your Jupyter notebook:

print("Hello World!")

Your notebook should look like this:

Entering in code into our Jupyter Notebook

Now, press Alt-Enter on your keyboard to run that snippet of code:

Press Alt-Enter to run that snippet of code

You can see that Jupyter notebook has displayed the words “Hello World!” on the display panel below the code snippet! The number 1 has also filled in the square brackets, meaning that this is the first code snippet that we’ve run thus far. This will help us to track the order in which we have run our code snippets.

Instead of Alt-Enter, note that you can also click Run when the code snippet is highlighted:

Click Run on the panel

If you wish to create new grey blocks to write more snippets of code, you can do so under Insert.

Jupyter Notebook also allows you to write normal text instead of code. Click on the drop-down menu that currently says “Code” and select “Markdown”:

Now, our grey box that is tagged as markdown will not have square brackets beside it. If you write some text in this grey box now and press Alt-Enter, Jupyter will render it as plain text like this:

If we write text in our grey box tagged as markdown, pressing Alt-Enter will render it as plain text.

There are some other features that you can explore. But now we’ve got Jupyter notebook set up for us to start writing some code!

Setting up Anaconda environment and installing packages

Now we’ve got our coding platform set up. But are we going to write Deep Learning code from scratch? That seems like an extremely difficult thing to do!

The good news is that many others have written code and made it available to us! With the contribution of others’ code, we can play around with Deep Learning models at a very high level without having to worry about implementing all of it from scratch. This makes it extremely easy for us to get started with coding Deep Learning models.

For this tutorial, we will be downloading five packages that Deep Learning practitioners commonly use:

  • tensorflow
  • keras
  • pandas
  • scikit-learn
  • matplotlib

The first thing we will do is to create a Python environment. An environment is like an isolated working copy of Python, so that whatever you do in your environment (such as installing new packages) will not affect other environments. It’s good practice to create an environment for your projects.

Click on Environments on the left panel and you should see a screen like this:

Anaconda environments

Click on the button “Create” at the bottom of the list. A pop-up like this should appear:

A pop-up like this should appear.

Name your environment and select Python 3.7 and then click Create. This might take a few moments.

Once that is done, your screen should look something like this:

Notice that we have created an environment ‘intuitive-deep-learning’. We can see what packages we have installed in this environment and their respective versions.

Now let’s install some packages we need into our environment!

The first two packages we will install are called Tensorflow and Keras, which help us plug-and-play code for Deep Learning.

On Anaconda Navigator, click on the drop down menu where it currently says “Installed” and select “Not Installed”:

A whole list of packages that you have not installed will appear like this:

Search for “tensorflow”, and click the checkbox for both “keras” and “tensorflow”. Then, click “Apply” on the bottom right of your screen:

A pop up should appear like this:

Click Apply and wait for a few moments. Once that’s done, we will have Keras and Tensorflow installed in our environment!

Using the same method, let’s install the packages ‘pandas’, ‘scikit-learn’ and ‘matplotlib’. These are common packages that data scientists use to process the data as well as to visualize nice graphs in Jupyter notebook.

This is what you should see on your Anaconda Navigator for each of the packages.


Installing pandas into your environment


Installing scikit-learn into your environment


Installing matplotlib into your environment

Once it’s done, go back to “Home” on the left panel of Anaconda Navigator. You should see a screen like this, where it says “Applications on intuitive-deep-learning” at the top:

Now, we have to install Jupyter notebook in this environment. So click the green button “Install” under the Jupyter notebook logo. It will take a few moments (again). Once it’s done installing, the Jupyter notebook panel should look like this:

Click on Launch, and the Jupyter notebook app should open.

Create a notebook, type in these five snippets of code, and press Alt-Enter. This code tells the notebook that we will be using the five packages that you installed with Anaconda Navigator earlier in the tutorial.

import tensorflow as tf

import keras

import pandas

import sklearn

import matplotlib

If there are no errors, then congratulations — you’ve got everything installed correctly:

A sign that everything works!
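If one of the imports fails, it can help to see which package is missing. The sketch below is one possible way to check, using only Python's standard library; the helper name `check_packages` is an assumption for this example and is not part of Anaconda or Jupyter.

```python
import importlib.util


def check_packages(names):
    """Report whether each module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# The five import names used in this tutorial
# (scikit-learn is imported under the name sklearn).
required = ["tensorflow", "keras", "pandas", "sklearn", "matplotlib"]

status = check_packages(required)
for name, available in status.items():
    print(f"{name}: {'OK' if available else 'MISSING'}")
```

Any package reported as MISSING can be installed into the current environment from Anaconda Navigator in the same way as the others.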

If you have had any trouble with any of the steps above, please feel free to comment below and I’ll help you out!

Originally published by Joseph Lee Wei En

