A Web Developer's Guide to Machine Learning in JavaScript


Recently, I was wondering how I could escape the web development bubble for a while. 2017 was full of React, Redux and MobX in JavaScript: I have written actively about those topics on my blog, developed small (1, 2, 3, ...) and large scale applications based on them, self-published two educational ebooks, and implemented a course platform with those technologies to teach others about them. The last year was all about those subjects, so I needed a side project to escape it for a while and to get into a zen mode of learning again.

How did I get to machine learning? A couple of months ago, I started to listen to the Machine Learning Guide podcast. I found out about it by chance and highly recommend it as an introduction to machine learning. Tyler Renelle does an amazing job of getting you excited about the topic. I almost feel like I am following him along the same path of learning about machine learning now. Even though I didn't actively plan on learning ML, it was interesting to hear about all those foreign concepts. There it was again: this excitement when everything is unexplored. I felt like a whole new world opened up in front of me. It was the same feeling I had when I finally got a foot into web development.

As I read a couple of machine learning articles, the course on Machine Learning by Andrew Ng was by far the most recommended way to get started in machine learning. I had never taken an online course from start to end before, even though I actively give these online courses myself, but I decided to give it a shot this time. Fortunately, the course had started one week earlier. So I enrolled in it and have by now finished it. It's a blast and I recommend it to everyone who wants to get into ML, even though enrolling in the course for 12 weeks is a big commitment in the first place. But more about that later.

After university, I immediately took a job working on a large scale application in JavaScript. So I never had the chance to apply most of the technical and mathematical learnings from university. Yet it was great to grow in web development and JavaScript over the last years and I don't want to miss that time. But when I started to learn about machine learning, it was a pleasure to dig out all those learnings in math. Suddenly I had a use case where it would make sense to take the derivative of a function: gradient descent. Why aren't schools and universities showing these real world use cases in a simplified version to motivate their students with hands-on problems? Learning all the theoretical things is fine, but when you finally apply the derivative to an optimization problem, it actually becomes exciting. It was always difficult for me to pick up a book about plain math. But as I started to relearn the math for machine learning, I had an applicable domain for it. So I started to relearn all those things from university which obviously go beyond taking the derivative.
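
To make this concrete with a minimal sketch (in JavaScript, with a made-up example function): to minimize f(x) = (x - 3)^2, gradient descent repeatedly steps against the derivative f'(x) = 2 * (x - 3).

// The derivative (slope) of f(x) = (x - 3)^2
const fPrime = x => 2 * (x - 3);

let x = 0;                // initial guess
const learningRate = 0.1; // step size

for (let i = 0; i < 100; i++) {
  x = x - learningRate * fPrime(x); // step against the slope
}

console.log(x); // ~3, the minimum of f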

So why is this article about machine learning in JavaScript? If you are coming from web development as I do, you might know how difficult it can be to make the leap over to another domain such as machine learning. It comes with its own constraints. Not only is the whole domain with its algorithms different, but so are the programming languages suited for machine learning, paired with mathematical concepts from linear algebra, calculus and statistics. Personally, I found it an interesting strategy to boil machine learning down into different learning parts: algorithms, programming languages (e.g. Python) and mathematical concepts. When I looked at those, I knew that I would definitely have to learn about the machine learning algorithms themselves and the underlying mathematical concepts. But I could strip out the best suited machine learning programming language and replace it with a language where I felt most efficient: JavaScript.

The following article should give you a gentle introduction to machine learning from a web developer's perspective. It should show you the opportunities in the field of machine learning and why it could be an advantage for web developers to learn about those things with JavaScript now. Furthermore, it should give you guidance on how to approach the topic without learning everything from scratch. You can leverage the implementation details in JavaScript and focus on the theoretical parts: algorithms and mathematics. If you are familiar with the topic and have improvements for the article, please don't hesitate to reach out to me. I am still learning about the topic myself and would be grateful for any nudges in the right direction. After all, the guidance I give only describes my learning path, but I hope that others can make use of it.

MACHINE LEARNING IN JAVASCRIPT? WHAT'S WRONG WITH YOU?

By now I can hear the crowd yelling: JavaScript is not suited for machine learning. You may be right. But there are a couple of reasons why JavaScript could actually make sense for learning about machine learning as a web developer. And maybe not only as a web developer. Personally, I think it has huge potential. That's why I attempt to make the topic more accessible for web developers.

As mentioned before, you might already be proficient in JavaScript. You don't have to learn another programming language from scratch. You can apply the theoretical parts of machine learning in any language. So why not JavaScript? Then you only have to learn the theoretical parts of ML while applying the implementation details in JavaScript in the early stages. Afterward, you can always switch to another language for machine learning. Nobody takes that away from you. But you decide how to break down the learning path to shape your own learning curve and experience. You keep the overwhelming amount of things to learn to a minimum, and thus might be better off staying in a state of flow by keeping the challenges ahead and your level of skill in balance.

JavaScript is evolving at a rapid pace. It is applied in several domains by now where nobody would have expected it a couple of years ago. You can see it on mobile devices, desktop applications, embedded systems and of course backend applications. It's not all about web development anymore. So why not machine learning? Maybe writing machine learning algorithms in JavaScript eventually becomes efficient, computationally and implementation-wise. Recently a couple of libraries emerged which give us a framework around algorithms and neural networks. Those libraries make machine learning computationally efficient by using WebGL in the browser. Perhaps it's not the best idea to train machine learning models in the browser, but using pre-trained models in the browser might be a promising field in the future. Last but not least, maybe JavaScript is just a bridge for web developers entering the field of machine learning who move to a better suited programming language afterward. Nobody knows, but I want you to think about these possibilities.

But what about the performance? Machine learning algorithms are highly dependent on performance. Often they use so called vectorized implementations to stay computationally efficient. Graphical computations performed by the GPU are used in a similar way. That's what makes C++ so interesting as a programming language for machine learning. Therefore, one would assume that JavaScript itself is not the best suited programming language. However, with WebGL becoming popular for GPU accelerated executions in the browser, it is utilized for recent machine learning libraries in JavaScript too.

Another concern exists regarding the training phase. Why should it happen in the browser at all, even though it is supported by the GPU? In highly efficient machine learning architectures, the computation is offloaded to distributed systems. But there again, recent machine learning libraries for JavaScript are used with pre-trained models (inference phase) rather than for the training phase in the browser. The model comes from a server and is only used for further predictions and visualizations in the browser. So why shouldn't it be possible to offer a framework around this interplay of a training phase in the backend and an inference phase in the frontend? As mentioned before, using pre-trained models in the browser could become a common practice in the future. People are working eagerly on making those models smaller in size. So it's not as difficult anymore as it was in the past to transfer them via a remote API.

One big argument against machine learning in JavaScript used to be its lack of libraries. But that's not so true anymore. There are a bunch of libraries helping you out. For instance, consider a couple of programming languages used in machine learning and the areas where they are primarily applied:

  • Math / Data Analysis: Matlab, Octave, Julia, R
  • Data Mining: Scala, Java (e.g. Hadoop, Spark, Deeplearning4j)
  • Performance: C/C++ (e.g. GPU accelerated)

Next, you can see why Python makes so much sense in machine learning. It has a suitable set of libraries for the different tasks assigned to the programming languages above, and even more well fitting solutions:

  • Math: numpy
  • Data Analysis: Pandas
  • Data Mining: PySpark
  • Server: Flask, Django
  • Performance: TensorFlow (written with a Python API over a C/C++ engine), Keras (sits on top of TensorFlow)

So yeah, it seems like it just makes sense to use Python for machine learning. But the JavaScript ecosystem offers a rich set of libraries suited for most of these tasks too. Because most of them are used beyond machine learning, the JavaScript ecosystem developed a couple of sophisticated solutions beforehand:
  • Math: math.js
  • Data Analysis: d3.js
  • Server: node.js (express, koa, hapi)
  • Performance: Tensorflow.js (e.g. GPU accelerated via the WebGL API in the browser), Keras.js

Even though a library such as math.js is not running on the GPU for expensive computations, I guess one could use utility libraries such as gpu.js to accelerate its performance. Furthermore, recent high level machine learning libraries such as Tensorflow.js come with their own set of mathematical functions which are indeed accelerated by the GPU. In the future, you would either use one of those dedicated, GPU accelerated machine learning libraries for JavaScript, or math.js gets its own GPU accelerated wrapper eventually.
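
For illustration, here is a minimal sketch of a vectorized computation with math.js (assuming the mathjs npm package; the numbers are made up):

const math = require('mathjs');

// Vectorized math: multiply a 2x2 matrix with a vector in one call
const A = [[1, 2], [3, 4]];
const v = [5, 6];

console.log(math.multiply(A, v)); // [ 17, 39 ]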

Except for the last libraries (Tensorflow.js and Keras.js) on the previous list, none of the other libraries is strictly related to machine learning. They were developed independently and thus have strong communities of their own. So JavaScript isn't so far behind other programming languages when it comes to the toolset. But for sure, the sky is the limit. There are endless improvements which could be made and libraries which are needed. That's just another opportunity for open source developers to implement the necessary tools around it. And I assume that in the future, sophisticated libraries for machine learning in JavaScript will evolve. Just recently, a couple of interesting libraries were released or announced for machine learning in JavaScript.

  • Tensorflow.js (previously Deeplearn.js): The library by Google is GPU accelerated via the WebGL API and used for predictions with pre-trained models in inference mode in the browser, but also for the training mode itself. It mirrors the API of the popular TensorFlow library (see the sketch after this list).
  • TensorFire and Keras.js: Another pair of GPU accelerated libraries which are used for pre-trained models in inference mode. They allow you to write your models in Keras or TensorFlow with Python. Afterward, you can deploy them to the web by using TensorFire or Keras.js.
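
As a rough sketch of the inference side, consuming a pre-trained model with Tensorflow.js could look like the following. The model URL is a placeholder and the API shown assumes a recent version of the @tensorflow/tfjs package:

import * as tf from '@tensorflow/tfjs';

async function predict() {
  // Load a model that was trained and exported elsewhere (e.g. on a server):
  const model = await tf.loadLayersModel('https://example.com/model.json');

  // Run a prediction in the browser (inference phase only):
  const input = tf.tensor2d([[0.5, 0.8]]); // shape must match the model's input
  model.predict(input).print();
}

predict();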

2017 alone brought up those exciting and promising libraries. So I am curious what 2018 will offer us.

As you can see, so far the article has pointed out a couple of concerns about using JavaScript as your programming language to get started in machine learning. However, most of these reasons are not as valid anymore as they were a couple of years ago. JavaScript is evolving, and so are its capabilities for applying machine learning with it. Even if JavaScript is only the bridge for you to learn about machine learning in the first place, you can learn a better suited programming language for it afterward. But then you only have to learn the programming language, without worrying too much about the machine learning part anymore. Granted, learning machine learning is an ongoing process and you will always learn something new in this fast paced domain. But it's exciting, because it has so many facets.

MACHINE LEARNING AS AN OPPORTUNITY FOR WEB DEVELOPERS

I made my own motivation clear at the beginning of this article. However, that's not the whole story. There are plenty of reasons and opportunities to dive into machine learning as a web developer.

First of all, it is always an opportunity to broaden one's horizons. This doesn't only apply to machine learning. Whenever you feel you are getting too comfortable, take it as an opportunity to learn something new. You don't need to take the practical way of implementing machine learning algorithms in JavaScript; maybe learning about the math and the algorithms on a theoretical level suffices for you. After all, you keep your mind sharp by learning.

Second, there are plenty of job opportunities out there in the domain of machine learning. Sure, it has been an overly hyped topic in recent years, but not without reason. Students and researchers in the field are hired straight out of university. There seems to be a huge demand in the general fields of AI, data analysis and machine learning. Bootcamps are popping up or shifting their focus to data science. JavaScript can be the entry point into machine learning for web developers. See it as an opportunity to take one step beyond web development and maybe toward a wider range of job opportunities. Perhaps the market of web development paired with machine learning grows in the next years. But even if it doesn't, you can learn a programming language suited for machine learning and apply all your theoretical learnings in it. After all, maybe there comes a time when web developers have to make an important decision to get into a different domain than web development. Maybe their own work becomes redundant due to machine learning. So why not learn ML?

Third, even though JavaScript facilitates a lot of utility libraries for machine learning, there is plenty of room for improvements in the domain. Just thinking briefly about it, I am able to come up with a few things. For instance, speaking about computational efficiency, most of the libraries are not GPU accelerated yet. They would benefit a lot from that in order to be computationally efficient for machine learning in the browser. In terms of visualizations, there are a couple of charting libraries, such as d3.js as a low level visualization library, but there aren't any suitable abstractions for visualizations applicable to machine learning related problems. It should be simpler to plot the result of a support vector machine or to visualize a performing neural network explicitly, and not implicitly based on the used domain problem. There is enough space for open source projects combining machine learning and JavaScript. You could contribute to widening the bridge for web developers entering the field of machine learning.

Last but not least, there is great effort on the side of ML open source contributors (e.g. Tensorflow.js, TensorFire, Keras.js, Brain.js) to enable machine learning in the browser. However, most often the documentation is suited for machine learners entering the browser domain, and not the other way around as I described it in this article. Thus these solutions assume a lot of fundamental machine learning knowledge which isn't taught along the way. In return, it makes it difficult for web developers to enter the machine learning domain. Thus there is a great opportunity to pave the way for web developers into the domain of machine learning by making those fundamental topics and ported libraries accessible in an educational way. That's the point where I try to tie in my knowledge of teaching about those things. In the future, I want to give you guidance if you are keen to enter the field of machine learning as a web developer. Read more about this in the final paragraphs of this article.

INTRODUCTION TO MACHINE LEARNING

If you are familiar with machine learning, feel free to skip this section. Entering the field of machine learning as a beginner can be a buzzword heavy experience. Where should you start? There is so much terminology to clarify in the beginning. Is it AI or machine learning? What's all the hype about deep learning? And how does data science fit into this area?

Let's start our journey with AI (artificial intelligence). "It is the intelligence of a machine that could successfully perform any intellectual task that a human being can." There is a great analogy in the Machine Learning Guide podcast to convey the idea of AI: whereas the goal of the industrial revolution was the simulation of the physical body through machines, it is the goal of AI to simulate the brain for mental tasks through algorithms. So how does machine learning relate to AI? Let's have a look at a couple of subfields of AI:

  • searching and planning (e.g. playing a game with possible actions)
  • reasoning and knowledge representation (structuring knowledge to come to conclusions)
  • perception (vision, touch, hearing)
  • ability to move and manipulate objects (goes into robotics)
  • natural language processing (NLP)
  • learning

The last one represents machine learning. As you can see, it is only a subfield of AI. However, it might be the essential core fragment of AI, because it reaches into the other subfields of AI too, and has done so even more in recent times. For instance, vision as a subfield is becoming more a part of applied machine learning. Where other techniques, e.g. domain specific algorithms, dominated the domain in the past, machine learning is entering the field now, and deep neural networks are often used in the domain. So what are the applicable domains of AI, and therefore most often of machine learning?

So machine learning is a subfield of AI. Let's dive into the subject itself. There are a couple of great definitions for machine learning, yet when I started out with the subject, I found the one by Arthur Samuel (1959) most memorable: "The field of study that gives computers the ability to learn without being explicitly programmed."

How does it work? Basically, machine learning can be grouped into three categories: supervised learning, unsupervised learning and reinforcement learning. It's quite an evolution from the former to the latter. Whereas the former is more concrete, the latter becomes more abstract (yet exciting and unexplored). The former, supervised learning, gives the best entry point to machine learning and is therefore used in several educational machine learning courses to get you into the field.

In supervised learning, an algorithm is trained to recognize a pattern in a given data set. The data set is split up into input (x) and output (y). The algorithm is trained to map input to output by learning the underlying pattern from the given data set (training phase). Afterward, when the algorithm is trained, it can be used to make predictions for future input data points to come up with output data points (inference phase). During the training phase, a cost function estimates the performance of the current algorithm and adjusts the parameters of the algorithm based on those outcomes (penalization). The algorithm itself can be simplified into a simple function that maps an input x to an output y. It's called a hypothesis or model.

Predicting housing prices in Portland is one popular supervised machine learning problem. Given a data set of houses where each house has a size in square meters (x), the price (y) of the house should be predicted. Thus the data set consists of a list of sizes and prices for houses. It is called a training set. Each row in the training set represents a house. The input x, in this case the size of the house, is called a feature of the house. Since there is only one feature for the houses in the training set, it is called a univariate training set. If there are more features for a house, such as the number of bedrooms in addition to the size, it becomes a multivariate training set. Increasing the size of the training set (m) and the number of features (n) can lead to an improved prediction of y, where y is called a label, target or simply the output. In a nutshell: a model is trained with a penalizing cost function to predict labels from data points and their features.
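
A minimal sketch of these pieces in JavaScript, with made-up numbers: the training set as parallel arrays, a univariate hypothesis h(x) = theta0 + theta1 * x, and a mean squared error as cost function:

// Univariate training set: size in square meters (x) and price (y) per house
const x = [90, 120, 150, 200];
const y = [200000, 280000, 350000, 480000];
const m = x.length; // size of the training set

// Hypothesis (model): maps a size to a predicted price
const hypothesis = (theta0, theta1) => size => theta0 + theta1 * size;

// Cost function: measures how far the predictions are from the labels
function cost(theta0, theta1) {
  const h = hypothesis(theta0, theta1);
  const squaredErrors = x.map((xi, i) => (h(xi) - y[i]) ** 2);
  return squaredErrors.reduce((sum, e) => sum + e, 0) / (2 * m);
}

// The training phase would adjust theta0 and theta1 to minimize this value:
console.log(cost(0, 2400));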

Tom Mitchell (1998): "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

The previous use case of predicting housing prices in Portland is called a regression problem. A linear regression, as explained before, can be used to train the hypothesis to output continuous values (e.g. housing prices). Another problem to be solved in the area of supervised learning is called a classification problem, where a logistic regression is used to output categorical values. For instance, imagine you have a training set of T-shirts. The features, such as width and height, can be used to make predictions for the categorical sizes S, M and L.
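
The key ingredient of logistic regression is the sigmoid (logistic) function, which squashes the output of the hypothesis into a probability between 0 and 1. A minimal sketch with hypothetical T-shirt parameters:

// Sigmoid squashes any value into the range (0, 1)
const sigmoid = z => 1 / (1 + Math.exp(-z));

// Logistic hypothesis: probability that a shirt belongs to a class (e.g. size L)
const predictProbability = (theta, features) =>
  sigmoid(theta.reduce((sum, t, i) => sum + t * features[i], 0));

// Hypothetical parameters and features [bias, width, height]:
const probability = predictProbability([-8, 0.1, 0.1], [1, 45, 70]);
console.log(probability > 0.5 ? 'L' : 'not L'); // 'L' (probability ~0.97)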

The previous paragraphs were a first glimpse of supervised learning in machine learning. How does unsupervised learning work? Basically, there is a given training set with features but no labels y. The algorithm is trained without any given output data in the training set. In a clustering problem, the algorithm has to figure out on its own how to group the data points into clusters.
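
A toy sketch of this idea is k-means clustering, here in one dimension with k = 2: the algorithm alternates between assigning points to the nearest centroid and moving each centroid to the mean of its cluster, without ever seeing a label.

const points = [1, 2, 3, 10, 11, 12];
let centroids = [1, 12]; // initial guesses

for (let iter = 0; iter < 10; iter++) {
  // Assignment step: each point joins its nearest centroid
  const clusters = [[], []];
  points.forEach(p => {
    const nearest = Math.abs(p - centroids[0]) <= Math.abs(p - centroids[1]) ? 0 : 1;
    clusters[nearest].push(p);
  });

  // Update step: move each centroid to the mean of its cluster
  centroids = clusters.map(c => c.reduce((sum, p) => sum + p, 0) / c.length);
}

console.log(centroids); // ~[2, 11], two clusters found without labels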

And last but not least, what about reinforcement learning? In reinforcement learning, the algorithm is trained without any given data. It learns from experience by repeating a learning process. For instance, take this flappy bird which learns to win the game by using neural networks in reinforcement learning. The algorithm learns by trial and error. The underlying mechanism is a combination of rewards and penalizations to train the bird to fly, similar to how a real bird would learn.
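
The flappy bird example relies on neural networks, but the underlying reward/penalization loop can be illustrated with something far simpler. Here is a toy agent (an epsilon-greedy strategy over two made-up actions, much simpler than the bird's setup) that learns which action pays off purely by trial and error:

// Toy trial-and-error learning: estimate the value of two actions from rewards
const reward = action => (action === 1 ? 1 : 0); // action 1 is secretly better
const values = [0, 0]; // estimated value per action
const counts = [0, 0];

for (let step = 0; step < 1000; step++) {
  // Explore randomly 10% of the time, otherwise exploit the best known action
  const action = Math.random() < 0.1
    ? Math.floor(Math.random() * 2)
    : values.indexOf(Math.max(...values));

  const r = reward(action);
  counts[action] += 1;
  values[action] += (r - values[action]) / counts[action]; // running average
}

console.log(values); // the agent learned that action 1 yields the higher reward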

Last but not least, there might be another question popping up in your head: what's the relationship of data science to machine learning? Data science is often associated with machine learning. So one could argue that machine learning bleeds into both domains: data science and artificial intelligence. However, data science has its own subfields, such as data mining and data analysis. They are often coupled with machine learning, because data mining enables an algorithm to learn from mined data, and data analysis enables researchers to study the outcomes of algorithms.

That was a broad introduction to the field of machine learning. If you are interested in those topics related to JavaScript, keep an eye on my website over the next months. I hope to cover a few topics to give people guidance entering the field as web developers. As I said, I am learning about the topic myself and try to internalize these learnings by writing them down.

HOW TO LEARN MACHINE LEARNING AS A WEB DEVELOPER

There are a bunch of resources that I want to recommend to web developers entering the field of machine learning. As for myself, I wanted to stimulate my senses for at least 12 weeks. That's how long it is said to take to complete Andrew Ng's machine learning course. Keep in mind that it's my personal roadmap and it might not be suited for everyone. But it helped me a lot to follow a strict routine and have enough learning material along the way. So it might help other web developers too.

If you just want to get a feeling for the topic, start listening to the Machine Learning Guide up to episode 11. Tyler Renelle has done an amazing job giving an introduction to the topic. Since it is a podcast, just give it a shot while you exercise at the gym. That's how I entered the field of ML.

If you start to get excited, the next step would be to enroll in the Machine Learning course by Andrew Ng, which takes 12 weeks to complete. It takes you on a long journey from shallow machine learning algorithms to neural networks, from regression problems to clustering problems, and from theoretical knowledge in the field to applied implementations in Octave or Matlab. It is intense and challenging, but you can do it by dedicating a couple of hours each week to the course and the exercises.

The machine learning course goes from linear regression to neural networks in 5 weeks. At the end of week 5, I was left with an overwhelming feeling. It was a combination of "Can week 6 become even more complex?" and "Wow, this course taught me all the building blocks to implement a neural network from scratch". Andrew gives a perfect walkthrough to learn about all these concepts which build on one another. After all, machine learning has a lot in common with the composition of functions from functional programming. But you will learn about this yourself. I can only say that it was an overwhelming feeling to see my own implementation of a neural network performing in the browser for the first time.

Along the way, I did all the weekly assignments and solved them in Octave. In addition, I implemented most of the algorithms in JavaScript as well, both as an exercise for myself and as an estimation of how feasible it is to implement these algorithms in a different language which isn't suited for machine learning but is suited for web developers. It worked, and I published all of them in an open GitHub organization. It's open for everyone to contribute. But that's not the whole story. I wrote about a couple of topics as well, to internalize my own learnings, to get guidance from others, but also to help web developers enter the field. So if you are doing the course, check out the JavaScript implementations and walkthroughs along the way. These walkthroughs are dedicated machine learning tutorials for Node.js and the browser.

It's not comprehensive yet; for instance, a neural network implementation in vanilla JavaScript is missing, but I hope to complete all the bare bones algorithms in JavaScript at some point. The neural network implementation is done with a recently released library called deeplearn.js by Google, which got rebranded to Tensorflow.js. I was pretty excited to use it for the first time, and it was my personal reward, after doing the course for 5 weeks, to use a library instead of implementing neural networks in JavaScript from scratch. Have a look at the neural network in action to improve web accessibility. Another one learns digits using the MNIST database and visualizes its outcome. Maybe you see it as an opportunity to contribute to the GitHub organization as well. Next on the agenda are K-Means, Support Vector Machines (SVM) and principal component analysis (PCA) from scratch in JavaScript!

After you have completed week 5 of the machine learning course, you should have a good feeling about what machine learning is and how to solve problems with it. Afterward, the course continues with shallow algorithms for supervised and unsupervised learning. It gives elaborate guidance on how to improve your implemented machine learning algorithms and how to scale them for large data sets. When you have completed week 5, you should also continue with the Machine Learning Guide podcast to learn more about shallow algorithms and neural networks. I listened to it until episode 17, because afterward it goes heavily into natural language processing.

In addition, over the course of those weeks, I read The Master Algorithm by Pedro Domingos to get an overview of the topic, its different perspectives and stakeholders, and its history. After that, I started to read the open source ebook Deep Learning (by Ian Goodfellow, Yoshua Bengio and Aaron Courville). It happened after week 5 of the course and fitted perfectly with all the foundational knowledge I had learned so far. Even though I have found it quite a challenging book so far, I can recommend both books to give you even more guidance along the way. Once I finish the second book, I want to read the free ebooks Neural Networks and Deep Learning by Michael Nielsen and Deep Learning by Adam Gibson and Josh Patterson. Do you have any other book or podcast recommendations? You can leave a comment below!

What else is out there to learn machine learning? Now that I have completed the course by Andrew Ng, I will take some rest to internalize all those learnings. Likely I will write more about them on my blog. You can subscribe to the newsletter if you are interested in hearing about them. However, there are a bunch of other courses out there which I want to check out.

These are all courses recommended along with the Machine Learning course by Andrew Ng. Fast.ai has a course on computational linear algebra for the underlying math in ML too. In general, machine learning involves lots of math. If you need a refresher on certain topics, I can highly recommend Khan Academy.

Getting back to the topic: machine learning in JavaScript. What kinds of libraries are out there to support you with machine learning in JavaScript? If you attempt to go the puristic way of implementing math operations from scratch, there is no way around math.js (e.g. for matrix operations). However, if you are using high level libraries such as Keras.js or Tensorflow.js, you will have the most important mathematical methods integrated, in the form of their NDArrays, Tensors and mathematical operations. Otherwise, there are a couple of other libraries, not mentioning the previous ones again, which I haven't tried yet. Keep in mind that not all of them are GPU accelerated, but I guess when it comes to computational efficiency, a couple of them will offer it in the future.

There are even more machine learning related libraries in JavaScript for the other subfields of AI.

Another library didn't make it into the list, because it is not actively maintained: ConvNetJS. In addition, there are two more libraries implementing shallow machine learning algorithms in JavaScript: machine_learning and ml. In those libraries you can find logistic regression, k-means clustering, decision trees, k-nearest neighbours, principal component analysis and naive Bayes for JavaScript.

Many of those libraries are only for machine learning in Node.js. Thus they are not using the computationally efficient WebGL in the browser.

If you have any other recommendations, please leave a comment below. If you know whether certain libraries are actively maintained or not anymore, please reach out as well. I would love to keep this article updated in the future.

MORE PROGRAMMING LANGUAGES FOR MACHINE LEARNING

After learning and applying all the theoretical concepts in a programming language of your choice (e.g. JavaScript), you can always come back to learn the programming language best suited for machine learning. It can be a great learning experience in itself to see how much more efficiently something can be implemented in a different language. I had the same feeling when solving mathematical equations in Octave after having done them in JavaScript before.

A previous paragraph showed a couple of machine learning languages (Python, C/C++, R, Scala, Java, Matlab, Octave, Julia) and their fields of expertise. The one outlier facilitating everything with its libraries seems to be Python. I cannot give any profound recommendation here, because I didn't use any of those languages in relation to machine learning, but personally I would choose Python if I were to continue to learn about the topic after applying it in JavaScript. The most recommended resource regarding learning Python was Learn Python the Hard Way. Andrew Ng mentions in his machine learning course that machine learning algorithms are often developed as a prototype in Octave or Matlab but implemented in Python afterward. Therefore I am still figuring out a pragmatic learning roadmap as a combination of video, text and audio material for Python, as I did for machine learning itself. If you have any recommendations, please leave a comment below.

In the end, I am curious about your feedback regarding machine learning in JavaScript for web developers. As I said, I am learning about the topic myself on a daily basis at the moment. Most likely I will invest more time in this field in 2018, but I would love to hear your thoughts about it too. Are you staying with me on this journey?

Furthermore, I am curious if you have any opportunities for me to get more into machine learning in a professional way. At the moment, I am actively freelancing and consulting in JavaScript and web development and building my own projects on the side, but I would love to take the leap into machine learning for a professional position. I am eager to learn and would look up to mentors who are keen to teach someone new to the field of machine learning. So please take a moment to think about it and reach out to me in case there is anything where you can help me out :)

Last but not least, I want to announce BRIIM as a movement for machine learning in JavaScript. I hope I don't go out on a limb with it, but I am looking forward to seeing JavaScript become more accessible for machine learning in the next years. That's why I started the BRIIM movement as a place for everyone to come together. It's an opportunity to act in concert as a community and not as individuals. Instead of library communities being isolated from each other, it should give an entry point for machine learning in JavaScript to work under a collective movement. Instead of finding articles about machine learning all over the web, it would be great to have one well maintained resource for it. Instead of scraping together all the pieces to learn about machine learning in JavaScript, there should be one high quality resource to pave the way for beginners. It's a movement to contribute together toward widening the bridge for JavaScript enthusiasts entering the field of machine learning. So I hope to see you on the other side and that you join me on this journey.

If you have made it so far in this article, thank you so much for reading it!


Introduction New Features in TypeScript 3.7 and How to Use Them

The TypeScript 3.7 release is coming soon, and it's going to be a big one.

The target release date is November 5th, and there are some seriously exciting headline features included:

  • Assert signatures.
  • Recursive type aliases.
  • Top-level await.
  • Null coalescing.
  • Optional chaining.

Personally, I'm super excited about this; they're going to whisk away all sorts of annoyances that I've been fighting in TypeScript while building HTTP Toolkit.

If you haven't been paying close attention to the TypeScript development process though, it's probably not clear what half of these mean, or why you should care. Let's talk through all of them.

Assert Signatures

This is a brand-new and little-known TypeScript feature, which allows you to write functions that act like type guards as a side-effect, rather than explicitly returning their boolean result.

It's easiest to demonstrate this with a JavaScript example:

function assertString(input) { 
  if (typeof input === 'string') 
    return; 
  else 
    throw new Error('Input must be a string!'); 
} 
function doSomething(input) { 
  assertString(input); 
  // ... Use input, confident that it's a string 
} 
doSomething('abc'); // All good
doSomething(123);   // Throws an error

This pattern is neat and useful, but you can't use it in TypeScript today.

TypeScript can't know that you've guaranteed the type of input after it's run assertString. Typically, people just type the argument as input: string to avoid this, and that's fine. But it also just pushes the type checking problem somewhere else, and in cases where you just want to fail hard, it's useful to have this option available.

Fortunately, soon we will:

// With TS 3.7
function assertString(input: any): asserts input is string { // <-- the magic
  if (typeof input === 'string')
    return;
  else
    throw new Error('Input must be a string!');
}

function doSomething(input: string | number) {
  assertString(input);
  // input's type is just 'string' here
}

Here asserts input is string means that if this function ever returns, TypeScript can narrow the type of input to string, just as if it was inside an if block with a type guard.

To make this safe, that means if the assert statement isn't true then your assert function must either throw an error or not return at all (kill the process, infinite loop, you name it).

That's the basics, but this actually lets you pull some really neat tricks:

// With TS 3.7

// Asserts that input is truthy, throwing immediately if not:
function assert(input: any): asserts input { // <-- not a typo
  if (!input)
    throw new Error('Not a truthy value');
}

declare const x: number | string | undefined;

assert(x); // Narrows x to number | string

// Also usable with type guarding expressions!
assert(typeof x === 'string'); // Narrows x to string

// -- Or use assert in your tests: --

const a: Result | Error = doSomethingTestable();

expect(a).is.instanceOf(Result);     // 'instanceOf' could 'asserts a is Result'
expect(a.resultValue).to.equal(123); // a.resultValue is now legal

// -- Use as a safer ! that throws immediately if you're wrong --

function assertDefined<T>(obj: T): asserts obj is NonNullable<T> {
  if (obj === undefined || obj === null) {
    throw new Error('Must not be a nullable value');
  }
}

declare const y: string | undefined;

// Gives z just 'string' as a type, but could throw elsewhere later:
const z = y!;

// Gives z2 'string' as a type, or throws immediately if you're wrong:
assertDefined(y);
const z2 = y;

// -- Or even update types to track a function's side-effects --

type X<T extends string | {}> = { value: T };

// Use asserts to narrow types according to side effects:
function setX<T extends string | {}>(target: X<any>, v: T): asserts target is X<T> {
  target.value = v;
}

declare let obj: X<any>;
// obj is now { value: any };
setX(obj, 123);
// obj is now { value: number };

This is still in flux, so don't take it as the definite result, and keep an eye on the pull request if you want the final details.

There's even a discussion there about allowing functions to assert something and return a type, which would let you extend the final example above to track a much wider variety of side effects, but we'll have to wait and see how that plays out.

Top-Level Await

Async/await is amazing and makes promises dramatically cleaner to use.

Unfortunately, though, you can't use them at the top level. This might not be something you care about much in a TS library or application, but if you're writing a runnable script or using TypeScript in a REPL, then this gets super annoying.

It's even worse if you're used to frontend development, since top-level await has been working nicely in the Chrome and Firefox console for a couple of years now.

Fortunately though, a fix is coming. This is actually a general stage-3 JS proposal, so it'll be everywhere else eventually too, but for TS devs 3.7 is where the magic happens.

This one's simple, but let's have another quick demo anyway:


// Your only solution right now for a script that does something async: 
async function doEverything() { 
  ... 
  const response = await fetch('http://example.com'); 
  ... 
} 
  
doEverything(); // <- eugh (could use an IIFE instead, but even more eugh)

With top-level await:

// With TS 3.7: 
// Your script: ... 
const response = await fetch('http://example.com'); 
// ...

There's a notable gotcha here: if you're not writing a script, or using a REPL, don't write this at the top level, unless you really know what you're doing!

It's totally possible to use this to write modules that do blocking async steps when imported. That can be useful for some niche cases, but people tend to assume that their import statement is a synchronous, reliable, and fairly quick operation, and you could easily hose your codebase's startup time if you start blocking imports for complex async processes (even worse, processes that can fail).

This is somewhat mitigated by the semantics of imports of async modules: they're imported and run in parallel, so the importing module effectively waits for Promise.all(importedModules) before being executed.

Rich Harris wrote an excellent piece on a previous version of this spec (before a change when imports ran sequentially and this problem was much worse), which makes for good background reading on the risks here if you're interested.

It's also worth noting that this is only useful for module systems that support asynchronous imports. There isn't yet a formal spec for how TS will handle this, but that likely means a very recent target configuration and either ES Modules or Webpack v5 (whose alphas have experimental support) at runtime.

Recursive Type Aliases

If you've ever tried to define a recursive type in TypeScript, you may have run into StackOverflow questions like this: https://stackoverflow.com/questions/47842266/recursive-types-in-typescript.

Right now, you can't. Interfaces can be recursive, but there are limitations to their expressiveness, and type aliases can't. That means right now, you need to combine the two: define a type alias and extract the recursive parts of the type into interfaces. It works, but it's messy, and we can do better.

As a concrete example, this is the suggested type definition for JSON data:

type JSONValue = | string | number | boolean | JSONObject | JSONArray; 
interface JSONObject { [x: string]: JSONValue; } 
interface JSONArray extends Array<JSONValue> { }

That works, but the extra interfaces are only there because they're required to get around the recursion limitation.

Fixing this requires no new syntax; it just removes that restriction, so the below compiles:

// With TS 3.7: 
type JSONValue = | string | number | boolean | { [x: string]: JSONValue } | Array<JSONValue>;

Right now, that fails to compile with Type alias 'JSONValue' circularly references itself. Soon though, soon...

Null Coalescing

Aside from being difficult to spell, this one is quite simple and easy. It's based on a JavaScript stage-3 proposal, which means it'll also be coming to your favorite vanilla JavaScript environment soon (if it hasn't already).

In JavaScript, there's a common pattern for handling default values, and falling back to the first valid result of a defined group. It looks something like this:

// Use the first of firstResult/secondResult which is truthy: 
const result = firstResult || secondResult; 
// Use configValue from provided options if truthy, or 'default' if not: 
this.configValue = options.configValue || 'default';

This is useful in a host of cases, but due to some interesting quirks in JavaScript, it can catch you out. If firstResult or options.configValue can meaningfully be set to false, an empty string or 0, then this code has a bug. If those values are set, then, considered as booleans, they're falsy, so the fallback value (secondResult / 'default') is used anyway.
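
For example, with a hypothetical options object in plain JavaScript:

const options = { retries: 0, label: '' };

// Both intended values are silently thrown away, because 0 and '' are falsy:
const retries = options.retries || 3;  // 3, although 0 was explicitly configured
const label = options.label || 'none'; // 'none', although '' was explicitly set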

Null coalescing fixes this. Instead of the above, you'll be able to write:

// With TS 3.7: 
// Use the first of firstResult/secondResult which is *defined*: 
const result = firstResult ?? secondResult; 
// Use configValue from provided options if *defined*, or 'default' if not: 
this.configValue = options.configValue ?? 'default';

?? differs from || in that it falls through to the next value only if the first argument is null or undefined, not falsy. That fixes our bug. If you pass false as firstResult, that will be used instead of secondResult because, while it's falsy, it is still defined, and that's all that's required.

It's simple but super-useful, as it takes away a whole class of bugs.

Optional Chaining

Last but not least, optional chaining is another stage-3 proposal that is making its way into TypeScript.

This is designed to solve an issue faced by developers in every language: how do you get data out of a data structure when some or all of it might not be present?

Right now, you might do something like this:

// To get data.key1.key2, if any level could be null/undefined: 
let result = data ? (data.key1 ? data.key1.key2 : undefined) : undefined; 
// Another equivalent alternative: 
let result = ((data || {}).key1 || {}).key2;

Nasty! This gets much much worse if you need to go deeper, and although the second example works at runtime, it won't even compile in TypeScript, since the first step could be {}, in which case key1 isn't a valid key at all.

This gets still more complicated if you're trying to get into an array, or there's a function call somewhere in this process.

There's a host of other approaches to this, but they're all noisy, messy & error-prone. With optional chaining, you can do this:

// With TS 3.7: 
// Returns the value if it's all defined & non-null, or undefined if not. 
let result = data?.key1?.key2; 
// The same, through an array index or property, if possible: 
array?.[0]?.['key']; 
// Call a method, but only if it's defined: 
obj.method?.(); 
// Get a property, or return 'default' if any step is not defined: 
let result = data?.key1?.key2 ?? 'default';

The last case shows how neatly some of these dovetail together: null coalescing + optional chaining is a match made in heaven.

One gotcha: this will return undefined for missing values, even if they were null, e.g. in cases like (null)?.key (returns undefined). A small point, but one to watch out for if you have a lot of null in your data structures.

That's the lot! That should outline all the essentials for these features, but there are lots of smaller improvements, fixes, and editor support improvements coming too, so take a look at the official roadmap if you want to get into the nitty-gritty.

How to build a stable Node.js project architecture


Often, a product development process which involves JavaScript is accompanied by the use of Node.js, a JavaScript runtime environment. The birth of this technology has certainly turned the use of JS upside-down. Today, JavaScript is in the category of the most preferred languages to build apps, thanks to Node.js.

What is so special about this technology? To answer this, let's reflect not only on this technology's benefits but also on its architecture limitations and the ways to deal with the cons.

Node JS brief history

Node.js was introduced by Ryan Dahl in 2009. The technology is mostly used for building an app's server side / back-end. What's special about Node.js is that the technology is asynchronous.

This means that the server continues to process other client requests without urging the client to wait until a previously sent request is processed. Let's say it's the Node.js “value proposition” for all who would like to create reliable JS-based apps.
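
A minimal sketch of that behavior, with the slow work simulated by a timer: while one request waits for its "slow" result, the server keeps answering other requests.

const http = require('http');

http.createServer((req, res) => {
  if (req.url === '/slow') {
    // Simulate a slow I/O operation (e.g. a database query) without blocking:
    setTimeout(() => res.end('slow response'), 1000);
  } else {
    // Answered immediately, even while /slow requests are still pending:
    res.end('fast response');
  }
}).listen(3000);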

What is Node JS commonly used for?

The non-blocking I/O machinery behind the framework is a great way to build real-time web apps with NodeJS, as well as mobile products, chats, data streaming apps, browser games, APIs and medium-performance JS apps.

Node JS real-time applications examples and showcase

Among the companies who entrust their tech stack to Node.js are LinkedIn, Yahoo, IBM, Netflix, PayPal, Uber and others.


If you have worked with Node.js, you probably already know that it gives you the following:

  • Since Node.js is created with the help of C++, you can call C functions

  • An asynchronous nature which accelerates an app's functioning and provides the ability to multitask. Apart from this, the non-blocking I/O approach suits high-traffic, real-time websites and resource creation.

  • Stable multiplatform app development

  • Lots of Node.js development tools for a better workflow: npm packages, Express, Socket.io, etc. (we'll touch upon these a bit later in the article)

  • Clear and flexible learning curve

But due to Node.js architecture limitations, you lose the opportunity to:

  • Create computation-heavy apps with elements of 3D projection, or calculation apps.
    As Node.js is single-threaded, it is not the right fit for such projects. All the actions happen on a single thread and hence overload the CPU. For such types of apps or software, it's better to utilize multithreading languages like C or C#. For full-scale video games, you can use Unity.

  • Use some npm packages.

Not all npm packages we've mentioned in the pros are of high quality and stable. Thus, you have to filter them properly and choose only the reliable ones. (Author's note: think of npm packages like plugins in WordPress.)

  • Taking into consideration the Node.js architecture, you can't utilize relational databases at full power.

Node.js works best with document-oriented databases like MongoDB.

Let's get into the tech details and best practices for Node.js development, and more detailed practical tips on working with this platform.

  • Application Specifics
    First and foremost, think about what type of application you plan to release. To proceed further with building your Node.js project architecture, ask yourself some of the following questions:

  • Are you going to build a real-time web app with Node.js? Is it meant to be a mobile or a console one? Or maybe it's a multiplatform app?

  • What data should the app operate with? Are these databases, files, or remote storages (like Amazon S3)?

  • Do you plan to use special software in your application? Are sophisticated data processing algorithms such as face detection or text recognition listed in your app's functionality business plan?

  • Does your application need extra hardware, like a camera, microphone, various sensors, or any other related devices?

  • What are the architectural specifics of the future app? Is it meant to be a client-server app, MVC, or maybe another type of architecture?

If you've answered all of these questions, make sure that Node.js is able to fully meet all the requirements set for your project development. For example, if you need an API server that works with several types of databases, Node.js might be a good choice. But if you need an application that is designed to build 3D graphics using DirectX, you might want to get acquainted with C++ a bit closer.

Let's assume that your application uses special temperature and contamination sensors. You can pair such features with a Raspberry Pi, Arduino or any other special device to go further and create a 'smart' functionality model. But before you start, make sure that the driver of any mentioned device is compatible with Node.js.

Best practices for Node.js development workflow

Typically, an application is written by a team of developers. Everyone on the team is unique, and so is their code style. Therefore, it's recommended to settle on and take into account the following code organization nuances before the development stage.

  • Functional development style or object-oriented programming patterns?

Since JS is a weakly-typed language and allows you to write your code freestyle, it's still better to agree upon a single set of code writing rules in your team. This will keep most misunderstandings away and will help your colleagues get a better understanding of the project's code.

  • Code style
    Discuss code writing dos and don'ts with your teammates. Check the quality of what is written using Node.js development tools like Lint.

  • Your team's experience with the integration of third-party means and devices in your application (i.e. Google Maps, data collection, analytics tools, e-communication means, etc.)

  • Data models you’re up to work with (files, databases or third-party APIs)

  • Communication and data exchange tools you're going to use (REST API, Blouse Protocol, Socket.io, GraphQL, DDP protocol)

  • Possibility to utilize 3rd party libraries (hardware libraries, special algorithms)

The scope of work and its specifics can vary greatly. Let's say one of your teammates works with an SQL database, while someone else deals with the Amazon API. Thus, each of your colleagues is assigned their particular tasks.

Don't reinvent the wheel

Currently the Node.js community is up and thriving. A lot of neat features have already been invented and written by other developers. So before you create a particular functionality for your application, check whether someone else has encountered the exact same problem before.

If you're not the only one who has experienced a certain issue, you might find the solution in npm packages. This is an entire catalogue of ready-made useful libraries that will make life much easier for you.

The same applies to frameworks. Think about whether to use any of them to speed up the process of building a real-time web app with Node JS or a mobile one.

Let's say if you're dealing with a REST API, you can try out Express.js. If you need to interact with a particular database type, you can refer to frameworks such as Mongoose.js (for MongoDB) or an SQL ORM, depending on which database type you need.
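
As a rough sketch, a minimal REST endpoint in Express takes only a few lines (assuming the express npm package; the route and response are made up):

const express = require('express');
const app = express();

app.use(express.json()); // parse JSON request bodies

// A minimal REST endpoint:
app.get('/api/v1/users/:id', (req, res) => {
  res.json({ id: req.params.id, name: 'Example user' });
});

app.listen(3000, () => console.log('API listening on port 3000'));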

Although packages can benefit your project, there are some significant dangers to be aware of. Given the fact that these solutions are open source, there are several threats to bear in mind:

  • Duplicates

There are too many packages already, and some of them clone others. Unfortunately, this trend is only growing. Be careful and make sure you choose the right and unique npm package.

  • Malicious code
    Since these packages are not supervised, anyone can write anything they want. Read more about security issues in the article by David Gilbertson, 'I'm harvesting credit card numbers and passwords from your site. Here's how'. So if your product has to provide AAA-grade security, check each code snippet of any package you install meticulously.

Always stay ahead of the time

The JS and Node.js community is constantly growing. ES standards are frequently updated. Old features are being replaced by new, better ones and implemented in Node.js. Thus it's important to always monitor the technology's state of the art.

For instance,

[callbackHell](http://callbackhell.com/)
fs.action(source, function (err, res) {
  if (err) {
    console.log('Error: ' + err)
  } else {
    res.action(function (err, res) {
      if (err) {
        console.log('Error: ' + err)
      } else {
        // ... and so on, one level deeper per async step
      }
    })
  }
})

Was replaced with

[promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)
fs.action(source)
  .then(res => res.action())
  .then(res => res.action())
  .then(res => res.action())
  .catch(err => console.log('Error: ' + err))

Now we can use async/await as the alternative to Promises:

[async/await](https://blog.risingstack.com/async-await-node-js-7-nightly)
try {
  const res = await fs.action(source);
  const res1 = await res.action(source);
  const res2 = await res1.action(source);
  const res3 = await res2.action(source);
} catch (error) {
  console.log('Error: ' + error)
}

As for now, the community's opinions on whether to use promises or async/await vary.

Try to update your knowledge base with new Node.js and ES releases on a regular basis. This will help you keep the development process modern.

Node.js app development techniques and tips

“Deep in the human unconscious is a pervasive need for a logical universe that makes sense. But the real universe is always one step beyond logic.”
― Frank Herbert.

To overcome Node.js architecture limitations and issues from the trivial to the challenging, keep to a clear development structure.

Even though a Node.js project might consist of only one file, this does not mean that you should pile everything up into one great mess.

Make your code as readable and understandable for others as it can be. The following recommendations help:

  • Follow the structure conventions of the framework you are using. If you do not use any, place your code in directories and subdirectories in the most logical way possible.

  • File naming

Keep to a single agreed file naming scheme. For example, choose one of these: ErrorHandler, errorHandler, error_handler or error-handler. Try to name files according to their purpose rather than their functionality. For example, it’s better to name the file NotifyAllUsersByEmailSMSLocal simply Notifier.

  • The entry point must not contain unnecessary code lines

The entry point is the main.js, app.js or www file that is requested to launch your application. Such files should contain only calls to certain methods or classes, and nothing more (see the sketch after this list).

  • index.js.

Keeping only imports / exports in these files is generally considered a justified practice, as the second half of the sketch below shows.
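
Here is a minimal sketch of both conventions; the module names (server.js, services/notifier.js, services/parser.js) are invented for the example:

// app.js: the entry point only wires things up
const server = require('./server');

server.start(process.env.PORT || 3000);

// services/index.js: nothing but re-exports
module.exports = {
  notifier: require('./notifier'),
  parser: require('./parser'),
};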

Tips for better Node.js project architecture

The way the code is written signifies the ‘face’ (reputation) of the programmer. It also shows how the entire development team deals with the app’s creation using a certain technology or language, Node.js in our case. Therefore, always try to keep the code up-to-date and structured (and comprehensible for other developers). Code readability is one of the main ways to build a stable, real-time web app with Node.js.

  • Use code quality control tools, like a linter

This tool will keep a keen eye on trivial errors and give them no chance to slip through. It will also allow you to keep the code in one unified form.
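
For instance, a minimal linter setup with ESLint might look like this; a sketch in ESLint's .eslintrc.json format, with rule choices picked only as an example:

{
  "extends": "eslint:recommended",
  "env": { "node": true, "es6": true },
  "rules": {
    "no-unused-vars": "warn"
  }
}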

  • Keep track of your files size

Files that are too large are difficult to navigate and understand. The optimal file size is 300-500 lines or less. So if you notice that the code keeps growing within the same file, turn it into a directory with several files inside.

  • Comment on your code
    When you write a universal module that will be used in several places, don’t forget to create a quick guide on how to utilize this code.
/**
 * Provides notification sending for users via Local, Email, SMS
 *
 * @example
 *          new Notifier(user).notify(['sms', 'email'], ...)
 **/
class Notifier {

You can also leave instructions for methods in your code

/**
 * Parses a date into the project's single standard format
 * @param {Date} date
 * @example
 *        dateToString(date) => String
 **/
 

If you’re developing a REST API, you can embed usage instructions in the code itself. It’s recommended, though, to create complete documentation/guidelines and store them in a single resource.

/**
* Provide Api for Account
  Account Register  POST /api/v1/account/
  @params
         email {string}
         password {string}
  Account Login  POST /api/v1/account/login
  @params
         email {string}
         password {string}

  Account Logout  GET /api/v1/account/logout
  @header
         Authorization: Bearer {token}
 **/
 

There are two sides to this coin, though.

To keep to Node.js architecture best practices, make your code as descriptive and organized as possible, but don’t overdo it.

Don’t comment every single line of code. That will do more harm than good and only make the development process more complex. A better tip is to comment on a code snippet’s purpose rather than its functionality (what it does). Depending on the development style (callback, promise, async/await), write and use only one (if possible) general error handler/processor.

Error handling

Error handling is another important aspect among the best practices for Node.js development to bear in mind.

Since JavaScript is not as strict as Java, all the responsibility lies with the development team.

First and foremost, always try to handle errors. Otherwise they can lead to uncontrolled app behavior.

  • With callback
const withoutErrors = callback => (err, updatedTank) => {
  if (err) {
    return // do something with the error
  }
  return callback(updatedTank);
};
fs.action(withoutErrors(data => { /* ... */ }))

  • With promises

const handleError = error => {
  // do something with the error
};
fs.action()
    .then(data => data.action())
    .then(data => data.action())
    .catch(handleError)
  • With async / await
class Actions {
  async action1 (data) {
    return fs.action(data)
  }
  async action2 (data) {
    return fs.action(data)
  }
  // ...
}

try {
  await new Actions().action1();
  await new Actions().action2();
} catch (error) {
  return handleError(error)
}

Node JS development tools
  • Gulp

A toolkit that allows you to launch several apps simultaneously. It can be useful if you’d like to run several services at the same time with one command (see the gulpfile sketch after this list).

  • Nodemon

A hot-reload tool for Node.js. It automatically restarts your project after any code change, which is quite handy during Node.js project architecture development.

  • Forever, pm2
    These two packages ensure the app launches when the (OS) system starts.

  • Winston
    Lets you record the app’s logs to a primary source (file or database). The package comes in handy when you need the app to work remotely and don’t have full access to it.

  • Threads
    A tool designed to make working with threads easier.
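
As an illustration of the Gulp point above, here is a minimal gulpfile sketch that starts two services with one command; the service paths are invented for the example:

// gulpfile.js: running `npx gulp dev` starts both services at once
const { parallel } = require('gulp');
const { spawn } = require('child_process');

// Gulp treats a returned child process as the task's completion signal
const api = () => spawn('node', ['api/server.js'], { stdio: 'inherit' });
const worker = () => spawn('node', ['worker/index.js'], { stdio: 'inherit' });

exports.dev = parallel(api, worker);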

To sum up

We hope that the Node.js architecture best practices presented in this article will help you reach the most desirable results when looking for a way to build performant, real-time web or mobile apps with Node.js.

Build a CMS with Laravel and Vue

A CMS (Content Management System) helps content creators produce content in an easily consumable format. In this tutorial series, we will consider how to build a simple CMS from scratch using Laravel and Vue.

Build a CMS with Laravel and Vue - Part 1: Setting up

The birth of the internet has since redefined content accessibility for the better, causing a distinct rise in content consumption across the globe. The average user of the internet consumes and produces some form of content formally or informally.

An example of an effort at formal content creation is when someone makes a blog post about their work so that a targeted demographic can easily find their website. This type of content is usually served and managed by a CMS (Content Management System). Some popular ones are WordPress, Drupal, and SilverStripe.

Our CMS will be able to create new posts, update existing posts, delete posts that we do not need anymore, and also allow users to comment on posts, with comments updated in realtime using Pusher. We will also be able to add featured images to posts to give them some visual appeal.

When we are done, we will be able to have a CMS that looks like this:

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.

The source code for this project is available here on GitHub.

Installing the Laravel CLI

The first thing we need to do is install the Laravel CLI and the Laravel dependencies. The CLI will be instrumental in creating new Laravel projects whenever we need to create one. Laravel requires PHP and a few other tools and extensions, so we need to install these before installing the CLI.

The official Laravel documentation lists the required dependencies; the ones we need to install manually are PHP itself and the Mbstring, XML, and ZIP extensions, plus curl and Composer for tooling.

Let’s install them one at a time.

Installing PHP

Open a fresh instance of the terminal and paste the following command:

    # Linux Users
    $ sudo apt-get install php7.2

    # Mac users
    $ brew install php72


As of the time of writing this article, PHP 7.2 is the latest stable version of PHP, so the command above installs it on your machine.

On completion, you can check that PHP has been installed on your machine with the following command:

    $ php -v


Installing the Mbstring extension

To install the mbstring extension for PHP, paste the following command in the open terminal:

    # Linux users
    $ sudo apt-get install php7.2-mbstring

    # Mac users
    # You don't have to do anything as it is installed automatically.


To check if the mbstring extension has been installed successfully, you can run the command below:

    $ php -m | grep mbstring


Installing the XML PHP extension

To install the XML extension for PHP, paste the following command in the open terminal:

    # Linux users
    $ sudo apt-get install php-xml

    # Mac users
    # You don't have to do anything as it is installed automatically.


To check if the xml extension has been installed successfully, you can run the command below:

    $ php -m | grep xml


Installing the ZIP PHP extension

To install the zip extension for PHP, paste the following command in your terminal:

    # Linux users
    $ sudo apt-get install php7.2-zip

    # Mac users
    # You don't have to do anything as it is installed automatically.


To check if the zip extension has been installed successfully, you can run the command below:

    $ php -m | grep zip


Installing curl

To install curl, paste the following command in your terminal:

    # Linux users
    $ sudo apt-get install curl

    # Mac users using Homebrew (https://brew.sh)
    $ brew install curl


To verify that curl has been installed successfully, run the following command:

    $ curl --version


Installing Composer

Now that we have curl installed on our machine, let’s pull in Composer with this command:

    $ curl -sS https://getcomposer.org/installer | sudo php -- --install-dir=/usr/local/bin --filename=composer


For us to run Composer in the future without calling sudo, we may need to change the permissions. However, you should only do this if you have problems installing packages:

    $ sudo chown -R $USER ~/.composer/


Installing the Laravel installer

At this point, we can already create a new Laravel project using Composer’s create-project command, which looks like this:

    $ composer create-project --prefer-dist laravel/laravel project-name


But we will go one step further and install the Laravel installer using composer:

    $ composer global require "laravel/installer"


After the installation, we will need to add Composer’s global bin directory to the PATH in the bashrc file so that our terminal can recognize the laravel command:

    $ echo 'export PATH="$HOME/.composer/vendor/bin:$PATH"' >> ~/.bashrc
    $ source ~/.bashrc


Creating the CMS project

Now that we have the official Laravel CLI installed on our machine, let’s create our CMS project using the installer. In your terminal window, cd to the directory you want to create the project in and run the following command:

    $ laravel new cms


We will navigate into the project directory and serve the application using PHP’s web server:

    $ cd cms
    $ php artisan serve


Now, when we visit http://127.0.0.1:8000/, we will see the default Laravel template:

Setting up the database

In this series, we will be using MySQL as our database system so a prerequisite for this section is that you have MySQL installed on your machine.

You can follow the official MySQL installation guide for your operating system to install and configure MySQL.

You will also need a special driver that makes it possible for PHP to work with MySQL. You can install it with this command:

    # Linux users
    $ sudo apt-get install php7.2-mysql

    # Mac Users
    # You don't have to do anything as it is installed automatically.


Load the project directory in your favorite text editor and there should be a .env file in the root of the folder. This is where Laravel stores its environment variables.

Create a new MySQL database and call it laravelcms. In the .env file, update the database configuration keys as seen below:

    DB_CONNECTION=mysql
    DB_HOST=127.0.0.1
    DB_PORT=3306
    DB_DATABASE=laravelcms
    DB_USERNAME=YourUsername
    DB_PASSWORD=YourPassword


Setting up user roles

Like most content management systems, we are going to have a user role system so that our blog can have multiple types of users: the admin and the regular user. The admin will be able to create a post and perform other CRUD operations on it. The regular user, on the other hand, will be able to view and comment on a post.

For us to implement this functionality, we need to implement user authentication and add a simple role authorization system.

Setting up user authentication

Laravel provides user authentication out of the box, which is great, and we can key into the feature by running a single command:

    $ php artisan make:auth


The above command will create everything that’s necessary for authentication in our application, so we do not need to do anything extra.

Setting up role authorization

We need a model for the user roles so let’s create one and an associated migration file:

    $ php artisan make:model Role -m


In the database/migrations folder, find the newly created migration file and update the CreateRolesTable class with this snippet:

    <?php // File: ./database/migrations/*_create_roles_table.php

    // [...]

    class CreateRolesTable extends Migration
    {
        public function up()
        {
            Schema::create('roles', function (Blueprint $table) {
                $table->increments('id');
                $table->string('name');
                $table->string('description');
                $table->timestamps();
            });
        }

        public function down()
        {
            Schema::dropIfExists('roles');
        }
    }

We intend to create a many-to-many relationship between the User and Role models so let’s add a relationship method on both models.

Open the User model and add the following method:

    // File: ./app/User.php
    public function roles() 
    {
        return $this->belongsToMany(Role::class);
    }

Open the Role model and include the following method:

    // File: ./app/Role.php
    public function users() 
    {
        return $this->belongsToMany(User::class);
    }

We are also going to need a pivot table to associate each user with a matching role so let’s create a new migration file for the role_user table:

    $ php artisan make:migration create_role_user_table


In the database/migrations folder, find the newly created migration file and update the CreateRoleUserTable class with this snippet:

    // File: ./database/migrations/*_create_role_user_table.php
    <?php 

    // [...]

    class CreateRoleUserTable extends Migration
    {

        public function up()
        {
            Schema::create('role_user', function (Blueprint $table) {
                $table->increments('id');
                $table->integer('role_id')->unsigned();
                $table->integer('user_id')->unsigned();
            });
        }

        public function down()
        {
            Schema::dropIfExists('role_user');
        }
    }

Next, let’s create seeders that will populate the users and roles tables with some data. In your terminal, run the following command to create the database seeders:

    $ php artisan make:seeder RoleTableSeeder
    $ php artisan make:seeder UserTableSeeder


In the database/seeds folder, open the RoleTableSeeder.php file and replace the contents with the following code:

    // File: ./database/seeds/RoleTableSeeder.php
    <?php 

    use App\Role;
    use Illuminate\Database\Seeder;

    class RoleTableSeeder extends Seeder
    {
        public function run()
        {
            $role_regular_user = new Role;
            $role_regular_user->name = 'user';
            $role_regular_user->description = 'A regular user';
            $role_regular_user->save();

            $role_admin_user = new Role;
            $role_admin_user->name = 'admin';
            $role_admin_user->description = 'An admin user';
            $role_admin_user->save();
        }
    }

Open the UserTableSeeder.php file and replace the contents with the following code:

    // File: ./database/seeds/UserTableSeeder.php
    <?php 

    use Illuminate\Database\Seeder;
    use Illuminate\Support\Facades\Hash;
    use App\User;
    use App\Role;

    class UserTableSeeder extends Seeder
    {

        public function run()
        {
            $user = new User;
            $user->name = 'Samuel Jackson';
            $user->email = '[email protected]';
            $user->password = bcrypt('samuel1234');
            $user->save();
            $user->roles()->attach(Role::where('name', 'user')->first());

            $admin = new User;
            $admin->name = 'Neo Ighodaro';
            $admin->email = '[email protected]';
            $admin->password = bcrypt('neo1234');
            $admin->save();
            $admin->roles()->attach(Role::where('name', 'admin')->first());
        }
    }

We also need to update the DatabaseSeeder class. Open the file and update the run method as seen below:

    // File: ./database/seeds/DatabaseSeeder.php
    <?php 

    // [...]

    class DatabaseSeeder extends Seeder
    {
        public function run()
        {
            $this->call([
                RoleTableSeeder::class, 
                UserTableSeeder::class,
            ]);
        }
    }

Next, let’s update the User model. We will be adding a checkRoles method that checks what role a user has. We will return a 404 page where a user doesn’t have the expected role for a page. Open the User model and add these methods:

    // File: ./app/User.php
    public function checkRoles($roles) 
    {
        if ( ! is_array($roles)) {
            $roles = [$roles];    
        }

        if ( ! $this->hasAnyRole($roles)) {
            auth()->logout();
            abort(404);
        }
    }

    public function hasAnyRole($roles): bool
    {
        return (bool) $this->roles()->whereIn('name', $roles)->first();
    }

    public function hasRole($role): bool
    {
        return (bool) $this->roles()->where('name', $role)->first();
    }

Let’s modify the RegisterController.php file in the Controllers/Auth folder so that a default role, the user role, is always attached to a new user at registration.

Open the RegisterController and update the create action with the following code:

    // File: ./app/Http/Controllers/Auth/RegisterController.php
    protected function create(array $data)
    {       
        $user = User::create([
            'name'     => $data['name'],
            'email'    => $data['email'],
            'password' => bcrypt($data['password']),
        ]);

        $user->roles()->attach(\App\Role::where('name', 'user')->first());

        return $user;
    }

Now let’s migrate and seed the database so that we can log in with the sample accounts. To do this, run the following command in your terminal:

    $ php artisan migrate:fresh --seed


In order to test that our roles work as they should, we will make an update to the HomeController.php file. Open the HomeController and update the index method as seen below:

    // File: ./app/Http/Controllers/HomeController.php
    public function index(Request $request)
    {
        $request->user()->checkRoles('admin');

        return view('home');
    }

Now, only administrators should be able to see the dashboard. In a more complex application, we would use a middleware to do this instead.

We can test that this works by serving the application and logging in with both user accounts: Samuel Jackson and Neo Ighodaro.

Remember that in our UserTableSeeder.php file, we defined Samuel as a regular user and Neo as an admin, so Samuel should see a 404 error after logging in and Neo should be able to see the homepage.

Testing the application

Let’s serve the application with this command:

    $ php artisan serve


When we try logging in with Samuel’s credentials, we should see this:

On the other hand, we will get logged in with Neo’s credentials because he has an admin account:

We will also confirm that whenever a new user registers, they are assigned the role of a regular user. We will create a new user called Greg; he should see a 404 error right after registering:

It works just as we wanted it to. However, it doesn’t really make sense to redirect a regular user to a 404 page. Instead, we will edit the HomeController so that it redirects users based on their roles: a regular user to a regular homepage and an admin to an admin dashboard.

Open the HomeController.php file and update the index method as seen below:

    // File: ./app/Http/Controllers/HomeController.php
    public function index(Request $request)
    {
        if ($request->user()->hasRole('user')) {
            return redirect('/');
        }

        if ($request->user()->hasRole('admin')){
            return redirect('/admin/dashboard');
        }
    }

If we serve our application and try to log in using the admin account, we will hit a 404 error because we do not have a controller or a view for the admin/dashboard route. In the next article, we will start building the basic views for the CMS.

Conclusion

In this tutorial, we learned how to install a fresh Laravel app on our machine and pulled in all the needed dependencies. We also learned how to configure the Laravel app to work with a MySQL database. We also created our models and migrations files and seeded the database using database seeders.

In the next part of this series, we will start building the views for the application.

The source code for this project is available on Github.

Build a CMS with Laravel and Vue - Part 2: Implementing posts

In the previous part of this series, we set up user authentication and role authorization but we didn’t create any views for the application yet. In this section, we will create the Post model and start building the frontend for the application.

Our application allows different levels of accessibility for two kinds of users; the regular user and admin. In this chapter, we will focus on building the view that the regular users are permitted to see.

Before we build any views, let’s create the Post model as it is imperative to rendering the view.

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.
Setting up the Post model

We will create the Post model with an associated resource controller and a migration file using this command:

    $ php artisan make:model Post -mr


Let’s navigate into the database/migrations folder and update the CreatePostsTable class that was generated for us:

    // File: ./app/database/migrations/*_create_posts_table.php
    <?php 

    // [...]

    class CreatePostsTable extends Migration
    {
        public function up()
        {
            Schema::create('posts', function (Blueprint $table) {
                $table->increments('id');
                $table->integer('user_id')->unsigned();
                $table->string('title');
                $table->text('body');
                $table->binary('image')->nullable();
                $table->timestamps();
            });
        }

        public function down()
        {
            Schema::dropIfExists('posts');
        }
    }

We included a user_id property because we want to create a relationship between the User and Post models. A Post also has an image field, which is where its associated image’s address will be stored.

Creating a database seeder for the Post table

We will create a new seeder file for the posts table using this command:

    $ php artisan make:seeder PostTableSeeder


Let’s navigate into the database/seeds folder and update the PostTableSeeder.php file:

    // File: ./app/database/seeds/PostTableSeeder.php
    <?php 

    use App\Post;
    use Illuminate\Database\Seeder;

    class PostTableSeeder extends Seeder
    {
        public function run()
        {
            $post = new Post;
            $post->user_id = 2;
            $post->title = "Using Laravel Seeders";
            $post->body = "Laravel includes a simple method of seeding your database with test data using seed classes. All seed classes are stored in the database/seeds directory. Seed classes may have any name you wish, but probably should follow some sensible convention, such as UsersTableSeeder, etc. By default, a DatabaseSeeder class is defined for you. From this class, you may use the  call method to run other seed classes, allowing you to control the seeding order.";
            $post->save();

            $post = new Post;
            $post->user_id = 2;
            $post->title = "Database: Migrations";
            $post->body = "Migrations are like version control for your database, allowing your team to easily modify and share the application's database schema. Migrations are typically paired with Laravel's schema builder to easily build your application's database schema. If you have ever had to tell a teammate to manually add a column to their local database schema, you've faced the problem that database migrations solve.";
            $post->save();
        }
    }

When we run this seeder, it will create two new posts and assign both of them to the admin user whose ID is 2. We are attaching both posts to the admin user because the regular users are only allowed to view posts and make comments; they can’t create a post.

Let’s open the DatabaseSeeder and update it with the following code:

    // File: ./app/database/seeds/DatabaseSeeder.php
    <?php 

    use Illuminate\Database\Seeder;

    class DatabaseSeeder extends Seeder
    {
        public function run()
        {
            $this->call([
                RoleTableSeeder::class,
                UserTableSeeder::class,
                PostTableSeeder::class,
            ]);
        }
    }

We will use this command to migrate our tables and seed the database:

    $ php artisan migrate:fresh --seed


Defining the relationships

Just as we previously created a many-to-many relationship between the User and Role models, we need to create a different kind of relationship between the Post and User models.

We will define the relationship as a one-to-many relationship because a user will have many posts but a post will only ever belong to one user.

Open the User model and include the method below:

    // File: ./app/User.php
    public function posts()
    {
        return $this->hasMany(Post::class);
    }

Open the Post model and include the method below:

    // File: ./app/Post.php
    public function user()
    {
        return $this->belongsTo(User::class);
    }

Setting up the routes

At this point in our application, we do not have a front page that lists all the posts. Let’s create one so anyone can see all of the created posts. Aside from the front page, we also need a single post page in case a user wants to read a specific post.

Let’s include two new routes to our routes/web.php file:

    Route::get('/', 'PostController@all');

    Route::get('/posts/{post}', 'PostController@single');

With these two new routes added, here’s what the routes/web.php file should look like:

    // File: ./routes/web.php
    <?php 

    Auth::routes();
    Route::get('/posts/{post}', 'PostController@single');
    Route::get('/home', 'HomeController@index')->name('home');
    Route::get('/', 'PostController@all');

Setting up the Post controller

In this section, we want to define the handler action methods that we registered in the routes/web.php file so that our application knows how to render the matching views.

First, let’s add the all() method:

    // File: ./app/Http/Controllers/PostController.php
    public function all()
    {
        return view('landing', [
            'posts' => Post::latest()->paginate(5)
        ]);
    }

Here, we want to retrieve five created posts per page and send them to the landing view. We will create this view shortly.

Next, let’s add the single() method to the controller:

    // File: ./app/Http/Controllers/PostController.php
    public function single(Post $post)
    {
        return view('single', compact('post'));
    }

In the method above, we used a feature of Laravel named route model binding to map the URL parameter to a Post instance with the same ID. We are returning a single view, which we will create shortly. This will be the view for the single post page.

Building our views

Laravel uses a templating engine called Blade for its frontend. We will use Blade to build these parts of the frontend before switching to Vue in the next chapter.

Navigate to the resources/views folder and create two new Blade files:

  1. landing.blade.php
  2. single.blade.php

These are the files that will load the views for the landing page and single post page. Before we start writing any code in these files, we want to create a simple layout template that our page views can use as a base.

In the resources/views/layouts folder, create a Blade template file and call it master.blade.php. This is where we will define the inheritable template for our single and landing pages.

Open the master.blade.php file and update it with this code:

    <!-- File: ./resources/views/layouts/master.blade.php -->
    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8">
        <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
        <meta name="description" content="">
        <meta name="author" content="Neo Ighodaro">
        <title>LaravelCMS</title>
        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css">
        <style> 
        body {
          padding-top: 54px;
        }
        @media (min-width: 992px) {
          body {
              padding-top: 56px;
          }
        }
        </style>
      </head>
      <body>
        <nav class="navbar navbar-expand-lg navbar-dark bg-dark fixed-top">
          <div class="container">
            <a class="navbar-brand" href="/">LaravelCMS</a>
            <div class="collapse navbar-collapse" id="navbarResponsive">
              <ul class="navbar-nav ml-auto">
                 @if (Route::has('login'))
                    @auth
                    <li class="nav-item">
                         <a class="nav-link" href="{{ url('/home') }}">Home</a>
                    </li>
                    <li class="nav-item">
                      <a class="nav-link" href="{{ route('logout') }}"
                                           onclick="event.preventDefault();
                                                         document.getElementById('logout-form').submit();">
                        Log out
                      </a>
                      <form id="logout-form" action="{{ route('logout') }}" method="POST" style="display: none;">
                        @csrf
                      </form>
                     </li>
                     @else
                     <li class="nav-item">
                         <a class="nav-link" href="{{ route('login') }}">Login</a>
                    </li>
                     <li class="nav-item">
                         <a class="nav-link" href="{{ route('register') }}">Register</a>
                     </li>
                     @endauth
                 @endif
              </ul>
            </div>
          </div>
        </nav>

        <div id="app">
            @yield('content')
        </div>

        <footer class="py-5 bg-dark">
          <div class="container">
            <p class="m-0 text-center text-white">Copyright &copy; LaravelCMS 2018</p>
          </div>
        </footer>
      </body>
    </html>

Now we can inherit this template in the landing.blade.php file, open it and update it with this code:

    {{-- File: ./resources/views/landing.blade.php --}}
    @extends('layouts.master')

    @section('content')
    <div class="container">
      <div class="row align-items-center">
        <div class="col-md-8 mx-auto">
          <h1 class="my-4 text-center">Welcome to the Blog </h1>

          @foreach ($posts as $post)
          <div class="card mb-4">
            <img class="card-img-top" src=" {!! !empty($post->image) ? '/uploads/posts/' . $post->image :  'http://placehold.it/750x300' !!} " alt="Card image cap">
            <div class="card-body">
              <h2 class="card-title text-center">{{ $post->title }}</h2>
              <p class="card-text"> {{ str_limit($post->body, $limit = 280, $end = '...') }} </p>
              <a href="/posts/{{ $post->id }}" class="btn btn-primary">Read More &rarr;</a>
            </div>
            <div class="card-footer text-muted">
              Posted {{ $post->created_at->diffForHumans() }} by
              <a href="#">{{ $post->user->name }} </a>
            </div>
          </div>
          @endforeach

        </div>
      </div>
    </div>
    @endsection

Let’s do the same with the single.blade.php file, open it and update it with this code:

    {{-- File: ./resources/views/single.blade.php --}}
    @extends('layouts.master')

    @section('content')
    <div class="container">
      <div class="row">
        <div class="col-lg-10 mx-auto">
          <h3 class="mt-4">{{ $post->title }} <span class="lead"> by <a href="#"> {{ $post->user->name }} </a></span> </h3>
          <hr>
          <p>Posted {{ $post->created_at->diffForHumans() }} </p>
          <hr>
          <img class="img-fluid rounded" src=" {!! !empty($post->image) ? '/uploads/posts/' . $post->image :  'http://placehold.it/750x300' !!} " alt="">
          <hr>
          <p class="lead">{{ $post->body }}</p>
          <hr>
          <div class="card my-4">
            <h5 class="card-header">Leave a Comment:</h5>
            <div class="card-body">
              <form>
                <div class="form-group">
                  <textarea class="form-control" rows="3"></textarea>
                </div>
                <button type="submit" class="btn btn-primary">Submit</button>
              </form>
            </div>
          </div>
        </div>
      </div>
    </div>
    @endsection

Testing the application

We can test the application to see that things work as we expect. When we serve the application, we expect to see a landing page and a single post page. We also expect to see two posts because that’s the number of posts we seeded into the database.

We will serve the application using this command:

    $ php artisan serve


We can visit this address to see the application:

We have used simple placeholder images here because we haven’t built the admin dashboard that allows CRUD operations to be performed on posts.

In the coming chapters, we will add the ability for an admin to include a custom image when creating a new post.

Conclusion

In this chapter, we created the Post model and defined a relationship on it to the User model. We also built the landing page and single page.

In the next part of this series, we will develop the API that will be the medium for communication between the admin user and the post items.

The source code for this project is available here on Github.

Build a CMS with Laravel and Vue - Part 3: Building an API

In the previous part of this series, we initialized the posts resource and started building the frontend of the CMS. We designed the front page that shows all the posts and the single post page using Laravel’s templating engine, Blade.

In this part of the series, we will start building the API for the application. We will create an API for CRUD operations that an admin will perform on posts and we will test the endpoints using Postman.

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.
Building the API using Laravel’s API resources

The Laravel framework makes it very easy to build APIs. It has an API resources feature that we can easily adopt in our project. You can think of API resources as a transformation layer between Eloquent models and the JSON responses that will be sent back by our API.

Allowing mass assignment on specified fields

Since we are going to be performing CRUD operations on the posts in the application, we have to explicitly specify which fields are permitted to be mass-assigned. For security reasons, Laravel prevents mass assignment of data to model fields by default.

Open the Post.php file and include this line of code:

    // File: ./app/Post.php
    protected $fillable = ['user_id', 'title', 'body', 'image'];

Defining API routes

We will use the apiResource() method to generate API-only routes. Open the routes/api.php file and add the following code:

    // File: ./routes/api.php
    Route::apiResource('posts', 'PostController');
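
For reference, apiResource registers the standard RESTful routes for the posts resource (this is stock Laravel behavior):

  • GET /api/posts resolves to index (list posts)
  • POST /api/posts resolves to store (create a post)
  • GET /api/posts/{post} resolves to show (fetch one post)
  • PUT/PATCH /api/posts/{post} resolves to update (modify a post)
  • DELETE /api/posts/{post} resolves to destroy (delete a post)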


Creating the Post resource

At the beginning of this section, we already talked about what Laravel’s API resources are. Here, we create a resource class for our Post model. This will enable us to retrieve Post data and return it as formatted JSON.

To create a resource class for our Post model run the following command in your terminal:

    $ php artisan make:resource PostResource


A new PostResource.php file will be available in the app/Http/Resources directory of our application. Open up the PostResource.php file and replace the toArray() method with the following:

    // File: ./app/Http/Resources/PostResource.php
    public function toArray($request)
    {
        return [
            'id' => $this->id,
            'title' => $this->title,
            'body' => $this->body,
            'image' => $this->image,
            'created_at' => (string) $this->created_at,
            'updated_at' => (string) $this->updated_at,
        ];
    }

The job of this toArray() method is to convert our Post resource into an array. As seen above, we have specified the fields on our Post model that we want to be returned as JSON when we make a request for posts.

We are also explicitly casting the dates, created_at and updated_at, to strings so that they are returned as date strings. The dates are normally instances of Carbon.

Now that we have created a resource class for our Post model, we can start building the API’s action methods in our PostController and return instances of the PostResource where we want.

Adding the action methods to the Post controller

The usual actions performed on a post are the four CRUD operations:

  1. Create
  2. Read
  3. Update
  4. Delete

In the last article, we already implemented a kind of ‘Read’ functionality when we defined the all and single methods. These methods allow users to browse through posts on the homepage.

In this section, we will define the methods that will resolve our API requests for creating, reading, updating and deleting posts.

The first thing we want to do is import the PostResource class at the top of the PostController.php file:

    // File: ./app/Http/Controllers/PostController.php
    use App\Http\Resources\PostResource;

Building the handler action for the create operation

In the PostController, update the store() action method with the code snippet below. It will allow us to validate and create a new post:

    // File: ./app/Http/Controllers/PostController.php
    public function store(Request $request)
    {
        $this->validate($request, [
            'title' => 'required',
            'body' => 'required',
            'user_id' => 'required',            
            'image' => 'required|mimes:jpeg,png,jpg,gif,svg',
        ]);

        $post = new Post;

        if ($request->hasFile('image')) {
            $image = $request->file('image');
            $name = str_slug($request->title).'.'.$image->getClientOriginalExtension();
            $destinationPath = public_path('/uploads/posts');
            $imagePath = $destinationPath . "/" . $name;
            $image->move($destinationPath, $name);
            $post->image = $name;
        }

        $post->user_id = $request->user_id;
        $post->title = $request->title;
        $post->body = $request->body;
        $post->save();

        return new PostResource($post);
    }

Here’s a breakdown of what this method does:

  1. It validates the incoming request fields, including the image’s mime type.
  2. If an image file is attached, it builds a file name from the post title, moves the file to public/uploads/posts, and stores the file name on the post.
  3. It assigns the user_id, title and body fields, saves the post, and returns the new post as a PostResource.

Building the handler action for the read operations

What we want here is to be able to read all the created posts or a single post. This is possible because the apiResource() method defines the API routes using standard REST rules.

This means that a GET request to this address, http://127.0.0.1:8000/api/posts, should be resolved by the index() action method. Let’s update the index method with the following code:

    // File: ./app/Http/Controllers/PostController.php
    public function index()
    {
        return PostResource::collection(Post::latest()->paginate(5));
    }

This method will allow us to return a JSON formatted collection of all of the stored posts. We also want to paginate the response as this will allow us to create a better view on the admin dashboard.
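
As a sketch of what that paginated response looks like (field values are invented; the data/links/meta envelope is what Laravel returns for paginated resource collections, abridged here):

    {
      "data": [
        { "id": 2, "title": "Database: Migrations", "body": "...", "image": null, "created_at": "2018-08-01 12:00:00", "updated_at": "2018-08-01 12:00:00" }
      ],
      "links": { "first": "...", "last": "...", "prev": null, "next": null },
      "meta": { "current_page": 1, "per_page": 5, "total": 2 }
    }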

Following the RESTful conventions discussed above, a GET request to http://127.0.0.1:8000/api/posts/{id} should be resolved by the show() action method. Let’s update the method with the fitting snippet:

    // File: ./app/Http/Controllers/PostController.php
    public function show(Post $post)
    {
        return new PostResource($post);
    }

Awesome, now this method will return a single instance of a post resource upon API query.

Building the handler action for the update operation

Next, let’s update the update() method in the PostController class. It will allow us to modify an existing post:

    // File: ./app/Http/Controllers/PostController.php
    public function update(Request $request, Post $post)
    {
        $this->validate($request, [
            'title' => 'required',
            'body' => 'required',
        ]);

        $post->update($request->only(['title', 'body']));

        return new PostResource($post);
    }

This method receives a request and a post id as parameters, then we use route model binding to resolve the id into an instance of a Post. First, we validate the $request attributes, then we update the title and body fields of the resolved post.

Building the handler action for the delete operation

Let’s update the destroy() method in the PostController class. This method will allow us to remove an existing post:

    // File: ./app/Http/Controllers/PostController.php
    public function destroy(Post $post)
    {
        $post->delete();

        return response()->json(null, 204);
    }

In this method, we resolve the Post instance, then delete it and return a 204 response code.

Our methods are complete. We have methods to handle our CRUD operations; however, we haven’t built the frontend for the admin dashboard.

At the end of the second article, we defined the HomeController@index() action method like this:

    public function index(Request $request)
    {
        if ($request->user()->hasRole('user')) {
            return view('home');
        }

        if ($request->user()->hasRole('admin')) {
            return redirect('/admin/dashboard');
        }
    }

This allowed us to redirect regular users to the home view, and admin users to the URL /admin/dashboard. At this point in the series, a visit to /admin/dashboard will fail because we have neither defined a route with a handler controller for it nor built a view for it.

Let’s create the AdminController with this command:

    $ php artisan make:controller AdminController


We will add the /admin/ route to our routes/web.php file:

    Route::get('/admin/{any}', 'AdminController@index')->where('any', '.*');

Let’s update the AdminController.php file to use the auth middleware and include an index() action method:

    // File: ./app/Http/Controllers/AdminController.php
    <?php 

    namespace App\Http\Controllers;

    class AdminController extends Controller
    {
        public function __construct()
        {
            $this->middleware('auth');
        }

        public function index()
        {
            if (request()->user()->hasRole('admin')) {
                return view('admin.dashboard');
            }

            if (request()->user()->hasRole('user')) {
                return redirect('/home');
            }
        }
    }

In the index() action method, we included a snippet that ensures only admin users can visit the admin dashboard and perform CRUD operations on posts.

We will not start building the admin dashboard in this article but will test that our API works properly. We will use Postman to make requests to the application.

Testing the application

Let’s test that our API works as expected. We will, first of all, serve the application using this command:

    $ php artisan serve


We can visit http://localhost:8000 to see our application and there should be exactly two posts available; these are the posts we seeded into the database during the migration:

Now let’s create a new post over the API interface using Postman. Send a POST request as seen below:

Now let’s update this post we just created. In Postman, we will pass only the title and body fields to a PUT request.

To make it easy, you can just copy the payload below and use the raw request data type for the Body:

    {
      "title": "We made an edit to the Post on APIs",
      "body": "To a developer, 'What's an API?' might be a straightforward - if not exactly simple - question. But to anyone who doesn't have experience with code. APIs can come across as confusing or downright intimidating."
    }


Finally, let’s delete the post using Postman:

We are sure the post is deleted because the response status is 204 No Content as we specified in the PostController.

Conclusion

In this chapter, we learned about Laravel’s API resources and created a resource class for the Post model. We also used the apiResource() method to generate API-only routes for our application. We wrote the methods to handle the API operations and tested them using Postman.

In the next part, we will build the admin dashboard and develop the logic that will enable the admin user to manage posts over the API.

The source code for this project is available here on Github.

Build a CMS with Laravel and Vue - Part 4: Building the dashboard

In the last article of this series, we built the API interface and used Laravel API resources to return neatly formatted JSON responses. We tested that the API works as we defined it to using Postman.

In this part of the series, we will start building the admin frontend of the CMS. This is the first part of the series where we will integrate Vue and explore Vue’s magical abilities.

When we are done with this part, our application will have some added functionalities as seen below:

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.
Building the frontend

Laravel ships with Vue out of the box, so we do not need to use the Vue CLI or reference Vue from a CDN. This makes it possible for us to have all of our application, the frontend and the backend, in a single codebase.

Every newly created Laravel installation has some Vue files included by default; we can see these files when we navigate into the resources/assets/js/components folder.

Setting up Vue and VueRouter

Before we can start using Vue in our application, we need to first install some dependencies using NPM. To install the dependencies that come by default with Laravel, run the command below:

    $ npm install


We will be managing all of the routes for the admin dashboard using vue-router so let’s pull it in:

    $ npm install --save vue-router


When the installation is complete, the next thing we want to do is open the resources/assets/js/app.js file and replace its contents with the code below:

    // File: ./resources/assets/js/app.js
    require('./bootstrap');

    import Vue from 'vue'
    import VueRouter from 'vue-router'
    import Homepage from './components/Homepage'
    import Read from './components/Read'

    Vue.use(VueRouter)

    const router = new VueRouter({
        mode: 'history',
        routes: [
            {
                path: '/admin/dashboard',
                name: 'read',
                component: Read,
                props: true
            },
        ],
    });

    const app = new Vue({
        el: '#app',
        router,
        components: { Homepage },
    });

In the snippet above, we imported the VueRouter and added it to the Vue application. We also imported a Homepage and a Read component. These are the components where we will write our markup so let’s create both files.

Open the resources/assets/js/components folder and create four files:

  1. Homepage.vue
  2. Create.vue
  3. Read.vue
  4. Update.vue

In the resources/assets/js/app.js file, we defined a routes array and in it, we registered a read route. During render time, this route’s path will be mapped to the Read component.

In the previous article, we specified that admin users should be shown an admin.dashboard view in the index method; however, we didn’t create this view. Let’s create it now. Open the resources/views folder and create a new folder called admin. Within the new resources/views/admin folder, create a new file called dashboard.blade.php. This is going to be the entry point to the admin dashboard; from this route onward, we will let the VueRouter handle everything else.

Open the resources/views/admin/dashboard.blade.php file and paste in the following code:

    <!-- File: ./resources/views/admin/dashboard.blade.php -->
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <title> Welcome to the Admin dashboard </title>
        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css">
        <style>
            html, body {
            background-color: #202B33;
            color: #738491;
            font-family: "Open Sans";
            font-size: 16px;
            font-smoothing: antialiased;
            overflow: hidden;
            }
        </style>
    </head>
    <body>

      <script src="{{ asset('js/app.js') }}"></script>
    </body>
    </html>

Our goal here is to integrate Vue into the application, so we included the resources/assets/js/app.js file with this line of code:

    <script src="{{ asset('js/app.js') }}"></script>


For our app to work, we need a root element to bind our Vue instance onto. Before the <script> tag, add this snippet of code:

    <div id="app">
      <Homepage 
        :user-name='@json(auth()->user()->name)' 
        :user-id='@json(auth()->user()->id)'
      ></Homepage>
    </div>

We earlier defined the Homepage component as the wrapping component, which is why we pulled it in here as the root component. For some of the frontend components to work correctly, we need some details of the logged-in admin user to perform CRUD operations. This is why we passed the userName and userId props down to the Homepage component.

We need to prevent CSRF errors from occurring in our Vue frontend, so include this snippet of code just before the <title> tag:

    <meta name="csrf-token" content="{{ csrf_token() }}">
    <script> window.Laravel = { csrfToken: '{{ csrf_token() }}' } </script>

This snippet will ensure that the correct token is always included in our frontend requests; Laravel provides the CSRF protection for us out of the box.
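
For context, the stock resources/assets/js/bootstrap.js that ships with Laravel reads this meta tag and attaches the token to every axios request, roughly like this:

    // Sketch of Laravel's default bootstrap.js CSRF wiring
    let token = document.head.querySelector('meta[name="csrf-token"]');

    if (token) {
        window.axios.defaults.headers.common['X-CSRF-TOKEN'] = token.content;
    } else {
        console.error('CSRF token not found');
    }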

At this point, this should be the contents of your resources/views/admin/dashboard.blade.php file:

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
        <meta name="csrf-token" content="{{ csrf_token() }}">
        <script> window.Laravel = { csrfToken: '{{ csrf_token() }}' } </script>
        <title> Welcome to the Admin dashboard </title>
        <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css">
        <style>
          html, body {
            background-color: #202B33;
            color: #738491;
            font-family: "Open Sans";
            font-size: 16px;
            font-smoothing: antialiased;
            overflow: hidden;
          }
        </style>
    </head>
    <body>
    <div id="app">
      <Homepage 
        :user-name='@json(auth()->user()->name)' 
        :user-id='@json(auth()->user()->id)'>
      </Homepage>
    </div>
    <script src="{{ asset('js/app.js') }}"></script>
    </body>
    </html>

Setting up the Homepage view

Open the Homepage.vue file that we created some time ago and include this markup template:

    <!-- File: ./resources/app/js/components/Homepage.vue -->
    <template>
      <div>
        <nav>
          <section>
            <a style="color: white" href="/admin/dashboard">Laravel-CMS</a> &nbsp; ||  &nbsp;
            <a style="color: white" href="/">HOME</a>
            <hr>
            <ul>
               <li>
                 <router-link :to="{ name: 'create', params: { userId } }">
                   NEW POST
                 </router-link>
               </li>
            </ul>
          </section>
        </nav>
        <article>
          <header>
            <header class="d-inline">Welcome, {{ userName }}</header>
            <p @click="logout" class="float-right mr-3" style="cursor: pointer">Logout</p>
          </header>
          <div> 
            <router-view></router-view> 
          </div>
        </article>
      </div>
    </template>

We added a router-link in this template, which routes to the Create component.

We are passing the userId data to the create component because a userId is required during Post creation.

Let’s include some styles so that the page looks good. Below the closing template tag, paste the following code:

    <style scoped>
      @import url(https://fonts.googleapis.com/css?family=Dosis:300|Lato:300,400,600,700|Roboto+Condensed:300,700|Open+Sans+Condensed:300,600|Open+Sans:400,300,600,700|Maven+Pro:400,700);
      @import url("https://netdna.bootstrapcdn.com/font-awesome/4.2.0/css/font-awesome.css");
      * {
        -moz-box-sizing: border-box;
        -webkit-box-sizing: border-box;
        box-sizing: border-box;
      }
      header {
        color: #d3d3d3;
      }
      nav {
        position: absolute;
        top: 0;
        bottom: 0;
        right: 82%;
        left: 0;
        padding: 22px;
        border-right: 2px solid #161e23;
      }
      nav > header {
        font-weight: 700;
        font-size: 0.8rem;
        text-transform: uppercase;
      }
      nav section {
        font-weight: 600;
      }
      nav section header {
        padding-top: 30px;
      }
      nav section ul {
        list-style: none;
        padding: 0px;
      }
      nav section ul a {
        color: white;
        text-decoration: none;
        font-weight: bold;
      }
      article {
        position: absolute;
        top: 0;
        bottom: 0;
        right: 0;
        left: 18%;
        overflow: auto;
        border-left: 2px solid #2a3843;
        padding: 20px;
      }
      article > header {
        height: 60px;
        border-bottom: 1px solid #2a3843;
      }
    </style>

Next, let’s add the <script> section that will use the props we passed down from the parent component. We will also define the method that controls the log out feature here. Below the closing style tag, paste the following code:

    <script>
    export default {
      props: {
        userId: {
          type: Number,
          required: true
        },
        userName: {
          type: String,
          required: true
        }
      },
      data() {
        return {};
      },
      methods: {
        logout() {
          axios.post("/logout").then(() => {
            window.location = "/";
          });
        }
      }
    };
    </script>

Setting up the Read view

In the resources/assets/js/app.js file, we defined the path of the read component as /admin/dashboard, which is the same address as the Homepage component. This will make sure the Read component always loads by default.

In the Read component, we want to load all of the available posts. We are also going to add Update and Delete options to each post. Clicking on these options will lead to the update and delete views respectively.

Open the Read.vue file and paste the following:

    <!-- File: ./resources/app/js/components/Read.vue -->
    <template>
        <div id="posts">
            <p class="border p-3" v-for="post in posts">
                {{ post.title }}
                <router-link :to="{ name: 'update', params: { postId : post.id } }">
                    <button type="button" class="p-1 mx-3 float-right btn btn-light">
                        Update
                    </button>
                </router-link>
                <button 
                    type="button" 
                    @click="deletePost(post.id)" 
                    class="p-1 mx-3 float-right btn btn-danger"
                >
                    Delete
                </button>
            </p>
            <div>
                <button 
                    v-if="next" 
                    type="button" 
                    @click="navigate(next)" 
                    class="m-3 btn btn-primary"
                >
                  Next
                </button>
                <button 
                    v-if="prev" 
                    type="button" 
                    @click="navigate(prev)" 
                    class="m-3 btn btn-primary"
                >
                  Previous
                </button>
            </div>
        </div>
    </template>

Above, we have the template to handle the posts that are loaded from the API. Next, paste the following below the closing template tag:

    <script>
    export default {
      mounted() {
        this.getPosts();
      },
      data() {
        return {
          posts: {},
          next: null,
          prev: null
        };
      },
      methods: {
        getPosts(address) {
          axios.get(address ? address : "/api/posts").then(response => {
            this.posts = response.data.data;
            this.prev = response.data.links.prev;
            this.next = response.data.links.next;
          });
        },
        deletePost(id) {
          axios.delete("/api/posts/" + id).then(response => this.getPosts())
        },
        navigate(address) {
          this.getPosts(address)
        }
      }
    };
    </script>

In the script above, we defined a getPosts() method that requests a list of posts from the backend server. We also defined a posts object as a data property. This object will be populated whenever posts are received from the backend server.

We defined the next and prev data properties to store the pagination links, so that we only display a pagination button when the corresponding link is available.
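
For reference, the paginated response produced by Laravel's API Resources has roughly this shape, which is where links.prev and links.next come from (the values below are illustrative):

    {
      "data": [ ... ],
      "links": {
        "first": "http://localhost:8000/api/posts?page=1",
        "last": "http://localhost:8000/api/posts?page=3",
        "prev": null,
        "next": "http://localhost:8000/api/posts?page=2"
      },
      "meta": { "current_page": 1, "per_page": 10, "total": 25 }
    }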

Lastly, we defined a deletePost() method that takes the id of a post as a parameter and sends a DELETE request to the API interface using Axios.

Testing the application

Now that we have completed the first few components, we can serve the application using this command:

    $ php artisan serve


We will also build the assets so that our JavaScript is compiled for us. To do this, we will run the command below in the root of the project folder:

    $ npm run dev


We can visit the application’s URL http://localhost:8000, log in as an admin user, and delete a post:

Conclusion

In this part of the series, we started building the admin dashboard using Vue. We installed VueRouter to make the admin dashboard a SPA. We added the homepage view of the admin dashboard and included read and delete functionalities.

We are not done with the dashboard just yet. In the next part, we will add the views that let us create and update posts.

The source code for this project is available here on GitHub.

Build a CMS with Laravel and Vue - Part 5: Completing our dashboards

In the previous part of this series, we built the first parts of the admin dashboard using Vue. We also made it into an SPA with VueRouter, which means that navigating between pages does not cause the browser to reload.

So far, we have only built the wrapper component and the Read component, which retrieves the posts so an admin can manage them.

Here’s a recording of what we ended up with in the last article:

In this article, we will build the view that will allow users to create and update posts. We will start writing code in the Update.vue and Create.vue files that we created in the previous article.

When we are done with this part, we will have additional functionality for creating and updating posts:

The source code for this project is available here on GitHub.

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.

Including the new routes in VueRouter

In the previous article, we only defined the route for the Read component. We now need to include the route configuration for the new components that we are about to build: Create and Update.

Open the resources/assets/js/app.js file and replace the contents with the code below:

    require('./bootstrap');

    import Vue from 'vue'
    import VueRouter from 'vue-router'
    import Homepage from './components/Homepage'
    import Create from './components/Create'
    import Read from './components/Read'
    import Update from './components/Update'

    Vue.use(VueRouter)

    const router = new VueRouter({
        mode: 'history',
        routes: [
            {
                path: '/admin/dashboard',
                name: 'read',
                component: Read,
                props: true
            },
            {
                path: '/admin/create',
                name: 'create',
                component: Create,
                props: true
            },
            {
                path: '/admin/update',
                name: 'update',
                component: Update,
                props: true
            },
        ],
    });

    const app = new Vue({
        el: '#app',
        router,
        components: { Homepage },
    });

Above, we added two new components to the JavaScript file: the Create and Update components. We also added them to the router so that they can be loaded using the specified URLs.

Building the create view

Open the Create.vue file and update it with this markup template:

    <!-- File: ./resources/app/js/components/Create.vue -->
    <template>
      <div class="container">
        <form>
          <div :class="['form-group m-1 p-3', (successful ? 'alert-success' : '')]">
            <span v-if="successful" class="label label-sucess">Published!</span>
          </div>
          <div :class="['form-group m-1 p-3', error ? 'alert-danger' : '']">
            <span v-if="errors.title" class="label label-danger">
              {{ errors.title[0] }}
            </span>
            <span v-if="errors.body" class="label label-danger"> 
              {{ errors.body[0] }} 
            </span>
            <span v-if="errors.image" class="label label-danger"> 
              {{ errors.image[0] }} 
            </span>
          </div>

          <div class="form-group">
            <input type="title" ref="title" class="form-control" id="title" placeholder="Enter title" required>
          </div>

          <div class="form-group">
            <textarea class="form-control" ref="body" id="body" placeholder="Enter a body" rows="8" required></textarea>
          </div>

          <div class="custom-file mb-3">
            <input type="file" ref="image" name="image" class="custom-file-input" id="image" required>
            <label class="custom-file-label" >Choose file...</label>
          </div>

          <button type="submit" @click.prevent="create" class="btn btn-primary block">
            Submit
          </button>
        </form>
      </div>
    </template>

Above, we have the template for the Create component. If there is an error during post creation, there will be a field indicating the specific error. When a post is successfully published, there will also be a message saying it was successful.

Let’s include the script logic that will perform the sending of posts to our backend server and read back the response.

After the closing template tag, add this:

    <script>
    export default {
      props: {
        userId: {
          type: Number,
          required: true
        }
      },
      data() {
        return {
          error: false,
          successful: false,
          errors: []
        };
      },
      methods: {
        create() {
          const formData = new FormData();
          formData.append("title", this.$refs.title.value);
          formData.append("body", this.$refs.body.value);
          formData.append("user_id", this.userId);
          formData.append("image", this.$refs.image.files[0]);

          axios
            .post("/api/posts", formData)
            .then(response => {
              this.successful = true;
              this.error = false;
              this.errors = [];
            })
            .catch(error => {
              if (!_.isEmpty(error.response)) {
                if (error.response.status === 422) {
                  this.errors = error.response.data.errors;
                  this.successful = false;
                  this.error = true;
                }
              }
            });

          this.$refs.title.value = "";
          this.$refs.body.value = "";
        }
      }
    };
    </script>

In the script above, we defined a create() method that takes the values of the input fields and uses the Axios library to send them to the API interface on the backend server. Within this method, we also update the status of the operation, so that an admin user can know when a post is created successfully or not.

Building the update view

Let’s start building the Update component. Open the Update.vue file and update it with this markup template:

    <!-- File: ./resources/app/js/components/Update.vue -->
    <template>
      <div class="container">
        <form>
          <div :class="['form-group m-1 p-3', successful ? 'alert-success' : '']">
            <span v-if="successful" class="label label-sucess">Updated!</span>
          </div>

          <div :class="['form-group m-1 p-3', error ? 'alert-danger' : '']">
            <span v-if="errors.title" class="label label-danger">
              {{ errors.title[0] }}
            </span>
            <span v-if="errors.body" class="label label-danger">
              {{ errors.body[0] }}
            </span>
          </div>

          <div class="form-group">
            <input type="title" ref="title" class="form-control" id="title" placeholder="Enter title" required>
          </div>

          <div class="form-group">
            <textarea class="form-control" ref="body" id="body" placeholder="Enter a body" rows="8" required></textarea>
          </div>

          <button type="submit" @click.prevent="update" class="btn btn-primary block">
            Submit
          </button>
        </form>
      </div>
    </template>

This template is similar to the one in the Create component. Let’s add the script for the component.

Below the closing template tag, paste the following:

    <script>
    export default {
      mounted() {
        this.getPost();
      },
      props: {
        postId: {
          type: Number,
          required: true
        }
      },
      data() {
        return {
          error: false,
          successful: false,
          errors: []
        };
      },
      methods: {
        update() {
          let title = this.$refs.title.value;
          let body = this.$refs.body.value;

          axios
            .put("/api/posts/" + this.postId, { title, body })
            .then(response => {
              this.successful = true;
              this.error = false;
              this.errors = [];
            })
            .catch(error => {
              if (!_.isEmpty(error.response)) {
                if (error.response.status === 422) {
                  this.errors = error.response.data.errors;
                  this.successful = false;
                  this.error = true;
                }
              }
            });
        },
        getPost() {
          axios.get("/api/posts/" + this.postId).then(response => {
            this.$refs.title.value = response.data.data.title;
            this.$refs.body.value = response.data.data.body;
          });
        }
      }
    };
    </script>


In the script above, we make a call to the getPost() method as soon as the component is mounted. The getPost() method fetches the data of a single post from the backend server, using the postId.

When Axios sends back the data for the post, we update the input fields in this component so they can be edited.

Finally, the update() method takes the values of the fields in the component and attempts to send them to the backend server for an update. In a situation where the update fails, we get instant feedback.

Testing the application

To test that our changes work, we want to reset the database to a fresh state. To do this, run the following command in your terminal:

    $ php artisan migrate:fresh --seed


Next, let’s compile our JavaScript files and assets. This will make sure all the changes we made in the Vue components and the app.js file get built. To recompile, run the command below in your terminal:

    $ npm run dev


Lastly, we need to serve the application. To do this, run the following command in your terminal window:

    $ php artisan serve


We will visit the application’s URL http://localhost:8000 and log in as an admin user. From the dashboard, you can test the create and update features:

Conclusion

In this part of the series, we updated the dashboard to include the Create and Update components so the administrator can add and update posts.

In the next article, we will add support for realtime comments on posts.

The source code for this project is available here on GitHub.

Build a CMS with Laravel and Vue - Part 6: Adding Realtime Comments

In the previous part of this series, we finished building the admin dashboard of the application using Vue. We added the Create and Update components, which are used for creating a new post and updating an existing post.

Here’s a screen recording of what we have been able to achieve:

In this final part of the series, we will be adding support for comments. We will also ensure that the comments on each post are updated in realtime, so a user doesn’t have to refresh the page to see new comments.

When we are done, our application will have new features and will work like this:

The source code for this project is available here on GitHub.

Prerequisites

To follow along with this series, a few things are required:

  • Basic knowledge of PHP.
  • Basic knowledge of the Laravel framework.
  • Basic knowledge of JavaScript (ES6 syntax).
  • Basic knowledge of Vue.
  • Postman installed on your machine.

Adding comments to the backend

When we were creating the API, we did not add the support for comments to the post resource, so we will have to do so now. Open the API project in your text editor as we will be modifying the project a little.

The first thing we want to do is create a model, a controller, and a migration for the comment resource. To do this, open your terminal, cd into the project directory, and run the following command:

    $ php artisan make:model Comment -mc


The command above will create a model called Comment, a controller called CommentController, and a migration file in the database/migrations directory.

Updating the comments migration file

To update the comments migration, navigate to the database/migrations folder and find the newly created migration file for the Comment model. Let’s update the up() method in the file:

    // File: ./database/migrations/*_create_comments_table.php
    public function up()
    {
        Schema::create('comments', function (Blueprint $table) {
            $table->increments('id');
            $table->timestamps();
            $table->integer('user_id')->unsigned();
            $table->integer('post_id')->unsigned();
            $table->text('body');
        });
    }

We included user_id and post_id fields because we intend to create a link between the comments, users, and posts. The body field will contain the actual comment.

Defining the relationships among the Comment, User, and Post models

In this application, a comment will belong to a user and a post because a user can make a comment on a specific post, so we need to define the relationship that ties everything up.

Open the User model and include this method:

    // File: ./app/User.php
    public function comments()
    {
        return $this->hasMany(Comment::class);
    }

This is a relationship that simply says that a user can have many comments. Now let’s define the same relationship on the Post model. Open the Post.php file and include this method:

    // File: ./app/Post.php
    public function comments()
    {
        return $this->hasMany(Comment::class);
    }

Finally, we will include two methods in the Comment model to complete the second half of the relationships we defined in the User and Post models.

Open the app/Comment.php file and include these methods:

    // File: ./app/Comment.php
    public function user()
    {
        return $this->belongsTo(User::class);
    }

    public function post()
    {
        return $this->belongsTo(Post::class);
    }

Since we want to be able to mass assign data to specific fields of a comment instance during comment creation, we will include this array of permitted assignments in the app/Comment.php file:

    protected $fillable = ['user_id', 'post_id', 'body'];

We can now run our database migration for our comments:

    $ php artisan migrate


Configuring Laravel to broadcast events using Pusher

We already said that comments will have realtime functionality, and we will be building this using Pusher, so we need to enable Laravel’s event broadcasting feature.

Open the config/app.php file and uncomment the following line in the providers array:

    App\Providers\BroadcastServiceProvider::class,


Next, we need to configure the broadcast driver in the .env file:

    BROADCAST_DRIVER=pusher


Let’s pull in the Pusher PHP SDK using composer:

    $ composer require pusher/pusher-php-server


Configuring Pusher

For us to use Pusher in this application, you need a Pusher account. You can create a free Pusher account here, then log in to your dashboard and create an app.

Once you have created an app, we will use the app details to configure Pusher in the .env file:

    PUSHER_APP_ID=xxxxxx
    PUSHER_APP_KEY=xxxxxxxxxxxxxxxxxxxx
    PUSHER_APP_SECRET=xxxxxxxxxxxxxxxxxxxx
    PUSHER_APP_CLUSTER=xx


Update the Pusher keys with the app credentials provided for you under the Keys section on the Overview tab on the Pusher dashboard.

Broadcasting an event for when a new comment is sent

To make the comments update in realtime, we have to broadcast an event based on the comment creation activity. We will create a new event and call it CommentSent; it will be fired whenever a new comment is successfully created.

Run this command in your terminal:

    php artisan make:event CommentSent


There will be a newly created file in the app/Events directory. Open the CommentSent.php file and ensure that it implements the ShouldBroadcast interface.

Open the file and replace its contents with the following code:

    // File: ./app/Events/CommentSent.php
    <?php 

    namespace App\Events;

    use App\Comment;
    use App\User;
    use Illuminate\Broadcasting\Channel;
    use Illuminate\Queue\SerializesModels;
    use Illuminate\Broadcasting\PrivateChannel;
    use Illuminate\Broadcasting\PresenceChannel;
    use Illuminate\Foundation\Events\Dispatchable;
    use Illuminate\Broadcasting\InteractsWithSockets;
    use Illuminate\Contracts\Broadcasting\ShouldBroadcast;

    class CommentSent implements ShouldBroadcast
    {
        use Dispatchable, InteractsWithSockets, SerializesModels;

        public $user;

        public $comment;

        public function __construct(User $user, Comment $comment)
        {
            $this->user = $user;

            $this->comment = $comment;
        }

        public function broadcastOn()
        {
            return new PrivateChannel('comment');
        }
    }

In the code above, we created two public properties, user and comment, to hold the data that will be passed to the channel we are broadcasting on. We also created a private channel called comment. We are using a private channel so that only authenticated clients can subscribe to the channel.

Defining the routes for handling operations on a comment

We created a controller for the comment model earlier but we haven’t defined the web routes that will redirect requests to be handled by that controller.

Open the routes/web.php file and include the code below:

    // File: ./routes/web.php
    Route::get('/{post}/comments', 'CommentController@index');
    Route::post('/{post}/comments', 'CommentController@store');

Setting up the action methods in the CommentController

We need to include two methods in the CommentController.php file. These methods will be responsible for storing and retrieving comments. In the store() method, we will also broadcast an event when a new comment is created.

Open the CommentController.php file and replace its contents with the code below:

    // File: ./app/Http/Controllers/CommentController.php
    <?php 

    namespace App\Http\Controllers;

    use App\Comment;
    use App\Events\CommentSent;
    use App\Post;
    use Illuminate\Http\Request;

    class CommentController extends Controller
    {
        public function store(Post $post)
        {
            $this->validate(request(), [
                'body' => 'required',
            ]);

            $user = auth()->user();

            $comment = Comment::create([
                'user_id' => $user->id,
                'post_id' => $post->id,
                'body' => request('body'),
            ]);

            broadcast(new CommentSent($user, $comment))->toOthers();

            return ['status' => 'Message Sent!'];
        }

        public function index(Post $post)
        {
            return $post->comments()->with('user')->get();
        }
    }

In the store method above, we validate and then create a new comment on the post. After the comment has been created, we broadcast the CommentSent event to other clients so they can update their comment lists in realtime.

In the index method we just return the comments belonging to a post along with the user that made the comment.

Adding a layer of authentication

Let’s add a layer of authentication that ensures that only authenticated users can listen on the private comment channel we created.

Add the following code to the routes/channels.php file:

    // File: ./routes/channels.php
    Broadcast::channel('comment', function ($user) {
        return auth()->check();
    });

Adding comments to the frontend

In the second article of this series, we created the view for the single post landing page in the single.blade.php file, but we didn’t add the comments functionality. We are going to add it now. We will be using Vue to build the comments for this application so the first thing we will do is include Vue in the frontend of our application.

Open the master layout template and include Vue in its <head> tag. Just before the <title> tag in the master.blade.php file, include this snippet:

    <!-- File: ./resources/views/layouts/master.blade.php -->
    <meta name="csrf-token" content="{{ csrf_token() }}">
    <script src="{{ asset('js/app.js') }}" defer></script>

The csrf_token() is there so that users cannot forge requests in our application. All our requests will pick up the randomly generated csrf-token and use it when making requests.


Now, the next thing we want to do is update the resources/assets/js/app.js file so that it includes the Comments component.

Open the file and replace its contents with the code below:

    require('./bootstrap');

    import Vue          from 'vue'
    import VueRouter    from 'vue-router'
    import Homepage from './components/Homepage'
    import Create   from './components/Create'
    import Read     from './components/Read'
    import Update   from './components/Update'
    import Comments from './components/Comments'

    Vue.use(VueRouter)

    const router = new VueRouter({
        mode: 'history',
        routes: [
            {
                path: '/admin/dashboard',
                name: 'read',
                component: Read,
                props: true
            },
            {
                path: '/admin/create',
                name: 'create',
                component: Create,
                props: true
            },
            {
                path: '/admin/update',
                name: 'update',
                component: Update,
                props: true
            },
        ],
    });

    const app = new Vue({
        el: '#app',
        components: { Homepage, Comments },
        router,
    });

Above, we imported the Comments component and added it to the list of components in the application’s Vue instance.

Now create a Comments.vue file in the resources/assets/js/components directory. This is where all the code for our comment view will go. We will populate this file later on.

Installing Pusher and Laravel Echo

For us to be able to use Pusher and subscribe to events on the frontend, we need to pull in both Pusher and Laravel Echo. We will do so by running this command:

    $ npm install --save laravel-echo pusher-js


Now let’s configure Laravel Echo to work in our application. In the resources/assets/js/bootstrap.js file, find and uncomment this snippet of code:

    import Echo from 'laravel-echo'

    window.Pusher = require('pusher-js');

    window.Echo = new Echo({
         broadcaster: 'pusher',
         key: process.env.MIX_PUSHER_APP_KEY,
         cluster: process.env.MIX_PUSHER_APP_CLUSTER,
         encrypted: true
    });

Now let’s import the Comments component into the single.blade.php file and pass along the required props.

Open the single.blade.php file and replace its contents with the code below:

    {{-- File: ./resources/views/single.blade.php --}}
    @extends('layouts.master')

    @section('content')
    <div class="container">
      <div class="row">
        <div class="col-lg-10 mx-auto">
          <br>
          <h3 class="mt-4">
            {{ $post->title }} 
            <span class="lead">by <a href="#">{{ $post->user->name }}</a></span>
          </h3>
          <hr>
          <p>Posted {{ $post->created_at->diffForHumans() }}</p>
          <hr>
          <img class="img-fluid rounded" src="{!! !empty($post->image) ? '/uploads/posts/' . $post->image : 'http://placehold.it/750x300' !!}" alt="">
          <hr>
          <div>
            <p>{{ $post->body }}</p>
            <hr>
            <br>
          </div>

          @auth
          <Comments
              :post-id='@json($post->id)' 
              :user-name='@json(auth()->user()->name)'>
          </Comments>
          @endauth
        </div>
      </div>
    </div>
    @endsection

Building the comments view

Open the Comments.vue file and add the following markup template below:

    <template>
      <div class="card my-4">
        <h5 class="card-header">Leave a Comment:</h5>
        <div class="card-body">
          <form>
            <div class="form-group">
              <textarea ref="body" class="form-control" rows="3"></textarea>
            </div>
            <button type="submit" @click.prevent="addComment" class="btn btn-primary">
              Submit
            </button>
          </form>
        </div>
        <p class="border p-3" v-for="comment in comments">
           <strong>{{ comment.user.name }}</strong>: 
           <span>{{ comment.body }}</span>
        </p>
      </div>
    </template>

Now, we’ll add a script that defines two methods:

  1. fetchComments(), which loads the existing comments for the post.
  2. addComment(), which sends a new comment to the backend.

In the same file, add the following below the closing template tag:

    <script>
    export default {
      props: {
        userName: {
          type: String,
          required: true
        },
        postId: {
          type: Number,
          required: true
        }
      },
      data() {
        return {
          comments: []
        };
      },

      created() {
        this.fetchComments();

        Echo.private("comment").listen("CommentSent", e => {
            this.comments.push({
              user: {name: e.user.name},
              body: e.comment.body,
            });
        });
      },

      methods: {
        fetchComments() {
          axios.get("/" + this.postId + "/comments").then(response => {
            this.comments = response.data;
          });
        },

        addComment() {
          let body = this.$refs.body.value;
          axios.post("/" + this.postId + "/comments", { body }).then(response => {
            this.comments.push({
              user: {name: this.userName},
              body: this.$refs.body.value
            });
            this.$refs.body.value = "";
          });
        }
      }
    };
    </script>

In the created() method above, we first made a call to the fetchComments() method, then we created a listener to the private comment channel using Laravel Echo. Once this listener is triggered, the comments property is updated.

Testing the application

Now let’s test the application to see if it is working as intended. Before running the application, we need to refresh our database so as to revert any changes. To do this, run the command below in your terminal:

    $ php artisan migrate:fresh --seed


Next, let’s build the application so that all the changes will be compiled and included as part of the JavaScript file. To do this, run the following command in your terminal:

    $ npm run dev


Finally, let’s serve the application using this command:

    $ php artisan serve


To test that our application works, visit the application URL http://localhost:8000 in two separate browser windows, and log in as a different user in each window.

We will finally make a comment on the same post on each of the browser windows and check that it updates in realtime on the other window:

Conclusion

In this final tutorial of this series, we created the comments feature of the CMS and also made it realtime. We were able to accomplish the realtime functionality using Pusher.

In this entire series, we learned how to build a CMS using Laravel and Vue.

The source code for this article series is available here on GitHub.


10 Tips for Building and Maintaining Large Vue.js Projects

Here are the top best practices I've developed while working on Vue projects with a large code base. These tips will help you develop more efficient code that is easier to maintain and share.

When freelancing this year, I had the opportunity to work on some large Vue applications. I am talking about projects with more than 😰 a dozen Vuex stores, a high number of components (sometimes hundreds) and many views (pages). 😄 It was actually quite a rewarding experience for me as I discovered many interesting patterns to make the code scalable. I also had to fix some bad practices that resulted in the famous spaghetti code dilemma. 🍝

Thus, today I’m sharing 10 best practices with you that I would recommend to follow if you are dealing with a large code base. 🧚🏼‍♀️

1. Use Slots to Make Your Components Easier to Understand and More Powerful

I recently wrote an article about some important things you need to know regarding slots in Vue.js. It highlights how slots can make your components more reusable and easier to maintain and why you should use them.

🧐 But what does this have to do with large Vue.js projects? A picture is usually worth a thousand words, so I will paint you a picture of the first time I deeply regretted not using them.

One day, I simply had to create a popup. Nothing really complex at first sight, as it just included a title, a description and some buttons. So what I did was pass everything as props. I ended up with three props that you would use to customize the component, and an event was emitted when people clicked on the buttons. Easy peasy! 😅

But, as the project grew over time, the team requested that we display a lot of other new things in it: form fields, different buttons depending on which page it was displayed on, cards, a footer, and the list goes on. I figured out that if I kept using props to make this component evolve, it would be ok. But god, 😩 how wrong I was! The component quickly became too complex to understand as it was including countless child components, using way too many props and emitting a large number of events. 🌋 I came to experience that terrible situation in which when you make a change somewhere and somehow it ends up breaking something else on another page. I had built a Frankenstein monster instead of a maintainable component! 🤖

However, things could have been better if I had relied on slots from the start. I ended up refactoring everything to come up with this tiny component. Easier to maintain, faster to understand and way more extendable!

<template>
  <div class="c-base-popup">
    <div v-if="$slots.header" class="c-base-popup__header">
      <slot name="header" />
    </div>
    <div v-if="$slots.subheader" class="c-base-popup__subheader">
      <slot name="subheader" />
    </div>
    <div class="c-base-popup__body">
      <h1>{{ title }}</h1>
      <p v-if="description">{{ description }}</p>
    </div>
    <div v-if="$slots.actions" class="c-base-popup__actions">
      <slot name="actions" />
    </div>
    <div v-if="$slots.footer" class="c-base-popup__footer">
      <slot name="footer" />
    </div>
  </div>
</template>

<script>
export default {
  props: {
    description: {
      type: String,
      default: null
    },
    title: {
      type: String,
      required: true
    }
  }
}
</script>

My point is that, from experience, projects built by developers who know when to use slots make a big difference in future maintainability. Way fewer events are emitted, the code is easier to understand, and it offers way more flexibility, as you can display whatever components you wish inside.

⚠️ As a rule of thumb, keep in mind that when you end up duplicating your child components' props inside their parent component, you should start using slots at that point.
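
For instance, a parent view could then compose the popup entirely through its slots. This is only an illustration: BasePopup (the registration name) and the sendInvite/closePopup handlers are made up for the example.

<template>
  <BasePopup title="Invite a teammate" description="Send an invitation by email.">
    <!-- Extra content goes into named slots instead of ever more props -->
    <template #header>
      <h2>Team settings</h2>
    </template>
    <template #actions>
      <button @click="closePopup">Cancel</button>
      <button @click="sendInvite">Send invite</button>
    </template>
  </BasePopup>
</template>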

2. Organize Your Vuex Store Properly

Usually, new Vue.js developers start to learn about Vuex because they stumbled upon one of these two issues:

  • Either they need to access the data of a given component from another one that’s actually too far apart in the tree structure, or
  • They need the data to persist after the component is destroyed.

That's when they create their first Vuex store, learn about modules and start organizing them in their application. 💡

The thing is that there is no single pattern to follow when creating modules. However, 👆🏼 I highly recommend you think about how you want to organize them. From what I've seen, most developers prefer to organize them per feature. For instance:

  • Auth.
  • Blog.
  • Inbox.
  • Settings.

😜 On my side, I find it easier to understand when they are organized according to the data models they fetch from the API. For example:

  • Users
  • Teams
  • Messages
  • Widgets
  • Articles
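
On disk, that per-model organization could look like this (a sketch; the file names are illustrative):

store
├── index.js
└── modules
    ├── users.js
    ├── teams.js
    ├── messages.js
    ├── widgets.js
    └── articles.js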

Which one you choose is up to you. The only thing to keep in mind is that a well-organized Vuex store will result in a more productive team in the long run. It will also make newcomers better predisposed to wrap their minds around your code base when they join your team.

3. Use Actions to Make API Calls and Commit the Data

Most of my API calls (if not all) are made inside my Vuex actions. You may wonder: why is that a good place to do so? 🤨

🤷🏼‍♀️ Simply because most of them fetch the data I need to commit in my store. Besides, they provide a level of encapsulation and reusability I really enjoy working with. Here are some other reasons I do so, with a sketch of such an action right after the list:

  • If I need to fetch the first page of articles in two different places (let's say the blog and the homepage), I can just call the appropriate dispatcher with the right parameters. The data will be fetched, committed and returned with no duplicated code other than the dispatcher call.

  • If I need to create some logic to avoid fetching this first page when it has already been fetched, I can do so in one place. In addition to decreasing the load on my server, I am also confident that it will work everywhere.

  • I can track most of my Mixpanel events inside these actions, making the analytics code base really easy to maintain. I do have some applications where all the Mixpanel calls are solely made in the actions. 😂 I can't tell you how much of a joy it is to work this way when I don't have to understand what is tracked from what is not and when they are being sent.
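
Here is a minimal sketch of such an action, assuming a hypothetical articles module and a paginated /api/articles endpoint:

// Sketch of an "articles" Vuex module; the module name, state shape
// and endpoint are illustrative
import axios from "axios";

export default {
  namespaced: true,

  state: () => ({
    articles: [],
    firstPageLoaded: false
  }),

  mutations: {
    SET_ARTICLES(state, articles) {
      state.articles = articles;
    },
    SET_FIRST_PAGE_LOADED(state) {
      state.firstPageLoaded = true;
    }
  },

  actions: {
    // Fetch, commit and return the first page of articles; skip the
    // request entirely if it has already been loaded
    async fetchFirstPage({ state, commit }) {
      if (state.firstPageLoaded) return state.articles;

      const { data } = await axios.get("/api/articles?page=1");
      commit("SET_ARTICLES", data.data);
      commit("SET_FIRST_PAGE_LOADED");
      return data.data;
    }
  }
};

Any component (the blog, the homepage) can now run this.$store.dispatch("articles/fetchFirstPage") and get the same caching behavior for free.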

4. Simplify Your Code Base with mapState, mapGetters, mapMutations and mapActions

There usually is no need to create multiple computed properties or methods when you just need to access your state/getters or call your actions/mutations inside your components. Using mapState, mapGetters, mapMutations and mapActions can help you shorten your code and make things easier to understand by grouping what is coming from your store modules in one place.

// NPM
import { mapState, mapGetters, mapActions, mapMutations } from "vuex";

export default {
  computed: {
    // Accessing root properties
    ...mapState("my_module", ["property"]),
    // Accessing getters
    ...mapGetters("my_module", ["property"]),
    // Accessing non-root properties
    ...mapState("my_module", {
      property: state => state.object.nested.property
    })
  },

  methods: {
    // Accessing actions
    ...mapActions("my_module", ["myAction"]),
    // Accessing mutations
    ...mapMutations("my_module", ["myMutation"])
  }
};

All the information you'll need on these handy helpers is available here in the official Vuex documentation. 🤩

5. Use API Factories

I usually like to create a this.$api helper that I can call anywhere to fetch my API endpoints. At the root of my project, I have an api folder that includes all my classes (see one of them below).

api
├── auth.js
├── notifications.js
└── teams.js

Each one is grouping all the endpoints for its category. Here is how I initialize this pattern with a plugin in my Nuxt applications (it is quite a similar process in a standard Vue app).

// PROJECT: API
import Auth from "@/api/auth";
import Teams from "@/api/teams";
import Notifications from "@/api/notifications";

export default (context, inject) => {
  if (process.client) {
    const token = localStorage.getItem("token");
    // Set token when defined
    if (token) {
      context.$axios.setToken(token, "Bearer");
    }
  }
  // Initialize API repositories
  const repositories = {
    auth: Auth(context.$axios),
    teams: Teams(context.$axios),
    notifications: Notifications(context.$axios)
  };
  inject("api", repositories);
};

// File: ./api/auth.js
export default $axios => ({
  forgotPassword(email) {
    return $axios.$post("/auth/password/forgot", { email });
  },

  login(email, password) {
    return $axios.$post("/auth/login", { email, password });
  },

  logout() {
    return $axios.$get("/auth/logout");
  },

  register(payload) {
    return $axios.$post("/auth/register", payload);
  }
});

Now, I can simply call them in my components or Vuex actions like this:

export default {
  methods: {
    async onSubmit() {
      try {
        // Await the call so a failed request is caught below
        await this.$api.auth.login(this.email, this.password);
      } catch (error) {
        console.error(error);
      }
    }
  }
};

6. Use $config to access your environment variables (especially useful in templates)

Your project probably has some global configuration variables defined in a few files:

config
├── development.json
└── production.json

I like to quickly access them through a this.$config helper, especially when I am inside a template. As always, it's quite easy to extend the Vue object:

// NPM
import Vue from "vue";

// PROJECT: COMMONS
import development from "@/config/development.json";
import production from "@/config/production.json";

if (process.env.NODE_ENV === "production") {
  Vue.prototype.$config = Object.freeze(production);
} else {
  Vue.prototype.$config = Object.freeze(development);
}
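
Then any component template can read those values directly; apiUrl here is a hypothetical key from the JSON config files:

<template>
  <a :href="$config.apiUrl">API status</a>
</template>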

7. Follow a Single Convention to Name Your Commits

As the project grows, you will need to browse the history for your components on a regular basis. If your team does not follow the same convention to name their commits, it will make it harder to understand what each one does.

I always use and recommend the Angular commit message guidelines. I follow them in every project I work on, and in many cases other team members quickly figure out that it's better to follow them too.

Following these guidelines leads to more readable messages that make commits easier to track when looking through the project history. In a nutshell, here is how it works:

git commit -am "<type>(<scope>): <subject>"

# Here are some samples
git commit -am "docs(changelog): update changelog to beta.5"
git commit -am "fix(release): need to depend on latest rxjs and zone.js"

Have a look at their README file to learn more about it and its conventions.

8. Always Freeze Your Package Versions When Your Project is in Production

I know... All packages should follow the semantic versioning rules. But the reality is, some of them don't. 😅

To avoid having to wake up in the middle of the night because one of your dependencies broke your entire project, locking all your package versions should make your mornings at work less stressful. 😇

What it means is simply this: avoid versions prefixed with ^:

{
  "name": "my project",

  "version": "1.0.0",

  "private": true,

  "dependencies": {
    "axios": "0.19.0",
    "imagemin-mozjpeg": "8.0.0",
    "imagemin-pngquant": "8.0.0",
    "imagemin-svgo": "7.0.0",
    "nuxt": "2.8.1",
  },

  "devDependencies": {
    "autoprefixer": "9.6.1",
    "babel-eslint": "10.0.2",
    "eslint": "6.1.0",
    "eslint-friendly-formatter": "4.0.1",
    "eslint-loader": "2.2.1",
    "eslint-plugin-vue": "5.2.3"
  }
}

9. Use Vue Virtual Scroller When Displaying a Large Amount of Data

When you need to display a lot of rows in a given page or when you need to loop over a large amount of data, you might have noticed that the page can quickly become quite slow to render. To fix this, you can use vue-virtual-scroller.

npm install vue-virtual-scroller

It will render only the visible items in your list and reuse components and DOM elements to be as efficient and performant as possible. It really is easy to use and works like a charm! ✨

<template>
  <RecycleScroller
    class="scroller"
    :items="list"
    :item-size="32"
    key-field="id"
    v-slot="{ item }"
  >
    <div class="user">
      {{ item.name }}
    </div>
  </RecycleScroller>
</template>
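
For completeness, here is the matching registration step before that template can be used, based on the plugin's documented RecycleScroller export and stylesheet:

// Register the scroller globally so <RecycleScroller> is available
// in any template
import Vue from "vue";
import { RecycleScroller } from "vue-virtual-scroller";
import "vue-virtual-scroller/dist/vue-virtual-scroller.css";

Vue.component("RecycleScroller", RecycleScroller);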

10. Track the Size of Your Third-Party Packages

When a lot of people work on the same project, the number of installed packages can quickly become incredibly high if no one is paying attention to them. To avoid your application becoming slow (especially on slow mobile networks), I use the import cost package in Visual Studio Code. This way, I can see right from my editor how large an imported library is, and can check what's wrong when it's getting too large.

For instance, in a recent project, the entire lodash library was imported (which is approximately 24kB gzipped). The issue? Only the cloneDeep method was used. By identifying this issue with the import cost package, we fixed it with:

npm remove lodash
npm install lodash.clonedeep

The clonedeep function could then be imported where needed:

import cloneDeep from "lodash.clonedeep";

⚠️ To optimize things even further, you can also use the Webpack Bundle Analyzer package to visualize the size of your webpack output files with an interactive zoomable treemap.


Do you have other best practices when dealing with a large Vue code base? Feel free to tell me in the comments below.

A Complete Machine Learning Project Walk-Through in Python

A Complete Machine Learning Project Walk-Through in Python: Putting the machine learning pieces together; Model Selection, Hyperparameter Tuning, and Evaluation; Interpreting a machine learning model and presenting results

Reading through a data science book or taking a course, it can feel like you have the individual pieces, but don’t quite know how to put them together. Taking the next step and solving a complete machine learning problem can be daunting, but persevering through and completing a first project will give you the confidence to tackle any data science problem. This series of articles will walk through a complete machine learning solution with a real-world dataset to let you see how all the pieces come together.

We’ll follow the general machine learning workflow step-by-step:

  1. Data cleaning and formatting
  2. Exploratory data analysis
  3. Feature engineering and selection
  4. Compare several machine learning models on a performance metric
  5. Perform hyperparameter tuning on the best model
  6. Evaluate the best model on the testing set
  7. Interpret the model results
  8. Draw conclusions and document work

Along the way, we’ll see how each step flows into the next and how to specifically implement each part in Python. The complete project is available on GitHub, with the first notebook here.

(As a note, this problem was originally given to me as an “assignment” for a job screen at a start-up. After completing the work, I was offered the job, but then the CTO of the company quit and they weren’t able to bring on any new employees. I guess that’s how things go on the start-up scene!)

Problem Definition

The first step before we get coding is to understand the problem we are trying to solve and the available data. In this project, we will work with publicly available building energy data from New York City.

The objective is to use the energy data to build a model that can predict the Energy Star Score of a building and interpret the results to find the factors which influence the score.

The data includes the Energy Star Score, which makes this a supervised regression machine learning task:

  • Supervised: we have access to both the features and the target and our goal is to train a model that can learn a mapping between the two
  • Regression: The Energy Star score is a continuous variable

We want to develop a model that is both accurate — it can predict the Energy Star Score close to the true value — and interpretable — we can understand the model predictions. Once we know the goal, we can use it to guide our decisions as we dig into the data and build models.

Data Cleaning

Contrary to what most data science courses would have you believe, not every dataset is a perfectly curated group of observations with no missing values or anomalies (looking at you mtcars and iris datasets). Real-world data is messy which means we need to clean and wrangle it into an acceptable format before we can even start the analysis. Data cleaning is an un-glamorous, but necessary part of most actual data science problems.

First, we can load in the data as a Pandas DataFrame and take a look:

import pandas as pd
import numpy as np

# Read in data into a dataframe 
data = pd.read_csv('data/Energy_and_Water_Data_Disclosure_for_Local_Law_84_2017__Data_for_Calendar_Year_2016_.csv')

# Display top of dataframe
data.head()

This is a subset of the full data which contains 60 columns. Already, we can see a couple issues: first, we know that we want to predict the ENERGY STAR Score but we don’t know what any of the columns mean. While this isn’t necessarily an issue — we can often make an accurate model without any knowledge of the variables — we want to focus on interpretability, and it might be important to understand at least some of the columns.

When I originally got the assignment from the start-up, I didn’t want to ask what all the column names meant, so I looked at the name of the file and decided to search for “Local Law 84”. That led me to this page which explains this is an NYC law requiring all buildings of a certain size to report their energy use. More searching brought me to all the definitions of the columns. Maybe looking at a file name is an obvious place to start, but for me this was a reminder to go slow so you don’t miss anything important!

We don’t need to study all of the columns, but we should at least understand the Energy Star Score, which is described as:

A 1-to-100 percentile ranking based on self-reported energy usage for the reporting year. The Energy Star score is a relative measure used for comparing the energy efficiency of buildings.

That clears up the first problem, but the second issue is that missing values are encoded as “Not Available”. This is a string in Python, which means that even the columns with numbers will be stored as object datatypes, because Pandas converts a column with any strings into a column of all strings. We can see the datatypes of the columns using the dataframe.info() method:

# See the column data types and non-missing values
data.info()

Sure enough, some of the columns that clearly contain numbers (such as ft²), are stored as objects. We can’t do numerical analysis on strings, so these will have to be converted to number (specifically float) data types!

Here’s a little Python code that replaces all the “Not Available” entries with not a number (np.nan), which can be interpreted as a number, and then converts the relevant columns to the float datatype:

# Replace all occurrences of Not Available with numpy not a number
data = data.replace({'Not Available': np.nan})

# Iterate through the columns
for col in list(data.columns):
    # Select columns that should be numeric
    if ('ft²' in col or 'kBtu' in col or 'Metric Tons CO2e' in col or 'kWh' in 
        col or 'therms' in col or 'gal' in col or 'Score' in col):
        # Convert the data type to float
        data[col] = data[col].astype(float)

Once the correct columns are numbers, we can start to investigate the data.

Missing Data and Outliers

In addition to incorrect datatypes, another common problem when dealing with real-world data is missing values. These can arise for many reasons and have to be either filled in or removed before we train a machine learning model. First, let’s get a sense of how many missing values are in each column (see the notebook for code).

(To create this table, I used a function from this Stack Overflow Forum).

While we always want to be careful about removing information, if a column has a high percentage of missing values, then it probably will not be useful to our model. The threshold for removing columns should depend on the problem (here is a discussion), and for this project, we will remove any columns with more than 50% missing values.

At this point, we may also want to remove outliers. These can be due to typos in data entry, mistakes in units, or they could be legitimate but extreme values. For this project, we will remove anomalies based on the definition of extreme outliers:

  • Below the first quartile − 3 × the interquartile range
  • Above the third quartile + 3 × the interquartile range

(For the code to remove the columns and the anomalies, see the notebook). At the end of the data cleaning and anomaly removal process, we are left with over 11,000 buildings and 49 features.

Exploratory Data Analysis

Now that the tedious — but necessary — step of data cleaning is complete, we can move on to exploring our data! Exploratory Data Analysis (EDA) is an open-ended process where we calculate statistics and make figures to find trends, anomalies, patterns, or relationships within the data.

In short, the goal of EDA is to learn what our data can tell us. It generally starts out with a high level overview, then narrows in to specific areas as we find interesting parts of the data. The findings may be interesting in their own right, or they can be used to inform our modeling choices, such as by helping us decide which features to use.

Single Variable Plots

The goal is to predict the Energy Star Score (renamed to score in our data) so a reasonable place to start is examining the distribution of this variable. A histogram is a simple yet effective way to visualize the distribution of a single variable and is easy to make using matplotlib.

import matplotlib.pyplot as plt

# Histogram of the Energy Star Score
plt.style.use('fivethirtyeight')
plt.hist(data['score'].dropna(), bins = 100, edgecolor = 'k');
plt.xlabel('Score'); plt.ylabel('Number of Buildings'); 
plt.title('Energy Star Score Distribution');

This looks quite suspicious! The Energy Star score is a percentile rank, which means we would expect to see a uniform distribution, with each score assigned to the same number of buildings. However, a disproportionate number of buildings have either the highest, 100, or the lowest, 1, score (higher is better for the Energy Star score).

If we go back to the definition of the score, we see that it is based on “self-reported energy usage” which might explain the very high scores. Asking building owners to report their own energy usage is like asking students to report their own scores on a test! As a result, this probably is not the most objective measure of a building’s energy efficiency.

If we had an unlimited amount of time, we might want to investigate why so many buildings have very high and very low scores, which we could do by selecting these buildings and seeing what they have in common. However, our objective is only to predict the score, not to devise a better method of scoring buildings! We can make a note in our report that the scores have a suspect distribution, but our main focus is on predicting the score.

Looking for Relationships

A major part of EDA is searching for relationships between the features and the target. Variables that are correlated with the target are useful to a model because they can be used to predict the target. One way to examine the effect of a categorical variable (which takes on only a limited set of values) on the target is through a density plot using the seaborn library.

A density plot can be thought of as a smoothed histogram because it shows the distribution of a single variable. We can color a density plot by class to see how a categorical variable changes the distribution. The following code makes a density plot of the Energy Star Score colored by the type of building (limited to building types with more than 100 data points):

import seaborn as sns

# Create a list of building types with more than 100 observations
types = data.dropna(subset=['score'])
types = types['Largest Property Use Type'].value_counts()
types = list(types[types.values > 100].index)

# Plot of distribution of scores for building categories
plt.figure(figsize=(12, 10))

# Plot each building type
for b_type in types:
    # Select the building type
    subset = data[data['Largest Property Use Type'] == b_type]
    
    # Density plot of Energy Star scores
    sns.kdeplot(subset['score'].dropna(),
               label = b_type, shade = False, alpha = 0.8);
    
# label the plot
plt.xlabel('Energy Star Score', size = 20); plt.ylabel('Density', size = 20); 
plt.title('Density Plot of Energy Star Scores by Building Type', size = 28);

We can see that the building type has a significant impact on the Energy Star Score. Office buildings tend to have a higher score while Hotels have a lower score. This tells us that we should include the building type in our modeling because it does have an impact on the target. As a categorical variable, we will have to one-hot encode the building type.

A similar plot can be used to show the Energy Star Score by borough:
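The code follows the same pattern as the building-type plot; a sketch, assuming each borough has enough scored buildings for a density estimate:

# Density plot of scores for each borough
for borough in data['Borough'].dropna().unique():
    subset = data[data['Borough'] == borough]
    sns.kdeplot(subset['score'].dropna(), label=borough)

plt.xlabel('Energy Star Score', size = 20); plt.ylabel('Density', size = 20)
plt.title('Density Plot of Energy Star Scores by Borough', size = 28)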

The borough does not seem to have as large of an impact on the score as the building type. Nonetheless, we might want to include it in our model because there are slight differences between the boroughs.

To quantify relationships between variables, we can use the Pearson Correlation Coefficient. This is a measure of the strength and direction of a linear relationship between two variables. A score of +1 is a perfectly linear positive relationship and a score of -1 is a perfectly negative linear relationship. Several values of the correlation coefficient are shown below:

While the correlation coefficient cannot capture non-linear relationships, it is a good way to start figuring out how variables are related. In Pandas, we can easily calculate the correlations between any columns in a dataframe:

# Find all correlations with the score and sort 
correlations_data = data.corr()['score'].sort_values()

The most negative (left) and positive (right) correlations with the target:
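To inspect both ends of the ranking directly, we can print the head and tail of the sorted series:

# Show the strongest negative and positive correlations with the score
print(correlations_data.head(15), '\n')
print(correlations_data.tail(15))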

There are several strong negative correlations between the features and the target, with the most negative being the different categories of EUI (these measures vary slightly in how they are calculated). The EUI (Energy Use Intensity) is the amount of energy used by a building divided by its square footage. It is meant to be a measure of the efficiency of a building, with a lower EUI being better. Intuitively, these correlations make sense: as the EUI increases, the Energy Star Score tends to decrease.

Two-Variable Plots

To visualize relationships between two continuous variables, we use scatterplots. We can include additional information, such as a categorical variable, in the color of the points. For example, the following plot shows the Energy Star Score vs. Site EUI colored by the building type:

This plot lets us visualize what a correlation coefficient of -0.7 looks like. As the Site EUI decreases, the Energy Star Score increases, a relationship that holds steady across the building types.

The final exploratory plot we will make is known as the Pairs Plot. This is a great exploration tool because it lets us see relationships between multiple pairs of variables as well as distributions of single variables. Here we are using the seaborn visualization library and the PairGrid function to create a Pairs Plot with scatterplots on the upper triangle, histograms on the diagonal, and 2D kernel density plots and correlation coefficients on the lower triangle.

# Extract the columns to plot
plot_data = features[['score', 'Site EUI (kBtu/ft²)', 
                      'Weather Normalized Source EUI (kBtu/ft²)', 
                      'log_Total GHG Emissions (Metric Tons CO2e)']]

# Replace the inf with nan
plot_data = plot_data.replace({np.inf: np.nan, -np.inf: np.nan})

# Rename columns 
plot_data = plot_data.rename(columns = {'Site EUI (kBtu/ft²)': 'Site EUI', 
                                        'Weather Normalized Source EUI (kBtu/ft²)': 'Weather Norm EUI',
                                        'log_Total GHG Emissions (Metric Tons CO2e)': 'log GHG Emissions'})

# Drop na values
plot_data = plot_data.dropna()

# Function to calculate correlation coefficient between two columns
def corr_func(x, y, **kwargs):
    r = np.corrcoef(x, y)[0][1]
    ax = plt.gca()
    ax.annotate("r = {:.2f}".format(r),
                xy=(.2, .8), xycoords=ax.transAxes,
                size = 20)

# Create the pairgrid object
grid = sns.PairGrid(data = plot_data, size = 3)

# Upper is a scatter plot
grid.map_upper(plt.scatter, color = 'red', alpha = 0.6)

# Diagonal is a histogram
grid.map_diag(plt.hist, color = 'red', edgecolor = 'black')

# Bottom is correlation and density plot
grid.map_lower(corr_func);
grid.map_lower(sns.kdeplot, cmap = plt.cm.Reds)

# Title for entire plot
plt.suptitle('Pairs Plot of Energy Data', size = 36, y = 1.02);

To see interactions between variables, we look for where a row intersects with a column. For example, to see the correlation of Weather Norm EUI with score, we look in the Weather Norm EUI row and the score column and see a correlation coefficient of -0.67. In addition to looking cool, plots such as these can help us decide which variables to include in modeling.

Feature Engineering and Selection

Feature engineering and selection often provide the greatest return on time invested in a machine learning problem. First of all, let’s define what these two tasks are:

  • Feature engineering: the process of taking raw data and extracting or creating new features from it, such as taking transformations (the natural log, square root) of variables or one-hot encoding categorical variables, so that a model can learn a mapping from features to target
  • Feature selection: the process of choosing the most relevant features in the data, removing uninformative or redundant ones to help the model generalize and to keep it interpretable

A machine learning model can only learn from the data we provide it, so ensuring that data includes all the relevant information for our task is crucial. If we don’t feed a model the correct data, then we are setting it up to fail and we should not expect it to learn!

For this project, we will take the following feature engineering steps:

  • One-hot encode the categorical variables (borough and property use type)
  • Add the natural log transformation of the numerical variables

One-hot encoding is necessary to include categorical variables in a model. A machine learning algorithm cannot understand a building type of “office”, so we have to record it as a 1 if the building is an office and a 0 otherwise.

Adding transformed features can help our model learn non-linear relationships within the data. Taking the square root, natural log, or various powers of features is common practice in data science and can be based on domain knowledge or what works best in practice. Here we will include the natural log of all numerical features.

The following code selects the numeric features, takes log transformations of these features, selects the two categorical features, one-hot encodes these features, and joins the two sets together. This seems like a lot of work, but it is relatively straightforward in Pandas!

# Copy the original data
features = data.copy()

# Select the numeric columns
numeric_subset = data.select_dtypes('number')

# Create columns with log of numeric columns
for col in numeric_subset.columns:
    # Skip the Energy Star Score column
    if col == 'score':
        continue
    else:
        numeric_subset['log_' + col] = np.log(numeric_subset[col])
        
# Select the categorical columns
categorical_subset = data[['Borough', 'Largest Property Use Type']]

# One hot encode
categorical_subset = pd.get_dummies(categorical_subset)

# Join the two dataframes using concat
# Make sure to use axis = 1 to perform a column bind
features = pd.concat([numeric_subset, categorical_subset], axis = 1)

After this process we have over 11,000 observations (buildings) with 110 columns (features). Not all of these features are likely to be useful for predicting the Energy Star Score, so now we will turn to feature selection to remove some of the variables.

Feature Selection

Many of the 110 features we have in our data are redundant because they are highly correlated with one another. For example, here is a plot of Site EUI vs Weather Normalized Site EUI which have a correlation coefficient of 0.997.

Features that are strongly correlated with each other are known as collinear and removing one of the variables in these pairs of features can often help a machine learning model generalize and be more interpretable. (I should point out we are talking about correlations of features with other features, not correlations with the target, which help our model!)

There are a number of methods to calculate collinearity between features, one of the most common being the variance inflation factor. In this project, we will use the correlation coefficient to identify and remove collinear features: we will drop one of a pair of features if the correlation coefficient between them is greater than 0.6. For the implementation, take a look at the notebook (and this Stack Overflow answer).
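As a sketch of one common implementation (the notebook's routine differs in its details):

# Absolute correlations between all pairs of features
corr_matrix = features.corr().abs()

# Keep only the upper triangle so each pair is examined once
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

# Drop one feature from any pair correlated above the 0.6 threshold
to_drop = [col for col in upper.columns if any(upper[col] > 0.6)]
features = features.drop(columns=to_drop)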

While this value may seem arbitrary, I tried several different thresholds, and this choice yielded the best model. Machine learning is an empirical field and is often about experimenting and finding what performs best! After feature selection, we are left with 64 total features and 1 target.

# Remove any columns with all na values
features = features.dropna(axis=1, how = 'all')
print(features.shape)

(11319, 65)

Establishing a Baseline

We have now completed data cleaning, exploratory data analysis, and feature engineering. The final step to take before getting started with modeling is establishing a naive baseline. This is essentially a guess against which we can compare our results. If the machine learning models do not beat this guess, then we might have to conclude that machine learning is not suitable for the task, or we might need to try a different approach.

For regression problems, a reasonable naive baseline is to guess the median value of the target on the training set for all the examples in the test set. This sets a relatively low bar for any model to surpass.

The metric we will use is the mean absolute error (MAE), which measures the average absolute error of the predictions. There are many metrics for regression, but I like Andrew Ng’s advice to pick a single metric and stick to it when evaluating models. The mean absolute error is easy to calculate and easy to interpret.

Before calculating the baseline, we need to split our data into a training and a testing set:

  • Training set: the features and answers we provide to the model during training so it can learn a mapping from features to target
  • Testing set: held-out data the model never sees during training, used to evaluate it by comparing its predictions to the known answers

We will use 70% of the data for training and 30% for testing:

from sklearn.model_selection import train_test_split

# Split into 70% training and 30% testing set
# (features holds the predictor columns; targets holds the score column)
X, X_test, y, y_test = train_test_split(features, targets, 
                                        test_size = 0.3, 
                                        random_state = 42)

Now we can calculate the naive baseline performance:

# Function to calculate mean absolute error
def mae(y_true, y_pred):
    return np.mean(abs(y_true - y_pred))

baseline_guess = np.median(y)

print('The baseline guess is a score of %0.2f' % baseline_guess)
print("Baseline Performance on the test set: MAE = %0.4f" % mae(y_test, baseline_guess))
The baseline guess is a score of 66.00
Baseline Performance on the test set: MAE = 24.5164

The naive estimate is off by about 25 points on the test set. The score ranges from 1–100, so this represents an error of 25%, quite a low bar to surpass!

Conclusions

In this article we walked through the first three steps of a machine learning problem. After defining the question, we:

  1. Cleaned and formatted the raw data
  2. Performed an exploratory data analysis to learn about the dataset
  3. Developed a set of features to use for modeling through feature engineering and selection

Finally, we also completed the crucial step of establishing a baseline against which we can judge our machine learning algorithms.

A Complete Machine Learning Walk-Through in Python (Part Two): Model Selection, Hyperparameter Tuning, and Evaluation

Model Evaluation and Selection

As a reminder, we are working on a supervised regression task: using New York City building energy data, we want to develop a model that can predict the Energy Star Score of a building. Our focus is on both accuracy of the predictions and interpretability of the model.

There are a ton of machine learning models to choose from and deciding where to start can be intimidating. While there are some charts that try to show you which algorithm to use, I prefer to just try out several and see which one works best! Machine learning is still a field driven primarily by empirical (experimental) rather than theoretical results, and it’s almost impossible to know ahead of time which model will do the best.

Generally, it’s a good idea to start out with simple, interpretable models such as linear regression, and if the performance is not adequate, move on to more complex, but usually more accurate methods. The following chart shows a (highly unscientific) version of the accuracy vs interpretability trade-off:

We will evaluate five different models covering the complexity spectrum:

  • Linear Regression
  • K-Nearest Neighbors Regression
  • Random Forest Regression
  • Gradient Boosted Regression
  • Support Vector Machine Regression

In this post we will focus on implementing these methods rather than the theory behind them. For anyone interested in learning the background, I highly recommend An Introduction to Statistical Learning (available free online) or Hands-On Machine Learning with Scikit-Learn and TensorFlow. Both of these textbooks do a great job of explaining the theory and showing how to effectively use the methods in R and Python respectively.

Imputing Missing Values

While we dropped the columns with more than 50% missing values when we cleaned the data, there are still quite a few missing observations. Most machine learning models cannot handle missing values, so we have to fill them in, a process known as imputation.

First, we’ll read in all the data and remind ourselves what it looks like:

import pandas as pd
import numpy as np

# Read in data into dataframes 
train_features = pd.read_csv('data/training_features.csv')
test_features = pd.read_csv('data/testing_features.csv')
train_labels = pd.read_csv('data/training_labels.csv')
test_labels = pd.read_csv('data/testing_labels.csv')

# Display the sizes of the data
print('Training Feature Size: ', train_features.shape)
print('Testing Feature Size:  ', test_features.shape)
print('Training Labels Size:  ', train_labels.shape)
print('Testing Labels Size:   ', test_labels.shape)

Training Feature Size:  (6622, 64)
Testing Feature Size:   (2839, 64)
Training Labels Size:   (6622, 1)
Testing Labels Size:    (2839, 1)

Every value that is NaN represents a missing observation. While there are a number of ways to fill in missing data, we will use a relatively simple method, median imputation. This replaces all the missing values in a column with the median value of the column.

In the following code, we create a Scikit-Learn Imputer object with the strategy set to median. We then train this object on the training data (using imputer.fit) and use it to fill in the missing values in both the training and testing data (using imputer.transform). This means missing values in the test data are filled in with the corresponding median value from the training data.

(We have to do imputation this way rather than training on all the data to avoid the problem of test data leakage, where information from the testing dataset spills over into the training data.)

from sklearn.preprocessing import Imputer
# (in newer versions of Scikit-Learn, use SimpleImputer from sklearn.impute instead)

# Create an imputer object with a median filling strategy
imputer = Imputer(strategy='median')

# Train on the training features
imputer.fit(train_features)

# Transform both training data and testing data
X = imputer.transform(train_features)
X_test = imputer.transform(test_features)

print('Missing values in training features: ', np.sum(np.isnan(X)))
print('Missing values in testing features:  ', np.sum(np.isnan(X_test)))

Missing values in training features:  0
Missing values in testing features:   0

All of the features now have real, finite values with no missing examples.

Feature Scaling

Scaling refers to the general process of changing the range of a feature. This is necessary because features are measured in different units, and therefore cover different ranges. Methods such as support vector machines and K-nearest neighbors that take into account distance measures between observations are significantly affected by the range of the features and scaling allows them to learn. While methods such as Linear Regression and Random Forest do not actually require feature scaling, it is still best practice to take this step when we are comparing multiple algorithms.

We will scale the features by putting each one in a range between 0 and 1. This is done by taking each value of a feature, subtracting the minimum value of the feature, and dividing by the maximum minus the minimum (the range). This specific version of scaling is often called normalization and the other main version is known as standardization.
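As a tiny illustration of that formula with made-up numbers:

# Min-max normalization by hand for one feature column
x = pd.Series([10.0, 20.0, 50.0, 100.0])
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled.tolist())   # [0.0, 0.111..., 0.444..., 1.0]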

While this process would be easy to implement by hand, we can do it using a MinMaxScaler object in Scikit-Learn. The code for this method is identical to that for imputation except with a scaler instead of imputer! Again, we make sure to train only using training data and then transform all the data.

from sklearn.preprocessing import MinMaxScaler

# Create the scaler object with a range of 0-1
scaler = MinMaxScaler(feature_range=(0, 1))

# Fit on the training data
scaler.fit(X)

# Transform both the training and testing data
X = scaler.transform(X)
X_test = scaler.transform(X_test)

Every feature now has a minimum value of 0 and a maximum value of 1. Missing value imputation and feature scaling are two steps required in nearly any machine learning pipeline so it’s a good idea to understand how they work!

Implementing Machine Learning Models in Scikit-Learn

After all the work we spent cleaning and formatting the data, actually creating, training, and predicting with the models is relatively simple. We will use the Scikit-Learn library in Python, which has great documentation and a consistent model building syntax. Once you know how to make one model in Scikit-Learn, you can quickly implement a diverse range of algorithms.

We can illustrate one example of model creation, training (using .fit) and testing (using .predict) with the Gradient Boosting Regressor:

from sklearn.ensemble import GradientBoostingRegressor

# Create the model
gradient_boosted = GradientBoostingRegressor()

# Fit the model on the training data
gradient_boosted.fit(X, y)

# Make predictions on the test data
predictions = gradient_boosted.predict(X_test)

# Evaluate the model
mae = np.mean(abs(predictions - y_test))

print('Gradient Boosted Performance on the test set: MAE = %0.4f' % mae)
Gradient Boosted Performance on the test set: MAE = 10.0132

Model creation, training, and testing are each one line! To build the other models, we use the same syntax, changing only the name of the algorithm. The results are presented below:
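Before looking at the numbers, here is a hedged sketch of how that comparison loop might look, with all five models left at their default hyperparameters:

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR

# Fit and evaluate each model with the identical Scikit-Learn syntax
for model in [LinearRegression(), KNeighborsRegressor(),
              RandomForestRegressor(random_state=42),
              GradientBoostingRegressor(random_state=42), SVR()]:
    model.fit(X, y)
    predictions = model.predict(X_test)
    print('%s MAE = %0.4f' % (type(model).__name__,
                              np.mean(abs(predictions - y_test))))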

To put these figures in perspective, the naive baseline calculated using the median value of the target was 24.5. Clearly, machine learning is applicable to our problem because of the significant improvement over the baseline!

The gradient boosted regressor (MAE = 10.013) slightly beats out the random forest (MAE = 10.014). These results aren’t entirely fair because we are mostly using the default values for the hyperparameters. Especially in models such as the support vector machine, the performance is highly dependent on these settings. Nonetheless, from these results we will select the gradient boosted regressor for model optimization.

Hyperparameter Tuning for Model Optimization

In machine learning, after we have selected a model, we can optimize it for our problem by tuning the model hyperparameters.

First off, what are hyperparameters and how do they differ from parameters?

  • Model hyperparameters: settings of a machine learning algorithm that the data scientist chooses before training, such as the number of trees in a random forest
  • Model parameters: what the model learns during training, such as the weights in a linear regression

Controlling the hyperparameters affects the model performance by altering the balance between underfitting and overfitting in a model. Underfitting is when our model is not complex enough (it does not have enough degrees of freedom) to learn the mapping from features to target. An underfit model has high bias, which we can correct by making our model more complex.

Overfitting is when our model essentially memorizes the training data. An overfit model has high variance, which we can correct by limiting the complexity of the model through regularization. Both an underfit and an overfit model will not be able to generalize well to the testing data.

The problem with choosing the right hyperparameters is that the optimal set will be different for every machine learning problem! Therefore, the only way to find the best settings is to try out a number of them on each new dataset. Luckily, Scikit-Learn has a number of methods to allow us to efficiently evaluate hyperparameters. Moreover, projects such as TPOT by Epistasis Lab are trying to optimize the hyperparameter search using methods like genetic programming. In this project, we will stick to doing this with Scikit-Learn, but stay tuned for more work on the auto-ML scene!

Random Search with Cross Validation

The particular hyperparameter tuning method we will implement is called random search with cross validation:

  • Random search: the technique used to select the hyperparameters. We define a grid of values and then, rather than exhaustively trying every combination as in grid search, randomly sample combinations to evaluate
  • Cross validation: the technique used to evaluate each selected combination. Rather than splitting off a fixed validation set, which reduces the amount of training data, we use K-Fold cross validation

The idea of K-Fold cross validation with K = 5 is shown below:
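A minimal sketch of the splitting logic behind that figure:

from sklearn.model_selection import KFold

# Five folds: each iteration holds out a different fifth of the training data
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(kf.split(X)):
    print('Fold %d: %d training / %d validation observations'
          % (fold + 1, len(train_idx), len(valid_idx)))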

The entire process of performing random search with cross validation is:

  1. Set up a grid of hyperparameter values
  2. Randomly sample a combination of values from the grid
  3. Create a model with that combination
  4. Evaluate the model using K-Fold cross validation
  5. Repeat for a set number of iterations and keep the combination that performed best

Of course, we don’t actually do this manually, but rather let Scikit-Learn’s RandomizedSearchCV handle all the work!

Slight Diversion: Gradient Boosted Methods

Since we will be using the Gradient Boosted Regression model, I should give at least a little background! This model is an ensemble method, meaning that it is built out of many weak learners, in this case individual decision trees. While a bagging algorithm such as random forest trains the weak learners in parallel and has them vote to make a prediction, a boosting method like Gradient Boosting trains the learners in sequence, with each learner “concentrating” on the mistakes made by the previous ones.

Boosting methods have become popular in recent years and frequently win machine learning competitions. The Gradient Boosting Method is one particular implementation that uses Gradient Descent to minimize the cost function by sequentially training learners on the residuals of previous ones. The Scikit-Learn implementation of Gradient Boosting is generally regarded as less efficient than other libraries such as XGBoost, but it will work well enough for our small dataset and is quite accurate.

Back to Hyperparameter Tuning

There are many hyperparameters to tune in a Gradient Boosted Regressor and you can look at the Scikit-Learn documentation for the details. We will optimize the following hyperparameters:

  • loss: the loss function to minimize
  • n_estimators: the number of weak learners (decision trees) to use
  • max_depth: the maximum depth of each decision tree
  • min_samples_leaf: the minimum number of examples required at a leaf node
  • min_samples_split: the minimum number of examples required to split a node
  • max_features: the maximum number of features considered when making splits

I’m not sure if there is anyone who truly understands how all of these interact, and the only way to find the best combination is to try them out!

In the following code, we build a hyperparameter grid, create a RandomizedSearchCV object, and perform hyperparameter search using 4-fold cross validation over 25 different combinations of hyperparameters:

# Loss function to be optimized
loss = ['ls', 'lad', 'huber']

# Number of trees used in the boosting process
n_estimators = [100, 500, 900, 1100, 1500]

# Maximum depth of each tree
max_depth = [2, 3, 5, 10, 15]

# Minimum number of samples per leaf
min_samples_leaf = [1, 2, 4, 6, 8]

# Minimum number of samples to split a node
min_samples_split = [2, 4, 6, 10]

# Maximum number of features to consider for making splits
max_features = ['auto', 'sqrt', 'log2', None]

# Define the grid of hyperparameters to search
hyperparameter_grid = {'loss': loss,
                       'n_estimators': n_estimators,
                       'max_depth': max_depth,
                       'min_samples_leaf': min_samples_leaf,
                       'min_samples_split': min_samples_split,
                       'max_features': max_features}

# Create the model to use for hyperparameter tuning
model = GradientBoostingRegressor(random_state = 42)

from sklearn.model_selection import RandomizedSearchCV

# Set up the random search with 4-fold cross validation
random_cv = RandomizedSearchCV(estimator=model,
                               param_distributions=hyperparameter_grid,
                               cv=4, n_iter=25, 
                               scoring = 'neg_mean_absolute_error',
                               n_jobs = -1, verbose = 1, 
                               return_train_score = True,
                               random_state=42)

# Fit on the training data
random_cv.fit(X, y)

After performing the search, we can inspect the RandomizedSearchCV object to find the best model:

# Find the best combination of settings
random_cv.best_estimator_

GradientBoostingRegressor(loss='lad', max_depth=5,
                          max_features=None,
                          min_samples_leaf=6,
                          min_samples_split=6,
                          n_estimators=500)

We can then use these results to perform grid search by choosing parameters for our grid that are close to these optimal values. However, further tuning is unlikely to significantly improve our model. As a general rule, proper feature engineering will have a much larger impact on model performance than even the most extensive hyperparameter tuning. It’s the law of diminishing returns applied to machine learning: feature engineering gets you most of the way there, and hyperparameter tuning generally only provides a small benefit.

One experiment we can try is to change the number of estimators (decision trees) while holding the rest of the hyperparameters steady. This directly lets us observe the effect of this particular setting. See the notebook for the implementation, but here are the results:
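A sketch of that experiment consistent with the tuned settings above (the notebook's exact code may differ):

from sklearn.model_selection import GridSearchCV

# Grid over the number of trees only, holding the other tuned settings fixed
trees_grid = {'n_estimators': [100, 300, 500, 700, 900, 1100, 1300, 1500]}

model = GradientBoostingRegressor(loss='lad', max_depth=5, max_features=None,
                                  min_samples_leaf=6, min_samples_split=6,
                                  random_state=42)

grid_search = GridSearchCV(estimator=model, param_grid=trees_grid, cv=4,
                           scoring='neg_mean_absolute_error', n_jobs=-1,
                           return_train_score=True)
grid_search.fit(X, y)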

As the number of trees used by the model increases, both the training and the testing error decrease. However, the training error decreases much more rapidly than the testing error and we can see that our model is overfitting: it performs very well on the training data, but is not able to achieve that same performance on the testing set.

We always expect at least some decrease in performance on the testing set (after all, the model can see the true answers for the training set), but a significant gap indicates overfitting. We can address overfitting by getting more training data, or decreasing the complexity of our model through the hyperparameters. In this case, we will leave the hyperparameters where they are, but I encourage anyone to try and reduce the overfitting.

For the final model, we will use 800 estimators because that resulted in the lowest error in cross validation. Now, time to test out this model!

Evaluating on the Test Set

As responsible machine learning engineers, we made sure to not let our model see the test set at any point of training. Therefore, we can use the test set performance as an indicator of how well our model would perform when deployed in the real world.

Making predictions on the test set and calculating the performance is relatively straightforward. Here, we compare the performance of the default Gradient Boosted Regressor to the tuned model:
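The default_model and final_model used below are not defined in this excerpt; a sketch consistent with the tuned settings found earlier (800 estimators, as chosen from cross validation):

# Default model with out-of-the-box hyperparameters
default_model = GradientBoostingRegressor(random_state=42)

# Final model with the hyperparameters found by the search
final_model = GradientBoostingRegressor(loss='lad', max_depth=5, max_features=None,
                                        min_samples_leaf=6, min_samples_split=6,
                                        n_estimators=800, random_state=42)

# Train both models on the full training data
default_model.fit(X, y)
final_model.fit(X, y)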

# Make predictions on the test set using default and final model
default_pred = default_model.predict(X_test)
final_pred = final_model.predict(X_test)

print('Default model performance on the test set: MAE = %0.4f.' % np.mean(abs(default_pred - y_test)))
print('Final model performance on the test set:   MAE = %0.4f.' % np.mean(abs(final_pred - y_test)))

Default model performance on the test set: MAE = 10.0118.
Final model performance on the test set:   MAE = 9.0446.

Hyperparameter tuning improved the accuracy of the model by about 10%. Depending on the use case, 10% could be a massive improvement, but it came at a significant time investment!

We can also time how long it takes to train the two models using the %timeit magic command in Jupyter Notebooks. First is the default model:

%%timeit -n 1 -r 5
default_model.fit(X, y)

1.09 s ± 153 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)

1 second to train seems very reasonable. The final tuned model is not so fast:

%%timeit -n 1 -r 5
final_model.fit(X, y)

12.1 s ± 1.33 s per loop (mean ± std. dev. of 5 runs, 1 loop each)

This demonstrates a fundamental aspect of machine learning: it is always a game of trade-offs. We constantly have to balance accuracy vs interpretability, bias vs variance, accuracy vs run time, and so on. The right blend will ultimately depend on the problem. In our case, a 12 times increase in run-time is large in relative terms, but in absolute terms it’s not that significant.

Once we have the final predictions, we can investigate them to see if they exhibit any noticeable skew. On the left is a density plot of the predicted and actual values, and on the right is a histogram of the residuals:

The model predictions seem to follow the distribution of the actual values, although the peak in density occurs closer to the median value on the training set (66) than to the true peak in density (which is near 100). The residuals are nearly normally distributed, although we see a few large negative values where the model predictions were far below the true values.

Conclusions

In this article we covered several steps in the machine learning workflow:

  • Imputing missing values and scaling the features
  • Evaluating and comparing several machine learning models
  • Hyperparameter tuning of the best model using random search with cross validation
  • Evaluating the best model on the testing set

The results of this work showed us that machine learning is applicable to the task of predicting a building’s Energy Star Score using the available data. Using a gradient boosted regressor we were able to predict the scores on the test set to within 9.1 points of the true value. Moreover, we saw that hyperparameter tuning can increase the performance of a model at a significant cost in terms of time invested. This is one of many trade-offs we have to consider when developing a machine learning solution.

A Complete Machine Learning Walk-Through in Python (Part Three): Interpreting a machine learning model and presenting results

As a reminder, we are working through a supervised regression machine learning problem. Using New York City building energy data, we have developed a model which can predict the Energy Star Score of a building. The final model we built is a Gradient Boosted Regressor which is able to predict the Energy Star Score on the test data to within 9.1 points (on a 1–100 scale).

Model Interpretation

The gradient boosted regressor sits somewhere in the middle on the scale of model interpretability: the entire model is complex, but it is made up of hundreds of decision trees, which by themselves are quite understandable. We will look at three ways to understand how our model makes predictions:

  1. Feature importances
  2. Visualizing a single decision tree
  3. LIME: Local Interpretable Model-Agnostic Explanations

The first two methods are specific to ensembles of trees, while the third — as you might have guessed from the name — can be applied to any machine learning model. LIME is a relatively new package and represents an exciting step in the ongoing effort to explain machine learning predictions.

Feature Importances

Feature importances attempt to show the relevance of each feature to the task of predicting the target. The technical details of feature importances are complex (they measure the mean decrease in impurity, or the reduction in error from including the feature), but we can use the relative values to compare which features are the most relevant. In Scikit-Learn, we can extract the feature importances from any ensemble of tree-based learners.

With model as our trained model, we can find the feature importances using model.feature_importances_. Then we can put them into a Pandas DataFrame and display or plot the top ten most important:

import pandas as pd

# model is the trained model
importances = model.feature_importances_

# train_features is the dataframe of training features
feature_list = list(train_features.columns)

# Extract the feature importances into a dataframe
feature_results = pd.DataFrame({'feature': feature_list, 
                                'importance': importances})

# Show the top 10 most important
feature_results = feature_results.sort_values('importance', 
                                              ascending = False).reset_index(drop=True)

feature_results.head(10)

The Site EUI (Energy Use Intensity) and the Weather Normalized Site Electricity Intensity are by far the most important features, together accounting for over 66% of the total importance. After the top two features, the importance drops off significantly, which indicates we might not need to retain all 64 features in the data to achieve high performance. (In the Jupyter notebook, I take a look at using only the top 10 features and discover that the model is not quite as accurate.)

Based on these results, we can finally answer one of our initial questions: the most important indicators of a building’s Energy Star Score are the Site EUI and the Weather Normalized Site Electricity Intensity. While we do want to be careful about reading too much into the feature importances, they are a useful way to start to understand how the model makes its predictions.

Visualizing a Single Decision Tree

While the entire gradient boosting regressor may be difficult to understand, any one individual decision tree is quite intuitive. We can visualize any tree in the ensemble using the Scikit-Learn function export_graphviz (http://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html). We first extract a tree from the ensemble and then save it as a dot file:

from sklearn import tree

# Extract a single tree (number 105)
single_tree = model.estimators_[105][0]

# Save the tree to a dot file
tree.export_graphviz(single_tree, out_file = 'images/tree.dot', 
                     feature_names = feature_list)

Using the Graphviz visualization software we can convert the dot file to a png from the command line:

dot -Tpng images/tree.dot -o images/tree.png

The result is a complete decision tree:

This is a little overwhelming! Even though this tree only has a depth of 6 (the number of layers), it’s difficult to follow. We can modify the call to export_graphviz and limit our tree to a more reasonable depth of 2:
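A sketch of the modified call; max_depth, filled, rounded, and precision are standard export_graphviz options:

# Save a depth-limited, annotated version of the same tree
tree.export_graphviz(single_tree, out_file='images/tree_small.dot',
                     feature_names=feature_list, max_depth=2,
                     filled=True, rounded=True, precision=2)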

Each node (box) in the tree has four pieces of information:

  1. The question asked about the value of one feature of the data point: the answer determines whether the point moves to the left or the right child node
  2. mse: a measure of the error at the node
  3. samples: the number of training examples in the node
  4. value: the node's estimate of the target for all its samples

(Leaf nodes only have 2.–4. because they represent the final estimate and do not have any children).

A decision tree makes a prediction for a data point by starting at the top node, called the root, and working its way down through the tree. At each node, a yes-or-no question is asked of the data point. For example, the question for the node above is: Does the building have a Site EUI less than or equal to 68.95? If the answer is yes, the building is placed in the right child node, and if the answer is no, the building goes to the left child node.

This process is repeated at each layer of the tree until the data point is placed in a leaf node, at the bottom of the tree (the leaf nodes are cropped from the small tree image). The prediction for all the data points in a leaf node is the value. If there are multiple data points ( samples ) in a leaf node, they all get the same prediction. As the depth of the tree is increased, the error on the training set will decrease because there are more leaf nodes and the examples can be more finely divided. However, a tree that is too deep will overfit to the training data and will not be able to generalize to new testing data.

In the second article, we tuned a number of the model hyperparameters, which control aspects of each tree such as the maximum depth of the tree and the minimum number of samples required in a leaf node. These both have a significant impact on the balance of under vs over-fitting, and visualizing a single decision tree allows us to see how these settings work.

Although we cannot examine every tree in the model, looking at one lets us understand how each individual learner makes a prediction. This flowchart-based method seems much like how a human makes decisions, answering one question about a single value at a time. Decision-tree-based ensembles combine the predictions of many individual decision trees in order to create a more accurate model with less variance. Ensembles of trees tend to be very accurate, and also are intuitive to explain.

Local Interpretable Model-Agnostic Explanations (LIME)

The final tool we will explore for trying to understand how our model “thinks” is a new entry into the field of model explanations. LIME aims to explain a single prediction from any machine learning model by creating an approximation of the model locally near the data point using a simple model such as linear regression (the full details can be found in the paper).

Here we will use LIME to examine a prediction the model gets completely wrong to see what it might tell us about why the model makes mistakes.

First we need to find the observation our model gets most wrong. We do this by training and predicting with the model and extracting the example on which the model has the greatest error:

from sklearn.ensemble import GradientBoostingRegressor

# Create the model with the best hyperparameters
model = GradientBoostingRegressor(loss='lad', max_depth=5, max_features=None,
                                  min_samples_leaf=6, min_samples_split=6, 
                                  n_estimators=800, random_state=42)

# Fit and test on the features
model.fit(X, y)
model_pred = model.predict(X_test)

# Find the residuals
residuals = abs(model_pred - y_test)
    
# Extract the most wrong prediction
wrong = X_test[np.argmax(residuals), :]

print('Prediction: %0.4f' % model_pred[np.argmax(residuals)])
print('Actual Value: %0.4f' % y_test[np.argmax(residuals)])
Prediction: 12.8615
Actual Value: 100.0000

Next, we create the LIME explainer object passing it our training data, the mode, the training labels, and the names of the features in our data. Finally, we ask the explainer object to explain the wrong prediction, passing it the observation and the prediction function.

import lime 

# Create a lime explainer object
explainer = lime.lime_tabular.LimeTabularExplainer(training_data = X, 
                                                   mode = 'regression',
                                                   training_labels = y,
                                                   feature_names = feature_list)


# Explanation for wrong prediction
exp = explainer.explain_instance(data_row = wrong, 
                                 predict_fn = model.predict)

# Plot the prediction explanation
exp.as_pyplot_figure();

The plot explaining this prediction is below:

Here’s how to interpret the plot: Each entry on the y-axis indicates one value of a variable and the red and green bars show the effect this value has on the prediction. For example, the top entry says the Site EUI is greater than 95.90 which subtracts about 40 points from the prediction. The second entry says the Weather Normalized Site Electricity Intensity is less than 3.80 which adds about 10 points to the prediction. The final prediction is an intercept term plus the sum of each of these individual contributions.

We can get another look at the same information by calling the explainer's .show_in_notebook() method:

# Show the explanation in the Jupyter Notebook
exp.show_in_notebook()

This shows the reasoning process of the model on the left by displaying the contributions of each variable to the prediction. The table on the right shows the actual values of the variables for the data point.

For this example, the model prediction was about 12 and the actual value was 100! While initially this prediction may be puzzling, looking at the explanation, we can see this was not an extreme guess, but a reasonable estimate given the values for the data point. The Site EUI was relatively high and we would expect the Energy Star Score to be low (because EUI is strongly negatively correlated with the score), a conclusion shared by our model. In this case, the logic was faulty because the building had a perfect score of 100.

It can be frustrating when a model is wrong, but explanations such as these help us to understand why the model is incorrect. Moreover, based on the explanation, we might want to investigate why the building has a perfect score despite such a high Site EUI. Perhaps we can learn something new about the problem that would have escaped us without investigating the model. Tools such as this are not perfect, but they go a long way towards helping us understand the model which in turn can allow us to make better decisions.

Documenting Work and Reporting Results

An often overlooked part of any technical project is documentation and reporting. We can do the best analysis in the world, but if we do not clearly communicate the results, then they will not have any impact!

When we document a data science project, we take all the versions of the data and code and package them so our project can be reproduced or built upon by other data scientists. It’s important to remember that code is read more often than it is written, and we want to make sure our work is understandable both for others and for ourselves if we come back to it a few months later. This means putting helpful comments in the code and explaining our reasoning. I find Jupyter Notebooks to be a great tool for documentation because they allow explanations and code to sit side by side.

Jupyter Notebooks can also be a good platform for communicating findings to others. Using notebook extensions, we can hide the code from our final report, because, although it’s hard to believe, not everyone wants to see a bunch of Python code in a document!

Personally, I struggle with succinctly summarizing my work because I like to go through all the details. However, it’s important to understand your audience when you are presenting and tailor the message accordingly. With that in mind, here is my 30-second takeaway from the project:

  1. Using New York City energy data, it is possible to build a model that predicts the Energy Star Score of a building to within about 9.1 points of the true value
  2. The Site EUI and the Weather Normalized Site Electricity Intensity are the most important factors for predicting the Energy Star Score

Originally, I was given this project as a job-screening “assignment” by a start-up. For the final report, they wanted to see both my work and my conclusions, so I developed a Jupyter Notebook to turn in. However, instead of converting directly to PDF in Jupyter, I converted the notebook to a LaTeX .tex file, edited it in TeXstudio, and then rendered the final PDF version. The default PDF output from Jupyter has a decent appearance, but it can be significantly improved with a few minutes of editing. Moreover, LaTeX is a powerful document preparation system, and it’s good to know the basics.

At the end of the day, our work is only as valuable as the decisions it enables, and being able to present results is a crucial skill. Furthermore, by properly documenting work, we allow others to reproduce our results, give us feedback so we can become better data scientists, and build on our work for the future.

Conclusions

Throughout this series of posts, we’ve walked through a complete end-to-end machine learning project. We started by cleaning the data, moved into model building, and finally looked at how to interpret a machine learning model. As a reminder, the general structure of a machine learning project is below:

  1. Data cleaning and formatting
  2. Exploratory data analysis
  3. Feature engineering and selection
  4. Compare several machine learning models on a performance metric
  5. Perform hyperparameter tuning on the best model
  6. Evaluate the best model on the testing set
  7. Interpret the model results
  8. Draw conclusions and document work

While the exact steps vary by project, and machine learning is often an iterative rather than linear process, this guide should serve you well as you tackle future machine learning projects. I hope this series has given you confidence to be able to implement your own machine learning solutions, but remember, none of us do this by ourselves! If you want any help, there are many incredibly supportive communities where you can look for advice.


*Originally published at https://towardsdatascience.com*

