Nowadays, demand for data scientists and analysts has outpaced supply, despite the surge of people entering the field. To close this gap, we need friendlier machine learning frameworks that non-expert users can work with. Frameworks like TensorFlow and H2O have made it easier for non-experts to experiment with machine learning, but producing high-performing models still requires a fair amount of data science knowledge and background. To help remove this barrier, I would like to introduce a concept called AutoML.

AutoML is the idea of automating the machine learning workflow, including the automatic training and tuning of many models within a user-specified time limit. Ensemble models are then trained automatically on collections of these individual models to produce highly predictive ensembles, which in most cases end up as the top-performing models on the AutoML leaderboard.
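To make the idea concrete, here is a deliberately tiny, stdlib-only sketch of what an AutoML loop does conceptually: fit several candidate models until a time budget runs out, rank them on a validation set, and add an ensemble of the best ones to the leaderboard. Everything in it (`auto_ml`, the three toy model builders, the averaging ensemble) is hypothetical and only for illustration. It is not H2O's implementation, which trains much richer model families and builds stacked ensembles with a metalearner rather than a plain average.

```python
import time
from statistics import mean


# Hypothetical toy "model builders": each takes training data and returns a predict function.
def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    b = my - a * mx
    return lambda x: a * x + b


def fit_nearest(xs, ys):
    """1-nearest-neighbour regression: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


CANDIDATES = {"mean": fit_mean, "linear": fit_line, "1nn": fit_nearest}


def mse(model, xs, ys):
    """Mean squared error of a predict function on a dataset."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def auto_ml(train, valid, max_runtime_secs=1.0, top_k=2):
    """Train candidates within a time budget and return a leaderboard
    of (model_name, validation_mse) pairs, best model first."""
    deadline = time.monotonic() + max_runtime_secs
    models = {}
    for name, fit in CANDIDATES.items():
        if time.monotonic() > deadline:
            break  # stop training once the user-specified time limit is reached
        models[name] = fit(*train)

    # Average the predictions of the top_k individual models (a simplification of stacking).
    ranked = sorted(models, key=lambda n: mse(models[n], *valid))
    best = [models[n] for n in ranked[:top_k]]
    if best:
        models["ensemble"] = lambda x: mean(m(x) for m in best)

    leaderboard = [(name, mse(model, *valid)) for name, model in models.items()]
    return sorted(leaderboard, key=lambda row: row[1])


if __name__ == "__main__":
    train = ([0, 1, 2, 3, 4, 5], [1, 3, 5, 7, 9, 11])  # toy data following y = 2x + 1
    valid = ([6, 7], [13, 15])
    for name, score in auto_ml(train, valid):
        print(f"{name:10s} {score:.3f}")
```

On this toy data the leaderboard lists `linear` first (validation MSE 0.000), then `ensemble` (2.500), `1nn` (10.000) and `mean` (65.000). Here the single best model beats the ensemble; on real, noisy data the ensemble is usually the one that comes out on top.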

One of the frameworks available for this purpose is H2O. It is approachable for non-AI users, and it is also friendly for developers with no previous experience analyzing data or developing models. Before we move further, let me define the goal. We want to train the model in the most common way: in Python, using a Jupyter notebook (which also makes it easier for other scientists to collaborate with you later). Then, on the software engineering side, we want to deliver the model in Java, one of the most widely used languages for large-scale production applications. I chose Java over other languages because it is one of the most popular languages in industry, which keeps this guide relevant to real-world problems. I have already prepared a ready-to-use example so that the guide is easy to follow.

First, we need to run H2O in the Docker image I have prepared for you. The first command below makes sure that no container named h2o already exists on your computer. The second command starts the container and publishes port 54321 (H2O) and port 8888 (Jupyter) to your local machine.

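docker container rm h2o ## removes a leftover container named h2o; an error only means there is none (add -f if it is still running)
docker run -ti --name=h2o -p 54321:54321 -p 8888:8888 adrian3ka/h2o:0.0.1 /bin/bash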

Don't expect anything yet: so far we have only started and entered the container, without running anything inside it. The commands below launch H2O by running its jar directly (-Xmx1g caps the Java heap at 1 GB):

cd /opt
java -Xmx1g -jar h2o.jar

Note that we could also develop the model in the H2O Flow notebook through its web interface, but in this tutorial we will use a Jupyter notebook, as it is the most widely used tool today. You can open H2O Flow at http://localhost:54321, since the command above already published that port with -p 54321:54321.
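Before switching to Jupyter, it can help to confirm from the host that H2O is actually up. The sketch below queries H2O's REST API using only the Python standard library. It assumes that the `/3/Cloud` status endpoint returns JSON containing `cloud_healthy` and `cloud_size` fields; check the H2O REST API reference for your version. The helper names `fetch_cloud_status` and `is_ready` are hypothetical.

```python
import json
import urllib.request

# Assumption: H2O answers GET /3/Cloud with JSON that includes
# "cloud_healthy" and "cloud_size" fields.
H2O_URL = "http://localhost:54321"  # port published with -p 54321:54321


def fetch_cloud_status(base_url=H2O_URL, timeout=5.0):
    """Query the H2O cluster status endpoint and return the decoded JSON."""
    with urllib.request.urlopen(f"{base_url}/3/Cloud", timeout=timeout) as resp:
        return json.load(resp)


def is_ready(status):
    """Treat the cluster as ready when it reports itself healthy with at least one node."""
    return bool(status.get("cloud_healthy")) and status.get("cloud_size", 0) >= 1


if __name__ == "__main__":
    try:
        status = fetch_cloud_status()
    except OSError as err:  # connection refused, timeout, HTTP error, ...
        print(f"H2O is not reachable yet: {err}")
    else:
        print("H2O ready:", is_ready(status), "| version:", status.get("version"))
```

If the call fails with "connection refused", H2O is most likely still starting, or the java command inside the container has not been run yet.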

Now we will launch the Jupyter notebook. First, open another terminal on your host and start a second shell inside the running container:

export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
docker exec -it $H2O_CONTAINER_ID bash

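Inside the container, we will start the Jupyter notebook server. By default Jupyter listens only on localhost, which Docker's published port 8888 cannot reach, so we bind it to the container's IP address instead. Run the commands below; note that the IP lookup has to be done from a terminal on the host: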

cd ~/h2o ## move to the h2o folder prepared for this example
virtualenv h2o_venv

source h2o_venv/bin/activate
## open a new terminal on the host to look up the container's IP address
export H2O_CONTAINER_ID=$(docker ps -aqf "name=h2o")
docker inspect $H2O_CONTAINER_ID | grep "\"IPAddress\"" -m1
## back in the container, use the IP from that output (mine was 172.17.0.3)
jupyter notebook --ip=172.17.0.3 --port=8888 --allow-root
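The `grep ... -m1` trick works, but it depends on the layout of the JSON that `docker inspect` prints. A slightly more robust option, sketched below in Python and run on the host, is to parse that JSON and read the address from `NetworkSettings.Networks`, falling back to the older top-level `NetworkSettings.IPAddress`. The helper names `container_ip` and `lookup_ip` are hypothetical. Docker can also do this by itself with a format template, `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' h2o`, and starting Jupyter with `--ip=0.0.0.0` (listen on every interface in the container) avoids the lookup altogether.

```python
import json
import subprocess


def container_ip(inspect_json):
    """Return the first IP address found in `docker inspect` output, or None.

    Checks each attached network first, then falls back to the legacy
    top-level NetworkSettings.IPAddress field."""
    for container in json.loads(inspect_json):
        settings = container.get("NetworkSettings", {})
        for network in (settings.get("Networks") or {}).values():
            if network.get("IPAddress"):
                return network["IPAddress"]
        if settings.get("IPAddress"):
            return settings["IPAddress"]
    return None


def lookup_ip(name="h2o"):
    """Run `docker inspect <name>` on the host and extract the container IP."""
    out = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    ).stdout
    return container_ip(out)
```

On the host, `print(lookup_ip("h2o"))` should print the same address that the grep command shows, ready to pass to `jupyter notebook --ip=...`.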


End to End Automated Machine Learning Process using AutoML