Deep Learning for Java Developers

Let's see how to do deep learning for Java with Valohai.


Some time ago, I came across a life-cycle management tool (or cloud service) called Valohai, and I was quite impressed by its user interface and the simplicity of its design and layout. I had a good chat about the service with a member of the Valohai team at the time and was given a demo. Before that, I had written a simple pipeline using GNU Parallel, JavaScript, Python, and Bash, and another one purely using GNU Parallel and Bash.

I also thought about replacing the moving parts with ready-to-use task/workflow management tools like Jenkins X, Jenkins Pipeline, Concourse or Airflow, but due to various reasons, I did not proceed with the idea.

Coming back to our original conversation, I noticed a lot of the examples and docs on Valohai were based on Python and R and the respective frameworks and libraries. There was a lack of Java/JVM based examples or docs, so I took this opportunity to do something about that.

I was encouraged by Valohai to implement something using the famous Java library called DL4J - Deep Learning for Java.

Once I understood its design, layout, and workflow, my initial experience with Valohai left a good impression. It was developer-friendly, and its makers had already taken into consideration various facets of both developer and infrastructure workflows. In our world, the latter is mostly run by DevOps or SysOps teams, and we know the nuances and pain points attached to it. You can find out more in the Features section of the site.

What Do We Need and How?

For any machine learning or deep learning project or initiative, two important components (from a high-level perspective) are code that will create and serve the model and infrastructure where this whole lifecycle will be executed.

Of course, there are going to be steps and components needed before, during, and after, but to keep things simple, let’s say we need code and infrastructure.


For code, I have chosen a modified example using DL4J. It is an MNIST project with a training set of 60,000 images and a test set of 10,000 images of hand-written digits. This dataset is available via the DL4J library (just as Keras provides a stock of datasets). Look for the MnistDataSetIterator under DatasetIterators in the DL4J Cheatsheet for further details on this particular dataset.

Have a look at the source code we will be using before getting started. The main Java class is called org.deeplearning4j.feedforward.mnist.MLPMnistSingleLayerRunner.


We have decided to try out the Java example using Valohai as our infrastructure to run our experiments (training and evaluation of the model). Valohai recognizes git repositories and directly hooks into them and allows Execution of our code, irrespective of platform or language — we will see how this works. This also means if you are a strong supporter of GitOps or Infrastructure-As-Code, you will appreciate the workflow.

For this, we just need an account on Valohai. We can use a Free-tier account, which gives access to several instances of various configurations when we sign up. For what we would like to do, the Free-tier is more than enough.

Deep Learning for Java and Valohai

We will bundle the necessary build and run-time dependencies into the Docker image and use it to build our Java app, train a model, and evaluate it on the Valohai platform via a simple valohai.yaml file, which is placed in the root folder of the project repository.

Deep Learning for Java: DL4J

The easy part is, we won’t need to do much here, just build the jar and download the dataset into the Docker container. We have a pre-built Docker image that contains all the dependencies needed to build a Java app. We have pushed this image into Docker Hub, and you can find it by searching for dl4j-mnist-single-layer (we will be using a specific tag as defined in the YAML file). We have chosen to use GraalVM 19.1.1 as our Java build and runtime for this project, and it is embedded into the Docker image (see Dockerfile for the definition of the Docker image).


When the uber jar is invoked from the command-line, we land into the MLPMnistSingleLayerRunner class, which directs us to the intended action depending on the parameters passed in:

public static void main(String[] args) throws Exception {
    MLPMnistSingleLayerRunner mlpMnistRunner = new MLPMnistSingleLayerRunner();
    // ... the remainder of the method is not shown in this excerpt ...
}


The parameters passed into the uber jar are received by this class and handled by the execute() method.

We can create a model via the --action train parameter and evaluate the created model via the --action evaluate parameter respectively passed to the Java app (uber jar).
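As a rough illustration of that dispatch, here is a minimal, dependency-free sketch. The class ActionDispatchSketch and its helper methods are hypothetical and not taken from the project; the real runner may well parse its arguments differently (for example, with a command-line parsing library).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the project's actual runner class.
final class ActionDispatchSketch {

    // Collects "--name value" pairs from the command line into a map.
    static Map<String, String> parseOptions(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected an option name, got: " + args[i]);
            }
            options.put(args[i], args[i + 1]);
        }
        return options;
    }

    // Decides which action was requested; the real work is only described, not performed.
    static String describe(Map<String, String> options) {
        String action = options.get("--action");
        if (action == null) {
            throw new IllegalArgumentException("Missing --action");
        }
        switch (action) {
            case "train":
                return "train model, write to " + options.get("--output-dir");
            case "evaluate":
                return "evaluate model, read from " + options.get("--input-dir");
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }

    public static void main(String[] args) {
        // Fall back to demo arguments so the sketch can be run without any.
        String[] effective = args.length > 0
                ? args
                : new String[] {"--action", "train", "--output-dir", "/valohai/outputs"};
        System.out.println(describe(parseOptions(effective)));
    }
}
```

Rejecting unknown actions early keeps a mistyped parameter from silently doing nothing in a remote Execution.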

The main parts of the Java app that does this work can be found in the two Java classes mentioned in the sections below.

Train a Model

Can be invoked from the command-line via:

./ --action train --output-dir ${VH_OUTPUTS_DIR}


java -Djava.library.path="" \
     -jar target/MLPMnist-1.0.0-bin.jar \
     --action train --output-dir ${VH_OUTPUTS_DIR}

On success, this creates a model named mlpmnist-single-layer.pb at the end of the Execution, in the folder specified by the --output-dir parameter. From the perspective of Valohai, the model should be placed into ${VH_OUTPUTS_DIR}, which is what we do (see the valohai.yaml file).

For source code, see class

Evaluate a Model

Can be invoked from the command-line via:

./ --action evaluate --input-dir ${VH_INPUTS_DIR}/model


java -Djava.library.path="" \
     -jar target/MLPMnist-1.0.0-bin.jar \
     --action evaluate --input-dir ${VH_INPUTS_DIR}/model

This expects a model (created by the training step) named mlpmnist-single-layer.pb to be present in the folder specified by the --input-dir parameter passed in when the app is called.
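The contract between the two steps is really just a shared file name inside a directory. Below is a small sketch of how that lookup could be expressed and checked up front. ModelLocationSketch and its methods are hypothetical names of my own, and the project's classes may handle this differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: illustrates the file-name convention only.
final class ModelLocationSketch {

    // File name taken from the article; training and evaluation agree on it.
    static final String MODEL_FILE_NAME = "mlpmnist-single-layer.pb";

    // Path of the model file inside the given directory.
    static Path modelPathIn(String dir) {
        return Paths.get(dir).resolve(MODEL_FILE_NAME);
    }

    // Fails early, with a readable message, when the model is missing.
    static Path requireModel(String inputDir) {
        Path model = modelPathIn(inputDir);
        if (!Files.isRegularFile(model)) {
            throw new IllegalStateException("No model found at " + model);
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("model-demo");
        // A few bytes stand in for a trained model in this demo.
        Files.write(modelPathIn(dir.toString()), new byte[] {1, 2, 3});
        System.out.println("Found model: " + requireModel(dir.toString()).getFileName());
    }
}
```

Checking for the file before loading it gives a clearer error in the Execution logs than a failure deep inside model deserialization.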

For source code, see class

I hope this short illustration makes it clear how the Java app that trains and evaluates the model works in general.

That’s all that is needed of us, but feel free to play with the rest of the source (along with the bash scripts) and satisfy your curiosity about how this is done!


Valohai allows us to loosely couple our runtime environment, our code, and our dataset, as you can see from the structure of the YAML file below. That way, the different components can evolve independently without impeding or depending on one another. Hence, our Docker container only has the build-time and runtime components packed into it.

At Execution time, we build the uber jar in the Docker container, upload it to some internal or external storage, and then via another Execution step download the uber jar and dataset from storage (or another location) to run the training. This way, the two execution steps are decoupled; we can e.g. build the jar once and run hundreds of training steps on the same jar. As the build and runtime environments should not change that often, we can cache them, and the code, dataset, and model sources can be made dynamically available during Execution time.


The heart of integrating our Java project with the Valohai infrastructure is defining the Execution steps in the valohai.yaml file placed in the root of your project folder. Our valohai.yaml looks like this:


- step:
    name: Build-dl4j-mnist-single-layer-java-app
    image: neomatrix369/dl4j-mnist-single-layer:v0.5
    command:
      - cd ${VH_REPOSITORY_DIR}/examples/cloud-devops-infra/valohai/MLPMnist/
      - ./
      - echo "~~~ Copying the build jar file into ${VH_OUTPUTS_DIR}"
      - cp target/MLPMnist-1.0.0-bin.jar ${VH_OUTPUTS_DIR}/MLPMnist-1.0.0.jar
      - ls -lash ${VH_OUTPUTS_DIR}
    environment: aws-eu-west-1-g2-2xlarge

- step:
    name: Run-dl4j-mnist-single-layer-train-model
    image: neomatrix369/dl4j-mnist-single-layer:v0.5
    command:
      - echo "~~~ Unpack the MNist dataset into ${HOME} folder"
      - tar xvzf ${VH_INPUTS_DIR}/dataset/mlp-mnist-dataset.tgz -C ${HOME}
      - cd ${VH_REPOSITORY_DIR}/examples/cloud-devops-infra/valohai/MLPMnist/
      - echo "~~~ Copying the build jar file from ${VH_INPUTS_DIR} to current location"
      - cp ${VH_INPUTS_DIR}/dl4j-java-app/MLPMnist-1.0.0.jar .
      - echo "~~~ Run the DL4J app to train model based on the the MNist dataset"
      - ./ {parameters}
    inputs:
      - name: dl4j-java-app
        description: DL4J Java app file (jar) generated in the previous step 'Build-dl4j-mnist-single-layer-java-app'
      - name: dataset
        description: MNist dataset needed to train the model
    parameters:
      - name: --action
        pass-as: '--action {v}'
        type: string
        default: train
        description: Action to perform i.e. train or evaluate
      - name: --output-dir
        pass-as: '--output-dir {v}'
        type: string
        default: /valohai/outputs/
        description: Output directory where the model will be created, best to pick the Valohai output directory
    environment: aws-eu-west-1-g2-2xlarge

- step:
    name: Run-dl4j-mnist-single-layer-evaluate-model
    image: neomatrix369/dl4j-mnist-single-layer:v0.5
    command:
      - cd ${VH_REPOSITORY_DIR}/examples/cloud-devops-infra/valohai/MLPMnist/
      - echo "~~~ Copying the build jar file from ${VH_INPUTS_DIR} to current location"
      - cp ${VH_INPUTS_DIR}/dl4j-java-app/MLPMnist-1.0.0.jar .
      - echo "~~~ Run the DL4J app to evaluate the trained MNist model"
      - ./ {parameters}
    inputs:
      - name: dl4j-java-app
        description: DL4J Java app file (jar) generated in the previous step 'Build-dl4j-mnist-single-layer-java-app'
      - name: model
        description: Model file generated in the previous step 'Run-dl4j-mnist-single-layer-train-model'
    parameters:
      - name: --action
        pass-as: '--action {v}'
        type: string
        default: evaluate
        description: Action to perform i.e. train or evaluate
      - name: --input-dir
        pass-as: '--input-dir {v}'
        type: string
        default: /valohai/inputs/model
        description: Input directory where the model created by the previous step can be found created
    environment: aws-eu-west-1-g2-2xlarge

Explanation of Build-dl4j-mnist-single-layer-java-app

From the YAML file, we can see that we define this step by first specifying the Docker image and then running the build script to build the uber jar. Our Docker image has the build environment dependencies set up (i.e. GraalVM JDK, Maven, etc.) to build a Java app. We do not specify any inputs or parameters, as this is the build step. Once the build succeeds, we copy the uber jar called MLPMnist-1.0.0-bin.jar (original name) to the /valohai/outputs folder (represented by ${VH_OUTPUTS_DIR}). Everything within this folder automatically gets persisted within your project's storage, e.g. an AWS S3 bucket. Finally, we define our job to run in the AWS environment.

Note: The Valohai free tier does not have network access from inside the Docker container (this is disabled by default). Please contact support to enable this option (I had to do the same); otherwise, we cannot download our Maven and other dependencies during build time.

Explanation of Run-dl4j-mnist-single-layer-train-model

The semantics of this definition are similar to the previous step, except that we specify two inputs: one for the uber jar (MLPMnist-1.0.0.jar) and the other for the dataset (to be unpacked into the ${HOME}/.deeplearning4j folder). We pass the two parameters --action train and --output-dir /valohai/outputs. The model created by this step is collected into the /valohai/outputs/model folder (represented by ${VH_OUTPUTS_DIR}/model).
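To make the parameter handling more concrete, the sketch below mimics how a pass-as template such as '--action {v}' could be expanded and joined into the text that replaces {parameters} in the command. It only illustrates the behaviour described in the YAML file and is not Valohai's implementation; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration of pass-as expansion; not Valohai's code.
final class PassAsSketch {

    // Replaces the {v} placeholder of one pass-as template with its value.
    static String expand(String passAs, String value) {
        return passAs.replace("{v}", value);
    }

    // Expands every template in declaration order and joins them with spaces.
    static String buildParameters(Map<String, String> valuesByTemplate) {
        StringJoiner joined = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : valuesByTemplate.entrySet()) {
            joined.add(expand(entry.getKey(), entry.getValue()));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the order in which the parameters were declared.
        Map<String, String> training = new LinkedHashMap<>();
        training.put("--action {v}", "train");
        training.put("--output-dir {v}", "/valohai/outputs/");
        System.out.println(buildParameters(training));
        // prints: --action train --output-dir /valohai/outputs/
    }
}
```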

Note: In the Input fields in the Execution tab of the Valohai Web UI, we can select the outputs from previous Executions by using the Execution number, i.e. #1 or #2, in addition to using datum:// or http:// URLs. Typing the first few letters of the file name also helps to search through the whole list.

Explanation of Run-dl4j-mnist-single-layer-evaluate-model

Again, this step is similar to the previous one, except that we pass in the two parameters --action evaluate and --input-dir /valohai/inputs/model. We have also again specified two entries under the inputs: section of the YAML file, called dl4j-java-app and model, with no default set for either of them. This allows us to select, using the web interface, the uber jar and the model we wish to evaluate (the one created by the step Run-dl4j-mnist-single-layer-train-model).

I hope this explains the steps in the above definition file, but if you require further help, please do not hesitate to look at the docs and tutorials.

Valohai Web Interface

Once we have an account, we can sign in and continue with creating a project by the name mlpmnist-single-layer and link the git repo to the project and save the project.

Now you can execute a step and see how it pans out!

Building the DL4J Java App

Go to the Execution tab in the web interface and either copy an existing execution or create a new one using the [Create execution] button. All the necessary default options will be populated. Select Step Build-dl4j-mnist-single-layer-java-app.

For Environment, I would select AWS eu-west-1 g2.2xlarge and click on the [Create execution] button at the bottom of the page to see the Execution kick off.

Training the Model

Go to the Execution tab in the web interface, do the same as in the previous step, and select the step Run-dl4j-mnist-single-layer-train-model. You will have to select the Java app (just type jar in the field) built in the previous step. The dataset has already been pre-populated via the valohai.yaml file.

Click on [Create execution] to kick off this step.

You will see the model summary fly by in the log console:

[<--- snipped --->]
11:17:05 =========================================================================
11:17:05 LayerName (LayerType) nIn,nOut TotalParams ParamsShape
11:17:05 =========================================================================
11:17:05 layer0 (DenseLayer) 784,1000 785000 W:{784,1000}, b:{1,1000}
11:17:05 layer1 (OutputLayer) 1000,10 10010 W:{1000,10}, b:{1,10}
11:17:05 -------------------------------------------------------------------------
11:17:05  Total Parameters: 795010
11:17:05  Trainable Parameters: 795010
11:17:05  Frozen Parameters: 0
11:17:05 =========================================================================
[<--- snipped --->]
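The parameter totals in that summary can be checked by hand: a fully connected layer has one weight per input/output pair plus one bias per output. The following small sketch (with a hypothetical class name) reproduces the numbers from the log.

```java
// Hypothetical helper that reproduces the parameter counts in the model summary.
final class ParameterCountSketch {

    // A dense layer has one weight per input/output pair plus one bias per output.
    static long denseParams(long nIn, long nOut) {
        return nIn * nOut + nOut;
    }

    public static void main(String[] args) {
        long layer0 = denseParams(784, 1000); // 28 x 28 pixel inputs -> 1000 hidden units
        long layer1 = denseParams(1000, 10);  // 1000 hidden units -> 10 digit classes
        System.out.println(layer0 + " + " + layer1 + " = " + (layer0 + layer1));
        // prints: 785000 + 10010 = 795010
    }
}
```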

The models created can be found under the Outputs sub-tab in the Executions main tab, both during and at the end of the Execution.

You might have noticed several artifacts in the Outputs sub-tab. That’s because we save a checkpoint at the end of each epoch! Look out for these in the Execution logs:

[<--- snipped --->]
11:17:14 o.d.o.l.CheckpointListener - Model checkpoint saved: epoch 0, iteration 469, path: /valohai/outputs/
[<--- snipped --->]
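The iteration number in that log line can be sanity-checked with a little arithmetic. The sketch below assumes a mini-batch size of 128, which is not stated in this post; it is simply a value for which the 60,000 training images give 469 mini-batches per epoch, matching the log. The class name is hypothetical.

```java
// Hypothetical helper; the batch size used below is an assumption, not taken from the project.
final class IterationsPerEpochSketch {

    // Number of mini-batches needed to see every example once (ceiling division).
    static int iterationsPerEpoch(int examples, int batchSize) {
        return (examples + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int trainingExamples = 60_000; // MNIST training set size, as mentioned earlier
        int assumedBatchSize = 128;    // assumption: not given in the article
        System.out.println(iterationsPerEpoch(trainingExamples, assumedBatchSize));
        // prints: 469
    }
}
```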

The checkpoint zip contains the state of the model training at that point, saved in three files.


Training the Model > Metadata

You might have noticed these notations fly by in the Execution logs:

[<--- snipped --->]
11:17:05 {"epoch": 0, "iteration": 0, "score (loss function)": 2.410047}
11:17:07 {"epoch": 0, "iteration": 100, "score (loss function)": 0.613774}
11:17:09 {"epoch": 0, "iteration": 200, "score (loss function)": 0.528494}
11:17:11 {"epoch": 0, "iteration": 300, "score (loss function)": 0.400291}
11:17:13 {"epoch": 0, "iteration": 400, "score (loss function)": 0.357800}
11:17:14 o.d.o.l.CheckpointListener - Model checkpoint saved: epoch 0, iteration 469, path: /valohai/outputs/
[<--- snipped --->]

These notations trigger Valohai to pick up these values (in JSON format) and use them to plot Execution metrics, which can be seen during and after the Execution under the Metadata sub-tab in the Executions main tab.
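If you want to emit similar metadata from your own Java code, here is a minimal sketch, independent of DL4J, that formats one such line. The class and method names are hypothetical. It pins the locale to Locale.ROOT so that the score is always written with a decimal point, since a decimal comma would not be valid JSON.

```java
import java.util.Locale;

// Hypothetical helper; only the JSON keys are taken from the log above.
final class MetadataLineSketch {

    // Formats one JSON object per call, using the same keys as the Execution log.
    static String metadataLine(int epoch, int iteration, double score) {
        return String.format(Locale.ROOT,
                "{\"epoch\": %d, \"iteration\": %d, \"score (loss function)\": %f}",
                epoch, iteration, score);
    }

    public static void main(String[] args) {
        System.out.println(metadataLine(0, 100, 0.613774));
        // prints: {"epoch": 0, "iteration": 100, "score (loss function)": 0.613774}
    }
}
```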

We were able to do this by hooking a listener class (called ValohaiMetadataCreator) into the model, such that during training, control is passed to this listener class at the end of each iteration. In this class, we print the epoch count, the iteration count, and the score (the loss function value). Here is a code snippet from the class:

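public void iterationDone(Model model, int iteration, int epoch) {
  // Guard against a zero or negative print interval
  if (printIterations <= 0)
      printIterations = 1;
  // Emit one JSON line every printIterations iterations
  if (iteration % printIterations == 0) {
      double score = model.score();
      System.out.println(String.format(
              "{\"epoch\": %d, \"iteration\": %d, \"score (loss function)\": %f}",
              epoch, iteration, score));
  }
}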


Evaluating the Model

Once the model has been successfully created via the previous step, we are ready to evaluate it. We create a new Execution just like we did previously, but this time, we select the Run-dl4j-mnist-single-layer-evaluate-model step. We will need to select the Java app (MLPMnist-1.0.0.jar) again and the created model (mlpmnist-single-layer.pb) before kicking off the Execution.

After selecting the desired model as input, click on the [Create execution] button. It is a quicker Execution step than the previous one.

The evaluation metrics and the confusion matrix from the post-training model analysis will be displayed in the console logs.

We can see that our training activity has resulted in a model that is nearly 97% accurate on the test dataset. The confusion matrix helps point out the instances where a digit has been incorrectly predicted as another digit. This may be useful feedback for the creator of the model and the maintainer of the dataset to investigate further.
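To see how an accuracy figure relates to a confusion matrix, consider the following sketch. The counts are made up for three classes rather than taken from the actual evaluation output, the class name is hypothetical, and the convention that rows are actual labels and columns are predictions is a choice made for this sketch.

```java
// Hypothetical helper with made-up data; not the project's evaluation code.
final class ConfusionMatrixSketch {

    // Accuracy = correctly classified examples (the diagonal) / all examples.
    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int actual = 0; actual < matrix.length; actual++) {
            for (int predicted = 0; predicted < matrix[actual].length; predicted++) {
                total += matrix[actual][predicted];
                if (actual == predicted) {
                    correct += matrix[actual][predicted];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Made-up counts: rows are actual labels, columns are predicted labels.
        int[][] matrix = {
            {48, 1, 1},
            {2, 47, 1},
            {0, 3, 47}
        };
        System.out.println("accuracy = " + accuracy(matrix));
    }
}
```

Every off-diagonal cell is a misclassification, so the largest off-diagonal values show which pairs of classes the model confuses most often.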

The question remains (and is outside the scope of this post) — how good is the model when faced with real-world data?

It’s easy to install and get started with the CLI tool; see Command-line Usage.

If you haven’t yet cloned the git repository, then here’s what to do:

$ git clone

We then need to link our Valohai project created via the web interface in the above section to the project stored on our local machine (the one we just cloned). Run the below commands to do that:

$ cd mlpmnist-dl4j-example
$ vh project --help   ### to see all the project-specific options we have for Valohai
$ vh project link

You will be shown something like this:

[  1] mlpmnist-single-layer
Which project would you like to link with /path/to/mlpmnist-dl4j-example?
Enter [n] to create a new project.:

Select 1 (or the selection appropriate for you), and you should see this message:

Success! Linked /path/to/mlpmnist-dl4j-example to mlpmnist-single-layer.

The quickest way to know of all the CLI options with the CLI tool is:

$ vh --help

One more thing, before going any further, ensure that your Valohai project is in sync with the latest git project by doing this:

$ vh project fetch

(The same can be done from the web interface, via the icon with two arrows pointing to each other at the top right.)

Now we can execute the steps from the CLI with:

$ vh exec run Build-dl4j-mnist-single-layer-java-app

Once the Execution is running, we can inspect and monitor it via:

$ vh exec info
$ vh exec logs
$ vh exec watch
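If you would rather drive these commands from Java-based tooling than from a shell script, something along the following lines could work. It is a sketch only: it assumes the vh executable is on the PATH and that the project is already linked, it uses only the sub-commands shown above, and the class name is hypothetical. The process is prepared but deliberately not started.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the CLI commands shown above.
final class VhCommandSketch {

    // Builds the argument list for "vh exec run <step>".
    static List<String> runStep(String stepName) {
        return Arrays.asList("vh", "exec", "run", stepName);
    }

    public static void main(String[] args) {
        ProcessBuilder builder =
                new ProcessBuilder(runStep("Build-dl4j-mnist-single-layer-java-app"));
        builder.inheritIO(); // forward the CLI's output to this process's console
        // Calling builder.start() would launch the Execution; it is left out of this sketch.
        System.out.println(String.join(" ", builder.command()));
    }
}
```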

We can also see the above updates via the web interface at the same time.


As you have seen, both DL4J and Valohai, individually or combined, are fairly easy to get started with. Further, we can develop the different components that make up our experiments (i.e. build/runtime environment, code, and dataset) and integrate them into an Execution in a loosely coupled manner.

The template examples used in this post are a good way to get started to build more complex projects. And you can use either the web interface or the CLI to get your job done with Valohai. With the CLI you can also integrate it with your setup and scripts (or even with CRON or CI/CD jobs).

Also, it’s clear that if I’m working on an AI/ML/DL-related project, I don’t need to concern myself with creating and maintaining an end-to-end pipeline (which many others and I have had to do in the past).

Thanks to both Skymind (the startup behind DL4J, for creating and maintaining it and keeping it free) and Valohai for making this tool and cloud service available for both free and commercial use.


Additional DL4J Resources

Loss functions

Originally published by Mani Sarkar  at

