How to setup TensorFlow on Ubuntu

TensorFlow is one of the most popular deep-learning libraries. It was created by Google and was released as an open-source project in 2015. TensorFlow is used for both research and production environments. Installing TensorFlow can be cumbersome. The difficulty varies based on your environment constraints, and more when you’re a data scientist that just wants to build your neural networks.

When using TensorFlow on GPU — setting up requires a few steps. In the following tutorial, we will go over the process required to set up TensorFlow.

Requirements:

NVIDIA GPU MachineDocker and Nvidia-Docker installed. (Read “
How to install docker and nvidia-docker in our blog
)Ubuntu 16.04

Step 1 — Prepare your environment with Docker and Nvidia-Docker

Docker is a tool designed to make it easier to create, deploy, and run applications by using containers. What exactly is a container? Containers allow data scientists and developers to wrap up an environment with all of the parts it needs — such as libraries and other dependencies — and ship it all out in one package.

To use docker with GPUs and to be able to use TensorFlow in your application, you’ll need to install Docker with Nvidia-Docker. If you already have those installed, move to the next step. Otherwise, you can follow our previous guide to installing nvidia docker.

Prerequisites:

Step 2 — Dockerfile

Docker can build images (environments) automatically by reading the instructions from a Dockerfile. A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

In our case, those commands will describe the installation of Python 3.6, CUDA 9 and CUDNN 7.2.1 — and of course the installation of TensorFlow 1.12 from source.

For this environment, we will use the following Dockerfile

FROM nvidia/cuda:9.0-base-ubuntu16.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        cuda-command-line-tools-9-0 \
        cuda-cublas-dev-9-0 \
        cuda-cudart-dev-9-0 \
        cuda-cufft-dev-9-0 \
        cuda-curand-dev-9-0 \
        cuda-cusolver-dev-9-0 \
        cuda-cusparse-dev-9-0 \
        curl \
        git \
        libcudnn7=7.2.1.38-1+cuda9.0 \
        libcudnn7-dev=7.2.1.38-1+cuda9.0 \
	libnccl2=2.4.2-1+cuda9.0 \
	libnccl-dev=2.4.2-1+cuda9.0 \
        libcurl3-dev \
        libfreetype6-dev \
        libhdf5-serial-dev \
        libpng12-dev \
        libzmq3-dev \
        pkg-config \
        rsync \
        software-properties-common \
        unzip \
        zip \
        zlib1g-dev \
        wget \
        && \
    rm -rf /var/lib/apt/lists/* && \
    find /usr/local/cuda-9.0/lib64/ -type f -name 'lib*_static.a' -not -name 'libcudart_static.a' -delete && \
    rm /usr/lib/x86_64-linux-gnu/libcudnn_static_v7.a

# install python 3.6 and pip

RUN apt-get update
RUN apt-get install -y software-properties-common vim
RUN add-apt-repository ppa:jonathonf/python-3.6
RUN apt-get update

RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
RUN apt-get install -y git

RUN apt-get update && \
        apt-get install nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0 && \
        apt-get update && \
        apt-get install libnvinfer4=4.1.2-1+cuda9.0 && \
        apt-get install libnvinfer-dev=4.1.2-1+cuda9.0

RUN python3.6 -m pip install pip --upgrade
RUN python3.6 -m pip install wheel 
RUN python3.6 -m pip install six numpy wheel mock
RUN python3.6 -m pip install keras_applications
RUN python3.6 -m pip install keras_preprocessing

RUN ln -s /usr/bin/python3.6 /usr/bin/python


# Set up Bazel.

# Running bazel inside a `docker build` command causes trouble, cf:
#   https://github.com/bazelbuild/bazel/issues/134
# The easiest solution is to set up a bazelrc file forcing --batch.
RUN echo "startup --batch" >>/etc/bazel.bazelrc
# Similarly, we need to workaround sandboxing issues:
#   https://github.com/bazelbuild/bazel/issues/418
RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
    >>/etc/bazel.bazelrc
# Install the most recent bazel release.
ENV BAZEL_VERSION 0.15.0
WORKDIR /
RUN mkdir /bazel && \
    cd /bazel && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    curl -H "User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36" -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE && \
    chmod +x bazel-*.sh && \
    ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
    cd / && \
    rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh




# Download and build TensorFlow.
WORKDIR /tensorflow
RUN git clone --branch=r1.12 --depth=1 https://github.com/tensorflow/tensorflow.git .

# Configure the build for our CUDA configuration.
ENV CI_BUILD_PYTHON python3.6
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
ENV TF_NEED_CUDA 1
ENV TF_NEED_TENSORRT 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.2,6.0,6.1,7.0
ENV TF_CUDA_VERSION=9.0
ENV TF_CUDNN_VERSION=7

RUN ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH} \
    tensorflow/tools/ci_build/builds/configured GPU \
    bazel build -c opt --copt=-mavx --config=cuda \
	--cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
        tensorflow/tools/pip_package:build_pip_package && \
    rm /usr/local/cuda/lib64/stubs/libcuda.so.1 && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip --no-cache-dir install --upgrade /tmp/pip/tensorflow-*.whl && \
    rm -rf /tmp/pip && \
    rm -rf /root/.cache
# Clean up pip wheel and Bazel cache when done.

WORKDIR /root

# TensorBoard
EXPOSE 6006

Step 3 — Running Dockerfile

To build the image from the Dockerfile, simply run the docker build command. Keep in mind that this build process might take a few hours to complete. We recommend using nohup utility so that if your terminal hangs — it will still run.

$ docker build -t deeplearning -f Dockerfile

This should output the setup process and should end with something similar to:

>> Successfully built deeplearning (= the image ID)

Your image is ready to use. To start the environment, simply type in the below command. But, don’t forget to replace your image id:

$ docker run --runtime=nvidia -it deeplearning /bin/bashStep 4 — Validating TensorFlow & start building!

Validate that TensorFlow is indeed running in your Dockerfile

$ python
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
2019-02-23 07:34:14.592926: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports 
instructions that this TensorFlow binary was not compiled to use: AVX2 FMA2019-02-23 07:34:17.452780: I 
tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA
node read from SysFS had negative value (-1), but there must be at leastone NUMA node, so returning NUMA node zero
2019-02-23 07:34:17.453267: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with 
properties: 
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-02-23 07:34:17.453306: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu
devices: 0
2019-02-23 07:34:17.772969: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect
StreamExecutor with strength 1 edge matrix:
2019-02-23 07:34:17.773032: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0
2019-02-23 07:34:17.773054: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N
2019-02-23 07:34:17.773403: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow
device (/job:localhost/replica:0/task:0/device:GPU:0 with 10757 MB memory)
-> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0,
compute capability: 3.7)
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80,
pci bus id: 0000:00:1e.0, compute capability: 3.7
2019-02-23 07:34:17.774289: I
tensorflow/core/common_runtime/direct_session.cc:307] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: Tesla K80, pci bus id: 0000:0

Congrats! Your new TensorFlow environment is set up and ready to start training, testing and deploying your deep learning models!

Conclusion

Tensorflow has truly disrupted the machine learning world by offering a solution to build production-ready models at scale. But Tensorflow is not always the most user-friendly. It can be difficult to smoothly incorporate into your machine learning pipeline. cnvrg.io data science platform leverages Tensorflow and other open-source tools so that data scientists can focus on the magic — the algorithms. You can find more tutorials on how to easily leverage open-source tools like Tensorflow, How to set up Kubernetes for your machine learning workflows, and How to run Spark on Kubernetes. Finding simple ways to integrate these useful tools will get your models closer to production.