If you have any comments, ideas, critiques, or you just want to say hi, don’t hesitate to send me an email at tarun.
Following the second video about Docker basics, in this video I explain the Docker architecture and the different building blocks of the Docker Engine: the Docker client, the API, and the Docker daemon. I also explain what a Docker registry is, and I finish the video with a demo illustrating how to use Docker Hub.
In this video lesson you will learn:
#docker #docker hub #docker host #docker engine #docker architecture #api
Now that we have understood the basic architecture of Docker in my previous tutorial, “Docker: Understanding Docker Architecture and Components”, let’s learn how to install Docker and run some basic commands.
3. The machine should have at least 2 GB of memory and at least a 2-core CPU.
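Assuming an Ubuntu or Debian host (an assumption; the exact steps depend on your distribution), a minimal sketch of installing Docker via Docker’s convenience script looks like this:

```bash
# Sketch: install Docker on an Ubuntu/Debian host using Docker's convenience script
# (assumes curl is available; on other distributions follow the official install docs).
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Verify the installation
docker --version
```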
The first thing we are going to do is run the **“docker run hello-world”** command.
This command looks for the “hello-world” image locally; if it is not found, it downloads the image from Docker Hub and runs a container from that image.
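A short sketch of that first run and how to confirm what happened afterwards:

```bash
# Run the hello-world container; Docker pulls the image from Docker Hub
# if it is not already present locally.
docker run hello-world

# Confirm the image was downloaded and a container was created from it.
docker images hello-world
docker ps -a --filter ancestor=hello-world
```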
#automation #containerization #docker-container #docker #docker-image
If you have recently come across the world of containers, it’s probably not a bad idea to understand the underlying elements that work together to offer containerisation benefits. But before that, there’s a question that you may ask. What problem do containers solve?
After building an application in a typical development lifecycle, the developer hands it over to the tester. However, since the development and testing environments are different, the code fails to work as expected.
Now, predominantly, there are two solutions to this – either you use a Virtual Machine or a containerised environment such as Docker. In the good old times, organisations used to deploy VMs for running multiple applications.
So, why did they start adopting containerisation over VMs? In this article, we will provide detailed answers to all such questions.
#docker containers #docker engine #docker #docker architecture
By design, Docker containers don’t hold persistent data. Any data you write inside a container’s writable layer is no longer available once the container is removed, and it can be difficult to get the data out of the container if another process needs it.
Also, a container’s writable layer is tightly coupled to the host machine where the container is running. You can’t easily move the data somewhere else.
Docker has two options for containers to store files on the host machine so that the files persist even after the container stops: volumes and bind mounts.
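As a quick sketch of both options (the container names, images, and paths below are illustrative, not tied to any particular project):

```bash
# Named volume: created and managed by Docker (under /var/lib/docker/volumes/ on Linux).
docker volume create app_data
docker run -d --name db -e POSTGRES_PASSWORD=secret \
  -v app_data:/var/lib/postgresql/data postgres

# Bind mount: any directory on the host, mapped into the container by absolute path.
docker run -d --name web -v "$(pwd)/html:/usr/share/nginx/html" nginx
```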
Volumes are stored in a part of the host filesystem that is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Let’s understand them in detail one by one.
#docker-container #docker #docker-volume #containerization
Hello, in this post I will show you how to set up the official Apache Airflow image with PostgreSQL and LocalExecutor using Docker and docker-compose. I won’t be going through what Airflow is and how it is used; please check the official documentation for more information about that.
Before setting up and running Apache Airflow, please install Docker and Docker Compose.
In this chapter, I will show you the files and directories needed to run Airflow, and in the next chapter I will go through them file by file, line by line, explaining what is going on.
Firstly, in the root directory create three more directories: dags, logs, and scripts. Then create the following files: **.env**, **docker-compose.yml**, **entrypoint.sh**, and **dummy_dag.py**. Please make sure those files and directories follow the structure below.
```
#project structure
root/
├── dags/
│   └── dummy_dag.py
├── scripts/
│   └── entrypoint.sh
├── logs/
├── .env
└── docker-compose.yml
```
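If you are starting from an empty folder, one way to create this layout (a sketch, run from the project root) is:

```bash
# Sketch: create the layout above from an empty project root.
mkdir -p dags logs scripts
touch .env docker-compose.yml scripts/entrypoint.sh dags/dummy_dag.py
chmod +x scripts/entrypoint.sh   # the webserver service executes this file directly
```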
Created files should contain the following:
```yaml
#docker-compose.yml
version: '3.8'
services:
  postgres:
    image: postgres
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
  scheduler:
    image: apache/airflow
    command: scheduler
    restart: on-failure
    depends_on:
      - postgres
    env_file:
      - .env
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
  webserver:
    image: apache/airflow
    entrypoint: ./scripts/entrypoint.sh
    restart: on-failure
    depends_on:
      - postgres
      - scheduler
    env_file:
      - .env
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./scripts:/opt/airflow/scripts
    ports:
      - "8080:8080"
```
```bash
#entrypoint.sh
#!/usr/bin/env bash
airflow initdb
airflow webserver
```
```
#.env
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres/airflow
AIRFLOW__CORE__EXECUTOR=LocalExecutor
```
```python
#dummy_dag.py
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from datetime import datetime

with DAG('example_dag', start_date=datetime(2016, 1, 1)) as dag:
    op = DummyOperator(task_id='op')
```
From the root directory, executing “docker-compose up” in the terminal should make Airflow accessible on localhost:8080. The image below shows the final result.
If you encounter permission errors, please run “chmod -R 777” on all subdirectories, e.g. “chmod -R 777 logs/”
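Putting both steps together, a minimal sketch of bringing the stack up looks like this:

```bash
# From the project root:
docker-compose up            # add -d to run the stack in the background
# Airflow UI: http://localhost:8080

# If the scheduler/webserver cannot write into the mounted directories:
chmod -R 777 logs/
```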
For the curious ones...
In layman’s terms, docker is used to manage individual containers, while docker-compose is used to manage multi-container applications. It also moves many of the options you would otherwise pass to docker run into the docker-compose.yml file for easier reuse. It works as a front-end “script” on top of the same Docker API used by docker, so you could do everything docker-compose does with docker commands and a lot of shell scripting.
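For instance, the webserver service defined later in docker-compose.yml corresponds roughly to the docker run invocation below (a hypothetical sketch; Compose also wires up a shared network so the containers can reach each other by service name, which is not reproduced here):

```bash
# Sketch: the webserver service expressed as a single docker run command.
docker run -d \
  --env-file .env \
  -p 8080:8080 \
  -v "$(pwd)/dags:/opt/airflow/dags" \
  -v "$(pwd)/logs:/opt/airflow/logs" \
  -v "$(pwd)/scripts:/opt/airflow/scripts" \
  --entrypoint ./scripts/entrypoint.sh \
  apache/airflow
```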
Before running our multi-container Docker application, docker-compose.yml must be configured. In that file we define the services that will be started by docker-compose up.
The first attribute of docker-compose.yml is version, the Compose file format version. For the most recent file format version and all configuration options, see the official Compose file reference.
The second attribute is services, and all attributes one level below services denote the containers used in our multi-container application: postgres, scheduler, and webserver. Each service has an image attribute which points to the base image used for that service.
For each service, we define the environment variables used inside its container. For postgres they are defined with the environment attribute, but for scheduler and webserver they are defined in the .env file. Because .env is an external file, we must point to it with the env_file attribute.
By opening the .env file we can see the two variables it defines: one sets the executor to be used, and the other holds the database connection string. Each connection string must be defined in the following manner:

`dialect+driver://username:password@host:port/database`

The dialect name is the identifying name of the SQLAlchemy dialect, such as sqlite, mysql, postgresql, oracle, or mssql, and the driver is the name of the DBAPI used to connect to the database, written in all lowercase letters. In our case the connection string is `postgresql+psycopg2://airflow:airflow@postgres/airflow`, where the host postgres is simply the name of the database service defined in docker-compose.yml.
Omitting the port after the host part means we will be using the default Postgres port (5432), which the postgres image exposes in its own Dockerfile.
Every service can define a command which will be run inside its Docker container. If a service needs to execute multiple commands, this can be done by defining a .sh file and pointing to it with the entrypoint attribute. In our case we have entrypoint.sh inside the scripts folder which, once executed, runs airflow initdb and airflow webserver; both are required for Airflow to run properly.
By defining the depends_on attribute, we can express dependencies between services. In our example, the webserver starts only after both the scheduler and postgres have started, and the scheduler itself starts only after postgres has started.
In case a container crashes, we can have it restarted automatically with the restart option; on-failure restarts the container only when it exits with an error. When a stack is deployed to a swarm, the equivalent restart_policy block lives under the deploy key and additionally accepts condition, delay, max_attempts, and window options.
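For reference, a hedged sketch of both forms (the delay, attempt, and window values are arbitrary examples, not taken from this project):

```yaml
# "restart" is honoured by docker-compose; the "deploy" block is honoured
# only when the stack is deployed to a swarm via "docker stack deploy".
services:
  scheduler:
    image: apache/airflow
    command: scheduler
    restart: on-failure
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
```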
Once a service is running, it is served on the container’s defined port. To access that service we need to publish the container’s port to a port on the host machine, which is done with the ports attribute. In our case we publish port 8080 of the container as TCP port 8080 on the host, so the webserver is reachable at localhost:8080. The left-hand side of the colon defines the host machine’s port and the right-hand side defines the container’s port.
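For example (the second mapping is an illustrative variation, not part of this project’s compose file):

```yaml
ports:
  - "8080:8080"            # host port 8080 -> container port 8080
  - "127.0.0.1:9090:8080"  # publish on host port 9090, bound to localhost only
```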
Lastly, the volumes attribute defines shared volumes (directories) between the host file system and the Docker container. Because Airflow’s default working directory is /opt/airflow/, we need to map our designated directories from the project root into the container’s working directory. This is done with the following mappings:
```
#general case for airflow
- ./<our-root-subdir>:/opt/airflow/<our-root-subdir>

#our case
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./scripts:/opt/airflow/scripts
...
```
This way, when the scheduler or webserver writes logs to its logs directory, we can access them from our file system inside the logs directory, and when we add a new DAG to the dags folder it is automatically picked up in the container’s DAG bag, and so on.
Originally published by Ivan Rezic at Towardsdatascience
#docker #how-to #apache-airflow #docker-compose #postgresql