Introduction
A local development and testing environment is essential when building data pipelines. It lets you experiment and innovate freely without worrying about costs or affecting other people's work.
Here is the list of steps needed to create a Docker-based development environment:
1) Set up Docker Desktop (https://docs.docker.com/desktop/install/windows-install/) and sign up for Docker Hub (https://hub.docker.com/signup/).
2) Pull Docker images and create containers. In our case, we set up MariaDB, Jenkins, and Superset.
3) Set up Anaconda locally and create the development environment you need.
4) Set up SQLyog or DBeaver to connect to local and production databases.
5) Create a Docker network between the containers so that they can reach each other, e.g. an application running in Jenkins needs to access the MariaDB container.
6) Install the Jenkins plugins needed to run data pipelines.
7) Configure your source control (e.g. Bitbucket) so that Jenkins can access it.
8) Set up credentials in Jenkins.
9) Open a root shell in the Jenkins container and install Python and the other libraries your applications need.
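Before working through these steps, it can help to confirm that the command-line tools from steps 1 and 3 are available. The sketch below is a minimal Python check, assuming Docker Desktop puts docker and Anaconda puts conda on your PATH (on Windows, conda may only be available inside the Anaconda Prompt). It uses only the standard library plus docker info, which fails when the Docker engine is not running.

```python
import shutil
import subprocess

# Tools that steps 1 and 3 are expected to put on PATH (an assumption:
# "docker" comes with Docker Desktop, "conda" with Anaconda).
REQUIRED_TOOLS = ("docker", "conda")


def check_tools(tools=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def docker_engine_running(timeout=30):
    """Return True if 'docker info' succeeds, i.e. the Docker engine is reachable."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool:<8} {path or 'MISSING'}")
    print("Docker engine running:", docker_engine_running())
```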
Setting Up Docker Containers for Our Local Development Environment
Assuming you have already set up Docker Desktop and signed in to Docker Hub, let's create the applications for our development environment, each in its own Docker container.
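Set Up Superset
The docker-compose-non-dev.yml file ships with the Superset source repository, so clone the repository first:
>> git clone https://github.com/apache/superset.git
>> cd superset
>> docker-compose -f docker-compose-non-dev.yml pull
>> docker-compose -f docker-compose-non-dev.yml up
Once the containers are running, you can open Superset in your browser at http://localhost:8088/login/. The default username and password are admin and admin.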
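Superset can take a while to finish initializing after docker-compose up, so the login page may not respond right away. The following is a minimal sketch of a standard-library polling helper that waits until the login URL above answers with HTTP 200; the timeout and polling interval are arbitrary assumptions, so adjust them to your machine.

```python
import http.client
import time
import urllib.request

# URL from the step above.
SUPERSET_LOGIN_URL = "http://localhost:8088/login/"


def wait_for_http(url, timeout=180.0, interval=2.0):
    """Poll url until it returns HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or an HTTP error status
            # (urllib raises HTTPError, an OSError subclass): not ready yet.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_http(SUPERSET_LOGIN_URL):
        print("Superset is up at", SUPERSET_LOGIN_URL)
    else:
        print("Superset did not respond in time; check 'docker-compose logs'")
```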
Set Up MariaDB
Use the following command to pull the image:
>> docker pull mariadb
When creating a container from the image in Docker Desktop, set the following environment variables:
MARIADB_USER=admin
MARIADB_PASSWORD=admin
MARIADB_ROOT_PASSWORD=123
Map port 3306 on the host to port 3306 in the container (MariaDB's default port).
Reference: https://hub.docker.com/_/mariadb
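If you prefer the command line to Docker Desktop's dialog, the same container can be created with docker run. The sketch below builds that command in Python using the environment variables and port from the step above; the container name mariadb is a hypothetical choice of ours, not something the image requires.

```python
import subprocess

# Values from the step above.
MARIADB_ENV = {
    "MARIADB_USER": "admin",
    "MARIADB_PASSWORD": "admin",
    "MARIADB_ROOT_PASSWORD": "123",
}


def mariadb_run_command(name="mariadb", env=MARIADB_ENV, host_port=3306, image="mariadb"):
    """Build the 'docker run' argument list equivalent to the Docker Desktop setup."""
    cmd = ["docker", "run", "--detach", "--name", name]  # "mariadb" name is hypothetical
    for key, value in env.items():
        cmd += ["--env", f"{key}={value}"]
    # Publish the container's MariaDB port (3306) on the chosen host port.
    cmd += ["--publish", f"{host_port}:3306", image]
    return cmd


if __name__ == "__main__":
    subprocess.run(mariadb_run_command(), check=True)
```

Run from a shell, this is the same as docker run --detach --name mariadb --env MARIADB_USER=admin --env MARIADB_PASSWORD=admin --env MARIADB_ROOT_PASSWORD=123 --publish 3306:3306 mariadb.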
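Set Up Jenkins
Reference: jenkins – Official Image | Docker Hub
Use the following command to pull the LTS image:
>> docker pull jenkins/jenkins:lts
Start a Jenkins container from Docker Desktop and map a host port to the Jenkins web UI (port 8080 inside the container). Then copy the initial admin password from the container log, use it to unlock Jenkins in the browser, and create your admin user.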
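The same Jenkins setup can be scripted. This sketch assumes the port and volume layout documented for the jenkins/jenkins image (web UI on 8080, inbound agents on 50000, data under /var/jenkins_home) and reads the one-time password from /var/jenkins_home/secrets/initialAdminPassword instead of scanning the log. The container name jenkins and the volume name jenkins_home are hypothetical choices.

```python
import subprocess
import time

# Hypothetical names: "jenkins" for the container and "jenkins_home" for a
# named volume that keeps jobs and configuration when the container is recreated.
JENKINS_CONTAINER = "jenkins"
PASSWORD_FILE = "/var/jenkins_home/secrets/initialAdminPassword"


def jenkins_run_command(name=JENKINS_CONTAINER, ui_port=8080, volume="jenkins_home"):
    """Build a 'docker run' command for the jenkins/jenkins:lts image."""
    return [
        "docker", "run", "--detach", "--name", name,
        "--publish", f"{ui_port}:8080",    # Jenkins web UI
        "--publish", "50000:50000",        # port used by inbound agents
        "--volume", f"{volume}:/var/jenkins_home",
        "jenkins/jenkins:lts",
    ]


def initial_password_command(name=JENKINS_CONTAINER):
    """Build a command that prints the one-time admin password."""
    return ["docker", "exec", name, "cat", PASSWORD_FILE]


if __name__ == "__main__":
    subprocess.run(jenkins_run_command(), check=True)
    # The password file only appears once Jenkins has finished its first start,
    # so retry for up to about two minutes.
    for _ in range(24):
        result = subprocess.run(initial_password_command(), capture_output=True, text=True)
        if result.returncode == 0:
            print("Initial admin password:", result.stdout.strip())
            break
        time.sleep(5)
    else:
        print(f"Password file not found yet; try: docker logs {JENKINS_CONTAINER}")
```

Giving the container a fixed name has a side benefit: commands such as docker exec -it -u 0 jenkins /bin/bash work without first looking up the container ID.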
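Setting Up Python 3 and Other Components Needed to Run Your Python Application in the Jenkins Docker Container
First, get the ID of the Jenkins container:
>> docker ps
Suppose the ID is 3d6525b8cb3a. Next, open a shell inside the container as root (user ID 0) and install the packages:
>> docker exec -it -u 0 3d6525b8cb3a /bin/bash
# apt-get update
# apt-get install -y python3
# apt-get install -y python3-pip python3-dev
# apt-get install -y libmariadb-dev build-essential pkg-config
# pip3 install mysqlclient
mysqlclient is compiled from source during installation, so besides libmariadb-dev it needs the Python headers (python3-dev), a C compiler (build-essential), and pkg-config. On newer Debian-based Jenkins images, pip3 may refuse a system-wide install with an "externally-managed-environment" error; in that case, run apt-get install -y python3-mysqldb instead, which provides the same MySQLdb module.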
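To confirm that Python can reach MariaDB from inside the Jenkins container, a small smoke-test script like the one below can be run with python3. It is a sketch: the environment variable names DB_HOST, DB_PORT, DB_USER and DB_PASSWORD are hypothetical, the defaults reuse the admin/admin account from the MariaDB setup, and the default host name mariadb assumes the MariaDB container has that name and shares a Docker network with Jenkins, which the networking section below sets up.

```python
import os

try:
    import MySQLdb  # module provided by the mysqlclient package
except ImportError:
    MySQLdb = None

# Hypothetical variable names; defaults mirror the MariaDB setup above.
DEFAULTS = {"DB_HOST": "mariadb", "DB_PORT": "3306", "DB_USER": "admin", "DB_PASSWORD": "admin"}


def load_settings(environ=None):
    """Merge connection settings from the environment over the defaults."""
    environ = os.environ if environ is None else environ
    merged = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "host": merged["DB_HOST"],
        "port": int(merged["DB_PORT"]),
        "user": merged["DB_USER"],
        "password": merged["DB_PASSWORD"],
    }


def check_connection(settings):
    """Connect to MariaDB and return the server version string."""
    if MySQLdb is None:
        raise RuntimeError("MySQLdb is not importable; is mysqlclient installed?")
    conn = MySQLdb.connect(**settings)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT VERSION()")
        return cursor.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("MariaDB server version:", check_connection(load_settings()))
```

If the script fails to resolve the host mariadb, the two containers are most likely not yet on a shared network.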
Setting Up a Network for Accessing Another Docker Container
We have created multiple Docker containers that need to talk to each other, e.g. our Python application running in Jenkins needs to access MariaDB running in another container. From the host, we usually reach a container through localhost and a published port. Inside a container, however, localhost refers to that container itself, not to the Docker host or to other containers, so one container cannot reach another via localhost even when both run on the same Docker host. So how do we get containers to talk to each other in this scenario?
It's easy. If you have two or more running containers (web1 and web2 in this example), run the following commands:
>> docker ps
>> docker network ls
>> docker network create myNetwork
>> docker network connect myNetwork web1
>> docker network connect myNetwork web2
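Now web1 and web2 can connect to each other in either direction. On a user-defined network like this one, Docker's built-in DNS also lets containers reach each other by container name (e.g. web2). To use the internal network IP addresses instead, find them by running: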
>> docker network inspect myNetwork
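If you want to script this lookup, the JSON printed by docker network inspect can be parsed directly. The sketch below assumes the usual output shape: a JSON array of networks whose Containers object maps container IDs to entries with Name and IPv4Address (in CIDR form such as 172.18.0.2/16). Keep in mind that these addresses can change when a container is recreated, which is why addressing containers by name, supported by Docker's DNS on user-defined networks, is often more robust.

```python
import json
import subprocess


def container_ips(inspect_json):
    """Map container name -> IPv4 address from 'docker network inspect' output."""
    ips = {}
    for network in json.loads(inspect_json):  # the output is a JSON array of networks
        for info in (network.get("Containers") or {}).values():
            # IPv4Address is in CIDR form, e.g. "172.18.0.2/16"; drop the prefix length.
            ips[info["Name"]] = info["IPv4Address"].split("/")[0]
    return ips


def inspect_network(name="myNetwork"):
    """Run 'docker network inspect' and return its raw JSON output."""
    result = subprocess.run(
        ["docker", "network", "inspect", name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    for container, ip in container_ips(inspect_network()).items():
        print(f"{container}: {ip}")
```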
Installing Jenkins Plugins
Before creating a Jenkins pipeline, you need to install several plugins to manage pipelines, access Bitbucket, and manage production credentials within Jenkins.
Install the Bitbucket plugin in Jenkins
Log in to the Jenkins server and go to Manage Jenkins.
Then click Manage Plugins.
Search for the Bitbucket plugin, tick its checkbox, and click the Install without restart button.
Once the Bitbucket plugin is installed, the next step is to add Bitbucket credentials to Jenkins.
Why? Because I am using a private Bitbucket repo. In real-world scenarios, we almost always use private repos for building software applications. If the repo is public, no credentials are needed.
Conclusion
In this tutorial, I have walked through the whole process of creating a local development environment using Docker. It is a lengthy and complicated process, but an essential one for any developer working on business intelligence, data engineering, or data science projects. We intentionally left out many details, since each of the steps above could be a complete tutorial on its own. The remaining details are covered by the official documentation of each tool and by the many tutorials available online.