A Dockerized Data Science Environment
Machine learning, computer vision, and AI are popular topics. The tools used to explore them often include Anaconda and Python, along with a host of packages such as Keras, TensorFlow, and OpenCV, to name just a few. Having explored many topics over the years, each with its own tools and development environments, I have loaded up more than one computer with tools that were never used again once the exploration was complete. Perhaps you have done the same. Unwinding these environments is difficult at best. I tend not to even try, and so I pay the penalty of temporary environments becoming permanent.
Containers are also a popular topic today. They promise to run workloads without polluting the host environment with the tools and applications those workloads require. Docker is a popular containerization technology, and Anaconda is a popular tool among data science professionals, so I set about trying to find a containerized Anaconda image. I had the following goals for my container:
- Run Anaconda with Python3
- Display Images on my MacBook Pro
Spoiler Alert: I was successful. Below is a screenshot of a container terminal running a Python program that displays an image in a window on my MacBook Pro.
My first action was to go to Docker Hub to find what surely must be a popular Docker image. I assumed someone had already created it. After a significant search effort, I did not find what I was looking for on Docker Hub, and Google did not bring satisfactory results either. To be fair, I found many Anaconda images on Docker Hub, but they lacked what was needed to display images in the host environment.
After more Google searches, I decided to construct my own Docker image. I started with an image from Docker Hub that provided the basics of running Anaconda with Python3. I then searched for how to display Dockerized GUI applications. The recommendations involved running an X11 server on the MacBook Pro and running the X11 client within the Docker container. Sounds simple, doesn’t it?
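To make the X11 arrangement concrete, here is a minimal sketch of how a container could be pointed at an X server running on the Mac. It is not the exact setup from my repository. It assumes an X server such as XQuartz is running on the host and accepting network connections, that Docker Desktop for Mac resolves the name host.docker.internal to the host, and that Jupyter listens on its default port 8888. The image name my-anaconda-x11 and the helper build_run_command are hypothetical, used only for illustration.

```python
import shlex


def build_run_command(image, display="host.docker.internal:0", jupyter_port=8888):
    """Build a `docker run` argument list for a GUI-capable container.

    Assumptions (not taken from the original setup):
      - an X11 server on the host accepts connections at `display`
      - the image contains the X11 client libraries the GUI code needs
      - Jupyter inside the container listens on `jupyter_port`
    """
    return [
        "docker", "run", "--rm", "-it",
        # Tell X11 clients inside the container where the host's X server is.
        "-e", f"DISPLAY={display}",
        # Publish the Jupyter port so the notebook is reachable from the host browser.
        "-p", f"{jupyter_port}:{jupyter_port}",
        image,
    ]


if __name__ == "__main__":
    # "my-anaconda-x11" is a hypothetical image name.
    cmd = build_run_command("my-anaconda-x11")
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

The key piece is the DISPLAY environment variable: any program in the container that opens a window sends its drawing requests to that address, so the window appears on the host even though the code runs in the container.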
Here is a screenshot of the same program running within a web-based Jupyter Python3 terminal. Again, Jupyter is running in the container.
After a not insignificant number of hours and many stops and starts, I successfully created the desired environment. The end result is a Docker container running Anaconda that displays images on my MacBook Pro. This setup should pave the way for many hours of data science exploration without polluting my MacBook Pro with various temporary environments. If you also want such an environment, take a look at my GitHub and try it out. If you have corrections, updates, or enhancements to the environment, let me know so that I can improve the solution.