Making Your Analysis Reproducible

Slides

You can view and download the slides in a variety of formats by clicking on the image below.


Introduction to Docker

In the video below I introduce you to the concept of containerisation using Docker. Docker allows us to package our software (e.g., analysis code and data files) together with its dependencies in containers that can then be run on any machine with Docker installed, whether that machine runs macOS, Windows, or Linux, and the container should behave the same way on each. Docker containers are used widely in software development, production, and data science. In data science they play an important role because they let us containerise our analysis scripts and data files with a particular version of R or Python, for example, along with particular versions of its libraries and packages. The container can then be run on any other machine, or on our own machine at some point in the future, simply by spinning it up. This allows our analysis to be fully reproducible.
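To make the idea of pinning versions concrete, here is a minimal sketch in Python that writes out the text of a Dockerfile for a Python-based analysis. The helper `render_dockerfile`, the base image `python:3.11-slim`, the package pins, and the script name `analysis.py` are all illustrative assumptions rather than anything used in the videos. The helper refuses untagged images and unpinned packages, since either could silently change version the next time the image is built.

```python
def render_dockerfile(base_image: str, requirements: list[str], script: str) -> str:
    """Return Dockerfile text that pins the interpreter and package versions.

    Hypothetical helper: the base image, pins and script name are illustrative.
    """
    # The tag after ':' (e.g. "3.11-slim") fixes the interpreter version; an untagged
    # or ":latest" image can change underneath us, so reject it.
    name = base_image.rsplit("/", 1)[-1]
    if ":" not in name or name.endswith(":latest"):
        raise ValueError(f"base image needs an explicit version tag: {base_image!r}")
    # Each package needs an exact '==' pin so a later rebuild installs the same versions.
    unpinned = [req for req in requirements if "==" not in req]
    if unpinned:
        raise ValueError(f"unpinned requirements: {unpinned}")
    lines = [
        f"FROM {base_image}",
        "WORKDIR /analysis",
        f"RUN pip install --no-cache-dir {' '.join(requirements)}",
        f"COPY {script} /analysis/{script}",
        f'CMD ["python", "/analysis/{script}"]',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the Dockerfile rather than writing it, so nothing on disk is changed.
    print(render_dockerfile("python:3.11-slim", ["pandas==2.1.0", "matplotlib==3.8.0"], "analysis.py"))
```

Saving the output as `Dockerfile` next to `analysis.py` and running `docker build -t my-analysis:1.0 .` in that folder would then build the image (the tag `my-analysis:1.0` is hypothetical). Note that a tag such as `3.11-slim` can still receive patch updates; pinning the base image by its `@sha256:` digest is stricter.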

There’s also a good official introduction to Docker here.

Although slightly more computer-science focused, the following two videos provide another good introduction to Docker and to Docker Compose (a tool for defining and running applications made up of multiple containers).

Docker on Windows

In the videos below I take you through Docker on Windows. Using PowerShell, I cover running Docker containers, linking Docker containers with your local directories and files, and writing Dockerfiles to build new Docker images. Note that if you encounter an error along the lines of “cannot enable hyper-v service” when you try to run Docker, you will need to enable virtualisation in your BIOS and turn on Hyper-V in Windows.

Part One

Part Two
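Linking a container to a local folder is done with a bind mount. As a minimal sketch, the Python below assembles the `docker run` arguments for one; passing them as a list avoids the quoting differences between PowerShell and the macOS Terminal. The helper names `bind_mount_run_args` and `run_container`, the image `python:3.11-slim`, and the container folder `/analysis` are illustrative assumptions, not taken from the videos.

```python
import subprocess
from pathlib import Path


def bind_mount_run_args(image: str, host_dir: str, container_dir: str, command: list[str]) -> list[str]:
    """Build a `docker run` argument list that mounts host_dir into the container.

    Hypothetical helper; the image, paths and command are illustrative.
    """
    # Docker expects an absolute host path; resolve() turns e.g. "." into one.
    source = Path(host_dir).resolve()
    return [
        "docker", "run",
        "--rm",  # delete the container once the command exits
        "--mount", f"type=bind,source={source},target={container_dir}",
        "--workdir", container_dir,  # start the command inside the mounted folder
        image,
        *command,
    ]


def run_container(args: list[str]) -> None:
    # Requires Docker to be installed and running; raises CalledProcessError on failure.
    subprocess.run(args, check=True)


if __name__ == "__main__":
    # Only print the command here, so the sketch runs even without Docker.
    print(bind_mount_run_args("python:3.11-slim", ".", "/analysis", ["python", "analysis.py"]))
```

Because the folder is mounted rather than copied, any output files the analysis writes to `/analysis` appear in your local folder after the container exits. Calling `run_container` with the list actually starts the container.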

Docker on macOS

In the videos below I take you through Docker on macOS. Using the Terminal, I cover running Docker containers, linking Docker containers with your local directories and files, and writing Dockerfiles to build new Docker images.

Part One

Part Two
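The build-then-run cycle can also be scripted. The sketch below is illustrative only: it assumes a Dockerfile already sits in the given folder, uses a hypothetical tag `my-analysis:1.0`, and the helpers `build_args`, `run_args` and `build_and_run` are names made up for this example. It only calls Docker if `shutil.which` finds it on the PATH. The same commands work from PowerShell on Windows.

```python
import shutil
import subprocess


def build_args(tag: str, context_dir: str) -> list[str]:
    # `docker build -t TAG DIR` builds an image from the Dockerfile in DIR.
    return ["docker", "build", "-t", tag, context_dir]


def run_args(tag: str) -> list[str]:
    # --rm removes the container when it stops; the image's CMD supplies the command.
    return ["docker", "run", "--rm", tag]


def build_and_run(tag: str, context_dir: str = ".") -> bool:
    """Build then run an image; return False without doing anything if Docker is missing."""
    if shutil.which("docker") is None:
        return False
    subprocess.run(build_args(tag, context_dir), check=True)
    subprocess.run(run_args(tag), check=True)
    return True


if __name__ == "__main__":
    # Show the two commands rather than executing them.
    print(build_args("my-analysis:1.0", "."))
    print(run_args("my-analysis:1.0"))
```

Giving the tag an explicit version, rather than relying on the default `latest`, makes it easier to record exactly which image an analysis was run in.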