Author: Simon Bennett, Contributors: Jorge Guerra, Bikram Gupta
Docker is an extremely popular tool in the developer community: according to the Stack Overflow 2022 Developer Survey, it is the most widely used tool among professional developers. Several factors have contributed to Docker’s popularity, including that it provides a simple and efficient way to package and deploy applications. Docker also has a large and active community of users, which has helped drive innovation and adoption. In this post, we’ll walk through what data you should back up when using Docker and how to use SnapShooter for Docker backups.
When running an application in Docker, there are three types of data that need to be protected:
- Application modifications
- Application configuration
- Application data (e.g., runtime logs, user data, databases)
Application modifications are changes made to the container’s filesystem at runtime. Protecting this type of data typically means saving the whole container as an image (for example, with docker commit) and pushing it to a registry. However, you can avoid backing up whole images by configuring the application to store its data in volumes, which can be backed up on their own.
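As a rough illustration of the image-based approach, the sketch below builds the two Docker CLI calls involved: docker commit to capture the container's filesystem as an image, and docker push to send it to a registry. The container name, registry host, and tag are hypothetical placeholders, and the function only constructs the commands rather than running them.

```python
import shlex


def snapshot_commands(container: str, image_ref: str) -> list[list[str]]:
    """Build the docker CLI calls that save a container as an image and push it.

    Note: `docker commit` does not include data stored in mounted volumes,
    which is one reason volumes need their own backup.
    """
    return [
        ["docker", "commit", container, image_ref],
        ["docker", "push", image_ref],
    ]


if __name__ == "__main__":
    # "web" and the registry/tag below are assumed example values.
    for cmd in snapshot_commands("web", "registry.example.com/acme/web:backup-1"):
        print(shlex.join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True) on a host where Docker is installed and you are logged in to the registry.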
Application configuration describes how your specific application is set up. For example, when deploying an NGINX or a WordPress application, you need to configure the application when it starts. When running Docker on a single host, the recommended practice for production deployment is Docker Compose, which lets you define the application configuration in a YAML file. You can keep the Docker Compose configuration in a Git repo and deploy it onto any host; you get an identical application whenever you run Docker Compose. However, runtime data does not travel with the configuration, so it is lost when you switch hosts and run Docker Compose again.
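To make this concrete, here is a minimal sketch of a Compose configuration for an NGINX service, held as text in a small Python helper that writes the file and builds the command to start it. The service name, port mapping, and volume name are assumptions chosen for illustration; a real application would have its own.

```python
from pathlib import Path

# A small, assumed Compose file: one NGINX service whose content lives in a
# named volume, so the data is kept separate from the container.
COMPOSE_YAML = """\
services:
  web:
    image: nginx:stable
    ports:
      - "8080:80"
    volumes:
      - web_content:/usr/share/nginx/html

volumes:
  web_content:
"""


def write_compose_file(directory: Path) -> Path:
    """Write the Compose configuration into `directory` and return its path."""
    path = directory / "compose.yaml"
    path.write_text(COMPOSE_YAML)
    return path


def compose_up_command(compose_file: Path) -> list[str]:
    """Build the CLI call that starts the application in the background."""
    return ["docker", "compose", "-f", str(compose_file), "up", "-d"]


if __name__ == "__main__":
    print(" ".join(compose_up_command(Path("compose.yaml"))))
```

The file itself is what you would commit to Git. The contents of the web_content volume are not part of that file, which is why they need a separate backup.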
For storing application data, there are several options. Refer to the diagram below from the Docker documentation.
- Bind mounts are used to mount a specific file or directory on the host into a container. This option provides the most flexibility and control over the storage, but can be less portable and secure than other options.
- Docker volumes are a more portable and secure way to manage storage for containers. A Docker volume is a directory that Docker itself creates and manages. This option provides better isolation and control over storage, and makes it easier to manage and migrate data across hosts.
- Finally, tmpfs mounts allow a container to create a temporary file system in memory. This option is useful for storing data that is only needed temporarily, and can help improve performance by avoiding disk I/O.
Out of these options, named volumes are considered a best practice for managing storage in Docker. Named volumes are persistent, shareable, and easy to manage across different hosts. They also provide a clear separation between data and containers, making it easier to manage and migrate data independently of the containers themselves.
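The three storage options map onto the --mount flag of docker run. The sketch below builds that flag for each type; the image, paths, and volume name are placeholder values, and the helper function names are our own rather than part of any Docker library.

```python
def bind_mount(host_path: str, container_path: str, read_only: bool = False) -> list[str]:
    """Mount a specific host file or directory into the container."""
    spec = f"type=bind,source={host_path},target={container_path}"
    if read_only:
        spec += ",readonly"
    return ["--mount", spec]


def named_volume(volume: str, container_path: str) -> list[str]:
    """Mount a Docker-managed named volume into the container."""
    return ["--mount", f"type=volume,source={volume},target={container_path}"]


def tmpfs_mount(container_path: str) -> list[str]:
    """Mount an in-memory filesystem; its contents vanish when the container stops."""
    return ["--mount", f"type=tmpfs,destination={container_path}"]


def run_command(image: str, mounts: list[list[str]]) -> list[str]:
    """Assemble a detached `docker run` call with the given mounts."""
    cmd = ["docker", "run", "-d"]
    for mount in mounts:
        cmd += mount
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Assumed example: site content in a named volume, scratch space in memory.
    print(" ".join(run_command("nginx:stable", [
        named_volume("web_content", "/usr/share/nginx/html"),
        tmpfs_mount("/tmp/cache"),
    ])))
```

Of the three, only the named volume and the bind mount hold data that outlives the container, so those are the ones a backup needs to cover.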
We recommend backing up Docker volumes to build a robust data protection strategy. Docker Desktop provides an extension for backing up and restoring a volume. Volume backup is not the only scenario to consider, though. The main ones are:
- Back up and restore a named volume.
- Back up and restore the application data when a volume backup alone may not be sufficient. For example, if you run a database like Postgres on Docker and use a named volume to store the data, the recommended practice is to use Postgres’s own backup tooling (such as pg_dump). A volume backup taken in the middle of a write can capture inconsistent data, which may lead to data loss on restore.
- Back up and restore application data when you do not have access to the host. On some platforms (e.g., Heroku, Fly.io), you may need to bundle a backup agent into the container image itself.
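The first two scenarios can be sketched with plain Docker CLI calls. The example below is a sketch under stated assumptions: it archives a named volume by mounting it read-only into a throwaway alpine container and running tar, and it takes an application-level Postgres backup by running pg_dump inside the database container. The volume, container, user, and database names are hypothetical, and the functions only build the commands.

```python
def volume_backup_command(volume: str, backup_dir: str, archive_name: str) -> list[str]:
    """Archive a named volume into `backup_dir` using a temporary helper container.

    `backup_dir` should be an absolute host path; otherwise Docker treats the
    value as a volume name instead of a bind mount.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",    # the volume to back up, read-only
        "-v", f"{backup_dir}:/backup",  # host directory receiving the archive
        "alpine",
        "tar", "czf", f"/backup/{archive_name}", "-C", "/data", ".",
    ]


def postgres_dump_command(container: str, user: str, database: str) -> list[str]:
    """Run pg_dump inside the container; the custom-format dump goes to stdout."""
    return ["docker", "exec", container, "pg_dump", "-U", user, "-Fc", database]


if __name__ == "__main__":
    # All names below are assumed example values.
    print(" ".join(volume_backup_command("web_content", "/var/backups", "web_content.tar.gz")))
    print(" ".join(postgres_dump_command("db", "postgres", "app")))
```

Because pg_dump writes to stdout here, a caller would redirect it to a file, for example with subprocess.run(cmd, stdout=open("app.dump", "wb"), check=True). Unlike the tar archive, the dump is a consistent snapshot even while the database is accepting writes.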
In SnapShooter, we have recently added support for comprehensive backup/restore for Docker-based applications.
- You can now create backup jobs for Docker volumes. Just point to the host, then select the volumes you want to back up and a backup policy. When restoring, restore the volume first, and then start your Docker containers (e.g., with Docker Compose). That way, the application uses the restored volume.
- We have expanded the scope of database backups to cover both hosts and containers. If you are running Postgres, MySQL, or MongoDB in Docker, you can now back up and restore your database using the recommended database backup tools. When using the recipe, make sure to enable the container option.
- The SnapShooter agent is in early preview. The agent lets you run backups for applications behind a NAT gateway (e.g., AWS or GCP VMs) and can simplify firewall configuration. The agent can also be bundled into an image that you deploy to hosted container services, such as Fargate, Fly.io, or DigitalOcean App Platform, for custom backup needs.
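The restore order described above (volume first, containers second) can be illustrated with generic Docker CLI calls. This is not how SnapShooter performs a restore internally; it is a manual equivalent, assuming the volume was archived as a gzipped tarball and that the volume name matches the one Compose actually uses (Compose normally prefixes volume names with the project name). All names are hypothetical.

```python
def restore_plan(volume: str, backup_dir: str, archive_name: str,
                 compose_file: str) -> list[list[str]]:
    """Return the commands to run, in order: restore the volume, then start the app."""
    return [
        # Step 1: unpack the archive into the volume via a temporary container.
        # Docker creates the named volume if it does not exist yet.
        [
            "docker", "run", "--rm",
            "-v", f"{volume}:/data",
            "-v", f"{backup_dir}:/backup:ro",
            "alpine",
            "tar", "xzf", f"/backup/{archive_name}", "-C", "/data",
        ],
        # Step 2: start the containers so they pick up the restored volume.
        ["docker", "compose", "-f", compose_file, "up", "-d"],
    ]


if __name__ == "__main__":
    # "myapp_web_content" assumes a Compose project named "myapp".
    for cmd in restore_plan("myapp_web_content", "/var/backups",
                            "web_content.tar.gz", "compose.yaml"):
        print(" ".join(cmd))
```

Running the steps in the opposite order would let the application start against an empty volume and possibly initialize fresh data before the restore overwrites it.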