Docker & Containerization. How much is too much?

Do you need to dockerize your projects? Use this framework to decide.

The motivation for Docker

There is an infamous meme that explains the motivation behind Docker better than any tutorial out there.

The naive explanation of Docker's purpose would be: it isolates and replicates the environment, dependencies, and configuration across the development-to-deployment cycle on different machines, at layers such as the OS, network interfaces, dependency managers, and so on.

But here’s a more troubling question that does not have as many answers on the internet.

Well, I know why Docker. But why exactly Docker?

That was an honest question for me, because solutions to isolation and replication existed before Docker. Virtual machines, Linux containers built on kernel namespaces (essentially what Docker is), and NPM were the most popular ways to address this.

NPM notably solved dependency inconsistencies when collaborating on projects. Having a package.json meant your friend's machine would have essentially the same dependencies. That solved a lot of "works on my machine" issues related to dependencies.
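The word "essentially" matters here: a package.json usually lists version ranges rather than exact versions, and the exact resolved versions are pinned separately in package-lock.json. The sketch below shows how an npm caret range such as `^1.2.3` is commonly interpreted. It is a simplified, hypothetical helper (`satisfiesCaret` and `parseVersion` are illustrative names, not npm's API) that ignores pre-release tags and every other range syntax.

```typescript
// Hypothetical helper: a simplified reading of npm caret ranges.
// Ignores pre-release tags, build metadata, and other range forms.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, 0 if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// "^X.Y.Z" allows updates that keep the left-most non-zero part fixed:
//   ^1.2.3 -> >=1.2.3 <2.0.0
//   ^0.2.3 -> >=0.2.3 <0.3.0
//   ^0.0.3 -> >=0.0.3 <0.0.4
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`Not a caret range: ${range}`);
  const lower = parseVersion(range.slice(1));
  const [major, minor, patch] = lower;
  const upper: Version =
    major > 0 ? [major + 1, 0, 0] : minor > 0 ? [0, minor + 1, 0] : [0, 0, patch + 1];
  const v = parseVersion(version);
  return compare(v, lower) >= 0 && compare(v, upper) < 0;
}

console.log(satisfiesCaret("1.7.2", "^1.2.3")); // true: same major version
console.log(satisfiesCaret("2.0.0", "^1.2.3")); // false: major version bump
```

So two machines installing from the same package.json on different days can end up with 1.2.3 on one and 1.7.2 on the other; committing the lockfile is what closes that gap.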

The NodeJS runtime, together with NPM, solved environment isolation. Unlike Python and pip, NPM installs dependencies into a project-local node_modules folder and NodeJS resolves them from there, so they are scoped to that project only. You no longer had to install dependencies like axios globally, as you would earlier and still do when working with Python and pip (I am aware of tools that solve this for Python as well). This again solved a lot of replication issues, for NodeJS craftsmen at least. (This is why you need to run 'npm i' each time you clone a Node/npm project, whereas with Python/pip you would install a package once, since pip installs packages into your global namespace by default.)
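The project scoping comes from how Node resolves a bare import such as `axios`: it looks in a `node_modules` folder next to the importing file, then in each parent directory up to the filesystem root (Node's module documentation describes this under "Loading from node_modules folders"). The sketch below lists those candidate folders. It is a hypothetical helper (`nodeModulesPaths` is an illustrative name), handles POSIX absolute paths only, and leaves out global fallbacks such as `NODE_PATH`.

```typescript
// Hypothetical sketch of the folders Node searches for a bare import
// like "axios", nearest first. POSIX absolute paths only; global
// fallbacks (NODE_PATH, ~/.node_modules, ...) are intentionally omitted.
function nodeModulesPaths(startDir: string): string[] {
  const parts = startDir.split("/").filter((p) => p.length > 0);
  const dirs: string[] = [];
  // Walk from the deepest directory up to the root.
  for (let i = parts.length; i >= 0; i--) {
    // Never look for node_modules/node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const base = "/" + parts.slice(0, i).join("/");
    dirs.push(base === "/" ? "/node_modules" : `${base}/node_modules`);
  }
  return dirs;
}

console.log(nodeModulesPaths("/home/ada/project/src"));
```

Because the nearest `node_modules` wins, each project's own folder shadows anything further up the tree, so two projects can use different versions of axios side by side.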

So, why Docker when we had these solutions already? What 'extra' is Docker solving? The answer comes down to another question: what degree of isolation or virtualization would suffice?

Software Engineering is all about trade-offs at the end of the day!

The degree of “virtualization”

To understand the purpose of Docker, it helps to ask two questions:

  1. How much isolation and virtualization do containers provide compared with virtual machines and other solutions like Node and NPM? Here we need to think in terms of extent. Yes, IMO, this is a spectrum.

  2. What ‘degree’ of isolation would we need to fulfill our task?

Answering these questions will help you decide whether you need Docker at all. One can learn backend or frontend engineering entirely without knowing a grain about Docker or containerization. It is not necessary to dockerize everything you build! It has to be an engineering tradeoff decision, and making it well is incumbent upon a good software engineer.

I would need Docker if my answer to the second question were:

I require enough isolation that:

a. My source code is not dependent on file system nuances of my Linux machine that would conflict with FAT32 on the new intern's Windows machine.

b. My source code is not dependent on network configurations and port translations from my other projects.

c. My source code is not dependent on input/output queues specific to the OS, nor on listeners, acceptors, or TCP connection-handling strategies specific to user configurations.

and so on.
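To make point (a) concrete: Linux file systems are usually case-sensitive, while FAT32 and NTFS on Windows are case-insensitive by default, and Windows rejects characters such as `:` and `?` in file names. The hypothetical helper below (`checkPortability` is an illustrative name) flags repository paths that work on Linux but would collide or be invalid on such a machine. It is a sketch under those assumptions, not an exhaustive check: reserved names like `CON` and path-length limits are ignored, and `toLowerCase` only approximates Windows case folding.

```typescript
// Hypothetical helper: flag paths that are fine on a case-sensitive Linux
// file system but break on Windows (FAT32/NTFS defaults).
// Not exhaustive: reserved names (CON, NUL, ...) and length limits are ignored.
const WINDOWS_FORBIDDEN = /[<>:"\\|?*\x00-\x1f]/;

interface PortabilityReport {
  caseCollisions: string[][]; // groups of paths that differ only by case
  invalidOnWindows: string[]; // paths containing characters Windows rejects
}

function checkPortability(paths: string[]): PortabilityReport {
  const byLowerCase = new Map<string, string[]>();
  const invalidOnWindows: string[] = [];

  for (const p of paths) {
    const key = p.toLowerCase();
    byLowerCase.set(key, [...(byLowerCase.get(key) ?? []), p]);
    // "/" is the separator, so test each path segment on its own.
    if (p.split("/").some((segment) => WINDOWS_FORBIDDEN.test(segment))) {
      invalidOnWindows.push(p);
    }
  }

  const caseCollisions = Array.from(byLowerCase.values()).filter((group) => group.length > 1);
  return { caseCollisions, invalidOnWindows };
}

console.log(checkPortability(["src/Utils.ts", "src/utils.ts", "notes/10:30.md", "src/index.ts"]));
```

Running a check like this in CI is one non-Docker mitigation; a container instead gives every developer the same Linux file system for the files baked into its image.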

These are real nuances that let code run on your machine but fail on other machines.

Now you need to answer: does your code make system calls or work with network primitives? If not, then NPM + NodeJS would suffice for you, as it does in 90% of cases (not a measured number).

NPM provides a lean, classic way of keeping your dependencies and environment isolated, and that fulfills most use cases.

We need to judge how much isolation is enough to avoid inconsistencies when working with others. That’s a tradeoff call. It is as simple as that.

VMs provide the highest level of isolation. Containers isolate file systems, networks, and processes, which suffices for most projects. NPM isolates dependencies and the runtime environment, which is enough for most personal projects and MVPs.
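The spectrum above can be written down as a small decision helper. This is a hypothetical encoding of this article's framework, not a standard tool: the requirement names (`needsSeparateKernel` and so on) are illustrative, and the rule is simply "pick the least isolation that still covers every requirement".

```typescript
// Hypothetical encoding of the isolation spectrum:
// NPM (dependencies + runtime) < containers (+ file system, network,
// processes, configuration) < VMs (+ a separate OS kernel).
type Isolation = "npm" | "container" | "vm";

interface Requirements {
  needsSeparateKernel: boolean;      // e.g. a different OS or kernel version
  needsFileSystemIsolation: boolean; // point (a)
  needsNetworkIsolation: boolean;    // point (b)
  needsOsLevelIsolation: boolean;    // point (c): I/O queues, sockets, etc.
  needsSameConfiguration: boolean;   // configuration, not just dependencies
}

// Return the least isolation that still satisfies every requirement.
function chooseIsolation(req: Requirements): Isolation {
  if (req.needsSeparateKernel) return "vm";
  if (
    req.needsFileSystemIsolation ||
    req.needsNetworkIsolation ||
    req.needsOsLevelIsolation ||
    req.needsSameConfiguration
  ) {
    return "container";
  }
  return "npm";
}

// A personal project that only needs consistent dependencies:
console.log(
  chooseIsolation({
    needsSeparateKernel: false,
    needsFileSystemIsolation: false,
    needsNetworkIsolation: false,
    needsOsLevelIsolation: false,
    needsSameConfiguration: false,
  }),
); // npm
```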

💡 NPM solves dependency consistency across machines but still does not solve configuration consistency.
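One way to narrow that gap without Docker is to read configuration from environment variables and fail fast when something is missing or invalid, instead of relying on whatever happens to be set on a given machine. The sketch below is hypothetical (`loadConfig`, `PORT`, and `DATABASE_URL` are illustrative names); in a NodeJS app you would pass `process.env`. Reading the port from configuration rather than hard-coding it also eases the port clashes from point (b).

```typescript
// Hypothetical config loader: validate configuration up front instead of
// letting each machine's environment silently differ.
interface AppConfig {
  port: number;
  databaseUrl: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  // Default the port, but reject values that are not valid TCP ports.
  const port = Number(env.PORT ?? "3000");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // There is no sensible default for the database, so fail fast.
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error("DATABASE_URL is not set");
  }
  return { port, databaseUrl };
}

// In a NodeJS app: const config = loadConfig(process.env);
console.log(loadConfig({ PORT: "8080", DATABASE_URL: "postgres://localhost/dev" }));
```

With Docker, the same values can be declared alongside the image or passed at run time (for example with `docker run -e PORT=8080`), which is the configuration consistency NPM alone does not provide.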

The following chart helps navigate the tradeoffs (you can imagine another column for NPM and color-code it as well):

credit - DevOps Directive (courses.devopsdirective.com/docker-beginner..)

VMs vs. Containers

I won’t elaborate much on this, but it helps you decide whether you need a VM during development or deployment. This video from IBM explains it better.

credit - IBM (youtu.be/cjXI-yxqGTI)

VMs

  • Virtualization of the bare-metal hardware

  • The hypervisor is the controller/manager

  • Isolating the OS

  • Can't record definition specs and replicate accurately.

Containers

  • Virtualization of the environment & OS specifics (the host kernel is shared)

  • The Docker daemon is the controller/manager

  • Isolating the process, network, and so on.

  • Can record the definition spec and replicate accurately.

In conclusion, Docker's significance lies in its ability to address nuanced issues of isolation and replication across the development and deployment cycle in diverse environments. While existing solutions such as virtual machines, Linux namespace containers, and NPM each address part of the problem, Docker provides a level of isolation and virtualization that can be tailored to varying project requirements.