
Docker Demystified - with (me) Dan Clarke
Duration: 33m27s
Release date: 18/09/2022
This episode was a solo episode where I try my best to "demystify" Docker! A difficult task over an audio podcast, so hopefully it's digestible! I cover the following topics: what are containers and images, container registries, use cases for containers, interacting with Docker, Docker Compose, volumes, building your own images, image layer caching, multi-stage Dockerfiles, container orchestrators, and using Docker for local development. For a full list of show notes, or to add comments - please see the website ...
Hey everyone, welcome to the Unhandled Exception podcast. I'm Dan Clarke and this is episode
number 43. And today is going to be another solo episode and this time I'm going to try
to demystify Docker. Now I'm going to cover an awful lot in this episode from the basics
of what a container is and then building upon that explaining about images, how you build
your own, how containers are run in production, what Docker compose is and also various use
cases for containers. Now if you're new to these concepts, I'm aware this will be an
awful lot to take in and might require listening back a few times. I'd recommend listening
through once then having a play around with Docker locally and trying out a few of the
Docker commands we talk about, then come back and re-listen and slowly these concepts should
hopefully start to click. It's definitely worth the investment learning Docker though,
there are so many different use cases and hopefully this episode will help you start
that journey.
Ok Docker, so let's start by explaining
some of the different terms. Let's start with containers, what are containers? So in
one way you can think of them as very lightweight virtual machines, you spin them up, they have
a file system, you can open a shell in the container and run commands
on it. There are Linux and Windows variants and you can create snapshot images from them
and we'll talk more about these images in this episode. However there are quite a few
differences from VMs too. And one big one which I see a lot of people get confused
about initially is that a container usually revolves around just one process. So when
a container starts it has this main process that gets started which is typically the thing
you're running, for example a website or an API or a console application. And when
that process ends, whether that be the app just completing cleanly after finishing its
work or an unhandled exception gets thrown, however that process ends then the container
will also end. The lifetime of the container is the lifetime of that process. So imagine
you have an architecture with a web front end, two different backend APIs, maybe a database,
a Redis cache, and say you wanted to run all of those things in containers. Now
personally I would put the databases in PaaS, the cloud platform-as-a-service offerings,
which are managed and not run in containers, but let's just say you wanted to run all of
these things in containers then each of those things would have its own separate container
so the website would have its own. The two different APIs would have their own individual
containers whereas with a VM you might have multiple things running on that same VM. Containers
are just about one thing. That's one thing I see people get confused about, they think
you can put more than one thing in a container, whereas a container is about one thing, be it
one of your APIs or whatever. A particular web app goes in a container; if you've got
a console app that's listening to messages off a message queue, that would be its own
container. So each service or whatever it is has its own container. Another difference
from VMs is that containers are super quick to spin up. A VM might take a minute or two
to spin up before it's usable. Containers typically take seconds. So hopefully that
gives you an idea of what a container is for. I'm not going to go into the implementation
details here. Windows and Linux containers are implemented in quite different ways but
that's out of the scope of this episode and you don't really need to know how they're
implemented to use them. But if you're interested I'd recommend reading up on Linux cgroups
and namespaces. It's all quite interesting how Linux leverages them to make containers
possible. But again this is out of scope of this episode.
So what about an image? A container is a running instance. And an image is a snapshot
that sits on the file system. And you can create containers from images. And you can
create multiple containers from an image. So an image is a fixed thing on the file
system. This is how you share your images, how you can use third party images, you can
build your own. And then containers are instances of an image. As I say you can build your own
and if you want to share it with other people you typically push your image to what's called
a container registry, so that other people, including your hosting environments, can then
pull those images down and create containers from them. So images play a big role in deploying
and sharing your application or service. So again an image is a snapshot, it's not
a running instance. A container is an instance of an image. I'll dig a bit more into images
shortly once we've covered a few other things first. So let's talk a bit about these container
registries I've just mentioned, the ones that, if you build your own image, you can push those
images to a container registry. These are pretty much just hosting providers for images.
You can create your own private container registries. Most cloud providers have managed private container
registries that you can spin up. The most well known though, and also the default container
registry is called Docker Hub. And if you go to hub.docker.com you can browse all the
images there. And at the time of writing you'll have nearly 9 million images to choose
from. And if you have Docker installed you can run them by literally just typing docker
run followed by the image name. And doing that will automatically pull the image down
to your machine and create a container from it. It's really insane how powerful it is
and how easy it is to use. So if I have a brand new machine that only has Docker installed and
I want to run, for example, MySQL, I can literally just type docker run mysql. So maybe some examples
of common images. Redis, MySQL, SQL server, RabbitMQ, Elasticsearch and obviously nearly
9 million other things. So there's a lot of stuff you can just spin up very, very easily.
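To make that concrete, spinning these up is literally a one-liner each. A rough sketch (in practice you'd usually add flags like -d to run detached and -p to publish ports, which we'll get to later, and some images, MySQL for example, also expect environment variables like a root password before they'll start):

docker run redis
docker run rabbitmq
docker run mysql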
So hopefully you're getting an idea of why you might want to use them. But here's some
more reasons. Consistency. Once you build your image, and again I'll talk more about
building images in a bit, it's the same image, the same file system inside the image for
each container you create from that image. So me spinning up a container locally is
the same as a colleague spinning up a container from the same image and is the same as the container
being spun up in production or any other environment. They're the same binaries.
Another advantage is that dependencies don't need to be installed. So for example I mentioned
before about spinning up SQL server or MySQL, I can spin up those containers, I can use
them locally and I don't have to actually install those things. I also find this useful
in CI CD pipelines like Azure DevOps or GitHub Actions. To build my app, the only thing
the build agent needs to have installed is Docker. It doesn't need to have the latest
.NET SDK or the latest node or whatever my app uses because those are baked into the
images. Locally, I think the biggest use case for me, as I mentioned before, is
if I need to spin up a database or something. I don't have SQL server installed anymore
on my machine or RabbitMQ or any Azure storage emulators. I use Docker for them all. For
example, in one application I work on I use all of those things. I literally just
run docker compose up and all three of those services spin up in seconds. A new dev
joins the team, clones the repository from Git, runs docker compose up, and they have those running
as well. So it's great for onboarding too. And then at the end of the day I can run
Docker compose down and they're gone. And I'll talk more about Docker compose in a
bit. Another brilliant use case is for integration tests. You can run third party services as
I mentioned before, for example the database using Docker compose up. Then you can run
your integration tests against them. So instead of using things like in-memory Entity Framework
implementations, you can very easily run your integration tests against your real code
talking to a real database that you've spun up with Docker compose up. When I'm starting
work at the start of a day on a project, I would do a docker compose up, it spins up all those
things, then I can run my integration tests whenever I feel like it. Then at the end
of the day I'll do a docker compose down. And in CI/CD, on Azure DevOps or GitHub
Actions, they can do exactly the same thing. This has transformed the way I write tests, to be honest.
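As a sketch of that day-to-day flow (assuming a .NET project with a compose file in the repo; the test command is just illustrative):

docker compose up -d    # start the database and other dependencies in the background
dotnet test             # integration tests talk to those real services
docker compose down     # tear it all down at the end of the day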
So next let's talk about how you interact with Docker. And the primary
method is via the command line using the Docker CLI. There are GUIs available. For
example when you install Docker Desktop, that comes with a GUI. But I'd recommend that
everyone at least feel comfortable with the Docker CLI. So let's talk about some of
the common commands you might want to use. One might be Docker pull. Now you might not
actually use this regularly because some of the other commands do this implicitly. But
I thought this one was worth mentioning first. And this just pulls down the image from the
container registry to the machine you're running Docker pull on. So this image is now
cached ready to use. And by use I mean create containers from it. Another command,
which you'll use more often, is Docker run. Now this is what will create a new container
from the specified image. And if the image doesn't already exist locally, as in it's not
already cached locally, then it'll automatically try and pull that image before running it.
So if for example I had a completely fresh machine with just Docker installed and I ran
docker run rabbitmq, then it would automatically pull that image down from Docker hub and create
a container from it. Some other commands: Docker stop. This stops a running container.
Docker start. This starts a stopped container. So you can start and stop them. Another one
Docker ps. This lists all the running containers. And you can add -a to tell it to include
stopped containers in that list. Docker exec. This is a good one because this allows you
to get a shell, a command line inside a running container. So if it's a Linux container,
you'll probably get a bash shell where you can type what you would normally run in a bash shell.
Docker commit. So you probably won't use this very often, but you can use this to create a new image
from a running container. As I say, you'll probably rarely use this. But it's worth remembering
that whilst you normally take an existing image and create containers from it,
you can create images from containers too. In fact, that's actually what's happening behind
the scenes when you do a Docker build, which we'll talk about shortly.
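To put shapes to those commands, here's a rough cheat sheet (the container and image names are just placeholders):

docker pull redis                   # download the image so it's cached locally
docker run --name mycache redis     # create and start a container from the image
docker ps                           # list running containers
docker ps -a                        # include stopped containers too
docker stop mycache                 # stop the running container
docker start mycache                # start it again
docker exec -it mycache bash        # open a shell inside the running container
docker commit mycache mysnapshot    # rarely needed: snapshot a container into a new image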
So I mentioned Docker Compose. Each of those commands I've just spoken about
has all sorts of different arguments. For example, when you do docker run,
you'll probably also need to pass in the -p argument to open up a port. P stands for publish.
However, I personally prefer to think of it as standing for port. It's easier to remember,
P for port. On the command line, when typing docker run, having to type all of those arguments
each time, you have to remember them, it can be a bit of a pain, lots of typing. And what if you
want to spin up multiple containers, like I mentioned before, that I quite often would spin up
RabbitMQ, SQL Server, the Azure Storage Emulator, a few different things. Now, this is where
Docker compose comes in, because it allows you to have a YAML file, which you can put in
source control. And in that YAML file, you can specify multiple services, which will create
containers from each. And the arguments you would have specified within Docker run, you can
specify these in the YAML file. So it just means that this is source controlled. Once you have
that YAML file, you can just type Docker compose up. Instead of typing Docker run individually,
that'll spin up everything that's in the Docker compose file. So just to take a step back and put
that into context, imagine you have a docker-compose.yaml file in your project, it contains
SQL Server, RabbitMQ, Redis, perhaps even a few of your own microservices. A brand new developer
starts on the team with a brand new machine that only has Docker installed. They then clone
the source code from Git, and run Docker compose up. Now they have running locally SQL Server,
RabbitMQ, Redis, and various of your own microservices, running locally, just like that.
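For illustration, a docker-compose.yaml along those lines might look roughly like this (image tags, ports and the password are just example values, not a recommendation):

services:
  sqlserver:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "Your_password123"
    ports:
      - "1433:1433"
  rabbitmq:
    image: rabbitmq:3-management
    ports:
      - "5672:5672"
      - "15672:15672"
  redis:
    image: redis:7
    ports:
      - "6379:6379"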
It's pretty powerful. And you can use Docker compose in production, but I've never done this. I
personally prefer to use a container orchestration tool like Kubernetes, which I'll talk about a
little bit in a bit. But I personally just use Docker compose for local stuff, like I've already
mentioned. It's worth pointing out at this point that containers are transient. So if you spin up
a database in Docker, the data files are stored inside that container, which obviously isn't what
you want for a database. So when creating a container, you can specify something called a
volume, which is basically a virtual file or directory in the container pointing to a
directory or file outside of the container, so it gets persisted. To the stuff running inside
the container, at a very simplistic level, it looks like a symbolic link. When you specify a volume
in the Docker run command or the Docker compose file, that volume can literally just point to a
file on your host machine or directory, or it can be more complicated than that; it can point to
cloud storage somewhere. So if you do want to have persisted data but still use containers,
you would lean towards volumes.
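As a hedged example, a volume pointing at a host directory might look like this, either on the command line or in a compose file (the host path and the container path are just placeholders):

docker run -v /home/me/sqldata:/var/opt/mssql mcr.microsoft.com/mssql/server

services:
  sqlserver:
    image: mcr.microsoft.com/mssql/server
    volumes:
      - ./sqldata:/var/opt/mssql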
Right, so let's talk a bit more about images. As already said, images
are fixed snapshots that you create containers from. But there's a bit more to it than that.
Images consist of layers. So when you build your own docker image, you always start by specifying
a base image to build from. This could be a container OS like Ubuntu, or it could be any other
image. For example, if you're building a dotnet application, you'll probably base your image off
one of the Microsoft dotnet images. If you're building a Node app, you'll probably base it off
the Node image. And those images in turn will be built on top of other images, like Ubuntu,
Debian, that kind of thing. And layers are actually even more granular than that. But we'll
talk about building your own images with docker files very, very shortly.
One key point to note is that each of these layers is cached. So for example, if I pull
two of my images from a container registry, maybe API one and API two, and they're both built on
top of a dotnet runtime base image, then docker's not going to try and pull that base image down twice.
So you find that because of all this caching, when running containers from images, when building
your own images, unless it's the first time you've done it, then it's going to be pretty quick.
Okay, so let's dig into docker files, which is how you build your own images.
And these layers that I spoke about might make a little bit more sense as we talk about these
docker files. And you'll see how powerful this caching really is. So a docker file is just a
text file with a series of commands. You then build the image by running docker build on the
command line against this docker file. The docker file always starts with a line using the from
keyword. So the first line will say FROM followed by some base image name. And this is what I
meant earlier about base images: you always build your image on top of an existing
image. And when you do docker build, docker will automatically pull down this base image if you
don't already have it cached. Then behind the scenes, when you're doing the build, docker will
create a temporary intermediate container from that base image. You don't actually see this
happening, but it's useful to understand this. And the following commands in the docker file,
it'll be running on top of that intermediate container. Each line in the docker file
starts with a keyword. We've already spoken about the initial FROM; another common keyword
is run. And this will run any command in this intermediate container. So if we're building
a Linux image, run would typically be followed by a bash command, maybe apt get install to install
a dependency. Or if you're building a dotnet app, dotnet publish. Maybe you can even run your
tests inside your docker file before doing your dotnet publish with run dotnet test. So you can
see you can just run these arbitrary commands. Now, as I mentioned, this run is happening
inside your intermediate container. Every one of these commands will then create a new layer.
So effectively creating a snapshot image at each command. And that layer will be cached.
So each of these steps in your docker file will be cached. So the next time you do docker build,
unless it's been invalidated because something's changed, then it won't have to run that command
again. It'll just use the cached version. Another common command you'll see in your docker
file is called copy. This allows you to copy files from your host machine into this intermediate
container, which is obviously required so that you can copy your source code into it so you can
build it with, for example, the dotnet build command. And the last command you would typically have
in your docker file would be entry point. So remember earlier that I said that a container
is typically just about one process. When the container starts, that process starts.
When that process ends, the container ends. This entry point command specifies what that process
will be. So if you're building a dotnet app, then the dotnet publish command will create the
binaries, the compiled binaries, in the usual place, the bin/Release/publish folder, as you would typically
see if you weren't using docker. And this entry point command would just specify a path to that.
Okay, so technically that's not quite true for dotnet, because dotnet is just-in-time
compiled at runtime, and, certainly from .NET Core onwards, you run dotnet apps via
the dotnet command line tool. So in this case, that single process is actually dotnet,
the dotnet command line tool, but you just pass in your compiled DLL as an argument to that,
the same way as if you weren't using docker. So the entrypoint command might look like
ENTRYPOINT dotnet bin/Release/net6.0/publish/MyApi.dll.
I guess using dotnet as an example makes this harder to explain. If you're using a C++ natively
compiled app, then the entry point would just reference your compiled exe file. I hope that made
sense. So let's just take a step back and put this together. Imagine a docker file with four lines
of code. I'm obviously greatly simplifying this here, but line one: FROM some base image.
Let's call this the dotnet SDK base image. Line two: COPY . . That's going to copy
everything from your host machine into the intermediate container, so the docker build
has your source code. Next line: RUN dotnet publish. Obviously, that's just going to run
the command dotnet publish as if you were running it on the command line anyway.
Then the last line: ENTRYPOINT dotnet bin/Release/net6.0/publish/MyApi.dll.
So hopefully you get an idea. I know this is hard to explain over audio, but hopefully you get an
idea of the different steps of building an image. You specify your base image, you copy files
into it that the build context can use. You would run some commands on it. So commands could be
installing stuff that the image needs to have. It could be compilation.
It can be whatever you want. It could be running tests.
And then you end with an entrypoint, specifying what the entrypoint process of the image
you've just built is going to be when you run a container from that image.
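Written out, that simplified four-line Dockerfile might look roughly like this (the image tag and the MyApi project name are illustrative, not a drop-in recipe):

# Base image: the .NET SDK, so we can build
FROM mcr.microsoft.com/dotnet/sdk:6.0
# Copy everything from the build context into the intermediate container
COPY . .
# Compile and publish the app (you could also RUN dotnet test here first)
RUN dotnet publish -c Release
# The single process the container will run when it starts
ENTRYPOINT ["dotnet", "bin/Release/net6.0/publish/MyApi.dll"]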
Right. So if you're still with me, you probably want to pause and just
have a deep breath. I know I do. We are getting there. Another thing that's worth mentioning though
is something called the docker build context. When I said that the
copy command copies from your host computer where you run the build,
that's not entirely true. When you run docker build, you specify a path,
which is normally just the current directory. So you just put dot.
So you might do docker build dot. Docker build will then copy all the files from that
directory into what's called the docker build context. And it's this context that the copy
command copies from. So it's in a way, it's kind of invisible to you that it's doing that.
It feels like you're just copying it from your host machine. But the reason it does this
is because the build might not always be happening on the same machine.
So for example, you can actually offload the build to another service,
although normally you just do it on the same machine.
You can also create a .dockerignore file to tell docker build not to copy certain
files or directories into your docker build context.
So for example, if you have a local node modules directory locally,
because you've been developing locally, when you do a docker build,
you probably don't want to be copying that massive directory into the context
for the docker build. So a .dockerignore file can help with that.
It works very similarly to a .gitignore file.
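A .dockerignore is just a text file next to the Dockerfile; a typical one might contain something like this (the entries depend entirely on your project):

node_modules
bin
obj
.git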
So let's just talk a bit more about these layers and caching.
Each of the docker file commands I mentioned earlier, like copy, run, etc.
They create a new image layer, as I mentioned before,
and each of these layers is cached.
The cache gets invalidated for a specific layer if something changes.
So for example, if I change some code, which I tend to do while I'm coding,
then the docker build context that I mentioned earlier will have changed.
So that will invalidate the copy command layer in the docker file and all the commands below it.
So that's really useful. What this means is that every time I do a docker build,
it's not having to process every single command each time.
So if for example, you have a run apt get install command near the top of your docker file,
then it's not going to have to install that thing every time you change your source code.
Because that command is already cached because it's high up the docker file.
To take advantage of this cache, in my docker files I quite often, with the copy command,
copy the csproj files first.
Then I would do a RUN dotnet restore.
So that command is cached.
Then I would copy the rest of the source code in before I do my dotnet build.
So what this means is that when I change my source files,
unless it's the csproj file, it won't invalidate that dotnet restore layer
and do a full dotnet restore each time I do a docker build.
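To make that ordering concrete, a rough sketch (the csproj name is a placeholder):

FROM mcr.microsoft.com/dotnet/sdk:6.0
# Copy just the project file first so the restore layer stays cached
COPY MyApi.csproj .
RUN dotnet restore
# Now copy the rest of the source; only this layer and the ones below it get invalidated by code changes
COPY . .
RUN dotnet publish -c Release --no-restore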
If you're not following me so far, it might be worth re-listening because
it's going to go a little bit deeper now and we're going to talk about something called
multi-stage docker files.
Now what this does, it allows you to have in the docker file the from command that I mentioned
before that specifies the base image. It allows you to have multiple from commands.
Now before multi-stage docker files were a thing, one problem was that the base image
required to actually build your application tends to be a lot bigger than the base image
you want to use at runtime. So for example, when you do a dotnet restore
or a dotnet publish, those commands need the SDK base image. It needs the SDK tools.
There's a much lighter dotnet runtime base image,
but that lighter image can't actually build your application as part of the docker build.
And this was really complicated to solve before multi-stage docker files,
but multi-stage docker files make this super simple.
It allows you to have different sections in your docker file,
each with their own from command, and you can copy from a previous section
with a copy command. So you typically have two stages. The build stage would use your SDK base
image with all the build tools in, which is a much larger image. And then the second stage
would be based off the runtime base image, which doesn't have all the build tools
and is much lighter weight. And you can just copy the built artifacts from the first stage
into the second stage. And the resulting image at the end of the docker build
would be this second lightweight one that just contains your compiled files.
So hard to explain over audio, so perhaps just take away from that. There's this thing
called multi-stage docker files that you should really know, and perhaps look at a few examples
in the docs. But I did want to make sure that I covered multi-stage docker files,
just so you know they exist, as they are quite an important concept.
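As a hedged sketch of what a two-stage Dockerfile for a .NET app might look like (image tags, paths and the MyApi name are all illustrative):

# Stage 1: build using the heavier SDK image
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
COPY . .
RUN dotnet publish -c Release -o /app/publish

# Stage 2: the final image, based on the much lighter runtime image
FROM mcr.microsoft.com/dotnet/aspnet:6.0
COPY --from=build /app/publish /app
ENTRYPOINT ["dotnet", "/app/MyApi.dll"]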
Right, okay, so next up, I'm just going to talk a little bit about container orchestrators.
It's mostly out of scope of this docker episode. But basically, when running containers in production,
there's a lot to think about. How do we handle traffic routing, health monitoring, scaling out,
zero downtime when deploying, resiliency if the host goes down, and lots of other things as well.
And this is where you'd use a container orchestrator tool, like Kubernetes.
There's a few different other ones, but Kubernetes has pretty much won the container orchestration
battle in the same way that Git has won the source control battle.
You still build the images in the same way as I described earlier.
It would just be that your container orchestration platform would then pull those images down
from the container registry in the same way, and it would create containers
from that image and manage them for you. So it's the same kind of pattern. A container orchestrator
just does a lot of the extra stuff that you need in production.
So for local development, I must admit, I personally don't use docker for
locally developing the actual service I'm working on.
Because I honestly don't see the value in it. I generally work with .NET.
I've got .NET installed on my machine. I can just use it natively. I don't need to use docker
for this. I think with some other languages, maybe ones that don't work that well
on Windows or something, there are things like dev containers and various other options.
But I've not used those; I should really have a play. If you're a .NET developer, though,
I've got to say I don't really see the value in using docker for the actual service
you're currently developing.
But as I mentioned earlier, for local development, where I would use docker
is for running third party services like SQL Server, RabbitMQ, Azurite, and so on.
Each of my projects has a docker compose file in the project in source control,
which has whatever dependencies that project uses.
Then one single docker compose up command can spin them all up. It really is magic.
And if I'm working on a project that has lots of different services,
maybe a microservice kind of architecture, then I might use docker compose or even Kubernetes
locally to spin up all the other services. But I don't tend to use it for this service
I'm actually working on. It's worth noting that when you spin up these other services
using docker compose, the service I'm working on can still access them via the localhost hostname.
So things that are running inside containers like your other services or databases,
and also the service you're working on at the moment, running in Rider or Visual Studio,
they can play nicely together because it's all on the same network.
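So, for example, the app running in Rider or Visual Studio might connect with something like this (the ports and credentials depend on what you publish in your compose file; these values are illustrative):

Server=localhost,1433;Database=MyDb;User Id=sa;Password=Your_password123;
amqp://guest:guest@localhost:5672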
So I think that's perhaps enough for this episode.
If you're new to the concept of containers, then this has perhaps been a bit too much to
take in in one go. If this is the case, I'd recommend, as I mentioned at the start,
go away and have a play with docker locally, just try different commands,
run different images, just play around, then come back and re-listen.
And hopefully it'll start to click then.
As I said before, it's really worth making an investment learning docker.
There are so many different use cases, and hopefully this episode has helped a little
bit along that journey. As usual, before I finish, a very quick dev tip.
Very recently, I saw an article that apparently they're now adding an argument
to the dotnet publish command that can tell it to output a built image.
So I'll make sure that I include a link to that in the show notes, that article.
And lastly, a quick reminder that this podcast is sponsored by Everstack,
which is my own company providing software development and consultation services.
For more information, visit Everstack.com.
And if you enjoy the podcast, please do help me spread the word on social media.
I normally use the hashtag UnhandledException,
and I can be found on Twitter at Dracan and my DMs are open.
And my blog, DanClarke.com, has links to all my social stuff too.
And I'll include links to all the things I've mentioned today in the show notes,
which can be found on unhandledexceptionpodcast.com.