Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.
Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.
One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Properly sanitise the --cgroup-parent path, to avoid potential issues
(as it starts creating directories and writing to files as root). In
addition, fix an infinite recursion due to incomplete base cases.
It might be a good idea to move pathClean to a separate library (which
deals with path safety concerns, so all of runC and Docker can take
advantage of it).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
When we launch a container in a new user namespace, we cannot create
devices, so we bind mount the host's devices into place instead.
If we are running in a user namespace (i.e. nested in a container),
then we need to do the same thing. Add a function to detect that
and check for it before doing mknod.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
Changelog - add a comment clarifying what's going on with the
uidmap file.
This flag allows systems that are running runc to allocate tty's that
they own and provide to the container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
It may be desirable to receive memory pressure levels notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.
Signed-off-by: Ido Yariv <ido@wizery.com>
`godep save` and `godep update` don't copy `*_test.go` files and
`testdata` directories by default, and we have no reason to keep
them in Godeps.
This is what I did:
1. A clean GOPATH with no vendors in
2. godep restore
3. remove Godeps dir in the repo
4. godep save
Note that I'm using the latest godep, so we also vendored all
necessary licese files because of https://github.com/tools/godep/pul/301
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>