There were issues where a process could die before pausing completed
leaving the container in an inconsistent state and unable to be
destoryed. This makes sure that if the container is paused and the
process is dead it will unfreeze the cgroup before removing them.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Set `-1` doesn't mean disable swap, disable swap means you
can't use swap memory, set `-1` really means you can use
unlimited swap memory.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.
Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.
One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Properly sanitise the --cgroup-parent path, to avoid potential issues
(as it starts creating directories and writing to files as root). In
addition, fix an infinite recursion due to incomplete base cases.
It might be a good idea to move pathClean to a separate library (which
deals with path safety concerns, so all of runC and Docker can take
advantage of it).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
When we launch a container in a new user namespace, we cannot create
devices, so we bind mount the host's devices into place instead.
If we are running in a user namespace (i.e. nested in a container),
then we need to do the same thing. Add a function to detect that
and check for it before doing mknod.
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
Changelog - add a comment clarifying what's going on with the
uidmap file.
It may be desirable to receive memory pressure levels notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.
Signed-off-by: Ido Yariv <ido@wizery.com>
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.
Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).
Signed-off-by: Aleksa Sarai <asarai@suse.com>
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
syscall.NLA_HDRLEN is not in gccgo (as of 5.3), so in the meantime
use the #defines taken from linux/netlink.h.
See https://github.com/golang/go/issues/13629
Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add state status() method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow multiple checkpoint on restore
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handle leave-running state
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fix state transitions for inprocess
Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove isDestroyed method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handling Pausing from freezer state
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
freezer status
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Fixing review comments
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Added comment when freezer not available
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Conflicts:
libcontainer/container_linux.go
Change checkFreezer logic to isPaused()
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove state base and factor out destroy func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add unit test for state transitions
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>