Commit Graph

2172 Commits

Author SHA1 Message Date
Aleksa Sarai 103853ead7 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai a95483402e libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai f36ed4b174 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai db3159c9d9 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:32 +11:00
Alexander Morozov 421ebfd688 Merge pull request #459 from crosbymichael/console-path
Add --console to specify path to use from runc
2016-01-11 14:13:05 -08:00
Alexander Morozov c0cad6aa5e Merge pull request #451 from cyphar/fix-infinite-recursion
cgroups: fs: fix cgroup.Parent path sanitisation
2016-01-11 08:52:26 -08:00
Mrunal Patel d43108184e Merge pull request #458 from hallyn/userns
Handle running nested in a user namespace
2016-01-11 08:41:46 -08:00
Aleksa Sarai bf899fef45 cgroups: fs: fix cgroup.Parent path sanitisation
Properly sanitise the --cgroup-parent path, to avoid potential issues
(as it starts creating directories and writing to files as root). In
addition, fix an infinite recursion due to incomplete base cases.

It might be a good idea to move pathClean to a separate library (which
deals with path safety concerns, so all of runC and Docker can take
advantage of it).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-11 23:10:35 +11:00
Alexander Morozov 910752f1f5 Merge pull request #463 from jimmidyson/non-recursive-pids
Revert to non-recursive GetPids, add recursive GetAllPids
2016-01-08 13:55:00 -08:00
Serge Hallyn c0ad40c5e6 Do not create devices when in user namespace
When we launch a container in a new user namespace, we cannot create
devices, so we bind mount the host's devices into place instead.

If we are running in a user namespace (i.e. nested in a container),
then we need to do the same thing.  Add a function to detect that
and check for it before doing mknod.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 Changelog - add a comment clarifying what's going on with the
	     uidmap file.
2016-01-08 12:54:08 -08:00
Jimmi Dyson 91c7024e52 Revert to non-recursive GetPids, add recursive GetAllPids
Signed-off-by: Jimmi Dyson <jimmidyson@gmail.com>
2016-01-08 19:42:25 +00:00
Ahmet Alp Balkan c8b5e150f1 selinux: add SelinuxSetEnforceMode implementation
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
2016-01-08 16:48:30 +00:00
xlgao-zju cdc53051a3 update date in README
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2016-01-08 10:48:11 +08:00
Mrunal Patel 749928a0a1 Merge pull request #421 from rajasec/selinux-compileflag
Adding selinux label
2016-01-07 17:57:54 -08:00
Michael Crosby 4c4c9b85b7 Add --console to specify path to use from runc
This flag allows systems that are running runc to allocate tty's that
they own and provide to the container.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-07 15:01:36 -08:00
Serge Hallyn 2e13570679 Do not allow access to /dev/tty{0,1}
These are the real host devices, container should not generally
have or need them.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-01-06 18:42:17 -08:00
Mrunal Patel f03b7f8317 Merge pull request #419 from rajasec/selinux-teststepfix
make localtest failure with selinux enabled
2016-01-06 12:44:03 -08:00
Mrunal Patel 4fda64bc07 Merge pull request #452 from hqhq/hq_bindmount_whitelist
Add white list for bind mount check
2016-01-06 11:16:10 -08:00
Qiang Huang 9c1242ecba Add white list for bind mount chec
Fixes: #400

It would be useful to use fuse to isolate proc info.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-06 14:48:40 +08:00
Mrunal Patel 5c46b9d438 Merge pull request #448 from hqhq/hq_cleanup_godeps
Cleanup Godeps
2016-01-05 16:16:21 -08:00
Mrunal Patel fa24ebf26c Merge pull request #311 from crosbymichael/destory-state
Implement Container States
2016-01-04 09:59:28 -08:00
Qiang Huang 5a398b5cfc Merge pull request #449 from HackToday/fixdoc
Fix typo word in SPEC.md
2016-01-04 09:01:59 +08:00
Kai Qiang WU(Kennan) c71d8e69f1 Fix typo word in SPEC.md
Signed-off-by: Kai Qiang WU(Kennan) <wkq5325@gmail.com>
2015-12-30 00:30:58 +00:00
Ido Yariv 55a8d686a9 libcontainer: Add support for memcg pressure notifications
It may be desirable to receive memory pressure levels notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-12-28 13:36:55 -05:00
Qiang Huang f4fc51b59c Cleanup Godeps
`godep save` and `godep update` don't copy `*_test.go` files and
`testdata` directories by default, and we have no reason to keep
them in Godeps.

This is what I did:
1. A clean GOPATH with no vendors in
2. godep restore
3. remove Godeps dir in the repo
4. godep save

Note that I'm using the latest godep, so we also vendored all
necessary licese files because of https://github.com/tools/godep/pul/301

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-28 09:34:37 +08:00
Mrunal Patel d97d5e8b00 Merge pull request #445 from mrunalp/revert-58
Revert "cgroups: add pids controller support"
2015-12-19 07:55:24 -08:00
Mrunal Patel 4124ba9468 Revert "cgroups: add pids controller support"
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-19 07:48:48 -08:00
Mrunal Patel bc465742ac Merge pull request #58 from cyphar/18-add-pids-controller
cgroups: add pids controller support
2015-12-18 19:55:51 -08:00
Aleksa Sarai 14ed8696c1 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:48 +11:00
Aleksa Sarai 88e6d489f6 libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 8a740d5391 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 37789f5bf1 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:38 +11:00
Michael Crosby 766e4c5250 Merge pull request #437 from clnperez/nlahdrlen-fix-for-gccgo
Add NLA_HDRLEN workaround for gccgo
2015-12-18 15:57:26 -08:00
Ben Hall ec9d8769f0 Add a Dockerfile and a dbuild target. This allows you to build runC via Docker without having Golang installed on the host
Signed-off-by: Ben Hall <ben@benhall.me.uk>
2015-12-18 11:37:12 +00:00
Christy Perez ced8e5e7ba Caclulate NLA_HDRLEN as gccgo workaround
syscall.NLA_HDRLEN is not in gccgo (as of 5.3), so in the meantime
use the #defines taken from linux/netlink.h.

See https://github.com/golang/go/issues/13629

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2015-12-17 17:36:47 -06:00
Michael Crosby 4415446c32 Add state pattern for container state transition
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add state status() method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Allow multiple checkpoint on restore

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handle leave-running state

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Fix state transitions for inprocess

Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove isDestroyed method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handling Pausing from freezer state

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

freezer status

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Fixing review comments

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Added comment when freezer not available

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Conflicts:
	libcontainer/container_linux.go

Change checkFreezer logic to isPaused()

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove state base and factor out destroy func

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add unit test for state transitions

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-17 13:55:38 -08:00
Qiang Huang 9d6ce7168a Merge pull request #434 from mrunalp/resources
Move the cgroups setting into a Resources struct
2015-12-17 09:34:29 +08:00
Mrunal Patel 55a49f2110 Move the cgroups setting into a Resources struct
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-16 15:53:31 -05:00
Alexander Morozov ac448818e5 Merge pull request #439 from ZJU-SEL/update-version
update version for release 0.0.6
2015-12-16 08:37:34 -08:00
xlgao-zju 3ff8e80662 update version for release 0.0.6
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-12-16 23:25:19 +08:00
Alexander Morozov ba1568de39 Merge pull request #436 from calavera/linux_process
Move linux only Process.InitializeIO behind the linux build flag.
2015-12-15 14:21:50 -08:00
David Calavera 77c36f4b34 Move linux only Process.InitializeIO behind the linux build flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-15 15:12:29 -05:00
Mrunal Patel 334b935833 Merge pull request #435 from calavera/use_go_units
Replace docker units package with new docker/go-units.
2015-12-15 09:34:00 -08:00
David Calavera 977991d36f Replace docker units package with new docker/go-units.
It's the same library but it won't live in docker/docker anymore.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-14 20:45:30 -05:00
Mrunal Patel 11f8fdca33 Merge pull request #430 from crosbymichael/pipes
Move STDIO initialization to libcontainer.Process
2015-12-11 14:30:42 -08:00
Alexander Morozov cb04f03854 Merge pull request #336 from hqhq/hq_parent_cgroup_systemd
systemd: support cgroup parent with specified slice
2015-12-11 10:13:47 -08:00
Mrunal Patel 6672d63ec7 Merge pull request #432 from ZJU-SEL/fix-exist
fix minor typo
2015-12-11 10:08:07 -08:00
Mrunal Patel d29479b4f9 Merge pull request #431 from hqhq/fix_readme_v1_time
Remove the timeframe for v1 spec
2015-12-11 10:06:33 -08:00
xlgao-zju ff29daafc0 fix minor typo
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-12-11 21:37:32 +08:00
Qiang Huang bf00d9c367 Remove the timeframe for v1 spec
Fixes: #429

We missed the former one and haven't got a new one, remove
it from README to avoid confusing.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-11 09:11:17 +08:00