Commit Graph

284 Commits

Author SHA1 Message Date
Alexander Morozov 6c9532f063 Merge pull request #461 from ahmetalpbalkan/selinux-setenforce
selinux: add SelinuxSetEnforceMode implementation
2016-01-15 13:01:27 -08:00
Alexander Morozov f2f8f0e4e6 Merge pull request #462 from hqhq/hq_fix_libcontainer_readme
Update README of libcontainer
2016-01-15 13:00:44 -08:00
Mrunal Patel 6259f09e97 Merge pull request #426 from gitido/pressure_level
libcontainer: Add support for memcg pressure notifications
2016-01-14 16:23:07 -08:00
Alexander Morozov 8962f371d6 Merge pull request #472 from dadgar/b-find-cgroup-mount
Only validate post-hyphen field length on cgroup mounts
2016-01-14 15:08:11 -08:00
Alexander Morozov 3b42992948 Merge pull request #455 from hallyn/tty01
Do not allow access to /dev/tty{0,1}
2016-01-14 14:35:46 -08:00
Qiang Huang d87ac4a2ca Update README of libcontainer
Fixes: #438

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-14 14:53:29 +08:00
Alex Dadgar a42f3236d5 Only validate post-hyphen field length on cgroup mounts
Signed-off-by: Alex Dadgar <alex.dadgar@gmail.com>
2016-01-13 11:28:49 -08:00
Mrunal Patel 4c767d7046 Merge pull request #446 from cyphar/18-add-pids-controller
cgroup: add PIDs cgroup controller support
2016-01-11 16:56:00 -08:00
Aleksa Sarai 103853ead7 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should be. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
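The ordering described in this commit (let the Go init start up, then have the parent write cgroup limits, then apply seccomp) can be modelled roughly as below. This is a hypothetical sketch, not libcontainer's code: two goroutines stand in for the parent and child processes, and the `ready`/`configured` channels stand in for the real sync pipe.

```go
package main

import "fmt"

// runInit models the parent/child ordering with goroutines. The channels
// are stand-ins for the real parent/child sync pipe (an assumption).
func runInit() []string {
	events := make(chan string, 4)
	ready := make(chan struct{})      // child -> parent: Go runtime is up
	configured := make(chan struct{}) // parent -> child: limits written
	done := make(chan struct{})

	// Child: the Go runtime has already spawned its threads by the time it
	// signals; it then waits for the parent before applying seccomp, since
	// seccomp could block the synchronisation and cgroup writes.
	go func() {
		events <- "child: Go runtime started"
		close(ready)
		<-configured
		events <- "child: apply seccomp"
		events <- "child: exec user process"
		close(done)
	}()

	// Parent: set cgroup limits as late as possible, i.e. only after the
	// child's runtime overhead is already in place.
	<-ready
	events <- "parent: set cgroup limits"
	close(configured)

	<-done
	close(events)
	var order []string
	for e := range events {
		order = append(order, e)
	}
	return order
}

func main() {
	for _, e := range runInit() {
		fmt.Println(e)
	}
}
```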
Aleksa Sarai a95483402e libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
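A minimal sketch of the "loudly fail" rule, assuming a limit of 0 means "use the system default". The `mounted` map, `setLimit`, and `errControllerMissing` are hypothetical stand-ins for real cgroup mount detection and file writes.

```go
package main

import (
	"errors"
	"fmt"
)

// errControllerMissing is a hypothetical sentinel error for this sketch.
var errControllerMissing = errors.New("cgroup controller not available")

// setLimit skips a limit of 0 (system default), but reports a non-zero
// limit on an unavailable controller instead of silently dropping it.
func setLimit(mounted map[string]bool, controller string, limit int64) error {
	if limit == 0 {
		return nil // nothing requested: keep the system default
	}
	if !mounted[controller] {
		return fmt.Errorf("cannot set %s limit to %d: %w", controller, limit, errControllerMissing)
	}
	// A real implementation would write limit into the controller's file here.
	return nil
}

func main() {
	mounted := map[string]bool{"memory": true}
	fmt.Println(setLimit(mounted, "pids", 0))
	fmt.Println(setLimit(mounted, "memory", 64<<20))
	fmt.Println(setLimit(mounted, "pids", 10))
}
```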
Aleksa Sarai f36ed4b174 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai db3159c9d9 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:32 +11:00
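The kernel's pids.max file accepts either a number or the literal `max` (no limit). The hypothetical helper below sketches how a configured limit might be translated. It assumes 0 means "leave pids.max alone" and a negative value means "unlimited"; those conventions are assumptions of this sketch. Because the pids controller counts threads as well as processes, a very small positive value can be exhausted by the Go init itself, as the message notes.

```go
package main

import (
	"fmt"
	"strconv"
)

// pidsMaxValue (hypothetical) returns the string to write to pids.max and
// whether anything should be written at all.
func pidsMaxValue(limit int64) (value string, write bool) {
	switch {
	case limit == 0:
		return "", false // not configured: leave the kernel default
	case limit < 0:
		return "max", true // unlimited
	default:
		return strconv.FormatInt(limit, 10), true
	}
}

func main() {
	for _, l := range []int64{0, -1, 64} {
		v, ok := pidsMaxValue(l)
		fmt.Printf("limit=%d write=%v value=%q\n", l, ok, v)
	}
}
```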
Alexander Morozov c0cad6aa5e Merge pull request #451 from cyphar/fix-infinite-recursion
cgroups: fs: fix cgroup.Parent path sanitisation
2016-01-11 08:52:26 -08:00
Mrunal Patel d43108184e Merge pull request #458 from hallyn/userns
Handle running nested in a user namespace
2016-01-11 08:41:46 -08:00
Aleksa Sarai bf899fef45 cgroups: fs: fix cgroup.Parent path sanitisation
Properly sanitise the --cgroup-parent path, to avoid potential issues
(as it starts creating directories and writing to files as root). In
addition, fix an infinite recursion due to incomplete base cases.

It might be a good idea to move pathClean to a separate library (which
deals with path safety concerns, so all of runC and Docker can take
advantage of it).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-11 23:10:35 +11:00
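One common way to sanitise a relative parent path lexically is to anchor it at `/`, clean it, and make it relative again, so leading `..` elements cannot climb above the cgroup root it is later joined to. `cleanParent` is a hypothetical helper in the spirit of the commit's pathClean, not its actual code, and it does not resolve symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// cleanParent lexically resolves "." and ".." in a cgroup parent path.
func cleanParent(p string) string {
	if p == "" {
		return ""
	}
	p = filepath.Clean(p)
	if !filepath.IsAbs(p) {
		// Anchor at "/" so leading ".." elements are discarded, then make
		// the result relative again. Rel cannot fail for two absolute paths.
		p = filepath.Clean("/" + p)
		p, _ = filepath.Rel("/", p)
	}
	return p
}

func main() {
	for _, in := range []string{"../../etc", "/a/../../b", "docker/./ctr"} {
		fmt.Printf("%q -> %q\n", in, cleanParent(in))
	}
}
```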
Alexander Morozov 910752f1f5 Merge pull request #463 from jimmidyson/non-recursive-pids
Revert to non-recursive GetPids, add recursive GetAllPids
2016-01-08 13:55:00 -08:00
Serge Hallyn c0ad40c5e6 Do not create devices when in user namespace
When we launch a container in a new user namespace, we cannot create
devices, so we bind mount the host's devices into place instead.

If we are running in a user namespace (i.e. nested in a container),
then we need to do the same thing.  Add a function to detect that
and check for it before doing mknod.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 Changelog - add a comment clarifying what's going on with the
	     uidmap file.
2016-01-08 12:54:08 -08:00
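One way to detect that the process is nested in a user namespace is to read /proc/self/uid_map. The initial user namespace maps the whole 32-bit UID range to itself (`0 0 4294967295`), so any other first line suggests nesting. This is a sketch of that idea; the function names are hypothetical, and treating an unreadable or empty file as "not nested" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// uidMapIsInitial reports whether a uid_map line is the identity mapping
// of the full UID range, as seen in the initial user namespace.
func uidMapIsInitial(line string) bool {
	var inside, outside, length uint64
	if _, err := fmt.Sscanf(strings.TrimSpace(line), "%d %d %d", &inside, &outside, &length); err != nil {
		return false
	}
	return inside == 0 && outside == 0 && length == 4294967295
}

// runningInUserNS inspects only the first line of /proc/self/uid_map.
func runningInUserNS() bool {
	data, err := os.ReadFile("/proc/self/uid_map")
	if err != nil {
		return false // assumption: no readable map means no nesting
	}
	first := strings.TrimSpace(strings.SplitN(string(data), "\n", 2)[0])
	if first == "" {
		return false
	}
	return !uidMapIsInitial(first)
}

func main() {
	if runningInUserNS() {
		fmt.Println("nested in a user namespace: bind mount host devices instead of mknod")
	} else {
		fmt.Println("initial user namespace: mknod may be used")
	}
}
```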
Jimmi Dyson 91c7024e52 Revert to non-recursive GetPids, add recursive GetAllPids
Signed-off-by: Jimmi Dyson <jimmidyson@gmail.com>
2016-01-08 19:42:25 +00:00
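The difference between the two functions can be sketched over an in-memory tree. The non-recursive variant reads only the cgroup's own cgroup.procs, while the recursive one also walks every descendant cgroup directory. The lowercase names are hypothetical stand-ins, not libcontainer's signatures, and `fstest.MapFS` stands in for the real cgroup filesystem.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strconv"
	"strings"
	"testing/fstest"
)

// readProcs parses one cgroup.procs file (one PID per line).
func readProcs(fsys fs.FS, dir string) ([]int, error) {
	data, err := fs.ReadFile(fsys, path.Join(dir, "cgroup.procs"))
	if err != nil {
		return nil, err
	}
	var pids []int
	for _, field := range strings.Fields(string(data)) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			return nil, err
		}
		pids = append(pids, pid)
	}
	return pids, nil
}

// getPids is non-recursive: only processes placed directly in dir.
func getPids(fsys fs.FS, dir string) ([]int, error) {
	return readProcs(fsys, dir)
}

// getAllPids is recursive: dir plus every descendant cgroup directory.
func getAllPids(fsys fs.FS, dir string) ([]int, error) {
	var all []int
	err := fs.WalkDir(fsys, dir, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() {
			return nil // skip control files; only directories are cgroups
		}
		pids, err := readProcs(fsys, p)
		if err != nil {
			return err
		}
		all = append(all, pids...)
		return nil
	})
	return all, err
}

// exampleTree is an in-memory cgroup hierarchy with one child cgroup.
func exampleTree() fs.FS {
	return fstest.MapFS{
		"ctr/cgroup.procs":       {Data: []byte("1\n2\n")},
		"ctr/child/cgroup.procs": {Data: []byte("3\n")},
	}
}

func main() {
	tree := exampleTree()
	own, _ := getPids(tree, "ctr")
	all, _ := getAllPids(tree, "ctr")
	fmt.Println("non-recursive:", own, "recursive:", all)
}
```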
Ahmet Alp Balkan c8b5e150f1 selinux: add SelinuxSetEnforceMode implementation
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
2016-01-08 16:48:30 +00:00
Mrunal Patel 749928a0a1 Merge pull request #421 from rajasec/selinux-compileflag
Adding selinux label
2016-01-07 17:57:54 -08:00
Serge Hallyn 2e13570679 Do not allow access to /dev/tty{0,1}
These are the real host devices; containers should not generally
have or need them.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-01-06 18:42:17 -08:00
Mrunal Patel f03b7f8317 Merge pull request #419 from rajasec/selinux-teststepfix
make localtest failure with selinux enabled
2016-01-06 12:44:03 -08:00
Mrunal Patel 4fda64bc07 Merge pull request #452 from hqhq/hq_bindmount_whitelist
Add white list for bind mount check
2016-01-06 11:16:10 -08:00
Qiang Huang 9c1242ecba Add white list for bind mount check
Fixes: #400

It would be useful to use fuse to isolate proc info.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-06 14:48:40 +08:00
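A whitelist like this lets a FUSE filesystem (LXCFS, for example) be bind mounted over selected /proc files while mounts anywhere else under /proc stay forbidden. The sketch below shows the shape of such a check; the function name and both lists are hypothetical, and the set of paths libcontainer actually allows may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Hypothetical lists for this sketch.
var (
	invalidDestinations = []string{"/proc"}
	validDestinations   = []string{"/proc/cpuinfo", "/proc/meminfo", "/proc/stat"}
)

// checkMountDestination rejects a bind mount onto /proc or anything below
// it, unless the cleaned destination is exactly a whitelisted file.
func checkMountDestination(dest string) error {
	dest = filepath.Clean(dest)
	for _, valid := range validDestinations {
		if dest == valid {
			return nil
		}
	}
	for _, invalid := range invalidDestinations {
		rel, err := filepath.Rel(invalid, dest)
		if err != nil {
			return err
		}
		// rel is "." for invalid itself and never starts with "../" inside it.
		outside := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if !outside {
			return fmt.Errorf("%q cannot be mounted because it is inside %q", dest, invalid)
		}
	}
	return nil
}

func main() {
	for _, d := range []string{"/proc/meminfo", "/proc/self", "/procfs", "/tmp/data"} {
		fmt.Printf("%-14s %v\n", d, checkMountDestination(d))
	}
}
```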
Mrunal Patel fa24ebf26c Merge pull request #311 from crosbymichael/destory-state
Implement Container States
2016-01-04 09:59:28 -08:00
Kai Qiang WU(Kennan) c71d8e69f1 Fix typo word in SPEC.md
Signed-off-by: Kai Qiang WU(Kennan) <wkq5325@gmail.com>
2015-12-30 00:30:58 +00:00
Ido Yariv 55a8d686a9 libcontainer: Add support for memcg pressure notifications
It may be desirable to receive memory pressure level notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-12-28 13:36:55 -05:00
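In the cgroup v1 memory controller, a listener creates an eventfd(2), opens the cgroup's memory.pressure_level file, and writes `<event_fd> <pressure_level_fd> <level>` to cgroup.event_control; reads on the eventfd then block until the kernel signals that level. The hypothetical helper below only builds and validates that registration line; creating the eventfd and opening the files are left out.

```go
package main

import "fmt"

// pressureLevel mirrors the levels documented for memory.pressure_level.
type pressureLevel int

const (
	lowPressure pressureLevel = iota
	mediumPressure
	criticalPressure
)

func (l pressureLevel) String() string {
	switch l {
	case lowPressure:
		return "low"
	case mediumPressure:
		return "medium"
	case criticalPressure:
		return "critical"
	}
	return "unknown"
}

// eventControlLine builds the line written to cgroup.event_control:
// "<event_fd> <fd of memory.pressure_level> <level>".
func eventControlLine(eventFd, pressureFd int, level pressureLevel) (string, error) {
	if level < lowPressure || level > criticalPressure {
		return "", fmt.Errorf("invalid pressure level %d", int(level))
	}
	return fmt.Sprintf("%d %d %s", eventFd, pressureFd, level), nil
}

func main() {
	line, err := eventControlLine(5, 6, mediumPressure)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```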
Mrunal Patel 4124ba9468 Revert "cgroups: add pids controller support"
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-19 07:48:48 -08:00
Mrunal Patel bc465742ac Merge pull request #58 from cyphar/18-add-pids-controller
cgroups: add pids controller support
2015-12-18 19:55:51 -08:00
Aleksa Sarai 14ed8696c1 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should be. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:48 +11:00
Aleksa Sarai 88e6d489f6 libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 8a740d5391 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 37789f5bf1 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:38 +11:00
Michael Crosby 766e4c5250 Merge pull request #437 from clnperez/nlahdrlen-fix-for-gccgo
Add NLA_HDRLEN workaround for gccgo
2015-12-18 15:57:26 -08:00
Christy Perez ced8e5e7ba Calculate NLA_HDRLEN as gccgo workaround
syscall.NLA_HDRLEN is not in gccgo (as of 5.3), so in the meantime
use the #defines taken from linux/netlink.h.

See https://github.com/golang/go/issues/13629

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2015-12-17 17:36:47 -06:00
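In linux/netlink.h, NLA_HDRLEN is NLA_ALIGN(sizeof(struct nlattr)) with NLA_ALIGNTO set to 4, and struct nlattr is two 16-bit fields. The same calculation can be written in Go without the syscall constant; the identifier names below are illustrative.

```go
package main

import "fmt"

// Values mirrored from linux/netlink.h.
const (
	nlaAlignTo = 4 // NLA_ALIGNTO
	// sizeof(struct nlattr): two __u16 fields, nla_len and nla_type.
	sizeofNlattr = 4
)

// nlaAlign rounds n up to a multiple of nlaAlignTo, like NLA_ALIGN(len).
func nlaAlign(n int) int {
	return (n + nlaAlignTo - 1) &^ (nlaAlignTo - 1)
}

// nlaHdrLen plays the role of NLA_HDRLEN: the aligned size of struct nlattr.
var nlaHdrLen = nlaAlign(sizeofNlattr)

func main() {
	fmt.Println("NLA_HDRLEN =", nlaHdrLen)
}
```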
Michael Crosby 4415446c32 Add state pattern for container state transition
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add state status() method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Allow multiple checkpoint on restore

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handle leave-running state

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Fix state transitions for inprocess

Because the tests use libcontainer in process between the various states
we need to ensure that this use case works as well as the out-of-process
one.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove isDestroyed method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handling Pausing from freezer state

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

freezer status

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Fixing review comments

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Added comment when freezer not available

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Conflicts:
	libcontainer/container_linux.go

Change checkFreezer logic to isPaused()

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove state base and factor out destroy func

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add unit test for state transitions

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-17 13:55:38 -08:00
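The state pattern named in this commit gives each container state its own type that decides which transitions out of it are legal, so the container only swaps its current state after that check passes. The sketch below illustrates the pattern with hypothetical type names and a hypothetical transition table; the exact transitions libcontainer permits may differ.

```go
package main

import "fmt"

// Status values for this sketch; the names are illustrative.
type Status int

const (
	Stopped Status = iota
	Running
	Paused
	Destroyed
)

func (s Status) String() string {
	switch s {
	case Stopped:
		return "stopped"
	case Running:
		return "running"
	case Paused:
		return "paused"
	case Destroyed:
		return "destroyed"
	}
	return "unknown"
}

// containerState: each concrete state decides which transitions are legal.
type containerState interface {
	status() Status
	transition(next containerState) error
}

type stoppedState struct{}
type runningState struct{}
type pausedState struct{}
type destroyedState struct{}

func (stoppedState) status() Status   { return Stopped }
func (runningState) status() Status   { return Running }
func (pausedState) status() Status    { return Paused }
func (destroyedState) status() Status { return Destroyed }

func invalid(from, to containerState) error {
	return fmt.Errorf("invalid state transition from %s to %s", from.status(), to.status())
}

func (s stoppedState) transition(next containerState) error {
	switch next.(type) {
	case runningState, destroyedState:
		return nil
	}
	return invalid(s, next)
}

func (s runningState) transition(next containerState) error {
	switch next.(type) {
	case pausedState, stoppedState:
		return nil
	}
	return invalid(s, next)
}

func (s pausedState) transition(next containerState) error {
	if _, ok := next.(runningState); ok {
		return nil
	}
	return invalid(s, next)
}

func (s destroyedState) transition(next containerState) error {
	return invalid(s, next) // terminal state
}

// container applies a transition only if the current state allows it.
type container struct{ state containerState }

func (c *container) setState(next containerState) error {
	if err := c.state.transition(next); err != nil {
		return err
	}
	c.state = next
	return nil
}

func main() {
	c := &container{state: stoppedState{}}
	steps := []containerState{pausedState{}, runningState{}, pausedState{}, runningState{}, stoppedState{}, destroyedState{}}
	for _, next := range steps {
		if err := c.setState(next); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("now", c.state.status())
	}
}
```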
Qiang Huang 9d6ce7168a Merge pull request #434 from mrunalp/resources
Move the cgroups setting into a Resources struct
2015-12-17 09:34:29 +08:00
Mrunal Patel 55a49f2110 Move the cgroups setting into a Resources struct
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-16 15:53:31 -05:00
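Separating the limits into their own struct lets a nil value mean "only join the given cgroup paths" while a non-nil value means "join and also write settings". The trimmed-down types below are hypothetical and only illustrate that distinction; they are not libcontainer's exact definitions.

```go
package main

import "fmt"

// Resources groups the limits that must be written into cgroup files.
type Resources struct {
	Memory    int64 // bytes; 0 means "not set"
	PidsLimit int64 // 0 means "not set"
}

// Cgroup: Paths lists existing cgroups to join; a nil Resources means
// "join only, set nothing".
type Cgroup struct {
	Paths     map[string]string
	Resources *Resources
}

// actions returns the steps a cgroup manager would perform for c.
func actions(c *Cgroup) []string {
	steps := []string{"join"}
	if c.Resources != nil {
		steps = append(steps, "set resources")
	}
	return steps
}

func main() {
	joinOnly := &Cgroup{Paths: map[string]string{"memory": "/sys/fs/cgroup/memory/existing"}}
	withLimits := &Cgroup{Resources: &Resources{Memory: 64 << 20}}
	fmt.Println(actions(joinOnly), actions(withLimits))
}
```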
David Calavera 77c36f4b34 Move linux only Process.InitializeIO behind the linux build flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-15 15:12:29 -05:00
David Calavera 977991d36f Replace docker units package with new docker/go-units.
It's the same library but it won't live in docker/docker anymore.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-14 20:45:30 -05:00
Mrunal Patel 11f8fdca33 Merge pull request #430 from crosbymichael/pipes
Move STDIO initialization to libcontainer.Process
2015-12-11 14:30:42 -08:00
Alexander Morozov cb04f03854 Merge pull request #336 from hqhq/hq_parent_cgroup_systemd
systemd: support cgroup parent with specified slice
2015-12-11 10:13:47 -08:00
xlgao-zju ff29daafc0 fix minor typo
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-12-11 21:37:32 +08:00
Michael Crosby 29b139f702 Move STDIO initialization to libcontainer.Process
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-10 16:11:49 -08:00
Mrunal Patel 0267ad05b0 Merge pull request #340 from dqminh/replace-env-netlink
nsexec: replace usage of environment variable with netlink message
2015-12-09 14:21:45 -08:00
Michael Crosby 9c9aac5385 Export console New func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-09 11:59:10 -08:00
Daniel, Dao Quang Minh 7d423cb7a1 setns: replace env with netlink for bootstrap data
Replace passing of pid and console path via environment variable with passing
them with netlink message via an established pipe.

This change requires us to set _LIBCONTAINER_INITTYPE and
_LIBCONTAINER_INITPIPE in the environment of the bootstrap process, as we
only send the bootstrap data for the setns process right now. When the init and setns
bootstrap processes are unified (i.e., init uses nsexec instead of Go to clone a new
process), we can remove _LIBCONTAINER_INITTYPE.

Note:
- we read nlmsghdr first before reading the content so we can get the total
  length of the payload and allocate buffer properly instead of allocating
  one large buffer.

- check read bytes vs the wanted number. It's an error if we failed to read
  the desired number of bytes from the pipe into the buffer.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-12-03 18:03:48 +00:00
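The "read nlmsghdr first, then allocate exactly the payload" approach from the notes above can be sketched in Go (the real nsexec code is C). A struct nlmsghdr is 16 bytes (len u32, type u16, flags u16, seq u32, pid u32), so the payload size is `Len - 16`, and `io.ReadFull` turns a short read into an error. Netlink uses host byte order; this sketch assumes a little-endian host, and the helper names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// nlmsghdrLen is sizeof(struct nlmsghdr).
const nlmsghdrLen = 16

type nlmsghdr struct {
	Len   uint32
	Type  uint16
	Flags uint16
	Seq   uint32
	Pid   uint32
}

// readBootstrapMsg reads the fixed header, then exactly Len-16 payload bytes.
func readBootstrapMsg(r io.Reader) (nlmsghdr, []byte, error) {
	var hdr nlmsghdr
	// Assumes a little-endian host (netlink uses native byte order).
	if err := binary.Read(r, binary.LittleEndian, &hdr); err != nil {
		return hdr, nil, fmt.Errorf("reading nlmsghdr: %w", err)
	}
	if hdr.Len < nlmsghdrLen {
		return hdr, nil, errors.New("invalid nlmsghdr length")
	}
	payload := make([]byte, hdr.Len-nlmsghdrLen)
	if _, err := io.ReadFull(r, payload); err != nil {
		return hdr, nil, fmt.Errorf("reading payload: %w", err)
	}
	return hdr, payload, nil
}

// encodeBootstrapMsg builds a message in the same layout for the demo.
func encodeBootstrapMsg(typ uint16, payload []byte) io.Reader {
	var buf bytes.Buffer
	hdr := nlmsghdr{Len: uint32(nlmsghdrLen + len(payload)), Type: typ}
	binary.Write(&buf, binary.LittleEndian, hdr) // cannot fail for bytes.Buffer
	buf.Write(payload)
	return &buf
}

func main() {
	hdr, payload, err := readBootstrapMsg(encodeBootstrapMsg(42, []byte("bootstrap")))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("type=%d len=%d payload=%q\n", hdr.Type, hdr.Len, payload)
}
```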
Qiang Huang 7695a0ddb0 systemd: support cgroup parent with specified slice
Pick up #119
Fixes: docker/docker#16681

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-02 23:57:02 -05:00
Mrunal Patel 3317785f56 Merge pull request #420 from runcom/cgroups-unsupported
libcontainer: configs: create cgroup_unsupported.go in order to build on darwin as well
2015-11-30 09:20:23 -08:00
Alexander Morozov decba54d78 Merge pull request #424 from runcom/fix-go-vet
libcontainer: network_linux.go: fix go vet
2015-11-30 09:06:41 -08:00