Commit Graph

257 Commits

Author SHA1 Message Date
Aleksa Sarai db3159c9d9 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:32 +11:00
Mrunal Patel fa24ebf26c Merge pull request #311 from crosbymichael/destory-state
Implement Container States
2016-01-04 09:59:28 -08:00
Kai Qiang WU(Kennan) c71d8e69f1 Fix typo word in SPEC.md
Signed-off-by: Kai Qiang WU(Kennan) <wkq5325@gmail.com>
2015-12-30 00:30:58 +00:00
Mrunal Patel 4124ba9468 Revert "cgroups: add pids controller support"
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-19 07:48:48 -08:00
Mrunal Patel bc465742ac Merge pull request #58 from cyphar/18-add-pids-controller
cgroups: add pids controller support
2015-12-18 19:55:51 -08:00
Aleksa Sarai 14ed8696c1 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:48 +11:00
Aleksa Sarai 88e6d489f6 libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 8a740d5391 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 37789f5bf1 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:38 +11:00
Michael Crosby 766e4c5250 Merge pull request #437 from clnperez/nlahdrlen-fix-for-gccgo
Add NLA_HDRLEN workaround for gccgo
2015-12-18 15:57:26 -08:00
Christy Perez ced8e5e7ba Caclulate NLA_HDRLEN as gccgo workaround
syscall.NLA_HDRLEN is not in gccgo (as of 5.3), so in the meantime
use the #defines taken from linux/netlink.h.

See https://github.com/golang/go/issues/13629

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2015-12-17 17:36:47 -06:00
Michael Crosby 4415446c32 Add state pattern for container state transition
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add state status() method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Allow multiple checkpoint on restore

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handle leave-running state

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Fix state transitions for inprocess

Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove isDestroyed method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handling Pausing from freezer state

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

freezer status

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Fixing review comments

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Added comment when freezer not available

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Conflicts:
	libcontainer/container_linux.go

Change checkFreezer logic to isPaused()

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove state base and factor out destroy func

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add unit test for state transitions

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-17 13:55:38 -08:00
Qiang Huang 9d6ce7168a Merge pull request #434 from mrunalp/resources
Move the cgroups setting into a Resources struct
2015-12-17 09:34:29 +08:00
Mrunal Patel 55a49f2110 Move the cgroups setting into a Resources struct
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-16 15:53:31 -05:00
David Calavera 77c36f4b34 Move linux only Process.InitializeIO behind the linux build flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-15 15:12:29 -05:00
David Calavera 977991d36f Replace docker units package with new docker/go-units.
It's the same library but it won't live in docker/docker anymore.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-14 20:45:30 -05:00
Mrunal Patel 11f8fdca33 Merge pull request #430 from crosbymichael/pipes
Move STDIO initialization to libcontainer.Process
2015-12-11 14:30:42 -08:00
Alexander Morozov cb04f03854 Merge pull request #336 from hqhq/hq_parent_cgroup_systemd
systemd: support cgroup parent with specified slice
2015-12-11 10:13:47 -08:00
xlgao-zju ff29daafc0 fix minor typo
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-12-11 21:37:32 +08:00
Michael Crosby 29b139f702 Move STDIO initialization to libcontainer.Process
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-10 16:11:49 -08:00
Mrunal Patel 0267ad05b0 Merge pull request #340 from dqminh/replace-env-netlink
nsexec: replace usage of environment variable with netlink message
2015-12-09 14:21:45 -08:00
Michael Crosby 9c9aac5385 Export console New func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-09 11:59:10 -08:00
Daniel, Dao Quang Minh 7d423cb7a1 setns: replace env with netlink for bootstrap data
replace passing of pid and console path via environment variable with passing
them with netlink message via an established pipe.

this change requires us to set _LIBCONTAINER_INITTYPE and
_LIBCONTAINER_INITPIPE as the env environment of the bootstrap process as we
only send the bootstrap data for setns process right now. When init and setns
bootstrap process are unified (i.e., init use nsexec instead of Go to clone new
process), we can remove _LIBCONTAINER_INITTYPE.

Note:
- we read nlmsghdr first before reading the content so we can get the total
  length of the payload and allocate buffer properly instead of allocating
  one large buffer.

- check read bytes vs the wanted number. It's an error if we failed to read
  the desired number of bytes from the pipe into the buffer.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-12-03 18:03:48 +00:00
Qiang Huang 7695a0ddb0 systemd: support cgroup parent with specified slice
Pick up #119
Fixes: docker/docker#16681

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-02 23:57:02 -05:00
Mrunal Patel 3317785f56 Merge pull request #420 from runcom/cgroups-unsupported
libcontainer: configs: create cgroup_unsupported.go in order to build on darwin as well
2015-11-30 09:20:23 -08:00
Alexander Morozov decba54d78 Merge pull request #424 from runcom/fix-go-vet
libcontainer: network_linux.go: fix go vet
2015-11-30 09:06:41 -08:00
Antonio Murdaca 3029587085 libcontainer: network_linux.go: fix go vet
This patch fixes the following go vet warnings:
```
libcontainer/network_linux.go:96: github.com/vishvananda/netlink.Device
composite literal uses unkeyed fields
libcontainer/network_linux.go:114: github.com/vishvananda/netlink.Device
composite literal uses unkeyed fields
```

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-30 12:31:18 +01:00
Rajasekaran 49ff2711e1 Fixing xattr test step issue
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2015-11-29 09:24:42 +05:30
Antonio Murdaca 112493115f libcontainer: configs: create cgroup_unsupported.go in order to build on darwin as well
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-27 10:28:29 +01:00
Daniel, Dao Quang Minh d914bf7347 setns: add bootstrap data
add bootstrap data to setns process. If we have any bootstrap data then copy it
to the bootstrap process (i.e. nsexec) using the sync pipe. This will allow us
to eventually replace environment variable usage with more structured data
to setup namespaces, write pid/gid map, setgroup etc.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-11-22 11:36:58 +00:00
rajasec 949d822675 Adding error conditions when apparmor disabled
Signed-off-by: rajasec <rajasec79@gmail.com>

Add the changes to errors in lower case

Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-22 13:14:18 +05:30
Antonio Murdaca 400e05fe5b libcontainer: configs: extend unsupported os
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-19 18:24:34 +01:00
Alexander Morozov 776791463d Merge pull request #357 from ashahab-altiscale/350-container-in-container
Bind mount device nodes on EPERM
2015-11-16 14:54:02 -08:00
Qiang Huang 96f0eefa1a Fix comment to be consistent with the code
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-16 19:16:27 +08:00
Abin Shahab 28c9d0252c Userns container in containers
Enables launching userns containers by catching EPERM errors for writing
to devices cgroups, and for mknod invocations.

Signed-off-by: Abin Shahab <ashahab@altiscale.com>
2015-11-15 14:42:35 -08:00
Alexander Morozov 48fdc50d09 Merge pull request #398 from crosbymichael/seccomp-trace
Add seccomp trace support
2015-11-13 10:54:18 -08:00
Alexander Morozov bda4ca2f8f Merge pull request #388 from hqhq/hq_cgroup_cleanups
Some cgroup cleanups
2015-11-13 09:06:18 -08:00
Michael Crosby caca840972 Add seccomp trace support
Closes #347

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-12 17:03:53 -08:00
Michael Crosby 2be14dc963 Merge pull request #392 from mrunalp/poststart
Add poststart hooks
2015-11-12 16:34:38 -08:00
Michael Crosby 879dfdd980 Fix race setting process opts
When starting and quering for pids a container can start and exit before
this is set.  So set the opts after the process is started and while
libcontainer still has the container's process blocking on the pipe.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-06 16:51:59 -08:00
Mrunal Patel 452e8a73c5 Integrate poststart hooks with spec
* Call poststart hooks after the container is started
* Tie in with spec configuration

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-11-06 18:03:32 -05:00
Mrunal Patel bb2d3cd1be Add Poststart hook to libcontainer config
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-11-06 18:02:50 -05:00
Qiang Huang 209c8d9979 Add some comments about cgroup
We fixed some bugs and introduced some code hard to be
understood, add some comments for them.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
Qiang Huang 8c98ae27ac Refactor cgroupData
The former cgroup entry is confusing, separate it to parent
and name.
Rename entry `c` to `config`.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
Qiang Huang a263afaf6c Rename parent and data
'parent' function is confusing with parent cgroup, it's actually
parent path, so rename it to parentPath.

The name 'data' is too common to be identified, rename it to cgroupData
which is exactly what it is.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
John Howard a919bd3f67 Windows: Refactor Container interface
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-11-02 15:12:16 -08:00
Mrunal Patel c42a2952c4 Merge pull request #361 from jhowardmsft/jjh/criu_opts
Windows: Factor down criu_opts
2015-11-02 15:05:27 -08:00
Mrunal Patel 7caef5626b Merge pull request #359 from jhowardmsft/jjh/state_struct
Windows: Refactor state struct
2015-11-02 15:04:12 -08:00
Mrunal Patel cf73b32eeb Merge pull request #343 from hqhq/hq_unify_behavior_for_memory
Unify behavior for memory cgroup
2015-11-02 14:58:31 -08:00
Michael Crosby 26eb6a1bcd Merge pull request #377 from rhatdan/label
Docker needs to know whether the user requested a relabel
2015-11-02 14:55:27 -08:00