Commit Graph

3750 Commits

Author SHA1 Message Date
Michael Crosby 25f3f893c8
Merge pull request #1939 from cyphar/nokmem-error
cgroups: nokmem: error out on explicitly-set kmemcg limits
2018-12-04 11:14:56 -05:00
Michael Crosby 96ec2177ae
Merge pull request #1943 from giuseppe/allow-to-signal-paused-containers
kill: allow to signal paused containers
2018-12-03 16:55:13 -05:00
Michael Crosby ff38d6e7cc
Merge pull request #1944 from Ace-Tang/criu_notify_pid
cr: get pid from criu notify when restore
2018-12-03 10:35:58 -05:00
Ace-Tang dce70cdff5 cr: get pid from criu notify when restore
When restoring a container from a checkpoint directory, we should get
the pid from criu notify, since c.initProcess has not been created yet.

Signed-off-by: Ace-Tang <aceapril@126.com>
2018-12-03 13:31:20 +08:00
Aleksa Sarai 8a4629f7b5
cgroups: nokmem: error out on explicitly-set kmemcg limits
When built with nokmem we are explicitly disabling support for kmemcg,
but it is a strict specification requirement that we error out if we
cannot fulfil an aspect of the container configuration.

Completely ignoring explicitly-requested kmemcg limits with nokmem would
undoubtedly lead to problems.

Fixes: 6a2c155968 ("libcontainer: ability to compile without kmem")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-01 14:31:35 +11:00
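The behaviour described in 8a4629f7b5 above can be illustrated with a minimal Python sketch. It is a model of the idea, not runc's Go code: the function name, the `nokmem` flag parameter, and the exception type are hypothetical, and a limit of 0 is assumed to mean "not set".

```python
class KmemNotSupportedError(Exception):
    """Raised when a kmem limit is requested but kmem support is compiled out."""


def check_kmem_limit(kernel_memory, nokmem):
    """Reject an explicitly-set kmemcg limit when built without kmem support.

    kernel_memory: requested limit in bytes (assumption: 0 means "not set").
    nokmem: True if the hypothetical build disabled kmem support.
    """
    if nokmem and kernel_memory != 0:
        # Silently ignoring the request would violate the configuration,
        # so fail loudly instead.
        raise KmemNotSupportedError(
            "kernel memory limit requested, but this build has no kmem support"
        )
    return None
```

The point of the sketch is that an unset limit passes in either build, while an explicit limit is only rejected when kmem support is absent.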
Giuseppe Scrivano 07d1ad44c8
kill: allow to signal paused containers
regression introduced by 87a188996e

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-30 23:35:47 +01:00
Michael Crosby 4932620b62
Merge pull request #1919 from xiaochenshen/rdt-mba-software-controller
libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
2018-11-26 16:45:42 -05:00
Michael Crosby 1d1597315c
Merge pull request #1940 from cyphar/rohit-maintainership
MAINTAINERS: remove @rjnagal and @vmarmol
2018-11-26 15:37:08 -05:00
Aleksa Sarai a020000185
MAINTAINERS: remove @vmarmol
After discussion with Victor, he mentioned that he wanted to rescind
his maintainership a few years ago (due to a change in priorities and
what he's been working on) but wasn't sure what the right process is.

Thanks for your hard work Victor!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-24 23:19:12 +11:00
Aleksa Sarai 2efedb02aa
MAINTAINERS: remove @rjnagal
After talking to Rohit, he mentioned that he wasn't aware he was still a
maintainer (and that his maintainership was grandfathered from his
Docker maintainership). He's moved on to other projects now, and thus
said he would happily step down as maintainer. (Since he's stepping down
voluntarily, this doesn't require a mailing-list vote.)

Thanks for all of your hard work, Rohit!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-23 09:47:50 +11:00
Aleksa Sarai 9397a6f62e
merge branch 'pr-1937'
VERSION: back to development
  VERSION: release v1.0.0~rc6

Votes: +5 -0 /2
LGTMs: @crosbymichael @hqhq
Closes #1937
2018-11-22 22:47:26 +11:00
Michael Crosby 50e2634995
Merge pull request #1934 from lifubang/kill
fix: may kill other process when container has been stopped
2018-11-21 10:30:25 -05:00
Lifubang 87a188996e may kill other process when container has been stopped
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-11-21 17:44:52 +08:00
Aleksa Sarai 061dfe95ad
VERSION: back to development
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:55:26 +11:00
Aleksa Sarai ccb5efd37f
VERSION: release v1.0.0~rc6
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:54:59 +11:00
Aleksa Sarai 73856f6d6f
merge branch 'pr-1936'
Small fixes for CRIU based test cases

LGTMs: @crosbymichael @cyphar
Closes #1936
2018-11-20 12:26:02 +11:00
Aleksa Sarai ceefc3fe4e
merge branch 'pr-1741'
libcontainer: Set 'status' in hook stdin

LGTMs: @cyphar @crosbymichael
Closes #1741
2018-11-20 06:39:30 +11:00
Adrian Reber bc0b047198
Small fixes for CRIU based test cases
This removes unnecessary lines from checkpoint.bats like:

 sed -i 's;"readonly": true;"readonly": false;' config.json

and adds (and corrects) comments which are leftover from older
versions of checkpoint.bats.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-19 16:08:29 +01:00
Michael Crosby 785fd9c39d
Merge pull request #1935 from adrianreber/criu-3.11
Bump CRIU to 3.11
2018-11-19 09:32:06 -05:00
Adrian Reber 3763427777
Bump CRIU to 3.11
Upgrade CRIU to 3.11 in the Dockerfile as it includes the patch which
was manually added to fix an error with read-only root containers.

Now that the patch is part of the CRIU 3.11 release this simplifies the
Dockerfile (minimally).

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-19 10:08:09 +01:00
Aleksa Sarai bbe4b6fbfc
merge branch 'pr-1930'
add missing intelRdt parameters in 'runc update' manpage

LGTMs: @crosbymichael @cyphar
Closes #1930
2018-11-18 22:41:09 +11:00
Michael Crosby 76520a4bf0
Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint
Respect container's cgroup path
2018-11-16 14:06:54 -05:00
Lin Yang 4818971526 add missing intelRdt parameters in 'runc update' manpage
Signed-off-by: Lin Yang <lin.a.yang@intel.com>
2018-11-14 15:10:47 -08:00
W. Trevor King e23868603a libcontainer: Set 'status' in hook stdin
Finish off the work started in a344b2d6 (sync up `HookState` with OCI
spec `State`, 2016-12-19, #1201).

And drop HookState, since there's no need for a local alias for
specs.State.

Also set c.initProcess in newInitProcess to support OCIState calls
from within initProcess.start().  I think the cyclic references
between linuxContainer and initProcess are unfortunate, but didn't
want to address that here.

I've also left the timing of the Prestart hooks alone, although the
spec calls for them to happen before start (not as part of creation)
[1,2].  Once the timing gets fixed we can drop the
initProcessStartTime hacks which initProcess.start currently needs.

I'm not sure why we trigger the prestart hooks in response to both
procReady and procHooks.  But we've had two prestart rounds in
initProcess.start since 2f276498 (Move pre-start hooks after container
mounts, 2016-02-17, #568).  I've left that alone too.

I really think we should have len() guards to avoid computing the
state when .Hooks is non-nil but the particular phase we're looking at
is empty.  Aleksa, however, is adamantly against them [3] citing a
risk of sloppy copy/pastes causing the hook slice being len-guarded to
diverge from the hook slice being iterated over within the guard.  I
think that sort of thing is very low-risk, because:

* We shouldn't be copy/pasting this, right?  DRY for the win :).
* There's only ever a few lines between the guard and the guarded
  loop.  That makes broken copy/pastes easy to catch in review.
* We should have test coverage for these.  Guarding with the wrong
  slice is certainly not the only thing you can break with a sloppy
  copy/paste.

But I'm not a maintainer ;).

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-11-14 06:49:49 -08:00
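The len() guard discussed in e23868603a above can be sketched in Python. This is only an illustration of the idea, not runc's Go code: `run_hooks`, `compute_state`, and the dictionary layout of `hooks` are hypothetical names chosen for the example.

```python
def run_hooks(hooks, phase, compute_state):
    """Run the hooks of one phase, computing the state only if needed.

    hooks: dict mapping a phase name to a list of callables, or None.
    compute_state: callable returning the state passed to every hook.
    Returns the number of hooks that were run.
    """
    # Bind the slice once, so the guard and the loop cannot diverge.
    phase_hooks = (hooks or {}).get(phase, [])
    if len(phase_hooks) == 0:
        # The guard: skip the (possibly expensive) state computation.
        return 0
    state = compute_state()
    for hook in phase_hooks:
        hook(state)
    return len(phase_hooks)
```

Binding the per-phase list to a single name before the guard is one way to address the copy/paste concern raised in the message, since the guarded slice and the iterated slice are then the same object by construction.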
Michael Crosby 10d38b660a
Merge pull request #1897 from cyphar/epoll-console-fixup
tty: clean up epollConsole closing
2018-11-13 16:52:38 -05:00
Mrunal Patel 4769cdf607
Merge pull request #1916 from crosbymichael/cgns
Add support for cgroup namespace
2018-11-13 12:21:38 -08:00
Mrunal Patel f000fe11ec
Merge pull request #1917 from slp/master
libcontainer: map PidsLimit to systemd's TasksMax property
2018-11-13 12:21:23 -08:00
Michael Crosby aa7917b751
Merge pull request #1911 from theSuess/linter-fixes
Various cleanups to address linter issues
2018-11-13 12:13:34 -05:00
Michael Crosby bd420b59f1
Merge pull request #1925 from Ace-Tang/fix_dup_ns
test: fix TestDupNamespaces fail to test dup-ns error
2018-11-13 12:11:11 -05:00
Xiaochen Shen 95af9eff82 libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
The MBA Software Controller feature was introduced in Linux kernel v4.18.
It is a software enhancement that mitigates some limitations in MBA which
are described in the kernel documentation. It also makes the interface more
user-friendly: memory bandwidth can be specified in "MBps" (megabytes per
second) as well as in "percentages".

The kernel uses a software feedback mechanism, or "Software Controller",
which reads the actual bandwidth using MBM counters and adjusts the memory
bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

This feature can be enabled through the mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

In runc, we handle both memory bandwidth schemata in a unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.

For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-11-13 23:27:08 +08:00
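The unified schema format quoted in 95af9eff82 above, "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...", can be parsed with a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, cache ids and bandwidths are assumed to be decimal integers, and the sketch does not know whether the values are percentages or MBps, since that depends on how resctrl was mounted.

```python
def parse_mb_schema(line):
    """Parse 'MB:<cache_id>=<bandwidth>;...' into {cache_id: bandwidth}."""
    prefix, sep, body = line.strip().partition(":")
    if sep != ":" or prefix != "MB":
        raise ValueError("not an MB schema line: %r" % line)
    bandwidths = {}
    for entry in body.split(";"):
        if not entry:
            continue  # tolerate a trailing ';'
        cache_id, eq, value = entry.partition("=")
        if eq != "=":
            raise ValueError("malformed entry: %r" % entry)
        # The unit (percent or MBps) is not encoded in the line itself.
        bandwidths[int(cache_id)] = int(value)
    return bandwidths
```

For example, `parse_mb_schema("MB:0=20;1=70")` returns `{0: 20, 1: 70}`; the same call shape works for MBps values such as `"MB:0=5000;1=7000"`.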
Aleksa Sarai bb522d6eca
merge branch 'pr-1928'
rootless: fix potential panic in shouldUseRootlessCgroupManager

LGTMs: @crosbymichael @cyphar
Closes #1928
2018-11-13 16:04:44 +11:00
Ace-Tang 714a4d466a rootless: fix potential panic in shouldUseRootlessCgroupManager
Signed-off-by: Ace-Tang <aceapril@126.com>
2018-11-10 21:04:36 +08:00
Michael Crosby 079817cc26
Merge pull request #1926 from Ace-Tang/fix_spec_proc
libcontainer: fix potential panic if spec.Process is nil
2018-11-07 10:07:30 -05:00
Ace-Tang 16d55f17a8 libcontainer: fix potential panic if spec.Process is nil
The pointer 'spec.Process' should be checked for nil first
to avoid a panic.

Signed-off-by: Ace-Tang <aceapril@126.com>
2018-11-06 11:55:30 +08:00
Ace-Tang 95d1aa1886 test: fix TestDupNamespaces
Add Root to the created spec; otherwise the error message is 'Root must be specified'.

Signed-off-by: Ace-Tang <aceapril@126.com>
2018-11-06 11:36:27 +08:00
Michael Crosby b1068fb925
Merge pull request #1814 from rhatdan/selinux
SELinux labels are tied to the thread
2018-11-05 10:00:11 -05:00
Mrunal Patel 15b24b70df
Merge pull request #1922 from kolyshkin/cgo
Makefile: rm cgo tag
2018-11-02 09:35:33 -07:00
Aleksa Sarai df8dd8d940
merge branch 'pr-1923'
readme: add nokmem build tag

LGTMs: @crosbymichael @cyphar
Closes #1923
2018-11-03 03:04:40 +11:00
Ace-Tang f1b1407e1b readme: add nokmem build tag
Signed-off-by: Ace-Tang <aceapril@126.com>
2018-11-02 11:56:54 +08:00
Kir Kolyshkin 1e0d04c642 Makefile: rm cgo tag
There is no need to explicitly add the `cgo` build tag; it is set by
the go tools if cgo is enabled.

Fixes: ecd6463101

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-11-01 17:01:12 -07:00
Aleksa Sarai 9f1e94488e
merge branch 'pr-1921'
libcontainer: ability to compile without kmem

LGTMs: @mrunalp @cyphar
Closes #1921
2018-11-02 09:54:16 +11:00
Michael Crosby 9e5aa7494d
Merge pull request #1918 from giuseppe/skip-setgroups
rootless: fix running with /proc/self/setgroups set to deny
2018-11-01 13:16:47 -04:00
Kir Kolyshkin 6a2c155968 libcontainer: ability to compile without kmem
Commit fe898e7862 (PR #1350) enables kernel memory accounting
for all cgroups created by libcontainer -- even if kmem limit is
not configured.

Kernel memory accounting is known to be broken in some kernels,
specifically the ones from RHEL7 (including RHEL 7.5). Those
kernels do not support kernel memory reclaim, and are prone to
oopses. Unconditionally enabling kmem acct on such kernels leads
to bugs, such as

* https://github.com/opencontainers/runc/issues/1725
* https://github.com/kubernetes/kubernetes/issues/61937
* https://github.com/moby/moby/issues/29638

This commit gives a way to compile runc without kernel memory setting
support. To do so, use something like

	make BUILDTAGS="seccomp nokmem"

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2018-10-31 20:35:51 -07:00
Yuanhong Peng df3fa115f9 Add support for cgroup namespace
The cgroup namespace can be configured in `config.json` like other
namespaces. Here is an example:

```
"namespaces": [
	{
		"type": "pid"
	},
	{
		"type": "network"
	},
	{
		"type": "ipc"
	},
	{
		"type": "uts"
	},
	{
		"type": "mount"
	},
	{
		"type": "cgroup"
	}
],

```

Note that if you want to run a container that shares a cgroup ns with
another container, it is strongly recommended that you set a proper
`CgroupsPath` for both containers (the second container's cgroup path
must be a subdirectory of the first one's). Otherwise there might be
some unexpected results.

Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-31 10:51:43 -04:00
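The path requirement stated in df3fa115f9 above (the second container's cgroup path must be a subdirectory of the first one's) could be checked before launching the second container. The following Python sketch is an illustration only: the function name and the example paths are hypothetical, and both paths are assumed to be absolute.

```python
import posixpath


def is_cgroup_subdir(parent, child):
    """Return True if cgroup path `child` lies strictly below `parent`.

    Both arguments are assumed to be absolute cgroup paths.
    """
    parent = posixpath.normpath(parent)
    child = posixpath.normpath(child)
    # commonpath compares whole path components, so "/a/b" is not
    # treated as a parent of "/a/bc".
    return child != parent and posixpath.commonpath([parent, child]) == parent
```

Comparing whole path components rather than string prefixes matters here: a plain `startswith` test would wrongly accept `/runc/ctr10` as a child of `/runc/ctr1`.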
Chris Aniszczyk f3ce8221ea
Merge pull request #1913 from xiaochenshen/rdt-add-diagnostics
libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors
2018-10-25 14:27:17 -05:00
Giuseppe Scrivano 869add3318
rootless: fix running with /proc/self/setgroups set to deny
This is a regression from 06f789cf26
when the user namespace was configured without a privileged helper.
To allow a single mapping in a user namespace, it is necessary to set
/proc/self/setgroups to "deny".

For a simple reproducer, the user namespace can be created with
"unshare -r".

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-10-25 15:44:15 +02:00
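One way a runtime could cope with the situation described in 869add3318 above is to inspect /proc/self/setgroups before trying to set supplementary groups. The Python sketch below assumes that approach; it is not taken from the commit, and both function names are hypothetical. It relies only on the kernel's documented file contents, "allow" or "deny".

```python
def supplementary_groups_allowed(setgroups_text):
    """Return False when the setgroups file content says "deny".

    setgroups_text: the raw content of a setgroups file, e.g. "deny\n".
    """
    return setgroups_text.strip() != "deny"


def read_setgroups(path="/proc/self/setgroups"):
    """Read the setgroups file of the current process (Linux only)."""
    with open(path) as f:
        return f.read()
```

A caller would skip the setgroups(2) call whenever `supplementary_groups_allowed(read_setgroups())` is false, which is the case inside a namespace created with "unshare -r".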
Sergio Lopez 5c6b9c3c1c libcontainer: map PidsLimit to systemd's TasksMax property
Currently runc applies PidsLimit restriction by writing directly to
cgroup's pids.max, without notifying systemd. As a consequence, when the
latter updates the context of the corresponding scope, pids.max is reset
to the value of systemd's TasksMax property.

This can be easily reproduced this way (I'm using "postfix" here just as an
example; any unrelated but existing service will do):

 # CTR=`docker run --pids-limit 111 --detach --rm busybox /bin/sleep 8h`
 # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max
 111
 # systemctl disable --now postfix
 # systemctl enable --now postfix
 # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max
 max

This patch adds TasksAccounting=true and TasksMax=PidsLimit to the
properties sent to systemd.

Signed-off-by: Sergio Lopez <slp@redhat.com>
2018-10-24 17:20:27 +02:00
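The mapping described in 5c6b9c3c1c above, from PidsLimit to the systemd properties TasksAccounting and TasksMax, can be sketched in Python. This models the idea only and is not runc's Go code: the function name and the (name, value) tuple representation are hypothetical, and treating a non-positive limit as "no limit configured" is an assumption of the sketch.

```python
def systemd_pids_properties(pids_limit):
    """Build the systemd unit properties that mirror a pids limit.

    Returns a list of (property_name, value) pairs.
    Assumption: pids_limit <= 0 means no limit was configured.
    """
    if pids_limit <= 0:
        return []
    return [
        ("TasksAccounting", True),  # enable task accounting for the scope
        ("TasksMax", pids_limit),   # keep systemd's view equal to pids.max
    ]
```

With both values sent to systemd, a later update of the scope would reapply the same limit instead of resetting pids.max to "max".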
Aleksa Sarai e93996674f
merge branch 'pr-1903'
clarify license information

LGTMs: @hqhq @cyphar
Closes #1903
2018-10-24 22:03:44 +11:00
Aleksa Sarai 9a3a8a5ebf libcontainer: implement CLONE_NEWCGROUP
This is a very simple implementation because it doesn't require any
configuration unlike the other namespaces, and in its current state it
only masks paths.

This feature is available in Linux 4.6+ and is enabled by default for
kernels compiled with CONFIG_CGROUPS=y.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-10-23 16:23:00 -04:00
Michael Crosby 7ca079fdeb
Merge pull request #1915 from HaraldNordgren/go_versions
Bump Travis versions
2018-10-23 16:22:37 -04:00