Commit Graph

3870 Commits

Author SHA1 Message Date
Aleksa Sarai 565325fc36
integration: fix mis-use of libcontainer.Factory
For some reason, libcontainer/integration has a whole bunch of incorrect
usages of libcontainer.Factory -- causing test failures with a set of
security patches that will be published soon. Fixing this is fairly
trivial (switch to creating a new libcontainer.Factory once in each
process, rather than creating one in TestMain globally).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-01-24 23:12:48 +13:00
Adrian Reber dd50c7e332
Add 'org.criu.config' annotation documentation
Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-15 19:54:47 +01:00
Adrian Reber 5f32bb94fd
Update runc-checkpoint man-page
This just copies the latest output from 'runc checkpoint --help' to the
man page.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-15 19:54:47 +01:00
Michael Crosby c1e454b2a1
Merge pull request #1960 from giuseppe/fix-kmem-systemd
systemd: fix setting kernel memory limit
2019-01-15 13:21:01 -05:00
Michael Crosby 4e9d52da54
Merge pull request #1933 from adrianreber/master
Add CRIU configuration file support
2019-01-15 11:22:38 -05:00
Aleksa Sarai 12f6a99120
merge branch 'pr-1962'
rootfs: umount all procfs and sysfs with --no-pivot

LGTMs: @mrunalp @cyphar
Closes #1962
2019-01-15 15:15:53 +11:00
Giuseppe Scrivano 28a697cce3
rootfs: umount all procfs and sysfs with --no-pivot
When creating a new user namespace, the kernel doesn't allow mounting
a new procfs or sysfs file system if there is not already one instance
fully visible in the current mount namespace.

When using --no-pivot we were effectively inhibiting this protection
from the kernel, as /proc and /sys from the host are still present in
the container mount namespace.

A container without full access to /proc could then create a new user
namespace, and from there be able to mount a fully visible /proc, bypassing
the limitations in the container.

A simple reproducer for this issue is:

unshare -mrfp sh -c "mount -t proc none /proc && echo c > /proc/sysrq-trigger"

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-14 09:53:35 +01:00
Giuseppe Scrivano f01923376d
systemd: fix setting kernel memory limit
Since commit df3fa115f9 it is not
possible to set a kernel memory limit when using the systemd cgroups
backend as we use cgroup.Apply twice.

Skip enabling kernel memory if there are already tasks in the cgroup.

Without this patch, runc fails with:

container_linux.go:344: starting container process caused
"process_linux.go:311: applying cgroup configuration for process
caused \"failed to set memory.kmem.limit_in_bytes, because either
tasks have already joined this cgroup or it has children\""

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-10 11:33:50 +01:00
Xiaochen Shen acb75d0e38 libcontainer: intelrdt: fix null intelrdt path issue in Destroy()
This patch fixes a corner case when destroying a container:

Suppose we start a container without the 'intelRdt' config set, and then run
"runc update --l3-cache-schema/--mem-bw-schema" to add the 'intelRdt' config
implicitly.

If we then enter "exit" from inside the container, we pass through
linuxContainer.Destroy() -> state.destroy() -> intelRdtManager.Destroy().
But in IntelRdtManager.Destroy(), IntelRdtManager.Path is still an empty
string because it hasn't been initialized yet. As a result, the rdt group
directory created during "runc update" is not removed as expected.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2019-01-05 00:34:25 +08:00
Adrian Reber 403986c5dd
Add CRIU patch to fix checkpoint test
For the newly integrated feature to use CRIU configuration files the
test is broken without an additional CRIU patch.

The test changes CRIU's log file. Changing the log file is unfortunately
the only thing which is broken in CRIU 3.11. But it is the easiest
option for testing. With CRIU 3.12 this will be fixed. All other CRIU
options can be changed with a CRIU configuration file.

With this change the CRIU 3.11 feature can be merged into runc with a
test and for the user it should just work, if they are not trying to
change CRIU's log file.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Adrian Reber 6f3e13cc48
Added test for container specific CRIU configuration files
Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Adrian Reber e157963054
Enable CRIU configuration files
CRIU 3.11 introduces configuration files:

https://criu.org/Configuration_files
https://lisas.de/~adrian/posts/2018-Nov-08-criu-configuration-files.html

This enables the user to influence CRIU's behaviour without code changes
if using new CRIU features or if the user wants to enable certain CRIU
behaviour without always specifying certain options.

With this it is possible to write 'tcp-established' to the configuration
file:

$ echo tcp-established > /etc/criu/runc.conf

and from now on all checkpoints will preserve the state of established
TCP connections. This removes the need to always use

$ runc checkpoint --tcp-established

if the goal is to always checkpoint with '--tcp-established'.

It also adds the possibility for unexpected CRIU behaviour if the user
created a configuration file at some point in time and forgets about it.

As a result of the discussion in https://github.com/opencontainers/runc/pull/1933
it is now also possible to define a CRIU configuration file for each
container with the annotation 'org.criu.config'.

If 'org.criu.config' does not exist, runc will tell CRIU to use
'/etc/criu/runc.conf' if it exists.

If 'org.criu.config' is set to an empty string (''), runc will tell CRIU
to not use any runc specific configuration file at all.

If 'org.criu.config' is set to a non-empty string, runc will use that
value as an additional configuration file for CRIU.

With the annotation the user can decide to use the default configuration
file ('/etc/criu/runc.conf'), none or a container specific configuration
file.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
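The three 'org.criu.config' cases described in e157963054 can be summarised in a small Go sketch. This is only an illustration of the stated rules, not runc's actual code: the names criuConfigFile and fileExists are hypothetical, and the example path in main is made up.

```go
package main

import (
	"fmt"
	"os"
)

// Default file named in the commit message.
const defaultCriuConfig = "/etc/criu/runc.conf"

// criuConfigFile (hypothetical helper) returns the configuration file that
// should be passed to CRIU, and false if no runc specific file should be used.
func criuConfigFile(annotations map[string]string, exists func(string) bool) (string, bool) {
	value, set := annotations["org.criu.config"]
	switch {
	case !set:
		// Annotation missing: fall back to the default file if it exists.
		if exists(defaultCriuConfig) {
			return defaultCriuConfig, true
		}
		return "", false
	case value == "":
		// Annotation set to the empty string: use no configuration file.
		return "", false
	default:
		// Annotation set to a path: use that container specific file.
		return value, true
	}
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	cases := []map[string]string{
		nil,
		{"org.criu.config": ""},
		{"org.criu.config": "/etc/criu/special.conf"}, // made-up path
	}
	for _, annotations := range cases {
		path, ok := criuConfigFile(annotations, fileExists)
		fmt.Printf("%q %v\n", path, ok)
	}
}
```

Passing the existence check in as a function keeps the decision logic testable without touching the file system.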
Adrian Reber 360ba8a27d
Update criurpc definition for latest features
Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Michael Crosby bbb17efcb4
Merge pull request #1952 from JoeWrightss/patch-4
Fix .Fatalf() error message
2018-12-20 09:18:50 -05:00
JoeWrightss 0855bce448 Fix .Fatalf() error message
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-19 20:22:48 +08:00
Tom Godkin bdf3524b34 Retry adding pids to cgroups when EINVAL occurs
The kernel will sometimes return EINVAL when writing a pid to a
cgroup.procs file. It does so when the task being added still has the
state TASK_NEW.

See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
2018-12-17 15:34:47 +00:00
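The retry idea from bdf3524b34 can be sketched in Go as follows. This is a minimal illustration under assumptions, not the code from the commit: writeWithRetry, errTaskNew, the attempt count and the delay are all hypothetical, and the cgroup.procs write is simulated by a closure.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
	"time"
)

// errTaskNew is the error treated as retryable: the kernel may return EINVAL
// while the task being added is still in the TASK_NEW state.
var errTaskNew error = syscall.EINVAL

// writeWithRetry (hypothetical helper) calls write up to attempts times.
// It retries only on EINVAL and returns any other error immediately.
func writeWithRetry(write func() error, attempts int, delay time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = write()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errTaskNew) {
			return err
		}
		time.Sleep(delay)
	}
	return err
}

func main() {
	calls := 0
	// Simulated write to cgroup.procs: fails twice with EINVAL, then succeeds.
	write := func() error {
		calls++
		if calls < 3 {
			return syscall.EINVAL
		}
		return nil
	}
	err := writeWithRetry(write, 5, time.Millisecond)
	fmt.Println(err, calls)
}
```

Bounding the number of attempts keeps a persistent EINVAL (for example, from a genuinely invalid write) from looping forever.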
Aleksa Sarai f5b99917df
merge branch 'pr-1945'
Fix some typos

LGTMs: @crosbymichael @cyphar
Closes #1945
2018-12-11 03:43:44 +11:00
JoeWrightss 769d6c4a75 Fix some typos
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-09 23:52:54 +08:00
Daniel, Dao Quang Minh 859f74576e
Merge pull request #1942 from KentaTada/fix-kernel-config-to-adjust-to-moby
Modify check-config.sh in accordance with Moby Project updates
2018-12-08 20:15:53 +00:00
Michael Crosby 25f3f893c8
Merge pull request #1939 from cyphar/nokmem-error
cgroups: nokmem: error out on explicitly-set kmemcg limits
2018-12-04 11:14:56 -05:00
Michael Crosby 96ec2177ae
Merge pull request #1943 from giuseppe/allow-to-signal-paused-containers
kill: allow to signal paused containers
2018-12-03 16:55:13 -05:00
Michael Crosby ff38d6e7cc
Merge pull request #1944 from Ace-Tang/criu_notify_pid
cr: get pid from criu notify when restore
2018-12-03 10:35:58 -05:00
Ace-Tang dce70cdff5 cr: get pid from criu notify when restore
When restoring a container from a checkpoint directory, we should get the
pid from criu notify, since c.initProcess has not been created yet.

Signed-off-by: Ace-Tang <aceapril@126.com>
2018-12-03 13:31:20 +08:00
Aleksa Sarai 8a4629f7b5
cgroups: nokmem: error out on explicitly-set kmemcg limits
When built with nokmem we are explicitly disabling support for kmemcg,
but it is a strict specification requirement that we error out if we
cannot fulfil an aspect of the container configuration.

Completely ignoring explicitly-requested kmemcg limits with nokmem would
undoubtedly lead to problems.

Fixes: 6a2c155968 ("libcontainer: ability to compile without kmem")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-01 14:31:35 +11:00
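The behaviour argued for in 8a4629f7b5 (fail rather than silently ignore a requested limit) could look roughly like the Go sketch below. All names here (memoryLimits, checkKmem, errKmemDisabled) are hypothetical and the boolean stands in for the nokmem build tag; runc's real implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// errKmemDisabled is a hypothetical error for builds without kmemcg support.
var errKmemDisabled = errors.New("kernel memory accounting disabled in this build")

// memoryLimits is a hypothetical, reduced view of a container's memory config.
type memoryLimits struct {
	KernelMemory int64 // 0 means "not set"
}

// checkKmem returns an error when a kernel memory limit was explicitly
// requested but the build cannot honour it.
func checkKmem(limits memoryLimits, kmemSupported bool) error {
	if !kmemSupported && limits.KernelMemory != 0 {
		return errKmemDisabled
	}
	return nil
}

func main() {
	fmt.Println(checkKmem(memoryLimits{}, false))
	fmt.Println(checkKmem(memoryLimits{KernelMemory: 1 << 20}, false))
	fmt.Println(checkKmem(memoryLimits{KernelMemory: 1 << 20}, true))
}
```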
Giuseppe Scrivano 07d1ad44c8
kill: allow to signal paused containers
Regression introduced by 87a188996e

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-30 23:35:47 +01:00
Kenta Tada 30817421ef Modify check-config.sh in accordance with Moby Project updates
This commit modifies check-config.sh to keep up with current kernel config.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2018-11-30 16:38:19 +09:00
Michael Crosby 4932620b62
Merge pull request #1919 from xiaochenshen/rdt-mba-software-controller
libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
2018-11-26 16:45:42 -05:00
Michael Crosby 1d1597315c
Merge pull request #1940 from cyphar/rohit-maintainership
MAINTAINERS: remove @rjnagal and @vmarmol
2018-11-26 15:37:08 -05:00
Aleksa Sarai a020000185
MAINTAINERS: remove @vmarmol
After discussion with Victor, he mentioned that he wanted to rescind
his maintainership a few years ago (due to a change in priorities and
what he's been working on) but wasn't sure what the right process is.

Thanks for your hard work Victor!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-24 23:19:12 +11:00
Aleksa Sarai 2efedb02aa
MAINTAINERS: remove @rjnagal
After talking to Rohit, he mentioned that he wasn't aware he was still a
maintainer (and that his maintainership was grandfathered from his
Docker maintainership). He's moved on to other projects now, and thus
said he would happily step down as maintainer. (Since he's stepping down
voluntarily, this doesn't require a mailing-list vote.)

Thanks for all of your hard work, Rohit!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-23 09:47:50 +11:00
Aleksa Sarai 9397a6f62e
merge branch 'pr-1937'
VERSION: back to development
  VERSION: release v1.0.0~rc6

Votes: +5 -0 /2
LGTMs: @crosbymichael @hqhq
Closes #1937
2018-11-22 22:47:26 +11:00
Michael Crosby 50e2634995
Merge pull request #1934 from lifubang/kill
fix: may kill other process when container has been stopped
2018-11-21 10:30:25 -05:00
Lifubang 87a188996e may kill other process when container has been stopped
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-11-21 17:44:52 +08:00
Aleksa Sarai 061dfe95ad
VERSION: back to development
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:55:26 +11:00
Aleksa Sarai ccb5efd37f
VERSION: release v1.0.0~rc6
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:54:59 +11:00
Aleksa Sarai 73856f6d6f
merge branch 'pr-1936'
Small fixes for CRIU based test cases

LGTMs: @crosbymichael @cyphar
Closes #1936
2018-11-20 12:26:02 +11:00
Aleksa Sarai ceefc3fe4e
merge branch 'pr-1741'
libcontainer: Set 'status' in hook stdin

LGTMs: @cyphar @crosbymichael
Closes #1741
2018-11-20 06:39:30 +11:00
Adrian Reber bc0b047198
Small fixes for CRIU based test cases
This removes unnecessary lines from checkpoint.bats like:

 sed -i 's;"readonly": true;"readonly": false;' config.json

and adds (and corrects) comments which are leftover from older
versions of checkpoint.bats.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-19 16:08:29 +01:00
Michael Crosby 785fd9c39d
Merge pull request #1935 from adrianreber/criu-3.11
Bump CRIU to 3.11
2018-11-19 09:32:06 -05:00
Adrian Reber 3763427777
Bump CRIU to 3.11
Upgrade CRIU to 3.11 in the Dockerfile as it includes the patch which
was manually added to fix an error with read-only root containers.

Now that the patch is part of the CRIU 3.11 release this simplifies the
Dockerfile (minimally).

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-19 10:08:09 +01:00
Aleksa Sarai bbe4b6fbfc
merge branch 'pr-1930'
add missing intelRdt parameters in 'runc update' manpage

LGTMs: @crosbymichael @cyphar
Closes #1930
2018-11-18 22:41:09 +11:00
Michael Crosby 76520a4bf0
Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint
Respect container's cgroup path
2018-11-16 14:06:54 -05:00
Lin Yang 4818971526 add missing intelRdt parameters in 'runc update' manpage
Signed-off-by: Lin Yang <lin.a.yang@intel.com>
2018-11-14 15:10:47 -08:00
W. Trevor King e23868603a libcontainer: Set 'status' in hook stdin
Finish off the work started in a344b2d6 (sync up `HookState` with OCI
spec `State`, 2016-12-19, #1201).

And drop HookState, since there's no need for a local alias for
specs.State.

Also set c.initProcess in newInitProcess to support OCIState calls
from within initProcess.start().  I think the cyclic references
between linuxContainer and initProcess are unfortunate, but didn't
want to address that here.

I've also left the timing of the Prestart hooks alone, although the
spec calls for them to happen before start (not as part of creation)
[1,2].  Once the timing gets fixed we can drop the
initProcessStartTime hacks which initProcess.start currently needs.

I'm not sure why we trigger the prestart hooks in response to both
procReady and procHooks.  But we've had two prestart rounds in
initProcess.start since 2f276498 (Move pre-start hooks after container
mounts, 2016-02-17, #568).  I've left that alone too.

I really think we should have len() guards to avoid computing the
state when .Hooks is non-nil but the particular phase we're looking at
is empty.  Aleksa, however, is adamantly against them [3] citing a
risk of sloppy copy/pastes causing the hook slice being len-guarded to
diverge from the hook slice being iterated over within the guard.  I
think that sort of thing is very low-risk, because:

* We shouldn't be copy/pasting this, right?  DRY for the win :).
* There's only ever a few lines between the guard and the guarded
  loop.  That makes broken copy/pastes easy to catch in review.
* We should have test coverage for these.  Guarding with the wrong
  slice is certainly not the only thing you can break with a sloppy
  copy/paste.

But I'm not a maintainer ;).

[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-11-14 06:49:49 -08:00
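For readers unfamiliar with the len() guard discussed in e23868603a, the Go sketch below shows the general shape of the idea: skip the state computation when the hook slice for a phase is empty. It only illustrates the argument made in the message, with hypothetical types (hook, hooks) and a hypothetical runPrestart function; it is not runc's code, and the message itself notes that the guard was not adopted.

```go
package main

import "fmt"

// hook is a hypothetical hook signature that receives the container state.
type hook func(state string) error

// hooks is a hypothetical, reduced hook set with one slice per phase.
type hooks struct {
	Prestart  []hook
	Poststart []hook
}

// runPrestart computes the state only when there is at least one prestart
// hook, then runs the hooks in order.
func runPrestart(h *hooks, computeState func() (string, error)) error {
	// Guard on the same slice that is iterated over below.
	if h == nil || len(h.Prestart) == 0 {
		return nil
	}
	state, err := computeState()
	if err != nil {
		return err
	}
	for _, run := range h.Prestart {
		if err := run(state); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	computed := 0
	compute := func() (string, error) {
		computed++
		return "created", nil
	}

	// Hooks is non-nil but Prestart is empty: the state is never computed.
	_ = runPrestart(&hooks{}, compute)
	fmt.Println("state computations:", computed)

	h := &hooks{Prestart: []hook{func(state string) error {
		fmt.Println("prestart hook saw state:", state)
		return nil
	}}}
	_ = runPrestart(h, compute)
	fmt.Println("state computations:", computed)
}
```

The risk raised in the review discussion is visible here: if the guard checked h.Poststart while the loop ranged over h.Prestart, hooks could be skipped silently.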
Michael Crosby 10d38b660a
Merge pull request #1897 from cyphar/epoll-console-fixup
tty: clean up epollConsole closing
2018-11-13 16:52:38 -05:00
Mrunal Patel 4769cdf607
Merge pull request #1916 from crosbymichael/cgns
Add support for cgroup namespace
2018-11-13 12:21:38 -08:00
Mrunal Patel f000fe11ec
Merge pull request #1917 from slp/master
libcontainer: map PidsLimit to systemd's TasksMax property
2018-11-13 12:21:23 -08:00
Michael Crosby aa7917b751
Merge pull request #1911 from theSuess/linter-fixes
Various cleanups to address linter issues
2018-11-13 12:13:34 -05:00
Michael Crosby bd420b59f1
Merge pull request #1925 from Ace-Tang/fix_dup_ns
test: fix TestDupNamespaces fail to test dup-ns error
2018-11-13 12:11:11 -05:00
Xiaochen Shen 95af9eff82 libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
The MBA Software Controller feature was introduced in Linux kernel v4.18.
It is a software enhancement to mitigate some limitations in MBA which
are described in the kernel documentation. It also makes the interface more
user friendly - we can specify memory bandwidth in "MBps" (megabytes per
second) as well as in "percentages".

The kernel underneath uses a software feedback mechanism or a
"Software Controller" which reads the actual bandwidth using MBM
counters and adjusts the memory bandwidth percentages to ensure:
"actual memory bandwidth < user specified memory bandwidth".

We could enable this feature through mount option "-o mba_MBps":
mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl

In runc, we handle both memory bandwidth schemata in unified format:
"MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The unit of memory bandwidth is specified in "percentages" by default,
and in "MBps" if MBA Software Controller is enabled.

For more information about Intel RDT and MBA Software Controller:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-11-13 23:27:08 +08:00
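To make the unified schemata format from 95af9eff82 concrete, here is a small Go sketch that splits an "MB:<cache_id>=bandwidth;..." string into per-cache values. It is an illustration only: parseMemBwSchema is a hypothetical helper, the cache IDs and bandwidth values are assumed to be plain decimal integers, and runc's own handling of the schema string may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema (hypothetical helper) parses "MB:0=20;1=70" into a map
// from cache ID to bandwidth. The unit (percentage or MBps) depends on
// whether the MBA Software Controller is enabled and is not encoded here.
func parseMemBwSchema(schema string) (map[int]uint64, error) {
	const prefix = "MB:"
	if !strings.HasPrefix(schema, prefix) {
		return nil, fmt.Errorf("schema %q does not start with %q", schema, prefix)
	}
	result := make(map[int]uint64)
	for _, field := range strings.Split(strings.TrimPrefix(schema, prefix), ";") {
		if field == "" {
			continue // tolerate a trailing ';'
		}
		parts := strings.SplitN(field, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("field %q is not of the form id=bandwidth", field)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("invalid cache id in %q: %v", field, err)
		}
		bw, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid bandwidth in %q: %v", field, err)
		}
		result[id] = bw
	}
	return result, nil
}

func main() {
	values, err := parseMemBwSchema("MB:0=20;1=70")
	fmt.Println(values, err)
}
```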