3982 Commits

Author SHA1 Message Date
Adrian Reber 7354546cc8
Create mountpoints also on restore
runc creates all missing mountpoints when it starts a container, this
commit also creates those mountpoints during restore. Now it is possible
to restore a container using the same, but newly created rootfs just as
during container start.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-02-08 15:59:51 +01:00
Adrian Reber f661e02343
factor out bind mount mountpoint creation
During rootfs setup all mountpoints (directories and files) are created
before the bind mounts are mounted. This does not happen during
container restore via CRIU. If restoring in an identical but newly created
rootfs, the restore fails right now. This just factors out the code to
create the bind mount mountpoints so that it also can be used during
restore.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-02-08 15:59:51 +01:00
Aleksa Sarai 6635b4f0c6
merge branch 'cve-2019-5736'
nsenter: clone /proc/self/exe to avoid exposing host binary to container

Fixes: CVE-2019-5736
LGTMs: @cyphar @crosbymichael
2019-02-08 18:58:10 +11:00
Aleksa Sarai 0a8e4117e7
nsenter: clone /proc/self/exe to avoid exposing host binary to container
There are quite a few circumstances where /proc/self/exe pointing to a
pretty important container binary is a _bad_ thing, so to avoid this we
have to make a copy (preferably doing self-clean-up and not being
writeable).

We require memfd_create(2) -- though there is an O_TMPFILE fallback --
but we can always extend this to use a scratch MNT_DETACH overlayfs or
tmpfs. The main downside to this approach is no page-cache sharing for
the runc binary (which overlayfs would give us) but this is far less
complicated.

This is only done during nsenter so that it happens transparently to the
Go code, and any libcontainer users benefit from it. This also makes
ExtraFiles and --preserve-fds handling trivial (because we don't need to
worry about it).

Fixes: CVE-2019-5736
Co-developed-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-02-08 18:57:59 +11:00
Aleksa Sarai dd023c457d
merge branch 'pr-1972'
Update vendored golang.org/x/sys to latest

LGTMs: @crosbymichael @cyphar
Closes #1972
2019-02-08 18:52:59 +11:00
John Howard ec069fe332 Vendor opencontainers/runtime-spec 29686dbc
Signed-off-by: John Howard <jhoward@microsoft.com>
2019-02-07 14:49:22 -08:00
Filipe Brandenburger 4a600c04ed Update vendored golang.org/x/sys to latest
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2019-02-06 17:59:21 -08:00
Mrunal Patel e4fa8a4575
Merge pull request #1955 from xiaochenshen/rdt-fix-destroy-issue
libcontainer: intelrdt: fix null intelrdt path issue in Destroy()
2019-02-01 13:18:56 -08:00
Mrunal Patel 4e4c907193
Merge pull request #1950 from cloudfoundry-incubator/enter-pid-race
Resilience in adding of exec tasks to cgroups
2019-02-01 13:18:16 -08:00
Mrunal Patel 6994ff2742
Merge pull request #1967 from cyphar/integration-factory-fixup
integration: fix mis-use of libcontainer.Factory
2019-02-01 13:16:36 -08:00
Michael Crosby 8011af4a96
Merge pull request #1964 from adrianreber/org.criu
Document 'org.criu.config' annotation
2019-01-25 14:28:19 -05:00
Aleksa Sarai 565325fc36
integration: fix mis-use of libcontainer.Factory
For some reason, libcontainer/integration has a whole bunch of incorrect
usages of libcontainer.Factory -- causing test failures with a set of
security patches that will be published soon. Fixing this is fairly
trivial (switch to creating a new libcontainer.Factory once in each
process, rather than creating one in TestMain globally).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-01-24 23:12:48 +13:00
Adrian Reber dd50c7e332
Add 'org.criu.config' annotation documentation
Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-15 19:54:47 +01:00
Adrian Reber 5f32bb94fd
Update runc-checkpoint man-page
This just copies the latest output from 'runc checkpoint --help' to the
man page.

Signed-off-by: Adrian Reber <areber@redhat.com>
2019-01-15 19:54:47 +01:00
Michael Crosby c1e454b2a1
Merge pull request #1960 from giuseppe/fix-kmem-systemd
systemd: fix setting kernel memory limit
2019-01-15 13:21:01 -05:00
Michael Crosby 4e9d52da54
Merge pull request #1933 from adrianreber/master
Add CRIU configuration file support
2019-01-15 11:22:38 -05:00
Aleksa Sarai 12f6a99120
merge branch 'pr-1962'
rootfs: umount all procfs and sysfs with --no-pivot

LGTMs: @mrunalp @cyphar
Closes #1962
2019-01-15 15:15:53 +11:00
Giuseppe Scrivano 28a697cce3
rootfs: umount all procfs and sysfs with --no-pivot
When creating a new user namespace, the kernel doesn't allow mounting
a new procfs or sysfs file system if there is not already one instance
fully visible in the current mount namespace.

When using --no-pivot we were effectively inhibiting this protection
from the kernel, as /proc and /sys from the host are still present in
the container mount namespace.

A container without full access to /proc could then create a new user
namespace, and from there be able to mount a fully visible /proc, bypassing
the limitations in the container.

A simple reproducer for this issue is:

unshare -mrfp sh -c "mount -t proc none /proc && echo c > /proc/sysrq-trigger"

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-14 09:53:35 +01:00
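To unmount every procfs and sysfs instance, the first step is finding them. The sketch below only lists candidate mount points from mountinfo-formatted text (the format of /proc/self/mountinfo); it does not unmount anything, and the function name procSysMounts is hypothetical rather than taken from runc.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// procSysMounts returns the mount points of every procfs and sysfs entry
// found in mountinfo-formatted text. Illustrative helper, not runc's code.
func procSysMounts(mountinfo string) []string {
	var points []string
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		// The optional fields end at a lone "-"; the filesystem type
		// is the first field after it.
		pre, post, ok := strings.Cut(sc.Text(), " - ")
		if !ok {
			continue
		}
		preFields := strings.Fields(pre)
		postFields := strings.Fields(post)
		if len(preFields) < 5 || len(postFields) < 1 {
			continue
		}
		if fs := postFields[0]; fs == "proc" || fs == "sysfs" {
			// Field 5 (index 4) is the mount point.
			points = append(points, preFields[4])
		}
	}
	return points
}

func main() {
	const sample = `22 28 0:21 / /sys rw,nosuid,nodev,noexec,relatime shared:7 - sysfs sysfs rw
23 28 0:22 / /proc rw,nosuid,nodev,noexec,relatime shared:13 - proc proc rw
28 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw`
	for _, p := range procSysMounts(sample) {
		fmt.Println(p)
	}
}
```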
Giuseppe Scrivano f01923376d
systemd: fix setting kernel memory limit
Since commit df3fa115f9 it is not possible to set a kernel memory limit
when using the systemd cgroups backend, as we use cgroup.Apply twice.

Skip enabling kernel memory if there are already tasks in the cgroup.

Without this patch, runc fails with:

container_linux.go:344: starting container process caused
"process_linux.go:311: applying cgroup configuration for process
caused \"failed to set memory.kmem.limit_in_bytes, because either
tasks have already joined this cgroup or it has children\""

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-01-10 11:33:50 +01:00
Xiaochen Shen acb75d0e38 libcontainer: intelrdt: fix null intelrdt path issue in Destroy()
This patch fixes a corner case when destroying a container:

If we start a container without 'intelRdt' config set, and then run
"runc update --l3-cache-schema/--mem-bw-schema", the 'intelRdt' config is
added implicitly.

Now if we enter "exit" from inside the container, we pass through
linuxContainer.Destroy() -> state.destroy() -> intelRdtManager.Destroy().
But in IntelRdtManager.Destroy(), IntelRdtManager.Path is still an empty
string because it hasn't been initialized yet. As a result, the rdt
group directory created during "runc update" will not be removed as expected.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2019-01-05 00:34:25 +08:00
Adrian Reber 403986c5dd
Add CRIU patch to fix checkpoint test
For the newly integrated feature to use CRIU configuration files the
test is broken without an additional CRIU patch.

The test changes CRIU's log file. Changing the log file is unfortunately
the only thing which is broken in CRIU 3.11. But it is the easiest
option for testing. With CRIU 3.12 this will be fixed. All other CRIU
options can be changed with a CRIU configuration file.

With this change the CRIU 3.11 feature can be merged into runc with a
test and for the user it should just work, if they are not trying to
change CRIU's log file.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Adrian Reber 6f3e13cc48
Added test for container specific CRIU configuration files
Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Adrian Reber e157963054
Enable CRIU configuration files
CRIU 3.11 introduces configuration files:

https://criu.org/Configuration_files
https://lisas.de/~adrian/posts/2018-Nov-08-criu-configuration-files.html

This enables the user to influence CRIU's behaviour without code changes
if using new CRIU features or if the user wants to enable certain CRIU
behaviour without always specifying certain options.

With this it is possible to write 'tcp-established' to the configuration
file:

$ echo tcp-established > /etc/criu/runc.conf

and from now on all checkpoints will preserve the state of established
TCP connections. This removes the need to always use

$ runc checkpoint --tcp-established

if the goal is to always checkpoint with '--tcp-established'.

It also adds the possibility of unexpected CRIU behaviour if the user
creates a configuration file at some point in time and later forgets about it.

As a result of the discussion in https://github.com/opencontainers/runc/pull/1933
it is now also possible to define a CRIU configuration file for each
container with the annotation 'org.criu.config'.

If 'org.criu.config' does not exist, runc will tell CRIU to use
'/etc/criu/runc.conf' if it exists.

If 'org.criu.config' is set to an empty string (''), runc will tell CRIU
to not use any runc specific configuration file at all.

If 'org.criu.config' is set to a non-empty string, runc will use that
value as an additional configuration file for CRIU.

With the annotation the user can decide to use the default configuration
file ('/etc/criu/runc.conf'), none or a container specific configuration
file.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
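The three 'org.criu.config' cases from the commit above can be sketched as a small selection function. This is an illustration under stated assumptions, not runc's implementation: the name criuConfigFile, the injected exists callback, and the convention of returning an empty string for "no runc specific configuration file" are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

const defaultCriuConfig = "/etc/criu/runc.conf"

// criuConfigFile returns the configuration file to hand to CRIU, or ""
// when no runc specific file should be used. The exists callback is
// injected so the rules can be exercised without touching /etc.
// Hypothetical helper, not part of runc.
func criuConfigFile(annotations map[string]string, exists func(string) bool) string {
	value, set := annotations["org.criu.config"]
	switch {
	case !set:
		// Annotation absent: fall back to the default file if present.
		if exists(defaultCriuConfig) {
			return defaultCriuConfig
		}
		return ""
	case value == "":
		// Annotation explicitly empty: use no runc specific file.
		return ""
	default:
		// Annotation set: use the container specific file.
		return value
	}
}

func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	annotations := map[string]string{"org.criu.config": "/etc/special-runc-criu.conf"}
	fmt.Printf("%q\n", criuConfigFile(annotations, fileExists))
}
```

Distinguishing "absent" from "set to an empty string" is the reason for the two-value map lookup; a plain lookup would conflate the two cases.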
Adrian Reber 360ba8a27d
Update criurpc definition for latest features
Signed-off-by: Adrian Reber <areber@redhat.com>
2018-12-21 07:42:12 +01:00
Michael Crosby bbb17efcb4
Merge pull request #1952 from JoeWrightss/patch-4
Fix .Fatalf() error message
2018-12-20 09:18:50 -05:00
JoeWrightss 0855bce448 Fix .Fatalf() error message
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-19 20:22:48 +08:00
Tom Godkin bdf3524b34 Retry adding pids to cgroups when EINVAL occurs
The kernel will sometimes return EINVAL when writing a pid to a
cgroup.procs file. It does so when the task being added still has the
state TASK_NEW.

See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286

Co-authored-by: Danail Branekov <danailster@gmail.com>

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
2018-12-17 15:34:47 +00:00
Aleksa Sarai f5b99917df
merge branch 'pr-1945'
Fix some typos

LGTMs: @crosbymichael @cyphar
Closes #1945
2018-12-11 03:43:44 +11:00
JoeWrightss 769d6c4a75 Fix some typos
Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>
2018-12-09 23:52:54 +08:00
Daniel, Dao Quang Minh 859f74576e
Merge pull request #1942 from KentaTada/fix-kernel-config-to-adjust-to-moby
Modify check-config.sh in accordance with Moby Project updates
2018-12-08 20:15:53 +00:00
Michael Crosby 25f3f893c8
Merge pull request #1939 from cyphar/nokmem-error
cgroups: nokmem: error out on explicitly-set kmemcg limits
2018-12-04 11:14:56 -05:00
Michael Crosby 96ec2177ae
Merge pull request #1943 from giuseppe/allow-to-signal-paused-containers
kill: allow to signal paused containers
2018-12-03 16:55:13 -05:00
Michael Crosby ff38d6e7cc
Merge pull request #1944 from Ace-Tang/criu_notify_pid
cr: get pid from criu notify when restore
2018-12-03 10:35:58 -05:00
Ace-Tang dce70cdff5 cr: get pid from criu notify when restore
When restoring a container from a checkpoint directory, we should get
the pid from the CRIU notification, since c.initProcess has not been created.

Signed-off-by: Ace-Tang <aceapril@126.com>
2018-12-03 13:31:20 +08:00
Aleksa Sarai 8a4629f7b5
cgroups: nokmem: error out on explicitly-set kmemcg limits
When built with nokmem we explicitly disable support for kmemcg, but it
is a strict specification requirement that we error out if we cannot
fulfil an aspect of the container configuration.

Completely ignoring explicitly-requested kmemcg limits with nokmem would
undoubtedly lead to problems.

Fixes: 6a2c155968 ("libcontainer: ability to compile without kmem")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-12-01 14:31:35 +11:00
Giuseppe Scrivano 07d1ad44c8
kill: allow to signal paused containers
regression introduced by 87a188996e

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-11-30 23:35:47 +01:00
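The interaction between this fix and 87a188996e can be sketched as a status check before signaling: a stopped container must not be signaled, because its init pid may since belong to another process, while a paused container should still accept signals. The status type, its values, and canSignal are hypothetical names, and treating "created" as signalable is an assumption rather than something the commit messages state.

```go
package main

import "fmt"

// status is an illustrative stand-in for a container state.
type status int

const (
	created status = iota
	running
	paused
	stopped
)

// canSignal reports whether it is safe to deliver a signal to the
// container's init process in the given state. A stopped container is
// refused so that a reused pid is never signaled by mistake.
func canSignal(s status) bool {
	switch s {
	case created, running, paused:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println("paused:", canSignal(paused))
	fmt.Println("stopped:", canSignal(stopped))
}
```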
Kenta Tada 30817421ef Modify check-config.sh in accordance with Moby Project updates
This commit modifies check-config.sh to keep up with current kernel config.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2018-11-30 16:38:19 +09:00
Michael Crosby 4932620b62
Merge pull request #1919 from xiaochenshen/rdt-mba-software-controller
libcontainer: intelrdt: add support for Intel RDT/MBA Software Controller in runc
2018-11-26 16:45:42 -05:00
Michael Crosby 1d1597315c
Merge pull request #1940 from cyphar/rohit-maintainership
MAINTAINERS: remove @rjnagal and @vmarmol
2018-11-26 15:37:08 -05:00
Aleksa Sarai a020000185
MAINTAINERS: remove @vmarmol
After discussion with Victor, he mentioned that he wanted to rescind
his maintainership a few years ago (due to a change in priorities and
what he's been working on) but wasn't sure what the right process is.

Thanks for your hard work Victor!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-24 23:19:12 +11:00
Aleksa Sarai 2efedb02aa
MAINTAINERS: remove @rjnagal
After talking to Rohit, he mentioned that he wasn't aware he was still a
maintainer (and that his maintainership was grandfathered from his
Docker maintainership). He's moved on to other projects now, and thus
said he would happily step down as maintainer. (Since he's stepping down
voluntarily, this doesn't require a mailing-list vote.)

Thanks for all of your hard work, Rohit!

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-23 09:47:50 +11:00
Aleksa Sarai 9397a6f62e
merge branch 'pr-1937'
VERSION: back to development
  VERSION: release v1.0.0~rc6

Votes: +5 -0 /2
LGTMs: @crosbymichael @hqhq
Closes #1937
2018-11-22 22:47:26 +11:00
Michael Crosby 50e2634995
Merge pull request #1934 from lifubang/kill
fix: may kill other process when container has been stopped
2018-11-21 10:30:25 -05:00
Lifubang 87a188996e may kill other process when container has been stopped
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-11-21 17:44:52 +08:00
Aleksa Sarai 061dfe95ad
VERSION: back to development
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:55:26 +11:00
Aleksa Sarai ccb5efd37f
VERSION: release v1.0.0~rc6
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-11-21 13:54:59 +11:00
Aleksa Sarai 73856f6d6f
merge branch 'pr-1936'
Small fixes for CRIU based test cases

LGTMs: @crosbymichael @cyphar
Closes #1936
2018-11-20 12:26:02 +11:00
Aleksa Sarai ceefc3fe4e
merge branch 'pr-1741'
libcontainer: Set 'status' in hook stdin

LGTMs: @cyphar @crosbymichael
Closes #1741
2018-11-20 06:39:30 +11:00
Adrian Reber bc0b047198
Small fixes for CRIU based test cases
This removes unnecessary lines from checkpoint.bats like:

 sed -i 's;"readonly": true;"readonly": false;' config.json

and adds (and corrects) comments which are leftover from older
versions of checkpoint.bats.

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-11-19 16:08:29 +01:00
Michael Crosby 785fd9c39d
Merge pull request #1935 from adrianreber/criu-3.11
Bump CRIU to 3.11
2018-11-19 09:32:06 -05:00