Commit Graph

27 Commits

Author SHA1 Message Date
Akihiro Suda 06f789cf26 Disable rootless mode except RootlessCgMgr when executed as the root in userns
This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and
`RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc.

`RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. `RootlessEUID` is almost identical to the former `Rootless`
except cgroups stuff.

`RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups.
`RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace.
Otherwise `RootlessCgroups` is set to true.
(Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well)

When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
`RootlessEUID` is set to false but `RootlessCgroups` is set to true.
So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored.

This PR does not have any impact on CLI flags and `state.json`.

Note about CLI:
* Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`.
* Now `runc spec --rootless` is only required when `RootlessEUID` is set to true.
  For runc-in-userns, `runc spec`  without `--rootless` should work, when sufficient numbers of
  UID/GID are mapped.

Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`):
* `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility.
  (`/run/runc` is used)
* If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`.
  This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`.

Note about `state.json`:
* `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-09-07 15:05:03 +09:00
Andrei Vagin 8187fb740c cr: don't dump network devices and their configuration
RunC doesn't manage network devices and their configuration,
so it is impossible to describe external dependencies to restore them
back.

This means that all users have to set --empty-ns network, so let's do
this by default.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
2018-07-10 23:24:19 -07:00
Akihiro Suda f103de57ec main: support rootless mode in userns
Running rootless containers in userns is useful for mounting
filesystems (e.g. overlay) with mapped euid 0, but without actual root
privilege.

Usage: (Note that `unshare --mount` requires `--map-root-user`)

  user$ mkdir lower upper work rootfs
  user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz | tar Cxz ./lower || ( true; echo "mknod errors were ignored" )
  user$ unshare --mount --map-root-user
  mappedroot# runc spec --rootless
  mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json
  mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs
  mappedroot# runc run foo

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-10 12:16:43 +09:00
Ma Shimiao 5061fd3e6e stopped container can't be checkpoint
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-12-07 15:43:56 +08:00
Adrian Reber 60ae7091de checkpoint: support lazy migration
With the help of userfaultfd CRIU supports lazy migration. Lazy
migration means that memory pages are only transferred from the
migration source to the migration destination on page fault.

This enables to reduce the downtime during process or container
migration to a minimum as the memory does not need to be transferred
during migration.

Lazy migration currently depends on userfaultfd being available on the
current Linux kernel and if the used CRIU version supports lazy
migration. Both dependencies can be checked by querying CRIU via RPC if
the lazy migration feature is available. Using feature checking instead
of version comparison enables runC to use CRIU features from the
criu-dev branch. This way the user can decide if lazy migration should
be available by choosing the right kernel and CRIU branch.

To use lazy migration the CRIU process during dump needs to dump
everything besides the memory pages and then it opens a network port
waiting for remote page fault requests:

 # runc checkpoint httpd --lazy-pages --page-server 0.0.0.0:27 \
  --status-fd /tmp/postcopy-pipe

In this example CRIU will hang/wait once it has opened the network port
and wait for network connection. As runC waits for CRIU to finish it
will also hang until the lazy migration has finished. To know when the
restore on the destination side can start the '--status-fd' parameter is
used:

 #️ runc checkpoint --help | grep status
  --status-fd value   criu writes \0 to this FD once lazy-pages is ready

The parameter '--status-fd' is directly from CRIU and this way the
process outside of runC which controls the migration knows exactly when
to transfer the checkpoint (without memory pages) to the destination and
that the restore can be started.

On the destination side it is necessary to start CRIU in 'lazy-pages'
mode like this:

 # criu lazy-pages --page-server --address 192.168.122.3 --port 27 \
  -D checkpoint

and tell runC to do a lazy restore:

 # runc restore -d --image-path checkpoint --work-path checkpoint \
  --lazy-pages httpd

If both processes on the restore side have the same working directory
'criu lazy-pages' creates a unix domain socket where it waits for
requests from the actual restore. runC starts CRIU restore in lazy
restore mode and talks to 'criu lazy-pages' that it wants to restore
memory pages on demand. CRIU continues to restore the process and once
the process is running and accesses the first non-existing memory page
the 'criu lazy-pages' server will request the page from the source
system. Thus all pages from the source system will be transferred to the
destination system. Once all pages have been transferred runC on the
source system will end and the container will have finished migration.

This can also be combined with CRIU's pre-copy support. The combination
of pre-copy and post-copy (lazy migration) provides the possibility to
migrate containers with minimal downtimes.

Some additional background about post-copy migration can be found in
these articles:

 https://lisas.de/~adrian/?p=1253
 https://lisas.de/~adrian/?p=1183

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-09-06 12:35:38 +00:00
Nikolas Sepos 3f234b15d0 Add auto-dedup flag for checkpoint/restore
When doing incremental dumps is useful to use auto deduplication of
memory images to save space.

Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>
2017-08-18 16:19:21 +02:00
Christy Perez 187d2d85be Moving the rest of runc to x/sys/unix
Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:36:02 -05:00
Tim Potter 9458b39ca9 Fix misspelling of "properties" in various places
Signed-off-by: Tim Potter <tpot@hpe.com>
2017-04-21 13:29:58 +10:00
Aleksa Sarai d2f49696b0
runc: add support for rootless containers
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:

* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update

The following commands work:

* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state

In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:45:24 +11:00
Deng Guangxing 98f004182b add pre-dump and parent-path to checkpoint
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.

This patch add pre-dump and parent-path arguments to runc checkpoint

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2017-02-14 19:45:07 +08:00
Aleksa Sarai c6d8a2f26f
merge branch 'pr-1158'
Closes #1158
LGTMs: @hqhq @cyphar
2016-12-26 13:59:47 +11:00
Zhang Wei 8eea644ccc Bump runtime-spec to v1.0.0-rc3
* Bump underlying runtime-spec to version 1.0.0-rc3
* Fix related changed struct names in config.go

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-17 14:02:35 +08:00
Zhang Wei b517076907 Check args numbers before application start
Add a general args number validator for all client commands.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-11-29 11:18:51 +08:00
Aleksa Sarai 38560a0316
checkpoint: fix gofmt
Fixes: a60040c62d ("Container must not checkpoint in created state")
Fixes: #1076
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-19 05:37:24 +11:00
rajasec a60040c62d Container must not checkpoint in created state
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-09-25 21:09:23 +05:30
Mrunal Patel a753b06645 Replace github.com/codegangsta/cli by github.com/urfave/cli
The package got moved to a different repository

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-06-06 11:47:20 -07:00
Qiang Huang 2503fca35d Update man pages to refect the latest cli change
The major change is the description of options, change
it as the latest cli help message shows, which specify
a "value" after an option if it takes value, and add
(default: xxx) if the option has a default value.

This also includes some other minor consistency fixes.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-28 13:33:57 +08:00
Andrew Vagin 22d60d9874 checkpoint: add the empty-ns option
For example:
./runc checkpoint --empty-ns network CTID

In this case criu creates a network namespace, but doesn't restore it.

We are going to use this option to restore docker containers and
Docker sets a hook to restore a network namespace.

https://github.com/xemul/criu/issues/165
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
2016-05-28 06:21:17 +03:00
Qiang Huang 8477638aab Update cli package
The old one has bug when showing help message for IntFlags.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-10 13:58:09 +08:00
Mike Brown f4e37ab63e updating usage for runc and runc commands
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2016-02-17 09:00:39 -06:00
Michael Crosby 4415446c32 Add state pattern for container state transition
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add state status() method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Allow multiple checkpoint on restore

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handle leave-running state

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Fix state transitions for inprocess

Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove isDestroyed method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handling Pausing from freezer state

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

freezer status

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Fixing review comments

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Added comment when freezer not available

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Conflicts:
	libcontainer/container_linux.go

Change checkFreezer logic to isPaused()

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove state base and factor out destroy func

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add unit test for state transitions

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-17 13:55:38 -08:00
Hui Kang 25da513c4b Add option to support criu manage cgroups mode for dump and restore
CRIU supports cgroup-manage mode from v1.7

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-10-11 04:42:54 +00:00
Rajasekaran 57cc442c13 Fixing checkpoint issue
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2015-09-04 16:20:45 +05:30
Mrunal Patel f3a3025933 Fix minor stylistic issues
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-04 17:44:45 -04:00
Marianna 5aa82c950d Enable build on unsupported platforms
Should compile now without errors but changes needed to be added for each system so it actually works.
main_unsupported.go is a new file with all the unsupported commands
Fixes #9

Signed-off-by: Marianna <mtesselh@gmail.com>
2015-06-29 17:03:44 -07:00
mapk0y 986dc0f730 checkpoint/restore commands support 'file-locks' option.
Signed-off-by: mapk0y <mapk0y@gmail.com>
2015-06-27 18:56:24 +09:00
Michael Crosby 9fac183294 Initial commit of runc binary
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:34:13 -07:00