This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and
`RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc.
`RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. `RootlessEUID` is almost identical to the former `Rootless`
except cgroups stuff.
`RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups.
`RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace.
Otherwise `RootlessCgroups` is set to true.
(Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well)
When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
`RootlessEUID` is set to false but `RootlessCgroups` is set to true.
So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored.
This PR does not have any impact on CLI flags and `state.json`.
Note about CLI:
* Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`.
* Now `runc spec --rootless` is only required when `RootlessEUID` is set to true.
For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of
UID/GID are mapped.
Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`):
* `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility.
(`/run/runc` is used)
* If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`.
This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`.
Note about `state.json`:
* `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
RunC doesn't manage network devices and their configuration,
so it is impossible to describe external dependencies to restore them
back.
This means that all users have to set --empty-ns network, so let's do
this by default.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
With the help of userfaultfd CRIU supports lazy migration. Lazy
migration means that memory pages are only transferred from the
migration source to the migration destination on page fault.
This enables to reduce the downtime during process or container
migration to a minimum as the memory does not need to be transferred
during migration.
Lazy migration currently depends on userfaultfd being available on the
current Linux kernel and if the used CRIU version supports lazy
migration. Both dependencies can be checked by querying CRIU via RPC if
the lazy migration feature is available. Using feature checking instead
of version comparison enables runC to use CRIU features from the
criu-dev branch. This way the user can decide if lazy migration should
be available by choosing the right kernel and CRIU branch.
To use lazy migration the CRIU process during dump needs to dump
everything besides the memory pages and then it opens a network port
waiting for remote page fault requests:
# runc checkpoint httpd --lazy-pages --page-server 0.0.0.0:27 \
--status-fd /tmp/postcopy-pipe
In this example CRIU will hang/wait once it has opened the network port
and wait for network connection. As runC waits for CRIU to finish it
will also hang until the lazy migration has finished. To know when the
restore on the destination side can start the '--status-fd' parameter is
used:
#️ runc checkpoint --help | grep status
--status-fd value criu writes \0 to this FD once lazy-pages is ready
The parameter '--status-fd' is directly from CRIU and this way the
process outside of runC which controls the migration knows exactly when
to transfer the checkpoint (without memory pages) to the destination and
that the restore can be started.
On the destination side it is necessary to start CRIU in 'lazy-pages'
mode like this:
# criu lazy-pages --page-server --address 192.168.122.3 --port 27 \
-D checkpoint
and tell runC to do a lazy restore:
# runc restore -d --image-path checkpoint --work-path checkpoint \
--lazy-pages httpd
If both processes on the restore side have the same working directory
'criu lazy-pages' creates a unix domain socket where it waits for
requests from the actual restore. runC starts CRIU restore in lazy
restore mode and talks to 'criu lazy-pages' that it wants to restore
memory pages on demand. CRIU continues to restore the process and once
the process is running and accesses the first non-existing memory page
the 'criu lazy-pages' server will request the page from the source
system. Thus all pages from the source system will be transferred to the
destination system. Once all pages have been transferred runC on the
source system will end and the container will have finished migration.
This can also be combined with CRIU's pre-copy support. The combination
of pre-copy and post-copy (lazy migration) provides the possibility to
migrate containers with minimal downtimes.
Some additional background about post-copy migration can be found in
these articles:
https://lisas.de/~adrian/?p=1253https://lisas.de/~adrian/?p=1183
Signed-off-by: Adrian Reber <areber@redhat.com>
When doing incremental dumps is useful to use auto deduplication of
memory images to save space.
Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:
* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update
The following commands work:
* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state
In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.
This patch add pre-dump and parent-path arguments to runc checkpoint
Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
The major change is the description of options, change
it as the latest cli help message shows, which specify
a "value" after an option if it takes value, and add
(default: xxx) if the option has a default value.
This also includes some other minor consistency fixes.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
For example:
./runc checkpoint --empty-ns network CTID
In this case criu creates a network namespace, but doesn't restore it.
We are going to use this option to restore docker containers and
Docker sets a hook to restore a network namespace.
https://github.com/xemul/criu/issues/165
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add state status() method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow multiple checkpoint on restore
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handle leave-running state
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fix state transitions for inprocess
Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove isDestroyed method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handling Pausing from freezer state
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
freezer status
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Fixing review comments
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Added comment when freezer not available
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Conflicts:
libcontainer/container_linux.go
Change checkFreezer logic to isPaused()
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove state base and factor out destroy func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add unit test for state transitions
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Should compile now without errors but changes needed to be added for each system so it actually works.
main_unsupported.go is a new file with all the unsupported commands
Fixes#9
Signed-off-by: Marianna <mtesselh@gmail.com>