Make use of errors.Is() and errors.As() where appropriate to check
the underlying error. The biggest motivation is to simplify the code.
The feature requires go 1.13 but since merging #2256 we are already
not supporting go 1.12 (which is an unsupported release anyway).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using errors.Unwrap() is not the best thing to do, since it returns
nil in case of an error which was not wrapped. More to say,
errors package provides more elegant ways to check for underlying
errors, such as errors.As() and errors.Is().
This reverts commit f8e138855d, reversing
changes made to 6ca9d8e6da.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Using GetPaths from cgroupv2 unified hierarchy code is deprecated
and this function will (hopefully) be removed.
Use GetUnifiedPath() for v2 case.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When performing checkpoint or restore of cgroupv2 unified hierarchy,
there is no need to call getCgroupMounts() / cgroups.GetCgroupMounts()
as there's only a single mount in there.
This eliminates the last internal (i.e. runc) use case of
cgroups.GetCgroupMounts() for v2 unified. Unfortunately, there
are external ones (e.g. moby/moby) so we can't yet let it
return an error.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
It turns out that ioutil.Readfile wraps the error in a *os.PathError.
Since we cannot guarantee compilation with golang >= v1.13, we are
manually unwrapping the error.
Signed-off-by: Kieron Browne <kbrowne@pivotal.io>
The newest CRIU version supports freezer v2 and this tells runc
to use it if new enough or fall back to non-freezer based process
freezing on cgroup v2 system.
Signed-off-by: Adrian Reber <areber@redhat.com>
linuxContainer.Signal() can race with another call to say Destroy()
which clears the container's initProcess. This can cause a nil pointer
dereference in Signal().
This patch will synchronize Signal() and Destroy() by grabbing the
container's mutex as part of the Signal() call.
Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>
Adrian reported that the checkpoint test stated failing:
=== RUN TestCheckpoint
--- FAIL: TestCheckpoint (0.38s)
checkpoint_test.go:297: Did not restore the pipe correctly:
The problem here is when we start exec.Cmd, we don't call its wait
method. This means that we don't wait cmd.goroutines ans so we don't
know when all data will be read from process pipes.
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Fixes#2128
This allows proc to be bind mounted for host and rootless namespace usecases but
it removes the ability to mount over the top of proc with a directory.
```bash
> sudo docker run --rm apparmor
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:346: starting container process caused "process_linux.go:449:
container init caused \"rootfs_linux.go:58: mounting
\\\"/var/lib/docker/volumes/aae28ea068c33d60e64d1a75916cf3ec2dc3634f97571854c9ed30c8401460c1/_data\\\"
to rootfs
\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged\\\"
at \\\"/proc\\\" caused
\\\"\\\\\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged/proc\\\\\\\"
cannot be mounted because it is not of type proc\\\"\"": unknown.
> sudo docker run --rm -v /proc:/proc apparmor
docker-default (enforce) root 18989 0.9 0.0 1288 4 ?
Ss 16:47 0:00 sleep 20
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
allow to set what subsystems are used by
libcontainer/cgroups/fs.Manager.
subsystemsUnified is used on a system running with cgroups v2 unified
mode.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Minor refactoring to use the filePair struct for both init sock and log pipe
Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
Refactor configuring logging into a reusable component
so that it can be nicely used in both main() and init process init()
Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Claudia Beresford <cberesford@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
Add support for children processes logging (including nsexec).
A pipe is used to send logs from children to parent in JSON.
The JSON format used is the same used by logrus JSON formatted,
i.e. children process can use standard logrus APIs.
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
Writing a file to tmpfs actually incurs a memcg penalty, and thus the
benefit of being able to disable memfd_create(2) with
_LIBCONTAINER_DISABLE_MEMFD_CLONE is fairly minimal -- though it should
be noted that quite a few distributions don't use tmpfs for /tmp (and
instead have it as a regular directory or subvolume of the host
filesystem).
Since runc must have write access to the state directory anyway (and the
state directory is usually not on a tmpfs) we can use that instead of
/tmp -- avoiding potential memcg costs with no real downside.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This makes use of the vendored in Go bindings and removes the copy of
the CRIU RPC interface definition. runc now relies on go-criu for RPC
definition and hopefully more CRIU functions can be used in the future
from the CRIU Go bindings.
Signed-off-by: Adrian Reber <areber@redhat.com>
runc creates all missing mountpoints when it starts a container, this
commit also creates those mountpoints during restore. Now it is possible
to restore a container using the same, but newly created rootfs just as
during container start.
Signed-off-by: Adrian Reber <areber@redhat.com>
CRIU 3.11 introduces configuration files:
https://criu.org/Configuration_fileshttps://lisas.de/~adrian/posts/2018-Nov-08-criu-configuration-files.html
This enables the user to influence CRIU's behaviour without code changes
if using new CRIU features or if the user wants to enable certain CRIU
behaviour without always specifying certain options.
With this it is possible to write 'tcp-established' to the configuration
file:
$ echo tcp-established > /etc/criu/runc.conf
and from now on all checkpoints will preserve the state of established
TCP connections. This removes the need to always use
$ runc checkpoint --tcp-stablished
If the goal is to always checkpoint with '--tcp-established'
It also adds the possibility for unexpected CRIU behaviour if the user
created a configuration file at some point in time and forgets about it.
As a result of the discussion in https://github.com/opencontainers/runc/pull/1933
it is now also possible to define a CRIU configuration file for each
container with the annotation 'org.criu.config'.
If 'org.criu.config' does not exist, runc will tell CRIU to use
'/etc/criu/runc.conf' if it exists.
If 'org.criu.config' is set to an empty string (''), runc will tell CRIU
to not use any runc specific configuration file at all.
If 'org.criu.config' is set to a non-empty string, runc will use that
value as an additional configuration file for CRIU.
With the annotation the user can decide to use the default configuration
file ('/etc/criu/runc.conf'), none or a container specific configuration
file.
Signed-off-by: Adrian Reber <areber@redhat.com>
when restore container from a checkpoint directory, we should get
pid from criu notify, since c.initProcess has not been created.
Signed-off-by: Ace-Tang <aceapril@126.com>
Finish off the work started in a344b2d6 (sync up `HookState` with OCI
spec `State`, 2016-12-19, #1201).
And drop HookState, since there's no need for a local alias for
specs.State.
Also set c.initProcess in newInitProcess to support OCIState calls
from within initProcess.start(). I think the cyclic references
between linuxContainer and initProcess are unfortunate, but didn't
want to address that here.
I've also left the timing of the Prestart hooks alone, although the
spec calls for them to happen before start (not as part of creation)
[1,2]. Once the timing gets fixed we can drop the
initProcessStartTime hacks which initProcess.start currently needs.
I'm not sure why we trigger the prestart hooks in response to both
procReady and procHooks. But we've had two prestart rounds in
initProcess.start since 2f276498 (Move pre-start hooks after container
mounts, 2016-02-17, #568). I've left that alone too.
I really think we should have len() guards to avoid computing the
state when .Hooks is non-nil but the particular phase we're looking at
is empty. Aleksa, however, is adamantly against them [3] citing a
risk of sloppy copy/pastes causing the hook slice being len-guarded to
diverge from the hook slice being iterated over within the guard. I
think that ort of thing is very lo-risk, because:
* We shouldn't be copy/pasting this, right? DRY for the win :).
* There's only ever a few lines between the guard and the guarded
loop. That makes broken copy/pastes easy to catch in review.
* We should have test coverage for these. Guarding with the wrong
slice is certainly not the only thing you can break with a sloppy
copy/paste.
But I'm not a maintainer ;).
[1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart
[2]: https://github.com/opencontainers/runc/issues/1710
[3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570
Signed-off-by: W. Trevor King <wking@tremily.us>
Cgroup namespace can be configured in `config.json` as other
namespaces. Here is an example:
```
"namespaces": [
{
"type": "pid"
},
{
"type": "network"
},
{
"type": "ipc"
},
{
"type": "uts"
},
{
"type": "mount"
},
{
"type": "cgroup"
}
],
```
Note that if you want to run a container which has shared cgroup ns with
another container, then it's strongly recommended that you set
proper `CgroupsPath` of both containers(the second container's cgroup
path must be the subdirectory of the first one). Or there might be
some unexpected results.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>