The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:
* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update
The following commands work:
* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state
In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Effectively, !dumpable makes implementing rootless containers quite
hard, due to a bunch of different operations on /proc/self no longer
being possible without reordering everything.
!dumpable only really makes sense when you are switching between
different security contexts, which is only the case when we are joining
namespaces. Unfortunately this means that !dumpable will still have
issues in this instance, and it should only be necessary to set
!dumpable if we are not joining USER namespaces (new kernels have
protections that make !dumpable no longer necessary). But that's a topic
for another time.
This also includes code to unset and then re-set dumpable when doing the
USER namespace mappings. This should also be safe because in principle
processes in a container can't see us until after we fork into the PID
namespace (which happens after the user mapping).
In rootless containers, it is not possible to set a non-dumpable
process's /proc/self/oom_score_adj (it's owned by root and thus not
writeable). Thus, it needs to be set inside nsexec before we set
ourselves as non-dumpable.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
If we try to pause a container on the system without freezer cgroups,
we can found that runc tries to open ./freezer.state. It is obviously wrong.
$ ./runc pause test
no such directory for freezer.state
$ echo FROZEN > freezer.state
$ ./runc pause test
container not running or created: paused
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Master builds only have a 'git clone ...' [1] so FETCH_HEAD isn't
defined and git-validation crashes [2]. We don't want to be
hard-coding a range here, and should update git-validation to handle
these cases automatically.
Also echo TRAVIS_* variables during testing to make debugging
git-validation easier.
[1]: https://travis-ci.org/opencontainers/runc/jobs/213508696#L243
[2]: https://travis-ci.org/opencontainers/runc/jobs/213508696#L347
Signed-off-by: W. Trevor King <wking@tremily.us>
When process config doesnt specify capabilities anywhere, we should not panic
because setting capabilities are optional.
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
This reverts commit d4091ef151.
d4091ef151 ("fix minor issue") doesn't actually make any sense, and
actually makes the code more confusing.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This maybe a nice extra but it adds complication to the usecase. The
contract is listen on the socket and you get an fd to the pty master and
that is that.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.
This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.
Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
In container process's Init function, we use
fd + execFifoFilename to open exec fifo, so this
field in init config is never used.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
As per the discussions in #1156 , we think it's a bad
idea to allow multi container operations in runc. So
revert it.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This is a fix for rootless containers and general io handling. The
higher level systems must preparte the IO for the container in the
detach case and make sure it is setup correctly for the container's
process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fixes: #1347Fixes: #1083
The root cause of #1083 is because we're joining an
existed cgroup whose kmem accouting is not initialized,
and it has child cgroup or tasks in it.
Fix it by checking if the cgroup is first time created,
and we should enable kmem accouting if the cgroup is
craeted by libcontainer with or without kmem limit
configed. Otherwise we'll get issue like #1347
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>