Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers which came up with this idea
first.
This is the first step of many to allowing runC to work with a
completely read-only rootfs.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
In certain circumstances (such as the rootless containers patchset), it
is not possible to test things using /sys/firmware. In addition, we
should be testing our own functionality rather than testing protection
against /sys attacks (for which the system might already have extra
protections).
Instead, just make some fake paths in the rootfs that we then mask.
Oddly I noticed that one of the errors changed when doing this (because
before we tested removing a file from /sys/firmware which is -EPERM). So
the old test was broken.
Fixes: 53179559a1 ("MaskPaths: support directory")
Fixes: #1068
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Without this patch applied, RHEL's SELinux policies cause container
creation to not really work. Unfortunately this might be an issue for
rootless containers (opencontainers/runc#774) but we'll cross that
bridge when we come to it.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
We need support for read/only mounts in SELinux to allow a bunch of
containers to share the same read/only image. In order to do this
we need a new label which allows container processes to read/execute
all files but not write them.
Existing mount label is either shared write or private write. This
label is shared read/execute.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
At some point InitLabels was changed to look for SecuritOptions
separated by a ":" rather then an "=", but DupSecOpt was never
changed to match this default.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
With this patch, `runc start` command can start mulit-containers
at one command this patch also checks the argument of the `start`
command.
root@ubuntu:# runc list
ID PID STATUS BUNDLE CREATED
a 0 stopped /mycontainer 2016-09-23T08:56:42.754026567Z
b 62979 created /mycontainer 2016-09-23T09:01:36.421976458Z
c 62993 running /mycontainer 2016-09-23T09:01:38.105940389Z
d 63006 created /mycontainer 2016-09-23T09:01:39.65441942Z
e 63020 created /mycontainer 2016-09-23T09:01:40.989995515Z
root@ubuntu:# runc start
runc: "start" requires a minimum of 1 argument
root@ubuntu:# runc start a b c d e f
cannot start a container that has run and stopped
cannot start an already running container
container f is not exist
all or part of the containers start failed
root@ubuntu:# runc list
ID PID STATUS BUNDLE CREATED
a 0 stopped /mycontainer 2016-09-23T08:56:42.754026567Z
b 62979 running /mycontainer 2016-09-23T09:01:36.421976458Z
c 62993 running /mycontainer 2016-09-23T09:01:38.105940389Z
d 63006 running /mycontainer 2016-09-23T09:01:39.65441942Z
e 63020 running /mycontainer 2016-09-23T09:01:40.989995515Z
Signed-off-by: Wang Long <long.wanglong@huawei.com>
If copyup is specified for a tmpfs mount, then the contents of the
underlying directory are copied into the tmpfs mounted over it.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Depending on your SELinux setup, the order in which you join namespaces
can be important. In general, user namespaces should *always* be joined
and unshared first because then the other namespaces are correctly
pinned and you have the right priviliges within them. This also is very
useful for rootless containers, as well as older kernels that had
essentially broken unshare(2) and clone(2) implementations.
This also includes huge refactorings in how we spawn processes for
complicated reasons that I don't want to get into because it will make
me spiral into a cloud of rage. The reasoning is in the giant comment in
clone_parent. Have fun.
In addition, because we now create multiple children with CLONE_PARENT,
we cannot wait for them to SIGCHLD us in the case of a death. Thus, we
have to resort to having a child kindly send us their exit code before
they die. Hopefully this all works okay, but at this point there's not
much more than we can do.
Signed-off-by: Aleksa Sarai <asarai@suse.de>