The spec uses symlinks to "/proc/1/..." but the implementation uses
"/proc/self/...": see setupDevSymlinks (libcontainer/rootfs_linux.go).
The implementation is more correct, so I'm changing the spec to match
the implementation.
Signed-off-by: Alban Crequy <alban.crequy@coreos.com>
Minor fix, the former setupDev=true means not setup dev,
which is contrary to intuition, just correct it.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Also add cpuset as the first in the list to address issues setting the
pid in any cgroup before the cpuset is populated.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
It can avoid unnecessary task migrataion, see this scenario:
- container init task is on cpu 1, and we assigned it to cpu 1,
but parent cgroup's cpuset.cpus=2
- we created the cgroup dir and inherited cpuset.cpus from parent as 2
- write container init task's pid to cgroup.procs
- [it's possibile the container init task migrated to cpu 2 here]
- set cpuset.cpus as assigned to cpu 1
- [the container init task has to be migrated back to cpu 1]
So we should set cpuset.cpus and cpuset.mems before writing pids
to cgroup.procs to aviod such problem.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Only valid options to --security-opt for label should be
disable, user, role, type, level.
Return error on invalid entry
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
This rather naively fixes an error observed where a processes stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.
Before the "fix", when an error occurred on start, `terminate` is called
immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on
the process. In some cases when this `Kill` is called the stdio stream
have not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.
With the fix it is just calling `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Docker pkgs were updated while golinting the whole docker code base.
Now when trying to bump libcontainer/runc in docker, it fails compiling
with the following error:
``
vendor/src/github.com/opencontainers/runc/libcontainer/rootfs_linux.go:424:
undefined: mount.MountInfo
``
This is because, for instance, the mount pkg was updated here
0f5c9d301b (diff-49294d05afa48e2f7c0d2f02c6f7614c)
and now that type is only `mount.Info`.
This patch bump docker pkgs commit and adapt code to it.
Signed-off-by: Antonio Murdaca <amurdaca@redhat.com>
This allows getting the path to the subsystem and so is subsequently
used in EnterPid by an exec process.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
This is meant to be used in retrieving the paths so an exec
process enters all the cgroup paths correctly.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
/etc/groups is not needed when specifying numeric group ids. This
change allows containers without /etc/groups to specify numeric
supplemental groups.
Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
Godeps: Vendor opencontainers/specs 96bcd043aa
Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274
Signed-off-by: Antonio Murdaca <runcom@linux.com>
pivotDir is the one where pivot_root() call puts the old root. We will
unmount pivotDir() and delete it.
Previously we were making / always rslave or rprivate. That will mean
that pivotDir() could never have mounts which would be shared with
parent mount namespace. That also means that unmounting pivotDir() was
safe and none of the unmount will propagate to parent namespace and
unmount things which we did not want to.
But now user can specify that apply private, shared, slave on /. That
means some of the mounts we inherited from parent could be shared and that
also means if we umount pivotDir/, those mounts will get unmounted in
parent too. That's not what we want.
Instead make pivotDir rprivate so that unmounts don't propagate back to
parent.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
pivot_root() introduces bunch of restrictions otherwise it fails. parent
mount of container root can not be shared otherwise pivot_root() will
fail.
So far parent could not be shared as we marked everything either private
or slave. But now we have introduced new propagation modes where parent
mount of container rootfs could be shared and pivot_root() will fail.
So check if parent mount is shared and if yes, make it private. This will
make sure pivot_root() works.
Also it will make sure that when we bind mount container rootfs, it does
not propagate to parent mount namespace. Otherwise cleanup becomes a
problem.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.
Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.
Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set.
config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Do not remount a bind mount to enable flags unless non-default flags are
provided for the requested mount. This solves a problem with user
namespaces and remount of bind mount permissions.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)