This removes the use of a signal handler and SIGCONT to signal the init
process to exec the users process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Prior to this change a cgroup with a `:` character in it's path was not
parsed correctly (as occurs on some instances of systemd cgroups under
some versions of systemd, e.g. 225 with accounting).
This fixes that issue and adds a test.
Signed-off-by: Euan Kemp <euank@coreos.com>
This adds an `--no-new-keyring` flag to run and create so that a new
session keyring is not created for the container and the calling
processes keyring is inherited.
Fixes#818
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Delegate is only available in systemd >218, applying it for older systemd will
result in an error. Therefore we should check for it when testing systemd
properties.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
Kernel memory cannot be set in these circumstances (before kernel 4.6):
1. kernel memory is not initialized, and there are tasks in cgroup
2. kernel memory is not initialized, and use_hierarchy is enabled,
and there are sub-cgroups
While we don't need to cover case 2 because when we set kernel
memory in runC, it's either:
- in Apply phase when we create the container, and in this case,
set kernel memory would definitely be valid;
- or in update operation, and in this case, there would be tasks
in cgroup, we only need to check if kernel memory is initialized
or not.
Even if we want to check use_hierarchy, we need to check sub-cgroups
as well, but for here, we can just leave it aside.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
In user namespaces, we need to make sure we don't chown() the console to
unmapped users. This means we need to get both the UID and GID of the
root user in the container when changing the owner.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Fix the following warnings when building runc with gcc 6+:
Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:
In function ‘nsexec’:
Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:322:6:
warning: ‘__s’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
pr_perror("Failed to open %s", ns);
Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:273:30:
note: ‘__s’ was declared here
static struct nsenter_config process_nl_attributes(int pipenum, char
*data, int data_size)
^~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
`libcontainer/cgroups/utils.go` uses an incorrect path to the
documentation for cgroups. This updates the comment to use the correct
URL. Fixes#794.
Signed-off-by: Jim Berlage <james.berlage@gmail.com>
See https://github.com/docker/docker/issues/22252
Previously we would apply seccomp rules before applying
capabilities, because it requires CAP_SYS_ADMIN. This
however means that a seccomp profile needs to allow
operations such as setcap() and setuid() which you
might reasonably want to disallow.
If prctl(PR_SET_NO_NEW_PRIVS) has been applied however
setting a seccomp filter is an unprivileged operation.
Therefore if this has been set, apply the seccomp
filter as late as possible, after capabilities have
been dropped and the uid set.
Note a small number of syscalls will take place
after the filter is applied, such as `futex`,
`stat` and `execve`, so these still need to be allowed
in addition to any the program itself needs.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Avoid parsing the whole lines of mountinfo after all mountpoints
of the target subsytems are found, or when the target subsystem
is not enabled.
Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>
Remove a wrongly added include which was added in commit 3c2e77ee (Add a
compatibility header for CentOS/RHEL 6, 2016-01-29) apparently to
fix this compile error on centos 6:
> In file included from
> Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/nsenter/nsexec.c:20:
> /usr/include/linux/netlink.h:35: error: expected specifier-qualifier-list before 'sa_family_t'
The glibc bits/sockaddr.h says that this header should never be included
directly[1]. Instead, sys/socket.h should be used.
The problem was correctly fixed later, in commit 394fb55 (Fix build
error on centos6, 2016-03-02) so the incorrect bits/sockaddr.h can
safely be removed.
This is needed to build musl libc.
Fixes#761
[1]: 20003c4988/bits/sockaddr.h (L20)
Signed-off-by: Natanael Copa <natanael.copa@docker.com>
This is the inital port of the libcontainer.Error to added a cause to
all the existing error messages. Going forward, when an error can be
wrapped because it is not being checked at the higher levels for
something like `os.IsNotExist` we can add more information to the error
message like cause and stack file/line information. This will help
higher level tools to know what cause a container start or operation to
fail.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
No substantial code change.
Note that some style errors reported by `golint` are not fixed due to possible compatibility issues.
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
When swap memory is unsupported, Docker will set
cgroup.Resources.MemorySwap as -1.
Fixes: https://github.com/docker/docker/pull/21937
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
setupDev was introduced in #96, but broken since #536 because spec 0.3.0 introduced default devices.
Fix#80 again
Fixdocker/docker#21808
Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
One of our volume plugins needs to get the label of the target mount point
so that it can set the content inside of the volume to match.
We need label.GetFileLabel() to make this work.
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Some of the code was quite confusing inside libcontainer/user, so
refactor and comment it so future maintainers can understand what's
going and what edge cases we have to deal with.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Most shadow-related tools don't treat numeric ids as potential
usernames, so change our behaviour to match that. Previously, using an
explicit specification like 111:222 could result in the UID and GID not
being 111 and 222 respectively (which is confusing).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This adds a `--no-pivot` cli flag to runc so that a container's rootfs
can be located ontop of ramdisk/tmpfs and not fail because you cannot
pivot root.
This should be a cli flag and not part of the spec because this is a
detail of the host/runtime environment and not an attribute of a
container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Currently, if we start a container with:
`docker run -ti --name foo --memory 300M --memory-swap 500M busybox sh`
Then we want to update it with:
`docker update --memory 600M --memory-swap 800M foo`
It'll get error because we can't set memory to 600M with
the 500M limit of swap memory.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Based on Golang document, %s is for "the uninterpreted bytes of the
string or slice", so %v is more appropriate.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Users of libcontainer other than runc may also require parsing and
converting specification configuration files.
Since runc cannot be imported, move the relevant functions and
definitions to a separate package, libcontainer/specconv.
Signed-off-by: Ido Yariv <ido@wizery.com>
Fixes#680
This changes setupRlimit to use the Prlimit syscall (rather than
Setrlimit) and moves the call to the parent process. This is necessary
because Setrlimit would affect the libcontainer consumer if called in
the parent, and would fail if called from the child if the
child process is in a user namespace and the requested rlimit is higher
than that in the parent.
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Make sure we don't error out collecting statistics for cases where
pids.max == "max". In that case, we can use a limit of 0 which means
"unlimited".
In addition, change the name of the stats attribute (Max) to mirror the
name of the resources attribute in the spec (Limit) so that it's
consistent internally.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This is required because we manage some of the cgroups ourselves.
This recommendation came from talking with systemd devs about
some of the issues that we see when using the systemd cgroups driver.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>