Currently, if we start a container with:
`docker run -ti --name foo --memory 300M --memory-swap 500M busybox sh`
Then we want to update it with:
`docker update --memory 600M --memory-swap 800M foo`
It'll get error because we can't set memory to 600M with
the 500M limit of swap memory.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Based on Golang document, %s is for "the uninterpreted bytes of the
string or slice", so %v is more appropriate.
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
Users of libcontainer other than runc may also require parsing and
converting specification configuration files.
Since runc cannot be imported, move the relevant functions and
definitions to a separate package, libcontainer/specconv.
Signed-off-by: Ido Yariv <ido@wizery.com>
Fixes#680
This changes setupRlimit to use the Prlimit syscall (rather than
Setrlimit) and moves the call to the parent process. This is necessary
because Setrlimit would affect the libcontainer consumer if called in
the parent, and would fail if called from the child if the
child process is in a user namespace and the requested rlimit is higher
than that in the parent.
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Make sure we don't error out collecting statistics for cases where
pids.max == "max". In that case, we can use a limit of 0 which means
"unlimited".
In addition, change the name of the stats attribute (Max) to mirror the
name of the resources attribute in the spec (Limit) so that it's
consistent internally.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This is required because we manage some of the cgroups ourselves.
This recommendation came from talking with systemd devs about
some of the issues that we see when using the systemd cgroups driver.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
We need to make sure the container is destroyed before closing the stdio
for the container. This becomes a big issues when running in the host's
pid namespace because the other processes could have inherited the stdio
of the initial process. The call to close will just block as they still
have the io open.
Calling destroy before closing io, especially in the host pid namespace
will cause all additional processes to be killed in the container's
cgroup. This will allow the io to be closed successfuly.
This change makes sure the order for destroy and close is correct as
well as ensuring that if any errors encoutered during start or exec will
be handled by terminating the process and destroying the container. We
cannot use defers here because we need to enforce the correct ordering
on destroy.
This also sets the subreaper setting for runc so that when running in
pid host, runc can wait on the addiontal processes launched by the
container, useful on destroy, but also good for reaping the additional
processes that were launched.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This ensures that anything written to the logs are synced as they
happen.
This also changes the error message of the libcontainer error. The
original idea was to have this extra information in the message but it
makes it hard to parse and if the caller needed this information they
can just get it from the error type.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
In order to allow nice usage statistics (in terms of percentages and
other such data), add the value of pids.max to the PidsStats struct
returned from the pids cgroup controller.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Clears supplementary groups that have effect on the
mount permissions before joining the user specified
groups happens.
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
This updates runc and libcontainer to handle rlimits per process and set
them correctly for the container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The rhel6 kernel returns EINVAL in this case
Known issue:
* CT with userns doesn't work
This is a copy of
d31e97fa28
to address https://github.com/opencontainers/runc/issues/613
Signed-off-by: Andrey Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Fernandes <andrew@fernandes.org>
Now that all the user namespace code is moved into C, these routines are
no longer used.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
The re-work of namespace entering lost the setuid/setgid that was part
of the Go-routine based process exec in the prior code. A side issue was
found with setting oom_score_adj before execve() in a userns that is
also solved here.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This commit adds support to libcontainer to allow caps, no new privs,
apparmor, and selinux process label to the process struct so that it can
be used together of override the base settings on the container config
per individual process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This is needed to make 'runc delete' correctly run the post-stop hooks.
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Ed King <eking@pivotal.io>
currentState() always adds all possible namespaces to the state,
regardless of whether they are supported.
If orderNamespacePaths detects an unsupported namespace, an error is
returned that results in initialization failure.
Fix this by only adding paths of supported namespaces to the state.
Signed-off-by: Ido Yariv <ido@wizery.com>
The path in the stacktrace might not be:
"github.com/opencontainers/runc/libcontainer/stacktrace"
For example, for me its:
"_/go/src/github.com/opencontainers/runc/libcontainer/stacktrace"
so I changed the check to make sure the tail end of the path matches instead
of the entire thing
Signed-off-by: Doug Davis <dug@us.ibm.com>
This simply move the call to the Prestart hooks to be made once we
receive the procReady message from the client.
This is necessary as we had to move the setns calls within nsexec in
order to be accomodate joining namespaces that only affect future
children (e.g. NEWPID).
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.
This moves all setns and cloneflags related code to nsenter layer, which mean
that we dont use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.
With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
This adds orderNamespacePaths to get correct order of namespaces for the
bootstrap program to join.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
This adds `configs.IsNamespaceSupported(nsType)` to check if the host supports
a namespace type.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
This allows the mount syscall to validate the addiontal types where we
do not have to perform extra validation and is up to the consumer to
verify the functionality of the type of device they are trying to
mount.
Fixes#572
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Exec erros from the exec() syscall in the container's init should be
treated as if the container ran but couldn't execute the process for the
user instead of returning a libcontainer error as if it was an issue in
the library.
Before specifying different commands like `/etc`, `asldfkjasdlfj`, or
`/alsdjfkasdlfj` would always return 1 on the command line with a
libcontainer specific error message. Now they return the correct
message and exit status defined for unix processes.
Example:
```bash
root@deathstar:/containers/redis# runc start test
exec: "/asdlfkjasldkfj": file does not exist
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "asdlfkjasldkfj": executable file not found in $PATH
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "/etc": permission denied
root@deathstar:/containers/redis# echo $?
126
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: rajasec <rajasec79@gmail.com>
Changing to name values for defer as per review comments
Signed-off-by: rajasec <rajasec79@gmail.com>
Fixed review comments
Signed-off-by: rajasec <rajasec79@gmail.com>
This prior fix to set "-1" explicitly was lost, and it is simpler to use
the same pointer type from the OCI spec to handle nil pointer == -1 ==
unset case.
Also, as a nearly humorous aside, there was a test for MemorySwappiness
that was actually setting Memory, and it was passing because of this
bug (as it was always setting everyone's MemorySwappiness to zero!)
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Create a unique session key name for every container. Use the pattern
_ses.<postfix> with postfix being the container's Id.
This patch does not prevent containers from joining each other's session
keyring.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>