The rework of namespace entering dropped the setuid/setgid calls that
were part of the goroutine-based process exec in the prior code. A side
issue with setting oom_score_adj before execve() in a userns was also
found and is fixed here.
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
This commit adds caps, no new privs, apparmor, and selinux process label
to the libcontainer process struct so that they can be used together
with, or override, the base settings in the container config on a
per-process basis.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This bump of the spec includes a change to the device type to be a
string so that it is more readable in the json serialization.
It also includes the change where caps, no new privs, and process
labeling features are moved from the container config onto the process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
currentState() always adds all possible namespaces to the state,
regardless of whether they are supported.
If orderNamespacePaths detects an unsupported namespace, an error is
returned that results in initialization failure.
Fix this by only adding paths of supported namespaces to the state.
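A minimal sketch of the filtering idea, not the actual libcontainer code:
the nsType type, the supportedPaths helper, and the predicate below are
hypothetical stand-ins used only for illustration.

```go
package main

import "fmt"

// nsType is a hypothetical stand-in for the libcontainer namespace type.
type nsType string

// supportedPaths returns a copy of paths that keeps only the entries whose
// namespace type the supported predicate accepts.
func supportedPaths(paths map[nsType]string, supported func(nsType) bool) map[nsType]string {
	out := make(map[nsType]string, len(paths))
	for t, p := range paths {
		if supported(t) {
			out[t] = p
		}
	}
	return out
}

func main() {
	paths := map[nsType]string{
		"NEWPID":  "/proc/42/ns/pid",
		"NEWUSER": "/proc/42/ns/user",
	}
	// Assumption for the example: the host lacks user namespaces.
	noUserNS := func(t nsType) bool { return t != "NEWUSER" }
	for t, p := range supportedPaths(paths, noUserNS) {
		fmt.Println(t, p)
	}
}
```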
Signed-off-by: Ido Yariv <ido@wizery.com>
The path in the stacktrace might not be:
"github.com/opencontainers/runc/libcontainer/stacktrace"
For example, for me it is:
"_/go/src/github.com/opencontainers/runc/libcontainer/stacktrace"
so I changed the check to make sure the tail end of the path matches
instead of the entire thing.
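The tail-end comparison can be sketched as follows. This is an
illustration only, not the test code in the repository; hasPathSuffix is
a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// hasPathSuffix reports whether got ends with the expected import path,
// so that build-specific prefixes such as "_/go/src/" are tolerated.
func hasPathSuffix(got, want string) bool {
	return strings.HasSuffix(got, want)
}

func main() {
	const want = "github.com/opencontainers/runc/libcontainer/stacktrace"
	fmt.Println(hasPathSuffix("_/go/src/"+want, want))
	fmt.Println(hasPathSuffix(want, want))
	fmt.Println(hasPathSuffix("github.com/other/pkg", want))
}
```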
Signed-off-by: Doug Davis <dug@us.ibm.com>
This simply moves the call to the Prestart hooks so that it is made once
we receive the procReady message from the client.
This is necessary as we had to move the setns calls within nsexec in
order to accommodate joining namespaces that only affect future
children (e.g. NEWPID).
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.
This moves all setns and cloneflags related code to nsenter layer, which means
that we don't use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.
With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
This adds orderNamespacePaths to get correct order of namespaces for the
bootstrap program to join.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
This adds `configs.IsNamespaceSupported(nsType)` to check if the host supports
a namespace type.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
This updates the current list to what we have now in docker and also
makes these always added so that these are masked out. Privileged
containers can always unmount these if they want to read from kcore or
something like that.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows the mount syscall to validate the additional types where we
do not have to perform extra validation; it is up to the consumer to
verify the functionality of the type of device they are trying to
mount.
Fixes #572
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add a waitgroup to wait for the io.Copy of stdout/err to finish before
exiting runc. The problem happens more in exec because it is really
fast and the pipe has data buffered but not yet read after the process
has already exited.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Exec errors from the exec() syscall in the container's init should be
treated as if the container ran but couldn't execute the process for the
user instead of returning a libcontainer error as if it was an issue in
the library.
Previously, specifying commands like `/etc`, `asldfkjasdlfj`, or
`/alsdjfkasdlfj` would always return 1 on the command line with a
libcontainer specific error message. Now they return the correct
message and exit status defined for unix processes.
Example:
```bash
root@deathstar:/containers/redis# runc start test
exec: "/asdlfkjasldkfj": file does not exist
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "asdlfkjasldkfj": executable file not found in $PATH
root@deathstar:/containers/redis# echo $?
127
root@deathstar:/containers/redis# runc start test
exec: "/etc": permission denied
root@deathstar:/containers/redis# echo $?
126
```
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>