In container process's Init function, we use
fd + execFifoFilename to open exec fifo, so this
field in init config is never used.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Fixes: #1347Fixes: #1083
The root cause of #1083 is because we're joining an
existed cgroup whose kmem accouting is not initialized,
and it has child cgroup or tasks in it.
Fix it by checking if the cgroup is first time created,
and we should enable kmem accouting if the cgroup is
craeted by libcontainer with or without kmem limit
configed. Otherwise we'll get issue like #1347
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
It should not be binded to container creation, for
example, runc restore needs to create a
libcontainer.Container, but it won't need exec fifo.
So create exec fifo when container is started or run,
where we really need it.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
The error message added here provides no value as the caller already
knows all the added details. However it is covering up the underyling
system error (typically `ENOTSUP`). There is no way to handle this error before
this change.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
getDevices() has been updated to skip `/dev/.lxc` and `/dev/.lxd-mounts`, which was breaking privileged Docker containers running on runC, inside of LXD managed Linux Containers
Signed-off-by: Carlton-Semple <carlton.semple@ibm.com>
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.
This patch add pre-dump and parent-path arguments to runc checkpoint
Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
As the runtime-spec allows it, we want to be able to specify overlayfs
mounts with:
{
"destination": "/etc/pki",
"type": "overlay",
"source": "overlay",
"options": [
"lowerdir=/etc/pki:/home/amurdaca/go/src/github.com/opencontainers/runc/rootfs_fedora/etc/pki"
]
},
This patch takes care of allowing overlayfs mounts. Both RO and RW
should be supported.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
`label.InitLabels` takes options as a string slice in the form of:
user:system_u
role:system_r
type:container_t
level:s0:c4,c5
However, `DupSecOpt` and `DisableSecOpt` were still adding a docker
specifc `label=` in front of every option. That leads to `InitLabels`
not being able to correctly init selinux labels in this scenario for
instance:
label.InitLabels(DupSecOpt([%OPTIONS%]))
if `%OPTIONS` has options prefixed with `label=`, that's going to fail.
Fix this by removing that docker specific `label=` prefix.
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Correct container.Destroy() docs to clarify that destroy can only operate on containers in specific states.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
If a relative pathed exe is used for InitArgs init will fail to run if Cwd is not set the original path.
Prevent failure of init to run by ensuring that exe in InitArgs is an absolute path.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
*host* filesystem from that file descriptor. This was fixed in part by
5d93fed3d2 ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.
To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).
There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.
Fixes: CVE-2016-9962
Fixes: 5d93fed3d2 ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
When signaling children and the signal is SIGKILL wait for children
otherwise conditionally wait for children which are ready to report.
This reaps all children which exited due to the signal sent without
blocking indefinitely.
Also:
* Ignore ignore ECHILD, which means the child has already gone.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
Ensure that the pipe is always closed during the error processing of StartInitialization.
Also:
* Fix a comment typo.
* Use newContainerInit directly as there's no need for i to be an initer.
* Move the comment about the behaviour of Init() directly above it, clarifying what happens for all defers.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
Add the import of nsenter to the example in libcontainer's README.md, as without it none of the example code works.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
POSIX mandates that `cmsg_len` in `struct cmsghdr` is a `socklen_t`,
which is an `unsigned int`. Musl libc as used in Alpine implements
this; Glibc ignores the spec and makes it a `size_t` ie `unsigned long`.
To avoid the `-Wformat=` warning from the `%lu` on Alpine, cast this
to an `unsigned long` always.
Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Add godoc links to README.md files for runc and libcontainer so its easy to access the golang documentation.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
signalAllProcesses was making the assumption that the requested signal was SIGKILL, possibly due to the signal parameter being added at a later date, and hence it was safe to wait for all processes which is not the case.
BaseContainer.Signal(s os.Signal, all bool) exposes this functionality to consumers, so an arbitrary signal could be used which is not guaranteed to make the processes exit.
Correct the documentation for signalAllProcesses around the signal delivered and update it so that the wait is only performed on SIGKILL hence making it safe to process other signals without risk of blocking forever, while still maintaining compatibility to SIGKILL callers.
Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
The parameters passed to `GetExecUser` is not correct.
Consider the following code:
```
package main
import (
"fmt"
"io"
"os"
)
func main() {
passwd, err := os.Open("/etc/passwd1")
if err != nil {
passwd = nil
} else {
defer passwd.Close()
}
err = GetUserPasswd(passwd)
if err != nil {
fmt.Printf("%#v\n", err)
}
}
func GetUserPasswd(r io.Reader) error {
if r == nil {
return fmt.Errorf("nil source for passwd-formatted
data")
} else {
fmt.Printf("r = %#v\n", r)
}
return nil
}
```
If the file `/etc/passwd1` is not exist, we expect to return
`nil source for passwd-formatted data` error, and in fact, the func
`GetUserPasswd` return nil.
The same logic exists in runc code. this patch fix it.
Signed-off-by: Wang Long <long.wanglong@huawei.com>