Commit Graph

93 Commits

Author SHA1 Message Date
Mrunal Patel 4f9cb13b64 Update runtime spec to 1.0.0.rc5
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-15 11:38:37 -07:00
Qiang Huang b7932a2e07 Remove unused ExecFifoPath
In container process's Init function, we use
fd + execFifoFilename to open exec fifo, so this
field in init config is never used.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-09 10:58:16 +08:00
Qiang Huang 707dd48b2f Merge pull request #1001 from x1022as/predump
add pre-dump and parent-path to checkpoint
2017-02-24 10:55:06 -08:00
Qiang Huang 733563552e Fix state when _LIBCONTAINER in environment
Fixes: #1311

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:35:14 -08:00
Qiang Huang 805b8c73d3 Do not create exec fifo in factory.Create
It should not be binded to container creation, for
example, runc restore needs to create a
libcontainer.Container, but it won't need exec fifo.

So create exec fifo when container is started or run,
where we really need it.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:34:48 -08:00
Deng Guangxing 98f004182b add pre-dump and parent-path to checkpoint
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.

This patch add pre-dump and parent-path arguments to runc checkpoint

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2017-02-14 19:45:07 +08:00
Aleksa Sarai e034cedce7
libcontainer: init: only pass stateDirFd when creating a container
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
*host* filesystem from that file descriptor. This was fixed in part by
5d93fed3d2 ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.

To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).

There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.

Fixes: CVE-2016-9962
Fixes: 5d93fed3d2 ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-02-02 00:41:11 +11:00
Qiang Huang db99936a0e Merge pull request #1110 from avagin/cpt-in-userns
checkpoint: handle config.Devices and config.MaskPaths
2017-01-10 00:34:40 -06:00
Zhang Wei a344b2d6a8 sync up `HookState` with OCI spec `State`
`HookState` struct should follow definition of `State` in runtime-spec:

* modify json name of `version` to `ociVersion`.
* Remove redundant `Rootfs` field as rootfs can be retrived from
`bundlePath/config.json`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-20 00:00:43 +08:00
Mrunal Patel 34f23cb99c Merge pull request #1018 from cyphar/console-rewrite
Consoles, consoles, consoles.
2016-12-07 14:37:19 -08:00
Xianlu Bird e2e6f58e4e Fix typo
Fix typo
2016-12-01 15:23:58 +08:00
Aleksa Sarai 244c9fc426
*: console rewrite
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.

In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.

We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Michael Crosby e58671e530 Add --all flag to kill
This allows a user to send a signal to all the processes in the
container within a single atomic action to avoid new processes being
forked off before the signal can be sent.

This is basically taking functionality that we already use being
`delete` and exposing it ok the `kill` command by adding a flag.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-08 09:35:02 -08:00
Andrei Vagin 040fb7311c checkpoint: handle config.Devices and config.MaskPaths
In user namespaces devices are bind-mounted from the host, so
we need to add them as external mounts for CRIU.

Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2016-10-26 23:50:54 +03:00
Aleksa Sarai 2cd9c31b99
nsenter: guarantee correct user namespace ordering
Depending on your SELinux setup, the order in which you join namespaces
can be important. In general, user namespaces should *always* be joined
and unshared first because then the other namespaces are correctly
pinned and you have the right priviliges within them. This also is very
useful for rootless containers, as well as older kernels that had
essentially broken unshare(2) and clone(2) implementations.

This also includes huge refactorings in how we spawn processes for
complicated reasons that I don't want to get into because it will make
me spiral into a cloud of rage. The reasoning is in the giant comment in
clone_parent. Have fun.

In addition, because we now create multiple children with CLONE_PARENT,
we cannot wait for them to SIGCHLD us in the case of a death. Thus, we
have to resort to having a child kindly send us their exit code before
they die. Hopefully this all works okay, but at this point there's not
much more than we can do.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-04 16:17:55 +11:00
Aleksa Sarai ed053a740c
nsenter: specify namespace type in setns()
This avoids us from running into cases where libcontainer thinks that a
particular namespace file is a different type, and makes it a fatal
error rather than causing broken functionality.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-04 16:17:55 +11:00
Wang Long 59a241f647 update the comment for container.Pause() method on linux
if a container state is running or created, the container.Pause()
method can set the state to pausing, and then paused.

this patch update the comment, so it can be consistent with the code.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-09-20 10:49:04 +08:00
Qiang Huang 1e319efa36 Merge pull request #815 from rajasec/basecont-comments
Updated the libcontainer interface comments
2016-08-26 09:43:50 +08:00
Michael Crosby 46d9535096 Merge pull request #934 from macrosheep/fix-initargs
Fix and refactor init args
2016-08-24 10:06:01 -07:00
rajasec 1ea17d73fe Updated the libcontainer interface comments
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-08-23 19:14:27 +05:30
Phil Estes 85f4d20b44
Restored-from-checkpoint containers should have a start time
Set the start time similar to a brand new container.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2016-08-21 18:15:18 -04:00
Qiang Huang 41b12c095b Merge pull request #913 from cloudfoundry-incubator/addgroupsnocompatible
Let the user explicitly specify `additionalGids` on `runc exec`
2016-07-15 10:12:31 +08:00
Yang Hongyang a59d63c5d3 Fix and refactor init args
1. According to docs of Cmd.Path and Cmd.Args from package "os/exec":
   Path is the path of the command to run. Args holds command line
   arguments, including the command as Args[0]. We have mixed usage
   of args. In InitPath(), InitArgs only take arguments, in InitArgs(),
   InitArgs including the command as Args[0]. This is confusing.
2. InitArgs() already have the ability to configure a LinuxFactory
   with the provided absolute path to the init binary and arguements as
   InitPath() does.
3. exec.Command() will take care of serching executable path.
4. The default "/proc/self/exe" instead of os.Args[0] is passed to
   InitArgs in order to allow relative path for the runC binary.

Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>
2016-07-06 23:21:02 -04:00
Qiang Huang 14e95b2aa9 Make state detection precise
Fixes: https://github.com/opencontainers/runc/issues/871

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-07-05 08:24:13 +08:00
Petar Petrov f9b72b1b46 Allow additional groups to be overridden in exec
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2016-06-21 10:35:11 +03:00
Mrunal Patel f5b6ff23b8 Merge pull request #881 from rajasec/update-status
Update for stopped container
2016-06-13 16:05:25 -07:00
Michael Crosby 3aacff695d Use fifo for create/start
This removes the use of a signal handler and SIGCONT to signal the init
process to exec the users process.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-13 11:26:53 -07:00
rajasec 12869604ca Update for stopped container
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-04 22:08:08 +05:30
Michael Crosby 1d61abea46 Allow delete of created container
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-02 12:26:12 -07:00
Michael Crosby 6eba9b8ffb Fix SystemError and env lookup
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:10:47 -07:00
Michael Crosby efcd73fb5b Fix signal handling for unit tests
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:10:47 -07:00
Michael Crosby 30f1006b33 Fix libcontainer states
Move initialized to created and destoryed to stopped.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Michael Crosby 3fe7d7f31e Add create and start command for container lifecycle
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Andrew Vagin c161e65ac6 cr: don't fill veth devices if netns is in EmptyNs
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
2016-05-28 01:19:54 +03:00
Qiang Huang b6e23f8166 Add comments for error cases in status functions
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-16 18:24:07 +08:00
Michael Crosby 7dd87976ed Merge pull request #758 from rajasec/container-pause-comment
Update the comment for container pause
2016-04-19 16:16:41 -07:00
Michael Crosby 6978875298 Add cause to error messages
This is the inital port of the libcontainer.Error to added a cause to
all the existing error messages.  Going forward, when an error can be
wrapped because it is not being checked at the higher levels for
something like `os.IsNotExist` we can add more information to the error
message like cause and stack file/line information.  This will help
higher level tools to know what cause a container start or operation to
fail.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-18 11:37:26 -07:00
rajasec ccbd0a176f Update the comment for container pause
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-04-16 14:59:19 +05:30
Akihiro Suda 1829531241 Fix trivial style errors reported by `go vet` and `golint`
No substantial code change.
Note that some style errors reported by `golint` are not fixed due to possible compatibility issues.

Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>
2016-04-12 08:13:16 +00:00
George Lestaris f7ae27bfb7 HookState adhears to OCI
Signed-off-by: George Lestaris <glestaris@pivotal.io>
Signed-off-by: Ed King <eking@pivotal.io>
2016-04-06 16:57:59 +01:00
Peng Gao 3fa246609c Fix typo
Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>
2016-03-27 12:44:16 +08:00
Jessica Frazelle 2c5b10189c
remove deadcode
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-03-17 13:36:28 -07:00
Michael Crosby 732a0fb440 Merge pull request #638 from hqhq/hq_fix_bootstrapData
Fix encoding gid mappings
2016-03-14 11:55:12 -07:00
Mrunal Patel 459efccb0a Merge pull request #576 from avagin/cr
Call Prestart hooks before restoring processes
2016-03-14 11:21:29 -07:00
Qiang Huang 2f2c83a2a0 Fix encoding gid mappings
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-03-12 13:18:42 +08:00
Michael Crosby 20422c9bd9 Update libcontainer to support rlimit per process
This updates runc and libcontainer to handle rlimits per process and set
them correctly for the container.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-10 14:35:16 -08:00
Michael Crosby 3cc90bd2d8 Add support for process overrides of settings
This commit adds support to libcontainer to allow caps, no new privs,
apparmor, and selinux process label to the process struct so that it can
be used together of override the base settings on the container config
per individual process.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-03 11:41:33 -08:00
Ido Yariv 78f5148c67 Fix handling of unsupported namespaces
currentState() always adds all possible namespaces to the state,
regardless of whether they are supported.
If orderNamespacePaths detects an unsupported namespace, an error is
returned that results in initialization failure.

Fix this by only adding paths of supported namespaces to the state.

Signed-off-by: Ido Yariv <ido@wizery.com>
2016-03-02 10:16:51 -05:00
Daniel, Dao Quang Minh 42d5d04801 Sets custom namespaces for init processes
An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.

This moves all setns and cloneflags related code to nsenter layer, which mean
that we dont use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.

With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
2016-02-28 12:26:53 -08:00
Daniel, Dao Quang Minh d6bf4049f8 OrderNamespacePaths gets correct order of ns
This adds orderNamespacePaths to get correct order of namespaces for the
bootstrap program to join.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2016-02-28 12:26:53 -08:00