Commit Graph

762 Commits

Author SHA1 Message Date
Aleksa Sarai 7df64f8886
runc: implement --console-socket
This allows for higher-level orchestrators to be able to have access to
the master pty file descriptor without keeping the runC process running.
This is key to having (detach && createTTY) with a _real_ pty created
inside the container, which is then sent to a higher level orchestrator
over an AF_UNIX socket.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Mrunal Patel f1324a9fc1
Don't label the console as it already has the right label
[@cyphar: removed mountLabel argument from .mount().]

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai c0c8edb9e8
console: don't chown(2) the slave PTY
Since the gid=X and mode=Y flags can be set inside config.json as mount
options, don't override them with our own defaults. This avoids
/dev/pts/* not being owned by tty in a regular container, as well as all
of the issues with us implementing grantpt(3) manually. This is the
least opinionated approach to take.

This patch is part of the console rewrite patchset.

Reported-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 244c9fc426
*: console rewrite
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.

In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.

We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 4776b4326a
libcontainer: refactor syncT handling
To make the code cleaner, and more clear, refactor the syncT handling
used when creating the `runc init` process. In addition, document the
state changes so that people actually understand what is going on.

Rather than only using syncT for the standard initProcess, use it for
both initProcess and setnsProcess. This removes some special cases, as
well as allowing for the use of syncT with setnsProcess.

Also remove a bunch of the boilerplate around syncT handling.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
Aleksa Sarai 2055115566
cmsg: add cmsg {send,recv}fd wrappers
This adds C wrappers for sendmsg and recvmsg, specifically used for
passing around file descriptors in Go. The wrappers (sendfd, recvfd)
expect to be called in a context where it makes sense (where the other
side is carrying out the corresponding action).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
allencloud f596858395 fix typos
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-11-30 13:31:36 +08:00
Mrunal Patel 4c013a1524 Merge pull request #1194 from hqhq/fix_cpu_exclusive
Fix cpuset issue with cpuset.cpu_exclusive
2016-11-29 09:49:34 -08:00
Daniel, Dao Quang Minh f156f73c2a Merge pull request #1154 from hqhq/sync_child
Sync with grandchild
2016-11-23 09:10:00 -08:00
Qiang Huang aee46862ec Fix cpuset issue with cpuset.cpu_exclusive
This PR fix issue in this scenario:

```
in terminal 1:
~# cd /sys/fs/cgroup/cpuset
~# mkdir test
~# cd test
~# cat cpuset.cpus
0-3
~# echo 1 > cpuset.cpu_exclusive (make sure you don't have other cgroups under root)

in terminal 2:
~# echo $$ > /sys/fs/cgroup/cpuset/test/tasks
// set resources.cpu.cpus="0-2" in config.json
~# runc run test1

back to terminal 1:
~# cd test1
~# cat cpuset.cpus
0-2
~# echo 1 > cpuset.cpu_exclusive

in terminal 3:
~# echo $$ > /sys/fs/cgroup/test/tasks
// set resources.cpu.cpus="3" in config.json
~# runc run test2
container_linux.go:247: starting container process caused "process_linux.go:258:
applying cgroup configuration for process caused \"failed to write 0-3\\n to
cpuset.cpus: write /sys/fs/cgroup/cpuset/test2/cpuset.cpus: invalid argument\""
```

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-18 15:28:40 +08:00
Qiang Huang 16a2e8ba6e Sync with grandchild
Without this, it's possible that father process exit with
0 before grandchild exit with error.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-17 08:59:37 +08:00
rajasec 43287af982 Fixing error message in nsexec
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-11-10 17:06:50 +05:30
Mrunal Patel 51371867a0 Merge pull request #1180 from crosbymichael/kill-all
Add --all flag to kill
2016-11-09 12:21:22 -07:00
Michael Crosby e58671e530 Add --all flag to kill
This allows a user to send a signal to all the processes in the
container within a single atomic action to avoid new processes being
forked off before the signal can be sent.

This is basically taking functionality that we already use being
`delete` and exposing it ok the `kill` command by adding a flag.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-08 09:35:02 -08:00
Mrunal Patel 8779fa57eb Merge pull request #1168 from hqhq/fix_nsexec_comments
More fix to nsexec.c's comments
2016-11-07 16:20:42 -07:00
Michael Crosby 5f24c9a61a Merge pull request #1146 from cyphar/io-set-termios-onlcr
libcontainer: io: stop screwing with \n in console output
2016-11-03 09:49:50 -07:00
Mrunal Patel d7481c10f4 Merge pull request #1172 from crosbymichael/ambient-tag
Move ambient capabilties behind build tag
2016-11-02 20:16:26 -07:00
Qiang Huang 84a4218ece More fix to nsexec.c's comments
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-03 10:15:01 +08:00
Aleksa Sarai 49ed0a10e4
merge branch 'pr-1117'
LGTMs: @hqhq @cyphar
Closes: #1117
2016-11-03 05:03:26 +11:00
Michael Crosby 603c151e6c Move ambient capabilties behind build tag
This moves the ambient capability support behind an `ambient` build tag
so that it is only compiled upon request.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-02 10:59:59 -07:00
Crazykev 34d7c5c099 fix error message
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
2016-11-02 16:34:08 +08:00
Aleksa Sarai fd7ab60a70
libcontainer: make tests to make sure we don't mess with \r
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Aleksa Sarai eea28f480d
libcontainer: io: stop screwing with \n in console output
The default terminal setting for a new pty on Linux (unix98) has +ONLCR,
resulting in '\n' writes by a container process to be converted to
'\r\n' reads by the managing process. This is quite unexpected, and
causes multiple issues with things like bats testing. To fix it, make
the terminal sane after opening it by setting -ONLCR.

This patch might need to be rewritten after the console rewrite patchset
is merged.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Mrunal Patel bc462c96bf Merge pull request #1165 from cyphar/nsenter-fix-comments
nsenter: fix up comments
2016-10-31 10:39:34 -07:00
Daniel, Dao Quang Minh 509b1db98c Merge pull request #1160 from hqhq/fix_typos
Fix all typos found by misspell
2016-10-31 17:28:44 +00:00
Michael Crosby 8b9b444820 Merge pull request #1157 from rajasec/readme-containerstate
Updating container state and status API in README
2016-10-31 10:26:21 -07:00
Michael Crosby 4c7b8d6c59 Merge pull request #1159 from hqhq/unify_rootfs_validation
Unify rootfs validation
2016-10-31 10:22:01 -07:00
Aleksa Sarai 9b15bf17a0
nsenter: fix up comments
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 00:21:09 +11:00
rajasec 16ad3855e7 Correction in util error messages
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-29 19:50:56 +05:30
Qiang Huang b15668b36d Fix all typos found by misspell
I use the same tool (https://github.com/client9/misspell)
as Daniel used a few days ago, don't why he missed these
typos at that time.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 14:14:42 +08:00
Qiang Huang 81d6088c8f Unify rootfs validation
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 10:31:44 +08:00
rajasec 1535e67592 Updating container state and status API in README
Signed-off-by: rajasec <rajasec79@gmail.com>

Updating container state and status API in README

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-27 15:29:34 +05:30
Qiang Huang e7abf30cb8 Merge pull request #1150 from WeiZhang555/forbid-duplicated-namespace
Detect and forbid duplicated namespace in spec
2016-10-27 10:23:16 +08:00
Qiang Huang f520eab891 Remove unnecessary cloneflag validation
config.cloneflag is not mandatory, when using `runc exec`,
config.cloneflag can be empty, and even then it won't be
`-1` but `0`.

So this validation is totally wrong and unneeded.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 09:34:20 +08:00
Mrunal Patel 4599e7074e Merge pull request #1148 from rhvgoyal/parent-mount-private
Make parent mount private before bind mounting rootfs
2016-10-26 17:30:37 +00:00
Zhang Wei a0f7977f0f Detect and forbid duplicated namespace in spec
When spec file contains duplicated namespaces, e.g.

specs: specs.Spec{
        Linux: &specs.Linux{
            Namespaces: []specs.Namespace{
                {
                    Type: "pid",
                },
                {
                    Type: "pid",
                    Path: "/proc/1/ns/pid",
                },
            },
        },
    }

runc should report malformed spec instead of using latest one by
default, because this spec could be quite confusing.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 00:44:36 +08:00
Michael Crosby 6328410520 Merge pull request #1149 from cyphar/fix-sysctl-validation
validator: unbreak sysctl net.* validation
2016-10-26 09:06:41 -07:00
Aleksa Sarai 1ab3c035d2
validator: actually test success
Previously we only tested failures, which causes us to miss issues where
setting sysctls would *always* fail.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 23:07:57 +11:00
Aleksa Sarai 2a94c3651b
validator: unbreak sysctl net.* validation
When changing this validation, the code actually allowing the validation
to pass was removed. This meant that any net.* sysctl would always fail
to validate.

Fixes: bc84f83344 ("fix docker/docker#27484")
Reported-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 22:58:51 +11:00
Qiang Huang 157a96a428 Merge pull request #977 from cyphar/nsenter-userns-ordering
nsenter: guarantee correct user namespace ordering
2016-10-26 16:45:15 +08:00
Vivek Goyal 6c147f8649 Make parent mount private before bind mounting rootfs
This reverts part of the commit eb0a144b5e

That commit introduced two issues.

- We need to make parent mount of rootfs private before bind mounting
  rootfs. Otherwise bind mounting root can propagate in other mount
  namespaces. (If parent mount is shared).

- It broke test TestRootfsPropagationSharedMount() on Fedora.

  On fedora /tmp is a mount point with "shared" propagation. I think
  you should be able to reproduce it on other distributions as well
  as long as you mount tmpfs on /tmp and make it "shared" propagation.

  Reason for failure is that pivot_root() fails. And it fails because
  kernel does following check.

  IS_MNT_SHARED(new_mnt->mnt_parent)

  Say /tmp/foo is new rootfs, we have bind mounted rootfs, so new_mnt
  is /tmp/foo, and new_mnt->mnt_parent is /tmp which is "shared" on
  fedora and above check fails.

So this change broke few things, it is a good idea to revert part of it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2016-10-25 11:15:11 -04:00
Qiang Huang 4ec570d060 Merge pull request #1138 from gaocegege/fix-config-validator
docker/docker#27484-check if sysctls are used in host network mode.
2016-10-25 11:08:51 +08:00
Aleksa Sarai c7ed2244f4
merge branch 'pr-1125'
LGTMs: @hqhq @mrunalp
Closes #1125
2016-10-25 10:05:28 +11:00
Ce Gao 41c35810f2 add test cases about host ns
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:31:15 +08:00
Ce Gao bc84f83344 fix docker/docker#27484
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:22:52 +08:00
Alexander Morozov 1ab9d5e6f4 Merge pull request #845 from mrunalp/cp_tmpfs
Add support for copying up directories into tmpfs when a tmpfs is mounted over them
2016-10-21 13:47:16 -07:00
Mrunal Patel c4198ad9af Merge pull request #1134 from WeiZhang555/tiny-refactor
Some refactor and cleanup
2016-10-20 15:08:40 -07:00
Yong Tang a83f5bac28 Fix issue in `GetProcessStartTime`
This fix tries to address the issue raised in docker:
https://github.com/docker/docker/issues/27540

The issue was that `GetProcessStartTime` use space `"  "`
to split the `/proc/[pid]/stat` and take the `22`th value.

However, the `2`th value is inside `(` and `)`, and could
contain space. The following are two examples:
```
ubuntu@ubuntu:~/runc$ cat /proc/90286/stat
90286 (bash) S 90271 90286 90286 34818 90286 4194560 1412 1130576 4 0 2 1 2334 438 20 0 1 0 3093098 20733952 823 18446744073709551615 1 1 0 0 0 0 0 3670020 1266777851 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ubuntu@ubuntu:~/runc$ cat /proc/89653/stat
89653 (gunicorn: maste) S 89630 89653 89653 0 -1 4194560 29689 28896 0 3 146 32 76 19 20 0 1 0 2971844 52965376 3920 18446744073709551615 1 1 0 0 0 0 0 16781312 137447943 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
```

This fix fixes this issue by removing the prefix before `)`,
then finding the `20`th value (instead of `22`th value).

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-10-20 11:34:21 -07:00
Zhang Wei c179b0ffc7 Some refactor and cleanup
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-20 17:58:51 +08:00
Aleksa Sarai f8e6b5af5e
rootfs: make pivot_root not use a temporary directory
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers which came up with this idea
first.

This is the first step of many to allowing runC to work with a
completely read-only rootfs.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-20 12:55:58 +11:00