3085 Commits

Author SHA1 Message Date
Mrunal Patel 4f21aea40d Merge pull request #1321 from Mashimiao/support-device-new-type
support create device with type p and u
2017-02-10 10:02:15 -08:00
Ma Shimiao 06e27471bb support create device with type p and u
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-02-10 14:45:15 +08:00
Michael Crosby e944298919 Merge pull request #1316 from hqhq/cleanup_dest
Small cleanup
2017-02-08 10:04:39 -08:00
Qiang Huang a8d7eb7076 Merge pull request #1314 from runcom/overlay-mounts
libcontainer: rootfs_linux: support overlayfs
2017-02-08 16:17:01 +08:00
Qiang Huang 45a8341811 Small cleanup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-08 15:09:06 +08:00
Antonio Murdaca ca14e7b463
libcontainer: rootfs_linux: support overlayfs
As the runtime-spec allows it, we want to be able to specify overlayfs
mounts with:

    {
        "destination": "/etc/pki",
        "type": "overlay",
        "source": "overlay",
        "options": [
            "lowerdir=/etc/pki:/home/amurdaca/go/src/github.com/opencontainers/runc/rootfs_fedora/etc/pki"
        ]
    },

This patch takes care of allowing overlayfs mounts. Both RO and RW
should be supported.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 19:43:24 +01:00
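As a rough illustration of the mount entry quoted in ca14e7b463, the Go sketch below decodes such an entry and extracts its `lowerdir` paths. The `mount` struct and the `parseMount` and `lowerDirs` helpers are hypothetical names for this example only; they are not runc's own types or functions, and the paths are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// mount is a local, illustrative struct mirroring the JSON keys shown in the
// commit message. It is an assumption for this sketch, not runc's type.
type mount struct {
	Destination string   `json:"destination"`
	Type        string   `json:"type"`
	Source      string   `json:"source"`
	Options     []string `json:"options"`
}

// parseMount decodes a single mount entry from JSON.
func parseMount(data []byte) (mount, error) {
	var m mount
	err := json.Unmarshal(data, &m)
	return m, err
}

// lowerDirs returns the colon-separated lowerdir entries of an overlay mount,
// or nil if the mount is not an overlay or carries no lowerdir option.
func lowerDirs(m mount) []string {
	if m.Type != "overlay" {
		return nil
	}
	for _, opt := range m.Options {
		if strings.HasPrefix(opt, "lowerdir=") {
			return strings.Split(strings.TrimPrefix(opt, "lowerdir="), ":")
		}
	}
	return nil
}

func main() {
	// Placeholder paths; a real config would point at actual directories.
	raw := []byte(`{
		"destination": "/etc/pki",
		"type": "overlay",
		"source": "overlay",
		"options": ["lowerdir=/lower1:/lower2"]
	}`)
	m, err := parseMount(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.Destination, lowerDirs(m))
}
```

With only `lowerdir` set, the overlay has no writable layer; a read-write overlay would additionally carry `upperdir` and `workdir` options.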
Mrunal Patel b263a43430 Merge pull request #1312 from runcom/fix-selinux-labels
libcontainer: selinux: fix DupSecOpt and DisableSecOpt
2017-02-06 10:33:42 -08:00
Antonio Murdaca 75acc7c7c3
libcontainer: selinux: fix DupSecOpt and DisableSecOpt
`label.InitLabels` takes options as a string slice in the form of:

    user:system_u
    role:system_r
    type:container_t
    level:s0:c4,c5

However, `DupSecOpt` and `DisableSecOpt` were still adding a Docker-specific
`label=` in front of every option. That leads to `InitLabels`
not being able to correctly init selinux labels in this scenario for
instance:

    label.InitLabels(DupSecOpt([%OPTIONS%]))

if `%OPTIONS%` has options prefixed with `label=`, that's going to fail.
Fix this by removing that Docker-specific `label=` prefix.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 17:29:42 +01:00
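To make the option format in 75acc7c7c3 concrete, here is a small Go sketch that normalizes options to the plain `key:value` form the message describes. `stripLabelPrefix` is a hypothetical helper written for this example; it is not the body of `DupSecOpt` or `DisableSecOpt`, whose fix was to stop emitting the prefix in the first place.

```go
package main

import (
	"fmt"
	"strings"
)

// stripLabelPrefix returns a copy of opts with any leading "label=" removed,
// leaving options in the plain "key:value" form. Hypothetical helper.
func stripLabelPrefix(opts []string) []string {
	out := make([]string, 0, len(opts))
	for _, o := range opts {
		out = append(out, strings.TrimPrefix(o, "label="))
	}
	return out
}

func main() {
	opts := []string{"label=user:system_u", "label=level:s0:c4,c5", "role:system_r"}
	for _, o := range stripLabelPrefix(opts) {
		fmt.Println(o)
	}
}
```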
Qiang Huang 7350cd8640 Merge pull request #1285 from stevenh/signal-wait
Only wait for processes after delivering SIGKILL in signalAllProcesses
2017-02-06 16:41:24 +08:00
Qiang Huang 0c21b089e6 Merge pull request #1309 from stevenh/recorded-state-typo
Correct docs typo for restoredState.
2017-02-04 11:51:25 +08:00
Daniel, Dao Quang Minh 35356c4a18 Merge pull request #1310 from stevenh/destroy-docs
Correct container.Destroy() docs
2017-02-03 22:12:02 +00:00
Steven Hartland 54862146c7 Correct docs typo for restoredState.
Correct typo in docs for restoredState.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:19:01 +00:00
Steven Hartland 3f431f497e Correct container.Destroy() docs
Correct container.Destroy() docs to clarify that destroy can only operate on containers in specific states.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:18:29 +00:00
Qiang Huang be33383e60 Merge pull request #1293 from stevenh/resolve-initarg
Resolve InitArgs to ensure init works
2017-02-03 19:25:52 +08:00
Mrunal Patel bb4066468c Merge pull request #1305 from giuseppe/kill-max-2-args
kill: requires max 2 arguments
2017-02-02 12:04:19 -08:00
Michael Crosby 9073486547 Merge pull request #1274 from cyphar/further-CVE-2016-9962-cleanup
libcontainer: init: only pass stateDirFd when creating a container
2017-02-02 11:11:42 -08:00
Giuseppe Scrivano 1d6447797e kill: requires max 2 arguments
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-02-02 17:32:54 +01:00
Mrunal Patel 1c9c074d79 Merge pull request #1303 from runcom/revert-initlabels
Revert "DupSecOpt needs to match InitLabels"
2017-02-01 10:37:16 -08:00
Steven Hartland b9dfa444c4 Resolve InitArgs to ensure init works
If a relative path to the exe is used for InitArgs, init will fail to run if Cwd is not set to the original path.

Prevent failure of init to run by ensuring that exe in InitArgs is an absolute path.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:42:09 +00:00
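The idea in b9dfa444c4 can be sketched in Go roughly as follows. `resolveInitArgs` is a hypothetical helper, not runc's actual code, and it assumes that a bare command name without a path separator should be left untouched so that a later PATH lookup can still resolve it.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveInitArgs returns args with a relative executable path in args[0]
// made absolute against the current working directory. Absolute paths and
// bare command names (an assumption of this sketch) are returned unchanged.
func resolveInitArgs(args []string) ([]string, error) {
	if len(args) == 0 {
		return args, nil
	}
	exe := args[0]
	if filepath.IsAbs(exe) || !strings.ContainsRune(exe, filepath.Separator) {
		return args, nil
	}
	abs, err := filepath.Abs(exe)
	if err != nil {
		return nil, err
	}
	// Build a new slice so the caller's slice is not modified.
	return append([]string{abs}, args[1:]...), nil
}

func main() {
	resolved, err := resolveInitArgs([]string{"./runc", "init"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(resolved)
}
```

Resolving once, at configuration time, means a later change of working directory no longer affects which binary init runs.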
Aleksa Sarai e034cedce7
libcontainer: init: only pass stateDirFd when creating a container
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
*host* filesystem from that file descriptor. This was fixed in part by
5d93fed3d2 ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.

To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).

There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.

Fixes: CVE-2016-9962
Fixes: 5d93fed3d2 ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-02-02 00:41:11 +11:00
Steven Hartland 82d895fbb9 Conditionally wait for children after delivering signal
When signaling children, wait for them if the signal is SIGKILL;
otherwise wait only for children that are ready to report.

This reaps all children which exited due to the signal sent without
blocking indefinitely.

Also:
* Ignore ECHILD, which means the child has already gone.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:22:37 +00:00
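The wait policy described in 82d895fbb9 might be sketched in Go like this, for a Unix-like system. `waitOptions` and `ignoreECHILD` are hypothetical helpers for illustration, not functions from runc; the sketch only shows how the wait4(2) options and the ECHILD handling could be chosen.

```go
package main

import (
	"fmt"
	"syscall"
)

// waitOptions returns the options to pass to wait4(2) after delivering sig.
// SIGKILL cannot be caught or ignored, so a blocking wait is safe; for any
// other signal WNOHANG reaps only children that have already exited.
func waitOptions(sig syscall.Signal) int {
	if sig == syscall.SIGKILL {
		return 0
	}
	return syscall.WNOHANG
}

// ignoreECHILD treats ECHILD as success: it means the child is already gone.
func ignoreECHILD(err error) error {
	if err == syscall.ECHILD {
		return nil
	}
	return err
}

func main() {
	fmt.Println("SIGKILL blocks:", waitOptions(syscall.SIGKILL) == 0)
	fmt.Println("SIGTERM non-blocking:", waitOptions(syscall.SIGTERM) == syscall.WNOHANG)
	fmt.Println("ECHILD ignored:", ignoreECHILD(syscall.ECHILD) == nil)
}
```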
Antonio Murdaca 384c1e595c
Revert "DupSecOpt needs to match InitLabels"
This reverts commit 491cadac92.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-01 09:14:20 +01:00
Mrunal Patel 510879e31f Merge pull request #1284 from stevenh/godoc
Add godoc links to README.md files
2017-01-30 10:56:58 -08:00
Daniel, Dao Quang Minh 6c22e77604 Merge pull request #1294 from stevenh/start-init-fixes
Ensure pipe is always closed on error in StartInitialization
2017-01-27 16:25:44 +00:00
Daniel, Dao Quang Minh 82f9fdd690 Merge pull request #1300 from hqhq/defer_tty_close_earlier
Call defer tty.Close() earlier
2017-01-27 16:20:20 +00:00
Qiang Huang ed2df2906b Merge pull request #1205 from YuPengZTE/devError
fix typos by the result of golint checking
2017-01-27 21:42:18 +08:00
Qiang Huang c9005dd1d5 Call defer tty.Close() earlier
We could fail and return in tty.recvtty() and miss
tty.Close(), which might cause broken console output.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-26 09:02:00 +08:00
Mrunal Patel c139a7c761 Merge pull request #1298 from stevenh/mention-nsenter
Add nsenter details to libcontainer README.md
2017-01-25 16:25:02 -08:00
Steven Hartland 64aa78b762 Ensure pipe is always closed on error in StartInitialization
Ensure that the pipe is always closed during the error processing of StartInitialization.

Also:
* Fix a comment typo.
* Use newContainerInit directly as there's no need for i to be an initer.
* Move the comment about the behaviour of Init() directly above it, clarifying what happens for all defers.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-25 12:36:40 +00:00
Qiang Huang 8a055cad4b Merge pull request #1291 from justincormack/lu
Remove a compiler warning in some environments
2017-01-25 16:35:38 +08:00
Steven Hartland 89fb8b1609 Add nsenter details to libcontainer README.md
Add the import of nsenter to the example in libcontainer's README.md, as without it none of the example code works.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-25 01:05:36 +00:00
Justin Cormack 6ba5f5f9b8 Remove a compiler warning in some environments
POSIX mandates that `cmsg_len` in `struct cmsghdr` is a `socklen_t`,
which is an `unsigned int`. Musl libc as used in Alpine implements
this; Glibc ignores the spec and makes it a `size_t`, i.e. `unsigned long`.
To avoid the `-Wformat=` warning from the `%lu` on Alpine, cast this
to an `unsigned long` always.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2017-01-24 14:06:15 +00:00
Mrunal Patel ce450bcc6c Merge pull request #1288 from rainrambler/patch-1
using golang-style assignment
2017-01-23 10:45:24 -08:00
rainrambler 4449acd306 using golang-style assignment
Use Go-style assignment, not C-style.

Signed-off-by: Wang Anyu <wanganyu@outlook.com>
2017-01-23 14:37:16 +08:00
Steven Hartland a887fc3f2d Add godoc links to README.md files
Add godoc links to README.md files for runc and libcontainer so it's easy to access the golang documentation.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-21 18:21:03 +00:00
Steven Hartland 27a5447ea4 Only wait for processes after delivering SIGKILL in signalAllProcesses
signalAllProcesses was making the assumption that the requested signal was SIGKILL, possibly due to the signal parameter being added at a later date, and hence it was safe to wait for all processes which is not the case.

BaseContainer.Signal(s os.Signal, all bool) exposes this functionality to consumers, so an arbitrary signal could be used which is not guaranteed to make the processes exit.

Correct the documentation for signalAllProcesses around the signal delivered, and update it so that the wait is only performed on SIGKILL. This makes it safe to process other signals without risk of blocking forever, while still maintaining compatibility with SIGKILL callers.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-21 18:20:23 +00:00
Daniel, Dao Quang Minh 0fefa36f3a Merge pull request #1278 from datawolf/scanner
move error check out of the for loop
2017-01-20 17:49:44 +00:00
Daniel, Dao Quang Minh b8cefd7d8f Merge pull request #1266 from mrunalp/ignore_cgroup_v2
Ignore cgroup2 mountpoints
2017-01-20 17:26:46 +00:00
Qiang Huang 8c4807f094 Merge pull request #1282 from giuseppe/kill-1-arg
kill: make second argument optional
2017-01-20 00:30:16 -06:00
Giuseppe Scrivano b760919f33 kill: make second argument optional
commit b517076907 added a check that kill accepts two arguments.
Since the second argument is optional, change it back to accept the
shorter form "kill CONTAINER".

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-01-19 10:13:08 +01:00
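Taken together with 1d6447797e ("kill: requires max 2 arguments"), the argument rule from b760919f33 amounts to accepting one or two arguments. A minimal Go sketch, using a hypothetical `checkKillArgs` helper rather than runc's own argument checking:

```go
package main

import "fmt"

// checkKillArgs accepts "kill CONTAINER" and "kill CONTAINER SIGNAL" and
// rejects anything else. Hypothetical helper for illustration only.
func checkKillArgs(args []string) error {
	if len(args) < 1 || len(args) > 2 {
		return fmt.Errorf("kill: expected 1 or 2 arguments, got %d", len(args))
	}
	return nil
}

func main() {
	for _, args := range [][]string{
		{},
		{"mycontainer"},
		{"mycontainer", "KILL"},
		{"mycontainer", "KILL", "extra"},
	} {
		fmt.Printf("%q -> %v\n", args, checkKillArgs(args))
	}
}
```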
Wang Long 3a71eb0256 move error check out of the for loop
The `bufio.Scanner.Scan` method returns false either by reaching the
end of the input or an error. After Scan returns false, the Err method
will return any error that occurred during scanning, except that if it
was io.EOF, Err will return nil.

We should check the error when Scan returns false (outside the for loop).

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-18 05:02:39 +00:00
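The `bufio.Scanner` pattern described in 3a71eb0256 looks roughly like this in Go. `countLines` and `countLinesIn` are hypothetical helpers chosen for the example; the point is only where `Err` is checked.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countLines counts the lines in r. The error is checked once, after the
// loop: Scan returns false both at end of input and on failure, and Err
// reports nil when the scan simply reached io.EOF.
func countLines(r io.Reader) (int, error) {
	s := bufio.NewScanner(r)
	n := 0
	for s.Scan() {
		n++
	}
	if err := s.Err(); err != nil {
		return n, err
	}
	return n, nil
}

// countLinesIn is a convenience wrapper for string input.
func countLinesIn(text string) (int, error) {
	return countLines(strings.NewReader(text))
}

func main() {
	n, err := countLinesIn("cpu\ncpuacct\nmemory\n")
	fmt.Println(n, err)
}
```

Checking `Err` inside the loop body would never observe a scan failure, because the body only runs while `Scan` returns true.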
Qiang Huang a9610f2c02 Merge pull request #1249 from datawolf/small-refactor
small refactor
2017-01-13 02:04:59 -06:00
Mrunal Patel 29008b871d Merge pull request #1271 from hqhq/bump_golang
Bump golang to 1.7.4
2017-01-12 07:32:46 -08:00
Qiang Huang c94bc353ef Bump golang to 1.7.4
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-12 16:15:39 +08:00
Mrunal Patel c7ebda72ac Add a test for testing that we ignore cgroup2 mounts
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 16:49:53 -08:00
Mrunal Patel e7b57cb042 Ignore cgroup2 mountpoints
Our current cgroup parsing logic assumes cgroup v1 mounts
so we should ignore cgroup2 mounts for now

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 12:34:50 -08:00
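A simplified Go sketch of the filtering in e7b57cb042: it walks lines in the `/proc/self/mountinfo` format and keeps only mounts whose filesystem type is exactly `cgroup`, which skips `cgroup2`. `cgroupV1Mountpoints` is a hypothetical helper and the sample lines are made up; runc's real parser does more than this.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupV1Mountpoints returns the mount points of cgroup v1 mounts found in
// mountinfo-formatted text. Lines with any other filesystem type, including
// cgroup2, are ignored. Hypothetical helper for illustration only.
func cgroupV1Mountpoints(mountinfo string) []string {
	var mps []string
	for _, line := range strings.Split(mountinfo, "\n") {
		// Fields before " - " describe the mount; the first field after
		// it is the filesystem type.
		parts := strings.SplitN(line, " - ", 2)
		if len(parts) != 2 {
			continue
		}
		pre := strings.Fields(parts[0])
		post := strings.Fields(parts[1])
		if len(pre) < 5 || len(post) < 1 {
			continue
		}
		if post[0] != "cgroup" {
			continue
		}
		mps = append(mps, pre[4]) // fifth field is the mount point
	}
	return mps
}

func main() {
	sample := "30 25 0:26 / /sys/fs/cgroup/memory rw,nosuid shared:10 - cgroup cgroup rw,memory\n" +
		"31 25 0:27 / /sys/fs/cgroup/unified rw,nosuid shared:11 - cgroup2 cgroup2 rw"
	fmt.Println(cgroupV1Mountpoints(sample))
}
```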
Mrunal Patel 361bb0001a Merge pull request #1268 from hqhq/use_source_mp
Do not create cgroup dir name from combining subsystems
2017-01-11 11:34:34 -08:00
Michael Crosby 5d93fed3d2 Set init processes as non-dumpable
This sets the init processes that join and set up the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other) namespace.

This setting is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, from ptracing
the init process before it drops caps and sets other LSMs.

This patch also ensures that the stateDirFD being used is still closed
prior to exec, even though it is set as O_CLOEXEC, because of the order
in the kernel.

https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-01-11 09:56:56 -08:00
Daniel, Dao Quang Minh 2cc5a91249 Merge pull request #1260 from coolljt0725/remove_redundant
Cleanup: remove redundant code
2017-01-11 17:18:15 +00:00
Qiang Huang 0599ac7d93 Do not create cgroup dir name from combining subsystems
On some systems, when we mount several cgroup subsystems at the
same mountpoint, the order of names in the mount options and in the
cgroup directory name may differ.

For example, the mount option is cpuacct,cpu, but
mountpoint name is /sys/fs/cgroup/cpu,cpuacct. In current
runc, we set mount destination name from combining
subsystems, which comes from mount option from
/proc/self/mountinfo, so in my case the name would be
/sys/fs/cgroup/cpuacct,cpu, which is different from the
host, and will break some applications.

Fix it by using directory name from host mountpoint.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-11 15:27:58 +08:00
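The mismatch described in 0599ac7d93 can be reproduced with a short Go sketch. `destFromSubsystems` and `destFromMountpoint` are hypothetical helpers standing in for the old and new ways of naming the destination directory; they are not runc's functions.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// destFromSubsystems builds the directory name by joining the subsystems in
// the order they appear in the mount options (the behaviour being replaced).
func destFromSubsystems(subsystems []string) string {
	return strings.Join(subsystems, ",")
}

// destFromMountpoint takes the directory name from the host mount point,
// so the container sees the same name as the host.
func destFromMountpoint(mountpoint string) string {
	return filepath.Base(mountpoint)
}

func main() {
	// Values taken from the example in the commit message.
	subsystems := []string{"cpuacct", "cpu"}
	mountpoint := "/sys/fs/cgroup/cpu,cpuacct"

	fmt.Println("from options:   ", destFromSubsystems(subsystems))
	fmt.Println("from mountpoint:", destFromMountpoint(mountpoint))
}
```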