Commit Graph

3157 Commits

Author SHA1 Message Date
Giuseppe Scrivano c8593c4d61 sanitize systemd-notify message
Accept only READY= notify messages from the container.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-02-22 22:28:03 +01:00
Giuseppe Scrivano 892f2ded6f fix systemd-notify when using a different PID namespace
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram.  A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.

The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.

Example of this issue:

https://github.com/projectatomic/atomic-system-containers/pull/24

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-02-22 22:27:59 +01:00
Qiang Huang bd2f9c52cd Merge pull request #1339 from cpuguy83/dont_override_errors
Don't override system error
2017-02-22 11:08:18 -08:00
Qiang Huang 733563552e Fix state when _LIBCONTAINER in environment
Fixes: #1311

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:35:14 -08:00
Qiang Huang 805b8c73d3 Do not create exec fifo in factory.Create
It should not be binded to container creation, for
example, runc restore needs to create a
libcontainer.Container, but it won't need exec fifo.

So create exec fifo when container is started or run,
where we really need it.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:34:48 -08:00
Brian Goff d193f95d07 Don't override system error
The error message added here provides no value as the caller already
knows all the added details. However it is covering up the underyling
system error (typically `ENOTSUP`). There is no way to handle this error before
this change.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-02-22 09:29:38 -05:00
Qiang Huang 3293874044 Merge pull request #1332 from sak0/dev
ps: --format value check
2017-02-21 11:56:15 -08:00
CuiHaozhi 08937e97bc ps: --format value check
Signed-off-by: CuiHaozhi <cuihaozhi@chinacloud.com.cn>
2017-02-22 00:20:23 +08:00
Aleksa Sarai 3d7cd1f1b2
merge branch 'pr-1335'
Closes #1335
LGTMs: @crosbymichael @cyphar
2017-02-21 12:29:57 +11:00
Michael Crosby 8438b26e9f Merge pull request #1237 from hqhq/fix_sync_race
Fix race condition when sync with child and grandchild
2017-02-20 17:16:43 -08:00
Qiang Huang 6ccc2096a8 Merge pull request #1336 from crosbymichael/size_t
Use %zu for printing of size_t values
2017-02-20 17:14:34 -08:00
Michael Crosby 4a164a826c Use %zu for printing of size_t values
This helps fix compile warnings on some arm systems.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-02-20 16:57:27 -08:00
Máximo Cuadros e773f96b0e update go version at travis-ci
Signed-off-by: Máximo Cuadros <mcuadros@gmail.com>
2017-02-20 13:15:58 +01:00
Ian Campbell f5adb05bce Add --preserve-fds=N to create and run
This preserves the given number of file descriptors on top of the 3 stdio and
the socket activation ($LISTEN_FDS=M) fds.

If LISTEN_FDS is not set then [3..3+N) would be preserved by --preserve-fds=N.

Given LISTEN_FDS=3 and --preserve-fds=5 then we would preserve fds [3, 11) (in
addition to stdio).  That's 3, 4 & 5 from LISTEN_FDS=3 and 6, 7, 8, 9 & 10 from
--preserve-fds=5.

Signed-off-by: Ian Campbell <ian.campbell@docker.com>
2017-02-20 11:50:18 +00:00
Qiang Huang a54316bae1 Fix race condition when sync with child and grandchild
Fixes: #1236
Fixes: #1281

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-18 20:42:08 +08:00
Qiang Huang 6b1d0e76f2 Merge pull request #1127 from boynux/fix-set-mem-to-unlimited
Fixes set memory to unlimited
2017-02-16 09:51:23 +08:00
Daniel, Dao Quang Minh b433dea1fc Merge pull request #1328 from sak0/dev
fix typo
2017-02-15 16:52:09 +00:00
CuiHaozhi f6524df0b1 fix typo
Signed-off-by: CuiHaozhi <61755280@qq.com>
2017-02-15 22:20:40 +08:00
Mohammad Arab 18ebc51b3c Reset Swap when memory is set to unlimited (-1)
Kernel validation fails if memory set to -1 which is unlimited but
swap is not set so.

Signed-off-by: Mohammad Arab <boynux@gmail.com>
2017-02-15 08:11:57 +01:00
Carlton Semple 9a7e5a9434 Update devices_unix.go for LXD
getDevices() has been updated to skip `/dev/.lxc` and `/dev/.lxd-mounts`, which was breaking privileged Docker containers running on runC, inside of LXD managed Linux Containers

Signed-off-by: Carlton-Semple <carlton.semple@ibm.com>
2017-02-14 16:12:03 -05:00
Deng Guangxing 98f004182b add pre-dump and parent-path to checkpoint
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.

This patch add pre-dump and parent-path arguments to runc checkpoint

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2017-02-14 19:45:07 +08:00
Mrunal Patel 4f21aea40d Merge pull request #1321 from Mashimiao/support-device-new-type
support create device with type p and u
2017-02-10 10:02:15 -08:00
Ma Shimiao 06e27471bb support create device with type p and u
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-02-10 14:45:15 +08:00
Michael Crosby e944298919 Merge pull request #1316 from hqhq/cleanup_dest
Small cleanup
2017-02-08 10:04:39 -08:00
Qiang Huang a8d7eb7076 Merge pull request #1314 from runcom/overlay-mounts
libcontainer: rootfs_linux: support overlayfs
2017-02-08 16:17:01 +08:00
Qiang Huang 45a8341811 Small cleanup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-08 15:09:06 +08:00
Antonio Murdaca ca14e7b463
libcontainer: rootfs_linux: support overlayfs
As the runtime-spec allows it, we want to be able to specify overlayfs
mounts with:

    {
        "destination": "/etc/pki",
        "type": "overlay",
        "source": "overlay",
        "options": [
            "lowerdir=/etc/pki:/home/amurdaca/go/src/github.com/opencontainers/runc/rootfs_fedora/etc/pki"
        ]
    },

This patch takes care of allowing overlayfs mounts. Both RO and RW
should be supported.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 19:43:24 +01:00
Mrunal Patel b263a43430 Merge pull request #1312 from runcom/fix-selinux-labels
libcontainer: selinux: fix DupSecOpt and DisableSecOpt
2017-02-06 10:33:42 -08:00
Antonio Murdaca 75acc7c7c3
libcontainer: selinux: fix DupSecOpt and DisableSecOpt
`label.InitLabels` takes options as a string slice in the form of:

    user:system_u
    role:system_r
    type:container_t
    level:s0:c4,c5

However, `DupSecOpt` and `DisableSecOpt` were still adding a docker
specifc `label=` in front of every option. That leads to `InitLabels`
not being able to correctly init selinux labels in this scenario for
instance:

    label.InitLabels(DupSecOpt([%OPTIONS%]))

if `%OPTIONS` has options prefixed with `label=`, that's going to fail.
Fix this by removing that docker specific `label=` prefix.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 17:29:42 +01:00
Qiang Huang 7350cd8640 Merge pull request #1285 from stevenh/signal-wait
Only wait for processes after delivering SIGKILL in signalAllProcesses
2017-02-06 16:41:24 +08:00
Qiang Huang 0c21b089e6 Merge pull request #1309 from stevenh/recorded-state-typo
Correct docs typo for restoredState.
2017-02-04 11:51:25 +08:00
Daniel, Dao Quang Minh 35356c4a18 Merge pull request #1310 from stevenh/destroy-docs
Correct container.Destroy() docs
2017-02-03 22:12:02 +00:00
Steven Hartland 54862146c7 Correct docs typo for restoredState.
Correct typo in docs for restoredState.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:19:01 +00:00
Steven Hartland 3f431f497e Correct container.Destroy() docs
Correct container.Destroy() docs to clarify that destroy can only operate on containers in specific states.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:18:29 +00:00
Qiang Huang be33383e60 Merge pull request #1293 from stevenh/resolve-initarg
Resolve InitArgs to ensure init works
2017-02-03 19:25:52 +08:00
Mrunal Patel bb4066468c Merge pull request #1305 from giuseppe/kill-max-2-args
kill: requires max 2 arguments
2017-02-02 12:04:19 -08:00
Michael Crosby 9073486547 Merge pull request #1274 from cyphar/further-CVE-2016-9962-cleanup
libcontainer: init: only pass stateDirFd when creating a container
2017-02-02 11:11:42 -08:00
Giuseppe Scrivano 1d6447797e kill: requires max 2 arguments
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-02-02 17:32:54 +01:00
Mrunal Patel 1c9c074d79 Merge pull request #1303 from runcom/revert-initlabels
Revert "DupSecOpt needs to match InitLabels"
2017-02-01 10:37:16 -08:00
Steven Hartland b9dfa444c4 Resolve InitArgs to ensure init works
If a relative pathed exe is used for InitArgs init will fail to run if Cwd is not set the original path.

Prevent failure of init to run by ensuring that exe in InitArgs is an absolute path.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:42:09 +00:00
Aleksa Sarai e034cedce7
libcontainer: init: only pass stateDirFd when creating a container
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
*host* filesystem from that file descriptor. This was fixed in part by
5d93fed3d2 ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.

To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).

There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.

Fixes: CVE-2016-9962
Fixes: 5d93fed3d2 ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-02-02 00:41:11 +11:00
Steven Hartland 82d895fbb9 Conditionally wait for children after delivering signal
When signaling children and the signal is SIGKILL wait for children
otherwise conditionally wait for children which are ready to report.

This reaps all children which exited due to the signal sent without
blocking indefinitely.

Also:
* Ignore ignore ECHILD, which means the child has already gone.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:22:37 +00:00
Antonio Murdaca 384c1e595c
Revert "DupSecOpt needs to match InitLabels"
This reverts commit 491cadac92.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-01 09:14:20 +01:00
Mrunal Patel 510879e31f Merge pull request #1284 from stevenh/godoc
Add godoc links to README.md files
2017-01-30 10:56:58 -08:00
Daniel, Dao Quang Minh 6c22e77604 Merge pull request #1294 from stevenh/start-init-fixes
Ensure pipe is always closed on error in StartInitialization
2017-01-27 16:25:44 +00:00
Daniel, Dao Quang Minh 82f9fdd690 Merge pull request #1300 from hqhq/defer_tty_close_earlier
Call defer tty.Close() earlier
2017-01-27 16:20:20 +00:00
Qiang Huang ed2df2906b Merge pull request #1205 from YuPengZTE/devError
fix typos by the result of golint checking
2017-01-27 21:42:18 +08:00
Qiang Huang c9005dd1d5 Call defer tty.Close() earlier
We could fail and return in tty.recvtty() and miss
tty.Close(), which might cause broken console output.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-26 09:02:00 +08:00
Mrunal Patel c139a7c761 Merge pull request #1298 from stevenh/mention-nsenter
Add nsenter details to libcontainer README.md
2017-01-25 16:25:02 -08:00
Steven Hartland 64aa78b762 Ensure pipe is always closed on error in StartInitialization
Ensure that the pipe is always closed during the error processing of  StartInitialization.

Also:
* Fix a comment typo.
* Use newContainerInit directly as there's no need for i to be an initer.
* Move the comment about the behaviour of Init() directly above it, clarifying what happens for all defers.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-25 12:36:40 +00:00