Commit Graph

3526 Commits

Author SHA1 Message Date
Aleksa Sarai 10b175ce49
signal: ignore tty.resize errors
Fixes a race that occurred very frequently in testing where the tty of
the container may be closed by the time that runc gets to sending
SIGWINCH. This failure mode is not fatal, but it would cause test
failures due to expected outputs not matching. On further review it
appears that the original addition of these checks in 4c5bf649d0
("Check error return values") was actually not necessary, so partially
revert that change.

The particular failure mode this resolves would manifest as error logs
of the form:

  time="2017-08-24T07:59:50Z" level=error msg="bad file descriptor"

Fixes: 4c5bf649d0 ("Check error return values")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-27 00:44:17 +10:00
Aleksa Sarai 1f32fff46d
setns init: delay seccomp as late as possible
This mirrors the standard_init_linux.go seccomp code, which only applies
seccomp early if NoNewPrivileges is enabled. Otherwise it's done
immediately before execve to reduce the amount of syscalls necessary for
users to enable in their seccomp profiles.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-26 13:42:30 +10:00
Aleksa Sarai 3ddde27d7d
init: move close(stateDirFd) before seccomp apply
This further reduces the number of syscalls that a user needs to enable
in their seccomp profile.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-26 13:42:26 +10:00
Qiang Huang 1c81e2a794 Merge pull request #1572 from tych0/fix-readonly-userns
fix --read-only containers under --userns-remap
2017-08-26 09:38:14 +08:00
Aleksa Sarai 4d6e6720a7
Merge branch 'pr-1573'
Fix systemd cgroup after memory type changed

LGTMs: @crosbymichael @cyphar
Closes #1573
2017-08-25 23:55:27 +10:00
Michael Crosby 4e33faefa7 Merge pull request #1570 from cyphar/close-statedirfd-hole
init: switch away from stateDirFd entirely
2017-08-25 09:52:16 -04:00
Qiang Huang acaf6897f5 Fix systemd cgroup after memory type changed
Fixes: #1557

I'm not quite sure about the root cause, looks like
systemd still want them to be uint64.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-08-25 01:14:16 -04:00
Aleksa Sarai 7d66aab77a
init: switch away from stateDirFd entirely
While we have significant protections in place against CVE-2016-9962, we
still were holding onto a file descriptor that referenced the host
filesystem. This meant that in certain scenarios it was still possible
for a semi-privileged container to gain access to the host filesystem
(if they had CAP_SYS_PTRACE).

Instead, open the FIFO itself using a O_PATH. This allows us to
reference the FIFO directly without providing the ability for
directory-level access. When opening the FIFO inside the init process,
open it through procfs to re-open the actual FIFO (this is currently the
only supported way to open such a file descriptor).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-25 13:19:03 +10:00
Tycho Andersen 66eb2a3e8f fix --read-only containers under --userns-remap
The documentation here:
https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations

says that readonly containers can't be used with user namespaces do to some
kernel restriction. In fact, there is a special case in the kernel to be
able to do stuff like this, so let's use it.

This takes us from:

ubuntu@docker:~$ docker run -it --read-only ubuntu
docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:125: remounting \\\"/dev\\\" as readonly caused \\\"operation not permitted\\\"\"".

to:

ubuntu@docker:~$ docker-runc --version
runc version 1.0.0-rc4+dev
commit: ae2948042b08ad3d6d13cd09f40a50ffff4fc688-dirty
spec: 1.0.0
ubuntu@docker:~$ docker run -it --read-only ubuntu
root@181e2acb909a:/# touch foo
touch: cannot touch 'foo': Read-only file system

Signed-off-by: Tycho Andersen <tycho@docker.com>
2017-08-24 16:43:21 -06:00
Michael Crosby ae2948042b Merge pull request #1561 from nseps/master
Add AutoDedup option to CriuOpts
2017-08-18 12:50:27 -04:00
Nikolas Sepos 3f234b15d0 Add auto-dedup flag for checkpoint/restore
When doing incremental dumps is useful to use auto deduplication of
memory images to save space.

Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>
2017-08-18 16:19:21 +02:00
Nikolas Sepos da4a5a9515 Add AutoDedup option to CriuOpts
Memory image deduplication, very useful for incremental dumps.

See: https://criu.org/Memory_images_deduplication

Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>
2017-08-18 01:21:42 +02:00
Aleksa Sarai 59bbdc41a3
merge branch 'pr-1560'
Check error return values

LGTMs: @crosbymichael @cyphar
Closes #1560
2017-08-18 01:31:18 +10:00
Michael Crosby ccd2c20aa4 Merge pull request #1559 from Mashimiao/panic-fix-nil-linux
fix panic when Linux is nil for rootless case
2017-08-17 09:57:35 -04:00
Tobias Klauser 4c5bf649d0 Check error return values
Both tty.resize and notifySocket.setupSocket return an error which isn't
handled in the caller. Fix this and either log or propagate the errors.

Found using https://github.com/mvdan/unparam

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-08-17 11:41:19 +02:00
Michael Crosby c6126b2141 Merge pull request #1554 from cyphar/use-umoci-release-script
release: import umoci's release.sh script
2017-08-16 09:46:56 -04:00
Aleksa Sarai c24f602407
ci: smoke-test the release script
To make sure that `make release` doesn't suddenly break after we've cut
a release, smoke-test the release scripts. The script won't fail if GPG
keys aren't found, so running in CI shouldn't be a huge issue.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-16 14:44:45 +10:00
Aleksa Sarai ed68ee1e10
release: import umoci's release.sh script
This script is far easier to use than the previous `make release`
target, not to mention that it also automatically signs all of the
artefacts and makes everything really easy to do for maintainers.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-16 14:35:52 +10:00
Ma Shimiao 2333e7dc67 fix panic when Linux is nil for rootless case
congfig.Sysctl setting is duplicated.
when contianer is rootless and Linux is nil, runc will panic.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-08-16 09:11:13 +08:00
Mrunal Patel b31bdfc38a Merge pull request #1558 from hqhq/update_state
Update state after update
2017-08-15 10:46:44 -07:00
Qiang Huang e6e1c34a7d Update state after update
state.json should be a reflection of the container's
realtime state, including resource configurations,
so we should update state.json after updating container
resources.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-08-15 14:38:44 +08:00
Qiang Huang eb464f7e43 Merge pull request #1542 from cyphar/buildmode-pic
makefile: enable -buildmode=pie
2017-08-15 09:30:40 +08:00
Aleksa Sarai b45e243f8b
*: enable -buildmode=pie
Go has supported PIC builds for a while now, and given the security
benefits of using PIC binaries we should really enable them. There also
appears to be some indication that non-PIC builds have been interacting
oddly on ppc64le (the linker cannot load some shared libraries), and
using PIC builds appears to solve this problem.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-15 00:12:27 +10:00
Michael Crosby 760c67744b Merge pull request #1555 from cyphar/remove-install-flag-makefile
makefile: drop usage of --install
2017-08-14 10:04:33 -04:00
Michael Crosby 3096b3fc85 Merge pull request #1556 from hqhq/fix_flakytest_TestNotifyOnOOM
Fix flaky test TestNotifyOnOOM
2017-08-14 10:03:23 -04:00
Qiang Huang 9aa46c1e66 Merge pull request #1551 from crosbymichael/linux-nil
fix panic when Linux is nil
2017-08-14 19:35:31 +08:00
Qiang Huang 7726bcf0e2 Some fixes for testMemoryNotification
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-08-14 15:28:03 +08:00
Qiang Huang 40a1fb0e2f Fix flaky test TestNotifyOnOOM
Fixes: #1228

It can be reproduced by applying this patch:
```diff
@@ -45,6 +46,7 @@ func registerMemoryEvent(cgDir string, evName string, arg string) (<-chan struct
        go func() {
                defer func() {
                        close(ch)
+                       <-time.After(1 * time.Second)
                        eventfd.Close()
                        evFile.Close()
                }()
```

We can close channel after fds were closed.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-08-14 15:18:59 +08:00
Aleksa Sarai 6581d0f488
makefile: drop usage of --install
The "go build -i" invocation may slightly help with incremental
recompilation, but it will cause builds to fail if $GOROOT is not
writeable by the current user. While this does appear to work sometimes,
it's a concern for external build systems where "-i" causes build errors
for no real gain.

Given the size of the runc project, --install is not really giving us
much anyway.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-08-14 00:10:32 +10:00
Ma Shimiao 527dc5acbb fix panic when Linux is nil
Linux is not always not nil.
If Linux is nil, panic will occur.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-10 15:57:49 -04:00
Michael Crosby 3f2f8b84a7 Merge pull request #1553 from mlaventure/handle-non-devices
Handle non-devices correctly in DeviceFromPath
2017-08-10 14:37:50 -04:00
Aleksa Sarai 739db6d3fa
merge branch 'pr-1532'
VERSION: back to development
  VERSION: release v1.0.0-rc4

Votes: +5 -0 /2
LGTMs: @hqhq @crosbymichael
Closes #1532k
2017-08-11 00:31:10 +10:00
Kenfe-Mickael Laventure 3ed492ad33
Handle non-devices correctly in DeviceFromPath
Before this change, some file type would be treated as char devices
(e.g. symlinks).

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-09 08:52:20 -07:00
Michael Crosby d40db12e72 Merge pull request #1506 from LittleLightLittleFire/1443-runc-reap-child-process
Pass back the pid of runc:[1:CHILD] so we can wait on it
2017-08-07 09:33:14 -04:00
Alex Fang e92add2151 Pass back the pid of runc:[1:CHILD] so we can wait on it
This allows the libcontainer to automatically clean up runc:[1:CHILD]
processes created as part of nsenter.

Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>
2017-08-05 13:44:36 +10:00
Aleksa Sarai 45bde006ca
merge branch 'pr-1535'
LGTMs: @avagin @cyphar
Closes #1535
2017-08-05 13:33:07 +10:00
Aleksa Sarai 22bbec1b7f
merge branch 'pr-1548'
LGTMs: @crosbymichael @mrunalp @cyphar
Closes #1548
2017-08-05 13:02:46 +10:00
Mrunal Patel 135b9992b3 Merge pull request #1544 from mlaventure/fix-device-from-path
Fix condition to detect device type in DeviceFromPath
2017-08-04 17:36:57 -07:00
Kenfe-Mickael Laventure 6056912217
Revert "Merge pull request #1450 from vrothberg/sgid-non-numeric"
This reverts commit 5c73abbe75, reversing
changes made to 51b501dab1.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-04 14:28:21 -07:00
Daniel, Dao Quang Minh 606fb713d9 Merge pull request #1545 from mlaventure/user-pkg-move-unix-call
Move user pkg unix specific calls to unix file
2017-08-03 23:29:58 +01:00
Kenfe-Mickael Laventure 25f4c7e72b
Move user pkg unix specific calls to unix file
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-03 11:31:21 -07:00
Kenfe-Mickael Laventure 9ed15e94c8
Fix condition to detect device type in DeviceFromPath
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2017-08-03 11:06:54 -07:00
Mrunal Patel 9a01140955 Merge pull request #1543 from avagin/maintainer
Remove @avagin as a maintainer
2017-08-02 11:12:42 -07:00
Andrei Vagin b9cff3c188 Remove @avagin as a maintainer
Unfortunately I don't have enough time to be a maintainer of runc.
I am not going to disappear from the community and as before
I always ready to help with anything.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
2017-08-02 10:55:08 -07:00
Adrian Reber 5d386f6e2b checkpoint: use CRIU VERSION RPC if available
With this runC also uses RPC to ask CRIU for its version. CRIU supports
a VERSION RPC since CRIU 3.0 and using the RPC interface does not
require parsing the console output of CRIU (which could change anytime).

For older CRIU versions which do not yet have the VERSION RPC runC falls
back to its old CRIU output parsing mode.

Once CRIU 3.0 is the minimum version required for runC the old code can
be removed.

v2:
 * adapt to changes in the previous patches based on the review

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-08-02 16:08:07 +00:00
Adrian Reber 2393692536 criurpc.proto: copy latest criurpc.proto from criu 3.3
Update criurpc.proto for the upcoming VERSION RPC.

This includes lazy_pages for the upcoming lazy migration support.

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-08-02 16:07:32 +00:00
Adrian Reber c71d9cd447 criuSwrk: prepare for CRIU VERSION RPC
To use the CRIU VERSION RPC the criuSwrk function is adapted to work
with CriuOpts set to 'nil' as CriuOpts is not required for the VERSION
RPC.

Also do not print c.criuVersion if it is '0' as the first RPC call will
always be the VERSION call and only after that the version will be
known.

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-08-02 16:07:28 +00:00
Adrian Reber c5f0ce979b checkCriuVersion: only ask criu once about its version
If the version of criu has already been determined there is no need to
ask criu for the version again. Use the value from c.criuVersion.

v2:
 * reduce unnecessary code movement in the patch series
 * factor out the criu version parsing into a separate function

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-08-02 16:07:15 +00:00
Adrian Reber b6c47281db checkCriuVersion: switch to version using int
The checkCriuVersion function used a string to specify the minimum
version required. This is more comfortable for an external interface
but for an internal function this added unnecessary complexity. This
changes to version string like '1.5.2' to an integer like 10502. This is
already the format used internally in the function.

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-08-02 16:05:27 +00:00
Michael Crosby 882d8eaba6 Merge pull request #1537 from tklauser/staticcheck
Fix issues found by staticcheck
2017-08-02 09:52:11 -04:00