jasder/runc - runc - Military Science Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Kir Kolyshkin	59897367c4	cgroups/systemd: allow to set -1 as pids.limit Currently, both systemd cgroup drivers (v1 and v2) only set "TasksMax" unit property if the value > 0, so there is no way to update the limit to -1 / unlimited / infinity / max. Since systemd driver is backed by fs driver, and both fs and fs2 set the limit of -1 properly, it works, but systemd still has the old value: # runc --systemd-cgroup update $CT --pids-limit 42 # systemctl show runc-$CT.scope | grep TasksMax TasksMax=42 # cat /sys/fs/cgroup/system.slice/runc-$CT.scope/pids.max 42 # ./runc --systemd-cgroup update $CT --pids-limit -1 # systemctl show runc-$CT.scope | grep TasksMax= TasksMax=42 # cat /sys/fs/cgroup/system.slice/runc-xx77.scope/pids.max max Fix by changing the condition to allow -1 as a valid value. NOTE other negative values are still being ignored by systemd drivers (as it was done before). I am not sure whether this is correct, or should we return an error. A test case is added. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:20:04 -07:00
Kir Kolyshkin	95413ecdb0	tests/int/update: add cgroupv1 systemd CPU checks Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:19:03 -07:00
Kir Kolyshkin	06d7c1d261	systemd+cgroupv1: fix updating CPUQuotaPerSecUSec 1. do not allow to set quota without period or period without quota, as we won't be able to calculate new value for CPUQuotaPerSecUSec otherwise. 2. do not ignore setting quota to -1 when a period is not set. 3. update the test case accordingly. Note that systemd value checks will be added in the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:17:18 -07:00
Kir Kolyshkin	7abd93d156	tests/integration/update.bats: more systemd checks 1. add missing checks for systemd's MemoryMax / MemoryLimit. 2. add checks for systemd's MemoryLow and MemorySwapMax. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:16:50 -07:00
Kir Kolyshkin	e4a84bea99	cgroupv2+systemd: set MemoryLow For some reason, this was not set before. Test case is added by the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:15:29 -07:00
Kir Kolyshkin	4fc9fa05da	tests/int: simplify check_systemd_value use ...so it will be easier to write more tests Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:15:11 -07:00
Kir Kolyshkin	716079f95b	Merge pull request #2406 from cyphar/devices-cgroup-header cgroups: add copyright header to devices.Emulator implementation	2020-05-20 13:01:34 -07:00
Akihiro Suda	5b601c66d0	README.md: fix a dead link Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-05-21 02:31:33 +09:00
Akihiro Suda	cd4b71c27a	Merge pull request #2409 from adrianreber/go-criu-4-0-0 Update to latest go-criu	2020-05-21 01:39:09 +09:00
Kir Kolyshkin	28cd9d9c18	Merge pull request #2419 from tianon/buildmode-arch-toggle Remove "-buildmode=pie" from platforms that don't support it LGTMs: AkihiroSuda, kolyshkin	2020-05-20 09:15:55 -07:00
Mrunal Patel	9a808dd014	Merge pull request #2424 from giuseppe/errno-ret libcontainer: honor seccomp errnoRet	2020-05-20 07:41:01 -07:00
Adrian Reber	944e057025	Update to latest go-criu (4.0.2) This updates to the latest version of go-criu (4.0.2) which is based on CRIU 3.14. As go-criu provides an existing way to query the CRIU binary for its version this also removes all the code from runc to handle CRIU version checking and now relies on go-criu. An important side effect of this change is that this raises the minimum CRIU version to 3.0.0 as that is the first CRIU version that supports CRIU version queries via RPC in contrast to parsing the output of 'criu --version'. CRIU 3.0 was released in April 2017. Signed-off-by: Adrian Reber <areber@redhat.com>	2020-05-20 13:49:38 +02:00
Giuseppe Scrivano	41aa19662b	libcontainer: honor seccomp errnoRet Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-05-20 09:11:55 +02:00
Giuseppe Scrivano	510c79f9cf	vendor: update runtime-specs to 237cc4f519e Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-05-20 09:11:54 +02:00
Kir Kolyshkin	236ec04599	Dockerfile: speed up criu build ... in case we have more than one CPU, that is. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-19 17:19:14 -07:00
Tianon Gravi	be66519c26	Remove "-buildmode=pie" from platforms that don't support it This sequence (and syntax) is inspired by containerd's implementation of the same: `4e08c2de67/Makefile.linux (L21-L26)` Signed-off-by: Tianon Gravi <admwiggin@gmail.com>	2020-05-19 16:00:37 -07:00
Kir Kolyshkin	b207d578ec	Merge pull request #2418 from AkihiroSuda/fix-bad-rebase-2413 fix "libcontainer/cgroups/fs/cpuset.go:63:14: undefined: fmt"	2020-05-19 11:28:09 -07:00
Akihiro Suda	2fa3c286b5	fix "libcontainer/cgroups/fs/cpuset.go:63:14: undefined: fmt" The compilation error had occurred because of a bad rebase during #2401 and #2413 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-05-19 23:38:20 +09:00
Akihiro Suda	f369199ff6	Merge pull request #2413 from JFHwang/2392-spec-check Add nil check of spec.Process in validateProcessSpec()	2020-05-19 08:11:22 +09:00
Mrunal Patel	53a4649776	Merge pull request #2401 from kolyshkin/fs-cpuset-mountinfo libct/cgroup: rm GetClosestMountpointAncestor using moby/sys/mountinfo parser	2020-05-18 10:43:55 -07:00
Mrunal Patel	825e91ada6	Merge pull request #2341 from kolyshkin/test-cpt-lazy runc checkpoint: fix --status-fd to accept fd	2020-05-18 10:43:24 -07:00
Mrunal Patel	67fac528d0	Merge pull request #2410 from lifubang/swap0patch cgroupv2: never write empty string to memory.swap.max	2020-05-18 10:42:17 -07:00
John Hwang	5aa0601a59	validateProcessSpec: prevent SEGV when config is valid json, but invalid. Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-18 09:38:22 -07:00
John Hwang	7fc291fd45	Replace formatted errors when unneeded Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-16 18:13:21 -07:00
lifubang	9ad1beb40f	never write empty string to memory.swap.max Because the empty string means set swap to 0. Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-05-16 06:52:14 +08:00
Aleksa Sarai	dc9a7879f9	cgroups: add copyright header to devices.Emulator implementation I forgot to include this in the original patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-15 11:29:51 +10:00
Akihiro Suda	3f1e886991	Merge pull request #2391 from cyphar/devices-cgroup cgroup: devices: major cleanups and minimal transition rules	2020-05-14 09:57:06 +09:00
Kir Kolyshkin	2db3240f35	libct/cgroups: rm GetClosestMountpointAncestor The function GetClosestMountpointAncestor is not very efficient, does not really belong to cgroup package, and is only used once (from fs/cpuset.go). Remove it, replacing with the implementation based on moby/sys/mountinfo parser. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-13 17:32:06 -07:00
Kir Kolyshkin	f160352682	libct/cgroup: prep to rm GetClosestMountpointAncestor This function is not very efficient, does not really belong to cgroup package, and is only used once (from fs/cpuset.go). Prepare to remove it by replacing with the implementation based on the parser from github.com/moby/sys/mountinfo. This commit is here to make sure the proposed replacement passes the unit test. Funny, but the unit test needs to be slightly modified since it supplies the wrong mountinfo (space as the first character, empty line at the end). Validated by $ go test -v -run Ance === RUN TestGetClosestMountpointAncestor --- PASS: TestGetClosestMountpointAncestor (0.00s) PASS ok github.com/opencontainers/runc/libcontainer/cgroups 0.002s Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-13 16:26:16 -07:00
Kir Kolyshkin	85d4264d8a	Merge pull request #2390 from lifubang/threadedordomain cgroupv2: don't enable threaded mode by default LGTMs: AkihiroSuda, cyphar, kolyshkin	2020-05-13 14:30:25 -07:00
Kir Kolyshkin	4b71877f99	Merge pull request #2292 from Creatone/creatone/extend-intelrdt Add RDT Memory Bandwidth Monitoring (MBM) and Cache Monitoring Technology (CMT) statistics.	2020-05-13 13:33:55 -07:00
Kir Kolyshkin	41855317b6	Merge pull request #2271 from katarzyna-z/kk-cpuacct-usage-all Add reading of information from cpuacct.usage_all	2020-05-13 13:33:05 -07:00
lifubang	fe0669b26d	don't enable threaded mode by default Because in threaded mode, we can't enable the memory controller -- it isn't thread-aware. Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-05-13 16:27:36 +08:00
Aleksa Sarai	ba6eb28229	tests: add integration test for paused-and-updated containers Such containers should remain paused after the update. This has historically been true, but this helps ensure that the systemd cgroup changes (freezing the container during SetUnitProperties) don't break this behaviour. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:44:11 +10:00
Aleksa Sarai	4438eaa5e4	tests: add integration test for devices transition rules Unfortunately, runc update doesn't support setting devices rules directly so we have to trigger it by modifying a different rule (which happens to trigger a devices update). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:44:11 +10:00
Aleksa Sarai	b810da1490	cgroups: systemd: make use of Device= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad and was a clear security bug. Luckily the commits which introduced this were never in an actual runc release. So we simply generate the cgroupv1-style rules (which is what systemd's DeviceAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DeviceAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: `afe83489d4` ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Fixes: `1d4ccc8e0c` ("fix data inconsistent when runc update in systemd driven cgroup v1") Fixes: `7682a2b2a5` ("fix data inconsistent when runc update in systemd driven cgroup v2") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:43:56 +10:00
Aleksa Sarai	afe83489d4	cgroupv1: devices: use minimal transition rules with devices.Emulator Now that all of the infrastructure for devices.Emulator is in place, we can finally implement minimal transition rules for devices cgroups. This allows for minimal disruption to running containers if a rule update is requested. Only in very rare circumstances (black-list cgroups and mode switching) will a clear-all rule be written. As a result, containers should no longer see spurious errors. A similar issue affects the cgroupv2 devices setup, but that is a topic for another time (as the solution is drastically different). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:42:43 +10:00
Aleksa Sarai	2353ffec2b	cgroups: implement a devices cgroupv1 emulator Okay, this requires a bit of explanation. The reason for this emulation is to allow us to have seamless updates of the devices cgroup for running containers. This was triggered by several users having issues where our initial writing of a deny-all rule (in all cases) results in spurious errors. The obvious solution would be to just remove the deny-all rule, right? Well, it turns out that runc doesn't actually control the deny-all rule because all users of runc have explicitly specified their own deny-all rule for many years. This appears to have been done to work around a bug in runc (which this series has fixed in [1]) where we would actually act as a black-list despite this being a violation of the OCI spec. This means that not adding our own deny-all rule in the case of updates won't solve the issue. However, it will also not solve the issue in several other cases (the most notable being where a container is being switched between default-permission modes). So in order to handle all of these cases, a way of tracking the relevant internal cgroup state (given a certain state of "devices.list" and a set of rules to apply) is necessary. That is the purpose of DevicesEmulator. Reading "devices.list" is quite important because that's the only way we can tell if it's safe to skip the troublesome deny-all rules without making potentially-dangerous assumptions about the container. We also are currently bug-compatible with the devices cgroup (namely, removing rules that don't exist or having superfluous rules all works as with the in-kernel implementation). The only exception to this is that we give an error if a user requests to revoke part of a wildcard exception, because allowing such configurations could result in security holes (cgroupv1 silently ignores such rules, meaning in white-list mode that the access is still permitted). [1]: `b2bec9806f` ("cgroup: devices: eradicate the Allow/Deny lists") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:42:20 +10:00
Aleksa Sarai	24388be71e	configs: use different types for .Devices and .Resources.Devices Making them the same type is simply confusing, but also means that you could accidentally use one in the wrong context. This eliminates that problem. This also includes a whole bunch of cleanups for the types within DeviceRule, so that they can be used more ergonomically. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	60e21ec26e	specconv: remove default /dev/console access /dev/console is a host resource which gives a bunch of permissions that we really shouldn't be giving to containers, not to mention that /dev/console in containers is actually /dev/pts/$n. Drop this since arguably this is a fairly scary thing to allow... Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	b2bec9806f	cgroup: devices: eradicate the Allow/Deny lists These lists have been in the codebase for a very long time, and have been unused for a large portion of that time -- specconv doesn't generate them and the only user of these flags has been tests (which doesn't inspire much confidence). In addition, we had an incorrect implementation of a white-list policy. This wasn't exploitable because all of our users explicitly specify "deny all" as the first rule, but it was a pretty glaring issue that came from the "feature" that users can select whether they prefer a white- or black- list. Fix this by always writing a deny-all rule (which is what our users were doing anyway, to work around this bug). This is one of many changes needed to clean up the devices cgroup code. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	859a780d6f	cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Aleksa Sarai	a79fa7caa0	contrib: recvtty: add --no-stdin flag This is mostly just useful for testing with the "single" mode, since it allows you to run recvtty in the background without the console being closed. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Mrunal Patel	df3d7f673a	Merge pull request #2393 from kolyshkin/criu-pi Vagrantfile: use criu from stable repo	2020-05-12 17:48:34 -07:00
Akihiro Suda	58bf083500	Merge pull request #2400 from kolyshkin/bats-1.2.0 Dockerfile: bump bats to 1.2.0	2020-05-13 08:56:53 +09:00
Kir Kolyshkin	17aee8c432	Dockerfile: bump bats to 1.2.0 Release notes: https://github.com/bats-core/bats-core/releases/tag/v1.2.0 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-12 11:54:17 -07:00
Akihiro Suda	2b9a36ee8c	Merge pull request #2398 from pkagrawal/master Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig	2020-05-12 15:05:55 +09:00
Mrunal Patel	867c9f5bc4	Merge pull request #2386 from kolyshkin/gordian-knot Simplify cgroup paths handling in v2 via unified v1/v2 API	2020-05-11 21:20:33 -07:00
Kir Kolyshkin	ca1d135bd4	runc checkpoint: fix --status-fd to accept fd 1. The command `runc checkpoint --lazy-server --status-fd $FD` actually accepts a file name as an $FD. Make it accept a file descriptor, like its name implies and the documentation states. In addition, since runc itself does not use the result of CRIU status fd, remove the code which relays it, and pass the FD directly to CRIU. Note 1: runc should close this file descriptor itself after passing it to criu, otherwise whoever waits on it might wait forever. Note 2: due to the way criu swrk consumes the fd (it reopens /proc/$SENDER_PID/fd/$FD), runc can't close it as soon as criu swrk has started. There is no good way to know when criu swrk has reopened the fd, so we assume that as soon as we have received something back, the fd is already reopened. 2. Since the meaning of --status-fd has changed, the test case using it needs to be fixed as well. Modify the lazy migration test to remove "sleep 2", actually waiting for the lazy page server to be ready. While at it, - remove the double fork (using shell's background process is sufficient here); - check the exit code for "runc checkpoint" and "criu lazy-pages"; - remove the check for no errors in dump.log after restore, as we are already checking its exit code. [v2: properly close status fd after spawning criu] [v3: move close status fd to after the first read] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-11 15:36:50 -07:00
Pradyumna Agrawal	4aa9101477	Honor spec.Process.NoNewPrivileges in specconv.CreateLibcontainerConfig The change ensures that the passed in value of NoNewPrivileges under spec.Process is reflected in the container config generated by specconv.CreateLibcontainerConfig Closes #2397 Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>	2020-05-11 13:38:14 -07:00

4443 Commits
