jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Kir Kolyshkin	2a322e91ec	cgroupv1: remove subsystemSet.Get() Instead of iterating over m.paths, iterate over subsystems and look up the path for each. This is faster since a map lookup is faster than iterating over the names in Get. A quick benchmark shows that the new way is 2.5x faster than the old one. Note though that this is not done to make things faster, as savings are negligible, but to make things simpler by removing some code. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-06 18:31:46 -07:00
Mrunal Patel	30dc54a995	Merge pull request #2503 from giuseppe/cgroup-fixes cgroup, systemd: cleanup cgroups	2020-07-06 15:14:29 -07:00
Mrunal Patel	3f81131845	Merge pull request #2490 from kolyshkin/dev-opt libct/cgroups: add SkipDevices to Resources	2020-07-06 14:28:30 -07:00
Giuseppe Scrivano	32034481ea	cgroup, systemd: cleanup cgroups some hierarchies were created directly by .Apply() on top of systemd managed cgroups. systemd doesn't manage these and as a result we leak these cgroups. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-06 23:06:16 +02:00
Giuseppe Scrivano	2deaeab08f	cgroup: store the result of IsRunningSystemd Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-05 12:42:27 +02:00
Kir Kolyshkin	cd479f9d14	cgroupv1/freezer: don't use subsystemSet.Get() Iterating over the list of subsystems and comparing their names to get an instance of fs.cgroupFreezer is useless and a waste of time, since it is a shallow type (i.e. does not have any data/state) and we can create an instance in place. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-03 14:00:44 -07:00
Kir Kolyshkin	108ee85b82	libct/cgroups: add SkipDevices to Resources The kubelet uses libct/cgroups code to set up cgroups. It creates a parent cgroup (kubepods) to put the containers into. The problem (for cgroupv2, which uses eBPF for device configuration) is that the hard requirement to have the devices cgroup configured results in leaking an eBPF program upon every kubelet restart. If kubelet is restarted 64+ times, the cgroup can't be configured anymore. Work around this by adding a SkipDevices flag to Resources. A check was added so that if SkipDevices is set, such a "container" can't be started (to make sure it is only used for non-containers). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-02 15:19:31 -07:00
Peter Hunt	6a0f64e7c9	systemd: add unit tests for systemdVersion Signed-off-by: Peter Hunt <pehunt@redhat.com>	2020-06-18 22:30:50 -04:00
Peter Hunt	6369e38871	systemd: parse systemdVersion in more situations there have been cases observed where instead of `v$VER.0-$OS` the systemdVersion returned is just `$VER`, or `$VER-1`. handle these cases Signed-off-by: Peter Hunt <pehunt@redhat.com>	2020-06-18 22:30:50 -04:00
Mrunal Patel	406298fdf0	Merge pull request #2466 from kolyshkin/systemd-cpu-quota-period cgroups/systemd: add setting CPUQuotaPeriod prop	2020-06-17 12:03:30 -07:00
Kir Kolyshkin	e751a168dc	cgroups/systemd: add setting CPUQuotaPeriod prop For some reason, runc systemd drivers (both v1 and v2) never set systemd unit property named `CPUQuotaPeriod` (known as `CPUQuotaPeriodUSec` on dbus and in `systemctl show` output). Set it, and add a check to all the integration tests. The check is less than trivial because, when not set, the value is shown as "infinity" but when set to the same (default) value, shown as "100ms", so in case we expect 100ms (period = 100000 us), we have to _also_ check for "infinity". [v2: add systemd version checks since CPUQuotaPeriod requires v242+] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-16 15:48:06 -07:00
Kir Kolyshkin	dd2426d067	libct/cgroups: fix m.paths map access This fixes a few cases of accessing m.paths map directly without holding the mutex lock. Fixes: `9087f2e82` Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-15 18:30:16 -07:00
Kir Kolyshkin	5b247e739c	Merge pull request #2338 from lifubang/systemdcgroupv2 fix path error in systemd when stopped LGTMs: @mrunalp @AkihiroSuda	2020-06-15 18:01:13 -07:00
Kir Kolyshkin	a92b0327ce	cgroups/systemd: fix set CPU quota if period is unset systemd drivers ignore --cpu-quota during update if the CPU period was not set earlier. Fixed by adding the default for the period. The test will be added by the following commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-09 17:32:17 -07:00
Kir Kolyshkin	8b9646775e	cgroups/systemd: unify adding CpuQuota The code that adds CpuQuotaPerSecUSec is the same in v1 and v2 systemd cgroup driver. Move it to common. No functional change. Note that the comment telling that we always set this property contradicts with the current code, and therefore it is removed. [v2: drop cgroupv1-specific comment] [v3: drop returning error as it's not used] [v4: remove an obsoleted comment] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-09 17:14:43 -07:00
Kir Kolyshkin	2ce20ed158	cgroups/systemd: simplify gen*ResourcesProperties Use r instead of c.Resources for readability. No functional change. This commit has been brought to you by '<,'>s/c\.Resources\./r./g Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-08 13:42:09 -07:00
lifubang	9087f2e827	fix path error in systemd when stopped When we use cgroup with the systemd driver, the cgroup path is automatically removed by systemd when all processes have exited. So we should check whether the cgroup path exists when we access it, for example in `kill/ps`; otherwise we would get an error. Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-06-02 18:17:43 +08:00
Mrunal Patel	332a84581e	Merge pull request #2443 from kolyshkin/kmem-fixup cgroupv1/systemd.Set: don't enable kernel memory acct	2020-05-31 10:04:45 -07:00
Kir Kolyshkin	3fe6e04510	cgroupv1/systemd.Set: don't enable kernel memory acct This is a regression from commit `1d4ccc8e0`. We only need to enable kernel memory accounting once, from the (legacyManager).Apply(), and there is no need to do it in (legacyManager).Set(). While at it, rename the method to better reflect what it's doing. This saves 1 call to mountinfo parser. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-29 17:54:50 -07:00
Kir Kolyshkin	3249e2379c	cgroupv1: check cpu shares in place Commit `4e65e0e90a` added a check for cpu shares. Apparently, the kernel allows setting a value higher than max or lower than min without an error, but the value read back is always within the limits. The check (which was later moved out to a separate CheckCpushares() function) is always performed after setting the cpu shares, so let's move it to the very place where it is set. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-29 16:46:28 -07:00
Kir Kolyshkin	be5467872d	cgroupv1: minimal fix for cpu quota regression This is a quick-n-dirty fix for the regression introduced by commit `06d7c1d`, which made it impossible to set only CpuQuota (without the CpuPeriod). It partially reverts the above commit, and adds a test case. The proper fix will follow. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-26 11:02:16 -07:00
Kir Kolyshkin	3c6e8ac4d2	cgroupv2: set mem+swap to max if mem set to max ... and mem+swap is not explicitly set otherwise. This ensures compatibility with cgroupv1 controller which interprets things this way. With this fixed, we can finally enable swap tests for cgroupv2. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-22 21:32:16 -07:00
Kir Kolyshkin	59897367c4	cgroups/systemd: allow to set -1 as pids.limit Currently, both systemd cgroup drivers (v1 and v2) only set "TasksMax" unit property if the value > 0, so there is no way to update the limit to -1 / unlimited / infinity / max. Since systemd driver is backed by fs driver, and both fs and fs2 set the limit of -1 properly, it works, but systemd still has the old value: # runc --systemd-cgroup update $CT --pids-limit 42 # systemctl show runc-$CT.scope | grep TasksMax TasksMax=42 # cat /sys/fs/cgroup/system.slice/runc-$CT.scope/pids.max 42 # ./runc --systemd-cgroup update $CT --pids-limit -1 # systemctl show runc-$CT.scope | grep TasksMax= TasksMax=42 # cat /sys/fs/cgroup/system.slice/runc-xx77.scope/pids.max max Fix by changing the condition to allow -1 as a valid value. NOTE other negative values are still being ignored by systemd drivers (as it was done before). I am not sure whether this is correct, or whether we should return an error. A test case is added. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:20:04 -07:00
Kir Kolyshkin	06d7c1d261	systemd+cgroupv1: fix updating CPUQuotaPerSecUSec 1. do not allow to set quota without period or period without quota, as we won't be able to calculate new value for CPUQuotaPerSecUSec otherwise. 2. do not ignore setting quota to -1 when a period is not set. 3. update the test case accordingly. Note that systemd value checks will be added in the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:17:18 -07:00
Kir Kolyshkin	e4a84bea99	cgroupv2+systemd: set MemoryLow For some reason, this was not set before. Test case is added by the next commit. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-20 13:15:29 -07:00
John Hwang	7fc291fd45	Replace formatted errors when unneeded Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-16 18:13:21 -07:00
Aleksa Sarai	b810da1490	cgroups: systemd: make use of Device= properties It seems we missed that systemd added support for the devices cgroup, as a result systemd would actually write an allow-all rule each time you did 'runc update'* if you used the systemd cgroup driver. This is obviously ... bad and was a clear security bug. Luckily the commits which introduced this were never in an actual runc release. So we simply generate the cgroupv1-style rules (which is what systemd's DeviceAllow wants) and default to a deny-all ruleset. Unfortunately it turns out that systemd is susceptible to the same spurious error failure that we were, so that problem is out of our hands for systemd cgroup users. However, systemd has a similar bug to the one fixed in [1]. It will happily write a disruptive deny-all rule when it is not necessary. Unfortunately, we cannot even use devices.Emulator to generate a minimal set of transition rules because the DBus API is limited (you can only clear or append to the DeviceAllow= list -- so we are forced to always clear it). To work around this, we simply freeze the container during SetUnitProperties. [1]: `afe83489d4` ("cgroupv1: devices: use minimal transition rules with devices.Emulator") Fixes: `1d4ccc8e0c` ("fix data inconsistent when runc update in systemd driven cgroup v1") Fixes: `7682a2b2a5` ("fix data inconsistent when runc update in systemd driven cgroup v2") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:43:56 +10:00
Aleksa Sarai	859a780d6f	cgroups: add GetFreezerState() helper to Manager This is effectively a nicer implementation of the container.isPaused() helper, but to be used within the cgroup code for handling some fun issues we have to fix with the systemd cgroup driver. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2020-05-13 17:38:45 +10:00
Kir Kolyshkin	714c91e9f7	Simplify cgroup path handling in v2 via unified API This unties the Gordian Knot of using GetPaths in cgroupv2 code. The problem is, the current code uses GetPaths for three kinds of things: 1. Get all the paths to cgroup v1 controllers to save its state (see (linuxContainer).currentState(), (LinuxFactory).loadState() methods). 2. Get all the paths to cgroup v1 controllers to have the setns process enter the proper cgroups in `(*setnsProcess).start()`. 3. Get the path to a specific controller (for example, `m.GetPaths()["devices"]`). Now, for cgroup v2 instead of a set of per-controller paths, we have only one single unified path, and a dedicated function `GetUnifiedPath()` to get it. This discrepancy between v1 and v2 cgroupManager API leads to the following problems with the code: - multiple if/else code blocks that have to treat v1 and v2 separately; - backward-compatible GetPaths() methods in v2 controllers; - repeated writing of the PID into the same cgroup for v2; Overall, it's hard to write the right code with all this, and the code that is written is kinda hard to follow. The solution is to slightly change the API to do the 3 things outlined above in the same manner for v1 and v2: 1. Use `GetPaths()` for state saving and setns process cgroups entering. 2. Introduce and use Path(subsys string) to obtain a path to a subsystem. For v2, the argument is ignored and the unified path is returned. This commit converts all the controllers to the new API, and modifies all the users to use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-08 12:04:06 -07:00
Kir Kolyshkin	51e1a0842d	libct/cgroups/systemd/v1: privatize v1 manager This patch was generated entirely by gorename -- nothing to review here. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-08 10:09:48 -07:00
Kir Kolyshkin	d827e323b0	libct/cgroups/systemd/v1: add NewLegacyManager Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-08 10:07:40 -07:00
Kir Kolyshkin	24f945e08d	libct/cgroups/systemd/v2: return a public interface Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-05-08 10:06:02 -07:00
Akihiro Suda	bf15cc99b1	cgroup v2: support rootless systemd Tested with both Podman (master) and Moby (master), on Ubuntu 19.10. $ podman --cgroup-manager=systemd run -it --rm --runtime=runc \ --cgroupns=host --memory 42m --cpus 0.42 --pids-limit 42 alpine / # cat /proc/self/cgroup 0::/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/memory.max 44040192 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/cpu.max 42000 100000 / # cat /sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/user.slice/libpod-132ff0d72245e6f13a3bbc6cdc5376886897b60ac59eaa8dea1df7ab959cbf1c.scope/pids.max 42 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-05-08 12:39:20 +09:00
lifubang	a70f354680	let runc disable swap in cgroup v2 In cgroup v2, when memory and memorySwap set to the same value which is greater than zero, runc should write zero in `memory.swap.max` to disable swap. Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-05-03 20:57:36 +08:00
lifubang	bfa1b2aab3	check that StartTransientUnit and StopUnit succeeds Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-04-28 15:46:28 +08:00
lifubang	1d4ccc8e0c	fix data inconsistent when runc update in systemd driven cgroup v1 Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-04-23 19:32:57 +08:00
lifubang	7682a2b2a5	fix data inconsistent when runc update in systemd driven cgroup v2 Signed-off-by: lifubang <lifubang@acmcoder.com>	2020-04-23 19:32:07 +08:00
Kir Kolyshkin	4b4bc995ad	CreateCgroupPath: only enable needed controllers 1. Instead of enabling all available controllers, figure out which ones are required, and only enable those. 2. Amend all setFoo() functions to call isFooSet(). While this might seem unnecessary, it might actually help to uncover a bug. Imagine someone: - adds a cgroup.Resources.CpuFoo setting; - modifies setCpu() to apply the new setting; - but forgets to amend isCpuSet() accordingly <-- BUG In this case, a test case modifying CpuFoo will help to uncover the BUG. This is the reason why it's added. This patch could be amended by enabling controllers on a best-effort basis, i.e. : - do not return an error early if we can't enable some controllers; - if we fail to enable all controllers at once (usually because one of them can't be enabled), try enabling them one by one. Currently this is not implemented, and it's not clear whether this would be a good way to go or not. [v2: add/use is${Controller}Set() functions] [v3: document neededControllers()] [v4: drop "best-effort" part] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	bb47e35843	cgroup/systemd: reorganize 1. Rename the files - v1.go: cgroupv1 aka legacy; - v2.go: cgroupv2 aka unified hierarchy; - unsupported.go: when systemd is not available. 2. Move the code that is common between v1 and v2 to common.go Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	813cb3eb94	cgroupv2: fix fs2 cgroup init fs2 cgroup driver was not working because it did not enable controllers while creating cgroup directory; instead it was merely doing MkdirAll() and gathered the list of available controllers in NewManager(). Also, cgroup should be created in Apply(), not while creating a new manager instance. To fix: 1. Move the createCgroupsv2Path function from systemd driver to fs2 driver, renaming it to CreateCgroupPath. Use in Apply() from both fs2 and systemd drivers. 2. Delay available controllers map initialization until it is needed. With this patch: - NewManager() only performs minimal initialization (initializing m.dirPath, if not provided); - Apply() properly creates cgroup path, enabling the controllers; - m.controllers is initialized lazily on demand. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	dbeff89491	cgroupv2/systemd: privatize UnifiedManager ... and its Cgroup field. There is no reason to keep it public. This was generated by gorename. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	88c13c0713	cgroupv2: use SecureJoin in systemd driver It seems that some paths are coming from user and are therefore untrusted. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:20:22 -07:00
Kir Kolyshkin	9c80cd672d	cgroupv2: rm legacy Paths from systemd driver Having map of per-subsystem paths in systemd unified cgroups driver does not make sense and makes the code less readable. To get rid of it, move the systemd v1-or-v2 init code to libcontainer/factory_linux.go which already has a function to deduce unified path out of paths map. End result is much cleaner code. Besides, we no longer write pid to the same cgroup file 7 times in Apply() like we did before. While at it - add `rootless` flag which is passed on to fs2 manager - merge getv2Path() into GetUnifiedPath(), don't overwrite path if it is set during initialization (on Load). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:19:51 -07:00
Ted Yu	614bb96676	cgroupv2/systemd: Properly remove intermediate directory Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-04-13 08:32:08 -07:00
Kir Kolyshkin	c86be8a2c1	cgroupv2: fix setting MemorySwap The resources.MemorySwap field from OCI is memory+swap, while cgroupv2 has a separate swap limit, so subtract memory from the limit (and make sure values are set and sane). Make sure to set MemorySwapMax for systemd, too. Since systemd does not have MemorySwapMax for cgroupv1, it is only needed for v2 driver. [v2: return -1 on any negative value, add unit test] [v3: treat any negative value other than -1 as error] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-07 20:45:53 -07:00
Tobias Klauser	3e678c08f9	Remove unused consts testScopeWait and testSliceWait These are unused since commit `518c855833` ("Remove libcontainer detection for systemd features") Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2020-04-03 21:09:43 +02:00
Mrunal Patel	d05e5728aa	systemd: Lazy initialize the systemd dbus connection Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2020-03-30 15:24:06 -07:00
Mrunal Patel	33c6125da6	systemd: Export IsSystemdRunning() function Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2020-03-30 15:24:06 -07:00
Kir Kolyshkin	a949e4f22f	cgroupv2: UnifiedManager.Apply: simplify Remove joinCgroupsV2() function, as its name and second parameter are misleading. Use createCgroupsv2Path() directly, do not call getv2Path() twice. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-26 19:20:00 -07:00
Kir Kolyshkin	5406833a65	cgroupv2/systemd: add getv2Path Function getSubsystemPath(), while it works for the v2 unified case, is suboptimal, as it does a few unnecessary calls. Add a simplified version of getSubsystemPath(), called getv2Path(), and use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-26 19:17:09 -07:00
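
Commits c86be8a2c1, a70f354680 and 3c6e8ac4d2 describe translating the OCI memory+swap value into cgroup v2's separate swap limit (memory.swap.max). The following is a minimal sketch of that conversion, not runc's actual code. The function name convertSwapForV2 is hypothetical, and the sketch assumes the conventions that 0 means "unset" and -1 means "unlimited".

```go
package main

import (
	"errors"
	"fmt"
)

// convertSwapForV2 is a hypothetical helper. OCI memorySwap is a memory+swap
// limit, while cgroup v2 memory.swap.max limits swap alone, so the memory
// limit is subtracted. Assumed conventions: 0 = unset, -1 = unlimited (max).
func convertSwapForV2(memorySwap, memory int64) (int64, error) {
	// Memory is unlimited and swap is not set: make swap unlimited too,
	// mirroring how the cgroup v1 controller interprets this request.
	if memory == -1 && memorySwap == 0 {
		return -1, nil
	}
	// Unset or unlimited swap is passed through unchanged.
	if memorySwap == -1 || memorySwap == 0 {
		return memorySwap, nil
	}
	// Any other negative value is rejected.
	if memorySwap < 0 {
		return 0, fmt.Errorf("invalid memory+swap value: %d", memorySwap)
	}
	// A swap limit only makes sense relative to a positive memory limit.
	if memory <= 0 {
		return 0, errors.New("cannot set a swap limit without a memory limit")
	}
	if memorySwap < memory {
		return 0, errors.New("memory+swap limit must be >= memory limit")
	}
	// Equal values yield 0, which disables swap.
	return memorySwap - memory, nil
}

func main() {
	fmt.Println(convertSwapForV2(200, 100)) // 100 <nil>
	fmt.Println(convertSwapForV2(100, 100)) // 0 <nil>
}
```

Returning 0 when both values are equal matches the behavior described in a70f354680: writing zero to memory.swap.max disables swap.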

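Commits 06d7c1d261, 8b9646775e, a92b0327ce and 59897367c4 deal with mapping CFS quota/period and pids limits onto systemd unit properties (CPUQuotaPerSecUSec, TasksMax). The sketch below shows only the arithmetic, under stated assumptions. CPUQuotaPerSecUSec is taken as quota × 1000000 / period (microseconds of CPU time per wall-clock second). The period defaults to 100000 microseconds (the 100ms default cited in e751a168dc). systemd's "infinity" is represented as the maximum uint64 value. The helper names are hypothetical, and the sketch applies no rounding.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed default CFS period: 100ms expressed in microseconds.
const defaultCPUPeriodUs uint64 = 100000

// infinity stands in for systemd's "infinity" (assumption: max uint64).
const infinity uint64 = math.MaxUint64

// cpuQuotaPerSecUSec converts a CFS quota/period pair (microseconds) into
// microseconds of CPU time per second. A non-positive quota (such as -1)
// is treated as "no limit"; a zero period falls back to the default.
func cpuQuotaPerSecUSec(quota int64, period uint64) uint64 {
	if quota <= 0 {
		return infinity
	}
	if period == 0 {
		period = defaultCPUPeriodUs
	}
	return uint64(quota) * 1000000 / period
}

// tasksMax maps a pids limit to a TasksMax value. The bool is false when
// the property should not be set: 0 and negatives other than -1 are
// ignored, while -1 means "unlimited".
func tasksMax(limit int64) (uint64, bool) {
	switch {
	case limit > 0:
		return uint64(limit), true
	case limit == -1:
		return infinity, true
	default:
		return 0, false
	}
}

func main() {
	fmt.Println(cpuQuotaPerSecUSec(42000, 100000)) // 420000
	fmt.Println(tasksMax(42))                      // 42 true
}
```

For example, the `--cpus 0.42` run shown in bf15cc99b1 yields cpu.max "42000 100000". Under this formula, that corresponds to 420000 microseconds of CPU time per second.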

138 Commits
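
Commit 6369e38871 notes that systemd's version string may appear as `v$VER.0-$OS`, `$VER`, or `$VER-1`. Commit e751a168dc gates CPUQuotaPeriod on systemd v242+. Below is a minimal, hypothetical parser that accepts those shapes by taking the first run of digits after an optional leading "v". It is an illustration, not runc's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// leadingVersion matches an optional "v" followed by the first run of digits.
var leadingVersion = regexp.MustCompile(`^v?([0-9]+)`)

// parseSystemdVersion extracts the major systemd version from strings such
// as "v245.4-1.fc32", "245", or "245-1" (all of which give 245).
func parseSystemdVersion(s string) (int, error) {
	m := leadingVersion.FindStringSubmatch(s)
	if m == nil {
		return 0, fmt.Errorf("cannot parse systemd version %q", s)
	}
	return strconv.Atoi(m[1])
}

// supportsCPUQuotaPeriod reports whether systemd is new enough (v242+) to
// accept the CPUQuotaPeriod unit property.
func supportsCPUQuotaPeriod(ver int) bool {
	return ver >= 242
}

func main() {
	ver, err := parseSystemdVersion("v245.4-1.fc32")
	fmt.Println(ver, err)                     // 245 <nil>
	fmt.Println(supportsCPUQuotaPeriod(ver)) // true
}
```

Matching only the leading digits keeps the parser tolerant of distribution suffixes such as "-1.fc32". The trade-off is that it ignores minor versions.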