jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Akihiro Suda	f668854938	Merge pull request #2499 from kolyshkin/find-cgroup-mountpoint-fastpath cgroupv1/FindCgroupMountpoint: add a fast path	2020-08-04 14:06:41 +09:00
Akihiro Suda	234d15ecd0	Merge pull request #2520 from thaJeztah/bump_runtime_spec vendor: update runtime-spec v1.0.3-0.20200728170252-4d89ac9fbff6	2020-08-04 14:05:33 +09:00
Akihiro Suda	78d02e8563	Merge pull request #2534 from adrianreber/go-criu-4-1-0 Pass location of CRIU binary to go-criu	2020-08-03 16:21:50 +09:00
Kir Kolyshkin	3de3112c61	Merge pull request #2525 from adrianreber/external-pidns Tell CRIU to use an external pid namespace if necessary	2020-07-31 17:50:27 -07:00
Adrian Reber	6f4616dd73	Pass location of CRIU binary to go-criu If the CRIU binary is in a non $PATH location and passed to runc via '--criu /path/to/criu', this information has not been passed to go-criu and since the switch to use go-criu for CRIU version detection, non $PATH CRIU usage was broken. This uses the newly added go-criu interface to pass the location of the binary to go-criu. Signed-off-by: Adrian Reber <areber@redhat.com>	2020-07-31 11:14:15 +02:00
Akihiro Suda	d6f5641c20	Merge pull request #2507 from kolyshkin/alt-to-2497 libct/cgroups/GetCgroupRoot: make it faster	2020-07-31 11:43:38 +09:00
Mrunal Patel	46243fcea1	Merge pull request #2500 from kolyshkin/fs-apply libct/cgroups/fs: rework Apply()	2020-07-30 16:39:53 -07:00
Kir Kolyshkin	e0c0b0cf32	libct/cgroups/GetCgroupRoot: make it faster ...by checking the default path first. Quick benchmark shows it's about 5x faster on an idle system, and the gain should be much more on a system doing mounts etc. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-30 13:45:21 -07:00
Sebastiaan van Stijn	901dccf05d	vendor: update runtime-spec v1.0.3-0.20200728170252-4d89ac9fbff6 Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2020-07-30 22:08:54 +02:00
Aleksa Sarai	95a59bf206	devices: correctly check device types (mode&S_IFCHR == S_IFCHR) is the wrong way of checking the type of an inode because the S_IF* bits are actually not a bitmask and instead must be checked using S_IF*. This bug was neatly hidden behind a (major == 0) sanity-check but that was removed by [1]. In addition, add a test that makes sure that HostDevices() doesn't give rubbish results -- because we broke this and fixed this before[2]. [1]: `24388be71e` ("configs: use different types for .Devices and .Resources.Devices") [2]: `3ed492ad33` ("Handle non-devices correctly in DeviceFromPath") Fixes: `b0d014d0e1` ("libcontainer: one more switch from syscall to x/sys/unix") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>	2020-07-28 19:04:30 +10:00
Adrian Reber	09e103b01e	Tell CRIU to use an external pid namespace if necessary Trying to checkpoint a container out of pod in cri-o fails with: Error (criu/namespaces.c:1081): Can't dump a pid namespace without the process init Starting with the upcoming CRIU release 3.15, CRIU can be told to ignore the PID namespace during checkpointing and to restore processes into an existing network namespace. With the changes from this commit and CRIU 3.15 it is possible to checkpoint a container out of a pod in cri-o. Signed-off-by: Adrian Reber <areber@redhat.com>	2020-07-27 10:14:08 +02:00
Adrian Reber	610c5ad75c	Factor out checkpointing with external namespace code To checkpoint and restore a container with an external network namespace (like with Podman and CNI), runc tells CRIU to ignore the network namespace during checkpoint and restore. This commit moves that code to their own functions to be able to reuse the same code path for external PID namespaces which are necessary for checkpointing and restoring containers out of a pod in cri-o. Signed-off-by: Adrian Reber <areber@redhat.com>	2020-07-27 10:14:07 +02:00
Xiaodong Liu	af283b3f47	remove redundant the parameter of chroot function Signed-off-by: Xiaodong Liu <liuxiaodong@loongson.cn>	2020-07-15 16:22:07 +08:00
Mrunal Patel	cf1273abf4	Merge pull request #2498 from kolyshkin/v1-code-cleanups libct/cgroups/fs: code cleanups	2020-07-09 15:58:06 -07:00
Kir Kolyshkin	fbf047bf2f	Merge pull request #2501 from XiaodongLoong/systemderror-fix fix TestPidsSystemd and TestRunWithKernelMemorySystemd test error	2020-07-08 20:39:39 -07:00
Xiaodong Liu	f57bb2fe3d	fix TestPidsSystemd and TestRunWithKernelMemorySystemd test error Signed-off-by: Xiaodong Liu <liuxiaodong@loongson.cn>	2020-07-09 09:36:03 +08:00
Daniel J Walsh	d78ee47154	Allow libcontainer/configs to be imported on Windows Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2020-07-08 15:20:37 -04:00
Kir Kolyshkin	a73ce38d16	cgroupv1/FindCgroupMountpoint: add a fast path In case cgroupPath is under the default cgroup prefix, let's try to guess the mount point by adding the subsystem name to the default prefix, and resolving the resulting path in case it's a symlink. In most cases, given the default cgroup setup, this trick should result in returning the same result faster, and avoiding /proc/self/mountinfo parsing which is relatively slow and problematic. Be very careful with the default path, checking it is - a directory; - a mount point; - has cgroup fstype. If something is not right, fall back to parsing mountinfo. While at it, remove the obsoleted comment about mountinfo parsing. The comment belongs to findCgroupMountpointAndRootFromReader(), but rather than moving it there, let's just remove it, since it does not add any value in understanding the current code. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-07 13:57:33 -07:00
Kir Kolyshkin	c1adc99a20	cgroup/fs: rework Apply() In manager.Apply() method, a path to each subsystem is obtained by calling d.path(sys.Name()), and then sys.Apply() is called, which makes the same call to d.path() again. d.path() is an expensive call, so rather than calling it twice, let's reuse the result. This reduces the number of times we parse mountinfo during container start from 62 to 34 on my setup. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-07 10:58:37 -07:00
Aleksa Sarai	819fcc687e	merge branch 'pr-2495' Kir Kolyshkin (1): cgroups/fs/path: optimize LGTMs: @mrunalp @cyphar Closes #2495	2020-07-07 11:51:06 +10:00
Kir Kolyshkin	2a322e91ec	cgroupv1: remove subsystemSet.Get() Instead of iterating over m.paths, iterate over subsystems and look up the path for each. This is faster since a map lookup is faster than iterating over the names in Get. A quick benchmark shows that the new way is 2.5x faster than the old one. Note though that this is not done to make things faster, as savings are negligible, but to make things simpler by removing some code. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-06 18:31:46 -07:00
Kir Kolyshkin	daf30cb7ca	cgroups/fs: rm getSubsystems It does not add any value. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-06 18:29:14 -07:00
Kir Kolyshkin	2e22579946	libct/cgroups/fs.GetStats: drop PathExists check Half of controllers' GetStats just return nil, and most of the others ignore ENOENT on files, so it will be cheaper to not check that the path exists in the main GetStats method, offloading that to the controllers. Drop PathExists check from GetStats, add it to those controllers' GetStats where it was missing. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-06 18:02:17 -07:00
Kir Kolyshkin	11fb94965c	cgroups/fs: rm Remove method from controllers To my surprise, those are not used anywhere in the code. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-06 18:02:17 -07:00
Mrunal Patel	30dc54a995	Merge pull request #2503 from giuseppe/cgroup-fixes cgroup, systemd: cleanup cgroups	2020-07-06 15:14:29 -07:00
Mrunal Patel	3f81131845	Merge pull request #2490 from kolyshkin/dev-opt libct/cgroups: add SkipDevices to Resources	2020-07-06 14:28:30 -07:00
Giuseppe Scrivano	32034481ea	cgroup, systemd: cleanup cgroups some hierarchies were created directly by .Apply() on top of systemd managed cgroups. systemd doesn't manage these and as a result we leak these cgroups. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-06 23:06:16 +02:00
Mrunal Patel	46a304b592	Merge pull request #2502 from tjucoder/master make sure pty.Close() will be called and fix comment	2020-07-06 11:49:20 -07:00
Giuseppe Scrivano	2deaeab08f	cgroup: store the result of IsRunningSystemd Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2020-07-05 12:42:27 +02:00
tjucoder	ab35cfe23c	make sure pty.Close() will be called and fix comment Signed-off-by: tjucoder <chinesecoder@foxmail.com>	2020-07-05 16:37:21 +08:00
Kir Kolyshkin	62a30709d2	cgroups/fs/path: optimize The result of cgroupv1.FindCgroupMountpoint() call (which is relatively expensive) is only used in case raw.innerPath is absolute, so it only makes sense to call it in that case. This drastically reduces the number of calls to FindCgroupMountpoint during container start (from 116 to 62 in my setup). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-03 14:07:27 -07:00
Kir Kolyshkin	46b26bc05d	cgroups/fs/Freeze: simplify In here, defer looks like an overkill, since the code is very simple and we already have an error path. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-03 14:02:57 -07:00
Kir Kolyshkin	cd479f9d14	cgroupv1/freezer: don't use subsystemSet.Get() Iterating over the list of subsystems and comparing their names to get an instance of fs.cgroupFreezer is useless and a waste of time, since it is a shallow type (i.e. does not have any data/state) and we can create an instance in place. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-03 14:00:44 -07:00
Kir Kolyshkin	108ee85b82	libct/cgroups: add SkipDevices to Resources The kubelet uses libct/cgroups code to set up cgroups. It creates a parent cgroup (kubepods) to put the containers into. The problem (for cgroupv2 that uses eBPF for device configuration) is that the hard requirement to have the devices cgroup configured results in leaking an eBPF program upon every kubelet restart. If kubelet is restarted 64+ times, the cgroup can't be configured anymore. Work around this by adding a SkipDevices flag to Resources. A check was added so that if SkipDevices is set, such a "container" can't be started (to make sure it is only used for non-containers). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-07-02 15:19:31 -07:00
Aleksa Sarai	0fa097fc37	merge branch 'pr-2481' Tianjia Zhang (1): nsenter: fix repeat close() operations LGTMs: @kolyshkin @cyphar Closes #2481	2020-06-20 12:18:31 +10:00
Kir Kolyshkin	dff7685c18	Merge pull request #2459 from tedyu/linux-cont-set-cfg Set configs back when intelrdt configs cannot be set LGTMS: @AkihiroSuda @kolyshkin	2020-06-19 12:57:53 -07:00
Kir Kolyshkin	e643db6e0f	Merge pull request #2479 from haircommander/fix-systemd-version systemd: parse systemdVersion when only an int is returned LGTMS: @mrunalp @kolyshkin	2020-06-19 12:19:16 -07:00
Tianjia Zhang	04806abd39	nsenter: fix repeat close() operations It is obvious that the loop at the first place executes at least twice, and the close() call after the first time always returns an EBADF error, so move these operations outside the loop that do not need to be repeated. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>	2020-06-19 19:28:39 +08:00
Akihiro Suda	9748b48742	Merge pull request #2229 from RenaudWasTaken/create-container Add CreateRuntime, CreateContainer and StartContainer Hooks	2020-06-19 12:27:51 +09:00
Renaud Gaubert	861afa7509	Add integration tests for the new runc hooks This patch adds a test based on real world usage of runc hooks (libnvidia-container). We verify that mounting a library inside a container and running ldconfig succeeds. Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-19 02:39:20 +00:00
Renaud Gaubert	2f7bdf9d3b	Tests the new Hook Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-19 02:39:20 +00:00
Peter Hunt	6a0f64e7c9	systemd: add unit tests for systemdVersion Signed-off-by: Peter Hunt <pehunt@redhat.com>	2020-06-18 22:30:50 -04:00
Peter Hunt	6369e38871	systemd: parse systemdVersion in more situations there have been cases observed where instead of `v$VER.0-$OS` the systemdVersion returned is just `$VER`, or `$VER-1`. handle these cases Signed-off-by: Peter Hunt <pehunt@redhat.com>	2020-06-18 22:30:50 -04:00
Kir Kolyshkin	89516d17dd	libct/cgroups/readProcsFile: ret error if scan failed Not sure why but the errors from scanner were ignored. Such errors can happen if open(2) has succeeded but the subsequent read(2) fails. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-17 12:33:01 -07:00
Mrunal Patel	406298fdf0	Merge pull request #2466 from kolyshkin/systemd-cpu-quota-period cgroups/systemd: add setting CPUQuotaPeriod prop	2020-06-17 12:03:30 -07:00
Mrunal Patel	12a7c8fc2b	Merge pull request #2411 from kolyshkin/v1-specific libct/cgroups/utils: fix/separate cgroupv1 code	2020-06-17 06:45:19 -07:00
Renaud Gaubert	ccdd75760c	Add the CreateRuntime, CreateContainer and StartContainer Hooks Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>	2020-06-17 02:10:00 +00:00
Kir Kolyshkin	e751a168dc	cgroups/systemd: add setting CPUQuotaPeriod prop For some reason, runc systemd drivers (both v1 and v2) never set systemd unit property named `CPUQuotaPeriod` (known as `CPUQuotaPeriodUSec` on dbus and in `systemctl show` output). Set it, and add a check to all the integration tests. The check is less than trivial because, when not set, the value is shown as "infinity" but when set to the same (default) value, shown as "100ms", so in case we expect 100ms (period = 100000 us), we have to _also_ check for "infinity". [v2: add systemd version checks since CPUQuotaPeriod requires v242+] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-16 15:48:06 -07:00
Kir Kolyshkin	8c5a19f79b	libct/cgroups/fs: rename some files no changes, just a few git renames Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-16 12:45:54 -07:00
Kir Kolyshkin	cec5ae7c2d	libct/cgroupv1/getCgroupMountsHelper: minor nit It is easy to just use TrimPrefix which does nothing in case the prefix does not exist. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-06-16 12:45:50 -07:00
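
Commit 95a59bf206 above points out that `(mode&S_IFCHR == S_IFCHR)` is the wrong way to test an inode's type. The sketch below is not the runc code: it is a minimal, self-contained illustration of why the comparison misfires. The constant values are the Linux ones, defined locally as an assumption so that no platform-specific package is needed, and the helper names `isCharDeviceBuggy` and `isCharDevice` are hypothetical.

```go
package main

import "fmt"

// Linux values of the inode type bits, defined locally for illustration.
// Real code would normally take them from a package such as golang.org/x/sys/unix.
const (
	sIFMT  = 0o170000 // mask that selects the file type field
	sIFBLK = 0o060000 // block device
	sIFCHR = 0o020000 // character device
)

// isCharDeviceBuggy treats S_IFCHR as if it were a single flag bit.
// Because S_IFBLK shares that bit, block devices are misreported.
func isCharDeviceBuggy(mode uint32) bool {
	return mode&sIFCHR == sIFCHR
}

// isCharDevice masks out the whole type field first and then compares it
// for equality with S_IFCHR.
func isCharDevice(mode uint32) bool {
	return mode&sIFMT == sIFCHR
}

func main() {
	blockMode := uint32(sIFBLK | 0o660) // a block device with rw-rw---- permissions
	charMode := uint32(sIFCHR | 0o666)  // a character device with rw-rw-rw- permissions

	fmt.Println("block device, buggy check:", isCharDeviceBuggy(blockMode))
	fmt.Println("block device, masked check:", isCharDevice(blockMode))
	fmt.Println("char device, masked check:", isCharDevice(charMode))
}
```

With these values the buggy check accepts the block device mode as a character device, while the masked check rejects it and still accepts the real character device mode.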

1608 Commits
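
Commit 6369e38871 in the log above mentions that the systemd version string may arrive as `v$VER.0-$OS`, as plain `$VER`, or as `$VER-1`. As a rough sketch of how such strings could be reduced to a major version number, the following uses a regular expression that accepts an optional leading `v` followed by digits. The function name `parseSystemdMajor` and the sample inputs are hypothetical, and this is not claimed to match the parsing that runc actually performs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// leadingVersion matches an optional "v" and captures the digits after it.
var leadingVersion = regexp.MustCompile(`^v?([0-9]+)`)

// parseSystemdMajor extracts the leading integer from a version string.
// Anything after the first run of digits (".0-fc32", "-1", ...) is ignored.
func parseSystemdMajor(raw string) (int, error) {
	m := leadingVersion.FindStringSubmatch(raw)
	if m == nil {
		return 0, fmt.Errorf("cannot parse systemd version %q", raw)
	}
	return strconv.Atoi(m[1])
}

func main() {
	// Sample inputs in the three shapes named by the commit message.
	for _, in := range []string{"v245.0-fc32", "245", "245-1"} {
		major, err := parseSystemdMajor(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> %d\n", in, major)
	}
}
```

A caller could then gate version-dependent behaviour on the result, for example by comparing it against 242, the minimum that commit e751a168dc cites for `CPUQuotaPeriod`.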