jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Kir Kolyshkin	4b4bc995ad	CreateCgroupPath: only enable needed controllers 1. Instead of enabling all available controllers, figure out which ones are required, and only enable those. 2. Amend all setFoo() functions to call isFooSet(). While this might seem unnecessary, it might actually help to uncover a bug. Imagine someone: - adds a cgroup.Resources.CpuFoo setting; - modifies setCpu() to apply the new setting; - but forgets to amend isCpuSet() accordingly <-- BUG In this case, a test case modifying CpuFoo will help to uncover the BUG. This is the reason why it's added. This patch could be amended by enabling controllers on a best-effort basis, i.e. : - do not return an error early if we can't enable some controllers; - if we fail to enable all controllers at once (usually because one of them can't be enabled), try enabling them one by one. Currently this is not implemented, and it's not clear whether this would be a good way to go or not. [v2: add/use is${Controller}Set() functions] [v3: document neededControllers()] [v4: drop "best-effort" part] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	bb47e35843	cgroup/systemd: reorganize 1. Rename the files - v1.go: cgroupv1 aka legacy; - v2.go: cgroupv2 aka unified hierarchy; - unsupported.go: when systemd is not available. 2. Move the code that is common between v1 and v2 to common.go Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	813cb3eb94	cgroupv2: fix fs2 cgroup init fs2 cgroup driver was not working because it did not enable controllers while creating cgroup directory; instead it was merely doing MkdirAll() and gathered the list of available controllers in NewManager(). Also, cgroup should be created in Apply(), not while creating a new manager instance. To fix: 1. Move the createCgroupsv2Path function from systemd driver to fs2 driver, renaming it to CreateCgroupPath. Use in Apply() from both fs2 and systemd drivers. 2. Delay available controllers map initialization until it is needed. With this patch: - NewManager() only performs minimal initialization (initializing m.dirPath, if not provided); - Apply() properly creates cgroup path, enabling the controllers; - m.controllers is initialized lazily on demand. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	dbeff89491	cgroupv2/systemd: privatize UnifiedManager ... and its Cgroup field. There is no sense to keep it public. This was generated by gorename. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:27:40 -07:00
Kir Kolyshkin	88c13c0713	cgroupv2: use SecureJoin in systemd driver It seems that some paths are coming from user and are therefore untrusted. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:20:22 -07:00
Kir Kolyshkin	9c80cd672d	cgroupv2: rm legacy Paths from systemd driver Having map of per-subsystem paths in systemd unified cgroups driver does not make sense and makes the code less readable. To get rid of it, move the systemd v1-or-v2 init code to libcontainer/factory_linux.go which already has a function to deduce unified path out of paths map. End result is much cleaner code. Besides, we no longer write pid to the same cgroup file 7 times in Apply() like we did before. While at it - add `rootless` flag which is passed on to fs2 manager - merge getv2Path() into GetUnifiedPath(), don't overwrite path if it is set during initialization (on Load). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-19 16:19:51 -07:00
Ted Yu	614bb96676	cgroupv2/systemd: Properly remove intermediate directory Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-04-13 08:32:08 -07:00
Kir Kolyshkin	c86be8a2c1	cgroupv2: fix setting MemorySwap The resources.MemorySwap field from OCI is memory+swap, while cgroupv2 has a separate swap limit, so subtract memory from the limit (and make sure values are set and sane). Make sure to set MemorySwapMax for systemd, too. Since systemd does not have MemorySwapMax for cgroupv1, it is only needed for v2 driver. [v2: return -1 on any negative value, add unit test] [v3: treat any negative value other than -1 as error] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-07 20:45:53 -07:00
Tobias Klauser	3e678c08f9	Remove unused consts testScopeWait and testSliceWait These are unused since commit `518c855833` ("Remove libcontainer detection for systemd features") Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2020-04-03 21:09:43 +02:00
Mrunal Patel	d05e5728aa	systemd: Lazy initialize the systemd dbus connection Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2020-03-30 15:24:06 -07:00
Mrunal Patel	33c6125da6	systemd: Export IsSystemdRunning() function Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2020-03-30 15:24:06 -07:00
Kir Kolyshkin	a949e4f22f	cgroupv2: UnifiedManager.Apply: simplify Remove joinCgroupsV2() function, as its name and second parameter are misleading. Use createCgroupsv2Path() directly, do not call getv2Path() twice. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-26 19:20:00 -07:00
Kir Kolyshkin	5406833a65	cgroupv2/systemd: add getv2Path Function getSubsystemPath(), while it works for the v2 unified case, is suboptimal, as it makes a few unnecessary calls. Add a simplified version of getSubsystemPath(), called getv2Path(), and use it. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-26 19:17:09 -07:00
Kir Kolyshkin	ec1f957b23	cgroupv2: don't use getSubsystemPath in Apply This code is a copy-paste from cgroupv1 systemd code. Its aim is to check whether a subsystem is available, and skip those that are not. In case v2 unified hierarchy is used, getSubsystemPath never returns "not found" error, so calling it is useless. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-26 13:32:34 -07:00
Kir Kolyshkin	a675b5ebea	cgroupv2: don't try to set kmem for systemd case To the best of my knowledge, it has been decided to drop the kernel memory controller from the cgroupv2 hierarchy, so "kernel memory limits" do not exist if we're using v2 unified. So, we need to ignore kernel memory setting. This was already done in non-systemd case (see commit `88e8350de`), let's do the same for systemd. This fixes the following error: > container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"open /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/tasks: no such file or directory\"" Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-25 20:00:23 -07:00
Mrunal Patel	7de5db3dad	Merge pull request #2263 from kolyshkin/nits Assorted minor nits in libcontainer	2020-03-24 14:17:22 -07:00
Kir Kolyshkin	12dc475dd6	libcontainer: simplify createCgroupsv2Path fmt.Sprintf is slow and is not needed here, string concatenation would be sufficient. It is also redundant to convert []byte from string and back, since `bytes` package now provides the same functions as `strings`. Use Fields() instead of TrimSpace() and Split(), mainly for readability (note Fields() is somewhat slower than Split() but here it doesn't matter much). Use Join() to prepend the plus signs. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-03-20 11:51:55 -07:00
Akihiro Suda	492d525e55	vendor: update go-systemd and godbus Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2020-03-16 13:26:03 +09:00
Qiang Huang	3b7e32feba	Merge pull request #2210 from Zyqsempai/2164-remove-deprecated-systemd-resources Exchange deprecated systemd resources with the appropriate for cgroupv2	2020-02-29 10:13:55 +08:00
Kir Kolyshkin	4c5c3fb960	Support for setting systemd properties via annotations In case systemd is used to set cgroups for the container, it creates a scope unit dedicated to it (usually named `runc-$ID.scope`). This patch adds an ability to set arbitrary systemd properties for the systemd unit via runtime spec annotations. Initially this was developed as an ability to specify the `TimeoutStopUSec` property, but later generalized to work with arbitrary ones. Example usage: add the following to runtime spec (config.json): ``` "annotations": { "org.systemd.property.TimeoutStopUSec": "uint64 123456789", "org.systemd.property.CollectMode":"'inactive-or-failed'" }, ``` and start the container (e.g. `runc --systemd-cgroup run $ID`). The above will set the following systemd parameters: * `TimeoutStopSec` to 2 minutes and 3 seconds, * `CollectMode` to "inactive-or-failed". The values are in the gvariant format (see [1]). To figure out which type systemd expects for a particular parameter, see systemd sources. In particular, parameters with `USec` suffix require a `uint64` typed argument, while gvariant assumes int32 for numeric values, therefore the explicit type is required. NOTE that systemd receives the time-typed parameters as USec but shows them (in `systemctl show`) as Sec. For example, the stop timeout should be set as `TimeoutStopUSec` but is shown as `TimeoutStopSec`. [1] https://developer.gnome.org/glib/stable/gvariant-text.html Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-02-17 16:07:19 -08:00
Boris Popovschi	5b96f314ba	Exchanged deprecated systemd resources with the appropriate for cgroupv2 Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>	2020-01-15 18:09:33 +02:00
Mrunal Patel	5cc0deaf7a	Merge pull request #2169 from AkihiroSuda/split-fs cgroup2: split fs2 from fs	2020-01-13 16:23:27 -08:00
Julio Montes	8ddd892072	libcontainer: add method to get cgroup config from cgroup Manager `configs.Cgroup` contains the configuration used to create cgroups. This configuration must be saved to disk, since it's required to restore the cgroup manager that was used to create the cgroups. Add method to get cgroup configuration from cgroup Manager to allow API users save it to disk and restore a cgroup manager later. fixes #2176 Signed-off-by: Julio Montes <julio.montes@intel.com>	2019-12-17 22:46:03 +00:00
Akihiro Suda	88e8350de2	cgroup2: split fs2 from fs split fs2 package from fs, as mixing up fs and fs2 is very likely to result in unmaintainable code. Inspired by containerd/cgroups#109 Fix #2157 Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-12-06 15:42:10 +09:00
Mrunal Patel	46def4cc4c	Merge pull request #2154 from jpeach/2008-remove-static-build-tag Remove the static_build build tag.	2019-11-04 17:10:59 -08:00
Akihiro Suda	faf673ee45	cgroup2: port over eBPF device controller from crun The implementation is based on https://github.com/containers/crun/blob/0.10.2/src/libcrun/ebpf.c Although ebpf.c is originally licensed under LGPL-3.0-or-later, the author Giuseppe Scrivano agreed to relicense the file in Apache License 2.0: https://github.com/opencontainers/runc/issues/2144#issuecomment-543116397 See libcontainer/cgroups/ebpf/devicefilter/devicefilter_test.go for tested configurations. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-10-31 14:01:46 +09:00
James Peach	13919f5dfd	Remove the static_build build tag. The `static_build` build tag was introduced in `e9944d0f` to remove build warnings related to systemd cgroup driver dependencies. Since then, those dependencies have changed and building the systemd cgroup driver no longer imports dlopen. After this change, runc builds will always include the systemd cgroup driver. This fixes #2008. Signed-off-by: James Peach <jpeach@apache.org>	2019-10-26 08:28:45 +11:00
Akihiro Suda	dbd771e475	cgroup2: implement `runc ps` Implemented `runc ps` for cgroup v2 , using a newly added method `m.GetUnifiedPath()`. Unlike the v1 implementation that checks `m.GetPaths()["devices"]`, the v2 implementation does not require the device controller to be available. Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>	2019-10-19 01:59:24 +09:00
Giuseppe Scrivano	524cb7c318	libcontainer: add systemd.UnifiedManager Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-09-05 13:02:27 +02:00
Giuseppe Scrivano	ec11136828	libcontainer, cgroups: rename systemd.Manager to LegacyManager Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-09-05 13:02:26 +02:00
Mrunal Patel	3525eddec5	Merge pull request #2117 from filbranden/detection1 Remove libcontainer detection for systemd features	2019-08-25 13:15:15 -07:00
Filipe Brandenburger	518c855833	Remove libcontainer detection for systemd features Transient units (and transient slice units) have been available for quite a long time and RHEL 7 with systemd v219 (likely the oldest OS we care about at this point) supports that. A system running a systemd without these features is likely to break a lot of other stuff that runc/libcontainer care about. Regarding delegated slices, modern systemd doesn't allow it and runc/libcontainer run fine on it, so we might as well just stop requesting it on older versions of systemd which allowed it. (Those versions never really changed behavior significantly when that option was passed anyways.) Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>	2019-08-22 21:53:24 -07:00
Filipe Brandenburger	588f040a77	Avoid the dependency on cgo through go-systemd/util package This dependency is only needed in package "github.com/coreos/go-systemd/util" and we only use it for IsRunningSystemd(), which is a simple Go function that just stats a file. Let's just borrow it here, so we remove the dependency and can remove that package from vendored build. This also removes dependencies on dlopen and on trying to find libsystemd.so or libsystemd-login.so in the system. Tested that this still builds and works as expected. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>	2019-08-22 21:07:24 -07:00
Filipe Brandenburger	46351eb3d1	Move systemd.Manager initialization into a function in that module This will permit us to extend the internals of systemd.Manager to include further information about the system, such as whether cgroupv1, cgroupv2 or both are in effect. Furthermore, it allows a future refactor of moving more of UseSystemd() code into the factory initialization function. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>	2019-05-01 13:22:19 -07:00
Filipe Brandenburger	cd41feb46b	Remove detection for scope properties, which have always been broken The detection for scope properties (whether scope units support DefaultDependencies= or Delegate=) has always been broken, since systemd refuses to create scopes unless at least one PID is attached to it (and this has been so since scope units were introduced in systemd v205.) This can be seen in journal logs whenever a container is started with libpod: Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Since this logic never worked, just assume both attributes are supported (which is what the code does when detection fails for this reason, since it's looking for an "unknown attribute" or "read-only attribute" to mark them as false) and skip the detection altogether. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2019-02-11 16:05:37 -08:00
Giuseppe Scrivano	f01923376d	systemd: fix setting kernel memory limit since commit `df3fa115f9` it is not possible to set a kernel memory limit when using the systemd cgroups backend as we use cgroup.Apply twice. Skip enabling kernel memory if there are already tasks in the cgroup. Without this patch, runc fails with: container_linux.go:344: starting container process caused "process_linux.go:311: applying cgroup configuration for process caused \"failed to set memory.kmem.limit_in_bytes, because either tasks have already joined this cgroup or it has children\"" Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-01-10 11:33:50 +01:00
Michael Crosby	76520a4bf0	Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint Respect container's cgroup path	2018-11-16 14:06:54 -05:00
Sergio Lopez	5c6b9c3c1c	libcontainer: map PidsLimit to systemd's TasksMax property Currently runc applies PidsLimit restriction by writing directly to cgroup's pids.max, without notifying systemd. As a consequence, when the latter updates the context of the corresponding scope, pids.max is reset to the value of systemd's TasksMax property. This can be easily reproduced this way (I'm using "postfix" here just as an example, any unrelated but existing service will do): # CTR=`docker run --pids-limit 111 --detach --rm busybox /bin/sleep 8h` # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max 111 # systemctl disable --now postfix # systemctl enable --now postfix # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max max This patch adds TasksAccounting=true and TasksMax=PidsLimit to the properties sent to systemd. Signed-off-by: Sergio Lopez <slp@redhat.com>	2018-10-24 17:20:27 +02:00
Danail Branekov	a1d5398afa	Respect container's cgroup path Respect the container's cgroup path when finding the container's cgroup mount point, which is useful in multi-tenant environments, where containers have their own unique cgroup mounts Signed-off-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io> Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>	2018-09-25 17:43:36 +01:00
Derek Carr	b515963c10	systemd cpu quota ignores -1 Signed-off-by: Derek Carr <decarr@redhat.com>	2018-05-23 14:28:39 -04:00
Filipe Brandenburger	165ee45334	Make channel for StartTransientUnit buffered So that, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when they try to report the result of the operation. This should address Issue #1780 and it's a follow up for PR #1683, PR #1754 and PR #1772. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-14 08:49:50 -07:00
Filipe Brandenburger	0e16bd9b53	Detect whether Delegate is available on both slices and scopes Starting with systemd 237, in preparation for cgroup v2, delegation is only now available for scopes, not slices. Update libcontainer code to detect whether delegation is available on both and use that information when creating new slices. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-10 11:42:55 -07:00
Filipe Brandenburger	8ab251f298	Fix systemd.Apply() to check for DBus error before waiting on a channel. The channel was introduced in #1683 to work around a race condition. However, the check for error in StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel will make it hang.) Later PR #1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so. The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container, it's unclear whether an existing container was in use or not, so not sure whether this would have fixed that bug as well.) Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-09 11:51:59 -07:00
vikaschoudhary16	04e95b526d	Add timeout while waiting for StartTransientUnit completion signal from dbus Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-03-07 05:11:38 -05:00
Michael Crosby	595bea022f	Merge pull request #1722 from ravisantoshgudimetla/fix-systemd-path fix systemd slice expansion so that it could be consumed by cAdvisor	2018-02-20 09:59:24 -05:00
ravisantoshgudimetla	7019e1de7b	fix systemd slice expansion so that it could be consumed by cAdvisor Signed-off-by: ravisantoshgudimetla <ravisantoshgudimetla@gmail.com>	2018-02-18 21:32:39 -05:00
vikaschoudhary16	d5b4a3eddb	Fix race against systemd - T0: runc triggers a systemd unit creation asynchronously from [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L298) - T1: runc then moves ahead and starts creating cgroup paths (.scope directories), [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L348). Kernel creates .scope directory and cgroup.procs file (along with other default files) in the directory automatically, in an atomic manner. - T3: systemd execution thread which was invoked at time `T0` is still in the process of unit creation. systemd is also trying to create cgroup paths and deletes the `.scope` directory which was created at time `T1` by runc, from [here](https://github.com/systemd/systemd/blob/v219/src/shared/cgroup-util.c#L1630) in the code Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-01-08 09:37:26 -05:00
Seth Jennings	bca53e7b49	systemd: adjust CPUQuotaPerSecUSec to compensate for systemd internal handling Signed-off-by: Seth Jennings <sjenning@redhat.com>	2017-11-15 20:20:06 -06:00
Yong Tang	e9944d0f4c	Disable systemd in static build This fix tries to address the warnings caused by static build with go 1.9. As systemd needs dlopen/dlclose, the following warnings will be generated for static build in go 1.9: ``` root@f4b077232050:/go/src/github.com/opencontainers/runc# make static CGO_ENABLED=1 go build -tags "seccomp cgo static_build" -ldflags "-w -extldflags -static -X main.gitCommit="1c81e2a794c6e26a4c650142ae8893c47f619764" -X main.version=1.0.0-rc4+dev " -o runc . /tmp/go-link-113476657/000007.o: In function `_cgo_a5acef59ed3f_Cfunc_dlopen': /tmp/go-build/github.com/opencontainers/runc/vendor/github.com/coreos/pkg/dlopen/_obj/cgo-gcc-prolog:76: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking ``` This fix disables systemd when `static_build` flag is on (apply_nosystemd.go is used instead). This fix also fixes a small bug in `apply_nosystemd.go` for return value. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:38:22 +00:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause; it looks like systemd still wants them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
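
Commit c86be8a2c1 in the log above describes converting the OCI memory+swap limit into the separate swap limit used by cgroup v2. The following is a minimal sketch of that arithmetic, not runc's actual code: the function name `swapLimitV2` is hypothetical, and the conventions that 0 means "not set" and -1 means "unlimited" are assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// swapLimitV2 converts a combined memory+swap limit into a swap-only limit.
// Hypothetical helper for illustration; assumes 0 = not set, -1 = unlimited.
func swapLimitV2(memorySwap, memory int64) (int64, error) {
	switch {
	case memorySwap == -1 || memorySwap == 0:
		// Unlimited or unset: nothing to subtract, pass the value through.
		return memorySwap, nil
	case memorySwap < 0:
		// Any negative value other than -1 is treated as an error.
		return 0, fmt.Errorf("invalid memory+swap value: %d", memorySwap)
	case memory <= 0:
		return 0, errors.New("memory+swap is set but the memory limit is not")
	case memory > memorySwap:
		return 0, errors.New("memory+swap limit is smaller than the memory limit")
	}
	// Both values are set and sane: swap is what remains after memory.
	return memorySwap - memory, nil
}

func main() {
	// 300 MiB of memory+swap with 200 MiB of memory leaves 100 MiB of swap.
	swap, err := swapLimitV2(300<<20, 200<<20)
	fmt.Println(swap, err) // 104857600 <nil>
}
```

Returning an error instead of clamping keeps a misconfigured limit visible to the caller, which matches the "[v3: treat any negative value other than -1 as error]" note in the commit message.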

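
Commit 12dc475dd6 mentions replacing TrimSpace() and Split() with Fields(), and using Join() to prepend the plus signs. A short sketch of that idea follows; the function name `enableString` and the sample controller list are assumptions for illustration and are not taken from the runc source.

```go
package main

import (
	"fmt"
	"strings"
)

// enableString turns a whitespace-separated controller list (for example,
// the contents of a cgroup.controllers file) into a "+name" list.
// Hypothetical helper; runc's own code may differ.
func enableString(controllers string) string {
	// Fields splits on any run of whitespace and drops the trailing newline,
	// so no separate TrimSpace call is needed.
	fields := strings.Fields(controllers)
	if len(fields) == 0 {
		return ""
	}
	// Join places " +" between items; the first item gets its "+" by hand.
	return "+" + strings.Join(fields, " +")
}

func main() {
	fmt.Println(enableString("cpu io memory pids\n")) // +cpu +io +memory +pids
}
```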

101 Commits