jasder/runc - runc - 军科开源项目托管

Commit Graph

Author	SHA1	Message	Date
Filipe Brandenburger	46351eb3d1	Move systemd.Manager initialization into a function in that module This will permit us to extend the internals of systemd.Manager to include further information about the system, such as whether cgroupv1, cgroupv2 or both are in effect. Furthermore, it allows a future refactor of moving more of UseSystemd() code into the factory initialization function. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>	2019-05-01 13:22:19 -07:00
Filipe Brandenburger	cd41feb46b	Remove detection for scope properties, which have always been broken The detection for scope properties (whether scope units support DefaultDependencies= or Delegate=) has always been broken, since systemd refuses to create scopes unless at least one PID is attached to it (and this has been so since scope units were introduced in systemd v205.) This can be seen in journal logs whenever a container is started with libpod: Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Since this logic never worked, just assume both attributes are supported (which is what the code does when detection fails for this reason, since it's looking for an "unknown attribute" or "read-only attribute" to mark them as false) and skip the detection altogether. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2019-02-11 16:05:37 -08:00
Mrunal Patel	4e4c907193	Merge pull request #1950 from cloudfoundry-incubator/enter-pid-race Resilience in adding of exec tasks to cgroups	2019-02-01 13:18:16 -08:00
Giuseppe Scrivano	f01923376d	systemd: fix setting kernel memory limit since commit `df3fa115f9` it is not possible to set a kernel memory limit when using the systemd cgroups backend as we use cgroup.Apply twice. Skip enabling kernel memory if there are already tasks in the cgroup. Without this patch, runc fails with: container_linux.go:344: starting container process caused "process_linux.go:311: applying cgroup configuration for process caused \"failed to set memory.kmem.limit_in_bytes, because either tasks have already joined this cgroup or it has children\"" Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-01-10 11:33:50 +01:00
Tom Godkin	bdf3524b34	Retry adding pids to cgroups when EINVAL occurs The kernel will sometimes return EINVAL when writing a pid to a cgroup.procs file. It does so when the task being added still has the state TASK_NEW. See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286 Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Tom Godkin <tgodkin@pivotal.io> Signed-off-by: Danail Branekov <danailster@gmail.com>	2018-12-17 15:34:47 +00:00
JoeWrightss	769d6c4a75	Fix some typos Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>	2018-12-09 23:52:54 +08:00
Aleksa Sarai	8a4629f7b5	cgroups: nokmem: error out on explicitly-set kmemcg limits When built with nokmem we explicitly are disabling support for kmemcg, but it is a strict specification requirement that if we cannot fulfil an aspect of the container configuration that we error out. Completely ignoring explicitly-requested kmemcg limits with nokmem would undoubtably lead to problems. Fixes: `6a2c155968` ("libcontainer: ability to compile without kmem") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-12-01 14:31:35 +11:00
Michael Crosby	76520a4bf0	Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint Respect container's cgroup path	2018-11-16 14:06:54 -05:00
Mrunal Patel	4769cdf607	Merge pull request #1916 from crosbymichael/cgns Add support for cgroup namespace	2018-11-13 12:21:38 -08:00
Mrunal Patel	f000fe11ec	Merge pull request #1917 from slp/master libcontainer: map PidsLimit to systemd's TasksMax property	2018-11-13 12:21:23 -08:00
Michael Crosby	aa7917b751	Merge pull request #1911 from theSuess/linter-fixes Various cleanups to address linter issues	2018-11-13 12:13:34 -05:00
Kir Kolyshkin	6a2c155968	libcontainer: ability to compile without kmem Commit `fe898e7862` (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured. Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as * https://github.com/opencontainers/runc/issues/1725 * https://github.com/kubernetes/kubernetes/issues/61937 * https://github.com/moby/moby/issues/29638 This commit gives a way to compile runc without kernel memory setting support. To do so, use something like make BUILDTAGS="seccomp nokmem" Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-10-31 20:35:51 -07:00
Yuanhong Peng	df3fa115f9	Add support for cgroup namespace Cgroup namespace can be configured in `config.json` as other namespaces. Here is an example: ``` "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], ``` Note that if you want to run a container which has shared cgroup ns with another container, then it's strongly recommended that you set proper `CgroupsPath` of both containers(the second container's cgroup path must be the subdirectory of the first one). Or there might be some unexpected results. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-31 10:51:43 -04:00
Sergio Lopez	5c6b9c3c1c	libcontainer: map PidsLimit to systemd's TasksMax property Currently runc applies PidsLimit restriction by writing directly to cgroup's pids.max, without notifying systemd. As a consequence, when the later updates the context of the corresponding scope, pids.max is reset to the value of systemd's TasksMax property. This can be easily reproduced this way (I'm using "postfix" here just an example, any unrelated but existing service will do): # CTR=`docker run --pids-limit 111 --detach --rm busybox /bin/sleep 8h` # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max 111 # systemctl disable --now postfix # systemctl enable --now postfix # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max max This patch adds TasksAccounting=true and TasksMax=PidsLimit to the properties sent to systemd. Signed-off-by: Sergio Lopez <slp@redhat.com>	2018-10-24 17:20:27 +02:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Dominik Süß	0b412e9482	various cleanups to address linter issues Signed-off-by: Dominik Süß <dominik@suess.wtf>	2018-10-13 21:14:03 +02:00
Danail Branekov	a1d5398afa	Respect container's cgroup path Respect the container's cgroup path when finding the container's cgroup mount point, which is useful in multi-tenant environments, where containers have their own unique cgroup mounts Signed-off-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io> Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>	2018-09-25 17:43:36 +01:00
Aleksa Sarai	578fe65e4f	merge branch 'pr-1817' Fix duplicate entries and missing entries in getCgroupMountsHelper Add test for testing cgroup mounts on bedrock linux Stop relying on number of subsystems for cgroups LGTMs: @crosbymichael @cyphar Closes #1817	2018-09-19 19:48:17 +10:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto\|true\|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Jay Kamat	a2faaa1317	Fix duplicate entries and missing entries in getCgroupMountsHelper Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2018-07-31 20:12:18 -07:00
Jay Kamat	e5a7c61f3c	Add test for testing cgroup mounts on bedrock linux Add a mountinfo from a bedrock linux system with 4 strata, and include it for tests Signed-off-by: Jay Kamat <jaygkamat@gmail.com> Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2018-06-24 00:01:07 +01:00
Daniel Dao	5ee0648bfb	Stop relying on number of subsystems for cgroups When there are complicated mount setups, there can be multiple mount points which have the subsystem we are looking for. Instead of counting the mountpoints, tick off subsystems until we have found them all. Without the 'all' flag, ignore duplicate subsystems after the first. Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2018-06-24 00:00:58 +01:00
Aleksa Sarai	939d5a3753	cgroup: clean up isIgnorableError for skippable EROFS Include a rootless argument for isIgnorableError to avoid people accidentally using isIgnorableError when they shouldn't (we don't ignore any errors when running as root as that really isn't safe). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-05-25 11:31:41 +10:00
Qiang Huang	dd67ab10d7	Merge pull request #1759 from cyphar/rootless-erofs-as-eperm rootless: cgroup: treat EROFS as a skippable error	2018-05-25 09:24:16 +08:00
Derek Carr	b515963c10	systemd cpu quota ignores -1 Signed-off-by: Derek Carr <decarr@redhat.com>	2018-05-23 14:28:39 -04:00
Filipe Brandenburger	165ee45334	Make channel for StartTransientUnit buffered So that, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when they try to report the result of the operation. This should address Issue #1780 and it's a follow up for PR #1683, PR #1754 and PR #1772. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-14 08:49:50 -07:00
Filipe Brandenburger	0e16bd9b53	Detect whether Delegate is available on both slices and scopes Starting with systemd 237, in preparation for cgroup v2, delegation is only now available for scopes, not slices. Update libcontainer code to detect whether delegation is available on both and use that information when creating new slices. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-10 11:42:55 -07:00
Filipe Brandenburger	8ab251f298	Fix systemd.Apply() to check for DBus error before waiting on a channel. The channel was introduced in #1683 to work around a race condition. However, the check for error in StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel will make it hang.) Later PR #1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so. The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container, it's unclear whether an existing container was in use or not, so not sure whether this would have fixed that bug as well.) Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-09 11:51:59 -07:00
Aleksa Sarai	03e585985f	rootless: cgroup: treat EROFS as a skippable error In some cases, /sys/fs/cgroups is mounted read-only. In rootless containers we can consider this effectively identical to having cgroups that we don't have write permission to -- because the user isn't responsible for the read-only setup and cannot modify it. The rules are identical to when /sys/fs/cgroups is not writable by the unprivileged user. An example of this is the default configuration of Docker, where cgroups are mounted as read-only as a preventative security measure. Reported-by: Vladimir Rutsky <rutsky@google.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-03-17 13:53:42 +11:00
Qiang Huang	9facb87f87	Merge pull request #1754 from vikaschoudhary16/add-timeout Add timeout while waiting for StartTransinetUnit completion signal	2018-03-08 09:09:34 +08:00
vikaschoudhary16	04e95b526d	Add timeout while waiting for StartTransinetUnit completion signal from dbus Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-03-07 05:11:38 -05:00
Denys Smirnov	3d26fc3fd7	cgroups/fs: fix NPE on Destroy than no cgroups are set Currently Manager accepts nil cgroups when calling Apply, but it will panic then trying to call Destroy with the same config. Signed-off-by: Denys Smirnov <denys@sourced.tech>	2018-03-06 23:31:31 +01:00
Michael Crosby	595bea022f	Merge pull request #1722 from ravisantoshgudimetla/fix-systemd-path fix systemd slice expansion so that it could be consumed by cAdvisor	2018-02-20 09:59:24 -05:00
ravisantoshgudimetla	7019e1de7b	fix systemd slice expansion so that it could be consumed by cAdvisor Signed-off-by: ravisantoshgudimetla <ravisantoshgudimetla@gmail.com>	2018-02-18 21:32:39 -05:00
vikaschoudhary16	d5b4a3eddb	Fix race against systemd - T0: runc triggers a systemd unit creation asynchronously from [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L298) - T1: runc then moves ahead and starts creating cgroup paths(.scope directories), [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L348). Kernel creates .scope directory and cgroup.procs file(along with other default files) in the directory automatically, in an atomic manner. - T3: systemd execution thread which was invoked at time `T0`, is still in the process of unit creation. systemd also trying to create cgroup paths and deletes the `.scope` directory which is created at time `T1` by runc from [here](https://github.com/systemd/systemd/blob/v219/src/shared/cgroup-util.c#L1630) in the code Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-01-08 09:37:26 -05:00
Seth Jennings	bca53e7b49	systemd: adjust CPUQuotaPerSecUSec to compensate for systemd internal handling Signed-off-by: Seth Jennings <sjenning@redhat.com>	2017-11-15 20:20:06 -06:00
Michael Crosby	ff4481dbf6	Merge pull request #1540 from cloudfoundry-incubator/rootless-cgroups Support cgroups with limits as rootless	2017-10-16 12:03:49 -04:00
Sebastien Boeuf	acb93c9c62	libcontainer: cgroups: Write freezer state after every state check This commit ensures we write the expected freezer cgroup state after every state check, in case the state check does not give the expected result. This can happen when a new task is created and prevents the whole cgroup to be FROZEN, leaving the state into FREEZING instead. This patch prevents the case of an infinite loop to happen. Fixes https://github.com/opencontainers/runc/issues/1609 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2017-10-12 07:07:28 -07:00
Will Martin	ca4f427af1	Support cgroups with limits as rootless Signed-off-by: Ed King <eking@pivotal.io> Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io> Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>	2017-10-05 11:22:54 +01:00
Yong Tang	e9944d0f4c	Disable systemd in static build This fix tries to address the warnings caused by static build with go 1.9. As systemd needs dlopen/dlclose, the following warnings will be generated for static build in go 1.9: ``` root@f4b077232050:/go/src/github.com/opencontainers/runc# make static CGO_ENABLED=1 go build -tags "seccomp cgo static_build" -ldflags "-w -extldflags -static -X main.gitCommit="1c81e2a794c6e26a4c650142ae8893c47f619764" -X main.version=1.0.0-rc4+dev " -o runc . /tmp/go-link-113476657/000007.o: In function `_cgo_a5acef59ed3f_Cfunc_dlopen': /tmp/go-build/github.com/opencontainers/runc/vendor/github.com/coreos/pkg/dlopen/_obj/cgo-gcc-prolog:76: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking ``` This fix disables systemd when `static_build` flag is on (apply_nosystemd.go is used instead). This fix also fixes a small bug in `apply_nosystemd.go` for return value. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:38:22 +00:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause, looks like systemd still want them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
Michael Crosby	882d8eaba6	Merge pull request #1537 from tklauser/staticcheck Fix issues found by staticcheck	2017-08-02 09:52:11 -04:00
Tobias Klauser	e4e56cb6d8	libcontainer: remove ineffective break statements go's switch statement doesn't need an explicit break. Remove it where that is the case and add a comment to indicate the purpose where the removal would lead to an empty case. Found with honnef.co/go/tools/cmd/staticcheck Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-28 15:13:39 +02:00
Steven Hartland	ee4f68e302	Updated logrus to v1 Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen. This includes a manual edit of the docker term package to also correct the name there too. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-07-19 15:20:56 +00:00
Daniel, Dao Quang Minh	7139b61f7f	Merge pull request #1378 from derekwaynecarr/expose_use_hierarchy Expose memory.use_hierarchy in MemoryStats	2017-06-30 16:08:21 +01:00
Justin Cormack	3d9074ead3	Update memory specs to use int64 not uint64 replace #1492 #1494 fix #1422 Since https://github.com/opencontainers/runtime-spec/pull/876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-06-27 12:16:07 +01:00
Daniel, Dao Quang Minh	67bd2ab554	Merge pull request #1442 from clnperez/libcontainer-sys-unix Move libcontainer to x/sys/unix	2017-05-26 12:18:33 +01:00
Michael Crosby	18cd7e06f7	Merge pull request #1372 from cloudfoundry-incubator/cpuset-mount-root Handle container creation when cgroups have already been mounted in another location	2017-05-25 09:53:57 -07:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the forseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00

1 2 3 4 5

226 Commits