jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Erik Sipsma	9c822e4847	cgroups/fs: check nil pointers in cgroup manager Signed-off-by: Erik Sipsma <sipsma@amazon.com>	2019-08-14 09:50:45 -07:00
Odin Ugedal	6f77e35daf	Export list of HugePageSizeUnits This will allow others to import it instead of copying it. Signed-off-by: Odin Ugedal <odin@ugedal.com>	2019-05-30 20:17:30 +02:00
Odin Ugedal	c6445b1c1c	Add tests for GetHugePageSize Add tests to avoid regressions Signed-off-by: Odin Ugedal <odin@ugedal.com>	2019-05-30 17:27:32 +02:00
Odin Ugedal	273e7b74a7	Fix cgroup hugetlb size prefix for kB The hugetlb cgroup control files (introduced here in 2012: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773) use "KB" and not "kB" (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349). The behavior in the kernel has not changed since the introduction, and the current code using "kB" will therefore fail on devices with small amounts of ram (see https://github.com/kubernetes/kubernetes/issues/77169) running a kernel with config flag CONFIG_HUGETLBFS=y As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB", "MB" and "GB" are used, so the others may be removed as well. Here is a real world example of the files inside the "/sys/kernel/mm/hugepages/" directory: - "hugepages-64kB" - "hugepages-2048kB" - "hugepages-32768kB" - "hugepages-1048576kB" And the corresponding cgroup files: - "hugetlb.64KB._____" - "hugetlb.2MB._____" - "hugetlb.32MB._____" - "hugetlb.1GB._____" Signed-off-by: Odin Ugedal <odin@ugedal.com>	2019-05-29 21:52:43 +02:00
Filipe Brandenburger	46351eb3d1	Move systemd.Manager initialization into a function in that module This will permit us to extend the internals of systemd.Manager to include further information about the system, such as whether cgroupv1, cgroupv2 or both are in effect. Furthermore, it allows a future refactor of moving more of UseSystemd() code into the factory initialization function. Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>	2019-05-01 13:22:19 -07:00
Filipe Brandenburger	cd41feb46b	Remove detection for scope properties, which have always been broken The detection for scope properties (whether scope units support DefaultDependencies= or Delegate=) has always been broken, since systemd refuses to create scopes unless at least one PID is attached to it (and this has been so since scope units were introduced in systemd v205.) This can be seen in journal logs whenever a container is started with libpod: Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Feb 11 15:08:07 myhost systemd[1]: libcontainer-12345-systemd-test-default-dependencies.scope: Scope has no PIDs. Refusing. Since this logic never worked, just assume both attributes are supported (which is what the code does when detection fails for this reason, since it's looking for an "unknown attribute" or "read-only attribute" to mark them as false) and skip the detection altogether. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2019-02-11 16:05:37 -08:00
Mrunal Patel	4e4c907193	Merge pull request #1950 from cloudfoundry-incubator/enter-pid-race Resilience in adding of exec tasks to cgroups	2019-02-01 13:18:16 -08:00
Giuseppe Scrivano	f01923376d	systemd: fix setting kernel memory limit since commit `df3fa115f9` it is not possible to set a kernel memory limit when using the systemd cgroups backend as we use cgroup.Apply twice. Skip enabling kernel memory if there are already tasks in the cgroup. Without this patch, runc fails with: container_linux.go:344: starting container process caused "process_linux.go:311: applying cgroup configuration for process caused \"failed to set memory.kmem.limit_in_bytes, because either tasks have already joined this cgroup or it has children\"" Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-01-10 11:33:50 +01:00
Tom Godkin	bdf3524b34	Retry adding pids to cgroups when EINVAL occurs The kernel will sometimes return EINVAL when writing a pid to a cgroup.procs file. It does so when the task being added still has the state TASK_NEW. See: https://elixir.bootlin.com/linux/v4.8/source/kernel/sched/core.c#L8286 Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Tom Godkin <tgodkin@pivotal.io> Signed-off-by: Danail Branekov <danailster@gmail.com>	2018-12-17 15:34:47 +00:00
JoeWrightss	769d6c4a75	Fix some typos Signed-off-by: JoeWrightss <zhoulin.xie@daocloud.io>	2018-12-09 23:52:54 +08:00
Aleksa Sarai	8a4629f7b5	cgroups: nokmem: error out on explicitly-set kmemcg limits When built with nokmem we are explicitly disabling support for kmemcg, but it is a strict specification requirement that if we cannot fulfil an aspect of the container configuration that we error out. Completely ignoring explicitly-requested kmemcg limits with nokmem would undoubtedly lead to problems. Fixes: `6a2c155968` ("libcontainer: ability to compile without kmem") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-12-01 14:31:35 +11:00
Michael Crosby	76520a4bf0	Merge pull request #1872 from masters-of-cats/better-find-cgroup-mountpoint Respect container's cgroup path	2018-11-16 14:06:54 -05:00
Mrunal Patel	4769cdf607	Merge pull request #1916 from crosbymichael/cgns Add support for cgroup namespace	2018-11-13 12:21:38 -08:00
Mrunal Patel	f000fe11ec	Merge pull request #1917 from slp/master libcontainer: map PidsLimit to systemd's TasksMax property	2018-11-13 12:21:23 -08:00
Michael Crosby	aa7917b751	Merge pull request #1911 from theSuess/linter-fixes Various cleanups to address linter issues	2018-11-13 12:13:34 -05:00
Kir Kolyshkin	6a2c155968	libcontainer: ability to compile without kmem Commit `fe898e7862` (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured. Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as * https://github.com/opencontainers/runc/issues/1725 * https://github.com/kubernetes/kubernetes/issues/61937 * https://github.com/moby/moby/issues/29638 This commit gives a way to compile runc without kernel memory setting support. To do so, use something like make BUILDTAGS="seccomp nokmem" Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-10-31 20:35:51 -07:00
Yuanhong Peng	df3fa115f9	Add support for cgroup namespace Cgroup namespace can be configured in `config.json` as other namespaces. Here is an example: ``` "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], ``` Note that if you want to run a container which has shared cgroup ns with another container, then it's strongly recommended that you set proper `CgroupsPath` of both containers(the second container's cgroup path must be the subdirectory of the first one). Or there might be some unexpected results. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-31 10:51:43 -04:00
Sergio Lopez	5c6b9c3c1c	libcontainer: map PidsLimit to systemd's TasksMax property Currently runc applies PidsLimit restriction by writing directly to cgroup's pids.max, without notifying systemd. As a consequence, when the latter updates the context of the corresponding scope, pids.max is reset to the value of systemd's TasksMax property. This can be easily reproduced this way (I'm using "postfix" here just as an example; any unrelated but existing service will do): # CTR=`docker run --pids-limit 111 --detach --rm busybox /bin/sleep 8h` # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max 111 # systemctl disable --now postfix # systemctl enable --now postfix # cat /sys/fs/cgroup/pids/system.slice/docker-${CTR}.scope/pids.max max This patch adds TasksAccounting=true and TasksMax=PidsLimit to the properties sent to systemd. Signed-off-by: Sergio Lopez <slp@redhat.com>	2018-10-24 17:20:27 +02:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Dominik Süß	0b412e9482	various cleanups to address linter issues Signed-off-by: Dominik Süß <dominik@suess.wtf>	2018-10-13 21:14:03 +02:00
Danail Branekov	a1d5398afa	Respect container's cgroup path Respect the container's cgroup path when finding the container's cgroup mount point, which is useful in multi-tenant environments, where containers have their own unique cgroup mounts Signed-off-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io> Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>	2018-09-25 17:43:36 +01:00
Aleksa Sarai	578fe65e4f	merge branch 'pr-1817' Fix duplicate entries and missing entries in getCgroupMountsHelper Add test for testing cgroup mounts on bedrock linux Stop relying on number of subsystems for cgroups LGTMs: @crosbymichael @cyphar Closes #1817	2018-09-19 19:48:17 +10:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Jay Kamat	a2faaa1317	Fix duplicate entries and missing entries in getCgroupMountsHelper Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2018-07-31 20:12:18 -07:00
Jay Kamat	e5a7c61f3c	Add test for testing cgroup mounts on bedrock linux Add a mountinfo from a bedrock linux system with 4 strata, and include it for tests Signed-off-by: Jay Kamat <jaygkamat@gmail.com> Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2018-06-24 00:01:07 +01:00
Daniel Dao	5ee0648bfb	Stop relying on number of subsystems for cgroups When there are complicated mount setups, there can be multiple mount points which have the subsystem we are looking for. Instead of counting the mountpoints, tick off subsystems until we have found them all. Without the 'all' flag, ignore duplicate subsystems after the first. Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2018-06-24 00:00:58 +01:00
Aleksa Sarai	939d5a3753	cgroup: clean up isIgnorableError for skippable EROFS Include a rootless argument for isIgnorableError to avoid people accidentally using isIgnorableError when they shouldn't (we don't ignore any errors when running as root as that really isn't safe). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-05-25 11:31:41 +10:00
Qiang Huang	dd67ab10d7	Merge pull request #1759 from cyphar/rootless-erofs-as-eperm rootless: cgroup: treat EROFS as a skippable error	2018-05-25 09:24:16 +08:00
Derek Carr	b515963c10	systemd cpu quota ignores -1 Signed-off-by: Derek Carr <decarr@redhat.com>	2018-05-23 14:28:39 -04:00
Filipe Brandenburger	165ee45334	Make channel for StartTransientUnit buffered So that, if a timeout happens and we decide to stop blocking on the operation, the writer will not block when they try to report the result of the operation. This should address Issue #1780 and it's a follow up for PR #1683, PR #1754 and PR #1772. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-14 08:49:50 -07:00
Filipe Brandenburger	0e16bd9b53	Detect whether Delegate is available on both slices and scopes Starting with systemd 237, in preparation for cgroup v2, delegation is only now available for scopes, not slices. Update libcontainer code to detect whether delegation is available on both and use that information when creating new slices. Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-10 11:42:55 -07:00
Filipe Brandenburger	8ab251f298	Fix systemd.Apply() to check for DBus error before waiting on a channel. The channel was introduced in #1683 to work around a race condition. However, the check for error in StartTransientUnit ignores the error for an already existing unit, and in that case there will be no notification from DBus (so waiting on the channel will make it hang.) Later PR #1754 added a timeout, which worked around the issue, but we can fix this correctly by only waiting on the channel when there is no error. Fix the code to do so. The timeout handling was kept, since there might be other cases where this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358 mentions calling this code from inside a container, it's unclear whether an existing container was in use or not, so not sure whether this would have fixed that bug as well.) Signed-off-by: Filipe Brandenburger <filbranden@google.com>	2018-04-09 11:51:59 -07:00
Aleksa Sarai	03e585985f	rootless: cgroup: treat EROFS as a skippable error In some cases, /sys/fs/cgroups is mounted read-only. In rootless containers we can consider this effectively identical to having cgroups that we don't have write permission to -- because the user isn't responsible for the read-only setup and cannot modify it. The rules are identical to when /sys/fs/cgroups is not writable by the unprivileged user. An example of this is the default configuration of Docker, where cgroups are mounted as read-only as a preventative security measure. Reported-by: Vladimir Rutsky <rutsky@google.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-03-17 13:53:42 +11:00
Qiang Huang	9facb87f87	Merge pull request #1754 from vikaschoudhary16/add-timeout Add timeout while waiting for StartTransientUnit completion signal	2018-03-08 09:09:34 +08:00
vikaschoudhary16	04e95b526d	Add timeout while waiting for StartTransientUnit completion signal from dbus Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-03-07 05:11:38 -05:00
Denys Smirnov	3d26fc3fd7	cgroups/fs: fix NPE on Destroy when no cgroups are set Currently Manager accepts nil cgroups when calling Apply, but it will panic when trying to call Destroy with the same config. Signed-off-by: Denys Smirnov <denys@sourced.tech>	2018-03-06 23:31:31 +01:00
Michael Crosby	595bea022f	Merge pull request #1722 from ravisantoshgudimetla/fix-systemd-path fix systemd slice expansion so that it could be consumed by cAdvisor	2018-02-20 09:59:24 -05:00
ravisantoshgudimetla	7019e1de7b	fix systemd slice expansion so that it could be consumed by cAdvisor Signed-off-by: ravisantoshgudimetla <ravisantoshgudimetla@gmail.com>	2018-02-18 21:32:39 -05:00
vikaschoudhary16	d5b4a3eddb	Fix race against systemd - T0: runc triggers a systemd unit creation asynchronously from [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L298) - T1: runc then moves ahead and starts creating cgroup paths(.scope directories), [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L348). Kernel creates .scope directory and cgroup.procs file(along with other default files) in the directory automatically, in an atomic manner. - T3: systemd execution thread which was invoked at time `T0`, is still in the process of unit creation. systemd also trying to create cgroup paths and deletes the `.scope` directory which is created at time `T1` by runc from [here](https://github.com/systemd/systemd/blob/v219/src/shared/cgroup-util.c#L1630) in the code Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>	2018-01-08 09:37:26 -05:00
Seth Jennings	bca53e7b49	systemd: adjust CPUQuotaPerSecUSec to compensate for systemd internal handling Signed-off-by: Seth Jennings <sjenning@redhat.com>	2017-11-15 20:20:06 -06:00
Michael Crosby	ff4481dbf6	Merge pull request #1540 from cloudfoundry-incubator/rootless-cgroups Support cgroups with limits as rootless	2017-10-16 12:03:49 -04:00
Sebastien Boeuf	acb93c9c62	libcontainer: cgroups: Write freezer state after every state check This commit ensures we write the expected freezer cgroup state after every state check, in case the state check does not give the expected result. This can happen when a new task is created and prevents the whole cgroup from becoming FROZEN, leaving the state as FREEZING instead. This patch prevents an infinite loop in that case. Fixes https://github.com/opencontainers/runc/issues/1609 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2017-10-12 07:07:28 -07:00
Will Martin	ca4f427af1	Support cgroups with limits as rootless Signed-off-by: Ed King <eking@pivotal.io> Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io> Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>	2017-10-05 11:22:54 +01:00
Yong Tang	e9944d0f4c	Disable systemd in static build This fix tries to address the warnings caused by static build with go 1.9. As systemd needs dlopen/dlclose, the following warnings will be generated for static build in go 1.9: ``` root@f4b077232050:/go/src/github.com/opencontainers/runc# make static CGO_ENABLED=1 go build -tags "seccomp cgo static_build" -ldflags "-w -extldflags -static -X main.gitCommit="1c81e2a794c6e26a4c650142ae8893c47f619764" -X main.version=1.0.0-rc4+dev " -o runc . /tmp/go-link-113476657/000007.o: In function `_cgo_a5acef59ed3f_Cfunc_dlopen': /tmp/go-build/github.com/opencontainers/runc/vendor/github.com/coreos/pkg/dlopen/_obj/cgo-gcc-prolog:76: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking ``` This fix disables systemd when `static_build` flag is on (apply_nosystemd.go is used instead). This fix also fixes a small bug in `apply_nosystemd.go` for return value. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:38:22 +00:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause, looks like systemd still want them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
Michael Crosby	882d8eaba6	Merge pull request #1537 from tklauser/staticcheck Fix issues found by staticcheck	2017-08-02 09:52:11 -04:00
Tobias Klauser	e4e56cb6d8	libcontainer: remove ineffective break statements go's switch statement doesn't need an explicit break. Remove it where that is the case and add a comment to indicate the purpose where the removal would lead to an empty case. Found with honnef.co/go/tools/cmd/staticcheck Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-28 15:13:39 +02:00
Steven Hartland	ee4f68e302	Updated logrus to v1 Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen. This includes a manual edit of the docker term package to also correct the name there too. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-07-19 15:20:56 +00:00
Daniel, Dao Quang Minh	7139b61f7f	Merge pull request #1378 from derekwaynecarr/expose_use_hierarchy Expose memory.use_hierarchy in MemoryStats	2017-06-30 16:08:21 +01:00
Justin Cormack	3d9074ead3	Update memory specs to use int64 not uint64 replace #1492 #1494 fix #1422 Since https://github.com/opencontainers/runtime-spec/pull/876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-06-27 12:16:07 +01:00
Daniel, Dao Quang Minh	67bd2ab554	Merge pull request #1442 from clnperez/libcontainer-sys-unix Move libcontainer to x/sys/unix	2017-05-26 12:18:33 +01:00
Michael Crosby	18cd7e06f7	Merge pull request #1372 from cloudfoundry-incubator/cpuset-mount-root Handle container creation when cgroups have already been mounted in another location	2017-05-25 09:53:57 -07:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the foreseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Derek Carr	4d6225aec2	Expose memory.use_hierarchy in MemoryStats Signed-off-by: Derek Carr <decarr@redhat.com>	2017-03-31 13:40:34 -04:00
Aleksa Sarai	baeef29858	rootless: add rootless cgroup manager The rootless cgroup manager acts as a noop for all set and apply operations. It is just used for rootless setups. Currently this is far too simple (we need to add opportunistic cgroup management), but is good enough as a first-pass at a noop cgroup manager. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Qiang Huang	8430cc4f48	Use uint64 for resources to keep consistency with runtime-spec Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-20 18:51:39 +08:00
Craig Furman	f5c5aac958	Create containers when cgroups already mounted Runc needs to copy certain files from the top of the cgroup cpuset hierarchy into the container's cpuset cgroup directory. Currently, runc determines which directory is the top of the hierarchy by using the parent dir of the first entry in /proc/self/mountinfo of type cgroup. This creates problems when cgroup subsystems are mounted arbitrarily in different dirs on the host. Now, we use the most deeply nested mountpoint that contains the container's cpuset cgroup directory. Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com> Signed-off-by: Will Martin <wmartin@pivotal.io>	2017-03-15 10:10:30 +00:00
Qiang Huang	8773c5f9a6	Remove unused function in systemd cgroup Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-07 15:11:37 +08:00
Michael Crosby	49a33c41f8	Merge pull request #1344 from xuxinkun/fixCPUQuota20170224 fix cpu.cfs_quota_us changed when systemd daemon-reload using systemd.	2017-03-06 10:02:28 -08:00
xuxinkun	c44aec9b23	fix cpu.cfs_quota_us changed when systemd daemon-reload using systemd. Signed-off-by: xuxinkun <xuxinkun@gmail.com>	2017-03-06 20:08:30 +11:00
Qiang Huang	fe898e7862	Fix kmem accounting when used with cgroupsPath Fixes: #1347 Fixes: #1083 The root cause of #1083 is that we're joining an existing cgroup whose kmem accounting is not initialized, and it has child cgroups or tasks in it. Fix it by checking whether the cgroup is being created for the first time; we should enable kmem accounting if the cgroup is created by libcontainer, with or without a kmem limit configured. Otherwise we'll get issues like #1347 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-02-25 10:58:18 -08:00
Qiang Huang	6b1d0e76f2	Merge pull request #1127 from boynux/fix-set-mem-to-unlimited Fixes set memory to unlimited	2017-02-16 09:51:23 +08:00
Mohammad Arab	18ebc51b3c	Reset Swap when memory is set to unlimited (-1) Kernel validation fails if memory set to -1 which is unlimited but swap is not set so. Signed-off-by: Mohammad Arab <boynux@gmail.com>	2017-02-15 08:11:57 +01:00
Daniel, Dao Quang Minh	0fefa36f3a	Merge pull request #1278 from datawolf/scanner move error check out of the for loop	2017-01-20 17:49:44 +00:00
Daniel, Dao Quang Minh	b8cefd7d8f	Merge pull request #1266 from mrunalp/ignore_cgroup_v2 Ignore cgroup2 mountpoints	2017-01-20 17:26:46 +00:00
Wang Long	3a71eb0256	move error check out of the for loop The `bufio.Scanner.Scan` method returns false either by reaching the end of the input or an error. After Scan returns false, the Err method will return any error that occurred during scanning, except that if it was io.EOF, Err will return nil. We should check the error when Scan returns false (outside the for loop). Signed-off-by: Wang Long <long.wanglong@huawei.com>	2017-01-18 05:02:39 +00:00
Qiang Huang	a9610f2c02	Merge pull request #1249 from datawolf/small-refactor small refactor	2017-01-13 02:04:59 -06:00
Mrunal Patel	c7ebda72ac	Add a test for testing that we ignore cgroup2 mounts Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-01-11 16:49:53 -08:00
Mrunal Patel	e7b57cb042	Ignore cgroup2 mountpoints Our current cgroup parsing logic assumes cgroup v1 mounts so we should ignore cgroup2 mounts for now Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-01-11 12:34:50 -08:00
Mrunal Patel	7ae521cef0	Merge pull request #1251 from datawolf/update-cgroup-comment cgroups: update the comments	2017-01-09 11:13:39 -08:00
Michael Crosby	44e60af49d	Merge pull request #1196 from hqhq/fix_cgroup_leftover Fix leftover cgroup directory issue	2017-01-09 10:31:04 -08:00
Wang Long	4732f46fd9	small refactor Signed-off-by: Wang Long <long.wanglong@huawei.com>	2017-01-04 11:39:44 +08:00
Wang Long	4dfd350a38	cgroups: update the comments Signed-off-by: Wang Long <long.wanglong@huawei.com>	2017-01-03 22:40:12 +08:00
Qiang Huang	14d58e1e48	Fix leftover cgroup directory issue When a subsystem's Apply fails, some subsystems' cgroup directories are left over. From Docker's point of view, starting a container fails and `docker rm` removes the container, but some cgroup files are left behind. Sometimes we don't want to clean everything up when something goes wrong, because we need the intermediate state to debug what's going on, but cgroup directories are not useful information that we want to keep. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-11-22 08:02:43 +08:00
Qiang Huang	aee46862ec	Fix cpuset issue with cpuset.cpu_exclusive This PR fix issue in this scenario: ``` in terminal 1: ~# cd /sys/fs/cgroup/cpuset ~# mkdir test ~# cd test ~# cat cpuset.cpus 0-3 ~# echo 1 > cpuset.cpu_exclusive (make sure you don't have other cgroups under root) in terminal 2: ~# echo $$ > /sys/fs/cgroup/cpuset/test/tasks // set resources.cpu.cpus="0-2" in config.json ~# runc run test1 back to terminal 1: ~# cd test1 ~# cat cpuset.cpus 0-2 ~# echo 1 > cpuset.cpu_exclusive in terminal 3: ~# echo $$ > /sys/fs/cgroup/test/tasks // set resources.cpu.cpus="3" in config.json ~# runc run test2 container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused \"failed to write 0-3\\n to cpuset.cpus: write /sys/fs/cgroup/cpuset/test2/cpuset.cpus: invalid argument\"" ``` Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-11-18 15:28:40 +08:00
Derek Carr	d223e2adae	Ignore error when starting transient unit that already exists Signed-off-by: Derek Carr <decarr@redhat.com>	2016-10-19 14:55:52 -04:00
Daniel Dao	1b876b0bf2	fix typos with misspell pipe the source through https://github.com/client9/misspell. typos be gone! Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2016-10-11 23:22:48 +00:00
Michael Crosby	11222ee1f1	Don't enable kernel mem if not set Don't enable the kmem limit if it is not specified in the config. Fixes #1083 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-10-07 10:02:19 -07:00
derekwaynecarr	1a75f815d5	systemd cgroup driver supports slice management Signed-off-by: derekwaynecarr <decarr@redhat.com>	2016-09-27 16:01:37 -04:00
Mrunal Patel	5653ced544	Merge pull request #1059 from datawolf/use-WriteCgrougProc cgroup: using WriteCgroupProc to write the specified pid into the cgroup's cgroup.procs file	2016-09-22 11:31:35 -07:00
Wang Long	ce9951834c	cgroup: using WriteCgroupProc to write the specified pid into the cgroup's cgroup.procs file cgroupData.join method using `WriteCgroupProc` to place the pid into the proc file, it can avoid attach any pid to the cgroup if -1 is specified as a pid. so, replace `writeFile` with `WriteCgroupProc` like `cpuset.go`'s ApplyDir method. Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-09-21 10:57:03 +00:00
Mrunal Patel	f557996401	Add flag to allow getting all mounts for cgroups subsystems Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-09-15 15:19:27 -04:00
Wang Long	fd92846686	move m.GetPaths out of the loop Calling m.GetPaths only once is enough, so move it out of the loop. Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-09-13 12:19:48 +00:00
Michael Crosby	9a072b611e	Merge pull request #1013 from hqhq/fix_ps_issue Fix runc ps issue	2016-09-12 14:03:21 -07:00
Qiang Huang	b5b6989e9a	Fix runc pause and runc update Fixes: #1034 Fixes: #1031 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-09-12 16:02:56 +08:00
Qiang Huang	da7bac1c90	Fix runc ps issue After #1009, we don't always set `cgroup.Paths`, so `getCgroupPath()` will return wrong cgroup path because it'll take current process's cgroup as the parent, which would be wrong when we try to find the cgroup path in `runc ps` and `runc kill`. Fix it by using `m.GetPath()` to get the true cgroup paths. Reported-by: Yang Shukui <yangshukui@huawei.com> Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-09-12 15:41:16 +08:00
Yuanhong Peng	a71a301a28	Fix typo. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2016-09-09 16:18:54 +08:00
Alexander Morozov	0c6733d669	Merge pull request #970 from hqhq/fix_race_cgroup_paths Fix race condition when using cgroups.Paths	2016-08-23 10:47:00 -07:00
Michael Crosby	7d8f322fdd	Merge pull request #860 from bgray/806-set_cgroup_cpu_rt_before_joining Set the cpu cgroup RT sched params before joining.	2016-08-12 09:24:15 -07:00
Qiang Huang	6ecb469b2b	Fix race condition when using cgroups.Paths Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-08-02 15:43:04 +08:00
Qiang Huang	50f0a2b1e1	Merge pull request #962 from dubstack/fix_kmem_limits Remove kmem Initialization check while setting memory configuration	2016-08-02 10:04:18 +08:00
Mrunal Patel	56fc0ac9ce	Merge pull request #966 from sjenning/fix-initscope-cgroup-path fix init.scope in cgroup paths	2016-08-01 14:29:47 -07:00
Buddha Prakash	fcd966f501	Remove kmem Initialization check Signed-off-by: Buddha Prakash <buddhap@google.com>	2016-08-01 09:47:34 -07:00
Seth Jennings	4b44b98596	fix init.scope in cgroup paths Signed-off-by: Seth Jennings <sjenning@redhat.com>	2016-08-01 11:14:29 -05:00
Qiang Huang	1a81e9ab1f	Merge pull request #958 from dubstack/skip-devices Skip updates on parent Devices cgroup	2016-07-29 10:31:49 +08:00
Buddha Prakash	d4c67195c6	Add test Signed-off-by: Buddha Prakash <buddhap@google.com>	2016-07-28 17:14:51 -07:00
Buddha Prakash	ef4ff6a8ad	Skip updates on parent Devices cgroup Signed-off-by: Buddha Prakash <buddhap@google.com>	2016-07-25 10:30:46 -07:00
Daniel, Dao Quang Minh	f0e17e9a46	Merge pull request #961 from hqhq/revert_935 Revert "Use update time to detect if kmem limits have been set"	2016-07-21 14:51:21 +01:00
Daniel, Dao Quang Minh	ff88baa42f	Merge pull request #611 from mrunalp/fix_set Fix cgroup Set when Paths are specified	2016-07-21 14:00:22 +01:00
Qiang Huang	15c93ee9e0	Revert "Use update time to detect if kmem limits have been set" Revert: #935 Fixes: #946 I can reproduce #946 on some machines. The problem is that on some machines the write can be so fast that the modify time of `memory.kmem.limit_in_bytes` is the same as before it was modified. We then call `SetKernelMemory` twice on container creation, which causes the second call to fail. Revert this until we find a better solution. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-07-21 19:14:38 +08:00
Buddha Prakash	ebe85bf180	Allow cgroup creation without attaching a pid Signed-off-by: Buddha Prakash <buddhap@google.com>	2016-07-20 13:49:48 -07:00
Mrunal Patel	4dedd09396	Merge pull request #937 from hushan/net_cls-classid fix setting net_cls classid	2016-07-18 17:18:23 -04:00
Hushan Jia	bb42f80a86	fix setting net_cls classid Setting classid of net_cls cgroup failed: ERRO[0000] process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument" process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument" The spec has classid as a *uint32, the libcontainer configs should match the type. Signed-off-by: Hushan Jia <hushan.jia@gmail.com>	2016-07-11 05:00:35 +08:00
Vishnu kannan	8dd3d63455	Look at modify time to check if kmem limits are initialized. Signed-off-by: Vishnu kannan <vishnuk@google.com>	2016-07-06 15:14:25 -07:00
Ben	14e55d1692	Add unit test for setting the CPU RT sched cgroups values at apply time Added a unit test to verify that 'cpu.rt_period_us' and 'cpu.rt_runtime_us' cgroup values are set when the cgroup is applied to a process. Signed-off-by: Ben Gray <ben.r.gray@gmail.com>	2016-07-04 13:11:53 +01:00
ben	950700e73c	Set the 'cpu.rt_period_us' and 'cpu.rt_runtime_us' values of the cpu cgroup before trying to move the process into the cgroup. This is required if runc itself is running in SCHED_RR mode, as it is not possible to add a process in SCHED_RR mode to a cgroup which hasn't been assigned any RT bandwidth. And RT bandwidth is not inherited, each new cgroup starts with 0 b/w. Signed-off-by: Ben Gray <ben.r.gray@gmail.com>	2016-07-04 13:10:21 +01:00
Qiang Huang	42dfd60643	Merge pull request #904 from euank/fix-cgroup-parsing-err cgroups: Fix issue if cgroup path contains :	2016-06-14 14:19:20 +08:00
rajasec	146218ab92	Removing unused variable for cgroup subsystem Signed-off-by: rajasec <rajasec79@gmail.com>	2016-06-12 12:35:49 +05:30
Euan Kemp	394610a396	cgroups: Parse correctly if cgroup path contains : Prior to this change a cgroup with a `:` character in its path was not parsed correctly (as occurs on some instances of systemd cgroups under some versions of systemd, e.g. 225 with accounting). This fixes that issue and adds a test. Signed-off-by: Euan Kemp <euank@coreos.com>	2016-06-10 23:09:03 -07:00
Christian Brauner	a1f8e0f184	fail if path to devices subsystem is missing The presence of the "devices" subsystem is a necessary condition for a (privileged) container. Signed-off-by: Christian Brauner <cbrauner@suse.com>	2016-06-08 16:44:15 +02:00
Daniel, Dao Quang Minh	d5ecf5c67c	systemd cgroup: check for Delegate property Delegate is only available in systemd >218, applying it for older systemd will result in an error. Therefore we should check for it when testing systemd properties. Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>	2016-06-01 14:32:24 +00:00
Qiang Huang	6fa490c664	Remove use_hierarchy check when set kernel memory Kernel memory cannot be set in these circumstances (before kernel 4.6): 1. kernel memory is not initialized, and there are tasks in cgroup 2. kernel memory is not initialized, and use_hierarchy is enabled, and there are sub-cgroups While we don't need to cover case 2 because when we set kernel memory in runC, it's either: - in Apply phase when we create the container, and in this case, set kernel memory would definitely be valid; - or in update operation, and in this case, there would be tasks in cgroup, we only need to check if kernel memory is initialized or not. Even if we want to check use_hierarchy, we need to check sub-cgroups as well, but for here, we can just leave it aside. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-05-28 15:22:58 +08:00
Mrunal Patel	4a8f0b4db4	Fix cgroup Set when Paths are specified Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-05-09 16:06:03 -07:00
Kenfe-Mickael Laventure	27814ee120	Allow updating kmem.limit_in_bytes if initialized at cgroup creation Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-05-06 08:05:15 -07:00
Jim Berlage	c5b0caf76d	Correct outdated URL `libcontainer/cgroups/utils.go` uses an incorrect path to the documentation for cgroups. This updates the comment to use the correct URL. Fixes #794. Signed-off-by: Jim Berlage <james.berlage@gmail.com>	2016-04-29 10:44:27 -05:00
Tatsushi Inagaki	2a1a6cdf44	Cgroup: reduce redundant parsing of mountinfo Avoid parsing the remaining lines of mountinfo after all mountpoints of the target subsystems are found, or when the target subsystem is not enabled. Signed-off-by: Tatsushi Inagaki <e29253@jp.ibm.com>	2016-04-22 09:41:28 +09:00
Michael Crosby	660029b476	Merge pull request #745 from AkihiroSuda/very-trivial-style-fix Fix trivial style errors reported by `go vet` and `golint`	2016-04-12 13:33:00 -07:00
Akihiro Suda	1829531241	Fix trivial style errors reported by `go vet` and `golint` No substantial code change. Note that some style errors reported by `golint` are not fixed due to possible compatibility issues. Signed-off-by: Akihiro Suda <suda.kyoto@gmail.com>	2016-04-12 08:13:16 +00:00
Qiang Huang	792251ae38	Fix problem when swap memory unsupported When swap memory is unsupported, Docker will set cgroup.Resources.MemorySwap as -1. Fixes: https://github.com/docker/docker/pull/21937 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-04-12 15:08:10 +08:00
Mrunal Patel	3f4f4420fd	Merge pull request #592 from hqhq/hq_fix_update_memory Fix problem when update memory and swap memory	2016-04-05 10:19:33 -07:00
Mrunal Patel	857d418b09	Merge pull request #698 from ggaaooppeenngg/gaopeng/format-errorf Use %v for map structure format	2016-03-28 09:28:28 -07:00
Qiang Huang	d8b8f76c4f	Fix problem when update memory and swap memory Currently, if we start a container with: `docker run -ti --name foo --memory 300M --memory-swap 500M busybox sh` Then we want to update it with: `docker update --memory 600M --memory-swap 800M foo` It'll get error because we can't set memory to 600M with the 500M limit of swap memory. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-03-28 10:48:29 +08:00
Peng Gao	ffbc626e53	Use %v for map structure format Based on Golang document, %s is for "the uninterpreted bytes of the string or slice", so %v is more appropriate. Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>	2016-03-26 23:28:59 +08:00
allencloud	10cc27888c	fix typos Signed-off-by: allencloud <allen.sun@daocloud.io>	2016-03-25 11:11:48 +08:00
Qiang Huang	69f8a50081	Merge pull request #669 from mrunalp/fix_test Fix the kmem TCP test	2016-03-22 09:45:13 +08:00
Michael Crosby	e80b6b67e6	Merge pull request #651 from mrunalp/quota_validation Add more information in the error messages when writing to a file	2016-03-21 17:53:49 -07:00
Mrunal Patel	73e48633a3	Fix the kmem TCP test Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-03-21 15:51:42 -07:00
Mrunal Patel	4d7929274d	Merge pull request #644 from cyphar/fix-pids-max-unlimited libcontainer: cgroups: deal with unlimited case for pids.max	2016-03-21 14:55:20 -07:00
Mrunal Patel	35541ebcd2	Add more information in the error messages when writing to a file This is helpful to debug "invalid argument" errors when writing to cgroup files Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-03-21 09:27:24 -07:00
Aleksa Sarai	f5e60cf775	libcontainer: cgroups: add statistics for kmem.tcp Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-20 22:04:02 +11:00
Aleksa Sarai	1448fe9568	libcontainer: cgroups: add support for kmem.tcp limits Kernel TCP memory has its own special knobs inside the cgroup. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-20 22:03:52 +11:00
Aleksa Sarai	a6d5179f60	libcontainer: cgroups: add tests for pids.max == "max" Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-18 08:46:24 +11:00
Aleksa Sarai	087b953dc5	libcontainer: cgroups: deal with unlimited case for pids.max Make sure we don't error out collecting statistics for cases where pids.max == "max". In that case, we can use a limit of 0 which means "unlimited". In addition, change the name of the stats attribute (Max) to mirror the name of the resources attribute in the spec (Limit) so that it's consistent internally. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-18 08:46:24 +11:00
Jessica Frazelle	2c5b10189c	remove deadcode Signed-off-by: Jessica Frazelle <acidburn@docker.com>	2016-03-17 13:36:28 -07:00
Mrunal Patel	93d1a1a6ea	Set Delegate to true for cgroups transient units This is required because we manage some of the cgroups ourselves. This recommendation came from talking with systemd devs about some of the issues that we see when using the systemd cgroups driver. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-03-16 09:44:27 -07:00
Aleksa Sarai	64286b443d	libcontainer: cgroups: add tests for pids.max in PidsStats Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-13 14:16:38 +11:00
Aleksa Sarai	2b1e086f62	libcontainer: cgroups: add pids.max to PidsStats In order to allow nice usage statistics (in terms of percentages and other such data), add the value of pids.max to the PidsStats struct returned from the pids cgroup controller. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-03-13 04:53:20 +11:00
Mrunal Patel	5b439d8c48	Merge pull request #491 from hqhq/hq_cleanup_systemd_apply Cleanup systemd apply	2016-03-08 08:32:02 -08:00
Mrunal Patel	6fc66fea48	Merge pull request #601 from LK4D4/fix_stats_race Fix race between Apply and GetStats	2016-02-29 11:01:09 -08:00
Alexander Morozov	e5906f7ed5	Fix race between Apply and GetStats Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2016-02-29 08:50:42 -08:00
rajasec	3b2805834b	Adding linux label to test file Signed-off-by: rajasec <rajasec79@gmail.com> Fixed review comments Signed-off-by: rajasec <rajasec79@gmail.com>	2016-02-25 07:52:32 +05:30
Phil Estes	0b5581fd28	Handle memory swappiness as a pointer to handle default/unset case This prior fix to set "-1" explicitly was lost, and it is simpler to use the same pointer type from the OCI spec to handle nil pointer == -1 == unset case. Also, as a nearly humorous aside, there was a test for MemorySwappiness that was actually setting Memory, and it was passing because of this bug (as it was always setting everyone's MemorySwappiness to zero!) Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2016-02-24 09:02:06 -06:00
Michael Crosby	ee6a72df4e	Merge pull request #577 from crosbymichael/m-named-cgroup Move the process outside of the systemd cgroup	2016-02-19 13:51:58 -08:00
Michael Crosby	47f16e89df	Move the process outside of the systemd cgroup If you don't move the process out of the named cgroup for systemd then systemd will try to delete all the cgroups that the process is currently in. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-02-19 11:26:46 -08:00
Alexander Morozov	98cbce80fb	Look for " - " instead of just - as separator - symbol can appear in any path Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2016-02-18 09:58:29 -08:00
Mrunal Patel	2c489ce2d9	Merge pull request #564 from hallyn/2016-02-16/userns.devicecg Do not set devices cgroup entries if in a user namespace	2016-02-17 09:25:24 +05:30
Serge Hallyn	655f8ea808	Do not set devices cgroup entries if in a user namespace When in a non-initial user namespace you cannot update the devices cgroup whitelist (or blacklist). The kernel won't allow it. So detect that case and don't try. This is a step to being able to run docker/runc containers inside a user namespaced container. Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>	2016-02-16 19:39:43 -08:00
Mrunal Patel	a86e44cf8f	Merge pull request #556 from hqhq/hq_remove_unneeded_cleanup Remove unneeded cgroups path removal	2016-02-17 08:31:35 +05:30
Qiang Huang	bda7742019	Cleanup systemd apply Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-02-15 15:56:59 +08:00
Qiang Huang	7b88f34d6e	Remove unneeded cgroups path removal It's handled in `destroy()`, no need to do this in `Apply()`. I found this because systemd cgroup didn't do this removal and it works well. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-02-15 11:22:13 +08:00
Aleksa Sarai	21dc85c4b8	libcontainer: cgroups: fs: add cgroup path safety unit tests In order to avoid problems with security regressions going unnoticed, add some unit tests that should make sure security regressions in cgroup path safety cause tests to fail in runC. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-02-14 00:37:21 +11:00
Aleksa Sarai	b8dc5213e8	libcontainer: cgroups: fs: fix path safety Ensure that path safety is maintained, this essentially reapplies `c0cad6aa5e` ("cgroups: fs: fix cgroup.Parent path sanitisation"), which was accidentally removed in `256f3a8ebc` ("Add support for CgroupsPath field"). Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-02-14 00:37:21 +11:00
Aleksa Sarai	90140a5688	libcontainer: cgroups: fs: fix innerPath Fix m.Path legacy code to actually work. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-02-14 00:37:21 +11:00
Kenfe-Mickael Laventure	256f3a8ebc	Add support for CgroupsPath field Fixes #396 Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-02-10 11:26:51 -08:00
Kenfe-Mickael Laventure	dceeb0d0df	Move pathClean to libcontainer/utils.CleanPath Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-02-09 16:21:58 -08:00
Michael Crosby	3baae2d525	Update runc for devices changes Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-02-08 13:15:12 -08:00
Michael Crosby	5fe15a53b6	Merge pull request #496 from LK4D4/remove_sscanf Remove usage of GetMounts from GetCgroupMounts	2016-02-04 14:55:41 -08:00
Kenfe-Mickael Laventure	7a12c92dbe	Add limit value to memory stats The value is populated with the content of `limit_in_bytes`. Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-02-03 11:54:09 -08:00
Alexander Morozov	97146f4dc6	Remove usage of GetMounts from GetCgroupMounts GetMounts is very cpu-expensive. I'll change other funcs in this package to reuse code from GetCgroupMounts later. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2016-02-01 11:00:23 -08:00
Aleksa Sarai	57ba666ef3	cgroup: systemd: further systemd slice validation Add some further (not critical, since Docker does this already) validation to systemd slice names, to make sure users don't get cryptic errors. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-27 19:00:52 +11:00
Aleksa Sarai	8b32914065	cgroup: systemd: properly expand systemd slice names Rather than using '/' to denote hierarchy in slice names, systemd uses '-' in an odd way. This results in runC incorrectly assuming that certain kernel features are missing (and using inconsistent paths for the cgroups not supported by systemd), because the "subsystem path" used is not the one that systemd has created. Fix all of this by properly expanding slice names. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-25 23:18:34 +11:00
Aleksa Sarai	75e38f94a0	cgroups: set memory cgroups in Set Modify the memory cgroup code such that kmem is not managed by Set(), in order to allow updating of memory constraints for containers by Docker. This also removes the need to make memory a special case cgroup. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-22 07:46:43 +11:00
Mrunal Patel	41d9d26513	Add support for just joining in apply using cgroup paths Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-01-20 14:23:05 -05:00
Alex Dadgar	a42f3236d5	Only validate post-hyphen field length on cgroup mounts Signed-off-by: Alex Dadgar <alex.dadgar@gmail.com>	2016-01-13 11:28:49 -08:00
Mrunal Patel	4c767d7046	Merge pull request #446 from cyphar/18-add-pids-controller cgroup: add PIDs cgroup controller support	2016-01-11 16:56:00 -08:00
Aleksa Sarai	a95483402e	libcontainer: cgroups: loudly fail with Set It is vital to loudly fail when a user attempts to set a cgroup limit (rather than using the system default). Otherwise the user will assume they have security they do not actually have. This mirrors the original Apply() (that would set cgroup configs) semantics. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:35 +11:00
Aleksa Sarai	f36ed4b174	libcontainer: cgroups: don't Set in Apply Apply and Set are two separate operations, and it doesn't make sense to group the two together (especially considering that the bootstrap process is added to the cgroup as well). The only exception to this is the memory cgroup, which requires the configuration to be set before processes can join. One of the weird cases to deal with is systemd. Systemd sets some of the cgroup configuration options, but not all of them. Because memory is a special case, we need to explicitly set memory in the systemd Apply(). Otherwise, the rest can be safely re-applied in .Set() as usual. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:35 +11:00
Aleksa Sarai	db3159c9d9	libcontainer: cgroups: add pids controller support Add support for the pids cgroup controller to libcontainer, a recent feature that is available in Linux 4.3+. Unfortunately, due to the init process being written in Go, it can spawn an unknown number of threads due to blocked syscalls. This results in the init process being unable to run properly, and thus small pids.max configs won't work properly. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:32 +11:00
Alexander Morozov	c0cad6aa5e	Merge pull request #451 from cyphar/fix-infinite-recursion cgroups: fs: fix cgroup.Parent path sanitisation	2016-01-11 08:52:26 -08:00
Aleksa Sarai	bf899fef45	cgroups: fs: fix cgroup.Parent path sanitisation Properly sanitise the --cgroup-parent path, to avoid potential issues (as it starts creating directories and writing to files as root). In addition, fix an infinite recursion due to incomplete base cases. It might be a good idea to move pathClean to a separate library (which deals with path safety concerns, so all of runC and Docker can take advantage of it). Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-11 23:10:35 +11:00
Jimmi Dyson	91c7024e52	Revert to non-recursive GetPids, add recursive GetAllPids Signed-off-by: Jimmi Dyson <jimmidyson@gmail.com>	2016-01-08 19:42:25 +00:00
Mrunal Patel	4124ba9468	Revert "cgroups: add pids controller support" Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-12-19 07:48:48 -08:00
Aleksa Sarai	88e6d489f6	libcontainer: cgroups: loudly fail with Set It is vital to loudly fail when a user attempts to set a cgroup limit (rather than using the system default). Otherwise the user will assume they have security they do not actually have. This mirrors the original Apply() (that would set cgroup configs) semantics. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2015-12-19 11:30:47 +11:00
Aleksa Sarai	8a740d5391	libcontainer: cgroups: don't Set in Apply Apply and Set are two separate operations, and it doesn't make sense to group the two together (especially considering that the bootstrap process is added to the cgroup as well). The only exception to this is the memory cgroup, which requires the configuration to be set before processes can join. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2015-12-19 11:30:47 +11:00
Aleksa Sarai	37789f5bf1	libcontainer: cgroups: add pids controller support Add support for the pids cgroup controller to libcontainer, a recent feature that is available in Linux 4.3+. Unfortunately, due to the init process being written in Go, it can spawn an unknown number of threads due to blocked syscalls. This results in the init process being unable to run properly, and thus small pids.max configs won't work properly. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2015-12-19 11:30:38 +11:00
Qiang Huang	9d6ce7168a	Merge pull request #434 from mrunalp/resources Move the cgroups setting into a Resources struct	2015-12-17 09:34:29 +08:00
Mrunal Patel	55a49f2110	Move the cgroups setting into a Resources struct This allows us to distinguish cases where a container needs to just join the paths or also additionally set cgroups settings. This will help in implementing cgroupsPath support in the spec. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-12-16 15:53:31 -05:00
David Calavera	977991d36f	Replace docker units package with new docker/go-units. It's the same library but it won't live in docker/docker anymore. Signed-off-by: David Calavera <david.calavera@gmail.com>	2015-12-14 20:45:30 -05:00
Qiang Huang	7695a0ddb0	systemd: support cgroup parent with specified slice Pick up #119 Fixes: docker/docker#16681 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-12-02 23:57:02 -05:00
Qiang Huang	209c8d9979	Add some comments about cgroup We fixed some bugs and introduced some code that is hard to understand, so add some comments for it. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-11-05 19:12:53 +08:00
Qiang Huang	8c98ae27ac	Refactor cgroupData The former cgroup entry is confusing, separate it to parent and name. Rename entry `c` to `config`. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-11-05 19:12:53 +08:00
Qiang Huang	a263afaf6c	Rename parent and data 'parent' function is confusing with parent cgroup, it's actually parent path, so rename it to parentPath. The name 'data' is too common to be identified, rename it to cgroupData which is exactly what it is. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-11-05 19:12:53 +08:00
Mrunal Patel	cf73b32eeb	Merge pull request #343 from hqhq/hq_unify_behavior_for_memory Unify behavior for memory cgroup	2015-11-02 14:58:31 -08:00
Doug Davis	e5dc12a0c9	Add more context around some error cases Signed-off-by: Doug Davis <dug@us.ibm.com>	2015-10-30 10:55:48 -07:00
John Howard	fb5a8febce	Fixes build tags on cgroups\fs\*.go Signed-off-by: John Howard <jhoward@microsoft.com>	2015-10-23 13:41:10 -07:00
Qiang Huang	194e0e4db6	Unify behavior for memory cgroup We have a rule that for optional cgroups, we don't fail if some of them are not mounted, but we want to fail hard when a user specifies an option and we are unable to fulfill the request. Memory cgroup should also follow this rule. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-10-20 14:01:48 +08:00
Michael Crosby	ba2ce3b25a	Cgroup set order for systemd Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-10-19 13:32:45 -07:00
Michael Crosby	2554f49d5e	Use array instead of map for cgroup subsystems Also add cpuset as the first in the list to address issues setting the pid in any cgroup before the cpuset is populated. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-10-15 15:24:53 -07:00
Michael Crosby	02fdc70837	Add Name() to cgroup subsystems Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-10-15 15:19:23 -07:00
Mrunal Patel	3be7f87b1b	Merge pull request #334 from hqhq/hq_set_cpus_mems_first Set cpuset.cpus and cpuset.mems before join the cgroup	2015-10-15 14:33:28 -07:00
Qiang Huang	be6764508e	Set cpuset.cpus and cpuset.mems before join the cgroup It can avoid unnecessary task migration, see this scenario: - container init task is on cpu 1, and we assigned it to cpu 1, but parent cgroup's cpuset.cpus=2 - we created the cgroup dir and inherited cpuset.cpus from parent as 2 - write container init task's pid to cgroup.procs - [it's possible the container init task migrated to cpu 2 here] - set cpuset.cpus as assigned to cpu 1 - [the container init task has to be migrated back to cpu 1] So we should set cpuset.cpus and cpuset.mems before writing pids to cgroup.procs to avoid such a problem. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-10-15 11:16:56 +08:00
Alexander Morozov	6c198ae2d0	Reorder checks in Walk to avoid panics Also added test for host PID namespace Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-10-13 15:06:57 -07:00
Alexander Morozov	6dad176d01	Get PIDs from cgroups recursively Also lookup cgroup for systemd is changed to "device" to be consistent with fs implementation. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-10-13 10:19:01 -07:00
Mrunal Patel	cc84f2cc9b	Merge pull request #305 from hqhq/hq_add_softlimit_systemd Add memory reservation support for systemd	2015-10-05 16:37:32 -07:00
Mrunal Patel	223975564a	Merge pull request #276 from runcom/adapt-spec-96bcd043aa8a28f6f64c95ad61329765f01de1ba Adapt spec `96bcd043aa`	2015-10-05 16:36:09 -07:00
Mrunal Patel	79a02e35fb	cgroups: Add name=systemd to list of subsystems This allows getting the path to the subsystem and so is subsequently used in EnterPid by an exec process. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-10-05 14:24:11 -04:00
Mrunal Patel	1940c73777	cgroups: Add a name cgroup This is meant to be used in retrieving the paths so an exec process enters all the cgroup paths correctly. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-10-05 14:23:05 -04:00
Antonio Murdaca	c6e406af24	Adjust runc to new opencontainers/specs version Godeps: Vendor opencontainers/specs `96bcd043aa` Fix a bug where it's impossible to pass multiple devices to blkio cgroup controller files. See https://github.com/opencontainers/runc/issues/274 Signed-off-by: Antonio Murdaca <runcom@linux.com>	2015-10-03 12:25:33 +02:00
Alexander Morozov	0954faba13	Merge pull request #306 from hqhq/hq_join_perfevent_systemd Systemd: Join perf_event cgroup	2015-10-01 10:05:35 -07:00
Alexander Morozov	e32b3442ec	Run tests for all HugetlbSizes Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-09-29 17:08:41 -07:00
Qiang Huang	6a5ba1109c	Systemd: Join perf_event cgroup Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-09-29 15:42:29 +08:00
Qiang Huang	fb5a56fb97	Add memory reservation support for systemd Seems it's missed in the first place. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-09-29 10:02:12 +08:00
Mrunal Patel	ef9471fd5b	Merge pull request #253 from avagin/cr-cgroups c/r: create cgroups to restore a container	2015-09-11 18:03:40 -07:00
Andrey Vagin	da2535f2d1	mount: don't read /proc/self/cgroup many times Signed-off-by: Andrey Vagin <avagin@openvz.org>	2015-09-10 21:00:22 +03:00
Andrey Vagin	e49c1dc559	Rework ParseCgroupFile Currently we parse /proc/self/cgroup for each controller. It's ineffective. Signed-off-by: Andrey Vagin <avagin@openvz.org>	2015-09-10 20:59:27 +03:00
Qiang Huang	b94fe5b7f8	Fix bug in find cgroup mount point dir Bug was introduced in #250 According to: http://man7.org/linux/man-pages/man5/proc.5.html 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue (1)(2)(3) (4) (5) (6) (7) (8) (9) (10) (11) ... (7) optional fields: zero or more fields of the form "tag[:value]". The 7th field is optional. We should skip it when parsing mount info. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-09-10 08:29:12 +08:00
Qiang Huang	f2ec7eff7e	Rename FindCgroupMountpointAndSource Rename it to FindCgroupMountpointAndRoot. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-09-09 09:29:11 +08:00
Qiang Huang	bc67941c72	Parse directly in FindCgroupMountpointDir Unify it with FindCgroupMountpoint, and add comments on why we should do this. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-09-09 09:28:50 +08:00
Mrunal Patel	c20bda3f71	Merge pull request #206 from mountkin/ensure-cleanup Ensure the cleanup jobs in the deferrer are executed on error	2015-08-18 14:16:31 -07:00
Shijiang Wei	f0679089b9	Ensure the cleanup jobs in the deferrer are executed on error Signed-off-by: Shijiang Wei <mountkin@gmail.com>	2015-08-16 12:29:04 +08:00
Alexander Morozov	2b28b3c276	Always use cgroup root of current process Because for host PID namespace /proc/1/cgroup can point to whole other world of cgroups. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-08-11 18:04:59 -07:00
Alexander Morozov	5aa6005498	Revert "Fix cgroup parent searching" This reverts commit `2f9052ca29`. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-08-11 18:04:55 -07:00
Alexander Morozov	2f9052ca29	Fix cgroup parent searching I had pretty convenient input data to miss this bug. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-08-10 14:30:05 -07:00
Michael Crosby	b1821a4edc	Merge pull request #150 from runcom/update-go-systemd-dbus-v3 Update go systemd dbus v3	2015-08-03 16:11:52 -04:00
Kir Kolyshkin	6f82d4b544	Simplify and fix os.MkdirAll() usage TL;DR: checking for IsExist(err) after a failed MkdirAll() is both redundant and wrong -- so two reasons to remove it. Quoting MkdirAll documentation: "MkdirAll creates a directory named path, along with any necessary parents, and returns nil, or else returns an error. If path is already a directory, MkdirAll does nothing and returns nil." This means two things: 1. If a directory to be created already exists, no error is returned. 2. If the error returned is IsExist (EEXIST), it means there exists a non-directory with the same name that MkdirAll needs to use for a directory. Example: we want to MkdirAll("a/b"), but file "a" (or "a/b") already exists, so MkdirAll fails. The above is a theory, based on quoted documentation and my UNIX knowledge. 3. In practice, though, the current MkdirAll implementation [1] returns ENOTDIR in most of the cases described in #2, with the exception of a race between MkdirAll and someone else creating the last component of the MkdirAll argument as a file. In this very case MkdirAll() will indeed return EEXIST. Because of #1, the IsExist check after MkdirAll is not needed. Because of #2 and #3, ignoring the IsExist error is just plain wrong, as the directory we require is not created. It's cleaner to report the error now. Note this error is all over the tree, I guess due to copy-paste, or trying to follow the same usage pattern as for Mkdir(), or some not quite correct examples on the Internet. [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go Signed-off-by: Kir Kolyshkin <kir@openvz.org>	2015-07-29 18:03:27 -07:00
Mrunal Patel	0e72bfb815	Fix files not closed in mountinfo parsing function Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-07-27 19:33:39 -04:00
Antonio Murdaca	5eab2d59d3	Swap check for systemd booted to use go-systemd method Signed-off-by: Antonio Murdaca <runcom@linux.com>	2015-07-25 01:36:14 +02:00
Antonio Murdaca	15741a4ab3	Adapt code to go-systemd/dbus v3 Signed-off-by: Antonio Murdaca <runcom@linux.com>	2015-07-24 15:54:59 +02:00
Alexander Morozov	c0e18b96fb	Fix subsystem path with abs parent Sometimes a subsystem can be mounted at a path like "subsystem1,subsystem2", so we need to handle this. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-07-20 11:48:58 -07:00
Alexander Morozov	fc31076c23	Subtract source mount from cgroup dir This is needed for the cgroups of nested containers. Without this patch they create an unnecessary intermediate cgroup like: /sys/fs/cgroup/memory/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/docker/908ebcc9c13584a14322ec070bd971e0de62f126c0cd95c079acdb99990ad3a3 This happens because /proc/self/cgroup shows paths from the host, and they don't exist in the container. Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-07-17 11:41:58 -07:00
Mrunal Patel	2598484b97	Merge pull request #130 from LK4D4/cgroups_mount_fix Cgroups mount fix	2015-07-16 10:49:13 -07:00
Alexander Morozov	e289cf734b	Fix handling of name= cgroups Previously the name=systemd cgroup was mounted inside the container at /sys/fs/cgroup/name=systemd, which is wrong; it should be /sys/fs/cgroup/systemd Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-07-15 13:58:17 -07:00
Alexander Morozov	40b9b89107	Subtract bindmount path from cgroup dir Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-07-15 10:41:25 -07:00
Qiang Huang	4e244108ef	Fix error when memory cgroup not mounted Fixes: #57 Normally all cgroup subsystems are optional except the device cgroup, but the optional handling of the memory cgroup was broken by: https://github.com/docker/libcontainer/pull/637 This patch fixes this. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-07-13 18:22:35 +08:00
Qiang Huang	b4d1df0131	Add oom-kill-disable support for systemd Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2015-07-08 09:21:46 +08:00
Raghavendra K T	88104a4444	Treat -1 as default value for memory swappiness. In some older kernels setting swappiness fails. This happens even when nobody tries to configure swappiness from the docker UI, because we would still get some default value from the host config. With this patch we treat a value of -1 as the default (set implicitly) and skip the enforcement of swappiness. However, setting an invalid value from the docker UI, meaning anything outside 0-100, including -1, should fail. This patch enables that fix in the docker UI. Without this fix, container creation with an invalid value succeeds with a default value (60), which is incorrect. Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>	2015-07-03 18:19:45 +05:30
unclejack	9408c09d50	libcontainer: gofmt pass	2015-06-24 01:57:42 +03:00
Michael Crosby	080df7ab88	Update import paths for new repository Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:59 -07:00
Michael Crosby	8f97d39dd2	Move libcontainer into subdirectory Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:15 -07:00

430 Commits