jasder/runc - runc - Junke (军科) Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
John Hwang	7fc291fd45	Replace formatted errors when unneeded Signed-off-by: John Hwang <John.F.Hwang@gmail.com>	2020-05-16 18:13:21 -07:00
Yuanhong Peng	df3fa115f9	Add support for cgroup namespace Cgroup namespace can be configured in `config.json` like other namespaces. Here is an example: ``` "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], ``` Note that if you want to run a container that shares a cgroup namespace with another container, it is strongly recommended that you set a proper `CgroupsPath` for both containers (the second container's cgroup path must be a subdirectory of the first one); otherwise there might be some unexpected results. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-31 10:51:43 -04:00
Xiaochen Shen	27560ace2f	libcontainer: intelrdt: add support for Intel RDT/MBA in runc Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttling of memory bandwidth for software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of the Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel config CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | | |-- cbm_mask | | |-- min_cbm_bits | | |-- num_closids | |-- MB | |-- bandwidth_gran | |-- delay_linear | |-- min_bandwidth | |-- num_closids |-- ... |-- schemata |-- tasks |-- <container_id> |-- ... |-- schemata |-- tasks For MBA support in `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT, which was implemented in #1279. We could also make use of the `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about the Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth is 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:29 +08:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except for cgroups handling. `RootlessCgroups` denotes that runc is unlikely to have full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now the `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when a sufficient number of UIDs/GIDs are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Aleksa Sarai	823c06eae9	libcontainer: improve "kernel.{domainname,hostname}" sysctl handling These sysctls are namespaced by CLONE_NEWUTS, and we need to use "kernel.domainname" if we want users to be able to set an NIS domainname on Linux. However we disallow "kernel.hostname" because it would conflict with the "hostname" field and cause confusion (but we include a helpful message to make it clearer to the user). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-06-18 21:48:04 +10:00
Giuseppe Scrivano	cbcc85d311	runc: do not require uid/gid mappings if euid()==0 When running in a new userns as root, don't require a mapping to be present in the configuration file. We are already skipping the test for a new userns to be present. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-06-12 12:45:54 +02:00
Tamal Saha	58415b4b12	Fix error message Signed-off-by: Tamal Saha <tamal@appscode.com>	2018-03-21 20:52:09 -07:00
Will Martin	ca4f427af1	Support cgroups with limits as rootless Signed-off-by: Ed King <eking@pivotal.io> Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io> Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>	2017-10-05 11:22:54 +01:00
Giuseppe Scrivano	3282f5a7c1	tests: fix for rootless multiple uids/gids Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	d8b669400a	rootless: allow multiple user/group mappings Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPUs support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). More information about Intel RDT/CAT can be found in section 17.17 of the Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux kernel 4.10 or newer, the interface is defined and exposed via the "resource control" filesystem, which is a "cgroup-like" interface. Compared with cgroups, it has a similar process management lifecycle and similar interfaces in a container. But unlike the cgroups hierarchy, it has a single-level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | |-- cbm_mask | |-- min_cbm_bits | |-- num_closids |-- cpus |-- schemata |-- tasks |-- <container_id> |-- cpus |-- schemata |-- tasks For runc, we can make use of the `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it is in the root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM). Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. A valid L3 cache CBM is a contiguous set of bits, and the number of bits that can be set cannot exceed the maximum CBM length. The maximum number of bits in the CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. The kernel will check whether it is valid when writing. E.g., 0xfffff in root indicates that the CBM has at most 20 bits, which map to the entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc. For more information about the Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% of L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Daniel, Dao Quang Minh	13a8c5d140	Merge pull request #1365 from hqhq/use_go_selinux Use opencontainers/selinux package	2017-04-15 14:22:32 +01:00
Aleksa Sarai	f0876b0427	libcontainer: configs: add proper HostUID and HostGID Previously Host{U,G}ID only gave you the root mapping, which isn't very useful if you are trying to do other things with the IDMaps. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Qiang Huang	5e7b48f7c0	Use opencontainers/selinux package It has been split out as a separate project. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-23 08:21:19 +08:00
Samuel Ortiz	f19aa2d04d	validate: Check that the given namespace path is a symlink When checking if the provided networking namespace is the host one or not, we should first check if it's a symbolic link or not as in some cases we can use persistent networking namespace under e.g. /var/run/netns/. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-12-10 11:14:49 +01:00
Qiang Huang	81d6088c8f	Unify rootfs validation Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-10-29 10:31:44 +08:00
Aleksa Sarai	1ab3c035d2	validator: actually test success Previously we only tested failures, which causes us to miss issues where setting sysctls would always fail. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-26 23:07:57 +11:00
Aleksa Sarai	2a94c3651b	validator: unbreak sysctl net.* validation When changing this validation, the code actually allowing the validation to pass was removed. This meant that any net.* sysctl would always fail to validate. Fixes: `bc84f83344` ("fix docker/docker#27484") Reported-by: Justin Cormack <justin.cormack@docker.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-26 22:58:51 +11:00
Ce Gao	41c35810f2	add test cases about host ns Signed-off-by: Ce Gao <ce.gao@outlook.com>	2016-10-22 11:31:15 +08:00
Ce Gao	bc84f83344	fix docker/docker#27484 Signed-off-by: Ce Gao <ce.gao@outlook.com>	2016-10-22 11:22:52 +08:00
Zhao Lei	bac8b4f0b4	UNITTEST: Bypass userns test on platform without userns support We should bypass the userns test instead of reporting a failure on platforms without userns support. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>	2016-07-25 15:35:04 +08:00
Aleksa Sarai	399175c227	Merge pull request #679 from rajasec/selinux-errorcheck Adding selinux check during container start	2016-04-24 16:24:26 +00:00
rajasec	733ff99f6d	Updating kcore in validator test Signed-off-by: rajasec <rajasec79@gmail.com>	2016-04-21 15:29:19 +05:30
rajasec	d0bf80e481	Adding selinux check during container start Signed-off-by: rajasec <rajasec79@gmail.com> Fixed review comments and rebased Signed-off-by: rajasec <rajasec79@gmail.com> updated the message as per review comment Signed-off-by: Rajasekaran <rajasec79@gmail.com>	2016-04-19 22:22:04 +05:30
Mrunal Patel	5640330693	Fix for runc failing when rootfs has a trailing slash Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-04-11 09:50:28 -07:00
Alberto Leal	dca2d12760	Add unit tests for validate.Validator Signed-off-by: Alberto Leal <albertonb@gmail.com>	2016-04-06 11:18:11 +01:00
Dan Walsh	d2a39ea043	Return a more meaningful error when namespaces are disabled Signed-off-by: Dan Walsh <dwalsh@redhat.com>	2016-03-30 16:16:24 -04:00
Mrunal Patel	f7d1401a69	Add validation for sysctl /proc/sys isn't completely namespaced and only some properties are allowed per linux namespace. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-09-25 14:04:18 -04:00
Michael Crosby	080df7ab88	Update import paths for new repository Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:59 -07:00
Michael Crosby	8f97d39dd2	Move libcontainer into subdirectory Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:15 -07:00

32 Commits
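
The Intel RDT/CAT commit (692f6e1e27) describes two rules for an L3 capacity bitmask (CBM): its set bits must be contiguous, and a group's CBM must be a subset of the root CBM. The following is a minimal sketch of those two checks in Go. It is not runc's or the kernel's implementation: the helper names `isContiguous` and `validCBM` are hypothetical, and the sketch ignores the platform's `min_cbm_bits` limit.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isContiguous reports whether the set bits of mask form one unbroken run.
// (Hypothetical helper, for illustration only.)
func isContiguous(mask uint64) bool {
	if mask == 0 {
		return false // an empty mask allocates no cache at all
	}
	// Drop trailing zeros; a contiguous run then has the form 2^k - 1,
	// so adding 1 must clear every bit that was set.
	run := mask >> uint(bits.TrailingZeros64(mask))
	return run&(run+1) == 0
}

// validCBM applies the two rules from the commit message: the CBM is
// contiguous and is a subset of the root group's CBM.
// (Hypothetical helper; min_cbm_bits is deliberately not checked.)
func validCBM(cbm, root uint64) bool {
	return isContiguous(cbm) && cbm&^root == 0
}

func main() {
	root := uint64(0xfffff) // 20-bit root CBM, as in the commit's example
	for _, cbm := range []uint64{0xf, 0xf0, 0x3ff, 0x1f00, 0xffff0, 0x5, 0x100000} {
		fmt.Printf("%#x valid=%v\n", cbm, validCBM(cbm, root))
	}
}
```

With the 20-bit root mask from the commit's example, the values it lists as valid (0xf, 0xf0, 0x3ff, 0x1f00) and the schema value 0xffff0 pass both checks, while 0x5 fails the contiguity check and 0x100000 falls outside the root mask.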