This is a very simple implementation because, unlike the other namespaces,
it doesn't require any configuration, and in its current state it only
masks paths.
This feature is available in Linux 4.6+ and is enabled by default for
kernels compiled with CONFIG_CGROUPS=y.
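For reference, requesting the namespace in a spec is a one-liner; the sketch below uses the current runtime-spec Go types (an assumption for illustration, not part of this change):

package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// The cgroup namespace entry needs no path or options; listing it
	// is the only configuration, and its effect is to mask cgroup paths.
	ns := specs.LinuxNamespace{Type: specs.CgroupNamespace}
	fmt.Println(ns.Type)
}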
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttling of memory bandwidth for software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.
Hardware details of Intel RDT/MBA can be found in section 17.18 of
the Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm
In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel
config CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags
`rdt_a` and `mba` will be set in /proc/cpuinfo.
Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks
For MBA support for `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT which was implemented in #1279. We could also make
use of `tasks` and `schemata` configuration for memory bandwidth
resource constraints.
The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.
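A rough sketch of that interaction in Go (the helper name and paths are illustrative, not taken from the runc code base):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// addTaskToResctrlGroup appends a task ID to the group's "tasks" file;
// the kernel then removes the task from its previous group automatically.
func addTaskToResctrlGroup(groupDir string, pid int) error {
	f, err := os.OpenFile(filepath.Join(groupDir, "tasks"), os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintf(f, "%d", pid)
	return err
}

func main() {
	// e.g. move PID 12345 into the group created for a container.
	if err := addTaskToResctrlGroup("/sys/fs/resctrl/<container_id>", 12345); err != nil {
		fmt.Println(err)
	}
}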
The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.
Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket; each line
contains the L3 cache id and the memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.
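As a small illustration of that formula (a hedged sketch; the kernel does the authoritative rounding when `schemata` is written, and the rounding direction here is an assumption):

package main

import "fmt"

// roundToStep maps a requested bandwidth percentage onto the control
// steps min_bw + N * bw_gran, rounding intermediate values up to the
// next step and clamping to 100%.
func roundToStep(requested, minBw, bwGran int) int {
	if requested <= minBw {
		return minBw
	}
	n := (requested - minBw + bwGran - 1) / bwGran // ceiling division
	step := minBw + n*bwGran
	if step > 100 {
		step = 100
	}
	return step
}

func main() {
	// With min_bandwidth=10 and bandwidth_gran=10, a request of 25%
	// lands on the 30% control step.
	fmt.Println(roundToStep(25, 10, 10))
}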
For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth is 10% and the memory bandwidth granularity is 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.
"linux": {
"intelRdt": {
"memBwSchema": "MB:0=20;1=70"
}
}
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and
`RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc.
`RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in
the current user namespace. `RootlessEUID` is almost identical to the former `Rootless`
except for the cgroup handling.
`RootlessCgroups` denotes that runc is unlikely to have full access to cgroups.
`RootlessCgroups` is set to false if runc is executed as root (euid == 0) in the initial namespace.
Otherwise `RootlessCgroups` is set to true.
(Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well)
When runc is executed as root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes),
`RootlessEUID` is set to false but `RootlessCgroups` is set to true.
So, "runc-in-userns" behaves almost the same as "rootful" runc, except that cgroup errors are ignored.
This PR does not have any impact on CLI flags and `state.json`.
Note about CLI:
* Now the `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`.
* Now `runc spec --rootless` is only required when `RootlessEUID` is set to true.
For runc-in-userns, `runc spec` without `--rootless` should work, when a sufficient number of
UIDs/GIDs are mapped.
Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`):
* `$XDG_RUNTIME_DIR` is ignored if runc is being executed as root (euid == 0) in the initial namespace, for backward compatibility.
(`/run/runc` is used)
* If runc is executed as root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`.
This allows unprivileged users to execute runc as root in a user namespace without mounting a writable `/run/runc`.
Note about `state.json`:
* `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`.
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
When joining an existing namespace, don't default to configuring a
loopback interface in that namespace.
Its creator should have done that, and we don't want to fail to create
the container when we don't have sufficient privileges to configure the
network namespace.
Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:
> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.
Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.
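A minimal sketch of the pointer-based approach (field and helper names are illustrative, not the exact runc types):

package main

import (
	"fmt"
	"os"
	"strconv"
)

// applyOomScoreAdj only touches /proc/<pid>/oom_score_adj when a value
// is present; a nil pointer means "not set" and the kernel default is
// left untouched, as required by the spec.
func applyOomScoreAdj(oomScoreAdj *int, pid int) error {
	if oomScoreAdj == nil {
		return nil
	}
	path := fmt.Sprintf("/proc/%d/oom_score_adj", pid)
	return os.WriteFile(path, []byte(strconv.Itoa(*oomScoreAdj)), 0o644)
}

func main() {
	// nil means "not set": nothing is written.
	fmt.Println(applyOomScoreAdj(nil, os.Getpid()))
}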
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The function is called even if the user namespace is not set.
This results in wrong uid/gid being set on devices.
This fix adds a check that the user namespace is set before calling
setupUserNamespace.
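The check boils down to something like the sketch below (assumed shape using the runtime-spec Go types, not the verbatim patch):

package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// hasUserNamespace reports whether the spec requests a user namespace;
// setupUserNamespace would only be called when this returns true.
func hasUserNamespace(spec *specs.Spec) bool {
	if spec.Linux == nil {
		return false
	}
	for _, ns := range spec.Linux.Namespaces {
		if ns.Type == specs.UserNamespace {
			return true
		}
	}
	return false
}

func main() {
	spec := &specs.Spec{Linux: &specs.Linux{}}
	fmt.Println(hasUserNamespace(spec)) // false: no userns requested
}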
Fixes #1742
Signed-off-by: Julien Lavesque <julien.lavesque@gmail.com>
Due to the semantics of chroot(2) when it comes to mount namespaces, it
is not generally safe to use MS_PRIVATE as a mount propagation when using
chroot(2). The reason for this is that this effectively results in a set
of mount references being held by the chroot'd namespace which the
namespace cannot free. pivot_root(2) does not have this issue because
the @old_root can be unmounted by the process.
Ultimately, --no-pivot is not really necessary anymore as a commonly
used option since f8e6b5af5e ("rootfs: make pivot_root not use a
temporary directory") resolved the read-only issue. But if someone
really needs to use it, MS_PRIVATE is never a good idea.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
The code in prepareRoot (e385f67a0e/libcontainer/rootfs_linux.go (L599-L605))
attempts to default the rootfs mount to `rslave`. However, since the spec
conversion has already defaulted it to `rprivate`, that code doesn't
actually ever do anything.
This changes the spec conversion code to accept "" and treat it as 0.
Implicitly, this makes rootfs propagation default to `rslave`, which is
a part of fixing the moby bug https://github.com/moby/moby/issues/34672
Alternate implementations include changing this defaulting to be
`rslave` and removing the defaulting code in prepareRoot, or skipping
the mapping entirely for "", but I think this change is the cleanest of
those options.
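For context, the conversion table looks roughly like this (a sketch using x/sys/unix constants, not the verbatim runc code); the empty-string entry maps to 0 so that an unset rootfs propagation imposes no flags and the later `rslave` default in prepareRoot can apply:

package specconv

import "golang.org/x/sys/unix"

// mountPropagationMapping translates spec propagation names into mount
// flags; "" maps to 0, i.e. "no flags requested".
var mountPropagationMapping = map[string]int{
	"rprivate": unix.MS_PRIVATE | unix.MS_REC,
	"private":  unix.MS_PRIVATE,
	"rslave":   unix.MS_SLAVE | unix.MS_REC,
	"slave":    unix.MS_SLAVE,
	"rshared":  unix.MS_SHARED | unix.MS_REC,
	"shared":   unix.MS_SHARED,
	"":         0,
}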
Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPUs support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.
This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).
More information about Intel RDT/CAT can be found in section 17.17 of
the Intel Software Developer Manual.
About Intel RDT/CAT kernel interface:
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.
Compared with cgroups, it has a similar process management lifecycle and
interfaces in a container. But unlike the cgroups hierarchy, it has a
single-level filesystem layout.
Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks
For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.
The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in the root group.
The file `schemata` has allocation bitmasks/values for L3 cache on each socket;
each line contains the L3 cache id and the capacity bitmask (CBM).
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
The valid L3 cache CBM is a *contiguous set of bits*, and the number of bits
that can be set is bounded by the maximum CBM length. The max bits in the CBM
varies among supported Intel Xeon platforms. In the Intel RDT "resource
control" filesystem layout, the CBM in a group should be a subset of the CBM
in root. The kernel will check validity when writing. E.g., 0xfffff in root
indicates the max CBM length is 20 bits, which maps to the entire L3 cache
capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc.
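The contiguity rule can be checked with a small helper like the one below (an illustrative sketch; the kernel performs the authoritative validation, including the min_cbm_bits constraint):

package main

import "fmt"

// isValidCBM reports whether cbm is a non-empty, contiguous run of set
// bits that is a subset of the root CBM.
func isValidCBM(cbm, rootCBM uint64) bool {
	if cbm == 0 || cbm&^rootCBM != 0 {
		return false
	}
	lowest := cbm & -cbm         // lowest set bit
	return cbm&(cbm+lowest) == 0 // adding it must clear the whole run
}

func main() {
	fmt.Println(isValidCBM(0xf0, 0xfffff))  // true: contiguous subset
	fmt.Println(isValidCBM(0x505, 0xfffff)) // false: holes in the mask
}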
For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt
An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% L3 cache id 1:
"linux": {
"intelRdt": {
"l3CacheSchema": "L3:0=ffff0;1=3ff"
}
}
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
config.Sysctl setting is duplicated.
When the container is rootless and Linux is nil, runc will panic.
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Linux is not always non-nil.
If Linux is nil, a panic will occur.
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.
There are still some dependencies on the syscall package that will
remain in syscall for the foreseeable future:
Errno
Signal
SysProcAttr
Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.
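As a small illustration of the swap (an assumed call site, not one taken from this change):

package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

func main() {
	// Previously: syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim)
	var lim unix.Rlimit
	if err := unix.Getrlimit(unix.RLIMIT_NOFILE, &lim); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open file limit: cur=%d max=%d\n", lim.Cur, lim.Max)
}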
Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
Since this is a runC-specific feature, it belongs here rather than in
opencontainers/ocitools (which is for generic OCI runtimes).
In addition, we don't create a new network namespace. This is because
currently if you want to set up a veth bridge you need CAP_NET_ADMIN in
both network namespaces' pinned user namespace to create the necessary
interfaces in each network namespace.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Previously Host{U,G}ID only gave you the root mapping, which isn't very
useful if you are trying to do other things with the IDMaps.
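The generalized lookup amounts to walking the ID mappings (a sketch with assumed type and helper names, not the exact libcontainer code):

package main

import "fmt"

// IDMap mirrors a single uid_map/gid_map entry.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// hostID translates an in-container ID to a host ID using the mappings,
// instead of only answering for the root (ID 0) mapping.
func hostID(mappings []IDMap, containerID int) (int, error) {
	for _, m := range mappings {
		if containerID >= m.ContainerID && containerID < m.ContainerID+m.Size {
			return m.HostID + (containerID - m.ContainerID), nil
		}
	}
	return -1, fmt.Errorf("no mapping for container ID %d", containerID)
}

func main() {
	maps := []IDMap{{ContainerID: 0, HostID: 100000, Size: 65536}}
	fmt.Println(hostID(maps, 1000)) // 101000 <nil>
}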
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This enables support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:
* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update
The following commands work:
* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state
In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
When the spec file contains duplicated namespaces, e.g.
specs: specs.Spec{
	Linux: &specs.Linux{
		Namespaces: []specs.Namespace{
			{
				Type: "pid",
			},
			{
				Type: "pid",
				Path: "/proc/1/ns/pid",
			},
		},
	},
}
runc should report a malformed spec instead of silently using the last one,
because such a spec could be quite confusing.
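One way to detect this during spec validation (an illustrative sketch using current runtime-spec type names, not the exact runc patch):

package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

// checkDuplicateNamespaces rejects a spec that lists the same namespace
// type more than once, instead of silently keeping the last entry.
func checkDuplicateNamespaces(namespaces []specs.LinuxNamespace) error {
	seen := map[specs.LinuxNamespaceType]bool{}
	for _, ns := range namespaces {
		if seen[ns.Type] {
			return fmt.Errorf("malformed spec file: duplicated ns %q", ns.Type)
		}
		seen[ns.Type] = true
	}
	return nil
}

func main() {
	nss := []specs.LinuxNamespace{
		{Type: specs.PIDNamespace},
		{Type: specs.PIDNamespace, Path: "/proc/1/ns/pid"},
	}
	fmt.Println(checkDuplicateNamespaces(nss))
}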
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>