jasder/runc - runc - 军科 Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Michael Crosby	aa7917b751	Merge pull request #1911 from theSuess/linter-fixes Various cleanups to address linter issues	2018-11-13 12:13:34 -05:00
Michael Crosby	bd420b59f1	Merge pull request #1925 from Ace-Tang/fix_dup_ns test: fix TestDupNamespaces fail to test dup-ns error	2018-11-13 12:11:11 -05:00
Ace-Tang	16d55f17a8	libcontainer: fix potential panic if spec.Process is nil for the code logic, pointer 'spec.Process' should be judge first to avoid panic. Signed-off-by: Ace-Tang <aceapril@126.com>	2018-11-06 11:55:30 +08:00
Ace-Tang	95d1aa1886	test: fix TestDupNamespaces add Root in created spec, or error message is 'Root must be specified' Signed-off-by: Ace-Tang <aceapril@126.com>	2018-11-06 11:36:27 +08:00
Michael Crosby	b1068fb925	Merge pull request #1814 from rhatdan/selinux SELinux labels are tied to the thread	2018-11-05 10:00:11 -05:00
Aleksa Sarai	9f1e94488e	merge branch 'pr-1921' libcontainer: ability to compile without kmem LGTMs: @mrunalp @cyphar Closes #1921	2018-11-02 09:54:16 +11:00
Michael Crosby	9e5aa7494d	Merge pull request #1918 from giuseppe/skip-setgroups rootless: fix running with /proc/self/setgroups set to deny	2018-11-01 13:16:47 -04:00
Kir Kolyshkin	6a2c155968	libcontainer: ability to compile without kmem Commit `fe898e7862` (PR #1350) enables kernel memory accounting for all cgroups created by libcontainer -- even if kmem limit is not configured. Kernel memory accounting is known to be broken in some kernels, specifically the ones from RHEL7 (including RHEL 7.5). Those kernels do not support kernel memory reclaim, and are prone to oopses. Unconditionally enabling kmem acct on such kernels lead to bugs, such as * https://github.com/opencontainers/runc/issues/1725 * https://github.com/kubernetes/kubernetes/issues/61937 * https://github.com/moby/moby/issues/29638 This commit gives a way to compile runc without kernel memory setting support. To do so, use something like make BUILDTAGS="seccomp nokmem" Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2018-10-31 20:35:51 -07:00
Chris Aniszczyk	f3ce8221ea	Merge pull request #1913 from xiaochenshen/rdt-add-diagnostics libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors	2018-10-25 14:27:17 -05:00
Giuseppe Scrivano	869add3318	rootless: fix running with /proc/self/setgroups set to deny This is a regression from `06f789cf26` when the user namespace was configured without a privileged helper. To allow a single mapping in an user namespace, it is necessary to set /proc/self/setgroups to "deny". For a simple reproducer, the user namespace can be created with "unshare -r". Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-10-25 15:44:15 +02:00
Aleksa Sarai	e93996674f	merge branch 'pr-1903' clarify license information LGTMs: @hqhq @cyphar Closes #1903	2018-10-24 22:03:44 +11:00
Xiaochen Shen	6c307f8ff2	libcontainer: intelrdt: add user-friendly diagnostics for Intel RDT operation errors Linux kernel v4.15 introduces better diagnostics for Intel RDT operation errors. If any error returns when making new directories or writing to any of the control file in resctrl filesystem, reading file /sys/fs/resctrl/info/last_cmd_status could provide more information that can be conveyed in the error returns from file operations. Some examples: echo "L3:0=f3;1=ff" > /sys/fs/resctrl/container_id/schemata -bash: echo: write error: Invalid argument cat /sys/fs/resctrl/info/last_cmd_status mask f3 has non-consecutive 1-bits echo "MB:0=0;1=110" > /sys/fs/resctrl/container_id/schemata -bash: echo: write error: Invalid argument cat /sys/fs/resctrl/info/last_cmd_status MB value 0 out of range [10,100] cd /sys/fs/resctrl mkdir 1 2 3 4 5 6 7 8 mkdir: cannot create directory '8': No space left on device cat /sys/fs/resctrl/info/last_cmd_status out of CLOSIDs See 'last_cmd_status' for more details in kernel documentation: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt In runc, we could append the diagnostics information to the error message of Intel RDT operation errors to provide more user-friendly information. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-19 00:16:08 +08:00
Mrunal Patel	c2ab1e656e	Merge pull request #1910 from adrianreber/tip Fix travis Go: tip	2018-10-17 12:47:08 -07:00
Michael Crosby	58592df567	Merge pull request #1880 from AkihiroSuda/fix-subgid libcontainer: CurrentGroupSubGIDs -> CurrentUserSubGIDs	2018-10-16 15:21:51 -04:00
Xiaochen Shen	d59b17d6d5	libcontainer: intelrdt: Add more check if sub-features are enabled Double check if Intel RDT sub-features are available in "resource control" filesystem. Intel RDT sub-features can be selectively disabled or enabled by kernel command line (e.g., rdt=!l3cat,mba) in 4.14 and newer kernel. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:44 +08:00
Xiaochen Shen	f097339289	libcontainer: intelrdt: add test cases for Intel RDT/MBA Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:39 +08:00
Xiaochen Shen	27560ace2f	libcontainer: intelrdt: add support for Intel RDT/MBA in runc Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttle over memory bandwidth for the software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | | |-- cbm_mask | | |-- min_cbm_bits | | |-- num_closids | |-- MB | |-- bandwidth_gran | |-- delay_linear | |-- min_bandwidth | |-- num_closids |-- ... |-- schemata |-- tasks |-- <container_id> |-- ... |-- schemata |-- tasks For MBA support for `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT which implemented in #1279. We could also make use of `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belongs to this group (e.g., <container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. 
Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth of 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:29 +08:00
Xiaochen Shen	c1cece7e23	libcontainer: intelrdt: add Intel RDT/MBA docs in SPEC.md Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:28:19 +08:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Dominik Süß	0b412e9482	various cleanups to address linter issues Signed-off-by: Dominik Süß <dominik@suess.wtf>	2018-10-13 21:14:03 +02:00
Adrian Reber	0d01164756	Fix travis Go: tip This fixes libcontainer/container_linux.go:1200: Error call has possible formatting directive %s Signed-off-by: Adrian Reber <areber@redhat.com>	2018-10-13 10:44:07 +00:00
Aleksa Sarai	e40d4635c4	merge branch 'pr-1894' Move spec.Linux.IntelRdt check to spec.Linux != nil block LGTMs: @crosbymichael @cyphar Closes #1894	2018-10-09 02:41:13 +11:00
Jonathan Marler	1499c746a1	Move spec.Linux.IntelRdt check to spec.Linux != nil block Signed-off-by: Jonathan Marler <johnnymarler@gmail.com>	2018-10-04 21:30:55 -06:00
Mike Brown	26bdc0dce7	clarify license information Signed-off-by: Mike Brown <brownwm@us.ibm.com>	2018-10-03 10:39:44 -05:00
Mrunal Patel	2abd837c8c	Merge pull request #1893 from cyphar/keyctl-ignore-enosys keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING)	2018-09-25 13:35:16 -07:00
Aleksa Sarai	578fe65e4f	merge branch 'pr-1817' Fix duplicate entries and missing entries in getCgroupMountsHelper Add test for testing cgroup mounts on bedrock linux Stop relying on number of subsystems for cgroups LGTMs: @crosbymichael @cyphar Closes #1817	2018-09-19 19:48:17 +10:00
Michael Crosby	cc8146cf93	Merge pull request #1858 from marcov/nsenter-README Update outdated nsenter README content	2018-09-17 10:53:19 -04:00
Michael Crosby	d77251d5fc	Merge pull request #1892 from Ace-Tang/add_clean_test test: add more test case for CleanPath	2018-09-17 10:51:17 -04:00
Aleksa Sarai	40f1468413	keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING) While all modern kernels (and I do mean _all_ of them -- this syscall was added in 2.6.10 before git had begun development!) have support for this syscall, LXC has a default seccomp profile that returns ENOSYS for this syscall. For most syscalls this would be a deal-breaker, and our use of session keyrings is security-based there are a few mitigating factors that make this change not-completely-insane: * We already have a flag that disables the use of session keyrings (for older kernels that had system-wide keyring limits and so on). So disabling it is not a new idea. * While the primary justification of using session keys is security-based, it's more of a security-by-obscurity protection. The main defense keyrings have is VFS credentials -- which is something that users already have better security tools for (setuid(2) and user namespaces). * Given the security justification you might argue that we shouldn't silently ignore this. However, the only way for the kernel to return -ENOSYS is either being ridiculously old (at which point we wouldn't work anyway) or that there is a seccomp profile in place blocking it. Given that the seccomp profile (if malicious) could very easily just return 0 or a silly return code (or something even more clever with seccomp-bpf) and trick us without this patch, there isn't much of a significant change in how much seccomp can trick us with or without this patch. Given all of that over-analysis, I'm pretty convinced there isn't a security problem in this very specific case and it will help out the ChromeOS folks by allowing Docker to run inside their LXC container setup. I'd be happy to be proven wrong. Ref: https://bugs.chromium.org/p/chromium/issues/detail?id=860565 Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-09-17 21:38:30 +10:00
Ace-Tang	5963cf2afc	test: add more test case for CleanPath Signed-off-by: Ace-Tang <aceapril@126.com>	2018-09-14 21:37:12 +08:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`. 
Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Michael Crosby	70ca035aa6	Merge pull request #1883 from lifubang/containeridinpath fix delete other file bug when container id is ..	2018-09-05 13:43:21 -04:00
Mrunal Patel	9cda583235	Merge pull request #1832 from giuseppe/runc-drop-invalid-proc-destination-with-chroot linux: drop check for /proc as invalid dest	2018-09-04 09:26:21 -07:00
Lifubang	4eb30fcdbe	code optimization: use securejoin.SecureJoin and CleanPath Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-09-04 09:02:18 +08:00
Lifubang	4fae8fcce2	code optimization after review Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-09-03 23:27:31 +08:00
Lifubang	d2d226e8f9	fix unexpected delete bug when container id is .. Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-08-31 11:17:42 +08:00
ChangFeng	3ce8fac7c4	libcontainer: add /proc/loadavg to the white list of bind mount Signed-off-by: JunLi <lijun.git@gmail.com>	2018-08-30 21:30:23 +08:00
Giuseppe Scrivano	636b664027	linux: drop check for /proc as invalid dest it is now allowed to bind mount /proc. This is useful for rootless containers when the PID namespace is shared with the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-08-30 09:56:18 +02:00
Akihiro Suda	b34d6d8a7c	libcontainer: CurrentGroupSubGIDs -> CurrentUserSubGIDs subgid is defined per user, not group (see subgid(5)) This commit also adds support for specifying subuid owner with a numeric UID. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-08-29 07:46:03 +09:00
Michael Crosby	1555a78945	Merge pull request #1874 from mrunalp/drop_unused_code Remove unused veth setup code	2018-08-27 11:07:25 -04:00
Qiang Huang	0228707b77	Merge pull request #1873 from rhatdan/ms_move When doing a copyup, /tmp can not be a shared mount point	2018-08-27 10:08:53 +08:00
Mrunal Patel	fe3d5c4c6e	Remove unused veth setup code Networking is setup by plugins for users of runc so it makes sense to get rid of the veth strategy. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2018-08-24 15:41:52 -07:00
Adrian Reber	fa43a72aba	criu: restore into existing namespace when specified Using CRIU to checkpoint and restore a container into an existing network namespace is not possible. If the network namespace is defined like { "type": "network", "path": "/run/netns/test" } there is the expectation that the restored container is again running in the network namespace specified with 'path'. This adds the new CRIU 'external namespace' feature to runc, where during checkpointing that specific namespace is referenced and during restore CRIU tries to restore the container in exactly that namespace. This breaks/fixes current runc behavior. If, without this patch, runc restores a container with such a network namespace definition, it is ignored and CRIU recreates a network namespace without a name. With this patch runc uses the network namespace path (if available) to checkpoint and restore the container in just that network namespace. Restore will now fail if a container was checkpointed with a network namespace path set and if that network namespace path does not exist during restore. runc still falls back to the old behavior if CRIU older than 3.11 is installed. Fixes #1786 Related to https://github.com/projectatomic/libpod/pull/469 Thanks to Andrei Vagin for all the help in getting the interface between CRIU and runc right! Signed-off-by: Adrian Reber <areber@redhat.com>	2018-08-22 23:27:20 +02:00
Daniel J Walsh	62a4763a7a	When doing a copyup, /tmp can not be a shared mount point MOVE_MOUNT will fail under certain situations. You are not allowed to MS_MOVE if the parent directory is shared. man mount ... The move operation Move a mounted tree to another place (atomically). The call is: mount --move olddir newdir This will cause the contents which previously appeared under olddir to now be accessible under newdir. The physical location of the files is not changed. Note that olddir has to be a mountpoint. Note also that moving a mount residing under a shared mount is invalid and unsupported. Use findmnt -o TARGET,PROPAGATION to see the current propagation flags. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2018-08-20 17:41:06 -04:00
Aleksa Sarai	20aff4f048	merge branch 'pr-1867' Revert "libcontainer/rootfs_linux: minor cleanup" LGTMs: @hqhq @cyphar Closes #1867	2018-08-15 15:42:56 +10:00
Mrunal Patel	26ec8a9783	Revert "libcontainer/rootfs_linux: minor cleanup" This reverts commit `1b27db67f1`. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2018-08-14 15:50:18 -07:00
Marco Vedovati	34ed62697b	Update outdated nsenter README content Signed-off-by: Marco Vedovati <mvedovati@suse.com>	2018-08-07 17:53:56 +02:00
Michael Crosby	4056a41f58	Merge pull request #1830 from crosbymichael/procs Pass GOMAXPROCS to init processes	2018-08-01 10:48:06 -04:00
Jay Kamat	a2faaa1317	Fix duplicate entries and missing entries in getCgroupMountsHelper Signed-off-by: Jay Kamat <jaygkamat@gmail.com>	2018-07-31 20:12:18 -07:00
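
The message of commit 06f789cf26 describes how the former `Rootless` flag is split into `RootlessEUID` and `RootlessCgroups`. The rules it states can be sketched in Go as below. This is only an illustration of the description in the commit message, not the code from the commit: `rootlessFlags`, `decide`, and `stateRootless` are hypothetical names, and the sketch covers only the automatic case, ignoring any override from the `--rootless` CLI flag.

```go
package main

import "fmt"

// rootlessFlags holds the two booleans named in the commit message.
// The type itself is hypothetical and exists only for this sketch.
type rootlessFlags struct {
	RootlessEUID    bool
	RootlessCgroups bool
}

// decide applies the rules quoted in the commit message.
// euid is the effective UID; inInitialUserNS reports whether the
// process runs in the initial user namespace (an input the caller
// is assumed to have determined already).
func decide(euid int, inInitialUserNS bool) rootlessFlags {
	return rootlessFlags{
		// Non-root user in the current user namespace.
		RootlessEUID: euid != 0,
		// False only for root in the initial namespace, true otherwise.
		RootlessCgroups: !(euid == 0 && inInitialUserNS),
	}
}

// stateRootless mirrors the note about state.json: `rootless` is true
// only when both flags are true.
func stateRootless(f rootlessFlags) bool {
	return f.RootlessEUID && f.RootlessCgroups
}

func main() {
	cases := []struct {
		name      string
		euid      int
		initialNS bool
	}{
		{"root in initial namespace", 0, true},
		{"root in a user namespace", 0, false},
		{"non-root user", 1000, true},
	}
	for _, c := range cases {
		f := decide(c.euid, c.initialNS)
		fmt.Printf("%s: RootlessEUID=%t RootlessCgroups=%t state.rootless=%t\n",
			c.name, f.RootlessEUID, f.RootlessCgroups, stateRootless(f))
	}
}
```

The second case is the "runc-in-userns" situation from the message (Docker-in-LXD, Podman, Usernetes): `RootlessEUID` is false while `RootlessCgroups` is true, so `rootless` in `state.json` stays false.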


1178 Commits
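
The message of commit 27560ace2f gives the available memory bandwidth control steps as min_bw + N * bw_gran and the schema format as "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...". A minimal Go sketch of that arithmetic follows. It is not runc's implementation: `nextStep` and `mbSchema` are hypothetical helpers, "rounded to the next control step" is assumed to mean rounding up, cache ids are assumed to be 0, 1, ... in order, and the upper bound of 100 is taken from the `last_cmd_status` example in commit 6c307f8ff2.

```go
package main

import (
	"fmt"
	"strings"
)

// nextStep returns the smallest value of the form minBw + N*gran that
// is >= requested (rounding up is an assumption of this sketch).
// The result is not clamped; callers with an unusual minBw/gran pair
// would need to handle a step above 100 themselves.
func nextStep(requested, minBw, gran int) (int, error) {
	if gran <= 0 {
		return 0, fmt.Errorf("granularity %d must be positive", gran)
	}
	if requested < minBw || requested > 100 {
		return 0, fmt.Errorf("MB value %d out of range [%d,100]", requested, minBw)
	}
	n := (requested - minBw + gran - 1) / gran // ceiling division
	return minBw + n*gran, nil
}

// mbSchema builds a schema line in the "MB:<id>=<percent>;..." format,
// using the slice index as the cache id (an assumption).
func mbSchema(percent []int, minBw, gran int) (string, error) {
	parts := make([]string, 0, len(percent))
	for id, p := range percent {
		v, err := nextStep(p, minBw, gran)
		if err != nil {
			return "", err
		}
		parts = append(parts, fmt.Sprintf("%d=%d", id, v))
	}
	return "MB:" + strings.Join(parts, ";"), nil
}

func main() {
	// The two-socket example from the commit message:
	// minimum bandwidth 10%, granularity 10%, 20% and 70% requested.
	s, err := mbSchema([]int{20, 70}, 10, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // MB:0=20;1=70
}
```

With the same minimum and granularity, a request of 25 would come out as 30 under the round-up assumption.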