jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Xiaochen Shen	d59b17d6d5	libcontainer: intelrdt: Add more check if sub-features are enabled Double check if Intel RDT sub-features are available in the "resource control" filesystem. Intel RDT sub-features can be selectively disabled or enabled by the kernel command line (e.g., rdt=!l3cat,mba) in kernel 4.14 and newer. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:44 +08:00
Xiaochen Shen	f097339289	libcontainer: intelrdt: add test cases for Intel RDT/MBA Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:39 +08:00
Xiaochen Shen	1ed597bfe6	libcontainer: intelrdt: add update command support for Intel RDT/MBA Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:34 +08:00
Xiaochen Shen	27560ace2f	libcontainer: intelrdt: add support for Intel RDT/MBA in runc Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttling of memory bandwidth for software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of the Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by kernel config CONFIG_INTEL_RDT. If the hardware supports it, CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | | |-- cbm_mask | | |-- min_cbm_bits | | |-- num_closids | |-- MB | |-- bandwidth_gran | |-- delay_linear | |-- min_bandwidth | |-- num_closids |-- ... |-- schemata |-- tasks |-- <container_id> |-- ... |-- schemata |-- tasks For MBA support in `runc`, we will reuse the infrastructure and code base of Intel RDT/CAT, which was implemented in #1279. We can also make use of the `tasks` and `schemata` configuration for memory bandwidth resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format. Memory bandwidth schema: It has allocation values for memory bandwidth on each socket, which contains L3 cache id and memory bandwidth percentage. Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..." The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware. For more information about the Intel RDT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the minimum memory bandwidth is 10% with a memory bandwidth granularity of 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1. "linux": { "intelRdt": { "memBwSchema": "MB:0=20;1=70" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:29:29 +08:00
Xiaochen Shen	c1cece7e23	libcontainer: intelrdt: add Intel RDT/MBA docs in SPEC.md Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 14:28:19 +08:00
Xiaochen Shen	bd90541666	vendor: bump runtime-spec to 5684b8af48c1 Update runtime-spec to get Intel RDT/MBA Linux configs which will be used in successive commits. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2018-10-16 13:18:25 +08:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Dominik Süß	0b412e9482	various cleanups to address linter issues Signed-off-by: Dominik Süß <dominik@suess.wtf>	2018-10-13 21:14:03 +02:00
Adrian Reber	0d01164756	Fix travis Go: tip This fixes libcontainer/container_linux.go:1200: Error call has possible formatting directive %s Signed-off-by: Adrian Reber <areber@redhat.com>	2018-10-13 10:44:07 +00:00
Aleksa Sarai	398f670bcb	merge branch 'pr-1908' fix build break LGTMs: @crosbymichael @cyphar Closes #1908	2018-10-13 18:32:37 +11:00
Mike Brown	36f8472053	fix build break Signed-off-by: Mike Brown <brownwm@us.ibm.com>	2018-10-12 09:22:35 -05:00
Aleksa Sarai	e40d4635c4	merge branch 'pr-1894' Move spec.Linux.IntelRdt check to spec.Linux != nil block LGTMs: @crosbymichael @cyphar Closes #1894	2018-10-09 02:41:13 +11:00
Jonathan Marler	1499c746a1	Move spec.Linux.IntelRdt check to spec.Linux != nil block Signed-off-by: Jonathan Marler <johnnymarler@gmail.com>	2018-10-04 21:30:55 -06:00
Mike Brown	26bdc0dce7	clarify license information Signed-off-by: Mike Brown <brownwm@us.ibm.com>	2018-10-03 10:39:44 -05:00
Mrunal Patel	2abd837c8c	Merge pull request #1893 from cyphar/keyctl-ignore-enosys keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING)	2018-09-25 13:35:16 -07:00
Danail Branekov	a1d5398afa	Respect container's cgroup path Respect the container's cgroup path when finding the container's cgroup mount point, which is useful in multi-tenant environments, where containers have their own unique cgroup mounts Signed-off-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Oliver Stenbom <ostenbom@pivotal.io> Signed-off-by: Giuseppe Capizzi <gcapizzi@pivotal.io>	2018-09-25 17:43:36 +01:00
Aleksa Sarai	5de99cd390	tty: clean up epollConsole closing `ec0d23a92f` ("tty: close epollConsole on errors") fixed a significant issue, but the cleanup was not ideal (especially if the function is changed in future to add additional error conditions to those currently present). Using the defer-named-error trick avoids this issue and makes the code more readable. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-09-21 11:55:36 +10:00
Mrunal Patel	00dc70017d	Merge pull request #1895 from giuseppe/fix-tty-hang tty: close epollConsole on errors	2018-09-20 10:02:08 -07:00
Giuseppe Scrivano	ec0d23a92f	tty: close epollConsole on errors make sure epollConsole is closed before returning an error. It solves a hang when using these commands with a container that uses a terminal: runc run foo & ssh root@localhost runc exec foo echo hello Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-09-20 16:51:51 +02:00
Aleksa Sarai	578fe65e4f	merge branch 'pr-1817' Fix duplicate entries and missing entries in getCgroupMountsHelper Add test for testing cgroup mounts on bedrock linux Stop relying on number of subsystems for cgroups LGTMs: @crosbymichael @cyphar Closes #1817	2018-09-19 19:48:17 +10:00
Michael Crosby	cc8146cf93	Merge pull request #1858 from marcov/nsenter-README Update outdated nsenter README content	2018-09-17 10:53:19 -04:00
Michael Crosby	d77251d5fc	Merge pull request #1892 from Ace-Tang/add_clean_test test: add more test case for CleanPath	2018-09-17 10:51:17 -04:00
Michael Crosby	8facd6d2d5	Merge pull request #1886 from halfcrazy/fix/typo doc: fix typo	2018-09-17 10:44:23 -04:00
Aleksa Sarai	40f1468413	keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING) While all modern kernels (and I do mean _all_ of them -- this syscall was added in 2.6.10 before git had begun development!) have support for this syscall, LXC has a default seccomp profile that returns ENOSYS for this syscall. For most syscalls this would be a deal-breaker, and although our use of session keyrings is security-based, there are a few mitigating factors that make this change not-completely-insane: * We already have a flag that disables the use of session keyrings (for older kernels that had system-wide keyring limits and so on). So disabling it is not a new idea. * While the primary justification of using session keys is security-based, it's more of a security-by-obscurity protection. The main defense keyrings have is VFS credentials -- which is something that users already have better security tools for (setuid(2) and user namespaces). * Given the security justification you might argue that we shouldn't silently ignore this. However, the only way for the kernel to return -ENOSYS is either being ridiculously old (at which point we wouldn't work anyway) or that there is a seccomp profile in place blocking it. Given that the seccomp profile (if malicious) could very easily just return 0 or a silly return code (or something even more clever with seccomp-bpf) and trick us without this patch, there isn't much of a significant change in how much seccomp can trick us with or without this patch. Given all of that over-analysis, I'm pretty convinced there isn't a security problem in this very specific case and it will help out the ChromeOS folks by allowing Docker to run inside their LXC container setup. I'd be happy to be proven wrong. Ref: https://bugs.chromium.org/p/chromium/issues/detail?id=860565 Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-09-17 21:38:30 +10:00
Ace-Tang	5963cf2afc	test: add more test case for CleanPath Signed-off-by: Ace-Tang <aceapril@126.com>	2018-09-14 21:37:12 +08:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Michael Crosby	70ca035aa6	Merge pull request #1883 from lifubang/containeridinpath fix delete other file bug when container id is ..	2018-09-05 13:43:21 -04:00
Mrunal Patel	9cda583235	Merge pull request #1832 from giuseppe/runc-drop-invalid-proc-destination-with-chroot linux: drop check for /proc as invalid dest	2018-09-04 09:26:21 -07:00
Michael Crosby	784b601f68	Merge pull request #1882 from accepting/dev libcontainer: add /proc/loadavg to the white list of bind mount	2018-09-04 11:27:45 -04:00
Lifubang	4eb30fcdbe	code optimization: use securejoin.SecureJoin and CleanPath Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-09-04 09:02:18 +08:00
Lifubang	4fae8fcce2	code optimization after review Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-09-03 23:27:31 +08:00
Lifubang	d2d226e8f9	fix unexpected delete bug when container id is .. Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-08-31 11:17:42 +08:00
Michael Crosby	fdd8055cdd	Merge pull request #1868 from rhatdan/man Add --rootless option to man page	2018-08-30 20:28:02 -04:00
ChangFeng	3ce8fac7c4	libcontainer: add /proc/loadavg to the white list of bind mount Signed-off-by: JunLi <lijun.git@gmail.com>	2018-08-30 21:30:23 +08:00
Giuseppe Scrivano	636b664027	linux: drop check for /proc as invalid dest it is now allowed to bind mount /proc. This is useful for rootless containers when the PID namespace is shared with the host. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-08-30 09:56:18 +02:00
Akihiro Suda	b34d6d8a7c	libcontainer: CurrentGroupSubGIDs -> CurrentUserSubGIDs subgid is defined per user, not group (see subgid(5)) This commit also adds support for specifying subuid owner with a numeric UID. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-08-29 07:46:03 +09:00
Michael Crosby	1555a78945	Merge pull request #1874 from mrunalp/drop_unused_code Remove unused veth setup code	2018-08-27 11:07:25 -04:00
Qiang Huang	0228707b77	Merge pull request #1873 from rhatdan/ms_move When doing a copyup, /tmp can not be a shared mount point	2018-08-27 10:08:53 +08:00
Mrunal Patel	fe3d5c4c6e	Remove unused veth setup code Networking is setup by plugins for users of runc so it makes sense to get rid of the veth strategy. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2018-08-24 15:41:52 -07:00
Michael Crosby	459bfaec1f	Merge pull request #1849 from adrianreber/master Add support to checkpoint and restore into external network namespaces	2018-08-23 10:46:37 -04:00
Adrian Reber	832ac8a538	tests: add external network namespace tests This adds a new CRIU based checkpoint/restore test to check if the restored container runs in the same network namespace as before. Signed-off-by: Adrian Reber <areber@redhat.com>	2018-08-22 23:27:20 +02:00
Adrian Reber	fa43a72aba	criu: restore into existing namespace when specified Using CRIU to checkpoint and restore a container into an existing network namespace is not possible. If the network namespace is defined like { "type": "network", "path": "/run/netns/test" } there is the expectation that the restored container is again running in the network namespace specified with 'path'. This adds the new CRIU 'external namespace' feature to runc, where during checkpointing that specific namespace is referenced and during restore CRIU tries to restore the container in exactly that namespace. This breaks/fixes current runc behavior. If, without this patch, runc restores a container with such a network namespace definition, it is ignored and CRIU recreates a network namespace without a name. With this patch runc uses the network namespace path (if available) to checkpoint and restore the container in just that network namespace. Restore will now fail if a container was checkpointed with a network namespace path set and if that network namespace path does not exist during restore. runc still falls back to the old behavior if CRIU older than 3.11 is installed. Fixes #1786 Related to https://github.com/projectatomic/libpod/pull/469 Thanks to Andrei Vagin for all the help in getting the interface between CRIU and runc right! Signed-off-by: Adrian Reber <areber@redhat.com>	2018-08-22 23:27:20 +02:00
Michael Crosby	308daade45	Merge pull request #1854 from KentaTada/add-docker-proxy-settings-for-test-in-makefile Add docker proxy settings for make test in a proxy environment	2018-08-22 13:51:47 -04:00
Kenta Tada	b399167f2c	Add docker proxy settings for make test in a proxy environment This commit modifies Makefile to execute `make test` in a proxy environment. Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>	2018-08-22 18:19:48 +09:00
Qiang Huang	6b8de90552	Merge pull request #1855 from KentaTada/add-an-explanation-for-testpath-flags Add an explanation for TESTPATH	2018-08-22 15:52:57 +08:00
Michael Crosby	9744d7958a	Merge pull request #1871 from Ace-Tang/add_mask_restore cr: don't restore net namespace by default	2018-08-21 10:05:14 -04:00
Daniel J Walsh	62a4763a7a	When doing a copyup, /tmp can not be a shared mount point MOVE_MOUNT will fail under certain situations. You are not allowed to MS_MOVE if the parent directory is shared. man mount ... The move operation Move a mounted tree to another place (atomically). The call is: mount --move olddir newdir This will cause the contents which previously appeared under olddir to now be accessible under newdir. The physical location of the files is not changed. Note that olddir has to be a mountpoint. Note also that moving a mount residing under a shared mount is invalid and unsupported. Use findmnt -o TARGET,PROPAGATION to see the current propagation flags. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2018-08-20 17:41:06 -04:00
Ace-Tang	4803faf00e	cr: don't restore net namespace by default Since runc doesn't manage net devices and their configuration, and checkpoint also doesn't dump the net namespace by default, set 'nsmask = unix.CLONE_NEWNET' by default in restore. Otherwise, if the user does not pass 'empty-ns network', criu will spend extra time in restore. Signed-off-by: Ace-Tang <aceapril@126.com>	2018-08-17 16:03:21 +08:00
Daniel J Walsh	cb3e35b589	Add missing data to man page Add create command Add --rootless option to man page Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2018-08-15 20:21:13 -04:00
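
The memory bandwidth schema described in commit 27560ace2f ("MB:<cache_id0>=bandwidth0;...", with control steps of min_bw + N * bw_gran) can be sketched in Go as follows. This is only an illustration, not runc's actual implementation: the function names `parseMemBwSchema` and `roundToStep` are hypothetical, rounding up is assumed from "rounded to the next control step", and the cap at 100% is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemBwSchema (hypothetical helper) parses a schema line such as
// "MB:0=20;1=70" into a map from L3 cache ID to bandwidth percentage.
func parseMemBwSchema(schema string) (map[int]int, error) {
	const prefix = "MB:"
	if !strings.HasPrefix(schema, prefix) {
		return nil, fmt.Errorf("schema %q does not start with %q", schema, prefix)
	}
	out := make(map[int]int)
	for _, entry := range strings.Split(strings.TrimPrefix(schema, prefix), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		bw, err := strconv.Atoi(strings.TrimSpace(parts[1]))
		if err != nil {
			return nil, fmt.Errorf("bad bandwidth in %q: %v", entry, err)
		}
		out[id] = bw
	}
	return out, nil
}

// roundToStep (hypothetical helper) rounds a requested percentage up to the
// next control step minBw + N*gran. Capping at 100 is an assumption here.
func roundToStep(req, minBw, gran int) int {
	if gran <= 0 || req <= minBw {
		return minBw
	}
	n := (req - minBw + gran - 1) / gran // ceiling division
	step := minBw + n*gran
	if step > 100 {
		step = 100
	}
	return step
}

func main() {
	// Values from the commit's example: min_bandwidth 10%, bandwidth_gran 10%.
	schema, err := parseMemBwSchema("MB:0=20;1=70")
	if err != nil {
		panic(err)
	}
	for _, id := range []int{0, 1} {
		fmt.Printf("cache %d: requested %d%%, step %d%%\n",
			id, schema[id], roundToStep(schema[id], 10, 10))
	}
}
```

On real hardware the two parameters would be read from "info/MB/min_bandwidth" and "info/MB/bandwidth_gran" rather than hard-coded.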

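The `RootlessEUID` / `RootlessCgroups` split described in commit 06f789cf26 can be summarised as a small decision function. The sketch below is a reading of the commit message only, not runc's code: the type `rootlessFlags`, the function `decideRootless`, and the `inUserNS` parameter are hypothetical names, and the `--rootless` CLI override is left out.

```go
package main

import "fmt"

// rootlessFlags mirrors the two booleans named in the commit message.
// The struct itself is hypothetical and exists only for this sketch.
type rootlessFlags struct {
	EUID    bool // runc runs as a non-root user (euid != 0)
	Cgroups bool // runc is unlikely to have full access to cgroups
}

// decideRootless (hypothetical) applies the rules from the commit message:
// Cgroups is false only for euid == 0 in the initial user namespace.
func decideRootless(euid int, inUserNS bool) rootlessFlags {
	return rootlessFlags{
		EUID:    euid != 0,
		Cgroups: euid != 0 || inUserNS,
	}
}

// stateRootless follows the note about state.json: `rootless` is true
// only when both flags are true.
func stateRootless(f rootlessFlags) bool {
	return f.EUID && f.Cgroups
}

func main() {
	cases := []struct {
		name     string
		euid     int
		inUserNS bool
	}{
		{"root, initial namespace", 0, false},
		{"root in userns (e.g. Docker-in-LXD)", 0, true},
		{"non-root user", 1000, false},
	}
	for _, c := range cases {
		f := decideRootless(c.euid, c.inUserNS)
		fmt.Printf("%s: RootlessEUID=%v RootlessCgroups=%v state.json rootless=%v\n",
			c.name, f.EUID, f.Cgroups, stateRootless(f))
	}
}
```

The middle case is the one the commit targets: root inside a user namespace is treated like "rootful" runc except that cgroups errors are tolerated.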

3795 Commits
