jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Will Martin	ca4f427af1	Support cgroups with limits as rootless Signed-off-by: Ed King <eking@pivotal.io> Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io> Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>	2017-10-05 11:22:54 +01:00
Giuseppe Scrivano	3282f5a7c1	tests: fix for rootless multiple uids/gids Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	d8b669400a	rootless: allow multiple user/group mappings Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPU support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). More information about Intel RDT/CAT can be found in section 17.17 of the Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux 4.10 kernel or newer, the interface is defined and exposed via the "resource control" filesystem, which is a "cgroup-like" interface. Compared with cgroups, it has a similar process management lifecycle and interfaces in a container. But unlike the cgroups hierarchy, it has a single-level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | |-- cbm_mask | |-- min_cbm_bits | |-- num_closids |-- cpus |-- schemata |-- tasks |-- <container_id> |-- cpus |-- schemata |-- tasks For runc, we can make use of `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it is in the root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM). 
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. A valid L3 cache CBM is a contiguous set of bits, and the number of bits that can be set is less than the max bit. The maximum number of bits in the CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. The kernel will check if it is valid when writing. E.g., 0xfffff in root indicates the max bits of CBM is 20 bits, which maps to the entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc. For more information about Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% of L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Daniel, Dao Quang Minh	b313a75364	Merge pull request #1477 from yummypeng/save-own-ns-path Always save own namespace paths	2017-08-02 11:24:30 +01:00
Steven Hartland	ee4f68e302	Updated logrus to v1 Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen. This includes a manual edit of the docker term package to also correct the name there too. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-07-19 15:20:56 +00:00
Yuanhong Peng	e939079acf	Always save own namespace paths fix #1476 If containerA shares namespace, say ipc namespace, with containerB, then its ipc namespace path would be the same as containerB and be stored in `state.json`. Exec into containerA will just read the namespace paths stored in this file and join these namespaces. So, if containerB has already been stopped, `docker exec containerA` will fail. To address this issue, we should always save own namespace paths no matter if we share namespaces with other containers. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2017-07-13 16:13:05 +08:00
Justin Cormack	3d9074ead3	Update memory specs to use int64 not uint64 replace #1492 #1494 fix #1422 Since https://github.com/opencontainers/runtime-spec/pull/876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-06-27 12:16:07 +01:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the foreseeable future: Errno, Signal, SysProcAttr. Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Justin Cormack	4c67360296	Clean up unix vs linux usage FreeBSD does not support cgroups or namespaces, which the code suggested, and is not supported in runc anyway right now. So clean up the file naming to use `_linux` where appropriate. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-05-12 17:22:09 +01:00
Harshal Patil	22953c122f	Remove redundant declaration of namespace slice Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>	2017-05-02 10:04:57 +05:30
Daniel, Dao Quang Minh	13a8c5d140	Merge pull request #1365 from hqhq/use_go_selinux Use opencontainers/selinux package	2017-04-15 14:22:32 +01:00
Aleksa Sarai	f0876b0427	libcontainer: configs: add proper HostUID and HostGID Previously Host{U,G}ID only gave you the root mapping, which isn't very useful if you are trying to do other things with the IDMaps. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Qiang Huang	5e7b48f7c0	Use opencontainers/selinux package It has been split out as a separate project. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-23 08:21:19 +08:00
Qiang Huang	8430cc4f48	Use uint64 for resources to keep consistency with runtime-spec Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-20 18:51:39 +08:00
Mrunal Patel	4f9cb13b64	Update runtime spec to 1.0.0.rc5 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-03-15 11:38:37 -07:00
Qiang Huang	f3c16acd47	Fix go_vet errors runc/libcontainer/configs/namespaces_syscall_unsupported.go Line 7: error: unreachable code (vet) Line 14: error: unreachable code (vet) Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-01-06 10:20:27 +08:00
Zhang Wei	a344b2d6a8	sync up `HookState` with OCI spec `State` `HookState` struct should follow definition of `State` in runtime-spec: * modify json name of `version` to `ociVersion`. * Remove redundant `Rootfs` field as rootfs can be retrieved from `bundlePath/config.json`. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-12-20 00:00:43 +08:00
Samuel Ortiz	f19aa2d04d	validate: Check that the given namespace path is a symlink When checking if the provided networking namespace is the host one or not, we should first check if it's a symbolic link or not as in some cases we can use persistent networking namespace under e.g. /var/run/netns/. Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>	2016-12-10 11:14:49 +01:00
Qiang Huang	81d6088c8f	Unify rootfs validation Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-10-29 10:31:44 +08:00
Michael Crosby	6328410520	Merge pull request #1149 from cyphar/fix-sysctl-validation validator: unbreak sysctl net.* validation	2016-10-26 09:06:41 -07:00
Aleksa Sarai	1ab3c035d2	validator: actually test success Previously we only tested failures, which causes us to miss issues where setting sysctls would always fail. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-26 23:07:57 +11:00
Aleksa Sarai	2a94c3651b	validator: unbreak sysctl net.* validation When changing this validation, the code actually allowing the validation to pass was removed. This meant that any net.* sysctl would always fail to validate. Fixes: `bc84f83344` ("fix docker/docker#27484") Reported-by: Justin Cormack <justin.cormack@docker.com> Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-26 22:58:51 +11:00
Qiang Huang	157a96a428	Merge pull request #977 from cyphar/nsenter-userns-ordering nsenter: guarantee correct user namespace ordering	2016-10-26 16:45:15 +08:00
Qiang Huang	4ec570d060	Merge pull request #1138 from gaocegege/fix-config-validator docker/docker#27484-check if sysctls are used in host network mode.	2016-10-25 11:08:51 +08:00
Aleksa Sarai	c7ed2244f4	merge branch 'pr-1125' LGTMs: @hqhq @mrunalp Closes #1125	2016-10-25 10:05:28 +11:00
Ce Gao	41c35810f2	add test cases about host ns Signed-off-by: Ce Gao <ce.gao@outlook.com>	2016-10-22 11:31:15 +08:00
Ce Gao	bc84f83344	fix docker/docker#27484 Signed-off-by: Ce Gao <ce.gao@outlook.com>	2016-10-22 11:22:52 +08:00
Alexander Morozov	1ab9d5e6f4	Merge pull request #845 from mrunalp/cp_tmpfs Add support for copying up directories into tmpfs when a tmpfs is mounted over them	2016-10-21 13:47:16 -07:00
Aleksa Sarai	f8e6b5af5e	rootfs: make pivot_root not use a temporary directory Namely, use an undocumented feature of pivot_root(2) where pivot_root(".", ".") is actually a feature and allows you to make the old_root be tied to your /proc/self/cwd in a way that makes unmounting easy. Thanks a lot to the LXC developers who came up with this idea first. This is the first step of many toward allowing runC to work with a completely read-only rootfs. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-20 12:55:58 +11:00
Daniel Dao	1b876b0bf2	fix typos with misspell pipe the source through https://github.com/client9/misspell. typos be gone! Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2016-10-11 23:22:48 +00:00
Aleksa Sarai	ed053a740c	nsenter: specify namespace type in setns() This prevents us from running into cases where libcontainer thinks that a particular namespace file is a different type, and makes it a fatal error rather than causing broken functionality. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-04 16:17:55 +11:00
Mrunal Patel	f5103d311e	config: Add new Extensions flag to support custom mount options in runc Also, defines an EXT_COPYUP flag for supporting tmpfs copyup operation. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-09-30 09:46:46 -07:00
Michael Crosby	ad400bb093	Change netclassid json tag This allows older state files to be loaded without the unmarshal error of the string to int conversion. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-09-12 09:31:58 -07:00
xiekeyang	206fea7f50	remove unused code Signed-off-by: xiekeyang <xiekeyang@huawei.com>	2016-08-22 17:16:45 +08:00
Justin Cormack	834e53144b	Do not create /dev/fuse by default This device is not required by the OCI spec. The rationale for this was linked to https://github.com/docker/docker/issues/2393 So a non-functional /dev/fuse was created, and actual fuse use still requires adding the device explicitly. However even old versions of the JVM on Ubuntu 12.04 no longer require the fuse package, and this is all not needed. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2016-08-12 13:00:24 +01:00
Alexander Morozov	7679c80be5	libcontainer/configs: make hooks run safer It's possible that `cmd.Process` is still nil when we reach timeout. Start creates the `Process` field synchronously, so there is no way for such a race to occur. Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>	2016-08-08 10:16:35 -07:00
Qiang Huang	1a81e9ab1f	Merge pull request #958 from dubstack/skip-devices Skip updates on parent Devices cgroup	2016-07-29 10:31:49 +08:00
Buddha Prakash	ef4ff6a8ad	Skip updates on parent Devices cgroup Signed-off-by: Buddha Prakash <buddhap@google.com>	2016-07-25 10:30:46 -07:00
Zhao Lei	bac8b4f0b4	UNITTEST: Bypass userns test on platform without userns support We should bypass the userns test instead of reporting a failure on platforms without userns support. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>	2016-07-25 15:35:04 +08:00
Mrunal Patel	4dedd09396	Merge pull request #937 from hushan/net_cls-classid fix setting net_cls classid	2016-07-18 17:18:23 -04:00
Aleksa Sarai	aa029491be	configs: fix json tags for CpuRt* options Previously we used the same JSON tag name for the regular and realtime versions of the CpuRt* fields, which causes issues when you want to use two different values for the fields. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-07-18 17:02:30 +10:00
Hushan Jia	bb42f80a86	fix setting net_cls classid Setting classid of net_cls cgroup failed: ERRO[0000] process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument" process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument" The spec has classid as a *uint32, the libcontainer configs should match the type. Signed-off-by: Hushan Jia <hushan.jia@gmail.com>	2016-07-11 05:00:35 +08:00
Petar Petrov	f9b72b1b46	Allow additional groups to be overridden in exec Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com> Signed-off-by: Petar Petrov <pppepito86@gmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2016-06-21 10:35:11 +03:00
Michael Crosby	8c9db3a7a5	Add option to disable new session keys This adds a `--no-new-keyring` flag to run and create so that a new session keyring is not created for the container and the calling process's keyring is inherited. Fixes #818 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-03 11:53:07 -07:00
Aleksa Sarai	399175c227	Merge pull request #679 from rajasec/selinux-errorcheck Adding selinux check during container start	2016-04-24 16:24:26 +00:00
rajasec	733ff99f6d	Updating kcore in validator test Signed-off-by: rajasec <rajasec79@gmail.com>	2016-04-21 15:29:19 +05:30
rajasec	d0bf80e481	Adding selinux check during container start Signed-off-by: rajasec <rajasec79@gmail.com> Fixed review comments and rebased Signed-off-by: rajasec <rajasec79@gmail.com> updated the message as per review comment Signed-off-by: Rajasekaran <rajasec79@gmail.com>	2016-04-19 22:22:04 +05:30
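
Note on commit 692f6e1e27 (Intel RDT/CAT): the `l3CacheSchema` format and the CBM rules described in that message can be illustrated with a short Go sketch. This is not runc's implementation. The names `l3Entry`, `parseL3Schema`, `isContiguous`, `isSubset` and `sharePercent` are hypothetical, invented for this illustration, and the checks only mirror the rules as the commit message words them (contiguous bits, subset of the root CBM). The root CBM 0xfffff and the schema string come from the example in the commit message.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// l3Entry is a hypothetical type holding one "<cache_id>=<cbm>" pair.
type l3Entry struct {
	CacheID int
	CBM     uint64
}

// parseL3Schema parses a line such as "L3:0=ffff0;1=3ff".
func parseL3Schema(schema string) ([]l3Entry, error) {
	if !strings.HasPrefix(schema, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with L3:", schema)
	}
	var entries []l3Entry
	for _, pair := range strings.Split(strings.TrimPrefix(schema, "L3:"), ";") {
		parts := strings.SplitN(pair, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", pair)
		}
		id, err := strconv.Atoi(parts[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", pair, err)
		}
		// CBM values are written in hexadecimal without a 0x prefix.
		cbm, err := strconv.ParseUint(parts[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", pair, err)
		}
		entries = append(entries, l3Entry{CacheID: id, CBM: cbm})
	}
	return entries, nil
}

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	return shifted&(shifted+1) == 0
}

// isSubset reports whether every bit set in cbm is also set in root.
func isSubset(cbm, root uint64) bool {
	return cbm&^root == 0
}

// sharePercent returns the share of the cache covered by cbm, in percent.
func sharePercent(cbm uint64, maxBits int) int {
	return bits.OnesCount64(cbm) * 100 / maxBits
}

func main() {
	const rootCBM = 0xfffff // default CBM from the commit message example
	maxBits := bits.Len64(rootCBM)

	entries, err := parseL3Schema("L3:0=ffff0;1=3ff")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range entries {
		fmt.Printf("cache %d: cbm=0x%x contiguous=%t subset=%t share=%d%%\n",
			e.CacheID, e.CBM, isContiguous(e.CBM), isSubset(e.CBM, rootCBM),
			sharePercent(e.CBM, maxBits))
	}
}
```

With the example schema, 0xffff0 has 16 of 20 bits set and 0x3ff has 10 of 20, which matches the 80% and 50% figures quoted in the commit message.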


112 Commits