jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Giuseppe Scrivano	d8b669400a	rootless: allow multiple user/group mappings Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPU support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). More information about Intel RDT/CAT can be found in section 17.17 of the Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux 4.10 kernel or newer, the interface is defined and exposed via the "resource control" filesystem, which is a "cgroup-like" interface. Compared with cgroups, it has a similar process management lifecycle and interfaces in a container. But unlike cgroups' hierarchy, it has a single-level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | |-- cbm_mask | |-- min_cbm_bits | |-- num_closids |-- cpus |-- schemata |-- tasks |-- <container_id> |-- cpus |-- schemata |-- tasks For runc, we can make use of `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it is in the root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM).
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. A valid L3 cache CBM is a contiguous set of bits, and the number of bits that can be set is less than the max bit. The max bits in the CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. The kernel will check if it is valid when writing. E.g., 0xfffff in root indicates the max bits of CBM is 20 bits, which maps to the entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc. For more information about Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% of L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Aleksa Sarai	7d66aab77a	init: switch away from stateDirFd entirely While we have significant protections in place against CVE-2016-9962, we still were holding onto a file descriptor that referenced the host filesystem. This meant that in certain scenarios it was still possible for a semi-privileged container to gain access to the host filesystem (if they had CAP_SYS_PTRACE). Instead, open the FIFO itself using a O_PATH. This allows us to reference the FIFO directly without providing the ability for directory-level access. When opening the FIFO inside the init process, open it through procfs to re-open the actual FIFO (this is currently the only supported way to open such a file descriptor). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-25 13:19:03 +10:00
Aleksa Sarai	7cfb107f2c	factory: use e{u,g}id as the owner of /run/runc/$id It appears as though these semantics were not fully thought out when implementing them for rootless containers. It is not necessary (and could be potentially dangerous) to set the owner of /run/ctr/$id to be the root inside the container (if user namespaces are being used). Instead, just use the e{g,u}id of runc to determine the owner. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-07-12 06:30:46 +10:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the foreseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Harshal Patil	700c74cb7e	Issue #1429 : Removing check for id string length Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>	2017-05-04 09:21:29 +05:30
Aleksa Sarai	f0876b0427	libcontainer: configs: add proper HostUID and HostGID Previously Host{U,G}ID only gave you the root mapping, which isn't very useful if you are trying to do other things with the IDMaps. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	baeef29858	rootless: add rootless cgroup manager The rootless cgroup manager acts as a noop for all set and apply operations. It is just used for rootless setups. Currently this is far too simple (we need to add opportunistic cgroup management), but is good enough as a first-pass at a noop cgroup manager. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Michael Crosby	00a0ecf554	Add separate console socket Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-03-16 10:23:59 -07:00
Qiang Huang	805b8c73d3	Do not create exec fifo in factory.Create It should not be bound to container creation; for example, runc restore needs to create a libcontainer.Container, but it won't need exec fifo. So create exec fifo when container is started or run, where we really need it. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-02-22 10:34:48 -08:00
Qiang Huang	be33383e60	Merge pull request #1293 from stevenh/resolve-initarg Resolve InitArgs to ensure init works	2017-02-03 19:25:52 +08:00
Steven Hartland	b9dfa444c4	Resolve InitArgs to ensure init works If a relative path to the exe is used for InitArgs, init will fail to run if Cwd is not set to the original path. Prevent failure of init to run by ensuring that exe in InitArgs is an absolute path. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-02-01 13:42:09 +00:00
Aleksa Sarai	e034cedce7	libcontainer: init: only pass stateDirFd when creating a container If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the host filesystem from that file descriptor. This was fixed in part by `5d93fed3d2` ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: `5d93fed3d2` ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-02-02 00:41:11 +11:00
Steven Hartland	64aa78b762	Ensure pipe is always closed on error in StartInitialization Ensure that the pipe is always closed during the error processing of StartInitialization. Also: * Fix a comment typo. * Use newContainerInit directly as there's no need for i to be an initer. * Move the comment about the behaviour of Init() directly above it, clarifying what happens for all defers. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-01-25 12:36:40 +00:00
Aleksa Sarai	4776b4326a	libcontainer: refactor syncT handling To make the code cleaner, and more clear, refactor the syncT handling used when creating the `runc init` process. In addition, document the state changes so that people actually understand what is going on. Rather than only using syncT for the standard initProcess, use it for both initProcess and setnsProcess. This removes some special cases, as well as allowing for the use of syncT with setnsProcess. Also remove a bunch of the boilerplate around syncT handling. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:46:04 +11:00
Michael Crosby	fcc40b7a63	Remove panic from init Print the error message to stderr if we are unable to return it back via the pipe to the parent process. Also, don't panic here as it is most likely a system or user error and not a programmer error. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-10-17 15:54:51 -07:00
Qiang Huang	3597b7b743	Merge pull request #1087 from williammartin/master Fix typo when container does not exist	2016-09-29 09:19:45 +08:00
William Martin	152169ed34	Fix typo when container does not exist Signed-off-by: William Martin <wmartin@pivotal.io>	2016-09-28 11:00:50 +00:00
Peng Gao	c5393da813	Refactor enum map range to slice range grep -r "range map" shows 3 parts use map to range enum types, use slice instead can get better performance and less memory usage. Signed-off-by: Peng Gao <peng.gao.dut@gmail.com>	2016-09-28 15:36:29 +08:00
rajasec	714550f87c	Error handling when container not exists Signed-off-by: rajasec <rajasec79@gmail.com>	2016-08-26 00:00:54 +05:30
Michael Crosby	46d9535096	Merge pull request #934 from macrosheep/fix-initargs Fix and refactor init args	2016-08-24 10:06:01 -07:00
Yang Hongyang	a59d63c5d3	Fix and refactor init args 1. According to docs of Cmd.Path and Cmd.Args from package "os/exec": Path is the path of the command to run. Args holds command line arguments, including the command as Args[0]. We have mixed usage of args. In InitPath(), InitArgs only takes arguments; in InitArgs(), InitArgs includes the command as Args[0]. This is confusing. 2. InitArgs() already has the ability to configure a LinuxFactory with the provided absolute path to the init binary and arguments as InitPath() does. 3. exec.Command() will take care of searching the executable path. 4. The default "/proc/self/exe" instead of os.Args[0] is passed to InitArgs in order to allow relative path for the runC binary. Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>	2016-07-06 23:21:02 -04:00
Yang Hongyang	9ade2cc5ce	libcontainer: Add a helper func to set CriuPath Added a helper func to set CriuPath for LinuxFactory. Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>	2016-07-06 22:58:55 -04:00
Qiang Huang	14e95b2aa9	Make state detection precise Fixes: https://github.com/opencontainers/runc/issues/871 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-07-05 08:24:13 +08:00
Michael Crosby	5ce88a95f6	Fix fifo usage with userns Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 20:20:48 -07:00
Michael Crosby	3aacff695d	Use fifo for create/start This removes the use of a signal handler and SIGCONT to signal the init process to exec the users process. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 11:26:53 -07:00
Michael Crosby	efcd73fb5b	Fix signal handling for unit tests Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:10:47 -07:00
Michael Crosby	3fc929f350	Only create a buffered channel of one Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Michael Crosby	30f1006b33	Fix libcontainer states Move initialized to created and destroyed to stopped. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Michael Crosby	3fe7d7f31e	Add create and start command for container lifecycle Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Alexander Morozov	d57898610b	Merge pull request #675 from pankit/master Allow + in container ID	2016-05-25 10:35:08 -07:00
Tonis Tiigi	78ecdfe18e	Show proper error from init process panic Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-03-22 15:57:15 -07:00
pankit thapar	4629512d89	Allow + in container ID Signed-off-by: pankit thapar <pankit@umich.edu>	2016-03-22 11:40:55 -04:00
rajasec	d4be3405c7	Fixing valid-id in regex Signed-off-by: rajasec <rajasec79@gmail.com>	2016-03-14 08:48:41 +05:30
Michael Crosby	213c1a1a4a	Revert "Return proper exit code for exec errors" This reverts commit `6bb653a6e8`. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-03-10 11:00:48 -08:00
Michael Crosby	6bb653a6e8	Return proper exit code for exec errors Exec errors from the exec() syscall in the container's init should be treated as if the container ran but couldn't execute the process for the user instead of returning a libcontainer error as if it was an issue in the library. Before, specifying different commands like `/etc`, `asldfkjasdlfj`, or `/alsdjfkasdlfj` would always return 1 on the command line with a libcontainer specific error message. Now they return the correct message and exit status defined for unix processes. Example: ```bash root@deathstar:/containers/redis# runc start test exec: "/asdlfkjasldkfj": file does not exist root@deathstar:/containers/redis# echo $? 127 root@deathstar:/containers/redis# runc start test exec: "asdlfkjasldkfj": executable file not found in $PATH root@deathstar:/containers/redis# echo $? 127 root@deathstar:/containers/redis# runc start test exec: "/etc": permission denied root@deathstar:/containers/redis# echo $? 126 ``` Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-02-26 11:41:56 -08:00
Alexander Morozov	c6d18308b8	Merge pull request #526 from hqhq/hq_remove_procStart Remove procStart	2016-02-16 09:12:04 -08:00
Qiang Huang	13e8f6e589	Remove procStart It's never used and not needed. Our pipe is created with syscall.SOCK_CLOEXEC, so pipe will be closed once container process executed successfully, parent process will read EOF and continue. If container process got error before executed, we'll write procError to sync with parent. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-01-30 13:41:21 +08:00
Michael Crosby	1172a1e1e5	Update list command and created methods We don't need a CreatedTime method on the container because it's not part of the interface and can be received via the state. We also do not need to call it CreateTime because the type of this field is time.Time so we know its time. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-01-28 13:32:24 -08:00
Michael Crosby	480e5f4416	Merge pull request #507 from mikebrow/runc-ls-command adds list command	2016-01-28 13:20:07 -08:00
Mike Brown	4c871267db	adds list command, and a timestamp in the container state Signed-off-by: Mike Brown <brownwm@us.ibm.com>	2016-01-28 14:21:06 -06:00
Michael Crosby	7cd384c0e5	Merge pull request #515 from crosbymichael/readall Do not use stream encoders for pipe communication	2016-01-26 14:37:54 -08:00
Michael Crosby	ddcee3cc2a	Do not use stream encoders Marshall the raw objects for the sync pipes so that no new line chars are left behind in the pipe causing errors. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-01-26 11:22:05 -08:00
Doug Davis	ff034a5119	Remove the nullState Add a "createdState" in its place since I think that better describes what it's used for. Signed-off-by: Doug Davis <dug@us.ibm.com>	2016-01-25 00:26:11 -08:00
Michael Crosby	9c3fa7928e	Allow switch to anything from nullState Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-01-21 16:48:05 -08:00
Aleksa Sarai	103853ead7	libcontainer: set cgroup config late Due to the fact that the init is implemented in Go (which seemingly randomly spawns new processes and loves eating memory), most cgroup configurations are required to have an arbitrary minimum dictated by the init. This confuses users and makes configuration more annoying than it should. An example of this is pids.max, where Go spawns multiple processes that then cause init to violate the pids cgroup constraint before the container can even start. Solve this problem by setting the cgroup configurations as late as possible, to avoid hitting as many of the resources hogged by the Go init as possible. This has to be done before seccomp rules are applied, as the parent and child must synchronise in order for the parent to correctly set the configurations (and writes might be blocked by seccomp). Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:35 +11:00
Michael Crosby	4415446c32	Add state pattern for container state transition Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add state status() method Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Allow multiple checkpoint on restore Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Handle leave-running state Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Fix state transitions for inprocess Because the tests use libcontainer in process between the various states we need to ensure that that usecase works as well as the out of process one. Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Remove isDestroyed method Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Handling Pausing from freezer state Signed-off-by: Rajasekaran <rajasec79@gmail.com> freezer status Signed-off-by: Rajasekaran <rajasec79@gmail.com> Fixing review comments Signed-off-by: Rajasekaran <rajasec79@gmail.com> Added comment when freezer not available Signed-off-by: Rajasekaran <rajasec79@gmail.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Conflicts: libcontainer/container_linux.go Change checkFreezer logic to isPaused() Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Remove state base and factor out destroy func Signed-off-by: Michael Crosby <crosbymichael@gmail.com> Add unit test for state transitions Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-12-17 13:55:38 -08:00
Doug Davis	e5dc12a0c9	Add more context around some error cases Signed-off-by: Doug Davis <dug@us.ibm.com>	2015-10-30 10:55:48 -07:00
Alexander Morozov	d9e513043c	Use /proc/self/exe as default for InitPath Signed-off-by: Alexander Morozov <lk4d4@docker.com>	2015-07-24 11:45:09 -07:00
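
The `l3CacheSchema` format described in commit 692f6e1e27 can be illustrated with a short Go sketch. This is not runc's implementation: the helper names `parseL3Schema`, `isContiguous` and `cacheShare` are hypothetical and exist only for this example, and the root CBM of 0xfffff is taken from the commit message's two-socket example. The sketch parses a schema line, checks that each CBM is one contiguous run of bits and a subset of the root CBM, and reports the share of the cache it covers.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// parseL3Schema parses a line such as "L3:0=ffff0;1=3ff" into a map from
// L3 cache id to capacity bitmask (CBM). Hypothetical helper, not runc API.
func parseL3Schema(line string) (map[int]uint64, error) {
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with L3:", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(parts[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		// CBMs are written in hexadecimal without a 0x prefix.
		cbm, err := strconv.ParseUint(parts[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", entry, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then looks like 0b0...01...1,
	// and adding 1 to it clears every set bit.
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	return shifted&(shifted+1) == 0
}

// cacheShare returns the fraction of the root CBM's bits that cbm sets.
func cacheShare(cbm, rootCBM uint64) float64 {
	return float64(bits.OnesCount64(cbm)) / float64(bits.OnesCount64(rootCBM))
}

func main() {
	const rootCBM = 0xfffff // assumed 20-bit root mask from the commit's example

	masks, err := parseL3Schema("L3:0=ffff0;1=3ff")
	if err != nil {
		panic(err)
	}
	for _, id := range []int{0, 1} {
		cbm := masks[id]
		subset := cbm&^rootCBM == 0 // no bits outside the root mask
		fmt.Printf("cache %d: cbm=%#x contiguous=%t subset=%t share=%.0f%%\n",
			id, cbm, isContiguous(cbm), subset, 100*cacheShare(cbm, rootCBM))
	}
}
```

With the schema from the commit message, 0xffff0 sets 16 of 20 bits and 0x3ff sets 10 of 20, which matches the 80% and 50% figures quoted there. On a real system the kernel performs the validity check itself when the schema is written to the `schemata` file.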

54 Commits