jasder/runc - runc - 军科 Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
W. Trevor King	2bea4c897e	libcontainer/system/proc: Add Stat_t.State And Stat_t.PID and Stat_t.Name while we're at it. Then use the new .State property in runType to distinguish between running and zombie/dead processes, since kill(2) does not [1]. With this change we no longer claim Running status for zombie/dead processes. I've also removed the kill(2) call from runType. It was originally added in `13841ef3` (new-api: return the Running state only if the init process is alive, 2014-12-23), but we've been accessing /proc/[pid]/stat since `14e95b2a` (Make state detection precise, 2016-07-05, #930), and with the /stat access the kill(2) check is redundant. I also don't see much point to the previously-separate doesInitProcessExist, so I've inlined that logic in runType. It would be nice to distinguish between "/proc/[pid]/stat doesn't exist" and errors parsing its contents, but I've skipped that for the moment. The Running -> Stopped change in checkpoint_test.go is because the post-checkpoint process is a zombie, and with this commit zombie processes are Stopped (and no longer Running). [1]: https://github.com/opencontainers/runc/pull/1483#issuecomment-307527789 Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
W. Trevor King	75d98b26b7	libcontainer: Replace GetProcessStartTime with Stat_t.StartTime And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the foreseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Mrunal Patel	639454475c	Merge pull request #1355 from avagin/cr-console Dump and restore containers with external terminals	2017-05-18 11:22:52 -07:00
Harshal Patil	22953c122f	Remove redundant declaration of namespace slice Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>	2017-05-02 10:04:57 +05:30
Andrei Vagin	73258813d3	cr: set a freezer cgroup for criu A freezer cgroup allows CRIU to dump processes faster. If a user wants to checkpoint a container and its storage, they have to pause the container, and in this case we need to pass the path of its freezer cgroup to "criu dump". Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-05-02 04:48:47 +03:00
Andrei Vagin	1c43d091a1	checkpoint: add support for containers with terminals CRIU was extended to report about orphaned master pty-s via RPC. Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-05-02 04:48:47 +03:00
Andrei Vagin	d307e85dbb	Print the criu version in an error message Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-05-01 21:45:23 +03:00
Harshal Patil	c44d4fa6ed	Optimizing looping over namespaces Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>	2017-04-26 11:54:43 +05:30
Qiang Huang	94cfb7955b	Merge pull request #1387 from avagin/freezer Don't try to read freezer.state from the current directory	2017-04-24 20:02:45 -05:00
Mrunal Patel	97db1eaad9	Merge pull request #1396 from harche/cstate Set container state only once during start	2017-04-17 11:32:42 -07:00
Mrunal Patel	7814a0d14b	Merge pull request #1399 from avagin/cr-cgroup restore: apply resource limits	2017-04-13 11:28:28 -07:00
Andrei Vagin	57ef30a2ae	restore: apply resource limits When C/R was implemented, it was enough to call manager.Set to apply limits and to move a task. Now .Set() and .Apply() have to be called separately. Fixes: `8a740d5391` ("libcontainer: cgroups: don't Set in Apply") Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-04-07 02:47:43 +03:00
Adrian Reber	273b7853c8	checkpoint: check if system supports pre-dumping Instead of relying on version numbers it is possible to check if CRIU actually supports certain features. This introduces an initial implementation to check if CRIU and the underlying kernel actually support dirty memory tracking for memory pre-dumping. Upstream CRIU also supports the lazy-page migration feature check and additional feature checks can be included in CRIU to reduce the version number parsing. There are also certain CRIU features which depend on the CRIU version on one side but also require certain kernel versions to actually work. CRIU knows if it can do certain things on the kernel it is running on and using the feature check RPC interface makes it easier for runc to decide if the criu+kernel combination will support that feature. Feature checking was introduced with CRIU 1.8. Running with older CRIU versions will ignore the feature check functionality and behave just like it used to. v2: - Do not use reflection to compare requested and responded features. Checking which feature is available is now hardcoded and needs to be adapted for every new feature check. The code is now much more readable and simpler. v3: - Move the variable criuFeat out of the linuxContainer struct, as it is not container specific. Now it is a global variable. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-04-06 11:17:52 +00:00
Harshal Patil	1be5d31da2	Set container state only once during start Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>	2017-04-04 15:08:04 +05:30
Aleksa Sarai	f0876b0427	libcontainer: configs: add proper HostUID and HostGID Previously Host{U,G}ID only gave you the root mapping, which isn't very useful if you are trying to do other things with the IDMaps. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	baeef29858	rootless: add rootless cgroup manager The rootless cgroup manager acts as a noop for all set and apply operations. It is just used for rootless setups. Currently this is far too simple (we need to add opportunistic cgroup management), but is good enough as a first-pass at a noop cgroup manager. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Aleksa Sarai	6bd4bd9030	*: handle unprivileged operations and !dumpable Effectively, !dumpable makes implementing rootless containers quite hard, due to a bunch of different operations on /proc/self no longer being possible without reordering everything. !dumpable only really makes sense when you are switching between different security contexts, which is only the case when we are joining namespaces. Unfortunately this means that !dumpable will still have issues in this instance, and it should only be necessary to set !dumpable if we are not joining USER namespaces (new kernels have protections that make !dumpable no longer necessary). But that's a topic for another time. This also includes code to unset and then re-set dumpable when doing the USER namespace mappings. This should also be safe because in principle processes in a container can't see us until after we fork into the PID namespace (which happens after the user mapping). In rootless containers, it is not possible to set a non-dumpable process's /proc/self/oom_score_adj (it's owned by root and thus not writeable). Thus, it needs to be set inside nsexec before we set ourselves as non-dumpable. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:19 +11:00
Andrei Vagin	88256d646d	Don't try to read freezer.state from the current directory If we try to pause a container on a system without freezer cgroups, we can find that runc tries to open ./freezer.state. It is obviously wrong. $ ./runc pause test no such directory for freezer.state $ echo FROZEN > freezer.state $ ./runc pause test container not running or created: paused Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2017-03-23 01:58:45 +03:00
Michael Crosby	00a0ecf554	Add separate console socket Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-03-16 10:23:59 -07:00
Mrunal Patel	4f9cb13b64	Update runtime spec to 1.0.0.rc5 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-03-15 11:38:37 -07:00
Qiang Huang	b7932a2e07	Remove unused ExecFifoPath In container process's Init function, we use fd + execFifoFilename to open exec fifo, so this field in init config is never used. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-09 10:58:16 +08:00
Qiang Huang	707dd48b2f	Merge pull request #1001 from x1022as/predump add pre-dump and parent-path to checkpoint	2017-02-24 10:55:06 -08:00
Qiang Huang	733563552e	Fix state when _LIBCONTAINER in environment Fixes: #1311 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-02-22 10:35:14 -08:00
Qiang Huang	805b8c73d3	Do not create exec fifo in factory.Create It should not be bound to container creation; for example, runc restore needs to create a libcontainer.Container, but it doesn't need an exec fifo. So create the exec fifo when the container is started or run, where we really need it. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-02-22 10:34:48 -08:00
Deng Guangxing	98f004182b	add pre-dump and parent-path to checkpoint CRIU uses pre-dump to implement iterative migration. A pre-dump saves only process memory information, and it needs parent-path to specify the earlier memory files. This patch adds pre-dump and parent-path arguments to runc checkpoint Signed-off-by: Deng Guangxing <dengguangxing@huawei.com> Signed-off-by: Adrian Reber <areber@redhat.com>	2017-02-14 19:45:07 +08:00
Aleksa Sarai	e034cedce7	libcontainer: init: only pass stateDirFd when creating a container If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the host filesystem from that file descriptor. This was fixed in part by `5d93fed3d2` ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: `5d93fed3d2` ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-02-02 00:41:11 +11:00
Qiang Huang	db99936a0e	Merge pull request #1110 from avagin/cpt-in-userns checkpoint: handle config.Devices and config.MaskPaths	2017-01-10 00:34:40 -06:00
Zhang Wei	a344b2d6a8	sync up `HookState` with OCI spec `State` `HookState` struct should follow definition of `State` in runtime-spec: * modify json name of `version` to `ociVersion`. * Remove redundant `Rootfs` field as rootfs can be retrieved from `bundlePath/config.json`. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-12-20 00:00:43 +08:00
Mrunal Patel	34f23cb99c	Merge pull request #1018 from cyphar/console-rewrite Consoles, consoles, consoles.	2016-12-07 14:37:19 -08:00
Xianlu Bird	e2e6f58e4e	Fix typo Fix typo	2016-12-01 15:23:58 +08:00
Aleksa Sarai	244c9fc426	*: console rewrite This implements {createTTY, detach} and all of the combinations and negations of the two that were previously implemented. There are some valid questions about out-of-OCI-scope topics like !createTTY and how things should be handled (why do we dup the current stdio to the process, and how is that not a security issue). However, these will be dealt with in a separate patchset. In order to allow for late console setup, split setupRootfs into the "preparation" section where all of the mounts are created and the "finalize" section where we pivot_root and set things as ro. In between the two we can set up all of the console mountpoints and symlinks we need. We use two-stage synchronisation to ensure that when the syscalls are reordered in a suboptimal way, an out-of-place read() on the parentPipe will not gobble the ancillary information. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Michael Crosby	e58671e530	Add --all flag to kill This allows a user to send a signal to all the processes in the container within a single atomic action to avoid new processes being forked off before the signal can be sent. This is basically taking functionality that we already use behind `delete` and exposing it to the `kill` command by adding a flag. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-11-08 09:35:02 -08:00
Andrei Vagin	040fb7311c	checkpoint: handle config.Devices and config.MaskPaths In user namespaces devices are bind-mounted from the host, so we need to add them as external mounts for CRIU. Reported-by: Ross Boucher <boucher@gmail.com> Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>	2016-10-26 23:50:54 +03:00
Aleksa Sarai	2cd9c31b99	nsenter: guarantee correct user namespace ordering Depending on your SELinux setup, the order in which you join namespaces can be important. In general, user namespaces should always be joined and unshared first because then the other namespaces are correctly pinned and you have the right privileges within them. This also is very useful for rootless containers, as well as older kernels that had essentially broken unshare(2) and clone(2) implementations. This also includes huge refactorings in how we spawn processes for complicated reasons that I don't want to get into because it will make me spiral into a cloud of rage. The reasoning is in the giant comment in clone_parent. Have fun. In addition, because we now create multiple children with CLONE_PARENT, we cannot wait for them to SIGCHLD us in the case of a death. Thus, we have to resort to having a child kindly send us their exit code before they die. Hopefully this all works okay, but at this point there's not much more that we can do. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-04 16:17:55 +11:00
Aleksa Sarai	ed053a740c	nsenter: specify namespace type in setns() This prevents us from running into cases where libcontainer thinks that a particular namespace file is a different type, and makes it a fatal error rather than causing broken functionality. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-10-04 16:17:55 +11:00
Wang Long	59a241f647	update the comment for container.Pause() method on linux If a container's state is running or created, the container.Pause() method can set the state to pausing, and then paused. This patch updates the comment so that it is consistent with the code. Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-09-20 10:49:04 +08:00
Qiang Huang	1e319efa36	Merge pull request #815 from rajasec/basecont-comments Updated the libcontainer interface comments	2016-08-26 09:43:50 +08:00
Michael Crosby	46d9535096	Merge pull request #934 from macrosheep/fix-initargs Fix and refactor init args	2016-08-24 10:06:01 -07:00
rajasec	1ea17d73fe	Updated the libcontainer interface comments Signed-off-by: rajasec <rajasec79@gmail.com>	2016-08-23 19:14:27 +05:30
Phil Estes	85f4d20b44	Restored-from-checkpoint containers should have a start time Set the start time similar to a brand new container. Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2016-08-21 18:15:18 -04:00
Qiang Huang	41b12c095b	Merge pull request #913 from cloudfoundry-incubator/addgroupsnocompatible Let the user explicitly specify `additionalGids` on `runc exec`	2016-07-15 10:12:31 +08:00
Yang Hongyang	a59d63c5d3	Fix and refactor init args 1. According to docs of Cmd.Path and Cmd.Args from package "os/exec": Path is the path of the command to run. Args holds command line arguments, including the command as Args[0]. We have mixed usage of args. In InitPath(), InitArgs only takes arguments; in InitArgs(), InitArgs includes the command as Args[0]. This is confusing. 2. InitArgs() already has the ability to configure a LinuxFactory with the provided absolute path to the init binary and arguments as InitPath() does. 3. exec.Command() will take care of searching for the executable path. 4. The default "/proc/self/exe" instead of os.Args[0] is passed to InitArgs in order to allow relative path for the runC binary. Signed-off-by: Yang Hongyang <imhy.yang@gmail.com>	2016-07-06 23:21:02 -04:00
Qiang Huang	14e95b2aa9	Make state detection precise Fixes: https://github.com/opencontainers/runc/issues/871 Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-07-05 08:24:13 +08:00
Petar Petrov	f9b72b1b46	Allow additional groups to be overridden in exec Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com> Signed-off-by: Petar Petrov <pppepito86@gmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2016-06-21 10:35:11 +03:00
Mrunal Patel	f5b6ff23b8	Merge pull request #881 from rajasec/update-status Update for stopped container	2016-06-13 16:05:25 -07:00
Michael Crosby	3aacff695d	Use fifo for create/start This removes the use of a signal handler and SIGCONT to signal the init process to exec the users process. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 11:26:53 -07:00
rajasec	12869604ca	Update for stopped container Signed-off-by: rajasec <rajasec79@gmail.com>	2016-06-04 22:08:08 +05:30
Michael Crosby	1d61abea46	Allow delete of created container Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-02 12:26:12 -07:00
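
Commit `2bea4c897e` distinguishes running processes from zombie or dead ones. It does this by reading the State field of /proc/[pid]/stat instead of relying on kill(2). The following is a minimal Go sketch of that parsing.

Assumptions: `procState` and `isAlive` are hypothetical helper names, not libcontainer's `Stat_t` API, and the sample stat line is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// procState returns the one-character State field (field 3) of a
// /proc/[pid]/stat line. Field 2 (comm) is wrapped in parentheses and may
// itself contain spaces or ')', so the line is split after the LAST ')'.
func procState(stat string) (byte, error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, fmt.Errorf("malformed stat line: no closing ')'")
	}
	fields := strings.Fields(stat[i+1:])
	if len(fields) == 0 || len(fields[0]) != 1 {
		return 0, fmt.Errorf("malformed stat line: no state field")
	}
	return fields[0][0], nil
}

// isAlive reports whether a process in the given state should count as
// running. Zombie ('Z') and dead ('X'; some older kernels also use 'x')
// processes are treated as stopped, even though kill(pid, 0) still
// succeeds for a zombie that has not been reaped.
func isAlive(state byte) bool {
	switch state {
	case 'Z', 'X', 'x':
		return false
	}
	return true
}

func main() {
	st, err := procState("1234 (my proc) Z 1 1234 1234 0 -1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("state=%c alive=%v\n", st, isAlive(st)) // state=Z alive=false
}
```

Splitting on the last ')' matters because a process can name itself something like "a) b", which would shift every field if the line were split on whitespace alone.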
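
Commit `75d98b26b7` turns the process start time into a uint64 taken from /proc/[pid]/stat. Below is a hedged Go sketch that extracts field 22 (starttime) from such a line.

Assumptions: `procStartTime` is a hypothetical helper, and the sample line is invented for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// procStartTime returns field 22 (starttime, measured in clock ticks
// since boot) of a /proc/[pid]/stat line as a uint64.
func procStartTime(stat string) (uint64, error) {
	i := strings.LastIndexByte(stat, ')')
	if i < 0 {
		return 0, fmt.Errorf("malformed stat line: no closing ')'")
	}
	// Fields after the comm start at field 3 (state), so field 22 is
	// at index 22-3 = 19.
	const idx = 22 - 3
	fields := strings.Fields(stat[i+1:])
	if len(fields) <= idx {
		return 0, fmt.Errorf("stat line has only %d fields after comm", len(fields))
	}
	return strconv.ParseUint(fields[idx], 10, 64)
}

func main() {
	line := "42 (sh) S 1 42 42 0 -1 4194560 100 0 0 0 5 3 0 0 20 0 1 0 987654"
	t, err := procStartTime(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(t) // 987654
}
```

Keeping the value numeric means start times compare correctly. As strings, "100" would sort before "99".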
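
Commit `f0876b0427` generalises Host{U,G}ID beyond the root mapping. A hedged Go sketch of translating an arbitrary container ID through a list of ID mappings follows.

Assumptions: `IDMap` here is a local, simplified type, and `hostID` is a hypothetical helper, not libcontainer's actual signature.

```go
package main

import "fmt"

// IDMap describes one user-namespace ID mapping: Size IDs starting at
// ContainerID inside the container map to IDs starting at HostID.
type IDMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// hostID translates any container ID, not only root (0), to its host ID.
func hostID(containerID int, maps []IDMap) (int, error) {
	for _, m := range maps {
		if containerID >= m.ContainerID && containerID < m.ContainerID+m.Size {
			return m.HostID + (containerID - m.ContainerID), nil
		}
	}
	return -1, fmt.Errorf("container ID %d is not mapped", containerID)
}

func main() {
	maps := []IDMap{
		{ContainerID: 0, HostID: 1000, Size: 1},
		{ContainerID: 1, HostID: 100000, Size: 65536},
	}
	root, _ := hostID(0, maps)
	other, _ := hostID(5, maps)
	fmt.Println(root, other) // 1000 100004
}
```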
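
Commit `88256d646d` fixes runc opening ./freezer.state on systems without freezer cgroups. One plausible way such a relative path arises is joining an empty cgroup directory with the file name. That mechanism is an assumption made here; the commit does not state it. The Go sketch below shows the pitfall and a guard against it.

Assumption: `freezerStatePath` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// freezerStatePath returns the path of freezer.state inside a freezer
// cgroup directory. With an empty directory, filepath.Join would return
// the relative path "freezer.state", which open(2) resolves against the
// current working directory, so that case is reported as an error instead.
func freezerStatePath(cgroupDir string) (string, error) {
	if cgroupDir == "" {
		return "", errors.New("freezer cgroup is not available")
	}
	return filepath.Join(cgroupDir, "freezer.state"), nil
}

func main() {
	fmt.Println(filepath.Join("", "freezer.state")) // freezer.state (relative)
	if _, err := freezerStatePath(""); err != nil {
		fmt.Println("error:", err)
	}
}
```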
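
Commit `a344b2d6a8` renames the JSON key of the hook state's version to `ociVersion` and drops `Rootfs`. The Go sketch below shows how a struct tag controls that key.

Assumptions: `hookState` is a hypothetical, trimmed-down struct. Field names other than `ociVersion` are illustrative and not copied from libcontainer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hookState is a simplified, hypothetical version of the state passed to
// hooks. The version field serialises as "ociVersion", as in the
// runtime-spec State, and there is no Rootfs field: the root filesystem
// can be looked up through the bundle's config.json instead.
type hookState struct {
	Version string `json:"ociVersion"`
	ID      string `json:"id"`
	Pid     int    `json:"pid"`
	Bundle  string `json:"bundle"`
}

// encode marshals the state; encoding/json emits fields in struct order.
func encode(s hookState) (string, error) {
	b, err := json.Marshal(s)
	return string(b), err
}

func main() {
	out, err := encode(hookState{Version: "1.0.0-rc5", ID: "test", Pid: 1234, Bundle: "/run/bundles/test"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
	// {"ociVersion":"1.0.0-rc5","id":"test","pid":1234,"bundle":"/run/bundles/test"}
}
```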
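
Commit `ed053a740c` makes nsenter pass an explicit namespace type to setns(). With a non-zero nstype, setns(2) fails with EINVAL when the file descriptor refers to a different kind of namespace.

The Go sketch below only illustrates the mapping from /proc/[pid]/ns entry names to the CLONE_NEW* values used as nstype. Assumptions: the constants are copied locally so the sketch compiles anywhere, and `nsTypeFlag` is hypothetical.

```go
package main

import "fmt"

// CLONE_NEW* values from linux/sched.h, defined locally so the sketch
// compiles on any platform.
const (
	cloneNewNS     = 0x00020000
	cloneNewCgroup = 0x02000000
	cloneNewUTS    = 0x04000000
	cloneNewIPC    = 0x08000000
	cloneNewUser   = 0x10000000
	cloneNewPID    = 0x20000000
	cloneNewNet    = 0x40000000
)

// nsTypeFlag maps a /proc/[pid]/ns/<name> entry to the nstype argument
// for setns(2), so a mix-up becomes a hard error instead of silently
// joining the wrong kind of namespace.
func nsTypeFlag(name string) (int, error) {
	switch name {
	case "mnt":
		return cloneNewNS, nil
	case "cgroup":
		return cloneNewCgroup, nil
	case "uts":
		return cloneNewUTS, nil
	case "ipc":
		return cloneNewIPC, nil
	case "user":
		return cloneNewUser, nil
	case "pid":
		return cloneNewPID, nil
	case "net":
		return cloneNewNet, nil
	}
	return 0, fmt.Errorf("unknown namespace type %q", name)
}

func main() {
	f, err := nsTypeFlag("net")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#x\n", f) // 0x40000000
}
```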
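
Commit `3aacff695d` replaces the SIGCONT handshake between create and start with a FIFO. The property it relies on is that opening a FIFO blocks until the other end is opened too. The Go sketch below shows that handshake inside a single process, with a goroutine standing in for the container's init process.

Assumptions: the file name `exec.fifo`, the "0" payload, and the helper names are illustrative. The sketch requires a Unix-like OS with `syscall.Mkfifo`.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// waitForStart plays the init side: opening the FIFO for writing blocks
// until another process (or goroutine) opens it for reading.
func waitForStart(fifo string) error {
	f, err := os.OpenFile(fifo, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write([]byte("0"))
	return err
}

// start plays the `runc start` side: opening the FIFO for reading releases
// the blocked writer, and ReadAll returns once the writer closes its end.
func start(fifo string) (string, error) {
	f, err := os.Open(fifo)
	if err != nil {
		return "", err
	}
	defer f.Close()
	b, err := io.ReadAll(f)
	return string(b), err
}

// demo creates a FIFO in a temporary directory and runs both sides.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "fifo-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	fifo := filepath.Join(dir, "exec.fifo")
	if err := syscall.Mkfifo(fifo, 0o600); err != nil {
		return "", err
	}
	errc := make(chan error, 1)
	go func() { errc <- waitForStart(fifo) }()
	got, err := start(fifo)
	if err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return got, nil
}

func main() {
	got, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("start read %q\n", got) // start read "0"
}
```

Unlike a signal, the blocking open cannot be lost or delivered early, which is the advantage the commit describes over the SIGCONT approach.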

114 Commits