jasder/runc - runc - 军科开源项目托管

Commit Graph

Author	SHA1	Message	Date
Alex Fang	e92add2151	Pass back the pid of runc:[1:CHILD] so we can wait on it This allows the libcontainer to automatically clean up runc:[1:CHILD] processes created as part of nsenter. Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>	2017-08-05 13:44:36 +10:00
W. Trevor King	75d98b26b7	libcontainer: Replace GetProcessStartTime with Stat_t.StartTime And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
Daniel, Dao Quang Minh	67bd2ab554	Merge pull request #1442 from clnperez/libcontainer-sys-unix Move libcontainer to x/sys/unix	2017-05-26 12:18:33 +01:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the forseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Wentao Zhang	09c1f5c055	Fix setup cgroup before prestart hook * User Case: User could use prestart hook to add block devices to container. so the hook should have a way to set the permissions of the devices. Just move cgroup config operation before prestart hook will work. Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>	2017-05-19 17:53:43 +08:00
Aleksa Sarai	baeef29858	rootless: add rootless cgroup manager The rootless cgroup manager acts as a noop for all set and apply operations. It is just used for rootless setups. Currently this is far too simple (we need to add opportunistic cgroup management), but is good enough as a first-pass at a noop cgroup manager. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Aleksa Sarai	6bd4bd9030	*: handle unprivileged operations and !dumpable Effectively, !dumpable makes implementing rootless containers quite hard, due to a bunch of different operations on /proc/self no longer being possible without reordering everything. !dumpable only really makes sense when you are switching between different security contexts, which is only the case when we are joining namespaces. Unfortunately this means that !dumpable will still have issues in this instance, and it should only be necessary to set !dumpable if we are not joining USER namespaces (new kernels have protections that make !dumpable no longer necessary). But that's a topic for another time. This also includes code to unset and then re-set dumpable when doing the USER namespace mappings. This should also be safe because in principle processes in a container can't see us until after we fork into the PID namespace (which happens after the user mapping). In rootless containers, it is not possible to set a non-dumpable process's /proc/self/oom_score_adj (it's owned by root and thus not writeable). Thus, it needs to be set inside nsexec before we set ourselves as non-dumpable. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:19 +11:00
Michael Crosby	00a0ecf554	Add separate console socket Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-03-16 10:23:59 -07:00
Mrunal Patel	4f9cb13b64	Update runtime spec to 1.0.0.rc5 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-03-15 11:38:37 -07:00
Aleksa Sarai	e034cedce7	libcontainer: init: only pass stateDirFd when creating a container If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the host filesystem from that file descriptor. This was fixed in part by `5d93fed3d2` ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: `5d93fed3d2` ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-02-02 00:41:11 +11:00
Zhang Wei	a344b2d6a8	sync up `HookState` with OCI spec `State` `HookState` struct should follow definition of `State` in runtime-spec: * modify json name of `version` to `ociVersion`. * Remove redundant `Rootfs` field as rootfs can be retrived from `bundlePath/config.json`. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-12-20 00:00:43 +08:00
Aleksa Sarai	244c9fc426	*: console rewrite This implements {createTTY, detach} and all of the combinations and negations of the two that were previously implemented. There are some valid questions about out-of-OCI-scope topics like !createTTY and how things should be handled (why do we dup the current stdio to the process, and how is that not a security issue). However, these will be dealt with in a separate patchset. In order to allow for late console setup, split setupRootfs into the "preparation" section where all of the mounts are created and the "finalize" section where we pivot_root and set things as ro. In between the two we can set up all of the console mountpoints and symlinks we need. We use two-stage synchronisation to ensures that when the syscalls are reordered in a suboptimal way, an out-of-place read() on the parentPipe will not gobble the ancilliary information. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Aleksa Sarai	4776b4326a	libcontainer: refactor syncT handling To make the code cleaner, and more clear, refactor the syncT handling used when creating the `runc init` process. In addition, document the state changes so that people actually understand what is going on. Rather than only using syncT for the standard initProcess, use it for both initProcess and setnsProcess. This removes some special cases, as well as allowing for the use of syncT with setnsProcess. Also remove a bunch of the boilerplate around syncT handling. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:46:04 +11:00
Michael Crosby	e58671e530	Add --all flag to kill This allows a user to send a signal to all the processes in the container within a single atomic action to avoid new processes being forked off before the signal can be sent. This is basically taking functionality that we already use being `delete` and exposing it ok the `kill` command by adding a flag. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-11-08 09:35:02 -08:00
Crazykev	34d7c5c099	fix error message Signed-off-by: Crazykev <crazykev@zju.edu.cn>	2016-11-02 16:34:08 +08:00
Daniel Dao	1b876b0bf2	fix typos with misspell pipe the source through https://github.com/client9/misspell. typos be gone! Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2016-10-11 23:22:48 +00:00
Wang Long	5eaa9ed5cd	just fix a typo Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-10-11 08:38:15 +00:00
Yuanhong Peng	6ed0652ee0	Fix typo Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2016-09-21 20:13:32 +08:00
Qiang Huang	b2e811183b	Allow recrusive generic error Error sent from child process is already genericError, if we don't allow recrusive generic error, we won't get any cause infomation from parent process. Before, we got: WARN[0000] exit status 1 ERRO[0000] operation not permitted After, we got: WARN[0000] exit status 1 ERRO[0000] container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"operation not permitted\"" it's not pretty but useful for detecting root causes. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-09-14 15:55:46 +08:00
Michael Crosby	3aacff695d	Use fifo for create/start This removes the use of a signal handler and SIGCONT to signal the init process to exec the users process. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 11:26:53 -07:00
Michael Crosby	efcd73fb5b	Fix signal handling for unit tests Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:10:47 -07:00
Aleksa Sarai	1a913c7b89	*: correctly chown() consoles In user namespaces, we need to make sure we don't chown() the console to unmapped users. This means we need to get both the UID and GID of the root user in the container when changing the owner. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-05-22 22:37:13 +10:00
Michael Crosby	6978875298	Add cause to error messages This is the inital port of the libcontainer.Error to added a cause to all the existing error messages. Going forward, when an error can be wrapped because it is not being checked at the higher levels for something like `os.IsNotExist` we can add more information to the error message like cause and stack file/line information. This will help higher level tools to know what cause a container start or operation to fail. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-04-18 11:37:26 -07:00
George Lestaris	f7ae27bfb7	HookState adhears to OCI Signed-off-by: George Lestaris <glestaris@pivotal.io> Signed-off-by: Ed King <eking@pivotal.io>	2016-04-06 16:57:59 +01:00
Julian Friedman	e91b2b8aca	Set rlimits using prlimit in parent Fixes #680 This changes setupRlimit to use the Prlimit syscall (rather than Setrlimit) and moves the call to the parent process. This is necessary because Setrlimit would affect the libcontainer consumer if called in the parent, and would fail if called from the child if the child process is in a user namespace and the requested rlimit is higher than that in the parent. Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>	2016-03-25 15:11:44 +00:00
Tonis Tiigi	78ecdfe18e	Show proper error from init process panic Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-03-22 15:57:15 -07:00
Mrunal Patel	69db69668e	Set oom_score_adj before we send the config to avoid race Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-03-21 15:33:17 -07:00
Phil Estes	178bad5e71	Properly setuid/setgid after entering userns The re-work of namespace entering lost the setuid/setgid that was part of the Go-routine based process exec in the prior code. A side issue was found with setting oom_score_adj before execve() in a userns that is also solved here. Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2016-03-04 11:12:26 -05:00
Kenfe-Mickael Laventure	6325ab96e7	Call Prestart hook after namespaces have been set This simply move the call to the Prestart hooks to be made once we receive the procReady message from the client. This is necessary as we had to move the setns calls within nsexec in order to be accomodate joining namespaces that only affect future children (e.g. NEWPID). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-02-28 12:26:53 -08:00
Daniel, Dao Quang Minh	42d5d04801	Sets custom namespaces for init processes An init process can join other namespaces (pidns, ipc etc.). This leverages C code defined in nsenter package to spawn a process with correct namespaces and clone if necessary. This moves all setns and cloneflags related code to nsenter layer, which mean that we dont use Go os/exec to create process with cloneflags and set uid/gid_map or setgroups anymore. The necessary data is passed from Go to C using a netlink binary-encoding format. With this change, setns and init processes are almost the same, which brings some opportunity for refactoring. Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com> [mickael.laventure@docker.com: adapted to apply on master @ d97d5e] Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>	2016-02-28 12:26:53 -08:00
Hushan Jia	8597d5c969	Use single decoder instance for one stream This will avoid part of the stream be read and abandomed and resulting decoding errors. Signed-off-by: Hushan Jia <hushan.jia@gmail.com>	2016-02-26 19:40:35 +08:00
Mrunal Patel	2f27649848	Move pre-start hooks after container mounts Today mounts in pre-start hooks get overriden by the default mounts. Moving the pre-start hooks to after the container mounts and before the pivot/move root gives better flexiblity in the hooks. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-02-23 02:50:35 -08:00
Qiang Huang	13e8f6e589	Remove procStart It's never used and not needed. Our pipe is created with syscall.SOCK_CLOEXEC, so pipe will be closed once container process executed successfully, parent process will read EOF and continue. If container process got error before executed, we'll write procError to sync with parent. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-01-30 13:41:21 +08:00
Qiang Huang	064113363d	Fix the comment about sendConfig Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-01-28 09:58:30 +08:00
Michael Crosby	ddcee3cc2a	Do not use stream encoders Marshall the raw objects for the sync pipes so that no new line chars are left behind in the pipe causing errors. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-01-26 11:22:05 -08:00
Aleksa Sarai	103853ead7	libcontainer: set cgroup config late Due to the fact that the init is implemented in Go (which seemingly randomly spawns new processes and loves eating memory), most cgroup configurations are required to have an arbitrary minimum dictated by the init. This confuses users and makes configuration more annoying than it should. An example of this is pids.max, where Go spawns multiple processes that then cause init to violate the pids cgroup constraint before the container can even start. Solve this problem by setting the cgroup configurations as late as possible, to avoid hitting as many of the resources hogged by the Go init as possible. This has to be done before seccomp rules are applied, as the parent and child must synchronise in order for the parent to correctly set the configurations (and writes might be blocked by seccomp). Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:35 +11:00
Aleksa Sarai	f36ed4b174	libcontainer: cgroups: don't Set in Apply Apply and Set are two separate operations, and it doesn't make sense to group the two together (especially considering that the bootstrap process is added to the cgroup as well). The only exception to this is the memory cgroup, which requires the configuration to be set before processes can join. One of the weird cases to deal with is systemd. Systemd sets some of the cgroup configuration options, but not all of them. Because memory is a special case, we need to explicitly set memory in the systemd Apply(). Otherwise, the rest can be safely re-applied in .Set() as usual. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2016-01-12 10:06:35 +11:00
Mrunal Patel	4124ba9468	Revert "cgroups: add pids controller support" Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-12-19 07:48:48 -08:00
Aleksa Sarai	14ed8696c1	libcontainer: set cgroup config late Due to the fact that the init is implemented in Go (which seemingly randomly spawns new processes and loves eating memory), most cgroup configurations are required to have an arbitrary minimum dictated by the init. This confuses users and makes configuration more annoying than it should. An example of this is pids.max, where Go spawns multiple processes that then cause init to violate the pids cgroup constraint before the container can even start. Solve this problem by setting the cgroup configurations as late as possible, to avoid hitting as many of the resources hogged by the Go init as possible. This has to be done before seccomp rules are applied, as the parent and child must synchronise in order for the parent to correctly set the configurations (and writes might be blocked by seccomp). Signed-off-by: Aleksa Sarai <asarai@suse.com>	2015-12-19 11:30:48 +11:00
Aleksa Sarai	8a740d5391	libcontainer: cgroups: don't Set in Apply Apply and Set are two separate operations, and it doesn't make sense to group the two together (especially considering that the bootstrap process is added to the cgroup as well). The only exception to this is the memory cgroup, which requires the configuration to be set before processes can join. Signed-off-by: Aleksa Sarai <asarai@suse.com>	2015-12-19 11:30:47 +11:00
David Calavera	77c36f4b34	Move linux only Process.InitializeIO behind the linux build flag. Signed-off-by: David Calavera <david.calavera@gmail.com>	2015-12-15 15:12:29 -05:00
Daniel, Dao Quang Minh	d914bf7347	setns: add bootstrap data add bootstrap data to setns process. If we have any bootstrap data then copy it to the bootstrap process (i.e. nsexec) using the sync pipe. This will allow us to eventually replace environment variable usage with more structured data to setup namespaces, write pid/gid map, setgroup etc. Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>	2015-11-22 11:36:58 +00:00
Michael Crosby	879dfdd980	Fix race setting process opts When starting and quering for pids a container can start and exit before this is set. So set the opts after the process is started and while libcontainer still has the container's process blocking on the pipe. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-11-06 16:51:59 -08:00
Brian Goff	7632c4585f	Fix for race from error on process start This rather naively fixes an error observed where a processes stdio streams are not written to when there is an error upon starting up the process, such as when the executable doesn't exist within the container's rootfs. Before the "fix", when an error occurred on start, `terminate` is called immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on the process. In some cases when this `Kill` is called the stdio stream have not yet been written to, causing non-deterministic output. The error itself is properly preserved but users attached to the process will not see this error. With the fix it is just calling `Wait()` when an error occurs rather than trying to `Kill()` the process first. This seems to preserve stdio. Signed-off-by: Brian Goff <cpuguy83@gmail.com>	2015-10-07 21:28:26 -04:00
Mrunal Patel	c5d3bda7e1	Merge pull request #292 from keloyang/rpid no need to use p.cmd.Process.Pid in function, use p.pid() instead.	2015-09-24 15:59:39 -07:00
Mrunal Patel	dcafe48737	Add version to HookState to make it json-compatible with spec State Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2015-09-23 17:13:00 -07:00
keloyang	69a5b2df9e	no need to use p.cmd.Process.Pid in function, use p.pid() instead. Signed-off-by: keloyang <yangshukui@huawei.com>	2015-09-23 10:48:36 +08:00
David Calavera	0f28592b35	Turn hook pointers into values. Signed-off-by: David Calavera <david.calavera@gmail.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-09-11 11:34:34 -07:00
Michael Crosby	05567f2c94	Implement hooks in libcontainer Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-09-10 17:57:31 -07:00

1 2

55 Commits