jasder/runc - runc - 军科开源项目托管

Commit Graph

Author	SHA1	Message	Date
Kir Kolyshkin	af6b9e7fa9	nit: do not use syscall package In many places (not all of them though) we can use `unix.` instead of `syscall.` as these are indentical. In particular, x/sys/unix defines: ```go type Signal = syscall.Signal type Errno = syscall.Errno type SysProcAttr = syscall.SysProcAttr const ENODEV = syscall.Errno(0x13) ``` and unix.Exec() calls syscall.Exec(). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>	2020-04-18 16:16:49 -07:00
zyu	957da1f9ab	Use named error return for initProcess#start Signed-off-by: zyu <yuzhihong@gmail.com>	2020-03-09 09:29:03 -07:00
Qiang Huang	c8337777b6	Merge pull request #2042 from xiaochenshen/rdt-add-missing-destroy libcontainer: intelrdt: add missing destroy handler in defer func	2019-05-21 09:48:00 +08:00
Georgi Sabev	a146081828	Write logs to stderr by default Minor refactoring to use the filePair struct for both init sock and log pipe Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2019-04-24 15:18:14 +03:00
Xiaochen Shen	17b37ea3fa	libcontainer: intelrdt: add missing destroy handler in defer func In the exception handling of initProcess.start(), we need to add the missing IntelRdtManager.Destroy() handler in defer func. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2019-04-24 16:41:51 +08:00
Georgi Sabev	ba3cabf932	Improve nsexec logging * Simplify logging function * Logs contain __FUNCTION__:__LINE__ * Bail uses write_log Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com> Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2019-04-22 17:53:52 +03:00
Danail Branekov	c486e3c406	Address comments in PR 1861 Refactor configuring logging into a reusable component so that it can be nicely used in both main() and init process init() Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com> Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io> Co-authored-by: Claudia Beresford <cberesford@pivotal.io> Signed-off-by: Danail Branekov <danailster@gmail.com>	2019-04-04 14:57:28 +03:00
Marco Vedovati	9a599f62fb	Support for logging from children processes Add support for children processes logging (including nsexec). A pipe is used to send logs from children to parent in JSON. The JSON format used is the same used by logrus JSON formatted, i.e. children process can use standard logrus APIs. Signed-off-by: Marco Vedovati <mvedovati@suse.com>	2019-04-04 14:53:23 +03:00
Alex Fang	eab5330908	Fixes regression causing zombie runc:[1:CHILD] processes Whenever processes are spawned using nsexec, a zombie runc:[1:CHILD] process will always be created and will need to be reaped by the parent Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>	2019-03-21 13:43:38 +11:00
W. Trevor King	e23868603a	libcontainer: Set 'status' in hook stdin Finish off the work started in `a344b2d6` (sync up `HookState` with OCI spec `State`, 2016-12-19, #1201). And drop HookState, since there's no need for a local alias for specs.State. Also set c.initProcess in newInitProcess to support OCIState calls from within initProcess.start(). I think the cyclic references between linuxContainer and initProcess are unfortunate, but didn't want to address that here. I've also left the timing of the Prestart hooks alone, although the spec calls for them to happen before start (not as part of creation) [1,2]. Once the timing gets fixed we can drop the initProcessStartTime hacks which initProcess.start currently needs. I'm not sure why we trigger the prestart hooks in response to both procReady and procHooks. But we've had two prestart rounds in initProcess.start since `2f276498` (Move pre-start hooks after container mounts, 2016-02-17, #568). I've left that alone too. I really think we should have len() guards to avoid computing the state when .Hooks is non-nil but the particular phase we're looking at is empty. Aleksa, however, is adamantly against them [3] citing a risk of sloppy copy/pastes causing the hook slice being len-guarded to diverge from the hook slice being iterated over within the guard. I think that ort of thing is very lo-risk, because: * We shouldn't be copy/pasting this, right? DRY for the win :). * There's only ever a few lines between the guard and the guarded loop. That makes broken copy/pastes easy to catch in review. * We should have test coverage for these. Guarding with the wrong slice is certainly not the only thing you can break with a sloppy copy/paste. But I'm not a maintainer ;). [1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart [2]: https://github.com/opencontainers/runc/issues/1710 [3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570 Signed-off-by: W. Trevor King <wking@tremily.us>	2018-11-14 06:49:49 -08:00
Yuanhong Peng	df3fa115f9	Add support for cgroup namespace Cgroup namespace can be configured in `config.json` as other namespaces. Here is an example: ``` "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], ``` Note that if you want to run a container which has shared cgroup ns with another container, then it's strongly recommended that you set proper `CgroupsPath` of both containers(the second container's cgroup path must be the subdirectory of the first one). Or there might be some unexpected results. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-31 10:51:43 -04:00
Mrunal Patel	a00bf01908	Merge pull request #1862 from AkihiroSuda/decompose-rootless-pr Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression)	2018-10-15 17:32:15 -07:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" to be more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in an user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto\|true\|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in an user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to allow execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Yan Zhu	feb90346e0	doc: fix typo Signed-off-by: Yan Zhu <yanzhu@alauda.io>	2018-09-07 11:58:59 +08:00
Antonio Murdaca	cd1e7abee2	libcontainer: expose annotations in hooks Annotations weren't passed to hooks. This patch fixes that by passing annotations to stdin for hooks. Signed-off-by: Antonio Murdaca <runcom@redhat.com>	2018-01-11 16:54:01 +01:00
Will Martin	ca4f427af1	Support cgroups with limits as rootless Signed-off-by: Ed King <eking@pivotal.io> Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io> Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>	2017-10-05 11:22:54 +01:00
Michael Crosby	7062c7556b	Apply cgroups earlier This applies cgroups earlier for container creation before the init process starts running and forking off any additional processes. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-09-07 11:27:33 -04:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPU support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). For more information about Intel RDT/CAT can be found in the section 17.17 of Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux 4.10 kernel or newer, the interface is defined and exposed via "resource control" filesystem, which is a "cgroup-like" interface. Comparing with cgroups, it has similar process management lifecycle and interfaces in a container. But unlike cgroups' hierarchy, it has single level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ \|-- info \| \|-- L3 \| \|-- cbm_mask \| \|-- min_cbm_bits \| \|-- num_closids \|-- cpus \|-- schemata \|-- tasks \|-- <container_id> \|-- cpus \|-- schemata \|-- tasks For runc, we can make use of `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belongs to this group (e.g., <container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it Is in root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM). Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. The valid L3 cache CBM is a contiguous bits set and number of bits that can be set is less than the max bit. The max bits in the CBM is varied among supported Intel Xeon platforms. In Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. Kernel will check if it is valid when writing. e.g., 0xfffff in root indicates the max bits of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc. For more information about Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Aleksa Sarai	7d66aab77a	init: switch away from stateDirFd entirely While we have significant protections in place against CVE-2016-9962, we still were holding onto a file descriptor that referenced the host filesystem. This meant that in certain scenarios it was still possible for a semi-privileged container to gain access to the host filesystem (if they had CAP_SYS_PTRACE). Instead, open the FIFO itself using a O_PATH. This allows us to reference the FIFO directly without providing the ability for directory-level access. When opening the FIFO inside the init process, open it through procfs to re-open the actual FIFO (this is currently the only supported way to open such a file descriptor). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-25 13:19:03 +10:00
Alex Fang	e92add2151	Pass back the pid of runc:[1:CHILD] so we can wait on it This allows the libcontainer to automatically clean up runc:[1:CHILD] processes created as part of nsenter. Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>	2017-08-05 13:44:36 +10:00
W. Trevor King	75d98b26b7	libcontainer: Replace GetProcessStartTime with Stat_t.StartTime And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
Daniel, Dao Quang Minh	67bd2ab554	Merge pull request #1442 from clnperez/libcontainer-sys-unix Move libcontainer to x/sys/unix	2017-05-26 12:18:33 +01:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the forseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Wentao Zhang	09c1f5c055	Fix setup cgroup before prestart hook * User Case: User could use prestart hook to add block devices to container. so the hook should have a way to set the permissions of the devices. Just move cgroup config operation before prestart hook will work. Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>	2017-05-19 17:53:43 +08:00
Aleksa Sarai	baeef29858	rootless: add rootless cgroup manager The rootless cgroup manager acts as a noop for all set and apply operations. It is just used for rootless setups. Currently this is far too simple (we need to add opportunistic cgroup management), but is good enough as a first-pass at a noop cgroup manager. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:46:20 +11:00
Aleksa Sarai	d2f49696b0	runc: add support for rootless containers This enables the support for the rootless container mode. There are many restrictions on what rootless containers can do, so many different runC commands have been disabled: * runc checkpoint * runc events * runc pause * runc ps * runc restore * runc resume * runc update The following commands work: * runc create * runc delete * runc exec * runc kill * runc list * runc run * runc spec * runc state In addition, any specification options that imply joining cgroups have also been disabled. This is due to support for unprivileged subtree management not being available from Linux upstream. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:24 +11:00
Aleksa Sarai	6bd4bd9030	*: handle unprivileged operations and !dumpable Effectively, !dumpable makes implementing rootless containers quite hard, due to a bunch of different operations on /proc/self no longer being possible without reordering everything. !dumpable only really makes sense when you are switching between different security contexts, which is only the case when we are joining namespaces. Unfortunately this means that !dumpable will still have issues in this instance, and it should only be necessary to set !dumpable if we are not joining USER namespaces (new kernels have protections that make !dumpable no longer necessary). But that's a topic for another time. This also includes code to unset and then re-set dumpable when doing the USER namespace mappings. This should also be safe because in principle processes in a container can't see us until after we fork into the PID namespace (which happens after the user mapping). In rootless containers, it is not possible to set a non-dumpable process's /proc/self/oom_score_adj (it's owned by root and thus not writeable). Thus, it needs to be set inside nsexec before we set ourselves as non-dumpable. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-03-23 20:45:19 +11:00
Michael Crosby	00a0ecf554	Add separate console socket Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-03-16 10:23:59 -07:00
Mrunal Patel	4f9cb13b64	Update runtime spec to 1.0.0.rc5 Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2017-03-15 11:38:37 -07:00
Aleksa Sarai	e034cedce7	libcontainer: init: only pass stateDirFd when creating a container If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the host filesystem from that file descriptor. This was fixed in part by `5d93fed3d2` ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: `5d93fed3d2` ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-02-02 00:41:11 +11:00
Zhang Wei	a344b2d6a8	sync up `HookState` with OCI spec `State` `HookState` struct should follow definition of `State` in runtime-spec: * modify json name of `version` to `ociVersion`. * Remove redundant `Rootfs` field as rootfs can be retrived from `bundlePath/config.json`. Signed-off-by: Zhang Wei <zhangwei555@huawei.com>	2016-12-20 00:00:43 +08:00
Aleksa Sarai	244c9fc426	*: console rewrite This implements {createTTY, detach} and all of the combinations and negations of the two that were previously implemented. There are some valid questions about out-of-OCI-scope topics like !createTTY and how things should be handled (why do we dup the current stdio to the process, and how is that not a security issue). However, these will be dealt with in a separate patchset. In order to allow for late console setup, split setupRootfs into the "preparation" section where all of the mounts are created and the "finalize" section where we pivot_root and set things as ro. In between the two we can set up all of the console mountpoints and symlinks we need. We use two-stage synchronisation to ensures that when the syscalls are reordered in a suboptimal way, an out-of-place read() on the parentPipe will not gobble the ancilliary information. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Aleksa Sarai	4776b4326a	libcontainer: refactor syncT handling To make the code cleaner, and more clear, refactor the syncT handling used when creating the `runc init` process. In addition, document the state changes so that people actually understand what is going on. Rather than only using syncT for the standard initProcess, use it for both initProcess and setnsProcess. This removes some special cases, as well as allowing for the use of syncT with setnsProcess. Also remove a bunch of the boilerplate around syncT handling. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:46:04 +11:00
Michael Crosby	e58671e530	Add --all flag to kill This allows a user to send a signal to all the processes in the container within a single atomic action to avoid new processes being forked off before the signal can be sent. This is basically taking functionality that we already use being `delete` and exposing it ok the `kill` command by adding a flag. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-11-08 09:35:02 -08:00
Crazykev	34d7c5c099	fix error message Signed-off-by: Crazykev <crazykev@zju.edu.cn>	2016-11-02 16:34:08 +08:00
Daniel Dao	1b876b0bf2	fix typos with misspell pipe the source through https://github.com/client9/misspell. typos be gone! Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2016-10-11 23:22:48 +00:00
Wang Long	5eaa9ed5cd	just fix a typo Signed-off-by: Wang Long <long.wanglong@huawei.com>	2016-10-11 08:38:15 +00:00
Yuanhong Peng	6ed0652ee0	Fix typo Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2016-09-21 20:13:32 +08:00
Qiang Huang	b2e811183b	Allow recrusive generic error Error sent from child process is already genericError, if we don't allow recrusive generic error, we won't get any cause infomation from parent process. Before, we got: WARN[0000] exit status 1 ERRO[0000] operation not permitted After, we got: WARN[0000] exit status 1 ERRO[0000] container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"operation not permitted\"" it's not pretty but useful for detecting root causes. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2016-09-14 15:55:46 +08:00
Michael Crosby	3aacff695d	Use fifo for create/start This removes the use of a signal handler and SIGCONT to signal the init process to exec the users process. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 11:26:53 -07:00
Michael Crosby	efcd73fb5b	Fix signal handling for unit tests Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:10:47 -07:00
Aleksa Sarai	1a913c7b89	*: correctly chown() consoles In user namespaces, we need to make sure we don't chown() the console to unmapped users. This means we need to get both the UID and GID of the root user in the container when changing the owner. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-05-22 22:37:13 +10:00
Michael Crosby	6978875298	Add cause to error messages This is the inital port of the libcontainer.Error to added a cause to all the existing error messages. Going forward, when an error can be wrapped because it is not being checked at the higher levels for something like `os.IsNotExist` we can add more information to the error message like cause and stack file/line information. This will help higher level tools to know what cause a container start or operation to fail. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-04-18 11:37:26 -07:00
George Lestaris	f7ae27bfb7	HookState adhears to OCI Signed-off-by: George Lestaris <glestaris@pivotal.io> Signed-off-by: Ed King <eking@pivotal.io>	2016-04-06 16:57:59 +01:00
Julian Friedman	e91b2b8aca	Set rlimits using prlimit in parent Fixes #680 This changes setupRlimit to use the Prlimit syscall (rather than Setrlimit) and moves the call to the parent process. This is necessary because Setrlimit would affect the libcontainer consumer if called in the parent, and would fail if called from the child if the child process is in a user namespace and the requested rlimit is higher than that in the parent. Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>	2016-03-25 15:11:44 +00:00
Tonis Tiigi	78ecdfe18e	Show proper error from init process panic Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>	2016-03-22 15:57:15 -07:00
Mrunal Patel	69db69668e	Set oom_score_adj before we send the config to avoid race Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-03-21 15:33:17 -07:00
Phil Estes	178bad5e71	Properly setuid/setgid after entering userns The re-work of namespace entering lost the setuid/setgid that was part of the Go-routine based process exec in the prior code. A side issue was found with setting oom_score_adj before execve() in a userns that is also solved here. Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2016-03-04 11:12:26 -05:00
Kenfe-Mickael Laventure	6325ab96e7	Call Prestart hook after namespaces have been set This simply move the call to the Prestart hooks to be made once we receive the procReady message from the client. This is necessary as we had to move the setns calls within nsexec in order to be accomodate joining namespaces that only affect future children (e.g. NEWPID). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2016-02-28 12:26:53 -08:00
Daniel, Dao Quang Minh	42d5d04801	Sets custom namespaces for init processes An init process can join other namespaces (pidns, ipc etc.). This leverages C code defined in nsenter package to spawn a process with correct namespaces and clone if necessary. This moves all setns and cloneflags related code to nsenter layer, which mean that we dont use Go os/exec to create process with cloneflags and set uid/gid_map or setgroups anymore. The necessary data is passed from Go to C using a netlink binary-encoding format. With this change, setns and init processes are almost the same, which brings some opportunity for refactoring. Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com> [mickael.laventure@docker.com: adapted to apply on master @ d97d5e] Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>	2016-02-28 12:26:53 -08:00

1 2

74 Commits