jasder/runc - runc - Military Science Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Michael Crosby	b1068fb925	Merge pull request #1814 from rhatdan/selinux SELinux labels are tied to the thread	2018-11-05 10:00:11 -05:00
Aleksa Sarai	40f1468413	keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING) While all modern kernels (and I do mean _all_ of them -- this syscall was added in 2.6.10 before git had begun development!) have support for this syscall, LXC has a default seccomp profile that returns ENOSYS for this syscall. For most syscalls this would be a deal-breaker, and our use of session keyrings is security-based, but there are a few mitigating factors that make this change not-completely-insane: * We already have a flag that disables the use of session keyrings (for older kernels that had system-wide keyring limits and so on). So disabling it is not a new idea. * While the primary justification of using session keys is security-based, it's more of a security-by-obscurity protection. The main defense keyrings have is VFS credentials -- which is something that users already have better security tools for (setuid(2) and user namespaces). * Given the security justification you might argue that we shouldn't silently ignore this. However, the only way for the kernel to return -ENOSYS is either being ridiculously old (at which point we wouldn't work anyway) or that there is a seccomp profile in place blocking it. Given that the seccomp profile (if malicious) could very easily just return 0 or a silly return code (or something even more clever with seccomp-bpf) and trick us without this patch, there isn't much of a significant change in how much seccomp can trick us with or without this patch. Given all of that over-analysis, I'm pretty convinced there isn't a security problem in this very specific case and it will help out the ChromeOS folks by allowing Docker to run inside their LXC container setup. I'd be happy to be proven wrong. Ref: https://bugs.chromium.org/p/chromium/issues/detail?id=860565 Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-09-17 21:38:30 +10:00
Daniel J Walsh	aa3fee6c80	SELinux labels are tied to the thread We need to lock the threads for the SetProcessLabel to work, and should also call SetProcessLabel("") after the container starts to go back to the default SELinux behaviour. Once you call SetProcessLabel, any process executed by runc will run with this label, even if the process is for setup rather than the container. It is always safest to call the SELinux calls just before the exec of the container, so that other processes do not get started with the incorrect label. Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>	2018-06-11 08:34:58 -04:00
Aleksa Sarai	1f32fff46d	setns init: delay seccomp as late as possible This mirrors the standard_init_linux.go seccomp code, which only applies seccomp early if NoNewPrivileges is enabled. Otherwise it's done immediately before execve to reduce the amount of syscalls necessary for users to enable in their seccomp profiles. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-26 13:42:30 +10:00
Tobias Klauser	4019833d46	libcontainer: use PR_SET_NO_NEW_PRIVS from x/sys/unix Use PR_SET_NO_NEW_PRIVS defined in golang.org/x/sys/unix instead of manually defining it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-13 15:31:33 +02:00
Tobias Klauser	553016d7da	Use Prctl() from x/sys/unix instead of own wrapper Use unix.Prctl() instead of reimplementing it as system.Prctl(). Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-06-07 15:03:15 +02:00
Qiang Huang	5e7b48f7c0	Use opencontainers/selinux package It has been split out as a separate project. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-03-23 08:21:19 +08:00
Michael Crosby	00a0ecf554	Add separate console socket Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-03-16 10:23:59 -07:00
Aleksa Sarai	e034cedce7	libcontainer: init: only pass stateDirFd when creating a container If we pass a file descriptor to the host filesystem while joining a container, there is a race condition where a process inside the container can ptrace(2) the joining process and stop it from closing its file descriptor to the stateDirFd. Then the process can access the host filesystem from that file descriptor. This was fixed in part by `5d93fed3d2` ("Set init processes as non-dumpable"), but that fix is more of a hail-mary than an actual fix for the underlying issue. To fix this, don't open or pass the stateDirFd to the init process unless we're creating a new container. A proper fix for this would be to remove the need for even passing around directory file descriptors (which are quite dangerous in the context of mount namespaces). There is still an issue with containers that have CAP_SYS_PTRACE and are using the setns(2)-style of joining a container namespace. Currently I'm not really sure how to fix it without rampant layer violation. Fixes: CVE-2016-9962 Fixes: `5d93fed3d2` ("Set init processes as non-dumpable") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-02-02 00:41:11 +11:00
Michael Crosby	5d93fed3d2	Set init processes as non-dumpable This sets the init processes that join and set up the container's namespaces as non-dumpable before they setns to the container's pid (or any other) namespace. This setting is automatically reset to the default after the Exec in the container so that it does not change functionality for the applications that are running inside, just our init processes. This prevents parent processes, the pid 1 of the container, from ptracing the init process before it drops caps and sets other LSMs. This patch also ensures that the stateDirFD being used is still closed prior to exec, even though it is set as O_CLOEXEC, because of the order in the kernel. https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318 The order during the exec syscall is that the process is set back to dumpable before O_CLOEXEC are processed. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-01-11 09:56:56 -08:00
Aleksa Sarai	244c9fc426	*: console rewrite This implements {createTTY, detach} and all of the combinations and negations of the two that were previously implemented. There are some valid questions about out-of-OCI-scope topics like !createTTY and how things should be handled (why do we dup the current stdio to the process, and how is that not a security issue). However, these will be dealt with in a separate patchset. In order to allow for late console setup, split setupRootfs into the "preparation" section where all of the mounts are created and the "finalize" section where we pivot_root and set things as ro. In between the two we can set up all of the console mountpoints and symlinks we need. We use two-stage synchronisation to ensure that when the syscalls are reordered in a suboptimal way, an out-of-place read() on the parentPipe will not gobble the ancillary information. This patch is part of the console rewrite patchset. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2016-12-01 15:49:36 +11:00
Guilherme Rezende	1cdaa709f1	libcontainer: rename keyctl package to keys This prevents the goimports tool from removing the libcontainer/keys import line because the package name differs from the folder name. Signed-off-by: Guilherme Rezende <guilhermebr@gmail.com>	2016-07-25 20:59:26 -03:00
Michael Crosby	3aacff695d	Use fifo for create/start This removes the use of a signal handler and SIGCONT to signal the init process to exec the users process. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-13 11:26:53 -07:00
Michael Crosby	8c9db3a7a5	Add option to disable new session keys This adds a `--no-new-keyring` flag to run and create so that a new session keyring is not created for the container and the calling process's keyring is inherited. Fixes #818 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-06-03 11:53:07 -07:00
Michael Crosby	c5060ff303	Merge pull request #827 from crosbymichael/create-start Implement create and start	2016-06-03 10:38:03 -07:00
rajasec	9742b02856	Removing the nil check for process label Signed-off-by: rajasec <rajasec79@gmail.com>	2016-06-01 20:29:44 +05:30
Michael Crosby	3fe7d7f31e	Add create and start command for container lifecycle Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-05-31 11:06:41 -07:00
Julian Friedman	e91b2b8aca	Set rlimits using prlimit in parent Fixes #680 This changes setupRlimit to use the Prlimit syscall (rather than Setrlimit) and moves the call to the parent process. This is necessary because Setrlimit would affect the libcontainer consumer if called in the parent, and would fail if called from the child if the child process is in a user namespace and the requested rlimit is higher than that in the parent. Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>	2016-03-25 15:11:44 +00:00
Michael Crosby	20422c9bd9	Update libcontainer to support rlimit per process This updates runc and libcontainer to handle rlimits per process and set them correctly for the container. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-03-10 14:35:16 -08:00
Phil Estes	178bad5e71	Properly setuid/setgid after entering userns The re-work of namespace entering lost the setuid/setgid that was part of the Go-routine based process exec in the prior code. A side issue was found with setting oom_score_adj before execve() in a userns that is also solved here. Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)	2016-03-04 11:12:26 -05:00
Michael Crosby	3cc90bd2d8	Add support for process overrides of settings This commit adds caps, no new privs, apparmor, and the selinux process label to the process struct in libcontainer so that they can be used together with, or override, the base settings in the container config on a per-process basis. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2016-03-03 11:41:33 -08:00
Stefan Berger	5fbf791e31	Create unique session key name for every container Create a unique session key name for every container. Use the pattern _ses.<postfix> with postfix being the container's Id. This patch does not prevent containers from joining each other's session keyring. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>	2016-02-24 08:39:52 -05:00
Mrunal Patel	38b39645d9	Implement NoNewPrivileges support in libcontainer Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2016-02-16 06:57:50 -08:00
Stefan Berger	ad22e23aee	Create a new session key for every container Create a new session key ring '_ses' for every container. This avoids sharing the key structure with the process that created the container and that the container inherits from. This patch fixes it for init and exec. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>	2016-02-04 22:05:50 -05:00
Vishnu Kannan	cc232c4707	Adding oom_score_adj as a container config param. Signed-off-by: Vishnu Kannan <vishnuk@google.com>	2015-08-31 14:02:59 -07:00
Matthew Heon	2ae581ae62	Convert Seccomp support to use Libseccomp This removes the existing, native Go seccomp filter generation and replaces it with Libseccomp. Libseccomp is a C library which provides architecture independent generation of Seccomp filters for the Linux kernel. This adds a dependency on v2.2.1 or above of Libseccomp. Signed-off-by: Matthew Heon <mheon@redhat.com>	2015-08-13 07:56:27 -04:00
Michael Crosby	080df7ab88	Update import paths for new repository Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:59 -07:00
Michael Crosby	8f97d39dd2	Move libcontainer into subdirectory Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2015-06-21 19:29:15 -07:00

28 Commits