jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Ted Yu	49896ab0f4	Avoid double close of criuServer Signed-off-by: Ted Yu <yuzhihong@gmail.com>	2020-04-01 16:15:23 -07:00
Michael Crosby	9ec5b03e5a	Merge pull request #2259 from adrianreber/v2-test Add minimal cgroup2 checkpoint/restore support	2020-03-31 15:01:18 -04:00
Yulia Nedyalkova	2abc6a3605	Actually check for syscall.ENODEV when checking if a container is paused It turns out that ioutil.ReadFile wraps the error in a *os.PathError. Since we cannot guarantee compilation with golang >= v1.13, we are manually unwrapping the error. Signed-off-by: Kieron Browne <kbrowne@pivotal.io>	2020-03-31 15:52:20 +01:00
Adrian Reber	9a0184b10f	cgroup2: use CRIU's new freezer v2 support The newest CRIU version supports freezer v2 and this tells runc to use it if new enough or fall back to non-freezer based process freezing on cgroup v2 system. Signed-off-by: Adrian Reber <areber@redhat.com>	2020-03-31 16:36:35 +02:00
Michael Crosby	88474967d3	Merge pull request #1974 from openSUSE/unreachable-code Remove unreachable code paths	2020-03-16 13:56:05 -04:00
Mrunal Patel	981dbef514	Merge pull request #2226 from avagin/runsc-restore-cmd-wait restore: fix a race condition in process.Wait()	2020-03-15 18:48:16 -07:00
Sascha Grunert	b477a159db	Remove unreachable code paths Signed-off-by: Sascha Grunert <sgrunert@suse.com>	2020-03-12 09:13:03 +01:00
Pradyumna Agrawal	5b2b138d24	Synchronize the call to linuxContainer.Signal() linuxContainer.Signal() can race with another call to say Destroy() which clears the container's initProcess. This can cause a nil pointer dereference in Signal(). This patch will synchronize Signal() and Destroy() by grabbing the container's mutex as part of the Signal() call. Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>	2020-03-09 11:15:22 -07:00
Andrei Vagin	269ea385a4	restore: fix a race condition in process.Wait() Adrian reported that the checkpoint test started failing: === RUN TestCheckpoint --- FAIL: TestCheckpoint (0.38s) checkpoint_test.go:297: Did not restore the pipe correctly: The problem here is that when we start exec.Cmd, we don't call its Wait method. This means that we don't wait for cmd.goroutines and so we don't know when all data will be read from process pipes. Signed-off-by: Andrei Vagin <avagin@gmail.com>	2020-02-10 10:21:08 -08:00
Aleksa Sarai	f6fb7a0338	merge branch 'pr-2133' Julia Nedialkova (1): Handle ENODEV when accessing the freezer.state file LGTMs: @crosbymichael @cyphar Closes #2133	2020-01-17 02:07:19 +11:00
Jordan Liggitt	8541d9cf3d	Fix race checking for process exit and waiting for exec fifo Signed-off-by: Jordan Liggitt <liggitt@google.com>	2019-12-18 18:48:18 +00:00
Radostin Stoyanov	a610a84821	criu: Ensure other users cannot read c/r files No checkpoint files should be readable by anyone else but the user creating it. Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>	2019-10-17 07:49:38 +01:00
Radostin Stoyanov	f017e0f9e1	checkpoint: Set descriptors.json file mode to 0600 Prevent unprivileged users from being able to read descriptors.json Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>	2019-10-12 19:29:44 +01:00
Julia Nedialkova	e63b797f38	Handle ENODEV when accessing the freezer.state file ...when checking if a container is paused Signed-off-by: Julia Nedialkova <julianedialkova@hotmail.com>	2019-09-27 17:02:56 +03:00
Michael Crosby	331692baa7	Only allow proc mount if it is procfs Fixes #2128 This allows proc to be bind mounted for host and rootless namespace usecases but it removes the ability to mount over the top of proc with a directory. ```bash > sudo docker run --rm apparmor docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/volumes/aae28ea068c33d60e64d1a75916cf3ec2dc3634f97571854c9ed30c8401460c1/_data\\\" to rootfs \\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged\\\" at \\\"/proc\\\" caused \\\"\\\\\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged/proc\\\\\\\" cannot be mounted because it is not of type proc\\\"\"": unknown. > sudo docker run --rm -v /proc:/proc apparmor docker-default (enforce) root 18989 0.9 0.0 1288 4 ? Ss 16:47 0:00 sleep 20 ``` Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2019-09-24 11:00:18 -04:00
Giuseppe Scrivano	1932917b71	libcontainer: add initial support for cgroups v2 allow to set what subsystems are used by libcontainer/cgroups/fs.Manager. subsystemsUnified is used on a system running with cgroups v2 unified mode. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2019-09-05 13:02:25 +02:00
Georgi Sabev	a146081828	Write logs to stderr by default Minor refactoring to use the filePair struct for both init sock and log pipe Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2019-04-24 15:18:14 +03:00
Georgi Sabev	ba3cabf932	Improve nsexec logging * Simplify logging function * Logs contain __FUNCTION__:__LINE__ * Bail uses write_log Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com> Co-authored-by: Danail Branekov <danailster@gmail.com> Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>	2019-04-22 17:53:52 +03:00
Danail Branekov	c486e3c406	Address comments in PR 1861 Refactor configuring logging into a reusable component so that it can be nicely used in both main() and init process init() Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com> Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io> Co-authored-by: Claudia Beresford <cberesford@pivotal.io> Signed-off-by: Danail Branekov <danailster@gmail.com>	2019-04-04 14:57:28 +03:00
Marco Vedovati	9a599f62fb	Support for logging from children processes Add support for children processes logging (including nsexec). A pipe is used to send logs from children to parent in JSON. The JSON format used is the same used by logrus JSON formatted, i.e. children process can use standard logrus APIs. Signed-off-by: Marco Vedovati <mvedovati@suse.com>	2019-04-04 14:53:23 +03:00
Mrunal Patel	2b18fe1d88	Merge pull request #1984 from cyphar/memfd-cleanups nsenter: cloned_binary: "memfd" cleanups	2019-03-07 10:18:33 -08:00
Michael Crosby	f739110263	Merge pull request #1968 from adrianreber/podman Create bind mount mountpoints during restore	2019-03-04 11:37:07 -06:00
Aleksa Sarai	af9da0a450	nsenter: cloned_binary: use the runc statedir for O_TMPFILE Writing a file to tmpfs actually incurs a memcg penalty, and thus the benefit of being able to disable memfd_create(2) with _LIBCONTAINER_DISABLE_MEMFD_CLONE is fairly minimal -- though it should be noted that quite a few distributions don't use tmpfs for /tmp (and instead have it as a regular directory or subvolume of the host filesystem). Since runc must have write access to the state directory anyway (and the state directory is usually not on a tmpfs) we can use that instead of /tmp -- avoiding potential memcg costs with no real downside. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2019-03-01 23:28:51 +11:00
Adrian Reber	9edb5494bb	Use vendored in CRIU Go bindings This makes use of the vendored in Go bindings and removes the copy of the CRIU RPC interface definition. runc now relies on go-criu for RPC definition and hopefully more CRIU functions can be used in the future from the CRIU Go bindings. Signed-off-by: Adrian Reber <areber@redhat.com>	2019-02-14 18:20:02 +01:00
Adrian Reber	7354546cc8	Create mountpoints also on restore runc creates all missing mountpoints when it starts a container, this commit also creates those mountpoints during restore. Now it is possible to restore a container using the same, but newly created rootfs just as during container start. Signed-off-by: Adrian Reber <areber@redhat.com>	2019-02-08 15:59:51 +01:00
Adrian Reber	e157963054	Enable CRIU configuration files CRIU 3.11 introduces configuration files: https://criu.org/Configuration_files https://lisas.de/~adrian/posts/2018-Nov-08-criu-configuration-files.html This enables the user to influence CRIU's behaviour without code changes if using new CRIU features or if the user wants to enable certain CRIU behaviour without always specifying certain options. With this it is possible to write 'tcp-established' to the configuration file: $ echo tcp-established > /etc/criu/runc.conf and from now on all checkpoints will preserve the state of established TCP connections. This removes the need to always use $ runc checkpoint --tcp-established if the goal is to always checkpoint with '--tcp-established'. It also adds the possibility for unexpected CRIU behaviour if the user created a configuration file at some point in time and forgets about it. As a result of the discussion in https://github.com/opencontainers/runc/pull/1933 it is now also possible to define a CRIU configuration file for each container with the annotation 'org.criu.config'. If 'org.criu.config' does not exist, runc will tell CRIU to use '/etc/criu/runc.conf' if it exists. If 'org.criu.config' is set to an empty string (''), runc will tell CRIU to not use any runc specific configuration file at all. If 'org.criu.config' is set to a non-empty string, runc will use that value as an additional configuration file for CRIU. With the annotation the user can decide to use the default configuration file ('/etc/criu/runc.conf'), none or a container specific configuration file. Signed-off-by: Adrian Reber <areber@redhat.com>	2018-12-21 07:42:12 +01:00
Michael Crosby	96ec2177ae	Merge pull request #1943 from giuseppe/allow-to-signal-paused-containers kill: allow to signal paused containers	2018-12-03 16:55:13 -05:00
Ace-Tang	dce70cdff5	cr: get pid from criu notify when restore when restore container from a checkpoint directory, we should get pid from criu notify, since c.initProcess has not been created. Signed-off-by: Ace-Tang <aceapril@126.com>	2018-12-03 13:31:20 +08:00
Giuseppe Scrivano	07d1ad44c8	kill: allow to signal paused containers regression introduced by `87a188996e` Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2018-11-30 23:35:47 +01:00
Michael Crosby	50e2634995	Merge pull request #1934 from lifubang/kill fix: may kill other process when container has been stopped	2018-11-21 10:30:25 -05:00
Lifubang	87a188996e	may kill other process when container has been stopped Signed-off-by: Lifubang <lifubang@acmcoder.com>	2018-11-21 17:44:52 +08:00
W. Trevor King	e23868603a	libcontainer: Set 'status' in hook stdin Finish off the work started in `a344b2d6` (sync up `HookState` with OCI spec `State`, 2016-12-19, #1201). And drop HookState, since there's no need for a local alias for specs.State. Also set c.initProcess in newInitProcess to support OCIState calls from within initProcess.start(). I think the cyclic references between linuxContainer and initProcess are unfortunate, but didn't want to address that here. I've also left the timing of the Prestart hooks alone, although the spec calls for them to happen before start (not as part of creation) [1,2]. Once the timing gets fixed we can drop the initProcessStartTime hacks which initProcess.start currently needs. I'm not sure why we trigger the prestart hooks in response to both procReady and procHooks. But we've had two prestart rounds in initProcess.start since `2f276498` (Move pre-start hooks after container mounts, 2016-02-17, #568). I've left that alone too. I really think we should have len() guards to avoid computing the state when .Hooks is non-nil but the particular phase we're looking at is empty. Aleksa, however, is adamantly against them [3] citing a risk of sloppy copy/pastes causing the hook slice being len-guarded to diverge from the hook slice being iterated over within the guard. I think that sort of thing is very low-risk, because: * We shouldn't be copy/pasting this, right? DRY for the win :). * There's only ever a few lines between the guard and the guarded loop. That makes broken copy/pastes easy to catch in review. * We should have test coverage for these. Guarding with the wrong slice is certainly not the only thing you can break with a sloppy copy/paste. But I'm not a maintainer ;). [1]: https://github.com/opencontainers/runtime-spec/blob/v1.0.0/config.md#prestart [2]: https://github.com/opencontainers/runc/issues/1710 [3]: https://github.com/opencontainers/runc/pull/1741#discussion_r233331570 Signed-off-by: W. Trevor King <wking@tremily.us>	2018-11-14 06:49:49 -08:00
Mrunal Patel	4769cdf607	Merge pull request #1916 from crosbymichael/cgns Add support for cgroup namespace	2018-11-13 12:21:38 -08:00
Michael Crosby	aa7917b751	Merge pull request #1911 from theSuess/linter-fixes Various cleanups to address linter issues	2018-11-13 12:13:34 -05:00
Yuanhong Peng	df3fa115f9	Add support for cgroup namespace Cgroup namespace can be configured in `config.json` as other namespaces. Here is an example: ``` "namespaces": [ { "type": "pid" }, { "type": "network" }, { "type": "ipc" }, { "type": "uts" }, { "type": "mount" }, { "type": "cgroup" } ], ``` Note that if you want to run a container which has shared cgroup ns with another container, then it's strongly recommended that you set proper `CgroupsPath` of both containers(the second container's cgroup path must be the subdirectory of the first one). Or there might be some unexpected results. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-10-31 10:51:43 -04:00
Mrunal Patel	c2ab1e656e	Merge pull request #1910 from adrianreber/tip Fix travis Go: tip	2018-10-17 12:47:08 -07:00
Dominik Süß	0b412e9482	various cleanups to address linter issues Signed-off-by: Dominik Süß <dominik@suess.wtf>	2018-10-13 21:14:03 +02:00
Adrian Reber	0d01164756	Fix travis Go: tip This fixes libcontainer/container_linux.go:1200: Error call has possible formatting directive %s Signed-off-by: Adrian Reber <areber@redhat.com>	2018-10-13 10:44:07 +00:00
Akihiro Suda	06f789cf26	Disable rootless mode except RootlessCgMgr when executed as the root in userns This PR decomposes `libcontainer/configs.Config.Rootless bool` into `RootlessEUID bool` and `RootlessCgroups bool`, so as to make "runc-in-userns" more compatible with "rootful" runc. `RootlessEUID` denotes that runc is being executed as a non-root user (euid != 0) in the current user namespace. `RootlessEUID` is almost identical to the former `Rootless` except cgroups stuff. `RootlessCgroups` denotes that runc is unlikely to have the full access to cgroups. `RootlessCgroups` is set to false if runc is executed as the root (euid == 0) in the initial namespace. Otherwise `RootlessCgroups` is set to true. (Hint: if `RootlessEUID` is true, `RootlessCgroups` becomes true as well) When runc is executed as the root (euid == 0) in a user namespace (e.g. by Docker-in-LXD, Podman, Usernetes), `RootlessEUID` is set to false but `RootlessCgroups` is set to true. So, "runc-in-userns" behaves almost the same as "rootful" runc except that cgroups errors are ignored. This PR does not have any impact on CLI flags and `state.json`. Note about CLI: * Now `runc --rootless=(auto|true|false)` CLI flag is only used for setting `RootlessCgroups`. * Now `runc spec --rootless` is only required when `RootlessEUID` is set to true. For runc-in-userns, `runc spec` without `--rootless` should work, when sufficient numbers of UID/GID are mapped. Note about `$XDG_RUNTIME_DIR` (e.g. `/run/user/1000`): * `$XDG_RUNTIME_DIR` is ignored if runc is being executed as the root (euid == 0) in the initial namespace, for backward compatibility. (`/run/runc` is used) * If runc is executed as the root (euid == 0) in a user namespace, `$XDG_RUNTIME_DIR` is honored if `$USER != "" && $USER != "root"`. This allows unprivileged users to execute runc as the root in userns, without mounting writable `/run/runc`. Note about `state.json`: * `rootless` is set to true when `RootlessEUID == true && RootlessCgroups == true`. Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-09-07 15:05:03 +09:00
Adrian Reber	fa43a72aba	criu: restore into existing namespace when specified Using CRIU to checkpoint and restore a container into an existing network namespace is not possible. If the network namespace is defined like { "type": "network", "path": "/run/netns/test" } there is the expectation that the restored container is again running in the network namespace specified with 'path'. This adds the new CRIU 'external namespace' feature to runc, where during checkpointing that specific namespace is referenced and during restore CRIU tries to restore the container in exactly that namespace. This breaks/fixes current runc behavior. If, without this patch, runc restores a container with such a network namespace definition, it is ignored and CRIU recreates a network namespace without a name. With this patch runc uses the network namespace path (if available) to checkpoint and restore the container in just that network namespace. Restore will now fail if a container was checkpointed with a network namespace path set and if that network namespace path does not exist during restore. runc still falls back to the old behavior if CRIU older than 3.11 is installed. Fixes #1786 Related to https://github.com/projectatomic/libpod/pull/469 Thanks to Andrei Vagin for all the help in getting the interface between CRIU and runc right! Signed-off-by: Adrian Reber <areber@redhat.com>	2018-08-22 23:27:20 +02:00
Michael Crosby	53fddb540a	Pass GOMAXPROCS to init processes This will help runc's init to not spawn many threads on large systems when launched with max procs by the caller. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2018-06-26 11:23:37 -04:00
Mrunal Patel	bd3c4f844a	Fix race in runc exec There is a race in runc exec when the init process stops just before the check for the container status. It is then wrongly assumed that we are trying to start an init process instead of an exec process. This commit adds an Init field to libcontainer Process to distinguish between init and exec processes to prevent this race. Signed-off-by: Mrunal Patel <mrunalp@gmail.com>	2018-06-01 16:25:58 -07:00
Michael Crosby	0e561642f8	Merge pull request #1688 from AkihiroSuda/unshare-m-r main: support rootless mode in userns	2018-05-29 15:41:17 -04:00
Qiang Huang	dd67ab10d7	Merge pull request #1759 from cyphar/rootless-erofs-as-eperm rootless: cgroup: treat EROFS as a skippable error	2018-05-25 09:24:16 +08:00
Akihiro Suda	c93815738a	libcontainer: remove extra CAP_SETGID check for SetgroupAttr Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-05-24 14:59:30 +09:00
Michael Crosby	bdbb9fab07	Merge pull request #1693 from AkihiroSuda/leave-setgroups-allow libcontainer: allow setgroup in rootless mode	2018-04-24 11:24:04 -04:00
Sebastien Boeuf	985628dda0	libcontainer: Don't set container state to running when exec'ing There is no reason to set the container state to "running" as a temporary value when exec'ing a process on a container in "created" state. The problem doing this is that consumers of the libcontainer library might use it by keeping pointers in memory. In this case, the container state will indicate that the container is running, which is wrong, and this will end up with a failure on the next action because the check for the container state transition will complain. Fixes #1767 Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>	2018-03-30 09:29:18 -07:00
Akihiro Suda	73f3dc6389	libcontainer: allow setgroup in rootless mode Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>	2018-03-27 17:42:05 +09:00
Aleksa Sarai	fd3a6e6c83	libcontainer: handle unset oomScoreAdj correctly Previously if oomScoreAdj was not set in config.json we would implicitly set oom_score_adj to 0. This is not allowed according to the spec: > If oomScoreAdj is not set, the runtime MUST NOT change the value of > oom_score_adj. Change this so that we do not modify oom_score_adj if oomScoreAdj is not present in the configuration. While this modifies our internal configuration types, the on-disk format is still compatible. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2018-03-17 13:53:42 +11:00
W. Trevor King	50dc7ee96c	libcontainer/capabilities_linux: Drop os.Getpid() call gocapability has supported 0 as "the current PID" since syndtr/gocapability@5e7cce49 (Allow to use the zero value for pid to operate with the current task, 2015-01-15, syndtr/gocapability#2). libcontainer was ported to that approach in `444cc298` (namespaces: allow to use pid namespace without mount namespace, 2015-01-27, docker/libcontainer#358), but the change was clobbered by `22df5551` (Merge branch 'master' into api, 2015-02-19, docker/libcontainer#388) which landed via `5b73860e` (Merge pull request #388 from docker/api, 2015-02-19, docker/libcontainer#388). This commit restores the changes from `444cc298`. Signed-off-by: W. Trevor King <wking@tremily.us>	2018-02-19 15:47:42 -08:00

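
Commit e157963054 in the table above describes three cases for the `org.criu.config` annotation: absent, empty, and non-empty. The sketch below restates that decision in Go. The function `selectCriuConfig`, its signature, and the example path `/etc/criu/web.conf` are assumptions made for illustration and do not reflect how runc implements it.

```go
package main

import (
	"fmt"
	"os"
)

const (
	criuConfigAnnotation = "org.criu.config"     // annotation named in the commit message
	defaultCriuConfig    = "/etc/criu/runc.conf" // default path named in the commit message
)

// selectCriuConfig is a hypothetical helper. It returns the configuration
// file to hand to CRIU and whether any runc-specific file should be used.
// The exists callback is injected so the logic can be exercised without
// touching the filesystem.
func selectCriuConfig(annotations map[string]string, exists func(string) bool) (string, bool) {
	value, ok := annotations[criuConfigAnnotation]
	switch {
	case !ok:
		// Annotation absent: use the default file, but only if it exists.
		if exists(defaultCriuConfig) {
			return defaultCriuConfig, true
		}
		return "", false
	case value == "":
		// Annotation set to the empty string: no runc-specific file at all.
		return "", false
	default:
		// Annotation set to a path: use it as the additional config file.
		return value, true
	}
}

// fileExists is a simple stat-based check for real use.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	always := func(string) bool { return true }

	path, use := selectCriuConfig(nil, always)
	fmt.Println(path, use) // /etc/criu/runc.conf true

	path, use = selectCriuConfig(map[string]string{criuConfigAnnotation: ""}, always)
	fmt.Println(path, use) // (empty path) false

	// "/etc/criu/web.conf" is a made-up container-specific file.
	path, use = selectCriuConfig(map[string]string{criuConfigAnnotation: "/etc/criu/web.conf"}, always)
	fmt.Println(path, use) // /etc/criu/web.conf true

	// With the real filesystem check the result depends on the host.
	path, use = selectCriuConfig(nil, fileExists)
	fmt.Println(path, use)
}
```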

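
Commit 06f789cf26 in the table above spells out how `RootlessEUID` and `RootlessCgroups` are derived by default, and when `rootless` in `state.json` is true. A small sketch of those rules follows. The type and function names are hypothetical, the "initial user namespace" check is reduced to a boolean parameter, and the `--rootless` CLI override of `RootlessCgroups` is left out.

```go
package main

import "fmt"

// rootlessFlags mirrors the two booleans described in the commit message.
// The type itself is an illustration, not a runc type.
type rootlessFlags struct {
	EUID    bool // runc runs as a non-root user (euid != 0)
	Cgroups bool // runc is unlikely to have full access to cgroups
}

// computeRootless applies the default rules from the commit message.
// How to detect the initial user namespace is outside this sketch, so the
// caller passes the answer in.
func computeRootless(euid int, inInitialUserNS bool) rootlessFlags {
	return rootlessFlags{
		EUID: euid != 0,
		// Only root in the initial namespace gets RootlessCgroups == false.
		Cgroups: !(euid == 0 && inInitialUserNS),
	}
}

// stateRootless is the value the commit message gives for `rootless`
// in state.json.
func stateRootless(f rootlessFlags) bool {
	return f.EUID && f.Cgroups
}

func main() {
	cases := []struct {
		name            string
		euid            int
		inInitialUserNS bool
	}{
		{"root, initial namespace", 0, true},
		{"root, inside a user namespace", 0, false},
		{"non-root user", 1000, true},
	}
	for _, c := range cases {
		f := computeRootless(c.euid, c.inInitialUserNS)
		fmt.Printf("%s: RootlessEUID=%t RootlessCgroups=%t rootless=%t\n",
			c.name, f.EUID, f.Cgroups, stateRootless(f))
	}
}
```

The middle case is the "runc-in-userns" situation from the commit message: `RootlessEUID` is false while `RootlessCgroups` is true.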
193 Commits