jasder/runc - runc - Junke Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Adrian Reber	60ae7091de	checkpoint: support lazy migration With the help of userfaultfd CRIU supports lazy migration. Lazy migration means that memory pages are only transferred from the migration source to the migration destination on page fault. This reduces the downtime during process or container migration to a minimum, as the memory does not need to be transferred during migration. Lazy migration currently depends on userfaultfd being available in the running Linux kernel and on the CRIU version in use supporting lazy migration. Both dependencies can be checked by querying CRIU via RPC for whether the lazy migration feature is available. Using feature checking instead of version comparison enables runC to use CRIU features from the criu-dev branch. This way the user can decide if lazy migration should be available by choosing the right kernel and CRIU branch. To use lazy migration the CRIU process during dump needs to dump everything besides the memory pages and then it opens a network port waiting for remote page fault requests: # runc checkpoint httpd --lazy-pages --page-server 0.0.0.0:27 \ --status-fd /tmp/postcopy-pipe In this example CRIU will block once it has opened the network port and wait for a network connection. As runC waits for CRIU to finish it will also block until the lazy migration has finished. To know when the restore on the destination side can start the '--status-fd' parameter is used: # runc checkpoint --help | grep status --status-fd value criu writes \0 to this FD once lazy-pages is ready The parameter '--status-fd' comes directly from CRIU, and this way the process outside of runC which controls the migration knows exactly when to transfer the checkpoint (without memory pages) to the destination and that the restore can be started. 
On the destination side it is necessary to start CRIU in 'lazy-pages' mode like this: # criu lazy-pages --page-server --address 192.168.122.3 --port 27 \ -D checkpoint and tell runC to do a lazy restore: # runc restore -d --image-path checkpoint --work-path checkpoint \ --lazy-pages httpd If both processes on the restore side have the same working directory 'criu lazy-pages' creates a unix domain socket where it waits for requests from the actual restore. runC starts CRIU restore in lazy restore mode and tells 'criu lazy-pages' that it wants to restore memory pages on demand. CRIU continues to restore the process and once the process is running and accesses the first non-existing memory page the 'criu lazy-pages' server will request the page from the source system. Thus all pages from the source system will be transferred to the destination system. Once all pages have been transferred runC on the source system will end and the container will have finished migration. This can also be combined with CRIU's pre-copy support. The combination of pre-copy and post-copy (lazy migration) provides the possibility to migrate containers with minimal downtimes. Some additional background about post-copy migration can be found in these articles: https://lisas.de/~adrian/?p=1253 https://lisas.de/~adrian/?p=1183 Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Adrian Reber	a3a632ad28	checkpoint: add support to query for lazy page support Before adding the actual lazy migration support, this adds the feature check for lazy-pages. Right now lazy migration, which is based on userfaultfd, is only available in the criu-dev branch and not yet in a release. As the check does not depend on a certain version but on a CRIU feature which can be queried, it can be part of runC without a new version check depending on a feature from criu-dev. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Mrunal Patel	aea4f21eec	Merge pull request #1575 from cyphar/tty-resize-ignore-errors signal: ignore tty.resize errors	2017-09-01 11:20:26 -07:00
Michael Crosby	84a082bfef	Merge pull request #1578 from cyphar/remove-shfmt-from-ci travis: drop shfmt install	2017-08-31 09:46:39 -04:00
Aleksa Sarai	ace083b650	travis: drop shfmt install It looks like we missed this in `5930d5b427` ("Remove shfmt"), which was causing CI to break (since it looks like the repo has moved or something like that). Since we're no longer using shfmt, drop it completely from the repo. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-31 20:49:51 +10:00
Aleksa Sarai	10b175ce49	signal: ignore tty.resize errors Fixes a race that occurred very frequently in testing where the tty of the container may be closed by the time that runc gets to sending SIGWINCH. This failure mode is not fatal, but it would cause test failures due to expected outputs not matching. On further review it appears that the original addition of these checks in `4c5bf649d0` ("Check error return values") was actually not necessary, so partially revert that change. The particular failure mode this resolves would manifest as error logs of the form: time="2017-08-24T07:59:50Z" level=error msg="bad file descriptor" Fixes: `4c5bf649d0` ("Check error return values") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-27 00:44:17 +10:00
Qiang Huang	1c81e2a794	Merge pull request #1572 from tych0/fix-readonly-userns fix --read-only containers under --userns-remap	2017-08-26 09:38:14 +08:00
Aleksa Sarai	4d6e6720a7	Merge branch 'pr-1573' Fix systemd cgroup after memory type changed LGTMs: @crosbymichael @cyphar Closes #1573	2017-08-25 23:55:27 +10:00
Michael Crosby	4e33faefa7	Merge pull request #1570 from cyphar/close-statedirfd-hole init: switch away from stateDirFd entirely	2017-08-25 09:52:16 -04:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause, looks like systemd still want them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
Aleksa Sarai	7d66aab77a	init: switch away from stateDirFd entirely While we have significant protections in place against CVE-2016-9962, we still were holding onto a file descriptor that referenced the host filesystem. This meant that in certain scenarios it was still possible for a semi-privileged container to gain access to the host filesystem (if they had CAP_SYS_PTRACE). Instead, open the FIFO itself using O_PATH. This allows us to reference the FIFO directly without providing the ability for directory-level access. When opening the FIFO inside the init process, open it through procfs to re-open the actual FIFO (this is currently the only supported way to open such a file descriptor). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-25 13:19:03 +10:00
Tycho Andersen	66eb2a3e8f	fix --read-only containers under --userns-remap The documentation here: https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations says that readonly containers can't be used with user namespaces due to some kernel restriction. In fact, there is a special case in the kernel to be able to do stuff like this, so let's use it. This takes us from: ubuntu@docker:~$ docker run -it --read-only ubuntu docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:125: remounting \\\"/dev\\\" as readonly caused \\\"operation not permitted\\\"\"". to: ubuntu@docker:~$ docker-runc --version runc version 1.0.0-rc4+dev commit: ae2948042b08ad3d6d13cd09f40a50ffff4fc688-dirty spec: 1.0.0 ubuntu@docker:~$ docker run -it --read-only ubuntu root@181e2acb909a:/# touch foo touch: cannot touch 'foo': Read-only file system Signed-off-by: Tycho Andersen <tycho@docker.com>	2017-08-24 16:43:21 -06:00
Michael Crosby	ae2948042b	Merge pull request #1561 from nseps/master Add AutoDedup option to CriuOpts	2017-08-18 12:50:27 -04:00
Nikolas Sepos	3f234b15d0	Add auto-dedup flag for checkpoint/restore When doing incremental dumps it is useful to use auto deduplication of memory images to save space. Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>	2017-08-18 16:19:21 +02:00
Nikolas Sepos	da4a5a9515	Add AutoDedup option to CriuOpts Memory image deduplication, very useful for incremental dumps. See: https://criu.org/Memory_images_deduplication Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>	2017-08-18 01:21:42 +02:00
Aleksa Sarai	59bbdc41a3	merge branch 'pr-1560' Check error return values LGTMs: @crosbymichael @cyphar Closes #1560	2017-08-18 01:31:18 +10:00
Michael Crosby	ccd2c20aa4	Merge pull request #1559 from Mashimiao/panic-fix-nil-linux fix panic when Linux is nil for rootless case	2017-08-17 09:57:35 -04:00
Tobias Klauser	4c5bf649d0	Check error return values Both tty.resize and notifySocket.setupSocket return an error which isn't handled in the caller. Fix this and either log or propagate the errors. Found using https://github.com/mvdan/unparam Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-08-17 11:41:19 +02:00
Michael Crosby	c6126b2141	Merge pull request #1554 from cyphar/use-umoci-release-script release: import umoci's release.sh script	2017-08-16 09:46:56 -04:00
Aleksa Sarai	c24f602407	ci: smoke-test the release script To make sure that `make release` doesn't suddenly break after we've cut a release, smoke-test the release scripts. The script won't fail if GPG keys aren't found, so running in CI shouldn't be a huge issue. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-16 14:44:45 +10:00
Aleksa Sarai	ed68ee1e10	release: import umoci's release.sh script This script is far easier to use than the previous `make release` target, not to mention that it also automatically signs all of the artefacts and makes everything really easy to do for maintainers. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-16 14:35:52 +10:00
Ma Shimiao	2333e7dc67	fix panic when Linux is nil for rootless case The config.Sysctl setting is duplicated. When the container is rootless and Linux is nil, runc will panic. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>	2017-08-16 09:11:13 +08:00
Mrunal Patel	b31bdfc38a	Merge pull request #1558 from hqhq/update_state Update state after update	2017-08-15 10:46:44 -07:00
Qiang Huang	e6e1c34a7d	Update state after update state.json should be a reflection of the container's realtime state, including resource configurations, so we should update state.json after updating container resources. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-15 14:38:44 +08:00
Qiang Huang	eb464f7e43	Merge pull request #1542 from cyphar/buildmode-pic makefile: enable -buildmode=pie	2017-08-15 09:30:40 +08:00
Aleksa Sarai	b45e243f8b	*: enable -buildmode=pie Go has supported PIC builds for a while now, and given the security benefits of using PIC binaries we should really enable them. There also appears to be some indication that non-PIC builds have been interacting oddly on ppc64le (the linker cannot load some shared libraries), and using PIC builds appears to solve this problem. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-15 00:12:27 +10:00
Michael Crosby	760c67744b	Merge pull request #1555 from cyphar/remove-install-flag-makefile makefile: drop usage of --install	2017-08-14 10:04:33 -04:00
Michael Crosby	3096b3fc85	Merge pull request #1556 from hqhq/fix_flakytest_TestNotifyOnOOM Fix flaky test TestNotifyOnOOM	2017-08-14 10:03:23 -04:00
Qiang Huang	9aa46c1e66	Merge pull request #1551 from crosbymichael/linux-nil fix panic when Linux is nil	2017-08-14 19:35:31 +08:00
Qiang Huang	7726bcf0e2	Some fixes for testMemoryNotification Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-14 15:28:03 +08:00
Qiang Huang	40a1fb0e2f	Fix flaky test TestNotifyOnOOM Fixes: #1228 It can be reproduced by applying this patch: ```diff @@ -45,6 +46,7 @@ func registerMemoryEvent(cgDir string, evName string, arg string) (<-chan struct go func() { defer func() { close(ch) + <-time.After(1 * time.Second) eventfd.Close() evFile.Close() }() ``` The fix is to close the channel only after the fds have been closed. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-14 15:18:59 +08:00
Aleksa Sarai	6581d0f488	makefile: drop usage of --install The "go build -i" invocation may slightly help with incremental recompilation, but it will cause builds to fail if $GOROOT is not writeable by the current user. While this does appear to work sometimes, it's a concern for external build systems where "-i" causes build errors for no real gain. Given the size of the runc project, --install is not really giving us much anyway. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-14 00:10:32 +10:00
Ma Shimiao	527dc5acbb	fix panic when Linux is nil Linux is not always non-nil. If Linux is nil, a panic will occur. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-08-10 15:57:49 -04:00
Michael Crosby	3f2f8b84a7	Merge pull request #1553 from mlaventure/handle-non-devices Handle non-devices correctly in DeviceFromPath	2017-08-10 14:37:50 -04:00
Aleksa Sarai	739db6d3fa	merge branch 'pr-1532' VERSION: back to development VERSION: release v1.0.0-rc4 Votes: +5 -0 /2 LGTMs: @hqhq @crosbymichael Closes #1532	2017-08-11 00:31:10 +10:00
Kenfe-Mickael Laventure	3ed492ad33	Handle non-devices correctly in DeviceFromPath Before this change, some file types would be treated as char devices (e.g. symlinks). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-09 08:52:20 -07:00
Michael Crosby	d40db12e72	Merge pull request #1506 from LittleLightLittleFire/1443-runc-reap-child-process Pass back the pid of runc:[1:CHILD] so we can wait on it	2017-08-07 09:33:14 -04:00
Alex Fang	e92add2151	Pass back the pid of runc:[1:CHILD] so we can wait on it This allows the libcontainer to automatically clean up runc:[1:CHILD] processes created as part of nsenter. Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>	2017-08-05 13:44:36 +10:00
Aleksa Sarai	45bde006ca	merge branch 'pr-1535' LGTMs: @avagin @cyphar Closes #1535	2017-08-05 13:33:07 +10:00
Aleksa Sarai	22bbec1b7f	merge branch 'pr-1548' LGTMs: @crosbymichael @mrunalp @cyphar Closes #1548	2017-08-05 13:02:46 +10:00
Mrunal Patel	135b9992b3	Merge pull request #1544 from mlaventure/fix-device-from-path Fix condition to detect device type in DeviceFromPath	2017-08-04 17:36:57 -07:00
Kenfe-Mickael Laventure	6056912217	Revert "Merge pull request #1450 from vrothberg/sgid-non-numeric" This reverts commit `5c73abbe75`, reversing changes made to `51b501dab1`. Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-04 14:28:21 -07:00
Daniel, Dao Quang Minh	606fb713d9	Merge pull request #1545 from mlaventure/user-pkg-move-unix-call Move user pkg unix specific calls to unix file	2017-08-03 23:29:58 +01:00
Kenfe-Mickael Laventure	25f4c7e72b	Move user pkg unix specific calls to unix file Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-03 11:31:21 -07:00
Kenfe-Mickael Laventure	9ed15e94c8	Fix condition to detect device type in DeviceFromPath Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-03 11:06:54 -07:00
Mrunal Patel	9a01140955	Merge pull request #1543 from avagin/maintainer Remove @avagin as a maintainer	2017-08-02 11:12:42 -07:00
Andrei Vagin	b9cff3c188	Remove @avagin as a maintainer Unfortunately I don't have enough time to be a maintainer of runc. I am not going to disappear from the community and, as before, I am always ready to help with anything. Signed-off-by: Andrei Vagin <avagin@openvz.org>	2017-08-02 10:55:08 -07:00
Adrian Reber	5d386f6e2b	checkpoint: use CRIU VERSION RPC if available With this runC also uses RPC to ask CRIU for its version. CRIU supports a VERSION RPC since CRIU 3.0 and using the RPC interface does not require parsing the console output of CRIU (which could change anytime). For older CRIU versions which do not yet have the VERSION RPC runC falls back to its old CRIU output parsing mode. Once CRIU 3.0 is the minimum version required for runC the old code can be removed. v2: * adapt to changes in the previous patches based on the review Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:08:07 +00:00
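The "RPC first, console parsing as fallback" pattern from this commit could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the `Version: X.Y[.Z]` output format, and the integer encoding `major*10000 + minor*100 + sublevel` are not taken from the commit, and the RPC call is represented by a plain function value rather than a real CRIU request.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseVersionOutput (hypothetical) parses text such as "Version: 3.3"
// into a single integer. Both the format and the encoding are assumed.
func parseVersionOutput(out string) (int, error) {
	fields := strings.Fields(out)
	if len(fields) < 2 || fields[0] != "Version:" {
		return 0, errors.New("unrecognized version output")
	}
	parts := strings.Split(fields[1], ".")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, errors.New("unrecognized version number")
	}
	weights := []int{10000, 100, 1}
	version := 0
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return 0, err
		}
		version += n * weights[i]
	}
	return version, nil
}

// criuVersion (hypothetical) prefers the RPC path and only parses the
// console output when the RPC is unavailable, e.g. on older CRIU.
func criuVersion(viaRPC func() (int, error), consoleOutput func() (string, error)) (int, error) {
	if v, err := viaRPC(); err == nil {
		return v, nil
	}
	out, err := consoleOutput()
	if err != nil {
		return 0, err
	}
	return parseVersionOutput(out)
}

func main() {
	// Simulate an older CRIU without the VERSION RPC.
	noRPC := func() (int, error) { return 0, errors.New("VERSION RPC not supported") }
	console := func() (string, error) { return "Version: 2.12\n", nil }

	v, err := criuVersion(noRPC, console)
	if err != nil {
		panic(err)
	}
	fmt.Println("CRIU version:", v)
}
```

Keeping the fallback behind one function makes it easy to delete once the minimum supported CRIU version provides the RPC, as the commit message anticipates.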
Adrian Reber	2393692536	criurpc.proto: copy latest criurpc.proto from criu 3.3 Update criurpc.proto for the upcoming VERSION RPC. This includes lazy_pages for the upcoming lazy migration support. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:07:32 +00:00
Adrian Reber	c71d9cd447	criuSwrk: prepare for CRIU VERSION RPC To use the CRIU VERSION RPC the criuSwrk function is adapted to work with CriuOpts set to 'nil' as CriuOpts is not required for the VERSION RPC. Also do not print c.criuVersion if it is '0' as the first RPC call will always be the VERSION call and only after that the version will be known. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:07:28 +00:00

3371 Commits