jasder/runc - runc - Junke (军科) Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Xiaochen Shen	2549545df5	intelrdt: always init IntelRdtManager if Intel RDT is enabled In current implementation: Either Intel RDT is not enabled by hardware and kernel, or intelRdt is not specified in original config, we don't init IntelRdtManager in the container to handle intelrdt constraint. It is a tradeoff that Intel RDT has hardware limitation to support only limited number of groups. This patch makes a minor change to support update command: Whether or not intelRdt is specified in config, we always init IntelRdtManager in the container if Intel RDT is enabled. If intelRdt is not specified in original config, we just don't Apply() to create intelrdt group or attach tasks for this container. In update command, we could re-enable through IntelRdtManager.Apply() and then update intelrdt constraint. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-20 01:37:31 +08:00
Michael Crosby	593914b8bd	Merge pull request #1593 from s7v7nislands/drop_go1.5 Drop support golang 1.5	2017-09-12 15:22:00 -04:00
s7v7nislands	00ad8e1e56	Drop support golang 1.5 Signed-off-by: Xiaobing Jiang <s7v7nislands@gmail.com>	2017-09-12 20:56:51 +08:00
Qiang Huang	68e00e906b	Merge pull request #1586 from crosbymichael/set-cgroups Apply cgroups earlier	2017-09-12 12:13:29 +08:00
Aleksa Sarai	f1e19e9744	merge branch 'pr-1579' Disable systemd in static build LGTMs: @crosbymichael @cyphar Closes #1579	2017-09-12 08:01:24 +10:00
Aleksa Sarai	f756d904ce	merge branch 'pr-1577' Add `-installsuffix netgo` in static build Use `netgo` for static build LGTMs: @crosbymichael @cyphar Closes #1577	2017-09-12 08:00:00 +10:00
Yong Tang	e9944d0f4c	Disable systemd in static build This fix tries to address the warnings caused by static build with go 1.9. As systemd needs dlopen/dlclose, the following warnings will be generated for static build in go 1.9: ``` root@f4b077232050:/go/src/github.com/opencontainers/runc# make static CGO_ENABLED=1 go build -tags "seccomp cgo static_build" -ldflags "-w -extldflags -static -X main.gitCommit="1c81e2a794c6e26a4c650142ae8893c47f619764" -X main.version=1.0.0-rc4+dev " -o runc . /tmp/go-link-113476657/000007.o: In function `_cgo_a5acef59ed3f_Cfunc_dlopen': /tmp/go-build/github.com/opencontainers/runc/vendor/github.com/coreos/pkg/dlopen/_obj/cgo-gcc-prolog:76: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking ``` This fix disables systemd when `static_build` flag is on (apply_nosystemd.go is used instead). This fix also fixes a small bug in `apply_nosystemd.go` for return value. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:38:22 +00:00
Mrunal Patel	d5b43c3981	Merge pull request #1455 from dqminh/epoll-io tty: move IO of master pty to be done with epoll	2017-09-11 11:32:42 -07:00
Yong Tang	ec42eaa427	Add `-installsuffix netgo` in static build This fix adds `-installsuffix netgo` to the static build, in combination with `-tags netgo`. See the following for the reason: https://github.com/golang/go/issues/9369#issuecomment-69864440 Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:20:19 +00:00
Yong Tang	337c3fb88c	Use `netgo` for static build This fix adds `netgo` to tags for static build so that the following warning could be addressed: ``` /tmp/go-link-355596637/000000.o: In function `_cgo_b0c710f30cfd_C2func_getaddrinfo': /tmp/go-build/net/_obj/cgo-gcc-prolog:46: warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking ``` The above warning appears when building `make static` with go 1.9. Signed-off-by: Yong Tang <yong.tang.github@outlook.com>	2017-09-11 18:20:19 +00:00
Michael Crosby	8b47a242a9	Merge pull request #1529 from giuseppe/rootless-improvements Support multiple users/groups mapped for the rootless case	2017-09-11 14:01:31 -04:00
Aleksa Sarai	eb5bd4fa6a	tests: add tests for rootless multi-mapping configurations Enable several previously disabled tests (for the idmap execution mode) for rootless containers, in addition to making all tests use the additional mappings. At the moment there's no strong need to add any additional tests purely for rootless_idmap. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:33 +10:00
Aleksa Sarai	d0aec23c7e	tests: generalise rootless runner This is necessary in order to add proper opportunistic tests, and is a placeholder until we add tests for new{uid,gid}map configurations. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:33 +10:00
Aleksa Sarai	1a5fdc1c5f	init: support setting -u with rootless containers Now that rootless containers have support for multiple uid and gid mappings, allow --user to work as expected. If the user is not mapped, an error occurs (as usual). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:33 +10:00
Aleksa Sarai	969bb49cc3	nsenter: do not resolve path in nsexec context With the addition of our new{uid,gid}map support, we used to call execvp(3) from inside nsexec. This would mean that the path resolution for the binaries would happen in nsexec. Move the resolution to the initial setup code, and pass the absolute path to nsexec. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:33 +10:00
Aleksa Sarai	6097ce74d8	nsenter: correctly handle newgidmap path for rootless containers After quite a bit of debugging, I found that previous versions of this patchset did not include newgidmap in a rootless setting. Fix this by passing it whenever group mappings are applied, and also providing some better checking for try_mapping_tool. This commit also includes some stylistic improvements. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	3282f5a7c1	tests: fix for rootless multiple uids/gids Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	d8b669400a	rootless: allow multiple user/group mappings Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	fdf85e35b3	main: honor XDG_RUNTIME_DIR for rootless containers Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2017-09-09 12:44:34 +10:00
Mrunal Patel	13fa5d2953	Merge pull request #1588 from s7v7nislands/delete_unused Delete unused function	2017-09-08 17:34:00 -07:00
Michael Crosby	b82d07e816	Merge pull request #1587 from Mashimiao/fix-namespace-empty Fixes #1585 config.Namespaces is empty when accessed	2017-09-08 10:50:16 -04:00
Michael Crosby	9755e0065f	Merge pull request #1589 from xiaochenshen/rdt-cat-bug-fix libcontainer: intelrdt: use init() to avoid race condition	2017-09-08 10:41:45 -04:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
s7v7nislands	c795b8690b	Delete unused function Signed-off-by: Xiaobing Jiang <s7v7nislands@gmail.com>	2017-09-08 10:35:46 +08:00
Ma Shimiao	c3d20e7817	Fixes #1585 config.Namespaces is empty when accessed Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>	2017-09-08 09:30:07 +08:00
Mrunal Patel	deb9d7fd96	Merge pull request #1569 from cyphar/delay-seccomp init: delay seccomp application as late as possible	2017-09-07 13:27:37 -07:00
Mrunal Patel	7e036aa0b0	Merge pull request #1541 from adrianreber/lazy checkpoint: support lazy migration	2017-09-07 13:25:04 -07:00
Michael Crosby	7062c7556b	Apply cgroups earlier This applies cgroups earlier for container creation before the init process starts running and forking off any additional processes. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-09-07 11:27:33 -04:00
Mrunal Patel	5274430fee	Merge pull request #1279 from xiaochenshen/rdt-cat-resource-manager-v1 libcontainer: add support for Intel RDT/CAT in runc	2017-09-06 14:36:02 -07:00
Adrian Reber	ec260653b7	lazy-migration: add test case The lazy-pages test case is not as straightforward as the other test cases. This is related to the fact that restoring requires a different name if restored on the same host. During 'runc checkpoint' the container is not destroyed before all memory pages have been transferred to the destination and thus the same container name cannot be used. As real-world usage will rather migrate a container from one system to another than lazy migrate a container on the same host, this is only problematic for this test case. Another reason is that it requires starting 'runc checkpoint' and 'criu lazy-pages' in the background, as those processes need to be running to start the final restore 'runc restore'. CRIU upstream is currently discussing automatically starting 'criu lazy-pages', which would simplify the lazy-pages test case a bit. The handling and checking of the background processes makes the test case not the most elegant, as at one point a 'sleep 2' is required to make sure that 'runc checkpoint' had time to do its thing before looking at log files. Before running the actual test, criu is called in feature checking mode to make sure lazy migration is enabled in the criu used by the test case. If not, the test is skipped. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:39 +00:00
Adrian Reber	60ae7091de	checkpoint: support lazy migration With the help of userfaultfd CRIU supports lazy migration. Lazy migration means that memory pages are only transferred from the migration source to the migration destination on page fault. This makes it possible to reduce the downtime during process or container migration to a minimum, as the memory does not need to be transferred during migration. Lazy migration currently depends on userfaultfd being available on the current Linux kernel and on the CRIU version in use supporting lazy migration. Both dependencies can be checked by querying CRIU via RPC if the lazy migration feature is available. Using feature checking instead of version comparison enables runC to use CRIU features from the criu-dev branch. This way the user can decide if lazy migration should be available by choosing the right kernel and CRIU branch. To use lazy migration the CRIU process during dump needs to dump everything besides the memory pages and then it opens a network port waiting for remote page fault requests: # runc checkpoint httpd --lazy-pages --page-server 0.0.0.0:27 \ --status-fd /tmp/postcopy-pipe In this example CRIU will hang/wait once it has opened the network port and wait for a network connection. As runC waits for CRIU to finish it will also hang until the lazy migration has finished. To know when the restore on the destination side can start the '--status-fd' parameter is used: # runc checkpoint --help | grep status --status-fd value criu writes \0 to this FD once lazy-pages is ready The parameter '--status-fd' is directly from CRIU and this way the process outside of runC which controls the migration knows exactly when to transfer the checkpoint (without memory pages) to the destination and that the restore can be started. 
On the destination side it is necessary to start CRIU in 'lazy-pages' mode like this: # criu lazy-pages --page-server --address 192.168.122.3 --port 27 \ -D checkpoint and tell runC to do a lazy restore: # runc restore -d --image-path checkpoint --work-path checkpoint \ --lazy-pages httpd If both processes on the restore side have the same working directory 'criu lazy-pages' creates a unix domain socket where it waits for requests from the actual restore. runC starts CRIU restore in lazy restore mode and talks to 'criu lazy-pages' that it wants to restore memory pages on demand. CRIU continues to restore the process and once the process is running and accesses the first non-existing memory page the 'criu lazy-pages' server will request the page from the source system. Thus all pages from the source system will be transferred to the destination system. Once all pages have been transferred runC on the source system will end and the container will have finished migration. This can also be combined with CRIU's pre-copy support. The combination of pre-copy and post-copy (lazy migration) provides the possibility to migrate containers with minimal downtimes. Some additional background about post-copy migration can be found in these articles: https://lisas.de/~adrian/?p=1253 https://lisas.de/~adrian/?p=1183 Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Adrian Reber	a3a632ad28	checkpoint: add support to query for lazy page support Before adding the actual lazy migration support, this adds the feature check for lazy-pages. Right now lazy migration, which is based on userfaultfd, is only available in the criu-dev branch and not yet in a release. As the check does not depend on a certain version but on a CRIU feature which can be queried, it can be part of runC without a new version check depending on a feature from criu-dev. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Mrunal Patel	aea4f21eec	Merge pull request #1575 from cyphar/tty-resize-ignore-errors signal: ignore tty.resize errors	2017-09-01 11:20:26 -07:00
Xiaochen Shen	4d2756c116	libcontainer: add test cases for Intel RDT/CAT Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:35:40 +08:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPU support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). More information about Intel RDT/CAT can be found in section 17.17 of the Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux 4.10 kernel or newer, the interface is defined and exposed via the "resource control" filesystem, which is a "cgroup-like" interface. Compared with cgroups, it has a similar process management lifecycle and similar interfaces in a container. But unlike cgroups' hierarchy, it has a single-level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | |-- cbm_mask | |-- min_cbm_bits | |-- num_closids |-- cpus |-- schemata |-- tasks |-- <container_id> |-- cpus |-- schemata |-- tasks For runc, we can make use of `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it is in the root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM). 
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. A valid L3 cache CBM is a contiguous set of bits, and the number of bits that can be set is less than the max bit. The max bits in the CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. The kernel will check if it is valid when writing. E.g., 0xfffff in root indicates the max bits of CBM is 20 bits, which maps to the entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc. For more information about Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% of L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
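The schema format and CBM rules described in the commit above (contiguous bits, subset of the root CBM) can be checked with a short Go sketch. It is illustrative only: `parseL3Schema` and `isContiguous` are hypothetical names rather than libcontainer functions, and the kernel remains the authority on validity when the schemata file is written.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then has the form 2^n - 1.
	run := cbm >> uint(bits.TrailingZeros64(cbm))
	return run&(run+1) == 0
}

// parseL3Schema parses a line such as "L3:0=ffff0;1=3ff" into a map from
// cache id to CBM, rejecting masks that are not contiguous or that are
// not a subset of rootMask.
func parseL3Schema(line string, rootMask uint64) (map[int]uint64, error) {
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("missing L3: prefix in %q", line)
	}
	out := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		kv := strings.SplitN(entry, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(kv[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id %q: %w", kv[0], err)
		}
		cbm, err := strconv.ParseUint(kv[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM %q: %w", kv[1], err)
		}
		if !isContiguous(cbm) || cbm&^rootMask != 0 {
			return nil, fmt.Errorf("invalid CBM %#x for cache %d", cbm, id)
		}
		out[id] = cbm
	}
	return out, nil
}

func main() {
	const rootMask = 0xfffff // 20-bit root CBM from the commit's example
	schema, err := parseL3Schema("L3:0=ffff0;1=3ff", rootMask)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	total := bits.OnesCount64(rootMask)
	for id := 0; id < len(schema); id++ {
		share := 100 * bits.OnesCount64(schema[id]) / total
		fmt.Printf("cache %d: CBM %#x covers %d%% of L3\n", id, schema[id], share)
	}
}
```

For the commit's example schema this reports 80% for cache id 0 (16 of 20 bits) and 50% for cache id 1 (10 of 20 bits), matching the figures in the message.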
Xiaochen Shen	af3b0d9dce	libcontainer/SPEC.md: add documentation for Intel RDT/CAT Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Michael Crosby	84a082bfef	Merge pull request #1578 from cyphar/remove-shfmt-from-ci travis: drop shfmt install	2017-08-31 09:46:39 -04:00
Aleksa Sarai	ace083b650	travis: drop shfmt install It looks like we missed this in `5930d5b427` ("Remove shfmt"), which was causing CI to break (since it looks like the repo has moved or something like that). Since we're no longer using shfmt, drop it completely from the repo. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-31 20:49:51 +10:00
Aleksa Sarai	10b175ce49	signal: ignore tty.resize errors Fixes a race that occurred very frequently in testing where the tty of the container may be closed by the time that runc gets to sending SIGWINCH. This failure mode is not fatal, but it would cause test failures due to expected outputs not matching. On further review it appears that the original addition of these checks in `4c5bf649d0` ("Check error return values") was actually not necessary, so partially revert that change. The particular failure mode this resolves would manifest as error logs of the form: time="2017-08-24T07:59:50Z" level=error msg="bad file descriptor" Fixes: `4c5bf649d0` ("Check error return values") Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-27 00:44:17 +10:00
Aleksa Sarai	1f32fff46d	setns init: delay seccomp as late as possible This mirrors the standard_init_linux.go seccomp code, which only applies seccomp early if NoNewPrivileges is enabled. Otherwise it's done immediately before execve to reduce the amount of syscalls necessary for users to enable in their seccomp profiles. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-26 13:42:30 +10:00
Aleksa Sarai	3ddde27d7d	init: move close(stateDirFd) before seccomp apply This further reduces the number of syscalls that a user needs to enable in their seccomp profile. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-26 13:42:26 +10:00
Qiang Huang	1c81e2a794	Merge pull request #1572 from tych0/fix-readonly-userns fix --read-only containers under --userns-remap	2017-08-26 09:38:14 +08:00
Aleksa Sarai	4d6e6720a7	Merge branch 'pr-1573' Fix systemd cgroup after memory type changed LGTMs: @crosbymichael @cyphar Closes #1573	2017-08-25 23:55:27 +10:00
Michael Crosby	4e33faefa7	Merge pull request #1570 from cyphar/close-statedirfd-hole init: switch away from stateDirFd entirely	2017-08-25 09:52:16 -04:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause; it looks like systemd still wants them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
Aleksa Sarai	7d66aab77a	init: switch away from stateDirFd entirely While we have significant protections in place against CVE-2016-9962, we still were holding onto a file descriptor that referenced the host filesystem. This meant that in certain scenarios it was still possible for a semi-privileged container to gain access to the host filesystem (if they had CAP_SYS_PTRACE). Instead, open the FIFO itself using a O_PATH. This allows us to reference the FIFO directly without providing the ability for directory-level access. When opening the FIFO inside the init process, open it through procfs to re-open the actual FIFO (this is currently the only supported way to open such a file descriptor). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-25 13:19:03 +10:00
Tycho Andersen	66eb2a3e8f	fix --read-only containers under --userns-remap The documentation here: https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations says that readonly containers can't be used with user namespaces due to some kernel restriction. In fact, there is a special case in the kernel to be able to do stuff like this, so let's use it. This takes us from: ubuntu@docker:~$ docker run -it --read-only ubuntu docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:125: remounting \\\"/dev\\\" as readonly caused \\\"operation not permitted\\\"\"". to: ubuntu@docker:~$ docker-runc --version runc version 1.0.0-rc4+dev commit: ae2948042b08ad3d6d13cd09f40a50ffff4fc688-dirty spec: 1.0.0 ubuntu@docker:~$ docker run -it --read-only ubuntu root@181e2acb909a:/# touch foo touch: cannot touch 'foo': Read-only file system Signed-off-by: Tycho Andersen <tycho@docker.com>	2017-08-24 16:43:21 -06:00
Michael Crosby	ae2948042b	Merge pull request #1561 from nseps/master Add AutoDedup option to CriuOpts	2017-08-18 12:50:27 -04:00
Nikolas Sepos	3f234b15d0	Add auto-dedup flag for checkpoint/restore When doing incremental dumps it is useful to use auto deduplication of memory images to save space. Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>	2017-08-18 16:19:21 +02:00
Nikolas Sepos	da4a5a9515	Add AutoDedup option to CriuOpts Memory image deduplication, very useful for incremental dumps. See: https://criu.org/Memory_images_deduplication Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>	2017-08-18 01:21:42 +02:00

3614 Commits

All Branches