jasder/runc - runc - Military Science Open Source Project Hosting

Commit Graph

Author	SHA1	Message	Date
Aleksa Sarai	6097ce74d8	nsenter: correctly handle newgidmap path for rootless containers After quite a bit of debugging, I found that previous versions of this patchset did not include newgidmap in a rootless setting. Fix this by passing it whenever group mappings are applied, and also providing some better checking for try_mapping_tool. This commit also includes some stylistic improvements. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	3282f5a7c1	tests: fix for rootless multiple uids/gids Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2017-09-09 12:45:32 +10:00
Giuseppe Scrivano	d8b669400a	rootless: allow multiple user/group mappings Take advantage of the newuidmap/newgidmap tools to allow multiple users/groups to be mapped into the new user namespace in the rootless case. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> [ rebased to handle intelrdt changes. ] Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-09-09 12:45:32 +10:00
Mrunal Patel	13fa5d2953	Merge pull request #1588 from s7v7nislands/delete_unused Delete unused function	2017-09-08 17:34:00 -07:00
Michael Crosby	b82d07e816	Merge pull request #1587 from Mashimiao/fix-namespace-empty Fixes #1585 config.Namespaces is empty when accessed	2017-09-08 10:50:16 -04:00
Xiaochen Shen	88d22fde40	libcontainer: intelrdt: use init() to avoid race condition This is the follow-up PR of #1279 to fix remaining issues: Use init() to avoid race condition in IsIntelRdtEnabled(). Also rename some variables and functions. Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-08 17:15:31 +08:00
s7v7nislands	c795b8690b	Delete unused function Signed-off-by: Xiaobing Jiang <s7v7nislands@gmail.com>	2017-09-08 10:35:46 +08:00
Ma Shimiao	c3d20e7817	Fixes #1585 config.Namespaces is empty when accessed Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>	2017-09-08 09:30:07 +08:00
Mrunal Patel	deb9d7fd96	Merge pull request #1569 from cyphar/delay-seccomp init: delay seccomp application as late as possible	2017-09-07 13:27:37 -07:00
Mrunal Patel	7e036aa0b0	Merge pull request #1541 from adrianreber/lazy checkpoint: support lazy migration	2017-09-07 13:25:04 -07:00
Michael Crosby	7062c7556b	Apply cgroups earlier This applies cgroups earlier for container creation before the init process starts running and forking off any additional processes. Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-09-07 11:27:33 -04:00
Adrian Reber	60ae7091de	checkpoint: support lazy migration With the help of userfaultfd CRIU supports lazy migration. Lazy migration means that memory pages are only transferred from the migration source to the migration destination on page fault. This enables to reduce the downtime during process or container migration to a minimum as the memory does not need to be transferred during migration. Lazy migration currently depends on userfaultfd being available on the current Linux kernel and if the used CRIU version supports lazy migration. Both dependencies can be checked by querying CRIU via RPC if the lazy migration feature is available. Using feature checking instead of version comparison enables runC to use CRIU features from the criu-dev branch. This way the user can decide if lazy migration should be available by choosing the right kernel and CRIU branch. To use lazy migration the CRIU process during dump needs to dump everything besides the memory pages and then it opens a network port waiting for remote page fault requests: # runc checkpoint httpd --lazy-pages --page-server 0.0.0.0:27 \ --status-fd /tmp/postcopy-pipe In this example CRIU will hang/wait once it has opened the network port and wait for network connection. As runC waits for CRIU to finish it will also hang until the lazy migration has finished. To know when the restore on the destination side can start the '--status-fd' parameter is used: # runc checkpoint --help | grep status --status-fd value criu writes \0 to this FD once lazy-pages is ready The parameter '--status-fd' is directly from CRIU and this way the process outside of runC which controls the migration knows exactly when to transfer the checkpoint (without memory pages) to the destination and that the restore can be started.
On the destination side it is necessary to start CRIU in 'lazy-pages' mode like this: # criu lazy-pages --page-server --address 192.168.122.3 --port 27 \ -D checkpoint and tell runC to do a lazy restore: # runc restore -d --image-path checkpoint --work-path checkpoint \ --lazy-pages httpd If both processes on the restore side have the same working directory 'criu lazy-pages' creates a unix domain socket where it waits for requests from the actual restore. runC starts CRIU restore in lazy restore mode and talks to 'criu lazy-pages' that it wants to restore memory pages on demand. CRIU continues to restore the process and once the process is running and accesses the first non-existing memory page the 'criu lazy-pages' server will request the page from the source system. Thus all pages from the source system will be transferred to the destination system. Once all pages have been transferred runC on the source system will end and the container will have finished migration. This can also be combined with CRIU's pre-copy support. The combination of pre-copy and post-copy (lazy migration) provides the possibility to migrate containers with minimal downtimes. Some additional background about post-copy migration can be found in these articles: https://lisas.de/~adrian/?p=1253 https://lisas.de/~adrian/?p=1183 Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Adrian Reber	a3a632ad28	checkpoint: add support to query for lazy page support Before adding the actual lazy migration support, this adds the feature check for lazy-pages. Right now lazy migration, which is based on userfaultfd, is only available in the criu-dev branch and not yet in a release. As the check does not depend on a certain version but on a CRIU feature which can be queried it can be part of runC without a new version check depending on a feature from criu-dev. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-09-06 12:35:38 +00:00
Xiaochen Shen	4d2756c116	libcontainer: add test cases for Intel RDT/CAT Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:35:40 +08:00
Xiaochen Shen	692f6e1e27	libcontainer: add support for Intel RDT/CAT in runc About Intel RDT/CAT feature: Intel platforms with new Xeon CPU support Intel Resource Director Technology (RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which currently supports L3 cache resource allocation. This feature provides a way for the software to restrict cache allocation to a defined 'subset' of L3 cache which may be overlapping with other 'subsets'. The different subsets are identified by class of service (CLOS) and each CLOS has a capacity bitmask (CBM). More information about Intel RDT/CAT can be found in section 17.17 of the Intel Software Developer Manual. About Intel RDT/CAT kernel interface: In Linux 4.10 kernel or newer, the interface is defined and exposed via the "resource control" filesystem, which is a "cgroup-like" interface. Compared with cgroups, it has a similar process management lifecycle and interfaces in a container. But unlike the cgroups hierarchy, it has a single-level filesystem layout. Intel RDT "resource control" filesystem hierarchy: mount -t resctrl resctrl /sys/fs/resctrl tree /sys/fs/resctrl /sys/fs/resctrl/ |-- info | |-- L3 | |-- cbm_mask | |-- min_cbm_bits | |-- num_closids |-- cpus |-- schemata |-- tasks |-- <container_id> |-- cpus |-- schemata |-- tasks For runc, we can make use of `tasks` and `schemata` configuration for L3 cache resource constraints. The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent. If a pid is not in any sub group, it is in the root group. The file `schemata` has allocation bitmasks/values for L3 cache on each socket, which contains L3 cache id and capacity bitmask (CBM).
Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0` which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0. A valid L3 cache CBM is a contiguous set of bits, and the number of bits that can be set is less than the max bit. The maximum number of bits in the CBM varies among supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem layout, the CBM in a group should be a subset of the CBM in root. The kernel will check if it is valid when writing. E.g., 0xfffff in root indicates the max bits of CBM is 20 bits, which maps to the entire L3 cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc. For more information about the Intel RDT/CAT kernel interface: https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt An example for runc: Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. With this configuration, tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% of L3 cache id 1: "linux": { "intelRdt": { "l3CacheSchema": "L3:0=ffff0;1=3ff" } } Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Xiaochen Shen	af3b0d9dce	libcontainer/SPEC.md: add documentation for Intel RDT/CAT Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>	2017-09-01 14:26:33 +08:00
Aleksa Sarai	1f32fff46d	setns init: delay seccomp as late as possible This mirrors the standard_init_linux.go seccomp code, which only applies seccomp early if NoNewPrivileges is enabled. Otherwise it's done immediately before execve to reduce the amount of syscalls necessary for users to enable in their seccomp profiles. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-26 13:42:30 +10:00
Aleksa Sarai	3ddde27d7d	init: move close(stateDirFd) before seccomp apply This further reduces the number of syscalls that a user needs to enable in their seccomp profile. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-26 13:42:26 +10:00
Qiang Huang	1c81e2a794	Merge pull request #1572 from tych0/fix-readonly-userns fix --read-only containers under --userns-remap	2017-08-26 09:38:14 +08:00
Aleksa Sarai	4d6e6720a7	Merge branch 'pr-1573' Fix systemd cgroup after memory type changed LGTMs: @crosbymichael @cyphar Closes #1573	2017-08-25 23:55:27 +10:00
Qiang Huang	acaf6897f5	Fix systemd cgroup after memory type changed Fixes: #1557 I'm not quite sure about the root cause, looks like systemd still want them to be uint64. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-25 01:14:16 -04:00
Aleksa Sarai	7d66aab77a	init: switch away from stateDirFd entirely While we have significant protections in place against CVE-2016-9962, we still were holding onto a file descriptor that referenced the host filesystem. This meant that in certain scenarios it was still possible for a semi-privileged container to gain access to the host filesystem (if they had CAP_SYS_PTRACE). Instead, open the FIFO itself using a O_PATH. This allows us to reference the FIFO directly without providing the ability for directory-level access. When opening the FIFO inside the init process, open it through procfs to re-open the actual FIFO (this is currently the only supported way to open such a file descriptor). Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-08-25 13:19:03 +10:00
Tycho Andersen	66eb2a3e8f	fix --read-only containers under --userns-remap The documentation here: https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-limitations says that readonly containers can't be used with user namespaces due to some kernel restriction. In fact, there is a special case in the kernel to be able to do stuff like this, so let's use it. This takes us from: ubuntu@docker:~$ docker run -it --read-only ubuntu docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "process_linux.go:339: container init caused \"rootfs_linux.go:125: remounting \\\"/dev\\\" as readonly caused \\\"operation not permitted\\\"\"". to: ubuntu@docker:~$ docker-runc --version runc version 1.0.0-rc4+dev commit: ae2948042b08ad3d6d13cd09f40a50ffff4fc688-dirty spec: 1.0.0 ubuntu@docker:~$ docker run -it --read-only ubuntu root@181e2acb909a:/# touch foo touch: cannot touch 'foo': Read-only file system Signed-off-by: Tycho Andersen <tycho@docker.com>	2017-08-24 16:43:21 -06:00
Nikolas Sepos	da4a5a9515	Add AutoDedup option to CriuOpts Memory image deduplication, very useful for incremental dumps. See: https://criu.org/Memory_images_deduplication Signed-off-by: Nikolas Sepos <nikolas.sepos@gmail.com>	2017-08-18 01:21:42 +02:00
Michael Crosby	ccd2c20aa4	Merge pull request #1559 from Mashimiao/panic-fix-nil-linux fix panic when Linux is nil for rootless case	2017-08-17 09:57:35 -04:00
Ma Shimiao	2333e7dc67	fix panic when Linux is nil for rootless case The config.Sysctl setting is duplicated. When the container is rootless and Linux is nil, runc will panic. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>	2017-08-16 09:11:13 +08:00
Mrunal Patel	b31bdfc38a	Merge pull request #1558 from hqhq/update_state Update state after update	2017-08-15 10:46:44 -07:00
Qiang Huang	e6e1c34a7d	Update state after update state.json should be a reflection of the container's realtime state, including resource configurations, so we should update state.json after updating container resources. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-15 14:38:44 +08:00
Michael Crosby	3096b3fc85	Merge pull request #1556 from hqhq/fix_flakytest_TestNotifyOnOOM Fix flaky test TestNotifyOnOOM	2017-08-14 10:03:23 -04:00
Qiang Huang	7726bcf0e2	Some fixes for testMemoryNotification Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-14 15:28:03 +08:00
Qiang Huang	40a1fb0e2f	Fix flaky test TestNotifyOnOOM Fixes: #1228 It can be reproduced by applying this patch: ```diff @@ -45,6 +46,7 @@ func registerMemoryEvent(cgDir string, evName string, arg string) (<-chan struct go func() { defer func() { close(ch) + <-time.After(1 * time.Second) eventfd.Close() evFile.Close() }() ``` We can close channel after fds were closed. Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-08-14 15:18:59 +08:00
Ma Shimiao	527dc5acbb	fix panic when Linux is nil Linux is not always non-nil. If Linux is nil, a panic will occur. Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com> Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-08-10 15:57:49 -04:00
Kenfe-Mickael Laventure	3ed492ad33	Handle non-devices correctly in DeviceFromPath Before this change, some file type would be treated as char devices (e.g. symlinks). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-09 08:52:20 -07:00
Alex Fang	e92add2151	Pass back the pid of runc:[1:CHILD] so we can wait on it This allows the libcontainer to automatically clean up runc:[1:CHILD] processes created as part of nsenter. Signed-off-by: Alex Fang <littlelightlittlefire@gmail.com>	2017-08-05 13:44:36 +10:00
Aleksa Sarai	45bde006ca	merge branch 'pr-1535' LGTMs: @avagin @cyphar Closes #1535	2017-08-05 13:33:07 +10:00
Aleksa Sarai	22bbec1b7f	merge branch 'pr-1548' LGTMs: @crosbymichael @mrunalp @cyphar Closes #1548	2017-08-05 13:02:46 +10:00
Mrunal Patel	135b9992b3	Merge pull request #1544 from mlaventure/fix-device-from-path Fix condition to detect device type in DeviceFromPath	2017-08-04 17:36:57 -07:00
Kenfe-Mickael Laventure	6056912217	Revert "Merge pull request #1450 from vrothberg/sgid-non-numeric" This reverts commit `5c73abbe75`, reversing changes made to `51b501dab1`. Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-04 14:28:21 -07:00
Kenfe-Mickael Laventure	25f4c7e72b	Move user pkg unix specific calls to unix file Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-03 11:31:21 -07:00
Kenfe-Mickael Laventure	9ed15e94c8	Fix condition to detect device type in DeviceFromPath Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>	2017-08-03 11:06:54 -07:00
Adrian Reber	5d386f6e2b	checkpoint: use CRIU VERSION RPC if available With this runC also uses RPC to ask CRIU for its version. CRIU supports a VERSION RPC since CRIU 3.0 and using the RPC interface does not require parsing the console output of CRIU (which could change anytime). For older CRIU versions which do not yet have the VERSION RPC runC falls back to its old CRIU output parsing mode. Once CRIU 3.0 is the minimum version required for runC the old code can be removed. v2: * adapt to changes in the previous patches based on the review Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:08:07 +00:00
Adrian Reber	2393692536	criurpc.proto: copy latest criurpc.proto from criu 3.3 Update criurpc.proto for the upcoming VERSION RPC. This includes lazy_pages for the upcoming lazy migration support. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:07:32 +00:00
Adrian Reber	c71d9cd447	criuSwrk: prepare for CRIU VERSION RPC To use the CRIU VERSION RPC the criuSwrk function is adapted to work with CriuOpts set to 'nil' as CriuOpts is not required for the VERSION RPC. Also do not print c.criuVersion if it is '0' as the first RPC call will always be the VERSION call and only after that the version will be known. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:07:28 +00:00
Adrian Reber	c5f0ce979b	checkCriuVersion: only ask criu once about its version If the version of criu has already been determined there is no need to ask criu for the version again. Use the value from c.criuVersion. v2: * reduce unnecessary code movement in the patch series * factor out the criu version parsing into a separate function Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:07:15 +00:00
Adrian Reber	b6c47281db	checkCriuVersion: switch to version using int The checkCriuVersion function used a string to specify the minimum version required. This is more comfortable for an external interface but for an internal function this added unnecessary complexity. This changes the version string like '1.5.2' to an integer like 10502. This is already the format used internally in the function. Signed-off-by: Adrian Reber <areber@redhat.com>	2017-08-02 16:05:27 +00:00
Michael Crosby	882d8eaba6	Merge pull request #1537 from tklauser/staticcheck Fix issues found by staticcheck	2017-08-02 09:52:11 -04:00
Daniel, Dao Quang Minh	b313a75364	Merge pull request #1477 from yummypeng/save-own-ns-path Always save own namespace paths	2017-08-02 11:24:30 +01:00
Tobias Klauser	e4e56cb6d8	libcontainer: remove ineffective break statements go's switch statement doesn't need an explicit break. Remove it where that is the case and add a comment to indicate the purpose where the removal would lead to an empty case. Found with honnef.co/go/tools/cmd/staticcheck Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-28 15:13:39 +02:00
Tobias Klauser	24a4273cf9	libcontainer: handle error cases Handle err return value of fmt.Scanf, os.Pipe and unix.ParseUnixRights. Found with honnef.co/go/tools/cmd/staticcheck Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-28 15:13:11 +02:00
Daniel Dao	91eafcbc65	tty: move IO of master pty to be done with epoll This moves all console code to use github.com/containerd/console library to handle console I/O. Also move to use EpollConsole by default when user requests a terminal so we can still cope when the other side temporarily goes away. Signed-off-by: Daniel Dao <dqminh89@gmail.com>	2017-07-28 12:35:02 +01:00
Michael Crosby	e775f0fba3	Merge pull request #1526 from stevenh/logrus-v1 Updated logrus to v1	2017-07-27 13:28:55 -04:00
yangshukui	5428532bdd	remove the code that close negative descriptor Signed-off-by: yangshukui <yangshukui@huawei.com>	2017-07-24 11:10:18 +08:00
Tobias Klauser	b0d014d0e1	libcontainer: one more switch from syscall to x/sys/unix Refactor DeviceFromPath in order to get rid of package syscall and directly use the functions from x/sys/unix. This also makes it possible to get rid of the conversion from the OS-independent file mode values (from the os package) to Linux specific values and instead lets us use the raw file mode value directly. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-21 16:59:15 +02:00
Steven Hartland	ee4f68e302	Updated logrus to v1 Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen. This includes a manual edit of the docker term package to also correct the name there too. Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>	2017-07-19 15:20:56 +00:00
Daniel, Dao Quang Minh	7ab4f43a4b	Merge pull request #1519 from tklauser/moar-unix libcontainer: use additional functions and constants from x/sys/unix	2017-07-17 10:07:22 +01:00
Qiang Huang	825b5c020a	Merge pull request #1516 from cyphar/list-casting-unicode list: fix various problems with owner field	2017-07-16 14:57:20 +08:00
Tobias Klauser	4019833d46	libcontainer: use PR_SET_NO_NEW_PRIVS from x/sys/unix Use PR_SET_NO_NEW_PRIVS defined in golang.org/x/sys/unix instead of manually defining it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-13 15:31:33 +02:00
Tobias Klauser	54d27bed7f	libcontainer: use ParseSocketControlMessage/ParseUnixRights from x/sys/unix Use ParseSocketControlMessage and ParseUnixRights from golang.org/x/sys/unix instead of their syscall equivalent. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-13 15:02:17 +02:00
Yuanhong Peng	e939079acf	Always save own namespace paths fix #1476 If containerA shares namespace, say ipc namespace, with containerB, then its ipc namespace path would be the same as containerB and be stored in `state.json`. Exec into containerA will just read the namespace paths stored in this file and join these namespaces. So, if containerB has already been stopped, `docker exec containerA` will fail. To address this issue, we should always save own namespace paths no matter if we share namespaces with other containers. Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>	2017-07-13 16:13:05 +08:00
Michael Crosby	eb70c213ba	Update runtime-spec to rc6 Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-07-12 16:24:04 -07:00
Aleksa Sarai	7cfb107f2c	factory: use e{u,g}id as the owner of /run/runc/$id It appears as though these semantics were not fully thought out when implementing them for rootless containers. It is not necessary (and could be potentially dangerous) to set the owner of /run/ctr/$id to be the root inside the container (if user namespaces are being used). Instead, just use the e{g,u}id of runc to determine the owner. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-07-12 06:30:46 +10:00
Tobias Klauser	078e903296	libcontainer: use ioctl wrappers from x/sys/unix Use IoctlGetInt and IoctlGetTermios/IoctlSetTermios instead of manually reimplementing them. Because of unlockpt, the ioctl wrapper is still needed as it needs to pass a pointer to a value, which is not supported by any ioctl function in x/sys/unix yet. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-10 10:56:58 +02:00
Tobias Klauser	a380fae959	libcontainer: use Prctl() from x/sys/unix Use unix.Prctl() instead of manually reimplementing it using unix.RawSyscall. Also use unix.SECCOMP_MODE_FILTER instead of locally defining it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-07-10 10:56:58 +02:00
Michael Crosby	5c73abbe75	Merge pull request #1450 from vrothberg/sgid-non-numeric libcontainer/user: add supplementary groups only for non-numeric users	2017-07-07 09:43:30 -07:00
Daniel, Dao Quang Minh	7139b61f7f	Merge pull request #1378 from derekwaynecarr/expose_use_hierarchy Expose memory.use_hierarchy in MemoryStats	2017-06-30 16:08:21 +01:00
Michael Crosby	fef3aced0e	Merge pull request #1460 from wking/mount-option-lazytime libcontainer/specconv/spec_linux: Add support for (no)lazytime	2017-06-29 10:06:23 -07:00
Aleksa Sarai	117c92745b	rootfs: switch ms_private remount of oldroot to ms_slave Using MS_PRIVATE meant that there was a race between the mount(2) and the umount2(2) calls where runc inadvertently has a live reference to a mountpoint that existed on the host (which the host cannot kill implicitly through an unmount and peer sharing). In particular, this means that if we have a devicemapper mountpoint and the host is trying to delete the underlying device, the delete will fail because it is "in use" during the race. While the race is _very_ small (and libdm actually retries to avoid these sorts of cases) this appears to manifest in various cases. Signed-off-by: Aleksa Sarai <asarai@suse.de>	2017-06-29 01:20:23 +10:00
Justin Cormack	3d9074ead3	Update memory specs to use int64 not uint64 replace #1492 #1494 fix #1422 Since https://github.com/opencontainers/runtime-spec/pull/876 the memory specifications are now `int64`, as that better matches the visible interface where `-1` is a valid value. Otherwise finding the correct value was difficult as it was kernel dependent. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-06-27 12:16:07 +01:00
Justin Cormack	e1146182a8	Remove Platform as no longer in OCI spec This was never used, just validated, so was removed from spec. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-06-27 12:16:07 +01:00
Michael Crosby	d337d807fc	Merge pull request #1482 from tklauser/x-sys-unix-keyctl Use keyctl wrappers from x/sys/unix	2017-06-23 11:07:55 -07:00
Mrunal Patel	8e1896b3bd	Merge pull request #1491 from tklauser/unix-eventfd Use Eventfd() from golang.org/x/sys/unix	2017-06-22 19:02:44 -07:00
Michael Crosby	bd65ef625d	Merge pull request #1489 from wking/process-status libcontainer/container_linux: Consider process state (running, zombie, etc.) in runType	2017-06-21 10:24:04 -07:00
Tobias Klauser	da4cebcfe2	libcontainer: use Eventfd() from x/sys/unix Use unix.Eventfd() instead of manually reimplementing it using the raw syscall. Also use the correct corresponding unix.EFD_CLOEXEC flag instead of unix.FD_CLOEXEC (which can have a different value on some architectures and thus might lead to unexpected behavior). Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-06-21 10:02:00 +02:00
W. Trevor King	2bea4c897e	libcontainer/system/proc: Add Stat_t.State And Stat_t.PID and Stat_t.Name while we're at it. Then use the new .State property in runType to distinguish between running and zombie/dead processes, since kill(2) does not [1]. With this change we no longer claim Running status for zombie/dead processes. I've also removed the kill(2) call from runType. It was originally added in `13841ef3` (new-api: return the Running state only if the init process is alive, 2014-12-23), but we've been accessing /proc/[pid]/stat since `14e95b2a` (Make state detection precise, 2016-07-05, #930), and with the /stat access the kill(2) check is redundant. I also don't see much point to the previously-separate doesInitProcessExist, so I've inlined that logic in runType. It would be nice to distinguish between "/proc/[pid]/stat doesn't exist" and errors parsing its contents, but I've skipped that for the moment. The Running -> Stopped change in checkpoint_test.go is because the post-checkpoint process is a zombie, and with this commit zombie processes are Stopped (and no longer Running). [1]: https://github.com/opencontainers/runc/pull/1483#issuecomment-307527789 Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
W. Trevor King	75d98b26b7	libcontainer: Replace GetProcessStartTime with Stat_t.StartTime And convert the various start-time properties from strings to uint64s. This removes all internal consumers of the deprecated GetProcessStartTime function. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-20 16:26:55 -07:00
Michael Crosby	6e57120d9f	Merge pull request #1481 from elianka/dev update READ.me for new struct configs.Config.Capabilities	2017-06-20 13:15:04 -07:00
W. Trevor King	439eaa3584	libcontainer/system/proc: Add Stat and Stat_t So we can extract more than the start time with a single read. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-14 15:28:03 -07:00
Tobias Klauser	cfe87fe3e2	Use keyctl wrappers from x/sys/unix Use KeyctlJoinSessionKeyring, KeyctlString and KeyctlSetperm from golang.org/x/sys/unix instead of manually reimplementing them. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-06-09 15:55:18 +02:00
Kang Liang	a341724c95	update READ.me for new struct configs.Config.Capabilities Signed-off-by: Kang Liang <kangliang424@gmail.com>	2017-06-09 18:47:05 +08:00
W. Trevor King	830c0d70df	libcontainer/console_linux.go: Make SaneTerminal public And use it only in local tooling that is forwarding the pseudoterminal master. That way runC no longer has an opinion on the onlcr setting for folks who are creating a terminal and detaching. They'll use --console-socket and can set up the pseudoterminal however they like without runC having an opinion. With this commit, the only cases where runC still applies SaneTerminal are when it is the process consuming the master descriptor. Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-07 21:32:41 -07:00
Tobias Klauser	553016d7da	Use Prctl() from x/sys/unix instead of own wrapper Use unix.Prctl() instead of reimplementing it as system.Prctl(). Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-06-07 15:03:15 +02:00
Mrunal Patel	9d6821d1b5	Merge pull request #1473 from crosbymichael/update-spec Update spec to `239c4e44f2`	2017-06-06 10:26:07 -07:00
Vladimir Stefanovic	d01050e6d4	Add support for mips/mips64 Signed-off-by: Vladimir Stefanovic <vladimir.stefanovic@imgtec.com>	2017-06-02 22:30:00 +02:00
Tobias Klauser	306b4980f7	Use NLA_* constants from x/sys/unix instead of syscall Use the NLA_ALIGNTO and NLA_HDRLEN constants from x/sys/unix instead of syscall, as the syscall package shouldn't be used anymore (except for a few exceptions). This also makes the syscall_NLA_HDRLEN workaround for gccgo unnecessary. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-06-02 10:42:11 +02:00
W. Trevor King	4f81337e95	libcontainer/specconv/spec_linux: Add support for (no)lazytime And also silent, loud, (no)iversion, and (no)acl. This is part of catching runC up with the spec, which punts valid options to mount(8) [1,2]. (no)acl is a filesystem-specific entry in mount(8), but it's represented by a MS_* flag in mount(2) so we need an entry in the translation table. [1]: https://github.com/opencontainers/runtime-spec/blame/v1.0.0-rc5/config.md#L68 [2]: https://github.com/opencontainers/runtime-spec/pull/771 Signed-off-by: W. Trevor King <wking@tremily.us>	2017-06-01 20:43:35 -07:00
Michael Crosby	18f336d23b	Merge pull request #1470 from tklauser/x-sys-unix-symlink-xattrs Use symlink xattr functions from x/sys/unix	2017-06-01 18:14:19 -07:00
Michael Crosby	854b41d81e	Update spec to `239c4e44f2` This provides updates to runc for the spec changes with *Process and OOMScoreAdj Signed-off-by: Michael Crosby <crosbymichael@gmail.com>	2017-06-01 16:29:47 -07:00
Tobias Klauser	d8b5c1c810	Use symlink xattr functions from x/sys/unix Use the symlink xattr syscall wrappers Lgetxattr, Llistxattr and Lsetxattr from x/sys/unix (introduced in golang/sys@b90f89a1e7) instead of providing own wrappers. Leave the functionality of system.Lgetxattr intact with respect to the retry with a larger buffer, but switch it to use unix.Lgetxattr. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-05-31 13:50:34 +02:00
Tobias Klauser	b5768387c6	Switch examples in README.md from syscall to x/sys/unix Follow commit `3d7cb4293c` ("Move libcontainer to x/sys/unix") and also move the examples in README.md from syscall to x/sys/unix. Signed-off-by: Tobias Klauser <tklauser@distanz.ch>	2017-05-30 14:50:59 +02:00
Daniel, Dao Quang Minh	67bd2ab554	Merge pull request #1442 from clnperez/libcontainer-sys-unix Move libcontainer to x/sys/unix	2017-05-26 12:18:33 +01:00
Qiang Huang	d7c264aaf1	Merge pull request #1239 from moypray/cgroup Fix setup cgroup before prestart hook	2017-05-26 09:22:49 +08:00
Michael Crosby	18cd7e06f7	Merge pull request #1372 from cloudfoundry-incubator/cpuset-mount-root Handle container creation when cgroups have already been mounted in another location	2017-05-25 09:53:57 -07:00
Christy Perez	3d7cb4293c	Move libcontainer to x/sys/unix Since syscall is outdated and broken for some architectures, use x/sys/unix instead. There are still some dependencies on the syscall package that will remain in syscall for the foreseeable future: Errno Signal SysProcAttr Additionally: - os still uses syscall, so it needs to be kept for anything returning *os.ProcessState, such as process.Wait. Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>	2017-05-22 17:35:20 -05:00
Wentao Zhang	09c1f5c055	Fix setup cgroup before prestart hook * Use case: A user could use a prestart hook to add block devices to the container, so the hook should have a way to set the permissions of the devices. Moving the cgroup config operation before the prestart hook makes this work. Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>	2017-05-19 17:53:43 +08:00
Mrunal Patel	639454475c	Merge pull request #1355 from avagin/cr-console Dump and restore containers with external terminals	2017-05-18 11:22:52 -07:00
Valentin Rothberg	77421139ab	libcontainer/user: add supplementary groups only for non-numeric users Signed-off-by: Valentin Rothberg <vrothberg@suse.com>	2017-05-16 13:54:27 +02:00
Justin Cormack	4c67360296	Clean up unix vs linux usage FreeBSD does not support cgroups or namespaces, which the code suggested, and is not supported in runc anyway right now. So clean up the file naming to use `_linux` where appropriate. Signed-off-by: Justin Cormack <justin.cormack@docker.com>	2017-05-12 17:22:09 +01:00
Qiang Huang	21ef2e3d12	Merge pull request #1410 from chchliang/statustest add createdState and runningState status testcase	2017-05-12 16:17:17 +08:00
Michael Crosby	2daa11574b	Merge pull request #1438 from hqhq/fix_rootfs_comments Fix comments about when to pivot_root	2017-05-05 20:15:49 -07:00
Qiang Huang	96e0df7633	Fix comments about when to pivot_root Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>	2017-05-06 07:59:03 +08:00
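
The Intel RDT/CAT commit above (692f6e1e27) describes the `schemata` line format and two rules for a capacity bitmask (CBM): its set bits must be contiguous, and a group's CBM must be a subset of the root CBM. The following Go sketch illustrates those two rules on the commit's own example, `L3:0=ffff0;1=3ff`, with a 20-bit root mask. It is not runc's implementation: `isContiguous`, `validCBM` and `parseL3Schema` are hypothetical helper names, and platform limits such as `min_cbm_bits` are not modeled.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// isContiguous reports whether mask is non-zero and its set bits form a
// single unbroken run (hypothetical helper, not part of runc).
func isContiguous(mask uint64) bool {
	if mask == 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then looks like 0b0..01..1,
	// and adding 1 to such a value clears every set bit.
	shifted := mask >> uint(bits.TrailingZeros64(mask))
	return shifted&(shifted+1) == 0
}

// validCBM checks the two rules quoted in the commit message: the mask is
// contiguous and is a subset of the root mask.
func validCBM(cbm, root uint64) bool {
	return isContiguous(cbm) && cbm&^root == 0
}

// parseL3Schema parses a line such as "L3:0=ffff0;1=3ff" into a map from
// L3 cache id to CBM. CBM values are hexadecimal without a 0x prefix.
func parseL3Schema(line string) (map[int]uint64, error) {
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with L3:", line)
	}
	out := make(map[int]uint64)
	for _, part := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed entry %q", part)
		}
		id, err := strconv.Atoi(kv[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id %q: %v", kv[0], err)
		}
		cbm, err := strconv.ParseUint(kv[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM %q: %v", kv[1], err)
		}
		out[id] = cbm
	}
	return out, nil
}

func main() {
	const root = 0xfffff // 20-bit root CBM from the commit's example

	schema, err := parseL3Schema("L3:0=ffff0;1=3ff")
	if err != nil {
		panic(err)
	}
	for id := 0; id < len(schema); id++ {
		cbm := schema[id]
		share := 100 * bits.OnesCount64(cbm) / bits.OnesCount64(root)
		fmt.Printf("cache %d: cbm=%#x valid=%t share=%d%%\n",
			id, cbm, validCBM(cbm, root), share)
	}
}
```

Counting set bits reproduces the shares quoted in the commit message: 16 of 20 bits for cache id 0 (80%) and 10 of 20 bits for cache id 1 (50%).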

1046 Commits