2080 Commits

Author SHA1 Message Date
Qiang Huang be6764508e Set cpuset.cpus and cpuset.mems before join the cgroup
It can avoid unnecessary task migration; consider this scenario:
 - container init task is on cpu 1, and we assigned it to cpu 1,
   but parent cgroup's cpuset.cpus=2
 - we created the cgroup dir and inherited cpuset.cpus from parent as 2
 - write container init task's pid to cgroup.procs
 - [it's possible the container init task is migrated to cpu 2 here]
 - set cpuset.cpus as assigned to cpu 1
 - [the container init task has to be migrated back to cpu 1]

So we should set cpuset.cpus and cpuset.mems before writing pids
to cgroup.procs to avoid this problem.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-10-15 11:16:56 +08:00
Mrunal Patel 7a95a531ba Merge pull request #333 from heavenlyhash/configurable-logfmt
Add ability to use json structured logging format.
2015-10-14 11:23:10 -07:00
Eric Myhre 2add2bc41a Add ability to use json structured logging format.
Signed-off-by: Eric Myhre <hash@exultant.us>
2015-10-13 22:57:07 -05:00
Mrunal Patel 872c4ac223 Merge pull request #332 from LK4D4/fix_panic_in_getpids
Reorder checks in Walk to avoid panics
2015-10-13 15:33:31 -07:00
Alexander Morozov 6c198ae2d0 Reorder checks in Walk to avoid panics
Also added test for host PID namespace

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-10-13 15:06:57 -07:00
Alexander Morozov 2bec85d74b Merge pull request #330 from LK4D4/recursive_pids
Get PIDs from cgroups recursively
2015-10-13 11:06:21 -07:00
Alexander Morozov 6dad176d01 Get PIDs from cgroups recursively
Also, the lookup cgroup for systemd is changed to "device" to be
consistent with the fs implementation.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-10-13 10:19:01 -07:00
Adrian Reber c42ef59bf9 Add criu related debug output
While testing different versions of criu it helps to know which
criu binary is currently used and with which options. Therefore
additional debug output displaying this information is added.

v2: increase readability of printed out criu options

Signed-off-by: Adrian Reber <adrian@lisas.de>
2015-10-13 10:41:00 +02:00
Alexander Morozov d9ba9cebac Merge pull request #184 from huikang/criu-cgroup-manage-mode
Add option to support criu manage cgroups mode for dump and restore
2015-10-12 10:51:16 -07:00
Michael Crosby 869a582fd2 Merge pull request #177 from LK4D4/avagin_maintainer
Add Andrey Vagin as maintainer for runc
2015-10-12 10:35:33 -07:00
Mrunal Patel bfe2bacbf4 Merge pull request #320 from rhatdan/label
Validate label options
2015-10-11 20:54:38 -07:00
Hui Kang 25da513c4b Add option to support criu manage cgroups mode for dump and restore
CRIU supports cgroup-manage mode from v1.7

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-10-11 04:42:54 +00:00
Dan Walsh f8b34352fe Validate label options
The only valid options to --security-opt for label should be
disable, user, role, type, level.

Return an error on an invalid entry.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-10-10 06:51:49 -04:00
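The validation rule stated in this commit could be sketched as below. This is an illustration only: `validateLabelOptions` is a hypothetical name, and the `key:value` shape of each option (for example `type:some_t`) is an assumption, since the commit message lists only the allowed keys.

```go
package main

import (
	"fmt"
	"strings"
)

// Keys named in the commit message; "disable" is handled separately because
// it is assumed to carry no value.
var validLabelKeys = map[string]bool{
	"user":  true,
	"role":  true,
	"type":  true,
	"level": true,
}

// validateLabelOptions (hypothetical helper) returns an error for the first
// option that is neither "disable" nor a "key:value" pair with an allowed key.
func validateLabelOptions(opts []string) error {
	for _, opt := range opts {
		if opt == "disable" {
			continue
		}
		parts := strings.SplitN(opt, ":", 2)
		if len(parts) != 2 || !validLabelKeys[parts[0]] {
			return fmt.Errorf("bad label option %q", opt)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateLabelOptions([]string{"type:example_t", "level:s0"}))
	fmt.Println(validateLabelOptions([]string{"colour:blue"}))
}
```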
Alexander Morozov 76674393ef Merge pull request #324 from mrunalp/add_groups
Add additional groups support
2015-10-09 12:29:07 -07:00
Mrunal Patel f152edcb1c Merge pull request #316 from cpuguy83/race_on_output_start_error
Fix for race from error on process start
2015-10-08 13:51:54 -07:00
Mrunal Patel 7f9864f576 Merge pull request #326 from ZJU-SEL/fix-named
change named to names
2015-10-08 09:27:44 -07:00
xlgao-zju 02fc164456 change named to names
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-08 21:44:23 +08:00
Brian Goff 7632c4585f Fix for race from error on process start
This rather naively fixes an error observed where a process's stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.

Before the "fix", when an error occurred on start, `terminate` was called
immediately, which called `cmd.Process.Kill()` and then `Wait()` on
the process. In some cases, when this `Kill` was called, the stdio streams
had not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.

With the fix, it just calls `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-10-07 21:28:26 -04:00
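The idea of waiting for a failing child instead of killing it can be shown with a loose analogy using `os/exec`. This is not runc's code path: `runAndCollect` is a hypothetical helper, and the `sh` command in `main` merely simulates a process that reports an error on stderr before exiting.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runAndCollect (hypothetical helper) starts a command and waits for it to
// exit on its own. Because the child is not killed first, whatever it wrote
// to stderr before failing is still collected.
func runAndCollect(name string, args ...string) (string, error) {
	var stderr bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stderr = &stderr
	if err := cmd.Start(); err != nil {
		return "", err
	}
	// Wait also blocks until the stderr copy has finished.
	err := cmd.Wait()
	return stderr.String(), err
}

func main() {
	// Assumption: a POSIX sh is available on PATH.
	out, err := runAndCollect("sh", "-c", "echo 'exec failed' >&2; exit 1")
	fmt.Printf("stderr=%q err=%v\n", out, err)
}
```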
Mrunal Patel 546c5c80dc Add additional gids support
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-07 16:51:53 -04:00
Mrunal Patel f184a880a2 Bump up github.com/opencontainers/specs to cf8dd12093
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-07 16:51:10 -04:00
Alexander Morozov 902c012e85 Merge pull request #319 from dodgerblue/dodgerblue-arm64
nsexec: Align clone child stack ptr to 16
2015-10-06 08:28:24 -07:00
Mrunal Patel 0f137226d8 Merge pull request #317 from runcom/bump-docker-pkgs
bump docker pkgs
2015-10-06 08:03:14 -07:00
Bogdan Purcareata 4c5eb45862 nsexec: Align clone child stack ptr to 16
This is required on ARM64 builds that use the clone syscall. Check [1].

[1] http://lxr.free-electrons.com/source/arch/arm64/kernel/process.c#L264

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
2015-10-06 10:41:18 +00:00
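The alignment itself is simple bit arithmetic. The sketch below shows it in Go purely for illustration; `alignDown` and `stackAlign` are hypothetical names, and the commit's actual change is in the nsexec code rather than in Go.

```go
package main

import "fmt"

// stackAlign is the alignment the commit title calls for.
const stackAlign = 16

// alignDown (hypothetical helper) clears the low four bits of addr, rounding
// it down to a multiple of 16. Rounding down suits a stack pointer on
// architectures where the stack grows toward lower addresses.
func alignDown(addr uintptr) uintptr {
	return addr &^ (stackAlign - 1)
}

func main() {
	for _, addr := range []uintptr{0x1000, 0x1009, 0x101f} {
		fmt.Printf("%#x -> %#x\n", addr, alignDown(addr))
	}
}
```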
Antonio Murdaca c5b80bddf1 bump docker pkgs
Docker pkgs were updated while golinting the whole docker code base.
Now when trying to bump libcontainer/runc in docker, it fails compiling
with the following error:
``
vendor/src/github.com/opencontainers/runc/libcontainer/rootfs_linux.go:424:
undefined: mount.MountInfo
``
This is because, for instance, the mount pkg was updated here
0f5c9d301b (diff-49294d05afa48e2f7c0d2f02c6f7614c)
and now that type is only `mount.Info`.
This patch bumps the docker pkgs commit and adapts the code to it.

Signed-off-by: Antonio Murdaca <amurdaca@redhat.com>
2015-10-06 10:48:12 +02:00
Mrunal Patel cc84f2cc9b Merge pull request #305 from hqhq/hq_add_softlimit_systemd
Add memory reservation support for systemd
2015-10-05 16:37:32 -07:00
Mrunal Patel 223975564a Merge pull request #276 from runcom/adapt-spec-96bcd043aa8a28f6f64c95ad61329765f01de1ba
Adapt spec 96bcd043aa
2015-10-05 16:36:09 -07:00
Alexander Morozov d7ce356411 Merge pull request #315 from mrunalp/systemd_name
Systemd name
2015-10-05 15:12:28 -07:00
Mrunal Patel 0b9e7af763 Merge pull request #313 from swagiaal/fix-GetAdditionalGroups
Allow numeric groups for containers without /etc/group
2015-10-05 11:47:36 -07:00
Michael Crosby fa0f445353 Merge pull request #314 from LK4D4/fix_name
Fix name in MAINTAINERS list
2015-10-05 11:41:33 -07:00
Alexander Morozov 151a5cec3e Fix name in MAINTAINERS list
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-10-05 11:28:48 -07:00
Mrunal Patel 79a02e35fb cgroups: Add name=systemd to list of subsystems
This allows getting the path to the subsystem and so is subsequently
used in EnterPid by an exec process.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:24:11 -04:00
Mrunal Patel 1940c73777 cgroups: Add a name cgroup
This is meant to be used in retrieving the paths so an exec
process enters all the cgroup paths correctly.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:23:05 -04:00
Alexander Morozov 5d22c824be Merge pull request #312 from ZJU-SEL/fix-gid
change uid to gid in func HostGID
2015-10-04 16:54:32 -07:00
Sami Wagiaalla c25c38cc80 Allow numeric groups for containers without /etc/group
/etc/group is not needed when specifying numeric group ids. This
change allows containers without /etc/group to specify numeric
supplemental groups.

Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
2015-10-04 19:02:35 -04:00
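The behaviour described here can be sketched as follows. `resolveGroups` is a hypothetical helper, and the `groupFile` map is an invented stand-in for a parsed /etc/group; runc's real lookup code is not shown in this log.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveGroups (hypothetical helper) turns group specifiers into numeric
// gids. groupFile stands in for a parsed /etc/group (name -> gid); nil means
// the container has no such file.
func resolveGroups(specs []string, groupFile map[string]int) ([]int, error) {
	gids := make([]int, 0, len(specs))
	for _, spec := range specs {
		// Numeric ids need no lookup, so they work without /etc/group.
		if gid, err := strconv.Atoi(spec); err == nil {
			gids = append(gids, gid)
			continue
		}
		gid, ok := groupFile[spec] // reading a nil map is safe in Go
		if !ok {
			return nil, fmt.Errorf("unable to find group %q", spec)
		}
		gids = append(gids, gid)
	}
	return gids, nil
}

func main() {
	fmt.Println(resolveGroups([]string{"1001", "1002"}, nil))
	fmt.Println(resolveGroups([]string{"audio"}, nil))
}
```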
xlgao-zju 4b360d6300 change uid to gid in func HostGID
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-05 01:11:48 +08:00
Antonio Murdaca c6e406af24 Adjust runc to new opencontainers/specs version
Godeps: Vendor opencontainers/specs 96bcd043aa

Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274

Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-10-03 12:25:33 +02:00
Alexander Morozov c573ffbd05 Merge pull request #208 from rhvgoyal/config-rootfsPropagation
Create container_private, container_slave and container_shared modes for rootfsPropagation
2015-10-02 13:42:20 -07:00
Vivek Goyal 6a851e1195 exec_test.go: Test case for rootfsPropagation="private"
A test case to test rootfsPropagation="private" and making sure shared
volumes work.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 175e4b8aec exec_test.go: Test cases for rootfsPropagation=rslave
Test case for rootfsPropagation=rslave.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal da8d776c08 Make pivotDir rprivate
pivotDir is where the pivot_root() call puts the old root. We then
unmount pivotDir and delete it.

Previously we always made / rslave or rprivate. That meant pivotDir
could never have mounts shared with the parent mount namespace, so
unmounting pivotDir was safe: none of the unmounts would propagate to
the parent namespace and unmount things we did not want unmounted.

But now the user can specify that private, shared, or slave propagation
be applied on /. That means some of the mounts we inherited from the
parent could be shared, and if we unmount pivotDir, those mounts will
get unmounted in the parent too. That's not what we want.

Instead, make pivotDir rprivate so that unmounts don't propagate back to
the parent.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 23ec72a426 Make parent mount of container root private if it is shared.
pivot_root() imposes a number of restrictions and fails if they are not
met. In particular, the parent mount of the container root cannot be
shared.

So far the parent could not be shared, as we marked everything either
private or slave. But now we have introduced new propagation modes where
the parent mount of the container rootfs could be shared, and
pivot_root() would fail.

So check whether the parent mount is shared and, if so, make it private.
This makes sure pivot_root() works.

It also makes sure that when we bind mount the container rootfs, the
mount does not propagate to the parent mount namespace. Otherwise
cleanup becomes a problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal f6fadd2ffe Start parsing rootfsPropagation and make it effective
The spec introduced a new field, rootfsPropagation. Right now that field
is not parsed by runc and does not take effect. Start parsing it and,
for now, allow only a limited set of propagation flags. More can be
opened up as new use cases show up.

We apply propagation flags on / and not on rootfs. So ideally
we should introduce another field in the spec, say rootPropagation. For
now I am parsing rootfsPropagation. Once we agree on the design, we
can discuss whether we need another field in the spec.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
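Parsing a propagation name into mount flags might look like the sketch below. The set of accepted names is an assumption (the commit says only that a limited set is allowed), and `parsePropagation` is a hypothetical helper. The constants mirror the Linux mount(2) flag values and are copied locally so the sketch builds on any platform.

```go
package main

import "fmt"

// Linux mount(2) propagation flag values, copied here for portability.
const (
	msRec     = 0x4000
	msPrivate = 1 << 18
	msSlave   = 1 << 19
	msShared  = 1 << 20
)

// Assumed name-to-flag table; the names actually accepted by runc at this
// commit are not listed in the log.
var propagationFlags = map[string]int{
	"private":  msPrivate,
	"rprivate": msPrivate | msRec,
	"slave":    msSlave,
	"rslave":   msSlave | msRec,
	"shared":   msShared,
	"rshared":  msShared | msRec,
}

// parsePropagation (hypothetical helper) maps a rootfsPropagation string to
// an integer flag set, rejecting unknown names.
func parsePropagation(name string) (int, error) {
	flags, ok := propagationFlags[name]
	if !ok {
		return 0, fmt.Errorf("unsupported rootfsPropagation %q", name)
	}
	return flags, nil
}

func main() {
	flags, err := parsePropagation("rprivate")
	fmt.Printf("%#x %v\n", flags, err)
}
```

An integer flag set of this kind is what the RootPropagation change in the following entry stores in place of a boolean.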
Vivek Goyal 5dd6caf6cf Replace config.Privatefs with config.RootPropagation
Right now config.Privatefs is a boolean which determines whether the
propagation flag syscall.MS_PRIVATE | syscall.MS_REC is applied to /.

Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So we can either introduce more boolean variables or keep
track of propagation flags in an integer variable. An integer variable is
more versatile and allows various kinds of propagation flags to be
specified. So replace Privatefs with RootPropagation, which is an
integer.

Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set:

config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
 
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Alexander Morozov 0954faba13 Merge pull request #306 from hqhq/hq_join_perfevent_systemd
Systemd: Join perf_event cgroup
2015-10-01 10:05:35 -07:00
Alexander Morozov 4d5079b9dc Merge pull request #309 from chenchun/fix_reOpenDevNull
Fix reOpenDevNull
2015-09-30 19:06:43 -07:00
Alexander Morozov fba07bce72 Merge pull request #307 from estesp/no-remount-if-unecessary
Only remount if requested flags differ from current
2015-09-30 11:40:06 -07:00
Mrunal Patel 74ded3660b Merge pull request #304 from rhatdan/mountproc
/proc and /sys do not support labeling
2015-09-30 11:36:20 -07:00
Michael Crosby 146916ca93 Merge pull request #308 from LK4D4/fix_tlb_tests
Run tests for all HugetlbSizes
2015-09-30 11:26:40 -07:00
Chun Chen 06d91f546f Fix reOpenDevNull
We should open /dev/null with os.O_RDWR, otherwise it won't be
possible to write to it.

Signed-off-by: Chun Chen <ramichen@tencent.com>
2015-09-30 16:05:49 +08:00
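The difference the flag makes can be seen in a short sketch. `openDevNull` is a hypothetical helper, not the function from the commit, and the example assumes a Unix-like system where /dev/null exists.

```go
package main

import (
	"fmt"
	"os"
)

// openDevNull (hypothetical helper) opens /dev/null for both reading and
// writing. A plain os.Open would use O_RDONLY, and writes to the resulting
// descriptor would fail.
func openDevNull() (*os.File, error) {
	return os.OpenFile("/dev/null", os.O_RDWR, 0)
}

func main() {
	f, err := openDevNull()
	if err != nil {
		panic(err)
	}
	defer f.Close()

	n, err := f.Write([]byte("discarded"))
	fmt.Println(n, err)
}
```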
Phil Estes 97f5ee4e6a Only remount if requested flags differ from current
Do not remount a bind mount to enable flags unless non-default flags are
provided for the requested mount. This solves a problem with user
namespaces and remount of bind mount permissions.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2015-09-29 23:13:04 -04:00
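One way to express the "remount only when needed" check is a mask test over the requested flags. This is a sketch under assumptions: `needsRemount` is a hypothetical name, and treating MS_BIND, MS_REC and MS_REMOUNT as the "default" flags is a guess at what the commit means by non-default flags. The constants mirror Linux mount(2) values and are copied locally.

```go
package main

import "fmt"

// Linux mount(2) flag values, copied here so the sketch builds anywhere.
const (
	msRdonly  = 0x1
	msRemount = 0x20
	msBind    = 0x1000
	msRec     = 0x4000
)

// needsRemount (hypothetical helper) reports whether a bind mount was
// requested with any flag beyond the ones assumed to be defaults. Only then
// is a second, remounting mount call worth making.
func needsRemount(flags uintptr) bool {
	return flags&^(msBind|msRec|msRemount) != 0
}

func main() {
	fmt.Println(needsRemount(msBind | msRec))
	fmt.Println(needsRemount(msBind | msRdonly))
}
```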