Commit Graph

280 Commits

Author SHA1 Message Date
Alexander Morozov d9ba9cebac Merge pull request #184 from huikang/criu-cgroup-manage-mode
Add option to support criu manage cgroups mode for dump and restore
2015-10-12 10:51:16 -07:00
Mrunal Patel bfe2bacbf4 Merge pull request #320 from rhatdan/label
Validate label options
2015-10-11 20:54:38 -07:00
Hui Kang 25da513c4b Add option to support criu manage cgroups mode for dump and restore
CRIU supports cgroup-manage mode from v1.7

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-10-11 04:42:54 +00:00
Dan Walsh f8b34352fe Validate label options
Only valid options to --security-opt for label should be
disable, user, role, type, level.

Return error on invalid entry

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-10-10 06:51:49 -04:00
Mrunal Patel f152edcb1c Merge pull request #316 from cpuguy83/race_on_output_start_error
Fix for race from error on process start
2015-10-08 13:51:54 -07:00
xlgao-zju 02fc164456 change named to names
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-08 21:44:23 +08:00
Brian Goff 7632c4585f Fix for race from error on process start
This rather naively fixes an error observed where a processes stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.

Before the "fix", when an error occurred on start, `terminate` is called
immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on
the process. In some cases when this `Kill` is called the stdio stream
have not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.

With the fix it is just calling `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-10-07 21:28:26 -04:00
Alexander Morozov 902c012e85 Merge pull request #319 from dodgerblue/dodgerblue-arm64
nsexec: Align clone child stack ptr to 16
2015-10-06 08:28:24 -07:00
Bogdan Purcareata 4c5eb45862 nsexec: Align clone child stack ptr to 16
This is required on ARM64 builds that use the clone syscall. Check [1].

[1] http://lxr.free-electrons.com/source/arch/arm64/kernel/process.c#L264

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
2015-10-06 10:41:18 +00:00
Antonio Murdaca c5b80bddf1 bump docker pkgs
Docker pkgs were updated while golinting the whole docker code base.
Now when trying to bump libcontainer/runc in docker, it fails compiling
with the following error:
``
vendor/src/github.com/opencontainers/runc/libcontainer/rootfs_linux.go:424:
undefined: mount.MountInfo
``
This is because, for instance, the mount pkg was updated here
0f5c9d301b (diff-49294d05afa48e2f7c0d2f02c6f7614c)
and now that type is only `mount.Info`.
This patch bump docker pkgs commit and adapt code to it.

Signed-off-by: Antonio Murdaca <amurdaca@redhat.com>
2015-10-06 10:48:12 +02:00
Mrunal Patel cc84f2cc9b Merge pull request #305 from hqhq/hq_add_softlimit_systemd
Add memory reservation support for systemd
2015-10-05 16:37:32 -07:00
Mrunal Patel 223975564a Merge pull request #276 from runcom/adapt-spec-96bcd043aa8a28f6f64c95ad61329765f01de1ba
Adapt spec 96bcd043aa
2015-10-05 16:36:09 -07:00
Alexander Morozov d7ce356411 Merge pull request #315 from mrunalp/systemd_name
Systemd name
2015-10-05 15:12:28 -07:00
Mrunal Patel 0b9e7af763 Merge pull request #313 from swagiaal/fix-GetAdditionalGroups
Allow numeric groups for containers without /etc/group
2015-10-05 11:47:36 -07:00
Mrunal Patel 79a02e35fb cgroups: Add name=systemd to list of subsystems
This allows getting the path to the subsystem and so is subsequently
used in EnterPid by an exec process.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:24:11 -04:00
Mrunal Patel 1940c73777 cgroups: Add a name cgroup
This is meant to be used in retrieving the paths so an exec
process enters all the cgroup paths correctly.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:23:05 -04:00
Sami Wagiaalla c25c38cc80 Allow numeric groups for containers without /etc/group
/etc/groups is not needed when specifying numeric group ids. This
change allows containers without /etc/groups to specify numeric
supplemental groups.

Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
2015-10-04 19:02:35 -04:00
xlgao-zju 4b360d6300 change uid to gid in func HostGID
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-05 01:11:48 +08:00
Antonio Murdaca c6e406af24 Adjust runc to new opencontainers/specs version
Godeps: Vendor opencontainers/specs 96bcd043aa

Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274

Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-10-03 12:25:33 +02:00
Alexander Morozov c573ffbd05 Merge pull request #208 from rhvgoyal/config-rootfsPropagation
Create container_private, container_slave and container_shared modes for rootfsPropagation
2015-10-02 13:42:20 -07:00
Vivek Goyal 6a851e1195 exec_test.go: Test case for rootfsPropagation="private"
A test case to test rootfsPropagation="private" and making sure shared
volumes work.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 175e4b8aec exec_test.go: Test cases for rootfsPropagation=rslave
test case to test rootfsPropagation=rslave

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal da8d776c08 Make pivotDir rprivate
pivotDir is the one where pivot_root() call puts the old root. We will
unmount pivotDir() and delete it.

Previously we were making / always rslave or rprivate. That will mean 
that pivotDir() could never have mounts which would be shared with
parent mount namespace. That also means that unmounting pivotDir() was
safe and none of the unmount will propagate to parent namespace and
unmount things which we did not want to.

But now user can specify that apply private, shared, slave on /. That
means some of the mounts we inherited from parent could be shared and that
also means if we umount pivotDir/, those mounts will get unmounted in
parent too. That's not what we want.

Instead make pivotDir rprivate so that unmounts don't propagate back to
parent.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 23ec72a426 Make parent mount of container root private if it is shared.
pivot_root() introduces bunch of restrictions otherwise it fails. parent
mount of container root can not be shared otherwise pivot_root() will
fail. 

So far parent could not be shared as we marked everything either private
or slave. But now we have introduced new propagation modes where parent
mount of container rootfs could be shared and pivot_root() will fail.

So check if parent mount is shared and if yes, make it private. This will
make sure pivot_root() works.

Also it will make sure that when we bind mount container rootfs, it does
not propagate to parent mount namespace. Otherwise cleanup becomes a 
problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 5dd6caf6cf Replace config.Privatefs with config.RootPropagation
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.

Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.

Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set.

config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
 
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Alexander Morozov 0954faba13 Merge pull request #306 from hqhq/hq_join_perfevent_systemd
Systemd: Join perf_event cgroup
2015-10-01 10:05:35 -07:00
Alexander Morozov 4d5079b9dc Merge pull request #309 from chenchun/fix_reOpenDevNull
Fix reOpenDevNull
2015-09-30 19:06:43 -07:00
Alexander Morozov fba07bce72 Merge pull request #307 from estesp/no-remount-if-unecessary
Only remount if requested flags differ from current
2015-09-30 11:40:06 -07:00
Mrunal Patel 74ded3660b Merge pull request #304 from rhatdan/mountproc
/proc and /sys do not support labeling
2015-09-30 11:36:20 -07:00
Michael Crosby 146916ca93 Merge pull request #308 from LK4D4/fix_tlb_tests
Run tests for all HugetlbSizes
2015-09-30 11:26:40 -07:00
Chun Chen 06d91f546f Fix reOpenDevNull
We should open /dev/null with os.O_RDWR, otherwise it won't be
possible writen to it

Signed-off-by: Chun Chen <ramichen@tencent.com>
2015-09-30 16:05:49 +08:00
Phil Estes 97f5ee4e6a Only remount if requested flags differ from current
Do not remount a bind mount to enable flags unless non-default flags are
provided for the requested mount. This solves a problem with user
namespaces and remount of bind mount permissions.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2015-09-29 23:13:04 -04:00
Alexander Morozov e32b3442ec Run tests for all HugetlbSizes
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-29 17:08:41 -07:00
Qiang Huang 6a5ba1109c Systemd: Join perf_event cgroup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 15:42:29 +08:00
Qiang Huang fb5a56fb97 Add memory reservation support for systemd
Seems it's missed in the first place.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 10:02:12 +08:00
Dan Walsh cab342f0de Check for failure on /dev/mqueue and try again without labeling
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Dan Walsh b4dcb75503 /proc and /sys do not support labeling
This is causing docker to crash when --selinux-enforcing mode is set.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Alexander Morozov 902ccd0f18 Merge pull request #302 from mrunalp/cap_list
Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b4
2015-09-25 08:49:44 -07:00
Mrunal Patel c5d3bda7e1 Merge pull request #292 from keloyang/rpid
no need to use p.cmd.Process.Pid in function, use p.pid() instead.
2015-09-24 15:59:39 -07:00
Mrunal Patel 34d3e2b948 Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b45114c80ac44119e7b8801fdd852
This allows us to use the capability.List() function to construct capability list
dynamically.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-24 18:44:01 -04:00
Alexander Morozov aac9179bba Merge pull request #160 from mrunalp/feature/hooks
Add prestart/poststop hooks to runc
2015-09-24 14:52:30 -07:00
Michael Crosby 203d3e258e Move mount methods out of configs pkg
Do not have methods and actions that require syscalls in the configs
package because it breaks cross compile.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-24 09:43:12 -07:00
Mrunal Patel dcafe48737 Add version to HookState to make it json-compatible with spec State
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-23 17:13:00 -07:00
Alexander Morozov 83b2975c8b Merge pull request #295 from mheon/seccomp_architecture
Libcontainer: Add support for multiple architectures in Seccomp
2015-09-23 11:08:47 -07:00
Matthew Heon 795a6c9702 Libcontainer: Add support for multiple architectures in Seccomp
This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-09-23 13:54:24 -04:00
Michael Crosby 5765dcd086 Merge pull request #296 from crosbymichael/mount-resolv-symlink
Change mount dest after resolving symlinks
2015-09-23 10:21:25 -07:00
Michael Crosby b3bb606513 Change mount dest after resolving symlinks
We need to update the mount's destination after we resolve symlinks so
that it properly creates and mounts the correct location.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-23 10:07:18 -07:00
keloyang 69a5b2df9e no need to use p.cmd.Process.Pid in function, use p.pid() instead.
Signed-off-by: keloyang <yangshukui@huawei.com>
2015-09-23 10:48:36 +08:00
Mrunal Patel d8b7deaf4c Merge pull request #283 from runcom/cleanup-unused-func-args
Cleanup unused func arguments
2015-09-22 16:53:19 -07:00
Mrunal Patel 7570169548 Merge pull request #288 from gitido/fix_userns
Enter existing user namespace if present
2015-09-22 16:27:57 -07:00
Michael Crosby 219b6c99e0 Ignore changing /dev/null permissions if used in STDIO
Whenever dev/null is used as one of the main processes STDIO, do not try
to change the permissions on it via fchown because we should not do it
in the first place and also this will fail if the container is supposed
to be readonly.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-22 15:32:31 -07:00
Ido Yariv 08366a8597 Enter existing user namespace if present
When executing an additional process in a container, all namespaces are
entered but the user namespace. As a result, the process may be
executed as the host's root user. This has both functionality and
security implications.

Fix this by adding the missing user namespace to the array of
namespaces. Since joining a user namespace in which the caller is
already a member yields an error, skip namespaces we're already in.

Last, remove a needless and buggy AT_SYMLINK_NOFOLLOW in the code.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-09-21 21:49:52 -04:00
Antonio Murdaca d6e6462478 Cleanup unused func arguments
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-21 11:50:29 +02:00
Michael Crosby 0dad64f7ad Fix STDIO permissions when container user not root
Fix the permissions of the container's main processes STDIO when the
process is not run as the root user.  This changes the permissions right
before switching to the specified user so that it's STDIO matches it's
UID and GID.

Add a test for checking that the STDIO of the process is owned by the
specified user.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-18 14:11:29 -07:00
Vivek Goyal d1f4a5b8b5 libcontainer: Allow passing mount propagation flags
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.

{
  "type": "bind",
  "source": "/root/mnt-source",
  "destination": "/root/mnt-dest",
  "options": "rbind,shared"
}

One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.

#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest                      private

Reason being that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to mount syscall.
That too, one propagation flag at a time. (from mount man page).

Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.

Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.

This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-09-16 15:53:23 -04:00
Mrunal Patel ec37110957 Update README for the CAP prefix change
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-15 14:44:12 -04:00
Mrunal Patel 859abee0c8 Add CAP prefix for capabilities
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-15 14:43:03 -04:00
Mrunal Patel 4d8e13fc3e Merge pull request #43 from LK4D4/new_netlink
New netlink library
2015-09-14 14:01:07 -07:00
Mrunal Patel 486ac97618 Merge pull request #236 from hqhq/hq_fix_cgroup_rw
Always remount for bind mount
2015-09-14 12:08:34 -07:00
Rajasekaran 2940f73a14 make localtest failure on removing seccomp flag
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2015-09-12 14:43:55 +05:30
Mrunal Patel ef9471fd5b Merge pull request #253 from avagin/cr-cgroups
c/r: create cgroups to restore a container
2015-09-11 18:03:40 -07:00
Alexander Morozov b0fd9fb75a Merge pull request #220 from crosbymichael/build-tags
Add seccomp build tag
2015-09-11 12:06:27 -07:00
Michael Crosby a8e0185d97 Add seccomp build tag
Add a seccomp build tag and also support in the Makefile to add or
remove build tags.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 12:03:57 -07:00
David Calavera 0f28592b35 Turn hook pointers into values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 11:34:34 -07:00
Michael Crosby dd969cbacd Add test for function based hooks
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-10 18:15:00 -07:00
Mrunal Patel 1dca365393 Add test for prestart hook
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

Conflicts:
	libcontainer/integration/exec_test.go
2015-09-10 17:59:36 -07:00
Michael Crosby 05567f2c94 Implement hooks in libcontainer
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-10 17:57:31 -07:00
Andrey Vagin df39686c93 c/r: create cgroups to restore a container
Here are two reasons:
* If we use systemd, we need to ask it to create cgroups
* If a container is restored with another ID, we need to
  change paths to cgroups.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 21:00:27 +03:00
Andrey Vagin da2535f2d1 mount: don't read /proc/self/cgroup many times
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 21:00:22 +03:00
Andrey Vagin e49c1dc559 Rework ParseCgroupFile
Currently we parse /proc/self/cgroup for each controller.
It's ineffective.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 20:59:27 +03:00
Alexander Morozov 24f4d5d1fd Remove old netlink library
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-09 19:38:02 -07:00
Alexander Morozov 916bd6bd68 Use github.com/vishvananda/netlink for networking
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-09 19:32:46 -07:00
Qiang Huang b94fe5b7f8 Fix bug in find cgroup mount point dir
Bug was introduced in #250

According to: http://man7.org/linux/man-pages/man5/proc.5.html

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
...
(7)  optional fields: zero or more fields of the form
       "tag[:value]".
The 7th field is optional. We should skip it when parsing mount info.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-10 08:29:12 +08:00
Qiang Huang f2ec7eff7e Rename FindCgroupMountpointAndSource
Rename it to FindCgroupMountpointAndRoot.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-09 09:29:11 +08:00
Qiang Huang bc67941c72 Parse directly in FindCgroupMountpointDir
Unify it with FindCgroupMountpoint, and add comments why
we should to do this.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-09 09:28:50 +08:00
Alexander Morozov 05b1cda5dd Merge pull request #235 from hqhq/hq_fix_cgroup_test
Fix cgroup mount tests
2015-09-01 14:57:44 -07:00
Vishnu Kannan cc232c4707 Adding oom_score_adj as a container config param.
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2015-08-31 14:02:59 -07:00
Qiang Huang 085f465c00 Fix cgroup mount tests
I got:
```
exec_test.go:823: Mode expected to contain 'ro,nosuid,nodev,noexec': tmpfs on /sys/fs/cgroup type tmpfs (ro,seclabel,nosuid,nodev,noexec,relatime,mode=755
```wq

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-31 11:23:18 +08:00
Qiang Huang b7385e291c Always remount for bind mount
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-31 11:10:34 +08:00
Michael Crosby b1e7041957 Merge pull request #165 from calavera/context_labels
Make label.Relabel safer.
2015-08-28 14:20:00 -07:00
Matthew Heon 2ee6d1e8b6 Connect Seccomp configuration in Spec to configuration in Libcontainer
Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-25 17:35:06 -04:00
Mrunal Patel 2f4c229a8c Merge pull request #215 from boucher/huikang-patch
Add hooks for passing explicit veth pairs for forwarding to CRIU
2015-08-24 21:23:29 -07:00
Hui Kang 7f23085c82 Add hooks for passing explicit veth pairs for forwarding to CRIU.
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-08-24 09:26:39 -07:00
boucher 8c812d0f50 Add the criu log file path to the failure message.
Signed-off-by: Ross Boucher <rboucher@gmail.com>
2015-08-21 14:20:59 -07:00
Mrunal Patel e7663a673e Merge pull request #70 from mheon/seccomp
Convert Seccomp support to use Libseccomp
2015-08-21 12:25:33 -07:00
Lai Jiangshan e48363d777 simplify a variable declaration
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-20 08:21:44 +08:00
Mrunal Patel ca8831fa75 Merge pull request #183 from rajasec/securityfs
Adding securityfs mount
2015-08-18 14:24:38 -07:00
Mrunal Patel c20bda3f71 Merge pull request #206 from mountkin/ensure-cleanup
Ensure the cleanup jobs in the deferrer are executed on error
2015-08-18 14:16:31 -07:00
Michael Crosby b0ca535f75 Merge pull request #194 from LK4D4/fix_cgroups_again
Fix cgroups again
2015-08-18 13:49:31 -07:00
Michael Crosby c6b6be21c5 Merge pull request #199 from clnperez/ifrdatabyte-sign-pr
Fixing netlink build error on ppc64le with gccgo
2015-08-18 13:48:59 -07:00
rajasec 8cdc409715 Fixing tmpfs
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-08-17 06:22:48 +05:30
Shijiang Wei f0679089b9 Ensure the cleanup jobs in the deferrer are executed on error
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
2015-08-16 12:29:04 +08:00
Michael Chase-Salerno 9bc81d1699 Fixing netlink build error on ppc64le with gccgo
Again. It looks like a build tag was somehow dropped between
the PR here: https://github.com/docker/libcontainer/pull/625
and the move to runc.

Signed-off-by: Christy Perez <clnperez@linux.vnet.ibm.com>
2015-08-13 17:52:47 -05:00
Matthew Heon a6b73dbc73 Remove Seccomp build tag to fix godep
Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 15:23:43 -04:00
Matthew Heon 59264040bd Update tests to not error on library v2.2.0 and lower
As v2.1.0 is no longer required for successful testing, do not build it in the
Dockerfile - instead just use the version Ubuntu ships.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 09:36:21 -04:00
Matthew Heon 2ae581ae62 Convert Seccomp support to use Libseccomp
This removes the existing, native Go seccomp filter generation and replaces it
with Libseccomp. Libseccomp is a C library which provides architecture
independent generation of Seccomp filters for the Linux kernel.

This adds a dependency on v2.2.1 or above of Libseccomp.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 07:56:27 -04:00
Lai Jiangshan e8817e1104 Simplify the return on process wait
Simplify the code introduced by the commit d1f0d5705deb:
    Return actual ProcessState on Wait error

Cc: Alexander Morozov <lk4d4@docker.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-12 22:37:34 +08:00
Alexander Morozov 2b28b3c276 Always use cgroup root of current process
Because for host PID namespace /proc/1/cgroup can point to whole other
world of cgroups.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-11 18:04:59 -07:00
Alexander Morozov 5aa6005498 Revert "Fix cgroup parent searching"
This reverts commit 2f9052ca29.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-11 18:04:55 -07:00
Alexander Morozov 2f9052ca29 Fix cgroup parent searching
I had pretty convenient input data to miss this bug.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-10 14:30:05 -07:00
rajasec 24f7a10a93 Adding securityfs mount
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-08-05 16:50:08 +05:30
Mrunal Patel f3a3025933 Fix minor stylistic issues
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-04 17:44:45 -04:00
Mrunal Patel c9d5850629 Don't make modifications to /dev there are no devices in the configuration
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-04 16:57:29 -04:00
Michael Crosby a5ef75b681 Add signal API to Container interface
This adds a `Signal()` method to the container interface so that the
initial process can be signaled after a Load or operation.  It also
implements signaling the init process from a nonChildProcess.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-08-03 17:07:29 -07:00
Mrunal Patel ce0a339632 Merge pull request #166 from gitido/fixes
Go1.5 compatibility fix
2015-08-03 13:51:26 -07:00
Michael Crosby 76e706f856 Merge pull request #151 from LK4D4/use_proc_exe
Use /proc/self/exe as default for InitPath
2015-08-03 16:15:33 -04:00
Michael Crosby b1821a4edc Merge pull request #150 from runcom/update-go-systemd-dbus-v3
Update go systemd dbus v3
2015-08-03 16:11:52 -04:00
Ido Yariv 86a85582d2 Don't set /proc/<PID>/setgroups to deny in Go1.5
A boolean field named GidMappingsEnableSetgroups was added to
SysProcAttr in Go1.5. This field determines the value of the process's
setgroups proc entry.

Since the default is to set the entry to 'deny', calling setgroups will
fail on systems running kernels 3.19+.

Set GidMappingsEnableSetgroups to true so setgroups wont be set to
'deny'.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-08-03 14:59:15 -04:00
Hui Kang 0f66ff921a Add debug message when unable to execute criu
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-08-03 17:09:45 +00:00
Antonio Murdaca 9caef6c8c4 Remove reference to nsinit
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-08-02 12:00:39 +02:00
David Calavera 4bd4d462af Make label.Relabel safer.
- Check if Selinux is enabled before relabeling. This is a bug.
- Make exclusion detection constant time. Kinda buggy too, imo.
- Do not depend on a magic string to create a new Selinux context.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-07-31 10:37:32 -07:00
Mrunal Patel 602e8331a0 Merge pull request #164 from LK4D4/remove_dind
Remove dind
2015-07-31 07:53:03 -07:00
Mrunal Patel 19df27d08c Merge pull request #163 from avagin/cr_cgroups
tests: dump/restore a container with cgroups
2015-07-30 13:50:09 -07:00
Alexander Morozov 1735ad788f Replace dind with smaller script
It just mounts /tmp into tmpfs. We need this because criu tests has
problems on overlayfs.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-30 13:23:26 -07:00
Andrey Vagin aa3c2dc621 integration: show criu logs in a error case
Signed-off-by: Andrew Vagin <avagin@openvz.org>
2015-07-30 21:01:09 +03:00
Andrew Vagin e2e6a73b62 tests: dump/restore a container with cgroups
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-07-30 08:39:02 +03:00
Kir Kolyshkin 6f82d4b544 Simplify and fix os.MkdirAll() usage
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.

This means two things:

1. If a directory to be created already exists, no error is
returned.

2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
2015-07-29 18:03:27 -07:00
Mrunal Patel 0e72bfb815 Fix files not closed in mountinfo parsing function
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-07-27 19:33:39 -04:00
Michael Crosby 4507c068ba Merge pull request #145 from LK4D4/sysfs_ro
Remount /sys/fs/cgroup as RO if MS_RDONLY was passed
2015-07-27 09:12:55 -07:00
Lai Jiangshan f26935eb0c test: propagate the error to the caller
When the copyBusybox() fails, the error message should be
propagated to the caller of newRootfs().

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-07-25 22:25:43 +08:00
Antonio Murdaca 5eab2d59d3 Swap check for systemd booted to use go-systemd method
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-07-25 01:36:14 +02:00
Alexander Morozov d9e513043c Use /proc/self/exe as default for InitPath
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-24 11:45:09 -07:00
Antonio Murdaca 15741a4ab3 Adapt code to go-systemd/dbus v3
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-07-24 15:54:59 +02:00
Mrunal Patel 32aa2756ca Merge pull request #148 from jhjeong-kr/typo
typo: tempory -> temporary
2015-07-23 20:46:12 -07:00
Jin-Hwan Jeong 491cfef259 typo: tempory -> temporary
Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
2015-07-24 11:19:25 +09:00
Alexander Morozov d89964eed3 Remount /sys/fs/cgroup as RO if MS_RDONLY was passed in m.Flags
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-22 11:05:40 -07:00
Alexander Morozov 1ed929f177 Merge pull request #114 from brahmaroutu/gccgo_stacktrace_loops
avoid infinite loop with GCCGO
2015-07-21 11:05:09 -07:00
Alexander Morozov d3217084b5 Create symlinks for merged cgroups
This allows software be not aware about existence of merged cgroups.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-20 16:12:28 -07:00
Mrunal Patel 073e76c9fc Merge pull request #142 from avagin/cr
ct: give criu informations about cgroup mounts
2015-07-20 15:13:03 -07:00
Andrey Vagin af4a5e708a ct: give criu informations about cgroup mounts
Actually cgroup mounts are bind-mounts, so they should be
handled by the same way.

Reported-by: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-07-20 22:56:07 +03:00
Alexander Morozov c0e18b96fb Fix subsystem path with abs parent
Sometimes subsystem can be mounted to path like "subsystem1,subsystem2",
so we need to handle this.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-20 11:48:58 -07:00
Mrunal Patel 5b805276c2 Revert "Remount /sys/fs/cgroup as readonly always"
This reverts commit 18de1a273e.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-07-17 17:50:46 -04:00
Mrunal Patel 618e0caeca Merge pull request #135 from LK4D4/fix_apply_cgroups
Substract source mount from cgroup dir
2015-07-17 13:55:01 -07:00
Alexander Morozov 18de1a273e Remount /sys/fs/cgroup as readonly always
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-17 12:45:09 -07:00
Alexander Morozov fc31076c23 Substract source mount from cgroup dir
This is needed because for nested containers cgroups. Without this patch
they creating unnecessary intermediate cgroup like:
/sys/fs/cgroup/memory/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/system.slice/docker-9409d9f0b68fb9e9d7d532d5b3f35e7c7f9cca1312af392ae3b28436f1f2998f.scope/docker/908ebcc9c13584a14322ec070bd971e0de62f126c0cd95c079acdb99990ad3a3

It is because in /proc/self/cgroup we see paths from host, and they don't
exist in container.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-17 11:41:58 -07:00
Mrunal Patel 2598484b97 Merge pull request #130 from LK4D4/cgroups_mount_fix
Cgroups mount fix
2015-07-16 10:49:13 -07:00
Alexander Morozov e289cf734b Fix handling name= cgroups
Before name=systemd cgroup was mounted inside container to
/sys/fs/cgroup/name=systemd, which is wrong, it should be
/sys/fs/cgroup/systemd

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-15 13:58:17 -07:00
Alexander Morozov f6eb19c0d5 Tests for mounting cgroups
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-15 11:07:03 -07:00
Alexander Morozov 40b9b89107 Substract bindmount path from cgroup dir
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-15 10:41:25 -07:00
Mrunal Patel 42aa891a6b Merge pull request #91 from hqhq/hq_add_cgroup_mount
Add cgroup mount in the recommended config
2015-07-15 09:51:24 -07:00
Alexander Morozov 0d948945b0 Merge pull request #127 from hqhq/hq_fix_tmpfs_mount
Correct tmpfs mount for cgroup
2015-07-14 22:11:52 -07:00
Qiang Huang d7181a73e4 Add cgroup mount in the recommended config
And allow cgroup mount take flags from user configs.
As we show ro in the recommendation, so hard-coded
read-only flag should be removed.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-07-15 09:31:39 +08:00
Qiang Huang b1fd78346e Correct tmpfs mount for cgroup
Fixes: https://github.com/docker/docker/issues/14543
Fixes: https://github.com/docker/docker/pull/14610

Before this, we got mount info in container:
```
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
 /sys/fs/cgroup tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
```

It has no mount source, so in `parseInfoFile` in Docker code,
we'll get:
```
Error found less than 3 fields post '-' in "84 83 0:41 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime - tmpfs  rw,seclabel"
```

After this fix, we have mount info corrected:
```
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,relatime,cpuset 0 0
```

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-07-15 09:09:09 +08:00
Qiang Huang 4e244108ef Fix error when memory cgroup not mounted
Fixes: #57

Normally all cgroup subsystems are optional except device cgroup,
but memory cgroup optional was broken by:
https://github.com/docker/libcontainer/pull/637

This patch fixes this.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-07-13 18:22:35 +08:00
root b5412d1b59 the data type should be int8 for ppc64le
Signed-off-by: Srini Brahmaroutu <srbrahma@us.ibm.com>
2015-07-10 20:11:11 +00:00
root 23e3887e05 avoid infinite loop with GCCGO
Signed-off-by: Srini Brahmaroutu <srbrahma@us.ibm.com>
2015-07-10 19:15:26 +00:00
Mrunal Patel 503adf586f Remove deserialization tests.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-07-09 18:46:13 -04:00
Michael Crosby e9c0535f3c Merge pull request #52 from jhowardmsft/remove-seccomp
Windows: Factor out seccomp
2015-07-09 11:50:41 -07:00
Michael Crosby e224e2c468 Merge pull request #53 from jhowardmsft/CloseExecFrom
Windows: Factor out CloseExecFrom
2015-07-09 11:50:07 -07:00
Mrunal Patel 6c88b305de Merge pull request #97 from hqhq/hq_add_oom_kill_disable
Add oom-kill-disable support for systemd
2015-07-08 09:40:08 -07:00