Commit Graph

1940 Commits

Author SHA1 Message Date
Vivek Goyal 23ec72a426 Make parent mount of container root private if it is shared.
pivot_root() introduces bunch of restrictions otherwise it fails. parent
mount of container root can not be shared otherwise pivot_root() will
fail. 

So far parent could not be shared as we marked everything either private
or slave. But now we have introduced new propagation modes where parent
mount of container rootfs could be shared and pivot_root() will fail.

So check if parent mount is shared and if yes, make it private. This will
make sure pivot_root() works.

Also it will make sure that when we bind mount container rootfs, it does
not propagate to parent mount namespace. Otherwise cleanup becomes a 
problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal f6fadd2ffe Start parsing rootfsPropagation and make it effective
spec introduced a new field rootfsPropagation. Right now that field
is not parsed by runc and it does not take effect. Starting parsing
it and for now allow only limited propagation flags. More can be
opened as new use cases show up. 

We are apply propagation flags on / and not rootfs. So ideally
we should introduce another field in spec say rootPropagation. For
now I am parsing rootfsPropagation. Once we agree on design, we
can discuss if we need another field in spec or not.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 5dd6caf6cf Replace config.Privatefs with config.RootPropagation
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.

Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.

Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set.

config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
 
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Alexander Morozov 0954faba13 Merge pull request #306 from hqhq/hq_join_perfevent_systemd
Systemd: Join perf_event cgroup
2015-10-01 10:05:35 -07:00
Alexander Morozov 4d5079b9dc Merge pull request #309 from chenchun/fix_reOpenDevNull
Fix reOpenDevNull
2015-09-30 19:06:43 -07:00
Alexander Morozov fba07bce72 Merge pull request #307 from estesp/no-remount-if-unecessary
Only remount if requested flags differ from current
2015-09-30 11:40:06 -07:00
Mrunal Patel 74ded3660b Merge pull request #304 from rhatdan/mountproc
/proc and /sys do not support labeling
2015-09-30 11:36:20 -07:00
Michael Crosby 146916ca93 Merge pull request #308 from LK4D4/fix_tlb_tests
Run tests for all HugetlbSizes
2015-09-30 11:26:40 -07:00
Chun Chen 06d91f546f Fix reOpenDevNull
We should open /dev/null with os.O_RDWR, otherwise it won't be
possible writen to it

Signed-off-by: Chun Chen <ramichen@tencent.com>
2015-09-30 16:05:49 +08:00
Phil Estes 97f5ee4e6a Only remount if requested flags differ from current
Do not remount a bind mount to enable flags unless non-default flags are
provided for the requested mount. This solves a problem with user
namespaces and remount of bind mount permissions.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2015-09-29 23:13:04 -04:00
Alexander Morozov e32b3442ec Run tests for all HugetlbSizes
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-29 17:08:41 -07:00
Qiang Huang 6a5ba1109c Systemd: Join perf_event cgroup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 15:42:29 +08:00
Qiang Huang fb5a56fb97 Add memory reservation support for systemd
Seems it's missed in the first place.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 10:02:12 +08:00
Dan Walsh cab342f0de Check for failure on /dev/mqueue and try again without labeling
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Dan Walsh b4dcb75503 /proc and /sys do not support labeling
This is causing docker to crash when --selinux-enforcing mode is set.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Alexander Morozov 902ccd0f18 Merge pull request #302 from mrunalp/cap_list
Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b4
2015-09-25 08:49:44 -07:00
Mrunal Patel c5d3bda7e1 Merge pull request #292 from keloyang/rpid
no need to use p.cmd.Process.Pid in function, use p.pid() instead.
2015-09-24 15:59:39 -07:00
Mrunal Patel 34d3e2b948 Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b45114c80ac44119e7b8801fdd852
This allows us to use the capability.List() function to construct capability list
dynamically.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-24 18:44:01 -04:00
Alexander Morozov aac9179bba Merge pull request #160 from mrunalp/feature/hooks
Add prestart/poststop hooks to runc
2015-09-24 14:52:30 -07:00
Alexander Morozov dbfcbc4ce6 Merge pull request #299 from crosbymichael/move-mount-methods
Move mount methods out of configs pkg
2015-09-24 14:41:22 -07:00
Michael Crosby 203d3e258e Move mount methods out of configs pkg
Do not have methods and actions that require syscalls in the configs
package because it breaks cross compile.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-24 09:43:12 -07:00
Mrunal Patel dcafe48737 Add version to HookState to make it json-compatible with spec State
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-23 17:13:00 -07:00
Mrunal Patel 9964fcde37 hooks: Integrate spec hooks with libcontainer
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-23 16:29:10 -07:00
Mrunal Patel 18c461301d Merge pull request #270 from laijs/spec-options-refactor
simple refactor for the options of `runc spec`
2015-09-23 16:26:01 -07:00
Mrunal Patel 32471bd735 Merge pull request #271 from laijs/update-config-example
README.md: Update the config example
2015-09-23 16:25:27 -07:00
Alexander Morozov 83b2975c8b Merge pull request #295 from mheon/seccomp_architecture
Libcontainer: Add support for multiple architectures in Seccomp
2015-09-23 11:08:47 -07:00
Matthew Heon 795a6c9702 Libcontainer: Add support for multiple architectures in Seccomp
This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-09-23 13:54:24 -04:00
Michael Crosby 5765dcd086 Merge pull request #296 from crosbymichael/mount-resolv-symlink
Change mount dest after resolving symlinks
2015-09-23 10:21:25 -07:00
Michael Crosby b3bb606513 Change mount dest after resolving symlinks
We need to update the mount's destination after we resolve symlinks so
that it properly creates and mounts the correct location.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-23 10:07:18 -07:00
keloyang 69a5b2df9e no need to use p.cmd.Process.Pid in function, use p.pid() instead.
Signed-off-by: keloyang <yangshukui@huawei.com>
2015-09-23 10:48:36 +08:00
Mrunal Patel d8b7deaf4c Merge pull request #283 from runcom/cleanup-unused-func-args
Cleanup unused func arguments
2015-09-22 16:53:19 -07:00
Mrunal Patel 7570169548 Merge pull request #288 from gitido/fix_userns
Enter existing user namespace if present
2015-09-22 16:27:57 -07:00
Mrunal Patel ec20263251 Merge pull request #289 from crosbymichael/stdio-null
Ignore changing /dev/null permissions if used in STDIO
2015-09-22 15:36:27 -07:00
Michael Crosby 219b6c99e0 Ignore changing /dev/null permissions if used in STDIO
Whenever dev/null is used as one of the main processes STDIO, do not try
to change the permissions on it via fchown because we should not do it
in the first place and also this will fail if the container is supposed
to be readonly.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-22 15:32:31 -07:00
Mrunal Patel ff122b864a Merge pull request #291 from runcom/test-dockerfile-criu-source
script: test_Dockerfile: install criu from source
2015-09-22 15:12:10 -07:00
Antonio Murdaca 9ff7b82f9b script: test_Dockerfile: install criu from source
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-22 23:59:31 +02:00
Ido Yariv 08366a8597 Enter existing user namespace if present
When executing an additional process in a container, all namespaces are
entered but the user namespace. As a result, the process may be
executed as the host's root user. This has both functionality and
security implications.

Fix this by adding the missing user namespace to the array of
namespaces. Since joining a user namespace in which the caller is
already a member yields an error, skip namespaces we're already in.

Last, remove a needless and buggy AT_SYMLINK_NOFOLLOW in the code.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-09-21 21:49:52 -04:00
Michael Crosby 08b5415ffa Merge pull request #280 from crosbymichael/stdio-perms
Fix STDIO permissions when container user not root
2015-09-21 11:23:29 -07:00
Antonio Murdaca d6e6462478 Cleanup unused func arguments
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-21 11:50:29 +02:00
Lai Jiangshan 7a54c7bd7b README.md: Update the config example
* version in the config example is advanced to 0.1.0
* rootfsPropagation in config.json is removed
    (The same one is already in runtime.json)
* rlimit time is changed from magic number to name(string)
* add pids cgroup
* add cgroup path

After this change applied, the example config in this README.md
is consistent with the result of `runc spec`.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-09-19 08:46:55 +08:00
Michael Crosby 0dad64f7ad Fix STDIO permissions when container user not root
Fix the permissions of the container's main processes STDIO when the
process is not run as the root user.  This changes the permissions right
before switching to the specified user so that it's STDIO matches it's
UID and GID.

Add a test for checking that the STDIO of the process is owned by the
specified user.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-18 14:11:29 -07:00
Alexander Morozov 4198b43b18 Merge pull request #279 from crosbymichael/fix-stdio-ownership
Fix STDIO ownership for non-tty processes
2015-09-18 11:54:41 -07:00
Michael Crosby 4a91d2c6e7 Fix STDIO ownership for non-tty processes
When we are using user namespaces we need to make sure that when we do
not have a TTY we change the ownership of the pipe()'s used for the
process to the root user within the container so that when you call
open() on any of the /proc/self/fd/*'s you do not get an EPERM.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-18 11:35:23 -07:00
Alexander Morozov a0f4c39049 Merge pull request #278 from runcom/update-criu
script: test_Dockerfile: update criu version
2015-09-18 09:47:50 -07:00
Antonio Murdaca 8d38af2c33 script: test_Dockerfile: update criu version
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-18 17:51:03 +02:00
Alexander Morozov cae6b31029 Merge pull request #264 from rhvgoyal/spec-allow-propgation-flags
libcontainer: Allow passing mount propagation flags
2015-09-17 15:02:17 -07:00
Mrunal Patel ef4ad24bb7 Merge pull request #269 from laijs/runc-start-usage
update the command usage for `runc start`
2015-09-17 08:39:28 -07:00
Lai Jiangshan 89aa41ee70 update the command usage for `runc start`
The patch mainly removes the wrong "for writing".
The config files are readonly when `runc start`.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-09-17 10:33:20 +08:00
Vivek Goyal d1f4a5b8b5 libcontainer: Allow passing mount propagation flags
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.

{
  "type": "bind",
  "source": "/root/mnt-source",
  "destination": "/root/mnt-dest",
  "options": "rbind,shared"
}

One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.

#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest                      private

Reason being that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to mount syscall.
That too, one propagation flag at a time. (from mount man page).

Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.

Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.

This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-09-16 15:53:23 -04:00
Alexander Morozov dae4560ec2 Merge pull request #257 from mrunalp/cap_prefix
Add CAP prefix for capabilities
2015-09-16 11:39:39 -07:00