599 Commits

Author SHA1 Message Date
Chun Chen 2ee9cbbd12 It's /proc/stat, not /proc/stats
Also adds /proc/net/dev to the valid mount destination white list

Signed-off-by: Chun Chen <ramichen@tencent.com>
2016-02-16 15:59:27 +08:00
rajasec 4cd31f63c5 Change softlink name to /dev/core
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-02-15 17:52:19 +05:30
Qiang Huang bda7742019 Cleanup systemd apply
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-15 15:56:59 +08:00
Qiang Huang 7b88f34d6e Remove unneeded cgroups path removal
It's handled in `destroy()`, no need to do this in
`Apply()`. I found this because systemd cgroup didn't
do this removal and it works well.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-15 11:22:13 +08:00
Aleksa Sarai 21dc85c4b8 libcontainer: cgroups: fs: add cgroup path safety unit tests
In order to avoid problems with security regressions going unnoticed,
add some unit tests that should make sure security regressions in cgroup
path safety cause tests to fail in runC.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-02-14 00:37:21 +11:00
Aleksa Sarai b8dc5213e8 libcontainer: cgroups: fs: fix path safety
Ensure that path safety is maintained. This essentially reapplies
c0cad6aa5e ("cgroups: fs: fix cgroup.Parent path sanitisation"), which
was accidentally removed in 256f3a8ebc ("Add support for CgroupsPath
field").

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-02-14 00:37:21 +11:00
Aleksa Sarai 90140a5688 libcontainer: cgroups: fs: fix innerPath
Fix m.Path legacy code to actually work.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-02-14 00:37:21 +11:00
Aleksa Sarai 1f8711751e libcontainer: integration: fix flaky pids limit tests
Because we are implemented in Go, the number of pids present in a
container is not very well-defined (other than it not being /much/
bigger than the limit you'd want to set). As a result, we need to make
the tests a bit less flaky in this regard.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-02-12 00:14:22 +11:00
Alexander Morozov 4678b01e64 Merge pull request #497 from mlaventure/cgroups-path
Replace Cgroup Parent and Name fields by CgroupsPath
2016-02-10 13:00:49 -08:00
Kenfe-Mickael Laventure 256f3a8ebc Add support for CgroupsPath field
Fixes #396

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-02-10 11:26:51 -08:00
Kenfe-Mickael Laventure dceeb0d0df Move pathClean to libcontainer/utils.CleanPath
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-02-09 16:21:58 -08:00
Alexander Morozov 8e8d01d38d Merge pull request #536 from crosbymichael/update-spec
Update spec to v0.3.0
2016-02-09 10:53:46 -08:00
rajasec 241e66dbe7 Adding pids subsystem in SPEC.md
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-02-09 20:42:11 +05:30
Michael Crosby 3baae2d525 Update runc for devices changes
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-02-08 13:15:12 -08:00
rajasec f1cde33ed7 Fixing capabilities name in SPEC.md
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-02-07 21:57:28 +05:30
Mike Brown c2c0458598 merges latest spec with runc
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2016-02-05 12:47:09 -08:00
Michael Crosby 9c9f8eeb4b Merge pull request #488 from stefanberger/new_session_keyring
Create a new session key for every container
2016-02-05 10:48:26 -08:00
Stefan Berger ad22e23aee Create a new session key for every container
Create a new session key ring '_ses' for every container. This avoids sharing
the key structure with the process that created the container and from which
the container inherits.

This patch fixes it for init and exec.

Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
2016-02-04 22:05:50 -05:00
rajasec 298cd1b285 Added error string for process operations
Signed-off-by: rajasec <rajasec79@gmail.com>

Changing the error code string name as per review comments

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-02-04 11:54:50 +05:30
Michael Crosby 5fe15a53b6 Merge pull request #496 from LK4D4/remove_sscanf
Remove usage of GetMounts from GetCgroupMounts
2016-02-04 14:55:41 -08:00
Michael Crosby 67cca27798 Merge pull request #529 from mlaventure/memory-limit-stat
Add limit value to memory stats
2016-02-04 11:21:35 -08:00
Qiang Huang d66c9632bf Merge pull request #524 from adfernandes/master
Add a compatibility header for CentOS/RHEL 6
2016-02-04 14:24:01 +08:00
Mrunal Patel 11a238b891 Merge pull request #522 from crosbymichael/created
Update list command and created methods
2016-02-04 09:47:10 +05:30
Kenfe-Mickael Laventure 7a12c92dbe Add limit value to memory stats
The value is populated with the content of `limit_in_bytes`.

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-02-03 11:54:09 -08:00
Alexander Morozov 97146f4dc6 Remove usage of GetMounts from GetCgroupMounts
GetMounts is very cpu-expensive. I'll change other funcs in this package
to reuse code from GetCgroupMounts later.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-02-01 11:00:23 -08:00
Qiang Huang 13e8f6e589 Remove procStart
It's never used and not needed. Our pipe is created with
syscall.SOCK_CLOEXEC, so the pipe is closed once the container
process has executed successfully; the parent process then reads EOF
and continues. If the container process hits an error before it is
executed, we write procError to sync with the parent.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-30 13:41:21 +08:00
Andrew Fernandes 3c2e77eed5 Add a compatibility header for CentOS/RHEL 6
Signed-off-by: Andrew Fernandes <andrew@fernandes.org>
2016-01-29 20:46:50 +00:00
Mrunal Patel 67aa3843e8 Merge pull request #474 from crosbymichael/detach
Add detach to runc
2016-01-28 14:09:07 -08:00
Michael Crosby 5cdb1be88f Merge pull request #517 from hqhq/hq_fix_comment
Fix the comment about sendConfig
2016-01-28 14:00:11 -08:00
Michael Crosby bb6a747825 Add detach to runc
By adding detach to runc, the container's process is the only thing left
running on the system.
This allows better usage of memory and avoids a long-lived runc
process.  With this addition you also need a delete command because the
detached container will not be able to remove its state and the leftover
cgroups directories.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-28 13:35:13 -08:00
Michael Crosby 1172a1e1e5 Update list command and created methods
We don't need a CreatedTime method on the container because it's not
part of the interface and can be retrieved via the state.  We also do not
need to call it CreateTime because the type of this field is time.Time,
so we know it's a time.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-28 13:32:24 -08:00
Michael Crosby 480e5f4416 Merge pull request #507 from mikebrow/runc-ls-command
adds list command
2016-01-28 13:20:07 -08:00
Mike Brown 4c871267db adds list command, and a timestamp in the container state
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2016-01-28 14:21:06 -06:00
Qiang Huang 064113363d Fix the comment about sendConfig
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-28 09:58:30 +08:00
Aleksa Sarai 57ba666ef3 cgroup: systemd: further systemd slice validation
Add some further (not critical, since Docker does this already)
validation to systemd slice names, to make sure users don't get cryptic
errors.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-27 19:00:52 +11:00
Michael Crosby 7cd384c0e5 Merge pull request #515 from crosbymichael/readall
Do not use stream encoders for pipe communication
2016-01-26 14:37:54 -08:00
Mrunal Patel 80c24730fa Merge pull request #511 from cyphar/fix-systemd-slice-expansion
cgroup: systemd: properly expand systemd slice names
2016-01-26 14:34:29 -08:00
Michael Crosby ddcee3cc2a Do not use stream encoders
Marshal the raw objects for the sync pipes so that no newline characters
are left behind in the pipe, causing errors.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-26 11:22:05 -08:00
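Illustration for ddcee3cc2a (a minimal sketch, not the code from this commit): Go's `json.Encoder.Encode` writes a newline after each encoded value, while `json.Marshal` returns only the encoded object, which is the difference the message above describes. The `syncMsg` type, the `procReady` value, and both helper names are hypothetical and exist only for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// syncMsg is a hypothetical sync-pipe message; it is not libcontainer's type.
type syncMsg struct {
	Type string `json:"type"`
}

// encodeStream encodes v with a stream encoder. Encoder.Encode writes the
// JSON value followed by a newline character.
func encodeStream(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// encodeRaw marshals v directly, so no trailing newline is produced.
func encodeRaw(v interface{}) ([]byte, error) {
	return json.Marshal(v)
}
```

For the same value, the output of `encodeStream` is the output of `encodeRaw` plus one trailing `\n`; a reader that consumes exactly one object leaves that byte behind in the pipe.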
Alexander Morozov ee0a019448 Merge pull request #513 from duglin/RemoveNullState
Remove the nullState
2016-01-26 11:03:32 -08:00
Alexander Morozov 3268a1ea00 Merge pull request #499 from crosbymichael/state-fixes
Fix various state bugs for pause and destroy
2016-01-25 11:33:59 -08:00
Aleksa Sarai 8b32914065 cgroup: systemd: properly expand systemd slice names
Rather than using '/' to denote hierarchy in slice names, systemd uses
'-' in an odd way. This results in runC incorrectly assuming that
certain kernel features are missing (and using inconsistent paths for
the cgroups not supported by systemd), because the "subsystem path" used
is not the one that systemd has created. Fix all of this by properly
expanding slice names.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-25 23:18:34 +11:00
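Illustration for 8b32914065 (a sketch under stated assumptions, not necessarily the code this commit added): systemd encodes slice hierarchy with `-`, so `a-b.slice` lives inside `a.slice`, and `-.slice` is the root slice. The hypothetical helper `expandSlice` below turns a slice name into the nested cgroup path that naming convention implies.

```go
package main

import (
	"errors"
	"strings"
)

// expandSlice is a hypothetical helper: it expands a systemd slice name such
// as "system-docker.slice" into the nested path implied by systemd's "-"
// hierarchy convention.
func expandSlice(slice string) (string, error) {
	const suffix = ".slice"
	errInvalid := errors.New("invalid slice name: " + slice)

	// A slice name must end in ".slice", be more than just the suffix,
	// and must not contain path separators.
	if !strings.HasSuffix(slice, suffix) || len(slice) == len(suffix) {
		return "", errInvalid
	}
	if strings.Contains(slice, "/") {
		return "", errInvalid
	}

	name := strings.TrimSuffix(slice, suffix)
	if name == "-" {
		// "-.slice" is the root slice.
		return "/", nil
	}

	var path, prefix string
	for _, component := range strings.Split(name, "-") {
		// Empty components ("a--b.slice", "-a.slice") are rejected.
		if component == "" {
			return "", errInvalid
		}
		// Each level repeats the full prefix accumulated so far.
		path += "/" + prefix + component + suffix
		prefix += component + "-"
	}
	return path, nil
}
```

With this sketch, `expandSlice("system-docker.slice")` returns `/system.slice/system-docker.slice`, and malformed names return an error instead of a confusing downstream failure.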
Doug Davis ff034a5119 Remove the nullState
Add a "createdState" in its place since I think that better describes
what it's used for.

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-01-25 00:26:11 -08:00
Qiang Huang 045ada9be6 Revert "update date in README"
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-25 14:25:34 +08:00
rajasec 94b206102f Adding user namespace in README
Signed-off-by: rajasec <rajasec79@gmail.com>

Added UID/GID mappings section as per review comments

Signed-off-by: rajasec <rajasec79@gmail.com>

Change size to 65536 per comments

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-01-25 07:07:44 +05:30
Qiang Huang 690e5d3251 Merge pull request #441 from ZJU-SEL/update-date
update date in README
2016-01-25 09:22:55 +08:00
Qiang Huang 4e6893b05a Merge pull request #494 from crosbymichael/cwd
Only set cwd when not empty
2016-01-22 09:50:38 +08:00
Qiang Huang 20c678ef50 Merge pull request #495 from cyphar/fix-memcg-set
cgroups: set memory cgroups in Set
2016-01-22 09:22:39 +08:00
Michael Crosby 9c3fa7928e Allow switch to anything from nullState
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-21 16:48:05 -08:00
Michael Crosby 556f798a19 Fix various state bugs for pause and destroy
There were issues where a process could die before pausing completed,
leaving the container in an inconsistent state and unable to be
destroyed.  This makes sure that if the container is paused and the
process is dead, it will unfreeze the cgroup before removing it.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-21 16:43:33 -08:00
Mrunal Patel 27132f2e51 Merge pull request #486 from duglin/removeHardCode
Remove some hard coded strings
2016-01-21 14:53:17 -08:00
Aleksa Sarai 75e38f94a0 cgroups: set memory cgroups in Set
Modify the memory cgroup code such that kmem is not managed by Set(), in
order to allow updating of memory constraints for containers by Docker.
This also removes the need to make memory a special case cgroup.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-22 07:46:43 +11:00
Michael Crosby ed7be1d082 Only set cwd when not empty
Do not require cwd inside the libcontainer code, so that existing
consumers of libcontainer keep working.  The check can be done at the
runc level and is already evaluated there.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-21 11:08:32 -08:00
Qiang Huang 8bbe901045 Fix comment of swap limit
Setting `-1` doesn't disable swap. Disabling swap means you
can't use swap memory; setting `-1` really means you can use
unlimited swap memory.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-21 14:02:03 +08:00
Mrunal Patel 41d9d26513 Add support for just joining in apply using cgroup paths
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-01-20 14:23:05 -05:00
Doug Davis 49dfa1b62d Remove some hard coded strings
Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-01-19 19:02:31 -08:00
Mrunal Patel e91b055623 Merge pull request #476 from hqhq/hq_embed_resource
Embed Resources for backward compatibility
2016-01-19 14:59:39 -08:00
Michael Crosby 5637f38b8a Merge pull request #471 from jfrazelle/add-seccomp-enabled-check
add seccomp.IsEnabled() function
2016-01-19 14:52:51 -08:00
Michael Crosby 9c41e8388c Handle seccomp proc parsing errors
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-19 11:43:49 -08:00
Qiang Huang f048eaf87a Embed Resources for backward compatibility
Fixes: docker/docker#19329

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-19 19:08:14 +08:00
Jessica Frazelle 41edbeb25e add seccomp.IsEnabled() function
This is much like apparmor.IsEnabled() function and a nice helper.

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-01-18 10:44:31 -08:00
Jessica Frazelle ecf03fafa5 cleanup old hack dir
looks like this was left around from the libcontainer days ;)

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-01-15 16:39:38 -08:00
Alexander Morozov 54b07da69e Merge pull request #475 from mrunalp/set_cwd
Make cwd required
2016-01-15 13:54:35 -08:00
Alexander Morozov 6c9532f063 Merge pull request #461 from ahmetalpbalkan/selinux-setenforce
selinux: add SelinuxSetEnforceMode implementation
2016-01-15 13:01:27 -08:00
Alexander Morozov f2f8f0e4e6 Merge pull request #462 from hqhq/hq_fix_libcontainer_readme
Update README of libcontainer
2016-01-15 13:00:44 -08:00
Mrunal Patel 6259f09e97 Merge pull request #426 from gitido/pressure_level
libcontainer: Add support for memcg pressure notifications
2016-01-14 16:23:07 -08:00
Mrunal Patel 269a717555 Make cwd required
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-01-14 19:06:56 -05:00
Alexander Morozov 8962f371d6 Merge pull request #472 from dadgar/b-find-cgroup-mount
Only validate post-hyphen field length on cgroup mounts
2016-01-14 15:08:11 -08:00
Alexander Morozov 3b42992948 Merge pull request #455 from hallyn/tty01
Do not allow access to /dev/tty{0,1}
2016-01-14 14:35:46 -08:00
Qiang Huang d87ac4a2ca Update README of libcontainer
Fixes: #438

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-14 14:53:29 +08:00
Alex Dadgar a42f3236d5 Only validate post-hyphen field length on cgroup mounts
Signed-off-by: Alex Dadgar <alex.dadgar@gmail.com>
2016-01-13 11:28:49 -08:00
Mrunal Patel 4c767d7046 Merge pull request #446 from cyphar/18-add-pids-controller
cgroup: add PIDs cgroup controller support
2016-01-11 16:56:00 -08:00
Aleksa Sarai 103853ead7 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should be. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai a95483402e libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai f36ed4b174 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai db3159c9d9 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:32 +11:00
Alexander Morozov c0cad6aa5e Merge pull request #451 from cyphar/fix-infinite-recursion
cgroups: fs: fix cgroup.Parent path sanitisation
2016-01-11 08:52:26 -08:00
Mrunal Patel d43108184e Merge pull request #458 from hallyn/userns
Handle running nested in a user namespace
2016-01-11 08:41:46 -08:00
Aleksa Sarai bf899fef45 cgroups: fs: fix cgroup.Parent path sanitisation
Properly sanitise the --cgroup-parent path, to avoid potential issues
(as it starts creating directories and writing to files as root). In
addition, fix an infinite recursion due to incomplete base cases.

It might be a good idea to move pathClean to a separate library (which
deals with path safety concerns, so all of runC and Docker can take
advantage of it).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-11 23:10:35 +11:00
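Illustration for bf899fef45 (a minimal sketch of the idea, not claimed to match the `pathClean` helper from this commit or the later `libcontainer/utils.CleanPath`): one way to sanitise a user-supplied cgroup parent is to root a relative path at `/` before cleaning it, so that leading `..` elements cannot climb out of the intended directory. The name `cleanPath` is hypothetical.

```go
package main

import (
	"path"
	"strings"
)

// cleanPath is a hypothetical sanitiser: it lexically cleans p and makes sure
// a relative path can never escape its base directory through "..".
func cleanPath(p string) string {
	if p == "" {
		return ""
	}
	if path.IsAbs(p) {
		// path.Clean already drops ".." elements that begin a rooted path.
		return path.Clean(p)
	}
	// Root the relative path at "/" so ".." has nothing to climb into,
	// clean it, then turn it back into a relative path.
	rooted := path.Clean("/" + p)
	rel := strings.TrimPrefix(rooted, "/")
	if rel == "" {
		return "."
	}
	return rel
}
```

With this sketch, `cleanPath("../../etc/passwd")` yields `etc/passwd`, so joining the result onto a cgroup mount point stays inside that mount point. The cleaning is purely lexical and does not resolve symlinks.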
Alexander Morozov 910752f1f5 Merge pull request #463 from jimmidyson/non-recursive-pids
Revert to non-recursive GetPids, add recursive GetAllPids
2016-01-08 13:55:00 -08:00
Serge Hallyn c0ad40c5e6 Do not create devices when in user namespace
When we launch a container in a new user namespace, we cannot create
devices, so we bind mount the host's devices into place instead.

If we are running in a user namespace (i.e. nested in a container),
then we need to do the same thing.  Add a function to detect that
and check for it before doing mknod.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
---
 Changelog - add a comment clarifying what's going on with the
	     uidmap file.
2016-01-08 12:54:08 -08:00
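Illustration for c0ad40c5e6 (a sketch of one possible detection heuristic; the commit's actual function is not reproduced here): in the initial user namespace, `/proc/self/uid_map` holds a single identity mapping of the form `0 0 4294967295`, so any other content suggests the process is running inside a user namespace. The helper name `inUserNamespace` is hypothetical, and it takes the file contents as a string so it can be exercised without touching `/proc`.

```go
package main

import (
	"strconv"
	"strings"
)

// initialRange is the mapping length reported for the initial user
// namespace (2^32 - 1 IDs).
const initialRange = 4294967295

// inUserNamespace is a hypothetical helper: it reports whether the given
// uid_map contents differ from the single identity mapping "0 0 4294967295".
// Anything that does not parse as exactly that mapping is treated as a
// user namespace.
func inUserNamespace(uidMap string) bool {
	fields := strings.Fields(uidMap)
	if len(fields) != 3 {
		// Zero mappings or several mappings: not the initial namespace.
		return true
	}
	var vals [3]uint64
	for i, f := range fields {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return true
		}
		vals[i] = v
	}
	return !(vals[0] == 0 && vals[1] == 0 && vals[2] == initialRange)
}
```

A caller would read `/proc/self/uid_map` (for example with `os.ReadFile`) and, when this returns true, bind mount the host's device nodes instead of calling mknod.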
Jimmi Dyson 91c7024e52 Revert to non-recursive GetPids, add recursive GetAllPids
Signed-off-by: Jimmi Dyson <jimmidyson@gmail.com>
2016-01-08 19:42:25 +00:00
Ahmet Alp Balkan c8b5e150f1 selinux: add SelinuxSetEnforceMode implementation
Signed-off-by: Ahmet Alp Balkan <ahmetalpbalkan@gmail.com>
2016-01-08 16:48:30 +00:00
xlgao-zju cdc53051a3 update date in README
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2016-01-08 10:48:11 +08:00
Mrunal Patel 749928a0a1 Merge pull request #421 from rajasec/selinux-compileflag
Adding selinux label
2016-01-07 17:57:54 -08:00
Serge Hallyn 2e13570679 Do not allow access to /dev/tty{0,1}
These are the real host devices; containers should not generally
have or need them.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2016-01-06 18:42:17 -08:00
Mrunal Patel f03b7f8317 Merge pull request #419 from rajasec/selinux-teststepfix
make localtest failure with selinux enabled
2016-01-06 12:44:03 -08:00
Mrunal Patel 4fda64bc07 Merge pull request #452 from hqhq/hq_bindmount_whitelist
Add white list for bind mount check
2016-01-06 11:16:10 -08:00
Qiang Huang 9c1242ecba Add white list for bind mount check
Fixes: #400

It would be useful to use fuse to isolate proc info.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-06 14:48:40 +08:00
Mrunal Patel fa24ebf26c Merge pull request #311 from crosbymichael/destory-state
Implement Container States
2016-01-04 09:59:28 -08:00
Kai Qiang WU(Kennan) c71d8e69f1 Fix typo word in SPEC.md
Signed-off-by: Kai Qiang WU(Kennan) <wkq5325@gmail.com>
2015-12-30 00:30:58 +00:00
Ido Yariv 55a8d686a9 libcontainer: Add support for memcg pressure notifications
It may be desirable to receive memory pressure level notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-12-28 13:36:55 -05:00
Mrunal Patel 4124ba9468 Revert "cgroups: add pids controller support"
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-19 07:48:48 -08:00
Mrunal Patel bc465742ac Merge pull request #58 from cyphar/18-add-pids-controller
cgroups: add pids controller support
2015-12-18 19:55:51 -08:00
Aleksa Sarai 14ed8696c1 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should be. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:48 +11:00
Aleksa Sarai 88e6d489f6 libcontainer: cgroups: loudly fail with Set
It is vital to loudly fail when a user attempts to set a cgroup limit
(rather than using the system default). Otherwise the user will assume
they have security they do not actually have. This mirrors the original
Apply() (that would set cgroup configs) semantics.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 8a740d5391 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
Aleksa Sarai 37789f5bf1 libcontainer: cgroups: add pids controller support
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.

Unfortunately, due to the init process being written in Go, it can spawn
an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:38 +11:00
Michael Crosby 766e4c5250 Merge pull request #437 from clnperez/nlahdrlen-fix-for-gccgo
Add NLA_HDRLEN workaround for gccgo
2015-12-18 15:57:26 -08:00
Christy Perez ced8e5e7ba Calculate NLA_HDRLEN as gccgo workaround
syscall.NLA_HDRLEN is not in gccgo (as of 5.3), so in the meantime
use the #defines taken from linux/netlink.h.

See https://github.com/golang/go/issues/13629

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2015-12-17 17:36:47 -06:00
Michael Crosby 4415446c32 Add state pattern for container state transition
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add state status() method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Allow multiple checkpoint on restore

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handle leave-running state

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Fix state transitions for inprocess

Because the tests use libcontainer in process between the various states
we need to ensure that that use case works as well as the out of process
one.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove isDestroyed method

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Handling Pausing from freezer state

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

freezer status

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Fixing review comments

Signed-off-by: Rajasekaran <rajasec79@gmail.com>

Added comment when freezer not available

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Conflicts:
	libcontainer/container_linux.go

Change checkFreezer logic to isPaused()

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Remove state base and factor out destroy func

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>

Add unit test for state transitions

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-17 13:55:38 -08:00
Qiang Huang 9d6ce7168a Merge pull request #434 from mrunalp/resources
Move the cgroups setting into a Resources struct
2015-12-17 09:34:29 +08:00
Mrunal Patel 55a49f2110 Move the cgroups setting into a Resources struct
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-16 15:53:31 -05:00
David Calavera 77c36f4b34 Move linux only Process.InitializeIO behind the linux build flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-15 15:12:29 -05:00
David Calavera 977991d36f Replace docker units package with new docker/go-units.
It's the same library but it won't live in docker/docker anymore.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-14 20:45:30 -05:00
Mrunal Patel 11f8fdca33 Merge pull request #430 from crosbymichael/pipes
Move STDIO initialization to libcontainer.Process
2015-12-11 14:30:42 -08:00
Alexander Morozov cb04f03854 Merge pull request #336 from hqhq/hq_parent_cgroup_systemd
systemd: support cgroup parent with specified slice
2015-12-11 10:13:47 -08:00
xlgao-zju ff29daafc0 fix minor typo
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-12-11 21:37:32 +08:00
Michael Crosby 29b139f702 Move STDIO initialization to libcontainer.Process
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-10 16:11:49 -08:00
Mrunal Patel 0267ad05b0 Merge pull request #340 from dqminh/replace-env-netlink
nsexec: replace usage of environment variable with netlink message
2015-12-09 14:21:45 -08:00
Michael Crosby 9c9aac5385 Export console New func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-12-09 11:59:10 -08:00
Daniel, Dao Quang Minh 7d423cb7a1 setns: replace env with netlink for bootstrap data
Replace passing of the pid and console path via environment variables with
passing them in a netlink message over an established pipe.

This change requires us to set _LIBCONTAINER_INITTYPE and
_LIBCONTAINER_INITPIPE in the environment of the bootstrap process, as we
only send the bootstrap data for the setns process right now. When the init
and setns bootstrap processes are unified (i.e., init uses nsexec instead of
Go to clone a new process), we can remove _LIBCONTAINER_INITTYPE.

Note:
- we read nlmsghdr first before reading the content so we can get the total
  length of the payload and allocate buffer properly instead of allocating
  one large buffer.

- check read bytes vs the wanted number. It's an error if we failed to read
  the desired number of bytes from the pipe into the buffer.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-12-03 18:03:48 +00:00
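Illustration for 7d423cb7a1 (a Go sketch of the two notes above; nothing here is taken from the commit's own code): read the fixed-size netlink header first, use its length field to size the payload buffer, and treat a short read as an error. The sketch assumes a little-endian host and the standard 16-byte `struct nlmsghdr` layout, and it does not parse netlink attributes. All function names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
)

// nlmsgHdrLen is the size of struct nlmsghdr: a 32-bit total length, 16-bit
// type and flags, and 32-bit sequence number and port ID.
const nlmsgHdrLen = 16

// encodeMessage builds a message for the example: a header whose length
// field covers header plus payload, with all other header fields left zero.
func encodeMessage(payload []byte) []byte {
	msg := make([]byte, nlmsgHdrLen+len(payload))
	binary.LittleEndian.PutUint32(msg[0:4], uint32(len(msg)))
	copy(msg[nlmsgHdrLen:], payload)
	return msg
}

// readPayload reads the header first, allocates exactly the payload size it
// announces, and fails if fewer bytes than requested arrive.
func readPayload(r io.Reader) ([]byte, error) {
	var hdr [nlmsgHdrLen]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return nil, err
	}
	total := binary.LittleEndian.Uint32(hdr[0:4])
	if total < nlmsgHdrLen {
		return nil, errors.New("netlink message length smaller than header")
	}
	payload := make([]byte, total-nlmsgHdrLen)
	// io.ReadFull returns an error when the pipe yields fewer bytes than wanted.
	if _, err := io.ReadFull(r, payload); err != nil {
		return nil, err
	}
	return payload, nil
}

// decodeBytes is a convenience wrapper for in-memory data.
func decodeBytes(raw []byte) ([]byte, error) {
	return readPayload(bytes.NewReader(raw))
}
```

Sizing the buffer from the header avoids one large fixed allocation, and `io.ReadFull` gives the "read bytes vs the wanted number" check without extra bookkeeping.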
Qiang Huang 7695a0ddb0 systemd: support cgroup parent with specified slice
Pick up #119
Fixes: docker/docker#16681

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-12-02 23:57:02 -05:00
Mrunal Patel 3317785f56 Merge pull request #420 from runcom/cgroups-unsupported
libcontainer: configs: create cgroup_unsupported.go in order to build on darwin as well
2015-11-30 09:20:23 -08:00
Alexander Morozov decba54d78 Merge pull request #424 from runcom/fix-go-vet
libcontainer: network_linux.go: fix go vet
2015-11-30 09:06:41 -08:00
Antonio Murdaca 3029587085 libcontainer: network_linux.go: fix go vet
This patch fixes the following go vet warnings:
```
libcontainer/network_linux.go:96: github.com/vishvananda/netlink.Device
composite literal uses unkeyed fields
libcontainer/network_linux.go:114: github.com/vishvananda/netlink.Device
composite literal uses unkeyed fields
```

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-30 12:31:18 +01:00
Rajasekaran 49ff2711e1 Fixing xattr test step issue
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2015-11-29 09:24:42 +05:30
rajasec a6614ba40f Fixing TestSetFilecon in selinux test step
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-28 13:51:46 +05:30
Antonio Murdaca 112493115f libcontainer: configs: create cgroup_unsupported.go in order to build on darwin as well
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-27 10:28:29 +01:00
rajasec 9f4d5340f4 Adding selinux label
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-26 19:44:51 +05:30
rajasec ce68f7aef7 make localtest failure with selinux enabled
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-24 23:24:30 +05:30
Daniel, Dao Quang Minh d914bf7347 setns: add bootstrap data
add bootstrap data to setns process. If we have any bootstrap data then copy it
to the bootstrap process (i.e. nsexec) using the sync pipe. This will allow us
to eventually replace environment variable usage with more structured data
to setup namespaces, write pid/gid map, setgroup etc.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-11-22 11:36:58 +00:00
rajasec 949d822675 Adding error conditions when apparmor disabled
Signed-off-by: rajasec <rajasec79@gmail.com>

Add the changes to errors in lower case

Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-22 13:14:18 +05:30
Antonio Murdaca 400e05fe5b libcontainer: configs: extend unsupported os
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2015-11-19 18:24:34 +01:00
Alexander Morozov 776791463d Merge pull request #357 from ashahab-altiscale/350-container-in-container
Bind mount device nodes on EPERM
2015-11-16 14:54:02 -08:00
Qiang Huang 96f0eefa1a Fix comment to be consistent with the code
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-16 19:16:27 +08:00
Abin Shahab 28c9d0252c Userns container in containers
Enables launching userns containers by catching EPERM errors for writing
to devices cgroups, and for mknod invocations.

Signed-off-by: Abin Shahab <ashahab@altiscale.com>
2015-11-15 14:42:35 -08:00
Alexander Morozov 48fdc50d09 Merge pull request #398 from crosbymichael/seccomp-trace
Add seccomp trace support
2015-11-13 10:54:18 -08:00
Alexander Morozov bda4ca2f8f Merge pull request #388 from hqhq/hq_cgroup_cleanups
Some cgroup cleanups
2015-11-13 09:06:18 -08:00
Michael Crosby caca840972 Add seccomp trace support
Closes #347

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-12 17:03:53 -08:00
Michael Crosby 2be14dc963 Merge pull request #392 from mrunalp/poststart
Add poststart hooks
2015-11-12 16:34:38 -08:00
Michael Crosby 879dfdd980 Fix race setting process opts
When starting and querying for pids a container can start and exit before
this is set.  So set the opts after the process is started and while
libcontainer still has the container's process blocking on the pipe.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-06 16:51:59 -08:00
Mrunal Patel 452e8a73c5 Integrate poststart hooks with spec
* Call poststart hooks after the container is started
* Tie in with spec configuration

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-11-06 18:03:32 -05:00
Mrunal Patel bb2d3cd1be Add Poststart hook to libcontainer config
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-11-06 18:02:50 -05:00
Qiang Huang 209c8d9979 Add some comments about cgroup
We fixed some bugs and introduced some code that is hard to
understand; add some comments for it.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
Qiang Huang 8c98ae27ac Refactor cgroupData
The former cgroup entry is confusing, separate it to parent
and name.
Rename entry `c` to `config`.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
Qiang Huang a263afaf6c Rename parent and data
'parent' function is confusing with parent cgroup, it's actually
parent path, so rename it to parentPath.

The name 'data' is too common to be identified, rename it to cgroupData
which is exactly what it is.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-11-05 19:12:53 +08:00
John Howard a919bd3f67 Windows: Refactor Container interface
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-11-02 15:12:16 -08:00
Mrunal Patel c42a2952c4 Merge pull request #361 from jhowardmsft/jjh/criu_opts
Windows: Factor down criu_opts
2015-11-02 15:05:27 -08:00
Mrunal Patel 7caef5626b Merge pull request #359 from jhowardmsft/jjh/state_struct
Windows: Refactor state struct
2015-11-02 15:04:12 -08:00
Mrunal Patel cf73b32eeb Merge pull request #343 from hqhq/hq_unify_behavior_for_memory
Unify behavior for memory cgroup
2015-11-02 14:58:31 -08:00
Michael Crosby 26eb6a1bcd Merge pull request #377 from rhatdan/label
Docker needs to know whether the user requested a relabel
2015-11-02 14:55:27 -08:00
Doug Davis e5dc12a0c9 Add more context around some error cases
Signed-off-by: Doug Davis <dug@us.ibm.com>
2015-10-30 10:55:48 -07:00
Dan Walsh 69c3ea4e17 Docker needs to know whether the user requested a relabel
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-10-28 15:44:38 -04:00
John Howard fe1cce69b3 Windows: Refactor state struct
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-10-26 14:45:20 -07:00
Mrunal Patel 6c36d666a1 Merge pull request #365 from jhowardmsft/jjh/devices
Windows: Tidy libcontainer\devices
2015-10-24 19:36:26 -07:00
Mrunal Patel 0d155ba0fb Merge pull request #362 from jhowardmsft/jjh/configs-cgroup
Windows: Refactor configs/cgroup.go
2015-10-24 19:34:54 -07:00
Mrunal Patel 6d85c27599 Merge pull request #364 from jhowardmsft/jjh/fs-build-tags
Fixes build tags on cgroups\fs\*.go
2015-10-24 19:33:52 -07:00
John Howard 37675129ba Windows: Tidy libcontainer\devices
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-10-23 13:50:24 -07:00
Alexander Morozov 34fe03fa8a Merge pull request #238 from adrianreber/master
Add criu related debug output
2015-10-23 13:44:03 -07:00
John Howard fb5a8febce Fixes build tags on cgroups\fs\*.go
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-10-23 13:41:10 -07:00
Mrunal Patel b741e3dc9d Merge pull request #337 from alban/alban/stdio
libcontainer/SPEC.md: fix /dev/stdio symlinks
2015-10-23 13:40:56 -07:00
John Howard 8690e9cc8c Windows: Refactor configs/cgroup.go
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-10-23 13:08:18 -07:00
John Howard 78351a8e3d Windows: Factor down criu_opts
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-10-23 12:58:59 -07:00
Mrunal Patel bed70ca579 Merge pull request #358 from rajasec/exit-typo
Fixing typo in the comment for exit
2015-10-23 11:12:17 -07:00
Alexander Morozov 97929bd6dd Merge pull request #335 from crosbymichael/cgroup-order
Add name to cgroup subsystem and set order
2015-10-23 10:38:29 -07:00
yangshukui e5ef8d239a Add the conversion of architectures for seccomp config
Signed-off-by: yangshukui <yangshukui@huawei.com>
2015-10-23 10:17:39 +08:00
rajasec 58e3cde8f3 Fixing typo in the comment for exit
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-10-22 19:08:03 +05:30
Alban Crequy f381717120 libcontainer/SPEC.md: fix /dev/stdio symlinks
The spec uses symlinks to "/proc/1/..." but the implementation uses
"/proc/self/...": see setupDevSymlinks (libcontainer/rootfs_linux.go).

The implementation is more correct, so I'm changing the spec to match
the implementation.

Signed-off-by: Alban Crequy <alban.crequy@coreos.com>
2015-10-21 11:10:24 +02:00
Qiang Huang 34cff6f2f3 Correct intuition for setupDev
Minor fix, the former setupDev=true means not setup dev,
which is contrary to intuition, just correct it.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-10-21 16:06:26 +08:00
Qiang Huang 194e0e4db6 Unify behavior for memory cgroup
We have a rule that for optional cgroups, don't fail if some
of them are not mounted, but we want it to fail hard when a
user specifies an option and we are unable to fulfill the
request.

Memory cgroup should also follow this rule.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-10-20 14:01:48 +08:00
Michael Crosby ba2ce3b25a Cgroup set order for systemd
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-10-19 13:32:45 -07:00
Michael Crosby 2554f49d5e Use array instead of map for cgroup subsystems
Also add cpuset as the first in the list to address issues setting the
pid in any cgroup before the cpuset is populated.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-10-15 15:24:53 -07:00
Michael Crosby 02fdc70837 Add Name() to cgroup subsystems
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-10-15 15:19:23 -07:00
Mrunal Patel 3be7f87b1b Merge pull request #334 from hqhq/hq_set_cpus_mems_first
Set cpuset.cpus and cpuset.mems before join the cgroup
2015-10-15 14:33:28 -07:00
Qiang Huang be6764508e Set cpuset.cpus and cpuset.mems before join the cgroup
It can avoid unnecessary task migration, see this scenario:
 - container init task is on cpu 1, and we assigned it to cpu 1,
   but parent cgroup's cpuset.cpus=2
 - we created the cgroup dir and inherited cpuset.cpus from parent as 2
 - write container init task's pid to cgroup.procs
 - [it's possible the container init task migrated to cpu 2 here]
 - set cpuset.cpus as assigned to cpu 1
 - [the container init task has to be migrated back to cpu 1]

So we should set cpuset.cpus and cpuset.mems before writing pids
to cgroup.procs to avoid such problems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-10-15 11:16:56 +08:00
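The ordering described in be6764508e above (set cpuset.cpus and cpuset.mems, then write the pid to cgroup.procs) can be sketched in Go roughly as follows. This is an illustration only, not the libcontainer code: `writeFunc` and `joinCpuset` are hypothetical names, and the writer is injected so the example does not touch a real cgroup filesystem.

```go
package main

import (
	"fmt"
	"strconv"
)

// writeFunc writes value to the named control file of one cgroup directory.
// It is a hypothetical abstraction for this sketch, not a libcontainer type.
type writeFunc func(file, value string) error

// joinCpuset configures cpuset.cpus and cpuset.mems first and only then
// adds the pid to cgroup.procs, so the task never runs under the values
// inherited from the parent cgroup.
func joinCpuset(write writeFunc, cpus, mems string, pid int) error {
	steps := []struct{ file, value string }{
		{"cpuset.cpus", cpus},
		{"cpuset.mems", mems},
		{"cgroup.procs", strconv.Itoa(pid)},
	}
	for _, s := range steps {
		if err := write(s.file, s.value); err != nil {
			return fmt.Errorf("writing %s: %w", s.file, err)
		}
	}
	return nil
}

func main() {
	// Print the writes instead of modifying a real cgroup.
	printer := func(file, value string) error {
		fmt.Printf("%s <- %s\n", file, value)
		return nil
	}
	if err := joinCpuset(printer, "1", "0", 1234); err != nil {
		fmt.Println("error:", err)
	}
}
```

In a real setup the writer would write into the cpuset cgroup directory; keeping it as a parameter makes the ordering easy to check in a test.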
Alexander Morozov 6c198ae2d0 Reorder checks in Walk to avoid panics
Also added test for host PID namespace

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-10-13 15:06:57 -07:00
Alexander Morozov 6dad176d01 Get PIDs from cgroups recursively
Also lookup cgroup for systemd is changed to "device" to be consistent
with fs implementation.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-10-13 10:19:01 -07:00
Adrian Reber c42ef59bf9 Add criu related debug output
While testing different versions of criu it helps to know which
criu binary with which options is currently used. Therefore additional
debug output to display this information is added.

v2: increase readability of printed out criu options

Signed-off-by: Adrian Reber <adrian@lisas.de>
2015-10-13 10:41:00 +02:00
Alexander Morozov d9ba9cebac Merge pull request #184 from huikang/criu-cgroup-manage-mode
Add option to support criu manage cgroups mode for dump and restore
2015-10-12 10:51:16 -07:00
Mrunal Patel bfe2bacbf4 Merge pull request #320 from rhatdan/label
Validate label options
2015-10-11 20:54:38 -07:00
Hui Kang 25da513c4b Add option to support criu manage cgroups mode for dump and restore
CRIU supports cgroup-manage mode from v1.7

Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-10-11 04:42:54 +00:00
Dan Walsh f8b34352fe Validate label options
Only valid options to --security-opt for label should be
disable, user, role, type, level.

Return error on invalid entry

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-10-10 06:51:49 -04:00
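A minimal sketch of the validation described in f8b34352fe above, assuming each option arrives either as the bare word `disable` or as a `key:value` pair. The function name `validateLabelOptions` and the exact option format are assumptions made for illustration; only the list of accepted keys comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// labelKeys lists the keys that take a value; "disable" is handled separately.
var labelKeys = map[string]bool{"user": true, "role": true, "type": true, "level": true}

// validateLabelOptions returns an error for the first option that is neither
// "disable" nor a key:value pair with one of the accepted keys.
func validateLabelOptions(opts []string) error {
	for _, opt := range opts {
		if opt == "disable" {
			continue
		}
		// Split only on the first colon: level values such as s0:c1,c2
		// contain colons themselves.
		parts := strings.SplitN(opt, ":", 2)
		if len(parts) != 2 || parts[1] == "" {
			return fmt.Errorf("bad label option %q: expected key:value", opt)
		}
		if !labelKeys[parts[0]] {
			return fmt.Errorf("bad label option %q: valid options are disable, user, role, type, level", opt)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateLabelOptions([]string{"type:svirt_lxc_net_t", "level:s0:c1,c2"}))
	fmt.Println(validateLabelOptions([]string{"tpye:svirt_lxc_net_t"}))
}
```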
Mrunal Patel f152edcb1c Merge pull request #316 from cpuguy83/race_on_output_start_error
Fix for race from error on process start
2015-10-08 13:51:54 -07:00
xlgao-zju 02fc164456 change named to names
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-08 21:44:23 +08:00
Brian Goff 7632c4585f Fix for race from error on process start
This rather naively fixes an error observed where a process's stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.

Before the "fix", when an error occurred on start, `terminate` is called
immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on
the process. In some cases when this `Kill` is called the stdio streams
have not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.

With the fix it is just calling `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-10-07 21:28:26 -04:00
Alexander Morozov 902c012e85 Merge pull request #319 from dodgerblue/dodgerblue-arm64
nsexec: Align clone child stack ptr to 16
2015-10-06 08:28:24 -07:00
Bogdan Purcareata 4c5eb45862 nsexec: Align clone child stack ptr to 16
This is required on ARM64 builds that use the clone syscall. Check [1].

[1] http://lxr.free-electrons.com/source/arch/arm64/kernel/process.c#L264

Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com>
2015-10-06 10:41:18 +00:00
Antonio Murdaca c5b80bddf1 bump docker pkgs
Docker pkgs were updated while golinting the whole docker code base.
Now when trying to bump libcontainer/runc in docker, it fails compiling
with the following error:
``
vendor/src/github.com/opencontainers/runc/libcontainer/rootfs_linux.go:424:
undefined: mount.MountInfo
``
This is because, for instance, the mount pkg was updated here
0f5c9d301b (diff-49294d05afa48e2f7c0d2f02c6f7614c)
and now that type is only `mount.Info`.
This patch bump docker pkgs commit and adapt code to it.

Signed-off-by: Antonio Murdaca <amurdaca@redhat.com>
2015-10-06 10:48:12 +02:00
Mrunal Patel cc84f2cc9b Merge pull request #305 from hqhq/hq_add_softlimit_systemd
Add memory reservation support for systemd
2015-10-05 16:37:32 -07:00
Mrunal Patel 223975564a Merge pull request #276 from runcom/adapt-spec-96bcd043aa8a28f6f64c95ad61329765f01de1ba
Adapt spec 96bcd043aa
2015-10-05 16:36:09 -07:00
Alexander Morozov d7ce356411 Merge pull request #315 from mrunalp/systemd_name
Systemd name
2015-10-05 15:12:28 -07:00
Mrunal Patel 0b9e7af763 Merge pull request #313 from swagiaal/fix-GetAdditionalGroups
Allow numeric groups for containers without /etc/group
2015-10-05 11:47:36 -07:00
Mrunal Patel 79a02e35fb cgroups: Add name=systemd to list of subsystems
This allows getting the path to the subsystem and so is subsequently
used in EnterPid by an exec process.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:24:11 -04:00
Mrunal Patel 1940c73777 cgroups: Add a name cgroup
This is meant to be used in retrieving the paths so an exec
process enters all the cgroup paths correctly.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-10-05 14:23:05 -04:00
Sami Wagiaalla c25c38cc80 Allow numeric groups for containers without /etc/group
/etc/group is not needed when specifying numeric group ids. This
change allows containers without /etc/group to specify numeric
supplemental groups.

Signed-off-by: Sami Wagiaalla <swagiaal@redhat.com>
2015-10-04 19:02:35 -04:00
xlgao-zju 4b360d6300 change uid to gid in func HostGID
Signed-off-by: xlgao-zju <xlgao@zju.edu.cn>
2015-10-05 01:11:48 +08:00
Antonio Murdaca c6e406af24 Adjust runc to new opencontainers/specs version
Godeps: Vendor opencontainers/specs 96bcd043aa

Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274

Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-10-03 12:25:33 +02:00
Alexander Morozov c573ffbd05 Merge pull request #208 from rhvgoyal/config-rootfsPropagation
Create container_private, container_slave and container_shared modes for rootfsPropagation
2015-10-02 13:42:20 -07:00
Vivek Goyal 6a851e1195 exec_test.go: Test case for rootfsPropagation="private"
A test case to test rootfsPropagation="private" and making sure shared
volumes work.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 175e4b8aec exec_test.go: Test cases for rootfsPropagation=rslave
test case to test rootfsPropagation=rslave

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal da8d776c08 Make pivotDir rprivate
pivotDir is the one where pivot_root() call puts the old root. We will
unmount pivotDir() and delete it.

Previously we were making / always rslave or rprivate. That will mean 
that pivotDir() could never have mounts which would be shared with
parent mount namespace. That also means that unmounting pivotDir() was
safe and none of the unmount will propagate to parent namespace and
unmount things which we did not want to.

But now the user can ask for private, shared, or slave propagation on /. That
means some of the mounts we inherited from parent could be shared and that
also means if we umount pivotDir/, those mounts will get unmounted in
parent too. That's not what we want.

Instead make pivotDir rprivate so that unmounts don't propagate back to
parent.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 23ec72a426 Make parent mount of container root private if it is shared.
pivot_root() imposes a number of restrictions and fails if they are not
met. In particular, the parent mount of the container root cannot be
shared, otherwise pivot_root() will fail.

So far parent could not be shared as we marked everything either private
or slave. But now we have introduced new propagation modes where parent
mount of container rootfs could be shared and pivot_root() will fail.

So check if parent mount is shared and if yes, make it private. This will
make sure pivot_root() works.

Also it will make sure that when we bind mount container rootfs, it does
not propagate to parent mount namespace. Otherwise cleanup becomes a 
problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Vivek Goyal 5dd6caf6cf Replace config.Privatefs with config.RootPropagation
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.

Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.

Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set:

config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
 
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-10-01 17:03:02 -04:00
Alexander Morozov 0954faba13 Merge pull request #306 from hqhq/hq_join_perfevent_systemd
Systemd: Join perf_event cgroup
2015-10-01 10:05:35 -07:00
Alexander Morozov 4d5079b9dc Merge pull request #309 from chenchun/fix_reOpenDevNull
Fix reOpenDevNull
2015-09-30 19:06:43 -07:00
Alexander Morozov fba07bce72 Merge pull request #307 from estesp/no-remount-if-unecessary
Only remount if requested flags differ from current
2015-09-30 11:40:06 -07:00
Mrunal Patel 74ded3660b Merge pull request #304 from rhatdan/mountproc
/proc and /sys do not support labeling
2015-09-30 11:36:20 -07:00
Michael Crosby 146916ca93 Merge pull request #308 from LK4D4/fix_tlb_tests
Run tests for all HugetlbSizes
2015-09-30 11:26:40 -07:00
Chun Chen 06d91f546f Fix reOpenDevNull
We should open /dev/null with os.O_RDWR, otherwise it won't be
possible to write to it.

Signed-off-by: Chun Chen <ramichen@tencent.com>
2015-09-30 16:05:49 +08:00
Phil Estes 97f5ee4e6a Only remount if requested flags differ from current
Do not remount a bind mount to enable flags unless non-default flags are
provided for the requested mount. This solves a problem with user
namespaces and remount of bind mount permissions.

Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
2015-09-29 23:13:04 -04:00
Alexander Morozov e32b3442ec Run tests for all HugetlbSizes
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-29 17:08:41 -07:00
Qiang Huang 6a5ba1109c Systemd: Join perf_event cgroup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 15:42:29 +08:00
Qiang Huang fb5a56fb97 Add memory reservation support for systemd
Seems it's missed in the first place.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-29 10:02:12 +08:00
Dan Walsh cab342f0de Check for failure on /dev/mqueue and try again without labeling
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Dan Walsh b4dcb75503 /proc and /sys do not support labeling
This is causing docker to crash when --selinux-enforcing mode is set.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2015-09-28 12:31:52 -04:00
Mrunal Patel f7d1401a69 Add validation for sysctl
/proc/sys isn't completely namespaced and only some properties are allowed
per linux namespace.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-25 14:04:18 -04:00
Alexander Morozov 902ccd0f18 Merge pull request #302 from mrunalp/cap_list
Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b4
2015-09-25 08:49:44 -07:00
Mrunal Patel c5d3bda7e1 Merge pull request #292 from keloyang/rpid
no need to use p.cmd.Process.Pid in function, use p.pid() instead.
2015-09-24 15:59:39 -07:00
Mrunal Patel 34d3e2b948 Update github.com/syndtr/gocapability/capability to 2c00daeb6c3b45114c80ac44119e7b8801fdd852
This allows us to use the capability.List() function to construct capability list
dynamically.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-24 18:44:01 -04:00
Alexander Morozov aac9179bba Merge pull request #160 from mrunalp/feature/hooks
Add prestart/poststop hooks to runc
2015-09-24 14:52:30 -07:00
Michael Crosby 203d3e258e Move mount methods out of configs pkg
Do not have methods and actions that require syscalls in the configs
package because it breaks cross compile.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-24 09:43:12 -07:00
Mrunal Patel dcafe48737 Add version to HookState to make it json-compatible with spec State
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-23 17:13:00 -07:00
Alexander Morozov 83b2975c8b Merge pull request #295 from mheon/seccomp_architecture
Libcontainer: Add support for multiple architectures in Seccomp
2015-09-23 11:08:47 -07:00
Matthew Heon 795a6c9702 Libcontainer: Add support for multiple architectures in Seccomp
This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-09-23 13:54:24 -04:00
Michael Crosby 5765dcd086 Merge pull request #296 from crosbymichael/mount-resolv-symlink
Change mount dest after resolving symlinks
2015-09-23 10:21:25 -07:00
Michael Crosby b3bb606513 Change mount dest after resolving symlinks
We need to update the mount's destination after we resolve symlinks so
that it properly creates and mounts the correct location.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-23 10:07:18 -07:00
keloyang 69a5b2df9e no need to use p.cmd.Process.Pid in function, use p.pid() instead.
Signed-off-by: keloyang <yangshukui@huawei.com>
2015-09-23 10:48:36 +08:00
Mrunal Patel d8b7deaf4c Merge pull request #283 from runcom/cleanup-unused-func-args
Cleanup unused func arguments
2015-09-22 16:53:19 -07:00
Mrunal Patel 7570169548 Merge pull request #288 from gitido/fix_userns
Enter existing user namespace if present
2015-09-22 16:27:57 -07:00
Michael Crosby 219b6c99e0 Ignore changing /dev/null permissions if used in STDIO
Whenever /dev/null is used as one of the main process's STDIO streams, do not try
to change the permissions on it via fchown because we should not do it
in the first place and also this will fail if the container is supposed
to be readonly.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-22 15:32:31 -07:00
Ido Yariv 08366a8597 Enter existing user namespace if present
When executing an additional process in a container, all namespaces are
entered but the user namespace. As a result, the process may be
executed as the host's root user. This has both functionality and
security implications.

Fix this by adding the missing user namespace to the array of
namespaces. Since joining a user namespace in which the caller is
already a member yields an error, skip namespaces we're already in.

Last, remove a needless and buggy AT_SYMLINK_NOFOLLOW in the code.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-09-21 21:49:52 -04:00
Antonio Murdaca d6e6462478 Cleanup unused func arguments
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-21 11:50:29 +02:00
Michael Crosby 0dad64f7ad Fix STDIO permissions when container user not root
Fix the permissions of the container's main process's STDIO when the
process is not run as the root user.  This changes the permissions right
before switching to the specified user so that its STDIO matches its
UID and GID.

Add a test for checking that the STDIO of the process is owned by the
specified user.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-18 14:11:29 -07:00
Vivek Goyal d1f4a5b8b5 libcontainer: Allow passing mount propagation flags
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.

{
  "type": "bind",
  "source": "/root/mnt-source",
  "destination": "/root/mnt-dest",
  "options": "rbind,shared"
}

One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.

#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest                      private

The reason is that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to the mount
syscall, and only one propagation flag at a time (per the mount man page).

Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.

Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.

This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2015-09-16 15:53:23 -04:00
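The point made in d1f4a5b8b5 above, that "rbind,private" and "bind,rprivate" become indistinguishable once all flags share one integer, can be illustrated with a short Go sketch. This is not the runc parser: `parseMountOptions` and the two lookup tables are hypothetical, and the constants are local copies of the Linux MS_* values so the example builds on any platform.

```go
package main

import (
	"fmt"
	"strings"
)

// Local copies of the Linux MS_* mount flag values (assumed here so the
// sketch does not depend on the Linux-only syscall constants).
const (
	msBind       = 4096
	msRec        = 16384
	msUnbindable = 1 << 17
	msPrivate    = 1 << 18
	msSlave      = 1 << 19
	msShared     = 1 << 20
)

var mountFlags = map[string]int{
	"bind":  msBind,
	"rbind": msBind | msRec,
}

var propagationFlags = map[string]int{
	"private":     msPrivate,
	"rprivate":    msPrivate | msRec,
	"slave":       msSlave,
	"rslave":      msSlave | msRec,
	"shared":      msShared,
	"rshared":     msShared | msRec,
	"unbindable":  msUnbindable,
	"runbindable": msUnbindable | msRec,
}

// parseMountOptions ORs regular flags into one integer but keeps each
// propagation flag as its own slice element, in the order given, so each
// can later be applied with a separate mount call.
func parseMountOptions(options string) (flags int, propagation []int) {
	for _, o := range strings.Split(options, ",") {
		if f, ok := mountFlags[o]; ok {
			flags |= f
		} else if p, ok := propagationFlags[o]; ok {
			propagation = append(propagation, p)
		}
	}
	return flags, propagation
}

func main() {
	for _, opts := range []string{"rbind,private", "bind,rprivate"} {
		flags, prop := parseMountOptions(opts)
		fmt.Printf("%-14s flags=%#x propagation=%#x\n", opts, flags, prop)
	}
}
```

With the split representation the two option strings stay distinct, whereas OR-ing everything together would yield the same value for both.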
Mrunal Patel ec37110957 Update README for the CAP prefix change
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-15 14:44:12 -04:00
Mrunal Patel 859abee0c8 Add CAP prefix for capabilities
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-15 14:43:03 -04:00
Mrunal Patel 4d8e13fc3e Merge pull request #43 from LK4D4/new_netlink
New netlink library
2015-09-14 14:01:07 -07:00
Mrunal Patel 486ac97618 Merge pull request #236 from hqhq/hq_fix_cgroup_rw
Always remount for bind mount
2015-09-14 12:08:34 -07:00
Rajasekaran 2940f73a14 make localtest failure on removing seccomp flag
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2015-09-12 14:43:55 +05:30
Mrunal Patel ef9471fd5b Merge pull request #253 from avagin/cr-cgroups
c/r: create cgroups to restore a container
2015-09-11 18:03:40 -07:00
Alexander Morozov b0fd9fb75a Merge pull request #220 from crosbymichael/build-tags
Add seccomp build tag
2015-09-11 12:06:27 -07:00
Michael Crosby a8e0185d97 Add seccomp build tag
Add a seccomp build tag and also support in the Makefile to add or
remove build tags.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 12:03:57 -07:00
David Calavera 0f28592b35 Turn hook pointers into values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 11:34:34 -07:00
Michael Crosby dd969cbacd Add test for function based hooks
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-10 18:15:00 -07:00
Mrunal Patel 1dca365393 Add test for prestart hook
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>

Conflicts:
	libcontainer/integration/exec_test.go
2015-09-10 17:59:36 -07:00
Michael Crosby 05567f2c94 Implement hooks in libcontainer
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-10 17:57:31 -07:00
Andrey Vagin df39686c93 c/r: create cgroups to restore a container
Here are two reasons:
* If we use systemd, we need to ask it to create cgroups
* If a container is restored with another ID, we need to
  change paths to cgroups.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 21:00:27 +03:00
Andrey Vagin da2535f2d1 mount: don't read /proc/self/cgroup many times
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 21:00:22 +03:00
Andrey Vagin e49c1dc559 Rework ParseCgroupFile
Currently we parse /proc/self/cgroup for each controller.
It's inefficient.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-09-10 20:59:27 +03:00
Alexander Morozov 24f4d5d1fd Remove old netlink library
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-09 19:38:02 -07:00
Alexander Morozov 916bd6bd68 Use github.com/vishvananda/netlink for networking
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-09-09 19:32:46 -07:00
Qiang Huang b94fe5b7f8 Fix bug in find cgroup mount point dir
Bug was introduced in #250

According to: http://man7.org/linux/man-pages/man5/proc.5.html

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)
...
(7)  optional fields: zero or more fields of the form
       "tag[:value]".
The 7th field is optional. We should skip it when parsing mount info.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-10 08:29:12 +08:00
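One way to handle the optional fields described in b94fe5b7f8 above is to split the mountinfo line at the " - " separator instead of counting fields from the left. The Go sketch below assumes that approach; `mountEntry` and `parseMountinfoLine` are hypothetical names and this is not the code from the commit.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// mountEntry holds the fields of interest from one mountinfo line.
type mountEntry struct {
	Root       string // field (4)
	Mountpoint string // field (5)
	FSType     string // field (9)
	SuperOpts  string // field (11)
}

// parseMountinfoLine splits the line at the " - " separator (field 8), so
// any number of optional fields (field 7) before it is skipped.
func parseMountinfoLine(line string) (mountEntry, error) {
	idx := strings.Index(line, " - ")
	if idx < 0 {
		return mountEntry{}, errors.New("mountinfo: missing \" - \" separator")
	}
	pre := strings.Fields(line[:idx])    // fields (1)..(6) plus optional fields
	post := strings.Fields(line[idx+3:]) // fields (9)..(11)
	if len(pre) < 6 || len(post) < 3 {
		return mountEntry{}, errors.New("mountinfo: too few fields")
	}
	return mountEntry{
		Root:       pre[3],
		Mountpoint: pre[4],
		FSType:     post[0],
		SuperOpts:  post[2],
	}, nil
}

func main() {
	line := "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
	entry, err := parseMountinfoLine(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", entry)
}
```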
Qiang Huang f2ec7eff7e Rename FindCgroupMountpointAndSource
Rename it to FindCgroupMountpointAndRoot.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-09 09:29:11 +08:00
Qiang Huang bc67941c72 Parse directly in FindCgroupMountpointDir
Unify it with FindCgroupMountpoint, and add comments on why
we should do this.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-09-09 09:28:50 +08:00
Alexander Morozov 05b1cda5dd Merge pull request #235 from hqhq/hq_fix_cgroup_test
Fix cgroup mount tests
2015-09-01 14:57:44 -07:00
Vishnu Kannan cc232c4707 Adding oom_score_adj as a container config param.
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2015-08-31 14:02:59 -07:00
Qiang Huang 085f465c00 Fix cgroup mount tests
I got:
```
exec_test.go:823: Mode expected to contain 'ro,nosuid,nodev,noexec': tmpfs on /sys/fs/cgroup type tmpfs (ro,seclabel,nosuid,nodev,noexec,relatime,mode=755
```

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-31 11:23:18 +08:00
Qiang Huang b7385e291c Always remount for bind mount
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-31 11:10:34 +08:00
Michael Crosby b1e7041957 Merge pull request #165 from calavera/context_labels
Make label.Relabel safer.
2015-08-28 14:20:00 -07:00
Matthew Heon 2ee6d1e8b6 Connect Seccomp configuration in Spec to configuration in Libcontainer
Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-25 17:35:06 -04:00
Mrunal Patel 2f4c229a8c Merge pull request #215 from boucher/huikang-patch
Add hooks for passing explicit veth pairs for forwarding to CRIU
2015-08-24 21:23:29 -07:00
Hui Kang 7f23085c82 Add hooks for passing explicit veth pairs for forwarding to CRIU.
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-08-24 09:26:39 -07:00
boucher 8c812d0f50 Add the criu log file path to the failure message.
Signed-off-by: Ross Boucher <rboucher@gmail.com>
2015-08-21 14:20:59 -07:00
Mrunal Patel e7663a673e Merge pull request #70 from mheon/seccomp
Convert Seccomp support to use Libseccomp
2015-08-21 12:25:33 -07:00
Lai Jiangshan e48363d777 simplify a variable declaration
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-20 08:21:44 +08:00
Mrunal Patel ca8831fa75 Merge pull request #183 from rajasec/securityfs
Adding securityfs mount
2015-08-18 14:24:38 -07:00
Mrunal Patel c20bda3f71 Merge pull request #206 from mountkin/ensure-cleanup
Ensure the cleanup jobs in the deferrer are executed on error
2015-08-18 14:16:31 -07:00
Michael Crosby b0ca535f75 Merge pull request #194 from LK4D4/fix_cgroups_again
Fix cgroups again
2015-08-18 13:49:31 -07:00
Michael Crosby c6b6be21c5 Merge pull request #199 from clnperez/ifrdatabyte-sign-pr
Fixing netlink build error on ppc64le with gccgo
2015-08-18 13:48:59 -07:00
rajasec 8cdc409715 Fixing tmpfs
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-08-17 06:22:48 +05:30
Shijiang Wei f0679089b9 Ensure the cleanup jobs in the deferrer are executed on error
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
2015-08-16 12:29:04 +08:00
Michael Chase-Salerno 9bc81d1699 Fixing netlink build error on ppc64le with gccgo
Again. It looks like a build tag was somehow dropped between
the PR here: https://github.com/docker/libcontainer/pull/625
and the move to runc.

Signed-off-by: Christy Perez <clnperez@linux.vnet.ibm.com>
2015-08-13 17:52:47 -05:00
Matthew Heon a6b73dbc73 Remove Seccomp build tag to fix godep
Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 15:23:43 -04:00
Matthew Heon 59264040bd Update tests to not error on library v2.2.0 and lower
As v2.1.0 is no longer required for successful testing, do not build it in the
Dockerfile - instead just use the version Ubuntu ships.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 09:36:21 -04:00
Matthew Heon 2ae581ae62 Convert Seccomp support to use Libseccomp
This removes the existing, native Go seccomp filter generation and replaces it
with Libseccomp. Libseccomp is a C library which provides architecture
independent generation of Seccomp filters for the Linux kernel.

This adds a dependency on v2.2.1 or above of Libseccomp.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2015-08-13 07:56:27 -04:00
Lai Jiangshan e8817e1104 Simplify the return on process wait
Simplify the code introduced by the commit d1f0d5705deb:
    Return actual ProcessState on Wait error

Cc: Alexander Morozov <lk4d4@docker.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-12 22:37:34 +08:00
Alexander Morozov 2b28b3c276 Always use cgroup root of current process
Because for host PID namespace /proc/1/cgroup can point to whole other
world of cgroups.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-11 18:04:59 -07:00
Alexander Morozov 5aa6005498 Revert "Fix cgroup parent searching"
This reverts commit 2f9052ca29.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-11 18:04:55 -07:00
Alexander Morozov 2f9052ca29 Fix cgroup parent searching
I had pretty convenient input data to miss this bug.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-10 14:30:05 -07:00
rajasec 24f7a10a93 Adding securityfs mount
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-08-05 16:50:08 +05:30
Mrunal Patel f3a3025933 Fix minor stylistic issues
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-04 17:44:45 -04:00
Mrunal Patel c9d5850629 Don't make modifications to /dev if there are no devices in the configuration
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-04 16:57:29 -04:00
Michael Crosby a5ef75b681 Add signal API to Container interface
This adds a `Signal()` method to the container interface so that the
initial process can be signaled after a Load or operation.  It also
implements signaling the init process from a nonChildProcess.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-08-03 17:07:29 -07:00
Mrunal Patel ce0a339632 Merge pull request #166 from gitido/fixes
Go1.5 compatibility fix
2015-08-03 13:51:26 -07:00
Michael Crosby 76e706f856 Merge pull request #151 from LK4D4/use_proc_exe
Use /proc/self/exe as default for InitPath
2015-08-03 16:15:33 -04:00
Michael Crosby b1821a4edc Merge pull request #150 from runcom/update-go-systemd-dbus-v3
Update go systemd dbus v3
2015-08-03 16:11:52 -04:00
Ido Yariv 86a85582d2 Don't set /proc/<PID>/setgroups to deny in Go1.5
A boolean field named GidMappingsEnableSetgroups was added to
SysProcAttr in Go1.5. This field determines the value of the process's
setgroups proc entry.

Since the default is to set the entry to 'deny', calling setgroups will
fail on systems running kernels 3.19+.

Set GidMappingsEnableSetgroups to true so setgroups won't be set to
'deny'.

Signed-off-by: Ido Yariv <ido@wizery.com>
2015-08-03 14:59:15 -04:00
Hui Kang 0f66ff921a Add debug message when unable to execute criu
Signed-off-by: Hui Kang <hkang.sunysb@gmail.com>
2015-08-03 17:09:45 +00:00
Antonio Murdaca 9caef6c8c4 Remove reference to nsinit
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-08-02 12:00:39 +02:00
David Calavera 4bd4d462af Make label.Relabel safer.
- Check if Selinux is enabled before relabeling. This is a bug.
- Make exclusion detection constant time. Kinda buggy too, imo.
- Do not depend on a magic string to create a new Selinux context.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-07-31 10:37:32 -07:00
Mrunal Patel 602e8331a0 Merge pull request #164 from LK4D4/remove_dind
Remove dind
2015-07-31 07:53:03 -07:00
Mrunal Patel 19df27d08c Merge pull request #163 from avagin/cr_cgroups
tests: dump/restore a container with cgroups
2015-07-30 13:50:09 -07:00
Alexander Morozov 1735ad788f Replace dind with smaller script
It just mounts /tmp into tmpfs. We need this because criu tests have
problems on overlayfs.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-30 13:23:26 -07:00
Andrey Vagin aa3c2dc621 integration: show criu logs in an error case
Signed-off-by: Andrew Vagin <avagin@openvz.org>
2015-07-30 21:01:09 +03:00
Andrew Vagin e2e6a73b62 tests: dump/restore a container with cgroups
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-07-30 08:39:02 +03:00
Kir Kolyshkin 6f82d4b544 Simplify and fix os.MkdirAll() usage
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.

This means two things:

1. If a directory to be created already exists, no error is
returned.

2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name that MkdirAll needs to use for
a directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

3. In practice, though, the current MkdirAll implementation [1] returns
ENOTDIR in most of the cases described in #2, with the exception of
a race between MkdirAll and someone else creating the
last component of the MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring an IsExist error is just plain wrong,
as the directory we require is not created. It's cleaner to report
the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
2015-07-29 18:03:27 -07:00
Mrunal Patel 0e72bfb815 Fix files not closed in mountinfo parsing function
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-07-27 19:33:39 -04:00
Michael Crosby 4507c068ba Merge pull request #145 from LK4D4/sysfs_ro
Remount /sys/fs/cgroup as RO if MS_RDONLY was passed
2015-07-27 09:12:55 -07:00
Lai Jiangshan f26935eb0c test: propagate the error to the caller
When the copyBusybox() fails, the error message should be
propagated to the caller of newRootfs().

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-07-25 22:25:43 +08:00
Antonio Murdaca 5eab2d59d3 Swap check for systemd booted to use go-systemd method
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-07-25 01:36:14 +02:00
Alexander Morozov d9e513043c Use /proc/self/exe as default for InitPath
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-24 11:45:09 -07:00
Antonio Murdaca 15741a4ab3 Adapt code to go-systemd/dbus v3
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-07-24 15:54:59 +02:00
Mrunal Patel 32aa2756ca Merge pull request #148 from jhjeong-kr/typo
typo: tempory -> temporary
2015-07-23 20:46:12 -07:00
Jin-Hwan Jeong 491cfef259 typo: tempory -> temporary
Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
2015-07-24 11:19:25 +09:00
Alexander Morozov d89964eed3 Remount /sys/fs/cgroup as RO if MS_RDONLY was passed in m.Flags
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-22 11:05:40 -07:00
Alexander Morozov 1ed929f177 Merge pull request #114 from brahmaroutu/gccgo_stacktrace_loops
avoid infinite loop with GCCGO
2015-07-21 11:05:09 -07:00
Alexander Morozov d3217084b5 Create symlinks for merged cgroups
This allows software to be unaware of the existence of merged cgroups.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-20 16:12:28 -07:00
Mrunal Patel 073e76c9fc Merge pull request #142 from avagin/cr
ct: give criu information about cgroup mounts
2015-07-20 15:13:03 -07:00
Andrey Vagin af4a5e708a ct: give criu information about cgroup mounts
Actually cgroup mounts are bind-mounts, so they should be
handled in the same way.

Reported-by: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
2015-07-20 22:56:07 +03:00
Alexander Morozov c0e18b96fb Fix subsystem path with abs parent
Sometimes a subsystem can be mounted at a path like "subsystem1,subsystem2",
so we need to handle this.
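
One way to handle such a path, as a rough sketch only: split the last path element on commas and compare each name. The function `mountHasSubsystem` and the example paths are assumptions made for illustration and are not the code from this commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mountHasSubsystem is a hypothetical helper. It reports whether the final
// element of a cgroup mount point names the subsystem, either alone or as
// one entry of a comma-separated list such as "cpu,cpuacct".
func mountHasSubsystem(mountpoint, subsystem string) bool {
	for _, name := range strings.Split(filepath.Base(mountpoint), ",") {
		if name == subsystem {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(mountHasSubsystem("/sys/fs/cgroup/cpu,cpuacct", "cpuacct"))
	fmt.Println(mountHasSubsystem("/sys/fs/cgroup/memory", "cpu"))
}
```

Comparing whole names rather than using a substring match keeps "cpu" from matching "cpuacct" or "cpuset".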

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-07-20 11:48:58 -07:00