Commit Graph

1045 Commits

Author SHA1 Message Date
Harshal Patil 22953c122f Remove redundant declaraion of namespace slice
Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>
2017-05-02 10:04:57 +05:30
Andrei Vagin 73258813d3 cr: set a freezer cgroup for criu
A freezer cgroup allows to dump processes faster.

If a user wants to checkpoint a container and its storage,
he has to pause a container, but in this case we need to pass
a path to its freezer cgroup to "criu dump".

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-02 04:48:47 +03:00
Andrei Vagin 1c43d091a1 checkpoint: add support for containers with terminals
CRIU was extended to report about orphaned master pty-s via RPC.

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-02 04:48:47 +03:00
Andrei Vagin 1a8b0aced5 Update criurpc
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-01 21:55:57 +03:00
Andrei Vagin f8ca1926c4 libcontainer: check cpt/rst for containers with userns
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-01 21:45:23 +03:00
Andrei Vagin d307e85dbb Print a criu version in a error message
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-05-01 21:45:23 +03:00
Harshal Patil c44d4fa6ed Optimizing looping over namespaces
Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>
2017-04-26 11:54:43 +05:30
Qiang Huang 94cfb7955b Merge pull request #1387 from avagin/freezer
Don't try to read freezer.state from the current directory
2017-04-24 20:02:45 -05:00
chchliang 4f0e6c4ef0 add createdState and runningState status testcase
Signed-off-by: chchliang <chen.chuanliang@zte.com.cn>
2017-04-19 16:28:03 +08:00
Daniel, Dao Quang Minh 9f1ef73ef9 Merge pull request #1402 from chchliang/generictest
add testcase in generic_error_test.go
2017-04-18 11:42:24 +01:00
chchliang a23d7c2eab add testcase in generic_error_test.go
Signed-off-by: chchliang <chen.chuanliang@zte.com.cn>
2017-04-18 08:56:02 +08:00
Mrunal Patel 97db1eaad9 Merge pull request #1396 from harche/cstate
Set container state only once during start
2017-04-17 11:32:42 -07:00
Daniel, Dao Quang Minh 13a8c5d140 Merge pull request #1365 from hqhq/use_go_selinux
Use opencontainers/selinux package
2017-04-15 14:22:32 +01:00
Mrunal Patel 7814a0d14b Merge pull request #1399 from avagin/cr-cgroup
restore: apply resource limits
2017-04-13 11:28:28 -07:00
Michael Crosby f8ce01dbdc Merge pull request #1371 from adrianreber/master
checkpoint: check if system supports pre-dumping
2017-04-12 10:08:02 -07:00
CuiHaozhi 248c586500 could load a stopped container.
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-04-07 07:39:41 -04:00
Andrei Vagin 57ef30a2ae restore: apply resource limits
When C/R was implemented, it was enough to call manager.Set to apply
limits and to move a task. Now .Set() and .Apply() have to be called
separately.

Fixes: 8a740d5391 ("libcontainer: cgroups: don't Set in Apply")
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-04-07 02:47:43 +03:00
Christy Perez fca53109c1 Fix console syscalls
Fixes opencontainers/runc/issues/1364

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-04-06 16:51:54 -05:00
Adrian Reber 273b7853c8 checkpoint: check if system supports pre-dumping
Instead of relying on version numbers it is possible to check if CRIU
actually supports certain features. This introduces an initial
implementation to check if CRIU and the underlying kernel actually
support dirty memory tracking for memory pre-dumping.

Upstream CRIU also supports the lazy-page migration feature check and
additional feature checks can be included in CRIU to reduce the version
number parsing. There are also certain CRIU features which depend on one
side on the CRIU version but also require certain kernel versions to
actually work. CRIU knows if it can do certain things on the kernel it
is running on and using the feature check RPC interface makes it easier
for runc to decide if the criu+kernel combination will support that
feature.

Feature checking was introduced with CRIU 1.8. Running with older CRIU
versions will ignore the feature check functionality and behave just
like it used to.

v2:
 - Do not use reflection to compare requested and responded
   features. Checking which feature is available is now hardcoded
   and needs to be adapted for every new feature check. The code
   is now much more readable and simpler.

v3:
 - Move the variable criuFeat out of the linuxContainer struct,
   as it is not container specific. Now it is a global variable.

Signed-off-by: Adrian Reber <areber@redhat.com>
2017-04-06 11:17:52 +00:00
Harshal Patil 1be5d31da2 Set container state only once during start
Signed-off-by: Harshal Patil <harshal.patil@in.ibm.com>
2017-04-04 15:08:04 +05:30
Derek Carr 4d6225aec2 Expose memory.use_hierarchy in MemoryStats
Signed-off-by: Derek Carr <decarr@redhat.com>
2017-03-31 13:40:34 -04:00
Aleksa Sarai cbc4f9865a
libcontainer: rewrite cmsg to use sys/unix
The original implementation is in C, which increases cognitive load and
possibly might cause us problems in the future. Since sys/unix is better
maintained than the syscall standard library switching makes more sense.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-30 16:03:21 +11:00
Aleksa Sarai d04cbc49d2
rootless: add autogenerated rootless config from `runc spec`
Since this is a runC-specific feature, this belongs here over in
opencontainers/ocitools (which is for generic OCI runtimes).

In addition, we don't create a new network namespace. This is because
currently if you want to set up a veth bridge you need CAP_NET_ADMIN in
both network namespaces' pinned user namespace to create the necessary
interfaces in each network namespace.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:21 +11:00
Aleksa Sarai 76aeaf8181
libcontainer: init: fix unmapped console fchown
If the stdio of the container is owned by a group which is not mapped in
the user namespace, attempting to fchown the file descriptor will result
in EINVAL. Counteract this by simply not doing an fchown if the group
owner of the file descriptor has no host mapping according to the
configured GIDMappings.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:21 +11:00
Aleksa Sarai f0876b0427
libcontainer: configs: add proper HostUID and HostGID
Previously Host{U,G}ID only gave you the root mapping, which isn't very
useful if you are trying to do other things with the IDMaps.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Aleksa Sarai baeef29858
rootless: add rootless cgroup manager
The rootless cgroup manager acts as a noop for all set and apply
operations. It is just used for rootless setups. Currently this is far
too simple (we need to add opportunistic cgroup management), but is good
enough as a first-pass at a noop cgroup manager.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Aleksa Sarai d2f49696b0
runc: add support for rootless containers
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:

* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update

The following commands work:

* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state

In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:45:24 +11:00
Aleksa Sarai 6bd4bd9030
*: handle unprivileged operations and !dumpable
Effectively, !dumpable makes implementing rootless containers quite
hard, due to a bunch of different operations on /proc/self no longer
being possible without reordering everything.

!dumpable only really makes sense when you are switching between
different security contexts, which is only the case when we are joining
namespaces. Unfortunately this means that !dumpable will still have
issues in this instance, and it should only be necessary to set
!dumpable if we are not joining USER namespaces (new kernels have
protections that make !dumpable no longer necessary). But that's a topic
for another time.

This also includes code to unset and then re-set dumpable when doing the
USER namespace mappings. This should also be safe because in principle
processes in a container can't see us until after we fork into the PID
namespace (which happens after the user mapping).

In rootless containers, it is not possible to set a non-dumpable
process's /proc/self/oom_score_adj (it's owned by root and thus not
writeable). Thus, it needs to be set inside nsexec before we set
ourselves as non-dumpable.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:45:19 +11:00
Qiang Huang 5e7b48f7c0 Use opencontainers/selinux package
It's splitted as a separate project.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-23 08:21:19 +08:00
Andrei Vagin 88256d646d Don't try to read freezer.state from the current directory
If we try to pause a container on the system without freezer cgroups,
we can found that runc tries to open ./freezer.state. It is obviously wrong.

$ ./runc pause test
no such directory for freezer.state

$ echo FROZEN > freezer.state
$ ./runc pause test
container not running or created: paused

Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2017-03-23 01:58:45 +03:00
Daniel Dao 09c72cea69
fix panic regression when config doesnt have caps
When process config doesnt specify capabilities anywhere, we should not panic
because setting capabilities are optional.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2017-03-21 00:45:26 +00:00
Michael Crosby 767783a631 Merge pull request #1375 from hqhq/use_uint64_for_resources
Use uint64 for resources to keep consistency with runtime-spec
2017-03-20 12:47:21 -07:00
Qiang Huang 8430cc4f48 Use uint64 for resources to keep consistency with runtime-spec
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-20 18:51:39 +08:00
Aleksa Sarai c651512ad8
Revert "fix minor issue"
This reverts commit d4091ef151.

d4091ef151 ("fix minor issue") doesn't actually make any sense, and
actually makes the code more confusing.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-20 12:28:43 +11:00
Qiang Huang d270940363 Merge pull request #1356 from crosbymichael/console-socket
Add separate console socket
2017-03-18 04:03:03 -05:00
Mrunal Patel c266f1470c Merge pull request #1373 from moypray/minor
fix minor issue
2017-03-16 12:15:46 -07:00
Wentao Zhang d4091ef151 fix minor issue
When failed to attach veth pair, should remove the veth device

Signed-off-by: Wentao Zhang <zhangwentao234@huawei.com>
2017-03-17 03:18:44 +08:00
Michael Crosby 957ef9cc73 Remove terminal info
This maybe a nice extra but it adds complication to the usecase.  The
contract is listen on the socket and you get an fd to the pty master and
that is that.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-03-16 10:23:59 -07:00
Michael Crosby 00a0ecf554 Add separate console socket
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-03-16 10:23:59 -07:00
Mrunal Patel 4f903a21c4 Remove ambient build tag
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-15 11:38:43 -07:00
Mrunal Patel 4f9cb13b64 Update runtime spec to 1.0.0.rc5
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-15 11:38:37 -07:00
Craig Furman f5c5aac958 Create containers when cgroups already mounted
Runc needs to copy certain files from the top of the cgroup cpuset hierarchy
into the container's cpuset cgroup directory. Currently, runc determines
which directory is the top of the hierarchy by using the parent dir of
the first entry in /proc/self/mountinfo of type cgroup.

This creates problems when cgroup subsystems are mounted arbitrarily in
different dirs on the host.

Now, we use the most deeply nested mountpoint that contains the
container's cpuset cgroup directory.

Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
Signed-off-by: Will Martin <wmartin@pivotal.io>
2017-03-15 10:10:30 +00:00
Qiang Huang b7932a2e07 Remove unused ExecFifoPath
In container process's Init function, we use
fd + execFifoFilename to open exec fifo, so this
field in init config is never used.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-09 10:58:16 +08:00
Qiang Huang df4d872dd9 Merge pull request #1327 from CarltonSemple/lxd-fix
Update devices_unix.go for LXD
2017-03-08 19:34:31 -06:00
Carlton-Semple 0590736890 Added comment linking to LXD issue 2825
Signed-off-by: Carlton-Semple <carlton.semple@ibm.com>
2017-03-08 10:25:37 -05:00
Qiang Huang 8773c5f9a6 Remove unused function in systemd cgroup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-07 15:11:37 +08:00
Michael Crosby 49a33c41f8 Merge pull request #1344 from xuxinkun/fixCPUQuota20170224
fix cpu.cfs_quota_us changed when systemd daemon-reload using systemd.
2017-03-06 10:02:28 -08:00
xuxinkun c44aec9b23 fix cpu.cfs_quota_us changed when systemd daemon-reload using systemd.
Signed-off-by: xuxinkun <xuxinkun@gmail.com>
2017-03-06 20:08:30 +11:00
Michael Crosby c50d024500 Merge pull request #1280 from datawolf/user
user: fix the parameter error
2017-02-27 11:22:58 -08:00
Qiang Huang fe898e7862 Fix kmem accouting when use with cgroupsPath
Fixes: #1347
Fixes: #1083

The root cause of #1083 is because we're joining an
existed cgroup whose kmem accouting is not initialized,
and it has child cgroup or tasks in it.

Fix it by checking if the cgroup is first time created,
and we should enable kmem accouting if the cgroup is
craeted by libcontainer with or without kmem limit
configed. Otherwise we'll get issue like #1347

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-25 10:58:18 -08:00
Qiang Huang 707dd48b2f Merge pull request #1001 from x1022as/predump
add pre-dump and parent-path to checkpoint
2017-02-24 10:55:06 -08:00
Aleksa Sarai 02141ce862
merge branch 'pr-1317'
Closes #1317
LGTMs: @cyphar @crosbymichael
2017-02-24 08:21:58 +11:00
Qiang Huang 733563552e Fix state when _LIBCONTAINER in environment
Fixes: #1311

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:35:14 -08:00
Qiang Huang 805b8c73d3 Do not create exec fifo in factory.Create
It should not be binded to container creation, for
example, runc restore needs to create a
libcontainer.Container, but it won't need exec fifo.

So create exec fifo when container is started or run,
where we really need it.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 10:34:48 -08:00
Brian Goff d193f95d07 Don't override system error
The error message added here provides no value as the caller already
knows all the added details. However it is covering up the underyling
system error (typically `ENOTSUP`). There is no way to handle this error before
this change.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2017-02-22 09:29:38 -05:00
Michael Crosby 8438b26e9f Merge pull request #1237 from hqhq/fix_sync_race
Fix race condition when sync with child and grandchild
2017-02-20 17:16:43 -08:00
Michael Crosby 4a164a826c Use %zu for printing of size_t values
This helps fix compile warnings on some arm systems.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-02-20 16:57:27 -08:00
Qiang Huang a54316bae1 Fix race condition when sync with child and grandchild
Fixes: #1236
Fixes: #1281

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-18 20:42:08 +08:00
Qiang Huang 6b1d0e76f2 Merge pull request #1127 from boynux/fix-set-mem-to-unlimited
Fixes set memory to unlimited
2017-02-16 09:51:23 +08:00
Mohammad Arab 18ebc51b3c Reset Swap when memory is set to unlimited (-1)
Kernel validation fails if memory set to -1 which is unlimited but
swap is not set so.

Signed-off-by: Mohammad Arab <boynux@gmail.com>
2017-02-15 08:11:57 +01:00
Carlton Semple 9a7e5a9434 Update devices_unix.go for LXD
getDevices() has been updated to skip `/dev/.lxc` and `/dev/.lxd-mounts`, which was breaking privileged Docker containers running on runC, inside of LXD managed Linux Containers

Signed-off-by: Carlton-Semple <carlton.semple@ibm.com>
2017-02-14 16:12:03 -05:00
Deng Guangxing 98f004182b add pre-dump and parent-path to checkpoint
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.

This patch add pre-dump and parent-path arguments to runc checkpoint

Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
2017-02-14 19:45:07 +08:00
Ma Shimiao 06e27471bb support create device with type p and u
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-02-10 14:45:15 +08:00
Qiang Huang 45a8341811 Small cleanup
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-08 15:09:06 +08:00
Qiang Huang a8d7eb7076 Merge pull request #1314 from runcom/overlay-mounts
libcontainer: rootfs_linux: support overlayfs
2017-02-08 16:17:01 +08:00
Antonio Murdaca ca14e7b463
libcontainer: rootfs_linux: support overlayfs
As the runtime-spec allows it, we want to be able to specify overlayfs
mounts with:

    {
        "destination": "/etc/pki",
        "type": "overlay",
        "source": "overlay",
        "options": [
            "lowerdir=/etc/pki:/home/amurdaca/go/src/github.com/opencontainers/runc/rootfs_fedora/etc/pki"
        ]
    },

This patch takes care of allowing overlayfs mounts. Both RO and RW
should be supported.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 19:43:24 +01:00
Antonio Murdaca 75acc7c7c3
libcontainer: selinux: fix DupSecOpt and DisableSecOpt
`label.InitLabels` takes options as a string slice in the form of:

    user:system_u
    role:system_r
    type:container_t
    level:s0:c4,c5

However, `DupSecOpt` and `DisableSecOpt` were still adding a docker
specifc `label=` in front of every option. That leads to `InitLabels`
not being able to correctly init selinux labels in this scenario for
instance:

    label.InitLabels(DupSecOpt([%OPTIONS%]))

if `%OPTIONS` has options prefixed with `label=`, that's going to fail.
Fix this by removing that docker specific `label=` prefix.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-06 17:29:42 +01:00
Qiang Huang 7350cd8640 Merge pull request #1285 from stevenh/signal-wait
Only wait for processes after delivering SIGKILL in signalAllProcesses
2017-02-06 16:41:24 +08:00
Qiang Huang 0c21b089e6 Merge pull request #1309 from stevenh/recorded-state-typo
Correct docs typo for restoredState.
2017-02-04 11:51:25 +08:00
Steven Hartland 54862146c7 Correct docs typo for restoredState.
Correct typo in docs for restoredState.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:19:01 +00:00
Steven Hartland 3f431f497e Correct container.Destroy() docs
Correct container.Destroy() docs to clarify that destroy can only operate on containers in specific states.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-03 16:18:29 +00:00
Qiang Huang be33383e60 Merge pull request #1293 from stevenh/resolve-initarg
Resolve InitArgs to ensure init works
2017-02-03 19:25:52 +08:00
Michael Crosby 9073486547 Merge pull request #1274 from cyphar/further-CVE-2016-9962-cleanup
libcontainer: init: only pass stateDirFd when creating a container
2017-02-02 11:11:42 -08:00
Mrunal Patel 1c9c074d79 Merge pull request #1303 from runcom/revert-initlabels
Revert "DupSecOpt needs to match InitLabels"
2017-02-01 10:37:16 -08:00
Steven Hartland b9dfa444c4 Resolve InitArgs to ensure init works
If a relative pathed exe is used for InitArgs init will fail to run if Cwd is not set the original path.

Prevent failure of init to run by ensuring that exe in InitArgs is an absolute path.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:42:09 +00:00
Aleksa Sarai e034cedce7
libcontainer: init: only pass stateDirFd when creating a container
If we pass a file descriptor to the host filesystem while joining a
container, there is a race condition where a process inside the
container can ptrace(2) the joining process and stop it from closing its
file descriptor to the stateDirFd. Then the process can access the
*host* filesystem from that file descriptor. This was fixed in part by
5d93fed3d2 ("Set init processes as non-dumpable"), but that fix is
more of a hail-mary than an actual fix for the underlying issue.

To fix this, don't open or pass the stateDirFd to the init process
unless we're creating a new container. A proper fix for this would be to
remove the need for even passing around directory file descriptors
(which are quite dangerous in the context of mount namespaces).

There is still an issue with containers that have CAP_SYS_PTRACE and are
using the setns(2)-style of joining a container namespace. Currently I'm
not really sure how to fix it without rampant layer violation.

Fixes: CVE-2016-9962
Fixes: 5d93fed3d2 ("Set init processes as non-dumpable")
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-02-02 00:41:11 +11:00
Steven Hartland 82d895fbb9 Conditionally wait for children after delivering signal
When signaling children and the signal is SIGKILL wait for children
otherwise conditionally wait for children which are ready to report.

This reaps all children which exited due to the signal sent without
blocking indefinitely.

Also:
* Ignore ignore ECHILD, which means the child has already gone.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-02-01 13:22:37 +00:00
Antonio Murdaca 384c1e595c
Revert "DupSecOpt needs to match InitLabels"
This reverts commit 491cadac92.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-02-01 09:14:20 +01:00
Mrunal Patel 510879e31f Merge pull request #1284 from stevenh/godoc
Add godoc links to README.md files
2017-01-30 10:56:58 -08:00
Daniel, Dao Quang Minh 6c22e77604 Merge pull request #1294 from stevenh/start-init-fixes
Ensure pipe is always closed on error in StartInitialization
2017-01-27 16:25:44 +00:00
Qiang Huang ed2df2906b Merge pull request #1205 from YuPengZTE/devError
fix typos by the result of golint checking
2017-01-27 21:42:18 +08:00
Mrunal Patel c139a7c761 Merge pull request #1298 from stevenh/mention-nsenter
Add nsenter details to libcontainer README.md
2017-01-25 16:25:02 -08:00
Steven Hartland 64aa78b762 Ensure pipe is always closed on error in StartInitialization
Ensure that the pipe is always closed during the error processing of  StartInitialization.

Also:
* Fix a comment typo.
* Use newContainerInit directly as there's no need for i to be an initer.
* Move the comment about the behaviour of Init() directly above it, clarifying what happens for all defers.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-25 12:36:40 +00:00
Steven Hartland 89fb8b1609 Add nsenter details to libcontainer README.md
Add the import of nsenter to the example in libcontainer's README.md, as without it none of the example code works.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-25 01:05:36 +00:00
Justin Cormack 6ba5f5f9b8 Remove a compiler warning in some environments
POSIX mandates that `cmsg_len` in `struct cmsghdr` is a `socklen_t`,
which is an `unsigned int`. Musl libc as used in Alpine implements
this; Glibc ignores the spec and makes it a `size_t` ie `unsigned long`.
To avoid the `-Wformat=` warning from the `%lu` on Alpine, cast this
to an `unsigned long` always.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2017-01-24 14:06:15 +00:00
rainrambler 4449acd306 using golang-style assignment
using golang-style assignment, not the c-style

Signed-off-by: Wang Anyu <wanganyu@outlook.com>
2017-01-23 14:37:16 +08:00
Steven Hartland a887fc3f2d Add godoc links to README.md files
Add godoc links to README.md files for runc and libcontainer so its easy to access the golang documentation.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-21 18:21:03 +00:00
Steven Hartland 27a5447ea4 Only wait for processes after delivering SIGKILL in signalAllProcesses
signalAllProcesses was making the assumption that the requested signal was SIGKILL, possibly due to the signal parameter being added at a later date, and hence it was safe to wait for all processes which is not the case.

BaseContainer.Signal(s os.Signal, all bool) exposes this functionality to consumers, so an arbitrary signal could be used which is not guaranteed to make the processes exit.

Correct the documentation for signalAllProcesses around the signal delivered and update it so that the wait is only performed on SIGKILL hence making it safe to process other signals without risk of blocking forever, while still maintaining compatibility to SIGKILL callers.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-01-21 18:20:23 +00:00
Daniel, Dao Quang Minh 0fefa36f3a Merge pull request #1278 from datawolf/scanner
move error check out of the for loop
2017-01-20 17:49:44 +00:00
Daniel, Dao Quang Minh b8cefd7d8f Merge pull request #1266 from mrunalp/ignore_cgroup_v2
Ignore cgroup2 mountpoints
2017-01-20 17:26:46 +00:00
Wang Long dde4b1a885 user: fix the parameter error
The parameters passed to `GetExecUser` is not correct.
Consider the following code:

```
package main

import (
	"fmt"
	"io"
	"os"
)

func main() {
	passwd, err := os.Open("/etc/passwd1")
	if err != nil {
		passwd = nil
	} else {
		defer passwd.Close()
	}

	err = GetUserPasswd(passwd)
	if err != nil {
		fmt.Printf("%#v\n", err)
	}
}

func GetUserPasswd(r io.Reader) error {
	if r == nil {
		return fmt.Errorf("nil source for passwd-formatted
data")
	} else {
		fmt.Printf("r = %#v\n", r)
	}
	return nil
}
```

If the file `/etc/passwd1` is not exist, we expect to return
`nil source for passwd-formatted data` error, and in fact, the func
`GetUserPasswd` return nil.

The same logic exists in runc code. this patch fix it.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-19 10:02:47 +08:00
Wang Long 3a71eb0256 move error check out of the for loop
The `bufio.Scanner.Scan` method returns false either by reaching the
end of the input or an error. After Scan returns false, the Err method
will return any error that occurred during scanning, except that if it
was io.EOF, Err will return nil.

We should check the error when Scan return false(out of the for loop).

Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-18 05:02:39 +00:00
Qiang Huang a9610f2c02 Merge pull request #1249 from datawolf/small-refactor
small refactor
2017-01-13 02:04:59 -06:00
Mrunal Patel c7ebda72ac Add a test for testing that we ignore cgroup2 mounts
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 16:49:53 -08:00
Mrunal Patel e7b57cb042 Ignore cgroup2 mountpoints
Our current cgroup parsing logic assumes cgroup v1 mounts
so we should ignore cgroup2 mounts for now

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-01-11 12:34:50 -08:00
Mrunal Patel 361bb0001a Merge pull request #1268 from hqhq/use_source_mp
Do not create cgroup dir name from combining subsystems
2017-01-11 11:34:34 -08:00
Michael Crosby 5d93fed3d2 Set init processes as non-dumpable
This sets the init processes that join and setup the container's
namespaces as non-dumpable before they setns to the container's pid (or
any other ) namespace.

This settings is automatically reset to the default after the Exec in
the container so that it does not change functionality for the
applications that are running inside, just our init processes.

This prevents parent processes, the pid 1 of the container, to ptrace
the init process before it drops caps and other sets LSMs.

This patch also ensures that the stateDirFD being used is still closed
prior to exec, even though it is set as O_CLOEXEC, because of the order
in the kernel.

https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318

The order during the exec syscall is that the process is set back to
dumpable before O_CLOEXEC are processed.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-01-11 09:56:56 -08:00
Daniel, Dao Quang Minh 2cc5a91249 Merge pull request #1260 from coolljt0725/remove_redundant
Cleanup: remove redundant code
2017-01-11 17:18:15 +00:00
Qiang Huang 0599ac7d93 Do not create cgroup dir name from combining subsystems
On some systems, when we mount some cgroup subsystems into
a same mountpoint, the name sequence of mount options and
cgroup directory name can not be the same.

For example, the mount option is cpuacct,cpu, but
mountpoint name is /sys/fs/cgroup/cpu,cpuacct. In current
runc, we set mount destination name from combining
subsystems, which comes from mount option from
/proc/self/mountinfo, so in my case the name would be
/sys/fs/cgroup/cpuacct,cpu, which is differernt from
host, and will break some applications.

Fix it by using directory name from host mountpoint.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-11 15:27:58 +08:00
Qiang Huang db99936a0e Merge pull request #1110 from avagin/cpt-in-userns
checkpoint: handle config.Devices and config.MaskPaths
2017-01-10 00:34:40 -06:00
Mrunal Patel 11f6c37e75 Merge pull request #1248 from datawolf/fix-the-outdated-comment
Fix the outdated comment for Error interface
2017-01-09 11:14:07 -08:00
Mrunal Patel 7ae521cef0 Merge pull request #1251 from datawolf/update-cgroup-comment
cgroups: update the comments
2017-01-09 11:13:39 -08:00
Michael Crosby 9100e5f1f9 Merge pull request #1254 from hqhq/fix_go_vet
Fix go_vet errors
2017-01-09 10:49:45 -08:00
Michael Crosby 9adbb6cbf0 Merge pull request #1255 from hqhq/fix_typo
Fix typos
2017-01-09 10:49:16 -08:00
Michael Crosby 44e60af49d Merge pull request #1196 from hqhq/fix_cgroup_leftover
Fix leftover cgroup directory issue
2017-01-09 10:31:04 -08:00
Lei Jitang 689a116d18 Cleanup: remove redundant code
Signed-off-by: Lei Jitang <leijitang@huawei.com>
2017-01-09 01:56:14 -05:00
Qiang Huang 20f0ca7306 Fix typos
Found by:
https://goreportcard.com/report/github.com/opencontainers/runc#misspell

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-06 10:54:33 +08:00
Qiang Huang f3c16acd47 Fix go_vet errors
runc/libcontainer/configs/namespaces_syscall_unsupported.go
Line 7: error: unreachable code (vet)
Line 14: error: unreachable code (vet)

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-06 10:20:27 +08:00
Wang Long 4732f46fd9 small refactor
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-04 11:39:44 +08:00
Aleksa Sarai 816efe0abd
*: fix go-vet failures
Previously, we would get failures with go-vet with test files.

% go vet ./...
libcontainer/integration/exec_test.go:42: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:43: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:184: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:185: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1568: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1569: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1600: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/exec_test.go:1601: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:92: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:93: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:506: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields
libcontainer/integration/execin_test.go:507: github.com/opencontainers/runc/libcontainer/configs.IDMap composite literal uses unkeyed fields

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-01-04 09:48:32 +11:00
Wang Long 4dfd350a38 cgroups: update the comments
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-03 22:40:12 +08:00
Wang Long 61640b099a Fix the outdated comment for Error interface
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2017-01-03 15:06:47 +08:00
Qiang Huang f376b8033d Merge pull request #1222 from justincormack/remount-fixes
Split the code for remounting mount points and mounting paths.
2016-12-27 15:24:56 +08:00
Aleksa Sarai cae7979d1f
merge branch 'pr-1217'
Closes #1217
LGTMs: @cyphar @hqhq
2016-12-24 09:31:38 +11:00
Zhang Wei a344b2d6a8 sync up `HookState` with OCI spec `State`
`HookState` struct should follow definition of `State` in runtime-spec:

* modify json name of `version` to `ociVersion`.
* Remove redundant `Rootfs` field as rootfs can be retrived from
`bundlePath/config.json`.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-20 00:00:43 +08:00
Zhang Wei 8eea644ccc Bump runtime-spec to v1.0.0-rc3
* Bump underlying runtime-spec to version 1.0.0-rc3
* Fix related changed struct names in config.go

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-17 14:02:35 +08:00
Justin Cormack 50acb55233 Split the code for remounting mount points and mounting paths.
A remount of a mount point must include all the current flags or
these will be cleared:

```
The mountflags and data arguments should match the values used in the
original mount() call, except for those parameters that are being
deliberately changed.
```

The current code does not do this; the bug manifests in the specified
flags for `/dev` being lost on remount read only at present. As we
need to specify flags, split the code path for this from remounting
paths which are not mount points, as these can only inherit the
existing flags of the path, and these cannot be changed.

In the bind case, remove extra flags from the bind remount. A bind
mount can only be remounted read only, no other flags can be set,
all other flags are inherited from the parent. From the man page:

```
Since Linux 2.6.26, this flag can also be used to make an existing
bind mount read-only by specifying mountflags as:

MS_REMOUNT | MS_BIND | MS_RDONLY

Note that only the MS_RDONLY setting of the bind mount can be changed
in this manner.
```

MS_REC can only be set on the original bind, so move this. See note
in man page on bind mounts:

```
The remaining bits in the mountflags argument are also ignored, with
the exception of MS_REC.
```

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2016-12-16 14:01:17 -08:00
Samuel Ortiz f19aa2d04d
validate: Check that the given namespace path is a symlink
When checking if the provided networking namespace is the host
one or not, we should first check if it's a symbolic link or not
as in some cases we can use persistent networking namespace under
e.g. /var/run/netns/.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-12-10 11:14:49 +01:00
Mrunal Patel 34f23cb99c Merge pull request #1018 from cyphar/console-rewrite
Consoles, consoles, consoles.
2016-12-07 14:37:19 -08:00
Mrunal Patel 8f55948aa5 Don't add device to list if it doesn't exist anymore
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-12-07 11:08:00 -08:00
Eric Paris a4f3123c35 Fix thread safety of SelinuxEnabled and getSelinuxMountPoint
Both suffered from different race conditions.

SelinuxEnabled assigned selinuxEnabledChecked before selinuxEnabled.
Thus racing callers could see the wrong selinuxEnabled.

getSelinuxMountPoint assigned selinuxfs to "" before it know the right
value. Thus racing could see "" improperly.

The gate selinuxfs, enabled, and mclist all on the same lock
2016-12-06 13:50:03 -05:00
yupeng 602c85fdc6 trailing punctuation in header
Signed-off-by: yupeng <yu.peng36@zte.com.cn>
2016-12-02 15:42:17 +08:00
Mrunal Patel 4271a8b5ae Merge pull request #1211 from YummyPeng/fix_typo
Fix typo.
2016-12-01 11:14:42 -08:00
Mrunal Patel 5d842907c6 Merge pull request #1210 from xianlubird/fix-typo
Fix typo
2016-12-01 11:14:19 -08:00
Mrunal Patel 8002a8c894 Merge pull request #1208 from datawolf/tiny-refactor
tiny refactor
2016-12-01 11:13:33 -08:00
Yuanhong Peng 30e2d4b9da Fix typo.
Signed-off-by: Yuanhong Peng <pengyuanhong@huawei.com>
2016-12-01 16:48:09 +08:00
Xianlu Bird e2e6f58e4e Fix typo
Fix typo
2016-12-01 15:23:58 +08:00
Aleksa Sarai 972c176ae4
tests: fix all the things
This fixes all of the tests that were broken as part of the console
rewrite. This includes fixing the integration tests that used TTY
handling inside libcontainer, as well as the bats integration tests that
needed to be rewritten to use recvtty (as they rely on detached
containers that are running).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:37 +11:00
Aleksa Sarai bda3055055
*: update busybox test rootfs
Switch to the actual source of the official Docker library of images, so
that we have a proper source for the test filesystem. In addition,
update to the latest released version (1.25.0 [2016-06-23]) so that we
can use more up-to-date applets in our tests (such as stat(3)).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 7df64f8886
runc: implement --console-socket
This allows for higher-level orchestrators to be able to have access to
the master pty file descriptor without keeping the runC process running.
This is key to having (detach && createTTY) with a _real_ pty created
inside the container, which is then sent to a higher level orchestrator
over an AF_UNIX socket.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Mrunal Patel f1324a9fc1
Don't label the console as it already has the right label
[@cyphar: removed mountLabel argument from .mount().]

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai c0c8edb9e8
console: don't chown(2) the slave PTY
Since the gid=X and mode=Y flags can be set inside config.json as mount
options, don't override them with our own defaults. This avoids
/dev/pts/* not being owned by tty in a regular container, as well as all
of the issues with us implementing grantpt(3) manually. This is the
least opinionated approach to take.

This patch is part of the console rewrite patchset.

Reported-by: Mrunal Patel <mrunalp@gmail.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 244c9fc426
*: console rewrite
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.

In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.

We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:49:36 +11:00
Aleksa Sarai 4776b4326a
libcontainer: refactor syncT handling
To make the code cleaner, and more clear, refactor the syncT handling
used when creating the `runc init` process. In addition, document the
state changes so that people actually understand what is going on.

Rather than only using syncT for the standard initProcess, use it for
both initProcess and setnsProcess. This removes some special cases, as
well as allowing for the use of syncT with setnsProcess.

Also remove a bunch of the boilerplate around syncT handling.

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
Aleksa Sarai 2055115566
cmsg: add cmsg {send,recv}fd wrappers
This adds C wrappers for sendmsg and recvmsg, specifically used for
passing around file descriptors in Go. The wrappers (sendfd, recvfd)
expect to be called in a context where it makes sense (where the other
side is carrying out the corresponding action).

This patch is part of the console rewrite patchset.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-12-01 15:46:04 +11:00
yupeng 145d23e084 error strings should not be capitalized or end with punctuation
Signed-off-by: yupeng <yu.peng36@zte.com.cn>
2016-12-01 11:57:16 +08:00
Wang Long 1b401664d1 tiny refactor
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-11-30 20:53:37 +08:00
allencloud f596858395 fix typos
Signed-off-by: allencloud <allen.sun@daocloud.io>
2016-11-30 13:31:36 +08:00
Mrunal Patel 4c013a1524 Merge pull request #1194 from hqhq/fix_cpu_exclusive
Fix cpuset issue with cpuset.cpu_exclusive
2016-11-29 09:49:34 -08:00
Daniel, Dao Quang Minh f156f73c2a Merge pull request #1154 from hqhq/sync_child
Sync with grandchild
2016-11-23 09:10:00 -08:00
Qiang Huang 14d58e1e48 Fix leftover cgroup directory issue
In the cases that we got failure on a subsystem's Apply,
we'll get some subsystems' cgroup directories leftover.

On Docker's point of view, start a container failed, use
`docker rm` to remove the container, but some cgroup files
are leftover.

Sometimes we don't want to clean everyting up when something
went wrong, because we need these inter situation
information to debug what's going on, but cgroup directories
are not useful information we want to keep.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-22 08:02:43 +08:00
Qiang Huang aee46862ec Fix cpuset issue with cpuset.cpu_exclusive
This PR fix issue in this scenario:

```
in terminal 1:
~# cd /sys/fs/cgroup/cpuset
~# mkdir test
~# cd test
~# cat cpuset.cpus
0-3
~# echo 1 > cpuset.cpu_exclusive (make sure you don't have other cgroups under root)

in terminal 2:
~# echo $$ > /sys/fs/cgroup/cpuset/test/tasks
// set resources.cpu.cpus="0-2" in config.json
~# runc run test1

back to terminal 1:
~# cd test1
~# cat cpuset.cpus
0-2
~# echo 1 > cpuset.cpu_exclusive

in terminal 3:
~# echo $$ > /sys/fs/cgroup/test/tasks
// set resources.cpu.cpus="3" in config.json
~# runc run test2
container_linux.go:247: starting container process caused "process_linux.go:258:
applying cgroup configuration for process caused \"failed to write 0-3\\n to
cpuset.cpus: write /sys/fs/cgroup/cpuset/test2/cpuset.cpus: invalid argument\""
```

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-18 15:28:40 +08:00
Qiang Huang 16a2e8ba6e Sync with grandchild
Without this, it's possible that father process exit with
0 before grandchild exit with error.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-17 08:59:37 +08:00
rajasec 43287af982 Fixing error message in nsexec
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-11-10 17:06:50 +05:30
Mrunal Patel 51371867a0 Merge pull request #1180 from crosbymichael/kill-all
Add --all flag to kill
2016-11-09 12:21:22 -07:00
Michael Crosby e58671e530 Add --all flag to kill
This allows a user to send a signal to all the processes in the
container within a single atomic action to avoid new processes being
forked off before the signal can be sent.

This is basically taking functionality that we already use being
`delete` and exposing it ok the `kill` command by adding a flag.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-08 09:35:02 -08:00
Mrunal Patel 8779fa57eb Merge pull request #1168 from hqhq/fix_nsexec_comments
More fix to nsexec.c's comments
2016-11-07 16:20:42 -07:00
Michael Crosby 5f24c9a61a Merge pull request #1146 from cyphar/io-set-termios-onlcr
libcontainer: io: stop screwing with \n in console output
2016-11-03 09:49:50 -07:00
Mrunal Patel d7481c10f4 Merge pull request #1172 from crosbymichael/ambient-tag
Move ambient capabilties behind build tag
2016-11-02 20:16:26 -07:00
Qiang Huang 84a4218ece More fix to nsexec.c's comments
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-03 10:15:01 +08:00
Aleksa Sarai 49ed0a10e4
merge branch 'pr-1117'
LGTMs: @hqhq @cyphar
Closes: #1117
2016-11-03 05:03:26 +11:00
Michael Crosby 603c151e6c Move ambient capabilties behind build tag
This moves the ambient capability support behind an `ambient` build tag
so that it is only compiled upon request.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-11-02 10:59:59 -07:00
Crazykev 34d7c5c099 fix error message
Signed-off-by: Crazykev <crazykev@zju.edu.cn>
2016-11-02 16:34:08 +08:00
Aleksa Sarai fd7ab60a70
libcontainer: make tests to make sure we don't mess with \r
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Aleksa Sarai eea28f480d
libcontainer: io: stop screwing with \n in console output
The default terminal setting for a new pty on Linux (unix98) has +ONLCR,
resulting in '\n' writes by a container process to be converted to
'\r\n' reads by the managing process. This is quite unexpected, and
causes multiple issues with things like bats testing. To fix it, make
the terminal sane after opening it by setting -ONLCR.

This patch might need to be rewritten after the console rewrite patchset
is merged.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 14:40:54 +11:00
Mrunal Patel bc462c96bf Merge pull request #1165 from cyphar/nsenter-fix-comments
nsenter: fix up comments
2016-10-31 10:39:34 -07:00
Daniel, Dao Quang Minh 509b1db98c Merge pull request #1160 from hqhq/fix_typos
Fix all typos found by misspell
2016-10-31 17:28:44 +00:00
Michael Crosby 8b9b444820 Merge pull request #1157 from rajasec/readme-containerstate
Updating container state and status API in README
2016-10-31 10:26:21 -07:00
Michael Crosby 4c7b8d6c59 Merge pull request #1159 from hqhq/unify_rootfs_validation
Unify rootfs validation
2016-10-31 10:22:01 -07:00
Aleksa Sarai 9b15bf17a0
nsenter: fix up comments
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-11-01 00:21:09 +11:00
rajasec 16ad3855e7 Correction in util error messages
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-29 19:50:56 +05:30
Qiang Huang b15668b36d Fix all typos found by misspell
I use the same tool (https://github.com/client9/misspell)
as Daniel used a few days ago, don't why he missed these
typos at that time.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 14:14:42 +08:00
Qiang Huang 81d6088c8f Unify rootfs validation
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 10:31:44 +08:00
rajasec 1535e67592 Updating container state and status API in README
Signed-off-by: rajasec <rajasec79@gmail.com>

Updating container state and status API in README

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-27 15:29:34 +05:30
Qiang Huang e7abf30cb8 Merge pull request #1150 from WeiZhang555/forbid-duplicated-namespace
Detect and forbid duplicated namespace in spec
2016-10-27 10:23:16 +08:00
Qiang Huang f520eab891 Remove unnecessary cloneflag validation
config.cloneflag is not mandatory, when using `runc exec`,
config.cloneflag can be empty, and even then it won't be
`-1` but `0`.

So this validation is totally wrong and unneeded.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 09:34:20 +08:00
Andrei Vagin 040fb7311c checkpoint: handle config.Devices and config.MaskPaths
In user namespaces devices are bind-mounted from the host, so
we need to add them as external mounts for CRIU.

Reported-by: Ross Boucher <boucher@gmail.com>
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
2016-10-26 23:50:54 +03:00
Mrunal Patel 4599e7074e Merge pull request #1148 from rhvgoyal/parent-mount-private
Make parent mount private before bind mounting rootfs
2016-10-26 17:30:37 +00:00
Zhang Wei a0f7977f0f Detect and forbid duplicated namespace in spec
When spec file contains duplicated namespaces, e.g.

specs: specs.Spec{
        Linux: &specs.Linux{
            Namespaces: []specs.Namespace{
                {
                    Type: "pid",
                },
                {
                    Type: "pid",
                    Path: "/proc/1/ns/pid",
                },
            },
        },
    }

runc should report malformed spec instead of using latest one by
default, because this spec could be quite confusing.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 00:44:36 +08:00
Michael Crosby 6328410520 Merge pull request #1149 from cyphar/fix-sysctl-validation
validator: unbreak sysctl net.* validation
2016-10-26 09:06:41 -07:00
Aleksa Sarai 1ab3c035d2
validator: actually test success
Previously we only tested failures, which causes us to miss issues where
setting sysctls would *always* fail.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 23:07:57 +11:00
Aleksa Sarai 2a94c3651b
validator: unbreak sysctl net.* validation
When changing this validation, the code actually allowing the validation
to pass was removed. This meant that any net.* sysctl would always fail
to validate.

Fixes: bc84f83344 ("fix docker/docker#27484")
Reported-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 22:58:51 +11:00
Qiang Huang 157a96a428 Merge pull request #977 from cyphar/nsenter-userns-ordering
nsenter: guarantee correct user namespace ordering
2016-10-26 16:45:15 +08:00
Vivek Goyal 6c147f8649 Make parent mount private before bind mounting rootfs
This reverts part of the commit eb0a144b5e

That commit introduced two issues.

- We need to make parent mount of rootfs private before bind mounting
  rootfs. Otherwise bind mounting root can propagate in other mount
  namespaces. (If parent mount is shared).

- It broke test TestRootfsPropagationSharedMount() on Fedora.

  On fedora /tmp is a mount point with "shared" propagation. I think
  you should be able to reproduce it on other distributions as well
  as long as you mount tmpfs on /tmp and make it "shared" propagation.

  Reason for failure is that pivot_root() fails. And it fails because
  kernel does following check.

  IS_MNT_SHARED(new_mnt->mnt_parent)

  Say /tmp/foo is new rootfs, we have bind mounted rootfs, so new_mnt
  is /tmp/foo, and new_mnt->mnt_parent is /tmp which is "shared" on
  fedora and above check fails.

So this change broke few things, it is a good idea to revert part of it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
2016-10-25 11:15:11 -04:00
Qiang Huang 4ec570d060 Merge pull request #1138 from gaocegege/fix-config-validator
docker/docker#27484-check if sysctls are used in host network mode.
2016-10-25 11:08:51 +08:00
Aleksa Sarai c7ed2244f4
merge branch 'pr-1125'
LGTMs: @hqhq @mrunalp
Closes #1125
2016-10-25 10:05:28 +11:00
Ce Gao 41c35810f2 add test cases about host ns
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:31:15 +08:00
Ce Gao bc84f83344 fix docker/docker#27484
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:22:52 +08:00
Alexander Morozov 1ab9d5e6f4 Merge pull request #845 from mrunalp/cp_tmpfs
Add support for copying up directories into tmpfs when a tmpfs is mounted over them
2016-10-21 13:47:16 -07:00
Mrunal Patel c4198ad9af Merge pull request #1134 from WeiZhang555/tiny-refactor
Some refactor and cleanup
2016-10-20 15:08:40 -07:00
Yong Tang a83f5bac28 Fix issue in `GetProcessStartTime`
This fix tries to address the issue raised in docker:
https://github.com/docker/docker/issues/27540

The issue was that `GetProcessStartTime` use space `"  "`
to split the `/proc/[pid]/stat` and take the `22`th value.

However, the `2`th value is inside `(` and `)`, and could
contain space. The following are two examples:
```
ubuntu@ubuntu:~/runc$ cat /proc/90286/stat
90286 (bash) S 90271 90286 90286 34818 90286 4194560 1412 1130576 4 0 2 1 2334 438 20 0 1 0 3093098 20733952 823 18446744073709551615 1 1 0 0 0 0 0 3670020 1266777851 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
ubuntu@ubuntu:~/runc$ cat /proc/89653/stat
89653 (gunicorn: maste) S 89630 89653 89653 0 -1 4194560 29689 28896 0 3 146 32 76 19 20 0 1 0 2971844 52965376 3920 18446744073709551615 1 1 0 0 0 0 0 16781312 137447943 0 0 0 17 1 0 0 0 0 0 0 0 0 0 0 0 0 0
```

This fix fixes this issue by removing the prefix before `)`,
then finding the `20`th value (instead of `22`th value).

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-10-20 11:34:21 -07:00
Zhang Wei c179b0ffc7 Some refactor and cleanup
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-20 17:58:51 +08:00
Aleksa Sarai f8e6b5af5e
rootfs: make pivot_root not use a temporary directory
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually a feature and allows you to make the
old_root be tied to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers which came up with this idea
first.

This is the first step of many to allowing runC to work with a
completely read-only rootfs.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-20 12:55:58 +11:00
Derek Carr d223e2adae Ignore error when starting transient unit that already exists
Signed-off-by: Derek Carr <decarr@redhat.com>
2016-10-19 14:55:52 -04:00
Aleksa Sarai e3cd191acc
nsenter: un-split clone(cloneflags) for RHEL
Without this patch applied, RHEL's SELinux policies cause container
creation to not really work. Unfortunately this might be an issue for
rootless containers (opencontainers/runc#774) but we'll cross that
bridge when we come to it.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-18 18:26:27 +11:00
Michael Crosby fcc40b7a63 Remove panic from init
Print the error message to stderr if we are unable to return it back via
the pipe to the parent process.  Also, don't panic here as it is most
likely a system or user error and not a programmer error.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-17 15:54:51 -07:00
Mrunal Patel 4161f2a63b Merge pull request #1115 from rajasec/filemode-panic
Fixing runc panic for missing file mode
2016-10-17 15:01:49 -07:00
Dan Walsh 6932807107 Add support for r/o mount labels
We need support for read/only mounts in SELinux to allow a bunch of
containers to share the same read/only image.  In order to do this
we need a new label which allows container processes to read/execute
all files but not write them.

Existing mount label is either shared write or private write.  This
label is shared read/execute.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-10-17 16:56:42 -04:00
rajasec 034cba6af0 Fixing runc panic for missing file mode
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixing runc panic for missing file mode

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-16 20:39:44 +05:30
rajasec 4b263c9594 Fixing runc panic during hugetlb pages
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixing runc panic during hugetlb pages

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-15 19:47:33 +05:30
Dan Walsh 491cadac92 DupSecOpt needs to match InitLabels
At some point InitLabels was changed to look for SecuritOptions
separated by a ":" rather then an "=", but DupSecOpt was never
changed to match this default.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-10-13 16:10:29 -04:00
Daniel, Dao Quang Minh d186a7552b Merge pull request #1111 from keloyang/rpid-limit-check
tiny fix, add a null check for specs.Resources.Pids.Limit
2016-10-13 18:04:49 +01:00
Shukui Yang affc105264 tiny fix, add a null check for specs.Resources.Pids.Limit
Signed-off-by: Shukui Yang <yangshukui@huawei.com>
2016-10-13 15:55:30 +08:00
Daniel Dao 1b876b0bf2 fix typos with misspell
pipe the source through https://github.com/client9/misspell. typos be gone!

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2016-10-11 23:22:48 +00:00
Daniel, Dao Quang Minh 8d505cb9dc Merge pull request #1107 from datawolf/fix-a-typo
just fix a typo
2016-10-12 00:15:51 +01:00
Wang Long 5eaa9ed5cd just fix a typo
Signed-off-by: Wang Long <long.wanglong@huawei.com>
2016-10-11 08:38:15 +00:00
Xianglin Gao 9df4847a23 tiny fix
Signed-off-by: Xianglin Gao <xlgao@zju.edu.cn>
2016-10-11 16:32:56 +08:00
Michael Crosby 11222ee1f1 Don't enable kernel mem if not set
Don't enable the kmem limit if it is not specified in the config.

Fixes #1083

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-10-07 10:02:19 -07:00
Aleksa Sarai b1eb19b4f3
merge branch 'pr-1084'
LGTMs: @mrunalp @cyphar

Closes #1084
2016-10-07 19:10:14 +11:00
Mrunal Patel c4e7f01c4b Add an integration test for tmpfs copy up
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-04 11:26:37 -07:00