Commit Graph

1151 Commits

Author SHA1 Message Date
Mrunal Patel 63e6708c74
Merge pull request #1784 from pierrchen/master
libcontainer/rootfs_linux: minor cleanup
2018-04-17 17:02:10 -07:00
dlorenc 40680b2d37 Make the setupSeccomp function public.
This function is useful for converting from the OCI spec format to the one used by runC/libcontainer.

Signed-off-by: dlorenc <lorenc.d@gmail.com>
2018-04-17 10:47:22 -07:00
Michael Crosby d56f6cc202
Merge pull request #1753 from wking/do-not-require-bind-mount-type
libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts
2018-04-16 11:01:53 -04:00
Bin Chen 1b27db67f1 libcontainer/rootfs_linux: minor cleanup
move variable close to where is used

Signed-off-by: Bin Chen <nk@devicu.com>
2018-04-16 22:25:48 +10:00
Filipe Brandenburger 165ee45334 Make channel for StartTransientUnit buffered
So that, if a timeout happens and we decide to stop blocking on the
operation, the writer will not block when they try to report the result
of the operation.

This should address Issue #1780 and it's a follow up for PR #1683,
PR #1754 and PR #1772.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-14 08:49:50 -07:00
Michael Crosby f753f300ae
Merge pull request #1779 from runcom/gcc8-fix
nsexec.c: fix GCC 8 warning
2018-04-12 12:13:43 -04:00
Michael Crosby 9f0eca2a94
Merge pull request #1777 from nalind/no-config-for-extant-netns
Only configure networking when creating a net ns
2018-04-12 10:55:02 -04:00
Antonio Murdaca 1a5064622c
nsexec.c: fix GCC 8 warning
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-04-12 12:25:06 +02:00
Nalin Dahyabhai 4521d4b19c Only configure networking when creating a net ns
When joining an existing namespace, don't default to configuring a
loopback interface in that namespace.

Its creator should have done that, and we don't want to fail to create
the container when we don't have sufficient privileges to configure the
network namespace.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2018-04-11 13:28:19 -04:00
Filipe Brandenburger 0e16bd9b53 Detect whether Delegate is available on both slices and scopes
Starting with systemd 237, in preparation for cgroup v2, delegation is
only now available for scopes, not slices.

Update libcontainer code to detect whether delegation is available on
both and use that information when creating new slices.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-10 11:42:55 -07:00
Filipe Brandenburger 8ab251f298 Fix systemd.Apply() to check for DBus error before waiting on a channel.
The channel was introduced in #1683 to work around a race condition.
However, the check for error in StartTransientUnit ignores the error for
an already existing unit, and in that case there will be no notification
from DBus (so waiting on the channel will make it hang.)

Later PR #1754 added a timeout, which worked around the issue, but we
can fix this correctly by only waiting on the channel when there is no
error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where
this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358
mentions calling this code from inside a container, it's unclear whether
an existing container was in use or not, so not sure whether this would
have fixed that bug as well.)

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-09 11:51:59 -07:00
Sebastien Boeuf 985628dda0 libcontainer: Don't set container state to running when exec'ing
There is no reason to set the container state to "running" as a
temporary value when exec'ing a process on a container in "created"
state. The problem doing this is that consumers of the libcontainer
library might use it by keeping pointers in memory. In this case,
the container state will indicate that the container is running, which
is wrong, and this will end up with a failure on the next action
because the check for the container state transition will complain.

Fixes #1767

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-03-30 09:29:18 -07:00
Akihiro Suda 73f3dc6389 libcontainer: allow setgroup in rootless mode
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-03-27 17:42:05 +09:00
Akihiro Suda ed58366cc8 libcontainer: fix Boolmsg alignment
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-03-26 14:44:03 +09:00
Tamal Saha 58415b4b12 Fix error message
Signed-off-by: Tamal Saha <tamal@appscode.com>
2018-03-21 20:52:09 -07:00
Aleksa Sarai fd3a6e6c83
libcontainer: handle unset oomScoreAdj corectly
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:

> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.

Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00
Aleksa Sarai 03e585985f
rootless: cgroup: treat EROFS as a skippable error
In some cases, /sys/fs/cgroups is mounted read-only. In rootless
containers we can consider this effectively identical to having cgroups
that we don't have write permission to -- because the user isn't
responsible for the read-only setup and cannot modify it. The rules are
identical to when /sys/fs/cgroups is not writable by the unprivileged
user.

An example of this is the default configuration of Docker, where cgroups
are mounted as read-only as a preventative security measure.

Reported-by: Vladimir Rutsky <rutsky@google.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00
Daniel J Walsh 43aea05946 Label the masked tmpfs with the mount label
Currently if a confined container process tries to list these directories
AVC's are generated because they are labeled with external labels.  Adding
the mountlabel will remove these AVC's.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-03-09 14:29:06 -05:00
Qiang Huang 9facb87f87
Merge pull request #1754 from vikaschoudhary16/add-timeout
Add timeout while waiting for StartTransinetUnit completion signal
2018-03-08 09:09:34 +08:00
W. Trevor King 0aa6e4e5d3 libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts
From the "Creating a bind mount" section of mount(2) [1]:

> If mountflags includes MS_BIND (available since Linux 2.4), then
> perform a bind mount...
>
> The filesystemtype and data arguments are ignored.

This commit adds support for configurations that leave the OPTIONAL
type [2] unset for bind mounts.  There's a related spec-example change
in flight with [3], although my personal preference would be a more
explicit spec for the whole mount structure [4].

[1]: http://man7.org/linux/man-pages/man2/mount.2.html
[2]: https://github.com/opencontainers/runtime-spec/blame/v1.0.1/config.md#L102
[3]: https://github.com/opencontainers/runtime-spec/pull/954
[4]: https://github.com/opencontainers/runtime-spec/pull/771

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-03-07 10:23:42 -08:00
vikaschoudhary16 04e95b526d Add timeout while waiting for StartTransinetUnit completion signal from dbus
Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>
2018-03-07 05:11:38 -05:00
Denys Smirnov 3d26fc3fd7 cgroups/fs: fix NPE on Destroy than no cgroups are set
Currently Manager accepts nil cgroups when calling Apply, but it will panic then trying to call Destroy with the same config.

Signed-off-by: Denys Smirnov <denys@sourced.tech>
2018-03-06 23:31:31 +01:00
Vincent Batts bf74951617
libcontainer/user: platform dependent calls
This rearranges a bit of the user and group lookup, such that only a
basic subset is exposed.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2018-02-28 14:14:24 -05:00
Aleksa Sarai 757e78bebd
merge branch 'pr-1743'
The setupUserNamespace function is always called.

LGTMs: @crosbymichael @mrunalp @cyphar
Closes #1743
2018-02-27 12:22:52 +11:00
Michael Crosby 8aca07289d
Merge pull request #1736 from allencloud/fix-lint-warning
fix lint error in specconv
2018-02-26 14:21:26 -05:00
ynirk 2420eb1f4d The setupUserNamespace function is always called.
The function is called even if the usernamespace is not set.
This results having wrong uid/gid set on devices.

This fix add a test to check if usernamespace is set befor calling
setupUserNamespace.

Fixes #1742

Signed-off-by: Julien Lavesque <julien.lavesque@gmail.com>
2018-02-26 14:27:11 +01:00
Allen Sun 3f32e72963 fix lint error in specconv
Signed-off-by: Allen Sun <allensun.shl@alibaba-inc.com>
2018-02-26 15:39:54 +08:00
Michael Crosby 595bea022f
Merge pull request #1722 from ravisantoshgudimetla/fix-systemd-path
fix systemd slice expansion so that it could be consumed by cAdvisor
2018-02-20 09:59:24 -05:00
W. Trevor King 50dc7ee96c libcontainer/capabilities_linux: Drop os.Getpid() call
gocapability has supported 0 as "the current PID" since
syndtr/gocapability@5e7cce49 (Allow to use the zero value for pid to
operate with the current task, 2015-01-15, syndtr/gocapability#2).
libcontainer was ported to that approach in 444cc298 (namespaces:
allow to use pid namespace without mount namespace, 2015-01-27,
docker/libcontainer#358), but the change was clobbered by 22df5551
(Merge branch 'master' into api, 2015-02-19, docker/libcontainer#388)
which landed via 5b73860e (Merge pull request #388 from docker/api,
2015-02-19, docker/libcontainer#388).  This commit restores the
changes from 444cc298.

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-02-19 15:47:42 -08:00
ravisantoshgudimetla 7019e1de7b fix systemd slice expansion so that it could be consumed by cAdvisor
Signed-off-by: ravisantoshgudimetla <ravisantoshgudimetla@gmail.com>
2018-02-18 21:32:39 -05:00
Mrunal Patel 6e15bc3f92
Merge pull request #1702 from crosbymichael/chroot
chroot when no mount namespaces is provided
2018-02-07 10:09:35 -08:00
W. Trevor King be16b13645 libcontainer/state_linux_test: Add a testTransitions helper
The helper DRYs up the transition tests and makes it easy to get
complete coverage for invalid transitions.

I'm also using t.Run() for subtests.  Run() is new in Go 1.7 [1], but
runc dropped support for 1.6 back in e773f96b (update go version at
travis-ci, 2017-02-20, #1335).

[1]: https://blog.golang.org/subtests

Signed-off-by: W. Trevor King <wking@tremily.us>
2018-01-25 11:18:45 -08:00
Michael Crosby 91ca331474 chroot when no mount namespaces is provided
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-01-25 11:36:37 -05:00
Michael Crosby c4e4bb0df2
Merge pull request #1699 from AkihiroSuda/indent-c
make: validate C format
2018-01-25 10:09:09 -05:00
Aleksa Sarai 5a46c2ba8b
nsenter: move namespace creation after userns creation
Technically, this change should not be necessary, as the kernel
documentation claims that if you call clone(flags|CLONE_NEWUSER), the
new user namespace will be the owner of all other namespaces created in
@flags. Unfortunately this isn't always the case, due to various
additional semantics and kernel bugs.

One particular instance is SELinux, which acts very strangely towards
the IPC namespace and mqueue. If you unshare the IPC namespace *before*
you map a user in the user namespace, the IPC namespace's internal
kern-mount for mqueue will be labelled incorrectly and the container
won't be able to access it. The only way of solving this is to unshare
IPC *after* the user has been mapped and we have changed to that user.
I've also heard of this happening to the NET namespace while talking to
some LXC folks, though I haven't personally seen that issue.

This change matches our handling of user namespaces to be the same as
how LXC handles these problems.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-01-25 23:56:49 +11:00
Akihiro Suda dd5eb3b9e3 make: validate C format
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-01-24 10:49:50 +09:00
Ed King 5c0af14bf8 Return from goroutine when it should terminate
Signed-off-by: Craig Furman <cfurman@pivotal.io>
2018-01-23 10:46:31 +00:00
Will Martin 8d3e6c9826 Avoid race when opening exec fifo
When starting a container with `runc start` or `runc run`, the stub
process (runc[2:INIT]) opens a fifo for writing. Its parent runc process
will open the same fifo for reading. In this way, they synchronize.

If the stub process exits at the wrong time, the parent runc process
will block forever.

This can happen when racing 2 runc operations against each other: `runc
run/start`, and `runc delete`. It could also happen for other reasons,
e.g. the kernel's OOM killer may select the stub process.

This commit resolves this race by racing the opening of the exec fifo
from the runc parent process against the stub process exiting. If the
stub process exits before we open the fifo, we return an error.

Another solution is to wait on the stub process. However, it seems it
would require more refactoring to avoid calling wait multiple times on
the same process, which is an error.

Signed-off-by: Craig Furman <cfurman@pivotal.io>
2018-01-22 17:03:02 +00:00
Antonio Murdaca cd1e7abee2
libcontainer: expose annotations in hooks
Annotations weren't passed to hooks. This patch fixes that by passing
annotations to stdin for hooks.

Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-01-11 16:54:01 +01:00
vikaschoudhary16 d5b4a3eddb Fix race against systemd
- T0: runc triggers a systemd unit creation asynchronously from [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L298)
- T1: runc then moves ahead and starts creating cgroup paths(.scope directories), [here](https://github.com/opencontainers/runc/blob/master/libcontainer/cgroups/systemd/apply_systemd.go#L348). Kernel creates .scope directory and cgroup.procs file(along with other default files) in the directory automatically, in an atomic manner.
- T3: systemd execution thread which was invoked at time `T0`, is still in the process of unit creation. systemd also trying to create cgroup paths and deletes the `.scope` directory which is created at time `T1` by runc from [here](https://github.com/systemd/systemd/blob/v219/src/shared/cgroup-util.c#L1630) in the code

Signed-off-by: vikaschoudhary16 <choudharyvikas16@gmail.com>
2018-01-08 09:37:26 -05:00
Mrunal Patel e6516b3d5d
Merge pull request #1678 from sboeuf/sboeuf/subreaper
libcontainer: Do not wait for signalled processes if subreaper is set
2017-12-15 08:47:07 -08:00
Michael Crosby 7f24b40cc5
Merge pull request #1675 from tklauser/apparmor-no-cgo
RFC: libcontainer: remove dependency on libapparmor
2017-12-15 11:23:35 -05:00
Tobias Klauser db093f621f libcontainer: remove dependency on libapparmor
libapparmor is integrated in libcontainer using cgo but is only used to
call a single function: aa_change_onexec. It turns out this function is
simple enough (writing a string to a file in /proc/<n>/attr/...) to be
re-implemented locally in libcontainer in plain Go.

This allows to drop the dependency on libapparmor and the corresponding
cgo integration.

Fixes #1674

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-12-15 09:59:58 +01:00
Sebastien Boeuf bb912eb00c libcontainer: Do not wait for signalled processes if subreaper is set
When a subreaper is enabled, it might expect to reap a process and
retrieve its exit code. That's the reason why this patch is giving
the possibility to define the usage of a subreaper as a consumer of
libcontainer. Relying on this information, libcontainer will not
wait for signalled processes in case a subreaper has been set.

Fixes #1677

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2017-12-14 10:37:38 -08:00
Mrunal Patel c6e4a1ebeb
Merge pull request #1665 from Mashimiao/gidmapping-valid-fix
specconv: avoid skipping gidmappings applied when uidmappings is empty
2017-12-11 09:50:54 -08:00
Mrunal Patel b028413c35
Merge pull request #1655 from Mashimiao/add-propagation-more
support unbindable,runbindable for rootfs propagation
2017-12-11 09:21:41 -08:00
Allen Sun fec6b0fea5 Update criu_opts_linux.go
Signed-off-by: Allen Sun <shlallen1990@gmail.com>
2017-12-05 15:16:26 +08:00
Michael Crosby 91e9795013
Merge pull request #1654 from dqminh/only-linux
remove placeholder for non-linux platforms
2017-11-30 09:51:47 -05:00
Ma Shimiao 57edfbbaf2 specconv: avoid skipping gidmappings applied when uidmappings is empty
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-11-30 16:24:36 +08:00
Aleksa Sarai e8149af291
merge branch 'pr-1661'
Ensure container tests do not write on the host

LGTMs: @hqhq @cyphar
Closes #1661
2017-11-27 20:10:48 +11:00
Danail Branekov 0495fece57 Ensure container tests do not write on the host
TestGetContainerStateAfterUpdate creates its state.json file on the current
directory which turns out to be the host runc directory. Thus whenever
the test completes it leaves the state.json file behind thus
a) poluting the local git repository
b) changing the host file system violating the principle of doing
everything in an isolated container environment

This change would create a new temporary (in-container) directory and use it as
linuxContainer.root

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
2017-11-27 10:43:10 +02:00
Daniel Dao 8898b6b446 remove placeholder for non-linux platforms
runc currently only support Linux platform, and since we dont intend to expose
the support to other platform, removing all other platforms placeholder code.

`libcontainer/configs` still being used in
https://github.com/moby/moby/blob/master/daemon/daemon_windows.go so
keeping it for now.

After this, we probably should also rename files to drop linux suffices
if possible.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2017-11-24 18:14:51 +00:00
Daniel, Dao Quang Minh fb871d9cd0
Merge pull request #1664 from tklauser/drop-freebsd
libcontainer: drop FreeBSD support
2017-11-24 18:08:21 +00:00
Tobias Klauser 4d27f20db0 libcontainer: drop FreeBSD support
runc is not supported on FreeBSD, so remove all FreeBSD specific bits.

As suggested by @crosbymichael in #1653

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-11-24 14:51:05 +01:00
Danail Branekov 38d1e6ec27 Delete xattr related code
Selinux related code has been moved to the selinux package
(https://github.com/opencontainers/selinux) and therefore xattr related
code can be deleted from libcontainer

Signed-off-by: Danail Branekov <danailster@gmail.com>
2017-11-21 12:49:28 +02:00
Ma Shimiao 17db6560be support unbindable,runbindable for rootfs propagation
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-11-17 16:14:15 +08:00
Seth Jennings bca53e7b49 systemd: adjust CPUQuotaPerSecUSec to compensate for systemd internal handling
Signed-off-by: Seth Jennings <sjenning@redhat.com>
2017-11-15 20:20:06 -06:00
Vincent Demeester 3ca4c78b1a
Import docker/docker/pkg/mount into runc
This will help get rid of docker/docker dependency in runc 👼

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-11-08 16:25:58 +01:00
Michael Crosby 2f010ecf19
Merge pull request #1622 from vdemeester/import-symlink-from-docker
Remove pkg/symlink from docker/docker and use cyphar/filepath-securejoin
2017-11-08 10:07:00 -05:00
Akihiro Suda 0aac2368e4 specconv.Example(): add /proc/scsi to masked paths
Port over https://github.com/moby/moby/pull/35399

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2017-11-04 17:38:14 +00:00
Michael Crosby 0232e38342
Merge pull request #1629 from masters-of-cats/busybox-inflation
Avoid disk usage explosion when copying busybox
2017-11-01 09:15:22 -04:00
Danail Branekov fdbb9e3e55 Avoid disk usage explosion when copying busybox
When running runc tests with temp directory with size 500M copying
busybox without preserving hardlinks causes the folder to inflate to
roughly 330M. Copying busybox twice in certain tests causes the /tmp
directory to overfill. Using `-a` preserves links which busybox uses to
implement its choice of binary to run.

Signed-off-by: Tom Godkin <tgodkin@pivotal.io>
2017-11-01 09:52:05 +00:00
Vincent Demeester 594501475e
Use cyphar/filepath-securejoin instead of docker pkg/symlink
runc shouldn't depend on docker and be more self-contained.
Removing github.com/pkg/symlink dep is the first step to not depend on docker anymore

Signed-off-by: Vincent Demeester <vincent@sbr.pm>
2017-10-31 16:53:45 +01:00
Lorenzo Fontana 780f8ef567
Specconv: Test create command hooks and seccomp setup
Signed-off-by: Lorenzo Fontana <lo@linux.com>
2017-10-28 21:46:46 +02:00
Mrunal Patel 9a1186d128 Merge pull request #1619 from fntlnz/spec-linux-testing
WIP: Better testsuite for specconv
2017-10-25 15:23:19 -07:00
Lorenzo Fontana c0e6e12f9d
Test Cgroup creation and memory allocations
Signed-off-by: Lorenzo Fontana <lo@linux.com>
2017-10-25 01:58:10 +02:00
Aleksa Sarai ff5075c33f
init: correctly handle unmapped stdio with multiple mappings
Previously we would handle the "unmapped stdio" case by just doing a
simple check, however this didn't handle cases where the overflow_uid
was actually mapped in the user namespace. Instead of doing some
userspace checks, just try to do the fchown(2) and ignore EINVAL
(unmapped) or EPERM (lacking privilege over inode) errors.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-10-25 00:12:21 +11:00
Qiang Huang 74a1729647 Merge pull request #1607 from crosbymichael/term-err
libcontainer: handler errors from terminate
2017-10-20 15:15:38 +08:00
Qiang Huang e8b9b92f57 Merge pull request #1206 from YuPengZTE/devMD026
trailing punctuation in header
2017-10-20 14:47:09 +08:00
Mrunal Patel 80ee9e50b5 Merge pull request #1616 from mheon/seccomp_fix_breakage
Fix breaking change in Seccomp profile behavior
2017-10-19 14:15:04 -07:00
Aleksa Sarai c05f6368af
merge branch 'pr-1615'
libcontainer: intelrdt: fix a GetStats() issue

LGTMs: @crosbymichael @cyphar
Closes #1615
2017-10-19 03:41:16 +11:00
Matthew Heon e9193ba6e6 Fix breaking change in Seccomp profile behavior
Multiple conditions were previously allowed to be placed upon the
same syscall argument. Restore this behavior.

Signed-off-by: Matthew Heon <mheon@redhat.com>
2017-10-18 11:53:56 -04:00
Qiang Huang 3409d5c555 Merge pull request #1606 from cyphar/rootfs-propagation-no-pivot
specconv: emit an error when using MS_PRIVATE with --no-pivot
2017-10-18 09:52:04 +08:00
Xiaochen Shen d89217515b libcontainer: intelrdt: fix a GetStats() issue
This fixes a GetStats() issue introduced in #1590:
If Intel RDT is enabled by hardware and kernel, but intelRdt is not
specified in original config, GetStats() will return error unexpectedly
because we haven't called Apply() to create intelrdt group or attach
tasks for this container. As a result, runc events command will have no
output.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-10-17 17:37:07 +08:00
Tobias Klauser 0eed453b21 libcontainer: use Major/Minor from x/sys/unix
The Major and Minor functions were added for Linux in golang/sys@85d1495
which is already vendored in. Use these functions instead of the local
re-implementation.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-10-17 09:06:42 +02:00
Aleksa Sarai 9b13f5cc7f
merge branch 'pr-1453'
propagate argv0 when re-execing from /proc/self/exe

LGTMs: @crosbymichael @cyphar
Closes #1453
2017-10-17 03:12:22 +11:00
Michael Crosby ff4481dbf6 Merge pull request #1540 from cloudfoundry-incubator/rootless-cgroups
Support cgroups with limits as rootless
2017-10-16 12:03:49 -04:00
Petros Angelatos 8098828680
propagate argv0 when re-execing from /proc/self/exe
This allows runc to be used as a target for docker's reexec module that
depends on a correct argv0 to select which process entrypoint to invoke.
Without this patch, when runc re-execs argv0 is set to "/proc/self/exe"
and the reexec module doesn't know what to do with it.

Signed-off-by: Petros Angelatos <petrosagg@gmail.com>
2017-10-16 14:00:26 +02:00
Tobias Klauser d2bc081420 libcontainer: merge common syscall implementations
There are essentially two possible implementations for Setuid/Setgid on
Linux, either using SYS_SETUID32/SYS_SETGID32 or SYS_SETUID/SYS_SETGID,
depending on the architecture (see golang/go#1435 for why Setuid/Setgid
aren currently implemented for Linux neither in syscall nor in
golang.org/x/sys/unix).

Reduce duplication by merging the currently implemented variants and
adjusting the build tags accordingly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-10-16 11:11:18 +02:00
Aleksa Sarai 6d30f7a01b
merge branch 'pr-1424'
Update Travis config to use trusty-backports libseccomp
  Add integration tests for multi-argument Seccomp filters
  Vendor updated libseccomp-golang for bugfix

LGTMs: @crosbymichael @cyphar
Closes #1424
2017-10-16 03:01:37 +11:00
Aleksa Sarai d2ac52fe52
merge branch 'pr-1475'
Add support for mips/mips64
  Put signalMap in a separate file, so it may be arch-specific

LGTMs: @crosbymichael @cyphar
Closes #1475
2017-10-16 02:59:34 +11:00
Aleksa Sarai 2430a98e64
merge branch 'pr-1500'
rootfs: switch ms_private remount of oldroot to ms_slave

LGTMs: @crosbymichael @hqhq
Closes opencontainers/runc#1500
2017-10-14 09:32:59 +11:00
Sebastien Boeuf acb93c9c62 libcontainer: cgroups: Write freezer state after every state check
This commit ensures we write the expected freezer cgroup state after
every state check, in case the state check does not give the expected
result. This can happen when a new task is created and prevents the
whole cgroup to be FROZEN, leaving the state into FREEZING instead.

This patch prevents the case of an infinite loop to happen.

Fixes https://github.com/opencontainers/runc/issues/1609

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2017-10-12 07:07:28 -07:00
Matthew Heon bbc847a457 Add integration tests for multi-argument Seccomp filters
Signed-off-by: Matthew Heon <mheon@redhat.com>
2017-10-10 15:49:08 -04:00
Michael Crosby bfe3058fc9 Make process check more forgiving
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-10-10 15:36:19 -04:00
Steven Hartland eb68b900bc Prevent invalid errors from terminate
Both Process.Kill() and Process.Wait() can return errors that don't impact the correct behaviour of terminate.

Instead of letting these get returned and logged, which causes confusion, silently ignore them.

Currently the test needs to be a string test as the errors are private to the runtime packages, so its our only option.

This can be seen if init fails during the setns.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-10-10 15:32:46 -04:00
Michael Crosby 4693fae411 Merge pull request #1590 from xiaochenshen/rdt-cat-support-update-command
libcontainer: intelrdt: add update command support
2017-10-10 15:25:22 -04:00
Aleksa Sarai d4f0f9a52b
specconv: emit an error when using MS_PRIVATE with --no-pivot
Due to the semantics of chroot(2) when it comes to mount namespaces, it
is not generally safe to use MS_PRIVATE as a mount propgation when using
chroot(2). The reason for this is that this effectively results in a set
of mount references being held by the chroot'd namespace which the
namespace cannot free. pivot_root(2) does not have this issue because
the @old_root can be unmounted by the process.

Ultimately, --no-pivot is not really necessary anymore as a commonly
used option since f8e6b5af5e ("rootfs: make pivot_root not use a
temporary directory") resolved the read-only issue. But if someone
really needs to use it, MS_PRIVATE is never a good idea.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-10-08 17:50:55 +11:00
Michael Crosby f53ad9cec9 Merge pull request #1604 from AkihiroSuda/cwd
libcontainer: create Cwd when it does not exist
2017-10-05 11:15:10 -04:00
Will Martin ca4f427af1 Support cgroups with limits as rootless
Signed-off-by: Ed King <eking@pivotal.io>
Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io>
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
2017-10-05 11:22:54 +01:00
Akihiro Suda 2edd36fdff libcontainer: create Cwd when it does not exist
The benefit for doing this within runc is that it works well with
userns.
Actually, runc already does the same thing for mount points.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2017-10-05 05:31:46 +00:00
Konstantinos Karampogias 605dc5c811 Set initial console size based on process spec
Signed-off-by: Will Martin <wmartin@pivotal.io>
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Ed King <eking@pivotal.io>
Signed-off-by: Roberto Jimenez Sanchez <jszroberto@gmail.com>
Signed-off-by: Thomas Godkin <tgodkin@pivotal.io>
2017-10-04 12:32:16 +01:00
Daniel, Dao Quang Minh 0351df1c5a Merge pull request #1600 from crosbymichael/console
Bump console and sys deps
2017-09-26 10:15:10 +01:00
Michael Crosby f364c1a58c Set ClearONLCR in tests
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-09-25 13:35:22 -04:00
Tobias Klauser d713652bda libcontainer: remove unnecessary type conversions
Generated using github.com/mdempsky/unconvert

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-09-25 10:41:57 +02:00
Qiang Huang 79ad714374 Merge pull request #1598 from euank/ragent
libcontainer: default mount propagation correctly
2017-09-25 11:55:29 +08:00
Euan Kemp 4301b440d6 libcontainer: default mount propagation correctly
The code in prepareRoot (e385f67a0e/libcontainer/rootfs_linux.go (L599-L605))
attempts to default the rootfs mount to `rslave`. However, since the spec
conversion has already defaulted it to `rprivate`, that code doesn't
actually ever do anything.

This changes the spec conversion code to accept "" and treat it as 0.

Implicitly, this makes rootfs propagation default to `rslave`, which is
a part of fixing the moby bug https://github.com/moby/moby/issues/34672

Alternate implementatoins include changing this defaulting to be
`rslave` and removing the defaulting code in prepareRoot, or skipping
the mapping entirely for "", but I think this change is the cleanest of
those options.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
2017-09-22 13:36:23 -07:00
Xiaochen Shen 2549545df5 intelrdt: always init IntelRdtManager if Intel RDT is enabled
In current implementation:
Either Intel RDT is not enabled by hardware and kernel, or intelRdt is
not specified in original config, we don't init IntelRdtManager in the
container to handle intelrdt constraint. It is a tradeoff that Intel RDT
has hardware limitation to support only limited number of groups.

This patch makes a minor change to support update command:
Whether or not intelRdt is specified in config, we always init
IntelRdtManager in the container if Intel RDT is enabled. If intelRdt is
not specified in original config, we just don't Apply() to create
intelrdt group or attach tasks for this container.

In update command, we could re-enable through IntelRdtManager.Apply()
and then update intelrdt constraint.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-20 01:37:31 +08:00
Michael Crosby 593914b8bd Merge pull request #1593 from s7v7nislands/drop_go1.5
Drop support golang 1.5
2017-09-12 15:22:00 -04:00
s7v7nislands 00ad8e1e56 Drop support golang 1.5
Signed-off-by: Xiaobing Jiang <s7v7nislands@gmail.com>
2017-09-12 20:56:51 +08:00