Commit Graph

1134 Commits

Author SHA1 Message Date
Akihiro Suda b34d6d8a7c libcontainer: CurrentGroupSubGIDs -> CurrentUserSubGIDs
subgid is defined per user, not group (see subgid(5))

This commit also adds support for specifying subuid owner with a numeric UID.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-08-29 07:46:03 +09:00
Michael Crosby 1555a78945
Merge pull request #1874 from mrunalp/drop_unused_code
Remove unused veth setup code
2018-08-27 11:07:25 -04:00
Qiang Huang 0228707b77
Merge pull request #1873 from rhatdan/ms_move
When doing a copyup, /tmp can not be a shared mount point
2018-08-27 10:08:53 +08:00
Mrunal Patel fe3d5c4c6e Remove unused veth setup code
Networking is setup by plugins for users of runc so it makes sense
to get rid of the veth strategy.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-08-24 15:41:52 -07:00
Adrian Reber fa43a72aba
criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes #1786

Related to https://github.com/projectatomic/libpod/pull/469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-08-22 23:27:20 +02:00
Daniel J Walsh 62a4763a7a
When doing a copyup, /tmp can not be a shared mount point
MOVE_MOUNT will fail under certain situations.

You are not allowed to MS_MOVE if the parent directory is shared.

man mount
...
   The move operation
       Move a mounted tree to another place (atomically).  The call is:

              mount --move olddir newdir

       This  will cause the contents which previously appeared under olddir to
       now be accessible under newdir.  The physical location of the files  is
       not changed.  Note that olddir has to be a mountpoint.

       Note  also that moving a mount residing under a shared mount is invalid
       and unsupported.  Use findmnt -o TARGET,PROPAGATION to see the  current
       propagation flags.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-08-20 17:41:06 -04:00
Aleksa Sarai 20aff4f048
merge branch 'pr-1867'
Revert "libcontainer/rootfs_linux: minor cleanup"

LGTMs: @hqhq @cyphar
Closes #1867
2018-08-15 15:42:56 +10:00
Mrunal Patel 26ec8a9783 Revert "libcontainer/rootfs_linux: minor cleanup"
This reverts commit 1b27db67f1.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-08-14 15:50:18 -07:00
Michael Crosby 4056a41f58
Merge pull request #1830 from crosbymichael/procs
Pass GOMAXPROCS to init processes
2018-08-01 10:48:06 -04:00
Alban Crequy 3321aa1af7 Fix regression with mounts with non-absolute source path
PR #1753 introduced a test on the mount flags but the binary operator
was wrong, see https://github.com/opencontainers/runc/pull/1753#discussion_r203445652

This was noticed when investigating https://github.com/opencontainers/runtime-tools/issues/651

Symptoms: in the container, /proc/self/mountinfo displays some mounts as
follow:

296 279 0:67 / /tmp rw,nosuid - tmpfs /home/dpark/go/src/github.com/opencontainers/runc/tmpfs rw,size=65536k,mode=755

Signed-off-by: Alban Crequy <alban@kinvolk.io>
2018-07-18 18:30:49 +02:00
Michael Crosby 53fddb540a Pass GOMAXPROCS to init processes
This will help runc's init to not spawn many threads on large systems when
launched with max procs by the caller.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-06-26 11:23:37 -04:00
Michael Crosby 2c632d1a2d
Merge pull request #1824 from cyphar/fix-mips-build-devNumber
libcontainer: devices: fix mips builds
2018-06-25 13:21:28 -04:00
Aleksa Sarai 823c06eae9
libcontainer: improve "kernel.{domainname,hostname}" sysctl handling
These sysctls are namespaced by CLONE_NEWUTS, and we need to use
"kernel.domainname" if we want users to be able to set an NIS domainname
on Linux. However we disallow "kernel.hostname" because it would
conflict with the "hostname" field and cause confusion (but we include a
helpful message to make it clearer to the user).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-06-18 21:48:04 +10:00
Aleksa Sarai a0e99e7a1a
libcontainer: devices: fix mips builds
It turns out that MIPS uses uint32 in the device number returned by
stat(2), so explicitly wrap everything to make the compiler happy. I
really wish that Go had C-like numeric type promotion.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-06-17 11:22:01 +10:00
Mrunal Patel ad0f525506
Merge pull request #1819 from tiborvass/fix-arm32bit
libcontainer: fix compilation on GOARCH=arm GOARM=6 (32 bits)
2018-06-15 07:06:50 -07:00
Tibor Vass c205e9fb64 libcontainer: fix compilation on GOARCH=arm GOARM=6 (32 bits)
This fixes the following compilation error on 32bit ARM:
```
$ GOARCH=arm GOARCH=6 go build ./libcontainer/system/
libcontainer/system/linux.go:119:89: constant 4294967295 overflows int
```

Signed-off-by: Tibor Vass <tibor@docker.com>
2018-06-14 18:33:14 +00:00
Giuseppe Scrivano cbcc85d311
runc: not require uid/gid mappings if euid()==0
When running in a new unserNS as root, don't require a mapping to be
present in the configuration file.  We are already skipping the test
for a new userns to be present.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-06-12 12:45:54 +02:00
Aleksa Sarai dd56ece823
merge branch 'pr-1812'
Fix race in runc exec

LGTMs: @dqminh @cyphar
Closes #1812
2018-06-04 19:02:33 +10:00
Daniel, Dao Quang Minh 2e91544060
Merge pull request #1806 from cyphar/cgroup-ignorable-error-fixup
cgroup: clean up isIgnorableError for skippable EROFS
2018-06-02 23:57:02 +01:00
Mrunal Patel bd3c4f844a Fix race in runc exec
There is a race in runc exec when the init process stops just before
the check for the container status. It is then wrongly assumed that
we are trying to start an init process instead of an exec process.

This commit add an Init field to libcontainer Process to distinguish
between init and exec processes to prevent this race.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-06-01 16:25:58 -07:00
Michael Crosby 0e561642f8
Merge pull request #1688 from AkihiroSuda/unshare-m-r
main: support rootless mode in userns
2018-05-29 15:41:17 -04:00
Aleksa Sarai 939d5a3753
cgroup: clean up isIgnorableError for skippable EROFS
Include a rootless argument for isIgnorableError to avoid people
accidentally using isIgnorableError when they shouldn't (we don't ignore
any errors when running as root as that really isn't safe).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-05-25 11:31:41 +10:00
Qiang Huang dd67ab10d7
Merge pull request #1759 from cyphar/rootless-erofs-as-eperm
rootless: cgroup: treat EROFS as a skippable error
2018-05-25 09:24:16 +08:00
Daniel, Dao Quang Minh 2e931185f9
Merge pull request #1805 from derekwaynecarr/systemd-cpuquota-fix
fix systemd cpu quota for -1
2018-05-24 11:24:27 +01:00
Akihiro Suda c93815738a libcontainer: remove extra CAP_SETGID check for SetgroupAttr
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-24 14:59:30 +09:00
Derek Carr b515963c10 systemd cpu quota ignores -1
Signed-off-by: Derek Carr <decarr@redhat.com>
2018-05-23 14:28:39 -04:00
Michael Crosby fd0febd3ce Wrap error messages during init
Fixes #1437

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-05-10 10:28:10 -04:00
Akihiro Suda f103de57ec main: support rootless mode in userns
Running rootless containers in userns is useful for mounting
filesystems (e.g. overlay) with mapped euid 0, but without actual root
privilege.

Usage: (Note that `unshare --mount` requires `--map-root-user`)

  user$ mkdir lower upper work rootfs
  user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz | tar Cxz ./lower || ( true; echo "mknod errors were ignored" )
  user$ unshare --mount --map-root-user
  mappedroot# runc spec --rootless
  mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json
  mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs
  mappedroot# runc run foo

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-10 12:16:43 +09:00
Akihiro Suda 9c7d8bc1fd libcontainer: add parser for /etc/sub{u,g}id and /proc/PID/{u,g}id_map
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-10 12:16:43 +09:00
Mrunal Patel 0cbfd8392f
Merge pull request #1562 from cyphar/carry-975-959-ipc-uid-namespaces
nsenter: improve namespace creation and SELinux IPC handling
2018-04-26 14:12:33 -07:00
Mrunal Patel 871ba2e58e
Merge pull request #1781 from filbranden/systemd3
Make channel for StartTransientUnit buffered
2018-04-24 11:56:34 -07:00
Michael Crosby bdbb9fab07
Merge pull request #1693 from AkihiroSuda/leave-setgroups-allow
libcontainer: allow setgroup in rootless mode
2018-04-24 11:24:04 -04:00
Michael Crosby 1f11dc5dba
Merge pull request #1785 from dlorenc/seccomp
Make the setupSeccomp function public.
2018-04-19 16:00:54 -04:00
Mrunal Patel 63e6708c74
Merge pull request #1784 from pierrchen/master
libcontainer/rootfs_linux: minor cleanup
2018-04-17 17:02:10 -07:00
dlorenc 40680b2d37 Make the setupSeccomp function public.
This function is useful for converting from the OCI spec format to the one used by runC/libcontainer.

Signed-off-by: dlorenc <lorenc.d@gmail.com>
2018-04-17 10:47:22 -07:00
Michael Crosby d56f6cc202
Merge pull request #1753 from wking/do-not-require-bind-mount-type
libcontainer/specconv/spec_linux: Support empty 'type' for bind mounts
2018-04-16 11:01:53 -04:00
Bin Chen 1b27db67f1 libcontainer/rootfs_linux: minor cleanup
move variable close to where is used

Signed-off-by: Bin Chen <nk@devicu.com>
2018-04-16 22:25:48 +10:00
Filipe Brandenburger 165ee45334 Make channel for StartTransientUnit buffered
So that, if a timeout happens and we decide to stop blocking on the
operation, the writer will not block when they try to report the result
of the operation.

This should address Issue #1780 and it's a follow up for PR #1683,
PR #1754 and PR #1772.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-14 08:49:50 -07:00
Michael Crosby f753f300ae
Merge pull request #1779 from runcom/gcc8-fix
nsexec.c: fix GCC 8 warning
2018-04-12 12:13:43 -04:00
Michael Crosby 9f0eca2a94
Merge pull request #1777 from nalind/no-config-for-extant-netns
Only configure networking when creating a net ns
2018-04-12 10:55:02 -04:00
Antonio Murdaca 1a5064622c
nsexec.c: fix GCC 8 warning
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2018-04-12 12:25:06 +02:00
Nalin Dahyabhai 4521d4b19c Only configure networking when creating a net ns
When joining an existing namespace, don't default to configuring a
loopback interface in that namespace.

Its creator should have done that, and we don't want to fail to create
the container when we don't have sufficient privileges to configure the
network namespace.

Signed-off-by: Nalin Dahyabhai <nalin@redhat.com>
2018-04-11 13:28:19 -04:00
Filipe Brandenburger 0e16bd9b53 Detect whether Delegate is available on both slices and scopes
Starting with systemd 237, in preparation for cgroup v2, delegation is
only now available for scopes, not slices.

Update libcontainer code to detect whether delegation is available on
both and use that information when creating new slices.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-10 11:42:55 -07:00
Filipe Brandenburger 8ab251f298 Fix systemd.Apply() to check for DBus error before waiting on a channel.
The channel was introduced in #1683 to work around a race condition.
However, the check for error in StartTransientUnit ignores the error for
an already existing unit, and in that case there will be no notification
from DBus (so waiting on the channel will make it hang.)

Later PR #1754 added a timeout, which worked around the issue, but we
can fix this correctly by only waiting on the channel when there is no
error. Fix the code to do so.

The timeout handling was kept, since there might be other cases where
this situation occurs (https://bugzilla.redhat.com/show_bug.cgi?id=1548358
mentions calling this code from inside a container, it's unclear whether
an existing container was in use or not, so not sure whether this would
have fixed that bug as well.)

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
2018-04-09 11:51:59 -07:00
Sebastien Boeuf 985628dda0 libcontainer: Don't set container state to running when exec'ing
There is no reason to set the container state to "running" as a
temporary value when exec'ing a process on a container in "created"
state. The problem doing this is that consumers of the libcontainer
library might use it by keeping pointers in memory. In this case,
the container state will indicate that the container is running, which
is wrong, and this will end up with a failure on the next action
because the check for the container state transition will complain.

Fixes #1767

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
2018-03-30 09:29:18 -07:00
Akihiro Suda 73f3dc6389 libcontainer: allow setgroup in rootless mode
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-03-27 17:42:05 +09:00
Akihiro Suda ed58366cc8 libcontainer: fix Boolmsg alignment
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-03-26 14:44:03 +09:00
Tamal Saha 58415b4b12 Fix error message
Signed-off-by: Tamal Saha <tamal@appscode.com>
2018-03-21 20:52:09 -07:00
Aleksa Sarai fd3a6e6c83
libcontainer: handle unset oomScoreAdj corectly
Previously if oomScoreAdj was not set in config.json we would implicitly
set oom_score_adj to 0. This is not allowed according to the spec:

> If oomScoreAdj is not set, the runtime MUST NOT change the value of
> oom_score_adj.

Change this so that we do not modify oom_score_adj if oomScoreAdj is not
present in the configuration. While this modifies our internal
configuration types, the on-disk format is still compatible.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00
Aleksa Sarai 03e585985f
rootless: cgroup: treat EROFS as a skippable error
In some cases, /sys/fs/cgroups is mounted read-only. In rootless
containers we can consider this effectively identical to having cgroups
that we don't have write permission to -- because the user isn't
responsible for the read-only setup and cannot modify it. The rules are
identical to when /sys/fs/cgroups is not writable by the unprivileged
user.

An example of this is the default configuration of Docker, where cgroups
are mounted as read-only as a preventative security measure.

Reported-by: Vladimir Rutsky <rutsky@google.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-03-17 13:53:42 +11:00