Commit Graph

1151 Commits

Author SHA1 Message Date
Mrunal Patel 2abd837c8c
Merge pull request #1893 from cyphar/keyctl-ignore-enosys
keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING)
2018-09-25 13:35:16 -07:00
Aleksa Sarai 578fe65e4f
merge branch 'pr-1817'
Fix duplicate entries and missing entries in getCgroupMountsHelper
  Add test for testing cgroup mounts on bedrock linux
  Stop relying on number of subsystems for cgroups

LGTMs: @crosbymichael @cyphar
Closes #1817
2018-09-19 19:48:17 +10:00
Michael Crosby cc8146cf93
Merge pull request #1858 from marcov/nsenter-README
Update outdated nsenter README content
2018-09-17 10:53:19 -04:00
Michael Crosby d77251d5fc
Merge pull request #1892 from Ace-Tang/add_clean_test
test: add more test case for CleanPath
2018-09-17 10:51:17 -04:00
Aleksa Sarai 40f1468413
keyring: handle ENOSYS with keyctl(KEYCTL_JOIN_SESSION_KEYRING)
While all modern kernels (and I do mean _all_ of them -- this syscall
was added in 2.6.10 before git had begun development!) have support for
this syscall, LXC has a default seccomp profile that returns ENOSYS for
this syscall. For most syscalls this would be a deal-breaker, and our
use of session keyrings is security-based there are a few mitigating
factors that make this change not-completely-insane:

  * We already have a flag that disables the use of session keyrings
    (for older kernels that had system-wide keyring limits and so
    on). So disabling it is not a new idea.

  * While the primary justification of using session keys *is*
    security-based, it's more of a security-by-obscurity protection.
    The main defense keyrings have is VFS credentials -- which is
    something that users already have better security tools for
    (setuid(2) and user namespaces).

  * Given the security justification you might argue that we
    shouldn't silently ignore this. However, the only way for the
    kernel to return -ENOSYS is either being ridiculously old (at
    which point we wouldn't work anyway) or that there is a seccomp
    profile in place blocking it.

    Given that the seccomp profile (if malicious) could very easily
    just return 0 or a silly return code (or something even more
    clever with seccomp-bpf) and trick us without this patch, there
    isn't much of a significant change in how much seccomp can trick
    us with or without this patch.

Given all of that over-analysis, I'm pretty convinced there isn't a
security problem in this very specific case and it will help out the
ChromeOS folks by allowing Docker to run inside their LXC container
setup. I'd be happy to be proven wrong.

Ref: https://bugs.chromium.org/p/chromium/issues/detail?id=860565
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-09-17 21:38:30 +10:00
Ace-Tang 5963cf2afc test: add more test case for CleanPath
Signed-off-by: Ace-Tang <aceapril@126.com>
2018-09-14 21:37:12 +08:00
Yan Zhu feb90346e0 doc: fix typo
Signed-off-by: Yan Zhu <yanzhu@alauda.io>
2018-09-07 11:58:59 +08:00
Michael Crosby 70ca035aa6
Merge pull request #1883 from lifubang/containeridinpath
fix delete other file bug when container id is ..
2018-09-05 13:43:21 -04:00
Mrunal Patel 9cda583235
Merge pull request #1832 from giuseppe/runc-drop-invalid-proc-destination-with-chroot
linux: drop check for /proc as invalid dest
2018-09-04 09:26:21 -07:00
Lifubang 4eb30fcdbe code optimization: use securejoin.SecureJoin and CleanPath
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-09-04 09:02:18 +08:00
Lifubang 4fae8fcce2 code optimization after review
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-09-03 23:27:31 +08:00
Lifubang d2d226e8f9 fix unexpected delete bug when container id is ..
Signed-off-by: Lifubang <lifubang@acmcoder.com>
2018-08-31 11:17:42 +08:00
ChangFeng 3ce8fac7c4 libcontainer: add /proc/loadavg to the white list of bind mount
Signed-off-by: JunLi <lijun.git@gmail.com>
2018-08-30 21:30:23 +08:00
Giuseppe Scrivano 636b664027
linux: drop check for /proc as invalid dest
it is now allowed to bind mount /proc.  This is useful for rootless
containers when the PID namespace is shared with the host.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-08-30 09:56:18 +02:00
Michael Crosby 1555a78945
Merge pull request #1874 from mrunalp/drop_unused_code
Remove unused veth setup code
2018-08-27 11:07:25 -04:00
Qiang Huang 0228707b77
Merge pull request #1873 from rhatdan/ms_move
When doing a copyup, /tmp can not be a shared mount point
2018-08-27 10:08:53 +08:00
Mrunal Patel fe3d5c4c6e Remove unused veth setup code
Networking is setup by plugins for users of runc so it makes sense
to get rid of the veth strategy.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-08-24 15:41:52 -07:00
Adrian Reber fa43a72aba
criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

there is the expectation that the restored container is again running in
the network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc, where
during checkpointing that specific namespace is referenced and during
restore CRIU tries to restore the container in exactly that
namespace.

This breaks/fixes current runc behavior. If, without this patch, runc
restores a container with such a network namespace definition, it is
ignored and CRIU recreates a network namespace without a name.

With this patch runc uses the network namespace path (if available) to
checkpoint and restore the container in just that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if CRIU older than 3.11 is
installed.

Fixes #1786

Related to https://github.com/projectatomic/libpod/pull/469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-08-22 23:27:20 +02:00
Daniel J Walsh 62a4763a7a
When doing a copyup, /tmp can not be a shared mount point
MOVE_MOUNT will fail under certain situations.

You are not allowed to MS_MOVE if the parent directory is shared.

man mount
...
   The move operation
       Move a mounted tree to another place (atomically).  The call is:

              mount --move olddir newdir

       This  will cause the contents which previously appeared under olddir to
       now be accessible under newdir.  The physical location of the files  is
       not changed.  Note that olddir has to be a mountpoint.

       Note  also that moving a mount residing under a shared mount is invalid
       and unsupported.  Use findmnt -o TARGET,PROPAGATION to see the  current
       propagation flags.

Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2018-08-20 17:41:06 -04:00
Aleksa Sarai 20aff4f048
merge branch 'pr-1867'
Revert "libcontainer/rootfs_linux: minor cleanup"

LGTMs: @hqhq @cyphar
Closes #1867
2018-08-15 15:42:56 +10:00
Mrunal Patel 26ec8a9783 Revert "libcontainer/rootfs_linux: minor cleanup"
This reverts commit 1b27db67f1.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-08-14 15:50:18 -07:00
Marco Vedovati 34ed62697b Update outdated nsenter README content
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
2018-08-07 17:53:56 +02:00
Michael Crosby 4056a41f58
Merge pull request #1830 from crosbymichael/procs
Pass GOMAXPROCS to init processes
2018-08-01 10:48:06 -04:00
Jay Kamat a2faaa1317
Fix duplicate entries and missing entries in getCgroupMountsHelper
Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
2018-07-31 20:12:18 -07:00
Alban Crequy 3321aa1af7 Fix regression with mounts with non-absolute source path
PR #1753 introduced a test on the mount flags but the binary operator
was wrong, see https://github.com/opencontainers/runc/pull/1753#discussion_r203445652

This was noticed when investigating https://github.com/opencontainers/runtime-tools/issues/651

Symptoms: in the container, /proc/self/mountinfo displays some mounts as
follow:

296 279 0:67 / /tmp rw,nosuid - tmpfs /home/dpark/go/src/github.com/opencontainers/runc/tmpfs rw,size=65536k,mode=755

Signed-off-by: Alban Crequy <alban@kinvolk.io>
2018-07-18 18:30:49 +02:00
Michael Crosby 53fddb540a Pass GOMAXPROCS to init processes
This will help runc's init to not spawn many threads on large systems when
launched with max procs by the caller.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-06-26 11:23:37 -04:00
Michael Crosby 2c632d1a2d
Merge pull request #1824 from cyphar/fix-mips-build-devNumber
libcontainer: devices: fix mips builds
2018-06-25 13:21:28 -04:00
Jay Kamat e5a7c61f3c Add test for testing cgroup mounts on bedrock linux
Add a mountinfo from a bedrock linux system with 4 strata, and include
it for tests

Signed-off-by: Jay Kamat <jaygkamat@gmail.com>
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:01:07 +01:00
Daniel Dao 5ee0648bfb Stop relying on number of subsystems for cgroups
When there are complicated mount setups, there can be multiple mount
points which have the subsystem we are looking for. Instead of
counting the mountpoints, tick off subsystems until we have found them
all.

Without the 'all' flag, ignore duplicate subsystems after the first.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2018-06-24 00:00:58 +01:00
Aleksa Sarai 823c06eae9
libcontainer: improve "kernel.{domainname,hostname}" sysctl handling
These sysctls are namespaced by CLONE_NEWUTS, and we need to use
"kernel.domainname" if we want users to be able to set an NIS domainname
on Linux. However we disallow "kernel.hostname" because it would
conflict with the "hostname" field and cause confusion (but we include a
helpful message to make it clearer to the user).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-06-18 21:48:04 +10:00
Aleksa Sarai a0e99e7a1a
libcontainer: devices: fix mips builds
It turns out that MIPS uses uint32 in the device number returned by
stat(2), so explicitly wrap everything to make the compiler happy. I
really wish that Go had C-like numeric type promotion.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-06-17 11:22:01 +10:00
Mrunal Patel ad0f525506
Merge pull request #1819 from tiborvass/fix-arm32bit
libcontainer: fix compilation on GOARCH=arm GOARM=6 (32 bits)
2018-06-15 07:06:50 -07:00
Tibor Vass c205e9fb64 libcontainer: fix compilation on GOARCH=arm GOARM=6 (32 bits)
This fixes the following compilation error on 32bit ARM:
```
$ GOARCH=arm GOARCH=6 go build ./libcontainer/system/
libcontainer/system/linux.go:119:89: constant 4294967295 overflows int
```

Signed-off-by: Tibor Vass <tibor@docker.com>
2018-06-14 18:33:14 +00:00
Giuseppe Scrivano cbcc85d311
runc: not require uid/gid mappings if euid()==0
When running in a new unserNS as root, don't require a mapping to be
present in the configuration file.  We are already skipping the test
for a new userns to be present.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2018-06-12 12:45:54 +02:00
Aleksa Sarai dd56ece823
merge branch 'pr-1812'
Fix race in runc exec

LGTMs: @dqminh @cyphar
Closes #1812
2018-06-04 19:02:33 +10:00
Daniel, Dao Quang Minh 2e91544060
Merge pull request #1806 from cyphar/cgroup-ignorable-error-fixup
cgroup: clean up isIgnorableError for skippable EROFS
2018-06-02 23:57:02 +01:00
Mrunal Patel bd3c4f844a Fix race in runc exec
There is a race in runc exec when the init process stops just before
the check for the container status. It is then wrongly assumed that
we are trying to start an init process instead of an exec process.

This commit add an Init field to libcontainer Process to distinguish
between init and exec processes to prevent this race.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2018-06-01 16:25:58 -07:00
Michael Crosby 0e561642f8
Merge pull request #1688 from AkihiroSuda/unshare-m-r
main: support rootless mode in userns
2018-05-29 15:41:17 -04:00
Aleksa Sarai 939d5a3753
cgroup: clean up isIgnorableError for skippable EROFS
Include a rootless argument for isIgnorableError to avoid people
accidentally using isIgnorableError when they shouldn't (we don't ignore
any errors when running as root as that really isn't safe).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2018-05-25 11:31:41 +10:00
Qiang Huang dd67ab10d7
Merge pull request #1759 from cyphar/rootless-erofs-as-eperm
rootless: cgroup: treat EROFS as a skippable error
2018-05-25 09:24:16 +08:00
Daniel, Dao Quang Minh 2e931185f9
Merge pull request #1805 from derekwaynecarr/systemd-cpuquota-fix
fix systemd cpu quota for -1
2018-05-24 11:24:27 +01:00
Akihiro Suda c93815738a libcontainer: remove extra CAP_SETGID check for SetgroupAttr
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-24 14:59:30 +09:00
Derek Carr b515963c10 systemd cpu quota ignores -1
Signed-off-by: Derek Carr <decarr@redhat.com>
2018-05-23 14:28:39 -04:00
Michael Crosby fd0febd3ce Wrap error messages during init
Fixes #1437

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2018-05-10 10:28:10 -04:00
Akihiro Suda f103de57ec main: support rootless mode in userns
Running rootless containers in userns is useful for mounting
filesystems (e.g. overlay) with mapped euid 0, but without actual root
privilege.

Usage: (Note that `unshare --mount` requires `--map-root-user`)

  user$ mkdir lower upper work rootfs
  user$ curl http://dl-cdn.alpinelinux.org/alpine/v3.7/releases/x86_64/alpine-minirootfs-3.7.0-x86_64.tar.gz | tar Cxz ./lower || ( true; echo "mknod errors were ignored" )
  user$ unshare --mount --map-root-user
  mappedroot# runc spec --rootless
  mappedroot# sed -i 's/"readonly": true/"readonly": false/g' config.json
  mappedroot# mount -t overlay -o lowerdir=./lower,upperdir=./upper,workdir=./work overlayfs ./rootfs
  mappedroot# runc run foo

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-10 12:16:43 +09:00
Akihiro Suda 9c7d8bc1fd libcontainer: add parser for /etc/sub{u,g}id and /proc/PID/{u,g}id_map
Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2018-05-10 12:16:43 +09:00
Mrunal Patel 0cbfd8392f
Merge pull request #1562 from cyphar/carry-975-959-ipc-uid-namespaces
nsenter: improve namespace creation and SELinux IPC handling
2018-04-26 14:12:33 -07:00
Mrunal Patel 871ba2e58e
Merge pull request #1781 from filbranden/systemd3
Make channel for StartTransientUnit buffered
2018-04-24 11:56:34 -07:00
Michael Crosby bdbb9fab07
Merge pull request #1693 from AkihiroSuda/leave-setgroups-allow
libcontainer: allow setgroup in rootless mode
2018-04-24 11:24:04 -04:00
Michael Crosby 1f11dc5dba
Merge pull request #1785 from dlorenc/seccomp
Make the setupSeccomp function public.
2018-04-19 16:00:54 -04:00