
1279 Commits

Author SHA1 Message Date
Radostin Stoyanov f017e0f9e1 checkpoint: Set descriptors.json file mode to 0600
Prevent unprivileged users from being able to read descriptors.json

Signed-off-by: Radostin Stoyanov <rstoyanov1@gmail.com>
2019-10-12 19:29:44 +01:00
Aleksa Sarai 1b8a1eeec3
merge branch 'pr-2132'
Support different field counts of cpuaact.stats

LGTMs: @crosbymichael @cyphar
Closes #2132
2019-10-02 01:50:47 +10:00
Aleksa Sarai d463f6485b
*: verify that operations on /proc/... are on procfs
This is an additional mitigation for CVE-2019-16884. The primary problem
is that Docker can be coerced into bind-mounting a file system on top of
/proc (resulting in label-related writes to /proc no longer happening).

While we are working on mitigations against permitting the mounts, this
helps prevent our code from being tricked into writing to non-procfs
files. This is not a perfect solution (after all, there might be a
bind-mount of a different procfs file over the target) but in order to
exploit that you would need to be able to tweak a config.json pretty
specifically (which thankfully Docker doesn't allow).

Specifically, this stops AppArmor from silently failing to label a
process because /proc/self/attr/... is incorrectly set, and stops any
accidental fd leaks because /proc/self/fd/... is not real.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-09-30 09:06:48 +10:00
tianye15 28e58a0f6a Support different field counts of cpuaact.stats
Signed-off-by: skilxnTL <tylxltt@gmail.com>
2019-09-29 10:20:58 +08:00
blacktop 84373aaa56 Add SCMP_ACT_LOG as a valid Seccomp action (#1951)
Signed-off-by: blacktop <blacktop@users.noreply.github.com>
2019-09-26 11:03:03 -04:00
Michael Crosby 331692baa7 Only allow proc mount if it is procfs
Fixes #2128

This allows proc to be bind mounted for host and rootless namespace use cases, but
it removes the ability to mount over the top of proc with a directory.

```bash
> sudo docker run --rm  apparmor
docker: Error response from daemon: OCI runtime create failed:
container_linux.go:346: starting container process caused "process_linux.go:449:
container init caused \"rootfs_linux.go:58: mounting
\\\"/var/lib/docker/volumes/aae28ea068c33d60e64d1a75916cf3ec2dc3634f97571854c9ed30c8401460c1/_data\\\"
to rootfs
\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged\\\"
at \\\"/proc\\\" caused
\\\"\\\\\\\"/var/lib/docker/overlay2/a6be5ae911bf19f8eecb23a295dec85be9a8ee8da66e9fb55b47c841d1e381b7/merged/proc\\\\\\\"
cannot be mounted because it is not of type proc\\\"\"": unknown.

> sudo docker run --rm -v /proc:/proc apparmor

docker-default (enforce)        root     18989  0.9  0.0   1288     4 ?
Ss   16:47   0:00 sleep 20
```

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2019-09-24 11:00:18 -04:00
Jonathan Rudenberg af7b6547ec libcontainer/nsenter: Don't import C in non-cgo file
Signed-off-by: Jonathan Rudenberg <jonathan@titanous.com>
2019-09-11 17:03:07 +00:00
Giuseppe Scrivano 718a566e02
cgroup: support mount of cgroup2
Convert a "cgroup" mount to "cgroup2" when the system uses the cgroups v2
unified hierarchy.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-06 17:57:14 +02:00
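One way to picture the conversion in 718a566e02 is sketched below. It assumes that unified mode can be detected by checking whether /sys/fs/cgroup is a cgroup2 filesystem (`CGROUP2_SUPER_MAGIC` from `<linux/magic.h>`). The function names are hypothetical and the actual libcontainer logic may differ.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2UnifiedMode (hypothetical) reports whether /sys/fs/cgroup is a
// cgroup2 mount, which this sketch treats as "unified hierarchy in use".
func isCgroup2UnifiedMode() bool {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/sys/fs/cgroup", &st); err != nil {
		return false
	}
	return int64(st.Type) == cgroup2SuperMagic
}

// adjustCgroupMountType (hypothetical) rewrites a "cgroup" mount type to
// "cgroup2" on a unified host and leaves every other type untouched.
func adjustCgroupMountType(device string, unified bool) string {
	if unified && device == "cgroup" {
		return "cgroup2"
	}
	return device
}

func main() {
	unified := isCgroup2UnifiedMode()
	fmt.Println("unified hierarchy:", unified)
	fmt.Println("mount type:", adjustCgroupMountType("cgroup", unified))
}
```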
Sebastiaan van Stijn eb86f6037e
bump syndtr/gocapability d98352740cb2c55f81556b63d4a1ec64c5a319c2
relevant changes:

  - syndtr/gocapability#14 capability: Deprecate NewPid and NewFile for NewPid2 and NewFile2
  - syndtr/gocapability#16 Fix capHeader.pid type

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-09-06 01:44:26 +02:00
Mrunal Patel 92ac8e3f84
Merge pull request #2113 from giuseppe/cgroupv2
libcontainer: initial support for cgroups v2
2019-09-05 13:14:29 -07:00
Giuseppe Scrivano 524cb7c318
libcontainer: add systemd.UnifiedManager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:27 +02:00
Giuseppe Scrivano ec11136828
libcontainer, cgroups: rename systemd.Manager to LegacyManager
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:26 +02:00
Giuseppe Scrivano 1932917b71
libcontainer: add initial support for cgroups v2
Allow setting which subsystems are used by
libcontainer/cgroups/fs.Manager.

subsystemsUnified is used on a system running with cgroups v2 unified
mode.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2019-09-05 13:02:25 +02:00
Mrunal Patel 92d851e03b
Merge pull request #2123 from carlosedp/riscv64
Bump x/sys and update syscall for initial Risc-V support
2019-09-04 14:10:26 -07:00
Carlos de Paula 4316e4d047 Bump x/sys and update syscall to start Risc-V support
Signed-off-by: Carlos de Paula <me@carlosedp.com>
2019-08-29 12:09:08 -03:00
Akihiro Suda 0bc069d795 nsenter: fix clang-tidy warning
nsexec.c:148:3: warning: Initialized va_list 'args' is leaked [clang-analyzer-valist.Unterminated]

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-08-29 00:18:02 +09:00
Akihiro Suda b225ef58fb nsenter: minor clean up
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2019-08-28 19:50:35 +09:00
Daniel J Walsh e4aa73424b
Rename cgroups_windows.go to cgroups_unsupported.go
Signed-off-by: Daniel J Walsh <dwalsh@redhat.com>
2019-08-26 18:13:52 -04:00
Mrunal Patel c61c7370f9
Merge pull request #2103 from sipsma/cgnil
cgroups/fs: check nil pointers in cgroup manager
2019-08-26 14:05:44 -07:00
Mrunal Patel 68d73f0a2e
Merge pull request #2107 from sashayakovtseva/public-get-devices
Make get devices function public
2019-08-26 09:58:10 -07:00
Kenta Tada c740965a18 libcontainer: update masked paths of /proc
This commit updates the masked paths of /proc.

Related issues:
* https://github.com/moby/moby/pull/37404
* https://github.com/moby/moby/pull/38299
* https://github.com/moby/moby/pull/36368

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2019-08-26 12:25:56 +09:00
Mrunal Patel 3525eddec5
Merge pull request #2117 from filbranden/detection1
Remove libcontainer detection for systemd features
2019-08-25 13:15:15 -07:00
Filipe Brandenburger 518c855833 Remove libcontainer detection for systemd features
Transient units (and transient slice units) have been available for quite a
long time and RHEL 7 with systemd v219 (likely the oldest OS we care about at
this point) supports that. A system running a systemd without these features is
likely to break a lot of other stuff that runc/libcontainer care about.

Regarding delegated slices, modern systemd doesn't allow it and
runc/libcontainer run fine on it, so we might as well just stop requesting it
on older versions of systemd which allowed it. (Those versions never really
changed behavior significantly when that option was passed anyways.)

Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
2019-08-22 21:53:24 -07:00
Filipe Brandenburger 588f040a77 Avoid the dependency on cgo through go-systemd/util package
This dependency is only needed in package "github.com/coreos/go-systemd/util"
and we only use it for IsRunningSystemd(), which is a simple Go function that
just stats a file.

Let's just borrow it here, so we remove the dependency and can remove that
package from vendored build.

This also removes dependencies on dlopen and on trying to find libsystemd.so
or libsystemd-login.so in the system.

Tested that this still builds and works as expected.

Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
2019-08-22 21:07:24 -07:00
sashayakovtseva afc24792dc Make get devices function public
Signed-off-by: sashayakovtseva <sasha@sylabs.io>
2019-08-15 17:16:47 +03:00
Erik Sipsma 9c822e4847 cgroups/fs: check nil pointers in cgroup manager
Signed-off-by: Erik Sipsma <sipsma@amazon.com>
2019-08-14 09:50:45 -07:00
Mrunal Patel 2e94378464
Merge pull request #2094 from sipsma/2093-nodotudev
Skip searching /dev/.udev for device nodes.
2019-08-05 10:41:54 -07:00
Erik Sipsma f08cdaeec9 Skip searching /dev/.udev for device nodes.
Closes: #2093

Signed-off-by: Erik Sipsma <sipsma@amazon.com>
2019-07-31 19:41:33 +00:00
Andreas Stocker 808e809f8a doc: First process in container needs `Init: true`
`Init` on the `Process` struct specifies whether the process is the first process in the container. This needs to be set to `true` when running the container.

Signed-off-by: Andreas Stocker <astocker@anexia-it.com>
2019-07-29 22:24:28 +02:00
Mrunal Patel b4a0b1d737
Merge pull request #2065 from odinuge/master
Fix cgroup hugetlb size prefix for kB
2019-06-06 12:38:57 -07:00
Kenta Tada b54fd85bbf libcontainer: change seccomp test for clone syscall
This commit changes the value used in the seccomp test for the clone syscall.
Hardcoded values should also be replaced, because they make it unclear
which flags are being tested.

Related issues:

* https://github.com/containerd/containerd/pull/3314
* https://github.com/moby/moby/pull/39308
* https://github.com/opencontainers/runtime-tools/pull/694

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2019-06-04 18:52:00 +09:00
Odin Ugedal 6f77e35daf
Export list of HugePageSizeUnits
This will allow others to import it instead of copying it.

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 20:17:30 +02:00
Odin Ugedal c6445b1c1c
Add tests for GetHugePageSize
Add tests to avoid regressions

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-30 17:27:32 +02:00
Odin Ugedal 273e7b74a7
Fix cgroup hugetlb size prefix for kB
The hugetlb cgroup control files (introduced here in 2012:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=abb8206cb0773)
use "KB" and not "kB"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/hugetlb_cgroup.c?h=v5.0#n349).

The behavior in the kernel has not changed since the introduction, and
the current code using "kB" will therefore fail on devices with small
amounts of RAM (see
https://github.com/kubernetes/kubernetes/issues/77169) running a kernel
with config flag CONFIG_HUGETLBFS=y.

As seen from the code in "mem_fmt" inside hugetlb_cgroup.c, only "KB",
"MB" and "GB" are used, so the others may be removed as well.

Here is a real world example of the files inside the
"/sys/kernel/mm/hugepages/" directory:
- "hugepages-64kB"
- "hugepages-2048kB"
- "hugepages-32768kB"
- "hugepages-1048576kB"

And the corresponding cgroup files:
- "hugetlb.64KB._____"
- "hugetlb.2MB._____"
- "hugetlb.32MB._____"
- "hugetlb.1GB._____"

Signed-off-by: Odin Ugedal <odin@ugedal.com>
2019-05-29 21:52:43 +02:00
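The mapping in 273e7b74a7 between the /sys/kernel/mm/hugepages/ directory names and the cgroup file prefixes can be sketched as below. The sketch picks the unit the way the commit message describes the kernel's "mem_fmt" (only "KB", "MB" and "GB"). The function name `cgroupHugePageSize` is hypothetical and this is not the runc implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cgroupHugePageSize (hypothetical) converts a sysfs directory name such as
// "hugepages-2048kB" into the size string used in hugetlb cgroup file names,
// such as "2MB".
func cgroupHugePageSize(dirName string) (string, error) {
	s := strings.TrimPrefix(dirName, "hugepages-")
	if s == dirName || !strings.HasSuffix(s, "kB") {
		return "", fmt.Errorf("unexpected hugepage directory name %q", dirName)
	}
	kb, err := strconv.ParseUint(strings.TrimSuffix(s, "kB"), 10, 64)
	if err != nil {
		return "", err
	}
	size := kb << 10 // size in bytes
	switch {
	case size >= 1<<30:
		return fmt.Sprintf("%dGB", size>>30), nil
	case size >= 1<<20:
		return fmt.Sprintf("%dMB", size>>20), nil
	default:
		return fmt.Sprintf("%dKB", size>>10), nil
	}
}

func main() {
	for _, name := range []string{
		"hugepages-64kB",
		"hugepages-2048kB",
		"hugepages-32768kB",
		"hugepages-1048576kB",
	} {
		size, err := cgroupHugePageSize(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> hugetlb.%s\n", name, size)
	}
}
```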
Mrunal Patel 5ef781c2e7
Merge pull request #2061 from KentaTada/add-cgroup-namespace-test
libcontainer: fix TestGetContainerState to check configs.NEWCGROUP
2019-05-22 16:09:38 -07:00
Qiang Huang c8337777b6
Merge pull request #2042 from xiaochenshen/rdt-add-missing-destroy
libcontainer: intelrdt: add missing destroy handler in defer func
2019-05-21 09:48:00 +08:00
Kenta Tada 65032b55b1 libcontainer: fix TestGetContainerState to check configs.NEWCGROUP
This test needs to handle configs.NEWCGROUP
as a Namespace type.

Signed-off-by: Kenta Tada <Kenta.Tada@sony.com>
2019-05-21 09:10:38 +09:00
Mrunal Patel 2484581dd7
Merge pull request #2035 from cyphar/bindmount-types
specconv: always set "type: bind" in case of MS_BIND
2019-05-07 15:47:58 -07:00
Mrunal Patel a0ecf749ee
Merge pull request #2047 from filbranden/systemd7
Move systemd.Manager initialization into a function in that module
2019-05-07 15:08:41 -07:00
Filipe Brandenburger 46351eb3d1 Move systemd.Manager initialization into a function in that module
This will permit us to extend the internals of systemd.Manager to include
further information about the system, such as whether cgroupv1, cgroupv2 or
both are in effect.

Furthermore, it allows a future refactor of moving more of UseSystemd() code
into the factory initialization function.

Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
2019-05-01 13:22:19 -07:00
Georgi Sabev a146081828 Write logs to stderr by default
Minor refactoring to use the filePair struct for both init sock and log pipe

Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2019-04-24 15:18:14 +03:00
Georgi Sabev 68b4ff5b37 Simplify bail logic & minor nsexec improvements
Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2019-04-24 15:16:11 +03:00
Xiaochen Shen 17b37ea3fa libcontainer: intelrdt: add missing destroy handler in defer func
In the exception handling of initProcess.start(), we need to add the
missing IntelRdtManager.Destroy() handler in defer func.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2019-04-24 16:41:51 +08:00
Georgi Sabev 475aef10f7 Remove redundant log function
Bump logrus so that we can use logrus.StandardLogger().Logf instead

Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2019-04-22 17:54:55 +03:00
Georgi Sabev ba3cabf932 Improve nsexec logging
* Simplify logging function
* Logs contain __FUNCTION__:__LINE__
* Bail uses write_log

Co-authored-by: Julia Nedialkova <julianedialkova@hotmail.com>
Co-authored-by: Danail Branekov <danailster@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2019-04-22 17:53:52 +03:00
Aleksa Sarai 8296826da5
specconv: always set "type: bind" in case of MS_BIND
We discovered in umoci that setting a dummy type of "none" would result
in file-based bind-mounts no longer working properly, which is caused by
a restriction for when specconv will change the device type to "bind" to
work around rootfs_linux.go's ... issues.

However, bind-mounts don't have a type (and Linux will ignore any type
specifier you give it) because the type is copied from the source of the
bind-mount. So we should always overwrite it to avoid user confusion.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2019-04-08 15:08:08 +10:00
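The rule from 8296826da5 (a bind-mount always gets the type "bind", whatever type the user supplied) reduces to a small flag check. The sketch below uses a hypothetical function `mountDevice` and defines the `MS_BIND` value locally so that it builds on any platform. It is an illustration, not the specconv code.

```go
package main

import "fmt"

// msBind mirrors the Linux MS_BIND mount flag (0x1000).
const msBind = 0x1000

// mountDevice (hypothetical) returns the device type to use for a mount:
// always "bind" when MS_BIND is set, otherwise the type from the spec.
func mountDevice(specType string, flags uintptr) string {
	if flags&msBind != 0 {
		return "bind"
	}
	return specType
}

func main() {
	fmt.Println(mountDevice("none", msBind)) // a dummy type is overwritten
	fmt.Println(mountDevice("tmpfs", 0))     // non-bind mounts keep their type
}
```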
Danail Branekov c486e3c406 Address comments in PR 1861
Refactor configuring logging into a reusable component
so that it can be nicely used in both main() and init process init()

Co-authored-by: Georgi Sabev <georgethebeatle@gmail.com>
Co-authored-by: Giuseppe Capizzi <gcapizzi@pivotal.io>
Co-authored-by: Claudia Beresford <cberesford@pivotal.io>
Signed-off-by: Danail Branekov <danailster@gmail.com>
2019-04-04 14:57:28 +03:00
Marco Vedovati feebfac358 Remove pipe close before exec.
Closing the pipe before exec is not necessary because os.Pipe() calls pipe2
with the O_CLOEXEC option.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
2019-04-04 14:53:30 +03:00
Marco Vedovati 9a599f62fb Support for logging from children processes
Add support for logging from child processes (including nsexec).
A pipe is used to send logs from children to the parent in JSON.
The JSON format is the same as the one produced by the logrus JSON formatter,
i.e. child processes can use the standard logrus APIs.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
2019-04-04 14:53:23 +03:00
Michael Crosby 11fc498ffa
Merge pull request #2023 from LittleLightLittleFire/2022-fix-runc-zombie-process-regression
Fixes regression causing zombie runc:[1:CHILD] processes
2019-03-22 14:06:31 -04:00