Commit Graph

1370 Commits

Author SHA1 Message Date
Mrunal Patel f1eea9051c
Merge pull request #2275 from kolyshkin/scan-nits
bifio.Scan.Err usage nits
2020-03-27 11:41:06 -07:00
Mrunal Patel 53ad1d5100
Merge pull request #2256 from kolyshkin/mountinfo-alt
Use faster mountinfo parser (part 1)
2020-03-27 11:36:51 -07:00
Mrunal Patel 75ff40cd28
Merge pull request #2273 from kolyshkin/v2-untangle
cgroup v2 cleanups
2020-03-27 11:21:36 -07:00
Kir Kolyshkin aab2c8ba52 libcontainer/intelrdt: optimize parseCpuInfoFile
The line we are parsing looks like this

> flags		: fpu vme de pse <...>

so look for "flags" as a prefix, not substring.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-27 00:41:11 -07:00
Kir Kolyshkin 0af5cd2041 Nit: fix use of bufio.Scanner.Err
The Err() method should be called after the Scan() loop, not inside it.

Found by

 git grep -A3 -F '.Scan()' | grep Err

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-27 00:12:17 -07:00
Qiang Huang d4a6a1d998
Merge pull request #2258 from masters-of-cats/eintr-retry
Retry writing to cgroup files on EINTR error
2020-03-27 11:21:41 +08:00
Kir Kolyshkin b45db5d3b2 libcontainer/cgroup: obsolete Get*Cgroup for v2
These functions should not be called from any code handling
the cgroup2 unified hierarchy.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:20:00 -07:00
Kir Kolyshkin a949e4f22f cgroupv2: UnifiedManager.Apply: simplify
Remove joinCgroupsV2() function, as its name and second parameter
are misleading. Use createCgroupsv2Path() directly, do not call
getv2Path() twice.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:20:00 -07:00
Kir Kolyshkin 5406833a65 cgroupv2/systemd: add getv2Path
Function getSubsystemPath(), while works for v2 unified case, is
suboptimal, as it does a few unnecessary calls.

Add a simplified version of getSubsystemPath(), called getv2Path(),
and use it.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 19:17:09 -07:00
Kir Kolyshkin ec1f957b23 cgroupv2: don't use getSubsystemPath in Apply
This code is a copy-paste from cgroupv1 systemd code. Its aim
is to check whether a subsystem is available, and skip those
that are not.

In case v2 unified hierarchy is used, getSubsystemPath never
returns "not found" error, so calling it is useless.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 13:32:34 -07:00
Kir Kolyshkin 6905b72154 cgroupv2: use "max" for negative values
Cgroup v1 kernel doc [1] says:

> We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.

and cgroup v2 kernel documentation [2] says:

> - If a controller implements an absolute resource guarantee and/or
>  limit, the interface files should be named "min" and "max"
>  respectively.  If a controller implements best effort resource
>  guarantee and/or limit, the interface files should be named "low"
>  and "high" respectively.
>
>  In the above four control files, the special token "max" should be
>  used to represent upward infinity for both reading and writing.

Allow -1 value to still be used for v2, converting it to "max"
where it makes sense to do so.

This fixes the following issue:

> runc update test_update --memory-swap -1:
> error while setting cgroup v2: [write /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/memory.swap.max: invalid argument
> failed to write "-1" to "/sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/memory.swap.max"
> github.com/opencontainers/runc/libcontainer/cgroups/fscommon.WriteFile
> 	/home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fscommon/fscommon.go:21
> github.com/opencontainers/runc/libcontainer/cgroups/fs2.setMemory
> 	/home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fs2/memory.go:20
> github.com/opencontainers/runc/libcontainer/cgroups/fs2.(*manager).Set
> 	/home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/fs2/fs2.go:175
> github.com/opencontainers/runc/libcontainer/cgroups/systemd.(*UnifiedManager).Set
> 	/home/kir/go/src/github.com/opencontainers/runc/libcontainer/cgroups/systemd/unified_hierarchy.go:290
> github.com/opencontainers/runc/libcontainer.(*linuxContainer).Set
> 	/home/kir/go/src/github.com/opencontainers/runc/libcontainer/container_linux.go:211

[1] linux/Documentation/admin-guide/cgroup-v1/memory.rst
[2] linux/Documentation/admin-guide/cgroup-v2.rst

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-26 11:14:32 -07:00
Mrunal Patel 96596cbbec
Merge pull request #2270 from kolyshkin/systemd-no-kmem
cgroupv2: don't try to set kmem for systemd case
2020-03-25 21:39:52 -07:00
Kir Kolyshkin a675b5ebea cgroupv2: don't try to set kmem for systemd case
To the best of my knowledge, it has been decided to drop the kernel
memory controller from the cgroupv2 hierarchy, so "kernel memory limits"
do not exist if we're using v2 unified.

So, we need to ignore kernel memory setting. This was already done in
non-systemd case (see commit 88e8350de), let's do the same for systemd.

This fixes the following error:

> container_linux.go:349: starting container process caused "process_linux.go:306: applying cgroup configuration for process caused \"open /sys/fs/cgroup/machine.slice/runc-cgroups-integration-test.scope/tasks: no such file or directory\""

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-25 20:00:23 -07:00
Mrunal Patel be51398a8a
Merge pull request #2193 from milkwine/fix-readSync
fix readSync
2020-03-24 14:29:42 -07:00
Mrunal Patel 7de5db3dad
Merge pull request #2263 from kolyshkin/nits
Assorted minor nits in libcontainer
2020-03-24 14:17:22 -07:00
Akihiro Suda cc183ca662
Merge pull request #2242 from AkihiroSuda/vendor-systemd
vendor: update go-systemd and godbus
2020-03-25 02:40:22 +09:00
Mrunal Patel 3087d43bc8
Merge pull request #1826 from jingxiaolu/fix_specconv_process_nil
specconv: fix null spec.Process making runc panic
2020-03-23 21:07:06 -07:00
Kir Kolyshkin dd7b34618f libct/msMoveRoot: benefit from GetMounts filter
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-21 10:33:43 -07:00
Kir Kolyshkin fc4357a8b0 libct/msMoveRoot: rm redundant filepath.Abs() calls
1. rootfs is already validated to be kosher by (*ConfigValidator).rootfs()

2. mount points from /proc/self/mountinfo are absolute and clean, too

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-21 10:33:43 -07:00
Kir Kolyshkin dce0de8975 getParentMount: benefit from GetMounts filter
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-21 10:33:43 -07:00
Kir Kolyshkin 81d8452e30 libct/TestFactoryNewTmpfs: benefit from GetMounts
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-21 10:33:43 -07:00
Kir Kolyshkin c7ab2c036b libcontainer: switch to moby/sys/mountinfo package
Delete libcontainer/mount in favor of github.com/moby/sys/mountinfo,
which is fast mountinfo parser.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-21 10:33:43 -07:00
Kir Kolyshkin a572216f74 libcontainer/intelrdt: rm fmt.Sprintf
It it not needed as it does nothing here.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:33:24 -07:00
Kir Kolyshkin 5542a2c77d libcontainer/cgroups: GetAllPids: optimize
1. Return earlier if there is an error.

2. Do not use filepath.Split on every entry, use info.Name() instead.

3. Make readProcsFile() accept file name as an argument, to avoid
   unnecessary file name and directory splitting and merging.

4. Skip on info.IsDir() -- this avoids an error when cgroup name is
   set to "cgroup.procs".

This is still not very good since filepath.Walk() performs an unnecessary
stat(2) on every entry, but better than before.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 12:27:36 -07:00
Kir Kolyshkin 12dc475dd6 libcontainer: simplify createCgroupsv2Path
fmt.Sprintf is slow and is not needed here, string concatenation would
be sufficient. It is also redundant to convert []byte from string and
back, since `bytes` package now provides the same functions as `strings`.

Use Fields() instead of TrimSpace() and Split(), mainly for readability
(note Fields() is somewhat slower than Split() but here it doesn't
matter much).

Use Join() to prepend the plus signs.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-03-20 11:51:55 -07:00
Mario Nitchev 648295be98 Skip test for cgroups v2
Signed-off-by: Yulia Nedyalkova <julianedialkova@hotmail.com>
2020-03-19 12:54:54 +02:00
Danail Branekov f34eb2c003 Retry writing to cgroup files on EINTR error
Golang 1.14 introduces asynchronous preemption which results into
applications getting frequent EINTR (syscall interrupted) errors when
invoking slow syscalls, e.g. when writing to cgroup files.

As writing to cgroups is idempotent, it is safe to retry writing to the
file whenever the write syscall is interrupted.

Signed-off-by: Mario Nitchev <marionitchev@gmail.com>
2020-03-18 13:00:05 +02:00
SiYu Zhao 34d471769b fix readSync
Signed-off-by: SiYu Zhao <d.chaser.zsy@gmail.com>
2020-03-17 11:26:46 +08:00
Michael Crosby 939cd0b734
Merge pull request #1737 from wking/remove-procConsole-comment
libcontainer/sync: Drop procConsole transaction from comments
2020-03-16 14:00:00 -04:00
Michael Crosby 88474967d3
Merge pull request #1974 from openSUSE/unreachable-code
Remove unreachable code paths
2020-03-16 13:56:05 -04:00
Akihiro Suda 525b9f311c
Merge pull request #2248 from AkihiroSuda/fix-cgroupv2-conversion
cgroup2: fix conversion
2020-03-16 14:00:02 +09:00
Akihiro Suda 492d525e55 vendor: update go-systemd and godbus
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-16 13:26:03 +09:00
Mrunal Patel 981dbef514
Merge pull request #2226 from avagin/runsc-restore-cmd-wait
restore: fix a race condition in process.Wait()
2020-03-15 18:48:16 -07:00
Akihiro Suda aa269315a4 cgroup2: add CpuMax conversion
Fix #2243

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:58:39 +09:00
Akihiro Suda 64e9a97981 cgroup2: fix conversion
* TestConvertCPUSharesToCgroupV2Value(0) was returning 70369281052672, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(0) was returning 32, while the correct value is 0
* ConvertBlkIOToCgroupV2Value(1000) was returning 4, while the correct value is 10000

Fix #2244
Follow-up to #2212 #2213

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-03-13 02:57:07 +09:00
Sascha Grunert b477a159db
Remove unreachable code paths
Signed-off-by: Sascha Grunert <sgrunert@suse.com>
2020-03-12 09:13:03 +01:00
Mrunal Patel 0ff53526a4
Merge pull request #2252 from pkagrawal/2251-fix
Synchronize the call to linuxContainer.Signal()
2020-03-11 11:11:56 -07:00
Akihiro Suda 71dfb559d6
Merge pull request #2238 from tedyu/init-proc-err-ret
Use named error return for initProcess#start
2020-03-11 01:03:13 +09:00
l00397676 62cfad97ca specconv: add a test case to check null spec.Process
Signed-off-by: l00397676 <lujingxiao@huawei.com>
2020-03-10 11:43:51 +08:00
Pradyumna Agrawal 5b2b138d24 Synchronize the call to linuxContainer.Signal()
linuxContainer.Signal() can race with another call to say Destroy()
which clears the container's initProcess. This can cause a nil pointer
dereference in Signal().

This patch will synchronize Signal() and Destroy() by grabbing the
container's mutex as part of the Signal() call.

Signed-off-by: Pradyumna Agrawal <pradyumnaa@vmware.com>
2020-03-09 11:15:22 -07:00
zyu 957da1f9ab Use named error return for initProcess#start
Signed-off-by: zyu <yuzhihong@gmail.com>
2020-03-09 09:29:03 -07:00
Akihiro Suda 6503438fd6
Merge pull request #2212 from Zyqsempai/2211-convert-blkio-weight-properly
Convert blkioWeight to io.weight properly
2020-03-05 09:32:45 +09:00
Aleksa Sarai 93e5c4d320
merge branch 'pr-2232'
Aleksa Sarai (1):
  libcontainer: dual-license nsenter/cloned_binary.c

LGTMs: @mrunalp @AkihiroSuda
Closes #2232
2020-03-04 11:10:49 +11:00
Qiang Huang 3b7e32feba
Merge pull request #2210 from Zyqsempai/2164-remove-deprecated-systemd-resources
Exchange deprecated systemd resources with the appropriate for cgroupv2
2020-02-29 10:13:55 +08:00
Aleksa Sarai 98de84265d
libcontainer: dual-license nsenter/cloned_binary.c
The new license is Apache-2.0 OR LPGL-2.1-or-later. This is necessary
for libcrun to be relicensed under the LGPL-2.1[1], and all of the
relevant copyright holders have agreed to relicense this code under the
dual license:

  * Aleksa Sarai [2]
  * Christian Brauner [3]
  * Justin Cormack [4]

Because it is still dual-licensed as an Apache-2.0 work, this doesn't
affect it's usability within runc or any other dependent projects.

[1]: https://github.com/containers/crun/issues/256
[2]: https://github.com/containers/crun/issues/256#issuecomment-589498088
[3]: https://github.com/containers/crun/issues/256#issuecomment-589605034
[4]: https://github.com/containers/crun/issues/256#issuecomment-589504231

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2020-02-22 00:17:07 +11:00
Aleksa Sarai 0f32b03dda
merge branch 'pr-2192'
Boris Popovschi (2):
  Fix skip message for cgroupv2
  Fix MAJ:MIN io.stat parsing order

LGTMs: @hqhq @cyphar
Closes #2192
2020-02-21 16:00:17 +11:00
Boris Popovschi 4b8134f63b Convert blkioWeight to io.weight properly
Signed-off-by: Boris Popovschi <zyqsempai@mail.ru>
2020-02-18 15:44:07 +02:00
Kir Kolyshkin 1cd71dfd71 systemd properties: support for *Sec values
Some systemd properties are documented as having "Sec" suffix
(e.g. "TimeoutStopSec") but are expected to have "USec" suffix
when passed over dbus, so let's provide appropriate conversion
to improve compatibility.

This means, one can specify TimeoutStopSec with a numeric argument,
in seconds, and it will be properly converted to TimeoutStopUsec
with the argument in microseconds. As a side bonus, even float
values are converted, so e.g. TimeoutStopSec=1.5 is possible.

This turned out a bit more tricky to implement when I was
originally expected, since there are a handful of numeric
types in dbus and each one requires explicit conversion.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-02-17 16:07:19 -08:00
Kir Kolyshkin 4c5c3fb960 Support for setting systemd properties via annotations
In case systemd is used to set cgroups for the container,
it creates a scope unit dedicated to it (usually named
`runc-$ID.scope`).

This patch adds an ability to set arbitrary systemd properties
for the systemd unit via runtime spec annotations.

Initially this was developed as an ability to specify the
`TimeoutStopUSec` property, but later generalized to work with
arbitrary ones.

Example usage: add the following to runtime spec (config.json):

```
	"annotations": {
		"org.systemd.property.TimeoutStopUSec": "uint64 123456789",
		"org.systemd.property.CollectMode":"'inactive-or-failed'"
	},
```

and start the container (e.g. `runc --systemd-cgroup run $ID`).

The above will set the following systemd parameters:
* `TimeoutStopSec` to 2 minutes and 3 seconds,
* `CollectMode` to "inactive-or-failed".

The values are in the gvariant format (see [1]). To figure out
which type systemd expects for a particular parameter, see
systemd sources.

In particular, parameters with `USec` suffix require an `uint64`
typed argument, while gvariant assumes int32 for a numeric values,
therefore the explicit type is required.

NOTE that systemd receives the time-typed parameters as *USec
but shows them (in `systemctl show`) as *Sec. For example,
the stop timeout should be set as `TimeoutStopUSec` but
is shown as `TimeoutStopSec`.

[1] https://developer.gnome.org/glib/stable/gvariant-text.html

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-02-17 16:07:19 -08:00
Mrunal Patel 81ef5024f8
Merge pull request #2213 from Zyqsempai/2166-convert-cpu-weight-poperly
Added conversion for cpu.weight v2
2020-02-17 07:49:39 -08:00