Commit Graph

436 Commits

Author SHA1 Message Date
Mrunal Patel a5847db387
Merge pull request #2506 from kolyshkin/cgroup-fixes
cgroupv1 removal nits
2020-08-17 21:13:31 -07:00
Mrunal Patel b70de388e4
Merge pull request #2540 from kolyshkin/unify-test-inval-cgroup
cgroups/fs tests: unify TestInvalid*Cgroup*
2020-08-17 11:40:44 -07:00
Akihiro Suda f668854938
Merge pull request #2499 from kolyshkin/find-cgroup-mountpoint-fastpath
cgroupv1/FindCgroupMountpoint: add a fast path
2020-08-04 14:06:41 +09:00
Kir Kolyshkin 637d54b7ce cgroups/fs tests: unify TestInvalid*Cgroup*
All the test cases are doing the same checks, only input differs,
so we can unify those using a test data table.

While at it:
 - use t.Fatalf where it makes sense (no further checks are possible);
 - remove the "XXX" comments as we won't get rid of cgroup Name/Parent.

PS I tried using t.Parallel() as well but it did not result in any
noticeable speedup, so I dropped it for simplicity.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-31 18:02:05 -07:00
Akihiro Suda d6f5641c20
Merge pull request #2507 from kolyshkin/alt-to-2497
libct/cgroups/GetCgroupRoot: make it faster
2020-07-31 11:43:38 +09:00
Mrunal Patel 46243fcea1
Merge pull request #2500 from kolyshkin/fs-apply
libct/cgroups/fs: rework Apply()
2020-07-30 16:39:53 -07:00
Kir Kolyshkin e0c0b0cf32 libct/cgroups/GetCgroupRoot: make it faster
...by checking the default path first.

Quick benchmark shows it's about 5x faster on an idle system, and the
gain should be much more on a system doing mounts etc.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-30 13:45:21 -07:00
Mrunal Patel cf1273abf4
Merge pull request #2498 from kolyshkin/v1-code-cleanups
libct/cgroups/fs: code cleanups
2020-07-09 15:58:06 -07:00
Kir Kolyshkin a73ce38d16 cgroupv1/FindCgroupMountpoint: add a fast path
In case cgroupPath is under the default cgroup prefix, let's try to
guess the mount point by adding the subsystem name to the default
prefix, and resolving the resulting path in case it's a symlink.

In most cases, given the default cgroup setup, this trick
should result in returning the same result faster, and avoiding
/proc/self/mountinfo parsing which is relatively slow and problematic.

Be very careful with the default path, checking it is
 - a directory;
 - a mount point;
 - has cgroup fstype.

If something is not right, fall back to parsing mountinfo.

While at it, remove the obsoleted comment about mountinfo parsing.  The
comment belongs to findCgroupMountpointAndRootFromReader(), but rather
than moving it there, let's just remove it, since it does not add any
value in understanding the current code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-07 13:57:33 -07:00
Kir Kolyshkin c1adc99a20 cgroup/fs: rework Apply()
In manager.Apply() method, a path to each subsystem is obtained by
calling d.path(sys.Name()), and the sys.Apply() is called that does
the same call to d.path() again.

d.path() is an expensive call, so rather than to call it twice, let's
reuse the result.

This results the number of times we parse mountinfo during container
start from 62 to 34 on my setup.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-07 10:58:37 -07:00
Aleksa Sarai 819fcc687e
merge branch 'pr-2495'
Kir Kolyshkin (1):
  cgroups/fs/path: optimize

LGTMs: @mrunalp @cyphar
Closes #2495
2020-07-07 11:51:06 +10:00
Kir Kolyshkin 2a322e91ec cgroupv1: remove subsystemSet.Get()
Instead of iterating over m.paths, iterate over subsystems and look up
the path for each. This is faster since a map lookup is faster than
iterating over the names in Get. A quick benchmark shows that the new
way is 2.5x faster than the old one.

Note though that this is not done to make things faster, as savings are
negligible, but to make things simpler by removing some code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 18:31:46 -07:00
Kir Kolyshkin daf30cb7ca cgroups/fs: rm getSubsystems
It does not add any value.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 18:29:14 -07:00
Kir Kolyshkin 2e22579946 libct/cgroups/fs.GetStats: drop PathExists check
Half of controllers' GetStats just return nil, and most of the others
ignore ENOENT on files, so it will be cheaper to not check that the
path exists in the main GetStats method, offloading that to the
controllers.

Drop PathExists check from GetStats, add it to those controllers'
GetStats where it was missing.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 18:02:17 -07:00
Kir Kolyshkin 11fb94965c cgroups/fs: rm Remove method from controllers
To my surprise, those are not used anywhere in the code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 18:02:17 -07:00
Kir Kolyshkin 19be8e5ba5 libct/cgroups.RemovePaths: speedup
Using os.RemoveAll has the following two issues:

 1. it tries to remove all files, which does not make sense for cgroups;
 2. it tries rm(2) which fails to directories, and then rmdir(2).

Let's reuse our RemovePath instead, and add warnings and errors logging.

PS I am somewhat hesitant to remove the weird checking my means of stat,
as it might break something. Unfortunately, neither commit 6feb7bda04
nor the PR it contains [1] do not explain what kind of weird errors were
seen from os.RemoveAll. Most probably our code won't return any bogus
errors, but let's keep the old code to be on the safe side.

[1] https://github.com/docker-archive/libcontainer/pull/308

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin 3f14242e0a libct/cgroups: move RemovePath from fs2
This is to be used by RemovePaths.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Kir Kolyshkin 254d23b964 libc/cgroups: empty map in RemovePaths
RemovePaths() deletes elements from the paths map for paths that has
been successfully removed.

Although, it does not empty the map itself (which is needed that AFAIK
Go garbage collector does not shrink the map), but all its callers do.

Move this operation from callers to RemovePaths.

No functional change, except the old map should be garbage collected now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-06 17:54:44 -07:00
Mrunal Patel 30dc54a995
Merge pull request #2503 from giuseppe/cgroup-fixes
cgroup, systemd: cleanup cgroups
2020-07-06 15:14:29 -07:00
Mrunal Patel 3f81131845
Merge pull request #2490 from kolyshkin/dev-opt
libct/cgroups: add SkipDevices to Resources
2020-07-06 14:28:30 -07:00
Giuseppe Scrivano 32034481ea
cgroup, systemd: cleanup cgroups
some hierarchies were created directly by .Apply() on top of systemd
managed cgroups.  systemd doesn't manage these and as a result we leak
these cgroups.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-06 23:06:16 +02:00
Giuseppe Scrivano 2deaeab08f
cgroup: store the result of IsRunningSystemd
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-05 12:42:27 +02:00
Kir Kolyshkin 62a30709d2 cgroups/fs/path: optimize
The result of cgroupv1.FindCgroupMountpoint() call (which is relatively
expensive) is only used in case raw.innerPath is absolute, so it only
makes sense to call it in that case.

This drastically reduces the number of calls to FindCgroupMountpoint
during container start (from 116 to 62 in my setup).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-03 14:07:27 -07:00
Kir Kolyshkin 46b26bc05d cgroups/fs/Freeze: simplify
In here, defer looks like an overkill, since the code is very simple and
we already have an error path.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-03 14:02:57 -07:00
Kir Kolyshkin cd479f9d14 cgroupv1/freezer: don't use subsystemSet.Get()
Iterating over the list of subsystems and comparing their names to get an
instance of fs.cgroupFreezer is useless and a waste of time, since it is
a shallow type (i.e. does not have any data/state) and we can create an
instance in place.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-03 14:00:44 -07:00
Kir Kolyshkin 108ee85b82 libct/cgroups: add SkipDevices to Resources
The kubelet uses libct/cgroups code to set up cgroups. It creates a
parent cgroup (kubepods) to put the containers into.

The problem (for cgroupv2 that uses eBPF for device configuration) is
the hard requirement to have devices cgroup configured results in
leaking an eBPF program upon every kubelet restart.  program. If kubelet
is restarted 64+ times, the cgroup can't be configured anymore.

Work around this by adding a SkipDevices flag to Resources.

A check was added so that if SkipDevices is set, such a "container"
can't be started (to make sure it is only used for non-containers).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-07-02 15:19:31 -07:00
Kir Kolyshkin e643db6e0f
Merge pull request #2479 from haircommander/fix-systemd-version
systemd: parse systemdVersion when only an int is returned

LGTMS: @mrunalp @kolyshkin
2020-06-19 12:19:16 -07:00
Peter Hunt 6a0f64e7c9 systemd: add unit tests for systemdVersion
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2020-06-18 22:30:50 -04:00
Peter Hunt 6369e38871 systemd: parse systemdVersion in more situations
there have been cases observed where instead of `v$VER.0-$OS` the systemdVersion returned is just `$VER`, or `$VER-1`.
handle these cases

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2020-06-18 22:30:50 -04:00
Kir Kolyshkin 89516d17dd libct/cgroups/readProcsFile: ret errorr if scan failed
Not sure why but the errors from scanner were ignored. Such errors
can happen if open(2) has succeeded but the subsequent read(2) fails.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-17 12:33:01 -07:00
Mrunal Patel 406298fdf0
Merge pull request #2466 from kolyshkin/systemd-cpu-quota-period
cgroups/systemd: add setting CPUQuotaPeriod prop
2020-06-17 12:03:30 -07:00
Kir Kolyshkin e751a168dc cgroups/systemd: add setting CPUQuotaPeriod prop
For some reason, runc systemd drivers (both v1 and v2) never set
systemd unit property named `CPUQuotaPeriod` (known as
`CPUQuotaPeriodUSec` on dbus and in `systemctl show` output).

Set it, and add a check to all the integration tests. The check is less
than trivial because, when not set, the value is shown as "infinity" but
when set to the same (default) value, shown as "100ms", so in case we
expect 100ms (period = 100000 us), we have to _also_ check for
"infinity".

[v2: add systemd version checks since CPUQuotaPeriod requires v242+]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 15:48:06 -07:00
Kir Kolyshkin 8c5a19f79b libct/cgroups/fs: rename some files
no changes, just a few git renames

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:54 -07:00
Kir Kolyshkin cec5ae7c2d libct/cgroupv1/getCgroupMountsHelper: minor nit
It is easy to just use TrimPrefix which does nothing in case the prefix
does not exist.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:50 -07:00
Kir Kolyshkin 0626c150c1 libct/cgroupv1: fix TestGetCgroupMounts test cases
When testing GetCgroupMounts, the map data is supposed to be obtained
from /proc/self/cgroup, but since we're mocking things, we provide
our own map.

Unfortunately, not all controllers existing in mountinfos were listed.
Also, "name=systemd" needs special handling, so add it.

The controllers added were:

 * for fedoraMountinfo case: name=systemd
 * for systemdMountinfo case: name=systemd, net_prio
 * for bedrockMountinfo case: name=systemd, net_prio, pids

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:30 -07:00
Kir Kolyshkin 0681d456fc libct/cgroups/utils: move cgroup v1 code to separate file
In most project, "utils" is a big mess, and this is not an exception.
Try to clean it up a bit by moving cgroup v1 specific code to a separate
source file.

There are no code changes in this commit, just moving it from one file
to another.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:45:07 -07:00
Kir Kolyshkin 7db2d3e146 libcontainer/cgroups: rm FindCgroupMountpointDir
This function is cgroupv1-specific, is only used once, and its name
is very close to the name of another function, FindCgroupMountpoint.

Inline it into the (only) caller.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:15 -07:00
Kir Kolyshkin d244b4058e libct/cgroups: improve ParseCgroupFile docs
In particular, state that for cgroup v2 the result is very different.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:08 -07:00
Kir Kolyshkin 5785aabc13 libct/cgroups: make isSubsystemAvailable v1-specific
This function is only called from cgroupv1 code, so there is no need
for it to implement cgroupv2 stuff.

Make it v1-specific, and panic if it is called from v2 code (since this
is an internal function, the panic would mean incorrect runc code).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:40:04 -07:00
Kir Kolyshkin 142d0f2d5d libct/cgroups/utils: make FindCgroupMountpoint* v1-specific
It's bad and wrong to use these functions for any cgroupv2 code,
and there are no existing users (in runc, at least).

Make them return an error in such case.

Also, remove the cgroupv2-specific handling from
findCgroupMountpointAndRootFromReader().

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:58 -07:00
Kir Kolyshkin 44b75e760e libct/cgroups: separate getCgroupMountsV1
This function should not really be used for cgroupv2 code.
Currently it is used in kubernetes code, so we can't remove
the v2 case yet.

Add a TODO item to remove v2 code once kubernetes is converted
to not use it, and separate out v1 code.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-16 12:39:06 -07:00
Kir Kolyshkin 3834222d88 libct/cgroups/utils: getControllerPath return err for v2
This function is not used and were never used in any cgroupv2 code.

To have it stay that way, let it return error in case it's called
for v2.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 20:23:59 -07:00
Kir Kolyshkin dd2426d067 libct/cgroups: fix m.paths map access
This fixes a few cases of accessing m.paths map directly without holding
the mutex lock.

Fixes: 9087f2e82
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 18:30:16 -07:00
Kir Kolyshkin a77d7b1d0f libct: don't use GetPaths
Since commit 714c91e9f7, method GetPaths() should only be used
for saving container state. For other uses, we have a new method,
Path(), which is cleaner.

Fix GetPaths() usage introduced by recent commits 859a780d6f and 9087f2e82.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-15 18:27:34 -07:00
Kir Kolyshkin 5b247e739c
Merge pull request #2338 from lifubang/systemdcgroupv2
fix path error in systemd when stopped

LGTMs: @mrunalp @AkihiroSuda
2020-06-15 18:01:13 -07:00
Katarzyna Kujawa 71e63de4a3 Fix #2469 omit memory.numa_stat when not available
Signed-off-by: Katarzyna Kujawa <katarzyna.kujawa@intel.com>
2020-06-15 11:39:34 +02:00
Kir Kolyshkin a92b0327ce cgroups/systemd: fix set CPU quota if period is unset
systemd drivers ignore --cpu-quota during update if the CPU
period was not set earlier.

Fixed by adding the default for the period.

The test will be added by the following commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:32:17 -07:00
Kir Kolyshkin 4189cb65f8 cgroups: remove cgroup.Resources.CpuMax
This (and the converting function) is only used by one of the four
cgroup drivers. The other three do some checking and conversion in
place, so let the fs2 do the same.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:15:38 -07:00
Kir Kolyshkin 8b9646775e cgroups/systemd: unify adding CpuQuota
The code that adds CpuQuotaPerSecUSec is the same in v1 and v2
systemd cgroup driver. Move it to common.

No functional change.

Note that the comment telling that we always set this property
contradicts with the current code, and therefore it is removed.

[v2: drop cgroupv1-specific comment]
[v3: drop returning error as it's not used]
[v4: remove an obsoleted comment]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-09 17:14:43 -07:00
Kir Kolyshkin 2ce20ed158 cgroups/systemd: simplify gen*ResourcesProperties
Use r instead of c.Resources for readability. No functional change.

This commit has been brought to you by '<,'>s/c\.Resources\./r./g

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-08 13:42:09 -07:00