Commit Graph

20 Commits

Author SHA1 Message Date
Xiaochen Shen 27560ace2f libcontainer: intelrdt: add support for Intel RDT/MBA in runc
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature
of Intel Resource Director Technology (RDT) which is supported on some
Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate
throttle over memory bandwidth for the software. A user controls the
resource by indicating the percentage of maximum memory bandwidth.

Hardware details of Intel RDT/MBA can be found in section 17.18 of
Intel Software Developer Manual:
https://software.intel.com/en-us/articles/intel-sdm

In Linux 4.12 kernel and newer, Intel RDT/MBA is enabled by kernel
config CONFIG_INTEL_RDT. If hardware support, CPU flags `rdt_a` and
`mba` will be set in /proc/cpuinfo.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

For MBA support for `runc`, we will reuse the infrastructure and code
base of Intel RDT/CAT which implemented in #1279. We could also make
use of `tasks` and `schemata` configuration for memory bandwidth
resource constraints.

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the
task ID to the "tasks" file (which will automatically remove them from
the previous group to which they belonged). New tasks created by
fork(2) and clone(2) are added to the same group as their parent.

The file `schemata` has a list of all the resources available to this
group. Each resource (L3 cache, memory bandwidth) has its own line and
format.

Memory bandwidth schema:
It has allocation values for memory bandwidth on each socket, which
contains L3 cache id and memory bandwidth percentage.
    Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

The minimum bandwidth percentage value for each CPU model is predefined
and can be looked up through "info/MB/min_bandwidth". The bandwidth
granularity that is allocated is also dependent on the CPU model and
can be looked up at "info/MB/bandwidth_gran". The available bandwidth
control steps are: min_bw + N * bw_gran. Intermediate values are
rounded to the next control step available on the hardware.

For more information about Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the minimum
memory bandwidth of 10% with a memory bandwidth granularity of 10%.
Tasks inside the container may use a maximum memory bandwidth of 20%
on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2018-10-16 14:29:29 +08:00
Xiaochen Shen 692f6e1e27 libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

For more information about Intel RDT/CAT can be found in the section 17.17
of Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file  (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
Is in root group.

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.

The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
be set is less than the max bit. The max bits in the CBM is varied among
supported Intel Xeon platforms. In Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. Kernel will
check if it is valid when writing. e.g., 0xfffff in root indicates the max bits
of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM
values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-01 14:26:33 +08:00
Steven Hartland ee4f68e302 Updated logrus to v1
Updated logrus to use v1 which includes a breaking name change Sirupsen -> sirupsen.

This includes a manual edit of the docker term package to also correct the name there too.

Signed-off-by: Steven Hartland <steven.hartland@multiplay.co.uk>
2017-07-19 15:20:56 +00:00
Qiang Huang ed2df2906b Merge pull request #1205 from YuPengZTE/devError
fix typos by the result of golint checking
2017-01-27 21:42:18 +08:00
yupeng 145d23e084 error strings should not be capitalized or end with punctuation
Signed-off-by: yupeng <yu.peng36@zte.com.cn>
2016-12-01 11:57:16 +08:00
Zhang Wei b517076907 Check args numbers before application start
Add a general args number validator for all client commands.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-11-29 11:18:51 +08:00
rajasec e56e7ce9ca Removing fatal error from events in stopped state
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-09-12 21:06:32 +05:30
Mrunal Patel a753b06645 Replace github.com/codegangsta/cli by github.com/urfave/cli
The package got moved to a different repository

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-06-06 11:47:20 -07:00
Mrunal Patel 39aeb98025 Merge pull request #872 from rajasec/eventduration
runc events hang for zero duration
2016-06-03 12:02:26 -07:00
rajasec fa1c2e8337 runc events hang for zero duration
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-06-02 16:46:36 +05:30
Michael Crosby 30f1006b33 Fix libcontainer states
Move initialized to created and destoryed to stopped.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-05-31 11:06:41 -07:00
Qiang Huang 8477638aab Update cli package
The old one has bug when showing help message for IntFlags.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-05-10 13:58:09 +08:00
Michael Crosby a62dbf48b0 Improve stats output
This adds specific types and improves the json format for the marshaled
structure so that it is inline with the output that the spec produce,
camelCase not snake_case.

This should be the last change needed for people to really depend on the
output of this command and ensure that it does not change with any
internal changes instead of just marshaling the libcontainer structure.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-25 16:15:48 -07:00
rajasec 9adc142404 Updated as per review comments by moving to caller
Signed-off-by: rajasec <rajasec79@gmail.com>

Changing to container ID as per comments

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-04-23 20:31:05 +05:30
rajasec fb53190389 Not showing up the events for destroyed container
Signed-off-by: rajasec <rajasec79@gmail.com>

Updated as per review comments by moving to caller

Signed-off-by: rajasec <rajasec79@gmail.com>
2016-04-23 20:25:57 +05:30
Michael Crosby 044e298507 Improve error handling in runc
The error handling on the runc cli is currenly pretty messy because
messages to the user are split between regular stderr format and logrus
message format.  This changes all the error reporting to the cli to only
output on stderr and exit(1) for consumers of the api.

By default logrus logs to /dev/null so that it is not seen by the user.
If the user wants extra and/or structured loggging/errors from runc they
can use the `--log` flag to provide a path to the file where they want
this information.  This allows a consistent behavior on the cli but
extra power and information when debugging with logs.

This also includes a change to enable the same logging information
inside the container's init by adding an init cli command that can share
the existing flags for all other runc commands.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-09 11:08:30 -08:00
Mike Brown f4e37ab63e updating usage for runc and runc commands
Signed-off-by: Mike Brown <brownwm@us.ibm.com>
2016-02-17 09:00:39 -06:00
rajasec de9b496b2d Fixing minor typo in usage
Signed-off-by: rajasec <rajasec79@gmail.com>
2015-11-23 23:10:32 +05:30
Marianna 5aa82c950d Enable build on unsupported platforms
Should compile now without errors but changes needed to be added for each system so it actually works.
main_unsupported.go is a new file with all the unsupported commands
Fixes #9

Signed-off-by: Marianna <mtesselh@gmail.com>
2015-06-29 17:03:44 -07:00
Michael Crosby 9fac183294 Initial commit of runc binary
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:34:13 -07:00