Merge pull request #199 from runcom/rework-runtime-config-linux
runtime: config: linux: add cgroups informations
commit
ab4acc05ff
|
@ -6,11 +6,24 @@ A namespace wraps a global system resource in an abstraction that makes it appea
|
|||
Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.
|
||||
For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html).
|
||||
|
||||
Namespaces are specified in the spec as an array of entries.
|
||||
Each entry has a type field with possible values described below and an optional path element.
|
||||
Namespaces are specified as an array of entries inside the `namespaces` root field.
|
||||
The following parameters can be specified to set up namespaces:
|
||||
|
||||
* **`type`** *(string, required)* - namespace type. The following namespace types are supported:
|
||||
* **`pid`** processes inside the container will only be able to see other processes inside the same container
|
||||
* **`network`** the container will have its own network stack
|
||||
* **`mount`** the container will have an isolated mount table
|
||||
* **`ipc`** processes inside the container will only be able to communicate with other processes inside the same container via system level IPC
|
||||
* **`uts`** the container will be able to have its own hostname and domain name
|
||||
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
|
||||
|
||||
* **`path`** *(string, optional)* - path to namespace file
|
||||
|
||||
If a path is specified, that particular file is used to join that type of namespace.
|
||||
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"namespaces": [
|
||||
{
|
||||
|
@ -36,32 +49,29 @@ Also, when a path is specified, a runtime MUST assume that the setup for that pa
|
|||
]
|
||||
```
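The `type` rule above can be checked mechanically. The following is a minimal sketch, not the runtime's actual code: the `Namespace` struct and the `validateNamespaces` helper are hypothetical names introduced for illustration, and only the six type strings come from this section.

```go
package main

import "fmt"

// Namespace mirrors one entry of the `namespaces` array.
// The struct name and layout are assumptions made for this sketch.
type Namespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// supportedTypes lists the namespace types named in this section.
var supportedTypes = map[string]bool{
	"pid": true, "network": true, "mount": true,
	"ipc": true, "uts": true, "user": true,
}

// validateNamespaces (hypothetical helper) returns an error for the first
// entry whose type is missing or not one of the supported values.
func validateNamespaces(nss []Namespace) error {
	for i, ns := range nss {
		if !supportedTypes[ns.Type] {
			return fmt.Errorf("namespaces[%d]: unsupported type %q", i, ns.Type)
		}
	}
	return nil
}
```

An entry with an empty `type` is rejected as well, because the empty string is not in the map, which matches `type` being required.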
|
||||
|
||||
#### Namespace types
|
||||
|
||||
* **`pid`** processes inside the container will only be able to see other processes inside the same container.
|
||||
* **`network`** the container will have its own network stack.
|
||||
* **`mount`** the container will have an isolated mount table.
|
||||
* **`ipc`** processes inside the container will only be able to communicate to other processes inside the same
|
||||
container via system level IPC.
|
||||
* **`uts`** the container will be able to have its own hostname and domain name.
|
||||
* **`user`** the container will be able to remap user and group IDs from the host to local users and groups
|
||||
within the container.
|
||||
|
||||
## Devices
|
||||
|
||||
Devices is an array specifying the list of devices to be created in the container.
|
||||
Next parameters can be specified:
|
||||
`devices` is an array specifying the list of devices to be created in the container.
|
||||
|
||||
* **`type`** - type of device: `c`, `b`, `u` or `p`. More info in `man mknod`
|
||||
* **`path`** - full path to device inside container
|
||||
* **`major, minor`** - major, minor numbers for device. More info in `man mknod`.
|
||||
There is special value: `-1`, which means `*` for `device`
|
||||
cgroup setup.
|
||||
* **`permissions`** - cgroup permissions for device. A composition of `r`
|
||||
(read), `w` (write), and `m` (mknod).
|
||||
* **`fileMode`** - file mode for device file
|
||||
* **`uid`** - uid of device owner
|
||||
* **`gid`** - gid of device owner
|
||||
The following parameters can be specified:
|
||||
|
||||
* **`type`** *(char, required)* - type of device: `c`, `b`, `u` or `p`. More info in `man mknod`.
|
||||
|
||||
* **`path`** *(string, optional)* - full path to device inside container
|
||||
|
||||
* **`major, minor`** *(int64, required)* - major, minor numbers for device. More info in `man mknod`. There is a special value: `-1`, which means `*` for `device` cgroup setup.
|
||||
|
||||
* **`permissions`** *(string, optional)* - cgroup permissions for device. A composition of `r` (*read*), `w` (*write*), and `m` (*mknod*).
|
||||
|
||||
* **`fileMode`** *(uint32, optional)* - file mode for device file
|
||||
|
||||
* **`uid`** *(uint32, optional)* - uid of device owner
|
||||
|
||||
* **`gid`** *(uint32, optional)* - gid of device owner
|
||||
|
||||
**`fileMode`**, **`uid`** and **`gid`** are required if **`path`** is given and are otherwise not allowed.
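The dependency between `path` and the three ownership fields can be expressed as a small check. This is a sketch under stated assumptions: the `Device` struct below is a hypothetical mirror of one `devices` entry, and it uses pointer fields so that "absent" can be told apart from a zero value; the spec's own Go types may differ.

```go
package main

import "errors"

// Device is a hypothetical mirror of one `devices` entry.
// Pointer fields are nil when the key is absent from the config.
type Device struct {
	Type     string
	Path     string
	Major    int64
	Minor    int64
	FileMode *uint32
	UID      *uint32
	GID      *uint32
}

// validateDevice (hypothetical helper) enforces the rule that fileMode, uid
// and gid are required when path is given and not allowed otherwise.
func validateDevice(d Device) error {
	anySet := d.FileMode != nil || d.UID != nil || d.GID != nil
	allSet := d.FileMode != nil && d.UID != nil && d.GID != nil
	switch {
	case d.Path != "" && !allSet:
		return errors.New("fileMode, uid and gid are required when path is set")
	case d.Path == "" && anySet:
		return errors.New("fileMode, uid and gid are not allowed without path")
	}
	return nil
}
```

An entry without `path`, for example one that only carries `major`, `minor` and `permissions` for the `device` cgroup, passes the check as long as none of the three ownership fields is present.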
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"devices": [
|
||||
|
@ -154,6 +164,16 @@ For example, to run a new process in an existing container without updating limi
|
|||
|
||||
#### Disable out-of-memory killer
|
||||
|
||||
`disableOOMKiller` contains a boolean (`true` or `false`) that enables or disables the Out of Memory killer for a cgroup.
|
||||
If enabled (`false`), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer.
|
||||
The OOM killer is enabled by default in every cgroup using the `memory` subsystem.
|
||||
To disable it, specify a value of `true`.
|
||||
For more information, see [the memory cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/memory.txt).
|
||||
|
||||
* **`disableOOMKiller`** *(bool, optional)* - enables or disables the OOM killer
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"disableOOMKiller": false
|
||||
```
|
||||
|
@ -168,6 +188,23 @@ More information on `oom_score_adj` available [here](https://www.kernel.org/doc/
|
|||
|
||||
#### Memory
|
||||
|
||||
`memory` represents the cgroup subsystem `memory` and it's used to set limits on the container's memory usage.
|
||||
For more information, see [the memory cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/memory.txt).
|
||||
|
||||
The following parameters can be specified to set up the controller:
|
||||
|
||||
* **`limit`** *(uint64, optional)* - sets limit of memory usage
|
||||
|
||||
* **`reservation`** *(uint64, optional)* - sets soft limit of memory usage
|
||||
|
||||
* **`swap`** *(uint64, optional)* - sets limit of memory+Swap usage
|
||||
|
||||
* **`kernel`** *(uint64, optional)* - sets hard limit for kernel memory
|
||||
|
||||
* **`swappiness`** *(uint64, optional)* - sets swappiness parameter of vmscan (See sysctl's vm.swappiness)
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"memory": {
|
||||
"limit": 0,
|
||||
|
@ -180,6 +217,27 @@ More information on `oom_score_adj` available [here](https://www.kernel.org/doc/
|
|||
|
||||
#### CPU
|
||||
|
||||
`cpu` represents the cgroup subsystems `cpu` and `cpusets`.
|
||||
For more information, see [the cpusets cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/cpusets.txt).
|
||||
|
||||
The following parameters can be specified to set up the controller:
|
||||
|
||||
* **`shares`** *(uint64, optional)* - specifies a relative share of CPU time available to the tasks in a cgroup
|
||||
|
||||
* **`quota`** *(uint64, optional)* - specifies the total amount of time in microseconds for which all tasks in a cgroup can run during one period (as defined by **`period`** below)
|
||||
|
||||
* **`period`** *(uint64, optional)* - specifies a period of time in microseconds for how regularly a cgroup's access to CPU resources should be reallocated (CFS scheduler only)
|
||||
|
||||
* **`realtimeRuntime`** *(uint64, optional)* - specifies a period of time in microseconds for the longest continuous period in which the tasks in a cgroup have access to CPU resources
|
||||
|
||||
* **`realtimePeriod`** *(uint64, optional)* - same as **`period`** but applies to realtime scheduler only
|
||||
|
||||
* **`cpus`** *(cpus, optional)* - list of CPUs the container will run in
|
||||
|
||||
* **`mems`** *(mems, optional)* - list of Memory Nodes the container will run in
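To make the relationship between `quota` and `period` concrete: the ratio of the two gives the amount of CPU time the cgroup may use, expressed in CPUs. The sketch below is illustrative only; `cpuLimit` is a hypothetical helper, and treating a zero value as "not set" is an assumption of this example rather than something the text above states.

```go
package main

// cpuLimit (hypothetical helper) returns quota/period, i.e. how many CPUs'
// worth of time all tasks in the cgroup may consume during one period.
// Both values are in microseconds. ok is false when either value is zero,
// which this sketch assumes means "not configured".
func cpuLimit(quota, period uint64) (cpus float64, ok bool) {
	if quota == 0 || period == 0 {
		return 0, false
	}
	return float64(quota) / float64(period), true
}
```

For example, a `quota` of 50000 with a `period` of 100000 yields 0.5, meaning half of one CPU, while a `quota` of 200000 with the same `period` yields 2, meaning two full CPUs.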
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"cpu": {
|
||||
"shares": 0,
|
||||
|
@ -195,9 +253,9 @@ More information on `oom_score_adj` available [here](https://www.kernel.org/doc/
|
|||
#### Block IO Controller
|
||||
|
||||
`blockIO` represents the cgroup subsystem `blkio` which implements the block io controller.
|
||||
For more information, see the [kernel cgroups documentation about `blkio`](https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt).
|
||||
For more information, see [the kernel cgroups documentation about blkio](https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt).
|
||||
|
||||
The following parameters can be specified to setup the block io controller:
|
||||
The following parameters can be specified to set up the controller:
|
||||
|
||||
* **`blkioWeight`** *(uint16, optional)* - specifies per-cgroup weight. This is default weight of the group on all devices until and unless overridden by per-device rules. The range is from 10 to 1000.
|
||||
|
||||
|
@ -205,8 +263,8 @@ The following parameters can be specified to setup the block io controller:
|
|||
|
||||
* **`blkioWeightDevice`** *(array, optional)* - specifies the list of devices which will be bandwidth rate limited. The following parameters can be specified per-device:
|
||||
* **`major, minor`** *(int64, required)* - major, minor numbers for device. More info in `man mknod`.
|
||||
* **`weight`** *(uint16, optional)* - bandwidth rate for the device, range is from 10 to 1000.
|
||||
* **`leafWeight`** *(uint16, optional)* - bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, cfq scheduler only.
|
||||
* **`weight`** *(uint16, optional)* - bandwidth rate for the device, range is from 10 to 1000
|
||||
* **`leafWeight`** *(uint16, optional)* - bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, CFQ scheduler only
|
||||
|
||||
You must specify at least one of `weight` or `leafWeight` in a given entry, and can specify both.
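The per-device constraints (at least one of the two weights, each within 10 to 1000) can be sketched as a validation function. The struct below is a simplified, hypothetical stand-in for a `blkioWeightDevice` entry, and it assumes that a zero value means the field was omitted; that convention is an assumption of this example.

```go
package main

import "fmt"

// WeightDevice is a simplified stand-in for one `blkioWeightDevice` entry.
// A zero Weight or LeafWeight is assumed to mean "not specified".
type WeightDevice struct {
	Major, Minor       int64
	Weight, LeafWeight uint16
}

// inRange reports whether w lies in the documented range of 10 to 1000.
func inRange(w uint16) bool { return w >= 10 && w <= 1000 }

// validateWeightDevice (hypothetical helper) checks that at least one weight
// is given and that every given weight is within range.
func validateWeightDevice(d WeightDevice) error {
	if d.Weight == 0 && d.LeafWeight == 0 {
		return fmt.Errorf("device %d:%d: specify weight, leafWeight, or both", d.Major, d.Minor)
	}
	if d.Weight != 0 && !inRange(d.Weight) {
		return fmt.Errorf("device %d:%d: weight %d outside 10..1000", d.Major, d.Minor, d.Weight)
	}
	if d.LeafWeight != 0 && !inRange(d.LeafWeight) {
		return fmt.Errorf("device %d:%d: leafWeight %d outside 10..1000", d.Major, d.Minor, d.LeafWeight)
	}
	return nil
}
```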
|
||||
|
||||
|
@ -252,6 +310,18 @@ The following parameters can be specified to setup the block io controller:
|
|||
|
||||
#### Huge page limits
|
||||
|
||||
`hugepageLimits` represents the `hugetlb` controller, which makes it possible to limit the
|
||||
HugeTLB usage per control group and enforces the controller limit during page fault.
|
||||
For more information, see the [kernel cgroups documentation about HugeTLB](https://www.kernel.org/doc/Documentation/cgroups/hugetlb.txt).
|
||||
|
||||
`hugepageLimits` is an array of entries, each having the following structure:
|
||||
|
||||
* **`pageSize`** *(string, required)* - hugepage size
|
||||
|
||||
* **`limit`** *(uint64, required)* - limit in bytes of *hugepagesize* HugeTLB usage
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"hugepageLimits": [
|
||||
{
|
||||
|
@ -263,9 +333,23 @@ The following parameters can be specified to setup the block io controller:
|
|||
|
||||
#### Network
|
||||
|
||||
`network` represents the cgroup subsystems `net_cls` and `net_prio`.
|
||||
For more information, see [the net\_cls cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/net_cls.txt) and [the net\_prio cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/net_prio.txt).
|
||||
|
||||
The following parameters can be specified to set up these cgroup controllers:
|
||||
|
||||
* **`classID`** *(string, optional)* - is the network class identifier the cgroup's network packets will be tagged with
|
||||
|
||||
* **`priorities`** *(array, optional)* - specifies a list of objects of the priorities assigned to traffic originating from
|
||||
processes in the group and egressing the system on various interfaces. The following parameters can be specified per-priority:
|
||||
* **`name`** *(string, required)* - interface name
|
||||
* **`priority`** *(uint32, required)* - priority applied to the interface
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"network": {
|
||||
"classId": "ClassId",
|
||||
"classID": "0x100001",
|
||||
"priorities": [
|
||||
{
|
||||
"name": "eth0",
|
||||
|
@ -279,11 +363,31 @@ The following parameters can be specified to setup the block io controller:
|
|||
}
|
||||
```
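Because `classID` is carried as a string so that hexadecimal values can be written in JSON, a consumer has to parse it before use. The sketch below shows one way to do that with the standard library; `parseClassID` is a hypothetical helper, and the split into a 16-bit major and 16-bit minor handle follows the `0xAAAABBBB` layout described in the linked net\_cls kernel documentation.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseClassID (hypothetical helper) converts a classID string such as
// "0x100001" into its major and minor handles. Base 0 lets ParseUint honour
// the "0x" prefix; the 32-bit size matches the 0xAAAABBBB layout.
func parseClassID(s string) (major, minor uint16, err error) {
	v, err := strconv.ParseUint(s, 0, 32)
	if err != nil {
		return 0, 0, fmt.Errorf("invalid classID %q: %w", s, err)
	}
	return uint16(v >> 16), uint16(v & 0xffff), nil
}
```

With the value from the example, `"0x100001"`, this returns major `0x10` and minor `0x1`. A non-numeric placeholder such as `"ClassId"` is rejected with an error.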
|
||||
|
||||
#### PIDs
|
||||
|
||||
`pids` represents the cgroup subsystem `pids`.
|
||||
For more information, see [the pids cgroup man page](https://www.kernel.org/doc/Documentation/cgroups/pids.txt).
|
||||
|
||||
The following parameters can be specified to set up the controller:
|
||||
|
||||
* **`limit`** *(int64, required)* - specifies the maximum number of tasks in the cgroup
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"pids": {
|
||||
"limit": 32771
|
||||
}
|
||||
```
|
||||
|
||||
## Sysctl
|
||||
|
||||
sysctl allows kernel parameters to be modified at runtime for the container.
|
||||
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html).
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"sysctl": {
|
||||
"net.ipv4.ip_forward": "1",
|
||||
|
@ -297,6 +401,8 @@ rlimits allow setting resource limits.
|
|||
`type` is a string with a value from those defined in [the man page](http://man7.org/linux/man-pages/man2/setrlimit.2.html).
|
||||
The kernel enforces the `soft` limit for a resource while the `hard` limit acts as a ceiling for that value that could be set by an unprivileged process.
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"rlimits": [
|
||||
{
|
||||
|
@ -311,6 +417,9 @@ The kernel enforces the `soft` limit for a resource while the `hard` limit acts
|
|||
|
||||
SELinux process label specifies the label with which the processes in a container are run.
|
||||
For more information about SELinux, see [SELinux documentation](http://selinuxproject.org/page/Main_Page).
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"selinuxProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675"
|
||||
```
|
||||
|
@ -320,6 +429,8 @@ For more information about SELinux, see [Selinux documentation](http://selinuxp
|
|||
Apparmor profile specifies the name of the apparmor profile that will be used for the container.
|
||||
For more information about AppArmor, see [AppArmor documentation](https://wiki.ubuntu.com/AppArmor).
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"apparmorProfile": "acme_secure_profile"
|
||||
```
|
||||
|
@ -361,6 +472,8 @@ Operator Constants:
|
|||
* `SCMP_CMP_GT`
|
||||
* `SCMP_CMP_MASKED_EQ`
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"seccomp": {
|
||||
"defaultAction": "SCMP_ACT_ALLOW",
|
||||
|
@ -382,6 +495,8 @@ rootfsPropagation sets the rootfs's mount propagation.
|
|||
Its value is either slave, private, or shared.
|
||||
[The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
|
||||
|
||||
###### Example
|
||||
|
||||
```json
|
||||
"rootfsPropagation": "slave",
|
||||
```
|
||||
|
|
|
@ -103,7 +103,7 @@ type InterfacePriority struct {
|
|||
// Name is the name of the network interface
|
||||
Name string `json:"name"`
|
||||
// Priority for the interface
|
||||
Priority int64 `json:"priority"`
|
||||
Priority uint32 `json:"priority"`
|
||||
}
|
||||
|
||||
// blockIODevice holds major:minor format supported in blkio cgroup
|
||||
|
@ -119,7 +119,7 @@ type WeightDevice struct {
|
|||
blockIODevice
|
||||
// Weight is the bandwidth rate for the device, range is from 10 to 1000
|
||||
Weight uint16 `json:"weight"`
|
||||
// LeafWeight is the bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, cfq scheduler only
|
||||
// LeafWeight is the bandwidth rate for the device while competing with the cgroup's child cgroups, range is from 10 to 1000, CFQ scheduler only
|
||||
LeafWeight uint16 `json:"leafWeight"`
|
||||
}
|
||||
|
||||
|
@ -134,7 +134,7 @@ type ThrottleDevice struct {
|
|||
type BlockIO struct {
|
||||
// Specifies per cgroup weight, range is from 10 to 1000
|
||||
Weight uint16 `json:"blkioWeight"`
|
||||
// Specifies tasks' weight in the given cgroup while competing with the cgroup's child cgroups, range is from 10 to 1000, cfq scheduler only
|
||||
// Specifies tasks' weight in the given cgroup while competing with the cgroup's child cgroups, range is from 10 to 1000, CFQ scheduler only
|
||||
LeafWeight uint16 `json:"blkioLeafWeight"`
|
||||
// Weight per cgroup per device, can override BlkioWeight
|
||||
WeightDevice []*WeightDevice `json:"blkioWeightDevice"`
|
||||
|
@ -151,29 +151,29 @@ type BlockIO struct {
|
|||
// Memory for Linux cgroup 'memory' resource management
|
||||
type Memory struct {
|
||||
// Memory limit (in bytes)
|
||||
Limit int64 `json:"limit"`
|
||||
Limit uint64 `json:"limit"`
|
||||
// Memory reservation or soft_limit (in bytes)
|
||||
Reservation int64 `json:"reservation"`
|
||||
Reservation uint64 `json:"reservation"`
|
||||
// Total memory usage (memory + swap); set `-1' to disable swap
|
||||
Swap int64 `json:"swap"`
|
||||
Swap uint64 `json:"swap"`
|
||||
// Kernel memory limit (in bytes)
|
||||
Kernel int64 `json:"kernel"`
|
||||
Kernel uint64 `json:"kernel"`
|
||||
// How aggressive the kernel will swap memory pages. Range from 0 to 100. Set -1 to use system default
|
||||
Swappiness int64 `json:"swappiness"`
|
||||
Swappiness uint64 `json:"swappiness"`
|
||||
}
|
||||
|
||||
// CPU for Linux cgroup 'cpu' resource management
|
||||
type CPU struct {
|
||||
// CPU shares (relative weight vs. other cgroups with cpu shares)
|
||||
Shares int64 `json:"shares"`
|
||||
Shares uint64 `json:"shares"`
|
||||
// CPU hardcap limit (in usecs). Allowed cpu time in a given period
|
||||
Quota int64 `json:"quota"`
|
||||
Quota uint64 `json:"quota"`
|
||||
// CPU period to be used for hardcapping (in usecs). 0 to use system default
|
||||
Period int64 `json:"period"`
|
||||
Period uint64 `json:"period"`
|
||||
// How much time the CPU will use in realtime scheduling (in usecs)
|
||||
RealtimeRuntime int64 `json:"realtimeRuntime"`
|
||||
RealtimeRuntime uint64 `json:"realtimeRuntime"`
|
||||
// CPU period to be used for realtime scheduling (in usecs)
|
||||
RealtimePeriod int64 `json:"realtimePeriod"`
|
||||
RealtimePeriod uint64 `json:"realtimePeriod"`
|
||||
// CPU to use within the cpuset
|
||||
Cpus string `json:"cpus"`
|
||||
// MEM to use within the cpuset
|
||||
|
@ -189,7 +189,9 @@ type Pids struct {
|
|||
// Network identification and priority configuration
|
||||
type Network struct {
|
||||
// Set class identifier for container's network packets
|
||||
ClassID string `json:"classId"`
|
||||
// this is actually a string instead of a uint64 to overcome the json
|
||||
// limitation of specifying hex numbers
|
||||
ClassID string `json:"classID"`
|
||||
// Set priority of network traffic for container
|
||||
Priorities []InterfacePriority `json:"priorities"`
|
||||
}
|
||||
|
|