# Linux

The Linux container specification uses various kernel features like namespaces,
cgroups, capabilities, LSM, and file system jails to fulfill the spec.
Additional information is needed for Linux over the default spec configuration
in order to configure these various kernel features.

## Linux namespaces

A namespace wraps a global system resource in an abstraction that makes it
appear to the processes within the namespace that they have their own isolated
instance of the global resource. Changes to the global resource are visible to
other processes that are members of the namespace, but are invisible to other
processes. For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html).

Namespaces are specified in the spec as an array of entries. Each entry has a
`type` field with possible values described below and an optional `path` element.
If a path is specified, the namespace file at that path is used to join an existing namespace of that type.

```json
    "namespaces": [
        {
            "type": "pid",
            "path": "/proc/1234/ns/pid"
        },
        {
            "type": "net",
            "path": "/var/run/netns/neta"
        },
        {
            "type": "mnt"
        },
        {
            "type": "ipc"
        },
        {
            "type": "uts"
        },
        {
            "type": "user"
        }
    ]
```

### Namespace types

* **pid** processes inside the container will only be able to see other processes inside the same container.
* **net** the container will have its own network stack.
* **mnt** the container will have an isolated mount table.
* **ipc** processes inside the container will only be able to communicate with other processes inside the same container via system level IPC.
* **uts** the container will be able to have its own hostname and domain name.
* **user** the container will be able to remap user and group IDs from the host to local users and groups within the container.
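
As a rough illustration of how a runtime might consume the `namespaces` array, the sketch below splits entries into namespaces to join (entries with a path) and namespaces to create. Treating a missing path as "create a new namespace" is an assumption, as are the `Namespace` type and the `Plan` function, which are hypothetical names. The type strings follow the JSON example above, and the flag values mirror the `CLONE_NEW*` constants from `<linux/sched.h>`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Namespace mirrors one entry of the "namespaces" array (hypothetical type).
type Namespace struct {
	Type string `json:"type"`
	Path string `json:"path,omitempty"`
}

// cloneFlags maps the type strings from the JSON example to the
// CLONE_NEW* values defined in <linux/sched.h>.
var cloneFlags = map[string]uint64{
	"mnt":  0x00020000, // CLONE_NEWNS
	"uts":  0x04000000, // CLONE_NEWUTS
	"ipc":  0x08000000, // CLONE_NEWIPC
	"user": 0x10000000, // CLONE_NEWUSER
	"pid":  0x20000000, // CLONE_NEWPID
	"net":  0x40000000, // CLONE_NEWNET
}

// Plan returns the combined clone flags for entries without a path and a
// type-to-path map for entries that name an existing namespace to join.
// Assumption: an entry without a path means "create a new namespace".
func Plan(entries []Namespace) (uint64, map[string]string, error) {
	var flags uint64
	join := make(map[string]string)
	for _, ns := range entries {
		flag, ok := cloneFlags[ns.Type]
		if !ok {
			return 0, nil, fmt.Errorf("unknown namespace type %q", ns.Type)
		}
		if ns.Path != "" {
			join[ns.Type] = ns.Path
			continue
		}
		flags |= flag
	}
	return flags, join, nil
}

func main() {
	raw := `[{"type": "pid", "path": "/proc/1234/ns/pid"}, {"type": "mnt"}, {"type": "uts"}]`
	var entries []Namespace
	if err := json.Unmarshal([]byte(raw), &entries); err != nil {
		panic(err)
	}
	flags, join, err := Plan(entries)
	if err != nil {
		panic(err)
	}
	fmt.Printf("create with flags %#x, join %v\n", flags, join)
}
```

A real runtime would pass the flags to `clone(2)` or `unshare(2)` and open each join path for `setns(2)`; the sketch stops at the planning step so that it runs without privileges.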

### Access to devices

`devices` is an array specifying the list of devices from the host to make available in the container.
For each device name in the list, the runtime should look up the same device in the host's `/dev`
and collect information about the device node so that it can be recreated for the container. The runtime
should not only create the device inside the container but also ensure that the root user inside
the container has access rights for the device.

```json
    "devices": [
        "null",
        "random",
        "full",
        "tty",
        "zero",
        "urandom"
    ]
```
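
The lookup step described above could be sketched as follows. This is a minimal illustration, not the runtime's actual code: the `Device` type and the `Lookup`, `Major`, and `Minor` functions are hypothetical names, and the major/minor decoding assumes the Linux `dev_t` bit layout used by glibc.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// Device holds the information needed to recreate a node with mknod(2).
type Device struct {
	Path  string
	Type  rune // 'c' for character devices, 'b' for block devices
	Major uint64
	Minor uint64
	Mode  os.FileMode // permission bits only
}

// Major decodes the major number from a Linux dev_t (glibc bit layout).
func Major(dev uint64) uint64 {
	return ((dev >> 8) & 0xfff) | ((dev >> 32) & 0xfffff000)
}

// Minor decodes the minor number from a Linux dev_t (glibc bit layout).
func Minor(dev uint64) uint64 {
	return (dev & 0xff) | ((dev >> 12) & 0xffffff00)
}

// Lookup stats /dev/<name> on the host and collects the node's attributes.
func Lookup(name string) (Device, error) {
	path := filepath.Join("/dev", name)
	fi, err := os.Lstat(path)
	if err != nil {
		return Device{}, err
	}
	if fi.Mode()&os.ModeDevice == 0 {
		return Device{}, fmt.Errorf("%s is not a device node", path)
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return Device{}, fmt.Errorf("no stat data for %s", path)
	}
	typ := 'b'
	if fi.Mode()&os.ModeCharDevice != 0 {
		typ = 'c'
	}
	rdev := uint64(st.Rdev)
	return Device{
		Path:  path,
		Type:  typ,
		Major: Major(rdev),
		Minor: Minor(rdev),
		Mode:  fi.Mode().Perm(),
	}, nil
}

func main() {
	dev, err := Lookup("null")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Printf("%s: type=%c major=%d minor=%d mode=%o\n",
		dev.Path, dev.Type, dev.Major, dev.Minor, uint32(dev.Mode))
}
```

Recreating the node inside the container and granting the container's root user access are left out; both steps need privileges and depend on how the runtime sets up the root filesystem.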

## Linux control groups

Control groups, also known as cgroups, are used to restrict resource usage for a container and handle
device access. cgroups provide controls to restrict CPU, memory, IO, and network for
the container.

## Linux capabilities

`capabilities` is an array that specifies the Linux capabilities that can be provided to the process
inside the container. Valid values are the strings that follow the `CAP_` prefix for capabilities defined
in [the man page](http://man7.org/linux/man-pages/man7/capabilities.7.html).

```json
    "capabilities": [
        "AUDIT_WRITE",
        "KILL",
        "NET_BIND_SERVICE"
    ]
```
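
One way a runtime might validate these names and turn them into a capability bitmask is sketched below. The `capNames` table and the `Mask` function are hypothetical; the table lists capability names in bit order as defined in `<linux/capability.h>` up to `CAP_AUDIT_READ`, so newer kernels may define more.

```go
package main

import "fmt"

// capNames lists capability names without the CAP_ prefix; the slice index
// is the capability's bit number in <linux/capability.h>.
var capNames = []string{
	"CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID",
	"KILL", "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE",
	"NET_BIND_SERVICE", "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK",
	"IPC_OWNER", "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE",
	"SYS_PACCT", "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE",
	"SYS_TIME", "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE",
	"AUDIT_CONTROL", "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG",
	"WAKE_ALARM", "BLOCK_SUSPEND", "AUDIT_READ",
}

// Mask validates each name and returns a bitmask with one bit per capability.
func Mask(names []string) (uint64, error) {
	var mask uint64
	for _, name := range names {
		bit := -1
		for i, known := range capNames {
			if known == name {
				bit = i
				break
			}
		}
		if bit < 0 {
			return 0, fmt.Errorf("unknown capability %q (CAP_%s)", name, name)
		}
		mask |= 1 << uint(bit)
	}
	return mask, nil
}

func main() {
	mask, err := Mask([]string{"AUDIT_WRITE", "KILL", "NET_BIND_SERVICE"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("capability mask: %#x\n", mask)
}
```

Rejecting unknown names early gives a clear configuration error instead of a silently ignored entry; applying the mask to the process is a separate, privileged step that the sketch does not attempt.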

## Linux sysctl

`sysctl` allows kernel parameters to be modified at runtime for the container.
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html).

```json
    "sysctl": {
        "net.ipv4.ip_forward": "1",
        "net.core.somaxconn": "256"
    }
```
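
Kernel parameters are exposed as files under `/proc/sys`, so a runtime could apply each entry by writing the value to the matching file. The sketch below only shows the key-to-path mapping. `SysctlPath` is a hypothetical helper, and the simple dot-to-slash translation is an assumption that does not handle keys whose components themselves contain dots.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// SysctlPath maps a dotted sysctl key to its file under /proc/sys.
// Assumption: no key component contains a literal dot.
func SysctlPath(key string) string {
	return path.Join("/proc/sys", strings.ReplaceAll(key, ".", "/"))
}

func main() {
	settings := map[string]string{
		"net.ipv4.ip_forward": "1",
		"net.core.somaxconn":  "256",
	}

	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(settings))
	for k := range settings {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Print the planned writes instead of touching the host's /proc/sys.
	for _, k := range keys {
		fmt.Printf("write %q to %s\n", settings[k], SysctlPath(k))
	}
}
```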

## Security

**TODO:** security profiles