Xiaochen Shen 692f6e1e27 libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPUs support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17 of the
Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux kernel 4.10 or newer, the interface is defined and exposed via the
"resource control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
similar interfaces in a container. But unlike the cgroups hierarchy, it has a
single-level filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in the root group.
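
As a rough illustration of the `tasks` mechanism described above, the Go
sketch below writes a task ID to a group's "tasks" file. The helper names
joinGroup and demoJoin are hypothetical and are not taken from runc. The demo
runs against a temporary directory that stands in for /sys/fs/resctrl, so it
does not need root or RDT hardware.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// joinGroup moves a task into a group by writing its ID to the group's
// "tasks" file. root is normally /sys/fs/resctrl; it is a parameter here
// so the sketch can run against an ordinary directory.
func joinGroup(root, group string, pid int) error {
	tasks := filepath.Join(root, group, "tasks")
	return os.WriteFile(tasks, []byte(strconv.Itoa(pid)+"\n"), 0644)
}

// demoJoin exercises joinGroup in a temporary directory that stands in for
// the resctrl mount point, and returns what ended up in the "tasks" file.
func demoJoin(group string, pid int) (string, error) {
	root, err := os.MkdirTemp("", "resctrl-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	// On a real system the group directory is created with mkdir under
	// /sys/fs/resctrl and the kernel populates "tasks" itself.
	if err := os.Mkdir(filepath.Join(root, group), 0755); err != nil {
		return "", err
	}
	if err := joinGroup(root, group, pid); err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(root, group, "tasks"))
	return string(data), err
}

func main() {
	got, err := demoJoin("mycontainerid", os.Getpid())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("tasks now contains: %s", got)
}
```

On a real resctrl mount, the kernel, not the writer, decides what the file
lists afterwards; reading it back as plain text only makes sense in this demo.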

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
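
The schema format above can be parsed in a few lines. The following Go sketch
is illustrative only: parseL3Schema is a hypothetical helper, not a runc
function, and it assumes a single "L3:" line with hexadecimal bitmasks.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses a line such as "L3:0=ff;1=c0" into a map from
// L3 cache id to capacity bitmask (CBM).
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		// CBMs are written in hexadecimal without a "0x" prefix.
		cbm, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", entry, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}

func main() {
	masks, err := parseL3Schema("L3:0=ff;1=c0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("cache 0: %#x, cache 1: %#x\n", masks[0], masks[1])
}
```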

A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits set
cannot exceed the maximum CBM length. The maximum CBM length varies among
supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. The kernel
checks validity on write. For example, 0xfffff in root indicates that the
maximum CBM length is 20 bits, which maps to the entire L3 cache capacity.
Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc.
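
The two rules above (contiguous bits, subset of the root CBM) can be checked
with simple bit arithmetic. This Go sketch is a simplification under stated
assumptions: isContiguous and validCBM are hypothetical names, and the check
ignores other limits the kernel may enforce, such as min_cbm_bits.

```go
package main

import (
	"fmt"
	"math/bits"
)

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Drop trailing zeros so a valid mask looks like 0b0...011...1.
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	// Adding 1 to a run of ones clears every bit of that run.
	return shifted&(shifted+1) == 0
}

// validCBM reports whether cbm is contiguous and a subset of the root CBM.
func validCBM(cbm, root uint64) bool {
	return isContiguous(cbm) && cbm&^root == 0
}

func main() {
	const root = 0xfffff // 20-bit root CBM from the example above
	for _, cbm := range []uint64{0xf, 0xf0, 0x3ff, 0x1f00, 0x5, 0x100000} {
		fmt.Printf("%#x valid=%v\n", cbm, validCBM(cbm, root))
	}
}
```

The first four masks are the valid examples listed above. 0x5 fails because
its bits are not contiguous, and 0x100000 fails because it lies outside the
20-bit root mask.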

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% of L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}
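
The 80% and 50% figures follow from counting the bits set in each CBM against
the 20-bit maximum. A small Go sketch of that arithmetic (cacheShare is a
hypothetical helper, used here only to check the numbers):

```go
package main

import (
	"fmt"
	"math/bits"
)

// cacheShare returns the percentage of the cache covered by cbm, given the
// number of bits in the full (root) mask.
func cacheShare(cbm uint64, maxBits int) float64 {
	return 100 * float64(bits.OnesCount64(cbm)) / float64(maxBits)
}

func main() {
	// 0xffff0 sets bits 4..19 (16 of 20 bits): the "upper" part of cache 0.
	fmt.Printf("cache 0: %.0f%%\n", cacheShare(0xffff0, 20))
	// 0x3ff sets bits 0..9 (10 of 20 bits): the "lower" part of cache 1.
	fmt.Printf("cache 1: %.0f%%\n", cacheShare(0x3ff, 20))
}
```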

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-01 14:26:33 +08:00
contrib libcontainer/console_linux.go: Make SaneTerminal public 2017-06-07 21:32:41 -07:00
libcontainer libcontainer: add support for Intel RDT/CAT in runc 2017-09-01 14:26:33 +08:00
man update manpages for `runc update` 2017-05-04 07:41:08 -04:00
script release: import umoci's release.sh script 2017-08-16 14:35:52 +10:00
tests/integration Fix integration when missing criu 2017-07-14 20:15:20 +08:00
vendor Merge pull request #1526 from stevenh/logrus-v1 2017-07-27 13:28:55 -04:00
.gitignore move from Godeps to vndr 2017-02-24 11:25:21 +00:00
.pullapprove.yml Disallow self-LGTMs 2016-06-01 09:31:21 +08:00
.travis.yml travis: drop shfmt install 2017-08-31 20:49:51 +10:00
CONTRIBUTING.md *: add information about security mailing list 2016-12-03 18:54:53 +11:00
Dockerfile Remove shfmt 2017-07-06 11:08:44 -07:00
LICENSE Initial commit of runc binary 2015-06-21 19:34:13 -07:00
MAINTAINERS Remove @avagin as a maintainer 2017-08-02 10:55:08 -07:00
MAINTAINERS_GUIDE.md Update maintainers guide 2015-07-21 10:59:56 -07:00
Makefile ci: smoke-test the release script 2017-08-16 14:44:45 +10:00
NOTICE Move libcontainer documenation to root of repo 2015-06-26 11:50:46 -07:00
PRINCIPLES.md Move libcontainer documenation to root of repo 2015-06-26 11:50:46 -07:00
README.md README.md: adjust capabilities section in config.json example 2017-07-25 13:46:20 +02:00
VERSION VERSION: back to development 2017-08-02 15:24:09 +10:00
checkpoint.go Add auto-dedup flag for checkpoint/restore 2017-08-18 16:19:21 +02:00
create.go Prepare startContainer() to have more action 2017-05-01 21:55:57 +03:00
delete.go Moving the rest of runc to x/sys/unix 2017-05-22 17:36:02 -05:00
events.go libcontainer: add support for Intel RDT/CAT in runc 2017-09-01 14:26:33 +08:00
exec.go Update spec to 239c4e44f2 2017-06-01 16:29:47 -07:00
init.go runc only works on Linux so remove putative Solaris and unsupported main 2017-06-29 16:00:26 +01:00
kill.go Moving the rest of runc to x/sys/unix 2017-05-22 17:36:02 -05:00
list.go list: stop casting unknown UIDs to their unicode values 2017-07-12 06:30:01 +10:00
main.go Updated logrus to v1 2017-07-19 15:20:56 +00:00
notify_socket.go Updated logrus to v1 2017-07-19 15:20:56 +00:00
pause.go Only allow single container operation 2017-03-08 10:02:39 +08:00
ps.go runc: add support for rootless containers 2017-03-23 20:45:24 +11:00
restore.go Add auto-dedup flag for checkpoint/restore 2017-08-18 16:19:21 +02:00
rlimit_linux.go error strings should not be capitalized or end with punctuation 2016-12-01 11:57:16 +08:00
run.go Prepare startContainer() to have more action 2017-05-01 21:55:57 +03:00
signals.go Check error return values 2017-08-17 11:41:19 +02:00
spec.go Update runtime-spec to rc6 2017-07-12 16:24:04 -07:00
start.go Only allow single container operation 2017-03-08 10:02:39 +08:00
state.go Check args numbers before application start 2016-11-29 11:18:51 +08:00
tty.go libcontainer/console_linux.go: Make SaneTerminal public 2017-06-07 21:32:41 -07:00
update.go Update memory specs to use int64 not uint64 2017-06-27 12:16:07 +01:00
utils.go Updated logrus to v1 2017-07-19 15:20:56 +00:00
utils_linux.go libcontainer: add support for Intel RDT/CAT in runc 2017-09-01 14:26:33 +08:00
vendor.conf Merge pull request #1526 from stevenh/logrus-v1 2017-07-27 13:28:55 -04:00

README.md

runc


Introduction

runc is a CLI tool for spawning and running containers according to the OCI specification.

Releases

runc depends on and tracks the runtime-spec repository. We will try to make sure that runc and the OCI specification major versions stay in lockstep. This means that runc 1.0.0 should implement the 1.0 version of the specification.

You can find official releases of runc on the release page.

Security

If you wish to report a security issue, please disclose the issue responsibly to security@opencontainers.org.

Building

runc currently supports Linux on various architectures. It must be built with Go version 1.6 or higher in order for some features to function properly.

In order to enable seccomp support you will need to install libseccomp on your platform.

e.g. libseccomp-devel for CentOS, or libseccomp-dev for Ubuntu

Otherwise, if you do not want to build runc with seccomp support you can add BUILDTAGS="" when running make.

# create a 'github.com/opencontainers' in your GOPATH/src
cd github.com/opencontainers
git clone https://github.com/opencontainers/runc
cd runc

make
sudo make install

runc will be installed to /usr/local/sbin/runc on your system.

Build Tags

runc supports optional build tags for compiling support of various features. To add build tags to the make option the BUILDTAGS variable must be set.

make BUILDTAGS='seccomp apparmor'

Build Tag   Feature                               Dependency
seccomp     Syscall filtering                     libseccomp
selinux     selinux process and mount labeling
apparmor    apparmor profile support              libapparmor
ambient     ambient capability support            kernel 4.3

Running the test suite

runc currently supports running its test suite via Docker. To run the suite just type make test.

make test

There are additional make targets for running the tests outside of a container but this is not recommended as the tests are written with the expectation that they can write and remove anywhere.

You can run a specific test case by setting the TESTFLAGS variable.

make test TESTFLAGS="-run=SomeTestFunction"

Dependencies Management

runc uses vndr for dependency management. Please refer to vndr for how to add or update dependencies.

Using runc

Creating an OCI Bundle

In order to use runc you must have your container in the format of an OCI bundle. If you have Docker installed you can use its export method to acquire a root filesystem from an existing Docker container.

# create the top most bundle directory
mkdir /mycontainer
cd /mycontainer

# create the rootfs directory
mkdir rootfs

# export busybox via Docker into the rootfs directory
docker export $(docker create busybox) | tar -C rootfs -xvf -

After a root filesystem is populated you just generate a spec in the format of a config.json file inside your bundle. runc provides a spec command to generate a base template spec that you are then able to edit. To find features and documentation for fields in the spec please refer to the specs repository.

runc spec

Running Containers

Assuming you have an OCI bundle from the previous step you can execute the container in two different ways.

The first way is to use the convenience command run that will handle creating, starting, and deleting the container after it exits.

# run as root
cd /mycontainer
runc run mycontainerid

If you used the unmodified runc spec template this should give you a sh session inside the container.

The second way to start a container is to use the spec's lifecycle operations. This gives you more power over how the container is created and managed while it is running. It also launches the container in the background, so you will have to edit the config.json to remove the terminal setting for the simple examples here. The process field in your config.json should look like the following, with "terminal": false and "args": ["sleep", "5"].

        "process": {
                "terminal": false,
                "user": {
                        "uid": 0,
                        "gid": 0
                },
                "args": [
                        "sleep", "5"
                ],
                "env": [
                        "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                        "TERM=xterm"
                ],
                "cwd": "/",
                "capabilities": {
                        "bounding": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "effective": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "inheritable": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "permitted": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ],
                        "ambient": [
                                "CAP_AUDIT_WRITE",
                                "CAP_KILL",
                                "CAP_NET_BIND_SERVICE"
                        ]
                },
                "rlimits": [
                        {
                                "type": "RLIMIT_NOFILE",
                                "hard": 1024,
                                "soft": 1024
                        }
                ],
                "noNewPrivileges": true
        },
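
If you want to check an edited config.json before going further, a small Go program can decode just the two fields that matter here. This is only a sketch: the config and process types below are hypothetical and mirror only "terminal" and "args", not the full runtime-spec schema, and readyForBackground is not a runc function.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// process mirrors only the two fields this check cares about.
type process struct {
	Terminal bool     `json:"terminal"`
	Args     []string `json:"args"`
}

// config mirrors only the "process" section of config.json.
type config struct {
	Process process `json:"process"`
}

// readyForBackground reports whether the config has the terminal disabled
// and a non-empty argument list, as the lifecycle example expects.
func readyForBackground(data []byte) (bool, error) {
	var c config
	if err := json.Unmarshal(data, &c); err != nil {
		return false, err
	}
	return !c.Process.Terminal && len(c.Process.Args) > 0, nil
}

func main() {
	sample := []byte(`{"process": {"terminal": false, "args": ["sleep", "5"]}}`)
	ok, err := readyForBackground(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ready for background use:", ok)
}
```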

Now we can go through the lifecycle operations in your shell.

# run as root
cd /mycontainer
runc create mycontainerid

# verify that the container has been created and is in the "created" state
runc list

# start the process inside the container
runc start mycontainerid

# after 5 seconds view that the container has exited and is now in the stopped state
runc list

# now delete the container
runc delete mycontainerid

This adds more complexity but allows higher-level systems to manage runc, and it provides points during the container's creation to set up various settings after the container has been created and/or before it is deleted. This is commonly used to set up the container's network stack after create but before start, at which point the user's defined process begins running.
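
As a sketch of how a higher-level system might use that window between create and start, the Go program below runs a setup step after `runc create` and before `runc start`. The names runner, execRunc, and createSetupStart are hypothetical, and main performs a dry run that only prints the commands; pass execRunc instead to invoke a runc binary found on your PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// runner executes one runc invocation. Swapping it out keeps the sketch
// usable without a real runc binary.
type runner func(args ...string) error

// execRunc runs the runc binary found on PATH with the given arguments.
func execRunc(args ...string) error {
	cmd := exec.Command("runc", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

// createSetupStart creates the container, runs setup while the container is
// in the "created" state (for example to configure networking), then starts
// it. Cleanup after a failure is left to the caller.
func createSetupStart(run runner, id string, setup func() error) error {
	if err := run("create", id); err != nil {
		return fmt.Errorf("create: %v", err)
	}
	if err := setup(); err != nil {
		return fmt.Errorf("setup: %v", err)
	}
	if err := run("start", id); err != nil {
		return fmt.Errorf("start: %v", err)
	}
	return nil
}

func main() {
	// Dry run: print the runc invocations instead of executing them.
	dry := func(args ...string) error {
		fmt.Println("runc", strings.Join(args, " "))
		return nil
	}
	setup := func() error {
		fmt.Println("(configure the network stack here)")
		return nil
	}
	if err := createSetupStart(dry, "mycontainerid", setup); err != nil {
		fmt.Println("error:", err)
	}
}
```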

Rootless containers

runc has the ability to run containers without root privileges. This is called rootless. You need to pass some parameters to runc in order to run rootless containers. See below and compare with the previous version. Run the following commands as an ordinary user:

# Same as the first example
mkdir ~/mycontainer
cd ~/mycontainer
mkdir rootfs
docker export $(docker create busybox) | tar -C rootfs -xvf -

# The --rootless parameter instructs runc spec to generate a configuration for a rootless container, which will allow you to run the container as a non-root user.
runc spec --rootless

# The --root parameter tells runc where to store the container state. It must be writable by the user.
runc --root /tmp/runc run mycontainerid

Supervisors

runc can be used with process supervisors and init systems to ensure that containers are restarted when they exit. An example systemd unit file looks something like this.

[Unit]
Description=Start My Container

[Service]
Type=forking
ExecStart=/usr/local/sbin/runc run -d --pid-file /run/mycontainerid.pid mycontainerid
ExecStopPost=/usr/local/sbin/runc delete mycontainerid
WorkingDirectory=/mycontainer
PIDFile=/run/mycontainerid.pid

[Install]
WantedBy=multi-user.target