Commit 27560ace2f
Memory Bandwidth Allocation (MBA) is a resource allocation sub-feature of Intel Resource Director Technology (RDT) which is supported on some Intel Xeon platforms. Intel RDT/MBA provides indirect and approximate throttling of memory bandwidth to software. A user controls the resource by indicating the percentage of maximum memory bandwidth. Hardware details of Intel RDT/MBA can be found in section 17.18 of the Intel Software Developer Manual: https://software.intel.com/en-us/articles/intel-sdm

In Linux kernel 4.12 and newer, Intel RDT/MBA is enabled by the kernel config CONFIG_INTEL_RDT. If the hardware supports it, the CPU flags `rdt_a` and `mba` will be set in /proc/cpuinfo.

Intel RDT "resource control" filesystem hierarchy:

mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|   |   |-- cbm_mask
|   |   |-- min_cbm_bits
|   |   |-- num_closids
|   |-- MB
|       |-- bandwidth_gran
|       |-- delay_linear
|       |-- min_bandwidth
|       |-- num_closids
|-- ...
|-- schemata
|-- tasks
|-- <container_id>
    |-- ...
    |-- schemata
    |-- tasks

For MBA support in runc, we will reuse the infrastructure and code base of Intel RDT/CAT, which was implemented in #1279. We can also make use of the `tasks` and `schemata` configuration for memory bandwidth resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g., the "<container_id>" group). Tasks can be added to a group by writing the task ID to the "tasks" file (which will automatically remove them from the previous group to which they belonged). New tasks created by fork(2) and clone(2) are added to the same group as their parent.

The file `schemata` has a list of all the resources available to this group. Each resource (L3 cache, memory bandwidth) has its own line and format.

Memory bandwidth schema: it has allocation values for memory bandwidth on each socket, which contains the L3 cache id and memory bandwidth percentage.
Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

The minimum bandwidth percentage value for each CPU model is predefined and can be looked up through "info/MB/min_bandwidth". The bandwidth granularity that is allocated is also dependent on the CPU model and can be looked up at "info/MB/bandwidth_gran". The available bandwidth control steps are: min_bw + N * bw_gran. Intermediate values are rounded to the next control step available on the hardware.

For more information about the Intel RDT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc: consider a two-socket machine with two L3 caches, where the minimum memory bandwidth is 10% and the memory bandwidth granularity is 10%. Tasks inside the container may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.

"linux": {
    "intelRdt": {
        "memBwSchema": "MB:0=20;1=70"
    }
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
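To make the kernel interface above concrete, here is a rough sketch of how a resource group could be managed by hand through resctrl (the group name, PID, and schema values are illustrative, not what runc writes verbatim):

# create a resource group for the container (requires the resctrl mount shown above)
mkdir /sys/fs/resctrl/mycontainerid
# limit the group to 20% memory bandwidth on socket 0 and 70% on socket 1
echo "MB:0=20;1=70" > /sys/fs/resctrl/mycontainerid/schemata
# move the container's init process (PID 1234 here) into the group
echo 1234 > /sys/fs/resctrl/mycontainerid/tasks
# read back the schemata; the kernel rounds values to the available control steps
cat /sys/fs/resctrl/mycontainerid/schemata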
README.md
runc
Introduction
runc is a CLI tool for spawning and running containers according to the OCI specification.
Releases
runc depends on and tracks the runtime-spec repository.
We will try to make sure that runc and the OCI specification major versions stay in lockstep.
This means that runc 1.0.0 should implement the 1.0 version of the specification.
You can find official releases of runc on the release page.
Security
If you wish to report a security issue, please disclose the issue responsibly to security@opencontainers.org.
Building
runc currently supports the Linux platform with various architecture support.
It must be built with Go version 1.6 or higher in order for some features to function properly.

In order to enable seccomp support you will need to install libseccomp on your platform.
e.g. libseccomp-devel for CentOS, or libseccomp-dev for Ubuntu

Otherwise, if you do not want to build runc with seccomp support you can add BUILDTAGS="" when running make.
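For example (the package manager commands are illustrative; use whichever applies to your distribution):

# CentOS
sudo yum install libseccomp-devel
# Ubuntu
sudo apt-get install libseccomp-dev
# or build without seccomp support entirely
make BUILDTAGS=""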
# create a 'github.com/opencontainers' directory in your GOPATH/src
cd github.com/opencontainers
git clone https://github.com/opencontainers/runc
cd runc
make
sudo make install
You can also use go get to install to your GOPATH, assuming that you have a github.com parent folder already created under src:
go get github.com/opencontainers/runc
cd $GOPATH/src/github.com/opencontainers/runc
make
sudo make install
runc will be installed to /usr/local/sbin/runc on your system.
Build Tags
runc supports optional build tags for compiling support of various features.
To add build tags to the make option the BUILDTAGS variable must be set.
make BUILDTAGS='seccomp apparmor'
Build Tag | Feature | Dependency |
---|---|---|
seccomp | Syscall filtering | libseccomp |
selinux | selinux process and mount labeling | |
apparmor | apparmor profile support | |
ambient | ambient capability support | kernel 4.3 |
Running the test suite
runc currently supports running its test suite via Docker.
To run the suite just type make test.
make test
There are additional make targets for running the tests outside of a container but this is not recommended as the tests are written with the expectation that they can write and remove anywhere.
You can run a specific test case by setting the TESTFLAGS variable.
# make test TESTFLAGS="-run=SomeTestFunction"
You can run a specific integration test by setting the TESTPATH variable.
# make test TESTPATH="/checkpoint.bats"
You can run a test in your proxy environment by setting the DOCKER_BUILD_PROXY and DOCKER_RUN_PROXY variables.
# make test DOCKER_BUILD_PROXY="--build-arg HTTP_PROXY=http://yourproxy/" DOCKER_RUN_PROXY="-e HTTP_PROXY=http://yourproxy/"
Dependencies Management
runc uses vndr for dependency management.
Please refer to vndr for how to add or update dependencies.
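A rough sketch of that workflow, assuming the standard vndr usage (the dependency path and version below are placeholders):

# 1. add or bump the dependency's line in vendor.conf, e.g.
#    github.com/example/somelib v1.2.3
# 2. re-vendor everything listed in vendor.conf
vndr
# or re-vendor only the dependency you changed
vndr github.com/example/somelib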
Using runc
Creating an OCI Bundle
In order to use runc you must have your container in the format of an OCI bundle.
If you have Docker installed you can use its export method to acquire a root filesystem from an existing Docker container.
# create the top most bundle directory
mkdir /mycontainer
cd /mycontainer
# create the rootfs directory
mkdir rootfs
# export busybox via Docker into the rootfs directory
docker export $(docker create busybox) | tar -C rootfs -xvf -
After a root filesystem is populated you just generate a spec in the format of a config.json file inside your bundle.
runc provides a spec command to generate a base template spec that you are then able to edit.
To find features and documentation for fields in the spec please refer to the specs repository.
runc spec
Running Containers
Assuming you have an OCI bundle from the previous step you can execute the container in two different ways.
The first way is to use the convenience command run that will handle creating, starting, and deleting the container after it exits.
# run as root
cd /mycontainer
runc run mycontainerid
If you used the unmodified runc spec template this should give you a sh session inside the container.
The second way to start a container is using the specs lifecycle operations.
This gives you more power over how the container is created and managed while it is running.
This will also launch the container in the background so you will have to edit the config.json to remove the terminal setting for the simple examples here.
Your process field in the config.json should look like this below with "terminal": false and "args": ["sleep", "5"].
"process": {
"terminal": false,
"user": {
"uid": 0,
"gid": 0
},
"args": [
"sleep", "5"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "/",
"capabilities": {
"bounding": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"effective": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"inheritable": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"permitted": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
],
"ambient": [
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
},
"rlimits": [
{
"type": "RLIMIT_NOFILE",
"hard": 1024,
"soft": 1024
}
],
"noNewPrivileges": true
},
Now we can go through the lifecycle operations in your shell.
# run as root
cd /mycontainer
runc create mycontainerid
# view the container is created and in the "created" state
runc list
# start the process inside the container
runc start mycontainerid
# after 5 seconds view that the container has exited and is now in the stopped state
runc list
# now delete the container
runc delete mycontainerid
This allows higher level systems to augment the container's creation logic with setup of various settings after the container is created and/or before it is deleted. For example, the container's network stack is commonly set up after create but before start.
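A minimal sketch of that pattern, assuming a veth interface named veth-container has already been created on the host (the interface name and the use of jq are illustrative):

# run as root
runc create mycontainerid
# look up the PID of the container's init process from the state output
PID=$(runc state mycontainerid | jq .pid)
# move the pre-created veth end into the container's network namespace
ip link set veth-container netns "$PID"
# networking is wired up; now start the container's process
runc start mycontainerid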
Rootless containers
runc has the ability to run containers without root privileges. This is called rootless. You need to pass some parameters to runc in order to run rootless containers. See below and compare with the previous version. Run the following commands as an ordinary user:
# Same as the first example
mkdir ~/mycontainer
cd ~/mycontainer
mkdir rootfs
docker export $(docker create busybox) | tar -C rootfs -xvf -
# The --rootless parameter instructs runc spec to generate a configuration for a rootless container, which will allow you to run the container as a non-root user.
runc spec --rootless
# The --root parameter tells runc where to store the container state. It must be writable by the user.
runc --root /tmp/runc run mycontainerid
Supervisors
runc can be used with process supervisors and init systems to ensure that containers are restarted when they exit.
An example systemd unit file looks something like this.
[Unit]
Description=Start My Container
[Service]
Type=forking
ExecStart=/usr/local/sbin/runc run -d --pid-file /run/mycontainerid.pid mycontainerid
ExecStopPost=/usr/local/sbin/runc delete mycontainerid
WorkingDirectory=/mycontainer
PIDFile=/run/mycontainerid.pid
[Install]
WantedBy=multi-user.target
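To have systemd manage the container with that unit, something like the following would be typical (the unit file path and name are assumptions):

# assuming the unit above is saved as /etc/systemd/system/mycontainer.service
systemctl daemon-reload
systemctl enable --now mycontainer.service
# check the container's status through systemd
systemctl status mycontainer.service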