runc/libcontainer
Kir Kolyshkin 59897367c4 cgroups/systemd: allow to set -1 as pids.limit
Currently, both systemd cgroup drivers (v1 and v2) only set
"TasksMax" unit property if the value > 0, so there is no
way to update the limit to -1 / unlimited / infinity / max.

Since systemd driver is backed by fs driver, and both fs and fs2
set the limit of -1 properly, it works, but systemd still has
the old value:

 # runc --systemd-cgroup update $CT --pids-limit 42
 # systemctl show runc-$CT.scope | grep TasksMax
 TasksMax=42
 # cat /sys/fs/cgroup/system.slice/runc-$CT.scope/pids.max
 42

 # ./runc --systemd-cgroup update $CT --pids-limit -1
 # systemctl show runc-$CT.scope | grep TasksMax=
 TasksMax=42
 # cat /sys/fs/cgroup/system.slice/runc-xx77.scope/pids.max
 max

Fix by changing the condition to allow -1 as a valid value.

NOTE: other negative values are still ignored by the systemd drivers
(as was done before). I am not sure whether this is correct, or
whether we should return an error.

A test case is added.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-20 13:20:04 -07:00


libcontainer


Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container, performing additional operations after the container is created.

Container

A container is a self-contained execution environment that shares the kernel of the host system and is (optionally) isolated from other containers in the system.

Using libcontainer

Because containers are spawned in a two-step process, you need a binary that will be executed as the init process for the container. In libcontainer, the current binary (/proc/self/exe) is re-executed as the init process with the argument "init". We call this first-step process "bootstrap", so you always need an "init" function as the entry point of "bootstrap".

In addition to the Go init function, the early-stage bootstrap is handled by importing nsenter.

import (
	"os"
	"runtime"

	"github.com/opencontainers/runc/libcontainer"
	_ "github.com/opencontainers/runc/libcontainer/nsenter"
	"github.com/sirupsen/logrus"
)

func init() {
	// Only the re-executed child (started with the "init" argument) runs this branch.
	if len(os.Args) > 1 && os.Args[1] == "init" {
		runtime.GOMAXPROCS(1)
		runtime.LockOSThread()
		factory, _ := libcontainer.New("")
		if err := factory.StartInitialization(); err != nil {
			logrus.Fatal(err)
		}
		panic("--this line should have never been executed, congratulations--")
	}
}
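To make the two-step flow concrete, here is a stripped-down sketch of the same re-exec pattern using only the standard library. It omits everything libcontainer adds (namespaces, the nsenter constructor, cgroups); the helper names `isInitInvocation` and `reexecCommand` are hypothetical, and `/proc/self/exe` makes it Linux-only.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// isInitInvocation reports whether this process was started as the
// "init" step, mirroring the os.Args check in the init function above.
func isInitInvocation(args []string) bool {
	return len(args) > 1 && args[1] == "init"
}

// reexecCommand builds a command that runs the current binary again
// (via /proc/self/exe, Linux-only) with "init" as its first argument.
func reexecCommand() *exec.Cmd {
	return exec.Command("/proc/self/exe", "init")
}

func main() {
	if isInitInvocation(os.Args) {
		// Second step: in libcontainer, StartInitialization would run here.
		fmt.Println("child: started as init")
		return
	}
	// First step: the parent re-executes itself and waits for the child.
	cmd := reexecCommand()
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "re-exec failed:", err)
		os.Exit(1)
	}
}
```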

Then, to create a container, you first have to initialize an instance of a factory that will handle the creation and initialization of a container.

factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
	logrus.Fatal(err)
	return
}
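`libcontainer.New` takes the state root directory followed by option functions such as `libcontainer.Cgroupfs` and `libcontainer.InitArgs`, which are applied to the factory in turn. The sketch below illustrates that functional-options shape with hypothetical types (`factory`, `option`, `withCgroupfs`, `withInitArgs`); it is not libcontainer's code.

```go
package main

import (
	"errors"
	"fmt"
)

// factory is a hypothetical stand-in for a container factory; only the
// fields needed to show the option pattern are included.
type factory struct {
	root         string
	cgroupDriver string
	initArgs     []string
}

// option mutates a factory and may reject invalid input.
type option func(*factory) error

// withCgroupfs selects a cgroup driver (analogue of libcontainer.Cgroupfs).
func withCgroupfs(f *factory) error {
	f.cgroupDriver = "cgroupfs"
	return nil
}

// withInitArgs sets the command used to re-execute the init step
// (analogue of libcontainer.InitArgs).
func withInitArgs(args ...string) option {
	return func(f *factory) error {
		if len(args) == 0 {
			return errors.New("init args must not be empty")
		}
		f.initArgs = args
		return nil
	}
}

// newFactory applies options in order and stops at the first error.
func newFactory(root string, opts ...option) (*factory, error) {
	f := &factory{root: root}
	for _, opt := range opts {
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

func main() {
	f, err := newFactory("/var/lib/container", withCgroupfs, withInitArgs("/proc/self/exe", "init"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *f)
}
```

Options let new settings be added without changing the constructor's signature, and each option can validate its own input.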

Once you have created an instance of the factory, you can create a configuration struct describing how the container is to be created. A sample would look similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
                Bounding: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Effective: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Inheritable: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Permitted: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
                Ambient: []string{
                        "CAP_CHOWN",
                        "CAP_DAC_OVERRIDE",
                        "CAP_FSETID",
                        "CAP_FOWNER",
                        "CAP_MKNOD",
                        "CAP_NET_RAW",
                        "CAP_SETGID",
                        "CAP_SETUID",
                        "CAP_SETFCAP",
                        "CAP_SETPCAP",
                        "CAP_NET_BIND_SERVICE",
                        "CAP_SYS_CHROOT",
                        "CAP_KILL",
                        "CAP_AUDIT_WRITE",
                },
        },
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
		{Type: configs.NEWCGROUP},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			Devices:          specconv.AllowedDevices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  specconv.AllowedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID:      1000,
			Size:        65536,
		},
	},
	GidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID:      1000,
			Size:        65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}
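With the UidMappings and GidMappings above, container ID 0 (root) corresponds to host ID 1000 and the range covers 65536 IDs, so container ID n maps to host ID 1000 + n for 0 ≤ n < 65536; IDs outside the range are unmapped. The kernel performs this translation itself; the sketch below (a hypothetical `idMap` type and `hostID` helper, not part of libcontainer) only reproduces the arithmetic.

```go
package main

import "fmt"

// idMap mirrors the three fields used in UidMappings/GidMappings above.
type idMap struct {
	ContainerID int64
	HostID      int64
	Size        int64
}

// hostID translates a container-side ID to the host-side ID using the
// first mapping whose range [ContainerID, ContainerID+Size) contains it.
func hostID(maps []idMap, id int64) (int64, bool) {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), true
		}
	}
	return 0, false
}

func main() {
	maps := []idMap{{ContainerID: 0, HostID: 1000, Size: 65536}}
	for _, id := range []int64{0, 1000, 65535, 65536} {
		h, ok := hostID(maps, id)
		fmt.Printf("container %d -> host %d (mapped=%v)\n", id, h, ok)
	}
}
```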

Once you have populated the configuration, you can create a container:

container, err := factory.Create("container-id", config)
if err != nil {
	logrus.Fatal(err)
	return
}

To spawn bash as the initial process inside the container and have the process's PID returned in order to wait for, signal, or kill the process:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
	Init:   true,
}

// err is already declared by factory.Create above, so assign with = rather than :=.
err = container.Run(process)
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
	return
}

// wait for the process to finish.
_, err = process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()

Additional ways to interact with a running container are:

// return all the pids for all processes running inside the container.
processes, err := container.Processes()

// get detailed CPU, memory, I/O, and network statistics for the container and
// its processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send a signal to the container's init process
// (pass true as the second argument to signal all processes instead).
container.Signal(signal, false)

// update container resource constraints.
container.Set(config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This lets you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

CRIU version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have CRIU installed, you can build it from source, following the online instructions. CRIU is also installed in the Docker image generated when building libcontainer with Docker.
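A minimum-version requirement such as "1.5.2 or higher" should compare version components numerically, not as strings (as strings, "1.10" sorts before "1.5"). The sketch below uses a hypothetical `atLeast` helper; it is not libcontainer's own CRIU check, and it assumes you already have the installed version as a dotted string.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a dotted version string such as "3.14" or
// "1.5.2" is greater than or equal to min. Missing components count as 0.
func atLeast(version string, min [3]int) (bool, error) {
	parts := strings.Split(strings.TrimSpace(version), ".")
	if len(parts) > 3 {
		return false, fmt.Errorf("unexpected version %q", version)
	}
	var v [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("bad component %q in %q: %w", p, version, err)
		}
		v[i] = n
	}
	// Compare major, then minor, then sublevel.
	for i := 0; i < 3; i++ {
		if v[i] != min[i] {
			return v[i] > min[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, ver := range []string{"1.5.1", "1.5.2", "1.6", "3.14"} {
		ok, err := atLeast(ver, [3]int{1, 5, 2})
		fmt.Println(ver, ok, err)
	}
}
```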

Code and documentation copyright 2014 Docker, Inc. The code and documentation are released under the Apache 2.0 license. The documentation is also released under the Creative Commons Attribution 4.0 International License. You may obtain a copy of the license, titled CC-BY-4.0, at http://creativecommons.org/licenses/by/4.0/.