runc/libcontainer
Adrian Reber fa43a72aba
criu: restore into existing namespace when specified
Using CRIU to checkpoint and restore a container into an existing
network namespace is not possible.

If the network namespace is defined like

	{
		"type": "network",
		"path": "/run/netns/test"
	}

the expectation is that the restored container again runs in the
network namespace specified with 'path'.

This adds the new CRIU 'external namespace' feature to runc: during
checkpointing, that specific namespace is referenced, and during
restore, CRIU tries to restore the container into exactly that
namespace.

This breaks/fixes current runc behavior. Without this patch, if runc
restores a container with such a network namespace definition, the
definition is ignored and CRIU recreates a network namespace without a name.

With this patch, runc uses the network namespace path (if available) to
checkpoint and restore the container in exactly that network namespace.

Restore will now fail if a container was checkpointed with a network
namespace path set and if that network namespace path does not exist
during restore.

runc still falls back to the old behavior if a CRIU version older than
3.11 is installed.
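The restore rules above can be condensed into a small decision function. This is a hypothetical sketch, not runc's code: planNetnsRestore, restoreMode and errNetnsMissing are invented names, a single path stands in for both the checkpoint-time and the restore-time namespace path, and CRIU versions are assumed to be encoded as major*10000 + minor*100 + patch (so 3.11 becomes 31100).

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// restoreMode is a hypothetical label for how the network namespace is restored.
type restoreMode string

const (
	// modeExternal: restore into the existing namespace at the configured path.
	modeExternal restoreMode = "external"
	// modeRecreate: old behavior, CRIU creates a new, unnamed namespace.
	modeRecreate restoreMode = "recreate"
)

// minExternalNsVersion is CRIU 3.11, assuming the major*10000+minor*100+patch encoding.
const minExternalNsVersion = 31100

// errNetnsMissing is returned when a configured namespace path is gone at restore time.
var errNetnsMissing = errors.New("network namespace path does not exist")

// planNetnsRestore picks a restore mode from the installed CRIU version, the
// configured namespace path (empty if none) and a path-existence check.
func planNetnsRestore(criuVersion int, nsPath string, exists func(string) bool) (restoreMode, error) {
	if nsPath == "" || criuVersion < minExternalNsVersion {
		// No path configured, or CRIU too old: fall back to the old behavior.
		return modeRecreate, nil
	}
	if !exists(nsPath) {
		// A path was configured but no longer exists: restore must fail.
		return "", fmt.Errorf("%s: %w", nsPath, errNetnsMissing)
	}
	return modeExternal, nil
}

// pathExists is a simple existence check based on os.Stat.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	mode, err := planNetnsRestore(31100, "/run/netns/test", pathExists)
	if err != nil {
		fmt.Println("restore would fail:", err)
		return
	}
	fmt.Println("restore mode:", mode)
}
```

Injecting the existence check as a function keeps the decision logic testable without creating real namespaces under /run/netns.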

Fixes #1786

Related to https://github.com/projectatomic/libpod/pull/469

Thanks to Andrei Vagin for all the help in getting the interface between
CRIU and runc right!

Signed-off-by: Adrian Reber <areber@redhat.com>
2018-08-22 23:27:20 +02:00

README.md

libcontainer

Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container, performing additional operations after the container is created.

Container

A container is a self-contained execution environment that shares the kernel of the host system and is (optionally) isolated from other containers in the system.
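On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/<pid>/ns/, whose targets look like net:[4026531840]; two processes are in the same namespace when these targets match. As a stdlib-only illustration (not part of libcontainer; parseNsLink and sameNamespace are hypothetical helper names), the following sketch parses such a link and compares two processes:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNsLink splits a /proc/<pid>/ns/* link target such as
// "net:[4026531840]" into the namespace type and its inode number.
func parseNsLink(target string) (string, uint64, error) {
	open := strings.Index(target, ":[")
	if open < 0 || !strings.HasSuffix(target, "]") {
		return "", 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	inode, err := strconv.ParseUint(target[open+2:len(target)-1], 10, 64)
	if err != nil {
		return "", 0, err
	}
	return target[:open], inode, nil
}

// sameNamespace reports whether two processes ("self" or a PID string) share
// the namespace of the given type, e.g. "net" or "pid".
func sameNamespace(procA, procB, nsType string) (bool, error) {
	a, err := os.Readlink("/proc/" + procA + "/ns/" + nsType)
	if err != nil {
		return false, err
	}
	b, err := os.Readlink("/proc/" + procB + "/ns/" + nsType)
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	target, err := os.Readlink("/proc/self/ns/net")
	if err != nil {
		fmt.Println("cannot read namespace link (not Linux?):", err)
		return
	}
	nsType, inode, err := parseNsLink(target)
	fmt.Println(nsType, inode, err)

	// PID 1 may be unreadable without privileges; errors are printed, not fatal.
	same, err := sameNamespace("self", "1", "net")
	fmt.Println("shares PID 1's network namespace:", same, err)
}
```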

Using libcontainer

Because containers are spawned in a two-step process, you need a binary that is executed as the init process for the container. In libcontainer, we use the current binary (/proc/self/exe) as the init process and pass it the argument "init". We call this first-stage process "bootstrap", so you always need an "init" function as the entry point of "bootstrap".

In addition to the Go init function, the early-stage bootstrap is handled by importing the nsenter package, which runs C code from a constructor before the Go runtime starts.

import (
	"os"
	"runtime"

	"github.com/opencontainers/runc/libcontainer"
	_ "github.com/opencontainers/runc/libcontainer/nsenter"
	"github.com/sirupsen/logrus"
)

func init() {
	if len(os.Args) > 1 && os.Args[1] == "init" {
		runtime.GOMAXPROCS(1)
		runtime.LockOSThread()
		factory, _ := libcontainer.New("")
		if err := factory.StartInitialization(); err != nil {
			logrus.Fatal(err)
		}
		panic("--this line should have never been executed, congratulations--")
	}
}
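The two-step pattern can be illustrated with the standard library alone. In this hypothetical sketch, isInitStage and bootstrapCommand are illustrative names, not libcontainer functions; the parent only builds the re-exec command for /proc/self/exe (Linux-specific) without starting it, whereas the real flow runs factory.StartInitialization() in the child.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// isInitStage reports whether this invocation is the re-executed "init"
// stage rather than the original parent process.
func isInitStage(args []string) bool {
	return len(args) > 1 && args[1] == "init"
}

// bootstrapCommand builds, but does not start, a command that re-executes
// the current binary through /proc/self/exe with "init" as its argument.
func bootstrapCommand() *exec.Cmd {
	return exec.Command("/proc/self/exe", "init")
}

func main() {
	if isInitStage(os.Args) {
		// In libcontainer, factory.StartInitialization() would run here.
		fmt.Println("running as the init stage")
		return
	}
	cmd := bootstrapCommand()
	fmt.Println("parent would start:", cmd.Path, cmd.Args)
}
```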

To create a container, you first have to initialize an instance of a factory, which handles the creation and initialization of containers.

factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
	logrus.Fatal(err)
	return
}
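In this version of libcontainer, New appears to take a root directory followed by option functions that each modify the factory: libcontainer.Cgroupfs is passed as a function value, while libcontainer.InitArgs(...) is called and returns one. The sketch below reproduces that functional-options pattern with hypothetical types (factory, option, useCgroupfs, withInitArgs, newFactory) rather than libcontainer's own:

```go
package main

import (
	"errors"
	"fmt"
)

// factory is a hypothetical stand-in for libcontainer's factory type.
type factory struct {
	root     string
	cgroupFS bool
	initArgs []string
}

// option modifies a factory and may reject invalid settings.
type option func(*factory) error

// useCgroupfs is passed as a value, like libcontainer.Cgroupfs.
func useCgroupfs(f *factory) error {
	f.cgroupFS = true
	return nil
}

// withInitArgs is called to build an option, like libcontainer.InitArgs(...).
func withInitArgs(args ...string) option {
	return func(f *factory) error {
		if len(args) == 0 {
			return errors.New("init args must not be empty")
		}
		f.initArgs = args
		return nil
	}
}

// newFactory applies each option in order and stops at the first error.
func newFactory(root string, opts ...option) (*factory, error) {
	f := &factory{root: root}
	for _, opt := range opts {
		if err := opt(f); err != nil {
			return nil, err
		}
	}
	return f, nil
}

func main() {
	f, err := newFactory("/var/lib/container", useCgroupfs, withInitArgs("/proc/self/exe", "init"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(f.root, f.cgroupFS, f.initArgs)
}
```

The advantage of this shape is that new factory settings can be added without changing the constructor's signature.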

Once you have created the factory, you can create a configuration struct describing how the container is to be created. A sample looks similar to this:

defaultMountFlags := unix.MS_NOEXEC | unix.MS_NOSUID | unix.MS_NODEV
// The same capability list is used for every set below; each set can also
// be given its own list.
defaultCaps := []string{
	"CAP_CHOWN",
	"CAP_DAC_OVERRIDE",
	"CAP_FSETID",
	"CAP_FOWNER",
	"CAP_MKNOD",
	"CAP_NET_RAW",
	"CAP_SETGID",
	"CAP_SETUID",
	"CAP_SETFCAP",
	"CAP_SETPCAP",
	"CAP_NET_BIND_SERVICE",
	"CAP_SYS_CHROOT",
	"CAP_KILL",
	"CAP_AUDIT_WRITE",
}
config := &configs.Config{
	Rootfs: "/your/path/to/rootfs",
	Capabilities: &configs.Capabilities{
		Bounding:    defaultCaps,
		Effective:   defaultCaps,
		Inheritable: defaultCaps,
		Permitted:   defaultCaps,
		Ambient:     defaultCaps,
	},
	Namespaces: configs.Namespaces([]configs.Namespace{
		{Type: configs.NEWNS},
		{Type: configs.NEWUTS},
		{Type: configs.NEWIPC},
		{Type: configs.NEWPID},
		{Type: configs.NEWUSER},
		{Type: configs.NEWNET},
	}),
	Cgroups: &configs.Cgroup{
		Name:   "test-container",
		Parent: "system",
		Resources: &configs.Resources{
			MemorySwappiness: nil,
			AllowAllDevices:  nil,
			AllowedDevices:   configs.DefaultAllowedDevices,
		},
	},
	MaskPaths: []string{
		"/proc/kcore",
		"/sys/firmware",
	},
	ReadonlyPaths: []string{
		"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
	},
	Devices:  configs.DefaultAutoCreatedDevices,
	Hostname: "testing",
	Mounts: []*configs.Mount{
		{
			Source:      "proc",
			Destination: "/proc",
			Device:      "proc",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "tmpfs",
			Destination: "/dev",
			Device:      "tmpfs",
			Flags:       unix.MS_NOSUID | unix.MS_STRICTATIME,
			Data:        "mode=755",
		},
		{
			Source:      "devpts",
			Destination: "/dev/pts",
			Device:      "devpts",
			Flags:       unix.MS_NOSUID | unix.MS_NOEXEC,
			Data:        "newinstance,ptmxmode=0666,mode=0620,gid=5",
		},
		{
			Device:      "tmpfs",
			Source:      "shm",
			Destination: "/dev/shm",
			Data:        "mode=1777,size=65536k",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "mqueue",
			Destination: "/dev/mqueue",
			Device:      "mqueue",
			Flags:       defaultMountFlags,
		},
		{
			Source:      "sysfs",
			Destination: "/sys",
			Device:      "sysfs",
			Flags:       defaultMountFlags | unix.MS_RDONLY,
		},
	},
	UidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID:      1000,
			Size:        65536,
		},
	},
	GidMappings: []configs.IDMap{
		{
			ContainerID: 0,
			HostID:      1000,
			Size:        65536,
		},
	},
	Networks: []*configs.Network{
		{
			Type:    "loopback",
			Address: "127.0.0.1/0",
			Gateway: "localhost",
		},
	},
	Rlimits: []configs.Rlimit{
		{
			Type: unix.RLIMIT_NOFILE,
			Hard: uint64(1025),
			Soft: uint64(1025),
		},
	},
}
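The UidMappings and GidMappings entries above map container IDs 0 through 65535 onto host IDs 1000 through 66535, so root inside the container is UID 1000 on the host. A hypothetical helper (idMap only mirrors the three fields of configs.IDMap used above; toHostID is not a libcontainer function) makes the arithmetic explicit:

```go
package main

import "fmt"

// idMap mirrors the three fields of configs.IDMap used in the sample config.
type idMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// toHostID translates an ID inside the container to the host ID, reporting
// false when the ID falls outside every mapping range.
func toHostID(maps []idMap, id int) (int, bool) {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), true
		}
	}
	return 0, false
}

func main() {
	uidMappings := []idMap{{ContainerID: 0, HostID: 1000, Size: 65536}}
	for _, uid := range []int{0, 1, 65535, 65536} {
		host, ok := toHostID(uidMappings, uid)
		fmt.Printf("container uid %d -> host uid %d (mapped: %v)\n", uid, host, ok)
	}
}
```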

Once you have populated the configuration, you can create a container:

container, err := factory.Create("container-id", config)
if err != nil {
	logrus.Fatal(err)
	return
}

To spawn bash as the initial process inside the container and have the process's PID returned in order to wait for, signal, or kill the process:

process := &libcontainer.Process{
	Args:   []string{"/bin/bash"},
	Env:    []string{"PATH=/bin"},
	User:   "daemon",
	Stdin:  os.Stdin,
	Stdout: os.Stdout,
	Stderr: os.Stderr,
}

err = container.Run(process) // err was already declared by factory.Create
if err != nil {
	container.Destroy()
	logrus.Fatal(err)
	return
}

// wait for the process to finish.
_, err = process.Wait()
if err != nil {
	logrus.Fatal(err)
}

// destroy the container.
container.Destroy()
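libcontainer's Process.Wait returns an *os.ProcessState together with an error, much like os/exec does. As a stdlib-only analogy (runAndExitCode is a hypothetical helper, and the example assumes a POSIX sh is available), the sketch shows how to tell a process that ran but exited non-zero apart from one that could not be started or waited on:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// runAndExitCode starts a command, waits for it and returns its exit code.
// A non-zero exit is not treated as an error; failing to start or wait is.
func runAndExitCode(name string, args ...string) (int, error) {
	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return -1, err // the process never started
	}
	err := cmd.Wait()
	var exitErr *exec.ExitError
	if err != nil && !errors.As(err, &exitErr) {
		return -1, err // waiting failed for a reason other than the exit status
	}
	// cmd.ProcessState is populated once Wait has returned.
	return cmd.ProcessState.ExitCode(), nil
}

func main() {
	code, err := runAndExitCode("sh", "-c", "exit 3")
	fmt.Println("exit code:", code, "error:", err)
}
```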

Additional ways to interact with a running container are:

// return the PIDs of all processes running inside the container.
processes, err := container.Processes()

// get detailed cpu, memory, io, and network statistics for the container and
// its processes.
stats, err := container.Stats()

// pause all processes inside the container.
container.Pause()

// resume all paused processes.
container.Resume()

// send signal to container's init process (pass true instead of false to
// signal every process in the container).
container.Signal(signal, false)

// update container resource constraints (Set takes a configs.Config value).
container.Set(*config)

// get current status of the container.
status, err := container.Status()

// get current container's state information.
state, err := container.State()
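With the cgroup v1 filesystem driver, pausing and resuming are built on the freezer controller, which is driven by writing FROZEN or THAWED to a cgroup's freezer.state file. The sketch below shows only that file interaction; setFreezerState, freezerState and newFakeCgroupDir are hypothetical helpers, a temporary directory stands in for a real cgroup, and a real implementation also has to poll until the kernel reports FROZEN (it may report FREEZING in between).

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// setFreezerState writes a cgroup v1 freezer state into cgroupDir/freezer.state.
func setFreezerState(cgroupDir, state string) error {
	if state != "FROZEN" && state != "THAWED" {
		return fmt.Errorf("invalid freezer state %q", state)
	}
	return os.WriteFile(filepath.Join(cgroupDir, "freezer.state"), []byte(state), 0o644)
}

// freezerState reads back the current contents of cgroupDir/freezer.state.
func freezerState(cgroupDir string) (string, error) {
	b, err := os.ReadFile(filepath.Join(cgroupDir, "freezer.state"))
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// newFakeCgroupDir creates a temporary directory that stands in for a cgroup.
func newFakeCgroupDir() (string, error) {
	return os.MkdirTemp("", "freezer")
}

func main() {
	dir, err := newFakeCgroupDir()
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)

	for _, s := range []string{"FROZEN", "THAWED"} {
		if err := setFreezerState(dir, s); err != nil {
			fmt.Println(err)
			return
		}
		got, _ := freezerState(dir)
		fmt.Println("freezer.state:", got)
	}
}
```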

Checkpoint & Restore

libcontainer now integrates CRIU for checkpointing and restoring containers. This lets you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.

CRIU version 1.5.2 or higher is required to use checkpoint and restore. If you don't already have CRIU installed, you can build it from source by following the online instructions. CRIU is also installed in the Docker image generated when building libcontainer with Docker.
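One way to enforce this minimum is to parse the output of criu --version. The sketch assumes that output contains a line such as Version: 3.11 or Version: 1.5.2 and encodes it as major*10000 + minor*100 + patch so that versions compare as plain integers; parseCriuVersion and minCriuVersion are hypothetical names, and running the criu binary itself is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// minCriuVersion is 1.5.2 in the major*10000 + minor*100 + patch encoding.
const minCriuVersion = 10502

// parseCriuVersion finds a "Version: X.Y[.Z]" line and encodes it as an int.
func parseCriuVersion(out string) (int, error) {
	for _, line := range strings.Split(out, "\n") {
		if !strings.HasPrefix(line, "Version:") {
			continue
		}
		v := strings.TrimSpace(strings.TrimPrefix(line, "Version:"))
		var major, minor, patch int
		// Sscanf stops at the first mismatch, so "3.11" yields n == 2 and patch == 0.
		if n, _ := fmt.Sscanf(v, "%d.%d.%d", &major, &minor, &patch); n < 2 {
			return 0, fmt.Errorf("cannot parse CRIU version %q", v)
		}
		return major*10000 + minor*100 + patch, nil
	}
	return 0, fmt.Errorf("no Version line in %q", out)
}

func main() {
	v, err := parseCriuVersion("Version: 3.11\nGitID: v3.11\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(v, v >= minCriuVersion)
}
```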

Code and documentation copyright 2014 Docker, Inc. Code released under the Apache 2.0 license. Docs released under Creative Commons.