5d93fed3d2
This sets the init processes that join and setup the container's namespaces as non-dumpable before they setns to the container's pid (or any other ) namespace. This settings is automatically reset to the default after the Exec in the container so that it does not change functionality for the applications that are running inside, just our init processes. This prevents parent processes, the pid 1 of the container, to ptrace the init process before it drops caps and other sets LSMs. This patch also ensures that the stateDirFD being used is still closed prior to exec, even though it is set as O_CLOEXEC, because of the order in the kernel. https://github.com/torvalds/linux/blob/v4.9/fs/exec.c#L1290-L1318 The order during the exec syscall is that the process is set back to dumpable before O_CLOEXEC are processed. Signed-off-by: Michael Crosby <crosbymichael@gmail.com> |
||
---|---|---|
.. | ||
apparmor | ||
cgroups | ||
configs | ||
criurpc | ||
devices | ||
integration | ||
keys | ||
label | ||
nsenter | ||
seccomp | ||
selinux | ||
specconv | ||
stacktrace | ||
system | ||
user | ||
utils | ||
xattr | ||
README.md | ||
SPEC.md | ||
capabilities_ambient.go | ||
capabilities_linux.go | ||
capabilities_noambient.go | ||
compat_1.5_linux.go | ||
console.go | ||
console_freebsd.go | ||
console_linux.go | ||
console_solaris.go | ||
console_windows.go | ||
container.go | ||
container_linux.go | ||
container_linux_test.go | ||
container_solaris.go | ||
container_windows.go | ||
criu_opts_unix.go | ||
criu_opts_windows.go | ||
error.go | ||
error_test.go | ||
factory.go | ||
factory_linux.go | ||
factory_linux_test.go | ||
generic_error.go | ||
generic_error_test.go | ||
init_linux.go | ||
message_linux.go | ||
network_linux.go | ||
notify_linux.go | ||
notify_linux_test.go | ||
process.go | ||
process_linux.go | ||
restored_process.go | ||
rootfs_linux.go | ||
rootfs_linux_test.go | ||
setgroups_linux.go | ||
setns_init_linux.go | ||
standard_init_linux.go | ||
state_linux.go | ||
state_linux_test.go | ||
stats.go | ||
stats_freebsd.go | ||
stats_linux.go | ||
stats_solaris.go | ||
stats_windows.go | ||
sync.go |
README.md
Libcontainer provides a native Go implementation for creating containers with namespaces, cgroups, capabilities, and filesystem access controls. It allows you to manage the lifecycle of the container performing additional operations after the container is created.
Container
A container is a self contained execution environment that shares the kernel of the host system and which is (optionally) isolated from other containers in the system.
Using libcontainer
Because containers are spawned in a two step process you will need a binary that will be executed as the init process for the container. In libcontainer, we use the current binary (/proc/self/exe) to be executed as the init process, and use arg "init", we call the first step process "bootstrap", so you always need a "init" function as the entry of "bootstrap".
func init() {
if len(os.Args) > 1 && os.Args[1] == "init" {
runtime.GOMAXPROCS(1)
runtime.LockOSThread()
factory, _ := libcontainer.New("")
if err := factory.StartInitialization(); err != nil {
logrus.Fatal(err)
}
panic("--this line should have never been executed, congratulations--")
}
}
Then to create a container you first have to initialize an instance of a factory that will handle the creation and initialization for a container.
factory, err := libcontainer.New("/var/lib/container", libcontainer.Cgroupfs, libcontainer.InitArgs(os.Args[0], "init"))
if err != nil {
logrus.Fatal(err)
return
}
Once you have an instance of the factory created we can create a configuration struct describing how the container is to be created. A sample would look similar to this:
defaultMountFlags := syscall.MS_NOEXEC | syscall.MS_NOSUID | syscall.MS_NODEV
config := &configs.Config{
Rootfs: "/your/path/to/rootfs",
Capabilities: []string{
"CAP_CHOWN",
"CAP_DAC_OVERRIDE",
"CAP_FSETID",
"CAP_FOWNER",
"CAP_MKNOD",
"CAP_NET_RAW",
"CAP_SETGID",
"CAP_SETUID",
"CAP_SETFCAP",
"CAP_SETPCAP",
"CAP_NET_BIND_SERVICE",
"CAP_SYS_CHROOT",
"CAP_KILL",
"CAP_AUDIT_WRITE",
},
Namespaces: configs.Namespaces([]configs.Namespace{
{Type: configs.NEWNS},
{Type: configs.NEWUTS},
{Type: configs.NEWIPC},
{Type: configs.NEWPID},
{Type: configs.NEWUSER},
{Type: configs.NEWNET},
}),
Cgroups: &configs.Cgroup{
Name: "test-container",
Parent: "system",
Resources: &configs.Resources{
MemorySwappiness: nil,
AllowAllDevices: nil,
AllowedDevices: configs.DefaultAllowedDevices,
},
},
MaskPaths: []string{
"/proc/kcore",
"/sys/firmware",
},
ReadonlyPaths: []string{
"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
},
Devices: configs.DefaultAutoCreatedDevices,
Hostname: "testing",
Mounts: []*configs.Mount{
{
Source: "proc",
Destination: "/proc",
Device: "proc",
Flags: defaultMountFlags,
},
{
Source: "tmpfs",
Destination: "/dev",
Device: "tmpfs",
Flags: syscall.MS_NOSUID | syscall.MS_STRICTATIME,
Data: "mode=755",
},
{
Source: "devpts",
Destination: "/dev/pts",
Device: "devpts",
Flags: syscall.MS_NOSUID | syscall.MS_NOEXEC,
Data: "newinstance,ptmxmode=0666,mode=0620,gid=5",
},
{
Device: "tmpfs",
Source: "shm",
Destination: "/dev/shm",
Data: "mode=1777,size=65536k",
Flags: defaultMountFlags,
},
{
Source: "mqueue",
Destination: "/dev/mqueue",
Device: "mqueue",
Flags: defaultMountFlags,
},
{
Source: "sysfs",
Destination: "/sys",
Device: "sysfs",
Flags: defaultMountFlags | syscall.MS_RDONLY,
},
},
UidMappings: []configs.IDMap{
{
ContainerID: 0,
HostID: 1000,
Size: 65536,
},
},
GidMappings: []configs.IDMap{
{
ContainerID: 0,
HostID: 1000,
Size: 65536,
},
},
Networks: []*configs.Network{
{
Type: "loopback",
Address: "127.0.0.1/0",
Gateway: "localhost",
},
},
Rlimits: []configs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Hard: uint64(1025),
Soft: uint64(1025),
},
},
}
Once you have the configuration populated you can create a container:
container, err := factory.Create("container-id", config)
if err != nil {
logrus.Fatal(err)
return
}
To spawn bash as the initial process inside the container and have the processes pid returned in order to wait, signal, or kill the process:
process := &libcontainer.Process{
Args: []string{"/bin/bash"},
Env: []string{"PATH=/bin"},
User: "daemon",
Stdin: os.Stdin,
Stdout: os.Stdout,
Stderr: os.Stderr,
}
err := container.Run(process)
if err != nil {
container.Destroy()
logrus.Fatal(err)
return
}
// wait for the process to finish.
_, err := process.Wait()
if err != nil {
logrus.Fatal(err)
}
// destroy the container.
container.Destroy()
Additional ways to interact with a running container are:
// return all the pids for all processes running inside the container.
processes, err := container.Processes()
// get detailed cpu, memory, io, and network statistics for the container and
// it's processes.
stats, err := container.Stats()
// pause all processes inside the container.
container.Pause()
// resume all paused processes.
container.Resume()
// send signal to container's init process.
container.Signal(signal)
// update container resource constraints.
container.Set(config)
// get current status of the container.
status, err := container.Status()
// get current container's state information.
state, err := container.State()
Checkpoint & Restore
libcontainer now integrates CRIU for checkpointing and restoring containers. This let's you save the state of a process running inside a container to disk, and then restore that state into a new process, on the same machine or on another machine.
criu
version 1.5.2 or higher is required to use checkpoint and restore.
If you don't already have criu
installed, you can build it from source, following the
online instructions. criu
is also installed in the docker image
generated when building libcontainer with docker.
Copyright and license
Code and documentation copyright 2014 Docker, inc. Code released under the Apache 2.0 license. Docs released under Creative commons.