This updates runc and libcontainer to handle rlimits per process and set
them correctly for the container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This commit adds support to libcontainer to allow caps, no new privs,
apparmor, and selinux process label to the process struct so that it can
be used together of override the base settings on the container config
per individual process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This is needed to make 'runc delete' correctly run the post-stop hooks.
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Ed King <eking@pivotal.io>
This adds `configs.IsNamespaceSupported(nsType)` to check if the host supports
a namespace type.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
This prior fix to set "-1" explicitly was lost, and it is simpler to use
the same pointer type from the OCI spec to handle nil pointer == -1 ==
unset case.
Also, as a nearly humorous aside, there was a test for MemorySwappiness
that was actually setting Memory, and it was passing because of this
bug (as it was always setting everyone's MemorySwappiness to zero!)
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
Set `-1` doesn't mean disable swap, disable swap means you
can't use swap memory, set `-1` really means you can use
unlimited swap memory.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Godeps: Vendor opencontainers/specs 96bcd043aa
Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274
Signed-off-by: Antonio Murdaca <runcom@linux.com>
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.
Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.
Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set.
config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Do not have methods and actions that require syscalls in the configs
package because it breaks cross compile.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.
Signed-off-by: Matthew Heon <mheon@redhat.com>
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.
{
"type": "bind",
"source": "/root/mnt-source",
"destination": "/root/mnt-dest",
"options": "rbind,shared"
}
One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.
#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest private
Reason being that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to mount syscall.
That too, one propagation flag at a time. (from mount man page).
Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.
Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.
This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>