This updates runc and libcontainer to handle rlimits per process and set
them correctly for the container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The error handling on the runc cli is currenly pretty messy because
messages to the user are split between regular stderr format and logrus
message format. This changes all the error reporting to the cli to only
output on stderr and exit(1) for consumers of the api.
By default logrus logs to /dev/null so that it is not seen by the user.
If the user wants extra and/or structured loggging/errors from runc they
can use the `--log` flag to provide a path to the file where they want
this information. This allows a consistent behavior on the cli but
extra power and information when debugging with logs.
This also includes a change to enable the same logging information
inside the container's init by adding an init cli command that can share
the existing flags for all other runc commands.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This commit adds support to libcontainer to allow caps, no new privs,
apparmor, and selinux process label to the process struct so that it can
be used together of override the base settings on the container config
per individual process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This bump of the spec includes a change to the deivce type to be a
string so that it is more readable in the json serialization.
It also includes the change were caps, no new privs, and process
labeling features are moved from the container config onto the process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This updates the current list to what we have now in docker and also
makes these always added so that these are masked out. Privileged
containers can always unmount these if they want to read from kcore or
something like that.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This saves and returns the bundle path for the container in the
container's config and state. It also returns the information via runc
list.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This prior fix to set "-1" explicitly was lost, and it is simpler to use
the same pointer type from the OCI spec to handle nil pointer == -1 ==
unset case.
Also, as a nearly humorous aside, there was a test for MemorySwappiness
that was actually setting Memory, and it was passing because of this
bug (as it was always setting everyone's MemorySwappiness to zero!)
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com> (github: estesp)
When CgroupsPath code was introduced with #497 it was mistakenly made
to act as the equivalent of docker CgroupsParent. This ensure that it
is taken as the final cgroup path.
A couple of unit tests have been added to prevent future regression.
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
This leaves out the internal conversions as we may need to consider
docker backward compatibility for those changes.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
Add support for the pids cgroup controller to libcontainer, a recent
feature that is available in Linux 4.3+.
Unfortunately, due to the init process being written in Go, it can spawn
an an unknown number of threads due to blocked syscalls. This results in
the init process being unable to run properly, and thus small pids.max
configs won't work properly.
Signed-off-by: Aleksa Sarai <asarai@suse.com>
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
Godeps: Vendor opencontainers/specs 96bcd043aa
Fix a bug where it's impossible to pass multiple devices to blkio
cgroup controller files. See https://github.com/opencontainers/runc/issues/274
Signed-off-by: Antonio Murdaca <runcom@linux.com>
spec introduced a new field rootfsPropagation. Right now that field
is not parsed by runc and it does not take effect. Starting parsing
it and for now allow only limited propagation flags. More can be
opened as new use cases show up.
We are apply propagation flags on / and not rootfs. So ideally
we should introduce another field in spec say rootPropagation. For
now I am parsing rootfsPropagation. Once we agree on design, we
can discuss if we need another field in spec or not.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now config.Privatefs is a boolean which determines if / is applied
with propagation flag syscall.MS_PRIVATE | syscall.MS_REC or not.
Soon we want to represent other propagation states like private, [r]slave,
and [r]shared. So either we can introduce more boolean variable or keep
track of propagation flags in an integer variable. Keeping an integer
variable is more versatile and can allow various kind of propagation flags
to be specified. So replace Privatefs with RootPropagation which is an
integer.
Note, this will require changes in docker. Instead of setting Privatefs
to true, they will need to set.
config.RootPropagation = syscall.MS_PRIVATE | syscall.MS_REC
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.
{
"type": "bind",
"source": "/root/mnt-source",
"destination": "/root/mnt-dest",
"options": "rbind,shared"
}
One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.
#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest private
Reason being that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to mount syscall.
That too, one propagation flag at a time. (from mount man page).
Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.
Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.
This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>