This updates runc and libcontainer to handle rlimits per process and set
them correctly for the container.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This commit adds support to libcontainer to allow caps, no new privs,
apparmor, and selinux process label to the process struct so that it can
be used together of override the base settings on the container config
per individual process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
currentState() always adds all possible namespaces to the state,
regardless of whether they are supported.
If orderNamespacePaths detects an unsupported namespace, an error is
returned that results in initialization failure.
Fix this by only adding paths of supported namespaces to the state.
Signed-off-by: Ido Yariv <ido@wizery.com>
An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.
This moves all setns and cloneflags related code to nsenter layer, which mean
that we dont use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.
With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
This adds orderNamespacePaths to get correct order of namespaces for the
bootstrap program to join.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
Create a unique session key name for every container. Use the pattern
_ses.<postfix> with postfix being the container's Id.
This patch does not prevent containers from joining each other's session
keyring.
Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com>
Docker uses Prestart hooks to call a libnetwork hook to create
network devices and set addesses and routes.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
This options is set a namespace mask which will not be dumped and restored.
For example, we are going to use this option to restore network
for docker containers. CRIU will create a network namespace and
call a libnetwork hook to restore network devices, addresses and routes.
Signed-off-by: Andrew Vagin <avagin@virtuozzo.com>
We don't need a CreatedTime method on the container because it's not
part of the interface and can be received via the state. We also do not
need to call it CreateTime because the type of this field is time.Time
so we know its time.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Marshall the raw objects for the sync pipes so that no new line chars
are left behind in the pipe causing errors.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
There were issues where a process could die before pausing completed
leaving the container in an inconsistent state and unable to be
destoryed. This makes sure that if the container is paused and the
process is dead it will unfreeze the cgroup before removing them.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
It may be desirable to receive memory pressure levels notifications
before the container depletes all memory. This may be useful for
handling cases where the system thrashes when reaching the container's
memory limits.
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add state status() method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Allow multiple checkpoint on restore
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handle leave-running state
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Fix state transitions for inprocess
Because the tests use libcontainer in process between the various states
we need to ensure that that usecase works as well as the out of process
one.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove isDestroyed method
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Handling Pausing from freezer state
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
freezer status
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Fixing review comments
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Added comment when freezer not available
Signed-off-by: Rajasekaran <rajasec79@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Conflicts:
libcontainer/container_linux.go
Change checkFreezer logic to isPaused()
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove state base and factor out destroy func
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Add unit test for state transitions
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows us to distinguish cases where a container
needs to just join the paths or also additionally
set cgroups settings. This will help in implementing
cgroupsPath support in the spec.
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
replace passing of pid and console path via environment variable with passing
them with netlink message via an established pipe.
this change requires us to set _LIBCONTAINER_INITTYPE and
_LIBCONTAINER_INITPIPE as the env environment of the bootstrap process as we
only send the bootstrap data for setns process right now. When init and setns
bootstrap process are unified (i.e., init use nsexec instead of Go to clone new
process), we can remove _LIBCONTAINER_INITTYPE.
Note:
- we read nlmsghdr first before reading the content so we can get the total
length of the payload and allocate buffer properly instead of allocating
one large buffer.
- check read bytes vs the wanted number. It's an error if we failed to read
the desired number of bytes from the pipe into the buffer.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
add bootstrap data to setns process. If we have any bootstrap data then copy it
to the bootstrap process (i.e. nsexec) using the sync pipe. This will allow us
to eventually replace environment variable usage with more structured data
to setup namespaces, write pid/gid map, setgroup etc.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
When starting and quering for pids a container can start and exit before
this is set. So set the opts after the process is started and while
libcontainer still has the container's process blocking on the pipe.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
While testing different versions of criu it helps to know which
criu binary with which options is currently used. Therefore additional
debug output to display these information is added.
v2: increase readability of printed out criu options
Signed-off-by: Adrian Reber <adrian@lisas.de>
Here are two reasons:
* If we use systemd, we need to ask it to create cgroups
* If a container is restored with another ID, we need to
change paths to cgroups.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
This adds a `Signal()` method to the container interface so that the
initial process can be signaled after a Load or operation. It also
implements signaling the init process from a nonChildProcess.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
A boolean field named GidMappingsEnableSetgroups was added to
SysProcAttr in Go1.5. This field determines the value of the process's
setgroups proc entry.
Since the default is to set the entry to 'deny', calling setgroups will
fail on systems running kernels 3.19+.
Set GidMappingsEnableSetgroups to true so setgroups wont be set to
'deny'.
Signed-off-by: Ido Yariv <ido@wizery.com>
Actually cgroup mounts are bind-mounts, so they should be
handled by the same way.
Reported-by: Ross Boucher <rboucher@gmail.com>
Signed-off-by: Andrey Vagin <avagin@openvz.org>