Commit Graph

26 Commits

Author SHA1 Message Date
Kenfe-Mickael Laventure 6325ab96e7 Call Prestart hook after namespaces have been set
This simply move the call to the Prestart hooks to be made once we
receive the procReady message from the client.

This is necessary as we had to move the setns calls within nsexec in
order to be accomodate joining namespaces that only affect future
children (e.g. NEWPID).

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-02-28 12:26:53 -08:00
Daniel, Dao Quang Minh 42d5d04801 Sets custom namespaces for init processes
An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.

This moves all setns and cloneflags related code to nsenter layer, which mean
that we dont use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.

With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
2016-02-28 12:26:53 -08:00
Hushan Jia 8597d5c969 Use single decoder instance for one stream
This will avoid part of the stream be read and abandomed
and resulting decoding errors.

Signed-off-by: Hushan Jia <hushan.jia@gmail.com>
2016-02-26 19:40:35 +08:00
Mrunal Patel 2f27649848 Move pre-start hooks after container mounts
Today mounts in pre-start hooks get overriden by the default mounts.
Moving the pre-start hooks to after the container mounts and before
the pivot/move root gives better flexiblity in the hooks.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-02-23 02:50:35 -08:00
Qiang Huang 13e8f6e589 Remove procStart
It's never used and not needed. Our pipe is created with
syscall.SOCK_CLOEXEC, so pipe will be closed once container
process executed successfully, parent process will read EOF
and continue. If container process got error before executed,
we'll write procError to sync with parent.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-30 13:41:21 +08:00
Qiang Huang 064113363d Fix the comment about sendConfig
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-01-28 09:58:30 +08:00
Michael Crosby ddcee3cc2a Do not use stream encoders
Marshall the raw objects for the sync pipes so that no new line chars
are left behind in the pipe causing errors.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-01-26 11:22:05 -08:00
Aleksa Sarai 103853ead7 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Aleksa Sarai f36ed4b174 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

One of the weird cases to deal with is systemd. Systemd sets some of the
cgroup configuration options, but not all of them. Because memory is a
special case, we need to explicitly set memory in the systemd Apply().
Otherwise, the rest can be safely re-applied in .Set() as usual.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2016-01-12 10:06:35 +11:00
Mrunal Patel 4124ba9468 Revert "cgroups: add pids controller support"
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-12-19 07:48:48 -08:00
Aleksa Sarai 14ed8696c1 libcontainer: set cgroup config late
Due to the fact that the init is implemented in Go (which seemingly
randomly spawns new processes and loves eating memory), most cgroup
configurations are required to have an arbitrary minimum dictated by the
init. This confuses users and makes configuration more annoying than it
should. An example of this is pids.max, where Go spawns multiple
processes that then cause init to violate the pids cgroup constraint
before the container can even start.

Solve this problem by setting the cgroup configurations as late as
possible, to avoid hitting as many of the resources hogged by the Go
init as possible. This has to be done before seccomp rules are applied,
as the parent and child must synchronise in order for the parent to
correctly set the configurations (and writes might be blocked by seccomp).

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:48 +11:00
Aleksa Sarai 8a740d5391 libcontainer: cgroups: don't Set in Apply
Apply and Set are two separate operations, and it doesn't make sense to
group the two together (especially considering that the bootstrap
process is added to the cgroup as well). The only exception to this is
the memory cgroup, which requires the configuration to be set before
processes can join.

Signed-off-by: Aleksa Sarai <asarai@suse.com>
2015-12-19 11:30:47 +11:00
David Calavera 77c36f4b34 Move linux only Process.InitializeIO behind the linux build flag.
Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-12-15 15:12:29 -05:00
Daniel, Dao Quang Minh d914bf7347 setns: add bootstrap data
add bootstrap data to setns process. If we have any bootstrap data then copy it
to the bootstrap process (i.e. nsexec) using the sync pipe. This will allow us
to eventually replace environment variable usage with more structured data
to setup namespaces, write pid/gid map, setgroup etc.

Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
2015-11-22 11:36:58 +00:00
Michael Crosby 879dfdd980 Fix race setting process opts
When starting and quering for pids a container can start and exit before
this is set.  So set the opts after the process is started and while
libcontainer still has the container's process blocking on the pipe.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-11-06 16:51:59 -08:00
Brian Goff 7632c4585f Fix for race from error on process start
This rather naively fixes an error observed where a processes stdio
streams are not written to when there is an error upon starting up the
process, such as when the executable doesn't exist within the
container's rootfs.

Before the "fix", when an error occurred on start, `terminate` is called
immediately, which calls `cmd.Process.Kill()`, then calling `Wait()` on
the process. In some cases when this `Kill` is called the stdio stream
have not yet been written to, causing non-deterministic output. The
error itself is properly preserved but users attached to the process
will not see this error.

With the fix it is just calling `Wait()` when an error occurs rather
than trying to `Kill()` the process first. This seems to preserve stdio.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-10-07 21:28:26 -04:00
Mrunal Patel c5d3bda7e1 Merge pull request #292 from keloyang/rpid
no need to use p.cmd.Process.Pid in function, use p.pid() instead.
2015-09-24 15:59:39 -07:00
Mrunal Patel dcafe48737 Add version to HookState to make it json-compatible with spec State
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-23 17:13:00 -07:00
keloyang 69a5b2df9e no need to use p.cmd.Process.Pid in function, use p.pid() instead.
Signed-off-by: keloyang <yangshukui@huawei.com>
2015-09-23 10:48:36 +08:00
David Calavera 0f28592b35 Turn hook pointers into values.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-11 11:34:34 -07:00
Michael Crosby 05567f2c94 Implement hooks in libcontainer
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-09-10 17:57:31 -07:00
Lai Jiangshan e48363d777 simplify a variable declaration
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-20 08:21:44 +08:00
Shijiang Wei f0679089b9 Ensure the cleanup jobs in the deferrer are executed on error
Signed-off-by: Shijiang Wei <mountkin@gmail.com>
2015-08-16 12:29:04 +08:00
Lai Jiangshan e8817e1104 Simplify the return on process wait
Simplify the code introduced by the commit d1f0d5705deb:
    Return actual ProcessState on Wait error

Cc: Alexander Morozov <lk4d4@docker.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-08-12 22:37:34 +08:00
Michael Crosby 080df7ab88 Update import paths for new repository
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:29:59 -07:00
Michael Crosby 8f97d39dd2 Move libcontainer into subdirectory
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:29:15 -07:00