An init process can join other namespaces (pidns, ipc etc.). This leverages
C code defined in nsenter package to spawn a process with correct namespaces
and clone if necessary.
This moves all setns and cloneflags related code to nsenter layer, which mean
that we dont use Go os/exec to create process with cloneflags and set
uid/gid_map or setgroups anymore. The necessary data is passed from Go to C
using a netlink binary-encoding format.
With this change, setns and init processes are almost the same, which brings
some opportunity for refactoring.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
[mickael.laventure@docker.com: adapted to apply on master @ d97d5e]
Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@docker.com>
replace passing of pid and console path via environment variable with passing
them with netlink message via an established pipe.
this change requires us to set _LIBCONTAINER_INITTYPE and
_LIBCONTAINER_INITPIPE as the env environment of the bootstrap process as we
only send the bootstrap data for setns process right now. When init and setns
bootstrap process are unified (i.e., init use nsexec instead of Go to clone new
process), we can remove _LIBCONTAINER_INITTYPE.
Note:
- we read nlmsghdr first before reading the content so we can get the total
length of the payload and allocate buffer properly instead of allocating
one large buffer.
- check read bytes vs the wanted number. It's an error if we failed to read
the desired number of bytes from the pipe into the buffer.
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>