Currently startContainer() is used to create and to run a container.
In the next patch it will be used to restore a container.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
Previously Host{U,G}ID only gave you the root mapping, which isn't very
useful if you are trying to do other things with the IDMaps.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:
* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update
The following commands work:
* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state
In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This maybe a nice extra but it adds complication to the usecase. The
contract is listen on the socket and you get an fd to the pty master and
that is that.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This is a fix for rootless containers and general io handling. The
higher level systems must preparte the IO for the container in the
detach case and make sure it is setup correctly for the container's
process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The current support of systemd-notify has a race condition as the
message send to the systemd notify socket might be dropped if the sender
process is not running by the time systemd checks for the sender of the
datagram. A proper fix of this in systemd would require changes to the
kernel to maintain the cgroup of the sender process when it is dead (but
it is not probably going to happen...)
Generally, the solution to this issue is to specify the PID in the
message itself so that systemd has not to guess the sender, but this
wouldn't work when running in a PID namespace as the container will pass
the PID known in its namespace (something like PID=1,2,3..) and systemd
running on the host is not able to map it to the runc service.
The proposed solution is to have a proxy in runc that forwards the
messages to the host systemd.
Example of this issue:
https://github.com/projectatomic/atomic-system-containers/pull/24
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
This preserves the given number of file descriptors on top of the 3 stdio and
the socket activation ($LISTEN_FDS=M) fds.
If LISTEN_FDS is not set then [3..3+N) would be preserved by --preserve-fds=N.
Given LISTEN_FDS=3 and --preserve-fds=5 then we would preserve fds [3, 11) (in
addition to stdio). That's 3, 4 & 5 from LISTEN_FDS=3 and 6, 7, 8, 9 & 10 from
--preserve-fds=5.
Signed-off-by: Ian Campbell <ian.campbell@docker.com>
We could fail and return in tty.recvtty() and miss
tty.Close(), which might cause broken console output.
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
This allows for higher-level orchestrators to be able to have access to
the master pty file descriptor without keeping the runC process running.
This is key to having (detach && createTTY) with a _real_ pty created
inside the container, which is then sent to a higher level orchestrator
over an AF_UNIX socket.
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This implements {createTTY, detach} and all of the combinations and
negations of the two that were previously implemented. There are some
valid questions about out-of-OCI-scope topics like !createTTY and how
things should be handled (why do we dup the current stdio to the
process, and how is that not a security issue). However, these will be
dealt with in a separate patchset.
In order to allow for late console setup, split setupRootfs into the
"preparation" section where all of the mounts are created and the
"finalize" section where we pivot_root and set things as ro. In between
the two we can set up all of the console mountpoints and symlinks we
need.
We use two-stage synchronisation to ensures that when the syscalls are
reordered in a suboptimal way, an out-of-place read() on the parentPipe
will not gobble the ancilliary information.
This patch is part of the console rewrite patchset.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
This removes the use of a signal handler and SIGCONT to signal the init
process to exec the users process.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This adds an `--no-new-keyring` flag to run and create so that a new
session keyring is not created for the container and the calling
processes keyring is inherited.
Fixes#818
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
In user namespaces, we need to make sure we don't chown() the console to
unmapped users. This means we need to get both the UID and GID of the
root user in the container when changing the owner.
Signed-off-by: Aleksa Sarai <asarai@suse.de>