This adds a new env var for identifying the internal sync pipe that
libcontainer uses to sync with the container and parent process. This
replaces #496 to allow the user to add additional files to the processes
and not take over fd 3 for all containers.
Closes#496
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This updates the console handling to chown the console on creation to
the root user within the container.
This also moves the setup mounts from the userns sidecar process into
the main init processes by trying to mknod devices, if it fails on an
EPERM then bind mount the device from the host into the container for
use. This prevents access issues when the sidecar process mknods the
device for the usernamespace returning an EPERM when writting to
dev/null.
This also adds some error handling for init processes and nsinit updates
with added flags for testing and other functions.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This allows you to set certian configuration options such as what cgroup
implementation to use on the factory at create time.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Remove veth interfaces on the host if an error occurs.
Provide the host interface name, temporary peer interface name and the
name of the peer once it is inside the container's namespace in the
Network config.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This removes a new unused methods from the container interface and types
parameters such as os.Signal and WaitStatus
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
A new constructor function (like nsenter) is added in this patch. This
function gets arguments from environment variables and its behaviour doesn't
depend on a command line arguments.
A program which calls factory.StartInitialization() must import the nsenter
package. It looks ugly, but I don't know another way how to enter into CT from
a go code.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Use namespace.Exec() and namespace.Init() to execute processes in CT.
Now an init process is actually executed in a new container. This series
doesn't change code about creating containers, it only reworks code according
with new API.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
This removes the entire syncpipe package and replaces it with standard
operations on the pipes. The syncpipe type just never felt right and
probably should not have been there.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
This moves the sync pipe into a separate package to help the changes
when moving the API around.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@docker.com> (github: crosbymichael)
It is no longer necessary to pass "SETUID" or "SETGID" capabilities to
the container when a "user" is specified in the config.
Docker-DCO-1.1-Signed-off-by: Bernerd Schaefer <bj.schaefer@gmail.com> (github: bernerdschaefer)
Some applications want to write to /proc. For instance:
docker run -it centos groupadd foo
Gives: groupadd: failure while writing changes to /etc/group
And strace reveals why:
open("/proc/self/task/13/attr/fscreate", O_RDWR) = -1 EROFS (Read-only file system)
I've looked at what other systems do, and systemd-nspawn makes /proc read-write
and /proc/sys readonly, while lxc allows "proc:mixed" which does the same,
plus it makes /proc/sysrq-trigger also readonly.
The later seems like a prudent idea, so we follows lxc proc:mixed.
Additionally we make /proc/irq and /proc/bus, as these seem to let
you control various hardware things.
Docker-DCO-1.1-Signed-off-by: Alexander Larsson <alexl@redhat.com> (github: alexlarsson)
We don't have the flexibility to do extra things with lxc because it is
a black box and most fo the magic happens before we get a chance to
interact with it in dockerinit.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
There is not need for the remount hack, we use aa_change_onexec so the
apparmor profile is not applied until we exec the users app.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
This also cleans up some of the left over restriction paths code from
before.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful. As such, the shadowing
of /proc/kcore is still enabled with both LXC and native drivers.
Special care has to be taken with /proc/1/attr, which still needs to
be mounted read-write in order to enable the AppArmor profile. It is
bind-mounted from a private read-write mount of procfs.
All that enforcement is done in dockerinit. The code doing the real
work is in libcontainer. The init function for the LXC driver calls
the function from libcontainer to avoid code duplication.
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <jerome@docker.com> (github: jpetazzo)
Without this patch, containers inherit the open file descriptors of the daemon, so my "exec 42>&2" allows us to "echo >&42 some nasty error with some bad advice" directly into the daemon log. :)
Also, "hack/dind" was already doing this due to issues caused by the inheritance, so I'm removing that hack too since this patch obsoletes it by generalizing it for all containers.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)
This has every container using the docker daemon's pid for the processes
label so it does not work correctly.
Docker-DCO-1.1-Signed-off-by: Michael Crosby <michael@crosbymichael.com> (github: crosbymichael)
These are unnecessary since the user package handles these cases properly already (as evidenced by the LXC backend not having these special cases).
I also updated the errors returned to match the other libcontainer error messages in this same file.
Also, switching from Setresuid to Setuid directly isn't a problem, because the "setuid" system call will automatically do that if our own effective UID is root currently: (from `man 2 setuid`)
setuid() sets the effective user ID of the calling process. If the
effective UID of the caller is root, the real UID and saved set-user-
ID are also set.
Docker-DCO-1.1-Signed-off-by: Andrew Page <admwiggin@gmail.com> (github: tianon)