This commit allows additional architectures to be added to Seccomp filters
created by containers. This allows containers to make syscalls using these
architectures. For example, in a container on an AMD64 system, only AMD64
syscalls would be usable unless x86 was added to the filter using this patch,
which would allow both 32-bit and 64-bit syscalls to be used.
Signed-off-by: Matthew Heon <mheon@redhat.com>
We need to update the mount's destination after we resolve symlinks so
that it properly creates and mounts the correct location.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Whenever dev/null is used as one of the main processes STDIO, do not try
to change the permissions on it via fchown because we should not do it
in the first place and also this will fail if the container is supposed
to be readonly.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When executing an additional process in a container, all namespaces are
entered but the user namespace. As a result, the process may be
executed as the host's root user. This has both functionality and
security implications.
Fix this by adding the missing user namespace to the array of
namespaces. Since joining a user namespace in which the caller is
already a member yields an error, skip namespaces we're already in.
Last, remove a needless and buggy AT_SYMLINK_NOFOLLOW in the code.
Signed-off-by: Ido Yariv <ido@wizery.com>
* version in the config example is advanced to 0.1.0
* rootfsPropagation in config.json is removed
(The same one is already in runtime.json)
* rlimit time is changed from magic number to name(string)
* add pids cgroup
* add cgroup path
After this change applied, the example config in this README.md
is consistent with the result of `runc spec`.
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Fix the permissions of the container's main processes STDIO when the
process is not run as the root user. This changes the permissions right
before switching to the specified user so that it's STDIO matches it's
UID and GID.
Add a test for checking that the STDIO of the process is owned by the
specified user.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
When we are using user namespaces we need to make sure that when we do
not have a TTY we change the ownership of the pipe()'s used for the
process to the root user within the container so that when you call
open() on any of the /proc/self/fd/*'s you do not get an EPERM.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
The patch mainly removes the wrong "for writing".
The config files are readonly when `runc start`.
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Right now if one passes a mount propagation flag in spec file, it
does not take effect. For example, try following in spec json file.
{
"type": "bind",
"source": "/root/mnt-source",
"destination": "/root/mnt-dest",
"options": "rbind,shared"
}
One would expect that /root/mnt-dest will be shared inside the container
but that's not the case.
#findmnt -o TARGET,PROPAGATION
`-/root/mnt-dest private
Reason being that propagation flags can't be passed in along with other
regular flags. They need to be passed in a separate call to mount syscall.
That too, one propagation flag at a time. (from mount man page).
Hence, store propagation flags separately in a slice and apply these
in that order after the mount call wherever appropriate. This allows
user to control the propagation property of mount point inside
the container.
Storing them separately also solves another problem where recursive flag
(syscall.MS_REC) can get mixed up. For example, options "rbind,private"
and "bind,rprivate" will be same and there will be no way to differentiate
between these if all the flags are stored in a single integer.
This patch would allow one to pass propagation flags "[r]shared,[r]slave,
[r]private,[r]unbindable" in spec file as per mount property.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
I deleted possibility to specify config file from commands for now.
Until we decide how it'll be done. Also I changed runc spec interface to
write config files instead of output them.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>