config-linux: Specify host mount namespace for namespace paths

Avoid trouble with situations like:

  # mount --bind /mnt/test /mnt/test
  # mount --make-rprivate /mnt/test
  # touch /mnt/test/mnt /mnt/test/user
  # mount --bind /proc/123/ns/mnt /mnt/test/mnt
  # mount --bind /proc/123/ns/user /mnt/test/user
  # nsenter --mount=/proc/123/ns/mnt --user=/proc/123/ns/user sh

which uses the required private mount for binding mount namespace
references [1,2,3].  We want to avoid:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime joins the mount namespace referenced by fd 3.
3. Runtime fails to open /mnt/test/user, because /mnt/test is not
   visible in the current mount namespace.

and instead get runtime authors to set up flows like:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime opens /mnt/test/user as fd 4.
3. Runtime joins the mount namespace referenced by fd 3.
4. Runtime joins the user namespace referenced by fd 4.
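
A minimal Python sketch of that open-first, join-later ordering.  The
names join_namespaces and _libc_setns are hypothetical helpers made up
for illustration, not part of any runtime or of the spec; the only
system interface assumed is setns(2), reached through libc via ctypes:

```python
import ctypes
import os


def _libc_setns(fd, nstype=0):
    """Hypothetical helper: join the namespace referenced by fd via setns(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.setns(fd, nstype) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def join_namespaces(paths, setns=_libc_setns):
    """Open every namespace path first, then join them in the given order."""
    fds = []
    try:
        # Phase 1: resolve all paths while the runtime mount namespace
        # is still the current one, so every path is still visible.
        for path in paths:
            fds.append(os.open(path, os.O_RDONLY | os.O_CLOEXEC))
        # Phase 2: join the namespaces.  No path lookups happen here,
        # so switching mount namespaces cannot hide a later path.
        for fd in fds:
            setns(fd)
    finally:
        for fd in fds:
            os.close(fd)
```

With the paths from the example above, the call would be
join_namespaces(["/mnt/test/mnt", "/mnt/test/user"]), which needs
enough privilege for setns(2) to succeed.  The setns parameter only
exists so the ordering can be exercised without privileges.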

This also applies to new namespace creation.  We want to avoid:

1. Runtime clones a container process with a new mount namespace.
2c. Container process fails to open /mnt/test/user, because /mnt/test
    is not visible in the current mount namespace.

in favor of something like:

1. Runtime opens /mnt/test/user as fd 3.
2. Runtime clones a container process with a new mount namespace.
3h. Host process closes unneeded fd 3.
3c. Container process joins the user namespace referenced by fd 3.
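
The same idea as a rough Python sketch, assuming os.fork followed by a
caller-supplied new_mount_ns callback stands in for a clone call with
a new mount namespace.  spawn_in_user_namespace and all of its
parameters are hypothetical names chosen for illustration; a real
runtime would pass wrappers around unshare(2) and setns(2):

```python
import os


def spawn_in_user_namespace(user_ns_path, new_mount_ns, setns, child_main):
    """Open the namespace path before cloning; join it in the child."""
    # Step 1: resolve the path in the runtime mount namespace.
    fd = os.open(user_ns_path, os.O_RDONLY | os.O_CLOEXEC)
    # Step 2: create the container process (fork + new_mount_ns here
    # is an assumption standing in for clone with a new mount namespace).
    pid = os.fork()
    if pid == 0:
        status = 1  # reported if anything below raises
        try:
            new_mount_ns()
            # Step 3c: join via the inherited fd; no path lookup is
            # needed, so the new mount namespace cannot hide it.
            setns(fd)
            os.close(fd)
            status = int(child_main() or 0)
        finally:
            os._exit(status)
    # Step 3h: the host process no longer needs the fd.
    os.close(fd)
    return pid
```

The host process can then waitpid on the returned pid.  Passing the
namespace operations in as callables is only a convenience for trying
the ordering out without privileges.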

I also define runtime and container namespaces, so we have consistent
terminology.  I prefer:

* host namespace: a namespace you are in when you invoke the runtime
* host process: the runtime process invoked by the user
* container process: the process created by a clone call in the host
  process which will eventually execute the user-configured process.

Both the host and container processes are running runtime code
(although the container process eventually transitions to
user-configured code), so I find "runtime process", "runtime
namespace", etc. to be imprecise.  However, the maintainer consensus
is for "runtime namespace" [4,5], so that's what we're going with
here.

[1]: http://karelzak.blogspot.com/2015/04/persistent-namespaces.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ce5d2b1a8fde84c0eebe70652cf28b9beda6b4e
[3]: http://mid.gmane.org/87haeahkzc.fsf@xmission.com
[4]: https://github.com/opencontainers/specs/pull/275#discussion_r48057211
[5]: https://github.com/opencontainers/specs/pull/275#discussion_r48324264

Signed-off-by: W. Trevor King <wking@tremily.us>
W. Trevor King 2015-12-18 10:42:33 -08:00
parent b8d67bbaf1
commit 5dad125595
2 changed files with 11 additions and 1 deletions

@@ -34,7 +34,7 @@ The following parameters can be specified to setup namespaces:
 * **`uts`** the container will be able to have its own hostname and domain name
 * **`user`** the container will be able to remap user and group IDs from the host to local users and groups within the container
-* **`path`** *(string, optional)* - path to namespace file
+* **`path`** *(string, optional)* - path to namespace file in the [runtime mount namespace](glossary.md#runtime-namespace)
 If a path is specified, that particular file is used to join that type of namespace.
 Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.

@@ -13,6 +13,10 @@ The [`config.json`](config.md) file in a [bundle](#bundle) which defines the int
 An environment for executing processes with configurable isolation and resource limitations.
 For example, namespaces, resource limits, and mounts are all part of the container environment.
+## Container namespace
+On Linux, a leaf in the [namespace][namespaces.7] hierarchy in which the [configured process](config.md#process-configuration) executes.
 ## JSON
 All configuration [JSON][] MUST be encoded in [UTF-8][].
@@ -22,5 +26,11 @@ All configuration [JSON][] MUST be encoded in [UTF-8][].
 An implementation of this specification.
 It reads the [configuration files](#configuration) from a [bundle](#bundle), uses that information to create a [container](#container), launches a process inside the container, and performs other [lifecycle actions](runtime.md).
+## Runtime namespace
+On Linux, a leaf in the [namespace][namespaces.7] hierarchy from which the [runtime](#runtime) process is executed.
+New container namespaces will be created as children of the runtime namespaces.
 [JSON]: http://json.org/
 [UTF-8]: http://www.unicode.org/versions/Unicode8.0.0/ch03.pdf
+[namespaces.7]: http://man7.org/linux/man-pages/man7/namespaces.7.html