56 Commits

Author SHA1 Message Date
W. Trevor King 60fff3f51c config-linux: Add (array, optional) for linux.devices
To match the omitempty which the Go property has had since 28cc4239
(add omitempty to 'Device' and 'Namespace', 2016-03-10, #340).

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-03 09:13:53 -07:00
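As a minimal sketch of what the `omitempty` tag mentioned above does, the following Go program marshals a trimmed-down config with and without devices. The `device` and `linuxConfig` types are hypothetical stand-ins for illustration, not the spec's actual Go types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// device is a hypothetical, trimmed-down device entry.
type device struct {
	Path string `json:"path"`
}

// linuxConfig is a hypothetical stand-in for the linux section.
// The omitempty option drops the key entirely when the slice is empty.
type linuxConfig struct {
	Devices []device `json:"devices,omitempty"`
}

// marshalLinux serializes a linuxConfig to a JSON string.
func marshalLinux(c linuxConfig) (string, error) {
	b, err := json.Marshal(c)
	return string(b), err
}

func main() {
	empty, _ := marshalLinux(linuxConfig{})
	full, _ := marshalLinux(linuxConfig{Devices: []device{{Path: "/dev/null"}}})
	fmt.Println(empty) // {}
	fmt.Println(full)  // {"devices":[{"path":"/dev/null"}]}
}
```

Because the key can be absent from serialized output, documenting the property as optional keeps the prose consistent with what the Go type produces.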
Aleksa Sarai 4ed839e747 config-linux: add example of cgroup resource limits
The example section looks very sparse otherwise.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 4291fd1d5a config-linux: allow lazy cgroup handling
Make explicit that runtimes only have to attach to the bare minimum
number of cgroups in order to fulfil the users' requirements. However,
runtimes are of course allowed to attach to more than the bare minimum.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 0c440a216c config-linux: clarify cgroupsPath
Clarify some of the confusion with cgroupsPath. Due to systemd, we
cannot require that relative paths be treated in any specific way. In
addition, add a line stating that not all values of cgroupsPath are
required to be valid (and that runtimes must error out if they have an
invalid cgroup path). However, any given value of cgroupsPath should
provide consistent results.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 9ffd72407b config-linux: cleanup cgroup wording
Some of the wording was a bit clumsy (and incorrect, by conflating
different concepts in control groups as "cgroups").

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-22 01:30:36 +10:00
Phil Estes 124ce0beeb Add new architectures from libseccomp 2.3.0
Signed-off-by: Phil Estes <estesp@gmail.com>
2016-06-22 17:43:50 -04:00
Aleksa Sarai ce19b8d167 *: add support for cgroup namespace
The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows for hiding information from /proc/self/cgroup, and mounting
cgroupfs as an unprivileged user. In the future, this namespace may
allow for subtree management by a container.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-04 00:14:39 +10:00
W. Trevor King f830d50a52 config-linux: Make "don't modify filesystem permissions" generic
The user-namespace restriction isn't about the root filesystem in
particular.  For example, if you bind mount in a second filesystem,
the runtime shouldn't adjust ownership on that filesystem either.

I've also adjusted the old "permissions" to "ownership", since that
more clearly reflects the fields (user and group) that you would
modify if you wanted to adjust for user namespacing.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-24 14:27:38 -07:00
W. Trevor King b373a155de config: Split platform-specific configuration into its own section (#414)
To make it clear that the whole 'linux' section is optional.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-02 14:04:39 -04:00
Mrunal Patel 7350d5e1f1 Add support for Selinux mount context labels
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-22 13:40:49 -07:00
Mrunal Patel 6734c7a3a1 Merge pull request #370 from vbatts/json_schema_and_examples
Json schema and examples
2016-04-11 17:25:03 -07:00
W. Trevor King 74da108732 config-linux.md: Reword kernelTCP docs and mention bytes
Avoid the dangling 'using' from e9a6d948 (cgroup: Add support for
memory.kmem.tcp.limit_in_bytes, 2015-10-26, #235).  I've tried to echo
the kernel docs by mentioning buffer memory [1].  I'd personally
prefer linking to the kernel docs and mentioning
memory.kmem.tcp.limit_in_bytes, but that seemed like too big of a
break from the existing style for this commit.

[1]: https://kernel.org/doc/Documentation/cgroup-v1/memory.txt

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-04-11 16:07:28 -07:00
Vincent Batts d4e7326d50 config: JSON examples
* "complete" JSON example
* fix a couple of values
* fix a missing comma

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2016-04-11 18:56:04 -04:00
Michael Crosby adcbe530a9 Add masked and readonly paths
Fixes #320

This adds the maskedPaths and readonlyPaths fields to the spec so that
proper masking and setting of files in /proc can be configured.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-01 10:46:41 -07:00
Antonio Murdaca 5ded78475c *: fix typos
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-03-21 11:51:19 +01:00
W. Trevor King 5dad125595 config-linux: Specify host mount namespace for namespace paths
Avoid trouble with situations like:

  # mount --bind /mnt/test /mnt/test
  # mount --make-rprivate /mnt/test
  # touch /mnt/test/mnt /mnt/test/user
  # mount --bind /proc/123/ns/mnt /mnt/test/mnt
  # mount --bind /proc/123/ns/user /mnt/test/user
  # nsenter --mount=/proc/123/ns/mnt --user=/proc/123/ns/user sh

which uses the required private mount for binding mount namespace
references [1,2,3].  We want to avoid:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime joins the mount namespace referenced by fd 3.
3. Runtime fails to open /mnt/test/user, because /mnt/test is not
   visible in the current mount namespace.

and instead get runtime authors to set up flows like:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime opens /mnt/test/user as fd 4.
3. Runtime joins the mount namespace referenced by fd 3.
4. Runtime joins the user namespace referenced by fd 4.

This also applies to new namespace creation.  We want to avoid:

1. Runtime clones a container process with a new mount namespace.
2c. Container process fails to open /mnt/test/user, because /mnt/test
    is not visible in the current mount namespace.

in favor of something like:

1. Runtime opens /mnt/test/user as fd 3.
2. Runtime clones a container process with a new mount namespace.
3h. Host process closes unneeded fd 3.
3c. Container process joins the user namespace referenced by fd 3.

I also define runtime and container namespaces, so we have consistent
terminology.  I prefer:

* host namespace: a namespace you are in when you invoke the runtime
* host process: the runtime process invoked by the user
* container process: the process created by a clone call in the host
  process which will eventually execute the user-configured process.

Both the host and container processes are running runtime code
(although the container process eventually transitions to
user-configured code), so I find "runtime process", "runtime
namespace", etc. to be imprecise.  However, the maintainer consensus
is for "runtime namespace" [4,5], so that's what we're going with
here.

[1]: http://karelzak.blogspot.com/2015/04/persistent-namespaces.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ce5d2b1a8fde84c0eebe70652cf28b9beda6b4e
[3]: http://mid.gmane.org/87haeahkzc.fsf@xmission.com
[4]: https://github.com/opencontainers/specs/pull/275#discussion_r48057211
[5]: https://github.com/opencontainers/specs/pull/275#discussion_r48324264

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-03-16 14:47:29 -07:00
Julian Friedman 9d9ed06d5e Move rlimits to process
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
2016-03-10 09:44:43 +00:00
Michael Crosby 5a8a779fb0 Move process specific settings to process
This moves process specific settings like caps, apparmor, and selinux
process label onto the process structure to allow the same settings to
be changed at exec time.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-02 11:40:09 -08:00
Qiang Huang ccf3a246ca Fix fileMode json example
In JSON, os.FileMode is represented as a uint32, which must be
written in decimal. Otherwise we get the error
`invalid character '6' after object key:value pair`
when unmarshaling the JSON file.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-23 13:34:20 +08:00
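The failure described above can be reproduced with a short Go sketch. The `deviceMode` type and `parseMode` helper are hypothetical names used only for this illustration; the point is that JSON numbers cannot carry a leading zero, so an octal-looking `0666` is rejected while its decimal equivalent `438` is accepted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// deviceMode is a hypothetical struct holding just a file mode.
type deviceMode struct {
	FileMode os.FileMode `json:"fileMode"`
}

// parseMode decodes a JSON document and returns the file mode it contains.
func parseMode(doc string) (os.FileMode, error) {
	var d deviceMode
	err := json.Unmarshal([]byte(doc), &d)
	return d.FileMode, err
}

func main() {
	// Octal-style literal: not valid JSON, so decoding fails.
	_, err := parseMode(`{"fileMode": 0666}`)
	fmt.Println("octal literal rejected:", err != nil)

	// Decimal 438 is the same value as octal 0666.
	mode, err := parseMode(`{"fileMode": 438}`)
	fmt.Println("decimal accepted:", err == nil && mode == 0666)
}
```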
Qiang Huang 9bab930044 Fix type of devices type
Fixes: opencontainers/runc#566

For type rune, we can assign a char such as 'c' in the struct, but after
marshaling it is presented as an int32. So in the JSON config it has
to be written as a number, which makes the device type hard to identify.

Change it to string so that you can actually write "b", "c" in the JSON
spec and easily tell what type of device it is.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-23 13:33:57 +08:00
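A small Go sketch of the difference described above, using two hypothetical struct types (`runeDevice` and `stringDevice`) that are not the spec's own definitions: a `rune` field marshals as its numeric code point, while a `string` field marshals as readable text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runeDevice models the old layout: the device type is a rune.
type runeDevice struct {
	Type rune `json:"type"`
}

// stringDevice models the new layout: the device type is a string.
type stringDevice struct {
	Type string `json:"type"`
}

// toJSON marshals any value and returns the result as a string.
func toJSON(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	asRune, _ := toJSON(runeDevice{Type: 'c'})
	asString, _ := toJSON(stringDevice{Type: "c"})
	fmt.Println(asRune)   // {"type":99}
	fmt.Println(asString) // {"type":"c"}
}
```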
W. Trevor King 1b0056cbff config-linux: Update links to cgroups documentation
With 34a9304a (Merge branch 'for-4.5' of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup, 2016-01-13,
[1]), Linux restructured their cgroups documentation.  This updated
all of our Documentation/cgroups references to match the new layout,
using reference-style links [2] which let us collect link label
definitions at the bottom of the file.  That makes the spec source
easier to read (no distracting URLs in the middle of a sentence) and
makes the URLs easier to update (only one place to check / fix).

[1]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8
[2]: http://daringfireball.net/projects/markdown/syntax#link

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 20:14:33 -08:00
W. Trevor King 7d5b027673 runtime-config-linux: Separate mknod from cgroups
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

This allows device cgroups to be specified independently of device
creation, and makes it easy to distinguish configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using separate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

fileMode, uid, and gid are optional, because mknod(2) doesn't need
them and specifies the handling when they aren't set [6,7].
Similarly, major/minor numbers are only required for S_IFCHR and
S_IFBLK [6].  I've left off wording about required runtime behavior
for unset values, because I'd rather address that with a blanket rule
[8].

For the cgroup, access is optional because the kernel docs show an
example that doesn't write an access field to the devices.deny file
[9].  The current kernel docs don't go into much detail on this
behavior (I expect unset and 'rwm' are equivalent), but if the kernel
doesn't need a value written, the spec should get out of the way and
allow users to not specify a value.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [10].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <20151005194655.GN28418@odin.tremily.us>
[2]: https://github.com/opencontainers/specs/pull/98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: https://github.com/opencontainers/specs/pull/101
[5]: https://github.com/opencontainers/specs/pull/171#discussion_r41190655
[6]: http://man7.org/linux/man-pages/man2/mknod.2.html#DESCRIPTION
[7]: https://github.com/opencontainers/specs/pull/298/files#r51053835
[8]: https://github.com/opencontainers/specs/pull/285#issuecomment-167823651
[9]: https://kernel.org/doc/Documentation/cgroup-v1/devices.txt
[10]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 13:52:15 -08:00
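The benefit of the split described above, telling whether a config asks for device cgroup changes without parsing the mknod entries, can be sketched in Go. The struct layout and the `allow` field are assumptions made for this illustration (the commit message only names `linux.devices`, `linux.resources.devices`, `access`, and the mknod fields), and `wantsDeviceCgroups` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mknodDevice is an assumed shape for a linux.devices entry.
type mknodDevice struct {
	Path string `json:"path"`
	Type string `json:"type"`
}

// deviceCgroup is an assumed shape for a linux.resources.devices entry.
type deviceCgroup struct {
	Allow  bool   `json:"allow"`
	Access string `json:"access,omitempty"`
}

type resources struct {
	Devices []deviceCgroup `json:"devices,omitempty"`
}

type linuxSection struct {
	Devices   []mknodDevice `json:"devices,omitempty"`
	Resources *resources    `json:"resources,omitempty"`
}

// wantsDeviceCgroups reports whether the config requests device cgroup
// adjustments. It never needs to inspect the mknod entries.
func wantsDeviceCgroups(doc string) (bool, error) {
	var cfg struct {
		Linux linuxSection `json:"linux"`
	}
	if err := json.Unmarshal([]byte(doc), &cfg); err != nil {
		return false, err
	}
	return cfg.Linux.Resources != nil && len(cfg.Linux.Resources.Devices) > 0, nil
}

func main() {
	mknodOnly := `{"linux": {"devices": [{"path": "/dev/null", "type": "c"}]}}`
	withCgroup := `{"linux": {"resources": {"devices": [{"allow": false}]}}}`
	a, _ := wantsDeviceCgroups(mknodOnly)
	b, _ := wantsDeviceCgroups(withCgroup)
	fmt.Println(a, b) // false true
}
```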
W. Trevor King cb2da5430a config: Single, unified config file
Reverting 7232e4b1 (specs: introduce the concept of a runtime.json,
2015-07-30, #88) after discussion on the mailing list [1].  The main
reason is that it's hard to draw a clear line around "inherently
runtime-specific" or "non-portable", so we shouldn't try to do that in
the spec.  Folks who want to flag settings as non-portable for their
own system (e.g. "we will clobber 'hooks' in
bundles we run") are welcome to do so, but we don't have
to split the config into multiple files to do that.

There have been a number of additional changes since #88, so this
isn't a pure Git reversion.  Besides copy-pasting and the associated
link-target updates, I've:

* Restored path -> destination, now that the mount type contains both
  source and target paths again.  I'd prefer 'target' to 'destination'
  to match mount(2), but the pre-7232e4b1 phrasing was 'destination'
  (possibly due to Windows using 'target' for the source?).

* Restored the Windows mount example to its pre-7232e4b1 content.

* Removed required mounts from the config example (requirements landed
  in 3848a238, config-linux: specify the default devices/filesystems
  available, 2015-09-09, #164), because specifying those mounts in the
  config is now redundant.

* Used headers (vs. bold paragraphs) to set off mount examples so we
  get link anchors in the rendered Markdown.

* Replaced references to runtime.json with references to config.json.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
     Subject: Single, unified config file (i.e. rolling back specs#88)
     Date: Wed, 4 Nov 2015 09:53:20 -0800
     Message-ID: <20151104175320.GC24652@odin.tremily.us>

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 09:51:54 -08:00
Gao feng 053f05933b move the description of user ns mapping to proper file
They should stay in runtime not config.

Signed-off-by: Gao feng <omarapazanadi@gmail.com>
2016-01-05 14:19:45 +08:00
Vincent Batts 70372d3880 *.md: update TOC and links
Some of the docs were not even linked to, and did not have a logical
outline for their grouping.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-09-25 11:47:16 -04:00
Vincent Batts 712a7467d1 Merge remote-tracking branch 'origin/pr/163'
2015-09-10 10:07:40 -04:00
Vincent Batts 9a8748cad4 Merge pull request #160 from mrunalp/cap_fix
Modify the capabilities constants to match header files like other constants
2015-09-09 18:59:48 -04:00
Mrunal Patel 663be9d677 Modify the capabilities constants to match header files like other constants
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-09 12:43:17 -04:00
Brandon Philips 3848a23819 config-linux: specify the default devices/filesystems available
Fixes #95

Signed-off-by: Brandon Philips <brandon.philips@coreos.com>
2015-09-09 09:36:59 -07:00
Lai Jiangshan 339e038400 Deduplicate the field of RootfsPropagation
There are two RootfsPropagation fields: one is Linux.RootfsPropagation,
the other is LinuxRuntime.RootfsPropagation. They are duplicates, so
one of them should be removed.

RootfsPropagation is definitely a runtime-specific configuration,
so we remove Linux.RootfsPropagation.

And the description of it is moved from config-linux.md to
runtime-config-linux.md.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-09-09 23:27:37 +08:00
Vincent Batts 6cab2747d9 *.md: markdown formatting
Closes https://github.com/opencontainers/specs/issues/83

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-09-09 10:17:06 -04:00
Brandon Philips 7232e4b137 specs: introduce the concept of a runtime.json
Based on our discussion in-person yesterday it seems necessary to
separate the concept of runtime configuration from application
configuration. There are a few motivators:

- To support runtime updates of things like cgroups, rlimits, etc we
  should separate things that are inherently runtime specific from
  things that are static to the application running in the container.

- To support the goal of being able to move a bundle between hosts we
  should make it clear what parts of the spec are and are not portable
  between hosts so that upon landing on a new host the non-portable
  options may be rewritten or removed.

- In order to attach a cryptographic identity to a bundle we must not
  include details in the bundle that are host specific.
2015-08-26 09:44:09 -07:00
Tiesheng 45ae53d4db Fix typos in the "Namespace types" section
Signed-off-by: ChengTiesheng <chengtiesheng@huawei.com>
2015-08-20 11:08:40 +08:00
Mrunal Patel af36d746ba Add Apparmor, Selinux and Seccomp sections
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-08-07 14:19:10 -04:00
Alexander Morozov 5273b3d785 Replace Linux.Device with more specific config
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2015-08-06 10:26:29 -07:00
Michael Crosby 55912bd676 Merge pull request #79 from laijs/json-notation-in-md
specs: add json notation
2015-07-27 09:06:50 -07:00
Lai Jiangshan d485f77fbd specs: fix the description for the [ug]idMappings
The fields in the [ug]idMappings have changed, so we should fix
the description accordingly.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-07-26 16:30:59 +08:00
Lai Jiangshan 2e186c62c3 specs: add json notation
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
2015-07-26 16:27:20 +08:00
Huamin Chen c53bf87ac2 make rootfs mount propagation mode settable
Signed-off-by: Huamin Chen <hchen@redhat.com>
2015-07-16 08:50:11 -04:00
W. Trevor King 0887300359 spec_linux.go: Rename IDMapping fields to follow syscall.SysProcIDMap
'From' and 'To' are potentially ambiguous for a one-to-one map like
this, and there's already an established name convention in
SysProcIDMap [1].  This commit removes the mental overhead of two
separate naming schemes for the same information.  I'd like to drop
IDMapping entirely in favor of SysProcIDMap, but SysProcIDMap doesn't
give the JSON hints we need for (de)serializing.

[1]: https://golang.org/pkg/syscall/#SysProcIDMap
2015-07-08 10:48:51 -07:00
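To illustrate the "JSON hints" point above, here is a hedged Go sketch. `untaggedMap` is a local stand-in that mirrors the field names of SysProcIDMap (it is defined here so the example builds on any platform), and `idMapping` is a hypothetical tagged type. The lowercase key names are an assumption for illustration, not taken from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// untaggedMap mirrors the SysProcIDMap field names but has no JSON tags,
// so keys are emitted exactly as the Go field names.
type untaggedMap struct {
	ContainerID int
	HostID      int
	Size        int
}

// idMapping uses the same field names plus struct tags that choose the
// JSON keys (key spelling is assumed for this sketch).
type idMapping struct {
	ContainerID int `json:"containerID"`
	HostID      int `json:"hostID"`
	Size        int `json:"size"`
}

// encode marshals any value and returns the result as a string.
func encode(v interface{}) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	plain, _ := encode(untaggedMap{ContainerID: 0, HostID: 1000, Size: 1})
	tagged, _ := encode(idMapping{ContainerID: 0, HostID: 1000, Size: 1})
	fmt.Println(plain)  // {"ContainerID":0,"HostID":1000,"Size":1}
	fmt.Println(tagged) // {"containerID":0,"hostID":1000,"size":1}
}
```

Keeping a separate tagged type costs some duplication, but it lets the serialized key names be chosen independently of the Go field names.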
Michael Crosby e8990d65d1 Merge pull request #50 from mrunalp/userns_section
Adds a section for user namespace mappings
2015-07-08 09:28:18 -07:00
Jonathan Boulle 625798536e config: minor cleanup
- link to official SemVer page
- link between config.md and config-linux.md and explain relationship
- fix typo (arch -> os)
- tweak formatting of some special characters
2015-07-06 17:37:01 -07:00
Mrunal Patel d8237f1899 Adds a section for user namespace mappings
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-07-06 16:05:05 -04:00
Jonathan Boulle 1937c009ea *: small spelling fixes
2015-07-01 10:20:43 -07:00
lizf-os a402b7ae4e Fix typos in the rlimits section
Signed-off-by: Zefan Li <lizefan@huawei.com>
2015-07-01 10:25:46 +08:00
Brandon Philips aa7e14306b Merge pull request #35 from mrunalp/rlimits
Adds section for Linux Rlimits
2015-06-30 16:04:05 -07:00
Mrunal Patel a4df2e4ad5 Adds link to kernel cgroups documentation
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-06-30 18:45:10 -04:00
Mrunal Patel 7f9d7d30bd Adds section for Linux Rlimits
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-06-30 18:35:38 -04:00
Michael Crosby 92b590a760 Add linux spec description
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-30 15:19:06 -07:00
Michael Crosby f2569d17b4 Update config-linux for better formatting on values
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-30 15:13:30 -07:00