Commit Graph

80 Commits

Author SHA1 Message Date
Mrunal Patel c0206be451 Merge pull request #647 from Mashimiao/config-linux-fix-device-path
config-linux: Add restriction for duplicated device path
2017-01-12 09:57:11 -08:00
Ma Shimiao 1fc1464dbc config-linux: Add restriction for duplicated device path
I think runtime should generate an error, if devices has
duplicated device path.
Because we don't know which one is really needed.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-01-12 14:24:52 +08:00
W. Trevor King d43fc428aa config-linux: Lift no-tweaking namespace restriction
This restriction originally landed via 02b456e9 (Clarify behavior
around namespaces paths, 2015-09-08, #158).  The hostname case landed
via 66a0543e (config: Require a new UTS namespace for config.json's
hostname, 2015-10-05, #214) citing the namespace restriction.  The
restriciton extended to runtime namespaces in 01c2d55f (config-linux:
Extend no-tweak requirement to runtime namespaces, 2016-08-24, #538).
There was a proposal in-flight to get config-wide consistency around
the no-tweaking concept [1].

In today's meeting, the maintainer consensus was to strike the
no-tweaking restriction [2], which is what I've done here.  I've
removed the ROADMAP entry because this gives folks a way to adjust
existing containers (launch a new container which joins and tweaks the
original).

The hostname entry still mentions the UTS namespace to provide a guard
against accidental foot-gunning.  There was no no-tweaking language
for properties related to other namespaces (e.g. 'mounts').
Maybe the other namespaces have more obvious names.

[1]: https://github.com/opencontainers/runtime-spec/pull/540
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-01-11-22.04.log.html#l-117

Signed-off-by: W. Trevor King <wking@tremily.us>
2017-01-11 15:16:54 -08:00
Qiang Huang 082e93a2bd Allow negative value for some resource fields
Carry #499

For these values, cgroup kernal APIs accept -1 to set
them as unlimited, as docker and runc all support
update resources, we should not set drawbacks in spec.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-05 19:03:57 +08:00
Doug Davis e7be40f0c3 Cleanup the spec a bit to remove WG/git text that's not really part of the spec
renamed an href to "container-namespace2" to avoid a dup-warning msg from
the PDF generator

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-11-16 09:50:03 -08:00
Rob Dolin (MSFT) 675a67dc17 [Config Linux] Consistent size values in example
Matches the example in config.md

Signed-off-by: Rob Dolin <robdolin@microsoft.com>
2016-11-08 13:44:16 -08:00
Daniel, Dao Quang Minh f815650e67 Merge pull request #608 from hqhq/fix_format_issues
Fix several format issues found by pdf and html
2016-11-08 02:21:10 +00:00
Qiang Huang 0df2586f03 Merge pull request #518 from mrunalp/terminal
Clarify wording for terminal setting and /dev/console
2016-11-07 09:49:28 +08:00
Qiang Huang 661314a926 Fix several format issues found by pdf and html
This carries #578 and fixes some other format issues.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-03 09:33:51 +08:00
Qiang Huang b8e2ebec5f Merge pull request #597 from WeiZhang555/fix-duplicated-namespaces
Forbid duplicated namespaces with same `type`.
2016-11-01 11:42:41 +08:00
Mrunal Patel dc42b45811 Merge pull request #601 from hqhq/rewrite_idmapping
Rewrite LinuxIDMappings
2016-10-31 13:58:45 -07:00
Qiang Huang 4404abf6cb Consistent wording for parameters in array and object
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-28 15:09:05 +08:00
Vincent Batts 28c6afea8b Merge pull request #600 from hqhq/fix_typos
Fix some typos
2016-10-28 01:11:18 +00:00
Qiang Huang 621684f645 Rewrite LinuxIDMappings
Basicly make the format consistent with others, no
semantics change.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 19:00:39 +08:00
Qiang Huang f37cd3a903 Fix some typos
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 18:00:08 +08:00
Qiang Huang 2379be75cb Use IO instead of io
For consistency, while all other places use IO.

$ grep -rnIw IO * | wc -l
10

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 16:10:02 +08:00
Zhang Wei c22eeb2197 Forbid duplicated namespaces with same `type`.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 11:25:43 +08:00
Mrunal Patel 52f3cdecd1 Clarify wording for terminal setting and /dev/console
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-19 10:22:05 -07:00
Ma Shimiao 25f44dd0e8 config-linux: fix format and definitely require value of masked and readonly paths
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2016-09-30 13:51:00 +08:00
W. Trevor King d49c29f042 config: Replace "required" with "REQUIRED"
In all of these cases we want to use the RFC 2119 semantics.
Generated with:

  $ sed -i 's/required/REQUIRED/g' config*.md

after which I rolled back the change for:

  ...controllers required to fulfill...

since that was already MUSTed.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-09-17 22:03:26 -07:00
W. Trevor King c35cf57303 config: Replace "optional" with "OPTIONAL"
In all of these cases we want to use the RFC 2119 semantics.
Generated with:

  $ sed -i 's/optional/OPTIONAL/g' config*.md

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-09-17 22:03:26 -07:00
W. Trevor King 01c2d55fac config-linux: Extend no-tweak requirement to runtime namespaces
Since [1] we've required runtimes to error out if a configuration
joins an existing namespace and adjusts it somehow (e.g. joining an
existing UTC namespace and setting 'hostname', [2]).  However, the
wording from [1] (which survives untouched in the current master) only
talked about "when a path is specified".  I see two possible
approaches for internal consistency:

a. Lift the OCI restriction and allow join-and-tweak [3] where the
   kernel supports it.  When we landed the current restriction, the
   main issues seemed to be "we don't have a clear use-case for join
   and tweak" [4] (although see [5]) and "this is a foot gun [6,7]"
   (I'd rather leave policy to higher-level config linters).

b. Extend the OCI restriction to all cases where the runtime does not
   create a new namespace.  Besides the already covered "namespace
   entry exists and includes 'path'", we'd also want to forbid configs
   that were missing the relevant namespace(s) entirely (in which case
   the container inherits the host namespace(s)).

I'm partial to (a) in the long run, but (b) is less of a shift from
the current spec and likely a better choice for a pending 1.0.

This commit implements (b).

It also makes it explicit that not listing a namespace type will cause
the container to inherit the runtime namespace of that type.

[1]: https://github.com/opencontainers/runtime-spec/pull/158
     Subject: Clarify behavior around namespaces paths
[2]: https://github.com/opencontainers/runtime-spec/pull/214
     Subject: config: Require a new UTS namespace for config.json's hostname
[3]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-138687129
[4]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-138997548
[5]: https://github.com/opencontainers/runtime-spec/pull/305
     Subject: [Tracker] Live Container Updates
[6]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-139106987
[7]: https://github.com/opencontainers/runtime-spec/issues/537#issuecomment-242132288
     Subject: [linux] Tweaking host namespaces?

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-24 10:41:50 -07:00
Lei Jitang d0b0ac224f Use filesystem instead of file system
Signed-off-by: Lei Jitang <leijitang@huawei.com>
2016-08-12 00:00:00 -04:00
W. Trevor King 054d2df15a config-linux: Make linux.resources.devices explicitly optional
And mark it omitempty to avoid:

  $ ocitools generate --template <(echo '{"linux": {"resources": {}}}') | jq .linux
  {
    "resources": {
      "devices": null
    }
  }

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-03 09:13:53 -07:00
W. Trevor King 60fff3f51c config-linux: Add (array, optional) for linux.devices
To match the omitempty which the Go property has had since 28cc4239
(add omitempty to 'Device' and 'Namespace', 2016-03-10, #340).

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-03 09:13:53 -07:00
Aleksa Sarai 4ed839e747
config-linux: add example of cgroup resource limits
The example section looks very sparse otherwise.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 4291fd1d5a
config-linux: allow lazy cgroup handling
Make explicit that runtimes only have to attach to the bare minimum
number of cgroups in order to fulfil the users' requirements. However,
runtimes are of course allowed to attach to more than the bare minimum.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 0c440a216c
config-linux: clarify cgroupsPath
Clarify some of the confusion with cgroupsPath. Due to systemd, we
cannot require that relative paths be treated in any specific way. In
addition, add a line stating that not all values of cgroupsPath are
required to be valid (and that runtimes must error out if they have an
invalid cgroup path). However, any given value of cgroupsPath should
provide consistent results.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 9ffd72407b
config-linux: cleanup cgroup wording
Some of the wording was a bit clumsy (and incorrect, by conflating
different concepts in control groups as "cgroups").

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-22 01:30:36 +10:00
Phil Estes 124ce0beeb Add new architectures from libseccomp 2.3.0
Signed-off-by: Phil Estes <estesp@gmail.com>
2016-06-22 17:43:50 -04:00
Aleksa Sarai ce19b8d167 *: add support for cgroup namespace
The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows for hiding information from /proc/self/cgroup, and mounting
cgroupfs as an unprivileged user. In the future, this namespace may
allow for subtree management by a container.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-04 00:14:39 +10:00
W. Trevor King f830d50a52 config-linux: Make "don't modify filesystem permissions" generic
The user-namespace restriction isn't about the root filesystem in
particular.  For example, if you bind mount in a second filesystem,
the runtime shouldn't adjust ownership on that filesystem either.

I've also adjusted the old "permissions" to "ownership", since that
more clearly reflects the fields (user and group) that you would
modify if you wanted to adjust for user namespacing.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-24 14:27:38 -07:00
W. Trevor King b373a155de config: Split platform-specific configuration into its own section (#414)
To make it clear that the whole 'linux' section is optional.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-02 14:04:39 -04:00
Mrunal Patel 7350d5e1f1 Add support for Selinux mount context labels
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-22 13:40:49 -07:00
Mrunal Patel 6734c7a3a1 Merge pull request #370 from vbatts/json_schema_and_examples
Json schema and examples
2016-04-11 17:25:03 -07:00
W. Trevor King 74da108732 config-linux.md: Reword kernelTCP docs and mention bytes
Avoid the dangling 'using' from e9a6d948 (cgroup: Add support for
memory.kmem.tcp.limit_in_bytes, 2015-10-26, #235).  I've tried to echo
the kernel docs by mentioning buffer memory [1].  I'd personally
prefer linking to the kernel docs and mentioning
memory.kmem.tcp.limit_in_bytes, but that seemed like too big of a
break from the existing style for this commit.

[1]: https://kernel.org/doc/Documentation/cgroup-v1/memory.txt

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-04-11 16:07:28 -07:00
Vincent Batts d4e7326d50 config: JSON examples
* "complete" JSON example
* fix a couple of values
* fix a missing comma

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2016-04-11 18:56:04 -04:00
Michael Crosby adcbe530a9 Add masked and readonly paths
Fixes #320

This adds the maskedPaths and readonlyPaths fields to the spec so that
proper masking and setting of files in /proc can be configured.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-01 10:46:41 -07:00
Antonio Murdaca 5ded78475c *: fix typos
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-03-21 11:51:19 +01:00
W. Trevor King 5dad125595 config-linux: Specify host mount namespace for namespace paths
Avoid trouble with situations like:

  # mount --bind /mnt/test /mnt/test
  # mount --make-rprivate /mnt/test
  # touch /mnt/test/mnt /mnt/test/user
  # mount --bind /proc/123/ns/mnt /mnt/test/mnt
  # mount --bind /proc/123/ns/user /mnt/test/user
  # nsenter --mount=/proc/123/ns/mnt --user /proc/123/ns/user sh

which uses the required private mount for binding mount namespace
references [1,2,3].  We want to avoid:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime joins the mount namespace referenced by fd 3.
3. Runtime fails to open /mnt/test/user, because /mnt/test is not
   visible in the current mount namespace.

and instead get runtime authors to setup flows like:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime opens /mnt/test/user as fd 4.
3. Runtime joins the mount namespace referenced by fd 3.
4. Runtime joins the user namespace referenced by fd 4.

This also applies to new namespace creation.  We want to avoid:

1. Runtime clones a container process with a new mount namespace.
2c. Container process fails to open /mnt/test/user, because /mnt/test
    is not visible in the current mount namespace.

in favor of something like:

1. Runtime opens /mnt/test/user as fd 3.
2. Runtime clones a container process with a new mount namespace.
3h. Host process closes unneeded fd 3.
3c. Container process joins the user namespace referenced by fd 3.

I also define runtime and container namespaces, so we have consistent
terminology.  I prefer:

* host namespace: a namespace you are in when you invoke the runtime
* host process: the runtime process invoked by the user
* container process: the process created by a clone call in the host
  process which will eventually execute the user-configured process.

Both the host and container processes are running runtime code
(although the container process eventually transitions to
user-configured code), so I find "runtime process", "runtime
namespace", etc. to be imprecise.  However, the maintainer consensus
is for "runtime namespace" [4,5], so that's what we're going with
here.

[1]: http://karelzak.blogspot.com/2015/04/persistent-namespaces.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ce5d2b1a8fde84c0eebe70652cf28b9beda6b4e
[3]: http://mid.gmane.org/87haeahkzc.fsf@xmission.com
[4]: https://github.com/opencontainers/specs/pull/275#discussion_r48057211
[5]: https://github.com/opencontainers/specs/pull/275#discussion_r48324264

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-03-16 14:47:29 -07:00
Julian Friedman 9d9ed06d5e Move rlimits to process
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
2016-03-10 09:44:43 +00:00
Michael Crosby 5a8a779fb0 Move process specific settings to process
This moves process specific settings like caps, apparmor, and selinux
process label onto the process structure to allow the same settings to
be changed at exec time.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-02 11:40:09 -08:00
Qiang Huang ccf3a246ca Fix fileMode json example
In json, os.FileMode would be presented as a uint32, which
is decimal. Otherwise we'll get error:
`invalid character '6' after object key:value pair`
when unmarshal the json file.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-23 13:34:20 +08:00
Qiang Huang 9bab930044 Fix type of devices type
Fixes: opencontainers/runc#566

For type rune, we can assign char as 'c' in struct, but after
marshal, it'll be presented as int32. So in json config it needs
to be presented as a number which is not friendly to be identified.

Change it to string so that you can actually write "b", "c" in json
spec and you can easily know what type of device it is.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-02-23 13:33:57 +08:00
W. Trevor King 1b0056cbff config-linux: Update links to cgroups documentation
With 34a9304a (Merge branch 'for-4.5' of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup, 2016-01-13,
[1]), Linux restructured their cgroups documentation.  This updated
all of our Documentation/cgroups references to match the new layout,
using reference-style links [2] which let us collect link label
definitions at the bottom of the file.  That makes the spec source
easier to read (no distracting URLs in the middle of a sentence) and
makes the URLs easier to update (only one place to check / fix).

[1]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8
[2]: http://daringfireball.net/projects/markdown/syntax#link

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 20:14:33 -08:00
W. Trevor King 7d5b027673 runtime-config-linux: Separate mknod from cgroups
With mknod entries in linux.devices and cgroups entries in
linux.resources.devices.  Background discussion in [1].

For specifying device cgroups independent of device creation.  This
makes it easy to distinguish between configs that call for cgroup
adjustments (which have linux.resources entries) from those that
don't.  Without this split, folks interested in making that
distinction would have to parse the device section to determine if it
included cgroup changes.  This will also make it easy to drop either
portion (mknod [2] or cgroups [3]) independently of the other if the
project decides to do so.

Using seperate sections for mknod and cgroups also allows us to avoid
the complicated validation rules needed for the combined format
mknod/cgroup [4].

Now that there is a section specific to supplying devices, I shifted
the default device listing over from config-linux [5].  The /dev/ptmx
entry is a bit awkward, since it's not a device, but it seemed to fit
better over here.  But I would also be fine leaving it with the other
mounts in config-linux.

fileMode, uid, and gid are optional, because mknod(2) doesn't need
them and specifies the handling when they aren't set [6,7].
Similarly, major/minor numbers are only required for S_IFCHR and
S_IFBLK [6].  I've left off wording about required runtime behavior
for unset values, because I'd rather address that with a blanket rule
[8].

For the cgroup, access is optional because the kernel docs show an
example that doesn't write an access field to the devices.deny file
[9].  The current kernel docs don't go into much detail on this
behavior (I expect unset and 'rwm' are equivalent), but if the kernel
doesn't need a value written, the spec should get out of the way and
allow users to not specify a value.

The reference links are sorted into two blocks, with kernel-doc links
sorted alphabetically followed by man pages sorted alphabetically by
section.  The cgroup link is new since 2016-01-13 [10].

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/y_Fsa2_jJaM
     Subject: Separate config entries for device mknod and cgroups?
     Date: Mon, 5 Oct 2015 12:46:55 -0700
     Message-ID: <20151005194655.GN28418@odin.tremily.us>
[2]: https://github.com/opencontainers/specs/pull/98
[3]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/qWHoKs8Fsrk
     Subject: removal of cgroups from the OCI Linux spec
     Date: Wed, 28 Oct 2015 17:01:59 +0000
     Message-ID: <CAD2oYtO1RMCcUp52w-xXemzDTs+J6t4hS5Mm4mX+uBnVONGDfA@mail.gmail.com>
[4]: https://github.com/opencontainers/specs/pull/101
[5]: https://github.com/opencontainers/specs/pull/171#discussion_r41190655
[6]: http://man7.org/linux/man-pages/man2/mknod.2.html#DESCRIPTION
[7]: https://github.com/opencontainers/specs/pull/298/files#r51053835
[8]: https://github.com/opencontainers/specs/pull/285#issuecomment-167823651
[9]: https://kernel.org/doc/Documentation/cgroup-v1/devices.txt
[10]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=34a9304a96d6351c2d35dcdc9293258378fc0bd8

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 13:52:15 -08:00
W. Trevor King cb2da5430a config: Single, unified config file
Reverting 7232e4b1 (specs: introduce the concept of a runtime.json,
2015-07-30, #88) after discussion on the mailing list [1].  The main
reason is that it's hard to draw a clear line around "inherently
runtime-specific" or "non-portable", so we shouldn't try to do that in
the spec.  Folks who want to flag settings as non-portable for their
own system are welcome to do so (e.g. "we will clobber 'hooks' in
bundles we run") are welcome to do so, but we don't have to have
to split the config into multiple files to do that.

There have been a number of additional changes since #88, so this
isn't a pure Git reversion.  Besides copy-pasting and the associated
link-target updates, I've:

* Restored path -> destination, now that the mount type contains both
  source and target paths again.  I'd prefer 'target' to 'destination'
  to match mount(2), but the pre-7232e4b1 phrasing was 'destination'
  (possibly due to Windows using 'target' for the source?).

* Restored the Windows mount example to its pre-7232e4b1 content.

* Removed required mounts from the config example (requirements landed
  in 3848a238, config-linux: specify the default devices/filesystems
  available, 2015-09-09, #164), because specifying those mounts in the
  config is now redundant.

* Used headers (vs. bold paragraphs) to set off mount examples so we
  get link anchors in the rendered Markdown.

* Replaced references to runtime.json with references to config.json.

[1]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
     Subject: Single, unified config file (i.e. rolling back specs#88)
     Date: Wed, 4 Nov 2015 09:53:20 -0800
     Message-ID: <20151104175320.GC24652@odin.tremily.us>

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-01-27 09:51:54 -08:00
Gao feng 053f05933b move the description of user ns mapping to proper file
They should stay in runtime not config.

Signed-off-by: Gao feng <omarapazanadi@gmail.com>
2016-01-05 14:19:45 +08:00
Vincent Batts 70372d3880 *.md: update TOC and links
Some of the docs were not even linked to, and did not have a logic
outline for their grouping.

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2015-09-25 11:47:16 -04:00
Vincent Batts 712a7467d1 Merge remote-tracking branch 'origin/pr/163' 2015-09-10 10:07:40 -04:00