Commit Graph

88 Commits

Author SHA1 Message Date
Mrunal Patel d01ef9a806 Add anchors to config and config linux
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-02 11:00:31 -08:00
Qiang Huang ec9449187b Set specs value the same as kernel API input
This partially reverts #648. After a second thought, I think we
should use specs value the same as kernel API input, see:
https://github.com/opencontainers/runtime-spec/issues/692#issuecomment-281889852

For memory and hugetlb limits *.limit_in_bytes, cgroup APIs take the values
as string, but the parsed values are unsigned long, see:
https://github.com/torvalds/linux/blob/v4.10/mm/page_counter.c#L175-L193

For `cpu.cfs_quota_us` and `cpu.rt_runtime_us`, cgroup APIs take the input
value as signed long long, while `cpu.cfs_period_us` and `cpu.rt_period_us`
take the input value as unsigned long long.
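
As a rough illustration of the signedness described above, the types could be sketched in Go as follows. The struct, field, and function names are hypothetical and are not taken from the spec's Go bindings; the integer widths assume a 64-bit platform.

```go
package main

import "fmt"

// memoryLimits is a hypothetical type: the limit is unsigned because the
// kernel parses *.limit_in_bytes into an unsigned long.
type memoryLimits struct {
	LimitInBytes uint64
}

// cpuLimits is a hypothetical type mirroring the signedness of each
// cgroup control file named in the message.
type cpuLimits struct {
	QuotaUs         int64  // cpu.cfs_quota_us: signed long long
	PeriodUs        uint64 // cpu.cfs_period_us: unsigned long long
	RealtimeRuntime int64  // cpu.rt_runtime_us: signed long long
	RealtimePeriod  uint64 // cpu.rt_period_us: unsigned long long
}

// describe formats the CFS quota and period as decimal strings.
func describe(c cpuLimits) string {
	return fmt.Sprintf("quota=%d period=%d", c.QuotaUs, c.PeriodUs)
}

func main() {
	m := memoryLimits{LimitInBytes: 512 * 1024 * 1024}
	c := cpuLimits{QuotaUs: -1, PeriodUs: 100000}
	fmt.Println(m.LimitInBytes)
	fmt.Println(describe(c))
}
```

With these types a negative quota is representable, while a negative period or memory limit cannot be expressed at all.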

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-01 09:10:43 +08:00
zhouhao 5a470213e7 config-linux.md: fix info
Signed-off-by: zhouhao <zhouhao@cn.fujitsu.com>
2017-02-27 16:07:52 +08:00
Mrunal Patel ae7a541930 Merge pull request #657 from GrantSeltzer/improve-seccomp-spec
config: Improve seccomp format to be more expressive
2017-02-24 18:59:49 -08:00
grantseltzer 652323cd77 improve seccomp format to be more expressive
Signed-off-by: grantseltzer <grantseltzer@gmail.com>
2017-02-22 18:17:16 -05:00
Qiang Huang a5c4e91dae Remove uid/gid mapping limit depend on kernel
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-02-22 14:43:18 -08:00
Daniel Dao 279c3c095c linux: relax filesystem requirements for container
change MUST to SHOULD so containers are not required to have all these
filesystems mounted.

Signed-off-by: Daniel Dao <dqminh89@gmail.com>
2017-01-23 12:44:36 +00:00
Rob Dolin (MSFT) 646826658d [Config Linux] Clarify: App --> Container
Replaces #577

Signed-off-by: Rob Dolin (MSFT) <robdolin@microsoft.com>
2017-01-18 10:29:13 -08:00
Mrunal Patel c0206be451 Merge pull request #647 from Mashimiao/config-linux-fix-device-path
config-linux: Add restriction for duplicated device path
2017-01-12 09:57:11 -08:00
Ma Shimiao 1fc1464dbc config-linux: Add restriction for duplicated device path
I think the runtime should generate an error if devices contains
duplicated device paths, because we don't know which one is really
needed.
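
A minimal sketch of such a check, assuming a trimmed-down device entry that carries only a path. The `device` type and `checkDevicePaths` function are hypothetical names for illustration and do not come from any runtime.

```go
package main

import "fmt"

// device is a hypothetical stand-in for a device entry in the config.
type device struct {
	Path string
}

// checkDevicePaths returns an error as soon as two entries share a path.
func checkDevicePaths(devices []device) error {
	seen := make(map[string]bool, len(devices))
	for _, d := range devices {
		if seen[d.Path] {
			return fmt.Errorf("duplicated device path %q", d.Path)
		}
		seen[d.Path] = true
	}
	return nil
}

func main() {
	devs := []device{{Path: "/dev/fuse"}, {Path: "/dev/fuse"}}
	if err := checkDevicePaths(devs); err != nil {
		fmt.Println("error:", err)
	}
}
```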

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-01-12 14:24:52 +08:00
W. Trevor King d43fc428aa config-linux: Lift no-tweaking namespace restriction
This restriction originally landed via 02b456e9 (Clarify behavior
around namespaces paths, 2015-09-08, #158).  The hostname case landed
via 66a0543e (config: Require a new UTS namespace for config.json's
hostname, 2015-10-05, #214) citing the namespace restriction.  The
restriction extended to runtime namespaces in 01c2d55f (config-linux:
Extend no-tweak requirement to runtime namespaces, 2016-08-24, #538).
There was a proposal in-flight to get config-wide consistency around
the no-tweaking concept [1].

In today's meeting, the maintainer consensus was to strike the
no-tweaking restriction [2], which is what I've done here.  I've
removed the ROADMAP entry because this gives folks a way to adjust
existing containers (launch a new container which joins and tweaks the
original).

The hostname entry still mentions the UTS namespace to provide a guard
against accidental foot-gunning.  There was no no-tweaking language
for properties related to other namespaces (e.g. 'mounts').
Maybe the other namespaces have more obvious names.

[1]: https://github.com/opencontainers/runtime-spec/pull/540
[2]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2017/opencontainers.2017-01-11-22.04.log.html#l-117

Signed-off-by: W. Trevor King <wking@tremily.us>
2017-01-11 15:16:54 -08:00
Qiang Huang 082e93a2bd Allow negative value for some resource fields
Carry #499

For these values, the cgroup kernel APIs accept -1 to set them as
unlimited. Since docker and runc both support updating resources, we
should not impose restrictions in the spec.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-01-05 19:03:57 +08:00
Doug Davis e7be40f0c3 Cleanup the spec a bit to remove WG/git text that's not really part of the spec
renamed an href to "container-namespace2" to avoid a dup-warning msg from
the PDF generator

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-11-16 09:50:03 -08:00
Rob Dolin (MSFT) 675a67dc17 [Config Linux] Consistent size values in example
Matches the example in config.md

Signed-off-by: Rob Dolin <robdolin@microsoft.com>
2016-11-08 13:44:16 -08:00
Daniel, Dao Quang Minh f815650e67 Merge pull request #608 from hqhq/fix_format_issues
Fix several format issues found by pdf and html
2016-11-08 02:21:10 +00:00
Qiang Huang 0df2586f03 Merge pull request #518 from mrunalp/terminal
Clarify wording for terminal setting and /dev/console
2016-11-07 09:49:28 +08:00
Qiang Huang 661314a926 Fix several format issues found by pdf and html
This carries #578 and fixes some other format issues.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-11-03 09:33:51 +08:00
Qiang Huang b8e2ebec5f Merge pull request #597 from WeiZhang555/fix-duplicated-namespaces
Forbid duplicated namespaces with same `type`.
2016-11-01 11:42:41 +08:00
Mrunal Patel dc42b45811 Merge pull request #601 from hqhq/rewrite_idmapping
Rewrite LinuxIDMappings
2016-10-31 13:58:45 -07:00
Qiang Huang 4404abf6cb Consistent wording for parameters in array and object
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-28 15:09:05 +08:00
Vincent Batts 28c6afea8b Merge pull request #600 from hqhq/fix_typos
Fix some typos
2016-10-28 01:11:18 +00:00
Qiang Huang 621684f645 Rewrite LinuxIDMappings
Basically make the format consistent with others; no
semantics change.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 19:00:39 +08:00
Qiang Huang f37cd3a903 Fix some typos
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 18:00:08 +08:00
Qiang Huang 2379be75cb Use IO instead of io
For consistency, since all other places use IO.

$ grep -rnIw IO * | wc -l
10

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-27 16:10:02 +08:00
Zhang Wei c22eeb2197 Forbid duplicated namespaces with same `type`.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 11:25:43 +08:00
Mrunal Patel 52f3cdecd1 Clarify wording for terminal setting and /dev/console
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-10-19 10:22:05 -07:00
Ma Shimiao 25f44dd0e8 config-linux: fix format and definitely require value of masked and readonly paths
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2016-09-30 13:51:00 +08:00
W. Trevor King d49c29f042 config: Replace "required" with "REQUIRED"
In all of these cases we want to use the RFC 2119 semantics.
Generated with:

  $ sed -i 's/required/REQUIRED/g' config*.md

after which I rolled back the change for:

  ...controllers required to fulfill...

since that was already MUSTed.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-09-17 22:03:26 -07:00
W. Trevor King c35cf57303 config: Replace "optional" with "OPTIONAL"
In all of these cases we want to use the RFC 2119 semantics.
Generated with:

  $ sed -i 's/optional/OPTIONAL/g' config*.md

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-09-17 22:03:26 -07:00
W. Trevor King 01c2d55fac config-linux: Extend no-tweak requirement to runtime namespaces
Since [1] we've required runtimes to error out if a configuration
joins an existing namespace and adjusts it somehow (e.g. joining an
existing UTS namespace and setting 'hostname', [2]).  However, the
wording from [1] (which survives untouched in the current master) only
talked about "when a path is specified".  I see two possible
approaches for internal consistency:

a. Lift the OCI restriction and allow join-and-tweak [3] where the
   kernel supports it.  When we landed the current restriction, the
   main issues seemed to be "we don't have a clear use-case for join
   and tweak" [4] (although see [5]) and "this is a foot gun [6,7]"
   (I'd rather leave policy to higher-level config linters).

b. Extend the OCI restriction to all cases where the runtime does not
   create a new namespace.  Besides the already covered "namespace
   entry exists and includes 'path'", we'd also want to forbid configs
   that were missing the relevant namespace(s) entirely (in which case
   the container inherits the host namespace(s)).

I'm partial to (a) in the long run, but (b) is less of a shift from
the current spec and likely a better choice for a pending 1.0.

This commit implements (b).

It also makes it explicit that not listing a namespace type will cause
the container to inherit the runtime namespace of that type.
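
For the hostname case, the rule in (b) could be sketched as the following Go check. This is only an illustration under assumed names: the `namespace` type and `checkHostname` function are hypothetical, and a real runtime would apply the same idea to every namespace-scoped property, not just hostname.

```go
package main

import (
	"errors"
	"fmt"
)

// namespace is a hypothetical stand-in for a namespace entry in the config.
// An empty Path means the runtime creates a new namespace of that type.
type namespace struct {
	Type string
	Path string
}

// checkHostname rejects a hostname unless the runtime creates a new UTS
// namespace. A missing entry means the container inherits the runtime
// namespace; an entry with a path means it joins an existing one.
func checkHostname(hostname string, namespaces []namespace) error {
	if hostname == "" {
		return nil
	}
	for _, ns := range namespaces {
		if ns.Type != "uts" {
			continue
		}
		if ns.Path != "" {
			return errors.New("hostname set while joining an existing UTS namespace")
		}
		return nil
	}
	return errors.New("hostname set without a UTS namespace entry")
}

func main() {
	joined := []namespace{{Type: "uts", Path: "/proc/123/ns/uts"}}
	fmt.Println(checkHostname("example", joined))
}
```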

[1]: https://github.com/opencontainers/runtime-spec/pull/158
     Subject: Clarify behavior around namespaces paths
[2]: https://github.com/opencontainers/runtime-spec/pull/214
     Subject: config: Require a new UTS namespace for config.json's hostname
[3]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-138687129
[4]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-138997548
[5]: https://github.com/opencontainers/runtime-spec/pull/305
     Subject: [Tracker] Live Container Updates
[6]: https://github.com/opencontainers/runtime-spec/pull/158#issuecomment-139106987
[7]: https://github.com/opencontainers/runtime-spec/issues/537#issuecomment-242132288
     Subject: [linux] Tweaking host namespaces?

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-24 10:41:50 -07:00
Lei Jitang d0b0ac224f Use filesystem instead of file system
Signed-off-by: Lei Jitang <leijitang@huawei.com>
2016-08-12 00:00:00 -04:00
W. Trevor King 054d2df15a config-linux: Make linux.resources.devices explicitly optional
And mark it omitempty to avoid:

  $ ocitools generate --template <(echo '{"linux": {"resources": {}}}') | jq .linux
  {
    "resources": {
      "devices": null
    }
  }
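
The effect of omitempty on a nil slice can be reproduced with a small Go program. The struct names below are hypothetical and only mimic the shape of the resources object; the marshaling behaviour is that of the standard encoding/json package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// deviceRule is a hypothetical, trimmed-down device cgroup entry.
type deviceRule struct {
	Allow bool `json:"allow"`
}

// withoutOmitEmpty serializes a nil slice as null.
type withoutOmitEmpty struct {
	Devices []deviceRule `json:"devices"`
}

// withOmitEmpty drops the key entirely when the slice is nil or empty.
type withOmitEmpty struct {
	Devices []deviceRule `json:"devices,omitempty"`
}

// marshal returns the JSON encoding of v, or the error text on failure.
func marshal(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(withoutOmitEmpty{})) // {"devices":null}
	fmt.Println(marshal(withOmitEmpty{}))    // {}
}
```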

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-03 09:13:53 -07:00
W. Trevor King 60fff3f51c config-linux: Add (array, optional) for linux.devices
To match the omitempty which the Go property has had since 28cc4239
(add omitempty to 'Device' and 'Namespace', 2016-03-10, #340).

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-08-03 09:13:53 -07:00
Aleksa Sarai 4ed839e747 config-linux: add example of cgroup resource limits
The example section looks very sparse otherwise.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 4291fd1d5a config-linux: allow lazy cgroup handling
Make explicit that runtimes only have to attach to the bare minimum
number of cgroups in order to fulfil the users' requirements. However,
runtimes are of course allowed to attach to more than the bare minimum.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 0c440a216c config-linux: clarify cgroupsPath
Clarify some of the confusion with cgroupsPath. Due to systemd, we
cannot require that relative paths be treated in any specific way. In
addition, add a line stating that not all values of cgroupsPath are
required to be valid (and that runtimes must error out if they have an
invalid cgroup path). However, any given value of cgroupsPath should
provide consistent results.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-23 02:46:12 +10:00
Aleksa Sarai 9ffd72407b config-linux: cleanup cgroup wording
Some of the wording was a bit clumsy (and incorrect, by conflating
different concepts in control groups as "cgroups").

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-07-22 01:30:36 +10:00
Phil Estes 124ce0beeb Add new architectures from libseccomp 2.3.0
Signed-off-by: Phil Estes <estesp@gmail.com>
2016-06-22 17:43:50 -04:00
Aleksa Sarai ce19b8d167 *: add support for cgroup namespace
The cgroup namespace is a new kernel feature available in 4.6+ that
allows a container to isolate its cgroup hierarchy. This currently only
allows for hiding information from /proc/self/cgroup, and mounting
cgroupfs as an unprivileged user. In the future, this namespace may
allow for subtree management by a container.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-06-04 00:14:39 +10:00
W. Trevor King f830d50a52 config-linux: Make "don't modify filesystem permissions" generic
The user-namespace restriction isn't about the root filesystem in
particular.  For example, if you bind mount in a second filesystem,
the runtime shouldn't adjust ownership on that filesystem either.

I've also adjusted the old "permissions" to "ownership", since that
more clearly reflects the fields (user and group) that you would
modify if you wanted to adjust for user namespacing.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-24 14:27:38 -07:00
W. Trevor King b373a155de config: Split platform-specific configuration into its own section (#414)
To make it clear that the whole 'linux' section is optional.

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-05-02 14:04:39 -04:00
Mrunal Patel 7350d5e1f1 Add support for Selinux mount context labels
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-22 13:40:49 -07:00
Mrunal Patel 6734c7a3a1 Merge pull request #370 from vbatts/json_schema_and_examples
Json schema and examples
2016-04-11 17:25:03 -07:00
W. Trevor King 74da108732 config-linux.md: Reword kernelTCP docs and mention bytes
Avoid the dangling 'using' from e9a6d948 (cgroup: Add support for
memory.kmem.tcp.limit_in_bytes, 2015-10-26, #235).  I've tried to echo
the kernel docs by mentioning buffer memory [1].  I'd personally
prefer linking to the kernel docs and mentioning
memory.kmem.tcp.limit_in_bytes, but that seemed like too big of a
break from the existing style for this commit.

[1]: https://kernel.org/doc/Documentation/cgroup-v1/memory.txt

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-04-11 16:07:28 -07:00
Vincent Batts d4e7326d50 config: JSON examples
* "complete" JSON example
* fix a couple of values
* fix a missing comma

Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
2016-04-11 18:56:04 -04:00
Michael Crosby adcbe530a9 Add masked and readonly paths
Fixes #320

This adds the maskedPaths and readonlyPaths fields to the spec so that
proper masking and setting of files in /proc can be configured.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-04-01 10:46:41 -07:00
Antonio Murdaca 5ded78475c *: fix typos
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-03-21 11:51:19 +01:00
W. Trevor King 5dad125595 config-linux: Specify host mount namespace for namespace paths
Avoid trouble with situations like:

  # mount --bind /mnt/test /mnt/test
  # mount --make-rprivate /mnt/test
  # touch /mnt/test/mnt /mnt/test/user
  # mount --bind /proc/123/ns/mnt /mnt/test/mnt
  # mount --bind /proc/123/ns/user /mnt/test/user
  # nsenter --mount=/proc/123/ns/mnt --user=/proc/123/ns/user sh

which uses the required private mount for binding mount namespace
references [1,2,3].  We want to avoid:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime joins the mount namespace referenced by fd 3.
3. Runtime fails to open /mnt/test/user, because /mnt/test is not
   visible in the current mount namespace.

and instead get runtime authors to setup flows like:

1. Runtime opens /mnt/test/mnt as fd 3.
2. Runtime opens /mnt/test/user as fd 4.
3. Runtime joins the mount namespace referenced by fd 3.
4. Runtime joins the user namespace referenced by fd 4.

This also applies to new namespace creation.  We want to avoid:

1. Runtime clones a container process with a new mount namespace.
2c. Container process fails to open /mnt/test/user, because /mnt/test
    is not visible in the current mount namespace.

in favor of something like:

1. Runtime opens /mnt/test/user as fd 3.
2. Runtime clones a container process with a new mount namespace.
3h. Host process closes unneeded fd 3.
3c. Container process joins the user namespace referenced by fd 3.

I also define runtime and container namespaces, so we have consistent
terminology.  I prefer:

* host namespace: a namespace you are in when you invoke the runtime
* host process: the runtime process invoked by the user
* container process: the process created by a clone call in the host
  process which will eventually execute the user-configured process.

Both the host and container processes are running runtime code
(although the container process eventually transitions to
user-configured code), so I find "runtime process", "runtime
namespace", etc. to be imprecise.  However, the maintainer consensus
is for "runtime namespace" [4,5], so that's what we're going with
here.

[1]: http://karelzak.blogspot.com/2015/04/persistent-namespaces.html
[2]: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4ce5d2b1a8fde84c0eebe70652cf28b9beda6b4e
[3]: http://mid.gmane.org/87haeahkzc.fsf@xmission.com
[4]: https://github.com/opencontainers/specs/pull/275#discussion_r48057211
[5]: https://github.com/opencontainers/specs/pull/275#discussion_r48324264

Signed-off-by: W. Trevor King <wking@tremily.us>
2016-03-16 14:47:29 -07:00
Julian Friedman 9d9ed06d5e Move rlimits to process
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
2016-03-10 09:44:43 +00:00
Michael Crosby 5a8a779fb0 Move process specific settings to process
This moves process specific settings like caps, apparmor, and selinux
process label onto the process structure to allow the same settings to
be changed at exec time.

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-03-02 11:40:09 -08:00