Commit Graph

58 Commits

Author SHA1 Message Date
Mrunal Patel c6e4a1ebeb
Merge pull request #1665 from Mashimiao/gidmapping-valid-fix
specconv: avoid skipping gidmappings applied when uidmappings is empty
2017-12-11 09:50:54 -08:00
Ma Shimiao 57edfbbaf2 specconv: avoid skipping gidmappings applied when uidmappings is empty
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-11-30 16:24:36 +08:00
Ma Shimiao 17db6560be support unbindable,runbindable for rootfs propagation
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-11-17 16:14:15 +08:00
Akihiro Suda 0aac2368e4 specconv.Example(): add /proc/scsi to masked paths
Port over https://github.com/moby/moby/pull/35399

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2017-11-04 17:38:14 +00:00
Lorenzo Fontana 780f8ef567
Specconv: Test create command hooks and seccomp setup
Signed-off-by: Lorenzo Fontana <lo@linux.com>
2017-10-28 21:46:46 +02:00
Lorenzo Fontana c0e6e12f9d
Test Cgroup creation and memory allocations
Signed-off-by: Lorenzo Fontana <lo@linux.com>
2017-10-25 01:58:10 +02:00
Aleksa Sarai d4f0f9a52b
specconv: emit an error when using MS_PRIVATE with --no-pivot
Due to the semantics of chroot(2) when it comes to mount namespaces, it
is not generally safe to use MS_PRIVATE as a mount propagation when using
chroot(2). The reason for this is that this effectively results in a set
of mount references being held by the chroot'd namespace which the
namespace cannot free. pivot_root(2) does not have this issue because
the @old_root can be unmounted by the process.

Ultimately, --no-pivot is not really necessary anymore as a commonly
used option since f8e6b5af5e ("rootfs: make pivot_root not use a
temporary directory") resolved the read-only issue. But if someone
really needs to use it, MS_PRIVATE is never a good idea.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-10-08 17:50:55 +11:00
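
The check described in the commit above could look roughly like the sketch below. The type `rootfsConfig` and the function `validateNoPivot` are hypothetical names chosen for illustration, not the identifiers used in runc; the flag values are copied from Linux's mount(2) definitions so that the example runs without extra dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// Flag values as defined by Linux for mount(2); declared locally so the
// sketch has no dependency outside the standard library.
const (
	msRec     = 1 << 14 // MS_REC
	msPrivate = 1 << 18 // MS_PRIVATE
)

// rootfsConfig is a hypothetical, trimmed-down stand-in for the runtime
// configuration; only the two fields needed by the check are kept.
type rootfsConfig struct {
	NoPivotRoot     bool
	RootPropagation int
}

// validateNoPivot rejects the combination of chroot-based setup
// (--no-pivot) with a private rootfs propagation.
func validateNoPivot(c rootfsConfig) error {
	if c.NoPivotRoot && c.RootPropagation&msPrivate != 0 {
		return errors.New("private rootfs propagation is not safe without pivot_root")
	}
	return nil
}

func main() {
	bad := rootfsConfig{NoPivotRoot: true, RootPropagation: msRec | msPrivate}
	ok := rootfsConfig{NoPivotRoot: false, RootPropagation: msRec | msPrivate}
	fmt.Println("no-pivot + rprivate:", validateNoPivot(bad))
	fmt.Println("pivot_root + rprivate:", validateNoPivot(ok))
}
```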
Qiang Huang 79ad714374 Merge pull request #1598 from euank/ragent
libcontainer: default mount propagation correctly
2017-09-25 11:55:29 +08:00
Euan Kemp 4301b440d6 libcontainer: default mount propagation correctly
The code in prepareRoot (e385f67a0e/libcontainer/rootfs_linux.go (L599-L605))
attempts to default the rootfs mount to `rslave`. However, since the spec
conversion has already defaulted it to `rprivate`, that code doesn't
actually ever do anything.

This changes the spec conversion code to accept "" and treat it as 0.

Implicitly, this makes rootfs propagation default to `rslave`, which is
a part of fixing the moby bug https://github.com/moby/moby/issues/34672

Alternate implementations include changing this defaulting to be
`rslave` and removing the defaulting code in prepareRoot, or skipping
the mapping entirely for "", but I think this change is the cleanest of
those options.

Signed-off-by: Euan Kemp <euan.kemp@coreos.com>
2017-09-22 13:36:23 -07:00
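
A minimal sketch of the idea in the commit above, accepting "" and treating it as 0 so that a later stage can apply its own default. The table `propagationFlags` and the function `rootPropagation` are hypothetical names, and the flag values are copied from Linux's mount(2) definitions; the `unbindable` entries reflect the later commit in this log that adds them.

```go
package main

import "fmt"

// Propagation flag values as defined by Linux for mount(2).
const (
	msRec        = 1 << 14 // MS_REC
	msUnbindable = 1 << 17 // MS_UNBINDABLE
	msPrivate    = 1 << 18 // MS_PRIVATE
	msSlave      = 1 << 19 // MS_SLAVE
	msShared     = 1 << 20 // MS_SHARED
)

// propagationFlags maps the spec's rootfsPropagation strings to flags.
// The empty string maps to 0, meaning "no explicit choice was made".
var propagationFlags = map[string]int{
	"":            0,
	"private":     msPrivate,
	"rprivate":    msPrivate | msRec,
	"slave":       msSlave,
	"rslave":      msSlave | msRec,
	"shared":      msShared,
	"rshared":     msShared | msRec,
	"unbindable":  msUnbindable,
	"runbindable": msUnbindable | msRec,
}

// rootPropagation looks up a propagation name and fails on unknown values.
func rootPropagation(name string) (int, error) {
	flag, ok := propagationFlags[name]
	if !ok {
		return 0, fmt.Errorf("rootfsPropagation=%q is not supported", name)
	}
	return flag, nil
}

func main() {
	for _, name := range []string{"", "rslave", "bogus"} {
		flag, err := rootPropagation(name)
		fmt.Printf("%q -> flag=%#x err=%v\n", name, flag, err)
	}
}
```

Because "" yields 0 rather than an `rprivate` default, code that runs afterwards can tell "unset" apart from an explicit choice.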
Mrunal Patel 13fa5d2953 Merge pull request #1588 from s7v7nislands/delete_unused
Delete unused function
2017-09-08 17:34:00 -07:00
s7v7nislands c795b8690b Delete unused function
Signed-off-by: Xiaobing Jiang <s7v7nislands@gmail.com>
2017-09-08 10:35:46 +08:00
Ma Shimiao c3d20e7817 Fixes #1585 config.Namespaces is empty when accessed
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-09-08 09:30:07 +08:00
Xiaochen Shen 692f6e1e27 libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17
of the Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
interfaces in a container. But unlike the cgroups hierarchy, it has a single-level
filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in the root group.

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.

A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits that
can be set cannot exceed the max bits. The max bits in the CBM vary among
supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. The kernel
checks whether it is valid when writing. For example, 0xfffff in root indicates
that the max CBM length is 20 bits, which maps to the entire L3 cache capacity.
Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00, etc.

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% of L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-01 14:26:33 +08:00
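
The schema format and CBM rules described in the commit above can be sketched as follows. This is an illustrative parser, not runc's implementation: `isContiguous` and `parseL3Schema` are hypothetical names, the root CBM is passed in as a parameter rather than read from /sys/fs/resctrl, and only the "contiguous" and "subset of root" rules are checked.

```go
package main

import (
	"fmt"
	"math/bits"
	"strconv"
	"strings"
)

// isContiguous reports whether the set bits of cbm form one unbroken run.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Drop trailing zeros; a contiguous run then looks like 0b0...01...1,
	// and adding 1 to such a value clears every set bit.
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	return shifted&(shifted+1) == 0
}

// parseL3Schema parses "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..." and
// checks each CBM against rootCBM (the mask of the root group).
func parseL3Schema(schema string, rootCBM uint64) (map[int]uint64, error) {
	if !strings.HasPrefix(schema, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", schema)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(schema, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(parts[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		cbm, err := strconv.ParseUint(parts[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", entry, err)
		}
		if !isContiguous(cbm) {
			return nil, fmt.Errorf("CBM %#x is not a contiguous run of bits", cbm)
		}
		if cbm&^rootCBM != 0 {
			return nil, fmt.Errorf("CBM %#x is not a subset of root CBM %#x", cbm, rootCBM)
		}
		masks[id] = cbm
	}
	return masks, nil
}

func main() {
	const rootCBM = 0xfffff // 20-bit root mask, as in the example above
	masks, err := parseL3Schema("L3:0=ffff0;1=3ff", rootCBM)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for id, cbm := range masks {
		share := 100 * bits.OnesCount64(cbm) / bits.OnesCount64(rootCBM)
		fmt.Printf("cache id %d: CBM %#x (%d%% of L3)\n", id, cbm, share)
	}
}
```

For the schema from the commit message, the bit counts are 16 of 20 for cache id 0 and 10 of 20 for cache id 1, which matches the 80% and 50% figures quoted there.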
Ma Shimiao 2333e7dc67 fix panic when Linux is nil for rootless case
config.Sysctl setting is duplicated.
When the container is rootless and Linux is nil, runc will panic.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-08-16 09:11:13 +08:00
Ma Shimiao 527dc5acbb fix panic when Linux is nil
Linux is not always non-nil.
If Linux is nil, panic will occur.

Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-08-10 15:57:49 -04:00
Michael Crosby eb70c213ba Update runtime-spec to rc6
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-07-12 16:24:04 -07:00
Michael Crosby fef3aced0e Merge pull request #1460 from wking/mount-option-lazytime
libcontainer/specconv/spec_linux: Add support for (no)lazytime
2017-06-29 10:06:23 -07:00
Justin Cormack e1146182a8 Remove Platform as no longer in OCI spec
This was never used, just validated, so was removed from spec.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
2017-06-27 12:16:07 +01:00
W. Trevor King 4f81337e95 libcontainer/specconv/spec_linux: Add support for (no)lazytime
And also silent, loud, (no)iversion, and (no)acl.  This is part of
catching runC up with the spec, which punts valid options to mount(8)
[1,2].

(no)acl is a filesystem-specific entry in mount(8), but it's
represented by a MS_* flag in mount(2) so we need an entry in the
translation table.

[1]: https://github.com/opencontainers/runtime-spec/blame/v1.0.0-rc5/config.md#L68
[2]: https://github.com/opencontainers/runtime-spec/pull/771

Signed-off-by: W. Trevor King <wking@tremily.us>
2017-06-01 20:43:35 -07:00
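
A translation table of the kind mentioned in the commit above might be sketched like this. The names `optionFlag`, `mountOptionFlags` and `parseMountOptions` are hypothetical, only the options named in the commit are listed, and the MS_* values are copied from Linux's mount(2) definitions so that the example needs no package outside the standard library.

```go
package main

import "fmt"

// Flag values as defined by Linux for mount(2).
const (
	msSilent   = 1 << 15 // MS_SILENT
	msPosixACL = 1 << 16 // MS_POSIXACL
	msIVersion = 1 << 23 // MS_I_VERSION
	msLazytime = 1 << 25 // MS_LAZYTIME
)

// optionFlag says whether an option sets or clears a mount flag.
type optionFlag struct {
	clear bool
	flag  int
}

// mountOptionFlags translates mount(8)-style option names to MS_* flags.
var mountOptionFlags = map[string]optionFlag{
	"acl":        {false, msPosixACL},
	"noacl":      {true, msPosixACL},
	"iversion":   {false, msIVersion},
	"noiversion": {true, msIVersion},
	"lazytime":   {false, msLazytime},
	"nolazytime": {true, msLazytime},
	"silent":     {false, msSilent},
	"loud":       {true, msSilent},
}

// parseMountOptions splits options into a flag word and the leftover
// filesystem-specific data strings that have no MS_* equivalent.
func parseMountOptions(options []string) (int, []string) {
	flags := 0
	var data []string
	for _, o := range options {
		f, ok := mountOptionFlags[o]
		switch {
		case !ok:
			data = append(data, o)
		case f.clear:
			flags &^= f.flag
		default:
			flags |= f.flag
		}
	}
	return flags, data
}

func main() {
	flags, data := parseMountOptions([]string{"lazytime", "noacl", "size=64k"})
	fmt.Printf("flags=%#x data=%v\n", flags, data)
}
```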
Michael Crosby 854b41d81e Update spec to 239c4e44f2
This provides updates to runc for the spec changes with *Process and
OOMScoreAdj

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2017-06-01 16:29:47 -07:00
Christy Perez 3d7cb4293c Move libcontainer to x/sys/unix
Since syscall is outdated and broken for some architectures,
use x/sys/unix instead.

There are still some dependencies on the syscall package that will
remain in syscall for the foreseeable future:

Errno
Signal
SysProcAttr

Additionally:
- os still uses syscall, so it needs to be kept for anything
returning *os.ProcessState, such as process.Wait.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2017-05-22 17:35:20 -05:00
Aleksa Sarai d04cbc49d2
rootless: add autogenerated rootless config from `runc spec`
Since this is a runC-specific feature, this belongs here rather than in
opencontainers/ocitools (which is for generic OCI runtimes).

In addition, we don't create a new network namespace. This is because
currently if you want to set up a veth bridge you need CAP_NET_ADMIN in
both network namespaces' pinned user namespace to create the necessary
interfaces in each network namespace.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:21 +11:00
Aleksa Sarai f0876b0427
libcontainer: configs: add proper HostUID and HostGID
Previously Host{U,G}ID only gave you the root mapping, which isn't very
useful if you are trying to do other things with the IDMaps.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Aleksa Sarai d2f49696b0
runc: add support for rootless containers
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:

* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update

The following commands work:

* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state

In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:45:24 +11:00
Qiang Huang 8430cc4f48 Use uint64 for resources to keep consistency with runtime-spec
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-20 18:51:39 +08:00
Mrunal Patel 4f9cb13b64 Update runtime spec to 1.0.0.rc5
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2017-03-15 11:38:37 -07:00
Ma Shimiao 06e27471bb support create device with type p and u
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2017-02-10 14:45:15 +08:00
Zhang Wei 8eea644ccc Bump runtime-spec to v1.0.0-rc3
* Bump underlying runtime-spec to version 1.0.0-rc3
* Fix related changed struct names in config.go

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-12-17 14:02:35 +08:00
Zhang Wei a0f7977f0f Detect and forbid duplicated namespace in spec
When spec file contains duplicated namespaces, e.g.

    specs: specs.Spec{
        Linux: &specs.Linux{
            Namespaces: []specs.Namespace{
                {
                    Type: "pid",
                },
                {
                    Type: "pid",
                    Path: "/proc/1/ns/pid",
                },
            },
        },
    }

runc should report malformed spec instead of using latest one by
default, because this spec could be quite confusing.

Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-10-27 00:44:36 +08:00
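
The duplicate detection described in the commit above can be sketched with a set of already-seen namespace types. The `namespace` struct and `checkDuplicateNamespaces` are hypothetical stand-ins for the spec types, kept local so the example is self-contained; the error text is illustrative.

```go
package main

import "fmt"

// namespace is a simplified stand-in for the spec's namespace entry.
type namespace struct {
	Type string
	Path string
}

// checkDuplicateNamespaces returns an error as soon as a namespace type
// appears a second time, instead of silently using the last entry.
func checkDuplicateNamespaces(namespaces []namespace) error {
	seen := make(map[string]bool, len(namespaces))
	for _, ns := range namespaces {
		if seen[ns.Type] {
			return fmt.Errorf("malformed spec: duplicated namespace %q", ns.Type)
		}
		seen[ns.Type] = true
	}
	return nil
}

func main() {
	dup := []namespace{
		{Type: "pid"},
		{Type: "pid", Path: "/proc/1/ns/pid"},
	}
	fmt.Println(checkDuplicateNamespaces(dup))
}
```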
Alexander Morozov 1ab9d5e6f4 Merge pull request #845 from mrunalp/cp_tmpfs
Add support for copying up directories into tmpfs when a tmpfs is mounted over them
2016-10-21 13:47:16 -07:00
rajasec 034cba6af0 Fixing runc panic for missing file mode
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-16 20:39:44 +05:30
rajasec 4b263c9594 Fixing runc panic during hugetlb pages
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-10-15 19:47:33 +05:30
Shukui Yang affc105264 tiny fix, add a null check for specs.Resources.Pids.Limit
Signed-off-by: Shukui Yang <yangshukui@huawei.com>
2016-10-13 15:55:30 +08:00
Mrunal Patel 4356468f49 Parse the new extension flags
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-09-30 09:48:03 -07:00
Adam Thomason 83cbdbd64c Add checks for nil spec.Linux
Signed-off-by: Adam Thomason <ad@mthomason.net>
2016-09-11 16:31:34 -07:00
Zhang Wei 7303a9a720 Tiny refactor: remove unused local variables
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
2016-09-06 23:41:40 +08:00
Qiang Huang aa2dd02f5a Fix null point reference panic
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-09-01 08:34:22 +08:00
Qiang Huang 220e5098a8 Fix default cgroup path
Alternative to #895, part of #892

The intention of the current behavior is to create the cgroup in the
parent cgroup of the current process, but we did this the wrong way:
we used the devices cgroup path of the current process as the default
parent path for all subsystems. This is wrong because we don't always
have the same cgroup path for all subsystems.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-08-30 14:12:15 +08:00
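
To illustrate why a single parent path cannot serve every subsystem, the sketch below builds a per-subsystem path map from text in the cgroup v1 `/proc/<pid>/cgroup` format ("hierarchy-ID:controller-list:path"). The function name `parseCgroupPaths` and the sample data are assumptions for illustration; the input is passed as a string so the example runs on any platform.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseCgroupPaths maps each cgroup v1 subsystem to its own path.
// Lines look like "3:net_cls,net_prio:/user.slice/abc".
func parseCgroupPaths(content string) map[string]string {
	paths := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		parts := strings.SplitN(scanner.Text(), ":", 3)
		if len(parts) != 3 {
			continue // skip malformed lines
		}
		// Co-mounted controllers share one line, separated by commas.
		for _, subsystem := range strings.Split(parts[1], ",") {
			paths[subsystem] = parts[2]
		}
	}
	return paths
}

func main() {
	// Hypothetical sample: the devices path differs from the memory path.
	sample := "4:devices:/user.slice\n" +
		"3:net_cls,net_prio:/user.slice/abc\n" +
		"2:memory:/system.slice/foo\n"
	paths := parseCgroupPaths(sample)
	for _, subsystem := range []string{"devices", "net_cls", "net_prio", "memory"} {
		fmt.Printf("%-8s -> %s\n", subsystem, paths[subsystem])
	}
}
```

Looking up the parent path per subsystem, rather than reusing the devices path everywhere, keeps each hierarchy rooted where the current process actually lives.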
Mrunal Patel 4dedd09396 Merge pull request #937 from hushan/net_cls-classid
fix setting net_cls classid
2016-07-18 17:18:23 -04:00
Yen-Lin Chen a318a2ae1b Fixed typo in build constraint.
Signed-off-by: Yenlin Chen <hencrice@gmail.com>
2016-07-15 19:24:22 -07:00
Hushan Jia bb42f80a86 fix setting net_cls classid
Setting classid of net_cls cgroup failed:

ERRO[0000] process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument"
process_linux.go:291: setting cgroup config for ready process caused "failed to write 𐀁 to net_cls.classid: write /sys/fs/cgroup/net_cls,net_prio/user.slice/abc/net_cls.classid: invalid argument"

The spec has classid as a *uint32, the libcontainer configs should match the type.

Signed-off-by: Hushan Jia <hushan.jia@gmail.com>
2016-07-11 05:00:35 +08:00
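
One plausible way to end up with an unreadable character like the one in the log above is converting the numeric classid to a string as if it were a Unicode code point instead of formatting it as a decimal number. This is an assumption about the failure mode, not something the commit message states; the function names and the value 0x10001 are illustrative.

```go
package main

import (
	"fmt"
	"strconv"
)

// classIDAsRune shows the pitfall: the number is interpreted as a Unicode
// code point, so the result is a single character, not digits.
func classIDAsRune(id uint32) string {
	return string(rune(id))
}

// classIDAsDecimal formats the number as decimal text, which is the form
// a cgroup control file such as net_cls.classid can parse.
func classIDAsDecimal(id uint32) string {
	return strconv.FormatUint(uint64(id), 10)
}

func main() {
	const classID uint32 = 0x10001 // illustrative value
	fmt.Printf("as rune:    %q (%d bytes)\n", classIDAsRune(classID), len(classIDAsRune(classID)))
	fmt.Printf("as decimal: %q\n", classIDAsDecimal(classID))
}
```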
Petar Petrov f9b72b1b46 Allow additional groups to be overridden in exec
Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Signed-off-by: Petar Petrov <pppepito86@gmail.com>
Signed-off-by: Georgi Sabev <georgethebeatle@gmail.com>
2016-06-21 10:35:11 +03:00
Aleksa Sarai 0636bdd45b Merge pull request #874 from crosbymichael/keyring
Add option to disable new session keys
2016-06-12 21:44:45 +10:00
root 56abe735f2 bug fix, LeafWeight nil err
Signed-off-by: root <yangshukui@huawei.com>
2016-06-10 18:11:20 -07:00
Michael Crosby 8c9db3a7a5 Add option to disable new session keys
This adds a `--no-new-keyring` flag to run and create so that a new
session keyring is not created for the container and the calling
process's keyring is inherited.

Fixes #818

Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-03 11:53:07 -07:00
Michael Crosby 5abffd3100 Add annotations to list and state output
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2016-06-02 12:44:43 -07:00
Mrunal Patel 091ed0b043 Merge pull request #777 from cyphar/fix-null-pointer-deref
libcontainer: specconv: fix nil dereference in resource setup
2016-04-24 19:09:30 -07:00
Aleksa Sarai a939c7ecd9 libcontainer: specconv: fix nil dereference in resource setup
This caused issues if someone omitted or set "resources": null, in the
runC config. The panic follows.

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x545b53]

goroutine 1 [running]:
panic(0x7aed40, 0xc820014260)
        /usr/lib64/go/src/runtime/panic.go:464 +0x3e6
github.com/opencontainers/runc/libcontainer/specconv.CreateLibcontainerConfig(0xc8200b0e30, 0x836480, 0x0, 0x0)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/opencontainers/runc/libcontainer/specconv/spec_linux.go:222 +0xe83
main.createContainer(0xc82007eb40, 0x7ffd8024e439, 0x4, 0xc82008e780, 0x0, 0x0, 0x0, 0x0)
        /home/cyphar/src/runc/utils_linux.go:174 +0x105
main.startContainer(0xc82007eb40, 0xc82008e780, 0x0, 0x0, 0x0)
        /home/cyphar/src/runc/start.go:114 +0x189
main.glob.func11(0xc82007eb40)
        /home/cyphar/src/runc/start.go:78 +0x13e
github.com/codegangsta/cli.Command.Run(0x829a58, 0x5, 0x0, 0x0, 0x0, 0x0, 0x0, 0x87ada0, 0x1a, 0x8dff80, ...)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/codegangsta/cli/command.go:137 +0x1081
github.com/codegangsta/cli.(*App).Run(0xc82007e900, 0xc82000a050, 0x5, 0x5, 0x0, 0x0)
        /home/cyphar/src/runc/Godeps/_workspace/src/github.com/codegangsta/cli/app.go:176 +0xffa
main.main()
        /home/cyphar/src/runc/main.go:123 +0xc8e

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-04-25 11:52:22 +10:00
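
The class of bug fixed above (and in the later "Linux is nil" commits in this log) comes from dereferencing optional spec sections without checking them. A minimal sketch of the guard, using simplified hypothetical types rather than the real runtime-spec structs:

```go
package main

import "fmt"

// Simplified stand-ins for the optional, pointer-valued spec sections.
type memory struct {
	Limit *int64
}

type resources struct {
	Memory *memory
}

type linux struct {
	Resources *resources
}

type spec struct {
	Linux *linux
}

// memoryLimit walks the optional chain and returns 0 when any level is
// missing, for example when the config has "resources": null.
func memoryLimit(s *spec) int64 {
	if s == nil || s.Linux == nil || s.Linux.Resources == nil {
		return 0
	}
	m := s.Linux.Resources.Memory
	if m == nil || m.Limit == nil {
		return 0
	}
	return *m.Limit
}

func main() {
	limit := int64(64 << 20)
	full := &spec{Linux: &linux{Resources: &resources{Memory: &memory{Limit: &limit}}}}
	fmt.Println(memoryLimit(full))                   // limit is set
	fmt.Println(memoryLimit(&spec{Linux: &linux{}})) // resources omitted
	fmt.Println(memoryLimit(&spec{}))                // linux omitted
}
```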
Aleksa Sarai 399175c227 Merge pull request #679 from rajasec/selinux-errorcheck
Adding selinux check during container start
2016-04-24 16:24:26 +00:00
Mrunal Patel e25811108b Bump up spec and add support for mount label
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-22 15:31:39 -07:00