Commit Graph

25 Commits

Author SHA1 Message Date
Will Martin ca4f427af1 Support cgroups with limits as rootless
Signed-off-by: Ed King <eking@pivotal.io>
Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io>
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>
2017-10-05 11:22:54 +01:00
Giuseppe Scrivano 3282f5a7c1
tests: fix for rootless multiple uids/gids
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2017-09-09 12:45:32 +10:00
Giuseppe Scrivano d8b669400a
rootless: allow multiple user/group mappings
Take advantage of the newuidmap/newgidmap tools to allow multiple
users/groups to be mapped into the new user namespace in the rootless
case.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
[ rebased to handle intelrdt changes. ]
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-09-09 12:45:32 +10:00
Xiaochen Shen 88d22fde40 libcontainer: intelrdt: use init() to avoid race condition
This is the follow-up PR of #1279 to fix remaining issues:

Use init() to avoid race condition in IsIntelRdtEnabled().
Add also rename some variables and functions.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-08 17:15:31 +08:00
Xiaochen Shen 692f6e1e27 libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

For more information about Intel RDT/CAT can be found in the section 17.17
of Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In Linux 4.10 kernel or newer, the interface is defined and exposed via
"resource control" filesystem, which is a "cgroup-like" interface.

Comparing with cgroups, it has similar process management lifecycle and
interfaces in a container. But unlike cgroups' hierarchy, it has single level
filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belongs to this group (e.g.,
<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file  (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
Is in root group.

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.

The valid L3 cache CBM is a *contiguous bits set* and number of bits that can
be set is less than the max bit. The max bits in the CBM is varied among
supported Intel Xeon platforms. In Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. Kernel will
check if it is valid when writing. e.g., 0xfffff in root indicates the max bits
of CBM is 20 bits, which mapping to entire L3 cache capacity. Some valid CBM
values to set in a group: 0xf, 0xf0, 0x3ff, 0x1f00 and etc.

For more information about Intel RDT/CAT kernel interface:
https://www.kernel.org/doc/Documentation/x86/intel_rdt_ui.txt

An example for runc:
Consider a two-socket machine with two L3 caches where the default CBM is
0xfffff and the max CBM length is 20 bits. With this configuration, tasks
inside the container only have access to the "upper" 80% of L3 cache id 0 and
the "lower" 50% L3 cache id 1:

"linux": {
	"intelRdt": {
		"l3CacheSchema": "L3:0=ffff0;1=3ff"
	}
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
2017-09-01 14:26:33 +08:00
Daniel, Dao Quang Minh 13a8c5d140 Merge pull request #1365 from hqhq/use_go_selinux
Use opencontainers/selinux package
2017-04-15 14:22:32 +01:00
Aleksa Sarai f0876b0427
libcontainer: configs: add proper HostUID and HostGID
Previously Host{U,G}ID only gave you the root mapping, which isn't very
useful if you are trying to do other things with the IDMaps.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:46:20 +11:00
Aleksa Sarai d2f49696b0
runc: add support for rootless containers
This enables the support for the rootless container mode. There are many
restrictions on what rootless containers can do, so many different runC
commands have been disabled:

* runc checkpoint
* runc events
* runc pause
* runc ps
* runc restore
* runc resume
* runc update

The following commands work:

* runc create
* runc delete
* runc exec
* runc kill
* runc list
* runc run
* runc spec
* runc state

In addition, any specification options that imply joining cgroups have
also been disabled. This is due to support for unprivileged subtree
management not being available from Linux upstream.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2017-03-23 20:45:24 +11:00
Qiang Huang 5e7b48f7c0 Use opencontainers/selinux package
It's splitted as a separate project.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2017-03-23 08:21:19 +08:00
Samuel Ortiz f19aa2d04d
validate: Check that the given namespace path is a symlink
When checking if the provided networking namespace is the host
one or not, we should first check if it's a symbolic link or not
as in some cases we can use persistent networking namespace under
e.g. /var/run/netns/.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-12-10 11:14:49 +01:00
Qiang Huang 81d6088c8f Unify rootfs validation
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2016-10-29 10:31:44 +08:00
Aleksa Sarai 1ab3c035d2
validator: actually test success
Previously we only tested failures, which causes us to miss issues where
setting sysctls would *always* fail.

Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 23:07:57 +11:00
Aleksa Sarai 2a94c3651b
validator: unbreak sysctl net.* validation
When changing this validation, the code actually allowing the validation
to pass was removed. This meant that any net.* sysctl would always fail
to validate.

Fixes: bc84f83344 ("fix docker/docker#27484")
Reported-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
2016-10-26 22:58:51 +11:00
Ce Gao 41c35810f2 add test cases about host ns
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:31:15 +08:00
Ce Gao bc84f83344 fix docker/docker#27484
Signed-off-by: Ce Gao <ce.gao@outlook.com>
2016-10-22 11:22:52 +08:00
Zhao Lei bac8b4f0b4 UNITTEST: Bypass userns test on platform without userns support
We should bypass userns test instead of show fail in platform
without userns support.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
2016-07-25 15:35:04 +08:00
Aleksa Sarai 399175c227 Merge pull request #679 from rajasec/selinux-errorcheck
Adding selinux check during container start
2016-04-24 16:24:26 +00:00
rajasec 733ff99f6d Updating kcore in validator test
Signed-off-by: rajasec <rajasec79@gmail.com>
2016-04-21 15:29:19 +05:30
rajasec d0bf80e481 Adding selinux check during container start
Signed-off-by: rajasec <rajasec79@gmail.com>

Fixed review comments and rebased

Signed-off-by: rajasec <rajasec79@gmail.com>

updated the message as per review comment

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
2016-04-19 22:22:04 +05:30
Mrunal Patel 5640330693 Fix for runc failing when rootfs has a traling slash
Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2016-04-11 09:50:28 -07:00
Alberto Leal dca2d12760 Add unit tests for validate.Validator
Signed-off-by: Alberto Leal <albertonb@gmail.com>
2016-04-06 11:18:11 +01:00
Dan Walsh d2a39ea043 Return a more meaningful error when namespaces are disabled
Signed-off-by: Dan Walsh <dwalsh@redhat.com>
2016-03-30 16:16:24 -04:00
Mrunal Patel f7d1401a69 Add validation for sysctl
/proc/sys isn't completely namespaced and only some properties are allowed
per linux namespace.

Signed-off-by: Mrunal Patel <mrunalp@gmail.com>
2015-09-25 14:04:18 -04:00
Michael Crosby 080df7ab88 Update import paths for new repository
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:29:59 -07:00
Michael Crosby 8f97d39dd2 Move libcontainer into subdirectory
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
2015-06-21 19:29:15 -07:00