Adjust runc to new opencontainers/specs version

I deleted possibility to specify config file from commands for now.
Until we decide how it'll be done. Also I changed runc spec interface to
write config files instead of output them.

Signed-off-by: Alexander Morozov <lk4d4@docker.com>
This commit is contained in:
Alexander Morozov 2015-09-01 09:32:29 -07:00
parent 4d8e13fc3e
commit ea5032bc5e
23 changed files with 1260 additions and 684 deletions

3
Godeps/Godeps.json generated
View File

@ -53,7 +53,8 @@
},
{
"ImportPath": "github.com/opencontainers/specs",
"Rev": "da9240a7125f601aef46f66ea615177607b00d39"
"Comment": "v0.1.1",
"Rev": "f6ec7a75c65cd58322ec120f651ccdf465af7973"
},
{
"ImportPath": "github.com/seccomp/libseccomp-golang",

View File

@ -14,4 +14,5 @@ install: true
script:
- go vet -x ./...
- $HOME/gopath/bin/golint ./...
- go run .tools/validate.go -range ${TRAVIS_COMMIT_RANGE}

View File

@ -0,0 +1,8 @@
Michael Crosby <michael@docker.com> (@crosbymichael)
Alexander Morozov <lk4d4@docker.com> (@LK4D4)
Rohit Jnagal <jnagal@google.com> (@rjnagal)
Victor Marmol <vmarmol@google.com> (@vmarmol)
Mrunal Patel <mpatel@redhat.com> (@mrunalp)
Vincent Batts <vbatts@redhat.com> (@vbatts)
Daniel, Dao Quang Minh <dqminh89@gmail.com> (@dqminh)
Brandon Philips <brandon.philips@coreos.com> (@philips)

View File

@ -1,6 +1,7 @@
# Open Container Specifications
This project is where the [Open Container Initiative](http://www.opencontainers.org/) Specifications are written. This is a work in progress. We should have a first draft by end of July 2015.
This project is where the [Open Container Initiative](http://www.opencontainers.org/) Specifications are written.
This is a work in progress.
Table of Contents
@ -18,57 +19,95 @@ To provide context for users the following section gives example use cases for e
- A user can create a root filesystem and configuration, with low-level OS and host specific details, and launch it as a container under an Open Container runtime.
## Releases
There is a loose [Road Map](https://github.com/opencontainers/specs/wiki/RoadMap:) on the wiki.
During the `0.x` series of OCI releases we make no backwards compatibility guarantees and intend to break the schema during this series.
# The 5 principles of Standard Containers
Define a unit of software delivery called a Standard Container. The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container.
Define a unit of software delivery called a Standard Container.
The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container.
The specification for Standard Containers is straightforward. It mostly defines 1) a file format, 2) a set of standard operations, and 3) an execution environment.
The specification for Standard Containers is straightforward.
It mostly defines 1) a file format, 2) a set of standard operations, and 3) an execution environment.
A great analogy for this is the shipping container. Just like how Standard Containers are a fundamental unit of software delivery, shipping containers are a fundamental unit of physical delivery.
A great analogy for this is the shipping container.
Just like how Standard Containers are a fundamental unit of software delivery, shipping containers are a fundamental unit of physical delivery.
## 1. Standard operations
Just like shipping containers, Standard Containers define a set of STANDARD OPERATIONS. Shipping containers can be lifted, stacked, locked, loaded, unloaded and labelled. Similarly, Standard Containers can be created, started, and stopped using standard container tools (what this spec is about); copied and snapshotted using standard filesystem tools; and downloaded and uploaded using standard network tools.
Just like shipping containers, Standard Containers define a set of STANDARD OPERATIONS.
Shipping containers can be lifted, stacked, locked, loaded, unloaded and labelled.
Similarly, Standard Containers can be created, started, and stopped using standard container tools (what this spec is about); copied and snapshotted using standard filesystem tools; and downloaded and uploaded using standard network tools.
## 2. Content-agnostic
Just like shipping containers, Standard Containers are CONTENT-AGNOSTIC: all standard operations have the same effect regardless of the contents. A shipping container will be stacked in exactly the same way whether it contains Vietnamese powder coffee or spare Maserati parts. Similarly, Standard Containers are started or uploaded in the same way whether they contain a postgres database, a php application with its dependencies and application server, or Java build artifacts.
Just like shipping containers, Standard Containers are CONTENT-AGNOSTIC: all standard operations have the same effect regardless of the contents.
A shipping container will be stacked in exactly the same way whether it contains Vietnamese powder coffee or spare Maserati parts.
Similarly, Standard Containers are started or uploaded in the same way whether they contain a postgres database, a php application with its dependencies and application server, or Java build artifacts.
## 3. Infrastructure-agnostic
Both types of containers are INFRASTRUCTURE-AGNOSTIC: they can be transported to thousands of facilities around the world, and manipulated by a wide variety of equipment. A shipping container can be packed in a factory in Ukraine, transported by truck to the nearest routing center, stacked onto a train, loaded into a German boat by an Australian-built crane, stored in a warehouse at a US facility, etc. Similarly, a standard container can be bundled on my laptop, uploaded to S3, downloaded, run and snapshotted by a build server at Equinix in Virginia, uploaded to 10 staging servers in a home-made Openstack cluster, then sent to 30 production instances across 3 EC2 regions.
Both types of containers are INFRASTRUCTURE-AGNOSTIC: they can be transported to thousands of facilities around the world, and manipulated by a wide variety of equipment.
A shipping container can be packed in a factory in Ukraine, transported by truck to the nearest routing center, stacked onto a train, loaded into a German boat by an Australian-built crane, stored in a warehouse at a US facility, etc.
Similarly, a standard container can be bundled on my laptop, uploaded to S3, downloaded, run and snapshotted by a build server at Equinix in Virginia, uploaded to 10 staging servers in a home-made Openstack cluster, then sent to 30 production instances across 3 EC2 regions.
## 4. Designed for automation
Because they offer the same standard operations regardless of content and infrastructure, Standard Containers, just like their physical counterparts, are extremely well-suited for automation. In fact, you could say automation is their secret weapon.
Because they offer the same standard operations regardless of content and infrastructure, Standard Containers, just like their physical counterparts, are extremely well-suited for automation.
In fact, you could say automation is their secret weapon.
Many things that once required time-consuming and error-prone human effort can now be programmed. Before shipping containers, a bag of powder coffee was hauled, dragged, dropped, rolled and stacked by 10 different people in 10 different locations by the time it reached its destination. 1 out of 50 disappeared. 1 out of 20 was damaged. The process was slow, inefficient and cost a fortune - and was entirely different depending on the facility and the type of goods.
Many things that once required time-consuming and error-prone human effort can now be programmed.
Before shipping containers, a bag of powder coffee was hauled, dragged, dropped, rolled and stacked by 10 different people in 10 different locations by the time it reached its destination.
1 out of 50 disappeared.
1 out of 20 was damaged.
The process was slow, inefficient and cost a fortune - and was entirely different depending on the facility and the type of goods.
Similarly, before Standard Containers, by the time a software component ran in production, it had been individually built, configured, bundled, documented, patched, vendored, templated, tweaked and instrumented by 10 different people on 10 different computers. Builds failed, libraries conflicted, mirrors crashed, post-it notes were lost, logs were misplaced, cluster updates were half-broken. The process was slow, inefficient and cost a fortune - and was entirely different depending on the language and infrastructure provider.
Similarly, before Standard Containers, by the time a software component ran in production, it had been individually built, configured, bundled, documented, patched, vendored, templated, tweaked and instrumented by 10 different people on 10 different computers.
Builds failed, libraries conflicted, mirrors crashed, post-it notes were lost, logs were misplaced, cluster updates were half-broken.
The process was slow, inefficient and cost a fortune - and was entirely different depending on the language and infrastructure provider.
## 5. Industrial-grade delivery
There are 17 million shipping containers in existence, packed with every physical good imaginable. Every single one of them can be loaded onto the same boats, by the same cranes, in the same facilities, and sent anywhere in the World with incredible efficiency. It is embarrassing to think that a 30 ton shipment of coffee can safely travel half-way across the World in *less time* than it takes a software team to deliver its code from one datacenter to another sitting 10 miles away.
There are 17 million shipping containers in existence, packed with every physical good imaginable.
Every single one of them can be loaded onto the same boats, by the same cranes, in the same facilities, and sent anywhere in the World with incredible efficiency.
It is embarrassing to think that a 30 ton shipment of coffee can safely travel half-way across the World in *less time* than it takes a software team to deliver its code from one datacenter to another sitting 10 miles away.
With Standard Containers we can put an end to that embarrassment, by making INDUSTRIAL-GRADE DELIVERY of software a reality.
# Contributing
Development happens on github for the spec. Issues are used for bugs and actionable items and longer
discussions can happen on the mailing list. You can subscribe and join the mailing list on
[google groups](https://groups.google.com/a/opencontainers.org/forum/#!forum/dev).
Development happens on GitHub for the spec.
Issues are used for bugs and actionable items and longer discussions can happen on the [mailing list](#mailing-list).
The specification and code is licensed under the Apache 2.0 license found in
the `LICENSE` file of this repository.
The specification and code is licensed under the Apache 2.0 license found in the `LICENSE` file of this repository.
## Code of Conduct
Participation in the OpenContainers community is governed by [OpenContainer's Code of Conduct](code-of-conduct.md).
## Discuss your design
The project welcomes submissions, but please let everyone know what you are working on.
Before undertaking a nontrivial change to this specification, send mail to the [mailing list](#mailing-list) to discuss what you plan to do.
This gives everyone a chance to validate the design, helps prevent duplication of effort, and ensures that the idea fits.
It also guarantees that the design is sound before code is written; a GitHub pull-request is not the place for high-level discussions.
Typos and grammatical errors can go straight to a pull-request.
When in doubt, start on the [mailing-list](#mailing-list).
## Weekly Call
The contributors and maintainers of the project have a weekly meeting Wednesdays at 10:00 AM PST.
The link to the call will be posted on the mailing list each week along with set topics for discussion.
Everyone is welcome to participate in the call, although there can only be speaking members on the Google Hangout.
Participants who don't get a speaking slot can watch the live broadcast on [this YouTube channel][youtube] and post feedback and questions on [the IRC channel](#irc).
Everyone is welcome to propose additional topics, suggest other agenda alterations, or request a speaking slot via the mailing list.
Minutes for the call will be posted to the mailing list for those who are unable to join the call.
Everyone is welcome to participate in the [BlueJeans call][BlueJeans].
An initial agenda will be posted to the [mailing list](#mailing-list) earlier in the week, and everyone is welcome to propose additional topics or suggest other agenda alterations there.
Minutes for the call will be posted to the [mailing list](#mailing-list) for those who are unable to join the call.
## Mailing List
You can subscribe and join the mailing list on [Google Groups](https://groups.google.com/a/opencontainers.org/forum/#!forum/dev).
## IRC
@ -80,13 +119,12 @@ To keep consistency throughout the Markdown files in the Open Container spec all
This fixes two things: it makes diffing easier with git and it resolves fights about line wrapping length.
For example, this paragraph will span three lines in the Markdown source.
## Git commit
### Sign your work
The sign-off is a simple line at the end of the explanation for the
patch, which certifies that you wrote it or otherwise have the right to
pass it on as an open-source patch. The rules are pretty simple: if you
can certify the below (from
[developercertificate.org](http://developercertificate.org/)):
The sign-off is a simple line at the end of the explanation for the patch, which certifies that you wrote it or otherwise have the right to pass it on as an open-source patch.
The rules are pretty simple: if you can certify the below (from [developercertificate.org](http://developercertificate.org/)):
```
Developer Certificate of Origin
@ -135,4 +173,19 @@ using your real name (sorry, no pseudonyms or anonymous contributions.)
You can add the sign off when creating the git commit via `git commit -s`.
[youtube]: https://www.youtube.com/channel/UC1wmLdEYmwWcsFg7bt1s5nw
### Commit Style
Simple house-keeping for clean git history.
Read more on [How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/) or the Discussion section of [`git-commit(1)`](http://git-scm.com/docs/git-commit).
1. Separate the subject from body with a blank line
2. Limit the subject line to 50 characters
3. Capitalize the subject line
4. Do not end the subject line with a period
5. Use the imperative mood in the subject line
6. Wrap the body at 72 characters
7. Use the body to explain what and why vs. how
* If there was important/useful/essential conversation or information, copy or include a reference
8. When possible, one keyword to scope the change in the subject (i.e. "README: ...", "runtime: ...")
[BlueJeans]: https://bluejeans.com/1771332256/

View File

@ -1,8 +1,12 @@
# Bundle Container Format
This section defines a format for encoding a container as a *bundle* - a directory organized in a certain way, and containing all the necessary data and metadata for any compliant runtime to perform all standard operations against it. See also [OS X application bundles](http://en.wikipedia.org/wiki/Bundle_%28OS_X%29) for a similar use of the term *bundle*.
This section defines a format for encoding a container as a *bundle* - a directory organized in a certain way, and containing all the necessary data and metadata for any compliant runtime to perform all standard operations against it.
See also [OS X application bundles](http://en.wikipedia.org/wiki/Bundle_%28OS_X%29) for a similar use of the term *bundle*.
The format does not define distribution. In other words, it only specifies how a container must be stored on a local filesystem, for consumption by a runtime. It does not specify how to transfer a container between computers, how to discover containers, or assign names or versions to them. Any distribution method capable of preserving the original layout of a container, as specified here, is considered compliant.
The format does not define distribution.
In other words, it only specifies how a container must be stored on a local filesystem, for consumption by a runtime.
It does not specify how to transfer a container between computers, how to discover containers, or assign names or versions to them.
Any distribution method capable of preserving the original layout of a container, as specified here, is considered compliant.
A standard container bundle is made of the following 3 parts:
@ -12,19 +16,19 @@ A standard container bundle is made of the following 3 parts:
# Directory layout
A Standard Container bundle is a directory containing all the content needed to load and run a container. This includes its configuration file (`config.json`) and content directories. The main property of this directory layout is that it can be moved as a unit to another machine and run the same container.
A Standard Container bundle is a directory containing all the content needed to load and run a container.
This includes two configuration files `config.json` and `runtime.json`, and a rootfs directory.
The `config.json` file contains settings that are host independent and application specific such as security permissions, environment variables and arguments.
The `runtime.json` file contains settings that are host specific such as memory limits, local device access and mount points.
The goal is that the bundle can be moved as a unit to another machine and run the same application if `runtime.json` is removed or reconfigured.
The syntax and semantics for `config.json` are described in [this specification](config.md).
One or more *content directories* may be adjacent to the configuration file. This must include at least the root filesystem (referenced in the configuration file by the *root* field) and may include other related content (signatures, other configs, etc.). The interpretation of these resources is specified in the configuration. The names of the directories may be arbitrary, but users should consider using conventional names as in the example below.
A single `rootfs` directory MUST be in the same directory as the `config.json`.
The names of the directories may be arbitrary, but users should consider using conventional names as in the example below.
```
/
!
--- config.json
!
--- rootfs
!
--- signatures
config.json
runtime.json
rootfs/
```

View File

@ -0,0 +1,39 @@
# OpenContainers Code of Conduct
Behave as a community member, follow the code of conduct.
## Code of Conduct
The OpenContainers community is made up of a mixture of professionals and volunteers from all over the world.
When we disagree, we try to understand why.
Disagreements, both social and technical, happen all the time and OpenContainers is no exception.
It is important that we resolve disagreements and differing views constructively.
This code of conduct applies both within project spaces and in public spaces when an individual is representing the project or its community.
Participants should be aware of these concerns.
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion, or nationality.
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing other's private information, such as physical or electronic addresses, without explicit permission
* Other unethical or unprofessional conduct
The OpenContainers team does not condone any statements by speakers contrary to these standards.
The OpenContainers team reserves the right to deny participation any individual found to be engaging in discriminatory or harassing actions.
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct.
By adopting this Code of Conduct, project maintainers commit themselves to fairly and consistently applying these principles to every aspect of managing this project.
## Thanks
Thanks to the [Fedora Code of Conduct](https://getfedora.org/code-of-conduct) and [Contributor Covenant](http://contributor-covenant.org) for inspiration and ideas.
Portions of this Code of Conduct are adapted from the Contributor Covenant, version 1.2.0, available at http://contributor-covenant.org/version/1/2/0/

View File

@ -1,186 +1,22 @@
# Linux-specific configuration
The Linux container specification uses various kernel features like namespaces,
cgroups, capabilities, LSM, and file system jails to fulfill the spec.
Additional information is needed for Linux over the [default spec configuration](config.md)
in order to configure these various kernel features.
The Linux container specification uses various kernel features like namespaces, cgroups, capabilities, LSM, and file system jails to fulfill the spec.
Additional information is needed for Linux over the [default spec configuration](config.md) in order to configure these various kernel features.
## Linux namespaces
## Capabilities
A namespace wraps a global system resource in an abstraction that makes it
appear to the processes within the namespace that they have their own isolated
instance of the global resource. Changes to the global resource are visible to
other processes that are members of the namespace, but are invisible to other
processes. For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html)
Namespaces are specified in the spec as an array of entries. Each entry has a
type field with possible values described below and an optional path element.
If a path is specified, that particular file is used to join that type of namespace.
```json
"namespaces": [
{
"type": "pid",
"path": "/proc/1234/ns/pid"
},
{
"type": "net",
"path": "/var/run/netns/neta"
},
{
"type": "mnt",
},
{
"type": "ipc",
},
{
"type": "uts",
},
{
"type": "user",
},
]
```
#### Namespace types
* **pid** processes inside the container will only be able to see other processes inside the same container.
* **network** the container will have it's own network stack.
* **mnt** the container will have an isolated mount table.
* **ipc** processes inside the container will only be able to communicate to other processes inside the same
container via system level IPC.
* **uts** the container will be able to have it's own hostname and domain name.
* **user** the container will be able to remap user and group IDs from the host to local users and groups
within the container.
### Access to devices
Devices is an array specifying the list of devices to be created in the container.
Next parameters can be specified:
* type - type of device: 'c', 'b', 'u' or 'p'. More info in `man mknod`
* path - full path to device inside container
* major, minor - major, minor numbers for device. More info in `man mknod`.
There is special value: `-1`, which means `*` for `device`
cgroup setup.
* permissions - cgroup permissions for device. A composition of 'r'
(read), 'w' (write), and 'm' (mknod).
* fileMode - file mode for device file
* uid - uid of device owner
* gid - gid of device owner
```json
"devices": [
{
"path": "/dev/random",
"type": "c",
"major": 1,
"minor": 8,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/urandom",
"type": "c",
"major": 1,
"minor": 9,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/null",
"type": "c",
"major": 1,
"minor": 3,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/zero",
"type": "c",
"major": 1,
"minor": 5,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/tty",
"type": "c",
"major": 5,
"minor": 0,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/full",
"type": "c",
"major": 1,
"minor": 7,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
}
]
```
## Linux control groups
Also known as cgroups, they are used to restrict resource usage for a container and handle
device access. cgroups provide controls to restrict cpu, memory, IO, and network for
the container. For more information, see the [kernel cgroups documentation](https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt)
## Linux capabilities
Capabilities is an array that specifies Linux capabilities that can be provided to the process
inside the container. Valid values are the string after `CAP_` for capabilities defined
in [the man page](http://man7.org/linux/man-pages/man7/capabilities.7.html)
Capabilities is an array that specifies Linux capabilities that can be provided to the process inside the container.
Valid values are the strings for capabilities defined in [the man page](http://man7.org/linux/man-pages/man7/capabilities.7.html)
```json
"capabilities": [
"AUDIT_WRITE",
"KILL",
"NET_BIND_SERVICE"
"CAP_AUDIT_WRITE",
"CAP_KILL",
"CAP_NET_BIND_SERVICE"
]
```
## Linux sysctl
sysctl allows kernel parameters to be modified at runtime for the container.
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html)
```json
"sysctl": {
"net.ipv4.ip_forward": "1",
"net.core.somaxconn": "256"
}
```
## Linux rlimits
```json
"rlimits": [
{
"type": "RLIMIT_NPROC",
"soft": 1024,
"hard": 102400
}
]
```
rlimits allow setting resource limits. The type is from the values defined in [the man page](http://man7.org/linux/man-pages/man2/setrlimit.2.html). The kernel enforces the soft limit for a resource while the hard limit acts as a ceiling for that value that could be set by an unprivileged process.
## Linux user namespace mappings
## User namespace mappings
```json
"uidMappings": [
@ -199,48 +35,29 @@ rlimits allow setting resource limits. The type is from the values defined in [t
]
```
uid/gid mappings describe the user namespace mappings from the host to the container. *hostID* is the starting uid/gid on the host to be mapped to *containerID* which is the starting uid/gid in the container and *size* refers to the number of ids to be mapped. The Linux kernel has a limit of 5 such mappings that can be specified.
uid/gid mappings describe the user namespace mappings from the host to the container.
The mappings represent how the bundle `rootfs` expects the user namespace to be setup and the runtime SHOULD NOT modify the permissions on the rootfs to realize the mapping.
*hostID* is the starting uid/gid on the host to be mapped to *containerID* which is the starting uid/gid in the container and *size* refers to the number of ids to be mapped.
There is a limit of 5 mappings which is the Linux kernel hard limit.
## Rootfs Mount Propagation
rootfsPropagation sets the rootfs's mount propagation. Its value is either slave, private, or shared. [The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
## Default Devices and File Systems
```json
"rootfsPropagation": "slave",
```
The Linux ABI includes both syscalls and several special file paths.
Applications expecting a Linux environment will very likely expect these files paths to be setup correctly.
## Selinux process label
The following devices and filesystems MUST be made available in each application's filesystem
Selinux process label specifies the label with which the processes in a container are run.
For more information about SELinux, see [Selinux documentation](http://selinuxproject.org/page/Main_Page)
```json
"selinuxProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675"
```
## Apparmor profile
Apparmor profile specifies the name of the apparmor profile that will be used for the container.
For more information about Apparmor, see [Apparmor documentation](https://wiki.ubuntu.com/AppArmor)
```json
"apparmorProfile": "acme_secure_profile"
```
## Seccomp
Seccomp provides application sandboxing mechanism in the Linux kernel.
Seccomp configuration allows one to configure actions to take for matched syscalls and furthermore also allows
matching on values passed as arguments to syscalls.
For more information about Seccomp, see [Seccomp kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt)
The actions and operators are strings that match the definitions in seccomp.h from [libseccomp](https://github.com/seccomp/libseccomp) and are translated to corresponding values.
```json
"seccomp": {
"defaultAction": "SCMP_ACT_ALLOW",
"syscalls": [
{
"name": "getcwd",
"action": "SCMP_ACT_ERRNO"
}
]
}
```
| Path | Type | Notes |
| ------------ | ------ | ------- |
| /proc | [procfs](https://www.kernel.org/doc/Documentation/filesystems/proc.txt) | |
| /sys | [sysfs](https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt) | |
| /dev/null | [device](http://man7.org/linux/man-pages/man4/null.4.html) | |
| /dev/zero | [device](http://man7.org/linux/man-pages/man4/zero.4.html) | |
| /dev/full | [device](http://man7.org/linux/man-pages/man4/full.4.html) | |
| /dev/random | [device](http://man7.org/linux/man-pages/man4/random.4.html) | |
| /dev/urandom | [device](http://man7.org/linux/man-pages/man4/random.4.html) | |
| /dev/tty | [device](http://man7.org/linux/man-pages/man4/tty.4.html) | |
| /dev/console | [device](http://man7.org/linux/man-pages/man4/console.4.html) | |
| /dev/pts | [devpts](https://www.kernel.org/doc/Documentation/filesystems/devpts.txt) | |
| /dev/ptmx | [device](https://www.kernel.org/doc/Documentation/filesystems/devpts.txt) | Bind-mount or symlink of /dev/pts/ptmx |
| /dev/shm | [tmpfs](https://www.kernel.org/doc/Documentation/filesystems/tmpfs.txt) | |

View File

@ -14,30 +14,7 @@ type Spec struct {
// Hostname is the container's host name.
Hostname string `json:"hostname"`
// Mounts profile configuration for adding mounts to the container's filesystem.
Mounts []Mount `json:"mounts"`
// Hooks are the commands run at various lifecycle events of the container.
Hooks Hooks `json:"hooks"`
}
type Hooks struct {
// Prestart is a list of hooks to be run before the container process is executed.
// On Linux, they are run after the container namespaces are created.
Prestart []Hook `json:"prestart"`
// Poststop is a list of hooks to be run after the container process exits.
Poststop []Hook `json:"poststop"`
}
// Mount specifies a mount for a container.
type Mount struct {
// Type specifies the mount kind.
Type string `json:"type"`
// Source specifies the source path of the mount. In the case of bind mounts on
// linux based systems this would be the file on the host.
Source string `json:"source"`
// Destination is the path where the mount will be placed relative to the container's root.
Destination string `json:"destination"`
// Options are fstab style mount options.
Options string `json:"options"`
Mounts []MountPoint `json:"mounts"`
}
// Process contains information to start a specific application inside the container.
@ -72,9 +49,22 @@ type Platform struct {
Arch string `json:"arch"`
}
// Hook specifies a command that is run at a particular event in the lifecycle of a container.
type Hook struct {
Path string `json:"path"`
Args []string `json:"args"`
Env []string `json:"env"`
// MountPoint describes a directory that may be fullfilled by a mount in the runtime.json.
type MountPoint struct {
// Name is a unique descriptive identifier for this mount point.
Name string `json:"name"`
// Path specifies the path of the mount. The path and child directories MUST exist, a runtime MUST NOT create directories automatically to a mount point.
Path string `json:"path"`
}
// State holds information about the runtime state of the container.
type State struct {
// Version is the version of the specification that is supported.
Version string `json:"version"`
// ID is the container ID
ID string `json:"id"`
// Pid is the process id for the container's main process.
Pid int `json:"pid"`
// Root is the path to the container's bundle directory.
Root string `json:"root"`
}

View File

@ -1,7 +1,7 @@
# Configuration file
The containers top-level directory MUST contain a configuration file called `config.json`.
For now the canonical schema is defined in [spec.go](spec.go) and [spec_linux.go](spec_linux.go), but this will be moved to a formal JSON schema over time.
The container's top-level directory MUST contain a configuration file called `config.json`.
For now the canonical schema is defined in [config.go](config.go) and [config_linux.go](config_linux.go), but this will be moved to a formal JSON schema over time.
The configuration file contains metadata necessary to implement standard operations against the container.
This includes the process to run, environment variables to inject, sandboxing features to use, etc.
@ -34,61 +34,37 @@ Each container has exactly one *root filesystem*, specified in the *root* object
}
```
## Mount Configuration
## Mount Points
Additional filesystems can be declared as "mounts", specified in the *mounts* array. The parameters are similar to the ones in Linux mount system call. [http://linux.die.net/man/2/mount](http://linux.die.net/man/2/mount)
You can add array of mount points inside container as `mounts`.
Each record in this array must have configuration in [runtime config](runtime-config.md#mount-configuration).
* **type** (string, required) Linux, *filesystemtype* argument supported by the kernel are listed in */proc/filesystems* (e.g., "minix", "ext2", "ext3", "jfs", "xfs", "reiserfs", "msdos", "proc", "nfs", "iso9660"). Windows: ntfs
* **source** (string, required) a device name, but can also be a directory name or a dummy. Windows, the volume name that is the target of the mount point. \\?\Volume\{GUID}\ (on Windows source is called target)
* **destination** (string, required) where the source filesystem is mounted relative to the container rootfs.
* **options** (string, optional) in the fstab format [https://wiki.archlinux.org/index.php/Fstab](https://wiki.archlinux.org/index.php/Fstab).
* **name** (string, required) Name of mount point. Used for config lookup.
* **path** (string, required) Destination of mount point: path inside container.
*Example (Linux)*
*Example*
```json
"mounts": [
{
"type": "proc",
"source": "proc",
"destination": "/proc",
"options": ""
"name": "proc",
"path": "/proc"
},
{
"type": "tmpfs",
"source": "tmpfs",
"destination": "/dev",
"options": "nosuid,strictatime,mode=755,size=65536k"
"name": "dev",
"path": "/dev"
},
{
"type": "devpts",
"source": "devpts",
"destination": "/dev/pts",
"options": "nosuid,noexec,newinstance,ptmxmode=0666,mode=0620,gid=5"
"name": "devpts",
"path": "/dev/pts"
},
{
"type": "bind",
"source": "/volumes/testing",
"destination": "/data",
"options": "rbind,rw"
"name": "data",
"path": "/data"
}
]
```
*Example (Windows)*
```json
"mounts": [
{
"type": "ntfs",
"source": "\\\\?\\Volume\\{2eca078d-5cbc-43d3-aff8-7e8511f60d0e}\\",
"destination": "C:\\Users\\crosbymichael\\My Fancy Mount Point\\",
"options": ""
}
]
```
See links for details about [mountvol](http://ss64.com/nt/mountvol.html) and [SetVolumeMountPoint](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365561(v=vs.85).aspx) in Windows.
## Process configuration
* **terminal** (bool, optional) specifies whether you want a terminal attached to that process. Defaults to false.
@ -111,13 +87,13 @@ For Linux-based systems the user structure has the following fields:
"user": {
"uid": 1,
"gid": 1,
"additionalGids": []
"additionalGids": [5, 6]
},
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": "",
"cwd": "/root",
"args": [
"sh"
]
@ -147,4 +123,5 @@ For Linux-based systems the user structure has the following fields:
}
```
Interpretation of the platform section of the JSON file is used to find which platform-specific sections may be available in the document. For example, if `os` is set to `linux`, then a JSON object conforming to the [Linux-specific schema](config-linux.md) SHOULD be found at the key `linux` in the `config.json`.
Interpretation of the platform section of the JSON file is used to find which platform-specific sections may be available in the document.
For example, if `os` is set to `linux`, then a JSON object conforming to the [Linux-specific schema](config-linux.md) SHOULD be found at the key `linux` in the `config.json`.

View File

@ -0,0 +1,27 @@
// +build linux
package specs
// LinuxSpec is the full specification for linux containers.
type LinuxSpec struct {
Spec
// Linux is platform specific configuration for linux based containers.
Linux Linux `json:"linux"`
}
// Linux contains platform specific configuration for linux based containers.
type Linux struct {
// Capabilities are linux capabilities that are kept for the container.
Capabilities []string `json:"capabilities"`
}
// User specifies linux specific user and group information for the container's
// main process.
type User struct {
// UID is the user id.
UID int32 `json:"uid"`
// GID is the group id.
GID int32 `json:"gid"`
// AdditionalGids are additional group ids set for the container's process.
AdditionalGids []int32 `json:"additionalGids"`
}

View File

@ -0,0 +1,261 @@
## Namespaces
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.
For more information, see [the man page](http://man7.org/linux/man-pages/man7/namespaces.7.html).
Namespaces are specified in the spec as an array of entries.
Each entry has a type field with possible values described below and an optional path element.
If a path is specified, that particular file is used to join that type of namespace.
Also, when a path is specified, a runtime MUST assume that the setup for that particular namespace has already been done and error out if the config specifies anything else related to that namespace.
```json
"namespaces": [
{
"type": "pid",
"path": "/proc/1234/ns/pid"
},
{
"type": "network",
"path": "/var/run/netns/neta"
},
{
"type": "mount",
},
{
"type": "ipc",
},
{
"type": "uts",
},
{
"type": "user",
},
]
```
#### Namespace types
* **pid** processes inside the container will only be able to see other processes inside the same container.
* **network** the container will have its own network stack.
* **mount** the container will have an isolated mount table.
* **ipc** processes inside the container will only be able to communicate to other processes inside the same
container via system level IPC.
* **uts** the container will be able to have its own hostname and domain name.
* **user** the container will be able to remap user and group IDs from the host to local users and groups
within the container.
## Devices
Devices is an array specifying the list of devices to be created in the container.
Next parameters can be specified:
* type - type of device: 'c', 'b', 'u' or 'p'. More info in `man mknod`
* path - full path to device inside container
* major, minor - major, minor numbers for device. More info in `man mknod`.
There is special value: `-1`, which means `*` for `device`
cgroup setup.
* permissions - cgroup permissions for device. A composition of 'r'
(read), 'w' (write), and 'm' (mknod).
* fileMode - file mode for device file
* uid - uid of device owner
* gid - gid of device owner
```json
"devices": [
{
"path": "/dev/random",
"type": "c",
"major": 1,
"minor": 8,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/urandom",
"type": "c",
"major": 1,
"minor": 9,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/null",
"type": "c",
"major": 1,
"minor": 3,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/zero",
"type": "c",
"major": 1,
"minor": 5,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/tty",
"type": "c",
"major": 5,
"minor": 0,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
},
{
"path": "/dev/full",
"type": "c",
"major": 1,
"minor": 7,
"permissions": "rwm",
"fileMode": 0666,
"uid": 0,
"gid": 0
}
]
```
## Control groups
Also known as cgroups, they are used to restrict resource usage for a container and handle device access.
cgroups provide controls to restrict cpu, memory, IO, pids and network for the container.
For more information, see the [kernel cgroups documentation](https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt).
The path to the cgroups can to be specified in the Spec via `cgroupsPath`.
`cgroupsPath` is expected to be relative to the cgroups mount point.
If not specified, cgroups will be created under '/'.
Implementations of the Spec can choose to name cgroups in any manner.
The Spec does not include naming schema for cgroups.
The Spec does not support [split hierarchy](https://www.kernel.org/doc/Documentation/cgroups/unified-hierarchy.txt).
The cgroups will be created if they don't exist.
```json
"cgroupsPath": "/myRuntime/myContainer"
```
`cgroupsPath` can be used to either control the cgroups hierarchy for containers or to run a new process in an existing container.
Optionally, cgroups limits can be specified via `resources`.
```json
"resources": {
"disableOOMKiller": false,
"memory": {
"limit": 0,
"reservation": 0,
"swap": 0,
"kernel": 0,
"swappiness": -1
},
"cpu": {
"shares": 0,
"quota": 0,
"period": 0,
"realtimeRuntime": 0,
"realtimePeriod": 0,
"cpus": "",
"mems": ""
},
"blockIO": {
"blkioWeight": 0,
"blkioWeightDevice": "",
"blkioThrottleReadBpsDevice": "",
"blkioThrottleWriteBpsDevice": "",
"blkioThrottleReadIopsDevice": "",
"blkioThrottleWriteIopsDevice": ""
},
"hugepageLimits": null,
"network": {
"classId": "",
"priorities": null
}
}
```
Do not specify `resources` unless limits have to be updated.
For example, to run a new process in an existing container without updating limits, `resources` need not be specified.
## Sysctl
sysctl allows kernel parameters to be modified at runtime for the container.
For more information, see [the man page](http://man7.org/linux/man-pages/man8/sysctl.8.html)
```json
"sysctl": {
"net.ipv4.ip_forward": "1",
"net.core.somaxconn": "256"
}
```
## Rlimits
```json
"rlimits": [
{
"type": "RLIMIT_NPROC",
"soft": 1024,
"hard": 102400
}
]
```
rlimits allow setting resource limits.
`type` is a string with a value from those defined in [the man page](http://man7.org/linux/man-pages/man2/setrlimit.2.html).
The kernel enforces the `soft` limit for a resource while the `hard` limit acts as a ceiling for that value that could be set by an unprivileged process.
## SELinux process label
SELinux process label specifies the label with which the processes in a container are run.
For more information about SELinux, see [Selinux documentation](http://selinuxproject.org/page/Main_Page)
```json
"selinuxProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c124,c675"
```
## Apparmor profile
Apparmor profile specifies the name of the apparmor profile that will be used for the container.
For more information about Apparmor, see [Apparmor documentation](https://wiki.ubuntu.com/AppArmor)
```json
"apparmorProfile": "acme_secure_profile"
```
## seccomp
Seccomp provides application sandboxing mechanism in the Linux kernel.
Seccomp configuration allows one to configure actions to take for matched syscalls and furthermore also allows matching on values passed as arguments to syscalls.
For more information about Seccomp, see [Seccomp kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt)
The actions and operators are strings that match the definitions in seccomp.h from [libseccomp](https://github.com/seccomp/libseccomp) and are translated to corresponding values.
```json
"seccomp": {
"defaultAction": "SCMP_ACT_ALLOW",
"syscalls": [
{
"name": "getcwd",
"action": "SCMP_ACT_ERRNO"
}
]
}
```
## Rootfs Mount Propagation
rootfsPropagation sets the rootfs's mount propagation.
Its value is either slave, private, or shared.
[The kernel doc](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt) has more information about mount propagation.
```json
"rootfsPropagation": "slave",
```

View File

@ -0,0 +1,52 @@
## Mount Configuration
Additional filesystems can be declared as "mounts", specified in the *mounts* object.
Keys in this object are names of mount points from portable config.
Values are objects with configuration of mount points.
The parameters are similar to the ones in [the Linux mount system call](http://man7.org/linux/man-pages/man2/mount.2.html).
Only [mounts from the portable config](config.md#mount-points) will be mounted.
* **type** (string, required) Linux, *filesystemtype* argument supported by the kernel are listed in */proc/filesystems* (e.g., "minix", "ext2", "ext3", "jfs", "xfs", "reiserfs", "msdos", "proc", "nfs", "iso9660"). Windows: ntfs
* **source** (string, required) a device name, but can also be a directory name or a dummy. Windows, the volume name that is the target of the mount point. \\?\Volume\{GUID}\ (on Windows source is called target)
* **options** (list of strings, optional) in the fstab format [https://wiki.archlinux.org/index.php/Fstab](https://wiki.archlinux.org/index.php/Fstab).
*Example (Linux)*
```json
"mounts": {
"proc": {
"type": "proc",
"source": "proc",
"options": []
},
"dev": {
"type": "tmpfs",
"source": "tmpfs",
"options": ["nosuid","strictatime","mode=755","size=65536k"]
},
"devpts": {
"type": "devpts",
"source": "devpts",
"options": ["nosuid","noexec","newinstance","ptmxmode=0666","mode=0620","gid=5"]
},
"data": {
"type": "bind",
"source": "/volumes/testing",
"options": ["rbind","rw"]
}
}
```
*Example (Windows)*
```json
"mounts": {
"myfancymountpoint": {
"type": "ntfs",
"source": "\\\\?\\Volume\\{2eca078d-5cbc-43d3-aff8-7e8511f60d0e}\\",
"options": []
}
}
```
See links for details about [mountvol](http://ss64.com/nt/mountvol.html) and [SetVolumeMountPoint](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365561(v=vs.85).aspx) in Windows.

View File

@ -0,0 +1,6 @@
## File descriptors
By default, only the `stdin`, `stdout` and `stderr` file descriptors are kept open for the application by the runtime.
The runtime may pass additional file descriptors to the application to support features such as [socket activation](http://0pointer.de/blog/projects/socket-activated-containers.html).
Some of the file descriptors may be redirected to `/dev/null` even though they are open.

View File

@ -1,5 +1,34 @@
# Runtime and Lifecycle
## State
The runtime state for a container is persisted on disk so that external tools can consume and act on this information.
The runtime state is stored in a JSON encoded file.
It is recommended that this file is stored in a temporary filesystem so that it can be removed on a system reboot.
On Linux based systems the state information should be stored in `/run/opencontainer/containers`.
The directory structure for a container is `/run/opencontainer/containers/<containerID>/state.json`.
By providing a default location that container state is stored external applications can find all containers running on a system.
* **version** (string) Version of the OCI specification used when creating the container.
* **id** (string) ID is the container's ID.
* **pid** (int) Pid is the ID of the main process within the container.
* **root** (string) Root is the path to the container's bundle directory.
The ID is provided in the state because hooks will be executed with the state as the payload.
This allows the hook to perform clean and teardown logic after the runtime destroys its own state.
The root directory to the bundle is provided in the state so that consumers can find the container's configuration and rootfs where it is located on the host's filesystem.
*Example*
```json
{
"id": "oc-container",
"pid": 4422,
"root": "/containers/redis"
}
```
## Lifecycle
### Create
@ -8,21 +37,26 @@ Creates the container: file system, namespaces, cgroups, capabilities.
### Start (process)
Runs a process in a container. Can be invoked several times.
Runs a process in a container.
Can be invoked several times.
### Stop (process)
Not sure we need that from runc cli. Process is killed from the outside.
Not sure we need that from runc cli.
Process is killed from the outside.
This event needs to be captured by runc to run onstop event handlers.
## Hooks
Hooks allow one to run code before/after various lifecycle events of the container.
Hooks MUST be called in the listed order.
The state of the container is passed to the hooks over stdin, so the hooks could get the information they need to do their work.
Hook paths are absolute and are executed from the host's filesystem.
### Pre-start
The pre-start hooks are called after the container process is spawned, but before the user supplied command is executed.
They are called after the container namespaces are created on Linux, so they provide an opportunity to customize the container.
In Linux, for e.g., the network namespace could be configured in this hook.
@ -30,7 +64,9 @@ In Linux, for e.g., the network namespace could be configured in this hook.
If a hook returns a non-zero exit code, then an error including the exit code and the stderr is returned to the caller and the container is torn down.
### Post-stop
The post-stop hooks are called after the container process is stopped. Cleanup or debugging could be performed in such a hook.
The post-stop hooks are called after the container process is stopped.
Cleanup or debugging could be performed in such a hook.
If a hook returns a non-zero exit code, then an error is logged and the remaining hooks are executed.
*Example*
@ -56,4 +92,5 @@ If a hook returns a non-zero exit code, then an error is logged and the remainin
}
```
`path` is required for a hook. `args` and `env` are optional.
`path` is required for a hook.
`args` and `env` are optional.

View File

@ -0,0 +1,38 @@
package specs
// RuntimeSpec is the generic runtime state information on a running container
type RuntimeSpec struct {
// Mounts is a mapping of names to mount configurations.
// Which mounts will be mounted and where should be chosen with MountPoints
// in Spec.
Mounts map[string]Mount `json:"mounts"`
// Hooks are the commands run at various lifecycle events of the container.
Hooks Hooks `json:"hooks"`
}
// Hook specifies a command that is run at a particular event in the lifecycle of a container
type Hook struct {
Path string `json:"path"`
Args []string `json:"args"`
Env []string `json:"env"`
}
// Hooks for container setup and teardown
type Hooks struct {
// Prestart is a list of hooks to be run before the container process is executed.
// On Linux, they are run after the container namespaces are created.
Prestart []Hook `json:"prestart"`
// Poststop is a list of hooks to be run after the container process exits.
Poststop []Hook `json:"poststop"`
}
// Mount specifies a mount for a container
type Mount struct {
// Type specifies the mount kind.
Type string `json:"type"`
// Source specifies the source path of the mount. In the case of bind mounts on
// linux based systems this would be the file on the host.
Source string `json:"source"`
// Options are fstab style mount options.
Options []string `json:"options"`
}

View File

@ -1,33 +1,36 @@
// +build linux
package specs
import "os"
// LinuxSpec is the full specification for Linux containers
type LinuxSpec struct {
Spec
// Linux is platform specific configuration for Linux based containers
Linux Linux `json:"linux"`
// LinuxStateDirectory holds the container's state information
const LinuxStateDirectory = "/run/opencontainer/containers"
// LinuxRuntimeSpec is the full specification for linux containers.
type LinuxRuntimeSpec struct {
RuntimeSpec
// LinuxRuntime is platform specific configuration for linux based containers.
Linux LinuxRuntime `json:"linux"`
}
// Linux contains platform specific configuration for Linux based containers
type Linux struct {
// UIDMapping specifies user mappings for supporting user namespaces on Linux
// LinuxRuntime hosts the Linux-only runtime information
type LinuxRuntime struct {
// UIDMapping specifies user mappings for supporting user namespaces on linux.
UIDMappings []IDMapping `json:"uidMappings"`
// GIDMapping specifies group mappings for supporting user namespaces on Linux
// GIDMapping specifies group mappings for supporting user namespaces on linux.
GIDMappings []IDMapping `json:"gidMappings"`
// Rlimits specifies rlimit options to apply to the container's process
// Rlimits specifies rlimit options to apply to the container's process.
Rlimits []Rlimit `json:"rlimits"`
// Sysctl are a set of key value pairs that are set for the container on start
Sysctl map[string]string `json:"sysctl"`
// Resources contain cgroup information for handling resource constraints
// for the container
Resources Resources `json:"resources"`
Resources *Resources `json:"resources"`
// CgroupsPath specifies the path to cgroups that are created and/or joined by the container.
// The path is expected to be relative to the cgroups mountpoint.
// If resources are specified, the cgroups at CgroupsPath will be updated based on resources.
CgroupsPath string `json:"cgroupsPath"`
// Namespaces contains the namespaces that are created and/or joined by the container
Namespaces []Namespace `json:"namespaces"`
// Capabilities are Linux capabilities that are kept for the container
Capabilities []string `json:"capabilities"`
// Devices are a list of device nodes that are created and enabled for the container
Devices []Device `json:"devices"`
// ApparmorProfile specified the apparmor profile for the container.
@ -40,26 +43,33 @@ type Linux struct {
RootfsPropagation string `json:"rootfsPropagation"`
}
// User specifies Linux specific user and group information for the container's
// main process
type User struct {
// Uid is the user id
UID int32 `json:"uid"`
// Gid is the group id
GID int32 `json:"gid"`
// AdditionalGids are additional group ids set for the container's process
AdditionalGids []int32 `json:"additionalGids"`
}
// Namespace is the configuration for a Linux namespace
// Namespace is the configuration for a linux namespace
type Namespace struct {
// Type is the type of Linux namespace
Type string `json:"type"`
Type NamespaceType `json:"type"`
// Path is a path to an existing namespace persisted on disk that can be joined
// and is of the same type
Path string `json:"path"`
}
// NamespaceType is one of the linux namespaces
type NamespaceType string
const (
// PIDNamespace for isolating process IDs
PIDNamespace NamespaceType = "pid"
// NetworkNamespace for isolating network devices, stacks, ports, etc
NetworkNamespace = "network"
// MountNamespace for isolating mount points
MountNamespace = "mount"
// IPCNamespace for isolating System V IPC, POSIX message queues
IPCNamespace = "ipc"
// UTSNamespace for isolating hostname and NIS domain name
UTSNamespace = "uts"
// UserNamespace for isolating user and group IDs
UserNamespace = "user"
)
// IDMapping specifies UID/GID mappings
type IDMapping struct {
// HostID is the UID/GID of the host user or group
@ -73,7 +83,7 @@ type IDMapping struct {
// Rlimit type and restrictions
type Rlimit struct {
// Type of the rlimit to set
Type int `json:"type"`
Type string `json:"type"`
// Hard is the hard limit for the specified type
Hard uint64 `json:"hard"`
// Soft is the soft limit for the specified type
@ -142,6 +152,12 @@ type CPU struct {
Mems string `json:"mems"`
}
// Pids for Linux cgroup 'pids' resource management (Linux 4.3)
type Pids struct {
// Maximum number of PIDs. A value < 0 implies "no limit".
Limit int64 `json:"limit"`
}
// Network identification and priority configuration
type Network struct {
// Set class identifier for container's network packets
@ -158,6 +174,8 @@ type Resources struct {
Memory Memory `json:"memory"`
// CPU resource restriction configuration
CPU CPU `json:"cpu"`
// Task resource restriction configuration.
Pids Pids `json:"pids"`
// BlockIO restriction configuration
BlockIO BlockIO `json:"blockIO"`
// Hugetlb limit (in bytes)
@ -166,11 +184,12 @@ type Resources struct {
Network Network `json:"network"`
}
// Device represents the information on a Linux special device file
type Device struct {
// Device type, block, char, etc.
Type rune `json:"type"`
// Path to the device.
Path string `json:"path"`
// Device type, block, char, etc.
Type rune `json:"type"`
// Major is the device's major number.
Major int64 `json:"major"`
// Minor is the device's minor number.

View File

@ -1,4 +1,15 @@
package specs
import "fmt"
const (
// VersionMajor is for an API incompatible changes
VersionMajor = 0
// VersionMinor is for functionality in a backwards-compatible manner
VersionMinor = 1
// VersionPatch is for backwards-compatible bug fixes
VersionPatch = 0
)
// Version is the specification version that the package types support.
const Version = "pre-draft"
var Version = fmt.Sprintf("%d.%d.%d", VersionMajor, VersionMinor, VersionPatch)

511
README.md
View File

@ -4,16 +4,17 @@
## State of the project
Currently `runc` is an implementation of the OCF specification. We are currently sprinting
Currently `runc` is an implementation of the OCI specification. We are currently sprinting
to have a v1 of the spec out within a quick timeframe of a few weeks, ~July 2015,
so the `runc` config format will be constantly changing until
the spec is finalized. However, we encourage you to try out the tool and give feedback.
### OCF
How does `runc` integrate with the Open Container Format? `runc` depends on the types
specified in the [specs](https://github.com/opencontainers/specs) repository. Whenever
the specification is updated and ready to be versioned `runc` will update it's dependency
How does `runc` integrate with the Open Container Initiative Specification?
`runc` depends on the types specified in the
[specs](https://github.com/opencontainers/specs) repository. Whenever the
specification is updated and ready to be versioned `runc` will update its dependency
on the specs repository and support the update spec.
### Building:
@ -56,229 +57,293 @@ PID USER COMMAND
/ $
```
Or you can specify the path to a JSON configuration file:
```bash
runc start config.json
/ $ ps
PID USER COMMAND
1 daemon sh
5 daemon sh
/ $
```
Note: the use of the `start` command is required when specifying a
configuration file.
### OCI Container JSON Format:
### OCF Container JSON Format:
Below is a sample `config.json` configuration file. It assumes that
Below are sample `config.json` and `runtime.json` configuration files. It assumes that
the file-system is found in a directory called `rootfs` and there is a
user with uid and gid of `0` defined within that file-system.
`config.json`:
```json
{
"version": "pre-draft",
"platform": {
"os": "linux",
"arch": "amd64"
},
"process": {
"terminal": true,
"user": {
"uid": 0,
"gid": 0,
"additionalGids": null
},
"args": [
"sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": ""
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "shell",
"mounts": [
{
"type": "proc",
"source": "proc",
"destination": "/proc",
"options": ""
},
{
"type": "tmpfs",
"source": "tmpfs",
"destination": "/dev",
"options": "nosuid,strictatime,mode=755,size=65536k"
},
{
"type": "devpts",
"source": "devpts",
"destination": "/dev/pts",
"options": "nosuid,noexec,newinstance,ptmxmode=0666,mode=0620,gid=5"
},
{
"type": "tmpfs",
"source": "shm",
"destination": "/dev/shm",
"options": "nosuid,noexec,nodev,mode=1777,size=65536k"
},
{
"type": "mqueue",
"source": "mqueue",
"destination": "/dev/mqueue",
"options": "nosuid,noexec,nodev"
},
{
"type": "sysfs",
"source": "sysfs",
"destination": "/sys",
"options": "nosuid,noexec,nodev"
},
{
"type": "cgroup",
"source": "cgroup",
"destination": "/sys/fs/cgroup",
"options": "nosuid,noexec,nodev,relatime,ro"
}
],
"linux": {
"uidMapping": null,
"gidMapping": null,
"rlimits": [
{
"type": 7,
"hard": 1024,
"soft": 1024
}
],
"systemProperties": null,
"resources": {
"disableOOMKiller": false,
"memory": {
"limit": 0,
"reservation": 0,
"swap": 0,
"kernel": 0,
"swappiness": -1
},
"cpu": {
"shares": 0,
"quota": 0,
"period": 0,
"realtimeRuntime": 0,
"realtimePeriod": 0,
"cpus": "",
"mems": ""
},
"blockIO": {
"blkioWeight": 0,
"blkioWeightDevice": "",
"blkioThrottleReadBpsDevice": "",
"blkioThrottleWriteBpsDevice": "",
"blkioThrottleReadIopsDevice": "",
"blkioThrottleWriteIopsDevice": ""
},
"hugepageLimits": null,
"network": {
"classId": "",
"priorities": null
}
},
"namespaces": [
{
"type": "pid",
"path": ""
},
{
"type": "network",
"path": ""
},
{
"type": "ipc",
"path": ""
},
{
"type": "uts",
"path": ""
},
{
"type": "mount",
"path": ""
}
],
"capabilities": [
"AUDIT_WRITE",
"KILL",
"NET_BIND_SERVICE"
],
"devices": [
{
"type": 99,
"path": "/dev/null",
"major": 1,
"minor": 3,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"type": 99,
"path": "/dev/random",
"major": 1,
"minor": 8,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"type": 99,
"path": "/dev/full",
"major": 1,
"minor": 7,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"type": 99,
"path": "/dev/tty",
"major": 5,
"minor": 0,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"type": 99,
"path": "/dev/zero",
"major": 1,
"minor": 5,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"type": 99,
"path": "/dev/urandom",
"major": 1,
"minor": 9,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
}
],
}
"version": "pre-draft",
"platform": {
"os": "linux",
"arch": "amd64"
},
"process": {
"terminal": true,
"user": {
"uid": 0,
"gid": 0,
"additionalGids": null
},
"args": [
"sh"
],
"env": [
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm"
],
"cwd": ""
},
"root": {
"path": "rootfs",
"readonly": true
},
"hostname": "shell",
"mounts": [
{
"name": "proc",
"path": "/proc"
},
{
"name": "dev",
"path": "/dev"
},
{
"name": "devpts",
"path": "/dev/pts"
},
{
"name": "shm",
"path": "/dev/shm"
},
{
"name": "mqueue",
"path": "/dev/mqueue"
},
{
"name": "sysfs",
"path": "/sys"
},
{
"name": "cgroup",
"path": "/sys/fs/cgroup"
}
],
"linux": {
"capabilities": [
"AUDIT_WRITE",
"KILL",
"NET_BIND_SERVICE"
],
"rootfsPropagation": ""
}
}
```
`runtime.json`:
```json
{
"mounts": {
"proc": {
"type": "proc",
"source": "proc",
"options": null
},
"dev": {
"type": "tmpfs",
"source": "tmpfs",
"options": [
"nosuid",
"strictatime",
"mode=755",
"size=65536k"
]
},
"devpts": {
"type": "devpts",
"source": "devpts",
"options": [
"nosuid",
"noexec",
"newinstance",
"ptmxmode=0666",
"mode=0620",
"gid=5"
]
},
"shm": {
"type": "tmpfs",
"source": "shm",
"options": [
"nosuid",
"noexec",
"nodev",
"mode=1777",
"size=65536k"
]
},
"mqueue": {
"type": "mqueue",
"source": "mqueue",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
"sysfs": {
"type": "sysfs",
"source": "sysfs",
"options": [
"nosuid",
"noexec",
"nodev"
]
},
"cgroup": {
"type": "cgroup",
"source": "cgroup",
"options": [
"nosuid",
"noexec",
"nodev",
"relatime",
"ro"
]
}
},
"hooks": {
"prestart": null,
"poststop": null
},
"linux": {
"uidMappings": null,
"gidMappings": null,
"rlimits": [
{
"type": 7,
"hard": 1024,
"soft": 1024
}
],
"sysctl": null,
"resources": {
"disableOOMKiller": false,
"memory": {
"limit": 0,
"reservation": 0,
"swap": 0,
"kernel": 0,
"swappiness": -1
},
"cpu": {
"shares": 0,
"quota": 0,
"period": 0,
"realtimeRuntime": 0,
"realtimePeriod": 0,
"cpus": "",
"mems": ""
},
"blockIO": {
"blkioWeight": 0,
"blkioWeightDevice": "",
"blkioThrottleReadBpsDevice": "",
"blkioThrottleWriteBpsDevice": "",
"blkioThrottleReadIopsDevice": "",
"blkioThrottleWriteIopsDevice": ""
},
"hugepageLimits": null,
"network": {
"classId": "",
"priorities": null
}
},
"namespaces": [
{
"type": "pid",
"path": ""
},
{
"type": "network",
"path": ""
},
{
"type": "ipc",
"path": ""
},
{
"type": "uts",
"path": ""
},
{
"type": "mount",
"path": ""
}
],
"devices": [
{
"path": "/dev/null",
"type": 99,
"major": 1,
"minor": 3,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/random",
"type": 99,
"major": 1,
"minor": 8,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/full",
"type": 99,
"major": 1,
"minor": 7,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/tty",
"type": 99,
"major": 5,
"minor": 0,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/zero",
"type": 99,
"major": 1,
"minor": 5,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
},
{
"path": "/dev/urandom",
"type": 99,
"major": 1,
"minor": 9,
"permissions": "rwm",
"fileMode": 438,
"uid": 0,
"gid": 0
}
],
"apparmorProfile": "",
"selinuxProcessLabel": "",
"seccomp": {
"defaultAction": "SCMP_ACT_ALLOW",
"syscalls": []
},
"rootfsPropagation": ""
}
}
```
@ -295,8 +360,8 @@ To test using Docker's `busybox` image follow these steps:
mkdir rootfs
tar -C rootfs -xf busybox.tar
```
* Create a file called `config.json` using the example from above. You can also
generate a spec using `runc spec`, redirecting the output into `config.json`
* Create `config.json` and `runtime.json` using the example from above. You can also
generate a spec using `runc spec`, which will create those files for you.
* Execute `runc start` and you should be placed into a shell where you can run `ps`:
```
$ runc start

View File

@ -5,10 +5,11 @@ import (
"github.com/Sirupsen/logrus"
"github.com/codegangsta/cli"
"github.com/opencontainers/specs"
)
const (
version = "0.2"
version = "0.3"
usage = `Open Container Initiative runtime
runc is a command line client for running applications packaged according to
@ -54,7 +55,7 @@ func main() {
},
cli.StringFlag{
Name: "root",
Value: "/run/oci",
Value: specs.LinuxStateDirectory,
Usage: "root directory for storage of container state (this should be located in tmpfs)",
},
cli.StringFlag{
@ -87,8 +88,6 @@ func main() {
}
return nil
}
// Default to 'start' if no command is specified
app.Action = startCommand.Action
if err := app.Run(os.Args); err != nil {
logrus.Fatal(err)
}

View File

@ -18,23 +18,53 @@ var restoreCommand = cli.Command{
Name: "restore",
Usage: "restore a container from a previous checkpoint",
Flags: []cli.Flag{
cli.StringFlag{Name: "image-path", Value: "", Usage: "path to criu image files for restoring"},
cli.StringFlag{Name: "work-path", Value: "", Usage: "path for saving work files and logs"},
cli.BoolFlag{Name: "tcp-established", Usage: "allow open tcp connections"},
cli.BoolFlag{Name: "ext-unix-sk", Usage: "allow external unix sockets"},
cli.BoolFlag{Name: "shell-job", Usage: "allow shell jobs"},
cli.BoolFlag{Name: "file-locks", Usage: "handle file locks, for safety"},
cli.StringFlag{
Name: "image-path",
Value: "",
Usage: "path to criu image files for restoring",
},
cli.StringFlag{
Name: "work-path",
Value: "",
Usage: "path for saving work files and logs",
},
cli.BoolFlag{
Name: "tcp-established",
Usage: "allow open tcp connections",
},
cli.BoolFlag{
Name: "ext-unix-sk",
Usage: "allow external unix sockets",
},
cli.BoolFlag{
Name: "shell-job",
Usage: "allow shell jobs",
},
cli.BoolFlag{
Name: "file-locks",
Usage: "handle file locks, for safety",
},
cli.StringFlag{
Name: "config-file, c",
Value: "config.json",
Usage: "path to spec file for writing",
},
cli.StringFlag{
Name: "runtime-file, r",
Value: "runtime.json",
Usage: "path for runtime file for writing",
},
},
Action: func(context *cli.Context) {
imagePath := context.String("image-path")
if imagePath == "" {
imagePath = getDefaultImagePath(context)
}
spec, err := loadSpec(context.Args().First())
spec, rspec, err := loadSpec(context.String("config-file"), context.String("runtime-file"))
if err != nil {
fatal(err)
}
config, err := createLibcontainerConfig(context.GlobalString("id"), spec)
config, err := createLibcontainerConfig(context.GlobalString("id"), spec, rspec)
if err != nil {
fatal(err)
}

49
rlimit_linux.go Normal file
View File

@ -0,0 +1,49 @@
package main
import "fmt"
const (
RLIMIT_CPU = iota // CPU time in sec
RLIMIT_FSIZE // Maximum filesize
RLIMIT_DATA // max data size
RLIMIT_STACK // max stack size
RLIMIT_CORE // max core file size
RLIMIT_RSS // max resident set size
RLIMIT_NPROC // max number of processes
RLIMIT_NOFILE // max number of open files
RLIMIT_MEMLOCK // max locked-in-memory address space
RLIMIT_AS // address space limit
RLIMIT_LOCKS // maximum file locks held
RLIMIT_SIGPENDING // max number of pending signals
RLIMIT_MSGQUEUE // maximum bytes in POSIX mqueues
RLIMIT_NICE // max nice prio allowed to raise to
RLIMIT_RTPRIO // maximum realtime priority
RLIMIT_RTTIME // timeout for RT tasks in us
)
var rlimitMap = map[string]int{
"RLIMIT_CPU": RLIMIT_CPU,
"RLIMIT_FSIZE": RLIMIT_FSIZE,
"RLIMIT_DATA": RLIMIT_DATA,
"RLIMIT_STACK": RLIMIT_STACK,
"RLIMIT_CORE": RLIMIT_CORE,
"RLIMIT_RSS": RLIMIT_RSS,
"RLIMIT_NPROC": RLIMIT_NPROC,
"RLIMIT_NOFILE": RLIMIT_NOFILE,
"RLIMIT_MEMLOCK": RLIMIT_MEMLOCK,
"RLIMIT_AS": RLIMIT_AS,
"RLIMIT_LOCKS": RLIMIT_LOCKS,
"RLIMIT_SGPENDING": RLIMIT_SIGPENDING,
"RLIMIT_MSGQUEUE": RLIMIT_MSGQUEUE,
"RLIMIT_NICE": RLIMIT_NICE,
"RLIMIT_RTPRIO": RLIMIT_RTPRIO,
"RLIMIT_RTTIME": RLIMIT_RTTIME,
}
func strToRlimit(key string) (int, error) {
rl, ok := rlimitMap[key]
if !ok {
return 0, fmt.Errorf("Wrong rlimit value: %s", key)
}
return rl, nil
}

238
spec.go
View File

@ -5,6 +5,7 @@ package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"os"
"path/filepath"
"runtime"
@ -22,6 +23,10 @@ import (
var specCommand = cli.Command{
Name: "spec",
Usage: "create a new specification file",
Flags: []cli.Flag{
cli.StringFlag{Name: "config-file, c", Value: "config.json", Usage: "path to spec file for writing"},
cli.StringFlag{Name: "runtime-file, r", Value: "runtime.json", Usage: "path to runtime file for writing"},
},
Action: func(context *cli.Context) {
spec := specs.LinuxSpec{
Spec: specs.Spec{
@ -46,52 +51,86 @@ var specCommand = cli.Command{
},
},
Hostname: "shell",
Mounts: []specs.Mount{
Mounts: []specs.MountPoint{
{
Type: "proc",
Source: "proc",
Destination: "/proc",
Options: "",
Name: "proc",
Path: "/proc",
},
{
Type: "tmpfs",
Source: "tmpfs",
Destination: "/dev",
Options: "nosuid,strictatime,mode=755,size=65536k",
Name: "dev",
Path: "/dev",
},
{
Type: "devpts",
Source: "devpts",
Destination: "/dev/pts",
Options: "nosuid,noexec,newinstance,ptmxmode=0666,mode=0620,gid=5",
Name: "devpts",
Path: "/dev/pts",
},
{
Type: "tmpfs",
Source: "shm",
Destination: "/dev/shm",
Options: "nosuid,noexec,nodev,mode=1777,size=65536k",
Name: "shm",
Path: "/dev/shm",
},
{
Type: "mqueue",
Source: "mqueue",
Destination: "/dev/mqueue",
Options: "nosuid,noexec,nodev",
Name: "mqueue",
Path: "/dev/mqueue",
},
{
Type: "sysfs",
Source: "sysfs",
Destination: "/sys",
Options: "nosuid,noexec,nodev",
Name: "sysfs",
Path: "/sys",
},
{
Type: "cgroup",
Source: "cgroup",
Destination: "/sys/fs/cgroup",
Options: "nosuid,noexec,nodev,relatime,ro",
Name: "cgroup",
Path: "/sys/fs/cgroup",
},
},
},
Linux: specs.Linux{
Capabilities: []string{
"AUDIT_WRITE",
"KILL",
"NET_BIND_SERVICE",
},
},
}
rspec := specs.LinuxRuntimeSpec{
RuntimeSpec: specs.RuntimeSpec{
Mounts: map[string]specs.Mount{
"proc": {
Type: "proc",
Source: "proc",
Options: nil,
},
"dev": {
Type: "tmpfs",
Source: "tmpfs",
Options: []string{"nosuid", "strictatime", "mode=755", "size=65536k"},
},
"devpts": {
Type: "devpts",
Source: "devpts",
Options: []string{"nosuid", "noexec", "newinstance", "ptmxmode=0666", "mode=0620", "gid=5"},
},
"shm": {
Type: "tmpfs",
Source: "shm",
Options: []string{"nosuid", "noexec", "nodev", "mode=1777", "size=65536k"},
},
"mqueue": {
Type: "mqueue",
Source: "mqueue",
Options: []string{"nosuid", "noexec", "nodev"},
},
"sysfs": {
Type: "sysfs",
Source: "sysfs",
Options: []string{"nosuid", "noexec", "nodev"},
},
"cgroup": {
Type: "cgroup",
Source: "cgroup",
Options: []string{"nosuid", "noexec", "nodev", "relatime", "ro"},
},
},
},
Linux: specs.LinuxRuntime{
Namespaces: []specs.Namespace{
{
Type: "pid",
@ -109,19 +148,13 @@ var specCommand = cli.Command{
Type: "mount",
},
},
Capabilities: []string{
"AUDIT_WRITE",
"KILL",
"NET_BIND_SERVICE",
},
Rlimits: []specs.Rlimit{
{
Type: syscall.RLIMIT_NOFILE,
Type: "RLIMIT_NOFILE",
Hard: uint64(1024),
Soft: uint64(1024),
},
},
Devices: []specs.Device{
{
Type: 'c',
@ -184,7 +217,7 @@ var specCommand = cli.Command{
GID: 0,
},
},
Resources: specs.Resources{
Resources: &specs.Resources{
Memory: specs.Memory{
Swappiness: -1,
},
@ -195,42 +228,75 @@ var specCommand = cli.Command{
},
},
}
checkNoFile := func(name string) error {
_, err := os.Stat(name)
if err == nil {
return fmt.Errorf("File %s exists. Remove it first", name)
}
if !os.IsNotExist(err) {
return err
}
return nil
}
cName := context.String("config-file")
rName := context.String("runtime-file")
if err := checkNoFile(cName); err != nil {
logrus.Fatal(err)
}
if err := checkNoFile(rName); err != nil {
logrus.Fatal(err)
}
data, err := json.MarshalIndent(&spec, "", "\t")
if err != nil {
logrus.Fatal(err)
}
fmt.Printf("%s", data)
if err := ioutil.WriteFile(cName, data, 0666); err != nil {
logrus.Fatal(err)
}
rdata, err := json.MarshalIndent(&rspec, "", "\t")
if err != nil {
logrus.Fatal(err)
}
if err := ioutil.WriteFile(rName, rdata, 0666); err != nil {
logrus.Fatal(err)
}
},
}
var namespaceMapping = map[string]configs.NamespaceType{
"pid": configs.NEWPID,
"network": configs.NEWNET,
"mount": configs.NEWNS,
"user": configs.NEWUSER,
"ipc": configs.NEWIPC,
"uts": configs.NEWUTS,
var namespaceMapping = map[specs.NamespaceType]configs.NamespaceType{
specs.PIDNamespace: configs.NEWPID,
specs.NetworkNamespace: configs.NEWNET,
specs.MountNamespace: configs.NEWNS,
specs.UserNamespace: configs.NEWUSER,
specs.IPCNamespace: configs.NEWIPC,
specs.UTSNamespace: configs.NEWUTS,
}
// loadSpec loads the specification from the provided path.
// If the path is empty then the default path will be "config.json"
func loadSpec(path string) (*specs.LinuxSpec, error) {
if path == "" {
path = "config.json"
}
f, err := os.Open(path)
func loadSpec(cPath, rPath string) (spec *specs.LinuxSpec, rspec *specs.LinuxRuntimeSpec, err error) {
cf, err := os.Open(cPath)
if err != nil {
if os.IsNotExist(err) {
return nil, fmt.Errorf("JSON specification file for %s not found", path)
return nil, nil, fmt.Errorf("JSON specification file at %s not found", cPath)
}
return nil, err
return
}
defer f.Close()
var s *specs.LinuxSpec
if err := json.NewDecoder(f).Decode(&s); err != nil {
return nil, err
rf, err := os.Open(rPath)
if err != nil {
if os.IsNotExist(err) {
return nil, nil, fmt.Errorf("JSON runtime config file at %s not found", rPath)
}
return
}
return s, checkSpecVersion(s)
defer rf.Close()
if err = json.NewDecoder(cf).Decode(&spec); err != nil {
return
}
if err = json.NewDecoder(rf).Decode(&rspec); err != nil {
return
}
return spec, rspec, checkSpecVersion(spec)
}
// checkSpecVersion makes sure that the spec version matches runc's while we are in the initial
@ -242,7 +308,7 @@ func checkSpecVersion(s *specs.LinuxSpec) error {
return nil
}
func createLibcontainerConfig(cgroupName string, spec *specs.LinuxSpec) (*configs.Config, error) {
func createLibcontainerConfig(cgroupName string, spec *specs.LinuxSpec, rspec *specs.LinuxRuntimeSpec) (*configs.Config, error) {
cwd, err := os.Getwd()
if err != nil {
return nil, err
@ -258,7 +324,7 @@ func createLibcontainerConfig(cgroupName string, spec *specs.LinuxSpec) (*config
Hostname: spec.Hostname,
Privatefs: true,
}
for _, ns := range spec.Linux.Namespaces {
for _, ns := range rspec.Linux.Namespaces {
t, exists := namespaceMapping[ns.Type]
if !exists {
return nil, fmt.Errorf("namespace %q does not exist", ns)
@ -272,19 +338,27 @@ func createLibcontainerConfig(cgroupName string, spec *specs.LinuxSpec) (*config
},
}
}
for _, m := range spec.Mounts {
config.Mounts = append(config.Mounts, createLibcontainerMount(cwd, m))
for _, mp := range spec.Mounts {
m, ok := rspec.Mounts[mp.Name]
if !ok {
return nil, fmt.Errorf("Mount with Name %q not found in runtime config", mp.Name)
}
config.Mounts = append(config.Mounts, createLibcontainerMount(cwd, mp.Path, m))
}
if err := createDevices(spec, config); err != nil {
if err := createDevices(rspec, config); err != nil {
return nil, err
}
if err := setupUserNamespace(spec, config); err != nil {
if err := setupUserNamespace(rspec, config); err != nil {
return nil, err
}
for _, rlimit := range spec.Linux.Rlimits {
config.Rlimits = append(config.Rlimits, createLibContainerRlimit(rlimit))
for _, rlimit := range rspec.Linux.Rlimits {
rl, err := createLibContainerRlimit(rlimit)
if err != nil {
return nil, err
}
config.Rlimits = append(config.Rlimits, rl)
}
c, err := createCgroupConfig(cgroupName, spec, config.Devices)
c, err := createCgroupConfig(cgroupName, rspec, config.Devices)
if err != nil {
return nil, err
}
@ -298,18 +372,18 @@ func createLibcontainerConfig(cgroupName string, spec *specs.LinuxSpec) (*config
"/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus",
}
}
seccomp, err := setupSeccomp(&spec.Linux.Seccomp)
seccomp, err := setupSeccomp(&rspec.Linux.Seccomp)
if err != nil {
return nil, err
}
config.Seccomp = seccomp
config.Sysctl = spec.Linux.Sysctl
config.ProcessLabel = spec.Linux.SelinuxProcessLabel
config.AppArmorProfile = spec.Linux.ApparmorProfile
config.Sysctl = rspec.Linux.Sysctl
config.ProcessLabel = rspec.Linux.SelinuxProcessLabel
config.AppArmorProfile = rspec.Linux.ApparmorProfile
return config, nil
}
func createLibcontainerMount(cwd string, m specs.Mount) *configs.Mount {
func createLibcontainerMount(cwd, dest string, m specs.Mount) *configs.Mount {
flags, data := parseMountOptions(m.Options)
source := m.Source
if m.Type == "bind" {
@ -320,13 +394,13 @@ func createLibcontainerMount(cwd string, m specs.Mount) *configs.Mount {
return &configs.Mount{
Device: m.Type,
Source: source,
Destination: m.Destination,
Destination: dest,
Data: data,
Flags: flags,
}
}
func createCgroupConfig(name string, spec *specs.LinuxSpec, devices []*configs.Device) (*configs.Cgroup, error) {
func createCgroupConfig(name string, spec *specs.LinuxRuntimeSpec, devices []*configs.Device) (*configs.Cgroup, error) {
myCgroupPath, err := cgroups.GetThisCgroupDir("devices")
if err != nil {
return nil, err
@ -372,7 +446,7 @@ func createCgroupConfig(name string, spec *specs.LinuxSpec, devices []*configs.D
return c, nil
}
func createDevices(spec *specs.LinuxSpec, config *configs.Config) error {
func createDevices(spec *specs.LinuxRuntimeSpec, config *configs.Config) error {
for _, d := range spec.Linux.Devices {
device := &configs.Device{
Type: d.Type,
@ -397,7 +471,7 @@ func setReadonly(config *configs.Config) {
}
}
func setupUserNamespace(spec *specs.LinuxSpec, config *configs.Config) error {
func setupUserNamespace(spec *specs.LinuxRuntimeSpec, config *configs.Config) error {
if len(spec.Linux.UIDMappings) == 0 {
return nil
}
@ -430,17 +504,21 @@ func setupUserNamespace(spec *specs.LinuxSpec, config *configs.Config) error {
return nil
}
func createLibContainerRlimit(rlimit specs.Rlimit) configs.Rlimit {
func createLibContainerRlimit(rlimit specs.Rlimit) (configs.Rlimit, error) {
rl, err := strToRlimit(rlimit.Type)
if err != nil {
return configs.Rlimit{}, err
}
return configs.Rlimit{
Type: int(rlimit.Type),
Type: rl,
Hard: uint64(rlimit.Hard),
Soft: uint64(rlimit.Soft),
}
}, nil
}
// parseMountOptions parses the string and returns the flags and any mount data that
// it contains.
func parseMountOptions(options string) (int, string) {
func parseMountOptions(options []string) (int, string) {
var (
flag int
data []string
@ -483,7 +561,7 @@ func parseMountOptions(options string) (int, string) {
"sync": {false, syscall.MS_SYNCHRONOUS},
"unbindable": {false, syscall.MS_UNBINDABLE},
}
for _, o := range strings.Split(options, ",") {
for _, o := range options {
// If the option does not exist in the flags table or the flag
// is not supported on the platform,
// then it is a data value for a specific fs type

View File

@ -20,15 +20,27 @@ const SD_LISTEN_FDS_START = 3
var startCommand = cli.Command{
Name: "start",
Usage: "create and run a container",
Flags: []cli.Flag{
cli.StringFlag{
Name: "config-file, c",
Value: "config.json",
Usage: "path to spec file for writing",
},
cli.StringFlag{
Name: "runtime-file, r",
Value: "runtime.json",
Usage: "path for runtime file for writing",
},
},
Action: func(context *cli.Context) {
spec, err := loadSpec(context.Args().First())
spec, rspec, err := loadSpec(context.String("config-file"), context.String("runtime-file"))
if err != nil {
fatal(err)
}
notifySocket := os.Getenv("NOTIFY_SOCKET")
if notifySocket != "" {
setupSdNotify(spec, notifySocket)
setupSdNotify(spec, rspec, notifySocket)
}
listenFds := os.Getenv("LISTEN_FDS")
@ -41,7 +53,7 @@ var startCommand = cli.Command{
if os.Geteuid() != 0 {
logrus.Fatal("runc should be run as root")
}
status, err := startContainer(context, spec)
status, err := startContainer(context, spec, rspec)
if err != nil {
logrus.Fatalf("Container start failed: %v", err)
}
@ -63,8 +75,8 @@ func init() {
}
}
func startContainer(context *cli.Context, spec *specs.LinuxSpec) (int, error) {
config, err := createLibcontainerConfig(context.GlobalString("id"), spec)
func startContainer(context *cli.Context, spec *specs.LinuxSpec, rspec *specs.LinuxRuntimeSpec) (int, error) {
config, err := createLibcontainerConfig(context.GlobalString("id"), spec, rspec)
if err != nil {
return -1, err
}
@ -117,9 +129,11 @@ func startContainer(context *cli.Context, spec *specs.LinuxSpec) (int, error) {
// If systemd is supporting sd_notify protocol, this function will add support
// for sd_notify protocol from within the container.
func setupSdNotify(spec *specs.LinuxSpec, notifySocket string) {
spec.Mounts = append(spec.Mounts, specs.Mount{Type: "bind", Source: notifySocket, Destination: notifySocket, Options: "bind"})
func setupSdNotify(spec *specs.LinuxSpec, rspec *specs.LinuxRuntimeSpec, notifySocket string) {
mountName := "sdNotify"
spec.Mounts = append(spec.Mounts, specs.MountPoint{Name: mountName, Path: notifySocket})
spec.Process.Env = append(spec.Process.Env, fmt.Sprintf("NOTIFY_SOCKET=%s", notifySocket))
rspec.Mounts[mountName] = specs.Mount{Type: "bind", Source: notifySocket, Options: []string{"bind"}}
}
// If systemd is supporting on-demand socket activation, this function will add support