1. The command `runc checkpoint --lazy-server --status-fd $FD` actually
accepts a file name as an $FD. Make it accept a file descriptor,
like its name implies and the documentation states.
In addition, since runc itself does not use the result of CRIU status
fd, remove the code which relays it, and pass the FD directly to CRIU.
Note 1: runc should close this file descriptor itself after passing it
to criu, otherwise whoever waits on it might wait forever.
Note 2: due to the way criu swrk consumes the fd (it reopens
/proc/$SENDER_PID/fd/$FD), runc can't close it as soon as criu swrk has
started. There is no good way to know when criu swrk has reopened the
fd, so we assume that as soon as we have received something back, the
fd is already reopened.
2. Since the meaning of --status-fd has changed, the test case using
it needs to be fixed as well.
Modify the lazy migration test to remove "sleep 2", actually waiting
for the the lazy page server to be ready.
While at it,
- remove the double fork (using shell's background process is
sufficient here);
- check the exit code for "runc checkpoint" and "criu lazy-pages";
- remove the check for no errors in dump.log after restore, as we
are already checking its exit code.
[v2: properly close status fd after spawning criu]
[v3: move close status fd to after the first read]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
With the fix in the previous commit and criu patched with support for
cgroupv2, these tests should now pass.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Same test as the first one, just with cgroupns enabled.
Since in case of cgroupv2 `runc spec` enables cgroupns,
this case was already tested by the first checkpoint test,
so skip it.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit a9e15e7e0 adds a check that stdin/out/err pipes
are restored correctly. Commit ec260653b7 copy/pastes
the same code to one more another test.
Problem is (as pointed out in commit 5369f9ade3) these tests
sometimes hang. I have also seen them fail.
Apparently, the code used to create pipes and open them to fds
is racy:
```shell
cat $fifo | cat $fifo &
pid=$!
exec 50</proc/$pid/fd/0
exec 51>/proc/$pid/fd/0
```
Since `cat | cat` is spawned asynchronously, by the time exec is used,
the second cat process (i.e. $pid) is already fork'ed but it might
not be exec'ed yet. As a result, we get this (`ls -l /proc/self/fd`):
```
lr-x------. 1 root root 64 Apr 20 02:39 50 -> /dev/pts/1
l-wx------. 1 root root 64 Apr 20 02:39 51 -> /dev/pts/1
```
or, in some cases:
```
lr-x------. 1 root root 64 Apr 20 02:45 50 -> /dev/pts/1
l-wx------. 1 root root 64 Apr 20 02:45 51 -> 'pipe:[215791]'
```
instead of expected set of pipes:
```
> lr-x------. 1 root root 64 Apr 20 02:45 50 -> 'pipe:[215791]'
> l-wx------. 1 root root 64 Apr 20 02:45 51 -> 'pipe:[215791]'
```
One possible workaround is to add `sleep 0.1` or so after cat|cat,
but it is outright ugly (besides, we already have one sleep in
the test code).
The solution is to not use any external processes to create pipes.
I admit this still looks not very comprehensible, but at least it
is easier than before, and it works.
While at it, remove code duplication, moving the setup and check
code into a pair of functions.
Finally, since the tests are working now, remove the skip.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Since all the criu tests have the same requirements,
move them to setup().
While at it, remove an obviously redundant comment.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Introduce a special case for `testcontainer` to test
for container that is not present (checkpointed), use it.
Fix one place where testcontainer was not used.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
runc in this file is actually a function that does `run runc ...`,
and `run` sets variable `$status` as the exit code, so `$status`
is what should be checked.
If calling runc directly (as in `__runc ...`), then $? is legit.
While at it, remove an obsoleted comment, and an unneeded
`ret=$?` assignment (check `$?` directly).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Run in the same environment as systemd tests.
Disable CRIU tests because:
- they all fail with cgroup v2;
- CRIU v3.14 is required and it's not yet released, and
rebuilding it from sources with patches applied (like
it is currently done in Dockerfile) is too much work.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
These tests sometimes hang, so let's skip them for now.
Tested:
$ sudo make localintegration TESTPATH='/checkpoint.bats' RUNC_USE_SYSTEMD=1
The 5 tests in this test suite will be skipped.
Signed-off-by: Filipe Brandenburger <filbranden@google.com>
This removes unnecessary lines from checkpoint.bats like:
sed -i 's;"readonly": true;"readonly": false;' config.json
and adds (and corrects) comments which are leftover from older
versions of checkpoint.bats.
Signed-off-by: Adrian Reber <areber@redhat.com>
This adds a new CRIU based checkpoint/restore test to check if
the restored container runs in the same network namespace as before.
Signed-off-by: Adrian Reber <areber@redhat.com>
Upstream renamed the feature check for lazy migration support from
'lazy_pages' to 'uffd'. The lazy migration test case was therefore
not running at all. This enables the lazy migration test case in runc
again.
The test will, however, not run in travis as the kernel is too old.
But it works again locally.
Signed-off-by: Adrian Reber <areber@redhat.com>
The lazy-pages test case is not as straight forward as the other test
cases. This is related to the fact that restoring requires a different
name if restored on the same host. During 'runc checkpoint' the
container is not destroyed before all memory pages have been transferred
to the destination and thus the same container name cannot be used.
As real world usage will rather migrate a container from one system to
another than lazy migrate a container on the same host this is only
problematic for this test case.
Another reason is that it requires starting 'runc checkpoint' and 'criu
lazy-pages' in the background as those process need to be running to
start the final restore 'runc restore'.
CRIU upstream is currently discussing to automatically start 'criu
lazy-pages' which would simplify the lazy-pages test case a bit.
The handling and checking of the background processes make the test case
not the most elegant as at one point a 'sleep 2' is required to make
sure that 'runc checkpoint' had time to do its thing before looking at
log files.
Before running the actual test criu is called in feature checking mode
to make sure lazy migration is in the test case criu enabled. If not,
the test is skipped.
Signed-off-by: Adrian Reber <areber@redhat.com>
print out errors in checkpoint/restore log when it failed similar to how we did i
in `checkpoint --pre-dump` tests
Signed-off-by: Daniel Dao <dqminh89@gmail.com>
We have two test cases with and without pre-dump. Terminals and
pre-dump features are orthogonal, so we can modify one of these test cases.
Signed-off-by: Andrei Vagin <avagin@virtuozzo.com>
This adds targets for rootless integration tests, as well as all of the
required setup in order to get the tests to run. This includes quite a
few changes, because of a lot of assumptions about things running as
root within the bats scripts (which is not true when setting up rootless
containers).
Signed-off-by: Aleksa Sarai <asarai@suse.de>
CRIU gets pre-dump to complete iterative migration.
pre-dump saves process memory info only. And it need parent-path
to specify the former memory files.
This patch add pre-dump and parent-path arguments to runc checkpoint
Signed-off-by: Deng Guangxing <dengguangxing@huawei.com>
Signed-off-by: Adrian Reber <areber@redhat.com>
This makes it much simpler to write tests, and you don't have to worry
about some of the oddness with bats.
Signed-off-by: Aleksa Sarai <asarai@suse.de>