Signed-off-by: Forest-L <lilin@yunify.com>
Forest-L 2021-01-05 17:21:58 +08:00
parent d7ef7250d4
commit 22d998e449
2 changed files with 60 additions and 23 deletions


@ -40,5 +40,5 @@ func init() {
	rootCmd.AddCommand(auditCmd)
	pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
	auditCmd.Flags().StringVarP(&config, "filename", "f", "", "Customize best practice configuration")
	auditCmd.Flags().BoolVarP(&allInformation, "all", "a", false, "Show more specific information")
	auditCmd.Flags().BoolVarP(&allInformation, "all", "A", false, "Show more specific information")
}


@ -2,40 +2,77 @@ The main purpose of this document is how to recover and eliminate the problem wh
## Node-level issues
1. Container runtime not ready: RuntimeReady=false reason:DockerDaemonNotReady message:docker: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
```
Message: A problem with the docker service causes the node to be NotReady.
Solution Ideas:
1. On the corresponding node, run systemctl status docker to check whether the service exists and is running.
2. If it is not running, start it with systemctl start docker.
3. If it does not exist, the corresponding node has been reset and needs to be added or deleted.
4. If the start fails, run journalctl -u docker -f to see the detailed docker logs.
```
#### 1. Node is not ready due to docker service exception
##### Symptoms:
The node is not ready. The error log shows the following message:
`Container runtime not ready: failed to get docker version: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?`
##### Cause:
The Docker service on the node is not running or has failed.
##### Resolving the problem:
1. On the affected node, check whether the Docker service exists and is running, for example:
`systemctl status docker`
2. If it is not running, start it, for example:
`systemctl start docker`
3. If it does not exist, the node has been reset and must be added or deleted. Refer to [add/delete](https://github.com/kubesphere/kubekey#add-nodes).
4. If the start fails, open two terminals on the same machine: follow the Docker logs in one and start Docker in the other, for example:
First terminal: `journalctl -u docker -f`; second terminal: `systemctl start docker`
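
The decision in steps 1 to 3 can be sketched as a small shell helper. This is only an illustration: the function name `next_step` is hypothetical, and the mapping assumes that `systemctl status` exits with 0 for a running unit and 4 for a unit that cannot be found, which may differ between systemd versions.

```shell
#!/bin/sh
# Hypothetical helper: map a `systemctl status` exit code to the next step.
# Assumption: 0 = running, 4 = unit not found, anything else = not running.
next_step() {
  case "$1" in
    0) echo "running" ;;
    4) echo "missing: add or delete the node" ;;
    *) echo "stopped: systemctl start docker" ;;
  esac
}

# Query systemd only when the script is called with the argument "check".
if [ "${1:-}" = "check" ]; then
  systemctl status docker >/dev/null 2>&1
  next_step "$?"
fi
```

Running the script with `check` on the affected node prints which of the steps applies; if the start itself fails, step 4 still requires reading the logs by hand.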
## Pod-level issues
1. message: Error, ImagePullBackOff
#### 1. Pod is not Running due to image pull failure
##### Symptoms:
The Pod status is ErrImagePull. The error log shows the following message:
`Error, ImagePullBackOff`
##### Cause:
The Pod was scheduled to a node on which the image pull fails.
##### Resolving the problem:
1. Describe the Pod in its namespace to find the image that cannot be pulled, for example:
`kubectl describe pod -n <namespace> <podName>`
2. Compare the image being pulled with the one actually required, paying attention to the image name format.
3. Check the image repository, or try to pull the image manually on the affected node to see whether it succeeds.
`docker pull <registry>/workspace/imageName:imageTag`
4. If the pull fails, check whether the affected node is configured with the image repository as a trusted source.
```
Message: ImagePullBackOff
Solution Ideas:
1. kubectl describe pod -n <namespace> <podName>, such as: kubectl describe pod -n default nginx-b8ffcf679-q4n9v.16491643e6b68cd7, see event's log.
2. Compare the pulled image with the actual one needed.
3. Check whether the pulled image exists in the mirror repository.
4. Check the mirror repository, or try pulling the image manually on another node in the cluster to see if it succeeds.
5. If another node can pull it, check whether the corresponding node is configured with the mirror repository as a trusted source.
cat /etc/docker/daemon.json
{
  "log-opts": {
    "max-size": "5m",
    "max-file": "3"
  },
  "registry-mirrors": ["https://*****.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```
5. If the pull still fails, check the machine's network connectivity:
`curl www.baidu.com`
6. Re-push the required image to the repository, tag an existing image as the required image, or copy the image from another node:
```shell script
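# Option 1: re-push the required image to the repository
docker push <registry>/workspace/imageName:imageTag
# Option 2: tag an existing image as the required image
docker tag <existingImage> <needImage>
# Option 3: export the image on another node ...
docker save -o needImage.tar existingImage
# ... and load it on the corresponding node
docker load -i needImage.tar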
```
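
The manual pull check in step 3 can be repeated for several images with a small loop. The sketch below is an assumption-laden illustration, not part of the original procedure: `failed_pulls` and the `PULL_CMD` override are hypothetical names, and `PULL_CMD` exists only so the loop can be dry-run without Docker.

```shell
#!/bin/sh
# Hypothetical helper: print every image that cannot be pulled on this node.
# PULL_CMD is an illustrative override for dry runs; it defaults to `docker pull`.
failed_pulls() {
  for img in "$@"; do
    # Left unquoted on purpose so that "docker pull" splits into two words.
    ${PULL_CMD:-docker pull} "$img" >/dev/null 2>&1 || echo "$img"
  done
}
```

On the affected node, the image list could come from `kubectl get pod -n <namespace> <podName> -o jsonpath='{.spec.containers[*].image}'`, whose space-separated output can be passed as arguments to `failed_pulls`. Any image it prints is a candidate for steps 4 to 6.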
## Best Practice issues
1. message: cpuLimitsMissing
#### 1. The CPU Limits parameter is not set at the corresponding pod resource
##### Symptoms:
When this parameter is not set, a misbehaving Pod can consume CPU without limit, which may lead to high node CPU usage and downtime. The log shows the following message:
`cpuLimitsMissing or CPU limits should be set`
##### Cause:
The CPU limit is not set in the resources of the corresponding Pod.
##### Resolving the problem:
1. To specify a CPU limit, include `resources:limits` in the container spec. CPU limits usually do not exceed 1 core. Refer to [CPU limits](https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/).
```
Message: The CPU Limits parameter is not set at the corresponding pod resource
Solution Ideas:
Specific values refer to the actual application, such as,
spec:
  containers:
  - image: nginx:latest
    resources:
      limits:
        cpu: 200m
```
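
To relate a value such as `200m` to the "1 core" guideline, recall that Kubernetes expresses CPU in cores, with the `m` suffix meaning millicores (1 core = 1000m). The following is a minimal sketch of that conversion; both function names are hypothetical, and only plain numbers and the `m` suffix are handled.

```shell
#!/bin/sh
# Hypothetical helper: convert a CPU quantity ("200m", "1", "0.5") to millicores.
cpu_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v c="$1" 'BEGIN { printf "%d\n", int(c * 1000 + 0.5) }' ;;
  esac
}

# Succeeds when the limit does not exceed 1 core (1000 millicores).
within_one_core() {
  [ "$(cpu_millicores "$1")" -le 1000 ]
}
```

For example, `within_one_core 200m` succeeds while `within_one_core 2` fails. The 1 core threshold is only the rule of thumb stated above; the right value depends on the actual application.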
```