
Kubeye

Kubeye is a tool for inspecting Kubernetes clusters. It runs a variety of checks to ensure that Kubernetes Pods are configured according to best practices, helping you avoid problems in the future. It quickly reports the status of the cluster's core components, the cluster size, abnormal Pods, and a wide range of node problems. Kubeye is written in Go. It supports user-defined best-practice rules and custom cluster fault detectors; for the latter, refer to the Node-Problem-Detector project.

Usage

1. Get the Installer Executable File

Download the Kubeye binary:

wget https://installertest.pek3b.qingstor.com/ke
chmod +x ke

Build Binary from Source Code

git clone https://github.com/kubesphere/kubeye.git
cd kubeye 
make

2. Run the Audit

./ke audit --kubeconfig ***

--kubeconfig string
      Path to a kubeconfig. Only required if out-of-cluster.
Note: If the cluster is external, the server address in the kubeconfig file must be an externally reachable network address.
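A kubeconfig whose server value is a loopback address such as 127.0.0.1 only works on the machine that hosts the API server. The Go sketch below shows one way to sanity-check the server value before running ke. The helper serverLooksExternal is hypothetical and not part of Kubeye; it only rejects loopback and unspecified addresses and does not test actual reachability.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// serverLooksExternal is a hypothetical helper (not part of Kubeye). It reports
// whether a kubeconfig "server" value points somewhere other than the local
// machine. It does not resolve DNS or open a connection.
func serverLooksExternal(server string) (bool, error) {
	u, err := url.Parse(server)
	if err != nil {
		return false, err
	}
	host := u.Hostname()
	if host == "" {
		return false, fmt.Errorf("no host in server address %q", server)
	}
	if strings.EqualFold(host, "localhost") {
		return false, nil
	}
	// Loopback (127.0.0.0/8, ::1) and unspecified (0.0.0.0, ::) addresses
	// cannot be reached from another machine.
	if ip := net.ParseIP(host); ip != nil && (ip.IsLoopback() || ip.IsUnspecified()) {
		return false, nil
	}
	// Any other hostname or IP is assumed to be reachable.
	return true, nil
}

func main() {
	for _, s := range []string{"https://127.0.0.1:6443", "https://203.0.113.10:6443"} {
		ok, err := serverLooksExternal(s)
		fmt.Println(s, ok, err)
	}
}
```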

3. Install Node-Problem-Detector in the Inspected Cluster

Note: The NPD module is optional. Install it only if you need more detailed node information.

Create the ConfigMap for Node-Problem-Detector from npd-config.yaml. It contains the fault inspection rules, and you can add rules of your own:
kubectl apply -f npd-config.yaml
Create the DaemonSet for Node-Problem-Detector from npd.yaml:
kubectl apply -f npd.yaml

Results

Whether the cluster's core components, such as controller-manager and scheduler, are healthy.
Whether the cluster nodes are healthy.
Whether the cluster Pods are healthy.

For more details on the check items, click here.

Results Example

root@node1:/home/ubuntu/go/src/kubeye# ./ke audit --kubeconfig /home/ubuntu/config
HEARTBEATTIME                   SEVERITY                                 NODENAME   REASON              MESSAGE
2020-11-19 10:32:03 +0800 CST   danger                                   node18     NodeStatusUnknown   Kubelet stopped posting node status.
2020-11-19 10:31:37 +0800 CST   danger                                   node19     NodeStatusUnknown   Kubelet stopped posting node status.
2020-11-19 10:31:14 +0800 CST   danger                                   node2      NodeStatusUnknown   Kubelet stopped posting node status.
2020-11-19 10:31:58 +0800 CST   danger                                   node3      NodeStatusUnknown   Kubelet stopped posting node status.

NAME                            SEVERITY                                 MESSAGE
scheduler                       danger                                   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused

EVENTTIME                       NODENAME                                 NAMESPACE     REASON       MESSAGE
2020-11-20 18:52:13 +0800 CST   nginx-b8ffcf679-q4n9v.16491643e6b68cd7   default       Failed       Error: ImagePullBackOff

TIME                            NAME                                     NAMESPACE     KIND         MESSAGE
2020-11-20T18:54:44+08:00       calico-node                              kube-system   DaemonSet    [{map[cpuLimitsMissing:{cpuLimitsMissing CPU limits should be set false    warning  Resources} runningAsPrivileged:{runningAsPrivileged Should not be running as privileged false    warning  Security}]}]
2020-11-20T18:54:44+08:00       kube-proxy                               kube-system   DaemonSet    [{map[runningAsPrivileged:{runningAsPrivileged Should not be running as privileged false    warning  Security}]}]
2020-11-20T18:54:44+08:00       coredns                                  kube-system   Deployment   [{map[cpuLimitsMissing:{cpuLimitsMissing CPU limits should be set false    warning  Resources}]}]
2020-11-20T18:54:44+08:00       nodelocaldns                             kube-system   DaemonSet    [{map[cpuLimitsMissing:{cpuLimitsMissing CPU limits should be set false    warning  Resources} hostPortSet:{hostPortSet Host port should not be configured false    warning  Networking} runningAsPrivileged:{runningAsPrivileged Should not be running as privileged false    warning  Security}]}]
2020-11-20T18:54:44+08:00       nginx                                    default       Deployment   [{map[cpuLimitsMissing:{cpuLimitsMissing CPU limits should be set false    warning  Resources} livenessProbeMissing:{livenessProbeMissing Liveness probe should be configured false    warning  Health Checks} tagNotSpecified:{tagNotSpecified Image tag should be specified false    danger   Images  }]}]
2020-11-20T18:54:44+08:00       calico-kube-controllers                  kube-system   Deployment   [{map[cpuLimitsMissing:{cpuLimitsMissing CPU limits should be set false    warning  Resources} livenessProbeMissing:{livenessProbeMissing Liveness probe should be configured false    warning  Health Checks}]}
