2017-05-08

For some reason, Kubernetes 1.6.2 does not trigger autoscaling on Google Container Engine (GKE).

I have a someservice Deployment defined with the following resources and rolling-update strategy:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: someservice
  labels:
    layer: backend
spec:
  minReadySeconds: 160
  replicas: 1
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: someservice
        layer: backend
    spec:
      containers:
      - name: someservice
        image: eu.gcr.io/XXXXXX/someservice:v1
        imagePullPolicy: Always
        resources:
          limits:
            cpu: 2
            memory: 20Gi
          requests:
            cpu: 400m
            memory: 18Gi
        <.....>
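Note that with maxSurge: 100% and maxUnavailable: 0, a rollout has to schedule a complete second replica (another 400m of CPU and 18Gi of memory) alongside the running pod before the old one is terminated, so a single node needs that much free, schedulable headroom. A quick way to confirm the requests the scheduler actually sees on the live object (a sketch; the jsonpath assumes the single container shown above):

$ kubectl -n dev get deployment someservice \
    -o jsonpath='{.spec.template.spec.containers[0].resources}'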

After changing the image version, the new pod fails to start:

$ kubectl -n dev get pods -l name=someservice 
NAME                           READY     STATUS    RESTARTS   AGE
someservice-2595684989-h8c5d   0/1       Pending   0          42m
someservice-804061866-f2trc    1/1       Running   0          1h

$ kubectl -n dev describe pod someservice-2595684989-h8c5d 

Events: 
    FirstSeen   LastSeen   Count   From                 SubObjectPath   Type      Reason              Message
    ---------   --------   -----   ----                 -------------   ----      ------              -------
    43m         43m        4       default-scheduler                    Warning   FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (4), Insufficient memory (3).
    43m         42m        6       default-scheduler                    Warning   FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (3), Insufficient memory (3).
    41m         41m        2       default-scheduler                    Warning   FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (2), Insufficient memory (3).
    40m         36s        136     default-scheduler                    Warning   FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (3).
    43m         2s         243     cluster-autoscaler                   Normal    NotTriggerScaleUp   pod didn't trigger scale-up (it wouldn't fit if a new node is added)

My node pool is set to autoscale with min: 2, max: 5. The machines in the node pool (n1-highmem-8) are large enough (52GB) to fit this service. Yet somehow:

$ kubectl get nodes 
NAME                                 STATUS    AGE       VERSION
gke-dev-default-pool-efca0068-4qq1   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-597s   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-6srl   Ready     2d        v1.6.2
gke-dev-default-pool-efca0068-hb1z   Ready     2d        v1.6.2

$ kubectl describe nodes | grep -A 4 'Allocated resources' 
Allocated resources: 
    (Total limits may be over 100 percent, i.e., overcommitted.) 
    CPU Requests CPU Limits Memory Requests  Memory Limits 
    ------------ ---------- ---------------  ------------- 
    7060m (88%) 15510m (193%) 39238591744 (71%) 48582818048 (88%) 
-- 
Allocated resources: 
    (Total limits may be over 100 percent, i.e., overcommitted.) 
    CPU Requests CPU Limits Memory Requests Memory Limits 
    ------------ ---------- --------------- ------------- 
    6330m (79%) 22200m (277%) 48930Mi (93%) 66344Mi (126%) 
-- 
Allocated resources: 
    (Total limits may be over 100 percent, i.e., overcommitted.) 
    CPU Requests CPU Limits Memory Requests Memory Limits 
    ------------ ---------- --------------- ------------- 
    7360m (92%) 13200m (165%) 49046Mi (93%) 44518Mi (85%) 
-- 
Allocated resources: 
    (Total limits may be over 100 percent, i.e., overcommitted.) 
    CPU Requests CPU Limits Memory Requests  Memory Limits 
    ------------ ---------- ---------------  ------------- 
    7988m (99%) 11538m (144%) 32967256Ki (61%) 21690968Ki (40%) 
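The figures above are sums of pod requests measured against each node's Allocatable resources (capacity minus what the node reserves for system components), which is what both the scheduler and the cluster autoscaler compare against. To see whether an 18Gi / 400m pod could fit on an n1-highmem-8 node at all, one option is to look at the Capacity and Allocatable sections of an existing node in the pool (a sketch, reusing a node name from the listing above):

$ kubectl describe node gke-dev-default-pool-efca0068-4qq1 | grep -A 5 -E 'Capacity|Allocatable'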

$ gcloud container node-pools describe default-pool --cluster=dev 
autoscaling: 
    enabled: true 
    maxNodeCount: 5 
    minNodeCount: 2 
config: 
    diskSizeGb: 100 
    imageType: COS 
    machineType: n1-highmem-8 
    oauthScopes: 
    - https://www.googleapis.com/auth/compute 
    - https://www.googleapis.com/auth/datastore 
    - https://www.googleapis.com/auth/devstorage.read_only 
    - https://www.googleapis.com/auth/devstorage.read_write 
    - https://www.googleapis.com/auth/service.management.readonly 
    - https://www.googleapis.com/auth/servicecontrol 
    - https://www.googleapis.com/auth/sqlservice 
    - https://www.googleapis.com/auth/logging.write 
    - https://www.googleapis.com/auth/monitoring 
    serviceAccount: default 
initialNodeCount: 2 
instanceGroupUrls: 
- https://www.googleapis.com/compute/v1/projects/XXXXXX/zones/europe-west1-b/instanceGroupManagers/gke-dev-default-pool-efca0068-grp 
management: 
    autoRepair: true 
name: default-pool 
selfLink: https://container.googleapis.com/v1/projects/XXXXXX/zones/europe-west1-b/clusters/dev/nodePools/default-pool 
status: RUNNING 
version: 1.6.2 

$ kubectl version 
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:33:11Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"darwin/amd64"} 
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.2", GitCommit:"477efc3cbe6a7effca06bd1452fa356e2201e1ee", GitTreeState:"clean", BuildDate:"2017-04-19T20:22:08Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"} 

Answers

1

So this appears to be a bug in Kubernetes 1.6.2. According to a GKE support engineer:

From the message "No nodes are available that match all of the following predicates", this appears to be a known issue, and the engineers have managed to track down the root cause. It is a problem in Cluster Autoscaler 0.5.1, which is currently used in GKE 1.6 (up to 1.6.2). The issue has already been fixed in Cluster Autoscaler 0.5.2, which is included at the head of the 1.6 branch.
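If that is the root cause, the fix arrives with a GKE release that ships Cluster Autoscaler 0.5.2, so one option is to upgrade the cluster master once such a release is offered. A possible way to check and upgrade (a sketch; no patch version is pinned here because the answer above does not name one, add --cluster-version to target a specific release):

$ gcloud container get-server-config --zone europe-west1-b
$ gcloud container clusters upgrade dev --master --zone europe-west1-b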

0

Make sure the instance group autoscaler is disabled, or that it has appropriate min/max instance count settings.

According to the Kubernetes Cluster Autoscaler FAQ:

CPU-usage-based (or any metric-based) cluster/node group autoscalers, like the GCE Instance Group Autoscaler, are not compatible with [the Kubernetes Cluster Autoscaler]. They are also not particularly suited to use with Kubernetes in general.

...so it should probably be disabled.

Try:

gcloud compute instance-groups managed describe gke-dev-default-pool-efca0068-grp \ 
    --zone europe-west1-b 

Then check the autoscaler property. If the instance group autoscaler is disabled, it will not be present.

To disable it, run:

gcloud compute instance-groups managed stop-autoscaling gke-dev-default-pool-efca0068-grp \ 
    --zone europe-west1-b
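If the managed instance group's autoscaler was indeed active, re-running the describe command afterwards should, per the note above, no longer show an autoscaler property; a quick filter (a sketch):

$ gcloud compute instance-groups managed describe gke-dev-default-pool-efca0068-grp \
    --zone europe-west1-b | grep -i autoscaler

Node-pool autoscaling itself (the GKE-managed Cluster Autoscaler) remains configured through the node pool's autoscaling block shown earlier, so disabling the instance group autoscaler does not turn off scale-up on the Kubernetes side.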