Make all Ansible role tasks retry with yq

So you want to mitigate connectivity problems during installation.

In main.yml you have

- name: Install requirements packages
  yum:
    name:
      - epel-release
      - jq
- name: Install Icinga packages
  yum:
    name:
      - icinga2      

Run the following. Note that register, retries, delay, and until are task-level keywords, so they go next to yum:, not inside it:

yq -i '(.[] | select(has("yum")) | .register) = "result" ' main.yml
yq -i '(.[] | select(has("yum")) | .retries) = 10 ' main.yml
yq -i '(.[] | select(has("yum")) | .delay) = 5 ' main.yml
yq -i '(.[] | select(has("yum")) | .until) = "result is succeeded" ' main.yml
# Too lazy to work out how to do this in one line

Now you have

- name: Install requirements packages
  yum:
    name:
      - epel-release
      - jq
  register: result
  retries: 10
  delay: 5
  until: result is succeeded
- name: Install Icinga packages
  yum:
    name:
      - icinga2
  register: result
  retries: 10
  delay: 5
  until: result is succeeded
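
The select(has("yum")) only touches yum tasks. If you really do want every task in the role to retry, the same assignments without the select should work (an untested variant of the commands above):

yq -i '(.[] | .register) = "result" ' main.yml
yq -i '(.[] | .retries) = 10 ' main.yml
yq -i '(.[] | .delay) = 5 ' main.yml
yq -i '(.[] | .until) = "result is succeeded" ' main.yml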

Kustomize replacement on annotations

The magic fieldPaths syntax for an annotation key that contains dots is metadata.annotations.[external-dns.alpha.kubernetes.io/hostname]

NOT metadata.annotations.external-dns.alpha.kubernetes.io/hostname
NOT metadata.annotations[external-dns.alpha.kubernetes.io/hostname]
NOT metadata.annotations.external-dns\.alpha\.kubernetes\.io/hostname

Version:kustomize/v4.4.0 GitCommit:63ec6bdb3d737a7c66901828c5743656c49b60e1

cat parameters.env

FQDN=host.domain.tld

cat ingress.yaml

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
    - name: "8000"
      port: 8000
  type: NodePort
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "my-service-ingress"
  annotations:
    external-dns.alpha.kubernetes.io/hostname:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS-1-1-2017-01
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
spec:
  rules:
    - host:
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-service
                port:
                  number: 8000

cat kustomization.yaml

replacements:
- source:
    kind: ConfigMap
    fieldPath: data.FQDN
  targets:
  - select:
      kind: Ingress
      name: my-service-ingress
    fieldPaths:
      - spec.rules.0.host
      - metadata.annotations.[external-dns.alpha.kubernetes.io/hostname]
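
For the build to actually work, the ConfigMap source also has to live in the same kustomization, presumably generated from parameters.env, roughly like this (a sketch; the generator name is illustrative, and the replacement above selects by kind only, so any single ConfigMap works):

resources:
  - ingress.yaml

configMapGenerator:
  - name: parameters
    envs:
      - parameters.env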

TLS Termination on AWS NLB to avoid managing certs in contour

Law of open source: the official docs never install cleanly.

Contour can terminate TLS by itself, but doing TLS termination on an AWS NLB has some advantages:

  • It separates transport encryption from traffic routing
  • It simplifies certificate management

The official site does have the article AWS Network Load Balancer TLS Termination with Contour (projectcontour.io), but as a new-century open-source warrior you of course run into the classic predicament where Unit-01 just won't move. (Sep 2021)

This is the official version that does not work:

apiVersion: v1
kind: Service
metadata:
  name: envoy
  namespace: projectcontour
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-2:185309785115:certificate/7610ed7d-5a81-4ea2-a18a-7ba1606cca3e"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  externalTrafficPolicy: Local
  ports:
  - port: 80
    name: http
    protocol: TCP
  selector:
    app: envoy
  type: LoadBalancer

This is the version that works. The key differences are the explicit internet-facing scheme annotation and exposing port 443 with targetPort 8080, so the NLB terminates TLS and forwards plain HTTP to Envoy's insecure listener:

apiVersion: v1
kind: Service
metadata:
  name: envoy
  namespace: projectcontour
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-2:185309785115:certificate/7610ed7d-5a81-4ea2-a18a-7ba1606cca3e"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  externalTrafficPolicy: Local
  ports:
  - port: 443
    name: https
    protocol: TCP
    targetPort: 8080
  selector:
    app: envoy
  type: LoadBalancer

Grafana Loki with S3 Backend Using Tanka

The official docs do not give you a working install: following them, plus some deeply buried defaults, produces a configuration with the schema on GCP and the data on AWS.

If you are not relying on Helm charts for automation, install it with Tanka | Grafana Labs, as the official docs recommend.
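
For reference, the bootstrap looks roughly like this (a sketch from memory of the Loki Tanka guide; the environment name, Kubernetes version, and API server address are placeholders, so double-check against the current doc):

tk init --k8s=1.21
tk env add environments/loki --namespace=loki --server=https://<your-apiserver>:6443
jb install github.com/grafana/loki/production/ksonnet/loki
jb install github.com/grafana/loki/production/ksonnet/promtail
tk apply environments/loki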

The main thing you have to add is the schema_config block:

local gateway = import 'loki/gateway.libsonnet';
local loki = import 'loki/loki.libsonnet';
local promtail = import 'promtail/promtail.libsonnet';

loki + promtail + gateway {
  _config+:: {
    namespace: 'loki',
    htpasswd_contents: '',

    // S3 variables remove if not using aws
    storage_backend: 's3',
    s3_access_key: '',
    s3_secret_access_key: '',
    s3_address: 'ap-northeast-1',
    s3_bucket_name: 'loki',

    //Set this variable based on the type of object storage you're using.
    boltdb_shipper_shared_store: 's3',
    compactor_pvc_class: 'gp2',

    loki+: {
      schema_config: {
        configs: [{
          from: '2020-10-24',
          store: 'boltdb-shipper',
          object_store: 's3',
          schema: 'v11',
          index: {
            prefix: '%s_index_' % $._config.table_prefix,
            period: '%dh' % $._config.index_period_hours,
          },
        }],
      },
    },

    promtail_config+: {
      clients: [{
        scheme:: 'http',
        hostname:: 'gateway.%(namespace)s.svc' % $._config,
        username:: 'loki',
        password:: '',
        container_root_path:: '/var/lib/docker',
      }],
    },

    replication_factor: 3,
    consul_replicas: 1,
  },
}

Then just point the Grafana data source URL at http://gateway.loki.svc.
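
If you provision data sources from files, a minimal sketch could look like this (assuming Grafana's file-based provisioning; the basic-auth user matches the gateway/promtail settings above, so fill in whatever password you put into htpasswd_contents):

apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://gateway.loki.svc
    basicAuth: true
    basicAuthUser: loki
    secureJsonData:
      basicAuthPassword: ""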

Build Periodically in Jenkins Templating Engine

It took a lot of digging to figure out how to do periodic builds with the Jenkins Templating Engine.

Freestyle Job

Just click around in the GUI; easy.

Declarative Pipeline

Also easy: just declare it inside the pipeline block:

pipeline {
  agent any

  triggers {
    cron('H 0 1,15 * *')
  }

  // stages omitted
}

Jenkins Templating Engine

https://boozallen.github.io/sdp-docs/jte/2.2.2/index.html
JTE supports two styles, Scripted Pipeline and Declarative Pipeline; Scripted came out earlier and is more mature.
The way to configure this is very unintuitive: in fact, setting properties inside the node scope of any stage is all it takes.

cat libraries/common/steps/common_init.groovy
@Init

void call() {
  stage('Common: Init'){
    node {
      properties([pipelineTriggers([cron('H 0 1,15 * *')])])
      cleanWs(disableDeferredWipeout: true)
      checkout scm        
    }
  }
}

Argo CD PreSync Chicken-and-Egg Problem

If you read the official Argo CD Resource Hooks documentation, it tells you PreSync can be used for database schema migration. However, there is a problem.

For example, we would declare an external database like this:

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: ExternalName
  externalName: foo.bar.us-east-2.rds.amazonaws.com

Most resources like this are synced in the Sync phase, which comes after PreSync. So on the first install of the Application, the Sync phase has not run yet -> the database Service does not exist -> the database schema migration kind: Job in the PreSync phase fails.
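
The failing pattern from the hooks docs looks roughly like this (a sketch; image and command are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migration
  annotations:
    argocd.argoproj.io/hook: PreSync
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: my-migrations:latest   # illustrative image that talks to the postgres Service
          command: ["/migrate.sh"]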

Marking the kind: Service as PreSync instead brings another problem: a PreSync hook disappears once it has run successfully. (The other mode deletes and recreates it before every run, which is much the same problem.)

Because they disappear, things like kind: Service or kind: PersistentVolume cannot go in the PreSync phase.

The solution is to use Sync Waves instead:

apiVersion: v1
kind: Service
metadata:
  name: postgres
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  
spec:
  type: ExternalName
  externalName: foo.bar.us-east-2.rds.amazonaws.com
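
The migration Job then just needs a later wave than the Service. One possible wiring (a sketch; the hook choice, wave number, and image are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: schema-migration
  annotations:
    argocd.argoproj.io/hook: Sync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
    argocd.argoproj.io/sync-wave: "0"
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: my-migrations:latest   # illustrative
          command: ["/migrate.sh"]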

You control execution order by freely assigning positive and negative integers. It is 2021 and we still get this kind of non-DAG design, which is rather curious. Flux CD went with the much nicer spec.dependsOn instead, as sketched below.
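
For comparison, a minimal Flux sketch (names and the interval are illustrative):

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: app
  namespace: flux-system
spec:
  dependsOn:
    - name: database        # the database Kustomization must be ready first
  interval: 10m
  path: ./app
  prune: true
  sourceRef:
    kind: GitRepository
    name: repo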


Linux Partition Naming Conventions Are a Mess

The main offenders are /dev/sda1, /dev/mapper/namep1, /dev/mapper/name1, and /dev/mapper/name-part1.
Should the separator be p, -part, or nothing at all? The various programs cannot agree, and on top of that there is a pile of hard-coded handling.
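
A rough sketch of the most common heuristic (the kernel does this for nvme and mmcblk devices; some device-mapper tools follow a similar rule and some do not, which is exactly the complaint):

# Append "p" before the partition number only when the base name ends in a digit.
part_name() {
  base=$1 num=$2
  case "$base" in
    *[0-9]) echo "${base}p${num}" ;;
    *)      echo "${base}${num}" ;;
  esac
}

part_name /dev/sda 1           # /dev/sda1
part_name /dev/nvme0n1 1       # /dev/nvme0n1p1
part_name /dev/mapper/name 1   # /dev/mapper/name1, while udev by-id links use -part1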

[parted-devel] "linux: use devicemapper task name instead of device node name" causes dmraid breakage

fix handling of multipathed disks

[dm-devel] What is the deal with the partition separator?