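Deploy a RisingWave cluster in Kubernetes with the Operator

This guide walks you through deploying a RisingWave cluster in Kubernetes with the [Kubernetes Operator for RisingWave](https://github.com/risingwavelabs/risingwave-operator) (hereinafter "the Operator").

The Operator is a deployment and management system built specifically for RisingWave. It runs on Kubernetes and provides functions such as provisioning, upgrading, scaling, and destroying RisingWave instances in the cluster.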

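Before you begin

Install kubectl
Make sure that kubectl, the Kubernetes command-line tool, is installed in your environment.
Install psql
Make sure that psql, the PostgreSQL interactive terminal, is installed in your environment.
Install and run Docker
Make sure that Docker is installed and running in your environment.
Make sure to allocate enough resources for the deployment, and use the recommended disks for etcd. For details, see the hardware requirements.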
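
The tool checks above can be scripted. The following is a minimal sketch rather than part of the official instructions: `missing_tools` is a hypothetical helper name, and the check only confirms that each command is on `PATH`, not that the Docker daemon is running.

```shell
# missing_tools TOOL...: print each TOOL that cannot be found on PATH.
# (missing_tools is a hypothetical helper name used for illustration.)
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing="$(missing_tools kubectl psql docker)"
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools are on PATH."
fi
```

To also confirm that the Docker daemon is running, you can run `docker info`, which exits with a non-zero status when it cannot reach the daemon.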

Create a Kubernetes cluster

info

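The steps in this section create a Kubernetes cluster in a local environment.
If you are using a managed Kubernetes service such as AKS, GKE, or EKS, refer to the corresponding documentation for instructions.

Steps:

Install kind.
kind is a tool for running local Kubernetes clusters using Docker containers as cluster nodes. You can see the available tags of kind on Docker Hub.
Create a cluster.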
```
kind create cluster
```
Optional: Check that the cluster was created correctly.
```
kubectl cluster-info
```

Deploy the Operator

Before deploying, make sure the following requirements are met.

Docker version ≥ 18.09
kubectl version ≥ 1.18
On Linux, set the sysctl parameter net.ipv4.ip_forward to 1.
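
The version requirements above can be checked with a small comparison helper. This is a sketch under stated assumptions: `version_ge`, `check_docker_version`, and `check_ip_forward` are hypothetical names, `sort -V` (version sort) is assumed to be available, and the last function reads the Linux `/proc` interface for the sysctl parameter.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), which is assumed to be available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_docker_version: compare the Docker server version against 18.09.
check_docker_version() {
  docker_version="$(docker version --format '{{.Server.Version}}')" || return 1
  version_ge "$docker_version" 18.09
}

# check_ip_forward: on Linux, succeed when net.ipv4.ip_forward is 1.
check_ip_forward() {
  [ "$(cat /proc/sys/net/ipv4/ip_forward)" = "1" ]
}
```

Call `check_docker_version` and `check_ip_forward` on the machine that hosts the cluster. The kubectl client version can be compared against 1.18 in the same way with `version_ge`.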

Steps:

Install cert-manager and wait a minute for it to initialize.
Install the latest version of the Operator.
```
kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
```
To install a specific version of the Operator instead, run the following command:
# Replace ${VERSION} with the version you want to install, for example, v1.3.0
kubectl apply --server-side -f https://github.com/risingwavelabs/risingwave-operator/releases/download/${VERSION}/risingwave-operator.yaml
Compatibility matrix

| Operator | RisingWave | Kubernetes |
| --- | --- | --- |
| v0.4.0 | v0.18.0+ | v1.21+ |
| v0.3.6 | v0.18.0+ | v1.21+ |

You can find the release notes for each version here.
note
If cert-manager has not fully initialized, the following error may occur. Wait another minute and rerun the command above.
Error from server (InternalError): Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.105.102.32:443: connect: connection refused
Optional: Check that the Pods are installed correctly.

```shell
kubectl -n cert-manager get pods
kubectl -n risingwave-operator-system get pods
```
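
The cert-manager note in this section amounts to rerunning the install command until the webhook is ready. A minimal sketch of that retry loop follows. `retry` and `install_operator` are hypothetical helper names, and the attempt count and delay are illustrative values, not recommendations from the Operator project.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, waiting
# DELAY seconds between tries and giving up after ATTEMPTS tries.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  try=1
  until "$@"; do
    if [ "$try" -ge "$attempts" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}

# install_operator: apply the latest Operator manifest, retrying up to
# 5 times with 60 seconds between tries (illustrative values).
install_operator() {
  retry 5 60 kubectl apply --server-side -f \
    https://github.com/risingwavelabs/risingwave-operator/releases/latest/download/risingwave-operator.yaml
}
```

Running `install_operator` after installing cert-manager replaces the manual "wait and rerun" step. The function returns a non-zero status if every attempt fails.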

Deploy a RisingWave instance

To manage RisingWave, the RisingWave Kubernetes Operator extends Kubernetes with CRDs (Custom Resource Definitions). This means you only need to do one thing: create a RisingWave resource in your Kubernetes cluster. The RisingWave Kubernetes Operator takes care of the rest.

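Use the example resource files

A RisingWave resource is a custom resource that defines a RisingWave cluster. In this directory, you can find example resources for deploying RisingWave with different metadata store and state backend configurations. Depending on your needs, you can use these resource files directly or as a reference for writing your own. The stable directory contains resource files that have been tested and confirmed to be compatible with the latest released version of the RisingWave Operator.

Resource files are named "risingwave-<meta_store>-<state_backend>.yaml". For example, risingwave-etcd-s3.yaml means that the manifest uses etcd as the metadata store and AWS S3 as the state backend. Resource files whose names do not contain etcd use memory as the metadata store, which does not persist Meta node data and therefore risks data loss. For production deployments, use etcd as the metadata store: choose a resource file whose name contains etcd, or a file from the /stable/ directory.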
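
The naming convention can be illustrated with a small parser. This is only a sketch: `describe_manifest` is a hypothetical helper, and apart from risingwave-etcd-s3.yaml no particular file name is assumed to exist in the repository.

```shell
# describe_manifest FILE: report the metadata store and state backend implied
# by a file name of the form risingwave-<meta_store>-<state_backend>.yaml.
describe_manifest() {
  name="${1##*/}"       # drop any directory prefix
  name="${name%.yaml}"  # drop the extension
  case "$name" in
    risingwave-etcd-*)
      echo "meta store: etcd, state backend: ${name#risingwave-etcd-}" ;;
    risingwave-*)
      # No etcd in the name: Meta node data is kept in memory only.
      echo "meta store: in-memory, state backend: ${name#risingwave-}" ;;
    *)
      echo "unrecognized manifest name: $1" >&2
      return 1 ;;
  esac
}

describe_manifest risingwave-etcd-s3.yaml
# prints: meta store: etcd, state backend: s3
```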

RisingWave supports the following systems and services as state backends.

MinIO
AWS S3
S3-compatible object storage
Google Cloud Storage
Azure Blob Storage

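You can optionally deploy etcd as a separate cluster, customize the state backend, or customize the state store directory.

Optional: Customize the etcd deployment

RisingWave uses etcd to persist data for the Meta node. Note that etcd is highly sensitive to disk write latency. Slow disks increase etcd request latency and can affect cluster stability. When planning your RisingWave deployment, follow the etcd disk recommendations.

We recommend deploying etcd with the bitnami/etcd Helm chart. Save the following configuration as etcd-values.yaml.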

service:
  ports:
    client: 2379
    peer: 2380

replicaCount: 3

resources:
  limits:
    cpu: 1
    memory: 2Gi
  requests:
    cpu: 1
    memory: 2Gi

persistence:
  # storageClass: default
  size: 10Gi
  accessModes: [ "ReadWriteOnce" ]

auth:
  rbac:
    create: false
    allowNoneAuthentication: true

extraEnvVars:
- name:  "ETCD_MAX_REQUEST_BYTES"
  value: "104857600"
- name:  "ETCD_QUOTA_BACKEND_BYTES"
  value: "8589934592"
- name:  "ETCD_AUTO_COMPACTION_MODE"
  value: "periodic"
- name:  "ETCD_AUTO_COMPACTION_RETENTION"
  value: "1m"
- name:  "ETCD_SNAPSHOT_COUNT"
  value: "10000"
- name:  "ETCD_MAX_TXN_OPS"
  value: "999999"

To specify a storage class for the persistent volumes, uncomment the # storageClass: default line and set it to the appropriate value.
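
If you prefer not to edit the file by hand, the uncommenting step can be done with sed. The following is a minimal sketch: `set_storage_class` is a hypothetical helper, and it writes the result to standard output instead of changing the file in place.

```shell
# set_storage_class FILE CLASS: print FILE with the commented-out
# storageClass line uncommented and set to CLASS. FILE is not modified.
set_storage_class() {
  sed "s/^\([[:space:]]*\)# storageClass: .*/\1storageClass: $2/" "$1"
}
```

For example, `set_storage_class etcd-values.yaml gp3 > etcd-values.custom.yaml` writes a copy that requests a storage class named gp3. That name is only a placeholder; `kubectl get storageclass` lists the classes available in your cluster.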

Then, run the following command to deploy the etcd cluster:

helm install -f etcd-values.yaml etcd bitnami/etcd

Optional: Customize the state backend

If you plan to customize a resource file, download it to a local path and edit it:

curl https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/<sub-directory> -o risingwave.yaml

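If you are familiar with Kubernetes resource files, you can also write your own from scratch.

Then, apply the resource file with the following command:

kubectl apply -f a.yaml      # relative path
kubectl apply -f /tmp/a.yaml # absolute path

To customize the state backend of your RisingWave cluster, edit the spec:stateStore section of the RisingWave resource (kind: RisingWave).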

AWS S3
MinIO
S3-compatible
Azure Blob Storage
Google Cloud Storage

spec:
  stateStore:
    # Prefix to objects in the object stores or directory in file system. Default to "hummock".
    dataDirectory: hummock
    
    # Declaration of the S3 state store backend.
    s3:
      # Region of the S3 bucket.
      region: us-east-1
      
      # Name of the S3 bucket.
      bucket: risingwave
      
      # Credentials to access the S3 bucket.
      credentials:
        # Name of the Kubernetes secret that stores the credentials.
        secretName: s3-credentials
        
        # Key of the access key ID in the secret.
        accessKeyRef: AWS_ACCESS_KEY_ID
        
        # Key of the secret access key in the secret.
        secretAccessKeyRef: AWS_SECRET_ACCESS_KEY
        
        # Optional, set it to true when the credentials can be retrieved 
        # with the service account token, e.g., running inside the EKS.
        # 
        # useServiceAccount: true 
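
The S3 configuration above refers to a Kubernetes secret named s3-credentials with two keys. One way to produce such a secret is sketched below. `s3_secret_manifest` is a hypothetical helper that prints a standard Secret manifest; it assumes the credential values contain no double quotes, and that the secret is created in the same namespace as the RisingWave resource.

```shell
# s3_secret_manifest ACCESS_KEY_ID SECRET_ACCESS_KEY: print a Secret manifest
# whose name and keys match the credentials section of the S3 state store.
s3_secret_manifest() {
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "$1"
  AWS_SECRET_ACCESS_KEY: "$2"
EOF
}
```

The output can be piped to kubectl, for example `s3_secret_manifest "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" | kubectl apply -f -`.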

note

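MinIO's performance depends heavily on the disk performance of the node that hosts it. In our tests, AWS EBS performed poorly. For the best performance, we recommend S3 or a compatible cloud service.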

spec:
  stateStore:
    # Prefix to objects in the object stores or directory in file system. Default to "hummock".
    dataDirectory: hummock
    
    # Declaration of the MinIO state store backend.
    minio:
      # Endpoint of the MinIO service.
      endpoint: risingwave-minio:9301
      
      # Name of the MinIO bucket.
      bucket: hummock001
      
      # Credentials to access the MinIO bucket.
      credentials:
        # Name of the Kubernetes secret that stores the credentials.
        secretName: minio-credentials
        
        # Key of the username ID in the secret.
        usernameKeyRef: username
        
        # Key of the password key in the secret.
        passwordKeyRef: password 

spec:
  stateStore:
    # Prefix to objects in the object stores or directory in file system. Default to "hummock".
    dataDirectory: hummock
    
    # Declaration of the S3 compatible state store backend.
    s3:
      # Endpoint of the S3 compatible object storage. Two variables are supported:
      # - ${BUCKET}: name of the S3 bucket.
      # - ${REGION}: name of the region.
      endpoint: ${BUCKET}.cos.${REGION}.myqcloud.com
      
      # Region of the S3 compatible bucket.
      region: ap-guangzhou
      
      # Name of the S3 compatible bucket.
      bucket: risingwave
      
      # Credentials to access the S3 compatible bucket.
      credentials:
        # Name of the Kubernetes secret that stores the credentials.
        secretName: cos-credentials
        
        # Key of the access key ID in the secret.
        accessKeyRef: ACCESS_KEY_ID
        
        # Key of the secret access key in the secret.
        secretAccessKeyRef: SECRET_ACCESS_KEY

spec:
  stateStore:
    # Prefix to objects in the object stores or directory in file system. Default to "hummock".
    dataDirectory: hummock
    
    # Declaration of the Azure Blob Storage state store backend.
    azureBlob:
      # Endpoint of the Azure Blob service.
      endpoint: https://you-blob-service.blob.core.windows.net
      
      # Working directory root of the Azure Blob service.
      root: risingwave
      
      # Container name of the Azure Blob service.
      container: risingwave
    
      # Credentials to access the Azure Blob container.
      credentials:
        # Name of the Kubernetes secret that stores the credentials.
        secretName: gcs-credentials
        
        # Key of the account name in the secret.
        accountNameRef: AccountName
        
        # Key of the account key in the secret.
        accountKeyRef: AccountKey

spec:
  stateStore:
    # Prefix to objects in the object stores or directory in file system. Default to "hummock".
    dataDirectory: hummock
    
    # Declaration of the Google Cloud Storage state store backend.
    gcs:
      # Name of the Google Cloud Storage bucket.
      bucket: risingwave
      
      # Root directory of the Google Cloud Storage bucket.
      root: risingwave
    
      # Credentials to access the Google Cloud Storage bucket.
      credentials:
        # Name of the Kubernetes secret that stores the credentials.
        secretName: gcs-credentials
        
        # Key of the service account credentials in the secret.
        serviceAccountCredentialsKeyRef: ServiceAccountCredentials
        
        # Optional, set it to true when the credentials can be retrieved.
        # useWorkloadIdentity: true

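Optional: Customize the state store directory

In the risingwave.yaml file that you use to deploy the RisingWave instance, the spec: stateStore: dataDirectory parameter sets the directory where state data is stored. If you run multiple RisingWave instances, make sure each new instance has a unique dataDirectory value (the default is hummock); otherwise, the new instance may crash. Save your changes to risingwave.yaml before running kubectl apply -f <...risingwave.yaml>. Note that the directory path cannot be an absolute path such as /a/b, and must be no longer than 180 characters.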
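
The two path constraints can be checked before applying the manifest. The following sketch uses a hypothetical helper, `valid_data_directory`. It covers only the rules stated above (not absolute, at most 180 characters) and does not check uniqueness across instances.

```shell
# valid_data_directory DIR: succeed when DIR is not an absolute path and is
# at most 180 characters long.
valid_data_directory() {
  case "$1" in
    /*) return 1 ;;  # absolute paths such as /a/b are rejected
  esac
  [ "${#1}" -le 180 ]
}
```

For example, `valid_data_directory hummock` succeeds, while `valid_data_directory /a/b` fails.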

Verify the instance status

You can check the status of the RisingWave instance by running the following command.

kubectl get risingwave

If the instance is running properly, the output should look like this:

NAME        RUNNING   STORAGE(META)   STORAGE(OBJECT)   AGE
risingwave  True      etcd            S3                30s
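
In a script, the same check can be made by reading the RUNNING column of that output. This is a sketch that assumes the column layout shown above; `is_running` is a hypothetical helper name.

```shell
# is_running NAME: read `kubectl get risingwave` output on stdin and succeed
# only when the row for NAME shows True in the RUNNING column.
is_running() {
  awk -v name="$1" '
    NR > 1 && $1 == name { found = 1; running = ($2 == "True") }
    END { exit !(found && running) }
  '
}
```

For example, `kubectl get risingwave | is_running risingwave` returns a zero exit status once the instance named risingwave reports True.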

Connect to RisingWave

ClusterIP
NodePort
LoadBalancer

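By default, the Operator creates a service of type ClusterIP for the frontend component, through which you can interact with RisingWave. This service is not reachable from outside Kubernetes, so you need to create a standalone Pod for PostgreSQL (psql) inside Kubernetes.

Steps:

Create a Pod.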

kubectl apply -f https://raw.githubusercontent.com/risingwavelabs/risingwave-operator/main/docs/manifests/psql/psql-console.yaml

Attach to the Pod so that you can run commands inside the container.
```
kubectl exec -it psql-console -- bash
```

Connect to RisingWave via psql.

psql -h risingwave-frontend -p 4567 -d dev -U root

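You can connect to RisingWave from the nodes in Kubernetes, such as EC2 instances.

Steps:

In the risingwave.yaml file that you use to deploy the RisingWave instance, add a frontendServiceType parameter to the configuration of the RisingWave service and set its value to NodePort.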
```
# ...
kind: RisingWave
# ...
spec:
  frontendServiceType: NodePort
# ...
```

Connect to RisingWave by running the following commands on the node.

export RISINGWAVE_NAME=risingwave-etcd-hdfs
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get node -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].nodePort}'`

psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root

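If you are using EKS, GCP, or another managed Kubernetes service from a cloud provider, you can expose the service to the public network through a load balancer in the cloud.

Steps:

In the risingwave.yaml file that you use to deploy the RisingWave instance, add a frontendServiceType parameter to the configuration of the RisingWave service and set its value to LoadBalancer.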
```
# ...
kind: RisingWave
# ...
spec:
  frontendServiceType: LoadBalancer
# ...
```

Connect to RisingWave with the following commands.

export RISINGWAVE_NAME=risingwave-etcd-hdfs
export RISINGWAVE_NAMESPACE=default
export RISINGWAVE_HOST=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}'`
export RISINGWAVE_PORT=`kubectl -n ${RISINGWAVE_NAMESPACE} get svc -l risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend -o jsonpath='{.items[0].spec.ports[0].port}'`

psql -h ${RISINGWAVE_HOST} -p ${RISINGWAVE_PORT} -d dev -U root
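
The commands above read the load balancer's IP address. Some providers report a DNS hostname in the service status instead of an IP (AWS load balancers are a common example), in which case the ip field is empty. The following sketch falls back to the hostname field. `first_nonempty` and `load_balancer_host` are hypothetical helper names, and the functions rely on the RISINGWAVE_NAME and RISINGWAVE_NAMESPACE variables exported above.

```shell
# first_nonempty VALUE...: print the first argument that is not empty.
first_nonempty() {
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# load_balancer_host: print the frontend load balancer's IP address, or its
# hostname when no IP is reported.
load_balancer_host() {
  selector="risingwave/name=${RISINGWAVE_NAME},risingwave/component=frontend"
  ip="$(kubectl -n "${RISINGWAVE_NAMESPACE}" get svc -l "$selector" \
    -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')"
  hostname="$(kubectl -n "${RISINGWAVE_NAMESPACE}" get svc -l "$selector" \
    -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')"
  first_nonempty "$ip" "$hostname"
}
```

With these definitions, `export RISINGWAVE_HOST="$(load_balancer_host)"` can stand in for the RISINGWAVE_HOST line above.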

Now you can ingest and transform streaming data. For details, see the quick start guide.
