
Rebuilding Performance Data

When anomalies occur, such as performance statistics not matching expectations, the report data may be corrupted; follow this guide to rebuild it.

1 Impact Notes

During the performance rebuild, the system progressively creates brand-new performance reports. The process is incremental: data is re-analyzed in batches and added to the reports.

As a result, individual reports become available one by one during the rebuild; reports that are not yet available will show a notice that they are being rebuilt.

2 Rebuild Procedure

2.1 Get the ONES version number

See Get the ONES version number; which of the procedures below applies depends on your version.
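Once you know the version, the branch choice can be sketched as a simple version comparison. This is an illustrative helper, not part of the product: `ONES_VERSION` is a placeholder you would fill in yourself, and it relies on GNU `sort -V` for version-aware ordering.

```shell
# Hypothetical helper: decide which rebuild procedure applies.
# "6.104.2" is a placeholder; substitute your actual ONES version.
ONES_VERSION="6.104.2"

# sort -V orders version strings numerically; if 6.103.0 sorts first
# (or equal), the installed version is >= 6.103.0.
if [ "$(printf '%s\n' "6.103.0" "$ONES_VERSION" | sort -V | head -n1)" = "6.103.0" ]; then
  echo "use 'make rebuild-perf' (section 2.2)"
else
  echo "use reset_performance_k3s.sh (section 2.3)"
fi
```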

2.2 Version 6.103.0 and above

2.2.1 Rebuild

Enter the ones-ai-k8s terminal:

ones-ai-k8s.sh

Full rebuild:

make rebuild-perf

2.2.2 Check status

pod=$(kubectl get po -nones | grep bi-sync-etl | awk '{print $1}')
kubectl exec -it -nones "${pod}" -- etl/ones-bi-sync-etl -s status -c etl/config.json

2.3 Versions below 6.103.0

2.3.1 Download the script

curl -O https://res.ones.pro/script/reset_performance_k3s.sh 

For detailed usage, see the script's help output:

root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh 
usage:
[-a]: Reset all performance data, include ones-bi-sync-canal/ones-bi-sync-etl/kafka && clickhouse performance data.
[-s]: Show bi-sync-etl sync status.
[-o]: Optimize tables (task, field_value_history, manhour), to avoid the same data.
[-v]: Show version.

2.3.2 Full resynchronization (-a)

A full resynchronization is typically needed in the following cases:

  1. Some of the data shown in performance is unsynced, or differs from what project shows (possibly a program bug);

  2. The MySQL binlog tables that bi-sync-canal listens to were added or modified; a full resynchronization is required before the data lands in kafka for downstream consumption;

  3. There is dirty data in kafka.

After running -a, you can use -s to check the sync status:

root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -a
---- start to reset all performance data...
localstorageStorageBasePath: /data/ones/ones-local-storage
KafkaReadAddress: kafka-ha:9092
kafkaProjectBinlogTopic: project_binlog

---- step 1 stop bi-sync-*-deployment:
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled

---- step 2 delete topic project_binlog:
kubectl exec -it -nones kafka-ha-0 -- kafka-topics.sh --bootstrap-server kafka-ha:9092 --delete --topic project_binlog

---- step 3 delete ones-bi-sync-*/* files:
rm -rf /data/ones/ones-local-storage/others-static-pvc/ones/ones-bi-sync-*/*bolt

---- step 4 restart bi-sync-*-deployment
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled

---- wait a few seconds for restart ...

---- reset performance success, now you can get bi-sync-etl status by '-s' argv

2.3.3 Fast resync of specified tables (-q)

(Effective in versions >= 6.1.0) Run this option when you need to resynchronize one or more specific tables.

Note that a table selected for resync may depend on other tables; if a dependency has not been synced, or the dependency's data is itself corrupted, the selected table may fail to sync.

Usage example: to resynchronize the task and sprint tables (separate multiple tables with commas, with no spaces in between):

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -q project.task,project.sprint
---- start to reset cdc performance data by the specified tables quickly ...
---- step 1 stop bi-sync-etl-deployment:
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-etl-deployment scaled

---- step 2 set onesBISyncEtlQuickSnapshot* and make setup-ones
2024-07-01 02:19:44,110 [INFO] ones_path=, k8s_root_dir=/data/ones/ones-ai-k8s
2024-07-01 02:19:44,111 [INFO] waiting for lock...
2024-07-01 02:19:44,111 [INFO] starting...
2024-07-01 02:19:44,111 [INFO] render config
… (some output omitted here)
2024-07-01 02:20:32,711 [INFO] setup ones finish
2024-07-01 02:20:32,711 [INFO] elapsed time: 48.600 seconds
---- set config/private.yaml success:
onesBISyncEtlQuickSnapshotTables: project.task,project.sprint
onesBISyncEtlQuickSnapshotVersion: 20240701101943

---- step 3 redeploy ones-bi-sync-etl-deployment
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-etl-deployment scaled

---- wait a few seconds for restart ...

---- reset performance success ----

After the etl service restarts, its log shows that the task and sprint tables were snapshotted again and re-consumed:

... ...
2024/07/01 10:30:22 [INFO] Waiting for preorder snapshots to complete...
2024/07/01 10:30:32 [INFO] Topic initialization completed.
2024/07/01 10:30:32 [INFO] Connector initialization successful.
2024/07/01 10:30:32 [INFO] Consumer quick snapshot: {20240701102842 [project.task_status]}, force snapshot: {1 []}
2024/07/01 10:30:32 [INFO] Consumer quick snapshot all: false, force snapshot all: false
2024/07/01 10:30:32 [INFO] Consumers prepare tables for snapshots: []
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.task
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.sprint
2024/07/01 10:30:32 [INFO] The tables ready to execute snapshots are: [project.task_status]
2024/07/01 10:30:46 [INFO] Consumer bi-sync-etl_project_data start consuming the schema event stream
... ...

2.3.4 Check sync status (-s)

This refers to the sync status of the ones-bi-sync-etl service. Usage: the -s option (can be run repeatedly).

When ONES version >= 6.1.0:

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s // run repeatedly to see sync progress
[project_data] running, dumping // still preparing to sync; real-time progress may not be available yet
[department] running, dump finished, incremental syncing

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s
[project_data] running, dumping, current table: org_user at 1 / 26 (3%) // existing data syncing; real-time progress visible
[department] running, dump finished, incremental syncing

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s
[project_data] running, dumping, current table: project at 6 / 26 (23%)
[department] running, dump finished, incremental syncing

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s
[project_data] running, dumping, current table: field_value at 16 / 26 (61%)
[department] running, dump finished

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s
[project_data] running, dumping, current table: manhour at 24 / 26 (92%)
[department] running, dump finished, incremental syncing

root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -s
// all pipelines have finished syncing existing data and entered normal incremental sync
[project_data] running, dump finished, incremental syncing
[department] running, dump finished, incremental syncing

When ONES version < 6.1.0:

root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -s
Defaulted container "ones-bi-sync-etl" out of: ones-bi-sync-etl, wait-for-mysql (init), wait-for-clickhouse (init), wait-for-kafka (init)
[project_data] running, dumping, prepared, 379963/594657 (63%)
[department] running, dumping, prepared, 594652/594657 (99%)

Criteria for a completed rebuild:

Both pipeline groups, [project_data] and [department], must reach the 99% or 100% state. (99% can be considered complete as well, because the customer environment may keep producing new data during the sync, making a full 100% unreachable.)
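The completion criteria above can be sketched as a small filter over the `-s` output. This is an illustrative helper, not part of the official script: it treats a pipeline as finished when its line reports "dump finished" (the >= 6.1.0 format) or a 99%/100% figure (the < 6.1.0 format), and you would feed it via `sh reset_performance_k3s.sh -s | sync_complete`.

```shell
# Hypothetical helper: decide from the '-s' status lines on stdin
# whether the full sync is done. Prints "complete" or "in progress".
sync_complete() {
  awk '
    /dump finished/ { next }          # pipeline finished its dump
    / \((99|100)%\)/ { next }         # 99% or 100% also counts as done
    { pending = 1 }                   # any other line is still syncing
    END { print (pending ? "in progress" : "complete") }
  '
}
```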

2.3.5 Force-merge table data (-o)

After the sync commands above (-p/-a) finish, refreshing the page may show duplicate rows, because ClickHouse has not yet had time to merge the MergeTree data parts.

The service will run this optimization automatically no later than the early morning hours each day (server time zone), but to avoid confusing users you can also run the -o option right after the -a sync completes to merge the data:

root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -o
clickhouseUser: default
clickhousePassword: ****
clickhousePortTCP: 9000
OPTIMIZE TABLE default.task FINAL success
OPTIMIZE TABLE default.field_value_history FINAL success
OPTIMIZE TABLE default.manhour FINAL success
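Judging from the output above, `-o` amounts to issuing `OPTIMIZE TABLE ... FINAL` for each affected table. The sketch below is an assumption about that mechanism, not taken from the script itself; the pod name `clickhouse-0` and the `clickhouse-client` invocation in the comment are illustrative.

```shell
# Sketch (assumed mechanism): build an OPTIMIZE statement per table.
for table in task field_value_history manhour; do
  echo "OPTIMIZE TABLE default.${table} FINAL"
  # In a real environment you would run it against ClickHouse, e.g.:
  # kubectl exec -it -nones clickhouse-0 -- \
  #   clickhouse-client --query "OPTIMIZE TABLE default.${table} FINAL"
done
```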

3 FAQ

3.1 cdc errors during a full rebuild

See section 1 of the FAQ in Rebuilding Index Data.

3.2 etl errors during a full rebuild

During the rebuild, bash reset_performance.sh -s keeps reporting dumping, and the etl pod's log shows the following error:

[ERROR] 2026-01-05 15:03:54 etl/watchdog-go:86 run rule group: project_data error: code: 241, message: (total) memory limit exceeded: would use 2.97 GiB (attempt to allocate chunk of 4533018 bytes), current RSS 3.60 GiB, maximum: 3.60 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker

The cause is insufficient ClickHouse memory; raise the limit from the default 4G to 8G:

# Enter the ones-ai-k8s terminal
ones-ai-k8s.sh

# Add the configuration
vi config/private.yaml
clickhouseMemoryLimit: 8Gi

# Apply the configuration
make setup-ones

# Rebuild from scratch
make rebuild-cdc