Rebuilding Performance Data
Procedure
Create the script
Create the script file reset_performance_k3s.sh on the server; its contents are in the "Script file" section at the bottom of this document.
Run the script without arguments to see the help text:
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh
usage:
[-a]: Reset all performance data, including the ones-bi-sync-canal/ones-bi-sync-etl/kafka and clickhouse performance data.
[-s]: Show bi-sync-etl sync status.
[-o]: Optimize tables (task, field_value_history, manhour) to avoid duplicate rows.
[-v]: Show version.
Full resync (-a)
A full resync is typically needed in the following cases:
Data shown in performance is partially out of sync, or differs from what project shows (possibly a program bug);
The bi-sync-canal binlog listeners on MySQL tables were added or changed, in which case a full resync is needed before the data can land in kafka for downstream consumers;
There is dirty data in kafka.
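For the dirty-data case, it can help to peek at the topic before deciding on a full resync. A minimal sketch, assuming the kafka-ha-0 pod, the kafka-ha:9092 address, and the project_binlog topic that appear in the captured outputs in this document:

```shell
# peek_binlog_topic: describe the performance binlog topic and print its
# first N raw messages for inspection. Pod name, bootstrap address, and
# topic name are taken from the outputs in this document.
peek_binlog_topic() {
  n=${1:-5}
  pod=$(kubectl get po -nones | grep kafka-ha-0 | awk '{print $1}')
  kubectl exec -nones "${pod}" -- kafka-topics.sh \
    --bootstrap-server kafka-ha:9092 --describe --topic project_binlog
  kubectl exec -nones "${pod}" -- kafka-console-consumer.sh \
    --bootstrap-server kafka-ha:9092 --topic project_binlog \
    --from-beginning --max-messages "${n}"
}
# usage: peek_binlog_topic 5
```

If the messages look malformed for the tables involved, running -a will drop and recreate the topic anyway.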
After running -a, you can use -s to check the sync status:
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -a
---- start to reset all performance data...
localstorageStorageBasePath: /data/ones/ones-local-storage
KafkaReadAddress: kafka-ha:9092
kafkaProjectBinlogTopic: project_binlog
---- step 1 stop bi-sync-*-deployment:
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled
---- step 2 delete topic project_binlog:
kubectl exec -it -nones kafka-ha-0 -- kafka-topics.sh --bootstrap-server kafka-ha:9092 --delete --topic project_binlog
---- step 3 delete ones-bi-sync-*/* files:
rm -rf /data/ones/ones-local-storage/others-static-pvc/ones/ones-bi-sync-*/*bolt
---- step 4 restart bi-sync-*-deployment
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled
---- wait a few seconds for restart ...
---- reset performance success, now you can get bi-sync-etl status by '-s' argv
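The script only waits a few seconds after scaling the deployments back up; before checking status it can be worth confirming they have actually finished rolling out. A sketch using kubectl rollout status with the deployment names from the output above (the timeout value is an arbitrary choice):

```shell
# wait_bi_sync_ready: block until both bi-sync deployments report their
# pods available again after the -a reset scales them back to 1 replica.
wait_bi_sync_ready() {
  kubectl -n ones rollout status deployment/ones-bi-sync-canal-deployment --timeout=120s
  kubectl -n ones rollout status deployment/ones-bi-sync-etl-deployment --timeout=120s
}
# usage: wait_bi_sync_ready && sh reset_performance_k3s.sh -s
```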
Quick resync of specified tables (-q)
(Effective for ONES versions >= 6.1.0.) Run this option when one or more specific tables need to be resynced.
Note that a table you resync may depend on other tables; if a dependency has not been synced, or the dependency's own data is broken, the resync of the specified table may fail.
Usage example: resync the task and sprint tables (separate multiple tables with commas, with no spaces in between):
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -q project.task,project.sprint
---- start to reset cdc performance data by the specified tables quickly ...
---- step 1 stop bi-sync-etl-deployment:
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-etl-deployment scaled
---- step 2 set onesBISyncEtlQuickSnapshot* and make setup-ones
2024-07-01 02:19:44,110 [INFO] ones_path=, k8s_root_dir=/data/ones/ones-ai-k8s
2024-07-01 02:19:44,111 [INFO] waiting for lock...
2024-07-01 02:19:44,111 [INFO] starting...
2024-07-01 02:19:44,111 [INFO] render config
… … (some output omitted here)
2024-07-01 02:20:32,711 [INFO] setup ones finish
2024-07-01 02:20:32,711 [INFO] elapsed time: 48.600 seconds
---- set config/private.yaml success:
onesBISyncEtlQuickSnapshotTables: project.task,project.sprint
onesBISyncEtlQuickSnapshotVersion: 20240701101943
---- step 3 redeploy ones-bi-sync-etl-deployment
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-etl-deployment scaled
---- wait a few seconds for restart ...
---- reset performance success ----
After the etl service restarts, you can see that it takes new snapshots of the task and sprint tables and re-consumes them:
... ...
2024/07/01 10:30:22 [INFO] Waiting for preorder snapshots to complete...
2024/07/01 10:30:32 [INFO] Topic initialization completed.
2024/07/01 10:30:32 [INFO] Connector initialization successful.
2024/07/01 10:30:32 [INFO] Consumer quick snapshot: {20240701102842 [project.task_status]}, force snapshot: {1 []}
2024/07/01 10:30:32 [INFO] Consumer quick snapshot all: false, force snapshot all: false
2024/07/01 10:30:32 [INFO] Consumers prepare tables for snapshots: []
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.task
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.sprint
2024/07/01 10:30:32 [INFO] The tables ready to execute snapshots are: [project.task_status]
2024/07/01 10:30:46 [INFO] Consumer bi-sync-etl_project_data start consuming the schema event stream
... ...
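To double-check what -q wrote, the onesBISyncEtlQuickSnapshot* keys can be read back from config/private.yaml on the installer pod. A sketch, reusing the installer-api access pattern from the script at the bottom of this document:

```shell
# show_quick_snapshot_cfg: print the quick-snapshot keys that the -q option
# appended to config/private.yaml inside the installer-api container.
show_quick_snapshot_cfg() {
  kubectl -n ones-installer exec deploy/installer-api -c installer-api -- \
    grep "onesBISyncEtlQuickSnapshot" config/private.yaml
}
# usage: show_quick_snapshot_cfg
```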
Checking the sync status (-s)
This shows the sync status of the ones-bi-sync-etl service. Usage: run the -s option (it can be executed repeatedly).
For ONES versions >= 6.1.0:
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s // run repeatedly to watch the sync progress
[project_data] running, dumping // still preparing to sync; real-time progress may not be shown yet
[department] running, dump finished, incremental syncing
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s
[project_data] running, dumping, current table: org_user at 1 / 26 (3%) // full dump in progress; real-time progress is shown
[department] running, dump finished, incremental syncing
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s
[project_data] running, dumping, current table: project at 6 / 26 (23%)
[department] running, dump finished, incremental syncing
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s
[project_data] running, dumping, current table: field_value at 16 / 26 (61%)
[department] running, dump finished
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s
[project_data] running, dumping, current table: manhour at 24 / 26 (92%)
[department] running, dump finished, incremental syncing
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh performance_tool_k3s.sh -s
// all pipelines have finished the full dump and entered normal incremental sync
[project_data] running, dump finished, incremental syncing
[department] running, dump finished, incremental syncing
For ONES versions < 6.1.0:
root@iZwz94kqmvp7aa5zkju609Z:~# sh performance_tool_k3s.sh -s
Defaulted container "ones-bi-sync-etl" out of: ones-bi-sync-etl, wait-for-mysql (init), wait-for-clickhouse (init), wait-for-kafka (init)
[project_data] running, dumping, prepared, 379963/594657 (63%)
[department] running, dumping, prepared, 594652/594657 (99%)
Criteria for a completed rebuild:
Both the [project_data] and [department] pipeline groups must reach 99% or 100%. (99% can also be treated as complete, because the customer environment may keep producing new data during the sync, making an exact 100% unreachable.)
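Instead of rerunning -s by hand, the check can be looped until every pipeline line reports incremental sync (the >= 6.1.0 completion state above). A sketch: the 30-second interval is arbitrary, and the `^\[` filter assumes pipeline status lines always start with a bracketed name:

```shell
# wait_sync_done: poll the -s status until no pipeline line is left that
# has not reached "incremental syncing".
wait_sync_done() {
  while sh performance_tool_k3s.sh -s | grep -E '^\[' | grep -qv 'incremental syncing'; do
    sleep 30
  done
  echo "all pipelines are in incremental sync"
}
# usage: wait_sync_done
```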
Force-merging table data (-o)
After running the sync commands above (-q/-a), refreshing the page may show duplicate rows, because ClickHouse has not yet performed the MergeTree merge. For example:
#492840 Performance data counted twice https://our.ones.pro/project/#/team/RDjYMhKq/task/W32D3WkfokwFOKty
The service performs this optimization automatically no later than the early morning hours (server timezone) each day, but to avoid confusing customers you can also run the -o option once the -a sync completes, to merge the data immediately:
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -o
clickhouseUser: default
clickhousePassword: ****
clickhousePortTCP: 9000
OPTIMIZE TABLE default.task FINAL success
OPTIMIZE TABLE default.field_value_history FINAL success
OPTIMIZE TABLE default.manhour FINAL success
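Whether the merge removed the duplicates can be spot-checked by comparing a table's raw row count with its count under FINAL (which deduplicates at query time). A sketch, assuming the same clickhouse-statefulset-0 pod the -o option uses; in a secured setup the --user/--password/--port flags from the script would also be needed:

```shell
# check_duplicates: print the raw and FINAL row counts of a ClickHouse
# table; a gap between the two means unmerged duplicate rows remain.
check_duplicates() {
  table=${1:?usage: check_duplicates <table>}
  pod=$(kubectl get po -nones | grep clickhouse-statefulset-0 | awk '{print $1}')
  kubectl exec -nones "${pod}" -- clickhouse-client -q \
    "SELECT count() AS raw, (SELECT count() FROM default.${table} FINAL) AS merged FROM default.${table}"
}
# usage: check_duplicates task
```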
Checking the version (-v)
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -v
v1.0.1 20240603 performance tool for k3s deployment
Script file
#!/bin/bash
# If this ops script's functionality is updated, update the performance ops doc accordingly and notify the ops team
version="v1.1.0 20240628 performance tool for k3s deployment"
namespace=ones
getLocalstorageStorageBasePath() {
kubectlInstallerCmd="kubectl -n ones-installer exec deploy/installer-api -c installer-api -- "
localstorageStorageBasePath=$(${kubectlInstallerCmd} make get_value KEY=localstorageStorageBasePath)
if [ "$localstorageStorageBasePath" = "" ]; then
echo "[ERROR] get localstorageStorageBasePath empty! "
exit 1
fi
echo "${localstorageStorageBasePath}"
}
etlStatus(){
# Print the current etl sync status
pod=$(kubectl get po -nones | grep bi-sync-etl | awk '{print $1}')
kubectl exec -it -nones "${pod}" -- etl/ones-bi-sync-etl -s status -c etl/config.json
}
# Force one data optimization after the sync completes to avoid duplicate rows; normally the etl service runs one every day in the early morning
optimizeCHTable(){
# Fetch the ClickHouse username, password, and TCP port
kubectlInstallerCmd="kubectl -n ones-installer exec deploy/installer-api -c installer-api -- "
ckUser=$(${kubectlInstallerCmd} make get_value KEY=clickhouseUser)
if [ "$ckUser" = "" ]; then
echo "[ERROR] get clickhouseUser empty! "
exit 1
fi
echo "clickhouseUser: ${ckUser}"
ckPassWord=$(${kubectlInstallerCmd} make get_value KEY=clickhousePassword)
if [ "$ckPassWord" = "" ]; then
echo "[ERROR] get clickhousePassword empty! "
exit 1
fi
echo "clickhousePassword: ****"
ckTcpPort=$(${kubectlInstallerCmd} make get_value KEY=clickhousePortTCP)
if [ "$ckTcpPort" = "" ]; then
echo "[ERROR] get clickhousePortTCP empty! "
exit 1
fi
echo "clickhousePortTCP: ${ckTcpPort}"
# Run the optimization
pod=$(kubectl get po -nones | grep clickhouse-statefulset-0 | awk '{print $1}')
kubectl exec -it -nones "${pod}" -- clickhouse-client -q "OPTIMIZE TABLE default.task FINAL" --port "${ckTcpPort}" --user "${ckUser}" --password "${ckPassWord}"
echo "OPTIMIZE TABLE default.task FINAL success"
kubectl exec -it -nones "${pod}" -- clickhouse-client -q "OPTIMIZE TABLE default.field_value_history FINAL" --port "${ckTcpPort}" --user "${ckUser}" --password "${ckPassWord}"
echo "OPTIMIZE TABLE default.field_value_history FINAL success"
kubectl exec -it -nones "${pod}" -- clickhouse-client -q "OPTIMIZE TABLE default.manhour FINAL" --port "${ckTcpPort}" --user "${ckUser}" --password "${ckPassWord}"
echo "OPTIMIZE TABLE default.manhour FINAL success"
}
forceRebuildALLPerformance(){
echo "---- start to reset all performance data..."
sleep 1
set -e
kubectlInstallerCmd="kubectl -n ones-installer exec deploy/installer-api -c installer-api -- "
# 0. Fetch the kafka address, topic name, and storage path used for the rebuild
localstorageStorageBasePath=$(getLocalstorageStorageBasePath)
echo "localstorageStorageBasePath: ${localstorageStorageBasePath}"
KafkaReadAddress=$(${kubectlInstallerCmd} make get_value KEY=KafkaReadAddress)
if [ "$KafkaReadAddress" = "" ]; then
echo "[ERROR] get KafkaReadAddress empty! "
exit 1
fi
echo "KafkaReadAddress: ${KafkaReadAddress}"
performanceTopicName=$(${kubectlInstallerCmd} make get_value KEY=kafkaProjectBinlogTopic)
if [ "$performanceTopicName" = "" ]; then
echo "[ERROR] get kafkaProjectBinlogTopic empty! "
exit 1
fi
echo "kafkaProjectBinlogTopic: ${performanceTopicName}
"
# 1. Scale replicas to 0 to stop the performance pipeline sync services
echo "---- step 1 stop bi-sync-*-deployment:"
deploymentScaleCmd="kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas="
replicas=0
stopDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${stopDeploymentCmd}"
if ! $stopDeploymentCmd; then
echo "[ERROR] stop bi-sync-*-deployment failed !"
exit 1
fi
echo ""
# 2. Delete the performance topic in kafka (the bi-sync-etl service recreates the topic on startup)
echo "---- step 2 delete topic ${performanceTopicName}:"
pod=$(kubectl get po -nones | grep kafka-ha-0 | awk '{print $1}')
delTopicCmd="kubectl exec -it -nones ${pod} -- kafka-topics.sh --bootstrap-server ${KafkaReadAddress} --delete --topic ${performanceTopicName}"
echo "${delTopicCmd}
"
ret=$($delTopicCmd)
if [ "$ret" != "" ]; then
retGroup="$(echo "${ret}" | grep "does not exist as expected")" # the topic does not exist; treat the delete as successful
if [ "${retGroup}" = "" ]; then
echo "[ERROR] delete topic ${performanceTopicName} failed !
"
exit 1
fi
fi
# 3. Delete the bolt offset files
echo "---- step 3 delete ones-bi-sync-*/* files:"
delBoltFilesCmd="rm -rf ${localstorageStorageBasePath}/others-static-pvc/${namespace}/ones-bi-sync-*/*bolt"
echo "${delBoltFilesCmd}
"
if ! $delBoltFilesCmd; then
echo "[ERROR] delete bolt files failed !"
exit 1
fi
# 4. Scale replicas back to 1 to start the service pods
echo "---- step 4 redeploy bi-sync-*-deployment"
replicas=1
startDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${startDeploymentCmd}"
if ! $startDeploymentCmd; then
echo "[ERROR] start bi-sync-*-deployment failed !"
exit 1
fi
}
forceRebuildALLPerformanceCDC(){
echo "---- start to reset all cdc performance data ..."
sleep 1
set -e
# 0. Fetch the storage path used for the rebuild
localstorageStorageBasePath=$(getLocalstorageStorageBasePath)
echo "localstorageStorageBasePath: ${localstorageStorageBasePath}"
# 1. Scale replicas to 0 to stop the performance pipeline sync service
echo "---- step 1 stop bi-sync-etl-deployment:"
deploymentScaleCmd="kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas="
replicas=0
stopDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${stopDeploymentCmd}"
if ! $stopDeploymentCmd; then
echo "[ERROR] stop bi-sync-etl-deployment failed !"
exit 1
fi
sleep 2
echo ""
# 2. Delete the bolt offset files
echo "---- step 2 delete ones-bi-sync-*/* files:"
delBoltFilesCmd="rm -rf ${localstorageStorageBasePath}/others-static-pvc/${namespace}/ones-bi-sync-*/*bolt"
echo "${delBoltFilesCmd}
"
if ! $delBoltFilesCmd; then
echo "[ERROR] delete bolt files failed !"
exit 1
fi
# 3. Set the cdc etl config to force a full rebuild
echo "---- step 3 set onesBISyncEtlForceSnapshot* and make setup-ones"
ret=$(kubectl -n ones-installer exec deploy/installer-api -c installer-api -- bash -c "
grep -vE onesBISyncEtl.*Snapshot config/private.yaml > config/private_rebuild_performance.yaml
echo 'onesBISyncEtlForceSnapshotAll: true' >> config/private_rebuild_performance.yaml
echo 'onesBISyncEtlForceSnapshotVersion: $(date +'%Y%m%d%H%M%S')' >> config/private_rebuild_performance.yaml
mv config/private_rebuild_performance.yaml config/private.yaml;tail -2 config/private.yaml
make setup-ones")
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
echo "[ERROR] ${ret}
"
exit 1
else
echo "---- set config/private.yaml success: "
echo "${ret}" | grep "onesBISyncEtl"
echo ""
fi
# 4. Scale replicas back to 1 to start the service pod
echo "---- step 4 redeploy ones-bi-sync-etl-deployment"
replicas=1
startDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${startDeploymentCmd}"
if ! $startDeploymentCmd; then
echo "[ERROR] start ones-bi-sync-etl-deployment failed !"
exit 1
fi
}
quickRebuildALLPerformanceCDC(){
echo "---- start to reset cdc performance data by the specified tables quickly ..."
sleep 1
set -e
# Scale replicas to 0 to stop the performance pipeline sync service
echo "---- step 1 stop bi-sync-etl-deployment:"
deploymentScaleCmd="kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas="
replicas=0
stopDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${stopDeploymentCmd}"
if ! $stopDeploymentCmd; then
echo "[ERROR] stop bi-sync-etl-deployment failed !"
exit 1
fi
sleep 2
echo ""
# Set the cdc etl config for a quick rebuild of the specified tables
echo "---- step 2 set onesBISyncEtlQuickSnapshot* and make setup-ones"
ret=$(kubectl -n ones-installer exec deploy/installer-api -c installer-api -- bash -c "
grep -vE onesBISyncEtl.*Snapshot config/private.yaml > config/private_rebuild_performance.yaml
echo 'onesBISyncEtlQuickSnapshotTables: $2' >> config/private_rebuild_performance.yaml
echo 'onesBISyncEtlQuickSnapshotVersion: $(date +'%Y%m%d%H%M%S')' >> config/private_rebuild_performance.yaml
mv config/private_rebuild_performance.yaml config/private.yaml;tail -2 config/private.yaml
make setup-ones")
# shellcheck disable=SC2181
if [ $? -ne 0 ]; then
echo "[ERROR] ${ret}
"
exit 1
else
echo "---- set config/private.yaml success: "
echo "${ret}" | grep "onesBISyncEtl"
echo ""
fi
# Scale replicas back to 1 to start the service pod
echo "---- step 3 redeploy ones-bi-sync-etl-deployment"
replicas=1
startDeploymentCmd="${deploymentScaleCmd}${replicas}"
echo "${startDeploymentCmd}"
if ! $startDeploymentCmd; then
echo "[ERROR] start ones-bi-sync-etl-deployment failed !"
exit 1
fi
echo ""
echo "---- wait a few seconds for restart ...
"
sleep 8
echo "---- reset performance success ----"
}
argv=$1
# If a kafka-cdc-connect pod exists this is a cdc environment; otherwise it is a legacy environment
kafkaCdcConnect=$(kubectl get po -nones | grep kafka-cdc-connect | awk '{print $1}')
if [ "$argv" = "-a" ]; then
if [ "$kafkaCdcConnect" != "" ]; then
forceRebuildALLPerformanceCDC
else
forceRebuildALLPerformance # (legacy compatibility) deletes the performance data of bi-sync-canal, bi-sync-etl, and kafka
fi
echo ""
echo "---- wait a few seconds for restart ...
"
sleep 8
echo "---- reset performance success ----"
elif [ "$argv" = "-q" ]; then
if [ "$2" = "" ]; then
echo "[ERROR] please specify db.table for rebuild"
exit 1
fi
quickRebuildALLPerformanceCDC "$@"
elif [ "$argv" = "-s" ]; then
etlStatus
elif [ "$argv" = "-o" ]; then
optimizeCHTable
elif [ "$argv" = "-v" ]; then
echo "${version}"
else
if [ "$kafkaCdcConnect" != "" ]; then
echo "usage:
[-a]: Reset the performance data, including the ones-bi-sync-etl/clickhouse performance data, with kafka cdc snapshots.
[-q]: Quickly reset the performance data of the specified tables. eg [-q project.task,project.sprint]"
else
echo "usage:
[-a]: Reset all performance data, including the ones-bi-sync-canal/ones-bi-sync-etl/kafka and clickhouse performance data."
fi
echo " [-s]: Show bi-sync-etl sync status.
[-o]: Optimize tables (task, field_value_history, manhour) to avoid duplicate rows.
[-v]: Show version."
fi