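Rebuilding Performance Data
When Performance statistics look wrong (for example, they do not match expectations), the underlying report data may be faulty. Follow this guide to rebuild it.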
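1 Impact
During a rebuild, the system creates brand-new Performance reports step by step. The process is incremental: data is re-analyzed in batches and added to the new reports.
As a result, reports become available again one by one while the rebuild is running; any report that is not yet available shows a notice that it is being rebuilt.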
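2 Rebuild procedure
2.1 Get the ONES version number
See "Get the ONES version number". Which of the procedures below applies depends on the version.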
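The version check above can be sketched as a small shell helper. This is only an illustration: the function names `version_ge` and `rebuild_procedure` are hypothetical, the version is assumed to be a plain dotted number such as 6.103.0, and GNU `sort -V` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper (not part of ONES): pick the rebuild procedure for a version.
# Assumes plain dotted versions (e.g. 6.103.0) and GNU coreutils `sort -V`.

# Returns 0 (true) when version $1 >= version $2.
version_ge() {
    # sort -V puts the lower version first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the command this guide uses for version $1.
rebuild_procedure() {
    if version_ge "$1" "6.103.0"; then
        echo "make rebuild-perf"            # section 2.2
    else
        echo "reset_performance_k3s.sh"     # section 2.3
    fi
}
```

For example, `rebuild_procedure 6.20.1` selects the script-based procedure, because version sorting treats 20 as lower than 103.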
2.2 Version 6.103.0 and later
2.2.1 Rebuild
Enter the ones-ai-k8s operations terminal:
ones-ai-k8s.sh
Run a full rebuild:
make rebuild-perf
2.2.2 Check status
pod=$(kubectl get po -nones | grep bi-sync-etl | awk '{print $1}')
kubectl exec -it -nones "${pod}" -- etl/ones-bi-sync-etl -s status -c etl/config.json
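The `grep` in the pod lookup above can return more than one name, for example while an old pod is still terminating. A minimal sketch of a stricter selection follows; `pick_etl_pod` is a hypothetical helper, and it assumes the default `kubectl get po` column layout (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get po` output on stdin and print the
# first bi-sync-etl pod whose STATUS column (3rd field) is "Running".
pick_etl_pod() {
    awk '$1 ~ /bi-sync-etl/ && $3 == "Running" { print $1; exit }'
}

# Usage against a live cluster (same namespace as above):
#   pod=$(kubectl get po -nones | pick_etl_pod)
#   [ -n "$pod" ] || echo "no running bi-sync-etl pod found" >&2
```

If no matching pod is running, the function prints nothing, so the caller can test for an empty value before running `kubectl exec`.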
2.3 Versions below 6.103.0
2.3.1 Download the script
curl -O https://res.ones.pro/script/reset_performance_k3s.sh
Run the script without arguments to see its usage:
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh
usage:
[-a]: Reset all performance data, include ones-bi-sync-canal/ones-bi-sync-etl/kafka && clickhouse performance data.
[-s]: Show bi-sync-etl sync status.
[-o]: Optimize tables (task, field_value_history, manhour), to avoid the same data.
[-v]: Show version.
2.3.2 Full resync (-a)
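A full resync is usually needed in the following cases:
- Some data shown in Performance has not been synced, or differs from what Project shows (possibly because of a program bug).
- The set of MySQL tables whose binlog bi-sync-canal listens to was extended or changed; the data must be fully resynced before it lands in Kafka for downstream consumers.
- Kafka contains dirty data.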
Run the script with -a; afterwards, use -s to check the sync status:
root@iZwz94kqmvp7aa5zkju609Z:~# sh reset_performance_k3s.sh -a
---- start to reset all performance data...
localstorageStorageBasePath: /data/ones/ones-local-storage
KafkaReadAddress: kafka-ha:9092
kafkaProjectBinlogTopic: project_binlog
---- step 1 stop bi-sync-*-deployment:
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled
---- step 2 delete topic project_binlog:
kubectl exec -it -nones kafka-ha-0 -- kafka-topics.sh --bootstrap-server kafka-ha:9092 --delete --topic project_binlog
---- step 3 delete ones-bi-sync-*/* files:
rm -rf /data/ones/ones-local-storage/others-static-pvc/ones/ones-bi-sync-*/*bolt
---- step 4 restart bi-sync-*-deployment
kubectl -n ones scale deployment ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-canal-deployment scaled
deployment.apps/ones-bi-sync-etl-deployment scaled
---- wait a few seconds for restart ...
---- reset performance success, now you can get bi-sync-etl status by '-s' argv
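The transcript above ends by waiting a few seconds for the restart. As an alternative to a fixed wait, one option is to block until both deployments report a finished rollout with `kubectl rollout status`. The sketch below is not part of reset_performance_k3s.sh; `wait_for_bi_sync` and the `KUBECTL` override are hypothetical, the 120-second default is an arbitrary choice, and the deployment names are taken from the transcript.

```shell
#!/bin/sh
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands
# without touching a cluster.
KUBECTL=${KUBECTL:-kubectl}

# Hypothetical helper: wait until both bi-sync deployments have rolled out.
# $1: timeout in seconds per deployment (default 120, an arbitrary choice).
wait_for_bi_sync() {
    timeout_s=${1:-120}
    for d in ones-bi-sync-canal-deployment ones-bi-sync-etl-deployment; do
        # rollout status exits non-zero if the rollout does not finish in time.
        "$KUBECTL" rollout status "deployment/$d" -n ones --timeout="${timeout_s}s" || return 1
    done
}
```

Once `wait_for_bi_sync` returns successfully, running the script with -s should show the status of a freshly started etl service rather than one that is still restarting.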
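2.3.3 Quick resync of specified tables (-q)
(Available in version 6.1.0 and later.) Use this option to resync one or more specific tables.
Note that a table you resync may depend on other tables. If a table it depends on has not been synced, or the data in that table is itself faulty, the resync of the specified table may fail.
Usage: for example, to resync the task and sprint tables (separate multiple tables with commas, with no spaces in between):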
root@iZwz90sbtj7ak4ns2gr2heZ:~# sh reset_performance_k3s.sh -q project.task,project.sprint
---- start to reset cdc performance data by the specified tables quickly ...
---- step 1 stop bi-sync-etl-deployment:
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=0
deployment.apps/ones-bi-sync-etl-deployment scaled
---- step 2 set onesBISyncEtlQuickSnapshot* and make setup-ones
2024-07-01 02:19:44,110 [INFO] ones_path=, k8s_root_dir=/data/ones/ones-ai-k8s
2024-07-01 02:19:44,111 [INFO] waiting for lock...
2024-07-01 02:19:44,111 [INFO] starting...
2024-07-01 02:19:44,111 [INFO] render config
… … (some output omitted here)
2024-07-01 02:20:32,711 [INFO] setup ones finish
2024-07-01 02:20:32,711 [INFO] elapsed time: 48.600 seconds
---- set config/private.yaml success:
onesBISyncEtlQuickSnapshotTables: project.task,project.sprint
onesBISyncEtlQuickSnapshotVersion: 20240701101943
---- step 3 redeploy ones-bi-sync-etl-deployment
kubectl -n ones scale deployment ones-bi-sync-etl-deployment --replicas=1
deployment.apps/ones-bi-sync-etl-deployment scaled
---- wait a few seconds for restart ...
---- reset performance success ----
After the etl service restarts, its log shows that the task and sprint tables are snapshotted and consumed again:
... ...
2024/07/01 10:30:22 [INFO] Waiting for preorder snapshots to complete...
2024/07/01 10:30:32 [INFO] Topic initialization completed.
2024/07/01 10:30:32 [INFO] Connector initialization successful.
2024/07/01 10:30:32 [INFO] Consumer quick snapshot: {20240701102842 [project.task_status]}, force snapshot: {1 []}
2024/07/01 10:30:32 [INFO] Consumer quick snapshot all: false, force snapshot all: false
2024/07/01 10:30:32 [INFO] Consumers prepare tables for snapshots: []
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.task
2024/07/01 10:30:32 [INFO] The snapshot record message is: &{Timestamp:1719801837520 AppName:bi-sync-etl_project_data}, table: project.sprint
2024/07/01 10:30:32 [INFO] The tables ready to execute snapshots are: [project.task_status]
2024/07/01 10:30:46 [INFO] Consumer bi-sync-etl_project_data start consuming the schema event stream
... ...
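Because -q expects a comma-separated table list without spaces, it can help to check the argument before calling the script. The sketch below is an illustration only: `valid_table_list` is a hypothetical helper, and it assumes every entry is written as database.table using letters, digits, and underscores, as in the project.task example above.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when $1 looks like db.table[,db.table...]
# with no whitespace anywhere in the list.
valid_table_list() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z0-9_]+\.[A-Za-z0-9_]+(,[A-Za-z0-9_]+\.[A-Za-z0-9_]+)*$'
}

# Usage:
#   tables="project.task,project.sprint"
#   valid_table_list "$tables" && sh reset_performance_k3s.sh -q "$tables"
```

The check only covers the format; it does not verify that the tables exist or that the tables they depend on are already synced.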