[Featured] If you use both replicated tables and cluster replication, will they conflict?
Posted 2 months ago · by jackpgao · 294 views · from Q&A
  • ClickHouse is known to have two ways of replicating data:
    1. Relying on replicated table engines
    2. Adding replicas to each shard in the cluster configuration
  • The question: if both are set up at the same time, will they conflict?
6 replies

No. The ReplicatedMergeTree engine has a deduplication mechanism that discards identical data.

@artJava So that means the replication logic runs twice?

@artJava ReplicatedMergeTree deletes identical data? Identical in what sense? Isn't the primary key non-unique?

If it doesn't rely on the primary key, how does it decide that rows are identical?
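To my understanding, ReplicatedMergeTree deduplication is block-level rather than row-level: each inserted block's checksum is recorded in ZooKeeper, and a byte-identical block re-sent within the deduplication window is silently dropped. A sketch (table and path names are illustrative, not from the thread):

```sql
-- Illustrative ReplicatedMergeTree table; {shard} and {replica} are
-- substituted from each node's <macros> configuration.
CREATE TABLE events
(
    ts  DateTime,
    msg String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY ts;

-- First insert: the block's checksum is stored in ZooKeeper and the data
-- is accepted.
INSERT INTO events VALUES ('2020-01-01 00:00:00', 'hello');

-- Re-sending the exact same block (e.g. a client retry after a timeout)
-- matches the stored checksum and is discarded as a duplicate. Note this
-- compares whole inserted blocks, not individual rows by primary key.
INSERT INTO events VALUES ('2020-01-01 00:00:00', 'hello');
```

This is why the primary key being non-unique doesn't matter: two different inserts that merely contain overlapping rows are not deduplicated, only identical blocks are.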

@jackpgao

The four possible combinations are as follows:

Non-replicated tables, internal_replication=false. Data inserted into the Distributed table is inserted into both local tables and if there are no problems during inserts, the data on both local tables stays in sync. We call this “poor man’s replication” because replicas easily diverge in case of network problems and there is no easy way to determine which one is the correct replica.

Replicated tables, internal_replication=true. Data inserted into the Distributed table is inserted into only one of the local tables, but is transferred to the table on the other host via the replication mechanism. Thus data on both local tables stays in sync. This is the recommended configuration.

Non-replicated tables, internal_replication=true. Data is inserted into only one of the local tables, but there is no mechanism to transfer it to the other table. So local tables on different hosts end up with different data and you get confusing results when querying the Distributed table. Obviously this is an incorrect way to configure ClickHouse cluster.

Replicated tables, internal_replication=false. Data is inserted into both local tables, but thanks to the deduplication feature of Replicated tables only the first insert goes through and the insert into the other table on the other host gets silently discarded because it is a duplicate. The first insert is then replicated to the other table. So in this case nothing overtly bad happens, replicas stay in sync, but there is significant performance degradation due to the constant stream of duplicates. So you should avoid this configuration too and use configuration 2 instead.
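The recommended scenario 2 (Replicated tables, internal_replication=true) corresponds to a cluster definition roughly like the following sketch; cluster and host names are placeholders, not from the thread:

```xml
<!-- Sketch of the recommended configuration: one shard with two
     replicas. With internal_replication=true, the Distributed table
     writes each block to only ONE replica, and ReplicatedMergeTree
     propagates it to the other via ZooKeeper-coordinated replication. -->
<remote_servers>
    <my_cluster>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>ch-node-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>ch-node-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </my_cluster>
</remote_servers>
```

With internal_replication=false and Replicated tables (scenario 4), the same layout still converges, but every insert produces a stream of duplicate blocks that the deduplication machinery has to discard, which is the performance penalty described above.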

No conflict. I tried setting up a cluster with 3 shards and 3 replicas: each shard maps to 3 replicas with identical content, and all three shards use the same three nodes, so it's a 3-to-3 arrangement.

After testing, it achieved the desired effect: every node holds the complete data of the whole cluster.
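A sketch of the table definitions such a layout typically uses, assuming a cluster named `my_cluster` and a table named `events_local` (both names are illustrative):

```sql
-- Run on every node: a replicated local table. {shard} and {replica}
-- come from each node's <macros> config, so the same DDL works on all
-- nodes while each shard's replicas share a ZooKeeper path.
CREATE TABLE events_local
(
    ts  DateTime,
    msg String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events_local', '{replica}')
ORDER BY ts;

-- The Distributed table spreads inserts across the 3 shards. Because
-- every shard has a replica on every node, each node ends up holding
-- the full data set, matching the effect described above.
CREATE TABLE events_all AS events_local
ENGINE = Distributed('my_cluster', default, events_local, rand());
```

With internal_replication=true in the cluster config, each block is written to one replica per shard and replication does the rest, so the two mechanisms cooperate rather than conflict.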
