Building a data warehouse from scratch is a challenging task. You have to build a solid ETL pipeline, provide highly available databases, and expose API endpoints with batch processing and rate limiting.
Data warehouses are like dams: several applications might depend on them, and any misfortune can cause disaster. I am not going to go any deeper into what data warehouses are or how to build one. In this article, I am going to explain how you can scale your ClickHouse StatefulSets, the heart of the warehouse, on your Kubernetes cluster with sharding and replication.
What do Replication and Sharding Mean?
Replication duplicates data across multiple servers to ensure high availability and fault tolerance. Sharding, on the other hand, distributes a large dataset across several servers so you can manage large volumes of data and handle high-throughput operations.
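To make that concrete, here is the layout we build below: 2 shards with 2 replicas each, four nodes in total. The data is split into two halves, and each half is stored twice:

shard 1: replica 1 + replica 2   -> first half of the data, stored on two nodes
shard 2: replica 1 + replica 2   -> second half of the data, stored on two nodes

Losing any single node never loses data, and queries can fan out across both shards in parallel.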
That sounds awesome, and it is exactly what we need for our data warehouse. Building a database engine that provides sharding and replication is not our concern for now. Most well-known database engines already support both; Postgres, for instance, can be scaled with a Patroni cluster, and their official docs cover the details. In our case, we are going to follow ClickHouse's official documentation for scaling. However, that documentation targets a Dockerized setup, so in this article we adapt the same steps to a Kubernetes cluster.
All set. Let's start.
You can refer to the YAML files in this repository: ahmethedev/ch-2R2S-cluster for a ready-to-use ClickHouse cluster configuration with 2 shards and 2 replicas.
1. Set Up ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
ZooKeeper is a crucial service for our case. We are planning a 4-pod design for scaling ClickHouse: 2 shards, each with 2 replicas. Replication and sharding demand careful coordination, and ClickHouse uses ZooKeeper to store metadata about the cluster, such as which node holds which table and which data parts have already been replicated and which have not yet. ZooKeeper also plays a critical role in distributed DDL statements such as CREATE or ALTER: if you run an ON CLUSTER DDL statement on one node, say clickhouse-cluster-0, the DDL task queue in ZooKeeper ensures the same statement is executed on the remaining nodes. It also handles leader election among the replicas. As you can see, we use ZooKeeper for all of the distributed system requirements.
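To illustrate the DDL queue, a statement like the one below, run on any single node, is recorded in ZooKeeper and replayed on every other node of the cluster. The cluster name cluster_2S_2R is only a placeholder here; use whatever name your remote_servers configuration defines:

-- Run on any node, e.g. clickhouse-cluster-0; ZooKeeper's DDL queue
-- propagates the statement to the remaining nodes.
CREATE DATABASE IF NOT EXISTS warehouse ON CLUSTER cluster_2S_2R;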
How to Set Up ZooKeeper on Your Kubernetes Cluster?
Create a namespace for ZooKeeper:
kubectl create namespace zookeeper
Apply all ZooKeeper configuration files:
kubectl apply -f zookeeper/
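If you are writing these manifests yourself rather than using the repository above, the heart of the ZooKeeper ConfigMap is a zoo.cfg that lists all three ensemble members through the headless service. A minimal sketch, assuming the headless service is named zookeeper-headless in the zookeeper namespace:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data
clientPort=2181
# one entry per ensemble member, addressed via the headless service
server.1=zookeeper-0.zookeeper-headless.zookeeper.svc.cluster.local:2888:3888
server.2=zookeeper-1.zookeeper-headless.zookeeper.svc.cluster.local:2888:3888
server.3=zookeeper-2.zookeeper-headless.zookeeper.svc.cluster.local:2888:3888
# allow the four-letter-word commands used below (ruok, stat)
4lw.commands.whitelist=ruok,stat

Each pod also needs a myid file in dataDir whose number matches its server.N entry; in a StatefulSet this is typically derived from the pod ordinal at startup.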
Wait until the first pod is ready:
kubectl get pods -n zookeeper -w
Test the first pod:
kubectl exec -n zookeeper zookeeper-0 -- bash -c "echo ruok | nc localhost 2181" # Expected output: imok
Scale to multiple pods:
# Add the second pod
kubectl scale statefulset zookeeper -n zookeeper --replicas=2

# Add the third after the second pod is ready
kubectl scale statefulset zookeeper -n zookeeper --replicas=3
Verify cluster status:
kubectl exec -n zookeeper zookeeper-0 -- bash -c "echo stat | nc localhost 2181"

# Expected "Mode" lines across the ensemble:
# zookeeper-0: Mode: follower (or leader)
# zookeeper-1: Mode: leader (or follower)
# zookeeper-2: Mode: follower
We are all set for ZooKeeper.
2. Set Up ClickHouse with Sharding and Replication
As I mentioned earlier, we are not going to write a database engine from scratch for scaling. ClickHouse already has this capability; we just need to configure it.
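The essential pieces of the ClickHouse ConfigMap are the remote_servers definition (which lays out the 2 shards x 2 replicas), the zookeeper section, and the per-pod macros. A minimal sketch, assuming a headless service named clickhouse-headless and a cluster named cluster_2S_2R (both placeholders; match them to your own manifests):

<clickhouse>
  <!-- 2 shards, each with 2 replicas -->
  <remote_servers>
    <cluster_2S_2R>
      <shard>
        <internal_replication>true</internal_replication>
        <replica><host>clickhouse-cluster-0.clickhouse-headless</host><port>9000</port></replica>
        <replica><host>clickhouse-cluster-1.clickhouse-headless</host><port>9000</port></replica>
      </shard>
      <shard>
        <internal_replication>true</internal_replication>
        <replica><host>clickhouse-cluster-2.clickhouse-headless</host><port>9000</port></replica>
        <replica><host>clickhouse-cluster-3.clickhouse-headless</host><port>9000</port></replica>
      </shard>
    </cluster_2S_2R>
  </remote_servers>

  <!-- where ClickHouse finds the ZooKeeper ensemble -->
  <zookeeper>
    <node><host>zookeeper-0.zookeeper-headless.zookeeper.svc.cluster.local</host><port>2181</port></node>
    <node><host>zookeeper-1.zookeeper-headless.zookeeper.svc.cluster.local</host><port>2181</port></node>
    <node><host>zookeeper-2.zookeeper-headless.zookeeper.svc.cluster.local</host><port>2181</port></node>
  </zookeeper>

  <!-- macros differ per pod; this is what clickhouse-cluster-0 might use -->
  <macros>
    <shard>01</shard>
    <replica>clickhouse-cluster-0</replica>
  </macros>
</clickhouse>

Because the shard and replica macros differ per pod, they are usually generated from the pod ordinal at startup (for example in an init container or entrypoint script) rather than hard-coded once for all pods.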
Apply all ClickHouse configuration files:
kubectl apply -f clickhouse/
Wait for the first pod to be ready, then scale to 4 pods:
# Scale gradually to ensure proper initialization
kubectl scale statefulset clickhouse-cluster --replicas=2
kubectl wait --for=condition=ready pod clickhouse-cluster-1 --timeout=300s

kubectl scale statefulset clickhouse-cluster --replicas=3
kubectl wait --for=condition=ready pod clickhouse-cluster-2 --timeout=300s

kubectl scale statefulset clickhouse-cluster --replicas=4
kubectl wait --for=condition=ready pod clickhouse-cluster-3 --timeout=300s
All four pods need to be in the Running state. If you get errors, check your ConfigMap or StatefulSet manifests.
We are done with the ClickHouse setup. Now we should have a ClickHouse cluster with 2 replicas and 2 shards.
How to Verify the Cluster?
Connect to the ClickHouse pod:
kubectl exec -it clickhouse-cluster-0 -- clickhouse-client
Check the cluster topology:
SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters;
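With 2 shards and 2 replicas you should get four rows back, roughly like the following (the cluster name and host names below are placeholders; they will match whatever your ConfigMap defines):

cluster_2S_2R  1  1  clickhouse-cluster-0.clickhouse-headless  9000
cluster_2S_2R  1  2  clickhouse-cluster-1.clickhouse-headless  9000
cluster_2S_2R  2  1  clickhouse-cluster-2.clickhouse-headless  9000
cluster_2S_2R  2  2  clickhouse-cluster-3.clickhouse-headless  9000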
Check the ZooKeeper connection:
SELECT * FROM system.zookeeper WHERE path IN ('/', '/clickhouse');
Test distributed table creation:
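A minimal sketch, assuming the placeholder cluster name cluster_2S_2R and the standard {shard}/{replica} macros from the ConfigMap: create a replicated local table on every node, put a Distributed table on top of it, and read back what you insert.

-- Database and local, replicated table on every node (one ZooKeeper path per shard)
CREATE DATABASE IF NOT EXISTS warehouse ON CLUSTER cluster_2S_2R;

CREATE TABLE warehouse.events ON CLUSTER cluster_2S_2R
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (event_time, user_id);

-- Distributed table that routes inserts and queries across both shards
CREATE TABLE warehouse.events_dist ON CLUSTER cluster_2S_2R
AS warehouse.events
ENGINE = Distributed(cluster_2S_2R, warehouse, events, rand());

-- Insert through the distributed table, then read it back
-- (distributed inserts are asynchronous by default, so the count may lag for a moment)
INSERT INTO warehouse.events_dist VALUES (now(), 42, 'hello');
SELECT count() FROM warehouse.events_dist;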
Troubleshooting Common Issues
ZooKeeper Connection Issues
# Check ZooKeeper logs
kubectl logs -n zookeeper zookeeper-0

# Test ZooKeeper connectivity
kubectl exec -n zookeeper zookeeper-0 -- zkCli.sh -server localhost:2181 ls /
ClickHouse Cluster Issues
# Check ClickHouse logs
kubectl logs clickhouse-cluster-0

# Verify cluster configuration
kubectl exec -it clickhouse-cluster-0 -- clickhouse-client --query "SELECT * FROM system.clusters"
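If ON CLUSTER statements hang, it is also worth inspecting the distributed DDL queue that ClickHouse keeps in ZooKeeper. The query below assumes the default queue path /clickhouse/task_queue/ddl; adjust it if you changed the distributed_ddl setting:

# Inspect the distributed DDL queue stored in ZooKeeper
kubectl exec -it clickhouse-cluster-0 -- clickhouse-client --query "SELECT name, value FROM system.zookeeper WHERE path = '/clickhouse/task_queue/ddl'"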
File Structure
Your project structure should look like this:
clickhouse-k8s/
├── README.md
├── zookeeper/
│   ├── 01-zookeeper-configmap.yaml
│   ├── 02-zookeeper-headless.yaml
│   └── 03-zookeeper-deployment.yaml
└── clickhouse/
    ├── 01-clickhouse-configmap.yaml
    ├── 02-clickhouse-headless.yaml
    ├── 03-clickhouse-deployment.yaml
    └── 04-clickhouse-service.yaml
Conclusion
Congratulations! You have successfully set up a distributed ClickHouse cluster with 2 replicas and 2 shards on Kubernetes. This setup provides both high availability through replication and horizontal scalability through sharding.