Confluent Cluster Linking
Introduction
In today's data-driven world, businesses rely on distributed data architectures across multiple cloud environments, regions, or on-premises data centers. Managing and synchronizing these distributed Kafka clusters efficiently is a complex challenge.
Confluent Cluster Linking provides a seamless way to link two Kafka clusters, enabling real-time data synchronization without requiring external replication tools like MirrorMaker 2. In this article, we’ll explore how Cluster Linking works, its benefits, and a practical example.
What is Confluent Cluster Linking?
Cluster Linking is a feature in Confluent Platform and Confluent Cloud that allows you to mirror topics from a source Kafka cluster to a destination Kafka cluster. This enables enterprises to build multi-region, hybrid cloud, or disaster recovery architectures with minimal operational overhead.
Unlike MirrorMaker 2, Cluster Linking directly transfers compressed Kafka messages, offsets, and metadata from one cluster to another without requiring a separate consumer-producer pipeline.
Key Features of Cluster Linking
⦁ Real-time Topic Mirroring - Automatically synchronizes data across clusters without manual intervention.
⦁ Offset Preservation - Mirrored messages keep the same offsets as on the source, and consumer group offsets can be synced too, which makes failover simpler.
⦁ Cross-Cloud & Hybrid Support - Links clusters across different cloud providers (AWS, GCP, Azure) or on-premises deployments.
⦁ Efficient Data Transfer - Copies message batches as-is instead of decompressing and re-producing them, reducing network and processing costs.
⦁ Multi-Region Disaster Recovery - Enables fast recovery by keeping standby clusters updated.
⦁ Read-Only Mirroring - Protects data integrity by making mirror topics on the destination cluster read-only.
Use Cases
⦁ Multi-Region Active-Passive Architecture: Ensuring disaster recovery by maintaining an up-to-date copy in another region.
⦁ Hybrid Cloud Deployment: Bridging data between on-premises Kafka clusters and cloud-based Confluent clusters.
⦁ Cross-Cloud Data Synchronization: Synchronizing Kafka topics between AWS, GCP, or Azure.
⦁ Compliance & Data Sovereignty: Mirroring data to region-specific clusters to meet regulatory requirements (e.g., GDPR).
⦁ Zero-Downtime Migrations: Shifting consumers between clusters without offset resets, for example when upgrading Kafka versions or moving to a new cluster.
How Cluster Linking Works
Cluster Linking enables a direct connection between a source Kafka cluster and a destination Kafka cluster, allowing topics to be mirrored without the need for additional infrastructure. The mirroring process includes:
⦁ Topic creation on the destination cluster.
⦁ Message replication in real time.
⦁ Offset preservation to ensure consumer consistency.
In other words:
⦁ Direct Connection: Source and destination clusters link via the Kafka protocol.
⦁ Topic Mirroring: Topics are replicated with metadata and offsets intact.
⦁ Continuous Sync: Changes in the source cluster propagate in real time.
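To see why preserved offsets matter, consider a source partition whose oldest records have already been removed by retention, so its first available offset is 5. The sketch below is a toy model in plain shell, not Confluent code: `mirror_offsets` and `reproduce_offsets` are hypothetical helpers that print the destination offset each record would get under byte-for-byte mirroring versus a consume-and-re-produce pipeline writing to a new topic.

```shell
# Toy model only: prints "source_offset -> destination_offset" pairs.

# Byte-for-byte mirroring: each record keeps its source offset.
mirror_offsets() {
  first=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$((first + i)) -> $((first + i))"
    i=$((i + 1))
  done
}

# Consume-and-re-produce: the destination broker assigns fresh offsets,
# starting at 0 for a newly created topic.
reproduce_offsets() {
  first=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    echo "$((first + i)) -> $i"
    i=$((i + 1))
  done
}

echo "mirror:"
mirror_offsets 5 3
echo "re-produce:"
reproduce_offsets 5 3
```

In the first case, a consumer that committed offset 7 on the source can resume at offset 7 on the mirror. In the second, the committed offset has to be translated before the consumer can resume.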
Example: Setting Up Cluster Linking
Step 1: Create Two Kafka Clusters
Ensure you have two clusters:
- Source Cluster (where original data resides)
- Destination Cluster (where data will be mirrored)
Step 2: Get Cluster Details
Run the following command to get the Cluster IDs:
confluent kafka cluster list
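It can help to keep the values from this step in shell variables and fail early if one is missing. The following is a small sketch: the variable names (`SOURCE_CLUSTER_ID`, `DEST_CLUSTER_ID`, `SOURCE_BOOTSTRAP_SERVER`), the placeholder values, and the `require_vars` helper are illustrative choices, not something the Confluent CLI requires.

```shell
# require_vars NAME...: return non-zero and name the first variable
# that is unset or empty. Hypothetical helper for this walkthrough.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
  return 0
}

# Placeholder values; replace them with the details of your own clusters.
SOURCE_CLUSTER_ID="lkc-source"
DEST_CLUSTER_ID="lkc-dest"
SOURCE_BOOTSTRAP_SERVER="source.example.com:9092"

require_vars SOURCE_CLUSTER_ID DEST_CLUSTER_ID SOURCE_BOOTSTRAP_SERVER \
  && echo "all cluster details present"
```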
Step 3: Create the Cluster Link
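Create the Cluster Link on the destination cluster, which pulls data from the source. Select the destination first (for example with `confluent kafka cluster use <DESTINATION_CLUSTER_ID>`), then run the command below. Flag names differ between CLI versions (newer releases use `--source-cluster` instead of `--source-cluster-id`), so check `confluent kafka link create --help` if a flag is rejected: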
confluent kafka link create my-cluster-link \
--source-cluster-id <SOURCE_CLUSTER_ID> \
--source-bootstrap-server <SOURCE_BOOTSTRAP_SERVER> \
--source-api-key <SOURCE_API_KEY> \
--source-api-secret <SOURCE_API_SECRET>
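A link can also carry settings that control what is synced beyond the topic data. As a sketch, the snippet below writes a properties file that turns on consumer offset syncing for all consumer groups. The file name `link.properties` is arbitrary, and how the file is passed to `confluent kafka link create` depends on the CLI version, so that step is left as a comment rather than spelled out.

```shell
# Write an example link configuration file (the file name is arbitrary).
cat > link.properties <<'EOF'
# Sync committed consumer offsets from the source to the destination.
consumer.offset.sync.enable=true
# Which consumer groups to sync; this filter includes every group.
consumer.offset.group.filters={"groupFilters":[{"name":"*","patternType":"LITERAL","filterType":"INCLUDE"}]}
# How often offsets are synced, in milliseconds.
consumer.offset.sync.ms=30000
EOF

# Pass this file when creating the link. The option name depends on the
# CLI version; see `confluent kafka link create --help`.
# Show the effective (non-comment) settings:
grep -v '^#' link.properties
```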
Step 4: Mirror a Topic
To mirror a topic (e.g., orders), use the following command:
confluent kafka mirror create orders --link my-cluster-link
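If several topics need to be mirrored, the mirror create command can be wrapped in a loop. The sketch below assumes the link name from Step 3 and hypothetical topic names (`orders`, `payments`, `shipments`). The `mirror_topics` helper and the `DRY_RUN` switch are illustrative: with `DRY_RUN=1` the commands are only printed, so the list can be reviewed before anything is created.

```shell
# mirror_topics LINK TOPIC...: create one mirror topic per source topic.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

mirror_topics() {
  link=$1
  shift
  for topic in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "confluent kafka mirror create $topic --link $link"
    else
      confluent kafka mirror create "$topic" --link "$link"
    fi
  done
}

mirror_topics my-cluster-link orders payments shipments
```

Setting `DRY_RUN=0` before calling `mirror_topics` runs the commands against the currently selected destination cluster.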
Step 5: Verify Cluster Link
Check the status of the Cluster Link:
confluent kafka link list
To check the mirrored topics, run:
confluent kafka mirror list --link my-cluster-link
Step 6: Consume Messages from Destination Cluster
Because mirrored messages keep their source offsets, consumers can read the mirror topic on the destination cluster just as they would read the original:
confluent kafka topic consume orders --from-beginning
Cluster Linking vs MirrorMaker 2
Feature | Cluster Linking | MirrorMaker 2 |
Message Delivery | Direct message transfer | Consumer/producer-based |
Offset Preservation | Yes | No (offsets are translated) |
Latency | Low | Higher |
Cloud Support | Multi-cloud & hybrid | Limited (you deploy and operate it yourself) |
Complexity | Low (built into Confluent brokers) | High (runs as a separate Kafka Connect-based deployment) |
Active-Active vs. Active-Passive
Model | Pros | Use Case |
Active-Passive | Cost-efficient, simple failover. | Disaster recovery, backup clusters. |
Active-Active | Low latency, high availability. | Global applications, real-time analytics. |
Best Practices for Cluster Linking
⦁ Use Secure Authentication: Always use API keys or mTLS for authentication.
⦁ Monitor Link Performance: Use `confluent kafka link describe my-cluster-link` to check link health.
⦁ Ensure Network Connectivity: The destination cluster must have access to the source cluster.
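⦁ Enable Compression: Configure compression on producers or source topics. Cluster Linking carries the compressed batches across as-is, which reduces bandwidth and storage costs.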
⦁ Handle Failover Gracefully: Have consumer failover strategies to switch between clusters when needed.
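For the failover practice above, the Confluent CLI provides `confluent kafka mirror failover` (stop mirroring right away, for when the source is unavailable) and `confluent kafka mirror promote` (finish syncing first, for planned migrations). The sketch below only prints a runbook for review. The `failover_plan` helper and the topic names are hypothetical, and the flags should be checked against `--help` for your CLI version.

```shell
# failover_plan MODE LINK TOPIC...: print the commands that would turn
# mirror topics into regular writable topics.
# MODE is "failover" or "promote".
failover_plan() {
  mode=$1
  link=$2
  shift 2
  case "$mode" in
    failover|promote) ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  for topic in "$@"; do
    echo "confluent kafka mirror $mode $topic --link $link"
  done
}

failover_plan failover my-cluster-link orders payments
```

Once a mirror topic has been failed over or promoted it accepts writes, so producers and consumers can be pointed at the destination cluster.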
CLI Commands Cheat Sheet
Command | Description |
confluent kafka link create | Create a Cluster Link |
confluent kafka mirror list | List mirrored topics |
confluent kafka mirror pause | Pause topic replication |
confluent kafka link describe | Check link status |
Confluent Cluster Linking: Key Facts & Limitations
Key Facts
Real-Time Replication
Mirrors Kafka topics between clusters in real time without external tools like MirrorMaker 2.
Syncs messages, offsets, and metadata (e.g., partitions, configurations).
Offset Preservation
Maintains consumer offsets, enabling seamless failover for consumer groups.
Cross-Cloud & Hybrid Support
Links clusters across AWS, Azure, GCP, or on-premises environments.
Efficient Data Transfer
Transfers messages in compressed Kafka format, reducing network bandwidth usage.
Read-Only Mirrored Topics
Prevents accidental writes on destination clusters to ensure data integrity.
Native Integration
Built into Confluent Platform (generally available since 7.0, in preview in earlier 6.x releases) and Confluent Cloud.
Secure Authentication
Supports mTLS, SASL/SCRAM, and API keys for secure cluster-to-cluster communication.
No Consumer Groups Required
Operates at the broker level, avoiding the need for consumer/producer pipelines.
Compression Support
Retains source topic compression (e.g., gzip, snappy), lowering storage costs.
Multi-Use Case Flexibility
Ideal for disaster recovery, hybrid cloud, data sovereignty, and zero-downtime migrations.
Limitations
Confluent-Specific
Only available in Confluent Platform (6.1+) or Confluent Cloud (not in open-source Apache Kafka).
No Data Transformation
Mirrors topics 1:1 without filtering or modifying data. Use MirrorMaker 2 for transformations.
Network Connectivity Required
Source and destination clusters must have direct network access (public or private).
Unidirectional Replication
Each mirror topic replicates in one direction (source → destination). Active-active setups need mirroring in both directions, typically with a link in each direction.
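Limited Metadata Sync
Topic configurations are mirrored, but Schema Registry schemas and Kafka Streams state stores are not. Consumer offset and ACL syncing depend on link settings and may not be available in every deployment.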
Performance Overheads
On high-throughput clusters, mirroring lag can increase during peak loads.
Read-Only Destination Topics
Mirrored topics on the destination cluster cannot be written to directly.
Version Compatibility
Source and destination clusters must run compatible Confluent Platform/Kafka versions.
No Conflict Resolution
Active-Active architectures require custom logic to handle conflicting writes.
Cluster Link Limits
Confluent Cloud imposes limits on the number of links per cluster (varies by plan).
When to Use Cluster Linking
✅ Best For:
⦁ Real-time disaster recovery.
⦁ Hybrid/multi-cloud Kafka architectures.
⦁ Offset-preserving consumer migrations.
🚫 Avoid If:
⦁ You need data filtering or transformation.
⦁ You are using open-source Apache Kafka without Confluent.
⦁ You need to write to the same topic from both clusters (mirror topics are read-only).
Conclusion
Cluster Linking is a powerful solution for real-time data synchronization across multiple Kafka clusters. Whether you need multi-region replication, hybrid cloud integration, or disaster recovery, Cluster Linking offers an efficient and reliable alternative to MirrorMaker 2 when topics can be mirrored as-is.