Kafka & Zookeeper: Simplified & Sequential Guide
1️⃣ What is Apache Kafka?
Apache Kafka is a distributed event streaming platform used to build real-time pipelines and event-driven applications.
Think of it like a post office that accepts messages (data) from one system and delivers them to others, quickly and reliably.
2️⃣ Core Components
Component | Description |
---|---|
Producer | Sends messages to Kafka |
Broker | Kafka server that stores messages |
Topic | Named category (like a folder) for messages |
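Partition | Ordered subdivision of a topic that allows parallel reads and writes |
Offset | Sequential ID that marks a message's position within a partition |
Consumer | Reads messages from Kafka |
Consumer Group | Set of consumers that split a topic's partitions among themselves for parallel reading |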
3️⃣ What is Zookeeper?
Zookeeper is a coordination service. Kafka releases that run in Zookeeper mode, including the 3.6.1 release installed below, use it for:
- Managing broker metadata
- Leader election
- Synchronization and configuration
- Failure detection
4️⃣ Kafka Architecture (Simple View)
graph TD
A[Producer] --> B[Kafka Broker]
B --> C[Topic: orders]
C --> D[Partition 0]
C --> E[Partition 1]
D --> F[Consumer Group]
E --> F
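The diagram can be reproduced on a running broker. The sketch below wraps two standard Kafka CLI calls in shell functions: one creates an `orders` topic with two partitions, the other starts a console producer that reads `key:value` lines, so records with the same key land in the same partition. It assumes the broker from the installation sections is listening on localhost:9092; `KAFKA_HOME`, `BOOTSTRAP`, and the function names are illustrative and not part of Kafka.

```shell
# Assumptions: install path from the setup section, single local broker.
KAFKA_HOME="${KAFKA_HOME:-/var/www/html/Tools/kafka}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Create the "orders" topic from the diagram with two partitions.
create_orders_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create --topic orders \
    --bootstrap-server "$BOOTSTRAP" --partitions 2 --replication-factor 1
}

# Start a producer that expects "key:value" input lines, e.g. "customer-42:created".
produce_keyed_orders() {
  "$KAFKA_HOME/bin/kafka-console-producer.sh" --topic orders \
    --bootstrap-server "$BOOTSTRAP" \
    --property parse.key=true \
    --property key.separator=:
}
```

Run `create_orders_topic` once, then `produce_keyed_orders` and type lines such as `customer-42:created`.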
5️⃣ Kafka Message Flow (5 Steps)
- Producer sends a message to a Kafka Topic
- The message is assigned to a Partition (by key hash, or spread across partitions when it has no key)
- Kafka appends it to that Partition and gives it the next Offset
- Consumer reads the message
- Kafka tracks the last Offset each Consumer Group has committed
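The last step can be observed directly once a broker is running. A small sketch, assuming the local broker on localhost:9092 and the `test` topic created in section 8; the group name `demo-group` and the helper names are made up for illustration.

```shell
# Assumptions: install path from the setup section, single local broker.
KAFKA_HOME="${KAFKA_HOME:-/var/www/html/Tools/kafka}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
GROUP="${GROUP:-demo-group}"   # hypothetical consumer group name

# Consume as a member of the group so offsets are committed under its name.
consume_in_group() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$1" \
    --bootstrap-server "$BOOTSTRAP" --group "$GROUP"
}

# Show, per partition, the group's committed offset, the log end offset and the lag.
show_group_offsets() {
  "$KAFKA_HOME/bin/kafka-consumer-groups.sh" --describe \
    --bootstrap-server "$BOOTSTRAP" --group "$GROUP"
}
```

Run `consume_in_group test` in one terminal, stop it with Ctrl+C, then call `show_group_offsets` to see how far the group has read in each partition.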
6️⃣ Installation & Setup (Ubuntu)
Step 1: Install Java
sudo apt update
sudo apt install default-jdk -y
java -version
Step 2: Create Kafka User
sudo adduser kafka
sudo usermod -aG sudo kafka
su - kafka
Step 3: Download & Extract Kafka
sudo mkdir -p /var/www/html/Tools/
cd /var/www/html/Tools/
# If this URL returns 404, the release has moved to https://archive.apache.org/dist/kafka/3.6.1/
sudo wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
sudo tar -xvzf kafka_2.13-3.6.1.tgz
sudo mv kafka_2.13-3.6.1 kafka
sudo chown -R kafka:kafka kafka
7️⃣ Run Services (Manually)
Start Zookeeper
# Runs in the foreground; leave this terminal open
cd /var/www/html/Tools/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
Start Kafka
# Run in a second terminal, once Zookeeper is up
cd /var/www/html/Tools/kafka
bin/kafka-server-start.sh config/server.properties
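Before moving on, it can help to confirm that both processes accept connections. The following is a rough check, assuming the default ports (2181 is the `clientPort` in `config/zookeeper.properties`, 9092 is Kafka's default listener port) and that `bash` and `timeout` are available; `port_open` and `check_kafka_stack` are helper names introduced here.

```shell
# Return success if a TCP connection to host:port can be opened within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report whether Zookeeper and Kafka answer on their default local ports.
check_kafka_stack() {
  for target in "Zookeeper:2181" "Kafka:9092"; do
    name="${target%%:*}"
    port="${target##*:}"
    if port_open localhost "$port"; then
      echo "$name is listening on port $port"
    else
      echo "$name is NOT reachable on port $port"
    fi
  done
}
```

Calling `check_kafka_stack` in a third terminal prints one line per service.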
8️⃣ Create Topic, Produce & Consume Messages
# Create a topic
bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
# Start producer
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092
# Start consumer
bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092
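To connect these commands back to the partition and offset concepts from section 2, one option is to describe the topic and have the console consumer print each record's partition and offset. This is a sketch, assuming the broker is running on localhost:9092 and a Kafka version whose console consumer supports the `print.partition` and `print.offset` properties (3.6.1 does); `KAFKA_HOME`, `BOOTSTRAP`, and the function names are illustrative.

```shell
# Assumption: Kafka lives where section 6 put it; override KAFKA_HOME if not.
KAFKA_HOME="${KAFKA_HOME:-/var/www/html/Tools/kafka}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"

# Show partition count, leader and replicas for a topic.
describe_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --describe --topic "$1" \
    --bootstrap-server "$BOOTSTRAP"
}

# Read a topic from the start, printing partition and offset next to each value.
consume_with_offsets() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" --topic "$1" --from-beginning \
    --bootstrap-server "$BOOTSTRAP" \
    --property print.partition=true \
    --property print.offset=true
}
```

For the topic created above, that would be `describe_topic test` followed by `consume_with_offsets test`.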
9️⃣ Optional: Set Up systemd Services
Zookeeper Service: /etc/systemd/system/zookeeper.service
[Unit]
Description=Apache Zookeeper
After=network.target
[Service]
Type=simple
User=kafka
ExecStart=/var/www/html/Tools/kafka/bin/zookeeper-server-start.sh /var/www/html/Tools/kafka/config/zookeeper.properties
Restart=on-failure
[Install]
WantedBy=multi-user.target
Kafka Service: /etc/systemd/system/kafka.service
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
ExecStart=/var/www/html/Tools/kafka/bin/kafka-server-start.sh /var/www/html/Tools/kafka/config/server.properties
Restart=on-failure
[Install]
WantedBy=multi-user.target
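Once both unit files are saved, systemd has to reload them before the services can be enabled and started. A minimal sketch, assuming the unit names match the file names above; the two function names are only for illustration.

```shell
# Reload unit files, then enable both services at boot and start them now.
enable_kafka_services() {
  sudo systemctl daemon-reload &&
    sudo systemctl enable --now zookeeper.service kafka.service
}

# Print the current state of both units without opening a pager.
kafka_services_status() {
  systemctl status zookeeper.service kafka.service --no-pager
}
```

Stop any manually started Zookeeper or Kafka process first, then run `enable_kafka_services` followed by `kafka_services_status`.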
Kafka Config (server.properties)
Property | Description |
---|---|
broker.id=0 | Unique ID for the broker within the cluster |
log.dirs=... | Directory (or directories) where Kafka stores message logs |
zookeeper.connect=... | Zookeeper host and port |
listeners=... | Address and port the broker listens on |
log.retention.hours=... | How long messages are kept before deletion |
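To see which of these properties are actually set in a given file, a small filter over `server.properties` is usually enough. The helper below is a sketch; `show_kafka_settings` is a name introduced here, and settings that are commented out (and therefore fall back to Kafka's defaults) will not appear in its output.

```shell
# Print the active (uncommented) values of the properties listed in the table.
show_kafka_settings() {
  grep -E '^(broker\.id|log\.dirs|zookeeper\.connect|listeners|log\.retention\.hours)=' "$1"
}
```

With the install path used in this guide, the call would be `show_kafka_settings /var/www/html/Tools/kafka/config/server.properties`.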
Kafka Integrations
Tool | Purpose |
---|---|
Apache Spark | Real-time stream processing |
Apache Flink | Complex event handling |
Elasticsearch | Log search & dashboards |
Debezium + MySQL | Change Data Capture (CDC) |
Kafka Connect | Source/sink connectors |
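Nginx Kafka Proxy Example
Kafka clients speak a binary TCP protocol rather than HTTP, so an HTTP server block with proxy_pass http://... cannot carry their traffic. The proxy needs a stream block, which must sit at the top level of the Nginx configuration (files in sites-enabled are normally included inside the http block) and requires the stream module.
/etc/nginx/nginx.conf
stream {
server {
# 9093 is an example port; clients would connect to kafka-asifahmadkhan.com:9093
listen 9093;
proxy_pass localhost:9092;
}
}
Clients reconnect to whatever address the broker advertises, so advertised.listeners in server.properties must also point at the proxy address.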
Kafka + PHP Integration Note
PHP has no built-in Kafka support; it needs a Kafka client library. The rdkafka PECL extension provides one.
The extension is only a wrapper around librdkafka, the actual Kafka client, which is written in C/C++. Without librdkafka installed, the PHP extension cannot work.
Installation Steps for PHP (e.g. 8.4)
sudo apt update
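# php-pear provides the pecl command; php8.4-dev provides phpize, which pecl needs to build the extension
sudo apt install -y librdkafka-dev php-pear php8.4-dev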
sudo pecl install rdkafka
echo "extension=rdkafka.so" | sudo tee /etc/php/8.4/cli/conf.d/20-rdkafka.ini
echo "extension=rdkafka.so" | sudo tee /etc/php/8.4/fpm/conf.d/20-rdkafka.ini
sudo systemctl restart php8.4-fpm
sudo systemctl restart apache2 # or nginx
php -m | grep rdkafka
Kafka GUI Setup with Redpanda Console
To set up a Kafka GUI using Redpanda Console:
curl -1sLf 'https://dl.redpanda.com/nzc4ZYQK3WRGd9sy/redpanda/cfg/setup/bash.deb.sh' | \
sudo -E bash && sudo apt-get install redpanda-console -y
Kafka is now ready for real-time streaming, big data pipelines, and PHP-based integrations!