
🧠 Kafka & Zookeeper – Simplified & Sequential Guide


1️⃣ What is Apache Kafka?

Apache Kafka is a distributed event streaming platform used to build real-time data pipelines and event-driven applications.

📨 Think of it as a post office that accepts messages (data) from one system and delivers them to others, quickly and reliably.


2️⃣ Core Components

🧩 Component   | Description
Producer       | Sends messages to Kafka
Broker         | Kafka server that stores messages
Topic          | Named category (like a folder) for messages
Partition      | Subdivision of a topic for scalability
Offset         | Sequential ID of a message, unique within its partition
Consumer       | Reads messages from Kafka
Consumer Group | Set of consumers that split a topic's partitions between them for parallel reading

3️⃣ What is Zookeeper?

Zookeeper is a coordination service used by Kafka for:

  • 🧠 Managing broker metadata
  • 👑 Leader election
  • ⚖️ Synchronization and configuration
  • ⚠️ Failure detection

4️⃣ Kafka Architecture (Simple View)

graph TD
A[Producer] --> B[Kafka Broker]
B --> C[Topic: orders]
C --> D[Partition 0]
C --> E[Partition 1]
F[Consumer Group] --> D
F --> E

5️⃣ Kafka Message Flow (5 Steps)

  1. 📨 Producer sends a message to a Kafka Topic
  2. 🌐 The message is assigned to a Partition (by hashing its key, or spread across partitions when it has no key)
  3. 📜 Kafka stores it with an Offset
  4. 📱 Consumer reads the message
  5. ♻️ Kafka tracks what each Consumer Group has read (its committed offsets)
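The five steps can be mimicked with plain files to make partitions and offsets concrete. This is only a toy sketch, not how Kafka is implemented: each partition is an append-only file, an offset is a line index, and `cksum` stands in for the murmur2 hash that the Java client's default partitioner applies to keys. The names `partition_for`, `produce`, and `consume` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Toy model of the message flow. Nothing here talks to a real broker.

LOG_DIR=$(mktemp -d)   # stand-in for the broker's log directory
NUM_PARTITIONS=2       # assumed partition count for the toy topic

# Step 2: pick a partition from the message key.
# cksum (a CRC) is a stand-in hash, not Kafka's actual algorithm.
partition_for() {
    local key="$1" hash
    hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
    echo $(( hash % NUM_PARTITIONS ))
}

# Steps 1 and 3: append the message and print "<partition> <offset>".
produce() {
    local key="$1" value="$2" p file offset
    p=$(partition_for "$key")
    file="$LOG_DIR/partition-$p.log"
    touch "$file"
    offset=$(( $(wc -l < "$file") ))   # next offset = number of lines so far
    printf '%s\n' "$value" >> "$file"
    echo "$p $offset"
}

# Steps 4 and 5: print everything after the group's committed offset
# for one partition, then commit the new position.
consume() {
    local group="$1" p="$2" file commit next=0
    file="$LOG_DIR/partition-$p.log"
    commit="$LOG_DIR/$group-partition-$p.offset"
    if [ -f "$commit" ]; then next=$(cat "$commit"); fi
    if [ ! -f "$file" ]; then return 0; fi
    tail -n "+$(( next + 1 ))" "$file"
    echo $(( $(wc -l < "$file") )) > "$commit"
}

produce order-1 "created"                    # first message for this key: offset 0
produce order-1 "paid"                       # same key, same partition: offset 1
consume billing "$(partition_for order-1)"   # prints both messages
consume billing "$(partition_for order-1)"   # prints nothing: already committed
```

Because the committed position is stored per group, a second group reading the same partition would start again from offset 0, which is the behaviour step 5 describes.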

6️⃣ Installation & Setup (Ubuntu)

☕ Step 1: Install Java

sudo apt update
sudo apt install default-jdk -y
java -version

👤 Step 2: Create Kafka User

sudo adduser kafka
sudo usermod -aG sudo kafka
su - kafka

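📁 Step 3: Download & Extract Kafka

# Create the install directory first, then work inside it
sudo mkdir -p /var/www/html/Tools/
cd /var/www/html/Tools/
# If this URL returns 404, older releases are kept under https://archive.apache.org/dist/kafka/
sudo wget https://downloads.apache.org/kafka/3.6.1/kafka_2.13-3.6.1.tgz
sudo tar -xvzf kafka_2.13-3.6.1.tgz
sudo mv kafka_2.13-3.6.1 kafka
sudo chown -R kafka:kafka kafka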

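7️⃣ Run Services (Manually)

Each command below runs in the foreground, so start Zookeeper and Kafka in separate terminals, Zookeeper first.

🍵 Start Zookeeper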

cd /var/www/html/Tools/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties

🥤 Start Kafka

cd /var/www/html/Tools/kafka
bin/kafka-server-start.sh config/server.properties
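Kafka needs Zookeeper to be reachable when it starts, and topics can only be created once the broker itself is listening. When scripting the startup, a small wait loop avoids racing ahead. The helper below is a sketch: `wait_for_port` is a made-up name, it relies on bash's `/dev/tcp` feature (so it needs bash rather than plain sh), and it assumes the default ports 2181 (Zookeeper) and 9092 (Kafka) from the shipped config files.

```shell
#!/usr/bin/env bash
# Poll until host:port accepts TCP connections, or give up after
# the given number of seconds (default 30). Returns 0 on success, 1 on timeout.
wait_for_port() {
    local host="$1" port="$2" timeout="${3:-30}" waited=0
    # The subshell opens the connection and closes it again on exit.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}

# Example (not run here):
#   wait_for_port localhost 2181 && bin/kafka-server-start.sh config/server.properties
#   wait_for_port localhost 9092 && echo "broker is accepting connections"
```

An open port only shows that the process is listening, not that the broker has finished initialising, so treat it as a rough readiness check.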

8️⃣ Create Topic, Produce & Consume Messages

# Create a topic
bin/kafka-topics.sh --create --topic test --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

# Start producer
bin/kafka-console-producer.sh --topic test --bootstrap-server localhost:9092

# Start consumer
bin/kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server localhost:9092

9️⃣ Optional: Set Up systemd Services

Zookeeper Service: /etc/systemd/system/zookeeper.service

[Unit]
Description=Apache Zookeeper
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/var/www/html/Tools/kafka/bin/zookeeper-server-start.sh /var/www/html/Tools/kafka/config/zookeeper.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target

🚀 Kafka Service: /etc/systemd/system/kafka.service

[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/var/www/html/Tools/kafka/bin/kafka-server-start.sh /var/www/html/Tools/kafka/config/server.properties
Restart=on-failure

[Install]
WantedBy=multi-user.target
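Unit files take effect only after systemd reloads them, and the services start at boot only once they are enabled. The commands for that could look like the sketch below. They are wrapped in functions (the names `enable_kafka_services` and `kafka_services_status` are made up here) so that nothing runs until you call them, and they assume the two unit files are saved under the names shown above.

```shell
#!/usr/bin/env bash
# Reload unit files, then enable and start both services, Zookeeper first.
enable_kafka_services() {
    sudo systemctl daemon-reload
    sudo systemctl enable --now zookeeper.service
    sudo systemctl enable --now kafka.service
}

# Show the current state of both units without opening a pager.
kafka_services_status() {
    systemctl --no-pager status zookeeper.service kafka.service
}
```

Run `enable_kafka_services` once after creating or editing the unit files, then `kafka_services_status` to confirm both units are active. If a unit fails, `journalctl -u kafka.service` shows its log output.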

🔧 Kafka Config (server.properties)

Property              | Description
broker.id=0           | Unique ID for the broker
log.dirs=...          | Directory where Kafka stores messages
zookeeper.connect=... | Zookeeper host/port
listeners=...         | Address and port the broker listens on
log.retention.hours=  | Message retention period, in hours
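To see which of these properties a broker is actually configured with, one option is to filter the file for the keys in the table. A minimal sketch: the function name `show_broker_settings` is made up for this guide, and commented-out lines are deliberately ignored.

```shell
#!/usr/bin/env bash
# Print the active (uncommented) values of the properties listed above.
# Usage: show_broker_settings /var/www/html/Tools/kafka/config/server.properties
show_broker_settings() {
    grep -E '^(broker\.id|log\.dirs|zookeeper\.connect|listeners|log\.retention\.hours)=' "$1"
}
```

A property that does not appear in the output is either commented out or absent, in which case the broker falls back to its built-in default.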

🔗 Kafka Integrations

📄 Tool          | 🔒 Purpose
Apache Spark     | Real-time stream processing
Apache Flink     | Complex event handling
Elasticsearch    | Log search & dashboards
Debezium + MySQL | Change Data Capture (CDC)
Kafka Connect    | Source/sink connectors

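📣 Nginx Kafka Proxy Example

Kafka clients speak a binary TCP protocol, not HTTP, so an http server block with proxy_pass http://localhost:9092 cannot forward their traffic. Use a stream block instead. It belongs at the top level of the Nginx configuration (outside http { }), so it cannot live in sites-enabled, and Nginx must have the stream module available. Clients also reconnect to whatever address the broker advertises, so advertised.listeners in server.properties has to point at the proxy address.

/etc/nginx/nginx.conf

stream {
    server {
        # Clients connect to kafka-asifahmadkhan.com on this port.
        # 9093 is an assumed free port; choose your own.
        listen 9093;
        proxy_pass localhost:9092;
    }
}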

🗓 Kafka + PHP Integration Note

PHP has no built-in Kafka client, so you need a client library. The rdkafka PECL extension provides one.

The extension is only a wrapper around librdkafka, the Kafka client library written in C (with a C++ interface). librdkafka must be installed first, because the PHP extension cannot work without it.

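✅ Installation Steps for PHP (e.g. 8.4)

sudo apt update
# librdkafka-dev provides the C library; php8.4-dev and php-pear provide phpize and pecl for the build
sudo apt install -y librdkafka-dev php8.4-dev php-pear

sudo pecl install rdkafka

# Enable the extension for both the CLI and PHP-FPM
echo "extension=rdkafka.so" | sudo tee /etc/php/8.4/cli/conf.d/20-rdkafka.ini
echo "extension=rdkafka.so" | sudo tee /etc/php/8.4/fpm/conf.d/20-rdkafka.ini

sudo systemctl restart php8.4-fpm
sudo systemctl restart apache2 # or nginx

# Confirm the extension is loaded
php -m | grep rdkafka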

👌 Kafka GUI Setup with Redpanda Console

To set up a Kafka GUI, add the Redpanda package repository and install Redpanda Console:

curl -1sLf 'https://dl.redpanda.com/nzc4ZYQK3WRGd9sy/redpanda/cfg/setup/bash.deb.sh' | \
sudo -E bash && sudo apt-get install redpanda-console -y

✅ Kafka is now ready for real-time streaming, big data pipelines, and PHP-based integrations!