A brief description of Apache Kafka
Posted By : Lokesh Babu Sharma | 30-Aug-2019
Generally, Kafka is used for two broad classes of applications: building real-time streaming data pipelines that reliably move data between systems or applications, and building real-time streaming applications that transform or react to streams of data.
1. Kafka runs as a cluster on one or more servers.
2. Kafka stores streams of records in categories called topics.
3. Each record consists of a key, a value, and a timestamp.
Kafka provides four core APIs:
1. Producer API - publishes a stream of records to one or more Kafka topics.
2. Consumer API - subscribes to one or more topics and processes the stream of records produced to them.
3. Streams API - lets an application act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics.
4. Connector API - allows building and running reusable producers and consumers that connect Kafka topics to existing applications or data systems.
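To make the producer and consumer roles concrete, here is a toy in-memory sketch. This is not the real Kafka client API; the `Topic` class and the `orders` topic are illustrative inventions that only model the idea of an append-only record stream with publishers and readers:

```python
class Topic:
    """A toy in-memory 'topic': an append-only list of (key, value) records."""
    def __init__(self, name):
        self.name = name
        self.records = []

    def publish(self, key, value):
        # A producer appends records; existing records are never modified.
        self.records.append((key, value))

    def read_from(self, offset):
        """A consumer reads all records at or after the given offset."""
        return self.records[offset:]

# A 'producer' publishes records to a topic...
orders = Topic("orders")
orders.publish("user-1", "bought book")
orders.publish("user-2", "bought pen")

# ...and a 'consumer' reads them back in publication order.
consumed = orders.read_from(0)
print(consumed)  # [('user-1', 'bought book'), ('user-2', 'bought pen')]
```

In real Kafka the producer and consumer run in separate processes and talk to brokers over the network, but the contract is the same: producers append, consumers read from an offset.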
A Kafka cluster has the following components:
1. Kafka Broker - a broker is a Kafka server that stores published records and serves producer and consumer requests; a cluster is usually made up of multiple brokers to balance load.
2. Kafka Zookeeper - ZooKeeper is a top-level Apache project used to maintain configuration data and naming and to provide robust, flexible synchronization within distributed systems, acting as a centralized service. ZooKeeper keeps track of the status of Kafka cluster nodes as well as of Kafka topics, partitions, etc. The brain of the whole system is the ZooKeeper Atomic Broadcast (ZAB) protocol.
3. Kafka Producer - producers send data to brokers. When a new broker starts, all producers discover it and automatically begin sending messages to it. The Kafka producer does not wait for acknowledgements from the broker; it sends messages as fast as the broker can handle them.
4. Kafka Consumers - because the Kafka broker is stateless, each consumer keeps track of how many messages it has consumed. The consumer issues an asynchronous pull request to the broker to have a buffer of bytes ready to consume. Once a consumer acknowledges a particular message offset, it implies that it has consumed all prior messages.
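The pull model and consumer-side offset tracking described above can be sketched as follows. This is a simplified illustration, not the real client: the `Consumer` class and its `poll` method are stand-ins that only model "the broker stores nothing about the consumer; the consumer remembers its own position":

```python
class Consumer:
    """Toy consumer: the broker keeps no per-consumer state; the consumer
    alone remembers the offset of the next record to read."""
    def __init__(self, log):
        self.log = log        # the broker's append-only partition log
        self.offset = 0       # position is consumer-side state

    def poll(self, max_records=10):
        """Pull the next batch of records and advance the offset."""
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

log = ["m0", "m1", "m2", "m3"]
c = Consumer(log)
print(c.poll(2))   # ['m0', 'm1'] -- offset is now 2, so all prior
print(c.poll(2))   # ['m2', 'm3']    messages are implicitly consumed
```

Because the offset lives on the consumer side, a consumer can also rewind and re-read old messages simply by resetting its offset, which a stateful broker could not easily allow.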
5. Kafka Topics - a topic defines a stream of a particular type or classification of data. Producers publish streams of data to topics, and consumers consume that data topic by topic. Within a Kafka cluster, topic names must be unique, and any number of topics can be created. Messages cannot be updated once they have been published.
6. Partitions in Kafka - topics are broken into partitions and replicated across brokers. Messages are stored in sequential order within a partition, each assigned an incremental id called an offset. These offsets are meaningful only within their own partition.
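Key-based partitioning and per-partition offsets can be sketched with a few lines of Python. Note the hash here is a deliberately naive stand-in: Kafka's default partitioner actually uses murmur2 over the key bytes, and the function names below are illustrative, not Kafka API:

```python
# Toy model: records with the same key always land in the same partition,
# and each partition assigns its own incremental offsets.
NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def partition_for(key):
    # Naive byte-sum hash for illustration only
    # (Kafka's default partitioner uses murmur2).
    return sum(key.encode()) % NUM_PARTITIONS

def append(key, value):
    p = partition_for(key)
    offset = len(partitions[p])          # incremental id within the partition
    partitions[p].append((offset, key, value))
    return p, offset

append("user-1", "click")
append("user-1", "purchase")   # same key -> same partition, next offset
```

This is why offsets are only meaningful within a partition: two partitions independently count from 0, so "offset 5" identifies nothing without also naming the partition.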
7. Topic Replication Factor - the replication factor controls how many copies of a topic's partitions are kept on other brokers. If a broker goes down, a replica of the topic on another broker keeps the data available.
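The failover benefit of replication can be sketched as below. This is a toy model under assumed names (`orders-0`, `broker-1`, `broker-2` are invented for illustration); real Kafka elects a leader replica per partition via the cluster controller rather than scanning a dict:

```python
# Toy view of replication factor 2: each partition's log lives on two
# brokers, so losing one broker loses no data.
replicas = {
    "orders-0": {"broker-1": ["m0", "m1"], "broker-2": ["m0", "m1"]},
}

def read_partition(partition, alive_brokers):
    """Read from any surviving replica of the partition."""
    for broker, log in replicas[partition].items():
        if broker in alive_brokers:
            return log
    raise RuntimeError("all replicas down")

# broker-1 crashes; the replica on broker-2 still serves the data.
print(read_partition("orders-0", alive_brokers={"broker-2"}))  # ['m0', 'm1']
```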
Setting up Apache Kafka:
1. Install Java (Kafka requires a JVM) using the following commands:
sudo apt-get update
sudo apt-get install default-jdk
2. Set up Kafka using the following commands:
wget http://www-us.apache.org/dist/kafka/2.2.1/kafka_2.12-2.2.1.tgz
Extract the archive file:
tar xzf kafka_2.12-2.2.1.tgz
Move it into place:
mv kafka_2.12-2.2.1 /usr/local/kafka
3. Start Kafka:
From the installation directory, start the ZooKeeper service first:
cd /usr/local/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
Then, in a second terminal, start the Kafka server:
bin/kafka-server-start.sh config/server.properties