Apache Kafka is a distributed streaming platform

Posted By : Pradeep Singh Kushwah | 28-Dec-2017


Apache Kafka is an open-source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue designed as a distributed transaction log", [3] which makes it highly valuable for enterprise infrastructures that process streaming data. In addition, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream-processing library.


Messaging System:-

A messaging system is responsible for transferring data from one application to another, so applications can focus on the data itself without worrying about how to share it. Distributed messaging is based on the concept of reliable message queuing. Messages are queued asynchronously between client applications and the messaging system. There are two types of messaging patterns: point-to-point and publish-subscribe (pub-sub). Most messaging patterns follow pub-sub.


Point-to-Point Messaging System:-
In a point-to-point system, messages are persisted in a queue. One or more consumers can consume the messages in the queue, but a particular message can be consumed by at most one consumer. Once a consumer reads a message from the queue, it disappears from that queue. A typical example of this system is an order processing system, where each order is processed by exactly one order processor, but multiple order processors can work at the same time. The following diagram shows the structure.
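
As a rough illustration of these queue semantics, here is a small in-memory sketch in Python. It does not use Kafka or any real message broker; `PointToPointQueue` and the two order processors are hypothetical names invented for this example. It shows only that a message is removed when it is read, so no second consumer can receive it.

```python
from collections import deque


class PointToPointQueue:
    """In-memory stand-in for a point-to-point queue (not a Kafka API)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message on read is what guarantees that
        # no second consumer can ever see it.
        return self._messages.popleft() if self._messages else None


queue = PointToPointQueue()
for order_id in range(1, 6):
    queue.send(f"order-{order_id}")

# Two hypothetical order processors take turns pulling from the same queue.
processed = {"processor-a": [], "processor-b": []}
consumers = list(processed)
turn = 0
while (message := queue.receive()) is not None:
    processed[consumers[turn % 2]].append(message)
    turn += 1

print(processed)
```

Each order ends up with exactly one processor, and the queue is empty once every order has been read.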


Publish-Subscribe Messaging System:-

In the publish-subscribe system, messages are persisted in a topic. Unlike the point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics. In the publish-subscribe system, message producers are called publishers and message consumers are called subscribers. A real-life example is Dish TV, which publishes different channels such as sports, movies, and music; anyone can subscribe to their own set of channels and receive them whenever the subscribed channels are available.
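
The contrast with a queue can be sketched with a toy in-memory broker in Python. This is only an illustration under simplifying assumptions: `Broker`, `subscribe`, and `publish` are hypothetical names, not Kafka's client API, and messages are pushed straight into per-subscriber lists instead of being retained in the topic.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory pub-sub broker; the names here are illustrative only."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of inboxes

    def subscribe(self, topic):
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Every subscriber of the topic receives its own copy.
        for inbox in self._subscribers[topic]:
            inbox.append(message)


broker = Broker()
alice_sports = broker.subscribe("sports")
bob_sports = broker.subscribe("sports")
bob_movies = broker.subscribe("movies")

broker.publish("sports", "match highlights")
broker.publish("movies", "new release")

print(alice_sports)
print(bob_sports)
print(bob_movies)
```

Both subscribers to the "sports" topic receive the same message, which is the key difference from the point-to-point model, where a message goes to at most one consumer.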


What is Kafka?

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and lets you pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka was built on top of the ZooKeeper synchronization service, although newer releases can run without it in KRaft mode. It integrates very well with Apache Storm and Spark for real-time streaming data analysis.



Reliability: Kafka is distributed, partitioned, replicated, and fault tolerant.

Scalability: Kafka's messaging system scales easily without downtime.

Durability: Kafka uses a distributed commit log, which means messages are persisted on disk as quickly as possible and are therefore durable.

Performance: Kafka is very fast, and with appropriate replication settings it is designed to avoid downtime and data loss.
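
The commit log mentioned under Durability can be pictured as an append-only list in which every record has a numeric offset and each consumer remembers how far it has read. The Python sketch below is a simplified model of that idea, assuming a single in-memory log; `CommitLog` and the consumer names are hypothetical, and it ignores disk persistence, partitioning, and replication entirely.

```python
class CommitLog:
    """Toy append-only log; a stand-in for one partition, not Kafka's implementation."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # Reading never deletes anything; the caller just advances its offset.
        return self._records[offset:offset + max_records]


log = CommitLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

# Each hypothetical consumer tracks its own position in the log.
offsets = {"analytics": 0, "billing": 2}
analytics_batch = log.read(offsets["analytics"])
billing_batch = log.read(offsets["billing"])

print(analytics_batch)
print(billing_batch)
```

Because reads do not remove records, a slow or newly started consumer can still replay earlier messages, which is what lets one log serve many independent consumers.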


Need for Kafka:-
Kafka is a unified platform for handling all real-time data feeds. Kafka supports low-latency message delivery and guarantees fault tolerance in the presence of machine failures. It can handle a large number of diverse consumers. Kafka is very fast; it has been benchmarked at 2 million writes/sec. Kafka persists all data to disk, which essentially means that all writes go to the operating system's page cache (RAM). This makes it very efficient to transfer data from the page cache to a network socket.
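
The file-to-socket transfer described above can be tried out in miniature with Python's standard library. This is a hedged sketch, not Kafka code: the temporary file stands in for a log segment, the local socket pair stands in for a consumer connection, and `payload` is made-up data. `socket.sendfile()` asks the operating system to copy the file to the socket directly where the platform supports it, and falls back to ordinary `send()` calls otherwise.

```python
import socket
import tempfile

payload = b"kafka-log-segment-bytes\n" * 4

with tempfile.TemporaryFile() as segment:
    # Write the stand-in "log segment" and rewind so it can be sent.
    segment.write(payload)
    segment.flush()
    segment.seek(0)

    sender, receiver = socket.socketpair()
    with sender, receiver:
        # socket.sendfile() uses os.sendfile() where the platform supports it,
        # and falls back to plain send() calls otherwise.
        sent = sender.sendfile(segment)

        # A stream socket may deliver the data in pieces, so read until complete.
        received = b""
        while len(received) < sent:
            received += receiver.recv(4096)

print(sent, received == payload)
```

On platforms where the direct path is available, the file contents do not have to pass through a user-space buffer in the sending process, which is the efficiency gain the paragraph above refers to.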


Applications of Apache Kafka:-

Twitter is an online social networking service that provides a platform for sending and receiving user tweets. Registered users can read and post tweets, but unregistered users can only read them. Twitter uses Storm-Kafka as part of its stream-processing infrastructure.


Apache Kafka is used at LinkedIn for activity stream data and operational metrics. The Kafka messaging system supports various LinkedIn products, such as LinkedIn Newsfeed and LinkedIn Today, for online message consumption, in addition to offline analytics systems such as Hadoop. Kafka's strong durability and high performance are also key factors in its use at LinkedIn.


Netflix is an American multinational provider of on-demand Internet streaming media. Netflix uses Kafka for real-time monitoring and event processing.


Mozilla is a free-software community created in 1998 by members of Netscape. Kafka will soon replace part of Mozilla's current production system to collect performance and usage data from end users' browsers for projects such as Telemetry and Test Pilot.


Oracle provides native connectivity to Kafka from its Enterprise Service Bus product, OSB (Oracle Service Bus), which allows developers to use OSB's built-in mediation capabilities to implement staged data pipelines.


