Basics of the Publish/Subscribe Model

publish - Publishing means writing a message (data) to a topic; this is done by the publisher, also called the producer. A topic is like a folder in a filesystem where messages are stored, and it is divided into partitions: a single topic can have multiple partitions. A message is a key/value pair. After a message is published, the system hashes its key to decide which partition it is stored in, so the publisher does not need to care which partition holds the message. Message payloads are often serialized as JSON or XML; Avro is another schema-management framework, used to serialize messages together with their schema. An offset is an integer that increases continuously within a partition and uniquely identifies a message's position in it; a consumer resumes from the offset of the last message it consumed. Messages are often produced/published and consumed/subscribed in batches to reduce overhead.
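The key-hashing and offset behaviour described above can be sketched with a small toy model. This is not the real Kafka implementation (Kafka uses the murmur2 hash, not MD5, and the `Topic` class here is invented for illustration); it only shows how a key deterministically maps to a partition and how offsets grow within that partition:

```python
import hashlib

class Topic:
    """Toy topic: messages are routed to a partition by a hash of the key,
    and each partition assigns monotonically increasing offsets."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Hash the key to pick a partition -- the producer does not choose it.
        p = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(self.partitions)
        offset = len(self.partitions[p])  # next offset in that partition
        self.partitions[p].append((offset, key, value))
        return p, offset

topic = Topic("orders")
p1, o1 = topic.publish("user-42", '{"item": "book"}')
p2, o2 = topic.publish("user-42", '{"item": "pen"}')
# Same key -> same partition; offsets increase within the partition.
assert p1 == p2 and o2 == o1 + 1
```

Because the same key always hashes to the same partition, all messages for one key stay in order within that partition.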

subscribe - Messages are pushed to consumers, meaning consumers receive messages without having to request them. Messages are exchanged through a virtual channel called a topic. A topic is a destination where producers can publish, and subscribers can consume, messages. Every message delivered to a topic is automatically pushed to all qualified consumers.
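This push-style fan-out can be sketched with a minimal in-memory channel. The `PubSubChannel` class below is a hypothetical illustration of the general pub/sub model described above, not a Kafka API (Kafka consumers in practice pull from the broker); it shows each published message being delivered to every subscriber:

```python
class PubSubChannel:
    """Toy topic that pushes each published message to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        # A subscriber registers a callback to receive messages.
        self.subscribers.append(callback)

    def publish(self, message):
        # Push model: the channel delivers; consumers never poll.
        for callback in self.subscribers:
            callback(message)

received_a, received_b = [], []
channel = PubSubChannel()
channel.subscribe(received_a.append)
channel.subscribe(received_b.append)
channel.publish("hello")
# Both qualified consumers got the same message.
assert received_a == ["hello"] and received_b == ["hello"]
```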

broker/cluster - A broker is a single Kafka server that hosts topics and partitions. On the produce side it stores incoming messages to disk in the partition chosen by the hash of the key; on the consume side it serves fetch requests for messages. A cluster is a group of brokers; one broker is automatically elected as the controller, which manages administrative duties. For each partition, a single broker owns it and is called the leader of that partition, while other brokers can hold replicas of it; this is how redundancy is maintained. Having control of a partition is called ownership, and ownership can be taken over by another broker if the current leader fails. Kafka also has a feature called retention: a topic can be configured to retain its messages under a certain condition, which may be a certain period of time or a certain size of the topic, and these retention settings can be configured per topic to override the cluster defaults.
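The retention idea, trimming a partition's log by age or by size, can be sketched as a pure function. This is a toy model under assumed names (`enforce_retention` is not a Kafka API; real Kafka deletes whole log segments, not individual messages, based on settings like `retention.ms` and `retention.bytes`):

```python
import time

def enforce_retention(messages, max_age_seconds=None, max_count=None, now=None):
    """Drop messages older than max_age_seconds, then keep at most max_count
    of the newest ones. Each message is a (timestamp, payload) tuple."""
    now = time.time() if now is None else now
    kept = messages
    if max_age_seconds is not None:
        # Time-based retention: discard anything past the age limit.
        kept = [(ts, m) for ts, m in kept if now - ts <= max_age_seconds]
    if max_count is not None and len(kept) > max_count:
        # Size-based retention: keep only the newest messages.
        kept = kept[-max_count:]
    return kept

log = [(100, "a"), (200, "b"), (300, "c")]
assert enforce_retention(log, max_age_seconds=150, now=320) == [(200, "b"), (300, "c")]
assert enforce_retention(log, max_count=1) == [(300, "c")]
```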

Multiple datacentres - If there are multiple datacentres, each running its own cluster, and every modification of a message in one datacentre needs to be copied to the other datacentres, Kafka's built-in redundancy cannot do it, because replication works only within a single cluster. Instead Kafka uses a middleware tool named MirrorMaker, which acts as a combined consumer and producer: it consumes messages from one cluster and produces them to another.
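The MirrorMaker loop, consume from a source cluster and re-produce to a target cluster, can be sketched like this. The `mirror` function and the plain-list "logs" are hypothetical stand-ins for real cluster connections; only the consume-then-produce shape matches the tool:

```python
def mirror(source_log, target_log, committed_offset):
    """Toy MirrorMaker pass: consume messages from the source cluster's log
    starting at the last committed offset, produce them to the target cluster's
    log, and return the new committed offset."""
    for msg in source_log[committed_offset:]:
        target_log.append(msg)  # re-publish in the other datacentre
    return len(source_log)

source, target = ["m1", "m2", "m3"], []
offset = mirror(source, target, 0)
assert target == source and offset == 3

# New messages arriving later are mirrored on the next pass.
source.append("m4")
offset = mirror(source, target, offset)
assert target == ["m1", "m2", "m3", "m4"]
```

Tracking the committed offset is what lets mirroring resume after a failure without duplicating already-copied messages.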
