This is part 7 of the Developing Instagram Clone series. The other parts are listed below:

  1. Developing Instagram Clone: Introduction.
  2. Developing Instagram Clone: Discovery Service.
  3. Developing Instagram Clone: Auth Service.
  4. Developing Instagram Clone: Media Service.
  5. Developing Instagram Clone: Post Service.
  6. Developing Instagram Clone: Graph Service.
  7. Developing Instagram Clone: Newsfeed Service.
  8. Developing Instagram Clone: Gateway Service.
  9. Developing Instagram Clone: Front-end Service.

Every social network has a place where updates from the people you follow show up: the news feed. In our case, updates mean new photos.

The feed service has two main responsibilities:

  1. Generate the user's feed from the posts of the users they follow.
  2. Retrieve user feed.

Feed Generation Models

We will discuss two models: the pull model and the push model.

Pull model: whenever the feed service receives a request to generate the feed for a user, it performs the following steps:

  • 1. Retrieve IDs of all users the user follows.
  • 2. Retrieve latest posts for those IDs.
  • 3. Store this feed in the cache and return top posts (say 20).
  • 4. On the front-end, when the user reaches the end of her current feed, she can fetch the next 20 posts from the server and so on.

In this model, generating the feed gets expensive when the user follows many users, because the posts of every followed user have to be fetched and merged on each request.
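The pull steps above can be sketched in a few lines. This is only an in-memory illustration, not the service's actual code: `PullFeedService`, `Post`, the `following` and `posts_by_author` dictionaries, and the `limit` of 200 cached posts are all hypothetical stand-ins for calls to the graph and post services.

```python
from dataclasses import dataclass
from heapq import nlargest
from typing import Dict, List

PAGE_SIZE = 20  # "top posts (say 20)" from the steps above


@dataclass(frozen=True)
class Post:
    post_id: int
    author_id: str
    created_at: int  # e.g. a Unix timestamp


class PullFeedService:
    """In-memory stand-in for the pull model; all names are hypothetical."""

    def __init__(self, following: Dict[str, List[str]],
                 posts_by_author: Dict[str, List[Post]]):
        self.following = following              # user -> IDs of users they follow
        self.posts_by_author = posts_by_author  # author -> that author's posts
        self.cache: Dict[str, List[Post]] = {}  # user -> generated feed

    def generate_feed(self, user_id: str, limit: int = 200) -> List[Post]:
        # Step 1: IDs of all users this user follows.
        followed = self.following.get(user_id, [])
        # Step 2: latest posts of those users. There is one lookup per followed
        # user, which is why this gets slow for users who follow many accounts.
        candidates = [p for author in followed
                      for p in self.posts_by_author.get(author, [])]
        feed = nlargest(limit, candidates, key=lambda p: p.created_at)
        # Step 3: cache the feed and return the first page.
        self.cache[user_id] = feed
        return feed[:PAGE_SIZE]

    def next_page(self, user_id: str, offset: int) -> List[Post]:
        # Step 4: the front-end asks for the next 20 posts of the cached feed.
        return self.cache.get(user_id, [])[offset:offset + PAGE_SIZE]
```

Note that all the work happens at read time: nothing is stored until somebody asks for a feed.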

Push model: the service keeps a feed entry for each user in a database (Cassandra in our case), and whenever a user creates a post, it performs the following steps:

  • 1. Retrieve IDs of all users who follow the user who created the post.
  • 2. Push the post to the feeds for those IDs in the database.

In our case we use the push model: the service listens for the post-created event and performs the steps above whenever it receives one.
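As a rough sketch of the push model, the handler below fans a new post out to the stored feeds of the author's followers. It is an in-memory illustration under assumptions: the `PostCreated` event shape, the `PushFeedService` class and the `followers` dictionary are hypothetical, and a plain dictionary stands in for the Cassandra table.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PostCreated:
    """Hypothetical shape of the 'post created' event."""
    post_id: int
    author_id: str
    created_at: int


class PushFeedService:
    def __init__(self, followers: Dict[str, Set[str]]):
        self.followers = followers  # author -> IDs of users who follow them
        # Stand-in for the per-user feed entries kept in the database.
        self.feeds: Dict[str, List[PostCreated]] = defaultdict(list)

    def on_post_created(self, event: PostCreated) -> None:
        # Step 1: find everyone who follows the author.
        for follower_id in self.followers.get(event.author_id, set()):
            # Step 2: push the post into each follower's stored feed.
            self.feeds[follower_id].append(event)

    def get_feed(self, user_id: str, limit: int = 20) -> List[PostCreated]:
        # Reading is a single lookup by user; newest posts come first.
        feed = self.feeds.get(user_id, [])
        return sorted(feed, key=lambda e: e.created_at, reverse=True)[:limit]
```

The cost moves from read time to write time: one post by a user with many followers triggers many feed writes, while reading a feed stays a single lookup.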

Why Cassandra?

We chose Cassandra because it is built for very high read and write throughput and scales close to linearly as nodes are added. It also operates in a peer-to-peer fashion: there is no master-worker relationship between nodes, so there is no single point of failure, which gives high availability.

Why not MongoDB? Because MongoDB's write performance tends to suffer at scale, and in our case both read and write performance are really important.

Cassandra uses a query language called CQL (Cassandra Query Language), which is very similar to SQL. It shares most of SQL's terminology, except that a schema is called a “keyspace”.

Cassandra Data Sharding

Sharding means distributing data across multiple nodes (servers) when the data set is too large to fit on one server.

Cassandra uses a partition key to determine which node stores a row, and a clustering key to determine how rows are sorted within a partition.
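To make the two roles concrete, here is a toy model of a cluster. It is a deliberate simplification and not how Cassandra works internally: Cassandra places partitions on a token ring, while this sketch simply hashes the partition key modulo the node count. `ToyCluster` and its methods are hypothetical names.

```python
import bisect
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyCluster:
    """Toy model of partitioning; not how Cassandra is implemented."""

    def __init__(self, node_count: int):
        # Each node maps partition key -> rows kept sorted by clustering key.
        self.nodes: List[Dict[str, List[Tuple[int, str]]]] = [
            defaultdict(list) for _ in range(node_count)
        ]

    def node_for(self, partition_key: str) -> int:
        # Hash the partition key to pick a node (simplified: hash modulo nodes).
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def insert(self, partition_key: str, clustering_key: int, value: str) -> None:
        rows = self.nodes[self.node_for(partition_key)][partition_key]
        # The clustering key decides the row's position inside the partition.
        bisect.insort(rows, (clustering_key, value))

    def read_partition(self, partition_key: str) -> List[Tuple[int, str]]:
        # All rows of one partition live on one node, already in order.
        node = self.nodes[self.node_for(partition_key)]
        return list(node.get(partition_key, []))
```

For a feed, the natural choice would be the user as the partition key and the post's creation time as the clustering key, so that one user's feed is read from a single node in sorted order.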

Cassandra Model Design

To use Cassandra effectively, you have to follow a table-per-query approach: if you have queries that retrieve feeds ordered by creation date both ascending and descending, you should have one table that stores the data in ascending order and another that stores it in descending order.

Data redundancy is the price you pay for this performance.
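A minimal sketch of what table-per-query could look like for the feed. The table and column names (`user_feed_newest_first`, `user_feed_oldest_first`, `username`, `created_at`, `post_id`) are assumptions made for illustration, not the schema used in this series. The Python code only builds the CQL text; running it against a cluster is left out.

```python
from typing import List

# Hypothetical table names mapped to the sort order each one serves.
FEED_TABLES = {
    "user_feed_newest_first": "DESC",
    "user_feed_oldest_first": "ASC",
}


def feed_table_ddl(table: str, order: str) -> str:
    """Build the CQL that creates one feed table with a fixed sort order."""
    if order not in ("ASC", "DESC"):
        raise ValueError("order must be 'ASC' or 'DESC'")
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    username   text,\n"
        "    created_at timestamp,\n"
        "    post_id    text,\n"
        "    PRIMARY KEY ((username), created_at, post_id)\n"
        f") WITH CLUSTERING ORDER BY (created_at {order}, post_id ASC);"
    )


def insert_statements() -> List[str]:
    """Every new feed entry is written once per table: the redundancy cost."""
    return [
        f"INSERT INTO {table} (username, created_at, post_id) VALUES (?, ?, ?)"
        for table in FEED_TABLES
    ]


if __name__ == "__main__":
    for name, order in FEED_TABLES.items():
        print(feed_table_ddl(name, order), end="\n\n")
```

Both tables share the same partition key and differ only in clustering order, so each query reads rows in the order they are already stored on disk.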

Cassandra on docker

Now it is time to uncomment the Cassandra dependency in our docker compose file.

cassandra:
  image: cassandra:latest
  ports:
    - "7000:7000"
    - "9042:9042"
  volumes:
    - /home/amr/instaCassandra:/var/lib/cassandra

You should change the path “/home/amr/instaCassandra” to a path that exists on your machine.
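Once the container is up, a quick way to confirm that the mapped CQL port (9042 in the compose entry above) accepts connections is a plain TCP check. This is a small sketch using only the Python standard library; `is_port_open` is a hypothetical helper, and an open port only shows that something is listening, not that Cassandra has finished starting.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, `is_port_open("127.0.0.1", 9042)` should return True after the container has started, assuming Docker publishes the port on localhost.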


Microservices In Practice: Developing Instagram Clone —Newsfeed Service