How to Build a Scalable Follower Feed in Firestore

## Update 8/28/21

I am currently in the process of creating this. Until then, check out my other solutions [here](https://dev.to/jdgamble555/firestore-many-to-many-part-4-working-follower-feeds-3k9o).

_____

As the author of [adv-firestore-functions](https://github.com/jdgamble555/adv-firestore-functions), I feel like I have figured out how to hack every problem the [Firestore Team refuses to solve internally](https://dev.to/jdgamble555/dear-firebase-team-p17), except how to connect the relational data of a follower feed. I have stared at this [Stack Overflow Question](https://stackoverflow.com/questions/46979375/firestore-how-to-structure-a-feed-and-follow-system) for hours (I will update my answer there after I post this), and filled pages of Microsoft OneNote on my iPad with conceptual ideas.

I think I have finally found a workable, scalable solution.

*SPOILER:* this is conceptual and has not been tested...

# The Problem

Allowing users to have followers and to follow other users is not a problem. You could easily add and remove a user with the collections `users/followers` and `users/following`... but here is the kicker: how do you pull only posts by users you are following and sort them by `createdAt` in descending order?

Keep in mind a user may have 2 million followers, but may only follow, say, 1000 people.

**Following Angle**: Pulling 1000 different users' latest posts to populate my feed will not only cost me extraneous reads, it will be slow to sort, and I don't need all of those posts. I only want 5 or 10 at a time. Also, the 10 latest posts may only be by 3 users, so it gets even more complicated.

**Followers Angle**: The noSQL way of doing things would theoretically be to update every user's feed with a copy of my post every time a new post is created. If a user has 2 million followers, I need to make 2 million copies. This is a crap load of writes, and could time out in a Firestore function. Again, this is very slow as well.
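To put rough numbers on the two angles, here is a minimal back-of-the-envelope sketch. The function names are hypothetical and nothing here touches Firestore; it only does the arithmetic described above, assuming the worst case where every followed user has to be queried for a full page of posts.

```typescript
// Rough cost estimates for the two naive approaches.
// All names are hypothetical; nothing here is a Firestore API.

// Followers angle: one feed copy per follower for every new post.
function fanOutWritesPerPost(followerCount: number): number {
  return followerCount;
}

// Following angle: query each followed user for a page of their latest posts,
// then merge and sort on the client. This is an upper bound on documents read.
function maxFanInReadsPerPage(followingCount: number, pageSize: number): number {
  return followingCount * pageSize;
}

console.log(fanOutWritesPerPost(2000000));   // 2000000 writes for one post
console.log(maxFanInReadsPerPage(1000, 10)); // up to 10000 reads for one page of 10
```

Either way the cost grows with the size of the social graph rather than with the 5 or 10 posts actually shown.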
# Problem Specifics

So we basically need to build an index, not time out the Firestore function, limit the number of writes, and be able to sort the results when reading without slowing down the client.

~Easy Peasy~

# Non-Scalable Method

One method you may see is to use arrays and `array-contains`, like in [Fireship.io's Data Modeling Course](https://fireship.io/courses/firestore-data-modeling/models-social-feed/). Obviously documents have size limits, so my estimate is a max of 10,000 items in an array, based on [this data](https://gist.github.com/khaykov/a6105154becce4c0530da38e723c2330). Jeff (from Fireship) also puts the latest 3 posts on a user document, which creates the problem of getting anything beyond 3 posts. Theoretically you could copy the user data into a second (or third, fourth, etc.) document for every 10,000 users... but then you still have an inexact user count if you use my `colCounter()` function, for example, and all the other problems from above.

But Fireship did get me going in the right direction... (if you're reading this, you need to check out his courses regardless, all kinds of topics outside of Firebase - [fireship.io](https://fireship.io)).

# Problem 1: Solving relations...

However, arrays are the key here... I believe a better way to model the data is like so:

- `users/$userId/followers`
- `users/$userId/followers_index`
- `posts/$postId`
- `_relations/_relationId`

Each relation document contains:

```typescript
{
  userId: '12ksl2123k',
  postId: '12sk2skeeiwo2',
  createdAt: '5/2/21', // the post date, used for sorting the feed
  followers: [
    '3k2l12k3ls',
    'g2lss9837ie',
    'titsiel22',
    // ...
  ]
}
```

And you have a copy of this for each group of 10,000 followers a user has. I will get into data consistency in a bit, so hold tight. The key here is that if a user has 20,000,000 followers, I only need 2000 copies of each post (20,000,000 / 10,000). This is HUGE!

## Sidenote

`users/userId/followers` is a collection with all followers.
`users/userId/followers_index` is a collection of documents with the followers array ready to be copied. That way you never have to read all followers one by one. Again, 2000 docs for 20,000,000 followers...

## Creating the relation index...

My goal was to write something in my `adv-firestore-functions` that does this automatically, but sadly I may never return to Firestore development due to the reasons [here](https://dev.to/jdgamble555/dear-firebase-team-p17). It would run on a `postWrite` trigger and look like this:

```typescript
// Conceptual only: relationIndex() was never implemented in the package.
await relationIndex(change, context, {
  fields: ['createdAt'],
  array_collection: `users/${author$}/followers_index`,
  array_name: 'followers'
});
```

(just like my search functions...)

I would have added options, but generally speaking it would have created 2000 documents for 20 million followers automatically, for example. It would also add the `createdAt` field for sorting (or whatever fields from the post document are necessary for your use case). This assumes the id of each document in the followers collection is the userId. Like the rest of my search indexes, if any of the fields in posts were changed or the post was deleted, it would auto-update these documents. Here are some [ideas from my package](https://github.com/jdgamble555/adv-firestore-functions/blob/master/src/shared/search.ts) on how to do that if you decide to implement this.

I would have written a second function for data consistency. To keep the data consistent, you need to update every relation document for posts by a user that someone is subscribed to. So, if user `123` unsubscribes from user `456`, `123` needs to be removed from the follower array for every post user `456` has ever created. If this is just posts, it may only be dozens or hundreds. If this is videos it could be thousands, and tweets may be tens of thousands, but I believe that is even more rare. Most cases will be 1-30 documents, not a big deal. If a user is removed, that document will always have 9999 items in the array (or fewer).
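The two operations just described (one relation doc per followers_index doc, and stripping a follower from an author's relation docs on unfollow) can be sketched in memory. This is only an illustration under my own assumptions: plain objects stand in for Firestore documents, and the type and function names are hypothetical, not part of `adv-firestore-functions`.

```typescript
// In-memory sketch; plain objects stand in for Firestore documents.
// All type and function names are hypothetical.
interface FollowersIndexDoc { followers: string[]; }
interface Post { postId: string; userId: string; createdAt: number; }
interface RelationDoc extends Post { followers: string[]; }

// One relation doc per followers_index doc, i.e. per group of up to 10,000 followers.
function buildRelationDocs(post: Post, indexDocs: FollowersIndexDoc[]): RelationDoc[] {
  return indexDocs.map((idx) => ({ ...post, followers: [...idx.followers] }));
}

// On unfollow, remove the follower from every relation doc belonging to that author.
function removeFollower(
  relations: RelationDoc[],
  authorId: string,
  followerId: string
): RelationDoc[] {
  return relations.map((rel) =>
    rel.userId === authorId
      ? { ...rel, followers: rel.followers.filter((id) => id !== followerId) }
      : rel
  );
}

const post: Post = { postId: 'p1', userId: '456', createdAt: 1 };
const followerGroups: FollowersIndexDoc[] = [
  { followers: ['123', 'abc'] },
  { followers: ['xyz'] },
];

const relations = buildRelationDocs(post, followerGroups);
const afterUnfollow = removeFollower(relations, '456', '123');

console.log(relations.length);           // 2
console.log(afterUnfollow[0].followers); // ['abc']
```

In a real trigger each of these would be a batched write instead of an array map, but the shape of the work is the same.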
Rather than writing more complex functions to refill those documents, it makes more sense to simply cap each one at 10,000 users. Users don't unsubscribe that often. This would all be done on a `users/followers` write trigger (which would also add the user to `users/followers_index`). This document would look like:

```typescript
{
  count: 52,
  followers: [
    '123ksl2',
    '2k3l22l',
    '3920132',
    's2l2235',
    // ...
  ]
}
```

...an array of followers with the total count. The docID is whatever.

You would also need a third trigger for when users delete their accounts, but you may not even want to allow that option (just disable the account instead).

Finally, you get the user feed on the front end like so:

```typescript
db.collection('_relations')
  .where('followers', 'array-contains', CURRENT_USER)
  .orderBy('createdAt', 'desc');
```

You could index all the post info on that document, or you could just pull the post document using the **postId** on the relation doc with pipes, for example. The point is, you have options...

# Problem 2: Firestore Function Limits

The next problem I believe I solved is Firebase function limits. My theory is simple: run multiple functions for chunking. The same function triggers itself until the updates are completed. Firebase functions have time and memory limits...

Internally my package would have created `_functions/${eventId}` using the Firestore event id from the Firestore functions. I do similar things in my package with `_events`, you just never needed to understand it. The `postWrite` trigger from above would basically create a new document from the eventId like so:

```typescript
{
  lastDoc: firstFollowerDoc, // placeholder: reference to the first follower document
  collection: 'users/followers_index',
  field: 'followers',
  chunk: 500
}
```

And the `_functions` collection would have another trigger that keeps updating the **lastDoc** document reference until all documents have been read... The function would get `db.collection('users/followers').startAfter(lastDoc)` in chunks of 500 and add them to the post relation index document.
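Here is a minimal in-memory simulation of that chunked loop. It is a sketch under my own assumptions, not the package's code: a `while` loop stands in for the trigger re-firing, a numeric cursor stands in for the **lastDoc** reference, and all names are hypothetical.

```typescript
// In-memory simulation of the chunked loop. In the design above, each pass
// would be a separate function invocation triggered by updating the state doc.
interface ChunkState {
  lastIndex: number; // cursor: how many followers have been processed so far
  chunk: number;     // how many to process per invocation
  done: boolean;
}

// Process one chunk and return the updated state (one "invocation").
function processChunk(all: string[], target: string[], state: ChunkState): ChunkState {
  const next = all.slice(state.lastIndex, state.lastIndex + state.chunk);
  target.push(...next);
  const lastIndex = state.lastIndex + next.length;
  return { ...state, lastIndex, done: lastIndex >= all.length };
}

const followers = Array.from({ length: 1200 }, (_, i) => `user${i}`);
const copied: string[] = [];
let state: ChunkState = { lastIndex: 0, chunk: 500, done: false };
let invocations = 0;

// Each iteration plays the role of the trigger firing again.
while (!state.done) {
  state = processChunk(followers, copied, state);
  invocations++;
}

console.log(invocations);   // 3 (500 + 500 + 200)
console.log(copied.length); // 1200
```

Because the cursor lives in the state object rather than in the function, any single invocation can stop early and the next one picks up where it left off.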
After there are no more followers left, the trigger loop ends...

Is your head exploding yet?!

The point here is not about the followers, but about the concept of **Bulk Reads and Writes** by saving your chunks and your place in a separate document. I would probably have updated my [bulk delete and bulk update](https://github.com/jdgamble555/adv-firestore-functions/blob/master/HELPER.md) functions to do this as a side note. This concept would also have been used for unfollowing, etc...

*You may not even need this, since even 2000 documents can be handled by Firestore functions easily... we know a batch can handle 500*.

This thing is freakin scalable...

# Conclusion

I am writing this article for two reasons:

1. To show proof of concept and get it out of my head
2. To hope someone someday uses some of these ideas

I would love it if someone wrote this, as I probably never will. I am exhausted by noSQL in general, but love challenges. I am currently developing in DGraph, but Supabase.io seems really interesting, as well as NHost.io. They all solve problems I never want to solve again using noSQL, and especially Firestore, perhaps the weakest in features.

If anyone wants to write this, feel free to send me a pull request. In fact, any updates to my package are welcome.

Keep ideas flowing, and keep speech free...

J

# Update 8/12/21

I thought I would give a few more specifics on how this theoretically works:

**users/{userId}**

```typescript
{
  // ...user data...
  followers_index: [
    'slejf',
    '23k2l2',
    // ...
  ],
  latestFollowersIndex: 'slejf'
}
```

The followers_index field would be a list of all doc ids for the followers index (each doc holding up to 10,000 users), with the latest one stored in latestFollowersIndex. This is NOT an array of follower ids, but an array of followers_index doc ids, each of which is a doc with an array of followers...

1. A user follows another user
   - Your client adds a new doc to `users/{userId}/followers/{followerId}`
   - the **followers** collection triggers an **onWrite** function that:
     * gets latestFollowersIndex from the user doc
     * if count is >= 10000 on the latestFollowersIndex doc, creates a new followers_index doc and sets latestFollowersIndex to the new doc
     * adds the **followerId** to the followers field in `users/{userId}/followers_index/{latestFollowersIndex}`
     * increases the **count** field
     * gets all postIds with `collection('posts').where('userId', '==', userId)`
     * for each postId, creates a new doc `_relations/_relationId` (with whatever post doc data you want)
     * copies the followers array from `users/userId/followers_index/latestFollowersIndex --> followers[]` to each relations doc created for that user's posts (there should be a relations doc for each post by each user for each 10,000 followers), so a user with 10 followers and 5 posts = 5 relation docs only
2. A user unfollows another user
   - `users/{userId}/followers/{followerId}` is deleted
   - the **onWrite** trigger removes the followerId from the relevant followers_index doc, which in turn updates the relation docs with the new array (without that user) and removes one from the count
3. A user adds a new post
   - the posts **onWrite** trigger creates one new _relations doc for each 10,000 followers, so < 10,000 followers === 1 doc

Hope this helps give a little more information,

J
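The first half of step 1 (picking or creating the right followers_index doc when someone follows) can be sketched in memory like this. It is an illustration under my own assumptions, not tested against Firestore: a `Map` stands in for the `followers_index` collection, the id generator is injected to keep the example deterministic, and every name other than the field names from the doc above is hypothetical.

```typescript
// In-memory sketch of the "user follows another user" step.
// Plain objects stand in for Firestore docs; function and type names are hypothetical.
const MAX_FOLLOWERS_PER_DOC = 10000;

interface FollowersIndexDoc { count: number; followers: string[]; }
interface UserDoc {
  followers_index: string[];           // ids of all followers_index docs
  latestFollowersIndex: string | null; // id of the doc currently being filled
}

function addFollower(
  user: UserDoc,
  indexDocs: Map<string, FollowersIndexDoc>,
  followerId: string,
  newId: () => string // id generator, injected so the sketch stays deterministic
): void {
  let latest = user.latestFollowersIndex
    ? indexDocs.get(user.latestFollowersIndex)
    : undefined;

  // Start a new index doc when none exists yet or the latest one is full.
  if (!latest || latest.count >= MAX_FOLLOWERS_PER_DOC) {
    const id = newId();
    latest = { count: 0, followers: [] };
    indexDocs.set(id, latest);
    user.followers_index.push(id);
    user.latestFollowersIndex = id;
  }

  latest.followers.push(followerId);
  latest.count++;
}

// Usage: 10,001 follows should fill one doc and spill into a second.
const user: UserDoc = { followers_index: [], latestFollowersIndex: null };
const indexDocs = new Map<string, FollowersIndexDoc>();
let n = 0;
const newId = () => `idx${n++}`;

for (let i = 0; i < 10001; i++) addFollower(user, indexDocs, `f${i}`, newId);

console.log(user.followers_index);         // ['idx0', 'idx1']
console.log(indexDocs.get('idx0')!.count); // 10000
console.log(indexDocs.get('idx1')!.count); // 1
```

Copying the updated array onto the author's relation docs would then follow as the remaining bullets in step 1 describe.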