Quick Firestore Frontend Search Index

If you didn't know that you can get full-text search capabilities in Firestore, read my article on my [adv-firestore-functions](https://dev.to/jdgamble555/firestore-full-text-search-package-1ea7) search package. However, as great as Firebase Functions are, sometimes we just want a simple and quick way to search through our data. Unfortunately, the [Firebase Team](https://dev.to/jdgamble555/dear-firebase-team-p17) has not built this natively yet. So, I wanted to create a quick way to index your data from the frontend...

**Note:** - This post uses Angular examples, but the premise applies to any framework.

## Soundex

The core of this code is based on the **soundex** function, which has been used in SQL databases for generations to emulate a fuzzy search. It translates your text so that similar sounds in the English language are stored as the same string. There are versions of this algorithm for other languages as well. Just search for `'french' + 'soundex'`, for example.

```typescript
soundex(s: string) {
  // nothing to encode for an empty string
  if (!s) {
    return '';
  }
  const a = s.toLowerCase().split("");
  const f = a.shift() as string;
  let r = "";
  // vowels map to an empty string; consonants that sound alike share a digit
  const codes = {
    a: "", e: "", i: "", o: "", u: "",
    b: 1, f: 1, p: 1, v: 1,
    c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
    d: 3, t: 3,
    l: 4,
    m: 5, n: 5,
    r: 6,
  } as any;
  r = f + a
    .map((v: string) => codes[v])
    // drop a code that repeats the previous one (or the first letter's code)
    .filter((v: any, i: number, b: any[]) =>
      i === 0 ? v !== codes[f] : v !== b[i - 1])
    .join("");
  // pad with zeros, then keep the first letter plus three digits
  return (r + "000").slice(0, 4).toUpperCase();
}
```

## Create the Index

Based on my [relevant search index](https://github.com/jdgamble555/adv-firestore-functions/blob/HEAD/SEARCHING.md), I created a simple frontend version you can use in your app.

```typescript
// Assumes doc, setDoc, deleteDoc and DocumentReference are imported from
// the modular Firestore API (for example '@angular/fire/firestore').
async searchIndex(opts: {
  ref: DocumentReference,
  after: any,
  fields: string[],
  del?: boolean,
  useSoundex?: boolean
}) {

  opts.del = opts.del || false;
  // default to soundex, but respect an explicit `useSoundex: false`
  // (`|| true` would always evaluate to true)
  opts.useSoundex = opts.useSoundex ?? true;

  const allCol = '_all';
  const searchCol = '_search';
  const termField = '_term';
  const numWords = 6;

  // get collection
  const colId = opts.ref.path.split('/').slice(0, -1).join('/');

  const searchRef = doc(
    this.afs,
    `${searchCol}/${colId}/${allCol}/${opts.ref.id}`
  );

  if (opts.del) {
    await deleteDoc(searchRef);
  } else {

    let data: any = {};
    let m: any = {};

    // go through each field to index
    for (const field of opts.fields) {

      // new indexes
      let fieldValue = opts.after[field];

      // if array, turn into string
      if (Array.isArray(fieldValue)) {
        fieldValue = fieldValue.join(' ');
      }
      let index = this.createIndex(fieldValue, numWords);

      if (opts.useSoundex) {
        // soundex index: encode each word of each phrase
        const temp = [];
        for (const i of index) {
          temp.push(i.split(' ').map(
            (v: string) => this.fm.soundex(v)
          ).join(' '));
        }
        index = temp;
        // count every word-level prefix of every phrase
        for (const phrase of index) {
          if (phrase) {
            let v = '';
            const t = phrase.split(' ');
            while (t.length > 0) {
              const r = t.shift();
              v += v ? ' ' + r : r;
              // increment for relevance
              m[v] = m[v] ? m[v] + 1 : 1;
            }
          }
        }
      } else {
        // plain-text index: count every character-level prefix
        for (const phrase of index) {
          if (phrase) {
            let v = '';
            for (let i = 0; i < phrase.length; i++) {
              v = phrase.slice(0, i + 1).trim();
              // increment for relevance
              m[v] = m[v] ? m[v] + 1 : 1;
            }
          }
        }
      }
    }
    data[termField] = m;

    // copy the fields you want available in search results
    data = {
      ...data,
      slug: opts.after.slug,
      title: opts.after.title
    };

    try {
      await setDoc(searchRef, data)
    } catch (e: any) {
      console.error(e);
    }
  }
}
```

You will also need the `createIndex` function:

```typescript
createIndex(html: string, n: number): string[] {
  // build sliding windows of up to n words from the plain text
  function createDocs(text: string) {
    const finalArray: string[] = [];
    const wordArray = text
      .toLowerCase()
      .replace(/[^\p{L}\p{N}]+/gu, ' ')
      .replace(/ +/g, ' ')
      .trim()
      .split(' ');
    do {
      finalArray.push(
        wordArray.slice(0, n).join(' ')
      );
      wordArray.shift();
    } while (wordArray.length !== 0);
    return finalArray;
  }
  // strip text from html
  function extractContent(html: string) {
    const tmp = document.createElement('div');
    tmp.innerHTML = html;
    return tmp.textContent || tmp.innerText || '';
  }
  // strip the html first, then build the phrases
  return createDocs(
    extractContent(html)
  );
}
```

**Note:** - For SSR, never access `document` directly; inject the framework's [document](https://stackoverflow.com/questions/37521298/how-to-inject-document-in-service) variable instead.

## Usage

To use it, update the index after you update the data you want searchable:

```typescript
async indexPost(id: string, data: any) {
  await this.searchIndex({
    ref: doc(this.afs, 'posts', id),
    after: data,
    fields: ['content', 'title', 'tags']
  });
}
```

Pass in all your doc data as `after`, your document ref as `ref`, and the fields you want searchable as `fields`. The rest is done automatically. If you're deleting a post, simply pass in `del: true`, and it will delete the index.

You will end up with an index like this:

![Firestore Index](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/e3htbcwntandm2eelbkr.png)

The beauty is that it automatically stores more relevant items with a higher number. If you mention `star wars` 7 times, it will have a relevance of 7.

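
To make the relevance counting concrete, here is a minimal standalone sketch of the word-prefix counting that the soundex branch performs. The helper names `slidingPhrases` and `countPrefixes` are hypothetical and not part of the code above. The sketch also skips the soundex encoding and uses an ASCII-only word split, so the keys stay readable.

```typescript
// Hypothetical helper: builds sliding windows of up to n words,
// in the same spirit as createDocs (ASCII-only split for simplicity).
function slidingPhrases(text: string, n: number): string[] {
  const words = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim()
    .split(' ')
    .filter((w) => w.length > 0);
  const phrases: string[] = [];
  for (let i = 0; i < words.length; i++) {
    phrases.push(words.slice(i, i + n).join(' '));
  }
  return phrases;
}

// Hypothetical helper: counts every word-level prefix of every phrase.
// Each key is a searchable term, each value is its relevance.
function countPrefixes(phrases: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const phrase of phrases) {
    let prefix = '';
    for (const word of phrase.split(' ')) {
      prefix = prefix ? prefix + ' ' + word : word;
      counts[prefix] = (counts[prefix] ?? 0) + 1;
    }
  }
  return counts;
}

const terms = countPrefixes(
  slidingPhrases('Star Wars is great. I love Star Wars!', 6)
);
```

Here `star wars` appears twice in the text, so `terms['star wars']` is 2, while `terms['love star wars']` is 1. In the real index the keys would be soundex codes rather than the words themselves.
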
## Searching

To actually use the index for searching, grab the term on your frontend from a form keyup value, and run the search like so:

```typescript
/**
 * Search posts by term
 * @param term
 * @returns Observable of search
 */
searchPost(term: string) {
  // encode the term the same way the index was encoded
  term = term.split(' ')
    .map(
      (v: string) => this.ns.soundex(v)
    ).join(' ');
  return collectionData(
    query(
      collection(this.afs, '_search/posts/_all'),
      // orderBy only returns documents that contain this field,
      // so it doubles as the match filter; pass 'desc' as a second
      // argument to list the most relevant documents first
      orderBy('_term.' + term),
    ),
    { idField: 'id' }
  ).pipe(
    take(1),
    debounceTime(100)
  );
}
```

As you can see, all search indexes are stored in `_search/{YOUR COLLECTION}/_all/{YOUR DOC ID}`. The field `_term` contains all of your searchable data.

This returns an observable with all of the documents that match your query. The search document also stores a copy of the document data for easy access and fewer reads. You could print just the `title` of each document if you want an autocomplete, or the whole documents if you have a full search.

## FAQ

- 1) Why do we duplicate the data in an index, and not just store the searchable information on the regular document as well?

  - Speed. You don't want to read all of the search data unless you're doing an actual search. NoSQL has to copy data for reads to be more efficient.

- 2) If I do this on the frontend, am I going to slow down my app with code that should be on the backend?

  - No. Not if you build your app efficiently. You should only be loading read functions for most users. Only when a user is logged in and wants to edit a post, or whatever searchable document, should these write functions be lazy-loaded. The `soundex` function, however, should be shared between searching and indexing.
  - If you use a router, you should update your document, redirect to that page, then run the index function in the background.

**Example**

```typescript
// add post info
try {
  this.id = await this.db.setPost(data, this.id, publish);
} catch (e: any) {
  console.error(e);
  error = true;
}

if (publish && !error) {
  this.sb.showMsg(this.messages.published);
  this.router.navigate(['/post', this.id, slug]);

  // create search index
  data.content = this.markdownService.compile(data.content);
  await this.db.indexPost(this.id, data);
}
```

After you publish your data, display the message, redirect, then run the search index in the background while you continue to browse.

Note: If you use a markdown service, you may need to compile your content to HTML before you can index it. Look at how your app works.

You may not have to do all that, as you will find this function is **really fast**.

- 3) What about security? Data integrity?

In reality, if a user wants to mess with their own index, let them. Their index is based on their content, so they have full access to those words in their index anyway. However, we don't want them messing with someone else's index, so we can use this Firestore rule:

```typescript
function searchIndex() {
  // for /_search/{collection}/_all/{docId}, request.path[4] is the
  // collection and request.path[6] is the document id
  let docPath = /databases/$(database)/documents/$(request.path[4])/$(request.path[6]);
  return get(docPath).data.authorId == request.auth.uid;
}
match /_search/{document=**} {
  allow read;
  allow write: if searchIndex();
}
```

This only lets them edit the index of a document, in whatever collection, whose `authorId` equals the logged-in user's ID. You may need to change that field name based on your app.

- 4) What if I store data in many languages?

  - Don't use the `soundex` function. Pass in `useSoundex: false`, or better yet, just modify the code to remove the soundex function. You will still have an exact search similar to `LIKE 'Term%'` in SQL, which only matches text starting with 'Term'. It will also automatically sort by relevance of the term in your data. You could also theoretically change the `soundex` function depending on the language you're searching in.

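
To illustrate the `LIKE 'Term%'` behavior from the last answer, here is a rough sketch of a character-prefix index similar to the non-soundex branch of `searchIndex`. The names `countCharPrefixes` and `relevanceOf` are hypothetical. One deliberate difference from the code above: the sketch skips prefixes that end in a space instead of trimming them, so a prefix is not counted twice within the same phrase.

```typescript
// Hypothetical helper: counts every character-level prefix of every phrase.
function countCharPrefixes(phrases: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const phrase of phrases) {
    for (let i = 1; i <= phrase.length; i++) {
      const prefix = phrase.slice(0, i);
      // assumption for this sketch: skip prefixes ending in a space
      if (prefix[prefix.length - 1] === ' ') {
        continue;
      }
      counts[prefix] = (counts[prefix] ?? 0) + 1;
    }
  }
  return counts;
}

// Hypothetical lookup: a typed term matches only if it is a stored prefix.
function relevanceOf(counts: Record<string, number>, term: string): number {
  return counts[term.toLowerCase().trim()] ?? 0;
}

const prefixIndex = countCharPrefixes(['star wars', 'star trek']);
const startsWithSta = relevanceOf(prefixIndex, 'Sta');
const startsWithWars = relevanceOf(prefixIndex, 'wars');
```

Typing `Sta` matches both phrases, so `startsWithSta` is 2. `wars` is not the start of either phrase, so `startsWithWars` is 0. In the real index, `createIndex` also stores a phrase starting at every word, which is why a term from the middle of a sentence can still match.
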
And with that, you have a fully working search index without Firebase Functions.

For more info, see the [backend version](https://dev.to/jdgamble555/firestore-full-text-search-package-1ea7), which has a few more features (creating indexes by field instead of `_all`, etc.).

**Note:** If you have a very large dataset, you could get a `too many index entries for entity` or a `firestore exceeds the maximum size` document error. If that is the case, consider parsing out `pre` tags, shortening your allowable article length, only adding the needed fields (like title) to the document, or writing custom code to split the index into multiple documents (I may do this eventually).

**UPDATE:** I fixed the bug that created overly large indexes. Check the code above: it now builds only the soundex index or only the plain-text index, never both!

Happy searching.

Yes, this site uses it!

J