What Approaches Do We Have When It Comes to Search Engines?
Recently I have been responsible for building search features in the frontend. I came up with the following approaches:
- We send everything we want to search from the backend to the frontend, and we use either standard regular expressions or a dedicated library such as Fuse.js or lunr.js to query for the desired results. This works perfectly fine for static web pages (such as this blog).
- We build an Elastic Stack, such as Elasticsearch and Kibana, which in essence also saves results as Document objects and indexes their fields for searching the documents.
And after struggling through tutorials on YouTube, I came across:
- We use Algolia by feeding it our JSON files (which consist of search targets) and setting the field names we want to use as search indexes.
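To make the first approach concrete, here is a minimal sketch of the plain regular-expression variant, before any library is involved. The `Doc` shape and the sample data are assumptions made up for illustration; they only loosely follow the blog.json entries used later.

```typescript
// Hypothetical document shape, loosely modelled on the blog.json entries used later.
interface Doc {
  id: string;
  title: string;
  content: string;
}

// Escape regex metacharacters so that user input is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every document whose title or content contains the query (case-insensitive).
function regexSearch(docs: Doc[], query: string): Doc[] {
  const trimmed = query.trim();
  if (!trimmed) return [];
  const pattern = new RegExp(escapeRegExp(trimmed), "i");
  return docs.filter((d) => pattern.test(d.title) || pattern.test(d.content));
}

const sampleDocs: Doc[] = [
  { id: "a", title: "Write Middleware in Redux-Toolkit", content: "sample usage" },
  { id: "b", title: "Search with lunr.js", content: "indexing documents" },
];

console.log(regexSearch(sampleDocs, "redux").map((d) => d.id));
```

This only does substring matching, with no ranking and no typo tolerance, which is where libraries such as Fuse.js and lunr.js come in.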
Code Implementation for Fuse.js
Search Target
First we build our blog.json file, which serves as a search resource.
[
  ...
  {
    "content": "...",
    "title": "Write Middleware in Redux-Toolkit",
    "date": "2023-06-20T00:00:00.000Z",
    "id": "blog0132",
    "tag": "react",
    "intro": "We list sample usage of ...",
    "toc": true
  },
  ...
]
Build a blog.json which Contains Documents to Search
In my case I use the following script:
import fs from "fs";
import matter from "gray-matter";
import path from "path";

const mdDirs = ["./src/mds/articles/tech", "./src/mds/articles/math"];

// Recursively collect the paths of all .md files under dir.
const getAllMdFilePaths = (dir: string) => {
  const mdFiles: string[] = [];
  const getFiles = (dir: string) => {
    const paths = fs.readdirSync(dir);
    paths.forEach((p) => {
      const newPath = path.join(`${dir}/${p}`);
      const pathStat = fs.statSync(newPath);
      if (pathStat.isDirectory()) {
        getFiles(newPath);
      } else {
        if (newPath.endsWith(".md")) {
          mdFiles.push(newPath);
        }
      }
    });
  };
  getFiles(dir);
  return mdFiles;
};

// Parse the front matter of every markdown file and write all non-WIP posts into blog.json.
const writeMdInJson = () => {
  const targetPaths = "./src/mds/blog.json";
  const blogJson: any[] = [];
  for (const dirpath of mdDirs) {
    const mdpaths = getAllMdFilePaths(dirpath);
    // mdPath (not path) so that the imported path module is not shadowed
    mdpaths.forEach((mdPath) => {
      const mdText = fs.readFileSync(mdPath, { encoding: "utf8", flag: "r" });
      const { data, content } = matter(mdText);
      const { wip = false } = data;
      if (!wip) {
        blogJson.push({ content, ...data });
      }
    });
  }
  fs.writeFileSync(
    targetPaths,
    JSON.stringify(blogJson, null, 0)
      .replace(/(\\r\\n)/g, " ")
      .replace(/`/g, "")
      .replace(/\s+/g, " ")
  );
};

const main = () => {
  writeMdInJson();
};

main();
Search Component
Fuse.js (Deprecated as the result is not satisfactory)
Next, in our search component:
import { ChangeEvent, useRef, useState } from "react";
import Fuse from "fuse.js";
// Assumption: debounce comes from lodash; any debounce helper with the same shape works.
import debounce from "lodash/debounce";
import searchJson from "../../../mds/blog.json";
// config and SearchBar are project-specific; their imports are omitted here.

export default function SearchComponent() {
  // Build the Fuse index once and keep it for the lifetime of the component.
  const fuse = useRef(
    new Fuse(searchJson, {
      keys: ["content", "tag", "tags", "title", "intro"],
      threshold: config.fuzzySearchThreshold,
    })
  );
  const [searchResults, setSearchedResults] = useState<
    { title: string; intro: string; tag: string; tags: string }[]
  >([]);
  const [searchText, setSearchText] = useState("");
  const searchBarRef = useRef<HTMLInputElement>(null);

  const handleSearchChange = debounce((e: ChangeEvent<HTMLInputElement>) => {
    setSearchText(e.target.value);
    const searchValue = e.target.value;
    if (searchValue) {
      const result = fuse.current.search(searchValue);
      setSearchedResults(
        result.map((r) => {
          const { title, intro, tag, tags } = r.item;
          return { title, intro, tag: tag || "", tags: tags || "" };
        })
      );
    } else {
      setSearchedResults([]);
    }
  }, 300);

  return (
    <SearchBar
      placeholder="Tag, title or content"
      onChange={handleSearchChange}
      inputRef={searchBarRef}
    />
  );
}
- The Fuse object can be created anywhere and imported into the component.
- In my case I simply use useRef, as it is always going to be static and unchanged throughout the life cycle of the SearchComponent.
lunr.js, a Much More Powerful Alternative to Fuse.js
The implementation is very similar to that of Fuse.js:
import { ChangeEvent, useRef, useState } from "react";
import lunr from "lunr";
// Assumption: debounce comes from lodash; any debounce helper with the same shape works.
import debounce from "lodash/debounce";
import searchJson from "../../../mds/blog.json";

type SearchDoc = { content: string; title: string; intro: string; tag: string; tags: string };

export default function SearchComponent() {
  const [searchResults, setSearchedResults] = useState<
    { title: string; intro: string; tag: string; tags: string }[]
  >([]);
  const [searchText, setSearchText] = useState("");
  const searchBarRef = useRef<HTMLInputElement>(null);
  const lunrSearch = useRef<lunr.Index | null>(null);
  // Maps the ref returned by lunr back to the original document.
  const searchMapping = useRef<{ [id: string]: SearchDoc }>({});

  // The lazy initializer of useState runs only once, on the first render,
  // so the index is built a single time.
  useState(() => {
    lunrSearch.current = lunr(function () {
      this.field("tag");
      this.field("tags");
      this.field("title");
      this.field("intro");
      this.field("content");
      console.log("indexing ...");
      (searchJson as {
        content: string;
        title: string;
        date: string;
        id: string;
        tag?: string;
        tags?: string;
        intro: string;
        toc: boolean;
      }[]).forEach((searchTarget, index) => {
        const id = index.toString();
        const { intro, tag = "", tags = "", title, content } = searchTarget;
        const searchDoc = { intro, tag, tags, title, content };
        searchMapping.current[id] = searchDoc;
        this.add({ ...searchDoc, id });
      });
    });
  });

  const handleSearchChange = debounce((e: ChangeEvent<HTMLInputElement>) => {
    setSearchText(e.target.value);
    const searchValue = e.target.value;
    if (searchValue) {
      const result = lunrSearch?.current?.search(searchValue);
      const displayResult =
        result
          ?.sort((r1, r2) => r2.score - r1.score)
          .map((r) => {
            const { ref } = r;
            const doc = searchMapping.current?.[ref];
            // we don't need to return content in the search results
            return { intro: doc.intro, tag: doc.tag, tags: doc.tags, title: doc.title };
          }) || [];
      setSearchedResults(displayResult);
    } else {
      setSearchedResults([]);
    }
  }, 300);

  return (
    <SearchBar
      placeholder="Tag, title or content"
      onChange={handleSearchChange}
      inputRef={searchBarRef}
    />
  );
}
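The pattern used with lunr.js above is "index once, then search returns refs that are mapped back to documents". The following toy inverted index is a rough sketch of that idea only. It is not how lunr.js is implemented (there is no stemming and no real relevance scoring), and the `SearchTarget` shape and sample documents are made up for illustration.

```typescript
// Hypothetical, trimmed-down document shape.
interface SearchTarget {
  title: string;
  intro: string;
}

interface Hit {
  ref: string;
  score: number;
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Built once: token -> (ref -> number of occurrences of the token in that document).
function buildIndex(docs: SearchTarget[]): Map<string, Map<string, number>> {
  const index = new Map<string, Map<string, number>>();
  docs.forEach((doc, i) => {
    const ref = i.toString();
    tokenize(`${doc.title} ${doc.intro}`).forEach((token) => {
      const postings = index.get(token) ?? new Map<string, number>();
      postings.set(ref, (postings.get(ref) ?? 0) + 1);
      index.set(token, postings);
    });
  });
  return index;
}

// Queried many times: sum the occurrences per ref and return the highest score first.
function queryIndex(index: Map<string, Map<string, number>>, query: string): Hit[] {
  const scores = new Map<string, number>();
  tokenize(query).forEach((token) => {
    index.get(token)?.forEach((count, ref) => {
      scores.set(ref, (scores.get(ref) ?? 0) + count);
    });
  });
  return Array.from(scores.entries())
    .map(([ref, score]) => ({ ref, score }))
    .sort((a, b) => b.score - a.score);
}

const targets: SearchTarget[] = [
  { title: "Write Middleware in Redux-Toolkit", intro: "We list sample usage of middleware" },
  { title: "Search with lunr.js", intro: "Indexing documents for search" },
];
const toyIndex = buildIndex(targets);

// The hits only carry refs, so we map them back to documents, as searchMapping does above.
console.log(queryIndex(toyIndex, "middleware").map((hit) => targets[Number(hit.ref)]?.title));
```

Keeping only refs in the index and the full documents in a separate mapping is what allows the component to drop the large content field from the results it displays.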
Code Implementation for Algolia
Backend Using Java
Responsibilities of Backend in Using Algolia
Our backend takes on the following tasks:
- Provide ALGOLIA_SEARCH_INDEX
- Provide applicationID
- Provide the frontend client with searchApiKeys with different privileges for searching, for example:
  - Admin users can search everything
  - Users of some organization can only search their own related emails
- Upload searchable targets (named Record) to the Algolia database
- Add new search items into Algolia when needed (like emails)
Dependencies
After registering an account in Algolia and creating an application there, we include the following two dependencies:
<dependency>
  <groupId>com.algolia</groupId>
  <artifactId>algoliasearch-core</artifactId>
  <version>3.16.5</version>
</dependency>
<dependency>
  <groupId>com.algolia</groupId>
  <artifactId>algoliasearch-java-net</artifactId>
  <version>3.16.5</version>
</dependency>
Record Object
- Algolia requires users to define a Record object which contains at least a non-nullable field called objectID.
- Luckily, we use MongoDB in our Java backend, so we simply use a stringified _id as the objectID, and we use modelMapper.map() to turn a Document object into our desired Record object:
package com.organization.web.service.dto;

import java.util.List;

import lombok.Data;

@Data
public class EmailChainRecord {
    @Data
    public static class Supplier {
        private List<String> material_manu_internal_codes;
    }

    @Data
    public static class NameField {
        private String name;
    }

    @Data
    public static class EmailField {
        private String body;
        private List<String> participant_emails;
    }

    @Data
    public static class SenderInDb {
        private Integer id;
        private String user_name;
        private String first_name;
        private String last_name;
        private String email;
    }

    @Data
    public static class Task {
        private String code;
        private String name;
    }

    @Data
    public static class Section {
        private String name;
        private List<Task> tasks;
    }

    @Data
    public static class ProgramDetail {
        private String prog_ref_no;
        private String name;
        private List<Section> sections;
    }

    private String oid;
    private String objectID;
    private String title;
    private String buyer_company_code;
    private String latest_gmail_snippet;
    private List<String> sender_emails;
    private List<SenderInDb> sendersInDb;
    private NameField buyerCompanyDetail;
    private NameField projectDetail;
    private List<ProgramDetail> programmesDetail;
    private List<EmailField> emails_body;
    private List<String> participant_emails;
}
SearchIndex Object
In both frontend and backend, the major API calls are all managed by the SearchIndex object:
package com.organization.web.algolia;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Service;

import com.algolia.search.DefaultSearchClient;
import com.algolia.search.SearchClient;
import com.algolia.search.SearchIndex;
import com.organization.web.controller.err.CustomException;
import com.organization.web.service.dto.EmailChainRecord;

@Service
public class Algolia {
    @Value("${algolia.application.id}")
    private String applicationID;

    @Value("${algolia.api.key}")
    private String APIKEY;

    @Bean
    public SearchClient getSearchClient() throws CustomException {
        if (this.applicationID == null || this.APIKEY == null) {
            throw new CustomException("application id and apikey cannot be null for algolia");
        }
        return DefaultSearchClient.create(this.applicationID, this.APIKEY);
    }

    @Bean
    public SearchIndex<EmailChainRecord> getIndex() throws CustomException {
        SearchClient client = getSearchClient();
        var initedIndex = client.initIndex("correspondence", EmailChainRecord.class);
        return initedIndex;
    }
}
SearchService: All the Utility Functions
Constructor Injection. To facilitate unit testing, we use autowired constructor injection:
package com.organization.web.service.impl;

import com.algolia.search.SearchClient;
import com.algolia.search.SearchIndex;
import com.algolia.search.models.apikeys.SecuredApiKeyRestriction;
import com.algolia.search.models.indexing.Query;
import com.algolia.search.models.settings.IndexSettings;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.organization.web.controller.codes.UserRoles;
import com.organization.web.controller.err.CustomException;
import com.organization.web.mongodb.CollectionNames;
import com.organization.web.mongodb.MongoDB;
import com.organization.web.mongodb.MongoDB.JsonPipeline;
import com.organization.web.service.SearchService;
import com.organization.web.service.dto.EmailChainRecord;
import com.organization.web.service.dto.EmailChainRecord.ProgramDetail;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.commons.collections4.ListUtils;
import org.bson.Document;
import org.bson.types.ObjectId;
import org.modelmapper.ModelMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;

@Service
public class SearchServiceImpl implements SearchService {

    @Value("${algolia.public.search.api.key}")
    private String publicSearchAPIKey;

    private MongoDB mongodb;
    private ModelMapper modelMapper = new ModelMapper();
    private SearchIndex<EmailChainRecord> index;
    private SearchClient searchClient;
    // A search key that you keep private

    @Autowired
    public SearchServiceImpl(
            MongoDB mongodb,
            ModelMapper modelMapper,
            SearchIndex<EmailChainRecord> index,
            SearchClient searchClient) {
        this.mongodb = mongodb;
        this.modelMapper = modelMapper;
        this.index = index;
        this.searchClient = searchClient;
    }

    public void clearObjects() {
        this.index.clearObjects();
    }
Insert Data Into Algolia.
    public void insertEmailsIntoAlgolia() {
        clearObjects();
        ...
        var searchDocuments = someCollection
                .aggregate(somePipeline)
                .map(u -> {
                    EmailChainRecord record = modelMapper.map(u, EmailChainRecord.class);
                    // refine data in record for search logic
                    // (done here because forEach returns void and cannot be chained)
                    return record;
                })
                .into(new ArrayList<EmailChainRecord>());
        if (searchDocuments != null) {
            this.index.saveObjects(searchDocuments).waitTask();
        }
    }
Define Attributes that Contribute to the Search.
    public void setKeyAndFacetsForQueryAndFilter() {
        var indexSettings = new IndexSettings();

        List<String> attributes = Arrays.asList(
                "latest_gmail_snippet",
                "sender_emails",
                "projectDetail.name",
                "searchabletitle",
                "title",
                "projectDetail.name",
                "senderInDb.user_name",
                "senderInDb.first_name",
                "senderInDb.last_name",
                "programmesDetail.name",
                "programmesDetail.sections.tasks.code",
                "buyerCompanyDetail.name",
                "emails_body.body",
                "emails_body.participant_emails",
                "participant_emails");
        indexSettings.setSearchableAttributes(attributes);
Define Facets (Configurations for the Search Keys).
        List<String> filterFacets = Arrays.asList(
                "filterOnly(participant_emails)",
                "filterOnly(emails_body.participant_emails)");
        List<String> searchFacets = attributes.stream()
                .map(key -> String.format("searchable(%s)", key))
                .collect(Collectors.toList());
Add the Facets into Index Settings. ListUtils.union is the same as arr1 + arr2 in Python:
        indexSettings.setAttributesForFaceting(
                ListUtils.union(searchFacets, filterFacets));

        this.index.setSettings(indexSettings);
    }
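To see what the final facet list looks like, here is a small sketch of the same string construction, written in TypeScript to stay consistent with the frontend examples. The function name and the sample attribute names are illustrative assumptions; only the `searchable(...)` and `filterOnly(...)` modifiers come from the Java code above.

```typescript
// Wrap each attribute in its facet modifier and append the second list to the first,
// which is what ListUtils.union does in the Java code above.
function buildAttributesForFaceting(searchable: string[], filterOnly: string[]): string[] {
  const searchFacets = searchable.map((key) => `searchable(${key})`);
  const filterFacets = filterOnly.map((key) => `filterOnly(${key})`);
  return searchFacets.concat(filterFacets);
}

const facetSettings = buildAttributesForFaceting(
  ["title", "projectDetail.name"],
  ["participant_emails"]
);

console.log(facetSettings);
```

With these sample inputs the list is `searchable(title)`, `searchable(projectDetail.name)`, `filterOnly(participant_emails)`, in that order.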
Impose Restrictions on the Search API Key.
    public String createSearchAPIKey(Document user) throws Exception {
        List<String> roles = user.getList("roles", String.class);

        // Managers and staff may search everything, so they get the unrestricted key.
        if (roles.contains(UserRoles.MANAGER) || roles.contains(UserRoles.STAFF)) {
            return this.publicSearchAPIKey;
        }

        // Everyone else gets a key that only matches records they participate in.
        String userName = user.getString("user_name");
        SecuredApiKeyRestriction restriction = new SecuredApiKeyRestriction()
                .setQuery(new Query().setFilters(String.format(
                        "participant_emails:%s OR emails_body.participant_emails:%s",
                        userName,
                        userName)));

        String publicKey = this.searchClient.generateSecuredAPIKey(
                this.publicSearchAPIKey,
                restriction);

        return publicKey;
    }
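The decision made in createSearchAPIKey can be isolated from the Algolia client as a pure function: either the user is unrestricted, or a filter string gets baked into the secured key. Below is a sketch of just that logic in TypeScript. The role strings and function names are assumptions (the actual values behind UserRoles.MANAGER and UserRoles.STAFF are not shown), and no key is generated here.

```typescript
// Hypothetical role names; the Java code above checks UserRoles.MANAGER and UserRoles.STAFF.
const UNRESTRICTED_ROLES = ["MANAGER", "STAFF"];

// Same filter expression as in the Java code above.
function buildParticipantFilter(userName: string): string {
  return `participant_emails:${userName} OR emails_body.participant_emails:${userName}`;
}

// Returns null when the user may search everything,
// otherwise the filter that should be embedded in the secured key.
function filterForUser(roles: string[], userName: string): string | null {
  const unrestricted = roles.some((role) => UNRESTRICTED_ROLES.indexOf(role) !== -1);
  return unrestricted ? null : buildParticipantFilter(userName);
}

console.log(filterForUser(["USER"], "alice@example.com"));
console.log(filterForUser(["STAFF"], "bob@example.com"));
```

Keeping this part pure makes the privilege rule easy to unit test without touching the search client.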
Save a Record into Algolia.
    public void saveObject(ObjectId someId) throws CustomException {
        // logics to fetch search targets

        EmailChainRecord record = modelMapper.map(
                targetMailchain,
                EmailChainRecord.class);

        if (record != null) {
            this.index.partialUpdateObject(record);
        }
    }
}
Remark. According to the documentation, if a record exists in your database but does not exist in Algolia, then:
"If the objectID is specified but doesn't exist, Algolia creates a new record"
That means an upsert operation is automatic.
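The upsert behaviour described in this remark can be modelled with an in-memory map. This is only a toy model of the semantics, not the Algolia client: `AlgoliaLikeRecord` and `partialUpdate` are hypothetical names, and the store is a plain `Map`.

```typescript
// Hypothetical record type: anything with an objectID.
type AlgoliaLikeRecord = { objectID: string } & Record<string, unknown>;

// Merge the patch into the existing record, or create the record if it does not exist yet.
function partialUpdate(
  store: Map<string, AlgoliaLikeRecord>,
  patch: AlgoliaLikeRecord
): "created" | "updated" {
  const existing = store.get(patch.objectID);
  store.set(patch.objectID, { ...existing, ...patch });
  return existing ? "updated" : "created";
}

const recordStore = new Map<string, AlgoliaLikeRecord>();
console.log(partialUpdate(recordStore, { objectID: "1", title: "Quotation" }));
console.log(partialUpdate(recordStore, { objectID: "1", latest_gmail_snippet: "Thanks!" }));
console.log(recordStore.get("1"));
```

The first call creates the record, and the second call only adds the new attribute while keeping the title, which is why a single code path can serve both new and existing email chains.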
Frontend
Responsibility of Frontend
The frontend needs to
- Get applicationID and searchApiKey from the backend
- Call the search API to get
  - the target document
  - searchable facets for search suggestions.
Frontend Implementation in React
- Algolia provides us with an npm package: react-instantsearch.
- However, if we use the UI component provided by that library, we will quickly use up our free quota for the API.
- This is because the change handler in the provided search bar is intentionally designed not to have any debounce rule.
- Instead, we create our own search component (with <input/>) and use a debounced onChange handler with the following search<T> function.
// Assumption: algoliasearch v4, which exports the SearchClient and SearchIndex types.
import algoliasearch, { SearchClient, SearchIndex } from "algoliasearch";
// constant is project-specific; its import is omitted here.

export default class AlgoliaUtil {
  public static instance: AlgoliaUtil | undefined;
  public algoliaEnabled: boolean | undefined;
  private algoliaSearchIndex: string | undefined;
  private searchClient: SearchClient | undefined;
  private searchIndex: SearchIndex | undefined;

  constructor(props: {
    applicationID: string;
    apiKey: string;
    initIndex: string;
    algoliaEnabled: boolean;
  }) {
    this.algoliaEnabled = props.algoliaEnabled;
    this.algoliaSearchIndex = props.initIndex;
    this.searchClient = algoliasearch(props.applicationID, props.apiKey);
  }

  public static getInstance() {
    if (!AlgoliaUtil.instance) {
      throw new Error("An algolia instance has not been instantiated yet.");
    }
    return AlgoliaUtil.instance;
  }

  private getSearchClient(): SearchClient {
    if (!this.searchClient) {
      throw new Error("Search Client is undefined");
    }
    return this.searchClient;
  }

  // Lazily initialise the index on first use.
  private getIndex() {
    if (!this.searchIndex) {
      const searchClient = this.getSearchClient();
      if (this.algoliaSearchIndex) {
        this.searchIndex = searchClient.initIndex(this.algoliaSearchIndex);
      }
    }
    return this.searchIndex;
  }

  public search<T>(params: {
    queryString: string;
    attributesToRetrieve: Extract<keyof T, string>[];
  }) {
    const { attributesToRetrieve, queryString } = params;
    const index = this.getIndex();
    return index?.search(queryString, {
      attributesToRetrieve,
      facets: constant.FACETS_TO_RECEIVE,
    });
  }
}
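The debounced onChange handler mentioned above is what keeps the number of API calls down: a burst of keystrokes results in a single search. Here is a minimal sketch of such a debounce helper, assuming no particular library. The `Timers` interface is a hypothetical addition that only exists so the timing can be replaced in tests; by default the real `setTimeout` and `clearTimeout` are used.

```typescript
// Hypothetical abstraction over the timer functions, so that tests can swap them out.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a function that postpones fn until waitMs have passed without another call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  timers: Timers = realTimers
): (...args: A) => void {
  let handle: unknown;
  return (...args: A) => {
    // A new call cancels the pending one, so only the latest arguments survive.
    if (handle !== undefined) timers.clear(handle);
    handle = timers.set(() => {
      handle = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Usage: only the last of a burst of keystrokes triggers a search.
const debouncedSearch = debounce((query: string) => {
  console.log(`searching for "${query}"`);
}, 300);

debouncedSearch("a");
debouncedSearch("al");
debouncedSearch("algolia"); // only this call reaches the callback, about 300 ms later
```

In the real component the callback would call AlgoliaUtil.getInstance().search(...) instead of logging, so three keystrokes cost one API request rather than three.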
We instantiate an AlgoliaUtil object when some page is rendered. Sometimes the search feature is not ready yet, so we determine whether Algolia is available by setting:
useEffect(() => {
  if (dialogOpen) {
    const enabled = AlgoliaUtil.getInstance().algoliaEnabled;
    setAlgoliaEnabled(enabled || false);
  }
}, [dialogOpen]);
- Here the type T in search<T> simply describes the attributes to retrieve. In our case, we use T = { oid: string }.
- Also:
constant.FACETS_TO_RECEIVE = [
  "title",
  "latest_gmail_snippet",
  "programmesDetail.name",
  "emails_body.body",
  "projectDetail.name"
],
are the facets whose values were hit in the past; they are used as search suggestions.
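Turning the returned facets into search suggestions can be sketched as follows. This assumes the `facets` field of the search response has the shape attribute → value → count; the `FacetCounts` alias, the `toSuggestions` helper, and the sample values are illustrative and not part of the original code.

```typescript
// Assumed shape of the facets field in a search response: attribute -> value -> count.
type FacetCounts = Record<string, Record<string, number>>;

// Flatten all facet values, add up their counts, and keep the most frequent ones.
function toSuggestions(facets: FacetCounts, limit: number): string[] {
  const totals = new Map<string, number>();
  Object.keys(facets).forEach((attribute) => {
    const values = facets[attribute] ?? {};
    Object.keys(values).forEach((value) => {
      totals.set(value, (totals.get(value) ?? 0) + (values[value] ?? 0));
    });
  });
  return Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([value]) => value);
}

const sampleFacets: FacetCounts = {
  title: { "Quotation for fabric": 3, "Sample delivery": 1 },
  "projectDetail.name": { "Spring 2024": 2 },
};

console.log(toSuggestions(sampleFacets, 2));
```

With the sample data the two suggestions are "Quotation for fabric" and "Spring 2024", since they have the highest counts.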