July 26, 2023

Build a Search Function

algolia

fusejs

java

react

searching

What Approaches Do We Have When It Comes to Search?

Recently I have been responsible for building search functions in the frontend. The following approaches came to mind:

  • We send everything we want to search from the backend to the frontend, and query it with either standard regular expressions or a dedicated library such as Fuse.js or lunr.js. This works perfectly fine for static web pages (such as this blog).

  • We build on the Elastic Stack (Elasticsearch and Kibana), which in essence also saves records as documents and indexes their fields for searching.

And after struggling through tutorials on YouTube, I came across:

  • We use Algolia by feeding it our JSON files (which consist of search targets) and setting the field names we want to use as search indexes.
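
As a rough illustration of the first approach using nothing but regular expressions, a search over documents already loaded in the frontend could look like the sketch below. The BlogDoc shape, the sample data and the function names are made up for this example and are not the blog's actual code.

```typescript
// Hypothetical document shape; only the fields used for matching are listed.
interface BlogDoc {
  title: string;
  intro: string;
  content: string;
}

// Escape regex metacharacters so that user input is matched literally.
const escapeRegExp = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Return every document in which any field contains the query (case-insensitive).
function regexSearch(docs: BlogDoc[], query: string): BlogDoc[] {
  const trimmed = query.trim();
  if (!trimmed) {
    return [];
  }
  const pattern = new RegExp(escapeRegExp(trimmed), "i");
  return docs.filter((doc) =>
    [doc.title, doc.intro, doc.content].some((field) => pattern.test(field))
  );
}

// Sample data, invented for this sketch.
const sampleDocs: BlogDoc[] = [
  { title: "Write Middleware in Redux-Toolkit", intro: "Sample usage", content: "..." },
  { title: "Build a Search Function", intro: "Fuse.js, lunr.js and Algolia", content: "..." },
];

console.log(regexSearch(sampleDocs, "redux").map((doc) => doc.title));
```

This kind of matching is exact (no typo tolerance and no ranking), which is the main reason to reach for a dedicated library.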

Code Implementation for Fuse.js

Search Target

First we build our blog.json file which serves as a search resource.

[
    ...
    {
        "content": "...",
        "title": "Write Middleware in Redux-Toolkit",
        "date": "2023-06-20T00:00:00.000Z",
        "id": "blog0132",
        "tag": "react",
        "intro": "We list sample usage of ...",
        "toc": true
    },
    ...
]
Build a blog.json which Contains Documents to Search

In my case I use the following script:

import fs from "fs";
import matter from "gray-matter";
import path from "path";

const mdDirs = ["./src/mds/articles/tech", "./src/mds/articles/math"];

const getAllMdFilePaths = (dir: string) => {
  const mdFiles: string[] = [];

  const getFiles = (dir: string) => {
    const paths = fs.readdirSync(dir);
    paths.forEach((p) => {
      const newPath = path.join(dir, p);
      const pathStat = fs.statSync(newPath);
      if (pathStat.isDirectory()) {
        getFiles(newPath);
      } else {
        if (newPath.endsWith(".md")) {
          mdFiles.push(newPath);
        }
      }
    });
  };

  getFiles(dir);
  return mdFiles;
};

const writeMdInJson = () => {
  const targetPaths = "./src/mds/blog.json";
  const blogJson: any[] = [];
  for (const dirpath of mdDirs) {
    const mdpaths = getAllMdFilePaths(dirpath);
    mdpaths.forEach((path) => {
      const mdText = fs.readFileSync(path, { encoding: "utf8", flag: "r" });
      const { data, content } = matter(mdText);
      const { wip = false } = data;
      if (!wip) {
        blogJson.push({ content, ...data });
      }
    });
  }
  fs.writeFileSync(
    targetPaths,
    JSON.stringify(blogJson, null, 0)
      .replace(/(\\r\\n)/g, " ")
      .replace(/`/g, "")
      .replace(/\s+/g, " ")
  );
};

const main = () => {
  writeMdInJson();
};

main();
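
To see what the chain of replace calls at the end of the script does, here is a small self-contained sketch that applies the same three replacements to a single made-up document. The function name compactJson and the sample text are assumptions for illustration only.

```typescript
// Apply the same clean-up steps as the script above to an array of documents.
function compactJson(docs: unknown[]): string {
  return JSON.stringify(docs, null, 0)
    // JSON.stringify escapes line breaks, so a Windows line ending appears
    // as the four literal characters \r\n; turn each into a space.
    .replace(/(\\r\\n)/g, " ")
    // Drop markdown backticks.
    .replace(/`/g, "")
    // Collapse runs of whitespace into a single space.
    .replace(/\s+/g, " ");
}

const compacted = compactJson([
  { content: "line one\r\nuses `code`   here" },
]);

console.log(compacted); // [{"content":"line one uses code here"}]
```

Note that a lone Unix line ending stays in the output as an escaped \n, which is still valid JSON.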
Search Component
Fuse.js (Deprecated as the result is not satisfactory)

Next, in our search component:

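import Fuse from "fuse.js";
import { ChangeEvent, useRef, useState } from "react";
// debounce, config and SearchBar come from elsewhere in the project (imports omitted).
import searchJson from "../../../mds/blog.json";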

export default function SearchComponent() {
  const fuse = useRef(
    new Fuse(searchJson, {
      keys: ["content", "tag", "tags", "title", "intro"],
      threshold: config.fuzzySearchThreshold,
    })
  );
  const [searchResults, setSearchedResults] = useState<
    { title: string; intro: string; tag: string; tags: string }[]
  >([]);
  const [searchText, setSearchText] = useState("");
  const searchBarRef = useRef<HTMLInputElement>(null);

  const handleSearchChange = debounce((e: ChangeEvent<HTMLInputElement>) => {
    setSearchText(e.target.value);
    const searchValue = e.target.value;
    if (searchValue) {
      const result = fuse.current.search(searchValue);
      setSearchedResults(
        result.map((r) => {
          const { title, intro, tag, tags } = r.item;
          return { title, intro, tag: tag || "", tags: tags || "" };
        })
      );
    } else {
      setSearchedResults([]);
    }
  }, 300);

  return (
    <SearchBar
      placeholder="Tag, title or content"
      onChange={handleSearchChange}
      inputRef={searchBarRef}
    />
  );
}
  • The Fuse object can be created anywhere and imported into the component.
  • In my case I simply use useRef, as it is always going to be static and unchanged in the life cycle of the SearchComponent.
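
The handler above depends on a debounce helper that is not shown. A minimal sketch of such a helper follows; the Scheduler interface is an assumption added here so the helper can be exercised without real timers, and it is not necessarily how the helper in this project is written.

```typescript
// Timer functions are injectable; by default the global timers are used.
interface Scheduler {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
}

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Return a wrapper that only runs `fn` once `waitMs` has passed without
// another call; each new call cancels the previously scheduled one.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
  scheduler: Scheduler = realScheduler
): (...args: A) => void {
  let handle: unknown;
  let pending = false;
  return (...args: A) => {
    if (pending) {
      scheduler.clear(handle);
    }
    pending = true;
    handle = scheduler.set(() => {
      pending = false;
      fn(...args);
    }, waitMs);
  };
}

// Example: only the last query typed within 300 ms would be logged.
const logQuery = debounce((query: string) => {
  console.log("searching for", query);
}, 300);
```

With a 300 ms wait, typing "redux" quickly triggers one search for the full word instead of five searches for each prefix.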
lunr.js, A much more Powerful Version of Fuse.js

The implementation is very similar to Fuse.js:

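import lunr from "lunr";
// React hooks, debounce, SearchBar and searchJson are imported as in the Fuse.js version.

export default function SearchComponent() {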
  const [searchResults, setSearchedResults] = useState<
    { title: string; intro: string; tag: string; tags: string }[]
  >([]);
  const [searchText, setSearchText] = useState("");
  const searchBarRef = useRef<HTMLInputElement>(null);
  const lunrSearch = useRef<lunr.Index | null>(null);
  const searchMapping = useRef<{
    [id: string]: {
      content: string,
      title: string,
      intro: string,
      tag: string,
      tags: string
    }
  }>({});

  // The lazy initializer passed to useState runs only on the initial render,
  // so the index is not rebuilt on re-renders.
  useState(() => {
    lunrSearch.current = lunr(function () {
      this.field("tag");
      this.field("tags");
      this.field("title");
      this.field("intro");
      this.field("content");

      console.log("indexing ...");

      (searchJson as {
        content: string;
        title: string;
        date: string;
        id: string;
        tag?: string;
        tags?: string;
        intro: string;
        toc: boolean;
      }[]).forEach((searchTarget, index) => {
        const id = index.toString();
        const { intro, tag = "", tags = "", title, content } = searchTarget;
        const doc = { intro, tag, tags, title, content };
        // keep the document so that a search hit (which only carries the id) can be resolved later
        searchMapping.current[id] = doc;
        this.add({ ...doc, id });
      });
    });
  });

  const handleSearchChange = debounce((e: ChangeEvent<HTMLInputElement>) => {
    setSearchText(e.target.value);
    const searchValue = e.target.value;
    if (searchValue) {
      const result = lunrSearch.current?.search(searchValue);
      const displayResult =
        result
          ?.sort((r1, r2) => r2.score - r1.score)
          .map((r) => {
            const { ref } = r;
            const doc = searchMapping.current[ref];
            // we don't need to return content in the search results
            return {
              intro: doc.intro,
              tag: doc.tag,
              tags: doc.tags,
              title: doc.title,
            };
          }) || [];
      setSearchedResults(displayResult);
    } else {
      setSearchedResults([]);
    }
  }, 300);

  return (
    <SearchBar
      placeholder="Tag, title or content"
      onChange={handleSearchChange}
      inputRef={searchBarRef}
    />
  );
}
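
Since a lunr hit only carries a ref and a score, the step that turns hits back into displayable documents can be isolated and tested on its own. The sketch below does that with locally defined types; the names Hit, StoredDoc and toDisplayResults are invented here, and unlike the component above it skips hits whose ref is missing from the mapping.

```typescript
// One search hit: `ref` is the id given when the document was added, `score` its relevance.
interface Hit {
  ref: string;
  score: number;
}

interface StoredDoc {
  title: string;
  intro: string;
  tag: string;
  tags: string;
  content: string;
}

type DisplayResult = Omit<StoredDoc, "content">;

// Sort hits by descending score and resolve each one through the id-to-document mapping.
function toDisplayResults(
  hits: Hit[],
  mapping: Record<string, StoredDoc | undefined>
): DisplayResult[] {
  const sorted = [...hits].sort((a, b) => b.score - a.score);
  const results: DisplayResult[] = [];
  for (const hit of sorted) {
    const doc = mapping[hit.ref];
    if (!doc) {
      continue; // unknown ref: skip instead of failing on an undefined document
    }
    const { title, intro, tag, tags } = doc;
    results.push({ title, intro, tag, tags });
  }
  return results;
}

// Made-up data for the example.
const mapping: Record<string, StoredDoc | undefined> = {
  "0": { title: "Redux", intro: "Middleware", tag: "react", tags: "", content: "..." },
  "1": { title: "Search", intro: "Fuse.js and lunr.js", tag: "searching", tags: "", content: "..." },
};

console.log(
  toDisplayResults(
    [
      { ref: "0", score: 0.2 },
      { ref: "1", score: 0.9 },
      { ref: "7", score: 0.5 },
    ],
    mapping
  )
);
```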

Code Implementation for Algolia

Backend Using Java
Responsibilities of Backend in Using Algolia

Our backend takes on the following tasks:

  • Provide ALGOLIA_SEARCH_INDEX

  • Provide applicationID

  • Provide the frontend client with searchApiKeys of different privileges for searching, for example:

    • Admin users can search everything
    • Users of some organization can only search their own related emails
  • Upload searchable targets (named Record) to Algolia database

  • Add new search item into algolia when needed (like emails)

Dependencies

After registering an account in Algolia and creating an application there, we include the following two dependencies:

<dependency>
  <groupId>com.algolia</groupId>
  <artifactId>algoliasearch-core</artifactId>
  <version>3.16.5</version>
</dependency>
<dependency>
  <groupId>com.algolia</groupId>
  <artifactId>algoliasearch-java-net</artifactId>
  <version>3.16.5</version>
</dependency>
Record Object
  • Algolia requires users to define a Record object which contains at least a non-nullable field called objectID.
  • Luckily we use MongoDB in our Java backend, so we simply use a stringified _id as the objectID and call modelMapper.map() to turn a Document object into our desired Record object:
package com.organization.web.service.dto;

import java.util.List;
import lombok.Data;

@Data
public class EmailChainRecord {

    @Data
    public static class Supplier {
        private List<String> material_manu_internal_codes;
    }

    @Data
    public static class NameField {
        private String name;
    }

    @Data
    public static class EmailField {
        private String body;
        private List<String> participant_emails;
    }

    @Data
    public static class SenderInDb {
        private Integer id;
        private String user_name;
        private String first_name;
        private String last_name;
        private String email;
    }

    @Data
    public static class Task {
        private String code;
        private String name;
    }

    @Data
    public static class Section {
        private String name;
        private List<Task> tasks;
    }

    @Data
    public static class ProgramDetail {
        private String prog_ref_no;
        private String name;
        private List<Section> sections;
    }

    private String oid;
    private String objectID;
    private String title;
    private String buyer_company_code;
    private String latest_gmail_snippet;
    private List<String> sender_emails;
    private List<SenderInDb> sendersInDb;
    private NameField buyerCompanyDetail;
    private NameField projectDetail;
    private List<ProgramDetail> programmesDetail;
    private List<EmailField> emails_body;
    private List<String> participant_emails;
}
SearchIndex Object

In both frontend and backend, the major api calls are all managed by the SearchIndex object:

package com.organization.web.algolia;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.stereotype.Service;

import com.algolia.search.DefaultSearchClient;
import com.algolia.search.SearchClient;
import com.algolia.search.SearchIndex;
import com.organization.web.controller.err.CustomException;
import com.organization.web.service.dto.EmailChainRecord;

@Service
public class Algolia {
    @Value("${algolia.application.id}")
    private String applicationID;
    @Value("${algolia.api.key}")
    private String APIKEY;

    @Bean
    public SearchClient getSearchClient() throws CustomException {
        if (this.applicationID == null || this.APIKEY == null) {
            throw new CustomException("application id and apikey cannot be null for algolia");
        }
        return DefaultSearchClient.create(this.applicationID, this.APIKEY);
    }

    @Bean
    public SearchIndex<EmailChainRecord> getIndex() throws CustomException {
        SearchClient client = getSearchClient();
        var initedIndex = client.initIndex("correspondence", EmailChainRecord.class);
        return initedIndex;
    }
}
SearchService: All the Utility Functions

Constructor Injection. To facilitate unit testing, we use autowired constructor injection:

package com.organization.web.service.impl;

import com.algolia.search.SearchClient;
import com.algolia.search.SearchIndex;
import com.algolia.search.models.apikeys.SecuredApiKeyRestriction;
import com.algolia.search.models.indexing.Query;
import com.algolia.search.models.settings.IndexSettings;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.organization.web.controller.codes.UserRoles;
import com.organization.web.controller.err.CustomException;
import com.organization.web.mongodb.CollectionNames;
import com.organization.web.mongodb.MongoDB;
import com.organization.web.mongodb.MongoDB.JsonPipeline;
import com.organization.web.service.SearchService;
import com.organization.web.service.dto.EmailChainRecord;
import com.organization.web.service.dto.EmailChainRecord.ProgramDetail;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.commons.collections4.ListUtils;
import org.bson.Document;
import org.bson.types.ObjectId;
import org.modelmapper.ModelMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;

@Service
public class SearchServiceImpl implements SearchService {

    @Value("${algolia.public.search.api.key}")
    private String publicSearchAPIKey;

    private MongoDB mongodb;
    private ModelMapper modelMapper = new ModelMapper();
    private SearchIndex<EmailChainRecord> index;
    private SearchClient searchClient;
    // A search key that you keep private

    @Autowired
    public SearchServiceImpl(
            MongoDB mongodb,
            ModelMapper modelMapper,
            SearchIndex<EmailChainRecord> index,
            SearchClient searchClient) {
        this.mongodb = mongodb;
        this.modelMapper = modelMapper;
        this.index = index;
        this.searchClient = searchClient;
    }

    public void clearObjects() {
        this.index.clearObjects();
    }

Insert Data Into Algolia.

    public void insertEmailsIntoAlgolia() {
        clearObjects();
        ...
        var searchDocuments = someCollection
                .aggregate(somePipeline)
                .map(u -> {
                    var record = modelMapper.map(u, EmailChainRecord.class);
                    // refine data in record for search logic
                    return record;
                })
                .into(new ArrayList<>());
        if (searchDocuments != null) {
            // waitTask() blocks until Algolia has finished indexing
            this.index.saveObjects(searchDocuments).waitTask();
        }
    }

Define Attributes that Contribute to the Search.

    public void setKeyAndFacetsForQueryAndFilter() {
        var indexSettings = new IndexSettings();

        List<String> attributes = Arrays.asList(
                "latest_gmail_snippet",
                "sender_emails",
                "projectDetail.name",
                "searchabletitle",
                "title",
                "senderInDb.user_name",
                "senderInDb.first_name",
                "senderInDb.last_name",
                "programmesDetail.name",
                "programmesDetail.sections.tasks.code",
                "buyerCompanyDetail.name",
                "emails_body.body",
                "emails_body.participant_emails",
                "participant_emails");
        indexSettings.setSearchableAttributes(attributes);

Define Facets (configs to the search keys)

        List<String> filterFacets = Arrays.asList(
                "filterOnly(participant_emails)",
                "filterOnly(emails_body.participant_emails)");
        List<String> searchFacets = attributes.stream()
                .map(key -> String.format("searchable(%s)", key))
                .collect(Collectors.toList());

Add the Facets into Index Settings. ListUtils.union is the same as arr1 + arr2 in Python:

        indexSettings.setAttributesForFaceting(
                ListUtils.union(searchFacets, filterFacets));

        this.index.setSettings(indexSettings);
    }
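
To make the resulting setting concrete, the same string-building logic is sketched below in TypeScript so that it can be run in isolation. Only a subset of the attributes is used, and the function name buildAttributesForFaceting is made up for this example.

```typescript
// A subset of the searchable attributes listed above.
const searchableAttributes = ["title", "projectDetail.name", "emails_body.body"];

// Attributes that are only used for filtering, never shown as facet values.
const filterOnlyAttributes = ["participant_emails", "emails_body.participant_emails"];

// Wrap each attribute in its facet modifier and concatenate both lists,
// mirroring ListUtils.union(searchFacets, filterFacets).
function buildAttributesForFaceting(
  searchable: string[],
  filterOnly: string[]
): string[] {
  const searchFacets = searchable.map((key) => `searchable(${key})`);
  const filterFacets = filterOnly.map((key) => `filterOnly(${key})`);
  return [...searchFacets, ...filterFacets];
}

const attributesForFaceting = buildAttributesForFaceting(
  searchableAttributes,
  filterOnlyAttributes
);

console.log(attributesForFaceting);
```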

Impose Restrictions on the Search API Key.

    public String createSearchAPIKey(Document user) throws Exception {
        List<String> roles = user.getList("roles", String.class);

        if (roles.contains(UserRoles.MANAGER) || roles.contains(UserRoles.STAFF)) {
            return this.publicSearchAPIKey;
        }

        String userName = user.getString("user_name");
        SecuredApiKeyRestriction restriction = new SecuredApiKeyRestriction()
                .setQuery(new Query().setFilters(String.format(
                        "participant_emails:%s OR emails_body.participant_emails:%s",
                        userName,
                        userName)));

        String publicKey = this.searchClient.generateSecuredAPIKey(
                this.publicSearchAPIKey,
                restriction);

        return publicKey;
    }
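
The decision made in createSearchAPIKey boils down to "privileged roles get no filter, everyone else is filtered by participant". The TypeScript sketch below reproduces only that decision and the filter string; the role names and the user shape are assumptions, and the actual key generation stays in the Java code above.

```typescript
// Hypothetical role names standing in for UserRoles.MANAGER and UserRoles.STAFF.
const PRIVILEGED_ROLES = ["MANAGER", "STAFF"];

interface SearchUser {
  roles: string[];
  user_name: string;
}

// Build the filter embedded into the secured key of a non-privileged user.
function buildParticipantFilter(userName: string): string {
  return `participant_emails:${userName} OR emails_body.participant_emails:${userName}`;
}

// Return null when the user may search everything, otherwise the restricting filter.
function searchFilterFor(user: SearchUser): string | null {
  const privileged = user.roles.some((role) => PRIVILEGED_ROLES.includes(role));
  return privileged ? null : buildParticipantFilter(user.user_name);
}

console.log(searchFilterFor({ roles: ["CLIENT"], user_name: "someone@example.com" }));
```

Because the filter is baked into the generated key, the frontend cannot widen its own search scope by changing query parameters.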

Save a Record into Algolia.

    public void saveObject(ObjectId someId) throws CustomException {
        // logics to fetch search targets

        EmailChainRecord record = modelMapper.map(
                targetMailchain,
                EmailChainRecord.class);

        if (record != null) {
            this.index.partialUpdateObject(record);
        }
    }
}

Remark. According to the documentation, if a record exists in your database but does not exist in Algolia, then:

If the objectID is specified but doesn’t exist, Algolia creates a new record

That means the upsert operation happens automatically.

Frontend
Responsibility of Frontend

The frontend needs to

  • Get applicationID and searchApiKey from backend
  • Call the search api to get
    • target document
    • searchable facets for search suggestions.
Frontend Implementation in React
  • Algolia provides us with an npm package: react-instantsearch.

  • However, if we use the UI components provided by that library, we will quickly use up our free API quota.

  • This is because the change handler of the provided search bar is intentionally designed without any debounce rule.

  • Instead we create our own search component (with <input/>) and use a debounced onChange handler together with the following search<T> function.

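// Assuming algoliasearch v4, which exports the SearchClient and SearchIndex types.
import algoliasearch, { SearchClient, SearchIndex } from "algoliasearch";
// `constant` is a project-local module holding FACETS_TO_RECEIVE (import omitted).

export default class AlgoliaUtil {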
	public static instance: AlgoliaUtil | undefined;
	public algoliaEnabled: boolean | undefined;
	private algoliaSearchIndex: string | undefined;
	private searchClient: SearchClient | undefined;
	private searchIndex: SearchIndex | undefined;

	constructor(props: { applicationID: string, apiKey: string, initIndex: string, algoliaEnabled: boolean }) {
		this.algoliaEnabled = props.algoliaEnabled;
		this.algoliaSearchIndex = props.initIndex;
		this.searchClient = algoliasearch(
			props.applicationID,
			props.apiKey,
		);
	}

	public static getInstance() {
		if (!AlgoliaUtil.instance) {
			throw new Error("An algolia instance has not been instantiated yet.")
		}
		return AlgoliaUtil.instance;
	}

	private getSearchClient(): SearchClient {
		if (!this.searchClient) {
			throw new Error("Search Client is undefined");
		}
		return this.searchClient;
	}

	private getIndex() {
		if (!this.searchIndex) {
			const searchClient = this.getSearchClient();
			if (this.algoliaSearchIndex) {
				this.searchIndex = searchClient.initIndex(this.algoliaSearchIndex);
			}
		}
		return this.searchIndex;
	}

	public search<T>(params: { queryString: string, attributesToRetrieve: Extract<keyof T, string>[] }) {
		const { attributesToRetrieve, queryString } = params;
		const index = this.getIndex();
		return index?.search(queryString, {
			attributesToRetrieve, facets: constant.FACETS_TO_RECEIVE
		});
	}
}
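
The class above does not show where AlgoliaUtil.instance is assigned, and getInstance throws until that happens. One possible way to register the singleton is a static initializer, sketched below with a hypothetical SearchConfig class that leaves out the Algolia client entirely; the names init and SearchConfig are assumptions, not the project's actual code.

```typescript
interface SearchConfigProps {
  applicationID: string;
  apiKey: string;
  enabled: boolean;
}

class SearchConfig {
  private static instance: SearchConfig | undefined;

  public applicationID: string;
  public apiKey: string;
  public enabled: boolean;

  private constructor(props: SearchConfigProps) {
    this.applicationID = props.applicationID;
    this.apiKey = props.apiKey;
    this.enabled = props.enabled;
  }

  // Create the singleton once, for example after the backend has returned the keys.
  public static init(props: SearchConfigProps): SearchConfig {
    SearchConfig.instance = new SearchConfig(props);
    return SearchConfig.instance;
  }

  // Fail loudly if a component asks for the instance before init was called.
  public static getInstance(): SearchConfig {
    if (!SearchConfig.instance) {
      throw new Error("SearchConfig has not been initialized yet.");
    }
    return SearchConfig.instance;
  }
}
```

A page would call SearchConfig.init(...) once with the values received from the backend, and components would then read SearchConfig.getInstance().enabled.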

We instantiate the AlgoliaUtil object when some page is rendered. Sometimes the search feature is not ready yet, so we determine whether Algolia is available by setting:

useEffect(() => {
  if (dialogOpen) {
    const enabled = AlgoliaUtil.getInstance().algoliaEnabled;
    setAlgoliaEnabled(enabled || false);
  }
}, [dialogOpen]);
  • Here the type T in search<T> simply describes the attributes to retrieve. In our case, we use T = { oid: string }.

  • Also:

    constant.FACETS_TO_RECEIVE = [
      "title",
      "latest_gmail_snippet",
      "programmesDetail.name",
      "emails_body.body",
      "projectDetail.name"
    ];

    are the facets returned with each search; their values are results that were hit in the past, and we use them as search suggestions.
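
As a sketch of how such facet values could be turned into suggestions, assume the search response exposes a facets field that maps each attribute name to an object of value-to-count pairs. The helper below merges those counts and keeps the most frequent values; the function name toSuggestions and the sample data are invented for illustration.

```typescript
// Assumed shape of the `facets` field: attribute name -> (facet value -> hit count).
type FacetCounts = Record<string, Record<string, number>>;

// Merge the counts of all attributes and return the most frequent values first.
function toSuggestions(facets: FacetCounts, limit: number = 5): string[] {
  const totals = new Map<string, number>();
  for (const values of Object.values(facets)) {
    for (const [value, count] of Object.entries(values)) {
      totals.set(value, (totals.get(value) ?? 0) + count);
    }
  }
  return Array.from(totals.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([value]) => value);
}

// Made-up facet counts for the example.
const sampleFacets: FacetCounts = {
  title: { Invoice: 3, Quote: 1 },
  "projectDetail.name": { Alpha: 2, Invoice: 1 },
};

console.log(toSuggestions(sampleFacets, 2));
```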