Search in Hugo with Lunr

Reading time: 6 min

¿What is the best way to make a search engine in Hugo? My first attempt was looking for information in the official documentation, I found some guides and links which helped me, but none of them suited my needs. So I will detail in this guide the solution that I have developed.

Every guide suggest these 2 possibilities:

  • Algolia: It´s a service that provides a set of tools that simplify the process of making and integrating a full search experience into our sites and applications. It has an automated crawler to extract content from web sites. It´s a third party service which allows the integration on a static site easily, we just have to use the provided API in front. Algolia is used in a lot of different sites but I decided I didn’t want to use a third party service.

  • Lunr: Lunr.js is a small, full-text search library for use in the browser. It indexes JSON documents and provides a simple search interface for retrieving documents that best match text queries. It allows fuzzy search that is the technique of finding strings that match a pattern approximately (rather than exactly). This is the approach used in this blog.

Defining the output of our content in JSON format

Having Lunr as Javascript library for fuzzy search we just need the other part in order to complete our search engine: a JSON file with posts content from the blog. We can define this behavior in 2 different ways:

  1. In Hugo global config file, config.toml in my case. All the content in Hugo will be available in HTML and RSS format by default if no config is provided. In this site, all the blog articles are under what is called in Hugo as section, in content/blog path. We can set global JSON output for every section on this way:
[outputs]
section = ["JSON", "HTML", "RSS"]

This configuration will output a JSON file for each section. For example, if we have another section (courses) in addition to blog, each one will have its index.json file with the content of the section. This is very powerful and we will be able to make independent searches for each section. But, How can I tell to Hugo the content and format for that JSON file? We have to create a list.json template file like we have list.html for HTML files.

If we want each section file different, inside layouts folder and following with this site as example, in a folder called blog we must create a list.json file and inside courses folder we will create another list.json different. However if the format for each section is going to be the same, we can create list.json file inside _default folder and it will be used for every section (unless the section has a list.json file that would replace the global in _default folder).

Let´s see list.json content finally. It just generates an array of posts using Hugo syntax.

[
{{ range $index, $value := where .Site.Pages "Type" "post" }}
{{ if $index }}, {{ end }}
{
"url": "{{ .RelPermalink }}",
"title": "{{ .Title }}",
"content": {{ .Content | plainify | jsonify }},
"summary": "{{.Summary}}"
}
{{ end }}
]

Of course the content is completely customizable, and so the name for each field. In my case I have a reduced number of posts, so by the moment I put all the content in content field, but in the future because of better performance I will try to reduce the JSON file size leaving just title and summary for the search.

  1. Enabling for each section the output format from the front matter. Following with my site as example, I have _index.md and _index.en.md (English version) files in content/blog folder. We can enable JSON format on this way:
---
title: Personal blog
subtitle: Posts written on a variety of topics related to the world of technology and programming.
outputs:
- html
- rss
- json
---

The template list.json part is exactly the same than at point 1. You can read more about custom output formats in Hugo documentation.

Implementing search engine using lunr with Javascript

Let´s see the code that will allow us to search in the JSON file generated. In my case, I have created a search.html file in layouts/page. Let´s check step by step the content. First of all we need and input text for the search term, followed by a section element where search results will be showed.

<label for="search-input">Término de búsqueda</label>
<input type="text" id="search-input" name="search" placeholder="{{i18n "search_loading"}}...">

<section id="search-results"></section>

Javascript code is also inside the html file so that we can create dynamic vars in Javascript with Hugo, for example for getting translation of strings and getting relative path to index.json for each language.

Next step is importing Lunr, in my case in static/js/ folder:

{{ $lunr := "js/lunr.min.js" | absURL }}

<script src="{{ $lunr }}"></script>

It would be possible importing Lunr via CDN:


<script src="https://unpkg.com/lunr/lunr.js"></script>

Below is shown the Javascript code that on page load makes a petition to load index.json file and creates indexed document that will be used to return search results. Searches will be launched when user is typing in the input field.


<script type="text/javascript">
(function () {
  let idx;
  let documents = [];
  const URL_LIST_POSTS = '{{ "blog/index.json" | relLangURL }}';
  const searchInput = document.getElementById("search-input");
  const searchResults = document.getElementById("search-results");

  // Request and index documents
  fetch(URL_LIST_POSTS, {
    method: "get",
  })
    .then((res) => res.json())
    .then((res) => {
      // Create index document with lunr
      idx = lunr(function () {
        this.ref("url");
        this.field("title");
        this.field("content");
        this.field("summary");

        res.forEach(function (doc) {
          this.add(doc);
          documents[doc.url] = {
            title: doc.title,
            content: doc.content,
            summary: doc.summary,
          };
        }, this);
      });

      // Once data is loaded we can register handler
      registerSearchHandler();
    })
    .catch((err) => {
      console.log({ err });
      const errorMsg = '{{ i18n "search_error" }}';
      searchResults.innerHTML = `
        <div class="bg-red-100 border-l-4 border-red-500 text-red-700 p-4" role="alert">
          <p>${errorMsg}</p>
        </div>`;
    });

  ///////////////////////////////////////////////////////////

  function renderSearchResults(results) {
    const noResults = '{{ i18n "search_noCoincidence" }}';
    // If results are empty
    if (results.length === 0) {
      searchResults.innerHTML = `
        <div class="bg-blue-100 border-l-4 border-blue-500 text-blue-700 p-4" role="alert">
          <p>${noResults}</p>
        </div>
            `;
      return;
    }

    // Show max 10 results
    if (results.length > 9) {
      results = results.slice(0, 10);
    }

    // Reset search results
    searchResults.innerHTML = "";

    // Append results
    results.forEach((result) => {
      // Create result item
      let article = document.createElement("article");
      article.classList.add("mb-8");
      article.innerHTML = `
        <a href="${result.ref}" class="block group">
          <h2 class="article-title group-hover:text-green-500 pb-1">${documents[result.ref].title}</h2>
          <div class="text-gray-700 dark:text-gray-300"><p>${documents[result.ref].summary}</p></div>
        </a>`;
      searchResults.appendChild(article);
    });
  }

  function registerSearchHandler() {
    // Register on input event
    searchInput.oninput = function (event) {
      if (searchInput.value === "") {
        searchResults.innerHTML = "";
        return;
      }

      // Get input value
      const query = event.target.value;

      // Run fuzzy search
      const results = idx.query(function(q) {
        q.term(lunr.tokenizer(query.trim()), { usePipeline: true, boost: 100 });
        q.term(lunr.tokenizer(query.trim()) + '*', { usePipeline: false, boost: 10 });
        q.term(lunr.tokenizer(query.trim()), { usePipeline: false, editDistance: 1 });
      });

      // Render results
      renderSearchResults(results);
    };

    searchInput.placeholder = '{{ i18n "search_inputPlaceholder" }}';
  }
})();
</script>

And that´s all. To develop this solution I have followed the guides of Joseph Earl and Matt Walters. I have adapted and updated Javascript code and corrected the method for searching with Lunr, because it didn’t work properly in all cases. This is the most important part, I found the solution in an issue in Lunr github:

// Run fuzzy search
const results = idx.query(function(q) {
  q.term(lunr.tokenizer(query.trim()), { usePipeline: true, boost: 100 });
  q.term(lunr.tokenizer(query.trim()) + '*', { usePipeline: false, boost: 10 });
  q.term(lunr.tokenizer(query.trim()), { usePipeline: false, editDistance: 1 });
});

You can take a look to search.html file with the complete code in github and adapt it to your needs.