Mario Mueller

I recently built a little prototype for a search server. Normally I would have used Elasticsearch for such a case, but I wanted to write this thing in pure Go. I looked at some alternatives, such as a Lucene port to Go, but that one seemed to have been “done” since 2015, and Lucene 4.10 was not very appealing to me either. While searching for a solution, I stumbled over Bleve, which is a Go-only full-text indexing and search library.

As I still had Elasticsearch in the back of my mind, I gave it a try. I use gb for my Go projects, so please do not be surprised by the imports in the source snippets. gb enables something that feels to me like a real project structure and I can really recommend it (although it lacks support for go run).

I have done a lot of Elasticsearch projects in the past, so I hoped that my basic knowledge of how to index something would help me get things up and running fast.

I started with the data model. I took a shop system from a friend, politely stole his article database and used it for my indexing test.

// Article represents a single article
type Article struct {
    ArticleID      int      `json:"article_id"`
    Name           string   `json:"name"`
    OrderNumber    string   `json:"order_number"`
    SalesCount     int      `json:"sales_count"`
    Keywords       []string `json:"keywords"`
    Color          string   `json:"color"`
    Locale         string   `json:"locale"`
    TranslatedName string   `json:"translated_name"`
}

// Type refers to the document type in bleve
func (a *Article) Type() string {
    return "article"
}
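To see which field names and which document type this model exposes, here is a minimal standalone sketch. It repeats the struct in trimmed form so it can run on its own. The `typed` interface, the `describe` helper, and the sample values are made up for illustration. It assumes that the field names used in the mapping further down are the ones from the `json` tags and that the document type is read through the `Type()` method.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article is a trimmed copy of the struct above, repeated so the sketch runs standalone.
type Article struct {
	ArticleID      int      `json:"article_id"`
	OrderNumber    string   `json:"order_number"`
	Keywords       []string `json:"keywords"`
	Locale         string   `json:"locale"`
	TranslatedName string   `json:"translated_name"`
}

// Type returns the document type name.
func (a *Article) Type() string { return "article" }

// typed is a hypothetical stand-in for an interface that lets a library
// ask a document for its type.
type typed interface {
	Type() string
}

// describe returns the document type and the JSON form of a document.
func describe(doc typed) (string, string, error) {
	raw, err := json.Marshal(doc)
	if err != nil {
		return "", "", err
	}
	return doc.Type(), string(raw), nil
}

func main() {
	a := &Article{
		ArticleID:      1,
		OrderNumber:    "SW-1001",
		Keywords:       []string{"shoe"},
		Locale:         "en_GB",
		TranslatedName: "Running shoe",
	}
	docType, raw, err := describe(a)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(docType)
	fmt.Println(raw)
}
```

Note that `Type()` has a pointer receiver, so only `*Article` satisfies the interface in this sketch; passing an `Article` value to `describe` would not compile.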

Bleve stores its indexes in files, and the storage can be handled by different backends. I tried LevelDB and BoltDB, the latter being the faster of the two. Bleve makes heavy use of build tags to select such features, +boltdb being one of them.

I came up with an indexing package that contains everything related to indexing. First step: create and open an index.

package indexing

import (
    log "github.com/Sirupsen/logrus"
    "github.com/blevesearch/bleve"
)

// OpenIndex returns the opened index
func OpenIndex(databasePath string) bleve.Index {
    index, err := bleve.Open(databasePath)
    if err != nil {
        // log.Fatal logs the error and then exits the process,
        // so no explicit os.Exit is needed
        log.Fatal(err)
    }

    return index
}

// CreateIndex creates the initial index
func CreateIndex(databasePath string) bleve.Index {
    mapping := bleve.NewIndexMapping()
    mapping = addCustomAnalyzers(mapping)
    mapping = createArticleMapping(mapping)
    index, err := bleve.New(databasePath, mapping)
    if err != nil {
        log.Fatal(err)
    }
    return index
}

As the code shows, CreateIndex relies on two more functions that support the index creation: addCustomAnalyzers and createArticleMapping.

In order to make a search useful, you need to customize how the terms you want to index are analyzed. The default is rarely good enough. In my case, I wanted an edge n-gram token filter for the translated_name field.

package indexing

import (
    "github.com/blevesearch/bleve"
)

// createArticleMapping registers the field mappings for the "article" document type
func createArticleMapping(indexMapping *bleve.IndexMapping) *bleve.IndexMapping {
    articleMapping := bleve.NewDocumentMapping()

    // IncludeInAll = false keeps a field out of the composite "_all" field
    articleIDMapping := bleve.NewNumericFieldMapping()
    articleIDMapping.IncludeInAll = false
    articleMapping.AddFieldMappingsAt("article_id", articleIDMapping)

    nameMapping := bleve.NewTextFieldMapping()
    nameMapping.IncludeInAll = false
    articleMapping.AddFieldMappingsAt("name", nameMapping)

    orderNumberMapping := bleve.NewTextFieldMapping()
    orderNumberMapping.IncludeInAll = false
    orderNumberMapping.IncludeTermVectors = false
    articleMapping.AddFieldMappingsAt("order_number", orderNumberMapping)

    salesCountMapping := bleve.NewNumericFieldMapping()
    salesCountMapping.IncludeInAll = false
    articleMapping.AddFieldMappingsAt("sales_count", salesCountMapping)

    keywordsMapping := bleve.NewTextFieldMapping()
    articleMapping.AddFieldMappingsAt("keywords", keywordsMapping)

    // The translated name uses the custom edge n-gram analyzer
    translatedName := bleve.NewTextFieldMapping()
    translatedName.Analyzer = "fulltext_ngram"
    articleMapping.AddFieldMappingsAt("translated_name", translatedName)

    color := bleve.NewTextFieldMapping()
    color.IncludeInAll = false
    articleMapping.AddFieldMappingsAt("color", color)

    // The locale is indexed as one single, unmodified token
    locale := bleve.NewTextFieldMapping()
    locale.IncludeInAll = false
    locale.Analyzer = "not_analyzed"
    articleMapping.AddFieldMappingsAt("locale", locale)

    indexMapping.AddDocumentMapping("article", articleMapping)

    return indexMapping
}

And the custom analyzers:

package indexing

import (
    log "github.com/Sirupsen/logrus"
    "github.com/blevesearch/bleve"
    "github.com/blevesearch/bleve/analysis/analyzers/custom_analyzer"
    "github.com/blevesearch/bleve/analysis/token_filters/edge_ngram_filter"
    "github.com/blevesearch/bleve/analysis/token_filters/lower_case_filter"
    "github.com/blevesearch/bleve/analysis/tokenizers/single_token"
    "github.com/blevesearch/bleve/analysis/tokenizers/unicode"
)

// addCustomTokenFilter registers a front edge n-gram filter (3 to 25 characters).
// Despite its name, "bigram_tokenfilter" is an edge n-gram filter.
func addCustomTokenFilter(indexMapping *bleve.IndexMapping) *bleve.IndexMapping {
    err := indexMapping.AddCustomTokenFilter("bigram_tokenfilter", map[string]interface{}{
        "type": edge_ngram_filter.Name,
        "side": edge_ngram_filter.FRONT,
        "min":  3.0,
        "max":  25.0,
    })

    if err != nil {
        log.Fatal(err)
    }

    return indexMapping
}

// addCustomAnalyzers registers the "not_analyzed" and "fulltext_ngram" analyzers
func addCustomAnalyzers(indexMapping *bleve.IndexMapping) *bleve.IndexMapping {
    indexMapping = addCustomTokenFilter(indexMapping)

    // not_analyzed: the whole field value becomes one token
    err := indexMapping.AddCustomAnalyzer("not_analyzed", map[string]interface{}{
        "type":      custom_analyzer.Name,
        "tokenizer": single_token.Name,
    })

    if err != nil {
        log.Fatal(err)
    }

    // fulltext_ngram: unicode tokenizer, lowercasing, then edge n-grams
    err = indexMapping.AddCustomAnalyzer("fulltext_ngram", map[string]interface{}{
        "type":      custom_analyzer.Name,
        "tokenizer": unicode.Name,
        "token_filters": []string{
            lower_case_filter.Name,
            "bigram_tokenfilter",
        },
    })

    if err != nil {
        log.Fatal(err)
    }

    return indexMapping
}

Setting up the indexing process took the least effort compared to understanding how the mapping works in Bleve. You need to read a lot of the source code, as the wiki only gives you a very high-level overview.

Finally, here is the code I used to index the article data, bringing it all together. You can assume that the slice of articles was loaded from a MySQL database and mapped to the Article struct you have seen in the example above.

// Execute the import step
func (istep *ImportFromMySQL) Execute(barContainer *multibar.BarContainer) {
    articles := istep.getArticles()
    idxProgress := barContainer.MakeBar(len(articles), "Indexing")
    go barContainer.Listen()
    for k, v := range articles {
        log.WithFields(log.Fields{
            "OrderID":        v.OrderNumber,
            "Color":          v.Color,
            "TranslatedName": v.TranslatedName,
            "Locale":         v.Locale,
        }).Debug("Indexing Article")
        // Order number plus locale gives one document per article and language
        id := fmt.Sprintf("%s-%s", v.OrderNumber, v.Locale)
        // Log indexing errors instead of silently dropping them
        if err := istep.index.Index(id, v); err != nil {
            log.WithError(err).Error("Indexing Article failed")
        }
        idxProgress(k + 1)
    }
    idxProgress(len(articles))
}

In the end, I switched everything to Elasticsearch for performance reasons. With Bleve, a search request took 60ms in the best case and about 300ms in the worst case. With Elasticsearch, even taking the HTTP overhead into account, I get results in under 10ms on my local 2015 MacBook Pro. My personal decision to go back to Elasticsearch was certainly biased by my much deeper knowledge of the Elasticsearch internals. Bleve still seems very appealing to me, as it can be a no-other-server-needed way of building a moderately complex search service. I will follow its progress and maybe retest it some day. I am thankful anyway that somebody made the effort of creating such a well-thought-out library in the Go world. Keep up the good work!

I hope these code bits help you get started with Bleve. It was a lot of fun for me.
