Elasticsearch: the phrase suggester


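The phrase suggester works much like the term suggester, but instead of offering suggestions for individual terms, it offers suggestions for the whole input text.
First, prepare some data: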

PUT s4
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "standard"
        }
      }
    }
  }
}

PUT s4/doc/1
{
  "title": "Lucene is cool"
}

PUT s4/doc/2
{
  "title": "Elasticsearch builds on top of lucene"
}

PUT s4/doc/3
{
  "title": "Elasticsearch rocks"
}

PUT s4/doc/4
{
  "title": "Elastic is the company behind ELK stack"
}

PUT s4/doc/5
{
  "title": "elk rocks"
}

PUT s4/doc/6
{
  "title": "elasticsearch is rock solid"
}

Now let's see how the phrase suggester makes its suggestions:

GET s4/doc/_search
{
  "suggest": {
    "my_s4": {
      "text": "lucne and elasticsear rock",
      "phrase": {
        "field": "title"
      }
    }
  }
}

text is the input text, which contains misspellings, and the suggester type is now phrase. Here is the response:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "my_s4" : [
      {
        "text" : "lucne and elasticsear rock",
        "offset" : 0,
        "length" : 26,
        "options" : [
          {
            "text" : "lucne and elasticsearch rocks",
            "score" : 0.12709484
          },
          {
            "text" : "lucne and elasticsearch rock",
            "score" : 0.10422645
          },
          {
            "text" : "lucne and elasticsear rocks",
            "score" : 0.10036137
          }
        ]
      }
    ]
  }
}
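
To act on a response like the one above in application code, a small helper can pick the highest-scoring option. The following is a minimal Python sketch: the function name best_suggestion is hypothetical, and the response dictionary is a trimmed copy of the output shown above rather than a live call to Elasticsearch.

```python
def best_suggestion(response, name):
    """Return the text of the highest-scoring option for suggester `name`, or None."""
    best = None
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            if best is None or option["score"] > best["score"]:
                best = option
    return best["text"] if best else None


# Trimmed copy of the response shown above (not fetched from a cluster).
response = {
    "suggest": {
        "my_s4": [
            {
                "text": "lucne and elasticsear rock",
                "offset": 0,
                "length": 26,
                "options": [
                    {"text": "lucne and elasticsearch rocks", "score": 0.12709484},
                    {"text": "lucne and elasticsearch rock", "score": 0.10422645},
                    {"text": "lucne and elasticsear rocks", "score": 0.10036137},
                ],
            }
        ]
    }
}

print(best_suggestion(response, "my_s4"))
```

The helper returns None when the suggester produced no options, so the caller can fall back to the user's original text.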

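As you can see, options directly returns a list of candidate phrases. The suggestion for lucene is poor (lucne is left uncorrected in every option), but the corrections to elasticsearch and rocks are good. Beyond that, we can use highlighting to show the user which of the original terms were corrected.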

GET s4/doc/_search
{
  "suggest": {
    "my_s4": {
      "text": "lucne and elasticsear rock",
      "phrase": {
        "field": "title",
        "highlight":{
          "pre_tag":"<em>",
          "post_tag":"</em>"
        }
      }
    }
  }
}
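
With highlight set, each option in the response also carries a highlighted field in which the changed text is wrapped in the configured tags. Below is a minimal Python sketch of pulling the corrected spans out of such a field. The helper name corrected_spans is hypothetical, and the sample string is an illustrative assumption of what the field might contain for one option, not captured output.

```python
import re


def corrected_spans(highlighted, pre_tag="<em>", post_tag="</em>"):
    """Return the pieces of text wrapped in pre_tag/post_tag, in order of appearance."""
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, highlighted)


# Assumed example of a highlighted option (illustrative, not real output).
sample = "lucne and <em>elasticsearch</em> rock"
print(corrected_spans(sample))
```

Passing the same pre_tag and post_tag that were sent in the request keeps the helper in step with whatever markup was chosen.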

Besides plain <em> tags, you can supply custom highlight markup:

GET s4/doc/_search
{
  "suggest": {
    "my_s4": {
      "text": "lucne and elasticsear rock",
      "phrase": {
        "field": "title",
        "highlight":{
          "pre_tag":"<b id='d1' class='t1' style='color:red;font-size:18px;'>",
          "post_tag":"</b>"
        }
      }
    }
  }
}

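Note that highlighting suggester results differs slightly from highlighting query results. For example, the custom tags here are pre_tag and post_tag (singular), not pre_tags and post_tags as in a query highlight like this one: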

GET s4/doc/_search
{
  "query": {
    "match": {
      "title": "rock"
    }
  },
  "highlight": {
    "pre_tags": "<b style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "title": {}
    }
  }
}

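Building on the term suggester, the phrase suggester also takes into account the relationships between multiple terms, such as whether they occur together in the indexed text, how close they are to one another, and how frequent they are.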
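
To make the idea of term relationships concrete, here is a toy Python sketch that ranks candidate phrases by how often their adjacent word pairs occur in the six sample titles. It only illustrates the use of co-occurrence: the function names and the scoring rule are assumptions made for this example, and it does not reproduce Elasticsearch's actual scoring.

```python
from collections import Counter

# The six titles indexed earlier, lowercased as the standard analyzer would do.
TITLES = [
    "lucene is cool",
    "elasticsearch builds on top of lucene",
    "elasticsearch rocks",
    "elastic is the company behind elk stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def bigram_counts(texts):
    """Count adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        words = text.split()
        counts.update(zip(words, words[1:]))
    return counts


def cooccurrence_score(phrase, counts):
    """Toy score: total corpus frequency of the phrase's adjacent word pairs."""
    words = phrase.lower().split()
    return sum(counts[pair] for pair in zip(words, words[1:]))


counts = bigram_counts(TITLES)
candidates = ["elasticsearch rocks", "elasticsearch rock", "elasticsear rocks"]
ranked = sorted(candidates, key=lambda p: cooccurrence_score(p, counts), reverse=True)
print(ranked[0])
```

Only the pair (elasticsearch, rocks) appears in the titles, so that candidate ranks first, while a suggester that looked at each term in isolation would have no reason to prefer rocks over rock after elasticsearch.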