Elasticsearch Study Guide

1. First look: _cat

GET /_cat/nodes: list all nodes
127.0.0.1 39 92 6 0.26 0.19 0.14 mdi * efBli3S

GET /_cat/health: check cluster health
1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%
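On a single node, yellow is the normal status: a replica shard is never allocated on the same node as its primary, so replicas stay unassigned. A healthy multi-node cluster reports green.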
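To read the health row above, each value can be paired with the column headers that GET /_cat/health?v prints. A minimal Python sketch, assuming the 14-column layout of this 5.x output (column names and order may differ in other versions); parse_cat_health is a hypothetical helper:

```python
# Column names of GET /_cat/health, in the order printed by GET /_cat/health?v
# (assumed to match the 5.x output shown above).
HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status",
    "node.total", "node.data", "shards", "pri", "relo",
    "init", "unassign", "pending_tasks", "max_task_wait_time",
    "active_shards_percent",
]


def parse_cat_health(line: str) -> dict:
    """Split one whitespace-separated _cat/health row into a column -> value dict."""
    values = line.split()
    if len(values) != len(HEALTH_COLUMNS):
        raise ValueError(f"expected {len(HEALTH_COLUMNS)} columns, got {len(values)}")
    return dict(zip(HEALTH_COLUMNS, values))


row = parse_cat_health(
    "1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%"
)
print(row["status"], row["unassign"], row["active_shards_percent"])
# yellow 1 50.0%
```

unassign is 1 and active_shards_percent is 50.0%: one primary shard is active and its replica has nowhere to go, which is exactly the single-node yellow state.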

GET /_cat/master: show the master node
efBli3STR_i2BsdwuEcrhw 127.0.0.1 127.0.0.1 efBli3S

GET /_cat/indices: list all indices
yellow open .kibana khzaMKDvQnWaMF_TNUbZYQ 1 1 1 0 3.2kb 3.2kb

2. CRUD

2.1. POST: create

POST /customer/external/2
{
  "name":"John Hua"
}

{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

2.1.1. Without an id: Elasticsearch generates one

The generated id is a random string such as AXExzS8q1XLtoSIsto8W:

POST /customer/external
{
  "name":"John Hua"
}

{
  "_index": "customer",
  "_type": "external",
  "_id": "AXExzS8q1XLtoSIsto8W",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

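Plain POST indexing does no data comparison: sending the same document again rewrites it and increments _version, as the next example shows.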

2.1.2. With an id that already exists: the document is replaced

POST /customer/external/2
{
  "name":"John Hua"
}

{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 2,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

2.1.3. _update

POST /customer/external/2/_update
{
  "doc":{
    "age":18,
    "name":"John Doe"
  }
}

{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

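_update compares before writing: if merging "doc" into the stored document changes nothing, no write happens, _version stays the same and the response reports "result": "noop". Sending the same update again: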

POST /customer/external/2/_update
{
  "doc":{
    "age":18,
    "name":"John Doe"
  }
}

{
  "_index": "customer",
  "_type": "external",
  "_id": "2",
  "_version": 3,
  "result": "noop",
  "_shards": {
    "total": 0,
    "successful": 0,
    "failed": 0
  }
}
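The difference between plain indexing and _update can be modelled in a few lines. This is a toy in-memory sketch, not how Elasticsearch is implemented: ToyIndex and its methods are hypothetical names, and the merge here is shallow, whereas the real _update merges nested objects recursively.

```python
from copy import deepcopy


class ToyIndex:
    """In-memory toy model of document versioning (hypothetical, not Elasticsearch code)."""

    def __init__(self):
        self.docs = {}  # id -> {"version": int, "source": dict}

    def index(self, doc_id, source):
        """Like POST/PUT with an id: always (re)writes and bumps the version."""
        current = self.docs.get(doc_id)
        version = current["version"] + 1 if current else 1
        self.docs[doc_id] = {"version": version, "source": deepcopy(source)}
        return {"_version": version, "result": "updated" if current else "created"}

    def update(self, doc_id, partial):
        """Like _update with "doc": shallow-merge, and skip the write if nothing changed."""
        current = self.docs[doc_id]
        merged = {**current["source"], **partial}
        if merged == current["source"]:
            return {"_version": current["version"], "result": "noop"}
        current["version"] += 1
        current["source"] = merged
        return {"_version": current["version"], "result": "updated"}


idx = ToyIndex()
idx.index("2", {"name": "John Hua"})               # version 1, created
idx.index("2", {"name": "John Hua"})               # version 2, updated (no comparison)
idx.update("2", {"age": 18, "name": "John Doe"})   # version 3, updated
print(idx.update("2", {"age": 18, "name": "John Doe"}))
# {'_version': 3, 'result': 'noop'}
```

The version sequence mirrors the responses above: indexing never compares, while the update path returns early when the merged source equals the stored one.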

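Running a script

ctx is the script's context; ctx._source is the stored document (at this point its age is 18). The script below adds 5 to age: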

POST /customer/external/2/_update
{
  "script":"ctx._source.age+=5"
}

2.1.4. Bulk operations: _bulk

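Endpoint: POST http://192.168.0.101:9200/bank/account/_bulk. An index and type given in the URL act as defaults for action lines that omit _index and _type.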

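The request body is newline-delimited JSON, not a single JSON document: each action line is followed by its request body on the next line, and every line, including the last one, must end with a newline:

{ action: { metadata }}
{ request body        }
{ action: { metadata }}
{ request body        }
...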

Example with all four actions (delete has no request body; create fails if the id already exists, while index creates or replaces):

POST /_bulk
{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title":    "My first blog post" }
{ "index":  { "_index": "website", "_type": "blog" }}
{ "title":    "My second blog post" }
{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
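When a bulk body is generated by a program, the one-JSON-object-per-line rule is easy to break by pretty-printing. A minimal Python sketch (to_bulk_body is a hypothetical helper) that serializes the same four actions; json.dumps without indent keeps each object on a single line:

```python
import json


def to_bulk_body(actions):
    """Serialize (action, source) pairs into a _bulk NDJSON body.

    Each action and each source becomes one JSON line; source is None for
    "delete", which takes no body. The body must end with a newline.
    """
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))       # no indent: stays on one line
        if source is not None:
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ({"delete": {"_index": "website", "_type": "blog", "_id": "123"}}, None),
    ({"create": {"_index": "website", "_type": "blog", "_id": "123"}},
     {"title": "My first blog post"}),
    ({"index": {"_index": "website", "_type": "blog"}},
     {"title": "My second blog post"}),
    ({"update": {"_index": "website", "_type": "blog", "_id": "123",
                 "_retry_on_conflict": 3}},
     {"doc": {"title": "My updated blog post"}}),
])
print(body)
```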

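Test data: accounts.json is already in _bulk format ({"index":{"_id":"1"}} followed by the account document, and so on), so it can be POSTed as-is to the bank/account/_bulk endpoint above.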

https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
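Outside Kibana, the file can be sent with any HTTP client. A minimal sketch using only the Python standard library; bulk_request is a hypothetical helper, the host is the one from the endpoint above, and it assumes accounts.json has been downloaded into the working directory:

```python
import urllib.request


def bulk_request(host: str, path: str, ndjson: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying an NDJSON _bulk body."""
    if not ndjson.endswith(b"\n"):
        ndjson += b"\n"  # _bulk requires the last line to end with a newline
    return urllib.request.Request(
        url=f"http://{host}{path}",
        data=ndjson,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


# Usage (needs a reachable cluster, so it is left commented out):
# with open("accounts.json", "rb") as f:
#     req = bulk_request("192.168.0.101:9200", "/bank/account/_bulk", f.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```

The file is read as bytes and sent unchanged, so its line structure is preserved exactly.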

2.2. PUT: update

2.2.1. With an id: creates the document, or replaces it ("result": "updated") if it already exists

PUT /customer/external/1
{
  "name":"John Doe"
}


{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 6,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}

2.2.2. Without an id: error (PUT always needs an id)

No handler found for uri [/customer/external] and method [PUT]

2.3. GET: read

GET customer/external/1

{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 6,
  "found": true,
  "_source": {
    "name": "John Doe"
  }
}

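2.4. DELETE: delete

DELETE does not physically remove the document right away: it is marked as deleted and purged later when segments merge. Its version is remembered for a while (index.gc_deletes, 60s by default), so recreating the same id shortly afterwards continues the version sequence, as the PUT below shows (_version 2, then 3).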

DELETE /customer/external/1

{
  "found": true,
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 2,
  "result": "deleted",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}


PUT /customer/external/1
{
  "name":"John Doe"
}
{
  "_index": "customer",
  "_type": "external",
  "_id": "1",
  "_version": 3,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}

# Delete the whole index
DELETE /customer
{
  "acknowledged": true
}

3. query

3.1. match_all

GET bank/_search
{
  "query":{
    "match_all": {}
  },
  "_source": ["age","address"]
}

{
  "took": 4, # time Elasticsearch spent executing the search, in milliseconds
  "timed_out": false, # whether the search timed out
  "_shards": { # how many shards were searched, and how many succeeded/failed
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": { # search results
    "total": 999, # total number of matching documents
    "max_score": 1,
    "hits": [ # the actual hits (top 10 documents by default)
      {
        "_index": "bank",
        "_type": "account",
        "_id": "25",
        "_score": 1, # relevance score of this hit (max_score is the highest one)
        "_source": {
        }
      },
      {},
      {}
      ...
    ]
  }
}

3.2. Return only selected fields: "_source": ["field1", "field2"]

GET bank/_search
{
  "query":{
    "match_all": {}
  },
  "_source": ["age","address"]
}

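3.3. Full-text search: match

match analyzes the query text into terms (words) and looks each one up in the inverted index, which maps every term to the ids of the documents that contain it. A document containing any of the terms matches, ranked by relevance score.

That is why match behaves as a loose, non-exact search (not to be confused with the fuzzy query, which tolerates typos). On a non-text field such as account_number (a long), match simply looks up the exact value, as in this example: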


GET bank/_search
{
  "query": {
    "match": {
      "account_number": "44"
    }
  }
}
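The inverted-index lookup described above can be sketched in Python. This is a simplification under stated assumptions: analyze is a rough stand-in for the standard analyzer, scoring is omitted, and the addresses (apart from 930 Bay Avenue) are made up:

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on non-word chars, lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query: str) -> set[str]:
    """match semantics with the default OR operator: any doc containing any query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {
    "1": "990 Mill Road",
    "2": "930 Bay Avenue",
    "3": "198 Mill Lane",
}
inv = build_inverted_index(docs)
print(sorted(match(inv, "mill lane")))  # ['1', '3']
```

Document 1 matches only "mill" and document 3 matches both terms; in Elasticsearch the second would get the higher score.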

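3.4. Exact value vs. phrase: match on .keyword and match_phrase

Matching against the address.keyword sub-field (not analyzed) requires the whole value to be identical. match_phrase analyzes the text but requires all of its terms to appear next to each other and in the same order.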

GET bank/_search
{
  "query": {
    "match": {
      "address.keyword": "930 Bay Avenue"
    }
  }
}

GET bank/_search
{
  "query": {
    "match_phrase": {
      "address": "930 Bay Avenue"
    }
  }
}
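The two behaviours can be contrasted with a small Python sketch. It is an assumption-laden simplification: analyze stands in for the standard analyzer, and comparing consecutive slices of the term list plays the role of Lucene's position check:

```python
import re


def analyze(text: str) -> list:
    """Stand-in for the standard analyzer: lowercase word tokens, in order."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def match_phrase(field_text: str, phrase: str) -> bool:
    """True if the phrase's terms occur in the field consecutively and in order."""
    field_terms = analyze(field_text)
    phrase_terms = analyze(phrase)
    n = len(phrase_terms)
    return any(field_terms[i:i + n] == phrase_terms
               for i in range(len(field_terms) - n + 1))


def keyword_match(field_value: str, value: str) -> bool:
    """A keyword field is stored unanalyzed, so the whole value must be identical."""
    return field_value == value


address = "930 Bay Avenue"
print(match_phrase(address, "bay avenue"))       # True: adjacent, in order
print(match_phrase(address, "avenue bay"))       # False: wrong order
print(keyword_match(address, "Bay Avenue"))      # False: not the whole value
print(keyword_match(address, "930 Bay Avenue"))  # True
```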

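3.5. Multi-field query: multi_match

The query text is analyzed (here into mill and ak) and searched in every listed field; a document matches if any of those fields contains any of the terms.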

GET bank/_search
{
  "query": {
    "multi_match": {
      "query": "Mill ak",
      "fields": ["address","state"]
    }
  }
}

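3.6. Compound query: bool

3.6.1. must: every clause must match; matching clauses contribute to the score

3.6.2. should: optional when must is present; each matching clause raises the document's score

3.6.3. must_not: no clause may match; does not affect the score

Example: gender is M, address contains Mill, state is not IL, and age 30 to 40 is preferred (ranked higher) but not required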

GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "gender": "M"
          }
        },
        {
          "match": {
            "address": "Mill"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "state": "IL"
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gte": 30,
              "lte": 40
            }
          }
        }
      ]
    }
  }
}

3.6.4. filter: must match, but the clause does not contribute to the score

GET bank/_search
{
  "query": {
    "bool": {
      "filter": {
        "match": {
          "address": "Bay"
        }
      }
    }
  }
}
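How the four clause types combine can be sketched over in-memory documents. This is a toy model, not Elasticsearch's scoring: each clause is a Python predicate, bool_query is a hypothetical helper, every matching must/should clause adds one point, and the people below are made up to mirror the gender/Mill/IL/age example:

```python
def bool_query(docs, must=(), filters=(), should=(), must_not=()):
    """Toy evaluation of a bool query over in-memory dicts.

    Each clause is a Python predicate (a hypothetical stand-in for a real query).
    Only must and should add to the score, one point per matching clause, which is
    a deliberate simplification of Lucene's relevance scoring; filters and
    must_not only include or exclude documents.
    """
    results = []
    for doc in docs:
        if not all(p(doc) for p in must):
            continue
        if not all(p(doc) for p in filters):
            continue
        if any(p(doc) for p in must_not):
            continue
        should_hits = sum(1 for p in should if p(doc))
        # With no must/filter clause, at least one should clause has to match.
        if not must and not filters and should and should_hits == 0:
            continue
        results.append((doc, len(must) + should_hits))
    # Highest score first; sorted() is stable, so ties keep input order.
    return sorted(results, key=lambda r: r[1], reverse=True)


people = [
    {"gender": "M", "address": "Mill Lane", "state": "TX", "age": 35},
    {"gender": "M", "address": "Mill Road", "state": "IL", "age": 32},
    {"gender": "M", "address": "Mill Street", "state": "CA", "age": 22},
    {"gender": "F", "address": "Mill Avenue", "state": "NY", "age": 31},
]
hits = bool_query(
    people,
    must=[lambda d: d["gender"] == "M", lambda d: "Mill" in d["address"]],
    must_not=[lambda d: d["state"] == "IL"],
    should=[lambda d: 30 <= d["age"] <= 40],
)
for doc, score in hits:
    print(doc["state"], doc["age"], score)
# TX 35 3
# CA 22 2
```

The 22-year-old still matches because should is optional next to must; it just ranks lower. Moving the must predicates into filters would keep the same hits but drop their contribution to the score.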

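3.7. term: exact-value lookup

term does not analyze its input; it looks the value up exactly as given. Use it on numeric, date, boolean and keyword fields. On a text field it usually finds nothing, because text is analyzed at index time: the standard analyzer stores gender "F" as the lowercase term "f", so a term query for "F" on gender returns 0 hits below, while the unanalyzed gender.keyword sub-field matches.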

GET bank/_search
{
  "query": {
    "term": {
      "age": {
        "value": "38"
      }
    }
  }
}

"hits": {
    "total": 39
}


GET bank/_search
{
  "query": {
    "term": {
      "gender": {
        "value": "F"
      }
    }
  }
}


"hits": {
    "total": 0
}


GET bank/_search
{
  "query": {
    "term": {
      "gender.keyword": {
        "value": "F"
      }
    }
  }
}

"hits": {
    "total": 493
}
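The 0 hits vs. 493 hits difference comes down to what is stored in each field. A minimal Python sketch, assuming the standard analyzer's effect on "F" is just lowercasing (standard_analyze, term_query and match_query are hypothetical helpers):

```python
def standard_analyze(text: str) -> list:
    # Rough stand-in for the standard analyzer: whitespace split + lowercase.
    return text.lower().split()


# Terms the index stores for a document with gender = "F"
text_terms = set(standard_analyze("F"))   # text field "gender": {"f"}
keyword_terms = {"F"}                     # keyword sub-field "gender.keyword": {"F"}


def term_query(stored_terms, value):
    # term does not analyze its input: the value is looked up verbatim.
    return value in stored_terms


def match_query(stored_terms, value):
    # match runs the same analyzer on the input before the lookup.
    return any(t in stored_terms for t in standard_analyze(value))


print(term_query(text_terms, "F"), term_query(keyword_terms, "F"), match_query(text_terms, "F"))
# False True True
```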

3.8. terms: match any of several values

age is 23 or 35

GET bank/_search
{
  "query": {
    "terms": {
      "age": [
        "23",
        "35"
      ]
    }
  }
}

3.9. 🌟Aggregations🌟

Average age of everyone

# Average age of everyone
GET bank/_search
{
  "aggs": {
    "age_average": { # aggregation name (chosen freely)
      "avg": { # compute the average
        "field": "age" # field to aggregate
      }
    }
  }
}

"aggregations": {
    "age_average": {
      "value": 30.176176176176178
    }
  }

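Age distribution of people whose address contains Mill, plus their average age (aggregations run over the documents matched by the query)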

GET bank/_search
{
  "query": {
    "match": {
      "address": "Mill"
    }
  },
  "aggs": {
    "age_avg": {
      "avg": {
        "field": "age"
      }
    },
    "age_count":{
      "terms": {
        "field": "age",
        "size": 100
      }
    }
  }
}

"aggregations": {
    "age_avg": {
      "value": 34
    },
    "age_count": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 38,
          "doc_count": 2
        },
        {
          "key": 28,
          "doc_count": 1
        },
        {
          "key": 32,
          "doc_count": 1
        }
      ]
    }
}
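What the two aggregations compute can be reproduced in plain Python. The four ages below are read off the buckets in the response above (38 twice, 28 and 32 once); avg_agg and terms_agg are hypothetical helpers, and breaking count ties by ascending key is an assumption that happens to be consistent with the sample response:

```python
from collections import Counter


def avg_agg(docs, field):
    """avg aggregation: mean of the field over the matched documents."""
    values = [d[field] for d in docs if field in d]
    return {"value": sum(values) / len(values) if values else None}


def terms_agg(docs, field, size=10):
    """terms aggregation: one bucket per distinct value, largest doc_count first.

    Ties are broken by ascending key here (an assumption, consistent with the
    sample response above). Only the first `size` buckets are returned.
    """
    counts = Counter(d[field] for d in docs if field in d)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {"buckets": [{"key": k, "doc_count": c} for k, c in ordered[:size]]}


# Ages of the four documents whose address contains "Mill", taken from the buckets above
matched = [{"age": 38}, {"age": 28}, {"age": 32}, {"age": 38}]
print(avg_agg(matched, "age"))              # {'value': 34.0}
print(terms_agg(matched, "age", size=100))
```

The average of 38, 28, 32 and 38 is 34, matching "age_avg" in the response. The size parameter matters: with the default of 10, any further distinct values would be dropped from the buckets.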

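Aggregate by age and compute the average balance of each age group. The avg aggregation has to be nested in the terms aggregation's "aggs"; placed next to it as a sibling, it would return a single average over all documents.

GET bank/_search
{
  "aggs": {
    "age_count": {
      "terms": {
        "field":"age"
      },
      "aggs": {
        "avg_balance":{
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}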

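List all age groups; for each, compute the average balance of M and of F, plus the overall average balance of that age group. size is raised from the default of 10 buckets so that every age is returned, and the per-gender average is nested inside the gender buckets.

GET bank/_search
{
  "aggs": {
    "age_count": {
      "terms": {
        "field":"age",
        "size": 100
      },
      "aggs": {
        "gender_count":{
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "gender_balance_avg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "age_balance_avg":{
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}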
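The key idea of nested aggregations is that each sub-aggregation runs separately inside every parent bucket. A toy Python equivalent of an age -> gender -> average-balance breakdown, with made-up accounts and a hypothetical age_gender_balance helper; buckets are keyed by age for simplicity rather than ordered by doc_count:

```python
from collections import defaultdict


def avg(values):
    return sum(values) / len(values) if values else None


def age_gender_balance(docs):
    """Mimics terms(age) -> {terms(gender) -> avg(balance), avg(balance)}.

    Sub-aggregations are computed separately inside each parent bucket.
    """
    by_age = defaultdict(list)
    for d in docs:
        by_age[d["age"]].append(d)
    result = {}
    for age, bucket in sorted(by_age.items()):
        by_gender = defaultdict(list)
        for d in bucket:
            by_gender[d["gender"]].append(d["balance"])
        result[age] = {
            "doc_count": len(bucket),
            "balance_avg": avg([d["balance"] for d in bucket]),          # whole age group
            "by_gender": {g: avg(b) for g, b in sorted(by_gender.items())},  # per gender
        }
    return result


# Made-up sample accounts
accounts = [
    {"age": 30, "gender": "M", "balance": 1000},
    {"age": 30, "gender": "F", "balance": 3000},
    {"age": 30, "gender": "F", "balance": 5000},
    {"age": 31, "gender": "M", "balance": 2000},
]
print(age_gender_balance(accounts)[30])
# {'doc_count': 3, 'balance_avg': 3000.0, 'by_gender': {'F': 4000.0, 'M': 1000.0}}
```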

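3.10. GET bank/_mapping

The mapping below was generated automatically (dynamic mapping) when the data was loaded: strings became text fields with a keyword sub-field. To control field types, specify the mapping when creating the index; an existing field's type cannot be changed later without reindexing.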

{
  "bank": {
    "mappings": {
      "account": {
        "properties": {
          "account_number": {
            "type": "long"
          },
          "address": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "age": {
            "type": "long"
          },
          "balance": {
            "type": "long"
          },
          "city": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "email": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "employer": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "firstname": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "gender": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "lastname": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "state": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

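4. Tokenization

A tokenizer receives a stream of characters, splits it into individual tokens (usually individual words), and outputs a stream of tokens.

For example, the whitespace tokenizer splits text whenever it encounters whitespace: it turns "Quick brown fox!" into [Quick, brown, fox!].

The tokenizer also records the order, or position, of each term (used by phrase and word-proximity queries), as well as the start and end character offsets of the original word each term came from (used for highlighting search results).

Elasticsearch provides many built-in tokenizers, which can be combined into custom analyzers.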
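The bookkeeping described above (token text, position, start and end offsets) can be illustrated with a small Python version of a whitespace tokenizer. It is a sketch, not Elasticsearch's implementation; whitespace_tokenize is a hypothetical helper, and its offsets are Python string indices, which agree with Elasticsearch's for ASCII text:

```python
import re


def whitespace_tokenize(text: str):
    """Split on runs of whitespace, recording position and character offsets."""
    return [
        {"token": m.group(), "start_offset": m.start(),
         "end_offset": m.end(), "position": pos}
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


for tok in whitespace_tokenize("Quick brown fox!"):
    print(tok)
# {'token': 'Quick', 'start_offset': 0, 'end_offset': 5, 'position': 0}
# {'token': 'brown', 'start_offset': 6, 'end_offset': 11, 'position': 1}
# {'token': 'fox!', 'start_offset': 12, 'end_offset': 16, 'position': 2}
```

The output has the same shape as the token lists returned by _analyze in the next section.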

4.1. The ik analyzer (Chinese)

https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.6.11

Pick the release that matches your Elasticsearch version (5.6.11 here):

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

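Install ik from the plugins directory inside the elasticsearch Docker container:

Download: wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

Unzip: unzip elasticsearch-analysis-ik-5.6.11.zip

Check the installation: elasticsearch-plugin list. Then restart the container (for example docker restart elasticsearch) so that Elasticsearch loads the plugin.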

[linux@localhost ~]$ docker exec -it elasticsearch /bin/bash
root@19d9ac40ba76:/usr/share/elasticsearch# ls
NOTICE.txt  README.textile  bin  config  data  lib  logs  modules  plugins
root@19d9ac40ba76:/usr/share/elasticsearch# cd plugins
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# unzip elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# rm -rf elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
elasticsearch
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# mv elasticsearch ik
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
ik
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# cd ../bin
root@19d9ac40ba76:/usr/share/elasticsearch/bin# ls
elasticsearch  elasticsearch-keystore  elasticsearch-plugin  elasticsearch-systemd-pre-exec  elasticsearch-translog  elasticsearch.in.sh
root@19d9ac40ba76:/usr/share/elasticsearch/bin# elasticsearch-plugin list
ik

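4.2. Results

With no analyzer specified, _analyze uses the index's default analyzer (standard), which splits Chinese text into single characters: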

GET bank/_analyze
{  
  "text": "我是中国人"
}
================================
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "中",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "国",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}

4.2.1. ik_smart: the coarsest-grained split

GET bank/_analyze
{
  "analyzer": "ik_smart",
  "text":"我是中国人"
}
================================
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}

4.2.2. ik_max_word: the finest-grained split, emitting every dictionary word it finds

GET bank/_analyze
{
  "analyzer": "ik_max_word",
  "text":"我是中国人"
}
=================================
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "国人",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
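The difference between the two modes can be illustrated with dictionary-based segmentation. This is a toy sketch, not IK's actual algorithm (IK ships a large dictionary and its own ambiguity handling): the three-word DICTIONARY is hypothetical, smart uses greedy forward maximum matching, and max_word emits every dictionary word at every start position. With this dictionary the two functions happen to reproduce the tokens shown above:

```python
# Hypothetical three-word dictionary; IK ships a much larger one.
DICTIONARY = {"中国人", "中国", "国人"}
MAX_LEN = max(len(w) for w in DICTIONARY)


def smart(text):
    """Coarse split: greedy forward maximum matching, single char as fallback."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            word = text[i:i + length]
            if length == 1 or word in DICTIONARY:
                tokens.append((word, i, i + length))  # (token, start, end)
                i += length
                break
    return tokens


def max_word(text):
    """Fine split: every dictionary word at every start position, longest first;
    characters not covered by any word are emitted on their own."""
    tokens, covered = [], set()
    for i in range(len(text)):
        for length in range(min(MAX_LEN, len(text) - i), 1, -1):
            word = text[i:i + length]
            if word in DICTIONARY:
                tokens.append((word, i, i + length))
                covered.update(range(i, i + length))
        if i not in covered:
            tokens.append((text[i], i, i + 1))
    return tokens


print([t[0] for t in smart("我是中国人")])     # ['我', '是', '中国人']
print([t[0] for t in max_word("我是中国人")])  # ['我', '是', '中国人', '中国', '国人']
```

The trade-off is the usual one: ik_max_word produces more overlapping terms (better recall at index time), while ik_smart produces fewer, longer terms.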