Learning Elasticsearch

Elasticsearch Study Guide

1. Basic inspection with _cat

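GET /_cat/nodes: list all nodes
127.0.0.1 39 92 6 0.26 0.19 0.14 mdi * efBli3S

GET /_cat/health: check cluster health
1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%
A single-node setup normally reports yellow, because replica shards cannot be allocated on the same node as their primaries; a healthy multi-node cluster reports green.

GET /_cat/master: show the master node
efBli3STR_i2BsdwuEcrhw 127.0.0.1 127.0.0.1 efBli3S

GET /_cat/indices: list all indices
yellow open .kibana khzaMKDvQnWaMF_TNUbZYQ 1 1 1 0 3.2kb 3.2kb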
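
The _cat output is positional, so it helps to attach column names before reading it. The sketch below parses the health line shown above; the column list is an assumption based on the header row printed by `GET /_cat/health?v`, and `parse_cat_health` is a hypothetical helper, not part of any client library.

```python
# Column names as printed by `GET /_cat/health?v` (assumed to match this ES version).
HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status", "node.total", "node.data",
    "shards", "pri", "relo", "init", "unassign", "pending_tasks",
    "max_task_wait_time", "active_shards_percent",
]


def parse_cat_health(line):
    """Split one line of _cat/health output into a dict keyed by column name."""
    values = line.split()
    if len(values) != len(HEALTH_COLUMNS):
        raise ValueError(f"expected {len(HEALTH_COLUMNS)} columns, got {len(values)}")
    return dict(zip(HEALTH_COLUMNS, values))


health = parse_cat_health(
    "1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%"
)
print(health["status"], health["unassign"], health["active_shards_percent"])
# prints: yellow 1 50.0%
```

Read this way, the yellow status lines up with one unassigned shard and 50% of shards active.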

2. CRUD

2.1. POST: create

POST /customer/external/2
{
"name":"John Hua"
}

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}
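
The console-style commands in this guide are ordinary HTTP requests, so they can be issued from any language. As a minimal sketch using only the Python standard library, the helper below builds (without sending) the request for the POST above. `BASE_URL` and `build_request` are assumptions made for illustration; adjust the host to your own cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster


def build_request(method, path, body=None):
    """Build (but do not send) the HTTP request for a console-style command."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


req = build_request("POST", "/customer/external/2", {"name": "John Hua"})
print(req.get_method(), req.full_url)
# To actually send it against a running cluster:
#   response = urllib.request.urlopen(req).read()
```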

2.1.1. Without an id: one is generated automatically

Generated id in this example: AXExzS8q1XLtoSIsto8W

POST /customer/external
{
"name":"John Hua"
}

{
"_index": "customer",
"_type": "external",
"_id": "AXExzS8q1XLtoSIsto8W",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}

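A plain POST does not compare the new data with the stored document; it always writes.

2.1.2. With an id: creates the document, or overwrites it if the id already exists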

POST /customer/external/2
{
"name":"John Hua"
}

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

2.1.3. _update

POST /customer/external/2/_update
{
"doc":{
"age":18,
"name":"John Doe"
}
}

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}

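_update compares the data with the stored document. If nothing would change, no write happens and the response reports "result": "noop"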

POST /customer/external/2/_update
{
"doc":{
"age":18,
"name":"John Doe"
}
}

{
"_index": "customer",
"_type": "external",
"_id": "2",
"_version": 3,
"result": "noop",
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
}
}
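
The difference between "updated" and "noop" can be sketched with a toy model in plain Python. This is not how Elasticsearch is implemented (it only does a shallow merge, and `partial_update` is a hypothetical name); it just mirrors the version and result fields seen in the responses above.

```python
def partial_update(stored, doc):
    """Toy model of POST .../_update with a "doc" body (shallow merge only)."""
    merged = {**stored["_source"], **doc}
    if merged == stored["_source"]:
        # Nothing would change: skip the write and keep the version.
        return {"result": "noop", "_version": stored["_version"]}
    stored["_source"] = merged
    stored["_version"] += 1
    return {"result": "updated", "_version": stored["_version"]}


stored = {"_version": 2, "_source": {"name": "John Hua"}}
first = partial_update(stored, {"age": 18, "name": "John Doe"})
second = partial_update(stored, {"age": 18, "name": "John Doe"})
print(first)   # {'result': 'updated', '_version': 3}
print(second)  # {'result': 'noop', '_version': 3}
```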

Running a script

ctx is the current document context. With "_source": { "age": 18 }, the script below raises age to 23.

POST /customer/external/2/_update
{
"script":"ctx._source.age+=5"
}

2.1.4. Bulk operations with _bulk

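POST http://192.168.0.101:9200/bank/account/_bulk

The body is newline-delimited JSON: an action line, followed by a request body line where the action needs one, with no enclosing braces and a newline after the last line.

{ action: { metadata }}
{ request body }
{ action: { metadata }}
{ request body }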

{ "delete": { "_index": "website", "_type": "blog", "_id": "123" }}

{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "My first blog post" }

{ "index": { "_index": "website", "_type": "blog" }}
{ "title": "My second blog post" }

{ "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }
{ "doc" : {"title" : "My updated blog post"} }
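
Because the bulk body is line-oriented rather than a single JSON document, it is easy to get wrong when assembled by hand. A minimal sketch of a serializer, assuming a hypothetical `build_bulk_body` helper and reusing the example actions above:

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) triples into a _bulk request body."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:  # "delete" has no request body line
            lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = build_bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
])
print(body, end="")
```

Each object stays on one line because `json.dumps` emits no newlines by default, which is what the bulk format relies on.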

Test data

https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true

2.2. PUT: update

2.2.1. With an id: creates the document, or updates it if it exists ("result": "updated")

PUT /customer/external/1
{
"name":"John Doe"
}


{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 6,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}

2.2.2. Without an id: error

No handler found for uri [/customer/external] and method [PUT]

2.3. GET: read

GET customer/external/1

{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 6,
"found": true,
"_source": {
"name": "John Doe"
}
}

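2.4. DELETE: delete

Deleting only sets a deletion marker, so the version counter survives: re-creating the same id afterwards continues with the next version number.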

DELETE /customer/external/1

{
"found": true,
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 2,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
}
}


PUT /customer/external/1
{
"name":"John Doe"
}
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 3,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}

# delete the whole index
DELETE /customer
{
"acknowledged": true
}

3. query

3.1. match_all

GET bank/_search
{
"query":{
"match_all": {}
},
"_source": ["age","address"]
}

{
"took": 4, # time Elasticsearch took to run the search, in milliseconds
"timed_out": false, # whether the search timed out
"_shards": { # how many shards were searched, with success/failure counts
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": { # search results
"total": 999, # total number of matching documents
"max_score": 1,
"hits": [ # the actual result array (the first 10 documents by default)
{
"_index": "bank",
"_type": "account",
"_id": "25",
"_score": 1, # _score and max_score: relevance score and highest score
"_source": {
}
},
{},
{}
...
]
}
}

3.2. Returning selected fields: "_source": ["...", "..."]

GET bank/_search
{
"query":{
"match_all": {}
},
"_source": ["age","address"]
}

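3.3. Full-text matching: match

match analyzes the query text into individual words and looks each one up in the inverted index, which records for every word the ids of the documents that contain it.

A document matches if it contains any of the words, and results are ranked by relevance, so the match is fuzzy rather than exact. On a numeric field such as account_number below, match behaves as an exact lookup.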

GET bank/_search
{
"query": {
"match": {
"account_number": "44"
}
}
}
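
The inverted-index idea behind match can be sketched in a few lines of Python. This is a toy model only: the sample addresses are hypothetical, the analysis is a plain lowercase-and-split, and the score is just the number of matching words rather than Elasticsearch's relevance formula.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match(index, query):
    """Return ids of documents containing at least one query word, best first."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1  # crude score: number of matching words
    return sorted(hits, key=lambda d: (-hits[d], d))


# Hypothetical sample addresses, not taken from the bank data set.
docs = {1: "930 Bay Avenue", 2: "198 Mill Lane", 3: "Bay Street 12"}
index = build_inverted_index(docs)
print(match(index, "mill lane"))   # [2]
print(match(index, "Bay Avenue"))  # [1, 3]
```

Document 3 still matches "Bay Avenue" because it shares one word with the query, which is the fuzziness described above.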

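3.4. Phrase matching: match_phrase

The first query below matches on address.keyword, which holds the unanalyzed value, so the whole field must equal "930 Bay Avenue". The second uses match_phrase on address, which requires the words to appear consecutively in that order but also matches longer addresses that contain the phrase.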

GET bank/_search
{
"query": {
"match": {
"address.keyword": "930 Bay Avenue"
}
}
}

GET bank/_search
{
"query": {
"match_phrase": {
"address": "930 Bay Avenue"
}
}
}
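
A rough way to see how the two queries above differ is to model them as plain functions. The helpers and the sample address are hypothetical, and the tokenization is a simple lowercase-and-split stand-in for real analysis.

```python
def tokens(text):
    """Lowercase and split on whitespace (a stand-in for real analysis)."""
    return text.lower().split()


def match_phrase(text, phrase):
    """True if the phrase words occur consecutively and in order in the text."""
    words, wanted = tokens(text), tokens(phrase)
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )


def keyword_equals(text, value):
    """True only if the whole, unanalyzed field value is identical."""
    return text == value


address = "930 Bay Avenue Suite 4"  # hypothetical field value
print(match_phrase(address, "930 Bay Avenue"))    # True
print(match_phrase(address, "Bay 930 Avenue"))    # False
print(keyword_equals(address, "930 Bay Avenue"))  # False
```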

3.5. Multi-field query: multi_match

GET bank/_search
{
"query": {
"multi_match": {
"query": "Mill ak",
"fields": ["address","state"]
}
}
}

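3.6. Compound query: bool

3.6.1. must: every listed clause must match; contributes to the score

3.6.2. should: optional; a match raises the document's score

3.6.3. must_not: the clause must not match

Example: gender is M, the address contains Mill, the state is not IL, and an age between 30 and 40 is preferred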

GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "Mill"
}
}
],
"must_not": [
{
"match": {
"state": "IL"
}
}
],
"should": [
{
"range": {
"age": {
"gte": 30,
"lte": 40
}
}
}
]
}
}
}
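
The roles of must, must_not and should can be illustrated with a toy evaluator over in-memory documents. Everything here is an assumption for illustration: the documents are hypothetical, clauses are plain Python predicates, and the score is a simple count rather than a real relevance score.

```python
def bool_search(docs, must=(), must_not=(), should=()):
    """Toy bool query: each clause is a predicate taking a document dict."""
    results = []
    for doc in docs:
        if not all(clause(doc) for clause in must):
            continue  # a must clause failed
        if any(clause(doc) for clause in must_not):
            continue  # a must_not clause matched
        # Toy scoring: one point per must clause and per matching should clause.
        score = len(must) + sum(1 for clause in should if clause(doc))
        results.append((score, doc["name"]))
    return sorted(results, reverse=True)


# Hypothetical documents shaped like the bank accounts.
people = [
    {"name": "A", "gender": "M", "address": "198 Mill Lane", "state": "ND", "age": 38},
    {"name": "B", "gender": "M", "address": "990 Mill Road", "state": "MD", "age": 28},
    {"name": "C", "gender": "M", "address": "715 Mill Avenue", "state": "IL", "age": 32},
    {"name": "D", "gender": "F", "address": "45 Mill Street", "state": "CA", "age": 35},
]
hits = bool_search(
    people,
    must=[lambda d: d["gender"] == "M", lambda d: "Mill" in d["address"]],
    must_not=[lambda d: d["state"] == "IL"],
    should=[lambda d: 30 <= d["age"] <= 40],
)
print(hits)  # [(3, 'A'), (2, 'B')]
```

B is still returned although it misses the should clause; it simply ranks below A.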

3.6.4. filter: the clause must match, but it does not affect the score

GET bank/_search
{
"query": {
"bool": {
"filter": {
"match": {
"address": "Bay"
}
}
}
}
}

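3.7. term: exact match for non-text fields

term looks up the exact value without analyzing it, so use it for non-text fields such as numbers, booleans or keyword fields. An analyzed text field stores lowercased words, which is why "F" finds nothing on gender but works on gender.keyword below.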

GET bank/_search
{
"query": {
"term": {
"age": {
"value": "38"
}
}
}
}

"hits": {
"total": 39
}


GET bank/_search
{
"query": {
"term": {
"gender": {
"value": "F"
}
}
}
}


"hits": {
"total": 0
}


GET bank/_search
{
"query": {
"term": {
"gender.keyword": {
"value": "F"
}
}
}
}

"hits": {
"total": 493
}
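
Why the same term query returns 0 hits on gender and 493 on gender.keyword can be sketched as follows. The `analyze` function is a very rough stand-in for the default analyzer (split and lowercase), used here only as an assumption to show the effect.

```python
def analyze(text):
    """Very rough stand-in for the default analyzer: split and lowercase."""
    return [word.lower() for word in text.split()]


def term_query(indexed_terms, value):
    """term does no analysis: the value must equal an indexed term exactly."""
    return value in indexed_terms


gender_text_terms = analyze("F")  # what a text field indexes: ["f"]
gender_keyword_terms = ["F"]      # a keyword field indexes the raw value

print(term_query(gender_text_terms, "F"))     # False
print(term_query(gender_keyword_terms, "F"))  # True
print(term_query(gender_text_terms, "f"))     # True
```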

3.8. terms: match any of several values

age is 23 or 35

GET bank/_search
{
"query": {
"terms": {
"age": [
"23",
"35"
]
}
}
}

3.9. 🌟Aggregations🌟

Average age of everyone

# average age of everyone
GET bank/_search
{
"aggs": {
"age_average": { # name of the result
"avg": { # compute the average
"field": "age" # field to aggregate on
}
}
}
}

"aggregations": {
"age_average": {
"value": 30.176176176176178
}
}

Age distribution and average age of the people whose address contains Mill

GET bank/_search
{
"query": {
"match": {
"address": "Mill"
}
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
},
"age_count":{
"terms": {
"field": "age",
"size": 100
}
}
}
}

"aggregations": {
"age_avg": {
"value": 34
},
"age_count": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 38,
"doc_count": 2
},
{
"key": 28,
"doc_count": 1
},
{
"key": 32,
"doc_count": 1
}
]
}
}
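
What the avg and terms aggregations compute can be reproduced in plain Python. The hit list below is hypothetical, with ages chosen to mirror the buckets in the response above.

```python
from collections import Counter
from statistics import mean

# Hypothetical hits for the "Mill" query; the ages mirror the buckets above.
hits = [{"age": 38}, {"age": 38}, {"age": 28}, {"age": 32}]

ages = [hit["age"] for hit in hits]
age_avg = mean(ages)                     # like the "avg" aggregation
age_count = Counter(ages).most_common()  # like "terms": (key, doc_count) pairs

print(age_avg)
print(age_count)
```

The average comes out as 34 and age 38 gets a doc_count of 2, in line with the response.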

Aggregate by age and, for each age bucket, request the average balance of the people in it

GET bank/_search
{
"aggs": {
"age_count": {
"terms": {
"field":"age"
},
"aggs": {
"avg_balance":{
"avg": {
"field": "balance"
}
}
}
}
}
}

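Get the age distribution and, within each age bucket, the average balance for M, the average balance for F, and the overall average balance of the bucket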

GET bank/_search
{
"aggs": {
"age_count": {
"terms": {
"field":"age"
},
"aggs": {
"gender_count":{
"terms": {
"field": "gender.keyword"
},
"aggs": {
"gender_balance_avg":{
"avg": {
"field": "balance"
}
}
}
},
"age_balance_avg":{
"avg": {
"field": "balance"
}
}
}
}
}
}
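
What a terms bucket with sub-aggregations computes can be sketched in plain Python: group by age, then within each group compute an overall average and a per-gender average. The accounts and the report layout are hypothetical and only meant to show the nesting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical accounts; only the fields used by the aggregation are shown.
accounts = [
    {"age": 31, "gender": "M", "balance": 1000},
    {"age": 31, "gender": "F", "balance": 3000},
    {"age": 31, "gender": "M", "balance": 2000},
    {"age": 40, "gender": "F", "balance": 5000},
]

# Outer bucket: one group per age value.
by_age = defaultdict(list)
for account in accounts:
    by_age[account["age"]].append(account)

report = {}
for age, group in by_age.items():
    # Inner bucket: balances per gender inside this age group.
    by_gender = defaultdict(list)
    for account in group:
        by_gender[account["gender"]].append(account["balance"])
    report[age] = {
        "doc_count": len(group),
        "avg_balance": mean(a["balance"] for a in group),
        "by_gender": {g: mean(b) for g, b in by_gender.items()},
    }

print(report)
```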

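3.10. GET bank/_mapping

The output below is the default, dynamically generated mapping. To use a custom mapping, specify it when creating the index.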

{
"bank": {
"mappings": {
"account": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"balance": {
"type": "long"
},
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
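
As a sketch of specifying a mapping at index creation, the snippet below builds a request body in the same typed layout as the output above (the style used by the 5.x releases this guide targets). The field choices, the helper name and the index name `bank_v2` are assumptions for illustration only.

```python
import json


def build_create_index_body(type_name, properties):
    """Body for PUT /<index> with an explicit mapping (typed, 5.x-style layout)."""
    return {"mappings": {type_name: {"properties": properties}}}


body = build_create_index_body("account", {
    "account_number": {"type": "long"},
    "age": {"type": "integer"},     # narrower than the dynamic default "long"
    "gender": {"type": "keyword"},  # exact values only, no analysis
    "address": {"type": "text"},    # analyzed for full-text search
})
print(json.dumps(body, indent=2))
# Sent as: PUT /bank_v2 with this JSON as the request body (hypothetical index name).
```

Declaring gender as keyword up front would make the plain term query on gender work without the .keyword sub-field.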

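4. Tokenization

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.

For example, the whitespace tokenizer splits text whenever it encounters whitespace. It turns the text "Quick brown fox!" into [Quick, brown, fox!].

The tokenizer is also responsible for recording the order or position of each term (used for phrase and word proximity queries) and the start and end character offsets of the original word the term represents (used for highlighting search results).

Elasticsearch provides many built-in tokenizers, which can be used to build custom analyzers.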
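
The whitespace example, together with the positions and offsets a tokenizer records, can be sketched in Python. This is an illustrative re-implementation with a hypothetical function name, not Elasticsearch code; the field names follow the _analyze responses shown later in this guide.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace, recording position and character offsets per token."""
    return [
        {
            "token": m.group(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": position,
        }
        for position, m in enumerate(re.finditer(r"\S+", text))
    ]


for tok in whitespace_tokenize("Quick brown fox!"):
    print(tok)
```

The last token is "fox!" with offsets 12 to 16: the punctuation stays attached because only whitespace separates tokens.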

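4.1. IK analyzer

https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.6.11

Find the release that matches your Elasticsearch version:

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

Open a shell in the Elasticsearch Docker container and install IK in the plugins directory:

Download: wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

Unpack: unzip elasticsearch-analysis-ik-5.6.11.zip

Check that it is installed: elasticsearch-plugin list

Restart the container afterwards; plugins are loaded when Elasticsearch starts.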

[linux@localhost ~]$ docker exec -it elasticsearch /bin/bash
root@19d9ac40ba76:/usr/share/elasticsearch# ls
NOTICE.txt README.textile bin config data lib logs modules plugins
root@19d9ac40ba76:/usr/share/elasticsearch# cd plugins
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# unzip elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# rm -rf elasticsearch-analysis-ik-5.6.11.zip
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
elasticsearch
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# mv elasticsearch ik
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
ik
root@19d9ac40ba76:/usr/share/elasticsearch/plugins# cd ../bin
root@19d9ac40ba76:/usr/share/elasticsearch/bin# ls
elasticsearch elasticsearch-keystore elasticsearch-plugin elasticsearch-systemd-pre-exec elasticsearch-translog elasticsearch.in.sh
root@19d9ac40ba76:/usr/share/elasticsearch/bin# elasticsearch-plugin list
ik

4.2. Results

Without IK, the default analyzer splits Chinese text into single characters:

GET bank/_analyze
{
"text": "我是中国人"
}
================================
{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "<IDEOGRAPHIC>",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "<IDEOGRAPHIC>",
"position": 1
},
{
"token": "中",
"start_offset": 2,
"end_offset": 3,
"type": "<IDEOGRAPHIC>",
"position": 2
},
{
"token": "国",
"start_offset": 3,
"end_offset": 4,
"type": "<IDEOGRAPHIC>",
"position": 3
},
{
"token": "人",
"start_offset": 4,
"end_offset": 5,
"type": "<IDEOGRAPHIC>",
"position": 4
}
]
}

4.2.1. ik_smart

GET bank/_analyze
{
"analyzer": "ik_smart",
"text":"我是中国人"
}
===========recognized================
{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
}
]
}

4.2.2. ik_max_word

GET bank/_analyze
{
"analyzer": "ik_max_word",
"text":"我是中国人"
}
=================================
{
"tokens": [
{
"token": "我",
"start_offset": 0,
"end_offset": 1,
"type": "CN_CHAR",
"position": 0
},
{
"token": "是",
"start_offset": 1,
"end_offset": 2,
"type": "CN_CHAR",
"position": 1
},
{
"token": "中国人",
"start_offset": 2,
"end_offset": 5,
"type": "CN_WORD",
"position": 2
},
{
"token": "中国",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 3
},
{
"token": "国人",
"start_offset": 3,
"end_offset": 5,
"type": "CN_WORD",
"position": 4
}
]
}
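
The idea of emitting every dictionary word, including overlapping ones, can be illustrated with a toy segmenter. This is explicitly not IK's actual algorithm: the three-word dictionary is hypothetical and the function only shows why one span of text can yield several tokens.

```python
def segment_all_words(text, dictionary):
    """Emit every dictionary word found in the text, plus uncovered single characters."""
    found = []
    covered = [False] * len(text)
    for start in range(len(text)):
        # Candidates of length >= 2 starting at this offset, longest first.
        for end in range(len(text), start + 1, -1):
            if text[start:end] in dictionary:
                found.append((start, end, text[start:end]))
                for i in range(start, end):
                    covered[i] = True
    # Characters that belong to no dictionary word become single-character tokens.
    for i, ch in enumerate(text):
        if not covered[i]:
            found.append((i, i + 1, ch))
    # Order by start offset, longer words first at the same offset.
    found.sort(key=lambda t: (t[0], -t[1]))
    return [word for _, _, word in found]


# Hypothetical mini dictionary; the real plugin ships a much larger one.
words = segment_all_words("我是中国人", {"中国人", "中国", "国人"})
print(words)
```

With this dictionary the result is the same five tokens, in the same order, as the ik_max_word response above.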
  • Author: Wang Ting
  • Link: /zh-CN/2020/03/29/elasticsearch学习/
  • Published: 2020-03-29 11:58
  • Updated: 2022-10-24 23:32
  • License: Unless otherwise stated, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting.