Elasticsearch Study Guide
1. Basic inspection with _cat

GET /_cat/nodes: list all nodes

    127.0.0.1 39 92 6 0.26 0.19 0.14 mdi * efBli3S

GET /_cat/health: check the health of the cluster

    1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%

A healthy single-node setup normally reports yellow, because replica shards have no second node to be allocated to. A healthy multi-node cluster reports green.

GET /_cat/master: show the master node

    efBli3STR_i2BsdwuEcrhw 127.0.0.1 127.0.0.1 efBli3S

GET /_cat/indices: list all indices

    yellow open .kibana khzaMKDvQnWaMF_TNUbZYQ 1 1 1 0 3.2kb 3.2kb
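The _cat/health row above is easier to read once each value is paired with its column name. The sketch below is only an illustration: the column list is an assumption based on the headers that `GET /_cat/health?v` prints, and `parse_cat_health` is a hypothetical helper, not part of any client library.

```python
# Column names assumed from the header row printed by GET /_cat/health?v.
HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status", "node.total", "node.data",
    "shards", "pri", "relo", "init", "unassign", "pending_tasks",
    "max_task_wait_time", "active_shards_percent",
]

def parse_cat_health(line):
    """Split one whitespace-separated _cat/health row into a dict."""
    values = line.split()
    if len(values) != len(HEALTH_COLUMNS):
        raise ValueError("unexpected number of columns: %d" % len(values))
    return dict(zip(HEALTH_COLUMNS, values))

health = parse_cat_health(
    "1585677729 18:02:09 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0%"
)
print(health["status"], health["unassign"], health["active_shards_percent"])
```

Read this way, the row shows one unassigned shard and 50% active shards, which is what a yellow status on a single node looks like.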
2. CRUD

2.1. POST: create

    POST /customer/external/2
    {
      "name": "John Hua"
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "2",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": true
    }
2.1.1. Without an id: one is generated automatically (here AXExzS8q1XLtoSIsto8W)

    POST /customer/external
    {
      "name": "John Hua"
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "AXExzS8q1XLtoSIsto8W",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": true
    }

A plain POST does not compare the new data with the stored document.
2.1.2. With an id: creates the document, or overwrites it if the id already exists

    POST /customer/external/2
    {
      "name": "John Hua"
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "2",
      "_version": 2,
      "result": "updated",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": false
    }
2.1.3. _update

    POST /customer/external/2/_update
    {
      "doc": {
        "age": 18,
        "name": "John Doe"
      }
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "2",
      "_version": 3,
      "result": "updated",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      }
    }

_update compares the submitted data with the stored document. If nothing would change, no write happens, the version stays the same, and the response reports "result": "noop":

    POST /customer/external/2/_update
    {
      "doc": {
        "age": 18,
        "name": "John Doe"
      }
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "2",
      "_version": 3,
      "result": "noop",
      "_shards": {
        "total": 0,
        "successful": 0,
        "failed": 0
      }
    }
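The difference between "updated" and "noop" can be pictured as a merge followed by an equality check. The following is a simplified model under that assumption, not Elasticsearch's actual implementation (which also merges nested objects recursively); `apply_partial_update` is a hypothetical name.

```python
def apply_partial_update(source, doc):
    """Merge `doc` into `source`; report "noop" when nothing would change."""
    merged = dict(source)
    merged.update(doc)
    result = "noop" if merged == source else "updated"
    return merged, result

stored = {"name": "John Doe", "age": 18}
_, same = apply_partial_update(stored, {"age": 18, "name": "John Doe"})
_, changed = apply_partial_update(stored, {"age": 19})
_, retyped = apply_partial_update(stored, {"age": "18"})
print(same, changed, retyped)  # noop updated updated
```

In this model a value that changes type, such as the string "18" replacing the number 18, counts as a change.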
Running a script

ctx is the current context. With the stored document "_source": { "age": 18 }, the script below adds 5 to age:

    POST /customer/external/2/_update
    {
      "script": "ctx._source.age+=5"
    }
2.1.4. Bulk operations: _bulk

The request body is a sequence of lines. Each operation is an action line with its metadata, followed (except for delete) by a request body line:

    POST /_bulk
    { action: { metadata } }
    { request body }
    { action: { metadata } }
    { request body }

Example:

    { "delete": { "_index": "website", "_type": "blog", "_id": "123" } }
    { "create": { "_index": "website", "_type": "blog", "_id": "123" } }
    { "title": "My first blog post" }
    { "index": { "_index": "website", "_type": "blog" } }
    { "title": "My second blog post" }
    { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict": 3 } }
    { "doc": { "title": "My updated blog post" } }
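Because the bulk body is newline-delimited JSON rather than one JSON document, it is usually generated rather than typed. A minimal sketch of that serialization, assuming operations are passed as (action, metadata, document) tuples; `build_bulk_body` is a hypothetical helper:

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, document-or-None) tuples as NDJSON."""
    lines = []
    for action, metadata, document in operations:
        lines.append(json.dumps({action: metadata}))
        if document is not None:  # a delete has no request body line
            lines.append(json.dumps(document))
    return "\n".join(lines) + "\n"  # the body has to end with a newline

body = build_bulk_body([
    ("delete", {"_index": "website", "_type": "blog", "_id": "123"}, None),
    ("create", {"_index": "website", "_type": "blog", "_id": "123"},
     {"title": "My first blog post"}),
    ("index", {"_index": "website", "_type": "blog"},
     {"title": "My second blog post"}),
])
print(body, end="")
```

Each line must stay on a single line, so pretty-printed JSON would not work here.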
Test data:

https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true
2.2. PUT: modify

2.2.1. With an id: creates the document, or updates it if the id already exists

    PUT /customer/external/1
    {
      "name": "John Doe"
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 6,
      "result": "updated",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": false
    }
2.2.2. Without an id: error

    No handler found for uri [/customer/external] and method [PUT]
2.3. GET: read

    GET customer/external/1

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 6,
      "found": true,
      "_source": {
        "name": "John Doe"
      }
    }
2.4. DELETE: delete

A delete only sets a deletion marker on the document. Indexing the same id again continues from the existing version number.

    DELETE /customer/external/1

Response:

    {
      "found": true,
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 2,
      "result": "deleted",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      }
    }

Indexing the same id again:

    PUT /customer/external/1
    {
      "name": "John Doe"
    }

Response:

    {
      "_index": "customer",
      "_type": "external",
      "_id": "1",
      "_version": 3,
      "result": "created",
      "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
      },
      "created": true
    }

Deleting the whole index:

    DELETE /customer

Response:

    {
      "acknowledged": true
    }
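The version sequence in the responses above (deleted at version 2, created again at version 3) can be mimicked with a small counter. This is a toy model of the observed behaviour only, assuming the deletion marker is still present when the id is indexed again; `VersionedDoc` is a hypothetical class, not an Elasticsearch API.

```python
class VersionedDoc:
    """Toy model: a delete leaves a marker behind, so the version keeps counting."""

    def __init__(self):
        self.version = 0
        self.source = None  # None means deleted or never indexed

    def index(self, source):
        created = self.source is None
        self.version += 1
        self.source = source
        return {"_version": self.version,
                "result": "created" if created else "updated"}

    def delete(self):
        self.version += 1
        self.source = None
        return {"_version": self.version, "result": "deleted"}

doc = VersionedDoc()
first = doc.index({"name": "John Doe"})
removed = doc.delete()
again = doc.index({"name": "John Doe"})
print(first, removed, again)
```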
3. Query

3.1. match_all

    GET bank/_search
    {
      "query": {
        "match_all": {}
      },
      "_source": ["age", "address"]
    }

Annotated response:

    {
      "took": 4,                # time Elasticsearch took to run the search, in milliseconds
      "timed_out": false,       # whether the search timed out
      "_shards": {              # how many shards were searched, with success and failure counts
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {                 # search results
        "total": 999,           # total number of matching documents
        "max_score": 1,
        "hits": [               # the actual result array (the first 10 documents by default)
          {
            "_index": "bank",
            "_type": "account",
            "_id": "25",
            "_score": 1,        # _score and max_score: relevance score and highest score
            "_source": { }
          },
          { },
          { }
          ...
        ]
      }
    }
3.2. Returning only some fields: "_source": ["...", "..."]

    GET bank/_search
    {
      "query": {
        "match_all": {}
      },
      "_source": ["age", "address"]
    }
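3.3. Fuzzy (full-text) matching: match

A match query is analyzed: the text is split into individual words, and each word is a key in the inverted index that records which documents contain it. Looking up a word therefore leads straight to the matching document IDs. Since any of the words can produce a hit, the match is fuzzy rather than exact.

    GET bank/_search
    {
      "query": {
        "match": {
          "account_number": "44"
        }
      }
    }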
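The inverted index behind match can be illustrated in a few lines. The sketch below is a toy: it assumes whitespace splitting plus lowercasing as the analysis step, ignores scoring, and uses made-up sample addresses and hypothetical function names.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def match(index, query):
    """Return the ids of docs that contain at least one term of the query."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return sorted(hits)

addresses = {
    1: "930 Bay Avenue",
    2: "198 Mill Lane",
    3: "789 Madison Street",
}
index = build_inverted_index(addresses)
print(match(index, "mill avenue"))  # [1, 2]
```

The query "mill avenue" hits two documents even though neither contains both words, which is the fuzzy behaviour described above.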
3.4. Exact matching: match_phrase

Matching the whole field value through the keyword subfield:

    GET bank/_search
    {
      "query": {
        "match": {
          "address.keyword": "930 Bay Avenue"
        }
      }
    }

Matching the words as one phrase inside the field:

    GET bank/_search
    {
      "query": {
        "match_phrase": {
          "address": "930 Bay Avenue"
        }
      }
    }
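What separates a phrase match from a plain match is that the terms have to appear next to each other and in order. A minimal sketch of that check, assuming whitespace tokenization and lowercasing; `phrase_match` is a hypothetical helper and does not model options such as slop.

```python
def phrase_match(text, phrase):
    """True when the phrase's terms appear consecutively, in order, in the text."""
    tokens = text.lower().split()
    wanted = phrase.lower().split()
    if not wanted:
        return False
    last_start = len(tokens) - len(wanted)
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(last_start + 1))

print(phrase_match("930 Bay Avenue", "bay avenue"))  # True
print(phrase_match("930 Bay Avenue", "avenue bay"))  # False
print(phrase_match("930 Bay Avenue", "930 avenue"))  # False
```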
3.5. Querying several fields: multi_match

    GET bank/_search
    {
      "query": {
        "multi_match": {
          "query": "Mill ak",
          "fields": ["address", "state"]
        }
      }
    }
3.6. Compound queries: bool

3.6.1. must: every listed condition has to match; matches contribute to the score

3.6.2. should: matching is optional, but raises the score of the documents that do match

3.6.3. must_not: the listed conditions must not match

Example: gender is M, the address contains Mill, the state is not IL, and an age between 30 and 40 is preferred.

    GET bank/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "gender": "M" } },
            { "match": { "address": "Mill" } }
          ],
          "must_not": [
            { "match": { "state": "IL" } }
          ],
          "should": [
            { "range": { "age": { "gte": 30, "lte": 40 } } }
          ]
        }
      }
    }
3.6.4. filter: the condition has to match, but it is ignored for scoring

    GET bank/_search
    {
      "query": {
        "bool": {
          "filter": {
            "match": {
              "address": "Bay"
            }
          }
        }
      }
    }
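How the four bool clause types combine can be sketched with plain predicates. This is a toy evaluator under loud assumptions: every matching must or should clause adds exactly 1 to the score (real relevance scoring is far more involved), and the sample people and the name `bool_query` are made up for illustration.

```python
def bool_query(docs, must=(), must_not=(), should=(), filters=()):
    """Toy bool evaluation: each clause is a predicate taking a document dict."""
    results = []
    for doc in docs:
        required = list(must) + list(filters)
        if not all(clause(doc) for clause in required):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        # must and should clauses contribute to the score; filter clauses do not.
        score = len(must) + sum(1 for clause in should if clause(doc))
        results.append((score, doc["id"]))
    return sorted(results, reverse=True)

people = [
    {"id": "p1", "gender": "M", "address": "990 Mill Road", "state": "AK", "age": 38},
    {"id": "p2", "gender": "M", "address": "198 Mill Lane", "state": "IL", "age": 34},
    {"id": "p3", "gender": "M", "address": "715 Mill Avenue", "state": "ID", "age": 28},
    {"id": "p4", "gender": "F", "address": "930 Bay Avenue", "state": "MD", "age": 32},
]
hits = bool_query(
    people,
    must=[lambda d: d["gender"] == "M",
          lambda d: "mill" in d["address"].lower().split()],
    must_not=[lambda d: d["state"] == "IL"],
    should=[lambda d: 30 <= d["age"] <= 40],
)
filtered = bool_query(people, filters=[lambda d: "bay" in d["address"].lower().split()])
print(hits)      # [(3, 'p1'), (2, 'p3')]
print(filtered)  # [(0, 'p4')]
```

p3 still matches despite being outside the preferred age range; it only ranks lower. The filter-only query returns its hit with a score of 0.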
3.7. term: exact match on non-text fields

A term query looks up the exact value without analyzing it, so use it on fields that are not analyzed text, such as numbers and booleans. On the text field gender it finds nothing, while the gender.keyword subfield matches.

    GET bank/_search
    {
      "query": {
        "term": {
          "age": { "value": "38" }
        }
      }
    }

    "hits": { "total": 39 }

    GET bank/_search
    {
      "query": {
        "term": {
          "gender": { "value": "F" }
        }
      }
    }

    "hits": { "total": 0 }

    GET bank/_search
    {
      "query": {
        "term": {
          "gender.keyword": { "value": "F" }
        }
      }
    }

    "hits": { "total": 493 }
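The zero hits for "F" on the gender text field follow from how the values were indexed: analysis lowercases text, while the term lookup uses the query value as-is. A small sketch of that mismatch, assuming lowercasing plus whitespace splitting as a stand-in for the default analyzer; all function names are hypothetical.

```python
def analyze_text(value):
    """Stand-in for the default analyzer: split on whitespace and lowercase."""
    return value.lower().split()

def index_field(values, analyzed):
    """Build term -> doc ids; keyword-style fields keep the value verbatim."""
    index = {}
    for doc_id, value in values.items():
        terms = analyze_text(value) if analyzed else [value]
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def term_query(index, value):
    """Exact lookup: the query value is not analyzed."""
    return sorted(index.get(value, set()))

genders = {1: "F", 2: "M", 3: "F"}
text_index = index_field(genders, analyzed=True)      # terms: "f", "m"
keyword_index = index_field(genders, analyzed=False)  # terms: "F", "M"
print(term_query(text_index, "F"))     # []
print(term_query(keyword_index, "F"))  # [1, 3]
```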
3.8. terms: matching any of several values

Documents where age is 23 or 35:

    GET bank/_search
    {
      "query": {
        "terms": {
          "age": ["23", "35"]
        }
      }
    }
3.9. 🌟Aggregations🌟

Average age of everyone. In the request, age_average is the name chosen for the result, avg is the operation (average), and field is the attribute it runs on:

    GET bank/_search
    {
      "aggs": {
        "age_average": {
          "avg": {
            "field": "age"
          }
        }
      }
    }

Response:

    "aggregations": {
      "age_average": {
        "value": 30.176176176176178
      }
    }
Age distribution and average age of the people whose address contains Mill:

    GET bank/_search
    {
      "query": {
        "match": {
          "address": "Mill"
        }
      },
      "aggs": {
        "age_avg": {
          "avg": {
            "field": "age"
          }
        },
        "age_count": {
          "terms": {
            "field": "age",
            "size": 100
          }
        }
      }
    }

Response:

    "aggregations": {
      "age_avg": {
        "value": 34
      },
      "age_count": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": 38, "doc_count": 2 },
          { "key": 28, "doc_count": 1 },
          { "key": 32, "doc_count": 1 }
        ]
      }
    }
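Aggregate by age and compute the average balance of the people in each age bucket. The avg aggregation has to be nested inside the terms aggregation; as a sibling it would average over all documents instead of per age:

    GET bank/_search
    {
      "aggs": {
        "age_count": {
          "terms": {
            "field": "age"
          },
          "aggs": {
            "avg_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }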
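Get the full age distribution and, within each age bucket, the average balance of M, the average balance of F, and the overall average balance of that bucket. The per-gender average is nested one level deeper, inside the gender buckets:

    GET bank/_search
    {
      "aggs": {
        "age_count": {
          "terms": {
            "field": "age"
          },
          "aggs": {
            "gender_count": {
              "terms": {
                "field": "gender.keyword"
              },
              "aggs": {
                "gender_avg": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            },
            "age_balance_avg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }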
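For readers more used to grouping data in code, the request described above (age buckets, an overall average balance per bucket, and an average per gender inside each bucket) corresponds roughly to two nested group-by steps. The sketch below uses made-up accounts and hypothetical names, and only mirrors the shape of the result, not the response format.

```python
from collections import defaultdict

def avg(values):
    return sum(values) / len(values)

def age_gender_balance(accounts):
    """Bucket by age, then by gender, averaging balance at both levels."""
    by_age = defaultdict(list)
    for account in accounts:
        by_age[account["age"]].append(account)
    report = {}
    for age, group in by_age.items():
        by_gender = defaultdict(list)
        for account in group:
            by_gender[account["gender"]].append(account["balance"])
        report[age] = {
            "doc_count": len(group),
            "balance_avg": avg([a["balance"] for a in group]),
            "gender": {g: avg(b) for g, b in by_gender.items()},
        }
    return report

accounts = [
    {"age": 31, "gender": "M", "balance": 1000},
    {"age": 31, "gender": "F", "balance": 3000},
    {"age": 31, "gender": "M", "balance": 2000},
    {"age": 40, "gender": "F", "balance": 500},
]
report = age_gender_balance(accounts)
print(report[31])
```

Each nesting level in the aggs request plays the role of one of the loops here.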
3.10. GET bank/_mapping

This returns the mapping that was generated by default. To control the mapping yourself, specify it when creating the index.

    {
      "bank": {
        "mappings": {
          "account": {
            "properties": {
              "account_number": { "type": "long" },
              "address": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "age": { "type": "long" },
              "balance": { "type": "long" },
              "city": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "email": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "employer": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "firstname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "gender": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "lastname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
              "state": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }
            }
          }
        }
      }
    }
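4. Analysis (tokenization)

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens.

For example, the whitespace tokenizer splits text whenever it sees whitespace. It would turn the text "Quick brown fox! " into [Quick, brown, fox!].

The tokenizer is also responsible for recording the order or position of each term (used for phrase and word proximity queries) and the start and end character offsets of the original word that the term represents (used for highlighting search results).

Elasticsearch ships with many built-in tokenizers, which can be used to build custom analyzers.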
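The positions and character offsets a tokenizer records can be made concrete with a small whitespace tokenizer. This is only a sketch of the idea in plain Python, not Elasticsearch's implementation; the output keys are chosen to resemble the fields of an _analyze response.

```python
import re

def whitespace_tokenize(text):
    """Return tokens with position and character offsets, splitting on whitespace."""
    tokens = []
    for position, found in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": found.group(),
            "start_offset": found.start(),
            "end_offset": found.end(),
            "position": position,
        })
    return tokens

tokens = whitespace_tokenize("Quick brown fox! ")
for tok in tokens:
    print(tok)
```

The offsets point back into the original string, so `text[start_offset:end_offset]` recovers the word to highlight, while position is what a phrase query compares.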
4.1. IK analyzer

Find the release that matches your Elasticsearch version (5.6.11 here):

https://github.com/medcl/elasticsearch-analysis-ik/releases/tag/v5.6.11

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

Open a shell in the Elasticsearch Docker container and install IK in the plugins folder:

Download: wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip

Unzip: unzip elasticsearch-analysis-ik-5.6.11.zip

Check that it is installed: elasticsearch-plugin list

Restart Elasticsearch afterwards so that the plugin is loaded.
    [linux@localhost ~]$ docker exec -it elasticsearch /bin/bash
    root@19d9ac40ba76:/usr/share/elasticsearch# ls
    NOTICE.txt README.textile bin config data lib logs modules plugins
    root@19d9ac40ba76:/usr/share/elasticsearch# cd plugins
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.6.11/elasticsearch-analysis-ik-5.6.11.zip
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# unzip elasticsearch-analysis-ik-5.6.11.zip
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# rm -rf elasticsearch-analysis-ik-5.6.11.zip
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
    elasticsearch
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# mv elasticsearch ik
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# ls
    ik
    root@19d9ac40ba76:/usr/share/elasticsearch/plugins# cd ../bin
    root@19d9ac40ba76:/usr/share/elasticsearch/bin# ls
    elasticsearch elasticsearch-keystore elasticsearch-plugin elasticsearch-systemd-pre-exec elasticsearch-translog elasticsearch.in.sh
    root@19d9ac40ba76:/usr/share/elasticsearch/bin# elasticsearch-plugin list
    ik
4.2. Results

Without an analyzer specified, the default analyzer splits the Chinese text into single characters:

    GET bank/_analyze
    {
      "text": "我是中国人"
    }

Response:

    {
      "tokens": [
        { "token": "我", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
        { "token": "是", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
        { "token": "中", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
        { "token": "国", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
        { "token": "人", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 }
      ]
    }
4.2.1. ik_smart

    GET bank/_analyze
    {
      "analyzer": "ik_smart",
      "text": "我是中国人"
    }

Response, with the word 中国人 recognized as a single token:

    {
      "tokens": [
        { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
        { "token": "是", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 },
        { "token": "中国人", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 2 }
      ]
    }
4.2.2. ik_max_word

    GET bank/_analyze
    {
      "analyzer": "ik_max_word",
      "text": "我是中国人"
    }

Response, which also includes the shorter words contained in 中国人:

    {
      "tokens": [
        { "token": "我", "start_offset": 0, "end_offset": 1, "type": "CN_CHAR", "position": 0 },
        { "token": "是", "start_offset": 1, "end_offset": 2, "type": "CN_CHAR", "position": 1 },
        { "token": "中国人", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 2 },
        { "token": "中国", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 3 },
        { "token": "国人", "start_offset": 3, "end_offset": 5, "type": "CN_WORD", "position": 4 }
      ]
    }