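Aggregations let you compute statistics, analyses, and calculations over document data. There are three common kinds (the field being aggregated must not be of type text):
- Bucket aggregations: group documents.
- Metric aggregations: compute values such as the maximum, minimum, or average.
- Pipeline aggregations: aggregate over the results of other aggregations.
Field types that can take part in an aggregation: keyword, numeric, date, and boolean.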
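Pipeline aggregations are only named above, so a small conceptual sketch may help. The following plain-Java code does not call Elasticsearch at all; it only imitates the idea with in-memory data. The `Product` class, the method names, and the sample prices are all hypothetical and exist only for illustration. The first step plays the role of a bucket plus metric aggregation (average price per category), and the second step plays the role of a pipeline aggregation, because its input is the per-bucket results rather than the documents.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class PipelineSketch {
    // Hypothetical in-memory stand-in for an indexed document.
    static final class Product {
        private final String category;
        private final double price;
        Product(String category, double price) { this.category = category; this.price = price; }
        String getCategory() { return category; }
        double getPrice() { return price; }
    }

    // Bucket + metric step: group by category, then average the price inside each bucket.
    static Map<String, Double> avgPricePerCategory(List<Product> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                Product::getCategory, TreeMap::new,
                Collectors.averagingDouble(Product::getPrice)));
    }

    // Pipeline step: consumes the per-bucket averages, not the original documents.
    static double avgOfBucketAverages(Map<String, Double> bucketAverages) {
        return bucketAverages.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        List<Product> docs = List.of(
                new Product("books", 10), new Product("books", 30),
                new Product("electronics", 100));
        Map<String, Double> perBucket = avgPricePerCategory(docs);
        System.out.println(perBucket);                      // {books=20.0, electronics=100.0}
        System.out.println(avgOfBucketAverages(perBucket)); // 60.0
    }
}
```

Note that the result (60.0) differs from the plain average over all three documents, which is exactly why the distinction between a metric and a pipeline aggregation matters.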
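Implementing Bucket aggregations with the DSL
Elasticsearch Bucket aggregations are a powerful tool for grouping documents into "buckets", similar to GROUP BY in SQL. Each bucket is associated with a criterion, and every document that matches the criterion is placed in that bucket.
Terms aggregation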
- Scenario: count the number of documents for each tag across blog posts.
    GET /blog/_search
    {
      "size": 0,                          // return no source documents, only aggregation results
      "aggs": {
        "tags": {
          "terms": {
            "field": "tags.keyword",      // use the keyword sub-field so the value is not analyzed
            "size": 10,                   // return the 10 most common tags
            "order": { "_count": "desc" } // sort by document count, descending
          }
        }
      }
    }

Example result:

    {
      "aggregations": {
        "tags": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 15,
          "buckets": [
            { "key": "elasticsearch", "doc_count": 25 },
            { "key": "java", "doc_count": 18 }
          ]
        }
      }
    }

The terms aggregation sorts its buckets by document count (_count) in descending order by default. DSL example that sorts by document count explicitly:

    GET /products/_search
    {
      "size": 0,
      "aggs": {
        "by_category": {
          "terms": {
            "field": "category.keyword",
            "order": { "_count": "desc" } // by document count, descending (the default)
          }
        }
      }
    }

Example result:

    {
      "aggregations": {
        "by_category": {
          "buckets": [
            { "key": "electronics", "doc_count": 120 },
            { "key": "clothing", "doc_count": 80 },
            { "key": "books", "doc_count": 50 }
          ]
        }
      }
    }

Scenario: aggregate by category only over products whose price is greater than 100. DSL example:

    GET /products/_search
    {
      "query": {
        "range": { "price": { "gt": 100 } }
      },
      "size": 0,
      "aggs": {
        "by_category": {
          "terms": { "field": "category.keyword" }
        }
      }
    }

Example result (because of the query, each doc_count only covers products with price > 100):

    {
      "aggregations": {
        "by_category": {
          "buckets": [
            { "key": "electronics", "doc_count": 75 } // electronics products with price > 100
          ]
        }
      }
    }
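- aggs stands for aggregations and sits at the same level as query. When both are present, the query restricts the set of documents that the aggregation runs over.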
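The interplay between query and aggs can be pictured as "filter first, then group". The sketch below is plain Java over an in-memory list and does not use the Elasticsearch client; the `Product` class, `countByCategory`, and the sample data are hypothetical names chosen for this illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class QueryScopeSketch {
    // Hypothetical in-memory stand-in for an indexed document.
    static final class Product {
        private final String category;
        private final double price;
        Product(String category, double price) { this.category = category; this.price = price; }
        String getCategory() { return category; }
        double getPrice() { return price; }
    }

    // Mirrors "query" + "aggs": the filter narrows the document set first,
    // and the grouping step only ever sees the documents that passed it.
    static Map<String, Long> countByCategory(List<Product> docs, double minPriceExclusive) {
        return docs.stream()
                .filter(p -> p.getPrice() > minPriceExclusive) // plays the role of range / gt
                .collect(Collectors.groupingBy(
                        Product::getCategory, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Product> docs = List.of(
                new Product("electronics", 50), new Product("electronics", 150),
                new Product("electronics", 300), new Product("books", 20),
                new Product("books", 120));
        System.out.println(countByCategory(docs, 100)); // {books=1, electronics=2}
    }
}
```

A category whose documents are all filtered out produces no bucket at all, which matches the intuition that the aggregation never sees documents outside the query scope.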
- An aggregation requires three elements:
  - the aggregation name
  - the aggregation type
  - the aggregation field
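- Configurable properties of an aggregation include size (how many results, that is buckets, to return), order (how the results are sorted), and field (which field to aggregate on).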
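To make size and order more concrete, here is a minimal plain-Java sketch of what a terms-style aggregation does with them. It is not the Elasticsearch implementation; `topTerms`, `otherDocCount`, and the sample tags are hypothetical, and the sketch assumes one field value per document. Ties are broken by key in ascending order so the output is deterministic.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TermsSizeOrderSketch {
    // Groups values, sorts buckets by count descending, and keeps the first `size` buckets.
    static List<Map.Entry<String, Long>> topTerms(List<String> values, int size) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(size)
                .collect(Collectors.toList());
    }

    // Documents that landed in buckets beyond `size`; comparable in spirit to
    // the sum_other_doc_count field in a terms aggregation response.
    static long otherDocCount(List<String> values, int size) {
        long returned = topTerms(values, size).stream()
                .mapToLong(Map.Entry::getValue)
                .sum();
        return values.size() - returned;
    }

    public static void main(String[] args) {
        List<String> tags = List.of("elasticsearch", "java", "elasticsearch",
                "spring", "java", "elasticsearch", "redis");
        System.out.println(topTerms(tags, 2));      // [elasticsearch=3, java=2]
        System.out.println(otherDocCount(tags, 2)); // 2
    }
}
```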
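Implementing Metric aggregations with the DSL
Compute the average price of all products:

    GET /products/_search
    {
      "size": 0,                  // return no source documents
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }

Example result:

    {
      "aggregations": {
        "avg_price": {
          "value": 125.5          // average price
        }
      }
    }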
Combining nested aggregations with Metric aggregations
Group by category, then compute the average, maximum, and minimum price of each category:

    GET /products/_search
    {
      "size": 0,
      "aggs": {
        "by_category": {
          "terms": { "field": "category.keyword" },
          "aggs": {
            "avg_price":   { "avg":   { "field": "price" } },
            "max_price":   { "max":   { "field": "price" } },
            "min_price":   { "min":   { "field": "price" } },
            "price_stats": { "stats": { "field": "price" } }
          }
        }
      }
    }
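The request above nests metric aggregations inside each bucket, and the stats aggregation alone already reports count, min, max, avg, and sum. A rough in-memory analogue in plain Java is sketched below, using `DoubleSummaryStatistics` as a stand-in for stats. It does not talk to Elasticsearch; the `Product` class, `statsByCategory`, and the sample prices are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NestedStatsSketch {
    // Hypothetical in-memory stand-in for an indexed document.
    static final class Product {
        private final String category;
        private final double price;
        Product(String category, double price) { this.category = category; this.price = price; }
        String getCategory() { return category; }
        double getPrice() { return price; }
    }

    // Outer step = bucket (group by category); inner step = metric (price statistics per bucket).
    static Map<String, DoubleSummaryStatistics> statsByCategory(List<Product> docs) {
        return docs.stream().collect(Collectors.groupingBy(
                Product::getCategory, TreeMap::new,
                Collectors.summarizingDouble(Product::getPrice)));
    }

    public static void main(String[] args) {
        List<Product> docs = List.of(
                new Product("electronics", 100), new Product("electronics", 300),
                new Product("books", 10), new Product("books", 30));
        for (Map.Entry<String, DoubleSummaryStatistics> e : statsByCategory(docs).entrySet()) {
            DoubleSummaryStatistics s = e.getValue();
            // First line printed: books: count=2 min=10.0 max=30.0 avg=20.0 sum=40.0
            System.out.println(e.getKey() + ": count=" + s.getCount()
                    + " min=" + s.getMin() + " max=" + s.getMax()
                    + " avg=" + s.getAverage() + " sum=" + s.getSum());
        }
    }
}
```

Since stats covers the same numbers, the separate avg, max, and min sub-aggregations in the request are redundant when price_stats is also requested; they are useful mainly when only one of the values is needed.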
Aggregations in Java

    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.aggregations.AggregationBuilders;
    import org.elasticsearch.search.aggregations.bucket.filter.Filter;
    import org.elasticsearch.search.aggregations.bucket.terms.Terms;
    import org.elasticsearch.search.builder.SearchSourceBuilder;

    import java.io.IOException;

    public class FilterAggregationExample {

        private final RestHighLevelClient client;

        public FilterAggregationExample(RestHighLevelClient client) {
            this.client = client;
        }

        public void filterAggregation() throws IOException {
            SearchRequest searchRequest = new SearchRequest("products");
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

            // Group by category; inside each category, keep only products with price > 100
            // and count them with a value_count sub-aggregation.
            searchSourceBuilder.aggregation(
                    AggregationBuilders.terms("by_category")
                            .field("category.keyword")
                            .subAggregation(
                                    AggregationBuilders.filter("expensive_products",
                                                    QueryBuilders.rangeQuery("price").gt(100))
                                            .subAggregation(
                                                    AggregationBuilders.valueCount("count").field("id"))));
            searchRequest.source(searchSourceBuilder);

            SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);

            // Process the aggregation results: look each aggregation up by the name it was given.
            Terms byCategory = response.getAggregations().get("by_category");
            for (Terms.Bucket bucket : byCategory.getBuckets()) {
                String category = bucket.getKeyAsString();
                long totalCount = bucket.getDocCount();          // all products in this category
                Filter expensiveProducts = bucket.getAggregations().get("expensive_products");
                long expensiveCount = expensiveProducts.getDocCount(); // products with price > 100
                System.out.println("Category: " + category
                        + ", Total: " + totalCount
                        + ", Expensive: " + expensiveCount);
            }
        }
    }