2018-09-29

Elasticsearch-聚合

聚合为你的数据提供分组和提取统计数据的能力，最简单的理解聚合的方式是粗略的认为它和SQL GROUP BY 和SQL 聚合函数相同。在Elasticsearch中，你可以在一个查询返回结果中同时返回命中结果和各自的聚合结果。这是非常高效的方式，你可以使用简洁的API在一次网络调用中执行查询和复杂聚合并获取结果（同时或各自）。

开始，这个例子中我们通过state为accounts分组，按照数量倒序（默认）返回前十（默认）个结果：


GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

在SQL中，上边的聚合相当于：

1	SELECT state, COUNT() FROM bank GROUP BY state ORDER BY COUNT() DESC LIMIT 10;

结果为(部分展示)：

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets" : [ {
        "key" : "ID",
        "doc_count" : 27
      }, {
        "key" : "TX",
        "doc_count" : 27
      }, {
        "key" : "AL",
        "doc_count" : 25
      }, {
        "key" : "MD",
        "doc_count" : 25
      }, {
        "key" : "TN",
        "doc_count" : 23
      }, {
        "key" : "MA",
        "doc_count" : 21
      }, {
        "key" : "NC",
        "doc_count" : 21
      }, {
        "key" : "ND",
        "doc_count" : 21
      }, {
        "key" : "ME",
        "doc_count" : 20
      }, {
        "key" : "MO",
        "doc_count" : 20
      } ]
    }
  }
}

我们能看到，27个accounts在ID（Idaho），接着，27个accounts在TX（Texas),然后是25个accounts在AL（Alabama) 诸如此类…

注意，这里我们设置size=0来不展示命中的内容，是为了只在响应中仅展示聚合结果。

基于前边的聚合，这个例子计算各个state的平均账户金额（同样只展示通过数量倒序排序的前十个数据）：

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

这里要注意我们是如何在group_by_state聚合中嵌套average_balance聚合的。这是所有聚合的通用模式，你可以在聚合中任意嵌套聚合来从数据中取出你需要的那部分。

基于前边的聚合，让我们通过平均余额来对accounts进行排序：

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

这个例子示范了如何通过年龄段进行分组，然后通过性别，最后获取每个年龄区间的每个性别的平均余额

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

聚合还有很多能力我们没能在这里详细展现，如果你想了解更多特性请参照aggregations reference guide

文章链接 https://fangzongzhou.github.io/2018/09/29/计算机/技术文档/Elasticsearch/聚合/