聚合操作（count、distinct）

作者：陈川阅读数：43794人阅读分类： MongoDB

聚合操作中的count

count是MongoDB中最基础的聚合操作之一，用于统计集合中符合条件的文档数量。这个操作在数据分析和报表生成时特别有用。基本语法如下：

db.collection.count(query)

其中query参数是可选的，用于指定筛选条件。例如统计users集合中所有文档数量：

db.users.count()

如果要统计年龄大于30的用户数量：

db.users.count({age: {$gt: 30}})

在MongoDB 4.0之后，count()方法被标记为已废弃，推荐使用countDocuments()或estimatedDocumentCount()替代：

// 精确计数，支持查询条件
db.users.countDocuments({age: {$gt: 30}})

// 快速估计数量，但不支持查询条件
db.users.estimatedDocumentCount()

distinct操作

distinct操作用于获取集合中某个字段的所有不同值。这在需要获取分类列表或枚举值时非常实用。基本语法：

db.collection.distinct(field, query)

例如获取users集合中所有不同的城市：

db.users.distinct("address.city")

也可以添加查询条件，比如获取年龄大于30的用户所在的不同城市：

db.users.distinct("address.city", {age: {$gt: 30}})

distinct操作返回的是一个数组，包含所有不重复的值。对于大型集合，这个操作可能会消耗较多资源。

聚合管道中的count

在更复杂的聚合场景中，可以在聚合管道中使用$count阶段。这种方式可以与其他聚合阶段配合使用：

db.users.aggregate([
  {$match: {age: {$gt: 30}}},
  {$count: "over30Count"}
])

这个例子会返回类似这样的结果：

{"over30Count": 42}

$count阶段必须放在管道的最后，因为它会消耗所有输入文档并输出单个文档。

聚合管道中的distinct

虽然distinct可以作为独立操作使用，但在聚合管道中实现类似功能通常使用$group阶段：

db.users.aggregate([
  {$group: {_id: "$address.city"}}
])

这种方式比distinct更灵活，可以与其他聚合阶段组合。例如统计每个城市的用户数量：

db.users.aggregate([
  {$group: {
    _id: "$address.city",
    userCount: {$sum: 1}
  }}
])

性能考虑

count和distinct操作的性能受多种因素影响：

索引使用：确保查询字段有适当索引
集合大小：大型集合可能需要更长时间
分片集群：跨分片操作会有额外开销

对于countDocuments()，MongoDB会执行完整的查询计划，而estimatedDocumentCount()使用集合元数据，速度更快但不精确。

distinct操作在内存中构建所有唯一值，对于高基数字段可能消耗大量内存。替代方案是使用聚合管道的$group阶段配合allowDiskUse选项。

实际应用示例

假设有一个电商订单集合orders，包含字段：orderDate、customerId、products（数组）、totalAmount等。

统计2023年的订单数量：

db.orders.countDocuments({
  orderDate: {
    $gte: ISODate("2023-01-01"),
    $lt: ISODate("2024-01-01")
  }
})

获取所有购买过特定类别商品的客户ID：

db.orders.distinct("customerId", {
  "products.category": "电子产品"
})

使用聚合管道统计每月订单数量和不同客户数：

db.orders.aggregate([
  {$match: {orderDate: {$gte: ISODate("2023-01-01")}}},
  {$project: {
    month: {$month: "$orderDate"},
    customerId: 1
  }},
  {$group: {
    _id: "$month",
    orderCount: {$sum: 1},
    uniqueCustomers: {$addToSet: "$customerId"}
  }},
  {$project: {
    month: "$_id",
    orderCount: 1,
    customerCount: {$size: "$uniqueCustomers"}
  }}
])

高级用法

对于更复杂的需求，可以结合多个聚合阶段：

先$match筛选文档
然后$unwind展开数组
接着$group分组计算
最后$sort排序结果

例如统计每个产品类别的销售额和订单数：

db.orders.aggregate([
  {$unwind: "$products"},
  {$group: {
    _id: "$products.category",
    totalSales: {$sum: "$products.price"},
    orderCount: {$sum: 1}
  }},
  {$sort: {totalSales: -1}}
])

限制与替代方案

count和distinct有一些限制：

结果集大小限制（16MB）
内存使用限制
分片集群中的性能问题

替代方案包括：

使用聚合管道的$facet阶段并行执行多个操作
使用$sample进行抽样统计
使用Map-Reduce处理超大数据集

例如使用$facet同时计算多个统计量：

db.orders.aggregate([
  {$match: {status: "completed"}},
  {$facet: {
    "totalOrders": [{$count: "count"}],
    "byMonth": [
      {$group: {
        _id: {$month: "$orderDate"},
        count: {$sum: 1}
      }}
    ],
    "topCustomers": [
      {$group: {
        _id: "$customerId",
        orderCount: {$sum: 1}
      }},
      {$sort: {orderCount: -1}},
      {$limit: 5}
    ]
  }}
])

做个网站！

本站部分内容来自互联网,一切版权均归源网站或源作者所有。

如果侵犯了你的权益请来信告知我们删除。邮箱：cc@cccx.cn

上一篇：排序（sort）与分页（limit、skip）

下一篇：Vite.js核心知识点