Druid入门指南 查询数据教程


1 查询数据

本教程将以Druid SQL和Druid的原生查询格式的示例演示如何在Apache Druid中进行数据查询。

本教程假定您已经完成了读取文件、数据的任一教程:

Druid查询通过HTTP进行发送;Druid的控制台包含一个视图,用于向Druid发出查询并展示结果。

*本教程所使用的的数据均源于Druid安装包,具体请参考Apache Druid 启动

2 Druid SQL查询

Druid支持SQL查询。

以下的查询语句检索了2015年9月12日被编辑最多的10个维基百科页面。

SELECT page, COUNT(*) AS Edits
FROM wikipedia
WHERE TIMESTAMP '2015-09-12 00:00:00' <= "__time" AND "__time" < TIMESTAMP '2015-09-13 00:00:00'
GROUP BY page
ORDER BY Edits DESC
LIMIT 10

1)通过控制台查询SQL

Druid SQL,Druid查询,Druid数据查询,Druid dsql

控制台查询视图提供语句自动补全功能。

Druid SQL,Druid查询,Druid数据查询,Druid dsql

您还可以从 … 选项菜单中配置要与查询一起发送的其他上下文标志。

请注意,控制台将(默认情况下)使用带Limit的SQL查询,以便可以完成诸如SELECT * FROM wikipedia之类的查询,您可以通过 Smart query limit 切换关闭此行为。

Druid SQL,Druid查询,Druid数据查询,Druid dsql

查询视图提供了可以为您编写和修改查询的上下文操作。

2)通过dsql查询SQL

Druid软件包中包括了一个SQL命令行客户端,位于Druid根目录中的 bin/dsql

运行 bin/dsql, 如下所示:

Welcome to dsql, the command-line client for Druid SQL.
Type "\h" for help.
dsql>

将SQL粘贴到 dsql 中提交查询:

dsql> SELECT page, COUNT(*) AS Edits FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 10;
┌──────────────────────────────────────────────────────────┬───────┐
│ page                                                     │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Wikipedia:Vandalismusmeldung                             │    33 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │    28 │
│ Jeremy Corbyn                                            │    27 │
│ Wikipedia:Administrators' noticeboard/Incidents          │    21 │
│ Flavia Pennetta                                          │    20 │
│ Total Drama Presents: The Ridonculous Race               │    18 │
│ User talk:Dudeperson176123                               │    18 │
│ Wikipédia:Le Bistro/12 septembre 2015                    │    18 │
│ Wikipedia:In the news/Candidates                         │    17 │
│ Wikipedia:Requests for page protection                   │    17 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 10 rows in 0.06s.

3)通过HTTP查询SQL

SQL查询作为JSON通过HTTP提交

教程包括一个示例文件,该文件quickstart/tutorial/wikipedia-top-pages-sql.json包含上面显示的SQL查询,我们将该查询提交给Druid Broker。

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages-sql.json http://localhost:8888/druid/v2/sql

结果返回如下:

[
  {
    "page": "Wikipedia:Vandalismusmeldung",
    "Edits": 33
  },
  {
    "page": "User:Cyde/List of candidates for speedy deletion/Subpage",
    "Edits": 28
  },
  {
    "page": "Jeremy Corbyn",
    "Edits": 27
  },
  {
    "page": "Wikipedia:Administrators' noticeboard/Incidents",
    "Edits": 21
  },
  {
    "page": "Flavia Pennetta",
    "Edits": 20
  },
  {
    "page": "Total Drama Presents: The Ridonculous Race",
    "Edits": 18
  },
  {
    "page": "User talk:Dudeperson176123",
    "Edits": 18
  },
  {
    "page": "Wikipédia:Le Bistro/12 septembre 2015",
    "Edits": 18
  },
  {
    "page": "Wikipedia:In the news/Candidates",
    "Edits": 17
  },
  {
    "page": "Wikipedia:Requests for page protection",
    "Edits": 17
  }
]

3 其他Druid SQL示例

1)时间查询

SELECT FLOOR(__time to HOUR) AS HourTime, SUM(deleted) AS LinesDeleted
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY 1

Druid SQL,Druid查询,Druid数据查询,Druid dsql

2)聚合查询

SELECT channel, page, SUM(added)
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
GROUP BY channel, page
ORDER BY SUM(added) DESC

Druid SQL,Druid查询,Druid数据查询,Druid dsql

3)查询原始数据

SELECT user, page
FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2015-09-12 02:00:00' AND TIMESTAMP '2015-09-12 03:00:00'
LIMIT 5

Druid SQL,Druid查询,Druid数据查询,Druid dsql

4 原生JSON数据查询

Druid的原生查询格式以JSON表示。

1)通过控制台查询

您可以从控制台的”Query”视图发出原生Druid数据查询。

这是一个查询,可检索2015-09-12上具有最多页面编辑量的10个wikipedia页面。

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}

只需将其粘贴到控制台即可将编辑器切换到JSON模式。

Druid SQL,Druid查询,Druid数据查询,Druid dsql

2)通过HTTP查询

在 quickstart/tutorial/wikipedia-top-pages.json 文件中包括了一个示例原生TopN查询。

提交该查询到Druid:

curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-top-pages.json http://localhost:8888/druid/v2?pretty

您可以看到如下的查询结果:

[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "count" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "count" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "count" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "count" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "count" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "count" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "count" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "count" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "count" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "count" : 17,
    "page" : "Wikipedia:Requests for page protection"
  } ]
} ]