pipelinedb 初探

2019 年 10 月 4 日
筆記

pipelinedb的安装：

ref:

http://docs.pipelinedb.com/installation.html#installation

http://docs.pipelinedb.com/quickstart.html#quickstart

说明： pipelinedb 现在已经是pg的一个插件，【Since PipelineDB runs as an extension to PostreSQL, begin by installing PostgreSQL】。我们需要先安装pg，然后安装pipeline的扩展即可

需要先安装postgresql-server11版本

postgresql11-server-11.2-1PGDG.rhel7.x86_64

postgresql11-11.2-1PGDG.rhel7.x86_64

然后，安装对应版本的pipelinedb

rpm -ivh pipelinedb-postgresql-11-1.0.0-13.centos7.x86_64.rpm

# 初始化pg

su – postgres

/usr/pgsql-11/bin/pg_ctl initdb -D /var/lib/pgsql/11/data

修改pg的配置文件

listen_addresses = '*'

max_worker_processes = 128

shared_preload_libraries = 'pipelinedb'

# 启动pg

/usr/pgsql-11/bin/pg_ctl start -D /var/lib/pgsql/11/data

# 登录pg

/usr/pgsql-11/bin/psql

create database pipeline ;

c pipeline;

create extension pipelinedb;

# 这里还可以有些创建账号、添加pg_hba地址授权等操作，不在本文讨论的范畴内。

/usr/pgsql-11/bin/psql pipeline # 登录后，执行如下的2个SQL：

-- 创建一个外部表，实际数据来自pipelinedb  CREATE FOREIGN TABLE wiki_stream (  hour timestamp,  project text,  title text,  view_count bigint,  size bigint)  SERVER pipelinedb;

-- 创建CV视图  CREATE VIEW wiki_stats WITH (action=materialize) AS  SELECT hour, project,  count(*) AS total_pages,  sum(view_count) AS total_views,  min(view_count) AS min_views,  max(view_count) AS max_views,  avg(view_count) AS avg_views,  percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,  sum(size) AS total_bytes_served  FROM wiki_stream  GROUP BY hour, project;

# 查看数据表：

pipeline=# d  List of relations  Schema |       Name       |     Type      |  Owner  --------+------------------+---------------+----------  public | wiki_stats       | view          | postgres  public | wiki_stats_def   | view          | postgres  public | wiki_stats_mrel  | table         | postgres  public | wiki_stats_osrel | foreign table | postgres  public | wiki_stats_seq   | sequence      | postgres  public | wiki_stream      | foreign table | postgres  (6 rows)

# 现在我们将数据集解压缩为流并将其写入stdin，它可以用作COPY的输入

curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip | /usr/pgsql-11/bin/psql pipeline -c "COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"

# 查看测试的数据集：

/usr/pgsql-11/bin/psql pipeline -c "SELECT * FROM wiki_stats ORDER BY total_views DESC";

# 要查看系统中当前的连续视图及其定义，可以运行以下查询：

pipeline=# SELECT * FROM pipelinedb.views;  id | schema |    name    | active |                                                              query  ----+--------+------------+--------+----------------------------------------------------------------------------------------------------------------------------------  3 | public | wiki_stats | t      |  SELECT wiki_stream.hour,                                                                                                       +  |        |            |        |     wiki_stream.project,                                                                                                        +  |        |            |        |     count(*) AS total_pages,                                                                                                    +  |        |            |        |     sum(wiki_stream.view_count) AS total_views,                                                                                 +  |        |            |        |     min(wiki_stream.view_count) AS min_views,                                                                                   +  |        |            |        |     max(wiki_stream.view_count) AS max_views,                                                                                   +  |        |            |        |     avg(wiki_stream.view_count) AS avg_views,                                                                                   +  |        |            |        |     percentile_cont((0.99)::double precision) WITHIN GROUP (ORDER BY ((wiki_stream.view_count)::double precision)) AS p99_views,+  |        |            |        |     sum(wiki_stream.size) AS total_bytes_served                                                                                 +  |        |            |        |    FROM wiki_stream                                                                                                             +  |        |            |        |   GROUP BY wiki_stream.hour, wiki_stream.project  (1 row)

pipelinedb 在实时流式计算方面，有很多用法。具体可以参照德哥的github。

pipelinedb 初探

VirMach 便宜 VPS

QNews

pipelinedb 初探

分享此文：

Related Posts

赋予楼宇“智慧大脑”：厦门双子塔3D可视化

2019秋招：460道Java后端面试高频题答案版【模块二：Java集合类】

一步一步构建Spring5源码

【leetcode刷题】T165-最长上升子序列

VirMach 便宜 VPS

QNews

熱門搜尋