Hive-Testbench:https://github.com/hortonworks/hive-testbench/
Tpcds-Kit:https://github.com/gregrahn/tpcds-kit
官网:http://www.tpc.org/
针对数据库不同的使用场景TPC组织提供了多种数据集,主要的TPC数据集有如下几种
TPC-C:模拟一个库存-订单系统以及其上的多用户并发事务;
TPC-DI:模拟多种类型的大数据源的ETL过程;
TPC-DS:模拟大型零售业务的系统,该系统主要用于BI和决策支持,数据量和OLAP查询复杂度都很高,是与真实场景非常接近的一个测试集;
TPC-E:模拟证券经纪人的系统,该系统主要用于提供大量查询的OLTP服务;
TPC-H:可以近似视为TPC-DS的简化版本,但它的数据表数据特征单一,已经不能精准反映当今数据库系统的真实性能。
官网下载页面
这里通过官网的下载导入,将下载的文件解压后,修改dbgen中makefile.suite为makefile文件并进行修改
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH
由于官方数据库支持列表并没有原生支持Hive,所以通过SQLSERVER进行修改为HQL风格,修改tpcd.h文件
#ifdef SQLSERVER
#define GEN_QUERY_PLAN "explain;"
#define START_TRAN "start transaction;\n"
#define END_TRAN "commit;\n"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\m"
#define SET_DBASE "use %s;\n"
#endif
先通过make命令进行编译,再执行./dbgen -s 6生成数据。6为指定数据大小为5个SF,SF为TPC中描述数据量的单位,1个SF约等于1GB
[root@node03 dbgen]# ll -h *.tbl
-rw-r--r-- 1 root root 140M 6月 8 13:30 customer.tbl
-rw-r--r-- 1 root root 4.4G 6月 8 13:30 lineitem.tbl
-rw-r--r-- 1 root root 2.2K 6月 8 13:30 nation.tbl
-rw-r--r-- 1 root root 997M 6月 8 13:30 orders.tbl
-rw-r--r-- 1 root root 688M 6月 8 13:30 partsupp.tbl
-rw-r--r-- 1 root root 139M 6月 8 13:30 part.tbl
-rw-r--r-- 1 root root 389 6月 8 13:30 region.tbl
-rw-r--r-- 1 root root 8.2M 6月 8 13:30 supplier.tbl
建表语句在dss.ddl文件中,但是它的格式与HQL不符,因此需要另外改写建表语句。
create database tpch; use tpch; create external table lineitem ( l_orderkey int, l_partkey int, l_suppkey int, l_linenumber int, l_quantity double, l_extendedprice double, l_discount double, l_tax double, l_returnflag string, l_linestatus string, l_shipdate string, l_commitdate string, l_receiptdate string, l_shipinstruct string, l_shipmode string, l_comment string) row format delimited fields terminated by '|' stored as textfile; create external table nation ( n_nationkey int, n_name string, n_regionkey int, n_comment string) row format delimited fields terminated by '|' stored as textfile; create external table region ( r_regionkey int, r_name string, r_comment string) row format delimited fields terminated by '|' stored as textfile; create external table part ( p_partkey int, p_name string, p_mfgr string, p_brand string, p_type string, p_size int, p_container string, p_retailprice double, p_comment string) row format delimited fields terminated by '|' stored as textfile; create external table supplier ( s_suppkey int, s_name string, s_address string, s_nationkey int, s_phone string, s_acctbal double, s_comment string) row format delimited fields terminated by '|' stored as textfile; create external table partsupp ( ps_partkey int, ps_suppkey int, ps_availqty int, ps_supplycost double, ps_comment string) row format delimited fields terminated by '|' stored as textfile; create external table customer ( c_custkey int, c_name string, c_address string, c_nationkey int, c_phone string, c_acctbal double, c_mktsegment string, c_comment string) row format delimited fields terminated by '|' stored as textfile; create external table orders ( o_orderkey int, o_custkey int, o_orderstatus string, o_totalprice double, o_orderdate date, o_orderpriority string, o_clerk string, o_shippriority int, o_comment string) row format delimited fields terminated by '|' stored as textfile; use tpch; load data inpath "/tpch/supplier.tbl" into table supplier; load data inpath "/tpch/region.tbl" into table region; load data inpath "/tpch/partsupp.tbl" into table partsupp; load data inpath "/tpch/part.tbl" into table part; load data inpath "/tpch/orders.tbl" into table orders; load data inpath "/tpch/nation.tbl" into table nation; load data inpath "/tpch/lineitem.tbl" into table lineitem; load data inpath "/tpch/customer.tbl" into table customer;
8个表的Schema,其行数也是SF的倍数。HQL风格查询语句:https://issues.apache.org/jira/browse/HIVE-600 TPCH 22条SQL语句分析:https://yq.aliyun.com/articles/149715 参考来源:https://www.jianshu.com/p/59155803d67b
文章评论