Hive-Testbench:https://github.com/hortonworks/hive-testbench/
Tpcds-Kit:https://github.com/gregrahn/tpcds-kit
官网:http://www.tpc.org/
针对数据库不同的使用场景TPC组织提供了多种数据集,主要的TPC数据集有如下几种
TPC-C:模拟一个库存-订单系统以及其上的多用户并发事务;
TPC-DI:模拟多种类型的大数据源的ETL过程;
TPC-DS:模拟大型零售业务的系统,该系统主要用于BI和决策支持,数据量和OLAP查询复杂度都很高,是与真实场景非常接近的一个测试集;
TPC-E:模拟证券经纪人的系统,该系统主要用于提供大量查询的OLTP服务;
TPC-H:可以近似视为TPC-DS的简化版本,但它的数据表数据特征单一,已经不能精准反映当今数据库系统的真实性能。
官网下载页面
这里通过官网的下载导入,将下载的文件解压后,修改dbgen中makefile.suite为makefile文件并进行修改
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE= SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH
由于官方数据库支持列表并没有原生支持Hive,所以通过SQLSERVER进行修改为HQL风格,修改tpcd.h文件
#ifdef SQLSERVER
#define GEN_QUERY_PLAN "explain;"
#define START_TRAN "start transaction;\n"
#define END_TRAN "commit;\n"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\m"
#define SET_DBASE "use %s;\n"
#endif
先通过make命令进行编译,再执行./dbgen -s 6生成数据。6为指定数据大小为5个SF,SF为TPC中描述数据量的单位,1个SF约等于1GB
[root@node03 dbgen]# ll -h *.tbl
-rw-r--r-- 1 root root 140M 6月 8 13:30 customer.tbl
-rw-r--r-- 1 root root 4.4G 6月 8 13:30 lineitem.tbl
-rw-r--r-- 1 root root 2.2K 6月 8 13:30 nation.tbl
-rw-r--r-- 1 root root 997M 6月 8 13:30 orders.tbl
-rw-r--r-- 1 root root 688M 6月 8 13:30 partsupp.tbl
-rw-r--r-- 1 root root 139M 6月 8 13:30 part.tbl
-rw-r--r-- 1 root root 389 6月 8 13:30 region.tbl
-rw-r--r-- 1 root root 8.2M 6月 8 13:30 supplier.tbl
建表语句在dss.ddl文件中,但是它的格式与HQL不符,因此需要另外改写建表语句。
create database tpch;
use tpch;
create external table lineitem (
l_orderkey int,
l_partkey int,
l_suppkey int,
l_linenumber int,
l_quantity double,
l_extendedprice double,
l_discount double,
l_tax double,
l_returnflag string,
l_linestatus string,
l_shipdate string,
l_commitdate string,
l_receiptdate string,
l_shipinstruct string,
l_shipmode string,
l_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table nation (
n_nationkey int,
n_name string,
n_regionkey int,
n_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table region (
r_regionkey int,
r_name string,
r_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table part (
p_partkey int,
p_name string,
p_mfgr string,
p_brand string,
p_type string,
p_size int,
p_container string,
p_retailprice double,
p_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table supplier (
s_suppkey int,
s_name string,
s_address string,
s_nationkey int,
s_phone string,
s_acctbal double,
s_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table partsupp (
ps_partkey int,
ps_suppkey int,
ps_availqty int,
ps_supplycost double,
ps_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table customer (
c_custkey int,
c_name string,
c_address string,
c_nationkey int,
c_phone string,
c_acctbal double,
c_mktsegment string,
c_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
create external table orders (
o_orderkey int,
o_custkey int,
o_orderstatus string,
o_totalprice double,
o_orderdate date,
o_orderpriority string,
o_clerk string,
o_shippriority int,
o_comment string)
row format delimited
fields terminated by '|'
stored as textfile;
use tpch;
load data inpath "/tpch/supplier.tbl" into table supplier;
load data inpath "/tpch/region.tbl" into table region;
load data inpath "/tpch/partsupp.tbl" into table partsupp;
load data inpath "/tpch/part.tbl" into table part;
load data inpath "/tpch/orders.tbl" into table orders;
load data inpath "/tpch/nation.tbl" into table nation;
load data inpath "/tpch/lineitem.tbl" into table lineitem;
load data inpath "/tpch/customer.tbl" into table customer;
8个表的Schema,其行数也是SF的倍数。
HQL风格查询语句:https://issues.apache.org/jira/browse/HIVE-600
TPCH 22条SQL语句分析:https://yq.aliyun.com/articles/149715
参考来源:https://www.jianshu.com/p/59155803d67b
文章评论