Hive Built-in UDTFs

Contents
1. Introduction to UDF, UDAF, and UDTF
2. Hive Built-in UDTFs

1. Introduction to UDF, UDAF, and UDTF

In Hive, all operators and functions, both user-defined and built-in, are collectively called UDFs (User-Defined Functions).
Official UDF documentation: https://cwiki.apache.org/confluence/display/Hive/LanguageManualUDF
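Within this family, user-defined and built-in aggregate functions are collectively called UDAFs (User-Defined Aggregate Functions), and user-defined and built-in table-generating functions are collectively called UDTFs (User-Defined Table-Generating Functions).
This article walks through Hive's built-in table-generating functions (UDTFs) using concrete examples.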
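To make the three categories concrete, here is a minimal Python sketch of their row shapes: one row in and one value out, many rows in and one value out, one row in and several rows out. The function names are hypothetical and only illustrate the idea; they are not Hive APIs.

```python
from typing import Iterable, Iterator, List, Tuple


def upper_udf(value: str) -> str:
    """UDF shape: one input row produces one output value."""
    return value.upper()


def sum_udaf(values: Iterable[int]) -> int:
    """UDAF shape: many input rows are aggregated into one output value."""
    return sum(values)


def explode_udtf(values: List[int]) -> Iterator[Tuple[int]]:
    """UDTF shape: one input row produces zero or more output rows."""
    for v in values:
        yield (v,)
```

A table-generating function is the only one of the three whose output can have more rows than its input, which is why the rest of the article focuses on how those extra rows are produced.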
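2. Hive Built-in UDTFs

Official documentation for Hive's built-in UDTFs: https://cwiki.apache.org/confluence/display/Hive/LanguageManualUDF#LanguageManualUDF-Built-inTable-GeneratingFunctions%28UDTF%29

2.1 explode(array/map)

Function: turns a column into rows, producing one output row per array element or map entry.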
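The column-to-rows behaviour can be modelled in a few lines of Python. This is a sketch of the semantics only, assuming a Python list stands in for a Hive array, a dict for a Hive map, and None for NULL; it is not how Hive implements explode.

```python
from typing import Any, Iterator, Tuple


def explode(value: Any) -> Iterator[Tuple[Any, ...]]:
    """Yield one row per array element, or one (key, value) row per map entry."""
    if value is None:
        # Assumption of this sketch: a NULL input generates no rows.
        return
    if isinstance(value, dict):
        for key, val in value.items():
            yield (key, val)
    else:
        for item in value:
            yield (item,)
```

For an array the result has a single column; for a map it has two, which matches the col and key/value headers in the example output.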
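Example:
select explode(array(1,2,3));
select explode(split('1,2,3', ','));

col
1
2
3

select explode(map(1,2,3,4));

key value
1 2
3 4

2.2 posexplode(array)

Function: turns a column into rows and adds a leading column that holds each element's index, starting from 0.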
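The index column can be pictured with Python's enumerate. The sketch below is a model of the semantics under the same assumptions as before (a list stands in for a Hive array, None for NULL), not Hive's implementation.

```python
from typing import Any, Iterator, List, Optional, Tuple


def posexplode(values: Optional[List[Any]]) -> Iterator[Tuple[int, Any]]:
    """Yield (pos, val) rows, where pos is the zero-based index of val."""
    if values is None:
        # Assumption of this sketch: a NULL array generates no rows.
        return
    for pos, val in enumerate(values):
        yield (pos, val)
```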
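Example:
select posexplode(array(1,2,3));

pos val
0 1
1 2
2 3

2.3 stack(n,v1,v2,…,vk)

Function: splits k values evenly into n rows of k/n columns each. k must be an integer multiple of n; fill empty positions with NULL.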
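The row and column arithmetic can be sketched in Python as follows. The model enforces the rule as stated in this article (k must be a multiple of n, with the caller supplying None where Hive would use NULL); whether Hive itself rejects or pads other inputs is not covered by this sketch.

```python
from typing import Any, List, Tuple


def stack(n: int, *values: Any) -> List[Tuple[Any, ...]]:
    """Arrange k values, in order, into n rows of k/n columns each."""
    if n <= 0:
        raise ValueError("n must be a positive row count")
    if len(values) % n != 0:
        # Mirrors the rule in the text: pad with None (NULL) explicitly.
        raise ValueError("the number of values must be a multiple of n")
    width = len(values) // n
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]
```

With n = 3 and nine values, width is 3, so values 1 to 3 form the first row, values 4 to 6 the second, and values 7 to 9 the third.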
Example:
-- Arrange 9 values, in order, into 3 rows of 3 columns
with user_log as (
  select stack(3,
    1001, '2021-12-12', 123,
    1002, '2021-12-12', 145,
    1001, '2021-12-13', 143
  ) as (id, dt, lowcarbon)
)
select * from user_log;

user_log.id user_log.dt user_log.lowcarbon
1001 2021-12-12 123
1002 2021-12-12 145
1001 2021-12-13 143

2.4 lateral view UDTF

Function: a UDTF must be the only expression in the SELECT list; no other columns may appear alongside it. For example, the following query fails in Hive, although Spark SQL accepts it:
select 'CN' as country, explode(array(1,2,3));
lateral view solves this problem.
Example 1: splitting a string
-- Form 1
with shop as (
  select 1001 as pid, '1,2,3' as svs
  union
  select 1002 as pid, '4,5,' as svs
)
select pid, svs, sv from shop
lateral view outer explode(split(svs, ',')) tmp_v as sv;

-- Form 2
select pid, svs, sv from (
  select * from (
    select 1001 as pid, '1,2,3' as svs
    union
    select 1002 as pid, '4,5,' as svs
  ) tmp
) shop
lateral view outer explode(split(svs, ',')) tmp_v as sv;

pid svs sv
1001 1,2,3 1
1001 1,2,3 2
1001 1,2,3 3
1002 4,5, 4
1002 4,5, 5
1002 4,5, 

Forms 1 and 2 are equivalent, and lateral view and lateral view outer give the same result here; the missing value appears as an empty string.
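The empty string in the last row comes from split itself: a trailing delimiter leaves an empty final element, so explode still has three elements to emit. The Python sketch below uses re.split as a stand-in, since Hive's split also takes a regular expression as its pattern. The helper name is hypothetical, and Java and Python regex dialects differ in details that a plain comma does not exercise.

```python
import re
from typing import List, Optional


def split_like_hive(text: Optional[str], pattern: str) -> Optional[List[str]]:
    """Split text around a regex pattern, keeping trailing empty strings."""
    if text is None:
        # A NULL string yields a NULL array (relevant to Example 3 below).
        return None
    return re.split(pattern, text)
```

Because the array is non-empty, lateral view has rows to join against, which is why the outer keyword makes no difference in this example.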
Example 2: arrays
-- Form 1
with shop as (
  select 1001 as pid, array(1,2,3) as svs
  union
  select 1002 as pid, array(4,5,NULL) as svs
)
select pid, svs, sv from shop
lateral view outer explode(svs) tmp_v as sv;

-- Form 2
select pid, svs, sv from (
  select * from (
    select 1001 as pid, array(1,2,3) as svs
    union
    select 1002 as pid, array(4,5,NULL) as svs
  ) tmp
) shop
lateral view outer explode(svs) tmp_v as sv;

pid svs sv
1001 [1,2,3] 1
1001 [1,2,3] 2
1001 [1,2,3] 3
1002 [4,5,null] 4
1002 [4,5,null] 5
1002 [4,5,null] NULL

Forms 1 and 2 are equivalent, and lateral view and lateral view outer give the same result here; the missing value appears as NULL.
Example 3: data containing NULL
-- Form 1
with shop as (
  select 1001 as pid, '1,2,3' as svs
  union
  select 1002 as pid, NULL as svs
)
select pid, svs, sv from shop
lateral view outer explode(split(svs, ',')) tmp_v as sv;

-- Form 2
select pid, svs, sv from (
  select * from (
    select 1001 as pid, '1,2,3' as svs
    union
    select 1002 as pid, NULL as svs
  ) tmp
) shop
lateral view outer explode(split(svs, ',')) tmp_v as sv;

-- lateral view result
pid svs sv
1001 1,2,3 1
1001 1,2,3 2
1001 1,2,3 3

-- lateral view outer result
pid svs sv
1001 1,2,3 1
1001 1,2,3 2
1001 1,2,3 3
1002 NULL NULL

Forms 1 and 2 are equivalent, but here lateral view and lateral view outer differ: with lateral view the row whose value is NULL is dropped, while lateral view outer keeps it and shows NULL.
For a detailed explanation of lateral view [outer], see the linked article.
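The difference between the two forms can be modelled as a join between each source row and the rows its UDTF generates. The Python sketch below is an illustration of that idea under stated assumptions: rows are tuples, None stands in for NULL, and the function and parameter names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

Row = Tuple[Any, ...]


def lateral_view(
    rows: List[Row],
    array_of: Callable[[Row], Optional[List[Any]]],
    outer: bool = False,
) -> List[Row]:
    """Join every source row with the rows generated from its array."""
    result: List[Row] = []
    for row in rows:
        values = array_of(row)
        if values:
            # One output row per generated element, source columns repeated.
            for value in values:
                result.append(row + (value,))
        elif outer:
            # Nothing was generated: outer keeps the source row, padded with None.
            result.append(row + (None,))
        # Without outer, a source row that generates nothing disappears.
    return result


shop = [(1001, "1,2,3"), (1002, None)]


def split_svs(row: Row) -> Optional[List[str]]:
    """Split the svs column on commas; a NULL string gives a NULL array."""
    return row[1].split(",") if row[1] is not None else None
```

Calling lateral_view(shop, split_svs) returns only the three rows for pid 1001, while lateral_view(shop, split_svs, outer=True) adds (1002, None, None). An array that contains a NULL element, as in Example 2, is still non-empty, so both variants emit a row for that element.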

2.5 json_tuple(json_str,k1,k2,…)

Function: looks up each key in a JSON string and returns the corresponding values.
Example: for json_tuple() usage, see the linked article.
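As a rough picture of the key lookup, the Python sketch below parses the string once and returns one value per requested key. It is a simplified model with assumptions that are not taken from this article: only top-level keys are looked up, every value comes back as a string, and a missing key or unparsable input gives None. The sample JSON in the comment is made up for illustration.

```python
import json
from typing import Any, Optional, Tuple


def json_tuple(json_str: Optional[str], *keys: str) -> Tuple[Optional[str], ...]:
    """Return the values of the given top-level keys, one output column per key."""
    try:
        obj: Any = json.loads(json_str) if json_str is not None else None
    except json.JSONDecodeError:
        obj = None
    if not isinstance(obj, dict):
        # Assumption: invalid or non-object input gives None for every key.
        return tuple(None for _ in keys)
    columns = []
    for key in keys:
        value = obj.get(key)
        if value is None or isinstance(value, str):
            columns.append(value)
        else:
            # Assumption: non-string values are returned in their JSON text form.
            columns.append(json.dumps(value))
    return tuple(columns)


# Hypothetical input: json_tuple('{"name": "A", "age": 18}', "name", "age")
```

Extracting several keys in one call parses the JSON string once, rather than once per key.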

2.6 parse_url_tuple(url,p1,p2,…)

Function: extracts the requested parts (properties) of a URL and returns their values.
Example:
select parse_url_tuple('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST', 'PATH', 'QUERY', 'REF', 'PROTOCOL', 'QUERY:k1', 'QUERY:k2');

c0 c1 c2 c3 c4 c5 c6
facebook.com /path1/p.php k1=v1&k2=v2 Ref1 http v1 v2

For details on the parameters, see https://help.aliyun.com/zh/maxcompute/user-guide/parse-url-tuple
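The mapping from part names to URL components can be sketched with Python's urllib.parse. This is a model covering only the part names used in the example above (HOST, PATH, QUERY, REF, PROTOCOL, and QUERY:key); returning None for an absent part is an assumption of the sketch.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def parse_url_tuple(url: str, *parts: str) -> Tuple[Optional[str], ...]:
    """Return the requested URL parts, one output column per part name."""
    pieces = urlsplit(url)
    query_values = parse_qs(pieces.query)
    simple = {
        "HOST": pieces.hostname,
        "PATH": pieces.path,
        "QUERY": pieces.query,
        "REF": pieces.fragment,
        "PROTOCOL": pieces.scheme,
    }
    columns = []
    for part in parts:
        if part.startswith("QUERY:"):
            # QUERY:<key> selects a single query-string parameter.
            values = query_values.get(part[len("QUERY:"):])
            columns.append(values[0] if values else None)
        else:
            # Assumption: an empty or unknown part becomes None.
            columns.append(simple.get(part) or None)
    return tuple(columns)
```

As with json_tuple, asking for several parts in one call means the URL is parsed once.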
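
2.7 inline(array<struct>)

Function: expands an array of structs into multiple rows, one row per struct, with the struct's fields laid out side by side as columns.
Example:
select inline(array(struct('A',18,date '2023-10-01'),struct('B',20,date '2023-11-01'))) as (col1,col2,col3);

col1 col2 col3
A 18 2023-10-01
B 20 2023-11-01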
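
The expansion can be pictured in Python as well. In this sketch a tuple stands in for a Hive struct and a list for the array; both choices, and the no-rows result for a None input, are assumptions made for illustration.

```python
from datetime import date
from typing import Any, Iterator, List, Optional, Tuple


def inline(structs: Optional[List[Tuple[Any, ...]]]) -> Iterator[Tuple[Any, ...]]:
    """Yield one row per struct, with one column per struct field."""
    if structs is None:
        # Assumption of this sketch: a NULL array generates no rows.
        return
    for struct in structs:
        yield tuple(struct)


people = [("A", 18, date(2023, 10, 1)), ("B", 20, date(2023, 11, 1))]
```

Compared with explode, which would return each struct as a single column, inline spreads the fields across columns, so list(inline(people)) has two rows of three columns each.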