Information Security Practitioner ^_^
An information-science beginner (email: tenghm1986@163.com)
【hive】Multi-level nesting problem: `select * from (select * from ... order by ..) tmp`
2020-06-12 10:13:58
heming
# 0. Background

Problem description: when the following SQL statement is executed, the result is in ascending order under normal execution, because it contains `order by ... limit` with `asc`. But when it goes through the proxy with data masking enabled, the order becomes descending. Screenshots below:

```
select *
from (
        select t.rn,
               t.rec_site_name,
               t.ratio_site_over,
               t.onum / 10000 onum,
               t.num_site_over / 10000 num_site_over,
               t.stipulated_deadline,
               b.categorycode,
               c.sum_latetime / c.num_latecar,
               c.sum_waittime / c.num_latecar,
               c.num_latecar / c.num_latecar
        from yygk.yygk_top30_submit_abnormal_site t
        left join dim.baseorganize b
               on t.rec_site_name = b.fullname and b.deletionstatecode = 0
        left join tmp.tmp_jean_sitecarinfo c
               on b.leaguename = c.full_name
        where t.ds = '20200610'
        order by categorycode asc, rn asc
        limit 1000
     ) dataclub_sql_limit300000
limit 300000;
```

> Executed directly on Hive

<center> ![normal](https://leanote.com/api/file/getImage?fileId=5ee1f9c3ab64416aa70008a8) </center>

> Executed via proxy-hive with masking enabled

<center> ![abnormal](https://leanote.com/api/file/getImage?fileId=5ee1f9ddab64416aa70008aa) </center>

# 1. Native Hive itself changes the order under two-level nesting (direct connection, no proxy)

When applying data masking to Hive, if the SQL contains `order by`/`group by`/`sort by` and similar clauses, the proxy wraps the original statement as `select * from (original statement) tmp`. But if the client's statement is already one-level nested (and contains `order by ... limit`), the rewritten statement becomes two-level nested, and at that point the row order changes. Traditional Oracle databases show the same problem: the order changes there too.

One-level nesting:

```
select * from (select .. from .. order by ... limit ...) tmp limit ..
```

Two-level nesting:

```
select * from (select * from (select .. from .. order by ... limit ...) tmp limit ...) tmp
```

## 1.1 Original statement

```
SELECT id,phone,email FROM sen_t1 ORDER BY id desc;
```

<center> ![0](https://leanote.com/api/file/getImage?fileId=5ee2d319ab644168ab0012be) </center>

## 1.2 One-level nesting

```
select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp limit 5;
```

<center> ![1](https://leanote.com/api/file/getImage?fileId=5ee2d383ab644168ab0012c2) </center>

## 1.3 Two-level nesting

```
select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp limit 5)tmp2;
```

<center> ![2](https://leanote.com/api/file/getImage?fileId=5ee2d41eab644168ab0012ca) </center>

**NOTICE:** **`The order has now changed: descending has become ascending.`**

---

**Tip: if the `limit 5` inside the one-level nesting is removed, the result comes back in the correct order again.**

```
select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp )tmp2;
```

<center> ![1](https://leanote.com/api/file/getImage?fileId=5ee2d545ab644168ab0012d3) </center>

---

Compare the execution plans of ① `explain select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp limit 5)tmp2;` and ② `explain select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp )tmp2;`. Because of the extra `limit 5`, plan ① contains one more MapReduce stage, and in that extra stage the `sort order:` of the Reduce Output Operator is empty. This is most likely why the result order of ① changes.

> Execution plan: explain select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp limit 5)tmp2;

```
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: sen_t1
            Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: int), phone (type: string), email (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: int)
                sort order: -
                Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
                TopN Hash Memory Usage: 0.1
                value expressions: _col1 (type: string), _col2 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), VALUE._col1 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 5
            Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              sort order:
              Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
              TopN Hash Memory Usage: 0.1
              value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: VALUE._col0 (type: int), VALUE._col1 (type: string), VALUE._col2 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 5
            Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
```

> Execution plan: explain select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp limit 5;

```
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: sen_t1
            Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: int), phone (type: string), email (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: int)
                sort order: -
                Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
                TopN Hash Memory Usage: 0.1
                value expressions: _col1 (type: string), _col2 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), VALUE._col1 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
          Limit
            Number of rows: 5
            Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 5 Data size: 370 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: 5
      Processor Tree:
        ListSink
```

> Execution plan: explain select * from(select * from (SELECT id,phone,email FROM sen_t1 ORDER BY id desc )tmp )tmp2;

```
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: sen_t1
            Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: id (type: int), phone (type: string), email (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0 (type: int)
                sort order: -
                Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col1 (type: string), _col2 (type: string)
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: int), VALUE._col0 (type: string), VALUE._col1 (type: string)
          outputColumnNames: _col0, _col1, _col2
          Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 8 Data size: 592 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
```
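The plans above suggest a practical workaround when a wrapped query must preserve order: repeat the `ORDER BY` at the outermost level, so the final stage sorts again instead of shuffling rows with an empty `sort order:`. A sketch against the `sen_t1` example (this is a suggested manual rewrite, an assumption on my part, not the proxy's built-in behavior):

```
-- Assumed workaround: repeat the original ORDER BY on the outermost
-- query, so the extra MapReduce stage introduced by the wrapping
-- re-sorts the 5 rows that the inner limit kept.
select *
from (
        select * from (SELECT id, phone, email FROM sen_t1 ORDER BY id desc) tmp limit 5
     ) tmp2
order by id desc;
```

Per standard SQL semantics, only the outermost `ORDER BY` of a statement is guaranteed to determine the final result order; an `ORDER BY` inside a subquery is only reliable for deciding which rows an adjacent `LIMIT` keeps.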