Hive Joins Explained

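I have been using Hive joins a lot recently, so here is a summary of the common join types, to see whether Hive's joins behave any differently from the ones in ordinary SQL. Create two files, ta.txt and tb.txt, and load them into the tables ta and tb: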

hive (cfpd_ods_safe)> load data local inpath '/data/bdp/bdp_etl_deploy/hduser06/jaysonding/ta.txt' into table ta;
hive (cfpd_ods_safe)> load data local inpath '/data/bdp/bdp_etl_deploy/hduser06/jaysonding/tb.txt' into table tb;
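The load statements above only work if ta and tb already exist. The post does not show the DDL, so the following is just a minimal sketch: the single column name uid comes from the query output, while the string type is an assumption.

```sql
-- Assumed schema: the column type is not shown in the post.
-- The default text-file format is enough for one value per line.
create table if not exists ta (uid string);
create table if not exists tb (uid string);

-- Print column headers (ta.uid, tb.uid) in CLI results,
-- as they appear in the transcripts in this post.
set hive.cli.print.header=true;
```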

Query the data:

hive (cfpd_ods_safe)> select * from ta;
OK
ta.uid
1111
2222
3333
4444
Time taken: 0.087 seconds, Fetched: 4 row(s)

hive (cfpd_ods_safe)> select * from tb;
OK
tb.uid
1111
2222
5555
Time taken: 0.183 seconds, Fetched: 3 row(s)

Now let's try the joins.

(1) Plain comma join (select * from ta,tb with no condition):

ta.uid  tb.uid
1111    1111
1111    2222
1111    5555
2222    1111
2222    2222
2222    5555
3333    1111
3333    2222
3333    5555
4444    1111
4444    2222
4444    5555
Time taken: 21.328 seconds, Fetched: 12 row(s)

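As you can see, a plain comma join with no condition produces a Cartesian product: 4 rows in ta times 3 rows in tb gives 12 rows. Now the same join with a condition: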

hive (cfpd_ods_safe)> select * from ta,tb where ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
Time taken: 23.147 seconds, Fetched: 2 row(s)
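The Cartesian product from the unconditioned comma join can also be written out explicitly with cross join, which makes the intent easier to spot. This is a sketch against the same ta and tb tables. On clusters with strict checks enabled (hive.mapred.mode=strict in older releases, hive.strict.checks.cartesian.product in newer ones), Hive may reject such queries, so treat it as something to try on a small test setup.

```sql
-- Explicit spelling of the comma join with no condition.
select * from ta cross join tb;

-- Row-count check: 4 rows in ta x 3 rows in tb = 12 rows,
-- matching the 12 rows fetched by the comma join above.
select count(*) from ta cross join tb;
```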

(2) Inner join:

hive (cfpd_ods_safe)> select * from ta inner join tb on ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
Time taken: 21.597 seconds, Fetched: 2 row(s)

As you can see, inner join returns the same result as the comma join with the condition in the where clause.
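One way to check that the two spellings are equivalent beyond their output is to compare their query plans with explain. This is only a sketch: the plan text depends on the Hive version and execution engine, so no output is shown here.

```sql
-- Plan for the comma join with the condition in the where clause.
explain select * from ta, tb where ta.uid = tb.uid;

-- Plan for the explicit inner join; compare the join operator
-- and join keys with the plan above.
explain select * from ta inner join tb on ta.uid = tb.uid;
```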

(3) Left join:

hive (cfpd_ods_safe)> select * from ta left join tb on ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
3333    NULL
4444    NULL
Time taken: 22.921 seconds, Fetched: 4 row(s)

(5) Left outer join:

hive (cfpd_ods_safe)> select * from ta left outer join tb on ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
3333    NULL
4444    NULL
Time taken: 22.637 seconds, Fetched: 4 row(s)
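The NULLs that a left join fills in for unmatched rows are useful in their own right: filtering on them yields the rows of ta that have no partner in tb. A small sketch on the same tables, assuming tb.uid itself never contains NULL:

```sql
-- Rows of ta with no matching uid in tb.
select ta.uid
from ta left outer join tb on ta.uid = tb.uid
where tb.uid is null;
-- With the sample data above this should return 3333 and 4444.
```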

(6) Full join:

hive (cfpd_ods_safe)> select * from ta full join tb on ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
3333    NULL
4444    NULL
NULL    5555
Time taken: 19.39 seconds, Fetched: 5 row(s)

(7) Full outer join:

hive (cfpd_ods_safe)> select * from ta full outer join tb on ta.uid=tb.uid;
ta.uid  tb.uid
1111    1111
2222    2222
3333    NULL
4444    NULL
NULL    5555
Time taken: 20.414 seconds, Fetched: 5 row(s)
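Because a full outer join keeps unmatched rows from both sides, the NULL pattern shows where each uid came from. The query below is an illustrative sketch; the found_in column and its labels are made up for this example.

```sql
-- Label each uid by which table(s) it appears in.
select coalesce(ta.uid, tb.uid) as uid,
       case
         when ta.uid is null then 'tb only'
         when tb.uid is null then 'ta only'
         else 'both'
       end as found_in
from ta full outer join tb on ta.uid = tb.uid;
```

On the sample data, 1111 and 2222 should come out as 'both', 3333 and 4444 as 'ta only', and 5555 as 'tb only'. Row order is not guaranteed.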

Conclusions:

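(1) inner join gives the same result as a comma join with the equivalent condition in the where clause; the comma is effectively shorthand for inner join.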

(2) Any join written without a condition produces a Cartesian product.

(3) left join and left outer join are the same, and so are full join and full outer join. The same holds for right join and right outer join.
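The right-join case mentioned in (3) is not run in this post. A sketch on the same tables follows; the expected rows are derived from the sample data and standard join semantics, not from an actual run.

```sql
-- Both spellings should return the same rows.
select * from ta right join tb on ta.uid = tb.uid;
select * from ta right outer join tb on ta.uid = tb.uid;
-- Expected rows for the sample data (order may differ):
-- 1111    1111
-- 2222    2222
-- NULL    5555
```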