(超長文)Hive Sql基本操作
- 2019 年 10 月 6 日
- 筆記
Hive Sql基本操作
1.hive裡面不顯示列名
2.注釋問題2.1 MySQL中的注釋2.2 Hive中的注釋3.亂碼問題3.1 修改表欄位註解和表註解3.2 修改分區欄位註解3.3 修改索引註解3.4 修改metastore的連接URL4.資料庫基本操作4.1 創建帶屬性的庫4.2 顯示資料庫詳情:4.3 查看正在使用哪個庫4.4 查看資料庫的詳情語句5.刪除資料庫5.1 刪除庫原則5.2 刪除不含表的資料庫5.3 刪除含表資料庫6.切換庫及創建表6.1 切換庫6.2 創建表7.表詳情及表操作7.1 表詳情7.2 表操作8.分區8.1 查看分區8.2 添加分區8.3 修改分區8.4 刪除分區
1.hive裡面不顯示列名
問題:

修改hive-site
配置:
<property> <name>hive.cli.print.current.db</name> <!--默認是true--> <value>true</value> </property> <property> <name>hive.cli.print.header</name> <!--默認是false--> <value>true</value> </property>
修改後:

2.注釋問題
Hive中支援創建資料庫時帶注釋。
create database if not exists t2 comment 'l h';
MySQL中是錶帶注釋。當然Hive中也行。
2.1 MySQL中的注釋
MySQL創建帶列與表注釋的表:
create table test(id int primary key auto_increment comment '設置主鍵自增',name varchar(100) comment '列注釋')comment '表注釋';
查看列注釋
查看全部列注釋:
mysql> show full columns from test; +-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+ | id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | 設置主鍵自增 | | name | varchar(100) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | 列注釋 | +-------+--------------+-------------------+------+-----+---------+----------------+---------------------------------+--------------------+ 2 rows in set (0.00 sec)
查看特定列注釋:
mysql> show full columns from test where field='id'; +-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+ | id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | 設置主鍵自增 | +-------+---------+-----------+------+-----+---------+----------------+---------------------------------+--------------------+ 1 row in set (0.01 sec)
查看錶注釋
第一種:
mysql> show create table test; +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Table | Create Table | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | test | CREATE TABLE `test` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '設置主鍵自增', `name` varchar(100) DEFAULT NULL COMMENT '列注釋', PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='表注釋' | +-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec)
第二種:
mysql> SELECT table_name,table_comment FROM information_schema.tables WHERE table_schema = 'test' AND table_name ='test' -> ; +------------+---------------+ | table_name | table_comment | +------------+---------------+ | test | 表注釋 | +------------+---------------+ 1 row in set (0.01 sec)
修改表注釋
mysql> alter table test comment '表注釋修改'; Query OK, 0 rows affected (0.11 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> SELECT table_name,table_comment FROM information_schema.tables WHERE table_schema = 'test' AND table_name ='test' -> ; +------------+-----------------+ | table_name | table_comment | +------------+-----------------+ | test | 表注釋修改 | +------------+-----------------+ 1 row in set (0.00 sec)
修改列注釋
mysql> ALTER TABLE test MODIFY COLUMN name VARCHAR(100) COMMENT '修改列注釋'; Query OK, 0 rows affected (0.10 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> show full columns from test where field='name'; +-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+ | name | varchar(100) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | 修改列注釋 | +-------+--------------+-------------------+------+-----+---------+-------+---------------------------------+-----------------+ 1 row in set (0.00 sec)
2.2 Hive中的注釋
創建表注釋與列注釋
hive (t1)> create table test(id int comment '用戶ID',price string comment '價格') comment '測試表' ; OK Time taken: 1.146 seconds hive (t1)> desc test; OK col_name data_type comment id int 用戶ID price string 價格
修改注釋
hive (t1)> alter table test change id id int comment 'id'; OK Time taken: 0.355 seconds hive (t1)> desc test; OK col_name data_type comment id int id price string 價格 Time taken: 0.039 seconds, Fetched: 2 row(s)
3.亂碼問題
當添加註釋如果是中文,會亂碼,因為MySQL中的元數據出現亂碼。需要修改metastore中的表、分區、視圖的編碼。
3.1 修改表欄位註解和表註解
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8; alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
3.2 修改分區欄位註解
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 ; alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
3.3 修改索引註解
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
3.4 修改metastore的連接URL
修改hive-site
配置文件:
<property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://city:3306/hive?createDatabaseIfNotExist=true&useSSL=false&useUnicode=true&characterEncoding=UTF-8</value> <description>JDBC connect string for a JDBC metastore</description> </property>
經過上述操作,就可以完成中文顯示。
4.資料庫基本操作
4.1 創建帶屬性的庫
create database if not exists t3 with dbproperties('creator'='hadoop','date'='2019-06-11');
4.2 顯示資料庫詳情:
desc database [extended] dbname;
示例:
hive (t1)> desc database t3; OK db_name comment location owner_name owner_type parameters t3 hdfs://city:9012/user/hive/warehouse/t3.db hadoop USER Time taken: 0.009 seconds, Fetched: 1 row(s) hive (t1)> desc database extended t3; OK db_name comment location owner_name owner_type parameters t3 hdfs://city:9012/user/hive/warehouse/t3.db hadoop USER {date=2019-06-11, creator=hadoop} Time taken: 0.011 seconds, Fetched: 1 row(s)
4.3 查看正在使用哪個庫
hive (t1)> select current_database(); OK _c0 t1 Time taken: 0.954 seconds, Fetched: 1 row(s)
4.4 查看資料庫的詳情語句
hive (t1)> show create database t3; OK createdb_stmt CREATE DATABASE `t3` LOCATION 'hdfs://city:9012/user/hive/warehouse/t3.db' WITH DBPROPERTIES ( 'creator'='hadoop', 'date'='2019-06-11') Time taken: 0.016 seconds, Fetched: 6 row(s)5.刪除資料庫
5.1 刪除庫原則
hive不允許刪除包含表的資料庫
hive (t1)> drop database t1; FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database t1 is not empty. One or more tables exist.)
兩種解決方法:
- 手動刪除表,再刪除庫;
- 使用cascade關鍵字
5.2 刪除不含表的資料庫
檢查是否包含表
hive (t1)> show tables in t1; OK tab_name test Time taken: 0.02 seconds, Fetched: 1 row(s)
hive (t1)> show tables in t2; OK tab_name Time taken: 0.017 seconds
發現t1資料庫中有表,t2資料庫中無表。
直接刪除資料庫t2:
hive (t1)> drop database t2; OK Time taken: 0.223 seconds
5.3 刪除含表資料庫
hive (t1)> drop database if exists t1 cascade; OK Time taken: 0.611 seconds6.切換庫及創建表
6.1 切換庫
使用use
即可。
6.2 創建表
分為內部表、外部表、分區表、分桶表。默認為內部表。
內部表
create table student(id int, name string, sex string, age int,department string) row format delimited fields terminated by ",";
這種就屬於默認的內部表。
外部表
create external table student_ext (id int, name string, sex string, age int,department string) row format delimited fields terminated by "," location "/hive/student";
內部表與外部表刪除的一個區別:當使用drop table xxxx
時候,hdfs文件系統上如果是內部表,則直接會被刪除,同時本地資料庫裡面的表也會被刪除;而如果是外部表,則只有本地資料庫表刪除,hdfs文件系統上不會被刪除。
存儲路徑不一樣:內部表存儲在/user/hive/warehouse/xx.db
裡面,而外部表存儲在自定指定的location裡面。(註:該location為hdfs目錄文件系統)。如果創建時候不指定,默認跟內部表的hdfs路徑一致,當然內部表也可以指定位置。
分區表
分區表實際上就是一種外部表,如果某張表是分區表。那麼每個分區的定義,其實就表現為了這張表的數據存儲目錄下的一個子目錄。如果是分區表。那麼數據文件一定要存儲在某個分區中,而不能直接存儲在表中。
hive (t1)> create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) location "/hive/student_ptn"; OK Time taken: 0.455 seconds hive (t1)> alter table student_ptn add partition(city="beijing"); OK Time taken: 0.406 seconds hive (t1)> alter table student_ptn add partition(city="shenzhen"); OK Time taken: 0.525 seconds

分桶表
create external table student_bck(id int, name string, sex string, age int,department string) clustered by(id) sorted by (id asc, name desc) into 4 buckets location "/hive/student_bck";
Hive使用CTAS創建表
讀取本地文件:
注意:如果local不寫,則從hdfs中查找文件。
load data local inpath "/home/light/mysql/student.txt" into table student;
使用CTAS創建表:
create table student_ctas as select * from student where id < 95012;
MySQL使用CTAS創建表
load data local inpath
得變為load data local infile
。
load data local infile "/home/light/mysql/student.txt" into table student fields terminated by ',';
CTAS創建如下:
hive (t1)> load data local inpath "/home/light/mysql/student.txt" into table student; Loading data to table t1.student OK Time taken: 0.637 seconds hive (t1)> select * from student; OK student.id student.name student.sex student.age student.department 95002 劉晨 女 19 IS 95017 王風娟 女 18 IS 95018 王一 女 19 IS 95013 馮偉 男 21 CS 95014 王小麗 女 19 CS 95019 邢小麗 女 19 IS 95020 趙錢 男 21 IS 95003 王敏 女 22 MA 95004 張立 男 19 IS 95012 孫花 女 20 CS 95010 孔小濤 男 19 CS 95005 劉剛 男 18 MA 95006 孫慶 男 23 CS 95007 易思玲 女 19 MA 95008 李娜 女 18 CS 95021 周二 男 17 MA 95022 鄭明 男 20 MA 95001 李勇 男 20 CS 95011 包小柏 男 18 MA 95009 夢圓圓 女 18 MA 95015 王君 男 18 MA Time taken: 1.302 seconds, Fetched: 21 row(s) hive (t1)> create table student_ctas as select * from student where id < 95012; Query ID = hadoop_20190611151034_31ae197c-8ad7-4e81-9d23-ce78cfbbb1e5 Total jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1560231670908_0001, Tracking URL = http://localhost:8088/proxy/application_1560231670908_0001/ Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1560231670908_0001 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0 2019-06-11 15:10:46,629 Stage-1 map = 0%, reduce = 0% 2019-06-11 15:10:52,875 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.17 sec MapReduce Total cumulative CPU time: 3 seconds 170 msec Ended Job = job_1560231670908_0001 Stage-4 is selected by condition resolver. Stage-3 is filtered out by condition resolver. Stage-5 is filtered out by condition resolver. Moving data to directory hdfs://city:9012/user/hive/warehouse/t1.db/.hive-staging_hive_2019-06-11_15-10-34_904_5157590666630521651-1/-ext-10002 Moving data to directory hdfs://city:9012/user/hive/warehouse/t1.db/student_ctas MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Cumulative CPU: 3.17 sec HDFS Read: 6199 HDFS Write: 337 SUCCESS Total MapReduce CPU Time Spent: 3 seconds 170 msec OK student.id student.name student.sex student.age student.department Time taken: 20.827 seconds hive (t1)> select * from student_ctas; OK student_ctas.id student_ctas.name student_ctas.sex student_ctas.agestudent_ctas.department 95002 劉晨 女 19 IS 95003 王敏 女 22 MA 95004 張立 男 19 IS 95010 孔小濤 男 19 CS 95005 劉剛 男 18 MA 95006 孫慶 男 23 CS 95007 易思玲 女 19 MA 95008 李娜 女 18 CS 95001 李勇 男 20 CS 95011 包小柏 男 18 MA 95009 夢圓圓 女 18 MA Time taken: 0.165 seconds, Fetched: 11 row(s)
上述的CTAS創建表在MySQL中也是適用,如下所示:
mysql> create table student_ctas as select * from student where id<95012; Query OK, 11 rows affected (0.51 sec) Records: 11 Duplicates: 0 Warnings: 0 mysql> select * from student_ctas; +-------+-----------+------+------+------------+ | id | name | sex | age | department | +-------+-----------+------+------+------------+ | 95002 | 劉晨 | 女 | 19 | IS | | 95003 | 王敏 | 女 | 22 | MA | | 95004 | 張立 | 男 | 19 | IS | | 95010 | 孔小濤 | 男 | 19 | CS | | 95005 | 劉剛 | 男 | 18 | MA | | 95006 | 孫慶 | 男 | 23 | CS | | 95007 | 易思玲 | 女 | 19 | MA | | 95008 | 李娜 | 女 | 18 | CS | | 95001 | 李勇 | 男 | 20 | CS | | 95011 | 包小柏 | 男 | 18 | MA | | 95009 | 夢圓圓 | 女 | 18 | MA | +-------+-----------+------+------+------------+ 11 rows in set (0.00 sec)
複製表結構
在MySQL中也有一樣的語法。
create table a like student;
7.表詳情及表操作
7.1 表詳情
show tables in t1; OK tab_name a student student_bck student_ctas student_ext student_ptn test Time taken: 0.018 seconds, Fetched: 7 row(s)
匹配:查看以xxx開頭或結尾的所有表。
查看以student開頭的所有表:
hive (t1)> show tables like 'student*'; OK tab_name student student_bck student_ctas student_ext student_ptn Time taken: 0.011 seconds, Fetched: 5 row(s)
查看以dent結尾的所有表:
hive (t1)> show tables like '*dent'; OK tab_name student Time taken: 0.017 seconds, Fetched: 1 row(s)
表詳情1:desc xxx
。
表詳情2:
hive (t1)> desc formatted student; OK col_name data_type comment # col_name data_type comment id int name string sex string age int department string # Detailed Table Information Database: t1 OwnerType: USER Owner: hadoop CreateTime: Tue Jun 11 15:09:13 CST 2019 LastAccessTime: UNKNOWN Retention: 0 Location: hdfs://city:9012/user/hive/warehouse/t1.db/student Table Type: MANAGED_TABLE ... ... ...
表詳情3:
hive (t1)> desc extended student; OK col_name data_type comment id int name string sex string age int department string Detailed Table Information Table(tableName:student, dbName:t1, owner:hadoop, createTime:1560236953, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:sex, type:string, comment:null), FieldSchema(name:age, type:int, comment:null), FieldSchema(name:department, type:string, comment:null)], location:hdfs://city:9012/user/hive/warehouse/t1.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=504, numRows=0, rawDataSize=0, numFiles=1, transient_lastDdlTime=1560236957, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER) Time taken: 0.049 seconds, Fetched: 7 row(s)
7.2 表操作
對比MySQL與Hive修改表名
在Hive中必須有to
,而MySQL中可以不要!
alter table student rename to ab;
添加欄位
alter table student add columns(sc int)
MySQL中columns
不能加s。
alter table ba add column(sc1 int,sc2 int);
修改欄位的定義
alter table student change name name1 string;
MySQL語法同上。
刪除一個欄位不支援!
但可以通過replace
替換掉!
替換所有欄位
刪除name
列:
hive (t1)> select * from student; OK student.id student.address student.name 95002 劉晨 女 95017 王風娟 女 95018 王一 女 95013 馮偉 男 95014 王小麗 女 95019 邢小麗 女 95020 趙錢 男 95003 王敏 女 95004 張立 男 95012 孫花 女 95010 孔小濤 男 95005 劉剛 男 95006 孫慶 男 95007 易思玲 女 95008 李娜 女 95021 周二 男 95022 鄭明 男 95001 李勇 男 95011 包小柏 男 95009 夢圓圓 女 95015 王君 男 Time taken: 0.159 seconds, Fetched: 21 row(s) hive (t1)> alter table student replace columns (id int,address string); OK Time taken: 0.25 seconds hive (t1)> select * from student; OK student.id student.address 95002 劉晨 95017 王風娟 95018 王一 95013 馮偉 95014 王小麗 95019 邢小麗 95020 趙錢 95003 王敏 95004 張立 95012 孫花 95010 孔小濤 95005 劉剛 95006 孫慶 95007 易思玲 95008 李娜 95021 周二 95022 鄭明 95001 李勇 95011 包小柏 95009 夢圓圓 95015 王君 Time taken: 0.162 seconds, Fetched: 21 row(s)
刪除表:
drop table xxx
清空表:
truncate table xxx;
8.分區
8.1 查看分區
查看分區資訊
hive (t1)> show partitions student_ptn; OK partition city=beijing city=shenzhen Time taken: 0.103 seconds, Fetched: 2 row(s)
8.2 添加分區
添加靜態分區
hive (t1)> alter table student_ptn add partition(city="sc") partition(city="bj"); OK Time taken: 0.722 seconds hive (t1)> show partitions student_ptn; OK partition city=beijing city=bj city=cq city=sc city=shenzhen Time taken: 0.073 seconds, Fetched: 5 row(s)
添加動態分區
Hive導入數據成功,但是查詢結果為NULL,且未報錯。
這是因為創建表格時沒有對導入的數據格式沒有處理,比如每行數據以tab鍵隔開,以換行鍵結尾,就要以如下語句創建表格。
比如:當時創建的分區表語句:
create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) location "/hive/student_ptn";
應該改為:
create external table student_ptn(id int, name string, sex string, age int,department string) partitioned by (city string) row format delimited fields terminated by "," location "/hive/student_ptn";
先向student_ptn表中插入數據:
load data local inpath "/home/light/mysql/student.txt" into table student_ptn partition(city="cq");
其中where city='cq'
正是分區欄位。如果不加where篩選,直接select *
那麼會返回所有分區結果資訊。
hive (t1)> select * from student_ptn where city='cq'; OK student_ptn.id student_ptn.name student_ptn.sex student_ptn.age student_ptn.department student_ptn.city 95002 劉晨 女 19 IS cq 95017 王風娟 女 18 IS cq 95018 王一 女 19 IS cq 95013 馮偉 男 21 CS cq 95014 王小麗 女 19 CS cq 95019 邢小麗 女 19 IS cq 95020 趙錢 男 21 IS cq 95003 王敏 女 22 MA cq 95004 張立 男 19 IS cq 95012 孫花 女 20 CS cq 95010 孔小濤 男 19 CS cq 95005 劉剛 男 18 MA cq 95006 孫慶 男 23 CS cq 95007 易思玲 女 19 MA cq 95008 李娜 女 18 CS cq 95021 周二 男 17 MA cq 95022 鄭明 男 20 MA cq 95001 李勇 男 20 CS cq 95011 包小柏 男 18 MA cq 95009 夢圓圓 女 18 MA cq 95015 王君 男 18 MA cq Time taken: 0.21 seconds, Fetched: 21 row(s)
現在我把這張表的內容直接插入到另一張表student_ptn_age中,並實現age為動態分區(不指定到底是哪中性別,讓系統自己分配決定)。
create table student_ptn_age(id int,name string,sex string,department string) partitioned by (age int);
從student_ptn表中查詢數據並插入student_ptn_age表中
hive (t1)> insert overwrite table student_ptn_age partition(age) select id,name,sex,department,age from student_ptn; FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
此時報錯了,需要在當前命令窗口處輸入:
hive (t1)> set hive.exec.dynamic.partition=true; hive (t1)> set hive.exec.dynamic.partition.mode=nostrick;
然後再運行上述插入命令即可。

8.3 修改分區
修改分區一般來說就是修改分區的數據存儲目錄。
在添加分區的時候,直接指定當前分區的數據存儲目錄
alter table student_ptn add if not exists partition(city='beijing') location '/student_ptn_beijing' partition(city='cc') location '/student_cc';
修改已經指定好的分區的數據存儲目錄:
alter table student_ptn partition (city='beijing') set location '/student_ptn_beijing';
此時原先的分區文件夾仍存在,但是在往分區添加數據時,只會添加到新的分區目錄
load data local inpath "/home/light/mysql/student.txt" into table student_ptn partition(city="beijing");


8.4 刪除分區
hive (t1)> alter table student_ptn drop partition (city='beijing'); Dropped the partition city=beijing OK Time taken: 0.459 seconds hive (t1)> show partitions student_ptn; OK partition city=cc city=cq Time taken: 0.075 seconds, Fetched: 2 row(s)
最後,學習參考了:https://www.cnblogs.com/qingyunzong/p/8723271.html