I. Preparation
Hive/Hadoop version compatibility matrix:
https://hive.apache.org/general/downloads/
1. Download and install jdk-8u341-linux-x64.tar.gz
2. Download and install hadoop-2.10.2.tar.gz
3. Download apache-hive-2.3.9-bin.tar.gz
II. Installing Hive
1. Extract the archive
Extract apache-hive-2.3.9-bin.tar.gz into /opt/:
tar -xf apache-hive-2.3.9-bin.tar.gz -C /opt/
2. Set HADOOP_HOME in the configuration file
Rename conf/hive-env.sh.template to hive-env.sh and set HADOOP_HOME:
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/opt/hadoop-2.10.2
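The rename-and-configure step above can be sketched as shell commands. This is a minimal sketch that simulates the conf directory with a temporary directory so it can be run safely anywhere; on a real installation, substitute /opt/apache-hive-2.3.9-bin/conf for $HIVE_CONF.

```shell
# Simulate Hive's conf directory with a temp dir (stand-in for
# /opt/apache-hive-2.3.9-bin/conf on a real installation).
HIVE_CONF=$(mktemp -d)
printf '# hive-env.sh template contents\n' > "$HIVE_CONF/hive-env.sh.template"

# Copy the template to hive-env.sh and append HADOOP_HOME.
cp "$HIVE_CONF/hive-env.sh.template" "$HIVE_CONF/hive-env.sh"
echo 'HADOOP_HOME=/opt/hadoop-2.10.2' >> "$HIVE_CONF/hive-env.sh"

# Confirm the setting landed.
grep '^HADOOP_HOME=' "$HIVE_CONF/hive-env.sh"
```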
3. Initialize the Derby database
Remove any existing Derby data directory:
root@node0001:/opt/apache-hive-2.3.9-bin# rm -rf metastore_db/
Initialize the Derby database:
root@node0001:/opt/apache-hive-2.3.9-bin# bin/schematool -initSchema -dbType derby
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver : org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User: APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed
III. Usage
1. Start Hive
root@node0001:/opt/apache-hive-2.3.9-bin# bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.9-bin/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
2. Use Hive
hive> show databases;
OK
default
Time taken: 3.052 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
Time taken: 0.042 seconds
hive> create table student(id int, name string);
OK
Time taken: 0.407 seconds
hive> show tables;
OK
student
Time taken: 0.023 seconds, Fetched: 1 row(s)
hive> desc student;
OK
id int
name string
Time taken: 0.102 seconds, Fetched: 2 row(s)
hive> insert into student values(1,"zhangsan");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013173034_61e5db72-71c1-4ee1-b817-4a392fffe16d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 17:30:39,160 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local1229433203_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/student/.hive-staging_hive_2023-10-13_17-30-34_835_889218708810284417-1/-ext-10000
Loading data to table default.student
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 11 HDFS Write: 93 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.763 seconds
hive> insert into student values(2,"lisi");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013173940_edca02cd-4fe6-4b90-beab-720bd084cf4e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 17:39:42,492 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local2131074738_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/student/.hive-staging_hive_2023-10-13_17-39-40_232_8370877040343431751-1/-ext-10000
Loading data to table default.student
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 18 HDFS Write: 85 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.009 seconds
hive> select * from student;
OK
1 zhangsan
2 lisi
Time taken: 0.151 seconds, Fetched: 2 row(s)
hive> quit;
Data location on the DataNode:
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# ls -lh
total 285M
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741825_1001.meta
-rw-r--r-- 1 root root 14M Oct 13 17:19 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741827_1003.meta
-rw-r--r-- 1 root root 14M Oct 13 17:19 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741828_1004.meta
-rw-r--r-- 1 root root 11 Oct 13 17:30 blk_1073741830
-rw-r--r-- 1 root root 11 Oct 13 17:30 blk_1073741830_1006.meta
-rw-r--r-- 1 root root 7 Oct 13 17:39 blk_1073741833
-rw-r--r-- 1 root root 11 Oct 13 17:39 blk_1073741833_1009.meta
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0#
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741830
1zhangsan
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741833
2lisi
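Note that `cat` prints `1zhangsan` and `2lisi` with no visible separator between the columns. Hive's default field delimiter for text-format tables is the non-printing control character `\001` (Ctrl-A), so the byte is present but invisible. A quick standalone demo (the /tmp path is just for illustration):

```shell
# Hive's default TEXTFILE field delimiter is \001 (Ctrl-A), a
# non-printing byte, so `cat` shows the columns run together.
printf '1\001zhangsan\n' > /tmp/blk_demo

cat /tmp/blk_demo      # appears as "1zhangsan"
cat -v /tmp/blk_demo   # prints "1^Azhangsan"; ^A marks the \001 byte
```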
IV. Storing the Metastore in MySQL
Prerequisites:
- Install MySQL 8.0
- Download mysql-connector-java-8.0.30.jar
1. Install the JDBC driver
Copy the MySQL driver into Hive's lib directory:
root@node0001:~# cp mysql-connector-java-8.0.30.jar /opt/apache-hive-2.3.9-bin/lib/
2. Create the metastore database
In MySQL, create a metastore database to hold Hive's metadata:
mysql> create database metastore;
Query OK, 1 row affected (0.00 sec)
mysql> quit;
3. Configure Hive
root@node0001:/opt/apache-hive-2.3.9-bin/conf# touch hive-site.xml
hive-site.xml contents:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JDBC connection URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false</value>
</property>
<!-- JDBC driver class -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<!-- JDBC username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- JDBC password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
<!-- Hive's default working directory on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>
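One caveat worth knowing: depending on the MySQL server's time-zone configuration, Connector/J 8.x can fail with a "server time zone value ... is unrecognized" error. If schematool hits that, a serverTimezone parameter can be appended to the connection URL. This is an optional variant, not required in the run shown here:

```xml
<!-- Optional variant: only needed if schematool fails with a "server
     time zone" error under Connector/J 8.x. Note the XML-escaped & -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false&amp;serverTimezone=UTC</value>
</property>
```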
4. Initialize the metastore schema
Run bin/schematool -initSchema -dbType mysql to initialize the metastore:
root@node0001:/opt/apache-hive-2.3.9-bin# bin/schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
5. Usage
root@node0001:/opt/apache-hive-2.3.9-bin# bin/hive
hive> show databases;
OK
default
Time taken: 5.964 seconds, Fetched: 1 row(s)
hive> show tables;
OK
Time taken: 0.085 seconds
hive> create table stu(id int, name string);
OK
Time taken: 1.074 seconds
hive> insert into stu values(3,"wangwu");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013183233_4c43800c-a75d-48c5-8e86-cb81bc20c7d4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 18:32:37,254 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local2081344082_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/stu/.hive-staging_hive_2023-10-13_18-32-33_348_4173909191419990915-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 9 HDFS Write: 85 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.569 seconds
hive> insert into stu values(4,"zhaoliu");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013183256_1489e679-c84b-434a-9f70-fb7e9e46377d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 18:32:57,753 Stage-1 map = 100%, reduce = 0%
Ended Job = job_local706844254_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/stu/.hive-staging_hive_2023-10-13_18-32-56_124_8498680797194661419-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 86 HDFS Write: 172 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.194 seconds
hive> select * from stu;
OK
3 wangwu
4 zhaoliu
Time taken: 0.234 seconds, Fetched: 2 row(s)
hive> quit;
View the data:
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# ls -lh
total 285M
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741825_1001.meta
-rw-r--r-- 1 root root 14M Oct 13 17:19 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741827_1003.meta
-rw-r--r-- 1 root root 14M Oct 13 17:19 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741828_1004.meta
-rw-r--r-- 1 root root 11 Oct 13 17:30 blk_1073741830
-rw-r--r-- 1 root root 11 Oct 13 17:30 blk_1073741830_1006.meta
-rw-r--r-- 1 root root 7 Oct 13 17:39 blk_1073741833
-rw-r--r-- 1 root root 11 Oct 13 17:39 blk_1073741833_1009.meta
-rw-r--r-- 1 root root 9 Oct 13 18:32 blk_1073741836
-rw-r--r-- 1 root root 11 Oct 13 18:32 blk_1073741836_1012.meta
-rw-r--r-- 1 root root 10 Oct 13 18:32 blk_1073741839
-rw-r--r-- 1 root root 11 Oct 13 18:32 blk_1073741839_1015.meta
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741836
3wangwu
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741839
4zhaoliu
6. Inspect the metadata
- Databases registered in the metastore: select * from DBS;
- Tables: select * from TBLS;
- Table columns: select * from COLUMNS_V2;
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| metastore |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> use metastore;
mysql> select * from DBS;
+-------+-----------------------+------------------------------------------+---------+------------+------------+
| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
| 1 | Default Hive database | hdfs://node0001:9000/user/hive/warehouse | default | public | ROLE |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
1 row in set (0.00 sec)
mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
| 1 | 1697193100 | 1 | 0 | root | 0 | 1 | stu | MANAGED_TABLE | NULL | NULL | 0x00 |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
1 row in set (0.00 sec)
mysql> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
| 1 | NULL | id | int | 0 |
| 1 | NULL | name | string | 1 |
+-------+---------+-------------+-----------+-------------+
2 rows in set (0.00 sec)