hive · 2022-11-30 0

hive 安装 (derby和mysql)

一、准备

hive 与 hadoop 对应版本:

https://hive.apache.org/general/downloads/

1.下载安装 jdk-8u341-linux-x64.tar.gz
2.下载安装 hadoop-2.10.2.tar.gz
3.下载 apache-hive-2.3.9-bin.tar.gz

二、安装 hive

1.解压

把 apache-hive-2.3.9-bin.tar.gz 解压到 /opt/ 目录下

tar -xf apache-hive-2.3.9-bin.tar.gz -C /opt/

2.修改配置文件的 HADOOP_HOME

把配置文件 hive-env.sh.template,重命名为 hive-env.sh,并配置 HADOOP_HOME

# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
HADOOP_HOME=/opt/hadoop-2.10.2

3.初始化数据库 derby

删除 derby 数据目录:

root@node0001:/opt/apache-hive-2.3.9-bin# rm -rf metastore_db/

初始化 derby 数据库:

root@node0001:/opt/apache-hive-2.3.9-bin# bin/schematool -initSchema -dbType derby
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:derby:;databaseName=metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.EmbeddedDriver
Metastore connection User:   APP
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.derby.sql
Initialization script completed
schemaTool completed

三、使用

1.启动 hive

root@node0001:/opt/apache-hive-2.3.9-bin# bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.9-bin/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

2.使用 hive

hive> show databases;
OK
default
Time taken: 3.052 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
Time taken: 0.042 seconds
hive> create table student(id int, name string);
OK
Time taken: 0.407 seconds
hive> show tables;
OK
student
Time taken: 0.023 seconds, Fetched: 1 row(s)
hive> desc student;
OK
id                      int
name                    string
Time taken: 0.102 seconds, Fetched: 2 row(s)
hive> insert into student values(1,"zhangsan");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013173034_61e5db72-71c1-4ee1-b817-4a392fffe16d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 17:30:39,160 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1229433203_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/student/.hive-staging_hive_2023-10-13_17-30-34_835_889218708810284417-1/-ext-10000
Loading data to table default.student
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 11 HDFS Write: 93 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.763 seconds
hive> insert into student values(2,"lisi");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013173940_edca02cd-4fe6-4b90-beab-720bd084cf4e
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 17:39:42,492 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local2131074738_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/student/.hive-staging_hive_2023-10-13_17-39-40_232_8370877040343431751-1/-ext-10000
Loading data to table default.student
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 18 HDFS Write: 85 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.009 seconds
hive> select * from student;
OK
1   zhangsan
2   lisi
Time taken: 0.151 seconds, Fetched: 2 row(s)
hive> quit;

数据位置:

root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# ls -lh
total 285M
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741825_1001.meta
-rw-r--r-- 1 root root  14M Oct 13 17:19 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741827_1003.meta
-rw-r--r-- 1 root root  14M Oct 13 17:19 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741828_1004.meta
-rw-r--r-- 1 root root   11 Oct 13 17:30 blk_1073741830
-rw-r--r-- 1 root root   11 Oct 13 17:30 blk_1073741830_1006.meta
-rw-r--r-- 1 root root    7 Oct 13 17:39 blk_1073741833
-rw-r--r-- 1 root root   11 Oct 13 17:39 blk_1073741833_1009.meta
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# 
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741830
1zhangsan
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741833
2lisi

四、mysql 存储元数据

1.安装 mysql 8.0
2.下载 mysql-connector-java-8.0.30.jar

1.放入驱动

mysql驱动放入hive

root@node0001:~# cp mysql-connector-java-8.0.30.jar /opt/apache-hive-2.3.9-bin/lib/

2.创建元数据库

mysql 创建 metastore 库,存放 hive 元数据库

mysql> create database metastore;
Query OK, 1 row affected (0.00 sec)
mysql> quit;

3.hive 配置

root@node0001:/opt/apache-hive-2.3.9-bin/conf# touch hive-site.xml

hive-site.xml 如下:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <!-- jdbc连接的URL -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false</value>
    </property>

    <!-- jdbc连接的Driver-->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
         <value>com.mysql.cj.jdbc.Driver</value>
    </property>

    <!-- jdbc连接的username-->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- jdbc连接的password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <!-- Hive默认在HDFS的工作目录 -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
</configuration>

4.初始化元数据

使用 bin/schematool -initSchema -dbType mysql,初始化元数据

root@node0001:/opt/apache-hive-2.3.9-bin# bin/schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.10.2/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:    jdbc:mysql://127.0.0.1:3306/metastore?useSSL=false
Metastore Connection Driver :    com.mysql.cj.jdbc.Driver
Metastore connection User:   root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

5.使用

root@node0001:/opt/apache-hive-2.3.9-bin# bin/hive
hive> show databases;
OK
default
Time taken: 5.964 seconds, Fetched: 1 row(s)
hive> show tables;
OK
Time taken: 0.085 seconds
hive> create table stu(id int, name string);
OK
Time taken: 1.074 seconds
hive> insert into stu values(3,"wangwu");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013183233_4c43800c-a75d-48c5-8e86-cb81bc20c7d4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 18:32:37,254 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local2081344082_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/stu/.hive-staging_hive_2023-10-13_18-32-33_348_4173909191419990915-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 9 HDFS Write: 85 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 4.569 seconds
hive> insert into stu values(4,"zhaoliu");
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20231013183256_1489e679-c84b-434a-9f70-fb7e9e46377d
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2023-10-13 18:32:57,753 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local706844254_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://node0001:9000/user/hive/warehouse/stu/.hive-staging_hive_2023-10-13_18-32-56_124_8498680797194661419-1/-ext-10000
Loading data to table default.stu
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 86 HDFS Write: 172 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.194 seconds
hive> select * from stu;
OK
3   wangwu
4   zhaoliu
Time taken: 0.234 seconds, Fetched: 2 row(s)
hive> quit;

查看数据:

root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# ls -lh
total 285M
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741825_1001.meta
-rw-r--r-- 1 root root  14M Oct 13 17:19 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 128M Oct 13 17:19 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 13 17:19 blk_1073741827_1003.meta
-rw-r--r-- 1 root root  14M Oct 13 17:19 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 13 17:19 blk_1073741828_1004.meta
-rw-r--r-- 1 root root   11 Oct 13 17:30 blk_1073741830
-rw-r--r-- 1 root root   11 Oct 13 17:30 blk_1073741830_1006.meta
-rw-r--r-- 1 root root    7 Oct 13 17:39 blk_1073741833
-rw-r--r-- 1 root root   11 Oct 13 17:39 blk_1073741833_1009.meta
-rw-r--r-- 1 root root    9 Oct 13 18:32 blk_1073741836
-rw-r--r-- 1 root root   11 Oct 13 18:32 blk_1073741836_1012.meta
-rw-r--r-- 1 root root   10 Oct 13 18:32 blk_1073741839
-rw-r--r-- 1 root root   11 Oct 13 18:32 blk_1073741839_1015.meta
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741836
3wangwu
root@node0001:/var/hadoop/pseudo/dfs/data/current/BP-795811933-172.17.0.2-1697188224715/current/finalized/subdir0/subdir0# cat blk_1073741839
4zhaoliu

6.查看元数据

  1. 查看元数据库中存储的库信息 select * from DBS
  2. 查看元数据库中存储的表信息 select * from TBLS
  3. 查看元数据库中存储的表中列相关信息 select * from COLUMNS_V2
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)
mysql> use metastore;
mysql> select * from DBS;
+-------+-----------------------+------------------------------------------+---------+------------+------------+
| DB_ID | DESC                  | DB_LOCATION_URI                          | NAME    | OWNER_NAME | OWNER_TYPE |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
|     1 | Default Hive database | hdfs://node0001:9000/user/hive/warehouse | default | public     | ROLE       |
+-------+-----------------------+------------------------------------------+---------+------------+------------+
1 row in set (0.00 sec)

mysql> select * from TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT | IS_REWRITE_ENABLED                     |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
|      1 |  1697193100 |     1 |                0 | root  |         0 |     1 | stu      | MANAGED_TABLE | NULL               | NULL               | 0x00                                   |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+----------------------------------------+
1 row in set (0.00 sec)

mysql> select * from COLUMNS_V2;
+-------+---------+-------------+-----------+-------------+
| CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX |
+-------+---------+-------------+-----------+-------------+
|     1 | NULL    | id          | int       |           0 |
|     1 | NULL    | name        | string    |           1 |
+-------+---------+-------------+-----------+-------------+
2 rows in set (0.00 sec)