hadoop · 2019-12-06

Hadoop Fully Distributed Installation

I. Preparation

Prepare four machines: node0001, node0002, node0003, and node0004.

1. Download jdk-8u341-linux-x64.tar.gz

2. Download hadoop-2.6.5.tar.gz

3. Set the hostnames

Edit the /etc/hostname file on each of the four machines, setting them to node0001, node0002, node0003, and node0004 respectively.

4. Configure /etc/hosts

For convenience, add the hostname-to-IP mappings to /etc/hosts on every machine:

172.17.0.2      node0001
172.17.0.3      node0002
172.17.0.4      node0003
172.17.0.5      node0004
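Steps 3 and 4 can be scripted. A minimal sketch (IPs taken from the mapping above): the function prints the four entries, and on each node its output would be appended to /etc/hosts.

```shell
# Sketch: generate the /etc/hosts block once; on each node, append it
# with `hosts_block >> /etc/hosts` (run as root).
hosts_block() {
  printf '172.17.0.2\tnode0001\n'
  printf '172.17.0.3\tnode0002\n'
  printf '172.17.0.4\tnode0003\n'
  printf '172.17.0.5\tnode0004\n'
}
hosts_block
```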

II. Installing Hadoop

1. Extract the archives

Extract jdk-8u341-linux-x64.tar.gz and hadoop-2.6.5.tar.gz into the /opt/ directory:

tar -xf jdk-8u341-linux-x64.tar.gz -C /opt/
tar -xf hadoop-2.6.5.tar.gz -C /opt/
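This walkthrough invokes everything via explicit ./bin and ./sbin paths. Optionally, the environment variables can be exported (e.g. appended to /etc/profile) so hdfs and start-dfs.sh work from any directory. A sketch, demonstrated here against a temporary file rather than /etc/profile:

```shell
# Optional: export JAVA_HOME/HADOOP_HOME and extend PATH. Written to a
# temp file here for demonstration; on a real node append these lines
# to /etc/profile and log in again.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/opt/jdk1.8.0_341
export HADOOP_HOME=/opt/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
. "$profile"
echo "$PATH"
```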

2. Set JAVA_HOME in the environment scripts

In the /opt/hadoop-2.6.5/etc/hadoop/ directory, edit the hadoop-env.sh, mapred-env.sh, and yarn-env.sh files and point their JAVA_HOME at the installed JDK:

export JAVA_HOME=/opt/jdk1.8.0_341/
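The three edits can be done with a single sed loop. The snippet below demonstrates the substitution on throwaway copies of the files (the stock default line is `export JAVA_HOME=${JAVA_HOME}`); on the real node, point `conf` at /opt/hadoop-2.6.5/etc/hadoop instead.

```shell
# Demonstrate the JAVA_HOME substitution on throwaway copies of the
# three env scripts; on a real node set conf=/opt/hadoop-2.6.5/etc/hadoop.
conf=$(mktemp -d)
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=${JAVA_HOME}' > "$conf/$f"   # stock default line
  sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/jdk1.8.0_341/|' "$conf/$f"
done
cat "$conf/hadoop-env.sh"
```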

3. Configure the NameNode address and the data directory

Edit /opt/hadoop-2.6.5/etc/hadoop/core-site.xml. fs.defaultFS is the NameNode's RPC address, and hadoop.tmp.dir is the base directory from which the NameNode and DataNode storage paths seen later (/var/hadoop/full/dfs/...) are derived:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://node0001:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/var/hadoop/full</value>
        </property>
</configuration>

4. Configure the worker nodes

Edit the /opt/hadoop-2.6.5/etc/hadoop/slaves file, listing the DataNode hosts:

node0002
node0003
node0004

5. Configure the replication factor and the secondary NameNode

Edit /opt/hadoop-2.6.5/etc/hadoop/hdfs-site.xml:

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>node0002:50090</value>
        </property>
</configuration>

III. Configuring SSH

On node0001, install openssh-client and generate a key pair:

ssh-keygen -t rsa

Start the SSH server (sshd) on node0001, node0002, node0003, and node0004.

Append node0001's public key to the .ssh/authorized_keys file on node0001, node0002, node0003, and node0004, so that node0001 can log in to all four machines (including itself) without a password.
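The key distribution can be sketched as below. The snippet only prints the commands to run on node0001; ssh-copy-id is an assumption (it ships with the OpenSSH client package) — where it is unavailable, piping the public key through ssh into authorized_keys achieves the same.

```shell
# Print the key-distribution commands to run on node0001 (a sketch).
print_copy_cmds() {
  for host in node0001 node0002 node0003 node0004; do
    echo "ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host"
  done
}
print_copy_cmds   # pipe to sh to actually run them (prompts for passwords)
```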

IV. Usage

1. Format the NameNode

On node0001, format the NameNode:

hdfs namenode -format

The format succeeds:

root@node0001:/opt/hadoop-2.6.5/bin# ./hdfs namenode -format
23/10/11 07:26:07 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = node0001/172.17.0.2
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.5
23/10/11 07:26:07 INFO util.GSet: Computing capacity for map NameNodeRetryCache
23/10/11 07:26:07 INFO util.GSet: VM type       = 64-bit
23/10/11 07:26:07 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
23/10/11 07:26:07 INFO util.GSet: capacity      = 2^15 = 32768 entries
23/10/11 07:26:07 INFO namenode.NNConf: ACLs enabled? false
23/10/11 07:26:07 INFO namenode.NNConf: XAttrs enabled? true
23/10/11 07:26:07 INFO namenode.NNConf: Maximum size of an xattr: 16384
23/10/11 07:26:07 INFO namenode.FSImage: Allocated new BlockPoolId: BP-950352295-172.17.0.2-1697009167802
23/10/11 07:26:07 INFO common.Storage: Storage directory /var/hadoop/full/dfs/name has been successfully formatted.
23/10/11 07:26:07 INFO namenode.FSImageFormatProtobuf: Saving image file /var/hadoop/full/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
23/10/11 07:26:07 INFO namenode.FSImageFormatProtobuf: Image file /var/hadoop/full/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 321 bytes saved in 0 seconds.
23/10/11 07:26:07 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
23/10/11 07:26:07 INFO util.ExitUtil: Exiting with status 0
23/10/11 07:26:07 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node0001/172.17.0.2
************************************************************/

2. Start HDFS

Start HDFS from node0001 with start-dfs.sh. (In the log below, the DataNodes and the secondary NameNode were already running from an earlier start, so only the NameNode is actually launched; on a clean start all daemons across node0001 through node0004 come up here.)

start-dfs.sh
root@node0001:/opt/hadoop-2.6.5/sbin# ./start-dfs.sh 
Starting namenodes on [node0001]
node0001: starting namenode, logging to /opt/hadoop-2.6.5/logs/hadoop-root-namenode-node0001.out
node0002: datanode running as process 4561. Stop it first.
node0003: datanode running as process 4508. Stop it first.
node0004: datanode running as process 4535. Stop it first.
Starting secondary namenodes [node0002]
node0002: secondarynamenode running as process 4646. Stop it first.

Check the Java processes with jps:

node0001

root@node0001:/opt/jdk1.8.0_341/bin# ./jps 
5463 Jps
5224 NameNode

node0002

root@node0002:/opt/jdk1.8.0_341/bin# ./jps
4561 DataNode
4803 Jps
4646 SecondaryNameNode

node0003

root@node0003:/opt/jdk1.8.0_341/bin# ./jps 
4629 Jps
4508 DataNode

node0004

root@node0004:/opt/jdk1.8.0_341/bin# ./jps 
4656 Jps
4535 DataNode

Open node0001:50070 in a browser to view the HDFS web UI.

3. Create a directory in HDFS

Create the directory /user/root:

hdfs dfs -mkdir -p /user/root
root@node0001:/opt/hadoop-2.6.5/bin# ./hdfs dfs -mkdir -p /user/root

4. Upload files

Upload the files to /user/root:

hdfs dfs -put /root/test.tar.gz /user/root
hdfs dfs -put /root/test1.tar.gz /user/root

Each source file is 141.3 MB.

root@node0001:/opt/hadoop-2.6.5/bin# ./hdfs dfs -put /root/test.tar.gz /user/root
root@node0001:/opt/hadoop-2.6.5/bin# ./hdfs dfs -put /root/test1.tar.gz /user/root

5. Where the data lives

Each 141.3 MB source file is split into one full 128 MB block plus a ~13.3 MB tail block, and with dfs.replication=2 every block is stored on two of the three DataNodes.

The NameNode metadata on node0001 is in:

root@node0001:/var/hadoop/full/dfs/name/current# ls -lh
total 2.1M
-rw-r--r-- 1 root root  201 Oct 11 15:26 VERSION
-rw-r--r-- 1 root root   42 Oct 11 15:33 edits_0000000000000000001-0000000000000000002
-rw-r--r-- 1 root root 1.0M Oct 11 15:33 edits_0000000000000000003-0000000000000000003
-rw-r--r-- 1 root root   42 Oct 11 15:41 edits_0000000000000000004-0000000000000000005
-rw-r--r-- 1 root root 1.0M Oct 11 15:50 edits_inprogress_0000000000000000006
-rw-r--r-- 1 root root  321 Oct 11 15:33 fsimage_0000000000000000002
-rw-r--r-- 1 root root   62 Oct 11 15:33 fsimage_0000000000000000002.md5
-rw-r--r-- 1 root root  321 Oct 11 15:41 fsimage_0000000000000000005
-rw-r--r-- 1 root root   62 Oct 11 15:41 fsimage_0000000000000000005.md5
-rw-r--r-- 1 root root    2 Oct 11 15:41 seen_txid

The DataNode block data on node0002 is in:

root@node0002:/var/hadoop/full/dfs/data/current/BP-950352295-172.17.0.2-1697009167802/current/finalized/subdir0/subdir0# ls -lh
total 259M
-rw-r--r-- 1 root root 128M Oct 11 15:42 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 11 15:42 blk_1073741825_1001.meta
-rw-r--r-- 1 root root 128M Oct 11 15:50 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 11 15:50 blk_1073741827_1003.meta

The DataNode block data on node0003 is in:

root@node0003:/var/hadoop/full/dfs/data/current/BP-950352295-172.17.0.2-1697009167802/current/finalized/subdir0/subdir0# ls -lh
total 27M
-rw-r--r-- 1 root root  14M Oct 11 15:42 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 11 15:42 blk_1073741826_1002.meta
-rw-r--r-- 1 root root  14M Oct 11 15:50 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 11 15:50 blk_1073741828_1004.meta

The DataNode block data on node0004 is in:

root@node0004:/var/hadoop/full/dfs/data/current/BP-950352295-172.17.0.2-1697009167802/current/finalized/subdir0/subdir0# ls -lh
total 285M
-rw-r--r-- 1 root root 128M Oct 11 15:42 blk_1073741825
-rw-r--r-- 1 root root 1.1M Oct 11 15:42 blk_1073741825_1001.meta
-rw-r--r-- 1 root root  14M Oct 11 15:42 blk_1073741826
-rw-r--r-- 1 root root 107K Oct 11 15:42 blk_1073741826_1002.meta
-rw-r--r-- 1 root root 128M Oct 11 15:50 blk_1073741827
-rw-r--r-- 1 root root 1.1M Oct 11 15:50 blk_1073741827_1003.meta
-rw-r--r-- 1 root root  14M Oct 11 15:50 blk_1073741828
-rw-r--r-- 1 root root 107K Oct 11 15:50 blk_1073741828_1004.meta
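The listings follow from HDFS's default 128 MB block size: a 141.3 MB file yields one full block plus a ~13.3 MB tail, which ls -lh rounds up to 14M. A quick arithmetic check:

```shell
# Arithmetic check of the block split: 141.3 MB file, 128 MB block size.
out=$(awk 'BEGIN {
  size = 141.3; bs = 128          # file size and block size, in MB
  full = int(size / bs)           # number of full blocks
  tail = size - full * bs         # size of the remaining tail block
  printf "%d full %d MB block(s) + one %.1f MB tail block", full, bs, tail
}')
echo "$out"   # 1 full 128 MB block(s) + one 13.3 MB tail block
```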

6. Shut down

Stop Hadoop from node0001 with stop-dfs.sh:

stop-dfs.sh

Shutdown succeeds:

root@node0001:/opt/hadoop-2.6.5/sbin# ./stop-dfs.sh 
Stopping namenodes on [node0001]
node0001: stopping namenode
node0002: stopping datanode
node0004: stopping datanode
node0003: stopping datanode
Stopping secondary namenodes [node0002]
node0002: stopping secondarynamenode