Configuring and Deploying the Hadoop Distributed File System

Compiled, organized, and written by snow chuai, 2020/2/28


1. Topology
------------+---------------------------+---------------------------+------------
            |                           |                           |
        eth0|192.168.10.11          eth0|192.168.10.12          eth0|192.168.10.13
+-----------+-----------+   +-----------+-----------+   +-----------+-----------+
|  [ srv1.1000cc.net ]  |   |  [ srv2.1000cc.net ]  |   |  [ srv3.1000cc.net ]  |
|                       |   |                       |   |                       |
|       Master Node     |   |      Slave Node       |   |      Slave Node       |
|                       |   |                       |   |                       |
+-----------------------+   +-----------------------+   +-----------------------+
2. Install the JDK
1) Download JDK 8 on your own
2) Install JDK 8 on all nodes
[root@srv1 ~]# rpm -Uvh jdk-8u221-linux-x64.rpm
[root@srv2 ~]# rpm -Uvh jdk-8u221-linux-x64.rpm
[root@srv3 ~]# rpm -Uvh jdk-8u221-linux-x64.rpm
3) Set up the JDK environment
# Configuration on srv1
[root@srv1 ~]# vim /etc/profile.d/jdk8.sh
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
[root@srv1 ~]# source /etc/profile.d/jdk8.sh
# Configuration on srv2
[root@srv2 ~]# vim /etc/profile.d/jdk8.sh
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
[root@srv2 ~]# source /etc/profile.d/jdk8.sh
# Configuration on srv3
[root@srv3 ~]# vim /etc/profile.d/jdk8.sh
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
[root@srv3 ~]# source /etc/profile.d/jdk8.sh
4) Select the JDK version
# Configuration on srv1
[root@srv1 ~]# alternatives --config java
There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/java/jdk1.8.0_221-amd64/jre/bin/java

Enter to keep the current selection[+], or type selection number:    # press Enter
# Configuration on srv2
[root@srv2 ~]# alternatives --config java
There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/java/jdk1.8.0_221-amd64/jre/bin/java

Enter to keep the current selection[+], or type selection number:    # press Enter
# Configuration on srv3
[root@srv3 ~]# alternatives --config java
There is 1 program that provides 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/java/jdk1.8.0_221-amd64/jre/bin/java

Enter to keep the current selection[+], or type selection number:    # press Enter
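
(Optional, not part of the original steps) A quick sanity check that each node now resolves the intended JDK; run on every node and expect version 1.8.0_221 and /usr/java/default:
[root@srv1 ~]# java -version
[root@srv1 ~]# echo $JAVA_HOME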
3. Create the Account and Set Up Passwordless SSH Login
1) Create the hadoop account on all nodes
# Configuration on srv1
[root@srv1 ~]# useradd -d /usr/hadoop hadoop
[root@srv1 ~]# chmod 755 /usr/hadoop/
[root@srv1 ~]# passwd hadoop
Changing password for user hadoop.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
# Configuration on srv2
[root@srv2 ~]# useradd -d /usr/hadoop hadoop
[root@srv2 ~]# chmod 755 /usr/hadoop/
[root@srv2 ~]# passwd hadoop
Changing password for user hadoop.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
# Configuration on srv3
[root@srv3 ~]# useradd -d /usr/hadoop hadoop
[root@srv3 ~]# chmod 755 /usr/hadoop/
[root@srv3 ~]# passwd hadoop
Changing password for user hadoop.
New password: 
Retype new password: 
passwd: all authentication tokens updated successfully.
2) Generate an SSH key on the master node and distribute it to the other nodes
[root@srv1 ~]# su - hadoop
[hadoop@srv1 ~]$ ssh-keygen -q -N ''
Enter file in which to save the key (/usr/hadoop/.ssh/id_rsa):    # press Enter
[hadoop@srv1 ~]$ ssh-copy-id localhost
[hadoop@srv1 ~]$ ssh-copy-id srv2.1000cc.net
[hadoop@srv1 ~]$ ssh-copy-id srv3.1000cc.net
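
(Optional) Passwordless login can be verified before going further; each command should print the remote hostname without asking for a password:
[hadoop@srv1 ~]$ ssh srv2.1000cc.net hostname
[hadoop@srv1 ~]$ ssh srv3.1000cc.net hostname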
4. Install and Configure Hadoop
1) Download and install Hadoop on all nodes
[hadoop@srv1 ~]$ wget \
https://mirror.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
[hadoop@srv2 ~]$ wget \
https://mirror.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
[hadoop@srv3 ~]$ wget \
https://mirror.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# Strip the top-level directory while extracting
[hadoop@srv1 ~]$ tar zxvf hadoop-3.2.1.tar.gz -C /usr/hadoop --strip-components 1
[hadoop@srv2 ~]$ tar zxvf hadoop-3.2.1.tar.gz -C /usr/hadoop --strip-components 1
[hadoop@srv3 ~]$ tar zxvf hadoop-3.2.1.tar.gz -C /usr/hadoop --strip-components 1
2) Set the Hadoop environment variables
# Configuration on srv1
[hadoop@srv1 ~]$ vim ~/.bash_profile
......
......
# Append the following at the end of the file
export HADOOP_HOME=/usr/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CLASSPATH=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@srv1 ~]$ source ~/.bash_profile
# Configuration on srv2
[hadoop@srv2 ~]$ vim ~/.bash_profile
......
......
# Append the following at the end of the file
export HADOOP_HOME=/usr/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CLASSPATH=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@srv2 ~]$ source ~/.bash_profile
# Configuration on srv3
[hadoop@srv3 ~]$ vim ~/.bash_profile
......
......
# Append the following at the end of the file
export HADOOP_HOME=/usr/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CLASSPATH=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@srv3 ~]$ source ~/.bash_profile
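
(Optional) With the profile loaded, the hadoop command should already resolve on every node; the first line of output should read "Hadoop 3.2.1":
[hadoop@srv1 ~]$ hadoop version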

3) Configure Hadoop
[hadoop@srv1 ~]$ mkdir ~/datanode
[hadoop@srv1 ~]$ ssh srv2.1000cc.net "mkdir ~/datanode"
[hadoop@srv1 ~]$ ssh srv3.1000cc.net "mkdir ~/datanode"
(1) Configure hdfs-site
[hadoop@srv1 ~]$ vim ~/etc/hadoop/hdfs-site.xml
# Add the following inside the <configuration> section
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/hadoop/datanode</value>
    </property>
</configuration>
[hadoop@srv1 ~]$ scp ~/etc/hadoop/hdfs-site.xml srv2.1000cc.net:~/etc/hadoop/
[hadoop@srv1 ~]$ scp ~/etc/hadoop/hdfs-site.xml srv3.1000cc.net:~/etc/hadoop/
(2) Configure core-site
[hadoop@srv1 ~]$ vim ~/etc/hadoop/core-site.xml
# Add the following inside the <configuration> section
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://srv1.1000cc.net:9000/</value>
    </property>
</configuration>
[hadoop@srv1 ~]$ scp ~/etc/hadoop/core-site.xml srv2.1000cc.net:~/etc/hadoop/
[hadoop@srv1 ~]$ scp ~/etc/hadoop/core-site.xml srv3.1000cc.net:~/etc/hadoop/
(3) Set the JDK path in hadoop-env.sh
[hadoop@srv1 ~]$ sed -i -e 's/\${JAVA_HOME}/\/usr\/java\/default/' ~/etc/hadoop/hadoop-env.sh
[hadoop@srv1 ~]$ scp ~/etc/hadoop/hadoop-env.sh srv2.1000cc.net:~/etc/hadoop/
[hadoop@srv1 ~]$ scp ~/etc/hadoop/hadoop-env.sh srv3.1000cc.net:~/etc/hadoop/
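
(Optional) It is worth confirming that the sed edit actually rewrote the JAVA_HOME line before copying the file to the other nodes:
[hadoop@srv1 ~]$ grep -n 'JAVA_HOME' ~/etc/hadoop/hadoop-env.sh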
(4) Add the NameNode settings
[hadoop@srv1 ~]$ mkdir ~/namenode
[hadoop@srv1 ~]$ vim ~/etc/hadoop/hdfs-site.xml
# Extend the <configuration> section so it reads as follows
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/hadoop/datanode</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/hadoop/namenode</value>
    </property>
</configuration>
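
(Optional) With dfs.replication set to 2, each HDFS block will be stored on two of the three DataNodes. Settings can be read back without starting any service:
[hadoop@srv1 ~]$ hdfs getconf -confKey dfs.replication             # expect 2
[hadoop@srv1 ~]$ hdfs getconf -confKey dfs.namenode.name.dir       # expect file:///usr/hadoop/namenode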
(5) Configure mapred-site
# Get the classpath value
[hadoop@srv1 ~]$ yarn classpath
/usr/hadoop/etc/hadoop:/usr/hadoop/share/hadoop/common/lib/*:/usr/hadoop/share/hadoop/common/*:/usr/hadoop/share/hadoop/hdfs:
/usr/hadoop/share/hadoop/hdfs/lib/*:/usr/hadoop/share/hadoop/hdfs/*:/usr/hadoop/share/hadoop/mapreduce/lib/*:
/usr/hadoop/share/hadoop/mapreduce/*:/usr/hadoop/share/hadoop/yarn:/usr/hadoop/share/hadoop/yarn/lib/*:
/usr/hadoop/share/hadoop/yarn/*
[hadoop@srv1 ~]$ vim ~/etc/hadoop/mapred-site.xml
# Add the following inside the <configuration> section
# (note: the classpath list belongs in mapreduce.application.classpath;
#  yarn.app.mapreduce.am.env expects environment variables, not a classpath)
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/hadoop/etc/hadoop,
            /usr/hadoop/share/hadoop/common/lib/*,
            /usr/hadoop/share/hadoop/common/*,
            /usr/hadoop/share/hadoop/hdfs,
            /usr/hadoop/share/hadoop/hdfs/lib/*,
            /usr/hadoop/share/hadoop/hdfs/*,
            /usr/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/hadoop/share/hadoop/mapreduce/*,
            /usr/hadoop/share/hadoop/yarn,
            /usr/hadoop/share/hadoop/yarn/lib/*,
            /usr/hadoop/share/hadoop/yarn/*
        </value>
    </property>
</configuration>
(6) Configure yarn-site
[hadoop@srv1 ~]$ vim ~/etc/hadoop/yarn-site.xml
# Remove the comment lines in the <configuration> section and add the following
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>srv1.1000cc.net</value>
    </property>
    <property>
        <name>yarn.nodemanager.hostname</name>
        <value>srv1.1000cc.net</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
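
(Optional) A stray tag in any of the *-site.xml files is a common cause of startup failures; assuming xmllint (from libxml2) is installed, well-formedness can be checked in one pass (silence means every file parsed cleanly):
[hadoop@srv1 ~]$ xmllint --noout ~/etc/hadoop/core-site.xml ~/etc/hadoop/hdfs-site.xml ~/etc/hadoop/mapred-site.xml ~/etc/hadoop/yarn-site.xml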
(7) Add the worker nodes
# In Hadoop 3.x this file is named workers (it was called slaves before 3.0)
[hadoop@srv1 ~]$ vim ~/etc/hadoop/workers
srv1.1000cc.net
srv2.1000cc.net
srv3.1000cc.net
5. Format the NameNode and Start the Hadoop Services
1) Format the NameNode
[hadoop@srv1 ~]$ hdfs namenode -format
WARNING: /usr/hadoop/logs does not exist. Creating.
2020-02-28 06:34:30,174 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = srv1.1000cc.net/192.168.10.11
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.2.1
......
......
2020-02-28 06:34:32,132 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at srv1.1000cc.net/192.168.10.11
************************************************************/
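
(Optional) A successful format populates the NameNode metadata directory created in step (4); a current/ subdirectory containing fsimage and VERSION files should now exist:
[hadoop@srv1 ~]$ ls ~/namenode/current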
2) Start the Hadoop services
[hadoop@srv1 ~]$ start-dfs.sh
Starting namenodes on [srv1.1000cc.net]
Starting datanodes
Starting secondary namenodes [srv1.1000cc.net]
[hadoop@srv1 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
3) Check the status of the Hadoop services
[hadoop@srv1 ~]$ jps
5171 DataNode
5365 SecondaryNameNode
6204 Jps
5726 ResourceManager
5055 NameNode
5839 NodeManager
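
(Optional) jps only lists daemons on the local node; the slaves should each be running a DataNode, and HDFS can also report the live nodes itself (jps is called by absolute path here because a non-interactive ssh shell may not have it in PATH):
[hadoop@srv1 ~]$ ssh srv2.1000cc.net /usr/java/default/bin/jps
[hadoop@srv1 ~]$ ssh srv3.1000cc.net /usr/java/default/bin/jps
[hadoop@srv1 ~]$ hdfs dfsadmin -report    # "Live datanodes" should show 3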
6. Run a Test Program
1) Run the test
[hadoop@srv1 ~]$ hdfs dfs -mkdir /test
# Copy a local file into the /test directory
[hadoop@srv1 ~]$ hdfs dfs -copyFromLocal ~/NOTICE.txt /test
# Display the file's contents
[hadoop@srv1 ~]$ hdfs dfs -cat /test/NOTICE.txt
This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).
......
......
Expert Group and released to the public domain, as explained at
http://creativecommons.org/publicdomain/zero/1.0/
# Run a test program
[hadoop@srv1 ~]$ hadoop jar ~/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar \
wordcount /test/NOTICE.txt /output01
2020-02-28 06:44:03,301 INFO client.RMProxy: Connecting to ResourceManager at srv1.1000cc.net/192.168.10.11:8032
......
......
2) Display the results
# Show the job output files
[hadoop@srv1 ~]$ hdfs dfs -ls /output01
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2020-02-28 07:35 /output01/_SUCCESS
-rw-r--r--   2 hadoop supergroup      12464 2020-02-28 07:35 /output01/part-r-00000
# Display the file's contents (the word counts)
[hadoop@srv1 ~]$ hdfs dfs -cat /output01/part-r-00000
......
......
written 7
you     2
zlib    1
©       6
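
(Optional) Since part-r-00000 is plain "word<TAB>count" text, ordinary shell tools apply to it; for example, the five most frequent words:
[hadoop@srv1 ~]$ hdfs dfs -cat /output01/part-r-00000 | sort -t$'\t' -k2 -rn | head -5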
7. View Hadoop Cluster Information via the Web UI
1) View the Hadoop cluster summary (NameNode web UI)
[Browser]===>http://srv1.1000cc.net:9870

2) View the cluster and application information (ResourceManager web UI)
[Browser]===>http://srv1.1000cc.net:8088
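
(Optional) Both UIs also expose their data as JSON on the same ports, which is convenient for scripting; for example, the NameNode's JMX endpoint:
[hadoop@srv1 ~]$ curl -s 'http://srv1.1000cc.net:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'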

 

If this helped you, feel free to leave a tip. ^-^
