Table of Contents
- 1. Passwordless SSH login
- 2. Installing Hadoop
- 3. Spark configuration
The full detailed report is available in the resources section; all of the experiment material has been uploaded, so download it there if needed.
1. Passwordless SSH login
Configuration commands used:
cd ~/.ssh/
ssh-keygen -t rsa
Press Enter at the prompt (enter y to overwrite if a key already exists), then press Enter twice to accept an empty passphrase.
The key-generation output should look like the screenshot above.
cat ./id_rsa.pub >> ./authorized_keys
ssh hadoop01
exit
scp /root/.ssh/id_rsa.pub root@hadoop02:/root/.ssh/id_rsa.pub
Enter hadoop02's password when prompted and the key is copied across.
scp /root/.ssh/id_rsa.pub root@hadoop03:/root/.ssh/id_rsa.pub
Enter hadoop03's password when prompted and the key is copied across.
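Copying id_rsa.pub over is not quite enough on its own: on each worker the public key still has to be appended to authorized_keys before the login becomes password-free. A minimal sketch, run once for hadoop02 and once for hadoop03 (assuming the key landed in /root/.ssh/id_rsa.pub as above):
ssh hadoop02
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys   # append hadoop01's public key
chmod 600 /root/.ssh/authorized_keys                      # sshd requires strict permissions
exit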
If the output matches the screenshot, reboot the machines.
Reboot everything, then run the commands below from the start to verify: ssh hadoop02
ssh hadoop03
If no password is asked for, the setup succeeded; leave the session with: exit
2. Installing Hadoop
java -version
The Java version is displayed as follows:
nano ~/.bashrc
Append the following at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
Save and exit: press Ctrl+X, then Y, then Enter.
Apply the configuration:
source ~/.bashrc
Verify that JAVA_HOME is configured correctly:
echo $JAVA_HOME
As shown above, JAVA_HOME is configured successfully.
cd /usr/local
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
Extract: tar -xzvf hadoop-3.3.5.tar.gz
Rename: mv hadoop-3.3.5 /usr/local/hadoop
Change ownership: chown -R root:root ./hadoop
ls -1 hadoop/
Configure the Hadoop environment variables:
nano ~/.bashrc
Append the following at the bottom:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and exit: Ctrl+X, Y, Enter.
source ~/.bashrc
Check that the hadoop command works:
cd /usr/local/hadoop
./bin/hadoop version
Configure the cluster / distributed environment:
Edit the profile file:
cd /usr/local/hadoop/etc/hadoop
nano /etc/profile
Add the following:
# Hadoop Service Users
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
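An alternative, under the same all-root setup, is to put these five *_USER exports into $HADOOP_HOME/etc/hadoop/hadoop-env.sh instead, since that file is read on each node whenever a daemon starts; a minimal sketch:
nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# append the same five export lines shown above, then save with Ctrl+X, Y, Enter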
Edit the workers file:
nano workers
hadoop01
hadoop02
hadoop03
Save and exit: Ctrl+X, Y, Enter.
Edit core-site.xml:
nano core-site.xml
Add the following configuration:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/tmp</value>
    <description>Abase for other temporary directories.</description>
  </property>
</configuration>
Edit hdfs-site.xml:
nano hdfs-site.xml
Add the following:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop03:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
Edit mapred-site.xml:
nano mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop01:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
</configuration>
Save and exit: Ctrl+X, Y, Enter.
Edit yarn-site.xml:
nano yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Save and exit: Ctrl+X, Y, Enter.
Edit hadoop-env.sh:
nano hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Save and exit: Ctrl+X, Y, Enter.
Pack up the Hadoop folder on hadoop01 and distribute it to the other nodes:
cd /usr/local
tar -zcf ~/hadoop.master.tar.gz ./hadoop
cd ~
scp ./hadoop.master.tar.gz hadoop02:/root
scp ./hadoop.master.tar.gz hadoop03:/root
On hadoop02:
tar -zxf ~/hadoop.master.tar.gz -C /usr/local
chown -R root /usr/local/hadoop
On hadoop03:
tar -zxf ~/hadoop.master.tar.gz -C /usr/local
chown -R root /usr/local/hadoop
On hadoop01:
cd /usr/local/hadoop
./bin/hdfs namenode -format
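The format step should normally be run only this once. A cautionary sketch, based on the tmp paths configured above: if the NameNode ever has to be re-formatted, the old HDFS directories on all three nodes typically need to be removed first, otherwise the DataNodes refuse to join because of a cluster ID mismatch.
rm -rf /usr/local/hadoop/tmp        # only before a re-format, and on every node
./bin/hdfs namenode -format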
Start Hadoop:
cd /usr/local/hadoop
./sbin/start-dfs.sh
./sbin/start-yarn.sh
./sbin/mr-jobhistory-daemon.sh start historyserver
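On Hadoop 3.x the mr-jobhistory-daemon.sh script still works but prints a deprecation warning; the equivalent newer form, if preferred, is:
mapred --daemon start historyserver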
jps
On hadoop02: jps
On hadoop03: jps
Back on hadoop01:
./bin/hdfs dfsadmin -report
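Before shutting everything down, an optional HDFS smoke test confirms that reads and writes work across the cluster (a minimal sketch; the /test path is just an example):
./bin/hdfs dfs -mkdir -p /test
./bin/hdfs dfs -put ./etc/hadoop/core-site.xml /test
./bin/hdfs dfs -ls /test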
stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver
This completes the Hadoop configuration.
3. Spark configuration
Extract Spark into /usr/local:
tar -zxf /root/spark-3.4.2-bin-without-hadoop.tgz -C /usr/local
cd /usr/local
mv ./spark-3.4.2-bin-without-hadoop ./spark
chown -R root ./spark
(2) Configure the related files:
Edit the spark-env.sh file:
cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
nano ./conf/spark-env.sh
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Ctrl+X, Y, Enter
Something still failed at this point; tracing back, the earlier environment variables turned out to be wrong, so edit the .bashrc file again:
nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME=/usr/local/spark
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${JAVA_HOME}/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
export PYSPARK_PYTHON=/root/anaconda3/bin/python
Ctrl+X, Y, Enter
source ~/.bashrc
(3) Set the logging level:
cd /usr/local/spark/conf
sudo mv log4j2.properties.template log4j2.properties
vim log4j2.properties
Press i to enter insert mode.
Change rootLogger.level to error.
Press Esc to leave insert mode, then save and quit: type :wq in command mode and press Enter.
Verify that Spark is installed correctly:
cd /usr/local/spark
./bin/run-example SparkPi
Use Anaconda to set the Python version:
conda create -n pyspark python=3.8
y
Switch to the new Python environment:
conda activate pyspark
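Note that PYSPARK_PYTHON in ~/.bashrc still points at the base Anaconda interpreter; if PySpark should run inside this new environment, the variable presumably needs to point at the env's interpreter instead (the path below assumes the default Anaconda env location):
export PYSPARK_PYTHON=/root/anaconda3/envs/pyspark/bin/python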
Start pyspark:
cd /usr/local/spark
./bin/pyspark
Start PySpark in Spark on YARN mode:
cd /usr/local/spark
./bin/pyspark --master yarn
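To double-check YARN mode outside the interactive shell, the bundled SparkPi example can also be submitted to the cluster; a minimal sketch (the examples jar name is assumed from the standard Spark 3.4.2 release layout):
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn examples/jars/spark-examples_2.12-3.4.2.jar 10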
All done!