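Moving house is a headache for most of us at some point, and moving companies exist to take that problem off our hands. Yarn, the topic of this post, plays a similar role in a Hadoop cluster: it is responsible for resource scheduling.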

Yarn


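1. What is Yarn?

Apache Hadoop YARN is the resource management and job scheduling technology in the open-source Hadoop distributed processing framework. As one of the core components of Apache Hadoop, YARN allocates system resources to the applications running in a Hadoop cluster and schedules the tasks that execute on the different cluster nodes.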

2. Yarn's basic architecture

YARN consists mainly of the ResourceManager, NodeManager, ApplicationMaster, and Container components.

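ResourceManager is a standalone process running on the master node; it handles unified resource management, scheduling, and allocation for the whole cluster.
NodeManager is a standalone process running on each worker node; it reports the node's status.
ApplicationMaster acts as the guardian and manager of an Application: it monitors and manages how all of the Application's Attempts run on the nodes of the cluster, and it requests resources from, and returns resources to, the Yarn ResourceManager.
Container is the unit of resource allocation in Yarn; it bundles resources such as memory and CPU, and YARN allocates resources in units of Containers.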

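The ResourceManager manages and schedules the resources on every NodeManager in a unified way. When a user submits an application, the user must supply an ApplicationMaster to track and manage it; the ApplicationMaster requests resources from the ResourceManager and asks NodeManagers to launch tasks that occupy a given amount of resources. Because different ApplicationMasters are distributed across different nodes, they do not affect one another.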
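The relationship between these components can be sketched in a few lines of Python. This is a toy model, not Yarn's implementation: the class names mirror the components above, but the fields, the first-fit allocation policy, and the memory and vcore figures are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A bundle of resources on one node (illustrative fields only)."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class NodeManager:
    """Tracks the free resources of a single worker node."""
    name: str
    free_memory_mb: int
    free_vcores: int

    def can_fit(self, memory_mb: int, vcores: int) -> bool:
        return self.free_memory_mb >= memory_mb and self.free_vcores >= vcores


@dataclass
class ResourceManager:
    """Hands out containers from the first node that has room (assumed policy)."""
    nodes: list = field(default_factory=list)

    def allocate(self, memory_mb: int, vcores: int):
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                return Container(node.name, memory_mb, vcores)
        return None  # no node can satisfy the request right now


rm = ResourceManager([
    NodeManager("hadoop102", free_memory_mb=4096, free_vcores=4),
    NodeManager("hadoop103", free_memory_mb=8192, free_vcores=8),
])
am_container = rm.allocate(memory_mb=2048, vcores=1)    # for an ApplicationMaster
task_container = rm.allocate(memory_mb=4096, vcores=2)  # for one of its tasks
```

Here the first request fits on hadoop102, while the second no longer does and lands on hadoop103, which is the sense in which resources are allocated in units of Containers.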

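3. How Yarn works

(1) The MR program is submitted on the node where the client runs.

(2) YarnRunner requests an Application from the ResourceManager.

(3) The RM returns the application's resource path to YarnRunner.

(4) The program uploads the resources it needs to run to HDFS.

(5) Once the resources are uploaded, it requests to run the MRAppMaster.

(6) The RM initializes the user's request into a Task.

(7) One of the NodeManagers picks up the Task.

(8) That NodeManager creates a Container and starts the MRAppMaster in it.

(9) The Container copies the resources from HDFS to local disk.

(10) The MRAppMaster requests resources from the RM to run MapTasks.

(11) The RM assigns the MapTasks to two other NodeManagers; each of them picks up its task and creates a container.

(12) The MRAppMaster sends the program launch script to the two NodeManagers that received tasks; each starts a MapTask, and the MapTask partitions and sorts its data.

(13) After all MapTasks have finished, the MRAppMaster requests containers from the RM and runs the ReduceTasks.

(14) Each ReduceTask fetches the data of its partition from the MapTasks.

(15) When the program finishes, the MRAppMaster asks the RM to deregister it.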
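Steps (12) to (14) are easier to picture with a small word-count simulation. The sketch below is plain Python, not Hadoop code: map_task, reduce_task, and the CRC32-based partition function are hypothetical stand-ins, and the "tasks" run in one process rather than in containers on separate NodeManagers.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick the reducer for a key (a deterministic hash, assumed for the demo)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_task(split, num_reducers):
    """Emit (word, 1) pairs, grouped by target partition and sorted by key."""
    partitions = defaultdict(list)
    for word in split.split():
        partitions[partition(word, num_reducers)].append((word, 1))
    return {p: sorted(pairs) for p, pairs in partitions.items()}


def reduce_task(reducer_id, map_outputs):
    """Fetch this reducer's partition from every map output and sum per word."""
    counts = defaultdict(int)
    for output in map_outputs:
        for word, one in output.get(reducer_id, []):
            counts[word] += one
    return dict(counts)


splits = ["yarn hadoop yarn", "hadoop hdfs yarn"]  # one split per MapTask
num_reducers = 2
map_outputs = [map_task(s, num_reducers) for s in splits]

result = {}
for r in range(num_reducers):
    result.update(reduce_task(r, map_outputs))
```

Because every occurrence of a word hashes to the same partition, each reducer sees all the pairs for its words, no matter which MapTask produced them.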

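4. Yarn's job submission process

The execution of an Application in Yarn can be summarized in three steps:

(1) The application is submitted.

(2) The application's ApplicationMaster instance is started.

(3) The ApplicationMaster instance manages the execution of the application.

The full job submission process in detail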

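(1) Job submission

Step 1: The Client calls job.waitForCompletion to submit the MapReduce job to the cluster.

Step 2: The Client requests a job id from the RM.

Step 3: The RM returns the job's resource submission path and the job id to the Client.

Step 4: The Client uploads the jar, the split information, and the configuration files to the given resource submission path.

Step 5: Once the resources are uploaded, the Client asks the RM to run the MRAppMaster.

(2) Job initialization

Step 6: When the RM receives the Client's request, it adds the job to the Capacity Scheduler.

Step 7: An idle NM picks up the job.

Step 8: That NM creates a Container and starts the MRAppMaster in it.

Step 9: The resources submitted by the Client are downloaded to local disk.

(3) Task assignment

Step 10: The MRAppMaster requests resources from the RM to run multiple MapTasks.

Step 11: The RM assigns the MapTasks to two other NodeManagers; each of them picks up its task and creates a container.

(4) Task execution

Step 12: The MRAppMaster sends the program launch script to the two NodeManagers that received tasks; each starts a MapTask, and the MapTask partitions and sorts its data.

Step 13: After all MapTasks have finished, the MRAppMaster requests containers from the RM and runs the ReduceTasks.

Step 14: Each ReduceTask fetches the data of its partition from the MapTasks.

Step 15: When the program finishes, the MRAppMaster asks the RM to deregister it.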

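(5) Progress and status updates

Tasks in YARN report their progress and status (including counters) to the application master. The client polls the application master for progress updates every second (set by mapreduce.client.progressmonitor.pollinterval) and shows them to the user.

(6) Job completion

Besides polling the application master for progress, the client checks every 5 seconds whether the job has finished, by calling waitForCompletion(). The interval is set by mapreduce.client.completion.pollinterval. Once the job completes, the application master and the Containers clean up their working state. The job's information is stored by the job history server so that users can review it later.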
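To make the two polling intervals concrete, here is a minimal Python sketch that lists when a client would poll during a job of a given length. It models only the timing, not any real Hadoop API: poll_schedule and its parameter names are hypothetical, and the defaults are the 1-second and 5-second values quoted above, expressed in milliseconds.

```python
def poll_schedule(duration_ms, progress_interval_ms=1000, completion_interval_ms=5000):
    """Return the sorted (time_ms, kind) polls issued within duration_ms."""
    events = []
    # Progress polls, spaced by the progress-monitor interval.
    for t in range(progress_interval_ms, duration_ms + 1, progress_interval_ms):
        events.append((t, "progress"))
    # Completion checks, spaced by the completion interval.
    for t in range(completion_interval_ms, duration_ms + 1, completion_interval_ms):
        events.append((t, "completion"))
    return sorted(events)


events = poll_schedule(10_000)  # a job that runs for ten seconds
completion_checks = [t for t, kind in events if kind == "completion"]
```

For a ten-second job this yields ten progress polls and two completion checks; raising either interval trades responsiveness for fewer requests to the application master.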

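5. Yarn schedulers and scheduling algorithms

Hadoop currently has three main job schedulers: FIFO, the Capacity Scheduler, and the Fair Scheduler. The default resource scheduler in Apache Hadoop 3.1.3 is the Capacity Scheduler.

The default scheduler in CDH is the Fair Scheduler.

For the exact setting, see yarn-default.xml: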

<property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

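(1) First-in, first-out scheduler (FIFO)

The FIFO (First In First Out) scheduler uses a single queue and serves jobs in the order they were submitted: first come, first served.

Advantage: simple and easy to understand.

Disadvantage: no support for multiple queues, so it is rarely used in production.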
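The first-come, first-served rule can be sketched as a single allocation round in Python. This is a simplified illustration rather than the scheduler's actual code: fifo_allocate, the job names, and the container counts are all made up for the example.

```python
from collections import deque


def fifo_allocate(jobs, free_containers):
    """Grant containers to jobs strictly in submission order.

    jobs: list of (job_name, containers_wanted), oldest first.
    Returns {job_name: containers_granted} for jobs that received anything.
    """
    queue = deque(jobs)
    allocation = {}
    while queue and free_containers > 0:
        name, wanted = queue.popleft()
        granted = min(wanted, free_containers)  # the head of the queue is served first
        allocation[name] = granted
        free_containers -= granted
    return allocation


alloc = fifo_allocate([("job1", 6), ("job2", 3), ("job3", 4)], free_containers=8)
```

With eight free containers, job1 gets all six it asked for, job2 gets the two that remain, and job3 waits, which shows why one large early job can hold up everything behind it.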

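(2) Capacity Scheduler

The Capacity Scheduler is a multi-user scheduler developed by Yahoo.

(3) Fair Scheduler

The Fair Scheduler is a multi-user scheduler developed by Facebook.

Fair Scheduler: deficit

Fair Scheduler: resource allocation algorithm

Fair Scheduler: queue resource allocation modes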
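As a rough illustration of what "fair" means here, the following Python sketch computes a max-min fair split of a fixed capacity among queues with different demands. It is an assumption-laden simplification, not the Fair Scheduler's real algorithm: it ignores weights, minimum shares, and preemption, and the function name, queue names, and numbers are hypothetical.

```python
def max_min_fair_share(demands, capacity):
    """Split capacity so no queue gets more than it asks for and the
    queues that cannot be fully served end up with equal shares."""
    shares = {}
    unmet = dict(demands)
    left = float(capacity)
    while unmet:
        equal = left / len(unmet)
        satisfied = [name for name, want in unmet.items() if want <= equal]
        if not satisfied:
            # Nobody can be fully satisfied: split what is left evenly.
            for name in unmet:
                shares[name] = equal
            break
        for name in satisfied:
            # Small demands are met in full; their surplus is redistributed.
            shares[name] = float(unmet[name])
            left -= unmet.pop(name)
    return shares


shares = max_min_fair_share({"queueA": 20, "queueB": 50, "queueC": 80}, capacity=100)
```

In this example queueA receives its full demand of 20, and the remaining 80 is split evenly between queueB and queueC because neither can be satisfied in full.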

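6. Common Yarn commands

Besides the web page at hadoop103:8088, Yarn's status can also be queried from the command line. Common commands are shown below.

Task: run the wordCount example, then use Yarn commands to inspect how it ran.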

[atguigu@hadoop102 hadoop-3.1.3]$ myhadoop.sh start

[atguigu@hadoop102 hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output

6.1 yarn application: view applications

(1) List all Applications:

[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -list
2021-02-06 10:21:19,238 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL

(2) Filter by Application state: yarn application -list -appStates <States> (states: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED)

[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -list -appStates FINISHED
2021-02-06 10:22:20,029 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of applications (application-types: [], states: [FINISHED] and tags: []):1
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
application_1612577921195_0001	          word count	           MAPREDUCE	   atguigu	   default	          FINISHED	         SUCCEEDED	           100%	http://hadoop102:19888/jobhistory/job/job_1612577921195_0001

(3) Kill an Application:

[atguigu@hadoop102 hadoop-3.1.3]$ yarn application -kill application_1612577921195_0001
2021-02-06 10:23:48,530 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application application_1612577921195_0001 has already finished
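When the output of yarn application -list needs to be consumed by a script, the rows can be split into fields. The Python sketch below assumes the columns are tab-separated, as they appear in the transcript above; the exact layout may differ between Hadoop versions. parse_application_list and the dictionary keys are names invented for this example.

```python
COLUMNS = ["id", "name", "type", "user", "queue",
           "state", "final_state", "progress", "tracking_url"]


def parse_application_list(output):
    """Turn the text printed by `yarn application -list` into a list of dicts."""
    apps = []
    for line in output.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        # Keep only data rows; log lines and the header do not start with an app id.
        if len(fields) == len(COLUMNS) and fields[0].startswith("application_"):
            apps.append(dict(zip(COLUMNS, fields)))
    return apps


# Sample text copied from the FINISHED listing above.
sample = (
    "Total number of applications (application-types: [], states: [FINISHED] and tags: []):1\n"
    "application_1612577921195_0001\t          word count\t           MAPREDUCE\t   atguigu\t"
    "   default\t          FINISHED\t         SUCCEEDED\t           100%\t"
    "http://hadoop102:19888/jobhistory/job/job_1612577921195_0001\n"
)
apps = parse_application_list(sample)
```

In practice the text would come from running the command, for example through subprocess, rather than from a hard-coded string.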

6.2 yarn logs: view logs

(1) Query an Application's logs: yarn logs -applicationId <ApplicationId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn logs -applicationId application_1612577921195_0001

(2) Query a Container's logs: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn logs -applicationId application_1612577921195_0001 -containerId container_1612577921195_0001_01_000001

6.3 yarn applicationattempt: view application attempts

(1) List all attempts of an Application: yarn applicationattempt -list <ApplicationId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn applicationattempt -list application_1612577921195_0001
2021-02-06 10:26:54,195 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of application attempts :1
         ApplicationAttempt-Id	               State	                    AM-Container-Id	                       Tracking-URL
appattempt_1612577921195_0001_000001	            FINISHED	container_1612577921195_0001_01_000001	http://hadoop103:8088/proxy/application_1612577921195_0001/

(2) Print the status of an ApplicationAttempt: yarn applicationattempt -status <ApplicationAttemptId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn applicationattempt -status appattempt_1612577921195_0001_000001
2021-02-06 10:27:55,896 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Application Attempt Report : 
	ApplicationAttempt-Id : appattempt_1612577921195_0001_000001
	State : FINISHED
	AMContainer : container_1612577921195_0001_01_000001
	Tracking-URL : http://hadoop103:8088/proxy/application_1612577921195_0001/
	RPC Port : 34756
	AM Host : hadoop104
	Diagnostics :

6.4 yarn container: view containers

(1) List all Containers: yarn container -list <ApplicationAttemptId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn container -list appattempt_1612577921195_0001_000001
2021-02-06 10:28:41,396 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total number of containers :0
                  Container-Id	          Start Time	         Finish Time	               State	                Host	   Node Http Address

(2) Print the status of a Container: yarn container -status <ContainerId>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn container -status container_1612577921195_0001_01_000001
2021-02-06 10:29:58,554 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Container with id 'container_1612577921195_0001_01_000001' doesn't exist in RM or Timeline Server.

Note: a container's status can be seen only while the task is running.

6.5 yarn node: view node status

List all nodes: yarn node -list -all

[atguigu@hadoop102 hadoop-3.1.3]$ yarn node -list -all
2021-02-06 10:31:36,962 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Total Nodes:3
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
 hadoop103:38168	        RUNNING	   hadoop103:8042	                           0
 hadoop102:42012	        RUNNING	   hadoop102:8042	                           0
 hadoop104:39702	        RUNNING	   hadoop104:8042	                           0

6.6 yarn rmadmin: refresh configuration

Reload the queue configuration: yarn rmadmin -refreshQueues

[atguigu@hadoop102 hadoop-3.1.3]$ yarn rmadmin -refreshQueues
2021-02-06 10:32:03,331 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8033

6.7 yarn queue: view queues

Print queue information: yarn queue -status <QueueName>

[atguigu@hadoop102 hadoop-3.1.3]$ yarn queue -status default
2021-02-06 10:32:33,403 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.10.103:8032
Queue Information : 
Queue Name : default
	State : RUNNING
	Capacity : 100.0%
	Current Capacity : .0%
	Maximum Capacity : 100.0%
	Default Node Label expression : <DEFAULT_PARTITION>
	Accessible Node Labels : *
	Preemption : disabled
	Intra-queue Preemption : disabled
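The "key : value" report printed by yarn queue -status is easy to load into a dictionary for monitoring scripts. The Python sketch below assumes the layout shown in the transcript above; parse_queue_status is a hypothetical helper, and the sample text is an excerpt of that transcript.

```python
def parse_queue_status(output):
    """Collect the `key : value` lines of a queue report into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        # Skip log lines (no " : ") and section titles (empty value).
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


sample = (
    "2021-02-06 10:32:33,403 INFO client.RMProxy: Connecting to ResourceManager"
    " at hadoop103/192.168.10.103:8032\n"
    "Queue Information : \n"
    "Queue Name : default\n"
    "\tState : RUNNING\n"
    "\tCapacity : 100.0%\n"
    "\tCurrent Capacity : .0%\n"
    "\tMaximum Capacity : 100.0%\n"
    "\tPreemption : disabled\n"
)
info = parse_queue_status(sample)
```

A script could then compare info["Current Capacity"] with info["Maximum Capacity"] to spot a queue that is close to its limit.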

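7. Core Yarn parameters for production

8. Common Yarn interview questions

8.1 What is the main purpose of yarn?

The basic design idea of YARN is to split the JobTracker of MapReduce V1 into two independent services: the ResourceManager and the ApplicationMaster. The ResourceManager handles resource management and allocation for the whole system; the ApplicationMaster manages a single application.

8.2 The structure of yarn

The structure of yarn is described in detail earlier in this post; please refer to that section.

8.3 Describe yarn's resource schedulers

This is covered in my blog post on ETL internship interview questions; here is the link (https://blog.csdn.NET/h123456789999999/article/details/125305835?spm=1001.2014.3001.5502).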

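8.4 Briefly describe the roles of the three components

RM: monitors, allocates, and manages all resources.

AM: schedules and coordinates one specific application.

--ApplicationMaster. Every application a user submits includes an AM, which can run on machines other than the one hosting the RM.

NM: maintains a single node.

--NodeManager keeps the programs on its node running properly and periodically reports the node's resource usage (CPU, memory) and the state of its Containers to the RM. If the RM goes down, it connects to the standby RM. It also receives and handles requests from the AM to start or stop Containers.

8.5 What is a container?

A Container is an abstraction that bundles the resources a task needs at run time (memory, disk, CPU, and so on) together with its environment (launch command, environment variables, and so on).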

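8.6 What is yarn's execution flow?

① The client submits a job to the cluster; the job first reaches the ApplicationsManager inside the RM (not to be confused with the per-application ApplicationMaster).

② On receiving the job, the ApplicationsManager picks a NodeManager in the cluster and starts an AppMaster process on it. This process handles task splitting and task monitoring.

③ Once started, the AppMaster registers with the ApplicationsManager in the RM and requests the resources its computation needs from the RM's ResourceScheduler.

④ After obtaining the resources, the AppMaster contacts the corresponding NodeManagers and asks them to start the computation tasks (map and reduce).

⑤ Each NM starts the corresponding Containers to run the Map and Reduce tasks.

⑥ Each task reports its progress and status to the AppMaster, so the AppMaster always knows the state of every task and can rerun a task that runs into a problem.

⑦ When execution finishes, the AppMaster reports to the ApplicationsManager so that it can be deregistered and shut down, which releases its resources.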
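Step ⑥, in which the AppMaster reruns a task that failed, can be pictured as a simple retry loop. The Python sketch below is only an analogy: run_task_with_retries and flaky_map_task are hypothetical, the limit of three attempts is arbitrary, and a real AppMaster would request a fresh Container for each new attempt.

```python
def run_task_with_retries(task, max_attempts=3):
    """Call task(attempt) until it succeeds or max_attempts is exhausted.

    Returns (result, failures), where failures lists the earlier error messages.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), failures
        except RuntimeError as err:
            failures.append(str(err))  # remember why this attempt failed
    raise RuntimeError(f"task failed after {max_attempts} attempts: {failures}")


def flaky_map_task(attempt):
    # Hypothetical task that fails on its first attempt only.
    if attempt == 1:
        raise RuntimeError("container lost")
    return f"map output from attempt {attempt}"


output, failures = run_task_with_retries(flaky_map_task)
```

The caller sees only the successful result plus a record of the earlier failure, much as a client sees a finished job even though one task attempt was lost along the way.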

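That is all on Yarn for now. One piece of advice from me: if you receive an offer, ask about all the details before you accept, or you may regret it badly. I took a week off, and now it is back to coding.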


Must-know interview questions for big data roles: using the Yarn resource scheduler and its execution flow