Fixing HDFS Append Errors
Two main errors kept alternating all evening:
The first error
2022-10-25 21:37:11,901 WARN hdfs.DataStreamer: DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.88.151:9866,DS-4b3969ed-e679-4990-8d2e-374f24c1955d,DISK]], original=[DatanodeInfoWithStorage[192.168.88.151:9866,DS-4b3969ed-e679-4990-8d2e-374f24c1955d,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1304)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1372)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1598)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1499)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1481)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:719)
appendToFile: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[192.168.88.151:9866,DS-4b3969ed-e679-4990-8d2e-374f24c1955d,DISK]], original=[DatanodeInfoWithStorage[192.168.88.151:9866,DS-4b3969ed-e679-4990-8d2e-374f24c1955d,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
After reading a few articles, this one is fairly easy to understand: it means the write cannot complete. Say your environment has 3 datanodes and the replication factor is set to 3. During a write, the client writes to 3 machines in the pipeline. The default replace-datanode-on-failure.policy is DEFAULT, which means that if the cluster has 3 or more datanodes, the client will look for another datanode to copy to when one in the pipeline fails. With only 3 machines in total there is no spare datanode to fall back on, so as soon as one datanode has a problem the write can never succeed.
In my case, I had only one machine in the cluster powered on when I ran the command, while the client still expected the cluster's usual number of datanodes; since only one was available, the error was thrown. Once I started Hadoop on all three machines, the append succeeded.
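For anyone doing the same append through the Java client instead of the shell, here is a minimal sketch that hits the same DataStreamer code path (the NameNode address and port are my assumptions for illustration, adjust them to your cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;
import java.nio.charset.StandardCharsets;

public class HdfsAppendDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI is an assumption; point it at your own cluster.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.88.151:8020"), conf);

        // Equivalent to: hdfs dfs -appendToFile local.txt /2.txt
        try (FSDataOutputStream out = fs.append(new Path("/2.txt"))) {
            out.write("appended line\n".getBytes(StandardCharsets.UTF_8));
        }
        // If the pipeline cannot replace a failed datanode, the write/close above
        // throws the same "Failed to replace a bad datanode" IOException as the shell.
        fs.close();
    }
}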
The common fix found online is to modify hdfs-site.xml as follows:
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
    <value>NEVER</value>
</property>
<property>
    <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
    <value>true</value>
</property>
Here is how the policy is usually explained online, and it is worth keeping in mind for two- or three-node clusters: with dfs.client.block.write.replace-datanode-on-failure.policy at DEFAULT, when the file has 3 or more replicas the client tries to swap in a replacement datanode and write to it; with 2 replicas it does not replace the datanode and simply keeps writing. On a 3-datanode cluster a single unresponsive node is enough to break the write, so it is reasonable to turn the replacement behaviour off (NEVER).
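The exception message itself says "a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration", so instead of (or in addition to) hdfs-site.xml you can set the same keys on the client side. A minimal sketch, again with an assumed NameNode URI:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.net.URI;

public class AppendClientConf {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same keys as in hdfs-site.xml, but set per client.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER");
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);

        // NameNode URI is an assumption for illustration.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.88.151:8020"), conf);
        // ... append as usual; the client no longer tries to swap in a new datanode.
        fs.close();
    }
}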
The second error
appendToFile: Failed to APPEND_FILE /2.txt for DFSClient_NONMAPREDUCE_505101511_1 on 192.168.88.151 because this file lease is currently owned by DFSClient_NONMAPREDUCE_-474039103_1 on 192.168.88.151
appendToFile: Failed to APPEND_FILE /2.txt for DFSClient_NONMAPREDUCE_814684116_1 on 192.168.88.151 because lease recovery is in progress. Try again later.
The second error just bounces back and forth between these two messages, but since everything after "because" is about the lease, I think they are really one problem. This is presumably the node-response issue people mention online. The first message says the lease is currently owned by another DFSClient, and the IP that follows is my first machine, so I suspect it again comes down to me having only one machine running. If all of your machines are up and you still see this, then it really is the node-response problem, which brings us back to the section above: go modify the configuration file as described.
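To make the lease conflict concrete, here is a minimal sketch (NameNode URI assumed, as before) in which two separate DFSClient instances try to append to the same file; the second one fails with the same "this file lease is currently owned by DFSClient_NONMAPREDUCE_..." message:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class LeaseConflictDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI nn = URI.create("hdfs://192.168.88.151:8020"); // assumed NameNode URI
        Path file = new Path("/2.txt");

        // newInstance() bypasses the FileSystem cache, so each call gets its own
        // DFSClient (its own DFSClient_NONMAPREDUCE_... name and its own lease).
        FileSystem clientA = FileSystem.newInstance(nn, conf);
        FileSystem clientB = FileSystem.newInstance(nn, conf);

        // Client A opens the file for append and holds the lease.
        FSDataOutputStream out = clientA.append(file);
        try {
            // Client B's append is rejected because client A still owns the lease.
            clientB.append(file);
        } catch (Exception e) {
            System.out.println("Second append failed: " + e.getMessage());
        } finally {
            out.close();      // releases client A's lease
            clientA.close();
            clientB.close();
        }
    }
}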