Hadoop: "check the logs or run fsck in order to identify the missing blocks"

  • March 26, 2020
  • Notes

    The Hadoop version is 2.8.3.

    Today I ran into an odd problem: as List-1 shows, HDFS reported two missing blocks.

List-1

There are 2 missing blocks. The following files may be corrupted:

blk_1073857294	/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar
blk_1073857295	/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar

Please check the logs or run fsck in order to identify the missing blocks. See the Hadoop FAQ for common causes and potential solutions.

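    Because the affected files are under /tmp and are not real business data, we simply deleted them, as shown in List-2. Afterwards the HDFS web page no longer reported the problem. Note that the command was run without a path, so fsck scanned from the root (path=%2F in the output).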

List-2

[xx@xxx hadoop]# hadoop fsck -delete
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://xxxx:50070/fsck?ugi=root&delete=1&path=%2F
FSCK started by root (auth:SIMPLE) from /10.42.5.26 for path / at Wed Mar 25 12:35:39 CST 2020
..............................................................................
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar: CORRUPT blockpool BP-604784226-10.42.1.102-1577681916881 block blk_1073857294

/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-exec-2.1.1.jar: MISSING 1 blocks of total size 32441258 B..
/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar: CORRUPT blockpool BP-604784226-10.42.1.102-1577681916881 block blk_1073857295

/tmp/xxx/b9a11fe8-306a-42cc-b49f-2a7f098ecb5a/hive-hcatalog-core-3.0.0.jar: MISSING 1 blocks of total size 269009 B......................
...
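
    The run in List-2 used the deprecated hadoop script and scanned the whole filesystem. A more cautious variant could look like the sketch below. It uses the hdfs command that the deprecation notice recommends, limits the scan to one directory, and lists the corrupt files before deleting them. The function name purge_corrupt and the refusal to run on / are my own assumptions, not part of the original procedure; -list-corruptfileblocks and -delete are standard fsck options.

```shell
# Hypothetical helper (not from the original post): list, then delete,
# the corrupt files under a single HDFS directory.
# Assumes the `hdfs` client is on PATH and the caller may delete under the target.
purge_corrupt() {
  target="$1"
  # Refuse an empty path or the filesystem root so the delete stays scoped.
  if [ -z "$target" ] || [ "$target" = "/" ]; then
    echo "usage: purge_corrupt <hdfs-dir> (must not be /)" >&2
    return 1
  fi
  # Step 1: show which files under the directory have missing or corrupt blocks.
  hdfs fsck "$target" -list-corruptfileblocks
  # Step 2: delete the corrupted files under the directory.
  hdfs fsck "$target" -delete
}
```

    For the case above this would be called as purge_corrupt /tmp/xxx, so files outside that directory are never touched.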

    Root cause analysis:

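    HDFS stores file data as blocks, here blk_1073857294 and blk_1073857295. Once those blocks had been deleted, the metadata for the two files was still present but the blocks it pointed to were gone, which is why the error was reported. I no longer need this data, so it is enough to delete the metadata of the affected files as well.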
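
    The fsck report shows exactly this mismatch: it names files that still exist in the metadata but whose blocks can no longer be found. As a rough sketch, the filter below pulls those file paths out of fsck output so they can be reviewed before anything is deleted. The name missing_files is hypothetical, and the pattern assumes the line format seen in List-2 ("<path>: MISSING <n> blocks of total size <bytes> B"), which may differ in other Hadoop versions.

```shell
# Hypothetical filter (not from the original post): read `hdfs fsck` output
# on stdin and print each file that is reported with MISSING blocks.
# Assumes the List-2 line format:
#   <path>: MISSING <n> blocks of total size <bytes> B
missing_files() {
  # Keep only the path part of matching lines, then de-duplicate.
  sed -n -E 's#^[[:space:]]*(/.*): MISSING [0-9]+ blocks of total size [0-9]+ B.*$#\1#p' | sort -u
}
```

    A typical use would be hdfs fsck /tmp/xxx | missing_files, which prints one affected path per line.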

Reference

1. https://blog.csdn.net/lsr40/article/details/79426333