Changing the file split size in Hadoop

Date: 2023-05-04


Problem description


I have a bunch of small files in an HDFS directory. Although the files are relatively small, the processing time per file is huge: a 64 MB file, which is the default split size for TextInputFormat, can take several hours to process.

What I need to do is reduce the split size so that I can use more nodes for a job.

So the question is: how can I split the files into chunks of, say, 10 KB? Do I need to implement my own InputFormat and RecordReader for this, or is there a parameter I can set? Thanks.
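As a rough illustration of what a "parameter to set" could look like: Hadoop's FileInputFormat (the base class of TextInputFormat) derives the split size as max(minSize, min(maxSize, blockSize)), so capping the maximum split size is usually enough and no custom InputFormat is needed. The sketch below reproduces only that arithmetic in plain Java so it runs without a cluster. The class name SplitSizeSketch and its methods are hypothetical helpers written for this example, not part of Hadoop, and the 64 MB block size and 10 KB cap are simply the figures from the question.

```java
// Sketch only: mirrors the split-size arithmetic used by Hadoop's
// FileInputFormat, without depending on Hadoop itself.
// SplitSizeSketch and its methods are hypothetical names for illustration.
final class SplitSizeSketch {
    // Hadoop allows the final split to be up to 10% larger than the target.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for one non-empty, splittable file.
    static long countSplits(long fileLength, long splitSize) {
        long remaining = fileLength;
        long count = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            count++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            count++; // the tail becomes the last split
        }
        return count;
    }

    public static void main(String[] args) {
        long kb = 1024L;
        long mb = 1024L * kb;
        long blockSize = 64 * mb;  // assumed HDFS block size, as in the question
        long fileLength = 64 * mb; // one 64 MB input file

        // No cap on the split size: the whole file is a single split.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileLength, defaultSplit)); // prints 1

        // Maximum split size capped at 10 KB.
        long cappedSplit = computeSplitSize(blockSize, 1L, 10 * kb);
        System.out.println(cappedSplit);                          // prints 10240
        System.out.println(countSplits(fileLength, cappedSplit)); // prints 6554
    }
}
```

In an actual job, the cap would typically be applied per job rather than by changing the HDFS block size, for example with FileInputFormat.setMaxInputSplitSize(job, 10 * 1024) in the driver or by setting the property mapreduce.input.fileinputformat.split.maxsize (mapred.max.split.size in older releases). Which name applies depends on the Hadoop version in use, so treat both as something to verify against your cluster's documentation. Note also that TextInputFormat aligns records to line boundaries, so the bytes each mapper really reads will differ slightly from the nominal split size.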
