How to serialize an object in Hadoop (in HDFS)

Date: 2023-05-05

Problem description

I have a HashMap<String, ArrayList<Integer>>. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in the Mappers and Reducers to use it.

To serialize my HashMap object to HDFS, I used ordinary Java object serialization code as follows, but got an error (permission denied):

try
{
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
}
catch (Exception e)
{
    e.printStackTrace();
}

I got the following exception:

java.io.FileNotFoundException: hashmap.ser (Permission denied)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
    at KMerIndex.createIndex(KMerIndex.java:121)
    at MyDriverClass.formRefIndex(MyDriverClass.java:717)
    at MyDriverClass.main(MyDriverClass.java:768)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Can someone please suggest or share sample code showing how to serialize an object to HDFS in Hadoop?

Recommended answer

Please try using SerializationUtils from Apache Commons Lang.

Its methods are listed below:

static Object clone(Serializable object)                            // Deep clone an Object using serialization.
static Object deserialize(byte[] objectData)                        // Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream)                  // Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj)                           // Serializes an Object to a byte array for storage/serialization.
static void   serialize(Serializable obj, OutputStream outputStream) // Serializes an Object to the specified stream.

When storing to HDFS, you can store the byte[] returned by serialize. When reading the object back, deserialize the bytes and cast the result to the corresponding type (for example, a File object) to get the original object back.
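A minimal sketch of the byte[] round trip described above. It assumes plain java.io in place of Commons Lang so that the example has no external dependencies; SerializationUtils.serialize and SerializationUtils.deserialize wrap essentially the same ObjectOutputStream/ObjectInputStream calls. The class name HashMapSerde and its methods are hypothetical. The stream overloads are included because, on HDFS, the stream would typically come from Hadoop's FileSystem.create(Path) for writing or FileSystem.open(Path) for reading, rather than from java.io.FileOutputStream, which writes to the local filesystem. That Hadoop wiring is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;

// Hypothetical helper; the class and method names are illustrative only.
final class HashMapSerde {

    // Writes obj to the given stream using Java serialization, then closes the stream.
    // Assumption: on HDFS, `out` would be the stream returned by FileSystem.create(Path).
    static void serialize(Serializable obj, OutputStream out) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj);
        }
    }

    // Serializes obj into an in-memory byte array.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        serialize(obj, buffer);
        return buffer.toByteArray();
    }

    // Reads one object from the stream and casts it back to the expected map type.
    // Assumption: on HDFS, `in` would be the stream returned by FileSystem.open(Path).
    @SuppressWarnings("unchecked")
    static HashMap<String, ArrayList<Integer>> deserialize(InputStream in)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(in)) {
            return (HashMap<String, ArrayList<Integer>>) ois.readObject();
        }
    }

    // Rebuilds the map from a byte array produced by serialize(Serializable).
    static HashMap<String, ArrayList<Integer>> deserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        return deserialize(new ByteArrayInputStream(data));
    }
}
```

Calling HashMapSerde.serialize(hmap) in the driver yields the bytes to store, and HashMapSerde.deserialize(bytes) in a mapper or reducer returns a map equal to the original.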

In my case, I stored a HashMap in an HBase column and retrieved it in my mapper method as a HashMap, and it worked.

You can certainly do it the same way.

Another option is Apache Commons IO (see org.apache.commons.io.FileUtils), which writes the byte array to a local file. You would then need to copy that file to HDFS, since you want HDFS as the data store.

FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
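The local staging step can be sketched as follows, assuming the standard java.nio.file API as a dependency-free stand-in for FileUtils.writeByteArrayToFile. The class name LocalStaging and its methods are hypothetical. Copying the resulting file to HDFS is a separate step that is not shown in the code; it could be done with Hadoop's FileSystem.copyFromLocalFile or with hdfs dfs -put from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the class and method names are illustrative only.
final class LocalStaging {

    // Writes the serialized bytes to a local file, creating it or overwriting
    // any existing content, and returns the path that was written.
    static Path writeToLocalFile(byte[] data, Path target) throws IOException {
        return Files.write(target, data);
    }

    // Reads the whole local file back into memory, e.g. to verify the contents
    // before the file is copied to HDFS.
    static byte[] readFromLocalFile(Path source) throws IOException {
        return Files.readAllBytes(source);
    }
}
```

The target path must be in a directory the current user can write to; the permission error in the question is consistent with writing a relative path into a working directory that is not writable.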

Note: the Apache Commons IO and Apache Commons Lang jars are normally already available on a Hadoop cluster.
