Isn't the size of a char in Java 2 bytes?

Date: 2023-04-07


Question

I used RandomAccessFile to read a byte from a text file.

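import java.io.IOException;
import java.io.RandomAccessFile;

public static void readFile(RandomAccessFile fr) throws IOException {
    byte[] cbuff = new byte[1];
    fr.read(cbuff, 0, 1); // reads at most one byte from the current file position
    System.out.println(new String(cbuff)); // decodes with the platform default charset
}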

Why am I seeing one full character being read by this?

Recommended answer

A char represents a character in Java (*). It is 2 bytes (16 bits) in size.
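The size of char is exposed as constants on java.lang.Character, so it can be checked directly. A minimal sketch (the class name CharSizeDemo and its helper methods are hypothetical, chosen only for illustration):

```java
final class CharSizeDemo {
    // Character.SIZE is the number of bits used to represent a char value.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Character.BYTES is the number of bytes used to represent a char value (Java 8+).
    static int bytesPerChar() {
        return Character.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("bits per char:  " + bitsPerChar());  // 16
        System.out.println("bytes per char: " + bytesPerChar()); // 2
        // A char is an unsigned 16-bit value, so its largest value is 65535.
        System.out.println((int) Character.MAX_VALUE);            // 65535
    }
}
```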

That doesn't necessarily mean that every representation of a character is 2 bytes long. In fact many character encodings only reserve 1 byte for every character (or use 1 byte for the most common characters).
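To make the distinction between the in-memory char and its encoded form concrete, the sketch below encodes the same characters with several charsets and counts the bytes. The class and method names are hypothetical; UTF_16BE is used rather than UTF_16 because the latter prepends a 2-byte byte-order mark when encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthDemo {
    // Number of bytes needed to encode the given text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00E9", "\u4E2D"}; // 'A', 'é', '中'
        for (String s : samples) {
            // Note: '中' cannot be represented in ISO-8859-1; getBytes replaces it
            // with the charset's replacement byte ('?'), so that count is lossy.
            System.out.printf("U+%04X: UTF-8=%d, UTF-16BE=%d, ISO-8859-1=%d%n",
                    s.codePointAt(0),
                    encodedLength(s, StandardCharsets.UTF_8),
                    encodedLength(s, StandardCharsets.UTF_16BE),
                    encodedLength(s, StandardCharsets.ISO_8859_1));
        }
    }
}
```

Each of these strings is a single char in memory, yet UTF-8 needs 1, 2, or 3 bytes for them.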

When you call the String(byte[]) constructor, you ask Java to convert the byte[] to a String using the platform's default charset. Since the platform default charset is usually either a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8, it can easily convert that 1 byte to a single character (with UTF-8, as long as the byte is in the ASCII range 0x00 to 0x7F).
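The same one-byte conversion can be reproduced without depending on the platform default by naming the charset explicitly. A minimal sketch, assuming the hypothetical helpers decodeLatin1 and decodeUtf8 below; it shows that ISO-8859-1 maps every byte to one char, while UTF-8 does so only for ASCII bytes:

```java
import java.nio.charset.StandardCharsets;

final class SingleByteDecodeDemo {
    // ISO-8859-1: every byte value 0x00..0xFF maps to exactly one char.
    static String decodeLatin1(byte b) {
        return new String(new byte[] {b}, StandardCharsets.ISO_8859_1);
    }

    // UTF-8: only bytes 0x00..0x7F form a complete character on their own.
    static String decodeUtf8(byte b) {
        return new String(new byte[] {b}, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte ascii = 0x41;        // 'A' in both charsets
        byte latin = (byte) 0xE9; // 'é' in ISO-8859-1, but only a lead byte in UTF-8

        System.out.println(decodeLatin1(ascii)); // A
        System.out.println(decodeUtf8(ascii));   // A
        System.out.println(decodeLatin1(latin).equals("\u00E9")); // true
        // A lone 0xE9 is an incomplete UTF-8 sequence, so it is replaced with U+FFFD.
        System.out.println(decodeUtf8(latin).equals("\uFFFD"));   // true
    }
}
```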

If you run that code on a platform that uses UTF-16 (or UTF-32 or UCS-2 or UCS-4 or ...) as the platform default encoding, then you will not get a valid result (you'll get a String containing the Unicode Replacement Character, U+FFFD, instead).
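Rather than changing the platform default, the UTF-16 case can be simulated by passing StandardCharsets.UTF_16 explicitly. A minimal sketch (the class and helper names are hypothetical): one byte is only half a UTF-16 code unit, so the decoder treats it as malformed input and substitutes U+FFFD, while two bytes decode to a real character.

```java
import java.nio.charset.StandardCharsets;

final class Utf16SingleByteDemo {
    // UTF-16 needs two bytes per code unit; a single byte is an incomplete sequence.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        String oneByte = decodeUtf16(new byte[] {0x41});
        // String(byte[], Charset) replaces malformed input with U+FFFD.
        System.out.printf("U+%04X%n", (int) oneByte.charAt(0)); // U+FFFD

        // Without a byte-order mark, the UTF-16 decoder assumes big-endian order.
        String twoBytes = decodeUtf16(new byte[] {0x00, 0x41});
        System.out.println(twoBytes); // A
    }
}
```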


That's one of the reasons why you should not depend on the platform default encoding: when converting between byte[] and char[]/String or between InputStream and Reader or between OutputStream and Writer, you should always specify which encoding you want to use. If you don't, then your code will be platform-dependent.
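As an illustration of specifying the encoding at the InputStream/Reader boundary, here is a minimal sketch that wraps a byte stream in an InputStreamReader with an explicit charset. UTF-8 is an assumption made for the example; use whatever encoding the data was actually written in. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {
    // Read all text from a stream, decoding it with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Convenience wrapper for in-memory bytes, which are assumed to be UTF-8.
    static String decodeUtf8(byte[] bytes) {
        try {
            return readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8); // 'é' takes 2 bytes
        String text = decodeUtf8(utf8);
        System.out.println(utf8.length + " bytes -> " + text.length() + " chars"); // 5 bytes -> 4 chars
    }
}
```

The same rule applies in the other direction: String.getBytes(Charset) and OutputStreamWriter(OutputStream, Charset) take the charset explicitly.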
(*) That's not entirely true: a char represents a UTF-16 code unit. Either one or two UTF-16 code units represent a Unicode code point. A Unicode code point usually represents a character, but sometimes multiple Unicode code points are used to make up a single character. But the approximation above is close enough to discuss the topic at hand.
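The footnote's distinction between code units and code points can be observed with String.length(), which counts UTF-16 code units, and String.codePointCount(), which counts code points. A minimal sketch with hypothetical helper names; the emoji U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair, while "e" followed by a combining acute accent is two code points that display as one character.

```java
final class CodeUnitDemo {
    // Number of UTF-16 code units (i.e. chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String grin = "\uD83D\uDE00"; // U+1F600 encoded as a surrogate pair
        System.out.println(utf16Units(grin)); // 2
        System.out.println(codePoints(grin)); // 1

        String composed = "e\u0301"; // 'e' + COMBINING ACUTE ACCENT, shown as one character
        System.out.println(utf16Units(composed)); // 2
        System.out.println(codePoints(composed)); // 2
    }
}
```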
