unicode:bom_to_encoding/1
Detects the UTF byte order mark (BOM) of a binary
Usage:
bom_to_encoding(Bin) -> {Encoding, Length}
Internal implementation:
-spec bom_to_encoding(Bin) -> {Encoding, Length} when
      Bin :: binary(),
      Encoding :: 'latin1' | 'utf8' | {'utf16', endian()} | {'utf32', endian()},
      Length :: non_neg_integer().

bom_to_encoding(<<239,187,191,_/binary>>) ->
    {utf8,3};
bom_to_encoding(<<0,0,254,255,_/binary>>) ->
    {{utf32,big},4};
bom_to_encoding(<<255,254,0,0,_/binary>>) ->
    {{utf32,little},4};
bom_to_encoding(<<254,255,_/binary>>) ->
    {{utf16,big},2};
bom_to_encoding(<<255,254,_/binary>>) ->
    {{utf16,little},2};
bom_to_encoding(Bin) when is_binary(Bin) ->
    {latin1,0}.
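The clause order in the implementation matters: the 4-byte UTF-32 little-endian pattern is tried before the 2-byte UTF-16 little-endian pattern, even though both start with FF FE. The following is a minimal sketch of that behaviour; the module bom_order_demo and its function check/0 are hypothetical names used only for illustration.

```erlang
-module(bom_order_demo).
-export([check/0]).

%% Hypothetical helper, not part of the unicode module. Each line is a
%% pattern match that fails with badmatch if the result differs.
check() ->
    %% FF FE 00 00 matches the UTF-32 little-endian clause first.
    {{utf32,little},4} = unicode:bom_to_encoding(<<16#FF,16#FE,0,0>>),
    %% FF FE followed by other bytes falls through to UTF-16 little-endian.
    {{utf16,little},2} = unicode:bom_to_encoding(<<16#FF,16#FE,$a,0>>),
    %% 00 00 FE FF is the UTF-32 big-endian BOM.
    {{utf32,big},4} = unicode:bom_to_encoding(<<0,0,16#FE,16#FF>>),
    %% A truncated UTF-8 BOM matches no BOM clause, so the fallback applies.
    {latin1,0} = unicode:bom_to_encoding(<<16#EF,16#BB>>),
    ok.
```

One consequence is that UTF-16 little-endian text that begins with a BOM followed by a NUL character is reported as UTF-32 little-endian.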
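Checks whether the binary Bin begins with a UTF byte order mark (BOM). If it does, the function returns the encoding the BOM identifies together with the length of the BOM in bytes: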
unicode:bom_to_encoding(<<16#FE, 16#FF>>).        % {{utf16,big},2}
unicode:bom_to_encoding(<<16#EF, 16#BB, 16#BF>>). % {utf8,3}
unicode:bom_to_encoding(<<16#FF, 16#FE>>).        % {{utf16,little},2}
If no byte order mark is found, {latin1,0} is returned.
unicode:bom_to_encoding(<<123456>>). % {latin1,0}
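The returned Length is what makes the result convenient to use: it says how many bytes to skip before the actual text starts. Below is a minimal sketch that strips the BOM and decodes the remainder with unicode:characters_to_list/2. The module bom_decode_demo and the function decode/1 are hypothetical names, and the sketch assumes the whole text is already in memory as one binary.

```erlang
-module(bom_decode_demo).
-export([decode/1]).

%% Hypothetical helper: decode a binary according to its BOM.
%% Without a BOM the data is treated as latin1, as bom_to_encoding/1 reports.
decode(Bin) when is_binary(Bin) ->
    {Encoding, Length} = unicode:bom_to_encoding(Bin),
    %% Skip the first Length bytes (zero bytes when there is no BOM).
    <<_Bom:Length/binary, Rest/binary>> = Bin,
    %% Returns a list of code points, or an {error, ...} / {incomplete, ...}
    %% tuple if Rest is not valid in the detected encoding.
    unicode:characters_to_list(Rest, Encoding).
```

For example, bom_decode_demo:decode(<<16#EF,16#BB,16#BF,"abc">>) skips the three BOM bytes and decodes the rest as UTF-8.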
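The following example reads the first bytes of the file test.txt, detects the encoding from its BOM, positions the file just past the BOM, and sets the detected encoding on the I/O device for further reading:
{ok, File} = file:open("test.txt", [read, binary]),
%% Read up to 4 bytes, the length of the longest BOM.
{ok, Bin} = file:read(File, 4),
{Encoding, Length} = unicode:bom_to_encoding(Bin),
%% Move to the first byte after the BOM before reading any text.
{ok, _} = file:position(File, Length),
io:setopts(File, [{encoding, Encoding}]).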
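The example above assumes that test.txt exists and contains at least one byte; on an empty file file:read/2 returns eof and the match fails. A more defensive variant might look like the sketch below. The module bom_file_demo, the function open_with_bom/1, and the {ok, File, Encoding} return shape are assumptions made for illustration, not part of any OTP API.

```erlang
-module(bom_file_demo).
-export([open_with_bom/1]).

%% Hypothetical helper: open Path for reading, detect a BOM, skip it,
%% and set the detected encoding on the I/O device.
open_with_bom(Path) ->
    case file:open(Path, [read, binary]) of
        {ok, File} ->
            case file:read(File, 4) of
                {ok, Head} -> configure(File, Head);
                eof -> configure(File, <<>>);   % empty file: no BOM
                {error, _} = Error ->
                    _ = file:close(File),
                    Error
            end;
        {error, _} = Error ->
            Error
    end.

configure(File, Head) ->
    {Encoding, Length} = unicode:bom_to_encoding(Head),
    %% Reposition just past the BOM (offset 0 when there is none).
    {ok, _} = file:position(File, Length),
    ok = io:setopts(File, [{encoding, Encoding}]),
    {ok, File, Encoding}.
```

Returning the encoding alongside the device lets the caller log or check what was detected without calling io:getopts/1.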