How to read and process UTF-16 encoded text elegantly in C++


fopen("xxx.txt", "w,ccs=UTF-8");

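This only works with newer versions of VC. It does not seem to work in VC 6.0; as I recall it works in 2008, and I am not sure about 2005 and earlier.

As I recall, wfstream can do the same thing if you set (imbue) a suitable locale on it, but VC apparently did not ship a UTF-8 facet class until 2012.

Going through the C interface still seems to be the easier route.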
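If neither the VC-specific `ccs=` flag nor a locale facet is available, one portable fallback is to read the file as raw bytes and decode the UTF-16 by hand. The following is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `read_all`, `append_utf8` and `utf16_bytes_to_utf8` are hypothetical, little-endian order is assumed when no BOM is present, and unpaired surrogates are replaced with U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes (binary mode, no translation).
std::vector<unsigned char> read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}

// Append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode raw UTF-16 bytes into a UTF-8 std::string.
// A leading BOM selects the byte order; little-endian is ASSUMED without one.
// A trailing odd byte is ignored; unpaired surrogates become U+FFFD.
std::string utf16_bytes_to_utf8(const std::vector<unsigned char>& bytes) {
    std::size_t pos = 0;
    bool big_endian = false;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            pos = 2;                      // UTF-16LE BOM
        } else if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            big_endian = true;            // UTF-16BE BOM
            pos = 2;
        }
    }
    // Assemble the 16-bit code unit that starts at byte index i.
    auto unit_at = [&](std::size_t i) -> std::uint32_t {
        return big_endian
            ? static_cast<std::uint32_t>((bytes[i] << 8) | bytes[i + 1])
            : static_cast<std::uint32_t>((bytes[i + 1] << 8) | bytes[i]);
    };
    std::string out;
    while (pos + 1 < bytes.size()) {
        std::uint32_t u = unit_at(pos);
        pos += 2;
        // A high surrogate followed by a low surrogate encodes one code point.
        if (u >= 0xD800 && u <= 0xDBFF && pos + 1 < bytes.size()) {
            std::uint32_t lo = unit_at(pos);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                pos += 2;
                append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                continue;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF) u = 0xFFFD;  // unpaired surrogate
        append_utf8(out, u);
    }
    return out;
}
```

For instance, `utf16_bytes_to_utf8(read_all("input.txt"))` would give the file's text as UTF-8, where `input.txt` is a placeholder name. Decoding to UTF-8 keeps the result in a plain `std::string`, which avoids depending on the platform's `wchar_t` width.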

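You can set the encoding of a web page yourself. Whatever you set, the visitor's browser will automatically use that encoding when it renders the page.

Getting the page encoding from JavaScript:

In IE: document.charset

In Firefox: document.characterSet

For ASP, the following is untested; you can also search Google for "asp get page encoding".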

Function checkcode(path)
    ' Read the first two bytes of the file and guess the encoding from its BOM
    Set objstream = Server.CreateObject("adodb.stream")
    objstream.Type = 1          ' adTypeBinary
    objstream.Mode = 3          ' adModeReadWrite
    objstream.Open
    objstream.Position = 0
    objstream.LoadFromFile path
    bintou = objstream.Read(2)
    If AscB(MidB(bintou, 1, 1)) = &HEF And AscB(MidB(bintou, 2, 1)) = &HBB Then
        checkcode = "utf-8"
    ElseIf AscB(MidB(bintou, 1, 1)) = &HFF And AscB(MidB(bintou, 2, 1)) = &HFE Then
        checkcode = "unicode"   ' UTF-16 little-endian
    Else
        checkcode = "gb2312"    ' no BOM found: fall back to GB2312
    End If
    objstream.Close
    Set objstream = Nothing
End Function
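The same BOM check can be sketched in C++. This is only an illustration: the function name `detect_bom` and the returned labels are hypothetical, and unlike the ASP version it checks all three bytes of the UTF-8 BOM and reports "unknown" instead of assuming GB2312. It does not try to tell UTF-16LE apart from UTF-32LE, whose BOM starts with the same two bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: name the encoding implied by a leading byte-order mark,
// or return "unknown" when the buffer does not start with a recognised BOM.
std::string detect_bom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return "utf-8";
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return "utf-16le";
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return "utf-16be";
    }
    return "unknown";  // no BOM: the caller has to guess or validate the bytes
}
```

A missing BOM says nothing about the encoding, since UTF-8 files are often written without one, so a caller would typically fall back to validating the bytes as UTF-8 before assuming a legacy code page.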

// (1) Import the namespace
using System.Text;

// (2) Create Encoding instances for the source encoding (utf-8) and the target encoding (Shift_JIS)
Encoding myEncoding = Encoding.GetEncoding("Shift_JIS");
Encoding utf8Encoding = Encoding.GetEncoding("utf-8");
string utf8String = "the string you want to convert";

// (3) Get the byte array of the source string, encoded as utf-8
byte[] utf8Bytes = utf8Encoding.GetBytes(utf8String);

// (4) Convert the source byte array into a Shift_JIS encoded byte array
byte[] myBytes = Encoding.Convert(utf8Encoding, myEncoding, utf8Bytes);

// (5) Turn the converted byte array back into a string
string myString = myEncoding.GetString(myBytes);

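Summary:

The actual conversion is done by System.Text.Encoding.Convert(). Because that method operates on byte[] arrays, you first convert the String to a byte[] array, then run the encoding conversion to get the target byte[] array, and finally turn that array back into a String.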

The encodings available to Encoding.GetEncoding are the ones .NET supports, also known as code pages (Code Page).

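File in = new File(args[0]);
InputStreamReader r = new InputStreamReader(new FileInputStream(in));
// getEncoding() returns the name of the charset this reader decodes with
// (the platform default here); it does not inspect the file's contents.
System.out.println(r.getEncoding());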

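In VB 6.0 you can declare a variable of the Object data type, have it hold an ADODB.Stream, and set its Charset property to "utf-8". This fixes UTF-8 text showing up garbled in a TextBox control.

The Object data type: an Object variable is stored as a 32-bit (4-byte) address that refers to an object. With the Set statement, a variable declared as Object can be assigned a reference to any object.

The Charset property: on an ADODB.Stream it sets or returns the character set used to translate the stream's text.

Implementation: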

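Private Sub Command1_Click() ' Open a UTF-8 text file
  Dim Ados As Object
  Dim fileName As String
  CommonDialog1.FileName = ""
  CommonDialog1.Filter = "Text files (*.txt)|*.txt"
  CommonDialog1.Action = 1              ' show the Open dialog
  fileName = CommonDialog1.FileName
  If fileName = "" Then Exit Sub        ' nothing was selected
  Set Ados = CreateObject("adodb.stream")
  With Ados
      .Charset = "utf-8"                ' decode the file as UTF-8
      .Type = 2                         ' adTypeText
      .Open
      .LoadFromFile fileName
      Text1.Text = .ReadText
      .Close
  End With
  Set Ados = Nothing
End Sub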

A conversion is all it takes. The following code has been tested:

Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 = 65001

' Convert a UTF-8 byte array into a VB (UTF-16) string
Private Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim lRet As Long
    Dim lLength As Long
    Dim lBufferSize As Long
    lLength = UBound(Utf) - LBound(Utf) + 1
    If lLength <= 0 Then Exit Function
    lBufferSize = lLength * 2            ' generous output buffer, in characters
    Utf8ToUnicode = String$(lBufferSize, Chr(0))
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf(0)), lLength, StrPtr(Utf8ToUnicode), lBufferSize)
    If lRet <> 0 Then
        Utf8ToUnicode = Left(Utf8ToUnicode, lRet)   ' keep only the characters written
    Else
        Utf8ToUnicode = ""
    End If
End Function

Private Sub Command1_Click()
    Dim aa() As Byte
    Dim t As String, t1 As Long, t2 As Long
    aa = Inet1.OpenURL(Text1.Text, icByteArray) ' the URL is typed into Text1
    If IsUTF8(aa) Then
        t = Utf8ToUnicode(aa)
    Else
        t = StrConv(aa, vbUnicode)
    End If
    ' Extract the text between <title> and </title>
    t1 = InStr(t, "<title>") + Len("<title>")
    t2 = InStr(t, "</title>")
    Text2.Text = Mid(t, t1, t2 - t1)
End Sub

' Decide whether the page bytes are UTF-8
Private Function IsUTF8(Bytes) As Boolean
    Dim i As Long, AscN As Long, Length As Long
    Length = UBound(Bytes) + 1
    If Length < 3 Then
        IsUTF8 = False
        Exit Function
    ElseIf Bytes(0) = &HEF And Bytes(1) = &HBB And Bytes(2) = &HBF Then
        ' UTF-8 BOM
        IsUTF8 = True
        Exit Function
    End If
    Do While i <= Length - 1
        If Bytes(i) < 128 Then
            ' ASCII byte
            i = i + 1
            AscN = AscN + 1
        ElseIf i + 1 >= Length Then
            ' a multi-byte sequence is cut off at the end of the data
            IsUTF8 = False
            Exit Function
        ElseIf (Bytes(i) And &HE0) = &HC0 And (Bytes(i + 1) And &HC0) = &H80 Then
            ' two-byte sequence
            i = i + 2
        ElseIf i + 2 < Length Then
            If (Bytes(i) And &HF0) = &HE0 And (Bytes(i + 1) And &HC0) = &H80 And (Bytes(i + 2) And &HC0) = &H80 Then
                ' three-byte sequence
                i = i + 3
            Else
                IsUTF8 = False
                Exit Function
            End If
        Else
            IsUTF8 = False
            Exit Function
        End If
    Loop
    ' Pure ASCII data is reported as not UTF-8, so the caller uses the ANSI path
    If AscN = Length Then
        IsUTF8 = False
    Else
        IsUTF8 = True
    End If
End Function
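For comparison, here is a rough C++ sketch of a stricter UTF-8 check. It is an illustration rather than a drop-in port: the name `is_valid_utf8` is hypothetical, it also accepts four-byte sequences, and it rejects overlong forms and encoded surrogates, none of which the VB heuristic above does. Unlike the VB function, it reports pure ASCII input as valid, since ASCII is a subset of UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: return true when the buffer is well-formed UTF-8.
bool is_valid_utf8(const unsigned char* s, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        if (b < 0x80) {                 // single-byte (ASCII)
            ++i;
            continue;
        }
        std::size_t len = 0;            // total length of this sequence
        std::uint32_t cp = 0;           // decoded code point
        std::uint32_t min = 0;          // smallest code point allowed for this length
        if ((b & 0xE0) == 0xC0) {
            len = 2; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            len = 3; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            len = 4; cp = b & 0x07; min = 0x10000;
        } else {
            return false;               // stray continuation byte or invalid lead byte
        }
        if (n - i < len) return false;  // sequence is cut off at the end of the data
        for (std::size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min) return false;                     // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // surrogate code point
        i += len;
    }
    return true;
}
```

Text in a legacy double-byte encoding such as GB2312 usually fails this check within the first few non-ASCII characters, which is what makes validation usable as an encoding guess when no BOM is present.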


Original article: https://www.54852.com/web/10089845.html
