The L"" literal and Unicode encoding
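To support multiple languages, C++ provides the wchar_t type for wide characters in addition to the 8-bit char type. wchar_t is 16 bits wide on Windows and reportedly 32 bits on Unix-like systems. A wchar_t can hold MBCS/DBCS codes or UCS-2/UTF-16 Unicode (little endian by default on Windows). To mark a wide character or wide string literal for use with wchar_t, prefix it with "L".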
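For background on string encodings and software globalization, a search of MSDN turns up material right away. I recommend the article by Joel Spolsky, who argues that an understanding of Unicode and character sets is essential. A Korean translation is available through jrogue.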
Another is "The Complete Guide to C++ Strings, Part I - Win32 Character Encodings". It has also been translated into Korean ([1], [2]).
This is the story of a day spent wrestling with wchar_t and Unicode.
I used wcslen() to get the number of characters in an L"" string.
// needs <clocale>, <cstdio>, <cwchar>
setlocale(LC_ALL, ".949");
const wchar_t *wstr1 = L"abc";
printf("length: %d", (int)wcslen(wstr1));  // wcslen() returns size_t, so cast for %d
--- output ---
length: 3
That is the expected result. The next piece of code, however, behaves strangely.
setlocale(LC_ALL, ".949");
const wchar_t *wstr2 = L"가나다";
printf("length: %d", (int)wcslen(wstr2));  // wcslen() returns size_t, so cast for %d
--- output ---
length: 6
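The type is clearly wchar_t, so the result should be 3, yet it comes out as 6, exactly what strlen() would return for a char string. So I dumped the memory. L"abc" was confirmed to be UTF-16 little endian: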
61 00 62 00 63 00 00 00
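A dump like the one above can be produced with a small helper. This is only a sketch: `dump_bytes` is a hypothetical name, not something from the original post, and the bytes it prints for a wchar_t string depend on the platform's `sizeof(wchar_t)` and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a memory region as space-separated hex bytes.
std::string dump_bytes(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string text;
    char buf[4];
    for (std::size_t i = 0; i < size; ++i) {
        // Two uppercase hex digits per byte.
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) text += ' ';
        text += buf;
    }
    return text;
}
```

To dump a wide string including its terminator, one would call something like `dump_bytes(wstr1, (wcslen(wstr1) + 1) * sizeof(wchar_t))`.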
The second string, L"가나다", looks like this:
B0 00 A1 00 B3 00 AA 00 B4 00 D9 00 00 00
(See the Unicode Hangul code table and the KS-5601 code table.)
In other words, each byte of the DBCS encoding of "가나다" has simply been widened to 16 bits. The DBCS encoding of "가나다" is as follows.
B0 A1 B3 AA B4 D9 00
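The observed result can be reproduced by widening each byte on its own, with no character-set conversion. The sketch below only mimics the output seen in the dump. `widen_bytes` is a hypothetical name, and that the compiler works this way internally is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turn every byte of a narrow string into one 16-bit
// unit without any character-set conversion. A two-byte DBCS character
// therefore becomes two separate 16-bit units.
std::vector<unsigned short> widen_bytes(const std::string& narrow) {
    std::vector<unsigned short> units;
    for (unsigned char byte : narrow) {
        units.push_back(static_cast<unsigned short>(byte));
    }
    assert(units.size() == narrow.size());
    return units;
}
```

Feeding it the six DBCS bytes of "가나다" (`"\xB0\xA1\xB3\xAA\xB4\xD9"`) yields six units, 00B0 00A1 00B3 00AA 00B4 00D9, which is why wcslen() reported 6.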
Just in case, I also tested on Korean Windows 98 SE and Korean Windows 2000 under vmware, and the result was the same.
This time I converted an MBCS string to Unicode (UTF-16 little endian) and stored it in a wchar_t array.
setlocale(LC_ALL, ".949");
char *str2 = "가나다";
wchar_t wstr2_uni[blocksize];
memset(wstr2_uni, 0, blocksize * sizeof(wchar_t));
mbstowcs(wstr2_uni, str2, _mbstrlen(str2));
--- wstr2_uni ---
00 AC 98 B0 E4 B2 00 00
This gives the UTF-16 little endian result I had expected in the first place.
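- An L"" string literal is not encoded as Unicode. Each byte of the string is merely widened to a 16-bit wchar_t. (For ASCII characters the result happens to match the Unicode encoding.)
- To get a Unicode-encoded string, you have to convert it from an MBCS string explicitly.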
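The explicit conversion in the second point could be wrapped roughly as follows. This is a sketch, not the post's code: `mbcs_to_wide` is a hypothetical helper, and a locale name such as ".949" is Windows-specific, so the function reports failure when the requested locale does not exist.

```cpp
#include <cassert>
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string to wide characters under
// the named locale. Returns false if the locale is unavailable or the input
// is not valid in that locale's encoding.
bool mbcs_to_wide(const char* locale_name, const char* src, std::wstring& out) {
    assert(locale_name != nullptr && src != nullptr);

    // Remember the current locale so it can be restored afterwards.
    const char* current = std::setlocale(LC_ALL, nullptr);
    std::string saved = current ? current : "C";

    if (std::setlocale(LC_ALL, locale_name) == nullptr) {
        return false;  // e.g. ".949" does not exist on this system
    }

    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buffer(std::strlen(src) + 1);
    std::size_t count = std::mbstowcs(buffer.data(), src, buffer.size());

    std::setlocale(LC_ALL, saved.c_str());

    if (count == static_cast<std::size_t>(-1)) {
        return false;  // invalid multibyte sequence
    }
    out.assign(buffer.data(), count);
    return true;
}
```

Unlike the snippet in the post, this version does not rely on a pre-zeroed buffer, because it copies exactly the number of wide characters that mbstowcs() reports.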
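What surprised me more than that is that I could hardly find any examples that use the L"" literal with anything other than ASCII characters.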
Ƹ ڵ ʿ ƴ κ (Ȥ ƾ迭 ) ϴ ۼ ̰ų ̱ ̶ ϴ.
ps) ʳⰣ Ʈ 簡 귶 globalization ϱ 䱸 ƴұ մϴ.
update
Just in case, I looked through the #pragma directives and found #pragma setlocale(). It resolves the problem with the L"" literal.
#pragma setlocale(".949")
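#pragma setlocale() is specific to Microsoft's compiler. Where it is not available, one alternative that does not depend on the source file's encoding is to spell the characters with universal character names. The sketch below assumes a C++11 compiler, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cwchar>

// "가나다" written as universal character names:
// U+AC00 (가), U+B098 (나), U+B2E4 (다).
const wchar_t* ganada() {
    return L"\uAC00\uB098\uB2E4";
}

// Checks that the literal holds three wide characters with the expected
// Unicode code points.
void verify_ganada() {
    const wchar_t* s = ganada();
    assert(std::wcslen(s) == 3);
    assert(s[0] == 0xAC00 && s[1] == 0xB098 && s[2] == 0xB2E4);
}
```

All three characters are in the Basic Multilingual Plane, so the length is 3 whether wchar_t is 16 or 32 bits wide.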
˾ƺ Ʈ ϴ.

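Things I learned today.
1. The relationship between wchar_t and encodings. This had confused me for a long time, so I sorted it out to some extent and posted it on the wiki. While learning this, I came to think that I should study an encoding library, and that wherever possible I should save data as xml…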
Ʈ µ, ̳..
а ϴ.
The trackback is encoded in euc-kr, which is why it appears garbled.
utf-8 Ѵٰ ε ֳ