The L Literal and Unicode Encoding

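To represent strings, C++ provides, in addition to the 8-bit char type, the wchar_t type for wide characters. It is 16 bits wide (reportedly 16 bits on Windows and 32 bits on Unix-like systems). A wchar_t can hold MBCS/DBCS or UCS/UTF-16 encoded values (Windows uses little endian by default). Along with wchar_t, the "L" prefix is provided for writing wide character and wide string literals.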
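Because the width of wchar_t depends on the platform, it can help to check it directly. The following is a minimal sketch; the helper names wchar_bits and wide_length are made up for illustration and do not come from the post.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of wchar_t in bits on the platform this is compiled for.
// Typically 16 with MSVC on Windows and 32 with GCC/Clang on Linux.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Length of a wide string in wchar_t units (not bytes).
// wide_length is a hypothetical helper name used only in this sketch.
inline std::size_t wide_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

For an ASCII-only literal such as L"abc", wide_length returns 3 regardless of how wide wchar_t is.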


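The background you need on string encodings and Microsoft's globalization support can be found right away by searching MSDN. I also recommend the article by Joel Spolsky, which treats an understanding of Unicode and character sets as a basic requirement. A Korean translation by jrogue is available as well.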

The Complete Guide to C++ Strings, Part I - Win32 Character Encodings is also a good read. It has been translated into Korean as well ([1], [2]).

What follows is the story of a day I lost while working with wchar_t and Unicode.

I used the wcslen() function to get the number of characters in an L"" string.

setlocale(LC_ALL, ".949");
const wchar_t *wstr1 = L"abc";              /* string literals are const */
printf("length: %d", (int)wcslen(wstr1));   /* wcslen() returns size_t */

--- output ---
length: 3

That comes out as expected. The next piece of code, however, behaves strangely.

setlocale(LC_ALL, ".949");
const wchar_t *wstr2 = L"가나다";
printf("length: %d", (int)wcslen(wstr2));   /* wcslen() returns size_t */

--- output ---
length: 6

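The type is clearly wchar_t, so the result should be 3; instead I get 6, as if I had used a char string with strlen(). So I dumped the memory. L"abc" was stored exactly as UTF-16 little endian.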

61 00 62 00 63 00 00 00
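A dump like the one above can be reproduced with a small helper. This is only a sketch under assumptions: it uses char16_t and std::u16string instead of wchar_t so that the code units are 16 bits on every platform, it writes the bytes in little-endian order explicitly, and the name dump_utf16le is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Serialize a UTF-16 string as little-endian bytes, including the
// terminating null, formatted as space-separated hex pairs.
// dump_utf16le is a hypothetical helper name used only in this sketch.
std::string dump_utf16le(const std::u16string& s) {
    std::string out;
    char buf[8];
    // The iteration with i == s.size() emits the terminating 0x0000 unit.
    for (std::size_t i = 0; i <= s.size(); ++i) {
        const unsigned unit = (i < s.size()) ? static_cast<unsigned>(s[i]) : 0u;
        // Low byte first, then high byte: little endian.
        std::snprintf(buf, sizeof buf, "%02X %02X", unit & 0xFFu, (unit >> 8) & 0xFFu);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    // Each unit is 5 characters ("XX XX") plus one separator between units.
    assert(out.size() == 6 * (s.size() + 1) - 1);
    return out;
}
```

Calling dump_utf16le(u"abc") yields the same byte sequence as the dump of L"abc" shown above.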

The second one, L"가나다", however, looks like this:

B0 00 A1 00 B3 00 AA 00 B4 00 D9 00 00 00

(References: the Unicode Hangul code table and the KS C 5601 (Wansung) code table)

It was simply widening each byte of the DBCS code for "가나다" to 16 bits. The DBCS code for "가나다" is as follows.

B0 A1 B3 AA B4 D9 00
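The effect observed here can be imitated in a few lines. The sketch below is not what the compiler literally does internally; it only reproduces the outcome, where every byte of the DBCS string becomes its own wchar_t with no character-set conversion. The function name widen_bytes is hypothetical.

```cpp
#include <cassert>
#include <string>

// Zero-extend every byte of a multibyte string into its own wchar_t,
// with no character-set conversion at all.
// widen_bytes is a hypothetical helper name used only in this sketch.
std::wstring widen_bytes(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<wchar_t>(b));
    }
    // One wide character per input byte, which is why the length doubles
    // for a string made entirely of two-byte DBCS characters.
    assert(out.size() == bytes.size());
    return out;
}
```

Feeding it the six bytes B0 A1 B3 AA B4 D9 gives a wide string of six elements, which matches the length that wcslen() reported.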

Wondering whether it might be specific to my setup, I also tested on Korean Windows 98 SE and Korean Windows 2000 under vmware.

This time I converted the MBCS string to Unicode (UTF-16 little endian) and stored the result in a wchar_t buffer.

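setlocale(LC_ALL, ".949");
const char *str2 = "가나다";
/* blocksize: buffer length in wchar_t units, defined elsewhere (not shown in the original) */
wchar_t wstr2_uni[blocksize];
memset(wstr2_uni, 0, blocksize * sizeof(wchar_t));
/* _mbstrlen() counts multibyte characters (3 here), not bytes */
mbstowcs(wstr2_uni, str2, _mbstrlen(str2));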

--- memory dump of wstr2_uni ---
00 AC 98 B0 E4 B2 00 00

Done this way, I got the UTF-16 little endian that I had expected in the first place.



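  • The L"" literal does not encode the string as Unicode. It merely widens each byte of the string to a 16-bit wchar_t. (Even so, ASCII characters come out identical to their Unicode values.)
  • To get a Unicode-encoded string, the MBCS string has to be converted to Unicode explicitly.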
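The conversion in the second point can be wrapped so that the buffer is sized automatically and errors are reported. This is a minimal sketch, not the code from the post: the name mb_to_wide is hypothetical, and it assumes that mbstowcs() returns the required length when given a null destination, which POSIX and the Microsoft C runtime document but ISO C does not promise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

// Convert a multibyte string (interpreted in the current LC_CTYPE locale)
// to a wide string. Returns false on an invalid multibyte sequence.
// mb_to_wide is a hypothetical helper name used only in this sketch.
bool mb_to_wide(const char* src, std::wstring& out) {
    assert(src != nullptr);
    // Assumption: with a null destination, mbstowcs() returns the number
    // of wide characters needed (POSIX and MSVC behavior).
    const std::size_t needed = std::mbstowcs(nullptr, src, 0);
    if (needed == static_cast<std::size_t>(-1)) return false;
    out.assign(needed, L'\0');
    if (needed == 0) return true;
    const std::size_t written = std::mbstowcs(&out[0], src, needed);
    if (written == static_cast<std::size_t>(-1)) return false;
    out.resize(written);
    return true;
}
```

As in the post, the locale has to be set first (for example with setlocale(LC_ALL, ".949") on Korean Windows) so that the input bytes are interpreted in the intended code page.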

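Be that as it may, what I find unfortunate is that none of the Unicode articles I looked at mention this property of the L"" literal.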

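I suspect this is because most of them were written by, or translated from, people in English-speaking (or Latin-script) regions, who rarely need Unicode in the first place.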

ps) Even with well over a decade of Microsoft history behind us, expecting globalization to be fully achieved may be asking too much.

update

On a hunch I searched through the #pragma directives and found #pragma setlocale(). It resolves the L"" literal problem.

#pragma setlocale(".949")
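A different way to sidestep the source-encoding question, shown here only as a sketch and not as the approach taken in this post, is to write the literal with universal character names. The compiler then does not have to guess how the bytes in the source file are encoded. The names kGaNaDa and ga_na_da_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// "가나다" spelled with universal character names (U+AC00 U+B098 U+B2E4),
// so the result does not depend on the encoding of the source file.
// kGaNaDa is a hypothetical name used only in this sketch.
static const wchar_t kGaNaDa[] = L"\uAC00\uB098\uB2E4";

inline std::size_t ga_na_da_length() {
    assert(kGaNaDa[0] == 0xAC00);
    return std::wcslen(kGaNaDa);
}
```

The trade-off is readability: the literal is no longer legible as Korean text, so a comment next to it is worth keeping.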

Once you look into it, this is how things tend to go with Microsoft.

2005-05-24 9:26 PM

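Things I learned today.

1. The relationship between wchar_t and encodings. This has confused me for a long time, so I organized it to some extent and posted it on the wiki. What I took away while learning it is that I should study an encoding library, and that where possible I should save data as xml…

2006-12-3 2:05 AM | trackback by pok씨의 세상사는 이야기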

I wanted to read the post, but the characters are broken..
I would like to read it.

2006-12-3 10:55 AM | comment by pok

That is because the site is encoded in euc-kr.
I have been meaning to convert it to utf-8, but it is still like this.

2006-12-3 11:05 PM | comment by codian
