Discussing the nuts and bolts of software development

Friday, July 20, 2007


How do I convert a wchar_t to a char?

How do I convert a wchar_t to a char? How do I convert a char to a wchar_t?

It depends! Welcome to the wonderful world of string conversion.

You are probably asking this because you have a std::wstring or wchar_t* and need to pass it to a function which takes a std::string or char*. You just need to know the name of the conversion function to use, right? The problem is that the way to do the conversion varies depending on the context of your code.

Before asking how to convert a wchar_t to a char, there are other questions you should be asking first, such as "What encoding is my wchar_t using?" and, more importantly, "What encoding does my function expect that char param to be in?"

Distinguish Storage From Encoding

char and wchar_t just represent storage space, with sizes defined by the compiler. A char is 1 byte everywhere (the C and C++ standards define sizeof(char) as 1), while a wchar_t is 2 bytes in Microsoft VC and 4 bytes in some versions of GCC.
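If you want to see what your own compiler does, here is a minimal sketch. The helper name describe_storage is made up for this example, and the result depends entirely on the compiler and platform you build with.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// sizeof(char) is 1 by definition; sizeof(wchar_t) is up to the implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// Hypothetical helper: reports the storage width of both types in bits,
// for example "char: 8 bits, wchar_t: 32 bits" on a typical Linux GCC build.
std::string describe_storage() {
    return "char: " + std::to_string(sizeof(char) * CHAR_BIT) +
           " bits, wchar_t: " + std::to_string(sizeof(wchar_t) * CHAR_BIT) +
           " bits";
}
```

Note that this only tells you how much room each unit has. It says nothing about the encoding of the data stored there.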

What matters is the encoding used for the data contained within those bytes. Is it ascii, or an 8-bit extension of it? If so, what code page is being used? Or maybe it's unicode? If so, which unicode encoding is being used?

The encoding tells you how to interpret the data, and thus how you'll need to convert it.

Encoding Within The Wide String

You need to know the encoding of the wide string, i.e., the way to interpret the data stored in each wchar_t. This is typically UTF-16 or UCS-2 when wchar_t's are 2 bytes, and UTF-32 when wchar_t's are 4 bytes.

For example, let's say you had an array of wchar_t on Windows and the first wchar_t's value in hex was 2D25. This is probably UTF-16 encoding, and represents the Georgian small letter 'hoe' (http://www.unicode.org/charts/PDF/U2D00.pdf).

But who knows? Although very unlikely, it's possible that the string was read in from a source which was stuffed with 2 UTF-8 characters in each wchar_t, in which case the value 2D25 represents '-' (2D) and '%' (25).
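To make the two readings of 2D25 concrete, here is a small sketch. Both function names are invented for this example, and the "high byte first" order in the second one is an assumption chosen to match the description above.

```cpp
#include <cstdint>
#include <string>

// Reading 1: the 16-bit value is one UTF-16 code unit. Any unit outside the
// surrogate range D800..DFFF stands for a code point all by itself.
bool is_single_code_point(std::uint16_t unit) {
    return unit < 0xD800 || unit > 0xDFFF;
}

// Reading 2: the 16-bit value is really two packed bytes, high byte first
// (an assumption of this sketch).
std::string as_two_bytes(std::uint16_t unit) {
    std::string out;
    out += static_cast<char>(unit >> 8);    // 0x2D for the value 0x2D25
    out += static_cast<char>(unit & 0xFF);  // 0x25 for the value 0x2D25
    return out;
}
```

For 0x2D25, the first reading gives the single code point U+2D25 and the second gives the two-character string "-%". Same bits, two different texts.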

The point is, in order to know what encoding your source string is in, you must understand the context of the code. How did you obtain this string? If it was from a Windows file function, then probably it is UTF-16 encoded. If it was from a 3rd-party library, then consult the 3rd party documentation just to be sure.

Encoding Within The Narrow String

Next, you need to know the encoding that the function expects the char* parameter to be in. Again, it's all about context. Consult the function documentation. Old unix-style functions such as unlink and rmdir generally expect the char* string to be ascii-encoded using the current locale of the OS. Other functions from 3rd-party libraries might expect the char* to be a UTF-8 encoded string, etc.

_Be careful!_ It's easy to confuse us-ascii with UTF-8 because the first 128 characters (hex values 00 to 7F) represent the same symbols. For example, value 6B represents 'k' in both us-ascii and UTF-8. It's only once you get into higher values that they get out of sync. This is why developers often think they used the right encoding, until their product ships internationally, and some important executives freak out because the é, ç and ä in their names are garbled.
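Here is a short sketch of where the two part ways. The function names are made up, and the encoder only handles code points below U+0800, which is enough to show 'k' and 'é' side by side.

```cpp
#include <cassert>
#include <string>

// Encode one code point below U+0800 as UTF-8 (illustration only).
std::string utf8_encode_small(unsigned cp) {
    assert(cp < 0x800);  // larger code points need 3 or 4 bytes
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                  // same byte as us-ascii
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));    // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F));  // continuation byte
    }
    return out;
}

// True when every byte is in 00..7F, so the text is identical in
// us-ascii and UTF-8.
bool is_pure_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}
```

With this, 'k' (6B) comes out as the single byte 6B, while 'é' (E9) comes out as the two bytes C3 A9. Code that treats those two bytes as two separate 8-bit characters is exactly how names end up garbled.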

Time To Convert!

Once you know the platform, source encoding and destination encoding, you are ready to convert. There are numerous conversion utilities on the web, so start searching! Googling "Convert UTF16 to UTF8 on Windows", for example, can yield better results than "convert wchar_t to char".
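To give an idea of what such a conversion involves once both encodings are known, here is a hand-rolled, platform-independent sketch of UTF-16 to UTF-8. The function name utf16_to_utf8 is invented, and replacing unpaired surrogates with U+FFFD is a choice made for this sketch. In real code you would more likely call a platform or library routine.

```cpp
#include <cstddef>
#include <string>

// Sketch: convert UTF-16 code units to UTF-8 bytes.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // lead surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Combine lead and trail into one code point above U+FFFF.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;  // unpaired lead surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray trail surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For instance, the code unit 2D25 from the earlier example becomes the three UTF-8 bytes E2 B4 A5.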

I did 5 minutes of googling just now and was able to find a few links to get you started:

Converting from UTF16 to UTF8 and vice versa on Windows:

Converting from UTF16 to a given Windows code page:

Converting to ascii using the current code page on Unix:
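For the Unix case, one commonly used route is the standard C function wcstombs, which converts to the narrow encoding of the current C locale. The sketch below is a minimal illustration under a few assumptions: the wrapper name wide_to_locale is made up, passing a null destination to get the required length is behavior described by POSIX, and the program is expected to have called setlocale(LC_ALL, "") beforehand if it wants the user's locale rather than the default "C" locale.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Sketch: convert a wide string to the narrow encoding of the current locale.
// Returns false if some character cannot be represented in that encoding.
bool wide_to_locale(const std::wstring& in, std::string& out) {
    // First pass: ask how many bytes are needed (POSIX behavior for a null
    // destination). (size_t)-1 signals an unconvertible character.
    std::size_t needed = std::wcstombs(nullptr, in.c_str(), 0);
    if (needed == static_cast<std::size_t>(-1)) {
        return false;
    }
    // Second pass: convert into a buffer of exactly that size.
    out.resize(needed);
    std::wcstombs(&out[0], in.c_str(), needed);
    return true;
}
```

The false return is the important part: it is how you find out that the destination encoding cannot hold what the source string contains.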

Lossless and Lossy Conversions

_WARNING:_ If you are converting from one unicode encoding to another (for example, UTF-16 to UTF-8), then this will be a lossless conversion. You can convert back and forth as many times as you need without losing any encoding information.

If on the other hand you convert a unicode string to an ascii string, this is a lossy conversion. Characters that the target code page cannot represent are dropped or replaced, and you will not be able to convert the rest back without knowledge of the ascii code page or locale used for the initial conversion.

For example, if I send a char* ascii string over the wire to another machine, the receiving end will not be able to convert it back to a wchar_t* unicode string without knowing what locale my machine was in when I built the ascii string in the first place.
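To see why the receiver needs that information, here is a sketch that decodes the same byte under two different code pages. Both function names are invented for this example, and the Windows-1251 decoder is deliberately partial: it only covers ascii and the Cyrillic letters in the byte range C0 to FF.

```cpp
// Decode one byte as ISO-8859-1 (Latin-1): byte values map directly
// to the code points U+0000..U+00FF.
char32_t decode_latin1(unsigned char b) {
    return b;
}

// Decode one byte as Windows-1251, partially: ascii plus the Cyrillic
// letters at C0..FF, which map to U+0410..U+044F. Anything else returns
// U+FFFD in this sketch.
char32_t decode_cp1251_partial(unsigned char b) {
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    return 0xFFFD;
}
```

The byte E9 decodes to U+00E9 ('é') as Latin-1 but to U+0439 ('й') as Windows-1251, while a plain 'k' decodes the same way in both. Without knowing which code page the sender used, the receiver cannot tell which of the two was meant.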

If there's one principle to remember when working with string conversion, it's that a good programmer is aware of the context in which he's working at all times. Take the extra minute to understand the source and destination encodings, the platform and the locale, and you will be rewarded with a lower bug count once your localized product hits international markets.


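You can use a short function to narrow a wchar_t array into a char array. Remember one thing: any character that is not ascii (codes 0 - 127) is replaced by "?".

    #include <cstddef>

    size_t to_narrow(const wchar_t * src, char * dest, size_t dest_len){
        size_t i = 0;   // index into src
        size_t j = 0;   // index into dest
        wchar_t code;

        if (dest_len == 0)
            return 0;   // no room even for the terminator

        while (src[i] != L'\0' && j < (dest_len - 1)){
            code = src[i];
            if (code >= 0 && code < 128)
                dest[j] = char(code);
            else {
                dest[j] = '?';
                if (code >= 0xD800 && code <= 0xDBFF && src[i + 1] != L'\0')
                    // lead surrogate, skip the next code unit, which is the trail
                    i++;
            }
            i++;
            j++;
        }

        dest[j] = '\0';

        return j;   // number of chars written, not counting the terminator
    }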
