Trying to figure out how to print a UTF32 character in C and so far the answer seems to be "you can't"

@eniko On conforming implementations, printf("%lc", unicode_codepoint_val);

@eniko The argument type is wint_t, but default promotions from wchar_t should be fine.
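A minimal sketch of the "%lc" approach suggested above. The helper name print_codepoint and its range checks are illustrative assumptions, not something from the thread, and it only gives meaningful output where wchar_t values are Unicode code points and the active locale can encode the character.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes one Unicode code point to `out` using "%lc".
 * Returns what fprintf returns (bytes written, or a negative value on an
 * encoding or output error), or -1 if the value is not a Unicode scalar
 * value or does not fit in this platform's wchar_t. */
int print_codepoint(FILE *out, unsigned long cp)
{
    assert(out != NULL);

    /* Reject values beyond U+10FFFF and the surrogate range. */
    if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
        return -1;

    /* A 16-bit wchar_t (as on Windows) cannot hold every code point. */
    if (cp > (unsigned long)WCHAR_MAX)
        return -1;

    /* "%lc" takes a wint_t and converts it to the locale's multibyte encoding. */
    return fprintf(out, "%lc", (wint_t)cp);
}
```

Call setlocale(LC_ALL, "") from <locale.h> once at startup before using it, for example print_codepoint(stdout, 0x20AC). In the default "C" locale, non-ASCII characters will typically fail to convert.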

@dalias Everything I've found tells me not to use wchar_t because it's unclear what width it's going to be

@eniko Because Windows is wrong. If wchar_t is too narrow for full Unicode, you're not allowed to support all of Unicode. C explicitly forbids "multi-wchar_t chars" (and thus UTF-16 in wchar_t), which Windows uses because they insisted on contradicting the experts in the early 90s who told them 16 bits wasn't enough, and got themselves stuck. C11 strongly prefers that wchar_t numeric values be UCS code points (there's a macro, __STDC_ISO_10646__, that tells you this) and, unless I'm misremembering, C23 requires it.
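The macro in question is __STDC_ISO_10646__. As a rough sketch, a program could combine it with WCHAR_MAX to decide whether wchar_t is usable for all of Unicode on the current platform. The function name wchar_covers_unicode is a made-up helper for illustration.

```c
#include <assert.h>
#include <wchar.h>

/* wchar_t must at least cover the basic character set on any implementation. */
static_assert(WCHAR_MAX >= 127, "wchar_t is unexpectedly narrow");

/* Hypothetical helper: returns 1 if the implementation documents wchar_t
 * values as ISO 10646 (Unicode) code points AND wchar_t is wide enough to
 * hold U+10FFFF; returns 0 otherwise. */
int wchar_covers_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return (unsigned long)WCHAR_MAX >= 0x10FFFFUL;
#else
    /* No promise about what wchar_t values mean on this implementation. */
    return 0;
#endif
}
```

When this returns 0, as it would with a 16-bit wchar_t, a fallback that avoids wchar_t entirely is the safer route.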

@dalias OK, so then how do I print 32-bit Unicode code points cross-platform?

@eniko With modern Windows, you can set the locale codepage to UTF-8 and it should just work, doing everything in UTF-8 and not touching wchar_t. Arguably this is the best way to do things, but it doesn't respect legacy Unix systems with non-UTF-8 encodings. Modern C also has char32_t (always UTF-32), which can be used if you're worried the system wchar_t is broken like on Windows, but what you can easily do with it is limited.
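If everything is kept in UTF-8, one way to get from a 32-bit code point to printable bytes without touching wchar_t or the locale is to encode it by hand. The following is a sketch under that assumption; utf8_encode is a hypothetical helper, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one Unicode scalar value as UTF-8.
 * Writes 1 to 4 bytes into `out` and returns the byte count, or returns 0
 * (writing nothing) for surrogates and values beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80UL) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {                     /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;                         /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                   /* 4 bytes: 11110xxx then 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                 /* beyond the Unicode range */
}
```

The resulting bytes can be written with fwrite; they display correctly only if the terminal or consumer expects UTF-8, which is the trade-off mentioned above for systems with other encodings.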

@dalias From what I read, char32_t isn't actually guaranteed to be UTF-32, and I also couldn't find a way to print it

@eniko Unfortunately the only way to print it is to use c32rtomb to convert it to a multibyte character string in the current locale encoding (in any reasonable setup this is UTF-8).
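A minimal sketch of the c32rtomb route, assuming a C11 library that provides <uchar.h>. The wrapper name print_char32 is hypothetical; only c32rtomb, fwrite and MB_LEN_MAX come from the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: converts one char32_t to the current locale's
 * multibyte encoding with c32rtomb and writes the bytes to `out`.
 * Returns the number of bytes written, or -1 if the character cannot be
 * represented in the current locale or the write fails. */
int print_char32(FILE *out, char32_t c)
{
    assert(out != NULL);

    char buf[MB_LEN_MAX];              /* large enough for any one character */
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start from the initial conversion state */

    size_t n = c32rtomb(buf, c, &state);
    if (n == (size_t)-1)
        return -1;                     /* encoding error (errno is EILSEQ) */

    return fwrite(buf, 1, n, out) == n ? (int)n : -1;
}
```

As with "%lc", the result depends on the locale: call setlocale(LC_ALL, "") first, and expect non-ASCII characters to fail in the default "C" locale.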

@dalias I found beej.us/guide/bgc/html/split/u earlier and it says:

are values in these stored in UTF-16 or UTF-32? Depends on the implementation.

But you can test to see if they are. If the macros __STDC_UTF_16__ or __STDC_UTF_32__ are defined (to 1) it means the types hold UTF-16 or UTF-32, respectively.

beej.us: Beej's Guide to C Programming
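The test described in the quote can be sketched as a compile-time check. The function names below are made up for illustration; only the two macros and the char16_t and char32_t types come from the standard.

```c
#include <assert.h>
#include <limits.h>
#include <uchar.h>

/* char32_t is the same type as uint_least32_t, so it always has 32+ bits;
 * what the macros add is a promise about the ENCODING of its values. */
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "char32_t too narrow");

/* Hypothetical helper: 1 if char32_t values are documented as UTF-32. */
int char32_is_utf32(void)
{
#ifdef __STDC_UTF_32__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: 1 if char16_t values are documented as UTF-16. */
int char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return 1;
#else
    return 0;
#endif
}
```

A program that relies on char32_t holding code points could refuse to build, or fall back to another path, when char32_is_utf32() would return 0.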

@dalias @eniko

The fact that UTF-16 can't die is just wild.

@lulu @dalias @eniko Java's internal representation for non-ASCII strings is UTF-16 and it's not immediately clear how that could be changed. So I think it'll be around for the foreseeable future.

@dalias @lulu @eniko Number of active server JVMs in the wild continues to increase, having doubled in ~6 years IIRC.

@kittylyst @dalias @eniko

Yes, I know. Same for JavaScript. But this is so bad.