Arc Forumnew | comments | leaders | submitlogin
4 points by no 4541 days ago | link | parent

Okay pg, it's time to call you out. Unicode support is not trivial, like you make it out to be, and it's not a waste of time. It's a critical piece of infrastructure for any runtime. You fail.

5 points by pg 4527 days ago | link

I'm glad this is preserved for posterity, since it did turn out to be trivial and in fact got added with about -2 lines of code a few days after this comment was posted...


4 points by maxwell 4541 days ago | link

So, how would you implement it?


6 points by kirkeby 4541 days ago | link

I think the Py3K solution sounds right: Two different types of strings, 8-bit byte-strings and full unicode strings, with encode and decode functions to convert between the two.

Without this strict separation you get the wart that is Pythons current string-support.


9 points by olavk 4541 days ago | link

Or just one type of string: unicode character strings (which is sequences of unicode code points). Then a seperate type for byte arrays. Byte arrays are not character strings, but can easily be translated into a string (and back).


3 points by olavk 4540 days ago | link

...and this seem to be exactly what MzScheme provides :-) Strings in MzScheme are sequences of unicode code points. "bytes" is a seperate type which is a sequence of bytes. There are functions to translate between the two, given an encoding.

Python 3000 is close to this, but I think Python muddles the issue by providing character-releated operations like "capitalize" and so on on byte arrays. This is bound to lead to confusion. (The reason seem to be that the byte array is really the old 8-bit string type renamed. Will it never go away?) MzScheme does not have that issue.