Arc Forum
1 point by olavk 5936 days ago

It is not really more work to make strings sequences of (at least) 24-bit values rather than sequences of 8-bit values. Actually it makes a lot of things simpler, since all strings can be in the Unicode character set rather than in a host of different and incompatible 8-bit character sets, which is the case in non-Unicode languages.
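To illustrate the incompatibility problem, here is a minimal sketch, assuming Python 3 and its standard `latin-1` and `cp1251` codecs (my choice of example, not something specific to Arc). The same 8-bit value means a different character depending on which character set you assume, while a Unicode code point is unambiguous.

```python
# One raw byte, with no information about which 8-bit character set it uses.
raw = bytes([0xE9])

# The same byte decodes to different characters under different 8-bit sets.
as_latin1 = raw.decode("latin-1")  # Latin small letter e with acute
as_cp1251 = raw.decode("cp1251")   # Cyrillic small letter short i

# Once decoded, each character has its own distinct Unicode code point.
latin1_code_point = ord(as_latin1)
cp1251_code_point = ord(as_cp1251)

print(hex(latin1_code_point), hex(cp1251_code_point))
```

With strings defined as sequences of Unicode code points, the question "which character set is this string in?" only comes up at the I/O boundary, where bytes are decoded.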

The difficulties that languages like Python and Ruby have come from backwards compatibility: a lot of existing code expects strings to be 8-bit byte arrays. Java and JavaScript got this more right by using 16-bit chars. That is still not enough for the full Unicode set, but at least they don't have the problem of strings in multiple incompatible character sets.
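A rough sketch of why 16 bits is not enough, assuming Python 3 (where `len` on a `str` counts code points). The helper `utf16_units` is a hypothetical name of mine; it counts how many 16-bit units a string needs in UTF-16, which is what Java and JavaScript chars are.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    # utf-16-le emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u00e9"         # inside the 16-bit range (Basic Multilingual Plane)
astral_char = "\U0001F600"  # outside the 16-bit range

# Both are a single code point...
code_points = (len(bmp_char), len(astral_char))

# ...but the second one needs two 16-bit units (a surrogate pair).
units = (utf16_units(bmp_char), utf16_units(astral_char))

print(code_points, units)
```

So with 16-bit chars the length of a string and the index of a character can be off for text outside the 16-bit range, but every string is at least in the same character set.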