Arc Forum
The Trouble with Arc's Lack of Unicode Support (plasmasturm.org)
9 points by noahlt 5938 days ago | 3 comments


6 points by pg 5937 days ago

I don't think it's an accident that so many successful languages have had trouble later with strings. It's like the fact that so many successful startups have had trouble with scaling. They were only successful because they focused initially on what mattered most.

Though it sounds prudent for a startup to do things "right" when building their initial infrastructure, you do this only at the expense of something much more important: thinking about the product. So we tell startups not to do more about scaling than have an emergency plan. They'll be lucky if they need it.

Probably the same is true with strings and languages.

-----

4 points by ap 5937 days ago

> I don't think it's an accident that so many successful languages have had trouble later with strings.

Of course. The obvious reason is that all of them are at least as old as Unicode itself.

Unicode awareness was not even looming on the horizon when they were first created. Citing their initial lack of Unicode awareness as support for any theory about language success is... well, it's a number of things, but "relevant" is not one of them, and I'm honestly puzzled as to what would lead you to think otherwise.

-----

1 point by olavk 5936 days ago

It is not really more work to make strings sequences of (at least) 24-bit values rather than sequences of 8-bit values; every Unicode code point fits in 21 bits. Actually, it makes a lot of things simpler, since all strings can be in the Unicode character set, rather than in a host of different and incompatible 8-bit character sets, which is the case in non-Unicode languages.
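To make the 8-bit versus wide-character contrast concrete, here is a minimal sketch in Python 3, whose `str` type is already a sequence of code points. The helpers `fits_in_bits` and `decode_byte_two_ways` are hypothetical names for illustration, not anything from Arc. The sketch checks that every code point fits in 24 bits (the largest, U+10FFFF, needs 21) and shows that the same 8-bit value decodes to different characters under two legacy charsets, Latin-1 and KOI8-R.

```python
import sys

# The largest Unicode code point is U+10FFFF, which needs 21 bits,
# so a fixed-width unit of 24 bits (or more) can hold any character.
MAX_CODE_POINT = sys.maxunicode  # 0x10FFFF on Python 3.3 and later


def fits_in_bits(s: str, bits: int) -> bool:
    """Return True if every code point of s fits in an unsigned `bits`-bit unit.

    Hypothetical helper for illustration only.
    """
    return all(ord(ch) < (1 << bits) for ch in s)


def decode_byte_two_ways(byte: int, charset_a: str, charset_b: str) -> tuple[str, str]:
    """Decode a single 8-bit value under two legacy 8-bit charsets.

    Hypothetical helper: shows that an 8-bit string is ambiguous
    unless you also know which charset it was written in.
    """
    raw = bytes([byte])
    return raw.decode(charset_a), raw.decode(charset_b)


if __name__ == "__main__":
    text = "naïve Привет \U0001F600"
    print(MAX_CODE_POINT.bit_length())  # 21
    print(fits_in_bits(text, 8))        # False: Cyrillic and the emoji exceed 8 bits
    print(fits_in_bits(text, 24))       # True
    # Same byte, two different characters depending on the assumed charset:
    print(decode_byte_two_ways(0xE9, "latin-1", "koi8_r"))
```

With 8-bit strings, a program must also track which charset each string is in; with code-point strings that bookkeeping disappears, which is the simplification described above.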

The difficulties languages like Python and Ruby have are due to backwards compatibility: a lot of existing code expects strings to be 8-bit byte arrays. Java and JavaScript got this more right by using 16-bit chars. That is still not enough for the full Unicode set, since characters above U+FFFF need two 16-bit units (a surrogate pair), but at least they don't have the problem of strings in multiple incompatible character sets.
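A short Python 3 sketch of why 16-bit units fall short: a code point above U+FFFF is stored as two 16-bit code units, a surrogate pair. The function names are hypothetical; `surrogate_pair` spells out the standard UTF-16 arithmetic by hand, and `utf16_code_units` cross-checks it using Python's built-in `utf-16-be` codec as a stand-in for what a Java or JavaScript string holds internally.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units of s (big-endian, no BOM).

    Approximates what a Java or JavaScript string stores internally.
    Hypothetical helper for illustration only.
    """
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into a
    UTF-16 high/low surrogate pair using the standard UTF-16 formula."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    v = cp - 0x10000            # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    grin = "\U0001F600"  # one character outside the Basic Multilingual Plane
    print(len(grin))                                    # 1 code point
    print([hex(u) for u in utf16_code_units(grin)])     # ['0xd83d', '0xde00']
    print([hex(u) for u in surrogate_pair(ord(grin))])  # ['0xd83d', '0xde00']
```

This is why the same character has a string length of 2 in Java and JavaScript: their `length` counts 16-bit code units, not characters.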

-----