Arc Forum | tree's comments
1 point by tree 4404 days ago | link | parent | on: Clarification about Character Sets

The only reason Unicode contains combined forms is for compatibility with existing standards: you cannot invent new code points representing a novel combination of base and combining characters. The Unicode normalization forms deal with these issues.
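To make the normalization point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `same_text` is just an illustration, not anything from Arc or MzScheme. It shows a precomposed character and its base-plus-combining equivalent comparing equal only after normalization, and a combination with no precomposed code point staying as two code points even in NFC.

```python
import unicodedata

precomposed = "\u00e9"   # 'é' as a single, precomposed code point
decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The raw strings differ code point by code point...
raw_equal = precomposed == decomposed

# ...but they are canonically equivalent once normalized.
normalized_equal = same_text(precomposed, decomposed)

# 'q' + combining acute has no precomposed code point, so NFC
# leaves it as two code points.
novel = unicodedata.normalize("NFC", "q\u0301")
```

A language that compares strings code point by code point would treat `precomposed` and `decomposed` as different, which is why normalization usually ends up as library support rather than something users do by hand.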

Unicode support is a complex issue: fundamentally there is the question of low-level character representation (e.g., the internal encoding), followed by library support for normalization and higher-level text-processing operations.
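As a rough illustration of why the low-level representation matters, the sketch below (Python again, with an arbitrary sample string of my own choosing) counts the same text three ways: as code points, as UTF-8 bytes, and as UTF-16 code units. Whichever one a language picks internally decides what "length" and "index" mean for strings.

```python
# Sample text: 'naïve' plus a space and U+1D11E MUSICAL SYMBOL G CLEF,
# which lies outside the Basic Multilingual Plane.
text = "na\u00efve \U0001d11e"

# Python 3 strings are sequences of code points.
code_points = len(text)

# UTF-8 uses 1 to 4 bytes per code point.
utf8_bytes = len(text.encode("utf-8"))

# UTF-16 uses one 16-bit unit per BMP code point and a surrogate
# pair (two units) for anything above U+FFFF.
utf16_units = len(text.encode("utf-16-le")) // 2
```

Here the three counts come out as 7, 11, and 8 respectively, so code written against one representation can silently break when the representation changes.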


8 points by tree 4404 days ago | link | parent | on: Clarification about Character Sets

If you don't deal with character sets up front then you too will spend a year getting Unicode (or whatever) right when the time comes, just like GvR (and others) did with Python.

Not using Arc because it lacks Unicode support is a bit silly, but it can be a show stopper: one of the reasons I never used Ruby and stayed with Python was that I needed support for non-Latin scripts.

Retrofitting this into a language is hard.


1 point by willchang 4404 days ago | link

GvR took a year because he didn't want to break old code. PG appears to have no compunctions about doing such a thing. And not only does that make perfect sense, but he also warned us. The moral of the story is, don't write a million-line application in Arc just yet.


2 points by tree 4404 days ago | link

Then to me this makes the whole thing a non-starter, unfortunately, because no one will want to write any non-trivial program in a language that could (will, by the creator's declaration!) change in incompatible ways in the future.

One example: generic collections in Java 5. Sun went out of their way to ensure compatibility with pre-generic collections, giving us type erasure. Bletcherousness for the sake of backwards compatibility.

Characters are such a fundamental part of a modern, general-purpose computer language that it seems short-sighted not to allow for dealing with the issue up front.

Honestly, though, it is early enough in the game that if people wanted to hash out the specification for Unicode support in Arc, it could be done. MzScheme characters are Unicode, aren't they? Build the definition on that foundation.


2 points by lojic 4404 days ago | link

Non-starter for your next production app maybe, but not a non-starter to code enough Arc to see how it compares to your other favorite languages so you can submit suggestions for improvement. If this mode of operation makes it easier to change the language for the better, I'm all for it :)

Eventually, backward compatibility will be very important, but having that too early just kills momentum IMO.