Arc Forum
3 points by aaco 4162 days ago | link | parent

I fail to see where Arc doesn't support Unicode, since it seems to me that Arc is just using MzScheme strings, which are just Unicode strings.

Can someone explain this to me?

Some examples:

  ; ◠ is a multi-byte Unicode char, but I guess it's escaped in this forum, so replace it with the correct character when testing.
  arc> (len "a◠b") ; Unicode
  arc> (len "axb") ; ascii
  arc> (coerce #\◠ 'int) ; Unicode
  arc> (coerce #\x 'int)  ; ascii
  arc> (subseq "a◠b" 1 2) ; Unicode
  arc> (subseq "axb" 1 2)  ; ascii

Where doesn't Arc support Unicode?!

3 points by olavk 4162 days ago | link

That just shows how agile PG is. He added Unicode support the minute he saw people request it! :)

Seriously, PG explicitly claims that Arc intentionally doesn't support anything but ASCII, so that might be why people (including me) believed that to be the case.


1 point by aaco 4162 days ago | link

Yes, I think Arc intentionally supports only ASCII simply to avoid dealing with Unicode issues for now.

Anyway, I can't see how Unicode can break in Arc. I'm not a Lisper, but I think you can't extract a single byte from an Arc string (since it's just an MzScheme string), only a single char. That's a different concept, because in Unicode one char can be encoded as 1, 2 or more bytes.
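
To make the char-versus-byte distinction concrete, here is a minimal sketch in Python rather than Arc. It is only an analogy: it assumes Python's `str` behaves like a character-based string, and that the bytes in question are UTF-8. The variable names are made up for illustration.

```python
# Illustrative sketch only: a Python str is a sequence of characters
# (code points), used here as a stand-in for a character-based string.
text = "a\u25e0b"  # 'a', U+25E0, 'b'

char_count = len(text)                  # number of characters
byte_count = len(text.encode("utf-8"))  # number of bytes once encoded as UTF-8

middle_char = text[1:2]        # slicing picks out whole characters
code_point = ord(middle_char)  # integer value of that character

# UTF-8 width in bytes for characters taken from different ranges
samples = ["x", "\u00e9", "\u25e0", "\U0001f600"]
widths = [len(ch.encode("utf-8")) for ch in samples]

print(char_count, byte_count, hex(code_point), widths)
```

The character count and the byte count differ for the same string, which is the point above: indexing by character never splits an encoded sequence.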


2 points by bobbane 4162 days ago | link

Watch out - that's single-portable-implementation thinking. When Paul puts out another release of Arc based on, say, another Scheme implementation, or SBCL, those tricks won't work.
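
As a hedged illustration of what could go wrong, here is a Python sketch (not Arc) comparing the same operations on a character string and on its raw UTF-8 bytes. It assumes a hypothetical implementation that stores strings as bytes. Nothing in this thread says a future Arc release would actually do that.

```python
# Hypothetical comparison: the same length and slice operations on a
# character string and on the UTF-8 bytes a byte-oriented runtime might keep.
text = "a\u25e0b"          # 'a', U+25E0, 'b'
raw = text.encode("utf-8")

char_len = len(text)  # counts characters
byte_len = len(raw)   # counts bytes

char_slice = text[1:2]  # the whole middle character
byte_slice = raw[1:2]   # only the first byte of a multi-byte sequence

# A lone leading byte is not valid UTF-8, so decoding it fails.
try:
    byte_slice.decode("utf-8")
    byte_slice_is_valid = True
except UnicodeDecodeError:
    byte_slice_is_valid = False

print(char_len, byte_len, byte_slice_is_valid)
```

On a byte-based representation, the `len` and `subseq` examples earlier in the thread would report different lengths and could cut a character in half.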