Arc Forum
4 points by aw 2383 days ago | link | parent

Not all binary sequences encode valid utf-8 characters.
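
To make that concrete, here is a minimal sketch in Python (not Arc or Racket; the helper name `is_valid_utf8` is made up for illustration). It assumes Python's strict UTF-8 decoder is a fair stand-in for "valid UTF-8" and checks a few byte sequences against it.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 under strict error handling."""
    try:
        data.decode("utf-8")  # strict mode raises on any malformed sequence
        return True
    except UnicodeDecodeError:
        return False


samples = {
    "ascii": b"abc",               # plain ASCII is always valid UTF-8
    "euro sign": b"\xe2\x82\xac",  # complete 3-byte sequence for U+20AC
    "lone 0xFF": b"\xff",          # 0xFF never appears in UTF-8
    "truncated": b"\xe2\x82",      # 3-byte sequence missing its last byte
    "overlong": b"\xc0\x80",       # overlong encoding of U+0000, not allowed
}

for name, data in samples.items():
    print(f"{name:10} {data.hex(' '):10} valid={is_valid_utf8(data)}")
```

Only the first two samples pass; the other three are byte strings that no sequence of Unicode characters encodes to.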


4 points by akkartik 2382 days ago | link

This was my first thought as well. But why does `readc` (Racket's `read-char`) silently accept invalid utf-8?

-----

5 points by rocketnia 2353 days ago | link

It's reading the invalid sequence as � U+FFFD REPLACEMENT CHARACTER, which translates back to UTF-8 as EF BF BD (as we can see in the actual results above). The replacement character is what Unicode offers for use as a placeholder for corrupt sequences in encoded Unicode text, just like the way it's being used here.
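
The same round trip can be sketched outside Racket. The snippet below is Python, not Arc, and it assumes Python's `errors="replace"` decoding mode is comparable to what the reader is doing here; the input bytes are made up for illustration.

```python
# A lone 0xFF byte between two ASCII letters: not valid UTF-8.
raw = b"a\xffb"

# Decode leniently: the malformed byte becomes U+FFFD instead of raising.
text = raw.decode("utf-8", errors="replace")

# Re-encode: U+FFFD itself is a valid character, encoded as EF BF BD.
round_tripped = text.encode("utf-8")

print([f"U+{ord(ch):04X}" for ch in text])
print(round_tripped.hex(" "))
```

The original byte is gone after the round trip: one invalid byte went in and the three bytes EF BF BD came out, so reading and then writing is lossy for invalid input.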

-----