2 points by raymyers 5878 days ago

Hmm. I just tried replacing the scanner. As you also found, it didn't help performance.

  (def scanner-string2 (s (o start 0) (o end (len s)))
    ; copy the scanner's characters into an ordinary list
    (map idfn (scanner-string s start end)))

I'll see if I can find time soon to digest the wiki-arc grammar. The grammar could possibly be optimized, but I suspect treeparse could be handling it better. There are probably lessons to be learned from how Parsec gets good performance.


1 point by almkglor 5876 days ago

Found the Parsec paper:

http://legacy.cs.uu.nl/daan/download/papers/parsec-paper.pdf

http://legacy.cs.uu.nl/daan/pubs.html#parsec

It seems that part of Parsec's optimization is that it is actually limited to LL(1) grammars (whatever that means), and its <|> (treeparse's alt) will fail immediately, without trying the second alternative, if the first parser consumed any input before failing. Not sure how that translates to treeparse.
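A minimal Python sketch of that committed-choice rule. It is not Arc, and it is not treeparse's or Parsec's actual code. Reply, char, seq and alt are hypothetical names chosen for illustration. Each parser reports whether it consumed input, and alt falls through to its second alternative only when the first one failed without consuming anything. Parsec itself models this with separate Consumed and Empty replies. The boolean flag here is a simplification of that idea.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Reply:
    ok: bool        # did the parser succeed?
    value: Any      # parsed value (None on failure)
    pos: int        # input position after this parser ran
    consumed: bool  # did the parser consume any input?


Parser = Callable[[str, int], Reply]


def char(c: str) -> Parser:
    """Match one literal character (hypothetical helper)."""
    def parse(text: str, pos: int) -> Reply:
        if pos < len(text) and text[pos] == c:
            return Reply(True, c, pos + 1, True)
        return Reply(False, None, pos, False)
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers in order and stop at the first failure."""
    def parse(text: str, pos: int) -> Reply:
        values, start = [], pos
        for p in parsers:
            r = p(text, pos)
            if not r.ok:
                # The failure counts as "consumed" if any earlier step moved forward.
                return Reply(False, None, r.pos, pos > start or r.consumed)
            values.append(r.value)
            pos = r.pos
        return Reply(True, values, pos, pos > start)
    return parse


def alt(p: Parser, q: Parser) -> Parser:
    """Committed choice: q is tried only if p failed without consuming input."""
    def parse(text: str, pos: int) -> Reply:
        r = p(text, pos)
        if r.ok or r.consumed:
            return r  # success, or failure after consuming: commit, no backtracking
        return q(text, pos)
    return parse
```

With this alt, alt(seq(char("a"), char("b")), seq(char("a"), char("c"))) rejects "ac". The first branch consumes the "a" before failing, so the second branch is never tried. The payoff is that the choice commits as soon as input is consumed, so the parser never has to rewind to where the alternative started.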

-----

1 point by raymyers 5876 days ago

LL(1) grammars only require one token of look-ahead to parse.

Parsec does not strictly require this. It can also handle grammars that need arbitrary look-ahead (by wrapping alternatives in try). However, for good performance it is best to use LL(1) grammars, since they require no backtracking.
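To make "one token of look-ahead" concrete, here is a hand-written recursive-descent sketch in Python for a hypothetical toy grammar, value := DIGIT | '(' value* ')'. It is not the wiki-arc grammar. Every decision is made by peeking at the next character only, so no branch is ever tried and then undone.

```python
DIGITS = "0123456789"


def parse_value(text: str, pos: int = 0):
    """Parse  value := DIGIT | '(' value* ')'  using one character of look-ahead.

    Returns (value, next_pos) and raises SyntaxError on bad input.
    """
    if pos >= len(text):
        raise SyntaxError("unexpected end of input")
    c = text[pos]  # the single look-ahead token decides which rule applies
    if c in DIGITS:
        return int(c), pos + 1
    if c == "(":
        items, pos = [], pos + 1
        # Peeking one character tells us whether another value follows or the list ends.
        while pos < len(text) and text[pos] != ")":
            item, pos = parse_value(text, pos)
            items.append(item)
        if pos >= len(text):
            raise SyntaxError("missing ')'")
        return items, pos + 1
    raise SyntaxError(f"unexpected {c!r} at position {pos}")
```

For example, parse_value("(1(23)4)") returns ([1, [2, 3], 4], 8). Because the two rules start with different characters, the parser never needs to backtrack.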

When using Parsec, I have often been surprised by the quick-failing behavior of <|> that you mentioned, so I did not duplicate it in treeparse. There, alt tries the next alternative even when the first one consumed input before failing.
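For contrast, here is a minimal Python sketch of a fully backtracking choice, the behavior described here for treeparse. This is not treeparse's actual code, and the names are hypothetical. Every alternative restarts from the same position. The cost is that the same input may be re-scanned once per alternative, and it must stay available until every alternative has been tried.

```python
from typing import Any, Callable, Optional, Tuple

Result = Optional[Tuple[Any, int]]  # (value, next_pos) on success, None on failure
Parser = Callable[[str, int], Result]


def char(c: str) -> Parser:
    """Match one literal character (hypothetical helper)."""
    def parse(text: str, pos: int) -> Result:
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers in order and fail if any of them fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Backtracking choice: each alternative starts again at the original
    position, no matter how much input earlier alternatives consumed."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse
```

With this alt, the grammar ab | ac accepts "ac". Under Parsec's <|>, the same grammar rejects "ac" unless the first branch is wrapped in try.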

-----

1 point by almkglor 5876 days ago

Hmm. Apparently it's the fast way of doing it, although I'm not sure how to implement it in the first place.

-----