Arc Forumnew | comments | leaders | submitlogin
2 points by Pauan 4359 days ago | link | parent

To fully explain how the system works, let's use this program:

  prn 10 20
    foo bar + 50 * 20 - 30
After tokenizing, we use the significant whitespace rules to wrap the stuff in lists:

  {prn 10 20
    {foo bar + 50 * 20 - 30}}
Now, for each list, we run a modified Pratt parser on it:

https://github.com/Pauan/nulan/blob/79ea2a9fee8cb1ea7640d8f5...

Yes the Pratt parser really is that tiny. Now, let's start with this list:

  {foo bar + 50 * 20 - 30}
We start parsing at a precedence level of 0. First, we take all the non-syntax rule symbols and put them into a list. Now we have these two lists:

  left:      {foo bar}
  symbol:    +
  remaining: {50 * 20 - 30}
The + syntax rule has a precedence of 70. Now we need to recursively call the Pratt parser with a precedence of 70:

  left:      {50}
  symbol:    *
  remaining: {20 - 30}
The * syntax rule has a precedence of 80, which is greater than 70, so it continues parsing:

  left:      {50}
  symbol:    *
  right:     {20}
  remaining: {- 30}
The - syntax rule has a precedence of 70, which is equal to 70, so it stops parsing. It now calls the action function for * with the arguments left, symbol, and right, which returns {* 50 20}

Now we go back to the + syntax rule, which looks like this:

  left:      {foo bar}
  symbol:    +
  right:     {* 50 20}
  remaining: {- 30}
It calls the action function for + with the arguments left, symbol, and right, which returns {foo {+ bar {* 50 20}}}:

  left:      {foo {+ bar {* 50 20}}}
  remaining: {- 30}
Now it continues parsing with a precedence of 0. - has a precedence of 70 which is greater than 0, so it recursively calls the parser with a precedence of 70:

  left:      {foo {+ bar {* 50 20}}}
  symbol:    -
  right:     {30}
  remaining: {}
It now calls the action function for - which returns {foo {- {+ bar {* 50 20}} 30}}

We now call the Pratt parser on the outer list with a precedence of 0:

  left:      {prn 10 20 {foo {- {+ bar {* 50 20}} 30}}}
  remaining: {}
There aren't any syntax rules, so it just returns it unmodified, and so the final answer is:

  {prn 10 20
    {foo {- {+ bar {* 50 20}} 30}}}
---

Pratt parsers are absolutely amazing. They're very very short, easy to implement, very fast, and incredibly flexible.

If you want to learn more, I recommend these links:

http://eli.thegreenplace.net/2010/01/02/top-down-operator-pr...

http://javascript.crockford.com/tdop/tdop.html

http://effbot.org/zone/simple-top-down-parsing.htm

https://groups.google.com/forum/?fromgroups=#!topic/comp.pro...

http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...

I can also clarify things further, and provide a simple stripped-down version of Nulan's parser, if that helps to understand.



2 points by Pauan 4358 days ago | link

I just made some changes to Nulan's parser. Now almost all the syntax is customizable:

https://github.com/Pauan/nulan/commit/b13e6477726cc111af4c23...

The changes are: renamed "action" to "parse", added in a "vertical" option, added in a "tokenize" function.

---

Now, significant whitespace is hardcoded, numbers are always 0-9, and symbols are anything that isn't a number or delimiter.

But everything else is customizable. Everything. Even comments, strings, and space/newline is customizable.

---

The new "tokenize" function lets you fairly easily add in new tokenizers, which is what string/comment/space/newline does.

---

The new "vertical" option is pretty cool too. It's currently only used by "|" and it means that this:

  foo bar
    | qux 1 2 3
    | corge 4 5 6
Is parsed like this:

  {foo bar
    | {qux 1 2 3
       corge 4 5 6}}
But the "|" syntax also specifies "separator", so it's further parsed to this:

  {foo bar
    | {{qux 1 2 3}
       {corge 4 5 6}}}
After running the "parse" functions, the final result is this:

  {foo bar
    {| {qux 1 2 3}
       {corge 4 5 6}}}
By the way, "|" is just like "do" in Arc, but I think it looks a lot nicer.

-----

1 point by akkartik 4358 days ago | link

Thanks! 'Pratt parsers' was precisely the phrase I was looking for :)

> > ..resolving function/macro calls to be anywhere in the list..

> Not quite.. This all happens before macros are expanded..

Yeah, that was exactly what I meant.

-----