Arc Forumnew | comments | leaders | submitlogin
1 point by akkartik 4164 days ago | link | parent

Very cool!

If I follow everything on that page, this allows you to programmatically control where lists are segmented, and to generalize infix, resolving function/macro calls to be anywhere in the list. In other words, macro foo doesn't have to look like

  (foo ...)
it can also look like

  (... foo ...)
Does this make the grammar context-sensitive? Does it often introduce ambiguity? Have you run into non-terminating parsing? Can you talk about how it compares with oMeta? (Are you sure you aren't greenspunning it? :)


2 points by Pauan 4164 days ago | link

"If I follow everything on that page, this allows you to programmatically control where lists are segmented [...]"

Something like that, yes.

---

"[...] resolving function/macro calls to be anywhere in the list. In other words, macro foo doesn't have to look like"

Not quite. The parser is very simple: it just pushes symbols around. Which means that when you use a syntax rule like this:

  $syntax-infix "foo"
Then the parser will rewrite this:

  1 foo 2
Into this:

  foo 1 2
This all happens before macros are expanded, so macros receive a uniform list representation. In this way, it's similar to wart's system, except that it's much more flexible and powerful.

---

"Does this make the grammar context-sensitive? Does it often introduce ambiguity? Have you run into non-terminating parsing?"

I have no idea, no, and no. It's all just simple list transformations, similar to what macros do.

---

"Can you talk about how it compares with oMeta? (Are you sure you aren't greenspunning it? :)"

I haven't use oMeta, but from what I understand, it uses something like PEG parsing, which is completely different.

Think of my system as being like macros: you take this list of symbols and that list of symbols and return this list of symbols.

The difference with macros is that my system has left/right associative, prefix/infix/suffix, precedence, and a slew of other options too.

The key insight is that unlike most syntax systems which return a single expression, Nulan's syntax system returns a list of expressions.

And then the syntax rules operate on this list, which effectively lets them look-behind and look-ahead arbitrarily many tokens, but only within the list.

PEG parsing lets you look-ahead as many tokens as you like, but not look-behind. Nulan's system supports both, but the amount of look-ahead/behind is controlled by the indentation, so everything is handled in a consistent way.

-----

2 points by Pauan 4164 days ago | link

To fully explain how the system works, let's use this program:

  prn 10 20
    foo bar + 50 * 20 - 30
After tokenizing, we use the significant whitespace rules to wrap the stuff in lists:

  {prn 10 20
    {foo bar + 50 * 20 - 30}}
Now, for each list, we run a modified Pratt parser on it:

https://github.com/Pauan/nulan/blob/79ea2a9fee8cb1ea7640d8f5...

Yes the Pratt parser really is that tiny. Now, let's start with this list:

  {foo bar + 50 * 20 - 30}
We start parsing at a precedence level of 0. First, we take all the non-syntax rule symbols and put them into a list. Now we have these two lists:

  left:      {foo bar}
  symbol:    +
  remaining: {50 * 20 - 30}
The + syntax rule has a precedence of 70. Now we need to recursively call the Pratt parser with a precedence of 70:

  left:      {50}
  symbol:    *
  remaining: {20 - 30}
The * syntax rule has a precedence of 80, which is greater than 70, so it continues parsing:

  left:      {50}
  symbol:    *
  right:     {20}
  remaining: {- 30}
The - syntax rule has a precedence of 70, which is equal to 70, so it stops parsing. It now calls the action function for * with the arguments left, symbol, and right, which returns {* 50 20}

Now we go back to the + syntax rule, which looks like this:

  left:      {foo bar}
  symbol:    +
  right:     {* 50 20}
  remaining: {- 30}
It calls the action function for + with the arguments left, symbol, and right, which returns {foo {+ bar {* 50 20}}}:

  left:      {foo {+ bar {* 50 20}}}
  remaining: {- 30}
Now it continues parsing with a precedence of 0. - has a precedence of 70 which is greater than 0, so it recursively calls the parser with a precedence of 70:

  left:      {foo {+ bar {* 50 20}}}
  symbol:    -
  right:     {30}
  remaining: {}
It now calls the action function for - which returns {foo {- {+ bar {* 50 20}} 30}}

We now call the Pratt parser on the outer list with a precedence of 0:

  left:      {prn 10 20 {foo {- {+ bar {* 50 20}} 30}}}
  remaining: {}
There aren't any syntax rules, so it just returns it unmodified, and so the final answer is:

  {prn 10 20
    {foo {- {+ bar {* 50 20}} 30}}}
---

Pratt parsers are absolutely amazing. They're very very short, easy to implement, very fast, and incredibly flexible.

If you want to learn more, I recommend these links:

http://eli.thegreenplace.net/2010/01/02/top-down-operator-pr...

http://javascript.crockford.com/tdop/tdop.html

http://effbot.org/zone/simple-top-down-parsing.htm

https://groups.google.com/forum/?fromgroups=#!topic/comp.pro...

http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...

I can also clarify things further, and provide a simple stripped-down version of Nulan's parser, if that helps to understand.

-----

2 points by Pauan 4164 days ago | link

I just made some changes to Nulan's parser. Now almost all the syntax is customizable:

https://github.com/Pauan/nulan/commit/b13e6477726cc111af4c23...

The changes are: renamed "action" to "parse", added in a "vertical" option, added in a "tokenize" function.

---

Now, significant whitespace is hardcoded, numbers are always 0-9, and symbols are anything that isn't a number or delimiter.

But everything else is customizable. Everything. Even comments, strings, and space/newline is customizable.

---

The new "tokenize" function lets you fairly easily add in new tokenizers, which is what string/comment/space/newline does.

---

The new "vertical" option is pretty cool too. It's currently only used by "|" and it means that this:

  foo bar
    | qux 1 2 3
    | corge 4 5 6
Is parsed like this:

  {foo bar
    | {qux 1 2 3
       corge 4 5 6}}
But the "|" syntax also specifies "separator", so it's further parsed to this:

  {foo bar
    | {{qux 1 2 3}
       {corge 4 5 6}}}
After running the "parse" functions, the final result is this:

  {foo bar
    {| {qux 1 2 3}
       {corge 4 5 6}}}
By the way, "|" is just like "do" in Arc, but I think it looks a lot nicer.

-----

1 point by akkartik 4164 days ago | link

Thanks! 'Pratt parsers' was precisely the phrase I was looking for :)

> > ..resolving function/macro calls to be anywhere in the list..

> Not quite.. This all happens before macros are expanded..

Yeah, that was exactly what I meant.

-----