Arc Forum | "If I follow everything on that page, this allows you to programmatically contro...

Arc Forum

new | comments | leaders | submit

2 points by Pauan 4573 days ago | link | parent

"If I follow everything on that page, this allows you to programmatically control where lists are segmented [...]"

Something like that, yes.

---

"[...] resolving function/macro calls to be anywhere in the list. In other words, macro foo doesn't have to look like"

Not quite. The parser is very simple: it just pushes symbols around. Which means that when you use a syntax rule like this:

  $syntax-infix "foo"

Then the parser will rewrite this:

  1 foo 2

Into this:

  foo 1 2

This all happens before macros are expanded, so macros receive a uniform list representation. In this way, it's similar to wart's system, except that it's much more flexible and powerful.

---

"Does this make the grammar context-sensitive? Does it often introduce ambiguity? Have you run into non-terminating parsing?"

I have no idea, no, and no. It's all just simple list transformations, similar to what macros do.

---

"Can you talk about how it compares with oMeta? (Are you sure you aren't greenspunning it? :)"

I haven't use oMeta, but from what I understand, it uses something like PEG parsing, which is completely different.

Think of my system as being like macros: you take this list of symbols and that list of symbols and return this list of symbols.

The difference with macros is that my system has left/right associative, prefix/infix/suffix, precedence, and a slew of other options too.

The key insight is that unlike most syntax systems which return a single expression, Nulan's syntax system returns a list of expressions.

And then the syntax rules operate on this list, which effectively lets them look-behind and look-ahead arbitrarily many tokens, but only within the list.

PEG parsing lets you look-ahead as many tokens as you like, but not look-behind. Nulan's system supports both, but the amount of look-ahead/behind is controlled by the indentation, so everything is handled in a consistent way.

2 points by Pauan 4573 days ago | link

To fully explain how the system works, let's use this program:

  prn 10 20
    foo bar + 50 * 20 - 30

After tokenizing, we use the significant whitespace rules to wrap the stuff in lists:

  {prn 10 20
    {foo bar + 50 * 20 - 30}}

Now, for each list, we run a modified Pratt parser on it:

https://github.com/Pauan/nulan/blob/79ea2a9fee8cb1ea7640d8f5...

Yes the Pratt parser really is that tiny. Now, let's start with this list:

  {foo bar + 50 * 20 - 30}

We start parsing at a precedence level of 0. First, we take all the non-syntax rule symbols and put them into a list. Now we have these two lists:

  left:      {foo bar}
  symbol:    +
  remaining: {50 * 20 - 30}

The + syntax rule has a precedence of 70. Now we need to recursively call the Pratt parser with a precedence of 70:

  left:      {50}
  symbol:    *
  remaining: {20 - 30}

The * syntax rule has a precedence of 80, which is greater than 70, so it continues parsing:

  left:      {50}
  symbol:    *
  right:     {20}
  remaining: {- 30}

The - syntax rule has a precedence of 70, which is equal to 70, so it stops parsing. It now calls the action function for * with the arguments left, symbol, and right, which returns {* 50 20}

Now we go back to the + syntax rule, which looks like this:

  left:      {foo bar}
  symbol:    +
  right:     {* 50 20}
  remaining: {- 30}

It calls the action function for + with the arguments left, symbol, and right, which returns {foo {+ bar {* 50 20}}}:

  left:      {foo {+ bar {* 50 20}}}
  remaining: {- 30}

Now it continues parsing with a precedence of 0. - has a precedence of 70 which is greater than 0, so it recursively calls the parser with a precedence of 70:

  left:      {foo {+ bar {* 50 20}}}
  symbol:    -
  right:     {30}
  remaining: {}

It now calls the action function for - which returns {foo {- {+ bar {* 50 20}} 30}}

We now call the Pratt parser on the outer list with a precedence of 0:

  left:      {prn 10 20 {foo {- {+ bar {* 50 20}} 30}}}
  remaining: {}

There aren't any syntax rules, so it just returns it unmodified, and so the final answer is:

  {prn 10 20
    {foo {- {+ bar {* 50 20}} 30}}}

---

Pratt parsers are absolutely amazing. They're very very short, easy to implement, very fast, and incredibly flexible.

If you want to learn more, I recommend these links:

http://eli.thegreenplace.net/2010/01/02/top-down-operator-pr...

http://javascript.crockford.com/tdop/tdop.html

http://effbot.org/zone/simple-top-down-parsing.htm

https://groups.google.com/forum/?fromgroups=#!topic/comp.pro...

http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-e...

I can also clarify things further, and provide a simple stripped-down version of Nulan's parser, if that helps to understand.

-----

2 points by Pauan 4573 days ago | link

I just made some changes to Nulan's parser. Now almost all the syntax is customizable:

https://github.com/Pauan/nulan/commit/b13e6477726cc111af4c23...

The changes are: renamed "action" to "parse", added in a "vertical" option, added in a "tokenize" function.

---

Now, significant whitespace is hardcoded, numbers are always 0-9, and symbols are anything that isn't a number or delimiter.

But everything else is customizable. Everything. Even comments, strings, and space/newline is customizable.

---

The new "tokenize" function lets you fairly easily add in new tokenizers, which is what string/comment/space/newline does.

---

The new "vertical" option is pretty cool too. It's currently only used by "|" and it means that this:

  foo bar
    | qux 1 2 3
    | corge 4 5 6

Is parsed like this:

  {foo bar
    | {qux 1 2 3
       corge 4 5 6}}

But the "|" syntax also specifies "separator", so it's further parsed to this:

  {foo bar
    | {{qux 1 2 3}
       {corge 4 5 6}}}

After running the "parse" functions, the final result is this:

  {foo bar
    {| {qux 1 2 3}
       {corge 4 5 6}}}

By the way, "|" is just like "do" in Arc, but I think it looks a lot nicer.

-----

1 point by akkartik 4573 days ago | link

Thanks! 'Pratt parsers' was precisely the phrase I was looking for :)

> > ..resolving function/macro calls to be anywhere in the list..

> Not quite.. This all happens before macros are expanded..

Yeah, that was exactly what I meant.

-----