Arc Forumnew | comments | leaders | submitlogin
1 point by shader 4061 days ago | link | parent

I'm actually fairly convinced that my original idea as stated above wouldn't work, because without specific syntactical support already built in for whatever new reader you're trying to add, the original reader would have no good way of knowing when your macro ended, i.e.:

  (re /\)*/)
How does the original reader know to pass all of "/\)*/" in to the 're macro? Maybe there's some clever way to tell the difference between "escaped" special characters and normal ones, but it would limit the possibilities on the temp reader.

Maybe one option would be to have more flexibility when defining macros in the first place by taking advantage of the fact that macros can be defined and evaluated at the reader stage, so they can specify their own read semantics for their evaluation if they choose. I.e. make it so that macro definitions can specify a more generic level of control than just unevaluated, pre-parsed s-exps. That would make them more of scoped 'reading macros' that shadow the existing reader, rather than 'reader macros' that just hook into it.

3 points by rocketnia 4060 days ago | link

Your train of thought is very similar to a factor in several of my designs over time: Jisp[1], Blade[2], Penknife[3], Chops[4], and now the syntax I'm planning to use with Era[5].

I've always used use bracket nesting to determine where the syntax's body begins and ends. This way, most code requires no escape sequences, and the few times escape sequences are necessary, at least they stand a chance of being consistent when cutting-and-pasting across staging levels.

  (re /\)*/)       ; Broken.
  (re /(?#()\)*/)  ; Fixed with a regex comment. (Does this work?)
  (re /\>*/)       ; Fixed by redesigning the escape sequence.
Reader macros are a different approach, where the entire rest of the file/stream has unknown syntax until the reader reaches that point. That's simple in its own way, but I prefer not to make my code that lopsided in its semantics I guess. :)

(EDIT: Half an hour after I posted this, I realized I had a big mess of previous drafts tacked onto the end. I've deleted those now.)


[1] Jisp was one of the first toy languages I made, and it was before I programmed in Arc (or any other lisp). When it encounters (foo a b c), it resolves "foo" to an operator and sends "a b c" to that operator.

  > (if (eq "string (doublequote string)) (exit) 'whoops)
  [The REPL terminates.]
[2] Blade didn't get far off the ground, but it was meant to have a similar parser, with the explicit goal of making it easy to combine several languages into a single compiled program, with an extra emphasis on having no accidental code-order-dependent semantics at the top level. I switched to square brackets-- [foo a b c] --since these didn't require the shift key.

      import com.rocketnia.blade.*
      define( [ "out", "sample-var" ], BladeString.of( "sample-val" ) )
[3] Penknife was meant to be a REPL companion to Blade, and I developed the syntax into a more complicated, Arc-like combination of infix and prefix syntaxes ( Penknife was complete enough that I used it as a static site generator for a while. However, at this point I realized all the custom syntax processing I was doing was really killing the compile-time performance.

  arc.prn.q[Defining fn-iferr.]
  [fun fn-iferr [body catch]
    [defuse body [hf1:if-maybe err
  arc.prn.q[Defining iferr.]
  [mac iferr [var body catch]
    qq.[fn-iferr tf0.\,body [tf [\,var] \,catch]]]
[4] Chops is a JavaScript library that achieves Blade-like parsing without any goals for infix treatment or general-purpose programming. I use it as a markup language and a JavaScript preprocessor now that my static site generator runs in the browser.

  $rc.rcPage( "/", $cg.parseIn( [str RocketN[i I]] ),
      "19-Nov-2012", $cg.parseIn( [str 2005[en]2010, 2012] ),
      { "title": ", Virtual Index of Ross Angle",
          "breadcrumbs": $cg.parseIn(
              [str RocketN[i I] Virtual Index of Ross Angle] ) },
      $cg.parse( [str
  ((This is the open source version of my site.
  [out The online version] has a bit more
      ] ) )
[5] Era is a module system I'm making, and I intend to compile to those modules from a lisp-like language. I've switched back to parentheses-- (foo a b c) --because smartphone keyboards tend to omit square brackets. The code (foo a b c) parses as a four-element list of symbols in the hope of more efficient processing, but the code foo( a b c) parses as a single symbol named "foo( a b c)".


1 point by akkartik 4060 days ago | link

The bouncing between parens and square brackets is interesting ^_^ I weakly feel it's not worth optimizing for what's easy to type because things can change (or be changed, with keybindings, etc.) so easily. Better to optimize for how things look. But even there, parens vs brackets is in the eye of the beholder.


1 point by akkartik 4060 days ago | link

"I've always used use bracket nesting to determine where the syntax's body begins and ends."

I have no idea what you mean by bracket nesting, or by staging levels. It also wasn't clear what the escape sequence is in the third example.


3 points by rocketnia 4060 days ago | link

"bracket nesting"

Whoops, I can't believe I didn't use the phrase "balanced brackets" instead. ^_^

The following two pieces of text may be similar, but I'd give them significantly different behavior as code:

  (foo a b (bar c d) (baz e) f)
  (foo a b bar c d) (baz e f)
My systems don't provide any way to pass the string "a b bar c d) (baz e f" to an operator.


"staging levels"

Staged programming is where a program generates some code to run later on, perhaps as a second program in some sense--especially if that boundary is enforced by a need to serialize, transmit, or sandbox the second program rather than executing it here and now. Staged programming has some implications for syntax, since it's valuable to be able to see the code we're generating.

Most languages use " to denote the beginning and end of a string, so they can't also use " to represent the character " inside the string. This can makes it frustrating to nest code within code. I'll use a JavaScript example.

  > eval( "eval( \"1 + 2\" )" )
  > eval( "eval( \"eval( \\\"eval( \\\\\\\"1 + 2\\\\\\\" )\\\" )\" )" )
While all these stages are JavaScript code, they all effectively use different syntax. It's not easy to copy and paste code from one stage to another.

Suppose we identify the end of the string by looking for a matching bracket, possibly with other pairs of matched brackets in between. I'll use ~< and >~ as example string brackets.

  > eval( ~<eval( ~<1 + 2>~ )>~ )
  > eval( ~<eval( ~<eval( ~<eval( ~<1 + 2>~ )>~ )>~ )>~ )
This fixes the issue. The same code is used at every level.

In JavaScript, technically we can implement delimiters like these if we're open-minded about what a delimiter looks like. We just need a function str() that turns a first-class string into a string literal.

  > str( "abc" )

   Open string:  " + str( "
  Close string:  " ) + "

  > eval( "eval( " + str( "1 + 2" ) + " )" )
  > eval( "eval( " + str( "eval( " + str( "eval( " + str( "1 + 2" ) + " )" ) + " )" ) + " )" )
Now the code is consistent! Consistently infuriating. :-p

In Arc, we delimit code using balanced ( ). The code isn't a string this time, but the use of balanced delimiters has the same advantage.

  > (eval '(eval '(eval '(eval '(+ 1 2)))))
This advantage is crucial in Arc, because any code that uses macros already runs across this issue. A macro call takes an s-expression, which contains a macro call, which takes an s-expression....

Since we're now talking about macros that take strings as input, let's see what happens if Arc syntax is based on strings instead of lists.

  > (let total 0 (each x (list 1 2 3) (++ total x)) total)

  > "let total 0 \"each x \\\"list 1 2 3\\\" \\\"++ total x\\\"\" total"
If we use balanced ( ) to delimit strings, we're back where we started, at least as long as we don't look behind the curtain.

  > (let total 0 (each x (list 1 2 3) (++ total x)) total)
If you want working code for a language like this, look no further than Penknife. :)


"It also wasn't clear what the escape sequence is in the third example."

Are you talking about this one?

  (re /\>*/)
The original code would be broken in my approach because it uses ) in the middle of the regex, so the macro's input would stop at "/\". This fix addresses the issue by using a hypothetical escape sequence \> to match a right parenthesis, rather than using the standard escape sequence \).

If you're talking about my Penknife code sample, the "qq." part is quasiquote, and the \, part is unquote. Quasiquotation is relatively complicated here due to the fact that it generates soup, which is like a string with little pieces floating in it. :-p Penknife has no s-expressions, so it was either this foundational kludge or the hard-to-read use of manual AST constructors.

It's hard to count these examples with a whole number. XD Let me know if you were talking about my Blade code sample (the third code block in the post) or my Jisp code sample (third if you count the two example regex fixes separately).


1 point by akkartik 4060 days ago | link

Super useful, thanks. Yes, you correctly picked the code I was referring to :)

That issue with backslashes you're referring to, I've heard it called leaning toothpick syndrome.


2 points by rocketnia 4060 days ago | link

"Yes, you correctly picked the code I was referring to :) "

Which of my guesses was correct? :-p


"That issue with backslashes you're referring to, I've heard it called leaning toothpick syndrome."

Nice, I hadn't heard of that!

It might be worth pointing out that the LTS appearing in my examples is more pronounced than it needs to be. The usual escape sequence \\ for backslashes creates ridiculous results like \\\\\\\". If we use \- to escape \ we get the more reasonable \--" instead, and then we can see the nonuniform nesting problem without that other distraction:

  > eval( "eval( \"1 + 2\" )" )
  > eval( "eval( \"eval( \-"eval( \--"1 + 2\--" )\-" )\" )" )
Here's another time I talked about this:


2 points by dido 4061 days ago | link

I ran into a similar issue with regex syntax when attempting to incorporate it into Arcueid's reader. There seems to be no easy way to parse a regular expression using Perl-like /.../ syntax, not if you also want symbols that use /'s for other things, e.g. the division function. Arcueid thus uses for now, r/.../ for regular expressions, and that syntax could be more easily distinguished from other legitimate uses of symbols with a minimum of fuss.


1 point by akkartik 4061 days ago | link

Wart's tokenizer already knows about backslashes inside strings, so "\"" becomes one token. It seems plausible to try tokenizing in a backslash-aware way everywhere and not just inside strings. Other than that you would have to treat slashes as a delimiter like double-quotes.

It might be an ugly design, but I think it would work, and it would be worth trying out.