Arc Forum | html.arc: another approach

6 points by evanrmurphy 5479 days ago | 22 comments

html.arc has certainly been useful to me, but two things about it have sometimes been a bother:

1. Using 'tag, 'gentag etc. as the basic building blocks breaks the elegant isomorphism s-expressions have with HTML.

2. The way attributes are handled (i.e. only valid if they're in the table) requires too much micromanagement of the attributes table.

I've taken a stab at an html.arc that addresses these concerns, though I've inevitably created new ones in the process. The way this works is that an 'html macro serves as a keyword for entering an environment where code is treated as HTML instead of as normal Arc:

  ; return values will be omitted for readability, 
  ; since we're printing to stdout

  arc> (html ((a href "http://arclanguage.org/") "Arc Forum"))
  <a href="http://arclanguage.org/">Arc Forum</a>

  arc> (html (html
               (head
                 ((script src "arc.js")))
               (body
                 "hello world!")))
  <html><head><script src="arc.js"/></head><body>hello world!</body></html>

Invoking the 'html macro causes the cars of included conses to be treated as tags with their cdrs as tag bodies:

  arc> (html (some-tag "and its body"))
  <some-tag>and its body</some-tag>

If the cadr is a string (as above), it's printed as literal text between the corresponding start and end tags. If it's another cons, then the process recurses and you get nested tags:

  arc> (html (1st-tag (2nd-tag (3rd-tag "3rd-tag's body"))))
  <1st-tag><2nd-tag><3rd-tag>3rd-tag's body</3rd-tag></2nd-tag></1st-tag>

Note that you just need the string literal "3rd-tag's body", rather than (pr "3rd-tag's body") as the original html.arc would require.

To generate attributes, you only need to make the car a list instead of an atom; the first element will be treated as the tag name and the rest as a series of attribute pairs:

  arc> (html ((mytag attr1 "attr1's val" attr2 "attr2's val")))
  <mytag attr1="attr1's val" attr2="attr2's val"/>

And as the above illustrates, when there is no cdr, an empty tag is generated (i.e. <mytag/>) instead of a start-tag/end-tag pair (<mytag></mytag>).

That just about covers basic usage for this html.arc. Now just a few words on the more interesting (and problematic) stuff. As I said before, the 'html macro invokes a new environment where everything is considered HTML. If you want to escape into Arc and have your functions and macros back, use the 'arc keyword:

  arc> (html (html
               (body
                 (arc (if (is 1 1)   (pr "then it all makes sense")
                          (is 1 nil) (pr "then I'm awfully confused"))))))
  <html><body>then it all makes sense</body></html>

How about defining macros, a la 'whitepage and 'link from the original html.arc? You could call the 'html macro within each of those macro definitions, but then they're only good as top-level tags or if you escape into arc each time you use them. To help address the problem, I provide the 'html-mac facility for defining HTML-specific macros. For example, we define 'link,

  (html-mac link (text (o dest text))
    `((a href ,dest) ,text))

which then acts as an exception to the regular tag-processing algorithm:

  arc> (html (link "Arc Forum" "http://arclanguage.org"))
  <a href="http://arclanguage.org">Arc Forum</a>

(If 'link hadn't been predefined with html-mac, it would have printed as:

  <link>Arc Forumhttp://arclanguage.org</link>  )
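
As a further illustration, here is a sketch of a second HTML macro written the same way. The name 'image and its arguments are made up for this example and are not part of the posted source; it assumes the 'html-mac and 'htmlf definitions given below are loaded.

  ; hypothetical html-mac: expands to an empty <img> tag with two attributes
  (html-mac image (src (o alt ""))
    `((img src ,src alt ,alt)))

  arc> (html (image "logo.png" "Logo"))
  <img src="logo.png" alt="Logo"/>

Because the expansion puts a list in the car position with nothing after it, 'htmlf takes the empty-tag branch rather than printing a start-tag/end-tag pair.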

This html.arc lacks some of the refinements of the original, but I am enjoying being free from the attributes table and able to write in something that looks more like HTML. I'm most uncertain about: this design choice of essentially creating another namespace; the kludginess surrounding having to start every page with "(html (html ..."; and needing the 'arc keyword to escape into Arc proper. Looking forward to everyone's feedback and hoping this might be useful to some of you.

Complete source is posted below, and here's a good related thread: http://arclanguage.org/item?id=5565

  ;; html.arc: another approach

  (def cdar (xs)
    (cdr (car xs)))
  
  ; string escaping utils, mostly for use in conjunction with
  ; js.arc and arc.js from http://arclanguage.org/item?id=11918

  (let html-nestlev 0
  
    (def html-q ()
      (if (is html-nestlev 0)
          (pr #\")
          (pr "&quot;"))
      
      ; how to warn without muddying stdout?
  
      ;(if (> html-nestlev 1)
      ;    (warn "maximum html nest level exceeded")) 
  
      )
  
    (def html-openq ()
      (html-q)
      ++.html-nestlev)
  
    (def html-closeq ()
      --.html-nestlev
      (html-q)))
  
  (mac html-w/quotes body
    `(do (html-openq)
         ,@body
         (html-closeq)))
  
  (def attrs (as)
    (each a pair.as
      (pr #\ car.a #\=)
      (html-w/quotes
        (htmlf cadr.a))))
  
  (def start-tag (t . as)
    (pr #\< t)
    attrs.as
    (pr #\>))
  
  (def end-tag (t)
    (pr #\< #\/ t #\>))
  
  (def empty-tag (t . as)
    (pr #\< t)
    attrs.as
    (pr #\/ #\>))
  
  (def tag (t as . body)
    (apply start-tag t as)
    (if (acons car.body)
         (apply htmlfs body)
        (apply pr body))
    (end-tag t))

  ; the html-macros are kept in this table
  
  (= html-macs* (table))
  
  (mac html-mac (name args . body)
    `(= (html-macs* ',name) (fn ,args (htmlf ,@body))))

  ; 'link is just the only one I've bothered to define so far
  
  (html-mac link (text (o dest text))
    `((a href ,dest) ,text))

  ; program's central function  

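  (def htmlf (s)
    (if no.s                   nil                     ; nothing to print
        atom.s                 pr.s                    ; literal text
        (caris s 'arc)         (apply eval cdr.s)      ; escape into Arc
        ; jsfs is not defined in this file; presumably it comes from js.arc
        (caris s 'js)          (apply jsfs cdr.s)
        (html-macs* car.s)     (apply (html-macs* car.s) cdr.s)
        ; car is a list: (tag attr val ...)
        (acons car.s)          (if (no cdr.s)
                                    (apply empty-tag caar.s cdar.s)
                                   (apply tag caar.s cdar.s cdr.s))
        ; otherwise car is a bare tag name with no attributes
                               (if (no cdr.s)
                                    (apply empty-tag car.s nil)
                                   (apply tag car.s nil cdr.s))))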
  
  (def htmlfs args
    (each a args
      htmlf.a))

  ; entry point
  
  (mac html args
    `(apply htmlfs ',args))

3 points by shader 5479 days ago | link

That looks pretty good. The implementation seems relatively straightforward and short, and the value is pretty high. I recommend however that you look at SXML for the syntax, as opposed to reinventing the wheel. http://en.wikipedia.org/wiki/SXML

For the (html (html issue, I would consider instead making a more general 'sxml macro, and then several helper macros such as 'html which automatically generate boilerplate doctypes, etc.
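
A rough sketch of what such a helper could look like on top of the posted source. The name 'html-page and the choice of doctype are assumptions made for illustration only; the post's 'html macro is assumed to be loaded.

  ; hypothetical helper: prints a doctype, then wraps the body in <html>
  (mac html-page body
    `(do (prn "<!DOCTYPE html>")
         (html (html ,@body))))

  arc> (html-page (body "hello world!"))
  <!DOCTYPE html>
  <html><body>hello world!</body></html>

This hides the doubled "(html (html" behind a single name without changing how the tags themselves are processed.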

Also, why are some of your expressions wrapped in an extra layer of parens? i.e. ((script src "arc.js")). Is this so that you can differentiate the body vs attributes? If it is, I would recommend sxml syntax for attributes instead.

-----

2 points by evanrmurphy 5479 days ago | link

Thanks for the link and feedback.

> Also, why are some of your expressions wrapped in an extra layer of parens? i.e. ((script src "arc.js")). Is this so that you can differentiate the body vs attributes?

Yes, exactly. I can see now that they're using the @ symbol to differentiate attribute lists, but it's interesting to note that they actually end up using more parens. Unless I've misunderstood, where my program would have

  ((a href "http://arclanguage.org" onclick "alert();") "click here")

SXML would have

  (x:a (@ (href "http://arclanguage.org") (onclick "alert();")) "click here")

In light of this, could you elaborate on your recommendation to adopt SXML's syntax? Do you find it somehow superior or does it have more to do with adopting a standard?

> For the (html (html issue, I would consider instead making a more general 'sxml macro, and then several helper macros such as 'html which automatically generate boilerplate doctypes, etc.

Sounds like a wise path of generalization to carry out over the long run.

-----

1 point by shader 5479 days ago | link

You're right, I hadn't noticed how many parens SXML actually uses. I guess I just preferred more of a tree style "tag owns attributes" than the idea of applying a tag to its body.

I originally liked the idea of the standard, but SXML probably supports far more than we need for now. How about a compromise?

  (x:a (@ href "http://arclanguage.org" onclick "alert();") "click here")

This uses arc's style of "leave out parens for grouping pairs" like you were doing, but also doesn't require the attribute parens if you don't have attributes. This seems to make the tag style more homogeneous.

-----

3 points by rocketnia 5479 days ago | link

Why use parentheses at all?

  (x:a href "http://arclanguage.org" onclick "alert();"
    "click here")

The body can be distinguished from the attribute-value pairs thanks to either the fact that it doesn't begin with a symbol or the fact that there's nothing to pair it with.
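
One way the splitting could work, as a minimal sketch only: the function name 'split-attrs is hypothetical, and it assumes attribute names are always symbols and that a body never starts with a symbol followed by another item.

  ; returns (attrs body): leading symbol/value pairs, then everything else
  (def split-attrs (xs)
    (if (and (isa car.xs 'sym) cdr.xs)
        (let (as body) (split-attrs cddr.xs)
          (list (cons car.xs (cons cadr.xs as)) body))
        (list nil xs)))

  arc> (split-attrs '(href "http://arclanguage.org" onclick "alert();" "click here"))
  ((href "http://arclanguage.org" onclick "alert();") ("click here"))

The cdr.xs check covers the "nothing to pair it with" case, so a lone trailing item is always treated as body.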

-----

2 points by evanrmurphy 5479 days ago | link

Wow, neat idea! I'm looking into it right now...

And Arc continues on its quest to become the most parens-frugal Lisp there ever was. :P

Question: Do y'all like the x:a? It seems kludgy to me and I'd rather just do a, but maybe I'm missing the point...

-----

2 points by rocketnia 5479 days ago | link

> Wow, neat idea! I'm looking into it right now...

Oh, I guess sml.arc already allows for that syntax. XD I should have taken a closer look....

Anyway, in case it helps, Lathe has a utility called parse-magic-withlike defined here: http://github.com/rocketnia/lathe/blob/master/arc/modules/mo...

It does save a single pair of parentheses every once in a while, but it takes half a page of comments to explain comprehensively. :-p Then again, it's meant for Arc macros in general, so some of the idiosyncrasies might disappear if it's modified for a specific purpose like this one.

> Do y'all like the x:a?

I thought the "x:a" was just an abbreviation for things like "w/html:a", "html:a", "tohtml:a", and "sml:a", substituting whatever you decided the macro name would be. What alternative are you thinking about?

-----

2 points by evanrmurphy 5479 days ago | link

> I thought the "x:a" was just an abbreviation for things like "w/html:a", "html:a", "tohtml:a", and "sml:a", substituting whatever you decided the macro name would be.

The "x:a" originally came from the example at http://en.wikipedia.org/wiki/SXML#Example, where I think it had another meaning - maybe something to do with XHTML... anyway, it's probably not important. You clarified that I wasn't missing something in the conversation (unless we both are ;).

-----

2 points by shader 5478 days ago | link

I think in the x:a syntax, the x part is supposed to denote the xml namespace in which the tag is defined. It's used in things like xpath.

-----

1 point by rocketnia 5478 days ago | link

I think so too. The x is defined here at the beginning of the example:

  (*TOP* (@ (*NAMESPACES* (x "http://www.w3.org/1999/xhtml")))
   ...

There's an okay introduction to namespace usage in XML and SXML here: http://www196.pair.com/lisovsky/xml/ns/

-----

1 point by evanrmurphy 5479 days ago | link

> This uses arc's style of "leave out parens for grouping pairs" like you were doing, but also doesn't require the attribute parens if you don't have attributes.

I like that compromise and will definitely consider it. To be sure though, my current implementation doesn't require the attribute parens for nil attributes either:

  arc> (html (h1 "Some heading"))
  <h1>Some heading</h1>
  arc> (html (script "someFunction();"))
  <script>someFunction();</script>

In these cases, my html.arc and the compromise converge. It's for the non-nil attributes case that they differ, and I think it comes down to which you dislike less: the tag name in the caar position or the @ symbols.

-----

1 point by shader 5479 days ago | link

Yes, but your current method seems to lack consistency (at least to me).

In some cases, (script fn "somefn();") means script with attributes, in others it means that it has a body. The difference is where the (script ...) is located; at the car of a list or not. Personally, I like the consistency provided by "anything that's not an @ list is body" which means that you don't have to pay as much attention to the number and layering of parens.

-----

3 points by fallintothis 5479 days ago | link

> how to warn without muddying stdout?

If you mean redirecting to, say, stderr:

  (w/stdout (stderr)
    (if (> html-nestlev 1)
        (warn "maximum html nest level exceeded")))

-----

2 points by evanrmurphy 5479 days ago | link

I think that will do it, thanks!

-----

3 points by shader 5479 days ago | link

I'd be interested to know what you think of almkglor's 'w/html. It was mentioned on the related thread you posted. The interesting feature is that all tag symbols are quoted, so you don't need a php-style escape for arc code.

-----

3 points by evanrmurphy 5479 days ago | link

I'm quite impressed, actually! In addition to the name (which in itself beats my (html (html problem), the quoting system seems smart. I'm going to have to mull this over more.

Another punctuation possibility I've considered that's different from both of these is quasi-quotation,

  (tohtml     ; another decent name
    `(body
      (h1 "Simple Math")
      (p "1 plus 1 equals " ,(+ 1 1))))

  ; would generate:
  ; <body><h1>Simple Math</h1><p>1 plus 1 equals 2</p></body>nil

which would seem like a very natural way to escape into Arc. Not only does it look better, but it could get me out of using 'eval in that 'htmlf function. In fact, the more I write about this, the better it sounds. I think I'll give quasi-quotation a shot.
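
For what it's worth, the 'htmlf function in the posted source already takes a plain list, so a function-based version can be sketched in a couple of lines. This is only an illustration: 'tohtml is not defined anywhere in the posted code, and return values are omitted as in the examples above.

  ; hypothetical function-based entry point built on the posted 'htmlf
  (def tohtml (x)
    (htmlf x))

  arc> `(p "1 plus 1 equals " ,(+ 1 1))
  (p "1 plus 1 equals " 2)

  arc> (tohtml `(p "1 plus 1 equals " ,(+ 1 1)))
  <p>1 plus 1 equals 2</p>

The unquoted expression is evaluated by Arc before 'tohtml ever sees the list, which is why no 'eval is needed for this kind of escape.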

-----

3 points by shader 5479 days ago | link

I think that

  `(body
    (h1 "Simple Math")
    (p "1 plus 1 equals " ,(+ 1 1)))

and

  ('body
    ('h1 "Simple Math")
    ('p "1 plus 1 equals " (+ 1 1)))

are probably equivalent, though I don't know how quoting affects strings. Odds are the same system could handle both styles, which means that programmers could use whichever they prefer. Personally I like the 'tag notation for tags, but hopefully it would be flexible enough to handle both.

What's the current status of whtml.arc? I can't look at the repo right now. Does it look like something that could be used as is?

-----

1 point by evanrmurphy 5479 days ago | link

> probably equivalent

I agree that they seem close and are probably one macro apart from behaving the same, but they must be different since the latter throws

  Error: "Can't coerce  body fn"

while the former returns a list. I think its equivalent normal-quoted expression would have to be something more like:

  (list 'body
    (list 'h1 "Simple Math")
    (list 'p "1 plus 1 equals " (+ 1 1)))

I haven't found whtml at http://github.com/nex3/arc/tree/master/lib/ yet. I did however just stumble upon sml.arc by akkartik, which appears to be super relevant to this discussion.

-----

1 point by shader 5479 days ago | link

It looks like sml supports both '@ based attributes and also a form like

  (tag attr1 "value" attr2 "value"
    (bodyless-tag)
    (tag-w/body "body"))

It looks like it's function based and works on simple lists, so it would probably be conducive to the quasiquote form of escaping to arc. Anything seem to be missing from sml.arc that you wanted in your new version? Personally I think the interface could use some work, but that's about it.

-----

1 point by shader 5479 days ago | link

Hmm. I'll have to see if I can find it elsewhere. I guess that w/html must have been a macro that bent quotes to its own purpose, whereas marcup or your 'tohtml merely take lists as input.

sml.arc sounds familiar; I vaguely remember someone else going through a discussion similar to ours, in which SXML was also mentioned.

-----

1 point by akkartik 5462 days ago | link

I shouldn't get any credit for jazzdev's sml.arc: http://github.com/nex3/arc/commits/master/lib/sml.arc

-----

1 point by evanrmurphy 5461 days ago | link

Thanks for clarifying the authorship.

-----

2 points by shader 5479 days ago | link

I'd like to add the comment that I think everyone who has gone to the trouble of writing useful code should put it somewhere accessible, such as GitHub or Anarki, so that we can continue to find it later even after the article is buried.

Several times the only discoverable implementation of useful code is the discussion on the forum, which can be quite difficult to find later, and also challenging to get to in a way that's easy to use.

-----