Arc Forumnew | comments | leaders | submitlogin
Revisiting + vs join
1 point by akkartik 4516 days ago | 32 comments
I ended up at http://arclanguage.org/item?id=13023 after some googling around, and realized I have an addition to make to the + vs join argument:

a) I'm mostly on the side of the majority wrt this idea (sorry waterhouse). Punning functions for different meanings is fine if done tastefully. In particular, + for concatenation is fine.

b) However, it interacts poorly with infix. The infix version:

  ('(1 2 3) + '(4 5))
is far inferior to:

  (+ '(1 2 3) '(4 5))
My general heuristic is to not use infix if parens or spaces or underscores are involved. That just makes the operator hard to spot.

Given that I want to resist the temptation to use infix + on lists and strings, I'm inclined to continue supporting join, and only use + for the easiest case:

  (join '(1 2 3)
        '(4 5))

  (file + ".wart")
What do y'all think? Perhaps this is too much angels and pinheads.


3 points by zck 4511 days ago | link

On a slightly different topic, I find '+ vs 'join really weird. They are only superficially similar; '+ actually reaches inside its arguments to muck with the cons cells:

  arc> (def check-cars (join-fun)
            (let x (list (list 1))
                 (is (car (join-fun x))
                     (car x))))
  #<procedure: check-cars>

  arc> (check-cars +)
  nil

  arc> (check-cars join)
  t
(originally written up by me at http://zck.me/golf). 'mappend uses '+, so it's also weird. This behavior seems insane to me, and was the cause of some hard-to-track-down bugs in the code at the above link.

-----

2 points by Pauan 4511 days ago | link

Interesting... doing that test in Arc/Nu returns "t" in both cases. I wasn't even aware of that issue but apparently "fixed" it accidentally.

By comparing Arc to Arc/Nu, it seems that the problem is that Arc returns a new list that is 'nil terminated, whereas Arc/Nu uses () for 'nil, so it doesn't have to do any conversion.

-----

1 point by akkartik 4510 days ago | link

It took me a while to understand why this should be. cartesian calls mem which calls testify, and testify uses is.

  (def testify (x)
    (if (isa x 'fn) x [is _ x]))
I think it should use iso. Can y'all think of any reasons that's a bad idea?

If we make that change, I'm not too concerned about whether + is like join. A caller of either shouldn't typically care whether they create new conses or not.

-----

2 points by rocketnia 4509 days ago | link

"I think it should use iso ."

Of course! A quick Google search for [site:arclanguage.org testify iso] turns up a few places Arc Forumgoers have talked about this before:

http://arclanguage.org/item?id=9183 (aw)

http://arclanguage.org/item?id=13678 (me)

http://arclanguage.org/item?id=13267 (me and you)

These were just my first three search results. :)

---

"A caller of either shouldn't typically care whether they create new conses or not."

Does anyone mutate association lists? It might matter there.

-----

1 point by akkartik 4509 days ago | link

There'll totally be places where it matters whether something conses or not. But that should reflect in the (longer) name, IMO. cons-minimal-append, mayhap?

-----

2 points by akkartik 4509 days ago | link

OMG, why didn't I fix anarki? Doing so now.

-----

2 points by zck 4509 days ago | link

It seems ugly that calling '+ changes the elements inside its arguments. Even if 'testify were changed, I still think '+ shouldn't do that.

-----

2 points by Pauan 4509 days ago | link

Yeah, definitely. The reason for the bug is because the Arc compiler uses Racket's "append" to do the join, which means it first needs to convert the Arc list to a Racket list, do the append, then convert back to an Arc list.

So there's two ways to fix this:

1) Change Arc to be like Arc/Nu, so that it doesn't need to do the conversion. I don't expect this to be too hard.

2) Write + in Arc, and have it call "join" when the first argument is a list. Then there's no discrepency between the two.

I actually dislike how the overloaded + is written in Arc, so I think having it call join instead would be great.

-----

1 point by akkartik 4509 days ago | link

Yeah I don't disagree. I just found it more fundamental to ask why it should be so subtle to debug.

Also, if the language used immutable conses then every cons would indeed have to change, so there's some rationale for keeping it purely functional.

I do totally agree that the two should be consistent. That sucks a lot.

-----

3 points by zck 4508 days ago | link

If the language used immutable conses, it still shouldn't have to change every cons. The issue here arises not because the list you pass in has different conses than the list returned, but because the elements of the list you pass in have different conses than the elements of the list returned.

Let's say we call (+ '((1) (2))). I'll draw the list this way:

  ( , . , )                   
    |   `-------> ( , . nil)  
    |               |         
    |               |         
    |               |         
    v               v         
   (1 . nil)       (2 . nil)
To my brain, there are two "kinds" of conses in the list: those that make up the top-level list itself, and those that make up the elements.

  top-level  ( , . , )                   
    conses     |   `-------> ( , . nil)  
               |               |         
               |               |         
               |               |         
   element     v               v         
    conses    (1 . nil)       (2 . nil)

And here's what I think should happen in the call to +

  these conses  ( , . , )
  can change      |   `-------> ( , . nil)
                  |               |
                  |               |
                  |               |
  these conses    v               v
  shouldn't      (1 . nil)       (2 . nil)
Similarly, if you call

  (+ (list (obj 1 2)))
Would you expect the hash table to have its elements copied into a new hash table? No, the same hashtable should be shared between the calling list and the returned list, even if the list itself has different conses. And indeed, this is the case (thanks again, evanrmurphy, for tryarc):

  arc> (def check-cars-obj (join-fun)
              (let x (list (obj 1 2))
                   (is (car (join-fun x))
                       (car x))))
  #<procedure: check-cars-obj>
  arc> (check-cars-obj +)
  t

-----

1 point by akkartik 4508 days ago | link

But the result should be:

  ( 1 . , )                   
        `-------> ( 2 . nil)
So the first of your element conses has to change too.

But you are right that not all the conses need to change. I finally understood Pauan's diagnosis, that it was arc copying nil vs () terminated lists. I've been against that 'feature' for a long time. My private fork just stays with () terminated lists: http://github.com/akkartik/arc

-----

2 points by zck 4508 days ago | link

That's neither '+ nor 'join :

  arc> (+ '((1) (2)))
  ((1) (2))
  arc> (join '((1) (2)))
  ((1) (2))
That's flatten:

  arc> (flat '((1) (2)))
  (1 2)

-----

1 point by akkartik 4508 days ago | link

Sorry, I think we've both managed to confuse each other. + and join on a single arg is just the identity function, right? Aren't these the cases we care about?

  arc> (+ '(1 2) '(3 4))
  (1 2 3 4)
My argument was that in the presence of mutable conses all the conses would have to change. After your response I'm correcting that to "all the outer conses (except in the last arg)".

Does this seem right? I was reading your previous comment without one wrapping set of parens.

-----

2 points by zck 4508 days ago | link

Yes, the outer conses all have to change, except for the last one. But the inner conses don't, and they are changed right now. That, specifically, is my problem with '+ . If you call (+ '(((((((1))))))) '(((((((2)))))))) , you shouldn't have to change fifteen conses, just one or two.

-----

4 points by Pauan 4508 days ago | link

akkartik: "I finally understood Pauan's diagnosis, that it was arc copying nil vs () terminated lists. I've been against that 'feature' for a long time."

Yes, but Arc/Nu manages to implement that Arc feature without conversion, so I don't consider it a problem with the feature, but a problem with the implementation.

---

akkartik: "+ and join on a single arg is just the identity function, right?"

Yes, but Arc does the conversion even if you pass in a single argument. So another possible solution to the problem is to change Arc to return the first argument unchanged when passed in a single argument.

---

zck: "If you call (+ '(((((((1))))))) '(((((((2)))))))) , you shouldn't have to change fifteen conses, just one or two."

Yes, you shouldn't have to. Arc uses ar-nil-terminate and ac-niltree to do the Arc->Racket->Arc conversion. ar-nil-terminate does a shallow copy, but ac-niltree does a deep copy. Simply changing ac-niltree wouldn't work, because I'm pretty sure other stuff depends on the deep copy.

However, the Arc compiler could be changed so that + uses a shallow version of ac-niltree. That's an interesting idea: how much stuff in the Arc compiler is currently using the deep ac-niltree when it could instead use a shallow ac-niltree?

-----

3 points by rocketnia 4508 days ago | link

"Arc uses ar-nil-terminate and ac-niltree to do the Arc->Racket->Arc conversion. ar-nil-terminate does a shallow copy, but ac-niltree does a deep copy.

However, the Arc compiler could be changed so that + uses a shallow version of ac-niltree."

That's the essence of the bug, as I see it. This fix is much shallower than the other fixes discussed, so this fix would make the most sense in a minimally updated variant of pg's Arc.

Should Anarki go for shallow or deep improvement? I've advocated shallow in the past, but now I'm thinking Arc oughta follow through on its promise to "break all your code," which of course means the entire network of hacks, libraries, help resources, and alternate language implementations we've built up so far. It would be nice to see Anarki become a fork of Arc/Nu and undergo deep improvements, without losing the Arc flavor.

-----

1 point by akkartik 4507 days ago | link

I agree. I feel uneasy about the shallow change to +; I'm sure the same bug exists in some other functions since deep niltree is the default.

(Confusing that you're using deep/shallow with two meanings in the same comment :)

-----

4 points by rocketnia 4507 days ago | link

"I feel uneasy about the shallow change to +; I'm sure the same bug exists in some other functions since deep niltree is the default."

From what I can see, there are only three places in the pg-Arc code where 'ac-niltree traverses too far, and they're all in ac.scm. Two are the definitions of + and ar-+2, and one is a misleading comment:

  ; Arc primitives written in Scheme should look like:
  
  ; (xdef foo (lambda (lst)
  ;           (ac-niltree (scheme-foo (ar-nil-terminate lst)))))
  
  ; That is, Arc lists are NIL-terminated. When calling a Scheme
  ; function that treats an argument as a list, call ar-nil-terminate
  ; to change NIL to '(). When returning any data created by Scheme
  ; to Arc, call ac-niltree to turn all '() into NIL.
  ; (hash-table-get doesn't use its argument as a list, so it doesn't
  ; need ar-nil-terminate).
From another point of view, there are only a few places where 'ac-niltree probably needs to be recursive. Those are in the definitions of 'ac-call, 'ac-mac-call, and 'ac-macex, where they deal with quotation and macroexpansion, the two ways literal code is made available to Arc programs.

The other uses of 'ac-niltree are in 'ar-coerce (string to cons), 'dir, and 'timedate, where the lists are flat anyway.

I only looked for uses in pg-Arc, not any code that's been derived from it.

https://github.com/nex3/arc/blob/official/ac.scm

-----

2 points by rocketnia 4507 days ago | link

"Confusing that you're using deep/shallow with two meanings in the same comment :)"

Whoops! My comment was originally going to stand on its own, but I adapted it into a reply when Pauan got to what I wanted to say first. :) I didn't notice the word overlap.

-----

1 point by akkartik 4507 days ago | link

Arc/Nu manages to implement that Arc feature without conversion

Hmm, can you elaborate on how it manages this? Is it just printing nil, but otherwise leaving the actual cons cells ()-terminated? (That option has probably been discussed multiple times: http://arclanguage.org/item?id=3094. I'm sure I mentioned it too at some point.)

-----

2 points by Pauan 4507 days ago | link

"Hmm, can you elaborate on how it manages this? Is it just printing nil, but otherwise leaving the actual cons cells ()-terminated? (That option has probably been discussed multiple times: http://arclanguage.org/item?id=3094. I'm sure I mentioned it too at some point.)"

Yes. And then "nil" is a global variable that is bound to (). I also have to do a few other things like changing quote so that it changes the symbol "nil" to (), but overall it isn't that hard to have perfect Arc compatibility, even without doing the conversion.

-----

2 points by Pauan 4507 days ago | link

In case you're curious, in order to accomodate () as 'nil, Arc/Nu had to change the following Arc functions: coerce, type, disp, write, quote, sread

And the places in the Arc/Nu source that treat () as 'nil:

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

https://github.com/Pauan/ar/blob/c8dea3f2d3a4f343ec633c81437...

16 lines total. Probably less than it would have taken to do the 'nil -> () conversion. But Arc can't tell the difference, unless I've overlooked something.

-----

1 point by akkartik 4507 days ago | link

ar-nil-terminate does a shallow copy, but ac-niltree does a deep copy.

Insane. I'm bummed I haven't noticed this before :( Has it come up in the past?

---

I also just noticed: It is _utterly_ insane that niltree converts () to nil, while nil-terminate does the opposite.

-----

1 point by Pauan 4507 days ago | link

"I also just noticed: It is _utterly_ insane that niltree converts () to nil, while nil-terminate does the opposite."

It's called "tree" because it recurses on the car. In other words, it traverses sublists. As for why it's called "nil", well, I guess that's because you can't use () in the name without using || quotes. I'd call them "nil->null" and "null->nil". Or perhaps "arc->racket" and "racket->arc".

-----

1 point by akkartik 4507 days ago | link

I've renamed ar-nil-terminate: http://github.com/nex3/arc/commit/c7a27e0157

-----

2 points by rocketnia 4516 days ago | link

"Punning functions for different meanings is fine if done tastefully. In particular, + for concatenation is fine."

The reason I'm okay with using + for concatenation is because I intuitively accept addition as a special case of concatenation. Concatenation is all + does.

However, I'm not as sure about this as I used to be. I can list a few reasons for separating addition from concatenation, and indeed for separating all kinds of seemingly polymorphic operators:

- Just because one operator gives us three context-appropriate kinds of concatenation doesn't mean we can define another operator that gives us the context-appropriate identity element (() or "" or 0).

- Addition tends to promise more algebraic properties than concatenation does (e.g. commutativity), so it may be easier to trust the behavior of an addition operator if it doesn't have concatenation weighing it down.

- Among numbers, there are multiple associative operators with identity, including addition, multiplication, and max (on the extended real number line), and I'm not sure any of these stands out as a natural choice for "concatenation."

---

"The infix version [...] is far inferior[...]"

I don't see that much inferiority. Maybe I've been working in JavaScript too long. Maybe I'm just making excuses to disagree with you. :-p

---

"Given that I want to resist the temptation to use infix + on lists and strings, I'm inclined to continue supporting join, and only use + for the easiest case"

Well, if I saw code that used 'join, I would be tempted to turn it into +, especially when it saves on indentation. YMMV.

-----

1 point by zck 4511 days ago | link

To discuss your specific issues, I prefer '+ to 'join for lists. I don't particularly have an opinion on how it looks in infix, because I don't like infix. Certainly

  ('(1 2 3) + '(4 5))
does look pretty ugly, but I like infix for basically everything. Your heuristic seems quite reasonable, though. What do you do about math? Do you still hold to it? For example, how would you write y(x) = mx+b:

  (def y (x)
       (+ (* m x)
          b))

  (def y (x)
       (+ (m * x)
          b)

  (def y (x)
       (m * x
        + b))
Or something else? I prefer the first one, but I know not everyone does.

-----

1 point by akkartik 4510 days ago | link

I don't know if you saw the thread about infix in wart[1], but I can say:

  def (y x)
    (m*x + b)
All operators have the same precedence, but operators without whitespace have precedence over operators with whitespace:

  (n * n-1)
The major drawback is, of course, that you can no longer use '* or '- in symbols. I use capitalization for globals and underscores in long names, and have been mulling camelCase.

(Wait, didn't someone check the pitchforks at the door?)

[1] http://arclanguage.org/item?id=16775

-----

2 points by Pauan 4516 days ago | link

I don't think it's that bad if you have true infix:

  '(1 2 3) + '(4 5)
But I agree it does look funky with that extra pair of parens around the entire expression.

-----

1 point by akkartik 4516 days ago | link

You do have that in combination with indent sensitivity. But you can't rely on it with compound exprs and outer parens. And I still think it's klunky. Try it without the quote, or with nested parens in the operands.

-----

1 point by Pauan 4516 days ago | link

You mean like this?

  (qux 1 (foo 2 3)) + ((bar 4) 5)
I still think it looks fine. And yeah, I'm not talking about wart in specific: Nulan hardcodes + as an infix, so it won't work nearly as well with wart's system.

-----

1 point by akkartik 4516 days ago | link

Random follow-up: I figured I'd try to make concatenation more efficient by using queues:

  def! (join ... args)
    collect:each arg args   # using faster accum: http://www.arclanguage.org/item?id=14492
      each elem arg
        yield elem
For some reason that is 3x slower than the obvious recursive version in wart. I need to understand why..

-----