Hi, finally replying to this thread. A long time ago, I found that the + function was allocating memory every time it was called; this a) made things slower than they had to be (according to my old thread, incrementing a variable was half as fast as it should have been), b) triggered garbage collections, which meant annoying pauses at the REPL, and c) triggered garbage collections in the middle of things like (time:repeat 100000 <test procedure>). (Note that "repeat" is implemented with "for", which uses +.)
The problem with (c) is that it made the time to completion rather variable (depending on whether a GC happened, or sometimes on how many GCs happened), so the results were difficult to interpret. It interfered with my measuring and comparing the performance of my procedures.
So it annoyed me. And I found that just using the Scheme + fixed it and stopped the malloc'ing. Knowing that this crap was caused by a feature I didn't use, one that Paul Graham seemed to have declared a bad idea and that arguably shouldn't even be there anymore... I took it upon myself to eradicate it from the source code. And I felt better afterwards.
But now this case-lambda version of + (which is polymorphic) seems not to malloc at all, and a bit of testing indicated it performs about as well as the raw Racket +; I couldn't see any difference. (Btw, Racket seems not to malloc for an argument list as long as you only pass it along to "apply" or the like, but it does malloc--probably copying the arglist--once you access it with car or cdr.)
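To illustrate what I mean by the case-lambda approach, here's a rough sketch (not the actual code from the Optimizations page) of a + that dispatches on argument count, so the common zero/one/two-argument calls never cons a rest list; it assumes the two-argument ar-+2 that appears later in this post:
; sketch only: arity dispatch so the common cases allocate nothing
(define arc-+
  (case-lambda
    (() 0)
    ((x) x)
    ((x y) (ar-+2 x y))              ; fast path, no rest list
    ((x y . rest)                    ; rare case: 3+ arguments
     (foldl (lambda (z acc) (ar-+2 acc z))
            (ar-+2 x y)
            rest))))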
So polymorphic + is no longer so obnoxious to me, and I no longer feel the need to eradicate it. I'd probably still prefer to use "string" to join/coerce to strings and "join" to join lists, myself, and I'd probably prefer "join" as a generic join operator. But I won't kill concatenate-+ on sight. Since overloading itself turns out not to be the cause of the performance problems, I might even warm to the idea... maybe overload + for my polynomials, or for matrices (I've been playing with those recently, though I've only ever multiplied them, never added them; perhaps I should overload * instead).
You may find the following definitions extremely useful. (I do.) Evolved from my previous "mem-use" macro.
;note that "memory" = $.current-memory-use
; and "current-gc-milliseconds", "current-process-milliseconds"
; were supplied in ac.scm
(= gc-msec current-gc-milliseconds
process-msec current-process-milliseconds)
(mac utime body
(w/uniq (gtime ggc gmem)
`(with (,gtime (msec) ,ggc (gc-msec) ,gmem (memory))
(do1 (do ,@body)
(prn "time: " (- (msec) ,gtime)
" gc: " (- (gc-msec) ,ggc)
" mem: " (- (memory) ,gmem))))))
(= time utime)
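(For reference, those two timers reach Arc via a couple of xdef lines in ac.scm; arc3.1 already has them, but if your copy doesn't, they look roughly like this:)
; in ac.scm: expose Racket's timers to Arc (roughly; check your ac.scm)
(xdef current-gc-milliseconds      current-gc-milliseconds)
(xdef current-process-milliseconds current-process-milliseconds)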
You might expect some memory or time overhead just from using this, but in fact there doesn't seem to be any:
arc> (time:- 1 2)
time: 0 gc: 0 mem: 0 ;zero overhead
-1
;now I copy the old definition of + from ac.scm to clipboard
arc> (= + (eval:list '$ (read:pbpaste)))
#<procedure>
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 352 ;probably from JIT-compiling +
3
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 64
3
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 64 ;by now this is clearly the usual mem-use
3
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 64
3
arc> (= + $.+) ;Racket +
#<procedure:+>
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 0
3
arc> (time:+ 1 2)
time: 0 gc: 0 mem: 0
3
;now I copy this definition to clipboard:
(define (ar-+2 x y)
  (cond ((char-or-string? x)
         (string-append (ar-coerce x 'string) (ar-coerce y 'string)))
        ((and (arc-list? x) (arc-list? y))
         (ac-niltree (append (ar-nil-terminate x) (ar-nil-terminate y))))
        (#t (+ x y))))
arc> (eval:list '$ (read:pbpaste))
#<void>
arc> (time ($.ar-+2 1 2))
time: 0 gc: 0 mem: 416 ;JIT
3
arc> (time ($.ar-+2 1 2))
time: 0 gc: 0 mem: 0 ;so it allocates zero memory
3
arc> (time ($.ar-+2 1 2))
time: 0 gc: 0 mem: 0
3
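(Incidentally, ar-+2 leans on a few helpers from ac.scm: char-or-string?, ar-coerce, arc-list?, ac-niltree, ar-nil-terminate. If you're pasting it in somewhere other than ac.scm, the simple predicates look roughly like the sketches below; take ar-coerce, ac-niltree, and ar-nil-terminate verbatim from ac.scm, since they do the actual Arc<->Racket string and list conversions.)
; rough sketches of the simple ac.scm predicates ar-+2 uses
(define (char-or-string? x) (or (string? x) (char? x)))
(define (arc-list? x) (or (pair? x) (eqv? x 'nil) (eqv? x '())))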
Now for more playing around. A Racket loop that counts to 1 million.
arc> (time:$:let loop ((n 0)) (if (> n 1000000) 't (loop (+ n 1))))
time: 3 gc: 0 mem: 760
t
arc> (time:$:let loop ((n 0)) (if (> n 1000000) 't (loop (+ n 1))))
time: 5 gc: 0 mem: 920 ;kinda weird, it alternates 760 and 920
t
;now up to 10 million
arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (+ n 1))))
time: 36 gc: 0 mem: 760
t
;now we test ar-+2
arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
time: 1020 gc: 0 mem: 1096
t
arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
time: 1019 gc: 0 mem: 776
t
Apparently it makes a significant difference (30x) inside a tight Racket loop. For fun, let's try the unsafe ops too.
arc> ($:require racket/unsafe/ops)
#<void>
arc> (time:$:let loop ((n 0)) (if (unsafe-fx> n 10000000) 't (loop (unsafe-fx+ n 1))))
time: 28 gc: 0 mem: 760
t
Hmm, I had an idea. Put the number check first. New definition:
(define (ar-+2 x y)
  (cond ((number? x) (+ x y))
        ((char-or-string? x)
         (string-append (ar-coerce x 'string) (ar-coerce y 'string)))
        ((and (arc-list? x) (arc-list? y))
         (ac-niltree (append (ar-nil-terminate x) (ar-nil-terminate y))))
        (#t (+ x y))))
arc> (eval:list '$ (read:pbpaste))
#<void>
arc> (= + $.ar-+2)
#<procedure:ar-+2>
arc> (time:repeat 1000000 nil)
time: 122 gc: 0 mem: 3256
nil
arc> (time:repeat 1000000 nil)
time: 121 gc: 0 mem: 1880
nil
arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
time: 310 gc: 0 mem: 776
t
arc> (time:$:let loop ((n 0)) (if (> n 10000000) 't (loop (ar-+2 n 1))))
time: 323 gc: 0 mem: 936
t
What, now the Arc loop goes faster than it did when + was the Racket +. Probably because this function takes exactly two arguments, while Racket's + accepts any number. And now the Racket loop is only about 9x slower than with Racket + (I expect Racket knows how to optimize its own two-argument +). By the way, in case you think the Arc loop is outrunning Racket here: the Arc loop only counts to 1 million while the Racket loop counts to 10 million, so Racket is actually going close to 4x as fast as Arc (and with its own +, it'd be going more than 30x as fast).
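Here's the per-iteration arithmetic behind those ratios (milliseconds converted to nanoseconds, then divided by the iteration count):
; ns per iteration, from the timings above
(/ (* 121 1e6) 1e6)   ; Arc repeat loop, + is ar-+2   => ~121 ns/iter
(/ (* 310 1e6) 1e7)   ; Racket loop calling ar-+2     => ~31 ns/iter
(/ (*  36 1e6) 1e7)   ; Racket loop with Racket +     => ~3.6 ns/iter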
Oh, um, I should test the case-lambda thing. I copied my definition of + from the Optimizations page to the clipboard and ran the same tests: virtually no difference from the raw ar-+2. Just for reference, I also re-checked that ar-+2 only beats Racket + in the Arc loop because ar-+2 effectively declares that it takes exactly two arguments. Good, I'm not going crazy. Anyway, that final version of ar-+2 is my official recommendation.
Oh, incidentally, this "time" thing (though I think it was just "mem-use" back then) also helped me figure out that "<", ">", and "is" were allocating memory (provoking my work behind the Optimizations page). For example, I found that (no x) malloc'd; I eventually realized that this is because "is" mallocs, and (no x) is defined as (is x nil).
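The natural fix there is the same flavor as ar-+2: give "is" a fixed two-argument workhorse on the Scheme side so no argument list gets consed. Here's a sketch of that idea, assuming ac.scm's tnil and ar-false? helpers (illustrative only, not necessarily the exact code I use):
; sketch: two-argument comparison with no rest-list allocation
(define (ar-is2 a b)
  (tnil (or (eqv? a b)
            (and (string? a) (string? b) (string=? a b))
            (and (ar-false? a) (ar-false? b)))))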
Also, here's some further relevant usage of "time":
arc> (time:no:= xs (n-of 1000000 0))
time: 2351 gc: 331 mem: 32513488
nil
arc> (time:no:zap nrev xs)
time: 152 gc: 0 mem: 632 ;hell yeah, nrev works well
nil
arc> (time:no:zap rev xs)
time: 254 gc: 0 mem: 32000584
nil
;now here's when it GC's
arc> (time:no:zap rev xs)
time: 371 gc: 117 mem: 31824248
nil
arc> time:len.xs ;this is my fixed "len"
time: 8 gc: 0 mem: 0
1000000
;now, for comparison, let's see the old definition of "len" from ac.scm
arc> (= len (eval:list '$ (read:pbpaste)))
#<procedure>
arc> time:len.xs
time: 461 gc: 261 mem: 68936656 ;OMG this is ridiculous
1000000
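(For context: if I remember right, the old len ran the list through ar-nil-terminate, copying every pair, before calling length, which is where all that memory comes from. The fix is just to count the pairs in place. A sketch of that idea, not necessarily my exact fixed version:)
; sketch: count an Arc list without copying it
(define (ar-len x)
  (cond ((string? x) (string-length x))
        ; plus whatever other cases len handles (tables, etc.)
        (else (let loop ((x x) (n 0))
                (if (pair? x)
                    (loop (cdr x) (+ n 1))
                    n)))))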