Arc Forumnew | comments | leaders | submitlogin
1 point by thaddeus 4849 days ago | link | parent

A few things....

1. I made a big mistake on my time calculator for Clojure... it was using nanoTime with rounding errors. I had to re-write it. This is now correct with no rounding:

  (defmacro millitime*
    ([expr]
       `(let [start# (System/currentTimeMillis) 
              ret# ~expr
              stop# (System/currentTimeMillis)]
          (- stop# start#))) 
    ([n expr]
      `(map (fn[~'x](~'millitime* ~expr))(range 0 ~n))))

  2. Clojure runs:

  (def data (read-string (slurp (str rootDir* "data.seq"))))

  (defn sortby-commonest-Clojure-Long [seq]
    (flatten 
      (sort #(> (count %1)(count %2))
  	(afn [xs ys]
	  (if (empty? xs) ys
              (let [x  (first xs)  
                    r  (for [i xs :while (= i x)] i)]
              (self (drop (count r) xs) (cons r ys))))
       (sort seq) nil))))
	
  => (millitime* 10  (sortby-commonest-Clojure-Long data))
  (27 26 23 17 17 16 16 16 17 18)
  
  run2: (6 17 16 17 16 17 16 17 6 7)
  run3: (9 6 6 7 6 6 6 7 6 6)	

  ; restarted & reloaded data

  (defn sort-by-commonest-Clojure-succinct [xs]
    (let [h (frequencies xs)]
      (sort (comparitor > #(h %)) xs)))	
	
  => (millitime* 10 (sort-by-commonest-Clojure-succinct data))	
  (42 28 26 22 22 24 21 21 21 20)

  run2: (10 10 10 10 11 10 10 10 11 10)	
  run3: (10 11 10 10 10 12 11 10 10 10)

  3. Arc Runs:

 ; new time code for arc 
  (mac millitime (expr)
    (w/uniq (t1 t2)
    `(let ,t1 (msec)
       (do ,expr
            (let ,t2 (msec)
                (- ,t2 ,t1))))))

  (mac millitime10 (expr)
    `(accum xs
       (each i (range 1 10)
         (xs (millitime ,expr)))
      xs))


  (def sortby-commonest-Arc-Long (seq)
    (mappend [do _]	  
      (sort (fn (x y)(> len.x len.y)) 
	((afn (xs cxs ys)
	   (if empty.xs ys
               (withs (x  car.xs
                       f  (testify [isnt _ x]) 
                       r  (reclist [if (f:car _) _] xs)
                       cr len.r)
                (self r cr (cons (firstn (- cxs cr) xs) ys)))))
	   (sort > seq) len.seq nil))))

   arc> (millitime10 (sortby-commonest-Arc-Long data))
   (110 103 104 104 105 105 105 105 104 104)

   run2: (105 103 105 105 105 105 105 104 106 103)
   run3: (106 106 106 105 104 101 107 105 105 104)

   ; restarted & reloaded data
   
   (def sort-by-commonest-Arc-succinct (l (o f idfn))
    (let h (counts:map f l)
      (sort (compare > h:f) l)))

   arc> (millitime10 (sort-by-commonest-Arc-succinct data))
   (241 238 238 240 237 237 234 236 235 234)
  
   run2: (238 238 274 283 239 239 239 239 237 240)
   run3: (240 239 238 238 239 239 242 243 239 237)

  Here's rocketnia's new version:
  ; restarted & reloaded data

  arc>(millitime10 (sort-by-commonest5r1 data))
  (110 102 105 104 104 104 101 106 105 104)

  run2: (105 102 104 104 103 104 102 105 105 103)
  run3: (103 106 105 104 102 104 104 105 102 106)
  
Whew, this whole timing things sure takes allot of time :)


1 point by akkartik 4840 days ago | link

Ok, so arc's basically 10x slower than clojure for this example.

-----

2 points by thaddeus 4840 days ago | link

Yes, .... But, Clojure being faster was not really the part I cared about. What I feel this shows is that the straight forward idiomatic approach is, while concise, not always the optimal solution. The investigation was an attempt to consider other approaches, gain a better understanding of the languages and find some optimal paths. Comparing Arc to Clojure shows both languages have a similar ratio in performance metrics for the two cases, which allows me to normalize my approaches and not assume that an approach for one is equally applicable to the other.

-----

2 points by akkartik 4840 days ago | link

Yeah, I wanted to correct/clarify my comment at http://arclanguage.org/item?id=15076 for people coming here from search engines.

I'm not surprised that the readable way is less performant; I'm reassured that the price for readability is quite low. I was totally unsurprised that both require similar optimizations given that both are fairly mature toolchains. Compilers by and large get their performance the same way.

The one thing that can cause the optimal optimizations to shift is different machines with different capacities and hierarchies of caches and RAM. But this is a compute-bound program so even that's not a concern.

Update: I should add that I enjoyed this summary of yours. The previous comment just gave data and left your conclusions unspoken.

-----

2 points by thaddeus 4840 days ago | link

I agree, and my apologies, I did give only data - at the time of the writing I was trying not to draw conclusions and leave the door open for other options to come forward, but I should have followed up.

I'm not so sure I can agree with the price for readability being quite low. One could call it premature optimization, but a ~50% gain is pretty significant in my mind. Had it been 20% or less I would probably go forward and spend less of my time attempting alternate approaches, but at ~50% I think playing around and learning the boundaries and their benefits can yield positive results for me.

And, besides, stuff like this is just plain fun!

-----

2 points by akkartik 4840 days ago | link

:)

(No criticism intended; no apologies necessary.)

It's 50% if you do just that. In a larger app it's a difference of 1ms.

Now you could argue that everything in an arc program will be slower so I should consider it to be 50%. Lately I've been mulling the tradeoff of creating optimized versions vs just switching languages. When I built keyword extraction for readwarp.com I was shocked how easy it was to add a C library using the FFI. Why bother with a less readable, hard-to-maintain arc version for 2x or 5x if you can get 50-100x using an idiomatic C version?

The whole point of a HLL is to trade off performance for readability; I'd rather back off that tradeoff in a key function or two.

---

Shameless plug section

Wart currently has an utterly simple FFI: create a new C file, write functions that take in Cells and return Cells, run:

  $ wart
and they get automatically included and made available.

But I want to do more than just an FFI. I have this hazy idea of being able to specify optimization hints for my code without actually needing to change it or affect its correctness. They might go into a separate file like the tests currently do.

I dream of being able to go from simple interpreter to optimized JIT using just HLL lisp code (and a minimum of LLVM bindings). So far the prospect of attempting this has been so terrifying that I've done nothing for weeks..

-----

2 points by thaddeus 4840 days ago | link

"In a larger app it's a difference of 1ms." ...

I think that's generalizing too much. Using that one function with our nominal data set may only cost some ms, but what if you're dealing with hundreds of millions records? It may then, even though only representing .001% of your code base, account for 90% of your operating time - which is when someone normally kicks in with the "premature optimization" argument, which I can't really argue against, other than to say optimizing code is a skill that is generally done well by those who take it into account to begin with.

"Now you could argue that everything in an arc program will be slower so I should consider it to be 50%"

I wouldn't think this to be the case. I'm sure there's a tonne of juice squeezing one can do, but as a general statement, having played around with a lot of arc's code, I would guess most optimizing would yield less than 10%, but these 50+%, while they are few and far between are still worth the effort (to me).

The key, in all this, is to understand these languages well enough to make good judgment calls on where best to invest ones time.

I can't say much about wart and the rest (you're much deeper into language design than I am). :)

-----

2 points by akkartik 4840 days ago | link

"what if you're dealing with hundreds of millions records?"

Then just write it in C :)

Let me try to rephrase my argument. Sometimes you care about every last cycle, most of the time you care about making it a little bit faster and then you're done. Sometimes your program needs large scale changes to global data structures that change the work done, sometimes it needs core inner loops to just go faster. Sometimes your program is still evolving, and sometimes you know what you want and don't expect changes.

just a little faster + work smarter => rearchitect within arc.

just a little faster + faster inner loops => rewrite them in C

every last cycle + rigid requirements => gradually rewrite the whole thing in C and then do micro-optimizations on the C sources. You could do arc optimizations here, but this is my bias.

every last cycle + still evolving => ask lkml :)

If you're doing optimizations for fun, more power to you. It's pretty clear I'm more enamored with the readability problem than the performance problem.

I don't want to sound more certain than I am about all this. Lately I spend all my time in the 'just a little faster' quadrant, and I like it that way.

"The key, in all this, is to understand these languages well enough to make good judgment calls on where best to invest ones time."

Ideally they have profiling tools. For arc I use the helpers at the bottom of http://arclanguage.org/item?id=11556.

-----

3 points by thaddeus 4840 days ago | link

I don't disagree with your thought's, however I don't think they account for the TCQ aspect of everyone's situation.

Let's put put it another way. In my situation, if I have to look a using C then it's already game over[1]. However I can, with my limited amount of time (lunches, evenings and weekends) become proficient enough in Clojure (with a continual thanks to Arc).

What I am suggesting is that knowing the language well enough to easily identify these 50%+ hitters is a matter of finding low hanging fruit and at the same time becoming a better Arc/Clojure programmer. It does not mean I want to change my direction in the heavily-optimized/low-level to high-level/low-maintenance/readable code continuum.

[1] I'm not a programmer[2], I am a business analyst that works with software that simulates oil & gas production from reservoirs that sit several hundred kilometers underground. Simulation scenario's are run for 20 year production periods across hundreds of interdependent wells. The current software tools run the simulations across many cores, they take 8 days to run to completion and they cost about a half a million dollars per seat (+18% per year maintenance fees). This cost can be attributed to the R&D that occurred 8-10 years ago (i.e they required a team(s?) of P.Engs writing software in C to maximize the performance). Eight years ago you couldn't do (pmap[3] (sortby-commonest or whatever.... )) so easily. Nowadays I have the opportunity to create a 70% solution all by my lonesome, costing only a portion of my time. Hence why understanding the language well enough to find the low hanging fruit and not having to use C, is probably bigger deal to me.

[2] Well, maybe I shouldn't say this... rather I should say it's not my day job. I have no formal education in the field. I'm pretty much self taught.

[3] pmap is the same as map only it distributes the load across multiple processors in parallel.

:)

-----

2 points by akkartik 4840 days ago | link

Ah you're right. I wasn't taking individual strengths and weaknesses into account in my analysis.

-----

1 point by thaddeus 4840 days ago | link

> several hundred kilometers underground...

Lol, that's a gross overstatement, I meant to say several hundred meters (not that it's relevant to the topic) :)

-----