Arc Forum
1 point by Pauan 1304 days ago | link | parent

Oh, at first glance it seemed to behave somewhat like an interpreter, my mistake.

---

Yeah, I do, actually. I wrote a program that can take an (ad-hoc and simple) playlist format and convert it into .xspf. I then rewrote the program in Arc (which was much nicer than writing it in Python), but it ended up being drastically slower. Which means I'm not sure if it's a problem in Arc, or Racket, or my program! It could be any combination of those three.

The program itself should have O(n^2) complexity, due to error checking (which actually isn't even implemented in the Arc version...). If I got rid of error checking, I could get it down to O(n log n), but I don't want to do that.

In any case, the complexity should be the same for both Python and Arc. If I recall, the slowness was primarily caused by me searching for whether one string is a substring of another string. Python has string.find (which performs faster than regexps, when I tried it), but I'm guessing Arc's implementation is slow.

This is all wishy-washy guessing, though. I haven't done concrete tests or anything, so take it with a grain of salt. However, I'm vaguely interested in finding out what the performance bottleneck is, and possibly fixing it.

---

Edit: I just tried these:

  # Python
  for item in range(100000):
      "foobarquxcorge".find("foobar") != -1

  ; Arc
  (repeat 100000 (posmatch "foobar" "foobarquxcorge"))

And got some very weird results... the Python version consistently takes about 2 seconds. When I first tried the Arc version, it took something like a minute or two. Aha! So that's the bottleneck? Not so fast. I then tried it a second time, and it took ~5 seconds... slower than Python, but not by much.
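For reproducibility, the Python side of this micro-benchmark can be timed with the standard timeit module. This is just a sketch; the ~2-second figure above came from however Pauan timed it, not from this code:

```python
import timeit

# Time 100,000 substring searches with str.find, mirroring the loop above.
elapsed = timeit.timeit(
    '"foobarquxcorge".find("foobar") != -1',
    number=100000,
)
print(f"str.find x 100000: {elapsed:.3f}s")
```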

It seems pgArc's performance isn't very reliable. I've noticed sometimes at the REPL that running the same (or similar) code will sometimes be instantaneous, and other times it'll chug along for a few seconds. I don't think it should be taking 1-2 minutes for something that should be taking 5 seconds, though.

However, my program pretty consistently took much longer than it does in Python, so I think I can safely rule out posmatch, which actually seems quite fast (almost as fast as Python, anyways).

There's room for improvement in posmatch, but it doesn't seem to be the smoking gun (at least not yet). For fun, I tried this:

  import re
  for item in range(100000):
      re.search("foobar", "foobarquxcorge")

It took ~10 seconds, so as I said, regexps are slower than find in Python. Possibly because it has to create a match object, rather than just returning a number? I don't know.


1 point by aw 1304 days ago | link

Times will be variable because of garbage collection; that's normal. But having your posmatch example take a minute is very weird; I've never seen anything like that happen.

I'm actually surprised to hear that posmatch is almost as fast as Python's find. After all, Python's find isn't written in Python (the way Arc's posmatch is written in Arc); it's written in C.

If you use a regex more than once you'll want to compile it. Both Python and Racket have this capability.
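A minimal sketch of what precompiling looks like on the Python side (the pattern and loop mirror the earlier benchmark; Racket's equivalent would be building the regexp value once at module level):

```python
import re

# Compile once, outside the loop; repeated searches then skip the
# parse/compile step (and the lookup in re's internal pattern cache).
PATTERN = re.compile("foobar")

for _ in range(100000):
    PATTERN.search("foobarquxcorge")  # returns a Match object or None
```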

-----

1 point by waterhouse 1304 days ago | link

I've sometimes experienced extremely long delays with DrRacket (I use DrRacket as an editor and the command-line "racket" to interact with Arc; I haven't observed these delays with racket). Perhaps Pauan is using DrRacket? As for why that happens... I think it was a) trying to complete a garbage collection with b) a large part of its memory put into swap space. I'm puzzled by how it could take so long to un-swap that much memory (max 360-ish MB)... perhaps it did it in an impressively inefficient way (like, load page 1 containing a certain cons cell, then seek the disk for page 25359 containing its car, then seek back to page 2 for the cdr, and so on). Also, perhaps displaying the "recycling" animation contributes to the problem. Hmm, perhaps I'll figure that out at some point.

If you're not using DrRacket, then I'm not sure what might cause it, other than some unlikely things (e.g. memory of "racket" was swapped to disk and you were moving massive files around).

Meanwhile, if you really care about string-searching, or find it to be a pain point, you may want to implement the awesome Boyer-Moore algorithm: http://en.wikipedia.org/wiki/Boyer-Moore
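Boyer-Moore proper maintains two shift tables; the simpler Boyer-Moore-Horspool variant captures most of the benefit with just the bad-character table. A Python sketch of the idea (the function name and details here are illustrative, not from the thread):

```python
def horspool(pattern, text):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # Bad-character table: for each char in pattern (except the last),
    # how far to shift so that char lines up with its last occurrence.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = m - 1  # index in text aligned with the last char of pattern
    while i < len(text):
        k = 0
        # Compare backwards from the end of the pattern.
        while k < m and text[i - k] == pattern[m - 1 - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift by the table entry for the text char under the pattern's end.
        i += shift.get(text[i], m)
    return -1
```

On mismatches the window skips ahead by up to the full pattern length, which is why it tends to beat the naive scan on longer patterns.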

-----

1 point by Pauan 1303 days ago | link

No, I'm just using Arc on the command line. Garbage collection was my bet, and you're right that it could have been swap as well.

---

Yeah, it might actually be a good idea to integrate that into the base posmatch. But as I said, posmatch isn't actually that slow compared to Python's find, so the bottleneck is probably elsewhere.

-----

1 point by Pauan 1303 days ago | link

Actually, Python caches recently-used regexps, so you don't need to compile them in simple cases.

http://docs.python.org/library/re.html#re.compile

But yes, obviously in production-level code you should be compiling/caching them yourself, even if only on a "just-in-case" basis.

---

Also, I would like to point out that in all my time using Python, it's been very consistent as far as speed goes (no 3-second pauses), so would that imply that Python's garbage collector is more incremental than Racket's?

-----

1 point by aw 1303 days ago | link

One possibility is that Python uses reference counting, which immediately frees non-cyclic garbage as you go, plus (iirc) an occasional garbage collection pass to free the remaining cyclic garbage. So I'm just guessing, but if you're not creating much cyclic garbage, maybe that explains why you're not seeing many garbage collection pauses.
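This split is easy to observe with CPython's gc module: a reference cycle survives del (the reference counts never reach zero), and only the cycle collector reclaims it. A small illustration:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

gc.collect()          # start from a clean slate
a, b = Node(), Node()
a.ref, b.ref = b, a   # reference cycle: refcounts can't drop to zero
del a, b              # non-cyclic garbage would be freed right here
freed = gc.collect()  # the cycle collector finds and reclaims the cycle
print(f"cycle collector reclaimed {freed} objects")
```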

-----