Arc Forumnew | comments | leaders | submitlogin
1 point by aw 4740 days ago | link | parent

Times will be variable because of garbage collection, that's normal. But having your posmatch example take a minute is very weird though, I've never seen anything like that happen.

I'm actually surprised to hear that posmatch is almost as fast as Python's find. Python's find, after all, isn't written in Python (like how Arc's posmatch is written in Arc), it's written in C.

If you use a regex more than once you'll want to compile it. Both Python and Racket have this capability.



1 point by waterhouse 4740 days ago | link

I've sometimes experienced extremely long delays with DrRacket (I use DrRacket as an editor; I use the command-line "racket" to interact with Arc, and haven't observed these delays with racket). Perhaps Pauan is using DrRacket? And as for why that happens... I think it was a) trying to complete a garbage collection with b) a large part of its memory put into swap space. I'm puzzled by how it could take so long to un-swap that much memory (max 360-ish MB)... perhaps it did it in an impressively inefficient way (like, load page 1 containing a certain cons cell, then seek the disk for page 25359 containing its car, then seek back to page 2 for the cdr, and so on). Also, perhaps displaying the "recycling" animation contributes to the problem. Hmm, perhaps I'll figure that out at some point.

If you're not using DrRacket, then I'm not sure what might cause it, other than some unlikely things (e.g. memory of "racket" was swapped to disk and you were moving massive files around).

Meanwhile, if you really care about string-searching, or find it to be a pain point, you may want to implement the awesome Boyer-Moore algorithm: http://en.wikipedia.org/wiki/Boyer-Moore

-----

1 point by Pauan 4739 days ago | link

No, I'm just using Arc on the command line. Garbage collection was my bet, and you're right that it could have been swap as well.

---

Yeah, might actually be a good idea to integrate that into the base posmatch, but as I said, posmatch isn't actually that slow compared to Python's find, so the bottleneck is probably elsewhere.

-----

1 point by Pauan 4740 days ago | link

Actually, Python caches recently-used regexps, so you don't need to compile them in simple cases.

http://docs.python.org/library/re.html#re.compile

But yes, obviously in production-level code you should be compiling/caching them yourself, if solely on a "just-in-case" basis.

---

Also, I would like to point out that in all my time using Python, it's been very consistent as far as speed goes (no 3 second pauses), so would that imply that Python's garbage collector is more incremental than Racket's?

-----

1 point by aw 4739 days ago | link

One possibility is that Python uses reference counting, which immediately frees non-cyclic garbage as you go, and, iirc, an additional occasional garbage collection cycle to free the remaining cyclic garbage. So I'm just guessing, but if you're not creating much cyclic garbage, maybe that's an explanation for why you're not seeing many garbage collection pauses.

-----