4 points by Pauan 4502 days ago

It didn't take long at all, actually. Just ~1 day of average work. The old version was this unmaintainable 220 line mess:

https://github.com/Pauan/ar/blob/a5b503bab736336c5e919eeb0b0...

The new version is this 148 line bit of loveliness:

https://github.com/Pauan/ar/blob/b086a3959b91265fbb4e00cc1d9...

Even ignoring the line count difference, the new version is so much easier to understand because it's much, much simpler. Yet, despite (because of?) that simplicity, it provides some features that would have been really hard to add to the old version.

It's also much, much faster, thanks to rocketnia, who pointed out that I could use Racket's regexp support (which is amazingly fast, by the way).

At the time of this writing, I have 735 strings in all my playlists, and 1,437 music files. Because every string has to be checked against every file, the playlist program requires O(nm) searches, where n is the number of strings and m is the number of files. In my particular case, that means 735 × 1,437 = 1,056,195 substring searches. Let's look at how fast the various options are:

  118.529  posmatch (strings.arc)
   56.148  posmatch (mine)
   19.502  boyer-moore-search
    7.214  re-match
    1.995  %.regexp-match? (filename only)
    1.255  %.regexp-match? (entire filepath)

The numbers are how many seconds it took to run all 1,056,195 substring searches. The posmatch included in Arc's "strings.arc" is quite slow: even my fairly simple custom posmatch is more than twice as fast.

The Boyer-Moore-Horspool string search library I wrote a while back is quite good, especially given how it's written in pure Arc.

But Racket's regexp support blows them all out of the water. Using the "re.arc" library included with Arc/Nu is already almost 3 times faster than Boyer-Moore-Horspool! And when I switched to Racket's "regexp-match?" function, it got faster still.

Yes, that's right, using Racket's regexps, it only takes my playlist program ~1.26 seconds to process 1 million+ substring searches. Amazingly fast!
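
To make the "every string against every file" shape concrete, here is a rough sketch in Python. It is not the playlist program: Python's re module stands in for Racket's regexps, count_matches is a made-up name, and I'm assuming each string is treated as a literal substring.

```python
import re

def count_matches(strings, files):
    """Run every playlist string against every file path; return (searches, hits)."""
    # re.escape turns each string into a literal pattern (an assumption here),
    # so each search is a plain substring test done by the regexp engine.
    patterns = [re.compile(re.escape(s)) for s in strings]
    searches = 0
    hits = []
    for s, pattern in zip(strings, patterns):
        for path in files:
            searches += 1
            if pattern.search(path):
                hits.append((s, path))
    return searches, hits

strings = ["Battle 1", "Outskirts of Time"]
files = [
    "lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Battle 1.mp3",
    "lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Boss Battle 1.mp3",
]
searches, hits = count_matches(strings, files)
print(searches)     # 2 strings x 2 files = 4
print(735 * 1437)   # 1056195, the count quoted above
```

Each pattern is compiled once and reused for all files, so the inner loop does nothing but the search itself.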

Also, something a bit strange... I had the idea of splitting the files so that it would only match against the filename rather than the filename + directory. That should have been much faster, but somehow Racket's regexps run faster on the longer filepaths than they do on the shorter filenames...
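
If anyone wants to poke at the filename vs. filepath comparison themselves, something like the following Python sketch would do. Python's regexp engine is not Racket's, so the timings say nothing about the numbers above; match_count and the sample paths are only for illustration.

```python
import posixpath
import re
import timeit

def match_count(pattern, targets):
    """Count how many targets the compiled pattern matches."""
    return sum(1 for t in targets if pattern.search(t))

paths = [
    "lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Battle 1.mp3",
    "lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Boss Battle 1.mp3",
]
# Split once, up front, so the cost of splitting is not part of the timing.
names = [posixpath.basename(p) for p in paths]
pattern = re.compile(re.escape("Battle 1"))

full_time = timeit.timeit(lambda: match_count(pattern, paths), number=1000)
name_time = timeit.timeit(lambda: match_count(pattern, names), number=1000)
print(f"full paths: {full_time:.4f}s, filenames only: {name_time:.4f}s")
```

The filenames are precomputed here on purpose: if the split happens inside the timed loop, its cost gets counted as search time.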

---

Oh yes, and, one thing that I find pretty nifty... the error messages are parseable Nuit text:

  @error string did not match any file
    @strings
      03 - Chrono Trigger
    @playlists
      Chrono Trigger
      3 Stars

  @error string matched multiple files
    @strings
      Battle 1
    @files
      lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Battle 1.mp3
      lists/Chrono Trigger/Chrono Resurrection/Chrono Trigger Resurrection - Boss Battle 1.mp3
    @playlists
      Chrono Resurrection
      5 Stars

  @error file was matched by multiple strings
    @files
      lists/Chrono Trigger/CT99/The Brink of Time Track 10 - Outskirts of Time <ysQw1h3SmbE>.mp4
    @strings
      The Brink of Time Track 10 - Outskirts of Time
      The Brink of Time
    @playlists
      Chrono Trigger
      3 Stars
      Chrono Resurrection
      5 Stars
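
Here is a rough Python sketch of how the three error kinds above could be detected and printed. It is only a guess at the logic, not the actual Arc code: find_errors and format_error are made-up names, matching is a plain substring test, the @playlists field is left out for brevity, and the output just copies the indentation of the examples rather than following the Nuit spec.

```python
from collections import defaultdict

def find_errors(strings, files):
    """Classify substring matches into the three error kinds shown above."""
    files_by_string = {s: [f for f in files if s in f] for s in strings}
    strings_by_file = defaultdict(list)
    for s, matched in files_by_string.items():
        for f in matched:
            strings_by_file[f].append(s)

    errors = []
    for s, matched in files_by_string.items():
        if not matched:
            errors.append(("string did not match any file", {"strings": [s]}))
        elif len(matched) > 1:
            errors.append(("string matched multiple files",
                           {"strings": [s], "files": matched}))
    for f, matched in strings_by_file.items():
        if len(matched) > 1:
            errors.append(("file was matched by multiple strings",
                           {"files": [f], "strings": matched}))
    return errors

def format_error(message, fields):
    """Render one error in the indented layout used in the examples above."""
    lines = [f"@error {message}"]
    for name, values in fields.items():
        lines.append(f"  @{name}")
        lines.extend(f"    {v}" for v in values)
    return "\n".join(lines)
```

Usage would look like `print("\n\n".join(format_error(m, f) for m, f in find_errors(strings, files)))`, with strings and files supplied by whatever reads the playlists and the music directory.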