Arc Forumnew | comments | leaders | submitlogin
An O(ND) diff algorithm in Arc
13 points by almkglor 6115 days ago | 10 comments
Adapted from the paper:

http://citeseer.ist.psu.edu/cache/papers/cs/31172/http:zSzzSzxmailserver.orgzSzdiff2.pdf/myers86ond.pdf

  (require "lib/defpat.arc")
  
  (def seq-diff (a b (o is is))
    ; data structure:
    ;  (a-seq b-seq diff N)
    (breakable:withs
          (
           build list
           ; dt here is the data structure above
           snake
           (fn (dt)
             (let (a-seq b-seq diff N) dt
               (if
                 (is (car a-seq) (car b-seq))
                   (do
                     (while (and a-seq b-seq (is (car a-seq) (car b-seq)))
                       (= diff (cons `(skip) diff)
                          N (- N 2)
                          a-seq (cdr a-seq)
                          b-seq (cdr b-seq)))
                      (if (is N 0) (break:rev diff))
                     (build a-seq b-seq diff N))
                 (is N 0)
                   (break:rev diff)
                 ; else
                   dt)))
           downsnake
           (fn (dt)
             (let (a-seq b-seq diff N) dt
               (if b-seq
                 (snake (build
                          a-seq
                          (cdr b-seq)
                          (cons `(insert ,(car b-seq)) diff)
                          (- N 1))))))
           rightsnake
           (fn (dt)
             (let (a-seq b-seq diff N) dt
               (if a-seq
                 (snake (build
                          (cdr a-seq)
                          b-seq
                          (cons `(delete ,(car a-seq)) diff)
                          (- N 1))))))
           N-of [_ 3]
           minN
           (fn (dt1 dt2)
             (if (no dt1) dt2
                 (no dt2) dt1
                          (with (N1 (N-of dt1) N2 (N-of dt2))
                            (if (< N1 N2) dt1 dt2))))
           next-line
           (p-m:afn
             ((x y . zs))  (cons (minN (rightsnake x) (downsnake y)) (self (cons y zs)))
             ((x))         (list (rightsnake x))))
    ((afn (line)
        (self
          (cons
            (downsnake (car line))
            (next-line line))))
      ; start the triangle with a single line
      (list (snake (build a b nil (+ (len a) (len b))))))))
Example:

  arc> (seq-diff '(a b b d) '(a b d b))
  ((skip) (skip) (insert d) (skip) (delete d))


1 point by almkglor 6114 days ago | link

11 upvotes and no comments?

-----

2 points by akkartik 6114 days ago | link

Perhaps the best responses will be code enhancements. And those have a slower cadence.

What would be cool is if we could see a thread-based view of anarki. Kinda like news.yc but with modules or functions as stories. So that next time somebody makes a change to this function it shows up as a comment under it. Patches that change multiple functions show up in them all.

Argh, that makes no sense so far..

-----

3 points by almkglor 6114 days ago | link

Actually sounds cool ^^. Although try this:

http://git.nex-3.com/arc-wiki.git?a=history;f=arc.arc;h=5aae...

I guess though that it isn't quite like threads ^^ there's just one thread after all

-----

1 point by akkartik 6114 days ago | link

well, each file is its own thread. I guess what I'm asking for is one thread per function. And you don't have to parse code or anything, just delimit functions to be between unindented lines.

-----

1 point by almkglor 6113 days ago | link

There is the minor problem where a bunch of functions have the same environment, cf. w/infile and friends^^.

-----

1 point by akkartik 6113 days ago | link

Yeah, browsing static code won't show you every aspect of the dynamic runtime context for that code, but that doesn't seem like a fair comparison given we're just talking about a skin for gitweb. Perhaps I don't understand what you're getting at?

-----

1 point by almkglor 6112 days ago | link

  (let expander 
       (fn (f var name body)
         `(let ,var (,f ,name)
            (after (do ,@body) (close ,var))))
Unindented here. Is the above given its own division? But it wouldn't make much sense to group this by itself, since it's used privately by other functions.

    (mac w/infile (var name . body)
      " Opens the given file `name' for input, assigning the stream to `var'.
        The stream is automatically closed on exit from the `body'.
        See also [[w/outfile]] [[w/instring]] [[w/stdin]] [[w/socket]] "
      (expander 'infile var name body))
Again, unindented at this point. However, it shares some code with other functions after it.

    (mac w/outfile (var name . body)
      " Opens the given file `name' for output, assigning the stream to `var'.
        The stream is automatically closed on exit from the `body'.
        See also [[w/infile]] [[w/appendfile]] [[w/outstring]] [[w/stdout]] "
      (expander 'outfile var name body))
... etc.

Of course, we could just define a definition as being divided by unindented non-empty lines ^^.

-----

1 point by akkartik 6112 days ago | link

Jeez, but of course :)

While we're nitpicking, a set of unindented lines should get grouped together as well. No point having a bunch of single-line separators.

And if using indentation seems too hackish, just balance parens, curlies, etc. And separate by lines containing level-1 brackets. Almost as easy to do, and just as language-agnostic.

-----

1 point by nex3 6114 days ago | link

It's cool, but there's not much to add, I guess.

-----

2 points by almkglor 6114 days ago | link

Crick, my code must be completely unclear. Oh well. The paper describes snakes; my implementation makes use of "downsnakes" and "rightsnakes" as references to the edit graph, with a "downsnake" being a single downward move followed by a snake and a "rightsnake" similarly defined.

The paper describes a graph traversal method which I eventually realized was suspiciously like Pascal's Triangle (tilt your head to the left 45 degrees and squint ^^); hence the reference to "start the triangle". Basically instead of + as in pascal's triangle I have (minN (rightsnake left) (downsnake right)), while the leftmost and rightmost nodes are generated via downsnake and rightsnake, respectively, instead of the identity function as in Pascal's.

-----