Arc Forum
4 points by palsecam 5294 days ago | link | parent

OK, quick & dirty implementation in Perl5, copying pg's impl:

   open F, 'big.txt';
   while (<F>) { $NWORDS{$_}++ for (lc =~ /[a-z]+/g); }

   sub dedup { %h = map { $_, 1 } @_; keys %h }

   sub edits1 {
     $word = shift;
     &dedup(map { ($a, $b) = @$_;
                  ( $a . substr($b, 1),
                    $a . substr($b, 1, 1) . substr($b, 0, 1) . substr($b, 2),
                    map { ($a . $_ . substr($b,1), $a . $_ . $b) } 'a'..'z' )
       } map { [substr($word, 0, $_), substr($word, $_)] } 0..length($word)-1);
   }

   sub edits2 { &dedup(map { &edits1($_) } &edits1(shift)) }

   sub correct {
     $win = shift;
     for (&edits1($win), &edits2($win)) {
       $win = $_ if ($NWORDS{$_} > $NWORDS{$win});
     }
     $win;
   }
That's 19 lines of code. So Arc "wins" (and is far more elegant anyway; this is spaghetti Perl). But I feel better knowing that a Perl version doesn't need to be 63 lines long (http://www.riffraff.info/2007/5/20/a-spell-corrector-in-perl...).

However, both this version and pg's are actually buggy. An obvious example:

  arc> (correct "yellow")
  "fellow"
The canonical version gives the expected answer, "yellow". So I'd say no, these are not correct implementations (the Perl version seems to always give the same results as the Arc one).

One thing I noticed is that the list returned by 'edits{1|2} contains the original value (i.e. (find "python" (edits1 "python")) is t), which is not the behaviour of Norvig's version. (That's also why the loop in my correct() isn't for ($win, &edits1($win), &edits2($win)), unlike the Arc one.)
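The observation about the original word showing up in its own edit list can be checked with a rough Python port of the Perl edits1 above. This is only a sketch: the function name and structure mirror the Perl code, and it says nothing about what Norvig's own code returns.

```python
import string

def edits1(word):
    """Rough Python port of the Perl edits1 above (same split range)."""
    results = set()
    # Split positions 0 .. len(word)-1, as in the Perl `0..length($word)-1`.
    for i in range(len(word)):
        a, b = word[:i], word[i:]
        results.add(a + b[1:])                    # deletion
        results.add(a + b[1:2] + b[:1] + b[2:])   # transposition
        for c in string.ascii_lowercase:
            results.add(a + c + b[1:])            # replacement
            results.add(a + c + b)                # insertion
    return results

# Replacing a letter with itself reproduces the input word.
print("python" in edits1("python"))
```

Replacing a letter with the same letter is one way the input ends up in the result, so any loop over the edits also revisits the original word.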

And this may be incorrect. Or maybe, if $win/word is in nwords, 'correct should stop immediately. This would, at least, fix the "yellow" -> "fellow" problem.
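A minimal Python sketch of that early exit, assuming a hypothetical toy frequency table (the counts below are made up, not taken from big.txt) and a hand-written candidate list in place of the real edit functions:

```python
# Hypothetical toy counts; real values would come from big.txt.
NWORDS = {"yellow": 5, "fellow": 40}

def correct_naive(word, candidates):
    # Picks the most frequent candidate, even when `word` itself is known.
    return max(candidates, key=lambda w: NWORDS.get(w, 0), default=word)

def correct_early_exit(word, candidates):
    # Stop immediately if the word is already in the frequency table.
    if word in NWORDS:
        return word
    return correct_naive(word, candidates)

cands = ["yellow", "fellow"]
print(correct_naive("yellow", cands))       # fellow
print(correct_early_exit("yellow", cands))  # yellow
```

With these assumed counts, the naive version prefers the more frequent neighbour, while the early exit keeps a known word as it is.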

This is, however, not the only issue. Norvig's version and these versions, given the same "big.txt", don't give the same correction for other words (try "speling", "godd"). And I strongly suspect that Norvig's version is the more correct one.



3 points by palsecam 5294 days ago | link

  (def known (words) (dedup:keep [nwords _] words))  ; lines count is now 12

  (def correct2 (word (o f [nwords _]))
    (most f 
      (or (known:list word) (known:edits1 word) (known:edits2 word) (list word))))
Or:

  (def correct3 (word (o f [nwords _]))  ; don't need 'known, but require aspirin
    (most f (or ((apply orf (map [fn (w) (dedup:keep [nwords _] (_ w))]
                              (list list edits1 edits2)))
                   word) (list word))))
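For readers less used to Arc, the fallback chain in correct2 (known word first, then known words at edit distance 1, then distance 2, then the word itself) might look like this in Python. This is a sketch under assumptions: NWORDS is a hypothetical toy table, and edits1 here uses split positions 0 through len(word) inclusive, which is not the range used in the Perl code above.

```python
import string

# Hypothetical toy counts; real values would come from big.txt.
NWORDS = {"yellow": 5, "fellow": 40, "good": 30, "god": 20}

def edits1(word):
    # Assumption: splits cover every position, including the end of the word.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b
                for c in string.ascii_lowercase]
    inserts = [a + c + b for a, b in splits for c in string.ascii_lowercase]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(words):
    return {w for w in words if w in NWORDS}

def correct2(word):
    # The first non-empty set wins; an empty set is falsy, so `or` moves on.
    candidates = (known([word]) or known(edits1(word))
                  or known(edits2(word)) or {word})
    return max(candidates, key=lambda w: NWORDS.get(w, 0))

print(correct2("yellow"))
print(correct2("godd"))
```

Because a known word short-circuits the chain, its more frequent neighbours are never considered.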

  arc> (correct{2/3} "yellow")
  yellow
  arc> (correct{2/3} "godd")
  good
  arc> (correct{2/3} "speling")
  spelling
  arc> (correct{2/3} "unnkknnoown")
  unnkknnoown
These are not exactly the same results as Norvig's version (>>> correct("speling") -> "sling" / "godd" -> "god"), but I tested the Ruby version linked on his site, and it yields the same results as the Arc one. Note that the result for "speling" is not really good in the canonical version. Maybe it's because the ordering of Python sets differs from that of Ruby/Arc lists. I should port Norvig's test program to stop worrying, but it's OK for now. For now, let's say this version is better than Norvig's (!!!)
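To illustrate the ordering hypothesis: Python's max returns the first maximal element it meets, so when two candidates have equal counts, the winner depends on iteration order. The counts below are hypothetical, and whether a tie is what actually happens for "speling" is not established here.

```python
# Hypothetical equal counts, chosen only to force a tie.
counts = {"sling": 3, "spelling": 3}

def pick(candidates):
    # max keeps the first element that reaches the highest key value.
    return max(candidates, key=lambda w: counts.get(w, 0))

print(pick(["sling", "spelling"]))   # sling
print(pick(["spelling", "sling"]))   # spelling
```

A list has a fixed order, while the iteration order of a set of strings is not something to rely on, so two ports can disagree on ties without either being wrong.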

Bonus: performance is better on average.

---

EDIT:

1. Two points still need to be clarified:

* (range 0 (- (len word) 1)) VS (range 0 (len word))

* the need or not of (known:list word)

2. A revised version for the language with the happiest users (according to... Twitter :-D, see http://blog.doloreslabs.com/2009/05/the-programming-language... FTR) is left as an exercise for the reader. Hey, this is an Arc forum, not a Perl one ;-)!

3. In the same vein, it could be interesting to solve Project Euler (http://projecteuler.net/) in Arc. I think zck has done some work in this area (http://news.ycombinator.com/item?id=837030).
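On the first open point in item 1 (split positions 0 .. len-1 versus 0 .. len), a small Python sketch shows the practical difference: without the final split position, an insertion at the end of the word is never generated. The helper names are hypothetical, and the mapping assumes Arc's range includes its upper bound.

```python
def splits(word, include_end):
    # include_end=False mimics 0 .. len(word)-1; True mimics 0 .. len(word).
    stop = len(word) + 1 if include_end else len(word)
    return [(word[:i], word[i:]) for i in range(stop)]

def inserts(word, letter, include_end):
    # Insert `letter` at every split position.
    return {a + letter + b for a, b in splits(word, include_end)}

print("spells" in inserts("spell", "s", include_end=False))  # False
print("spells" in inserts("spell", "s", include_end=True))   # True
```

Deletions, replacements and transpositions need a non-empty right half, so the extra split position only matters for insertions.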

-----

1 point by s-phi-nl 5280 days ago | link

When I run Norvig's version, I do not get the results you do, but get the same results as you get from the Ruby/Arc versions.

  >>> spell.correct('speling')
  'spelling'
  >>> spell.correct('godd')
  'good'

-----