* the original version of Norvig's spelling corrector is written in Python, is 21 lines, and can be found at http://norvig.com/spell-correct.html, w/ lots of explanations and test materials.
Will look at it in details this evening, but for now:
* not sure if this changes anything, but in Norvig's edits1() version:
>>> edits1("python")
# the var `s' is [('', 'python'), ('p', 'ython'), ('py', 'thon'),
('pyt', 'hon'), ('pyth', 'on'), ('pytho', 'n'), ('python', '')]
But in your Arc version, the result of (map [split ... (- (len word) 1)) omits the last ("python" "")
arc> (def e (word)
(map [split word _] (range 0 (- (len word) 1))))
#<procedure: e>
arc> (e "python")
(("" "python") ("p" "ython") ("py" "thon") ("pyt" "hon") ("pyth" "on")
("pytho" "n"))
arc> (def e (word) ; "Fixed", making it even shorter :-)
(map [split word _] (range 0 (len word))))
*** redefining e
#<procedure: e>
arc> (e "python")
(("" "python") ("p" "ython") ("py" "thon") ("pyt" "hon") ("pyth" "on")
("pytho" "n") ("python" ""))
Makes me think, (- (len xs) 1) is a common idiom in languages w/ zero-based indexing.
Makes me think, maybe 'range should behave like in Python, i.e: if there is only one arg, the `start' is implicitely 0.
---
EDIT: oh my god, there are links to the impls of this spelling corrector in other languages on Norvig's page, and the Perl one is 63 lines (!) and is written in Perl6. I suppose I'd not be able to sleep well before trying to make something shorter (and in Perl5). A Perl program 3 times longer than its Python equivalent, never seen that before!
EDIT2: heck, thinking of it, the list-comprehension and special indexes of Python (word[:i]/word[i:]) may actually not be easy/short to do in Perl...
But. Both this version & pg's one are actually buggy. Obvious example:
arc> (correct "yellow")
"fellow"
Canonical version gives the expected answer, "yellow". So I'd say, no, these are not correct implementations (the Perl version seems to always give the same results than the Arc one).
A thing I have remarked is that the list returned by 'edits{1|2} contains the original value (i.e: (find "python" (edits1 "python")) is t), where this is not the behaviour of Norvig's version. (That's also why the loop in my correct() isn't (for ($win, &edits1($win), &edits2($win)), in contrary to the Arc one).
And this may be incorrect. Or maybe, if $win/word is in nwords, 'correct should stop (immediately). This would - at least - correct the "yellow" -> "fellow" problem.
This is, however, not the only issue. Norvig's version and these versions, given the same "big.txt", don't give the same correction for other words (try "speling", "godd"). And I strongly suppose that Norvig's version is the most correct.
(def known (words) (dedup:keep [nwords _] words)) ; lines count is now 12
(def correct2 (word (o f [nwords _]))
(most f
(or (known:list word) (known:edits1 word) (known:edits2 word) (list word))))
Or:
(def correct3 (word (o f [nwords _])) ; don't need 'known, but require aspirin
(most f (or ((apply orf (map [fn (w) (dedup:keep [nwords _] (_ w))]
(list list edits1 edits2)))
word) (list word))))
arc> (correct{2/3} "yellow")
yellow
arc> (correct{2/3} "godd")
good
arc> (correct{2/3} "speling")
spelling
arc> (correct{2/3} "unnkknnoown")
unnkknnoown
Not exactly the same results than Norvig's version (>>> correct("speling") -> "sling" / "godd" -> "god") but I tested the Ruby version linked on his site, and it yields the same results. Note that the result for "speling" is not really good in canonical version. Maybe it's because the order on Python sets is different from the one in Ruby/Arc lists. I should port the test program of Norvig to stop worrying, but it's OK for now. For now, let's say this version is better than the Norvig's one (!!!)