Arc Forum | Norvig's spelling corrector in Arc?

Arc Forum

	Norvig's spelling corrector in Arc?
	4 points by adm 6538 days ago \| 5 comments
	anybody tried doing Arc version? http://norvig.com/spell-correct.html

1 point by lojic 6481 days ago | link

I found the following implementation in Clojure which compares much more favorably to Norvig's Python version than the other Scheme/Lisp versions I've seen. Still curious about how Arc might compare.

Source: http://en.wikibooks.org/wiki/Clojure_Programming#Norvig.27s_...

  (defn words [text] (re-seq #"[a-z]+" (. text (toLowerCase))))

  (defn train [features]
    (reduce (fn [model f] (assoc model f (inc (get model f 1)))) 
            {} features))

  (def *nwords* (train (words (slurp "big.txt"))))

  (defn edits1 [word]
    (let [alphabet "abcdefghijklmnopqrstuvwxyz", n (count word)]
      (distinct (concat
        (for [i (range n)] (str (subs word 0 i) (subs word (inc i))))
        (for [i (range (dec n))]
          (str (subs word 0 i) (nth word (inc i)) (nth word i) (subs word (+ 2 i))))
        (for [i (range n) c alphabet] (str (subs word 0 i) c (subs word (inc i))))
        (for [i (range (inc n)) c alphabet] (str (subs word 0 i) c (subs word i)))))))

  (defn known [words nwords] (for [w words :when (nwords w)]  w))

  (defn known-edits2 [word nwords] 
    (for [e1 (edits1 word) e2 (edits1 e1) :when (nwords e2)]  e2))

  (defn correct [word nwords]
    (let [candidates (or (known [word] nwords) (known (edits1 word) nwords) 
                         (known-edits2 word nwords) [word])]
      (apply max-key #(get nwords % 1) candidates)))

-----

2 points by almkglor 6537 days ago | link

Probably one potential problem would be the splitting of a string into words. A minor problem is that of figuring out what a "word" is, i.e. the division between words.

Otherwise looks like a pretty standard Bayesian analysis, which I believe pg has done already.

-----

1 point by fallintothis 6536 days ago | link

Not really. It's a simple regexp that Norvig uses.

Python:

  def words(text): return re.findall('[a-z]+', text.lower())

Arc:

  (def words (text) (tokens (downcase text) [~<= #\a _ #\z]))

-----

2 points by cluelessmoron 6414 days ago | link

      (bayes-train (map lower-case (parse (read-file {big.dms}) {\W} 0)) 'Lexicon)
      (define (edits1 word) 
       (let ((l (length word)) (res '()))
        (for (i 0 (- l 1)) (push (replace (nth i (string word)) (string word) "") res -1))
        (for (i 0 (- l 2)) (push (swap i (+ i 1) (string word)) res -1))
        (for (i 0 (- l 1)) (for (c (char "a") (char "z")) 
          (push (replace (nth i (string word)) (string word) (char c)) res -1)
          (push (replace (nth i (string word)) (string word) (string (char c) (nth i (string word)) )) res -1)))
        res))
      (define (edits2 word) (unique (flat (map edits1 (edits1 word)))))
      (define (known-edits2 word) (filter (fn (w) (Lexicon w)) (edits2 word)))
      (define (known words) (filter (fn (w) (Lexicon w)) words))
      (define (correct word)
        (let ((candidates (or (known (list word)) (known (edits1 word)) (known (edits2 word)))))
          (first (sort candidates (fn (w1 w2) (> (Lexicon w1) (Lexicon w2)))))))

Test

      (join (map correct (parse "tihs sentnce is ful of inkorrect spelingz")) " ")
      
      this sentence is full of incorrect spelling

that's newLISP - should be easy to convert...

-----

1 point by lojic 6535 days ago | link

Here's a translation of Norvig's code to Ruby for comparison:

http://lojic.com/blog/2008/09/04/how-to-write-a-spelling-cor...

-----