Forum patch: better domain stripping
6 points by jcs 2984 days ago
something that always bothered me on HN was that when displaying a domain name like code.google.com or someuser.github.com, it would strip off the first hostname component regardless of what it was. it looks like there is some code there to prevent this for certain domains (the multi-tld-countries* and long-domains* lists), but rather than manually maintain such lists, i modified news.arc to only strip off the first part if it starts with www.

this way a url with www4.hp.com will still show as hp.com, but support.hp.com will show as support.hp.com.

i realize sitename is probably also used for some kind of spam control/rate limiting, so maybe the code that displays a url's domain should be separate from sitename.
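
a minimal sketch of what that split could look like, assuming sitename is left untouched for whatever else depends on it. display-domain is a made-up name and not something in news.arc. it uses begins (which news.arc already calls in valid-url) rather than headmatch, on the assumption that begins checks lengths first and so copes with a first label shorter than "www":

```arc
; hypothetical helper, not part of news.arc: domain string for display only.
; splits the url the same way parse-site does in the patch below.
(def display-domain (url)
  (let toks (tokens (cadr (tokens url [in _ #\/ #\?])) #\.)
    (tostring
      ; drop the first label only when it starts with "www" (www, www4, ...)
      (prall (if (begins (car toks) "www") (cdr toks) toks)
             "" "."))))

(display-domain "http://www4.hp.com/products")    ; expected: "hp.com"
(display-domain "http://support.hp.com/drivers")  ; expected: "support.hp.com"
(display-domain "http://www.bbc.co.uk/news")      ; expected: "bbc.co.uk"
```

joining (cdr toks) instead of picking out two tokens keeps every label after the www one, so multi-part suffixes like co.uk survive without needing a list.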

     --- ../arc3.orig/news.arc  2009-06-29 11:54:11.000000000 -0500
     +++ news.arc       2011-06-20 12:31:35.000000000 -0500
     @@ -1545,28 +1545,17 @@
        (aand (url->story* (canonical-url url)) (check (item it) live)))
      
      (def parse-site (url)
     -  (rev (tokens (cadr (tokens url [in _ #\/ #\?])) #\.)))
     +  (tokens (cadr (tokens url [in _ #\/ #\?])) #\.))
      
      (defmemo sitename (url)
        (and (valid-url url)
             (let toks (parse-site (rem #\space url))
               (if (isa (saferead (car toks)) 'int)
                   (tostring (prall toks "" "."))
     -             (let (t1 t2 t3 . rest) toks  
     -               (if (and t3 (or (mem t1 multi-tld-countries*) 
     -                               (mem t2 long-domains*)))
     -                   (+ t3 "." t2 "." t1)
     -                   (and t2 (+ t2 "." t1))))))))
     -
     -; Minor bug: can have both google.at and google.co.at.  Same for jp.
     -
     -(= multi-tld-countries* '("uk" "jp" "au" "in" "ph" "tr" "za" "my" "nz" "br" 
     -                          "mx" "th" "sg" "id" "pk" "eg" "il" "at" "pl"))
     -
     -(= long-domains* '("blogspot" "wordpress" "livejournal" "blogs" "typepad" 
     -                   "weebly" "blog-city" "supersized"
     -                   ; "sampasite"  "multiply" "wetpaint" ; all spam, just ban
     -                   "eurekster" "blogsome" "edogo" "blog" "com"))
     +             (let (t1 t2 t3 . rest) toks
     +               (if (headmatch "www" t1)
     +                  (tostring (prall (cdr toks) "" "."))
     +                  (tostring (prall toks "" "."))))))))
      
      (def create-story (url title text user ip)
        (newslog ip user 'create url (list title))