Arc Forum
Reading a webpage
1 point by skenney26 5875 days ago | 6 comments
How do you read a webpage? I'd like to write a macro like this...

  (w/url u "http://arcxs.posterous.com/"
    ...)
... and have the variable u bound to a string that represents the content of the specified url. Then I could search the page for links, pics, etc.

I've tried a variety of approaches on Arc2 and Anarki but haven't found a solution yet. Any help would be appreciated.



2 points by antiismist 5875 days ago | link

... I haven't seen http://arcxs.posterous.com before - interesting site!

-----

2 points by almkglor 5875 days ago | link

stefano has an "http-get" library on Anarki. I haven't used it yet, though ^^

-----

1 point by skenney26 5875 days ago | link

I've been looking through stefano's code. It looks like something along these lines should work...

  (let (i o) (connect-socket url* 80)
    (disp (readline i)))
... or something like that, but I haven't gotten it to work. Usually I get an error or a return value of nil.

-----

3 points by stefano 5875 days ago | link

With http-get loaded this macro should do the job. You could also have a look at xml.arc to parse xhtml pages.

  (mac w/url (var url . body)
    `(let ,var (cadr (get-request (str->url ,url)))
       ,@body))

-----

1 point by skenney26 5874 days ago | link

Awesome, that's exactly what I needed.

I'm not familiar with how to use xml.arc but this is what I came up with for finding the links on a page:

  (def find-links (str)
    (with (start 0 acc nil)
      (whilet p (posmatch " href=" str start)
        (= start
           (+ p (if (in (str (+ p 6)) #\' #\") 7 6)))
        (push (cut str start (pos [in _ #\' #\" #\> #\space] str start))
              acc))
      (rev acc)))

  (w/url u "http://www.google.com/"
    (find-links u))
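
A quick way to sanity-check find-links without touching the network is to call it on a literal string. This is only a sketch: it assumes find-links from above is already loaded, and both the name sample* and the HTML in it are made up for illustration. One href is quoted and one is not, so both branches of the quote check get exercised.

```arc
; Assumes find-links (defined above) has already been loaded at the REPL.
; sample* is a hypothetical name; the markup is invented test data.
(= sample* "<a href=\"http://a.com/\">A</a> <a href=/b>B</a>")

; The first href is quoted, so the scan skips 7 characters past the match;
; the second is unquoted, so it skips 6.
(find-links sample*)
; should return the hrefs in page order: ("http://a.com/" "/b")
```

Note that the search pattern is " href=" with a leading space, so an href that follows a newline or tab instead of a space would be missed.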

-----

1 point by antiismist 5875 days ago | link

I used it a while back, and it seemed to work.

-----