Scheme version of JSON
3 points by akkartik 1789 days ago | 30 comments
I just added an existing scheme json parser to anarki (http://www.lshift.net/blog/2005/08/22/json-for-mzscheme-and-a-portable-packrat-parsing-combinator-library). It seems to be over 20x faster than the pure-arc version shown in this forum. Not a big deal for most uses, but I've been importing a couple KB of json.

The parser needed some changes for lifting into arc: it returns arc literals for true/false/null (false and null both return nil). Let me know if you spot any other bugs.

The process raised several issues:

a) anarki needs a process for loading scheme libraries. The best I could do was add to the end of ac.scm. Did I miss something? I think the ideal solution would be to specify scheme libraries on the arc commandline to be loaded on startup.

b) Is there a way to update the mzscheme lib load path from within scheme? The best I could find was the PLTCOLLECTS variable.

c) The right PLTCOLLECTS value depends on the local setup, so the best I could do was to comment out the line that loads json and leave instructions. So I'm pretty sure I haven't broken anarki for everyone :o)

And oh, does github usually take a while to reflect new pushes on the website?

The whole process seems hacky to me, so feel free to tell me off if this was a bad idea.



2 points by aw 1787 days ago | link

[edit] added support for #t, #f, and #\nul...

akkartik, I was wondering if you might like to test this approach and see how fast it is compared to your version? (I could test the speed myself if you wanted, but without your JSON data my results might not be representative).

I believe the particular require planet line below requires MzScheme version 4, let me know if you're using version 3...

  (def deepcopylist (xs)
    (if (no xs)
         nil
         (cons (deepcopy (car xs)) (deepcopylist (cdr xs)))))

  (= scheme-f (read "#f"))
  (= scheme-t (read "#t"))

  (def deepcopy (x)
    (if (is x scheme-t)
         t
        (is x scheme-f)
         nil
        (is x #\nul)
         nil
         (case (type x)
           table (w/table new
                   (each (k v) x
                     (= (new (deepcopy k)) (deepcopy v))))
           cons (deepcopylist x)
           x)))

  ($ (require (planet dherman/json:3:0)))

  (= scheme-read-json ($ read-json))

  (def read-json (s)
    (deepcopy (scheme-read-json s)))
The require planet line gave me a bunch of warnings about not having scribble installed, but it worked anyway.

The idea is we run the original unmodified Scheme module, and then convert its Scheme return value to Arc.

I expect this would probably be slower than your code. The question is, how much slower? If it turns out to be, oh, 1% slower for example, we might not care! :) And a lot easier than going into every Scheme module we might want to use and modifying it to return correct Arc values.

-----

1 point by aw 1787 days ago | link

OK, here's a version that uses the lshift parser that you ported. Uncomment the require file line and put in the path to the original json.ss module...

  (def deepcopylist (xs)
    (if (no xs)
         nil
         (cons (deepcopy (car xs)) (deepcopylist (cdr xs)))))

  (= scheme-f (read "#f"))
  (= scheme-t (read "#t"))
  (= scheme-vector? ($ vector?))
  (= scheme-void? ($ void?))
  (= vector->list ($ vector->list))
          
  (def deepcopy (x)
    (if (is x scheme-t)
         t
        (is x scheme-f)
         nil
        (scheme-void? x)
         nil
        (scheme-vector? x)
         (w/table new
           (each (k . v) (vector->list x)
             (= (new (deepcopy k)) (deepcopy v))))
        (acons x)
         (deepcopylist x)
         x))

  ; ($ (require (file "/tmp/json-scheme-20050827134102/json.ss")))

  (= scheme-json-read ($ json-read))

  (def json-read (s)
    (deepcopy (scheme-json-read s)))
I timed a few runs of your port and this version against your data set; times for both versions varied between 585ms and 831ms on my laptop, but there wasn't a difference that I could see between the two versions given the spread of times for each run.

-----

1 point by akkartik 1787 days ago | link

Interesting that you see no difference! What platform are you on?

BTW, you can replace deepcopylist with simply (map deepcopy x).
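
A minimal sketch of that simplification, trimmed to the boolean and list cases so that it stands alone in plain Arc. Leaving out the table, vector and void branches from the versions above is an assumption made for brevity, not a suggestion that they are unnecessary:

```arc
; Scheme's #f and #t, obtained the same way as in the code above.
(= scheme-f (read "#f"))
(= scheme-t (read "#t"))

; Convert Scheme booleans to Arc t/nil and rebuild lists recursively.
; Scheme's '() is caught by the scheme-f test, since `is` treats
; '(), #f and nil as the same false value.
(def deepcopy (x)
  (if (is x scheme-t)
       t
      (is x scheme-f)
       nil
      (acons x)
       (map deepcopy x)   ; takes the place of deepcopylist
       x))
```

Since map builds a fresh list ending in nil, this should also take care of the '() terminator issue for proper lists. Dotted pairs would still need separate handling, as they would with deepcopylist.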

-----

1 point by aw 1787 days ago | link

I'm running Linux on my laptop. There could well be a difference, I simply haven't run the tests enough times to be able to tell, given that I'm getting a pretty wide spread of run times for each version. Which could be for example my web browser sucking up some CPU at random times or whatever...

-----

1 point by akkartik 1787 days ago | link

Great idea. I thought of that approach but discounted it without attempting.

I tried running it but got a parse error.

  default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
  default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
  setup-plt: error: during making for <planet>/dherman/json.plt/3/0 (json)
  setup-plt:   default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
  setup-plt: error: during Building docs for /home/pair0/.plt-scheme/planet/300/4.1.3/cache/dherman/json.plt/3/0/json.scrbl
  setup-plt:   default-load-handler: cannot open input file: "/usr/lib/plt/collects/scribble/base.ss" (No such file or directory; errno=2)
  Error: "read: expected: digits, got: ."
I'm going to email you the data set.

-----

1 point by aw 1787 days ago | link

This appears to be a bug in the JSON parser...

  $ mzscheme
  Welcome to MzScheme v4.1.5 [3m], Copyright (c) 2004-2009 PLT Scheme Inc.
  > (require (planet dherman/json:3:0))
  > (read-json (open-input-string "3.1"))
  read: expected: digits, got: .
I'll take a look at the JSON parser you ported.

It will be a better test anyway... the two different JSON parsers might have very different speeds for all we know.

-----

1 point by akkartik 1787 days ago | link

I just tried it with the version I already have.

  (time:deepcopy:w/infile f "x" (json-read f))
  time: 5874 msec.
  (time:w/infile f "x" (json-read f))
  time: 3953 msec.
So at least here it's within a factor of 2. Pretty useable.

-----

1 point by aw 1787 days ago | link

Yeah, be sure to try each test multiple times. You can get that much of a variance simply from the garbage collector running at one time but not another, from the file needing to be read from disk vs. already cached in memory by the operating system, some other process hitting the CPU, or in EC2, the virtual CPU getting fewer cycles at the moment...

-----

1 point by akkartik 1787 days ago | link

Yeah it's fairly consistent. I've tried experiments where I interleave the two versions and compute an average.
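
Roughly, the interleaving can be sketched like this. The names msec-of and compare-msec are hypothetical helpers invented for illustration, not something in arc.arc or Anarki, and n is assumed to be a positive integer:

```arc
; Hypothetical helper: wall-clock milliseconds taken by one call of thunk f.
(def msec-of (f)
  (let start (msec)
    (f)
    (- (msec) start)))

; Hypothetical helper: run thunks f and g alternately, n times each,
; and return a list of their mean run times in milliseconds.
(def compare-msec (n f g)
  (with (tf 0 tg 0)
    (repeat n
      (++ tf (msec-of f))
      (++ tg (msec-of g)))
    (list (/ tf 1.0 n) (/ tg 1.0 n))))

; Possible usage, assuming json-read and deepcopy are already loaded
; and "x" is the data file:
;   (compare-msec 10
;                 (fn () (w/infile f "x" (json-read f)))
;                 (fn () (deepcopy (w/infile f "x" (json-read f)))))
```

Alternating the two thunks means a slow stretch on the machine (GC, disk cache misses, a busy neighbour on EC2) tends to hit both versions instead of skewing just one.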

-----

1 point by akkartik 1785 days ago | link

I just switched to this version (also in anarki). I was seeing errors with unicode escape sequences, and this bug was just easier to fix.

-----

1 point by aw 1785 days ago | link

Good! I wanted to see if the latest code would work with the hackinator, so I grabbed a copy from Anarki and updated the deepcopy code:

  $ hack ycombinator.com/arc/arc3.1.tar \
         awwx.ws/ac0.hack \
         awwx.ws/scheme-json0.hack
Seems to work. (require (file "lib/.../foo.ss")) appears to be the right thing to do for .ss files; a Scheme load uses Arc's readtable which messes up on square brackets, and a plain (require "lib/.../foo.ss") doesn't like periods in directory names.

-----

1 point by aw 1787 days ago | link

OK.

What version of MzScheme are you running?

-----

1 point by akkartik 1787 days ago | link

v4.1.3 on Ubuntu jaunty on EC2

-----

2 points by aw 1789 days ago | link

Note that you're not converting Scheme '() terminated lists to Arc nil terminated lists. This often isn't noticeable since Arc largely treats '() as a synonym for nil, but it does have an effect in some cases, such as keys in a table:

  arc> (= a (table))
  #hash()
  arc> (= k (fromstring "[1,2]" (json-read (stdin))))
  (1 2)
  arc> (= (a k) 'foo)
  foo
  arc> k
  (1 2)
  arc> (a k)
  foo
  arc> (a '(1 2))
  nil
Here k is (1 2 . ()) while '(1 2) is (1 2 . nil), so they're actually different keys in the table.

Whether you feel the need to fix this depends on your application... I ran into a similar issue myself in another project when I was using lists as keys in a table, but it might not affect you for what you're doing.

-----

1 point by akkartik 1789 days ago | link

Yeah, that's a bug. Sigh. The whole nil vs () issue makes it non-trivial for arc to benefit from mzscheme's extensive libraries.

Hmm, perhaps there's a way to fix it: get the reader to recognize #t, #f, #<void>, and get (is nil ()) to return t. Is that reasonable?

-----

1 point by aw 1789 days ago | link

(is nil ()) does return t:

  arc> (is nil (ac-scheme '()))
  t
or in Anarki,

  arc> (is nil ($ '()))
  t
Arc does treat Scheme's '() as a synonym for nil quite a bit, so there's only a few odd corner cases where the use of '() becomes visible.

Not sure what you mean by getting the reader to recognize #t, #f, #<void>?

For fun I once hacked ac.scm so that Arc's nil was represented internally by Scheme's '(). Inside of Arc everything was still the same: nil was still a symbol and so on. It even seemed to work, though I didn't test it very much.

-----

1 point by akkartik 1788 days ago | link

Most interesting, thanks for these ideas and tips.

The original scheme version converted true to #t, false to #f, and null to (void) which turns into the #<void> literal. These break in arc because #f isn't nil, and #<void> can't even be read by the reader. So I think I have to take a comprehensive look at mzscheme's syntax at some point and make sure that anything scheme code can emit can be read and understood by the arc compiler.

-----

1 point by aw 1788 days ago | link

Oh, do you mean that if, for example, Arc treated #t as a synonym for t, then we wouldn't have to do that conversion ourselves manually?

-----

1 point by akkartik 1788 days ago | link

Yes.

-----

1 point by aw 1788 days ago | link

hmm, ac.scm is a module, so you should be able to import it into json.ss:

  (require "../ac.scm")
that would give you access to ac-niltree, so at the point where json.ss is creating a list you could convert it to a nil-terminated list.

Another option is to recursively copy the return value of json-read.

-----

2 points by aw 1789 days ago | link

Scheme libraries can be loaded dynamically from Arc using Anarki's $ or my ac-scheme. And using the "file" variant of require means that PLTCOLLECTS doesn't need to be set. Thus in a fresh clone of Anarki as of this morning, with nothing uncommented:

  arc> ($ (require (file "lib/json.ss")))
  #<void>
  arc> ($ (xdef json-read json-read))
  #<procedure:json-read>
  arc> (fromstring "true" (json-read (stdin)))
  t

-----

1 point by akkartik 1789 days ago | link

Thanks a lot.

-----

1 point by akkartik 1787 days ago | link

I decided that the json library perhaps shouldn't even be in anarki, so I've been experimenting with keeping it in my project sources. It seems useful to be able to mix arc with scheme in a project, especially for performance reasons.

Problem is, I can't do it without literally hardcoding a string literal: ($ (require (file "/path/to/json.ss")))

I've tried defining a helper function, but require must be at the top level. I've tried saying (require (file (+ dir "json.ss"))), but it seems the inside of the require form isn't lisp but some toy, bizarro universe.

This whole experience is bringing home just how much I hate the mzscheme module system. Lisp is all flowing lines; require is a brick. Just one way to use it; once you release it all it can do is sink.

Any suggestions?

-----

1 point by aw 1787 days ago | link

A macro can expand into a ($ ...) or (ac-scheme ...) form, so try

  (= json-path "/tmp/json-scheme-20050827134102/json.ss")

  (mac load-json ()
    `($ (require (file ,json-path))))

  (load-json)

-----

1 point by akkartik 1787 days ago | link

Awesome thanks!

  (mac load-scheme (f)
    `($ (require (file ,(+ ($ start-dir*) f)))))
  (load-scheme "json.ss")

-----

1 point by thaddeus 1776 days ago | link

I was trying to read this thread and determine what came out of this... did it turn out that the scheme version was faster? I'm curious.

thnx.

-----

1 point by aw 1776 days ago | link

What input did you give the parser that gave that error?

akkartik reports that the Scheme version is substantially faster. I have it working with the hackinator: http://arclanguage.org/item?id=10848

-----

1 point by thaddeus 1776 days ago | link

Sorry aw; I messed you up and edited my post after you replied....

see http://arclanguage.org/item?id=10942 - I fixed it on my own.

Here was the original case example:

  (fromjson "{\"RESULT\" : {\"SUCCESS\" : true} , \"SERVER\" : \"cool dude\"}")

-----

1 point by aw 1776 days ago | link

This is what I get:

  $ hack ycombinator.com/arc/arc3.1.tar awwx.ws/parsecomb0.arc awwx.ws/fromjson0.arc
  /tmp/4uG5ClxUqg
  ycombinator.com/arc/arc3.1.tar
  awwx.ws/parsecomb0.arc
  awwx.ws/fromjson0.arc
  mzscheme -f as.scm
  Use (quit) to quit, (tl) to return here after an interrupt.
  arc> (fromjson "{\"RESULT\" : {\"SUCCESS\" : true} , \"SERVER\" : \"cool dude\"}")
  #hash(("SERVER" . "cool dude") ("RESULT" . #hash(("SUCCESS" . t))))
  arc>
Are you possibly using an earlier version?

-----

1 point by thaddeus 1776 days ago | link

yup - sorry. I didn't update to the last version :)

-----