Arc Forum
"Magical" on-disk persistence
12 points by palsecam 5348 days ago | 9 comments

  ;; Naive but "magical" on-disk persistence.
  ;; Just a demo, don't use this code for real.

  (= ptables* ())

  (def ptable (fname)
    (let tbl (safe-load-table fname)
      (push (list tbl fname) ptables*)
      tbl))  ; return the table so callers can bind it

  (let _sref sref
    (def sref (com val ind)
      (let fnam (and (isa com 'table) (alref ptables* com))
        (do1 (_sref com val ind)
             (when fnam (save-table com fnam))))))
I initially thought 'disktable worked like this, but 'disktable is actually just a bit of sugar around 'save-table/'load-table to save you some keystrokes. You still have to explicitly save the file to disk by calling 'todisk, and this is the kind of thing I easily forget to do.

'ptable is more like what Elephant offers you: nearly completely transparent on-disk persistence.

   arc> (= users* (ptable "usrs.db"))
   arc> (= (users* "admin") "adminpwd"
           (users* "palsecam") "12345")
   arc> (quit)
   ... later ...
   arc> (= users* (ptable "usrs.db"))
   #hash(("admin" . "adminpwd") ("palsecam" . "12345"))     
   arc> (wipe (users* "admin"))
   arc> (time:for i 0 1000 (= users*.i i))
   time: 1490 msec.  ; the file is rewritten at every pass of the loop...
   $ cat usrs.db     # in your shell
   (("palsecam" "12345") (1000 1000) (999 999)
   (6 6) (5 5) (4 4) (3 3) (2 2) (1 1) (0 0))
I don't know if 'sref is the right place to make this modification. A big trade-off is that all calls to 'assign (and so to '=, 'set, etc.) are slower.

This needs a 'smart-save-table that would somehow delay the writing to the disk. And an SSD :-).

2 points by akkartik 5335 days ago | link

My current approach is more efficient in I/O: registries for load and save functions, and a thread that periodically saves everything. You would call it like so:

  (= l ())
  (is-persisted l)
  (= s (table))
  (is-persisted s)

  (= (s 3) 4)
  (push 34 l)

  (save-state) ; redundant; see save-thread* below

  ; New session

  s ; => #hash((3 . 4))
  l ; => (34)

  (= load-registry* () save-registry* ())

  (def load-state()
    (prn "*** Loading state")
    (each loadfn load-registry*
      (loadfn)))

  (def save-state()
    (each savefn save-registry*
      (savefn)))

  (def save-thread()
    (while t
      (sleep 10)
      (save-state)))
  (= save-thread* (new-thread save-thread))

  (mac is-persisted(var)
    (withs (save-function-name (symize "save-" var)
            load-function-name (symize "load-" var))
      `(do
        (def ,save-function-name()
          (errsafe:rmfile "snapshot.tmp") ;; XXX not thread-safe
          (fwrite "snapshot.tmp" ,var)
          (errsafe:mvfile "snapshot.tmp" (snapshot-name ,var)))
        (def ,load-function-name()
          (when (file-exists (snapshot-name ,var))
            (prn "Loading " ',var)
            (fread (snapshot-name ,var) ,var)))
        (push ,load-function-name load-registry*)
        (push ,save-function-name save-registry*))))

  (mac snapshot-name(var)
    `(+ "snapshot." ,(stringify var)))

  (mac fwrite(filename form)
    (let f (uniq)
      `(w/outfile ,f ,filename
          (if (isa ,form 'table)
            (write-nested-table ,form ,f)
            (write ,form ,f)))))

  (mac fread(filename form)
    (let f (uniq)
      `(w/infile ,f ,filename
          (if (isa ,form 'table)
            (= ,form (read-nested-table ,f))
            (= ,form (read ,f))))))

  (def list-len(l)
    (if (acons l)
      (len l)))

  (def alist? (l)
    (if (isa l 'cons)
      (all 2 (map list-len l))))

  (def converting-tablists(l)
    (if (alist? l)
      (listtab2 l)
      l))

  (def listtab2(al)
    (let h (table)
      (map (fn ((k v)) (= (h k) (converting-tablists v)))
           al)
      h))

  (def read-nested-table((o i (stdin)) (o eof))
    (let e (read i eof)
      (if (alist e) (listtab2 e) e)))

  (def tablist2(h)
    (if (isa h 'table)
      (accum a (maptable (fn (k v) (a (list k (tablist2 v)))) h))
      h))

  (def write-nested-table(h (o o (stdout)))
    (write (tablist2 h) o))
I built this without knowing about diskvar; with it, I probably wouldn't need to generate new functions for every variable.


2 points by palsecam 5335 days ago | link

(disclaimer: I'll nitpick in this post and give some free, personal criticism, but akkartik asked me for comments by email, so this is what I'll do)

> My current approach is more efficient in I/O: registries for load and save functions, and a thread that periodically saves everything.

I think my improved solution (just below) is actually the most I/O-efficient. Yours will rewrite the file(s) every 10 seconds, regardless of whether the content has actually been modified. But I like the global thread idea.

In 'is-persisted:

  (fwrite "snapshot.tmp" ,var)
  (errsafe:mvfile "snapshot.tmp" (snapshot-name ,var)))
You're a good system programmer to write a temp file then rename it. But Arc does this for you when you call 'writefile.
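
A minimal sketch of the write-then-rename pattern mentioned here, assuming Arc 3.1's 'w/outfile and 'mvfile. The name 'safe-write and the ".tmp" suffix are hypothetical, chosen for illustration; this is not meant as a copy of 'writefile.

```arc
; Hypothetical helper: write to a temp file, then rename it over the target,
; so a crash in the middle of the write never leaves a half-written target.
(def safe-write (val file)
  (let tmpfile (+ file ".tmp")           ; assumed temp-file naming
    (w/outfile o tmpfile (write val o))  ; write the whole value to the temp file
    (mvfile tmpfile file))               ; then move it onto the real file
  val)

; Usage: "demo.db" is a made-up file name.
(safe-write (list 1 2 3) "demo.db")
(readfile1 "demo.db")                    ; reads the first expression back
```

Readers only ever see the old file or the complete new one, which is the point of the rename step.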

This brings me to: your code is difficult to read. You're defining, IMO, way too many clones/variants of existing Arc functions ('tablist2, 'listtab2, 'fwrite, etc.). Maybe they're actually needed for your case, but maybe take the time to dive into the Arc source and see whether something existing already fits your needs. If you really need to define them, please include a short docstring or a comment to explain why '...2 is needed.

'alist?: ahhh, I'm like you, I like the '?' convention for predicates, but Arc doesn't follow it, and uses the ambiguous 'a... convention instead. IMO, it's better to follow the convention, even if bad, because otherwise, again, it makes the code inconsistent and difficult for others to read. 'alist? is a name too close to the existing 'alist, and this is quite confusing. fallintothis' suggestion, 'an-alist, is a good one IMO. Still, I laughed reading it, because well, that's where the 'a... convention brings us. 'an-a... LOL.

Lisp is awesome in letting you use "special" characters in identifiers, predicates are not often used, and '?' is, like '=', easily parsable, so I can't understand the use of 'a... But this is maybe just personal taste. The ...[-]p convention of CL is worse than '?' but at least less ambiguous. Why is 'afn not [isa _ 'fn]? What about the anaphoric stuff, which also uses 'a..., etc.? And of course, it's an English-centric convention, which is worse than a Latin-centric one when you can choose between the two.

In general: interesting idea and implementation. Still, I prefer either 'diskvar/'disktable, because it's in vanilla Arc, or my 'db/'ptable, because the code is (well, I wrote it so I can't judge objectively) easier and shorter to read, and it doesn't use macros. And 'db/'ptable is transparent, whereas your code is like 'disktable: half-transparent (you still need to call 'todisk/'save-state).

Hope it was useful.


2 points by akkartik 5335 days ago | link

Thanks palsecam, I found it most useful. The issues with 'a naming are compelling. Good to know writefile implicitly does write+rename.

I like your alternative; weird that our comments kinda crossed (sorry I missed it before). I don't yet understand how it avoids unnecessary writes. I didn't appreciate how foundational sref is.

I left a link to an earlier discussion about the ..2 variants :) I need read-nested-table and write-nested-table because read-table and write-table don't handle nested tables. fread/fwrite is my attempt at a unified interface for pickling arbitrary objects. Ideally read/write would take care of that.


1 point by palsecam 5335 days ago | link

> I don't yet understand how it avoids unnecessary writes.

'ptable/'db? Because it only calls 'save-table when 'sref is called. 'sref is called when you modify/delete/create an element in the table.

  arc> (= sometbl!somekey 42)   ; 'sref is called, so is 'save-table (not immediately if using 'db)
So, if your table is not changed for 10 minutes, 'save-table is not called during these 10 minutes. See?

> read-nested-table and write-nested-table

OK, I re-read it and I can understand now. Thanks. A unified interface is a good idea.

I know it exists but can't remember its purpose: 'load-tables (notice the final "s"). Maybe it's there for nested tables [edit after looking: no, it's not].


2 points by akkartik 5335 days ago | link

Ah, I don't know why it took me so long to realize how it works.

Every sref calls the corresponding buffer-exec savefn. The first call to buffer-exec in an interval spawns a thread to save after the interval.

I think I just had my head stuck in the 'iterative' way and had to twist a little to return to the event-driven approach.

Reminds me of the time I used the 'at' command to simulate cron.


2 points by akkartik 5335 days ago | link

See for a less ridiculously slow alist?


2 points by palsecam 5335 days ago | link

Improved version with a 'smarter-save-table. I changed the name of 'ptable to 'db in reference to the first drafts of Arc + it's short + I like it.

  (= buffered-execs* (table))

  (def buffer-exec (f (o delay 1))
    (unless buffered-execs*.f
      (= buffered-execs*.f 
         (thread (sleep delay) (wipe buffered-execs*.f) (f)))))

  (= dbs* ())

  (def db (fname (o delay 0.5))  ; file "synced" every 0.5 sec
    (withs (tbl (safe-load-table fname)
            savefn (fn () (atomic:save-table tbl fname)))
      (push (list tbl (fn () (buffer-exec savefn delay))) dbs*)
      tbl))  ; return the table so callers can bind it

  (let _sref sref
    (def sref (com val ind)
      (do1 (_sref com val ind)
        (awhen (and (isa com 'table) (alref dbs* com)) (it)))))
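
To see the coalescing in isolation, here is a small sketch. It repeats the 'buffer-exec definition from this comment so that it runs on its own, and uses a hypothetical counter function, 'fake-save, in place of a real save function; the name 'saves* and the 0.2 second delay are likewise made up for the demo.

```arc
; Same definitions as in the comment above, repeated so the sketch runs alone.
(= buffered-execs* (table))

(def buffer-exec (f (o delay 1))
  (unless buffered-execs*.f
    (= buffered-execs*.f
       (thread (sleep delay) (wipe buffered-execs*.f) (f)))))

; Hypothetical stand-in for a save function: it only counts its calls.
(= saves* 0)
(def fake-save () (++ saves*))

(repeat 100 (buffer-exec fake-save 0.2))  ; 100 requests in quick succession
(sleep 0.5)                               ; wait for the delayed call to fire
saves*                                    ; how many times fake-save actually ran
```

After the sleep, saves* should be 1 rather than 100: requests that arrive while a thread is already pending for the same function are ignored, so a burst of writes to a 'db table costs one 'save-table per delay window.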


3 points by conanite 5348 days ago | link

Nice - ActiveRecord for arc? But why does it make 'assign slower - unless I've missed something, assign doesn't depend on sref?


2 points by rntz 5345 days ago | link

You are correct. 'assign doesn't depend on 'sref. '= does.
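
A quick way to check this at the REPL, sketched under the assumption of Arc 3.1, is to wrap 'sref with a counter. The names 'sref-calls* and 'orig-sref are hypothetical; a local name other than '_sref is used only to keep the wrapper clearly separate from the global.

```arc
(= sref-calls* 0)

; Wrap the global sref so every call through it is counted.
(let orig-sref sref
  (def sref (com val ind)
    (++ sref-calls*)            ; ++ on a plain variable ends in assign, not sref
    (orig-sref com val ind)))

(= h (table))    ; plain variable: '= expands to assign
(= (h 'a) 1)     ; table place: '= expands to a call to sref
(assign x 42)    ; assign is a special form and never calls sref
sref-calls*      ; counts only the table assignment
```

So the overhead of the 'ptable/'db hook falls on '= (and anything built on it) when the place is a table, string or list position, not on plain variable assignment.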