Arc DB
7 points by shader 4095 days ago | 10 comments
Every time I almost get around to working on an arc-based web service, one of the challenges I end up facing is, 'how do I store my data in an arc-friendly, scalable fashion?' What have you guys been using for your services, and what would you like in a database system if I were to work on one?

So far, most of my data models have been simple objs with an id field, indexed in multiple ways by adding them to the relevant index tables, like "users" and "items", with subsets of items and users as necessary, using push-on-update semantics. This seems fairly scalable, and relatively directly mappable to some of the newer nosql key-value storage systems, but I'm not sure what would be best for arc.
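
To make the model concrete, here is a minimal sketch of objects with an id field plus an index table that is refreshed on every save. It is written in Python only so that it runs standalone; the names (items, by_owner, save_item, items_of) are made up for illustration and are not from any existing arc library.

```python
from collections import defaultdict

# Hypothetical names throughout; a sketch of the idea, not an existing library.
items = {}                    # primary table: id -> object
by_owner = defaultdict(set)   # index table: owner id -> ids of that owner's items

def save_item(item):
    """Store an item and push its id into the relevant index on every update."""
    old = items.get(item["id"])
    if old is not None:
        # Drop the stale index entry before re-indexing the new version.
        by_owner[old["owner"]].discard(item["id"])
    items[item["id"]] = item
    by_owner[item["owner"]].add(item["id"])

def items_of(owner):
    """Look items up through the index instead of scanning the whole table."""
    return [items[i] for i in sorted(by_owner[owner])]

save_item({"id": "i1", "owner": "u1", "title": "first"})
save_item({"id": "i2", "owner": "u1", "title": "second"})
save_item({"id": "i1", "owner": "u2", "title": "first"})  # update moves i1 to u2
```

The same shape maps onto a key-value store fairly directly: the primary table and each index table become separate key spaces.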

Most of what I want feature-wise is just the ability to save and load items by key, generate the indexes, and be able to persist the links between objects. Ideally, a user could have references to items, and in the live instance they would just be pointers in a list, but when persisting they would be converted to guids, and back again on load. Has anyone made anything like that, or been using a similar model?
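
The pointer-to-guid round trip could look roughly like the sketch below. It assumes every item carries its guid in an "id" field and that all referenced items are already loaded when a user is read back; dump_user and load_user are hypothetical names, and JSON stands in for whatever on-disk format is actually used.

```python
import json

def dump_user(user):
    """Replace live item references with their guids before writing."""
    flat = dict(user)
    flat["items"] = [item["id"] for item in user["items"]]
    return json.dumps(flat)

def load_user(text, items_by_id):
    """Turn stored guids back into direct references to the loaded items."""
    user = json.loads(text)
    user["items"] = [items_by_id[guid] for guid in user["items"]]
    return user

item = {"id": "i1", "title": "first"}
user = {"id": "u1", "items": [item]}   # live instance: a plain list of pointers

stored = dump_user(user)               # persisted form holds only guids
restored = load_user(stored, {"i1": item})
```

After loading, restored["items"][0] is the very same object as item, so in-memory code never sees the guids.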

I also wanted something where I could version the objects, though that feature is proving more challenging and may be unnecessary. Ideally, this would allow for an immutable data store, and I could load the state of the entire system by date, or a single object by incremental version. That may end up being best implemented by making a Cassandra (or similar) interface and storing each object as a key in a column family (CF), with one column per change, named by the date of that change.
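
As a rough in-memory illustration of that versioning idea (not a Cassandra client), each key below keeps an append-only list of (timestamp, value) pairs, which allows lookups by version number, by date, or across the whole system at a date. The class and method names are invented for the sketch, and timestamps are plain integers for brevity.

```python
import bisect

class VersionedStore:
    """Append-only history per key: nothing is overwritten, only added."""

    def __init__(self):
        self.history = {}  # key -> list of (timestamp, value), timestamps ascending

    def put(self, key, timestamp, value):
        versions = self.history.setdefault(key, [])
        if versions and timestamp < versions[-1][0]:
            raise ValueError("timestamps must not go backwards")
        versions.append((timestamp, value))

    def get_version(self, key, n):
        """Value of the n-th saved version of one object (0-based)."""
        return self.history[key][n][1]

    def get_at(self, key, timestamp):
        """Latest value saved at or before timestamp, or None if there is none."""
        versions = self.history.get(key, [])
        stamps = [t for t, _ in versions]
        i = bisect.bisect_right(stamps, timestamp)
        return versions[i - 1][1] if i else None

    def snapshot(self, timestamp):
        """State of every object that existed as of timestamp."""
        state = {}
        for key in self.history:
            value = self.get_at(key, timestamp)
            if value is not None:
                state[key] = value
        return state

store = VersionedStore()
store.put("u1", 1, {"name": "a"})
store.put("i1", 3, {"title": "x"})
store.put("u1", 5, {"name": "b"})
```

Here store.snapshot(2) sees only the first version of u1, while store.snapshot(5) sees the second version of u1 together with i1.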

Thoughts on the data model and versioning concept? Should this be implemented directly in arc or just as a mapper over an existing nosql system? I had started out by using git as the database, but supposedly the index system that git uses doesn't scale very well to large numbers of objects, which I'm expecting to have.

3 points by dido 4095 days ago | link

I wonder why Arc seems to have stayed away from the traditional SQL databases that have long been used for web apps. There are few problems for which relational databases are inappropriate, and not using one results in reinvention of the wheel which I think is hardly a good thing to be doing. A DBI work-alike by which one can connect to a MySQL, Postgres, or Sqlite database would be a lot simpler to my mind.


3 points by Pauan 4094 days ago | link

Because it's the simplest solution for Arc. Arc does not have any kind of SQL library, and it doesn't even have an officially supported way to drop down to Racket libraries, nor does it have any kind of FFI whatsoever. So flat files are indeed the fastest way to get up and running.

Perhaps in the long run it might be better to have some sort of way to connect to an SQL DB, but keep in mind that Arc is still a work in progress, and unfortunately has not been updated in a long time. For a while, Arc didn't even have Unicode support, because pg was working on more important things.

If you think of Arc as being a prototype, then it all makes sense. pg probably intended to eventually flesh it out into a full language, but I hear he hasn't had the time.


3 points by shader 4094 days ago | link

If arc had support for something like DBA or SQLAlchemy built in, I might just have used it with either postgres or sqlite. However, neither of those databases really fits the arc data model very well, imo, because arc is very hash table and list oriented. Objects have very little in the way of a set schema, and hash tables map pretty well to... hash tables.

Anyway, I mostly want to leave all the objects in memory and use direct references between them; my data relations aren't that complicated, and explicit relations where necessary are actually fairly efficient. In fact, that's what most ORMs a la SQLAlchemy seem to do; whenever an object is loaded, you can specify desired relations that also get loaded in memory, so you don't have to explicitly query the database each time.

Memory is cheap these days, and I was hoping for something that allowed versioning and perhaps graph-db features.


2 points by akkartik 4094 days ago | link

Hmm, do you care about threading and consistency at all? If not, you could probably do everything with just arc macros over the existing flat file approach.


3 points by shader 4092 days ago | link

I think that some form of scalability would be valuable, but that could easily be achieved with some sort of single threaded worker for each db 'server', and then have multiple instances running to provide the scalability. In order to make the single threaded semantics work well even in a multi-threaded application, I already have a short library for erlang-style pattern matched message passing.
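
A minimal sketch of the single-threaded worker idea, assuming a plain tagged-message dispatch rather than real Erlang-style pattern matching: one thread owns the data, and every other thread talks to it through a queue. DbWorker and its message names ("put", "get", "stop") are hypothetical.

```python
import queue
import threading

class DbWorker:
    """One thread owns the data; other threads only send it messages."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.data = {}
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        # Messages are handled strictly one at a time, so self.data needs no lock.
        while True:
            op, args, reply = self.inbox.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "put":
                key, value = args
                self.data[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(self.data.get(args[0]))
            else:
                reply.put(None)  # unknown message

    def call(self, op, *args):
        """Send a message and block until the worker answers."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((op, args, reply))
        return reply.get()

db = DbWorker()
db.call("put", "u1", {"name": "shader"})
```

Running several such workers, each owning a slice of the keys, would be one way to get the multiple-instance scaling described above.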

Given the data volumes I've been planning on working with, I mostly want to use the permanent storage for history and fault tolerance, as opposed to live access. That could probably be handled in-memory for the most part. So maybe some form of flat file system would work without causing too many problems.

I originally started using git to effectively achieve that design without having to manage the trees, history, and diff calculation myself, but I've discovered that storing thousands of tiny objects in the git index may not be very efficient. I still think something similar is a good idea, but I would want to separate 'local' version states for each object from the 'global' version, so that it doesn't take forever to save the state of a single object. Maybe storing each object in a git 'branch' with the guid of the object as the branch name would work, since only one object would be in each index. The overhead for saving each object would be slightly higher, but it should be constant, rather than linear with the total number of objects.

Any obvious flaws with that idea that I'm missing? Have any better ideas or foundations to build off of?


1 point by akkartik 4092 days ago | link

Building atop git is an interesting idea, and you clearly have more experience with it. Do you have any pointers to code?


3 points by shader 4092 days ago | link

Here's the code I had written before, using the shell git interface to interact with the repo:

That code is pretty rudimentary, but allows low level access to git commands from arc, plus storage and retrieval of arc objects. After my previous comment though, I'll probably change it so that each object gets a separate branch, with 'meta branches' listing which branches to load if necessary.


1 point by akkartik 4094 days ago | link

Let's build this for the LISP contest!


4 points by dido 4094 days ago | link

So I guess that makes a consistent foreign function interface something very important for Arcueid to start having then. I think I've built up a foreign function API (sorta) and am now working out the details for dynamic loading of shared libraries so you can do something like (load "") and have it dynamically link into Arcueid, as well as a way to compile C sources into such an extension shared object.


1 point by akkartik 4095 days ago | link

I've thought about this as well for a while. Arc's approach seems to be flat files so far, which means any move to multiple servers is terra incognita on some level[1]. I mean to build a nosql system at some point, but I want to do something besides yet another[2] project. I want to understand[3] why we must couple technology stack choices with where we want to be in CAP space[4]. Why can't we have a single project that lets us tweak one knob for RAM vs persistent store, strong vs eventual consistency, and so on? Laying out the design space in one place may give us some chance of at least accumulating lessons as we reinvent the wheel.

[1] I was overly harsh, but it's still non-trivial.