Peter Norvig has a simple Scheme interpreter (tail recursive, with call/cc) implemented in Common Lisp at http://norvig.com/paip/README.html - it would probably be a reasonable basis. One option would be to implement Scheme in Common Lisp, and then simply run the existing Arc ac.scm on top of that. But it would probably make more sense to cut out the middle layer and modify Norvig's code to run Arc instead of Scheme.
I'm finding Arc's type system to be kind of random. I'm not sure has-a vs is-a is the solution, but here are my issues:
a) Lots of operations operate on "sequences" (tables, lists, and string). Lots of other operations operate only on lists, or only on string and tables. There doesn't seem to be any real pattern. There also doesn't seem to be any fundamental "sequence" abstraction that supports implementation of more complex operations. It's more like "if list, do this. If string, do that. If table, do that."
b) Lots of string operations treat a list of characters and a string interchangeably. Lots of operations don't. Again, it seems pretty random.
c) Non-fundamental types, such as queues (http://www.arcfn.com/doc/queue.html) don't interoperate at all. That is, a sequence operation such as map will do entirely the wrong thing to a queue.
I don't know what the solution is, but it seems like there are some type abstractions struggling to escape from arc.arc..
I guess these issues will eventually be fixed. These are only bugs, not design issues.
As for is-a vs. has-a, I was wondering, isn't the biggest mistake considering that a piece of data only has one type ? That is, isa & type are broken ?
Considering values have a list of types (instead of a single type) would fix a few problems. For example, what is the type of nil ? Is it a symbol or a list ? Yes, it is both, but to do so, pg had to define the alist function ! (type nil) returns sym and cannot let you know nil is a list. What is '(1 2 3) ? Is it a cons or a list ? It is a list, but its type is only cons. type should return a list (it is already possible through annotate) and the definition of isa should be :
And the system would remain very generic (even more than now). Thus, almkglor's scanners would work with each (provided the definitions of car and cdr were adapted) :
(type s) -> '(cons scanner)
(isa cons s) -> t
Other representations would be possible, including the ones we already discussed here. The name isa does not necessarily mean we have an is-a semantic (and not has-a). Once again, each item in type is an annotation and can mean many things that are left to the programmer (it can be a class, a class type, a bunch of functions, whatever).
But the "objects-have-one-type" model seems broken. At least for lists, which is quite annoying for a Lisp !
"objects-have-one-type" model helps with nex3's defm:
(defm something ((t f scanner))
(defm something ((t f cons))
In this case, if we pass an object that is-a scanner and is-a cons, which method gets called?
This is the main reason I'm advocating is-a and has-a separation. We can say that an object is-a 'cons cell and has-a scanner. If something requires that an object is-a real, element-pointer-and-next-pointer 'cons cell, as opposed to somethingthing that requires that an object has-a 'car and 'cdr, we can make the distinction.
So we can say that an object is-a 'cons cell - it's what it really is, what it's implemented with. However, a 'cons cell has-a scanner interface, and if it's proper it has-a list interface, etc.
Separating is-a and has-a could also be useful for optimization of basic parts.
For example, a basic non-optimized diff algorithm might operate on has-a 'scanner, and use 'car and 'cdr operations. However a string-scanner is really just a wrapper around a string and an index into the string, as well as the string's length. Each 'string-scanner object contains three slots: one for the string, one for the index, and one for the length. This applies to each 'cdr on a string-scanner.
Now suppose we have a version of the diff algo which specifically detects if an object is-a string-scanner. It destructures the string-scanner into the string, index, and end, and instead of carrying around a triple of (string, index, length) it only carries the index, leaving string and length into local variables. This reduces memory consumption to only one-third.
(Note that the diff algo I posted a while back actually keeps entire sections of the list, in order to properly scan through their differences; that is, it keeps several scanners)
Hmm, I think there are 2 really different concepts here :
- type declaration of real-implementation (what you call "is-a"),
- type declaration in the sense of "capabilities" an object has (what you call has-a).
I think they should really be distinguished. The former is about optimized compilation, the latter about which functions can be applied to a given object.
But optimization is linked to variables (e.g. "in this block n always holds an integer, s always holds a string and l is always a cons) and does not need to be declared until you want to compile something.
On the opposite, capabilities are linked to values (e.g., "n, s and l are all scanners, they all have scanner capabilities, you can apply car and cdr to all of them. This is currently true, but could change if values referenced by n, s or l change). These are mandatory, and have to be known dynamically (this is not a declaration in the static meaning, they can even change later). When you apply car to a variable, you must know if its attached value can answer it (and eventually how).
(= str (string-scanner "foo bar baz"))
-> (scanner string)
(def scan (s)
(if (no s)
(cons (foo (car s)) (cdr s))))
There, the values held by s are considered as a scanner and a string, that is, car and cdr can be applied to them. A dispatch algorithm is applied to them on the moment we need it. If, at any moment, an object held by s cannot be applied the method car or cdr, we have an error. Until we want more speed, that's enough.
Now suppose we want more. All we have to do is :
(def scan (s)
(istype string-scanner s
(if (no s)
(cons (foo (car s) (cdr s)))))
That way, for optimization purpose, we state that s only holds string-scanner objects. It does not even have to care with the annotations you added to the value (or values) held by s. If an object held by s is not really a string-scanner, well, anything could happen.
I might be wrong, but I think super-optimizing CL compilers work that way. You say them "optimize that function, and btw, this var always holds strings, don't even bother checking and dispatching the right function".
Yes, but then it does not work with the core functions (notably each & friends), as they are relying on isa. Renaming your function isa does not work either (that would be too easy) : atom and some call isa themselves, so you get in an infinite loop. Btw, obj is a macro (at least in Anarki, I don't know if it's present in the official Arc2), so your code has a red flag on it (although it does seem to work).
I tried this, it does work :
(redef isa (x typ)
(isa-rec type.x typ))
(def isa-rec (types typ)
(no types) nil
(is types typ) t
(isnt type.types 'cons) nil
(is (car types) typ) t
(isa-rec (cdr types) typ)))
(isa nil 'sym) -> t
(isa (annotate '(sym list) nil) 'sym) -> t
(isa (annotate '(sym list) nil) 'list) -> t
(isa (annotate '(sym list) nil) 'cons) -> nil
And now the funny part :
(redef car (x)
(if (isa x 'int)
(if (> rep.x 0)
(redef cdr (x)
(if (isa x 'int)
(if (> rep.x 0)
(annotate type.x (- rep.x 1))
Both car and cdr now work on objects of type 'int (and objects of type 'cons, as before). If an object is both an int and a cons, its int being is taken into consideration. Every operation requiring 'cons cells or calling car and cdr can now be overridden to use ints instead (an int being a list whose car is the symbol 'a and whose cdr is that num - 1).
The problem is that each calls acons and alist, but they are defined in terms of (is (type x) 'cons) instead of (isa x 'cons). Once you redefine them, it works fine.
pg, do you still accept suggestions about the core functions ? Shouldn't acons and alist be defined with isa instead of is ? That would let us redefine them more easily. Btw, what do you think of all these discussions about types ?
A couple more broken things. trim seems to be broken with 'front:
arc> (trim "abc" 'front)
Error: "<: expected argument of type <real number>; given nil"
num doesn't work well with negative numbers:
arc> (num -123456)
I also don't get why there are both findsubseq and posmatch functions, since findsubseq seems to be a less-functional version of posmatch. findsubseq doesn't support a function as pattern, but posmatch does. On the other hand, findsubseq actually works.)
arc> (findsubseq "an" "banana" 2)
arc> (posmatch "an" "banana" 2)
arc> (findsubseq "an" "foo")
arc> (posmatch "an" "foo")
Error: "string-ref: index 3 out of range [0, 2] for string: \"foo\""
When I did the earlier Fibonacci benchmarking, I found it useful to take the code generated by ac, and run it inside mzscheme. (Note that :a will drop you from Arc to Scheme.) That is, running the exact same code that Arc does, just from the Scheme REPL. (This should take the same time as running in Arc, or else something is very wrong.) Then I could tweak the code a function at a time evolving from the Arc-generated code to the Scheme code, and see where the time was going.
Overall, it makes no sense that Arc would be faster. But it would be very interesting to find out why you're getting those results. Could there be some subtle algorithm difference, maybe lack of tail recursion, in the Scheme version?
Ok, I tried that. Running the fib code in a mzscheme REPL is slower than in an Arc REPL. But, if once in Arc REPL you type :a (and get to the mzscheme REPL with Arc functions loaded), the same Scheme code is then a little faster than the Arc counterpart. That means there is something in ac.scm that speeds up mzscheme computations. Maybe the fact that it's embedded in a module ? I really don't know, but at least this is not an aberration anymore.
Unfortunately, compatible axioms don't guarantee compatible code on top. The big problem is the operations that use "system" (e.g. date and ensure-dir) break on different OS's. This is the root of many of the Windows problems.
For this reason, I believe that use of "system" makes a mockery of the axiomatic construction, and has no place in the Arc core.
But at the very least, if both standard Arc and offshoot implementations both use system, they'll both break in the same way. Therefore code will still be "compatible" across Arc implementations (in that it will generate the same error messages on all implementations).
Yeah, I know that doesn't fix the problems you mentioned, and yes, I run Windows so I do encounter those problems with the vanilla Arc releases. But we can't expect a offshoot implementation to do anything about the problem, we really need pg to solve the problem at the source.
While Unix did win, that doesn't mean that Unix's names won. It's Unix's features that won. So it's reasonable to provide certain operations, but implementing them in terms of (system "unix-command-name ...") is not.
Of course, I use Mac OS X, so it's not like I'm adversely impacted. But the point still stands.
Arc doesn't have any way to seek in a file. The MzScheme (file-position n) operation does what you want. You can use it via a Scheme escape from the Anarki version of Arc, if they haven't explicitly added it already.
Client-server sounds more like RPC than FFI. That leads to SOAP and WSDL, or XML-RPC if you want something lighter weight.
The biggest problem I see with implementing FFI through a server is it's very expensive if the server needs lots of the client state. For instance, suppose you implement matrix determinant in C for speed. With client-server, you'd need to copy the whole matrix across the connection; with local FFI, the C code can access the matrix directly.