| I'm writing a basic Arc interpreter--in Arc, of course. I'm kind of surprised that I never actually did this until now, but now I am. I have some grandiose eventual plans for it, but in the meantime, I've noticed something. Currently, my interpreter makes an expression whose car is 'fn evaluate to itself. In other words, functions are stored as lists. (No lexical scope yet; I just added global variables.) I'm about to add the Arc "data structures are functions" feature. That is, if xs is a list, then (xs n) is interpreted as (list-ref xs n). But what if xs is the list '(fn (x) (+ x 2))? (This might actually happen if, say, I'm reading code--or running an Arc interpreter.) This is a conflict. Either that won't work properly, or functions won't work properly. There needs to be a way to distinguish the function and the list. I'm seeing the need to tag objects, and why Paul Graham put "tag", "type", and "rep" there in the first place. [1] But, now, wait a moment. We need "type" to work, so that polymorphic functions ("apply" here) will work. We'll want "tag" so we can create and use new types. Why do we need a "rep" function? After all, functions, numbers, and conses are plain Racket objects, but the "type" function can recognize their types. Furthermore, at least three Lisp implementations (probably all that compile to machine code) use some of the low bits of a machine word to identify an object as a fixnum[2], a character, a pointer to a cons, and probably as a pointer to anything else[3]. It is entirely possible for the object and its tag to be the same object in an extremely literal sense. If you wanted to tag a cons as a "function" or "compiled-closure", you could just change the tagging low bits; the underlying "apply" function would use "type" to determine what kind of cons the cons was and act accordingly (likely extracting the car/cdr in the process). (...Actually, a better idea would be to have some user-controllable slightly higher bits that the builtin car/cdr wouldn't look at; the user would add his new tag and leave the "this is a cons" bits in place; then the compiler would feel comfortable, knowing that it is a cons cell (not an arbitrary integer) and it's safe to take the car of it. This takes a lot of bits, but on a 64-bit computer, a CPU word with a pointer in even the extremely forward-looking 48-bit address space of current computers still has 16 taggable bits left over. And even if you didn't want to supply bit-tags for all user-defined types--or they wanted non-symbol types--then you could escape by having a particular bit-tag mean "this is a pointer to an object containing the full tag followed by the actual object".) In Racket, you don't exactly have access to the low bits of a cons cell for tagging purposes. Instead we have to make a compound data structure--tag things with a vector--and we need a "rep" function to extract the object itself. But this is an implementation detail, and it can and should be hidden. It can be hidden by making all of the builtin functions just treat vector-tagged values as though they weren't tagged. E.g. we would say, in ac.scm: ;old version
(define (ar-xcar x)
(if (or (eqv? x 'nil) (eqv? x '()))
'nil
(car x)))
;new version
(define (ar-xcar x)
(cond ((ar-tagged? x) (ar-xcar (ar-rep x)))
((or (eqv? x 'nil) (eqv? x '())) 'nil)
(#t (car x))))
If you pass "car" a tagged cons, or you pass "+" a tagged number, it just strips off the tag, finds that this object is indeed something it knows how to add, and just adds it. If desired, you could add polymorphism--redefine + to dispatch on numbers tagged as numbers mod 7 or something, which it would discover by calling "type" on it before stripping off the tag. However, the strong default would be to treat it like the thing that it is--when "car" is called on the "fn"-tagged list, it's probably being called by "ar-apply", which knows exactly what it's doing and would like "car" to shut up and give me the car, thank you.(...Hmm. This vector representation seems to allow, in theory, for tags to be arbitrarily nested--or for, as people mentioned[4], to be any values. An ideal compiler would probably use low bits to tag the builtins, though, and if you just used simple symbol tags, maybe you could tell it to please allocate some more bit-tags for some heavily used types in your particular program.) The current state of affairs is, if you want to tag cons cells as polynomials or something, and you want to call car or map or anything on them, you have to use "rep" every single time you reference your polynomial, or else you have to redefine "car" and "map" to do some dispatching. It seems pretty horrible to me. The arbitrarily nested tagging is technically possible here, but I think people haven't run with it very much because the interface sucks. I'm kind of dumbfounded by how much better my proposal seems than the current scheme, and by how no one seems to have thought of it until now. Is there something wrong with this? Am I missing something? I've googled and grepped around, but haven't found anyone mentioning it. Hmm, one possible flaw occurs to me. If something is (tag a (tag b x)), a user-defined function won't be able to extract the "b" using 'rep. However, I don't think that's much of a problem: if the thing that tagged the (tag b x) as an "a" wanted to allow its callers to do inheritance or whatever on the "b" tag, it could instead have returned (tag (list a b) x). This seems pretty good to me, in fact; you can do anything you want with types, but the default (if you just tag things with symbols) is the very simple "everything has a single type (and a single physical representation)". And the interface is probably even pleasant to use. TLDR: "rep" is an implementation detail that can and should be hidden, and when it is hidden, we can all be happier. [1] http://www.paulgraham.com/ilc03.html [2] This is hella cool. In particular, if you use 000 [however many 0's] to tag fixnums, then you can do arithmetic with little overhead. All numbers are represented as 8 * their actual values; 8x + 8y = 8(x+y), so addition can be done with just an ADD instruction; and 8x * 8y = 64xy, so multiplication can be done with a MUL instruction plus a rightward shift by 3 bits. SBCL does this (note that 40 = 8 * 5): * (defun meh (x) (declare (optimize (speed 3) (safety 0)) (fixnum x))
(the fixnum (+ x 5)))
MEH
* (disassemble 'meh)
; disassembly for MEH
; 0404275F: 4883C228 ADD RDX, 40 ; no-arg-parsing entry point
; 63: 488BE5 MOV RSP, RBP
; 66: F8 CLC
; 67: 5D POP RBP
; 68: C3 RET
NIL
[3] http://wiki.call-cc.org/man/4/Data%20representation#immediate-objects[4] http://arclanguage.org/item?id=4855 |