Arc Forum | akkartik's comments

Arc Forum

new | comments | leaders | submit | akkartik's comments

3 points by akkartik 4961 days ago | link | parent | on: Dynamic languages are static languages

You'll like this rebuttal :)

http://tratt.net/laurie/tech_articles/articles/another_non_a...

-----

2 points by rocketnia 4959 days ago | link

Robert Harper: "Since every value in a dynamic language is classified in this manner, what we are doing is agglomerating all of the values of the language into a single, gigantic (perhaps even extensible) type."

Laurence Tratt: "Bob points out that, conceptually, a dynamically typed program has a single static sum type ranging over all the other types in the program. We could therefore take a dynamically typed program and perform a conversion to a statically typed equivalent. Although Bob doesn't make it explicit how to do this, we can work out a simple manual method (though it's not clear to me that this would be fully automatable for most real programs)."

This part of the debate goes off on an unnecessary tangent. There's not really any need for a "gigantic (perhaps even extensible)" sum type.

  -- Haskell code
  
  {-# LANGUAGE FlexibleInstances #-}
  
  -- A hack to show functions as opaque values at the REPL.
  data ShowableFunc a b = ShowableFunc (a -> b)
  instance Show (ShowableFunc a b) where
    show _ = "[function]"
  
  data Value =
      FunctionValue (ShowableFunc Value Value)
    | SymbolValue String
    | TaggedValue Value Value
    -- etc.
    deriving Show

Having abandoned gigantic and extensible notions, the translation is actually very straightforward. The TaggedValue case is all we need in order to define a so-called dynamic type system in the Arc style.

Admittedly, we may need a basic DSL to help our embedded untyped programs go against the grain of the language.

Here's how we might start to support this kind of programming in Haskell. The only callable things in Haskell are of type (A -> B), so our programs will need to use a "call" function all over the place, but Haskell's type classes (with FlexibleInstances) make it possible to shove aside a lot of other boilerplate.

  -- Some type classes for convenient expression of Value values using
  -- Haskell syntaxes. We could go without these, but we'd need
  -- explicit conversions all over. Consider this weak typing.
  class ToValue a where
    tv :: a -> Value
  instance ToValue Value where
    tv x = x
  instance (ToValue a) => ToValue (Value -> a) where
    tv x = FunctionValue (ShowableFunc (\v -> tv (x v)))
  instance ToValue String where
    tv x = SymbolValue x
  
  call :: (ToValue func, ToValue arg) => func -> arg -> Value
  call func arg =
    let func' = tv func in
    let arg' = tv arg in
    case func' of
      FunctionValue (ShowableFunc haskellFunc) -> haskellFunc arg'
      _ -> error ("Call to non-function " ++ show func')

We can go on to define some built-ins for the dynamic type system:

  annotate :: Value
  annotate = tv $ \tag rep ->
    TaggedValue tag rep
  
  -- I'd name this "type" as in Arc, but that's a reserved word.
  getType :: Value
  getType = tv $ \x -> case x of
    FunctionValue _ -> SymbolValue "fn"
    SymbolValue _ -> SymbolValue "sym"
    TaggedValue tag _ -> tag
    -- etc.

Now we can try this out in GHCi:

  *Main> getType `call` (annotate `call` "my-type" `call` "my-rep")
  SymbolValue "my-type"
  *Main> getType `call` "my-type"
  SymbolValue "sym"
  *Main> tv (\a b -> annotate `call` (b :: Value) `call` (a :: Value)) `call` "bar" `call` "foo"
  TaggedValue (SymbolValue "foo") (SymbolValue "bar")

The ":: Value" annotations are necessary to make the type class instance we're using unambiguous. (As far as Haskell is concerned, we're still allowed to define another instance which works for arguments of type other than Value.) Another few utilities could save us from typing these annotations, but this is an example.

Notice that this technique can faithfully translate dynamic type errors too, without the static type checker getting in the way.

  *Main> "foo" `call` "bar"
  *** Exception: Call to non-function SymbolValue "foo"

I initially coded this stuff without the type classes, but I ended up having to write the conversions explicitly like this:

  *Main> getType `call` (annotate `call` SymbolValue "my-type" `call` SymbolValue "my-rep")
  SymbolValue "my-type"
  *Main> getType `call` (SymbolValue "my-type")
  SymbolValue "my-type"
  *Main> let fn haskellFunc = FunctionValue (ShowableFunc haskellFunc)
  *Main> fn (\a -> fn $ \b -> annotate `call` b `call` a) `call` SymbolValue "bar"
 `call` SymbolValue "foo"
  TaggedValue (SymbolValue "foo") (SymbolValue "bar")

---

Above I quoted both articles, but the rest of the quotes I'm responding to are just from Laurence Tratt.

---

"[If sum types are] Used indiscriminately – particularly where a sum type ranges over an unduly large number of other types – they create an unwieldy mess because of the requirement to check all the types that have been ranged over."

See my use of the catch-all pattern "_ ->" in the definition of "call" above. If the majority of the branches use a default behavior, usually this kind of pattern-match is available to save the day.

---

"The encoding of dynamically typed languages we used above would lead to a huge increase in the size of programs, since a static type system forces us to explicitly express all the checks that a dynamically typed run-time implicitly performs for us."

The kind of encoding I used would lead to only a moderate increase, and then only due to the lack of dedicated language-level support for that style of programming.

In statically typed programming, the "checks" tend to be expressed in the types rather than the expessions. Usually, when code uses a library utility, it doesn't have to reiterate the types of that utility, so the checks are still just as implicit.

(In my code above, the ":: Value" annotations are a counterexample. In a sense, this "check" is used for branching among type class instances.)

---

"It would also remove the possibility of making use of a dynamically typed system's ability to execute and run programs which are deliberately not statically type safe (a useful property when comprehending and modifying large systems)."

Some escape hatches have been investigated for this purpose. See gradual typing or Haskell's "Dynamic" type.

The way I think of it, sometimes types are a meaningful part of the program, and sometimes they're an effective way for a language or library to expose a well-behaving realm of possible uses in the first place.

It's easy to interpret the "well-behaving realm" case as artificial restrictions on an innately wild program space. But restrictive interfaces can frustrate us in untyped languages too.

---

"We could design a sound type system that will deal with this specific case but, because a sound type-system won't be complete, we can always find another similar run-time correct example which the type system will reject."

"It also provides a mirror-image encoding from above: one can always translate statically typed programs to a dynamically typed equivalent."

This section relies on the implicit assumption that for every program in a typed language, the type annotations can be deleted to yield a program in a corresponding untyped language.

That observation does not apply in general. Certain static type system features, such as Haskell's type classes, allow the programmer to specify run time behavior in the type annotations rather than in the expressions. This technique is used extensively in Haskell and Agda as a way to reduce expression boilerplate. I use it above.

---

"In this article, I've tried to show that statically and dynamically typed languages overlap in many ways, but differ in others. Neither is exactly a subset of the other nor, as the underlying theory shows, could they be."

I flatly disagree; untyped languages are exactly a special case of typed languages.

The thing is, integers are exactly a special case of real numbers too, and that's no reason to dismiss integers. :)

---

"One must consider the pros and cons for the tasks one has at hand, and choose accordingly. Personally, I use both styles, and I have no wish to unduly restrict my language toolbox."

Agreed.

-----

2 points by rocketnia 4956 days ago | link

At the risk of talking to myself... :-p

Actually, there's a terminology issue going on here. Laurence Tratt and Pauan find it important to talk about errors, so in the above post I went point by point to address that. But as far as I'm concerned, "{static,dynamic} type error" is nothing more than a way to say "{compile,run} time error that a statically {untyped,typed} style could have avoided," and I take that suggestion with a grain of salt.

Robert Parker and I don't see "dynamic typing" as some kind of alternative strategy with extra features that somehow prevent the use of static typing. For what particular reasons and contexts should static typing be considered harmful?

---

In my opinion, static typing is mostly useful for three reasons: It has nice symmetry with mathematical reasoning[1], it can support static analysis of programs[2], and it can control a particular side effect. I'll focus on that last part, since side effects are what distinguish one computing platform from another.

Static typing controls the side effect of indeterminism when calling a black box library. By default, code that calls a black box can't expect very much about what will happen, due to the fact that it's a black box. By the time the program has run, the box could have been filled with any version of the library. Static typing establishes at least some guarantees about what kind of library code will fit in the box. (Since the intricacy of those guarantees can span a rich mathematical theory, it's nothing to scoff at.)

The black box side effect is ultimately "implemented" as a result of the programmer's manual process of selecting and upgrading libraries. So we can't just write an interpreter that simulates a new variant of this side effect; it needs to fit seamlessly into the programmer's existing habits, or we're veering into new language territory.

Within a statically typed module system, untyped programming seamlessly fits in as a trivial special case--as "unityped" programming. But it isn't as clear how to do typed programming within a module system that doesn't already support that particular static type system (or another that's at least as logically strong). I'm very interested in options here.

---

[1] Symmetry with mathematics is just an aesthetic. Aesthetics has power too, but it doesn't do much to resolve debates. :)

[2] Static analysis is just the design pattern of doing a preliminary computation in order to reason about an upcoming "real" computation. When this pattern comes up, it's useful to design a static type system that meets the needs of the analysis, but static typing in the implementation language doesn't always help.

-----

2 points by Pauan 4961 days ago | link

I think this sums it up pretty well:

"All that I've done in the above is to restate a simple principle: statically and dynamically typed languages enforce their types at different points in a program's life-cycle. Depending on whether you view things through a static or dynamic type prism, either class of language can be considered more expressive. Dynamically typed languages will try and execute type-unsafe programs; statically typed languages will reject some type-safe programs. Neither choice is ideal, but all our current theories suggest this choice is fundamental. Therefore, stating that one class of language is more expressive than the other is pointless unless you clearly state which prism you're viewing life through."

-----

2 points by akkartik 4966 days ago | link | parent | on: Bounded Height Priority Queues

Interesting, passing in one of exactly two procedures. I tried to come up with a way to use compare, but it doesn't work.

Each bucket seems to act as a stack rather than a queue; elements come off in reverse order that they go in.

-----

2 points by fallintothis 4966 days ago | link

Interesting, passing in one of exactly two procedures. I tried to come up with a way to use compare, but it doesn't work.

I did find it a little ugly, but I thought it would be better than hard coding one over the other. At least the data structure itself is simple to understand, those details notwithstanding. :)

Each bucket seems to act as a stack rather than a queue; elements come off in reverse order that they go in.

Probably because each bucket is a stack, here. :P I favored -push and -pop over -enq and -deq names for that reason. It'd of course be easy to use Arc's queues instead; we'd get the same complexities.

I suppose "priority queue" is a misnomer we got stuck with. At least, Wikipedia thinks FILO v FIFO is secondary to the "priority" part: https://en.wikipedia.org/wiki/Priority_queue.

-----

2 points by akkartik 4966 days ago | link

Ah, I figured it out:

  (def bhpq (h (o < <))
    (obj top     (best ~< (list (+ h 1) -1))
         height  h
         buckets (table)
         < <))

  (def bhpq-push (priority elt bhpq)
    (unless (<= 0 priority bhpq!height)
      (err (+ "Priority must be between 0 and " bhpq!height ":") priority))
    (zap [best bhpq!< (list _ priority)] bhpq!top)
    (push elt bhpq!buckets.priority))

  (def bhpq-peek (bhpq)
    (car (bhpq!buckets bhpq!top)))

  ; still O(h), but probably does a lot more consing than the original
  (def bhpq-pop (bhpq)
    (do1 (pop (bhpq!buckets bhpq!top))
         (unless (bhpq!buckets bhpq!top)
           (= bhpq!top (best bhpq!< (keys bhpq!buckets))))))

I'm not sure it's an improvement, but it was a fun exercise to go through :)

-----

2 points by fallintothis 4966 days ago | link

Spiffy! Definitely reads a lot better.

You're right about the consing, though. It seems like there should be a lazy way of doing keys. But then, best hard-codes calls to both car and cdr upon the sequence. So, it looks like we'd be resigned to doing it by hand, and I'm not sure it's any better:

  (unless (bhpq!buckets bhpq!top)
    (= bhpq!top nil)
    (each (priority elt) bhpq!buckets
      (when (bhpq!< priority bhpq!top)
        (= bhpq!top priority))))

Ah, but this has made me notice an issue with the best keys approach to begin with: once we empty out the queue, !top will be nil, which won't compare correctly on the next bhpq-push.

  arc> (= q (bhpq 10))
  #hash((top . 11) (buckets . #hash()) (height . 10) (< . #<procedure:<>))
  arc> (bhpq-push 1 'hi q)
  (hi)
  arc> (bhpq-pop q)
  hi
  arc> (bhpq-push 1 'bye q)
  Error: "<: expects type <real number> as 2nd argument, given: nil; other arguments were: 1"

Conundrum. Suppose the answer is to factor out the (best ~< (list (+ h 1) -1)) so that we can get a fitting default value for !top:

  ; Top element of the abstract comparison operator bhpq!< (though we assume
  ; the operator compares integers!).
  (def bhpq-top (bhpq (o < bhpq!<) (o h bhpq!height))
    (best ~< (list (+ h 1) -1)))

  (def bhpq (h (o < <))
    (obj top     (bhpq-top nil < h)
         height  h
         buckets (table)
         < <))

  (def bhpq-push (priority elt bhpq)
    (unless (<= 0 priority bhpq!height)
      (err (+ "Priority must be between 0 and " bhpq!height ":") priority))
    (zap [best bhpq!< (list _ priority)] bhpq!top)
    (push elt bhpq!buckets.priority))

  (def bhpq-peek (bhpq)
    (car (bhpq!buckets bhpq!top)))

  ; Not as concise, but less consing/breaking
  (def bhpq-pop (bhpq)
    (do1 (pop (bhpq!buckets bhpq!top))
         (unless (bhpq!buckets bhpq!top)
           (= bhpq!top (bhpq-top bhpq))
           (each (priority elt) bhpq!buckets
             (when (bhpq!< priority bhpq!top)
               (= bhpq!top priority))))))

-----

3 points by akkartik 4969 days ago | link | parent | on: Requesting a hallway usability test

Yeah, I didn't expect everyone to remember that :) I just wanted a sense of whether the no-prompt prompt was intuitive. So far it seems to be intuitive!

-----

2 points by fallintothis 4969 days ago | link

One more vote for "intuitive". Works surprisingly well.

-----

1 point by akkartik 4969 days ago | link | parent | on: Requesting a hallway usability test

Thanks a lot! Yes, wart doesn't support bignums yet. Its arithmetic is that of the underlying C, which on a 64-bit machine can handle factorial of 20, but not 21.

  fact.20

  => 2432902008176640000

-----

1 point by akkartik 4969 days ago | link | parent | on: Requesting a hallway usability test

Thanks a lot for the comments! Pauan's right that the indent sensitivity seems to require the two newlines, and it is indeed weird that responses get grouped with the next command. I'm still looking for a better approach.

I'm interested in that warning you get. Are you on a 64-bit system? Could you type in the following into gdb and share the results?

  $ gdb -q
  (gdb) p sizeof(int)
  (gdb) p sizeof(long)
  (gdb) p sizeof(void*)

Edit: I just noticed that gdb on my machine gives different results with and without a binary. So you'll need to invoke it from the wart directory like so:

  $ gdb -q wart_bin

-----

2 points by kinleyd 4969 days ago | link

You are welcome, akkartik, and yes, I'm on a 64-bit system. Here's the info you wanted:

    [kdd@780 wart]$ gdb -q wart_bin
    Reading symbols from /home/kdd/My-Apps/wart/wart_bin...(no debugging symbols found)...done.
    (gdb) p sizeof(int)
    $1 = 4
    (gdb) p sizeof(long)
    $2 = 8
    (gdb) p sizeof(void*)
    $3 = 8

-----

1 point by akkartik 4969 days ago | link

Thanks! On my machine (gcc 4.4.3) I only get that warning for C files, not C++. Weird. What version gcc do you have?

  $ gcc --version

-----

2 points by kinleyd 4968 days ago | link

Also, it seems that error message came up only the first time I ran wart. I haven't seen it in later executions so there's probably no need for any additional concern. :)

-----

1 point by akkartik 4968 days ago | link

It comes up when building the C files. Since you haven't made any changes to the sources since there's no need to rebuild. But you can see it again if you first run:

  $ make clean

-----

2 points by kinleyd 4968 days ago | link

Ah yes, I see that now. So may be something to sweat a bit about, after all. :)

-----

2 points by kinleyd 4969 days ago | link

gcc 4.7.2

-----

1 point by akkartik 4970 days ago | link | parent | on: Wart update

I've been thinking harder about the numbered filenames. I had three goals in doing them.

a) To be able to add new files, split files, merge files, more easily. To not need to hack on the build system as much.

b) To make it dead obvious where a new user should start reading. But this only needs the early files to be numbered; past a point the precise number isn't important.

c) To not need to mess with include files as much. This overlaps with a) above.

Still thinking. I'm mulling some sort of symlink-based approach to make navigation more convenient.. Or maybe I can stop numbering past a point and ensure that the remaining files are loadable in any order. That might interact with your point 4 about generics, though.

-----

2 points by Pauan 4969 days ago | link

For what it's worth, Arc/Nu works like this. The "core" files are numbered:

  01 nu.rkt
  02 arc.arc
  03 utils.arc
  04 parameters.arc
  05 paths.arc
  06 import.arc
  07 repl.arc

Excluding parts of "03 utils.arc", the above is roughly the minimum needed to define Arc, the "import" macro, and the REPL.

They are loaded in numeric order, but that's controlled by the "arc" executable, so there's nothing stopping you from changing the order, or adding new non-numeric files[1], etc.

Everything else is put into either the "app" or "lib" folder, and none of the files in either folder are numbered.

This means the numbers are purely for documentation purposes, to help people see which parts go where. In particular, changing the numbers doesn't change the order in which things are loaded: you need to make the changes in the "arc" executable.

I think this is a good blend between not being too cumbersome, improving the visibility of order (which matters), and allowing fine-grained control of order.

---

* [1]: As an example of that, the "arc" executable also loads "lib/strings.arc" and "lib/re.arc" in addition to the above.

-----

1 point by akkartik 4969 days ago | link

Interesting. It doesn't address fallintothis's pain point though -- he still needs to remember the numeric prefix to navigate between the files.

As an experiment I've added a script that creates more mnemonic symlinks to all the files in the repo:

  $ git clone http://github.com/akkartik/wart.git
  $ cd wart
  $ git checkout 8332909aec  # Unnecessary except for visitors in the far future
  $ ./denum                  # New helper
  $ vim -O assign.wart mutate.wart  # etc.

I'm going to play with keeping these symlinks around and see how much I use them.

-----

2 points by fallintothis 4969 days ago | link

Addresses my chief complaint, at any rate. :) Couple of things:

- denum doesn't seem to work with plain /bin/sh,

  ./denum: 9: ./denum: [[: not found

though bash works.

- Slight nitpick over your perl regex:

  $ echo '(prn "hail satan")' > 666thedarklordcometh666.wart
  $ ls | grep thedarklordcometh
  666thedarklordcometh666.wart
  $ ./denum
  $ ls | grep thedarklordcometh
  666thedarklordcometh666.wart
  thedarklordcometh.wart

You probably want

  s/^[0-9]*//

- You could move the

  find . maxdepth 1 -type l -exec rm {} \;

to another script, like renum (not a 100% correct name, but close enough).

-----

2 points by akkartik 4970 days ago | link | parent | on: Anyone else going to Startup School / in the San F...

Yup, I'll be there.

-----

1 point by akkartik 4970 days ago | link | parent | on: Wart update

The port of quasiquote looks great!

1) I would use alias to ensure that atom? automatically picks up extensions to cons?, etc.

2) I've been experimenting with keeping prefix <- at the top-level, just to make globals salient. But this 'idiom' is less than 48 hours old; what do you think of it?

  <- carif (cons? & car)

3) I tend to often wrap stuff in parens even when I don't have to. I only skip parens for def/mac/if/while, etc., and for imperatives like err and prn. So:

  ern "The syntax `,@" expr " is invalid"

But:

  (and list?.xs ~dotted?.xs)
  (append @cdr.args)
  (cons car.a (append cdr.a @cdr.args))

..etc.

4) This is totally fine:

  (or (atom? cdr.x) (dotted? cdr.x))

But wart's precedence rules support:

  (or atom?:cdr.x dotted?:cdr.x)

Does that seem too ugly?

5) I really like to use :else in multi-branch if.

6) There's a gotcha with multi-branch ifs: beware if you rely on paren insertion in test expressions:

  if nil
      car '(34 35)
     no nil
      cdr '(34 35)

You might expect this to return (35), but it doesn't because it's interpreted as:

  (if nil
      (car '(34 35))
      (no nil (cdr '(34 35))))

I've been guarding against this by always adding parens for things that should act as expressions, and by always explicitly adding parens in multi-branch ifs:

  (if
    no.args
      nil
    ~cdr.args
      car.args
    :else
      (let a car.args
        (if no.a
          (append @cdr.args)
          (cons car.a (append cdr.a @cdr.args)))))

I think this adds to your case against indent-sensitivity :) Multi-branch if is the one place I find myself wanting more syntax. Curlies for all their flaws really help to separate test expressions from actions in C or Java. I experimented with a 'comment token' in the past (http://arclanguage.org/item?id=16495) but that feature was lost in the move to infix.

In any case, I hope I've convinced you that I don't consider parens to be second-class. I still reach for them a lot: http://github.com/akkartik/wart/blob/c1ca2f9749/004optional_...

-----

2 points by fallintothis 4969 days ago | link

1) Interesting! I assumed alias was an alias for <-. Clever, making it define a macro.

2) I liked being able to line up the infix <- for different-length variable names assigned in succession:

  Optimize_cons   <- nil
  Optimize_append <- nil

For carif, I was using it more like an "anonymous" def where I didn't have to name the arguments. Perhaps this purpose could use its own top-level form?

By the way, I like the Global naming scheme, which I picked up from Coercions; it's a good, surprisingly readable alternative to mashing shift for GLOBAL names.

3) Ah, ern is what I was looking for! Should've looked instead of guessing names from Arc, haha.

4) A bit much for my tastes, but then so is infix. ;)

5) I noticed and tried to use it throughout. Still missed a few instances, I gather! :)

6) Yeah, I ran into that a bit. Might've missed some cases, though.

-----

1 point by akkartik 4970 days ago | link | parent | on: Wart update

Thanks a lot for the comments! In particular you've got me thinking about the numeric filename prefixes. The fatal errors have been getting onerous for me as well, thanks for drawing my conscious attention to them.

Thanks for the crazy load bug. I'll fix it later today.

Yes, loading wart is painfully slow. It's gotten 2x slower since the infix changes. Startup isn't doing anything special, it's just that the wart 'prelude' is a lot more code (since it's most of the language) than we typically type into the repl. I'll work on this some and get back to you.

You're right that relying on load order for generic dispatch makes some cases weird. On the flip side, it's powerful enough to permit multiple dispatch. Like the regex idea in http://arclanguage.org/item?id=16833 that permits either arg to be a regex. When I experimented with generic functions in the past I was always unsure whether to dispatch on the first or last arg. Things like keep like the last arg to be generic, for example. I think the problem isn't predicate dispatch, but checking predicates in load order. Is there a way to make the clauses load-order independent? Or an elegant way to influence load order without moving code around? Hmm, I'm going to think about this.

Regarding 6: I really don't intend to make fully-parenthesized lisp some sort of second class citizen. Feel absolutely free to stick with it for code you write. And I'd love to hear if it makes reading existing code more painful.

Regarding 5: I intended to show that construct_macro_call is an internal detail. I would move it later if I could, but it's used when extending def later in the file. I could a) just unindent it where it is, b) move it after mac and unindent it, c) move it to the bottom of the file, and not use the new mac to extend def (because that wouldn't work yet). Somehow I don't like any of those choices. What do you think?

Regarding 7: It didn't occur to me to watch keypresses on shift. That wasn't a motivator at all for eliminating parens or anything else. (I've got parens swapped with square brackets in my local environment for lisp.) In general, wart is more concerned with reading than with writing. But I'm going to think about it.

Regarding 9: Yes, this is a little weird when you come from other lisps. Wart doesn't consider ' to be an abbreviation for quote. It's the other way around; ' is part of the core syntax (it's processed during tokenization) along with backquote and unquote. The name quote is just for manipulating code. Things like if (car.form = quote), etc. I used to use it in my inliner/partial evaluator, but the inliner is not even in there anymore, and there's really no reason for code-manipulation stuff to be available so early on. I'm just going to take the long forms out. [Edit: done]

Another difference with traditional lisp: 'a is read as (' . a) rather than (' a). Similarly for backquote and unquote. This allows me to sidestep complexities like what happens in (' a b) or ,@,@x that I haven't found a use for yet.

-----

2 points by fallintothis 4969 days ago | link

Is there a way to make the clauses load-order independent?

The Magpie stuff Pauan linked is really interesting. From my layman's intuition, though, it still only seems to be able to linearize clauses because they're limited. As long as we know what sort of clauses we're working with, we can define a partial order on them. The same also happens to hold true for "typical" generic methods that dispatch based on class, because classes readily form a poset (using the subclass relationship). I have a feeling that the answer would be somewhere around "intractable" for arbitrary code, especially if the code didn't have formal semantics we could reason about. I mean, arbitrary code might never halt, so it'd be impossible to linearize those cases to the "end of the line" just because (in general) the Halting Problem is undecidable. But hey, I'm not a doctor.

At least the order that definitions appear in the code easily linearizes the clauses, despite the elephant-in-the-room of loading multiple files.

Regarding 6: I really don't intend to make fully-parenthesized lisp some sort of second class citizen.

That's good. I didn't mean to mischaracterize. I think that might inadvertently be the impression (or even outcome?) from so much as having a no-parens rule: the thought goes from "let's do Lisp with fewer parentheses" to "let's do Lisp with no parentheses", slippery slope fallacy though that is. Plus, at a glance, their elision seems common in your source.

And I'd love to hear if it makes reading existing code more painful.

I think that's a big reason I don't like removing them. There are fewer visual cues for grouping expressions. Not as big a deal in something where indentation is all you have, like Python. A little weirder when my head's trying to fill in where parentheses should rightly be. Blah blah, curmudgeon, blah.

Regarding 5: What do you think?

It only has the one call site; could you just inline it using afn?

Regarding 7: It didn't occur to me to watch keypresses on shift.

It's more of a heuristic than a solid metric, but I definitely notice it. E.g., I get annoyed working in camelCase languages.

Regarding 9: I'm just going to take the long forms out.

So, how do I manipulate quotes in macroexpansions, as needed in qq.wart?

-----

2 points by akkartik 4958 days ago | link

"how do I manipulate quotes in macroexpansions, as needed in qq.wart?"

Yes, you're right that I need to bring back my definitions of quote, etc for qq.wart. But there was more to it than that. Here is -- I think -- a relatively faithful port of qq-expand including the optimizer: http://gist.github.com/3971932#file_080qq.wart. I say 'relatively' because wart doesn't support multiple args to quote/unquote/quasiquote, so a lot of your hardest tests become moot. Also, the backquote doesn't expand to a call to quasiquote in wart.

Sorry this took so long. Even when you provided such thorough tests it was hard to make myself focus on this exercise; there's always alternatives with more instant gratification. But in the end it was a fun exercise in recreating the 'theory' of a (tiny) codebase (http://alistair.cockburn.us/ASD+book+extract%3A+%22Naur,+Ehn...)

---

Rather than start with your port I went back to your arc sources and tried to track my process in an effort to generalize lessons. I was able to split your recursive functions into clauses without much effort[1]. For example:

  ; Arc
  (def qq-cons (expr1 expr2)
    ; assume expr2 is non-splicing
    (let operator (if (splicing expr1) 'dotted-list 'cons)
      (if (no optimize-cons*)
           (list operator expr1 expr2)
          (and (~splicing expr1) (literal expr1) (no expr2))
           (list 'quote (list (eval expr1)))
          (no expr2)
           (list 'list expr1)
          (atom expr2)
           (list operator expr1 expr2)
          (caris expr2 'list)
           (dotted-list 'list expr1 (cdr expr2))
          (and (quoted-non-splice expr1) (quoted-non-splice expr2))
           (list 'quote (cons (cadr expr1) (cadr expr2)))
          (list operator expr1 expr2))))

  # Wart; an early incorrect version
  def qq_cons(expr1 expr2)
    # assume expr2 is non-splicing
    let operator (if splicing?.expr1 'dotted_list 'cons)
      (list operator expr1 expr2)

  def qq_cons(expr1 expr2) :case (and Optimize_cons quoted_non_splice?.expr1
                                                    quoted_non_splice?.expr2)
    (list quote (list cdr.expr1 cdr.expr2))

  def qq_cons(expr1 expr2) :case (and Optimize_cons (carif.expr2 = 'list))
    (dotted_list 'list expr1 cdr.expr2)

  def qq_cons(expr1 expr2) :case (and Optimize_cons atom?.expr2)
    let operator (if splicing?.expr1 'dotted_list 'cons)
      (list operator expr1 expr2)

  def qq_cons(expr1 expr2) :case (and Optimize_cons no.expr2)
    (list 'list expr1)

  def qq_cons(expr1 expr2) :case (and Optimize_cons ~splicing?.expr1
                                                    literal?.expr1
                                                    no.expr2)
    (list quote (list eval.expr1))

The key was to peel the cases off in reverse order.

As in the past, wart is more verbose than arc, but in exchange you get to move the cases around. For example, all but the first clause can be in a separate file so you don't care about them when Optimize_cons is unset.

---

But there was more to this than a simple transliterated port, because quote/unquote/backquote were represented differently in wart than other lisps. I had to track down and update all the places in your version that made this foundational assumption.

Since the backquote is baked into wart's reader rather than expanding to quasiquote like in other lisps, I traced through all top-level calls to qq-expand as a baseline to compare against:

  ; change in original arc version
  (mac quasiquote (expr)
    (ret ans qq-expand.expr
      (ero expr " => " ans)))

Once I had this output in hand I could start porting your tests. I started with a pass at just visually catching as many cases of treating the cdr of quote, quasiquote, etc. as a list as I could[2]; you might enjoy seeing what I was missing at that point, the 5 tokens that took a week of debugging :)

Two forms turned out to be quite useful in my debugging:

  # http://arclanguage.org/item?id=11140
  after_fn qq_expand_list(expr)
    (prn "qq_expand_list " expr "
  => " result)
  # ..and so on for qq_transform, qq_cons, qq_append.

  # Easily turn debug prints on and off.
  def xprn args
    nil

---

References are to snapshots of the same gist:

[1] https://gist.github.com/3971932/f85f4da34d79d7d6cb1c9b01ce60.... Try diffing this version against your port.

[2] https://gist.github.com/3971932/72d159184afaa68e42920801b75e...

-----

2 points by Pauan 4969 days ago | link

"As long as we know what sort of clauses we're working with, we can define a partial order on them."

You could require that the user provide an order when defining a new pattern.

---

My idea with Nulan is that functions shouldn't change: they should be static algorithms. Thus, Nulan doesn't allow you to overload functions like you can in Arc or Wart. Instead, if you wish to overload a function, you define a new gensym which can be stuck into any object to overload it:

  $def %foo (uniq)

  $def foo
    { %foo F } -> ...
    ...        -> ...

Now, if you call the "foo" function with an object that has a %foo key, it'll call the first clause. Otherwise it'll call the second clause.

This gives complete control to the function to decide whether it can be overloaded or not, and exactly how the overloading happens. I think this is the most flexible approach I've ever seen.

As a more concrete example, objects can have a %len key to define a custom (len ...) behavior, an %eval key for custom evaluation behavior, %call for custom call behavior, %pattern-match for custom pattern match behavior, etc.

And because of the way pattern matching works, you can even match against multiple gensyms at once:

  $def %foo (uniq)
  $def %bar (uniq)

  $def foo
    { %foo F
      %bar G } -> ...
    { %foo F } -> ...
    { %bar G } -> ...
    ...        -> ...

The above defines different behavior when an object has both the %foo and %bar keys, and also when it has just %foo or just %bar.

---

"So, how do I manipulate quotes in macroexpansions, as needed in qq.wart?"

I think hardcoding symbols (like in quasiquote) is a terrible idea, so I would just not define quasiquote. Instead, I'd use Arc/Nu quasisyntax, which doesn't use quote at all:

  `(foo bar qux)   -> (list foo bar qux)
  `(foo (bar) qux) -> (list foo (list bar) qux)

If you want quote, just use unquote:

  `(foo ,'bar qux) -> (list foo (quote bar) qux)

Also, I wouldn't define quasiquote as a macro, I'd have it be a part of the reader.

-----

3 points by fallintothis 4969 days ago | link

You could require that the user provide an order when defining a new pattern.

Interesting idea, but the proper mechanism for it eludes me.

- You could have the user give a precedence level, since integers are totally ordered, but that's a terrible idea---magic constants in every generic declaration. Any more restricted domain (e.g., specifying high, medium, or low precedence) gets a little vague.

- You could have a simpler mechanism where each new rule is added to a known, fixed location in the linearization. So basically the order is a double-ended queue and generic declarations have keywords for "push me to the front" or "push me to the back". But that's probably a bit too basic, and still relies on declaration order (in fact, complicating it).

- You could have the user specify a previously-declared chunk of code to execute first, like

  def g(x)
    (prn "default")

  def g(x) :case (odd? x)
    (prn "odd")

  def g(x) :case (x = 1) :before (odd? x)
    (prn "one")

But that's too tightly-coupled and requires code duplication. Plus, if you're already duplicating the code that's close by a definition, why not instead reorder the definitions so a simpler order-they're-read mechanism works?

- I'm really scraping the bottom of the barrel now...Have the user supply their own predicate that determines which generic to apply first? Which means the user basically has to implement their own customized partial-order. I really don't see that happening.

When using generics myself, I don't want to think about these sorts of things too hard. That's what I like about class-based generic dispatch: I just have to think about the function one class at a time, and the right function will be applied to the right instances using their simple, implicit order. For general predicates, instead of hard-coding an order I'd rather use a big if/case/pattern-match that I at least know is ordered the way I wrote it.

-----

4 points by rocketnia 4968 days ago | link

"You could have the user specify a previously-declared chunk of code to execute first"

I think this is similar to Inform 7's approach, where rules can be referred to by name. Generally, every rule in a library has a name, but individual stories omit them unless necessary.

---

"- I'm really scraping the bottom of the barrel now...Have the user supply their own predicate that determines which generic to apply first? Which means the user basically has to implement their own customized partial-order. I really don't see that happening."

That's the approach I took in Lathe a long time ago.[1] Under my approach, the partial order is itself extensible, and it even determines the ordering of its own extensions. I quite like the expressiveness of this approach, but this expressiveness is barely meaningful for in-the-large programming: In order for the precedence rule programmer to have enough information to make judgments by, they need to require all other programmers to annotate their extensions with appropriate metadata. That or they need to maintain their own up-to-date metadata describing common libraries! Instead of programming language battles, we get framework battles full of boilerplate.

Since finishing that system, I've been pondering the "framework" conventions I'd like to use myself, and I've been trying to fold those decisions into the core of a language design.

Whereas Magpie and many other multimethod systems make a big deal about subclasses, I try to avoid any kind of programming where almost all X's do Xfoo, but some X's are also Z's so they do Zfoo instead. By the same principle, I'd avoid the [odd? _] vs. [= 1 _] case altogether. If fact, as long as I succeed in formulating in the non-overlapping designs I like, I avoid the need for precedence rules altogether... but it's still an option I keep in mind just in case.

Currently, I favor supporting extensibility by way of sealer/unsealer pairs and first-class (multi)sets.

Sealer/unsealer pairs work when each extension is an all new range of possibilities. In Arc, I'd just represent these as tagged values, and I wouldn't bother to pursue the secure encapsulation associated with sealer/unsealer pairs. In linear logic, the additive operators describe this kind of combination.

First-class (multi)sets work for when each extension is a new participant in a single computation. In Arc, a list is a good representation for this. In linear logic, the multiplicative operators describe this kind of combination.

When precedence is necessary, it can be modeled explicitly as a transformation of a (multi)set into a more structured model. I think any programmer who makes an extensible tool should carefully choose a transformation that makes sense for their own purposes--whether it's really "precedence" or something else.

---

"For general predicates, instead of hard-coding an order I'd rather use a big if/case/pattern-match that I at least know is ordered the way I wrote it."

That's my take on it, too. My examples of multi-rule functions have included factorial and exponentiation-by-squaring, but those are unrealistic. There's no reason for an extension to come interfere in that process, so it might as well be the body of a single declaration.

When I was using predicate dispatch more often, I often discovered that I could satisfy most of my use cases with just two extension annotations:

- This is the real meaning of the function, and all the other cases are just for auto-coercion of parameters into the expected format. Use this extension first. (This came up when implementing 'iso in terms of 'is, and it also makes sense for coercion functions themselves, such as 'testify.)

- This extension is the last resort. Use it last. (Maybe it throws an informative error, or maybe it returns false when everything else returns true. This also works for all-accepting coercion functions like 'testify, but I'm suspicious of this kind of design. A call to 'testify always does Xfoo, except when it does Zfoo.)

---

[1] Unfortunately, right now arc-Lathe's version of the precedence system isn't something I maintain, and Lathe.js's version has no easy-looking examples because I no longer store extensions in mutable global state. Lathe.js's "hackable" utilities are now higher-order utilities that take all their once-global dependencies as explicit parameters.

-----

1 point by Pauan 4968 days ago | link

"they need to require all other programmers to annotate their extensions with appropriate metadata."

If Nulan had multimethods, I'd probably just require programmers to add additional metadata to the object in addition to the %pattern-match gensym. But Nulan doesn't have linearization or multimethods, so I avoid that whole mess!

---

"Whereas Magpie and many other multimethod systems make a big deal about subclasses"

And I have to agree: objects are amazing, but classes and subclasses are terrible. In fact, I'm of the opinion that probably all hierarchial relationships are too restrictive. Hence Nulan's object system which is completely flat, but has pretty much all the benefits of classes/prototypes.

Basically, the behavior that is common to all objects (of a particular kind) is put into a function, and behavior that is specific to a particular object is put into the object itself. And immutability gives you wonderfully clean semantics for prototype/class behavior, without all the complication and baggage.

-----

2 points by Pauan 4968 days ago | link

Well, I was thinking in terms of Nulan, where you define new patterns with a custom "%pattern-match" property. It wouldn't be hard to add in a "%pattern-match-order" property or such, though I admit I haven't really thought through how such a property would work...

Obviously that kind of system wouldn't work in wart where predicate dispatch is based on arbitrary function calls. Hence my idea in Nulan of not allowing function mutation. I would write the above code like this:

  $def g
    1        -> (prn "one")
    (odd? X) -> (prn "odd")
    ~        -> (prn "default")

That is, it first tries 1, then (odd? X), then the ~ wildcard, which matches anything. This is equivalent to the following Arc code:

  (def g (x)
    (if (is x 1)
          (prn "one")
        (odd? x)
          (prn "odd")
        (prn "default")))

If you want to make a function extensible, you would use a property on an object, like I described here: http://arclanguage.org/item?id=16851

-----

1 point by akkartik 4969 days ago | link

Yeah, I went through a similar thought process.

One possibility is to pin a clause at the front:

  def g(x) :priority-case (x = 1)
    (prn "one")

It generalizes to your high/medium/low categories, but perhaps it's useful if you permit exactly one priority clause (newer ones overwrite it).

-----

1 point by akkartik 4969 days ago | link

I think I'm getting lost in the $def foos and the %foos. Can you show how you would make say len aware of queues?

-----

2 points by Pauan 4969 days ago | link

I could switch to Arc syntax if you like. Looking at the Arc implementation of queues...

  (def queue () (list nil nil 0))
  (def qlen (q) (q 2))

Well, Arc uses mutation, and I wouldn't, but here's how you would define it in Nulan:

  $def queue; ->
    [[] [] 0
      @{ %len: X -> (X 2) }]

And here's the same thing, but with Arc syntax[1]:

  (def queue ()
    (array nil nil 0
      @(dict %len (fn (x) (x 2)))))

The above is basically exactly the same as Arc, except it uses an array rather than a cons, and it has a custom %len property. If you don't want the different parts of the queue to be public, you could use gensyms like this:

  (w/uniq (len left right)
    (def queue ()
      (dict %len  (fn (x) (x len))
            len   0
            left  nil
            right nil)))

Rather than returning an array of 3 elements, it returns an object, which has a "len", "left", and "right" property.

In either case, the object that is returned has a %len property, which is a function that computes the length. The "len" function would then be defined like this:

  (def len (x)
    ((x %len) x))

That is, it first looks up the %len property in the object, and then calls it with itself as the first argument.

---

* [1]: You might be wondering what's going on here... well, in Nulan, a "list" is just like a JavaScript array: it's an ordinary object which uses numeric keys and has a custom %len property. In particular, that means that all the tools that work on objects work on arrays too.

The @ syntax is used for splicing. So, for instance, to merge 3 objects together, you'd say { @foo @bar @qux }. And because arrays are objects too, you can splice them.

So what we're doing is, we first create an array of 3 elements, and we then splice in a new object. This new object has a custom %len property, which overrides the %len property of the array.

Alternatively, we could have done it like this:

  (set (array nil nil 0) %len ...)

But I think it's better to use the splicing notation, especially when you use [] for arrays and {} for objects. By the way, these two are actually equivalent:

  { @[[] [] 0]
    %len ... }

  [[] [] 0
    @{ %len ... }]

In the first case we take an empty object, splice in an array, and then assign a %len property. In the second case, we take an array and splice in an object that has a %len property.

In either case, the object that is returned has the same keys and values.

-----

2 points by Pauan 4969 days ago | link

Sorry to hijack your wart thread, but... I just realized something. I wanted to allow for iterating over the keys of an object, but that caused issues, as I discussed with rocketnia earlier (http://arclanguage.org/item?id=16823)

Anyways, JavaScript gets around this problem by allowing you to define a property on an object that is "non-enumerable". But any kind of system that I add in that lets you define "non-enumerable" properties is going to be big and complicated.

Instead, I found a very simple way to create enumerable objects in a way that is completely customizable, and doesn't even need to be built-in:

  (= %keys (uniq))
  
  (def iterable-object ()
    (let keys []
      { %set  (fn (x k v)
                (pushnew k keys))
        %rem  (fn (x k)
                (pull k v))
        %keys (fn (x) keys) }))

Here's what's going on. Firstly, we got the %keys gensym, which is supposed to be a function that returns a list of keys in the object.

The function "iterable-object" returns a new object that has custom %set, %rem, and %keys properties:

%set is called when assigning a property to the object. It takes the key and pushes it into the "keys" array.

%rem is called when deleting a property from the object. It takes the key and removes it from the "keys" array.

%keys just returns the "keys" array.

Now, these objects are capable of being enumerated, which means they could be passed to "map", for instance. But here's the fun part: you can completely control which properties are enumerated and which aren't.

In this case, the object will only enumerate properties that are added or removed after the object is created. So any properties defined previously are still hidden. At least, depending on how I implement splicing and %set...

---

What the above basically means is... "every computer problem can be solved by the addition of more properties on an object" :P

-----

1 point by Pauan 4968 days ago | link

After fidgeting with the syntax, here's what I got:

  $def iterable-object ->
    [ %set  -> x k o n
              [ @x %keys -> ~ (pushnew (keys x) k) ]
      %rem  -> x k o
              [ @x %keys -> ~ (pull (keys x) k) ]
      %keys -> ~ {} ]

I actually realized that swapping [] and {} is way better, meaning that [ foo bar ] is (dict foo bar) and { foo bar } is (array foo bar). There's two reasons for this:

1) {} is closer to () than [] is, which is really nice in macros:

  $mac $accessor -> n v
    $uniq %a
      {$def n -> %a
        {{%a v} %a}}

2) I found that {} didn't stand out enough, but [] does.

---

By the way, in case you're curious about the Nulan syntax... $ is prefixed to vau/macros, which is why it's "$def" rather than "def"

-> is the function macro, which means (-> x y z ...) is equivalent to (fn (x y z) ...) in Arc

~ is the "wildcard syntax" which matches anything, just like _ in Haskell

[ foo bar ] translates to (dict foo bar), and { 1 2 3 } translates to (array 1 2 3)

@ is for splicing. Which means that [ @foo @bar @qux ] merges three objects into one. If you want to update an object with new properties, it's idiomatic to say [ @foo ... ]

Gensyms are prefixed with %

-----

1 point by Pauan 4968 days ago | link

Which, if translated into JavaScript, would look something like this...

  var iterableObject = function () {
    var a = {};
    a.set = function (x, k, o, n) {
      var a = Object.create(x);
      a.keys = function () {
        return pushnew(keys(x), k)
      }
    };
    a.rem = function (x, k, o) {
      var a = Object.create(x);
      a.keys = function () {
        return pull(keys(x), k)
      }
    };
    a.keys = function () {
      return []
    };
    return a
  }

-----

2 points by rocketnia 4968 days ago | link

"Instead, I found a very simple way to create enumerable objects in a way that is completely customizable, and doesn't even need to be built-in"

That's my approach too. Or it would be, if I actually got around to building a language out of Cairntaker. :)

-----

2 points by akkartik 4969 days ago | link

:) All very interesting, thanks. The len example was very clear. Hijack away!

-----

3 points by Pauan 4970 days ago | link

"Is there a way to make the clauses load-order independent?"

Magpie has a precedence system with its multimethods:

http://magpie-lang.org/multimethods.html#linearization

-----

2 points by akkartik 4970 days ago | link

I've fixed the load bug.

I also managed to return performance to pre-infix levels by the simple expedient of turning on compiler optimizations; on my machine it yields a 2x speedup. Startup time is now 2.6s, and tests take 35s to run.

Finally, the interpreter no longer dies when it encounters unbound vars, or when trying to invoke a non-function.

Update 9 hours later: After a little more optimization, wart will now start up in less than a second if the binary is up to date. The first run after downloading (build+run time) takes 20s. Tests run in 10s (excluding build time).

I tried to be clever and assume frequent building if running tests, but it turns out wart now does enough work that optimizations always improve build+tests time.

http://github.com/akkartik/wart/compare/5707dcc022...904be9f...

-----

1 point by akkartik 4971 days ago | link | parent | on: Wart update

Yeah, I considered the issue of perl[1]. I might use =~, but I'll probably not use =~ for regex match if I use ~= for inequality. Different things should look different, and all that. The simple option would be to just use match like in javascript. But maybe something else will present itself. Maybe just define equality between regex and string as match?

  let r (regex "a.*")
    if ("abc" = r)
      ..

[1] See the commit message at http://github.com/akkartik/wart/commit/f0e3d726eb

-----

2 points by Pauan 4971 days ago | link

I like that idea. You're testing whether the regexp is "equal to" the string. But I'd put the regexp first, like (r = "abc")

-----