Arc Forum

4 points by almkglor 6508 days ago | link | parent

I've been bashing my head against this somewhat, in SNAP. The problem is always with the reader function: we cannot use a single reader function, since two different processes could be using the reader and expecting it to be in two different packages. Obviously we would have to create separate monadic readers for each process.

The other problem then becomes: how do we implement, say, (import ...) or (in-package ...) ? Should the reader silently filter them out? If not, how does 'eval notify the reader that the package is changed? If someone defines foo::in-package and bar::in-package, which one does the reader return, and how will 'eval know if the package should be changed?

Yet another problem would be intrasymbol syntax: should the reader leave foo::bar!nitz, or should it process it into (foo::bar 'foo::nitz) or even (foo::bar 'quux::nitz) if (import quux::nitz foo::nitz) is used?

Yet another problem is the "standard" package, i.e. CL-USER in CL. Obviously all symbols from this package are imported into all packages... or should they be? The issue here is one which PG ignored in ArcN: backwards compatibility. If I write code today which uses the symbol "convoke" as, say, a table in my function and tomorrow PG decides to create a macro "convoke" which does something else, kaboom! my code dies.

In my opinion we should separate packages into several interfaces.

Basically, suppose I release a package, AlmkglorSuperSupremePanPizza, and I define a "version 1" of this interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)

Then someone else can use:

  (import AlmkglorSuperSupremePanPizza::v1)

This imports the 'eat, 'drink, and 'be-merry symbols from AlmkglorSuperSupremePanPizza. Importantly, once I publish the interface, I cannot change it. If I realize that my interface is lacking, I will have to define a new interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)
  (interface v2
    ;include v1 interface elements
    v1 for-tomorrow-we-die)
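
A rough model of the interface idea, written in Python rather than Arc since none of this exists yet. The `Package` class and its `interface` method are hypothetical names; the sketch assumes that a member naming an earlier interface pulls in that interface's symbols, and that republishing an interface with different contents is an error.

```python
# Hypothetical model of versioned, frozen interfaces; not an existing Arc API.
class Package:
    def __init__(self, name):
        self.name = name
        self.interfaces = {}  # interface name -> tuple of qualified symbols

    def interface(self, iface, *members):
        symbols = []
        for m in members:
            if m in self.interfaces:
                # A member that names an earlier interface includes its symbols.
                symbols.extend(self.interfaces[m])
            else:
                symbols.append(f"{self.name}::{m}")
        symbols = tuple(symbols)
        old = self.interfaces.get(iface)
        if old is not None and old != symbols:
            # Once published, an interface may not change.
            raise ValueError(f"interface {self.name}::{iface} is already published")
        self.interfaces[iface] = symbols
        return symbols


pizza = Package("AlmkglorSuperSupremePanPizza")
v1 = pizza.interface("v1", "eat", "drink", "be-merry")
v2 = pizza.interface("v2", "v1", "for-tomorrow-we-die")
```

Redeclaring v1 with the same symbols is harmless in this model, which is what lets the second listing above repeat the v1 definition.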

4 points by almkglor 6508 days ago | link

continued:

Anyway, I've been thinking rather deeply about packages and module systems and the like. One major reason for using a symbol-based package/module system is types: for example, if packageA defines a 'gaz type, and packageB defines a 'gaz type also, obviously the types are incompatible and they shouldn't be the same.

Also, for a potential solution from within the Scheme implementation, consider instead moving the problem from the reader to the evaller.

Instead of having symbol packages be handled by the reader, we might instead define a new builtin type, 'eval-cxt. An 'eval-cxt object is simply an evaluator, but understands packages and the (in-package ...) and (import ...) etc. syntaxes.

The reader simply reads in symbols blindly, without caring about the exact package they should go to. This simplifies the reader and allows us to continue using the reader and writer for saving plain Arc data. Instead, any particular context for evaluation is put into the 'eval-cxt object. 'eval-cxt will understand 'import etc. forms, and will perform the translation of all symbols into their qualified, package-based counterparts, i.e.:

  (= evaller (eval-cxt))
  (= tmp (read))
  user input> '(hello world)
  => (quote (hello world))

  (evaller tmp)
  => (arc-user::hello arc-user::world)

The Arc REPL would then be something like:

  (def tl ()
    (let my-eval (eval-cxt)
      ((afn ()
         (pr "arc> ")
         (write:my-eval:read)
         (self)))))

This simplifies 'read, EXCEPT: intrasymbol syntax must, must be absolutely expanded by the reader. Why? Because otherwise macros whose expansions use intrasymbol syntax won't work properly:

  (in-package sample)
  (= private-table
     (table 'foo 42
            'bar 99))
  (mac foo ()
    `(private-table!foo))

The problem is the 'foo symbol above: the macroexpansion must express both private-table as sample::private-table and 'foo as sample::foo.
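
To make the target expansion concrete, here is a small Python sketch (not Arc) of what expanding a single `!` and then qualifying both halves could look like. The function name `expand_bang` is made up, symbols are modelled as strings, and `quote` is left unqualified for simplicity.

```python
# Hypothetical helper: expand 'a!b' into [a, ['quote', b]] and qualify both
# halves with the package the macro was defined in.
def expand_bang(symbol, package):
    def qualify(name):
        # A name that already contains '::' is left as it is.
        return name if "::" in name else f"{package}::{name}"

    head, bang, key = symbol.partition("!")
    if not bang:
        return qualify(symbol)
    return [qualify(head), ["quote", qualify(key)]]


expansion = expand_bang("private-table!foo", "sample")
```

Here `expansion` is the list form of (sample::private-table 'sample::foo), which is the result the macro needs no matter which package the caller is in.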

Incidentally, the 'eval function can still be expressed rather simply in this manner:

  (def eval (e)
    ((eval-cxt) e))

Alternatively we can give an optional package name:

  (w/uniq no-param
    (def eval (e (o p no-param))
      (let evaller (eval-cxt)
        (unless (is p no-param)
          (evaller `(in-package ,p)))
        (evaller e))))

This keeps maximum backward compatibility with ArcN: a simple reader and a simple eval function.

-----

3 points by cchooper 6508 days ago | link

Hmm... interesting stuff.

I think I'll keep your interfaces idea. It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

I don't really understand the problem with 'in-package. Is this a SNAP-specific problem or does it affect Arc generally? Won't things work the same way as in CL?

Some other thoughts I had about this which may be useful: modules are usually kept in a file, so a 'load-in-package function which takes a package argument might be useful.

Also, I thought it would be good to have a read-macro to switch packages. I'll reuse #: as it's not needed in my system:

  #:foo (some expressions)

This would read the expressions in package 'foo before executing them. That might solve your problem as the package is passed to the reader explicitly.

I like the idea of symbols being read in without a package, but then getting a package at eval time. One way to implement this may be to store all the symbols in a special package when they are read, then eval can move these to a new package when it evaluates them. This makes packages very dynamic.

One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc. That's just a bit crazy, so I think strings should be used to name packages instead. Alternatively, package names should also be interned in a special package that's treated differently. Seeing as packages are just mappings from strings to symbols, that doesn't really make much difference.

-----

3 points by almkglor 6507 days ago | link

> It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

Which is the point, of course ^^

The other point is disciplining package makers to make package interfaces constant even as newer versions of the package are made. This helps preserve backward compatibility. In fact, if the ac.scm and arc.arc functions are kept in their own package, we can even allow effective backward compatibility of much of Arc by separating them by version, i.e.

  (using arc v3)
  (using arc v4)
  (using arc v5)

> I don't really understand the problem with 'in-package.

  ; tell the reader that package 'foo has a symbol 'in-package
  (= foo::in-package t)
  ; enter package foo
  (in-package foo)
  ; now: does the reader parse this as (in-package ...) or (foo::in-package ...)
  (in-package bar)

> Is this a SNAP-specific problem or does it affect Arc generally?

It's somewhat SNAP-specific, since we cannot have a stateful, shared reader, but I suspect that any Arc implementation that supports concurrency of any form will have similar problems with having readers keep state across invocations. The alternative would be having a monadic reader.

> Won't things work the same way as in CL?

Not sure: I never grokked anything except the basics of CL packages.

> #:foo (some expressions)

How about in a module file? It might get inconvenient to have to keep typing #:foo for each expression I want to invoke in the foo package, which means we really should think deeply about how in-package should be properly implemented.

> One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc.

If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package, i.e. foo::bar is always foo::bar, as long as :: exists in the symbol.

Of course, hierarchical packages are nice too ^^

-----

1 point by cchooper 6505 days ago | link

> #:foo (some expressions)

I'm assuming you can use it like this:

  #:foo
  ((def f (x) (+ 1 x))
   (def g (y) (expt y 2))
   (= bar 123))

or like this:

  #:foo ((load "filename"))

so you only have to type it once (although you would at the top level!). With this syntax, #:foo x could expand to something like (read-with-package "foo" x), so you wouldn't need a stateful read. Well, unless you called 'read within the file. So I guess you do. :)

> If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package

True, but it will be a bit confusing:

  (import 'foo::bar 'foo::quux)
  (import 'foo::bar 'baz::quux)

Are 'foo::quux and 'baz::quux the same package? If so, it's a bit strange that you can refer to the same thing by different symbols. That's why I think strings are better. Not sure what I think about nested packages. I'll have to ponder on that.

-----

1 point by almkglor 6505 days ago | link

load is currently defined as:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (whilet e (read f)
          (eval (hook e))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))

What magic needs to be inserted here to make 'load use the correct 'read, keeping in mind that even plain Arc supports threads and those threads share global variables?

It still looks like a stateful 'read to me, and I don't want a stateful 'read at all, because a file might want to directly use 'read:

  $ cat getconfig.arc
  (= configuration (read "my.cfg"))

This is one good reason to try to keep 'read stupid: one of Arc's idioms is to simply dump data as s-expressions and read them in later as list structures. If 'read is too smart, this idiom might have some subtle gotchas.

For that matter I'd prefer to keep the package definitions in the file itself, rather than have to remember to put the file in a package:

  $ cat mine.arc
  (in-package mine)

  (def mine ()
    (prn "this is my mine!!"))

> (import 'foo::bar 'foo::quux)

Okay, I have to ask: what does 'import mean?

-----

1 point by cchooper 6504 days ago | link

I would define 'load-in-package as

  - (def load (file (o hook))
  + (def load-in-package (package file (o hook))
  - (whilet e (read f)
  + (whilet e (read-in-package package f)

That's the best I can do. I think that if packages are involved then read is inherently stateful, so even threads are a problem. I have no idea how CL implementations deal with threads and *package*, because the spec makes no account for it. :(

> (import 'foo::bar 'foo::quux)

Oops, that should be

  (import 'foo::quux 'baz::quux)

-----

1 point by almkglor 6504 days ago | link

Here's my proposal:

We move all state information into a new object type called a "context". It can be constructed without parameters via the 'cxt function:

  (cxt)
  => <implementation-specific>
  (type (cxt))
  => arc::cxt

The REPL becomes an RCEPL, a read-contexter-eval-print loop. For convenience, we also provide an 'eval-cxt object:

  (eval-cxt)
  => <implementation-specific>
  (type (eval-cxt))
  => arc::eval-cxt

'eval-cxt objects are callable, and their call is equivalent to:

  (let ob (eval-cxt)
    (ob x))
  ==>
  (let ob (cxt)
    (eval:ob x))

The implementation is free to define 'cxt and/or 'eval-cxt objects in terms of Arc axioms or by adding them as implementation-specific axioms.

The context object accepts a plain read expression (with unpackaged symbols) and emits an s-expression where all symbols are packaged symbols.

It is the context object which keeps track of the current package, so you might have some accessor functions to manipulate the context object (e.g. destructure it into the current package, etc.).

The read function is stateless and simply emits unpackaged symbols, and emits packaged symbols if and only if the given plaintext specifically includes a package specification.

A package object is a stateful, synchronized (as in safely accessible across different threads, and whose basic operations are assuredly atomic) object. A context is a stateful object intended for thread- and function- local usage.

context objects
===============

A context object is callable (and has an entry in the axiom::call* table) and has the following form:

  (let ob (cxt)
    (ob expression))

The return value of the context is either of the following:

1. If the expression is one of the following forms (the first symbol in each form is unpackaged, 'symbol here is a variable symbol):

  (in-package symbol)
  (interface symbol . symbols)
  (using symbol)
  (import symbol symbol)

...then the return value is axiom::t, and either the context's state is changed, or the state of a package (specifically the current package of the context) is changed.

2. For all other forms, it returns an equivalent expression, but containing only packaged symbols. The state of the context is not changed.

The forms in number 1 above have the following changes in the context or current package of the context:

  (in-package symbol)

Changes the current package of the context to the package represented by the unpackaged symbol. The implementation is free to throw an error if the given symbol is packaged.

  (interface symbol . symbols)

Defines an interface. All symbols are first applied to the current package to translate them into packaged symbols, if they are unpackaged (this translation by itself may change the package's state, and also a packaged symbol will simply be passed as-is by the package object; see section "package objects" below). It then modifies the package of the first symbol to have an interface whose symbols are the given symbols.

If the interface already exists, the given list of symbols is compared against the existing list. If they are not the same, the implementation is free to throw an error.

  (using symbol)

The given symbol must be a packaged symbol. It must name an interface of its package; if the interface does not exist on the package, the implementation must throw an error. For each symbol in the interface, this changes the current package's state, creating or modifying the mapping from the unpackaged symbol of the same name to the symbol in the interface.

For conflicting package interfaces: let us suppose that the context is in package 'User, and there exist two package interfaces, A::v1 and B::v1. A::v1 is composed of (A::foo A::bar) while B::v1 is composed of (B::bar B::quux). If the context receives (using A::v1), the User package contains the mapping {foo => A::foo, bar => A::bar}. Then if the context receives (using B::v1), the User package afterwards contains the mapping {foo => A::foo, bar => B::bar, quux => B::quux}.

  (import symbol symbol)

Forces the current package to have a specific mapping. The first symbol must be a packaged symbol and the second symbol must be unpackaged. The implementation must throw an error if this invariant is violated.

Continuing the example above, after (import A::bar A-bar), this changes the package to {foo => A::foo, bar => B::bar, A-bar => A::bar, quux => B::quux}.
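
The rules above can be checked against a small model. This is a Python sketch, not Arc, and only an approximation under stated assumptions: symbols are plain strings, forms are nested lists, every class and method name is hypothetical, `True` stands in for axiom::t, and axiom symbols are resolved without being stored in a package's mapping.

```python
# Hypothetical Python model of context and package objects; not an Arc API.
AXIOMS = {"t", "nil", "fn", "if", "quote", "quasiquote", "unquote",
          "unquote-splicing", "set", "call*"}


class Package:
    def __init__(self, name):
        self.name = name
        self.mapping = {}     # unpackaged name -> packaged symbol
        self.interfaces = {}  # packaged interface symbol -> packaged symbols

    def __call__(self, sym):
        if "::" in sym:       # already packaged: pass through unchanged
            return sym
        if sym in AXIOMS:     # axiom is implicitly imported everywhere
            return "axiom::" + sym
        return self.mapping.setdefault(sym, f"{self.name}::{sym}")


class Context:
    def __init__(self, start="User"):
        self.packages = {}
        self.current = self._package(start)

    def _package(self, name):
        return self.packages.setdefault(name, Package(name))

    def __call__(self, form):
        if isinstance(form, str):
            return self.current(form)
        if not isinstance(form, list):   # numbers and the like are left alone
            return form
        if form and form[0] in ("in-package", "interface", "using", "import"):
            getattr(self, "_" + form[0].replace("-", "_"))(*form[1:])
            return True                  # stands in for axiom::t
        return [self(x) for x in form]

    def _in_package(self, name):
        if "::" in name:
            raise ValueError("in-package expects an unpackaged symbol")
        self.current = self._package(name)

    def _interface(self, name, *symbols):
        iface, *members = [self.current(s) for s in (name, *symbols)]
        owner = self._package(iface.split("::")[0])
        if owner.interfaces.setdefault(iface, members) != members:
            raise ValueError(f"{iface} is already defined differently")

    def _using(self, iface):
        owner = self.packages.get(iface.split("::")[0])
        if "::" not in iface or owner is None or iface not in owner.interfaces:
            raise LookupError(f"no such interface: {iface}")
        for sym in owner.interfaces[iface]:
            self.current.mapping[sym.split("::", 1)[1]] = sym

    def _import(self, packaged, unpackaged):
        if "::" not in packaged or "::" in unpackaged:
            raise ValueError("import expects (import pkg::sym sym)")
        self.current.mapping[unpackaged] = packaged


cxt = Context()
cxt(["in-package", "A"])
cxt(["interface", "v1", "foo", "bar"])
cxt(["in-package", "B"])
cxt(["interface", "v1", "bar", "quux"])
cxt(["in-package", "User"])
cxt(["using", "A::v1"])
cxt(["using", "B::v1"])
cxt(["import", "A::bar", "A-bar"])
user_map = dict(cxt.current.mapping)
qualified = cxt(["bar", ["quote", "zap"], 1])
```

In this model `user_map` ends up as the final mapping from the example, and an ordinary form such as (bar 'zap 1) comes back with `bar` resolved through the later (using B::v1).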

package objects
===============

A package object is callable and has the following form:

  (ob expression)

expression must evaluate to a symbol, and if a non-symbol is applied to a package object, the implementation is free to throw an error. The application otherwise evaluates to either:

1. The same symbol, if the given symbol is a packaged symbol; this does not change the state of the package

2. A packaged symbol, if the given symbol is an unpackaged symbol. If the package does not contain a mapping for the unpackaged symbol, the state of the package is changed so that a mapping for the unpackaged symbol to a packaged symbol exists.

The package object also supports an 'sref operation:

  (sref ob v k)

k is an unpackaged symbol while v is a packaged symbol; the implementation is free to throw an error if this invariant is violated.
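
As a sketch of the "synchronized" requirement, a package object could guard its mapping with a lock. This is Python rather than Arc; the class, the `sref` method name, and the use of strings for symbols are all assumptions made for illustration.

```python
import threading


# Hypothetical thread-safe package object; not an existing Arc type.
class Package:
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()
        self._mapping = {}  # unpackaged name -> packaged symbol

    def __call__(self, sym):
        if not isinstance(sym, str):
            raise TypeError("a package can only be applied to a symbol")
        if "::" in sym:
            return sym  # packaged symbols pass through; no state change
        with self._lock:
            # Create the mapping on first use, atomically.
            return self._mapping.setdefault(sym, f"{self.name}::{sym}")

    def sref(self, v, k):
        # Mirrors (sref ob v k): k is unpackaged, v is packaged.
        if "::" not in v or "::" in k:
            raise ValueError("sref expects a packaged value and an unpackaged key")
        with self._lock:
            self._mapping[k] = v
        return v


user = Package("User")
user.sref("A::bar", "A-bar")
results = []
threads = [threading.Thread(target=lambda: results.append(user("foo")))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread gets the same packaged symbol back, since the first lookup creates the mapping under the lock and later lookups only read it.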

Packages are handled by interface.

Further, we also predefine two packages, axiom and arc.

The axiom package contains the following symbols:

  axiom::t
  axiom::nil
  axiom::fn
  axiom::if
  axiom::quote
  axiom::quasiquote
  axiom::unquote
  axiom::unquote-splicing
  axiom::set
  axiom::call*

The axiom package is implicitly imported into all packages. It presents no interface.

The arc package contains all "standard" Arc functions and macros. The arc package is not implicitly imported into all packages.

The arc package contains the interface arc::v3. This interface is the set of symbols currently defined on Anarki. Future extensions to the arc standard library must be first placed in the interface arc::v3-exp until they are placed into a future arc::v4 interface, and so on.

load
====

The load implementation is thus:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (let evaller (eval-cxt)
          (evaller '(in-package User))
          (whilet e (read f)
            (evaller (hook e)))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))

-----

1 point by cchooper 6505 days ago | link

As for CL packages, I've decided I don't really like the way they work. If a file is compiled, then (in-package foo) is only guaranteed to work if it appears at the top level. So...

  (if (eq x 10) (in-package foo) (in-package bar))

works in an interpreted file, but the behaviour is undefined if the file is compiled. CLISP handles both cases fine.

Also in CL, the value of *package* doesn't always correspond to the actual current package. For example,

  (setf x "I'm in the default package!")
  (setf foo::x "I'm in the FOO package!")
  (setf *package* (find-package :foo))
  (print *package*)
  (print x)

does this when interpreted

  #<PACKAGE FOO>
  "I'm in the FOO package!"

but this when compiled

  #<PACKAGE FOO>
  "I'm in the default package!"

Either the package should be determined at eval-time (as was your suggestion) or the user should be forced to use read macros like #: and #.(in-package ...) to switch packages at read time. The CL solution is an ad-hoc compromise between the two.

-----

1 point by almkglor 6505 days ago | link

Forcing the user to keep using read macros doesn't feel quite right. Personally I'm more for using 'eval-cxt objects, which would do the assignment from plain symbols to qualified symbols, and keep track of the current package.

Of course, using 'eval-cxt raises questions about static whole-program compilation, I think. Hmm. I haven't thought deeply about that yet.

-----