One potential problem is splitting a string into words. A minor related problem is deciding what counts as a "word", i.e. where the division between words falls.
Otherwise looks like a pretty standard Bayesian analysis, which I believe pg has done already.
> This would be possible if there were easy ways to reprogram the Arc syntax in a library.
Again, like I said, this is probably implementable using readermacros, but fooling around with the reader is always troublesome.
Consider some random programmer who uses /ca/ as a variable name in his or her programs for some inexplicable reason. Whether this is considered a regular expression or a valid symbol will then depend on whether or not it is loaded before or after the regular expression library.
If you want nice regular expression syntax, then it must be standardized as part of the language syntax so that everyone knows they should avoid using such variable names. Alternatively, give some method for specifying a reader for each module file. No, this is an exploratory language, and someone will try using /ca/ as a variable name unless you specifically ban it. I promise you that.
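To make the load-order problem concrete, here is a minimal sketch in Python (not Arc). The `Reader`, `Symbol`, `Regex`, and `looks_like_regex` names are all hypothetical; this models a reader that libraries can extend, and is not how Arc's or mzscheme's reader actually works.

```python
# Toy model of an extensible reader; every name here is hypothetical.
class Symbol(str):
    """What the default reader produces for a bare token."""

class Regex(str):
    """What a regex library's reader hook would produce instead."""

class Reader:
    def __init__(self):
        self.hooks = []  # (predicate, constructor) pairs installed by libraries

    def install(self, predicate, constructor):
        self.hooks.append((predicate, constructor))

    def read_token(self, token):
        # The first matching hook wins; otherwise the token is a plain symbol.
        for predicate, constructor in self.hooks:
            if predicate(token):
                return constructor(token)
        return Symbol(token)

def looks_like_regex(token):
    return len(token) >= 2 and token.startswith("/") and token.endswith("/")

reader = Reader()
before = reader.read_token("/ca/")   # read before the regex library is loaded
reader.install(looks_like_regex, lambda t: Regex(t[1:-1]))
after = reader.read_token("/ca/")    # same text, read after the library is loaded

print(type(before).__name__, type(after).__name__)
```

The same source text reads as a symbol in one file and as a regex in another, depending only on which file happened to be loaded first.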
This is only partially implementable using ssyntax, but again this may be considered as "fooling around with the reader".
If strings as regular expressions work for you, then it's okay, since strings are already standardized in the syntax:
a plain expression like /ca/ seems rather straightforward to me.
This requires some kind of special version of the reader.
One thing I suggest would be to use some sort of # readermacro (which you'll have to hack in the underlying Scheme):
#/ca/
#/s/foo/bar/
If you want nice syntax for regexps, almost definitely it will have to be part of the axioms, or at least readermacros (which I personally don't like). Otherwise if representing them via strings is acceptable, then we don't need it as part of the axioms.
I am not sure what the consequences are for possible implementations, but preferably I would want to be able to use regexps in (find, (keep, or (findsubseq just as easily as I would now use strings or functions:
(keep odd (list 1 2 3 4))
(keep (reg /nan/) (list "banana" "bonobo" "bandanga"))
(findsubseq (reg /\d+:\d+/) "The current time is 10:00 am")
When looking at http://arcfn.com/doc/string.html , there are an awful lot of restrictions on when we can use variables or functions as arguments, so maybe a lot of the string operations in there become obsolete if there is a regexp engine in arc.
Re: "chunk of code should be executed atomically" - I'm not 100% sure but I think it's possible that (from the way 'atom is implemented) a sequence that is not protected within an 'atom scope will be able to see an "atomic" operation in progress.
Basically 'atom just acquires a Global Interpreter Lock around the function it executes. I'm not sure, but from the semantics of locking, I think that another thread may see the "atomic" operation step-by-step unless it is itself protected by 'atom (and then there is potential contention for a single global lock!)
Of course, mzscheme uses green threads, so maybe it won't work that way.
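The concern can be modelled in Python (not Arc), assuming 'atom really is just a single global lock taken around the function. The two `Event` objects exist only to force the interleaving deterministically; a real program would hit it by chance. The `atomic`, `account`, and `transfer` names are hypothetical.

```python
import threading

atomic_lock = threading.RLock()   # models the single global lock behind 'atom

def atomic(thunk):
    """Run thunk while holding the global lock."""
    with atomic_lock:
        return thunk()

account = {"a": 100, "b": 0}
midway = threading.Event()   # set when the transfer is half done
resume = threading.Event()   # lets the transfer finish

def transfer():
    account["a"] -= 100
    midway.set()             # pause here, still inside the "atomic" section
    resume.wait()
    account["b"] += 100

writer = threading.Thread(target=atomic, args=(transfer,))
writer.start()
midway.wait()

# This read is NOT wrapped in atomic(), so it never waits for the lock
# and observes the half-finished transfer.
seen_unprotected = account["a"] + account["b"]

resume.set()
writer.join()

# A read that does take the lock can only run before or after the transfer.
seen_protected = atomic(lambda: account["a"] + account["b"])

print(seen_unprotected, seen_protected)
```

Under these assumptions the lock only excludes other code that also takes the lock, which is exactly why every reader would have to go through 'atom too and contend for the same lock.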
> Although I have managed to figure out how to enable forwarding port 80, I can't find anything on stopping access to port 8080 (so you can still access that port from the internet...). If someone could tell me how to lock that down, I would appreciate it. Thanks in advance.
Thanks for the reminder, I had forgotten about that.
I'm still not sure that's a complete solution though... you don't really prevent access to the port, you just send an access denied message instead of serving the request. (But I don't know that much about web security, so maybe that really is sufficient.)
Not sure either. It depends on whether the Arc Server is secure/{not dumb} enough such that it won't be fooled by someone pretending to be from 127.0.0.1 , for example.
Couldn't you just make Apache or Linux firewall port 8080 so all attempts to access it from outside are blocked? (That said, I wouldn't know how to do that off the top of my head.)
> Well, this approach basically is the 'annotate approach, except instead of having a separate basic data type - a "tagged" object - you just use cons cells, where the car is the type and the cdr the rep.
Except for the case where (isa R T), where (annotate T R) returns R directly:
This isn't a fundamental difference. Just as 'annotate could as easily create a new doubly-wrapped object, 'annotate-cons can as easily not cons when the type is the same.
(def annotate-cons (typ val)
  (if (isa val typ) val (cons typ val)))
I've been bashing my head against this somewhat, in SNAP. The problem is always with the reader function: we cannot use a single reader function, since two different processes could be using the reader and expecting it to be in two different packages. Obviously we would have to create separate monadic readers for each process.
The other problem then becomes: how do we implement, say, (import ...) or (in-package ...) ? Should the reader silently filter them out? If not, how does 'eval notify the reader that the package is changed? If someone defines foo::in-package and bar::in-package, which one does the reader return, and how will 'eval know if the package should be changed?
Yet another problem would be intrasymbol syntax: should the reader leave foo::bar!nitz, or should it process it into (foo::bar 'foo::nitz) or even (foo::bar 'quux::nitz) if (import quux::nitz foo::nitz) is used?
Yet another problem is the "standard" package, i.e. CL-USER in CL. Obviously all symbols from this package are imported into all packages... or should they be? The issue here is one which PG ignored in ArcN: backwards compatibility. If I write code today which uses the symbol "convoke" as, say, a table in my function and tomorrow PG decides to create a macro "convoke" which does something else, kaboom! my code dies.
In my opinion we should separate packages into several interfaces.
Basically, suppose I release a package, AlmkglorSuperSupremePanPizza, and I define a "version 1" of this interface:
This imports the 'eat, 'drink, and 'be-merry symbols from AlmkglorSuperSupremePanPizza. Importantly, once I publish the interface, I cannot change it. If I realize that my interface is lacking, I will have to define a new interface:
Anyway, I've been thinking rather deeply about packages and module systems and the like. One major reason for using a symbol-based package/module system is types: for example, if packageA defines a 'gaz type, and packageB defines a 'gaz type also, obviously the types are incompatible and they shouldn't be the same.
Also, for a potential solution from within the Scheme implementation, consider instead moving the problem from the reader to the evaller.
Instead of having symbol packages be handled by the reader, we might instead define a new builtin type, 'eval-cxt. An 'eval-cxt object is simply an evaluator, but understands packages and the (in-package ...) and (import ...) etc. syntaxes.
The reader simply reads in symbols blindly, without caring about the exact package they should go to. This simplifies the reader and allows us to continue using the reader and writer for saving plain Arc data. Instead, any particular context for evaluation is put into the 'eval-cxt object. 'eval-cxt will understand 'import etc. forms, and will perform the translation of all symbols into their qualified, package-based counterparts, i.e.:
This simplifies 'read, EXCEPT: intrasymbol syntax must, must be absolutely expanded by the reader. Why? Because otherwise macros whose expansions use intrasymbol syntax won't work properly:
I think I'll keep your interfaces idea. It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).
I don't really understand the problem with 'in-package. Is this a SNAP-specific problem or does it affect Arc generally? Won't things work the same way as in CL?
Some other thoughts I had about this which may be useful: modules are usually kept in a file, so a 'load-in-package function which takes a package argument might be useful
Also, I thought it would be good to have a read-macro to switch packages. I'll reuse #: as it's not needed in my system:
#:foo (some expressions)
This would read the expressions in package 'foo before executing them. That might solve your problem as the package is passed to the reader explicitly.
I like the idea of symbols being read in without a package, but then getting a package at eval time. One way to implement this may be to store all the symbols in a special package when they are read, then eval can move these to a new package when it evaluates them. This makes packages very dynamic.
One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc. That's just a bit crazy, so I think strings should be used to name packages instead. Alternatively, package names should also be interned in a special package that's treated differently. Seeing as packages are just mappings from strings to symbols, that doesn't really make much difference.
> It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).
Which is the point, of course ^^
The other point is disciplining package makers to make package interfaces constant even as newer versions of the package are made. This helps preserve backward compatibility. In fact, if the ac.scm and arc.arc functions are kept in their own package, we can even allow effective backward compatibility of much of Arc by separating them by version, i.e.
(using arc v3)
(using arc v4)
(using arc v5)
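A minimal Python sketch (not Arc) of the "published interfaces never change" discipline. The `Package` class and the `pizza` package are hypothetical, and the extra symbol 'order in v2 is invented purely for illustration.

```python
class InterfaceError(Exception):
    """Raised when a published interface would change, or does not exist."""

class Package:
    def __init__(self, name):
        self.name = name
        self.interfaces = {}   # version name -> frozen tuple of symbol names

    def interface(self, version, *symbols):
        """Publish an interface; republishing is allowed only if identical."""
        existing = self.interfaces.get(version)
        if existing is not None and existing != symbols:
            raise InterfaceError(f"{self.name}::{version} is already published")
        self.interfaces[version] = symbols

    def using(self, version):
        """Return the unqualified -> qualified mapping a client would get."""
        if version not in self.interfaces:
            raise InterfaceError(f"{self.name} has no interface {version}")
        return {s: f"{self.name}::{s}" for s in self.interfaces[version]}

pizza = Package("pizza")
pizza.interface("v1", "eat", "drink", "be-merry")
pizza.interface("v1", "eat", "drink", "be-merry")            # identical: accepted
pizza.interface("v2", "eat", "drink", "be-merry", "order")   # 'order is made up

try:
    pizza.interface("v1", "eat", "drink")   # attempt to shrink a published v1
    v1_changed = True
except InterfaceError:
    v1_changed = False

old_client = pizza.using("v1")
new_client = pizza.using("v2")
```

Code written against v1 keeps seeing exactly the v1 symbols, so a name added in v2 cannot collide with anything an old client defined for itself.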
> I don't really understand the problem with 'in-package.
; tell the reader that package 'foo has a symbol 'in-package
(= foo::in-package t)
; enter package foo
(in-package foo)
; now: does the reader parse this as (in-package ...) or (foo::in-package ...)
(in-package bar)
> Is this a SNAP-specific problem or does it affect Arc generally?
It's somewhat SNAP-specific, since we cannot have a stateful, shared reader, but I suspect that any Arc implementation that supports concurrency of any form will have similar problems with having readers keep state across invocations. The alternative would be having a monadic reader.
> Won't things work the same way as in CL?
Not sure: I never grokked anything except the basics of CL packages.
> #:foo (some expressions)
How about in a module file? It might get inconvenient to have to keep typing #:foo for each expression I want to invoke in the foo package, which means we really should think deeply about how in-package should be properly implemented.
> One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc.
If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package, i.e. foo::bar is always foo::bar, as long as :: exists in the symbol.
so you only have to type it once (although you would at the top level!) With this syntax, #:foo x could expand to something like (read-with-package "foo" x), so you wouldn't need a stateful read. Well, unless you called 'read within the file. So I guess you do. :)
> If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package
Are 'foo:quux and 'baz::quux the same package? If so, it's a bit strange that you can refer to the same thing by different symbols. That's why I think strings are better. Not sure what I think about nested packages. I'll have to ponder on that.
(def load (file (o hook))
  " Reads the expressions in `file' and evaluates them. Read expressions
    may be preprocessed by `hook'.
    See also [[require]]. "
  (push current-load-file* load-file-stack*)
  (= current-load-file* file)
  (or= hook idfn)
  (after
    (w/infile f file
      (whilet e (read f)
        (eval (hook e))))
    (do (= current-load-file* (pop load-file-stack*)) nil)))
What magic needs to be inserted here to make 'load use the correct 'read, keeping in mind that even plain Arc supports threads and those threads share global variables?
It still looks like a stateful 'read to me, and I don't want a stateful 'read at all, because a file might want to directly use 'read:
This is one good reason to try to keep 'read stupid: one of Arc's idioms is to simply dump data as s-expressions and read them in later as list structures. If 'read is too smart, this idiom might have some subtle gotchas.
For that matter I'd prefer to keep the package definitions in the file itself, rather than have to remember to put the file in a package:
$ cat mine.arc
(in-package mine)
(def mine ()
  (prn "this is my mine!!"))
- (def load (file (o hook))
+ (def load-in-package (package file (o hook))
- (whilet e (read f)
+ (whilet e (read-in-package package f)
That's the best I can do. I think that if packages are involved then read is inherently stateful, so even threads are a problem. I have no idea how CL implementations deal with threads and *package*, because the spec makes no account for it. :(
'eval-cxt objects are callable, and their call is equivalent to:
(let ob (eval-cxt)
  (ob x))
==>
(let ob (cxt)
  (eval:ob x))
The implementation is free to define 'cxt and/or 'eval-cxt objects in terms of Arc axioms or by adding them as implementation-specific axioms.
The context object accepts a plain read expression (with unpackaged symbols) and emits an s-expression where all symbols are packaged symbols.
It is the context object which keeps track of the current package, so you might have some accessor functions to manipulate the context object (e.g. destructure it into the current package, etc.).
The read function is stateless and simply emits unpackaged symbols, and emits packaged symbols if and only if the given plaintext specifically includes a package specification.
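For illustration, a stateless reader of this kind could look roughly like the following Python sketch (not Arc). `Sym`, `read_symbol`, and `read` are hypothetical names, and the parser handles only symbols and parentheses.

```python
from typing import NamedTuple, Optional

class Sym(NamedTuple):
    package: Optional[str]   # None means "unpackaged"
    name: str

def read_symbol(text):
    """A symbol is packaged only if the text itself contains '::'."""
    if "::" in text:
        package, name = text.split("::", 1)
        return Sym(package, name)
    return Sym(None, text)

def read(source):
    """Read one s-expression of symbols; no state survives between calls."""
    tokens = source.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1         # skip the closing paren
            return items
        return read_symbol(token)

    return parse()

expr = read("(foo bar::baz (quux))")
```

Because the reader never consults a "current package", two threads can call `read` at the same time without seeing each other's state, and dumped data reads back the same way wherever it is loaded.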
A package object is a stateful, synchronized (as in safely accessible across different threads, and whose basic operations are assuredly atomic) object. A context is a stateful object intended for thread- and function- local usage.
context objects
===============
A context object is callable (and has an entry in the axiom::call* table) and has the following form:
(let ob (cxt)
  (ob expression))
The return value of the context is either of the following:
1. If the expression is one of the following forms (the first symbol in each form is unpackaged, 'symbol here is a variable symbol):
(in-package symbol)
(interface symbol . symbols)
(using symbol)
(import symbol symbol)
...then the return value is axiom::t, and either the context's state is changed, or the state of a package (specifically the current package of the context) is changed.
2. For all other forms, it returns an equivalent expression, but containing only packaged symbols. The state of the context is not changed.
The forms in number 1 above have the following changes in the context or current package of the context:
(in-package symbol)
Changes the current package of the context to the package represented by the unpackaged symbol. The implementation is free to throw an error if the given symbol is packaged.
(interface symbol . symbols)
Defines an interface. All symbols are first applied to the current package to translate them into packaged symbols, if they are unpackaged (this translation by itself may change the package's state, and also a packaged symbol will simply be passed as-is by the package object; see section "package objects" below). It then modifies the package of the first symbol to have an interface whose symbols are the given symbols.
If the interface already exists, the given list of symbols is compared with the interface's existing list. If the two are not the same, the implementation is free to throw an error.
(using symbol)
The given symbol must be a packaged symbol. It must name an interface of its package; if the interface does not exist on the package, the implementation must throw an error. For each symbol in the interface, this changes the current package's state, creating or modifying the mapping from the unpackaged symbol of the same name to the symbol in the interface.
For conflicting package interfaces: let us suppose that the context is in package 'User, and there exist two package interfaces, A::v1 and B::v1. A::v1 is composed of (A::foo A::bar) while B::v1 is composed of (B::bar B::quux). If the context receives (using A::v1), the User package contains the mapping {foo => A::foo, bar => A::bar}. Then if the context receives (using B::v1), the User package afterwards contains the mapping {foo => A::foo, bar => B::bar, quux => B::quux}.
(import symbol symbol)
Forces the current package to have a specific mapping. The first symbol must be a packaged symbol and the second symbol must be unpackaged. The implementation must throw an error if this invariant is violated.
Continuing the example above, after (import A::bar A-bar), this changes the package to {foo => A::foo, bar => B::bar, A-bar => A::bar, quux => B::quux}
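The context rules above can be modelled in a few lines of Python (not Arc). This is only a sketch under simplifying assumptions: symbols are plain strings, a symbol is "packaged" if it contains "::", expressions are nested lists, True stands in for axiom::t, and the `Package` and `Context` classes are hypothetical. It replays the A::v1 / B::v1 example.

```python
class Package:
    def __init__(self, name):
        self.name = name
        self.mapping = {}      # unpackaged name -> packaged symbol
        self.interfaces = {}   # packaged interface name -> packaged symbols

    def __call__(self, sym):
        if "::" in sym:
            return sym         # already packaged: passed through as-is
        return self.mapping.setdefault(sym, f"{self.name}::{sym}")

class Context:
    def __init__(self, packages):
        self.packages = packages              # shared table: name -> Package
        self.current = self._package("User")

    def _package(self, name):
        return self.packages.setdefault(name, Package(name))

    def _translate(self, expr):
        if isinstance(expr, list):
            return [self._translate(e) for e in expr]
        return self.current(expr) if isinstance(expr, str) else expr

    def __call__(self, expr):
        head = expr[0] if isinstance(expr, list) and expr else None
        if head == "in-package":
            self.current = self._package(expr[1])
            return True
        if head == "interface":
            name, *symbols = [self.current(s) for s in expr[1:]]
            self._package(name.split("::", 1)[0]).interfaces[name] = symbols
            return True
        if head == "using":
            owner = self._package(expr[1].split("::", 1)[0])
            if expr[1] not in owner.interfaces:
                raise LookupError(f"no such interface: {expr[1]}")
            for packaged in owner.interfaces[expr[1]]:
                self.current.mapping[packaged.split("::", 1)[1]] = packaged
            return True
        if head == "import":
            packaged, local = expr[1], expr[2]
            if "::" not in packaged or "::" in local:
                raise ValueError("import needs a packaged, then an unpackaged symbol")
            self.current.mapping[local] = packaged
            return True
        return self._translate(expr)          # ordinary form: context unchanged

ctx = Context({})
ctx(["in-package", "A"]); ctx(["interface", "v1", "foo", "bar"])
ctx(["in-package", "B"]); ctx(["interface", "v1", "bar", "quux"])
ctx(["in-package", "User"])
ctx(["using", "A::v1"])
after_a = dict(ctx.current.mapping)
ctx(["using", "B::v1"])
after_b = dict(ctx.current.mapping)
ctx(["import", "A::bar", "A-bar"])
translated = ctx(["foo", ["bar", "A-bar"], "zap"])
```

Note that in this sketch the later (using B::v1) silently overrides bar, matching the example; an implementation could just as well warn about the conflict.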
package objects
===============
A package object is callable and has the following form:
(ob expression)
expression must evaluate to a symbol, and if a non-symbol is applied to a package object, the implementation is free to throw an error. The application otherwise evaluates to either:
1. The same symbol, if the given symbol is a packaged symbol; this does not change the state of the package
2. A packaged symbol, if the given symbol is an unpackaged symbol. If the package does not contain a mapping for the unpackaged symbol, the state of the package is changed so that a mapping for the unpackaged symbol to a packaged symbol exists.
The package object also supports an 'sref operation:
(sref ob v k)
k is an unpackaged symbol while v is a packaged symbol; the implementation is free to throw an error if this invariant is violated.
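A package object along these lines might be sketched in Python (not Arc) as follows. The lock reflects the requirement that package objects be safely shared across threads; the `Package` class, its `sref` method, and the string representation of symbols are all assumptions of the sketch.

```python
import threading

class Package:
    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()   # packages are shared between threads
        self._mapping = {}              # unpackaged name -> packaged symbol

    def __call__(self, sym):
        """Model of (ob sym): always returns a packaged symbol."""
        if not isinstance(sym, str):
            raise TypeError("a package can only be applied to a symbol")
        if "::" in sym:
            return sym                  # case 1: already packaged, no state change
        with self._lock:
            # case 2: create the mapping on first use, then reuse it
            return self._mapping.setdefault(sym, f"{self.name}::{sym}")

    def sref(self, value, key):
        """Model of (sref ob v k): map unpackaged k to packaged v."""
        if "::" in key or "::" not in value:
            raise ValueError("sref needs a packaged value and an unpackaged key")
        with self._lock:
            self._mapping[key] = value

user = Package("User")
first = user("foo")        # creates the mapping foo => User::foo
passed = user("A::bar")    # packaged symbols pass straight through
user.sref("A::bar", "bar")
remapped = user("bar")     # now resolves through the forced mapping
```

Doing the lookup and the insertion under one lock acquisition is what keeps "intern on first use" atomic; two threads asking for the same new symbol get the same packaged result.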
Packages are handled by interface.
Further, we also predefine two packages, axiom and arc.
The axiom package is implicitly imported into all packages. It presents no interface.
The arc package contains all "standard" Arc functions and macros. The arc package is not implicitly imported into all packages.
The arc package contains the interface arc::v3 . This interface is the set of symbols currently defined on Anarki. Future extensions to the arc standard library must be first placed in the interface arc::v3-exp until they are placed into a future arc::v4 interface, and so on.
load
====
The load implementation is thus:
(def load (file (o hook))
  " Reads the expressions in `file' and evaluates them. Read expressions
    may be preprocessed by `hook'.
    See also [[require]]. "
  (push current-load-file* load-file-stack*)
  (= current-load-file* file)
  (or= hook idfn)
  (after
    (w/infile f file
      (let evaller (eval-cxt)
        (evaller '(in-package User))
        (whilet e (read f)
          (evaller (hook e)))))
    (do (= current-load-file* (pop load-file-stack*)) nil)))
As for CL packages, I've decided I don't really like the way they work. If a file is compiled, then (in-package foo) is only guaranteed to work if it appears at the top level. So...
(if (eq x 10) (in-package foo) (in-package bar))
works in an interpreted file, but the behaviour is undefined if the file is compiled. CLISP handles both cases fine.
Also in CL, the value of *package* doesn't always correspond to the actual current package. For example:
(setf x "I'm in the default package!")
(setf foo::x "I'm in the FOO package!")
(setf *package* foo)
(print *package*)
(print x)
does this when interpreted
#<PACKAGE FOO>
"I'm in the FOO package!"
but this when compiled
#<PACKAGE FOO>
"I'm in the default package!"
Either the package should be determined at eval-time (as was your suggestion) or the user should be forced to use read macros like #: and #.(in-package ...) to switch packages at read time. The CL solution is an ad-hoc compromise between the two.
Forcing the user to keep using read macros doesn't feel quite right. Personally I'm more for using 'eval-cxt objects, which would do the assignment from plain symbols to qualified symbols, and keep track of the current package.
Of course, using 'eval-cxt raises questions about static whole-program compilation, I think. Hmm. I haven't thought deeply about that yet.