Arc Forum | almkglor's comments
2 points by almkglor 6283 days ago | link | parent | on: Norvig's spelling corrector in Arc?

One potential problem would be splitting a string into words. A minor related problem is deciding what counts as a "word", i.e. where the divisions between words fall.

Otherwise looks like a pretty standard Bayesian analysis, which I believe pg has done already.

-----

1 point by fallintothis 6281 days ago | link

Not really. It's a simple regexp that Norvig uses.

Python:

  def words(text): return re.findall('[a-z]+', text.lower())
Arc:

  (def words (text) (tokens (downcase text) [~<= #\a _ #\z]))

-----


> This would be possible if there were easy ways to reprogram the Arc syntax in a library.

Again, like I said, this is probably implementable using readermacros, but fooling around with the reader is always troublesome.

Consider some random programmer who uses /ca/ as a variable name in his or her programs for some inexplicable reason. Whether this is considered a regular expression or a valid symbol will then depend on whether or not it is loaded before or after the regular expression library.

If you want nice regular expression syntax, then it must be standardized as part of the language syntax so that everyone knows they should avoid using such variable names. Alternatively, give some method for specifying a reader for each module file. No, this is an exploratory language, and someone will try using /ca/ as a variable name unless you specifically ban it. I promise you that.

This is only partially implementable using ssyntax, but again this may be considered as "fooling around with the reader".

If strings as regular expressions work for you, then it's okay, since strings are already standardized in the syntax:

  arc> ((rex "s/b(\\w)/d\\\\1/g") "bobcat")
  "dodcat"

-----

4 points by cchooper 6289 days ago | link

Strings would work ok if Arc had a means of representing unprocessed strings like Perl or C#.

  arc> ((rex @"s/b(\w)/d\\1/g") "bobcat")
Has lots of other uses too, so I think this would be a good feature regardless.
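
For comparison, Python's raw string literals behave much like the proposed @"..." syntax. The sketch below uses Python's re module rather than the Arc rex library, and the helper name sub_b_to_d is made up for illustration; it only shows that the escaped and the raw spelling denote the same pattern and replacement.

```python
import re

def sub_b_to_d(text):
    """Replace each 'b' that is followed by a word character with 'd'."""
    # Ordinary string literal: every backslash has to be doubled.
    escaped = re.sub("b(\\w)", "d\\1", text)
    # Raw string literal: backslashes reach the regex engine untouched.
    raw = re.sub(r"b(\w)", r"d\1", text)
    # Both spellings describe the same pattern and the same replacement.
    assert escaped == raw
    return raw

print(sub_b_to_d("bobcat"))  # prints dodcat
```

The raw form is the one that stays readable once a pattern contains more than a couple of escapes.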

-----

1 point by almkglor 6288 days ago | link

True, another good place would be docstrings.

Now all we need is to (re)build a reader for Arc. ^^

-----

2 points by almkglor 6290 days ago | link | parent | on: Why I think Arc should use packages

> My preferred solution, FYI, would be some kind of crazy thread-specific namespacing.

The devil is in the details.

-----


a plain expression like /ca/ seems rather straightforward to me.

This requires some kind of special version of the reader.

One thing I suggest would be to use some sort of # readermacro (which you'll have to hack in the underlying Scheme):

  #/ca/
  #/s/foo/bar/
If you want nice syntax for regexps, almost definitely it will have to be part of the axioms, or at least readermacros (which I personally don't like). Otherwise if representing them via strings is acceptable, then we don't need it as part of the axioms.

-----

2 points by bOR_ 6289 days ago | link

I am not sure what the consequences are for possible implementations, but preferably, I would want to be able to use regexps in (find or (keep, or (findsubseq just as easily as I would now use strings or functions.

  (keep odd (list 1 2 3 4))
  (keep (reg /nan/) (list "banana" "bonobo" "bandanga"))
  (findsubseq (reg /\d+:\d+/) "The current time is 10:00 am")
When looking at http://arcfn.com/doc/string.html, there are an awful lot of restrictions on when we can use variables or functions as arguments, so maybe a lot of the string operations in there become obsolete if there is a regexp engine in Arc.
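
To make the wish concrete, here is a rough Python sketch of sequence functions that accept either a predicate or a compiled regexp. The names keep and findsubseq are borrowed from the Arc calls above, but the implementations are assumptions written for illustration, not the Arc definitions.

```python
import re

def keep(test, items):
    """Return the items accepted by test (a predicate or a compiled regexp)."""
    if hasattr(test, "search"):
        # A compiled regexp is turned into a predicate on strings.
        pattern = test
        test = lambda item: pattern.search(item) is not None
    return [item for item in items if test(item)]

def findsubseq(pattern, text):
    """Return the first substring of text matching pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None

print(keep(lambda n: n % 2 == 1, [1, 2, 3, 4]))
print(keep(re.compile(r"nan"), ["banana", "bonobo", "bandanga"]))
print(findsubseq(re.compile(r"\d+:\d+"), "The current time is 10:00 am"))
```

The point is that the regexp is just another kind of test, so the caller does not need a separate family of string functions.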

-----


Re: "chunk of code should be executed atomically" - I'm not 100% sure but I think it's possible that (from the way 'atom is implemented) a sequence that is not protected within an 'atom scope will be able to see an "atomic" operation in progress.

Basically 'atom just acquires a Global Interpreter Lock around the function it executes. I'm not sure, but from the semantics of locking, I think that another thread may see the "atomic" operation step-by-step unless it is itself protected by 'atom (and then there is potential contention for a single global lock!)

Of course, mzscheme uses green threads, so maybe it won't work that way.
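
The concern can be illustrated in Python, with an ordinary threading.Lock standing in for the single lock behind 'atom. This is only a sketch of the locking semantics described above, not of how mzscheme schedules its threads; the two events exist purely to force the interleaving so that the result is repeatable.

```python
import threading

def run_demo():
    global_lock = threading.Lock()   # stands in for the lock behind 'atom
    state = {"a": 0, "b": 0}
    half_done = threading.Event()    # set when the update is halfway through
    reader_done = threading.Event()  # set when the unprotected read is finished
    observed = []

    def atomic_update():
        # The "atomic" operation: both fields are meant to change together.
        with global_lock:
            state["a"] = 1
            half_done.set()
            reader_done.wait(timeout=5)
            state["b"] = 1

    def unprotected_reader():
        # This reader never takes the lock, so nothing stops it from
        # looking at the state in the middle of the update.
        half_done.wait(timeout=5)
        observed.append((state["a"], state["b"]))
        reader_done.set()

    threads = [threading.Thread(target=atomic_update),
               threading.Thread(target=unprotected_reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A reader that does take the lock can only see the finished state.
    with global_lock:
        final = (state["a"], state["b"])
    return observed, final

observed, final = run_demo()
print(observed, final)
```

The unprotected reader sees the half-finished pair, while the protected read sees the completed one, which is the price of a lock that only excludes other holders of the same lock.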

-----

2 points by almkglor 6292 days ago | link | parent | on: Why I think Arc should use packages

Then 'self will have to be part of the arc::v3 interface.

There's a reason why there's an interface abstraction in my proposal.

-----

2 points by almkglor 6294 days ago | link | parent | on: Show and Tell: elliottslaughter.net

Arc doesn't have any.

If it's always N decimal places of a reasonable number, you can do magic stuff like:

  (def to-N-places (f (o N 3))
    (with (fact (let rv 1
                  (repeat N (zap * rv 10))
                  rv)
           float [+ _ 0.0])
      (pr:/ (float:floor:* f fact) (float fact))))
It won't pad though.

-----

3 points by eds 6293 days ago | link

Your version doesn't round.

  arc> (do (to-N-places 5.123456789 5) (prn))
  5.12345
  nil
But the following function will.

  (def to-N-places (f (o N 3))
    (let s (string (to-nearest f (expt 10.0 (- N))))
      (cut s 0 (min (+ N 1 (pos #\. s)) (len s)))))

  arc> (to-N-places 5.123456789 5)
  "5.12346"
(That said, there may be other problems with it.)
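
The difference between the two versions is truncation versus rounding. A small Python sketch of both, with hypothetical function names and Python's own format doing the rounding (it is not a port of the Arc code):

```python
import math

def truncate_to_places(x, places=3):
    """Drop digits beyond `places` decimals, like the floor-based version."""
    factor = 10 ** places
    return math.floor(x * factor) / factor

def round_to_places(x, places=3):
    """Round to `places` decimals and return a padded string."""
    return format(x, f".{places}f")

print(truncate_to_places(5.123456789, 5))  # 5.12345
print(round_to_places(5.123456789, 5))     # 5.12346
print(round_to_places(2.5, 3))             # 2.500
```

Returning a string is what makes padding possible; a float has no way to remember trailing zeros.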

Personally, I think we really need CL-style format.

-----

1 point by eds 6290 days ago | link

After a bit more searching, I finally found a page on printing numbers in the Scheme Cookbook (http://schemecookbook.org/Cookbook/NumberPrinting), specifically the first example, which uses SRFI 48 for some basic formatting support (http://srfi.schemers.org/srfi-48/srfi-48.html). The following works in Anarki/MzScheme 352.

  arc> ($ (require (lib "48.ss" "srfi")))
  #<void>
  arc> (def format args ($ (format ,@args)))
  #<procedure: format>
  arc> (format "~4,4F" (sqrt 2))
  "1.4142"
Enjoy ;-)

-----

2 points by almkglor 6294 days ago | link | parent | on: Show and Tell: elliottslaughter.net

> Although I have managed to figure out how to enable forwarding port 80, I can't find anything on stopping access to port 8080 (so you can still access that port from the internet...). If someone could tell me how to lock that down, I would appreciate it. Thanks in advance.

http://arclanguage.org/item?id=2697

-----

1 point by eds 6294 days ago | link

Thanks for the reminder, I had forgotten about that.

I'm still not sure that's a complete solution though... you don't really prevent access to the port, you just send an access denied message instead of serving the request. (But I don't know that much about web security, so maybe that really is sufficient.)

-----

1 point by almkglor 6294 days ago | link

Not sure either. It depends on whether the Arc Server is secure/{not dumb} enough such that it won't be fooled by someone pretending to be from 127.0.0.1, for example.

-----

1 point by eds 6293 days ago | link

Couldn't you just make Apache or Linux firewall port 8080 so all attempts to access it from outside are blocked? (That said, I wouldn't know how to do that off the top of my head.)

-----

2 points by gnaritas 6292 days ago | link

Yes, and that's the right approach. See http://articles.slicehost.com/2008/4/25/ubuntu-hardy-setup-p... and scroll down to iptables to see how to set up a firewall on Linux.

-----

1 point by eds 6286 days ago | link

Thanks! That was a really useful article, and not only for setting up firewalls.

-----

2 points by almkglor 6296 days ago | link | parent | on: Struquine in Lisp (and Arc)

> Well, this approach basically is the 'annotate approach, except instead of having a separate basic data type - a "tagged" object - you just use cons cells, where the car is the type and the cdr the rep.

Except for the case where (isa R T), where (annotate T R) returns R directly:

  arc> (annotate 'int 0)
  0

-----

1 point by rntz 6294 days ago | link

This isn't a fundamental difference. Just as 'annotate could as easily create a new doubly-wrapped object, 'annotate-cons can as easily not cons when the type is the same.

    (def annotate-cons (typ val)
      (if (isa val typ) val (cons typ val)))

-----

4 points by almkglor 6297 days ago | link | parent | on: Why I think Arc should use packages

I've been bashing my head in this somewhat, in SNAP. The problem is always with the reader function: we cannot use a single reader function, since two different processes could be using the reader and expecting it to be in two different packages. Obviously we would have to create separate monadic readers for each process.

The other problem then becomes: how do we implement, say, (import ...) or (in-package ...) ? Should the reader silently filter them out? If not, how does 'eval notify the reader that the package is changed? If someone defines foo::in-package and bar::in-package, which one does the reader return, and how will 'eval know if the package should be changed?

Yet another problem would be intrasymbol syntax: should the reader leave foo::bar!nitz, or should it process it into (foo::bar 'foo::nitz) or even (foo::bar 'quux::nitz) if (import quux::nitz foo::nitz) is used?

Yet another problem is the "standard" package, i.e. CL-USER in CL. Obviously all symbols from this package are imported into all packages... or should they be? This is a problem which PG ignored in ArcN: backwards compatibility. If I write code today which uses the symbol "convoke" as, say, a table in my function and tomorrow PG decides to create a macro "convoke" which does something else, kaboom! my code dies.

In my opinion we should separate packages into several interfaces.

Basically, suppose I release a package, AlmkglorSuperSupremePanPizza, and I define a "version 1" of this interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)
Then someone else can use:

  (import AlmkglorSuperSupremePanPizza::v1)
This imports the 'eat, 'drink, and 'be-merry symbols from AlmkglorSuperSupremePanPizza. Importantly, once I publish the interface, I cannot change it. If I realize that my interface is lacking, I will have to define a new interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)
  (interface v2
    ;include v1 interface elements
    v1 for-tomorrow-we-die)

-----

4 points by almkglor 6297 days ago | link

continued:

Anyway, I've been thinking rather deeply about packages and module systems and the like. One major reason for using a symbol-based package/module system is types: for example, if packageA defines a 'gaz type, and packageB defines a 'gaz type also, obviously the types are incompatible and they shouldn't be the same.

Also, for a potential solution from within the Scheme implementation, consider instead moving the problem from the reader to the evaller.

Instead of having symbol packages be handled by the reader, we might instead define a new builtin type, 'eval-cxt. An 'eval-cxt object is simply an evaluator, but understands packages and the (in-package ...) and (import ...) etc. syntaxes.

The reader simply reads in symbols blindly, without caring about the exact package they should go to. This simplifies the reader and allows us to continue using the reader and writer for saving plain Arc data. Instead, any particular context for evaluation is put into the 'eval-cxt object. 'eval-cxt will understand 'import etc. forms, and will perform the translation of all symbols into their qualified, package-based counterparts, i.e.:

  (= evaller (eval-cxt))
  (= tmp (read))
  user input> '(hello world)
  => (quote (hello world))

  (evaller tmp)
  => (arc-user::hello arc-user::world)
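
A rough Python model of that translation step, assuming symbols are represented as plain strings, lists as Python lists, and "::" as the package separator. The function name qualify is invented for this sketch; it ignores special forms such as in-package and does no evaluation at all.

```python
def qualify(expr, package):
    """Return expr with every unpackaged symbol placed in `package`."""
    if isinstance(expr, list):
        # Walk nested list structure recursively.
        return [qualify(item, package) for item in expr]
    if isinstance(expr, str) and "::" not in expr:
        # Unpackaged symbol: attach the current package.
        return f"{package}::{expr}"
    # Already-packaged symbols and non-symbol data pass through unchanged.
    return expr

print(qualify(["hello", "world"], "arc-user"))
print(qualify(["foo::bar", ["baz", 42]], "sample"))
```

Because the reader's output is left untouched until this step, the same reader can still be used for plain data files.
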
The Arc REPL would then be something like:

  (def tl ()
    (let my-eval (eval-cxt)
      ((afn ()
         (pr "arc> ")
         (write:my-eval:read)
         (self)))))
This simplifies 'read, EXCEPT: intrasymbol syntax must, must be absolutely expanded by the reader. Why? Because otherwise macros whose expansions use intrasymbol syntax won't work properly:

  (in-package sample)
  (= private-table
     (table 'foo 42
            'bar 99))
  (mac foo ()
    `(private-table!foo))
The problem is the 'foo symbol above: the macroexpansion must express both private-table as sample::private-table and 'foo as sample::foo.

Incidentally, the 'eval function can still be expressed rather simply by this manner:

  (def eval (e)
    ((eval-cxt) e))
Alternatively we can give an optional package name:

  (w/uniq no-param
    (def eval (e (o p no-param))
      (let evaller (eval-cxt)
        (unless (is p no-param)
          (evaller `(in-package ,p)))
        (evaller e))))
This keeps maximum backward compatibility with ArcN: a simple reader and a simple eval function.

-----

3 points by cchooper 6296 days ago | link

Hmm... interesting stuff.

I think I'll keep your interfaces idea. It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

I don't really understand the problem with 'in-package. Is this a SNAP-specific problem or does it affect Arc generally? Won't things work the same way as in CL?

Some other thoughts I had about this which may be useful: modules are usually kept in a file, so a 'load-in-package function which takes a package argument might be useful.

Also, I thought it would be good to have a read-macro to switch packages. I'll reuse #: as it's not needed in my system:

  #:foo (some expressions)
This would read the expressions in package 'foo before executing them. That might solve your problem as the package is passed to the reader explicitly.

I like the idea of symbols being read in without a package, but then getting a package at eval time. One way to implement this may be to store all the symbols in a special package when they are read, then eval can move these to a new package when it evaluates them. This makes packages very dynamic.

One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc. That's just a bit crazy, so I think strings should be used to name packages instead. Alternatively, package names should also be interned in a special package that's treated differently. Seeing as packages are just mappings from strings to symbols, that doesn't really make much difference.

-----

3 points by almkglor 6296 days ago | link

> It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

Which is the point, of course ^^

The other point is disciplining package makers to make package interfaces constant even as newer versions of the package are made. This helps preserve backward compatibility. In fact, if the ac.scm and arc.arc functions are kept in their own package, we can even allow effective backward compatibility of much of Arc by separating them by version, i.e.

  (using arc v3)
  (using arc v4)
  (using arc v5)
> I don't really understand the problem with 'in-package.

  ; tell the reader that package 'foo has a symbol 'in-package
  (= foo::in-package t)
  ; enter package foo
  (in-package foo)
  ; now: does the reader parse this as (in-package ...) or (foo::in-package ...)
  (in-package bar)
> Is this a SNAP-specific problem or does it affect Arc generally?

It's somewhat SNAP-specific, since we cannot have a stateful, shared reader, but I suspect that any Arc implementation that supports concurrency of any form will have similar problems with having readers keep state across invocations. The alternative would be having a monadic reader.

> Won't things work the same way as in CL?

Not sure: I never grokked anything except the basics of CL packages.

> #:foo (some expressions)

How about in a module file? It might get inconvenient to have to keep typing #:foo for each expression I want to invoke in the foo package, which means we really should think deeply about how in-package should be properly implemented.

> One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc.

If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package, i.e. foo::bar is always foo::bar, as long as :: exists in the symbol.

Of course, hierarchical packages are nice too ^^

-----

1 point by cchooper 6294 days ago | link

> #:foo (some expressions)

I'm assuming you can use it like this:

  #:foo 
  ((def (x) (+ 1 x))
   (def (y) (expt y))
   (= bar 123))
or like this:

  #:foo ((load "filename"))
so you only have to type it once (although you would at the top level!) With this syntax, #:foo x could expand to something like (read-with-package "foo" x), so you wouldn't need a stateful read. Well, unless you called 'read within the file. So I guess you do. :)

> If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package

True, but it will be a bit confusing:

  (import 'foo::bar 'foo::quux)
  (import 'foo::bar 'baz::quux)
Are 'foo::quux and 'baz::quux the same package? If so, it's a bit strange that you can refer to the same thing by different symbols. That's why I think strings are better. Not sure what I think about nested packages. I'll have to ponder on that.

-----

1 point by almkglor 6294 days ago | link

load is currently defined as:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (whilet e (read f)
          (eval (hook e))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))
What magic needs to be inserted here to make 'load use the correct 'read, keeping in mind that even plain Arc supports threads and those threads share global variables?

It still looks like a stateful 'read to me, and I don't want a stateful 'read at all, because a file might want to directly use 'read:

  $ cat getconfig.arc
  (= configuration (read "my.cfg"))
This is one good reason to try to keep 'read stupid: one of Arc's idioms is to simply dump data as s-expressions and read them in later as list structures. If 'read is too smart, this idiom might have some subtle gotchas.

For that matter I'd prefer to keep the package definitions in the file itself, rather than have to remember to put the file in a package:

  $ cat mine.arc
  (in-package mine)

  (def mine ()
    (prn "this is my mine!!"))
> (import 'foo::bar 'foo::quux)

Okay, I have to ask: what does 'import mean?

-----

1 point by cchooper 6293 days ago | link

I would define 'load-in-package as

  - (def load (file (o hook))
  + (def load-in-package (package file (o hook))
  - (whilet e (read f)
  + (whilet e (read-in-package package f)
That's the best I can do. I think that if packages are involved then read is inherently stateful, so even threads are a problem. I have no idea how CL implementations deal with threads and *package*, because the spec makes no account for it. :(

> (import 'foo::bar 'foo::quux)

Oops, that should be

  (import 'foo::quux 'baz::quux)

-----

1 point by almkglor 6293 days ago | link

Here's my proposal:

We move all state information into a new object type called a "context". It can be constructed without parameters via the 'cxt function:

  (cxt)
  => <implementation-specific>
  (type (cxt))
  => arc::cxt
The REPL becomes an RCEPL, a read-contexter-eval-print loop. For convenience, we also provide an 'eval-cxt object:

  (eval-cxt)
  => <implementation-specific>
  (type (eval-cxt))
  => arc::eval-cxt
'eval-cxt objects are callable, and their call is equivalent to:

  (let ob (eval-cxt)
    (ob x))
  ==>
  (let ob (cxt)
    (eval:ob x))
The implementation is free to define 'cxt and/or 'eval-cxt objects in terms of Arc axioms or by adding them as implementation-specific axioms.

The context object accepts a plain read expression (with unpackaged symbols) and emits an s-expression where all symbols are packaged symbols.

It is the context object which keeps track of the current package, so you might have some accessor functions to manipulate the context object (e.g. destructure it into the current package, etc.).

The read function is stateless and simply emits unpackaged symbols, and emits packaged symbols if and only if the given plaintext specifically includes a package specification.

A package object is a stateful, synchronized (as in safely accessible across different threads, and whose basic operations are assuredly atomic) object. A context is a stateful object intended for thread- and function- local usage.

context objects
===============

A context object is callable (and has an entry in the axiom::call* table) and has the following form:

  (let ob (cxt)
    (ob expression))
The return value of the context is either of the following:

1. If the expression is one of the following forms (the first symbol in each form is unpackaged, 'symbol here is a variable symbol):

  (in-package symbol)
  (interface symbol . symbols)
  (using symbol)
  (import symbol symbol)
...then the return value is axiom::t, and either the context's state is changed, or the state of a package (specifically the current package of the context) is changed.

2. For all other forms, it returns an equivalent expression, but containing only packaged symbols. The state of the context is not changed.

The forms in number 1 above have the following changes in the context or current package of the context:

  (in-package symbol)
Changes the current package of the context to the package represented by the unpackaged symbol. The implementation is free to throw an error if the given symbol is packaged.

  (interface symbol . symbols)
Defines an interface. All symbols are first applied to the current package to translate them into packaged symbols, if they are unpackaged (this translation by itself may change the package's state, and also a packaged symbol will simply be passed as-is by the package object; see section "package objects" below). It then modifies the package of the first symbol to have an interface whose symbols are the given symbols.

If the interface already exists, the given list of symbols is compared with the existing list. If they are not the same, the implementation is free to throw an error.

  (using symbol)
The given symbol must be a packaged symbol. It must name an interface of its package; if the interface does not exist on the package, the implementation must throw an error. For each symbol in the interface, this changes the current package's state, creating or modifying the mapping from the unpackaged symbol of the same name to the symbol in the interface.

For conflicting package interfaces: let us suppose that the context is in package 'User, and there exist two package interfaces, A::v1 and B::v1. A::v1 is composed of (A::foo A::bar) while B::v1 is composed of (B::bar B::quux). If the context receives (using A::v1), the User package contains the mapping {foo => A::foo, bar => A::bar}. Then if the context receives (using B::v1), the User package afterwards contains the mapping {foo => A::foo, bar => B::bar, quux => B::quux}.

  (import symbol symbol)
Forces the current package to have a specific mapping. The first symbol must be a packaged symbol and the second symbol must be unpackaged. The implementation must throw an error if this invariant is violated.

Continuing the example above, after (import A::bar A-bar), this changes the package to {foo => A::foo, bar => B::bar, A-bar => A::bar, quux => B::quux}
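
The running example can be traced with a small Python model. Everything here is an assumption made for illustration: a package's state is modelled as a dict from unpackaged names to packaged symbols, interfaces live in a plain registry dict, and the function names using and import_symbol merely echo the forms above.

```python
# Registry of published interfaces: interface name -> packaged symbols.
INTERFACES = {
    "A::v1": ["A::foo", "A::bar"],
    "B::v1": ["B::bar", "B::quux"],
}

def using(mapping, interface):
    """Map each symbol of the interface under its unpackaged name."""
    if interface not in INTERFACES:
        # The proposal requires an error for a missing interface.
        raise KeyError(f"no such interface: {interface}")
    for symbol in INTERFACES[interface]:
        name = symbol.split("::", 1)[1]
        mapping[name] = symbol   # later interfaces override earlier ones

def import_symbol(mapping, packaged, unpackaged):
    """Force a specific mapping from an unpackaged name to a packaged symbol."""
    if "::" not in packaged or "::" in unpackaged:
        raise ValueError("expected (import packaged-symbol unpackaged-symbol)")
    mapping[unpackaged] = packaged

user = {}
using(user, "A::v1")
using(user, "B::v1")
import_symbol(user, "A::bar", "A-bar")
print(user)
```

After the three calls the dict holds foo => A::foo, bar => B::bar, quux => B::quux and A-bar => A::bar, matching the mapping given in the text.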

package objects
===============

A package object is callable and has the following form:

  (ob expression)
expression must evaluate to a symbol, and if a non-symbol is applied to a package object, the implementation is free to throw an error. The application otherwise evaluates to either:

1. The same symbol, if the given symbol is a packaged symbol; this does not change the state of the package

2. A packaged symbol, if the given symbol is an unpackaged symbol. If the package does not contain a mapping for the unpackaged symbol, the state of the package is changed so that a mapping for the unpackaged symbol to a packaged symbol exists.

The package object also supports an 'sref operation:

  (sref ob v k)
k is an unpackaged symbol while v is a packaged symbol; the implementation is free to throw an error if this invariant is violated.
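
As a sketch of those rules, here is a callable package object in Python. Symbols are modelled as strings with "::" as the separator, the class and method names are hypothetical, and a threading.Lock is one possible way to get the synchronized behaviour asked for earlier, not a requirement of the proposal.

```python
import threading

class Package:
    """Callable package object: maps unpackaged symbols to packaged ones."""

    def __init__(self, name):
        self.name = name
        self._mapping = {}              # unpackaged name -> packaged symbol
        self._lock = threading.Lock()   # keeps basic operations atomic

    def __call__(self, symbol):
        if not isinstance(symbol, str):
            raise TypeError("package objects only accept symbols")
        if "::" in symbol:
            # Rule 1: a packaged symbol is returned as-is, state unchanged.
            return symbol
        with self._lock:
            # Rule 2: look up the mapping, creating it if it is missing.
            return self._mapping.setdefault(symbol, f"{self.name}::{symbol}")

    def sref(self, value, key):
        """Install the mapping key (unpackaged) => value (packaged)."""
        if not (isinstance(value, str) and isinstance(key, str)):
            raise TypeError("package objects only accept symbols")
        if "::" not in value or "::" in key:
            raise ValueError("expected a packaged value and an unpackaged key")
        with self._lock:
            self._mapping[key] = value

user = Package("User")
print(user("A::foo"))   # already packaged, returned unchanged
print(user("bar"))      # mapping created on first use
user.sref("A::bar", "bar")
print(user("bar"))      # now resolves through the installed mapping
```

The argument order of sref follows the (sref ob v k) form above: value first, then key.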

Packages are handled by interface.

Further, we also predefine two packages, axiom and arc.

The axiom package contains the following symbols:

  axiom::t
  axiom::nil
  axiom::fn
  axiom::if
  axiom::quote
  axiom::quasiquote
  axiom::unquote
  axiom::unquote-splicing
  axiom::set
  axiom::call*
The axiom package is implicitly imported into all packages. It presents no interface.

The arc package contains all "standard" Arc functions and macros. The arc package is not implicitly imported into all packages.

The arc package contains the interface arc::v3. This interface is the set of symbols currently defined on Anarki. Future extensions to the arc standard library must be first placed in the interface arc::v3-exp until they are placed into a future arc::v4 interface, and so on.

load
====

The load implementation is thus:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (let evaller (eval-cxt)
          (evaller '(in-package User))
          (whilet e (read f)
            (evaller (hook e)))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))

-----

1 point by cchooper 6294 days ago | link

As for CL packages, I've decided I don't really like the way they work. If a file is compiled, then (in-package foo) is only guaranteed to work if it appears at the top level. So...

  (if (eq x 10) (in-package foo) (in-package bar))
works in an interpreted file, but the behaviour is undefined if the file is compiled. CLISP handles both cases fine.

Also in CL, the value of *package* doesn't always correspond to the actual current package. For example

  (setf x "I'm in the default package!")
  (setf foo::x "I'm in the FOO package!")
  (setf *package* foo)
  (print *package*)
  (print x)
does this when interpreted

  #<PACKAGE FOO>
  "I'm in the FOO package!"
but this when compiled

  #<PACKAGE FOO>
  "I'm in the default package!"
Either the package should be determined at eval-time (as was your suggestion) or the user should be forced to use read macros like #: and #.(in-package ...) to switch packages at read time. The CL solution is an ad-hoc compromise between the two.

-----

1 point by almkglor 6294 days ago | link

Forcing the user to keep using read macros doesn't feel quite right. Personally I'm more for using 'eval-cxt objects, which would do the assignment from plain symbols to qualified symbols, and keep track of the current package.

Of course, using 'eval-cxt raises questions about static whole-program compilation, I think. Hmm. I haven't thought deeply about that yet.

-----
