Arc Forum
Module idea for PyArc
4 points by Pauan 5137 days ago | 40 comments
This describes my plan for modules in PyArc. It's already implemented, and appears to work correctly, but PyArc itself is not finished, and cannot parse/eval all of arc.arc yet. Henceforth, I shall refer to Arc 3.1 in MzScheme as "MzArc" and my Arc interpreter written in Python as "PyArc".

I had some issues with other module systems I saw. One was complexity. There's nothing wrong with a complex module system per se, but I would rather keep Arc lightweight and simple if I can. I would rather have a simple module system at the core, and then build a more complex system on top of that.

For me, the primary concern was namespace collisions. I wanted to be able to import a library somebody else made without worrying about it breaking stuff in my program. In other words, I wanted to be able to call library functions they've created, without letting them see or tamper with my stuff, unless I explicitly allow them to.

To accomplish that, I made a few changes. They shouldn't cause existing code to break, but obviously programs that use the extensions won't work in MzArc.

First, eval and load can take a second optional parameter. This is the evaluation environment. If it is not specified, the code is evaluated in the current environment, which is what MzArc does right now.

What is an environment? Anything that supports get/set on key/value pairs. That means (table) can be used as an environment, and so can an alist. Custom types can be used as well, provided that the interpreter knows what to do with them. All that is required is that (foo 'x) returns something, and (= (foo 'x) 'y) sets something.
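As a rough illustration of that get/set protocol, here's a hypothetical Python sketch of how an interpreter like PyArc might look up and assign variables in either a table (dict) or an alist; the names `env_get` and `env_set` are illustrative, not PyArc's actual internals:

```python
# Hypothetical sketch: treat "anything that supports get/set" as an
# environment. A table is a dict; an alist is a list of (key, value)
# pairs where the first match wins.

def env_get(env, key):
    """Corresponds to (foo 'x): look up key in a table or alist."""
    if isinstance(env, dict):
        return env.get(key)
    for k, v in env:          # alist: scan pairs in order
        if k == key:
            return v
    return None

def env_set(env, key, value):
    """Corresponds to (= (foo 'x) 'y): bind key in a table or alist."""
    if isinstance(env, dict):
        env[key] = value
    else:
        env.insert(0, (key, value))  # prepend, shadowing older bindings

table = {}
env_set(table, 'x', 'y')

alist = [('a', 1)]
env_set(alist, 'a', 2)       # shadows the earlier ('a', 1) pair
```
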

Lookups are performed in the environment you specify, and (assign) is scoped there as well. This means you can evaluate code in a safe, clean environment, isolated from everything else:

  (eval '(+ 10 5) (table)) -> error: `+` is undefined
The above throws an error because `+` is not defined in the environment. PyArc provides a new built-in function, called `new-namespace`, which returns a table that contains the built-in functions, including those defined in arc.arc. So if you want the previous example to work, but don't want it to have access to your globals, you can use this:

  (eval '(+ 10 5) (new-namespace)) -> 15
You could also pick-and-choose specific variables from the current namespace. For instance, the following allows the code to only use the + function:

  (eval '(+ 10 5) (obj + +)) -> 15
That is all that is required to create a simple module system. How do you use it, then? Let's suppose you had the files foo.arc and bar.arc and you wanted to import them into your program, but without icky namespace issues. You could use this:

  (= foo (load "foo.arc" (new-namespace)))
  (= bar (load "bar.arc" (new-namespace)))
Now, all the global variables, functions, macros, and ssyntax are available in the tables foo and bar. However, foo.arc and bar.arc are not allowed to access anything in your namespace. Let's suppose that foo.arc defines a function called my-let; you could use it like so:

  (foo!my-let ...)
What if you don't want to type foo! all the time? You can pull it into your namespace:

  (= my-let foo!my-let)
  (my-let ...)
Okay, but what about macros? They're scoped to the module, just like everything else. Assuming bar.arc contains a macro called my-do, you can pull it into your namespace just like with my-let:

  (= my-do bar!my-do)
Now, whenever my-do appears after that assignment, it will be expanded, as if you had written it yourself. The same goes for ssyntax. If a file defines ssyntax, it will only affect that file. But other files can use it as well, if they choose to import it.

All the above is pretty verbose, though, so I also defined a macro called import:

  (import foo "foo.arc"
          bar "bar.arc")
The above is the same as calling load with (new-namespace) and assigning to a symbol, but it's much more concise. It also supports an alternate syntax that lets you import only certain variables:

  (import (my-let) "foo.arc"
          (my-do)  "bar.arc")
When the first argument is a list, it specifies which variables to import. That means the above is equivalent to:

  (with (foo (load "foo.arc" (new-namespace))
         bar (load "bar.arc" (new-namespace)))

    (= my-let foo!my-let)
    (= my-do  bar!my-do))
If you know Python, then the following are roughly equivalent:

  (load "foo.arc")            -> from foo import *
  (import foo "foo.arc")      -> import foo
  (import bar "foo.arc")      -> import foo as bar
  (import (my-let) "foo.arc") -> from foo import my-let
In this module system, it is the person who imports the module who decides what the module's name is. This has at least a couple benefits: first off, it means that you can import Arc code that knows nothing about modules. In some module systems, you're required to use something like (module ...) in your program, but that means that you can't import files that don't use that convention.

Second, having the module name itself creates a problem when you want to import two or more modules that define the same name. If foo.arc and bar.arc had both used (module "something"), that would cause a name collision! But because the caller is the one who defines the names, they can easily resolve the issue.

I considered the two points above to be important, and so they guided me as I designed this module system. If you want, it should be possible to build a more complex system on top of this simple core, but I personally don't have a need for that (yet).

Also, despite being simple, the system defined above allows for some interesting things. For instance, suppose you wanted to eval foo.arc and bar.arc in the same environment, but not in your namespace:

  (= env (new-namespace))
  (load "foo.arc" env)
  (load "bar.arc" env)
Now the table env contains the globals for both foo.arc and bar.arc, but neither have access to your globals (unless you let them). How do you let a module have access to your globals? Simple, you just assign them to the environment:

  (= env (new-namespace))
  (= env!my-func my-func)
  (= env!my-eval my-eval)
  (= env!my-defn my-defn)
  (load "foo.arc" env)
  (load "bar.arc" env)
Now foo.arc and bar.arc have access to the built-in globals, my-func, my-eval, and my-defn, but nothing else. Thus, modules interact in a predictable way, controllable by you. Or, how about importing the same file twice, but in two separate namespaces?

  (import foo1 "foo.arc"
          foo2 "foo.arc")
Even though they are the same file, the namespaces foo1 and foo2 are separate from each other. So, as you can see, the caller has control over how to mix and match modules, rather than the one who wrote the module.

Random side note: see why I dislike using ! for property access? Using . would be so much nicer.



1 point by Pauan 5137 days ago | link

rocketnia brought up an excellent point, that sometimes a module will want to share state with all modules. For instance, consider a module that defines a coerce* table, and overwrites the built-in function coerce.

When you want your custom data-type to be coercible, you can modify the global coerce* table, and voila. The problem is, this module system isolates modules from each other a bit too well. My solution is to allow for changing the built-ins directly. This means that all future calls to (new-namespace) will contain the changes.

Of course, this gives modules a lot of power, since they can change how all modules loaded after them behave. Imagine a module that overwrites eval, for instance. With this module system, such changes would be isolated to the module, so everything is fine. But if it's possible to overwrite the built-ins directly, then one module could break things in other modules, even if they don't explicitly allow it.

Then again, that kind of raw power seems to me to be in the spirit of Arc, so I'm going to try that approach. My suggestion, however, is to use it sparingly: only when your module really needs to persist and be accessible even to modules that haven't imported you.

So... one question: how should you define built-ins? Should there be a special global built-ins* that you can write to directly? Should it be a function call?

-----

1 point by Pauan 5136 days ago | link

I gave this some more thought.

My worry with something like built-ins* is that it gives the programmer a lot of power to break stuff. Not just in their module, but in all modules! But then I realized that any module can just use (system "rm -rf /") and break far more.

Modules are child-proof in the sense that they're protected from other modules, but the built-in functions like `system` are not safe. So, really, you should only import modules that you have verified, or that have been verified by somebody you trust, etc. [1]

Don't rely on the module system to protect you from other people's bad code. The module system exists to make programming easier, better, and more flexible; not necessarily safer. [1]

What this means is that the whole "built-ins is dangerous" thing is a non-issue. Regardless of how dangerous it is, there are already far more destructive things in Arc, so I might as well include built-ins* and give people the flexibility to do stuff.

---

With that out of the way, here's my current plan:

Provide a global __built-ins* variable that contains the built-in functions. Using (new-namespace) creates an empty namespace that inherits from __built-ins*. [2]

This means that if you change __built-ins* it will change all modules, even modules that were loaded before yours. This completely destroys the conceptual model of modules being distinct and isolated from each other.

Thus, the name is intentionally ugly. You shouldn't be mucking around with __built-ins* unless you need to, but it's available just in case you do need it. Though... do you think the two underscores are too Pythony?

This has some interesting implications. Aside from the obvious ones (being able to create a persistent function/table/whatever that is available to all modules), it also means you can load any module into the built-ins:

  (load "foo.arc" __built-ins*)
So for instance, the Arc core is implemented in arc.arc, but there are also some libraries like strings.arc, code.arc, etc. For now, the plan is for PyArc to only load arc.arc, but we'll see.

In any case, let's say you wanted it to behave as if PyArc loaded arc.arc and strings.arc. You could use this:

  (load "strings.arc" __built-ins*)
Voila. Now the functions defined in strings.arc are available as globals in every module, including yours. This is the only way to do this. If you used `load` without a second argument, it would load them into your module, but other modules wouldn't have access to them (unless they loaded it too).

[1]: partially invalidated; see below

[2]: not anymore; see below

---

Oh, and here's my plan for handling stuff like (system) etc.

There should be a blacklist (or a whitelist?) of "safe" and "unsafe" functions. With "safe" being defined as "any changes made only affect the module, and do not affect other modules, or the system as a whole."

Obviously things like (system) or __built-ins* are unsafe, etc.

Then, there would be two types of Arc code: trusted and untrusted. Trusted can do anything. Untrusted code can only do safe things, but not unsafe.

Since trusted code can load other modules with __built-ins*, this lets trusted code elevate the privileges of any other module.

So, why is this important? Well, it means you only need to verify trusted modules. If a module uses only safe functions, then you can load it in a restricted namespace, which means that you can safely load that module even if it's malicious: any bad stuff it does is scoped to the module.

Obviously trusted modules will need to be verified to ensure they don't do nasty stuff, but that's still far better than having to verify every module. You would only need to verify modules that actually use the unsafe functions. I expect most modules will work fine with the safe subset, so this hopefully would save a lot of time spent verifying.

To accomplish this, I plan to change `new-namespace` so it returns a safe namespace. It will also be changed to accept an optional first parameter: the namespace to inherit from. Thus, if you want an unsafe namespace, you can use this:

  (new-namespace __built-ins*)
This also allows you to inherit from other namespaces:

  (import foo "foo.arc")
  (load "bar.arc" (new-namespace foo))
The above loads bar.arc in a new namespace, that inherits from foo.arc's namespace. So if a variable is not found in bar's namespace, it will then check foo's namespace, etc.
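That parent-chain lookup can be sketched in a few lines of Python — a miss in the child namespace falls through to the parent, while assignments stay local. The class and method names here are illustrative, not PyArc's real implementation:

```python
# Sketch of (new-namespace parent): lookups fall through the parent
# chain, assignments bind locally and never mutate the parent.

class Namespace:
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def get(self, key):
        if key in self.vars:
            return self.vars[key]
        if self.parent is not None:
            return self.parent.get(key)   # recurse up the chain
        raise NameError(key)

    def set(self, key, value):
        self.vars[key] = value            # always binds locally

foo = Namespace()
foo.set('my-let', 'foo-version')

bar = Namespace(parent=foo)               # like (new-namespace foo)
bar.set('x', 1)                           # local to bar; foo never sees it
```
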

The point of it isn't to restrict what programs can do. The point is to give modules only as much functionality as they need, making it easier to verify that they don't do bad stuff.

Oh, by the way. The whole "unsafe/safe" distinction isn't going to be enforced by the interpreter. So it won't be like you submit a module to a committee and they'll put a "trusted" stamp on it or anything like that. It'll just be a community convention thing. Thus, you could load all modules as trusted code if you wanted to, but then you run the risk of a random module doing bad stuff.

The only way the interpreter will "enforce" this is to provide a standard way of making safe namespaces, and ensuring that safe namespaces can't do bad stuff unless you let them. It's up to you whether you want to load a module as safe or unsafe.

The way I'll handle this is that any modules you specify on the command line...

  ./arc foo.arc bar.arc
...will be loaded as trusted. They can then choose to load other modules as trusted/untrusted as they wish. If people want, I could provide a way to load an untrusted module from the command line, but I don't think there'll be any need for that for a while.

I'm not a security expert or anything, so if you see any sort of flaw with this plan, please point it out!

---

Also, here's an idea I had:

A (current-namespace) function that returns the current namespace. That means that the following two are equivalent:

  (load "foo.arc")
  (load "foo.arc" (current-namespace))
Why? Well, here's my plan for handling macros with modules. While within a module, macros are unhygienic. It's up to you to not screw stuff up: it's your module, after all.

But when importing a module, macros always expand in the namespace they were defined in. There's a problem with that, though:

  ; foo.arc

  (def something () "hello")
  (mac message () '(something))


  ; bar.arc

  (import (message) "foo.arc")
  (message) -> "hello"

  (def something () "goodbye")
  (message) -> "hello"
As you can see, bar.arc wants to expand the macro `message` in bar's namespace, not in foo's namespace. Because macros are hygienic by default, this won't work. You can force it with (w/namespace), though:

  (w/namespace (current-namespace)
    (message)) -> "goodbye"
That means that (message) is equivalent to (w/namespace foo (message)).

Could this be handled by `eval`, maybe? Something like this:

  (eval '(message) (current-namespace))
Then w/namespace could just be a convenience macro. Not sure how I'll do the interpreter magic to make that work, though. Guess I'll try it and see!

-----

1 point by rocketnia 5136 days ago | link

"Thus, the name is intentionally ugly. You shouldn't be mucking around with __built-ins * unless you need to, but it's available just in case you do need it. Though... do you think the two underscores are too Pythony?"

"Too Pythony" was my first impression, lol, but it makes sense according to your naming scheme. An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?

---

"There should be a blacklist (or a whitelist?) of "safe" and "unsafe" functions."

I'm not a security expert either, but I don't know if that should be a global list. Suppose I make an online REPL which executes PyArc code people send to it, and suppose my REPL program also depends on someone else's Web server utilities, which I automatically download from their site as the program loads (if I don't already have them). I might distrust the Web server library in general but trust it enough to open ports, but I also might not want REPL users to have that power over my ports.

I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one.

Is there a way to execute resource-limited code in Python, like http://docs.racket-lang.org/reference/Sandboxed_Evaluation.h...? ...Hm, I suppose (http://wiki.python.org/moin/How%20can%20I%20run%20an%20untru...) is a starting point to answer that.

---

"As you can see, bar.arc wants to expand the macro `message` in bar's namespace, not in foo's namespace."

Well, it's fine if that's what you expect as the writer of bar.arc, but I'd expect things to actually succeed at being hygienic. My approach to bar.arc would be more like this:

  (= foo!something (fn () "goodbye"))
This doesn't need to pollute all uses of foo.arc in the application; bar.arc can have its own separate instance of foo.arc.

There may still be a namespace issue though. If foo.arc defines a macro with an anaphoric variable, like 'aif, and then bar.arc uses foo.arc's version of 'aif, then the anaphoric variable will still be in foo.arc's namespace, right? My own solution would look something like this:

  ; in foo.arc
  (mac aif ...
    `(...
       ,(eval ''it caller-namespace)
       ...))

-----

1 point by Pauan 5136 days ago | link

"An underscore means it's something people shouldn't rely on accessing, and two underscores means it's something people, uh, really shouldn't rely on accessing?"

Yeah, I figured two underscores served as more emphasis than one. :P Also, two underscores seemed uglier to me, and also distinguished it from internal ("private") variables.

---

"I think it would make more sense to control security by manually constructing limited namespaces and loading code inside of them. There's likely to be a common denominator namespace that's as secure as you'd ever care to make it, but it doesn't have to be the only one."

When I said "global list" what I meant was just defining which are safe and which aren't. Then having a default safe namespace that would contain the items deemed safe.

Yeah, you can create custom namespaces, for instance you could create a safe namespace that allows access to safe functions and (system), but nothing else:

  (= env (new-namespace))
  (= env.'system system)
  (load "foo.arc" env)
Voila. In fact, here's how you could handle the scenario you described:

  (= unsafe-env (new-namespace))
  (= unsafe-env.'open-socket open-socket)
  (= web (load "web-server.arc" unsafe-env))

  (= safe-env (new-namespace))
  (eval (read-input-from-user) safe-env)
Thus, web-server.arc has access to the safe functions, and open-socket. Meanwhile, the input that you get from the user is eval'd in a safe environment. It's a very flexible system. The above is verbose, I admit, but that can be fixed with a macro or two.

---

"My approach to bar.arc would be more like this:"

Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad.

Also, there can't be a `caller-namespace` variable (at least not implicitly), because then untrusted code could access trusted code, and so why have a distinction at all? Your example would work, but only if importers explicitly decide to give access:

  (= env (new-namespace))
  (= env.'caller-namespace (current-namespace))
  (= foo (load "foo.arc" env))
Now foo.arc can access caller-namespace, because you're allowing them to.

---

Side note: I'm going to start using .' rather than ! because I think the former looks nicer.

-----

1 point by rocketnia 5136 days ago | link

"Side note: I'm going to start using .' rather than ! because I think the former looks nicer."

I agree. ^_^

---

"Hm... I admit that would probably be a clean solution most of the time, but what if you want both `something`s at the same time? You end up needing to store a reference to the old one and juggling them back and forth. Maybe that wouldn't be so bad."

I don't know what else you could do if you wanted both somethings at once. ^^; That said, I think I'd just explicitly qualify foo!something or import it under a new name.

There's some more potential trouble, though. If a module defines something that's supposed to be unique to it, then two instances of that module will have separate versions of the value, and they may not be compatible. If a module establishes a framework, for instance, then two instances of the module may define two frameworks, each with its own extensions, and some data might make its way over to the wrong framework at some point. On the other side of the issue, if a module extends a framework, then two instances of the module might extend it twice, and one of them might get in the way of the other.

There are several possible ways to deal with this. Code that loads a library (host code?) could load it in an environment that had dummy variable bindings which didn't actually change when they were assigned to, thereby causing the library to use an existing structure even if it created a new one. Framework structures could all be put in a single central namespace, as you say, and any code to make a new one could check to see if it already existed. A library could require some global variables to have already been defined in its load namespace, intentionally giving the host code a lot of leeway in how to specify those variables.

I've been considering all those approaches for Penknife, and I'm not sure what'll be nicest in practice. They all seem at least a little hackish, and none of them seems to really solve the duplicated-extension side of the issue, just the duplicated-framework side. At this point, I can only hope the scenarios come up rarely enough that whatever hackish solutions I settle on are good enough, and at least standardized so that not everyone has to reinvent the wheel. Please, if you have ideas, I'm all ears. ^_^

-----

1 point by Pauan 5136 days ago | link

When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?

In any case, all those approaches should work in PyArc, in addition to using __built-ins* (provided you really really want the library's unique something to be unique and available everywhere...)

Hm... come to think of it... an environment/namespace/module can be anything that supports get/set, right? It may be possible to create a custom data-type that would magically handle that. Somehow. With magic.

-----

1 point by rocketnia 5136 days ago | link

"When you say "unique to it" do you mean "only one value, even if the module is loaded multiple times"? Do you have any examples of where a module would want that?"

I thought I gave an example. If a module defines something extensible, then having two extensible things is troublesome, 'cause you have to extend both of them or be careful not to assume that values supported by one extensible thing are supported by the other.

---

"Somehow. With magic."

I propose also reserving the name "w/magic" for use in examples. :-p

-----

1 point by Pauan 5136 days ago | link

"something extensible?" Got any more specific/concrete examples?

---

"I propose also reserving the name "w/magic" for use in examples. :-p"

Okay, but if I find a magic function I'm going to put it in PyArc so you can use it in real code too. :P

-----

1 point by rocketnia 5136 days ago | link

""something extensible?" Got any more specific/concrete examples?"

I mean something extensible like the 'setter, 'templates, 'hooks, and 'savers* tables, as well as Anarki's 'defined-variables* , 'vtables* , and 'pickles* tables, all defined in arc.arc. These might sound familiar. ^_^

Lathe (my blob of Arc libraries) is host to a few examples of non-core Arc frameworks. There's the Lathe module system itself, and then there's the rule precedence system and the type-inheritance-aware dispatch system on top of that. There's also a small pattern-matching framework.

If you load the Lathe rule precedence system twice (which I think means invasively removing it from the Lathe module system's cache after the first time, but there may be other ways), you'll have two instances of 'order-contribs, the rulebook where rule precedence rules are kept. Then you can sort some rulebooks according to one 'order-contribs and some according to the other, depending on which instances of the definition utilities you use.

---

"Okay, but if I find a magic function I'm going to put it in PyArc so you can use it in real code too. :P"

I think I saw one implemented toward the end of Rainbow.... >.>

-----

1 point by Pauan 5136 days ago | link

Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated.

If you want your stuff to be available everywhere, stick it in __built-ins*. Unless you have a better suggestion?

-----

1 point by rocketnia 5136 days ago | link

"Hm... I'm not sure why that's an issue, though. If a module imports your module, they'll get a nice clean copy. Then if a different module imports your module, they get a clean copy too. Everything's kept nice and isolated."

That's exactly why I'm not sure it'll come up much in practice. But as an example...

Suppose someone makes a bare-bones library to represent monads, for instance, and someone else makes a monadic parser library, and then someone else finally makes a Haskell-style "do" syntax, which they put in their own library. Now I want to make a monadic parser, but I really want the convenience of the "do" syntax--but I can't use it, because the parser library has extended the monad operations for its own custom monad type and the "do" library only sees its own set of extensions.

You mentioned having the person loading the libraries be in charge of loading their dependencies, and that would yield an obvious solution: I can just make sure I only load the monad library once, giving it to both libraries by way of namespace inheritance or something.

But is that approach sustainable in practice? When writing a small snippet for a script or example, it can't be convenient to enumerate all the script's dependencies and configure them to work together. Over multiple projects, people are going to fall back to in-library (load ...) commands just for DRY's sake. What I'd like to see is a good way to let libraries specify their dependencies while still letting their dependents decide how to resolve them.

---

"Unless you have a better suggestion?"

I've told ya my ideas: Dummy global variable bindings and/or a central namespace and/or configuration by way of global variables. (http://arclanguage.org/item?id=14036) They're all too imperfect for my taste, so I'm looking for better suggestions too.

-----

2 points by Pauan 5136 days ago | link

Hm... like I said, it should be possible to build a more complicated system on top of the simple core, though I'm not sure exactly how it would work.

But... here's an idea: a bootloader module that would load itself into __built-ins* so it could persist across all modules, including modules loaded later.

It could then define (namespace ...) and (require ...) functions or something. Modules could be written using said constructs, and the bootloader would then handle the dependencies, creating namespaces as needed. And it could keep a cache around, so re-importing a module that has already been imported will just grab it from the cache.

The bootloader could then define (use ...) or something, which would do all the automatic dependency and caching junk, but you could still use plain old (load) and (import) to bypass the bootloader and get more refined control. Something like that may work.

Haha, I just had a crazy idea. What if a module imported itself into __built-ins* ? Something like this:

  ; foo.arc

  (if no:__built-ins*.'foo-check
    (do
      (= __built-ins*.'foo-check t)
      (load "foo.arc" __built-ins*))
      
    (do
      ; define rest of foo.arc here
      ...))
I suspect any solution will have some wart or other. Tradeoffs and all that. Also, the solution to the specific problem you mentioned is to load them all in a single namespace, right? Or at least namespaces that inherit from some common one.

So perhaps we could define a macro that makes that easier, since the current way of doing it is pretty verbose. Assuming it was almost-as-simple as (import ...) that would help ease the pain somewhat, though it wouldn't help with dependency management (that's a whole different ballpark).

I also thought of a macro that would make it easier to import/export stuff to/from a module. Right now you need to do stuff like this:

  (= env (new-namespace))
  (= env.'foo foo)
  (= env.'bar bar)
  ; etc.
Which is clunky. But I haven't figured out a good name for it. Okay, wait, I could use plain-ol `namespace`:

  (namespace foo bar)
I'm undecided though. It's like (table) vs (obj), only with namespaces.

-----

1 point by Pauan 5136 days ago | link

Oh, and by the way. In addition to creating a safe namespace and selectively giving it unsafe functions, you can also remove functions from a safe namespace.

For instance, suppose you wanted to run code in a safe environment, but you didn't want it to be able to print (using pr, prn, prt, etc.) You could use this:

  (= env (new-namespace))
  (= env.'disp nil)

  ; do something with env
Like I said, it's very flexible. You have complete control over what is/isn't in a namespace. You can execute each module in its own namespace, or combine them however you wish, etc. It has a very simple core, but has many, many potential uses.

-----

1 point by shader 5137 days ago | link

Traditionally, scopes are formed as trees, and if a name isn't found in the local scope, then the parent is checked and so on.

I see namespaces, environments and scopes as different names for the same thing. Thus when arc is loaded, it would create a default namespace, and load all built in functions into that namespace. It's up to you whether user functions default to a "user" namespace (some languages do this) or default to the root namespace. Any newly created namespaces reference that as their parent, and so on down the tree.

If you do implement a true environment system for PyArc, I recommend you do it this way.

I'd also recommend considering making environments a first class part of the language, which is where you seem to be headed. Reifying environments creates many interesting possibilities, including user-level implementations of module systems, selective dynamic scoping, parameterization, and more control over evaluation in general.

-----

1 point by Pauan 5137 days ago | link

It's already done. As said, modules are implemented, but there are still some bugs to work out.

There's a Python variable global_env that contains the base functions (defined in PyArc) and (once I get it working) arc.arc as well. When it loads a file, it creates a shallow copy of global_env and uses that as the namespace for the file.

Then, within the file, if it uses import, that then calls (new-namespace) which once again creates a shallow copy of global_env.
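The shallow-copy strategy described above can be sketched in Python — each module's namespace starts as a copy of global_env, so rebinding a name in one module never leaks into another (names and values below are illustrative, not PyArc's actual source):

```python
# Sketch: new-namespace as a shallow copy of global_env. Rebinding a
# name in one module's copy leaves global_env and other modules alone.

global_env = {'+': 'builtin-plus', 'coerce': 'builtin-coerce'}

mod_a = dict(global_env)   # like (new-namespace) for module A
mod_b = dict(global_env)   # like (new-namespace) for module B

mod_a['coerce'] = 'custom-coerce'   # module A shadows coerce locally
```
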

Actually, PyArc does have an environment class, but it's not exposed to Arc. On the other hand, eval and load both accept tables and alists for an environment. Is that what you meant?

So, I already have all the scaffolding in place, I'm just trying to decide on what name to use. I don't really like the name built-ins* but it does seem to be pretty accurate.

-----

1 point by rocketnia 5137 days ago | link

If you're creating a shallow copy and you're just treating a namespace as something that maps names to values (rather than mapping names to bindings containing values), then it won't be as customizable in some ways: When you redefine 'some, you don't get 'all for free, and you don't get to maintain different 'setter tables in different namespaces.
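One way to read the names-to-bindings distinction is that a binding is a mutable cell, and other definitions can close over the cell rather than the current value. Then redefining 'some really does give you the new 'all for free. This is a generic sketch under that assumption, not how PyArc (or arc.arc's actual some/all) stores bindings.

```python
# A binding cell: a box whose contents can be swapped out later.
class Binding:
    def __init__(self, value):
        self.value = value

env = {}
env["some"] = Binding(lambda f, xs: any(f(x) for x in xs))

# 'all is written against the *cell* for 'some, so it tracks whatever
# 'some currently means, not a snapshot of its value.
some_cell = env["some"]
env["all"] = Binding(
    lambda f, xs: not some_cell.value(lambda x: not f(x), xs))

print(env["all"].value(lambda x: x > 0, [1, 2, 3]))   # True

# Hack 'some to log its calls; 'all inherits the behaviour for free.
old = some_cell.value
def logging_some(f, xs):
    print("some called on", xs)
    return old(f, xs)
some_cell.value = logging_some

env["all"].value(lambda x: x > 0, [1, 2, 3])   # now logs via 'some
```

A shallow copy of a plain name-to-value dict loses exactly this: each namespace would get its own snapshot of 'some, and redefinitions wouldn't propagate.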

I'm going to great pains reimplementing Penknife so that I create an all new core environment from scratch each time, just so that it can be hackable like that. XD But I'm doing this by passing all the bindings around manually so that core-call's implementation can depend on binding-get's binding and vice versa. The core is ridiculously circuitous, with some 11-argument functions and such. Fortunately, the circuitous part is mostly confined to a single, reasonably sized file, which defines a DSL for the rest of the files to use.

Gotta finish up this reimplementation and post it at some point. XD;;;

-----

1 point by Pauan 5137 days ago | link

Okay, so, my plan right now is that if a function is bound in global_env, it has dynamic scope, and if it's bound anywhere else, it's lexical.

This should allow for my shallow-copying strategy to work, but also allow for shadowing core functions. This may break arc.arc, but I'll tackle that when I get to it.

-----

1 point by rocketnia 5137 days ago | link

I have almost no clue what you're doing, but I hope it works out. Treating core variables differently than others sounds nothing like what I would do, so we're separate people. :-p

-----

1 point by Pauan 5137 days ago | link

Yes, it's ludicrous, crazy, and probably insane, but I'm trying it anyways. I suspect it'll break later, though, so I'll have to go back to lexical-scope-for-everything.

By the way, I called it dynamic scope, but I'm not actually sure what it is. It's this weird short-circuiting thing that causes the global_env to jump back one environment level, which then causes it to loop back onto itself if the variable isn't shadowed, but it works (for now).

Edit: nevermind, I had to revert it. Darn. It was such a silly hack too.

-----

1 point by Pauan 5137 days ago | link

Hm... yes, I may end up needing to change that, or hack it in some way to allow better redefining of built-in functions, while still keeping modules isolated.

-----

1 point by rocketnia 5137 days ago | link

If there were a special global 'built-ins*, would it be a built-in? ^_^

Honestly though, I'm not quite sure what you mean by defining built-ins. If you're trying to change what's returned by (new-namespace), guess what: You can give something a modified version of 'new-namespace. :D

-----

1 point by Pauan 5137 days ago | link

Yes, it would be. In fact, it would have to be.

Okay, so, let me explain how this works... global_env is a Python dictionary that defines the built-ins that are exposed to Arc code. After parsing and evaling arc.arc (which still doesn't work yet, but it should eventually), it now contains the core of Arc, including special functions defined in PyArc.

The (new-namespace) function creates a shallow copy of global_env, which is what keeps the modules isolated from each other, because when you use (import) it loads it with (new-namespace).

What this means is, Arc code cannot overwrite built-ins; it can only shadow them. So if you overwrite the new-namespace function, that change would only affect your module, and nobody else's. See what I mean about modules being too isolated from each other? They're child-proof! :P

What would need to happen in order to support the point you brought up (a coerce* table, etc.) would be a way to actually write to global_env directly, bypassing the shallow copy. But since (new-namespace) works by creating a shallow copy of global_env, any future modules loaded after yours would use the stuff you defined even if they don't import your module, which is why I'm calling it dangerous (but possibly dangerous in a good way).
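There is one caveat to the shallow-copy scheme worth spelling out: a shallow copy shares *mutable values*. If a built-in like a coerce*-style table is itself a table, mutating it in place through one namespace is visible in every namespace, even though rebinding the name is not. The names below are illustrative, not PyArc's actual built-ins.

```python
# Shallow copies isolate *rebinding* but share *mutation*.

global_env = {"coerce*": {"int->string": str}}

def new_namespace():
    return dict(global_env)    # shallow copy

mod_a = new_namespace()
mod_b = new_namespace()

# Rebinding is isolated:
mod_a["coerce*"] = {}
print("int->string" in mod_b["coerce*"])        # True -- mod_b unaffected

# But in-place mutation leaks through the shared table:
mod_b["coerce*"]["string->int"] = int
print("string->int" in global_env["coerce*"])   # True -- shared object
```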

I'm just trying to decide on the name that is used to write directly to global_env. I think it should be a table, but it could be a function as well, like (new-namespace). Of course, there's still the question of whether we should allow overwriting the built-ins at all, but I think that malleability fits well with Arc.

-----

1 point by Pauan 5137 days ago | link

Oh, and by the way, here's my plan for handling "private" variables:

If it's a global that is supposed to be internal (used only by your module), I suggest prefixing it with a _, so foo would become _foo, etc. But still make it accessible to other modules, just in case.

Why? Well, consider the scenario where somebody imports your module but it has a bug in it. If the implementation is exposed, they can potentially fix the bug. Another problem is adding new features... if your module hides all of its internal details, then it will probably be harder to extend your module.

Thus, I propose that internal variables are handled by an "honor" system, so to speak. The _ means that it is intended for internal use, so other modules shouldn't go mucking with it if they don't have to, but they still can, if they need to. It also means that the variable is not part of the official API, so it could change or be removed in future versions of the module.

In other words, don't depend on variables prefixed with _ and avoid mucking around with them if you can.

On the other hand, if the variable really truly is useful only for your module and nobody else (a gensym, maybe?) then it's okay to wrap it in a (let), that way only your module can see it.

-----

1 point by rocketnia 5137 days ago | link

"Why? Well, consider the scenario where somebody imports your module but it has a bug in it."

We're practically the same person. XD That's one of the use cases I always reach for when talking about Penknife or about language hackability in general.

-----

1 point by Pauan 5136 days ago | link

Actually, I used to be in the "wrap it in a let! hide everything!" camp, but your post (http://arclanguage.org/item?id=13769) changed my mind.

I still don't really like Python's "everything is open!" approach, but I appreciate it a bit more now than I used to. On the other hand, I do think it's a good fit for Arc.

-----

1 point by shader 5137 days ago | link

Sometimes I agree that . would look nicer, but I also feel that as used currently it has a stronger tradition, simpler meaning and is more "fundamental" and generally useful. I use it all the time with its current meaning, but I only rarely use !. A better choice might be to redefine : instead, as that is commonly used for namespaces. Or provide multi-char ssyntax, and use :: .

I do wish that ssyntax was handled at the same level as other racket reader macros, such as brackets, strings, quote, etc. Then you could do foo.'a or foo."a", something I've desired many times.

-----

1 point by Pauan 5137 days ago | link

Yes, I plan to handle ssyntax at the reader level in PyArc.

In any case, with customizable ssyntax, you can change it however you like. You could even remove the built-in ssyntax, so ' ` , ,@ . ! & : ~ wouldn't work any more, and you'd have to use pure S-expressions. I don't know why anybody would want to do that, but I don't see a reason to restrict it either.

-----

1 point by rocketnia 5137 days ago | link

Technically, the ' ` , ,@ operators are reader macros in Arc, not ssyntax. ^_^ A reader macro is more powerful, in a sense[1]: Once a certain sequence of characters is read in, the whole reader behavior can be replaced, putting you in a different language. On the other hand, those particular operators don't need to be quite that powerful, and I'm all for implementing them and ssyntax in a single consistent way.

In Penknife, instead of saying `(a b ,c), I say qq.[a b \,c] (where "\," is just an escape sequence qq implements; Penknife syntax is founded on nestable strings). As long as infix operators are capable of acting on arbitrary subexpressions, any variable can effectively be a prefix operator, lessening the need for hieroglyphics.

---

[1]: From another perspective, reader macros are rather innately limited to prefix notation; I believe it can be overcome in certain cases (http://arclanguage.org/item?id=13888), but it means manually managing a stack of things which could be pushed under the next infix operator to come along. Can't tell if that's ugly at this point. ^^

-----

1 point by Pauan 5137 days ago | link

Oh, by the way, since we're making ssyntax reader-level, I might be able to get more powerful ssyntax as well. For instance, " works by consuming the stream until it finds a matching ". Ditto for [] and (), etc. This behavior hopefully wouldn't be too hard to add, though I'm not sure what the Arc interface to it would be like.

I probably wouldn't be able to move the defaults for "" [] () into Arc, though, because they call Python functions/classes that I don't particularly want to expose to Arc.
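The kind of reader-level hook being described might look like this: a dispatch table maps an opening character to a function that consumes the stream until its matching terminator. This is a hypothetical design sketch, not PyArc's actual reader interface, and it doesn't handle nesting.

```python
# A tiny character-at-a-time reader with a dispatch table of
# "reader macros" keyed by their opening character.

def read_until(stream, terminator):
    """Consume characters from an iterator until the terminator."""
    out = []
    for ch in stream:
        if ch == terminator:
            return "".join(out)
        out.append(ch)
    raise SyntaxError("unterminated form")

readers = {
    '"': lambda s: ("string", read_until(s, '"')),
    '[': lambda s: ("bracket-fn", read_until(s, ']')),
}

def read_form(text):
    stream = iter(text)
    ch = next(stream)
    return readers[ch](stream)

print(read_form('"hello"'))      # ('string', 'hello')
print(read_form('[+ _ 1]'))      # ('bracket-fn', '+ _ 1')
```

Exposing `readers` to Arc code would be one way to make ' ` , ,@ (and user-defined syntax) hackable at read time.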

-----

1 point by Pauan 5137 days ago | link

Yeah, I know. Unless I made ssyntax way more powerful, I couldn't put stuff like "" or [] in there. I'm okay with that, at least for now. But since it's possible to put ` ' , and ,@ in the ssyntax, I plan to do that. Makes it more hackable in Arc, you know?

Also, since I plan to expand ssyntax at read-time in PyArc, what's the distinction between ssyntax and reader macros, besides the fact that Arc can't define reader macros, and reader macros are more powerful?

-----

1 point by rocketnia 5137 days ago | link

"Makes it more hackable in Arc, you know?"

There's nothing stopping Arc from having reader macros too, except that at this point there isn't a good standard; it takes Racket calls, and the more I learn about Racket and reader macros, the more I think it has an incomplete standard too. :-p I want to make a reader macro that stops when it reaches a symbol-terminating character--but wait, there are ways to specify symbol-terminating characters, but I see no way to check for them. Time to hack the language core... if only I could. ^^

"what's the distinction between ssyntax and reader macros, besides..."

I think the distinction is how much you're parsing the stream one character at a time (in which case you can dispatch on reader macros) and how much you're parsing it in chunks. Infix syntax always looks like a chunk to me, but as I was saying, infix operators could be implemented as reader macros if we kept/passed enough state in the reader. There could be no distinction at all.

-----

1 point by aw 5137 days ago | link

"Then you could do foo.'a or foo."a", something I've desired many times."

Can someone post a summary of how they would like ssyntax to work? Along with getting the syntax to work in more cases, I'm vaguely aware for example that people have preferences as to whether a.b.c should be (a (b c)) or ((a b) c), but I don't know what they are.

-----

1 point by aw 5137 days ago | link

As an aside, eval in my own runtime project works the same way: you can pass it an optional second argument to use as the namespace.

A common criticism of this style of module implementation is that it doesn't provide for a way to avoid namespace collisions on prerequisite macros. E.g., if I import a macro foo which expands into bar, I have to import bar as-is (I can't rename it to something else, because then the expansion of foo would break). It doesn't personally bother me though, since I always go and easily rename bar in the source if it's causing a problem for me.

-----

1 point by Pauan 5137 days ago | link

Okay, so, I gave this some thought. I might be able to do some kind of hacky thing where when a macro expands, it first checks for variables in the macro's scope. Then it checks for variables in the defining module's scope. Then it checks for variables in the importer's scope.

Sounds really hacky. I may need to use name-munging, but if I implement it at the interpreter level, at least ordinary code shouldn't be aware of it. Ideally it would be completely transparent.
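The lookup order sketched above (the macro's own scope, then the defining module, then the importer) can be modelled directly with Python's collections.ChainMap, which searches its maps left to right. The scope contents here are purely illustrative.

```python
# ChainMap as a model for layered macro-expansion lookup.
from collections import ChainMap

macro_scope    = {"tmp": "gensym-1"}
defining_mod   = {"message*": "hello", "tmp": "shadowed"}
importer_scope = {"message*": "importer's value", "user-var": 42}

env = ChainMap(macro_scope, defining_mod, importer_scope)

print(env["tmp"])        # 'gensym-1'  (macro's own scope wins)
print(env["message*"])   # 'hello'     (then the defining module)
print(env["user-var"])   # 42          (finally the importer)
```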

-----

1 point by rocketnia 5137 days ago | link

Oh yeah, this is an interesting issue. It's one reason I'm going with hygienic macros in Penknife, besides simple cleanliness. :-p The technique I use in Penknife, as it turns out, is an example of syntactic closures, but I didn't know it until well after I had an okay system in place. I found out when I read this from http://lambda-the-ultimate.org/node/4196:

"Other frameworks for lexically-scoped macros, notably syntactic closures (Bawden and Rees 1988) and explicit renaming (Clinger 1991), use a notion of lexical context that more directly maps to the programmer's view of binding scopes. Unfortunately, the more direct representation moves binding information into the expansion environment and, in the case of syntactic closures, tangling the representation of syntax and expansion environments."

I had exactly this entanglement happening in Penknife at one point. At the time a Penknife macro is defined, the local environment is captured, and later on, syntax generated by the macro is wrapped up in a way that refers back to that environment. At one point I was just putting the environment itself in the wrapper.

For your purposes, that might be just fine, especially if you're just going to eval the macro's result right when you get it. Then again...

PyArc's modules sound exactly like I want Penknife's (not yet existing) modules to be, and you seem to value the ability of the library users to configure the libraries to their own purposes, much like I do. I suspect, in our languages, programs will need to load the same library multiple times for different purposes, especially when it comes to diamond dependencies, where two individual libraries might depend on a common third library but configure it in different ways.

With Penknife's approach to extensible syntax, it turns out parsing is really slow, so I've organized the language so that it can parse a library in one pass and then run the resulting code over and over. ("Parse" basically means "compile" in Penknife, but Penknife does one or two additional things that could be called compilation, so I shy away from that word altogether.) That exposes a weakness of tangled syntactic closures: The compiled representation holds the original captured environment, rather than the one that should be used this time around.

So for Penknife I ended up storing the environment in the macro itself and complicating my environments so that variables were referred to based on what sequence of macro-unwrapping it took to get to them. Since the person using the macro obviously has the macro in their namespace, it comes together pretty nicely. I'm happy with the result, but it's not nearly as simple as (= env!foo 4), so I can't expect you to follow the same path. ^^

-----

1 point by Pauan 5137 days ago | link

In PyArc, right now, we just re-parse and re-eval it, so loading the same module twice is not a problem.

Okay, so, I fixed the issue with it thinking a variable was undefined when it wasn't. I then found out something very interesting:

  ; foo.arc
  
  (assign message* "hello")
  (mac ret () 'message*)
  
  ; bar.arc
  
  (import foo "foo.arc")
  (foo!ret) -> "hello"

  (= ret foo!ret)
  (ret) -> error: message* is undefined
So... without even trying to, I automagically made macros sorta-hygienic with modules. When you import a module, and then call its macros using the foo!bar table convention, it works just fine. The macro will use its defining environment, rather than the caller's environment. But... if you pull the macro into the current namespace, it dies, because it's now being eval'd in the caller's namespace.

I'm pretty okay with this. It's kind of a wart that macros don't work quite right in that situation, but the fact that they do work when using the table is pretty darn nice. This also theoretically gives the caller the choice of whether to eval the macro in the defining environment, or the caller's environment. The only issue, I think, is that this behavior may be confusing, at least until you've been bitten by it once or twice; by then you should have it memorized. :P

Obviously this is just a simple test... the real test will be when it's released and we'll have more complicated macros. Then we'll see if there's still any major hygiene problems or not.

-----

1 point by shader 5137 days ago | link

Sounds like you're looking for some kind of hygienic macros, and may have rediscovered the logic behind scheme adopting them in the first place.

It is possible that properly handling modules at the core language level requires that either macros are only ever expanded and evaluated in their original context, or hygiene must be used to encapsulate that context in the macro-expansion. Or leave it without restrictions, and hope people follow the guidelines.

Not all hygienic macro systems are cumbersome or complicated, and it's possible that we could create one that permits selective breaking of hygiene as desired. One candidate would be an implementation of SRFI 72: http://srfi.schemers.org/srfi-72/srfi-72.html

Hygiene has been discussed on this forum, and I think a few systems may exist already.

-----

1 point by Pauan 5137 days ago | link

Wouldn't name-munging solve it also? Suppose, for instance, that you have two modules:

  ; foo.arc
  
  (= message* "hello")
  (mac ret () 'message*)
  
  
  ; bar.arc
  
  (import (ret) "foo.arc")
  (ret) -> error: message* is undefined
  
Okay, but what if every symbol was transparently prefixed with the module's name, or a gensym or something? Something like this:

  ; foo.arc
  
  (= foo@message* "hello")
  (main@mac foo@ret () foo@message*)
  
  
  ; bar.arc
  
  (main@import (bar@ret) "foo.arc")
  (bar@ret) -> "hello"
  
This is all handled by the interpreter, so normal users don't see it. Anyways, now `ret` would expand to this:

  foo@message*
  
Which would refer to the correct binding in the correct namespace. Hm... but then what if the macro actually does want to grab a value defined in bar.arc? Tricky.
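The transparent prefixing idea amounts to walking each expression at load time and qualifying every unqualified symbol with its home module's name, with built-ins getting a shared prefix. This is a rough sketch under the conventions of the example above (the @ separator, main@ for built-ins); qualify() and the symbol representation are assumptions, not PyArc code.

```python
# Toy name-munging pass: in this sketch, strings are symbols and
# anything else (numbers, etc.) is a literal left untouched.

builtins = {"assign", "mac", "import", "="}

def qualify(expr, module):
    """Prefix free symbols with module@; built-ins get main@."""
    if isinstance(expr, list):
        return [qualify(e, module) for e in expr]
    if isinstance(expr, str) and "@" not in expr:   # unqualified symbol
        prefix = "main" if expr in builtins else module
        return prefix + "@" + expr
    return expr                                     # literal or qualified

# The body of foo.arc's `ret` macro, munged at load time:
print(qualify("message*", "foo"))                   # foo@message*

# A whole form, e.g. (assign message* 42) read as nested lists:
print(qualify(["assign", "message*", 42], "foo"))
# ['main@assign', 'foo@message*', 42]
```

The open problem noted above remains: a pass like this hard-wires every symbol to its defining module, so a macro that deliberately wants to capture a variable from the importer's namespace needs some escape hatch.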

-----

1 point by rocketnia 5137 days ago | link

I think your question is really "Wouldn't name-munging accomplish hygiene?" :)

What you're talking about is pretty similar to how Racket accomplishes hygiene by way of 'read-syntax. Everything's wrapped up so you know what its source file is, and that's all I know for now. :-p Seems Racket's system is pretty cumbersome, but then that's probably because it's very static, with modules revealing what they export before any of the exported values are actually calculated.

What you're talking about also sounds extremely similar to Common Lisp's approach to namespaces. I don't know that approach very well, but it could act as some sort of example. ^^

-----

1 point by Pauan 5137 days ago | link

Hm... that's a good point. I guess that's a good reason for implementing name-munging schemes. I'll have to consider that.

-----