Arc Forum | Ask Arc: What's the best lisp variant for systems programming?

Arc Forum

Ask Arc: What's the best lisp variant for systems programming?

4 points by shader 2875 days ago | 22 comments

Golang and Rust are currently the two most touted systems programming languages, and while I appreciate some of what they offer I'm rather frustrated by the hype and other deficiencies of the languages (particularly for golang). And to be honest, I just like using members of the lisp family.

However, none of the dialects I've looked at really seem targeted for the same kind of lower-level OS component development. The current project I'm considering is mostly just container management, so it doesn't have to be super close to the metal, but it should produce fairly concise and efficient binaries.

I like clojure, but it's clunky and has multisecond launch times... CL seems like the classic choice; maybe I'm unnecessarily hesitant to use it.

Bonus points for being usable as the basis for an open source project that _other_ people could contribute to...

Common Lisp, picolisp, Guile, ferret, arc (with more hacks to make it closer to what I'm looking for...)

Let the discussion commence...

2 points by akkartik 2874 days ago | link

I've been hanging out a good amount in the #bootstrappable IRC channel on Freenode lately. They seem to have a decent start at a fresh new stack that builds all the way up from machine code and can run a decent amount of C (as of last night: gcc, binutils, libc). There's also parallel infrastructure to build tiny proof-of-concept Lisp and Forth interpreters.

I think I may have my recent side projects retarget this stack. For the last couple of years I've been trying to create a better stack of system software for implementing Lisp (among other things): https://github.com/akkartik/mu#readme. I even started building something up from machine code, just like bootstrappable: https://github.com/akkartik/mu/tree/master/subx#readme. So I should hopefully be able to switch to bootstrappable fairly easily.

Their documentation is still poor. Here's what I've been able to piece together so far. The bottom-most repo is at http://git.savannah.nongnu.org/cgit/stage0.git. A writeup on it: https://bootstrapping.miraheze.org/wiki/Stage0.

From stage0, the basic flow is this:

* stage0/stage0/hex0 builds stage0/stage1/hex1.

* stage0/stage1/hex1 builds stage0/stage1/hex2.

* stage0/stage1/hex2 builds stage0/stage1/M0.

* stage0/stage1/M0 builds stage0/stage2/cc_x86.s

* stage0/stage2/cc_x86.s builds https://github.com/oriansj/M2-Planet, which is a bare-bones compiler for a subset of C that is also self-hosting.

* M2-Planet is on the way to building https://www.gnu.org/software/mes (there's still a gap here)

* mes can build gcc, binutils, glibc.

For Lisp, there's a tiny interpreter at http://git.savannah.nongnu.org/cgit/stage0.git/tree/stage2/l... that can be built with M0.

There's a decent amount of documentation, but I'm still finding my way around it. Not clear what order to read things, etc. The good news is that it builds. On any Linux system, 32-bit or 64-bit, you should be able to just run `make`. There's not a lot of code, and it seems worth spending time to understand. The folks on the IRC channel have been super helpful to answer questions.

This doesn't answer your immediate question. But I'm hopeful that we can build a new Lisp something like this that will be approachable and hackable, and good for sharing code and runtime across language boundaries. Which would reduce the gap between Lisp and systems programming, or any other domain.

-----

2 points by shader 2874 days ago | link

Sounds pretty interesting.

I have this constant problem of dissatisfaction with existing tools and paradigms, to the point that I often work my way down to the bottom of the stack considering building a new OS...

My most recent descent has started with a dissatisfaction of container orchestration tools, then discovering that it's not really easy to manage containers with unprivileged code under Linux, and now wondering about building a microkernel OS / language for distributed actor-model development from the ground up, and thinking it would be nice to have an "OS development framework" so it is easier to experiment with novel architectures without having to deal with a lot of driver development grunt work... I originally asked this question somewhere near the beginning of all that.

-----

2 points by akkartik 2874 days ago | link

You're speaking my language :)

Speaking of building a new OS, check out https://gitlab.com/giomasce/asmc which uses some of the bootstrappable infrastructure to build an OS. It uses a stack-based (Forth-like) intermediate language called G to implement C.

-----

2 points by hjek 2874 days ago | link

For making small self-contained binaries, Embeddable Common Lisp[0] and Chicken[1] are great. Gambit[2] is also worth a look, and while it's supposedly faster than Chicken, Chicken does have way better documentation and more packages available.

Guile, Picolisp and Arc can't make self-contained binaries, right? But Ferret looks interesting.

[0]: https://common-lisp.net/project/ecl/

[1]: https://call-cc.org/

[2]: http://www.gambitscheme.org/wiki/index.php/Main_Page

-----

3 points by shader 2874 days ago | link

True, Guile and Picolisp do not make self-contained binaries, but they're pretty lightweight and fast, and seem like they still might be half-decent choices. I guess I should probably remove Guile from the list of candidates, since it's really for plugin extensibility in other programs; I may even have been slightly overlapping it with Chicken in my mind.

Arc would have to be "polished" a bit to make it work, but that's what we're here for, isn't it?

-----

2 points by akkartik 2874 days ago | link

I'm not sure system programming is a good domain for Arc. Why not just Racket directly? The interpreter makes Arc quite slow.

-----

2 points by shader 2874 days ago | link

Yeah, I probably wouldn't use it for that either, at least as much because of the relative instability and hackishness of arc. It was fun for a few minutes to imagine modifying it to be used that way though.

-----

2 points by rocketnia 2872 days ago | link

Interpreter? Are you talking about Racket's bytecode interpreter?

-----

1 point by akkartik 2872 days ago | link

No, I was forgetting that ac is considered a compiler :) But it compiles the Arc codebase every single time Arc starts up. Maybe we should start memoizing its outputs to disk somehow, see if that makes it noticeably faster. My suspicion is egregious runtime processing like ar-denil-last and ar-apply-args will cause it to not make a difference.

Given the pervasiveness with which Arc has to make such transforms at runtime, I've gotten into the habit of thinking of it effectively as an interpreter.

-----

2 points by akkartik 2874 days ago | link

Common Lisp has been more on my radar recently after reading http://stevelosh.com/blog/2018/08/a-road-to-common-lisp.

-----

2 points by i4cu 2874 days ago | link

I liked the article, but I'm at a loss as to why that would lead you make CL more notable than other lisp or scheme dialects.

For me, the article makes good points about Lisp in general and makes a good case for lisp vs. non-lisp languages, but other than that it just read like a very good lisp intro.

-----

3 points by akkartik 2874 days ago | link

A lot of the reason CL never stuck for me was that the docs are ancient and not very approachable. And I didn't have a guru in my network.

But this write-up clarifies several things that I've never understood. The package system, for example: http://stevelosh.com/blog/2018/08/a-road-to-common-lisp/#pac...

If I'd seen it ten years ago perhaps I'd have stuck with Common Lisp. Even if it's large and baroque, it has a lot in it.

-----

2 points by akkartik 2874 days ago | link

Can you elaborate on "being usable as the basis for an open source project that _other_ people could contribute to"? Is that not true of any open source project?

-----

2 points by shader 2874 days ago | link

Well, that was more of a self-directed comment that if I'm using a custom language for making my projects, the odds that anyone else will want to contribute to it are rather low.

-----

2 points by akkartik 2874 days ago | link

The reason I asked: I actually don't think open source projects are designed for others to contribute to :)

https://www.reddit.com/r/ProgrammingLanguages/comments/8i33h...

-----

2 points by shader 2874 days ago | link

Interesting that you mention the sandboxing problem, since that's something I'm currently thinking about...

My current thought for solving the sandbox problem is to have a thoroughly first-class and recursive system for creating subsets of resources and providing them to children. Fortunately, I don't think we ever have to worry about running _unexpected_ code; just code that does unexpected things. So if we have the tools to take part of the resources available for the current "process" and pass them on to any child interpreters (tabs, etc.), then it wouldn't matter how many layers you decide to add to the system.

That said, a lot of complexity with sandboxing comes from the desire to deliberately break isolation and provide access to shared resources... I think that temptation should be avoided as much as possible, and solved with easy to use communication primitives. While that does impose an overhead, we're already quite used to isolation overheads, and it should be constant. And if the abstractions around resources are good enough, the difference shouldn't be that noticeable.

And now you know why I'm thinking about an actor-model OS, where everything (including files, etc.) are represented as isolated actors that communicate via messaging. Then it doesn't make a difference whether you "own" the 'filehandle' or a different actor does - it's just an address that you send authorized messages to anyway. Fits pretty well with microkernel and distributed OS design too.

-----

2 points by akkartik 2874 days ago | link

"My current thought for solving the sandbox problem is to have a thoroughly first-class and recursive system for creating subsets of resources and providing them to children."

I hadn't been thinking about sandboxing, but my interest in testable interfaces lately has me asking how I can design an interface for memory allocation in a testable way. And that requires precisely such a recursive way for one allocator to be able to do book-keeping on memory allocated by a parent allocator. This would be the start of the capability to run the tests of an OS while running on itself.

The only way to have an address tagged as used in one allocator and free in a child allocator: metadata must be somewhere else. I'm imagining something like this, where 'the 'allocator' blocks are book-keeping metadata on used vs free areas:

    +---------------------------------------+
    |                                       |
    |              Allocator                |
    |                                       |
    +---------------------------------------+
    |                                       |
    |              Block 1                  |
    |                                       |
    +---------------------------------------+
    |              Block 2                  |
    +---------------------------------------+
    |              Block 3                  |
    |                                       |
    +---------------------------------------+
    | Block 4  +--------------------------+ |
    |          |        Allocator         | |
    |          +--------------------------+ |
    |          |        Block 1           | |
    |          +--------------------------+ |
    |          |        Block 2           | |
    |          +--------------------------+ |
    |          |        Unused            | |
    |          +~~~~~~~~~~~~~~~~~~~~~~~~~~+ |
    +---------------------------------------+
    |              Block 5                  |
    +---------------------------------------+
    |              Unused                   |
    |                                       |
    +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+

Anyway, if I have a bootstrappable stack to start with, the timeline for this allocator just got accelerated!

-----

3 points by shader 2874 days ago | link

Yeah; I guess I didn't state it that clearly, but the sandboxing reference was from the reddit thread you linked

"Sandboxing is challenging partly because our boundaries for what is trusted shift imperceptibly over time. Browsers came out in a world where the desktop was considered the crown jewels, while browsers were considered untrusted. Over time local disk has disappeared (e.g. Chromebooks) while our lives are ruled by the browser. This has made the original design of browser security (shielding the local file system; cookies; single origin) if not obsolete then at least dangerously incomplete. So any sandboxing solution has to think about how to fight this gradual creep. No matter how you organize your sandboxes using Qubes or whatnot, something inside a sandbox is liable to become more important than other stuff inside that sandbox. And even outside that sandbox."

The idea of running tests against a live application, particularly an OS, is intriguing. It shouldn't be too hard though, since the code and its state don't have to be in the same place. So you could run the live code against separate testing data in a different sandbox or something...

Your specific example of memory management makes me think that vague "recursive subsets of resources" probably won't cut it for a lot of things. After all, if you make memory access too indirect, it would be very very slow; but if you try to use the actual CPU management interfaces, it may not allow the kind of control that we want...

How much do you really need the bootstrappable stack? There shouldn't be anything stopping you from using something that's already been bootstrapped, instead of raw hex code. E.g. existing C tools, etc. There's also the picolisp-based PilOS project; basically picolisp (which is mostly bootstrapped from an assembler they wrote anyway) running directly on the metal as a kernel. (https://picolisp.com/wiki/?PilOS)

-----

1 point by akkartik 2874 days ago | link

Oh sorry, I did understand your sandboxing reference. Just meant I wasn't concerned with it when thinking about the memory allocator. Should have said, "I haven't been thinking about sandboxing.."

I'm less concerned about overhead because I'm primarily concerned about tests, which by definition don't run in production.

But I think in the case of memory allocation you can do several levels of sandboxing before you start hitting performance bottlenecks. You'd have no extra indirections when using memory, because its just a pointer and you just have an address. You'd only hit extra levels of indirection when you try to allocate and the allocator is out of memory. Then you may need to request more from your parent, who needs more from its parent, and so on, all the way down to the OS.

---

"How much do you really need the bootstrappable stack?"

My basic idea is that code is easier to quickly understand if you can run it and see it in operation in a variety of tiny slices. I think it would be easy to quickly modify codebases if you had access to a curriculum of logs of the subsystems you care about, click on any line to jump to the code emitting it, and so on. The curriculum would be created automatically while running tests (because tests are selected manually to capture interesting scenarios).

This idea requires being able to drill down arbitrarily deep into the traces, because we can't anticipate what a reader is trying to change. I may be trying to investigate why a certain sequence of operations causes my laptop to lock up. That may require drilling down inside a save operation to see the data structures inside the OS that perform buffering of disk writes. And so on.

Compilers are part of this infrastructure I want to drill down into. If I want to understand how the if keyword works, it does me no good if it terminates in some opaque binary that wasn't built on my computer. It's also only mildly nicer to find out that some code in this compiler I'm reading requires understanding the whole compiler. Metacircularity is a cute trick, but it's hell on comprehension. Any sort of coiling is bad. What noobs need is to see things laid out linearly.

I've looked at PicoLisp before. I even modeled Wart's memory allocator on PicoLisp. So it's great in many ways. But like all software today it's not really intended for end-users to look at the source code. Its code is intended for insiders who have spent months and years building up an intimate understanding of its architecture. That assumption affects how it's written and managed.

-----

2 points by shader 2873 days ago | link

"its just a pointer and you just have an address"

That could be true; I'm not really familiar with CPU facilities for memory isolation, but this is probably one of the most solved of the isolation challenges, since it does require CPU support.

"You'd only hit extra levels of indirection when you try to allocate..."

Good point. I wonder if there's any way to improve that, or if the worst case is rare enough that it's acceptable?

---

"This idea requires being able to drill down arbitrarily deep into the traces..."

That's a cool idea, and fits very well with the GNU objective of fully open and transparent source code. I'm not sure that bootstrapping is what is required to achieve that goal, however. What you really need is transparent source all the way down, which can be satisfied with self-hosted code, even if it's not bootstrapped. In fact, I'd argue that multiple layers of bootstrapping would make the drill-down process very challenging, because you'd have to cross very sharp API boundaries — not just between functions or libraries, but between languages and runtime environments. Making a single debug tool that handles all of that would be impressive, let alone expecting your users to understand it.

"Metacircularity is a cute trick, but it's hell on comprehension"

Metacircularity applies to interpreters built on interpreters, not compilers. I can agree that it makes things opaque though, because it reuses the existing platform rather than fully implementing it. A self-hosted compiler, even if written in its own language, could be fully comprehensible, however (depending on the quality of the code...). It's just a program that takes syntax and produces binary.

Interestingly, while writing this, I ran across the following paper, which may be of some interest: Avoiding confusion in metacircularity: The meta-helix (Chiba et al.) (https://pdfs.semanticscholar.org/4319/37e467eb9a516628d47888...) I'll read more of it tomorrow, and possibly make a separate post for it.

I do think it would be really cool and possibly also useful if every compiler was self-hosted, and could be drilled into as part of the debugging process. It would mean that you could read the code, since it would be written in the language you are currently using, and the same debugger should be able to handle it.

---

"But like all software today it's not really intended for end-users to look at the source code"

I haven't actually looked at picolisp's source much myself, but what would it take to satisfy your desires for "end-user" readable code?

-----

2 points by akkartik 2873 days ago | link

Very interesting. I'm going to read that paper as well. And think about what you said about the difference between interpreting and compiling.

> what would it take to satisfy your desires for "end-user" readable code?

See code running. My big rant is that while it's possible to read static code on a screen and imagine it running, it's a very painful process to build up that muscle. And you have to build it from scratch for every new program and problem domain. And even after you build this muscle you screw up every once in a while and fail to imagine some scenario, thereby causing regressions. So to help people understand its inner workings, code should be easy to see running. In increasing order of difficulty:

a) It should be utterly trivial to build from source. Dependencies kill you here. If you try to install library version x and the system has library version y -- boom! If you want to use a library, really the only way to absolutely guarantee it won't cause problems is to bundle it with your program, suitably isolated and namespaced so it doesn't collide with a system installation of the same library.

b) It should be obvious how to run it. Provide example commands.

c) There should be lots and lots of example runs for subsets of the codebase. Automated tests are great for this. My approach to Literate Programming also helps, by making it easy to run subsets of a program without advanced features: http://akkartik.name/post/wart-layers. https://travis-ci.org/akkartik/mu runs not just a single binary, but dozens of binaries, gradually adding features from nothing in a pseudo-autobiographical fashion, recapitulating some approximation of its git history. In this way I guarantee that my readers (both of them) can run any part of the program by building just the parts it needs. That then lets them experiment with it more rigorously without including unnecessary variables and moving parts.