I want to think more about this and continue the conversation, but I'm worried the reply window will close first.
Since I presume your input space is relatively small (the AST of a program, which usually only has a few thousand nodes), it sounds like you have some sort of state-space explosion.
Your comment about recursive matching of hypertees sounds like the biggest problem.
Just a shot in the dark (having not studied what you're doing yet), but is there any chance you could use partial-order reduction, memoization, backtracking, etc. to reduce the state-space explosion?
I could be wrong, but most of the other optimizations sounded like they address constant factors, like contract checking.
But then I don't know much about how contracts work; I guess the verification logic could be rather involved itself.
If the window closes, maybe we could continue at #arc-language:matrix.org
From that context, the main idea that I'm thinking about is the tradeoff between "correctness" and "complexity" in an application or language.
That is, you could add complexity (language features, tools, architecture) to ensure some kinds of correctness, but that complexity adds risks of its own.
So, is it better to have a simpler language that doesn't have as many guarantees but is easier to understand and iterate with (Zig), or a more complex language that has more guarantees but at the cost of program complexity and slower builds? (Rust and Haskell)
This is the discussion that I thought y'all would be interested in.
There are a number of factors going into what makes it slow (and what might make it faster). I'm particularly hopeful about points 4 and 5 here, basically the potential to design data structures which pay substantially better attention to avoiding redundant memory use.
By the way, thanks for asking! I'm reluctant to talk about this in depth in the documentation because I don't want to get people's hopes up; I don't know if these optimizations will actually help, and even if they do, I don't know how soon I'm going to be able to work on building them. Still, it's something I've been thinking about.
1. Contracts on the whole
Punctaffy does a lot of redundant contract checking. I have a compile-time constant that turns off contract checking, and turning it off gives a time reduction in the unit tests of about 70%, reducing a time like 1h20m to something more like 20m. That's still pretty slow, but since this is a quick fix for most of the issue, it's very tempting to just publish a contractless variation of the library for people who want to live on the edge.
2. Whether the contracts trust the library
Currently, all the contracts are written as though they're for documentation, where they describe the inputs and the outputs. This constrains not just that the library is being used with the correct inputs but that it's producing the correct outputs. Unless the library is in the process of being debugged, these contracts can be turned off.
(Incidentally, I do have a compile-time constant that turns on some far more pervasive contract-checking within Punctaffy to help isolate bugs. I'll probably want it to control these output-verifying contracts as well.)
3. The contract `hypernest/c`
One of the most fixable aspects here is that my `hypernest/c` contract checks a hypernest's entire structure every time, verifying that all the different branches zip together. If I verified this at the time a hypernest was constructed, the `hypernest/c` contract could just take it for granted that every hypernest was well-formed.
4. Avoidable higher-dimensional redundancy in the data structure
Of course, even without contracts, 20 minutes is still a pretty long time to wait for some simple tests to compile. I don't want to imagine the time it would take to compile a project that made extensive use of Punctaffy. So what's the remaining issue?
Well, one clue is a previous implementation of hypersnippets I wrote before I refactored it and polished it up. This old implementation represented hypertees not as trees that corresponded to the nesting structure, but as plain old lists of brackets. Every operation on these was implemented in terms of hyperstacks, and while this almost imperative style worked, it didn't give me confidence in the design. This old implementation isn't hooked up to all the tests, but it's hooked up to some tests that correspond to ones that take about 3 minutes to run on the polished-up implementation. On the old list-of-brackets implementation, they take about 7 seconds.
I think there's a reason for that. When I represent a hole in a hypertee, the shape of the hole itself is a hypertee, and the syntax beyond each of the holes of that hole is a hypertee that fits there. A hypertee fits in a hole if its low-degree holes match up. That means that in the tree representation, I have some redundancy: Certain holes are in multiple places that have to match up. Once we're dealing with, say, degree-3 hypertees, which can have degree-2 hypertees for holes, which can have degree-1 hypertees for holes, which have a degree-0 hypertee for a hole, the duplication compounds on itself. The data structure is using too much space, and traversing that space is taking too much time.
I think switching back to using lists of brackets and traversing them with hyperstacks every time will do a lot to help here.
But I have other ideas...
5. Avoidable copying of the data structure
Most snippets could be views, carrying an index into some other snippet's list of brackets rather than carrying a whole new list of brackets of their own. In particular, since Punctaffy's main use is for parsing hyperbracketed code, most snippets will probably be views over a programmer's hand-written code.
6. The contract `snippet-sys-unlabeled-shape/c`
There's also another opportunity that might pay off a little. Several of Punctaffy's operations expect values of the form "snippet with holes that contain only trivial values," using the `snippet-sys-unlabeled-shape/c` contract combinator to express this. It would probably be easy for each snippet to carry some precomputed information saying what its least degree of nontrivial-value-carrying hole is (if any). That would save a traversal every time this contract was checked.
This idea gets into territory that makes some more noticeable compromises to conceptual simplicity for the sake of performance. Now a snippet system would have a new dedicated method for computing this particular information. While that would help people implement efficient snippet systems, it might intimidate people who find snippet systems to be complicated enough already.
It's not that much more complicated, so I suspect it's worth it. But if it turns out this optimization doesn't pay off very well, or if the other techniques already bring Punctaffy's performance to a good enough level, it might not turn out to be a great tradeoff.
I don't think so, since OP mentioned Arc 3.2 without any Nginx. I just tried it and I'm able to reproduce this issue with just the official branch in anarki (which is identical to Arc 3.2) without any Nginx.
Arc 3.2 doesn't get very frequent updates, but we should probably fix this in the stable branch. It shows the same error, and also has some errors in the log. Unfortunately I'm not sure when I can get to this. I've been having some RSI issues and time at the computer is limited at the moment.
I've brought up an idea a couple of times here: If (quasiquote ...) and (unquote ...) match up and nest like parentheses, then what if we generalize that analogy to higher dimensions?
After years of going back to the drawing board over and over, I finally have a working, reasonably elegant (if slow), and most importantly, documented implementation of hypersnippets and hyperbrackets in Punctaffy. I'm really excited to be able to give examples of how these can be used for things other than quasiquotation.
The "Introduction to Punctaffy" section introduces the idea of degree-2 hyperbrackets. There are precious few examples where degree-3 hyperbrackets would come in handy (much less higher-degree ones), but the infrastructure is working for those as well.
There are still a number of directions I hope Punctaffy will take from here, but it's finally past the "sorry, I haven't written up good documentation, so I don't know how to explain" stage that it's been stuck in for quite a while. :) If you're interested, I've written up some in-depth discussion of those future ideas in the "Potential Applications" sections. One of those (the interaction between 'unsyntax and ellipses in the Racket syntax template DSL) would even be a practical application of degree-3 hyperbrackets.
This documentation totals to something like 30,000 words. Most of those words are the reference documentation of Punctaffy's hypersnippet-shaped data structures (hypertees and hypernests). Don't worry about reading through all of it, but I'd be very happy to see if anyone could make it through the introduction. After 4 years of not having much to show people, I really hope that introduction will help. :)
Started with the tutorial to understand basis of language, then reading `arc.arc` helped find out about new functions. If you're also starting out, have found making a cheat sheet to be incredibly helpful with quickly finding out what certain things do.
I appreciate your frankness, lol. Anarki maintenance happens whenever we feel like it. :-p I have been feeling an inkling of an Arc urge lately, hence this post, but I have other priorities right now that this is basically procrastination from.
What do you mean by getting GitHub to build docs from the trunk branch? That's what that part of the CI script is there to simulate; I think there's still no actual way to configure GitHub Pages to build from anything but a `gh-pages` branch or a `<username>.github.io` repository.
Thanks for raising these questions! A few random thoughts that immediately occur to me:
My general bias is that CI infrastructure is uneconomic for this project, given the amount of effort I'm willing to put into Anarki at this point. But I have absolutely no objection to others continuing to maintain it.
I'd be fine with having a more complex set of instructions for contributors than newcomers. I'm totally willing to perform those commands for myself when I make changes to Anarki.
I'd also be ok with undoing some of this complexity if you want to do that, though I don't mean to push for that in any way.
I think it's also not super onerous to require people to manually update the docs when making changes. That way we can get rid of gh-pages, and configure Github to just rebuild docs from the trunk branch. That eliminates one use case for CI.
Your final proposal is also fine, though as you mentioned someone would have to implement it.. :)
 As context, I also plan to remove CI from the project I _do_ actively work on.
Should clarify something (follow-up to previous reply, see below or above).
Most macros don't want to deal with whitespace and comments. We also might want to _not let_ them, otherwise people will start using comments as code, like in Ruby. Macros just want expressions. So, we would define a second level of the AST and perform a second pass.
For the same base notation, there may be multiple languages defined in terms of it. If such a language has any form of prefix or infix, or uses the `outside_parens()` calling convention, the second pass would have to group nodes into expressions, in ways specific to that language. Furthermore, it should be addled with metadata about packages, types, and so on. The resulting AST is compiled, fed to macros, etc.
> Why do you care about emitting exactly the code that was parsed? What new apps does it enable?
Got a `gofmt` addiction, can't go back. Auto-formatting should ship with every language.
Briefly skimmed the Go implementation, and you seem to be right: it seems to lose whitespace and enforce its own formatting.
> Are you aware of any languages that perfectly reproduce input layout?
For now just my own.  The language isn't real yet, and might never be realized, but it has a base data notation (very Lisp-like), a parser, and I just started writing a formatter. Because the AST for the data notation preserves whitespace and comments, the formatter can print the code _exactly_ as is. This has interesting repercussions.
For a fully-implemented formatter for a fully-defined language, you wouldn't need whitespace; see Go. However, being able to print everything back means your formatter is usable from the start. It can support one or two simple rules, making only minor modifications, but you can use it on real code right away. Furthermore, this means we'll _always_ be able to choose which rules to enable or disable, which can be handy if the family of languages described in terms of this notation has different formatting preferences. I actually want the formatter shipped with the language, like `gofmt`, to be non-configurable, but this still seems like a useful quality.