Can you explain what the axes are, again? The Y axis is the line; that's pretty clear. But it seems like both the X axis and the Z axis are related to how many characters into the line one goes. (Edit: I just listened to the FAQ, and the Z axis is level of indentation. So for each Y value, there's only one possible Z value, for any number of X values? That is, you can't have both (42, 27, 3) and (42, 27, 4). Is that right?)
On ohayo.computer, you might want to make it obvious that you have to open the developer console to see the output from running the code. This is the last thing I would think to do. Compare it to many other "run code online" sites, e.g. https://repl.it/JtNz/0, where there's both a "run" button so I don't have to hunt for hidden menus that explain what command to press, and a box in which output goes.
On a technical note, Ohayo's fib.fire seems to be broken for me. Every box seems to have "no block 'X' found" in it: http://imgur.com/a/fwM8X. This happens in both Firefox and Chrome.
This presentation doesn't seem to answer many of the questions I have about 3D Languages. Why are they better? You say you expect them to have fewer bugs, but I don't see why. You say that they're easier to understand, but even you have a problem figuring out what the location of a given character is! I want to know why you think something, not just that you think something.
And, not to nitpick, but I really don't get the whole "ETN" phrase. It stands for "Extends Tree Notation", but then it's not even a noun; it's an adjective! So something can't be an ETN; it can only have the quality of ETN. You even seem to acknowledge this. At 7:35 in the video, you say "What fire is is a thing called an ETN. It extends tree notation." You should be able to replace "ETN" with "Extends Tree Notation" and have the sentence still make sense, like "I'm playing a CD" vs "I'm playing a Compact Disc". I'm harping on this because I think it's something I'm misunderstanding, but it seems to be near the core of your argument. Should ETN be a noun? Maybe it should be "Extended Tree Notation". And also, "ETN" seems to be identical to "indentation-based languages with weird syntax".
First, just want to say thank you so much, zck, for taking the time to write this feedback. It is so helpful and very much appreciated!
> Can you explain what the axes are, again? The Y axis is the line; that's pretty clear. But it seems like both the X axis and the Z axis are related to how many characters into the line one goes. (Edit: I just listened to the FAQ, and the Z axis is level of indentation. So for each Y value, there's only one possible Z value, for any number of X values? That is, you can't have both (42, 27, 3) and (42, 27, 4). Is that right?)
Sorry, this is still slightly up in the air. The Y axis is equivalent to line number (this is fixed). The X axis is equivalent to indent level (this is fixed). I'm still running some hardware experiments to determine whether to use the Z axis for the words, modeling these programs as 3-dimensional, or to stick to just the X axis and save the Z dimension for lines connecting assignment words with their references. I realize that's still not clear, and I hope to add something on hover to Ohayo soon, so you can hover over a word and see its X,Y,Z location.
> On ohayo.computer, you might want to make it obvious that you have to open the developer console to see the output from running the code. This is the last thing I would think to do. Compare it to many other "run code online" sites, e.g. https://repl.it/JtNz/0, where there's both a "run" button so I don't have to hunt for hidden menus that explain what command to press, and a box in which output goes.
Thank you so, so much for this feedback! I prioritized the heck out of that and just shipped version 3.0, which opens with the source editor visible on the top left and a console for displaying the output on the bottom left. If you go to a Fire program, you can now put your cursor on any line and press Command+Enter to compile and execute that tree (either just that individual line, or that line and any child lines). You can also press Command+Shift+Enter to just compile the tree and print the compiled JavaScript.
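For illustration only: running "that line and any child lines" amounts to slicing out the subtree rooted at the cursor line. The sketch below is not Ohayo's code; `subtree_at` and `indent_of` are hypothetical names, and it assumes one space of indentation per level and no blank lines inside the selection.

```python
def indent_of(line: str) -> int:
    """Count leading spaces; with one space per level this is the node's depth."""
    return len(line) - len(line.lstrip(" "))


def subtree_at(source: str, y: int) -> str:
    """Return the node on 1-based line `y` plus all of its descendants.

    Descendants are the consecutive following lines indented deeper than line y;
    the first line at the same or a shallower depth ends the subtree. The result
    is de-indented so the selected node becomes a top-level node.
    """
    lines = source.splitlines()
    depth = indent_of(lines[y - 1])
    selected = [lines[y - 1]]
    for line in lines[y:]:
        if indent_of(line) <= depth:
            break
        selected.append(line)
    return "\n".join(line[depth:] for line in selected)


program = "if x\n print yes\n print also\nprint done"
print(subtree_at(program, 1))  # the "if" node and both of its children
print(subtree_at(program, 2))  # just "print yes"
```

A compiler would then be handed only this slice, which is why a single child line can be run on its own.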
> On a technical note, Ohayo's fib.fire seems to be broken for me. Every box seems to have "no block 'X' found" in it: http://imgur.com/a/fwM8X. This happens in both Firefox and Chrome.
I'm sorry! I'm still figuring out best practices for designing ETNs, and I've been changing the instruction words a lot. I'm currently building a suite of utilities that will easily migrate programs from an older ETN version (in this case, Fire) to a newer version. This is definitely a critical need before this thing is ready for people to depend on, and I'm sorry about the trouble now. We have made a lot of progress in figuring out some best practices for ETNs (largely by stealing the best ideas from Haskell), and one of the next big changes will be a highly improved version of Fire that implements those best practices. But even when we have a much better and more stable version of Fire, I expect there will always be room for improvement, so we will put some better systems in place to ensure people don't have to worry about breaking changes. Really sorry about that. I should add more disclaimers.
> This presentation doesn't seem to answer many of the questions I have about 3D Languages. Why are they better?
I hope the new console (which allows you to run/compile just a branch or single line of your program) starts to provide some hints as to how different ETNs are, and how they enable lots of beneficial things not possible in 1-dimensional languages and 1-dimensional terminals.
> You say you expect them to have fewer bugs, but I don't see why.
Much more to come on this. Basically, the thrust is that empirically 99%+ of bugs occur in extraneous parts of the code. ETNs start bringing us closer to the absolute minimum, perfect program necessary to solve a problem. We will see a huge reduction in bugs just from the pipeline ETNs => current high-level languages => machine code; however, that's just the beginning. We're inventing a completely new, non-von Neumann, 3-D machine architecture that can compute high-level ETN programs directly. But realistically, that is years away from being reliable. In the short term, we'll be able to realize a lot of gains in bug reduction just from using ETNs that compile to 1D languages.
> You say that they're easier to understand, but even you have a problem figuring out what the location of a given character is!
I know, very embarrassing! That one is because the Z/X axis debate is still up in the air. Hopefully the right design decision will emerge soon.
> I want to know why you think something, not just that you think something.
A lot of it is empirical. But it's the type of thing where I can see that ETNs will be 10-100x faster because they solve a lot of unsolved or poorly solved programming problems, but they won't be 100x faster until all the other stuff is there too (stuff like the code editor, which we finally just added). At this point I see no deal breakers and am highly confident all the predictions will come true (and then some), but it's still a matter of doing lots of grunt work to eliminate the trivial (but impactful) roadblocks. But yeah, I fully agree with your sentiment and hope we can start to provide more hard data and direct proof of why we think these things.
> And, not to nitpick, but I really don't get the whole "ETN" phrase
This is a great nit! "Extended Tree Notation" is probably better. Thanks!
Wait, you're writing about "3-dimensional source code" and the dimensions aren't settled yet? That just makes me glad I didn't read your slides, and even less likely to put in the effort next time. I'll repeat my earlier comment: your MVPs are too M and insufficiently V.
How are you so sure that you won't settle on 2 or 4 dimensions? (Let us stipulate that 5 is right out.)
> empirically 99%+ of bugs occur in extraneous parts of the code
You'll need to show me these empirical studies.
I haven't actually ever heard a story that accounts for 99% of bugs. Pretty much every software engineering study ends up with a much flatter profile than that. You have to do many things right to eradicate 99% of bugs.
> ETNs start bringing us closer to the absolute minimum, perfect program necessary to solve a problem.
From what I can tell, ETNs are mostly about eliminating punctuation and replacing it with indentation. Is that right? If so, is your claim that "99% of bugs" are hiding in the punctuation?
Is upgrading the syntax to ETNs all that's needed to eliminate 99% of bugs? What about DRY? The value of good interfaces? Parnas's theory of information hiding? SOLID?
I'll trade my pulled-out-of-my-ass theory for yours. I think bugs arise because our representation of algorithms ("code") over-emphasizes the rules the algorithms perform, and under-emphasizes the input space that the rules are meant to operate on. Bugs arise when people modifying the code forget about rare areas of the input space, and the scaffolding around the project is unable to remind them. Nail down the input space, and bugs go down because your tests fail more often. You won't fix this problem no matter how much you tweak the superficial syntax with which you write code. (I work on this, so it was not pulled out of my ass just now: http://akkartik.name/about; https://github.com/akkartik/mu)
Clear and simple syntax / representation is important; combined with matching editing tools it enables us to communicate ideas easily and fluently.
I also like the idea of well defined input spaces. Many theorems or algorithms only work under certain conditions, and much damage has been done by applying them outside of their intended domains. But I think that's only part of the problem.
My own theory is that programs are specifications, and the more clearly and precisely they specify the better. Programs can fit into a matrix of good/bad ideas and good/bad specifications. Of these, two kinds are interesting bugs:
1) Incorrectly specified good ideas
2) Correctly specified bad ideas
Well specified good ideas are correct programs, and incorrectly specified bad ideas are just hopelessly confused.
Improving the languages and tools will never fix bad ideas, but they can make them more obvious. Now the goal is to make programming as close as possible to 'saying what you mean'. In other words, making the semantics as explicit as possible.
Basically my goal is 'declarative programming', which turns out to be a very vague concept to most people. They all agree that it's better, but nobody seems to have a good explanation for why. I think the difference is that declarative programs specify only the relationships that are important, leaving the rest up to the platform to optimize or interpret as it sees fit. This leads to powerful and concise languages such as SQL, but at the cost of placing the burden on the platform rather than the programmer. Good for communication and clarity, bad for development and adoption.
Basically, declarative languages can be more concise because they rely more on shared knowledge; predefined vocabulary. If the language doesn't already have a way to express the concept you want, however, it is much more work to add. Imperative / procedural programs are more flexible because they rely on implicit semantics. You just tell the computer what to do—you don't have to explain what it is doing or why. Everything the program "accomplishes" is imaginary and external to the specification. This leaves very little room for the computer to optimize your selection of operations, and leaves a lot of room for you to accidentally provide an incorrect sequence of steps.
It's like the difference between giving directions by saying "Go to the grocery store at 5th and Main" vs. "Take a left, go three blocks, take a right, go two more blocks, park on the right side of the street and enter the blue building." The first is much clearer, but places much higher expectations on the navigation abilities of the recipient, while the second can be followed by anyone even though they have no idea where they're going - and mistakes are correspondingly harder to notice.
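A small sketch of that contrast, using Python's built-in sqlite3 module to stand in for SQL (the table, column names, and data are made up for the example). The imperative version spells out every step of the route; the declarative version only states the relationship wanted and leaves the execution strategy to SQLite.

```python
import sqlite3

# Made-up example data: (customer, order amount)
ORDERS = [("alice", 30), ("bob", 12), ("alice", 5), ("carol", 40)]


def big_spenders_imperative(rows):
    """Say *how*: iterate, accumulate per customer, then filter and sort by hand."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    result = []
    for customer in sorted(totals):
        if totals[customer] > 20:
            result.append((customer, totals[customer]))
    return result


def big_spenders_declarative(rows):
    """Say *what*: the query states the relationship; SQLite decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer HAVING SUM(amount) > 20 ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


print(big_spenders_imperative(ORDERS))   # [('alice', 35), ('carol', 40)]
print(big_spenders_declarative(ORDERS))  # [('alice', 35), ('carol', 40)]
```

The table setup is boilerplate; the declarative part is the single query, which names grouping, filtering, and ordering without saying how to perform them.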
Sadly, the nature of declarative languages makes them fairly domain specific, which may explain part of why they're so rare and hard to make. Creating a declarative language for solving a class of problems is much harder than solving a single problem imperatively; you actually have to think of how and why you're solving those problems. But I think we could probably create some general patterns and guidelines for defining them, and maybe even start building up some tools to reduce the effort required.
> Wait, you're writing about "3-dimensional source code" and the dimensions aren't settled yet? That just makes me glad I didn't read your slides, and even less likely to put in the effort next time. I'll repeat my earlier comment: your MVPs are too M and insufficiently V.
> How are you so sure that you won't settle on 2 or 4 dimensions? (Let us stipulate that 5 is right out.)
Sorry, the language itself is fully settled. The only open question concerns our hardware prototypes: we've found one way to compute with ETN programs mapped to 2 dimensions, and a machine structure that computes answers from a source program mapped to 3 dimensions. Really, both are 3-dimensional; in the former, the Z axis just doesn't vary.
In the 3-D version, the first word of a node (aka the head/base/instruction/type/command) is at z = 1, and subsequent words go up the z-stack. In the 2-D version, subsequent words just go up the x dimension. They both offer advantages, and I'm sure we'll figure out which is better in the next year or so.
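A hypothetical sketch of the two placements, assuming Tree Notation's usual conventions (one node per line, one leading space per indent level, single spaces between words) and 1-based coordinates where a node's head word sits at x = indent level + 1. `word_coordinates` and its `mode` parameter are illustrative names, not part of Ohayo or any released tool.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (y, x, z), all 1-based


def word_coordinates(source: str, mode: str = "3d") -> List[Tuple[str, Coord]]:
    """Place every word of a program at a hypothetical (y, x, z) coordinate.

    y is the line number and x is the indent level + 1 for the head word.
    mode="3d": later words on the line climb the z axis (head at z = 1).
    mode="2d": z stays at 1 and later words climb the x axis instead.
    """
    placed = []
    for y, line in enumerate(source.splitlines(), start=1):
        indent = len(line) - len(line.lstrip(" "))
        for i, word in enumerate(line[indent:].split(" ")):
            if mode == "3d":
                placed.append((word, (y, indent + 1, 1 + i)))
            else:
                placed.append((word, (y, indent + 1 + i, 1)))
    return placed


program = "print hello world\n add 1 2"
for word, coord in word_coordinates(program, "3d"):
    print(word, coord)
```

In the "3d" placement every word on a line shares that line's x, so x alone gives the depth; in the "2d" placement x mixes depth and word position, which is the trade-off being weighed.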
Again, this stuff is at the cutting edge of the hardware research. We're talking about a whole new type of machine architecture without registers.
> You'll need to show me these empirical studies.
Totally agree. We will.
> I haven't actually ever heard a story that accounts for 99% of bugs. Pretty much every software engineering study ends up with a much flatter profile than that. You have to do many things right to eradicate 99% of bugs.
Agreed. And to hit that 99%, we're going to need the new hardware, so that is quite far off (3 to 20 years, hard to predict). But we can get 90% fewer bugs with ETN software alone.
> From what I can tell, ETNs are mostly about eliminating punctuation and replacing it with indentation. Is that right?
No. Forget about the punctuation of newlines and spaces. Think about it as Cartesian coordinates. ETNs are about giving source code physical dimensions, about making sure that source code could be built directly out of circuitry. Think of ETN programs as something you could build in a voxel editor like MagicaVoxel. Each block holds a word, which is just a number from 0 to infinity, and programs are trees of these numbers connected in physical space. Sorry if that's not clear. I think the more code and tools we build, the easier it will be to understand.
> Is upgrading the syntax to ETNs all that's needed to eliminate 99% of bugs?
No. To reduce bugs by 90% (99% won't be possible until we have ETN machines), you also need well-designed ETNs, with good functional-programming traits like no side effects, prefix notation, DRY, good naming, good interfaces, et cetera. Great question. I'm working on a release shortly with a lot more tools and help for building great ETNs.
> Nail down the input space, and bugs go down because your tests fail more often.
I like that! I'm a big fan of strongly typed languages and the idea that they basically prove your program correct at compile time if you think more about your types.
Thanks for the feedback! I hope the next wave of ETN stuff will help start to demonstrate the benefits better.
While the concept of a Cartesian program space is interesting, it seems largely unrelated to TNs. This is probably a good thing though, as programs require semantic relationships ("lines" between nodes) that are lacking in Cartesian spaces. If there was semantic significance to adjacency or distance between points, or along each axis, that might be reasonable. Otherwise the "dimensions" are just an irrelevant and cumbersome alternative to line numbers.
Additionally, a third dimension is meaningless as long as your fundamental representation is two-dimensional. Unless you use an editor that is natively three-dimensional, mapping a two-dimensional representation onto three dimensions will leave a lot of redundancy or sparseness, as demonstrated by your conflation of x and z.
> If there was semantic significance to adjacency or distance between points, or along each axis, that might be reasonable
Yes, there is semantic significance to adjacency & distance from the y-axis (which indicates an edge that connects parent and child nodes).
We are approaching everything simultaneously from the highest abstract level and the lowest logical level. We have some more stuff coming out soon that shows off the benefits of the dimensionality more. One of the cooler experiments is a new type of processor with a graph-paper-esque 2D grid that can load a high-level tree program and then execute it directly (no cumbersome series of transformations to a bunch of 64-bit registers). AFAIK this is original, though I wouldn't be surprised if Lisp Machines, Thinking Machines, Alteryx, Nvidia, Intel, et cetera have dabbled in this space a bit (though to date I haven't been able to find anything on machines that execute trees directly).
> Yes, there is semantic significance to adjacency & distance from the y-axis (which indicates an edge that connects parent and child nodes).
Actually, it seems like your tree relationships have a very confusing relationship to the coordinates. Adding a newline increments Y, and a space increments X, but children are those nodes such that
1) child.Y > parent.Y
2) child.X == parent.X + 1
With the additional complication that only the node with the lowest X value for a given Y becomes the child; all others on the same line become part of the content of that node.
This means that the relationships between two elements depends not just on their coordinates, but also the coordinates of nearby nodes. (6, 4) may or may not be a direct child of (5, 3); it depends on if (5, 3) is a full node, or just a content element that's actually part of (5, 2) or (5, 1).
So the coordinates do not actually define the relationships between nodes; they do not clearly relate to the tree structure at all.
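A sketch of that dependence, assuming one space per indent level and looking only at each line's head word (the lowest-X word on the line): rules 1) and 2) by themselves can match several earlier lines, and the actual parent is only found by scanning back to the nearest shallower line. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]  # (y, x): 1-based line number, indent level + 1


def head_coords(source: str) -> List[Coord]:
    """Coordinate of each line's head word, assuming one space per indent level."""
    return [(y, len(line) - len(line.lstrip(" ")) + 1)
            for y, line in enumerate(source.splitlines(), start=1)]


def rule_candidates(coords: List[Coord], child: Coord) -> List[Coord]:
    """Every node satisfying only 1) child.Y > parent.Y and 2) child.X == parent.X + 1."""
    return [p for p in coords if child[0] > p[0] and child[1] == p[1] + 1]


def actual_parent(coords: List[Coord], child: Coord) -> Optional[Coord]:
    """Scan upward from the child; the first shallower line is the parent.

    Returns None for top-level nodes and for lines indented more than one level.
    """
    for p in reversed([c for c in coords if c[0] < child[0]]):
        if p[1] < child[1]:
            return p if p[1] == child[1] - 1 else None
    return None


program = "a\n b\n c\n  d"
coords = head_coords(program)               # [(1, 1), (2, 2), (3, 2), (4, 3)]
print(rule_candidates(coords, (4, 3)))      # [(2, 2), (3, 2)]
print(actual_parent(coords, (4, 3)))        # (3, 2)
```

Here (2, 2) satisfies both rules but is not the parent of (4, 3); the intervening line 3 is, so the answer depends on the lines in between and not just on the two coordinates.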
> have a very confusing relationship to the coordinates
Agreed. I sometimes get confused too.
One rule that always holds is this:
1) One line === One node
So every node has an absolute Y coordinate (just the line number), but also relative coordinates with respect to its ancestor(s).
Both are useful at various times. There's probably a better way to eliminate confusion here.
> So the coordinates do not actually define the relationships between nodes
Given an array of node coordinates {y,x} [{1,1}, {2,2}, {3,1}, {4,2}], one has enough information to define the whole tree structure of the program. But you are right: you need the full set of coordinates of a node's ancestors to properly know its coordinates, and given only a line that begins with one or more spaces, it is impossible to deduce how many nodes deep it is without also having access to the previous line(s).
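A minimal sketch of that reconstruction, assuming the coordinates arrive in line order and x = 1 means top level: a stack of open ancestors is enough to recover every parent/child edge. `build_tree` and the synthetic root at line 0 are illustrative choices, not part of any Tree Notation spec.

```python
from typing import Dict, List, Tuple


def build_tree(coords: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each line number to its direct children's line numbers.

    Assumes coords are (y, x) pairs in line order with x >= 1, where x == 1 is
    top level. Line 0 is a synthetic root holding the top-level nodes.
    """
    children: Dict[int, List[int]] = {0: []}
    ancestors = [(0, 0)]  # (line, x) of every currently open ancestor; root has x = 0
    for y, x in coords:
        # Close ancestors at the same depth as this node or deeper.
        while ancestors[-1][1] >= x:
            ancestors.pop()
        parent_line, parent_x = ancestors[-1]
        if parent_x != x - 1:
            raise ValueError(f"line {y} is indented more than one level past its parent")
        children[parent_line].append(y)
        children[y] = []
        ancestors.append((y, x))
    return children


print(build_tree([(1, 1), (2, 2), (3, 1), (4, 2)]))
# {0: [1, 3], 1: [2], 2: [], 3: [4], 4: []}
```

Because the list is processed in order, each node's parent is known by the time it is read, which is the same reason a single line viewed in isolation is not enough.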
I think the TN/ETN parsing model is somewhat neat in its simplicity, which means it will probably have some longevity.
However, most of the work you have done is just a simplification of the syntax; it has no relation to the semantics whatsoever, and as such is unlikely to cause a major paradigm shift.
Perhaps the coolest part of your notation is the concept of constant validity, which in this case you achieved by simplifying the notation until it matched the medium. Every atomic operation on the text (add a character, new line, or space) is also a valid atomic operation on the tree. This is especially nice because it works with any text editor, rather than requiring fancy semantically (or at least syntactically) aware editors. However, I think any true advances in programming will require improvements in the semantics.
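One way to see constant validity is a parser with no error path at all, so that every prefix of a program (every intermediate state while typing it) still yields a tree. This is a hypothetical sketch, not the official Tree Notation parser; in particular, the rule for over-indented lines (keep extra leading spaces as content rather than failing) is an assumption made here to keep the parser total.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    words: List[str]
    children: List["Node"] = field(default_factory=list)


def parse(source: str) -> Node:
    """Parse ANY string into a tree; there is deliberately no error path.

    Assumed rule for over-indented lines: a line may sit at most one level
    deeper than the line before it, and any extra leading spaces are kept as
    content (they split into empty words) instead of raising an error.
    """
    root = Node(words=[])
    open_nodes = [root]  # open_nodes[d] is the parent for a line at depth d
    for line in source.split("\n"):
        spaces = len(line) - len(line.lstrip(" "))
        depth = min(spaces, len(open_nodes) - 1)
        node = Node(words=line[depth:].split(" "))
        open_nodes[depth].children.append(node)
        del open_nodes[depth + 1:]  # close nodes at this depth or deeper
        open_nodes.append(node)
    return root


program = "if x\n print yes\nprint done"
# Simulate typing the program one character at a time: every prefix parses.
for i in range(len(program) + 1):
    parse(program[:i])
```

Word-level mistakes (an unknown keyword, a bad argument) are still possible, but they surface as checks on an existing tree rather than as parse failures.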
> However, most of the work you have done is just a simplification of the syntax; it has no relation to the semantics whatsoever,
Agreed. However, I think one thing that is starting to emerge from our data (17 useful ETNs now compiling to JavaScript, Rust, TypeScript, Logo, Haskell, C++, LLVM IR, SQL, HTML, CSS, JSON, and Regular Expressions) is how well this Tree Notation syntax can work for every programming paradigm (functional, imperative, declarative, dataflow, OO, logic, stack, ...). Perhaps it is best explained as a universal syntax. The neat thing about this is that once you learn the TN syntax, you now know the complete syntax for languages with very different semantics. So while I agree we aren't changing semantics here yet (we're just leveraging the semantics and VMs of existing languages), this universal syntax could be big in that it can lead to better cross language static tools and enable developers who generally stick to one or two paradigms to make use of more.
> Perhaps the coolest part of your notation is the concept of constant validity
Agreed! The elimination of parse errors is one of my favorite features. Of course, the user can still make errors at the ETN level, like mistyping a word or providing invalid parameters to a node. To help catch and fix these kinds of errors, I just launched version 5.0 of Ohayo (Ohayo is still shitty, but the core is getting really solid), which includes a revamped compiler-compiler that supports 100% type checking of every word in your program. It makes it easy to create, as you say above, "well defined input spaces".
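As a rough illustration of per-word type checking (not Ohayo's actual compiler-compiler or grammar format, which this sketch does not attempt to reproduce), a grammar can map each head word to the types of the words that follow it, and a checker can then report every mistyped word. `GRAMMAR`, `WORD_TYPES`, and `check` are hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical grammar: each head word lists the types of the words after it.
GRAMMAR: Dict[str, List[str]] = {
    "print": ["any"],
    "add": ["int", "int"],
    "repeat": ["int"],
}

# Hypothetical word types: each is a predicate over a single word.
WORD_TYPES: Dict[str, Callable[[str], bool]] = {
    "any": lambda word: True,
    "int": lambda word: re.fullmatch(r"-?[0-9]+", word) is not None,
}


def check(source: str) -> List[str]:
    """Return one message per bad word; an empty list means every word type-checks."""
    errors = []
    for y, line in enumerate(source.splitlines(), start=1):
        head, *args = line.strip(" ").split(" ")
        if head not in GRAMMAR:
            errors.append(f"line {y}: unknown keyword '{head}'")
            continue
        expected = GRAMMAR[head]
        if len(args) != len(expected):
            errors.append(f"line {y}: '{head}' takes {len(expected)} word(s), got {len(args)}")
            continue
        for word, type_name in zip(args, expected):
            if not WORD_TYPES[type_name](word):
                errors.append(f"line {y}: '{word}' is not a valid {type_name}")
    return errors


print(check("repeat 3\n add 1 two\n prnt hi"))
```

Because every word has a declared type, the "input space" of each node is spelled out in the grammar rather than implied by the code that consumes it.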
> this universal syntax could be big in that it can lead to better cross language static tools and enable developers who generally stick to one or two paradigms to make use of more.
An alternate syntax will not allow you to use any additional paradigms unless you also provide alternate semantics. It might enable more powerful editing tools or effective macros and metaprogramming though.
> An alternate syntax will not allow you to use any additional paradigms unless you also provide alternate semantics.
Right. The syntax for ETNs is the same, but the semantics are different. For example, I have a language called "Flow" that is a data flow language, passing a matrix through a series of nodes. I also have a logic language called "Project", that can solve relational issues among nodes. Different semantics, identical syntax.
Right now to use different paradigms, a user generally has to learn different semantics and different syntaxes. This eliminates the latter.
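To illustrate "different semantics, identical syntax" with something much smaller than Flow or Project: the two toy interpreters below are hypothetical, invented for this example. They read exactly the same source but disagree on which direction data flows, top down versus bottom up.

```python
from typing import Callable, Dict, List, Tuple

# Shared vocabulary for both toy languages.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda acc, n: acc + n,
    "multiply": lambda acc, n: acc * n,
}


def parse(source: str) -> List[Tuple[str, int]]:
    """Shared syntax: one node per line; the head word is the op, the next word its argument."""
    nodes = []
    for line in source.splitlines():
        head, arg = line.split(" ")
        nodes.append((head, int(arg)))
    return nodes


def run_top_down(source: str, start: int = 1) -> int:
    """Toy language A: the value flows from the first line to the last."""
    acc = start
    for head, arg in parse(source):
        acc = OPS[head](acc, arg)
    return acc


def run_bottom_up(source: str, start: int = 1) -> int:
    """Toy language B: identical syntax, but the value flows from the last line up."""
    acc = start
    for head, arg in reversed(parse(source)):
        acc = OPS[head](acc, arg)
    return acc


program = "add 2\nmultiply 3"
print(run_top_down(program))   # (1 + 2) * 3 = 9
print(run_bottom_up(program))  # (1 * 3) + 2 = 5
```

The parser is shared; only the interpreters differ, so learning the syntax once covers both, while the meaning of the same text still has to be learned per language.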
Is that a good thing, though? A classic design principle is that similar things should look similar and different things should look different. Imagine a project with both Flow and Project files. Wouldn't it be nice to be able to tell them apart at a glance?
> A classic design principle is that similar things should look similar and different things should look different.
Agreed, but I think context eliminates such a need. If this comment were about cooking it would look the same. We reuse one writing system.
> Imagine a project with both Flow and Project files. Wouldn't it be nice to be able to tell them apart at a glance?
Ah, good point! So far it hasn't been a problem, but I imagine there may be issues as the number of Tree Languages (note--I took the feedback and dropped the "ETN" acronym) and combinations increase. It might emerge that there are some universal best practices, so semantics won't change too markedly from one language to the next. But I think it could be that semantics vary a lot. Right now I have some languages where flow goes forwards (top down) or backwards (children up), and others that are stack based, parallel, synchronous, et cetera. I personally haven't had trouble keeping them straight just by knowing the context, but that is not necessarily a predictor of how it will go for other people (or even me) in the future. We shall see.
Another similar problem is when you have a file with both Flow and Project code (something that actually comes up a lot).
What happens when 2 languages use the same keyword but with different semantics and it happens that a 3rd language embeds them both? It might cause some confusion. Or even just the basic problem of doing color highlighting for one language in a node of another--how do you ensure the color schemes don't conflict? Perhaps a border or something would do the trick. Problems to solve in the future.
> If this comment were about cooking it would look the same. We reuse one writing system.
That's true, but the fact that we're both able to make analogies just suggests that analogies aren't a good defense for your system. It isn't self-evident that "eliminating different syntaxes" is always a good thing. You need to actually take the trouble to motivate it.
In my experience the hard part of dealing with polyglot systems is juggling the different semantics. Syntax is in the noise. Should it be the same or different? It just doesn't seem worth thinking about.
Don't get me wrong, I find Lisp's uniform syntax very helpful. But Lisp is helpful also because of its (relatively) uniform semantics. While adding Lisp syntax atop say Erlang seems useful, mixing LFE and regular Scheme would be a nightmare.
> What happens when 2 languages use the same keyword but with different semantics and it happens that a 3rd language embeds them both?
Yes, I can relate to this question. For example, here's a fragment from the Mu codebase where I embed tests containing Mu programs in my C++ implementation: http://akkartik.github.io/mu/html/040brace.cc.html#366. The Mu instruction is `return-if`, but because it's in a C++ file, just the `return` is highlighted. Super ugly.
My take-away from all this: polyglot systems are a bad idea. Mu's implementation being in C++ is hopefully a temporary state of affairs. We shouldn't be picking "the right tool for the job". Software is more malleable than past tools. We should be tweaking our one language to do everything the job needs.
So rather than try to come up with solutions for polyglot programming, I'd just discourage it altogether.