Looking at the adoption of languages like Lisp and Arc, I believe great documentation is crucial to their use. I don't use languages with bad documentation.
Here is some great documentation, but it's not sustainable: people can't contribute to it or generate their own documentation.
http://files.arcfn.com/doc/index.html
I am interested in what the community thinks about contributing to a built-in documentation system for Arc.
I am interested in people's opinions on how to architect this mechanism.
One example, as a wrapper macro or function:
(adoc arcpackage "just prints its argument"
  (def foo x (prn x)))
An example embedded within the def call:
(def foo (adoc :package pexample :description "foo just prints its argument") x (prn x))
What are your ideas and concepts for this? I think it's crucial to the adoption of Arc, and of any language.
I would just like to point out that I love Python's doctest module. It lets you embed Python code in a docstring, which is then evaluated for correctness:
def foo():
    """
    >>> foo()
    42
    """
    return 42
When you use doctest, it will look for >>>, evaluate it, then check if the result matches the expectation. It's great because:
A) it's really simple and easy to use
B) unlike prose, the documentation is actually evaluated, so it's kept up to date
C) provides documentation for how the program is expected to behave in various situations
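To make point B concrete, here is a minimal sketch of running the examples in a single docstring and seeing a stale one get caught. The helper `run_docstring` and the function `stale` are made up for illustration; only the `doctest` classes come from the standard library.

```python
import doctest


def foo():
    """
    >>> foo()
    42
    """
    return 42


def stale():
    """
    This docstring was not updated after the code changed.

    >>> stale()
    1
    """
    return 2


def run_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted, report)."""
    parser = doctest.DocTestParser()
    # The globs dict makes the function visible to its own examples.
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    report = []  # failure messages are collected here instead of printed
    results = runner.run(test, out=report.append)
    return results.failed, results.attempted, "".join(report)
```

Calling `run_docstring(foo)` reports no failures, while `run_docstring(stale)` reports one, along with a message showing the expected and actual output.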
---
In fact, I usually don't write unit tests for my JavaScript code, but I write doctests for my Python code all the time, simply because it's so darn easy to use.
What if you don't want to embed the tests in your code itself? You can also tell doctest to load and eval a file instead:
# tests/foo
>>> foo()
42
Then in Python:
import doctest
doctest.testfile("tests/foo")
I'm using this approach in PyArc with great success, with hundreds of tests spread across 9 files. And the doctest module lets you embed prose/comments in the tests as well:
# tests/foo
The foo function is very very very complicated:
>>> foo() # this calls the foo function
42
Thus, it provides a way to easily document and write unit tests at the same time. Such a system is not necessarily competing with things like Javadoc, but instead could be used to complement them.
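For anyone who wants to try the file-based variant, here is a small self-contained sketch. It writes the test text to a temporary file so it can be run anywhere; the file name `foo.txt` and the helper `run_test_file` are assumptions for illustration. Note that `doctest.testfile` starts from an empty namespace unless `globs` is given, so `foo` is passed in explicitly.

```python
import doctest
import os
import tempfile


def foo():
    return 42


TEST_FILE = """\
The foo function is very very very complicated:

    >>> foo()  # this calls the foo function
    42

Prose between the examples is simply skipped.
"""


def run_test_file(text, globs):
    """Write text to a temporary file and check it with doctest.testfile."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        # module_relative=False lets us pass an ordinary filesystem path.
        results = doctest.testfile(path, module_relative=False,
                                   globs=dict(globs), report=False)
    return results.failed, results.attempted
```

`run_test_file(TEST_FILE, {"foo": foo})` returns the number of failed and attempted examples for the file.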
---
By the way, unit testing isn't exactly a new idea: aw already has a collection of unit tests written in Arc. The reason I like doctest is because it lowers the cost of entry as much as possible. Sure, you could use something like this:
(testis (foo) 42)
But I think it's easier and clearer to use something like this:
>>> (foo)
42
Another difference is that you don't need specialized functions like test-iso, testnil, testt, catcherr, etc. It matches the printed output with the expected output:
>>> '(1 2 3)
(1 2 3)
>>> nil
nil
And it would work for errors, too:
>>> (err "foo")
error: foo
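As a rough sketch of how such a checker could work (written in Python only because that's where doctest lives; nothing like this exists in Arc as far as this thread says), the transcript can be split into prompt/expected pairs and compared against whatever an evaluator prints. The evaluator here, `fake_arc_eval`, is a lookup table standing in for a real Arc REPL, and all of the names are hypothetical.

```python
def parse_transcript(text):
    """Split a transcript into (source, expected_output) pairs.

    A line starting with '>>> ' begins an example; the lines after it, up to
    a blank line or the next prompt, are the expected printed output.
    """
    examples, source, expected = [], None, []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last example
        stripped = line.strip()
        if stripped.startswith(">>> ") or stripped == "":
            if source is not None:
                examples.append((source, "\n".join(expected)))
            source, expected = (stripped[4:] if stripped else None), []
        elif source is not None:
            expected.append(stripped)
    return examples


def check_transcript(text, evaluate):
    """Return (source, expected, got) for every example that does not match."""
    failures = []
    for source, expected in parse_transcript(text):
        try:
            got = evaluate(source)
        except Exception as exc:  # errors are compared as printed text too
            got = "error: " + str(exc)
        if got != expected:
            failures.append((source, expected, got))
    return failures


def fake_arc_eval(source):
    """Stand-in for an Arc REPL: a fixed lookup table, purely illustrative."""
    if source == '(err "foo")':
        raise RuntimeError("foo")
    return {"(foo)": "42", "'(1 2 3)": "(1 2 3)", "nil": "nil"}[source]


TRANSCRIPT = """
>>> (foo)
42
>>> '(1 2 3)
(1 2 3)
>>> nil
nil
>>> (err "foo")
error: foo
"""
```

The only thing the checker needs from the language is a function from source text to printed text, which is why no `test-iso`-style helpers show up.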
This isn't meant to be a slight on aw: most unit testing frameworks I've seen do something similar, including QUnit (which is what I use for JS). Even my own Arc unit tester that I quickly whipped together does it like that. But I like Python's doctest approach.
The code site that I'm working on is in fact primarily driven by a desire to fix the documentation problem.
My approach is to make documentation more wiki-like so that multiple people can easily contribute. I find examples particularly useful in documentation, so a "pastebin for examples" is another major part of my work.
I hope that this will turn out to be more useful than the traditional approach of embedding documentation in the program source because it will make it easier for other people to contribute to the documentation.
You beat me to the punch by 25 minutes. ^_^ My reply below (at http://arclanguage.org/edit?id=14084) covers the same sort of stuff as yours but in a longer and more scatterbrained way.
If that's all you want, Anarki already has it. ^_^
I'm interested in finding a documentation solution that scales up to programming-in-the-large, without being the cumbersome kind of thing that people would have trouble using even if they wanted to.
In particular, I like the Racket approach of having both an API reference, where people look up the meanings of particular functions, and a user guide, where people learn the design of the system and develop a better idea of which functions they should actually be looking for.
In-source docstrings (or Javadoc comments and the like) should be an okay way to get good coverage in the API reference, but they're kinda independent of each other and hard to automatically organize, especially in a language without modules or classes. They're not even close to being good pieces for walkthrough-like user guides.
Literate programming, meanwhile, encourages code to be arranged as though it's a user guide. (This may be just one style of literate programming; I imagine people who care about literate programming mainly just care about good documentation.) The Inform 7 docs are an awesome example of this:
However, there are usually at least some parts of the code that are left as an exercise for the reader to understand. :-p This approach also straddles the line between guide and reference without necessarily being the best at either. There's kind of a cross-cutting concern going on, the more roles you try to give to a single codebase, and it kinda drives me in the reverse of literate programming: I almost suspect it would be most ideal just to put nothing but implementation notes in the code comments, and to have code be simple enough to stand on its own (like the current state of Arc).
This ties back in with another, more important part of the Racket discussion: The version of the program you have shouldn't determine the version of the docs you have. In fact, the docs oughta be a living document that immediately collects improvements and discussions, oftentimes faster than the codebase changes. They should be some form of wiki or help forum.
I'll go farther than that: Keeping comments in the code is a brittle system because they'll become inconsistent with the real docs. Instead, people should browse the code on a website that automatically lets them see the most recent versions of the wiki content. There'd essentially be a CMS that manages:
- The code itself.
- Snippet-local API docs.
- Comprehensive, listing-like API references.
- User guides.
- Examples of various sizes (like aw's "pastebin for examples" idea).
- Tutorials.
- Bug tracking entries.
- Freeform discussions on all these things.
This is bigger than I expect to design all by myself, for sure. :-p
I definitely agree with you that there is a difference between API references and user guides. Languages do need both.
You bring up an excellent point that documents should be living documents and can evolve faster than the version of the code. With a community this small I think it's easiest to bundle the two together since anyone can contribute to anarki.
Maybe Anarki contributors should designate a collaborative location that can serve as the official site for documentation, at least for user guides, tutorials and FAQs.
For discussions I think we would all agree that arclanguage.com is the best place.
"With a community this small I think it's easiest to bundle the two together since anyone can contribute to anarki."
I like that in concept. We could just have a GitHub Pages branch on Anarki, and GitHub would automatically let us view it as a web page. However, aw mentioned having trouble with GitHub Pages: http://arclanguage.org/item?id=12934
Someone could rig up a website to serve GitHub raw file views as HTML, but I don't know if that's nice to GitHub. :)
Someone could instead have a website that somehow keeps an up-to-date clone of Anarki (perhaps triggering a "git pull" not only as a cron job but also every time a certain page was viewed) and somehow uses that to determine the website content.
One thing to consider is security: If anyone can show their own JS code on this page, they could set tracking cookies or something. If anyone can run Arc code on the server, there's even more to worry about (albeit nothing the racket/sandbox module isn't designed for).
---
"Maybe Anarki contributor should designate a collaborative location that can serve as the official site for documentation for atleast user guides, tutorials and faq's."
However, having a separate place for documentation is only one part of what I'm suggesting. I'm not sure it's worth it unless the separate parts are somehow integrated again--for instance, by showing docs and discussions as you browse code, or by letting user guide writers say {{doc:anarki:lib/util.arc:afnwith}} or somesuch to include a piece of the API reference.
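A minimal sketch of what expanding such an include could look like, assuming the directive has the three-part `project:file:name` shape shown above and that the scraped API docs live in a dictionary. The store `API_DOCS`, its placeholder entry, and the function name are all invented for illustration.

```python
import re

# Hypothetical store of scraped API docs, keyed by (project, file, name).
API_DOCS = {
    ("anarki", "lib/util.arc", "afnwith"): "(placeholder docstring for afnwith)",
}

# Matches {{doc:project:file:name}}; no part may contain ':' or braces.
DIRECTIVE = re.compile(r"\{\{doc:([^:{}]+):([^:{}]+):([^:{}]+)\}\}")


def expand_includes(guide_text, api_docs):
    """Replace each {{doc:project:file:name}} with the matching API entry."""
    def lookup(match):
        # Leave unknown references visible rather than silently dropping them.
        return api_docs.get(match.groups(), match.group(0))
    return DIRECTIVE.sub(lookup, guide_text)
```

Leaving unresolved directives in place makes broken cross references easy to spot in the rendered guide.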
---
"For discussions I think we would all agree that arclanguage.com is the best place."
Speaking of which, are arclanguage.com and arclanguage.org both legitimate? Both of their WHOIS entries list Paul Graham, but I don't know whether that means anything. I've never logged in anywhere but arclanguage.org, just because it's what most people link to.
Anyway, we totally do use the Arc Forum for discussions now, but I think things would be better if we could incorporate ideas like the ones from this thread: http://arclanguage.org/item?id=12920
I imagine that my complaints about GitHub Pages at the time were probably just growing pains on GitHub's part.
However, precisely because we'll want to implement our own features at some point, such as the cross references you mention, I expect that we're going to want to do our own processing. Which suggests that GitHub Pages or the arclanguagewiki on Google Sites might be part of the right long term solution, but only if there's a way to e.g. insert the piece of an API reference... which we're generating.
Here's a thought. What if we had a server which went out and gathered documentation source material from various places such as Anarki? (GitHub has http://help.github.com/post-receive-hooks/ so the server could get notified of new pushes to Anarki instead of having to poll.)
The server would work on the text of the sources, such as docstrings found in the Anarki source code. That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser). The server would process the documentation source material and generate static HTML files... which could be hosted on S3 or GitHub Pages. This would have an additional advantage that even if the server were down, the documentation itself would still be up and available.
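A very rough sketch of that pipeline, treating the source purely as text and escaping everything on the way out. The regular expression is an assumption about how docstrings appear (a string directly after the parameter list of a `def` or `mac`); a real scraper would want a proper reader, and the function names are hypothetical.

```python
import html
import re

# Rough pattern for `(def name params "docstring" ...)` and `(mac ...)`.
# Assumes the docstring directly follows the parameter list.
DEFINITION = re.compile(
    r'\((def|mac)\s+(\S+)\s+(\([^()]*\)|[^\s()]+)\s+"((?:[^"\\]|\\.)*)"'
)


def scrape_docstrings(source_text):
    """Return (kind, name, params, docstring) tuples found in the text."""
    return [match.groups() for match in DEFINITION.finditer(source_text)]


def render_page(entries):
    """Render scraped entries as static HTML, escaping every scraped string."""
    items = []
    for kind, name, params, doc in entries:
        items.append(
            "<dt><code>({} {} {})</code></dt><dd>{}</dd>".format(
                html.escape(kind), html.escape(name),
                html.escape(params), html.escape(doc))
        )
    return "<dl>\n" + "\n".join(items) + "\n</dl>\n"


SAMPLE = '''
(def foo (x)
  "Prints <script>alert(1)</script> and returns x."
  (prn x))

(def bar (y) (+ y 1))
'''
```

Because nothing is ever evaluated and every scraped string goes through `html.escape`, a malicious docstring ends up as inert text in the generated page.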
"The server would work on the text of the sources, such as docstrings found in the Anarki source code."
With this approach, people might be pushing to Anarki way more, sometimes using in-browser file edits on GitHub, and the server would have to scrape more and more things each time. Then again, that would be a good problem to have. :-p
---
"That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser)."
By the same token, it would be harder for just anyone to update the server, right? Eh, that might be a necessity for security anyway.
Potentially, parts of the server could run Arc code in a sandbox, incorporating the Arc code's results into the output with the help of some format that's known to have no untrusted JavaScript, like an s-expression equivalent of BBCode or something.
Well, code that generates page contents.... Suppose I want to put "prev" and "next" links on several pages, or suppose I want an API reference to automatically loop through and include all the docstrings from a file. Lots of this could be up to the server to do, but I'd like for the documentation itself to have some power along these lines. For instance, someone might write a DSL in Arc and want to set up a whole subsite covering the DSL's own API.
Besides that, it would just be nifty to have the Arc documentation improve as people improved the Arc libraries and vice versa.
"Suppose I want to put "prev" and "next" links on several pages, or suppose I want an API reference to automatically loop through and include all the docstrings from a file."
I'd just have the server code do that.
"For instance, someone might write a DSL in Arc and want to set up a whole subsite covering the DSL's own API."
Sorry, not following you here. How would this be different?
"Besides that, it would just be nifty to have the Arc documentation improve as people improved the Arc libraries and vice versa."
Certainly. Naturally the server code can be written in Arc itself.
Say this DSL is a stack language written in Arc, called Starc, and Starc programs are implemented by lists of symbols. I've set up a global table to map from symbols to their meanings, and I have a 'defstarc macro that submits to that table and supports docstrings.
Now I want my language to have documentation support that's seamless with Arc's own documentation. Somehow I need my Starc documentation to be split across multiple pages, with some pages created using the 'defstarc docstrings. I want Starc identifiers to be displayed in a different style than Arc identifiers, but if anything, I want it easier for a Starc programmer to refer to Starc identifiers in the documentation than to Arc identifiers.
So every time I come up with one of these requirements for the documentation, I should submit a patch to the server or something? Fair enough--the code implementing the documentation oughta be documented somewhere too, and keeping it close to the project also makes it more ad hoc and inconsistent--but I think this would present a bit of an obstacle to working on the documentation. I'd rather there be a compromise, where truly ad hoc and experimental things were doable in independent projects and the most useful documentation systems moved to the server code gradually.
This would be more complicated to design, and it could probably be incorporated into a more authoritarian design after it's underway, so no worries.
- you run a copy of the server code you're working on locally, until you see that your "Starc" documentation is being integrated into the rest of the documentation in the way that you want it to
- you push your changes to the server (say, via GitHub for example) and they go live
OK, but what if you're a completely random person, you've never posted anything to arclanguage.org, no one knows who you are, and you want write access to the server so that you "can do stuff". Alright, fork the repo on GitHub, push your changes there, and send a pull request. Then when you turn out to be someone who isn't trying to install malicious JavaScript you are given write access to the server repo yourself. (This is a pretty standard approach in open source projects, by the way.)
But... what if write access to the server repo ends up being controlled by an evil cabal of conservatives who reject having any of this "Starc" stuff added? Fire up your own server, publish the documentation pages yourself, and people will start using your documentation pages because they are more complete than the old stuff.
My concern with the sandbox idea is that I imagine it's going to be hard to create a sandbox that is both A) powerful enough to be actually useful, and B) sufficiently constrained so that there's no possible way for someone to manage to generate arbitrary Javascript.
I'm finding this discussion very helpful, by the way. What I'm spending my time on now is the "pastebin for examples" site. I've been wondering if this project would stay focused on just the examples part (with the ability for other documentation sites to embed examples from the pastebin site) or if it would expand to be a site for complete documentation itself (the "code site for Arc" idea).
For the pastebin site I've thrown away several designs that weren't working and I've found one that so far does look like it's going to work. But, the catch is that by design it allows the site to execute arbitrary code in the target machine that's running the example. This isn't too terrible by itself (you can always run the example in a virtual machine or on an Amazon EC2 instance etc. instead of on your own personal computer if you want), but it does mean that the "pastebin for examples" site is going to need a higher level of security than an Arc documentation site.
Which in turn implies that while the Arc documentation site can use examples from the pastebin site (if people find it useful), the pastebin site itself shouldn't be expanding to take on the role of the Arc documentation site (since the Arc documentation site can and should allow for a much freer range of contributions).
"But... what if write access to the server repo ends up being controlled by an evil cabal of conservatives who reject having any of this "Starc" stuff added?"
The main thing I'm afraid of is the documentation site becoming stagnant. Too often, someone finds the arclanguage.org website and asks "How do I get version 372 of MzScheme?" Too often, someone who's been reading arcfn.com/doc the whole time finally looks at the Arc source and starts a forum thread to say "Look at all these unappreciated functions!" ^_^
I don't blame pg or kens; I blame the fact that they don't have all the time in the world to do everything they want. I'm in the same position, and I bet it's pretty universal.
---
"Fire up your own server, publish the documentation pages yourself, and people will start using your documentation pages because they are more complete than the old stuff."
That could be sufficient. But then while I'm pretty active on this forum, I'm not sure I have the energy to spare on keeping a server up. If the community ends up having only people as "let someone else own it" stingy as me, we'll be in trouble. >.>;
---
"My concern with the sandbox idea is that I imagine it's going to be hard to create a sandbox that is both A) powerful enough to be actually useful, and B) sufficiently constrained so that there's no possible way for someone to manage to generate arbitrary Javascript."
All I'm thinking of is some hooks where Arc code can take as input an object capable of querying the scrape results and give as output a BBCode-esque representation that's fully verified and escaped before use. But then I don't know if that would be sophisticated enough for multi-page layouts or custom styles or whatnot either. ^^;
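Here is a minimal sketch of the "verified and escaped" half of that idea, assuming the sandboxed code hands back a nested list (an s-expression-like tree) whose heads must come from a fixed whitelist. The tag names and the `render` function are hypothetical.

```python
import html

# Hypothetical whitelist: markup tag -> HTML element it is allowed to become.
ALLOWED_TAGS = {"b": "strong", "i": "em", "code": "code", "p": "p"}


def render(node):
    """Render a nested-list markup tree to HTML, rejecting unknown tags."""
    if isinstance(node, str):
        return html.escape(node)  # all text content is escaped
    if not isinstance(node, (list, tuple)) or not node:
        raise ValueError("malformed markup node: {!r}".format(node))
    tag, *children = node
    if not isinstance(tag, str) or tag not in ALLOWED_TAGS:
        raise ValueError("tag not allowed: {!r}".format(tag))
    element = ALLOWED_TAGS[tag]
    inner = "".join(render(child) for child in children)
    return "<{0}>{1}</{0}>".format(element, inner)
```

Since the output can only contain whitelisted elements with no attributes, there is no way to smuggle a `script` element or an event handler through, at the cost of exactly the expressiveness worried about above (multi-page layouts, custom styles).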
There could also be another Arc hook that helped specify what to scrape in the first place... but in a limited way so that it couldn't do denial-of-service attacks and stuff. ^^; lol
Partly it's just a curiosity for me. I like the thought of letting Arc code be run in a sandbox for some purpose, even if it's only marginally useful. :-p
---
Meanwhile, I had another thought: Even if the server doesn't allow running arbitrary code, people could still develop special-purpose things for it by running their own static site generators and putting up the output somewhere where the server will crawl. I wonder how this could affect the server design.
"But then while I'm pretty active on this forum, I'm not sure I have the energy to spare on keeping a server up."
I'd be happy to run the server, and set up some kind of simple continuous deployment system so that when someone makes a code push to the server repo the code goes live.
Depending on availability and motivation I may (or may not...) end up having time myself to get Ken's documentation into a form where it can be edited (he generously offered last year to let us do this).
A part that I don't have motivation to do myself is writing the code that would crawl Anarki and generate documentation from the docstrings.
"I like the thought of letting Arc code be run in a sandbox for some purpose, even if it's only marginally useful."
I certainly won't prevent someone from adding a sandbox to the server. On the other hand... if you'd like to work on something where a sandbox would be useful ^_^, I'd encourage you to join me in my API project :-)
"The main thing I'm afraid of is the documentation site becoming stagnant. Too often, someone finds the arclanguage.org website and asks "How do I get version 372 of MzScheme?" Too often, someone who's been reading arcfn.com/doc the whole time finally looks at the Arc source and starts a forum thread to say "Look at all these unappreciated functions!" ^_^
I don't blame pg or kens; I blame the fact that they don't have all the time in the world to do everything they want. I'm in the same position, and I bet it's pretty universal."
I think if contributing is open and flexible, people will contribute to keep the site up to date. Complete and simple instructions must exist to help and encourage people to contribute. Some of it is social: people feel they need "permission" to contribute.
The interesting thing I am seeing among the experimentation and projects people are doing here is the fragmentation. I think experimentation with languages is great and very necessary, but it's hard to see that there isn't a main champion for the community to rally behind.
PS: stupid question, but how are you italicizing quoted text? I tried adding <i>some text</i> but that didn't work. I haven't had enough time to play with the comments to figure that out.
"The server would work on the text of the sources, such as docstrings found in the Anarki source code. That way even if someone pushed something malicious to Anarki then we wouldn't have a security problem (either on the server or in the reader's browser)."
If it ever got to the point where actually eval'ing the code were necessary/desirable, you could do so in a safe namespace in PyArc (hint hint).
Arc already supports docstrings, and has a built-in function 'help that displays them, along with the signature of the function and the filename in which the function was defined. The help information can also be accessed from an op when the Arc server is running, but I'm not sure how much of that functionality is Anarki-specific. There is also 'sig that displays the signature of a given function, 'fns that searches for a given function name in the list of presently defined functions, and on Anarki there's also 'src that prints out the source of a given function or macro. Unfortunately, Anarki's ppr is currently broken because len no longer works for improper lists.
I've been interested for a long time in runtime accessible automatic documentation, and 'src and 'help go a long way towards providing that. One of the few things I think we still need in that regard for runtime documentation is a means of searching for fns by category, and that will probably involve either adding tags to the docstring, or searching the code for a given pattern. Having the source of a function stored in a table at runtime is incredibly useful, and could allow you to search for all functions that call a given function or macro, or that use a particular idiom.
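To illustrate the kind of query a runtime source table makes possible, here is a small sketch in Python with the source stored as nested lists. The table layout, the tags, and the sample definitions are all invented for illustration; this is not how Anarki stores source.

```python
# Hypothetical runtime table: name -> (tags, source as nested lists).
SOURCE_TABLE = {
    "foo": ({"io"}, ["def", "foo", ["x"], ["prn", "x"]]),
    "bar": ({"io", "math"}, ["def", "bar", ["y"], ["foo", ["+", "y", 1]]]),
    "baz": ({"math"}, ["def", "baz", ["z"], ["*", "z", 2]]),
}


def mentions(expr, symbol):
    """True if symbol appears anywhere inside the nested-list expression."""
    if isinstance(expr, list):
        return any(mentions(part, symbol) for part in expr)
    return expr == symbol


def callers_of(table, symbol):
    """Names whose stored body mentions symbol, excluding its own definition."""
    # src[3:] skips the `def`, the name, and the parameter list.
    return sorted(name for name, (_tags, src) in table.items()
                  if name != symbol and mentions(src[3:], symbol))


def by_tag(table, tag):
    """Names whose tag set contains the given category tag."""
    return sorted(name for name, (tags, _src) in table.items() if tag in tags)
```

Note that `mentions` is looser than "calls": a symbol passed as an argument counts too. Telling the two apart would mean looking only at the head of each list, and accounting for macros.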