I've been trying to work Ken Shirriff's docs for arc3.1 at http://arclanguage.github.io/ref into a representation that keeps them close to the code and thus more likely to be updated. One piece of the puzzle is what I call examples:
arc> (help map1)
[fn] (map1 f xs)
Returns a list containing the result of function 'f' applied to
every element of 'xs'.
Examples:
arc> (map1 cdr
           '((1) (2 3) (4 5)))
(nil (3) (5))
arc> (map1 [list _ (* _ 10)]
           '(1 2 3))
((1 10) (2 20) (3 30))
The examples aren't just text. They're code I chose to place after each definition:
(def map1 (f xs)
  "Returns a list containing the result of function 'f' applied to
every element of 'xs'."
  (if (no xs)
      nil
      (cons (f car.xs)
            (map1 f cdr.xs))))

(examples map1
  (map1 cdr '((1) (2 3) (4 5)))
  (nil (3) (5))
  (map1 [list _ (* _ 10)]
        '(1 2 3))
  ((1 10) (2 20) (3 30)))
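
For reference, the macro itself can stay tiny. This is only a guess at its shape, not the actual anarki definition: stash the raw expr/expected pairs in a global table keyed by name, so 'help can find them later.

(= examples-table* (table))   ; hypothetical name

(mac examples (name . exprs-and-expecteds)
  `(= (examples-table* ',name) ',exprs-and-expecteds))

'help would then look up (examples-table* 'map1), (pair ...) the entries, and print each expr next to its expected or freshly computed value.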
Here's what happens if the documentation goes out of date: the stale expected value gets flagged right in the help output, much like the "<-- ERROR" marker in the 'counts' transcript further down.
Changes so far: https://github.com/arclanguage/anarki/compare/85962de2d9...64105bd8d5. I haven't finished adding all the examples yet, but will keep doing so. It's a therapeutic diversion in a busy time.
After the examples are done, we still need some way to maintain the connective narrative. We could just rely on the arc 3.1 docs for that. Or we could maintain a smaller version of the reference and autogenerate the tables on each page. Or we could go whole-hog on a literate format. We'll see.
Another idea: have examples also run as tests using our now-standard unit-test.arc that zck's been converting all our tests to (huge thanks!), with some convention for the suite name.
Ideally, the 'results' in the documentation would still be computed dynamically, so you can always see whatever the current output would be, but also have the expected results available for separate regression testing.
All the talk about stdin/stdout/randomness got me realizing that, really, any state is going to be an issue (as state is wont to be). Not that it's often a problem in a mostly-functional language like Arc.
On the one (liberating) hand, Arc's philosophy clearly guides us to not lose sleep over the potential problems of state---just be careful. I see your destructive examples are careful to use ret, for instance, rather than global assignment. On the other hand, we are losing the ability to express certain kinds of tests here. Ideally, we could eval the examples in their own "sandbox" and capture their output, just as if it had been at a REPL.
This is an interesting problem, because it feels so familiar. After some slow, tired thoughts (been a long day for me), it hit me: "Oh! We're doing doctests, except in the other direction." With Python's doctest:
1. Copy/paste a REPL interaction into a docstring.
2. Looking up the documentation shows the static string.
3. A test function (doctest.testmod()) combs through docstrings, parses them apart, runs them in isolated environments, and checks their results.
Here:
1. You give some literal code as examples (yeah, Lisp!).
2. Looking up the documentation evaluates the example in the current environment.
3. Running the example generates some output that we print out as though it had been pasted from a REPL interaction.
For point 1, using sexps seems more befitting to Arc than parsing out docstrings. Yet we do lose something here: writing literal strings of expected output is cumbersome, as you mentioned. Much better would be to just copy & paste an entire REPL interaction (inputs & outputs) into a multiline string and call it a day.
Point 3 is a small detail: to run the tests, you "have" to look at the docs. Except writing a doctest.testmod() equivalent isn't hard---just evaluate all the examples without printing their docs.
But Point 2 is an interesting one. Arc doesn't have much support for environment-mangling: eval doesn't see lexical variables, there are no modules or namespaces, environment introspection is difficult, macros introduce their own weirdnesses, and so on. About the only way I see to get a guaranteed clean slate (without requiring us to just "be careful" with the examples) is to fork off another Arc interpreter, run the example in that process, and capture the output (including stderr!) for comparison with the docstring. I reckon it would be slow as hell (one interpreter per block of examples), but is it otherwise feasible?
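
For what it's worth, here's a rough, untested sketch of that idea. It assumes anarki's arc.sh will run a file given as its first argument and then exit, and it only captures stdout; pipe-from and allchars are from arc.arc.

; write the example to a file, run it in a brand-new arc process,
; and return everything that process printed
(def run-sandboxed (expr)
  (let f "/tmp/example.arc"
    (w/outfile o f
      (write `(write ,expr) o))   ; make the child print its result
    (allchars (pipe-from (string "./arc.sh " f)))))

Capturing stderr too would mean redirecting it in the command string (e.g. appending "2>&1"), and starting one process per block is exactly the slow part.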
I like the idea of just having to compare output strings universally (the repl-output comment I made), so that we don't really have to decide which expecteds to selectively evaluate. Instead, just look at the strings: (is "#hash(...)" "#hash(...)"). But there are probably ways that just comparing the output strings could break. Obvious things like random number generation, race conditions, user input. But even tables: are two tables with the same contents guaranteed to serialize the same way? I'm too tired to effectively think through or look up an answer.
I like the examples stuff. The only issue that I have with it is that you are providing explicit, yet non-tested results. It makes sense to me to either provide just the code and have the documentation system generate the results, or to provide test cases instead.
> Ideally, the 'results' in the documentation would still be computed dynamically, so you can always see whatever the current output would be, but also have the expected results available for separate regression testing.
Also, I think it would be great if the regression test results could be automatically generated and stored somewhere else, because I don't want to have to manually type in expected results for every case. Some mechanism to 'freeze' the current output as the accepted value for regression purposes would be cool.
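
A hedged sketch of what that 'freeze' step could look like, reusing the hypothetical examples-table* from the earlier sketch: evaluate every stored example and write its current printed output to a file that a later regression run can diff against.

(def freeze-examples ((o file "frozen-examples"))
  (w/outfile out file
    (maptable (fn (name exprs)
                (each expr (map car (pair exprs))
                  ; record (name expr current-output) for later comparison
                  (write (list name expr (tostring:write (eval expr))) out)
                  (disp "\n" out)))
              examples-table*)))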
Yeah, that whole side is a bit clunky. It gives readers feedback when things are broken so they aren't misled. But the job of the writer is still hard. Things would be improved by my last idea, to run examples along with other tests.
Here are a couple more issues:
a) Expected results are currently unevaluated, so there's no way to represent tables, and so on.
arc> (help counts)
[fn] (counts seq)
Returns a table with counts of each unique element in 'seq'.
Examples:
arc> (counts '(b a n a n a))
(obj b 1 a 3 n 2) <-- ERROR
b) We can't provide examples for non-deterministic functions like rand-elt.
c) We can't show stdout and stderr.
---
Last but not least, I've been thinking about the freezing idea myself in other contexts. I think there's something big there.
As for the 'freezing', I've been thinking about it in other contexts as well. I noticed that one of the companies I worked at seemed to want to do some of their regression tests that way, so it makes sense to provide a way to automate it.
In particular, I would like a way to convert an interactive console session (my normal method of testing code) into a set of unit tests. Not sure exactly what the right way to do that would be though.
It's still not a very good form of unit test, as it won't cover the cases mentioned above involving randomness or side effects. Mutation testing or fuzz testing / generators are probably better ways to generate unit tests, but they don't make very good examples.
Yeah. (b) can probably work just fine with dynamically evaluated examples, but stderr/stdout would require user interaction, which may not be acceptable. Maybe some way to indicate whether an example should be evaluated or not?
What do you mean by 'run examples along with other tests'? Does that mean that you would include other tests in the help output? That the examples would be included when running the tests? Or something else?
(examples do
  (do (prn "line 1")
      (prn "line 2")
      (prn "line 3"))
  _)
Here's how it looks:
arc> (help do)
[mac] (do . args)
Evaluates each expression in sequence and returns the result of the
last expression.
Examples:
arc> (do (prn "line 1")
         (prn "line 2")
         (prn "line 3"))
It makes sense to specify the example expr / expected pairs as sexps & strings: run repl-output on exprs, compare those strings to the expected strings.
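
Something like this might do it (an untested sketch; repl-output here is just a name for the idea, not an existing function):

; capture what an expr would show at the repl: anything it prints,
; followed by the written form of its return value
(def repl-output (expr)
  (tostring
    (write (eval expr))))

; an example passes if its captured output matches the expected string
(def example-passes (expr expected)
  (is (repl-output expr) expected))

That handles tables and other values with no literal syntax for free, since all we ever compare is the printed text.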
Thanks for thinking through that! Yes, I'd had a similar though not as fully-formed idea, but was stuck on distinguishing stdout from return value in the examples. But yeah, that's overthinking it -- we have to mentally parse the two at the repl anyway.
Only other quibble: a string isn't very nice to read next to code, even if it looks nice at the repl. Hmm..
Maybe we should make an auto scraper/notifier? Though iirc, the arc server doesn't really appreciate frequent queries. On closer look, it should be fine; the default throttle window should be 10s. It could query once a minute, and update an rss feed for arc forum posts and comments. Or provide webhook + email support.
I actually built one back in the day: http://akkartik.name/post/2011-05-13-05-47-29-soc. Trouble was, it kept getting banned/blocked even when I scaled back to crawl HN every five minutes. Eventually I was crawling so infrequently that I was missing stories that dropped off /newest in the meantime, so I stopped crawling.
I actually had an arc version for a brief period[1], but it didn't seem like there was as much need for it here. But perhaps the email notifications would be useful. I'll see if I can bring it back.
[1] I was certain I'd shown it here, but no, I only showed it over email to a couple of folks. I really should bring it back just so y'all can look at it. Or at least locate the (rails) sources.
For the connective narrative, how about using something like the existing [[ref]] markers that were in the docstrings to provide 'tags' for each entry? Every entry would then be associated with a set of tags that includes its own name + any tag mentioned in its docstring.
This gets you two things.
1) You can now provide grouped details pages, like the 'list operations' and 'macros' pages from the arc3.1 docs.
2) You can provide connective narrative associated with a particular tag, in a separate document.
It probably gets you other things too. Maybe we could add the ability on the command line to filter help by tag instead of just name. That would be nice.
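
Extracting those tags seems straightforward; here's a rough sketch, assuming the [[...]] markers are literal text in the docstring (posmatch, accum, whilet, iflet and friends are all in arc.arc):

; collect every name that appears inside [[...]] in a docstring
; e.g. (doc-tags "See [[map]] and [[each]].") => (map each)
(def doc-tags (doc)
  (accum acc
    (let i 0
      (whilet start (posmatch "[[" doc i)
        (iflet end (posmatch "]]" doc (+ start 2))
          (do (acc (sym (cut doc (+ start 2) end)))
              (= i (+ end 2)))
          (= i (len doc)))))))

A tags index would then just be a table from each tag to the names whose docstrings (plus their own names) mention it.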
Cool idea! I used to do something similar a long time ago: use a little-known option of ctags (-r) to match wikiWords, so that hitting C-] in vim or M-. in emacs with the cursor on, say, 'databaseSubsystem' would take you to '$<databaseSubsystem>' in the codebase.
The nice thing about it is that a name could potentially belong to multiple different tags/narratives. Multiple overlapping trails are a pretty decent way to narrate an intrinsically non-linear thing like a codebase. See also http://leoeditor.com.
a) All examples are now checked when you load tests.arc.
b) You can pass in an expected value that has no literal representation (like a table) by wrapping an expression in valueof. For example:
(examples sref
  (ret x '(1 2 3)
    (sref x 4 1))
  (1 4 3)
  (ret x "abc"
    (sref x #\d 0))
  "dbc"
  (ret x (obj a 1 b 2)
    (sref x 3 'd))
  (valueof (obj a 1 b 2 d 3)))
This is how it looks:
arc> (help sref)
[fn] (sref tem v k)
Sets position 'indices' in 'aggregate' (which might be a list, string, hash
table, or other user-defined type) to 'value'.
Examples:
arc> (ret x '(1 2 3)
       (sref x 4 1))
(1 4 3)
arc> (ret x "abc"
       (sref x #\d 0))
"dbc"
arc> (ret x (obj a 1 b 2)
       (sref x 3 'd))
#hash((b . 2) (a . 1) (d . 3))
Summary of rules for the expected value (see the sketch after the list):
i) If it's _, checking is skipped and help doesn't print the result.
ii) If it's of the form (valueof x) then we evaluate x when printing and comparing.
iii) Otherwise we compare against the raw value.
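
In code, the comparison side of those rules might look something like this (a sketch, not the actual anarki implementation; it assumes iso can compare whatever values are involved):

(def check-example (expr expected)
  (if (is expected '_)
       t                                         ; rule i: skip the check
      (caris expected 'valueof)
       (iso (eval expr) (eval (cadr expected)))  ; rule ii: evaluate it first
       (iso (eval expr) expected)))              ; rule iii: compare the raw value

help would additionally decide what to print in each case, per rule i.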
I could eliminate (b) by always evaluating the expected value. This would make tests look like:
(examples list
  (list 1 2 3)
  '(1 2 3)             <-- note the quote
  (list "a" '(1 2) 3)
  '("a" (1 2) 3))      <-- note the quote
rather than like the current:
(examples list
  (list 1 2 3)
  (1 2 3)
  (list "a" '(1 2) 3)
  ("a" (1 2) 3))
Which do people prefer?
(Then again, perhaps there's no point polishing this further if we find a clean way to manage examples as strings that can continue to be checked, while handling ordering and so on.)
Personally, I'm of the opinion that tests and examples should probably be kept separate. Examples are intended to be evaluated and displayed by 'help as part of the documentation, while tests exist to catch errors in the code and often need to be designed specifically for that purpose. 'Not equal' is only one of many assertions one may wish to make about output, and as noted before, many things have side effects that are not so easily compared.
Merging the concepts can be helpful, but it requires more of the people writing the examples in the first place. Also, just because a code snippet makes a good example does not mean it makes a good test, and vice versa.
I would prefer a solution where 'examples didn't include any predefined results at all, and they were all just evaluated during help. If desired, someone working with the unit test suite could write code that leveraged the examples, but it wouldn't be necessary. That way we could use good illustrative examples that may not make good tests, and good thorough tests that may not make good examples.
Yeah you may well be right. Is it really not useful to also be able to see the result of an example call right there next to the code?
I certainly agree that the vocabulary of matchers is incomplete. And if we can't find a small basis set of them, this whole second alternative to tests starts to seem inelegant.
Perhaps we should just tag some tests to be shown in help and call it a day. Though I like seeing the tests right next to each function. That seems useful even if we rip out the online help altogether. In fact, I started out writing examples when the macro was a noop. We could just inline all the tests, but then it would be overwhelming to have 20 tests for each function... Ok, I'll stop rambling now. Summary of use cases:
a) Getting feedback that something broke when we make a change. (Unit tests currently do this.)
b) Seeing some examples as we read the code.
c) Online help at the repl.
d) Online help in the browser. (like kens's /ref/)
I don't think including the tests alongside the code would help much; many tests are rather complicated, involving setup, teardown, and more complex assertions than just 'eq. Not that one couldn't understand what it meant, just that it's not as clear as an example.
I hadn't thought of the use case for wanting examples while perusing the code itself, but I must admit that I find that somewhat uncommon. I don't often know which file a function is defined in, and rarely need it given the help support we have in the repl. If I do want to look at the code, the 'src utility shows the source of a function. If I want to edit it, I often use the help utilities to find the file anyway. So having the results only available via the repl wouldn't bother me any.
Use cases (c) and (d) can get by with just evaluating the examples and showing the output.
No, unless you wanted to dynamically read the examples from the docstrings to evaluate them when the examples are queried, either directly or as part of help.
Actually, I don't know if the examples should be automatically displayed with 'help, or queried separately.
Either way, it would be nice to make them automatically evaluated, unless they can't be for whatever reason. It seems like that would be easiest to do with something like the existing examples macro, but if you think it would be doable with docstrings I guess that could work.
Ah, ok. So you don't care about being able to see them next to the function, but you would like some way to see them at the repl along with their results. Let me know if I'm still missing something.
(Sorry I'm asking basic questions. The discussion has ranged far enough that I just want to circle back to nail down precisely what you're suggesting among all the use cases and possible UIs.)
Well, that's what I'm suggesting. I don't see the other cases as essential, and the interface seems simpler that way. I'm lazy too, which means that if I'm making examples, I'd rather not have to provide the results. Not that I couldn't, but I'd at least like the option not to.
As always, you don't have to change anything just to meet my opinions. I'm used to being in the minority, in fact.
Maybe nobody else cares. It's a pretty small community at this point anyway, and I doubt everyone checks daily. I know I've gone through long periods without checking.