Arc Forum | > How about warning when you assign to a variable that existing code depends...

2 points by waterhouse 5620 days ago | link | parent

> How about warning when you assign to a variable that existing code depends on?

Sometimes you do that on purpose, though. E.g. you define a function that relies on a second function, and define the second function later. (rocketnia gives an example, but I'll proceed with this one anyway.)

  > (def mod-expt (a n m)
      (fast-expt-* a n 1 (fn (x y) (mod (* x y) m))))
  #<procedure: mod-expt>
  > (def fast-expt-* (a n one *)
      (xloop (a a n n tt one)
        (if (is n 0)
            tt
            even.n
            (next (* a a) (/ n 2) tt)
            (next a dec.n (* a tt)))))
  Warning: ...

Perhaps you could make "def" smart so it wouldn't set off the warning. What if you happened to give a function the same name as an unquoted symbol, though? Maybe you'd be careful not to do that. And what if you used a global variable that you planned to define later? The warning would be inappropriate. Perhaps you'd learn to ignore it. Or perhaps you could name your global variables in a certain way and have the warning thing recognize it. And tell everyone who uses Arc to name global variables the same way, or to at least come up with a naming scheme that can be mechanically understood by the warning thing--and to stick to it. Bah humbug.
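
For anyone who wants to try the mod-expt pair above on a stock arc3.1: as far as I know "xloop" is not defined there, so here is a minimal sketch of the same two functions with the loop written using "afn" instead. It is an illustration under that assumption, not the code from the original file. The definitions are still in the "wrong" order on purpose.

```arc
; mod-expt is defined first even though it calls fast-expt-*,
; which does not exist yet; the global is only looked up at call time
(def mod-expt (a n m)
  (fast-expt-* a n 1 (fn (x y) (mod (* x y) m))))

; square-and-multiply; the parameter * is whatever "multiply" the caller passes in
(def fast-expt-* (a n one *)
  ((afn (a n tt)
     (if (is n 0)
         tt
         (even n)
         (self (* a a) (/ n 2) tt)
         (self a (- n 1) (* a tt))))
   a n one))

(mod-expt 2 10 1000)   ; 1024 mod 1000, i.e. 24
```

Nothing complains about the order today, which is exactly the freedom a warning on "assignment to a variable that existing code depends on" would start to penalize.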

> The proposed changes are backwards-compatible with Arc 3.1, since all they attempt to do is provide sensible defaults for things that presently raise errors.

It is nice when you can introduce something without breaking old things. However, I think this thing is bad: it's fragile and shallow, and I think most programmers would just not use it and resent the time it took to understand it.

Imagine if, say, whenever mathematical operations (e.g. sqrt) were called with a list argument, then, instead of throwing an error, the function was instead applied to the car of the list. (Mapping it over the list is more likely to be useful; applying it to the average is also possible.) Or if the expression (a < b) evaluated to (< a b) when "a" evaluated to a number or anything else passable to the < function. Or if, whenever you used the variable "it" inside a then-expression in a call to "if", and "it" was otherwise unbound, it bound "it" to the if-expression (as in "aif")? Or why not all of these at once, and more?

In principle, you might be able to ignore extra little "features" like this. I think it'd annoy me, though--in the case of "sqrt" et al. being applied to lists, I'd probably think about it every time I dealt with math and lists (which I do a lot). It adds one more case to deal with to mentally evaluate any mathematical function call. The best thing that could happen is that I'd never use it, or encounter it in anyone else's code, and my mind would be freed of the impulse to worry. But even if it was never used in correct code, I'd still have to think about it whenever I made a mistake and had to diagnose a problem.
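
To make the sqrt case concrete, here is a minimal sketch of what such a "helpful" function might look like. The name lenient-sqrt is made up for illustration; nothing like it exists in Arc.

```arc
; hypothetical: given a list, quietly use its car instead of raising an error
(def lenient-sqrt (x)
  (sqrt (if (acons x) (car x) x)))

(lenient-sqrt 16)         ; 4, as usual
(lenient-sqrt '(16 25))   ; also 4; the 25 is silently ignored
```

If the list got there by mistake, there is no error pointing at the mistake, only a plausible-looking number that turns out wrong somewhere further down.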

By the way, perhaps you are just looking for something you'd use at the REPL, instead of a new language feature. Maybe a REPL with capabilities like Clisp's:

  [1]> (+ 1 achtung)
  *** - SYSTEM::READ-EVAL-PRINT: variable ACHTUNG has no value
  The following restarts are available:
  USE-VALUE      :R1      Input a value to be used instead of ACHTUNG.
  STORE-VALUE    :R2      Input a new value for ACHTUNG.
  ABORT          :R3      Abort main loop
  Break 1 [2]>

It'd make it easy to put in the symbol as the value of the unbound variable. You could tweak it so that would be the default option, and you'd just have to press return again or something; or you could even make it set the unbound variable to the symbol by default. This would be on your customized REPL, of course. :-P

> Not any more insidious than the bugs you can get from an abuse of unhygienic macros or mutation though.

There's an entire style of programming devoted to minimizing and isolating mutation, and languages exist which try to disallow it entirely. A lot of work has also gone into implementing hygienic macros. But these things are useful enough that it's difficult to get rid of them entirely (mutation more so; many languages don't have macros at all). This idea seems like it would make every variable reference (in foreign code) and every global assignment (in code you write) a potential headache, and the payoff seems to me almost zero.

1 point by evanrmurphy 5619 days ago | link

Hmm... it seems that each of us has certain conveniences we want and certain sacrifices we're willing to make in order to allow for the conveniences. But which things are the conveniences and which are the sacrifices is different depending on our personal preferences. Taking your example:

> Sometimes you do that on purpose, though. E.g. you define a function that relies on a second function, and define the second function later. (rocketnia gives an example, but I'll proceed with this one anyway.)

  > (def mod-expt (a n m)
      (fast-expt-* a n 1 (fn (x y) (mod (* x y) m))))
  #<procedure: mod-expt>
  > (def fast-expt-* (a n one *)

I would have always defined fast-expt-* before mod-expt and not the other way around. I think it was aw's essay on linearizing code dependencies [1] that finally persuaded me this is a good guarantee to have. When I'm reading my code, I value the confidence of knowing that everything below line n is unnecessary for getting everything above line n to work [2].

I can't remember the last time I intentionally wrote code that didn't conform to this principle. And if you're willing to write your code this way, then it eliminates the largest problems you all have identified with quote inference. But you and rocketnia seem to place value on being able to reference functions before they've been defined. So while I would be willing to trade that ability for implicit quotation, it seems you would prefer the reverse.

Have I misrepresented your views here?

---

[1] http://awwx.posterous.com/how-to-future-proof-your-code

[2] Where line n is any point in your code that's not inside of a definition.

-----

3 points by waterhouse 5619 days ago | link

I do usually put definitions of dependencies first (fast-expt-* is in fact before mod-expt in my Arc file), but sometimes I shuffle my code around, and I'd be annoyed if it complained when I did that. I like having the freedom to do it, though I don't use it all that much. But mutually recursive functions are a good example too--it's impossible to have both functions come after their dependency.
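
A minimal sketch of the mutual recursion case, using the made-up names my-even and my-odd so as not to clobber the built-in even and odd. Whichever definition comes first necessarily refers to a global that does not exist yet:

```arc
; my-even mentions my-odd before my-odd has been defined
(def my-even (n)
  (if (is n 0) t (my-odd (- n 1))))

(def my-odd (n)
  (if (is n 0) nil (my-even (- n 1))))

(my-even 10)   ; t
```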

> But you and rocketnia seem to place value on being able to reference functions before they've been defined. So while I would be willing to trade that ability for implicit quotation, it seems you would prefer the reverse.

It is true that I'd prefer being able to permute my definitions over having implicit quotation, but that's by far not my only reason for disliking the idea. As I said, it's a fragile, shallow add-on feature that would confuse me (by giving me weird results for erroneous code) and that would make code harder for me to reason about (whenever I see a variable reference, either that refers to something that's been defined, OR it refers to a symbol! and I can establish the latter only by ensuring that it's not defined anywhere).

By the way, why do you think this feature is a good idea? I find two quotes that seem to suggest your reasoning:

1. "My sense is that something like this would rate highly in both complexity of implementation and convenience for programming." How do you get this sense? Do you write a lot of code that uses quoted symbols? Let's take a count in my big fat Arc file:

  $ grep -o "'" a | wc -l
     107
  $ egrep -o "[^' ()]" a | wc -l
   31362
  $ egrep -o "[a-zA-Z-+$*/]+" a | wc -l
    8261

Even discounting whitespace and parentheses, quotes account for about 0.3% of the characters I type. If we count symbols, about 1.3% of the symbols I use are quoted (107 out of 8261). Are you working on something that uses a bazillion quoted symbols--symbols which would have to be quoted individually (e.g. the list '(a b c d) just requires one quote)?
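
The percentages are just the three counts above divided out. A small sketch, assuming the numbers are copied from the grep output as shown (the variable names are made up):

```arc
(with (n-quotes 107 n-chars 31362 n-tokens 8261)
  ; quote characters per 100 characters that are not quotes, spaces or parens
  (prn (* 100.0 (/ n-quotes n-chars)))
  ; quote characters per 100 symbol-like tokens
  (prn (* 100.0 (/ n-quotes n-tokens))))
```

The second figure treats every quote as quoting a single symbol, so if anything it overstates how many symbols auto-quoting would save.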

2. "I just think we're missing out on such valuable real estate here!" See the paragraphs in my grandparent post beginning with "Imagine if" and "In principle". Tacking things on just because you can isn't a good idea.

-----

2 points by evanrmurphy 5619 days ago | link

> But mutually recursive functions are a good example too--it's impossible to have both functions come after their dependency.

You can do it by extending one of them:

  (def foo ())
  (def bar () (foo))
  (extend foo () t (bar))

I'm not saying this is a better way to define mutually recursive functions. Just pointing out that it's possible.

> add-on feature

Not sure what you mean by "add-on". Is xs.0 list access an add-on? This is whatever that is.

> why do you think this feature is a good idea?

It could unify the "." and "!" ssyntaxes to some degree by allowing table access with h.k instead of h!k in most cases. alists that you presently have to express with '((x 1) (y 2)) could be contracted to (x.1 y.2).
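
For reference, a small sketch of how the two ssyntaxes expand in arc3.1 today, using a throwaway table h. Under the proposal the first form could stand in for the second whenever k is unbound:

```arc
(ssexpand 'h.k)   ; (h k)          uses the value of the variable k as the key
(ssexpand 'h!k)   ; (h (quote k))  uses the symbol k itself as the key

(= h (obj k 'found))
h!k               ; found
```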

I haven't worked out all the useful applications of this yet but I'm finding it interesting and think there's some potential.

> fragile, shallow [...] Tacking things on just because you can isn't a good idea.

I'm generally disappointed by your flamey response to my interest in exploring a core language possibility that could make arc programs shorter. Your ideas and even complete disagreement are very welcome, but your overall tone is insulting. Perhaps I misread you.

-----

4 points by waterhouse 5619 days ago | link

> I'm generally disappointed by your flamey response... your overall tone is insulting.

I'm sorry, my intent wasn't to insult you. (I'm glad you explained that, though.) I thought my words were clear. Let me explain:

I called the idea "fragile" because it would be easy to break code that depended on it--just by defining a new variable. You suggested a thing that would warn upon defining a previously-used-as-unquoted-symbol variable, but rocketnia and I brought up cases where the warning would be a false positive. I considered a more sophisticated warning system--one that had some kind of mechanical procedure for guessing whether an unbound variable was supposed to be an unquoted symbol or a function to be defined later--one that required the programmer to follow one naming scheme for unquoted symbols and another for functions to be defined later. My conclusion was that for this system to work, the programmer would basically have to tiptoe around it and be very careful, or else things would break. Hence, I thought the word "fragile" was appropriate, and used it.
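
A minimal sketch of that breakage. The macro val-or-sym is a hypothetical stand-in for the proposed behaviour (it is not the actual proposal, which would apply to every variable reference), and mauve is assumed to be a name with no global binding:

```arc
; hypothetical: an unbound global evaluates to its own name
(mac val-or-sym (x)
  `(if (bound ',x) ,x ',x))

(val-or-sym mauve)   ; the symbol mauve, as long as no global named mauve exists
(= mauve 255)        ; some later, unrelated definition
(val-or-sym mauve)   ; now 255
```

The two calls are textually identical, and the only thing that changed between them is a definition that may live in someone else's file.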

I called it "shallow" because the maximum benefit, in the best possible case, is that you don't have to type the ' character most of the time. Contrast this with, say, learning to use lists when you're used to naming every single variable. Not only does it become easier to, say, compute the sum of the cubes of five numbers, but you can write a single function to compute the sum of the cubes of any number of numbers! And then you can make matrices--again, referencing them with a single variable--and writing a single function to deal with n x n matrices for all n! Without lists or other compound data structures, it'd seem really hard and annoying just to deal with 2 x 2 matrices, and 10 x 10 would seem impossible. There are deep and rich benefits to using compound data structures. But the only thing this unquoted symbols idea can possibly be good for is letting you omit the ' character; I therefore thought the word "shallow" was a good descriptor.

Regarding "Tacking things on just because you can isn't a good idea". Your motivation seemed like it might be, at least partially, something like this: "We can get strictly greater functionality out of Arc if we take some things that are errors and assign them a meaning in the language. Therefore, we should do it. Let's start looking for error cases we can replace with functionality!" Your comment "I just think we're missing out on such valuable real estate here!" added weight to this interpretation. And so I attacked it with a reductio ad absurdum, giving several examples of how one might "add functionality" in this way. I hoped to show that the line of reasoning "You get strictly greater functionality, therefore it can't be a bad idea" was wrong. I summed it up by saying "Tacking things on just because you can isn't a good idea."

> I'm not saying this is a better way to define mutually recursive functions. Just pointing out that it's possible.

And there are other ways to do it[0]. But it would be impossible to do it by just writing (def foo () (bar)) and (def bar () (foo)) and putting them in the right order. Hence, this idea would make such programs more complex/verbose. (Eh, perhaps you could set up a warning system and teach it to recognize mutual recursion. I think learning about this would distract the programmer somewhat, which isn't by itself a deal-breaker but is an undesirable aspect. I suspect there are more cases yet to be covered; and you'd still have to order your definitions properly--I don't think a compiler without psychic abilities could always tell what you were going to define later; and even if you were warned when you made a function with a conflicting name, it'd be annoying to have to give your function a new name, or to change the code that used that name.)

> alists that you presently have to express with '((x 1) (y 2)) could be contracted to (x.1 y.2).

Incidentally, I do find it annoying to type out a lot of such things, and I have a routine for dealing with that. Perhaps you'd find it sufficient? So whenever I want to make a big alist, I do something like this:

  (tuples 2 '(Jan 1 Feb 2 Mar 3 Apr 4 May 5 Jun 6
              Jul 7 Aug 8 Sep 9 Oct 10 Nov 11 Dec 12))
  ;instead of '((Jan 1) (Feb 2) ...)

Note that I've redefined "(tuples xs n)" as "(tuples n xs)". It is much better this way. :-} I could also use "pair", I suppose.
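
On a stock arc3.1, where the argument order of "tuples" has not been swapped, the equivalent calls would look like this (a short sketch with a shortened list):

```arc
; stock argument order: (tuples xs n)
(tuples '(Jan 1 Feb 2 Mar 3) 2)   ; ((Jan 1) (Feb 2) (Mar 3))

; pair groups by twos without needing the count
(pair '(Jan 1 Feb 2 Mar 3))       ; ((Jan 1) (Feb 2) (Mar 3))
```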

Oh, and, by the way, if ssyntax were implemented at the reader level--which I think it should be; I think the current situation is just a hack that will be changed eventually--you could write '((x 1) (y 2)) as '(x.1 y.2). [I see you address this in a sister post.]

And this example suggests that your intent goes beyond merely having unbound symbols evaluate to themselves. In my post I cite at the top of this thread, I addressed problems with trying to have ((x 1) (y 2)) evaluate as '((x 1) (y 2)).

> make arc programs shorter.

That is a worthy goal, one I'd forgotten about. It's good that you brought it up. I suppose the shortness of a program is kind of a good static measure, whereas objections like "It'd confuse me" are usually only temporary, and the programmer gets used to it.

But I do believe that a) using it would create either horrible risks of breaking things or annoying false-positive compiler warnings, b) therefore I'd never use it, so it wouldn't actually make my programs any shorter, and c) it would, inevitably, make debugging harder--instead of UNBOUND-VARIABLE errors I'd get diverse results, depending on precisely what happens when a symbol is put in the place of the unbound variable.

Now, (c) also applies to having xs.0 list/table/string reference work. But (a) and (b) don't. I do use it[1], and relying on it doesn't cause any problems like the fragility I've described. And the payoffs are pretty good. Many things are significantly shorter--e.g. m.i.j for reaching into a matrix, instead of (aref m i j) or, worse, (vector-ref (vector-ref m i) j).[2] The error-obfuscation objection still applies, but I think the benefits override the objection.

[0] I was thinking you could define them in the same lexical context:

  (with (foo-val nil bar-val nil)
    (= foo-val (fn () (bar-val))
       bar-val (fn () (foo-val))
       foo     foo-val
       bar     bar-val))

[1] In fact, arc3.1 doesn't even provide "hash-ref" or "string-ref" functions, so you kinda have to use (x n). "list-ref" at least could be implemented by the user in terms of car and cdr.

[2] I was going to add: "And since I don't need to specify the type of the data structure, sometimes I or my code can forget that detail. I could change matrices to be implemented as nested hash tables or vectors, and m.i.j would still be correct." However, this part could be done with a unified "ref" function that reached into all data structures.
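
A minimal sketch of such a unified accessor. The name deep-ref is hypothetical; it relies only on lists, tables and strings already being callable with an index or key:

```arc
; follow a chain of indices or keys into nested data structures
(def deep-ref (x . ks)
  (if ks
      (apply deep-ref (x (car ks)) (cdr ks))
      x))

(= m '((1 2) (3 4)))
(deep-ref m 1 0)   ; 3, the same element as ((m 1) 0)
```

Switching m to nested tables or strings would not change the call, which is the same type-independence m.i.j gives.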

-----

2 points by evanrmurphy 5617 days ago | link

Thanks very much for clearing all of this up. :)

My reply has been delayed because I wanted to respond to a lot of your points in detail. I haven't found time to do that yet, but I thought I should at least say this before waiting any longer.

-----

1 point by evanrmurphy 5619 days ago | link

> alists that you presently have to express with '((x 1) (y 2)) could be contracted to (x.1 y.2).

Is there a general interest in moving ssyntax functionality to the reader? I've found some past discussions about it but wanted to know the present sentiment.

If ssyntax was processed at the reader-level, it would help in this particular scenario because then '(x.1 y.2) could evaluate to ((x 1) (y 2)) instead of (x.1 y.2).
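
A small sketch of the difference, using "ssexpand" by hand to imitate what a ssyntax-aware reader might do. That the reader would produce exactly this is an assumption:

```arc
'(x.1 y.2)                  ; today: a list of the two symbols x.1 and y.2
(map ssexpand '(x.1 y.2))   ; ((x 1) (y 2))
```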

-----

3 points by aw 5619 days ago | link

> Is there a general interest in moving ssyntax functionality to the reader?

In the Arc runtime project, that was my assumption behind my choosing my matching library to implement the reader in Arc. The matching library is way more powerful than what would be needed to simply replace the Racket reader as-is; the goal is that when people want to experiment with different kinds of syntaxes or with extending ssyntax to work in more cases it will be easy to do.

-----

1 point by evanrmurphy 5619 days ago | link

That's good to know, thanks. :)

-----

2 points by rocketnia 5619 days ago | link

Yeah, I like being able to reference a function before it's defined. (Macros annoy me a little for not allowing that.) For me it's a matter of the concept of "definition" being an ambient thing, where something counts as being defined if it's defined anywhere, even later on. It's like how, in a mathematical proof or prose argument, a broad claim may be reduced into a bunch of littler inferences, some handled one-by-one systematically and some left to the reader to fill in. I've read (or tried to read) a bunch of mathematical papers or books that start out building lemma after lemma and climax in a theorem, and those might even be in the majority, but sometimes I have to approach them backwards, and then I have to backtrack to figure out what their terminology means, and it's pretty frustrating.

In education, lots of the time new topics are built upon the foundations the old topics provided, but sometimes they're built upon established motivations and provide all-new foundations, like an analysis course justifying the calculus courses that came before it, or a mechanics course casting Newton's laws in a new light.

For me, the motivation comes pretty early on relative to the implementation. I could decide to put the main control flow algorithm at the top to set the stage for the rest, or I could decide to arrange things according to the order they'll be applied--or in fact I might like having them in the reverse order, the order in which they're needed to get the result from more and more convenient starting positions. That last strategy is probably closest to dependencies-come-first coding, but I don't want to be limited to it, even if I risk choosing a frustratingly haphazard strategy.

-----

1 point by evanrmurphy 5619 days ago | link

Nice summary of the different approaches. ^_^

One reason to favor the dependencies-first approach in Arc is that we have mutability.

If you're only doing single assignment and side effect-free programming, then your code doesn't have a significant order [1]. But insofar as your program is imperative and performing mutations, the order is significant.

A consequence of this is that if you want to be able to take advantage of imperative features, you're making it harder by ordering your code any other way. I say this because even if your code is purely functional right now, when you try to insert some imperative code later, the order is going to start mattering more. And it's going to start seeming tangled and confused if it doesn't build up in the order of execution (at least it does for me).

So dependencies-first programming plays especially well with imperative code. I'm also particularly interested in it at this moment because I'm working on a refined auto-quote mechanism that could be hard to take advantage of if you're not programming this way. ;)

---

[1] Except for the macros wart you alluded to.

-----

1 point by akkartik 5619 days ago | link

Yeah, I agree: I like to see the 'business end' of code up front. aw's article made some good points I'm still mulling over[1], but upgrading things seems like such a rare event compared to the day-to-day use of code. Especially if I manage to keep up my resolution[2] to never rely on any libraries :)

---

[1] http://github.com/awwx/ar now keeps tests in a separate file. Does that weaken the case for defining things before using them? Perhaps you could define your tests bottom-up but write your code top-down, or something.

I still want to try out a test harness that analyzes dependencies and runs tests bottom-up: http://arclanguage.org/item?id=12721. That way you could write your tests in any order and they'd execute in the most convenient order, with test failures at low levels not triggering noisy failures from higher-level code.

[2] http://arclanguage.org/item?id=13219

-----

3 points by aw 5619 days ago | link

> http://github.com/awwx/ar now keeps tests in a separate file

Not by design, as it happens. I wrote some new tests for code written in Arc, and stuck them into a separate file because I hadn't gotten around to implementing a mechanism to load Arc code without running the tests.

Though I do view writing dependencies-first as a form of scaffolding. You may need or want scaffolding for safety, or because you're working on a large project, or because you're in the midst of rebuilding.

Does that mean that you always need to use scaffolding when you work on a project? Of course not. If you're getting along fine without scaffolding, then you don't need to worry about it.

Nor, just because you might need scaffolding in the future, does it mean that you have to build it right now. For example, if I had some code that I wanted to rebase to work on top of a different library, and it wasn't in dependency order, and it looked like the rebasing work might be hard, I'd probably put my code into dependency order first to make the task easier. But, if I thought the rebasing was going to be easy, I might not bother. If I ran into trouble, then perhaps I'd backtrack, build my scaffolding, and try again.

-----

1 point by rocketnia 5619 days ago | link

> Especially if I manage to keep up my resolution to never rely on any libraries :)

I have effectively the same resolution, but only 'cause of Not Invented Here syndrome. :-p Nah, I use plenty of libraries; they just happen to be the "libraries" that implement Arc. I use all kinds of those. :-p

---

> http://github.com/awwx/ar now keeps tests in a separate file. Does that weaken the case for defining things before using them?

That file is loaded after the things it depends on, right?

---

> ...you could write your tests in any order and they'd execute in the most convenient order, with test failures at low levels not triggering noisy failures from higher-level code.

I'm not sure I understand. Do you mean if I define 'foo and then call 'foo in the process of defining 'bar (perhaps because 'foo is a macro), then the error message I get there will be less comprehensible than if I had run a test on 'foo before trying to define 'bar?

---

In any case, aw's post mostly struck me as a summary of something I'd already figured out but hadn't put into words: If a single program has lots of dependencies to manage, it helps to let the more independent parts of the program bubble together toward the top, and--aw didn't say this--things which bubble to the top are good candidates for skimming off into independent libraries. If you're quick enough to skim them off, the bubbling-to-the-top can happen mentally.

Lathe has been built up this way from the beginning, basically. It's just that the modules are automatically managed, and it acts as a dependency tree with more than one leaf at the "top," rather than something like yrc or Wart with a number on every file.

I'm interested in making a proper unit test system for Lathe, so we may be looking for the same kinds of unit test dependency management, but I'm not sure yet about many things, like whether I want the tests to be inline or not.

Well, Lathe has an examples/ directory, which I've ended up using for unit tests. It's kind of interesting. Lathe's unit tests have become just like its modules over time, except that they print things to tell you about their status. Being a module, an example automatically loads all its dependencies, and you can load it up and play around with the things defined in it at the REPL, which is occasionally useful for debugging the example itself. But it's pretty ad-hoc right now, and I don't, for instance, write applications so that they load examples as they start up, like you might do.

-----

3 points by akkartik 5619 days ago | link

"Do you mean if I define 'foo and then call 'foo in the process of defining 'bar (perhaps because 'foo is a macro), then the error message I get there will be less comprehensible than if I had run a test on 'foo before trying to define 'bar?"

If bar depends on foo (foo can be function or macro), and some tests for foo fail, then it's mostly pointless to run the tests for bar.
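
A minimal sketch of that idea, assuming the tests are already listed bottom-up and each one names the things it depends on. The list tests*, the function run-tests and the entry layout are all made up for illustration; this is not the harness from the linked item:

```arc
; each entry is (name dependencies function-returning-t-or-nil)
(= tests* (list (list 'foo nil    (fn () (is (+ 1 1) 3)))    ; deliberately failing
                (list 'bar '(foo) (fn () (is (* 2 3) 6)))
                (list 'baz nil    (fn () (is (- 5 2) 3)))))

(def run-tests (tests)
  (let failed nil
    (each (name deps run) tests
      (if (some [mem _ failed] deps)
          (do (push name failed)
              (prn name ": skipped, a dependency failed"))
          (run)
          (prn name ": ok")
          (do (push name failed)
              (prn name ": FAILED"))))
    (rev failed)))

(run-tests tests*)   ; reports foo as failed, skips bar, still runs baz
```

Only the failure in foo shows up as a failure; bar is marked as skipped rather than adding a second, noisier report of the same problem.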

---

"That file is loaded _after_ the things it depends on, right?"

Yeah well, you gotta load code before you can run the tests for it :)

My understanding of aw's point was this: if you load your code bottom-up, then you can test things incrementally as you define them, and isolate breakage faster. Defining the tests after their code is irrelevant to the argument because it's hard to imagine an alternative.

If you put your tests in a separate file and run them after all the code has been loaded, you can still order them bottom-up. So to answer my own question, no, keeping the tests in a separate file doesn't weaken aw's argument :)

-----

3 points by aw 5619 days ago | link

There is a small difference: if you've loaded only the code up to the point of the definition which is being tested when you run the test (either by writing tests in the same source code file as the definitions, or by using some clever test infrastructure), then you prove that your definitions aren't using anything defined later.

Of course you can probably tell whether code is in prerequisite order just by looking at it, so this may not add much value.

-----

1 point by aw 5619 days ago | link

> whether I want the tests to be inline or not

Something I've been thinking about, though I haven't implemented anything yet, is that there's code, and then there's things related to that code such as prerequisites, documents, examples, tests, etc. The usual practice is to stick everything into the source code file: i.e., we start off with some require's or import's to list the prerequisites, doc strings inline with the function definition, and, in my case, tests following the definition because I wanted the tests to run immediately after the definition.

But perhaps it would be better to be able to have things in separate files. I could have a file of tests, and the tests for my definition of "foo" would be marked as tests for "foo".

Then, for example, if I happened to want to run my tests in strict dependency order, I could load my code up to and including my definition of foo, and then run my tests for foo.

-----

1 point by akkartik 5619 days ago | link

"the tests for my definition of foo would be marked as tests for foo."

In Java or Rails, each class file gets a corresponding test file in a parallel directory tree. I find it too elaborate, but it does permit this sort of testing of classes in isolation.

-----