* the original version of Norvig's spelling corrector is written in Python, is 21 lines, and can be found at http://norvig.com/spell-correct.html, w/ lots of explanations and test materials.
Will look at it in detail this evening, but for now:
* not sure if this changes anything, but in Norvig's edits1() version:
>>> edits1("python")
# the var `s' is [('', 'python'), ('p', 'ython'), ('py', 'thon'),
('pyt', 'hon'), ('pyth', 'on'), ('pytho', 'n'), ('python', '')]
But in your Arc version, the result of (map [split ... (- (len word) 1)) omits the last pair, ("python" ""):
arc> (def e (word)
(map [split word _] (range 0 (- (len word) 1))))
#<procedure: e>
arc> (e "python")
(("" "python") ("p" "ython") ("py" "thon") ("pyt" "hon") ("pyth" "on")
("pytho" "n"))
arc> (def e (word) ; "Fixed", making it even shorter :-)
(map [split word _] (range 0 (len word))))
*** redefining e
#<procedure: e>
arc> (e "python")
(("" "python") ("p" "ython") ("py" "thon") ("pyt" "hon") ("pyth" "on")
("pytho" "n") ("python" ""))
Makes me think, (- (len xs) 1) is a common idiom in languages w/ zero-based indexing.
It also makes me think that maybe 'range should behave like Python's, i.e. if there is only one arg, `start' is implicitly 0.
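In Python terms, the off-by-one comes from the bounds. Arc's range includes its upper bound, as the transcripts above show. Python's range excludes it, and with a single argument it already starts at 0. Here is a small sketch; the helper names `splits` and `splits_one_short` are just for illustration:

```python
def splits(word):
    """All (left, right) cuts of word, from ("", word) up to (word, "")."""
    # Python's range excludes its end, so len(word) + 1 is needed to reach i == len(word).
    return [(word[:i], word[i:]) for i in range(len(word) + 1)]


def splits_one_short(word):
    """Same comprehension stopping one cut early, mirroring (range 0 (- (len word) 1))."""
    return [(word[:i], word[i:]) for i in range(len(word))]


print(splits("python")[-1])             # ('python', '')
print(len(splits("python")))            # 7
print(len(splits_one_short("python")))  # 6
print(list(range(3)))                   # one argument: start defaults to 0
```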
---
EDIT: oh my god, Norvig's page links to implementations of this spelling corrector in other languages, and the Perl one is 63 lines (!) and written in Perl 6. I don't think I'll sleep well until I've tried to make something shorter (in Perl 5). A Perl program three times longer than its Python equivalent: never seen that before!
EDIT2: heck, thinking about it, Python's list comprehensions and slice indexes (word[:i]/word[i:]) may actually not be easy/short to do in Perl...
But. Both this version & pg's one are actually buggy. Obvious example:
arc> (correct "yellow")
"fellow"
The canonical version gives the expected answer, "yellow". So I'd say, no, these are not correct implementations (the Perl version seems to always give the same results as the Arc one).
One thing I noticed is that the list returned by 'edits{1|2} contains the original value (i.e: (find "python" (edits1 "python")) is t), whereas this is not the behaviour of Norvig's version. (That's also why the loop in my correct() isn't "for ($win, &edits1($win), &edits2($win))", unlike the Arc one.)
And this may be incorrect. Or maybe, if $win/word is in nwords, 'correct should stop immediately. That would, at least, fix the "yellow" -> "fellow" problem.
This is, however, not the only issue. Given the same "big.txt", Norvig's version and these versions don't give the same corrections for other words (try "speling", "godd"). And I strongly suspect that Norvig's version is the more correct one.
(def known (words) (dedup:keep [nwords _] words)) ; lines count is now 12
(def correct2 (word (o f [nwords _]))
(most f
(or (known:list word) (known:edits1 word) (known:edits2 word) (list word))))
Or:
(def correct3 (word (o f [nwords _])) ; don't need 'known, but require aspirin
(most f (or ((apply orf (map [fn (w) (dedup:keep [nwords _] (_ w))]
(list list edits1 edits2)))
word) (list word))))
arc> (correct{2/3} "yellow")
yellow
arc> (correct{2/3} "godd")
good
arc> (correct{2/3} "speling")
spelling
arc> (correct{2/3} "unnkknnoown")
unnkknnoown
Not exactly the same results as Norvig's version (>>> correct("speling") -> "sling" / "godd" -> "god"), but I tested the Ruby version linked on his site, and it yields the same results as mine. Note that the result for "speling" is not really good in the canonical version. Maybe it's because the iteration order of Python sets differs from the order of Ruby/Arc lists. I should port Norvig's test program to stop worrying, but it's fine for the moment. For now, let's say this version is better than Norvig's (!!!)
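Here is a Python sketch of the priority logic behind correct2, following the structure of Norvig's program. The toy frequency table and the alphabetical tie-break are assumptions made here, not part of either version.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Strings reachable by one delete, transpose, replace or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Strings reachable by two such edits."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words, nwords):
    """The subset of words that appear in the frequency table."""
    return {w for w in words if w in nwords}


def correct(word, nwords):
    # Candidates are tried in priority order: the word itself, then distance 1,
    # then distance 2; only the first non-empty group is ranked by frequency.
    candidates = (known([word], nwords)
                  or known(edits1(word), nwords)
                  or known(edits2(word), nwords)
                  or {word})
    # Ties go to the alphabetically last word (an assumption), not to set order.
    return max(candidates, key=lambda w: (nwords[w], w))


# Toy frequency table (hypothetical; the real one is built from big.txt).
nwords = Counter("yellow fellow fellow good good god spelling".split())
print(correct("yellow", nwords))   # a known word wins even if "fellow" is more frequent
print(correct("godd", nwords))
print(correct("speling", nwords))
```

Because max only looks at the first non-empty candidate group, a known word is never replaced by a more frequent neighbour. That is what fixes "yellow". With an explicit tie-break, the result also stops depending on set iteration order, which is one possible cause of the discrepancies mentioned above.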
> "+" does mean "add" to the majority of people on earth - as it's common knowledge, and thus it's intuitive. Car means "something one drives" to almost everyone.
Because we couldn't think of any better alternatives. It would have been misleading to use first and rest or head and tail, because conses are fundamentally pairs; lists are one thing you can build with them, but not the only thing.
There's no conventional name in English for the first and second halves of a pair. If you have to make up names, car and cdr are pretty good choices, because they're short and the same length and naturally composable (e.g. cadr).
Basically, a surreal number is a pair of two sets of (surreal) numbers, the left one and the right one.
0 is (() . ()), (cons () ()), where () is the empty set.
1 is ((<surreal 0>) . ()), i.e. the left set holds 0: { 0 | }
-1 is (() . (<surreal 0>)), i.e. the right set holds 0: { | 0 }
It doesn't make sense to get the right set by saying (rest <surreal 0>), because it's just not the "rest", it's just the... cdr, i.e: the second half of the 'cons. I mean, conceptually, it's not the "rest".
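To show that the two halves are symmetric, here is a Python sketch. It models a surreal number as a pair of frozensets named `left` and `right`, with Conway's ≤ defined over them. In Conway's convention 1 is { 0 | } and -1 is { | 0 }. The class and function names are illustrative only.

```python
from typing import NamedTuple


class Surreal(NamedTuple):
    """A surreal number as a pair of sets of surreals: (left . right)."""
    left: frozenset
    right: frozenset


def le(x, y):
    """Conway's order: x <= y unless some left option of x is >= y
    or some right option of y is <= x."""
    return (not any(le(y, xl) for xl in x.left)
            and not any(le(yr, x) for yr in y.right))


ZERO = Surreal(frozenset(), frozenset())             # { | }
ONE = Surreal(frozenset({ZERO}), frozenset())        # { 0 | }
MINUS_ONE = Surreal(frozenset(), frozenset({ZERO}))  # { | 0 }
```

Neither field is a "rest" of the other. They are just the first and second halves of the pair.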
2. Car means "something one drives" to almost everyone. No, no, NO. Ask, for instance, the 1+ billion Chinese people if "car" means anything to them. Just stop the fucking English-centrism. Will English be the prominent language in 100 years? Is programming reserved for native English speakers? Is it good to mimic human languages in programming languages (hint: Algol VS C syntax)?
I love car/cdr because they don't mean anything.
Well, actually, yes, they mean/meant something: "contents of the {address|decrement} part of the register". But in my view they have just turned into arbitrarily chosen names for basic operations, like "+" is the arbitrary name of the "addition op" in our current arithmetic. You just learn that +/car means "this basic op", it becomes common knowledge, and you're done. I don't know, at this point it's not even a name, just a symbol.
I love APL (http://en.wikipedia.org/wiki/APL_(programming_language)) because it uses just symbols for ops. I love Perl's $_ / $| / $\ / etc. special vars because they don't mean anything by themselves, etc. I don't know, it just makes programmers from any country equal when it comes to speaking to a computer. I love maths because nearly all of it uses arbitrarily chosen "names", making people equal when it comes to speaking to the Universe.
1. So doesn't "last" work on a pair too? [edit] I see last does mean the last element even when it's a pair, whereas rest may not. I'll check this out. In the end, it may just be worthwhile to include rest and next as extra functions for the sake of language adoption? At an abstract level, "rest" and "next" do accomplish what most people expect them to.
>"No, no, NO. Ask, for instance, the 1+ billion Chinese people if "car" means anything to them. Just stop the fucking English-centrism."
I am willing to bet that if you asked every person on earth "what is car?", the answers would statistically support my statement, if only because some languages have no meaning for "car" and English is, in my opinion, more widely adopted. I could be wrong, but I don't feel bad about making that statement.
[Edited - deleted some of my own BS comments :) - thank you pg for having edit time]
Still, maybe I should have written it this way:
"Car does not mean first to almost everyone"
>"Will English be the prominent language in 100 years?"
Not sure. I think the world would be a much better place if the average world citizen spoke multiple languages.
And you never know: maybe in doing so new languages could emerge, and maybe there will be a "most powerful language" that is not widely adopted, or maybe it will be.
We shall see :) (Edit - Oops - I just realized we will not)
Also I believe everyone is overlooking one point.
Scheme is already more powerful, yet not widely adopted.
I don't believe pg's goal in building Arc is to make an even more powerful Scheme. I'd like to think that pg acknowledges that language adoption should be an equally important factor in Arc's potential success.
About 'last. Conses are a powerful, general data type. One way you could use them is to build a slightly less general data type, a subset of them, a "list", where (list 'a 'b 'c) is (cons 'a (cons 'b (cons 'c nil))).
Then you define some functions to operate (exclusively) on this new "data type", and 'last is one of them. Exclusively: (last (cons 'a 'b)) doesn't work, because it'd have no meaning to use 'last here. '(a . b) is not a list, just a cons.
Similarly, one way you could use lists is to build a slightly less general abstraction, association lists. Then you define 'alref/etc. to use with this new data type. But (alref (list 'a 'b 'c) 'a) doesn't work, since it has no meaning in this context.
But 'car/'cdr operate on raw conses, not only on lists. And as I said before, IMHO, 'rest is not good, because by using this name you're assuming it is an operator for lists only. Yes, (rest '(a . b)) would work, but its meaning is crappy here (IMHO).
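Here is a Python sketch of that layering. It builds raw cons cells, then lists made from conses that end in nil (modelled as None), then a `last` that accepts only proper lists. Raising TypeError on a non-list is this sketch's choice. The post only says (last (cons 'a 'b)) doesn't work.

```python
class Cons:
    """A raw pair: car and cdr can hold anything, not just lists."""
    __slots__ = ("car", "cdr")

    def __init__(self, car, cdr):
        self.car = car
        self.cdr = cdr


NIL = None  # stands in for Arc's nil / the empty list


def car(c):
    return c.car


def cdr(c):
    return c.cdr


def make_list(*items):
    """(list 'a 'b 'c) == (cons 'a (cons 'b (cons 'c nil)))"""
    result = NIL
    for item in reversed(items):
        result = Cons(item, result)
    return result


def last(xs):
    """Defined only on non-empty proper lists: the chain must end in nil."""
    while True:
        if not isinstance(xs, Cons):
            raise TypeError("last: not a non-empty proper list")
        if xs.cdr is NIL:
            return xs.car
        xs = xs.cdr
```

car and cdr work on any pair, while last is meaningful only on the list subset built on top of pairs.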
But you're making an interesting point. Maybe 'rest/'first should also be included, as synonyms for 'cdr/'car, because people almost always use 'car/'cdr when working with lists, and in a list context their meaning may be easier to understand. But this is another debate.
---
About the meaning of 'car. Sure, if you're programming in one of the current dialects of Lisp, you have some basic notions of English, even if you're Chinese. Yet, when reading Lisp code, if I see 'car I just know it means "first half of the pair". I mean, before you pointed out that "car" means "a vehicle" most of the time, I had never thought about this (valid) point when reading/writing Lisp code. That's why I don't see it as a problem.
I don't know, I suppose it's because the context is important. When I see 'map in a Lisp file, I don't think about a (geographical) map. When I see 'cons I know it's not as in "pros & cons". When I see 'table, I know it's not about a dining table. When I type `cd' in my shell, I know it means "change directory" and not "compact disc", etc.
Yes, if you ask a random guy in the street what "car" means, he'll answer "a vehicle" and not "the first half of the pair". But it's the same problem if you ask him what "cd" or "table" means. There are tons of issues like this, even in natural languages. For instance, homonyms: a "tire" is part of a car wheel, and to "tire" is to grow weary.
---
About language adoption. It seems pg knows it is important: So whether or not a language has to be good to be popular, I think a language has to be popular to be good. And it has to stay popular to stay good. (http://paulgraham.com/popular.html)
OTOH, remember Arc is made for "hackers" and to be adopted by "hackers" (at least at first). And there is the "100 years" idea (i.e: slow adoption may not be a problem). So maybe pg's idea of "language adoption" is not the same as yours ;-)
---
> (Edit - Oops - I just realized we will not)
Funny yet true one :-)
I always felt a little uncomfortable about the "100 years" idea. I keep telling myself it's just a catchphrase for a decent goal, trying to design something timeless. But still, it also reminds me of "I intend to set up a thousand-year Reich and anyone who supports me in this battle is a fellow-fighter for a unique spiritual, I would say divine, creation". You know, the foolishness of not wanting to live in the current, real, impure world.
> "I don't know, I suppose it's because the context is important. When I see 'map in a Lisp file, I don't think about a (geographical) map. When I see 'cons I know it's not as in "pros & cons". When I see 'table, I know it's not about a dining table. When I type `cd' in my shell, I know it means "change directory" and not "compact disc", etc"
Absolutely and it's "because the context is important" that I made the suggestion. I don't really care that it's called "car". I'm just suggesting that IF you're going to have a function called "last" why wouldn't you create "first", as the user will intuitively try to use it given that last exists.
You make a powerful point about cons cells being a more fundamental type than lists. I think if we really wanted English words to describe the parts of a cons, "left" and "right" would be appropriate for what a cons is in isolation, but would unfortunately be meaningful only if you're using conses to represent binary trees. The beauty of a cons is that it's the smallest possible structure out of which one can build arbitrarily larger composites. And I'm not sure how valuable metaphors from the physical world are when contemplating abstractions: a cons is another degree further removed from everyday reality than windows, buttons, dialogs, tabs, and menus are.
For outgoing connections: yep sure, as you can see you're not the first to want it ;-)
For select(): I, at least, would be a bit interested to see it. It could be fun to write a srv.arc that uses it. The perf should be better (or not: it's a common misconception that select() will always and automatically make things faster, and high performance is not the goal for Arc anyway. select() should lower memory use, but from what I've seen MzScheme's threads are light enough that memory is not really a problem). And with coroutines, maybe it could copy the vanilla srv.arc behaviour (maybe coroutines are not even needed, or not sufficient; I'm not enough of an async IO expert to say. That's mainly why I'd be interested in having select(): I could try).
By copying the srv.arc behaviour I mean, when you're in a defop, what you print is sent to the socket, and when you return the socket is closed, which makes life easy.
Example:
(defop hello req
(prn "Hello!"))
Then go to localhost:8080/hello, and you'll see a blank page with "Hello!" written. Simple. As far as I know, async-IO-based code is often more complicated than that.
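For comparison, here is a Python sketch of one pass of a select()-style loop using the standard `selectors` module. It stands in for whatever select() binding MzScheme would expose and is not Arc code. A `socket.socketpair()` replaces a real listening socket so the example runs without network access. The function name `serve_once` is hypothetical.

```python
import selectors
import socket


def serve_once(sel, timeout=1.0):
    """One pass of a select()-style loop: reply to every readable connection.

    Returns how many connections were handled. Each reply is written in one go
    and the socket is then closed, imitating the defop behaviour described above.
    """
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        request = conn.recv(1024)  # a real server would buffer partial requests
        if request:
            conn.sendall(b"Hello!\n")
        sel.unregister(conn)
        conn.close()
        handled += 1
    return handled


sel = selectors.DefaultSelector()
server_end, client_end = socket.socketpair()
sel.register(server_end, selectors.EVENT_READ)
client_end.sendall(b"GET /hello\r\n\r\n")
print(serve_once(sel), client_end.recv(1024))
```

Even this toy shows where the extra complexity comes from. The loop, not the handler, owns the connection. Any reply that can't be written in a single step would need per-connection state kept across passes.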
Last but not least, it's interesting because you'll certainly learn a lot writing it.
> How big is the chance of it being included in arc if the level of suckage is tolerable?
Fairly low. I mean, pg will not include code just because it's interesting. Your code would have to make news.arc's code shorter or its perf far better. Outgoing sockets, for instance, don't; that much is certain.
Anarki is a kind of fork of vanilla Arc, and it includes, for instance, the outgoing sockets stuff by default. I don't know its codebase well enough to say, but maybe there is other interesting network code. Anarki is at http://github.com/nex3/arc
Also, maybe browse/search this forum (http://af.searchyc.com/, or site:arclanguage.org <searched terms, e.g. async IO> in Google).
Interesting. I had thought the purpose of HN was to push down on Arc and make sure the language was capable, thus improving Arc. Now it seems that Arc's only purpose is to make HN code work?
I'm seeing chickens and eggs fly by at hazardous speeds.
Maybe I am reading this wrong, but it seems to be a common response/statement.
I suspect that if you want to know about Arc's design goals, then pg's writings at http://www.paulgraham.com/arc.html is a better source than various random interpretations floating around on the forums.
Agreed. I do hope, however, that arc.v4.0 is structured in such a way as to move away from being news.arc-centric. I hope pg chooses a different project to push down on Arc. I think this would open the door to a fresh perspective.
Scheme (lambda args ...) is equivalent to Arc (fn args ...).
It lets you get the raw list of arguments, which may be empty (i.e: no arguments were passed).
So (if (pair? args) ((car args) h)) tests if args is a non-empty list (with pair?). If so, it uses the car of this list as the initialiser: it calls it, ((car args) h), with the freshly created table, h, as parameter. The final h then returns the table.
(An example of the "raw arguments list" behaviour in Arc:
arc> (def prnargs args ; /!\ no parens around "args"
(forlen i args (prn "arg n°" i ": " (args i))))
arc> (prnargs)
nil
arc> (prnargs 'a 'b)
arg n°0: a
arg n°1: b
nil
)
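The same two ideas map onto Python's `*args`. The tuple may be empty, and when it isn't, its first element is called with the fresh table. This is a sketch with hypothetical names, not Arc's actual implementation:

```python
def make_table(*args):
    """Python analogue of the Scheme (lambda args ...) behind Arc's table."""
    h = {}
    if args:          # like (pair? args): at least one argument was passed
        args[0](h)    # like ((car args) h): call the initialiser on the fresh table
    return h          # return the table either way


def prnargs(*args):
    """Mirror of the Arc prnargs example: args is just the list of arguments."""
    for i, arg in enumerate(args):
        print(f"arg n°{i}: {arg}")


prnargs()          # prints nothing
prnargs("a", "b")  # arg n°0: a / arg n°1: b
```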
> ... removing table / obj > table ...
I believe 'obj is used less than 'table. 'table is simpler in that everyone knows what a (hash)table is, whereas an "object" is a fuzzy concept, as you pointed out.
> but all kind of things are objects
Yep, and what most people call objects are more or less just "special" hashtables. That's why, I suppose, obj is named like that: just a synonym for table (although 'obj itself is not the same thing as 'table).
You can do a lot with a hashtable, and it's cool. What's the problem?
(For info, Arc's templates also use hashtables under the hood, if I remember correctly. See also: JavaScript objects.)
> I suppose what I'm saying is that it doesn't seem very useful to have table visible in Arc and in general
I agree with you that maybe having both 'obj and 'table is a bit confusing, and maybe too much. People may expect they are totally different things. And they are in a way redundant.
My quick idea would be to make 'table take 0, 1 or more args. If 0, an empty hash. If 1, it's an initialiser fn. If more, it uses the args like 'obj does.
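Here is a Python sketch of that 0 / 1 / many dispatch. The error messages and the rule that a lone argument must be callable are assumptions. Unlike 'obj, the keys here are ordinary evaluated values.

```python
def table(*args):
    """DWIM table: 0 args -> empty, 1 arg -> initialiser fn, more -> key/value pairs."""
    h = {}
    if len(args) == 1:
        init = args[0]
        if not callable(init):
            raise TypeError("table: a single argument must be an initialiser function")
        init(h)
    elif args:
        if len(args) % 2:
            raise ValueError("table: keys and values must come in pairs")
        for key, value in zip(args[0::2], args[1::2]):
            h[key] = value
    return h


print(table())
print(table(lambda h: h.update(x=1)))
print(table("a", 1, "b", 2))
```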
As I observed above (http://arclanguage.org/item?id=10528), table can't do this (unless we rotate names around, which would be a perfectly valid option), but here's a lightly-tested macro (with a terrible name) which exhibits your DWIMmy behaviour:
> I like your suggestion about changing to 'table to take variable numbers of arguments.
Then let's do it :-)
In ac.scm, we change the xdef of 'table, and the def of 'fill-table (which doesn't seem to be used anymore) to work with a normal list (and not an alist).
> Thanks much for your detailed and informative reply.
Thanks go to you for having a fresh view on the data structures of Arc, and questioning them. I was just "quoting the manual" and this is not hard.
----
To absz: thanks for mentioning "DWIM"; I didn't know this acronym, but this is exactly what I expect from a (programming) language. It's what differentiates a language from a (formal) notation, IMO. Funny point: the (French) Wikipedia article uses Perl as its example of a DWIM-oriented language :-)
Interesting that Thaddeus (http://www.arclanguage.org/item?id=10533) makes a mistake using 'obj / wants 'inst to work w/ 'obj. This reinforces my feeling that 'obj is confusing. People expect an "object" to do magic (inheritance, instantiation, etc.) and to be high-level, whereas they just expect a (hash) "table" to be a simple key/value store.
The DWIM definition of 'table given above would be even better if it could also take an alist as single argument, and use it to init the table (it's easily doable, but I'm tired now).
But maybe this DWIM stuff is crap because we're writing everything in Scheme and not in Arc.
Maybe (xdef new-table (lambda () (make-hash-table 'equal))), and then use the raw 'new-table and 'fill-table, 'listtab, 'tablist and co. to define a DWIM 'table in Arc.
Or maybe it's crap because DWIM is crap by nature and it's better to have (to know) 'table, 'obj, 'fill-table, etc.
What is sure is that it makes the "design work" easier: you can keep the 'table/etc. definitions shorter and cleaner in arc.arc/etc. What is less sure is whether doing so (giving the designer the easy life) is the way to get a practical language for actual programmers.
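The layering suggested above can be sketched in Python as a few raw primitives with the DWIM layer on top, including the alist case. The names mirror new-table, fill-table and listtab from the post, but their Python behaviour here is an assumption, not Arc's definitions.

```python
def new_table():
    """Primitive: a fresh, empty hash table."""
    return {}


def fill_table(h, flat):
    """Primitive: store a flat [k1, v1, k2, v2, ...] list into h and return h."""
    if len(flat) % 2:
        raise ValueError("fill_table: keys and values must come in pairs")
    for key, value in zip(flat[0::2], flat[1::2]):
        h[key] = value
    return h


def listtab(alist):
    """Primitive: a table built from an association list [(k, v), ...]."""
    return fill_table(new_table(), [x for pair in alist for x in pair])


def table(*args):
    """DWIM layer written only in terms of the three primitives above."""
    if not args:
        return new_table()
    if len(args) == 1 and callable(args[0]):
        h = new_table()
        args[0](h)                              # initialiser function
        return h
    if len(args) == 1 and isinstance(args[0], list):
        return listtab(args[0])                 # association list
    return fill_table(new_table(), list(args))  # k1 v1 k2 v2 ...
```

The primitives stay small and unambiguous. All of the guessing lives in one place, `table`.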
I like your definition of table. The only advantage of 'obj is that keys don't need to be quoted. (Although sometimes that's a disadvantage too; the macro quotes them, so keys can't be determined at run-time)
> you can't modify the variable within the loop, which is something I meant to be possible
Yes useful feature, so maybe:
(mac for (v init max . body)
(w/uniq (gv gi gm)
`(with (,gv nil ,gi ,init ,gm (+ ,max 1))
(loop (assign ,gv ,gi) (< ,gv ,gm) (assign ,gv (+ ,gv 1))
((fn (,v) ,@body (= ,gv ,v)) ,gv)))))
?
Very lightly tested, only in the online repl, but seems OK although a bit ugly.
arc> (for i 0 10 (pr i " ") (++ i))
0 2 4 6 8 10 nil
arc> (do (for i 0 10 (thread:pr i " ")) (sleep 1))
0 10 8 6 4 2 9 5 1 7 3 nil
arc> (urldecode "80%25%20-+20%25")
"80% - 20%"
Anyway, it'd make the def of 'for more complex, less clean, and the perf a little bit worse.
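The macro above gives each iteration a fresh binding and writes the body's final value back to the loop counter. Python closures show the same capture issue without threads, so here is a sketch of that behaviour. A Python function can't rebind its caller's variable, so the body returns its (possibly modified) value instead. That return stands in for (= gv v), and `arc_for` is a hypothetical name.

```python
def arc_for(init, maximum, body):
    """Inclusive loop; body gets a fresh binding each time and returns its value,
    which is written back before the counter is incremented."""
    current = init
    while current <= maximum:
        current = body(current)
        current += 1


seen = []

def skip_one(i):
    seen.append(i)
    return i + 1          # plays the role of (++ i) in the body

arc_for(0, 10, skip_one)
print(seen)               # same sequence as the (for i 0 10 (pr i " ") (++ i)) transcript

captured = []

def capture(i):
    captured.append(lambda: i)  # each call has its own i, like each thread
    return i

arc_for(0, 3, capture)
print([f() for f in captured])

shared = []
for j in range(4):
    shared.append(lambda: j)    # all closures share the single loop variable j
print([f() for f in shared])
```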
> I believe the strange behavior palsecam discovered is actually correct. But if anyone wants to make the case that it shouldn't be, I'm open to being convinced.
I don't really care but I like to play the devil's advocate :-)
It's a bug for my brain. I'd sleep better at night if I knew I could use 'for in any situation, even w/ threads.
1: Simpler. The less stuff I have to keep in mind (e.g: "oh right, and remember 'for is not thread-safe"), the better.
2: More robust. I like to know I can "stand on the shoulders of giants" and that edge cases are handled correctly.
It's a bug because you call it "strange" and considered it a bug (and so do I). Maybe we're wrong and can't see the real problem(s) behind using threads in a loop construct, or maybe this behaviour is just a needless quirk that shouldn't exist, and we're right.
coconutrandom, this is really awesome! You wanted something that was not possible with Arc (http://www.arclanguage.org/item?id=10092), and instead of complaining or anything, you actually made it possible!
Hats off, really, hats off to you! You are what I'd call a Real Man :-P
I'll take a look at the source as soon as possible to see if I can help with the refactoring, but I'm too busy this weekend.
People often suggest a bidirectional for, then demonstrate its readability on literal values -- and it does look good. The issue is that when you use variables or expressions (as is usually the case), it's not immediately obvious what the bounds (and thus behavior) of the loop will be. Hence, the separation.
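To make the concern concrete, here is a Python sketch with Python standing in for Arc. It compares an ascending-only inclusive loop with a direction-inferring one when the end bound is an expression that can fall below the start, as with an empty sequence. `up` and `bidi` are hypothetical helper names.

```python
def up(start, end):
    """One-directional, inclusive: empty when end < start."""
    return list(range(start, end + 1))


def bidi(start, end):
    """Direction-inferring, inclusive: counts down when end < start."""
    step = 1 if start <= end else -1
    return list(range(start, end + step, step))


xs = []
print(up(0, len(xs) - 1))    # nothing to visit
print(bidi(0, len(xs) - 1))  # visits 0 and -1, indices an empty xs doesn't have
```

On literal bounds both look equally good. The difference only appears when `len(xs) - 1` turns out to be -1 at run time.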
Shouldn't 'forlen be used instead in your examples?
arc> (= xs '(a b c))
(a b c)
arc> (forlen i xs (prn (xs i)))
a
b
c
nil
But hey, too many loop constructs. If I hadn't `grep'ed Arc (see below), once again, I wouldn't know about it either.
I don't agree with 'for_ being "buggy" or with 'for/'down being an obvious/mandatory division. It's just a tradeoff, in my opinion. But don't get me wrong: you're right, there is a rationale behind 'down.
As for the descending loop being a separate concept because there are cases where it is "necessary"... well, I personally don't buy this. I call it "everyone repeats the lesson, and no one questions it".
It happens, rarely but it happens, that I need a descending 'for. And, darn it, I can never remember the syntax for it (in any language); I always need to Google it. Yet God knows I can remember a tremendous amount of details (latest example in Arc: the need for 'write/'disp; I understood it while coding evsrv, and there, it's in my brain and will not be forgotten. This kind of detail, OK.). And because I think my brain is awesome and only forgets useless stuff (I remember "useful" things I saw when I was 5), well, I don't buy the need for 'down.
For strange behaviours where the bounds would be reversed, well, I always check the input where needed (general rule). If I'm doing "exploratory programming", it's OK if this causes "bugs". That is far more OK than having to WTF and lose time, and my concentration, to start my browser and ask Google how to 'down. And often, I just use 'forlen/'each. 'for is good for C.
And yes, all this is terribly arrogant. But I'm not alone in the "need to Google it every time".
The funny thing is, I actually `grep'ed the Arc files yesterday, and I'm nearly sure 'down could be removed without problems and 'for_ adopted, provided you modify a few things (like the def of 'forlen).
And I'm nearly sure it will be good, because Worse Is Not Always Better. The programmer should have the easy life (i.e: not having to remember 100 loop constructs), not the {system|language} designer (who should make sure the def of, say, 'forlen is correct even for empty lists, even if it means adding an 'if or anything). I don't care too much if arc.arc is a little bloated if it means I have less stuff to remember.
Where "good" here means my definition of "good", plus the crazy definition "would make Arc and news.arc shorter". Yes, I claim it'd actually make them shorter.
Unfortunately, ars longa, vita brevis, and I don't want to waste time proving this rather useless point. But I'm nearly sure 'down is useless in the current small-not-so-small version of Arc + libs. Oh and, darn it, I'm adding it to my ARROGANT_TODO list. One of these days I'll demonstrate that my point of view is at least a very acceptable one, so that you don't take me for too much of a moron :-)
But of course, if you like 'for to be like this, I see no problem with this.
And thanks for taking the time to remind me of all this (because, sincerely, once again, I couldn't see why 'for in Arc couldn't count down).
And anyway, 'for is so 70s. 'each and 'repeat are used far more. How many times do we use 'for directly (hint: something like 5 times in x.arc, macro definitions excluded because they don't count IMO ("worse is not always better"), and most of the time when you are sure the bounds are OK, e.g: (for 0 255 ...))? And how many times do we use 'down (hint: once in x.arc)?!
----
Ultimate arrogance: I'll quote Einstein here:
The important thing is not to stop questioning [the real need for 'down, even if everyone says so, when `grep' is far less convinced than people on this]. Curiosity has its own reason for existing.
Shouldn't 'forlen be used instead in your examples?
I'd say each should be used in the examples. The point is that they are easy instances of a more general problem, as I noted: when using for, you're typically using expressions; when you're using expressions, you aren't sure if the bounds will result in an ascending or descending loop. There are instances where this distinction is important. Take posmatch in strings.arc, defined as
(def posmatch (pat seq (o start 0))
(catch
(if (isa pat 'fn)
(for i start (- (len seq) 1)
(when (pat (seq i)) (throw i)))
(for i start (- (len seq) (len pat))
(when (headmatch pat seq i) (throw i))))
nil))
Here we see the else-clause for-loop isn't merely a place to substitute forlen or each: it only iterates up (by for's behavior) to the largest index at which the pattern could occur in the sequence:
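A Python transcription of posmatch and headmatch makes that bound explicit. It is a sketch, not strings.arc itself: `pat` is taken to be either a string or a predicate. Arc's inclusive `(- (len seq) (len pat))` becomes an exclusive `len(seq) - len(pat) + 1`.

```python
def headmatch(pat, seq, start=0):
    """True if pat occurs in seq at start; indexes seq blindly, so a pattern
    that runs past the end raises IndexError."""
    return all(seq[start + i] == ch for i, ch in enumerate(pat))


def posmatch(pat, seq, start=0):
    """Index of the first match of pat in seq at or after start, else None."""
    if callable(pat):
        for i in range(start, len(seq)):
            if pat(seq[i]):
                return i
    else:
        # An ascending-only range stops at the last index where pat can still fit,
        # and is simply empty when pat is longer than seq.
        for i in range(start, len(seq) - len(pat) + 1):
            if headmatch(pat, seq, i):
                return i
    return None
```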
That's fine, of course. It works. It just seems gratuitous -- like something I'd want handled for me already. But everyone has their definitions of "good", and yours is certainly no less (or more) valid than mine.
Hell, someone might like having the bidirectional loop in general, then use a separate loop ("up"?) for this case.
The programmer should have the easy life (i.e: not having to remember 100 loop constructs)
Whereas I think remembering 100 loop constructs is easier than remembering that the handful of loop constructs are incredibly fragile.
But of course, if you like 'for to be like this, I see no problem with this.
Nor do I see a problem if you want a bidirectional for. This is one use for macros: rather than worry that Arc doesn't have some loop construct, you're allowed to make your own. No need for the language spec to get updated if you can easily write a bidirectional loop. And if for was changed to be bidirectional, I could similarly write macros for ascending and descending loops.
As you say, this is just the rationale. But that's not saying much: by its very nature, language design is about rationale; the only "necessary" components of the language are basically the ones that make it Turing-complete.
Thanks to my arrogance/guts for pushing me to try to remove 'down, because it showed me that the Arc codebase confirms my own experience of programming:
- you never use 'for directly, except in cases where you are sure the bounds are OK.
Where "directly" means not in a library {mac|fn} definition, because there you must validate your input anyway, if you agree w/ "Worse is not always better" (i.e: the {sys|lang|lib} writer does the hard work, not you, the user).
If you don't agree, well, one problem is that it leads to inconsistencies/bugs. See below.
The "problematic" (few) occurrences of 'for only appear in arc.arc and strings.arc, which are typical library files. Not even "normal" libraries, but "core" ones.
The kind where I'd strongly apply "worse is not better".
You'll not see 'for used with expressions in any other files, i.e: "application" (blog.arc, news.arc, etc.) or even other libs files.
You'll not even see it at all in news.arc, srv.arc, code.arc, prompt.arc.
You'll see it used directly twice, here:
blog.arc: (for i 0 4 ; no bounds pb
html.arc:(for i 0 255 (= (hexreps i ; no bounds pb
- you sometimes, rarely, also need to directly use a descendant 'for ('down). Only once in all Arc (but once = it is needed):
news.arc: (down id maxid* 1
Where maxid* is a global, and the kind of global that is closer IMO to a literal than to a (complex) expression, so no problem. See below.
So it's a pity that for this one time you can't use 'for, and have to resort to yet another loop construct that is here for... non-existent problems.
- for the vast, vast majority of looping, you use higher-level loop constructs (each/repeat/etc.), so there is no problem w/ incorrect bounds, assuming the lib writer is not a moron.
----
In arc.arc:
Is it coherent that 'posmatch will return nil when pat > seq, whereas 'headmatch will throw an error in the same case (even stranger knowing 'posmatch actually calls 'headmatch)?
arc> (headmatch "abcd" "abc")
Error: "string-ref: index 3 out of range [0, 2] for string: \"abc\""
arc> (posmatch "abcd" "abc")
nil
Coherent, and correct IMO. We ask if it matches. If pat > seq, the answer is just "no"; it's not an error per se.
Or: how 'headmatch is "incredibly fragile", and the so-called "solid" 'for hides this fact here. Thanks, pseudo-solidity.
Validate your input, and don't rely on the behaviour of something inherently fragile (a raw construct) when writing a library fn.
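Under the "validate your input" rule, headmatch itself can refuse a pattern that would overrun. posmatch then no longer depends on the loop's exact upper bound. Here is a Python sketch that handles strings only and leaves out the predicate case:

```python
def headmatch(pat, seq, start=0):
    """Validating version: answers False instead of indexing past the end."""
    if start < 0 or start + len(pat) > len(seq):
        return False
    return seq[start:start + len(pat)] == pat


def posmatch(pat, seq, start=0):
    """Scans every start position; safe because headmatch validates its input."""
    for i in range(start, len(seq)):
        if headmatch(pat, seq, i):
            return i
    return None
```

The trade-off is a few wasted iterations near the end of `seq` in exchange for a headmatch that never raises on an overlong pattern.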
In news.arc, I (obviously) changed:
(down id maxid* 1
to:
(for id maxid* 1
I feared it might not work when there are no items, so I tested this case (nsv, then accessed localhost:8080), and there was actually no problem.
I don't use news.arc, so I can't test the rest, but it should be OK.
(If there's a problem, maybe just changing it to (for id maxid* 0 ...) would solve it.)
----
"You claimed it'd make the code shorter! Prove it!"
Clever, interesting test:
arc> (let toto 0
(each (k v) (tokcount '("arc.arc" "strings.arc" "news.arc"))
(++ toto v))
toto)
14756
arc-no-down> (let toto 0
(each (k v) (tokcount '("arc.arc" "strings.arc" "news.arc"))
(++ toto v))
toto)
14749
Harder, dumber, raw `wc' test:
$ wc -m 3.1orig/*.arc
[...]
198017 total
$ wc -m 3.1nodown/*.arc
[...]
198017 total # Argh, failed! It's ==, not strictly <...
----
The no-down patch was coded quickly and with nearly no testing afterwards, so there might be bugs. I hope someone proves I've introduced lots of bugs; that way I could be sure all this crap at least makes someone take a look at reality (where reality is, here, some practical code, and not some books) and question things. One thing Arc got very right is "code.arc".
And no, telling me "it is buggy for me" doesn't count without showing some Arc code where you're actually hampered by the new 'for behaviour. Otherwise it's like with hygienic macros: "incredibly less fragile", but no one cares because unhygienic is good enough/more powerful, provided you live in the real world.
And anyway it doesn't count, because everyone here more or less accepts the fact that the Arc codebase is a superb piece of software (so if you don't have the same coding practices, you suck), that brevity is power, and that it is a valid codebase for testing the necessity of an operator. All of this IS questionable. But too many people here are... not qualified to do so, unless they are sure their comment history will not reveal some stupid blind adoration for Arc.
I trust {my|other people's} guts & feelings, but in the end I believe only in reality, in data (and you know as well as I do that code is data :-D), not in opinions and books.
- you never use 'for directly, except in cases where you are sure the bounds are OK.
The "problematic" (few) occurrences of 'for only appear in arc.arc and strings.arc, which are typical library files.
What makes arc.arc and strings.arc less valid examples of for usage? They're Arc programs, too. Should they not inherit the elegance they're attempting to define? (While still balancing efficiency, of course, cf. the tutorial: http://ycombinator.com/arc/tut.txt)
To the contrary, because arc.arc and strings.arc use for, I think they make perfect examples -- which would make your first statement untrue, since you had to write extra bounds-checking.
- you sometimes, rarely, also need to use a descending 'for ('down) directly. Only once in all of Arc (but once = it is needed):
So it's a pity that for this one time, you can't use 'for, and have to resort to yet another loop construct that exists for... non-existent problems.
You're ignoring that down has another purpose. As you say, the need for a descending loop is rare. But the need for for to only go in one direction is much less rare (more on that later).
for the vast, vast majority of looping, you use higher-level loop constructs (each/repeat/etc.), so there is no problem w/ incorrect bounds, assuming the lib writer is not a moron.
So you'd also want to foist the responsibility of not being a "moron" onto every user of for? If other loops are already used to avoid silly bugs, why not for?
I count at least 12 different loop constructs in arc.arc: while, loop, for, down, repeat, each, whilet, whiler, forlen, on, until, noisy-each, and arguably others like evtil and drain.
I find that adding these makes code simpler: they express (and implement) purposeful loops. That's why I can do
(each x xs (prn x))
instead of
(forlen i xs (prn (xs i)))
which can be done instead of
(for i 0 (- (len xs) 1) (prn (xs i)))
which can be done instead of
(loop (= i 0) (< i (len xs)) (++ i) (prn (xs i)))
etc. If I wanted the most general & least to remember, I'd use a goto.
When for tries to infer the direction I want to go, I need to fight it to stop it from going in the opposite direction -- to me, this is inconvenient.
Is it coherent that 'posmatch will return nil when pat > seq, whereas 'headmatch will throw an error in the same case (even stranger knowing 'posmatch actually calls 'headmatch)?
I agree that headmatch has odd behavior here. But with the fixed behavior (i.e., your patch):
Just because the function to which you funnel input sanitizes data doesn't mean you should be supplying bad values. Further, if we add more error-checking to posmatch to avoid the redundant calls, we're adding even more complexity -- wrestling against for to get it to go just one direction.
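To make the two concerns above concrete, here is a small Python model (Python rather than Arc, and not the actual patch code) of Arc 3.1's for and down, of a direction-inferring for, and of posmatch built on top of each. As a simplifying assumption, it treats Arc's (seq i) as an error for any out-of-range index, including negative ones, instead of wrapping around like Python indexing does; arc_for, arc_down, bidi_for and arc_index are hypothetical names.

```python
def arc_for(init, end):
    """Model of Arc 3.1 'for': count up from init to end inclusive.
    Yields nothing at all when init > end."""
    v = init
    while v < end + 1:
        yield v
        v += 1


def arc_down(init, end):
    """Model of Arc 3.1 'down': count down from init to end inclusive."""
    v = init
    while v > end - 1:
        yield v
        v -= 1


def bidi_for(init, end):
    """Model of the patched 'for': go up if end > init, otherwise go down."""
    return arc_for(init, end) if end > init else arc_down(init, end)


def arc_index(seq, i):
    """Stand-in for Arc's (seq i): out-of-range indices are an error."""
    if not 0 <= i < len(seq):
        raise IndexError(f"index {i} out of range for {seq!r}")
    return seq[i]


def headmatch(pat, seq, start=0):
    """True if pat occurs in seq at position start (errors if seq runs out)."""
    return all(arc_index(pat, i) == arc_index(seq, start + i)
               for i in range(len(pat)))


def posmatch(pat, seq, loop=arc_for):
    """First position of pat in seq, trying offsets 0 .. len(seq) - len(pat)."""
    for i in loop(0, len(seq) - len(pat)):
        if headmatch(pat, seq, i):
            return i
    return None


# (for i 0 (- (len xs) 1) ...) on an empty xs: no iterations vs. two bogus ones
print(list(arc_for(0, len([]) - 1)))    # []
print(list(bidi_for(0, len([]) - 1)))   # [0, -1]

print(posmatch("abc", "ab"))            # None: the upward-only loop never runs
try:
    posmatch("abc", "ab", loop=bidi_for)
except IndexError as e:
    print("bidirectional for:", e)
```

With the upward-only loop, a negative upper bound simply means "nothing to do", which is what posmatch and the (- (len xs) 1) idiom quietly rely on. Once the loop may run backwards, each such call site needs its own guard.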
"You claimed it'd make the code shorter! Prove it!"
I believe only in reality, in data
Then let's inspect your patch closer:
inspect-patch.arc
(def default (file)
(+ "../arc3.1/" file))
(def patched (file)
(+ "../arc-patch/" file))
(def sexp-tokcount (sexp)
(len (flat sexp)))
(= for-def*
'(mac for (v init max . body)
(w/uniq (gi gm)
`(with (,v nil ,gi ,init ,gm (+ ,max 1))
(loop (assign ,v ,gi) (< ,v ,gm) (assign ,v (+ ,v 1))
,@body))))
down-def*
'(mac down (v init min . body)
(w/uniq (gi gm)
`(with (,v nil ,gi ,init ,gm (- ,min 1))
(loop (assign ,v ,gi) (> ,v ,gm) (assign ,v (- ,v 1))
,@body))))
new-for-def*
'(mac for (v init end . body)
(w/uniq (gi gm gt gf)
`(do
(if (> ,end ,init)
(= ,gt < ,gf +)
(= ,gt > ,gf -))
(with (,v nil ,gi ,init ,gm (,gf ,end 1))
(loop (assign ,v ,gi) (,gt ,v ,gm) (assign ,v (,gf ,v 1))
,@body))))))
; if this calculation is wrong, it should be revealed in logic-savings
(= max-diff* (- (+ (sexp-tokcount for-def*) (sexp-tokcount down-def*))
(sexp-tokcount new-for-def*)))
(def token-total (file)
(sum cadr (tokcount (list file))))
(def token-diff (file1 file2)
(- (token-total file1) (token-total file2)))
(def compare-tokcount (filename)
(let diff (token-diff (default filename) (patched filename))
(if (> diff 0)
(prn "The patch saved " (plural diff "token") " in " filename)
(< diff 0)
(prn "The patch added " (plural (- diff) "token") " to " filename)
(prn "The patch didn't change the token count in " filename))))
(def maximum-savings ()
(prn "The patch could have saved at most (caveat lector) "
(plural max-diff* "token")
" in arc.arc"))
(def logic-savings ()
(let diff (token-diff (default "arc.arc") (patched "arc.arc"))
(if (<= diff max-diff*)
(prn "So, by changing 'for in arc.arc, "
(plural (- max-diff* diff) "token")
" got added to code that used the previous version of 'for")
(err "miscalculated the maximum number of tokens you could save"))))
(map compare-tokcount '("arc.arc" "strings.arc" "news.arc"))
(prn)
(maximum-savings)
(logic-savings)
At the REPL
arc> (load "inspect-patch.arc")
The patch saved 9 tokens in arc.arc
The patch added 2 tokens to strings.arc
The patch didn't change the token count in news.arc
The patch could have saved at most (caveat lector) 17 tokens in arc.arc
So, by changing 'for in arc.arc, 8 tokens got added to code that used the previous version of 'for
nil
To explain the "caveat", I assume the most this new for could change is: (a) remove the single-direction for and down, (b) add the bidirectional for, and (c) leave any other piece of code that used for/down unchanged (save switching the word "down" to the word "for").
With these assumptions (and by inspecting the code), the assessment seems correct: arc.arc nets 8 additional tokens to stop for from going backwards. It's not that the token count is shorter from having for go both directions; it's that the code you've added to avoid for's new behavior isn't quite enough to outweigh the savings from removing down's definition.
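The accounting in inspect-patch.arc comes down to two steps. First, count the atoms of each definition (sexp-tokcount, i.e. (len (flat sexp))). Second, compare the saving the patch actually achieved with the best case. Here is a rough Python sketch of both steps. It assumes nested lists stand in for s-expressions and [] stands in for nil, which Arc's flat drops. Reader shorthands like ` and , would have to be written out as quasiquote/unquote atoms for the counts to match Arc's. The function names are hypothetical.

```python
def sexp_tokcount(sexp):
    """Number of atoms in a nested list; empty lists (Arc's nil) count as 0,
    mirroring (len (flat sexp))."""
    if isinstance(sexp, list):
        return sum(sexp_tokcount(x) for x in sexp)
    return 1


def tokens_added_back(max_diff, diff):
    """Mirror of logic-savings: how much of the best-case saving was eaten by
    code that had to be rewritten around the new 'for'."""
    if diff > max_diff:
        raise ValueError("miscalculated the maximum number of tokens you could save")
    return max_diff - diff


# (with (v nil gi init) ...) -> the nil is not counted
print(sexp_tokcount(["with", ["v", [], "gi", "init"]]))   # 4
# figures from the REPL session above: at most 17 saved, 9 actually saved
print(tokens_added_back(17, 9))                           # 8
```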
In actuality, you'll wind up saving far less than 9 tokens because of multiple evaluation bugs:
i.e., 7 more tokens, totaling 14 more tokens, which outweighs the original figure. So, nothing is even really saved in arc.arc. Though, of course, the rewrites could be shorter with something like once-only (see towards the end of http://gigamonkeys.com/book/macros-defining-your-own.html).
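The multiple-evaluation problem comes from the expansion itself. The patched macro splices ,init and ,end both into the (if (> ,end ,init) ...) test and into the with bindings, so each argument form runs twice. Below is a Python model in which zero-argument callables stand in for unevaluated argument forms (a device of the model, not how Arc works internally). It shows how a side-effecting bound such as a pop changes the loop. It also shows how evaluating each bound exactly once, which is what a once-only style macro guarantees, fixes it. patched_for and once_only_for are hypothetical names.

```python
def patched_for(init, end, body):
    """Model of the patched 'for'. init and end are zero-argument callables
    standing in for argument forms; like the expansion, it calls each twice."""
    if end() > init():                       # 1st evaluation: (> ,end ,init)
        step, keep_going = 1, (lambda v, m: v < m)
    else:
        step, keep_going = -1, (lambda v, m: v > m)
    v, m = init(), end() + step              # 2nd evaluation: ,gi and ,gm
    while keep_going(v, m):
        body(v)
        v += step


def once_only_for(init, end, body):
    """Same loop, but each bound form is evaluated exactly once up front."""
    lo, hi = init(), end()
    patched_for(lambda: lo, lambda: hi, body)


stack = [5, 1]
seen = []
patched_for(lambda: 0, stack.pop, seen.append)
print(seen)    # [0, 1, 2, 3, 4, 5]: the test popped 1, the bound popped 5

stack = [5, 1]
seen = []
once_only_for(lambda: 0, stack.pop, seen.append)
print(seen)    # [0, 1]
```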
Further, strings.arc and news.arc did not get shorter (strings.arc even got a little longer). The only way it seems that un-patched code could get shorter is if it had to go either up or down and the order didn't matter -- unlike code in the files inspected.
Therefore, this patch can either make new code longer or make you hope that for doesn't iterate in a direction you don't want it to (as in news.arc), unless you needed to do the Arc 3.1 equivalent of
(if (< start end)
(for i start end ...)
(> start end)
(for i end start ...))
which, with this patch, could be replaced with
(for i start end ...)
which is shorter.
As infrequently as such code occurs (0 times in the standard Arc 3.1 distribution, so far as I can tell), this does not yield big space savings. If it does occur frequently enough, it shouldn't outweigh the need for single-direction iterations, but would probably instead be made into a separate macro:
(mac between (var bound1 bound2 . body)
...)
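The body of between is deliberately left open above. Purely as an illustration, here is a Python sketch of one possible reading. Its semantics are an assumption, not a settled design: visit every integer between the two bounds, in ascending order, whichever bound comes first. Unlike the (if (< start end) ... (> start end) ...) form above, this version also runs once when the bounds are equal. That is a design choice the thread does not settle.

```python
def between(bound1, bound2):
    """Hypothetical 'between': ascending over the closed interval spanned by
    the two bounds, regardless of argument order."""
    lo, hi = min(bound1, bound2), max(bound1, bound2)
    return range(lo, hi + 1)


print(list(between(1, 4)))   # [1, 2, 3, 4]
print(list(between(4, 1)))   # [1, 2, 3, 4]
print(list(between(2, 2)))   # [2]
```

Keeping this under a separate name leaves the plain for strictly single-direction, which is the property the code in arc.arc and news.arc relies on.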
Additionally, you assert that having an extra loop construct entails an unnecessary mental burden for the programmer. I disagree. It's not a burden if its purpose is specific: if you want to repeat a block of code, use
(repeat n ...)
instead of
(for temp 1 n ...)
If you want to iterate over the length of a sequence, use
(forlen i xs ...)
instead of
(for i 0 (- (len xs) 1) ...)
Moreover, if you want to iterate upwards through a range of integers, use
> I think that would get me in trouble sometimes...
Yes, I can also see cases where this behaviour would be problematic.
However, I'm nearly sure that if I hadn't had to look at arc.arc to modify 'for, I would never have known about 'down.
And the day I needed a reverse 'for, I would have tried (for i n+m n) in the REPL, seen that it doesn't work, and said to myself "OK, Arc is one of those languages where the programmer has to make sure the bounds of 'for are in the right order", and I would have coded a little thing to ensure that.
BUT, and CatDancer is very right here, on the other hand, if 'for also works in descending order, then it's "OK, Arc is one of those languages where the programmer has to test the bounds of 'for whenever he can't be sure the expected max will actually be greater than the min".
Trade-off, trade-off, trade-off everywhere. Is the burden of knowing yet another loop construct worth it because you then don't have to manually test the bounds when you're not sure what they'll be, or is it the other way around? I guess there is no universal answer to this...
Your choice, but I noticed the Anarki arc.sh startup script is bash-dependent, which is actually less widespread than Perl. I'm not totally sure, but the BSDs don't have bash by default, whereas they do have Perl. Perl is everywhere (even on Windows, see ActivePerl), seriously. Similarly, nearly every Linux distro has Perl by default, whereas, same thing, some don't have bash (I'm sure of this, because I got bad surprises because of it in the past, although I don't remember which distros; I can check if you want). Bash is a big fat cow.
Personal opinion, but a bash-dependent shell script just kills the whole idea of a shell script. I write shell scripts when I need to be really sure any Unix will run them, and this implies making them strictly POSIX sh compatible.
> I'd rather not rely on hardcoding what files get loaded at startup
Yes, this is a big sore point. But there is a workaround:...
> especially since anarki modifies libs.arc to autoload things in load/
... then make it modify as(-fast).scm instead :-P. Or even libs.arc, which would then contain just the autoloaded stuff (and not the "core" libs).
> be necessary to use arc for any sort of scripting
Yep, but not only that. Not at all. This is related to my way of working, but I'll explain it nevertheless so that people can understand my need for fast app startups, because I know a lot of people like me. I can of course understand the "eternal session" approach, and I sincerely find it smart, but really, I know very few people who can stick with it (I can't). I mean, as far as I know, you're really not in the majority.
I'm actually OK with a script taking a long time (I sometimes like scripts to take time: it gives me the impression they're doing a lot of work and that I'm smart, making the computer do the stuff :-D, and it lets me pause for a second => think before typing). It's not OK when I'm in a trial-and-error process, but even there it's not so bad if I lose some time. It's always pleasant to see the computer work, even for nothing.
But I need to spawn/terminate, spawn/terminate processes, and this has to be quick. I always have at least 2 or 3 virtual desktops (it's an eeePC, so these aren't "real" vdesktops: on this tiny screen I need nearly one vdesktop per app; replace "vdesktop" with "window", same thing), and having to find the one in which I started the process just plainly sucks my balls (seriously; even if it takes 2 secs: useless stuff, waste of time). Say for a terminal, I just hit ctrl-shift-enter, a new xterm is spawned in 0.1 sec and I can do what I have to do right now. Same for my web browser, which is Conkeror (or Chrome on Windows, same thing there because it starts very quickly), and which can be launched in daemon mode so that when you call it, it's there in < 1 sec (and this is why I can't consider the fat Firefox). Same for everything, or it should be. For instance, I actually plan to learn vim because Emacs takes too long to start. I plan to drop bash, because with the smart completion feature it doesn't start instantly, and for a shell this is just a sin.
I'm a big procrastinator, and when I want to do some work, which is not often, if the computer makes it difficult (slow to start), well, I go back to procrastinating. Seriously. Even 1 sec can suffice. And this bugs me a lot, because computers nowadays, WTH, they've got GHz of power; this should not happen.
I insist on all this not because I think this is the right way, but because you (I mean, not you, rntz, personally, I mean people in general) are crazy to ignore this.
Yes, 0.1 sec makes a difference. I'm sure you know about the Google experiment where they tried loading x * 2 instead of x results per page (I don't remember the exact numbers): the load time increase was something as insignificant as +0.1 sec, and they actually lost a real % of customers. (If you don't know about this experiment, tell me and I'll go look for the link; it's very, very interesting.) People - just - respond - to - speed. I already said that, but Google built its fucking huge empire based mostly on this single point. And they continue to expand their empire by applying this recipe (Chrome; Google OS which, I bet everything on this, will be 10x faster to start than Ubuntu; GMail on mobile, which just says "Download GMail for your mobile - it's blazing fast"; etc.)
So even if you think Google is crap, you're crazy to ignore speed issues.
FTR, if an app can't be fast, good solutions include applications that don't allow multiple instances but come back into focus when called again. Another good thing, for terminals, is tilda/yakuake. Another interesting idea is a client/server design, where the client is lightweight whereas the server, launched as a daemon, is slow to start (evsrv started from this idea).
Last point, eternal session is good, but if this means you keep your personal computer on for days, you're just killing the planet, and this obviously sucks. There is this old maxim: "Don't waste resources, even cheap ones like computer power/electricity, but above all, don't waste human brain power". Gosh, I was reading lately how Amazon applies this, to the point where they removed the light bulbs from their snack vending machines, because they're basically useless. Same thing: Amazon is a giant now, so don't think this is stupid (funny story, Amazon also reported lately how 200ms (!) of extra latency makes them lose lots of money).
> Your choice, but I noticed the Anarki arc.sh startup script is bash-dependent, which is actually less widespread than Perl.
Good catch. I've changed arc.sh so that it should be POSIX sh compliant, but I'm unsure precisely what this requires and don't have the patience to devour the entire spec at the moment. I've tested it against bash and dash thus far. You can give it a look over at http://github.com/nex3/arc/blob/master/arc.sh if you want.
> Last point, eternal session is good, but if this means you keep your personal computer on for days, you're just killing the planet, and this obviously sucks.
(require mzscheme) ; promise we won't redefine mzscheme bindings
(require "ac.scm")
(require "brackets.scm")
(use-bracket-readtable)
(parameterize ((current-directory (current-load-relative-directory)))
(load "compiled/arc_arc_scm.zo")
; yeah, should chdir to ./compiled/ directly, but don't want
; to fight w/ MzScheme now. And yeah, the filenames are dirty
(load "compiled/strings_arc_scm.zo")
; (load "compiled/pprint_arc_scm.zo")
; Commented, 'cause bug in mzc or smthg (at least on my machine):
; compiled/pprint_arc_scm.zo::2989: read (compiled): ill-formed code [...]
;
; (aload "pprint.arc")
; ^ and this commented 'cause it'd make `arc' take more than 1s, + unused by me.
; Uncomment if you need it, my machine is an eeePC, so on a Core 2 Duo
; or the like, I suppose you'll still be under 1 sec. No cheat here
(load "compiled/code_arc_scm.zo")
(load "compiled/html_arc_scm.zo")
(load "compiled/srv_arc_scm.zo")
(load "compiled/app_arc_scm.zo")
; Ah yeah, and I don't load prompt.arc, since a long time, useless to me.
; Add it here and in alc.pl, loading it will not kill perf I think (loading
; bytecode is fucking fast on this fucking good MzScheme).
)
(let ((args (vector->list (current-command-line-arguments))))
(if (not (empty? args)) ; cmdline args are script filenames
(for-each (lambda (f) (if (string=? f "-") (tl) (aload f))) args)
(tl)))
Ah yeah, and 'tl2 in ac.scm needs to get back the ':a' stuff (thanks rntz for pointing this out on the "Arc usable on Unix" thread):
#!/bin/sh
## arc: launcher for Arc. Put in your $PATH.
ARC_DIR="/home/paul/arc/3.1/" # change this
alc.pl "$ARC_DIR"    # recompile stale .arc -> .scm -> .zo files first
rlwrap mzscheme -qr "${ARC_DIR}as-fast.scm" "$@"
Note that you have no manual work to do when modifying arc.arc or one of the libs: alc.pl is smart enough to detect changes and do all the work for you (.arc -> .scm -> .zo). But, and this sucks, if you want to add loading of, say, prompt.arc, you have to modify both alc.pl and as-fast.scm. There is also no work to do if you hack ac.scm, 'cause, as previously said, mzscheme will detect this.
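alc.pl itself isn't shown here, so how it detects changes is an assumption. A common approach is make's rule: rebuild an output when it is missing or older than its input. Here is a Python sketch of that rule applied along an .arc -> .scm -> .zo chain. needs_rebuild, rebuild_chain and the build callback are hypothetical, and the real compile steps are not modelled.

```python
import os


def needs_rebuild(src, out):
    """Make-style rule: rebuild when out is missing or older than src."""
    return not os.path.exists(out) or os.path.getmtime(src) > os.path.getmtime(out)


def rebuild_chain(paths, build):
    """paths is a chain such as ["arc.arc", "arc.scm", "arc.zo"]. Each stage is
    checked against the one before it, so a changed .arc file regenerates the
    .scm, which in turn makes the .zo stale. build(src, out) does the real work."""
    rebuilt = []
    for src, out in zip(paths, paths[1:]):
        if needs_rebuild(src, out):
            build(src, out)
            rebuilt.append(out)
    return rebuilt
```

Each stage is compared only with its immediate input, so touching arc.arc triggers both steps. Touching only an up-to-date .scm would rebuild just the .zo.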
Note that none of this strictly depends on my previous patch to make Arc OK on Unix, but you'll have to change some things yourself if you don't use that patch.
Note that 1 sec is still very long. Compare it to the 0.012 secs of alc.pl.
There is this quote: "On the Internet, if it's not instant, it's too long." Well, it's true in computing in general. I still have to wait for the Arc prompt to show, and this sucks.
Note that you should not delete libs.arc or as.scm, because alc.pl uses them to bootstrap.
By X, you mean X.org?! I don't understand this part, but that's because I'm not a graphics guy and I've never understood X.org (if that's actually what you're talking about).
Can you provide more explanations and/or some example code please? I'd be really glad to understand.
> I think it's not worth the effort right now, [...] most likely to be trampled and disregarded silently by the vanilla branch
+1, mainly for the "silently". But don't get too frustrated: it will give you heartburn, and that's not good :-D. Or yes, get frustrated, but at least know you're not alone in thinking this kind of thing.
All the drawing is handled externally by... a drawing server which processes drawing requests from Arc. But it's not even in alpha state, so I'm not releasing it yet. Also because I'm thinking of trashing it and starting from scratch, using OpenGL instead of xdraw and changing the way it communicates with Arc.
But I'm too busy at the moment, it will have to be in my next holidays (roughly 4 or 5 months).