Arc Forum | First Priority: Core Language

	First Priority: Core Language (paulgraham.com)
	37 points by pg 6355 days ago | 28 comments

11 points by kens 6354 days ago | link

What gets defined as the "axioms" of Arc? One way to think about it is that ac.scm provides the axioms, and arc.arc builds the language out of them. But ac.scm defines about 87 functions, which seems like a large axiom set.

The ac.scm functions range from primitives such as "car" to OS library operations such as "client-ip" to scaffolding operations such as "annotate" and "ccc". (By scaffolding, I mean that annotate and ccc seem to be there only to support mac and point, which seem like the "real" axioms. Am I right about this? Is mac just something built out of the axiom annotate, or is annotate just something to support the axiom mac?) And why is rand implemented but not cosine?

If one is building an axiomatic language, there's a big gap between the seven axioms of Lisp described in http://www.paulgraham.com/ilc03.html and the 87 functions of ac.scm. (These seven basic words of Lisp inexplicably make me think of George Carlin's seven words you can't say on TV; one could build a Lisp-like language "carlin" using his seven words as the primitives. But I digress.) For instance, cons seems much more fundamental than current-gc-milliseconds or break-thread.

The (cool) JavaScript port omits lots of things (e.g. thread), but still seems to have the "Arc nature". This would suggest that there's a thinner Arc trying to escape from the existing Arc. Going the opposite direction, there's the demand for a full-functioned Arc.

It seems to me that there are at least 4 different Arcs: there's Arc the 100-year language, Arc the exploration of axiomatic programming language theory, Arc the powerful Web 1.5 framework, and Arc the language for exploratory programming. It's not obvious to me that these are all the same, or should have the same axioms.

Am I going down the wrong path considering ac.scm to be axioms and arc.arc the language implemented from the axioms? Are some things in arc.arc axioms and some things in ac.scm just optimizations? What parts of Arc count as axioms?

-----

7 points by shiro 6355 days ago | link

The difficult part is, of course, that to test whether the choice of axioms is optimal, you have to build a number of different real applications on top of it, and real applications need vast libraries.

Something that seems as harmless as 'standard input port' needs to be reconsidered if you start supporting a large character set. Another example is the history of the quest to find the right way to do asynchronous inter-process communication---the flaw of the core model only surfaces when some applications try to push the envelope.

-----

4 points by pg 6354 days ago | link

News.YC is a real application and I wouldn't call its libraries vast. I suspect it would be enough to do 4 or 5 fairly different types of applications, maybe a couple thousand lines each.

I'd love to do a graphics application next, preferably web-based. Anyone have opinions about the right structure for such a thing?

-----

4 points by shiro 6354 days ago | link

I don't deny that News.YC is a real application. OTOH, I've used my Scheme implementation for several commercial projects and I don't think I can port them straightforwardly to the current version of Arc. So I try to pinpoint the reasons. Libraries can come afterwards, so here I think of the core features.

A couple of things I think indispensable are the ability to write a non-consing inner loop and an O(1) access structure (vectors)---combining these with macros I can make the speed-critical routine run pretty fast. Although this seems an optimization issue, it is so critical that, if I didn't have them, I couldn't have used Scheme for the projects.

Managing concurrent processes and threads safely and efficiently is another area where the language can help tremendously if it provides the right model. Arc's current choice of atomic-invoke is simple and clean, but questionable in terms of scalability; at least I cannot use that model in my app. (And that model is so fundamental that it'll be a pain to change later, going through all the atlets and atwiths to replace them with different primitives.)
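To make the scalability concern concrete, here is a minimal Ruby sketch. It assumes atomic-invoke behaves like a single process-wide reentrant lock; `atomic` and `$global_lock` are hypothetical stand-ins, not Arc's implementation. The two counters are unrelated, yet every update to either one waits on the same lock.

```ruby
require 'monitor'

# Assumption: one reentrant lock shared by the whole process.
$global_lock = Monitor.new

# Hypothetical stand-in for atomic-invoke: run the block while holding the lock.
def atomic(&block)
  $global_lock.synchronize(&block)
end

counters = { a: 0, b: 0 }

# Four threads per counter; updates to :a and :b never touch the same data,
# but they still serialize because there is only one lock.
threads = [:a, :b].flat_map do |key|
  4.times.map do
    Thread.new do
      1000.times { atomic { counters[key] += 1 } }
    end
  end
end
threads.each(&:join)
```

The result is correct, which is the "simple and clean" part. A finer-grained design would give each independent piece of state its own lock, at the cost of having to reason about lock ordering.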

Oh BTW, I think the primitive 'dead' in ac.scm line 1083 is missing wrapnil.

-----

5 points by CatDancer 6354 days ago | link

I wonder if an eager hacker might like to take a look at compiling Arc to ActionScript (maybe using something similar to http://www.omnigia.com/scheme/cpscm/home/ or http://www-sop.inria.fr/mimosa/personnel/Florian.Loitsch/sch... ?), so Arc could be driving a Flash application in the browser.

-----

1 point by kennytilton 6354 days ago | link

Given that we have a report of a JS Arc (ArcLite) and that ActionScript is supposed to be a JS... sounds like we're almost there. :)

-----

7 points by nostrademons 6354 days ago | link

The source itself is pretty close to ActionScript compatible. Problem is, I mess with the __proto__ property a lot for basic data types and for scope chaining. It's what lets me use JavaScript's native lookup mechanism for variable lookup; instead of searching down an a-list, I make each activation frame a JavaScript object and knit their prototypes together, so I can look up variables with 'env[symbol]' and the whole search process is in C. __proto__ is a non-standard property; it's supported in all major browsers, but I wouldn't bet on it appearing in Flash. And I'm not sure it'd be fast enough if you had to implement variable lookup in the interpreter itself.
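For readers who don't think in prototypes, the frame-chaining idea can be approximated in Ruby. This is only an analogue, not ArcLite's code: `new_frame` is a hypothetical helper, and a Hash default block plays the role that the prototype link plays in JavaScript.

```ruby
# Hypothetical helper: build one activation frame whose misses fall through
# to the enclosing frame, loosely imitating a JavaScript prototype chain.
def new_frame(parent = nil, bindings = {})
  frame = Hash.new do |_hash, name|
    raise NameError, "unbound variable: #{name}" if parent.nil?

    parent[name] # delegate the miss to the enclosing scope
  end
  frame.merge!(bindings) # merge! returns the frame itself
end

global = new_frame(nil, x: 1, y: 2)
inner  = new_frame(global, y: 20) # shadows y, inherits x

inner[:x] # found in the outer frame
inner[:y] # found in the inner frame
```

The caller only ever writes `env[symbol]`; the walk up the chain is hidden inside the lookup, which is the property the post relies on.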

I thought about writing a compiler instead of an interpreter, but macros & quasiquote present a bit of a problem. You can compile the code down to JavaScript...but if you run into a quasiquote, you've got to jump back into the compiler to evaluate it, then splice that code back into your function, then eval the newly-generated JavaScript code to get back a real function. Remember, even ordinary functions like setforms (in the standard library) call macex at runtime, so you can't just separate things into a compile-time-only macroexpansion pass. And ActionScript doesn't have an eval function, so that gets a little complicated.

-----

3 points by CatDancer 6354 days ago | link

I understand your point about macros but not quasiquote... I thought `(a b ,x ,y c d) was an abbreviation for (list 'a 'b x y 'c 'd)? What am I missing?

I wonder if it might work if you have an identical Arc implementation in your compiler and in your runtime so that all the macro expansions can be done at compile time.

ActionScript3 flash.display.Loader.load() says it loads SWF files... does this include compiled ActionScript? If so, maybe a REPL inside the Flash application could be written by calling back to your server which ran the compiler. Not ideal, but certainly more fun than an edit -> compile -> run -> debug cycle.

-----

3 points by nostrademons 6354 days ago | link

`(a b ,x ,y c d) is an abbreviation for (quasiquote (a b (unquote x) (unquote y) c d)). The quasiquote returns a literal list, except that whenever it encounters an unquote or unquote-splicing it hops back into the evaluator and evaluates the form in the local environment.
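The run-time walk described above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not ArcLite's implementation: lists are Ruby arrays, `quasi` is a hypothetical name, and `env` is a hash standing in for the local environment, so an unquoted form is just a variable name rather than an arbitrary expression.

```ruby
# Expand a quasiquoted template at run time.
# Assumption: an unquoted form is a bare variable name looked up in `env`;
# a real evaluator would evaluate an arbitrary form there instead.
def quasi(template, env)
  return template unless template.is_a?(Array)
  return env.fetch(template[1]) if template[0] == :unquote

  template.flat_map do |item|
    if item.is_a?(Array) && item[0] == :"unquote-splicing"
      env.fetch(item[1])   # splice the list's elements in place
    else
      [quasi(item, env)]   # keep exactly one element (possibly a nested list)
    end
  end
end

env = { x: 1, y: 2, rest: [:c, :d] }
quasi([:a, :b, [:unquote, :x], [:unquote, :y], [:"unquote-splicing", :rest]], env)
# => [:a, :b, 1, 2, :c, :d]
```

For a flat template like this one, the result matches what (list 'a 'b x y 'c 'd) would build; the difference is in who does the walking and when.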

Flash load() is probably your best bet. My startup has a similar problem - dynamically generating code that will run in a browser - and we eventually decided it was easier to go with JavaScript/eval than Flash, even though we have to support Flash anyway for the finished product. Our other option was to send the code back to the server through Flash's XMLConnection, compile it there, return a URL of the compiled SWF through the connection, then loadSWF() it and hope we can figure out how to dynamically reference the new classes. Check out MTASC for the server; Macromedia's Flash Compiler won't run on the command line.

Or you could just punt on the dynamic features. I don't support continuations in ArcLite, and I know someone doing a native-code port that's planning to leave out a few of the more dynamic features. Beware that I found (= ...) doesn't work if you leave out (macex ...), though.

-----

3 points by okplus 6354 days ago | link

ActionScript does have an eval function: http://www.adobe.com/support/flash/action_scripts/actionscri...

Edit: looks like it may be somewhat limited. Just names of variables. It doesn't actually eval beyond looking in the symbol table...

-----

2 points by nostrademons 6354 days ago | link

Yeah, it was a deliberate design decision by Macromedia. They wanted to keep the VM small, so they deliberately left out anything that smacked of a runtime compiler. Eval, regexps. Though I heard regexps may have come back in AS3...

-----

2 points by CatDancer 6354 days ago | link

I haven't yet heard of any Scheme -> JavaScript compilers that support eval (and thus a REPL)... so ArcLite (or some other interpreter written in JavaScript) would be a good way to support interactive programming in the meantime.

-----

3 points by olavk 6354 days ago | link

Writing flash apps in arc - that would be a _really_ cool showcase.

-----

1 point by masterponomo 6354 days ago | link

Not sure about the structure being right or not, but the script interface to The Gimp comes to mind, at least for graphics primitives, only because it is Lisp and could be translated to Arc w/o reinventing the wheel (reinvention could be Phase II:-)).

-----

1 point by ms 6354 days ago | link

I'm not sure what you have in mind with graphics app, but I think a wiki whose pages could contain some kind of simple, "object-oriented" vector graphics (e.g. using a small Flash app) would be cool.

-----

1 point by djwhitt 6351 days ago | link

If you're interested in doing a graphical web app I would look into SVG.

-----

1 point by nostrademons 6351 days ago | link

SVG still needs a plugin in many of the popular browsers. If you want something that'll work out of the box, I'd use Canvas (or excanvas.js for IE compatibility). SVG's a better technology, but the winner tends to be the one that works now, not the one that will work better eventually.

-----

5 points by NickSmith 6355 days ago | link

Right on. Regardless of all the bletching over functionality, I'd be more concerned if you lost your focus on getting the basics right.

Having said that, it would be nice if someone could document (however roughly) the changes at each iteration. I know this goes against what you said in 'Arc's Out'[1] but I think it would help folks in fixing up their existing code and encourage the Arc community to dive in deeper... and that, I would have thought, can only be a good thing.

[1] "So we're giving notice in advance that we're going to keep acting as if we were the only users. We'll change stuff without thinking about what it might break, and we won't even keep track of the changes."

-----

3 points by lojic 6354 days ago | link

You'd be surprised how handy /pattern/, =~, !~ can be :)

  module RegexPatterns
    MONEY = /^(\$[ ]*)?([0-9]+|[0-9]{1,3}(,[0-9]{3})*)?(\.[0-9]{0,2})?$/
    # ...
  end

  def valid_currency? str
    str =~ RegexPatterns::MONEY
  end

  def parse_money str
    if str =~ RegexPatterns::MONEY
      str.delete('$, ').to_f
    else
      raise "invalid input"
    end
  end

Not sure how you'd handle group extraction nicely without resorting to implicit variable assignments though.

  raise 'data error' unless data =~ /(\d+) items/
  num_items = $1.to_i

I almost rejected Ruby when an initial perusal of a Ruby text showed some Perlisms, but I have to admit that the regular expression handling of Ruby is a joy to use. If you can make regular expression usage a joy in Arc, that would be awesome. I posted on this thread because I'm not sure if that can be done outside of the core with libraries alone.
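On the group-extraction point, current Ruby does allow avoiding the implicit `$1`: `Regexp#match` returns a MatchData object (or nil), and named groups can be read from it by name. A small sketch, using a made-up `data` string:

```ruby
data = '12 items' # example input, invented for illustration

# Regexp#match returns a MatchData object (or nil), so nothing depends on $1.
md = /(\d+) items/.match(data)
raise 'data error' unless md

num_items = md[1].to_i

# Named groups make the link between the pattern and the caller explicit.
named = /(?<count>\d+) items/.match(data)
count = named[:count].to_i
```

This keeps the captured groups in an ordinary value that can be passed around, which is closer to what a library-level design in Arc could offer.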

-----

8 points by shiro 6354 days ago | link

I incorporated a regexp literal in Gauche Scheme and found it very handy. A regexp literal is written as #/pattern/. When it appears in the procedure position it also works as a matcher.

    (#/\d+/ "abc123")  => #<match object>

The matcher returns a match object if the given string matches the pattern; you can extract submatches from it. The match object also works like a procedure.

    (cond [(#/(\d+)-(\d+)/ "123-456") => (cut map <> '(1 2))])    
        => ("123" "456")

The good thing I found about this "acts like a procedure" feature is that I can pass around it wherever a procedure is expected. For example, grep can be expressed in terms of the standard 'filter' procedure.

    (filter #/\w+/ list-of-strings)

(I'm not sure Arc can go in this direction, since Arc's operators that take predicates (e.g. 'find', 'some', 'all', ...) do "auto-testification"---if a given object isn't a procedure, it creates a predicate that tests equality to the given object---which may conflict with this type of extended use of "callable objects".)
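The possible conflict can be shown with a rough Ruby rendering of auto-testification. The names `testify` and `some?` are borrowed for illustration only; this is a sketch of the idea, not Arc's code.

```ruby
# A callable is used as the predicate; anything else becomes an equality test.
def testify(x)
  x.respond_to?(:call) ? x : ->(item) { item == x }
end

def some?(test, items)
  pred = testify(test)
  items.any? { |item| pred.call(item) }
end

words = %w[abc 123]

some?('123', words)                         # => true  (equality test)
some?(->(s) { s.start_with?('a') }, words)  # => true  (ordinary predicate)

# A Ruby Regexp does not respond to `call`, so it is compared by equality
# and never equals a string. Using it as a matcher has to be asked for:
some?(/\d+/, words)                         # => false
some?(/\d+/.method(:match?), words)         # => true
```

A language that wants both behaviours has to decide, for each non-procedure object, whether it is a value to compare against or a thing to call.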

-----

1 point by partdavid 6350 days ago | link

There's nothing joyful about regular expressions. For one thing, as above, it leaks logic all over your code. Secondly, it's unclear--it would be quite difficult to identify a bug in your expression. A related clarity problem is that you have restricted inputs to a subset of valid inputs, and it's hard to see how or why. Third, they are brittle and hard to instrument for diagnostics.

-----

1 point by lojic 6349 days ago | link

I don't see how it "leaks logic all over your code". But I like to keep an open mind - what are you suggesting as an alternative for the above example?

Regarding the difficulty in identifying a bug in the expression, I tend to agree. That's why I have a lot of unit test cases for each meaningful regular expression.

Regarding restricting the inputs to a subset of valid inputs, which inputs would you like to accept that the regex rejects (for U.S. currency only)? I haven't had any complaints yet, but that's not to say it won't happen in the future.

-----

1 point by partdavid 6341 days ago | link

1) If you have capture patterns, you have code in one place dependent on the expression in another without the coupling being clear. More mildly, you are married to regular expression operators because you have direct references to your (regular-expression-defined) subprogram all over. You can't decide not to make it a regular expression; or to make it two, or make it a much clearer expression and a programmatic dress-up. The alternative is to not use regular expressions.

2) Eh, unit tests can catch the mistakes you anticipated making. Lots of other mistakes are possible. Why write a complex regular expression and a page full of unit testing code for it when you could write more straightforward logic?

3) Like I said, it's not clear why you have the expressions you do.

I'm not saying regular expressions never ever have their place. In particular, they can be a convenient method to offer users to specify search and validation patterns and that kind of thing. But fixing program logic into them is a bad idea.

Now, if it's inconvenient or inefficient to express that textual extraction in some way other than regular expressions, I'm suggesting that is a failure of the language (for example, because pattern matching is weak or specifying grammars is cumbersome), not a point for recommending regular expressions.

-----

2 points by lojic 6325 days ago | link

Sorry, I just now saw this. The Arc forum makes it darn near impossible to realize someone has replied to an older item :(

I think an example of what you're talking about would be great. If you have a better way to validate textual data than regular expressions, then naturally I would want to know about it.

Here's a few regular expressions I've collected. I realize they're not perfect (e.g. the email regex), but they're good enough for my purposes presently.

    REGEX_EMAIL     = /^([-\w+.]+)@(([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(([-\w]+\.)+[a-zA-Z]{2,4}))$/
    REGEX_FLOAT     = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)?(\.[0-9]*)?$/
    REGEX_INTEGER   = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)$/
    REGEX_ISO_8601  = /^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2})[-:](\d{2})[-:](\d{2})[Zz]$/
    REGEX_MONEY     = /^(\$[ ]*)?([0-9]+|[0-9]{1,3}(,[0-9]{3})*)?(\.[0-9]{0,2})?$/
    REGEX_PHONE     = /^(\((\d{3})\)|(\d{3}))[-. ]?(\d{3})[-.]?(\d{4})[ ]*([^\s\d].{0,9})?$/
    REGEX_SSN       = /^(\d{3})-?(\d{2})-?(\d{4})$/
    REGEX_ZIP_CODE  = /^(\d{5})[- ]?(\d{4})?$/

So, what would you use to accomplish the same thing without regular expressions that is as concise? The regular expressions allow an easy way to both validate user input and parse it via groups. They're declarative vs. imperative. I have these in a Ruby module with associated functions for parsing (which primarily uses the groups) etc., so they're encapsulated together.

I think you mentioned you're an Erlang programmer, so how would the non-regex Erlang code look to validate an email address corresponding to the REGEX_EMAIL expression above?

-----

1 point by partdavid 6314 days ago | link

Ah, you're right, it's a bit hard to see when folks have replied. Yes, I'm an Erlang programmer.

In response to your question, I don't accept your premise that replicating a particular regular expression is a real programming task. You say your email regular expression isn't perfect, but it's not clear to me why you chose that particular set of restrictions beyond what's defined in the RFC--so it's a little hard for me to replicate (for example, the local-part and domain of the address can have more kinds of characters than what you have defined).

Instead, I'll offer this as a non-equivalent but interesting comparison. I've elided the module declarations (as have you), including the imports that allow some of these functions without their module qualifiers:

  email(S) ->
     [User, Domainp] = tokens(S, "@"),
     {User,
      case {address(Domainp), reverse(tokens(Domainp, "."))} of
         {{ok, Addr}, _} -> Addr;
         {_, RDomain = [Tld|_]} when length(Tld) >= 2,
                                     length(Tld) =< 4 ->
            join(reverse(RDomain), ".")
      end
     }.

I don't know how the terseness of this compares with your example, given that it includes some things that yours doesn't (a way to call it, a format for the return value rather than the capture variables). Terseness, of course, in the pg sense of code tree nodes, whatever they are. :)

The Erlang function above returns a tuple of the local-part and the domain part and throws an error if it can't match the address. If this were something I wanted to ship around to other functions or send to another machine or store in a database table or something, I would have email/1 return a fun (function object) instead.
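For comparison with the Ruby examples earlier in the thread, here is a loose Ruby counterpart of the same approach. It is a sketch, not a translation: `parse_email` and `normalize_domain` are hypothetical names, the standard `ipaddr` library stands in for the address check, and the domain is returned unchanged rather than rebuilt from its tokens.

```ruby
require 'ipaddr'

# Returns [local_part, domain] or raises ArgumentError.
def parse_email(str)
  parts = str.split('@', -1)
  unless parts.size == 2 && parts.none?(&:empty?)
    raise ArgumentError, "invalid address: #{str}"
  end

  user, domain = parts
  [user, normalize_domain(domain)]
end

# Accept either a literal IP address or a name whose last label is 2-4 chars.
def normalize_domain(domain)
  IPAddr.new(domain).to_s
rescue IPAddr::Error
  tld = domain.split('.').last.to_s
  unless (2..4).cover?(tld.length)
    raise ArgumentError, "invalid domain: #{domain}"
  end

  domain
end
```

As with the Erlang version, the accepted set is not identical to REGEX_EMAIL; the point is only that each rule sits on its own line and can be changed or instrumented separately.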

If either one of us wanted something better than what we have (or even if we don't--it seems like coming up with The Right Thing To Do With Email Addresses is worth a bit of time to do only once) I would write a grammar. The applicable RFC 2822 more or less contains the correct one, which is only a few lines.

At the "low" end of text processing power, there are basic functional operations on strings and lists, and at the "high" end there are grammar specifiers and parser generators. In the band in between live regular expressions, and I am not convinced that that band is very wide. I like regular expressions (and, indeed, I would like it very much if Erlang had better support for them) but for me they are a specialized tool, particularly useful (like wildcard globbing) for offering as an input method to users.

But they aren't a general solution to every kind of problem, and for that reason I don't think Arc or any other general-purpose language benefits from baking them into the basic syntax--they belong in a library.

-----

2 points by masterponomo 6355 days ago | link

Built-in "Industrial D" operators would be a major win--standardize ways of reasoning about data, regardless of how/where it originates or is stored. The core operators would not be for database/external storage, but just plain data. Similar to Perl's DBI:DBD, only with DBI operators built in and implementing relational calculus, not SQL function calls.

-----

1 point by mcoles 6355 days ago | link

As a casual hobbyist who's trying to learn lisp(cl/scheme), coming from a ruby mindframe, arc actually feels like it may be a net win, while the jury is out on the other two.

-----

1 point by bOR_ 6354 days ago | link

same, although I only tried cl (about 6 months ago).

-----