Arc Forumnew | comments | leaders | submitlogin
2 points by lojic 5894 days ago | link | parent

Sorry, I just now saw this. The Arc forum makes it darn near impossible to realize someone has replied to an older item :(

I think an example of what you're talking about would be great. If you have a better way to validate textual data than regular expressions, then naturally I would want to know about it.

Here's a few regular expressions I've collected. I realize they're not perfect (e.g. the email regex), but they're good enough for my purposes presently.

    REGEX_EMAIL     = /^([-\w+.]+)@(([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})|(([-\w]+\.)+[a-zA-Z]{2,4}))$/
    REGEX_FLOAT     = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)?(\.[0-9]*)?$/
    REGEX_INTEGER   = /^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)$/
    REGEX_ISO_8601  = /^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2})[-:](\d{2})[-:](\d{2})[Zz]$/
    REGEX_MONEY     = /^(\$[ ]*)?([0-9]+|[0-9]{1,3}(,[0-9]{3})*)?(\.[0-9]{0,2})?$/
    REGEX_PHONE     = /^(\((\d{3})\)|(\d{3}))[-. ]?(\d{3})[-.]?(\d{4})[ ]*([^\s\d].{0,9})?$/
    REGEX_SSN       = /^(\d{3})-?(\d{2})-?(\d{4})$/
    REGEX_ZIP_CODE  = /^(\d{5})[- ]?(\d{4})?$/
So, what would you use to accomplish the same thing without regular expressions that is as concise? The regular expressions allow an easy way to both validate user input and parse it via groups. They're declarative vs. imperative. I have these in a Ruby module with associated functions for parsing (which primarily uses the groups) etc., so they're encapsulated together.

I think you mentioned you're an Erlang programmer, so how would the non-regex Erlang code look to validate an email address corresponding to the REGEX_EMAIL expression above?



1 point by partdavid 5884 days ago | link

Ah, you're right, it's a bit hard to see when folks have replied. Yes, I'm an Erlang programmer.

In response to your question, I don't accept your premise that replicating a particular regular expression is a real programming task. You say your email regular expression isn't perfect, but it's not clear to me why you chose those particular set of restrictions beyond what's defined in the RFC--so it's a little hard for me to replicate (for example, the local-part and domain of the address can have a more kinds of characters that what you have defined).

Instead, I'll offer this as a non-equivalent but interesting comparison. I've elided the module declarations (as have you), including the imports that allow some of these functions without their module qualifiers:

  email(S) ->
     [User, Domainp] = tokens(S, "@"),
     {User,
      case {address(Domainp), reverse(tokens(Domainp, "."))} of
         {{ok, Addr}, _} -> Addr;
         {_, RDomain = [Tld|_]} when length(Tld) >= 2,
                                     length(Tld) =< 4 ->
            join(reverse(RDomain), ".")
      end
     }.
I don't know how the terseness of this compares with your example, given that it includes some things that yours doesn't (a way to call it, a format for the return value rather than the capture variables). Terseness, of course, in the pg sense of code tree nodes, whatever they are. :)

The Erlang function above returns a tuple of the local-part and the domain part and throws an error if it can't match the address. If this were something I wanted to ship around to other functions or send to another machine or store in a database table or something, I would have email/1 return a fun (function object) instead.

If either one of us wanted something better than what we have (or even if we don't--it seems like coming up with The Right Thing To Do With Email Addresses is worth a bit of time to do only once) I would write a grammar. The applicable RFC 2822 more or less contains the correct one, which is only a few lines.

At the "low" end of text processing power, there are basic functional operations on strings and lists, and at the "high" end there are grammar specifiers and parser generators. In the band in between lives regular expressions, and I am not convinced that that band is very wide. I like regular expressions (and, indeed, I would like it very much if Erlang had better support for them) but for me they are a specialized tool, particularly useful (like wildcard globbing) for offering as an input method to users.

But they aren't a general solution to every kind of problem, and for that reason I don't think Arc or any other general-purpose language benefits from baking them into the basic syntax--they belong in a library.

-----