"With this in place, we can say (my-stream) to read a character, but all the other things we can do with an input stream are still available."
Yeah, but that doesn't help if you want to extend readc or peekc so they understand your new stream. You need to extend readc and peekc directly. And then how do those functions get at the data they need?
---
"One random thing I'm not comfortable with about the (my-stream 'peek) approach is the fact that it uses a symbol, 'peek, which might mean different things to different libraries. Obviously 'peek itself would be toward the core, but I could easily imagine two pull parsers trying to use 'xml-node."
That is true, but how is that any more arbitrary than saying that the global functions peekc, readc, etc. are reserved for input streams? The symbol 'peek simply means whatever the function wants it to mean. Thus, a function that behaves like an input stream can be called like an input stream. Duck typing.
---
"Could you elaborate on that somehow? If I "extend" 'readc, I'm probably only changing it to support a whole new range of stream types, not (intentionally) modifying the behavior for existing ones. It shouldn't be too clunky to determine whether something's in that new range, and I don't have a clue what part of this strategy you're saying is less extensible than your idea."
Why not modify the behavior of existing ones? In fact, readline does that right now, by relying on readc. Thus, if your input supports readc, it supports readline too.
And it's not just readc, either; it could be other functions. It's just that this discussion happens to be about input streams. So, yeah, it's possible to do it with the current model, but I think it's clunkier and leads to more (and uglier) code.
Okay... let me try to explain... I want to allow Arc users to create new data types, like input streams. Right now, this is possible only by extending built-in functions like readc, peekc, etc. Something like this:
(extend readc (x) (isa x 'my-data-type)
  ...)
...but now you end up needing to define a new data type, and you need to extend all the built-in functions individually. You can avoid that by using coerce...
(def readc (x)
  (do-something (coerce x 'input)))
...the problem is the "do-something" part. Even after readc coerces your custom input type to 'input, how does it extract the data it needs? How does it read a character? How does readc access the internals of your data type? You could special-case it to be a function call, but then what about peek? What about reading bytes?
What I am proposing is a way to solve both problems in one fell swoop: functions are typed not based on their `type`, but based on the pseudo-methods they contain. Duck typing. And by calling these "methods", functions (like readc) can extract the data they need. Now, you can define readc like this:
(def readc (x)
  (x 'char))
And now anything that supports 'char automatically works with readc, without readc needing to be extended. What if a different function also uses 'char? No problem, it works too! Thus, functions don't care about the type of their argument; they only care that it supports the methods they need to do their job. And by leveraging prototypes, you can extend the prototype chain at any point, giving finer-grained control than simply whether data is a string or a cons or whatever.
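For example, here's a sketch of a whole new stream type (string-stream is a name I just made up, and I'm ignoring eof handling):

(def readc (x) (x 'char))
(def peekc (x) (x 'peek))

; a custom stream: just a closure over a string and a position
(def string-stream (s)
  (let pos 0
    (fn (msg)
      (case msg
        char (do1 (s pos) (++ pos))  ; read and advance
        peek (s pos)))))             ; look without advancing

(= in (string-stream "hi"))
(peekc in)  ; -> #\h
(readc in)  ; -> #\h
(readc in)  ; -> #\i

No extending, no new type declarations: string-stream works with readc and peekc simply because it answers 'char and 'peek.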
By the way, you can combine this approach with conventional types as well:
(def readc (x)
  ((coerce x 'input) 'char))
The above is more strict, because it not only expects the data to be coercible to 'input, but it also expects it to have a 'char "method". This could potentially handle your use-case about conflicting libraries using the same method names: you can use the function's actual `type` to differentiate between them.
---
Random side note: this of course means that I can define tables so that they are functions of type 'table that have 'get and 'set methods. Then you can create a custom table just by creating an annotated function that has 'get and 'set. And then `apply` could be designed so that when you call a table with (foo 'bar), it calls (foo 'get 'bar) instead.
This makes it super-easy for Arc code to define custom table types. Or custom input types. Or custom any-type, really, since all composite data types can be represented as functions with methods.
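Here's a rough sketch (custom-table is a name I made up; until apply is actually taught the (foo 'get 'bar) convention, you'd call the rep directly):

; a custom table: an annotated function with 'get and 'set
; methods, backed by an ordinary Arc table
(def custom-table ()
  (let h (table)
    (annotate 'table
      (fn (msg . args)
        (case msg
          get (h (car args))
          set (= (h (car args)) (cadr args)))))))

(= foo (custom-table))
((rep foo) 'set 'bar 5)
((rep foo) 'get 'bar)  ; -> 5; with the new apply, just (foo 'bar)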
This is what I'm trying to explain. If you have a table called foo, then calling (foo 'bar) is a "get" action, and calling (= (foo 'bar) 5) is a "set" action. foo is a single, self-contained data type, but it has two different behaviors in two different contexts.
If you want to emulate that, you end up needing to jump through some hoops, like the following:
(def my-table ()
  (obj get (fn (n) ...)
       set (fn (n v) ...)))
...and now you have the fun of extending sref and apply (and probably eval as well) so they behave properly with your custom table. Remember my prototype post a while back? I couldn't even get it to work in pgArc. What a pain. So much easier if all your function needs to do is implement the required methods, and everything just works without needing to extend anything.
And this "functions with methods" idea is applicable to other data types as well, like input streams. So for the same reasons that creating custom table types are a massive pain, creating custom input types are a massive pain. But this proposal makes it super easy.
---
"My approach, with direct use of failcall, and with rulebooks rather than deep dependency chains[1], would promote patterns like this:"
By the way, I would expect my proposal to be completely compatible with rulebooks as well... the only requirement is that the call (foo) return a char, and (foo 'peek) return a char, but not consume the stream. It's completely up to foo how it handles delegation and/or inheritance. I merely provided one potential way: prototypes.
But the proposal works even with ordinary functions that aren't prototypes. In fact, it doesn't even need to be functions per se; for instance, I would expect the following to work:
(= foo (obj char #\f))
(readc foo) -> #\f
Neat. Since foo is a table, and (foo 'char) returns #\f, readc was able to use it. This, of course, wouldn't work in the strict version of readc, which tries to coerce its argument to 'input. But this would work:
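Something along these lines, say, teaching coerce that a table can stand in for an 'input (the exact extension is just my guess at one possibility):

(extend coerce (x totype) (and (is totype 'input) (isa x 'table))
  x)

(readc foo)  ; -> #\f, even with the strict readc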
"By "deep depencency chains," I mean that I assume you're talking about having patterns whereby A is the prototype of B, B is the prototype of C, C is the prototype of D, and people only ever use D most of the time. (A, B, and C might have longer names.)"
It could be that way, but shallower trees are also quite possible, and in fact I expect those would be the norm. My suggestion to authors of functions would be to only increase the depth of the tree as needed.
"Yeah, but that doesn't help if you want to extend readc or peekc so they understand your new stream. You need to extend readc and peekc directly."
Sounds like you're solving your own problem. :)
"And then how do those functions get at the data they need?"
Why hide it?
Anyway, I'd put the extensions of 'readc and 'peekc in the same place as the rest of my code that dealt in the concrete details of my custom stream type. That way, I can pretend in all my other code that the data is strictly encapsulated, and when I do change the implementation, everything I need to refactor is in the same page or two of code.
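For instance, with a made-up counter-stream type (just a sketch, using the same extend style as earlier in this thread), both extensions and the representation they poke at sit together:

; everything that knows a counter-stream is a list holding a
; position lives right here, side by side
(def counter-stream ()
  (annotate 'counter-stream (list 0)))

(extend readc (x) (isa x 'counter-stream)
  (do1 (coerce (+ 48 (mod (car rep.x) 10)) 'char)
       (++ (car rep.x))))

(extend peekc (x) (isa x 'counter-stream)
  (coerce (+ 48 (mod (car rep.x) 10)) 'char))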
---
"That is true, but how is that any more arbitrary than saying that the global functions peekc, readc, etc. are reserved for input streams?"
If you're saying the symbol 'peek is just as arbitrary as the global variable name 'peekc, I agree, but global variables are the more likely thing to be handled by a namespace system. :) If that's not what you're saying, whoops.
---
"Why not modify the behavior of existing ones? In fact, readline does that right now, by relying on readc. Thus, if your input supports readc, it supports readline too."
Huh? Using 'readline doesn't change what happens the next time you use 'readc. I think we're having some word failures here.
Maybe what you mean by "modify" is more of a pure functional thing, where you "change" a list by removing its first element when you call 'cdr. But then I still don't understand what you meant in your original statement that "Another problem with things like (readc) is that it's all-or-nothing."
"you need to extend all the built-in functions individually . You can avoid that by using coerce..."
Darn 'coerce, always enticing people to use it. ^_^ It's actually possible to use a coercion pattern here, but you'll need to check whether you can coerce at all, and go on to other extensions if not. (This is something failcall and rulebooks make convenient.) However, to create a value of type 'input, you still need to specify all the reading and peeking behavior somewhere, and I prefer to specify those behaviors in separate pieces of ravioli, in this case by extending each function individually.
"Even after readc coerces your custom input type to 'input, how does it extract the data it needs?"
Exactly, I wouldn't turn something into an 'input and then turn it back; by extending 'readc directly, I'd preempt the coercion step.
To digress a bit, coercion is a positively useful design pattern when there's more than one sensible set of "axioms" for a set of utilities. If I see utilities A1, A2, B1, B2, B3, and B4, and I notice that the B's can be implemented in terms of the A's, then I can do the following:
1. Write a function C of the form "Take any value, and return a similar value that supports all the A's." I call this a coercion function, but I don't expect everyone to agree with that. ^_^
2. Extend the B's so that they try using C. (If there's no standard way to "try" using a function, such as failcall, then C needs to indicate failure in its own way, or there needs to be another function D that people call to see if they can call C.)
3. Sometimes it's useful (if uninspired) to create a boilerplate concrete type for C to return when there's no better way to make the right A-savvy value. This type tends to take the form of a wrapper containing a function to call for the A1 behavior and a function to call for the A2 behavior. Obviously, the A's should be extended to support this type; that's the only point of it.
After this, if I ever want to extend the B's, there's a good chance I can do it by finding (or making) something that extends the A's instead, and then extending C to do the conversion. After a certain initial cost (making C and C's general-purpose return type), this eventually becomes a net decrease in the number of extensions needed.
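To make that concrete with streams (a sketch; to-input and my-readline are names I'm making up): readc and peekc are the A's, readline is a B, and to-input plays the role of C.

(def to-input (x)  ; C: return a similar value supporting the A's
  (if (isa x 'input)  x
      (isa x 'string) (instring x)
                      (err "Can't make an input from" x)))

(def my-readline (x)  ; a B, extended to try C first
  (readline (to-input x)))

(my-readline "two\nlines")  ; -> "two"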
...And I've run out of time. I'll get to the rest of your post later! ...And your followups someday. XD
---
I would like to point out that prototypes and methods are completely separate subjects that serve a similar purpose (extensibility) in completely separate ways. Perhaps I shouldn't have discussed them in the same post.
Methods solve the problem of polymorphism: the ability to easily define new types (and new instances of existing types) that intermesh with existing code (like the built-ins readc and peekc). They do this by implementing duck typing: if the function supports the method, just use it.
This can be augmented by a technique that I've come to like: coercion. Rather than saying "my argument needs to be of type foo", the function just coerces its argument to type 'foo. If it can't be coerced, it will automatically throw an error.
The reason I like the coerce approach is because it means you can easily create completely new types. So if you create a new 'byte type, you can extend coerce so it can be coerced into 'char, 'int, 'num, etc. and existing code will work with it.
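For example, suppose a 'byte is just an annotated int (a sketch; I'm glossing over coerce's optional extra arguments):

(def byte (n) (annotate 'byte n))

; the coercion rules live in one spot...
(extend coerce (x totype . args) (isa x 'byte)
  (case totype
    int  rep.x
    char (coerce rep.x 'char)
         (err "Can't coerce byte to" totype)))

; ...and existing code that coerces Just Works
(coerce (byte 104) 'char)  ; -> #\h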
The reason I like the method approach is that it makes it easy to create new instances of existing data types. Like I mentioned, it makes it easy to create custom 'table types. It also maximizes the benefit of prototypes, and in some cases lets completely new types piggyback off of existing functions.
The reason I call it "more extensible" is the same reason I call the coerce approach more extensible. With the coerce approach, the function doesn't care what type its argument is; it only cares that it's possible to coerce it to the type it wants.
In the case of methods, the function doesn't care what type its argument is; it only cares that it can extract the data it needs, using the argument's methods.
---
Prototypes, however, serve a different purpose. Specifically, they try to solve the problem where you sometimes want to extend a particular function, and sometimes want to extend many functions at once. They're designed to give fine-grained control beyond merely what the type is. Also, by letting one function serve as a "base" for other functions, they reduce duplicate code.
All three concepts are designed to enhance extensibility, but they do so from different angles, and in different ways. You'll note that all three attempt to achieve extensibility by ignoring what the type of something is. Instead, they focus on either the desired type, the desired functionality, or the desired ancestor.
The three combine to form a coherent whole, which I think is as it should be.
Perhaps I should write up a post comparing the two approaches (prototypes/methods vs. pure functions). Depending on the results, that could either convince me that my idea is silly, or convince you guys that it has merit.
Coercion means you only need to define coercion rules in one spot, and existing code can Just Work.
Methods mean you only need to define a method that implements the functionality, and existing code can Just Work.
---
Prototypes get rid of the traditional concept of type entirely. Rather than saying "that's an input stream" or "that's an object of type input" we instead say "that's a function that inherits from stdin."
Of course, I don't plan to get rid of types, since I think they're useful (especially with coercion), but at the same time, I think having more fine-grained control can be useful, so I think there's some synergy between types and prototypes.
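To make "inherits from stdin" concrete (a sketch; inherit is a helper I'm making up, and my-stream is any message-style stream like the ones above):

; delegate any message the child doesn't override to its prototype
(def inherit (parent overrides)
  (fn (msg . args)
    (aif overrides.msg
         (apply it args)
         (apply parent (cons msg args)))))

; overrides only 'char; 'peek (and anything else) is inherited
(= loud-stream
   (inherit my-stream
     (obj char (fn () (upcase (my-stream 'char))))))

The child doesn't need to know or care what its prototype supports; the chain can be extended at any point.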