Arc Forum

5 points by CatDancer 5994 days ago

Interestingly, I'm finding that I don't think we need a version control system for distributing and sharing hacks.

Which doesn't mean "don't use a version control system". Instead, I think it means "use whatever version control system works best for you". Because if we can share hacks without a version control system, then you can freely choose which one you want to use, if one would be useful for your work. Your using darcs doesn't make it harder for me to use git.

How is this possible? So far this is largely speculation on my part. I've been thinking about how to share hacks and have some ideas, but not much implemented yet.

The first step for me was my "minimum distance from Arc" approach to hacks.

I started using this approach because I had hacks that I found useful and important (and I thought other people might find them useful also), but that I thought didn't need to be (and in some cases shouldn't be) incorporated into Arc. Some were even single-project hacks: a hack that I wanted to use in a specific program but not one that I wanted in my other programs.

So being able to apply each hack independently became important to me. As a side effect, I found that it made merges really easy.

I find that getting a hack into a "minimum distance from Arc" configuration takes a little work. I don't program day to day with the original, pristine arc3; I have an arc3 with my usual ten or twenty hacks applied. Then I have something that I've written as part of my application that I think might be useful to other people, so I extract it as a patch or a library. Then I need to take the original arc3, apply the hacks that I think are the minimum prerequisites, create my patch or library file, and then test everything to make sure that I haven't forgotten anything. The testing process is too manual and tedious for me right now, so I'm working on an "example runner" to make that part go faster.

Once I've done the work to get a hack into its "minimum distance from Arc" configuration, I've found that "rebasing" a hack onto a new version of Arc is easy. For a patch, I apply the old patch to the new Arc, fix problems if any, and create a new patch against the new Arc. Which I can easily do with just "patch" and "diff", no version control system needed.
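
Here's a minimal sketch of that rebase step, written in Python rather than with the real "patch" and "diff" tools. It assumes, for illustration only, that a hack is stored as a list of (old_block, new_block) line pairs in file order, and that each hunk is found by its content rather than by its line number. The names apply_hunk and rebase_hack are hypothetical. GNU patch works from unified diffs and is more forgiving (it tries offsets and fuzz), so treat this as a model of the idea, not a replacement.

```python
import difflib

# Hypothetical hunk format: (old_block, new_block), both lists of lines.
# old_block should contain at least one line so it can be located.

def apply_hunk(lines, old_block, new_block, start=0):
    """Replace the first copy of old_block found at or after `start`.

    Returns the new list of lines and the index just past the replacement,
    so later hunks are only searched for after earlier ones.
    """
    n = len(old_block)
    for i in range(start, len(lines) - n + 1):
        if lines[i:i + n] == old_block:
            return lines[:i] + new_block + lines[i + n:], i + len(new_block)
    # This is the "fix problems, if any" step: a human has to resolve it.
    raise ValueError(f"hunk does not apply: {old_block!r}")

def rebase_hack(hunks, new_arc):
    """Apply a hack's hunks to a new Arc release; return (patched, new_diff)."""
    patched, pos = list(new_arc), 0
    for old_block, new_block in hunks:
        patched, pos = apply_hunk(patched, old_block, new_block, pos)
    # Regenerate the patch against the *new* release, like running diff again.
    new_diff = list(difflib.unified_diff(
        new_arc, patched, fromfile="a/arc.arc", tofile="b/arc.arc", lineterm=""))
    return patched, new_diff

# Example: the new release added a definition above the one the hack changes.
example_hack = [(["(def b () 2)"], ["(def b () 20)"])]
new_arc = ["(def a () 1)", "(def a2 () 0)", "(def b () 2)", "(def c () 3)"]
patched, new_diff = rebase_hack(example_hack, new_arc)
print("\n".join(new_diff))
```

Because the hunk is located by content, the extra line in the new release doesn't stop it from applying, and the regenerated diff is a fresh patch against the new Arc.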

That this is possible at all is due to Arc's design, which aims for the minimum amount of code needed to implement a given functionality. Because of this, my hacks to Arc often end up being only one or two changes to Arc, which in turn means that hacks can be applied independently. And because they can be applied independently, I don't need to be merging a bunch of pg's changes with a bunch of my changes.

As a next step, going beyond the "minimum distance from Arc" approach, I'm now looking at ways to implement my hacks using minimal changes to Arc. I now look at one of my hacks, and I think, "OK, if Arc were more hackable, then I wouldn't need to be patching Arc to do this. I could be redefining some function or calling some hook to implement my change. So what can I do to make Arc more hackable, which I could then use to implement my hack as a library?" So now I have some smaller change to Arc which enables my hack.
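
To make the idea concrete, here's a Python analogy (real Arc code would look different): the base system gains one small, generic hook point, and the hack then lives in a separate library instead of a patch. The names register_hook, call_hook, render_title, and the "title-filter" hook are all hypothetical.

```python
# Table of hook implementations; empty until some library fills it in.
_hooks = {}

def register_hook(name):
    """Decorator: install fn as the implementation of hook `name`."""
    def register(fn):
        _hooks[name] = fn
        return fn
    return register

def call_hook(name, *args, default=None):
    """Call the hook if a library installed one; otherwise return `default`."""
    fn = _hooks.get(name)
    return fn(*args) if fn is not None else default

# The "smaller change to Arc": one hook point instead of hard-coded behavior.
def render_title(title):
    return call_hook("title-filter", title, default=title)

# A library can now change the behavior without patching render_title.
@register_hook("title-filter")
def capitalize_words(title):
    return title.title()
```

With no library loaded, render_title returns its argument unchanged; once the library above is loaded, render_title("arc forum") returns "Arc Forum". The base pays for one generic indirection, and each later hack of this kind needs no patch at all.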

As an example of the "minimal change to Arc" approach, if I were implementing a documentation system for Arc, I personally wouldn't put the doc strings inside the source code file. If, for example, the implementation of a function in arc.arc changed, then I might or might not need to also change the documentation (depending on whether the change made a difference in what the function did). If the documentation doesn't need to change, then I don't need to be merging my documentation patches with pg's changes in the same file.

Finding the minimal change to Arc has the additional advantage of enabling greater flexibility. If the documentation were in a separate file, then someone who wanted to translate the documentation into a different language (I mean human languages like Russian, French, Mandarin, etc.) could do so easily, and I could choose which language I wanted by pointing the documentation system at that file. And, if a new arc.arc came out, I could use the new Arc while still reading the old documentation, while I was waiting for the nice translator to translate the new documentation into my language.
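
A small Python sketch of that lookup, assuming (hypothetically) one documentation table per language, keyed by function name and kept outside arc.arc. In practice each table might be loaded from its own file. Falling back to English covers the period when a translation lags behind a new release.

```python
# Stand-ins for separate per-language documentation files (hypothetical content).
DOCS = {
    "en": {
        "map": "Apply a function to each element of one or more sequences.",
        "rem": "Return a copy of a sequence without the elements matching a test.",
    },
    "fr": {
        "map": "Applique une fonction à chaque élément d'une ou plusieurs séquences.",
    },
}

def doc(name, lang="en", docs=DOCS, fallback="en"):
    """Look up `name` in the chosen language, falling back to `fallback`.

    Because the docs are keyed by name rather than stored next to the code,
    a change to a function's implementation never touches these tables.
    """
    for code in (lang, fallback):
        text = docs.get(code, {}).get(name)
        if text is not None:
            return text
    return None
```

Here doc("rem", "fr") returns the English text because no French entry exists yet, which is the "use the new Arc while waiting for the translator" case.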

I still use version control for my own application, since I'm spending most of my time writing functionality, not working on making the code as minimal as possible. And, if I were working on a library together with other people, I'd want a version control system to help with that. However, once the library is ready to be published, I'm imagining that it won't need to be published using a version control system; instead, it can be source code and patch files, with some meta-information about dependencies.
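
One possible shape for that meta-information, sketched in Python: each published hack lists the file to apply and the hacks it requires, and a short function works out which files to apply to a pristine arc3, prerequisites first. The hack names, file names, and field names are made up for illustration; graphlib is in the Python standard library from 3.9 on.

```python
from graphlib import TopologicalSorter

# Hypothetical metadata shipped alongside each hack's patch or library file.
HACKS = {
    "hook-points": {"file": "hook-points.patch", "requires": []},
    "json":        {"file": "json.arc",          "requires": ["hook-points"]},
    "my-lib":      {"file": "my-lib.arc",        "requires": ["json"]},
    "unrelated":   {"file": "unrelated.patch",   "requires": []},
}

def install_order(target, hacks=HACKS):
    """Return the files needed for `target`, prerequisites before dependents."""
    # Collect the target and everything it transitively requires.
    needed, stack = set(), [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(hacks[name]["requires"])
    # TopologicalSorter takes node -> predecessors and raises CycleError on cycles.
    graph = {name: hacks[name]["requires"] for name in needed}
    return [hacks[name]["file"] for name in TopologicalSorter(graph).static_order()]
```

install_order("my-lib") gives ["hook-points.patch", "json.arc", "my-lib.arc"] and leaves unrelated.patch out, which is the payoff of keeping hacks independently applicable.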

If I turn out to be right, the upshot of all of this is that we don't all need to choose the same version control system. You can use git, or darcs, or whichever one works best for you. There should be enough meta-information published with hacks so that if you want to use a hack written by someone else, a simple script will be able to import it into darcs or git making the release history appear as commits. Then you can use the tools of your favorite version control system to track changes and, if you're hacking the hack, merge or rebase your changes on top of a newly released version of the hack.
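
A rough Python sketch of such an import script. It only builds the list of commands (a dry run), so it can be read and checked without touching a repository. It assumes each release of a hack has been unpacked into its own directory holding the complete set of published files; the function name, the release tuples, and the paths are hypothetical, while git checkout --orphan, git add -A, git commit --date, and rsync -a --delete --exclude are real options.

```python
def import_commands(branch, releases):
    """Build the commands that replay a hack's releases as commits on `branch`.

    `releases` is a list of (version, date, directory) tuples, oldest first.
    Assumption: each directory holds every file published for that release.
    """
    cmds = [["git", "checkout", "--orphan", branch]]
    for version, date, directory in releases:
        cmds += [
            # Make the working tree match the release exactly, keeping .git.
            ["rsync", "-a", "--delete", "--exclude=.git", f"{directory}/", "./"],
            # Stage additions, changes, and deletions alike.
            ["git", "add", "-A"],
            # Record the release date as the commit's author date.
            ["git", "commit", f"--date={date}", "-m", f"{branch} release {version}"],
        ]
    return cmds
```

Running the commands for real would be a loop over subprocess.run(cmd, check=True) inside an existing repository; that part is untested here. A darcs version would follow the same pattern with darcs commands.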

1 point by rntz 5994 days ago

The things you describe - using patch and diff to update hacks for new releases, applying and merging patches, etc, are essentially what a version control system like git does under the hood. So instead of letting a VCS handle this for you, you're doing it manually. And with all due respect, I suspect this works well only with small, few, and simply dependent patches.

Not that making patches small isn't a good thing for other reasons, but I don't like having arbitrary limits on the complexity of my hacks to arc. The hygiene branch on anarki, for example, introduces massive changes to the underlying arc, yet it's a worthwhile hack and I'm thinking of porting it upstream to arc3. The coerce patch, while not nearly as large, involves many deletes and hence changes many line numbers in arc: if I merely represented hacks as diffs, then combining that hack with others would be a pain in the ass due to this changing of line numbers - a simple diff/patch scheme doesn't maintain the history necessary to do this automatically, so I'm stuck fixing the patches on my own. But this is exactly one of the tasks VCSes are built to handle.

Similarly, I'm trying to port many of anarki's arc2 changes forward to arc3, and I've already generated 14 distinct hacks, often dependent on one another. I expect more as the process goes on. Updating all of these hacks for the new feature additions manually would have taken me an hour, maybe several. With git it took me about half an hour (a bit more, thanks to a stupid mistake I made). If I write the script I have in mind for handling this (which merely drives git), I should be able to do it in ten minutes. That's about how long it should take, IMO.
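
The script itself isn't shown in the thread, so the following is only a guess at the general shape of such a git driver, in Python: tag where each hack branch currently sits, then replay each branch onto its parent's new position with git rebase --onto, parents first. The branch names, the old/ tag prefix, and the parent map are hypothetical; the commands are built but not run.

```python
from graphlib import TopologicalSorter

def rebase_commands(parents, old_base, new_base):
    """Commands to move a tree of dependent hack branches to a new base.

    `parents` maps every branch to the branch it was built on, or to None
    if it was built directly on `old_base`.
    """
    graph = {b: ([p] if p else []) for b, p in parents.items()}
    order = list(TopologicalSorter(graph).static_order())  # parents first
    # Tag every branch's current tip first, so old positions survive rebasing.
    cmds = [["git", "tag", "-f", f"old/{b}", b] for b in order]
    for b in order:
        parent = parents[b]
        onto = parent if parent else new_base           # parent already rebased
        upstream = f"old/{parent}" if parent else old_base
        # Replays only the commits in upstream..b on top of `onto`.
        cmds.append(["git", "rebase", "--onto", onto, upstream, b])
    return cmds
```

For example, with parents = {"coerce": None, "hygiene": None, "coerce-extras": "coerce"}, coerce is rebased from the old base onto the new one before coerce-extras is replayed onto the rebased coerce.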

Keeping changes minimal is good, but it's no panacea. Issues of scale will come up, as they have in other projects; VCSes were created for a reason. If someone were to write, as you suggest, a set of scripts for manipulating hacks - applying, merging, rebasing - with metadata about dependencies, they'd essentially have built a simple VCS!

(Darcs, in particular, is built around the notion of commuting patches - of extracting a given patch or set of changes from those surrounding it in a development history - which allows for precisely the kind of "independent" application of hacks that you desire.)
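
A toy version of the commutation idea, in Python: each hunk is reduced to (line, removed, added) counts, and two consecutive hunks commute when they touch separate regions, in which case only their line numbers change. This is a simplification under stated assumptions (one file, counts instead of line contents, adjacent hunks treated as not commuting); darcs' real patch theory covers much more.

```python
from typing import NamedTuple, Optional, Tuple

class Hunk(NamedTuple):
    line: int      # 1-based first line affected, in the file as this hunk sees it
    removed: int   # number of lines deleted at `line`
    added: int     # number of lines inserted at `line`

def commute(p1: Hunk, p2: Hunk) -> Optional[Tuple[Hunk, Hunk]]:
    """Given p1 then p2, return (p2', p1') with the same overall effect, or None.

    None means the hunks overlap or touch, so their order matters.
    """
    if p2.line > p1.line + p1.added:
        # p2 lies after p1's new text: undo p1's size change in p2's line number.
        return p2._replace(line=p2.line - (p1.added - p1.removed)), p1
    if p2.line + p2.removed < p1.line:
        # p2 lies before p1: applying p2 first shifts p1 by p2's size change.
        return p2, p1._replace(line=p1.line + (p2.added - p2.removed))
    return None
```

Deleting lines 10-11 and then inserting before line 20 has the same effect as inserting before line 22 first and then deleting lines 10-11, so commute(Hunk(10, 2, 0), Hunk(20, 0, 1)) gives (Hunk(22, 0, 1), Hunk(10, 2, 0)). This line-number bookkeeping is exactly what a plain diff/patch scheme leaves to the person applying the patches.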

-----

1 point by CatDancer 5994 days ago

"If someone were to write, as you suggest, a set of scripts for manipulating hacks - applying, merging, rebasing - with metadata about dependencies, they'd essentially have built a simple VCS!"

Maybe. git wasn't up to handling the dependency part.

"I suspect this works for you only because you have small, few, and simply dependent patches."

Yup, that's true. I imagine that, with sufficient effort, large patches can be made small by making Arc more hackable. But until that work is done for a particular patch (if it is possible at all), a version control system will be necessary for the reasons you describe.

I did write a script to rebase my patches on top of the succession of arc3 releases ^_^ Despite all the various features of git, I needed to write the script to keep the rebase work from being unbearably tedious. What I found interesting was that once I had the script, I didn't need git for anything.

I imagine that as I publish my hacks, if there are some you'd like to use, I'll be able to have a script automatically push them to darcs or git (whichever VCS you want to use), showing the succession of releases of a hack as commits in a branch. Then you'll be able to use the normal VCS mechanisms to merge and keep track of changes. If this turns out not to be true, let me know and I'll see what I can figure out...

-----

3 points by shader 5994 days ago

I think that both CatDancer and rntz have good points; the problem is that they are working on different things and thus have different perspectives.

CatDancer is mostly working on libraries: independent pieces of code that, while they may redefine some things or make minor changes to the base language, mostly leave it alone. These can be written easily in the "shortest distance from arc" method and still work well together because they don't actually change the base of arc. This also means that while VC is useful in development, it is hardly needed in publishing.

rntz has been working on porting changes from arc2 to arc3, along with some other hacks, many of which require major changes to the arc code base itself, such as his coerce hack. These need to be VC'd so that they can be understood by other users of Anarki, and so that they can work more easily with each other. Since many of them make major changes to the codebase, it can sometimes be a challenge to get them to work together, and, as he says, plain diffs would be a nightmare.

I may be summarizing a bit much, and I'm sure there are more subtleties to it than that, but it's how I understand the situation.

As far as I can tell, they can be handled separately. The publishing of libraries and miscellaneous code can be done without publicly visible version control, but the Anarki base should probably be versioned.

Git is still a good choice for what we're doing, as far as I can tell, because, for one thing, it is really just a set of scripts built for managing its base. That is the essence of the so-called porcelain. I'm sure it won't hurt anyone to add a little more in the way of domain-specific tools. They might be useful to others as well.

So it sounds like we have two things going on here:

1) We need a meta-data and library publishing system for people to share code that they've written to use with arc.

2) We need a slightly better method of handling changes to the arc base so that people can coordinate massive changes to it. I think that individual commits work OK for minor changes, but things rapidly get complicated if they start stepping on each other. That's when we should switch to "branches", which can be interpreted to mean alternate conceptions of arc. A case in point is the hygienic version of arc in the hygiene branch.

At some point we just admit that the Lisp community is hopelessly built on making their own incompatible versions of Lisp, and everyone makes their own fork of arc. Magic merges and rebasing can only handle so much incompatibility in the history of the system.

-----

2 points by CatDancer 5994 days ago

"Git is still a good choice for what we're doing"

I wouldn't dismiss the possibility of using darcs for your project. git was designed to merge tens of thousands of lines of code in a fraction of a second; that doesn't mean it isn't useful for other tasks, but it also doesn't mean that darcs might not be better for the kinds of tasks that you find yourselves needing to do.

-----

3 points by shader 5994 days ago

Git was also designed to have a fast, simple back end with a script-based front end. Each command that you're used to using with git is just a shell or Perl script that calls a few basic git functions to mess with blobs. Therefore, it is easy to add new tools and scripts. A good example, given the darcs context, would be: http://raphael.slinckx.net/blog/2007-11-03/git-commit-darcs-...

My point is that the architecture of git allows us to write scripts that work at the same level as the other git scripts; we can keep or leave the others as desired.

-----