On the need for garbage collection
9 points by vrk 6119 days ago | 8 comments
In the spirit of the speculation by Abelson & Sussman in their infamous video lectures [1]: Suppose we had the ultimate garbage-generating program:

  ; build an ever-growing list of 'foo, one new cons per (tail) call
  (def gimme-garbage (xs)
    (gimme-garbage (cons 'foo xs)))
  (gimme-garbage '())
Let's assume we have tail-call optimization (in a 100-year language, we certainly should!). Then, assuming that the machine running this can do N calls a second to gimme-garbage, which includes consing the list, in t seconds we'll have, of course, t * N cons cells. For sufficiently large t and N, the size of the program doesn't matter anymore, so we can ignore how much memory it consumes.

Let's say N is 3 billion, which is just fine for today's processors (assuming one core at, say, 3 GHz can execute the function once per cycle), though conservative for what the future might bring. However, since parallel execution of the function above makes little sense given the usual definition of rfn and cons, and since current memory buses are quite limiting, let's stick with the estimate for now.

Suppose the average lifetime of a computer running this is 10 years, or 315,360,000 seconds. Then, over the lifetime of the computer, assuming there are no power failures and the program never aborts, it will generate roughly 9.5 * 10^17 cons cells (3 * 10^9 per second times 3.15 * 10^8 seconds).
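
As a sanity check, the multiplication is easy to do at the Arc prompt (these are just the assumptions above, not measurements):

  ; seconds in 10 years, times 3 billion conses a second
  (* 3000000000 (* 10 365 24 60 60))
  ; => 946080000000000000, i.e. about 9.5 * 10^17 cons cells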

It's a large number, but there are two things to notice:

1. It's finite.

2. It's addressable even with today's 64-bit CPUs.

I don't know how much memory in bytes a cons cell usually takes in Scheme and Lisp implementations on 64-bit processors, but let's say, for the sake of argument, that it takes 24 bytes (two 64-bit pointers plus some extra information). Then the number above works out to about 2.3 * 10^19 bytes, or around 23 million terabytes.

That's a rather pathological case. Most computers don't run for 10 years, and most programs certainly don't exhibit similar garbage-generating behaviour. But even if we accept the assumptions, to build a machine where you would never need garbage collection or to free memory in the lifetime of the machine, you would need only around 11 million 2 TB harddisks [2] and an operating system that can map the disks to memory.
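
For completeness, here is that storage arithmetic spelled out at the Arc prompt, using the assumed 24 bytes per cell and decimal terabytes (10^12 bytes):

  ; 9.5 * 10^17 cells at 24 bytes each: total bytes, terabytes, 2 TB disks
  (* 946080000000000000 24)               ; => 22705920000000000000 bytes
  (/ 22705920000000000000 1000000000000)  ; => 22705920, about 23 million TB
  (/ 22705920 2)                          ; => 11352960, about 11 million disks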

If you look at some traffic statistics [3], the 2002 estimate of all traffic on the Internet was 27.6 terabytes/second. Since the number seems to have been tripling yearly up to then, let's say it's 20,120 TB/s in 2008 (unrealistic given current network capacities). There are maybe 1.3 billion users [4], so each user is generating garbage at roughly 15.5 MB/s.

Supposing there were one computer per user, a modern computer generates roughly 4900 TB of data over a 10-year lifetime, provided it is constantly on.
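
The per-user figures fall out of the same kind of back-of-the-envelope arithmetic (the inputs are only the rough estimates above):

  (* 27.6 729)                  ; six yearly triplings, 3^6 = 729
  ; => 20120.4 TB/s of total traffic in 2008
  (/ (* 20120.4 1e12) 1.3e9)    ; bytes per second per user
  ; => about 1.55e7, i.e. roughly 15.5 MB/s
  (* 1.55e7 315360000)          ; over the 10-year lifetime
  ; => about 4.9e15 bytes, or roughly 4900 TB per computer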

Thus, if computers came with, say, 10,000 TB of RAM/disk space, we wouldn't need garbage collection or to free memory.

Conclusion? We'll still need memory management and garbage collection with current hardware, but 10,000 TB per computer is not that far off. Maybe in ten or twenty years, although by then every computer will likely generate a hundred times more garbage per second.

[1] http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/

[2] http://www.lacie.com/products/product.htm?pid=11028

[3] http://en.wikipedia.org/wiki/Internet_traffic

[4] http://www.internetworldstats.com/stats.htm

EDIT: On a 32-bit system running mzscheme 352 and arc1, a 1-million-element cons'd list of one symbol ('foo) seems to take 115 megabytes -- so that's roughly 120 bytes per cons cell (all containing the same symbol).
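
For anyone who wants to repeat that measurement, here is a minimal sketch: build the list at the arc1 prompt and compare the mzscheme process's memory use before and after from outside (top, Task Manager, or similar). How the 115 MB figure was actually obtained isn't stated above, so this is just one way to do it:

  ; build a million-cons list of the same symbol, then check process memory
  (= xs nil)
  (repeat 1000000 (= xs (cons 'foo xs)))
  (len xs)   ; => 1000000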



6 points by byronsalty 6119 days ago | link

I'm missing how internet traffic relates to memory consumption per computer. This is a joke, right? You just got me, didn't you?

I'm waiting for the conclusion where Microsoft is going to send me $243 for every person I send this to.

Oh I get it now - the Internet is generating garbage. You, sir, are a funny one.

-----

5 points by raymyers 6119 days ago | link

It's no laughing matter!

"Conses don't grow on trees" -- Gerald Sussman.

Oh wait... that's exactly where they grow, isn't it?

-----

3 points by dido 6118 days ago | link

Not all computers will ever be as powerful as personal computers. There's a huge market for computers with limited memory and processing power, because they're embedded in other devices too, and these devices need to be cheap. Where do you suppose all those 8-bit microcontrollers, with processing power roughly equivalent to the C-64 I first learned to program on twenty-odd years ago, are used? They're in your cellphone, your watch, your air conditioner, your washing machine, in your car. Even if memory prices fall to the point where everyone could afford a personal computer with 10 petabytes of storage (which, frankly, I don't think will ever happen, given that the size of the entire Internet seems to be in that neighborhood), garbage collection will still be very much useful, if only in the embedded systems that are pretty much everywhere, even in subsystems of personal computers.

-----

3 points by ttch 6118 days ago | link

“GC? Don't Do It!” Jon L. White, "Address/memory management for a gigantic LISP environment or, GC considered harmful", Proceedings of the 1980 ACM Conference on LISP and Functional Programming (ACM SIGPLAN), pp. 119-127. http://portal.acm.org/citation.cfm?id=802797

-----

1 point by ms 6118 days ago | link

I wonder whether web apps need GC... after all, you're writing stateless servers (oops, I'm on the Arc forum, well, nevermind ;P)

For a stateless server, you could just keep accumulating memory for a request from a per-handler pool, and once the request is served, throw it all away (i.e. reset the pool counter to zero).

-----

1 point by cooldude127 6118 days ago | link

of course, GC is entirely unnecessary if you're stateless, but that is obviously not what arc's server is.

-----

1 point by byronsalty 6118 days ago | link

Why would GC be unnecessary for stateless servers? Something needs to free memory once the response is committed. Either the interpreter, the server, or your webapp. I prefer the interpreter - that's why I'm not using C.

-----

1 point by tac-tics 6118 days ago | link

Garbage collection, like tail call optimization, is a mere implementation detail. Any self-respecting Turing Machine has an unbounded storage tape. It's quite like civilization's answer to nuclear waste: just bury it, mark off that section of land, and move on to the next!

-----