"Why is HTML/XML a well-suited format for text and Lisp not? Where are the differences?"
The shorthands of HTML can be pretty nice. For instance, it's nice that HTML collapses whitespace to a single space, and I think there's actually an odd boost in brevity and readability of rich text:
lisp: "Said the program, \"" (i Helloooo) ", nurse world.\""
HTML: Said the program, "<i>Helloooo</i>, nurse world."
Racket tries to get around this shortcoming with its own TeX-like syntax, Scribble:
TeX: Said the program, ``\emph{Helloooo}, nurse world.''
Scribble: Said the program, "@i{Helloooo}, nurse world."
Arc has its own mechanism:
atstrings: "Said the program, \"@(i Helloooo), nurse world.\""
Then, of course, there are markdown languages, which try to incorporate kludges that look similar to the everyday kludges people already use when trying to communicate in unformatted Unicode:
markdown: Said the program, "*Helloooo*, nurse world."
I prefer my own approach, which is probably just a more minimalistic variant of TeX, supporting only a lisp-like section syntax. It lets the section type dictate any idiosyncratic treatment of the section body:
Chops: Said the program, "[i Helloooo], nurse world."
"And why is it not good for distributed, networked computation?"
I mostly make that claim because I strongly believe in David Barbour's RDP (Reactive Demand Programming). RDP, like much research (and practice) in distributed computation, deals with designing control flow constructs that represent changes in the location of the computation. This makes it easier to manage behavior on multiple machines from a single body of source code.
In Web programming, distributed code typically takes the form of a program that primarily runs on the server but generates HTML+CSS+JS. Tools intended to help in this process include text-munging tools like PHP and Rails's use of templating syntax; control-flow-munging tools like Racket and Arc's Web continuation management; and languages whose distributed control flow is more seamless by design, like Opa and Ur/Web.
While most forays into distributed computation, including these, seem to start from an imperative language, RDP starts from a reactive one. A reactive control flow graph applies at any given instant, rather than implicitly carrying interactions across time. Thanks to both time- and space-transportation being represented explicitly, it's probably easier to account correctly for the time cost of space transportation. (And when state, being the time-distribution channel it is, is represented externally to the program, it's easier to use other tools to manage it.)
Although RDP is reactive, it even diverges from popular reactive models like FRP (Functional Reactive Programming) by abandoning the idea of discrete events. Discrete events tend to complicate fan-in computation flow (how do we zip and deduplicate event streams?), and yet they're inessential to the continuous and tentative way we actually interact with the world.
Much of the complexity of Web programming is likely due to the fact that our space- and time-distribution standards--HTTP requests, HTTP caching, cookies, WebSockets, HTML forms, HTML local storage, etc.--are designed with eventful interaction in mind. JavaScript (and potentially a JavaScript-like lisp) only enables our event addiction.
On the other hand, HTML and CSS have reactive semantics. They're very inelegant as they stand today, but with some maintenance, they might represent a better evolution path toward a simple Web.