Sam's Two Webs

Don Box's Spoutlet


Ever since inadvertently pulling the pin on the HiRest/LoRest hand grenade last month, I've got to admit it's been great to see folks trying to get beyond the SOA/REST noise and instead get to the bottom of what actually makes "the web" work.
 
Of everything I've seen, I think Sam Ruby's recent missive on safe vs. unsafe is the closest to where I'm landing.
 
Many of us have said over and over again that "GET is special" and I strongly believe that identifying the numerous ways in which it is indeed special is the path to enlightenment on this one (at least it's proving to be so for me).
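
One concrete way in which GET is special is that the HTTP specification classifies it as a "safe" method. The sketch below is only an illustration of that classification (as listed in RFC 7231, section 4.2); the function names are made up for this example and are not taken from Sam's post.

```python
# Method properties as listed in RFC 7231, section 4.2.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def is_safe(method: str) -> bool:
    """True if the method is defined as read-only from the client's point of view."""
    return method.upper() in SAFE_METHODS


def can_retry_blindly(method: str) -> bool:
    """True if repeating the request has the same intended effect as sending it once."""
    return method.upper() in IDEMPOTENT_METHODS
```

Crawlers, caches, and prefetchers lean on exactly this distinction: they will follow a GET without asking, but not a POST.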

Posted Apr 21 2006, 01:28 PM by don-box

Comments

theCoach wrote re: Sam's Two Webs
on 04-21-2006 8:54 AM
Everything about this seems just a little above my head, but I want to point out something that I sense is relevant.

Web 1.0 had to emerge out of very lenient foundations. The incentives and "GET" model all pointed toward that.

For Web 2.0, or SOA, or whatever you want to call it, this is not necessarily the case.

The www took off because it connected the world, not because of a protocol, and because of its success we now have in place different infrastructure for considering what the correct balance of correctness vs. leniency is.
Just because some architectural model works for building a two-story building, it does not mean that you can build skyscrapers with it. I think it is a mistake to take much of a lesson from how http got its start in the emerging internet and project that as telling us much about where to go in the future.

But, like I said, it is mostly going over my head.
John wrote re: Sam's Two Webs
on 04-21-2006 11:34 PM
'Two Webs' is a bad idea.

URIs are the truly awesome technology, not HTTP. HTTP is a humble transfer protocol.

HTTP in The Real World splits request context into 'subject'/'object' with 'session token'/'URI'. REST only admits of 'resource'. The web that works is the one in the real world.

I think it's important to make your URIs explicit rather than rely on the content of HTTP headers. That is, if you're going to support multiple content types then explicitly indicate which content type you want in the URI. If URI is inexplicit then 'redirect' based on HTTP header content.
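
A minimal sketch of the rule in the paragraph above, in Python. It assumes the explicit content type is carried as a file-style suffix on the URI; the suffix table, the `.html` default, and the function name are all hypothetical, and q-values in the Accept header are ignored for brevity.

```python
from typing import Optional, Tuple

# Hypothetical mapping from media type to an explicit URI suffix.
SUFFIX_FOR_TYPE = {
    "text/html": ".html",
    "application/atom+xml": ".atom",
}


def resolve(path: str, accept: Optional[str]) -> Tuple[int, str]:
    """Return (status, location).

    200 means the URI already names its representation; 302 means the
    client should be redirected to the explicit URI.
    """
    if any(path.endswith(suffix) for suffix in SUFFIX_FOR_TYPE.values()):
        return 200, path
    # Inexplicit URI: use the first listed media type that we know about.
    for item in (accept or "").split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in SUFFIX_FOR_TYPE:
            return 302, path + SUFFIX_FOR_TYPE[media_type]
    return 302, path + ".html"  # assumed default representation
```

The header is consulted only to choose a redirect target. Once the client has the explicit URI, the same response comes back regardless of what headers are sent.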

Whatever you do, please don't misappropriate, misinterpret or otherwise extend HTTP headers.

The web works because of URI, not because of HTTP. The web also works because of 'session tokens'. Session context is the only part of the request context which should be omitted from the URI.

John wrote re: Sam's Two Webs
on 04-22-2006 12:05 AM
What I was trying to say above is that if your URIs are explicit then you don't need to depend on HTTP headers. You don't want to depend on HTTP headers. To the extent that you do depend on HTTP headers you should only redirect if they bear on the request context.

The URI should be the *explicit* context for the request, with the exception of a 'session' or 'user' context.

If you serve documents which exist in a 'universal context' then each URL means the same thing to all users and you don't need a 'subject' context.

If you serve documents customised based on user information and you are happy with user agent support for HTTP-Auth then you can use the HTTP-Auth header as the 'subject' context.

If you support user sessions then a cookie is the place to specify a 'session token' for the session. In this case the 'session' would be the 'subject' of the request.

If you put 'subject' context in the URI then users can't share or bookmark links as effectively. The 'subject' of a request is usually taken as 'implicit' from a user's perspective so it makes sense to leave it out of the URI. There are also minor security and privacy concerns with indicating the 'subject' of a request in the URI. Often it's better to leave the 'subject' context 'out of band'. Many web sites happily use sessions (with session tokens in a cookie); I don't believe it's true that their effect on scalability is a limiting factor.

Once you have the 'subject' (a user, or a session), the 'object' (a URI) and a 'verb' (i.e. GET) then you can construct the entire context necessary for your program to generate the implied response. In short: try not to rely on HTTP headers, you don't really need them anyway.
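
The subject/verb/object split described above can be sketched roughly as follows. This is an illustration only: the `RequestContext` type, the `build_context` function, and the cookie name `session` are assumptions made for the example.

```python
from dataclasses import dataclass
from http.cookies import SimpleCookie
from typing import Optional


@dataclass(frozen=True)
class RequestContext:
    subject: Optional[str]  # session token, or None for a 'universal context'
    verb: str               # e.g. "GET"
    obj: str                # the request URI


def build_context(verb: str, uri: str, cookie_header: str = "") -> RequestContext:
    """Assemble the context from the verb, the URI, and the Cookie header only."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get("session")  # "session" is an assumed cookie name
    subject = morsel.value if morsel is not None else None
    return RequestContext(subject, verb.upper(), uri)
```

No header other than Cookie is read, which is the point being argued: everything else the program needs is already in the URI.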
John wrote re: Sam's Two Webs
on 04-22-2006 12:13 AM
...if you need a slogan, it's not 'Two Webs'.

It's: 'More Resources'.

XHTML would be nice. Perhaps you could let the IE7 team know..?
M. David Peterson wrote re: Sam's Two Webs
on 04-23-2006 8:19 AM
Yo Coach,

> But, like I said, it is mostly going over my head. <

It shouldn't be, but it's good you recognize this as a possibility.

For example:

> The www took off because it connected the world, not because of a protocol, <

False. The WWW was ALL about the protocol. The internet, in various forms, already existed. TCP/IP, the backbone even in today's "modern" internet, began its packet transmission duties in the early 70's. For all intents and purposes the world was already pretty well connected by the time Tim Berners-Lee decided it was time to take these connections and do something useful with them... like connecting people.

But please don't take this to mean I agree with your point... I don't. As mentioned, TBL's WWW was ALL about protocol and markup. HTTP and a subset of SGML called HTML. However, keep in mind that at this stage of the game SGML had a solid two decades of research and development behind it, starting with Charles Goldfarb, Edward Mosher and Raymond Lorie in 1969 as the Generalized Markup Language, or GML. Fast-forwarding through a lot of history you can access for free on Wikipedia, SGML ultimately became the standard, building on the work, obviously, of GML.

Given the fact that HTML needed to be parsed, and was a subset of SGML, it made obvious sense to use the already existing SGML parsers to parse the HTML into something a machine could then more easily render onto a screen for reading.

So with all of this preexisting stuff, why didn't someone just connect the dots and call it the WorldWideWeb?

Obviously that's exactly what Tim Berners-Lee did, and he did it with a protocol called HyperText Transfer Protocol, or HTTP.

So what this all comes down to is one thing...

The reason the WWW, invented by Tim Berners-Lee, was able to accomplish what it has was because of HTTP.

Why?

Because it was SIMPLE! It didn't require state to be maintained and it was built on top of DNS (developed in 1983 by Paul Mockapetris) which meant that the system could still survive as long as at least one DNS server was still alive. It would crawl to the speed of traffic between the 520 overpass, and the 85th St exit in Redmond (doesn't matter what time... pretty much anytime on that patch of road will work.. pick one and go with it ;), but it would still work.

Fault tolerance in its finest hour.

To finalize, the genius of Tim Berners-Lee existed in his ability to not reinvent anything that didn't need to be, extending into the proper areas (HTTP) such that even my five year old son could more than likely figure out how to build one (with a little help from Papa, but he's a sharp kid, so not much :) and as such, implementations from hackers around the world began to spring up like Tulips in Skagit Valley (are you from the Seattle area, by the way??? if not, some of this might not make much sense... but the important parts are non-geo specific, so I think we should be okay :)

So, moving on to John...

Please read the above, check out a library book on the subject, visit your local neighborhood Wikipedia, Amazon.com, or Barnes & Noble... or simply pick another profession... cuz' I'm sorry to suggest that if none of these sound interesting to you, I don't think the software business and you were made for each other.

I'm not stating this as an absolute... but seriously, this information is free and READILY available... so why not use it?

Peace and love unto you all (oh, and to Don... FANTASTIC post by the way.... I agree... a lot can be accomplished with GET... Not that you are in need of the info, but if anybody else cares to explore some fun ways to take advantage of all that GET and the URI have to offer, please see: http://www.xsltblog.com/archives/2006/02/what_rest_gets_1.html)



John wrote re: Sam's Two Webs
on 04-23-2006 9:04 PM
> I don't think the software business and you were made for each other. <

Ouch.

That's OK though, I'm thinking about moving into the Ninja industry anyway. I just need to pump a few more agility tomes.

I think my point that URI is the 'key' enabling feature of HTTP is pretty important.

Despite what you say, HTTP isn't 'simple'. It's just that HTTP is basically irrelevant. As long as you get the verb 'GET' right, and as long as you have a URI that indicates explicitly what you want, then HTTP works 'enough' to get you where you want to go.

The reason sloppy HTTP implementations work is mostly because of the explicit nature of URI. Certainly [X]HTML is a very important technology too.

The point I was trying to draw out is this: REST doesn't allow for personalised resource representations without modifying the request URI to indicate the 'subject' of the personalisation. Lots of HTTP implementations use a 'cookie' to serve as the 'subject' context which representations can vary by. As this is true, what I'm concerned might happen is that in creating APIs around HTTP people start to try and put this additional context into HTTP headers (other than cookies). If this happens then the web won't work so well as it does now.

That is, if in order to programmatically access your web application via HTTP I need to add some extra HTTP header, then you'll have broken the internet. (At which time it'll be straight to middle management for you.)

See this [1] for example. See how the NewsGator API, which alleges to be 'RESTful' requires that you add a custom HTTP header to each request? Yeah. That breaks the internet. Don't do that. The problem that the NewsGator people have is that they need a 'subject' context to personalise/authorise access via HTTP. The problem with REST is that it doesn't support this at all, which makes REST pretty much useless for a massive number of web application developers.

Anyway, I'm done here now.

[1] http://www.newsgator.com/ngs/api/NewsGatorRESTAPI.pdf

G. Roper wrote re: Sam's Two Webs
on 04-24-2006 10:27 AM
John,
Why can't the URL contain subject (customization) info? Suppose we want to return data on a popular utilities stock to two users, GRoper and BDodson, and to an anonymous user:

1. http://www.mysite.com/utilities/Enron/GRoper
2. http://www.mysite.com/utilities/Enron/BDodson
3. http://www.mysite.com/utilities/Enron/

Now the first two URLs contain subject information and the last does not. Yet all 3 can be designed to retrieve the same information about the selected utility. In the case of the first two, the returned resource can be formatted to suit each particular user, either GRoper or BDodson. The customization information may be in the URL or at some other URL keyed by the contents of the current URL. In 3., a standardized resource with no customization is returned to the client.
This also means that should BDodson look at a URL obtained from GRoper, she will see the same thing that GRoper does.

I'll admit that it is also possible to interpret the URLs above as returning information specific to each user and possibly not to be shared among users, but this interpretation is application-specific and, in this case, I (as the developer) chose to not interpret them in that manner.
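
A minimal sketch of how a server might read the three URLs above, in Python. The `StockRequest` type and `parse_stock_url` function are hypothetical names, and the assumption that the path is always `/<sector>/<stock>/[<user>]` is taken from the example rather than from any real site.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class StockRequest(NamedTuple):
    sector: str
    stock: str
    user: Optional[str]  # None means the standard, uncustomized resource


def parse_stock_url(url: str) -> StockRequest:
    """Split /<sector>/<stock>/[<user>] into its parts."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) not in (2, 3):
        raise ValueError(f"unexpected path: {url!r}")
    user = segments[2] if len(segments) == 3 else None
    return StockRequest(segments[0], segments[1], user)
```

Because the user name is read from the URL and not from who is logged in, anyone who follows GRoper's link gets GRoper's formatting.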

Am I missing something here?
Christian Mogensen wrote re: Sam's Two Webs
on 04-24-2006 1:10 PM
Authentication is an additional HTTP header that lets GRoper and BDodson get their personalized views of the resource at http://www.mysite.com/utilities/Enron/ without additional jiggery-pokery.

Your examples work great if GRoper wants to expose his view of the resource to the world, but assuming he wants to keep it private, he will need to provide a username+password somehow.

This info either ends up in an HTTP-Auth header, or in a cookie (when using forms-based login).
Neither breaks the internet.
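
For the HTTP-Auth case, here is a rough sketch of what travels in the standard Authorization header under the Basic scheme. The helper names are made up for illustration; only the "Basic" plus base64 of `username:password` encoding comes from the HTTP authentication specification.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header_value: str) -> Tuple[str, str]:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic Authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

The URL stays http://www.mysite.com/utilities/Enron/ for both users; only the header value differs. Base64 is an encoding, not encryption, so this is only reasonable over an encrypted connection.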

But yeah - you are absolutely right: the URL can encode extra info used to customize the output. Usually this is done in the query string rather than in the path of the URI:
http://www.mysite.com/utilities/Enron/?user=BDodson
http://www.mysite.com/utilities/Enron/?user=GRoper
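
Reading the customization out of the query string might look something like this in Python. The `user` parameter name comes from the URLs above; the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def user_from_query(url: str) -> Optional[str]:
    """Return the 'user' query parameter, or None if the URL does not carry one."""
    values = parse_qs(urlsplit(url).query).get("user")
    return values[0] if values else None
```

The path, and so the underlying resource, is identical in both URLs; only the query string varies.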
John wrote re: Sam's Two Webs
on 04-25-2006 1:15 AM
> Why can't the URL contain subject (customization) info? <

I didn't say it couldn't. It can if you want. I said that it won't always.

My point was that a lot of people won't want to put subject context in the URI (myself included). I won't go into the motivations, but I will say that since it is true that the subject will be left out of the URI (as it is with session tokens in cookies in many web applications presently) then there will need to be a place for this additional context to be specified.

My agenda here was to warn that since many developers will be doing this (regardless of what anyone else thinks about that) that it is important that they don't abuse HTTP headers to relay this context. I've shown specifically how the NewsGator API abuses HTTP for an additional 'subject' context, and I'm sure they won't be the last.

Hopefully you can see why that will lead to a serious problem, because I'm not going to offer the long drawn out explanation of why.

The first step is acceptance. The next step is to agree that cookies (or HTTP-Auth in a small number of cases) are the only reasonable place for this context to be.

I've been shouted down one too many times over this, and I really don't have the energy or motivation to continue discussing it. Don's a smart guy, so hopefully he's heard and understood what I've said, and hopefully a big player like Microsoft won't end up breaking the web (again).
theCoach wrote re: Sam's Two Webs
on 04-25-2006 7:04 AM
M. David Peterson,

"It shouldn't be, but its good you recognize this as a possibility."

At least we agree on something ;)

I think, at the very least, you are discounting the technology. Personal computers had to reach a critical mass. Modem technology had to go beyond placing an old school headset onto a do-hickey, transmission speeds had to be above a critical level, etc.

I think it is pretty easy to imagine a world where a different protocol took off at around the same time had Tim Berners-Lee never existed. Perhaps I am wrong on that, and only a protocol that had the basic characteristics of HTML and HTTP had that possibility.

What I do not think you addressed is my point about the lessons of an emergent technology phenomenon, the www, providing iron clad lessons for building the next, hopefully more interesting infrastructure.

My view of how that will happen, is that tools, validation, emerging business-specific standardized data structures, and computer to computer communication/automation will play a much, much larger role in generation two, for lack of a better descriptor. Those factors point toward a different balance.
Also, there is a lot more effort being focused on getting this right, providing different incentives for how this all plays out.

M. David Peterson wrote re: Sam's Two Webs
on 04-29-2006 8:30 AM
John, the Coach,

Glad to see you both have a sense of humor... Not everyone has the ability to roll so well with the punches to then follow-up with a solid response. So thanks for that. :)

John,

I think you are correct in the sense that I really didn't put enough emphasis on the power of the URI in my comments. HTTP is really the transparent piece to all of this, and should remain as such. It's comforting to know that at a pinch, we could always get my son to write us an HTTP server, but the fact that he can continue playing with his trucks in the sandbox and not have to worry about such things should provide us all a little bit of added comfort.

Coach,

>> What I do not think you addressed, is my point about the lessons of an emergent technology phenomena, the www, providing iron clad lessons for building the next, hopefully more interesting infrastructure. <<

Yep. Good point. The "Worser is Better" approach is what I believe applies best to this... In other words, if there is something of value in regards to the next generation of the transfer protocols (peer-to-peer protocols seem to be the obvious front runner in regards to technologies that are driving us closer to the grid-influenced future that seems to be the underlying force that gets each of us out of bed each morning) it is that they need to be just enough (in regards to technical difficulty) to get the job at hand done, and simply leave the rest alone.

The WS-* group of standards tends to get a lot of trash-talk for being overly complex, and/or trying to take the better-is-better approach, but it seems to me that folks seem to overlook the fact that the reason that WS-DeathStar (see: David Heinemeier Hansson’s > http://www.loudthinking.com/arc/000585.html) is funny (to me anyway; others I'm sure feel differently) is more because they chose to Atomicize the foundation instead of building a Super-Sized, one size fits all platform specification, and less about WS-DeathStar or WS-WalkTheDog (see: http://www.oreillynet.com/xml/blog/2006/04/kurtcagleqotd_a_new_service_st.html) resulting from such "madness."

If I were to make a prediction about all of this it would be that the group of WS-* standards will continue to be a primary force driving us into our ever-more-closely-connected future, but someone like Jesse James Garrett will come along and take what we all call W(eb)S(ervices)-*, rename it after something found in the bathroom cleaning supplies aisle, and somehow be labeled a "Father of ..." and regarded as a visionary leader fighting for the rights of every decent, hard(ly) working, 9-5 mort on the planet, when in fact it's Scott Isaacs, Adam Bosworth, Derek Denny-Brown, and an entire army of other Microsoft developers, as well as folks like Tim Bray, Tim O'Reilly, and plenty of other non-MS folks who deserve the "Father of" label. (see: http://www.oreillynet.com/xml/blog/2006/04/keepin_it_simple_1.html#comment-27466 for some fantastic commentary from Len Bullard on the matter)

Of course, if the current trend of Bathroom Cleanser Acronymic naming conventions continues forth in its "blind-leading-the-deaf" evil ways, the result will probably be something that no good Lisp-fearing Weenie on the planet could take any sort of pride in knowing they played any part in such nonsense, as when all is said and done, we all know (well, at least in ten years when we're all programming in the "New Hotness" programming language called Lisp (a prediction from Chris Sells which to me feels just about right)) that everything is just a layer, on top of a layer, on top of Yet Another Layer, on top of a Lisp foundation.

While I think the general idea for what this might be called can be found in the term:

> ASynchronous Scheme, Haskell, and Objectified Lisp Extensible Markup Language <

Let's just hope this > http://www.xsltblog.com/archives/2005/06/via_google_code_1.html < isn't the result. :)

