The principle of robustness

In Internet protocol development, there is a principle of robustness that implementors are encouraged to follow. The principle originated in the IENs and RFCs that specified the IP(v4) protocol. I quote here from RFC 791, which is probably one of the most referenced RFCs:

"The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol there is the possibility of differing interpretations. In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior. That is, it must be careful to send well-formed datagrams, but must accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear)."

I had an opportunity to see the importance of this principle while testing some features of the SixXS IPv6-to-IPv4 (and vice versa) gateways. Basically, if you come in via IPv6, the gateway passes your request on to the IPv4 destination. The destination responds to the gateway with the requested page, and the gateway rewrites the page's URIs so that subsequent requests generated from clicks on that page also go through the gateway.

There is, however, a wrinkle that happens to intersect with some of my past experiences with HTTP and web server logs. When the gateways pass the request on to the IPv4 (or v6) host, they prepend identification data to the original HTTP User-Agent: header. (You can see an example here.) The rationale for doing this is that the User-Agent: is (usually) logged in the web server's logs, so if webmasters find this in their logs, they can take it as a sign that IPv6 usage is increasing, and hopefully it will encourage them to provide direct (non-gatewayed) IPv6 service.

However, this causes an undesirable side effect. The User-Agent: is also used by the web server to determine which features the browser supports. The latest HTTP specification (RFC 2616) says that, by convention, the "product tokens" (the browser identification) are listed in the order of their significance. But in this case, by prepending its own data to the original User-Agent:, SixXS makes its gateway the most significant product token. This causes problems when accessing some websites with some browsers. When accessing Yahoo's home page through the gateway using IE 7.0, you get an error message saying that the browser you are using is not supported and suggesting that you use one that is.
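To make the side effect concrete, here is a minimal sketch of how a server that trusts only the first product token would react to a prepended versus an appended gateway token. Everything in it is an assumption for illustration: `GATEWAY_TOKEN` is a made-up string (not what SixXS actually sends), and `strict_server_supports` is a hypothetical check, not Yahoo's real logic.

```python
# Hypothetical values: the gateway token and the server-side check are
# illustrative assumptions, not the real SixXS data or Yahoo's code.
GATEWAY_TOKEN = "ExampleGateway/1.0"
IE7_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"


def first_product(user_agent: str) -> str:
    """Return the name of the first product token (the text before '/')."""
    tokens = user_agent.split()
    if not tokens:
        return ""
    return tokens[0].split("/", 1)[0]


def strict_server_supports(user_agent: str) -> bool:
    """A strict check that looks only at the most significant (first) token."""
    return first_product(user_agent) == "Mozilla"


def prepend(user_agent: str, gateway: str) -> str:
    """Put the gateway token in front of the browser's own header value."""
    return f"{gateway} {user_agent}"


def append(user_agent: str, gateway: str) -> str:
    """Put the gateway token after the browser's own header value."""
    return f"{user_agent} {gateway}"


if __name__ == "__main__":
    for label, ua in [
        ("original", IE7_UA),
        ("prepended", prepend(IE7_UA, GATEWAY_TOKEN)),
        ("appended", append(IE7_UA, GATEWAY_TOKEN)),
    ]:
        print(label, strict_server_supports(ua))
```

With the gateway token in front, the strict check no longer sees "Mozilla" first and rejects the browser; with the token at the end, the browser's own tokens keep their position and the check passes.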

So, what should happen here? Should SixXS append their data to the original User-Agent:? Arguably, the gateway function is not that of a browser, so it ought not do anything to change the semantics of what the browser is trying to communicate to the server. OTOH, Yahoo could make a minor server change to scan for the presence of the common product tokens. Or IE 7.0 could change its User-Agent: so that it is easier for servers in general to determine what it is. (Browsers that want to indicate that they support the types of features common to the original Netscape browser (and its descendants) will typically indicate "Mozilla" as part of the first product token.) It just so happens that the expected behavior occurs when using Firefox with Yahoo, or using IE 7.0 with Google. I'll also note that the SixXS gateway service is part of a general movement to promote IPv6 usage, but when accessing a popular web site such as Yahoo with a popular browser such as IE 7.0 doesn't work, the end user has little incentive to use IPv6.
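The "liberal in receiving" option on the server side could look something like the sketch below: rather than trusting only the first token, scan every product token for one the server recognizes. The `KNOWN_PRODUCTS` list, the function names, and the sample gateway token are all hypothetical; this is not how Yahoo or any particular server actually does it, and it only handles non-nested comments.

```python
import re

# Hypothetical list of product names a server might recognize.
KNOWN_PRODUCTS = ("Mozilla", "Opera")

# Roughly "name/version"; names and versions may not contain spaces or parens.
PRODUCT_RE = re.compile(r"([^\s/()]+)/([^\s()]+)")

# Parenthesized comments, e.g. "(compatible; MSIE 7.0; Windows NT 5.1)".
# Only non-nested comments are handled in this sketch.
COMMENT_RE = re.compile(r"\([^()]*\)")


def strip_comments(user_agent: str) -> str:
    """Replace parenthesized comments with a space."""
    return COMMENT_RE.sub(" ", user_agent)


def find_known_product(user_agent: str):
    """Return (name, version) of the first recognized product token, or None.

    Unknown tokens (such as one added by a gateway) are skipped instead of
    causing the whole header to be rejected.
    """
    for name, version in PRODUCT_RE.findall(strip_comments(user_agent)):
        if name in KNOWN_PRODUCTS:
            return name, version
    return None


if __name__ == "__main__":
    # "ExampleGateway/1.0" is a made-up token standing in for the gateway's data.
    ua = "ExampleGateway/1.0 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
    print(find_known_product(ua))
```

The trade-off is the usual one: skipping unknown tokens makes the server tolerant of intermediaries, at the cost of no longer honoring the order-of-significance convention strictly.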

User-Agent: processing is a pain in general. IMO, it is unfortunate that a simpler syntax wasn't used to encode the various combinations of browsers and the features they support. Instead, servers must use complex processing to recognize the browser product tokens, and that processing doesn't always work. If you take a look at some sample User-Agent: processing code, you'll see that it is complicated and vulnerable to small deviations from common usage.

[update] SixXS has modified the gateway to append its data after the original User-Agent:.



5 comments
Oct. 18th, 2007 09:55 pm (UTC)
Interesting post Greg. Have you tried to forward this to anyone at Yahoo?
Oct. 19th, 2007 07:15 am (UTC)
Re: Interesting
Not yet. I've had a couple of email exchanges with someone from the SixXS staff. I'm waiting to hear more from them before doing anything else.
Oct. 19th, 2007 09:22 pm (UTC)
Re: Interesting
I heard back from SixXS; they decided to append their data to the User-Agent:. Accessing Yahoo's home page via their gateway works fine now. (They haven't gotten around to updating the documentation yet.)
Oct. 19th, 2007 06:35 pm (UTC)
My experiences with spam and rfc2821/2822 have led me to believe that the concept of "be liberal in what you accept" is outdated. All other things being equal, the more liberally you accept deviations, the more your code will be exploited by those who would want to make your code accept stuff it shouldn't, or fail to catch an error/exploit it should catch.

I probably wouldn't want to go to the other extreme either, but in general "robust" seems to disagree with "liberal" in a world where exploits are a fact of life.
Oct. 19th, 2007 09:38 pm (UTC)
It's certainly a debatable topic in general. A problem is you may not find out for some time that the traffic you blocked was not only legit, it's part of the "new order" that you want to be a part of. Flip side is that the places where the "new order" is springing up can be potential breeding grounds for all types of unwanted traffic, as they haven't gone through the security testing the mainstream products and services have gone through. In the end, it's a matter of weighing risks vs. rewards, and being flexible enough to change.