October 18th, 2007

The principle of robustness

In Internet protocol development, there is a principle of robustness that implementors are encouraged to follow. The principle originated in the IENs and RFCs that specified the IP(v4) protocol. I quote here from RFC 791, which is probably one of the most referenced RFCs:

"The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol there is the possibility of differing interpretations. In general, an implementation must be conservative in its sending behavior, and liberal in its receiving behavior. That is, it must be careful to send well-formed datagrams, but must accept any datagram that it can interpret (e.g., not object to technical errors where the meaning is still clear)."

I had an opportunity to see the importance of this principle while testing some features of the SixXS IPv6-to-IPv4 (and vice versa) gateways. Basically, if you come in via IPv6, the gateway passes your request on to the IPv4 destination. The destination responds to the gateway with the requested page, and the gateway rewrites the URIs in it so that subsequent requests generated by clicks on that page also go through the gateway.

There is, however, a wrinkle that happens to intersect with some of my past experiences with HTTP and web server logs. When the gateways pass the request on to the IPv4 (or v6) host, they prepend identification data to the original HTTP User-Agent: header. (You can see an example here.) The rationale for doing this is that the User-Agent: is (usually) logged in the web server's logs, so if webmasters find this in their logs, they can take it as a sign that IPv6 usage is increasing, and hopefully it will encourage them to provide direct (non-gatewayed) IPv6 service.

However, this causes an undesirable side effect. The User-Agent: is also used by the web server to determine which features the browser supports. The latest HTTP specification (RFC 2616) suggests that the "product tokens" (the browser identification) should be listed in the order of their significance. But in this case, SixXS puts its own token ahead of the original User-Agent: value, which makes the gateway the most significant product. This causes problems when accessing some websites with some browsers. When you access Yahoo's home page through the gateway using IE 7.0, you get an error message saying that your browser is not supported and suggesting that you use one that is.
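Here is a minimal Python sketch of how prepending can go wrong. Everything in it is an assumption made for illustration: the gateway token `Example-IPv6Gate/1.0` is made up, and `naive_is_supported` is a hypothetical strict check, not Yahoo's actual logic. The IE 7.0 string is a typical one for Windows XP.

```python
# A typical IE 7.0 User-Agent value on Windows XP.
IE7_UA = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"

# Hypothetical gateway identification, invented for this example.
GATEWAY_TOKEN = "Example-IPv6Gate/1.0"


def prepend_token(user_agent, token):
    """Put the gateway token in front, making it the most significant product."""
    return f"{token} {user_agent}"


def append_token(user_agent, token):
    """Put the gateway token at the end, leaving the browser's ordering intact."""
    return f"{user_agent} {token}"


def naive_is_supported(user_agent):
    """Hypothetical strict server check that only looks at the first product token."""
    return user_agent.startswith("Mozilla/")


print(naive_is_supported(IE7_UA))                                # True
print(naive_is_supported(prepend_token(IE7_UA, GATEWAY_TOKEN)))  # False
print(naive_is_supported(append_token(IE7_UA, GATEWAY_TOKEN)))   # True
```

A check anchored to the start of the string is only one way a server could trip over the extra token, but it shows why the position of the gateway's data matters.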

So, what should happen here? Should SixXS append its data to the original User-Agent: instead? Arguably, a gateway is not a browser, so it ought not to change the semantics of what the browser is trying to communicate to the server. OTOH, Yahoo could make a minor server change to scan for the presence of the common product types. Or Microsoft could change IE 7.0's User-Agent: so that it is easier for servers in general to determine what it is. (Browsers that want to indicate that they support the kinds of features common to the original Netscape browser (and its descendants) typically put "Mozilla" in the first product token.) As it happens, everything works as expected when using Firefox with Yahoo, or IE 7.0 with Google. I'll also note that the SixXS gateway service is part of a general movement to promote IPv6 usage, but if a popular web site such as Yahoo doesn't work with a popular browser such as IE 7.0, the end user has little incentive to use IPv6.
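The server-side option, being liberal in what you accept, could look something like the following Python sketch. The marker table and the `detect_browser` function are hypothetical and deliberately tiny; a real site would need many more entries. The gateway token is again a made-up placeholder.

```python
import re

# Hypothetical marker table: (pattern, browser family).
# Each pattern is searched for anywhere in the header, not just at the start.
KNOWN_MARKERS = [
    (re.compile(r"MSIE (\d+)\.(\d+)"), "IE"),
    (re.compile(r"Firefox/(\d+)\.(\d+)"), "Firefox"),
]


def detect_browser(user_agent):
    """Return (family, major, minor) for the first known marker found, else None."""
    for pattern, family in KNOWN_MARKERS:
        match = pattern.search(user_agent)
        if match:
            return family, int(match.group(1)), int(match.group(2))
    return None


# The made-up gateway token in front no longer hides the browser.
gatewayed = "Example-IPv6Gate/1.0 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
print(detect_browser(gatewayed))  # ('IE', 7, 0)
```

Because the scan ignores position, it gives the same answer whether the extra data is prepended, appended, or absent.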

User-Agent: processing is a pain in general. IMO, it is unfortunate that a simpler syntax wasn't used to encode the various combinations of browsers and the features they support. Instead, servers need complex processing to recognize the browser product tokens, and that processing doesn't always work. If you take a look at some sample User-Agent: processing code, you'll see that it is complicated and vulnerable to small deviations from common usage.
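To give a feel for the problem, here is a rough Python sketch that splits a User-Agent: value into the two kinds of parts RFC 2616 allows, products (`name/version`) and parenthesized comments. The function name `split_user_agent` is my own, and the sketch assumes comments are not nested, which the specification actually permits.

```python
import re

# Token characters per RFC 2616 section 2.2: anything except CTLs and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# One part is either a product (token, optionally "/" token) or a flat comment.
PART = re.compile(
    r"\s*(?:(?P<product>" + TOKEN + r"(?:/" + TOKEN + r")?)"
    r"|(?P<comment>\([^()]*\)))"
)


def split_user_agent(value):
    """Split a User-Agent value into ('product', text) and ('comment', text) pairs."""
    parts = []
    pos = 0
    while pos < len(value):
        match = PART.match(value, pos)
        if match is None:
            # Keep whatever could not be parsed instead of rejecting the header.
            rest = value[pos:].strip()
            if rest:
                parts.append(("unknown", rest))
            break
        kind = match.lastgroup
        parts.append((kind, match.group(kind)))
        pos = match.end()
    return parts


for part in split_user_agent("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"):
    print(part)
```

Note where IE 7.0 ends up: its only product token is `Mozilla/4.0`, and "MSIE 7.0" sits inside the comment, so a server has to dig through free-form comment text to learn which browser it is really talking to.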

[update] SixXS has modified the gateway to append its data after the original User-Agent:.