April 25th, 2005


Slashdot is reporting that Google is making changes to its AdSense program. The most significant change for end users is that non-text ads can now appear on AdSense affiliate sites. The most significant changes for advertisers are the option to choose where their ads appear and the option to use CPM (cost per thousand impressions) pricing. Some think CPM is less vulnerable to click fraud, but this is not necessarily so. (Bots can repeatedly submit queries until all of an advertiser's paid impressions are used up, for example.) There is also the issue of what constitutes an impression (does it count only when the web server serves the ad, or also when a cached copy is viewed?), not to mention how the number of "people" who viewed the ad is determined.

I found a very good article on click fraud written by Dmitri Eroshenko, the CEO of Clicklab. I like the article because it gives both a technical and a business perspective on the subject. He notes that his company has developed a system that assigns weights to certain types of sessions, in much the same way that email spam filters do. I think this is the right approach, and regret that it wasn't possible for me to do something like that at AV, due to a lack of time and resources.
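To make the weighting idea concrete, here is a minimal sketch of how a session could be scored the way a spam filter scores a message. This is not Clicklab's actual system; the signal names, weights, and threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical signals and weights; a real system would tune these from data.
WEIGHTS = {
    "no_referrer": 1.5,       # request arrived with no referring page
    "sub_second_click": 2.0,  # ad clicked almost immediately after page load
    "repeat_ip": 2.5,         # same IP already clicked this ad recently
    "no_cookies": 1.0,        # client refuses or drops cookies
}
THRESHOLD = 4.0  # assumed cutoff above which a session is flagged


@dataclass(frozen=True)
class Session:
    ip: str
    signals: frozenset  # names of the signals observed in this session


def score(session):
    """Sum the weights of every recognized signal; unknown signals count as 0."""
    return sum(WEIGHTS.get(name, 0.0) for name in session.signals)


def is_suspicious(session, threshold=THRESHOLD):
    """Flag the session when its combined weight reaches the threshold."""
    return score(session) >= threshold
```

No single signal is enough to flag a session here; it takes a combination, which is the same reason weighted spam filters produce fewer false positives than single-rule filters.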

I also found an open-source project called hypKNOWsys that attempts to identify certain types of usage patterns in user sessions. Their code probably would not scale to a site with heavy traffic volume, such as a search engine. It also relies primarily on identifying pages from Apache-format logs, which is cumbersome and requires extra processing to determine whether two pages are identical. I think the gladiator approach of identifying pages by numerical codes corresponding to the domain name (siteid) and rendered page (pageid) was much more efficient. It's unfortunate that the implementation was not as good as the ideas. I sometimes wonder if we should have continued to use Apache but modified the page-serving code to use numeric codes. I suppose the gladiator team, having had experience with Resin and Java, felt more comfortable with those, but I think they underestimated the amount and nature of AV traffic.
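A rough sketch of the difference, not taken from hypKNOWsys or gladiator: with logged URLs, two requests have to be normalized before they can be compared, while numeric codes compare directly. The normalization rules and the example (siteid, pageid) values below are my own assumptions.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def normalize_url(url):
    """Reduce a logged URL to a canonical form so equivalent pages compare equal.

    Assumed rules: lowercase the host, drop a trailing slash, and sort the
    query parameters. A real log processor would likely need more.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    query = urlencode(sorted(parse_qsl(parts.query)))
    return f"{host}{path}?{query}" if query else f"{host}{path}"


def same_page_by_url(a, b):
    """Compare two logged URLs; each call pays for parsing and sorting."""
    return normalize_url(a) == normalize_url(b)


def same_page_by_code(a, b):
    """Compare two (siteid, pageid) tuples; a plain integer comparison."""
    return a == b


# Hypothetical codes: site 3, rendered page 17.
hit_one = (3, 17)
hit_two = (3, 17)
```

The URL version does string parsing on every log line, which adds up quickly at search-engine volume. The numeric version does that work once, when the page is served.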
  • Current Mood
    thoughtful