gregbo (gregbo) wrote,


independent study on Google's click fraud detection practices released

As part of the recent click fraud lawsuit against Google, which the company has agreed to settle for $90 million, an independent study has been released by Dr. Alexander Tuzhilin, a professor of Information Systems at NYU's Stern School of Business. It's a lengthy report (47 pages), containing a fairly detailed description of how Google detects click fraud, along with some historical commentary on AdWords and AdSense. Dr. Tuzhilin reached the conclusion that Google made "reasonable" efforts to combat click fraud.

Although there were some points I found a little questionable, I found myself generally agreeing with what I read in the report until I reached this:

9.1.5 History of Google Filters.


The Early Days (February 2002 – Summer 2003). When AdWords program was launched in February 2002, Google had three filters installed at that time. These filters detected and removed only the very basic invalid clicks. Looking back at these early days of invalid click detection, it is not clear to me why Google engineers could not conceive and introduce some of the subsequently developed filters which are pretty basic and obvious, having the hindsight that we have now. Also, their invalid click detection efforts were quite slow at that time: during these 1.5 years no new filters were introduced, and the whole invalid click detection effort was based only on the three filters introduced during the AdWords launch in February 2002. There are several extenuating circumstances that might have caused such a slow start:

Click fraud was a really new phenomenon at that time, much less understood than it is now; therefore Google engineers were on a learning curve trying to understand the problems associated with click fraud and the ways to combat it. Moreover, when Google launched the original version of the AdWords program in 2000, it was based on the CPM, and not the CPC advertising model. Click fraud is quite different for the CPM than for the CPC model, which means that Google engineers had to learn about new types of the CPC-related fraud at that time. This switch and the related uncertainties might have also slowed their efforts to develop new CPC-based filters.


Click fraud was NOT a new phenomenon at that time, and it was quite well understood by members of the Internet technical community. It has been known about for YEARS, for as long as the Internet has been in existence. Understanding it only requires knowledge of how the Internet architecture and protocols work -- in particular, how easily traffic can be generated and faked.

If you've been reading my journal for a while, you'll recall that I've wondered how it could have been that Google didn't do anything about click fraud, given the talent of their engineering staff, etc. Well, if these hypotheses are correct, THEY DIDN'T UNDERSTAND THE PROBLEM. To me, that is very scary.

There is some additional text in this section suggesting that they didn't have sufficient resources to fight click fraud at the time. I can see how that might be true (AV didn't have (and didn't get) sufficient resources to fight click fraud either). Still, it is disconcerting, especially with regard to other practices within the company at the time.

One point that surprised me is that their methods of detecting click fraud are primarily anomaly- and rule-based. I would have thought they would use machine learning and other statistical analysis techniques, such as those used for identifying spam. (Although, in all fairness, such techniques may not yield useful results, since it is possible to generate fraudulent traffic that looks like nonfraudulent but nonconverting traffic. Furthermore, statistical anti-spam techniques are "personalized" in the sense that individuals provide their own samples of "ham" (good) and "spam" (bad) emails, which are used to "train" the learning algorithm to recognize spam. Not all advertisers and publishers use the same criteria in determining what is invalid, although it is possible to imagine that different accounts might employ their own training sets.)
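To make "rule-based" a bit more concrete, here is a minimal Python sketch of the kind of basic filter I have in mind. To be clear, this is my own illustration and not anything taken from the report: the `Click` record, the `flag_repeat_clicks` function, and the 60-second window are all assumptions made up for the example. The hypothetical rule is simply that a click is invalid if the same IP address clicked the same ad a short time earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """Hypothetical click record; field names are illustrative only."""
    ip: str
    ad_id: str
    timestamp: float  # seconds since some reference time


def flag_repeat_clicks(clicks, window_seconds=60.0):
    """Return the clicks flagged as invalid by one simple, assumed rule.

    Rule (an assumption for illustration): a click is invalid if the same
    IP clicked the same ad less than `window_seconds` before it.
    """
    last_seen = {}  # (ip, ad_id) -> timestamp of the most recent click
    flagged = []
    for click in sorted(clicks, key=lambda c: c.timestamp):
        key = (click.ip, click.ad_id)
        previous = last_seen.get(key)
        if previous is not None and click.timestamp - previous < window_seconds:
            flagged.append(click)
        # Record every click, valid or not, so rapid streams stay flagged.
        last_seen[key] = click.timestamp
    return flagged


clicks = [
    Click("192.0.2.1", "ad-1", 0.0),
    Click("192.0.2.1", "ad-1", 10.0),   # repeat within the window
    Click("192.0.2.1", "ad-1", 100.0),  # repeat, but outside the window
    Click("198.51.100.7", "ad-1", 5.0), # different IP
]
for click in flag_repeat_clicks(clicks):
    print(click)
```

A rule like this is trivial to evade (rotate addresses, or space the clicks out), which is part of why I expected to see statistical techniques layered on top. The `window_seconds` parameter is also the sort of thing that different accounts might want to set differently.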

The report reaches a (not too surprising) conclusion that CPC advertising has the fundamental flaw of not being able to identify the intent of a click. Since Google is testing CPA advertising, that would further strengthen Google's defense that it is making reasonable progress in fighting click fraud.

I think there are still a lot of unanswered questions, such as:

  • Was there ever a formal design review of AdWords or AdSense, prior to their release?
  • If so, did the topic of click fraud come up?
  • If it did, what arguments were presented? Who said what, and how was it received? How did these projects get released without consideration of click fraud? (Interestingly enough, one of the Xooglers worked on AdWords, but was laid off in September 2001.)
  • When Google was/is audited, do the auditors have access to the data that is used to determine what is/isn't fraudulent, for the purposes of charging?
  • To what extent is this data used to determine how well Google has performed financially in a given quarter?
  • Are the criteria used to determine how refunds are paid fairly applied? Ditto for the criteria for terminating (and for that matter, reinstating) publishers.

I may write more about this after rereading the report and thinking about it some more. However, I think this is just the first of many such reports. Everything that's happened until now was just a preamble. Let the games begin ...
Tags: click fraud
