
the hazards of software engineering

Lately, in some of the blogs and forums I read about online advertising, click fraud, etc., I have seen claims that it is "relatively easy" to determine the physical location of the IP addresses belonging to a dialup modem bank, server farm, or other source of traffic that someone might use for some predictable purpose (such as click fraud). My standard responses to those claims include:


  1. There's no guarantee that the registered location of the allocated addresses corresponds to the actual location of the hardware.
  2. Even if it does, IP addresses may be reallocated when organizations change infrastructure, e.g., when moving to a new datacenter.
  3. Even if you can map an IP address to some physical location, that may not be the true origin of the traffic. It could have arrived via a proxy, ssh, a VPN, or several other remote access/usage technologies, as the sketch below illustrates.
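
To make point 3 concrete, here is a minimal Python sketch of the proxy problem. Everything in it is hypothetical (the helper name, the header handling, the addresses, which come from documentation ranges); it only shows that the peer address a server records may be one hop in a chain, and that the X-Forwarded-For header naming the "true" client is itself client-supplied and trivially spoofed.

    # Hypothetical helper: decide which address to treat as the "origin"
    # of one HTTP request.
    def apparent_origin(remote_addr, headers):
        forwarded = headers.get("X-Forwarded-For")
        if forwarded:
            # The left-most entry is the *claimed* original client -- but
            # any client or proxy can write whatever it likes here.
            return forwarded.split(",")[0].strip()
        # No proxy header: all we actually know is the peer we spoke to.
        return remote_addr

    print(apparent_origin("203.0.113.7",
                          {"X-Forwarded-For": "198.51.100.4, 203.0.113.7"}))
    # -> 198.51.100.4, the true origin only if the proxy chain is honest

Geolocating 203.0.113.7 in this case would place the proxy, not the person who generated the traffic.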


However, I have become somewhat uneasy about my stance, not because I think I am wrong, but because I think the criteria for judging the correctness of my response may be changing.

There was a time while I was at AV when I became very uneasy about the code I was writing. The code did what it was supposed to do, and it was developed on time. However, it wasn't always implemented in the "best" way. Due to time pressures, I was often forced to make tradeoffs that I wouldn't have made under other conditions (e.g., being part of an engineering group that weighed the pros and cons of various approaches). What worried me was that I didn't know what would happen if my management changed, and someone came in who had done things differently elsewhere (because hir organization was empowered and organized to do so). I didn't want to be put in the position where a future manager would want to get rid of me because I hadn't done what [s]he thought should be done. But I had to make decisions; I couldn't just sit around, because something needed to be done.

As an example, I remember a time when something in the gladiator logs hadn't been captured by my code for some time (because I wasn't told it needed to be done). The gladiator logs were no longer online, but the reduced logs were. I determined that some data in the reduced logs correlated highly with the information that was in the gladiator logs, so it wouldn't be necessary to reprocess the gladiator logs. I presented the idea to my management, thinking that I was being wise to do so because I was saving the company time and money. However, my management asked that the logs be reprocessed anyway, because they had to have the exact information. Reprocessing did take a lot of time, not just due to the extra cycles spent actually doing it but also the time spent getting the data off of backup tapes. And as I predicted, the results correlated highly with what was in the reduced logs. I wasn't happy about the entire incident for many reasons, including the fact that I (and others) were forced to do something laborious instead of applying some intelligence to the task.
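
Purely for illustration, a correlation check like the one I ran might have looked something like the following sketch. The counts are invented, and I'm only showing the shape of the argument, not the actual analysis.

    from math import sqrt

    def pearson(xs, ys):
        # Pearson correlation coefficient of two equal-length sequences.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    full_counts    = [120, 98, 143, 210, 187, 95]   # from the gladiator logs
    reduced_counts = [118, 101, 140, 205, 190, 97]  # from the reduced logs

    print(round(pearson(full_counts, reduced_counts), 3))
    # A value near 1.0 says the reduced logs track the full logs closely.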

So how does this relate to mapping IP addresses to physical locations? Under some circumstances, you might get reasonably good answers. You have to have traffic that's pretty clean (i.e., little or no click fraud). Also, the traffic needs to be fairly recent. If you apply old traffic to a current geolocation database, you may not get the right locations because of the types of changes I mentioned above. (That is one reason I was opposed to using geolocation data at all: in the haste of reprocessing traffic, someone might forget to apply the geolocation database pertinent to the time the traffic was captured, if a database for that time even existed.)
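
The safeguard I wanted is easy to state in code. Here is a minimal sketch, with an invented snapshot table and invented locations, of a lookup keyed to the database that was current when the traffic was captured, rather than to whatever database is current today.

    from datetime import date

    # Hypothetical snapshots: effective date -> {address prefix: location}.
    GEO_SNAPSHOTS = {
        date(2006, 1, 1): {"192.0.2.": "Palo Alto, CA"},
        date(2007, 1, 1): {"192.0.2.": "Ashburn, VA"},  # block was moved
    }

    def locate(ip, captured_on):
        # Use the newest snapshot no later than the capture date.
        eligible = [d for d in GEO_SNAPSHOTS if d <= captured_on]
        if not eligible:
            return None  # no database existed for that period; don't guess
        snapshot = GEO_SNAPSHOTS[max(eligible)]
        for prefix, location in snapshot.items():
            if ip.startswith(prefix):
                return location
        return None

    print(locate("192.0.2.55", date(2006, 6, 15)))  # -> Palo Alto, CA
    print(locate("192.0.2.55", date(2007, 6, 15)))  # -> Ashburn, VA

Reprocessing old traffic against only the current snapshot would report Ashburn for both queries, which is exactly the mistake described above.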

A danger of software engineering is that the same criteria are not always applied to the work that's done. But what can the software engineer do, except present the solutions [s]he has learned (through best current practices) and then do whatever management says needs to be done? Furthermore, suppose management is able to change the parameters of the project so that they accept more risk in how the software is written in exchange for paying the software engineers less. They may conclude that it is cheaper in the long run to deal with potentially incorrect (but plausible) data if all they need is someone who knows how to query the data and generate reports, rather than to pay more for someone who can explain how the data may be incorrect and present viable alternatives.
