

passion, take 3

While reading incredibillfeed, I noticed that he was one of the panelists at the Search Engine Strategies forum on bot obedience. It turns out he is the creator of CrawlWall, a firewall for crawlers (aka bots, the programs that pull web content so it can be indexed by search engines). He and I have debated the merits of certain blocking policies on WebmasterWorld. For example, he favors blocking server farms, claiming they are home to malicious bots that scrape site content (republishing it on other sites that claim it as their own), not to mention click fraudsters. My response was that while this is certainly possible, it's best to ban access to one's site based on observed bad behavior rather than presumed identity. After all, the entity accessing the site may be perfectly legitimate, just using a user agent no one has heard of before (because it's new). Also, IP addresses change hands, so what was a server farm last week could be a subscriber pool this week.
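To make the distinction concrete, here is a minimal sketch in Python of a behavior-based policy. A client is refused only after it makes more than a set number of requests within a sliding time window, and the ban expires so that an IP address that later changes hands is not punished forever. The class name, the thresholds, and the choice of request rate as the "bad behavior" are illustrative assumptions, not how CrawlWall or any real product works.

```python
from collections import defaultdict, deque


class BehaviorBlocker:
    """Hypothetical behavior-based blocker: it never looks at the user agent
    or who owns the IP, only at how fast the client is requesting pages."""

    def __init__(self, max_requests: int, window_seconds: float,
                 block_seconds: float) -> None:
        self.max_requests = max_requests      # assumed threshold
        self.window = window_seconds          # sliding window length
        self.block_seconds = block_seconds    # bans expire; IPs change hands
        self.history = defaultdict(deque)     # ip -> recent request times
        self.blocked_until = {}               # ip -> time the ban lifts

    def allow(self, client_ip: str, now: float) -> bool:
        until = self.blocked_until.get(client_ip)
        if until is not None:
            if now < until:
                return False                  # still serving a ban
            # Ban expired: forget it and start counting from scratch.
            del self.blocked_until[client_ip]
            self.history[client_ip].clear()

        times = self.history[client_ip]
        # Discard requests that have fallen out of the sliding window.
        while times and now - times[0] >= self.window:
            times.popleft()
        times.append(now)

        # Block only on observed misbehavior: too many requests too fast.
        if len(times) > self.max_requests:
            self.blocked_until[client_ip] = now + self.block_seconds
            return False
        return True


if __name__ == "__main__":
    b = BehaviorBlocker(max_requests=5, window_seconds=1.0, block_seconds=60.0)
    # A fast scraper: six requests in half a second; the sixth is refused.
    print([b.allow("203.0.113.7", i * 0.1) for i in range(6)])
    # A polite crawler with an unknown user agent is never refused.
    print(all(b.allow("198.51.100.2", float(t)) for t in range(10)))
```

A polite crawler with a user agent nobody has seen before passes untouched, because nothing in the check depends on who the client claims to be or which network it comes from.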

Anyway, based on what he writes, I can tell he is very passionate about what he does. However, I couldn't see myself trying to build a company around something like that, because the problem he is trying to solve is not intrinsically interesting to me. This is not to say it isn't important or doesn't have value, but it's not something I'd care to spend most of my time on. And when one is trying to get a business off the ground, especially one that entails a good deal of software development, one must be willing to spend most of one's time on it. There is just too much competition, especially from companies with very deep pockets. If they sense there is a market to be served, they'll easily jump in and take your market share. (You may get lucky and they'll buy you, but you still have to put in enough effort to create a business that someone would be willing to buy.) Also, in this type of business, a lot could go wrong despite the best of intentions: there are many potential pitfalls due to unexpected corner cases, unreliable ISP service, etc.

In case you're wondering why I wouldn't want to do something like create a crawler firewall, yet would spend a fair amount of time repeatedly playing the same measures of a piano piece or practicing a dance pattern, it's because even though those activities may seem boring, I derive value from the end results. However, as I wrote some time ago, I'm not passionate about either of those, at least not to the extent that I would spend most of my time on one of them. I try to spend equal time on several activities or, if that's not possible, enough time to meet the goals I've set for each. With regard to dancing, though, I don't dance much: perhaps once a week outside of my lesson, and I'm starting to think that after the showcase I won't sign up for any more lessons for a few months. It seems as if dancing is a luxury I can't really afford right now, financially or timewise. When I'm more actively involved in dancing, it feeds on itself and I want to be even more involved, but it looks like it will go on the back burner for the immediate future.

Comments

(Anonymous)
Aug. 13th, 2006 08:02 am (UTC)
IncrediBILL here...
Greg,

It's not terribly exciting except the challenge is I was told it can't be done and I'm doing it. Not to mention I really had no choice, as certain high-speed scrapers were knocking a couple of my sites offline for 5-60 minutes at a time, so it's something of a self-preservation activity as well.

Apache didn't provide the tools I needed, nor did anyone else, so I wrote one. After I did, people showed interest in it, so I'm moving toward making it public, possibly free; that's unknown at this time.

We never got into the technology in our previous discussions but the trick is I'm not actually dropping all of them in an actual firewall, otherwise I wouldn't be able to see what requests are reaching the server. It's a script that analyzes incoming traffic and offers various CAPTCHA-type challenges, so humans can still access a site while bad bots from server farms are blocked, and it can also serve up actual server errors telling them to go away.

I get updates all the time on new activity, and any new worthwhile bot won't be blocked for long, since it pops up on my personal control panel for analysis within minutes of access.

The REAL problem with server farms, besides all the bad bots, which another panelist also talked about, is that they host a ton of proxy servers that cloak lists of URLs to the search engines and can actually hijack your website.

I get all sorts of notification, even when the bot user agents change for a particular location, so I'm on top of it.

Not to mention the software lets you "preview" what would happen at first, then set protection to whatever level the webmaster is comfortable with.

The power of automation ;)

I kind of envy some of your other pursuits, as I used to play in a symphony and a '50s big band in the early '80s and abandoned it for all the long hours of a computer career. I used to play 4 instruments and now I can't play any, sigh.

Also abandoned my other artistic endeavors, such as drawing, pastels, etc. and the only thing I have left all these years later is photography and the only reason I have time for that is it's push button art.

So after reading how you torture yourself over taking more lessons and such, I recommend you do as I miss it a lot, and it's much harder to get back to your skill level after you drop it than it is to continue even at a limited level.

Good luck with it :)

gregbo
Aug. 14th, 2006 04:34 am (UTC)
Re: IncrediBILL here...
It's not terribly exciting except the challenge is I was told it can't be done and I'm doing it.

Hmmm ... I wonder why they said it couldn't be done?

We never got into the technology in our previous discussions but the trick is I'm not actually dropping all of them in an actual firewall, otherwise I wouldn't be able to see what requests are reaching the server.

Sorry about that. I was trying to describe it in general terms.

So after reading how you torture yourself over taking more lessons and such, I recommend you do as I miss it a lot, and it's much harder to get back to your skill level after you drop it than it is to continue even at a limited level.

Hmmm ... does it seem as if I'm torturing myself? I don't mean to come off that way. I'm just thinking "out loud." I've pretty much reached my decision. Once I've made up my mind, I don't feel much conflict, although I do feel some regret. In general I'm trying to be realistic.



(Anonymous)
Aug. 14th, 2006 05:10 am (UTC)
Re: IncrediBILL here...
Greg,

They said I couldn't stop bots because of stealth techniques. If you're interested in my presentation, sticky me on WMW and I'll send you a link to the PPT file. The problem is that the people who really don't want to be stopped go to great lengths to avoid being stopped. Someone I know who writes these crawlers said point-blank that "a lot of inexpensive webhosting accounts" were being employed to snare a page or two here or there, stealthily, which meant to me that blocking hosts was the only way to attack the problem aggressively.

OK, maybe you aren't torturing yourself, but I put myself in your place. When I was debating those issues, it was torture for me to give up one passion for another, which is why I feel incomplete: I miss the music.

Oh well, that's the way the bass clarinet reed crumbles...


Powered by LiveJournal.com
Designed by Tiffany Chow