Stop Site Scraping
Tuesday, March 6th, 2007
A few days ago my partner John noticed that our website’s content was being scraped. It wasn’t all that concerning at the time, but last night he did some keyword searches on Google to check our rankings and noticed that the site with our stolen content actually ranked higher than ours!
This was obviously a problem, so we immediately needed to figure out what steps to take. John sent a DMCA notice to Google while I put together a cease and desist letter to send off to the site owner, their domain registrar, and their host. During all of this, we were determining the IP address of the site. A whois lookup on their domain name turned up 3 different IPs, and when we pinged their domain we found a fourth.
In order to block their IP range we added the following to the .htaccess file:
Order Deny,Allow
Deny from 123.123.123.0/24
This will block access for any user with an address in the 123.123.123.0 to 123.123.123.255 range (123.123.123 is a placeholder here; substitute the range you actually want to block).
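Order and Deny are the older (Apache 2.2 style) access directives. On Apache 2.4 and later, the same block would most likely be written with Require from mod_authz_core. A minimal sketch, using the same placeholder range and assuming your host allows these directives in .htaccess:

```apache
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except the placeholder /24 range (123.123.123.0 to 123.123.123.255)
    Require not ip 123.123.123.0/24
</RequireAll>
```

Both forms have to sit inside RequireAll together, because a negated Require on its own cannot grant access.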
John then thought of a way to use this to our advantage. What if we detected any traffic from their IP range and, instead of blocking it, redirected it to our homepage so they become OUR visitors? We created a rewrite rule like:
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^123\.123\.123\.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule .? /index.php [R=301,L]
This should redirect any request from their address range to our homepage (assuming the site lives at the document root). The second condition keeps the redirect from looping on index.php itself. Now to just see if it works!
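The rule above keys on an address prefix. If the goal is instead to catch visitors who click through from the scraper's pages, the Referer header would need to be matched against a host name, since it carries a full URL rather than an IP. A rough sketch, where scraper-example.com is a made-up placeholder (not the real site) and the site is assumed to live at the document root:

```apache
RewriteEngine On
# Referer is a full URL, so match on the (placeholder) host name, with or without a subdomain
RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?scraper-example\.com([/:]|$) [NC]
# Skip the homepage itself so the redirect cannot loop
RewriteCond %{REQUEST_URI} !^/index\.php$
# 302 while testing; switch to 301 once it behaves as expected
RewriteRule .? /index.php [R=302,L]
```

Keep in mind that the Referer header is supplied by the browser and can be missing or faked, so this is a convenience rather than a guarantee.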
You may also want to stop people from linking to your images, JavaScript, SWF, and CSS files. This is known as HotLinking, and it costs you bandwidth when they do it. If you would like to prevent HotLinking, add the following to your .htaccess file:
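# START Prevent HotLinking
RewriteEngine On
# Let requests with no Referer through (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by our own pages through
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?search-this\.com/.*$ [NC]
# Anything else asking for these file types gets 403 Forbidden
RewriteRule \.(gif|jpg|js|css|swf)$ - [F,NC]
# END Prevent HotLinking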
This will prevent HotLinking to your gif, jpg, js, css and swf files. Just remember that mod_rewrite should be enabled for this to work.
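If you are not sure mod_rewrite is loaded, one option is to wrap the rules in an IfModule guard, since Rewrite directives in .htaccess cause a 500 error when the module is missing. A sketch of that approach, reusing the same file types and the search-this.com referer check:

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} !^$
    RewriteCond %{HTTP_REFERER} !^https?://(www\.)?search-this\.com/ [NC]
    RewriteRule \.(gif|jpg|js|css|swf)$ - [F,NC]
</IfModule>
```

The trade-off is that the guard fails silently: without mod_rewrite the site stays up, but the files are not protected, so it is worth confirming the module is actually enabled.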
You may also decide you want to replace a HotLinked image with your own image. To do this add the following to your .htaccess file:
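RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?search-this\.com/.*$ [NC]
# Do not redirect the replacement image itself, or the redirect loops
RewriteCond %{REQUEST_URI} !^/images/hotlinked\.jpg$
RewriteRule \.(gif|jpg)$ http://www.search-this.com/images/hotlinked.jpg [R,L]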
Now when they link to one of your images it will display the alternate image that you provided.
Hope this helps someone out there…
And finally, if you need to find any merchant account related information… don’t visit the imposter’s, visit the Ultimate Merchant Account Resource.
