Journal of Information Science

 

First published on May 8, 2008, doi:10.1177/0165551507087237

Journal of Information Science 2008;34:726.

A more recent version of this article appeared on October 1, 2008


Article

Web robot detection in the scholarly information environment

Paul Huntington*, David Nicholas, and Hamid R. Jamali

CIBER: School of Library, Archive and Information Studies, University College London

* To whom correspondence should be addressed.


Abstract

An increasing number of robots harvest information on the World Wide Web for a wide variety of purposes. Protocols developed at the inception of the web laid out voluntary procedures for identifying robot behaviour and excluding it where necessary. Few robots now follow these protocols, so it is increasingly difficult to filter robot activity out of reports of on-site activity. This paper demonstrates the issues involved in identifying robots and assessing their impact on usage, drawing on a project that sought to establish the relative usage patterns of open access and non-open access articles in Glycobiology, a journal published by Oxford University Press that offers articles in both forms within a single issue. A number of methods for identifying robots are compared; taken together, these methods found that 40% of the raw logs of this journal could be attributed to robots.

Key Words: electronic journals; robot detection; web crawlers; web log analysis
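
As an illustration of the kind of log filtering the abstract refers to, the following is a minimal sketch of two widely used heuristics: flagging any host that requests `/robots.txt`, and flagging user-agent strings that contain robot-like keywords. It is not the method used in the paper, which the abstract does not specify. The Combined Log Format layout, the keyword list `AGENT_HINTS`, the function names, and the sample log lines (including their paths and addresses) are all assumptions made for the example.

```python
import re

# Assumed input: Apache-style Combined Log Format lines.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical keyword list; real robots may not identify themselves at all.
AGENT_HINTS = ("bot", "crawler", "spider", "slurp")


def parse(lines):
    """Yield one dict per log line that matches the assumed format."""
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            yield match.groupdict()


def robot_hosts(records):
    """Return hosts that fetched /robots.txt or sent a robot-like user agent."""
    hosts = set()
    for record in records:
        agent = record["agent"].lower()
        if record["path"] == "/robots.txt" or any(h in agent for h in AGENT_HINTS):
            hosts.add(record["host"])
    return hosts


def robot_share(lines):
    """Fraction of parsed requests that come from flagged hosts."""
    records = list(parse(lines))
    if not records:
        return 0.0
    flagged = robot_hosts(records)
    hits = sum(1 for record in records if record["host"] in flagged)
    return hits / len(records)


# Invented sample data, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [08/May/2008:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleAgent/1.0"',
    '192.0.2.1 - - [08/May/2008:10:00:01 +0000] "GET /content/full/1 HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0"',
    '192.0.2.2 - - [08/May/2008:10:00:02 +0000] "GET /content/abstract/1 HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; Examplebot/2.1)"',
    '192.0.2.3 - - [08/May/2008:10:00:03 +0000] "GET /content/full/1 HTTP/1.1" 200 5120 "http://example.org/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    '192.0.2.3 - - [08/May/2008:10:00:04 +0000] "GET /content/reprint/1.pdf HTTP/1.1" 200 90000 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    print(f"Share of requests attributed to robots: {robot_share(SAMPLE):.0%}")
```

Both heuristics depend on robots declaring themselves, so a sketch like this would miss any robot that neither requests `/robots.txt` nor uses a recognisable user-agent string, which is the difficulty the abstract describes.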

