Web robot detection in the scholarly information environment
School of Library, Archive and Information Studies, University College London
School of Library, Archive and Information Studies, University College London
School of Library, Archive and Information Studies, University College London, h.jamali{at}gmail.com

An increasing number of robots harvest information on the World Wide Web for a wide variety of purposes. Protocols developed at the inception of the web laid out voluntary procedures for identifying robot behaviour and excluding it if necessary. Few robots now follow these protocols, and it is increasingly difficult to filter robot activity out of reports of on-site activity. This paper demonstrates the issues involved in identifying robots and assessing their impact on usage, drawing on a project that sought to establish the relative usage patterns of open access and non-open access articles in the Oxford University Press journal Glycobiology, which offers articles in both forms within a single issue. A number of methods for identifying robots are compared; together, these methods found that 40% of the raw logs of this journal could be attributed to robots.
Key Words: electronic journals; robot detection; web crawlers; web log analysis
This version was published on October 1, 2008.
Journal of Information Science, Vol. 34, No. 5, 726-741 (2008)