Journal of Information Science

An approach to similarity measurement of absence-presence data: the case that common zeros matter

Leo Egghe

LUC, Universitaire Campus, B-3590 Diepenbeek, Belgium, and UA, IBW, Universiteitsplein 1, B-2610 Wilrijk, Belgium

Ronald Rousseau

KHBO, IWT, Zeedijk 101, B-8400 Oostende, Belgium, and UA, IBW, Universiteitsplein 1, B-2610 Wilrijk, Belgium; ronald.rousseau{at}khbo.be

Similarity between objects (documents, persons, answers to a questionnaire, etc.) is generally determined through relations between representations of these objects. In the case of binary representations, the presence of a property (e.g. an index term) carries a weight of one and its absence a weight of zero. In many similarity studies common zeros are ignored; this is called the zero insensitive case. In this article, however, we study the zero sensitive case. Clearly, answers to binary questionnaires (yes-no, encoded as 1-0) are zero sensitive, as people who answer ‘no’ to the same questions are more similar than those who give different answers. We present a wish list for such a zero sensitive approach to similarity. Distinguishing between common zeros and common ones leads to an ‘identity-similarity’ theory, and hence moves beyond a pure similarity theory. Two approaches are presented to the problem of similarity measurement of presence-absence data in which common zeros matter and have the same effect as common ones. For the case in which common ones and common zeros have different effects, a totally new approach is proposed. In each case a coding approach is used, leading to new representations, which in turn lead to a similarity ranking. Examples of functions respecting these rankings are given.

When discussing similarity in general terms, authors should clearly state which requirements they impose on the notion of ‘similarity’. Only then can the question of the best measure for a given study be discussed in a meaningful way.
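
The contrast between the zero insensitive and the zero sensitive case can be illustrated with two textbook measures. The sketch below is not the coding approach proposed in the article, whose details are not given in this abstract. It only assumes the Jaccard index as an example of a measure that ignores common zeros, and the simple matching coefficient as an example of one that counts common zeros exactly like common ones. The function names and the sample vectors are hypothetical.

```python
from typing import Sequence


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Zero insensitive: positions where both entries are 0 are ignored."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Convention chosen here (an assumption): two all-zero arrays score 0.0.
    return both / either if either else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Zero sensitive: common zeros count the same as common ones."""
    agree = sum(1 for a, b in zip(x, y) if a == b)
    return agree / len(x)


# Two respondents who agree on one 'yes' and disagree on one question.
short_x, short_y = [1, 1], [1, 0]

# The same pair after three further questions that both answer 'no'.
long_x, long_y = [1, 1, 0, 0, 0], [1, 0, 0, 0, 0]

print(jaccard(short_x, short_y), jaccard(long_x, long_y))
print(simple_matching(short_x, short_y), simple_matching(long_x, long_y))
```

Under these assumptions the Jaccard value stays at 0.5 when the common zeros are added, while the simple matching value rises from 0.5 to 0.8. This is the behaviour the abstract describes for binary questionnaires, where shared ‘no’ answers make respondents more similar.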

Key Words: zero-sensitive similarity • absence-presence data • ranking of identical arrays • radix 4 encoding

Journal of Information Science, Vol. 30, No. 6, 509-519 (2004)
DOI: 10.1177/0165551504047827

