Co-citation Analysis Controversy
From previous posts, it's clear that I'm skeptical about many recent applications of citation analysis, bibliometrics, etc. No doubt this comes from my academic training. Author cocitation analysis (sometimes ACA) uses the statistical relationships of citations to articles to map the collaborativeness or interconnectedness of members of a community. For example, if d and e each cite papers by a, b, and c, it can be inferred that a, b, and c work on related research topics, because they keep getting cited together. When the numbers of authors and papers increase, or if the links are second- or third-hand (in other words, f cites d, who cites a, b, and c), more complicated statistical methods are required to show the strength of the connection and the reliability of the measurement. Some of these methods are used in the various clustering tools.
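To make the counting step concrete, here's a minimal sketch in Python. The data is a made-up toy example (the letters match the ones above), and the function name `cocitation_counts` is mine, not from any bibliometrics package. It just counts how many citing papers cite each pair of authors together, which is the raw material the fancier statistics start from.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing author's paper -> the authors it cites.
citations = {
    "d": {"a", "b", "c"},
    "e": {"a", "b", "c"},
    "f": {"a", "d"},
}

def cocitation_counts(citations):
    """For each pair of cited authors, count the citing papers that cite both."""
    counts = Counter()
    for cited in citations.values():
        # Sort so each pair is always stored in the same order, e.g. ("a", "b").
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(citations)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

With real data the dictionary would come from a citation index export rather than being typed in by hand, and the counts would normally be arranged into an author-by-author matrix before any clustering.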
In the new issue of JASIST (v55 n10, Aug 2004) there are more letters to the editor about author cocitation analysis and the use of Pearson's r as a measure of similarity (DOI: 10.1002/asi.20028, 10.1002/asi.20029). Pearson's r, aka the Pearson product-moment correlation coefficient, measures how well a linear equation describes the relationship between two variables. White and Griffith first came up with this use around 1980 and it's been argued about ever since. Specifically, the debate is over how much value there is in knowing that the relationship between the variables is linear and positive or negative, and in using that as a measure of similarity. The proponents say that r is easy to calculate and provides a good overview. The detractors say that there are a couple of examples where it breaks down. Both agree that qualitative information is required to provide real meaning (duh).
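For anyone who wants to see what r actually does with cocitation data, here's a rough sketch in plain Python. The three "profiles" are invented numbers (how often each hypothetical author is cocited with four other authors), and `pearson_r` is just the textbook formula, not code from any of the papers in the debate.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when a sequence is constant")
    return cov / (spread_x * spread_y)

# Hypothetical cocitation profiles (invented counts, for illustration only).
author_x = [1, 2, 3, 4]
author_y = [10, 20, 30, 40]   # same shape as x, ten times the counts
author_z = [4, 3, 2, 1]       # reversed shape

r_xy = pearson_r(author_x, author_y)
r_xz = pearson_r(author_x, author_z)
print(round(r_xy, 3), round(r_xz, 3))
```

Note that x and y come out as perfectly correlated even though y's counts are ten times larger: r only tells you whether the two profiles rise and fall together in a straight-line way, not whether the authors are cited a similar amount.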
Some hot applications:
- The new Scout Report for Math, Engineering, and Technology (v3 n14, July 2, 2004) points to the Erdős Number Project.
- For knowledge management (KM) in companies, an internal analysis of the interconnectedness of the employees can contribute to social network analysis, communities of practice, and leveraging of employees' tacit knowledge (see Cross and Parker). I haven't read this book yet, but it's supposed to explain how to do this in the appendix.
- There's lots of work related to the blogosphere and collaboration, interconnection, social network analysis, etc. (See here, some here, and of course here).