Our Assistant Director of Research, Ruth Hattam, recently attended a workshop jointly run and sponsored by ARMA and Thomson Reuters on Bibliometrics for Beginners. Bibliometrics – a key part of which is use of citation data – is growing in importance in Higher Education, particularly as research funding becomes more competitive and institutions need ways to analyse strengths and target key areas for support.
Bibliometrics can to some extent allow strategic oversight of research activity and performance, although it does have several drawbacks and limitations, some of which were covered in the workshop.
Read on to find out more and for an overview of the day.
Bibliometrics: What is it and how is it used?
Bibliometrics literally means ‘measure of books’.
Use of citation indexes is clearly key to bibliometrics. A number were mentioned, including Web of Science (Thomson Reuters) and Scopus (Elsevier), both of which are databases, and Google Scholar, which also includes internet sources. There are also regional/subject-specific databases. As different editorial policies apply, only one database should be used per comparison – mixing and matching isn’t encouraged as it wouldn’t give a balanced picture.
Possible uses of bibliometric data include:
- individual review and recruitment;
- University rankings;
- REF 2014, which will use citations for the first time (not in all Panels, but 3, 4 and 11 will);
- and grant applications.
In terms of metrics, the h-index was developed to measure both quality and quantity. A researcher's h-index is the largest number h such that h of their papers have each been cited at least h times (for example, an academic with 378 papers, 48 of which had each been cited at least 48 times, would have an h-index of 48). Self-citation is one of the possible pitfalls when using h-indices – but self-citations can be removed from the analysis.
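For readers who find a worked calculation easier to follow, the h-index described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the citation counts are made up for the example, and real databases apply their own rules about which citations to count.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def h_index_excluding_self(papers):
    """Same calculation after removing self-citations.

    papers: list of (total_citations, self_citations) pairs (hypothetical layout).
    """
    return h_index([total - self_cites for total, self_cites in papers])


# Hypothetical citation counts for one researcher's six papers.
example_counts = [10, 8, 5, 4, 3, 0]
example_h = h_index(example_counts)

# The same papers, with an invented number of self-citations for each.
example_pairs = [(10, 2), (8, 5), (5, 1), (4, 0), (3, 0), (0, 0)]
example_h_no_self = h_index_excluding_self(example_pairs)
```

Comparing the two results for the same researcher gives a rough sense of how much self-citation is contributing to the headline figure.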
While citation-based analysis like this can be useful, one needs to bear in mind that it doesn’t always take into account the nature of the citation itself. For example, a paper can be cited because the author: wants to build on prior knowledge; agrees or disagrees with the analysis; wants to help or hinder other researchers; wants to disprove the conclusion; or wants to improve their own impact factor. Outliers can also skew the results significantly. Bear in mind that people can look very good on paper even though they are no longer researching, for example Aristotle!
Data: From indexing to indicators
It’s important to understand what various bibliometric databases do and don’t include. First, they don’t contain all journals – 80% of papers are published in 40% of journals, so databases don’t try to capture 100% of journals. Google Scholar is much more inclusive because it catches more publications, but the flip side is that these are not necessarily of as high a quality.
Second, it is worth considering how data is collected. Thomson Reuters (TR) uses:
- publishing standards, e.g. peer review and editorial conventions;
- editorial content, which TR's subject specialists assess;
- diversity, regional influence of authors;
- citation analysis – for new journals there is an analysis of the editors’ and authors’ prior work.
Third, what kind of outputs are indexed? The majority of citations from books are within arts and humanities and social sciences. Science subjects nearly always cite journals, which reflects the speed at which those fields move. Where books are concerned, TR insist on original research and exclude textbooks.
Fourth, how are the data organised? TR has 249 subject areas, and has incorporated REF categories.
While there are clear advantages to using citation analysis, there are also a number of limitations:
- productivity is a measure of volume, not quality, although you could argue that a paper has been quality tested (i.e. peer reviewed) to get into the journal in the first place;
- number of self-citations – the h-index would not distinguish these;
- it is the paper, not the person, that is cited;
- stage of career is not factored in: established researchers have higher productivity, citation counts and h-indices, so you have to normalise for publication year. One approach is to divide the citation count by the number of active years of research. TR compares each paper only with other papers of the same year, looking at the average number of citations that papers received in that year;
- subject differences need to be factored in. Comparisons need to be made by subject, not within a university. TR normalise by subject category and academic year. You also need to distinguish between types of output, e.g. original research versus reviews, so document type has to be considered;
- value of citation not assessed – will include negative citations;
- relative contribution of each author on a paper not known;
- number of authors on a paper not known. You can normalise by calculating the average number of authors per paper;
- does not automatically account for differences between subject fields: there are lots of initial citations in the sciences, then the field moves on, whereas mathematics is a low-cited field and the number of citations is more constant.
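The year and subject normalisation mentioned in the list above can be sketched roughly as follows. This is a minimal illustration in Python and not Thomson Reuters' actual method: the function names, the record layout and all the figures are hypothetical. Each paper is compared with the average citation count of baseline papers from the same subject and year, so a ratio above 1 means more cited than expected for that group.

```python
from collections import defaultdict


def citations_per_active_year(total_citations, first_year, last_year):
    """Crude career-stage adjustment: citations divided by active research years."""
    active_years = last_year - first_year + 1
    return total_citations / active_years


def normalised_impact(papers, baseline):
    """Mean ratio of each paper's citations to the average for its subject and year.

    papers, baseline: lists of dicts with 'subject', 'year' and 'citations' keys.
    Returns None if no paper has a usable baseline group.
    """
    totals = defaultdict(lambda: [0, 0])  # (subject, year) -> [citation sum, paper count]
    for p in baseline:
        key = (p["subject"], p["year"])
        totals[key][0] += p["citations"]
        totals[key][1] += 1

    ratios = []
    for p in papers:
        total, count = totals.get((p["subject"], p["year"]), (0, 0))
        if total == 0:
            continue  # no baseline citations for this group, so skip the paper
        ratios.append(p["citations"] / (total / count))
    return sum(ratios) / len(ratios) if ratios else None


# Invented baseline: maths papers are cited far less than biology papers.
baseline = [
    {"subject": "maths", "year": 2010, "citations": 2},
    {"subject": "maths", "year": 2010, "citations": 4},
    {"subject": "biology", "year": 2010, "citations": 20},
    {"subject": "biology", "year": 2010, "citations": 40},
]
# Invented papers for one department.
papers = [
    {"subject": "maths", "year": 2010, "citations": 6},
    {"subject": "biology", "year": 2010, "citations": 15},
]
department_impact = normalised_impact(papers, baseline)
per_year = citations_per_active_year(120, 2001, 2010)
```

In this made-up data the maths paper has fewer raw citations than the biology paper but scores higher once each is compared with its own field, which is the point of normalising by subject.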
Two other bibliometric analyses are worth considering here, Journal Impact Factor (JIF) and Eigenfactor metrics.
JIF looks at the impact of a journal in a particular research community over the previous two years, based on the number of citations. This is then normalised for the size of the journal: the impact factor is the number of citations received in a given year by articles the journal published in the previous two years, divided by the number of articles it published in those two years. This is not as good an indicator for slow-moving fields because it only goes two years back. It is good at capturing high-level activity in fast-moving subjects, e.g. natural sciences and engineering, and can inform where to publish in those subjects. A 5-year impact factor has also been developed to take account of subject differences.
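As a rough illustration of the impact factor arithmetic described above, here is a short Python sketch. The function name, the data layout and the numbers are assumptions made for the example; the official calculation has its own rules about which items count as citable.

```python
def impact_factor(citations_by_pub_year, items_by_pub_year, jif_year, window=2):
    """Citations received in jif_year to items from the previous `window` years,
    divided by the number of items published in those years.

    citations_by_pub_year: citations received in jif_year, keyed by the year
                           the cited item was published (hypothetical layout).
    items_by_pub_year:     number of citable items the journal published each year.
    """
    years = range(jif_year - window, jif_year)
    cites = sum(citations_by_pub_year.get(y, 0) for y in years)
    items = sum(items_by_pub_year.get(y, 0) for y in years)
    if items == 0:
        raise ValueError("journal published no citable items in the window")
    return cites / items


# Invented figures for one journal, as counted during 2012.
citations_in_2012 = {2007: 30, 2008: 30, 2009: 60, 2010: 150, 2011: 90}
items_published = {2007: 50, 2008: 50, 2009: 50, 2010: 60, 2011: 60}

two_year = impact_factor(citations_in_2012, items_published, 2012)
five_year = impact_factor(citations_in_2012, items_published, 2012, window=5)
```

Widening the window to five years brings in older papers, which is why the longer version is kinder to fields where citations build up slowly.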
Eigenfactor metrics were developed at the University of Washington by Jevin West and Carl Bergstrom. From Wikipedia: Eigenfactor is a rating of the total importance of a scientific journal. Journals are rated according to the number of incoming citations, with citations from highly ranked journals weighted to make a larger contribution to the eigenfactor than those from poorly ranked journals. Eigenfactor scores and article influence scores are freely viewable on eigenfactor.org.
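The idea that a citation counts for more when it comes from a highly ranked journal can be illustrated with a simplified, PageRank-style calculation. The Python sketch below is not the published Eigenfactor algorithm, which includes further details not reproduced here; the function name, the damping value, the decision to ignore journal self-citations and the journal names are all assumptions made for the example.

```python
def weighted_journal_scores(citations, damping=0.85, iterations=100):
    """Score journals so that citations from high-scoring journals weigh more.

    citations[a][b] = number of citations from journal a to journal b.
    Returns a dict of scores that sum to 1.
    """
    journals = sorted(set(citations) | {b for row in citations.values() for b in row})
    n = len(journals)
    scores = {j: 1.0 / n for j in journals}

    for _ in range(iterations):
        # Every journal gets a small base share regardless of citations.
        new = {j: (1.0 - damping) / n for j in journals}
        for a in journals:
            # Assumption of this sketch: journal self-citations are ignored.
            out = {b: c for b, c in citations.get(a, {}).items() if b != a and c > 0}
            total = sum(out.values())
            if total == 0:
                # A journal that cites nothing spreads its weight evenly.
                for b in journals:
                    new[b] += damping * scores[a] / n
            else:
                # Otherwise its weight is shared in proportion to its citations.
                for b, c in out.items():
                    new[b] += damping * scores[a] * c / total
        scores = new
    return scores


# Invented journals: X and Y each receive 10 citations, but X's come from
# "Hub", which is itself heavily cited, while Y's come from "Minor".
citations = {
    "Hub": {"X": 10},
    "Minor": {"Y": 10},
    "X": {"Hub": 10},
    "Y": {"Hub": 10},
}
scores = weighted_journal_scores(citations)
```

With this invented data X ends up with a higher score than Y even though both receive the same raw number of citations, because X is cited by the better-connected journal.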
Conclusion: International issues and the future of bibliometrics
Citation data can be used to examine the extent of international collaborations of researchers or institutions. Data showed that working with international collaborators increases the number of citations.
It can also be used by a university to look at citation data relative to the number of published papers. Universities are starting to look at citations alongside other factors, e.g. the amount of industry income brought in per researcher and the number of doctoral degrees awarded per member of academic staff, based on data from HESA. Thomson Reuters also combine citation analysis with other data sources to perform more fine-grained analyses.
The point was made that China might be bringing down the average, owing to a massive increase in the number of papers combined with a relatively low citation rate.
Ultimately, before doing any sort of analysis or evaluation, you need to clearly define your objectives – what do you want to know and what will the data inform?
For any Northumbria staff interested in finding out more about the way bibliometrics can be used, it’s worth noting that the Library provides online instructional support in using such tools in the Measuring your Research Performance section of Skills Plus: http://nuweb2.northumbria.ac.uk/library/skillsplus/topics.html?l3-13