Bibliometric Big Data and its Uses
Dr. Gali Halevi, Elsevier, NY
In memoriam https://www.youtube.com/watch?v=srbqtqtmncw
The Multidimensional Research Assessment Matrix
Read the matrix column-wise: each column lists options for one aspect of an assessment.
Unit of assessment: Individual; Research group; Department; Institution; Research field
Purpose: Allocate resources; Improve performance; Increase multidisciplinary research; Increase regional engagement; Promotion, hiring
Output dimensions: Research productivity; Quality, scholarly impact; Innovation and social benefit; Sustainability & scale; Research infrastructure
Bibliometric indicators: Publications; Journal citation impact; Actual citation impact; International co-authorship; Citation prestige
Other indicators: Peer review; Patents, licences, spin-offs; Invitations for conferences; External research income; PhD completion rates
Indicators that are appropriate in one context may be useless or invalid in another. The choice of indicators depends on:
1. Which unit is to be assessed?
2. Which aspect is being assessed?
3. Why is the assessment done?
4. Meta-assumptions on the state of the system under assessment
CASE 1 [My view: Inappropriate use]
Meta-level (policy issue): Recruitment of new researchers at research universities
Policy measure: Select the best researchers
Bibliometric operationalization: Rank researchers by the average impact factor of the journals in which they have published, and select the top-ranked researcher
CASE 2 [My view: Appropriate use]
Meta-level (policy issue): The research community is not sufficiently oriented toward international networks
Policy measure: Stimulate publication in good international journals
Bibliometric operationalization: Count and reward articles published in journals in the first impact quartile of their subject field
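Both bibliometric operationalizations reduce to simple computations. The sketch below contrasts them on hypothetical data; the researchers, impact factors and quartile labels are invented, and each journal's quartile within its subject field is assumed to be precomputed.

```python
from statistics import mean

# Hypothetical publication records: (researcher, journal impact factor, journal quartile in its subject field)
publications = [
    ("A", 2.1, 1), ("A", 0.8, 3),
    ("B", 4.5, 1),
    ("C", 1.9, 1), ("C", 2.0, 1), ("C", 1.2, 2),
]

def rank_by_average_impact_factor(pubs):
    """Case 1: rank researchers by the mean impact factor of the journals they published in."""
    by_author = {}
    for author, impact_factor, _ in pubs:
        by_author.setdefault(author, []).append(impact_factor)
    averages = {author: mean(ifs) for author, ifs in by_author.items()}
    return sorted(averages, key=averages.get, reverse=True)

def count_first_quartile_articles(pubs):
    """Case 2: count each researcher's articles in first-quartile journals of the subject field."""
    counts = {}
    for author, _, quartile in pubs:
        counts[author] = counts.get(author, 0) + (1 if quartile == 1 else 0)
    return counts

print(rank_by_average_impact_factor(publications))   # ['B', 'C', 'A']
print(count_first_quartile_articles(publications))   # {'A': 1, 'B': 1, 'C': 2}
```

In this toy data, Case 1 selects B on the strength of a single paper, which illustrates how a journal-level average can rank researchers without saying anything about the actual impact of their own work.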
Big Data in Bibliometrics
In the past decade, bibliometric data has expanded to include a variety of large-scale data, such as:
1. Citations
2. References
3. Keywords (descriptors)
4. Usage data
5. Full-text analytics
The availability of these data and of new technological capabilities has brought forth a strong proliferation of bibliometric databases and data-analytical tools for:
1. Developing sophisticated, customized scientific evaluation indicators
2. Measuring the behavior of researchers, journal editors and publishers
3. Developing indicators of the societal impact of research, such as its technological value or its contribution to the enlightenment of the general public
4. Creating and analyzing large datasets by combining multiple datasets
Compound Big Datasets and their objects of study
Examples of Big Data Analysis in Bibliometrics and its uses
Relationships between Downloads & Citations
Answers questions such as:
1. How impactful is my research?
2. Are my resources optimized?
[Figure: Article cycle. Downloads and citations over time, with citations shown as the red curve (low numbers). Marked dates: corrected proof online 04-03-2008; corrected paginated proof online 22-08-2008. Panel title: The effect of citations upon downloads]
Large differences in ScienceDirect (SD) download patterns among journals
Analysis by journal and document type
1. A journal's number of downloads is about 100 times its number of citations (in a 5-year window).
2. The advantage of reviews over normal articles is much larger for downloads than for citations.
3. The findings suggest that reviews are better handled separately in performance measurement.
4. Journals differ greatly in both the absolute level and the temporal pattern of their download counts.
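Points 1 and 2 rest on download-to-citation ratios computed per group once usage and citation counts are joined per article. A minimal sketch, assuming per-article counts over a 5-year window are already available; the journals and numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-article counts over a 5-year window: (journal, document type, downloads, citations)
articles = [
    ("Journal X", "article", 1200, 10),
    ("Journal X", "article", 800, 10),
    ("Journal X", "review", 3000, 20),
    ("Journal Y", "article", 500, 5),
]

def download_citation_ratio(records, key):
    """Sum downloads and citations per group and return downloads / citations for each group."""
    downloads = defaultdict(int)
    citations = defaultdict(int)
    for journal, doc_type, n_downloads, n_citations in records:
        group = key(journal, doc_type)
        downloads[group] += n_downloads
        citations[group] += n_citations
    # Groups with zero citations are skipped to avoid division by zero
    return {g: downloads[g] / citations[g] for g in downloads if citations[g] > 0}

by_journal = download_citation_ratio(articles, key=lambda journal, doc_type: journal)
by_doc_type = download_citation_ratio(articles, key=lambda journal, doc_type: doc_type)
print(by_journal)   # {'Journal X': 125.0, 'Journal Y': 100.0}
print(by_doc_type)  # {'article': 100.0, 'review': 150.0}
```

Grouping by document type is what exposes the review advantage in point 2; handling reviews separately (point 3) amounts to filtering on that key before computing any indicator.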
Patents and scientific articles: Library Science Example
Answers questions such as:
1. How does my research impact the economy?
2. Can I direct my research better?
3. Can I foster corporate/academic relations?
Co-Citation Analysis in Patents
We found that the terms began to appear in patents from 1999 onward.
Co-Citation Analysis in Patents
The first patent to use co-citation analysis as a method is a Xerox patent that applies it to generate clusters of documents in a database. Other assignees included Google, AT&T, Microsoft and others.
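Co-citation analysis counts how often two documents are cited together by the same citing document, and clusters can then be formed from strongly co-cited pairs. The sketch below is a generic illustration, not the method of the Xerox patent; the document IDs, the threshold and the use of connected components as the clustering step are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing documents (e.g. patents) mapped to the documents they cite
references = {
    "P1": ["d1", "d2", "d3"],
    "P2": ["d1", "d2"],
    "P3": ["d4", "d5"],
    "P4": ["d4", "d5", "d1"],
}

def co_citation_counts(refs):
    """For every pair of cited documents, count how many citing documents cite both."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

def cluster(counts, threshold):
    """Connected components of the graph whose edges are pairs co-cited at least `threshold` times."""
    parent = {}

    def find(doc):
        parent.setdefault(doc, doc)
        while parent[doc] != doc:
            parent[doc] = parent[parent[doc]]  # path halving
            doc = parent[doc]
        return doc

    for (a, b), n in counts.items():
        root_a, root_b = find(a), find(b)  # also registers documents below the threshold
        if n >= threshold:
            parent[root_a] = root_b
    groups = {}
    for doc in parent:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(group) for group in groups.values())

pairs = co_citation_counts(references)
print(pairs[("d1", "d2")])          # 2
print(cluster(pairs, threshold=2))  # [['d1', 'd2'], ['d3'], ['d4', 'd5']]
```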
Predictive Trends Modelling with Author and Index Keywords
Answers questions such as:
1. What are the upcoming trends in my discipline?
2. Who is researching in emerging areas?
Why use author keywords?
1. Author keywords are assigned by the researchers themselves to describe their work.
2. They offer unique tagging of the content, especially when a new discovery or methodology is concerned.
3. When tracked over time, they can show growth, adoption and development.
4. When compared with index keywords, they can reveal new concepts that are later adopted by mainstream thesauri and indexes.
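The last two points can be operationalized by counting keyword occurrences per year and measuring how many years pass between a term's first appearance as an author keyword and its first appearance as an index keyword. A minimal sketch, assuming keyword records are available as (year, keyword, source) triples; the terms and years are hypothetical.

```python
from collections import Counter

# Hypothetical keyword occurrences: (publication year, keyword, "author" or "index")
occurrences = [
    (2005, "heat pump", "author"), (2006, "heat pump", "author"),
    (2006, "heat pump", "author"), (2008, "heat pump", "index"),
    (2007, "wind", "author"), (2007, "wind", "index"),
]

def yearly_counts(records, keyword, source="author"):
    """Occurrences of a keyword per year from one source, for tracking growth over time."""
    return dict(sorted(Counter(y for y, k, s in records if k == keyword and s == source).items()))

def adoption_lag(records, keyword):
    """Years between first use as an author keyword and first use as an index keyword (None if missing)."""
    first = {}
    for year, k, source in records:
        if k == keyword:
            first[source] = min(year, first.get(source, year))
    if "author" not in first or "index" not in first:
        return None
    return first["index"] - first["author"]

print(yearly_counts(occurrences, "heat pump"))  # {2005: 1, 2006: 2}
print(adoption_lag(occurrences, "heat pump"))   # 3
print(adoption_lag(occurrences, "wind"))        # 0
```

A positive lag marks a concept that authors used before indexers picked it up, which is the signal point 4 describes.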
[Figure: keyword trend visualization for terms including Distribution, Wind, Heat engine, Heat exchanger, Heat pump and Thermal performance. Interactive version: http://public.tableausoftware.com/views/presentationlink/sheet1?:embed=y&:display_count=no]
Citation context analysis
Combining citation data with full-text article data
Answers questions such as:
1. Where is your research cited?
2. How multidisciplinary is your research?
Citation density & sectional distribution
1. Most citations appear in the introduction.
2. There are slightly more out-of-discipline citations than in-discipline citations.
3. The findings section and then the discussion section rank second and third in the number of citations they contain.
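Once in-text citations have been extracted from full text and tagged with the section they occur in and whether the cited work belongs to the citing paper's discipline, the distribution above is a simple tally. A sketch assuming that tagging has already been done; the section labels and records are hypothetical.

```python
from collections import Counter

# Hypothetical in-text citations: (section of the citing article, cited work is in the same discipline?)
citations = [
    ("introduction", True), ("introduction", False), ("introduction", False),
    ("introduction", True), ("findings", False), ("findings", True),
    ("discussion", False), ("methods", False),
]

def section_distribution(records):
    """Share of citations per section, most-cited section first."""
    counts = Counter(section for section, _ in records)
    total = sum(counts.values())
    return [(section, n / total) for section, n in counts.most_common()]

def discipline_split(records):
    """Number of in-discipline vs out-of-discipline citations."""
    in_discipline = sum(1 for _, same in records if same)
    return {"in": in_discipline, "out": len(records) - in_discipline}

print(section_distribution(citations)[0])  # ('introduction', 0.5)
print(discipline_split(citations))         # {'in': 3, 'out': 5}
```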
Towards an Author Evaluation Tool (AET)
Current state
Several metrics are widely used to measure research performance, including:
1. Journal Impact Factor
2. h-index
3. g-index
4. i10-index
(and others; see http://www.harzing.com/pophelp/metrics.htm#gindex)
Most of these methods are based on counting articles and citations and reducing them to a single number.
Methods based on statistical calculations have been heavily criticized for their rigidity, complexity and lack of context.
Despite the growing criticism of research evaluation methodologies, evaluative metrics are still needed for:
1. Assessing the performance of research and researchers
2. Research funding
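The h-, g- and i10-indexes are each computed from a single list of per-paper citation counts, which is what makes them easy to apply and also why they carry little context. A minimal sketch using the standard definitions (h: largest h such that h papers have at least h citations each; g: largest g such that the g most-cited papers together have at least g² citations, here capped at the number of papers; i10: papers with at least 10 citations); the citation counts are hypothetical.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In descending order, c >= rank holds exactly for the first h papers
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the g most-cited papers have at least g**2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)

papers = [25, 12, 10, 8, 3, 1, 0]
print(h_index(papers), g_index(papers), i10_index(papers))  # 4 7 3
```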
Lessons learned
When developing new indicators/metrics, we need to:
1. Base our work on established and accepted methods
2. Combine methods in a way that gives proper weight to each
3. Take into account every possible data point so that the indicator is as comprehensive as possible
4. Make the method flexible enough to accommodate future data points and/or methods
5. Allow the person or institution being measured a certain amount of control over the data being calculated
6. Aim for a simple, understandable metric that is transparent and easy to use
Basic characteristics of the Self-Organizing Research Assessment Metrics
Our model of author assessment combines proper benchmarking with tools for enhancing data accuracy. The tool:
1. Facilitates checking and correcting the completeness of the assessed author's publication list
2. Compares the assessed author with other researchers who are active in the same subject field
3. Compares the assessed author with other researchers who are in the same phase of their scientific career
4. Can easily be used by tenured and young researchers as a self-assessment tool, and also by administrators and analysts
Selecting an author and collecting publication data
We tested level 2 of the model using Scopus author data in the field of energy.
Establishing field appropriateness via cited references
This uses the journals, authors and publication years of the cited references.
Creating the benchmarking network
Selecting the appropriate benchmarking network
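How similarity between authors is derived from cited references is not specified here. One plausible reading, sketched below purely as an assumption, is to compare the sets of journals each author cites and keep the most similar authors as the benchmarking network. Jaccard similarity, the top-k cutoff, the career-phase window based on first publication year, and the author data are all hypothetical choices.

```python
# Hypothetical authors: journals cited in their reference lists and year of first publication
authors = {
    "assessed": {"journals": {"Energy", "Applied Energy", "Fuel"}, "first_year": 2005},
    "peer1": {"journals": {"Energy", "Applied Energy"}, "first_year": 2006},
    "peer2": {"journals": {"Energy", "Fuel", "Solar Energy"}, "first_year": 2004},
    "peer3": {"journals": {"Cell", "Nature"}, "first_year": 2005},
    "peer4": {"journals": {"Energy", "Applied Energy", "Fuel"}, "first_year": 1985},
}

def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def benchmark_network(data, target, k=2, career_window=5):
    """Assumed approach: the k authors whose cited journals overlap most with the target's,
    restricted to authors who started publishing within career_window years of the target."""
    me = data[target]
    candidates = [
        (jaccard(me["journals"], info["journals"]), name)
        for name, info in data.items()
        if name != target and abs(info["first_year"] - me["first_year"]) <= career_window
    ]
    candidates = [c for c in candidates if c[0] > 0]  # drop authors with no field overlap
    return [name for score, name in sorted(candidates, reverse=True)[:k]]

print(benchmark_network(authors, "assessed"))  # ['peer1', 'peer2']
```

Here peer4 cites the same journals but is excluded by the career-phase filter, and peer3 is excluded for having no overlap with the assessed author's field.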
Author Evaluation Metrics: XXX
1. The number of articles is above the median of the benchmarking network.
2. Citations are slightly below the median of the benchmarking network.
3. Citations per article are in the bottom 25% of the benchmarking network.
4. The author has a high publication rate, but the publications are not as impactful.
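Statements such as "above the median" or "in the bottom 25%" place the assessed author within the distribution of the benchmarking network. A minimal sketch, assuming the bottom quartile is defined by the share of peers with a strictly lower value (one of several possible percentile conventions); the values are hypothetical and are not the figures behind the slide.

```python
from statistics import median

# Hypothetical indicator values for the assessed author and the benchmarking network
author = {"articles": 40, "citations": 300, "citations_per_article": 7.5}
peers = {
    "articles": [10, 20, 25, 30, 50],
    "citations": [150, 250, 320, 400, 900],
    "citations_per_article": [9.0, 10.0, 12.0, 15.0, 20.0],
}

def describe(value, peer_values):
    """Place one indicator value relative to the peer distribution."""
    # Assumed convention: fewer than 25% of peers strictly below means "bottom 25%"
    share_below = sum(1 for v in peer_values if v < value) / len(peer_values)
    if share_below < 0.25:
        return "bottom 25%"
    return "above median" if value > median(peer_values) else "at or below median"

for indicator, value in author.items():
    print(indicator, describe(value, peers[indicator]))
# articles above median
# citations at or below median
# citations_per_article bottom 25%
```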
Valuable notions and distinctions
1. Data accuracy is crucial
2. Use data verified by the authors themselves
3. Combine metrics and expert knowledge
4. Impact factors are no substitute for actual impact
5. Use multiple indicators
6. Take career phase into account
7. Take unintended effects into account
8. Focus on the top vs. the bottom of the quality distribution
Co-Authorship & Collaboration
Answers questions such as:
1. How diverse are my collaborations?
2. Can I expand my research collaborations?
3. With whom, in which countries, and on what topics should I collaborate?
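The first two questions can be approached by counting co-authored papers per partner country and summarizing how evenly they are spread. A sketch assuming partner-country affiliations are available per paper; the data are hypothetical, and Shannon entropy is used here as one possible diversity measure, not one the slides prescribe.

```python
from collections import Counter
from math import log

# Hypothetical papers: set of co-author countries other than the author's own
papers = [{"US"}, {"US", "ES"}, {"BR"}, set(), {"US"}]

def partner_counts(paper_countries):
    """Number of papers co-authored with each partner country."""
    return Counter(c for countries in paper_countries for c in countries)

def shannon_diversity(counts):
    """Shannon entropy (natural log) of the partner-country distribution; 0 means a single partner."""
    total = sum(counts.values())
    return -sum(n / total * log(n / total) for n in counts.values())

counts = partner_counts(papers)
print(counts.most_common())  # [('US', 3), ('ES', 1), ('BR', 1)]
print(round(shannon_diversity(counts), 3))
```

Higher entropy indicates collaborations spread across more countries; a strong dominance of one partner, as with the US here, pulls the value down.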
Main research questions and bibliometric indicators used in this study
A bibliometric model for capturing the state of scientific development
1. Pre-development: Low research activity, without a clear policy or structural funding of research.
2. Building-up: Collaborations with developed countries are established. National researchers enter international scientific networks.
3. Consolidation and expansion: The country develops its own scientific infrastructure. The amount of funds available for research increases.
4. Internationalisation: Research institutions in the country start functioning as fully fledged partners, and increasingly take the lead in international collaborations.
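Placing a country in one of these phases relies on indicators tracked over time, such as the share of its papers with foreign co-authors and, among those, the share led by a domestic institution. The sketch below computes both per year. Using the first author's country as the proxy for leadership, the country code and the records are assumptions for illustration; no thresholds separating the phases are given here, so the sketch stops at the indicators.

```python
from collections import defaultdict

# Hypothetical papers of one country: (year, first author's country, all author countries)
papers = [
    (2000, "US", {"CO", "US"}),
    (2000, "CO", {"CO"}),
    (2010, "CO", {"CO", "ES"}),
    (2010, "CO", {"CO", "US"}),
    (2010, "FR", {"CO", "FR"}),
    (2010, "CO", {"CO"}),
]

def internationalization_indicators(records, country="CO"):
    """Per year: share of papers with a foreign co-author, and share of those with a domestic first author."""
    stats = defaultdict(lambda: {"total": 0, "international": 0, "domestic_led": 0})
    for year, first_author_country, countries in records:
        s = stats[year]
        s["total"] += 1
        if countries - {country}:  # at least one foreign affiliation
            s["international"] += 1
            # Assumption: a domestic first author is taken as the country leading the collaboration
            if first_author_country == country:
                s["domestic_led"] += 1
    return {
        year: (s["international"] / s["total"],
               s["domestic_led"] / s["international"] if s["international"] else 0.0)
        for year, s in sorted(stats.items())
    }

print(internationalization_indicators(papers))
# {2000: (0.5, 0.0), 2010: (0.75, 0.6666666666666666)}
```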
Colombia's Scientific Collaborations
Subject Areas
Colombia's Collaborations in Medicine
Colombia's Collaborations in Agriculture
Dr. Gali Halevi
Senior Research Analyst and Program Director
Email: g.halevi@elsevier.com