Towards automatic discovery of co-authorship networks in the Brazilian academic areas


Jesús P. Mena-Chalco, Roberto M. Cesar Junior
Department of Computer Science, Institute of Mathematics and Statistics
University of São Paulo - Brazil
{jmena,cesar}@vision.ime.usp.br

Abstract

In Brazil, the individual curricula vitae of academic researchers, which consist mainly of professional information and scientific productions, are managed in a single software platform called Lattes. Currently, the information gathered from this platform is typically used to evaluate, analyze and document the scientific production of Brazilian research groups. Although Lattes curricula contain semi-structured information, analyzing them for medium and large groups is a time-consuming and highly error-prone task. In this paper, we describe an extension of scriptlattes (an open-source knowledge extraction system for the Lattes platform) that analyzes individual Lattes curricula and automatically discovers large-scale co-authorship networks for any academic area. Given a knowledge domain (academic area), the system automatically identifies the researchers associated with the area, extracts each researcher's list of scientific productions, discretized by type and publication year, and, for each paper, identifies the co-authors registered in the Lattes platform. The system also generates different types of networks, which may be used to study the characteristics of academic areas at large scale. In particular, we explore the node degree and AuthorRank measures of each identified researcher. Finally, we confirm through experiments that the system offers a simple way to generate different co-authorship networks. To the best of our knowledge, this is the first study to examine large-scale co-authorship networks for any Brazilian academic area.

I. INTRODUCTION

Data mining is the process of extracting information from large volumes of data, commonly used to identify or reveal possible relationships between instances of the treated elements [1]. The extraction of scientific data, the identification of bibliometric patterns, and the effective modeling and visualization of interaction networks among co-authors are relevant topics in e-science, Bibliometrics and Scientometrics. In recent years, interest in these topics has grown because of the knowledge discovery that becomes possible when processing the datasets available in repositories of scientific production (e.g. datasets of bibliographic production, academic supervision and research projects).

In Brazil, the National Council for Scientific and Technological Development (CNPq) carries out important work integrating the academic curricula databases of public and private institutions into a single platform called Lattes. The so-called Lattes curricula are considered a national standard for assessment and evaluation, recording the history of the scientific, academic and professional activities of the researchers registered on the platform [2].

Lattes curricula were designed to show public information about each registered user individually. In this context, summarizing the bibliographic production, or generating the co-authorship networks, of a medium or large group of registered users (e.g. the researchers of a specific academic area) requires considerable manual effort and many processing steps that are susceptible to failures, which could influence the obtained results. Therefore, starting from the scriptlattes work [3] (see footnote 1), we created an automatic system that generates large-scale co-authorship networks from the Lattes curricula of researchers associated with specific academic areas.
It is worth noting that this previous work (i) does not explore the automatic identification of researchers associated with an academic area, and (ii) does not extract measures to characterize co-authorship networks.

Our approach assumes the existence of a large set of Lattes curricula. Our local dataset contains over 1.3 million curricula, gathered by crawling publicly accessible information on the Lattes platform. We believe that our work is the first study to automatically examine Brazilian co-authorship networks at such a large scale.

In addition to generating co-authorship networks, the system automatically computes two metrics for each identified researcher: the node degree and the AuthorRank measure [4]. Our experiments provide insight into real-world co-authorship networks. We observe a significant correlation between the degree value and the AuthorRank measure calculated for each researcher associated with an academic area. However, we consider the AuthorRank measure a more representative indicator than the degree value, because AuthorRank better reveals the status of a researcher, as mentioned by X. Liu et al. [4].

Finally, note that this system can be used for exploring, identifying and validating patterns of scientific activity, thus providing bibliometric and/or scientometric information about a group of interest [5]. The relevance of our work rests on the benefits of using this process for the automatic analysis of scientific production and of the relationships among the actors of academic research areas [6], [7].

This paper is organized as follows. The procedures used to collect the dataset are described in Section II. The system is described in Section III. Some experiments and results are shown in Section IV. The paper concludes with some comments on our ongoing work in Section V.

Footnote 1: The scriptlattes is available as open-source code that can be adapted and improved for specific research needs:

II. DATA COLLECTION

The data presented in this paper were collected from the Lattes platform on May 5th, 2011 and comprise about 1.3 million curricula. We created several procedures to crawl curricula, in HTML format, through the public web-search interface provided by the Lattes platform. Several web queries were executed in order to access a large set of Lattes curricula, each query referring to a specific knowledge area. In summary, eighty queries were performed on the Lattes platform, each associated with one knowledge domain. For every query result, a specific HTML parser was created to automatically obtain the list of curricula, and this list was then used to automatically download the Lattes curricula. It is important to note that with this collection strategy we do not attempt to study the entire Lattes database, but a representative amount of data: in 2007 the CNPq announced that the Lattes platform had reached one million registered curricula, which means that it should be much larger now.

III. THE SYSTEM

Our proposed system is composed of five main modules, namely (i) data selection, (ii) data extraction, (iii) redundancy treatment, (iv) co-authorship network generation, and (v) AuthorRank computation. The basic operation of the system is as follows. First, a list of Lattes curricula, in HTML format, is selected according to analysis criteria given by the user. This requires multiple queries to the local database.
The resulting Lattes curricula are processed by a deterministic HTML parser that extracts all information of interest (e.g. professional information, expertise areas, list of academic productions). Next, the processed production lists of the individual researchers are combined into lists organized by type and publication year. The filtered lists are then used to create different types of networks, by identifying all co-authors (registered on the Lattes platform) of each publication. Finally, the AuthorRank computation yields a measure related to the status of each researcher. Fig. 1 shows a high-level view of the proposed system.

Fig. 1. A high-level view of our system to automatically discover co-authorship networks belonging to academic areas.

A. Data selection

This module selects the Lattes curricula, in HTML format, from the local database. Only the subset of researchers' Lattes curricula that matches the academic (expertise) areas specified by the user is chosen. To facilitate string comparison, the curricula were normalized to the same character encoding. As indicated in Section II, public users of the Lattes platform do not have access to the complete Lattes database. For this reason, special attention is given to extracting the information about scientific productions, as explained in the data extraction module.

B. Data extraction

In this module, a deterministic HTML parser [8] automatically extracts, for each researcher, the full name, bibliographic citation name, academic areas, professional address, scholarship type, and gender. Additionally, the parser extracts the list of academic productions restricted to the time period given by the user.
Currently, three types of academic production are considered, namely (i) bibliographical production, (ii) technical production, and (iii) artistic production. Table I lists the types of academic production considered by the system. It is worth mentioning that a computational challenge for the system is processing the data in HTML format, where the constituent parts of an academic production (e.g. authors' names, publication title, journal name, number, volume, pages, year) are presented without any explicit separation. Nevertheless, the designed parser identifies all the constituent parts of the academic productions in the vast majority of cases.

TABLE I. INFORMATION EXTRACTED FROM THE LATTES CURRICULUM.

(i) Bibliographical production
    - Articles in scientific journals
    - Book published/organized
    - Book chapter published
    - Articles in newspapers/magazines
    - Complete works published in proceedings of conferences
    - Expanded summary published in proceedings of conferences
    - Summary published in proceedings of conferences
    - Articles accepted for publication
    - Presentations of work
    - Other kinds of bibliographical production
(ii) Technical production
    - Patented or registered software
    - Not patented or registered software
    - Technological products
    - Techniques or process
    - Technical works
    - Other kinds of technical production
(iii) Artistic productions
    - Artistic/cultural production

C. Redundancy treatment

It is common for an academic production to be co-authored by two or more researchers of the group selected in the data selection module. Thus, the same academic production may be registered in the Lattes curriculum of each of the corresponding co-authors. The system has a redundancy treatment module that detects equal or similar academic productions. These productions are used to detect co-authorship among the selected group of researchers: two or more researchers are considered co-authors if they have a production in common.

Equal or similar productions are detected by comparing all extracted academic productions, discretized by year and type (e.g., journal paper, book chapter), so that productions with different publication years are never compared, which substantially reduces the processing time of this module. Due to inconsistencies in how the Lattes curricula are filled in (typos or lack of standardization of co-authors' names [9]), any two productions are compared through an inexact matching procedure on their titles. Two productions are considered equal if the similarity percentage between the titles is greater than a percentage given by the user. The similarity between two strings is based on the distance proposed by Levenshtein [10].
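The title comparison used by the redundancy treatment module can be sketched as follows. This is a minimal illustration under stated assumptions, not the scriptlattes implementation: the paper does not say how the Levenshtein distance is converted into a similarity percentage, so the normalization by the length of the longer title, the 90% default threshold, the lower-casing of titles, and the record fields `owner`, `year`, `kind` and `title` are all hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity percentage (assumed definition: 1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - levenshtein(a, b) / longest)


def same_production(title_a: str, title_b: str, threshold: float = 90.0) -> bool:
    """Two titles match when their similarity is greater than the user threshold."""
    return similarity(title_a.lower().strip(), title_b.lower().strip()) > threshold


def find_coauthor_pairs(productions, threshold: float = 90.0):
    """Compare titles only within the same (year, kind) bucket, as in Section III-C."""
    buckets = defaultdict(list)
    for record in productions:
        buckets[(record["year"], record["kind"])].append(record)
    pairs = []
    for group in buckets.values():
        for p, q in combinations(group, 2):
            if same_production(p["title"], q["title"], threshold):
                pairs.append((p["owner"], q["owner"]))
    return pairs
```

Bucketing by year and type before comparing is what keeps the number of pairwise title comparisons manageable: titles from different years or of different types never meet, so a typo-tolerant match is only attempted where a duplicate is plausible.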
The Levenshtein distance is the minimum number of insertions, deletions or substitutions required to transform one string into another (a Levenshtein distance of 0 indicates that the two titles are exactly equal). An important feature of this module is the merging of data from similar productions, so that missing information in one production can be complemented with information from the other. Currently, complementation simply keeps the longer of the two strings. With this module, the system keeps a record of all co-authors associated with a particular academic production. This information is important for the numerical weighting of academic productions, such as the normalization of edge weights in co-authorship networks.

D. Co-authorship network generation

A co-authorship network describes the research activities that have been carried out by a research group [6], [11]. In this module, the system uses a graph to represent the co-authorship among researchers based on their scientific productions. Each researcher is represented by a node. An edge is created between a pair of nodes whenever the redundancy treatment module detects a common production of the corresponding researchers, i.e., if two researchers have co-authored an academic production, their respective nodes in the co-authorship network are linked by an edge (a co-author relationship). Our work focuses on creating different types of co-authorship networks, each briefly described below.

1) Networks without weights: this type of network is commonly represented by an undirected graph without weights (or an undirected unit-weighted graph), where the edges only indicate that the connected researchers have co-authored academic productions. A simple measure that can be computed on this type of graph is the node degree, which is the number of edges connected to the node. See an example in Fig. 2(a).
In this example, the degree of node A1 is 3.

2) Networks with un-normalized weights: this type of network is represented by an undirected weighted graph, where the weight of an edge is the number of academic productions co-authored by the two nodes. In this type of network, the node degree can be seen as the sum of the weights of the edges connected to the node. Note that with this representation the total number of productions made in co-authorship is lost. For example, in Fig. 2(b) author A1 has participated in three co-authored productions, but the node degree is 4 (a value that does not match a common-sense notion of co-authorship).

3) Networks with co-authorship frequencies: to work around the problem described in the previous item, the terms exclusivity and co-authorship frequency were defined in [4]. Exclusivity is the degree to which two authors have an exclusive co-authorship relation in a particular academic production. Given an academic production co-authored by n authors, its exclusivity value is defined as 1/(n - 1). This measure gives more weight to co-author relationships in productions with few co-authors than in productions with many co-authors. The top of Fig. 2 shows examples of exclusivity values computed for three productions. The co-authorship frequency between two co-authors is defined as the sum of all exclusivity values between them. This measure gives more weight to authors who co-publish more academic productions together. Fig. 2(c) shows examples of co-authorship frequencies. Note that with this representation, given an author, the total number of productions made in co-authorship is the sum of the co-authorship frequencies on the edges that connect the author to its co-authors. For example, the frequencies on the edges of author A1 sum to three, the number of productions in which A1 participated.

Fig. 2. Example of co-authorship networks obtained from the automatic identification of 3 productions written by 4 authors: A1, A2, A3 and A4.

4) Networks with normalized weights: this type of network is represented by a directed weighted graph where, given an author, the weight of each outgoing edge is the co-authorship frequency normalized by the total number of productions the author made in co-authorship. This normalization ensures that the weights of an author's relationships sum to one. Fig. 2(d) shows an example of a network with normalized weights. The normalized weights intuitively indicate the importance of a co-author in the production carried out in collaboration with others. For example, author A1 participates in 75% of the production that A2 made in co-authorship (i.e., A1 is 75% important for A2), whereas A2 is only about 50% important for A1. Similarly, A1 is 100% important for A4, but A4 is only about 33% important for A1. This behavior is typical of an advisor-advisee relationship (A1-A4).

E. AuthorRank computation

The weights of the aforementioned directed graph express the importance of collaboration and reciprocity, and the system also uses them as the basis for calculating the AuthorRank measure [4] of each researcher. The AuthorRank measure can be understood as a numeric value that represents the status of a researcher within the co-authorship network (the AuthorRank algorithm, proposed by X. Liu et al. [4], is an adaptation of the PageRank algorithm [12] used to search for relevant pages in the Google search engine). Therefore, the researchers with the most collaborative status have the highest AuthorRank values.
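The weighting schemes of Section III-D, and the way the normalized weights feed the AuthorRank iteration of Eq. (1), can be illustrated with a small sketch. It is not the system's code. The three author lists below are an assumed reconstruction of the Fig. 2 example, chosen only because they reproduce the numbers quoted in the text (A1 takes part in three productions, A1 is 75% important for A2, A2 is 50% important for A1, A4 is about 33% important for A1); the function names are hypothetical, and the fixed 100 iterations and d = 0.85 follow the values reported in the paper.

```python
from collections import defaultdict
from itertools import combinations

# Assumed reconstruction of the Fig. 2 example: one author list per production.
productions = [
    ["A1", "A4"],
    ["A1", "A2"],
    ["A1", "A2", "A3"],
]


def coauthorship_frequencies(productions):
    """Sum, for every pair of co-authors, the exclusivity 1/(n - 1) of each shared production."""
    freq = defaultdict(lambda: defaultdict(float))
    for authors in productions:
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # single-author productions create no co-authorship
        exclusivity = 1.0 / (len(authors) - 1)
        for a, b in combinations(authors, 2):
            freq[a][b] += exclusivity
            freq[b][a] += exclusivity
    return freq


def normalized_weights(freq):
    """Directed weights: each author's outgoing weights sum to one."""
    weights = {}
    for author, neighbours in freq.items():
        total = sum(neighbours.values())
        weights[author] = {other: f / total for other, f in neighbours.items()}
    return weights


def author_rank(weights, d=0.85, iterations=100):
    """Iterate AR(i) = (1 - d) + d * sum_j AR(j) * w[j][i], starting from AR = 1."""
    ranks = {author: 1.0 for author in weights}
    for _ in range(iterations):
        updated = {}
        for i in weights:
            # The co-authors j of i are exactly the keys of weights[i].
            incoming = sum(ranks[j] * weights[j][i] for j in weights[i])
            updated[i] = (1.0 - d) + d * incoming
        ranks = updated
    return ranks


freq = coauthorship_frequencies(productions)
weights = normalized_weights(freq)
ranks = author_rank(weights)
for author in sorted(ranks, key=ranks.get, reverse=True):
    print(author, round(ranks[author], 3))
```

With this assumed input, A2 and A3 both have two co-authors, yet A2 receives a higher AuthorRank than A3 because a larger share of A1's collaboration flows to A2, which is the distinction the text draws between degree and AuthorRank.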
The AuthorRank (AR) of an author i who has n co-authors is given by

$$AR^{(t)}(i) = (1 - d) + d \sum_{j=1}^{n} AR^{(t-1)}(j)\, w_{j,i} \qquad (1)$$

where AR^{(t-1)}(j) is the AuthorRank of the j-th co-author of i (j = 1, ..., n), w_{j,i} is the normalized weight between author j and author i, and d is the damping factor (usually set to 0.85, as described in [13]). For this iterative process, the AuthorRank measures are initialized as AR^{(0)}(k) = 1 for every author k. In our experiments the AuthorRank measures converged at t = 100. For the example shown in Fig. 2, A1 is the author with the highest AuthorRank. It is important to note that, going by node degree alone, authors A2 and A3 would be classified at the same level. However, according to the AuthorRank measures, author A2 is ranked higher than author A3.

IV. EXPERIMENTS AND RESULTS

Researchers from four domains were considered in the present study: Computer Science, Mathematics, Physics, and Statistics. In order to standardize the analysis of the automatically generated co-authorship networks, only journal papers published between 2000 and 2010 were considered. The obtained co-authorship networks are presented in Fig. 3. For each knowledge domain, the data selection module (Section III) was configured to consider only researchers with at least a doctoral degree.

Fig. 3. Co-authorship networks of researchers associated with four Brazilian academic areas: (a) Computer Science, 6014 researchers; (b) Mathematics, 5008 researchers; (c) Physics, 6410 researchers; (d) Statistics, 2274 researchers. Columns, from left to right: regular representation; node size proportional to the node's degree; node size proportional to the node's AuthorRank measure. The researchers are represented by nodes and the co-authorships by edges. The networks were created considering the journal papers published between 2000 and 2010 by researchers with a doctoral degree. Isolated nodes are not shown.

In the regular representation (first column), the researchers are represented by nodes, and the co-authorships are represented by unweighted straight edges. The graph drawing was obtained with the Fruchterman-Reingold algorithm [14], which simulates the entire graph as if it were a physical system. The algorithm assigns forces between the nodes and the edges, as if the edges were straight springs and the nodes were electrically charged particles. The forces are applied iteratively to the nodes, pulling them closer together or pushing them further apart until the system reaches an equilibrium state. This algorithm was adopted to visualize the complete co-authorship networks because it distributes the vertices evenly, making edge lengths uniform and reflecting symmetry.

In general, each co-authorship network appears to reflect characteristics of professional/academic practice. For instance, mathematicians usually have few co-authors in their published work, whereas physicists generally tend to have several co-authors per publication. This behavior may be better understood by calculating the cumulative degree distribution of each co-authorship network (see Fig. 4).

Fig. 4. Cumulative degree distribution of the co-authorship networks of four Brazilian academic areas (axes: degree and cumulative frequency; curves: Computer Science, Mathematics, Physics, Statistics, All).

In the second column of Fig. 3, each node size is proportional to the node's degree, i.e., the size of a node represents the number of co-authors of the researcher, so that more collaborative researchers have larger nodes. In the third column of Fig. 3, the node size is proportional to the node's AuthorRank measure. Note that the nodes keep the same relative positions in all co-authorship networks of each area. As expected, only one giant component was obtained for each academic area.

As a global analysis, we executed our system to obtain a complete co-authorship network composed of the four aforementioned areas. A unique label was assigned to each academic area. The PhD researchers of these areas were identified in our local database, and some of them were associated with more than one academic area. Fig. 5 presents the co-authorship networks created considering the journal papers published between 2000 and 2010. Note that in the regular representation of the network (first column), the researchers associated with each area are loosely grouped among themselves, forming components. Note also that the area of Mathematics (nodes shown in green) cuts across the other three areas considered in our experiment. In the second and third columns of Fig. 5, the node size of each researcher represents the degree value and the AuthorRank measure, respectively; a higher value is represented by a larger node. Note that the most connected academic area is Physics. Therefore, it is expected that the majority of large nodes (i.e., nodes with a high degree value) belong to this area. However, a better sense of collaboration is given by the AuthorRank measures.

Few studies have examined the relationships between node measures, such as degree and AuthorRank, that characterize the topology of real-world networks [15]. In our experiment, we computed Pearson's correlation coefficient between the degree values and the AuthorRank measures (p-value < 2.2e-16). This significant correlation indicates that the AuthorRank measure is related to the degree value of each node. In fact, both measures indicate how connected a researcher is within the academic community. Fig. 6 plots the correlation between both measures.

In order to compare the representation of co-authorship networks by both measures, Table II presents the top 10 researchers sorted by degree value and by AuthorRank measure. Note that, when the node degree is considered, the top 10 identified researchers are associated with the area of Physics (a highly connected area).
On the other hand, when the AuthorRank measure is considered, the most collaborative researchers of the other areas also appear among the top 10 researchers of the four academic areas. It is also interesting to note that both measures are apparently linked in part to the number of papers published and to the type of academic scholarship conferred by CNPq (the best types of scholarship associated with the top researchers include 1C and 1D).

V. CONCLUSION AND PERSPECTIVES

In the present work we have presented a system that significantly improves the usefulness of scriptlattes (an open-source knowledge extraction system for the Lattes platform) by automatically discovering co-authorship networks from data extracted from the Lattes platform. Our initiative contributes mainly in two aspects: (i) it allows the automatic generation of large-scale co-authorship networks for several Brazilian academic areas, and (ii) it provides a good opportunity to analyze, through metrics, co-authorship networks derived from the eighty knowledge areas classified by CNPq.

The co-authorship networks may be further examined with complementary tools for social network analysis such as Pajek, UCInet or R. The obtained networks can thus be easily analyzed with various metrics such as distance, clustering and centrality (closeness and betweenness) [16], [17]. These measures are widely studied to: (i) characterize social networks and automatically identify co-authorship sub-communities of research groups [18], (ii) study and characterize the temporal evolution of co-authorship among researchers, i.e., analyze the cooperation dynamics of researchers across different periods of time [19], or (iii) correlate quantitative with qualitative collaboration data in order to examine trends in research activities on specific axes [20], [21].

Finally, we emphasize that automatic tools similar to the one presented (e.g. DBLP, CiteSeer, Google Scholar, Microsoft Academic Search, and ArnetMiner [22]) have become increasingly necessary because there is an increasing volume of academic production data that must be correctly computed [1] and effectively visualized by using computational methods [23].

Fig. 5. Co-authorship network of researchers associated with four Brazilian academic areas: Computer Science (red), Mathematics (green), Physics (blue), Statistics (magenta). Researchers with two or more academic areas are shown in black. Columns, from left to right: regular representation; node size proportional to the node's degree value; node size proportional to the node's AuthorRank measure. The networks were created considering the journal papers published between 2000 and 2010 by researchers with a doctoral degree. Isolated nodes are not shown.

TABLE II. TOP 10 RESEARCHERS W.R.T. DEGREE VALUES (LEFT) AND AUTHORRANK MEASURES (RIGHT) OBTAINED FROM JOURNAL PAPERS PUBLISHED BETWEEN 2000 AND 2010 BY RESEARCHERS WITH PHD DEGREE. CS = COMPUTER SCIENCE, M = MATHEMATICS, P = PHYSICS, S = STATISTICS.

Columns of the original table: Researcher, Degree, AuthorRank, Papers, Area(s), Type. Only the Researcher column is legible here.

    Sorted by degree (table order):     A6201, A14631, A2763, A12704, A1839, A260, A3599, A14752, A9190, A11411
    Sorted by AuthorRank (table order): A14631, A4600, A9220, A15939, A14752, A2763, A11411, A14054, A16431, A6201

ACKNOWLEDGMENT

The authors would like to thank CNPq, CAPES, FAPESP and FINEP for their financial support.

REFERENCES

[1] S. Issue, Communications of ACM: Surviving the data deluge. New York, NY, USA: ACM, 2008, vol. 51, no. 12.
[2] C. Amorin, "Curriculum vitae organization: the Lattes software platform," Pesquisa Odontológica Brasileira, vol. 17, no. 1.
[3] J. P. Mena-Chalco and R. M. Cesar-Jr., "scriptlattes: An open-source knowledge extraction system from the Lattes platform," Journal of the Brazilian Computer Society, vol. 15, no. 4.
[4] X. Liu, J. Bollen, M. Nelson, and H. V. de Sompel, "Co-authorship networks in the digital library research community," Information Processing and Management, vol. 41, no. 6.
[5] S. Nicholson, "The basis for bibliomining: Frameworks for bringing together usage-based data mining and bibliometrics through data warehousing in digital library services," Information Processing and Management, vol. 42, no. 3.
[6] S. Klink, P. Reuther, A. Weber, B. Walter, and M. Ley, "Analysing social networks within bibliographical data," in 7th International Conference on Database and Expert Systems Applications, ser. Lecture Notes in Computer Science, vol. 4080, 2006.
[7] R. Kouzes, G. Anderson, S. Elbert, I. Gorton, and D. Gracio, "The changing paradigm of data-intensive computing," Computer, vol. 42, no. 1.
[8] M. Tomita, Current issues in parsing technology. Boston: Kluwer Academic Publishers.
[9] I. S. Kang, S. H. Na, S. Lee, H. Jung, P. Kim, W. K. Sung, and J. H. Lee, "On co-authorship for author disambiguation," Information Processing and Management, vol. 45, no. 1.
[10] G. Navarro, "A guided tour to approximate string matching," ACM Computing Surveys, vol. 33, no. 1, 2001.

[11] M. Maia and S. Caregnato, "Co-autoria como indicador de redes de colaboração científica," Perspectivas em Ciência da Informação, vol. 13, no. 2.
[12] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford InfoLab, Technical Report, November 1999, previous number = SIDL-W
[13] S. Brin and L. Page, "The anatomy of a large-scale hypertextual web search engine," in Seventh International World-Wide Web Conference.
[14] T. M. J. Fruchterman and E. M. Reingold, "Graph drawing by force-directed placement," Software - Practice and Experience, vol. 21, no. 11.
[15] A. Jamakovic and S. Uhlig, "On the relationships between topological measures in real-world networks," Networks and Heterogeneous Media, vol. 3, no. 2.
[16] S. Wasserman and K. Faust, Social network analysis: methods and applications. Cambridge University Press.
[17] J. P. Scott, Social Network Analysis: A Handbook, 2nd ed. SAGE Publications.
[18] M. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E, vol. 69, no. 2.
[19] B. Wu, F. Zhao, S. Yang, L. Suo, and H. Tian, "Characterizing the evolution of collaboration network," in Proceeding of the 2nd ACM workshop on Social web search and mining, ser. SWSM 09, 2009.
[20] L. L. Leydesdorff, "Can scientific journals be classified in terms of aggregated journal-journal citation relations using the journal citation reports?" Journal of the American Society for Information Science and Technology, vol. 57, no. 5.
[21] L. Leydesdorff, "Betweenness centrality as an indicator of the interdisciplinarity of scientific journals," Journal of the American Society for Information Science and Technology, vol. 58, no. 9.
[22] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: Extraction and mining of academic social networks," in Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008.
[23] D. A. Keim, "Information visualization and visual data mining," IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 1, pp. 1-8, 2002.

Fig. 6. Correlation between degree and AuthorRank measures (axes: degree and AuthorRank).


More information

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1

A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.

More information

Semantic Search in Portals using Ontologies

Semantic Search in Portals using Ontologies Semantic Search in Portals using Ontologies Wallace Anacleto Pinheiro Ana Maria de C. Moura Military Institute of Engineering - IME/RJ Department of Computer Engineering - Rio de Janeiro - Brazil [awallace,anamoura]@de9.ime.eb.br

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

A Framework of User-Driven Data Analytics in the Cloud for Course Management

A Framework of User-Driven Data Analytics in the Cloud for Course Management A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer

More information

Scientific Collaboration Networks in China s System Engineering Subject

Scientific Collaboration Networks in China s System Engineering Subject , pp.31-40 http://dx.doi.org/10.14257/ijunesst.2013.6.6.04 Scientific Collaboration Networks in China s System Engineering Subject Sen Wu 1, Jiaye Wang 1,*, Xiaodong Feng 1 and Dan Lu 1 1 Dongling School

More information

ProteinQuest user guide

ProteinQuest user guide ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Graph Processing and Social Networks

Graph Processing and Social Networks Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph

More information

Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

More information

TIBCO Spotfire Network Analytics 1.1. User s Manual

TIBCO Spotfire Network Analytics 1.1. User s Manual TIBCO Spotfire Network Analytics 1.1 User s Manual Revision date: 26 January 2009 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO

More information

Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context

Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context Intelligent Analysis of User Interactions in a Collaborative Software Engineering Context Alejandro Corbellini 1,2, Silvia Schiaffino 1,2, Daniela Godoy 1,2 1 ISISTAN Research Institute, UNICEN University,

More information

Network-Based Tools for the Visualization and Analysis of Domain Models

Network-Based Tools for the Visualization and Analysis of Domain Models Network-Based Tools for the Visualization and Analysis of Domain Models Paper presented as the annual meeting of the American Educational Research Association, Philadelphia, PA Hua Wei April 2014 Visualizing

More information

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network , pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and

More information

Identifying Influential Scholars in Academic Social Media Platforms

Identifying Influential Scholars in Academic Social Media Platforms Identifying Influential Scholars in Academic Social Media Platforms Na Li, Denis Gillet École Polytechnique Fédérale de Lausanne (EPFL) 1015 Lausanne, Switzerland {na.li, denis.gillet}@epfl.ch Abstract

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Dynamical Clustering of Personalized Web Search Results

Dynamical Clustering of Personalized Web Search Results Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC xshen@cs.uiuc.edu Hong Cheng CS Dept, UIUC hcheng3@uiuc.edu Abstract Most current search engines present the user a ranked

More information

Six Degrees of Separation in Online Society

Six Degrees of Separation in Online Society Six Degrees of Separation in Online Society Lei Zhang * Tsinghua-Southampton Joint Lab on Web Science Graduate School in Shenzhen, Tsinghua University Shenzhen, Guangdong Province, P.R.China zhanglei@sz.tsinghua.edu.cn

More information

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS

RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for

More information

MapReduce Approach to Collective Classification for Networks

MapReduce Approach to Collective Classification for Networks MapReduce Approach to Collective Classification for Networks Wojciech Indyk 1, Tomasz Kajdanowicz 1, Przemyslaw Kazienko 1, and Slawomir Plamowski 1 Wroclaw University of Technology, Wroclaw, Poland Faculty

More information

A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE

A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LIX, Number 1, 2014 A STUDY REGARDING INTER DOMAIN LINKED DOCUMENTS SIMILARITY AND THEIR CONSEQUENT BOUNCE RATE DIANA HALIŢĂ AND DARIUS BUFNEA Abstract. Then

More information

How to Analyze Company Using Social Network?

How to Analyze Company Using Social Network? How to Analyze Company Using Social Network? Sebastian Palus 1, Piotr Bródka 1, Przemysław Kazienko 1 1 Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland {sebastian.palus,

More information

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns

. Learn the number of classes and the structure of each class using similarity between unlabeled training patterns Outline Part 1: of data clustering Non-Supervised Learning and Clustering : Problem formulation cluster analysis : Taxonomies of Clustering Techniques : Data types and Proximity Measures : Difficulties

More information

Social Network Discovery based on Sensitivity Analysis

Social Network Discovery based on Sensitivity Analysis Social Network Discovery based on Sensitivity Analysis Tarik Crnovrsanin, Carlos D. Correa and Kwan-Liu Ma Department of Computer Science University of California, Davis tecrnovrsanin@ucdavis.edu, {correac,ma}@cs.ucdavis.edu

More information

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer.

International Journal of Engineering Research-Online A Peer Reviewed International Journal Articles are freely available online:http://www.ijoer. RESEARCH ARTICLE SURVEY ON PAGERANK ALGORITHMS USING WEB-LINK STRUCTURE SOWMYA.M 1, V.S.SREELAXMI 2, MUNESHWARA M.S 3, ANIL G.N 4 Department of CSE, BMS Institute of Technology, Avalahalli, Yelahanka,

More information

Graph/Network Visualization

Graph/Network Visualization Graph/Network Visualization Data model: graph structures (relations, knowledge) and networks. Applications: Telecommunication systems, Internet and WWW, Retailers distribution networks knowledge representation

More information

Character Image Patterns as Big Data

Character Image Patterns as Big Data 22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

More information

COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

THREE-DIMENSIONAL CARTOGRAPHIC REPRESENTATION AND VISUALIZATION FOR SOCIAL NETWORK SPATIAL ANALYSIS

THREE-DIMENSIONAL CARTOGRAPHIC REPRESENTATION AND VISUALIZATION FOR SOCIAL NETWORK SPATIAL ANALYSIS CO-205 THREE-DIMENSIONAL CARTOGRAPHIC REPRESENTATION AND VISUALIZATION FOR SOCIAL NETWORK SPATIAL ANALYSIS SLUTER C.R.(1), IESCHECK A.L.(2), DELAZARI L.S.(1), BRANDALIZE M.C.B.(1) (1) Universidade Federal

More information

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,

More information

CiteSeer x in the Cloud

CiteSeer x in the Cloud Published in the 2nd USENIX Workshop on Hot Topics in Cloud Computing 2010 CiteSeer x in the Cloud Pradeep B. Teregowda Pennsylvania State University C. Lee Giles Pennsylvania State University Bhuvan Urgaonkar

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Social Analysis of the SEKE Co-Author Network

Social Analysis of the SEKE Co-Author Network Social Analysis of the SEKE Co-Author Network Rehab El Kharboutly Swapna S. Gokhale Software Engineering Computer Science & Engg. Quinnipiac University Univ. of Connecticut Hamden, CT 06518 Storrs, CT

More information

Visual Analysis Tool for Bipartite Networks

Visual Analysis Tool for Bipartite Networks Visual Analysis Tool for Bipartite Networks Kazuo Misue Department of Computer Science, University of Tsukuba, 1-1-1 Tennoudai, Tsukuba, 305-8573 Japan misue@cs.tsukuba.ac.jp Abstract. To find hidden features

More information

Viral Marketing in Social Network Using Data Mining

Viral Marketing in Social Network Using Data Mining Viral Marketing in Social Network Using Data Mining Shalini Sharma*,Vishal Shrivastava** *M.Tech. Scholar, Arya College of Engg. & I.T, Jaipur (Raj.) **Associate Proffessor(Dept. of CSE), Arya College

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Using Data Mining for Mobile Communication Clustering and Characterization

Using Data Mining for Mobile Communication Clustering and Characterization Using Data Mining for Mobile Communication Clustering and Characterization A. Bascacov *, C. Cernazanu ** and M. Marcu ** * Lasting Software, Timisoara, Romania ** Politehnica University of Timisoara/Computer

More information

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support Rok Rupnik, Matjaž Kukar, Marko Bajec, Marjan Krisper University of Ljubljana, Faculty of Computer and Information

More information

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks

Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing

More information

Learn Software Microblogging - A Review of This paper

Learn Software Microblogging - A Review of This paper 2014 4th IEEE Workshop on Mining Unstructured Data An Exploratory Study on Software Microblogger Behaviors Abstract Microblogging services are growing rapidly in the recent years. Twitter, one of the most

More information

How To Find Local Affinity Patterns In Big Data

How To Find Local Affinity Patterns In Big Data Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class

More information

Search in BigData2 - When Big Text meets Big Graph 1. Introduction State of the Art on Big Data

Search in BigData2 - When Big Text meets Big Graph 1. Introduction State of the Art on Big Data Search in BigData 2 - When Big Text meets Big Graph Christos Giatsidis, Fragkiskos D. Malliaros, François Rousseau, Michalis Vazirgiannis Computer Science Laboratory, École Polytechnique, France {giatsidis,

More information

Applying Social Network Analysis to the Information in CVS Repositories

Applying Social Network Analysis to the Information in CVS Repositories Applying Social Network Analysis to the Information in CVS Repositories Luis Lopez-Fernandez, Gregorio Robles, Jesus M. Gonzalez-Barahona GSyC, Universidad Rey Juan Carlos {llopez,grex,jgb}@gsyc.escet.urjc.es

More information

Can Twitter Predict Royal Baby's Name?

Can Twitter Predict Royal Baby's Name? Summary Can Twitter Predict Royal Baby's Name? Bohdan Pavlyshenko Ivan Franko Lviv National University,Ukraine, b.pavlyshenko@gmail.com In this paper, we analyze the existence of possible correlation between

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

NOVEL APPROCH FOR OFT BASED WEB DOMAIN PREDICTION

NOVEL APPROCH FOR OFT BASED WEB DOMAIN PREDICTION Volume 3, No. 7, July 2012 Journal of Global Research in Computer Science RESEARCH ARTICAL Available Online at www.jgrcs.info NOVEL APPROCH FOR OFT BASED WEB DOMAIN PREDICTION A. Niky Singhai 1, B. Prof

More information

Abstract. Introduction

Abstract. Introduction CODATA Prague Workshop Information Visualization, Presentation, and Design 29-31 March 2004 Abstract Goals of Analysis for Visualization and Visual Data Mining Tasks Thomas Nocke and Heidrun Schumann University

More information

Record Deduplication By Evolutionary Means

Record Deduplication By Evolutionary Means Record Deduplication By Evolutionary Means Marco Modesto, Moisés G. de Carvalho, Walter dos Santos Departamento de Ciência da Computação Universidade Federal de Minas Gerais 31270-901 - Belo Horizonte

More information

Investigating Clinical Care Pathways Correlated with Outcomes

Investigating Clinical Care Pathways Correlated with Outcomes Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges

More information

Evaluating Software Products - A Case Study

Evaluating Software Products - A Case Study LINKING SOFTWARE DEVELOPMENT PHASE AND PRODUCT ATTRIBUTES WITH USER EVALUATION: A CASE STUDY ON GAMES Özge Bengur 1 and Banu Günel 2 Informatics Institute, Middle East Technical University, Ankara, Turkey

More information

Web Database Integration

Web Database Integration Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

More information

Extracting Information from Social Networks

Extracting Information from Social Networks Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

The Importance of Analytics

The Importance of Analytics CIPHER Briefing The Importance of Analytics July 2014 Renting 1 machine for 1,000 hours will be nearly equivalent to renting 1,000 machines for 1 hour in the cloud. This will enable users and organizations

More information

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD

IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Journal homepage: www.mjret.in ISSN:2348-6953 IMPROVING BUSINESS PROCESS MODELING USING RECOMMENDATION METHOD Deepak Ramchandara Lad 1, Soumitra S. Das 2 Computer Dept. 12 Dr. D. Y. Patil School of Engineering,(Affiliated

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM

ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

CS 207 - Data Science and Visualization Spring 2016

CS 207 - Data Science and Visualization Spring 2016 CS 207 - Data Science and Visualization Spring 2016 Professor: Sorelle Friedler sorelle@cs.haverford.edu An introduction to techniques for the automated and human-assisted analysis of data sets. These

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Gephi Tutorial Quick Start

Gephi Tutorial Quick Start Gephi Tutorial Welcome to this introduction tutorial. It will guide you to the basic steps of network visualization and manipulation in Gephi. Gephi version 0.7alpha2 was used to do this tutorial. Get

More information

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science

Available online at www.sciencedirect.com Available online at www.sciencedirect.com. Advanced in Control Engineering and Information Science Available online at www.sciencedirect.com Available online at www.sciencedirect.com Procedia Procedia Engineering Engineering 00 (2011) 15 (2011) 000 000 1822 1826 Procedia Engineering www.elsevier.com/locate/procedia

More information

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward

More information

FiskP, DLLP and XML

FiskP, DLLP and XML XML-based Data Integration for Semantic Information Portals Patrick Lehti, Peter Fankhauser, Silvia von Stackelberg, Nitesh Shrestha Fraunhofer IPSI, Darmstadt, Germany lehti,fankhaus,sstackel@ipsi.fraunhofer.de

More information

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies

Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Text Mining Approach for Big Data Analysis Using Clustering and Classification Methodologies Somesh S Chavadi 1, Dr. Asha T 2 1 PG Student, 2 Professor, Department of Computer Science and Engineering,

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli (alberto.ceselli@unimi.it)

More information

Corporate Leaders Analytics and Network System (CLANS): Constructing and Mining Social Networks among Corporations and Business Elites in China

Corporate Leaders Analytics and Network System (CLANS): Constructing and Mining Social Networks among Corporations and Business Elites in China Corporate Leaders Analytics and Network System (CLANS): Constructing and Mining Social Networks among Corporations and Business Elites in China Yuanyuan Man, Shuai Wang, Yi Li, Yong Zhang, Long Cheng,

More information

Analysis of an Outpatient Consultation Network in a Veterans Administration Health Care System Hospital.

Analysis of an Outpatient Consultation Network in a Veterans Administration Health Care System Hospital. Analysis of an Outpatient Consultation Network in a Veterans Administration Health Care System Hospital. Alon Ben-Ari. December 8, 2015 Introduction Many healtcare organizations face the challenge of scaling

More information

A Prototype System for Educational Data Warehousing and Mining 1

A Prototype System for Educational Data Warehousing and Mining 1 A Prototype System for Educational Data Warehousing and Mining 1 Nikolaos Dimokas, Nikolaos Mittas, Alexandros Nanopoulos, Lefteris Angelis Department of Informatics, Aristotle University of Thessaloniki

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY

SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY SECURITY METRICS: MEASUREMENTS TO SUPPORT THE CONTINUED DEVELOPMENT OF INFORMATION SECURITY TECHNOLOGY Shirley Radack, Editor Computer Security Division Information Technology Laboratory National Institute

More information

Specific Usage of Visual Data Analysis Techniques

Specific Usage of Visual Data Analysis Techniques Specific Usage of Visual Data Analysis Techniques Snezana Savoska 1 and Suzana Loskovska 2 1 Faculty of Administration and Management of Information systems, Partizanska bb, 7000, Bitola, Republic of Macedonia

More information

Enhancing the Ranking of a Web Page in the Ocean of Data

Enhancing the Ranking of a Web Page in the Ocean of Data Database Systems Journal vol. IV, no. 3/2013 3 Enhancing the Ranking of a Web Page in the Ocean of Data Hitesh KUMAR SHARMA University of Petroleum and Energy Studies, India hkshitesh@gmail.com In today

More information

Using Big Data Analytics

Using Big Data Analytics Using Big Data Analytics to find your Competitive Advantage Alexander van Servellen a.vanservellen@elsevier.com 2013 Electronic Resources and Consortia (November 6 th, 2013) The Topic What is Big Data

More information

Using bibliometric maps of science in a science policy context

Using bibliometric maps of science in a science policy context Using bibliometric maps of science in a science policy context Ed Noyons ABSTRACT Within the context of science policy new softwares has been created for mapping, and for this reason it is necessary to

More information

LDIF - Linked Data Integration Framework

LDIF - Linked Data Integration Framework LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany a.schultz@fu-berlin.de,

More information

Typo-Squatting: The Curse of Popularity

Typo-Squatting: The Curse of Popularity Typo-Squatting: The Curse of Popularity Alessandro Linari 1,2 Faye Mitchell 1 David Duce 1 Stephen Morris 2 1 Oxford Brookes University, 2 Nominet UK {alinari,frmitchell,daduce}@brookes.ac.uk, stephen.morris@nominet.org.uk

More information

A Color Placement Support System for Visualization Designs Based on Subjective Color Balance

A Color Placement Support System for Visualization Designs Based on Subjective Color Balance A Color Placement Support System for Visualization Designs Based on Subjective Color Balance Eric Cooper and Katsuari Kamei College of Information Science and Engineering Ritsumeikan University Abstract:

More information

Database Marketing, Business Intelligence and Knowledge Discovery

Database Marketing, Business Intelligence and Knowledge Discovery Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski

Application of Social Network Analysis to Collaborative Team Formation

Michelle Cheatham and Kevin Cleereman, Information Directorate, AFRL, WPAFB, OH 45433. michelle.cheatham@wpafb.af.mil

RUTHERFORD HIGH SCHOOL Rutherford, New Jersey COURSE OUTLINE STATISTICS AND PROBABILITY

I. INTRODUCTION. According to the Common Core Standards (2010), Decisions or predictions are often based on data numbers

Handling the Complexity of RDF Data: Combining List and Graph Visualization

Philipp Heim and Jürgen Ziegler (University of Duisburg-Essen, Germany; philipp.heim, juergen.ziegler@uni-due.de). Abstract: An

Fault Analysis in Software with the Data Interaction of Classes

pp. 189-196, http://dx.doi.org/10.14257/ijsia.2015.9.9.17. Yan Xiaobo (1) and Wang Yichen (2). (1) Science & Technology on Reliability & Environmental

Social Media Mining. Data Mining Essentials

Introduction: The data production rate has increased dramatically (Big Data), and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data. Businesses and customers

A Knowledge Management Framework Using Business Intelligence Solutions

www.ijcsi.org. Marwa Gadu (1) and Prof. Dr. Nashaat El-Khameesy (2). (1) Computer and Information Systems Department, Sadat Academy For

Network Analysis of a Large Scale Open Source Project

2014 40th Euromicro Conference on Software Engineering and Advanced Applications. Alma Oručević-Alagić, Martin Höst, Department of Computer Science,
