Hands-on Tutorial: From Words to Networks: Extraction and Analysis of Semantic Network Data from Text Data. What we will do today
|
|
|
- Sabina George
- 10 years ago
- Views:
Transcription
1 Hands-on Tutorial: From Words to Networks: Extraction and Analysis of Semantic Network Data from Text Data St. Petersburg State University, May 19, 2013 Jana Diesner, PhD Assistant Professor The University of Illinois University of Illinois Urbana-Champaign 1 What we will do today Gain methodological and hands-on expertise in: 1. Information Extraction and Relation Extraction Distill relevant information from text data Construct one-mode & multi-mode semantic networks from unstructured, natural language text data Several natural language processing/ text mining techniques Pre-processing Identify salient concepts from single documents and corpora Create and apply codebooks (aka dictionaries or thesauri) Locate and classify entities that can serve as nodes for networks. Link entities into edges (relation extraction) 2. Network Analysis Collect, visualize, analyze, interpret network data Compute basic network metrics on the graph and node level 2
2 What we will do today 3. Computational Thinking A fundamental skill that people from all backgrounds can use to solve problems in their domain. Like reading, writing, arithmetic. Not a rote skill. An approach to solving problems, designing systems and understanding human behavior that draws on concepts fundamental to computer science. 3 Introduction: Network Analysis 4
3 The Concept of Networks Socio-technical: The Internet Social: High School Dating 5 Behavioral Data Interaction data Network Analysis: Workflow Data management and analysis Data representation in relational form Data analysis Utilization Answer substantive and graph-theoretic questions Develop and test hypothesis and theories Create visualizations Populate databases Simulations, assess interventions Input to further computations, e.g. machine learning
4 Questions that Network Analysis Helps to Answer General types of questions: What does it look like? (Visualization) What are the structure, functioning, dynamics of a network? Who are the key entities? (Key player analysis) Which subgroups exist? (Clustering) What would happen if? (Simulation) Specific questions: Who talks to whom? About what? How do ideas and innovations emerge, spread, change and vanish in society? Who are the key players in a network? What benefits and risks result from an observed network structure for the network and 7 its wider context? Network Analysis: Core Idea and Relevance Concurrently study nodes(entities) and edges(relations) Understand patterns of relationships between entities Understand how micro-level behavior leads to macrolevel outcomes -Widespread acknowledgement of everything being connected -Popularity of social networking services -Advances in computational solutions for network analysis Strong demand for solid knowledge & skills in network analysis in academia, administration, business.
5 Network Basics: Nodes and Edges Nodes(aka points, vertices) Classical: social agents (agents, groups): social network One type of nodes: one-mode network Multiple node classes (e.g. information and agents): multi-mode network Socio-technical networks (people and infrastructures) Edges(aka links, ties) Binary or weighted (frequency, probability, ordinal data) Directed or not Typed or not One type: simplex, multiple types: multiplex 9 Network Basics: Network Characteristics Structure Patterns of relations among entities Network analysis: understand the structure, functioning and dynamics of networks Consider entities & relations simultaneously Dynamic Complex Multitude of interactions Simple decision the node level can lead to complex structure and behavior 10
6 Network Basics: Levels of Analysis Node: Egocentric: ego and respective alters Dyad: Undirected: (N square -N)/2 Directed: (N square N) Reciprocity? Triad: Georg Simmel: triad smallest meaningful social unit Directed: 16 isomorphic classes (triad census) Cluster: grouping Complete network 11 Levels of Analysis: Triads Exercise: The enemy of my enemy is my friend The friend of my friend is my friend The friend of my enemy is my enemy The enemy of my friend is my enemy 12
7 Network Basics: Structural Balance Heider(1940): generalization of theory of cognitive dissonance, extended by Cartwright and Harary(1956) (blue = pos, red = neg) P O P O P O P O P O P O P O P O 1 or 3 +: balanced: No tensions, stable 0 or 2+: unbalanced: tension, stress, dissonance, change, unstable Network Basics: Structural Balance A triad is balanced if its sign (product of signs) is positive balanced unbalanced 14
8 From Reading Tea Leaves to Metrics Who is key? Depends on: With respect to what dimension of power? Image of David Krackhardt s kite network from 15 Node-level Network Metrics: Dimensions of Power and Influence Degree Centrality Idea: immediate contacts (ego-network per node) Power: Prestige, Action Roles: Star, Hub Computation: Sum of direct links per node Closeness Centrality Idea: Reaching Power: Fastest access to other nodes or what flows through the network Roles: Monitor, Transmitter Computation: Inverse of sum of geodesic distances (shortest path) from a node to all other nodes 16
9 Node-level Network Metrics: Dimensions of Power and Influence Betweenness Centrality Idea: Lying in between Power: Control, Mediation Roles: Broker, Gatekeeper, Bridge, Liason Computation: Extent to which a node is on shortest path between any pair of nodes Eigenvector Centrality Idea: Close to power Power: Access Roles: Lobbyist Equivalence Regular (strict) Structural (relaxed) Idea: Redundancy, Resilience Models of Network Evolution: Random Graphs Run in NetLogo time Principle: At each time step: a) equal chance for any pair of unconnected nodes to get connected b) one such pair picked randomly and linked Question: What do you think will happen? 3
10 Models of Network Evolution: Random Graphs Model: Random graphs (Erdős, Rényi 1959) A giant component emerges Component = connected group of nodes Node degree (degree centrality) in resulting network follows a normal distribution, no highly connected nodes 19 Models of Network Evolution: Scale-Free Networks Principle: Bias: a node s chance of getting linked is directly proportional to its degree. Example: Meet Ben and Amy. Everybody likes them. Therefore, at each time step, connections to Ben and Amy are a little more likely than all other connections. Question: What do you think will happen?
11 Models of Network Evolution: Scale-Free Networks Model: preferential attachment, aka scale free networks, power-law networks (Barabási & Albert 1999) Emergence of hubs (nodes with high degree centrality) A node s tenure and popularity translate into node s degree This explains the first mover s advantage (e.g. Microsoft) Question: How could Google as a late-comer in the search engine market win over the majority of users? A: Fitness function 21 Random Networks vs. Scale-Free Networks Barabasi(2002),pg. 71
12 Scale-Free Networks Skewed distributions have several names/ flavors: Power law: polynomial distribution with scale invariance Scale free: no typical/ average/representative nodes Popularized by Barabasi(2003): Linked Pareto principle, aka 80/20 rule Vilfredo Pareto, around 1900 Generalized: 80% of X cause/produce/consume20% of Y, while 20% of X cause/produce/consume 80% of Y Long Tail: a few are bestsellers while most books are sold in low numbers Zipf slaw: a word s frequency is inversely proportional to the word s rank in a frequency table Models of Network Evolution: Small World You (red) know your friends (blue) and your friend s friends (blue) (FOAF = friend of a friend) Principle: Rewiring (introduce randomness) Example: Your friend s friend Ben want to meet Amy. How many people does he have to go through to be introduced to her? Are you helpful in this process? (time 1) Now lets assume you knew Amy (t 2) You = shortcut Shortcuts make the world small Model: Small World (Milgram 1967) 24 time 1 2
13 Models of Network Evolution: Small World Milgram: experiment ($680 budget) Wichita, KS Omaha, NB send this to a personal acquaintance you know on a first name basis who is more likely than you to know the target person Wife of a divinity school teacher, Cambridge, MA Stockbroker in Boston, MA Models of Network Evolution: Small World Findings (Milgram, below): social distance > physical distance Replicated for Facebook (721 mio users, 69 billion links) Average distance: 4.74, i.e intermediaries or "degrees of separation (Backstrom et al. 2012)
14 Network Topologies: Forms of Organization and Organizing Name Underlying Principle Structural Fingerprint Erdoes Renyi Random Graph Scale Free Network Small World Network Network Topologies: Forms of Organization and Organizing Randomness Node degree follow normal distribution Preferential Attachment Most nodes have a few links while a few nodes have many links (hubs) Shortcuts Friends know each other Nodes connected to its neighbors and a few distant nodes Hierarchy Power Directed, acyclic graph Cellular Network Core Periphery Network Balance between conceal and coordinate Trust Dense connectionswithin cells, sparse connections among cells.? Nodes belong either to core or periphery. No ties between periphery nodes, but from core to core and periphery.
15 Differences to Other Domains Statistics: General utility method Independence assumption (iid: independent and identicallydistributed random variables) Social sciences: Reduction of social concepts and phenomena to unstructured data Focus on entities and their attributes Attributes static across social contexts Data collection via sampling Bills with a bell curve on it is money from the past Network Analysis: Becoming a general utility method Dependency assumption Express questions as structured variables Focus on entities and their relations Relations space and time dependent Some entity attributes can be formalized as relations Data collection via census 29 Network Basics: Network Analysis Process 1. Specification of goal, question, or task. 2. Specification of relevant nodes, edges, and network boundaries. 3. Data collection if no data given. 4. Representation of relational data as a list, matrix, or graph. 5. Analysis and utilization of relational data. 6. Result validation. Do error analysis if applicable. 7. Interpretation of the results with respect to step 1. 30
16 The network analysis process 1. Specify goal, question, or task Identified as gap or contradiction in prior work Given by client 2. Specify relevant nodes, edges, boundaries Network boundaries: Natural (all comments on a blog) Demographic, ecological (all publications by UIUC researchers) 31 The network analysis process 3. Data collection Complete population (all online information about GSLIS) Snowballing Problems? Starting point Missing isolates Archival data 4. Representation of relational data as a list, matrix, or graph All isomorphic Express the same information without loss of information Can be translated into one another (as opposed to transformed) 32
17 The network analysis process 5. Analysis and utilization of relational data Data management: database operations: store, retrieve, search, merge Heuristic: Visualization Analytical: Network Analysis Prognostic, exploring possible scenarios: Simulation Input to machine learning system Combination/ integration with classical statistics Computing some metrics on network data is not the same as doing network analysis 33 The network analysis process 6. Result validation. Do error analysis if applicable. Generalization Limitations 7. Interpretation of the results with respect to step 1. Implications Suggestions, e.g. policies Upper and lower bound for generalization: under what conditions do your findings hold truth? Derive hypotheses and theories Build models 34
18 Example: The network analysis process in action 1985*: Kenneth Lay. Houston, Texas Business: gas supplier, energy broker, global commodity trader, and other $$ Success $$! 2001: 7th-largest business organization (by revenue) in the USA, 21,000 employees in over 40 countries Stock market s darling 12/2001: Bankruptcy Charged with illicit accounting and business practices Involved Auditor: Arthur Andersen Smartest guys in the room 35 Example: Network analysis process in action 1. Substantive, network related research questions: How do the structure and functioning of an organization change during a crisis? How does interpersonal communication change during a crisis? 2. Specification of relevant nodes, edges, network boundaries Nodes: people, edges: communication Trade-off between coordination and concealment Diesner J, Frantz T, Carley KM (2005) Communication Networks from the Enron Corpus "It's Always About the People 36 Enron is no Different. Computational and Mathematical Organization Theory (CMOT), 11(3),
19 Example: Network analysis process in action 3. Data: Inquiries by SEC and FERC in > release of 620,000 s Real communication from a real organization (over time 10/ /2002) Rare glimpse into organizational processes, culture, crisis from rawrelational data ( addresses) to network data (people, content, context) to insights and knowledge Compiled career history for 676 people with full name, addresses (1 to 17 per person, average: 2.2), career history (110 job titles, 13 positions, 8 ranks) Social structure (headers) Semantics (bodies) Board (21) Chairman (10), Vice Chairman (8), Board of Directors (3) Executive Management (32) CEO (18), President (14) Senior Management (224) COO (2), CFO (4), CRO (1), Treasurer (7), Vice President (82), Director (127) Lawyers (46) Lawyer (46) Managers (119) Manager (119) Traders (27) Trader (27) Specialists (134) Andersen (3), Specialist (75), Analyst (56) Associates (136) Associate (51), Employee 37 (76), Administrator (3), Contractor (3) Example: Network analysis process in action October 2000 October , 7. Network Analysisand Interpretation -> Answers: Network: higher density, heterogeneity, cohesion, new links More by-passing of formal chains or hierarchies Pair-wise relations intensified (trusted others), triads decreased More communication, but communication diversity and entropy decrease (less words, less distinct words, higher redundancy) Discourse drifts towards polarized ends of themes and issues? Individual patterns start to strongly vary depending on addressee 38
20 Network Beispiel 2: Visualization: Co-ownership Kuenstler beyond auction houses (qualitative, text data based) 39 Network Basics: Network Visualizations Heuristic utility Appropriate for: Stimulating communication Inappropriate for: Large-scale data Objective analysis Be careful when used for consulting Layout options Idea of spring embedders Diplomatic: Circles 40
21 Bravo!You have passed the primer in network analysis! 41 Readings and References Network Analysis introductory text books: Hanneman, RA & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California. URL: Wasserman, S. and K. Faust, Social Network Analysis: Methods and Applications Cambridge University Press), Easley, D. & Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press. URL: Network models we ran them in NetLogo: Erdős, P., & Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen, 6, Barabási, A., & Albert, R. (1999). Emergence of Scaling in Random Networks. Science, 286(5439), 509. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, Milgram, Stanley. (1967). The Small World Problem, Psychology Today, 2: Backstrom, L., Boldi, P., Rosa, M., Ugander, J., & Vigna, S. (2011). Four degrees of separation. Arxiv preprint arxiv:
22 Readings and References On skewed distributions: Anderson, Chris (2006). The Long Tail: Why the Future of Business Is Selling Less of More. New York: Hyperion. Barabasi, A.L. (2002). Linked: the new science of networks Perseus Publishing, Cambridge. Newman, M. E. J. (2005). Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46: doi: / Simon, H. A. (1955). On a Class of Skew Distribution Functions. Biometrika 42: doi: / Zipf, George K. (1949) Human Behavior and the Principle of Least- Effort. Addison-Wesley. Triads: Simmel, G. (1950). The Sociology of Georg Simmel. Free Press. Balance Theory: Heider, F. (1982). The psychology of interpersonal relations: Lawrence Erlbaum. Computational Thinking: Wing, J. M. (2006). Computational thinking. CACM, 49(3), References for Images Internet: High school dating: Data from P.S. Bearman, J. Moody, & K. Stovel, Chains of affection: The structure of adolescent romantic and sexual networks, American Journal of Sociology 110, (2004), image by Mark Newman Art markets: Diesner J., Stützer, C. (2008) Finding relations. Presentation at Chemnitz Art Museum. Core periphery: Image from ladamic/img/politicalblogs.jpg) Skewed distribution: Enron:
23 Network Analysis: Learning Resources Mailing list: Organization: International Network for Social Network Analysis (INSNA): Journals: Social Network Analysis and Mining 26+information+retrieval/journal/13278 Network Science: Connections: Social Networks: 45 Network Data Sets: Types Social From small-scale (observations) to web-scale (social media) Collaboration (who works with whom) Co-author, co-appear in movies, co-edit, co-chair (corporate boards) Communication Who talks to whom Information The web, citation networks, semantic networks Technological Routers, power generation stations Biological Food web, animals, cell metabolism, neurological For typology see Kleinberg textbook, chapter 2.4, Images
24 Network Data Sets Classic small scale, small group data: at Smaller collection of classic and some older internet datasets: Large scale, social media data: More internet datasets: (Socio)Technical networks: Network Data Sets Benchmark datasets and competitions: php Kitchen sink sorted by type: Large kitchen sink: php A smaller kitchen sink: epage And an even smaller kitchen sink:
25 Network Data Sets: Collection NodeXL( Netscrape ( ape) Codebase: ScraperWiki Social Network Analysis Software: Overview and Main Stream Tools Overview: ware(name and URL, main functionality, input and output formats, platforms, license, costs) Mature in terms of metrics, matrix transformation routines, visualization, import and export from/ to various data formats Biggest player: UCINET ( limited scalability, commercial, free trial Pajek( Comparatively great scalability, free visone: for academic research) NetMiner( (commercial)
26 Network Analysis Software: Open Source Low entry, high ceiling: NodeXL ( (grouping, viz, baseline set of metrics) Currently popular: Gephi: format conversion) R routines/ packages for SNA ( now mainly integrated into statnet( Highly flexible graph manipulation code base: JUNG update since Jan 2010) Library and GUI-based tool: GUESS (uses JUNG) ( (no update since 2007) From Words to Networks: Methodology
27 Text Data Network Data Construction & Analysis Scalable, reliable, robust methods & technologies Network data Data analysis Applications Answer substantive questions about networks Fill databases Forecasting: Explore future behavior and whatif scenarios Input to other computations, e.g. machine learning Text Analysis and Social Science and Computing: Levels of Resolution Distant readings Close readings
28 Accuracy Assessment in Information Retrieval: Concepts of recall and precision FN TN Relevant items: left of straight line Retrieved items: in oval Red regions: errors: Left: false negatives Right: false positives Precision P: left green region/ oval Recall R: left green region/ left region Result TP: True positive (yeah, correct result) FN: False negative (oops, blind spots) Truth TP FP: False positive (oops, false alarm) FP TP: True negative (yeah, correctly missing results) Accuracy Assessment in Information Retrieval: Concepts of recall and precision Recall = TP/ TP+FN Precision = TP/ TP+FP Relationship? Inverse. Thus a harmonic mean is calculated: F= T*P / 0.5 (T+R) FN TP FP TN
29 Text Analysis and Social Science and Computing: Levels of Resolution Distant readings Today s workshop Close readings From Words to Networks: Association networks, based on co-occurrence of concepts in text data Basic principle and assumption: Language, information and knowledge can be modeled and represented as relational data Les Miserables: M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E 69, (2004). Diesner J, KumaraguruP, Carley KM (2005) Mental Models of Data Privacy and Security Extracted from Interviews with Indians. 55th Annual Conference 58 of International Communication Association (ICA), New York, NY, May 2005.
30 Classification: A main Task in Text Coding Ontology: the study of being or existence Taxonomy: practice & science of classification Modifying ontologies: Human updating, reindexing Solutions? Citizen science, crowdsourcing Automation Relation Extraction: One-mode and Multi-mode Networks Example from UN News Service (New York), : Jan Pronk, the Special Representative of Secretary-General Kofi Annanto Sudan, today called for the immediate return of the vehicles to World Food Programme(WFP) and NGOs. extract relational data: one mode/ semantic networks multi-mode/ meta-networks Jan Pronk Jan Pronk Sudan vehicles Kofi Annan Sudan vehicles Kofi Annan WFP NGO s WFP NGO s Knowledge Person Location Organization Resource Identify relevant nodes Classify identified nodes 60
31 From Words to Networks: Relation Extraction Diesner, J., & Carley, K. M. (2005). Revealing Social Structure from Texts: Meta-Matrix Text Analysis as a novel method for Network Text Analysis. In V. K. Narayanan & D. J. Armstrong (Eds.), Causal Mapping for Information Systems and Technology Research (pp ). Harrisburg, PA: Idea Group Publishing. 61 Motivation for Relational Text Analysis Fact: Collection and storage of large volumes of text data cheap, easy and efficient Interviews, books, news wire articles, legal documents, annual reports, data from web 1.0 (web sites) and web 2.0 ( s, blogs, chats, ) Need: Methods and tools for automated, robust and reliable knowledge discovery and reasoning about information, incl. network structures, from text data. Challenge: Effective, efficient and controlled extraction of relevant (user-defined) instances of categories (e.g. node and edge classes) from unstructured, natural language text data. 62
32 Basic Types of Information in Text Data Morphology: structure of words E.g. spelling, inflections, derivations Syntax: relationships between words e.g. parts of speech tagging Semantics: meaning of language e.g. word sense disambiguation, grammars Pragmatics: language in context and social use of language e.g. sentiment analysis, discourse analysis Relation Extraction (this lab): borrows from all of the above 63 Network Data Extracted from Texts Respective theories and methods developed across many disciplines: Artificial Intelligence (e.g. Sowa) Cognition and Linguistics (e.g. Collins) Communications (e.g. Doerfel, Monge) Political Science (e.g. Schrodt) Sociology (e.g. Carley, Mohr) Computer Science (e.g. McCallum) 64
33 Methods for Constructing Networks of Words 1. Mental Models (Spreading Activation) (Collins & Loftus 1975) 2. Case Grammar and Frame Semantics (Fillmore 1982, 1986) 3. Discourse Representation Theory (Kamp 1981) 4. Knowledge representation in AI, assertional semantic networks (Shapiro 1971, Woods 1975) 5. Centering Resonance Analysis (Corman et al. 2002) 6. Mind maps (Buzan 1974) 7. Concept maps (Novak & Gowin 1984) 8. Hypertext (Trigg & Weiser 1986) 9. Qualitative text coding (Grounded Theory) (Glaser & Strauss 1967) 10. Definitional semantic networks incl. text coding with ontologies (Fellbaum 1998) 11. Semantic Web (Berners-Lee et al. 2001, Van Atteveldt 2008) 12. Frames (Minsky 1974) 13. Semantic Grammars (Franzosi 1989, Roberts 1997) 14. Network Text Analysis in social science (Carley & Palmquist 1991) 15. Event Coding in pol. science (King & Lowe 2003, Schrodt et al. 2008) 16. Semantic networks in comm. science (Danowski 1993, Doerfel 1998) 17. Probabilistic graphical models (Howard 1989, Pearl 1988) Automation Abstraction Generalization Diesner, J. (2012). Uncovering and managing the impact of methodological choices for the computational construction 65 of socio-technical networks from texts. CMU-ISR , Carnegie Mellon University. Exercise Task: Represent the relevant information contained in the text data as network data. Questions: What criteria did you use to identify nodes? Edges? How did you come up with your criteria? How many different criteria (features) did you use? How consistent were you in applying your criteria? How similar are the solutions from different teams? 66
34 Lab: Information Extraction Relation Extraction 67 Lab: Three Step Process Text pre-processing: Natural Language Processing (NLP) techniques precondition for finding meaningful information, incl. representations of nodes and edges, in text data Selection of relevant entities (positive filter: thesaurus, entity extraction) or removal of irrelevant entities (negative filter: delete list) given the research question, data, domain Node identification (and classification) Thesaurus-based (manually or automatically built) Edge identification (and classification) Identification: Proximity based approach (cooccurrence) 68
35 Finding salient terms: Cumulative frequency Bag of Words How to: Load texts into AutoMap Create Union Concept List Generate: Concept List: Union Sort results file by decreasing cumulative frequency This is one dimension of salience, prominence, importance What are other dimensions? Zipf, George Kingsley (1949). Human Behavior and the Principle of Least Effort. Cambridge, Mass.: Addison-Wesley 69 Refine Bag of Words: Stop words Stop words listed in delete list Serves as negative filter (remove from text data what s contained in list) How to: Preprocess: Text Refinement: Apply delete list How to construct a delete list? In concept list viewer Use predefined lists for English from apply delete list panel Construct your own, one entry per line (incl. n-grams), save as.csvfile Notion of adjacency (direct vs. rhetorical, which maintains original distance of words)
36 Finding salient terms: TD*IDF What determines word s importance in corpus? Discriminating and distinguishing tf = term frequency (importance of term within document) cumulative occurrence of term x in document y tf = total number of terms in document y idf = inverse document freq. (importance of a term in corpus) idf= log total number of documents in corpus total number of documents containing term x tfidf= tf* idf tfidf: strategy and measure High if tf= high and df= low High for signal, low for noise How to: Generate: Concept List: Union 71 Finding salient terms: N-grams Meaningful multi-words units How to: Generate: generalization thesaurus: bigram Sort by decreasing frequency and decreasing tfidf, pick suitable entries, removed duplicates
37 Text pre-processing: Stemming Detects inflections and derivations of concepts Converts each term into its morpheme How to: Pre-process: text refinement: stemming Two families of stemmers: Porter (rule-based): high efficiency, poor human readability Krovetz(dictionary-based): lower efficiency, better human readability Porter, M.F An algorithm for suffix stripping. I 14(3): Krovetz, Robert (1995). Word Sense Disambiguation for Large Text Databases. Unpublished PhD Thesis, University of Massachusetts. 73 Text pre-processing: When to stop? When to stop? ( criteria ) Orcam srazor: 14th-century English logician and Franciscan friar, William of Ockham. Aka lex parsimoniae(law of parsimony) Basic idea: All other things being equal, the simplest solution is the best. Why does it matter? Don t want: Overfitting Want: Generalizability 74
38 Positive filter: Thesaurus Text term (incl. n-gram), concept, entity class Functions: disambiguation (1,2), consolidation (3,4), n-gram concatenation (3,4) Examples: 1. Apple, apple, organization 2. apple, apple, resource 3. Digital Humanities, digital_humanities, knowledge 4. computational folkloristic, digital_humanities, knowledge Positive filter: Thesaurus Construction How to use thesaurus: List relevant entities: serves as positive filter Then construct one-mode network, aka semantic network Cross-classify relevant entities with ontological categories Then construct multi-mode network Help for constructing thesauri: Computer-supported: union Concept List (terms with highest frequency and tfidf values after deletion), bigrams Automated: AutoMap: generate: thesaurus suggestion (more on slide 42) External sources (e.g. CIA World Fact Book, WordNet) Other automated techniques, e.g. Bootstrapping Limitations: Tedious, incomplete, outdated, deterministic 76
39 Ontology: the who, what, when, where, why, and how of events Where? (places) Who? (people, groups) What? (tasks, events) When? (time) house past How? (resources, knowledge) animal ghost town pale Why? (beliefs, sentiments, mental models) 77 Linking nodes: Approaches Syntax and surface patterns(fillmore, Schrodt) Linguistics: parsing trees Logical and Knowledge Representation in Artificial Intelligence (Shapiro) first order calculus, predicate logic (quantifiers) Distance based (Danowski) Communications: distance in text (windowing) or in space (Euclidean) Probabilistic, learning from data (McCallum) Machine learning techniques: probabilistic (Bayesian), kernels (N-dimensional similarity), graphical models (hidden markov models, conditional random fields), boot strapping Cites and summary in Diesner, J., & Carley, K. M. (2010). Relation Extraction from Texts (in German, title: Extraktionrelationaler DatenausTexten). In C. Stegbauer& R. Häußling(Eds.), Handbook Network Research (Handbuch Netzwerkforschung) (pp ). Vs Verlag. 78
40 Link formation: one simple approach: Distance-based Distance based approach, Windowing: Text Unit (text, paragraph, sentence) Window Size (2 to N) Adjacency (direct or rhetorical) Leader xxx actively involved xxx several industry xxx civic associations. Exercise: given the following thesaurus, what combination of distance based features results in a useful relational structure? Thesaurus: leader, leader; involved, engage; industry, industry; civic associations, civic_association Sentence, thesaurus content only, rhetorical adjacency, window size 5: civic_association leader engage industry Danowski, J. (1982). A network-based content analysis methodology for computer-mediated communication: An illustration with a computer bulletin board. In R. Bostrom (Ed.), Communication Yearbook, 6: New Brunswick, NJ: Transaction Books. 79 Distance-based node linkage How to: AutoMap: Generate: meta network: meta-network dynetlml(union) Set parameters for this course, mainly select: Directionality: bidirectional An appropriate window size An appropriate stop unit A meta-network thesaurus, and check the box: use master thesaurus format» The thesaurus needs to have the following header row:» conceptfrom, conceptto, metaontology, metaname» You don t need a value in the last column
41 Thesaurus Construction From rule-based and deterministic methods to probabilistic and machine-learning based methods How to use it in AutoMap: Load raw data, no preprocessing Generate: thesaurus suggestion Decision support wizard on that panel has overview on types Most accurate one: the model in the middle, sufficient for most work in this course: the one right above the one in the middle References Recommended further readings related to this session: McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. ACM Queue, 3(9), Diesner, J., Carley, K. M. (2011): Semantic Networks. In G. Barnett (Ed), Encyclopedia of Social Networking, (pp ). Sage Publications. Diesner, J., Carley, K. M. (2011): Words and Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking, (pp ). Sage Publications. 82
42 References Image for Precision and Recall from Wikipedia Mental Models: Johnson-Laird, P. (1983). Mental Models. Cambridge, MA: Harvard University Press. Klimoski, R., & Mohammed, S. (1994). Team mental model: Construct or metaphor? Journal of Management, 20, Rouse, W. B., & Morris, N. M. (1986). On looking into the black box; prospects and limits in the search for mental models. Psychological Bulletin, 100, Readings and References Baker, W. E., & Faulkner, R. R. (1993). The Social Organization of Conspiracy: Illegal Networks in the Heavy Electrical Equipment Industry. American Sociological Review, 58(6), Carley, K. M. (1997). Network text analysis: The network position of concepts. In C. W. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts (pp ). Mahwah, NJ, USA: Lawrence Erlbaum Associates, Inc. Diesner, J., & Carley, K. M. (2010). Relational Methods in Crime Research and Law Enforcement. In C. Stegbauer & R. Haeussling (Eds.), Handbook Network Research: Vs Verlag. Diesner, J., Frantz, T. L., & Carley, K. M. (2005). Communication Networks from the Enron Corpus. It's Always About the People. Enron is no Different. Computational & Mathematical Organization Theory, 11(3), Doerfel, M. (1998). What Constitutes Semantic Network Analysis? A Comparison of Research and Methodologies. Connections, 21(2), Mergel, I., Diesner, J., & Carley, K. M. (2010). Attention Networks among Members of Congress. Paper presented at the XXX International Sunbelt Social Network Conference. Mohr, J. (1998). Measuring Meaning Structures. Annual Reviews in Sociology, 24(1), Monge, P. R., & Contractor, N. (2003). Theories of Communication Networks. New York: Oxford University Press. Schrodt, P. A., & Gerner, D. J. (1994). Validity Assessment of a Machine-Coded Event Data Set for the Middle East, American Journal of Political Science, 38(3), Sowa, J. (1992). Semantic Networks. In S. C. Shapiro (Ed.), Encyclopedia of Artificial Intelligence (2nd ed., pp ). New York, NY, USA: Wiley and Sons. 84
43 Acknowledgements This work was supported by the National Science Foundation (NSF) IGERT , the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD , the Air Force Office of Scientific Research (AFOSR) MURI FA , the Office of Naval Research (ONR) MURI N , and a Siebel Scholarship. Additional support was provided by CASOS, the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government. I thank Nikita Basov and the Centre for German and European Studies at the St. Petersburg State University, Russia, for hosting this workshop. 85 Thank you! Q&A For any questions, comments, feedback, follow-upnow and in the future: Jana Diesner [email protected] Phone: ++1 (412) Web: 86
ConText: Network Generation from Text Data. Jana Diesner Bochum, Apr 2015
ConText: Network Generation from Text Data Jana Diesner Bochum, Apr 2015 The concept of Semantic Networks Structured representations of information and knowledge Originally meant to be used for reasoning
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Introduction to Networks and Business Intelligence
Introduction to Networks and Business Intelligence Prof. Dr. Daning Hu Department of Informatics University of Zurich Sep 17th, 2015 Outline Network Science A Random History Network Analysis Network Topological
A comparative study of social network analysis tools
Membre de Membre de A comparative study of social network analysis tools David Combe, Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises
Graph Mining Techniques for Social Media Analysis
Graph Mining Techniques for Social Media Analysis Mary McGlohon Christos Faloutsos 1 1-1 What is graph mining? Extracting useful knowledge (patterns, outliers, etc.) from structured data that can be represented
Borgatti, Steven, Everett, Martin, Johnson, Jeffrey (2013) Analyzing Social Networks Sage
Social Network Analysis in Cultural Anthropology Instructor: Dr. Jeffrey C. Johnson and Dr. Christopher McCarty Email: [email protected] and [email protected] Description and Objectives Social network analysis
How Placing Limitations on the Size of Personal Networks Changes the Structural Properties of Complex Networks
How Placing Limitations on the Size of Personal Networks Changes the Structural Properties of Complex Networks Somayeh Koohborfardhaghighi, Jörn Altmann Technology Management, Economics, and Policy Program
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
An overview of Software Applications for Social Network Analysis
IRSR INTERNATIONAL REVIEW of SOCIAL RESEARCH Volume 3, Issue 3, October 2013, 71-77 International Review of Social Research An overview of Software Applications for Social Network Analysis Ioana-Alexandra
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.
Data Mining on Social Networks Dionysios Sotiropoulos Ph.D. 1 Contents What are Social Media? Mathematical Representation of Social Networks Fundamental Data Mining Concepts Data Mining Tasks on Digital
The mathematics of networks
The mathematics of networks M. E. J. Newman Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109 1040 In much of economic theory it is assumed that economic agents interact,
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks
A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks Text Analytics World, Boston, 2013 Lars Hard, CTO Agenda Difficult text analytics tasks Feature extraction Bio-inspired
Network-Based Tools for the Visualization and Analysis of Domain Models
Network-Based Tools for the Visualization and Analysis of Domain Models Paper presented as the annual meeting of the American Educational Research Association, Philadelphia, PA Hua Wei April 2014 Visualizing
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
Text Mining - Scope and Applications
Journal of Computer Science and Applications. ISSN 2231-1270 Volume 5, Number 2 (2013), pp. 51-55 International Research Publication House http://www.irphouse.com Text Mining - Scope and Applications Miss
Segmentation and Classification of Online Chats
Segmentation and Classification of Online Chats Justin Weisz Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 [email protected] Abstract One method for analyzing textual chat
IC05 Introduction on Networks &Visualization Nov. 2009. <[email protected]>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
Supervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University
Yifan Chen, Guirong Xue and Yong Yu Apex Data & Knowledge Management LabShanghai Jiao Tong University Presented by Qiang Yang, Hong Kong Univ. of Science and Technology 1 In a Search Engine Company Advertisers
Introduction to social network analysis
Introduction to social network analysis Paola Tubaro University of Greenwich, London 26 March 2012 Introduction to social network analysis Introduction Introducing SNA Rise of online social networking
Search Engines. Stephen Shaw <[email protected]> 18th of February, 2014. Netsoc
Search Engines Stephen Shaw Netsoc 18th of February, 2014 Me M.Sc. Artificial Intelligence, University of Edinburgh Would recommend B.A. (Mod.) Computer Science, Linguistics, French,
Course Syllabus. BIA658 Social Network Analytics Fall, 2013
Course Syllabus BIA658 Social Network Analytics Fall, 2013 Instructor Yasuaki Sakamoto, Assistant Professor Office: Babbio 632 Office hours: By appointment [email protected] Course Description This
Network Theory: 80/20 Rule and Small Worlds Theory
Scott J. Simon / p. 1 Network Theory: 80/20 Rule and Small Worlds Theory Introduction Starting with isolated research in the early twentieth century, and following with significant gaps in research progress,
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING
CAPTURING THE VALUE OF UNSTRUCTURED DATA: INTRODUCTION TO TEXT MINING Mary-Elizabeth ( M-E ) Eddlestone Principal Systems Engineer, Analytics SAS Customer Loyalty, SAS Institute, Inc. Is there valuable
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Knowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs [email protected] Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
Active Learning SVM for Blogs recommendation
Active Learning SVM for Blogs recommendation Xin Guan Computer Science, George Mason University Ⅰ.Introduction In the DH Now website, they try to review a big amount of blogs and articles and find the
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
Network Analysis Basics and applications to online data
Network Analysis Basics and applications to online data Katherine Ognyanova University of Southern California Prepared for the Annenberg Program for Online Communities, 2010. Relational data Node (actor,
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg
Module Catalogue for the Bachelor Program in Computational Linguistics at the University of Heidelberg March 1, 2007 The catalogue is organized into sections of (1) obligatory modules ( Basismodule ) that
Foundations of Business Intelligence: Databases and Information Management
Foundations of Business Intelligence: Databases and Information Management Problem: HP s numerous systems unable to deliver the information needed for a complete picture of business operations, lack of
Extracting Information from Social Networks
Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun,
Temporal Visualization and Analysis of Social Networks
Temporal Visualization and Analysis of Social Networks Peter A. Gloor*, Rob Laubacher MIT {pgloor,rjl}@mit.edu Yan Zhao, Scott B.C. Dynes *Dartmouth {yan.zhao,sdynes}@dartmouth.edu Abstract This paper
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Social Network Analysis using Graph Metrics of Web-based Social Networks
Social Network Analysis using Graph Metrics of Web-based Social Networks Robert Hilbrich Department of Computer Science Humboldt Universität Berlin November 27, 2007 1 / 25 ... taken from http://mynetworkvalue.com/
Get the most value from your surveys with text analysis
PASW Text Analytics for Surveys 3.0 Specifications Get the most value from your surveys with text analysis The words people use to answer a question tell you a lot about what they think and feel. That
Technical Report. The KNIME Text Processing Feature:
Technical Report The KNIME Text Processing Feature: An Introduction Dr. Killian Thiel Dr. Michael Berthold [email protected] [email protected] Copyright 2012 by KNIME.com AG
W. Heath Rushing Adsurgo LLC. Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare. Session H-1 JTCC: October 23, 2015
W. Heath Rushing Adsurgo LLC Harness the Power of Text Analytics: Unstructured Data Analysis for Healthcare Session H-1 JTCC: October 23, 2015 Outline Demonstration: Recent article on cnn.com Introduction
How to Analyze Company Using Social Network?
How to Analyze Company Using Social Network? Sebastian Palus 1, Piotr Bródka 1, Przemysław Kazienko 1 1 Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370 Wroclaw, Poland {sebastian.palus,
Business Intelligence and Decision Support Systems
Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley
Network VisualizationS
Network VisualizationS When do they make sense? Where to start? Clement Levallois, Assist. Prof. EMLYON Business School v. 1.1, January 2014 Bio notes Education in economics, management, history of science
A Tutorial on dynamic networks. By Clement Levallois, Erasmus University Rotterdam
A Tutorial on dynamic networks By, Erasmus University Rotterdam V 1.0-2013 Bio notes Education in economics, management, history of science (Ph.D.) Since 2008, turned to digital methods for research. data
01219211 Software Development Training Camp 1 (0-3) Prerequisite : 01204214 Program development skill enhancement camp, at least 48 person-hours.
(International Program) 01219141 Object-Oriented Modeling and Programming 3 (3-0) Object concepts, object-oriented design and analysis, object-oriented analysis relating to developing conceptual models
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
HISTORICAL DEVELOPMENTS AND THEORETICAL APPROACHES IN SOCIOLOGY Vol. I - Social Network Analysis - Wouter de Nooy
SOCIAL NETWORK ANALYSIS University of Amsterdam, Netherlands Keywords: Social networks, structuralism, cohesion, brokerage, stratification, network analysis, methods, graph theory, statistical models Contents
1. Classification problems
Neural and Evolutionary Computing. Lab 1: Classification problems Machine Learning test data repository Weka data mining platform Introduction Scilab 1. Classification problems The main aim of a classification
Visualizing Complexity in Networks: Seeing Both the Forest and the Trees
CONNECTIONS 25(1): 37-47 2003 INSNA Visualizing Complexity in Networks: Seeing Both the Forest and the Trees Cathleen McGrath Loyola Marymount University, USA David Krackhardt The Heinz School, Carnegie
Week 3. Network Data; Introduction to Graph Theory and Sociometric Notation
Wasserman, Stanley, and Katherine Faust. 2009. Social Network Analysis: Methods and Applications, Structural Analysis in the Social Sciences. New York, NY: Cambridge University Press. Chapter III: Notation
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks
Chapter 29 Scale-Free Network Topologies with Clustering Similar to Online Social Networks Imre Varga Abstract In this paper I propose a novel method to model real online social networks where the growing
Temporal Dynamics of Scale-Free Networks
Temporal Dynamics of Scale-Free Networks Erez Shmueli, Yaniv Altshuler, and Alex Sandy Pentland MIT Media Lab {shmueli,yanival,sandy}@media.mit.edu Abstract. Many social, biological, and technological
How to do a Business Network Analysis
How to do a Business Network Analysis by Graham Durant-Law Copyright HolisTech 2006-2007 Information and Knowledge Management Society 1 Format for the Evening Presentation (7:00 pm to 7:40 pm) Essential
ProteinQuest user guide
ProteinQuest user guide 1. Introduction... 3 1.1 With ProteinQuest you can... 3 1.2 ProteinQuest basic version 4 1.3 ProteinQuest extended version... 5 2. ProteinQuest dictionaries... 6 3. Directions for
Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer
Machine Learning Chapter 18, 21 Some material adopted from notes by Chuck Dyer What is learning? Learning denotes changes in a system that... enable a system to do the same task more efficiently the next
Collecting Polish German Parallel Corpora in the Internet
Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska
Bagged Ensemble Classifiers for Sentiment Classification of Movie Reviews
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 3 Issue 2 February, 2014 Page No. 3951-3961 Bagged Ensemble Classifiers for Sentiment Classification of Movie
Data Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka ([email protected]) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
General Network Analysis: Graph-theoretic. COMP572 Fall 2009
General Network Analysis: Graph-theoretic Techniques COMP572 Fall 2009 Networks (aka Graphs) A network is a set of vertices, or nodes, and edges that connect pairs of vertices Example: a network with 5
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH
205 A STUDY OF DATA MINING ACTIVITIES FOR MARKET RESEARCH ABSTRACT MR. HEMANT KUMAR*; DR. SARMISTHA SARMA** *Assistant Professor, Department of Information Technology (IT), Institute of Innovation in Technology
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University [email protected] [email protected] ABSTRACT Depending
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
How To Use Neural Networks In Data Mining
International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and
Investigating Clinical Care Pathways Correlated with Outcomes
Investigating Clinical Care Pathways Correlated with Outcomes Geetika T. Lakshmanan, Szabolcs Rozsnyai, Fei Wang IBM T. J. Watson Research Center, NY, USA August 2013 Outline Care Pathways Typical Challenges
Social Networks and Social Media
Social Networks and Social Media Social Media: Many-to-Many Social Networking Content Sharing Social Media Blogs Microblogging Wiki Forum 2 Characteristics of Social Media Consumers become Producers Rich
Big Data Text Mining and Visualization. Anton Heijs
Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
Improving Decision Making and Managing Knowledge
Improving Decision Making and Managing Knowledge Decision Making and Information Systems Information Requirements of Key Decision-Making Groups in a Firm Senior managers, middle managers, operational managers,
Mining Text Data: An Introduction
Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
Search and Information Retrieval
Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search
Visualizing e-government Portal and Its Performance in WEBVS
Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR [email protected] Abstract An e-government
Data Mining Yelp Data - Predicting rating stars from review text
Data Mining Yelp Data - Predicting rating stars from review text Rakesh Chada Stony Brook University [email protected] Chetan Naik Stony Brook University [email protected] ABSTRACT The majority
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System
An Overview of a Role of Natural Language Processing in An Intelligent Information Retrieval System Asanee Kawtrakul ABSTRACT In information-age society, advanced retrieval technique and the automatic
Search and Data Mining: Techniques. Text Mining Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Text Mining Anya Yarygina Boris Novikov Introduction Generally used to denote any system that analyzes large quantities of natural language text and detects lexical or
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
A Review of Data Mining Techniques
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 4, April 2014,
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
MINFS544: Business Network Data Analytics and Applications
MINFS544: Business Network Data Analytics and Applications March 30 th, 2015 Daning Hu, Ph.D., Department of Informatics University of Zurich F Schweitzer et al. Science 2009 Stop Contagious Failures in
Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
Towards better understanding Cybersecurity: or are "Cyberspace" and "Cyber Space" the same?
Towards better understanding Cybersecurity: or are "Cyberspace" and "Cyber Space" the same? Stuart Madnick Nazli Choucri Steven Camiña Wei Lee Woon Working Paper CISL# 2012-09 November 2012 Composite Information
Important dimensions of knowledge Knowledge is a firm asset: Knowledge has different forms Knowledge has a location Knowledge is situational Wisdom:
Southern Company Electricity Generators uses Content Management System (CMS). Important dimensions of knowledge: Knowledge is a firm asset: Intangible. Creation of knowledge from data, information, requires
Mining a Corpus of Job Ads
Mining a Corpus of Job Ads Workshop Strings and Structures Computational Biology & Linguistics Jürgen Jürgen Hermes Hermes Sprachliche Linguistic Data Informationsverarbeitung Processing Institut Department
Sentiment analysis on tweets in a financial domain
Sentiment analysis on tweets in a financial domain Jasmina Smailović 1,2, Miha Grčar 1, Martin Žnidaršič 1 1 Dept of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia 2 Jožef Stefan International
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu
An Introduction to Data Mining
An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING
dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on
Semantically Enhanced Web Personalization Approaches and Techniques
Semantically Enhanced Web Personalization Approaches and Techniques Dario Vuljani, Lidia Rovan, Mirta Baranovi Faculty of Electrical Engineering and Computing, University of Zagreb Unska 3, HR-10000 Zagreb,
