Text and data analytics for social network mining
Goran Nenadic, John Keane, Xiao-Jun Zeng
School of Computer Science
G.Nenadic@manchester.ac.uk
Manchester Institute of Biotechnology
Topics
- Aim and tasks
- Background
  - current work
- Challenges
Text and data analytics
- Develop a dedicated workbench for integrated data and text analytics to facilitate social science research
- Data-intensive
  - social media data is big data
  - integrate structured and unstructured data
- Knowledge-intensive
  - use meta-data, semantics
  - make use of linked data
- Task-specific, but using re-usable components
Examples of current/previous work
- Mining healthcare Web 2.0
  - Finding health-related quality of life issues (chronic diseases, e.g. thyroid cancer, brain tumours)
  - Medication safety monitoring
    - summarise patient experience, but also find outliers
  - Mental health monitoring
    - depression, self-harm
    - capturing mood changes
Examples of current/previous work
- Security intelligence
  - military intelligence
  - smart cities
  - multi-modal data
- Financial forecasting
  - Twitter Can Predict the Stock Market
  - FOREX forecasting
Examples of future work
- Evolution of Public Mood over Social Networks: Effects of Universal Interventions
  - understand the main drivers behind network dynamics of mood changes, across a population and over time
- Fashion in Science
  - can social media help reveal what is fashionable and why?
Challenges
- Social media language(s)
  - extremely dynamic, introducing new words and/or new meanings
  - extremely noisy
  - layman terminologies
  - various sub-languages
    - financial vs. healthcare
Medwatcher Social, www.medwatcher.org/icpe
Challenges
- Social media language(s)
  - long(-ish) and short texts
  - mostly subjective
    - sentiment analysis/opinion mining
    - sarcasm, irony, etc.
  - but also facts
  - multi-lingual
  - not only text
    - multi-modal/complex data
    - linking out
Challenges
- Text analytics
  - identify (key) entities of interest
  - identify (key) domain area(s)
  - learn-as-you-go vocabularies
  - just-enough grammatical processing
    - negations, speculations, sarcasm
  - identify source of information, provenance
  - mapping to knowledge sources
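The "just-enough grammatical processing" point can be illustrated with a minimal sketch: spot entities of interest with a small lexicon and flag a mention as negated when a negation cue appears shortly before it. The lexicon entries, the cue list and the window size below are hypothetical values chosen for illustration, not resources from this work, and a fixed look-back window is only a rough stand-in for real negation scope detection.

```python
import re

# Hypothetical lexicon: surface forms (including lay terms) mapped to a concept label.
LEXICON = {
    "headache": "headache",
    "tired": "fatigue",
    "fatigue": "fatigue",
    "nausea": "nausea",
}
# Hypothetical negation cues; contractions ending in "n't" are handled separately.
NEGATION_CUES = {"no", "not", "never", "without"}
WINDOW = 3  # assumed number of preceding tokens scanned for a cue


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def find_mentions(text):
    """Return (concept, negated) pairs for every lexicon term found in text."""
    tokens = tokenize(text)
    mentions = []
    for i, tok in enumerate(tokens):
        concept = LEXICON.get(tok)
        if concept is None:
            continue
        preceding = tokens[max(0, i - WINDOW):i]
        negated = any(t in NEGATION_CUES or t.endswith("n't") for t in preceding)
        mentions.append((concept, negated))
    return mentions


if __name__ == "__main__":
    print(find_mentions("No headache today but I'm so tired"))
```

A sketch like this deliberately ignores speculation and sarcasm, which the slide lists as harder cases; those would likely need more than token windows.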
Challenges
- Analytics
  - often BIG data, but there are SMALL data
  - time varying and dynamic changes
  - often requires real-time or online analytics
    - real-time pre-processing
  - causality analysis is much more difficult
    - what is the source and what is the effect?
  - summarisation and consolidation of data
    - e.g. conflicting data
  - anonymisation and disclosure
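As one possible reading of "online analytics", the sketch below keeps a running mean of per-message scores (for example mood or sentiment scores) over a sliding time window, updating in constant amortised time per message instead of re-scanning stored data. The class name, the window length and the assumption that timestamps arrive in non-decreasing order are all illustrative choices, not part of the original work.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the scores seen in the last `window_seconds` of a stream.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, score) pairs still inside the window
        self._total = 0.0

    def add(self, timestamp, score):
        """Add one observation and return the mean over the current window."""
        self._items.append((timestamp, score))
        self._total += score
        cutoff = timestamp - self.window_seconds
        # Drop observations that have fallen out of the window.
        while self._items[0][0] <= cutoff:
            _, old_score = self._items.popleft()
            self._total -= old_score
        return self._total / len(self._items)


if __name__ == "__main__":
    window = SlidingWindowMean(window_seconds=60)
    for ts, score in [(0, 1.0), (30, 0.0), (90, -1.0)]:
        print(ts, window.add(ts, score))
```

For long-running streams the running total can accumulate floating-point error; recomputing it from the deque periodically is one simple remedy.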
Challenges
- Data
  - storing and representing BIG data
    - data streaming
    - redundancy, replication, versioning
    - Linked Data?
  - access and privacy
    - can we mine the data without explicit consent?
    - ownership?
    - what is the best practice?
  - shared semantic resources
    - no standard resources
    - Linked data, ontologies and knowledge bases
Challenges
- Social Science Research Objects
  - data, methods, questions
  - http://www.researchobject.org/
- Ethics, legal et al.
  - Can we mine data without explicit consent?
  - How much do we trust social media data?
    - automatically generated tweets?
  - Aggregation of personal data?
  - What to make public?
  - Privacy preservation
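One small, partial step towards privacy preservation is pseudonymisation: replacing user handles with keyed hashes before analysis, so the same user still maps to the same token but the handle itself is not stored. The sketch below uses HMAC-SHA256 from the Python standard library; the function names, the `user_` prefix, the 12-character truncation and the mention pattern are assumptions made for illustration. This is not full anonymisation, since message content can still identify a person.

```python
import hashlib
import hmac
import re

# Assumed pattern for @mentions; real platforms have their own handle rules.
MENTION = re.compile(r"@(\w+)")


def pseudonym(handle, secret_key):
    """Map a handle to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def scrub_mentions(text, secret_key):
    """Replace every @mention in text with the pseudonym of that handle."""
    return MENTION.sub(lambda m: "@" + pseudonym(m.group(1), secret_key), text)


if __name__ == "__main__":
    key = b"keep-this-key-secret"  # hypothetical key; store it outside the dataset
    print(scrub_mentions("Thanks @Alice, feeling better today", key))
```

Using a keyed hash rather than a plain hash means that someone without the key cannot simply hash a list of known handles to reverse the mapping.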
Summary/Discussion
- Social media data is complex
  - multi-modal data
  - language is extremely complex
- Social network analysis
  - combining structured and unstructured data
  - combining raw and meta-data
  - online analytics on data streams
  - data integration is key
- Interdisciplinary approaches
- Data/knowledge intensive
What is social media?
- Social media refers to interaction among people in which they create, share, and/or exchange information and ideas in virtual communities and networks.
- Social media is "a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content."
What makes social media?
- collaborative projects (e.g. Wikipedia)
- blogs and microblogs (e.g. Twitter)
- content communities (e.g. YouTube and DailyMotion)
- social networking sites (e.g. Facebook)
- virtual game-worlds (e.g. World of Warcraft)
- virtual social worlds (e.g. Second Life)