Big Data and Complex Networks Analytics
Timos Sellis, CSIT
Kathy Horadam, MGS

Big Data: What is it?
Most commonly accepted definition, by Gartner (the 3 Vs):
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making."

Big Data: some stats
High-volume, high-velocity and high-variety. Every minute (http://www.domo.com/blog/blog/2012/06/08/how-much-data-iscreated-every-minute/):
- More than 2 million emails sent
- 34,722 likes
- 100,000 tweets
- 571 websites added
- 250,000 items sold on Amazon
- $272,020 spent on web shopping

Complex Networks: What is it?
- A network with significant topological features of the kind common in real-world networks, e.g. most technological, biological and social networks
- A rapidly expanding field bringing together mathematics, engineering, computer science, sociology, epidemiology, physics and biology

Big Data and Complex Network Synergies
- Both share interesting properties:
  - Large scale (volume)
  - Complexity (variety)
  - Dynamics (velocity)
- Interesting analytics algorithms
- Many applications with both characteristics (social networks, utility networks, security, etc.)

Big Data - Research Issues (1): Mainstream
- Infrastructure and Architectures (new large-scale data architectures, cloud architectures)
- Models (data representation, storage and retrieval)
- Data Access (query processing and optimisation, privacy, security)

Big Data - Research Issues (2): Complex Data Analytics
- Computational, mathematical, statistical and algorithmic techniques for modelling high-dimensional data, large graphs and complex (interrelated) data
- Learning, inference, prediction and knowledge discovery for large volumes of dynamic data sets
- Data retrieval and data mining to facilitate pattern discovery, trend analysis and anomaly detection
- Dimensionality reduction, sparse data

Big Data - Research Issues (3): Highly Streaming Data
- Positional streams
- Social network data
- Mobile app data
- Game data

Big Data - Research Issues (4): Data Integration
- Findability and search
- Information fusion of multiple data sources
- Semantic integration
- Recommendation systems

Networks - Research Issues (1): Analytics
- Mathematical models of simpler networks do not show the significant topological features
- Network structure and community detection
- Knowledge discovery, especially of characteristic small communities (motifs) in large networks
- Bipartite networks
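
As a small illustration of the motif item on this slide, the sketch below counts the simplest motif, the triangle, in an undirected network. It is only a minimal example under stated assumptions: the function name `count_triangles` and the toy edge list are hypothetical and not part of the original slides, and real motif discovery on large networks would need far more scalable methods.

```python
from itertools import combinations


def count_triangles(edges):
    """Count triangles (3-node cliques) in an undirected simple graph."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    triangles = set()
    for node, neighbours in adjacency.items():
        # Any two neighbours of `node` that are also linked close a triangle.
        for a, b in combinations(neighbours, 2):
            if b in adjacency[a]:
                triangles.add(frozenset((node, a, b)))
    return len(triangles)


# Hypothetical toy network: two triangles sharing the edge (b, c),
# plus a pendant node e attached to d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("b", "d"), ("c", "d"), ("d", "e")]
triangle_count = count_triangles(toy_edges)
print(triangle_count)
```

Storing each triangle as a frozenset avoids counting the same three nodes once per corner.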

Networks - Research Issues (2): Dynamics
- Algorithm development: machine learning, high-dimensional data, large networks
- New topological and statistical techniques, e.g. persistent homology, which tracks connectivity changes
- RMIT could be a national leader if we could develop this further
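
To make "track connectivity changes" concrete, the sketch below covers only the simplest (0-dimensional) part of persistent homology: it adds weighted edges in increasing order of weight and records each threshold at which two connected components merge. The function name `component_merge_events`, the edge weights and the toy data are assumptions made for illustration, not something taken from the slides.

```python
def component_merge_events(num_nodes, weighted_edges):
    """Return (threshold, components_remaining) for every component merge.

    Edges are (weight, u, v) tuples over nodes 0..num_nodes-1. They are
    added in order of increasing weight, and a union-find structure keeps
    track of the connected components.
    """
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_nodes
    events = []
    for weight, u, v in sorted(weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # two components merge at this threshold
            components -= 1
            events.append((weight, components))
    return events


# Hypothetical toy data: four nodes, edges appear as the threshold grows.
toy_edges = [(0.1, 0, 1), (0.2, 2, 3), (0.3, 1, 0), (0.5, 1, 2), (0.9, 0, 3)]
events = component_merge_events(4, toy_edges)
print(events)
```

Edges that join nodes already in the same component produce no event, since they do not change connectivity at this level; higher-dimensional features such as loops would need a dedicated topological data analysis library.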

Networks - Research Issues (3): Detection and Prediction
- Identification of influential or hidden nodes or communities across networks
- Structural anomaly detection (via supervised or unsupervised learning)
- Model transmission or flow through network
[Figure: data versus fit, plotted by year from June 2001 to June 2011, with a fitting period followed by extrapolation; annotated "Correlation=94%"]
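
One very simple way to picture transmission through a network is a deterministic flood: at each step, every reached node passes the signal to all of its neighbours. The sketch below computes the step at which each node is reached from a chosen seed. This is a deliberately crude, hypothetical model (the function `spread_times` and the toy network are assumptions for illustration); it is not the model behind the fitted data on this slide, and realistic epidemic or flow models are probabilistic.

```python
from collections import deque


def spread_times(adjacency, seed):
    """Return {node: step at which it is first reached} from `seed`.

    Uses breadth-first search, i.e. every reached node transmits to all
    of its neighbours in the next step.
    """
    times = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in times:
                times[neighbour] = times[node] + 1
                queue.append(neighbour)
    return times


# Hypothetical toy network with one well-connected node ("hub").
toy_network = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub", "d"],
    "c": ["hub"],
    "d": ["b"],
}
times_from_hub = spread_times(toy_network, "hub")
times_from_d = spread_times(toy_network, "d")
print(max(times_from_hub.values()), max(times_from_d.values()))
```

Comparing the longest reach time across seeds gives a rough indication of which nodes are influential: in this toy network, seeding at the hub covers everything faster than seeding at the peripheral node d.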

Networks - Research Issues (4): Location and Spatial Networks
- Prioritised habitats

Possible Research Themes (1)
- Situation Awareness applications (disaster management, fault detection)
- Resource Management applications (ecology, environment, power network management)
- Public Health applications (epidemics, medical records)
- Financial and Forensic applications (fraud detection, money laundering)
- Smart Cities applications (transport, energy)

Possible Research Themes (2)
- Security applications (biometrics, computer and information security)
- Positioning Technologies applications (agriculture, forest health, real-time tracks, large mobile networks)
- Education (learning analytics)

RMIT today
High-interest, cutting-edge and well-funded research in:
- Large-scale data integration: data quality, etc.
- Sensor networks: data-driven complex networks, sensor network data, distributed sensor networks
- Complex networks/graphs: network/graph models and structure detection, graph mining, network/graph analysis, prediction, identification and security
- Positioning apps/technologies
- Power and transport networks: network analysis for detecting possible problems, streamed metering data, real-time analytics

RMIT today - Examples
- Anomaly detection
- Money laundering
- Epidemic spread
- Smart metering
- Biometric identification
[Figure labels: Former Employees, Current Employees, Insiders, Contractors, Trusted Business Partners, Cloud Providers]

RMIT tomorrow
- Foster collaboration between many disciplines towards large-scale information management. For example, planners, designers and technologists can collaborate on designing buildings fitted with sensors using intelligent optimisation techniques.
- Plan for a major collaborative effort, like a CRC.
- Build long-term partnerships with key international and national public and private organisations.

Preliminary SWOT analysis

Strengths
1. Infrastructure/data management
2. Complex network dynamics
3. Location-based services
4. Information retrieval
5. Optimisation
6. Theoretical analysis

Weaknesses
1. No major results/history in the area
2. Big data and complex networks, on their own, are not recognised as an RMIT strength

Opportunities
1. NICTA funding potential for an RMIT centre
2. Cover different application areas, compared to ongoing activities
3. Identify a short-term impact opportunity
4. Identify an opportunity that can attract an industry sector (e.g. logistics, energy and positioning/mobile applications)

Threats
1. A couple of CoE proposals submitted
2. Some other ongoing efforts (CRCs, government CoE)
3. Fragmentation based on disciplines, due to cultural differences