DATA WAREHOUSING RESEARCH TRENDS
Research trends relevant to data warehousing and OLAP include [Cuzzocrea et al.]:
- Data source heterogeneity and incongruence
- Filtering out uncorrelated data
- Strongly unstructured nature of data sources
- High scalability
- Combining the benefits of RDBMS and NoSQL database systems
- Query optimization issues in Hadoop-based systems
- Integrating multidimensional data models and techniques with Hadoop
- Visualisation of big data analytics
DATA MINING RESEARCH TRENDS
Data mining research trends include exploration of the following [Han & Kamber]:
- Application exploration
- Scalable and interactive data mining methods
- Integration of data mining with search engines, database, data warehouse and cloud database systems
- Mining social and information networks
- Mining spatiotemporal, moving-objects and cyber-physical systems
- Mining multimedia, text, and web data
- Mining biological and biomedical data
- Data mining with software engineering and system engineering
- Visual and audio data mining
- Distributed data mining and real-time data stream mining
- Privacy protection and information security in data mining

The areas above, for both data warehousing and data mining, largely have a computer technology focus. One of our original definitions of a data warehouse was "... a collection of technologies aimed at enabling the knowledge worker to make better and faster decisions". Hence, business goals also need to be addressed when considering the use of these technologies. It is also increasingly recognised that the complexity of the data, and of the multiple technologies used by a knowledge worker, introduces its own problems.
PRACTICAL LESSONS FROM LARGE-SCALE DATA MINING
- The importance of domain knowledge.
- Look at and know your data.
- Many data mining projects fail because of the lack of adequate data: suitable data pre-processing and sampling are crucial.
- Understand the business goals when developing a mining model.
- Important metrics in training and deployment of mining models: offline training time, online prediction time, and model size or memory footprint.
- Understand the impact of architecture and scalability issues on algorithm design and implementation.
- Importance of people in considering the above.
- Importance of two under-researched areas:
  - visualisation
  - real-time interaction with large datasets.
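The three deployment metrics named above (offline training time, online prediction time, model size) can be measured with a few lines of standard-library Python. The following is a minimal sketch under stated assumptions, not a method taken from the cited work: `CentroidClassifier` and `measure_model_metrics` are hypothetical names, the toy nearest-centroid model merely stands in for a real mining model, and the serialised size of the learned parameters is used as a rough proxy for memory footprint.

```python
import pickle
import time


class CentroidClassifier:
    """Toy nearest-centroid model (hypothetical), used only as something to measure."""

    def fit(self, rows, labels):
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # One mean vector (centroid) per class label.
        self.centroids = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }
        return self

    def predict_one(self, row):
        def squared_distance(label):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[label]))

        # Pick the label whose centroid is closest to the row.
        return min(self.centroids, key=squared_distance)


def measure_model_metrics(train_rows, train_labels, test_rows):
    """Return training time, mean per-row prediction time, and model size."""
    start = time.perf_counter()
    model = CentroidClassifier().fit(train_rows, train_labels)
    training_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [model.predict_one(row) for row in test_rows]
    seconds_per_prediction = (time.perf_counter() - start) / len(test_rows)

    # Assumption: pickled parameter size approximates the model's footprint.
    model_bytes = len(pickle.dumps(model.centroids))

    return {
        "training_seconds": training_seconds,
        "seconds_per_prediction": seconds_per_prediction,
        "model_bytes": model_bytes,
        "predictions": predictions,
    }


# Illustrative data only; real measurements need realistic volumes.
train_rows = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]
train_labels = ["low", "low", "high", "high"]
metrics = measure_model_metrics(train_rows, train_labels, [[0.9, 1.1], [5.1, 4.9]])
print(metrics)
```

On data this small the timings are dominated by noise; the point of the sketch is only that the three metrics are measured separately, since a model that is cheap to train offline may still be too slow or too large to serve online.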
BIG DATA ANALYTICS TRENDS AND FUTURE DIRECTIONS
Challenges include [Assunção et al.]:
- Data management
  - Extracting meaningful content quickly out of large volumes of data, especially when it is unstructured.
  - Aggregating and correlating streaming data from multiple sources.
  - Storage enabling easy migration between data centres and cloud providers.
  - Programming models optimized for streaming and/or multidimensional data.
  - Engines combining applications with multiple programming models in a single execution abstraction.
  - Optimizing resource usage and energy consumption during execution of such applications.
- Model building and scoring
  - Techniques utilising the elasticity and scale of cloud systems.
  - Standards and APIs supporting prediction and wider analytics as services.
- Visualisation and user interaction
  - Efficient techniques supporting real-time visualisation.
  - Cost-effective devices for large-scale visualisation.
- Other challenges
  - Business models enabling analytics and analytics as a service on the cloud.
  - Supporting replicability of analytics.
  - Incorporating human expertise into the analytics process, particularly in cloud environments.
  - Location and context-aware techniques for collecting, processing, analysing and visualising large-scale mobile and sensor data.
Reading (Optional)
- V. Gopalkrishnan et al., Big Data, Big Business: Bridging the Gap, Proc. BigMine '12, August 2012.
- Y. Chen et al., Practical Lessons of Data Mining at Yahoo!, Proc. CIKM '09, 1047-1055, 2009.
- J. Lin & D. Ryaboy, Scaling Big Data Mining Infrastructure: The Twitter Experience, ACM SIGKDD Explorations, 14(2), 6-19, 2012.
- M. D. Assunção et al., Big Data computing and clouds: Trends and future directions, J. Parallel and Distributed Computing, 79-80, 3-15, 2015.