2015-04-24
BigData@BTH: Challenges and applications
Håkan Grahn, Blekinge Institute of Technology
Parisa Yousefi, Ericsson and Blekinge Institute of Technology

BigData@BTH
- Research profile financed by the Knowledge Foundation (KKS)
- 36 MSEK (KKS) + 15 MSEK (BTH) + >40 MSEK (companies)
- Sep. 2014 to Dec. 2020
- 11 companies
- 4 departments at BTH
- Focus on machine learning and data mining, and efficient implementation of such algorithms on multicore and cloud systems
Research focus
How shall we design future scalable systems for big data analytics in order to achieve a good balance between performance and resource efficiency as well as business value?

Research themes: core academic competence in all themes

Theme A: Big data analytics for decision support
- Business intelligence
- Multi-criteria decision-making
- Descriptive/predictive big data analytics

Theme B: Big data analytics for image processing
- Image classification
- Image restoration
- Pattern recognition

Theme C: Core technologies
- Data mining and knowledge discovery
- Discovery science
- Machine learning
- Real-time analytics

Theme D: Foundations and enabling technologies
- Multicore and cloud
- Data communication and networks
- Heterogeneous systems
- Real-time and scheduling
- Storage systems
- Software architecture and implementation
Balanced mix of industry partners
Partners placed around research themes A to D:
- Indigo IPEX
- MMI
- Scorett Footwear
- Contribe
- Telenor
- Ericsson
- Noda Intelligent Systems
- Wireless Maingate Nordic
- Compuverde
- Sony
- Arkiv Digital AD

Uniqueness and competitive edge
- Concrete challenges: health care domain, large distributed systems, telecommunication systems, camera devices, large-scale image processing and classification
- Unique combination of research themes A to D
Industrial challenges
Industrial challenges drive the research agenda.
(Diagram labels: Industrial challenges; Concrete projects; Results, knowledge, products)

- IC1: Real-time and large-scale quality assessment of images
- IC2: Demand-based hospital staff planning
- IC3: Customer profiling for personalized strategies & marketing
- IC4: Fraud and anomaly detection in large-scale data sets
- IC5: Automation and orchestration of cloud-based test environments
- IC6: Collection and selection of data for real-time analysis
Projects and the industrial challenges they address
(Matrix of projects P1 to P7 against challenges IC1 to IC6; each X marks a challenge that the project addresses.)
- P1, Theme A: 2 challenges
- P2, Theme A: 3 challenges
- P3, Theme B: 2 challenges
- P4, Theme C: 5 challenges
- P5, Theme C: 3 challenges
- P6, Theme D: 3 challenges
- P7, Theme D: 2 challenges
IC1: Real-time and large-scale quality assessment of images
Projects highlighted for IC1:
- P3 (B): Efficient media analysis and processing
- P4 (C): Efficient ensemble methods for challenging domains

Subprojects addressing the challenges
- P1 (A): Decision support systems for resource estimation and allocation
- P2 (A): Decision support systems for anomaly detection and visualization
- P3 (B): Efficient media analysis and processing
- P4 (C): Efficient ensemble methods for challenging domains
- P5 (C): Classification and regression in large data streams
- P6 (D): Data collection and selection in large distributed environments
- P7 (D): Resource-efficient automatic orchestration of resources in cloud systems for big data analytics
Possible applications in transport and logistics
- Distributed data collection, filtering, and storage, e.g., traffic information
- Planning and scheduling, e.g., resource planning, train schedules, maintenance
  - FLOAT: FLexibel Omplanering Av Tåglägen i drift (flexible re-planning of train paths in operation)
  - KAJT: Kapacitet i JärnvägsTrafiken (capacity in railway traffic)
- Anomaly detection, e.g., strange or unusual behavior
- Revenue management, e.g., revenue leakage, runaway costs