Methods for Analysing Large-Scale Resources and Big Music Data

1 Methods for Analysing Large-Scale Resources and Big Music Data
Tillman Weyde
Department of Computer Science, Music Informatics Research Group
City University London

2 Overview
- Large-Scale Music Analysis
- Big Music Data
- Available technologies and methods
- The Digital Music Lab
- Architecture
- Data
- Example
- Chord recognition
- Chord sequence analysis
- Visualisations
- Hands-on tasks
- Exploring British Library content
The DML Project: Objectives and Methodology

3 Large Scale Music Analysis
- Music has gone digital on a large scale.
- What about musicology?
- What is different about large-scale data?
- How can we analyse large-scale music data?
The Digital Music Lab Project

4 Large Scale Music Analysis
- Big music data collections are becoming increasingly available digitally, e.g.
- iTunes and Spotify (and others) offer over 30 million tracks each
- British Library holds ~3 million audio recordings (~10% are digitized), 1.5 million music prints, 100k manuscripts
- Internet Archive: 2.5 million audio tracks

5 Large Scale Music Analysis
- Big Music Data in Research
- Systematic musicology has developed as "data oriented empirical research" (Parncutt, R. Systematic musicology and the history and future of Western musical scholarship. Journal of Interdisciplinary Music Studies, 1, 1-32.)
- ... working with larger data sets will open up new areas of musicology. (N. Cook (2005). The Compleat Musicologist. Keynote speech at ISMIR.)

6 Big Data Analysis Workflow
- Acquisition, Storage and Management
- Large Hardware & Software Systems
- Exploration, Hypothesis development
- Query & Search, Visualization
- Modelling & Testing
- Statistical tools

7 Big Data Analysis Technology
- Parallel processing on large arrays
- Cheap unreliable hardware
- Software Architecture
- Map/Reduce, Computation Graphs (Hadoop, Spark)
- Algorithms
- Failure-tolerant, efficient, simple
- Visualisations
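
The map/reduce pattern named on this slide can be illustrated without a cluster. The following is a minimal single-machine sketch in plain Python, not Spark or Hadoop code; the chord labels and the function names `map_track` and `reduce_counts` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical per-track chord labels, standing in for extracted features.
tracks = [
    ["C", "G", "Am", "F"],
    ["C", "F", "G", "C"],
    ["Am", "F", "C", "G"],
]

def map_track(chords):
    """Map step: turn one track into partial chord counts."""
    return Counter(chords)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts (associative and commutative)."""
    return left + right

# Each map call is independent, so it could run on a separate worker.
partial_counts = map(map_track, tracks)
total = reduce(reduce_counts, partial_counts, Counter())
print(total)
```

Because the reduce step is associative and commutative, partial results can be merged in any order, which is what makes the pattern tolerant of failed and re-run workers.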

8 Big Music Data Applications (1)
- Popular:
- Animated
- Recommendation (network based)
Narrative 2.0, Liveplasma 2.0

9 Big Music Data Applications (2)
- Not many: e.g. music history graph by Google (since 1950, not classical music)

10 Digital Transformations in Musicology
- Challenges
- Gap between musicology and music technology (music information retrieval)
- Large heterogeneous data collections
- Need for software infrastructure
- Audio and symbolic music processing
- Connecting resources (semantic web, linked data)
- Tools and visual interfaces
- Methods for gaining musical insight from data

11 Open Questions (partly)
- How can music research use audio transcription and analysis on large data collections?
- How can we provide an infrastructure that enables researchers to make use of large data collections and create reusable open datasets?
- How can computational tools be made usable for music researchers, musicians and other users (who are not necessarily computer scientists)?

12 Musicological Questions From 2014 Workshop
- Analysing styles, trends over time
- Work across different heterogeneous collections
- Utilise external metadata and annotations

13 Infrastructure needed
- Feature Extraction
- Vamp and other plug-ins
- Parallelisation
- Middleware
- Semantic Web
- Music Ontology
- Aggregation and collection level analysis
13/03/2015

14 Large Scale Music Analysis
- Plan for this session
- Show Digital Music Lab and our approach to large scale music analysis
- Workflow and technologies
- Present features and interfaces
- Hands-on data exploration
- Methods for further analysis
- Discussion

15 The Digital Music Lab

16 The Digital Music Lab project
- January March 2015 small follow-up project running now
- City University (Dpt of Computer Science, Dpt of Music)
  - Tillman Weyde, Stephen Cottrell, Jason Dykes, Emmanouil Benetos, Daniel Wolff, Dan Tidhar, Alex Kachkaev
- Queen Mary UoL (Centre for Digital Music)
  - Mark Plumbley, Simon Dixon, Mathieu Barthet, Steven Hargreaves
- University College London (Dpt of Computer Science, Centre for Digital Humanities)
  - Nicolas Gold, Samer Abdallah
- British Library (BL Labs)
  - Aquiles Alencar-Brayner, Mahendra Mahey, Adam Tovell

17 Goals
- Develop a networked infrastructure to bring computation to the data
- Avoid copyright problems by design
- Integrate audio feature extraction and transcription
- Development of analysis tools
- Interactive visual interfaces
- Musicological applications

18 Outputs
- Curated datasets and derived data (>4 Terabytes)
- Web service with visual interfaces for data exploration
- Publications (more to come)
- Redistributable virtual machine images (in preparation)

19 The Digital Music Lab
- Overview

20 The DML System Provides...
- Access: Systematic exploration of heterogeneous and large music libraries
- Control: Interfacing with complex automatic music analysis tools
- Analysis: Gain summarised knowledge on large numbers of recordings
- Sharing: Experiments reproducible with same data, clear provenance of analysis results.

21 The Technical Perspective
- Access to data
- Audio access restricted by physical location
- Metadata unification of different formats
- Control via web interface to large-scale analysis
- Interactive UI for overview and exploration
- Scalable analysis is available on collection-level and recording-level
- Share the well-defined and derived data
- Re-use of existing software and published code for analysis

22 Software Ecosystem
- Distributed system
- Virtual machines (VirtualBox)
- Open Source OS (Ubuntu)
- Parallelised existing analysis tools
- Python (NumPy)
- Vamp Plugins
- Big-Data map-reduce (Spark)
- Computation management
- Built on semantic architecture
- Interactive user interface for exploration and analysis
- Built using state-of-the-art web technologies

23 Data-Flow for Computational Analysis
[Diagram. Components: User Interface; Web Server; Database: Results & Metadata; Analysis Management: ClioPatria; Audio, Transcriptions and Feature Storage; Computing Server. Other labels: Provide; Access Audio and Features.]

24 Physical Locations Matter: Content Access
- Two computing servers, located at BL and ILM
- Allow for in-place access to restricted data
- Dedicated server at City for web access

25 Sustainability
- Preference for open source
  - Basic infrastructure (Ubuntu, Spark, Vamp...)
- Soundsoftware repository for
  - Publishing versioned code of newly developed software
  - Backup and sharing: Open data / features / results
- Open and reproducible method
- Enables similar set-up in further institutions

26 Results Implemented in the DML System
- Conceptual framework (including implementation) for collection-level analysis
- Collection in focus as object of analysis
- Data-flow for interactive retrieval of results
- Secure, responsive and redundant network structure
- Distributed computation resources
- Open-source software ecosystem for large-scale music analysis
- Parallelised feature extraction and results management
- Collection-level analysis, interface and visualisation

27 Feature Extraction

28 Audio Descriptors List
1. Spectrogram
2. MFCCs
3. Chroma
4. Onsets
5. Speech/Music Segmentation
6. Chords
7. Beats/Tempo
8. Key
9. Melody
10. Note Transcription

29 Raw Audio
Sample from CHARM: JS Bach, Chorale Prelude - Beloved Jesus, Cohen, Harriet (piano), Columbia,

30 Spectrogram
2 versions:
- STFT magnitude spectrogram
- Constant-Q Transform magnitude spectrogram

31 MFCCs
- Stand for: Mel-Frequency Cepstral Coefficients
- Extracted using QM Vamp Plugin Set

32 Chroma
- Spectrum projected onto 12 bins (representing semitones of an octave)
- Extracted using: QM Chromagram and NNLS Chroma Vamp plugins
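
The projection of a spectrum onto 12 bins can be sketched as follows. This is a simplified illustration, assuming equal temperament with A4 = 440 Hz and nearest-semitone rounding; it is not how the QM Chromagram or NNLS Chroma plugins are implemented, and `fold_to_chroma` is a hypothetical name.

```python
import math

def fold_to_chroma(freqs_hz, magnitudes, a4_hz=440.0):
    """Sum spectral magnitudes into 12 pitch-class bins (0 = C, ..., 11 = B)."""
    chroma = [0.0] * 12
    for freq, mag in zip(freqs_hz, magnitudes):
        # MIDI note number of the frequency (69 = A4), rounded to a semitone.
        midi = round(69 + 12 * math.log2(freq / a4_hz))
        # Octave information is discarded by taking the note number modulo 12.
        chroma[midi % 12] += mag
    return chroma

# Three hypothetical partials: A3 (220 Hz), A4 (440 Hz), E5 (about 659.26 Hz).
chroma = fold_to_chroma([220.0, 440.0, 659.26], [1.0, 0.5, 0.25])
print(chroma)
```

Both A partials land in bin 9 regardless of octave, and the E partial lands in bin 4.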

33 Onsets
- Onset: the beginning of a musical note or another sound
- Extracted using QM Onset Vamp plugin

34 Speech/Music Segmentation
- Useful for ethnographic recordings/radio broadcasts
- Extracted using BBC Speech/Music Segmentation Vamp Plugin

35 Chords
- Extracted using Chordino Vamp Plugin

36 Beats
- Beat locations labelled with metrical position
- Extracted using Beatroot, Marsyas, Tempotracker Vamp Plugins

37 Tempo
- Estimated based on onset/beat information
- Extracted using Tempotracker and Tempogram Vamp plugins
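
The relation between beat locations and tempo can be made concrete with a small sketch. It assumes beat times in seconds are already available and uses the inter-beat interval directly; the Tempotracker and Tempogram plugins use their own, more robust methods, and the function names here are hypothetical.

```python
from statistics import median

def tempo_bpm(beat_times_s):
    """Global tempo estimate from the median inter-beat interval."""
    intervals = [b - a for a, b in zip(beat_times_s, beat_times_s[1:])]
    if not intervals:
        raise ValueError("need at least two beats")
    return 60.0 / median(intervals)

def tempo_curve(beat_times_s):
    """Local tempo in BPM for each pair of consecutive beats."""
    return [60.0 / (b - a) for a, b in zip(beat_times_s, beat_times_s[1:])]

# Hypothetical beat times in seconds; the last beat arrives late (a slight ritardando).
beats = [0.0, 0.5, 1.0, 1.5, 2.1]
print(tempo_bpm(beats))
print(tempo_curve(beats))
```

The median makes the global estimate insensitive to a few stretched or missed beats, while the curve keeps the local variation that tempo-curve visualisations show.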

38 Key
- Extracted using QM Key Vamp plugin (supports major/minor keys)

39 Melody
- Or more precisely: Sequence of fundamental frequency (F0) values corresponding to the perceived pitch of the main melody.
- Extracted using MELODIA Vamp plugin

40 Note Transcription: Semitone Resolution
- Multiple-pitch detection (onset/offset/pitch/velocity)
- Extracted using Silvet Vamp plugin
- Synthesized transcription example:

41 Note Transcription: High Pitch Resolution
- Multiple-pitch detection on a 20-cent resolution, useful for tuning/temperament analysis and analysis of non-western music
- Extracted using Silvet Vamp plugin
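
To show what a 20-cent resolution means, the sketch below converts frequencies to cents (1200 cents per octave, 100 per equal-tempered semitone) and quantises them to 20-cent bins, i.e. five bins per semitone. The reference of 440 Hz and the function names are assumptions for illustration; this is not the Silvet implementation.

```python
import math

def cents_from_reference(freq_hz, ref_hz=440.0):
    """Interval between freq_hz and ref_hz in cents (1200 cents = one octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def quantise_to_bin(freq_hz, bin_cents=20.0, ref_hz=440.0):
    """Return (bin index, bin centre in cents) of the nearest pitch bin."""
    cents = cents_from_reference(freq_hz, ref_hz)
    index = round(cents / bin_cents)
    return index, index * bin_cents

# A just major third (frequency ratio 5:4) above 440 Hz is about 386.3 cents,
# roughly 14 cents below the equal-tempered third at 400 cents.
print(cents_from_reference(550.0))
print(quantise_to_bin(446.0))
```

Differences of this size between tuning systems are smaller than a semitone, which is why the finer grid matters for temperament analysis.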

42 Data
- British Library
- CHARM
- I Like Music

43 Data - British Library Music Dataset
- Currently identifying, organising, and curating available music data collections from the BL Sound Archive
- Over 3M digital audio recordings, in a variety of formats
- Copyright-cleared material will be made available to the public
- Copyright-restricted material will be accessible to BL users

44 Data - I Like Music Dataset
- I Like Music: digital music service provider to companies who hold public performance licences
- Sole provider of online music to the BBC
- Holds a commercial music library of 1.2M tracks and a production music library of 400k tracks

45 Data - CHARM and Mazurka Dataset
- CHARM: AHRC Research Centre for the History and Analysis of Recorded Music
- CHARM Dataset: 5k copyright-free historical recordings + metadata
- Mazurka Dataset: 3k recordings + metadata
- Ideal for musicological analysis using computational methods

46 Extracted features available today
- ILM, BL, CHARM datasets ~ tracks
- Transcriptions: MIDI and high resolution
- Beats and tempo curves
- Chroma
- Chord and key

47 Infrastructure
- Feature Extraction
  - Vamp plug-ins
  - Spark and other techniques for parallelisation
- Middleware
  - Semantic Web server (RDF with Prolog using ClioPatria)
  - Music Ontology
  - Manages aggregation and collection level analysis
  - Provides SPARQL endpoint

48 Infrastructure
- Derived data from 2 collections
- Accessible via the web

49 Interfaces and Visualisations
- Audio collections
- Chord sequence patterns
- Tag crowd-sourcing

50 Studies
- Temperament
- Chord progressions
[Figure: temperament comparison; labels: ET, Just, Vallotti, fcmtge, scmtfd, Kellner, Lehman]

51 Examples
- Chord Sequence Analysis

52 Large-Scale Analysis of Chord Sequences
- Extracted chord sequences (e.g. Am7/E7/Gmaj7, etc.)
- On ILM's commercial music collection (1m tracks)
- Parallel processing of multiple music clips
- Chordino Vamp plugin (Queen Mary University of London)
- 6 weeks on 8 core virtual machine
- Retrieve most frequent chord patterns using Sequential Pattern Mining (SPM)
- In specific genre subsets (classical, folk, jazz, blues, rock, reggae)
- Chord pattern graphs visualised with open source graphviz
Barthet, M. et al. Big Chord Data Extraction and Mining. In: CIM 2014

53 Audio-based Automatic Chord Recognition
- Chordino Vamp plugin [Mauch and Dixon, ISMIR 2010].
- Uses a chromagram obtained with a non-negative least squares (NNLS) procedure for approximate note transcription.
- Accuracy of 80% when assessed with the MIREX 2009 dataset (popular songs from the Beatles and Queen).
- Excerpt from Buddy Guy's Mary Had A Little Lamb

54 Chord Sequence Patterns
- Segmentation of audio recordings into chord sequences
- Representation of no detection (N)
- The most frequent (non-contiguous) chord sequences are taken as patterns
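
The idea of frequent non-contiguous patterns can be sketched with a naive support count: a pattern is an ordered chord subsequence (gaps allowed), and its support is the number of recordings containing it. This brute-force version is only an illustration under those assumptions; it is not the SPM algorithm used in the project, and the chord data and the name `frequent_patterns` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(sequences, length, min_support):
    """Count in how many sequences each ordered (possibly gapped) pattern occurs."""
    support = Counter()
    for seq in sequences:
        # combinations() preserves the original order, so every tuple is a
        # subsequence; the set makes a pattern count once per sequence.
        support.update(set(combinations(seq, length)))
    return {p: n for p, n in support.items() if n >= min_support}

# Hypothetical chord sequences, one per recording.
sequences = [
    ["C", "Am", "F", "G", "C"],
    ["C", "F", "G", "C"],
    ["Am", "F", "C", "G"],
]
patterns = frequent_patterns(sequences, length=3, min_support=2)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

The number of subsequences grows combinatorially with sequence length, so practical sequential pattern mining algorithms prune candidates instead of enumerating them all.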

55 Chord Pattern Length
- Bell-shaped curve for all genres
- Jazz and Classical have more patterns: greater harmonic diversity?
- Folk has more long patterns?

56 Frequent Chord Patterns: Classical vs. Blues
[Figure panels: Classical, Blues]
- Classical patterns more distributed
- Blues connects dom7 with dom7, classical doesn't (dom7 is typically resolved to major or minor tonic)

57 Visualisation of Chord Sequence Patterns
- Most prominent chord sequences
- Compare two collections or visualisations
Kachkaev, A. et al. Visualising Chord Progressions in Music Collections. In: CIM 2014

58 Circular Grid (circle of fifths, straight and twisted)

59 Parallel coordinates (chord types, circle of fifths)

60 Transition matrix (chord types, roots)

61 Tonnetz

62 Folk vs. Jazz

63 Demo

64 Practical exercises
- Open (similarity) or (tempo curve)
- Follow the worksheet or explore freely
- Pitch (class) distribution
- Similarity
- Tuning
- Tempo curves
- Take notes of results, ideas and hypotheses
- You can explore the chord sequences here:

65 Practical exercises
- Tasks 1
- Schoenberg has relatively flat pitch profile
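
One possible way to put a number on how "flat" a pitch profile is, offered here only as an illustration, is the normalised entropy of the pitch-class distribution. The sketch assumes transcribed notes are given as MIDI pitch numbers; the function names and the example data are hypothetical, and the measure is not part of the DML interface.

```python
import math
from collections import Counter

def pitch_class_profile(midi_pitches):
    """Relative frequency of each of the 12 pitch classes (0 = C)."""
    counts = Counter(p % 12 for p in midi_pitches)
    total = sum(counts.values())
    return [counts.get(pc, 0) / total for pc in range(12)]

def flatness(profile):
    """Normalised Shannon entropy: 1.0 for a uniform profile, 0.0 for one pitch class."""
    entropy = -sum(p * math.log2(p) for p in profile if p > 0)
    return entropy / math.log2(12)

chromatic = list(range(60, 72))        # every pitch class exactly once
triad = [60, 64, 67, 60, 64, 67, 72]   # C major triad notes only

print(flatness(pitch_class_profile(chromatic)))
print(flatness(pitch_class_profile(triad)))
```

A value near 1.0 indicates that all twelve pitch classes are used about equally, while strongly tonal material concentrates on a few pitch classes and scores lower.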

66 Practical exercises
- Tasks 2
- Mozart sonatas seem more homogeneous

67 Practical exercises
- Tasks 3
- Tuning was stable between these periods

68 Practical exercises
- Tasks 4
- Tempo smoothes out, some final ritardando

69 Where to go from here
- Found something interesting?
- Results need careful interpretation with respect to
  - Noisy data
  - Data collection
  - Significance
  - Cross-validation
  - Meaning
- Can be challenging, needs expertise in music and data

70 Where to go from here (ctd.)
- Follow up ideas and hypotheses by close examination of the data
- Extract data with SPARQL interface (more on Semantic Web and SPARQL tomorrow)
- Use programming languages and statistical tools, e.g. Python, Pandas, SciPy, R, Matlab, SPSS
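
As a starting point for extracting data through a SPARQL interface from Python, the sketch below builds a GET request URL in the style of the SPARQL 1.1 Protocol. The endpoint URL is a placeholder, and the query (Music Ontology tracks with Dublin Core titles) is only an assumption about how such data might be modelled; the actual DML endpoint address and schema are not given in these slides.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; replace with the address of the real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Illustrative query; the classes and properties actually used may differ.
QUERY = """
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?track ?title
WHERE {
  ?track a mo:Track ;
         dc:title ?title .
}
LIMIT 10
"""

def build_request_url(endpoint, query):
    """Encode a SPARQL query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```

The resulting URL can be fetched with any HTTP client, and the returned result table loaded into Pandas or R for statistical analysis.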

71 The end
- Open discussion
- Thank you

Automatic Transcription: An Enabling Technology for Music Analysis

Automatic Transcription: An Enabling Technology for Music Analysis Automatic Transcription: An Enabling Technology for Music Analysis Simon Dixon simon.dixon@eecs.qmul.ac.uk Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary University

More information

1 About This Proposal

1 About This Proposal 1 About This Proposal 1. This proposal describes a six-month pilot data-management project, entitled Sustainable Management of Digital Music Research Data, to run at the Centre for Digital Music (C4DM)

More information

Music recommendation for music learning: Hotttabs, a multimedia guitar tutor

Music recommendation for music learning: Hotttabs, a multimedia guitar tutor Music recommendation for music learning: Hotttabs, a multimedia guitar tutor Mathieu Barthet, Amélie Anglade, Gyorgy Fazekas, Sefki Kolozali, Robert Macrae Centre for Digital Music Queen Mary University

More information

PUBLISHING MUSIC SIMILARITY FEATURES ON THE SEMANTIC WEB

PUBLISHING MUSIC SIMILARITY FEATURES ON THE SEMANTIC WEB 10th International Society for Music Information Retrieval Conference (ISMIR 2009) PUBLISHING MUSIC SIMILARITY FEATURES ON THE SEMANTIC WEB Dan Tidhar, György Fazekas, Sefki Kolozali, Mark Sandler Centre

More information

Annotated bibliographies for presentations in MUMT 611, Winter 2006

Annotated bibliographies for presentations in MUMT 611, Winter 2006 Stephen Sinclair Music Technology Area, McGill University. Montreal, Canada Annotated bibliographies for presentations in MUMT 611, Winter 2006 Presentation 4: Musical Genre Similarity Aucouturier, J.-J.

More information

Industry 4.0 and Big Data

Industry 4.0 and Big Data Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and

More information

SYMBOLIC REPRESENTATION OF MUSICAL CHORDS: A PROPOSED SYNTAX FOR TEXT ANNOTATIONS

SYMBOLIC REPRESENTATION OF MUSICAL CHORDS: A PROPOSED SYNTAX FOR TEXT ANNOTATIONS SYMBOLIC REPRESENTATION OF MUSICAL CHORDS: A PROPOSED SYNTAX FOR TEXT ANNOTATIONS Christopher Harte, Mark Sandler and Samer Abdallah Centre for Digital Music Queen Mary, University of London Mile End Road,

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

DYNAMIC CHORD ANALYSIS FOR SYMBOLIC MUSIC

DYNAMIC CHORD ANALYSIS FOR SYMBOLIC MUSIC DYNAMIC CHORD ANALYSIS FOR SYMBOLIC MUSIC Thomas Rocher, Matthias Robine, Pierre Hanna, Robert Strandh LaBRI University of Bordeaux 1 351 avenue de la Libration F 33405 Talence, France simbals@labri.fr

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Beethoven, Bach und Billionen Bytes

Beethoven, Bach und Billionen Bytes Meinard Müller Beethoven, Bach und Billionen Bytes Automatisierte Analyse von Musik und Klängen Meinard Müller Tutzing-Symposium Oktober 2014 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University,

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

FAST MIR IN A SPARSE TRANSFORM DOMAIN

FAST MIR IN A SPARSE TRANSFORM DOMAIN ISMIR 28 Session 4c Automatic Music Analysis and Transcription FAST MIR IN A SPARSE TRANSFORM DOMAIN Emmanuel Ravelli Université Paris 6 ravelli@lam.jussieu.fr Gaël Richard TELECOM ParisTech gael.richard@enst.fr

More information

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum

BIG. Big Data Analysis John Domingue (STI International and The Open University) Big Data Public Private Forum Big Data Analysis John Domingue (STI International and The Open University) Project co-funded by the European Commission within the 7th Framework Program (Grant Agreement No. 257943) 1 The Data landscape

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

Automatic Chord Recognition from Audio Using an HMM with Supervised. learning

Automatic Chord Recognition from Audio Using an HMM with Supervised. learning Automatic hord Recognition from Audio Using an HMM with Supervised Learning Kyogu Lee enter for omputer Research in Music and Acoustics Department of Music, Stanford University kglee@ccrma.stanford.edu

More information

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer

Survey of Canadian and International Data Management Initiatives. By Diego Argáez and Kathleen Shearer Survey of Canadian and International Data Management Initiatives By Diego Argáez and Kathleen Shearer on behalf of the CARL Data Management Working Group (Working paper) April 28, 2008 Introduction Today,

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic

Big Data Analytics with Spark and Oscar BAO. Tamas Jambor, Lead Data Scientist at Massive Analytic Big Data Analytics with Spark and Oscar BAO Tamas Jambor, Lead Data Scientist at Massive Analytic About me Building a scalable Machine Learning platform at MA Worked in Big Data and Data Science in the

More information

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution

Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Sustainable Development with Geospatial Information Leveraging the Data and Technology Revolution Steven Hagan, Vice President, Server Technologies 1 Copyright 2011, Oracle and/or its affiliates. All rights

More information

A Professional Big Data Master s Program to train Computational Specialists

A Professional Big Data Master s Program to train Computational Specialists A Professional Big Data Master s Program to train Computational Specialists Anoop Sarkar, Fred Popowich, Alexandra Fedorova! School of Computing Science! Education for Employable Graduates: Critical Questions

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep Neil Raden Hired Brains Research, LLC Traditionally, the job of gathering and integrating data for analytics fell on data warehouses.

More information

A MUSICAL WEB MINING AND AUDIO FEATURE EXTRACTION EXTENSION TO THE GREENSTONE DIGITAL LIBRARY SOFTWARE

A MUSICAL WEB MINING AND AUDIO FEATURE EXTRACTION EXTENSION TO THE GREENSTONE DIGITAL LIBRARY SOFTWARE A MUSICAL WEB MINING AND AUDIO FEATURE EXTRACTION EXTENSION TO THE GREENSTONE DIGITAL LIBRARY SOFTWARE Cory McKay Marianopolis College and CIRMMT Montréal, Canada cory.mckay@mail.mcgill.ca David Bainbridge

More information

OpenAIRE Research Data Management Briefing paper

OpenAIRE Research Data Management Briefing paper OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement

More information

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323

Machine Learning and Cloud Computing. trends, issues, solutions. EGI-InSPIRE RI-261323 Machine Learning and Cloud Computing trends, issues, solutions Daniel Pop HOST Workshop 2012 Future plans // Tools and methods Develop software package(s)/libraries for scalable, intelligent algorithms

More information

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and

More information

Recent advances in Digital Music Processing and Indexing

Recent advances in Digital Music Processing and Indexing Recent advances in Digital Music Processing and Indexing Acoustics 08 warm-up TELECOM ParisTech Gaël RICHARD Telecom ParisTech (ENST) www.enst.fr/~grichard/ Content Introduction and Applications Components

More information

MEng, BSc Applied Computer Science

MEng, BSc Applied Computer Science School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

Control of affective content in music production

Control of affective content in music production International Symposium on Performance Science ISBN 978-90-9022484-8 The Author 2007, Published by the AEC All rights reserved Control of affective content in music production António Pedro Oliveira and

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

A causal algorithm for beat-tracking

A causal algorithm for beat-tracking A causal algorithm for beat-tracking Benoit Meudic Ircam - Centre Pompidou 1, place Igor Stravinsky, 75004 Paris, France meudic@ircam.fr ABSTRACT This paper presents a system which can perform automatic

More information

James Hardiman Library. Digital Scholarship Enablement Strategy

James Hardiman Library. Digital Scholarship Enablement Strategy James Hardiman Library Digital Scholarship Enablement Strategy This document outlines the James Hardiman Library s strategy to enable digital scholarship at NUI Galway. The strategy envisages the development

More information

ICT Perspectives on Big Data: Well Sorted Materials

ICT Perspectives on Big Data: Well Sorted Materials ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in

More information

Considerations for Research Data Management

Considerations for Research Data Management Considerations for Research Data Management Andrew Dean - OCF adean@ocf.co.uk - 07508 033894 Wednesday 3 rd December 2014 What is an RDM solution? Research Data Management A method of effectively managing

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Databases in Organizations

Databases in Organizations The following is an excerpt from a draft chapter of a new enterprise architecture text book that is currently under development entitled Enterprise Architecture: Principles and Practice by Brian Cameron

More information

MEng, BSc Computer Science with Artificial Intelligence

MEng, BSc Computer Science with Artificial Intelligence School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

How To Make Sense Of Data With Altilia

How To Make Sense Of Data With Altilia HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

AMUSICAL key and a chord are important attributes of

AMUSICAL key and a chord are important attributes of IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008 291 Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized

More information
