1 ehg New Trends in e-Humanities, Amsterdam, 10 01 2013

2 Overview
1) Dialect geography
2) A unified structure for Dutch dialect dictionary data
3) Dialectgebieden in Brabant. Geografische clustering op basis van de ruwe lexicale gegevens van het Woordenboek van de Brabantse Dialecten (Dialect areas in Brabant: geographic clustering based on the raw lexical data of the Dictionary of the Brabantic Dialects)
4) Visualization as a Research Tool for Dialect Geography Using a Geobrowser

3 1 Dialect geography

4 Dialect geography
What dialect differences are there between villages or towns?
What dialect areas are there?
What relations are there between dialect areas and other geographic data?

5 Dialect geography Dictionary of the Brabantic Dialects (WBD)

6

7

8 2 A unified structure for Dutch dialect dictionary data Folkert de Vriend, Lou Boves, Henk van den Heuvel, Roeland van Hout, Joep Kruijsen, Jos Swanenberg (2006). In: Proceedings of The fifth international conference on Language Resources and Evaluation (LREC 2006), Genoa, Italy, pp

9 Dialect geography
Dictionary of the Brabantic Dialects (WBD)
Dictionary of the Limburgian Dialects (WLD)
Dictionary of the Flemish Dialects (WVD)
Dictionary of the Zeelandish Dialects (WZD)

10 Total research area

11 Form based dictionary: Dictionary of the Zeelandish Dialects (WZD)

12 Sense based dictionary: Dictionary of the Brabantic Dialects (WBD)

13 Map from Dictionary of the Brabantic Dialects (WBD)

14 Unification: Map showing unified data from WBD, WLD, WZD, WVD

15 Unification: Map showing unified data from WBD and WLD (CLARIN COAVA (Cornips et al. 2011)). Concept shown: Frog (Kikker)

16 Towards a unified structure based on standards All dialect dictionary projects use the same core data types: form, sense and location.

17 WBD

18 WVD

19 Dictionary of the Zeelandish Dialects (WZD)

20 Mapping core data onto the LMF core model Although the data organisation is either form based or sense based, the core data types have the same heterarchical relation

21 LMF XML implementation for core data
<LexicalResource name="Unified Dialect Lexicon">
  <Lexicon name="wbd">
    <LexicalEntry>
      <Form>ulling</Form>
      <Sense>Fret</Sense>
      <Location>Rosmalen</Location>
    </LexicalEntry>
  </Lexicon>
  <Lexicon name="wvd">
    <LexicalEntry>
      <Form>voejerkuil</Form>
      <Sense>Groenvoerkuil</Sense>
      <Location>K136a</Location>
    </LexicalEntry>
  </Lexicon>
  <Lexicon name="wzd">
    <LexicalEntry>
      <Form>aerdwurm</Form>
      <Sense>dauwworm</Sense>
      <Location>Z.eil.</Location>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
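As a rough illustration of why a shared structure is useful, the following Python sketch parses a fragment of this shape and regroups the entries by any one of the three core data types. The helper name `index_by` and the embedded sample string are assumptions made for this example; the element names follow the slide and are not checked against the full LMF specification.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample copied from the slide, with quoting and tag case normalised so it parses.
SAMPLE = """
<LexicalResource name="Unified Dialect Lexicon">
  <Lexicon name="wbd">
    <LexicalEntry><Form>ulling</Form><Sense>Fret</Sense><Location>Rosmalen</Location></LexicalEntry>
  </Lexicon>
  <Lexicon name="wvd">
    <LexicalEntry><Form>voejerkuil</Form><Sense>Groenvoerkuil</Sense><Location>K136a</Location></LexicalEntry>
  </Lexicon>
  <Lexicon name="wzd">
    <LexicalEntry><Form>aerdwurm</Form><Sense>dauwworm</Sense><Location>Z.eil.</Location></LexicalEntry>
  </Lexicon>
</LexicalResource>
"""

def index_by(xml_text, key):
    """Group entries from all lexicons by 'form', 'sense' or 'location' (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    index = defaultdict(list)
    for lexicon in root.findall("Lexicon"):
        for entry in lexicon.findall("LexicalEntry"):
            record = {
                "lexicon": lexicon.get("name"),
                "form": entry.findtext("Form").strip(),
                "sense": entry.findtext("Sense").strip(),
                "location": entry.findtext("Location").strip(),
            }
            index[record[key]].append(record)
    return dict(index)

# The same data seen from two perspectives.
by_sense = index_by(SAMPLE, "sense")
by_location = index_by(SAMPLE, "location")
```

The same function serves a sense-based view (as in WBD) and a form-based view (as in WZD), which is the point of mapping both organisations onto one core model.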

22 Additional information needed for specifying core data types
Form. Type: Lexical, Phonetic. Alphabet: IPA, Genoveva, Latin
Sense. Type: Concept, Meaning
Location. Type: Placename, Area, Kloeke code
Standardisation: where possible, convert data to the same standard (e.g. location: longitude/latitude)
Minimal: map type and alphabet labels to ISOcat

23 Additional information needed for classifying core data types
Form. Class: phonetic forms have a lexical classification in WBD, WLD and WVD
Sense. Class: taxonomy
Location. Class: geopolitical taxonomy
Unified classifications can be used to provide access to the unified data

24 Conclusions
All dictionaries share the same core data types.
A unified structure built around these core data types will enable:
Using the data of the different dictionaries as one huge dataset.
Organising the data based on either sense, form, or location. This will enable different perspectives on the data.

25 3 Dialectgebieden in Brabant. Geografische clustering op basis van de ruwe lexicale gegevens van het Woordenboek van de Brabantse Dialecten (Dialect areas in Brabant: geographic clustering based on the raw lexical data of the Dictionary of the Brabantic Dialects) Folkert de Vriend, Jos Swanenberg, Roeland van Hout (2007). In: Taal en Tongval, themanummer 20, Dialectlexicografie, pp

26 Introduction
Computational analysis of the data that were collected for WBD, using cluster analysis.
Cluster analysis is defined in Jain and Dubes (1988) as the process of classifying objects into subsets that have meaning in the context of a particular problem. In a dialect geographic context the problem can be finding dialect areas.
Aim: to see if we could find detailed dialect patterns in Brabant based on lexical data only.
The dialect patterns that we found were compared to the dialect map of Belemans and Goossens (2000).

27 Belemans and Goossens (2000): Represents a traditional view on the classification of the Brabant area Based on qualitative analyses of several types of data.

28 Data selection
Only a subselection of the WBD data could be used:
Part III of WBD.
Data collected in the Nijmeegse enquête (the Nijmegen survey). These were collected for the whole research area and not just for subareas of Brabant.
Only the data for the core data types: concept, lexical form and location.
Resulting data selection: a data matrix with 614,941 lexical forms for 4,229 concepts and 639 locations.

29 Method
1) Lexical distances were computed for all location pairs in the data set. (RuG/L04)
2) Using these lexical distances, the locations were grouped together using cluster analysis. (RuG/L04)
3) To interpret the resulting groups of locations, they were converted to a KML symbol map. (Cartographic software developed by Meertens and Radboud University)
4) This symbol map was overlaid onto the map of Belemans and Goossens so that (mis)matches between the two maps could be visually inspected. (Geobrowser)
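Steps 1 and 2 can be sketched in a few lines of Python. This is not the RuG/L04 implementation: the distance measure (share of commonly attested concepts for which two locations have no form in common), the average-linkage clustering, the function names and the toy data are all assumptions chosen to keep the example small.

```python
from itertools import combinations

# Toy data, not taken from WBD: location -> concept -> set of recorded forms.
DATA = {
    "A": {"frog": {"kikker"}, "potato": {"erpel"}, "ant": {"mier"}},
    "B": {"frog": {"kikker"}, "potato": {"erpel"}, "ant": {"mier"}},
    "C": {"frog": {"puit"}, "potato": {"patat"}, "ant": {"mier"}},
    "D": {"frog": {"puit"}, "potato": {"patat"}, "ant": {"miemel"}},
}

def lexical_distance(forms_a, forms_b):
    """Share of concepts attested in both locations that have no form in common."""
    shared = set(forms_a) & set(forms_b)
    if not shared:
        return None  # no distance can be computed for this pair
    differing = sum(1 for c in shared if not (forms_a[c] & forms_b[c]))
    return differing / len(shared)

def distance_matrix(data):
    """Step 1: one distance per unordered pair of locations."""
    return {frozenset(pair): lexical_distance(data[pair[0]], data[pair[1]])
            for pair in combinations(sorted(data), 2)}

def average_linkage(locations, dist, k):
    """Step 2: merge the two closest clusters until k clusters remain.

    Requires a complete distance matrix (no None values).
    """
    clusters = [[loc] for loc in sorted(locations)]

    def between(c1, c2):
        values = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(values) / len(values)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: between(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

dist = distance_matrix(DATA)
clusters = average_linkage(DATA, dist, k=2)
```

With the toy data, locations A and B share every form and C and D differ only on one concept, so asking for two clusters groups A with B and C with D.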

30 Data characteristics that were problematic for the method
The Nijmeegse enquête covered the whole research area, but a lexical form was not recorded for every concept in every location.
Result:
The data matrix was a huge "gatenkaas" (Dutch for a cheese full of holes): 83.3% of the cells in the data matrix were empty.
The distance matrix was also full of holes, since often no distance could be calculated for a pair of locations.
This was problematic for the cluster algorithms we used, since they cannot deal with missing distances.

31 Solution 1
Create plausible lexical distances for the empty cells using the lexical distance to locations that are geographically near ("imputation").
Cluster analysis set to yield nine clusters showed that it was possible to find dialect areas for Brabant based on lexical data.
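The slides do not give the exact imputation formula, so the following Python sketch shows only one plausible variant: a missing distance between two locations is estimated from the known distances that each location's nearest geographic neighbour has to the other location. The coordinates, distances, and the names `impute` and `geo` are invented for this example.

```python
import math

# Hypothetical planar coordinates (km) for four locations; not real data.
COORDS = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (11, 0)}

# Known lexical distances; the pair (A, D) is missing.
KNOWN = {
    frozenset(("A", "B")): 0.1,
    frozenset(("A", "C")): 0.7,
    frozenset(("B", "C")): 0.6,
    frozenset(("B", "D")): 0.8,
    frozenset(("C", "D")): 0.2,
}

def geo(a, b):
    """Geographic distance between two locations."""
    return math.dist(COORDS[a], COORDS[b])

def impute(a, b, known, n_neighbours=1):
    """Estimate d(a, b) from geographically near locations with known distances."""
    estimates = []
    for src, target in ((a, b), (b, a)):
        # Locations near src whose lexical distance to target is known.
        candidates = [loc for loc in COORDS
                      if loc not in (a, b) and frozenset((loc, target)) in known]
        nearest = sorted(candidates, key=lambda loc: geo(src, loc))[:n_neighbours]
        estimates += [known[frozenset((loc, target))] for loc in nearest]
    return sum(estimates) / len(estimates) if estimates else None

# B is nearest to A and d(B, D) = 0.8; C is nearest to D and d(C, A) = 0.7.
imputed_ad = impute("A", "D", KNOWN)
```

Any scheme of this kind smooths the distance matrix towards geographic proximity, which may be one reason the imputed data produced less sharply separated clusters.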

32 But
Although the resulting dialect maps showed some general resemblance to Belemans and Goossens (2000), the results were not very satisfactory.
The dialect maps contained clusters that overlapped each other heavily, as well as clusters covering the entire research area.

33 6 overlapping clusters

34 3 clusters covering entire research area

35 Solution 2
Strongly reduce the percentage of empty cells in the data matrix by completely removing all concepts and locations with few or no lexical forms.
Result:
The data matrix was reduced from 614,941 lexical forms to 100,277 lexical forms.
The percentage of empty cells in the data matrix was reduced from 83.3% to 20.3%.
Now a distance could be calculated for every pair of locations (without using imputation).
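A minimal Python sketch of this kind of pruning is given below. The threshold `min_fill`, the function names and the toy matrix are assumptions; the slides do not state which cut-off values were used for WBD.

```python
def empty_share(matrix, concepts, locations):
    """Fraction of (concept, location) cells without any lexical form."""
    total = len(concepts) * len(locations)
    filled = sum(1 for c in concepts for l in locations if matrix.get((c, l)))
    return 1 - filled / total

def prune(matrix, concepts, locations, min_fill):
    """Repeatedly drop concepts and locations filled in less than min_fill of their cells."""
    concepts, locations = list(concepts), list(locations)
    changed = True
    while changed:
        changed = False
        keep_c = [c for c in concepts
                  if sum(bool(matrix.get((c, l))) for l in locations)
                  >= min_fill * len(locations)]
        keep_l = [l for l in locations
                  if sum(bool(matrix.get((c, l))) for c in keep_c)
                  >= min_fill * len(keep_c)]
        if len(keep_c) < len(concepts) or len(keep_l) < len(locations):
            concepts, locations, changed = keep_c, keep_l, True
    return concepts, locations

# Toy matrix: (concept, location) -> list of forms; "f" is a placeholder form.
CONCEPTS = ["c1", "c2", "c3"]
LOCATIONS = ["l1", "l2", "l3", "l4"]
MATRIX = {(c, l): ["f"] for c in ("c1", "c2") for l in ("l1", "l2", "l3")}
MATRIX[("c3", "l4")] = ["f"]

before = empty_share(MATRIX, CONCEPTS, LOCATIONS)
kept_concepts, kept_locations = prune(MATRIX, CONCEPTS, LOCATIONS, min_fill=0.5)
after = empty_share(MATRIX, kept_concepts, kept_locations)
```

In the toy matrix, the sparsely attested concept `c3` and location `l4` are removed, after which no empty cells remain. On real data the loop trades coverage for density, as with the reduction from 614,941 to 100,277 lexical forms reported above.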

36 Result Requiring the cluster analysis to return nine clusters resulted in a close resemblance to the nine main areas of the dialect map of Belemans and Goossens (2000). Also, the result did not contain clusters covering the entire research area anymore.

37 Final result based on 100,277 lexical forms

38 Conclusions
We could find detailed dialect patterns in Brabant based on lexical data only.
These detailed dialect patterns closely resembled the map of Belemans and Goossens (2000).
But for our computational method the dataset had to be manipulated extensively.
Relevant from an e-humanities perspective:
The WBD dataset was collected with a typical humanities aim in mind: collecting as much variation as possible.
When searching for general patterns using cluster analysis, the "gatenkaas" (full of holes) character of the data matrix suddenly became a problem.

39 4 Visualization as a Research Tool for Dialect Geography Using a Geobrowser Folkert de Vriend, Lou Boves, Roeland van Hout, Jos Swanenberg (2011). In: Literary and Linguistic Computing, 26(1), pp

40 Basic research chain for conducting dialect geography research Applies to dialect dictionary projects as well as to dialect atlas projects. Modelled very much as a pipeline with a unidirectional data and process flow.

41 Visualization as a research tool Visualization of map data does not have to be a static and final stage in the research chain. The basic research chain can be extended with support for using visualization as a research tool. What do we mean by that?

42 Shneiderman's mantra
Shneiderman's mantra for designing advanced information visualization interfaces (Shneiderman, 1996) can also be applied to map data.
It formulates the basic principles as: overview first, zoom and filter, then details on demand.
It can be regarded as a set of minimum requirements for using visualization as a research tool.
A research chain that meets the requirements of Shneiderman's mantra will need some form of dynamic visualization.

43 Incorporation of independent diatopical data (1)
Support for hypotheses about the processes underlying patterns in dialect variation might be found in independent diatopical data.
Shattered block (Weijnen 1977)

44 Incorporation of independent diatopical data (2)
By combining different types of non-linguistic diatopical data with dialect data, one is able to explore hypotheses about relations between the structures found in the data sets.
The basic research chain should be extended with the ability to combine visualizations of dialect data with visualizations of independent diatopical data, for example by overlaying maps.
Ideally, research into relations between such independent diatopical data and dialect data already starts in the interpretation stage. Therefore, in the extended research chain (next slide), incorporation of independent diatopical data is an additional input to the geographic interpretation stage.

45 Extended research chain with support for visualization as a research tool Original unidirectional basic research chain is turned into an architecture that supports an iterative process flow that aids exploration of multiple hypotheses about the data.

46 Support for visualization as a research tool
We checked to what extent tools that are already available for dialect geography research support visualization as a research tool.
The two main European workbenches for dialect geography research (RuG/L04 and VDM) are very sophisticated, but in 2010 they did not offer much support for using visualization as a research tool.
Geobrowsers (like Google Earth or NASA World Wind) do support using visualization as a research tool:
They fully adhere to Shneiderman's visual information-seeking mantra.
For overlaying maps with independent diatopical data, Google Earth offers easy-to-use built-in tools.

47 Demo geobrowser (Google Earth)

48 Conclusions
With dynamic visualization and the ability to incorporate independent data, the role of the map changes from a static presentation of research findings into a research tool that can be used to gain new insights about (dialect geographic) data.
A first step for existing tools for computational analysis of dialect geographic data towards the full extended research chain would be to make them more interoperable with geobrowsers.
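One simple form of interoperability with a geobrowser is exporting analysis results as KML. The Python sketch below writes one KML Placemark per location, labelled with its cluster; it is an illustration only and not the cartographic software mentioned earlier. The location names, coordinates and cluster numbers are placeholders, and `clusters_to_kml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def clusters_to_kml(points):
    """Build a KML document with one Placemark per (name, lon, lat, cluster) tuple."""
    ET.register_namespace("", KML_NS)  # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat, cluster in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = f"cluster {cluster}"
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Placeholder locations with made-up coordinates and cluster numbers.
POINTS = [("loc_a", 5.30, 51.70, 1), ("loc_b", 5.10, 51.55, 2)]
kml_text = clusters_to_kml(POINTS)
```

Saving `kml_text` to a `.kml` file should allow it to be opened in a geobrowser and overlaid with other map layers, which is the kind of iterative inspection the extended research chain aims to support.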

49 CLARIN MIGMAP (Bloothooft et al): KML output

50


Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern

More information

Co-Creation of Models and Metamodels for Enterprise. Architecture Projects.

Co-Creation of Models and Metamodels for Enterprise. Architecture Projects. Co-Creation of Models and Metamodels for Enterprise Architecture Projects Paola Gómez pa.gomez398@uniandes.edu.co Hector Florez ha.florez39@uniandes.edu.co ABSTRACT The linguistic conformance and the ontological

More information

National Register of Historic Places: GIS Webinar Cultural Resource GIS Facility National Park Service June 2012

National Register of Historic Places: GIS Webinar Cultural Resource GIS Facility National Park Service June 2012 National Register of Historic Places: GIS Webinar Cultural Resource GIS Facility National Park Service June 2012 In February and March 2012 the National Register of Historic Places held webinars in conjunction

More information

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Outline Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach Jinfeng Yi, Rong Jin, Anil K. Jain, Shaili Jain 2012 Presented By : KHALID ALKOBAYER Crowdsourcing and Crowdclustering

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

Deep profiling of multitube flow cytometry data Supplemental information

Deep profiling of multitube flow cytometry data Supplemental information Deep profiling of multitube flow cytometry data Supplemental information Kieran O Neill et al December 19, 2014 1 Table S1: Markers in simulated multitube data. The data was split into three tubes, each

More information

USGS Community for Data Integration

USGS Community for Data Integration Community of Science: Strategies for Coordinating Integration of Data USGS Community for Data Integration Kevin T. Gallagher USGS Core Science Systems January 11, 2013 U.S. Department of the Interior U.S.

More information

Employee Survey Analysis

Employee Survey Analysis Employee Survey Analysis Josh Froelich, Megaputer Intelligence Sergei Ananyan, Megaputer Intelligence www.megaputer.com Megaputer Intelligence, Inc. 120 West Seventh Street, Suite 310 Bloomington, IN 47404

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Automate Data Integration Processes for Pharmaceutical Data Warehouse

Automate Data Integration Processes for Pharmaceutical Data Warehouse Paper AD01 Automate Data Integration Processes for Pharmaceutical Data Warehouse Sandy Lei, Johnson & Johnson Pharmaceutical Research and Development, L.L.C, Titusville, NJ Kwang-Shi Shu, Johnson & Johnson

More information

ADVANCED SEMI-AUTOMATIC VISUALIZATION OF SPATIAL DATA USING INSTANTATLAS

ADVANCED SEMI-AUTOMATIC VISUALIZATION OF SPATIAL DATA USING INSTANTATLAS CO-384 ADVANCED SEMI-AUTOMATIC VISUALIZATION OF SPATIAL DATA USING INSTANTATLAS VONDRAKOVA A., HARBULA J., HLADISOVA B., VOZENILEK V. Palacky University Olomouc, OLOMOUC, CZECH REPUBLIC Introduction Semi-automatic

More information

USING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION

USING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION USING SELF-ORGANISING MAPS FOR ANOMALOUS BEHAVIOUR DETECTION IN A COMPUTER FORENSIC INVESTIGATION B.K.L. Fei, J.H.P. Eloff, M.S. Olivier, H.M. Tillwick and H.S. Venter Information and Computer Security

More information

Homework 4 Statistics W4240: Data Mining Columbia University Due Tuesday, October 29 in Class

Homework 4 Statistics W4240: Data Mining Columbia University Due Tuesday, October 29 in Class Problem 1. (10 Points) James 6.1 Problem 2. (10 Points) James 6.3 Problem 3. (10 Points) James 6.5 Problem 4. (15 Points) James 6.7 Problem 5. (15 Points) James 6.10 Homework 4 Statistics W4240: Data Mining

More information

DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION

DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION DATA QUALITY AND SCALE IN CONTEXT OF EUROPEAN SPATIAL DATA HARMONISATION Katalin Tóth, Vanda Nunes de Lima European Commission Joint Research Centre, Ispra, Italy ABSTRACT The proposal for the INSPIRE

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

A Statistical Spatial Framework to Inform Regional Statistics

A Statistical Spatial Framework to Inform Regional Statistics A Statistical Spatial Framework to Inform Regional Statistics Martin Brady & Gemma Van Halderen Australian Bureau of Statistics, Canberra, Australia Corresponding Author: m.brady@abs.gov.au Abstract Statisticians

More information

Connecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules

Connecting Segments for Visual Data Exploration and Interactive Mining of Decision Rules Journal of Universal Computer Science, vol. 11, no. 11(2005), 1835-1848 submitted: 1/9/05, accepted: 1/10/05, appeared: 28/11/05 J.UCS Connecting Segments for Visual Data Exploration and Interactive Mining

More information

Visualization methods for patent data

Visualization methods for patent data Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes

More information

CLARIN-NL Third Call: Closed Call

CLARIN-NL Third Call: Closed Call CLARIN-NL Third Call: Closed Call CLARIN-NL launches in its third call a Closed Call for project proposals. This called is only open for researchers who have been explicitly invited to submit a project

More information

Pentaho Data Mining Last Modified on January 22, 2007

Pentaho Data Mining Last Modified on January 22, 2007 Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

More information

Identifying Patterns in DNS Traffic

Identifying Patterns in DNS Traffic Identifying Patterns in DNS Traffic Pieter Lexis System and Network Engineering Thu, Jul 4 2013 Reflection and Amplification Attacks DNS abused as DDoS Tool Spamhaus hit with 300 Gigabit/second DDoS Reflected

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

WEB-BASED VISUAL EXPLORATION AND ERROR DETECTION IN LARGE DATA SETS: ANTARCTIC ICEBERG TRACKING DATA AS A CASE

WEB-BASED VISUAL EXPLORATION AND ERROR DETECTION IN LARGE DATA SETS: ANTARCTIC ICEBERG TRACKING DATA AS A CASE WEB-BASED VISUAL EXPLORATION AND ERROR DETECTION IN LARGE DATA SETS: ANTARCTIC ICEBERG TRACKING DATA AS A CASE Connie A. Blok blok@itc.nl Ulanbek Turdukulov turdukulov@itc.nl Barend Köbben Juan Luis Calle

More information

Quick and Easy Web Maps with Google Fusion Tables. SCO Technical Paper

Quick and Easy Web Maps with Google Fusion Tables. SCO Technical Paper Quick and Easy Web Maps with Google Fusion Tables SCO Technical Paper Version History Version Date Notes Author/Contact 1.0 July, 2011 Initial document created. Howard Veregin 1.1 Dec., 2011 Updated to

More information

Performance Metrics for Graph Mining Tasks

Performance Metrics for Graph Mining Tasks Performance Metrics for Graph Mining Tasks 1 Outline Introduction to Performance Metrics Supervised Learning Performance Metrics Unsupervised Learning Performance Metrics Optimizing Metrics Statistical

More information

Easily add Maps and Geo Analytics in MicroStrategy

Easily add Maps and Geo Analytics in MicroStrategy Easily add Maps and Geo Analytics in MicroStrategy Agenda Introduction Configure to use Maps in MicroStrategy MicroStrategy Geo Analysis Capabilities and Examples Key Takeaways and Q&A Why Geospatial Analysis

More information

To introduce software process models To describe three generic process models and when they may be used

To introduce software process models To describe three generic process models and when they may be used Software Processes Objectives To introduce software process models To describe three generic process models and when they may be used To describe outline process models for requirements engineering, software

More information

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore.

CI6227: Data Mining. Lesson 11b: Ensemble Learning. Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore. CI6227: Data Mining Lesson 11b: Ensemble Learning Sinno Jialin PAN Data Analytics Department, Institute for Infocomm Research, A*STAR, Singapore Acknowledgements: slides are adapted from the lecture notes

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

FastStats & Dashboard Product Overview

FastStats & Dashboard Product Overview FastStats & Dashboard Product Overview Guide for Clients July 2011 Version 1 Matrix FastStats Overview Matrix believes that FastStats is an ideal analytics tool for UK Mortgage lenders. Matrix FastStats

More information

www.thevantagepoint.com

www.thevantagepoint.com Doing More with Less: How efficient analysis can improve your vantage point on information Nils Newman Director of New Business Development Search Technology newman@searchtech.com PIUG Workshop Topics

More information

Compiling a Dictionary of an Unwritten Language: A Noncorpus-based

Compiling a Dictionary of an Unwritten Language: A Noncorpus-based Compiling a Dictionary of an Unwritten Language: A Noncorpus-based Approach Jacques van Keymeulen, Department of Dutch Linguistics, Ghent University, Belgium (jacques.vankeymeulen@ugent.be) Abstract: In

More information

Between voicing and aspiration

Between voicing and aspiration Workshop Maps and Grammar 17-18 September 2014 Introduction Dutch-German dialect continuum Voicing languages vs. aspiration languages Phonology meets phonetics Phonetically continuous, phonologically discrete

More information

Classify then Summarize or Summarize then Classify

Classify then Summarize or Summarize then Classify Classify then Summarize or Summarize then Classify DIMACS, Rutgers University Piscataway, NJ 08854 Workshop Honoring Edwin Diday held on September 4, 2007 What is Cluster Analysis? Software package? Collection

More information

A GIS BASED GROUNDWATER MANAGEMENT TOOL FOR LONG TERM MINERAL PLANNING

A GIS BASED GROUNDWATER MANAGEMENT TOOL FOR LONG TERM MINERAL PLANNING A GIS BASED GROUNDWATER MANAGEMENT TOOL FOR LONG TERM MINERAL PLANNING Mauro Prado, Hydrogeologist - SRK Consulting, Perth, Australia Richard Connelly, Principal Hydrogeologist - SRK UK Ltd, Cardiff, United

More information