Relative N-Gram Signatures: Document Visualization at the Level of Character N-Grams


Magdalena Jankowska, Evangelos Milios, Vlado Kešelj
Faculty of Computer Science, Dalhousie University
June 2013

Relative N-Gram Signatures
Interactive classification of a document
  Who wrote this book?
Analysis of characteristics of a document
  What are the characteristics of the author's style?
Language-independent method

Character N-Grams
Strings of n consecutive characters from a given text
Example text: "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do:" (Alice's Adventures in Wonderland by Lewis Carroll)
n=4 (4-grams): ALIC, LICE, ICE_, CE_W, ...
N-grams in our system:
  uppercase
  each sequence of non-word characters replaced by an underscore
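The extraction rule on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the RNG-Sig implementation: the function name `char_ngrams` is hypothetical, and "non-word character" is assumed to mean whatever the regular expression class `\W` matches.

```python
import re


def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Return all character n-grams of `text`, normalized as on the slide."""
    # Uppercase the text, then replace each run of non-word characters
    # (assumption: regex \W) with a single underscore.
    normalized = re.sub(r"\W+", "_", text.upper())
    # Slide a window of width n over the normalized string.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]


sample = "Alice was beginning to get very tired"
print(char_ngrams(sample)[:4])  # ['ALIC', 'LICE', 'ICE_', 'CE_W']
```

The first four 4-grams match the ones listed on the slide.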

Common N-Gram (CNG) Classifier
Assigns a document to a class from a given set of classes (for example: works of Carroll, works of Twain, works of Shakespeare)
Based on comparison of the frequencies of the most common character n-grams
Proposed by Vlado Kešelj, Fuchun Peng, Nick Cercone, and Calvin Thomas. N-gram-based author profiles for authorship attribution. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING'03
Applications:
  authorship attribution
  malicious code detection
  gene classification
  web page genre classification

CNG Classifier - Dissimilarity
Profile: a sequence of the L most common n-grams of a given length n, each with its normalized frequency
Example (n=4, L=6):
  document 1: Alice's Adventures in Wonderland by Lewis Carroll (profile f1)
  document 2: Tarzan of the Apes by Edgar Rice Burroughs (profile f2)
[Figure: the two profiles side by side, containing 4-grams such as _THE, THE_, _AND and AND_]
For each n-gram x: distance(f1(x), f2(x))
  f1(x)=0 if x is not in the profile of document 1
  f2(x)=0 if x is not in the profile of document 2
CNG dissimilarity between two documents: sum of the distances with respect to all n-grams in the union of the profiles
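The profile and dissimilarity definitions above can be sketched as follows. The slides do not state the per-n-gram distance. This sketch assumes the squared relative difference used in the CNG paper by Kešelj et al., and it assumes frequencies are normalized by the total number of n-grams in the document. All function names are hypothetical.

```python
from collections import Counter


def profile(ngrams: list[str], L: int) -> dict[str, float]:
    """Map each of the L most common n-grams to its normalized frequency."""
    counts = Counter(ngrams)
    total = sum(counts.values())  # assumption: normalize by all n-grams
    return {g: c / total for g, c in counts.most_common(L)}


def ngram_distance(f1: float, f2: float) -> float:
    """Squared relative difference (assumed; not given on the slides)."""
    if f1 == 0 and f2 == 0:
        return 0.0
    return ((f1 - f2) / ((f1 + f2) / 2)) ** 2


def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum the distances over the union of the two profiles."""
    # An n-gram missing from a profile has frequency 0 in that profile.
    return sum(
        ngram_distance(p1.get(x, 0.0), p2.get(x, 0.0))
        for x in set(p1) | set(p2)
    )
```

With this distance, an n-gram that occurs in only one profile contributes the maximum value of 4, and an n-gram with equal frequency in both contributes 0.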

Motivation
Text visualization on the language-independent level of character n-grams
  similarity of documents
  characteristics of documents
Visualization of the CNG classifier
  reasons for the classification result
  possibility of influencing the classification

RNG-Sig Web application
Implemented as a web application
d3.js JavaScript library for visualization
Available online with pre-loaded data at:

Relative N-Gram Signature
Visual representation of the CNG dissimilarity between two documents
Example: relative signature of Tarzan of the Apes by Burroughs with respect to ("on the background of") Alice's Adventures in Wonderland by Carroll (base document)
n=4 (4-grams), L=500 (500 most common n-grams)
Each strip represents an n-gram:
  the 500 most common n-grams of Alice's Adventures, in order of decreasing frequency in Alice's Adventures
  those among the 500 most common n-grams of Tarzan that are not among the 500 most common n-grams of Alice's Adventures, in order of decreasing frequency in Tarzan
Color: distance between the two documents with respect to this n-gram
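The strip layout described above can be sketched as a list of (n-gram, distance) pairs. This is only an illustration under stated assumptions: the base-document strips are placed before the extra strips of the other document (the slides do not say which group comes first), the distance is the squared relative difference from the CNG paper, and the name `relative_signature` is hypothetical. Mapping distances to colors is left to the drawing layer.

```python
def ngram_distance(f1: float, f2: float) -> float:
    """Squared relative difference (assumed; not given on the slides)."""
    if f1 == 0 and f2 == 0:
        return 0.0
    return ((f1 - f2) / ((f1 + f2) / 2)) ** 2


def relative_signature(
    base: dict[str, float], other: dict[str, float]
) -> list[tuple[str, float]]:
    """Return one (n-gram, distance) strip per n-gram in either profile."""
    # Strips for the base profile, by decreasing frequency in the base.
    base_part = sorted(base, key=lambda g: -base[g])
    # Strips for n-grams only in the other profile, by decreasing
    # frequency in the other document.
    extra_part = sorted(
        (g for g in other if g not in base), key=lambda g: -other[g]
    )
    return [
        (g, ngram_distance(base.get(g, 0.0), other.get(g, 0.0)))
        for g in base_part + extra_part
    ]


alice = {"_THE": 0.4, "AND_": 0.2}
tarzan = {"_THE": 0.4, "TARZ": 0.3, "AND_": 0.1}
for ngram, dist in relative_signature(alice, tarzan):
    print(ngram, round(dist, 3))
```

Summing the distances of all strips gives the CNG dissimilarity between the two documents.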

Relative N-Gram Signature
Visualizes similarity between documents on the level of character n-grams
[Figure: relative signature of two documents that do not share any of their respective 500 most common n-grams]
[Figure: signature of a document with respect to itself]

Relative N-Gram Signature: Visual metaphor
Inspiration: emission spectrum (spectrum of frequencies of electromagnetic emissions by atoms or molecules)
[Figure: emission spectrum; picture from Wikipedia]

Sequence of signatures
Scenario: authorship analysis
The same base document for all signatures
Signature of the most similar document: Carroll's Through the Looking-Glass
CNG dissimilarity score: sum of the distances over all n-grams in a signature
Minimum dissimilarity = classifier result
Zooming in
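The "minimum dissimilarity = classifier result" rule can be sketched as an argmin over candidate profiles. As before, the per-n-gram distance is assumed to be the squared relative difference from the CNG paper, and `classify` and the sample labels are hypothetical.

```python
def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum of assumed squared relative differences over both profiles."""
    total = 0.0
    for x in set(p1) | set(p2):
        f1, f2 = p1.get(x, 0.0), p2.get(x, 0.0)
        if f1 or f2:
            total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


def classify(
    base: dict[str, float], candidates: dict[str, dict[str, float]]
) -> tuple[str, dict[str, float]]:
    """Return the label with the smallest dissimilarity, plus all scores."""
    scores = {
        label: cng_dissimilarity(base, prof)
        for label, prof in candidates.items()
    }
    return min(scores, key=scores.get), scores


base = {"_THE": 0.5, "AND_": 0.3}
candidates = {
    "Carroll": {"_THE": 0.5, "AND_": 0.3},
    "Twain": {"_THE": 0.2, "HUCK": 0.3},
}
label, scores = classify(base, candidates)
print(label, scores)
```

Each score corresponds to one signature in the sequence, so the signature with the lowest total marks the predicted class.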

Interactive exploration of signatures
Browsing
Context: most common words
Context: concordance style, a given n-gram within the text
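A concordance-style context view for one n-gram could look like the sketch below. It is a simplification under stated assumptions: the context is shown in the normalized text (uppercase, underscores) rather than the original text, and the function name `concordance` and the `width` parameter are hypothetical.

```python
import re


def concordance(text: str, ngram: str, width: int = 20) -> list[str]:
    """List every occurrence of `ngram` with `width` characters of context."""
    # Same normalization as for n-gram extraction (assumption: regex \W).
    normalized = re.sub(r"\W+", "_", text.upper())
    lines = []
    start = normalized.find(ngram)
    while start != -1:
        end = start + len(ngram)
        left = normalized[max(0, start - width):start]
        right = normalized[end:end + width]
        # Right-align the left context so the n-gram lines up in a column.
        lines.append(f"{left:>{width}}[{ngram}]{right}")
        start = normalized.find(ngram, start + 1)
    return lines


for line in concordance("The cat and the hat", "THE_", width=4):
    print(line)
```

Aligning the matches in one column is the usual keyword-in-context layout, which makes it easy to see which words an n-gram comes from.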

Language independence
Example: Polish authors
Searching for n-grams

Motivation for analysis of Mark Twain novels
D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. In Proceedings of the 2007 IEEE Symposium on Visual Analytics Science and Technology
Visual analysis of works of Mark Twain: Adventures of Huckleberry Finn stands out from the other works of Mark Twain with respect to:
  function words frequency
  Simpson's index
  hapax legomena
  function words (first dimension after PCA)

Example analysis: comparison of novels by Mark Twain
Adventures of Huckleberry Finn

Complementary Comparison View
N-grams ordered separately in each signature, according to their distance
Better for comparison of signatures, but not for exploring

Interactively influencing the visualization and the classifier
Manual, task-dependent adaptation of the classification process
Example: Ad-hoc Authorship Attribution Competition, 2004, Problem G, sample 02

Interactively influencing the visualization and the classifier
N-grams originating mostly from proper names
Ignoring selected n-grams in the base document
Two options:
  the length of the list of n-grams in the base document is kept intact by adding less frequent n-grams at the top
  no new n-grams are added; the list of n-grams for the base document becomes shorter
Result: correct classification
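The two options for ignoring n-grams can be sketched as two ways of building the base profile. This is an assumption-laden illustration: the name `profile_ignoring` and the `keep_length` flag are hypothetical, and frequencies are left normalized by the full n-gram count because the slides do not say whether they are renormalized.

```python
from collections import Counter


def profile_ignoring(
    ngrams: list[str], L: int, ignored: set[str], keep_length: bool
) -> dict[str, float]:
    """Build a profile of up to L n-grams while skipping `ignored` ones."""
    counts = Counter(ngrams)
    total = sum(counts.values())  # assumption: no renormalization
    ranked = [g for g, _ in counts.most_common()]
    if keep_length:
        # Option 1: drop ignored n-grams first, then take L, so less
        # frequent n-grams move up and the list keeps its length.
        kept = [g for g in ranked if g not in ignored][:L]
    else:
        # Option 2: take the top L first, then drop ignored n-grams,
        # so the list becomes shorter.
        kept = [g for g in ranked[:L] if g not in ignored]
    return {g: counts[g] / total for g in kept}


grams = ["ALIC"] * 4 + ["_THE"] * 3 + ["AND_"] * 2 + ["ING_"]
print(profile_ignoring(grams, 2, {"ALIC"}, keep_length=True))
print(profile_ignoring(grams, 2, {"ALIC"}, keep_length=False))
```

In the sample, ignoring the proper-name n-gram ALIC either pulls AND_ into the profile (first option) or leaves a one-entry profile (second option).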

Thank you!


More information

Narrative Literature Response Letters Grade Three

Narrative Literature Response Letters Grade Three Ohio Standards Connection Writing Applications Benchmark A Write narrative accounts that develop character, setting and plot. Indicator: 1 Write stories that sequence events and include descriptive details

More information

This use study analyzes a specific scenario for a financial credit interaction for an online personal loan request.

This use study analyzes a specific scenario for a financial credit interaction for an online personal loan request. Case Study: Online Personal Loan Scenario v.01 Author: Domenico Catalano Introduction Personal Information sharing is an emerging trend for online personal daily life activities, including the interaction

More information

Simple Language Models for Spam Detection

Simple Language Models for Spam Detection Simple Language Models for Spam Detection Egidio Terra Faculty of Informatics PUC/RS - Brazil Abstract For this year s Spam track we used classifiers based on language models. These models are used to

More information

ELEVATING FORENSIC INVESTIGATION SYSTEM FOR FILE CLUSTERING

ELEVATING FORENSIC INVESTIGATION SYSTEM FOR FILE CLUSTERING ELEVATING FORENSIC INVESTIGATION SYSTEM FOR FILE CLUSTERING Prashant D. Abhonkar 1, Preeti Sharma 2 1 Department of Computer Engineering, University of Pune SKN Sinhgad Institute of Technology & Sciences,

More information

Reading is the process in which the reader constructs meaning by interacting with the text.

Reading is the process in which the reader constructs meaning by interacting with the text. Part 1 Reading is the process in which the reader constructs meaning by interacting with the text. This interactive process involves the reader s prior knowledge, the text, and the reading situation. Literal

More information

Interactive Timeline Viewer (ItLv): A Tool to Visualize Variants Among Documents

Interactive Timeline Viewer (ItLv): A Tool to Visualize Variants Among Documents Interactive Timeline Viewer (ItLv): A Tool to Visualize Variants Among Documents Carlos Monroy, Rajiv Kochumman, Richard Furuta, and Eduardo Urbina TEES Center for the Study of Digital Libraries Texas

More information

WAFFle: Fingerprinting Filter Rules of Web Application Firewalls

WAFFle: Fingerprinting Filter Rules of Web Application Firewalls Email: sebastian.schinzel@cs.fau.de Twitter: @seecurity WAFFle: Fingerprinting Filter Rules of Web Application Firewalls Isabell Schmitt, Sebastian Schinzel* Friedrich-Alexander Universität Erlangen-Nürnberg

More information

Classifying Manipulation Primitives from Visual Data

Classifying Manipulation Primitives from Visual Data Classifying Manipulation Primitives from Visual Data Sandy Huang and Dylan Hadfield-Menell Abstract One approach to learning from demonstrations in robotics is to make use of a classifier to predict if

More information

Chemistry 102 Summary June 24 th. Properties of Light

Chemistry 102 Summary June 24 th. Properties of Light Chemistry 102 Summary June 24 th Properties of Light - Energy travels through space in the form of electromagnetic radiation (EMR). - Examples of types of EMR: radio waves, x-rays, microwaves, visible

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

COSC 6397 Big Data Analytics. Mahout and 3 rd homework assignment. Edgar Gabriel Spring 2014. Mahout

COSC 6397 Big Data Analytics. Mahout and 3 rd homework assignment. Edgar Gabriel Spring 2014. Mahout COSC 6397 Big Data Analytics Mahout and 3 rd homework assignment Edgar Gabriel Spring 2014 Mahout Scalable machine learning library Built with MapReduce and Hadoop in mind Written in Java Focusing on three

More information

Global Music Management MPAMB-GE.2207 3 points; NYU in London; Spring 2016

Global Music Management MPAMB-GE.2207 3 points; NYU in London; Spring 2016 DEPAR TMEN T O F MU SIC AND PER FOR MING AR TS PR OFESSION S Music Business Program Sample January 2016 Syllabus for website only -- subject to change Global Music Management MPAMB-GE.2207 3 points; NYU

More information

Cyber Security Through Visualization

Cyber Security Through Visualization Cyber Security Through Visualization Kwan-Liu Ma Department of Computer Science University of California at Davis Email: ma@cs.ucdavis.edu Networked computers are subject to attack, misuse, and abuse.

More information

Visual Structure Analysis of Flow Charts in Patent Images

Visual Structure Analysis of Flow Charts in Patent Images Visual Structure Analysis of Flow Charts in Patent Images Roland Mörzinger, René Schuster, András Horti, and Georg Thallinger JOANNEUM RESEARCH Forschungsgesellschaft mbh DIGITAL - Institute for Information

More information

Learning Objectives. Required Resources. Tasks. Deliverables

Learning Objectives. Required Resources. Tasks. Deliverables Fleet Modeling 10 Purpose This activity introduces you to the Vehicle Routing Problem (VRP) and fleet modeling through the use of a previously developed model. Using the model, you will explore the relationships

More information

INFORMATION VISUALIZATION TECHNIQUES USAGE MODEL

INFORMATION VISUALIZATION TECHNIQUES USAGE MODEL INFORMATION VISUALIZATION TECHNIQUES USAGE MODEL Akanmu Semiu A. 1 and Zulikha Jamaludin 2 1 Universiti Utara Malaysia, Malaysia, ayobami.sm@gmail.com 2 Universiti Utara Malaysia, Malaysia, zulie@uum.edu.my

More information

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition

POSBIOTM-NER: A Machine Learning Approach for. Bio-Named Entity Recognition POSBIOTM-NER: A Machine Learning Approach for Bio-Named Entity Recognition Yu Song, Eunji Yi, Eunju Kim, Gary Geunbae Lee, Department of CSE, POSTECH, Pohang, Korea 790-784 Soo-Jun Park Bioinformatics

More information

Movie Classification Using k-means and Hierarchical Clustering

Movie Classification Using k-means and Hierarchical Clustering Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani

More information

Atomic Calculations. 2.1 Composition of the Atom. number of protons + number of neutrons = mass number

Atomic Calculations. 2.1 Composition of the Atom. number of protons + number of neutrons = mass number 2.1 Composition of the Atom Atomic Calculations number of protons + number of neutrons = mass number number of neutrons = mass number - number of protons number of protons = number of electrons IF positive

More information

Web Modules for Garden Centers

Web Modules for Garden Centers Web Modules for Garden Centers 503.248.2159 overview web modules Run an online plant library to give your customers better inventory visibility. Create an online market place. No need for website redesign

More information

COPYRIGHT ACT -- FAIR DEALING (Advisory for SUTD Faculty, Researchers, Staff and Students)

COPYRIGHT ACT -- FAIR DEALING (Advisory for SUTD Faculty, Researchers, Staff and Students) COPYRIGHT ACT -- FAIR DEALING (Advisory for SUTD Faculty, Researchers, Staff and Students) When determining whether copying of the whole or part of the work or adaptation constitutes fair dealing, the

More information

App Building Guidelines

App Building Guidelines App Building Guidelines App Building Guidelines Table of Contents Definition of Apps... 2 Most Recent Vintage Dataset... 2 Meta Info tab... 2 Extension yxwz not yxmd... 3 Map Input... 3 Report Output...

More information

SOUTH DAKOTA Reading and Communication Arts Standards Grade 9 Literature: The Reader s Choice Course 4 2002

SOUTH DAKOTA Reading and Communication Arts Standards Grade 9 Literature: The Reader s Choice Course 4 2002 SOUTH DAKOTA Reading and Communication Arts Standards Literature: The Reader s Choice Course 4 2002 OBJECTIVES Reading Goals and Indicators Ninth Grade Reading Goal 1: Students are able to read at increasing

More information

Data Integration through XML/XSLT. Presenter: Xin Gu

Data Integration through XML/XSLT. Presenter: Xin Gu Data Integration through XML/XSLT Presenter: Xin Gu q7.jar op.xsl goalmodel.q7 goalmodel.xml q7.xsl help, hurt GUI +, -, ++, -- goalmodel.op.xml merge.xsl goalmodel.input.xml profile.xml Goal model configurator

More information

AP CHEMISTRY 2007 SCORING GUIDELINES (Form B)

AP CHEMISTRY 2007 SCORING GUIDELINES (Form B) AP CHEMISTRY 2007 SCORING GUIDELINES (Form B) First Ionization Energy Question 6 Second Ionization Energy Third Ionization Energy (kj mol 1 ) (kj mol 1 ) (kj mol 1 ) Element 1 1,251 2,300 3,820 Element

More information

Detection and mitigation of Web Services Attacks using Markov Model

Detection and mitigation of Web Services Attacks using Markov Model Detection and mitigation of Web Services Attacks using Markov Model Vivek Relan RELAN1@UMBC.EDU Bhushan Sonawane BHUSHAN1@UMBC.EDU Department of Computer Science and Engineering, University of Maryland,

More information

Ivy Tech Community College of Indiana

Ivy Tech Community College of Indiana Ivy Tech Community College of Indiana POLICY TITLE Credit Transfer Awarding/Dual Credit POLICY NUMBER ASOM 4.3 PRIMARY RESPONSIBILITY Academic Affairs CREATION / REVISION / EFFECTIVE DATES Created September

More information

Working Title: Web Development/User Experience Specialist Classification: Analyst/Programmer (Career) Job Code: 0400 Range Code: 2

Working Title: Web Development/User Experience Specialist Classification: Analyst/Programmer (Career) Job Code: 0400 Range Code: 2 POSITION DESCRIPTION Department: Library Position Reports To: Associate Dean Working Title: Web Development/User Eperience Specialist Classification: Analyst/Programmer (Career) Job Code: 0400 Range Code:

More information

Supervised and unsupervised learning - 1

Supervised and unsupervised learning - 1 Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in

More information

1 st day Basic Training Course

1 st day Basic Training Course DATES AND LOCATIONS 13-14 April 2015 Princeton Marriott at Forrestal, 100 College Road East, Princeton NJ 08540, New Jersey 16-17 April 2015 Hotel Nikko San Francisco 222 Mason Street, San Francisco, CA

More information

UT Martin Password Policy May 2015

UT Martin Password Policy May 2015 UT Martin Password Policy May 2015 SCOPE The scope of this policy is applicable to all Information Technology (IT) resources owned or operated by the University of Tennessee at Martin. Any information

More information

Thesis Format Guide. Denise Robertson Graduate School Office 138 Woodland Street Room 104 508-793-7676 gradschool@clarku.edu

Thesis Format Guide. Denise Robertson Graduate School Office 138 Woodland Street Room 104 508-793-7676 gradschool@clarku.edu Thesis Format Guide This guide has been prepared to help graduate students prepare their research papers and theses for acceptance by Clark University. The regulations contained within have been updated

More information