Relative N-Gram Signatures: Document Visualization at the Level of Character N-Grams
Magdalena Jankowska, Evangelos Milios, Vlado Kešelj
Faculty of Computer Science, Dalhousie University
June 2013
Relative N-Gram Signatures
- Interactive classification of a document: who wrote this book?
- Analysis of characteristics of a document: what are the characteristics of the author's style?
- Language-independent method
Character N-Grams
- Strings of n consecutive characters from a given text
- Example text: "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do:" (Alice's Adventures in Wonderland by Lewis Carroll)
- For n=4, the first 4-grams are ALIC, LICE, ICE_, CE_W
- N-grams in our system: uppercase, with each sequence of non-word characters replaced by an underscore
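The extraction rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the RNG-Sig implementation: the function name `char_ngrams` is hypothetical, and treating "non-word characters" as the regular-expression class `\W` is an assumption.

```python
import re

def char_ngrams(text: str, n: int) -> list[str]:
    """Return the character n-grams of `text` in order of occurrence."""
    # Assumption: "non-word characters" means the regex class \W.
    # Uppercase first, then collapse each run of such characters into one underscore.
    normalized = re.sub(r"\W+", "_", text.upper())
    # Slide a window of width n over the normalized string.
    return [normalized[i:i + n] for i in range(len(normalized) - n + 1)]

print(char_ngrams("Alice was beginning to get very tired", 4)[:4])
# prints ['ALIC', 'LICE', 'ICE_', 'CE_W']
```

A text shorter than n yields an empty list, because the window never fits.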
Common N-Gram (CNG) Classifier
- Assigns a document to a class from a given set of classes (for example: works of Carroll, works of Twain, works of Shakespeare)
- Based on a comparison of the frequencies of the most common character n-grams
- Proposed by Vlado Kešelj, Fuchun Peng, Nick Cercone, and Calvin Thomas. N-gram-based author profiles for authorship attribution. In Proceedings of the Conference Pacific Association for Computational Linguistics, PACLING 03.
- Applications: authorship attribution, malicious code detection, gene classification, web page genre classification
CNG Classifier - Dissimilarity
- Profile: a sequence of the L most common n-grams of a given length n, each with its normalized frequency
- Example (n=4, L=6): document 1 is Alice's Adventures in Wonderland by Lewis Carroll, with profile f1; document 2 is Tarzan of the Apes by Edgar Rice Burroughs, with profile f2
- [Figure: the two six-n-gram profiles side by side, with distance(f1(x), f2(x)) marked for each n-gram x]
- An n-gram x that is missing from one of the profiles has frequency 0 there (f1(x)=0 or f2(x)=0)
- CNG dissimilarity between two documents: the sum of the distances with respect to all n-grams in the union of the profiles
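The profile and dissimilarity definitions above can be sketched as follows. The slides do not spell out the per-n-gram distance; this sketch assumes the squared relative difference used in the cited Kešelj et al. paper, and it assumes that "normalized frequency" means the count divided by the total number of n-grams in the document. The names `profile` and `cng_dissimilarity` are hypothetical.

```python
from collections import Counter
import re

def profile(text: str, n: int, L: int) -> dict[str, float]:
    """Map each of the L most common n-grams to its normalized frequency."""
    normalized = re.sub(r"\W+", "_", text.upper())
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    total = len(grams)
    # Assumption: frequency = count / total number of n-grams in the text.
    return {g: c / total for g, c in Counter(grams).most_common(L)}

def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum the per-n-gram distances over the union of the two profiles."""
    score = 0.0
    for g in set(p1) | set(p2):
        # An n-gram missing from a profile has frequency 0 there.
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        # Assumed distance: squared difference relative to the mean frequency.
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score
```

Under this assumed distance, an n-gram present in only one profile contributes exactly 4, so two profiles with no n-grams in common score 4 times the size of their union, and a profile compared with itself scores 0.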
Motivation
- Text visualization on the language-independent level of character n-grams: similarity of documents, characteristics of documents
- Visualization of the CNG classifier: reasons for the classification result, possibility of influencing the classification
RNG-Sig Web application
- Implemented as a web application
- d3.js JavaScript library for visualization
- Available online with pre-loaded data at:
Relative N-Gram Signature
- Visual representation of the CNG dissimilarity between two documents
- Example: relative signature of Tarzan of the Apes by Burroughs with respect to ("on the background of") Alice's Adventures in Wonderland by Carroll (the base document), with n=4 (4-grams) and L=500 (500 most common n-grams)
- Each strip represents an n-gram
- First group of strips: the 500 most common n-grams of Alice's Adventures, in order of decreasing frequency in Alice's Adventures
- Second group of strips: those among the 500 most common n-grams of Tarzan that are not among the 500 most common n-grams of Alice's Adventures, in order of decreasing frequency in Tarzan
- Color: distance between the two documents with respect to this n-gram
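The strip ordering described above can be sketched as a function that returns one (n-gram, distance) pair per strip. This is an illustration under assumptions: the name `signature_strips` is hypothetical, the distance is the squared relative difference from the cited CNG paper, and the mapping from distance to color and the on-screen placement of the two groups are left out.

```python
def signature_strips(base: dict[str, float], other: dict[str, float]) -> list[tuple[str, float]]:
    """Order the strips of the signature of `other` relative to `base`.

    Both arguments map the L most common n-grams of a document to their
    normalized frequencies.
    """
    def distance(g: str) -> float:
        # An n-gram missing from a profile has frequency 0 there.
        f1, f2 = base.get(g, 0.0), other.get(g, 0.0)
        return ((f1 - f2) / ((f1 + f2) / 2)) ** 2

    # Group 1: n-grams of the base document, by decreasing base frequency.
    base_part = sorted(base, key=base.get, reverse=True)
    # Group 2: n-grams only in the other profile, by decreasing frequency there.
    extra_part = sorted((g for g in other if g not in base), key=other.get, reverse=True)
    return [(g, distance(g)) for g in base_part + extra_part]

base = {"_THE": 0.03, "AND_": 0.02}
other = {"_THE": 0.03, "TARZ": 0.02, "ARZA": 0.01}
print(signature_strips(base, other))
# prints [('_THE', 0.0), ('AND_', 4.0), ('TARZ', 4.0), ('ARZA', 4.0)]
```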
Relative N-Gram Signature
- Visualizes similarity between documents on the level of character n-grams
- Extreme cases shown: the relative signature of two documents that do not share any of their respective 500 most common n-grams, and the signature of a document with respect to itself
Relative N-Gram Signature: Visual metaphor
- Inspiration: emission spectrum, the spectrum of frequencies of electromagnetic emissions by atoms or molecules (picture from Wikipedia)
Sequence of signatures
- Scenario: authorship analysis
- All signatures share the same base document
- Signature of the most similar document: Carroll's Through the Looking-Glass
- CNG dissimilarity score: the sum of the distances over all n-grams in a signature
- Minimum dissimilarity = classifier result
- Zooming in
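The "minimum dissimilarity = classifier result" rule can be sketched as below. The profiles are made-up toy values, not data from the talk, and the function names are hypothetical; the per-n-gram distance is again assumed to be the squared relative difference from the cited CNG paper.

```python
def cng_dissimilarity(p1: dict[str, float], p2: dict[str, float]) -> float:
    """Sum the assumed per-n-gram distances over the union of two profiles."""
    score = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score

def classify(base: dict[str, float], candidates: dict[str, dict[str, float]]):
    """Return the label with the lowest dissimilarity to `base`, plus all scores."""
    scores = {label: cng_dissimilarity(base, prof) for label, prof in candidates.items()}
    return min(scores, key=scores.get), scores

# Toy profiles for illustration only.
base = {"_THE": 0.03, "ALIC": 0.02}
candidates = {
    "Carroll": {"_THE": 0.03, "ALIC": 0.01},
    "Burroughs": {"_THE": 0.02, "TARZ": 0.02},
}
label, scores = classify(base, candidates)
print(label)  # prints Carroll
```

Each entry in `scores` corresponds to one signature in the sequence, so the same numbers can label the signatures and pick the result.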
Interactive exploration of signatures
- Browsing
- Context: most common words
- Context: concordance style, the given n-gram within the text
Language independence
- Example: Polish authors
- Searching for n-grams
Motivation for analysis of Mark Twain novels
- D. A. Keim and D. Oelke. Literature Fingerprinting: A New Method for Visual Literary Analysis. In Proceedings of the 2007 IEEE Symposium on Visual Analytics Science and Technology.
- In that visual analysis of the works of Mark Twain, Adventures of Huckleberry Finn stands out from his other works with respect to: function words frequency, Simpson's index, Hapax Legomena
- [Figure panels: Hapax Legomena; Function words (first dimension after PCA)]
Example analysis: comparison of novels by Mark Twain
- Adventures of Huckleberry Finn
Complementary Comparison View
- N-grams ordered separately in each signature, according to their distance
- Better for comparison of signatures, but not for exploring
Interactively influencing the visualization and the classifier
- Manual, task-dependent adaptation of the classification process
- Example: Ad-hoc Authorship Attribution Competition, 2004, Problem G, sample 02
Interactively influencing the visualization and the classifier
- Ignoring selected n-grams in the base document (in this example, n-grams originating mostly from proper names)
- Two options:
  - the length of the list of n-grams in the base document is kept intact by adding less frequent n-grams at the top
  - no new n-grams are added, so the list of n-grams for the base document becomes shorter
- Result after ignoring the selected n-grams: correct classification
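The two options for ignoring n-grams can be sketched as one function with a switch. This is an illustration only: the name `base_profile`, the `keep_length` flag, and the choice to keep normalizing by the original total count are all assumptions, and the counts are toy values.

```python
from collections import Counter

def base_profile(counts: Counter, L: int, ignored=frozenset(), keep_length: bool = True) -> dict[str, float]:
    """Build the base-document profile while ignoring selected n-grams.

    keep_length=True: refill with the next less frequent n-grams so the
    list still has L entries. keep_length=False: add nothing new, so the
    list becomes shorter.
    """
    total = sum(counts.values())
    ranked = [g for g, _ in counts.most_common()]
    if keep_length:
        chosen = [g for g in ranked if g not in ignored][:L]
    else:
        chosen = [g for g in ranked[:L] if g not in ignored]
    # Assumption: frequencies stay normalized by the original total count.
    return {g: counts[g] / total for g in chosen}

# Toy counts; "ALIC" stands in for an n-gram that comes from a proper name.
counts = Counter({"_THE": 10, "ALIC": 8, "AND_": 6, "ING_": 4, "_OF_": 2})
print(list(base_profile(counts, 3, {"ALIC"}, keep_length=True)))
# prints ['_THE', 'AND_', 'ING_']
print(list(base_profile(counts, 3, {"ALIC"}, keep_length=False)))
# prints ['_THE', 'AND_']
```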
Thank you!
More informationMovie Classification Using k-means and Hierarchical Clustering
Movie Classification Using k-means and Hierarchical Clustering An analysis of clustering algorithms on movie scripts Dharak Shah DA-IICT, Gandhinagar Gujarat, India dharak_shah@daiict.ac.in Saheb Motiani
More informationAtomic Calculations. 2.1 Composition of the Atom. number of protons + number of neutrons = mass number
2.1 Composition of the Atom Atomic Calculations number of protons + number of neutrons = mass number number of neutrons = mass number - number of protons number of protons = number of electrons IF positive
More informationWeb Modules for Garden Centers
Web Modules for Garden Centers 503.248.2159 overview web modules Run an online plant library to give your customers better inventory visibility. Create an online market place. No need for website redesign
More informationCOPYRIGHT ACT -- FAIR DEALING (Advisory for SUTD Faculty, Researchers, Staff and Students)
COPYRIGHT ACT -- FAIR DEALING (Advisory for SUTD Faculty, Researchers, Staff and Students) When determining whether copying of the whole or part of the work or adaptation constitutes fair dealing, the
More informationApp Building Guidelines
App Building Guidelines App Building Guidelines Table of Contents Definition of Apps... 2 Most Recent Vintage Dataset... 2 Meta Info tab... 2 Extension yxwz not yxmd... 3 Map Input... 3 Report Output...
More informationSOUTH DAKOTA Reading and Communication Arts Standards Grade 9 Literature: The Reader s Choice Course 4 2002
SOUTH DAKOTA Reading and Communication Arts Standards Literature: The Reader s Choice Course 4 2002 OBJECTIVES Reading Goals and Indicators Ninth Grade Reading Goal 1: Students are able to read at increasing
More informationData Integration through XML/XSLT. Presenter: Xin Gu
Data Integration through XML/XSLT Presenter: Xin Gu q7.jar op.xsl goalmodel.q7 goalmodel.xml q7.xsl help, hurt GUI +, -, ++, -- goalmodel.op.xml merge.xsl goalmodel.input.xml profile.xml Goal model configurator
More informationAP CHEMISTRY 2007 SCORING GUIDELINES (Form B)
AP CHEMISTRY 2007 SCORING GUIDELINES (Form B) First Ionization Energy Question 6 Second Ionization Energy Third Ionization Energy (kj mol 1 ) (kj mol 1 ) (kj mol 1 ) Element 1 1,251 2,300 3,820 Element
More informationDetection and mitigation of Web Services Attacks using Markov Model
Detection and mitigation of Web Services Attacks using Markov Model Vivek Relan RELAN1@UMBC.EDU Bhushan Sonawane BHUSHAN1@UMBC.EDU Department of Computer Science and Engineering, University of Maryland,
More informationIvy Tech Community College of Indiana
Ivy Tech Community College of Indiana POLICY TITLE Credit Transfer Awarding/Dual Credit POLICY NUMBER ASOM 4.3 PRIMARY RESPONSIBILITY Academic Affairs CREATION / REVISION / EFFECTIVE DATES Created September
More informationWorking Title: Web Development/User Experience Specialist Classification: Analyst/Programmer (Career) Job Code: 0400 Range Code: 2
POSITION DESCRIPTION Department: Library Position Reports To: Associate Dean Working Title: Web Development/User Eperience Specialist Classification: Analyst/Programmer (Career) Job Code: 0400 Range Code:
More informationSupervised and unsupervised learning - 1
Chapter 3 Supervised and unsupervised learning - 1 3.1 Introduction The science of learning plays a key role in the field of statistics, data mining, artificial intelligence, intersecting with areas in
More information1 st day Basic Training Course
DATES AND LOCATIONS 13-14 April 2015 Princeton Marriott at Forrestal, 100 College Road East, Princeton NJ 08540, New Jersey 16-17 April 2015 Hotel Nikko San Francisco 222 Mason Street, San Francisco, CA
More informationUT Martin Password Policy May 2015
UT Martin Password Policy May 2015 SCOPE The scope of this policy is applicable to all Information Technology (IT) resources owned or operated by the University of Tennessee at Martin. Any information
More informationThesis Format Guide. Denise Robertson Graduate School Office 138 Woodland Street Room 104 508-793-7676 gradschool@clarku.edu
Thesis Format Guide This guide has been prepared to help graduate students prepare their research papers and theses for acceptance by Clark University. The regulations contained within have been updated
More information