Scala type classes and machine learning. David Andrzejewski Bay Area Scala Enthusiasts - 1/14/2013, 1/16/2013



Similar documents
Overview. Elements of Programming Languages. Advanced constructs. Motivating inner class example

Comp 411 Principles of Programming Languages Lecture 34 Semantics of OO Languages. Corky Cartwright Swarat Chaudhuri November 30, 20111

Summit Public Schools Summit, New Jersey Grade Level / Content Area: Mathematics Length of Course: 1 Academic Year Curriculum: AP Computer Science A

Analyse et Conception Formelle. Lesson 5. Crash Course on Scala

Java (12 Weeks) Introduction to Java Programming Language

Introduction to Bayesian Classification (A Practical Discussion) Todd Holloway Lecture for B551 Nov. 27, 2007

Web Development in Java

Semester Review. CSC 301, Fall 2015

Hadoop + Clojure. Hadoop World NYC Friday, October 2, Stuart Sierra, AltLaw.org

CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Mammoth Scale Machine Learning!

Computing Concepts with Java Essentials

COSC 6397 Big Data Analytics. Mahout and 3 rd homework assignment. Edgar Gabriel Spring Mahout

MACHINE LEARNING IN HIGH ENERGY PHYSICS

Instruction Set Architecture (ISA)

f...-. I enterprise Amazon SimpIeDB Developer Guide Scale your application's database on the cloud using Amazon SimpIeDB Prabhakar Chaganti Rich Helms

CSE 373: Data Structure & Algorithms Lecture 25: Programming Languages. Nicki Dell Spring 2014

AP Computer Science Java Subset

High-Level Programming Languages. Nell Dale & John Lewis (adaptation by Michael Goldwasser)

COSC282 BIG DATA ANALYTICS FALL 2015 LECTURE 2 - SEP 9

OPINION MINING IN PRODUCT REVIEW SYSTEM USING BIG DATA TECHNOLOGY HADOOP

RARITAN VALLEY COMMUNITY COLLEGE ACADEMIC COURSE OUTLINE. CISY 105 Foundations of Computer Science

2.0. Specification of HSN 2.0 JavaScript Static Analyzer

Analysis Tools and Libraries for BigData

Question 2 Naïve Bayes (16 points)

Content-Based Recommendation

Java the UML Way: Integrating Object-Oriented Design and Programming

Customer Case Study. Sharethrough

Affdex SDK for Android. Developer Guide For SDK version 1.0

TECHNOLOGY Computer Programming II Grade: 9-12 Standard 2: Technology and Society Interaction

Chapter 6. The stacking ensemble approach

Kotlin for Android Developers

Habanero Extreme Scale Software Research Project

Why is Internal Audit so Hard?

Quantifind s story: Building custom interactive data analytics infrastructure

CSCI567 Machine Learning (Fall 2014)

6.170 Tutorial 3 - Ruby Basics

Java 6 'th. Concepts INTERNATIONAL STUDENT VERSION. edition

ARIZONA CTE CAREER PREPARATION STANDARDS & MEASUREMENT CRITERIA SOFTWARE DEVELOPMENT,

Big Data Frameworks: Scala and Spark Tutorial

An Introduction to Machine Learning and Natural Language Processing Tools

A Sumo Logic White Paper. Harnessing Continuous Intelligence to Enable the Modern DevOps Team

Projektgruppe. Categorization of text documents via classification

Machine Learning using MapReduce

Model Deployment. Dr. Saed Sayad. University of Toronto

Cloud Computing for Education Workshop

Learning from Data: Naive Bayes

The C Programming Language course syllabus associate level

Analytics on Big Data

Classification Problems

Introduction to Synoptic

Object-Oriented Design Lecture 4 CSU 370 Fall 2007 (Pucella) Tuesday, Sep 18, 2007

CS 106 Introduction to Computer Science I

Chapter 15 Functional Programming Languages

Distributed Computing and Big Data: Hadoop and MapReduce

Data Mining & Data Stream Mining Open Source Tools

Mining a Corpus of Job Ads

Spark: Cluster Computing with Working Sets

TYPESAFE TOGETHER - SUBSCRIBER TRAINING. Training Classes

How To Learn From The Revolution

Evolution of the Major Programming Languages

Machine Learning Final Project Spam Filtering

HDB++: HIGH AVAILABILITY WITH. l TANGO Meeting l 20 May 2015 l Reynald Bourtembourg

Triplestore Testing in the Cloud with Clojure. Ryan Senior

CS 378 Big Data Programming. Lecture 9 Complex Writable Types

Towards Interface Types for Haskell

8. Machine Learning Applied Artificial Intelligence

A Scheme for Automation of Telecom Data Processing for Business Application

Sentiment analysis of Twitter microblogging posts. Jasmina Smailović Jožef Stefan Institute Department of Knowledge Technologies

Crowdfunding Support Tools: Predicting Success & Failure

Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

Handout 1. Introduction to Java programming language. Java primitive types and operations. Reading keyboard Input using class Scanner.

Charles Dierbach. Wiley

Introduction to Object-Oriented Programming

Machine Learning for Naive Bayesian Spam Filter Tokenization

CSCE 110 Programming I Basics of Python: Variables, Expressions, and Input/Output

T O B C A T C A S E G E O V I S A T DETECTIE E N B L U R R I N G V A N P E R S O N E N IN P A N O R A MISCHE BEELDEN

How Scala Improved Our Java

core. Volume I - Fundamentals Seventh Edition Sun Microsystems Press A Prentice Hall Title ULB Darmstadt

Security Testing For RESTful Applications

CSE 308. Coding Conventions. Reference

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi,

Java Application Developer Certificate Program Competencies

Social Business Intelligence For Retail Industry

The Impact of Big Data on Classic Machine Learning Algorithms. Thomas Jensen, Senior Business Expedia

Java CPD (I) Frans Coenen Department of Computer Science

1 Maximum likelihood estimation

Introduction to Support Vector Machines. Colin Campbell, Bristol University

Content visualization of scientific corpora using an extensible relational database implementation

CS 348: Introduction to Artificial Intelligence Lab 2: Spam Filtering

Social Media Mining. Data Mining Essentials

Class 16: Function Parameters and Polymorphism

Formal Engineering for Industrial Software Development

Data Mining - Evaluation of Classifiers

Glossary of Object Oriented Terms

The Clean programming language. Group 25, Jingui Li, Daren Tuzi

Statistical Machine Learning

VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter

Transcription:

Scala type classes and machine learning David Andrzejewski Bay Area Scala Enthusiasts - 1/14/2013, 1/16/2013

Context: Sumo Logic Mission: Transform Machine Data into IT and Business Insights Offering: First enterprise-class cloud-based log management and analytics service Funding: Accel, Greylock, and Sutter Hill Ventures Machine log data Sumo Logic Service Advanced analytics Scala

Machine learning example: spam classification Training data Hand-labeled e-mails Classifier Naive Bayes Test data Real e-mails "in the wild"

Machine learning example: more details Emails Actually important Label Feature vectors Classifier Fun and exciting

Feature vectors: machine learning Machine learning on feature vectors "Bob, your new project idea is awful." [1.3, 7.5, 0.0, 0.0, -4.2,...] Crucial design decision single words vs ngrams? token vs character? counts vs binary? binary presence / absence? weighting schemes (TF-IDF)? smoothing?

Feature vectors: software engineering Machine learning code only cares about features Input formats for popular libraries scikits.learn (Python): NumPy arrays weka (Java): Instance interface mahout (Hadoop-Java): Vector interface Common integration issues caller bookkeeping not type-safe / parameterized wrappers not easily composable

Polymorphism via type class pattern is-a type contract [T <: FeatureVector] inheritance / interfaces / sub-typing "type T is a calculator of its feature vector" has-a type contract [T : FeatureVector] type class pattern "type T has a calculator of its feature vector" Classic Scala examples: Numeric and Ordering

Context bound [T : FeatureVector] trait FeatureVector[T] { def features(x: T): Array[Double] class LinearClassifier[T : FeatureVector](weights: Array[Double]) { val featuremapping = implicitly[featurevector[t]] def innerproduct(x: Array[Double], y: Array[Double]): Double = { (0 until x.size).map{i => x(i) * y(i).sum def classify(x: T): Boolean = { innerproduct(featuremapping.features(x), weights) > 0

Context bound (de-sugared) trait FeatureVector[T] { def features(x: T): Array[Double] //class LinearClassifier[T : FeatureVector](weights: Array[Double]) { // val featuremapping = implicitly[featurevector[t]] class LinearClassifier[T](weights: Array[Double]) (implicit featuremapping: FeatureVector[T]) { def innerproduct(x: Array[Double], y: Array[Double]): Double = { (0 until x.size).map{i => x(i) * y(i).sum def classify(x: T): Boolean = { innerproduct(featuremapping.features(x), weights) > 0

Context bound: caller usage implicit object CharValueFeatures extends FeatureVector[String] { def features(x: String): Array[Double] = { val a = x.filter(_ == 'a').size.todouble val b = x.filter(_ == 'b').size.todouble Array[Double](a,b) val classifier = new LinearClassifier[String](Array(-1.0, 0.5)) println("%s classified as %b".format("aaab", classifier.classify ("aaab"))) aaab classified as false

"manual override" object CharCaseFeatures extends FeatureVector[String] { def features(x: String): Array[Double] = { val upper = x.filter(_.isupper).size.todouble val lower = x.filter(_.islower).size.todouble Array[Double](upper,lower) val classifier = new LinearClassifier[String](Array(-1.0, 0.5))(CharCaseFeatures) println("%s classified as %b".format("aaab", classifier.classify ("aaab"))) aaab classified as true

Why use this (or not)? Advantages extend code you don't own cleaner than inheritance / wrapping easily substitute evidence parameters avoid "over-stuffing" value types (eg, case classes) leverage type checking parameter-passing vs type annotation Disadvantages implicits can be tricky unfamiliar code idiom Other applications serialization

References Shameless plug: Sumo Logic Free (500 MB / day) https://www.sumologic.com/signup/ Scala type class pattern Sumo Logic blog post (+ references) http://www.sumologic.com/blog/company/scala-at-sumo-typeclasses-with-a-machine-learning-example Seth Tisue talk at nescala 2012 http://www.youtube.com/watch?v=yyo0ganyvie Scala in Depth by Joshua D. Suereth Other related stuff (I think?) Haskell typeclasses Clojure protocols Scalaz