Mining Text Data for Useful Information in Higher Education
John Zilvinskis
Indiana University




Institutional Researcher's Credo
"We have not succeeded in answering all our problems; indeed, we sometimes feel we have not completely answered any of them. The answers we have found have only served to raise a whole set of new questions. In some ways we feel that we are as confused as ever, but we think we are confused on a higher level and about more important things."
Earl C. Kelley, Professor of Secondary Education at Wayne University, 1951

Presentation Overview
1. Describe basic concepts of text mining
2. Invite presentation attendees to ask questions and discuss application of this technology
3. List the differences in text mining software
4. Apply this technique to two real-life examples
5. Provide implications and considerations

Raise your hand if
- You have a general understanding of text mining
Keep your hand up if
- You have or someone you know has participated in a text mining project
- You have played a significant role in at least one project that used text mining
- You have written code for or worked on several text mining projects

Learning Outcomes
As a result of attending this session, participants will be able to:
- List fundamental methodologies for organizing text data.
- Describe how one could integrate mined text in student learning and performance analytics.
- Compare the differences between text mining software packages.
- Use text mining methods to refine survey questions.

Big Data & Data Mining
Big Data (Laney):
- Volume (amount of data)
- Velocity (speed of data)
- Variety (range of data types and sources)
Data Mining: applying algorithms to big data to generate new information

Analytics
- Predictive, Automated, Scale, Real time
- Data mining to create actionable intelligence (Campbell, DeBlois, & Oblinger, 2007, p. 42)
- Learning v. Student Analytics

Text Mining
The need to turn text into numbers so powerful algorithms can be applied to large document databases (Miner, Delen, Elder, Fast, Hill, & Nisbet, 2012, p. 30)
Text analytics:
- Volume (amount of data)
- Velocity (speed of data)
- Variety (range of data types and sources)
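One way to picture "turning text into numbers" is a document-term matrix, where each row is a document and each column counts one word. The sketch below is a minimal illustration using only the Python standard library; the function names and the two sample comments are invented for this example and do not come from the presentation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix

# Hypothetical survey comments, invented for illustration only.
docs = [
    "I was a tutor in the writing center",
    "Tutor and mentor for first-year students",
]
vocab, dtm = document_term_matrix(docs)
```

Once text is in this form, each row can be fed to the same classification, clustering, or regression routines used for any other numeric data.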

Citation
Miner, Delen, Elder, Fast, Hill, & Nisbet (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications.

Text Mining Processes
- Define project and identify data
- Process data: establish a corpus, pre-process data, extract knowledge
- Develop models
- Evaluate results
- Disseminate results
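The "establish a corpus" and "pre-process data" steps might look something like the sketch below. It is an assumption-laden illustration: the stop-word list is tiny and hypothetical, the plural stripping is a crude stand-in for a real stemmer, and the two documents are invented.

```python
import re

# A tiny, hypothetical stop-word list; real projects use a much larger one.
STOP_WORDS = {"a", "an", "and", "are", "the", "of", "in", "to", "i", "was", "for", "my"}

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and strip a plural 's'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Crude normalization (an assumption, not a real stemmer): drop a trailing
    # 's' from words longer than three letters that do not end in 'ss'.
    return [
        t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
        for t in kept
    ]

# Step 1: establish a corpus (hypothetical e-portfolio snippets).
corpus = {
    "doc1": "I love my family and the teachers in my courses",
    "doc2": "Games and movies are my favorite",
}
# Step 2: pre-process every document in the corpus.
processed = {doc_id: preprocess(text) for doc_id, text in corpus.items()}
```

The cleaned token lists are what the later "extract knowledge" step would operate on.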

Extract Knowledge
- Classification
- Clustering
- Association
- Trend analysis

Why Not Qualitative Research?
- Requires extensive resources
- Data must be processed in a timely fashion
- Might not be practical with big data
- Information must integrate with other data

What Kind of Text Can We Mine? For What Purpose Should We Mine?
"Perhaps attendees could share what type of text-based datasets are available to them or which ones they would like to have access to. This may help IR staff recognize what text they have access to and can analyze in addition to learning how they may conduct such analyses."
AIR Program Reviewer

How Can We Mine Text in IR?
Kind of Data                        For What Purpose
Application essays                  Acceptance, enrollment
Written assignments                 Likelihood of passing
CMS postings                        Participation
Student blogs                       Change in student major
Course evaluations                  Faculty success
Surveys                             Open-ended questions
E-portfolios                        Student success
Early alert, course drop text       Student performance

Software
Freeware:
- RapidMiner: easy user interface, inverse document frequencies, some aspects for purchase
- Weka/KEA: applicable to machine learning, some resources
- R: computer science heavy, many online resources
Commercial software:
- Modeler Premium (SPSS, IBM): strong user interface, other analytics tools, easy to use and comprehensive dictionary
- Enterprise Miner (SAS): moderate user interface, comprehensive data manipulation, and integrated clustering function
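Inverse document frequency, mentioned above, down-weights words that appear in nearly every document. The sketch below uses one common textbook variant, term frequency times log(N / document frequency). Individual packages may use different smoothing or normalization, so this should not be read as any particular tool's formula, and the token lists are invented.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Weight each term by term frequency times log(N / document frequency)."""
    n_docs = len(tokenized_docs)
    # Count how many documents contain each term at least once.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights

# Hypothetical, already pre-processed documents.
docs = [
    ["student", "tutor", "math"],
    ["student", "mentor"],
    ["student", "tutor", "writing"],
]
weights = tf_idf(docs)
```

Here "student" appears in every document, so its weight is zero, while rarer words such as "mentor" carry more weight.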

Classifying Open-Ended Responses
- National Survey of Student Engagement
- Experimental item set: leadership
- Formal leadership core item
- 1,482 of 4,836 students listed "other"
- Classified 830 (56%) entries

Classifying Open-Ended Responses
Position              n      % of other
Tutoring              145    9.8%
Teaching Assistant    87     5.9%
Research Assistant    60     4.0%
Secretary             55     3.7%
Treasurer             57     3.8%
Mentor                54     3.6%
Member                51     3.4%
Editor                25     1.7%
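A dictionary-based classifier is one simple way write-in responses like these could be sorted into categories. The sketch below is hypothetical: the category names echo the table, but the keyword patterns and sample responses are invented here and are not the rules used in the study.

```python
import re
from collections import Counter

# Hypothetical keyword dictionary; patterns are invented for illustration.
CATEGORY_PATTERNS = {
    "Tutoring": r"\btutor",
    "Teaching Assistant": r"\bteaching assistant\b|\bta\b",
    "Treasurer": r"\btreasurer\b",
}

def classify(response):
    """Return the first category whose pattern matches, else None."""
    text = response.lower()
    for category, pattern in CATEGORY_PATTERNS.items():
        if re.search(pattern, text):
            return category
    return None

# Invented write-in responses.
responses = ["Math tutor", "TA for intro biology", "Club treasurer", "helped out"]
labels = [classify(r) for r in responses]
tally = Counter(label for label in labels if label is not None)
share_classified = sum(tally.values()) / len(responses)
```

The same share calculation applied to the counts reported above (830 of 1,482 "other" entries) gives the 56% classification rate.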

Classifying Open-Ended Responses
                      Did Not Complete        Completed
                      Formal Leadership       Formal Leadership
Position              n      %                n      %
Original Option
Resident Assistant    206    34.3%            395    65.7%
Diversity Advocate    28     38.9%            44     61.1%
Judicial Officer      20     37.7%            33     62.3%
President             41     4.6%             846    95.4%
Write-In Other
Tutoring              77     53.1%            68     46.9%
Teaching Assistant    44     50.6%            43     49.4%
Treasurer             13     23.6%            42     76.4%
Editor                5      20.0%            20     80.0%

Clustering E-Portfolio Submissions
- City University of New York (CUNY) Guttman
- High touch, block scheduling, learning communities, summer bridge
- Bill and Melinda Gates grant
- 163 student e-portfolio introductions

Clustering E-Portfolio Submissions
Concept                  Clustered Terms
Family                   family, york, high school, college, child
Learning                 class, teacher, art, math, subject
Everyday                 know, day, love, life
College participation    high school, school, attend, guttman
Gaming                   game, movie, favorite, watch, video
Making friends           shy, person, friend, know, quiet
Recreation               art, basketball, play, sport, travel
Society                  social, worker, work, believe, help
Technology               technology, information, art, health, mind
Business                 guttman, business, manhattan, administration, graduate
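To give a feel for how documents can be grouped by shared vocabulary, here is a minimal sketch of a simple "leader" clustering using Jaccard similarity. This is a swapped-in illustrative technique, not the clustering algorithm of any commercial package or the one used for the table above; the threshold and the token sets are invented.

```python
from collections import Counter

def jaccard(a, b):
    """Share of distinct terms that two token sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(token_sets, threshold=0.25):
    """Assign each document to the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document indexes; item 0 is the seed
    for i, tokens in enumerate(token_sets):
        for members in clusters:
            if jaccard(tokens, token_sets[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing seed was similar enough, so start a new cluster.
            clusters.append([i])
    return clusters

def top_terms(token_sets, members, k=3):
    """Most common terms across the documents in one cluster."""
    counts = Counter(term for i in members for term in token_sets[i])
    return [term for term, _ in counts.most_common(k)]

# Hypothetical pre-processed e-portfolio introductions.
docs = [
    {"family", "child", "college"},
    {"game", "movie", "video"},
    {"family", "college", "york"},
    {"game", "video", "watch"},
]
clusters = leader_cluster(docs)
```

Calling `top_terms(docs, clusters[0])` lists the most frequent terms in the first group, which an analyst would then name with a concept label, much as in the table above.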

Regression of Academic Preparation and Clustered Text Related to Credit Hours
Independent Variable      β        Sig.
SATV                      -0.23    0.02
SATM                      0.22     0.02
WritProf                  0.20     0.02
Age                       0.08     0.31
Connection to family      -0.15    0.06
R²                        0.12
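A clustered-text variable such as "Connection to family" can enter a regression as a 0/1 indicator of whether a student's text fell in that cluster. The sketch below shows only the one-predictor case, where the standardized coefficient equals Pearson's r; the model above has several predictors and would need a full multiple regression. All data here are invented toy values, not the study's data.

```python
import math

def standardized_beta(x, y):
    """Standardized slope of y on a single predictor x (equals Pearson's r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented toy data: 1 if a student's introduction fell in the "Family"
# cluster, 0 otherwise, paired with made-up credit hours.
family_cluster = [1, 0, 1, 0, 0, 1]
credit_hours = [12, 15, 9, 15, 12, 12]
beta = standardized_beta(family_cluster, credit_hours)
```

A negative value in this toy example simply means the invented students in the cluster earned fewer invented credit hours; it says nothing about the real result.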

Implications
- Process of automation
- Considering text source
- Weight of sentiment

Considerations
- Theoretical v. A-theoretical
- Ethical considerations
- Creepy treehouse
- Use of language

Thank You