Measuring Usability: Data gathering & usability metrics. The lecture series 2012-2013. Today 5/29/2013




"On Sunday afternoon, Matthieu lies hooked up to the electrodes"

Measuring Usability: Data gathering & usability metrics
Universiteit Utrecht
Dr. S. Renooij (with thanks to Dr. H. Prüst)

TROUW 8/3 2010

The lecture series 2012-2013
1 (17): Introduction to UE, interaction design
2 (18): Establishing requirements, paper prototyping (Maartje de Vries)*
3 (19): Day after Ascension Day
4 (20): Developing (digital) learning materials and usability (Ellen Schuurink)*
5 (21): Resits, period 3
6 (22): Performance metrics, self-reported metrics, scales, physiological metrics
7 (23): UE in large (government) projects (Margot Lagendijk)*
8 (24): Different UE methods throughout the entire process (Stijn Nieuwendijk)*
9 (25): Evaluation, DECIDE framework
10 (26): UE overview and developments
11 (27): Final test
* attendance mandatory

Today: data gathering & analysis
Accompanying literature:
- Tullis & Albert (2008), ch. 4 and 6
- Sharp et al. (2011), ch. 7 and 8
Usability metrics:
- Performance metrics
- Self-reported metrics (user perception)
- Scales for UX
- Behavioural and physiological metrics

Key issues that require attention for any data gathering session to be successful:
- goal setting
- identifying participants (sampling)
- relationship with participants (clean & professional)
- pilot studies
- triangulation

Triangulation
Triangulation: the investigation of a phenomenon from (at least) two different perspectives (Jupp, 2009). Four types:
- Triangulation of data: data is drawn from different sources at different times, in different places or from different people (possibly by using a different sampling technique).
- Investigator triangulation: different researchers (observers, interviewers, etc.) have been used to collect and interpret the data.
- Triangulation of theories: the use of different theoretical frameworks through which to view the data or findings.
- Methodological triangulation: employing different data gathering techniques.

User-centered design: main data gathering techniques
General:
- Interviews: structured, unstructured, semi-structured; focus groups
- Questionnaires
- Observation
- Data recording
Usability-specific:
- Inspections (expert evaluator)
- Predictive models (theoretical)

Observation
- Direct observation in controlled environments: think aloud
- Indirect observation (tracking users' activities): diaries, interaction logging, eye tracking, (web) analytics
- Direct observation in the field: structuring frameworks, degree of participation (insider or outsider), ethnography

Structuring frameworks to guide observation
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- When does the activity occur?
- Where is it happening?
- Why is it happening?
- How is the activity organized?

Ethnography
- Ethnography is a philosophy with a set of techniques that include participant observation and interviews.
- Ethnographers immerse themselves in the culture that they study.
- Co-operation of the people being observed is required.
- A researcher's degree of participation can vary along a scale from outside to inside.
- Analyzing video and data logs can be time-consuming; data analysis is continuous.
- Interpretivist technique.
- Collections of comments, incidents, and artifacts are made.

Data recording
- notes, audio, video, photographs
- data logging: time on task, efficiency
- measuring physiological data: heart rate, temperature, eye tracking, facial expressions

Choosing between techniques

Usability criteria
Specific criteria by which the usability of a product can be determined by measuring the user's performance. For example:
- time needed to perform a task (efficiency)
- time needed to learn a task (learnability)
- number of errors made after a period of time has elapsed (memorability)
Testable, measurable, quantifiable: usability metrics (Tullis & Albert, ch. 4 onwards)

Criteria for user experience
User experience refers to all aspects of someone's interaction with a product, application, or system (Tullis & Albert, 2008).
- How many errors do users make in trying to log onto a library system?
- How many users get frustrated trying to read the tiny serial number on the back of their new MP3 player when trying to register it?
Behaviours and attitudes that can be measured to give insight into user experience.

Usability metrics
- Ways of measuring or evaluating the user's experience
- Reveal something about the user's experience
- What to measure? How to measure? When to measure?
- Data gathering
- Analysis and interpretation of data
Tullis & Albert (2008) review three main types of usability metrics:
- Performance metrics
- Self-reported metrics (user perception)
- Behavioural and physiological metrics

Measurements for usability / user experience
Usability metrics reveal something about the user's experience of the interaction between the user and the product:
- Performance: e.g. criteria that explicitly measure effectiveness, efficiency, learnability
- Perceived experience: e.g. satisfaction, expectation, perceived ease of use, perceived usefulness, awareness, pleasure
- Physiology and behaviour: e.g. eye movement, neural activity, facial expressions, stress

Validity: whether an instrument actually measures what it sets out to measure
- Construct validity: the degree to which a measure relates to other variables as expected within a system of theoretical relationships
- Content validity: the degree to which a measure corresponds to the content of the construct it was designed to cover
- Criterion validity: evidence that scores from an instrument correspond with concurrent external measures conceptually related to the measured construct
- Ecological validity: evidence that the results of a study can be applied to real-world conditions

Reliability: whether an instrument can be interpreted consistently across different situations (see the course Wetenschappelijke Onderzoeksmethoden, Scientific Research Methods)

Performance metrics
Performance metrics describe user behaviour in relation to the use of scenarios or tasks; they are useful to estimate the magnitude of a specific usability issue.
- task success (measures effectiveness, efficiency, ...)
- time-on-task (measures efficiency, learnability, ...)
- steps-to-completion (measures efficiency, ...)
- efficiency (measures efficiency, ...)
- lostness (measures efficiency, ...)
- errors

Performance metrics: task success
How effectively are users able to complete a given set of tasks? Does the task have a clear end state? Compare:
- "Find the price of the book that is used in the Usability Engineering module"
versus
- "Investigate how you can fill in the elective space (profileringsruimte) of the Informatiekunde (Information Science) bachelor's programme"

Task success
Levels of success:
- Complete success: without assistance / with assistance
- Partial success: without assistance / with assistance
- Failure: participant thought it was complete, but it wasn't / participant gave up
Binary success (0/1)

Binary success rates in Excel or SPSS

Participant   Task 1  Task 2  Task 3
Pp1           1       0       1
Pp2           1       0       1
Pp3           1       1       1
Pp4           1       1       1
Pp5           0       0       1
Pp6           1       0       0
Pp7           0       1       1
Pp8           0       0       1
Pp9           1       0       1
Pp10          1       1       1
Pp11          0       1       1
Pp12          1       0       1
Mean          0.67    0.42    0.92
Std Dev       0.49    0.51    0.29
95% CI margin 0.28    0.29    0.16
Lower Limit   0.39    0.13    0.75
Upper Limit   0.95    0.71    1.08
t-test: p = 0.237
Figure: bar chart of % success (0-100%) for Task 1, Task 2 and Task 3.

Confidence intervals
A confidence interval is a range that estimates the true population value for a statistic; "extremely valuable for any usability professional" (Tullis 2008).
For a given statistic calculated from a sample (e.g. the mean), the confidence interval is a range of values around that statistic that is believed to contain, with a certain probability (e.g. 95%), the population value. http://www.amstat.org
NB: each binary success observation follows a Bernoulli distribution, so the number of successes follows a binomial distribution. This should be taken into account when computing the confidence interval (e.g. use the Wald interval, or the Adjusted Wald interval for n < 20). Note that the normal-approximation upper limit of 1.08 for Task 3 in the table exceeds 1, the maximum possible success rate.

Binary success rates and confidence intervals
Figure: the same chart for 12 participants and for 36 participants. Given an equal distribution of outcomes, more participants yield smaller confidence intervals.

Binary success: difference between tasks
Analysis: to determine statistically significant differences between the various tasks, perform a t-test or an analysis of variance (ANOVA). See the course Wetenschappelijke Onderzoeksmethoden (Scientific Research Methods).
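The spreadsheet calculations in the table can be reproduced in a few lines. The Python sketch below (all function names are hypothetical, chosen for this example) recomputes the mean, sample standard deviation and normal-approximation interval from the 0/1 data, and contrasts them with the Adjusted Wald (Agresti-Coull) interval for a binomial proportion, which stays within [0, 1]. It also runs an equal-variance two-sample t-test; reading the table's 0.237 as the two-tailed p-value for Task 1 versus Task 2 is an assumption, although that comparison does reproduce the value. To avoid third-party dependencies, the p-value is obtained by integrating the Student t density numerically with Simpson's rule; in practice a statistics package would do this.

```python
import math

# 0/1 task-success data from the table (participants Pp1..Pp12 in order).
TASKS = {
    "Task 1": [1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    "Task 2": [0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0],
    "Task 3": [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
}
Z95 = 1.96  # two-sided 95% critical value of the standard normal distribution


def normal_ci(xs, z=Z95):
    """Mean, sample SD, margin and limits of mean +/- z*s/sqrt(n).

    This mirrors the spreadsheet; the limits can fall outside [0, 1].
    """
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    margin = z * sd / math.sqrt(n)
    return mean, sd, margin, mean - margin, mean + margin


def adjusted_wald_ci(successes, n, z=Z95):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the valid range of a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def t_two_sample_equal_var(a, b):
    """Independent two-sample t-test with pooled variance: (t, df, p)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(pooled * (1 / na + 1 / nb))
    df = na + nb - 2
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    for name, xs in TASKS.items():
        mean, sd, margin, lo, hi = normal_ci(xs)
        aw_lo, aw_hi = adjusted_wald_ci(sum(xs), len(xs))
        print(f"{name}: mean={mean:.2f} sd={sd:.2f} +/-{margin:.2f} "
              f"normal=[{lo:.2f}, {hi:.2f}] adj. Wald=[{aw_lo:.2f}, {aw_hi:.2f}]")
    t, df, p = t_two_sample_equal_var(TASKS["Task 1"], TASKS["Task 2"])
    print(f"Task 1 vs Task 2: t={t:.3f}, df={df}, p={p:.3f}")
```

For Task 3 the Adjusted Wald upper limit is clipped at 1.0 instead of the impossible 1.08. Because the same participants performed all tasks, a paired test or a test designed for proportions may be more appropriate; the independent equal-variance version is used here only because it matches the spreadsheet value.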

Measuring efficiency
Efficiency is the amount of effort that a user expends to complete a task. There are two types of effort:
- Cognitive effort involves finding the right place to perform an action (e.g. finding a link on a web page), deciding what action is necessary (should I click on this link?), and interpreting the results of the action.
- Physical effort is the physical activity required to take an action.

Measuring efficiency: simple and compound measures
- Simple metrics: time-on-task; steps-to-completion (number of steps or actions needed to complete a task)
- Compound metrics: efficiency; lostness

Performance metrics: time-on-task
How much time is required to complete a task?
How to measure? All tasks? Only successful tasks?
Is it true that the faster a participant can complete a task, the better the user experience? Hotel/airplane ticket reservation: yes (?). Games?

Performance metrics: efficiency
Compound efficiency metric:
- combination of task success and time-on-task
- typically measured per task; alternative: per participant
efficiency = task completion rate / mean time per task
or, alternatively,
efficiency = number of successfully completed tasks / total time spent

Performance metrics: lostness
Compound efficiency metric (Smith 1996), with an example.
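As an illustration of the simple and compound efficiency metrics, here is a minimal Python sketch. The task records are invented, the function names are hypothetical, and computing the completion-rate-based efficiency over all attempts (rather than successful ones only) is an assumption. The lostness function uses the formula usually attributed to Smith (1996), L = sqrt((N/S - 1)^2 + (R/N - 1)^2), where N is the number of different pages visited, S the total number of pages visited (including revisits) and R the minimum number of pages needed to complete the task; a score of 0 means the participant followed the optimal path.

```python
import math
from statistics import mean

# Invented records for ONE task: (task succeeded?, time-on-task in seconds).
RECORDS = [(1, 40.0), (1, 55.0), (0, 120.0), (1, 35.0), (0, 90.0)]


def time_on_task(records, successful_only=False):
    """Mean time-on-task over all attempts, or over successful attempts only."""
    times = [t for ok, t in records if ok or not successful_only]
    return mean(times)


def task_efficiency(records):
    """Task completion rate / mean time per task (mean over all attempts)."""
    completion_rate = sum(ok for ok, _ in records) / len(records)
    return completion_rate / time_on_task(records)


def lostness(unique_pages, total_pages, min_pages):
    """Lostness L = sqrt((N/S - 1)^2 + (R/N - 1)^2); 0 means an optimal path.

    unique_pages (N): number of different pages visited
    total_pages  (S): total number of pages visited, including revisits
    min_pages    (R): minimum number of pages needed to complete the task
    """
    n, s, r = unique_pages, total_pages, min_pages
    return math.sqrt((n / s - 1) ** 2 + (r / n - 1) ** 2)


if __name__ == "__main__":
    print(time_on_task(RECORDS))                        # all attempts
    print(time_on_task(RECORDS, successful_only=True))  # successful attempts only
    print(task_efficiency(RECORDS))                     # completion rate per second
    print(lostness(6, 9, 3))                            # detours and revisits
```

For these invented records, mean time-on-task is 68 s over all attempts versus about 43 s over successful attempts only, which shows why the choice between "all tasks" and "only successful tasks" has to be made explicitly.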

Measuring learnability
- Measure how performance changes over time (how any efficiency metric changes over time).
- How much time and effort is required to become proficient using the product or application?

Performance metrics: time-on-task
- Multiple trials of the same task for a single participant give a learning curve.
- Collecting data multiple times (trials) requires a within-subjects design (see the course Wetenschappelijke Onderzoeksmethoden).

Measuring errors & usability issues
- Usability issue: the underlying cause of a problem
- Error: the possible outcome, i.e. the mistakes made during a task
- Errors may be useful in pointing out particularly confusing or misleading parts of an interface.
- It is not evident what constitutes an error; therefore measuring errors is not always easy.

Severity ratings of usability issues (Jakob Nielsen)
A combination of frequency, impact and persistence:
0 = I don't agree that this is a usability problem at all
1 = Cosmetic problem only: need not be fixed unless extra time is available on project
2 = Minor usability problem: fixing this should be given low priority
3 = Major usability problem: important to fix, so should be given high priority
4 = Usability catastrophe: imperative to fix this before product can be released

Self-reported metrics
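A learning curve can be derived from repeated trials of the same task in a within-subjects design. The sketch below, using invented timings and hypothetical function names, averages time-on-task per trial across participants and reports the relative improvement between consecutive trials; a flattening curve suggests that participants are approaching proficiency.

```python
from statistics import mean

# Invented time-on-task (seconds) for the same task over repeated trials.
# Rows = participants, columns = trials 1..4 (within-subjects design).
TIMES = [
    [95.0, 70.0, 58.0, 55.0],
    [120.0, 85.0, 66.0, 60.0],
    [80.0, 62.0, 50.0, 49.0],
]


def learning_curve(times):
    """Mean time-on-task per trial, averaged across participants."""
    n_trials = len(times[0])
    return [mean(row[t] for row in times) for t in range(n_trials)]


def improvement_per_trial(curve):
    """Relative reduction in mean time from each trial to the next."""
    return [(prev - cur) / prev for prev, cur in zip(curve, curve[1:])]


if __name__ == "__main__":
    curve = learning_curve(TIMES)
    print([round(t, 1) for t in curve])
    print([round(r, 2) for r in improvement_per_trial(curve)])
```

For these made-up numbers, mean time drops from about 98 s on trial 1 to about 55 s on trial 4, with most of the gain in the first two transitions.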

Self-reported metrics
- What to measure?
- How to measure? Single-item formats; multiple-item formats: indexes and scales (general and usability-specific)
- Gathering self-reported data: pre/post-task, pre/post-test
- Analysing self-reported data

What to measure?
- characteristics: age, date of birth, occupation, gender (male/female)
- attitudes: "Nuclear energy in the Netherlands should be abolished"
- beliefs: "The number of accidents in nuclear power plants has increased in recent years"
- behaviours: "Are you currently a member of Greenpeace?"

The information you are looking for
Attitude: what people say they want
- Should nuclear energy in the Netherlands be abolished? Yes / No
- What is your attitude towards abolishing nuclear energy in the Netherlands? Strongly against / Against / Neither for nor against / For / Strongly for
Beliefs: what people think is true
- The number of accidents with nuclear power plants has increased over the past ten years. True / False
- To what extent does nuclear energy contribute to the energy supply in the Netherlands? To a very limited extent / To a limited extent / To an important extent / To a very important extent
- Do you agree or disagree with the following statement: "Nuclear energy should be banned in the Netherlands"? Agree / Disagree
- Do you think that abolishing nuclear energy will lead to problems in the energy supply? Yes / No
Behaviour: what people say they do
- Have you ever taken part in a demonstration against nuclear energy? Yes / No
- Are you currently a member of Greenpeace? Yes / No
- Do you think you will ever take part in demonstrations against nuclear energy in the future? No / Probably not / Probably / Yes
Characteristics: who people say they are
- Are you a man or a woman? Man / Woman
- What is your current age? ___ years
- What is your year of birth? 19__
- What is your highest completed level of education? Primary education / Secondary education (HAVO/VWO) / Secondary vocational education (VMBO/MBO) / Higher education (HBO/WO)

Single-item formats
E.g. the well-known Likert scale*: measures a single property, characteristic, etc. (one dimension).
I think that I would like to use this system frequently:
Strongly disagree / Disagree / Neither agree nor disagree / Agree / Strongly agree
(Rensis Likert)
* In fact a misnomer: it's not a scale but a well-known question format.

Guidelines for single-item formats
- Avoid "acquiescence bias": people are more likely to agree with a statement than to disagree with it (Cronbach, 1946). You need to balance positively phrased statements (such as "I found this interface easy to use") with negative ones (such as "I found this interface difficult to navigate").
- Use 5-9 levels in a rating. You gain no additional information by having more than 10 levels.
- Include a neutral point in the middle of the scale. Otherwise you lose information by forcing some participants to take sides.
- Don't use numbers, but if you do, use positive integers 1-7 instead of -3 to +3 (participants are less likely to go below 0 than they are to use 1-3).
- Use word labels for at least the end points. It is hard to create labels for every point beyond 5 levels, and having labels on the end points only also makes the data more interval-like.

Measuring concepts / complex constructs
Multidimensional indexes / scales
Suppose you want to measure:
- innovativeness of an organisation
- complexity of a task
- willingness of people to participate in social relations
- presence in a virtual environment
- engagement in a game
- credibility of a company
- maturity of different aspects of a company (strategic, e-business, information technology)
- perceived ease of use of a system

Multidimensional measurement: indexes & scales
- Both are composite or multidimensional measures.
- An index simply accumulates scores assigned to individual indicators.
- A scale is composed of several items that have a logical or empirical structure among them.
- Both indexes and scales are ordinal measures.
Figure: the construct "engagement".
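Balancing positively and negatively phrased statements means that negative items must be reverse-coded before responses are combined; otherwise agreement with a negative statement would raise the average. A minimal Python sketch for a 1-7 scale follows; the two statements come from the guideline above, while the responses and function names are hypothetical.

```python
# Invented responses of one participant on a 1-7 scale; the boolean marks
# negatively phrased statements.
ITEMS = [
    ("I found this interface easy to use", False, 6),
    ("I found this interface difficult to navigate", True, 2),
]


def recode(score, negative, low=1, high=7):
    """Reverse-code negatively phrased items so that higher always means better."""
    if not low <= score <= high:
        raise ValueError(f"score {score} outside {low}..{high}")
    return (low + high) - score if negative else score


def mean_rating(items, low=1, high=7):
    """Average of the recoded responses for one participant."""
    recoded = [recode(score, negative, low, high) for _, negative, score in items]
    return sum(recoded) / len(recoded)


if __name__ == "__main__":
    print(mean_rating(ITEMS))  # 6.0: both answers express a positive judgement
```

Averaging the raw scores (6 and 2) would give 4.0, a misleadingly neutral result for a participant who was positive on both statements.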

Multidimensional measurement: indexes & scales
- A set of items to measure a construct.
- Some items don't fit in the construct (check reliability; Cronbach's alpha).
Figure: a construct measured by items 1-6.

Index construction
An index of political activism (yes/no answers, summarised in a single score):
- Wrote a letter to a public official
- Gave money to a political candidate
- Signed a political petition
- Gave money to a political cause
- Wrote a political letter to the editor
- Persuaded someone to change her or his plans

Bogardus Social Distance Scale
Determines the willingness of people to participate in social relations of various degrees of closeness with other kinds of people:
- Are you willing to let sex offenders live in your country?
- Are you willing to let sex offenders live in your community?
- Are you willing to let sex offenders live in your neighborhood?
- Are you willing to let a sex offender live next door to you?
- Would you let your child marry a sex offender?

Guttman scale
- Do you feel a woman should have the right to an abortion if she is not married?
- Do you feel that a woman should have the right to an abortion when her pregnancy was the result of a rape?
- Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?
Percentage in favour:
- Woman's health is seriously endangered: 89%
- Pregnant as a result of rape: 81%
- Woman is not married: 39%
This is a cumulative (Guttman) scale.
Source: The Basics of Social Research, Earl Babbie
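The reliability check mentioned above can be sketched with the standard formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), for k items. In this Python sketch the response matrix is invented so that item 4 runs against the other three; the hypothetical helper `alpha_if_deleted` recomputes alpha with each item left out, which is one common way to spot an item that does not fit the construct. A simple additive index, as in the political-activism example, is just the count of "yes" (1) answers.

```python
from statistics import variance

# Invented 1-5 responses: rows = participants, columns = items 1..4.
# Item 4 deliberately runs against the other three.
DATA = [
    [4, 5, 4, 1],
    [2, 2, 3, 5],
    [5, 5, 5, 2],
    [3, 3, 2, 4],
    [1, 2, 1, 3],
]


def cronbach_alpha(responses):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(responses):
    """Alpha recomputed with each item left out in turn."""
    k = len(responses[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != i] for row in responses])
        for i in range(k)
    ]


def index_score(answers):
    """Additive index: number of 'yes' (1) answers."""
    return sum(answers)


if __name__ == "__main__":
    print(round(cronbach_alpha(DATA), 2))  # 0.35
    print([round(a, 2) for a in alpha_if_deleted(DATA)])
    print(index_score([1, 0, 1, 1, 0, 0]))  # 3
```

With this invented data, alpha is about 0.35 for all four items and about 0.96 once item 4 is dropped.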

Models with validated usability scales: TAM
Technology Acceptance Model (Davis, 1989)
- Perceived usefulness: the degree to which a system is believed to enhance a person's job performance.
- Perceived ease of use: the degree to which the use of a system is believed to be free from effort.
Validated usability scales: perceived usefulness (TAM)
Validated usability scales: perceived ease of use (TAM)

Models with validated usability scales: UTAUT
Unified Theory of Acceptance and Use of Technology (Venkatesh et al., 2003)
Validated usability scales: performance expectancy, effort expectancy (UTAUT)

Other validated usability scales: presence
- The user's subjective sensation of "being there"
- "A perceptual illusion of non-mediation" (Lombard & Ditton, 1997)
Game-based learning
A Cross-Media Presence Questionnaire: The ITC-Sense of Presence Inventory (Lessiter et al., 2001)
Disaster training for ambulance staff: Code Red Triage (van der Spek, 2010), measuring presence / engagement
Code Red: Triage Or COgnition-based DEsign Rules Enhancing Decisionmaking TRaining In A Game Environment, Erik D. van der Spek, Pieter Wouters and Herre van Oostendorp, British Journal of Educational Technology (2010)

Validated usability scales: System Usability Scale (SUS)
John Brooke, Digital Equipment Corporation, 1986
"A quick and dirty usability scale"

Self-reported data: pre/post-task, pre/post-test session
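SUS responses are usually converted to a single 0-100 score with the scoring rule published by Brooke: odd-numbered (positively worded) items contribute the response minus 1, even-numbered (negatively worded) items contribute 5 minus the response, and the sum is multiplied by 2.5. A minimal Python sketch, assuming responses are coded 1 (strongly disagree) to 5 (strongly agree) in questionnaire order; the function name is hypothetical.

```python
def sus_score(responses):
    """Convert ten SUS responses (1 = strongly disagree .. 5 = strongly agree)
    into a 0-100 score using Brooke's scoring rule."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

The resulting value ranges from 0 to 100 but is not a percentage. The alternating item wording is an application of the guideline to balance positive and negative statements.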

Measuring expectations: pre- and post-task ratings
Before the task: How easy or difficult do you expect this task to be? Very easy ○ ○ ○ ○ ○ ○ ○ Very difficult
After the task: How easy or difficult was this task to do? Very easy ○ ○ ○ ○ ○ ○ ○ Very difficult

Ratings can help prioritize work
Figure: "Average Expectation and Experience Ratings by Task". X-axis: average expectation rating (1-7); y-axis: average experience rating (1-7); 1 = difficult, 7 = easy. Quadrants: "Promote It" (upper left), "Big Opportunity" (lower left), "Don't Touch It" (upper right), "Fix It Fast" (lower right).

Pre/post-task ratings versus pre/post-session ratings
- Task-level data help identify areas that need improvement (quick ratings immediately after each task help pinpoint tasks and interface parts that are particularly problematic).
- Session-level data help to get a sense of overall usability (an effective overall evaluation after each participant has had a chance to interact with the product more fully).

Post-session ratings: examples
- System Usability Scale (SUS): 10 ratings
- Usefulness, Satisfaction, and Ease of use (USE) questionnaire
- Questionnaire for User Interface Satisfaction (QUIS)*: 71 (long form) or 26 (short form) ratings
- Software Usability Measurement Inventory (SUMI)*: 50 ratings
- After-Scenario Questionnaire (ASQ): three ratings
- Post-Study System Usability Questionnaire (PSSUQ): 19 ratings; its electronic version is called the Computer System Usability Questionnaire (CSUQ)
- Website Analysis and MeasureMent Inventory (WAMMI)*: 20 ratings of website usability
* requires a license

Physiological and behavioural metrics
Verbal behaviours: comments; questions; utterances of confusion / frustration
Nonverbal behaviours: facial expressions; eye behaviour; skin conductance; heart rate; blood flow; temperature; sleep / wake
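The expectation/experience chart can be produced by averaging the pre-task (expectation) and post-task (experience) ratings per task and placing each task in a quadrant. The Python sketch below assumes ratings coded 1 = very difficult to 7 = very easy, as on the chart's axes, and splits the quadrants at the scale midpoint of 4; the split point, the data and the function names are assumptions for illustration, not taken from the slide.

```python
from statistics import mean

# Invented (before, after) ratings per participant, coded 1 = very difficult
# .. 7 = very easy.
RATINGS = {
    "Task A": [(6, 6), (7, 6), (6, 7)],
    "Task B": [(6, 2), (5, 3), (6, 2)],
    "Task C": [(2, 6), (3, 5), (2, 6)],
    "Task D": [(2, 2), (3, 3), (2, 1)],
}


def classify_task(expectation, experience, midpoint=4.0):
    """Quadrant for a task given its mean expectation and experience ratings.

    Splitting at the scale midpoint (ties count as 'easy') is an assumption.
    """
    expected_easy = expectation >= midpoint
    was_easy = experience >= midpoint
    if expected_easy and was_easy:
        return "Don't Touch It"   # expected easy, was easy
    if expected_easy:
        return "Fix It Fast"      # expected easy, was difficult
    if was_easy:
        return "Promote It"       # expected difficult, was easy
    return "Big Opportunity"      # expected difficult, was difficult


def summarize(ratings, midpoint=4.0):
    """Map each task to its quadrant using the mean pre- and post-task ratings."""
    return {
        task: classify_task(mean(b for b, _ in pairs), mean(a for _, a in pairs), midpoint)
        for task, pairs in ratings.items()
    }


if __name__ == "__main__":
    for task, label in summarize(RATINGS).items():
        print(task, label)
```

Tasks in "Fix It Fast" are usually the most urgent, because users expected them to be easy and were disappointed.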

Measuring physiological signals: observation
Usability Test Observation Coding Form
Date: / Participant ID: / Task #: / Start Time: / End Time:
Verbal behaviors:
- Strongly positive comment
- Other positive comment
- Strongly negative comment
- Other negative comment
- Suggestion for improvement
- Question
- Variation from expectation
- Stated confusion
- Stated frustration
- Other:
Non-verbal behaviors:
- Frowning/Grimacing/Unhappy
- Smiling/Laughing/Happy
- Surprised/Unexpected
- Furrowed brow/Concentration
- Evidence of impatience
- Leaning in close to screen
- Variation from expectation
- Fidgeting in chair
- Random mouse movement
- Groaning/Deep sigh
- Rubbing head/eyes/neck
- Other:
Task completion status:
- Complete: fully complete / complete with assistance / partial completion
- Incomplete: participant gave up / task called by moderator / thought complete, but not
Notes:

Measuring physiological signals: equipment
Facial expressions: video-based systems, electromyogram sensors
Pupils

Eye tracking (measuring attention)
"Are People Drawn to Faces on Webpages?" T. Tullis, M. Siegel & M. Sun, in: CHI 2009, Boston, Massachusetts, USA.
Faces draw attention to them on webpages.
Study 1: users are clearly drawn to faces when asked to look at pages and report what they remember.

Eye tracking and task performance
Study 2: a Portfolio Summary page was modified to contain either a photo of a woman's face or no image; participants performed tasks whose answers could be found by reading information on the page.

Eye tracking and task performance
Study 2: contrary to expectation, a picture of a face in this context actually caused users to do worse on a task involving information adjacent to the face.

Thermal imaging (measuring stress)
Thermal imaging of the face. StressCam: a small thermal imaging camera.
"StressCam: Non-contact Measurement of Users' Emotional States through Thermal Imaging", C. Puri et al., CHI 2005
User stress is correlated with increased blood flow in the frontal vessel of the forehead. This increased blood flow dissipates convective heat.

Polysomnography & actigraphy (measuring sleep)
Polysomnography, actigraphy, sleep diary

Comparing subjective and objective data
Objective measure (polysomnography) versus subjective measure (sleep diary, 1 = very bad, ..., 7 = very well): are they in accordance?

Combining metrics: an example
Emily B. Falk, Elliot T. Berkman, and Matthew D. Lieberman, "From Neural Responses to Population Behavior: Neural Focus Group Predicts Population-Level Media Effects", Psychological Science 2012 (see UBU)
- Participants: heavy smokers with the intention to quit
- Brain activations were recorded while the smokers viewed three different television campaigns (A, B, C) promoting the National Cancer Institute's telephone hotline to help smokers quit (1-800-QUIT-NOW)
- Self-report predictions of the campaigns' relative effectiveness
- Population measures of the success of each campaign
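Whether objective and subjective sleep measures are "in accordance" can be quantified in several ways; the slide does not say which was used. One common choice when one measure is an ordinal diary rating is Spearman's rank correlation, sketched here in plain Python. The nightly values are invented, and pairing a polysomnography-derived sleep efficiency (%) with the 1-7 diary rating per night is an assumption for illustration; the function names are hypothetical.

```python
import math

# Invented nightly data for one sleeper: sleep efficiency (%) derived from
# polysomnography, and the sleep-diary rating (1 = very bad .. 7 = very well).
OBJECTIVE = [92, 85, 70, 88, 60, 95, 78]
SUBJECTIVE = [6, 5, 3, 6, 2, 7, 4]


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (handles ties)."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    print(round(spearman(OBJECTIVE, SUBJECTIVE), 2))  # 0.99
```

A value close to +1 indicates that nights rated as better in the diary were also objectively better; values near 0 would suggest that the subjective and objective measures do not agree.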

Self-report scale of ad effectiveness
Fig. 1. Illustration of the medial prefrontal cortex (MPFC) region of interest (ROI) and three measures of the effectiveness of the antismoking ad campaigns promoting the National Cancer Institute's Smoking Quitline.
Falk E B et al. Psychological Science 2012;0956797611434964. Copyright by Association for Psychological Science

Summary
Usability metrics reveal something about the user's experience of the interaction between the user and the product.
- Performance metrics, e.g. task success, time on task, errors, efficiency -> statistics
- Self-reported metrics / user perception metrics, e.g. satisfaction, expectation, perceived ease of use, perceived usefulness, awareness, pleasure -> scales
- Behavioural and physiological metrics, e.g. eye movement, facial expressions, stress