SPRACH - WP 6 & 8: Software engineering work at ICSI



Similar documents
GCE APPLIED ICT A2 COURSEWORK TIPS

Client/server is a network architecture that divides functions into client and server

Example of Standard API

Deploying Cisco Unified Contact Center Express Volume 1

Automatic Speech Recognition and Hybrid Machine Translation for High-Quality Closed-Captioning and Subtitling for Video Broadcast

D2.4: Two trained semantic decoders for the Appointment Scheduling task

VoiceXML Data Logging Overview

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

vcenter Operations Manager Administration 5.0 Online Help VPAT

TI 313 (1.0 EN) d&b Remote network - Integration

Frequently Asked Questions About VisionGauge OnLine

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

Voluntary Product Accessibility Template

Voluntary Product Accessibility Report

Summary Table for SolarWinds Web Help Desk

How To Develop Software

Xerox DocuMate 3125 Document Scanner

Camtasia: Importing, cutting, and captioning your Video Express movie Camtasia Studio: Windows

Summary Table for SolarWinds Web Help Desk

How To Test Video Quality With Real Time Monitor

Structural Health Monitoring Tools (SHMTools)

Computer Requirements

AUTOMATIC PHONEME SEGMENTATION WITH RELAXED TEXTUAL CONSTRAINTS

Government Case Study:

What do Big Data & HAVEn mean? Robert Lejnert HP Autonomy

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

VPAT Summary. VPAT Details. Section Web-based Internet information and applications - Detail

How do non-expert users exploit simultaneous inputs in multimodal interaction?

The PACS Software System. (A high level overview) Prepared by : E. Wieprecht, J.Schreiber, U.Klaas November, Issue 1.

Using Pharmacovigilance Reporting System to Generate Ad-hoc Reports

How To Retire A Legacy System From Healthcare With A Flatirons Eas Application Retirement Solution

RFID Based 3D Indoor Navigation System Integrated with Smart Phones

3.5 EXTERNAL NETWORK HDD. User s Manual

Information Server Documentation SIMATIC. Information Server V8.0 Update 1 Information Server Documentation. Introduction 1. Web application basics 2

XpoLog Center Suite Log Management & Analysis platform

Introducing Version 7 SOFTWARE MODULES

Current Page Location. Tips for Authors and Creators of Digital Content: Using your Institution's Repository: Using Version Control Software:

SQL Server Integration Services. Design Patterns. Andy Leonard. Matt Masson Tim Mitchell. Jessica M. Moss. Michelle Ufford

Using The HomeVision Web Server

HOPS Project presentation

VPAT Voluntary Product Accessibility Template

SPEECH DATA MINING, SPEECH ANALYTICS, VOICE BIOMETRY. 1/41

The Scientific Data Mining Process

GV STRATUS The Next Step in Collaborative Workflows. Régis André Product Manager, STRATUS September 2011

Summary Table Voluntary Product Accessibility Template

Open-Source, Cross-Platform Java Tools Working Together on a Dialogue System

IP Office Release 6.0 FAQ

Chapter 13: Program Development and Programming Languages

Crystal Live. Overview. Interaction Recording and Monitoring Solution to Data Mining and Performance Management

Specialty Answering Service. All rights reserved.

CS 3530 Operating Systems. L02 OS Intro Part 1 Dr. Ken Hoganson

Encore. The Powerful, Affordable Answer for Contact Centers Like Yours. Product Description

Skills for Employment Investment Project (SEIP)

Pro/INTRALINK 9.0/9.1 Curriculum Guide

Enriched Links: A Framework For Improving Web Navigation Using Pop-Up Views

FROM RELATIONAL TO OBJECT DATABASE MANAGEMENT SYSTEMS

PRODUCT BROCHURE SPATIAL ANALYZER. Powerful, traceable and easy-to-use metrology and analysis software

NETAVIS Observer 4.6. Full Feature List

Ansur Test Executive. Users Manual

Course Name: Course in JSP Course Code: P5

Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging

For Quartus II Software. This Quick Start Guide will show you. how to set up a Quartus. enter timing requirements, and

A quick user guide for your LX Apollo DVR

WEB DATABASE PUBLISHING

SECTION 508 COMPLIANCE

TestManager Administration Guide

A Visual Tagging Technique for Annotating Large-Volume Multimedia Databases

Using RMON to Manage Remote Networks Gilbert Held

Computer Forensics and Investigations Duration: 5 Days Courseware: CT

EDM SOFTWARE ENGINEERING DATA MANAGEMENT SOFTWARE

Big Data. White Paper. Big Data Executive Overview WP-BD Jafar Shunnar & Dan Raver. Page 1 Last Updated

VPAT. Voluntary Product Accessibility Template. Version 1.3

Voluntary Product Accessibility Template (VPAT)

STIM202 Evaluation Kit

WESTERN KENTUCKY UNIVERSITY. Web Accessibility. Objective

1. Central Monitoring System Software

Greenplum Database (software-only environments): Greenplum Database (4.0 and higher supported, or higher recommended)

How To Recognize Voice Over Ip On Pc Or Mac Or Ip On A Pc Or Ip (Ip) On A Microsoft Computer Or Ip Computer On A Mac Or Mac (Ip Or Ip) On An Ip Computer Or Mac Computer On An Mp3

Texas Essential Knowledge and Skills Correlation to Video Game Design Foundations 2011 N Video Game Design

How To Understand Programming Languages And Programming Languages

DIGITAL FORENSICS SPECIALIZATION IN BACHELOR OF SCIENCE IN COMPUTING SCIENCE PROGRAM

Configuration Management: An Object-Based Method Barbara Dumas

UPSTREAM for Linux on System z

New Generation of Software Development

ImagineWorldClient Client Management Software. User s Manual. (Revision-2)

MDSplus Automated Build and Distribution System

QAD Business Intelligence Release Notes

Sports Management Information Systems. Camilo Rostoker November 22, 2002

A Process for ATLAS Software Development

USTC Course for students entering Clemson F2013 Equivalent Clemson Course Counts for Clemson MS Core Area. CPSC 822 Case Study in Operating Systems

Transcription:

SPRACH - WP 6 & 8: Software engineering work at ICSI March 1998 Dan Ellis International Computer Science Institute, Berkeley CA <dpwe@icsi.berkeley.edu> 1 2 3 Hardware: MultiSPERT Software: speech & visualization tools Packaging: SPRACHworks SPRACH WP6 & 8 - Dan Ellis 1998mar24-1

1 Hardware: MultiSPERT Multiple SPERT boards on single host MultiSPERT New software, firmware, hardware - for nets: pattern parallel and network parallel Fastest performance: 500 MCUPS (5 boards, 10x previous single-board) Current neural-net trainings: 8000 hidden units, 3.2M params, 18M frames ~10 15 ops per training iteration - this size not previously possible SPRACH WP6 & 8 - Dan Ellis 1998mar24-2

2 Software: Speech tools New components: feacalc: front-end feature calculation - uses core RASTA-PLP routines from rasta - mnemonic options ( -L -rasta log ) - comprehensive file format support featools: scaffolding for novel features - handles finding input files & assembling output - user provides (chain of) online-feature applets pfile_utils: feature archive manipulation - merging, splitting, rearranging, summary stats - pfile_gaussian, pfile_klt... SPRACH WP6 & 8 - Dan Ellis 1998mar24-3

Visualization & User Interface tools ICSI is developing a toolkit of Tcl/Tk modules for visualization & demos Current pieces - visualization of signals, features, probabilities, label alignments - interactive & file-based audio input/output - web-interface tools Some current applications - sgramimg.cgi on-demand spectrogram gifs - recogviz comparing different ASR configs - berpdemo98 speech application plus graphics SPRACH WP6 & 8 - Dan Ellis 1998mar24-4

sgramimg.cgi CGI script: spectrogram GIF on-demand - point to in <img src..> tag of other pages Automatically includes xlabel annotation Try it, e.g. links from: http://www.icsi.berkeley.edu/~dpwe/research/etc/phnless.html SPRACH WP6 & 8 - Dan Ellis 1998mar24-5

recogviz Motivation: compare recognition techniques - at each stage in process (signal, features, probs) Easily to incorporate novel features etc. SPRACH WP6 & 8 - Dan Ellis 1998mar24-6

berpdemo98 Existing application plus recogviz modules Demo available... SPRACH WP6 & 8 - Dan Ellis 1998mar24-7

3 Packaging: SPRACHworks Original SPRACHworks goal: Source-level distribution of all partner tools - integrated: compatibility between tools - portable: easily compiled - ready-to-run: demos to play with HUGE amount of work! Reduced scope: - several separate but conformal packages (ICSI, Cambridge, FPMs/Strut) - incremental incorporation of packages Status: - first collection: April 97 (Cambridge) - latest: single make for complete speech demo ( SPRACHcore = 19 packages) - ICSI-developed, installed at FPMs & IDIAP SPRACH WP6 & 8 - Dan Ellis 1998mar24-8

Future work Tools - language modelling, two-pass decoder, etc... - formal modular structure for visualization drop-in demos Packaging - include more packages: ASR training - tutorial documentation - more platforms? SPRACH WP6 & 8 - Dan Ellis 1998mar24-9

Combined speech/nonspeech analysis: How to combine ASR with Computational Auditory Scene Analysis (CASA) for nonspeech transients? Preprocessor (Weintraub 85, Nakatani/Okuno 97) mixture CASA (f 0 -based) single element ASR words Use extra CASA info in ASR (Cooke,Morris & Green 97) Simultaneous search for speech & nonspeech elements mixture mixture CASA Hypothesis management Compare prediction errors mask element Missing-data ASR scene hypothesis Speech elements Nonspeech elements Predict words words casa-appr 1998mar Difference between observations & predictions drives analysis ThisL/ICSI - Dan Ellis 1998mar25-1

Speech/nonspeech: preliminary results Original speech +clap Bootstrap recognition db 60 40 20 Reconstructed speech element Recovered nonspeech (a) Speech plus clap (223cl-env.pf) (b) Recognizer output h# w n ay n tclt uw f ay ah s ay ow h# v s eh v ah n h# h# n ay n tcl t uw f ay v ow h# s eh v ax n <SIL> nine two five oh <SIL> seven (c) Predicted speech element (223cl-renv.pf) (d) Click5 from nonspeech analysis (justclick.pf) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 s Issues: - iterative refinement of each element use nonspeech to relax speech recognition - reconstructing speech features from ASR output - use f 0 -based separation to bootstrap ASR; recognizer trained on periodic + noise ThisL/ICSI - Dan Ellis 1998mar25-2

ICSI GUI tools in ThisL Interactive audio_frontend for commands Web-embeddable recognition display tools ThisL/ICSI - Dan Ellis 1998mar25-3

ThisL Overview Archive Query Database Segmentation Control IR NLP ASR Receiver ASR Text Audio http Video Domains: BBC news (3hr/day) NIST TREC French? Issues: Size of episodes decoder Size of database speed Segmentation/side info Information Retrieval Dan Ellis <dpwe@icsi.berkeley.edu> 1998apr02