Abhinandan Das
4154 Upson Hall
Department of Computer Science
Cornell University
Ithaca NY 14853
http://www.cs.cornell.edu/~asdas
Office: (607) 255 4574
Home: (607) 227 5957
Cell: (607) 227 5957
Fax: (607) 255 4428
Email: asdas@cs.cornell.edu

EDUCATION

Ph.D. in Computer Science, August 2005 (Expected)
Ithaca, NY
GPA: 4.13/4.30
Thesis: Approximate Query Answering over Data Streams
Advisor: Prof. Johannes Gehrke

Bachelor of Technology in Computer Science and Engineering, April 2000
Indian Institute of Technology (IIT) - Bombay, India

AREAS OF INTEREST

Database Systems, Data Stream Processing, Data Summarization, Database Privacy, Distributed Algorithms and Data Mining.

REPRESENTATIVE RESEARCH

Online Approximation Techniques for Spatial Databases
This work introduces novel sketch-based methods that permit high-quality selectivity estimation for spatial joins and range queries. Our synopses can be constructed in a single scan over the input and handle inserts and deletes to the database incrementally, so they can also be used for processing streaming data. In contrast to previous approaches, which provide no guarantees on the quality of their approximate results, our techniques return approximate results with provable probabilistic error bounds. The quality guarantees are user-tunable and permit a graceful tradeoff between space consumption and the quality of the resulting approximation. [C2]

Approximate Join Processing over Data Streams
Joins are important operators in any data stream processing system. In this work, we consider the problem of approximating sliding window joins over data streams in a system with limited resources. We propose semantic load shedding for cases where system resources are insufficient to handle bursty or variable-rate streams. We present optimal offline and fast online techniques for load shedding, along with hardness results and approximation algorithms for semantic join approximation. [C3,J1]

Distributed Data Stream Processing
With the widespread use and deployment of networks linking together a broad range of devices, distributed data streaming applications are becoming increasingly common. In this work, we consider the problem of estimating set expression cardinality in a distributed streaming environment. We analyze and present the first algorithmic techniques for minimizing communication costs while answering set expression cardinality queries with guaranteed accuracy. Applications include detecting Distributed Denial of Service (DDoS) attacks in real time, detecting deviations in traffic flow across network routers, and tracking web usage statistics of users. [C1]

INTERNSHIPS AND RESEARCH EXPERIENCE

Research Intern at Bell Laboratories, Murray Hill, NJ, Summer 2003
Mentors: Minos Garofalakis, Rajeev Rastogi and Sumit Ganguly
Worked on techniques for distributed set expression cardinality estimation in the Internet Management Research Group at Bell Labs. [C1] (More details above.)

Research Intern at Microsoft Research, Redmond, WA, Summer 2001
Mentors: Vivek Narasayya and Surajit Chaudhuri
Developed an algorithm for automating the physical layout of relational databases [C4]. Also implemented a prototype layout wizard on the Microsoft SQL Server 2000 product and experimentally demonstrated the superiority of the layouts generated by the proposed algorithm over traditional RAID-based techniques. This work was done as part of the AutoAdmin project in the Data Management, Exploration and Mining (DMX) group at MSR.

Group Membership Protocol, Cornell University, 2001
Designed and developed the highly scalable SWIM group membership protocol ([C5,P1]) used by distributed peer-to-peer applications to maintain weakly consistent group membership. Unlike the traditional heartbeat-based protocols that are popular with distributed systems designers today, SWIM uses a randomized probing technique that imposes constant expected load per member and constant expected failure detection time, independent of group size.

Approximating Correlated Sums over Data Streams, Cornell University, 2001
Developed efficient approximation algorithms and summary structures for computing approximate correlated sum (CS) aggregates when the independent aggregate can be computed exactly over a data stream in limited space. For the case when the independent aggregate cannot be computed exactly over a data stream, we show the negative result that the error of the CS aggregate can be arbitrarily large. [J2]

Advanced DBMS Project, Cornell University, 2000
Developed an algorithm for constructing multidimensional histograms over static as well as streaming data, based on a novel objective function to determine bucketization.

Distributed Data Mining, IIT Bombay (Senior Thesis), 1999-2000
Advisors: Prof. Sunita Sarawagi and Prof. S. Sudarshan
Developed an efficient classification algorithm for a vertically partitioned hierarchical database that exploits the hierarchy to speed up classification. In contrast to traditional mining approaches, the algorithm incorporates information from various tables in the database without first manually pre-processing the data into a flattened table.

CONFERENCE/JOURNAL PUBLICATIONS (Available at http://www.cs.cornell.edu/~asdas)

[C1] Abhinandan Das, Sumit Ganguly, Minos Garofalakis and Rajeev Rastogi, Distributed Set-Expression Cardinality Estimation. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.

[C2] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Approximation Techniques for Spatial Data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, June 2004.

[C3] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Approximate Join Processing over Data Streams. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), San Diego, CA, June 2003.

[C4] Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das and Vivek Narasayya, Automating Layout of Relational Databases. In Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE 2003), Bangalore, India, March 2003.

[C5] Abhinandan Das, Indranil Gupta and Ashish Motivala, SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol. In Proceedings of the 2002 IEEE International Conference on Dependable Systems and Networks (DSN 2002), Washington DC, June 2002.

[J1] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Semantic Approximation of Data Stream Joins. In IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 17, No 1, January 2005.

[J2] Rohit Ananthakrishna, Abhinandan Das, Johannes Gehrke, Flip Korn, S. Muthukrishnan and Divesh Srivastava, Efficient Approximation of Correlated Sums on Data Streams. In IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 15, No 3, May/June 2003.

TECHNICAL REPORTS AND POSTERS

[T1] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Online Approximation Techniques for Spatial Data. Cornell University Computing and Information Science Tech. Report TR2004-1947, July 2004.

[T2] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Semantic Approximation of Data Stream Joins. Computing and Information Science Tech. Report TR2004-1932, March 2004.

[T3] Abhinandan Das, Distributed Data Mining. Senior Thesis, IIT Bombay, May 2000. Advisors: Sunita Sarawagi and S. Sudarshan.

[P1] Abhinandan Das, Indranil Gupta and Ashish Motivala, The Dping Scalable Membership Service. In ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, October 2001. (poster)

UNDER PREPARATION

Load Smoothing for Data Streams, with Mirek Riedewald and Johannes Gehrke.

PATENTS

Abhinandan Das, Minos Garofalakis, Rajeev Rastogi and Sumit Ganguly, Distributed Set Expression Cardinality Estimation, Filed December 2004.

Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das and Vivek Narasayya, Automating Layout of Relational Databases, Filed March 2003.

ACADEMIC HONORS

Recipient of the prestigious Sage Excellence University Fellowship for two years at Cornell University.
Ranked in the top 3 amongst undergraduate students in the Computer Science Dept. at IIT Bombay.
Among the top 5 CGPA holders in the institute amongst over 400 undergraduate students at IIT Bombay.
Recipient of the Director's Special Prize for finishing in the top 5 of my entire batch (all branches of engineering) at IIT Bombay in my freshman year.
Received special mention for exceptional performance in mathematics as an undergrad at IIT Bombay.
Placed in the top 0.4% amongst approximately 100,000 examinees in JEE 1996, the Joint Entrance Examination for admission to the IITs.
Placed first in the Physics-Chemistry-Mathematics group (with 99.33%) in Pune Division in the pre-university HSC (10+2) examination amongst over 150,000 examinees.
Awarded the National Talent Search Scholarship (NTS) in an all-India exam held in 1993.

TEACHING EXPERIENCE

CS432 Introduction to Database Systems, Fall 2004
Head teaching assistant for this senior-level course in Computer Science at Cornell University. Responsibilities included assisting the professor in designing written assignments and setting questions for the prelims and course finals.

CS433 Practicum in Database Systems, Fall 2004
Delivered half the lectures for CS433, the practicum course associated with CS432, where the aim is to learn about database internals by writing C++ code for a large part of a database system.

PROFESSIONAL ACTIVITIES

External Reviewer: SDM 2005, ISI 2005, VLDB 2004, SIGMOD 2004, PODS 2004, EDBT 2004, KDD 2004, ISI 2004, SIGMOD 2003, ICML 2003, KAIS 2003, SSDBM 2003, KDD 2003, SIGMOD 2001, JDM, IEEE Trans. on Knowledge and Data Engineering, VLDB Journal.
Reviewer: DSN 2003.

OTHER ACTIVITIES

Programming Contests
Finished in the top 10 in the Trilogy Programming Contest held at Cornell University in 2001.
Selected for trials for the regional ACM programming contest.
Placed 3rd in the IIT Open C programming contest held in 1999.

Sports
Member of the winning team at the district level junior lawn tennis championships in Pune district, India.
Played lawn tennis at the State Level (in India), making the quarter finals of two prestigious state ranking tournaments (Pune Open and Karia Trophy) on two occasions each.
Active member of the Cornell Badminton Club.

PERSONAL DETAILS

Citizenship: Indian
Visa Status: F1

REFERENCES

Prof. Johannes Gehrke
Ph: 607-255-1045
Email: johannes@cs.cornell.edu

Prof. Sumit Ganguly
IIT Kanpur, Kanpur, India.
Ph: +91-512-259-8716
Email: sganguly@cse.iitk.ac.in

Dr. Mirek Riedewald
Ph: 607-255-0110
Email: mirek@cs.cornell.edu

Prof. Jayavel Shanmugasundaram
Ph: 607-255-4117
Email: jai@cs.cornell.edu

Dr. Minos Garofalakis
Networking Research Lab, Bell Laboratories, Murray Hill, NJ 07974.
Ph: 908-582-1723
Email: minos@research.bell-labs.com