Abhinandan Das
4154 Upson Hall
Department of Computer Science
Cornell University
Ithaca, NY 14853
http://www.cs.cornell.edu/~asdas
Office: (607) 255 4574
Home: (607) 227 5957
Cell: (607) 227 5957
Fax: (607) 255 4428
Email: asdas@cs.cornell.edu

EDUCATION

Ph.D. in Computer Science, August 2005 (expected)
Cornell University, Ithaca, NY
GPA: 4.13/4.30
Thesis: Approximate Query Answering over Data Streams
Advisor: Prof. Johannes Gehrke

Bachelor of Technology in Computer Science and Engineering, April 2000
Indian Institute of Technology (IIT) Bombay, India

AREAS OF INTEREST

Database Systems, Data Stream Processing, Data Summarization, Database Privacy, Distributed Algorithms and Data Mining.

REPRESENTATIVE RESEARCH

Online Approximation Techniques for Spatial Databases
This work introduces novel sketch-based methods that permit high-quality selectivity estimation for spatial joins and range queries. Our synopses can be constructed in a single scan over the input and handle inserts and deletes to the database incrementally, so they can also be used to process streaming data. Unlike previous approaches, which provide no guarantees on the quality of their approximate results, our techniques return approximate results with provable probabilistic error bounds. The quality guarantees are user-tunable and permit a graceful tradeoff between space consumption and approximation quality. [C2]

Approximate Join Processing over Data Streams
Joins are important operators in any data stream processing system. In this work, we consider the problem of approximating sliding-window joins over data streams in a system with limited resources. We propose semantic load shedding to cope with bursty or variable-rate streams when system resources are insufficient to handle them. We present optimal offline and fast online techniques for load shedding, as well as hardness results and approximation algorithms for semantic join approximation. [C3, J1]

Distributed Data Stream Processing
With the widespread deployment of networks linking together a broad range of devices, distributed data streaming applications are becoming increasingly common. In this work, we consider the problem of estimating set-expression cardinality in a distributed streaming environment. We analyze and present the first algorithmic techniques for minimizing communication costs while answering set-expression cardinality queries with guaranteed accuracy. Applications include detecting Distributed Denial of Service (DDoS) attacks in real time, detecting deviations in traffic flow across network routers, and tracking users' web usage statistics. [C1]

INTERNSHIPS AND RESEARCH EXPERIENCE

Research Intern at Bell Laboratories, Murray Hill, NJ, Summer 2003
Mentors: Minos Garofalakis, Rajeev Rastogi and Sumit Ganguly
Worked on techniques for distributed set-expression cardinality estimation in the Internet Management Research Group at Bell Labs (see Distributed Data Stream Processing above). [C1]

Research Intern at Microsoft Research, Redmond, WA, Summer 2001
Mentors: Vivek Narasayya and Surajit Chaudhuri
Developed an algorithm for automating the physical layout of relational databases [C4]. Also implemented a prototype layout wizard on Microsoft SQL Server 2000 and experimentally demonstrated the superiority of the layouts generated by the proposed algorithm over those produced by traditional RAID-based techniques. This work was done as part of the AutoAdmin project in the Data Management, Exploration and Mining (DMX) group at MSR.

Group Membership Protocol, Cornell University, 2001
Designed and developed the highly scalable SWIM group membership protocol [C5, P1], used by distributed peer-to-peer applications to maintain weakly consistent group membership. Unlike the traditional heartbeat-based protocols popular with distributed systems designers today, SWIM uses a randomized probing technique that imposes constant expected load per member and constant expected failure detection time, independent of group size.

Approximating Correlated Sums over Data Streams, Cornell University, 2001
Developed efficient approximation algorithms and summary structures for computing approximate correlated sum (CS) aggregates over a data stream when the independent aggregate can be computed exactly in limited space.
For the case when the independent aggregate cannot be computed exactly over a data stream, we show the negative result that the error of the CS aggregate can be arbitrarily large. [J2]

Advanced DBMS Project, Cornell University, 2000
Developed an algorithm for constructing multidimensional histograms over static as well as streaming data, based on a novel objective function for determining bucketization.

Distributed Data Mining, IIT Bombay (Senior Thesis), 1999-2000
Advisors: Prof. Sunita Sarawagi and Prof. S. Sudarshan
Developed an efficient classification algorithm for a vertically partitioned hierarchical database that exploits the hierarchy to speed up classification. Unlike traditional mining approaches, the algorithm incorporates information from the various tables in the database without first requiring the data to be manually pre-processed into a single flattened table.

CONFERENCE/JOURNAL PUBLICATIONS (available at http://www.cs.cornell.edu/~asdas)

[C1] Abhinandan Das, Sumit Ganguly, Minos Garofalakis and Rajeev Rastogi, Distributed Set-Expression Cardinality Estimation. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.
[C2] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Approximation Techniques for Spatial Data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, June 2004.

[C3] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Approximate Join Processing over Data Streams. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), San Diego, CA, June 2003.

[C4] Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das and Vivek Narasayya, Automating Layout of Relational Databases. In Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE 2003), Bangalore, India, March 2003.

[C5] Abhinandan Das, Indranil Gupta and Ashish Motivala, SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol. In Proceedings of the 2002 IEEE International Conference on Dependable Systems and Networks (DSN 2002), Washington, DC, June 2002.

[J1] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Semantic Approximation of Data Stream Joins. In IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 17, No. 1, January 2005.

[J2] Rohit Ananthakrishna, Abhinandan Das, Johannes Gehrke, Flip Korn, S. Muthukrishnan and Divesh Srivastava, Efficient Approximation of Correlated Sums on Data Streams. In IEEE Transactions on Knowledge and Data Engineering (TKDE), Volume 15, No. 3, May/June 2003.

TECHNICAL REPORTS AND POSTERS

[T1] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Online Approximation Techniques for Spatial Data. Cornell University Computing and Information Science Tech. Report TR2004-1947, July 2004.

[T2] Abhinandan Das, Johannes Gehrke and Mirek Riedewald, Semantic Approximation of Data Stream Joins. Computing and Information Science Tech. Report TR2004-1932, March 2004.

[T3] Abhinandan Das, Distributed Data Mining. Senior Thesis, IIT Bombay, May 2000. Advisors: Sunita Sarawagi and S. Sudarshan.
[P1] Abhinandan Das, Indranil Gupta and Ashish Motivala, The Dping Scalable Membership Service. In ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, October 2001. (poster)

UNDER PREPARATION

Load Smoothing for Data Streams, with Mirek Riedewald and Johannes Gehrke.

PATENTS

Abhinandan Das, Minos Garofalakis, Rajeev Rastogi and Sumit Ganguly, Distributed Set Expression Cardinality Estimation, filed December 2004.
Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das and Vivek Narasayya, Automating Layout of Relational Databases, filed March 2003.
ACADEMIC HONORS

Recipient of the prestigious Sage Excellence University Fellowship for two years at Cornell University.
Ranked in the top 3 among undergraduate students in the Computer Science Dept. at IIT Bombay.
Among the top 5 CGPA holders in the institute, out of over 400 undergraduate students at IIT Bombay.
Recipient of the Director's Special Prize for finishing in the top 5 of my entire batch (all branches of engineering) at IIT Bombay in my freshman year.
Received special mention for exceptional performance in mathematics as an undergraduate at IIT Bombay.
Placed in the top 0.4% among approximately 100,000 examinees in JEE 1996, the Joint Entrance Examination for admission to the IITs.
Placed first in the Physics-Chemistry-Mathematics group (with 99.33%) in Pune Division in the pre-university HSC (10+2) examination, among over 150,000 examinees.
Awarded the National Talent Search (NTS) Scholarship in an all-India exam held in 1993.

TEACHING EXPERIENCE

CS432 Introduction to Database Systems, Fall 2004
Head teaching assistant for this senior-level Computer Science course at Cornell University. Responsibilities included assisting the professor in designing written assignments and setting questions for the prelims and the course final.

CS433 Practicum in Database Systems, Fall 2004
Delivered half the lectures for CS433, the practicum course associated with CS432, in which students learn about database internals by writing C++ code for a large part of a database system.

PROFESSIONAL ACTIVITIES

External Reviewer: SDM 2005, ISI 2005, VLDB 2004, SIGMOD 2004, PODS 2004, EDBT 2004, KDD 2004, ISI 2004, SIGMOD 2003, ICML 2003, KAIS 2003, SSDBM 2003, KDD 2003, SIGMOD 2001, JDM, IEEE Transactions on Knowledge and Data Engineering, VLDB Journal.
Reviewer: DSN 2003.

OTHER ACTIVITIES

Programming Contests
Finished in the top 10 in the Trilogy Programming Contest held at Cornell University in 2001.
Selected for trials for the regional ACM programming contest.
Placed 3rd in the IIT Open C programming contest held in 1999.
Sports
Member of the winning team at the district-level junior lawn tennis championships in Pune district, India.
Played lawn tennis at the state level in India, reaching the quarterfinals of two prestigious state ranking tournaments (Pune Open and Karia Trophy) on two occasions each.
Active member of the Cornell Badminton Club.

PERSONAL DETAILS

Citizenship: Indian
Visa Status: F1

REFERENCES

Prof. Johannes Gehrke
Ph: 607-255-1045
Email: johannes@cs.cornell.edu

Prof. Sumit Ganguly
IIT Kanpur, Kanpur, India
Ph: +91-512-259-8716
Email: sganguly@cse.iitk.ac.in

Dr. Mirek Riedewald
Ph: 607-255-0110
Email: mirek@cs.cornell.edu

Prof. Jayavel Shanmugasundaram
Ph: 607-255-4117
Email: jai@cs.cornell.edu

Dr. Minos Garofalakis
Networking Research Lab, Bell Laboratories, Murray Hill, NJ 07974
Ph: 908-582-1723
Email: minos@research.bell-labs.com