Traffic Driven Analysis of Cellular Data Networks
1 Traffic Driven Analysis of Cellular Data Networks. Samir R. Das, Computer Science Department, Stony Brook University. Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot, Anand Prabhu Subramanian (Alcatel-Lucent Bell Labs).
2 Mobile Data Usage. [Chart: Forecast of Global Mobile Data Traffic, in EB/month, with a labeled value of 0.6 EB/month. Source: Cisco VNI Mobile. Chart annotation: "Higher than the traffic volume in the entire Global Internet in" (the comparison year is not preserved in the transcription).] 1 exabyte = 1 million terabytes. Relatively little research exists on the nature of mobile data traffic.
3 Modeling and Forecasting; Traffic Management; Traffic Analysis.
4 Measurement Infrastructure. [Diagram labels: Packet Flows, Internet, Mobility and Session Manager, Flow Monitoring Tool, Flow Records, SQL Database, Radio Access Network.]
5 Sample Results from Traffic Analysis. Data collected from a nationwide 2G/3G network circa 2007, with about 10K base stations (BSes) and 1M subscribers. Significant traffic imbalance per subscriber and per BS: 1% of subscribers create more than 60% of the load; 10% of BSes experience more than 50% of the load. Mobility is generally low: more than 50% of subscribers stick to just one BS daily; the median radius of gyration is ~1 mile.
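The slides do not define the radius of gyration. A minimal sketch, assuming the usual definition (root-mean-square distance of a subscriber's observed positions from their centroid) and assuming positions are already projected to planar coordinates in miles; the function name and sample data are hypothetical:

```python
import math

def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centroid.

    `points` is a list of (x, y) positions in miles, one per observation,
    so a frequently visited location appears many times.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)

# Hypothetical subscriber: seen three times at (0, 0) and once at (4, 0).
visits = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (4.0, 0.0)]
rg = radius_of_gyration(visits)
```

Because each observation counts once, a subscriber who rarely leaves one location gets a small radius even if the occasional trip is long.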
6 Sample Results from Traffic Analysis. Mobility is predictable: subscribers are almost always found in their top 2-3 most visited locations, and they return to the same location at the same time of day with high probability. More mobile subscribers tend to generate more traffic. Radio resource usage efficiency is very poor, and much poorer for light users than for heavy users.
7 Functional Influence Among BSes. Model BS load as a time series. Explore causal relationships between pairs of time series. Granger causality: determines whether one time series is useful in forecasting another when using an autoregressive model; it has been used in economics and neuroscience. Statistically significant causality exists among neighboring BSes (roughly among half of the neighbors). Causality graph and causal path: make a graph out of the causality relations. Long paths exist in this graph (median = 15 hops, 90th percentile = 37 hops).
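For reference, the standard pairwise Granger test can be written as a comparison of two autoregressive models. The lag order $m$, the number of usable observations $T$, and the use of an F-test are assumptions here; the slides do not state which variant was used.

```latex
% Restricted model: forecast y from its own past only
y_t = a_0 + \sum_{k=1}^{m} a_k\, y_{t-k} + e_t
% Unrestricted model: add the past of x
y_t = a_0 + \sum_{k=1}^{m} a_k\, y_{t-k} + \sum_{k=1}^{m} b_k\, x_{t-k} + u_t
% x "Granger-causes" y if H_0 : b_1 = \dots = b_m = 0 is rejected, e.g. with
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/m}{\mathrm{RSS}_u/(T - 2m - 1)}
```

Here $\mathrm{RSS}_r$ and $\mathrm{RSS}_u$ are the residual sums of squares of the restricted and unrestricted fits, with $y$ and $x$ the load series of two BSes.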
8 Modeling Study. Model BS traffic loads, exploiting any interactions or dependencies. Exploit tools from machine learning. Many possible directions: purely static/spatial, or dynamic/temporal. Goals: intellectual (a broad understanding of any underlying structure would help future network architectures) and utilitarian (models can help estimation and forecasting, which is useful for various kinds of resource management).
9 Spatial Modeling Approach: Probabilistic Graphical Modeling. Assume the loads on $n$ base stations are multivariate Gaussian, $X = (X_1, \dots, X_n) \sim \mathcal{N}(\mu, \Sigma)$, with mean vector $\mu$ and covariance matrix $\Sigma$. Learn the parameters, specifically the inverse covariance matrix $\Sigma^{-1}$, from a set of training data ($p$ observations). $\Sigma^{-1}$ is easier to estimate than $\Sigma$ and exposes interesting properties.
10 Inverse Covariance Matrix: Properties. If $(\Sigma^{-1})_{ij} = 0$, then the load variables $X_i$ and $X_j$ are conditionally independent given the rest of the variables. Most problems produce a "sparse" model. Related to probabilistic graphical models (e.g., Gaussian Markov random field). [Figure: example undirected graphical model on nodes 1 to 5; a nonzero entry of $\Sigma^{-1}$ corresponds to an edge, a zero entry to no edge.] Graph properties translate to probabilistic (in)dependencies.
11 Inference Problem. Estimate the load for BS $i$, given the loads of a subset of BSes $S$, as the conditional mean $\hat{x}_i = E[X_i \mid X_S = x_S]$. Broad questions: How large should $S$ be (effort vs. accuracy tradeoff)? How to choose $S$? Measure only a subset and estimate the rest.
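For a multivariate Gaussian with mean vector mu and covariance matrix cov, the conditional mean has the standard closed form E[X_i | X_S = x_S] = mu_i + cov[i,S] cov[S,S]^{-1} (x_S - mu_S). The sketch below evaluates it in plain Python; the helper names and the three-BS numbers are hypothetical, and the slides may well have used the equivalent inverse-covariance form instead.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def conditional_mean(mu, cov, i, S, x_S):
    """E[X_i | X_S = x_S] = mu_i + cov[i,S] cov[S,S]^{-1} (x_S - mu_S)."""
    cov_SS = [[cov[a][b] for b in S] for a in S]
    resid = [x - mu[s] for x, s in zip(x_S, S)]
    w = solve(cov_SS, resid)  # cov[S,S]^{-1} (x_S - mu_S)
    return mu[i] + sum(cov[i][s] * w_s for s, w_s in zip(S, w))

# Hypothetical 3-BS example: BS 0 is estimated from measured BSes 1 and 2.
mu = [10.0, 20.0, 30.0]
cov = [[4.0, 2.0, 0.0],
       [2.0, 4.0, 0.0],
       [0.0, 0.0, 1.0]]
estimate = conditional_mean(mu, cov, i=0, S=[1, 2], x_S=[24.0, 30.0])
```

In this toy case BS 1 is 4 units above its mean and is correlated with BS 0, so the estimate for BS 0 moves up from 10 to 12; BS 2 is uncorrelated and contributes nothing.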
12 First Solve the Learning Problem. Learn the inverse covariance matrix from training data. How? Exploit the relationship with linear regression modeling. Express the load of BS $i$ as a linear function of all other BS loads and then regress: $Y_i = \sum_{j \neq i} \beta_j X_j + \epsilon_i$. The regression coefficients $\beta_j$ can be shown to be directly related to the inverse covariance matrix elements.
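The relationship referred to here is presumably the standard identity for Gaussian variables. Writing $\Theta = \Sigma^{-1}$ (a symbol introduced only for this note) and assuming the loads are centered, regressing variable $i$ on all the others gives:

```latex
\beta_j = -\frac{\Theta_{ij}}{\Theta_{ii}} \quad (j \neq i),
\qquad
\operatorname{Var}(\epsilon_i) = \frac{1}{\Theta_{ii}}
```

In particular, a regression coefficient $\beta_j$ is zero exactly when $\Theta_{ij} = 0$, so zeros found by a sparse regression correspond to missing edges in the graphical model.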
13 Sparse Models. Sparse model: many regression coefficients $\beta_j$ are zero. Reduces the danger of overfitting (lowering variance). Also computationally efficient. Introduce a regularization term in the regression; we used Lasso. The objective is the sum of an empirical error term and a regularization term (modeling penalty).
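To illustrate how the Lasso penalty produces exact zeros, here is a minimal coordinate-descent sketch for the objective (1/(2p)) * sum_t (y_t - sum_j beta_j x_tj)^2 + lam * sum_j |beta_j|. The solver, the scaling of the objective, and the toy data are assumptions for illustration; the slides do not say which Lasso solver or scaling was used.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso.

    X is a list of p rows with d features each (no all-zero column assumed),
    y is a list of p targets, lam is the regularization weight.
    """
    p, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for t in range(p):
                partial = sum(beta[k] * X[t][k] for k in range(d) if k != j)
                rho += X[t][j] * (y[t] - partial)
                z += X[t][j] ** 2
            beta[j] = soft_threshold(rho / p, lam) / (z / p)
    return beta

# Toy data: the target depends only on the first feature (y = 2 * x0).
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = lasso(X, y, lam=0.5)
```

On this toy data the irrelevant second coefficient is set exactly to zero, while the first is shrunk from 2 to 1.5; that shrinkage is the bias paid for the lower variance mentioned on the slide.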
14 Regularization. Cross-validate using additional training samples (not used for model creation). Use various values of the regularization parameter $\lambda$ to create different models. Choose the one with maximum likelihood.
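The slides do not spell out the likelihood used for this selection. Assuming the Gaussian model from the spatial modeling slide, one natural score for a candidate regularization weight $\lambda$ is the log-likelihood of the $m$ held-out samples under the fitted mean $\hat{\mu}$ and the inverse covariance estimate $\hat{\Theta}_\lambda$ (notation introduced here for illustration):

```latex
\ell(\lambda) = \frac{m}{2}\left[\log\det\hat{\Theta}_\lambda
  - \operatorname{tr}\!\left(S_{\mathrm{val}}\,\hat{\Theta}_\lambda\right)
  - n\log(2\pi)\right],
\qquad
S_{\mathrm{val}} = \frac{1}{m}\sum_{t=1}^{m}
  \left(x^{(t)} - \hat{\mu}\right)\left(x^{(t)} - \hat{\mu}\right)^{\top}
```

The selected value is then $\lambda^{*} = \arg\max_{\lambda} \ell(\lambda)$ over the candidate grid, with $n$ the number of BSes.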
15 Data Processing. Hourly load of 400 BSes covering a 75 x 84 mile area that includes a busy downtown and surrounding suburbs. No temporal dimension in the model: create different models for different parts of the day (every 4 hours) to account for the diurnal variation of load. Use residuals from a fitting function; the residuals pass a normality test.
16 Average Edge Length in the Model Graph. [Plots: average edge length in miles and in hops (in the Voronoi graph).] Apparent spatial/regional significance.
17 Choosing the Measured Set S. Greedy strategy: each iteration picks the BS that minimizes the error estimate. A simpler higher-load-first strategy achieves nearly the same performance.
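A minimal sketch of one such greedy strategy, assuming the error estimate is the total conditional variance of all BS loads given the measured set (the slides do not state which error estimate was used). Conditioning a Gaussian on one more variable is a rank-one update of the covariance matrix, so no matrix inversion is needed; the function name and the toy covariance are hypothetical.

```python
def greedy_measured_set(cov, k):
    """Greedily pick k variables to measure.

    Each step picks the variable whose measurement most reduces the total
    remaining (conditional) variance, then conditions the covariance on it.
    Returns the chosen indices and the remaining total variance.
    """
    n = len(cov)
    C = [row[:] for row in cov]  # conditional covariance given chosen set
    chosen = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for c in range(n):
            if c in chosen or C[c][c] <= 1e-12:
                continue  # already measured, or nothing left to learn
            # Reduction in total variance from conditioning on variable c.
            gain = sum(C[j][c] ** 2 for j in range(n)) / C[c][c]
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            break
        col = [C[j][best] for j in range(n)]
        d = C[best][best]
        for a in range(n):
            for b in range(n):
                C[a][b] -= col[a] * col[b] / d  # rank-one conditioning update
        chosen.append(best)
    remaining = sum(C[j][j] for j in range(n))
    return chosen, remaining

# Hypothetical 3-BS covariance: BSes 0 and 1 are correlated, BS 2 is isolated.
cov = [[4.0, 2.0, 0.0],
       [2.0, 4.0, 0.0],
       [0.0, 0.0, 1.0]]
chosen, remaining = greedy_measured_set(cov, k=1)
```

With one measurement allowed, the sketch picks BS 0 (tied with BS 1, first index wins), which removes its own variance and part of BS 1's, leaving a total variance of 4 out of the original 9.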
18 Impact of Estimation Accuracy on Applications. We understand the tradeoff between measurement complexity (size of S) and error. But how much accuracy do we need? We need to turn to applications. Studied two applications: energy management and opportunistic traffic scheduling.
19 Opportunistic Traffic Scheduling. Similar to the smart electric grid: move non-urgent traffic from peak to off-peak periods. What is non-urgent? P2P, large downloads, sync, push, etc. Who decides? A user agent on the mobile; it may have multiple levels of priority or deadlines to aid scheduling. Carriers can incentivize such scheduling. Similar to QoS scheduling, but at a higher layer and on a longer time scale. Two components in the system architecture: a server (scheduler) in the core network, and a user agent on the mobile that coordinates with the server.
20 [Figure: server (scheduler) in the core network; a low-priority flow is created with deadline = 2 hr; timeline marked 2 PM, 2:30 PM, 3 PM, 3:30 PM.]
21 Solving the Scheduling Problem. Several approaches are possible, depending on how flows are prioritized. For any approach, however, the server needs to be able to estimate current and future loads at all BSes. It also needs to model and estimate subscriber mobility (a separate problem). Poor estimation leads to poor scheduler performance.
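To make the dependence on load estimates concrete, here is a deliberately simple, hypothetical policy (not the scheduler from the talk): a low-priority flow is placed in the epoch, between its arrival and its deadline, where the forecast load leaves the most headroom. Subscriber mobility is ignored, and the function name, units, and numbers are illustrative assumptions.

```python
def pick_epoch(forecast_load, capacity, arrival, deadline, demand):
    """Return the epoch in [arrival, deadline] with the most forecast
    headroom that can still fit `demand`, or None if no epoch fits."""
    best, best_headroom = None, None
    for t in range(arrival, deadline + 1):
        headroom = capacity - forecast_load[t]
        if headroom >= demand and (best is None or headroom > best_headroom):
            best, best_headroom = t, headroom
    return best

# Hypothetical hourly load forecast at one BS (same units as capacity).
forecast = [90.0, 95.0, 60.0, 70.0]
epoch = pick_epoch(forecast, capacity=100.0, arrival=0, deadline=3, demand=20.0)
if epoch is not None:
    forecast[epoch] += 20.0  # reserve the capacity for the deferred flow
```

If the forecast is wrong, for instance if the epoch chosen here turns out to be a peak, the deferred flow lands on a congested BS, which is the sense in which poor estimation degrades the scheduler.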
22 Evaluation Approach. Trace-driven simulator based on a capacity model of BSes. Opportunistic scheduling is meant to admit more traffic with the same network capacity. We always use the same traffic trace, but reduce the network capacity to demonstrate the impact. Impact? Do low-priority flows still finish within a reasonable time? Are high-priority flows impacted?
23 Results. Low-priority flows = a random subset of long-lived flows (over 25 mins), about 8% of all flows, with randomly chosen deadlines of 1-4 hours. The rest are high priority. The scheduling epoch is hourly. Only a subset of the 400 BSes is measured; the rest are estimated.
24 Conclusions. Discovering structures in mobile traffic is a rich area of study, with applications in network and resource management.
25 Questions? Modeling and Forecasting; Traffic Management; Traffic Analysis.