Performance Analysis Proposal

1. Title
Performance Analysis of P2P VoIP Applications

2. Objective
To evaluate the dynamic behavior (i.e., CODEC usage) of P2P VoIP applications (e.g., Skype, Google Talk) under different network conditions, and the impact of that behavior on voice quality.

3. Research Questions
- Which applications achieve better results? (This is not the same as asking which one is the best.)
- Which codec adaptations does each application perform under different network conditions (i.e., bandwidth, delay, jitter, packet loss)?
- How long does an application take to adapt to a suitable codec?
- Which applications cause more packet loss?
- Given a low-bandwidth network (e.g., 30-40 kbps), does the application adapt to it, or do the network quality metrics simply reflect poor performance (packet loss)?

4. Environment Considerations
Two users exchange VoIP packets across a controlled network: a testbed configured for different conditions using a WAN emulator (such as NIST Net). The users must have access to the public Internet in order to be authenticated by the application, to reach the contact list, and to start a voice call. The actual voice traffic, however, goes through the emulated network.

5. Metrics
Two different types of metrics, reflecting different aspects, are considered:
- End-user characteristics: MOS, E-model, PESQ, PSQM, PAMS, voice quality prediction.
- Network characteristics: throughput, delay, jitter, packet loss (one candidate jitter estimator is sketched below).
A further metric could be used to evaluate the TCP-friendliness of these applications. In other words, to what extent does their adaptation (or the lack of it) harm TCP flows?
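Since jitter appears in the network-metric list without a definition, one concrete candidate is the interarrival-jitter estimator from RFC 3550 (the RTP specification). Below is a minimal Python sketch; the function name and the assumption that per-packet send/receive timestamps are available are ours, not part of the tools named in this proposal.

    # RFC 3550 interarrival jitter: an exponentially smoothed average of the
    # transit-time differences between consecutive packets.
    def rfc3550_jitter(send_times, recv_times):
        """send_times/recv_times: per-packet timestamps in seconds."""
        jitter = 0.0
        for i in range(1, len(send_times)):
            prev_transit = recv_times[i - 1] - send_times[i - 1]
            cur_transit = recv_times[i] - send_times[i]
            d = abs(cur_transit - prev_transit)
            jitter += (d - jitter) / 16.0  # gain of 1/16, as in the RFC
        return jitter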
6. Experimental Design

6.1. Parameters
The experiments assume two users, S (the sender) and R (the receiver), sending/receiving a fixed audio source over an emulated IP network. A fixed audio source must be used because, depending on the sound source, Skype (and possibly other applications) generates different bit rates (this was observed with Net-Peeker during a Skype session).

6.2. Metrics
Metrics are the observable outcome of an experiment (also known as the response variable). The following metrics are defined for these experiments, along with the method used for collecting them.

a) The average size of packets delivered to R (measured on S). Packet sizes can easily be collected by reading the tcpdump file (see the capture-analysis sketch after this section).

b) The time the application takes to adapt after network conditions change (assuming it redefines its strategy / chooses another codec). A variant of this metric is the initial adaptation delay, i.e., the time an application takes to find the correct codec/configuration for a given network condition. Adaptation delay is measured as the time between two unambiguously different bitrate levels (or two clearly different average packet sizes, in case the application adapts by changing the codec sampling rate or frame size).

c) The PESQ MOS, as defined by ITU-T Recommendation P.862, ranging from 1.0 (worst) to 4.5 (best). It represents the effect of network conditions on the audio quality perceived by the end user. The PESQ MOS may be obtained in two phases: first, recording the output sound on S and the input sound on R; second, submitting both sound files to the PESQ algorithm (a freely available C program; see the sketch after this section).

d) The bitrate generated by the application (measured on S), used for comparison with the actual throughput observed on R and for computing the adaptation delay. Tcpstat computes the bitrate using the tcpdump file as input.

e) Throughput, delay, jitter, and packet loss (measured on R):
- Throughput: tcpstat
- Packet loss and jitter: ipstats
- Delay: ipstats + NTP (NTP keeps the clocks of S and R synchronized so that one-way delay can be computed)
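Metrics (a) and (d) can be derived from the same tcpdump capture, and the resulting bitrate time series is also the raw material for spotting the adaptation delay of metric (b). The sketch below assumes the scapy library and a hypothetical capture file name; it is one possible implementation, not a prescribed one.

    from collections import defaultdict
    from scapy.all import rdpcap

    def analyze(pcap_file):
        """Return (per-second bitrate dict, average packet size in bytes)."""
        pkts = rdpcap(pcap_file)           # load the whole capture
        t0 = float(pkts[0].time)
        bits_per_sec = defaultdict(int)
        total_bytes = 0
        for p in pkts:
            sec = int(float(p.time) - t0)  # one-second buckets
            bits_per_sec[sec] += len(p) * 8
            total_bytes += len(p)
        return dict(bits_per_sec), total_bytes / len(pkts)

    series, avg_size = analyze("skype_session.pcap")  # hypothetical file name
    print("average packet size: %.1f bytes" % avg_size)
    for sec in sorted(series):
        print(sec, series[sec])

Plotting or eyeballing the per-second series around the moment a network parameter is changed gives the interval between the two bitrate levels, i.e., the adaptation delay.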
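For metric (c), the P.862 reference code is a command-line C program. A minimal driver sketch follows; the "+8000" narrowband sample-rate flag and the positional reference/degraded arguments reflect our recollection of the reference code's usage message, and the file names are placeholders, so both should be verified against the distributed README.

    import subprocess

    # Run the ITU-T P.862 reference binary on one replication's sound files.
    result = subprocess.run(
        ["./pesq", "+8000", "sender_reference.wav", "receiver_degraded.wav"],
        capture_output=True, text=True, check=True)
    # The reference program prints its MOS prediction at the end of the output.
    print(result.stdout.strip().splitlines()[-1])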
6.3. Factors and Levels
Factors are the variables that affect the outcome of the experiments. Each factor has a set of alternative values (levels). Levels are the values each factor can assume; that is, each level is an alternative for the corresponding factor.

 #  Factor              Levels
 1  Application         Skype, Google Talk, Yahoo Messenger with voice
 2  Audio source        Noisy (e.g., music), conversation (with periods of silence)
 3  WAN bandwidth       1M, 256K, 150K, 56K, 28K, 9.6K
 4  WAN delay           0, 1 ms, 10 ms, 100 ms, 500 ms, 1 s, 10 s
 5  WAN packet loss     0.01%, 0.1%, 1%, 5%, 10%, 50%
 6  WAN jitter          0, 10%, 20%
 7  Background traffic  TCP session (using TG?)

6.4. Experiments
The next sections describe the specific experiments to be conducted. The factors and levels are given in the corresponding tables, and the observable values are all the metrics defined in section 6.2. Since random variables are involved, a confidence level of 95% and a maximum error (precision) of 5% must be adopted in order to achieve good statistical accuracy (a sanity-check sketch follows below).

Experiment duration: 1 hour.
Replications: 60 replications of 1 minute each.

Two different types of experiments may be considered. In the first type, the levels are fixed, i.e., they are not varied during each experiment. In the second type, the levels are varied during the experiment. The adaptation delay can only be obtained from the dynamic experiments. On the other hand, if only dynamic experiments are used, it may be difficult to collect the other metrics and associate them with a specific network condition. In other words, it may be difficult to tell automatically and unambiguously which metric refers to which network condition, because the adaptation delay may vary depending on the type and value of the network parameter that is changed (loss, delay, etc.).
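As a sanity check on the 95%/5% target, the half-width of the t-based confidence interval across the 60 replications can be compared against 5% of the sample mean. A minimal sketch, assuming SciPy is available and using made-up sample values:

    import math
    from scipy import stats

    def ci_halfwidth(samples, confidence=0.95):
        """Return (mean, half-width of the t-based confidence interval)."""
        n = len(samples)
        mean = sum(samples) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        t = stats.t.ppf((1 + confidence) / 2, df=n - 1)
        return mean, t * sd / math.sqrt(n)

    # 60 made-up PESQ MOS values standing in for one replication set.
    samples = [3.1, 3.4, 3.2, 3.3, 3.0, 3.5] * 10
    mean, hw = ci_halfwidth(samples)
    print("mean = %.3f +/- %.3f" % (mean, hw))
    print("meets 5% relative precision:", hw / mean <= 0.05)

If the precision target is not met after 60 replications, more replications are needed for that configuration.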
6.4.1. Impact of bottleneck bandwidth on selected metrics

 Factor          Levels
 WAN bandwidth   10 Mbps, 1 Mbps, 100 kbps, 56 kbps, 28 kbps, 9.6 kbps
 WAN delay       Fixed = 100 ms

6.4.2. Impact of delay on selected metrics

 Factor          Levels
 WAN bandwidth   Fixed = 128 kbps
 WAN delay       0, 1 ms, 10 ms, 100 ms, 500 ms, 1 s, 10 s

6.4.3. Impact of packet loss on selected metrics

 Factor            Levels
 WAN bandwidth     Fixed = 128 kbps
 WAN delay         Fixed = 100 ms
 WAN packet loss   0%, 1%, 5%, 10%, 50%
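Operationally, these experiments amount to stepping the WAN emulator through one level at a time while a call is in progress. A driver sketch for experiment 6.4.3 follows; the cnistnet flags are quoted from memory of the NIST Net manual and the testbed addresses are invented, so both must be checked against the installed version before use.

    import subprocess
    import time

    SRC, DST = "10.0.0.1", "10.0.0.2"   # hypothetical testbed addresses
    LOSS_LEVELS = [0, 1, 5, 10, 50]     # percent, from the 6.4.3 table

    for loss in LOSS_LEVELS:
        # (Re)program the emulated path: fixed 100 ms delay, varying drop rate.
        subprocess.run(["cnistnet", "-a", SRC, DST,
                        "--delay", "100",
                        "--drop", str(loss)], check=True)
        time.sleep(60)                  # hold each level for one replication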
6.4.4. Impact of jitter on selected metrics

 Factor          Levels
 WAN bandwidth   Fixed = 128 kbps
 WAN delay       Fixed = 100 ms
 WAN jitter      0, 10%, 20%

6.4.5. TCP-friendliness of the applications

 Factor              Levels
 WAN bandwidth       10 Mbps, 1 Mbps, 100 kbps, 56 kbps, 28 kbps, 9.6 kbps
 WAN delay           Fixed = 1 ms
 Background traffic  A (stable) TCP session (a simple generator is sketched below)
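For experiment 6.4.5 the traffic generator is left open ("using TG?"). If no generator is available, a stable bulk TCP flow can be produced with a few lines of Python; host, port, and duration below are placeholders.

    import socket
    import time

    def tcp_receiver(port=5001):
        """Accept one connection and discard everything it sends."""
        with socket.socket() as srv:
            srv.bind(("", port))
            srv.listen(1)
            conn, _ = srv.accept()
            with conn:
                while conn.recv(65536):
                    pass

    def tcp_sender(host="10.0.0.2", port=5001, seconds=3600):
        """Stream bulk data for the duration of one experiment run."""
        payload = b"x" * 65536
        with socket.create_connection((host, port)) as s:
            end = time.time() + seconds
            while time.time() < end:
                s.sendall(payload)

Running the receiver on R and the sender on S keeps a single long-lived TCP flow competing with the VoIP stream for the whole run.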
7. Research Group
- John Doe
- Jane Doe
- Joe Doe

8. Workplan
Action 1: Environment Configuration
  Activity 1.1: Installing NIST Net in the NGN testbed
  Activity 1.2: Configuring routing and testing
Action 2: Finding voice quality metrics
  Activity 2.1: Surveying voice quality metrics
  Activity 2.2: Comparing the different metrics
  Activity 2.3: Choosing a particular metric
Action 3: Background Survey
  Activity 3.1: Reading the Skype paper
  Activity 3.2: Looking for more information on Skype, Google Talk, etc.
  Activity 3.3: Surveying the state of the art in similar papers (involving typical VoIP comparisons, for instance, SIP and H.323)
Action 4: Performing experiments
  Activity 4.1: Defining the detailed experimental design
  Activity 4.2: Performing the experiments
  Activity 4.3: Collecting and analyzing the results
  Activity 4.4: Formatting and writing up the results

9. Schedule
Each activity is scheduled for one week, in order:

 Activity  1.1  1.2  2.1  2.2  2.3  3.1  3.2  3.3  4.1  4.2  4.3  4.4
 Week        1    2    3    4    5    6    7    8    9   10   11   12