Network Modeling, Fall 2012 Final Exam

Date: 4 February, 2013

INSTRUCTIONS: There are 6 problems in total, each worth 4 points. You must choose and solve **any 5 problems** (out of the 6) that you prefer! You can try all problems, but in the end you have to indicate which 5 you would like to be counted towards your grade. **NOTE: If you do not indicate this, I will choose the 5 out of the 6 that get the lowest grade! So, please don't forget.**

HINTS:
- When you're dealing with large variables (e.g. number of nodes) plus some constant, you can ignore the constants if this doesn't change your argument significantly.
- Be smart about how you approach questions. Sometimes you might be able to come up with (and justify) the correct answer without long calculations!

Problem 1: Centralities and more.

(a) Consider a network that is small-world. Can it also have a high clustering coefficient? YES or NO? Justify your answer and draw an example network.

(b) Consider the network in Figure 1. Indicate the node with the highest and the lowest centrality with respect to each of the following. (Note: if more than one node has the same (high or low) centrality, you can indicate any of them.)
1. degree centrality;
2. betweenness centrality;
3. PageRank centrality (assume that each undirected link between nodes i and j is replaced with two directed links, one from i to j and one from j to i; for this calculation, assume also that the β factor in PageRank is 0, i.e. α = 1).
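To sanity-check part 3 of (b): with β = 0, PageRank on an undirected graph (each edge replaced by two directed links) is the stationary distribution of a simple random walk. For a connected, non-bipartite graph this is proportional to node degree, $d_i / 2m$. Figure 1 is not reproduced here, so the sketch below uses a made-up edge list. The function name `pagerank_no_teleport` is illustrative, and the code assumes the graph is connected and not bipartite so that plain power iteration converges.

```python
from collections import defaultdict


def pagerank_no_teleport(edges, iters=1000):
    """PageRank with beta = 0 (alpha = 1), computed by power iteration.

    Each undirected edge i-j is treated as two directed links i->j and j->i.
    Assumes a connected, non-bipartite graph so the iteration converges.
    """
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    nodes = sorted(nbrs)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        new = {v: 0.0 for v in nodes}
        for i in nodes:
            share = rank[i] / len(nbrs[i])  # i splits its rank evenly over its out-links
            for j in nbrs[i]:
                new[j] += share
        rank = new
    degree = {v: len(nbrs[v]) for v in nodes}
    return rank, degree


# Hypothetical graph: triangle 1-2-3 with a tail 3-4-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
rank, degree = pagerank_no_teleport(edges)
two_m = 2 * len(edges)
for v in sorted(rank):
    print(f"node {v}: PageRank {rank[v]:.4f}, degree/2m {degree[v] / two_m:.4f}")
```

The two printed columns agree on this graph. Under these assumptions, PageRank with β = 0 therefore ranks nodes exactly as degree centrality does.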

Figure 1: [figure not reproduced]

(c) You want to attack the network of Figure 1 by removing nodes (from highest to lowest centrality): (i) based on degree centrality, (ii) based on betweenness centrality, (iii) based on PageRank centrality. Which of the three strategies will disconnect the network the fastest (that is, will need the fewest nodes removed)?

(d) Assume that one leader node exists in the network. To distribute the load of the leader among nodes, the leader node holds a token for one time unit, then gives it to one of its neighbors (chosen with equal probability), which also holds the token for one time unit, then gives it to one of its neighbors, and so forth. Assume that node 1 is the first leader.
1. Where would you guess the token is if you observe the network after a long time? Justify your answer briefly.
2. Can you modify the token-passing protocol to ensure that each node is the leader for a similar amount of time, on average?

Problem 2: Markov Chains and Choosing Servers.

Consider the two discrete-time Markov chains shown in Figure 2. Assume you start in state 1. You are interested in the long-term (limiting) probability of finding the chain in state 3, $\lim_{n\to\infty} p^{(n)}_{13}$.

Figure 2: [figure not reproduced]

(a) For which of the two chains is the above probability higher? Chain (a)? Chain (b)? Equal for both? Justify your answer.

(b) Which of the two chains converges to its stationary distribution faster? Chain (a)? Chain (b)? Both equally fast? Justify your answer.

(c) You want to download a large file, and you are offered 4 mirror servers to choose from. All servers receive traffic as a Poisson process with the same rate λ = 0.5, but they differ in their service times:
- Server 1: file download times are exponentially distributed with mean 1 (time unit).
- Server 2: all files take exactly 1 time unit to download.
- Server 3: file download times are uniformly distributed in (0, 2).
- Server 4: file download times are either exactly 0.2 with probability 0.9 or 8.2 with probability 0.1.

Which server would you prefer if you wanted to minimize the (queueing) time you wait until your file starts downloading? Rank all 4 servers in terms of your preference.

Problem 3: Consider the server farm shown in Figure 3. The arrival stream is a Poisson process with rate λ. Each job is sent to Host 1 with probability p and to Host 2 with probability 1 − p. There is a queue at each host. Host 1 has exponentially distributed service times with rate $\mu_1$, and Host 2 has exponentially distributed service times with rate $\mu_2$.

Figure 3: [figure not reproduced]

(a) Assume $\mu_1 = \mu_2$. Either prove or disprove that $E[T_Q]$ (queueing time) and $E[T]$ (response time) are always minimized when p is chosen to balance the load (the load of a server is ρ, the utilization of the server).

(b) Now assume $\mu_1 \neq \mu_2$. Either prove or disprove that $E[T_Q]$ (queueing time) and $E[T]$ (response time) are always minimized when p is chosen to balance the load.

Problem 4: A web server with priorities.

You are downloading files from a given web server. Download requests are generated as a Poisson process with rate λ. The sizes of the files you download (and thus the download times, or service times) are random and follow a general distribution with first and second moments $E[D]$ and $E[D^2]$. This server, however, also serves premium users: while you download a file, the server might pause your download to serve premium users. The duration of a pause is a generally distributed random variable R. After the pause, your download continues from the point where it stopped; however, your download might pause again, so multiple interruptions might occur during a single file download. The time between the end of one pause and the beginning of the next one is exponentially distributed with rate α.

(a) Denote the number of interruptions during a single file download as $F_s$. Derive $E[F_s]$ and $E[F_s^2]$.

(b) Derive the expected service time for a file you download (this includes both the actual download time and the time you spend paused, waiting for premium users) and the variance of the service time.

(c) If you issue a request for a download while your previous requests have not yet finished, then the new requests will have to queue. Assuming a normal first-come, first-served system, derive the expected response time for your downloads.

Problem 5: In most networks, links are not drawn completely at random, but with a tendency to connect nodes that share some common traits. As a simplistic model, assume that nodes have different colors, such that there are N red nodes and N blue nodes (assume that N is large). The probability of an edge between nodes of identical color is p, and the probability of an edge between nodes of different colors is q. Thus, for typical networks, p > q. For q = 0, the result is a network of two disjoint clusters, each consisting of nodes of a single color.

(a) What are the minimal values of p and q needed to reach global connectivity (with high probability)?
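One way to explore (a) empirically is to sample the two-color model and test connectivity with a breadth-first search. The sketch below is a minimal simulation that assumes independent edges, as in the model above. The function names, the seeds, and the (N, p, q) values in the usage lines are hypothetical choices for illustration. The code only estimates how often an instance is connected and does not derive the threshold.

```python
import random
from collections import deque


def two_colour_graph(n, p, q, seed=0):
    """Adjacency lists for the two-color model.

    Nodes 0..n-1 are red and n..2n-1 are blue. Each same-color pair is linked
    independently with probability p, each red-blue pair with probability q.
    """
    rng = random.Random(seed)
    adj = [[] for _ in range(2 * n)]
    for u in range(2 * n):
        for v in range(u + 1, 2 * n):
            same_colour = (u < n) == (v < n)
            if rng.random() < (p if same_colour else q):
                adj[u].append(v)
                adj[v].append(u)
    return adj


def is_connected(adj):
    """Breadth-first search from node 0; True if every node is reached."""
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


# Hypothetical parameters; several seeds give a rough connectivity frequency.
n = 200
for p, q in [(0.05, 0.0), (0.05, 0.001), (0.01, 0.001)]:
    runs = [is_connected(two_colour_graph(n, p, q, seed=s)) for s in range(5)]
    print(f"p={p}, q={q}: connected in {sum(runs)}/5 runs")
```

With q = 0 the check always fails, because no red-blue edge exists. Sweeping p and q for a few values of N shows how the connectivity frequency changes with N, which can guide the scaling argument.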

(b) Is the resulting network small-world? If so, for what values of q? Prove or disprove.

(c) You decide to sample this network using a simple random walk (not MCMC). You start your random walk from a red node. How long will it take, on average, to sample the first blue node?

(d) At what rate does the above random walk converge to its stationary distribution, as a function of p, q, and N?

Problem 6:

(a) You are told that a given network has the following degree distribution: the probability that a node has degree k is $p(k) = c\,k^{-3}$, and no other information is given. Draw a network of at least 20 nodes that looks like it follows this distribution. (NOTE: there is no need to follow the distribution exactly! Just show the key characteristics you would expect to observe in such a network.)

(b) Assume now that a much larger network of N nodes (e.g. the Internet) has the degree distribution of question (a). You are AKAMAI and would like to build a CDN (content distribution overlay) over this network. Specifically, out of the N nodes you must choose a small subset of at most L (super)nodes, where you will cache popular content. Your goal is to maximize the number of nodes (among the remaining N − L) that have a direct connection to at least one of the L supernodes. How would you pick the L supernodes for this network?

(c) Can you think of a network structure where the above algorithm might not work well? Draw a small network example (assume L = 2). Explain what general properties (e.g. combination of degree distribution, clustering coefficient, path lengths, etc.) you would expect to cause trouble for the algorithm. Propose a better algorithm for such networks.

(d) Assume N = 10000. Estimate the value of L you would need so that, on average, 50% of the nodes have a direct link to at least one of the L supernodes.
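For Problem 1(d), the token performs a simple random walk. After a long time, it is therefore found at node i with probability proportional to its degree $d_i$, assuming a connected, non-bipartite network. One standard modification that equalizes leadership time is a Metropolis-Hastings walk. The holder i proposes a uniformly random neighbor j and hands over the token with probability $\min(1, d_i/d_j)$; otherwise it keeps the token for another time unit. Figure 1 is not reproduced, so the sketch below compares both protocols on a hypothetical graph. All function names are illustrative.

```python
from collections import defaultdict


def neighbours(edges):
    nbrs = defaultdict(list)
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def simple_walk(nbrs, i):
    """Original protocol: hand the token to a uniformly random neighbor."""
    return {j: 1.0 / len(nbrs[i]) for j in nbrs[i]}


def metropolis_walk(nbrs, i):
    """Modified protocol: propose a uniform neighbor j, accept w.p. min(1, d_i/d_j)."""
    d_i = len(nbrs[i])
    probs = {j: min(1.0, d_i / len(nbrs[j])) / d_i for j in nbrs[i]}
    probs[i] = max(0.0, 1.0 - sum(probs.values()))  # rejected proposals: keep the token
    return probs


def long_run_distribution(nbrs, step, start, iters=5000):
    """Distribution of the token's position after `iters` time units, starting at `start`."""
    dist = {v: 0.0 for v in nbrs}
    dist[start] = 1.0
    for _ in range(iters):
        new = {v: 0.0 for v in nbrs}
        for i, mass in dist.items():
            for j, pr in step(nbrs, i).items():
                new[j] += mass * pr
        dist = new
    return dist


# Hypothetical graph: triangle 1-2-3 with a tail 3-4-5; the token starts at node 1.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
nbrs = neighbours(edges)
plain = long_run_distribution(nbrs, simple_walk, start=1)
fair = long_run_distribution(nbrs, metropolis_walk, start=1)
for v in sorted(nbrs):
    print(f"node {v} (degree {len(nbrs[v])}): simple {plain[v]:.3f}, Metropolis {fair[v]:.3f}")
```

The modified transition probabilities are $\min(1/d_i, 1/d_j)$, which is symmetric in i and j. The chain is therefore doubly stochastic, so the uniform distribution is stationary.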
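For Problem 2(c), assume each mirror behaves as an M/G/1 first-come, first-served queue. This assumption is consistent with Poisson arrivals and general service times. The Pollaczek-Khinchine formula then gives $E[T_Q] = \lambda E[S^2] / (2(1-\rho))$ with $\rho = \lambda E[S]$. All four servers have $E[S] = 1$, so they differ only through $E[S^2]$. The sketch below simply evaluates the formula, using moments computed from the distributions listed in the problem. The function name `pk_waiting_time` is illustrative.

```python
def pk_waiting_time(lam, es, es2):
    """Mean queueing delay E[T_Q] of an M/G/1 FCFS queue (Pollaczek-Khinchine formula)."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable queue: rho >= 1")
    return lam * es2 / (2 * (1 - rho))


lam = 0.5
# (E[S], E[S^2]) for each server, computed from its service-time distribution.
moments = {
    "Server 1": (1.0, 2.0),                # Exp(mean 1): E[S^2] = 2 * mean^2
    "Server 2": (1.0, 1.0),                # deterministic 1
    "Server 3": (1.0, 4.0 / 3.0),          # Uniform(0, 2): E[S^2] = 2^2 / 3
    "Server 4": (0.9 * 0.2 + 0.1 * 8.2,    # two-point distribution
                 0.9 * 0.2 ** 2 + 0.1 * 8.2 ** 2),
}
waits = {name: pk_waiting_time(lam, es, es2) for name, (es, es2) in moments.items()}
ranking = sorted(waits, key=waits.get)  # smallest expected queueing delay first
for name in ranking:
    print(f"{name}: E[T_Q] = {waits[name]:.3f}")
```

Because ρ = 0.5 for every server, $E[T_Q]$ reduces to $E[S^2]/2$ here, so the ranking follows the second moment of the service time.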
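For Problem 3, each host is an M/M/1 queue fed by a thinned Poisson stream with rate pλ or (1 − p)λ. This gives $E[T] = \frac{p}{\mu_1 - p\lambda} + \frac{1-p}{\mu_2 - (1-p)\lambda}$. $E[T_Q]$ is $E[T]$ minus the mean service time $p/\mu_1 + (1-p)/\mu_2$. The sketch below compares the load-balancing split $p = \mu_1/(\mu_1+\mu_2)$ with a grid search over p. The rates are hypothetical and the function names are illustrative. A numerical check like this can suggest a counterexample or support a proof, but it does not replace either.

```python
def mean_response(p, lam, mu1, mu2):
    """E[T] when jobs go to Host 1 w.p. p and to Host 2 w.p. 1 - p (two M/M/1 queues)."""
    lam1, lam2 = p * lam, (1 - p) * lam
    if lam1 >= mu1 or lam2 >= mu2:
        return float("inf")  # at least one host is unstable
    return p / (mu1 - lam1) + (1 - p) / (mu2 - lam2)


def mean_queueing(p, lam, mu1, mu2):
    """E[T_Q]: E[T] minus the mean service time p/mu1 + (1 - p)/mu2."""
    return mean_response(p, lam, mu1, mu2) - (p / mu1 + (1 - p) / mu2)


grid = [k / 1000 for k in range(1001)]

# Hypothetical unequal rates (mu1 != mu2).
lam, mu1, mu2 = 1.5, 2.0, 1.0
p_bal = mu1 / (mu1 + mu2)  # equal utilization: p*lam/mu1 == (1 - p)*lam/mu2
p_t = min(grid, key=lambda p: mean_response(p, lam, mu1, mu2))
p_q = min(grid, key=lambda p: mean_queueing(p, lam, mu1, mu2))
print(f"balanced p={p_bal:.3f}: E[T]={mean_response(p_bal, lam, mu1, mu2):.3f}, "
      f"E[T_Q]={mean_queueing(p_bal, lam, mu1, mu2):.3f}")
print(f"best p for E[T]:   {p_t:.3f} -> {mean_response(p_t, lam, mu1, mu2):.3f}")
print(f"best p for E[T_Q]: {p_q:.3f} -> {mean_queueing(p_q, lam, mu1, mu2):.3f}")

# Symmetric case for comparison: mu1 == mu2 == 1, lam = 1.
p_sym = min(grid, key=lambda p: mean_response(p, 1.0, 1.0, 1.0))
print("best p when mu1 == mu2:", p_sym)
```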
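For Problem 4, one reading of the model is that pauses arrive as a Poisson process of rate α in *active download* time. Under this reading, conditioning on the download time D gives $F_s \mid D \sim \mathrm{Poisson}(\alpha D)$. It follows that $E[F_s] = \alpha E[D]$ and $E[F_s^2] = \alpha E[D] + \alpha^2 E[D^2]$. The service time $S = D + \sum_{i=1}^{F_s} R_i$ then has mean $E[D](1 + \alpha E[R])$. The Monte Carlo sketch below checks these expressions under stated assumptions: the distributions $D \sim \mathrm{Exp}(1)$ and $R \sim \mathrm{Uniform}(0, 1)$ are hypothetical, and the first pause is assumed to start after an Exp(α) amount of download progress. The function name `simulate_downloads` is illustrative.

```python
import random


def simulate_downloads(n, alpha, draw_d, draw_r, seed=1):
    """Monte Carlo over n downloads; returns (interruption counts, service times).

    Pauses occur at points of a Poisson process of rate alpha measured in download
    progress, so the progress between consecutive pauses is Exp(alpha).
    """
    rng = random.Random(seed)
    counts, service = [], []
    for _ in range(n):
        d = draw_d(rng)
        f, paused, t = 0, 0.0, rng.expovariate(alpha)
        while t < d:  # the next pause starts before the download completes
            f += 1
            paused += draw_r(rng)
            t += rng.expovariate(alpha)
        counts.append(f)
        service.append(d + paused)
    return counts, service


alpha = 0.5
# Hypothetical choices: D ~ Exp(1) (E[D] = 1, E[D^2] = 2), R ~ Uniform(0, 1) (E[R] = 0.5).
counts, service = simulate_downloads(
    200_000, alpha, lambda r: r.expovariate(1.0), lambda r: r.uniform(0.0, 1.0))
n = len(counts)
ef = sum(counts) / n
ef2 = sum(f * f for f in counts) / n
es = sum(service) / n
print("E[F_s]  :", round(ef, 3), "vs alpha*E[D] =", alpha * 1.0)
print("E[F_s^2]:", round(ef2, 3), "vs alpha*E[D] + alpha^2*E[D^2] =", alpha + alpha ** 2 * 2.0)
print("E[S]    :", round(es, 3), "vs E[D]*(1 + alpha*E[R]) =", 1.0 * (1 + alpha * 0.5))
```

The simulated service times could also be used to estimate $E[S^2]$ before applying an M/G/1 formula in part (c).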
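For Problem 5(c), a red node has roughly pN red and qN blue neighbors. Each step of the walk therefore lands on a blue node with probability about q/(p + q), which suggests a mean of roughly (p + q)/q steps to the first blue node. This is a mean-field estimate, not a proof. The simulation sketch below compares it with sampled walks on one random instance. N, p, q, the seed, the number of walks, and the function names are all hypothetical choices.

```python
import random


def two_colour_graph(n, p, q, rng):
    """Nodes 0..n-1 red, n..2n-1 blue; same color linked w.p. p, different colors w.p. q."""
    adj = [[] for _ in range(2 * n)]
    for u in range(2 * n):
        for v in range(u + 1, 2 * n):
            if rng.random() < (p if (u < n) == (v < n) else q):
                adj[u].append(v)
                adj[v].append(u)
    return adj


def steps_to_first_blue(adj, n, start, rng):
    """Simple random walk from a red node; count the steps until it first reaches a blue node."""
    node, steps = start, 0
    while node < n:  # still on a red node
        node = rng.choice(adj[node])
        steps += 1
    return steps


rng = random.Random(7)
n, p, q = 300, 0.1, 0.02  # hypothetical parameters
adj = two_colour_graph(n, p, q, rng)
walks = [steps_to_first_blue(adj, n, rng.randrange(n), rng) for _ in range(20_000)]
mean_steps = sum(walks) / len(walks)
print("simulated mean steps:", round(mean_steps, 2))
print("estimate (p + q) / q:", round((p + q) / q, 2))
```

The walk assumes no red node is isolated, which is overwhelmingly likely for these parameters but not guaranteed for sparser choices.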
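For Problem 6(b) and (c), a natural first heuristic is to pick the L highest-degree nodes. It can waste picks when hubs share most of their neighbors, for example when hubs are densely interconnected or clustered. A greedy alternative for this maximum-coverage objective adds, one node at a time, the node that most increases the number of covered non-supernodes. The sketch below compares both heuristics on a small hypothetical graph with L = 2, where hubs A and B share all their neighbors and hub C has a disjoint neighborhood. The node labels and function names are illustrative.

```python
from collections import defaultdict


def build_graph(edges):
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def covered(nbrs, supers):
    """Non-supernodes that have at least one direct link to a supernode."""
    hit = set()
    for s in supers:
        hit |= nbrs[s]
    return hit - set(supers)


def top_degree(nbrs, L):
    """Pick the L highest-degree nodes (ties keep insertion order)."""
    return sorted(nbrs, key=lambda v: len(nbrs[v]), reverse=True)[:L]


def greedy_cover(nbrs, L):
    """Repeatedly add the node that maximizes the number of covered non-supernodes."""
    chosen = []
    for _ in range(L):
        best = max((v for v in nbrs if v not in chosen),
                   key=lambda v: len(covered(nbrs, chosen + [v])))
        chosen.append(best)
    return chosen


# Hypothetical example with L = 2: hubs A and B share leaves 1..6; hub C has leaves 7..11.
edges = ([("A", i) for i in range(1, 7)]
         + [("B", i) for i in range(1, 7)]
         + [("C", i) for i in range(7, 12)])
nbrs = build_graph(edges)
by_degree = top_degree(nbrs, 2)
by_greedy = greedy_cover(nbrs, 2)
print("top-degree:", by_degree, "covers", len(covered(nbrs, by_degree)))
print("greedy    :", by_greedy, "covers", len(covered(nbrs, by_greedy)))
```

In this example, top-degree selection picks A and B and covers only their 6 shared neighbors. The greedy rule picks A and then C, covering 11 nodes. On large graphs, the greedy step would need incremental bookkeeping rather than recomputing coverage from scratch.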