Understanding Linux Migrations: How easy is it to change distributions?
1) Abstract

In this paper, we look at the reality behind the perception that using Linux gives users complete flexibility to change distributions by trying to answer the following four questions:

1. Across a variety of workloads and organization sizes, are there significant costs to migrate between Linux distributions?
2. What cost categories drive migration costs?
3. What factors influence whether migration to a new Linux distribution will be relatively expensive or inexpensive?
4. Can hiring experienced Linux administrators allow organizations to mitigate migration costs?

We did so by assessing the costs incurred when migrating servers to a new Linux distribution. We gathered data through guided interviews with 23 system administrators who had migrated 28 groups of servers to new Linux distributions in the past three years. These migrations were performed at North American companies with more than 290 full-time employees. The companies have IT environments with both Windows and Linux servers, and came from the following verticals: media, manufacturing, healthcare, biotech, energy, engineering, and software.

The data showed that migrating servers to a new Linux distribution is not necessarily easy, inexpensive, or predictable.

1. Across a variety of workloads and organization sizes, are there significant costs to migrate between Linux distributions?
Findings: Migrations are often costly. We analyzed migration costs and compared them to the original installation and deployment costs, using 20% as the cut-off for significant costs because annual software maintenance costs are about 20% of installation cost. We found almost 40% of migrations involved significant costs (i.e. over 20% of the original installation cost).

2. What cost categories drive migration costs?
Findings: Labor drives migration costs. Averaging the cost breakdown by category across migrations showed labor makes up more than two-thirds of migration costs. Also, half the migrations required more labor than the original installation of the migrated servers.

3. What factors influence whether migration to a new Linux distribution will be relatively expensive or inexpensive?
Findings: Migration costs are hard to predict. An Ordinary Least Squares regression found no statistically significant relationships between the migration cost or labor effort and number of servers, number of workloads, combination of workloads, use of custom applications, or years of staff experience.

4. Can hiring experienced Linux administrators allow organizations to mitigate migration costs?
Findings: Not likely. The lack of a statistically significant relationship between migration costs and years of staff experience indicates staff experience is not a good predictor and has a limited effect on mitigating migration costs.

The next section provides a detailed description of our findings on these four questions. Question 1 is addressed in Section 2.1, Question 2 is addressed in Section 2.2, and Questions 3-4 are addressed in Section 2.3.

www.ke y-inc.com
2) Key Findings

2.1 When migrating between Linux distributions, costs were often significant, with a wide range at the high end

If two distributions are completely different, then a migration to a new distribution would be like a new installation. So, to gauge the degree to which two distributions are different, we measured migration costs relative to the original cost to set up the migrated servers. These costs include those associated with new hardware, software, staff training, outsourcing, downtime, running servers in parallel, and labor.

Hardware costs included any equipment purchased because the original equipment would not run with the new distribution. Software costs included new license and support fees required because the previous software version could not be used with the new distribution. New license fees were counted in full, but only the first year of support fees was included. Training costs included coursework fees and the IT staff's reduced efficiency while ramping up on the new distribution. Outsourcing costs included fees for third-party services. Downtime costs were the indirect costs of lost productivity of affected users. Running servers in parallel accounted for the costs of running the old servers in parallel with the migrated ones.

Figure 1: Distribution of migration costs as percent of installation costs

Figure 1 is the distribution of migration costs as a percent of installation costs. Because annual software maintenance costs are about 20% of installation cost, we used 20% as a dividing line for significant migration cost. Figure 1 shows that almost 40% of migrations involved significant costs (i.e. more than 20% of the original installation costs). Relatively difficult migrations seem to be fairly common. We examine these cases more closely in Figure 2, which shows the distribution of the 11 migrations with costs more than 20% of installation cost.
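The 20% cut-off can be illustrated with a minimal sketch. The cost pairs below are hypothetical placeholders, not data from the study, and the function names `cost_ratio` and `significant_share` are introduced here for illustration only.

```python
SIGNIFICANT_THRESHOLD = 0.20  # cut-off from the text: 20% of original installation cost


def cost_ratio(migration_cost, installation_cost):
    """Migration cost as a fraction of the original installation cost."""
    return migration_cost / installation_cost


def significant_share(migrations, threshold=SIGNIFICANT_THRESHOLD):
    """Fraction of migrations whose cost ratio exceeds the threshold."""
    ratios = [cost_ratio(m, i) for m, i in migrations]
    significant = [r for r in ratios if r > threshold]
    return len(significant) / len(ratios)


# Hypothetical (migration cost, installation cost) pairs in dollars.
sample = [(5_000, 100_000), (30_000, 100_000), (12_000, 40_000), (1_000, 50_000)]
share = significant_share(sample)
print(f"{share:.0%} of sample migrations exceed the 20% cut-off")
```

In the study itself, 11 of the 28 migrations fell above this line, and 11/28 is roughly 39%, which is the "almost 40%" figure quoted above.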
Figure 2: Distribution of migration costs as percent of installation costs when migration cost percentage > 20%

Figure 2 shows that when migration costs are significant, they fall into a wide range that can reach 100% of the original installation cost.

2.2 Costs are driven by labor, and effort required is often more than original installation

To understand what drives migration costs, we look at where migration dollars are spent. Figure 3 shows the migration cost breakdown, which was generated by averaging the cost breakdown for each migration.
Figure 3: Average migration cost by category

Figure 3 shows labor costs make up more than two-thirds of migration costs, so we focus our attention there. If two distributions are completely different, then a migration would be similar to a new installation. So, we used 100% as a dividing line for significant migration labor effort. Figure 4 is the distribution of the labor required to migrate the servers as a percent of the labor required to originally install the servers.

Figure 4: Distribution of migration labor as percent of installation labor*

*Figure 4 has four fewer data points than Figure 1 because four interviewed administrators knew the total budget for the original installation but not the breakdown by category. Their migrations were excluded from this analysis.

Figure 4 shows half the migrations required more labor for the migration than the original installation. There is a 50% chance that building a completely new system would be easier than a migration, and the interchangeability of distributions does not seem to be a commonly realized benefit. Figure 5 is the distribution of labor effort for those instances when the migration labor effort was greater than the installation effort.

Figure 5: Distribution of migration labor as percent of installation labor when migration labor percentage > 100%

Figure 5 shows labor effort was most often between 100% and 300% of the original installation effort but, in extreme cases, could be almost 10 times the effort. That migrations can require more effort than installations potentially indicates large differences between some distributions. Enterprises considering migrations should not use their original installation as a benchmark for the difficulty of the migration.

2.3 Migration costs cannot be predicted from number of servers, workloads, use of custom applications, or staff experience

After seeing the wide ranges shown in Figures 1, 2, 4, and 5, we attempted to identify factors that might determine whether one is likely to experience low or high migration costs. We tested the effect of the following factors on migration costs:

- number of migrated servers
- number of workload types on migrated servers
- use of custom software
- staff experience

Figure 6 is a scatter plot of the number of servers and cost of each migration.

Figure 6: Migration costs as percent of installation costs vs. number of migrated servers

No dependency is readily apparent, and this initial analysis indicates the number of servers does not appear to be a strong driver of migration costs.

To test the effect of number and combination of workload types, we classified software on the migrated servers as one of three workload types (business applications, databases, and edge infrastructure) and grouped migrations by number of workloads. The average, highest, and lowest migration cost and labor effort as a percentage of original installation costs and labor effort were then calculated for each group. The average was used to determine whether a given number of workload types would typically indicate higher or lower costs. The range of lowest to highest costs within a group was compared to the differences between group averages to test whether the differences between group averages were significant.

Figures 7 and 8 show the average and range of migration costs and labor efforts, respectively, for different numbers of workloads. In these figures, the groups are labeled on the X-axis. The lowest value for that group is displayed at the bottom of the line, the average is shown next to the marker, and the highest value is displayed at the top of the line. So, in Figure 7, for migrations involving one workload, the lowest migration cost was 2% of original installation, the average was 25%, and the maximum was 99%.

Figure 7: Range of migration costs as percent of installation costs by number of workloads

Figure 7 shows the average migration cost varies by 12% depending on the number of workloads, but migration costs for the same number of workloads vary between 41% and 97%.
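The group comparison described above (lowest, average, and highest value per group, set against the spread of the group averages) can be sketched as follows. The records are hypothetical and chosen only to make the arithmetic easy to follow; the function names are illustrative and do not come from the study.

```python
from collections import defaultdict


def group_summary(records):
    """Return {group: (lowest, average, highest)} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {g: (min(v), sum(v) / len(v), max(v)) for g, v in groups.items()}


def spread_of_averages(summary):
    """Difference between the largest and smallest group average."""
    averages = [avg for _, avg, _ in summary.values()]
    return max(averages) - min(averages)


# Hypothetical (number of workloads, migration cost as % of installation cost).
records = [(1, 2), (1, 25), (1, 99), (2, 10), (2, 30), (2, 80), (3, 5), (3, 45)]

summary = group_summary(records)
for group, (low, avg, high) in sorted(summary.items()):
    print(f"{group} workload(s): low={low}% avg={avg:.0f}% high={high}% "
          f"within-group range={high - low} points")
print(f"spread of group averages: {spread_of_averages(summary):.0f} points")
```

When the within-group ranges are several times larger than the spread of the group averages, as in this made-up sample, the grouping variable says little about where an individual migration will land.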
Figure 8: Range of migration labor as percent of installation labor by number of workloads

Figure 8 shows the average labor effort varies by 100% depending on the number of workloads, but labor effort for the same number of workloads varies between 72% and 752%.

Next, we tested the impact of custom software by classifying each migration into one of three categories: custom applications only, packaged applications only, and packaged & custom applications. Figures 9 and 10 show the average and range of migration costs and labor efforts, respectively.

Figure 9: Range of migration costs as percent of installation costs by custom and packaged software usage*

*The absence of range for the Custom only category is due to there being only one migration where a custom application was the only workload migrated.

Figure 9 shows the average migration cost varies by 19% depending on the use of custom software, but costs for migrations with the same usage of custom software vary between 46% and 95%.

Figure 10: Range of migration labor as percent of installation labor by custom and packaged software usage

Figure 10 shows the average labor effort varies by 367% depending on use of custom software, but labor effort for migrations with the same usage of custom software varies between 99% and 752%.

Last, we considered the influence of staff experience with Linux on migration cost and labor. Interviewees were asked to estimate the average number of years of experience their migration staff had with Linux. Figures 11 and 12 are scatter plots of the IT staff's average years of professional Linux experience against the migration costs and labor efforts, respectively. No correlation between costs or labor and staff experience is apparent.
Figure 11: Migration cost as percent of installation cost vs. average staff Linux experience*

*One interviewee could not estimate his staff's Linux experience, and thus only 27 migrations were included in the analysis of staff experience with relation to migration cost.

Figure 12: Migration labor as percent of installation labor vs. average staff Linux experience

Figures 6 through 12 indicate these migration factors do not appear to be good predictors of migration costs or labor efforts. To determine whether there is a statistically significant relationship between migration cost or labor effort and each factor or combinations of factors, we performed an Ordinary Least Squares (OLS) regression. Regression analysis not only indicates relationships between migration cost as a percentage of original installation cost and specific factors, but also controls for effects of other factors. The results of the regression analysis are shown in Figure 13.

Figure 13: Migration factor correlation parameter estimates

The parameter estimate indicates the change in migration cost per unit change in each migration factor. In other words, a coefficient of 0.10 would indicate a 10% increase in the migration cost for each unit increase in the corresponding migration factor. The t-scores represent the parameter estimates divided by the standard error. T-scores approach zero as the standard errors increase. As the magnitude of a t-score increases, its p-value decreases. P-values measure how likely a result at least this strong would be by chance alone if there were no underlying relationship, and p-values less than or equal to 0.05 are considered statistically significant.

In Figure 13, no migration factor has a p-value less than 0.05, indicating no statistically significant relationship between the migration cost and these migration factors. Performing the same analysis on migration labor and these migration factors also showed no statistically significant relationships. None of the migration factors analyzed can be used to predict migration cost and labor effort.
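As a rough illustration of how parameter estimates and t-scores of this kind are obtained, the following sketch fits an OLS model with NumPy. It is not the study's analysis: the data are synthetic random numbers, the factor columns are stand-ins for the factors listed above, and the helper name `ols` is an assumption made here.

```python
import numpy as np


def ols(X, y):
    """Ordinary Least Squares with an intercept.

    Returns (coefficients, standard errors, t-scores); index 0 is the intercept.
    """
    X = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimates
    residuals = y - X @ beta
    dof = X.shape[0] - X.shape[1]                 # observations minus parameters
    sigma2 = residuals @ residuals / dof          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    std_err = np.sqrt(np.diag(cov))
    return beta, std_err, beta / std_err          # t = estimate / standard error


# Synthetic stand-in data (NOT the study's data): 28 migrations, 4 factors.
rng = np.random.default_rng(0)
n = 28
factors = np.column_stack([
    rng.integers(1, 200, n),  # number of migrated servers
    rng.integers(1, 4, n),    # number of workload types
    rng.integers(0, 2, n),    # uses custom software (0 or 1)
    rng.uniform(1, 10, n),    # average years of staff Linux experience
]).astype(float)
cost_ratio = rng.uniform(0.02, 1.0, n)  # drawn independently of the factors

beta, std_err, t_scores = ols(factors, cost_ratio)
for name, b, t in zip(["intercept", "servers", "workloads", "custom", "experience"],
                      beta, t_scores):
    print(f"{name:>10}: estimate={b: .4f}  t={t: .2f}")
```

Turning each t-score into a p-value additionally requires the t distribution with `dof` degrees of freedom, which is left out here to keep the sketch dependent on NumPy alone.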
3) Conclusion

The perception of easily interchangeable Linux distributions does not fully bear out in reality. Migrating to a new distribution can be costly and labor intensive, which indicates that differences between distributions are greater than currently perceived. Costs cannot be easily predicted from the number of migrated servers or the workloads on the migrated servers, and these costs are not easily mitigated by hiring experienced IT staff.

In addition to considering migration costs, users thinking about choosing Linux as their server operating system should also weigh two potential indicators of vendor lock-in. The first is that most interviewees performed migrations within a Linux distribution group, or set of related distributions. Examples of a distribution group are the Red Hat group, comprising CentOS (uses Red Hat source code, but without trademark), Fedora Core (sponsored by Red Hat), and commercial Red Hat, and the SuSE group, comprising all SuSE releases. 79% of migrations were within a single group, 14% were between different groups, and 7% included both servers migrated within one group and between groups.

The second potential indicator is interviewees' concern about vendor certification. Many customers commented they would not consider migrating their mission-critical servers onto distributions without software certification:

"I probably would not have migrated to an unsupported distribution if the servers were running, say, our financial data." -Director of IT responsible for 160 Linux servers

"I wouldn't migrate a mission-critical application to an unsupported distribution." -Linux administrator with 6 years of professional experience

If distributions were interchangeable, then it would be easy for independent software vendors (ISVs) to certify their software on many distributions. So, we conducted a preliminary test of vendor certification as a contributor to vendor lock-in by analyzing which Linux distributions had been certified for Oracle. A matrix of existing certified Linux distributions by Oracle version is shown in Figure 14.

Figure 14 shows not only a sparseness in vendor-certified Linux distributions but also a trend for later versions of Oracle to be certified on fewer distributions. There are half as many distributions certified for Oracle 10g Release 2 as for Oracle 9.2. This runs counter to what is expected with fully interchangeable distributions. Examining the density of ISV certification would shed more light on the potential for vendor lock-in when choosing Linux and may bear further study.
Figure 14: Distribution certification for Oracle database

Figure 15: Distribution of migrations by workloads
4) Scope and Methodology

4.1 Scope of Study

All migrations studied were performed at companies within the United States and Canada. They were done across all workloads and verticals, with the exception of government, telecommunication, and education. Every customer ran a mix of Linux and Windows servers, as well as other possible operating systems. Figure 15 shows the distribution of workloads on the migrated servers.

4.2 Methodology

We interviewed 23 enterprise system administrators who had performed a Linux migration in the past three years. In the four cases where interviewees provided information on migrations or upgrades of multiple groups of servers, each group of servers with distinct workloads was treated as a separate migration. Interviewees were asked to estimate and identify costs associated with the migration by providing the following information:

4.2.1 Labor

Labor costs consisted of the costs of the man-hours spent on the migration over the following phases: Pre-Migration Planning Pilot Implementation Post-Migration Testing and Trouble-shooting

Each interviewee was asked about:

- the number of man-hours required to complete each phase that went beyond their staff's normal duties
- the staff billing rate. For those customers who could not disclose this rate, the most common value of $50 per hour was used.

Labor costs for both migration and original installation were calculated by multiplying the total number of man-hours spent during all phases by the customer's staff rate. The same staff rate was used for both original installation and migration.

4.2.2 Training

Training costs consisted of course fees or extra time spent on learning aspects of the new distribution. Interviewees were asked to identify whether the staff required any training to perform the migration or use the new distribution, and if so, the course fees associated with the migration.
The time required for the staff to reach 100% productivity on the new distribution was also considered a training cost. Administrators were asked about the following factors to assess their staff's productivity while ramping up to the new distribution:

- number of staff responsible for the migration
- number of migrated servers and total number of servers staff was responsible for
- length of time to become proficient in the new distribution
- effectiveness while becoming proficient

The costs related to ramping up to the new distributions were quantified using the following formula:

# of staff ( 1 % effectiveness) man hours to get up to speed # of servers migrated # of servers migrated # of servers staff is responsible for

4.2.3 Outsourcing

Outsourcing costs consisted of the professional fees for outside consultants brought in to perform or assist with migration tasks or support.

4.2.4 Licensing

Licensing costs consisted of product lifetime usage fees and incremental support fees for applications and distributions. Interviewees were asked to provide the following information about their applications:

- whether they were able to continue using the same versions of the middleware and application software after the migration
- whether they had to pay any new license fees as a result of the migration

New license fees that were incurred as a result of changing distributions were considered migration costs. For product lifetime usage license fees, the entire fee was considered a migration cost. For new incremental fees, only the fees for the first year were counted as part of the migration cost. One year is the minimum because most support fees are paid on an annual basis.

4.2.5 Hardware

Hardware costs consisted of the purchase price of new equipment required for the migration. Interviewees were asked whether the migration drove the purchase, and whether they could have used the old hardware with the new distribution. Equipment purchases were only counted as a cost if the interviewees could not have used the old hardware or did not buy the hardware for lifecycle reasons.

4.2.6 Running old servers in parallel

For migrations where the old servers were run in parallel with the new servers during the Post-Migration phase, customers were asked to provide:

- the monthly budget to run the servers before the migration
- the length of time the servers were run in parallel, in months
- the number of servers run in parallel

The cost of operating the old servers was calculated by multiplying the above three items. This cost was counted toward the migration cost because, without the migration, this incremental cost would not have been incurred.

4.2.7 Downtime

Downtime costs consisted of lost user productivity. For each department affected, interviewees were asked about:

- length of downtime
- number of users in each department affected
- percentage of normal productivity of those users during downtime
- hourly rate of those users

The cost associated with lost productivity due to the migration was found by multiplying the above four factors.

4.2.8 Migration cost breakdown

For each migration, we calculated the percentage of total migration costs from each category: Labor, Training, Outsourcing, Licensing, Hardware, Running servers in parallel, and Downtime.
These percentages were averaged across migrations to create the average migration cost breakdown shown in Figure 3.
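The simpler cost calculations in this section reduce to products and averages, and a minimal sketch may make the arithmetic concrete. All input figures below are hypothetical. The function and parameter names are illustrative, and the treatment of the downtime percentage as the fraction of productivity lost is an assumption, since the text only says the four factors are multiplied.

```python
DEFAULT_RATE = 50.0  # $/hour, the fallback staff rate stated in Section 4.2.1


def labor_cost(hours_by_phase, rate=DEFAULT_RATE):
    """Total man-hours across all phases multiplied by the staff rate."""
    return sum(hours_by_phase) * rate


def parallel_cost(monthly_budget, months, servers):
    """Product of the three items listed in Section 4.2.6."""
    return monthly_budget * months * servers


def downtime_cost(hours_down, users, productivity_fraction, hourly_rate):
    """Product of the four factors in Section 4.2.7.

    Assumption: productivity_fraction is the share of productivity LOST.
    """
    return hours_down * users * productivity_fraction * hourly_rate


def breakdown(costs):
    """Each category's share of one migration's total cost, in percent."""
    total = sum(costs.values())
    return {category: 100.0 * cost / total for category, cost in costs.items()}


def average_breakdown(migrations):
    """Average the per-migration percentages, as described in Section 4.2.8."""
    shares = [breakdown(m) for m in migrations]
    return {c: sum(s[c] for s in shares) / len(shares) for c in shares[0]}


# Two hypothetical migrations with three of the seven categories.
first = {
    "Labor": labor_cost([40, 80, 30, 10]),        # 160 h at the $50 default
    "Downtime": downtime_cost(2, 50, 0.5, 20.0),
    "Running servers in parallel": parallel_cost(100.0, 2, 5),
}
second = {"Labor": 6_000.0, "Downtime": 0.0, "Running servers in parallel": 4_000.0}

for category, pct in average_breakdown([first, second]).items():
    print(f"{category}: {pct:.1f}%")
```

Averaging the percentages, rather than pooling dollars across migrations, gives every migration equal weight regardless of its size.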
Keystone Strategy
South San Francisco, CA
Salt Lake City, UT
Burlington, MA
info@keystonestrategy.com
www.keystonestrategy.com