Database Performance Report Using PivotalVRP (Sample)
Report Date: July 2013
Created by: Shahar Harap
Table of Contents
1. Executive Summary
2. Overall resource consumption overview
2.1. Peak investigation
3. Skew Analysis
4. Top consumers using BI model
4.1. Top SQL by CPU Time
4.2. Top SQL by Total Writes
4.3. Top CPU Time usage by Total Reads
5. Top SQL
5.1. Top SQLs distributed by runtime
5.2. Top SQL texts sorted by Reads per Sec.
5.3. Top SQL texts sorted by Writes per Sec.
6. Variance
7. Conclusions
1. Executive Summary

These are the highlights of the performance investigation using the PivotalVRP product.

Overall
The main goal of this report was to provide a complete view into the GP environment while sample queries were running.

Resource usage observations
o A few CPU peaks reaching about 50%
o Recurring I/O peaks lasting up to a couple of minutes, mostly reads
o A series of skews was discovered

Summary
Overall we saw that the running queries were not taking advantage of the full GP environment: at no point did the DCA reach extreme resource usage peaks, CPU usage was mainly in the 15-30% range, and I/O consumption was well within the DCA's capability. Through the Variance module we discovered that although I/O consumption grew, after a few tweaks the runtime performance (which translates into real-world QoS) improved considerably. We found that the data distribution within the table is not optimal; this type of visibility can help DBAs tweak environments and/or queries and thereby get more out of the environment. Within MPP environments the overall performance is dictated by the weakest link. The inclusion of PivotalVRP can also assist in migration, since it can quickly point out where fine tuning is necessary.

Our recommendations:
1. DBAs and R&D should further investigate the skew issues identified on nodes SDW2 and SDW3 from an application point of view (data distribution).
2. System admins should investigate the SDW2 and SDW3 skew from an HW/OS/network perspective.
3. More traffic can be executed on GP to allow better utilization. If resource queues are applied, they can be configured with a higher concurrency limit.
4. DBAs and R&D should take a closer look at the queries highlighted in sections 2.1 and 5 and tune them where possible, and should also review the queries in section 6 (What's New) to make sure they are legitimate queries.
5. Once all of the above are applied, the system should be monitored using PivotalVRP for an additional week and then compared using the Variance module to see the performance differences.
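Recommendation 1 (checking data distribution) can be started with a per-segment row count; in Greenplum the hidden gp_segment_id column exposes which segment holds each row. Below is a minimal sketch of the resulting skew calculation; the per-segment counts are hypothetical stand-ins for real query results, not figures from this report.

```python
# Hypothetical per-segment row counts, as would be returned by a query such as:
#   SELECT gp_segment_id, count(*) FROM events GROUP BY 1;
# (the table name "events" appears in this report; the counts are made up)
segment_counts = {0: 1_200_000, 1: 1_150_000, 2: 3_100_000, 3: 2_900_000}

def skew_factor(counts):
    """Ratio of the largest segment's row count to the average.

    1.0 means a perfectly even distribution; values well above 1.0
    indicate that one segment carries a disproportionate share of
    the data, and therefore of the work.
    """
    avg = sum(counts.values()) / len(counts)
    return max(counts.values()) / avg

print(round(skew_factor(segment_counts), 2))  # 1.49
```

A skew factor near 1.0 after redistribution (for example, after choosing a better DISTRIBUTED BY column) would indicate the data-distribution issue is resolved.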
2. Overall resource consumption overview

The following graph shows the overall CPU and I/O consumption during the period 01/07/2013 - 04/07/2013. We found some peaks in CPU activity, as shown in the graph above, and we will examine some of them in order to find trends in the DB.

During the investigation of the graph we see that:
o The CPU reached up to 54%
o The I/O loads show a few peaks of writes and one peak of reads
o There were some stops in the database

The table below highlights the resource loads:

Load type                  | Avg CPU | Max CPU
Results during work period | 29%     | 54%
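The Avg/Max figures in the table are simple aggregates over the sampled CPU readings. A minimal sketch of that aggregation; the sample values below are made up, chosen only to reproduce the report's 29%/54% figures:

```python
# Hypothetical per-interval CPU utilization samples (percent),
# standing in for the readings behind the graph above.
cpu_samples = [15, 22, 29, 54, 31, 20, 25, 36]

avg_cpu = sum(cpu_samples) / len(cpu_samples)  # mean over the work period
max_cpu = max(cpu_samples)                     # highest observed peak

print(f"Avg CPU: {avg_cpu:.0f}%  Max CPU: {max_cpu}%")  # Avg CPU: 29%  Max CPU: 54%
```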
2.1. Peak investigation

We investigated the peak using the Performance BI and Playback modules. We zoomed in to find where the peak started; PivotalVRP lets you select the period of time you wish to investigate based on the events, reviewing periods ranging from single minutes to days. We ran the Playback module over the selected time period in order to revisit and review the chain of events.

Playback results
We discovered that a single select could have caused the peak. We dove in to see its parameters.
In the window above we can see that the select ran for about 58:50 min but had no CPU or I/O activity. Below is the SQL text:

select computer_id,
       date(date_time),
       count(distinct(process_name)) process_count
from events
group by computer_id, date(date_time)
order by process_count desc
limit 1000;
3. Skew Analysis

We reviewed the information relating to skews with the Skew Analysis module, and we suspect that on the selected date there was an issue with the data distribution within the table. Our suspicion stems from the fact that many skews were identified on various different segments; therefore the issue is not related to a particular node. See the example and table below:
Host | Avg Variance %
Sdw3 | 266
Sdw3 | 233
Sdw2 | 100

In the example above and in the table we see that skews were found on different segments. The average variance reached up to a 266% difference (in runtime) between the segments.
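The report does not spell out how the variance percentage is computed. A plausible sketch, assuming it expresses the slowest segment's runtime relative to the fastest segment's (the host names echo the report; the runtimes are made up):

```python
# Hypothetical per-segment runtimes in seconds for a single query.
runtimes = {"sdw1": 30.0, "sdw2": 31.0, "sdw3": 79.8}

def runtime_variance_pct(times):
    """Slowest segment runtime as a percentage of the fastest.

    100 means all segments finished together (no skew); the report's
    266 would mean the slowest segment took 2.66x as long as the
    fastest one, leaving the other segments idle in the meantime.
    """
    return max(times.values()) / min(times.values()) * 100

print(round(runtime_variance_pct(runtimes)))  # 266
```

Because an MPP query finishes only when its slowest segment does, this ratio translates directly into wasted capacity on the faster segments.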
4. Top consumers using BI model

We investigated CPU Time over a range of a few days in order to find the top-consuming SQLs in the cluster. Through the Performance BI module we have the ability to build the drill-down path dynamically by selecting the sequence of filters.

4.1. Top SQL by CPU Time

4.2. Top SQL by Total Writes
4.3. Top CPU Time usage by Total Reads
5. Top SQL

5.1. Top SQLs distributed by runtime

Below are three of the top SQL texts:

select date(date_time), source_port from events

select distinct date(date_time), source_port from events limit 1000;

select count(v)::float4
from (select ta.file_length as v, count(ta.file_length) as f
      from only pg_temp_33043.pg_analyze_18882_3 as ta
      group by ta.file_length) as foo
where f > 1
5.2. Top SQL texts sorted by Reads per Sec.

Here are the top 3 SQLs:

select count(*)::float4
from (select ta.source_port
      from only pg_temp_33043.pg_analyze_18960_3 as ta
      group by ta.source_port) as tb

select count(*) from events_fix_1_prt_12;

select computer_id, process_name, source_port, destination_port
from events
where source_host = destination_host
  and source_port <> destination_port
limit 1000;
5.3. Top SQL texts sorted by Writes per Sec.

Below are the top 3 SQLs:

Vacuum

select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[]
from gp_dist_random('pg_class') c
where c.oid = 18986

select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[]
from gp_dist_random('pg_class') c
where c.oid = 18908
6. Variance

We know that some alterations were made and the original test run was halted and restarted; we wanted to see whether there was a runtime improvement between the two runs. The Variance module can provide numeric and quantitative comparative performance results.

From the screen above we can see that overall there was a 69.8% improvement in CPU usage.

From the screen above we can see that all queries (100%) which ran during the two periods improved in runtime.
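The improvement figure is a straightforward relative change between the two runs. A minimal sketch of that arithmetic; the before/after values below are made up, chosen only so the result matches the 69.8% quoted above:

```python
def pct_improvement(before, after):
    """Relative improvement of `after` over `before`, as a percentage.

    Positive means the metric (e.g. CPU usage or runtime) went down
    between the first and second run.
    """
    return (before - after) / before * 100

# Made-up example: a metric dropping from 29.0 to 8.76 corresponds
# to the ~69.8% improvement reported by the Variance module.
print(round(pct_improvement(29.0, 8.76), 1))  # 69.8
```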
From the screen above we can see that almost all queries which ran during the two periods increased their I/O consumption.

What's New
Above are queries which ran during the second period but did not run during the first. In order to get a full picture when comparing environments, you need to be able to show which additional queries ran in a particular environment (or time frame); What's New provides this information.
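Conceptually, What's New is a set difference between the query populations of the two periods. A minimal sketch, assuming queries are identified by their normalized SQL text (the sample texts here are made up, not the report's actual query list):

```python
# Hypothetical normalized query texts captured in each period.
period_1 = {
    "select date(date_time), source_port from events",
    "select count(*) from events_fix_1_prt_12",
}
period_2 = {
    "select date(date_time), source_port from events",
    "select count(*) from events_fix_1_prt_12",
    "vacuum",  # appears only in the second period
}

def whats_new(first, second):
    """Queries seen in the second period but not in the first."""
    return sorted(second - first)

print(whats_new(period_1, period_2))  # ['vacuum']
```

Flagging these additions lets DBAs confirm each new query is legitimate before comparing the two periods' resource figures, as recommended in the Executive Summary.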
7. Conclusions

The test environment selected for this sample report had multiple starting points due to a variety of reasons; PivotalVRP was used in order to give full visibility into the performance of the queries and the environment itself. Thanks to its high level of granularity, we were able to analyze the performance data in a matter of minutes. At any point during the selected period we were able to see how each individual query behaved and what performance impact it had.

Overall we saw that the running queries were not taking advantage of the full GP environment: at no point did the environment reach extreme resource usage peaks, CPU usage was mainly in the 15-30% range, and I/O consumption was well within the environment's capability. Through the Variance module we discovered that although I/O consumption grew, after a few tweaks the runtime performance (which translates into real-world QoS) improved considerably. We found that the data distribution within the table was not optimal; this type of visibility can help DBAs tweak the environment and/or the queries and thereby get more out of the environment. The inclusion of PivotalVRP can also assist in migration, since it can quickly point out where fine tuning is necessary.