MDM Multidomain Edition (Version 9.6.0) for Microsoft SQL Server Performance Tuning

2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.
Abstract

The goal of performance tuning is to optimize the overall performance of the MDM Hub within the database and application server environments. This article helps you optimize the performance of the MDM Hub.

Supported Versions

MDM Multidomain Edition 9.6.0 for Microsoft SQL Server
JBoss 5.1 and 5.2
Java 1.7

Table of Contents

Performance Tuning Overview
Optimizing the Database
    Microsoft SQL Server Instance Properties
    Microsoft SQL Server Database Properties
Optimizing the Application Server
    JBoss JAVA_OPTS Settings
Optimizing the MDM Hub
    Optimizing Batch Performance

Performance Tuning Overview

The goal of performance tuning is to optimize the overall performance of the MDM Hub within the database and application server environments. Experiment with the performance tuning parameters to arrive at appropriate values.

Perform the following tasks to improve performance:

Optimize the database server
Optimize the application server
Optimize the MDM Hub

Optimizing the Database

Your database administrator must configure Microsoft SQL Server for optimum performance. You must also configure Microsoft SQL Server specifically for optimum performance of the MDM Hub by setting the Microsoft SQL Server instance properties and the Microsoft SQL Server database properties.

Microsoft SQL Server Instance Properties

To optimize the performance of the MDM Hub, configure the max degree of parallelism server configuration option in Microsoft SQL Server. The max degree of parallelism must be the same as the parallel degree that you specify for the base object. The parallel degree depends on the amount of data in the base object, the number of processor cores, and the physical memory of the database server machine.

Set max degree of parallelism to 1 to suppress parallel plan generation.
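As a rough illustration of the instance setting described above, the following Python sketch builds the T-SQL statements that set max degree of parallelism through the sp_configure system stored procedure. The function name maxdop_statements and its arguments are hypothetical. The range check (between 1 and the number of processor cores on the database server machine) is an assumption that follows the guidance for the base object parallel degree. Review the generated statements with your database administrator before you run them.

```python
from typing import List


def maxdop_statements(parallel_degree: int, db_server_cores: int) -> List[str]:
    """Build T-SQL that sets 'max degree of parallelism' to the given value.

    parallel_degree is assumed to be the parallel degree already chosen for
    the base object. The range check is an assumption made for this sketch.
    """
    if not 1 <= parallel_degree <= db_server_cores:
        raise ValueError(
            "parallel degree must be between 1 and the number of cores "
            "on the database server"
        )
    return [
        # 'max degree of parallelism' is an advanced option in sp_configure.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        "EXEC sp_configure 'max degree of parallelism', {};".format(parallel_degree),
        "RECONFIGURE;",
    ]


if __name__ == "__main__":
    # Example: base object parallel degree of 4 on an 8-core database server.
    print("\n".join(maxdop_statements(4, 8)))
```

The sketch only generates text. It does not connect to the database, so the statements can be reviewed and run through whatever tool the database administrator prefers.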
Microsoft SQL Server Database Properties

To optimize performance of the MDM Hub, configure the following Microsoft SQL Server database properties:

Set AUTO_UPDATE_STATISTICS_ASYNC to ON so that statistics are updated asynchronously.
Set the PARAMETERIZATION option to FORCED to parameterize SQL statements. Parameterized statements reduce the frequency of query compilations and recompilations, which improves performance.
Set READ_COMMITTED_SNAPSHOT to ON so that read operations use row versioning. Readers see the last committed version of the data and are not blocked by sessions that are modifying it.

Optimizing the Application Server

You must configure the application server for optimum performance of the MDM Hub.

JBoss JAVA_OPTS Settings

You can configure the JAVA_OPTS settings in the run.conf.bat file to allocate enough memory and prevent memory leaks for optimum performance of the MDM Hub.

The run.conf.bat file is in the following directory:

    <JBoss_install_dir>\bin\run.conf.bat

Configure the following settings:

Configure JAVA_OPTS to allocate memory to the application server. The following example of the JAVA_OPTS setting assumes that the physical memory of the application server machine is 8 GB:

    set JAVA_OPTS=%JAVA_OPTS% -Xrs -Xms1024m -Xmx5120m -Xss512k -XX:MaxPermSize=512m

If you use JBoss 5.2, use the following setting to avoid memory leaks:

    set JAVA_OPTS=%JAVA_OPTS% -Djboss.vfs.forceVfsJar=true

Set the log level for the Microsoft JDBC driver to OFF to improve the performance of Services Integration Framework API calls. The default is ALL. Set the following log level property in the logging properties file:

    com.microsoft.sqlserver.jdbc = OFF

After you set the log level, set the following options. Do not put spaces around the equal sign in a -D option:

    set JAVA_OPTS=%JAVA_OPTS% -Djava.util.logging.manager=java.util.logging.LogManager
    set JAVA_OPTS=%JAVA_OPTS% -Djava.util.logging.config.file=<path to the logging properties file>

Optimizing the MDM Hub

Optimizing the MDM Hub involves the optimization of the MDM Hub batch jobs.
Optimizing Batch Performance

A batch job is a program in the MDM Hub that you can run to complete a discrete unit of work. You can launch batch jobs individually or as a group from the MDM Hub Console or with Services Integration Framework APIs. To optimize the performance of batch jobs, configure optimum settings in the cmxserver.properties file and through the Hub Console.

The following factors determine batch job performance:

Amount of physical memory allocated to application servers and database servers
Increase memory allocation to improve batch job performance.

Number of threads
Configure an appropriate number of threads for the application server and database server machines to use the CPU efficiently.

Number and complexity of cleanse functions
The performance of batch jobs improves as you reduce the number and complexity of cleanse functions.

Number of trust and validation columns in base objects
The performance of load and automerge jobs decreases as the number of trust and validation columns in base objects increases.

Number of child base objects
The performance of load, tokenize, and automerge batch jobs decreases as the number of child base objects for a base object increases.

Complexity of match path
The performance of tokenize and match jobs decreases as the number of match columns and child base objects increases.

Parallel degree that is set for the base object
Parallel degree is an advanced base object property. For optimum performance of batch jobs, set a value between 1 and the number of processor cores on the database server machine. You can use the Hub Console to set the parallel degree for a base object. Microsoft SQL Server uses parallelism to retrieve data and create indexes.

Optimizing Stage Job Performance

The main settings for stage jobs are stored in the C_REPOS_CLEANSE_MATCH_SERVER table. The table stores the list of Process Servers. You can configure the following parameters for each Process Server for stage jobs:

CLEANSE_EXECUTION_MODE_STR
Mode of operation of the stage job. Configure the value that best fits your stage job scenario:
BOTH. Online and batch operations will run.
ONLINE. Only online API operations will run.
BATCH. Only batch operations will run.

SERVER_NODE_COUNT
Number of threads to use for the stage job. The number of threads depends on the number of processors that the machine uses. The number of threads must be half the number of processors in use.
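The SERVER_NODE_COUNT rule above can be sketched as a small calculation. The following Python example is only an illustration: the function name stage_thread_count is hypothetical, and rounding down for an odd processor count and keeping a minimum of one thread are assumptions, because the guidance states only that the value must be half the number of processors in use.

```python
import os


def stage_thread_count(processors: int) -> int:
    """Return a candidate SERVER_NODE_COUNT: half the processors in use.

    Rounding down and the floor of one thread are assumptions of this sketch.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return max(1, processors // 2)


if __name__ == "__main__":
    # os.cpu_count() reports the logical processors of the machine that runs
    # this script, so run it on the Process Server machine. It can return
    # None when the count cannot be determined.
    detected = os.cpu_count()
    if detected is None:
        print("Could not detect the number of processors.")
    else:
        print("Processors:", detected)
        print("Candidate SERVER_NODE_COUNT:", stage_thread_count(detected))
```

Treat the result as a starting point and adjust it after you measure stage job throughput.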

Alternatively, you can use the Process Server tool in the Utilities workbench of the Hub Console to set the mode of operation, the thread count, and the number of Process Servers to use. To improve performance, configure an optimum number of threads and Process Servers.

You must also set the cmx.server.java_jdbc_loader property to TRUE in the cmxcleanse.properties file. The cmxcleanse.properties file is in the following directory:

    <infamdm_install_directory>\hub\cleanse\resources\

When you set the cmx.server.java_jdbc_loader property to TRUE, all records that undergo staging are written to the database through JDBC connections.
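To confirm that the JDBC loader setting is in place, you could check the contents of the cmxcleanse.properties file with a short script. The following Python sketch is a simplified illustration, not an Informatica utility: the function names are hypothetical, and the parser handles only plain key=value lines and comment lines. It does not handle line continuations, escape sequences, or the colon separator that the Java properties format also allows.

```python
from typing import Dict


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines from the text of a properties file."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def jdbc_loader_enabled(text: str) -> bool:
    """Return True when cmx.server.java_jdbc_loader is set to TRUE."""
    value = read_properties(text).get("cmx.server.java_jdbc_loader", "")
    return value.lower() == "true"


if __name__ == "__main__":
    sample = "# sample content\ncmx.server.java_jdbc_loader=TRUE\n"
    print(jdbc_loader_enabled(sample))
```

To check a real installation, read the file from the directory named above and pass its text to jdbc_loader_enabled.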

Optimizing Load Job Performance

You can improve the performance of load batch jobs by configuring the right number of threads for processing. You can also set the maximum size of blocks to process to improve performance.

Configure the following settings to improve the performance of load jobs:

Setting the number of threads to process for each load job
You can specify the number of threads that the MDM Hub must use to process the load job in the cmxserver.properties file. Set the cmx.server.batch.threads_per_job property to a value equal to or less than the number of threads for batch processing that you configure for the Process Server. The default is 20.

Setting the maximum block size to process
You can use the cmx.server.batch.load.block_size property in the cmxserver.properties file to configure the maximum number of records to process in each block for the load. Experiment with the values for block size to arrive at an optimum value. The default is 250.

Optimizing the Generate Match Tokens Job Performance

You can improve the performance of the generate match tokens job by configuring an optimum batch size. You can configure the cmx.server.tokenize.loader_batch_size property in the cmxcleanse.properties file to specify the batch size.

The cmx.server.tokenize.loader_batch_size property configures the maximum number of insert statements that the MDM Hub can send to the database during direct load. The default is 1000. Batch size depends on the hardware that you use. The optimum batch size for a database server machine with a 16-core processor and solid-state drives (SSD) set up in a redundant array of independent disks (RAID) is 1000.

Optimizing Match Job Performance

You can improve the performance of the match job by configuring an optimum batch size. You can configure the cmx.server.tokenize.loader_batch_size property in the cmxcleanse.properties file to specify the batch size for the JDBC loader used in the match job.

The cmx.server.tokenize.loader_batch_size property configures the maximum number of insert statements that the MDM Hub can send to the database during direct load. The default is 1000. Batch size depends on the hardware that you use. The optimum batch size for a database server machine with a 16-core processor and solid-state drives (SSD) set up in a redundant array of independent disks (RAID) is 1000.

Optimizing Automerge Job Performance

You can improve the performance of automerge batch jobs by configuring the right number of threads for processing. You can also set the maximum size of blocks to process to improve performance.

Configure the following settings to improve the performance of automerge jobs:

Setting the number of threads to process for each automerge job
You can specify the number of threads that the MDM Hub must use to process the automerge job in the cmxserver.properties file. Set the cmx.server.batch.threads_per_job property to a value equal to or less than the number of threads for batch processing that you configure for the Process Server. The default is 20.

Setting the maximum block size to process
You can use the cmx.server.batch.automerge.block_size property in the cmxserver.properties file to configure the maximum number of records to process in each block for automerge. Experiment with the values for block size to arrive at an optimum value. The default is 250.

Author

MDM Documentation