Enterprise Database Architecture Migration

Teijo Peltoniemi

MSc Project Dissertation for the Degree of Master of Science in Informatics, with major in Computer Systems and Software Engineering and minor in Representation and Reasoning

The University of Edinburgh
September

Enterprise Database Architecture Migration

Author: Teijo Peltoniemi, School of Informatics, University of Edinburgh
Academic supervisor: J. Douglas Armstrong PhD, School of Informatics, University of Edinburgh
Industrial supervisors: Terho Oinonen and Toomas Valjakka, TietoEnator Forest Corporation

Abstract

Fenix is a commercial enterprise-level ERP system that depends heavily on the underlying database management system (DBMS). The customer considers relying solely on one DBMS vendor a risk and requires contingency planning. This paper outlines a plan for migrating the DBMS and reviews migration tasks in theory as well as in practice.

An essential part of migration planning is to define the requirements for the new system. This is achieved by analysing the current system and prospective technologies. Another purpose of the analysis is to point out deficiencies in the current architecture and practices, which tend to be outdated in older systems; Fenix is no exception here. Furthermore, Fenix could gain significant performance improvements from new technologies, although these invariably require substantial investments.

Since a large migration project contains a number of uncertainties, piloting is necessary. This paper contains a report of an exemplary migration project carried out on a small partition of the system. The problems encountered during it were mostly tool-oriented; it is highly advisable to acquire a complete migration suite and establish a proper migration environment should the project become actual. The new system is validated by benchmarking it against the old system; this was also part of the mini project.



Test queries
Assumptions
Running the test
Discussion on the results

CHAPTER  MIGRATION ROADMAP
    Partition ................ 71
    Migration activities
        Step
        Step
        Step
        Step
        Step
    Durations ................ 76

CHAPTER  CONCLUSION AND FURTHER PLANS

BIBLIOGRAPHY
APPENDIX A  CONTROL FILE
APPENDIX B  FORMAT FILE
APPENDIX C  ABBREVIATIONS

Chapter 1: Introduction

In this chapter I will describe the system at hand and the related database architecture. Furthermore, I will discuss the objectives of this project.

Fenix

Fenix is a tailored Enterprise Resource Planning (ERP) system. ERP systems are used during each stage of the order chain, including order handling, logistics and invoicing, and Fenix is no exception. Other functions include, for example, steering and operational reporting and technical customer service. TietoEnator Forest Ltd. (TE) has continued developing Fenix since 1994, and the development team consists of 60 people working in six locations in three countries. The customer, a large paper company, has dedicated 50 people to developing Fenix. The project is one of the largest of this type conducted in Finland.

Enterprise database architecture

An enterprise-level system has thousands of users around the world, and this places certain performance requirements on the underlying Database Management System (DBMS). It has to be able to process hundreds of concurrent transactions and deal with terabytes of data. A common practice is to distribute the data. High availability must also be provided. Besides traditional Online Transaction Processing (OLTP) functionality, companies require management decision support tools to improve their competitive edge. Data Warehousing (DW) is designed for these purposes and is proven to provide an excellent return on investment (ROI) (Connolly et al. 1999, p.915).

By enterprise database architecture I mean a database architecture that consists of multiple databases residing on multiple servers. Also, it is distributed geographically

and provides high availability by architectural means. Furthermore, it provides DW. Other definitions also exist. For instance, Seacord (2001, p.15) takes a more functional approach and requires support for ad-hoc queries, persistent summaries and complex transactions. These requirements are naturally also included in my definition (DW and data distribution create the need for them).

Development

The actual development project has been finished. Despite moving to the maintenance phase, development continues to meet changing business needs, new rollouts and emerging technology (TietoEnator Corporation 2003). For instance, the user population is constantly growing and scalability issues must be addressed. Fenix has faced many migration projects, for instance changing from a two-tier to a three-tier architecture in 1996 when BEA Tuxedo was adopted. Other changes include becoming web and XML enabled. Further migration plans are under review.

Migration and project objectives

Fenix runs on Sybase Adaptive Server Enterprise (ASE), a DBMS that has served relatively well to date. The cooperation between Sybase and TE is well established and Sybase is willing to develop their products in the desired directions. However, the customer requires a worst-case plan for DBMS migration. This is mostly contingency planning; the customer regards relying solely on one vendor as a risk. However, preparing for migration has become more and more topical, as Sybase has lost market share. Sybase has not been rated among the three market leaders, which are Oracle, MS SQL Server and IBM DB2. The first two are considered as alternatives to Sybase. TE Forest has long experience with Oracle and has used it in other projects. MS SQL Server is technologically based on Sybase (version 6), although the systems have grown apart more recently. MS SQL Server still uses T-SQL as the programming language for stored procedures. DB2 is not very well known in the forestry industry;

not only would it be risky to adopt an unfamiliar technology, but it would be expensive too: the staff would need to be trained to master DB2. TE considers open source options such as MySQL or PostgreSQL non-compliant due to the lack of some essential features. For example, MySQL does not support stored procedures, which are essential within the system. Furthermore, TE considers high-class technical support an important factor that is not necessarily available with open source products.

The goal of this project is to draw up a plan for the database architecture migration as well as to investigate how to solve some existing problems and deficiencies in the architecture. Because the migration is a substantial effort anyway, it would be wise to take a progressive approach and take full advantage of the features of the new technology. The plan is summarised in a roadmap that maps actions onto a timescale. The perspective taken here is technical and it addresses technical issues. There is also a practical component in this project: I will test migration in practice using designated tools. This requires setting up the testing environment. I will conclude the migration by benchmarking the systems.

As database management systems are quite different at the technical level, it is difficult to produce a general vendor-independent plan. Therefore, I am considering Oracle as the migration DBMS. Despite concentrating on a single vendor, many parts of this paper are general in nature.

Chapter 2: Migration process overview

In this chapter I will discuss issues relating to the migration process. Most of the literature about migration concerns modernising legacy systems and is quite general. Also, relational-to-OO database conversion is well documented, and material concerning simple MS Access to SQL Server projects is available. However, literature on a large-scale database architecture migration project is rare. It is not surprising that companies want to keep large projects of this nature, which pose a number of risks, confidential.

Types of migration

Migration can be carried out in one shot or continuously. The first is applicable when only the schema is migrated to the new DBMS. Also, a relatively small amount of simple data can be transferred this way. However, the one-shot approach with large systems leads to unacceptably long downtimes.

Migration planning

Migration planning activities

In this section I discuss managerial issues, since they drive the technical design. John Bergey suggests that a successful migration project requires a sound migration plan (2001, p.1). He divides the planning into six sub-activities:

Figure 2.1

As the model above is designed for legacy system modernisation, it has to be adapted for our purposes. The migration management approach generally follows typical project management tasks, including monitoring progress and risks. The relevant inputs for the review include system functionality documentation, non-functional requirements and available funding and resources. The first of these is partially described in this paper (chapter 3) and the last is naturally held in confidence. Non-functional requirements include high availability, high performance and flexibility (TietoEnator Corporation 2003).

As the project is large and contains a number of uncertainties, prototyping is necessary. In fact, considering that the project concentrates on the back-end rather than the front-end, piloting with a mini project would be highly advisable. This would reveal whether the technology is suitable and how fast the developers learn it (Cockburn 1996).

Issues with roll-out include whether the system is taken into use in one release or in increments. The incremental approach is probably the only possible method

considering the amount of work required and the resource constraints of the situation. This is the most common approach for financial reasons: projects of this size require huge investment and there is pressure for early quantifiable benefits (Seacord 2001b, p.1). However, the overall cost of the project will grow if it is not conducted in a single increment. Also, if the chosen approach is incremental, further planning is required to work out how the system is partitioned into increments. This might affect the system design itself.

Support needs are typically divided into two parts: setting up the actual technical support and seeking acceptance for the new system from personnel. The first part is quite straightforward and consists of setting up the help desk and educating the developers. The latter might be trickier: some people might find it stressful to be compelled to learn new technologies.

Planning migration the Oracle way

There are also other views on migration planning. For instance, Oracle Corporation (2002, p.2-1) suggests in their documentation that migration planning consists of five tasks. The process starts with requirements analysis and ends with risk assessment. This is not a totally different view from the above; the first activities in Bergey's model can be seen to consist of these actions.

Task 1: Determine the requirements of the migration project. The purpose of this task is to clarify technical issues about the source database, including character sets and version. Also, the impact on applications in terms of APIs and connectivity issues is analysed. This information is then used to determine additional requirements for the destination database. Acceptance criteria are also defined here. The end result of the task is a detailed requirements document.

Task 2: Estimating the workload. This is done using various reports produced during the migration. For instance, the tool used may not be able to convert some of the tables

or stored procedures. Time for fixing is then allocated to each of the errors found in the reports.

Task 3: Analysing operational requirements. This task consists of evaluating operational considerations concerning backups and recovery, required downtimes and their impact on business, parallel run requirements and training needs. Time and resources are allocated to all of the tasks. This forms the skeleton of the migration plan.

Task 4: Analysing the application. At this step the application is evaluated in order to determine whether any changes are required. Sometimes rewriting the application is necessary. Connections are also evaluated here; Oracle may require new ODBC or JDBC drivers, and some of the SQL will almost certainly need to be rewritten. Again, times and resources are allocated. Furthermore, the original requirements document is updated.

Task 5: Planning the migration project. Uncertain factors are evaluated at this juncture. Also, financial, personnel and time constraints are defined and the final version of the plan is produced.

Migration planning guidelines

Bergey summarises his earlier studies into a few guidelines:

- Analyse the needs of affected stakeholders to determine the migration schedules and training requirements.
- Define measures for assessing the success of the project.
- Do the planning thoroughly and do not consider it an extra task.
- Involve customers and users.
- Do not allow implementation to begin before the plan is accepted by all the stakeholders.
- Divide the project into chunks defined by the roll-out plan.
- Put effort into planning and monitoring the migration project.

Although some of the guidelines seem rather idealistic, it is quite possible to underestimate the size and difficulty of a migration project. Therefore the importance of planning must be emphasised.

Database migration and parallel operation

Database migration provides a good opportunity to improve the representation of data (Seacord 2001b, p.19). This includes removing redundancies and seeking better performance through better organisation of data. An optimal solution might require partitioning of the data. Seacord (2001b, p.19) discusses how the database schema evolves: the first version is an analogue of the legacy database and it is revised gradually.

The one-shot approach is virtually impossible if the database is large and contains a substantial number of tables. An incremental approach typically requires parallel operation of the old and new systems. However, there are several complications with this and with the issues discussed in the preceding paragraph. Reorganising the data can lead to very complex mappings between the old and the modernised schemas. Also, maintenance is more expensive and performance is affected as well. Despite these negative aspects, parallel operation significantly reduces operational risks. Usually the legacy system is kept as a warm stand-by. The parallel run is carried out with replication (Porrasmaa 2004). Naturally, there are many ways to do this, e.g. using triggers.
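The trigger-based variant can be pictured as follows: every committed write to a master table fires a callback that applies the same change to the replica. The C++ sketch below is a toy illustration of that idea only; the class, table and callback names are invented for the example and bear no relation to Sybase Replication Server or to Fenix code.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A toy "table": primary key -> row payload.
using Table = std::map<int, std::string>;

// A trigger is a callback fired after every write to the master.
using Trigger = std::function<void(int key, const std::string& row)>;

class MasterTable {
public:
    void addTrigger(Trigger t) { triggers_.push_back(std::move(t)); }

    // Insert or update a row, then fire the replication triggers.
    void upsert(int key, const std::string& row) {
        rows_[key] = row;
        for (const auto& t : triggers_) t(key, row);
    }

    const Table& rows() const { return rows_; }

private:
    Table rows_;
    std::vector<Trigger> triggers_;
};

// Wire a replica so that it shadows every change made to the master.
inline void replicateInto(MasterTable& master, Table& replica) {
    master.addTrigger([&replica](int key, const std::string& row) {
        replica[key] = row;  // apply the same change on the stand-by side
    });
}
```

In a real parallel run the callback would of course cross a network and a second DBMS, which is exactly where the maintenance and performance costs mentioned above come from.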

Chapter 3: Analysis of present architecture

Presentation of the system architecture

I briefly introduced Fenix in the introductory chapter. Now I will analyse the architecture and the environment in which it works. The analysis is the basis for the requirements specification for the new system: the new system must meet the current functional requirements. The analysis also points out deficiencies.

The Fenix system

Fenix operates on HP-UX machines with the BEA Tuxedo Transaction Processing monitor (TM) and the Sybase ASE Database Management System (DBMS). Front-ends include Windows-based applications operated mainly via Citrix connections. It is also possible to install the application on the client. Some of the functionality is available via the Internet as well.

The back-end system overview

The system consists of three environments: system test, acceptance test and production. TE developers and testers use the system test environment for unit testing. The customer carries out acceptance testing in the corresponding environment. These environments enable the rigorous configuration management process to be fully adopted but also increase replication needs. Each of the environments contains multiple databases, some of them operational online databases and some Data Warehouses (DW) or archiving databases. Steering Reporting functions use the DW extensively while other functions use Online Transaction Processing (OLTP) databases. There are also multiple Tuxedo domains in each of the environments.

The TE testing environment includes two server computers and Tuxedo domains for web applications and for normal use. For unit testing purposes there are naturally DW and OLTP databases available. These are located at the customer's premises. There is also a local database in one of the TE offices, this being a replica of the master OLTP, which is maintained for efficiency reasons.

The acceptance test environment consists of three server machines, five Tuxedo domains and three database environments. These are required for the actual acceptance testing, but also for training, rollout purposes and web applications.

In the production environment, which I will concentrate on in this paper, the OLTP, DW and Budgeting databases each have dedicated servers that are either HP V-class (16 CPU) or K-class (4 CPU) machines. The storage is handled with an HP XP disk array with fibre channels to the servers. One server is allocated for distributed printing facilities. There are also geographically distributed databases for some partitions of the data. This is discussed below. (In addition to the environments discussed above there are also environments for build and version management and for development. Discussing these is out of the scope of this paper.)

The database architecture

The current production database architecture consists of the centralised main OLTP database, the DW and a database for Budgeting. There are also warm stand-by databases for the first and the last of these for availability purposes. Some of the data is also held in distributed databases. There are plans to separate this data from the main OLTP into a separate basic data database (BD). This is illustrated inside the dashed circle in Picture 3.1 (a slightly modified version of the TE Fenix Database Architecture presentation (TietoEnator 2000)):

[Picture 3.1 shows the production database architecture: workstation clients and GUI presentation data, application servers and Tuxedo queues, the OLTP and Budgeting databases with their warm stand-bys, the BD master updating BDS etc. and replicating to local databases, and the Data Warehouse serving operational reporting and ad-hoc queries.]

Picture 3.1

The separated database holds basic data (BD) such as addresses, names, routes and tariffs, which is static in nature and rarely changes. This database is then replicated to local databases distributed geographically. This is known as a star topology and is illustrated below in Picture 3.2 (adapted from a TietoEnator presentation (2000)); rectangles in the picture denote actions. The replication is carried out to gain increased performance and scalability. The pitfall is decreased integrity, and this is why the database is functionally partitioned, leaving business-critical data centralised in the OLTP.

The above picture also illustrates the collaboration within the system. Most of the connections are made through the application servers, but not all: some of the reporting tools might make direct connections, some of the two-tier modules have survived to date and, furthermore, BD is used via a two-tier connection. There are plans to simplify the diverse connection types in the future.

[Picture 3.2 shows the star-topology replication: the OLTP replicating to its warm stand-by, and the BD master replicating to the distributed local databases.]

Picture 3.2

The databases and related responsibilities:

Database server    Location     Description
OLTP               Centralised  The master database for all essential business data
Budgeting          Centralised  The master database for budgeting-related data such as months, price calculations etc.
Basic data         Centralised  The master database for basic data
Local database     Distributed  A replica of the basic data master
Data Warehouse     Centralised  Data Warehouse for Steering reporting
Warm stand-by      Centralised  A full replica of the OLTP; can continue as master immediately if the OLTP crashes

There are also initial plans for further partitioning of the OLTP. I am not discussing these here as the plans are still at an early stage.

Architectural layers

Defining the layered model of the system architecture makes it easier to understand the database connections in the system. There are five main layers: Channels, Interfaces, Integration middleware, Service architecture and Databases, the first being naturally the top layer. In a stricter form of layered architecture, upper layers depend only on layers one level lower (Szyperski 1998, p.141). However, this is not the case with Fenix, since upper layers can collaborate with any lower layer. This is called non-strict layering. Non-strict layering increases the number of connection types and decreases the simplicity of the architecture. It seems to be the only possibility due to requirements placed by some technologies used within Fenix. Picture 3.3 illustrates the architectural layers:

[Picture 3.3 shows the five layers from Channels down to Databases, with the connection types between them: FML, ICA, http, in-house protocols, XML, EDIFACT and SQL.]

Picture 3.3

As the picture suggests, the topmost layer makes either direct connections or indirect connections via lower layers. The Service and Interfaces layers do not connect directly; this is done via the Integration middleware instead. The most common pattern is the sequence Channels, Integration middleware, Service, Integration middleware, Databases.

The Channels layer consists of Windows GUIs, Citrix clients, web browsers, mill connections and external connections such as Cognos. External connections typically use in-house proprietary interface languages and protocols. Direct connections are made via the Sybase API (Open Client) or Open Database Connectivity (ODBC). Recently Java Database Connectivity (JDBC) has also been used increasingly, but it is used more on the lower layers rather than on Channels. SQL denotes these three options in the picture.

Tuxedo is used via the Field Manipulation Language (FML). Database tables and columns are mapped into FML fields, which are then used to pass data to and from the layers. The framework includes a number of methods for manipulating FML collections and models (a model stands for a single row). I will discuss the framework more thoroughly below.

The eXtensible Markup Language (XML) has also emerged lately and it has many uses within Fenix. Mill connections have traditionally been handled with EDI messages (EDIFACT), which XML is replacing at the moment. Standard XML document structures (papinet) have already existed for some time in the industry.

The Interfaces layer consists of, as the name implies, interfaces to different systems such as papinet, PartnerWeb and edialogue. The latter two are web-based systems. PartnerWeb makes some of the functionality available via the web. The interface to the client's message subsystem also lies on this layer, as does the Citrix Mainframe interface.
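The FML mechanism described above can be pictured as a dictionary keyed by field identifiers: a model carries one row's worth of fields, selected by a bind set that pairs column names with field ids. The C++ sketch below shows only this idea of packing column values into fields and back; the real FML API and the Fenix field repository look nothing like this, and every name here is invented for illustration.

```cpp
#include <map>
#include <string>

// Field identifiers would normally come from generated FML header files.
enum FieldId { F_CUSTOMER_NO = 1001, F_CUSTOMER_NAME = 1002 };

// A "model" maps to one database row: one value per field.
using FmlModel = std::map<FieldId, std::string>;

// A bind set decides which columns travel as which FML fields.
using BindSet = std::map<std::string, FieldId>;

// Pack the selected columns of a row into a model.
FmlModel packRow(const std::map<std::string, std::string>& row,
                 const BindSet& bindSet) {
    FmlModel model;
    for (const auto& [column, fieldId] : bindSet) {
        auto it = row.find(column);
        if (it != row.end()) model[fieldId] = it->second;
    }
    return model;
}

// Unpack a model back into named columns on the client side.
std::map<std::string, std::string>
unpackModel(const FmlModel& model, const BindSet& bindSet) {
    std::map<std::string, std::string> row;
    for (const auto& [column, fieldId] : bindSet) {
        auto it = model.find(fieldId);
        if (it != model.end()) row[column] = it->second;
    }
    return row;
}
```

Note how a column that is absent from the bind set never reaches the model; this is the sense in which the bind set "determines which FML fields should be used".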

The Integration middleware is a layer on top of the Tuxedo-based services and includes the Fenix framework and BEA tools, for instance. The layer is required to enable the services to interpret the calls made from the layers above. For example, XtoF translates XML to FML and vice versa. StreamServe is a printing tool that manages geographically distributed printing. This technology was taken into use recently and is incrementally replacing old in-house tools. StreamServe uses XML as input.

The service architecture is based on Tuxedo services programmed in C++. As mentioned above, Tuxedo is transaction manager (TM) software. Tuxedo is capable of managing distributed transactions and call queues. Only the latter is used with Fenix, and distributed transactions must be managed at the application level. I will discuss this issue shortly.

Picture 3.4, which is adapted from a TietoEnator presentation (2003), provides a view of the technology layer model. This is more technical than the architectural layer model. SENS is the client's closed network and lies in between the Channels and Interfaces layers together with the Extranet and the Internet. It can be seen as an extra layer. As the architectural layer model is mostly logical, it is not necessary to introduce another layer for it there. XtoF and BEA WLS form a gateway for XML. This is also the purpose of BEA WTC. The web is enabled with BEA WLS and JOLT gateway technologies.

Picture 3.4

Database connections

As the previous section implies, there are requirements for different types of database connections in the system. These include ODBC, JDBC, in-house technologies and the Sybase API. There are a few remaining parts of the system that run two-tier connections to the database. In addition to these there are also some reporting tools that make direct connections. These include Power++, Crystal Reports and Cognos. Power++ is a Sybase tool whose usage has been deprecated recently after its development and support came to an end. Cognos is a relatively new purchase that provides reporting capabilities accessible through the Internet. At the moment the technology is used to draw up reports from the production databases, including the OLTP and DW, and it does not have a testing environment.

As mentioned above, the usage of JDBC has grown strongly lately. This is because the use of the web has increased and web technologies tend to rely on JDBC. Three-tier Tuxedo connections are the dominating pattern used to access the database. As the UX environment does not support ODBC or JDBC, the Sybase API is used here. BD data is accessed with direct ODBC connections. The connection types:

Connection type  Used by
JDBC             Web applications
ODBC             BD data, old two-tier connections, reporting tools
In-house         Reporting tools
Sybase API       Tuxedo services

Distributed transactions

In a typical distributed environment, distributed transactions form the skeleton of the framework. However, Fenix makes a distinction here. As discussed above, Tuxedo manages distributed transactions between servers. Tuxedo follows the X/Open Distributed Transaction Processing (DTP) reference model that includes three types of interacting components: applications, resource managers (RM) and transaction managers (TM). Typically the course of a transaction is as follows: an application initialises a transaction by requesting it from the TM. The TM then opens it with an RM, that is, the DBMS in this case, but it could be any subsystem that implements transactional data. The application can then directly access the database with a chosen method such as a native programming interface.
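The course of a transaction just described can be sketched in a few lines: the TM hands out a transaction identifier, the application tags its database calls with it, and the RM uses the identifier to group the calls into transactions. This is a toy C++ illustration with invented names; the real X/Open TX and XA interfaces are C APIs with a very different shape.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy transaction manager: allocates identifiers and decides when to commit.
class TransactionManager {
public:
    int begin() { return nextId_++; }          // TX-style "begin" returns an id
    void commit(int /*txId*/) { /* tell the RM to make the work durable */ }
private:
    int nextId_ = 1;
};

// Toy resource manager (the DBMS): groups incoming calls by transaction id.
class ResourceManager {
public:
    void execute(int txId, const std::string& sql) {
        calls_[txId].push_back(sql);           // the id tells us which tx this is
    }
    const std::vector<std::string>& callsOf(int txId) const {
        return calls_.at(txId);
    }
private:
    std::map<int, std::vector<std::string>> calls_;
};
```

The point of the identifier is visible even in this miniature: two applications can interleave calls freely, and the RM still attributes each call to the right transaction.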

The TM's responsibilities include allocating a unique identifier for the transaction, passing it to the other parties and also deciding when to commit. After the RM has been provided with the identifier, it can determine which calls belong to which transactions.

Two-phase commit is required when there are multiple RMs, which is not the case with Fenix. Two-phase commit includes a voting phase and a completion phase (Coulouris et al. 2001). If a participating service requires abortion during the voting phase, the transaction is ended by the coordinator process, which is typically one of the participating services. In case all the processes have voted for completion and one or more processes fail during the completion phase, the processes ask the coordinator what the result of the voting was and work out the changes from the log file according to the answer. The two-phase commit protocol is designed to tolerate a succession of failures and secures the consistency of the data. The pitfall with the protocol relates to performance: as the protocol includes a number of messages passed between the participants and the coordinator, it consumes bandwidth and can cause considerable delays, especially if the coordinator fails.

After the application has finished, it calls back the TM and requests a commit. Applications and TMs communicate via the (de jure) standard TX interface. The TX interface provides calls for defining the transaction demarcation (the scope of the transaction). RMs and TMs interact via the XA interface that is supposed to be provided by the database vendor. Picture 3.5 illustrates this:

Picture 3.5: TX and XA interfaces

The TX and XA interfaces ensure that the data stays consistent within a distributed system. However, securing consistency in Fenix relies solely on the DBMS: only the TX interface is in use. The lack of the XA interface derives from the early days when Sybase did not support it. Later, after starting to support XA, Sybase did not recommend that it be used. Adopting XA now would require a substantial amount of work, since begin and commit transaction calls would have to be replaced with Tuxedo's tpbegin function. The developer has to use only local calls if he wants to secure the integrity of a transaction. Calling services on other servers leads to separate database connections and the integrity property of the transaction is lost. Currently the only way to avoid fragmented transactions is to instruct the developers to use only local calls.

The framework

The framework is built on an in-house ANSI C++ class library called Phobos. The framework caters for two-tier ODBC connections and three-tier Sybase API and ODBC connections. It is implemented according to the typical layer convention where upper layers provide higher abstractions of lower layers (Szyperski 1998, p.140). This

is a tried and tested pattern with frameworks and provides easy-to-use interfaces for application developers.

The typical course of actions in the server begins with an instance of the fxtx class. Its methods include running dynamic SQL or a stored procedure. This class collaborates with a lower-level framework class that in its turn collaborates with ct-library, that is, the Sybase API. ct-library provides different methods for manipulating the data and allows the developer to choose whether to use cursors or other means, for example.

The client calls services via b-mode objects. B-modes convert FML fields, which are returned from the service, into object attributes, and vice versa when requesting a service. In the server, FML fields map to database columns. The developer has to provide a bind set when retrieving data. Bind sets are used to determine which FML fields should be used. An FML model maps to one row of data while a collection comprises many rows. There are different field definitions for collections and models; therefore a collection is not just a collection of models. FML field definitions are made in specific header files and they are kept in a repository. The field types are Tuxedo-related and not entirely compatible with the types Sybase provides, so some conversion is required. This is also taken into account in the in-house development conventions and standards.

The layer pattern is advantageous for the migration process, as only the classes that access ct-library must be replaced instead of all the application code. In fact, the framework already provides an interface to Oracle.

Data Warehousing

The purpose of the DW is to provide data for management decision support (Connolly et al. 1999, p.914). Fenix is no exception here: the uses of the DW lie in steering reporting. Technical Customer Support (TCS) also uses the DW, as the management monitors the development of user satisfaction this way. The DW is accessed by standard

application code as well as by Cognos PowerPlay cubes (a cube is a term for a multidimensional report).

The DW inflow, that is, extracting data from the OLTP, happens every night. Following common conventions, the data is first cleansed and processed in a temporary store. This enables summarising, packaging and distributing the data. The processing is carried out in a database located on the same server as the OLTP. Since the DW is located on a different server, the cleansed data is uploaded there via a Tuxedo queue. Another option would be to use replication. Extraction, cleansing and transformation are done with custom-built procedures in the OLTP. Connolly et al. (1999, p.928) list three kinds of tools for this: code generators, database data replication tools and dynamic transformation engines. These could improve the processing, since the task is complicated and it can be difficult to program an optimal procedure by hand.

The DW design follows a typical star schema where fact tables are surrounded by dimension tables. Fact tables contain factual data such as invoice-related data. The dimension tables surrounding a fact table contain reference data such as customer and product data. The fact table contains a foreign key to each of the dimension tables. Therefore, the data can be seen as a multidimensional structure, or cubes. In order to fully exploit this multidimensional data, special technologies are required. Online Analytical Processing (OLAP) tools are designed for this purpose (Connolly et al. 1999, p.951). All the DW-related tools, including the database, are from Sybase within Fenix. Sybase does not provide proper OLAP functionality. The Fenix DW is not of the purest form, as the data in it is updated in addition to data being extracted into it. This is seen as a bad thing by the developers. Sybase is not a pure DW database either.
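The fact/dimension arrangement can be made concrete with a miniature example: fact rows hold the measures plus foreign keys into the dimensions, dimension tables hold the descriptive data, and a report rolls the measures up along a dimension attribute. The schema and names below are invented for illustration and are not the Fenix DW schema.

```cpp
#include <map>
#include <string>
#include <vector>

// One row of the fact table: a measure plus a foreign key into a dimension.
struct InvoiceFact {
    int customerKey;   // FK to the customer dimension
    double amount;     // the measure
};

// A toy dimension table: surrogate key -> descriptive attribute (country, say).
using CustomerDim = std::map<int, std::string>;

// Roll the facts up along one dimension attribute: total amount per country.
std::map<std::string, double>
amountByCountry(const std::vector<InvoiceFact>& facts, const CustomerDim& dim) {
    std::map<std::string, double> totals;
    for (const auto& f : facts)
        totals[dim.at(f.customerKey)] += f.amount;  // join via the foreign key
    return totals;
}
```

An OLAP tool essentially performs this kind of join-and-aggregate along any combination of dimensions, which is why the foreign keys from the fact table to every dimension table matter.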

Replication

Replication is used in multiple places in the architecture. The warm stand-by databases for the OLTP and Budgeting, and in the future the local BD databases, are all replicas of the corresponding master database. The replication method in Sybase is log based: the data is replicated at certain time intervals from the master's transaction log into the replica. This is discussed further in the next chapter.

Development practices

Locking and timestamping

Introducing row-level locking has been a big improvement in Sybase ASE and has significantly improved the performance of the system. Performance reasons, again, have driven the decision to adopt an optimistic approach to locking, based on timestamping. Complex and long transactions against a vast number of large tables under conservative locking mechanisms would cause significant delays (Kroenke 2002, p. 305). The timestamping mechanism is implemented at the application level. Update services compare timestamps before proceeding with updates: if the timestamp value passed to the procedure in an FML message is older than the current timestamp of the data set, an error message is raised and the transaction is aborted.

Integrity constraints and indexing

The system relies on loose integrity constraints in the table definitions. Primary key, foreign key and other integrity constraint declarations are avoided, and tables reference one another only implicitly. The uniqueness constraint of primary keys is implemented with unique indexes. In addition to primary key fields, all highly referenced fields are indexed. According to the development practices, the large tables may be queried only on indexed fields; however, there is no mechanism in the DBMS enforcing this.
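The optimistic timestamp check described under "Locking and timestamping" can be sketched in a few lines of Python over SQLite. The table, column names and the `update_qty` helper are hypothetical stand-ins for the Fenix update services, not their actual code; the essential move is making the stale-timestamp test and the update a single atomic statement.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders(id INTEGER PRIMARY KEY, qty INTEGER, ts INTEGER)")
db.execute("INSERT INTO orders VALUES (1, 10, 100)")

def update_qty(order_id, new_qty, ts_seen):
    """Optimistic update: proceed only if the row's timestamp is unchanged
    since the caller read it; otherwise abort, as the update services do."""
    cur = db.execute(
        "UPDATE orders SET qty = ?, ts = ts + 1 WHERE id = ? AND ts = ?",
        (new_qty, order_id, ts_seen))
    if cur.rowcount == 0:
        raise RuntimeError("stale timestamp: row was changed after it was read")

update_qty(1, 12, 100)       # succeeds; ts advances to 101
try:
    update_qty(1, 15, 100)   # stale: ts is already 101, so no row matches
except RuntimeError as e:
    print("aborted:", e)
```

No locks are held between reading the row and writing it back, which is exactly the throughput advantage the optimistic approach buys; the cost is that a losing writer must retry.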

The data is also heavily denormalized to speed up retrievals, which is quite a common practice with a database of this size (Connolly et al. 2002, p. 507). Denormalization (controlled redundancy) is not unproblematic, though: it makes the implementation more complex and decreases flexibility.

As many of the tables in the system contain hundreds of thousands or millions of rows, table scans cause massive delays, since a scan causes table-level locking. This is prevented by using only indexed columns in SQL WHERE clauses. Complicated join operations can also cause a table scan and are therefore divided into multiple joins using temporary tables.

Triggers

Triggers are used in places within Fenix. However, using them has not been recommended, since until recently there have not been proper management tools on the market. Triggers that are not properly managed and documented can easily lead to awkward situations.

Stored procedures and dynamic SQL

Dynamic SQL has not been recommended. The reasons derive from performance problems: dynamic SQL is parsed and optimized every time it is run, whereas stored procedures are parsed and optimized only once. Another downside of a complicated multi-part dynamic SQL program is increased bandwidth overhead, because of multiple connections to the database. Sybase enables the use of prepared dynamic SQL, whereby dynamic SQL works as a temporary stored procedure. This feature is not used within the system and could be introduced in parts where complicated logic is required. The programming language used in stored procedures is Transact-SQL (T-SQL).

Cursors

Using cursors is not recommended for performance reasons; using cursors can lead to page- or table-level locking. The issue is not just a company regulation but a well-known restriction in Sybase:

The fact is that cursors introduce a fantastic performance problem in your applications and nearly always need to be avoided. (Rankins et al. 1999, p. 160)

However, one could argue that this issue is outdated and that the functioning of cursors has improved (Talebzadeh 2004). This is discussed in more depth in the next chapter.

Views

A view is essentially a dynamic temporary relation: the result of a query on one or more persistent base relations (Connolly et al. 1999, p. 101). Views are used to simplify complex operations on the base relations, to provide a security mechanism and to enable users to access data in a customised way. Currently views are not widely used within the system and could be exploited in many places. For instance, they could be used to hide parts of the data from some users and to reduce complexity and provide convenience for the developers.

As operations on views are actually performed on the base relations, extreme care must be taken to ensure that queries on views do not lead to table scans on large tables. Secondly, if the base relation is large, the criteria used in the view must be indexed in it. Moreover, DBMSs typically do not allow update operations on a view that consists of multiple base relations.

The current situation

To conclude, I have summarised the issues with the current practices and systems, and proposed some solutions to improve the situation. The issues relate mostly to the restricted use of the DBMS; Fenix is database-oriented, database-driven software and could clearly gain from using some of the powerful features and functions a modern DBMS provides. The development practices are based on the situation ten years ago and should be modernized.

Integrity constraints

As discussed above, integrity constraint checking is implemented in the application layer. This was a typical design decision some ten years ago, since many

commercial systems did not support constraints fully (Connolly et al. 1999, p. 732). Declarative integrity is also considered to reduce performance. However, the application-level approach is dangerous considering duplication and inconsistencies. DBMS systems have evolved since those days and now provide better support and locking schemes. Constraints stored in the catalog can be enforced and controlled centrally; Codd calls this integrity independence (Connolly et al. 1999, p. 105). Declarative integrity should at least be tested, and depending on the results the development practices could be changed to encourage its use.

Denormalization

Denormalization can be justified in some parts where it is currently used, due to heavy usage of that particular data. However, it seems that denormalizing has become a routine development practice and no attempts are made to avoid it. Redundant data, again, easily leads to inconsistencies and to redundant effort, which in turn causes network and processing overhead. Therefore, the schema should be inspected, redundancies removed where they are not necessary, and their use discouraged in the future.

Enterprise constraints

Enterprise constraints ensure that relations are updated in accordance with business rules (Connolly et al. 1999, p. 276). Including them in the data definition, instead of the application, simplifies development and again allows centralised control and enforcement. Enterprise constraints could be introduced to Fenix. There is processing that is timed, for instance the DW runs; this could be handled by enterprise constraints. There are also many simple business rules for maximum and minimum values. Such simple, widely used business rules, which are not necessarily properly documented and which lead to long if clauses, could be replaced.

Materialized views

One possible solution for compensating for the downsides of view resolution is view materialization. Materialized views are temporary tables that are created when the

view is queried for the first time. They are widely used in DWs, replication servers, data visualization and mobile systems (Connolly et al. 2002, p. 186) and are worthy of investigation in Fenix as well. The system contains complex data structures and relationships and could benefit from this approach, which is capable of improving data integrity and query optimisation.
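The trade-off between view resolution and materialization can be sketched in SQLite, which has ordinary views but no materialized ones, so the materialization side is simulated with an explicitly refreshed summary table. All names here are invented for the illustration; the refresh-on-demand policy is only one of several possible maintenance strategies.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales(region TEXT, amount REAL);
INSERT INTO sales VALUES ('north', 10), ('north', 20), ('south', 5);

-- An ordinary view: resolved against the base table on every query.
CREATE VIEW v_sales_by_region AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

def refresh_materialized():
    """Materialize the view into a summary table; readers then hit the
    precomputed table instead of re-running the aggregate each time."""
    db.executescript("""
    DROP TABLE IF EXISTS mv_sales_by_region;
    CREATE TABLE mv_sales_by_region AS SELECT * FROM v_sales_by_region;
    """)

refresh_materialized()
via_view = db.execute(
    "SELECT region, total FROM v_sales_by_region ORDER BY region").fetchall()
via_mv = db.execute(
    "SELECT region, total FROM mv_sales_by_region ORDER BY region").fetchall()
print(via_view == via_mv)  # True: same answer, different evaluation cost
```

The materialized copy answers cheaply but goes stale as soon as the base table changes, which is why DBMSs that support materialized views pair them with a refresh policy.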

Chapter: Technology comparison

In this chapter I will compare Oracle and Sybase. I will start with business-oriented discussion and then move on to technological aspects. While doing so I will try to discuss the issues in context, and at the end I will offer some suggestions.

Market situation and financial matters

Oracle is the dominant database vendor on the market with a 37% share (Ryan 2003) and is the world's second-biggest software vendor. Sybase, on the other hand, is a relatively small player with a 2% share. This does not mean that the company is inactive: Sybase is currently developing new innovations, including JMS-based products and mobile databases. Sybase also claims to provide 15% lower life-cycle costs than Oracle, since ASE runs better on smaller computers; in Sybase's tests a four-processor HP computer can process transactions per minute. It seems quite obvious that Sybase is seeking to improve its market share among smaller companies. Furthermore, it has recently partnered with SAP (Sybase Corporation 2003) and provides the database for their smaller packages (Business One).

Oracle, on the other hand, has been dominant especially in enterprise computing. Vincent Ryan cites Noel Yuhanna of Forrester Research (Ryan 2003), who claims that Oracle's dynamic cache and job scheduling are superior and that Oracle can deal with concurrent users. Oracle has lately developed its grid computing solutions and provides large-scale products.

These recent reviews imply that there is room on the market for both companies. However, Oracle is a safer bet considering its market share and popularity. The current version of Oracle is 10g, while the current ASE version was launched last year.

Memory model

The Sybase memory model is based on multithreading: all the processes run in a single OS memory space. Oracle, on the other hand, requires OS processes for each user, the log writer and so on. This is the main reason why ASE's performance is better on smaller platforms; Oracle requires more memory and processing capacity.

Data structures

An Oracle database is divided into physical datafiles, which map to tablespaces, which are logical structures (Oracle Corporation 2002c). Datafiles are divided into data blocks in which the logical units, such as tables and indexes, reside. Tablespaces correspond to Sybase segments.

Oracle checks the buffer cache before looking up data in the datafiles, and all retrieved data is stored in the cache. Oracle enables dynamic cache sizing, which yields the best hit ratio (Thakkar 2002). ASE also stores recently retrieved data in a cache. ASE allows dividing the cache into distinct caches and binding objects to them. This is a useful feature, as it enables the user to prevent the cache from flushing important work tables.

Oracle records information about database and operating system files in a control file. Backup metadata used by Recovery Manager (RMAN) resides in this file, in addition to the database creation timestamp and checkpoint information. When the physical makeup of the database changes, the control file changes as well. Oracle supports multiple copies of control files.

Sybase ASE uses logical devices to store physical data (Sybase Corporation 2000, p. 8). A database consists of segments that reside on one or more logical devices. ASE allows the user to decide in which segment the data is stored by specifying the segment name in the table definition (ON SEGMENT). Oracle, on the other hand,

allows the user to define in which tablespace the data is stored. The decisions concerning storage specification in ASE should apply similarly in Oracle.

Whereas Oracle stores system information in the system tablespace, Sybase has a system database for this purpose. Moreover, Oracle keeps some vital data, such as the names and locations of the datafiles, in its control files, which can be mirrored and archived. Sybase has no similar functionality; the corresponding data resides in the master database.

Oracle records all operations in the redo log. Typically a database update causes Oracle to write two entries to the redo log: one right after the update and another after committing the transaction. The first consists of the changes to the undo transaction table, the undo data block and the data block to which the updated table maps. The log writer writes over old redo data when the redo logs are filled. There are always at least two redo logs, and Oracle supports archived redo files and log mirroring. These allow the system to recover completely from instance and media failures from a chosen point in time, at the cost of storage overhead.

Sybase records database activities in a transaction log that resides on a designated device. Transaction logging has been a serious bottleneck: the log fills up quickly because of transactions that remain open. Bad transaction handling has to be removed in order to avoid the problem, but it is not easy to find all the black spots in the code. The transaction log device size can also be increased, but this is hardly a lasting solution. When the log fills up, it reduces the performance of the application.

Performance and tuning

Oracle provides wider configuration options, while ASE is easier to manage. ASE contains only one configuration file and exposes only a restricted interface to developers. Oracle, on the other hand, contains many configuration files and is more

open to users. This also makes Oracle more difficult to use, and it is advisable to use designated tools when tuning the database. In addition to Oracle itself, there are a number of third-party vendors on the market who provide these tools. Tuning tools are available for Sybase as well, but not in similar numbers.

Cursors and locking

Cursors reside either on the server side or on the client side. Client-side cursors are another source of delay, as they point back to the database via the network and the processing is affected by the related latency. This is why the ct-cursor class, that is, Open Client's cursor class, should be dealt with cautiously. However, as bandwidths have increased, client-side cursors can be regarded as an option.

The locking problem, on the other hand, is not simply a problem with Sybase but something other DBMSs also have to deal with. Locking typically follows the 1992 ANSI SQL standard isolation level conventions. The isolation levels are based on three serialization violations and whether the system allows them to occur. The following table is from Kroenke 2002, p. 309:

Isolation Level     Dirty Read     Nonrepeatable Read   Phantom Read
Read Uncommitted    Possible       Possible             Possible
Read Committed      Not possible   Possible             Possible
Repeatable Read     Not possible   Not possible         Possible
Serializable        Not possible   Not possible         Not possible

As the table shows, the most restrictive level of locking is called Serializable, since it does not allow any kind of violation to occur. At the same time it permits the least throughput, as the locking granularity typically has to grow, since serialization is

enforced at transaction level. For comparison, at the Read Committed level serialization is enforced only at statement level (Connolly et al. 2002, p. 597).

Cursors can be divided into four groups: forward-only, static, keyset and dynamic (Kroenke 2002, p. 310). The first is the simplest form and allows moving only forward through the dataset. The latter three, on the other hand, are scrollable. Furthermore, the last two are able to show updates dynamically; this, however, requires at least dirty-read-level isolation to ensure consistency.

Sybase ASE offers three isolation levels: 0, 1 and 3 (Sybase Corporation). These are called no read lock, no hold lock and hold lock respectively in Sybase terminology. The first uses no locks and therefore does not block anything from other applications; the drawback is that cursors at this level are read-only, not updateable. Level 1 and 3 cursors are updateable. At level 1, ASE locks pages and releases them when the cursor moves out of the page; this is the default. Level 3 is the strictest form of locking, where all the base table pages that have been read during the transaction are locked and released only after the transaction ends.

The isolation levels in Oracle are quite similar to those in ASE. Oracle implements two of the ISO isolation levels, Read Committed and Serializable, and also a third isolation level, Read Only, which corresponds to Sybase's no read lock. The first two use row-level locking (Connolly et al. 2002, p. 597) and wait if uncommitted transactions are locking the required rows. The difference is that when the earlier transaction releases the locks, Read Committed proceeds with the update, while Serializable returns an error, since the operations are not serializable, that is, serially equivalent. Since Oracle records locks in the corresponding data blocks, locks never need to be escalated, as is the case in Sybase.
Lock escalation significantly reduces throughput, although managing fewer locks requires less processing.
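The statement-level versus transaction-level distinction above can be made concrete with a small SQLite sketch. Two sessions read and write the same row; because each autocommit read sees the latest committed state (behaving like Read Committed here), session A's repeated read disagrees with its first one, which is exactly the nonrepeatable-read anomaly in the table above. This is only an illustration of the anomaly, not of Sybase or Oracle behaviour, and the table name is invented for the example.

```python
import os
import sqlite3
import tempfile

# Two sessions against the same database file, both in autocommit for reads.
path = os.path.join(tempfile.mkdtemp(), "iso.db")
a = sqlite3.connect(path)
b = sqlite3.connect(path)

a.execute("CREATE TABLE account(id INTEGER PRIMARY KEY, balance INTEGER)")
a.execute("INSERT INTO account VALUES (1, 100)")
a.commit()

# Session A reads the balance...
first = a.execute("SELECT balance FROM account WHERE id = 1").fetchall()[0][0]

# ...session B changes and commits it between A's two reads...
b.execute("UPDATE account SET balance = 200 WHERE id = 1")
b.commit()

# ...so A's repeated read disagrees with the first: a nonrepeatable read.
second = a.execute("SELECT balance FROM account WHERE id = 1").fetchall()[0][0]
print(first, second)  # 100 200
```

A Serializable transaction would instead hold A's view of the data stable for the whole transaction, at the cost of either blocking B or aborting one of the two sessions.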
