A Comparison of Flexible Schemas for Software as a Service
Stefan Aulbach, Alfons Kemper, Michael Seibold (Technische Universität München, Germany)
{stefan.aulbach, alfons.kemper, michael.seibold}@in.tum.de
Dean Jacobs (SAP AG, Walldorf, Germany)
[email protected]

ABSTRACT

A multi-tenant database system for Software as a Service (SaaS) should offer schemas that are flexible in that they can be extended for different versions of the application and dynamically modified while the system is on-line. This paper presents an experimental comparison of five techniques for implementing flexible schemas for SaaS. In three of these techniques, the database owns the schema in that its structure is explicitly defined in DDL. Included here is the commonly-used mapping where each tenant is given their own private tables, which we take as the baseline, and a mapping that employs Sparse Columns in Microsoft SQL Server. These techniques perform well; however, they offer only limited support for schema evolution in the presence of existing data. Moreover, they do not scale beyond a certain level. In the other two techniques, the application owns the schema in that it is mapped into generic structures in the database. Included here are XML in DB2 and Pivot Tables in HBase. These techniques give the application complete control over schema evolution; however, they can produce a significant decrease in performance. We conclude that the ideal database for SaaS has not yet been developed and offer some suggestions as to how it should be designed.

Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous; H.2.1 [Database Management]: Logical Design

General Terms: Design, Performance

Keywords: Multi-Tenancy, Software as a Service, Flexible Schemas, Extensibility, Evolution

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD'09, June 29 - July 2, 2009, Providence, Rhode Island, USA. Copyright 2009 ACM /09/06 ...$5.00.

1. INTRODUCTION

In the Software as a Service (SaaS) model, a service provider owns and operates an application that is accessed by many businesses over the Internet. A key benefit of this model is that, by careful engineering, it is possible to leverage economy of scale to reduce total cost of ownership relative to on-premises solutions. Common practice in this regard is to consolidate multiple businesses into the same database to reduce operational expenditures, since there are fewer processes to manage, as well as capital expenditures, since resource utilization is increased.

A multi-tenant database system for SaaS should offer schemas that are flexible in two respects. First, it should be possible to extend the base schema to support multiple specialized versions of the application, e.g., for particular vertical industries or geographic regions. An extension may be private to an individual tenant or shared by multiple tenants. Second, it should be possible to dynamically evolve the base schema and its extensions while the database is on-line. Evolution of a tenant-owned extension should be totally self-service: the service provider should not be involved; otherwise operational costs will be too high.
This paper presents an experimental comparison of five techniques for implementing flexible schemas for SaaS. In three of these techniques, the database owns the schema in that its structure is explicitly defined in DDL:

Private Tables: Each tenant is given their own private instance of the base tables, which are extended as required. In contrast, in all of the other mappings, tenants share tables. We take Private Tables as the experimental baseline.

Extension Tables: The extensions are vertically partitioned into separate tables that are joined to the base tables along a row ID column.

Sparse Columns: Every extension field of every tenant is added to its associated base table as a Sparse Column. Our experiments here use Microsoft SQL Server 2008 [1]. To implement Sparse Columns efficiently, SQL Server uses a variant of the Interpreted Storage Format [4, 7], where a value is stored in the row together with an identifier for its column.

Our experimental results show that these techniques perform well; however, they offer only limited support for schema evolution. DDL commands over existing data, if they are supported at all, consume considerable resources and negatively impact performance. In the on-line setting, the application must be given control over when and how bulk data transformations occur.
An additional issue is that these techniques do not scale beyond a certain level.

In the other two techniques, the application owns the schema in that it is mapped into generic structures in the database:

XML: Each base table is augmented by a column that stores all extension fields for a tenant in a flat XML document. Since these documents necessarily vary by tenant, they are untyped. Our experiments here use pureXML in IBM DB2 [20].

Pivot Tables: Each value is stored along with an identifier for its column in a tall narrow table [2]. Our experiments here use HBase [11], which is an open source version of Google BigTable [6]. BigTable and HBase were originally designed to support the exploration of massive web data sets, but they are increasingly being used to support enterprise applications [14]. The Pivot Table mapping into HBase that we employ is consistent with best practices.

These two techniques give the application complete control over schema evolution; however, our experimental results show that they can produce a significant decrease in performance from the baseline. For XML, the decrease is greatest for reads, which require parsing the untyped documents and reassembling typed rows. The decrease is proportional to the number of extension fields. For Pivot Tables, the decrease is more than an order of magnitude in some cases. Note that these results should not be taken as a negative statement about the quality of these systems, since they have not been optimized for our use case. Moreover, HBase is an early-stage open source project, not a mature commercial product. Our results are intended to give a general indication of the trade-offs in implementing flexible schemas.

Several major SaaS vendors have developed mapping techniques in which the application owns the schema. This approach has been elevated to a design principle whereby the application derives essential capabilities by managing the metadata itself [19, 23]. To achieve acceptable performance, these applications re-implement significant portions of the database, including indexing and query optimization, from the outside. We believe that databases should be enhanced to directly support the required capabilities.

Our experiments are based on a multi-tenant database testbed that simulates a simple but realistic Customer Relationship Management (CRM) service. The workload contains single- and multi-row create, read, and update operations as well as basic reporting tasks. The schema can be extended for individual tenants and it can evolve over time. Our previous work with a more limited version of this testbed (no extensions) showed that the performance of Private Tables degrades if there are too many tables [3]. This effect is due to the large amount of memory needed to hold the metadata as well as an inability to keep index pages in the buffer pool. In this paper, we create only a moderate number of tables, take Private Tables as the baseline, and use it to compare the other mappings.

This paper is organized as follows. Section 2 describes our multi-tenant database testbed and the CRM application that it simulates. Section 3 describes the schema mapping techniques. Section 4 presents the results of our experiments. Section 5 concludes that the ideal database for SaaS has not yet been developed and offers some suggestions as to how it should be designed.
2. MULTI-TENANT DATABASE TESTBED

The experiments in this paper are based on a multi-tenant database testbed we have developed that can be adapted for different database configurations. Each configuration requires a plug-in to the testbed that transforms abstract actions into operations that are specific to and optimized for the target database.

The testbed simulates a simple but realistic CRM service. Figure 1 shows the entities and relationships in the base schema.

Figure 1: CRM Application Schema (entities: Campaign, Lead, Opportunity, Account, Contact, Asset, LineItem, Product, Case, Contract)

The base entities are extended with additional fields of various types for each tenant. Tenants have different sizes, and tenants with more data have more extension fields, ranging from 0 to 100. The characteristics of the dataset are modeled on salesforce.com's published statistics [13].

The testbed has nine request classes. The distribution of these requests is controlled using a mechanism similar to TPC's card decks.

Select 1: Select all attributes of a single entity as if it was being displayed in a detail page in the browser.
Select 50: Select all attributes of 50 entities as if they were being displayed in a list in the browser.
Select 1000: Select all attributes of the first 1000 entities as if they were being exported through a Web Services interface.
Reporting: Run one of five reporting queries that perform aggregation and/or parent-child roll-ups.
Insert 1: Insert one new entity instance as if it was being manually entered into the browser.
Insert 50: Insert 50 new entity instances as if data were being synchronized through a Web Services interface.
Insert 750: Insert 750 new entity instances as if data were being imported through a Web Services interface.
Update 1: Update a single entity as if it was being modified in an edit page in the browser.
Update 100: Update 100 entity instances as if data were being synchronized through a Web Services interface.

The testbed mimics a typical application server's behavior by creating a configurable number of connections to the database backend. To avoid blocking, the connections are distributed among a set of worker hosts, each handling only a few connections. Distributing these connections among multiple hosts allows for modeling multi-threaded application servers of various sizes.
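As a concrete illustration (a sketch, not taken from the paper), a Select 50 request against a tenant's logical Account table might be issued as the following SQL; the field names follow the running example of Section 3, and the T-SQL TOP syntax is one assumption among several possible dialects. Section 3 describes how such logical queries are rewritten against each physical schema.

    -- "Select 50": fetch the attributes of 50 entities for a list page in the browser.
    SELECT TOP 50 Aid, Name
    FROM Account
    ORDER BY Aid;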
3. SCHEMA MAPPING TECHNIQUES

Within a SaaS application, each tenant has a logical schema consisting of the base schema and a set of extensions. To implement multi-tenancy, the logical schemas from multiple tenants are mapped into one physical schema in the database. The mapping layer transforms queries against the logical schemas into queries against the physical schema, so multi-tenancy is transparent to application programmers.

The physical schemas for the five mapping techniques studied in this paper are illustrated in Figure 2.

Figure 2: Schema Mapping Techniques. (a) Private Tables; (b) Extension Tables; (c) Sparse Columns; (d) XML; (e) Pivot Tables.

The example data set used in this figure is most clearly shown in the Private Tables mapping (Figure 2(a)). There are three tenants, 17, 35, and 42, each of which has an Account table with Account ID (Aid) and Name fields. Tenant 17 has extended the Account table with two fields for the health care industry: Hospital and Beds. Tenant 42 has extended the Account table with one field for the automotive industry: Dealers. In the Extension Tables mapping (Figure 2(b)), the industry extensions are split off into separate tables that are joined to the base Account table using a new row number column (Row). Tenants share the tables using a tenant ID column (Tenant), as sketched below.
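As a sketch of the query transformation the mapping layer performs under Extension Tables (not taken from the paper; table and column names follow Figure 2(b)), tenant 17's logical query SELECT Aid, Name, Hospital, Beds FROM Account might be rewritten as:

    -- Physical query: join the shared base table to the healthcare extension table.
    -- A left join keeps base rows for which no extension row exists.
    SELECT b.Aid, b.Name, e.Hospital, e.Beds
    FROM Account_Ext b
    LEFT JOIN Healthcare_Account e
      ON e.Tenant = b.Tenant AND e.Row = b.Row
    WHERE b.Tenant = 17;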
This section describes the other three mappings in more detail.

3.1 Sparse Columns in Microsoft SQL Server

Sparse Columns were originally developed to manage data such as parts catalogs where each item has only a few out of thousands of possible attributes. Storing such data in conventional tables with NULL values can decrease performance even with advanced optimizations for NULL handling. To implement Sparse Columns, SQL Server 2008 uses a variant of the Interpreted Storage Format [4, 7], where a value is stored in the row together with an identifier for its column.

In our mapping for SaaS, the base tables are shared by all tenants and every extension field of every tenant is added to the corresponding base table as a Sparse Column, as illustrated in Figure 2(c). Sparse columns must be explicitly defined by a CREATE/ALTER TABLE statement in the DDL and, in this sense, are owned by the database. Nevertheless, the application must maintain its own description of the extensions, since the column names cannot be statically embedded in the code. For writes, the application must ensure that each tenant uses only those columns that they have declared, since the namespace is global to all tenants. For reads, the application must do an explicit projection on the columns of interest, rather than doing a SELECT *, to ensure that NULL values are treated correctly.

Sparse Columns requires only a small, fixed number of tables, which gives it a performance advantage over Private Tables; [3] shows that having many tables negatively impacts performance. On the other hand, there is some overhead for managing Sparse Columns. As an example, the SQL Server documentation recommends using a Sparse Column for an INT field only if at least 64% of the values are NULL [15]. Both of these factors are reflected in the performance results presented in Section 4.1.
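The following sketch (assumed declarations, not taken from the paper; field names follow Figure 2(c), and Warranty is a hypothetical new extension field) shows how the shared Account table might be declared and extended in SQL Server 2008:

    -- Shared base table; extension fields of all tenants are sparse columns.
    CREATE TABLE Account (
      Tenant   INT NOT NULL,
      Aid      INT NOT NULL,
      Name     VARCHAR(100),
      Hospital VARCHAR(100) SPARSE NULL,  -- declared by tenant 17
      Beds     INT          SPARSE NULL,  -- declared by tenant 17
      Dealers  INT          SPARSE NULL   -- declared by tenant 42
    );

    -- A tenant adds a new extension field on-line:
    ALTER TABLE Account ADD Warranty INT SPARSE NULL;

    -- Reads project explicitly on the tenant's declared columns, never SELECT *:
    SELECT Aid, Name, Dealers FROM Account WHERE Tenant = 42;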
3.2 XML in IBM DB2

IBM pureXML was designed to allow processing of semi-structured data alongside of structured relational data [20]. The mapping for SaaS that we use follows the recommendations in the pureXML documentation for supporting multi-tenancy [21]. The base tables are shared by all tenants and each base table is augmented by a column (Ext_XML) that stores all extension fields for a tenant in a flat XML document, as illustrated in Figure 2(d). Since these documents necessarily vary by tenant, they are untyped. This representation keeps the documents as small as possible, which is an important consideration for performance [16].

pureXML offers a hybrid query language that provides native access to both the structured and semi-structured representations. Our testbed manipulates data in the structured format, thus accessing extension data requires a correlated subquery to manage the XML. This subquery extracts the relevant extension fields using the XMLTABLE function, which converts an XML document into a tabular format using XPath. The query with the XMLTABLE function has to be generated per client and per query, so that it accesses exactly the extension fields relevant to the particular query. Figure 3(a) shows an example query against the physical schema that selects three base fields and one extension field; Figure 3(b) shows the associated query plan.

    SELECT b.Tenant, b.Aid, b.Name, e.Dealers
    FROM Accounts b,
         XMLTABLE('$i/ext' PASSING b.Ext_XML AS "i"
                  COLUMNS Dealers INTEGER PATH 'dealers') AS e
    WHERE Tenant = 42 AND Aid = 1;

    (a) Physical SELECT Query

    (b) Query Execution Plan: an NLJOIN of an IXSCAN (on tid, aid) with FETCH
    on the Accounts table and an XSCAN over the XML column.

    Figure 3: Correlated Subquery for XML in DB2

In our testbed, rows are always accessed through base fields, hence there is no need to use the special XML indexes offered by pureXML [20]. To insert a new tuple with extension data, the application has to generate the appropriate XML document; our performance results generally include the time to perform this operation. Updates to extension fields are implemented using XQuery update features to modify documents in place.
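For instance, inserting tenant 42's account from Figure 2(d) might look like the following sketch (assumed column names, following Figure 3; DB2's XMLPARSE makes the document construction explicit):

    -- Insert a row whose extension fields are carried in a flat, untyped XML document.
    INSERT INTO Accounts (Tenant, Aid, Name, Ext_XML)
    VALUES (42, 1, 'Big',
            XMLPARSE(DOCUMENT '<ext><dealers>65</dealers></ext>'));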
3.3 Pivot Tables in HBase

HBase [11], which is an open source version of Google BigTable [6], was originally designed to support the exploration of massive web data sets. These systems are increasingly being used to support enterprise applications in a SaaS setting [14].

In an HBase table, columns are grouped into column families. Column families must be explicitly defined in advance in the HBase DDL; for this reason they are owned by the database. There should not be more than tens of column families in a table and they should rarely be changed while the system is in operation. Columns within a column family may be created on-the-fly, hence they are owned by the application. Different rows in a table may use the same column family in different ways. All values in a column are stored as Strings. There may be an unbounded number of columns within a column family. Data in a column family is stored together on disk and in memory. Thus, a column family is essentially a Pivot Table: each value is stored along with an identifier for its column in a tall narrow table [2].

HBase was designed to scale out across a large farm of servers. Rows are range-partitioned across the servers by key. Applications define the key structure and therefore implicitly control the distribution of data. Rows with the same key prefix will be adjacent but, in general, may end up on different servers. The rows on each server are physically broken up into their column families.

The mapping for SaaS that we use is illustrated in Figure 2(e). In keeping with best practices for HBase, this mapping ensures that data that is likely to be accessed within one query is clustered together. A single HBase table is used to store all tables for all tenants. The physical row key in HBase consists of the concatenation of the tenant ID, the name of the logical table, and the key of the row in the logical table. Each logical table is packed into its own column family, thus each row has values in only one column family. Within a column family, each column in the logical table is mapped into its own physical HBase column. Thus, since columns are dynamic, tenants may individually extend the base tables.

The reporting queries in our testbed require join, sort and group operations, which are not currently provided by HBase. We therefore implemented these operators outside the database in an adaptation layer that runs in the client. The adaptation layer utilizes operations in the HBase client API such as single-row update, single-row get, and multi-row scan with row filter. As an example, consider the reporting query shown in Figure 4, which produces a list of all Products with Cases by joining through Assets.

    SELECT p.name, COUNT(c.Case_id) AS cases
    FROM Products p, Assets a, Cases c
    WHERE c.asset = a.asset_id
      AND a.product = p.product_id
    GROUP BY p.name
    ORDER BY cases DESC

    Figure 4: Logical Reporting Query

To implement this query, our adaptation layer scans through all Cases for the given tenant and, for each one, retrieves the associated Asset and Product. It then groups and sorts the data for all Cases to produce the final result.

In our experiments, HBase was configured to run on a single node and the Hadoop distributed map-reduce framework was not employed. In our experience, hundreds of tenants for an application like CRM can be managed by a database on a single commodity processor. In this setting, spreading the data for a tenant across multiple nodes and doing distributed query processing would not be advantageous; the overhead for managing the distribution would nullify any benefits of parallelization. Of course, in addition to scaling up to handle many small tenants, the ideal SaaS database should also scale out to handle large tenants. But even in this case, map-reduce is problematic for queries such as the one in Figure 4, since it requires that data be clustered around Products. Other queries, such as pipeline reports on Opportunities, might require that the data be clustered in other ways.

We conclude this section with several comments about the usage of HBase in our experiments. First, HBase offers only row-at-a-time transactions and we did not add a layer to extend the scope to the levels provided by the commercial databases. Second, compression of column families was turned off. Third, neither major nor minor compactions occurred during any of the experiments. Fourth, replication of data in the Hadoop file system was turned off. Fifth, column families were not pinned in memory. Sixth, the system was configured so that old attribute values were not maintained.
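To make the Pivot Table idea concrete in relational terms, the following sketch (not from the paper; HBase itself is accessed through its client API, not SQL) renders the generic structure as a tall narrow table and shows why reassembling a logical row costs one access per column. The row key '42Act1' is a hypothetical rendering of Figure 2(e)'s tenant-table-row concatenation.

    -- Generic pivot structure: every value is a (row key, column name, string value) triple.
    CREATE TABLE Pivot (
      RowKey VARCHAR(64),   -- tenant ID + logical table name + logical row key
      Col    VARCHAR(64),   -- logical column name
      Val    VARCHAR(256)   -- all values stored as strings
    );

    -- Reassembling tenant 42's Account row needs one self-join per additional column:
    SELECT n.Val AS Name, d.Val AS Dealers
    FROM Pivot n
    JOIN Pivot d ON d.RowKey = n.RowKey AND d.Col = 'dealers'
    WHERE n.RowKey = '42Act1' AND n.Col = 'name';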
4. EXPERIMENTAL RESULTS

This section presents the results of our experiments on schema extensibility and evolution. To study schema evolution, we issued a series of schema alteration statements during a run of the testbed and measured the drop in throughput. The experiments were run on Microsoft SQL Server 2008, IBM DB2 V9.5 on Windows 2008, and HBase 0.19 on Linux (CentOS 5.2). The database host was a VM on VMware ESXi with GHz vCPUs and 8 GB of RAM.

4.1 Microsoft SQL Server

Figure 5(a) shows the results of running our testbed on Microsoft SQL Server using three different mappings: Private Tables, Extension Tables, and Sparse Columns. The horizontal axis shows the different request classes, as described in Section 2, and the vertical axis shows the response time in milliseconds on a log scale.

Figure 5: SQL Server Performance. (a) Overall: Private Tables vs. Extension Tables vs. Sparse Columns; (b) By Tenant Size: Private Tables vs. Sparse Columns for small, medium, and large tenants. (Response time [msec] per request class.)

In comparison to Private Tables, Extension Tables clearly exhibits the effects of vertical partitioning: wide reads (Select 1, Select 50, Select 1000) are slower because an additional join is required, while narrow reads (Reporting) are faster because some unnecessary loading of data is avoided. Updates (Update 1, Update 100) perform similarly to wide reads because our tests modify both base and extension fields. Extension Tables is faster for inserts because tables are shared among tenants, so there is a greater likelihood of finding a page in the buffer pool with free space.

Sparse Columns performs as well as or better than Private Tables in most cases. The additional overhead for managing the Interpreted Storage Format appears to be offset by the fact that there are fewer tables. Sparse Columns performs worse for large inserts (Insert 750), presumably because the implementation of the Interpreted Storage Format is tuned to favor reads over writes.

Figure 5(b) shows a breakdown of the Private Tables and Sparse Columns results by tenant size. Recall from Section 2 that larger tenants have more extension fields, ranging from 0 to 100. The results show that the performance of both mappings decreases to some degree as the number of extension fields goes up.

SQL Server permits up to 30,000 Sparse Columns per table. Our standard configuration of the testbed has 195 tenants, which requires about 12,000 columns per table. We also tried a configuration with 390 tenants and about 24,000 columns per table and there was little performance degradation. The number of extension fields per tenant in our testbed is drawn from actual usage, so SQL Server is unlikely to be able to scale much beyond 400 tenants. As a point of comparison, salesforce.com maintains about 17,000 tenants in one (very large) database [13].

Figures 6(a) and 6(b) show the impact of schema evolution on throughput in SQL Server. In these graphs, the horizontal axis is time in minutes and the vertical axis is transactions per minute. The overall trend of the lines is downward because data is inserted but not deleted during a run. Part way through each run, ALTER TABLE statements on 5 base tables were submitted. The first two lines in each graph show schema-only DDL statements: add a new column and increase the size of a VARCHAR column.
The third line in each graph shows a DDL statement that affects existing data: decrease the size of a VARCHAR column. To implement this statement, SQL Server scans through the table and ensures that all values fit in the reduced size. A more realistic alteration would perform more work than this, so the results indicate a lower bound on the impact of evolution. The gray bar on each graph indicates the period during which this third operation took place.
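In plain DDL, the three alteration classes look roughly like this (a sketch with hypothetical table and column names, using SQL Server syntax):

    -- Schema-only changes (no existing data touched):
    ALTER TABLE Account17 ADD Region VARCHAR(20) NULL;      -- add a new column
    ALTER TABLE Account17 ALTER COLUMN Name VARCHAR(200);   -- increase VARCHAR size

    -- Change that affects existing data (SQL Server scans the table to validate):
    ALTER TABLE Account17 ALTER COLUMN Name VARCHAR(50);    -- decrease VARCHAR size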
Figure 6: SQL Server Throughput under schema evolution. (a) Private Tables; (b) Sparse Columns. (Transactions per minute over testbed runtime in minutes, for the three alterations: add new column, increase VARCHAR size, decrease VARCHAR size.)

In the Private Tables case (Figure 6(a)), 975 ALTER TABLE statements were submitted, 5 for each of the 195 tenants. Individual schema-only alterations completed very rapidly, but nevertheless had an impact on throughput because there were so many of them. Adding a new column took about 1 minute to complete, while increasing the size of a VARCHAR column took about 3 minutes. Decreasing the size of a VARCHAR column took about 9 minutes and produced a significant decrease in throughput. The overall loss of throughput in each case is indicated by the amount of time it took to complete the run.

In the Sparse Columns case (Figure 6(b)), the tables are shared and 5 ALTER TABLE statements were submitted. The schema-only changes completed almost immediately and had no impact on throughput. Decreasing the size of a VARCHAR column took about 2 minutes, during which throughput dropped almost to zero.

The overall loss of throughput was greater for Private Tables, as indicated by the amount of time it took to complete the runs. However, the behavior of Private Tables is probably preferable in the SaaS setting because the throughput drop is never as deep, thus the servers don't need to be overprovisioned as much. In any case, neither of these mappings is ideal in that the application should have more control over when such resource-intensive operations occur.

4.2 IBM DB2

Figure 7(a) shows the results of running our testbed on IBM DB2 using three different mappings: Private Tables, Extension Tables, and XML using pureXML. The axes are the same as in Figure 5.

Figure 7: DB2 Performance. (a) Overall: Private Tables vs. Extension Tables vs. XML; (b) By Tenant Size: Private Tables vs. XML for small, medium, and large tenants. (Response time [msec] per request class.)

In comparison to Private Tables, Extension Tables exhibits the same performance variations as in SQL Server. However, XML produces a decrease in performance in most cases. The decrease is particularly severe for reads, which require executing a correlated subquery containing an XQuery statement embedded in a call to the XMLTABLE function, as described in Section 3.2.

Figure 7(b) shows a breakdown of the Private Tables and XML results by tenant size. Recall from Section 2 that larger tenants have more extension fields, ranging from 0 to 100. The results show that for reads, the performance decrease of XML is proportional to the number of extension fields. Note that in the Insert 750 case, the results do not include the time to construct the XML document (for no particularly good reason) and there is no variation based on tenant size.

XML gives the application complete control over schema evolution. In this setting, the application is responsible for performing any bulk transformations associated with schema alterations that impact existing data. To study the efficiency of such transformations, we ran our schema evolution throughput experiment on DB2 using pureXML. To simulate the decrease-VARCHAR case, we submitted a query for each of the five base tables that SELECTs one field from all documents. These queries were run on the database server so no data transfer costs were incurred. The results were almost identical to the SQL Server Sparse Columns case shown in Figure 6(b). The five queries took about 2 minutes to complete, during which time throughput dropped to a very low level.
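Such a forced-extraction query might look like the following sketch (assumed field names; the XMLTABLE pattern follows Figure 3), which touches every tenant's documents in the shared table:

    -- Extract one extension field from every XML document,
    -- forcing each document to be parsed.
    SELECT e.Beds
    FROM Accounts b,
         XMLTABLE('$i/ext' PASSING b.Ext_XML AS "i"
                  COLUMNS Beds INTEGER PATH 'beds') AS e;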
Of course, the advantage of the XML mapping is that the application need not perform such transformations all at once.

4.3 HBase

Figure 8 shows the results of running our testbed on HBase along with two SQL Server configurations: one using the Sparse Columns mapping presented in Section 4.1 and one using the adaptation layer described in Section 3.3. Recall that the adaptation layer performs join, sort and group operations outside the database. In the latter case, to further approximate the HBase mapping, we made the base columns as well as the extension columns sparse. It turns out that this change was not significant: according to independent testbed runs we performed, making the base fields sparse has little impact on the performance of SQL Server.

Figure 8: HBase Performance. (Response time [msec] per request class for MS SQL, MS SQL + Adaptation, and HBase.)

In comparison to the Sparse Columns mapping in SQL Server, HBase exhibits a decrease in performance that ranges from one to two orders of magnitude depending on the operation. One reason for this decrease is the reduced expressive power of the HBase APIs, which results in the need for the adaptation layer. This effect is particularly severe for reports and updates, where SQL Server with adaptation also shows a significant decrease in performance. These results are consistent with [9], which shows that high-volume data-processing tasks such as reporting are more efficiently processed by shipping queries to the server rather than by shipping data to the client. The performance decrease for updates is primarily due to the fact that the adaptation layer submits changes one at a time rather than in bulk. HBase has a bulk update operation; however, it appears that, in the version we used, changes are not actually submitted in bulk unless automatic flushing to disk is turned off. In any case, this problem with bulk updates should be easy to fix.

A second reason for the performance decrease is that rows must be assembled from and disassembled into Pivot Tables. Since the data for a row may be spread out across the disk, several reads or writes may be required. This effect is particularly severe for reads. A third reason is that HBase accesses disks (in the Hadoop File System) over the network. In contrast, shared-nothing architectures put disks on the local SCSI bus while shared-disk architectures use fast SANs.

5. CONCLUSIONS

The conclusion we draw from these experiments is that the ideal database system for SaaS has not yet been developed. The mappings in which the application owns the schema perform poorly unless significant portions of the database are re-implemented from the outside. And the mappings in which the database owns the schema provide only limited support for schema evolution in the presence of existing data. Moreover, they cannot be scaled beyond a certain level.

We believe that the ideal SaaS database system should be based on the Private Tables mapping. The interleaving of tenants, which occurs in all of the other mappings, breaks down the natural partitioning of the data. It forces import and export of a tenant's data, which must occur for backup/restore and migration, to be carried out by querying the operational system. In contrast, with Private Tables, each tenant's data is clustered together on disk so it can be independently manipulated. The interleaving of tenant data also complicates access control mechanisms in that it forces them to occur at the row level rather than the table level.

In the ideal SaaS database system, the DDL should explicitly support schema extension. The base schema and its extensions should be registered as templates. There should be multiple tenants and each tenant should be able to select various extensions to the base schema. The system should not just stamp out a new copy of the schema for each tenant; rather, it should maintain the sharing structure. This structure should be used to apply schema alterations to all relevant tenants. In addition, this structure will reduce the amount of meta-data managed by the system.
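Purely as an illustration of what such template-based DDL might look like (invented syntax; no existing system implements this), the Figure 2 example could be declared along these lines:

    -- Hypothetical DDL: register the base schema and extensions as shared templates.
    CREATE TABLE TEMPLATE Account (Aid INT PRIMARY KEY, Name VARCHAR(100));
    CREATE EXTENSION Healthcare FOR Account (Hospital VARCHAR(100), Beds INT);
    CREATE EXTENSION Automotive FOR Account (Dealers INT);

    -- Each tenant selects the extensions it needs; the sharing structure is preserved,
    -- so altering the Healthcare extension later reaches all tenants that use it.
    CREATE TENANT 17 USING Account WITH Healthcare;
    CREATE TENANT 42 USING Account WITH Automotive;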
In the ideal SaaS database system, the DDL should support on-line schema evolution. The hard part here is allowing evolution over existing data. As an example, Oracle implements this capability as follows: first a snapshot of the existing data is transformed into an interim table, then all transactions are blocked while any intervening updates are processed, and finally the original table is replaced by the interim table [17]. While this is a great step forward, the application must be given more control over when such resource-intensive operations occur. It should be possible to perform data transformations eagerly at the point the schema is changed, lazily at the point data is accessed, or every time data is accessed [18].

The ideal SaaS database system should distribute the data for many tenants across a farm of servers. The distribution should be tenant-aware rather than lumping all tenants into one large data set. It should be possible to spread the data for a large tenant across multiple servers, but that will not be the common case and in practice should not require more than a handful of servers per tenant. To support data distribution for large tenants, the query language should be less powerful than in conventional databases; however, it should not be so weak that the common case suffers. In particular, the database should support multiple communication patterns for joins, rather than requiring the use of map-reduce. For example, joins in OLAP queries on a star schema tend to distribute well because the dimension tables are small.

The ideal SaaS database system should have a shared-nothing architecture where data is stored on fast local disks. Data should be explicitly replicated by the database, rather than a distributed file system, and used to provide high availability. To facilitate failure handling, the transactional guarantees should be weaker than in conventional databases [8]. In addition to supporting high availability, replicated data should be used to improve scalability (handle more requests for a given set of data), improve performance (maintain the data in several different formats), and support on-line upgrades (roll upgrades across the replicas).
This paper has focused on conventional row-oriented databases for OLTP. Enterprise applications also require column-oriented databases for OLAP [5, 12, 22]. The basic prescription for SaaS databases offered here, Private Tables with support for on-line schema extension and evolution, applies to column stores as well as row stores. The vertical storage structures of HBase and BigTable, which we use to implement Pivot Tables, are similar to column stores in that they are designed for narrow operations over many rows. Such vertical structures may be made more competitive for wide operations by keeping the data in memory, since the cost of reassembling rows is dominated by the time to perform many reads from disk. Advancements in data storage technologies are gradually making main-memory databases more attractive [10]. Our basic prescription for SaaS applies equally well to main-memory databases.

6. REFERENCES

[1] S. Acharya, P. Carlin, C. A. Galindo-Legaria, K. Kozielczyk, P. Terlecki, and P. Zabback. Relational Support for Flexible Schema Scenarios. PVLDB, 1(2), 2008.
[2] R. Agrawal, A. Somani, and Y. Xu. Storage and Querying of E-Commerce Data. In VLDB, pages 149-158, 2001.
[3] S. Aulbach, T. Grust, D. Jacobs, A. Kemper, and J. Rittinger. Multi-Tenant Databases for Software as a Service: Schema-Mapping Techniques. In SIGMOD, 2008.
[4] J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. F. Naughton. Extending RDBMSs To Support Sparse Datasets Using an Interpreted Attribute Storage Format. In ICDE, page 58, 2006.
[5] P. Boncz. Monet: A Next-Generation DBMS Kernel For Query-Intensive Applications. PhD thesis, University of Amsterdam, The Netherlands, 2002.
[6] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2), 2008.
[7] E. Chu, J. Beckmann, and J. Naughton. The Case for a Wide-Table Approach to Manage Sparse Relational Data Sets. In SIGMOD, 2007.
[8] S. Finkelstein, D. Jacobs, and R. Brendle. Principles for Inconsistency. In CIDR, 2009.
[9] M. J. Franklin, B. T. Jónsson, and D. Kossmann. Performance Tradeoffs for Client-Server Query Processing. In SIGMOD, pages 149-160, 1996.
[10] J. Gray. Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King. com/~gray/talks/flash_is_good.ppt.
[11] HBase.
[12] T. Legler, W. Lehner, and A. Ross. Data Mining with the SAP NetWeaver BI Accelerator. In VLDB, 2006.
[13] T. McKinnon. Plug Your Code in Here: An Internet Application Platform. hpts_conference_oct_2007.ppt, 2007.
[14] MegaStore. /07/10/GoogleMegastore.aspx, 2008.
[15] Microsoft SQL Server 2008 Books Online: Using Sparse Columns. en-us/library/cc aspx.
[16] M. Nicola. 15 Best Practices for pureXML Performance in DB2. library/techarticle/dm-0610nicola/.
[17] Oracle Database 10g Release 2: Online Data Reorganization & Redefinition. availability/pdf/ha_10gr2_online_reorg_twp.pdf.
[18] J. F. Roddick. Reduce, Reuse, Recycle: Practical Approaches to Schema Integration, Evolution and Versioning. Lecture Notes in Computer Science, pages 209-216.
[19] Salesforce.com: The Force.com Multitenant Architecture. ForcedotcomBookLibrary/Force.com_Multitenancy_WP_0508.pdf.
[20] C. M. Saracco, D. Chamberlin, and R. Ahuja. DB2 9: pureXML Overview and Fast Start. IBM, 2006.
[21] M. Taylor and C. J. Guo.
Data Integration and Composite Business Services, Part 3: Build a Multi-Tenant Data Tier with Access Control and Security. library/techarticle/dm-0712taylor/, 2007.
[22] Vertica. http://www.vertica.com/.
[23] Workday: Forget ERP, Start Over.
The Methodology Behind the Dell SQL Server Advisor Tool
The Methodology Behind the Dell SQL Server Advisor Tool Database Solutions Engineering By Phani MV Dell Product Group October 2009 Executive Summary The Dell SQL Server Advisor is intended to perform capacity
ICONICS Choosing the Correct Edition of MS SQL Server
Description: This application note aims to assist you in choosing the right edition of Microsoft SQL server for your ICONICS applications. OS Requirement: XP Win 2000, XP Pro, Server 2003, Vista, Server
IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance
Data Sheet IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Overview Multidimensional analysis is a powerful means of extracting maximum value from your corporate
Trafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle
1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We
<Insert Picture Here> Oracle Database Directions Fred Louis Principal Sales Consultant Ohio Valley Region
Oracle Database Directions Fred Louis Principal Sales Consultant Ohio Valley Region 1977 Oracle Database 30 Years of Sustained Innovation Database Vault Transparent Data Encryption
Oracle Database: SQL and PL/SQL Fundamentals NEW
Oracle University Contact Us: + 38516306373 Oracle Database: SQL and PL/SQL Fundamentals NEW Duration: 5 Days What you will learn This Oracle Database: SQL and PL/SQL Fundamentals training delivers the
Move Data from Oracle to Hadoop and Gain New Business Insights
Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides
Big Data and Hadoop with components like Flume, Pig, Hive and Jaql
Abstract- Today data is increasing in volume, variety and velocity. To manage this data, we have to use databases with massively parallel software running on tens, hundreds, or more than thousands of servers.
Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop
IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop Frank C. Fillmore, Jr. The Fillmore Group, Inc. Session Code: E13 Wed, May 06, 2015 (02:15 PM - 03:15 PM) Platform: Cross-platform Objectives
An Approach to Implement Map Reduce with NoSQL Databases
www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh
In-Memory Data Management for Enterprise Applications
In-Memory Data Management for Enterprise Applications Jens Krueger Senior Researcher and Chair Representative Research Group of Prof. Hasso Plattner Hasso Plattner Institute for Software Engineering University
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1
Optimizing the Performance of the Oracle BI Applications using Oracle Datawarehousing Features and Oracle DAC 10.1.3.4.1 Mark Rittman, Director, Rittman Mead Consulting for Collaborate 09, Florida, USA,
News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
Data Management in the Cloud -
Data Management in the Cloud - current issues and research directions Patrick Valduriez Esther Pacitti DNAC Congress, Paris, nov. 2010 http://www.med-hoc-net-2010.org SOPHIA ANTIPOLIS - MÉDITERRANÉE Is
An Oracle White Paper July 2011. Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide
Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide An Oracle White Paper July 2011 1 Disclaimer The following is intended to outline our general product direction.
