Stream processing in data-driven computational science

Ying Liu, Nithya N. Vijayakumar and Beth Plale
Computer Science Department, Indiana University, Bloomington, IN, USA
{yingliu, nvijayak,

Abstract

The use of real-time data streams in data-driven computational science is driving the need for stream processing tools that work within the architectural framework of the larger application. Data stream processing systems are beginning to emerge in the commercial space, but these systems fail to address the needs of large-scale scientific applications. In this paper we illustrate the unique needs of large-scale data-driven computational science through an example taken from weather prediction and forecasting. We apply a realistic workload from this application against our Calder stream processing system to determine effective throughput, event processing latency, data access scalability, and query deployment latency.

I. INTRODUCTION

The same technology advancements that have driven down the price of handhelds, cameras, phones and other devices have enabled affordable commodity sensors, wireless networks and other devices for scientific use. As a result, scientific computing that was previously static, such as weather forecast prediction models, can now be envisioned as dynamic, with models triggered in response to changes in the environment. The cyberinfrastructure needed to bring about these dynamic capabilities is still evolving.

Stream processing in scientific applications differs from stream processing in other domains in important ways. We define a stream S as a sequence of events, S = {e_i}, where i is a monotonically increasing number and 0 < i < ∞. Events are often timestamped. Depending on the source, event flow rates in a stream can range from an event per microsecond to an event per day, and events can range in size from a few bytes to megabytes or gigabytes.
The contents of an event could be, for instance, a new reading of a stock value, or could mark a state change in an application.

Stream processing falls into three general categories: stream management systems, rule engines, and stream processing engines [].

In stream management systems, stream processing is similar to a traditional database management system, which could be relational [2] [3] or object-relational []. The interface is through a declarative SQL-style query language that has been augmented with operations over time-based tables []. A client invokes pre-built operations or can code their own in a procedural language; the code is then stored as a stored procedure [].

Rule engines date from the early 1970s. Clients write rules in a declarative programming language in which patterns of events can be described [6] [7]. The rule language supports relational and temporal operators, as well as subtyping, parallelization, etc. [8]. When events arrive, selected rules in the rule base are fired, causing an action to result. Rule engines include Message Oriented Middleware (MOM) technologies. The latter hold a collection of user profiles, for instance in the form of XPath expressions, as rules [9] [10]. Arriving events are matched against the profiles, with the corresponding action being to forward the event on to the user indicated in the profile.

Stream processing engines (SPEs) are designed specifically for processing data flows on the fly. In many systems described in the literature and available commercially, query engines execute queries continuously over arriving streams of data [] [2] [3]. Clients describe their filtering and processing needs through a declarative query language or through a graphical user interface (GUI) [] [] that is converted to a query. Events are processed on the fly, without necessarily storing them. Queries can be deployed dynamically [13], and can have their operators reordered on the fly [].

This work is supported in part by NSF grants EIA and CDA- 0600, and DOE DE-FG02-0ER2600.
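The continuous-query model described above, in which a query is evaluated against each event as it arrives rather than against a stored table, can be illustrated with a minimal Python sketch. This is not Calder code: the `Event` fields, the `continuous_select` helper, and the sample values are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    """One timestamped event e_i of a stream (field names are hypothetical)."""
    seq: int          # monotonically increasing index i
    timestamp: float  # seconds since an arbitrary epoch
    source: str       # label of the data source that produced the event
    value: float      # illustrative payload


def continuous_select(stream: Iterable[Event],
                      predicate: Callable[[Event], bool]) -> Iterator[Event]:
    """Evaluate a selection on each event as it arrives.

    Nothing is buffered: an event is either forwarded or dropped,
    which mirrors on-the-fly processing without storing the stream.
    """
    for event in stream:
        if predicate(event):
            yield event


def demo() -> List[int]:
    """Run a sample query over three made-up events and return matching indices."""
    events = [
        Event(1, 0.0, "radar", 12.0),
        Event(2, 1.0, "buoy", 3.5),
        Event(3, 2.0, "radar", 40.0),
    ]
    matches = continuous_select(
        iter(events),
        lambda e: e.source == "radar" and e.value > 20.0,
    )
    return [e.seq for e in matches]
```

Because `continuous_select` is a generator, it works equally well over an unbounded source; the list in `demo` only stands in for an arriving stream.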
The SPE uses constructs such as the time window to deal with the unbounded nature of the streams. The size of the sliding window determines the history over which a query operator can execute. Optimizations have been applied to yield memory savings, for instance in [3] [] [6]. The SPE architecture uses an underlying storage and/or transport medium that can be files [2] [], a publish-subscribe system [7], or sockets [8].

The contributions of this paper are as follows. Through our extensive study of stream processing in the context of scientific computing, we have come to understand what we believe to be fundamental differences between stream processing in scientific computing and stream processing elsewhere. We list these requirements here. Having worked with meteorology researchers over the past several years, we understand their needs more clearly than those of other communities. Hence we have developed a realistic stream workload and stream processing scenario for dynamic weather forecasting and use it to illustrate features of stream processing in data-driven scientific computing, through the Calder system developed at Indiana University. In [13] we evaluated throughput and deployment latency of single queries on a synthetic workload. In this paper we extend that work to encompass distributed collections of queries and users under synthetic and realistic workloads. Specifically, we measure effective throughput, event processing latency, data access scalability, and query deployment latency. Our results show that good performance and excellent scalability can be achieved by a service that fits within the context of a data-driven, workflow-orchestrated computational science application.

The remainder of the paper is organized as follows. In Section II, we list and discuss unique features of data streams in data-driven science and the requirements of stream processing systems in scientific domains. In Section III, we describe a dynamic data stream example from weather prediction and forecasting. In Section IV, we briefly describe the Calder stream processing architecture and show how it fits in the framework of meteorology forecasting. In Section V, we experimentally evaluate our system under a realistic meteorological workload. Conclusions and future work are discussed in Section VI.

II. STREAM PROCESSING IN COMPUTATIONAL SCIENCE

Stream processing in computational science introduces challenges not always fully present in domains such as finance, media streaming, and business (such as RFID tags). We characterize the requirements unique to data-driven computational science as follows. We argue that most data-driven applications we have observed have these requirements.

A) Heterogeneous data formats. Science applications use and generate data in many different data formats, including netCDF, HDF, FITS, JPG, XML, and ASCII.
The binary formats can have complex access and retrieval APIs.

B) Asynchronous streams. Stream generation rates can be highly asynchronous. One event stream might generate an event once every millisecond, while another might generate an event only once every 2 hours. Some SPEs fuse or join streams based on the assumption of relatively synchronous streams.

C) Wide variance in event sizes. Events generated by a sensor are only a few bytes in size, while events generated by large-scale instruments or regularly run models can be tens of megabytes in size.

D) Timeliness is relative. One application may want to be notified the instant a condition occurs, whereas for a second application a condition may only emerge over days or weeks.

E) Streaming is part of a larger system. Stream processing in data-driven computational science can be one small part of a much larger system. Its architecture must be compatible with the overall system architecture.

TABLE I. OBSERVATIONAL DATA SOURCES USED IN MESOSCALE METEOROLOGY. SHOWS THE RATES AND SIZES OF DATA PRODUCTS OVER NEW ORLEANS.
Columns: Data source | No. sources | Ev. size (KB) | Ev. rate (ev/hr) | Cum. rate (event/hr) | Cum. BW (Kbps)
Data sources: Metars 1st order; Metars 2nd order; Rawinsondes (buoy data); Acars; NexRad II; NexRad III; GOES (model data); Eta (model data); CAPS (sensors)

Fig. 1. Data sources around New Orleans.

F) Scientists need changes as an experiment progresses. One could envision a dynamic weather prediction workflow that data mines a region of the atmosphere looking for tornado signatures and then kicks off a prediction model. The region over which data mining is carried out will change as a storm moves across the Midwest, for instance. As the storm moves, the filtering criteria (e.g., spatial region) must adapt.

G) Domain-specific processing. Much stream processing in computational science is domain specific. For instance, a mesoscale detection algorithm classifies vortices detected in Doppler radar data.
Thus, a stream processing system needs to be extensible; that is, it needs to provide mechanisms for scientists to extend stream and query processing with their own operators.

III. METEOROLOGY EXAMPLE

Meteorology is a rich application domain for illustrating the uniqueness of stream processing in scientific domains. Atmospheric scientists have a considerable number and variety of weather observational instruments available to them, due in large part to over 100 years of history in observing the atmosphere. Tools such as the Unidata Internet Data Dissemination (IDD) [19] system distribute many of the data products to interested universities for research purposes. The data products range considerably in their sizes and generation rates. Table I lists nine of the most common data products. These products are moved to the location where the weather forecast model is to run, then ingested into the model at runtime.

To illustrate the use of stream processing engines in this context, suppose that an atmospheric science student is studying Fall severe weather in the region around New Orleans, Louisiana (see Figure 1) and wants to kick off a regional km forecast when a storm cell emerges. Figure 1 shows the region around New Orleans (approximately at degree North Latitude and 90.2 degree West Longitude). The inner box in Figure 1 marks an area 2 degrees of latitude high and 2 degrees of longitude wide around New Orleans, where one degree of latitude is approximately 70 statute miles and one degree of longitude is approximately 60 statute miles. The figure is taken from the GeoGUI in the LEAD portal [20].

The number of data products, their sizes and their rates for the sensors that overlap the 80 mile radius around New Orleans are given in Table I. We call this the New Orleans Workload. The table shows nine data products, and for each type gives the number of sources. The event rate is the rate at which events are generated at the source. The cumulative rate and bandwidth are calculated over all data sources within a data type and under storm mode. An event is a timestamped observation from a data source. For the NexRad Level II Doppler radar, for instance, an event corresponds to a scan, where one scan consists of fourteen 360 degree sweeps of a radar. A scan completes in -7 minutes. The range given in the event size column of the table is bimodal: the small event size occurs during clear skies, and the large event size occurs during storm conditions.
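The cumulative columns of a table like Table I follow from simple arithmetic over the per-source columns. The Python sketch below shows one way to compute them; it assumes that the cumulative rate is the number of sources times the per-source event rate and that a kilobyte and a kilobit use the same prefix, so that one KB corresponds to 8 Kb. The function names and the sample numbers are hypothetical and are not values from Table I.

```python
def cumulative_rate(num_sources: int, events_per_hour: float) -> float:
    """Events per hour summed over all sources of one data type."""
    return num_sources * events_per_hour


def cumulative_bandwidth_kbps(num_sources: int,
                              event_size_kb: float,
                              events_per_hour: float) -> float:
    """Aggregate bandwidth in kilobits per second for one data type.

    Assumes 1 KB = 8 Kb (same prefix for bytes and bits) and 3600 s per hour.
    """
    kilobits_per_hour = (
        cumulative_rate(num_sources, events_per_hour) * event_size_kb * 8.0
    )
    return kilobits_per_hour / 3600.0


# Hypothetical data type: 3 sources, 100 KB events, 12 events per hour each.
example_rate = cumulative_rate(3, 12.0)
example_bw = cumulative_bandwidth_kbps(3, 100.0, 12.0)
```

For a data type whose event size differs between clear and storm mode, the same function would be evaluated once per mode with the corresponding event size.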
The variability in event rates in Table I, from 0.08 ev/hr to ev/min, and the variability in event sizes, from KB to MB, clearly demonstrate several stream processing requirements of Section II, specifically asynchronous streams (requirement B) and wide variance in event sizes (requirement C). This collection of data products also demonstrates a common requirement of stream processing in scientific domains, that of heterogeneous data products (requirement A). The product formats shown in Table I alone include text, raw radar format, model-specific binary format, images, and netCDF data.

IV. CALDER ARCHITECTURE

Calder, developed at Indiana University, falls into the category of a stream processing engine (SPE). Its purpose is to provide timely access to data streams. Additional details of the system architecture can be found in [13]. In this section, we provide a brief overview of the system architecture and show how stream processing fits into a larger data-driven computational science application. In particular, we discuss a scenario in the context of the mesoscale meteorology forecasting example of Section III.

Fig. 2. Calder architecture. (Components shown: data sources, point of presence, pub-sub system with one channel per event type, handlers for incoming channels, query execution engine with runtime container, query planner service, service factory, GDS instances, and rowset service whose ring buffers hold results.)

We view data streams as a virtual data repository that, while constantly changing, has many similarities to a database [2]. Like a database, a collection of streams is bound by coherence, in that the streams belonging to a collection are related to one another, and possesses meaning, in that a collection of streams can be described. We call such a collection of streams a Virtual Stream Store. Calder manages multiple virtual stream stores simultaneously and provides users with access to one or more virtual stream stores.
Calder uses a publish-subscribe system, dquobec [22], as its underlying transport layer. How sensors and instruments are pub-sub enabled is outside the scope of our research, but solutions exist, such as [23], which takes an XML approach. This pub-sub enabling is shown in Figure 2 as a single point of presence; however, other approaches exist. In the simplified diagram of Figure 2, the data streams flow to a query execution engine where they are received by handlers. The runtime acts on each incoming event by triggering one or more queries. A query executes on the event, and generates zero, one, or more events that either trigger other queries in the system or flow to the Rowset Service, where they are stored to a ring buffer for user access.

User interaction with Calder follows the Globus OGSI model of service interaction, where a grid data service (GDS) is created on behalf of a user to serve an interaction with the virtual stream store. The user submits SQL-like queries through the GDS. Details of the extended GDS interface are given in [24]. The query planner service optimizes and distributes queries and query fragments based on local and global optimization criteria. The planner service initiates a request to the rowset service to create a new ring buffer for the query.

Calder supports monotonic time-sequenced SQL Select-From-Where queries. The operators supported are select/project/join operators, where the join operator is an equijoin over the logical or physical time fields; the boolean operations are AND and OR; and relational operations are

=, ≠, ≤, ≥, < and >. We do not currently support aggregate operations like GROUP BY but are working towards it. In addition, our query language supports START and EXPIRE clauses for specifying the lifetime of the query, and the RANGE clause for specifying a user's approximation of the divergence in stream rates that the query will experience. RANGE is an optional clause that is required only for a query that includes join operations. The EXEC_FUNC clause specifies a user-defined function to be executed on the resulting events.

As we indicated in requirement E of Section II, an SPE must often operate as part of a larger system. In applications where it makes sense to treat a collection of streams as a coherent and meaningful data resource, Calder provides continuous query access to the resource. Figure 3 illustrates how Calder works in a real application, in this case from mesoscale meteorology. Suppose a storm front is moving across the U.S. Midwest, threatening to spawn tornadoes. A user wants to deploy a data mining agent that can detect precursor conditions for a tornado and, when they are detected, spawn a weather prediction model. A scientist creates an experiment by interacting with an experiment builder [2] accessed through a science gateway. The specification is handed off to a workflow engine, which interacts with component pieces through a notification system. The workflow engine interacts with Calder by passing it a declarative query, similar to how it would interact with a database management system.

Calder optimizes the query and deploys it (which includes the data mining classification components [26]) at a computational node located, for instance, on the TeraGrid [27]. The query, when instantiated at the computational node, executes the filtering/data mining loop depicted at the bottom of Figure 3 for every incoming Doppler radar scan. When the classification algorithm detects a vortex pattern that exceeds the threshold, a response trigger is issued to the response channel. The workflow engine reads the response channel and acts on the message to wake the dormant prediction simulation.

Fig. 3. Stream processing to detect vortices in Doppler radar data (below) as part of a larger workflow (above).

V. EXPERIMENTS

As discussed in Section II, data-driven computational science imposes unique demands on a stream processing engine (SPE). While some of these requirements are future work (see Section VI), Calder already addresses several important requirements. One of these is the requirement that the query engine adapt to the changing needs of the experiment (requirement F). Calder addresses this through dynamic deployment of queries at runtime. We also experimentally evaluate the scalability of the rowset service because scalability, while not unique to scientific computing, is important nonetheless. Finally, we examine throughput and event processing latency of a query execution engine for the scenario given in Section III.

A. Experimental Setup

We developed a workload simulator that simulates the instrument types common to mesoscale meteorology. The simulator generates events at realistic sizes and rates as shown in Table I. Our workload generator is a set of highly configurable parallelized processes. Each process takes a channel name, data type, rates, sizes, and modes (clear or storm) for one instrument and produces a stream of events of the required size and rate with pre-set metadata. The streams generate events onto the dquobec publish-subscribe system, one stream per channel. In our experimental setup, each query execution engine registers through the pub-sub system to receive all data products from the workload simulator. The experiment is executed on a 28-node cluster where each node runs RHEL WS release and has dual AMD 2.0 GHz Opteron 64-bit processors with GB memory. The Opteron nodes are interconnected by a Gbps LAN. The simulator processes execute on 9 cluster nodes.

B. Query Deployment Latency

In this first experiment, we examine the time taken to deploy a continuous query into the Calder system while it is performing under the New Orleans workload. We used a set of select-project queries that filter the data products on temporal and spatial aspects. Currently, Calder supports only falls-within (boundary check) spatial queries. Users submit queries through the Grid Data Service. The planner service creates a query execution plan, and then deploys the query to the query execution engine. Microbenchmarks of the steps of query deployment are presented in [13]. Here we examine the scalability of deployment latency by submitting 000 queries across 2 query processing engines using 2 nodes of the Opteron cluster. The number of simultaneous users submitting queries is set at 0, based on a study [28] that estimates the number of

users running canonical workflows in LEAD simultaneously at 0. Query deployment latency includes query plan generation, distribution and installation time, plus the overhead for XML and SOAP communication and processing between the different components of Calder.

Figure 4 shows the average deployment latency as seen by 0 users for 20 queries. The X axis shows the query number and the Y axis shows deployment latency in milliseconds. The deployment latency of the n th query in the figure was computed by taking the average of the deployment latency of the n th query for all 0 users. One can see from Figure 4 that latency is high for the first query and low thereafter. The initial high latency can be attributed to the large user proxy creation (GDS setup) time of approximately 200ms. Figure 4 shows the average latency, and we can see that after the first query, the deployment latency seen by the user is almost constant at around 300 to 00 ms. The table embedded at the top right of Figure 4 shows the overall distribution of deployment latency for all the 000 queries. From this table, it can be observed that the largest number of queries fall in the range of ms.

Fig. 4. Average deployment latency and frequency distribution for 000 queries. (Axes: Query Deployment Latency (ms) versus Number of Queries; embedded table columns: Time (ms), Count.)

C. Data Access Scalability

The rowset service provides users and programs with flexibility in data access by synchronizing data generation between the query execution engine and requests by the users. Users request their query results through an OGSA-DAI v6 GDS (OGSI) that has been extended to support a stream data resource [24]. The GDS maintains a persistent connection to the rowset service, and thus a user can submit any number of rowset requests using a single GDS. In this experiment we study the scalability of this service by measuring the response time as seen by a single user in the presence of multiple other users and the resultant data streams from the New Orleans workload. Each user instantiates a GDS to connect to the rowset service.
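The rowset service is described as holding query results in ring buffers that users read on request. The Python sketch below illustrates the general idea of a fixed-capacity ring buffer that decouples a producer of results from readers; it is an illustrative assumption built on the standard `collections.deque`, not the Calder rowset service, and the `RingBuffer` class and its method names are hypothetical.

```python
from collections import deque
from typing import Any, Deque, List


class RingBuffer:
    """Fixed-capacity buffer that keeps only the most recent results."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # A deque with maxlen silently discards the oldest item when full.
        self._items: Deque[Any] = deque(maxlen=capacity)

    def append(self, item: Any) -> None:
        """Called by the producer (for example, a query emitting a result)."""
        self._items.append(item)

    def snapshot(self) -> List[Any]:
        """Called by a reader: return the buffered results, oldest first."""
        return list(self._items)


def demo() -> List[int]:
    """Write five results into a buffer of capacity three."""
    buffer = RingBuffer(capacity=3)
    for result in range(5):
        buffer.append(result)
    return buffer.snapshot()
```

The design choice here is that a slow reader never blocks the producer: older results are overwritten, so a reader always sees a bounded window of the most recent output.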
The rowset service response time is defined as the time period from the instant a request is submitted to the rowset service until the instant the user receives the first result. The scalability experiment consists of two simultaneous tasks. The rowset service is fed with the New Orleans workload data products defined in Table I. The user request workload is simulated as many users sending requests to the rowset service simultaneously. We measure a single user's response time while gradually increasing, from 0 to 800, the number of users sending requests to the rowset service. The results appearing in Figure 5 are the average response times calculated over 0 runs. Further scaling beyond 800 users encountered a limit on the maximum number of open sockets for a process, because each GDS maintains an active connection to the rowset service. We can see from the figure that the response time increases in proportion to the number of users, but the response time with 800 users in the system is still in a reasonable range of 20ms. The best-fit plot shows the trend of increase. The variation in response time is caused by the variations in rates of the input streams, which in turn influence the rates at which query result streams arrive at the rowset service.

Fig. 5. Response time of the rowset service as seen by a single user in the presence of n number of users. (Axes: Average Response Time (ms) versus Number of users; series: Average RT, Trend.)

Fig. 6. Data types and bandwidth produced by workload simulator (left) operating in storm mode and collective output bandwidth from queries output stream (right).

Fig. 7. Input and output network bandwidth at a Query Execution Engine under increasing load. Pass-through queries running over New Orleans Workload. (Axes: Bandwidth (BW) in Mbps versus Queries; series: Input BW, Output BW Mean, Output BW Trend, Output BW Scatter Plot.)

D. Throughput of query execution engine

The purpose of this third experiment is to compute throughput for a single query execution engine on a large number of queries and realistic data streams with high throughput (output bandwidth). The output bandwidth of a single query is the product of the rate and size of the output events produced by the query. The overall output bandwidth of a quoblet (a query execution engine) is the sum of the output bandwidth produced by all the queries deployed in the system at that time. Figure 6 shows data products reflecting the New Orleans workload given in Table I. The cumulative input bandwidth shown in Figure 6 is calculated by adding the bandwidth of each data product under storm mode.

In this experiment, we capture the scalability of a query execution engine on a single computational node as a function of the output bandwidth of the engine. We gradually increase the number of queries deployed and measure the overall output bandwidth at different times. We use a suite of metadata filtering queries, each of which executes on one of the 9 data products. For the purposes of this experiment, our queries are pass-through (select all) queries that act on one data product at a time. Pass-through queries remove the bias on the output bandwidth. Each additional query produces a specific output stream.
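The two definitions above, per-query output bandwidth as rate times event size and engine output bandwidth as the sum over deployed queries, translate directly into code. The Python sketch below is a minimal illustration under the assumption that rates are given in events per second and sizes in bits; the function names and the sample figures are hypothetical and do not come from the experiment.

```python
from typing import Iterable, Tuple


def query_output_bandwidth(events_per_second: float,
                           event_size_bits: float) -> float:
    """Output bandwidth of one query in bits per second (rate times size)."""
    return events_per_second * event_size_bits


def engine_output_bandwidth(queries: Iterable[Tuple[float, float]]) -> float:
    """Sum of per-query bandwidths for all queries deployed on one engine.

    Each element of `queries` is a (events_per_second, event_size_bits) pair.
    """
    return sum(query_output_bandwidth(rate, size) for rate, size in queries)


# Two hypothetical pass-through queries: 2 ev/s of 1000-bit events
# and 0.5 ev/s of 4000-bit events.
deployed = [(2.0, 1000.0), (0.5, 4000.0)]
total_bps = engine_output_bandwidth(deployed)
```

With pass-through queries each output event has the same size as its input event, which is why adding a query adds a predictable increment to the total.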
An example of a pass-through query is as follows:

    SELECT *
    FROM NexRad Level II
    START " T00:00: :00"
    EXPIRE " T00:00: :00";

The query execution engine (the quoblet) is hosted on a single computational node. The output streams generated by the queries are fed to client processes listening on corresponding output channels. The streams are mapped one-to-one to channels in the underlying pub-sub system. The workload simulator was configured to generate the New Orleans workload under clear sky mode. We increased the rates of a few data products while correspondingly decreasing their sizes to maintain a smooth continuous input. The queries were submitted at 0-second intervals and the throughput measured at -second intervals.

Figure 7 shows the input and output bandwidth (Y axis) measured for 00 queries (X axis). We can see that the input bandwidth is steady around 0.Mbps (clear mode), which is less than the cumulative input bandwidth shown in Figure 6 (storm mode). Figure 7 shows that the output bandwidth of the query engine (Y axis) increases with the number of queries (X axis). The scatter plot shows all the output bandwidth measurements taken. The average output bandwidth points are connected and a trend line is superimposed to show how output bandwidth increases with the number of queries. From Figure 7, we can see that the output bandwidth keeps increasing linearly with an increasing number of queries. We tested up to 00 queries and this increasing trend continues, providing an average throughput of 38 Mbps for 00 queries. This shows the ability of the quoblet to scale well to hundreds of queries under clear mode. We are currently working on throughput measurements for storm mode as well. The maximum number of queries supported by a query execution engine is influenced by several factors, including the arrival rate of the input streams and the complexity of the queries.
The current measurements were taken in a cluster where the stream providers and the query processing node existed in the same LAN and were connected by a Gbps Ethernet connection. In a wide area network, the output bandwidth may be restricted by the maximum bandwidth of the network links.

TABLE II. SERVICE TIME FOR EXECUTING FILTER QUERY AND DATA MINING ALGORITHM ON NEXRAD LEVEL II DATA.
Columns: Description | Average | Std. Deviation
Rows: Query Execution Time (ms); MDA Execution Time (ms); Total Service Time (ms)

Fig. 8. Execution of filter query and data mining algorithm on NexRad II data.

E. Event Processing Latency

In this final experiment we measure event processing latency (service time) for a typical query in the context of the motivating example of Section III, that is, where the query portion filters out all data products that are not NexRad Level II data or are outside the geospatial region of interest. The data mining portion of the query is a Mesoscale Detection classifier algorithm [26] that operates over the NexRad Level II data. Figure 8 shows the relationship between the two pieces, and a representative query is as follows:

    SELECT *
    FROM NexRad Level II
    WHERE southbound >= "28.00" and eastbound <= "-89.00"
      and northbound <= "3.00" and westbound >= "-9.00"
    EXEC_FUNC MDA_Algorithm
    START " T00:00: :00"
    EXPIRE " T00:00: :00"

Table II shows service time distributed across the filtering and mining parts of the query execution. We can see that query execution consumes a small fraction of total service time. More complex queries may consume longer query execution time, but this confirms earlier results [29] and also confirms our earlier results that service time is dependent on the rates of the input streams when joins are involved [17].

VI. CONCLUSION AND FUTURE WORK

In this paper we have distinguished the major categories of stream processing systems, and have argued that data-driven science imposes unique demands on stream processing systems. The stream processing needs of meteorology researchers are evidence of the unique requirements of stream processing systems in data-driven applications. We illustrated this point by describing a use scenario. The scenario motivates the experimental evaluation carried out on the Calder stream processing engine.
Specifically, the experiments apply a realistic workload from the meteorology application against the Calder system to experimentally determine its effective throughput, event processing latency, data access scalability, and query deployment latency.

The primary focus of our ongoing work is to support querying on XML data streams, because Calder, though it currently supports various data formats, lacks the ability to dynamically add the new data formats and user-defined functions needed to satisfy requirement A. XML-based query language support will allow users to dynamically define new data formats. Our second focus is stream resource discovery. As discussed earlier, a collection of active streams forms a Virtual Stream Store; the store must be describable in a way that clients can discover it, understand the data it contains, and issue suitably formatted queries. Capturing metadata about the streams, the collection of streams, queries, and other details such as data format is key to enabling discovery. Third, optimal query placement is dependent upon information about the computational mesh in which the queries exist, so metadata must include performance monitoring information collected about streams in real time. We are also planning to migrate our GDS to OGSA-DAI WSRF.0, which is compatible with Globus Toolkit.0. Finally, we are examining issues of user privacy, approximate query processing in the presence of missing data and missing streams, and dynamic deployment of user-specified data mining. The latter is needed to satisfy requirement G.

REFERENCES

[1] M. Stonebraker, U. Cetintemel, and S. Zdonik, The 8 requirements of real-time stream processing, SIGMOD Rec., vol. 3, no., 200.
[2] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom, Stream: The Stanford data stream management system, in Data Stream Management, 200.
[3] T. Johnson, C. D. Cranor, and O. Spatscheck, Gigascope: a stream database for network application, in ACM SIGMOD International Conference on Management of Data.
[4] M. G. Koparanova and T. Risch, High-performance stream-oriented grid database manager for scientific data, in 1st European Across Grids Conference.
[5] A. Arasu, S. Babu, and J. Widom, The CQL continuous query language: Semantic foundations and query execution, Very Large Database (VLDB) Journal, vol., no., 200.
[6] D. Luckham, The Power of Events. Addison Wesley.
[7] L. Brownston, R. Farrell, E. Kant, and N. Martin, Programming Expert Systems in OPS5. Addison Wesley, 1985.
[8] D. C. Luckham and J. Vera, An event-based architecture definition language, IEEE Transactions on Software Engineering, vol. 2, no. 9, pp , 99.
[9] M. Altinel and M. J. Franklin, Efficient filtering of XML documents for selective dissemination of information, in The Very Large Database (VLDB) Conference.
[10] B. Nguyen, S. Abiteboul, G. Cobena, and M. Preda, Monitoring XML data on the Web, SIGMOD Record, vol. 30, no. 2, pp. 37 8, 200.
[11] R. Avnur and J. M. Hellerstein, Eddies: continuously adaptive query processing, in ACM SIGMOD International Conference on Management of Data.
[12] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah, TelegraphCQ: Continuous dataflow processing for an uncertain world, in Conference on Innovative Database systems Research (CIDR).
[13] N. Vijayakumar, Y. Liu, and B. Plale, Calder query grid service: Insights and experimental evaluations, in CCGrid Conference.
[14] D. J. Abadi, Y. Ahmad, M. Balazinska, U. Cetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik, The Design of the Borealis Stream Processing Engine, in Second Biennial Conference on Innovative Data Systems Research (CIDR), 200.
[15] U. V. Catalyurek, Supporting large scale data driven science in distributed environments, in Minisymposium on Distributed Data Management Infrastructures for Scalable Computational Science and Engineering Applications, SIAM Conference on Computational Science and Engineering (SIAM CSE 0), 200.
[16] B. Plale, Leveraging run time knowledge about event rates to improve memory utilization in wide area data stream filtering, in IEEE International Symposium on High Performance Distributed Computing.
[17] B. Plale and N. Vijayakumar, Evaluation of rate-based adaptivity in joining asynchronous data streams, in 9th IEEE International Parallel and Distributed Processing Symposium, April 200.
[18] D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik, Aurora: A new model and architecture for data stream management, Very Large Database (VLDB) Journal, vol. 2, no. 2, pp ,
[19] B. Domenico, Unidata internet data distribution: Real-time data on the desktop, in Science Information Systems Interoperability Conference (SISIC), 200.
[20] K. K. Droegemeier, V. Chandrasekar, R. Clark, D. Gannon, S. Graves, E. Joseph, M. Ramamurthy, R. Wilhelmson, K. Brewster, B. Domenico, T. Leyton, V. Morris, D. Murray, B. Plale, R. Ramachandran, D. Reed, J. Rushing, D. Weber, A. Wilson, M. Xue, and S. Yalda, Linked environments for atmospheric discovery (LEAD): A cyberinfrastructure for mesoscale meteorology research and education, in 20th Conf. on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology, Seattle, WA, 200.
[21] B. Plale, Using global snapshots to access data streams on the grid, in Lecture Notes in Computer Science, Volume 36. Springer Verlag, 200, 2nd European Across Grids Conference (AxGrids).
[22] N. Vijayakumar and B. Plale, dquobec event channel communication system, Computer Science Department of Indiana University, Tech. Rep. TR6, 200.
[23] D. McMullen, et al., Instruments and sensors on the grid: Issues and challenges, in GlobusWorld, 200.
[24] Y. Liu, B. Plale, and N. Vijayakumar, Realization of GGF DAIS data service interface for grid access to data streams, Indiana University, Computer Science Department, Tech. Rep. TR63, 200.
[25] B. Plale, D. Gannon, Y. Huang, G. Kandaswamy, S. L. Pallickara, and A. Slominski, Cooperating services for data-driven computational experimentation, in Computing in Science and Engineering (CiSE), 200.
[26] J. Rushing, R. Ramachandran, U. Nair, S. Graves, R. Welch, and A. Lin, ADaM: A data mining toolkit for scientists and engineers, Computers and Geosciences, vol. 3, 200.
[27] TeraGrid, 200, [Online]. Available:
[28] B. Plale, Usage study for data storage repository in LEAD, 200, LEAD TR00.
[29] B. Plale and K. Schwan, Dynamic querying of streaming data with the dquob system, IEEE Transactions on Parallel and Distributed Systems, vol., no., April, 2003.


Wide-area Network Acceleration for the Developing World. Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton) Wide-area Network Acceleration for the Developing World Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton) POOR INTERNET ACCESS IN THE DEVELOPING WORLD Internet access is a scarce

More information

GENERIC DATA ACCESS AND INTEGRATION SERVICE FOR DISTRIBUTED COMPUTING ENVIRONMENT

GENERIC DATA ACCESS AND INTEGRATION SERVICE FOR DISTRIBUTED COMPUTING ENVIRONMENT GENERIC DATA ACCESS AND INTEGRATION SERVICE FOR DISTRIBUTED COMPUTING ENVIRONMENT Hemant Mehta 1, Priyesh Kanungo 2 and Manohar Chandwani 3 1 School of Computer Science, Devi Ahilya University, Indore,

More information

This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 12902

This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 12902 Open Archive TOULOUSE Archive Ouverte (OATAO) OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible. This is an author-deposited

More information

Effective Parameters on Response Time of Data Stream Management Systems

Effective Parameters on Response Time of Data Stream Management Systems Effective Parameters on Response Time of Data Stream Management Systems Shirin Mohammadi 1, Ali A. Safaei 1, Mostafa S. Hagjhoo 1 and Fatemeh Abdi 2 1 Department of Computer Engineering, Iran University

More information

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Quiz for Chapter 6 Storage and Other I/O Topics 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: Solutions in Red 1. [6 points] Give a concise answer to each

More information

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South

More information

Performance Analysis of VM Scheduling Algorithm of CloudSim in Cloud Computing

Performance Analysis of VM Scheduling Algorithm of CloudSim in Cloud Computing IJECT Vo l. 6, Is s u e 1, Sp l-1 Ja n - Ma r c h 2015 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Performance Analysis Scheduling Algorithm CloudSim in Cloud Computing 1 Md. Ashifuddin Mondal,

More information

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation

More information

Asynchronous Data Mining Tools at the GES-DISC

Asynchronous Data Mining Tools at the GES-DISC Asynchronous Data Mining Tools at the GES-DISC Long B. Pham, Stephen W. Berrick, Christopher S. Lynnes and Eunice K. Eng NASA Goddard Space Flight Center Distributed Active Archive Center Introduction

More information

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications

Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications Comparing Microsoft SQL Server 2005 Replication and DataXtend Remote Edition for Mobile and Distributed Applications White Paper Table of Contents Overview...3 Replication Types Supported...3 Set-up &

More information

Figure 1: Illustration of service management conceptual framework

Figure 1: Illustration of service management conceptual framework Dagstuhl Seminar on Service-Oriented Computing Session Summary Service Management Asit Dan, IBM Participants of the Core Group Luciano Baresi, Politecnico di Milano Asit Dan, IBM (Session Lead) Martin

More information

System Models for Distributed and Cloud Computing

System Models for Distributed and Cloud Computing System Models for Distributed and Cloud Computing Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Classification of Distributed Computing Systems

More information

Oracle Database 11g Comparison Chart

Oracle Database 11g Comparison Chart Key Feature Summary Express 10g Standard One Standard Enterprise Maximum 1 CPU 2 Sockets 4 Sockets No Limit RAM 1GB OS Max OS Max OS Max Database Size 4GB No Limit No Limit No Limit Windows Linux Unix

More information

Investigations on Hierarchical Web service based on Java Technique

Investigations on Hierarchical Web service based on Java Technique Investigations on Hierarchical Web service based on Java Technique A. Bora, M. K. Bhuyan and T. Bezboruah, Member, IAENG Abstract We have designed, developed and implemented a hierarchical web service

More information