ISO/IEC JTC 1 N 12414

ISO/IEC JTC 1
Information technology
Secretariat: ANSI (United States)

Document type: Text for NP ballot
Title: Proposal for a New Work Item on Information technology - Big Data Definition and Vocabulary
Status: Please submit your vote via the online balloting system.
Date of document:
Source: Korea
Expected action: VOTE
Action due date:
E-mail of secretary: lrajchel@ansi.org
Committee URL:

NEW WORK ITEM PROPOSAL

Closing date for voting:
Date of circulation:
Secretariat: ANSI
Reference number (to be given by the Secretariat): ISO/IEC JTC 1 NXXXXX
Proposer: Korea

A proposal for a new work item within the scope of an existing committee shall be submitted to the secretariat of that committee with a copy to the ITTF and, in the case of a subcommittee, a copy to the JTC 1 secretariat. Proposals not within the scope of JTC 1 shall be submitted to the ITTF. A new work item proposal may be made by a national body, the JTC 1 secretariat or a JTC 1 subcommittee secretariat, another technical committee or subcommittee, a Category A organization in liaison with JTC 1, the technical management board or one of its advisory groups, or the Chief Executive Officer. The proposal will be circulated to the P-members of JTC 1 or the JTC 1 subcommittee for voting, and to the O-members for information.

IMPORTANT NOTE: Proposals without adequate justification risk rejection or referral back to the originator. Guidelines for proposing and justifying a new work item are given overleaf.

Proposal (to be completed by the proposer)

Title of the proposed deliverable (in the case of an amendment, revision or a new part of an existing publication, show the reference number and current title): Information technology - Big Data Definition and Vocabulary

Scope of the proposed deliverable: The proposed International Standard will provide a definition of Big Data along with a set of terms in the field of Big Data. It is a terminology foundation for Big Data standardization work. Its scope will include: Big Data Definition and Taxonomies, Big Data Elements, and Big Data Patterns. This project will not address legal, regulatory, and cross-border trade discussions, but will help with, and provide a framework for, a deeper and better understanding of issues involving such topics so that others in the international community can address them.
Purpose and justification of the proposal (attach a separate page as annex, if necessary): The purpose of this publication is to create a standard which provides common terms and definitions for the field of Big Data.

Justification: Big Data is being defined and classified in different ways by industry, academia, and research institutes. This is a field in flux, and different people may have different conceptions of what terms mean. This impedes efforts to standardize Big Data, because each specification provides its own definitions, often obscuring the terminology similarities and differences across specifications. This International Standard will clarify the underlying concepts of Big Data and Data Science to enhance communication among producers and consumers of these new technologies by helping us use the same term for the same concept.

Is this a Management Systems Standard (MSS)? Yes / No
Envisaged publication type (indicate one of the following, if possible): International Standard / Technical Specification / Technical Report
Proposed development track: 1 (24 months) / 2 (36 months - default) / 3 (48 months)
Target dates for availability: First CD: 1 October 2016; Publication: 1 October 2018
Known patented items (see ISO/IEC Directives Part 1 for important guidance): Yes / No. If "Yes", provide full information in an annex.
Are there any known accessibility requirements and/or dependencies? Yes / No. If yes, please specify in a separate annex.
Are there any known requirements for cultural and linguistic adaptability? Yes / No. If yes, please specify in a separate annex.

DRAFT Version 01-mm-dd JTC 1 New Work Item Proposal Page 1

Meeting information

Estimated number of meetings: 8
Frequency of meetings: - per year
Date and place of first meeting (if known):

A listing of relevant documents:
ISO 704:2009, Terminology work - Principles and methods
ISO 860:2007, Terminology work - Harmonization of concepts and terms
ISO 10241-1:2011, Terminological entries in standards - Part 1: General requirements and examples of presentation
ISO 10241-2:2012, Terminological entries in standards - Part 2: Adoption of standardized terminological entries
ISO/IEC 17788, Information technology - Cloud computing - Overview and vocabulary

A statement from the proposer as to how the proposed work may relate to or impact on existing work, especially existing ISO and IEC deliverables. The proposer should explain how the work differs from apparently similar work, or explain how duplication and conflict will be minimized. (attach a separate page as annex, if necessary): The proposed work will complement existing work in the area of terminology. It will build on content from several ISO standards and produce a profile that is aligned with Big Data requirements. The work will also consider related projects such as ISO/IEC 17788, Information technology - Cloud computing - Overview and vocabulary.

A simple and concise statement identifying and describing relevant affected stakeholder categories (including small and medium sized enterprises) and how they will each benefit from or be impacted by the proposed deliverable(s): The proposed deliverables will benefit the international Big Data community and vendors by facilitating consistent development and providing common terms and definitions for the field of Big Data.

Liaisons (a listing of relevant external international organizations or internal parties (other ISO and/or IEC committees) to be engaged as liaisons in the development of the deliverable(s)): TeleManagement Forum, OASIS

A listing of relevant countries which are not already P-members of the committee:
Preparatory work (at a minimum an outline should be included with the proposal; ensure that all copyright issues are identified)

A draft is attached / An existing document is attached / An outline is attached, to serve as an initial basis
It is possible to supply a draft by
The proposer or the proposer's organization is prepared to undertake the preparatory work required: Yes / No

Proposed Project Editor (include contact information): Kangchan Lee, chan@etri.re.kr, for KATS (South Korea)
Name of the Proposer (include contact information): Suwook Ha, Kangchan Lee (ETRI), sw.ha@etri.re.kr, chan@etri.re.kr

Supplementary information relating to the proposal (comments of the JTC 1 or SC Secretariat)

This proposal relates to a new ISO/IEC document
This proposal relates to the amendment of an existing ISO/IEC document
This proposal relates to the revision of an existing ISO/IEC document
This proposal relates to a multi-part standard consisting of 10 parts
This proposal relates to the adoption as an active project of an item currently registered as a Preliminary Work Item
This proposal relates to the re-establishment of a cancelled project as an active project
This proposal requires the service of a maintenance agency. If yes, has a potential candidate been identified? Please identify
This proposal requires the service of a registration authority. If yes, has a potential candidate been identified?
Please identify
This proposal is submitted with a CD for simultaneous NP and CD balloting
Other:

Voting information

The ballot associated with this proposal comprises a vote on (check only one):
Adoption of the proposal as a new project (Stage 10.99)
Adoption of the proposal as a new project and the associated draft as a working draft (WD)
Adoption of the proposal as a new project and the associated draft as a committee draft (CD)
Other:

It is proposed to assign this new item to: JTC 1/WG 9 / a new JTC 1 subcommittee

Annex(es) are included with this proposal (give details): None.

Use this form to propose:
a) a new ISO/IEC document (including a new part to an existing ISO/IEC document), or the amendment/revision of an existing ISO/IEC document;
b) the establishment as an active project of a preliminary work item, or the re-establishment of a cancelled project;
c) the change in the type of an existing publication, e.g. conversion of a Technical Specification into an International Standard.

This form is not intended for use to propose an action following a systematic review - use ISO Form 1 for that purpose. Proposals for correction (i.e. proposals for a Technical Corrigendum) should be submitted in writing directly to the secretariat concerned.

Guidelines on the completion of a proposal for a new work item (see also the ISO/IEC Directives Part 1 and the associated ISO/IEC JTC 1 Supplement)

a) Title of the proposed deliverable: Indicate the subject of the proposed new work item.

b) Scope of the proposed deliverable: Give a clear indication of the coverage of the proposed new work item. Indicate, for example, if this is a proposal for a new publication, or a proposed change (amendment/revision). It is often helpful to indicate what is not covered (exclusions).

c) Envisaged publication type: Details of the types of ISO/IEC deliverable available are given in the ISO/IEC Directives, Part 1 and/or the associated JTC 1 Supplement.

d) Purpose and justification of the proposal: Give details based on a critical study of the following elements wherever practicable. Wherever possible, reference should be made to information contained in the related Business Plan.
1) The specific aims and reason for the standardization activity, with particular emphasis on the aspects of standardization to be covered, the problems it is expected to solve or the difficulties it is intended to overcome.
2) The main interests that might benefit from or be affected by the activity, such as industry, consumers, trade, governments, distributors.
3) Feasibility of the activity: Are there factors that could hinder the successful establishment or global application of the standard?
4) Timeliness of the standard to be produced: Is the technology reasonably stabilized? If not, how much time is likely to be available before advances in technology may render the proposed standard outdated? Is the proposed standard required as a basis for the future development of the technology in question, or for adoption in a future regulatory system?
5) Urgency of the activity, considering the needs of other fields or organizations. Indicate target date and, when a series of standards is proposed, suggest priorities.
6) The benefits to be gained by the implementation of the proposed standard; alternatively, the loss or disadvantage(s) if no standard is established within a reasonable time. Data such as product volume or value of trade should be included and quantified.
7) If the standardization activity is, or is likely to be, the subject of regulations or to require the harmonization of existing regulations, this should be indicated.
8) If a series of new work items is proposed having a common purpose and justification, a common proposal may be drafted including all elements to be clarified and enumerating the titles and scopes of each individual item.

e) Relevant documents: List any known relevant documents (such as standards and regulations), regardless of their source.

NOTE: The following criteria f) and g) do not mandate any feature for adaptability to culture, language, human functioning or context of use. The following criteria require that, if any features are provided for adapting to culture, language, human functioning or context of use by the new work item proposal, then the proposer is required to identify these features.

f) Accessibility: Indicate here whether the proposed standard takes into account diverse human functioning and diverse contexts of use. If so, indicate how it is addressed in your project plan. Indicate how the guidelines of ISO/IEC Guide 71 (Guidelines for standards developers to address the needs of older persons and persons with disabilities), ISO/IEC TR 29138-1 (Information technology - Accessibility considerations for people with disabilities - Part 1: User needs summary), and ISO/TR 22411 (Ergonomics data and guidelines for the application of ISO/IEC Guide 71 to products and services to address the needs of older persons and persons with disabilities) have been implemented in the proposal, or why they are not deemed to be relevant.
g) Cultural and linguistic adaptability: Indicate here if cultural and natural language adaptability is applicable to your project. If so, indicate how it is addressed in your project plan. Typical examples of requirements include:
1) for text or speech, the user shall be able to choose the natural language of input and output sentences, and the language captured shall be identified;
2) for character coding, the code shall be clearly identified for correct input and rendering;
3) for sorted lists, linguistic user order expectations shall be respected (see ISO/IEC 14651, International string ordering and comparison);
4) cultural variations in the way concepts are perceived in different countries shall be respected; and
5) input methods used in a given country shall also be supported.
For a list of what is required in most IT products, see ISO/IEC TR 19764 (Guidelines, methodology, and reference criteria for cultural and linguistic adaptability in information technology products) and ISO/IEC TR 11017 (Framework for internationalization).

h) Liaisons: List the relevant external international organizations or internal parties (other ISO and/or IEC committees) to be engaged as liaisons in the development of the deliverable(s).

i) Preparatory work: When the proposer considers that an existing well-established document may be acceptable as a standard (with or without amendment), indicate this with appropriate justification and attach a copy to the proposal. In this case, provide the document publication date, implementation history and national/global adoption experience.

Information Technology - Big Data Definition and Vocabulary

Contents

1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms defined in this International Standard
3.2 Terms defined elsewhere
4 Symbols and abbreviated terms
5 Overview of Big Data
5.1 General description of Big Data
5.2 Definition of Big Data
5.3 Key characteristics of Big Data
5.3.1 Volume
5.3.2 Variety
5.3.3 Velocity
5.3.4 Variability
5.3.5 Veracity
6 Actors and Roles for Big Data
6.1 Data Provider
6.2 Data Consumer
6.3 Big Data Application Provider
6.4 Big Data Framework Provider
7 Big Data Elements
7.1 Data Elements
7.2 Dataset at Rest
7.3 Dataset in Motion
7.4 Analytics within Data Science
7.5 Big Data Metrics
7.6 Big Data Security and Protection
8 Big Data Patterns
8.1 Data Process
8.1.1 Data Process Ordering Changes
8.1.2 Transactions
8.1.3 Data Analytics Time Window
8.1.4 Storage Medium Changes
Annex A (Informative) Examples and Use Cases of Big Data
Bibliography

Information Technology - Big Data Definition and Vocabulary

1 Scope

This International Standard will provide a definition of Big Data along with a set of terms in the field of Big Data. It is a terminology foundation for Big Data standardization work. Its scope will include: Big Data Definition and Taxonomies, Big Data Elements, and Big Data Patterns.

NOTE: This project will not address legal, regulatory, and cross-border trade discussions, but will help with, and provide a framework for, a deeper and better understanding of issues involving such topics so that others in the international community can address them.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 704:2009, Terminology work - Principles and methods
ISO 860:2007, Terminology work - Harmonization of concepts and terms
ISO 10241-1:2011, Terminological entries in standards - Part 1: General requirements and examples of presentation
ISO 10241-2:2012, Terminological entries in standards - Part 2: Adoption of standardized terminological entries
ISO/IEC 17788, Information technology - Cloud computing - Overview and vocabulary

3 Terms and definitions

3.1 Terms defined in this International Standard

3.1.1 Big Data Analytics: analytical functions to support the integration of results derived in parallel across distributed pieces of one or more data sources. This is a rapidly evolving field both in terms of functionality and the underlying programming model.

3.1.2 Big Data Engineering: the storage and data manipulation technologies that leverage a collection of horizontally coupled resources to achieve nearly linear scalability in performance.

3.1.3 Big Data Models: logical data models (relational and non-relational) and processing/computation models (batch, streaming, and transactional) for the storage and manipulation of data across horizontally scaled resources.

3.1.4 Big Data Paradigm: the distribution of data systems across horizontally-coupled independent resources to achieve the scalability needed for the efficient processing of extensive datasets.

3.1.5 NoSQL: datastores and interfaces that are not tied to strict relational approaches. Alternately called "no SQL" or "not only SQL".

3.1.6 Non-Relational Models: logical data models, such as document, graph, key-value and others, that are used to provide more efficient storage of, and access to, non-tabular datasets.

3.1.7 Schema-on-read: big data is often stored in a raw form based on its production, with the schema needed for organizing (and often cleansing) the data discovered and applied as the data is queried.

3.2 Terms defined elsewhere

3.2.1 cloud computing: paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand [Recommendation ITU-T Y.3500 | ISO/IEC 17788:2014]

3.2.2 framework: structure expressed in diagrams, text, and formal rules which relates the components of a conceptual entity to each other [ISO 118-1, 01]

3.2.3 Internet of Things: an integrated environment, inter-connecting anything, anywhere at any time [ISO/IEC JTC 1 SWG 5 Report, 2014]

3.2.4 lifecycle: evolution of a system, product, service, project or other human-made entity from conception through retirement [ISO/IEC TR, 2011]

3.2.5 ownership: legal right of possession, including the right of disposition, and sharing in all the risks and profits commensurate with the degree of ownership interest or shareholding, as demonstrated by an examination of the substance, rather than the form, of ownership arrangements [ISO 108-, 2011]

3.2.6 relational model: a data model whose structure is based on a set of relations [ISO/IEC 2382-17:1999]

3.2.7 role: a set of activities that serves a common purpose [Recommendation ITU-T Y.3502 | ISO/IEC 17789:2014]

3.2.8 security: all aspects related to defining, achieving, and maintaining confidentiality, integrity, availability, non-repudiation, accountability, authenticity, and reliability of a system [ISO/IEC 15288:2008]

3.2.9 sensor: device that observes and measures a physical property of a natural phenomenon or man-made process and converts that measurement into a signal [ISO/IEC 29182-2:2013]

3.2.10 streaming data: data passing across an interface from a source that is operating continuously [ISO/IEC 198-, 2011]

4 Symbols and abbreviated terms

For the purposes of this International Standard, the following abbreviations apply:

MPP Massively Parallel Processing
NoSQL Not Only Structured Query Language
RA Reference Architecture
RFID Radio-Frequency Identification
SQL Structured Query Language

5 Overview of Big Data

In recent years, the term Big Data has emerged to describe a new paradigm for data applications. New technologies tend to emerge with a lot of hype, but it can take some time to tell what is new and different. While Big Data has been defined in a myriad of ways, the heart of the Big Data paradigm is that data is too big

(volume), arrives too fast (velocity), changes too fast (variability), contains too much noise (veracity), or is too diverse (variety) to be processed within a local computing structure using traditional approaches and techniques. The technologies being introduced to support this paradigm have a wide variety of interfaces, making it difficult to construct tools and applications that integrate data from multiple Big Data sources. This report identifies potential areas for standardization within the Big Data technology space.

5.1 General description of Big Data

Big Data is used as a concept that refers to the inability of traditional data architectures to efficiently handle the new data sets. Characteristics that force a new architecture to achieve efficiencies are the dataset-at-rest characteristics of volume and variety of data from multiple domains or types, and the dataset-in-motion characteristics of velocity, or rate of flow, and variability (principally referring to a change in velocity). Each of these characteristics results in different architectures or different data lifecycle process orderings to achieve the needed efficiencies. A number of other terms (often starting with the letter "V") are also used, but a number of these refer to the analytics, and not to big data architectures.

The new big data paradigm occurs when the scale of the data at rest or in motion forces the management of the data to be a significant driver in the design of the system architecture. Fundamentally, the big data paradigm represents a shift in data system architectures from monolithic systems with vertical scaling (faster processors or disks) to horizontally scaled systems that integrate a loosely coupled set of resources. This shift occurred some 20 years ago in the simulation community, when scientific simulations began using massively parallel processing (MPP) systems.
By splitting code and data across independent processors in different combinations, computational scientists were able to greatly extend their simulation capabilities. This of course introduced a number of complications in such areas as message passing, data movement, latency in the consistency across resources, load balancing, and system inefficiencies while waiting on other resources to complete their tasks. In the same way, the big data paradigm represents this same shift, again using different mechanisms to distribute code and data across loosely-coupled resources, in order to provide the scaling in data handling that is needed to match the scaling in the data.

The purpose of storing and retrieving large amounts of data is to perform analysis that produces additional knowledge about the data. In the past, the analysis was generally accomplished on a random sample of the data. The Big Data Paradigm consists of the distribution of data systems across horizontally-coupled independent resources to achieve the scalability needed for the efficient processing of extensive datasets. With the new Big Data Paradigm, analytical functions can be executed against the entire data set, or even in real-time on a continuous stream of data. Analysis may even integrate multiple data sources from different organizations. For example, consider the question "What is the correlation between insect-borne diseases, temperature, precipitation, and changes in foliage?" To answer this question, an analysis would need to integrate data about the incidence and location of diseases, weather data, and aerial photography.

While we certainly expect a continued evolution in the methods to achieve efficient scalability across resources, this paradigm shift (in analogy to the prior shift in the simulation community) is a one-time occurrence; at least until a new paradigm shift occurs beyond this distribution of processing or data systems across multiple horizontally-coupled resources.
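The distribution of code to data described above can be sketched as a toy map-reduce in Python. This is an illustration only, not normative text: the partitions, record values, and function names below are hypothetical, and in a real system each map call would execute on the node holding that partition, so that only the compact partial results travel over the network.

```python
from collections import Counter

# Hypothetical partitions of a large dataset; in a real deployment each list
# would live on a separate, horizontally coupled storage resource.
PARTITIONS = [
    ["flu", "flu", "malaria"],
    ["malaria", "malaria", "dengue"],
    ["flu", "dengue", "dengue"],
]

def map_partition(records):
    """The 'map' step: runs where the data lives and computes a small local summary."""
    return Counter(records)

def reduce_counts(partials):
    """The 'reduce' step: only the compact partial results are aggregated centrally."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Conceptually, each map_partition call runs in parallel on its own resource.
partials = [map_partition(p) for p in PARTITIONS]
print(dict(reduce_counts(partials)))  # {'flu': 3, 'malaria': 3, 'dengue': 3}
```

The design point is that the size of each partial result depends on the number of distinct keys, not on the volume of the underlying partition, which is what makes shipping results cheaper than shipping data.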
Big Data Engineering is the storage and data manipulation technologies that leverage a collection of horizontally coupled resources to achieve nearly linear scalability in performance. New engineering techniques in the data layer have been driven by the growing prominence of data types that cannot be handled efficiently in a traditional relational model. The need for scalable access to structured and unstructured data has led to software built on name-value/key-value pairs or columnar (big table), document-oriented, and graph (including triple-store) paradigms. Non-Relational Models refers to logical data models, such as document, graph, key-value and others, that are used to provide more efficient storage of, and access to, non-tabular datasets.
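As a non-normative illustration of the model families named above, the same record can be held under a key-value, a document, and a graph (triple-store) style; the sketch below uses plain Python dictionaries and tuples, and all record names are hypothetical.

```python
# Key-value model: the store treats the value as an opaque blob keyed by a string.
kv_store = {"patient:42": '{"name": "Kim", "diagnoses": ["flu"]}'}

# Document model: the store understands nested structure and can index inside it.
doc_store = {"patient:42": {"name": "Kim", "diagnoses": ["flu"], "city": "Seoul"}}

# Graph (triple-store) model: (subject, predicate, object) edges between entities.
graph_store = [
    ("patient:42", "diagnosed_with", "flu"),
    ("patient:42", "lives_in", "Seoul"),
]

def diagnoses_of(patient_id):
    """Traverse 'diagnosed_with' edges outward from a node in the graph model."""
    return [o for s, p, o in graph_store if s == patient_id and p == "diagnosed_with"]

print(diagnoses_of("patient:42"))  # ['flu']
```

Each model trades generality for efficiency on a particular access pattern: key lookup, partial-document queries, or relationship traversal.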

NoSQL (alternately called "no SQL" or "not only SQL") refers to datastores and interfaces that are not tied to strict relational approaches. Big Data Models refers to logical data models (relational and non-relational) and processing/computation models (batch, streaming, and transactional) for the storage and manipulation of data across horizontally scaled resources. With Schema-on-read, big data is often stored in a raw form based on its production, with the schema needed for organizing (and often cleansing) the data discovered and applied as the data is queried. This is critical since, in order for many analytics to run efficiently, the data must be structured to support the specific algorithms or processing frameworks involved. Big Data Analytics is rapidly evolving both in terms of functionality and the underlying programming model. Such analytical functions support the integration of results derived in parallel across distributed pieces of one or more data sources.

The Big Data paradigm has other implications beyond these technical innovations. The changes are not only in the logical data storage, but in the parallel distribution of data and code in the physical file system, and in direct queries against this storage. The shift in thinking causes changes in the traditional data lifecycle. One description of the end-to-end data lifecycle categorizes the steps as collection, preparation, analysis and action. Different big data use cases can be characterized in terms of the dataset characteristics at rest or in motion, and in terms of the time window for the end-to-end data lifecycle. Dataset characteristics change the data lifecycle processes in different ways, for example in the point of the lifecycle at which the data is placed in persistent storage. In a traditional relational model, the data is stored after preparation (for example, after the extract-transform-load and cleansing processes).
In a high-velocity use case, the data is prepared and analyzed for alerting, and only then is the data (or aggregates of the data) given persistent storage. In a volume use case, the data is often stored in the raw state in which it was produced, prior to the application of the preparation processes that cleanse and organize the data. The consequence of persisting data in its raw state is that a schema or model for the data is only applied when the data is retrieved; this is known as schema-on-read.

A third consequence of big data engineering is often referred to as "moving the processing to the data, not the data to the processing". The implication is that the data is too extensive to be queried and transmitted into another resource for analysis, so the analysis program is instead distributed to the data-holding resources, with only the results being aggregated on a different resource. Since I/O bandwidth is frequently the limiting resource in moving data, another approach is to embed query/filter programs within the physical storage medium.

At its heart, Big Data refers to the extension of data repositories and processing across horizontally-scaled resources, much in the same way the compute-intensive simulation community embraced massively parallel processing two decades ago. In the past, classic parallel computing applications utilized a rich set of communications and synchronization constructs and created diverse communications topologies. In contrast, today, with data sets growing into the petabyte and exabyte scales, distributed processing frameworks offering patterns such as map-reduce provide a reliable, high-level, commercially viable compute model based on commodity computing resources, dynamic resource scheduling, and synchronization techniques.

5.2 Definition of Big Data

The term Big Data is used in a variety of contexts with a variety of characteristics.
To understand where standards will help support the big data paradigm, we have to reach some level of consensus on what the term really means. This report uses the following working definition of Big Data:

Big Data is a data set(s) with characteristics (e.g. volume, velocity, variety, variability, veracity, etc.) that for a particular problem domain at a given point in time cannot be efficiently processed using current/existing/established/traditional technologies and techniques in order to extract value.

The above definition distinguishes Big Data from business intelligence and traditional transactional processing, while alluding to a broad spectrum of applications that includes them. The ultimate goal of processing Big Data is to derive differentiated value that can be trusted (because the underlying data can be trusted). This is done through the application of advanced analytics against the complete corpus of data, regardless of scale. Parsing this goal helps frame the value discussion for Big Data use cases.

Any scale of operations and data: utilizing the entire corpus of relevant information, rather than just samples or subsets. It is also about unifying all decision-support time horizons (past, present, and future) through statistically derived insights into deep data sets in all those dimensions.

Trustworthy data: deriving valid insights either from a single-version-of-truth consolidation and cleansing of deep data, or from statistical models that sift haystacks of "dirty" data to find the needles of valid insight.

Advanced analytics: faster insights through a variety of analytic and mining techniques drawing on data patterns, such as long-tail analyses, micro-segmentations, and others, that are not feasible when constrained to smaller volumes, slower velocities, narrower varieties, and undetermined veracities.

A difficult question is what makes Big Data "big": how large does a dataset have to be in order to be called big data? The answer is an unsatisfying "it depends". Part of the issue is that "big" is a relative term, and with the growing density of computational and storage capabilities (e.g. more power in smaller, more efficient form factors), what is considered big today will likely not be considered big tomorrow.
Data is considered big if the use of the new scalable architectures provides improved business efficiency over traditional architectures; in other words, the functionality cannot be achieved on something like a traditional relational database platform. Big data essentially rests on the self-referencing viewpoint that data is big because it requires scalable systems to handle it, and architectures with better scaling have come about because of the need to handle big data.

5.3 Key characteristics of Big Data

5.3.1 Volume

Traditionally, the data volume requirements for analytic and transactional applications were in sub-terabyte territory. However, over the past decade, more organizations in diverse industries have identified requirements for analytic data volumes in the terabytes, petabytes, and beyond. Estimates produced by longitudinal studies started in the mid-2000s [8] show that the amount of data in the world is doubling every two years. Should this trend continue, by 2020 there will be 50 times the amount of data as there had been in 2011. Other estimates indicate that 90% of all data ever created was created in the past two years. The sheer volume of the data is colossal - the era of a trillion sensors is upon us. This volume presents the most immediate challenge to conventional information technology structures. It has stimulated new ways of achieving scalable storage across a collection of horizontally coupled resources, and a distributed approach to querying. Briefly, the traditional relational model has been relaxed for the persistence of newly prominent data types. These logical non-relational data models, typically lumped together as NoSQL, can currently be classified as Big Table, Name-Value, Document and Graphical models. A discussion of these logical models was not part of the phase one activities that led to this document.
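Scalable storage across horizontally coupled resources is commonly achieved by hash-partitioning keys across nodes, so that capacity grows by adding nodes rather than by buying a bigger single machine. The sketch below is illustrative only, with hypothetical node names; it is not part of the proposed vocabulary.

```python
import hashlib

# Hypothetical node names standing in for horizontally coupled storage resources.
NODES = ["node-a", "node-b", "node-c"]
stores = {node: {} for node in NODES}  # one in-memory store per node

def shard_for(key):
    """Choose a record's home node deterministically by hashing its key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

def put(key, value):
    stores[shard_for(key)][key] = value

def get(key):
    return stores[shard_for(key)].get(key)

put("sensor:1", {"temp": 21.5})
put("sensor:2", {"temp": 19.0})
print(get("sensor:1"))  # {'temp': 21.5}
```

Note that this naive modulo placement relocates most keys whenever the node list changes; production datastores typically use consistent hashing or managed partition maps to limit that reshuffling.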
Variety

Traditionally, enterprise data implementations for analytics and transactions operated on a single structured, row-based, relational domain of data. However, increasingly, data applications are creating, consuming,

processing, and analyzing data in a wide range of relational and non-relational formats, including structured, unstructured, and semi-structured data, documents, and so forth, from diverse application domains. Traditionally, a variety of data was handled through transforms or pre-analytics that extracted features to allow integration with other data through a relational model. Given the wider range of data formats, structures, timescales, and semantics that it is desirable to use in analytics, the integration of this data becomes more complex. This challenge arises because the data to be integrated could be text from social networks, image data, or a raw feed directly from a sensor source. The Internet of Things is the term used to describe the ubiquity of connected sensors, from RFID tags for location, to smart phones, to home utility meters. The fusion of all of this streaming data will be a challenge for developing total situational awareness. Big Data engineering has spawned data storage models that are more efficient for unstructured data types than a relational model, causing a derivative issue for the mechanisms that integrate this data. It is possible that the data to be integrated for analytics is of such volume that it cannot be moved in order to be integrated, or that some of the data is not under the control of the organization creating the data system. In either case, the variety of big data forces a range of new big data engineering in order to efficiently and automatically integrate data that is stored across multiple repositories and in multiple formats.

Velocity

Velocity is the speed/rate at which data is created, stored, analyzed, and visualized. Traditionally, most enterprises separated their transaction processing and analytics. Enterprise data analytics were concerned with batch data extraction, processing, replication, delivery, and other applications.
But increasingly, organizations everywhere have begun to emphasize the need for real-time, streaming, continuous data discovery, extraction, processing, analysis, and access. In the big data era, data is created in real time or near real time. With the availability of Internet-connected devices, wireless or wired, machines and devices can pass on their data the moment it is created. Data flow rates are increasing with enormous speed and variability, creating new challenges to enable real or near real-time data usage. Traditionally this concept has been described as streaming data. As such, aspects of this are not new: companies, such as those in telecommunications, have been sifting through high-volume, high-velocity data for years. The new horizontal scaling approaches do, however, add new big data engineering options for efficiently handling this data.

Variability

Variability refers to changes in data rate, format/structure, semantics, and/or quality that impact the supported application, analytic, or problem. Specifically, variability is a change in one or more of the other Big Data characteristics. Impacts can include the need to refactor architectures, interfaces, processing/algorithms, integration/fusion, storage, applicability, or use of the data. The other characteristics directly affect the scope of the impact of a change in one dimension. For example, in a system that deals with petabytes or exabytes of data, refactoring the data architecture and performing the necessary transformations to accommodate a change in the structure of the source data may not even be feasible, even with the horizontal scaling typically associated with big data architectures. In addition, the trend to integrate data from outside the organization to obtain more refined analytic results, combined with the rapid evolution of technology, means that enterprises must be able to adapt rapidly to data variations.
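The streaming (velocity) style of processing described above can be contrasted with batch extraction in a minimal sketch. The sensor readings and the window size are hypothetical, using only the Python standard library:

```python
from collections import deque

def sliding_mean(stream, window=3):
    """Emit the mean of the last `window` readings as each one arrives,
    rather than waiting for a complete batch extract to accumulate."""
    buf = deque(maxlen=window)  # oldest reading drops out automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Hypothetical sensor feed arriving one value at a time.
readings = [10.0, 12.0, 11.0, 30.0, 9.0]
print([round(m, 2) for m in sliding_mean(readings)])
# [10.0, 11.0, 11.0, 17.67, 16.67]
```

Because each result is available the moment a reading arrives, the same analytic that a batch job would run overnight can instead inform a decision in near real time; horizontal scaling then distributes many such windows across nodes.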
Veracity

Veracity refers to the trustworthiness, applicability, noise, bias, abnormality, and other quality properties of the data. Veracity is a challenge in combination with the other Big Data characteristics, but it is essential to the value associated with, or developed from, the data for a specific problem/application. Assessment, understanding, exploitation, and control of veracity in Big Data cannot be addressed efficiently and sufficiently throughout the data lifecycle using current technologies and techniques.
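One elementary veracity control, assessing abnormality, can be sketched as a range check that partitions records into plausible and suspect sets. The records, field name, and bounds below are hypothetical, not anything defined by this proposal:

```python
def screen(records, field, lo, hi):
    """Partition records by a simple range check on one field --
    an elementary veracity (data quality) control."""
    ok, suspect = [], []
    for rec in records:
        value = rec.get(field)
        if value is not None and lo <= value <= hi:
            ok.append(rec)
        else:
            suspect.append(rec)  # out of range or missing entirely
    return ok, suspect

# Hypothetical temperature readings; the -40..60 range is an assumed
# plausibility bound for this illustration.
data = [{"id": 1, "temp": 21.5}, {"id": 2, "temp": 999.0}, {"id": 3}]
ok, suspect = screen(data, "temp", -40, 60)
print(len(ok), len(suspect))  # 1 2
```

Real veracity assessment spans far more than range checks (provenance, bias, semantic drift), which is why the text notes it cannot yet be handled sufficiently across the whole data lifecycle.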

Actors and Roles for Big Data

Data Provider

A Data Provider makes data available to themselves or to others. The actor fulfilling this role can be part of the Big Data system, internal to the organization in another system, or external to the organization orchestrating the system. Once the data is within the local system, requests to retrieve the needed data will be made by the Big Data Application Provider and routed to the Big Data Framework Provider.

Data Consumer

The Data Consumer receives the value output of the Big Data system. In many respects, Data Consumers are the recipients of the same functionality that the Data Provider brings to the Big Data Application Provider. After the system adds value to the original data sources, the Big Data Application Provider then offers that same functionality to the Data Consumer.

Big Data Application Provider

The Big Data Application Provider executes the manipulations of the data lifecycle to meet the requirements. This is where the general capabilities within the Big Data Framework are combined to produce the specific data system.

Big Data Framework Provider

The Big Data Framework Provider has general resources or services to be used by the Big Data Application Provider in the creation of the specific application. There are many new components here from which the Big Data Application Provider can choose in utilizing these resources and the network to build the specific system.

Big Data Elements

The rate of growth in the amount of data generated and stored has been increasing exponentially. Data growth rates are considered to be greater than Moore's Law would indicate if applied to data volumes, implying a doubling in volume every three months. This data explosion is creating opportunities for new ways of combining and using data to find value. One of the significant shifts is in the amount of unstructured data.
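The difference between the two doubling rates mentioned above is easy to quantify. A sketch of the compound-growth arithmetic (the function name is ours, for illustration only):

```python
def growth_factor(doubling_period_months, horizon_months):
    """Compound growth multiple after `horizon_months`, given that the
    quantity doubles every `doubling_period_months`."""
    return 2 ** (horizon_months / doubling_period_months)

# Doubling every three months (the data-volume rate cited above),
# compounded over one year:
print(growth_factor(3, 12))             # 16.0
# A Moore's-Law-like doubling every 24 months, over the same year:
print(round(growth_factor(24, 12), 2))  # 1.41
```

A three-month doubling period thus yields a sixteen-fold increase per year, roughly an order of magnitude faster than a Moore's-Law pace, which is the point of the comparison in the text.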
Structured data has typically been the focus of most enterprise analytics and has been handled through use of the relational data model. Micro-texts, relationship data, images, and videos have seen such an explosion in quantity that the trend is to try to incorporate this data to generate value. The central benefit of Big Data analytics is the ability to process large amounts and types of information.

Big Data

To understand the components of Big Data, and to characterize the terms that are associated with Big Data and data science, the concepts will be addressed by category:

- Data Elements
- Dataset at Rest

- Dataset in Motion
- Analytics within Data Science
- Big Data Metrics
- Big Data Security and Protection

Big Data Patterns

- Data Process: Data Process Ordering Changes; Transactions
- Data Analytics Time Window
- Storage Medium Changes

Annex A (Informative) Examples and Use Cases of Big Data

Bibliography

[01]

la conception et l'exploitation d'un système électroniques

la conception et l'exploitation d'un système électroniques Philippe NEW WORK ITEM PROPOSAL SC3 MARTIN 171 1 Date of presentation Reference number 2008/07/29 (to be given by the Secretariat) Proposer ISO/TC / SC N Secretariat 170 A proposal for a new work item

More information

xxxxx Conformity assessment Requirements for third party certification auditing of environmental management systems - competence requirements

xxxxx Conformity assessment Requirements for third party certification auditing of environmental management systems - competence requirements NEW WORK ITEM PROPOSAL Date of presentation 2011-02-25 Reference number (to be given by the Secretariat) Proposer ISO/TC 207/SC 2 ISO/TC 207 / SC 2 N 251 Secretariat NEN A proposal for a new work item

More information

Private Circulation Document: IST/35_07_0075

Private Circulation Document: IST/35_07_0075 Private Circulation Document: IST/3_07_007 BSI Group Headquarters 389 Chiswick High Road London W4 4AL Tel: +44 (0) 20 8996 9000 Fax: +44 (0) 20 8996 7400 www.bsi-global.com Committee Ref: IST/3 Date:

More information

ISO/IEC JTC 1 Information technology. Big data

ISO/IEC JTC 1 Information technology. Big data ISO/IEC JTC 1 Information technology Big data Preliminary Report 2014 Our vision To be the world s leading provider of high quality, globally relevant International Standards through its members and stakeholders.

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V WHITE PAPER Create the Data Center of the Future Accelerate

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Cloud and Big Data Standardisation

Cloud and Big Data Standardisation Cloud and Big Data Standardisation EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam

More information

Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics

Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics Big Data Terminology - Key to Predictive Analytics Success Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics Outline Big Data Phenomena Terminology Role Background on

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Understanding traffic flow

Understanding traffic flow White Paper A Real-time Data Hub For Smarter City Applications Intelligent Transportation Innovation for Real-time Traffic Flow Analytics with Dynamic Congestion Management 2 Understanding traffic flow

More information

BIG Big Data Public Private Forum

BIG Big Data Public Private Forum DATA STORAGE Martin Strohbach, AGT International (R&D) THE DATA VALUE CHAIN Value Chain Data Acquisition Data Analysis Data Curation Data Storage Data Usage Structured data Unstructured data Event processing

More information

This document is a preview generated by EVS

This document is a preview generated by EVS INTERNATIONAL STANDARD ISO 10781 Second edition 2015-08-01 Health Informatics HL7 Electronic Health Records-System Functional Model, Release 2 (EHR FM) Informatique de santé Modèle fonctionnel d un système

More information

ANALYTICS BUILT FOR INTERNET OF THINGS

ANALYTICS BUILT FOR INTERNET OF THINGS ANALYTICS BUILT FOR INTERNET OF THINGS Big Data Reporting is Out, Actionable Insights are In In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

More information

Understanding the Value of In-Memory in the IT Landscape

Understanding the Value of In-Memory in the IT Landscape February 2012 Understing the Value of In-Memory in Sponsored by QlikView Contents The Many Faces of In-Memory 1 The Meaning of In-Memory 2 The Data Analysis Value Chain Your Goals 3 Mapping Vendors to

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A SURVEY ON BIG DATA ISSUES AMRINDER KAUR Assistant Professor, Department of Computer

More information

Data Virtualization A Potential Antidote for Big Data Growing Pains

Data Virtualization A Potential Antidote for Big Data Growing Pains perspective Data Virtualization A Potential Antidote for Big Data Growing Pains Atul Shrivastava Abstract Enterprises are already facing challenges around data consolidation, heterogeneity, quality, and

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

The InterNational Committee for Information Technology Standards INCITS Big Data

The InterNational Committee for Information Technology Standards INCITS Big Data The InterNational Committee for Information Technology Standards INCITS Big Data Keith W. Hare JCC Consulting, Inc. April 2, 2015 Who am I? Senior Consultant with JCC Consulting, Inc. since 1985 High performance

More information

Big Data Integration: A Buyer's Guide

Big Data Integration: A Buyer's Guide SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

for Oil & Gas Industry

for Oil & Gas Industry Wipro s Upstream Storage Solution for Oil & Gas Industry 1 www.wipro.com/industryresearch TABLE OF CONTENTS Executive summary 3 Business Appreciation of Upstream Storage Challenges...4 Wipro s Upstream

More information

Adobe Insight, powered by Omniture

Adobe Insight, powered by Omniture Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

INTERNATIONAL STANDARD

INTERNATIONAL STANDARD ISO/IEC 14543-4-2 INTERNATIONAL STANDARD Edition 1.0 2008-05 Information technology Home electronic system (HES) architecture Part 4-2: Communication layers Transport, network and general parts of data

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

Big Data Systems and Interoperability

Big Data Systems and Interoperability Big Data Systems and Interoperability Emerging Standards for Systems Engineering David Boyd VP, Data Solutions Email: dboyd@incadencecorp.com Topics Shameless plugs and denials What is Big Data and Why

More information

Tap into Big Data at the Speed of Business

Tap into Big Data at the Speed of Business SAP Brief SAP Technology SAP Sybase IQ Objectives Tap into Big Data at the Speed of Business A simpler, more affordable approach to Big Data analytics A simpler, more affordable approach to Big Data analytics

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

Build an effective data integration strategy to drive innovation

Build an effective data integration strategy to drive innovation IBM Software Thought Leadership White Paper September 2010 Build an effective data integration strategy to drive innovation Five questions business leaders must ask 2 Build an effective data integration

More information

This document is a preview generated by EVS

This document is a preview generated by EVS INTERNATIONAL STANDARD ISO/IEC 29180 First edition 2012-12-01 Information technology Telecommunications and information exchange between systems Security framework for ubiquitous sensor networks Technologies

More information

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.935.4445 F.508.988.7881 www.idc-hi.com

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.935.4445 F.508.988.7881 www.idc-hi.com Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.935.4445 F.508.988.7881 www.idc-hi.com L e v e raging Big Data to Build a F o undation f o r Accountable Healthcare C U S T O M I N D

More information

Banking On A Customer-Centric Approach To Data

Banking On A Customer-Centric Approach To Data Banking On A Customer-Centric Approach To Data Putting Content into Context to Enhance Customer Lifetime Value No matter which company they interact with, consumers today have far greater expectations

More information

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated

More information

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued

Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued Architecting an Industrial Sensor Data Platform for Big Data Analytics: Continued 2 8 10 Issue 1 Welcome From the Gartner Files: Blueprint for Architecting Sensor Data for Big Data Analytics About OSIsoft,

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Data virtualization: Delivering on-demand access to information throughout the enterprise

Data virtualization: Delivering on-demand access to information throughout the enterprise IBM Software Thought Leadership White Paper April 2013 Data virtualization: Delivering on-demand access to information throughout the enterprise 2 Data virtualization: Delivering on-demand access to information

More information

Wrangling Actionable Insights from Organizational Data

Wrangling Actionable Insights from Organizational Data Wrangling Actionable Insights from Organizational Data Koverse Eases Big Data Analytics for Those with Strong Security Requirements The amount of data created and stored by organizations around the world

More information

MANAGING USER DATA IN A DIGITAL WORLD

MANAGING USER DATA IN A DIGITAL WORLD MANAGING USER DATA IN A DIGITAL WORLD AIRLINE INDUSTRY CHALLENGES AND SOLUTIONS WHITE PAPER OVERVIEW AND DRIVERS In today's digital economy, enterprises are exploring ways to differentiate themselves from

More information

Beyond Watson: The Business Implications of Big Data

Beyond Watson: The Business Implications of Big Data Beyond Watson: The Business Implications of Big Data Shankar Venkataraman IBM Program Director, STSM, Big Data August 10, 2011 The World is Changing and Becoming More INSTRUMENTED INTERCONNECTED INTELLIGENT

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Big Data Standardisation in Industry and Research

Big Data Standardisation in Industry and Research Big Data Standardisation in Industry and Research EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

Machine Data Analytics with Sumo Logic

Machine Data Analytics with Sumo Logic Machine Data Analytics with Sumo Logic A Sumo Logic White Paper Introduction Today, organizations generate more data in ten minutes than they did during the entire year in 2003. This exponential growth

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances

High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances High-Performance Business Analytics: SAS and IBM Netezza Data Warehouse Appliances Highlights IBM Netezza and SAS together provide appliances and analytic software solutions that help organizations improve

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

What's New in SAS Data Management

What's New in SAS Data Management Paper SAS034-2014 What's New in SAS Data Management Nancy Rausch, SAS Institute Inc., Cary, NC; Mike Frost, SAS Institute Inc., Cary, NC, Mike Ames, SAS Institute Inc., Cary ABSTRACT The latest releases

More information

Systems and software engineering Lifecycle profiles for Very Small Entities (VSEs) Part 5-6-2:

Systems and software engineering Lifecycle profiles for Very Small Entities (VSEs) Part 5-6-2: TECHNICAL REPORT ISO/IEC TR 29110-5-6-2 First edition 2014-08-15 Systems and software engineering Lifecycle profiles for Very Small Entities (VSEs) Part 5-6-2: Systems engineering Management and engineering

More information

We are Big Data A Sonian Whitepaper

We are Big Data A Sonian Whitepaper EXECUTIVE SUMMARY Big Data is not an uncommon term in the technology industry anymore. It s of big interest to many leading IT providers and archiving companies. But what is Big Data? While many have formed

More information

Comparative Analysis of SOA and Cloud Computing Architectures using Fact Based Modeling

Comparative Analysis of SOA and Cloud Computing Architectures using Fact Based Modeling Comparative Analysis of SOA and Cloud Computing Architectures using Fact Based Modeling Baba Piprani 1, Don Sheppard 2, Abbie Barbir 3 1 MetaGlobal Systems, Canada 2 ConCon Management Services, Canada

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Data Centric Systems (DCS)

Data Centric Systems (DCS) Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems

More information

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie

Big data coming soon... to an NSI near you. John Dunne. Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data coming soon... to an NSI near you John Dunne Central Statistics Office (CSO), Ireland John.Dunne@cso.ie Big data is beginning to be explored and exploited to inform policy making. However these

More information

6 Cloud computing overview

6 Cloud computing overview 6 Cloud computing overview 6.1 General ISO/IEC 17788:2014 (E) Cloud Computing Overview Page 1 of 6 Cloud computing is a paradigm for enabling network access to a scalable and elastic pool of shareable

More information

Master big data to optimize the oil and gas lifecycle

Master big data to optimize the oil and gas lifecycle Viewpoint paper Master big data to optimize the oil and gas lifecycle Information management and analytics (IM&A) helps move decisions from reactive to predictive Table of contents 4 Getting a handle on

More information

Comparative Analysis of SOA and Cloud Computing Architectures Using Fact Based Modeling

Comparative Analysis of SOA and Cloud Computing Architectures Using Fact Based Modeling Comparative Analysis of SOA and Cloud Computing Architectures Using Fact Based Modeling Baba Piprani 1, Don Sheppard 2, and Abbie Barbir 3 1 MetaGlobal Systems, Canada 2 ConCon Management Services, Canada

More information

The emergence of big data technology and analytics

The emergence of big data technology and analytics ABSTRACT The emergence of big data technology and analytics Bernice Purcell Holy Family University The Internet has made new sources of vast amount of data available to business executives. Big data is

NSF Workshop: High Priority Research Areas on Integrated Sensor, Control and Platform Modeling for Smart Manufacturing

Purpose of the Workshop: In October 2014, the President's Council of Advisors on Science …

Next Generation Business Performance Management Solution

Why Existing Business Intelligence (BI) Products Are Inadequate. Changing business environment: in the face of increased competition, complex customer …

White Paper. Requirements of Network Virtualization

Contents: 1. Introduction; 2. Architecture of Network Virtualization; 3. Requirements for Network Virtualization (3.1 Isolation, 3.2 Network Abstraction).

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Agenda: Overview; What is Big Data?; Accelerates advances in computing technologies; Revolutionizes data measurement; …

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Prerita Gupta (Research Scholar, DAV College, Chandigarh) and Dr. Harmunish Taneja, Department of Computer Science and …

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

To increase competitiveness, 83% of CIOs have visionary plans that …

A New Era Of Analytic

Penang eGovernment Seminar 2014. Megat Anuar Idris, Head, Project Delivery, Business Analytics & Big Data. Agenda: Overview of Big Data; Case Studies on Big Data; Big Data Technology Readiness.

ISO/IEC JTC 1/SC 38 N 282

ISO/IEC JTC 1/SC 38, Distributed application platforms and services (DAPS). Secretariat: ANSI. Document type: Request for comments. Title: Draft Study Group on Cloud Computing Report.

TopBraid Insight for Life Sciences

TopBraid Insight for Life Sciences TopBraid Insight for Life Sciences In the Life Sciences industries, making critical business decisions depends on having relevant information. However, queries often have to span multiple sources of information.

Architecting an Industrial Sensor Data Platform for Big Data Analytics

For decades, organizations have been evolving best practices for IT (Information Technology) and OT (Operational Technology).

The Lab and The Factory

The Lab and The Factory The Lab and The Factory Architecting for Big Data Management April Reeve DAMA Wisconsin March 11 2014 1 A good speech should be like a woman's skirt: long enough to cover the subject and short enough to

Mohan Sawhney, Robert R. McCormick Tribune Foundation Clinical Professor of Technology, Kellogg School of Management, mohans@kellogg.northwestern.edu

Transportation Center Business Advisory Committee Meeting.

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

The Scientific Data Mining Process

Chapter 4. "When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean, neither more nor less." Lewis Carroll [87, p. 214].

IBM Enterprise Content Management Product Strategy

White Paper, July 2007. IBM Information Management software. Contents: IBM Innovation; Enterprise Content Management (ECM); IBM Investment in ECM; IBM ECM Vision.

PUBLICLY AVAILABLE SPECIFICATION PRE-STANDARD

IEC/PAS 62443-3, Publicly Available Specification (pre-standard), Edition 1.0, 2008-01: Security for industrial process measurement and control. Network and system security. International Electrotechnical Commission.

Datenverwaltung im Wandel (Data Management in Transition): Building an Enterprise Data Hub with Cloudera

Bernard Doering, Regional Director, Central EMEA, Cloudera. Cloudera, "Your Hadoop Experts", founded 2008 by former employees of …

Cybersecurity Analytics for a Smarter Planet

IBM Institute for Advanced Security, White Paper, December 2010. Enabling complex analytics with ultra-low latencies on cybersecurity data in motion.

ISO/IEC TR 20000-9: Guidance on the application of ISO/IEC 20000-1 to cloud services

Technical Report, first edition, 2015-02-15. Information technology, Service management, Part 9: Guidance on the application of ISO/IEC 20000-1 to cloud services.

Improving Data Processing Speed in Big Data Analytics Using HDFS Method

M. R. Sundarakumar, Assistant Professor, Department of Computer Science and Engineering, R.V. College of Engineering, Bangalore, India.

A Hurwitz white paper. Inventing the Future. Judith Hurwitz President and CEO. Sponsored by Hitachi

Introduction: Only a few years ago, the greatest concern for businesses was being able to link traditional IT with the requirements of business units.

Advanced In-Database Analytics

Advanced In-Database Analytics Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
