POTENTIALS OF ONLINE MEDIA AND LOCATION-BASED BIG DATA FOR URBAN TRANSIT NETWORKS IN DEVELOPING COUNTRIES

Kelsey Lantz*
Ph.D. Student, Glenn Department of Civil Engineering
Lowry Hall, Clemson University, SC

Sakib Khan
Ph.D. Student, Glenn Department of Civil Engineering
Lowry Hall, Clemson University, SC

Linh B. Ngo
Research Associate, Big Data Systems Lab, School of Computing
McAdams Hall, Clemson University, SC

Mashrur Chowdhury
Eugene Douglas Mays Professor of Transportation, Glenn Department of Civil Engineering
Lowry Hall, Clemson University, SC

Sarah Donaher
Undergraduate Student, Glenn Department of Civil Engineering
Lowry Hall, Clemson University, SC

Amy Apon
Professor, Chair of Computer Science Division, School of Computing
McAdams Hall, Clemson University, SC

*Corresponding author
Lantz, Khan, Ngo, Chowdhury, Donaher, Apon

ABSTRACT
Big data, collected in the form of social media posts and mobile phone location tracking, has great potential to inform and manage the planning and operation of transit networks in developing countries. Data is widely available, but the challenge, as in developed countries, is figuring out how best to use it. This paper uses a case study method to look at approaches in Nairobi, Kenya; Istanbul, Turkey; and Dhaka, Bangladesh. In Nairobi, Global Positioning System (GPS) location data has been collected to generate the first map of the complex Matatu transit network. In Istanbul, data from automated fare collection systems is processed to better understand the usage of a Bus Rapid Transit (BRT) system. In Dhaka, researchers are collecting GPS positioning data to manage the city bus networks. Residents of these developing cities are frequent users of online media, as are residents of many other cities in developing countries. This study revealed that integrating online media with location-based data creates a big data scenario, which has potential for supporting transit operations while posing challenges for managing data mobility. Of course, it is not realistic to apply a one-size-fits-all approach to any problem in the developing world, but together, the case studies show that with the right approach, technical capacity in transitional cities has the potential to grow to support higher-level data processing and so enable more efficient and more sustainable policy decisions for crucial urban transit networks in developing countries.
INTRODUCTION
Crippling congestion, deteriorating air quality, decreasing road safety, and massive growth in energy consumption are symptoms of rapid motorization in developing cities around the world. Transit networks are a critical piece of the mobility puzzle because their effective operation can keep would-be motorists off already congested roadways, and in general they move more people with a smaller carbon footprint (). By leapfrogging traditional traffic sensing techniques, developing countries may capitalize on the technology available today to harness widely available big data, better understand the operation of formal and informal transit networks, and subsequently optimize their planning and operation.

Big data is a blanket term typically used to describe data that are too complex and too large to be processed using traditional storage and analytic infrastructures. By aggregating prevalent and publicly accessible data sources, big data can provide a wealth of opportunities for improving the planning and operation of the crucial transit networks that cities in developing countries rely on. Growing economies are increasingly taking advantage of data-intensive technologies (). With the right framework, big data-based knowledge can inform intelligent decision-making and optimize transit networks, thereby saving time, money, energy, and lives through improved safety and reduced travel time.

This paper seeks to identify some of the benefits and challenges of using big data to plan and operate urban transit systems and subsequently to develop a replicable architecture for communication among the relevant stakeholders. To do so, we explore case studies in Nairobi, Kenya; Istanbul, Turkey; and Dhaka, Bangladesh to investigate the practical applications of big data in transit planning and operation.
Developing countries that lack conventional transportation infrastructure have the unique benefit of being able to capitalize on lessons learned from industrialized countries. The problems that plague an urban transportation network are well known, and solutions do exist (). Rapid motorization presents a complicated dilemma: rising incomes and the subsequent rise in demand for private vehicles have led to forecasts that the transportation sector will be the highest source of future greenhouse gas emissions. However, to restrict car use is to restrict the personal freedom of potential drivers, and such restrictions can be expected to be wildly unpopular politically. Ideally, automobile substitutes can be improved in these developing cities. However, this requires significant investment in physical infrastructure solutions such as heavy rail and popular Bus Rapid Transit (BRT) networks, which could be a challenge for developing countries in terms of time, effort, and money. Intelligent transportation systems (ITS) with new software and hardware subsystems must accompany this improvement in physical transit networks in order to face the complex challenges of future urban mobility (,). An integrated approach that includes both the hardware of new transit infrastructure and the software of system management is critical for urban planners in developing cities (). Initial transit development policies can be adopted to reduce the impacts on existing physical infrastructure and to reduce the costs of the improvement process.

The planning of any transit development strategy consists of many decisions that are plagued with varying levels of uncertainty. Theoretically, the outcome of every decision has a certain probability based on some prior information. If we can improve the quality of that prior information, this uncertainty will be reduced.
This reflects a well-established mathematical theorem of information theory that supports several types of statistical and probabilistic analysis (). Data available from online sources, such as social media and weather services, can provide significant input to the decision-making process. However, with the availability of large amounts of
data comes the challenge of managing these big data efficiently and effectively. The objectives of this study are to assess the potential of online data sources for supporting public transit operations in developing countries and to identify strategies for managing these big data in the context of developing countries.

Big Data
For this study, we analyze how certain varieties of big data can be collected and processed to improve the functionality of urban transit systems in developing countries. These data sets include two types of data: location-based data from mobile devices and text-based data from online media sources including Twitter, Facebook, FourSquare, and weather reporting services. Most individual media posts contain little informational value, but the aggregation of millions of messages over time can reveal important patterns. In the absence of other geographic input, text embedded in social media posts can carry large amounts of information. One of the most significant advantages of using location-based social media data is the ability to track activity purposes, which can lead to more comprehensive planning in the future (). Although large amounts of data make sampling error irrelevant, this does not necessarily make the sample representative. While mobile penetration in the developing world is high, Internet access is notably less universal, so Internet-derived data do not represent an accurate cross section of potential transit ridership. It is worth remembering that Twitter does not evenly represent all people. However, content-based approaches to geo-locating Twitter users are currently being used for public health research (), and similar methods could be applied to collecting and processing data regarding transit systems ().
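To make the idea of content-based geo-location concrete, the following is a minimal sketch in Python, not the method used in the cited studies. It assumes a small hand-built gazetteer that maps place names to planning zones; the place names, zone labels, and function names are all hypothetical. Each post is matched against the gazetteer, and mentions are then aggregated so that patterns emerge from many individually uninformative posts.

```python
import re
from collections import Counter

# Hypothetical gazetteer: place name (lowercase) -> planning zone label.
GAZETTEER = {
    "central station": "zone_1",
    "market road": "zone_2",
    "river bridge": "zone_3",
}


def locate_post(text, gazetteer=GAZETTEER):
    """Return the sorted, de-duplicated zones whose place names occur in the text."""
    lowered = text.lower()
    zones = set()
    for name, zone in gazetteer.items():
        # Whole-phrase, case-insensitive match on the place name.
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            zones.add(zone)
    return sorted(zones)


def aggregate_mentions(posts, gazetteer=GAZETTEER):
    """Count how many posts mention each zone."""
    counts = Counter()
    for post in posts:
        counts.update(locate_post(post, gazetteer))
    return counts


if __name__ == "__main__":
    sample_posts = [
        "Stuck at Central Station again",
        "Bus late near Market Road and central station",
        "Nice weather today",
    ]
    print(aggregate_mentions(sample_posts))
```

A real deployment would need a much larger gazetteer, handling of local languages and spelling variants, and a correction for the uneven representation of riders noted above.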
Alternatively, location-based data is a direct report of a physical location, transmitted through a number of sources, including in-person card payment data, GPS chips in mobile devices, and cell-tower triangulation on mobile devices. Both formats of data can be configured into the open-source General Transit Feed Specification (GTFS). Public transit schedules and their associated geographic information are defined in this common format. These feeds permit transit agencies to make their data available to developers so that they can write applications that use the data in a compatible way, keeping it usable in the future (0). By its open source nature, GTFS represents an accessible and wide-ranging platform for future expansion of the initiative to harness big data for the planning and operation of transit networks.

Big data typically refers to data sets with sizes beyond the ability of traditional software tools to capture, curate, manage, and process within a reasonable amount of time. At a glance, GIS data do not seem to take up a lot of space. Minimally, a GPS record includes latitude, longitude, and a time stamp. In commercial systems, latitude and longitude are typically stored as type double, which occupies bytes for each value (,). The time stamp can be stored using bytes. This brings the minimal size of a GPS record to 0 bytes. In a typical GPS map-matching process, the sampling rate of GPS data on one device can be as high as once every 0 seconds (0 samples per device per day) or as low as once every minutes (0 samples per device per day) (). The amount of GPS data collected daily can be calculated as follows:

High-rate bytes per day = 0 * 0 samples per device per day * (number of GPS devices)
Low-rate bytes per day = 0 * 0 samples per device per day * (number of GPS devices)

With these equations, we can estimate the amount of data collected daily through GPS devices for Dhaka, Nairobi, and Istanbul, shown below in Table.
The number of GPS-enabled devices
was approximated by multiplying the percent of mobile phone ownership in each city by its transit ridership ()().

TABLE. Bandwidth Required in Each Case Study City
Columns: City; Number of devices (GPS-enabled cell phones/GPS-enabled vehicles); Daily GPS data collected with a high sampling rate (Gigabytes); Daily GPS data collected with a low sampling rate (Gigabytes)
Nairobi: ,00,
Istanbul: ,, 0..
Dhaka: ,000,

In the lowest sampling case, the city of Dhaka will acquire Terabytes (TB) of data every year, and in the highest sampling case, the city of Istanbul will acquire TB of data every year. In terms of network bandwidth, we are looking at Mbps (megabits per second) for the lowest case and 0. Mbps for the highest case. This network bandwidth does not account for potential bandwidth expansion for video streams from traffic cameras, which could take up to 00 Kbps (kilobits per second) for each camera (). Video traffic monitoring could improve the accuracy of traffic management and act as an important component in future ITS initiatives. As another example, the city of London's CCTV system requires a total bandwidth of . gigabits per second (Gbps) to support up to 00 cameras (). These estimates give us an idea of the storage capacity and network bandwidth required to support these GPS data streams. To put this into perspective, the Palmetto Supercomputer, ranked in the top fastest supercomputers for U.S. academic institutions, has only TB for daily storage. The Palmetto Supercomputer has two dedicated T lines at . Mbps per line to connect to the Internet.

For online media data, we include not only communication platforms such as Twitter or Facebook but also data sources that can augment traffic data, such as weather information, online resources listing public and private events that can impact traffic, and data feeds. These data come in a variety of formats, including but not limited to PDF, CSV, and structured/unstructured XML ().
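The volume and bandwidth estimates in this section follow from simple arithmetic, sketched below in Python. This is an illustration under stated assumptions rather than a reproduction of the paper's figures: the 8-byte size is the standard width of a double-precision value, the 4-byte time stamp is an assumption, and the device count and sampling interval in the usage example are hypothetical placeholders. All function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def gps_record_bytes(coord_bytes=8, timestamp_bytes=4):
    """Minimal record size: latitude + longitude + time stamp.

    coord_bytes=8 is the width of a double; timestamp_bytes=4 is an assumption.
    """
    return 2 * coord_bytes + timestamp_bytes


def daily_gps_bytes(n_devices, sampling_interval_s, record_bytes):
    """Bytes per day = record size * samples per device per day * number of devices."""
    samples_per_device = SECONDS_PER_DAY // sampling_interval_s
    return record_bytes * samples_per_device * n_devices


def average_mbps(bytes_per_day):
    """Average sustained bandwidth in megabits per second (10**6 bits)."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


def yearly_terabytes(bytes_per_day, days=365):
    """Yearly storage in terabytes (10**12 bytes)."""
    return bytes_per_day * days / 1e12


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 devices reporting once every 10 seconds.
    per_day = daily_gps_bytes(1000, 10, gps_record_bytes())
    print(per_day, average_mbps(per_day), yearly_terabytes(per_day))
```

The average bandwidth understates peak demand, since transit activity is concentrated in commuting hours; a capacity plan would size links for the peak rather than the daily mean.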
Through the above analysis, data coming from GPS-enabled devices and online media sources for traffic management purposes can be classified as big data. We have a large volume of data in a variety of formats, arriving at a high velocity and with veracity concerns due to missing and/or duplicated GPS data or ambiguous/incorrect online data (). This motivates the need for an Information Technology (IT) infrastructure that can support big data storage and analytics. Given the high volumes of these data sets collected over time, the capacity of public agencies in developing cities to process the data can be quickly outstripped.

Fortunately, there is a global movement to make transit data open, for several reasons. Through the Open Government Partnership, governments from more than countries have committed to empower citizens, encourage transparency, condemn corruption, and take advantage of novel technologies in order to strengthen governance, with a strong emphasis on open data as a way to achieve this (0). The open source provision of structured datasets by governments will contribute to improving the transparency and accountability of states, enabling new forms of civic engagement, and stimulating economic growth and development (0). For a policy to be effective, planning should be driven by inclusive dialogue, technical expertise, and available data. Making more transport
data available enables technology innovation that supports better operations. Additionally, by standardizing the format through a structure like GTFS, open source software can be used to develop planning tools. These tools show planners how the city operates and provide a platform for gathering public input more efficiently through crowdsourcing. Big, open source transit data collected by telecommunications companies and social media can help to analyze traffic flows and other important urban dynamics. From an IT point of view, the big data infrastructure for ITS needs to support the four aspects of big data: volume, velocity, variety, and veracity. Volume, velocity, and variety can be supported through storage components, and veracity will be handled through processing/analytics components.

Benefits and Challenges
The benefits of utilizing big data for the improvement of transit planning and operation are numerous. Big data delivers cost-effective prospects for improving decision making in critical development areas like transportation planning and disaster response. Additionally, traditional household travel surveys in developing countries have proven to be expensive and inefficient () relative to the possibilities afforded by big data collection and analysis. Such surveys are not conducted regularly by any local public agencies, so finding a more productive and efficient way to analyze the travel behavior of a complex urban network will save time and effort for city planners. Alongside these benefits of big open source data for transit planning and operation, several challenges plague the process. As in all big data endeavors, security and privacy concerns and interoperability challenges may limit the scale of projects at this time.
Encrypting data, though useful for addressing privacy concerns, will also increase the bandwidth needed to support the networks. Completing these projects in the developing world means that we also have to overcome limited technological infrastructure as well as economic and human resource scarcity. It is particularly important to note that each city in the developing world has its own individual assets and weaknesses, and that it is impractical to apply a one-size-fits-all solution to a diverse array of transit networks.

METHODS
This paper evaluates the usage of big data collection and processing for transit networks in three urban areas: Nairobi, Istanbul, and Dhaka. Nairobi's network is informal, unregulated, and fluid; data is being collected on routes and stops in order to understand the system and improve it. Istanbul enjoys a fairly functional BRT network, and data collected from the automated fare collection service is helping planners to optimize the network. Dhaka has an overcrowded bus system that lacks a map or clearly marked stops, and researchers are using GPS data to better understand the workings of the system. Analysis of existing systems in these cities revealed that, with adequate technical capacity, information and communication technology can greatly improve the sensing and processing of data regarding transit network operations in developing cities. There are many opportunities to utilize this information, including formalizing bus stops to increase the safety of riders, regulating and standardizing fares, and promoting open source data reserves to encourage the development of mobile phone applications that assist existing and new transit riders. We then generate potential architectures for communication technology and for data storage and processing.
CASE STUDIES
Table shows organizational and data details for the three cities that were the focus of case studies of applications of big data for transit planning and operation. Nairobi, Istanbul, and Dhaka are all large, rapidly growing metropolitan regions in developing countries that are making an effort to harness big data to optimize their transit networks. For this paper, we identify approaches to collecting and processing data. We develop an architecture for existing and future data flows within each city, and we calculate the required bandwidth capacity for the data flows.

TABLE. Case Study Cities and Their Data
City | Existing Transit System | Main Agency Pursuing Big Data | Sample Data Types Collected for Planning
Nairobi | Informal Matatu Network | University of Nairobi | GPS locations from mobile phones
Istanbul | BRT | Metropolitan Municipality of Istanbul | User fares
Dhaka | Informal Private Bus Network | Urban Launchpad | GPS locations and rider surveys

Nairobi
It is often argued that Africa's greatest challenge is its infrastructure. Growth has outpaced the capacity of infrastructure, especially with respect to urban mobility. Large African cities like Nairobi are in desperate need of evidence-based policy advice, where big data can play an important role. A popular response to heavy congestion problems in cities like Nairobi has been significant investment in extensive public transit systems, including both underground and overland metro systems. Substantially fewer resources have been allocated for improving and expanding public bus networks, which are often poorly maintained and quite crowded. In many cases, privately owned mini-bus operators have met the need for affordable public transit by serving areas where standard bus routes are insufficient for the travel demand ().
Researchers with the University of Nairobi and their collaborators are working to standardize and make transit data available for the network of Matatus, the informal city bus system. Using open source transit software in conjunction with existing digital mapping efforts, an inclusive framework for acquiring and mapping transportation data will emerge (). The data includes 0 routes and stops and has already spawned the development of two useful apps. These apps can now help to gather other useful data on accidents and congestion via crowdsourcing.

Industry partners are also getting on board with the open source big data movement in Nairobi. IBM has pioneered a solution that enables Nairobi commuters to use their mobile devices to obtain route advice based on current estimates of traffic congestion. Using special algorithms to read and transmit images from CCTV cameras located around the metropolitan region of Nairobi, the system lets both drivers and public transit passengers use their mobile devices to stay updated on current road conditions and receive recommendations for alternate routes. Only cameras are currently operational, so IBM researchers have used network analytics to augment the data, enabling the system to forecast otherwise undetected patterns of mobility. The system works on both basic mobile phones with an SMS-based system and on smart phones through an app where users can view a city map that displays several
routing options around existing traffic jams. Researchers at IBM are developing the capability to include various other types of data, including public safety, road conditions, and scheduled construction, in order to produce a more nuanced perspective on urban mobility.
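As an illustration of how field-collected stop locations for a network like the Matatus might be standardized, the sketch below averages repeated GPS observations of each named stop and writes them in the layout of a GTFS `stops.txt` file. The column names `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` come from the GTFS specification; the stop names, coordinates, ID scheme, and function names are hypothetical, and this is not a description of the Nairobi team's actual workflow.

```python
import csv
import io
from collections import defaultdict

GTFS_STOP_FIELDS = ["stop_id", "stop_name", "stop_lat", "stop_lon"]


def build_stops(observations):
    """Average repeated (stop_name, lat, lon) observations into one row per stop."""
    grouped = defaultdict(list)
    for name, lat, lon in observations:
        grouped[name].append((lat, lon))

    stops = []
    for index, name in enumerate(sorted(grouped), start=1):
        points = grouped[name]
        mean_lat = sum(p[0] for p in points) / len(points)
        mean_lon = sum(p[1] for p in points) / len(points)
        stops.append({
            "stop_id": f"S{index}",  # hypothetical ID scheme
            "stop_name": name,
            "stop_lat": round(mean_lat, 6),
            "stop_lon": round(mean_lon, 6),
        })
    return stops


def stops_to_gtfs(stops):
    """Serialize stop rows as the text of a GTFS stops.txt file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=GTFS_STOP_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(stops)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up observations; two surveyors recorded "Stage A" slightly differently.
    field_data = [
        ("Stage B", 11.0, 21.0),
        ("Stage A", 10.0, 20.0),
        ("Stage A", 10.2, 20.4),
    ]
    print(stops_to_gtfs(build_stops(field_data)))
```

A complete feed also needs files such as `routes.txt`, `trips.txt`, and `stop_times.txt`; informal networks without fixed schedules typically require conventions for approximating these.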
FIGURE Transit management architecture for Nairobi - Produced using Turbo Architecture
Figure shows a transit management architecture for Nairobi developed by the authors based on the data available to them. The architecture is based on the Matatu transit system. Nairobi does not enjoy a ready framework for a big data system like Istanbul has, but some of the required subsystems are already in place. IBM has one of its global laboratories there, which collects data from the CCTV cameras installed on local roads. This IBM center can act as the central data management center. The transit map of the Matatu routes has also already been developed. Proper integration between the transit system and the data center (the IBM lab) will help to provide a complete framework for collecting transit data. The mobile network and online media of Nairobi can fill the gaps in information flow regarding incident occurrence or transit vehicle status. The Nairobi Traffic Command Center is suggested to act as the public transit management center. Using the data provided by the traffic command center, vehicles will be updated on route conditions, fare schedules, and other factors.

Istanbul
Residents of Istanbul, Turkey rely on a centrally operated BRT system. Data collected from automated fare collection systems is being processed daily. Istanbul's bus lines use a smart card that enables speedier payment and the collection of user data. The card is swiped by users when they get on a bus and, if they do not travel the entire line, at their destination when they collect their refund. The card can track a rider's origin station, destination, frequency of rides, and some personal characteristics of the rider (including their status as a student, adult, or senior). Though this creates a constant incoming stream of transportation data, the information is not yet achieving its highest potential (). There are still issues halting progress in analyzing this data in order to improve Istanbul's transportation.
The amount of data is overwhelming and, to date, no efficient way of sorting through and analyzing it has been implemented. Furthermore, the data is not collected for the purpose of transportation analysis, so additional data of a demographic, land use, and spatial nature, for example, will likely be needed in order to reach relevant, actionable conclusions (). Istanbul's Bus Rapid Transit (BRT) system, which travels between the Asian and European parts of the city, also uses the Istanbulkart, or smart payment card. This bus system is environmentally friendly and greatly helps relieve congestion. However, % of commuters in Istanbul still drive their own cars to work (). Since the BRT network operates in primarily separate lanes, it maintains an appeal in its own right as traffic congestion increases in regular travel lanes. If the bus system were streamlined and made more appealing to the public, traffic congestion could continue to be significantly reduced. Big data could be used to accomplish this. For example, the analysis of big data led to the conclusion that a high volume of passengers was consistently overwhelming the Mecidiyekoy station. In response, a new station was built, greatly lessening the strain on the Mecidiyekoy station. The greatest roadblock to utilizing the BRT's data collection system is that destination data is only collected for those travelers who collect the refund on their Istanbulkarts at their destination station, who are a mere % of travelers.

Another project, called Insights in Motion, utilizes IBM analytics software to study how to make transportation more convenient and comfortable. The study is currently being conducted in two cities: Dubuque, Iowa and Istanbul, Turkey. This project looks at cities in motion and aims to decipher the direction and motivation behind the movement of a city: basically, how and why people move around.
This new information will be used to develop new infrastructure while retrofitting old infrastructure to its full potential. This project analyzes data
gathered from transportation systems, census records, and cellular devices. In Istanbul, this project hopes to have big impacts. It aims to reduce operating expenses by 0%, meet % more demand, reduce average commuter time by 0%, and reduce per-traveler combustion emissions by 0% (). The Istanbul Metropolitan Municipality briefly opened up all of its transport data from November -, 0, to allow for a citywide competition in which teams used the data to code and design smart city applications. The competition focused on three domains: smart mobility, smart tourism, and smart participation (). Istanbul could be coming into a new era of efficient transportation if projects like Insights in Motion continue to be implemented. These projects use big data to create efficiency, allowing us to know where and how to allocate resources, when certain areas of infrastructure are most likely to be overwhelmed, and where people travel. Many transportation analysts and experts have begun to realize that supplying open and thorough data is the best, and possibly the only, way to begin developing smarter, sustainable cities. Though Turkey still does not have an open data policy, events like the mobile application competition will hopefully encourage the Istanbul Metropolitan Municipality to open up its transit data and allow individuals and companies to begin analyzing the data in order to create a more sustainable Istanbul.

Figure shows the proposed transit management architecture for Istanbul developed by the authors based on the data available to them. Among the selected study areas, Istanbul already has an active big data management framework to manage the data collected from the Istanbulkart (traveler card) of the local BRT system. In Istanbul, traffic sensors and cameras are installed on the highways.
Data collected from the sensors and cameras can help the Istanbul Department of Fire Brigade in case of incidents or fire hazards. In addition, a massive quantity of data generated by media and wireless networks is expected to be collected by the data center, which will archive all available data for future planning purposes.
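The kind of fare-card analysis described for Istanbul, such as spotting a consistently overloaded station, can be sketched as follows. This is a hedged illustration and not the municipality's or IBM's method: the record layout (card ID, station, ISO time stamp), the station labels, the capacity threshold, and the function names are all assumptions. The second function also reports the share of trips with no recorded destination, reflecting the limitation that only riders who tap out for a refund reveal where they alight.

```python
from collections import Counter
from datetime import datetime


def boardings_per_station_hour(taps):
    """Count tap-ins per (station, hour of day).

    taps: iterable of (card_id, station, iso_timestamp) records.
    """
    counts = Counter()
    for _card_id, station, timestamp in taps:
        hour = datetime.fromisoformat(timestamp).hour
        counts[(station, hour)] += 1
    return counts


def overloaded_slots(counts, capacity_per_hour):
    """Return the (station, hour) slots whose boardings exceed the capacity."""
    return sorted(slot for slot, n in counts.items() if n > capacity_per_hour)


def od_matrix(trips):
    """Build origin-destination counts from (origin, destination) pairs.

    destination is None when the rider never tapped out; the second return
    value is the share of such trips.
    """
    matrix = Counter()
    missing = 0
    total = 0
    for origin, destination in trips:
        total += 1
        if destination is None:
            missing += 1
        else:
            matrix[(origin, destination)] += 1
    share_missing = missing / total if total else 0.0
    return matrix, share_missing


if __name__ == "__main__":
    taps = [
        ("c1", "A", "2014-01-06T08:05:00"),
        ("c2", "A", "2014-01-06T08:40:00"),
        ("c3", "B", "2014-01-06T08:10:00"),
    ]
    counts = boardings_per_station_hour(taps)
    print(overloaded_slots(counts, capacity_per_hour=1))
```

Because the tap-out sample is self-selected, an origin-destination matrix built this way would need reweighting or supplementary data before it could be treated as representative.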
FIGURE Transit management architecture for Istanbul - Produced using Turbo Architecture
Dhaka
As the most densely populated city in Bangladesh and one of the fastest growing cities in the world, Dhaka is frequently plagued by high levels of congestion with heterogeneous traffic. Dhaka is poised at a critical moment in its development where it must choose whether to allow rampant motorization to choke its already congested streets or to determine the best way to improve public transit and enable a more seamless integration of an already heterogeneous traffic situation (). A common policy reaction to such motorization is to increase roadway capacity in order to decrease congestion. However, it is well known that this strategy tends to encourage long-term motorization patterns through induced demand. In other words, expanding capacity attracts new trips by improving driving conditions, which causes the city and its transport network to become more and more auto-centric. An alternative to rampant infrastructure growth to accommodate the endlessly growing fleet of private vehicles is to invest in technology that will support a more sustainable transit network in the long term (). With the right architecture, Dhaka can accomplish this arduous but necessary task. It should be noted that the Asian Development Bank approved a loan for a Dhaka BRT system in 0, which has not yet materialized (). We analyze the urban transit system as it exists today and plan for a future that could accommodate a centrally planned BRT corridor.

Researchers are exploring how smart phone technology can improve the bus and minibus public transportation network through GIS tracking. The bus network is independently owned and operated and largely unregulated, leaving many questions as to the true effectiveness of the service. This situation is expected to improve in the near future through the integration of a GPS system with the proposed bus rapid transit system (BRTS) in Dhaka.
With the installed GPS technology, real-time data regarding BRTS arrival and departure times and incident locations can be transferred instantly to a transit management center.

Figure shows a future transit management architecture for Dhaka developed by the authors based on the data available to them. In Dhaka, the current transit system consists of transit vehicles from multiple private and public bus companies on every route. The existing traffic control center is inactive due to a lack of coordination among the transit vehicles, the control center, and travelers. Road excavation for drainage system management and underground cable connection is also a very common sight in Dhaka; it often blocks bus stops and hinders the normal boarding and alighting of passengers. In addition, the Bangladesh Fire Service and Civil Defense acts as an emergency management center in case of incidents such as fatal accidents or fire hazards in public buses. In the architecture developed in this study, these subsystems are included because they are indispensable for smooth transit system flow. As shown in Figure, the Bangladesh Road Transport Authority (BRTA) can play the role of the data management center with its current manpower and infrastructure. As suggested, BRTA will receive information from the traffic control room, the City Corporation, the fire service, and wireless network data. Transit vehicles will be assigned by the central traffic control center. Travelers in Bangladesh already use SPASS, an automated fare collection system, and they can get information from a cellular network. Data will flow from media and the cellular network to BRTA. BRTA already maintains a log of transit vehicle fitness information, so transit vehicle compliance with maintenance and safety requirements can be checked from the archive. Detailed information about each vehicle can also be derived from the database of vehicle registration numbers.
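One way a transit management center could turn a vehicle's raw GPS pings into arrival times is sketched below. It computes great-circle distance with the haversine formula and reports the first ping that falls within a chosen radius of a stop. The 50 m radius, the ping layout (time stamp, latitude, longitude), and the function names are assumptions made for illustration; the paper does not specify how arrival detection would be implemented.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def first_arrival(pings, stop, radius_m=50.0):
    """Return the time stamp of the first ping within radius_m of the stop.

    pings: time-ordered iterable of (timestamp, lat, lon).
    stop: (lat, lon) of the stop. Returns None if the vehicle never arrives.
    """
    stop_lat, stop_lon = stop
    for timestamp, lat, lon in pings:
        if haversine_m(lat, lon, stop_lat, stop_lon) <= radius_m:
            return timestamp
    return None


if __name__ == "__main__":
    # Hypothetical pings approaching a stop at (0.0, 0.0); time stamps in seconds.
    pings = [(0, 0.0, 0.01), (10, 0.0, 0.001), (20, 0.0, 0.0002)]
    print(first_arrival(pings, (0.0, 0.0)))
```

With low sampling rates a vehicle can pass a stop between two pings, so the radius and the sampling interval have to be chosen together; interpolating between consecutive pings is one possible refinement.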
FIGURE Transit management architecture for Dhaka - Produced using Turbo Architecture
FUTURE POTENTIALS
In order for big data to work effectively for transit planning and operation, technical capacity must be developed and maintained in the developing world. An overview of a vision for the ITS information infrastructure is shown in Figure.

FIGURE Multiscale/multiresolution Design of the ITS Information Infrastructure and the available options at each storage level

This infrastructure includes a flexible hierarchical distributed system that can incorporate design and configuration tradeoffs at several levels. For example, tradeoffs to be considered are when and where the data will be stored, processed, and cached in different layers of the data infrastructure decision support system to optimize metrics such as response time, cost, and reliability. Each layer of this infrastructure represents a subset of the computing infrastructure that must be designed to support the size and velocity of the data streams. At the lowest level, the streaming process, document store technologies such as MongoDB or CouchDB can handle incoming data streams with high velocity. These database technologies can be implemented directly upon the standard Linux file system on the local hard drives of the computing infrastructure for this level. In a performance evaluation study, MongoDB was found to be able to support a write volume of up to Mb/s (). With the document store acting as a buffer, data will be transferred to an extensible record store such as HBase on the next layer, the Extract-Transform-Load (ETL) process, and possibly imported into a MySQL database at the streaming process level to support real-time data analytics.
Built on top of the Hadoop Distributed File System (HDFS) (0) and based on Google's Bigtable technology (), HBase not only provides massive tabular storage, with billions of rows and hundreds of thousands of columns for unstructured data, but can also be extended dynamically and is extremely resilient to hardware failures. Both the document store and the extensible record store can support unstructured data in a variety of formats (). After the ETL process, the data can be integrated into a common massive-scale data warehouse, from which long-term analytic and strategic planning jobs can be run. Once again, a storage mechanism such as HBase would be useful at this layer, as HBase can scale to petabytes of data and can easily be distributed across geographically separate locations.

One of the challenges in collecting transportation data, particularly in developing countries, is the lack of data standards. For this reason, all the sample technologies described in Figure were chosen so that they can ingest data in any format or structure; they do not rely on the tabular constraints of a standard RDBMS. However, these unstructured data must still be integrated into a common format usable by analysts. This challenge, together with the issues of missing and inaccurate data, is often handled through machine learning techniques (,). Built on top of HDFS, HBase can take advantage of Mahout, a scalable machine learning and data mining library. For other types of data analytics, HDFS enables a wide variety of programming tools and libraries. One such tool is MapReduce (), a batch-oriented programming library that can support hundreds of thousands of concurrent computing processes. Another is Spark (), an in-memory analytic tool that supports very fast data processing. Previous work has used Hadoop HDFS/MapReduce in different aspects of traffic management, such as estimating vehicle carbon emissions (), providing dynamic route guidance (), and predicting traffic congestion (). The capabilities of the Hadoop-based storage infrastructure and its supporting components described above provide a theoretical validation of the feasibility and potential of our proposed framework.
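A small single-process sketch can illustrate the three ideas in this passage: mapping records with differing field names onto a common format, filling in missing values, and aggregating in a map/reduce style. It is only an illustration under stated assumptions: the field names (`route`, `route_id`, `line`, `speed_kmh`, `speed_ms`) are hypothetical, simple per-route mean imputation is used as a placeholder for the machine learning techniques cited, and the map and reduce phases run in plain Python rather than on Hadoop.

```python
from collections import defaultdict
from statistics import mean


def normalize(record: dict) -> dict:
    """Map differently named source fields onto one common schema."""
    route = record.get("route") or record.get("route_id") or record.get("line")
    speed = record.get("speed_kmh")
    if speed is None and record.get("speed_ms") is not None:
        speed = record["speed_ms"] * 3.6  # metres per second -> km/h
    return {"route": route, "speed_kmh": speed}


def impute_mean(records: list) -> list:
    """Fill missing speeds with the mean of observed speeds on the same route."""
    observed = defaultdict(list)
    for r in records:
        if r["speed_kmh"] is not None:
            observed[r["route"]].append(r["speed_kmh"])
    filled = []
    for r in records:
        speed = r["speed_kmh"]
        if speed is None and observed[r["route"]]:
            speed = mean(observed[r["route"]])
        filled.append({**r, "speed_kmh": speed})  # stays None if the route has no data
    return filled


def map_phase(records):
    """Emit (route, speed) pairs, as a MapReduce mapper would."""
    for r in records:
        if r["speed_kmh"] is not None:
            yield r["route"], r["speed_kmh"]


def reduce_phase(pairs) -> dict:
    """Group values by key and reduce each group to its mean."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}
```

For example, `reduce_phase(map_phase(impute_mean([normalize(r) for r in raw])))` would return the mean speed per route for a list `raw` of heterogeneous records. On a cluster, the same mapper and reducer logic would be distributed across nodes rather than run in one loop.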
Given the proper framework for collecting, processing, and storing large quantities of data relevant to the planning and operation of transit networks, developing cities will be better equipped for sustainable and cost-effective decision making in the future. Planners will be able to use more realistic models to determine a course of action. The last twenty years have seen the birth of Intelligent Transportation Systems (ITS), which have typically focused on applying communication and information technology to more traditional transportation functions such as toll collection and incident detection. The increasing diversity and prevalence of mobile devices may change ITS at its core, as they introduce new information sources into transportation planning and operations, including mobile sensor networks and real-time information dissemination. Mobile phones are a medium for both creating and receiving valuable urban information; this technology is poised to revolutionize the way we exchange and process data in complex metropolitan areas in the developing world. Real-time connectivity for data exchange between vehicles and infrastructure (0), which is expected to be a major transportation innovation, will provide another opportunity to improve transit services: buses will be able to communicate with infrastructure, including traffic signals, to receive priority and to enhance other transit-related services.

CONCLUSIONS

To make smart decisions about the future of critical formal and informal urban transit networks in developing countries, we need access to data that contextualize the problems and allow future scenarios to be modeled on real information. Access to such data, including social media aggregations and GIS data, can be difficult in areas with low technical capacity and processing power.
It is our argument based on this study, however, that it is worth overcoming such technical obstacles in order to leapfrog beyond a traditional transit-planning model to a more accurate, cost-effective, and sustainable process involving big data for transit planning and operation. Furthermore, traffic management policies can be derived from big data to assist with future improvements of the physical transit infrastructure. This study revealed that integrating online media with location-based data provides a big data scenario that has potential for supporting transit operations while posing challenges for managing data mobility. It is not feasible to apply a one-size-fits-all approach to any problem in the developing world, but together, these cases show that with the right approach, technical capacity in transitional cities can grow to support higher-level data processing and, in turn, more efficient and more sustainable policy decisions for crucial urban transit networks. For cities at a crossroads between committing to an auto-centric future and improving alternative modes of transportation in our energy-insecure future, it is important to consider the short- and long-term impacts of smarter transit planning and operation fueled by more reliable data sources.
ACKNOWLEDGEMENT

This material is based upon work supported by the Department of Education GAANN Fellowship. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of Education.

REFERENCES

Cheng-Min Feng, Cheng-Hsien Hsieh. Resource Allocation for Sustainable Urban Transit from a Transport Diversity Perspective. Sustainability. 00 Nov ;0.
Emmanuel Letouzé. Big Data for Development: Facts and Figures. Science and Development Network [Internet]. 0 Apr ; Available from:
Chowdhury MA. Fundamentals of intelligent transportation systems planning. Boston: Artech House; p.
National ITS Architecture [Internet]. Available from:
Albert M. L. Ching. A User-Flocksourced Bus Intelligence System in Dhaka. [Cambridge, Massachusetts]: MIT; 0.
Cover TM, Thomas JA. Elements of information theory. Hoboken, N.J.: Wiley-Interscience; 00. p.
Hasan S, Zhan X, Ukkusuri SV. Understanding urban human activity and mobility patterns using large-scale location-based data from online social media. ACM Press; 0 [cited 0 Aug ]. p.. Available from:
Michael J. Paul, Mark Dredze. You are what you tweet: Analyzing Twitter for public health. Barcelona, Spain; 0. p.
Cheng Z, Caverlee J, Lee K. You are where you tweet: a content-based approach to geolocating twitter users. ACM Press; 00 [cited 0 Jul 0]. p.. Available from:
Wong J. Leveraging the General Transit Feed Specification for Efficient Transit Analysis. Transportation Research Record: Journal of the Transportation Research Board. 0 Dec ;(-):.
LocationManager [Internet]. Android Developer; Available from:
Garmin GPS Interface Specification [Internet]. Available from:
Lou Y, Zhang C, Zheng Y, Xie X, Wang W, Huang Y. Map-matching for low-sampling-rate GPS trajectories. ACM Press; 00 [cited 0 Aug ]. p.. Available from:
Ilgin Gokasar, Kevser Simsek. Using Big Data for Analysis and Improvement of Public Transportation Systems in Istanbul. Stanford University; 0. Available from: Allowed=y.
Central Intelligence Agency. The World Factbook [Internet]. [cited 0 Jul ]. Available from: https://www.cia.gov/library/publications/the-world-factbook/index.html.
Aguado M, Jacob E, Matias J, Conde C, Berbineau M. Deploying CCTV As an Ethernet Service Over the Wimax Mobile Network in the Public Transport Scenario. IEEE; 00 [cited 0 Aug ]. p.. Available from:
Barnett P. Real life experiences during the design, realization and commissioning of M London orbital motorway CCTV enhancement scheme using digital transmission technology. IEE; 00 [cited 0 Aug ]. p.. Available from:
Lécué F, Tallevi-Diotallevi S, Hayes J, Tucker R, Bicer V, Sbodio ML, et al. STAR-CITY: semantic traffic analytics and reasoning for CITY. ACM Press; 0 [cited 0 Aug ]. p.. Available from:
Boyd D, Crawford K. Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society. 0 Jun;():.
Tim G. Davies, Zainab Ashraf Bawa. The Promises and Perils of Open Government Data (OGD). The Journal of Community Informatics [Internet]. 0;(). Available from:
Sarmiento I, González-Calderón C, Córdoba J, Díaz C. Important Aspects to Consider for Household Travel Surveys in Developing Countries. Transportation Research Record: Journal of the Transportation Research Board. 0 Dec ;(-):.
Cohen B. Urbanization in developing countries: Current trends, future projections, and key challenges for sustainability. Technology in Society. 00 Jan;(-): 0.
Williams S, Marcello E, Klopp JM. Toward Open Source Kenya: Creating and Sharing a GIS Database of Nairobi. Annals of the Association of American Geographers. 0 Jan ;0(): 0.
Akin D. Model for Estimating Increased Ridership Caused by Integration of Two Urban Transit Modes: Case Study of Metro and Bus-Minibus Transit Systems, Istanbul, Turkey. Transportation Research Record. 00 Jan ;():.
Steve Hamm. Insights in Motion: Deep Analytics Shows How Cities Really Work [Internet]. Building a Smarter Planet. 0 [cited 0 Jul ]. Available from:
Leyla Arsan. Apps with Istanbul Transport Open Data Awarded [Internet]. City Service Development Kit. 0 [cited 0 Jul ]. Available from:
Khan SM, Chowdhury M. ITS for One of the Most Congested Cities in the Developing World - Dhaka Bangladesh: Challenges and Potentials [ITS in Developing Countries]. IEEE Intelligent Transportation Systems Magazine. 0;():0.
-0: Greater Dhaka Sustainable Urban Transport Project [Internet]. Asian Development Bank; 0. Available from:
Elif Dede, Madhusudhan Govindaraju, Daniel Gunter, Richard Shane Canon, Lavanya Ramakrishnan. Performance evaluation of a MongoDB and Hadoop platform for scientific data analysis. 0. p
Shvachko K, Kuang H, Radia S, Chansler R. The Hadoop Distributed File System. IEEE; 00 [cited 0 Aug ]. p. 0. Available from:
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, et al. Bigtable: A Distributed Storage System for Structured Data. ACM Transactions on Computer Systems. 00 Jun ;():.
Cattell R. Scalable SQL and NoSQL data stores. ACM SIGMOD Record. 0 May ;():.
Kamakshi Lakshminarayan, Steven A. Harp, Robert P. Goldman, Tariq Samad. Imputation of Missing Data Using Machine Learning Techniques. KDD. ;0.
Batista GEAPA, Monard MC. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence. 00 May;(-):.
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM. 00 Jan ;():0.
Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica. Spark: Cluster Computing with Working Sets
Yue Zhu, Si-qi Jia, Jun-kui Zhang, Qi Ll. Research of Urban Traffic Carbon Emission Data Mining Based on Hadoop. 0;Application Research of Computers():.