Building Rome in a Day


Sameer Agarwal¹, Noah Snavely², Ian Simon¹, Steven M. Seitz¹, Richard Szeliski³
¹University of Washington  ²Cornell University  ³Microsoft Research
To whom correspondence should be addressed: cs.washington.edu

Abstract

We present a system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo sharing sites. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms, designed to maximize parallelism at each stage in the pipeline and minimize serialization bottlenecks. It is designed to scale gracefully with both the size of the problem and the amount of available computation. We have experimented with a variety of alternative algorithms at each stage of the pipeline and report on which ones work best in a parallel computing environment. Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.

1. Introduction

Entering the search term "Rome" on flickr.com returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. Most of these photographs are captured from hundreds or thousands of viewpoints and illumination conditions; the Trevi Fountain alone has over 50,000 photographs on Flickr. Exciting progress has been made on reconstructing individual buildings or plazas from similar collections [16, 17, 8], showing the potential of applying structure from motion (SfM) algorithms to unstructured photo collections of up to a few thousand photographs.

This paper presents the first system capable of city-scale reconstruction from unstructured photo collections. We present models that are one to two orders of magnitude larger than the next largest results reported in the literature. Furthermore, our system enables the reconstruction of data sets of 150,000 images in less than a day.

City-scale 3D reconstruction has been explored previously in the computer vision literature [12, 2, 6, 21] and is now widely deployed, e.g., in Google Earth and Microsoft's Virtual Earth. However, existing large scale structure from motion systems operate on data that comes from a structured source, e.g., aerial photographs taken by a survey aircraft or street side imagery captured by a moving vehicle. These systems rely on photographs captured using the same calibrated camera(s) at a regular sampling rate, and typically leverage other sensors such as GPS and Inertial Navigation Units, vastly simplifying the computation. Images harvested from the web have none of these simplifying characteristics. They are taken with a variety of different cameras, in varying illumination conditions, have little to no geographic information associated with them, and in many cases come with no camera calibration information. The same variability that makes these photo collections so hard to work with for the purposes of SfM also makes them an extremely rich source of information about the world. In particular, they specifically capture things that people find interesting, i.e., worthy of photographing, and include interiors and artifacts (sculptures, paintings, etc.) as well as exteriors [14].
While reconstructions generated from such collections do not capture a complete covering of scene surfaces, the coverage improves over time, and can be complemented by adding aerial or street-side images. The key design goal of our system is to quickly produce reconstructions by leveraging massive parallelism. This choice is motivated by the increasing prevalence of parallel compute resources both at the CPU level (multi-core) and the network level (cloud computing). At today's prices, for example, you can rent 1000 nodes of a cluster for 24 hours for $10,000 [1].

The cornerstone of our approach is a new system for large-scale distributed computer vision problems, which we will be releasing to the community. Our pipeline draws largely from the existing state of the art of large scale matching and SfM algorithms, including SIFT, vocabulary trees, bundle adjustment, and other known techniques. For each stage in our pipeline, we consider several alternatives, as some algorithms naturally distribute and some do not, and issues like memory usage and I/O bandwidth become critical.

In cases such as bundle adjustment, where we found that existing implementations did not scale, we have created our own high performance implementations. Designing a truly scalable system is challenging, and we discovered many surprises along the way. The main contributions of our paper, therefore, are the insights and lessons learned, as well as the technical innovations we invented to improve the parallelism and throughput of our system.

The rest of the paper is organized as follows. Section 2 discusses the detailed design choices and implementation of our system. Section 3 reports the results of our experiments on three city-scale data sets, and Section 4 concludes with a discussion of some of the lessons learned and directions for future work.

2. System Design

Our system runs on a cluster of computers (nodes), with one node designated as the master node. The master node is responsible for the various job scheduling decisions. In this section, we describe the detailed design of our system, which can naturally be broken up into three distinct phases: (1) preprocessing (Section 2.1), (2) matching (Section 2.2), and (3) geometric estimation (Section 2.4).

2.1. Preprocessing and feature extraction

We assume that the images are available on a central store, from which they are distributed to the cluster nodes on demand in chunks of fixed size. This automatically performs load balancing, with more powerful nodes receiving more images to process. This is the only stage where a central file server is required; the rest of the system operates without using any shared storage. This is done so that we can download the images from the Internet independently of our matching and reconstruction experiments. For production use, it would be straightforward to have each cluster node crawl the Internet for images at the start of the matching and reconstruction process.

On each node, we begin by verifying that the image files are readable, valid images. We then extract the EXIF tags, if present, and record the focal length. We also downsample images larger than 2 mega-pixels, preserving their aspect ratios and scaling their focal lengths. The images are then converted to grayscale and SIFT features are extracted from them [10]. We use the SIFT++ implementation by Andrea Vedaldi for its speed and flexible interface [20]. At the end of this stage, the entire set of images is partitioned into disjoint sets, one for each node. Each node owns the images and SIFT features associated with its partition.
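The downsampling and focal length bookkeeping is simple; the sketch below is our own illustration (not the paper's code), assuming Pillow is available and using the standard EXIF FocalLength tag. The 2 mega-pixel cap follows the text; everything else (function name, return convention) is an assumption.

```python
from PIL import Image

MAX_PIXELS = 2_000_000               # images above 2 mega-pixels are downsampled
FOCAL_LENGTH_TAG = 37386             # standard EXIF tag for FocalLength (millimetres)

def preprocess_image(path):
    """Validate, downsample and convert one image; return (grayscale image,
    EXIF focal length in mm or None, scale factor applied to the pixel grid)."""
    img = Image.open(path)
    img.verify()                     # cheap check that the file is a readable image
    img = Image.open(path)           # verify() invalidates the object, so reopen
    focal_mm = img.getexif().get(FOCAL_LENGTH_TAG)

    w, h = img.size
    scale = 1.0
    if w * h > MAX_PIXELS:
        scale = (MAX_PIXELS / float(w * h)) ** 0.5          # preserves the aspect ratio
        img = img.resize((int(w * scale), int(h * scale)))

    # A focal length expressed in pixels (e.g., derived from the EXIF value and
    # the sensor width) must be multiplied by the same scale factor.
    return img.convert("L"), focal_mm, scale
```

SIFT extraction itself (SIFT++ in the system) is outside this sketch.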
Figure 1. Our multi-stage parallel matching pipeline. The N input images are distributed onto the processing nodes, after which the following processing stages are performed: (a) SIFT feature extraction, vocabulary tree vector quantization, and term frequency counting; (b) document frequency counting; (c) TFIDF computation and information broadcast; (d) computation of TF-based match likelihoods; (e) aggregation at the master node; (f) round-robin bin-packing distribution of match verification tasks based on the top k1 matches per image; (g) match verification with optional inter-node feature vector exchange; (h) match proposal expansion based on images found in connected components (CC) using the next k2 best matches per image; (i) more distributed verification; (j) four more rounds of query expansion and verification; (k) track merging based on locally verified matches; (l) track aggregation into connected components; (m) final distribution of tracks by image connected components and distributed merging.

2.2. Image Matching

The key computational tasks when matching two images are the photometric matching of interest points and the geometric verification of these matches using a robust estimate of the fundamental or essential matrix. While exhaustive matching of all features between two images is prohibitively expensive, excellent results have been reported with approximate nearest neighbor search. We use the ANN library [3] for matching SIFT features. For each pair of images, the features of one image are inserted into a k-d tree, and the features from the other image are used as queries. For each query, we consider the two nearest neighbors, and matches that pass Lowe's ratio test are accepted [10]. We use the priority queue based search method, with an upper bound of 200 on the maximum number of bin visits. In our experience, these parameters offer a good tradeoff between computational cost and matching accuracy. The matches returned by the approximate nearest neighbor search are then pruned and verified using a RANSAC-based estimation of the fundamental or essential matrix [18], depending on the availability of focal length information from the EXIF tags.
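As a rough, single-machine sketch of this matching kernel (our own illustration; the system itself uses the ANN library with a bounded number of bin visits), the two-nearest-neighbor query and Lowe's ratio test can be written with a SciPy k-d tree as follows. The 0.6 ratio threshold is an assumed value, not one given in the paper, and the subsequent RANSAC fit is omitted.

```python
import numpy as np
from scipy.spatial import cKDTree

def ratio_test_matches(desc_query, desc_train, ratio=0.6):
    """Match SIFT descriptors (arrays of shape [n, 128]) from one image against
    another, keeping only matches that pass Lowe's ratio test."""
    tree = cKDTree(desc_train)                 # index the second image's features
    dist, idx = tree.query(desc_query, k=2)    # two nearest neighbors per query feature
    keep = dist[:, 0] < ratio * dist[:, 1]     # accept only unambiguous matches
    return np.column_stack([np.nonzero(keep)[0], idx[keep, 0]])
```

The accepted correspondences would then be pruned with the RANSAC-based fundamental or essential matrix estimation described above.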

These two operations form the computational kernel of our matching engine. Unfortunately, even with a well optimized implementation of the matching procedure described above, it is not practical to match all pairs of images in our corpus. For a corpus of 100,000 images, this translates into 5,000,000,000 pairwise comparisons, which with 500 cores operating at 10 image pairs per second per core would require about 11.5 days to match. Furthermore, this does not even take into account the network transfers required for all cores to have access to all of the SIFT feature data for all images. Even if we were able to do all these pairwise matches, it would be a waste of computational effort, since an overwhelming majority of the image pairs do not match. This is expected from a set of images associated with a broad tag like the name of a city. Thus, we must be careful in choosing the image pairs that the system spends its time matching.

Building upon recent work on efficient object retrieval [15, 11, 5, 13], we use a multi-stage matching scheme. Each stage consists of a proposal and a verification step. In the proposal step, the system determines a set of image pairs that it expects to share common scene elements. In the verification step, detailed feature matching is performed on these image pairs. The matches obtained in this manner then inform the next proposal step. In our system, we use two methods to generate proposals: vocabulary tree based whole image similarity (Section 2.2.1) and query expansion (Section 2.2.4). The verification stage is described in Section 2.2.2. Figure 1 shows the system diagram for the entire matching pipeline.

2.2.1. Vocabulary Tree Proposals

Methods inspired by text retrieval have been applied with great success to the problem of object and image retrieval. These methods are based on representing an image as a bag of words, where the words are obtained by quantizing the image features. We use a vocabulary tree-based approach [11], where a hierarchical k-means tree is used to quantize the feature descriptors. (See Section 3 for details on how we build the vocabulary tree using a small corpus of training images.) These quantizations are aggregated over all features in an image to obtain a term frequency (TF) vector for the image, and a document frequency (DF) vector for the corpus of images (Figure 1a-e). The document frequency vectors are gathered across nodes into a single vector that is broadcast across the cluster. Each node normalizes the term frequency vectors it owns to obtain the TFIDF matrix for that node. These per-node TFIDF matrices are broadcast across the network, so that each node can calculate the inner product between its TFIDF vectors and the rest of the TFIDF vectors. In effect, this is a distributed product of the matrix of TFIDF vectors with itself, but each node only calculates the block of rows corresponding to the set of images it owns. For each image, the top scoring k1 + k2 images are identified, where the first k1 images are used in an initial verification stage, and the additional k2 images are used to enlarge the connected components (see below).

Our system differs from that of Nister and Stewenius [11], since their system has a fixed database to which they match incoming images. They can therefore store the database in the vocabulary tree itself and evaluate the match score of an image on the fly. In our case, the query set is the same as the database, and it is not available when the features are being encoded. Thus, we must have a separate matrix multiplication stage to find the best matching images.
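The scoring step is, in effect, a block-wise matrix product. The sketch below is our illustration (collapsed to a single process, not the released system): it scores one node's block of L2-normalized TFIDF rows against the full TFIDF matrix and keeps the top k1 + k2 proposals per image. In the real system the full matrix is assembled from the per-node broadcasts described above.

```python
import numpy as np

def vocab_tree_proposals(tfidf_block, tfidf_all, k, row_offset):
    """Return, for each image in this node's block, the indices of the k
    highest-scoring candidate images under the TFIDF inner product."""
    # Rows are assumed L2-normalized, so the inner product is cosine similarity.
    scores = tfidf_block @ tfidf_all.T                 # this node's block of rows
    for i in range(scores.shape[0]):
        scores[i, row_offset + i] = -np.inf            # never propose an image to itself
    return np.argsort(-scores, axis=1)[:, :k]          # top-k proposals per image
```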
2.2.2. Verification and detailed matching

The next step is to verify candidate image matches, and to then find a detailed set of matches between matching images. If the images were all located on a single machine, the task of verifying a proposed matching image pair would be a simple matter of running through the image pairs, perhaps with some attention paid to the order in which the verifications are performed so as to minimize disk I/O. However, in our case, the images and their feature descriptors are distributed across the cluster. Thus, asking a node to match the image pair (i, j) requires it to fetch the image features from two other nodes of the cluster. This is undesirable, as there is a large difference between network transfer speeds and local disk transfers. Furthermore, this creates work for three nodes. Thus, the image pairs should be distributed across the network in a manner that respects the locality of the data and minimizes the amount of network transfers (Figure 1f-g).

We experimented with a number of approaches, with some surprising results. We initially tried to optimize network transfers before any verification is done. In this setup, once the master node has all the image pairs that need to be verified, it builds a graph connecting image pairs which share an image. Using METIS [7], this graph is partitioned into as many pieces as there are compute nodes. Partitions are then matched to the compute nodes by solving a linear assignment problem that minimizes the number of network transfers needed to send the required files to each node. This algorithm worked well for small problem sizes, but as the problem size increased, its performance degraded. Our assumption that detailed matches between all pairs of images take the same constant amount of time was wrong: some nodes finished early and were idling for up to an hour.

The second idea we tried was to over-partition the graph into small pieces, and to parcel them out to the cluster nodes on demand. When a node requests another chunk of work, the piece with the fewest network transfers is assigned to it. This strategy achieved better load balancing, but as the size of the problem grew, the graph we needed to partition grew to be enormous, and partitioning itself became a bottleneck.

The approach that gave the best results was to use a simple greedy bin-packing algorithm (where each bin represents the set of jobs sent to a node), which works as follows. The master node maintains a list of images on each node. When a node asks for work, it runs through the list of available image pairs, adding them to the bin if they do not require any network transfers, until either the bin is full or there are no more image pairs to add. It then chooses an image (a list of feature vectors) to transfer to the node, selecting the image that will allow it to add the maximum number of image pairs to the bin. This process is repeated until the bin is full.
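A compact sketch of this greedy policy, as we read it from the description above (the data structures and the bin-size parameter are our own assumptions):

```python
def fill_bin(pending_pairs, local_images, bin_size):
    """Fill one node's bin of verification jobs: first take pairs that need no
    transfer, then repeatedly transfer the image that unlocks the most pairs."""
    bin_jobs = []

    def take_local():
        nonlocal pending_pairs
        still_pending = []
        for i, j in pending_pairs:
            if len(bin_jobs) < bin_size and i in local_images and j in local_images:
                bin_jobs.append((i, j))
            else:
                still_pending.append((i, j))
        pending_pairs = still_pending

    take_local()
    while len(bin_jobs) < bin_size and pending_pairs:
        gain = {}                                  # pairs unlocked per candidate transfer
        for i, j in pending_pairs:
            if i in local_images and j not in local_images:
                gain[j] = gain.get(j, 0) + 1
            elif j in local_images and i not in local_images:
                gain[i] = gain.get(i, 0) + 1
        if not gain:
            break                                  # every remaining pair needs two transfers
        local_images.add(max(gain, key=gain.get))  # simulate transferring the best image
        take_local()
    return bin_jobs, pending_pairs
```

The windowed variant described next simply applies this to a bounded subset of the pending pairs at a time.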

This algorithm has one drawback: it can require multiple sweeps over all the image pairs needing to be matched. For large problems, the scheduling of jobs can become a bottleneck. A simple solution is to only consider a subset of the jobs at a time, instead of trying to optimize globally. This windowed approach works very well in practice, and all our experiments are run with this method.

Verifying an image pair is a two-step procedure, consisting of photometric matching between feature descriptors, and a robust estimation of the essential or fundamental matrix, depending upon the availability of camera calibration information. In cases where the estimation of the essential matrix succeeds, there is a sufficient angle between the viewing directions of the two cameras, and the number of matches is above a threshold, we do a full Euclidean two-view reconstruction and store it. This information is used in later stages (see Section 2.4) to reduce the size of the reconstruction problem.

2.2.3. Merging Connected Components

At this stage, consider a graph on the set of images with edges connecting two images if matching features were found between them. We refer to this as the match graph. To get as comprehensive a reconstruction as possible, we want the fewest number of connected components in this graph. To this end, we make further use of the proposals from the vocabulary tree to try and connect the various connected components in this graph. For each image, we consider the next k2 images suggested by the vocabulary tree. From these, we verify those image pairs which straddle two different connected components (Figure 1h-i). We do this only for images which are in components of size 2 or more. Thus, images which did not match any of their top k1 proposed matches are effectively discarded. Again, the resulting image pairs are subject to detailed feature matching. Figure 2 illustrates this. Notice that after the first round, the match graph has two connected components, which get connected after the second round of matching.

2.2.4. Query Expansion

After performing two rounds of matching as described above, we have a match graph which is usually not dense enough to reliably produce a good reconstruction. To remedy this, we use another idea from text and document retrieval research: query expansion [5]. In its simplest form, query expansion is done by first finding the documents that match a user's query, and then using them to query the database again, thus expanding the initial query. The results returned by the system are some combination of these two queries. In essence, if we were to define a graph on the set of documents, with similar documents connected by an edge, and treat the query as a document too, then query expansion is equivalent to finding all vertices that are within two steps of the query vertex.

In our system, we consider the image match graph, where images i and j are connected if they have a certain minimum number of features in common. Now, if image i is connected to image j and image j is connected to image k, we perform a detailed match to check if image i matches image k. This process can be repeated a fixed number of times or until the match graph converges. A concern when iterating rounds of query expansion is drift: the results of the secondary queries can quickly diverge from the original query. This is not a problem in our system, since query-expanded image pairs are subject to detailed geometric verification before they are connected by an edge in the match graph.
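One round of this proposal step can be sketched as follows (our illustration); each proposed pair still goes through full geometric verification before it becomes an edge in the match graph.

```python
from collections import defaultdict

def query_expansion_round(verified_edges):
    """Given the verified edges of the match graph, propose every two-step
    neighbor pair (i, k) that has not itself been verified yet."""
    neighbors = defaultdict(set)
    for i, j in verified_edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    proposals = set()
    for nbrs in neighbors.values():
        for i in nbrs:
            for k in nbrs:
                if i < k and k not in neighbors[i]:
                    proposals.add((i, k))
    return proposals
```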
2.3. Track Generation

The final step of the matching process is to combine all the pairwise matching information to generate consistent tracks across images, i.e., to find and label all of the connected components in the graph of individual feature matches (the feature graph). Since the matching information is stored locally on the compute node that it was computed on, the track generation process proceeds in two stages (Figure 1k-m). In the first, each node generates tracks from all the matching data it has available locally. This data is gathered at the master node and then broadcast over the network to all the nodes. Observe that the tracks for each connected component of the match graph can be processed independently. The track generation then proceeds with each compute node being assigned a connected component for which the tracks need to be produced. As we merge tracks based on shared features, inconsistent tracks can be generated, in which two feature points in the same image belong to the same track. In such cases, we drop the offending points from the track.

Once the tracks have been generated, the next step is to extract the 2D coordinates of the feature points that occur in the tracks from the corresponding image feature files. We also extract the pixel color for each such feature point, which is later used for rendering the 3D point with the average color of the feature points associated with it. Again, this procedure proceeds in two steps. Given the per-component tracks, each node extracts the feature point coordinates and the point colors from the SIFT and image files that it owns. This data is gathered and broadcast over the network, where it is processed on a per connected component basis.
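The local track generation amounts to computing connected components of the feature graph, which can be sketched with a small union-find structure (our illustration; identifiers are hypothetical). Inconsistent tracks, i.e., those containing two different features from the same image, are detected afterwards and the offending points dropped, as described above.

```python
def generate_tracks(feature_matches):
    """feature_matches: iterable of ((img_a, feat_a), (img_b, feat_b)) pairs.
    Returns tracks as lists of (image, feature) observations."""
    parent = {}

    def find(x):                          # union-find root with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in feature_matches:
        parent[find(a)] = find(b)         # merge the two observations' components

    tracks = {}
    for obs in list(parent):
        tracks.setdefault(find(obs), []).append(obs)
    return list(tracks.values())
```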

Figure 2. The evolution of the match graph as a function of the rounds of matching (panels: initial matches, connected component merge, query expansion rounds 1 and 4, and the skeletal set), and the skeletal set corresponding to it. Notice how the second round of matching merges the two components into one, and how rapidly the query expansion increases the density of the within-component connections. The last column shows the skeletal set corresponding to the final match graph. The skeletal sets algorithm can break up connected components found during the match phase if it determines that a reliable reconstruction is not possible, which is what happens in this case.

2.4. Geometric Estimation

Once the tracks have been generated, the next step is to run structure from motion (SfM) on every connected component of the match graph to recover a pose for every camera and a 3D position for every track. Most SfM systems for unordered photo collections are incremental, starting with a small reconstruction, then growing a few images at a time, triangulating new points, and doing one or more rounds of nonlinear least squares optimization (known as bundle adjustment [19]) to minimize the reprojection error. This process is repeated until no more cameras can be added. However, due to the scale of our collections, running such an incremental approach on all the photos at once is impractical.

The incremental reconstruction procedure described above has in it the implicit assumption that all images contribute more or less equally to the coverage and accuracy of the reconstruction. Internet photo collections, by their very nature, are redundant: many photographs are taken from nearby viewpoints, and processing all of them does not necessarily add to the reconstruction. It is thus preferable to find and reconstruct a minimal subset of photographs that captures the essential connectivity of the match graph and the geometry of the scene [8, 17]. Once this is done, we can add back in all the remaining images using pose estimation, triangulate all remaining points, and then do a final bundle adjustment to refine the SfM estimates.

For finding this minimal set, we use the skeletal sets algorithm of [17], which computes a spanning set of photographs that preserves important connectivity information in the image graph (such as large loops). In [17], a two-frame reconstruction is computed for each pair of matching images with known focal lengths. In our system, these pairwise reconstructions are computed as part of the parallel matching process. Once a skeletal set is computed, we estimate the SfM parameters of each resulting component using the incremental algorithm of [16]. The skeletal sets algorithm often breaks up connected components across weakly-connected boundaries, resulting in a larger set of components.

2.4.1. Bundle Adjustment

Having reduced the size of the SfM problem down to the skeletal set, the primary bottleneck in the reconstruction process is the non-linear minimization of the reprojection error, or bundle adjustment (BA). The best performing BA software available publicly is Sparse Bundle Adjustment (SBA) by Lourakis & Argyros [9].
The key to its high performance is the use of the so-called Schur complement trick [19] to reduce the size of the linear system (also known as the normal equations) that needs to be solved in each iteration of the Levenberg-Marquardt (LM) algorithm. The size of this linear system depends on the 3D point and camera parameters, whereas the size of the Schur complement depends only on the camera parameters. SBA then uses a dense Cholesky factorization to factor and solve the resulting reduced linear system. Since the number of 3D points in a typical SfM problem is usually two orders of magnitude or more larger than the number of cameras, this leads to substantial savings. This works for small to moderate sized problems, but for large problems with thousands of images, computing the dense Cholesky factorization of the Schur complement becomes a space and time bottleneck. For large problems, however, the Schur complement itself is quite sparse (a 3D point is usually visible in only a few cameras), and exploiting this sparsity can lead to significant time and space savings. We have developed new high performance bundle adjustment software that, depending upon the size of the problem, chooses between a truncated and an exact step LM algorithm. In the first case, a block diagonal preconditioned conjugate gradient method is used to approximately solve the normal equations. In the second case, CHOLMOD [4], a sparse direct method for computing Cholesky factorizations, is used to exactly solve the normal equations via the Schur complement trick.
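For reference, the algebra behind this reduction is the standard one (our summary, not reproduced from the paper). Grouping the camera updates $\delta_c$ and the point updates $\delta_p$, each LM iteration solves the normal equations

\[
\begin{bmatrix} B & E \\ E^\top & C \end{bmatrix}
\begin{bmatrix} \delta_c \\ \delta_p \end{bmatrix}
=
\begin{bmatrix} v \\ w \end{bmatrix},
\]

where $C$ is block diagonal with one small block per 3D point and is therefore cheap to invert. Eliminating the points yields the reduced camera system

\[
\underbrace{\left(B - E C^{-1} E^\top\right)}_{S}\,\delta_c \;=\; v - E C^{-1} w,
\qquad
\delta_p \;=\; C^{-1}\left(w - E^\top \delta_c\right).
\]

The Schur complement $S$ involves only the camera parameters, and its sparsity follows the camera-point visibility pattern; this is what the exact (CHOLMOD) and truncated (block diagonal preconditioned conjugate gradient) solvers above exploit.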

The first algorithm has low time complexity per iteration, but uses more LM iterations, while the second one converges faster at the cost of more time and memory per iteration. The resulting code uses significantly less memory than SBA and runs up to an order of magnitude faster. The exact runtime and memory savings depend upon the sparsity structure of the linear system involved.

2.5. Distributed Computing Engine

Our matching and reconstruction algorithm is implemented as a two-layered system. At the base of the system is an application-agnostic distributed computing engine. Except for a small core, which is easily ported, the system is cross-platform and runs on all major flavors of UNIX and Microsoft Windows. The system is small and extensible, and while it comes with a number of scheduling policies and data transfer primitives, users are free to add their own. The system is targeted at batch applications that operate on large amounts of data. It has extensive support for local caching of application data and on-demand transfer of data across the network. If a shared filesystem is available, it can be used, but for maximum performance, all data is stored on local disk and only transferred across the network as needed. It supports a variety of scheduling models, from trivially data parallel tasks to general map-reduce style computation. The distributed computing engine is written as a set of Python scripts, with each stage of computation implemented as a combination of Python and C++ code.

3. Experiments

We report here the results of running our system on three city data sets downloaded from flickr.com: Dubrovnik, Croatia; Rome; and Venice, Italy. Figures 3(a), 3(b) and 3(c) show reconstructions of the largest connected components in these data sets. Due to space considerations, only a sample of the results is shown here; we encourage the reader to visit the project web site, where the complete results are posted and where additional results, data, and code will be posted over time.

The experiments were run on a cluster of 62 nodes with dual quad core processors each, on a private network with 1 GB/sec Ethernet interfaces. Each node was equipped with 32 GB of RAM and 1 TB of local hard disk space. The nodes were running the Microsoft Windows Server 64-bit operating system.

The same vocabulary tree was used in all experiments. It was trained off-line on 1,918,101 features extracted from 20,000 images of Rome (not used in the experiments reported here). We trained a tree with a branching factor of 10 and a total of 136,091 vertices. For each image, we used the vocabulary tree to find the top 20 matching images. The top k1 = 10 matches were used in the first verification stage, and the next k2 = 10 matches were used in the second component matching stage. Four rounds of query expansion were done. We found that in all cases, the ratio of the number of matches performed to the number of matches verified starts dropping off after four rounds.

Data set  | Images  | Cores | Registered | Pairs verified | Pairs found | Matching (hrs) | Skeletal sets (hrs) | Reconstruction (hrs)
Dubrovnik | 57,...  |       | ...,868    | 2,658,...      |             |                |                     |
Rome      | 150,... |       | ...,658    | 8,825,256      | 2,712,...   |                |                     |
Venice    | 250,... |       | ...,925    | 35,465,029     | 6,119,...   |                |                     |

Table 1. Matching and reconstruction statistics for the three data sets.

Table 1 summarizes statistics of the three data sets. The reconstruction timing numbers in Table 1 bear some explanation. It is surprising that the SfM time for Dubrovnik is so much more than for Rome, and almost the same as for Venice, both of which are much larger data sets.
The reason lies in how the data sets are structured. The Rome and Venice data sets are essentially collections of landmarks, which at large scale have a simple geometry and visibility structure. The largest connected component in Dubrovnik, on the other hand, captures the entire old city. With its narrow alleyways, complex visibility, and widely varying viewpoints, it is a much more complicated reconstruction problem. This is reflected in the sizes of the skeletal sets associated with the largest connected components, as shown in Table 2. As we mentioned above, the skeletal sets algorithm often breaks up connected components across weakly-connected boundaries; therefore the size of the connected component returned by the matching system (CC1) is larger than that of the connected component (CC2) that the skeletal sets algorithm returns.

4. Discussion

At the time of writing this paper, searching on Flickr.com for the keywords "Rome" or "Roma" results in over 2,700,000 images. Our aim is to be able to reconstruct as much of the city as possible from these photographs in 24 hours. Our current system is about an order of magnitude away from this goal. We believe that our current approach can be scaled to problems of this size. However, a number of open questions remain.

Figure 3. (a) Dubrovnik: Four different views and associated images from the largest connected component. Note that the component captures the entire old city, with both street-level and roof-top detail. The reconstruction consists of 4,585 images and 2,662,981 3D points with 11,839,682 observed features. (b) Rome: Four of the largest connected components visualized at canonical viewpoints [14]. Colosseum: 2,097 images, 819,242 points; Trevi Fountain: 1,935 images, 1,055,153 points; Pantheon: 1,032 images, 530,076 points; Hall of Maps: 275 images, 230,182 points. (c) Venice: The two largest connected components visualized from two different viewpoints each. San Marco Square: 13,699 images, 4,515,157 points; Grand Canal: 3,272 images, 561,389 points.

The track generation, skeletal sets, and reconstruction algorithms all operate on the level of connected components. This means that the largest few components completely dominate these stages. Currently, our reconstruction algorithm only utilizes multi-threading when solving the bundle adjustment problem. While this helps, it is not a scalable solution to the problem, as, depending upon the connectivity pattern of the match graph, this can take an inordinate amount of time and memory.

Data set  | CC1    | CC2    | Skeletal Set | Reconstructed
Dubrovnik | 6,...  |        |              | 4,585
Rome      | 7,518  | 2,...  |              | 2,097
Venice    | 20,542 | 14,079 | 1,801        | 13,699

Table 2. Reconstruction statistics for the largest connected components in the three data sets. CC1 is the size of the largest connected component after matching; CC2 is the size of the largest component after skeletal sets. The last column lists the number of images in the final reconstruction.

The key reason we are able to reconstruct these large image collections is the large amount of redundancy present in Internet collections, which the skeletal sets algorithm exploits. We are currently exploring ways of parallelizing all three of these steps, with particular emphasis on the SfM system.

The runtime performance of the matching system depends critically on how well the verification jobs are distributed across the network. This is facilitated by the initial distribution of the images across the cluster nodes. An early decision to store images according to the name of the user and the Flickr ID of the image meant that most images taken by the same user ended up on the same cluster node. Looking at the match graph, it turns out (quite naturally in hindsight) that a user's own photographs have a high probability of matching amongst themselves. The ID of the person who took the photograph is just one kind of metadata associated with these images. A more sophisticated strategy would exploit all the textual tags and geotags associated with the images to predict which images are likely to match, and then distribute the data accordingly.

Finally, our system is designed with batch operation in mind. A more challenging problem is to start with the output of our system and add more images as they become available.

Acknowledgements

This work was supported in part by SPAWAR, NSF grant IIS, the Office of Naval Research, the University of Washington Animation Research Labs, and Microsoft. We also thank Microsoft Research for generously providing access to their HPC cluster. The authors would also like to acknowledge discussions with Steven Gribble, Aaron Kimball, Drew Steedly, and David Nister.

References

[1] Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2.
[2] M. Antone and S. Teller. Scalable extrinsic calibration of omni-directional image networks. IJCV, 49(2).
[3] S. Arya, D. Mount, N. Netanyahu, R. Silverman, and A. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. JACM, 45(6).
[4] Y. Chen, T. Davis, W. Hager, and S. Rajamanickam. Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. TOMS, 35(3).
[5] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In ICCV, pages 1-8.
[6] C. Früh and A. Zakhor. An automated method for large-scale, ground-based city model acquisition. IJCV, 60(1):5-24.
[7] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SISC, 20(1):359.
[8] X. Li, C. Wu, C. Zach, S. Lazebnik, and J. Frahm. Modeling and recognition of landmark image collections using iconic scene graphs. In ECCV.
[9] M. I. A. Lourakis and A. A. Argyros. SBA: A software package for generic sparse bundle adjustment. ACM TOMS, 36(1):1-30.
[10] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110.
[11] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR.
[12] M. Pollefeys, D. Nister, J. Frahm, A. Akbarzadeh, P. Mordohai, B. Clipp, C. Engels, D. Gallup, S. Kim, P. Merrell, et al. Detailed real-time urban 3D reconstruction from video. IJCV, 78(2).
[13] G. Schindler, M. Brown, and R. Szeliski. City-scale location recognition. In CVPR, pages 1-7.
[14] I. Simon, N. Snavely, and S. M. Seitz. Scene summarization for online image collections. In ICCV, pages 1-8.
[15] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV.
[16] N. Snavely, S. M. Seitz, and R. Szeliski. Photo Tourism: Exploring photo collections in 3D. TOG, 25(3).
[17] N. Snavely, S. M. Seitz, and R. Szeliski. Skeletal graphs for efficient structure from motion. In CVPR, pages 1-8.
[18] P. Torr and A. Zisserman. Robust computation and parametrization of multiple view relations. In ICCV.
[19] B. Triggs, P. McLauchlan, R. I. Hartley, and A. Fitzgibbon. Bundle adjustment: A modern synthesis. In Vision Algorithms '99.
[20] A. Vedaldi. SIFT++. vedaldi/code/siftpp/siftpp.html.
[21] L. Zebedin, J. Bauer, K. Karner, and H. Bischof. Fusion of feature- and area-based information for urban buildings modeling from aerial imagery. In ECCV, 2008.


More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Bundle adjustment gone public

Bundle adjustment gone public GoBack Bundle adjustment gone public Manolis I.A. Lourakis lourakis@ics.forth.gr http://www.ics.forth.gr/~lourakis/ Institute of Computer Science Foundation for Research and Technology - Hellas Vassilika

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Interactive Dense 3D Modeling of Indoor Environments

Interactive Dense 3D Modeling of Indoor Environments Interactive Dense 3D Modeling of Indoor Environments Hao Du 1 Peter Henry 1 Xiaofeng Ren 2 Dieter Fox 1,2 Dan B Goldman 3 Steven M. Seitz 1 {duhao,peter,fox,seitz}@cs.washinton.edu xiaofeng.ren@intel.com

More information

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data

Parallel Programming Map-Reduce. Needless to Say, We Need Machine Learning for Big Data Case Study 2: Document Retrieval Parallel Programming Map-Reduce Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Carlos Guestrin January 31 st, 2013 Carlos Guestrin

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Application of Predictive Analytics for Better Alignment of Business and IT

Application of Predictive Analytics for Better Alignment of Business and IT Application of Predictive Analytics for Better Alignment of Business and IT Boris Zibitsker, PhD bzibitsker@beznext.com July 25, 2014 Big Data Summit - Riga, Latvia About the Presenter Boris Zibitsker

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel

Parallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:

More information

Performance Workload Design

Performance Workload Design Performance Workload Design The goal of this paper is to show the basic principles involved in designing a workload for performance and scalability testing. We will understand how to achieve these principles

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

ATTRIBUTE ENHANCED SPARSE CODING FOR FACE IMAGE RETRIEVAL

ATTRIBUTE ENHANCED SPARSE CODING FOR FACE IMAGE RETRIEVAL ISSN:2320-0790 ATTRIBUTE ENHANCED SPARSE CODING FOR FACE IMAGE RETRIEVAL MILU SAYED, LIYA NOUSHEER PG Research Scholar, ICET ABSTRACT: Content based face image retrieval is an emerging technology. It s

More information

Patterns of Information Management

Patterns of Information Management PATTERNS OF MANAGEMENT Patterns of Information Management Making the right choices for your organization s information Summary of Patterns Mandy Chessell and Harald Smith Copyright 2011, 2012 by Mandy

More information

Large-Scale Test Mining

Large-Scale Test Mining Large-Scale Test Mining SIAM Conference on Data Mining Text Mining 2010 Alan Ratner Northrop Grumman Information Systems NORTHROP GRUMMAN PRIVATE / PROPRIETARY LEVEL I Aim Identify topic and language/script/coding

More information

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines

Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process

More information

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

MusicGuide: Album Reviews on the Go Serdar Sali

MusicGuide: Album Reviews on the Go Serdar Sali MusicGuide: Album Reviews on the Go Serdar Sali Abstract The cameras on mobile phones have untapped potential as input devices. In this paper, we present MusicGuide, an application that can be used to

More information

Media Cloud Service with Optimized Video Processing and Platform

Media Cloud Service with Optimized Video Processing and Platform Media Cloud Service with Optimized Video Processing and Platform Kenichi Ota Hiroaki Kubota Tomonori Gotoh Recently, video traffic on the Internet has been increasing dramatically as video services including

More information

Integrating Apache Spark with an Enterprise Data Warehouse

Integrating Apache Spark with an Enterprise Data Warehouse Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

On Demand Satellite Image Processing

On Demand Satellite Image Processing On Demand Satellite Image Processing Next generation technology for processing Terabytes of imagery on the Cloud WHITEPAPER MARCH 2015 Introduction Profound changes are happening with computing hardware

More information

Topology Aware Analytics for Elastic Cloud Services

Topology Aware Analytics for Elastic Cloud Services Topology Aware Analytics for Elastic Cloud Services athafoud@cs.ucy.ac.cy Master Thesis Presentation May 28 th 2015, Department of Computer Science, University of Cyprus In Brief.. a Tool providing Performance

More information

High-Volume Data Warehousing in Centerprise. Product Datasheet

High-Volume Data Warehousing in Centerprise. Product Datasheet High-Volume Data Warehousing in Centerprise Product Datasheet Table of Contents Overview 3 Data Complexity 3 Data Quality 3 Speed and Scalability 3 Centerprise Data Warehouse Features 4 ETL in a Unified

More information

White Paper. How Streaming Data Analytics Enables Real-Time Decisions

White Paper. How Streaming Data Analytics Enables Real-Time Decisions White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing

A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing A Scalable Network Monitoring and Bandwidth Throttling System for Cloud Computing N.F. Huysamen and A.E. Krzesinski Department of Mathematical Sciences University of Stellenbosch 7600 Stellenbosch, South

More information