Cloud based object recognition: A system proposal

Daniel LORENČÍK 1, Peter SINČÁK 2

Abstract In this chapter, we present a proposal for a cloud based object recognition system. The system will extract local features from an image and classify the object in the image using Membership Function ARTMAP (MF ARTMAP) or a Gaussian Markov Random Field model. The feature extraction will be based on the SIFT, SURF and ORB methods. The whole system will be built on a cloud architecture, so that it is readily available for the needs of the newly emerging technological field of cloud robotics. Besides the system proposal, we specify research and technical goals for the following research.

1 Introduction

Since the history of computers and computing began roughly 70 years ago, we have seen large scale computers replaced by affordable personal computers. In recent years, we have witnessed another shift: personal computers have shrunk in size to tablets, netbooks and even smartphones, while heavy computational and storage tasks are offloaded to the cloud. Applications available in the cloud also have a high impact on productivity, as they allow easy sharing of data between several users, thereby promoting real-time collaboration and aggregation of crowd knowledge; examples are the Google Apps suite [1] and Microsoft Office 365 [2]. With this in mind, it is possible to envision a similar system of applications available for use by robots. The obvious benefit is the possibility of creating small robots with greater battery life, since the heavy computation is done elsewhere. These robots do not have to be highly sophisticated; therefore, they can be cheap, or can be created from available resources such as a smartphone combined with a wheeled chassis. Moreover, robots can benefit from the sharing of knowledge.
1 Department of Cybernetics and Artificial Intelligence, Technical University of Košice, daniel.lorencik@tuke.sk
2 Department of Cybernetics and Artificial Intelligence, Technical University of Košice, peter.sincak@tuke.sk

This idea was presented by professor James Kuffner in his talk Robots with their Heads in the Clouds [3]. Knowledge sharing in real time has the potential to influence the ability of robots to operate in the real world, as knowledge gained in the learning process by one robot is gained by all robots using the service. We provide more detailed information on cloud robotics in the next section.

With the availability of cloud computing, and of artificial intelligence methods provided as services in the cloud environment, the idea of the remote brain [4] resurfaces. It is true that the connection to the cloud is crucial and is therefore a weak link in the chain, but with the available connection options via WiFi and the availability of 3G and 4G networks, this is mainly a technological problem.

The structure of the chapter is as follows: in the second section, we provide an introduction to cloud computing, define cloud robotics, and present existing cloud robotics projects. In the third section, we give an overview of three methods (SIFT, SURF, ORB) for feature extraction from an image, and two methods (Membership Function ARTMAP and the Gaussian Markov Random Field model) for classification of objects in an image based on the extracted features. In the fourth section, we propose a cloud based system for object recognition in images. The fifth section concludes the chapter.

2 Cloud Robotics

Cloud computing can be viewed as grid computing with added concepts from utility, service and distributed computing [5]. The relationship between different distributed computing systems is shown in Fig. 1.

Fig. 1 Relationship between orientation and scale of different distributed computing systems

Cloud computing was defined by the National Institute of Standards and Technology as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction" [6].

Clouds are provided in four deployment models:
- Private cloud: the cloud infrastructure is used exclusively by a single organization.
- Community cloud: the cloud infrastructure is used by a group of consumers with shared concerns.
- Public cloud: the cloud infrastructure is provided for public use.
- Hybrid cloud: combines at least two of the previous models; the models of cloud infrastructure remain clearly distinct, but applications can be ported from one model to another.

Besides the deployment models, clouds also provide three types of services:
- Infrastructure as a Service (IaaS): the user can create and manage virtual machines according to his or her individual needs. The administration of the machines, networking and all settings is the responsibility of the user.
- Platform as a Service (PaaS): the user is given access to a high-level integrated environment to build, test and deploy applications. Part of the required settings is managed by the platform itself (scaling is also done automatically); however, this can impose restrictions on the programming languages or tools used.
- Software as a Service (SaaS): software or applications are provided directly to end users. The benefits are instant updates of the application and its minimal footprint on the user's computer (it is usually accessed from a web browser).

Examples of cloud services are Google Apps [1], or storage oriented services with automatic synchronization such as Dropbox [7] and SkyDrive [8].

Cloud robotics builds on these notions and combines the computational power of the computing cloud with the availability of internet-connected devices.
A device can be any hardware that can connect to the internet and can be programmed to use cloud services. It can be virtually any robot with a wired or wireless connection, a smartphone, or a small computer (NetDuino [9], Raspberry Pi [10]). Especially when using smartphones with connected actuators (Romo [11], SmartBot [12]), or low cost small computers such as the Raspberry Pi, it is possible to create affordable robots for which cloud robotics can provide the needed software. This software can take the form of AI bricks [13].

Most cloud robotics projects so far have focused on creating the cloud robotics infrastructure. In the process, similarly to the services in cloud computing, Robot as a Service (RaaS) was defined [14]. RaaS has to have the features of a service oriented architecture, namely it has to be a service provider, a service broker and a service client: it makes the actions it can perform available, accepts connections, and is able to use other services as well.
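The three RaaS roles just listed (provider, broker and client) can be captured in a deliberately minimal sketch. Note that `RaaSNode` and its methods are hypothetical names invented here for illustration; they are not part of any cloud robotics framework mentioned in this chapter:

```python
# Illustrative sketch of a node playing all three RaaS roles.
class RaaSNode:
    def __init__(self, name):
        self.name = name
        self.actions = {}          # provider role: actions this node offers
        self.known_services = {}   # broker role: registry of other services

    def provide(self, action_name, handler):
        # advertise an action that other nodes may invoke
        self.actions[action_name] = handler

    def register(self, service):
        # record another service so clients can discover and use it
        self.known_services[service.name] = service

    def call(self, service_name, action_name, *args):
        # client role: invoke an action on a discovered service
        return self.known_services[service_name].actions[action_name](*args)

cloud = RaaSNode("cloud-vision")
cloud.provide("classify", lambda image: "cup")  # stub classifier
robot = RaaSNode("robot-1")
robot.register(cloud)
print(robot.call("cloud-vision", "classify", "image-bytes"))  # -> cup
```

In a real deployment the registry and the calls would go over the network; the sketch only shows how one object can simultaneously offer actions, broker knowledge of services, and consume other services.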
As was said, there are several projects concerned with cloud robotics:
- DAvinCi is a cloud-based framework for service robots [15] which allows several robots to communicate and collaborate on building a map of the environment using the FastSLAM algorithm.
- MyRobots.com is a web based project focused on connecting all robots and intelligent devices to the Internet [16]. It is promoted as a cloud service for robots, although currently only an app store and a basic monitoring service are available. It is possible to download applications for a device, and also to upload user-created applications. The monitoring service allows remote monitoring of robot status and can send alerts if a robot encounters a problem.
- ASORO is an acronym for A*Star Social Robotics [17]. The main goal of the project is to create and promote social robots. From the cloud robotics point of view, this project is intriguing because all the robots created use the Unified Robotics Framework (UROF), essentially an operating system that allows modules for robot functions to be connected. These modules are similar to the AI bricks already mentioned, and are used as needed for tasks such as path planning, task planning and navigation control.
- RoboEarth is a project whose goal is to create a World Wide Web for robots [18]. RoboEarth is a collection of databases storing actions, object data and environment data. These databases are shared among connected robots; therefore, if one robot has learned to identify a certain object, all other robots gain this knowledge as well. The same is true for actions (or action recipes), which describe how to perform tasks, and for environments, which store information about objects and their locations. The data in the databases are encoded in the semantic language OWL, so it is possible to derive new knowledge from existing knowledge, or to apply the same approach to a similar action. The actions are also fine-tuned with use.
The action recipes are composed of atomic actions, which are again similar to the notion of AI bricks.

The proposal of our system builds on knowledge gained from these projects and aims to provide an AI brick that can be used in already available cloud robotics frameworks.

3 Image Processing and Object Classification

As our proposed system will provide a cloud based service for classification of objects in images, in this section we give an overview of the methods we will use. We start with methods for extracting local features from an image (SIFT, SURF and ORB), and continue with classification methods based on Membership Function ARTMAP and the Gaussian Markov Random Field model.
The Scale-Invariant Feature Transform (SIFT) was described in detail in [19]. SIFT extracts local features from scale-space extrema called key points. Key points are identified as minima (or maxima) of the difference of Gaussians occurring at multiple scales of an image pyramid. Next, unstable key points are removed, and an orientation is assigned. Computing on the image pyramid provides invariance to scale; assigning the orientation based on a peak in a local histogram provides invariance to rotation; and invariance to illumination is provided by thresholding the values of the descriptor vector, which is composed of the values of orientation histograms. The final descriptor has 128 dimensions. Experimental results in [19] suggest that SIFT can recognize even partially occluded objects, as only 3 descriptors are needed for object recognition. It is invariant to scale, rotation and translation, and partially to affine transformation (up to 50 degrees). Several improved methods have been proposed: Affine-SIFT [20], which improves invariance to the tilt of the camera, and Principal Component Analysis SIFT (PCA-SIFT) [21], which creates the descriptor vector using PCA.

Speeded-Up Robust Features (SURF) was described in [22]. It is inspired by SIFT, uses the same three stages (detection, description and matching), and is tailored to provide high speed with accuracy similar to SIFT. Detection is based on a basic Hessian matrix approximation and integral images. Interest points are blobs located at maxima of the determinant of the Hessian matrix. Interest points at different scales are found by applying gradually larger filters, as opposed to resampling the image as in SIFT [22], [23]. Description is similar to the SIFT approach: it is based on Haar wavelet responses and produces a 64-dimensional vector (there are modifications with different descriptor lengths, SURF-36 and SURF-128).
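The integral-image trick behind SURF's fast box filters can be sketched in a few lines. This is an illustrative toy, not the chapter's implementation; `integral_image` and `box_sum` are names chosen here for clarity:

```python
import numpy as np

def integral_image(img):
    # S[y, x] holds the sum of img[:y, :x]; padded with a zero row/column
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    S[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return S

def box_sum(S, y0, x0, y1, x1):
    # sum of img[y0:y1, x0:x1] in four lookups, independent of box size
    return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]

img = np.arange(16).reshape(4, 4)
S = integral_image(img)
print(box_sum(S, 1, 1, 3, 3))  # sum of img[1:3, 1:3] = 5+6+9+10 = 30
```

Because any rectangular sum costs only four table lookups, the box filters approximating the Hessian can be evaluated at any filter size for the same price; this is what lets SURF grow the filters instead of resampling the image.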
In the matching phase, the sign of the Laplacian is used to determine whether the interest point is a bright blob on a dark background or a dark blob on a light background. Only features with the same sign are compared, which leads to a speed improvement over SIFT.

Oriented FAST and Rotated BRIEF (ORB) is a newer approach proposed in [24]. It combines a modified FAST corner detector (Features from Accelerated Segment Test, [25], [26]) with a modified BRIEF feature point descriptor (Binary Robust Independent Elementary Features, [27]). FAST is used to detect corners, and to provide scale invariance it is run on a scale image pyramid. The orientation of a corner is found using the intensity centroid, which assumes that a corner's intensity centroid is offset from its center. BRIEF has been found to be comparable in performance to SIFT and SURF, and since it uses binary strings as descriptor vectors, it is computed faster [27]. As BRIEF is not rotation-invariant, the corners found by oriented FAST are normalized in orientation before the BRIEF descriptor is computed; then uncorrelated binary tests are selected and used to construct the rotated BRIEF descriptor. ORB was designed to be faster than the existing local feature detectors SIFT and SURF. Experiments described in [24] suggest it is an order of magnitude faster than SURF and two orders of magnitude faster than SIFT, with similar recognition ability.
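Much of the speed advantage of binary descriptors such as BRIEF and ORB comes from matching by Hamming distance, which reduces to an XOR and a bit count. A toy sketch, using 16-bit descriptors instead of ORB's real 256-bit strings (the descriptor values and names are invented for illustration):

```python
def hamming(d1, d2):
    # Hamming distance between two binary descriptors: XOR, then popcount
    return bin(d1 ^ d2).count("1")

def best_match(query, database):
    # brute-force nearest neighbour under Hamming distance
    return min(database, key=lambda d: hamming(query, d))

# hypothetical 16-bit descriptors standing in for real ORB descriptors
db = [0b1010101010101010, 0b1111000011110000, 0b0000111100001111]
query = 0b1111000011111000  # one flipped bit away from db[1]
print(best_match(query, db) == db[1])  # -> True
```

On real hardware the XOR-and-popcount pair maps to single machine instructions, which is why matching binary strings is much cheaper than comparing the floating-point descriptor vectors of SIFT or SURF.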
Membership Function ARTMAP (MF ARTMAP) is a classification tool [28], [29] based on Adaptive Resonance Theory [30], [31] and the theory of fuzzy sets. Its knowledge representation is based on the hypothesis that input samples form fuzzy clusters in the feature space, which is the universe of the fuzzy sets. Therefore, it is possible to calculate the membership value of each point of the feature space to every fuzzy cluster defined in this space. The MF ARTMAP network consists of:
- an input layer of neurons, which normalizes the input and maps the input samples to the comparison layer; the number of neurons is the same as the number of dimensions of the feature space;
- a comparison layer, an n-by-m grid, where n is the dimension of the feature space and m is the number of neurons in the recognition layer; the size of this layer changes dynamically during learning;
- a recognition layer, which contains neurons representing the fuzzy clusters in the feature space; the number of neurons can therefore change during learning;
- a MapField layer, which consists of neurons representing fuzzy classes; here the membership value of the sample to each fuzzy class is computed.

The learning algorithm of MF ARTMAP is divided into two steps: structure adaptation, where new fuzzy clusters are created (thereby changing the recognition and comparison layers), and parameter adaptation, where the parameters of the membership functions stored in the connections between layers are changed.

The Gaussian Markov Random Field (GMRF) model was investigated for the task of object classification on texture data in [32], [33], and for the task of image classification in [34]. The use of GMRF is appealing because any data distribution can be approximated by Gaussian mixtures, and many mathematical techniques exist that make such data easy to work with [35].
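As a rough illustration of the membership idea behind MF ARTMAP, and of classification with Gaussian-shaped models generally, the following sketch computes the membership of a 2-D sample to Gaussian-shaped fuzzy clusters and assigns the class of the most responsive cluster. This is a simplification for intuition only, not the actual MF ARTMAP or GMRF algorithm; the clusters, widths and labels are invented:

```python
import math

def membership(x, center, sigma):
    # product of per-dimension Gaussian membership functions, in (0, 1];
    # equals 1 exactly at the cluster center and decays with distance
    return math.prod(
        math.exp(-((xi - ci) ** 2) / (2 * sigma ** 2))
        for xi, ci in zip(x, center)
    )

# hypothetical fuzzy clusters in a 2-D feature space: (center, sigma, class)
clusters = [((0.2, 0.3), 0.1, "cup"), ((0.8, 0.7), 0.1, "book")]

def classify(x):
    # membership of x to every cluster; return the strongest (value, label)
    scores = [(membership(x, c, s), label) for c, s, label in clusters]
    return max(scores)

value, label = classify((0.25, 0.35))
print(label)  # -> cup
```

In MF ARTMAP the membership functions are learned and the cluster set grows during structure adaptation, whereas here both are fixed; the sketch only shows how a point's membership to every cluster can be evaluated and compared.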
In [32], textures were modeled with a GMRF whose parameters were estimated from training samples observed at a given angle, as the GMRF is not invariant to scale and rotation. The classifier was based on a modified Bayes rule: obtain the maximum likelihood estimate of the rotation and scale parameters for each class hypothesis, compare the results, and map the input sample to the class with the highest estimate. The GMRF texture model was parameterized so that the rotation and scale parameters became part of the model through the spectral density of the GMRF. The experimental results reported in [32] show that this approach is successful.

4 System Proposal for Cloud-based Object Classification

Based on our study of cloud robotics, we have identified a challenging research topic in this field: a cloud-based system for object classification from image data. This system contributes to cloud robotics as a distributed vision system that can be available to any device capable of connecting to the internet and able to use
cloud services. The system will be based on a shared knowledge base and will fulfill the criteria for becoming an AI brick as defined in [13]. It should be usable in existing cloud robotics frameworks such as RoboEarth.

A high level overview of the proposed system is shown in Fig. 2. The system will accept an image from the device in the most commonly used image formats. The image will then be preprocessed, and features will be extracted using one of the local feature extraction methods described above. The final decision on which method to use will be made based on the results of performance tests. By using a single feature extraction method, we ensure that the feature space is normalized for all users. Clustering and classification will be done in a classification module which uses the knowledge shared by all users. The classification service will then send back the result of the classification, consisting of at least the five most probable object classes. If the object in the image is not classified, or is classified wrongly, the user will have the option to offer a better result.

Fig. 2 High level overview of the architecture of the proposed system. The object classification module is shared between users, and knowledge is stored in the structure of the classifier

The main contribution of the system will be its availability for various devices; as a result of using a cloud computing platform, it should be available everywhere and at any time. The second main contribution will be the use of shared knowledge. This will increase the rate at which the knowledge base is built, and will provide a higher quality service as the number of users grows. Knowledge sharing, easy availability and easy implementation are the defining characteristics of the proposed system.

The challenge will be to adapt MF ARTMAP and the GMRF model to the cloud architecture. These two methods will be compared with their stand-alone versions. To evaluate the system, we will also create tests to verify its robustness and performance. We will use standard classification tests, as well as tests on the classification of common household objects. The most decisive test will be the comparison of a system open to general use with a system open to only a handful of expert teachers. The open system can benefit from crowd knowledge, where every user of the system can also contribute to the learning process. Our hypothesis is that the system open to general use will train faster, will have more object classes, and will provide more accurate results in long-term use.

5 Conclusion

In this chapter, we presented an overview of cloud robotics, the local feature extraction methods SIFT, SURF and ORB, and the classifiers Membership Function ARTMAP and Gaussian Markov Random Field model. Based on this knowledge, a cloud-based object classification system was proposed. The system was proposed as an AI brick, and aims to provide powerful and easy to use object classification from image data for existing and future robots. The advantages of providing this system as a cloud service are availability, both geographical and for various devices (the only requirements are the ability to connect to the internet and to use cloud services), instant sharing of gained knowledge between connected devices, easy rollout of new versions, scalability, reliability, and offloading of heavy tasks to the cloud. More importantly, the system will be created in a way that allows easy integration into existing cloud robotics frameworks such as RoboEarth.
From the research point of view, since cloud robotics is a relatively new field of technology and research, there are many challenges associated with it. One of them is the question of whether cloud robotics will have an impact on the methods traditionally used in Artificial Intelligence, an example being the implementation of a neural network in the cloud with a structure shared by every user.

Acknowledgments Research supported by the "Center of Competence of knowledge technologies for product system innovation in industry and service", ITMS project number 26220220155, for the years 2012-2015.
References

[1] Google Apps. [Online]. Available: http://www.google.com/enterprise/apps/. [Accessed: 05-Jun-2013].
[2] Microsoft Office 365. [Online]. Available: http://office.microsoft.com/en-001/. [Accessed: 24-Jul-2013].
[3] E. Guizzo, "Robots With Their Heads in the Clouds," IEEE Spectrum, 28-Feb-2011.
[4] M. Inaba, "Remote-brained humanoid project," Advanced Robotics, vol. 11, no. 6, pp. 605-620, 1996.
[5] I. T. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared," in 2008 Grid Computing Environments Workshop, 2008, pp. 1-10.
[6] P. Mell and T. Grance, "The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology," NIST Special Publication, vol. 145, p. 7, 2011.
[7] Dropbox. [Online]. Available: http://www.dropbox.com/. [Accessed: 03-Jun-2013].
[8] SkyDrive. [Online]. Available: https://skydrive.live.com/. [Accessed: 05-Jun-2013].
[9] Netduino. [Online]. Available: http://www.netduino.com/. [Accessed: 22-Jul-2013].
[10] Raspberry Pi. [Online]. Available: http://www.raspberrypi.org/. [Accessed: 22-Jul-2013].
[11] Romo. [Online]. Available: http://romotive.com/. [Accessed: 25-Jul-2013].
[12] SmartBot. [Online]. Available: http://www.overdriverobotics.com/smartbot/. [Accessed: 25-Jul-2013].
[13] T. Ferraté, "Cloud Robotics - new paradigm is near," Robotica Educativa y Personal, 20-Jan-2013.
[14] Y. Chen, Z. Du, and M. García-Acosta, "Robot as a Service in Cloud Computing," in 2010 Fifth IEEE International Symposium on Service Oriented System Engineering, Jun. 2010, pp. 151-158.
[15] R. Arumugam, V. R. Enti, K. Baskaran, and A. S. Kumar, "DAvinCi: A cloud computing framework for service robots," in 2010 IEEE International Conference on Robotics and Automation, 2010, pp. 3084-3089.
[16] MyRobots.com. [Online]. Available: http://myrobots.com. [Accessed: 08-Jun-2013].
[17] H. Li, A*Star Social Robotics. [Online]. Available: http://www.asoro.astar.edu.sg/index.html. [Accessed: 13-Jun-2013].
[18] RoboEarth Project. [Online]. Available: http://www.roboearth.org/. [Accessed: 03-Jun-2013].
[19] D. G. Lowe, "Object recognition from local scale-invariant features," in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150-1157.
[20] G. Yu and J.-M. Morel, "A fully affine invariant image comparison method," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 1597-1600.
[21] Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, vol. 2, pp. 506-513.
[22] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-Up Robust Features (SURF)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, Jun. 2008.
[23] H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded Up Robust Features," in European Conference on Computer Vision, 2006, pp. 404-417.
[24] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: an efficient alternative to SIFT or SURF," in IEEE International Conference on Computer Vision, 2011, pp. 2564-2571.
[25] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in European Conference on Computer Vision, 2006, pp. 430-443.
[26] E. Rosten, R. Porter, and T. Drummond, "Faster and better: a machine learning approach to corner detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 105-119, Jan. 2010.
[27] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary Robust Independent Elementary Features," in European Conference on Computer Vision, 2010, pp. 778-792.
[28] P. Sinčák, M. Hric, and J. Vaščák, "Membership Function-ARTMAP Neural Networks," TASK Quarterly, vol. 7, no. 1, pp. 43-52, 2003.
[29] P. Smolár, "Object Categorization using ART Neural Networks," Technical University of Košice, 2012.
[30] G. A. Carpenter and S. Grossberg, "The ART of adaptive pattern recognition by a self-organizing neural network," Computer, vol. 21, no. 3, pp. 77-88, 1988.
[31] G. A. Carpenter and S. Grossberg, "Adaptive Resonance Theory," MIT Press, Boston, 2003.
[32] F. S. Cohen, Z. Fan, and M. A. Patel, "Classification of rotated and scaled textured images using Gaussian Markov random field models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 2, pp. 192-202, 1991.
[33] G. Rellier, X. Descombes, F. Falzon, and J. Zerubia, "Texture feature analysis using a Gauss-Markov model in hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 7, pp. 1543-1551, Jul. 2004.
[34] M. Berthod, Z. Kato, S. Yu, and J. Zerubia, "Bayesian image classification using Markov random fields," Image and Vision Computing, vol. 14, no. 4, pp. 285-295, May 1996.
[35] R. A. Gopinath, "Maximum likelihood modeling with Gaussian distributions for classification," in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998, vol. 2, pp. 661-664.