A Visual Interactive Realtime EXplorer for Bitcoin!


Scuola Politecnica e delle Scienze di Base
Master's Degree Programme in Computer Engineering (Corso di Laurea Magistrale in Ingegneria Informatica)
Master's thesis in Distributed Systems
A Visual Interactive Realtime EXplorer for Bitcoin
Academic Year 2013/14
Supervisor: Prof. Stefano Russo
Co-supervisor: Dr. Marco Benedetti
Candidate: [...], matr. M[...]

Introduction
Chapter 1: Bitcoin: the protocol and the currency
    Addresses
    Transactions
    Blocks
    Monetary aspects and incentive
    Bitcoin origins
    Bitcoin ecosystem
Chapter 2: State of the art in bitcoin forensic analysis
    Related literature
    The flexcoin case
    The biggest unsolved case: Mt.Gox eruption
Chapter 3: Considerations about the blockchain
Chapter 4: Requirements specification
Chapter 5: High-level architecture
Chapter 6: The Identity Reasoner
    Address clustering
    Data structure
    Invariants of the identity reasoner database
    Considerations about address graph size
    Implementation details and experimental results
Chapter 7: The Query Engine
    Balance queries
    Flow queries
    Implementation details
Chapter 8: The graphical user interface
    Implementation details
Chapter 9: Experimental results
Future directions
References

Introduction

Bitcoin is a peer-to-peer, decentralised virtual currency born a few years ago and currently used to trade various goods and services all over the world. The currency is not legally recognised by most governments and is controlled by no central entity. Despite these shortcomings (or perhaps thanks to them), the market capitalisation of Bitcoin (BC) is already on the order of billions of dollars.

The identity of Bitcoin users is hidden behind pseudonyms, but the ledger of financial transactions is globally visible. Many academic studies deal with the problem of linking pseudonyms to real-world identities and of inferring knowledge from the graph of transactions (whose size is on the order of tens of millions of transactions and rising quickly). At present, the graph of transactions is explored manually, through ad-hoc scripting, or with software developed for other purposes, such as visualizers for (graph) databases. Looking for and making sense of interesting or novel transaction patterns can be quite challenging.

This work aims to produce a modular, scalable, adaptable software toolkit meant to assist a human expert in analysing and making sense of a network of bitcoin transactions. The software will be called VIREX-BC, as in Visual Interactive Realtime EXplorer for BC: visual, in that all the information will be presented and explored through imagery and infographics generated on the fly depending on the search context; interactive, in that input from users will be accepted at any moment to direct and refine the exploration; and realtime, in that fresh transactions will be included and analysed on the fly as they are timestamped in the network.

This work is inspired by bitiodine, an open-source tool for extracting intelligence from the bitcoin network, developed by Michele Spagnuolo for his master's thesis at Politecnico di Milano and published at Financial Cryptography [3].

Chapter 1: Bitcoin: the protocol and the currency

Bitcoin is a decentralised global digital currency, based on open-source software that implements a peer-to-peer network which agrees on a logical order of transactions by means of a distributed algorithm. It first appeared in January 2009, with the first release of the Bitcoin client. This chapter describes the working principles of Bitcoin (addresses, transactions and blocks) and the ecosystem of services born around this technology.

Addresses

Before accepting payments in bitcoin, a user has to create a bitcoin address. An address is a string of letters and numbers that can be thought of as an International Bank Account Number (IBAN): it is public, it has a reduced risk of transcription errors, and it is needed to receive money. An example of an address is 1PeppeMEUXx6XgjubBnEQKtay2xpefnCZT; at the time of writing this address has a balance of 0.2 bitcoin. Generating a bitcoin address has no cost and can be done with the bitcoin client.

In the privacy model introduced by bitcoin (see the picture on the left) transactions are public, so addresses also work as pseudonyms that hide people's real identities. A person can have multiple addresses; therefore one of the principal activities in bitcoin forensic analysis is linking addresses controlled by the same user. The chapter The Identity Reasoner gives a definition of address control and of a cluster of addresses.

Transactions

Bitcoin transactions are public transfers of funds among addresses: a transaction transfers bitcoins from zero or more input addresses to one or more output addresses. The following picture shows four transactions in a scenario in which Alice and Bob are two bitcoin users. For each of the four transactions, input and output addresses are represented by colours: the transactions on the far right have no input addresses and one output address; the middle transaction has one input address (green) and two output addresses (red and cyan); and the fourth transaction has two input addresses and two output addresses.
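The transaction shapes described above can be illustrated with a minimal sketch. It is a toy model rather than the real Bitcoin data structure: the class name `Transaction`, its fields and the placeholder strings used as addresses are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transaction:
    """Toy transaction: a transfer from input addresses to output addresses."""
    inputs: List[str] = field(default_factory=list)   # zero or more input addresses
    outputs: List[str] = field(default_factory=list)  # one or more output addresses

    def __post_init__(self) -> None:
        # A transfer with no destination is not a transaction in this model.
        if not self.outputs:
            raise ValueError("a transaction needs at least one output address")

    def shape(self) -> Tuple[int, int]:
        """Return (number of input addresses, number of output addresses)."""
        return (len(self.inputs), len(self.outputs))


# Placeholder strings stand in for real bitcoin addresses (hypothetical values).
no_input_tx = Transaction(outputs=["green"])
middle_tx = Transaction(inputs=["green"], outputs=["red", "cyan"])
fourth_tx = Transaction(inputs=["in_a", "in_b"], outputs=["out_a", "out_b"])

for tx in (no_input_tx, middle_tx, fourth_tx):
    print(tx.shape())
```

The three instances mirror the three shapes in the picture: no inputs and one output, one input and two outputs, two inputs and two outputs.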

Moreover, each input of a transaction has a reference (marked with a dashed line) to an output of a previous transaction. These references are needed to prove that an address owns the amount of bitcoins it wants to transfer and, together with the transactions, they form a directed acyclic graph. Ron and Shamir analyse this graph in detail in [6].

The transactions on the far right have neither input addresses nor references to previous transactions: they are the so-called mining transactions. Mining transactions are the way bitcoins are injected into the network, and their amount is a reward for bitcoin miners, who contribute to writing the public bitcoin ledger (see the Blocks section for details).

The middle transaction has a very common structure: it has one input address (the green one, which belongs to Bob) and two outputs (Bob's red address and Alice's cyan address). In this case, Bob is spending his mined 50 bitcoins to pay 30 bitcoins to Alice's cyan address, which already has a balance of 50 bitcoins. Since the bitcoin protocol forces users to spend the whole output of a transaction, Bob needs to spend the whole content of his green address (50 bitcoins), even if he just wants to transfer 30 bitcoins to Alice. To collect the change, Bob creates a new red address, called a change address, and sends 19 bitcoins to it. The sum of the inputs (50 bitcoins) is therefore not equal to the sum of the outputs (30 + 19 = 49 bitcoins): there is a difference of 1 bitcoin. This difference (which in practice is lower than 1 bitcoin) is called a transaction fee, and is necessary for the network to accept and timestamp the transaction.

The last transaction is multi-input. Suppose that Alice wants to buy a service for 60 bitcoins. She must prove that she owns 60 bitcoins, so she needs to insert two references to the previous transactions that have deposited 80 bitcoins in her cyan address. To collect the change, Alice generates an orange address that will be used in the future. The inputs of a multi-input transaction do not need to share the same address. As we will see, multi-input transactions are the primary source of evidence for linking addresses to the same controller.

Technically speaking, a transaction is a piece of data that is broadcast to the bitcoin network. As shown in the picture on the left, it is signed with the private key of the payer and includes the public key of the payee, and the reference to the previous transaction is obtained through hashing. All transactions are public, unencrypted and permanently recorded in the blockchain since its origin.

Blocks

Now that we have defined a data structure for transactions, we need a way to logically order them, so that the payee knows that the previous owners of an output did not sign any earlier transaction spending it, thus avoiding double spending of a transaction output. A common solution to this problem is to introduce a trusted central authority, or mint, that checks every transaction for double spending. Bitcoin proposes a completely distributed solution, in which all transactions are publicly announced and the network agrees on a single history of the order in which they were received [1]. Using the following picture as reference, we will now explain the distributed algorithm that each node performs in order to reach this goal.

Step 1: new transactions are broadcast to all nodes.

Step 2: each node collects new transactions into a block.

Blocks are represented as green boxes: they contain transactions and are stored in an ordered list, called the blockchain, which defines the logical order of transactions in the bitcoin network. Each block contains a hash of the previous one, proving that the data of the previous block must have existed (in order to enter the hash) at the time the next block is added to the blockchain. In the example shown in the figure, we can state that Tx1 and Tx2 must have existed when Tx6 and Tx7 were added to the blockchain. Transactions in the same block have to be considered concurrent, in the sense that it is not possible to logically order them. As we will see in the following sections, all nodes agree on a single blockchain and a new block is generated at a rate of about one every 10 minutes.

Step 3: each node works on finding a difficult proof-of-work for its block.

Once all new transactions are collected into a block, a node tries to add this block to the current blockchain and to persuade the other nodes to agree that the next link of the blockchain is the block it forged, but this is a very difficult task. Users in the bitcoin network accept only blocks that carry a proof-of-work, like the one described in [7]: a piece of data that is difficult to find but immediate to verify. In particular, the proof-of-work for a block is a nonce value that gives the block a hash that begins with a fixed number of zero bits. Once the CPU effort has been expended to make it satisfy the proof-of-work, the block cannot be changed without redoing the work. As later blocks are chained after it, the work to change the block would include redoing all the blocks after it. The majority decision is represented by the longest chain, which has the greatest proof-of-work effort invested in it. This algorithm works if a majority of CPU power is controlled by honest nodes, where a node is said to be honest when it agrees to work on the longest chain it knows. Some considerations about proof-of-work integrity can be found in [1] and [8].

Step 4: when a node finds a proof-of-work, it broadcasts the block to all nodes.

Step 5: nodes accept the block only if all transactions in it are valid and not already spent.

Step 6: nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash.

Monetary aspects and incentive

Bitcoin is designed as a system where no central monetary authority is involved. New money is created and introduced into the system via the process of validating transactions (i.e. finding valid blocks): by convention, the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. This, apart from providing a way to initially distribute coins into circulation, adds an incentive for nodes to support the network. The steady addition of a constant amount of new coins is analogous to gold miners expending resources to add gold to circulation. In our case, it is CPU time and electricity that are expended [1].

The supply of money evolves based on an agreement between the users performing the mining activity [2]. Currently, the scheme is designed to supply money at a predictable pace: the number of bitcoins generated per block halves every 4 years, so that the total number of bitcoins in circulation approaches a cap of 21 million, almost all of which will have been issued by 2040 (see the graph from [2]). This solution has many negative macroeconomic implications, such as price instability and a deflationary economy.

When the money supply has reached the plateau, the incentive will be provided by transaction fees. If the output value of a transaction is less than its input value, the difference is a fee that is added to the incentive value of the block containing the transaction. Both incentives may help encourage nodes to stay honest. If a greedy attacker is able to assemble more CPU power than all the honest nodes, he would have to choose between using it to defraud people by stealing back his payments, or using it to generate new coins. He ought to find it more profitable to play by the rules, which favour him with more new coins than everyone else combined, than to undermine the system and the validity of his own wealth. Reference [8] assesses proof-of-work integrity in a scenario in which bitcoin is used as the primary currency for online transfers currently carried out by credit card.

Bitcoin origins

The theoretical roots of Bitcoin can be found in the Austrian school of economics and its criticism of the current fiat money system and of the interventions undertaken by governments and other agencies, which, in their view, result in exacerbated business cycles and massive inflation [2]. In 1998, cryptography advocate Wei Dai suggested a system in which the currency would be both regulated and created through crowdsourced cryptography. In 2008, a person (or a group of people) under the pseudonym of Satoshi Nakamoto distributed a paper named Bitcoin: A Peer-to-Peer Electronic Cash System [1] and then released open-source software named Bitcoin, which was a first attempt to give shape to this idea. The first bitcoin transaction was dated 3 January 2009.

In the first years of its life, bitcoin was used in small communities of early adopters: anyone could install the open-source software on a personal computer and participate in the network, including by minting new bitcoins. In 2010 Bitcoin was used by an individual to trade a real good for the first time, but the true explosion of its popularity can be dated to the middle of [...]. Since then a wide variety of service providers have begun to accept bitcoin as a means of payment, and an ecosystem of support services, such as wallet services and exchanges, was born. Some of these third parties are noteworthy.

Bitcoin ecosystem

Wallet services allow bitcoin users to transact with others without installing the bitcoin client. These services manage a bitcoin address on behalf of its owner, who can send and receive bitcoins in a home-banking fashion. greenaddress.it and blockchain.info offer wallet services.

Bitcoin currency exchanges allow users to trade bitcoins for other currencies, earning a commission on each trade. They usually also operate as wallet services, storing money on behalf of their customers, and allow deposits and withdrawals in different currencies.

The Silk Road was a famous online shop in the deep web that could only be accessed via TOR. The site allowed people to buy a variety of items, but became famous as a market for drugs and other illicit items. On October 2, 2013 the FBI shut down the Silk Road and its creator was arrested on charges of alleged murder-for-hire and narcotics trafficking violations.

Mining pools are distributed services aimed at transaction validation, in which clients contribute together to the validation of transactions and then split the resulting reward according to the processing power that each participant contributed. Pooled mining effectively reduces the granularity of the block generation reward, spreading it out more smoothly over time. Deepbit was an example of a mining pool.

The bitcoin network is particularly well suited to gambling. Its protocol allows online gambling services to prove that results were calculated fairly without trusting any external party. Hundreds of gambling sites use the bitcoin network, including dice games, casinos, lotteries, slot machines and poker rooms. Satoshidice and just-dice are two of the most famous dice games.

Nowadays many vendors, including restaurants and shops, accept bitcoin as a means of payment, even though it is rather unusual for bitcoins to be used to purchase physical goods or services, mainly because of price instability. coinmap.org shows vendors accepting bitcoin all around the world.

It is important to know that bitcoin is not the only virtual currency: many other currencies, called alt-coins, have been created since 2009 by modifying the bitcoin core source code, such as litecoin, namecoin and dogecoin. Moreover, many currencies, usually called meta-coins, are built upon the bitcoin infrastructure, each adding a particular service (e.g. zerocoin adds strong anonymity to bitcoin).

In general, the bitcoin protocol and its infrastructure (the blockchain), currently used mainly to transfer coins among people, can be used to send public, potentially anonymous, timestamped, permanent and certified messages; it therefore has a wide variety of applications and can replace many forms of intermediation. It is not certain that bitcoin will succeed as a currency, but its technology is certainly worthy of attention from entrepreneurs and regulators.
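As a closing illustration for this chapter, the proof-of-work search described in Step 3 of the Blocks section can be sketched in a few lines. This is a toy model and not the real Bitcoin block format: the helper names (`block_hash`, `find_nonce`, `verify`), the single SHA-256 pass, the layout of the hashed data and the low difficulty are all assumptions chosen to keep the example short and fast.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def block_hash(prev_hash: bytes, payload: bytes, nonce: int) -> bytes:
    # Toy block: previous block hash + transactions payload + nonce.
    return hashlib.sha256(prev_hash + payload + nonce.to_bytes(8, "big")).digest()


def find_nonce(prev_hash: bytes, payload: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose block hash starts with enough zero bits."""
    nonce = 0
    while leading_zero_bits(block_hash(prev_hash, payload, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(prev_hash: bytes, payload: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verification is a single hash computation."""
    return leading_zero_bits(block_hash(prev_hash, payload, nonce)) >= difficulty_bits


previous = bytes(32)           # placeholder hash of the previous block
payload = b"tx1;tx2"           # placeholder for the transactions in the block
nonce = find_nonce(previous, payload, difficulty_bits=12)
print(nonce, verify(previous, payload, nonce, 12))
```

Finding the nonce takes on average about 2^12 hash attempts at this difficulty, while checking it takes one: this asymmetry is what makes the proof difficult to find but immediate to verify. Because each hash also covers the previous block's hash, altering an earlier block would invalidate the nonces of all the blocks chained after it.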

Chapter 2: State of the art in bitcoin forensic analysis

Related literature

Owing to Bitcoin's claimed anonymity, forensic analysis of its network has been a well-studied topic in the literature since [...]. In 2011, Reid and Harrigan [9] were the first to link addresses belonging to the same entity and showed some implications for anonymity. In 2012, the authors of [10] analysed and evaluated the privacy implications of Bitcoin if it were used as the primary currency to support the daily transactions of individuals in a university setting. Through a simulator that faithfully mimics the use of Bitcoin within a university, they show that the profiles of almost 40% of the users can be, to a large extent, recovered. In 2013, researchers at the University of California collected information on the web and tried to group bitcoin addresses based on evidence of shared authority; their work is published in [4].

In 2011, Michele Spagnuolo released the open-source software Bitiodine, simultaneously with his thesis at the University of Illinois. His work was later published at Financial Cryptography [3] under the title Bitiodine: Extracting Intelligence From The Bitcoin Network. Bitiodine is able to cluster addresses and classify them using a dataset obtained in part automatically, through scrapers for the major web sources of bitcoin addresses.

Bitiodine has been the main source of inspiration for this work: with the help of its creator, its source code has been studied and analysed in depth. Virex tries to maintain bitiodine's strengths and to add some improvements, such as an architecture for realtime tracking of transactions and a graphical user interface.

The flexcoin case

Flexcoin, a bitcoin bank, was forced to close because of a theft of 896 bitcoins on 3 March. The company posted the following statement on its website:

"The attacker logged into the flexcoin front end from IP address [...] under a newly created username and deposited to address 1DSD3B3uS2wGZjZAwa2dqQ7M9v7Ajw2iLy. The coins were then left to sit until they had reached 6 confirmations. The attacker then successfully exploited a flaw in the code which allows transfers between flexcoin users. By sending thousands of simultaneous requests, the attacker was able to "move" coins from one user account to another until the sending account was overdrawn, before balances were updated. This was then repeated through multiple accounts, snowballing the amount, until the attacker withdrew the coins (1NDkevapt4SWYFEmquCDBSf7DLMTNVggdu, and 1QFcC5JitGwpFKqRDd9QNH3eGN56dCNgy6)."

The information provided is enough to visualise the flows between flexcoin and its attacker, and also to draw some conclusions about the fate of the stolen coins.

The biggest unsolved case: Mt.Gox eruption

Mt. Gox, called "Mount Gox" or "MTGOX", was one of the most widely used bitcoin currency exchange markets: it was launched in July 2010 and by 2013 was handling 70% of all Bitcoin trades. The market was closed in February 2014. Mark Karpelès, Mt. Gox CEO, filed for bankruptcy and announced that around 850,000 bitcoins belonging to customers and the company were missing and likely stolen. Although 200,000 bitcoins have since been found, the reasons for the disappearance (theft, fraud, mismanagement, or a combination of these) are unclear as of March 2014.

The timeline of the events that led to the Mt. Gox shutdown is the following.

On 7 February, Mt. Gox halted all bitcoin withdrawals. The company said it was pausing withdrawal requests to obtain a clear technical view of the currency processes.
On 10 February, the company issued a press release stating that the issue was due to transaction malleability, a known bug that affected many bitcoin clients, including the official one. For technical details about transaction malleability, see Decker and Wattenhofer [11].
On 24 February, Mt. Gox suspended all trading, and hours later its website went offline, returning a blank page.
On 28 February, Mt. Gox filed for bankruptcy protection in Tokyo, reporting that the company had lost almost 750,000 of its customers' bitcoins and around 100,000 of its own bitcoins.
On 20 March, Mt. Gox reported on its website that it had found 200,000 bitcoins in an old-format cold wallet. That brings the total number of lost bitcoins down to 650,000 from 850,000.

Chapter 3: Considerations about the blockchain

This chapter presents some considerations about the size of the blockchain and the consequent scalability of virex. First, the state of the current blockchain is analysed and some assumptions about the structure of transactions are made. Then, the trend of the total number of transactions is considered in order to predict the size of the blockchain in the future.

The following table reports some measurements obtained from the blockchain at the time of writing (Tue, 13 May 06:42:37 GMT).

Number of blocks                [...]
Number of transactions          ~39M
Number of distinct addresses    ~36M
Number of outputs               ~100M
Number of inputs                ~89M

Each transaction can have an arbitrary number of inputs and outputs, and can generate an arbitrary number of new addresses, but some observations can be made about the distributions of the number of inputs and outputs per transaction. The following charts show that these distributions peak at 1 input and 2 outputs, respectively.

[Figure: distributions of #inputs/transaction and #outputs/transaction]
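As a rough sanity check, the measurements in the table above can be turned into averages per transaction. The sketch below uses the rounded figures from the table (the number of blocks is not needed), so its results are only approximate; the dictionary and function names are assumptions made for illustration.

```python
# Rounded measurements taken from the table above.
COUNTS = {
    "transactions": 39e6,
    "addresses": 36e6,
    "outputs": 100e6,
    "inputs": 89e6,
}


def per_transaction(counts: dict) -> dict:
    """Divide every measurement by the number of transactions."""
    n_tx = counts["transactions"]
    return {name: value / n_tx for name, value in counts.items() if name != "transactions"}


for name, ratio in per_transaction(COUNTS).items():
    print(f"{name} per transaction: {ratio:.2f}")
# addresses per transaction: 0.92
# outputs per transaction: 2.56
# inputs per transaction: 2.28
```

On average, then, each transaction introduces a little less than one new address and has slightly more outputs than inputs.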

[Tables: probability of each number of inputs per transaction and of each number of outputs per transaction]

A transaction structure with one input and two outputs is the most common: the input address is used to collect money, the first output address is controlled by the payee and the second is used to collect the change. Sometimes one address is not sufficient to collect a large amount of money to transfer, and more input addresses are needed. We can assume that these distributions are approximately time-invariant, because an increase in the size of a transaction (in terms of number of inputs and outputs) leads to more expensive transaction fees. The expected values of the above distributions are summarised in the following table.

Current number of transactions                               ~39M
Expected number of addresses per transaction    E[na/nt]     0.92
Expected number of inputs per transaction       E[nin/nt]    2.30
Expected number of inputs per transaction       E2[nin/nt]   2.31
Expected number of outputs per transaction      E[nout/nt]   2.58

NB: the expected number of inputs per transaction E2 counts mining transactions as transactions with one input.

Now let us focus on the total number of transactions and on its derivative, the number of transactions per day (see the diagrams below). Both grew quite slowly from 2009 to mid 2012, but immediately afterwards they started to grow faster, with a change in the trend of the number of transactions per day in the middle of [...]. The steeper slope is consistent with the diffusion of bitcoin among not-very-early adopters, and we will probably experience other trend changes in the future; however, it is possible to state that, when the growth slows down significantly, the total number of transactions will level off at an approximately constant value.

In order to uncover the current trend behind the growth of the total number of transactions after mid 2012, and to predict its value in the near future, a simple linear regression is estimated between the number of days elapsed since 1 June 2012 (called the reference date from now on) and the number of transactions per day, resulting in a fitted line with intercept at [...] transactions/day and slope 53.75 transactions/day/day. Afterwards, to estimate the total number of transactions it is necessary to integrate this quantity, taking as initial value the [...] transactions recorded at the reference date.
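The integration step can be made explicit with a short sketch. If the daily rate is r(t) = a + b·t, with t in days since the reference date, then the total is N(t) = N0 + a·t + b·t²/2. Only the slope b = 53.75 comes from the text; the intercept and the initial total used in the usage example are hypothetical placeholders, since their values are not legible in this copy, and the function names are assumptions.

```python
SLOPE = 53.75  # transactions/day/day, as reported in the text


def transactions_per_day(days: float, intercept: float) -> float:
    """Fitted line: daily transaction rate `days` after the reference date."""
    return intercept + SLOPE * days


def total_transactions(days: float, intercept: float, initial_total: float) -> float:
    """Integral of the fitted line from 0 to `days`, plus the initial total."""
    return initial_total + intercept * days + 0.5 * SLOPE * days ** 2


# Hypothetical placeholder values, NOT taken from the thesis.
EXAMPLE_INTERCEPT = 20_000.0        # transactions/day at the reference date
EXAMPLE_INITIAL_TOTAL = 5_000_000.0  # total transactions at the reference date

for years in (1, 2, 3):
    days = 365 * years
    print(
        years,
        round(transactions_per_day(days, EXAMPLE_INTERCEPT)),
        round(total_transactions(days, EXAMPLE_INTERCEPT, EXAMPLE_INITIAL_TOTAL)),
    )
```

Because the rate grows linearly, the total grows quadratically in time; this is the growth that the memory requirements of the system have to keep up with.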

The results are shown in the following table.

Date (Jun 01)    Blockchain estimated size    Number of transactions per day    Total number of transactions
[...]            [...]                        [...]                             [...]

These results are obtained through a rough calculation, but they can be useful to assess the feasibility of the project: as explained in the chapters The Identity Reasoner and The Query Engine, the database size depends linearly on the number of transactions. Since the number of transactions per day currently grows linearly while, according to Moore's law, memory size grows exponentially, doubling every year (or every three years), it should be possible to keep the whole database in memory. However, this analysis is quite optimistic. If bitcoin becomes commonly accepted as a means of payment, transactions will grow at a much higher pace before reaching the saturation plateau.

Chapter 4: Requirements specification

The requirements of VIREX can be summarised as a set of questions that the system tries to answer. They are all queries, in the sense that they do not modify the internal state of the system; for this reason, the virex interface is often referred to as the virex query language.

A first classification of virex operations separates questions about balances from questions about flows. The first category contains questions about the amount of bitcoins controlled by an address, an entity or a cluster; the second contains questions about bitcoin transfers among addresses, and also about mined bitcoins.

A second (orthogonal) classification separates questions about addresses from questions about clusters of addresses. The first class considers flows among single bitcoin addresses or bitcoin entities, without applying any clustering algorithm; all the information is extracted from the public ledger of the bitcoin blockchain. The second class of operations answers by applying clustering to bitcoin entities, with the aid of the clustering heuristics and algorithms described in the literature (see the chapter The Identity Reasoner).

The tables in this section specify all interface methods. Implementation details are in the chapter The Query Engine.

F.R.1 BALANCE

Natural language questions:
    What's the balance of the address 1dice8EMZ at May 26 14:07:44 UTC?
    What's the balance of the addresses controlled by Satoshi at May 26 14:07:44 UTC?

           Name        Type      Description
Inputs     entity      String    The address, or the supposed controller, whose balance we are interested in
           timestamp   Number    The unix timestamp of the date and time
Outputs    balance     Number    The balance, in satoshis, of the specified entity at the requested date and time

F.R.2 BALANCE CLUSTERED

Natural language questions:
    What's the balance of the cluster to which address 1dice8EMZ belongs, at 26 May 14:07:44 UTC?
    What's the balance of the cluster controlled by Giuseppe, at 26 May 14:07:44 UTC?

           Name        Type      Description
Inputs     entity      String    A representative address, or the supposed controller, of the cluster whose balance we are interested in
           timestamp   Number    The unix timestamp of the date and time
Outputs    balance     Number    The balance, in satoshis, of the specified cluster at the requested date and time

In the un-clustered version of a balance query, when a controller is specified, virex returns the sum of the amounts of bitcoin deposited in the addresses controlled by the selected entity.
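The un-clustered balance query (F.R.1) can be illustrated with a naive sketch that scans a list of ledger entries. This is only an assumption about how such a query could be answered, not the actual Query Engine: the `LedgerEntry` type, the `controlled` mapping from a controller name to its addresses, and the linear scan are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set


class LedgerEntry(NamedTuple):
    timestamp: int  # unix timestamp of the transaction
    address: str    # address whose balance changes
    delta: int      # satoshis: positive when received, negative when spent


def balance(entity: str, timestamp: int,
            ledger: Iterable[LedgerEntry],
            controlled: Dict[str, Set[str]]) -> int:
    """Balance, in satoshis, of an address or of a controller's addresses."""
    # A known controller expands to its addresses; anything else is one address.
    addresses = controlled.get(entity, {entity})
    return sum(e.delta for e in ledger
               if e.address in addresses and e.timestamp <= timestamp)


# Hypothetical toy data.
ledger = [
    LedgerEntry(100, "addr_a", 50),
    LedgerEntry(150, "addr_b", 20),
    LedgerEntry(200, "addr_a", -30),
]
controlled = {"alice": {"addr_a", "addr_b"}}

print(balance("addr_a", 250, ledger, controlled))  # single address
print(balance("alice", 250, ledger, controlled))   # sum over controlled addresses
```

Under the same assumptions, the clustered variant (F.R.2) would differ only in the mapping used: the entity would first be resolved to the full cluster of addresses it belongs to.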

F.R.3 FLOW

Natural language questions:
    What's the flow between address 1dice8EMZ and address 1NDpZ2wyFe... in the period of time that goes from 15 Jan 00:00 UTC to 26 May 14:07:44 UTC?
    What's the flow between the addresses controlled by Satoshi and address 1NDpZ2wyFe... in the period of time that goes from 15 Jan 00:00 UTC to 26 May 14:07:44 UTC?

           Name                  Type      Description
Inputs     payer entity          String    The address, or supposed controller, of the payer of the flow we are interested in
           payee entity          String    The address, or supposed controller, of the payee of the flow we are interested in
           from date timestamp   Number    The unix timestamp of the initial date and time
           to date timestamp     Number    The unix timestamp of the final date and time
Outputs    flow                  Number    The flow between addresses in the specified period

F.R.4 FLOW CLUSTERED

Natural language question:
    What's the flow between the cluster to which address 1dice8EMZ belongs and the cluster controlled by Giuseppe, in the period of time that goes from 15 Jan 00:00 UTC to 26 May 14:07:44 UTC?

           Name                  Type      Description
Inputs     payer entity          String    A representative address, or the supposed controller, of the payer cluster
           payee entity          String    A representative address, or the supposed controller, of the payee cluster
           from date timestamp   Number    The unix timestamp of the initial date and time
           to date timestamp     Number    The unix timestamp of the final date and time
Outputs    flow                  Number    The flow between clusters in the specified period
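A flow query (F.R.3) can be sketched in the same naive style. The sketch assumes a strong simplification that is not stated in the text: each payment is already available as a single payer-to-payee transfer, whereas real transactions with several inputs and outputs would first need an attribution rule. The `Transfer` type, the `controlled` mapping and the inclusive time bounds are all hypothetical choices.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Transfer(NamedTuple):
    timestamp: int  # unix timestamp of the transaction
    payer: str      # paying address
    payee: str      # receiving address
    amount: int     # satoshis


def addresses_of(entity: str, controlled: Dict[str, Set[str]]) -> Set[str]:
    """Expand a controller name to its addresses; otherwise treat it as an address."""
    return controlled.get(entity, {entity})


def flow(payer_entity: str, payee_entity: str, from_ts: int, to_ts: int,
         transfers: Iterable[Transfer],
         controlled: Dict[str, Set[str]]) -> int:
    """Total satoshis moved from payer to payee within [from_ts, to_ts]."""
    payers = addresses_of(payer_entity, controlled)
    payees = addresses_of(payee_entity, controlled)
    return sum(t.amount for t in transfers
               if t.payer in payers and t.payee in payees
               and from_ts <= t.timestamp <= to_ts)


# Hypothetical toy data.
transfers = [
    Transfer(100, "a1", "b1", 30),
    Transfer(150, "b1", "a1", 7),
    Transfer(200, "a2", "b1", 10),
    Transfer(300, "a1", "b1", 5),
]
controlled = {"alice": {"a1", "a2"}}

print(flow("a1", "b1", 0, 250, transfers, controlled))
print(flow("alice", "b1", 0, 250, transfers, controlled))
```

Note that the flow is directional in this sketch: swapping payer and payee gives a different result.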

F.R.5 MINED BITCOINS (UN-CLUSTERED OR CLUSTERED)

Natural language question:
    What's the amount of bitcoin mined by address 1dice8EMZ (or by the cluster to which the address 1dice8EMZ belongs) in the period of time that goes from 15 Jan 00:00 UTC to 26 May 14:07:44 UTC?

It is important to notice that virex has been designed to answer many more questions, such as:
    How many addresses did 1dice8EMZ pay in the period of time going from 5 Jan 00:00 UTC to 26 May 14:07:44 UTC?
    Who controls the clusters that 1dice8EMZ paid in the period of time going from 5 Jan 00:00 UTC to 26 May 14:07:44 UTC?

These questions enable a deeper analysis of bitcoin flows, but they are not formalised here, and no implementation is available yet.

Now let us define the word realtime in the VIREX acronym. We say that virex is realtime in the sense that all questions specified with the bitcoin query language shall receive answers updated to the latest confirmed transactions. After a transaction is broadcast to the bitcoin network, it may be included in a block; when that happens, one confirmation is said to have occurred for the transaction. With each subsequent block that is added to the blockchain, the number of confirmations increases by one. To protect against double spending, a transaction should not be considered confirmed until a certain number of blocks have been added. Just like the classic bitcoin client, we consider a transaction confirmed when at least 6 blocks confirm it.
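The confirmation rule just described can be written down as a small sketch. The function names and the use of block heights to count confirmations are assumptions made for illustration; the threshold of 6 comes from the text.

```python
from typing import Optional

REQUIRED_CONFIRMATIONS = 6  # threshold used in the text


def confirmations(tx_block_height: Optional[int], chain_height: int) -> int:
    """Number of blocks confirming a transaction.

    tx_block_height is the height of the block containing the transaction,
    or None if it has been broadcast but not yet included in a block.
    chain_height is the height of the latest block in the blockchain.
    """
    if tx_block_height is None:
        return 0
    # The containing block counts as the first confirmation.
    return chain_height - tx_block_height + 1


def is_confirmed(tx_block_height: Optional[int], chain_height: int) -> bool:
    return confirmations(tx_block_height, chain_height) >= REQUIRED_CONFIRMATIONS


print(confirmations(100, 100))  # 1: just included in a block
print(is_confirmed(100, 104))   # False: only 5 confirmations
print(is_confirmed(100, 105))   # True: 6 confirmations
```

With a new block roughly every 10 minutes, this threshold means that answers reflect transactions that entered the blockchain about an hour earlier.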

Chapter 5: High-level architecture

The high-level architecture of the VIREX system is shown in the following figure. Backend components are enclosed in white boxes, data flows are represented by lines, and the direction of each arrow identifies the component that takes the initiative (push/pull).

At the origin of the data there is the Bitcoin Network which, block by block, timestamps transactions and injects them into an extended client that is responsible for the realtime tracking of transactions (the Realtime Tracker).

The Transaction Manager is responsible for analysing new transactions, extracting from them the essential information needed for address clustering, and updating the information controlled by the Query Engine. In particular, it takes into account the flows and balances that result from the analysed transactions.

The Query Engine is the core of the system and is the component responsible for answering the questions described in the requirements. It must be extremely fast and scalable, in order to support requests coming from the user interface (graphical or not).

The Identity Reasoner tries to link addresses together, using information gathered from the blockchain itself and from the web. It clusters together all the addresses likely to be controlled by the same entity.

Currently, not all the components described have a real implementation. In particular, only prototypes of the Identity Reasoner, the Query Engine and the Web user interface have been implemented. Moreover, all these components have to be orchestrated and synchronised to maintain a consistent state of the bitcoin transaction graph, but this problem is not addressed in this work.

Chapter 6: The Identity Reasoner

The Virex identity reasoner is the component responsible for clustering addresses and associating them with entities of the real world (a person, a service, a forum user), with the aid of address clustering and data collection. In particular, it needs to:
- Track clusters of addresses in realtime, while transactions are timestamped in the block chain.
- Merge clusters that belong to the same entity according to heuristics and user knowledge.
- Collect and store information about addresses.

Address clustering

Address clustering in Bitcoin is the activity that seeks to identify groups of addresses that are probably controlled by the same entity. This goal can be reached to some extent thanks to two well-known heuristics that link addresses based on the structure of the transactions in which they are involved.

Before presenting the heuristics, it's important to define the meaning of address control, as in [4]. In short, the controller of an address is the entity expected to be responsible for forming transactions on behalf of that address. Knowledge of the private key is a necessary requirement for address control, but not a sufficient one. Consider, for example, buying physical bitcoins from a vendor such as Casascius. Both the creator and the buyer of the physical bitcoin know the private key but, according to the previous definition, the controller is the buyer. Moreover, it's important to emphasise that this definition of address control is quite different from account ownership. For example, a wallet service or an exchange service is the controller of all the addresses it generates (often used by customers for deposits and withdrawals), but the funds in these addresses are owned by a wide variety of distinct users.

The first linking heuristic is often referred to as the heuristic of multi-input transactions and was already identified by the bitcoin creators: it's described in the privacy section of the original bitcoin paper [1]. Briefly, under the hypothesis that users don't share their private keys, if two addresses are used as inputs to the same transaction, then they are controlled by the same entity. A more formal definition of this heuristic can be found in [3] or [4].

The second linking heuristic is often called shadow address guessing [3] and aims at guessing, for each transaction, the address used for change. According to this heuristic, the address used for change is controlled by the same entity that controls the input addresses. As Satoshi Nakamoto suggests in the original paper, a new key pair should be used for each transaction to keep them from being linked to a common owner, and in fact the current bitcoin implementation generates, for each transaction, a new address for collecting change. Many techniques to identify this address are described in the literature, but in this work the most stringent one is used, i.e. the variant described in [3]:

If there are two output addresses (one payee and one change address, which is true for the vast majority of transactions), and one of the two has never appeared before in the block chain, while the other has, then we can safely assume that the one that never appeared before is the shadow address generated by the client to collect change back.
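The two heuristics can be sketched as follows. This is only an illustration under stated assumptions: the `Tx` record, the `seen` set of addresses already observed in the block chain, and the choice of chaining the inputs pairwise and attaching the shadow address to the first input are conventions of this example, not a description of the actual virex code.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    txid: str
    inputs: list   # input addresses, in order
    outputs: list  # output addresses, in order


def multi_input_links(tx):
    """Heuristic 1: chain the distinct input addresses of a transaction."""
    addrs = list(dict.fromkeys(tx.inputs))  # de-duplicate, keep order
    return list(zip(addrs, addrs[1:]))


def shadow_address(tx, seen):
    """Heuristic 2 (strict variant): return the guessed change address or None."""
    if not tx.inputs or len(tx.outputs) != 2:
        return None
    first, second = tx.outputs
    if first == second:
        return None
    first_new, second_new = first not in seen, second not in seen
    if first_new and not second_new:
        return first
    if second_new and not first_new:
        return second
    return None  # both new or both already seen: no guess


def process(tx, seen):
    """Evaluate both heuristics, then record the addresses as seen."""
    h1 = multi_input_links(tx)
    change = shadow_address(tx, seen)
    h2 = [(tx.inputs[0], change)] if change is not None else []
    seen.update(tx.inputs)
    seen.update(tx.outputs)
    return h1, h2
```

Transactions must be processed in block chain order, since the second heuristic depends on which addresses have already appeared.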

This version, although effective, has proven significantly less safe than the multi-input transaction heuristic. [4] reports a very high rate of false positives, ending up with a giant super-cluster containing the public keys of Mt.Gox, Instawallet, BitPay, and Silk Road.

Moreover, it's possible to learn that two addresses are controlled by the same user thanks to data collection, i.e. by labelling addresses as being controlled by some known real-world entity. Data collection can be performed by transacting with real actors in the bitcoin ecosystem (e.g. playing with just-dice, depositing to and withdrawing from an exchange), but more and more often the primary source of this data is the big and unstructured world of the internet. A very large dataset was collected and described in [4]: services include mining pools, wallets, exchanges, vendors and many others, while BitIodine [3] includes scrapers for just-dice, bitcointalk, bitcoin-otc and many other sites. In addition, many users publicly claim their own addresses on the web, and many of these are collected at blockchain.info/tags.

Data structure

The core data structure of the identity reasoner is a graph in which nodes represent addresses and relationships represent links stating that two addresses are controlled by the same entity. Each node has the following properties:
- An address, a string representing the bitcoin address of the node
- A controller, a string identifying the controller of the bitcoin address
- A cluster id, a number identifying the cluster to which the address belongs

There are three types of relationships:
- HEURISTIC1, directed, to identify a link between two addresses caused by a multi-input transaction.

- HEURISTIC2, directed, to identify a link between two addresses caused by change address guessing.
- SAME_CONTROLLER, undirected, to identify a link between two addresses caused by knowledge of shared control between the two addresses.

Each relationship has a description property, giving information about its origin (e.g. for H1 and H2 relationships, the description is an identifier of the transaction that caused the linking).

Given the identity reasoner data structure, it's possible to identify and track clusters of addresses using well-known graph algorithms. It is straightforward to compute the connected components of a graph in linear time (in terms of the number of vertices and edges of the graph) using either breadth-first search or depth-first search. There are also efficient algorithms to dynamically track the connected components of a graph as vertices and edges are added.

Invariants of the identity reasoner database

Some invariants are defined to keep the data structure consistent with the knowledge extracted from the blockchain and from the web.

INVARIANT0: No two nodes in the graph have the same address.

INVARIANT1: Two addresses are in the same connected component if and only if they have the same cluster identifier.
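One standard way to track connected components incrementally, as vertices and edges are added, is a disjoint-set (union-find) structure. The sketch below is offered as an illustration of that general technique, not as the algorithm used by virex; the class and method names are hypothetical. It maintains INVARIANT1 for insertions only: when an edge is deleted, the affected components have to be recomputed by traversal.

```python
class ClusterTracker:
    """Union-find over addresses; the root address acts as the cluster id."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add_address(self, address):
        if address not in self.parent:
            self.parent[address] = address
            self.size[address] = 1

    def cluster_id(self, address):
        self.add_address(address)
        root = address
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node directly at the root.
        while self.parent[address] != root:
            nxt = self.parent[address]
            self.parent[address] = root
            address = nxt
        return root

    def link(self, a, b):
        """Record that a and b are controlled by the same entity."""
        ra, rb = self.cluster_id(a), self.cluster_id(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra  # attach the smaller cluster to the larger one
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return ra
```

Each `link` call costs almost constant amortised time, so the structure can follow the block chain as new heuristic edges arrive.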

INVARIANT2: Given a transaction with M ordered input addresses and any number of output addresses, the identity reasoner contains the following path, with edges of type HEURISTIC1:

[figure]

INVARIANT3: Given a transaction with an input address in position 0 (first position) and a shadow address, the identity reasoner contains the following edge of type HEURISTIC2:

[figure]

INVARIANT4: Two addresses have the same controller property if and only if they are linked by a SAME_CONTROLLER relationship.

Algorithms

The data structure should be updated each time one of the following events happens:
- A new transaction is confirmed in the blockchain. In this case, the identity reasoner should add new nodes, corresponding to the new addresses that appeared in the network, and new edges, corresponding to the heuristics that have been evaluated.
- A new controller for an address is discovered. In this case, the identity reasoner should merge clusters that are controlled by the same entity, or separate addresses that are no longer controlled by the same entity.

Primitive operations

In addition to setters and getters for node properties, the Virex identity reasoner graph is supposed to offer a series of primitive operations. They do not guarantee the identity reasoner invariants on their own, but are useful to define the more complex transactional operations described in the following paragraphs.

1. create_node(address): if a node with the specified address doesn't exist in the network, create the node.
2. create_relationship(address1, address2, type): if a relationship between address1 and address2 with the specified type doesn't exist, create the relationship.
3. delete_relationship(address1, address2, type)
4. traverse_address(address): starting from the node with the specified address, and using a breadth-first or depth-first algorithm, identify all nodes in the same connected component as the starting node, marking them with the same cluster id.
5. merge_clusters(address1, address2): merge the clusters of the selected addresses, without the need to re-traverse a portion of the graph.

Bootstrapping

To bootstrap the identity reasoner graph, it's necessary to read the whole blockchain and import into the graph all identified nodes (addresses) and the edges produced by heuristics 1 and 2. Then all nodes of the graph have to be traversed in order to identify connected components for the first time.

Adding a bitcoin transaction

When a new transaction is timestamped into the blockchain, it's necessary to update the virex identity reasoner data structure with all the new addresses and new heuristics.
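The five primitive operations listed above can be sketched on a small in-memory graph. This is a toy model written for illustration: the actual prototype stores the graph in a graph database, and the class name, the adjacency representation and the way cluster ids are allocated are assumptions of this example. Edges are stored in both directions because connectivity ignores edge direction, and `merge_clusters` relabels by scanning the cluster map, where a real implementation would more likely keep an index from cluster id to members.

```python
from collections import defaultdict, deque
from itertools import count


class IdentityGraph:
    def __init__(self):
        self.cluster = {}               # address -> cluster id
        self.edges = defaultdict(set)   # address -> {(neighbour, type)}
        self._ids = count(1)            # source of fresh cluster ids

    def create_node(self, address):
        if address not in self.cluster:
            self.cluster[address] = next(self._ids)

    def create_relationship(self, a, b, rel_type):
        self.create_node(a)
        self.create_node(b)
        self.edges[a].add((b, rel_type))
        self.edges[b].add((a, rel_type))

    def delete_relationship(self, a, b, rel_type):
        self.edges[a].discard((b, rel_type))
        self.edges[b].discard((a, rel_type))

    def traverse_address(self, address):
        """Breadth-first search: give one fresh id to the whole component."""
        new_id = next(self._ids)
        queue, seen = deque([address]), {address}
        while queue:
            node = queue.popleft()
            self.cluster[node] = new_id
            for neighbour, _ in self.edges[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return new_id

    def merge_clusters(self, a, b):
        """Relabel b's cluster with a's id, without walking the edges."""
        keep, drop = self.cluster[a], self.cluster[b]
        if keep != drop:
            for address, cid in self.cluster.items():
                if cid == drop:
                    self.cluster[address] = keep
        return keep
```

As in the text, the primitives alone do not enforce the invariants: after `create_relationship` the two endpoints may still carry different cluster ids until `merge_clusters` or `traverse_address` is called.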

From the point of view of the identity reasoner, a transaction can be considered as a set of addresses and a set of heuristics. In the following example, a new transaction involves Address6 and Address4 and a new heuristic of type 1.

When a new transaction is added to the identity reasoner, there is never a need to delete edges, hence there is no need to re-traverse portions of the graph.

Setting controller

Given an address A, if we are going to set its controller property to C, it's important to guarantee the invariants for each possible state of the network. We summarise this state using three binary variables, as shown in the following table. Six out of the eight possible states are consistent with the invariants and are therefore noteworthy.

The address has a different controller property | The address is linked to another one with a SAME_CONTROLLER relationship | Another address with controller C exists in the network | Case
FALSE | FALSE | FALSE | 1
FALSE | FALSE | TRUE | 2
FALSE | TRUE | FALSE | Inconsistent
FALSE | TRUE | TRUE | Inconsistent
TRUE | FALSE | FALSE | 3
TRUE | FALSE | TRUE | 4
TRUE | TRUE | FALSE | 5
TRUE | TRUE | TRUE | 6

Each of the consistent states is analysed in detail below.

Setting controller 1/6

In the first case, you just need to set the controller property to C for the given address.

Setting controller 2/6

In this second case, after setting the controller property of Address6 to Alice and adding an edge between Address6 and Address5, clusters 2 and 3 need to be merged. The primitive operations to execute are the following:
1. set_controller( Address6, Alice )
2. create_relationship( Address5, Address6, SAME_CONTROLLER )
3. merge_clusters( Address5, Address6 )

Setting controller 3/6

In this example you need to change the controller of Address5 from Alice to Chris. The node Address5 is not connected to other nodes with a SAME_CONTROLLER relationship, and no node with controller Chris exists in the network, so you just need to change the controller property.

Setting controller 4/6

In this example you need to change the controller of Address5 from Alice to Bob. The node Address5 is not connected to other nodes with a SAME_CONTROLLER

relationship, but a node with controller Bob already exists in the network. The primitive operations to execute are the same as in case 2.

Setting controller 5/6

In this example, you need to change the controller of Address1 from Bob to Chris. The SAME_CONTROLLER relationship between Address1 and Address5 has to be dropped, and the connected components involving these addresses need to be identified again. The primitive operations to execute are the following:
1. set_controller( Address1, Chris )
2. delete_relationship( Address1, Address5, SAME_CONTROLLER )
3. traverse_address( Address1 )
4. traverse_address( Address5 )

Setting controller 6/6

In the last example you need to change the controller of Address6 from Alice to Bob. The primitive operations to execute are the following:
1. set_controller( Address6, Bob )
2. delete_relationship( Address6, Address5, SAME_CONTROLLER )
3. traverse_address( Address6 )
4. traverse_address( Address5 )
5. create_relationship( Address1, Address6, SAME_CONTROLLER )
6. merge_clusters( Address1, Address6 )

Considerations about address graph size

The number of nodes is proportional to the number of addresses in the blockchain. Denoting with E[na/nt] the expected number of addresses per transaction, we have that the number of nodes is

[formula]

Considering heuristic 1, we have an edge for each couple of addresses in a transaction, so the number of relationships of type HEURISTIC1 can be expressed as

[formula]

where nt is the number of transactions in the blockchain and E[nin/nt] is the expected number of inputs per transaction. Considering heuristic 2, we have at most a single shadow address per transaction, so an upper bound on the number of relationships of type HEURISTIC2 can be expressed as

[formula]

It's important to note that both the number of nodes and the number of edges are linear in the number of transactions.
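The three size formulas of this section are not legible in this copy. A reconstruction that is consistent with the surrounding definitions is sketched below; it is an assumption, not the original notation. In particular, the HEURISTIC1 count assumes that the M inputs of a transaction are linked in a path of M - 1 edges, as INVARIANT2 suggests, and it ignores transactions without inputs and repeated addresses, hence the approximation sign.

```latex
\begin{align}
  n_{\mathrm{nodes}} &= n_t \cdot E\!\left[\frac{n_a}{n_t}\right] \\
  n_{H1} &\approx n_t \cdot \left( E\!\left[\frac{n_{in}}{n_t}\right] - 1 \right) \\
  n_{H2} &\le n_t
\end{align}
```

All three quantities are a constant multiple of, or bounded by, the number of transactions, which is the linearity property stated in the text.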

Implementation details and experimental results

The identity reasoner has been implemented using neo4j, a well-known graph database. A graph database uses graph structures, such as nodes, edges and properties, to represent and store data, and is a powerful tool for graph-like queries, for example traversing the graph or computing the shortest path between two nodes.

The resulting database size is about 12 GB. The number of clusters identified by each heuristic in the current blockchain (~35.7M addresses) is reported in the following table.

Heuristic | Expected number of edges | Actual number of edges | Number of identified clusters | Maximum cluster size (addresses) | Average cluster size (addresses)
H1 | ~50M | ~50M | ~16M | ~1M | ~2.18
H2 | ~39M | ~13M | ~23M | ~3M | ~1.54
H1+H2 | ~89M | ~63M | ~8.5M | ~13M | ~

[Figure: clusters of 1 to 5 addresses for H1, H2 and H1+H2]

It is evident that heuristic 2, as implemented, is quite unsafe, since it ends up in a giant supercluster containing about 40% of the addresses; for this reason, it won't be taken into account in the discussion that follows. A refined implementation is described in [4] and should be implemented in the near future.

It's now time to add some prior knowledge to the identity reasoner, and to link addresses to their supposed controllers.

A first dataset is taken from the BitIodine software and is composed of about 70,000 addresses potentially belonging to the authors of CryptoLocker, a well-known ransomware that locks computers running MS Windows by encrypting important files with an RSA public key and then offers to decrypt the data if a payment in bitcoin is made. These addresses have been obtained by searching on Google for extracts of the text of the ransom request displayed by the malware, and by reading a Reddit thread in which victims and researchers post addresses 4.

When adding this dataset to the identity reasoner, we end up with a giant supercluster of about 13M addresses that contains addresses controlled by both MtGox and Cryptolocker. This result has two potential implications, which do not exclude each other. The first is that there is some false information in the dataset, i.e. some addresses have been announced as controlled by Cryptolocker but are actually controlled by, for example, Mt. Gox, and have nothing to do with the malware. The second is that there could be a connection between Cryptolocker and MtGox, which suggests that Cryptolocker was a Mt. Gox customer and that some coins stored in addresses controlled by Mt. Gox are owned by Cryptolocker. This scenario highlights the central role that exchanges play in the bitcoin ecosystem, since nowadays goods and services are mostly paid for with fiat currencies.

4 - ...virus_encrypts_instead_of/

A second dataset is obtained from another Reddit thread, started just after MtGox filed for bankruptcy 5, with the aim of finding confirmation of the story told by its CEO and of figuring out the financial situation of the currency exchange. Many of these addresses belong to the second biggest cluster (about 500k addresses), whose representative is 1LNWw6yCxkUmkhArb2Nf2MPw6vG7u5WG7q, and some of them belong to very small clusters of 1 to 4 addresses that are likely to belong to MtGox.

Third, we were able to identify a Bitstamp hot wallet thanks to the knowledge of one of its addresses, 18xgnWy7HmrPnUsD6NJCc29nu4QL21vaYD.

In the following picture we show, in linear scale, the size (in number of addresses) of the four biggest clusters and of the hot wallets of known entities. As shown in the diagram, we can state that the second cluster is a MtGox hot wallet, but the controllers of the other big clusters are unknown.

5 - ...ou_have_used_to/

If we plot the portion of the identity reasoner graph relative to the big clusters and report some clustering statistics, we are able to identify some false-positive heuristic 1 edges.

Lessons learned

Dealing with address clustering and identity reasoning is certainly one of the most fascinating challenges of bitcoin forensic analysis. We tried to describe a model, based on a graph data structure, that can evolve incrementally with the bitcoin network and with an increasing knowledge of address-controller associations. Unfortunately this model is very prone to corruption: if the raw information (e.g. collected on the web) turns out to be affected by errors, it quickly causes clusters (especially the biggest ones, belonging to influential actors of the network) to be tied together, hence distorting the results. A model for adding edges to the graph is necessary, and it should take into account the size of the clusters that are going to be merged and the amount and quality of the information collected.

Bitcoin service providers are neither strong nor decentralised (yet), so users have a strong interest in forensic analysis, as confirmed by the discussions about the Mt. Gox and

Cryptolocker cases. For this reason it would be valuable if users could play an active role in this activity, by reporting the information they own about the subjects they want to monitor, in a crowd-sourced fashion.

Chapter 7: The Query Engine

The bitcoin query engine is the component responsible for answering the user requests defined in the requirements chapter. It's designed to be extremely fast and scalable.

Balance queries

Let's recall some example queries about balances:
- What was the balance of 1dice at Mon, 26 May 14:07:44 UTC?
- What was the balance of the cluster containing 1dice at Mon, 26 May 14:07:44 UTC?

Given an address a and an instant of time t, the balance is a non-negative value and can be obtained from bitcoin transactions with the following formula:

[formula]

where T is a transaction with Nt outputs (Nt>0) and Mt inputs (Mt may be 0 for mining transactions), whose timestamp isn't greater than t. In other words, evaluating the balance of an address means summing over the boundary (the unspent outputs) of the transaction graph at time t: if an output is already spent at time t, the corresponding positive addend has to be cancelled by a negative one. Given this definition of address

balance, evaluating a cluster balance just requires summing the balances of all the addresses of that cluster.

To efficiently compute the balance of addresses and clusters, a data structure called balance element is defined. Balance elements are built starting from transactions: given a transaction T, timestamped at time t, with Mt input addresses in(i) and Nt output addresses out(i), for each input address and each output address a balance element is defined as follows:

field | description
tx id | (optional) identifies the transaction that generated the balance element. Can be either the hash of the transaction or a progressive identifier.
address | identifies the address of the input/output
cluster id | identifies the cluster to which the address belongs
timestamp | the time at which the transaction was timestamped into the blockchain
amount | amount of satoshis transferred from / to the specified address. The amount is positive for output addresses and negative for input addresses.

Using balance elements, the balance of an address can be evaluated by aggregating the amounts of all the relevant balance elements. Let's consider, for instance, the transaction identified by 58545bb4cdbd0272df60efa969e1f c507d c6bfd113a9712c2d 7, which has two inputs and two outputs and has been timestamped in the block with height and timestamp :41:

The selected transaction has two inputs (1Ai and 16U) and two outputs (1Et and 1Q9), hence it produces four balance elements, with negative variations for the balances of 1Ai and 16U and positive variations for the balances of 1Et and 1Q9. In particular, the following balance elements are inserted into the query engine database.

element | tx_id | address | cluster_id (invented) | timestamp | amount
BE | … | 1Ai | … | … | ,01
BE | … | 16U | … | … | ,01
BE | … | 1Et | … | … | ,008
BE | … | 1Q9 | … | … | ,012

Since each transaction produces a balance element for each input and a balance element for each output, the total number of balance elements can be estimated, starting from the number of transactions, using the following formula:

[formula]

Considering that the expected numbers of inputs and outputs of a transaction are constant in time, it's possible to assume that the number of balance elements is linear in the number of transactions.
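The construction of balance elements and the balance computation described above can be sketched as follows. This is a minimal in-memory illustration: the tuple layout follows the field table, while the function names, the `cluster_of` lookup and the amounts used in any example are assumptions, not values taken from the block chain.

```python
from collections import namedtuple

BalanceElement = namedtuple(
    "BalanceElement", "tx_id address cluster_id timestamp amount")


def balance_elements(tx_id, timestamp, inputs, outputs, cluster_of):
    """Build one element per input and per output.

    inputs and outputs are lists of (address, satoshis) pairs;
    cluster_of maps an address to its cluster id.
    """
    elements = []
    for address, value in inputs:    # spent outputs: negative amounts
        elements.append(BalanceElement(
            tx_id, address, cluster_of(address), timestamp, -value))
    for address, value in outputs:   # received outputs: positive amounts
        elements.append(BalanceElement(
            tx_id, address, cluster_of(address), timestamp, value))
    return elements


def address_balance(elements, address, t):
    """Balance of an address at time t: sum of its amounts up to t."""
    return sum(e.amount for e in elements
               if e.address == address and e.timestamp <= t)


def cluster_balance(elements, cluster_id, t):
    """Balance of a cluster at time t: sum over all its addresses."""
    return sum(e.amount for e in elements
               if e.cluster_id == cluster_id and e.timestamp <= t)
```

A transaction with two inputs and two outputs, like the one discussed above, yields four elements whose amounts sum to zero when no fee is paid.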

Flow queries

Flow queries aim to answer questions like the following:
- What was the flow between address1 and address2 between May 26, 2013 and May 01,?
- What was the flow between the cluster controlled by Alice and the cluster controlled by Bob between May 26, 2013 and May 01,?

The issue of flow queries deserves more attention. In fact, because of multi-input transactions, defining the flow between addresses is not trivial. For example, let's consider a transaction T with two inputs and two outputs. How does this transaction contribute to the flow between in(0) and out(0)? It may be 5, but also 1, or 0. In short, the flow between addresses is not well defined in the case of multi-input transactions. However, assuming that bitcoin users do not share their private keys (which makes the first heuristic deterministic), it's possible to give a definition of the flow between addresses that is consistent with the flow between clusters.

The flow between two addresses (a1 and a2) is the sum of the output values deposited to a2 in transactions having a1 among their inputs, each divided by the total number of input addresses of its transaction. Considering the previous transaction, the flow between in(0) and out(0) is equal to 2.5, just like the flow between in(1) and out(0). Since in(0) and in(1) are in the same cluster

(first heuristic), it's possible to obtain the flow between clusters by summing over the flows between addresses, obtaining a total flow of 5 between the cluster in(0)-in(1) and out(0). The following formula defines the flow between addresses and clusters.

[formula]

Another issue with flows is mining. It would be useful to link balances and flows with the following formula:

[formula]

This equation, though simple, does not hold in bitcoin unless we extend the flow definition to transactions with no inputs (mining transactions). For this reason, the set of addresses is extended with a special address, called the mine address. So, if you ask the virex query engine for the flow between the mine address and another address, you are asking for the amount of bitcoins mined by that address.

To efficiently compute flows between addresses and clusters, another simple data structure, called flow element, is defined. Flow elements are also built starting from transactions: given a transaction T, timestamped at time t, with Mt>0 input addresses (the mine address is considered as an input address if the transaction has no input addresses) and Nt output addresses out(0) ... out(Nt-1), for each pair (in(i), out(j)) a flow element is defined as follows:

field | description
tx id | (optional) identifies the transaction that generated the flow element
payer | identifies the payer address in(i), or the mine address if the transaction has no inputs
payer_cid | identifies the cluster of the payer address
payee | identifies the payee address out(j)
payee_cid | identifies the cluster of the payee address
timestamp | the time at which the transaction was timestamped into the blockchain
flow | amount of satoshis transferred from the payer to the payee address, evaluated using the definition of flow between addresses

Using flow elements, the flow between two addresses can be evaluated by aggregating the flows of all the relevant flow elements.

The total number of flow elements can be estimated starting from the number of transactions. For each transaction, a flow element is generated for each pair of input and output addresses, as described by the following formula:

[formula]

Just as for balance elements, it's possible to conclude that the number of flow elements is linear in the number of transactions.
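The definition of flow elements can be illustrated with a short sketch. The tuple layout follows the field table above; the `MINE_ADDRESS` label, the function names and the `cluster_of` lookup are hypothetical names chosen for this example.

```python
from collections import namedtuple

MINE_ADDRESS = "MINE"  # hypothetical label for the special mine address

FlowElement = namedtuple(
    "FlowElement", "tx_id payer payer_cid payee payee_cid timestamp flow")


def flow_elements(tx_id, timestamp, inputs, outputs, cluster_of):
    """One flow element per (input, output) pair.

    inputs is a list of addresses, outputs a list of (address, value).
    Each output value is split evenly among the input addresses.
    """
    payers = list(inputs) or [MINE_ADDRESS]  # mining transaction: no inputs
    return [
        FlowElement(tx_id, payer, cluster_of(payer), payee,
                    cluster_of(payee), timestamp, value / len(payers))
        for payer in payers
        for payee, value in outputs
    ]


def address_flow(elements, payer, payee, t0, t1):
    """Flow from one address to another in the interval [t0, t1]."""
    return sum(e.flow for e in elements
               if e.payer == payer and e.payee == payee
               and t0 <= e.timestamp <= t1)


def cluster_flow(elements, payer_cid, payee_cid, t0, t1):
    """Flow between two clusters: sum of the flows between their addresses."""
    return sum(e.flow for e in elements
               if e.payer_cid == payer_cid and e.payee_cid == payee_cid
               and t0 <= e.timestamp <= t1)
```

For a transaction with inputs in(0), in(1) and an output of 5 to out(0), the sketch gives a flow of 2.5 from each input and a flow of 5 from the cluster of the inputs, matching the example in the text.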

Implementation details

The Virex query engine has been implemented using SQLite, an open source software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. SQLite has been chosen for its simplicity and speed, but there may well be faster alternatives, in particular NoSQL databases for realtime analytics.

Balance elements and flow elements are stored in two different tables; the first has the following columns:

Column name | Datatype | Size
tx_id | INTEGER | 1 to 8 bytes, depending on the magnitude
address_id | INTEGER | 1 to 8 bytes
cluster_id | INTEGER | 1 to 8 bytes
timestamp | INTEGER | 1 to 8 bytes
amount | INTEGER | 1 to 8 bytes
row_id (hidden) | INTEGER | 1 to 8 bytes

Moreover, in order to speed up selection operations on address, cluster and timestamp, and aggregation operations on amounts, two covering indices are defined.

CREATE INDEX x_balance_elements_covering ON balance_elements (address_id,timestamp,amount);
CREATE INDEX x_balance_clusters_covering ON balance_elements (cluster_id,timestamp,amount);

In SQLite, indices are implemented using a B-Tree, so the index size is proportional to the number of indexed elements. Note that each index element has, in addition to the indexed fields, a hidden row id of type INTEGER.
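The following sketch shows how a balance query could be run against the table and indices above, using Python's standard sqlite3 module. The CREATE INDEX statements are the ones given in the text; the CREATE TABLE statement, the sample rows and the query functions are assumptions made for this example, since the original queries are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balance_elements (
    tx_id INTEGER, address_id INTEGER, cluster_id INTEGER,
    timestamp INTEGER, amount INTEGER
);
CREATE INDEX x_balance_elements_covering
    ON balance_elements (address_id, timestamp, amount);
CREATE INDEX x_balance_clusters_covering
    ON balance_elements (cluster_id, timestamp, amount);
""")

# Invented sample data: (tx_id, address_id, cluster_id, timestamp, amount).
rows = [
    (1, 10, 100, 1000, 5000),
    (2, 11, 100, 1500, 3000),
    (3, 10, 100, 2000, -5000),
    (3, 12, 200, 2000, 5000),
]
conn.executemany("INSERT INTO balance_elements VALUES (?, ?, ?, ?, ?)", rows)


def address_balance(conn, address_id, t):
    """Balance of an address at time t (0 if the address is unknown)."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM balance_elements "
        "WHERE address_id = ? AND timestamp <= ?",
        (address_id, t)).fetchone()[0]


def cluster_balance(conn, cluster_id, t):
    """Balance of a cluster at time t."""
    return conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM balance_elements "
        "WHERE cluster_id = ? AND timestamp <= ?",
        (cluster_id, t)).fetchone()[0]
```

Both queries touch only columns contained in one of the two indices (equality on the first column, range on the timestamp, aggregation on the amount), which is what allows the engine to answer them from the index alone.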

In the other table, flow elements, the following columns and indices have been defined:

Column name | Datatype | Size
tx_id | INTEGER | 1 to 8 bytes
payer | INTEGER | 1 to 8 bytes
payer_cid | INTEGER | 1 to 8 bytes
payee | INTEGER | 1 to 8 bytes
payee_cid | INTEGER | 1 to 8 bytes
timestamp | INTEGER | 1 to 8 bytes
flow | REAL | 8 bytes
row_id (hidden) | INTEGER | 1 to 8 bytes

CREATE INDEX x_flow_elements_covering ON flow_elements(payer,payee,timestamp,flow);
CREATE INDEX x_flow_clusters_covering ON flow_elements (payer_cid,payee_cid,timestamp,flow);

The following tables report the sizes of the balance elements and flow elements tables and of their respective indexes. As highlighted, there is a mismatch between the estimated and the actual number of flow elements.

Table | Maximum row size per element | Expected number of elements | Actual number of elements | Maximum expected size | Actual size
Balance Elements | 48 bytes | ~188M | ~188M | ~8.4 GB | ~5.5 GB
Flow Elements | 64 bytes | ~230M | ~283M | ~16.8 GB | ~11 GB

Table | Maximum row size per element | Index size per element | Actual number of elements | Maximum expected size | Actual size
Balance Elements | 48 bytes | … bytes | ~188M | ~17.6 GB | ~14.7 GB
Flow Elements | 64 bytes | … bytes | ~283M | ~39 GB | ~27 GB
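The "maximum expected size" of the two tables can be checked with simple arithmetic. The sketch below assumes that every column takes its maximum of 8 bytes, that the sizes are expressed in binary gigabytes (2^30 bytes), and that the actual number of elements is used; these assumptions are inferred from the numbers in the first table and are not stated in the text.

```python
GIB = 2 ** 30  # bytes in a binary gigabyte


def max_table_size_gib(max_row_bytes, n_rows):
    """Upper bound on the table size, ignoring B-Tree and page overhead."""
    return max_row_bytes * n_rows / GIB


# Balance elements: 6 columns (including the hidden row id) of up to 8 bytes.
balance_gib = max_table_size_gib(6 * 8, 188_000_000)

# Flow elements: 8 columns (including the hidden row id) of up to 8 bytes.
flow_gib = max_table_size_gib(8 * 8, 283_000_000)

print(f"balance elements: {balance_gib:.2f} GiB")
print(f"flow elements:    {flow_gib:.2f} GiB")
```

The actual sizes are smaller than these bounds because SQLite stores small integers in fewer than 8 bytes.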

Chapter 8: The graphical user interface

The Virex graphical user interface is a web-based, single-page application that shows a stacked balance-time graph of different bitcoin entities, with arrows representing the flows among them.

The interaction starts with the input search box (1) in the navigation bar at the top of the page, which allows the user to find an existing address or a controller they know. For example, it's possible to search for an address (e.g. 1PeppeMEUXx6XgjubBnEQKtay2xpefnCZT) or for the name of a controller (e.g. Cryptolocker). By clicking on a string that appears in the type-ahead, we add the selected entity to a list of interesting entities (2), and its balance is plotted (3) on a graph with a logarithmic scale.

It's possible to remove an entity from the diagram by clicking on the corresponding X in the list of entities (2). It's important to note that, as confirmed by the checkbox on the navigation bar (4), virex is displaying a clustered diagram, i.e. the selected entity is considered as a representative of its cluster, and the sum of the balances of all the addresses in the same cluster is shown (according to the definition of balance for clusters given in the chapter The Query Engine). It's possible to show the balance of the selected entity alone by turning off the checkbox (4).

We can now switch to the Profile page (5) to get some information about the selected entities (cluster id, cluster size, current balance and alleged controller). It's also possible to modify the current controller by clicking on the edit button (6) and then confirming with the checkmark (7).

When the controller of an address is changed, the identity reasoner is involved to check that all invariants (SAME_CONTROLLER above all) are verified. In particular, if a controller is set for an address and another address with the same controller exists in the identity reasoner database, then the clusters of these addresses are merged into one.

Now let's add another entity to the list of interesting entities, by searching for it with the text box on the navigation bar. In addition to balances, you can now see flows among the interesting entities (8). Bitcoin flows are represented with arrows, and the amount of bitcoin transferred from one entity to another in the interval of time delimited by the white vertical lines is shown next to the arrow.

The entities displayed on the graph can be reordered by using the left column and dragging the selected entity to the desired position. Moreover, it's possible to interact with the diagram by clicking and dragging to select a period of time to zoom in on (10).

Alternatively, you can set up the x-axis by showing the collapsed time setup window (11). In the panel that appears, you can select the starting and ending dates (13) and the number of ticks (12). In the above screenshot we have reduced the number of ticks to 5 and hidden the flows using the checkbox on the navigation bar (14).

In the picture above, the number of ticks has been increased and a more detailed balance diagram is shown. Unfortunately it's impossible to read the dates, because they overlap each other.

Where are mined bitcoins? It's possible to search for the mine address (15), and the amount of mined bitcoins appears in the diagram.

Implementation details

The Virex graphical user interface has been implemented using html5 technologies (html, css, javascript), with the aid of several libraries.

The web application is orchestrated entirely by the angular-js library. This open source framework adds a model-view-controller abstraction on top of DOM manipulation and excels at building dynamic views. User interface components, such as the navigation bar, the type-ahead and the date-time pickers, are bootstrap components rewritten natively in angularjs by the angular-ui project team. The draggable list of interesting addresses has been implemented using the angularjs implementation of the jquery-ui draggable component.

The balance diagram is implemented using the SVG stacked diagram component of the Data-Driven-Documents library, also known as d3. D3 is a powerful library that allows you to bind arbitrary data to the DOM and then apply data-driven transformations to the document. Flow arrows are written from scratch using the D3 library.

Chapter 9: Experimental results

Let's start by studying the four biggest clusters and the flows of money among them. These clusters, presented in the chapter The Identity Reasoner, each contain a number of addresses in the order of hundreds of thousands and are likely to be controlled by automated services, since it is implausible that so many addresses were manually generated by a single person. According to our data set, the second one is a Mt.Gox hot wallet. As suggested by the diagram, all big clusters are early adopters, and Mt.Gox has been active since 2010 Oct. If we zoom in on the period of time that goes from 2013 March to May and increase the number of ticks for the diagram to 20, we end up with a figure that shows a significant drop in the Mt.Gox, cluster3 and cluster2 balances, while cluster1 has an approximately constant balance (remember that the scale is logarithmic).

If we display flows among these entities in the period of time that goes from 2009 Jan to May (full bitcoin history), we see a significant amount of flows between Mt.Gox and clusters 3 and 4, while cluster 1 remains quite isolated.

These results might suggest that clusters 3 and 4 are controlled by Mt.Gox too, but this is just a hypothesis. They all show the same descending trend (potentially due to transaction malleability attacks in the period from 2013 March to March) and are highly coupled by bitcoin flows (transfers of funds among hot wallets). While heuristics link together addresses that belong to the same wallet (and hence have the same controller), this line of reasoning, which uses flows and balances, could, with the aid of massive data collection, make it possible to cluster together distinct wallets. We will now demonstrate that cluster1 and cluster4 are used to launder bitcoins stolen in the flexcoin theft case. Thanks to information provided by flexcoin, we were able to identify two of its hot wallets, whose representatives are addresses 1GEhfbj and 1DSD3B3. These addresses are coloured in purple and sky blue, respectively.

Moreover, starting from the accused addresses 1NDkeva and 1QFcC5J, declared by flexcoin as belonging to the thief, we identified a cluster of seven addresses of the same wallet, coloured in light green. We focused on the period of time that goes from 2013 March 02 to 2013 March 03 and visualised the theft flows: as described by flexcoin, the thief deposited bitcoin to one of the flexcoin addresses (from 21:32 to 00:46 CEST), then transferred 864 bitcoin to his wallet (00:46 to 04:01 CEST), and soon after emptied it. We then used the query engine, directly and without the graphical user interface, to identify the clusters that received these bitcoins. A part of them (408) flows to cluster4 (green, Mt.Gox?), which is also a common friend between flexcoin and its thief, in the sense that there are many flows from 1DSD3B3 to cluster4, including flows that predate the theft. If further investigations should confirm that cluster4 actually belongs to Mt.Gox, this would be just another chance to highlight the current centrality of bitcoin exchanges in forensic analysis.

Another part of the stolen bitcoins (185) flows to cluster1 (violet), whose controller is currently unknown. Flexcoin has no flow relationships with cluster1. Moreover, 175 bitcoins flow from the thief to a single, unclustered address 12Cxy5 and are then spent, but we didn't follow this track. As depicted by the Virex diagrams, these transactions happen very quickly: the money is stolen and laundered in less than half a day.
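The step of identifying which clusters received the outgoing funds can be sketched as a grouping of transaction outputs by the cluster of the receiving address. This is only an illustration of the idea, not the query engine's actual interface: the function `received_by_cluster`, the address-to-cluster mapping and every address and amount below are placeholders, not values from the thesis's data.

```python
from collections import defaultdict


def received_by_cluster(outputs, cluster_of):
    """Sum output amounts per receiving cluster.

    An address with no known cluster is reported under its own name,
    much as a single unclustered address is reported on its own.
    """
    totals = defaultdict(int)
    for address, amount in outputs:
        totals[cluster_of.get(address, address)] += amount
    return dict(totals)


# Placeholder addresses and amounts, for illustration only.
cluster_of = {"addr1": "clusterX", "addr2": "clusterX", "addr3": "clusterY"}
outputs = [("addr1", 40), ("addr2", 10), ("addr3", 25), ("addr4", 5)]
totals = received_by_cluster(outputs, cluster_of)
for name, amount in sorted(totals.items()):
    print(name, amount)
```

The quality of such a grouping depends entirely on the address-to-cluster mapping, which in this work would come from the identity reasoner.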

Future directions

Virex4bitcoin enables many kinds of investigation on the blockchain. Thanks to its graphical user interface, it is possible to easily get an idea of what is going on in the bitcoin network. Further developments are listed below:

Implementation of the real-time tracker. The architecture should be scalable and ready for real-time tracking of confirmed transactions.

Formal definition of other flow-based queries and their implementation. During a forensic exploration of the bitcoin transaction graph, questions like "Where does this amount of money go?" or "Which entity did this address interact with in this period of time?" usually arise. These questions lack a formal definition and an implementation. Moreover, simple GUI interactions should be considered, to allow for a relaxing and intriguing forensic session that includes this kind of query.

Flow-based approach to address clustering. Do flows include significant patterns that enable further heuristics for clustering wallets controlled by the same entity?

Statistical approach to address tagging. The dataset collected on the web has proven to be affected by errors. A statistical approach, with the aid of machine learning techniques, could reduce the number of incorrect tags.

References

(1) Satoshi Nakamoto - Bitcoin: A Peer-to-Peer Electronic Cash System
(2) European Central Bank - Virtual Currency Schemes
(3) Michele Spagnuolo, Federico Maggi and Stefano Zanero - BitIodine: Extracting Intelligence from the Bitcoin Network
(4) Meiklejohn, Pomarole, Jordan, Levchenko, McCoy, Voelker, Savage - A Fistful of Bitcoins: Characterizing Payments Among Men with No Names
(5) IEEE Spectrum, Various Authors - The Cryptoanarchists' Answer to Cash
(6) Dorit Ron, Adi Shamir - Quantitative Analysis of the Full Bitcoin Transaction Graph
(7) A. Back - Hashcash - a denial of service counter-measure
(8) Jörg Becker, Dominic Breuker, Tobias Heide, Justus Holler, Hans Peter Rauer and Rainer Böhme - Can We Afford Integrity by Proof-of-Work? Scenarios Inspired by the Bitcoin Currency
(9) Fergal Reid, Martin Harrigan - An Analysis of Anonymity in the Bitcoin System
(10) Elli Androulaki, Ghassan O. Karame, Marc Roeschlin, Tobias Scherer and Srdjan Capkun - Evaluating User Privacy in Bitcoin
(11) Christian Decker, Roger Wattenhofer - Bitcoin Transaction Malleability and MtGox

BitIodine: extracting intelligence from the Bitcoin network

BitIodine: extracting intelligence from the Bitcoin network BitIodine: extracting intelligence from the Bitcoin network Michele Spagnuolo http://miki.it [email protected] @mikispag Bitcoin BitIodine About Bitcoin Decentralized, global digital currency A global

More information

An Analysis of the Bitcoin Electronic Cash System

An Analysis of the Bitcoin Electronic Cash System An Analysis of the Bitcoin Electronic Cash System Danielle Drainville University of Waterloo December 21, 2012 1 Abstract In a world that relies heavily on technology, privacy is sought by many. Privacy,

More information

Distributed Public Key Infrastructure via the Blockchain. Sean Pearl [email protected] April 28, 2015

Distributed Public Key Infrastructure via the Blockchain. Sean Pearl smp1697@cs.rit.edu April 28, 2015 Distributed Public Key Infrastructure via the Blockchain Sean Pearl [email protected] April 28, 2015 Overview Motivation: Electronic Money Example TTP: PayPal Bitcoin (BTC) Background Structure Other

More information

Bitcoin: A Peer-to-Peer Electronic Cash System

Bitcoin: A Peer-to-Peer Electronic Cash System Bitcoin: A Peer-to-Peer Electronic Cash System Satoshi Nakamoto [email protected] www.bitcoin.org Abstract. A purely peer-to-peer version of electronic cash would allow online payments to be sent directly

More information

Bitcoin: Concepts, Practice, and Research Directions

Bitcoin: Concepts, Practice, and Research Directions Bitcoin: Concepts, Practice, and Research Directions Ittay Eyal, Emin Gün Sirer Computer Science, Cornell University DISC Bitcoin Tutorial, October 2014 Barter Gold Fiat 2 Barter Gold Fiat Bitcoin 2008:

More information

2. Elections We define an electronic vote as a chain of digital signatures. Each owner transfers the vote to the candidate or legislation by digitally

2. Elections We define an electronic vote as a chain of digital signatures. Each owner transfers the vote to the candidate or legislation by digitally Abstract A purely peer to peer version of electronic vote would allow online votes to be sent directly from one party to another without going through a central voting register. Digital signatures provide

More information

Cryptography: Authentication, Blind Signatures, and Digital Cash

Cryptography: Authentication, Blind Signatures, and Digital Cash Cryptography: Authentication, Blind Signatures, and Digital Cash Rebecca Bellovin 1 Introduction One of the most exciting ideas in cryptography in the past few decades, with the widest array of applications,

More information

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390

The Role and uses of Peer-to-Peer in file-sharing. Computer Communication & Distributed Systems EDA 390 The Role and uses of Peer-to-Peer in file-sharing Computer Communication & Distributed Systems EDA 390 Jenny Bengtsson Prarthanaa Khokar [email protected] [email protected] Gothenburg, May

More information

BACK OFFICE MANUAL. Version 1.2 - Benjamin Bommhardt DRAGLET GMBH Bergsonstraße 154 81245 München - Germany

BACK OFFICE MANUAL. Version 1.2 - Benjamin Bommhardt DRAGLET GMBH Bergsonstraße 154 81245 München - Germany BACK OFFICE MANUAL Version 1.2 - Benjamin Bommhardt DRAGLET GMBH Bergsonstraße 154 81245 München - Germany Contents Introduction... 3 Overview of cxadmin... 4 Dashboard... 4 Customer overview... 5 Markets...

More information

Quantitative Analysis of the Full Bitcoin Transaction Graph

Quantitative Analysis of the Full Bitcoin Transaction Graph Quantitative Analysis of the Full Bitcoin Transaction Graph Dorit Ron and Adi Shamir Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Israel {dorit.ron,adi.shamir}@weizmann.ac.il

More information

Princeton University Computer Science COS 432: Information Security (Fall 2013)

Princeton University Computer Science COS 432: Information Security (Fall 2013) Princeton University Computer Science COS 432: Information Security (Fall 2013) This test has 13 questions worth a total of 50 points. That s a lot of questions. Work through the ones you re comfortable

More information

HACKER INTELLIGENCE INITIATIVE. The Secret Behind CryptoWall s Success

HACKER INTELLIGENCE INITIATIVE. The Secret Behind CryptoWall s Success HACKER INTELLIGENCE INITIATIVE The Secret Behind 1 1. Introduction The Imperva Application Defense Center (ADC) is a premier research organization for security analysis, vulnerability discovery, and compliance

More information

Orwell. From Bitcoin to secure Domain Name System

Orwell. From Bitcoin to secure Domain Name System Orwell. From Bitcoin to secure Domain Name System Michał Jabczyński, Michał Szychowiak Poznań University of Technology Piotrowo 2, 60-965 Poznań, Poland {Michal.Jabczynski, Michal.Szychowiak}@put.poznan.pl

More information

Bitmessage: A Peer to Peer Message Authentication and Delivery System

Bitmessage: A Peer to Peer Message Authentication and Delivery System Bitmessage: A Peer to Peer Message Authentication and Delivery System Jonathan Warren [email protected] www.bitmessage.org November 27, 2012 Abstract. We propose a system that allows users to securely

More information

Electronic Cash Payment Protocols and Systems

Electronic Cash Payment Protocols and Systems Electronic Cash Payment Protocols and Systems Speaker: Jerry Gao Ph.D. San Jose State University email: [email protected] URL: http://www.engr.sjsu.edu/gaojerry May, 2000 Presentation Outline - Overview

More information

Virtual Currencies and their Relevance to Digital Forensics PRESTON MILLER

Virtual Currencies and their Relevance to Digital Forensics PRESTON MILLER Virtual Currencies and their Relevance to Digital Forensics PRESTON MILLER 1 Presentation Overview Virtual Currency Cryptocurrency Bitcoin Basics: Obtaining, Usage, and History Digital Forensics Relevance

More information

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE CPSC 467a: Cryptography and Computer Security Notes 1 (rev. 1) Professor M. J. Fischer September 3, 2008 1 Course Overview Lecture Notes 1 This course is

More information

msigna Getting Started

msigna Getting Started msigna Getting Started Thank you for deciding to try msigna, the most powerful secure cryptocoin storage solution available. We think you will enjoy using msigna as it is, but it is still a product under

More information

Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance

Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance Chapter 3: Data Mining Driven Learning Apprentice System for Medical Billing Compliance 3.1 Introduction This research has been conducted at back office of a medical billing company situated in a custom

More information

Security visualisation

Security visualisation Security visualisation This thesis provides a guideline of how to generate a visual representation of a given dataset and use visualisation in the evaluation of known security vulnerabilities by Marco

More information

Service Discovery with the Google Android Mobile Platform

Service Discovery with the Google Android Mobile Platform tesi di laurea Service Discovery with the Google Android Mobile Platform Anno Accademico 2007/2008 relatore Ch.mo prof. Stefano Russo correlatore Ing. Marcello Cinque candidato Marco Faiella Matr. 885/139

More information

Blockchain, Throughput, and Big Data Trent McConaghy

Blockchain, Throughput, and Big Data Trent McConaghy Blockchain, Throughput, and Big Data Trent McConaghy Bitcoin Startups Berlin Oct 28, 2014 Conclusion Outline Throughput numbers Big data Consensus algorithms ACID Blockchain Big data? Throughput numbers

More information

A Time Efficient Algorithm for Web Log Analysis

A Time Efficient Algorithm for Web Log Analysis A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,

More information

CoinAMI Coin-Application Mediator Interface

CoinAMI Coin-Application Mediator Interface Bilkent University Department of Computer Engineering CoinAMI Coin-Application Mediator Interface Supervisor Can Alkan Members Ahmet Kerim Şenol Alper Gündoğdu Halil İbrahim Özercan Muhammed Yusuf Özkaya

More information

Tableau Server Scalability Explained

Tableau Server Scalability Explained Tableau Server Scalability Explained Author: Neelesh Kamkolkar Tableau Software July 2013 p2 Executive Summary In March 2013, we ran scalability tests to understand the scalability of Tableau 8.0. We wanted

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Adversary Modelling 1

Adversary Modelling 1 Adversary Modelling 1 Evaluating the Feasibility of a Symbolic Adversary Model on Smart Transport Ticketing Systems Authors Arthur Sheung Chi Chan, MSc (Royal Holloway, 2014) Keith Mayes, ISG, Royal Holloway

More information

BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project

BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project BGP Prefix Hijack: An Empirical Investigation of a Theoretical Effect Masters Project Advisor: Sharon Goldberg Adam Udi 1 Introduction Interdomain routing, the primary method of communication on the internet,

More information

A Fistful of Bitcoins: Characterizing Payments Among Men with No Names

A Fistful of Bitcoins: Characterizing Payments Among Men with No Names A Fistful of Bitcoins: Characterizing Payments Among Men with No Names Sarah Meiklejohn Marjori Pomarole Grant Jordan Kirill Levchenko Damon McCoy Geoffrey M. Voelker Stefan Savage University of California,

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Bitcoin Thief Tutorial

Bitcoin Thief Tutorial The complete Bitcoin Thief Tutorial SESSION ID: HTA-R02 Uri Rivner Head of Cyber Strategy BioCatch Etay Maor PMM Cyber Trusteer, an IBM Company The first few things you should know about Bitcoin Most people

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks [email protected] ([email protected]) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Using the Bitcoin Blockchain for secure, independently verifiable, electronic votes. Pierre Noizat - July 2014

Using the Bitcoin Blockchain for secure, independently verifiable, electronic votes. Pierre Noizat - July 2014 Using the Bitcoin Blockchain for secure, independently verifiable, electronic votes. Pierre Noizat - July 2014 The problem with proprietary voting systems Existing electronic voting systems all suffer

More information

Bitcoin: Regulations and Legal Risks for a New Virtual Currency

Bitcoin: Regulations and Legal Risks for a New Virtual Currency Bitcoin: Regulations and Legal Risks for a New Virtual Currency Presented by: John Casey and Adam Holbrook Copyright 2014 by K&L Gates LLP. All rights reserved. GOALS Learn to speak the Bitcoin language:

More information

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction Chapter-1 : Introduction 1 CHAPTER - 1 Introduction This thesis presents design of a new Model of the Meta-Search Engine for getting optimized search results. The focus is on new dimension of internet

More information

Public Key Infrastructure (PKI)

Public Key Infrastructure (PKI) Public Key Infrastructure (PKI) In this video you will learn the quite a bit about Public Key Infrastructure and how it is used to authenticate clients and servers. The purpose of Public Key Infrastructure

More information

An Analysis of Anonymity in Bitcoin Using P2P Network Traffic

An Analysis of Anonymity in Bitcoin Using P2P Network Traffic An Analysis of Anonymity in Bitcoin Using P2P Network Traffic Philip Koshy, Diana Koshy, and Patrick McDaniel Pennsylvania State University, University Park, PA 16802, USA Abstract. Over the last 4 years,

More information

Java Bit Torrent Client

Java Bit Torrent Client Java Bit Torrent Client Hemapani Perera, Eran Chinthaka {hperera, echintha}@cs.indiana.edu Computer Science Department Indiana University Introduction World-wide-web, WWW, is designed to access and download

More information

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

The Mathematics of the RSA Public-Key Cryptosystem

The Mathematics of the RSA Public-Key Cryptosystem The Mathematics of the RSA Public-Key Cryptosystem Burt Kaliski RSA Laboratories ABOUT THE AUTHOR: Dr Burt Kaliski is a computer scientist whose involvement with the security industry has been through

More information

Authentication Types. Password-based Authentication. Off-Line Password Guessing

Authentication Types. Password-based Authentication. Off-Line Password Guessing Authentication Types Chapter 2: Security Techniques Background Secret Key Cryptography Public Key Cryptography Hash Functions Authentication Chapter 3: Security on Network and Transport Layer Chapter 4:

More information

Counter Expertise Review on the TNO Security Analysis of the Dutch OV-Chipkaart. OV-Chipkaart Security Issues Tutorial for Non-Expert Readers

Counter Expertise Review on the TNO Security Analysis of the Dutch OV-Chipkaart. OV-Chipkaart Security Issues Tutorial for Non-Expert Readers Counter Expertise Review on the TNO Security Analysis of the Dutch OV-Chipkaart OV-Chipkaart Security Issues Tutorial for Non-Expert Readers The current debate concerning the OV-Chipkaart security was

More information

CSCE 465 Computer & Network Security

CSCE 465 Computer & Network Security CSCE 465 Computer & Network Security Instructor: Dr. Guofei Gu http://courses.cse.tamu.edu/guofei/csce465/ Public Key Cryptogrophy 1 Roadmap Introduction RSA Diffie-Hellman Key Exchange Public key and

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

Probability and Expected Value

Probability and Expected Value Probability and Expected Value This handout provides an introduction to probability and expected value. Some of you may already be familiar with some of these topics. Probability and expected value are

More information

StartPeeps.com The Start Of A New Social Era

StartPeeps.com The Start Of A New Social Era StartPeeps.com The Start Of A New Social Era Introduction Something is wrong with the internet today. Companies like Facebook, Google, Ebay, Paypal and numerous others make billions of dollars every month.

More information

Security in Electronic Payment Systems

Security in Electronic Payment Systems Security in Electronic Payment Systems Jan L. Camenisch, Jean-Marc Piveteau, Markus A. Stadler Institute for Theoretical Computer Science, ETH Zurich, CH-8092 Zurich e-mail: {camenisch, stadler}@inf.ethz.ch

More information

How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.

How to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc. How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background

More information

INTRUSION PREVENTION AND EXPERT SYSTEMS

INTRUSION PREVENTION AND EXPERT SYSTEMS INTRUSION PREVENTION AND EXPERT SYSTEMS By Avi Chesla [email protected] Introduction Over the past few years, the market has developed new expectations from the security industry, especially from the intrusion

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Lab 11. Simulations. The Concept

Lab 11. Simulations. The Concept Lab 11 Simulations In this lab you ll learn how to create simulations to provide approximate answers to probability questions. We ll make use of a particular kind of structure, called a box model, that

More information

Persistent Binary Search Trees

Persistent Binary Search Trees Persistent Binary Search Trees Datastructures, UvA. May 30, 2008 0440949, Andreas van Cranenburgh Abstract A persistent binary tree allows access to all previous versions of the tree. This paper presents

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall. 5401 Butler Street, Suite 200 Pittsburgh, PA 15201 +1 (412) 408 3167 www.metronomelabs.com

More information

Mobile Wallet Platform. Next generation mobile wallet solution

Mobile Wallet Platform. Next generation mobile wallet solution Mobile Wallet Platform Next generation mobile wallet solution Introduction to mwallet / Mobile Wallet Mobile Wallet Account is just like a Bank Account User s money lies with the Mobile Wallet Operator

More information

G Data Mobile MalwareReport. Half-Year Report July December 2013. G Data SecurityLabs

G Data Mobile MalwareReport. Half-Year Report July December 2013. G Data SecurityLabs G Data Mobile MalwareReport Half-Year Report July December 2013 G Data SecurityLabs Contents At a glance... 2 Android malware: share of PUPs increasing significantly... 3 Android.Application consists of

More information

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad

Test Run Analysis Interpretation (AI) Made Easy with OpenLoad Test Run Analysis Interpretation (AI) Made Easy with OpenLoad OpenDemand Systems, Inc. Abstract / Executive Summary As Web applications and services become more complex, it becomes increasingly difficult

More information

Image Search by MapReduce

Image Search by MapReduce Image Search by MapReduce COEN 241 Cloud Computing Term Project Final Report Team #5 Submitted by: Lu Yu Zhe Xu Chengcheng Huang Submitted to: Prof. Ming Hwa Wang 09/01/2015 Preface Currently, there s

More information

2.4: Authentication Authentication types Authentication schemes: RSA, Lamport s Hash Mutual Authentication Session Keys Trusted Intermediaries

2.4: Authentication Authentication types Authentication schemes: RSA, Lamport s Hash Mutual Authentication Session Keys Trusted Intermediaries Chapter 2: Security Techniques Background Secret Key Cryptography Public Key Cryptography Hash Functions Authentication Chapter 3: Security on Network and Transport Layer Chapter 4: Security on the Application

More information

The Feasibility and Application of using a Zero-knowledge Protocol Authentication Systems

The Feasibility and Application of using a Zero-knowledge Protocol Authentication Systems The Feasibility and Application of using a Zero-knowledge Protocol Authentication Systems Becky Cutler [email protected] Mentor: Professor Chris Gregg Abstract Modern day authentication systems

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Associate Prof. Dr. Victor Onomza Waziri

Associate Prof. Dr. Victor Onomza Waziri BIG DATA ANALYTICS AND DATA SECURITY IN THE CLOUD VIA FULLY HOMOMORPHIC ENCRYPTION Associate Prof. Dr. Victor Onomza Waziri Department of Cyber Security Science, School of ICT, Federal University of Technology,

More information

NEW DIGITAL SIGNATURE PROTOCOL BASED ON ELLIPTIC CURVES

NEW DIGITAL SIGNATURE PROTOCOL BASED ON ELLIPTIC CURVES NEW DIGITAL SIGNATURE PROTOCOL BASED ON ELLIPTIC CURVES Ounasser Abid 1, Jaouad Ettanfouhi 2 and Omar Khadir 3 1,2,3 Laboratory of Mathematics, Cryptography and Mechanics, Department of Mathematics, Fstm,

More information

At a recent industry conference, global

At a recent industry conference, global Harnessing Big Data to Improve Customer Service By Marty Tibbitts The goal is to apply analytics methods that move beyond customer satisfaction to nurturing customer loyalty by more deeply understanding

More information

Database Security. The Need for Database Security

Database Security. The Need for Database Security Database Security Public domain NASA image L-1957-00989 of people working with an IBM type 704 electronic data processing machine. 1 The Need for Database Security Because databases play such an important

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

Quantifind s story: Building custom interactive data analytics infrastructure

Quantifind s story: Building custom interactive data analytics infrastructure Quantifind s story: Building custom interactive data analytics infrastructure Ryan LeCompte @ryanlecompte Scala Days 2015 Background Software Engineer at Quantifind @ryanlecompte [email protected] http://github.com/ryanlecompte

More information

Using In-Memory Computing to Simplify Big Data Analytics

Using In-Memory Computing to Simplify Big Data Analytics SCALEOUT SOFTWARE Using In-Memory Computing to Simplify Big Data Analytics by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T he big data revolution is upon us, fed

More information

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries

Bachelor of Games and Virtual Worlds (Programming) Subject and Course Summaries First Semester Development 1A On completion of this subject students will be able to apply basic programming and problem solving skills in a 3 rd generation object-oriented programming language (such as

More information

SSL. Secure Sockets Layer. - a short summary - By Christoph Gutmann and Khôi Tran

SSL. Secure Sockets Layer. - a short summary - By Christoph Gutmann and Khôi Tran SSL Secure Sockets Layer - a short summary - By Christoph Gutmann and Khôi Tran Page 1 / 7 Table of contents 1. Brief historic outline of SSL 2. Why did SSL come to life? 3. How does SSL work? 4. Where

More information

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment

More information

How To Protect Your Data From Being Hacked On Security Cloud

How To Protect Your Data From Being Hacked On Security Cloud F-SECURE SECURITY CLOUD Purpose, function and benefits October 2015 CONTENTS F-Secure Security Cloud in brief 2 Security Cloud benefits 3 How does Security Cloud work? 4 Security Cloud metrics 4 Security

More information
