BitIodine: extracting intelligence from the Bitcoin network Michele Spagnuolo http://miki.it michele@spagnuolo.me @mikispag
Bitcoin BitIodine
About Bitcoin Decentralized, global digital currency A global peer-to-peer network, across which transactions are broadcast Can support strong anonymity - implementations not strongly anonymous Recent clarification to US regulations - to be considered as foreign currency and not legal tender
Open money Open money trust-no-one currency/commodity that isn t subject to manipulation by central banks or corporations designed for security, founded on distributed cryptography crypto-currency Quanta sit vis in numeris... T. Struve, De Numeris, 1421
History Digital currency based in cryptography - 1993, 1998. Bitcoin whitepaper - 2008, by "Satoshi Nakamoto" (pseudonym) First client in January 2009 Open Source No single point of failure
Who uses Bitcoin? A lot of legitimate uses of Bitcoin http://weusecoins.org
Why use Bitcoin? Speed and price No central authority No setup Better privacy No counterfeit, no chargebacks No account freezing Algorithmically known inflation
Trading goods for Bitcoin
Also accepting Bitcoin...
Who also uses Bitcoin? Top 20 categories $1.2M rev/month 220 categories, ranging from digital goods to various kinds of narcotics or prescription medicines
Some stats about Bitcoin As of June 1: 1 BTC is traded for 128.4 USD (max: 266 USD on April 11, min: 50 USD on April 16) $1.5B market capitalization 30,000 BTC / 3.8M$ sent per hour Approximately 60,000 transactions per day
Key concepts Address Wallet Transaction Block Blockchain Mining / Generation
Address and Wallet An address is a string like 1MikiSPbrhCFk7S4wzZP7gQqhwWH866DCb generated by a Bitcoin client together with the private key needed to spend and redeem the coins sent to it. It is public, and can be posted everywhere in order to receive payments. A wallet is a file which stores addresses and the private keys needed to use them.
Transaction A record of coins moving from one or more addresses to one recipient address. It is saved in a block.
How spending works Several cryptographic technologies public key cryptography - ECDSA hashing algorithms - SHA-1, SHA-256,... Future proof - designed to be upgraded in a forward compatible way
A block is a list of transactions. Block Each block embeds new transactions that took place before it was created. A new block is generated, on average, every 10 minutes. The first generated block is called genesis block. Transactions are placed in blocks, which are linked by SHA-256 hashes.
Blockchain The blockchain is the linked list of all the blocks that have been generated since day one. A full copy of the blockchain contains every transaction ever executed, and is distributed on every client. For any block on the chain, there is only one path to the genesis block. Coming from the genesis block, however, there can be forks.
Mining / Generation Users on the Bitcoin network compete to generate a new valid block that satisfies a difficult requirement (block hash with N leading zero bytes) - this is done by brute force. The lucky solver is then rewarded a fixed number of bitcoins for cryptographically validating transactions on the network. The parameter N is automatically adjusted by the protocol (difficulty), making it scale with the number of miners and making sure that, statistically, one block is generated every 10 minutes.
Mining hardware Nowadays mining with GPUs is no longer profitable - FPGAs or ASICs.
Distributed consensus problem New blocks are linked to older blocks, forming a block chain that is constantly being extended. Miners may generate blocks at the same time or too close together. A participant choosing to extend an existing path in the block chain indicates a vote towards consensus on that path. The longer the path, the more computation was expended building it. Longest path = accepted chain Bitcoin offers a unique solution to the consensus problem in distributed systems since voting power is directly proportional to computing power.
BitIodine
Is Bitcoin anonymous? Yes Users don't need to set up accounts No ID required Cash-like Tor / VPN help No Every transaction is stored in the blockchain Analysis of the blockchain can correlate addresses
BitIodine a tool for analyzing and profiling the Bitcoin network
Transaction graph
User graph
Techniques for creating the User Graph Two main techniques: Multi-input transactions grouping Shadow address guessing (change)
Multi-input transactions When a transaction has multiple input addresses, we can safely assume that those addresses belong to the same wallet, thus to the same user. Assumption: owners don't share private keys. Caveat: web wallets - pools that would be "mistakenly" grouped as a single user.
Shadow addresses the entire value of an unspent output of a prior transaction must be spent and used as input for a new transaction input is destroyed, and change should be sent back to the user to improve anonymity, a "shadow" address is automatically created and used to collect back the "change"
Shadow addresses When a Bitcoin transaction has exactly two output addresses, O 1 and O 2, such that O 2 is a new address (i.e., an address that has never appeared before), and O 1 corresponds to an old address (an address that has appeared previously), we can assume that O 2 constitutes a shadow address for one of the input addresses of that transaction.
Transactions with two outputs are the vast majority. 10000000" Transac/on'count'per'number'of'outputs' Number'of'transac/ons' 9000000" 8000000" 7000000" 6000000" 5000000" 4000000" 3000000" Why? One payee, one shadow address for change back. 2000000" 1000000" 0" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10"11"12"13"14"15"16"17"18"19"20"21"22"23"24"25"26"27"28"29"30"31"32"33"34"35"36"37"38"39"40"41"42"43"44"45"46"47"48"49"50" Number'of'outputs' Transac7on(volume(per(number(of(outputs( Volumes((BTC)( 1,400,000,000" 1,200,000,000" 1,000,000,000" 800,000,000" 600,000,000" 400,000,000" 200,000,000" 0" 1" 2" 3" 4" 5" 6" 7" 8" 9" 10"11"12"13"14"15"16"17"18"19"20"21"22"23"24"25"26"27"28"29"30"31"32"33"34"35"36"37"38"39"40"41"42"43"44"45"46"47"48"49"50" Number(of(outputs(
Shadow addresses The official bitcoin client tries to randomize the position of the change output, but code is flawed: File: wallet.cpp // Insert change txn at random position: Number of payees vector<ctxout>::iterator position = wtxnew.vout.begin()+getrandint(wtxnew.vout.size()); wtxnew.vout.insert(position, CTxOut(nChange, scriptchange)); size()+1 If just two outputs (one payee), GetRandInt(1) always returns 0. The change ends up always in the first output. If multiple outputs, change is never the last output. Still not fixed! (Recently fixed!)
Shadow addresses Given the bug in the official client, BitIodine checks transactions with exactly two outputs, and also checks that the first address was new at the time of the transaction. If it is, chances are it's a shadow address. It is a heuristic.
Evaluating heuristics Number of clusters Cluster size 6M 5M 4M 3M 2M 1M 0M 5,331,657 Heur. 1 2,175,621 660,494 Heur. 1+2 det. Heur. 1+2 relaxed Coverage 15 10 14.32 100 5 67 33 0 30% 54.5% Heur. 1 39% 87.1% 50% 99.7% Heur. 1+2 det. Heur. 1+2 relaxed 0 1.74 1 Heur. 1 4.30 5 1 Heur. 1+2 det. Heur. 1+2 relaxed Transaction coverage Addresses clustered Average Median
The Classifier: labels for addresses
The Classifier: labels for users
Example: classifying address and owner Classify an address we found in IRC chat logs of channel #bitcoin-otc Borrowed money from other Bitcoin-OTC users
Zero-balance address, exhausted in 84 transactions, belonging to user xisalty on BitcoinTalk forum and user xisalty-otc on Bitcoin-OTC. The owner is a known scammer! Every address belonging to the user is empty, 4.5% are One- Time-Addresses, 2.3% are zombies.
A real-world case: investigating the Silk Road Identify the addresses owned by the Silk Road Investigate the activity of the addresses
A real-world case: investigating the Silk Road One of the bitcoin addresses that moved most funds on the network during 2012 is: 1DkyBEKt5S2GDtv7aQw6rQepAvnsRyHoYM
A real-world case: investigating the Silk Road Using the Mt.Gox scraper, we find that: On July 17, 2012 this address had a balance of 517,825 BTC On the same day Bitcoin looked on the verge of breaking 10 USD/BTC At 02:00 AM, someone sold 10,000 BTC at 9 USD/ BTC, driving the price down At 02:29, two large withdrawals of respectively 20,000 BTC and 60,000 BTC were made from this address one after the other, and included in a block at 02:32.
A real-world case: investigating the Silk Road Using the Mt.Gox scraper, we find that: Mt.Gox, at the time, needed 6 confirmations in order to allow the user to spend deposited bitcoins. 6 confirmations matured with block at height 189421, relayed at 02:47:24 AM At 02:52 and 02:53, someone sold approximately 15,000 BTC at market price in several batches, causing the price to drop below 7.5 USD/BTC.
A real-world case: investigating the Silk Road
A real-world case: investigating the Silk Road Sign up to the Silk Road Deposit 0.001 BTC to a one-time deposit address The coins are mixed: the deposit address is provably in the same wallet as more than 25,000 other addresses
A real-world case: investigating the Silk Road Find a connection between the addresses in the mixer and the large 1Dky... address The mixer is a cluster active since June 18, 2012 - more than 80,000 inputs/outputs Follow the flow of coins!
A real-world case: investigating the Silk Road Multi-hop connection found: the 1Dky... address belongs to the Silk Road.
Conclusions We present BitIodine We test it on real-world use cases we are able to get valuable information we get insights on the Silk Road Plans for the future add user-friendly front-end to the framework
Thank you.