Exploring Ethereum with BigQuery
Introduction
Ethereum is a blockchain-based distributed computing platform where users may exchange currency (Ether), provide or purchase services (smart contracts), mint their coinage (tokens), as well as other applications. The Ethereum network is fully decentralized, managed by public-key cryptography, peer-to-peer networking, and proof-of-work to process/verify transactions. In this report, we will present the methodology and results of the analysis of the Ethereum blockchain dataset. The dataset contains information on the blocks, transactions, and smart contracts that exist on the Ethereum network. We conducted three analyses: time analysis, top ten most popular services, and top ten most active miners
Dataset Schema
• number: The block number
• hash: Hash of the block
• parent_hash: Hash of the parent of the block
• nonce: Nonce that satisfies the difficulty target
• sha3_uncles: Combined has of all uncles for a given parent
• logs_bloom: Data structure containing event logs
• transactions_root: Root hash of the transactions in the payload
• state_root: Root hash of the state object
• receipts_root: hash of the transaction receipts tree
• miner: The address of the beneficiary to whom the mining rewards were given
• difficulty: Integer of the difficulty for this block
• total_difficulty: Total difficulty of the chain until this block
• size: The size of this block in bytes
• extra_data: Arbitrary additional data as raw bytes
• gas_limit: The maximum gas allowed in this block
• gas_used: The total used gas by all transactions in this block
• timestamp: The timestamp for when the block was collated
• transaction_count: The number of transactions in the block
BigQuery is a fully managed, petabyte-scale analytics data warehouse that enables businesses to analyze all their data very quickly. It is a serverless platform, so you don’t need to worry about managing any infrastructure. You can simply load your data into BigQuery and start querying it using SQL. BigQuery is also very cost-effective, as you only pay for the storage and querying that you use.
blocks_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16967424 entries, 0 to 16967423
Data columns (total 4 columns):
# Column Dtype
--- ------ -----
0 timestamp datetime64[ns, UTC]
1 transaction_count Int64
2 miner object
3 size Int64
dtypes: Int64(2), datetime64[ns, UTC](1), object(1)
memory usage: 550.2+ MB
CPU times: user 11.4 ms, sys: 304 µs, total: 11.7 ms
Wall time: 36.1 ms
Part A :- Time Analysis (25 %)
Create a bar plot showing the number of transactions occurring every month between the start and end of the dataset.
Part B Top Ten Most Popular Services (25 %)
Evaluate the top 10 smart contracts by total Ether received. You will need to join address field in the contracts dataset to the to_address in the transactions dataset to determine how much ether a contract has received.
Dataset Schema - contracts
• address: Address of the contract
• bytecode: Code for Ethereum Contract
• function_sighashes: Function signature hashes of a contract
• is_erc20: Whether this contract is an ERC20 contract
• is_erc721: Whether this contract is an ERC721 contract
• block_number: Block number where this contract was created
address
0xd4039ecc40aeda0582036437cf3ec02845da4c13 5280000000000000000000.000000
0x8a46b5dd6e1aa09619245ff386cff5a624cddddb 4500900000000000000000.000000000
0xe9603c91ef4334450671b1e4a6cad611ba20bbf1 3976237731316744024643.000000
0xb8901acb165ed027e32754e0ffe830802919727f 1709610360292951025469.000000
0x89e7a245d5267ecd5bf4ca4c1d9d4d5a14bbd130 1600000000000000000000.000000
0x30b561304a4cd1f9941ab99be06d53a6cb341167 818000000000000000000.000000000
0x89cd1b96aa6dfb50d2b70ba29c026793602423ca 200000000000000000000.000000000
0x54edc8496f1e97fe7af2928e5b258cdda35c6ca3 138874124753428342453.0000000
0x13dc57a190808e3bde97a625093d66212d101fc4 79250898615375547000.00000000
0xdbd324b73f6f85bf9013b75c442021303b635ff9 67679600000000000000.00000000
Part C :- Top Ten Most Active Miners (10 %)
Evaluate the top 10 miners by the size of the blocks mined. This is simpler as it does not require a join. You will first have to aggregate blocks to see how much each miner has been involved in. You will want to aggregate size for addresses in the miner field. This will be similar to the wordcount that we saw in Lab 1 and Lab 2. You can add each value from the reducer to a list and then sort the list to obtain the most active miners
SELECT miner, SUM(size) AS total_block_size FROM `bigquery-public-data.crypto_ethereum.blocks`GROUP BY miner
ORDER BY total_block_size DESC
miner total_block_size
0 0xea674fdde714fd979de3edf0f56aa9716b898ec8 130337685085
1 0x5a0b54d5dc17e0aadc383d2db43b0a0d3e029c4c 65704646587
2 0x829bd824b016326a401d083b33d092293333a830 63253546382
3 0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5 28083941732
4 0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 27120033449
5 0x690b9a9e9aa1c9db991c7721a92d351db4fac990 21308961087
6 0x1ad91ee08f21be3de0ba2ba6918e714da6b45836 21232642110
7 0x95222290dd7278aa3ddd389cc1e1d165cc4bafe5 17182907850
8 0x00192fb10df37c9fb26829eb2cc623cd1bf599e8 13184407748
9 0x04668ec2f57cc15c381b461b9fedab5d451c8f7f 10071546636
Conclusion
In conclusion, we have analyzed the Ethereum blockchain dataset and presented the results of three analyses. The time analysis showed the monthly trend of the number of transactions and average transaction value. The top ten most popular services analysis showed the smart contracts that received the most Ether. The top ten most active miners analysis showed the miners that mined the most blocks by size. These analyses can provide insights into the usage and performance of the Ethereum network