When you work with large data set you know the limits and challenges of traditional database management systems. You pay the price for the total cost of ownership. You manage and maintain the server infrastructure and need to make sure that data is accessible and up and running 24/7. As soon you are required to share your data with different parties within your organization or partners, data storage needs to be distributed and redundant in order to guarantee not only availability but high performance to return speedy query result sets. Sounds familiar, right?
About Google BigQuery
In 2010 Google started to test Google BigQuery which was available for the public at the end of 2011. Within the last 9 years, Google BigQuery evolved and added additional features beyond just “querying” petabytes of data in a few seconds. Google BigQuery ML,BigQuery BI Engine, BigQuery GIS topped up the BigQuery tool set.
Besides developing new features, Google and other folks in the community made publicly available data accessible via Google BigQuery. At the time of writing, there are 130 different types of data sets available for you.
How to Load Data into Google Cloud Storage
If you want to use Google BigQuery with your own data, load your own data as CSV, JSON or connect to external storage to query the data set.
Query Ethereum Blockchain Implementation with Google BigQuery
Ethereum and blockchain developers working with distributed systems are challenged to setup an efficient working environment. When you want to read data from the Ethereum blockchain you will find it quite hard. Google BigQuery and our Anyblocks Index can come here quite handy to speed up development work. We decode, index and structure the transactions and increase query performance for you. And the good thing for non-blockchain developers, simply use your SQL skills to do your selects and joins to query the entire Ethereum data set. By the way, did you know that Anyblocks Index currently includes 22 different Ethereum- and Bitcoin-based networks?
So what does it look like to query Ethereum in Google BigQuery?
To start with, you need to have a google cloud account. Create one or log in. Make sure to enable BigQuery and head over to the data marketplace and find the Ethereum data set. You should now see the BigQuery UI using the Ethereum data set. Make sure that you use the new BigQuery UI version. The old version doesn’t have the up-to-date data set available and will be discontinued in June 2020.
How to search Ethereum Transaction Structures in Google BigQuery?
Let’s query some Ethereum data! We’ll use the sample SQL queries from our SQL tutorial and find the latest 10 Sai (former Dai) transactions. Don’t be confused by the different queries. Google BigQuery uses the standard SQL dialect. Besides that, the Ethereum Google BigQuery has a slightly different database model and table relations than we have.
SELECT logs.block_timestamp AS block_timestamp ,logs.block_number AS block_number ,logs.transaction_hash AS transaction_hash ,logs.log_index AS log_index ,logs.block_hash AS block_hash ,logs.data AS data ,token_transfer.from_address AS from_address ,token_transfer.to_address AS to_address FROM `bigquery-public-data.crypto_ethereum.logs` AS logs JOIN `bigquery-public-data.crypto_ethereum.token_transfers` token_transfer ON logs.transaction_hash = token_transfer.transaction_hash WHERE address = '0x89d24a6b4ccb1b6faa2625fe562bdd9a23260359' AND topics[SAFE_OFFSET(0)] = '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef' order by logs.block_timestamp limit 10;
When you look into the Ethereum BigQuery data model, the data for the events is still hashed in the data column. In order to find the corresponding sender and receiver address, we need to join the token_transfer table with the logs table. The join slows down the query to 25.8 sec compared to 2 sec 801 ms on my working machine (MacBook Pro 2,7 GHz Quad-Core Intel Core i7, 16 GB 2133 MHz LPDDR3) with DataGrip an IDE for databases.
Use the shared query, if you would like to play with Google BigQuery UI and the Ethereum data set. Make sure to not exceed the monthly 1 TB, in case you opt-in for the on-demand pricing. The next 1 TB would cost you 5.00 USD. Did you know that we offer a monthly flat fee for all Ethereum-based production networks? For test networks, you’ll even get access for free!
Ethereum Google BigQuery REST API
At some point in time, you want to integrate the data into your existing enterprise applications. You need to match Ethereum events and transactions with your internal business logic and map both data sets, in order to get better insights.
Besides using SQL, you can work with the provided BigQuery REST API and code your own client in the programming language of your choice. In case you work with very large data sets a REST API might not be the ideal choice as the queries are running synchronously with the request. That means you need to deal with timeouts and slow responses for larger results sets. That’s the reason why Google recommends using the provided Google Cloud Client libraries.
Do you know Elasticsearch? Elasticsearch provides a REST API, too. You are able to define your DSL (Domain Specific Language) query in a JSON format and Elasticsearch will take care that you’ll get splendid super-fast response times. That’s the reason why we love Elasticsearch and indexed all the Ethereum network blockchains to our own Elasticsearch instance. So you can take advantage of the Elasticsearch performance.
Besides the client library, you can query Ethereum transactions and smart contracts from connectors. Install the Google BigQuery connectors and query data via the command line, Hadoop, Excel or Google dataflow.
Ethereum Visualization: Why should you query and visualize Ethereum data at all?
“A visualization like this (and the underpinning database query) is useful for making business decisions, such as prioritizing improvements to the Ethereum architecture itself (is the system running close to capacity and due for an upgrade?) to balance sheet adjustments (how quickly can a wallet be rebalanced?).”
We totally agree! It is essential to find improvements as early as possible when you work with distributed ledger technologies. Fixing a bug and re-deploying a new version is troublesome and costly! Sometimes even irreversible. Do you remember the DAO hack?
Google BigQuery is limited to the Ethereum Mainnet. We know many projects and partners which extensively test on Ethereum test networks. What about analyzing and monitoring those test networks? We understand your need! That’s why we are indexing Kovan, Rinkeby, Robsten, Görli and some other Ethereum- and Bitcoin-based networks as well. In total 22. You want to detect bottlenecks and issues before deploying them on the Mainnet, don’t you?
Shall I use Google BigQuery for Ethereum?
Google BigQuery is a great tool for data scientists to collaborate and access data. By adding Ethereum, Bitcoin and other cryptocurrencies to the BigQuery public data set, Google shows the rising demand and importance of blockchain data.
However, when you zoom into the Google BigQuery Ethereum data sets, you’ll notice that additional work is required to access important information.
1. Transaction ID Ethereum — Log Data Not Decoded
The value of the data column in the log table is not decoded. That requires you to decode the data at your end which slows you down.
2. Solidity smart contract as byte code
Smart contracts are available as bytecode only. That means you are not able to query code, strings and functions.
3. Transaction events not indexed
Ethereum transaction events are not indexed which is crucial in case you want to monitor and be alerted when specific events have occurred.
4. Join data tables
You need to create SQL views or joins in order to map data to one table. That makes the query significantly slower and requires additional scripting at your end.
You can use Google BigQuery, if you want to have general insights about Ethereum transactions and traces without running your own (archive) node. As soon you need to dig deeper into the world of Ethereum data, you need to do extra work. Decode the data, structure and store it, before you can start with your actual data science work.
Consider working with our Anyblocks Index instead?
As blockchain developers and data scientists ourselves, we eat our own dog food. That’s why we understand your needs and challenges and have created Anyblocks Index a product which helps you to speed up your work.
1. More than 90% of the data is decoded. Avoid additional time-consuming processing and coding time and query the data right away.
If your smart contract is not decoded? Send us your ABI via email@example.com and we’ll take care of it.
2. Smart contracts are stored in plain full text
3. Events and internal transactions (traces) are indexed
4. Get super-fast response times with our Elasticsearch API
5. Use our same tech stack and data model across all test networks and other Ethereum-based blockchains.
It is even available as a virtual appliance if you need for private/permissioned consortia networks.
6. You can visualize the data easily via our hosted Metabase dashboard software.
Or connect to your existing Business Intelligence (e.g. Tableau etc.) or IT Monitoring systems (e.g. Nagios etc.) via our ready-to-go connectors.
These benefits are six substantial advantages compared to Google BigQuery when you work with the Anyblocks Index.
But maybe even more important: we are a small, approachable team of experts that are here to help you! We have deep blockchain know-how and smart contract specific data science experience and can help extract and visualize the relevant data technically but also with the eye of users.
For enterprises we can offer professional service-level-agreements, but generally we answer any questions by anybody, as we love to work with all sorts of projects and teams in the Ethereum ecosystem.
Have a look at all the features and services we are offering. Take also a glimpse into the monitoring use cases and get some inspiration what you can do with the Anyblock Index and check out our dashboard examples.
Now it’s time to get your hands dirty. Take a leap to Google BigQuery and use the public Ethereum data set and play around. Or create a free account for Anyblocks Index or request a free demo from us: firstname.lastname@example.org