
Python Tutorial

Welcome! In this tutorial, you will learn how to use Python to query the Anyblock ElasticSearch blockchain data API. First you will learn how to set up an ElasticSearch Python client. Then we will show you different queries to retrieve data from the Anyblock Index and how to work with Pandas DataFrames to analyze Ethereum blockchain data. You can run all query experiments directly in Python, but we recommend testing your queries with cURL or Postman first and then copying them into your Python code.

You will learn:

  • How to install and import the necessary libraries
  • How to initialize an ElasticSearch client in Python to query the Anyblock Index API
  • How to write basic queries to get blockchain data
  • How to use aggregation functions in ElasticSearch
  • How to work with bool and range queries

Useful Resources for using Python with ElasticSearch

Installing the ElasticSearch library in Python

Before we can use ElasticSearch from Python, we have to install the elasticsearch client library (elasticsearch-py) via pip. Execute the following command in your shell:
python -m pip install elasticsearch

Anaconda users can alternatively install the library from the Anaconda Prompt with this command:

conda install elasticsearch

After installing the library, you can import the client class and start working with it:

from elasticsearch import Elasticsearch

Creating a client API connection to Anyblock Index

Connecting to the Anyblock ElasticSearch API is simple: we only need to specify the host and the authentication details. Your API keys are listed on your Anyblock Analytics account dashboard. Go to your account and either create a new API key or copy an existing one.

Initializing an ElasticSearch client is as easy as:

es = Elasticsearch(
    hosts=["https://api.anyblock.tools/ethereum/ethereum/mainnet/es/"],
    http_auth=("YOUR EMAIL", "YOUR API KEY"),
)

An ElasticSearch client requires a host; in this case we want to interact with the Ethereum mainnet endpoint. Furthermore, we need our HTTP authentication details, namely the email address of our Anyblock account and our API key.

There is also another way to initialize the client: instead of basic authentication with email and API key, you can authenticate with a bearer token.

es = Elasticsearch(
    hosts=["https://api.anyblock.tools/ethereum/ethereum/mainnet/es/"],
    headers={"authorization": "Bearer YOUR TOKEN"},
)
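The two authentication options can be wrapped in one small helper. This is a minimal sketch, not part of the Anyblock or elasticsearch-py API: the function name anyblock_client_kwargs is hypothetical, and it only assembles the keyword arguments shown above (http_auth is the elasticsearch-py 7.x name; 8.x releases prefer basic_auth).

```python
DEFAULT_HOST = "https://api.anyblock.tools/ethereum/ethereum/mainnet/es/"


def anyblock_client_kwargs(email=None, api_key=None, bearer_token=None, host=DEFAULT_HOST):
    """Hypothetical helper: build keyword arguments for Elasticsearch(...)."""
    if bearer_token:
        # Bearer token: sent as an Authorization header with every request.
        return {"hosts": [host], "headers": {"authorization": f"Bearer {bearer_token}"}}
    if email and api_key:
        # Basic auth: Anyblock account email plus API key.
        return {"hosts": [host], "http_auth": (email, api_key)}
    raise ValueError("provide either bearer_token or both email and api_key")


# Usage (needs elasticsearch-py and real credentials):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(**anyblock_client_kwargs(email="YOUR EMAIL", api_key="YOUR API KEY"))
```

Keeping the credentials out of the query code this way makes it easy to load them from environment variables or a config file instead of hard-coding them.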

Create blockchain data API queries to Anyblock Index

With the prerequisites in place, we can start querying the ElasticSearch API. We begin with some basic examples and gradually expand our queries. Throughout this tutorial we use the low-level elasticsearch-py library.

Your first basic query with ElasticSearch

Let’s start with the most basic query: we want to get the most recently mined block on the Ethereum blockchain. So we write a query that sorts the blocks by number in descending order and limits the output to one result, because we want only the latest block, not the latest 3 or the latest 100. The basic structure of a search query is:

response = es.search(index="ethereum-ethereum-mainnet-DATA-STRUCTURE", body=YOUR QUERY)

Where:

  • es is your ElasticSearch client,
  • index is the data structure (index) we want to query, for example ethereum-ethereum-mainnet-block for blocks,
  • and body is where we specify our query details.

So to find the latest block in the block data structure, we sort the results in descending order by the block number field “number.num”, limit the results with size: 1, and pass the variable “query” to the body parameter of the search method. The code looks like this:

# Sort blocks by number, newest first, and return only one hit
query = {
    "sort": {
        "number.num": "desc"
    },
    "size": 1
}

response = es.search(
    index="ethereum-ethereum-mainnet-block", body=query)

The query results are stored in the variable “response”. The documents themselves, without the metadata about how the query was processed, can be printed with:

elastic_docs = response["hits"]["hits"]
source_data = {}

# Map the position of each hit to the block document stored in "_source"
for k, hit in enumerate(elastic_docs):
    source_data[k] = hit["_source"]

print(source_data)

We store the content of response["hits"]["hits"] in the variable elastic_docs and create an empty dictionary source_data. We then loop over the hits with enumerate: the key is the position of each result, and the value is the block content, which is stored in _source. The output looks like this:

{0: {'author': '0xea674fdde714fd979de3edf0f56aa9716b898ec8',
  'difficulty': {'raw': '3131719354698495',
  'padded': '0x000000000000000000000000000000000000000000000000000b20483bac4aff'},
  'extraData': '0x65746865726d696e652d6575312d38',
  'gasLimit': {'raw': '12450818', 'num': 12450818},
  'gasUsed': {'raw': '12446007', 'num': 12446007},
  'hash': '0x12ed9f83555eb43890ee27421752160bd1d29fbbefd4c58de670249528409baa',
  'logsBloom': '0x2e7a4493e544a28cdaf8158ec50331000aef84909fd74c6722d1b065406a31d70b9c16b0b84a822068002284484653829a0e81105d2f614b9020400ae832121680c972048796925e698afd3c2938d6b0669ced51516225571927745e860030cd982e1fa61e7995ecaa5d8814b0851800004e2e45402a2605022d6e7c14482e66754c1015a374ba367f611ec68bcc38fa652a9dcd95429009885e15469e938950db6809c5c9102a5ca1b3d8ba7b44aeb8ce223979280e7f00f8828d886aa95350b920a44a6a8e3671201859e78a7a20b5ee5422075469fd9877a263a3174525989c112ec0551d620a668486e000696a28162aa004406df550688c806ff1c31546',
  'miner': '0xEA674fdDe714fd979de3EdF0F56AA9716B898ec8',
  'mixHash': '0xe668864eed6f90214679b4c7c290b7659c90ac9e51deaecf3be1b6a57194e3bc',
  'nonce': '0x94f1bf9002b98bd0',
  'number': {'raw': '10867927', 'num': 10867927},
  'parentHash': '0x3b2b06fba38fab45ec4246e44fbb0dee6683f7ef61daee8d3f3fac052b644f32',
  'receiptsRoot': '0x691c2e2c8b1c930f0dd39b2199bf3a4273952fd7b993f98ba4021871bb6dc13f',
  'sha3Uncles': '0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347',
  'size': {'raw': '33580', 'num': 33580},
  'stateRoot': '0x5a8120d82346b9bdeb2db491773bc1725987ee077d9ce6988a5da01b1fdab67e',
  'timestamp': 1600189901,
  'totalDifficulty': {'raw': '17423377319267971510266',
  'padded': '0x0000000000000000000000000000000000000000000003b085eeeedd65362ffa'},
  'transactionsRoot': '0x56bc37272277c38061afd67823476e79a68922d357c67f41584f097d86c99793',
  'uncles': [],
  'sealFields': ['0xa0e668864eed6f90214679b4c7c290b7659c90ac9e51deaecf3be1b6a57194e3bc',
  '0x8894f1bf9002b98bd0']}}
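The block document mixes several encodings: numeric fields such as number and gasUsed come as a pair of a raw string and a num integer, timestamp is a Unix timestamp in seconds, and difficulty is given as a decimal string plus a zero-padded hex string. A minimal sketch of a hypothetical helper, summarize_block, that turns these fields into Python values, assuming every block document has the same layout as the example above:

```python
from datetime import datetime, timezone


def summarize_block(source):
    """Hypothetical helper: pick a few readable fields out of one block's _source."""
    difficulty = source["difficulty"]
    return {
        # Numeric fields come as {"raw": <string>, "num": <int>}.
        "number": source["number"]["num"],
        "gas_used": source["gasUsed"]["num"],
        "gas_limit": source["gasLimit"]["num"],
        # timestamp is a Unix timestamp in seconds; interpret it as UTC.
        "mined_at": datetime.fromtimestamp(source["timestamp"], tz=timezone.utc),
        # Large values such as difficulty appear as a decimal string ("raw")
        # and as a zero-padded hex string ("padded"); both encode the same number.
        "difficulty": int(difficulty["raw"]),
        "difficulty_matches_hex": int(difficulty["raw"]) == int(difficulty["padded"], 16),
    }
```

Applied to the block above, mined_at becomes 15 September 2020, 17:11:41 UTC, and the decimal and hex difficulty values agree.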

Query blocks by date

Now that our first query works, let’s build a more complex one and query by date. We want all blocks mined between 1 January 2020, 00:00:00 UTC and 1 January 2020, 06:00:00 UTC. The Unix timestamp for the start is 1577836800 and for the end 1577858400; you can use unixtimestamp.com to convert between Unix timestamps and dates. We shouldn’t forget to limit the output size of the query. With from and size we control which results are returned: size sets the maximum number of hits, and from sets how many hits to skip from the start of the result list. from defaults to 0 (and size to 10).

query = {
    "from": 0, "size": 10,
    "query": {
        "range": {
            "timestamp": {
                "gte": 1577836800,
                "lte": 1577858400
            }
        }
    }
}
response = es.search(
    index="ethereum-ethereum-mainnet-block", body=query)
elastic_docs = response["hits"]["hits"]
source_data = {}

for k, hit in enumerate(elastic_docs):
    source_data[k] = hit["_source"]
print(source_data)

What are we doing in this query?

We use ElasticSearch’s range query, which returns documents whose field value lies within a specified range. Here we get documents (blocks) whose timestamp field lies in the following range:

  • gte: greater than or equal to 1577836800 (1 January 2020, 00:00:00 UTC)
  • lte: less than or equal to 1577858400 (1 January 2020, 06:00:00 UTC)
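Instead of converting dates on unixtimestamp.com, the bounds can be computed in Python with the standard datetime module. A sketch with two hypothetical helper names, to_unix and timestamp_range_query, that rebuild the range query above:

```python
from datetime import datetime, timezone


def to_unix(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC calendar time to a Unix timestamp in whole seconds."""
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


def timestamp_range_query(start, end, size=10, offset=0):
    """Range query on the block 'timestamp' field; both bounds are inclusive."""
    return {
        "from": offset,
        "size": size,
        "query": {"range": {"timestamp": {"gte": start, "lte": end}}},
    }


start = to_unix(2020, 1, 1, 0)  # 1 January 2020, 00:00:00 UTC
end = to_unix(2020, 1, 1, 6)    # 1 January 2020, 06:00:00 UTC
query = timestamp_range_query(start, end)
```

Passing tzinfo=timezone.utc matters: a naive datetime would be interpreted in the machine's local time zone and shift both bounds.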

The returned documents are the blocks created between these two points in time, a timespan of six hours. In this interval, 1185 blocks were mined. Note that because size is 10, the response contains the data of only the first ten of these blocks, not all 1185.
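To retrieve all 1185 blocks rather than the first ten, one option is to page through the results with from and size. The sketch below (paged_queries is a hypothetical helper) only builds the page queries; each one would be passed to es.search. It assumes you know the total hit count (reported under response["hits"]["total"], whose exact shape depends on the ElasticSearch version) and that the base query contains a sort on a unique field such as number.num, so consecutive pages do not overlap. from/size paging only works while from + size stays within the server's result window.

```python
def paged_queries(base_query, total, page_size=500):
    """Hypothetical helper: yield copies of base_query covering `total` hits page by page."""
    for offset in range(0, total, page_size):
        page = dict(base_query)  # shallow copy; nested dicts are shared, not modified
        page["from"] = offset
        page["size"] = min(page_size, total - offset)
        yield page


base = {
    "query": {"range": {"timestamp": {"gte": 1577836800, "lte": 1577858400}}},
    "sort": {"number.num": "asc"},  # stable order so pages do not overlap
}
pages = list(paged_queries(base, 1185))
```

For 1185 hits and pages of 500, this yields three queries starting at 0, 500 and 1000, the last one asking for the remaining 185 blocks.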

After querying the block data, let’s query the token transfers in the same timespan and see how many Transfer events occurred.

query = {
    "from": 0,
    "query": {
        "bool": {
            "filter": [{
                "term": {
                    "event.raw": "Transfer"
                }
            }],
            "must": [{
                "range": {
                    "timestamp": {
                        "gte": 1577836800,
                        "lte": 1577858400
                    }
                }
            }]
        }
    },
    "sort": {
        "timestamp": "desc"
    },
    "size": 50000
}

# Transfer events are stored in the event data structure
response = es.search(index="ethereum-ethereum-mainnet-event",
                     body=query, doc_type="event")
elastic_docs = response["hits"]["hits"]

source_data = {}
for k,i in enumerate(elastic_docs):
	source_data[k] = i["_source"]

 

What are we doing in this query?

In this query, we combine multiple conditions with a bool query. The filter clause keeps only documents whose event.raw field contains the value “Transfer”, so we are looking for token transfers. The must clause requires each matching document to lie in the requested timespan; every returned document has to satisfy both conditions (unlike filter, must also contributes to the relevance score). We set the size limit to 50000, but we still get only 5000 results, because a single query can return at most 5000 results. There are different methods for obtaining more results; one of them is to use the scroll API and its scroll IDs.
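A variant of the bool query above, written as a hypothetical builder function transfer_events_query so the timespan can be changed easily. It deliberately differs from the tutorial's query in one respect: the range condition is placed in filter instead of must. Both return the same documents; filter clauses skip relevance scoring and can be cached, which suits pure yes/no conditions like these.

```python
def transfer_events_query(start, end, size=5000):
    """Hypothetical builder: Transfer events with a timestamp in [start, end], newest first."""
    return {
        "query": {
            "bool": {
                # Both conditions are yes/no checks, so filter context is enough:
                # matches are not scored and the clauses can be cached.
                "filter": [
                    {"term": {"event.raw": "Transfer"}},
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "sort": {"timestamp": "desc"},
        "size": size,
    }


query = transfer_events_query(1577836800, 1577858400)
```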

Here you can find a minimal working example of working with the scroll parameter. This tutorial explains the concepts behind iterating over the entire ElasticSearch index.

Looking at the results, there are more than 5000 Transfer events in the six-hour timeframe.
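A minimal sketch of collecting all hits with the scroll API, assuming the Anyblock endpoint accepts scroll requests and a client created as shown earlier. iter_all_hits is a hypothetical helper; it relies on the client methods search(..., scroll=...), scroll(scroll_id=..., scroll=...) and clear_scroll(scroll_id=...). With scrolling, the size in the query sets the batch size per round trip rather than the total. elasticsearch-py also ships a ready-made helper, elasticsearch.helpers.scan, that wraps the same loop.

```python
def iter_all_hits(client, index, query, scroll="2m"):
    """Hypothetical helper: yield every hit for `query`, following scroll cursors."""
    # The first request opens a scroll context that stays alive for `scroll`.
    response = client.search(index=index, body=query, scroll=scroll)
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            # Fetch the next batch; the server may hand back a new scroll id.
            response = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Free the server-side search context, even if iteration stops early.
            client.clear_scroll(scroll_id=scroll_id)
```

With the client es from above, iterating over iter_all_hits(es, index, query) would visit every matching document, one batch at a time.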

 

Aggregation Queries

Now let’s build aggregation queries. In this first aggregation example, we want to know the largest amount of gas used by a single block. There are many different aggregation functions; in the useful resources section you can find a link to the ElasticSearch documentation, which describes the different aggregation methods in detail.

query = {
    "size": 0,  # we only need the aggregation, not the matching documents
    "aggs": {
        "max_gas": {"max": {"field": "gasUsed.num"}}
    }
}
response = es.search(
    index="ethereum-ethereum-mainnet-block", body=query)
print(response["aggregations"])

This query returns the amount of gas used for the block in which the “gasUsed” value is the largest.

{'max_gas': {'value': 12602295}}
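The max aggregation returns only the value, not the block it belongs to. To find out which block used the most gas, one option is to sort the blocks by gasUsed.num in descending order and keep the top hit, just like the latest-block query. A sketch with two hypothetical helpers, assuming the field names from the block document shown earlier; _source filtering limits the response to the two fields we need:

```python
def max_gas_block_query():
    """Hypothetical builder: the single block with the highest gasUsed.num."""
    return {
        "size": 1,
        "sort": {"gasUsed.num": "desc"},
        # Return only these fields; the nested structure of _source is kept.
        "_source": ["number.num", "gasUsed.num"],
    }


def parse_max_gas_block(response):
    """Extract (block number, gas used) from the response to max_gas_block_query()."""
    source = response["hits"]["hits"][0]["_source"]
    return source["number"]["num"], source["gasUsed"]["num"]
```

The query would be sent with es.search(index="ethereum-ethereum-mainnet-block", body=max_gas_block_query()).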

This basic aggregation query is pretty straightforward. In the next step, let’s do a bucket aggregation.

Once again we look at the six-hour timespan introduced earlier. Now we want to count the Transfer events per block in this timespan.

query = {
    "query": {
        "bool": {
            "filter": [{
                "term": {
                    "event.raw": "Transfer"
                }
            }],
            "must": [{
                "range": {
                    "timestamp": {
                        "gte": 1577836800,
                        "lte": 1577858400
                    }
                }
            }]
        }
    },
    "aggs": {
        "From": {
            "terms": {"field": "blockNumber.raw"}
        }
    },
    "sort": {
        "timestamp": "desc"
    },
    "size": 5000
}

# Transfer events are stored in the event data structure
response = es.search(index="ethereum-ethereum-mainnet-event",
                     body=query, doc_type="event")
print(response["aggregations"])

Again we filter for Transfer events in the given timespan. In addition, we use a terms aggregation to group the events into buckets based on the field blockNumber.raw, which contains the block number as a string (as the bucket keys below show).

{'From': {'doc_count_error_upper_bound': 274,
  'sum_other_doc_count': 91260,
  'buckets': [{'key': '9194254', 'doc_count': 201},
  {'key': '9194034', 'doc_count': 126},
  {'key': '9194150', 'doc_count': 124},
  {'key': '9194160', 'doc_count': 120},
  {'key': '9194321', 'doc_count': 112},
  {'key': '9193309', 'doc_count': 96},
  {'key': '9193969', 'doc_count': 93},
  {'key': '9194351', 'doc_count': 92},
  {'key': '9194370', 'doc_count': 91},
  {'key': '9193272', 'doc_count': 90}]}}

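We can see that the blocks contain very different numbers of Transfer events. Block 9194254 has the most, with 201 events. Note that the terms aggregation returns only the 10 buckets with the highest counts by default, so the 90 events of block 9193272 are the tenth-highest count, not the lowest. The counts are also approximate: doc_count_error_upper_bound is an upper bound on the error of each count, and sum_other_doc_count is the number of events that fall into blocks not shown.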
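To look at more than the ten largest buckets, the terms aggregation accepts a size parameter and an order. The sketch below (hypothetical helpers transfers_per_block_aggs and parse_buckets) builds the aggs part of the query and converts the string bucket keys into integers. The ElasticSearch documentation discourages ordering by ascending _count because it increases the error on the counts, so treat a "fewest first" result as approximate.

```python
def transfers_per_block_aggs(n_buckets=10, fewest_first=False):
    """Hypothetical builder: terms aggregation on blockNumber.raw with n_buckets buckets."""
    return {
        "From": {
            "terms": {
                "field": "blockNumber.raw",
                "size": n_buckets,
                "order": {"_count": "asc" if fewest_first else "desc"},
            }
        }
    }


def parse_buckets(aggregations, name="From"):
    """Turn terms buckets into (block_number, count) pairs; keys arrive as strings."""
    return [(int(b["key"]), b["doc_count"]) for b in aggregations[name]["buckets"]]
```

Passed response["aggregations"] from the query above, parse_buckets returns (9194254, 201) as its first pair.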


Conclusion

In this tutorial we learned how to install elasticsearch-py, initialize an ElasticSearch client in Python, and run a few basic queries. We worked with range and bool queries and dipped our toes into ElasticSearch’s powerful aggregation functions. There are many resources on ElasticSearch aggregations that can teach you more complex aggregation functions.

There are many ways to work with the results obtained from the Anyblock API. For example, you can import the results into a Pandas DataFrame and start analyzing and visualizing blockchain data. Furthermore, there is ElasticSearch DSL (elasticsearch-dsl), a high-level library built on top of elasticsearch-py that makes writing and running queries against ElasticSearch more convenient. If you had trouble working with the low-level library, I recommend checking out the ElasticSearch DSL documentation and the Anyblock ElasticSearch documentation.
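As a starting point for the Pandas route, the nested _source documents can be flattened into one row per hit with dotted column names such as gasUsed.num. A minimal sketch in plain Python (flatten and hits_to_rows are hypothetical helpers); the resulting rows can be passed to pandas.DataFrame, and pandas.json_normalize produces similar dotted columns directly:

```python
def flatten(doc, parent=""):
    """Flatten nested dicts into dotted keys, e.g. {"gasUsed": {"num": 1}} -> {"gasUsed.num": 1}."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            # Lists (e.g. "uncles") and scalars are kept as they are.
            flat[name] = value
    return flat


def hits_to_rows(response):
    """One flat dict per hit, ready for pandas.DataFrame(rows)."""
    return [flatten(hit["_source"]) for hit in response["hits"]["hits"]]
```

With pandas installed, pandas.DataFrame(hits_to_rows(response)) gives one row per block or event and one column per field.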

Interested or questions?

 

Sascha Göbel
(Co-Founder & Chief Technology Officer)
sascha@anyblockanalytics.com
+49 6131 3272372

    
