The MUD Indexer
The MUD Indexer is an off-chain indexer for on-chain applications built with MUD.
Why an off-chain indexer?
Reads with on-chain apps can be tricky. What does it mean to be able to query the Ethereum network? Technically, given a node with a fully synced state, we can explore just about everything using the EVM, but the “exploring” would be looking at raw storage slots for accounts corresponding to smart contracts. A way around this exists by providing view functions on smart contracts: these effectively are just wrappers around raw storage and expose a more friendly API. Instead of having to figure out where the balances for an account are stored in the storage tree, we now can call a function that does the lookup via Solidity via an RPC endpoint.
The issue with view functions is that complexity in having to generate and call these to get the “full picture” of the state from the chain explodes pretty quickly. For a web app client wanting to present the user with an up-to-date view of the on-chain state, the task of calling the different view functions becomes unnecessarily hard very fast. This can also quickly accelerate the need to run a set of dedicated nodes just to be able to service the needed number of RPC requests.
Installation
Running the indexer locally (on the host)
These are the steps to install and start the MUD indexer.
They are written under the assumption you are using anvil
for your test chain, which is what the getting started package's pnpm dev
does.
-
Build the repository as explained here.
-
Ensure you have
jq
(opens in a new tab) installed. -
Run a test world. The easiest way to do this is to follow these directions in a separate command line window.
Note: While MUD v2 is in alpha, there might be breaking changes between pre-release versions. To ensure your World is compatible with your indexer, make sure both are running the same MUD version. To create a project with the MUD version on the
main
branch, usemud@main
instead of.mud@next
pnpm create mud@main project-name
-
Start the indexer.
cd packages/store-indexer pnpm start:sqlite:local
Note: If you have already ran the MUD indexer, and then restarted the blockchain, make sure to delete the old database:
rm anvil.db
-
Run this command (in a separate command line window) as a sanity check to verify the indexer is working correctly:
curl 'http://localhost:3001/trpc/findAll?batch=1&input=%7B%220%22%3A%7B%22json%22%3A%7B%22chainId%22%3A31337%2C%22address%22%3A%220x5FbDB2315678afecb367f032d93F642f64180aa3%22%7D%7D%7D' | jq
The result should be nicely formatted (and long) JSON output.
Running the indexer on Docker
The indexer Docker image is available on github (opens in a new tab).
These environment variables need to be provided to the docker indexer to work:
Type | Variable | Meaning | Sample value (using anvil running on the host) |
---|---|---|---|
Needed | RPC_HTTP_URL | The URL to access the blockchain using HTTP | http://host.docker.internal:8545 (opens in a new tab) |
Optional | RPC_WS_URL | The URL to access the blockchain using WebSocket | |
Optional | START_BLOCK | The block to start indexing from. The block in which the World contract was deployed is a good choice. | 1 |
Optional | DEBUG=mud:* | Turn on debugging | |
Optional, only for SQLite | SQLITE_FILENAME | Name of database | anvil.db |
Optional, only for PostgreSQL | DATABASE_URL | URL for the database, of the form postgres://<host>/<database> |
There are several ways to provide those environment variables to docker run
:
- On the command line you can specify
-e <variable>=<value>
. You specify this after thedocker run
, but before the name of the image. - You can also write all the environment variables in a file and specify it using
--env-file
. You specify this after thedocker run
, but before the name of the image. - Both Docker Compose (opens in a new tab) and Kubernetes (opens in a new tab) have their own mechanisms for starting docker containers with environment variables.
SQLite
The command to start the indexer in SQLite mode is pnpm start:sqlite
.
To index an anvil
instance running to the host, use:
docker run \
--platform linux/amd64 \
-e RPC_HTTP_URL=http://host.docker.internal:8545 \
-p 3001:3001 \
ghcr.io/latticexyz/store-indexer:latest \
pnpm start:sqlite
However, this creates a docker container with a state, the SQLite database file. If we start a new container with the same image and parameters, it is going to have to go back to the start of the blockchain, which depending on how long the blockchain has been in use may be a problem. We can solve this with volumes (opens in a new tab):
-
Create a docker volume for the SQLite database file.
docker volume create sqlite-db-file
-
Run the indexer container using the volume.
docker run \ --platform linux/amd64 \ -e RPC_HTTP_URL=http://host.docker.internal:8545 \ -e SQLITE_FILENAME=/dbase/anvil.db \ -v sqlite-db-file:/dbase \ -p 3001:3001 \ ghcr.io/latticexyz/store-indexer:latest \ pnpm start:sqlite
-
You can stop the docker container and restart it, or start a separate container using the same database.
-
When you are done, you have to delete the docker containers that used it before you can delete the volume. You can use these commands:
docker rm `docker ps -a --filter volume=sqlite-db-file -q` docker volume rm sqlite-db-file
Note: You should do this every time you restart the blockchain. Otherwise your index will include data from multiple blockchains, and make no sense.
PostgreSQL
The command to start the indexer in PostgreSQL mode is pnpm start:postgres
.
-
The docker instance identifies to PostgreSQL as
root
. To give this user permissions on the database, enterpsql
and run this command.CREATE ROLE root SUPERUSER LOGIN;
Note: This is assuming a database that is isolated from the Internet and only used by trusted entities. In a production system you will use at least a password as authentication, and limit the user's authority.
-
Start the docker container. For example, to index an
anvil
instance running to the host to the databasepostgres
on the host, use.docker run \ --platform linux/amd64 \ -e RPC_HTTP_URL=http://host.docker.internal:8545 \ -e DATABASE_URL=postgres://host.docker.internal/postgres \ -p 3001:3001 \ ghcr.io/latticexyz/store-indexer:latest \ pnpm start:postgres
-
The PostgreSQL database is persistent. If you restart the blockchain you have to delete the content of this database, otherwise the indexer will include old information. To delete the database content, enter
psql
and run this command.DROP OWNED BY root;
Data access and interpretation
Using the indexer with TypeScript
You can see an example of an indexer client in the MUD repo (opens in a new tab). To execute it:
-
Download and build the example.
git clone https://@github.com/latticexyz/mud.git cd mud/examples/indexer-client pnpm install
-
Run the example.
pnpm read-indexer
If your indexer is indexing the minimal template, the expected output should be similar to:
Block number: 2042
Tables: schema,Hooks,StoreMetadata,NamespaceOwner,InstalledModules,ResourceAccess,Systems,FunctionSelector,SystemHooks,SystemRegistry,ResourceType,Counter
Information about Counter
{
id: '0x5FbDB2315678afecb367f032d93F642f64180aa3____Counter',
address: '0x5FbDB2315678afecb367f032d93F642f64180aa3',
tableId: '0x00000000000000000000000000000000436f756e746572000000000000000000',
namespace: '',
name: 'Counter',
keySchema: {},
valueSchema: { value: 'uint32' },
lastUpdatedBlockNumber: 2042n,
records: [ { key: {}, value: [Object] } ]
}
The actual value: 2
Detailed explanation
import { createIndexerClient } from "@latticexyz/store-sync/trpc-indexer";
const indexer = createIndexerClient({
url: "http://127.0.0.1:3001/trpc",
});
Create an indexer client. The URL, http://127.0.0.1/3001/trpc
, is the one for the indexer you run on the local computer.
If the indexer is elsewhere, modify the URL as appropriate.
const result = await indexer.findAll.query({
chainId: 31337,
address: "0x5FbDB2315678afecb367f032d93F642f64180aa3",
});
Connect to the indexer.
The chainId
value is for a sanity check, to ensure we are connecting to the correct indexer.
The address
parameter is the address of the World
contract whose information we are reading.
The same indexer can read from multiple World
objects, as long as they are on the same blockchain.
console.log(`Block number: ${result.blockNumber}`);
The result
has two fields:
blockNumber
, the data is correct as of that block (it could change later).tables
, the tables of theWorld
we are reading.
console.log(`Tables: ${result.tables.map((t) => t.name)}`);
Use map
(opens in a new tab) to get the names of the tables.
const counterTable = result.tables.filter((t) => t.name == "Counter")[0];
Use filter
(opens in a new tab) to get the Counter
table, assuming there is at least one.
console.log("Information about Counter");
console.log(counterTable);
Log the information we have about the table. It contains these fields:
Field | Meaning |
---|---|
id | Iinternal ID for uniqueness in the context of SQL |
address | World address |
tableId | bytes32 hex encoded table ID (a concat of bytes16(namespace) and bytes16(name) ) |
namespace | Table's namespace |
name | Table's name |
keySchema | Schema of the table key |
valueSchema | Schema of the table value |
lastUpdatedBlockNumber | Block number for which the data is correct |
records | Actual table data |
console.log(`The actual value: ${counterTable.records[0].value.value}`);
The actual value in Counter
, which is in the first (and only) record.
Every record has two fields:
key
value
Both key
and value
can have multiple fields inside them, one for each field in the appropriate schema.
In the case of Counter
there is nothing in the key
, but value
has a single field, also called value
.
So the counter value is value.value
of the record.
Using HTTP
The parameters, especially the URL, may change as we adjust our tRPC configuration. If you can use the TypeScript methods, those are more stable.
The indexer is built on top of tRPC (opens in a new tab), which is primarily designed to be used between TypeScript (opens in a new tab) programs. As a result, the data format is more complicated than REST APIs.
For queries, the indexer expects to get two parameters in a query string (opens in a new tab):
-
batch
, the batch identifier. This parameter can contain any string, as long as it is present. -
input
, the actual query. This query is a URLEncoded (opens in a new tab) JSON (opens in a new tab) object. To play with possible object values, you can use this online calculator (opens in a new tab).For example, the input object used in the sanity check above is:
{ "0": { "json": { "chainId": 31337, "address": "0x5FbDB2315678afecb367f032d93F642f64180aa3" } } }
An HTTP request can include multiple queries, so the query object includes a numbered key (or numbered keys) with the actual queries. Also, in the future other formats than JSON might be supported, so each query specifies what format it uses.
The one type of query currently supported is findAll
. It gives you all the information stored on the specific world in the indexer.
The query is available at path /findAll
and have two parameters:
chainId
: The chain ID of the blockchain to query.address
: The address of theWorld
. Based on the paramters in the development environment,pnpm dev
uses the address in the sanity check.
To understand the result, it is easiest to look at it inside Node.
-
Read the result and enter
node
:curl 'http://localhost:3001/findAll?batch=wd&input=%7B%220%22%3A%7B%22json%22%3A%7B%22chainId%22%3A31337%2C%22address%22%3A%220x5FbDB2315678afecb367f032d93F642f64180aa3%22%7D%7D%7D' > res.json node
-
Read the result into a JavaScript object.
res = JSON.parse(fs.readFileSync("res.json"));
-
res
is actually a list of results, which only has one item because we only had one key in the query. This item has one key,record
, whose value also has only one key,data
. Run these commands to verify the statements above and then cutres
to the actual result.res.length; Object.keys(res[0]); Object.keys(res[0].result); res = res[0].result.data;
-
Inside the result there are two fields:
meta
, the metadata (opens in a new tab). In the case of JSON the field names are part of the data, so this primarily contains data types.json
, the data read from theWorld
. This field itself is divided into two values:blockNumber
, the block number in which these query results apply.tables
, a list of the tables in theWorld
and their data.
You can use the
filter
function (opens in a new tab) to get a specific table. In this case, we are getting theCounter
table.counterTable = res.json.tables.filter((t) => t.name === "Counter")[0];
-
Finally, to get the actual record with the counter value, use:
counterTable.records[0].value.value;
Debugging
Indexing other blockchains
To index a different blockchain, you specify these environment variables:
Variable | Value |
---|---|
RPC_HTTP_URL | The URL to access the blockchain using HTTP |
RPC_WS_URL | The URL to access the blockchain using WebSocket |
START_BLOCK1 | The block to start indexing from. The block in which the World contract was deployed is a good choice. |
SQLITE_FILENAME | <blockchain name>.db |
(1) This value is optional, but highly recommended in chains where the world was deployed after a large number of blocks had already passed.
After you do that, start the indexer with this command:
pnpm start:sqlite
Note that in your queries you need to specify the correct chainId for the chain you are indexing.
Using the indexer with PostgreSQL
To run the indexer with PostgreSQL (opens in a new tab), follow these steps:
-
Download and install PostgreSQL (opens in a new tab).
-
To use the default setting (local PostgreSQL, using the
postgres
database, indexing the localanvil
chain), run:pnpm start:postgres:local
Otherwise, set these environment variables:
Variable Value CHAIN_ID ChainID for the chain you index RPC_HTTP_URL1 The URL to access the blockchain using HTTP RPC_WS_URL1 The URL to access the blockchain using WebSocket START_BLOCK2 The block to start indexing from. The block in which the World
contract was deployed is a good choice.DATABASE_URL URL for the database, of the form postgres://<host>/<database>
(1) Not necessary in the case of a Lattice testnet.
(2) This value is optional, but highly recommended in chains where the world was deployed after a large number of blocks had already passed for performance reasons.
And then run:
pnpm start:postgres
Reading information from PostgreSQL
You can now use psql
(opens in a new tab) to view the indexer information.
Note: This example uses the template (opens in a new tab).
If you are indexing a different World
, change the world address and use the name of a table you have rather than Counter
.
-
Get the list of schemas to verify
__mud_internal
is there.\dn
-
Get the list of relations in the schema.
\dt __mud_internal.*
You should get two tables,
chain
andtables
. -
Get the list of chains being indexed.
SELECT * FROM __mud_internal.chain;
-
The
tables
table is a lot bigger, so first get the list of columns within the table.\d+ __mud_internal.tables
-
Get the list of MUD table names.
SELECT name FROM __mud_internal.tables;
-
Get the key and value schemas for
Counter
.SELECT key_schema, value_schema FROM __mud_internal.tables WHERE name='Counter';
-
The schema for a world is
<address>__
. Get the list of tables for a specific world.\dt "0x5FbDB2315678afecb367f032d93F642f64180aa3__".*
-
Specify the world schema is the default.
SET search_path TO "0x5FbDB2315678afecb367f032d93F642f64180aa3__";
-
Get the value of the counter.
SELECT * FROM "Counter";
Deleting the PostgreSQL data
PostgreSQL is a persistent data storage. This means that if you restart the blockchain without manually deleting the data, your indexer will use old data and provide incorrect results.
To avoid this, enter psql
and run this command:
DROP OWNED BY <your user name>;