The MUD Indexer

The MUD Indexer is an off-chain indexer for on-chain applications built with MUD.

Why an off-chain indexer?

Reads with on-chain apps can be tricky. What does it mean to be able to query the Ethereum network? Technically, given a node with a fully synced state, we can explore just about everything using the EVM, but the “exploring” would be looking at raw storage slots for accounts corresponding to smart contracts. A way around this exists by providing view functions on smart contracts: these effectively are just wrappers around raw storage and expose a more friendly API. Instead of having to figure out where the balances for an account are stored in the storage tree, we now can call a function that does the lookup via Solidity via an RPC endpoint.

The issue with view functions is that complexity in having to generate and call these to get the “full picture” of the state from the chain explodes pretty quickly. For a web app client wanting to present the user with an up-to-date view of the on-chain state, the task of calling the different view functions becomes unnecessarily hard very fast. This can also quickly accelerate the need to run a set of dedicated nodes just to be able to service the needed number of RPC requests.

Installation

Running the indexer locally (on the host)

These are the steps to install and start the MUD indexer. They are written under the assumption you are using anvil for your test chain, which is what the getting started package's pnpm dev does.

Build the repository as explained here.
Ensure you have jq (opens in a new tab) installed.
Run a test world. The easiest way to do this is to follow these directions in a separate command line window.

Note: While MUD v2 is in alpha, there might be breaking changes between pre-release versions. To ensure your World is compatible with your indexer, make sure both are running the same MUD version. To create a project with the MUD version on the main branch, use mud@main instead of ~~mud@next~~.
```
pnpm create mud@main project-name
```
Start the indexer.
```
cd packages/store-indexer
pnpm start:sqlite:local
```
Note: If you have already ran the MUD indexer, and then restarted the blockchain, make sure to delete the old database:
```
rm anvil.db
```

Run this command (in a separate command line window) as a sanity check to verify the indexer is working correctly:

curl 'http://localhost:3001/trpc/findAll?batch=1&input=%7B%220%22%3A%7B%22json%22%3A%7B%22chainId%22%3A31337%2C%22address%22%3A%220x5FbDB2315678afecb367f032d93F642f64180aa3%22%7D%7D%7D' | jq

The result should be nicely formatted (and long) JSON output.

Running the indexer on Docker

The indexer Docker image is available on github (opens in a new tab).

These environment variables need to be provided to the docker indexer to work:

Type	Variable	Meaning	Sample value (using `anvil` running on the host)
Needed	RPC_HTTP_URL	The URL to access the blockchain using HTTP	http://host.docker.internal:8545 (opens in a new tab)
Optional	RPC_WS_URL	The URL to access the blockchain using WebSocket
Optional	START_BLOCK	The block to start indexing from. The block in which the `World` contract was deployed is a good choice.	1
Optional	DEBUG=mud:*	Turn on debugging
Optional, only for SQLite	SQLITE_FILENAME	Name of database	`anvil.db`
Optional, only for PostgreSQL	DATABASE_URL	URL for the database, of the form `postgres://<host>/<database>`

There are several ways to provide those environment variables to docker run:

On the command line you can specify -e <variable>=<value>. You specify this after the docker run, but before the name of the image.
You can also write all the environment variables in a file and specify it using --env-file. You specify this after the docker run, but before the name of the image.
Both Docker Compose (opens in a new tab) and Kubernetes (opens in a new tab) have their own mechanisms for starting docker containers with environment variables.

SQLite

The command to start the indexer in SQLite mode is pnpm start:sqlite. To index an anvil instance running to the host, use:

docker run \
  --platform linux/amd64 \
  -e RPC_HTTP_URL=http://host.docker.internal:8545 \
  -p 3001:3001  \
  ghcr.io/latticexyz/store-indexer:latest  \
  pnpm start:sqlite

However, this creates a docker container with a state, the SQLite database file. If we start a new container with the same image and parameters, it is going to have to go back to the start of the blockchain, which depending on how long the blockchain has been in use may be a problem. We can solve this with volumes (opens in a new tab):

Create a docker volume for the SQLite database file.
```
docker volume create sqlite-db-file
```

Run the indexer container using the volume.

docker run \
   --platform linux/amd64 \
   -e RPC_HTTP_URL=http://host.docker.internal:8545 \
   -e SQLITE_FILENAME=/dbase/anvil.db \
   -v sqlite-db-file:/dbase \
   -p 3001:3001  \
   ghcr.io/latticexyz/store-indexer:latest  \
   pnpm start:sqlite

You can stop the docker container and restart it, or start a separate container using the same database.
When you are done, you have to delete the docker containers that used it before you can delete the volume. You can use these commands:
```
docker rm `docker ps -a --filter volume=sqlite-db-file -q`
docker volume rm sqlite-db-file
```
Note: You should do this every time you restart the blockchain. Otherwise your index will include data from multiple blockchains, and make no sense.

PostgreSQL

The command to start the indexer in PostgreSQL mode is pnpm start:postgres.

The docker instance identifies to PostgreSQL as root. To give this user permissions on the database, enter psql and run this command.
```
CREATE ROLE root SUPERUSER LOGIN;
```
Note: This is assuming a database that is isolated from the Internet and only used by trusted entities. In a production system you will use at least a password as authentication, and limit the user's authority.

Start the docker container. For example, to index an anvil instance running to the host to the database postgres on the host, use.

docker run \
  --platform linux/amd64 \
  -e RPC_HTTP_URL=http://host.docker.internal:8545 \
  -e DATABASE_URL=postgres://host.docker.internal/postgres \
  -p 3001:3001  \
  ghcr.io/latticexyz/store-indexer:latest  \
  pnpm start:postgres

The PostgreSQL database is persistent. If you restart the blockchain you have to delete the content of this database, otherwise the indexer will include old information. To delete the database content, enter psql and run this command.
```
DROP OWNED BY root;
```

Data access and interpretation

Using the indexer with TypeScript

You can see an example of an indexer client in the MUD repo (opens in a new tab). To execute it:

Download and build the example.

git clone https://@github.com/latticexyz/mud.git
cd mud/examples/indexer-client
pnpm install

Run the example.
```
pnpm read-indexer
```

If your indexer is indexing the minimal template, the expected output should be similar to:

Block number: 2042
Tables: schema,Hooks,StoreMetadata,NamespaceOwner,InstalledModules,ResourceAccess,Systems,FunctionSelector,SystemHooks,SystemRegistry,ResourceType,Counter
Information about Counter
{
  id: '0x5FbDB2315678afecb367f032d93F642f64180aa3____Counter',
  address: '0x5FbDB2315678afecb367f032d93F642f64180aa3',
  tableId: '0x00000000000000000000000000000000436f756e746572000000000000000000',
  namespace: '',
  name: 'Counter',
  keySchema: {},
  valueSchema: { value: 'uint32' },
  lastUpdatedBlockNumber: 2042n,
  records: [ { key: {}, value: [Object] } ]
}
The actual value: 2

Detailed explanation

import { createIndexerClient } from "@latticexyz/store-sync/trpc-indexer";
const indexer = createIndexerClient({
  url: "http://127.0.0.1:3001/trpc",
});

Create an indexer client. The URL, http://127.0.0.1/3001/trpc, is the one for the indexer you run on the local computer. If the indexer is elsewhere, modify the URL as appropriate.

const result = await indexer.findAll.query({
  chainId: 31337,
  address: "0x5FbDB2315678afecb367f032d93F642f64180aa3",
});

Connect to the indexer. The chainId value is for a sanity check, to ensure we are connecting to the correct indexer. The address parameter is the address of the World contract whose information we are reading. The same indexer can read from multiple World objects, as long as they are on the same blockchain.

console.log(`Block number: ${result.blockNumber}`);

The result has two fields:

blockNumber, the data is correct as of that block (it could change later).
tables, the tables of the World we are reading.

console.log(`Tables: ${result.tables.map((t) => t.name)}`);

Use map (opens in a new tab) to get the names of the tables.

const counterTable = result.tables.filter((t) => t.name == "Counter")[0];

Use filter (opens in a new tab) to get the Counter table, assuming there is at least one.

console.log("Information about Counter");
console.log(counterTable);

Log the information we have about the table. It contains these fields:

Field	Meaning
id	Iinternal ID for uniqueness in the context of SQL
address	`World` address
tableId	`bytes32` hex encoded table ID (a concat of `bytes16(namespace)` and `bytes16(name)`)
namespace	Table's namespace
name	Table's name
keySchema	Schema of the table key
valueSchema	Schema of the table value
lastUpdatedBlockNumber	Block number for which the data is correct
records	Actual table data

console.log(`The actual value: ${counterTable.records[0].value.value}`);

The actual value in Counter, which is in the first (and only) record. Every record has two fields:

key
value

Both key and value can have multiple fields inside them, one for each field in the appropriate schema. In the case of Counter there is nothing in the key, but value has a single field, also called value. So the counter value is value.value of the record.

Using HTTP

⚠URLs and parameters may vary

The parameters, especially the URL, may change as we adjust our tRPC configuration. If you can use the TypeScript methods, those are more stable.

The indexer is built on top of tRPC (opens in a new tab), which is primarily designed to be used between TypeScript (opens in a new tab) programs. As a result, the data format is more complicated than REST APIs.

For queries, the indexer expects to get two parameters in a query string (opens in a new tab):

batch, the batch identifier. This parameter can contain any string, as long as it is present.
input, the actual query. This query is a URLEncoded (opens in a new tab) JSON (opens in a new tab) object. To play with possible object values, you can use this online calculator (opens in a new tab).

For example, the input object used in the sanity check above is:
```
{
  "0": {
    "json": {
      "chainId": 31337,
      "address": "0x5FbDB2315678afecb367f032d93F642f64180aa3"
    }
  }
}
```
An HTTP request can include multiple queries, so the query object includes a numbered key (or numbered keys) with the actual queries. Also, in the future other formats than JSON might be supported, so each query specifies what format it uses.

The one type of query currently supported is findAll. It gives you all the information stored on the specific world in the indexer. The query is available at path /findAll and have two parameters:

chainId: The chain ID of the blockchain to query.
address: The address of the World. Based on the paramters in the development environment, pnpm dev uses the address in the sanity check.

To understand the result, it is easiest to look at it inside Node.

Read the result and enter node:

curl 'http://localhost:3001/findAll?batch=wd&input=%7B%220%22%3A%7B%22json%22%3A%7B%22chainId%22%3A31337%2C%22address%22%3A%220x5FbDB2315678afecb367f032d93F642f64180aa3%22%7D%7D%7D' > res.json
node

Read the result into a JavaScript object.

res = JSON.parse(fs.readFileSync("res.json"));

res is actually a list of results, which only has one item because we only had one key in the query. This item has one key, record, whose value also has only one key, data. Run these commands to verify the statements above and then cut res to the actual result.
```
res.length;
Object.keys(res[0]);
Object.keys(res[0].result);
res = res[0].result.data;
```
Inside the result there are two fields:
- meta, the metadata (opens in a new tab). In the case of JSON the field names are part of the data, so this primarily contains data types.
- json, the data read from the World. This field itself is divided into two values:
  - blockNumber, the block number in which these query results apply.
  - tables, a list of the tables in the World and their data.
You can use the filter function (opens in a new tab) to get a specific table. In this case, we are getting the Counter table.
```
counterTable = res.json.tables.filter((t) => t.name === "Counter")[0];
```
Finally, to get the actual record with the counter value, use:
```
counterTable.records[0].value.value;
```

Debugging

See here.

Indexing other blockchains

To index a different blockchain, you specify these environment variables:

Variable	Value
RPC_HTTP_URL	The URL to access the blockchain using HTTP
RPC_WS_URL	The URL to access the blockchain using WebSocket
START_BLOCK¹	The block to start indexing from. The block in which the `World` contract was deployed is a good choice.
SQLITE_FILENAME	`<blockchain name>.db`

(1) This value is optional, but highly recommended in chains where the world was deployed after a large number of blocks had already passed.

After you do that, start the indexer with this command:

pnpm start:sqlite

Note that in your queries you need to specify the correct chainId for the chain you are indexing.

Using the indexer with PostgreSQL

To run the indexer with PostgreSQL (opens in a new tab), follow these steps:

Download and install PostgreSQL (opens in a new tab).

To use the default setting (local PostgreSQL, using the postgres database, indexing the local anvil chain), run:

pnpm start:postgres:local

Otherwise, set these environment variables:

Variable	Value
CHAIN_ID	ChainID for the chain you index
RPC_HTTP_URL¹	The URL to access the blockchain using HTTP
RPC_WS_URL¹	The URL to access the blockchain using WebSocket
START_BLOCK²	The block to start indexing from. The block in which the `World` contract was deployed is a good choice.
DATABASE_URL	URL for the database, of the form `postgres://<host>/<database>`

(1) Not necessary in the case of a Lattice testnet.

(2) This value is optional, but highly recommended in chains where the world was deployed after a large number of blocks had already passed for performance reasons.

And then run:

pnpm start:postgres

Reading information from PostgreSQL

You can now use psql (opens in a new tab) to view the indexer information.

Note: This example uses the template (opens in a new tab). If you are indexing a different World, change the world address and use the name of a table you have rather than Counter.

Get the list of schemas to verify __mud_internal is there.
```
\dn
```
Get the list of relations in the schema.
```
\dt __mud_internal.*
```
You should get two tables, chain and tables.
Get the list of chains being indexed.
```
SELECT * FROM __mud_internal.chain;
```
The tables table is a lot bigger, so first get the list of columns within the table.
```
\d+ __mud_internal.tables
```
Get the list of MUD table names.
```
SELECT name FROM __mud_internal.tables;
```

Get the key and value schemas for Counter.

SELECT key_schema, value_schema FROM __mud_internal.tables WHERE name='Counter';

The schema for a world is <address>__. Get the list of tables for a specific world.
```
\dt "0x5FbDB2315678afecb367f032d93F642f64180aa3__".*
```

Specify the world schema is the default.

SET search_path TO "0x5FbDB2315678afecb367f032d93F642f64180aa3__";

Get the value of the counter.
```
SELECT * FROM "Counter";
```

Deleting the PostgreSQL data

PostgreSQL is a persistent data storage. This means that if you restart the blockchain without manually deleting the data, your indexer will use old data and provide incorrect results.

To avoid this, enter psql and run this command:

DROP OWNED BY <your user name>;

MUD client CLI