NFT Attribute Aggregation

Enver Podgorcevic, Solana NFTs

Note: Things are luckily different now with the introduction of the Metaplex collections standard. I'm keeping this article here just for historical purposes.

Introduction to Metaplex attributes

Currently the most popular token metadata standard on the Solana blockchain, the Metaplex Token Metadata Standard, specifies that tokens have an attributes array containing any number of attributes, each attribute having the following form:

{
	"trait_type": string,
	"value": string
}

A concrete example of such an array of attributes is given below.

{
  ...
  "attributes": [
    {
      "trait_type": "Attributes Count",
      "value": 2
    },
    {
      "trait_type": "Type",
      "value": "Skeleton"
    },
    {
      "trait_type": "Clothes",
      "value": "Orange Jacket"
    },
    {
      "trait_type": "Ears",
      "value": "None"
    },
    {
      "trait_type": "Mouth",
      "value": "None"
    },
    {
      "trait_type": "Eyes",
      "value": "None"
    },
    {
      "trait_type": "Hat",
      "value": "Crown"
    }
  ],
  ...
}

The rest of this document describes the design decisions for storing attribute data in our database, both for single NFTs and for collections.

Also, the methods presented below, especially the histogram method, could be applied to collection data before it’s sent to the front end, in order to make the UI more responsive.

Storing Single NFT Attributes

Word of caution: Since Metaplex did such a great job of standardizing the NFT metadata stuff, get ready for a bit of a rant from someone who felt firsthand the pain of having to deal with all sorts of wacky bugs caused by the stuff they did or did not standardize. If you’re interested only in the technical bits, please feel free to skip ahead to the last paragraph of this section.

First things first, Metaplex off-chain metadata is a mess. It’s not standardized, it’s sometimes (~25% of the time) stored on mutable storage where it can be changed to whatever the storage owner pleases, and generally it’s pretty restrictive and in my opinion not very well thought-out.

On-chain data is to some extent not so terrible: it has its flaws, but at least it’s well behaved and has a consistent format. Off-chain data, on the other hand, is a complete mess. Chaos. Mishmash if you will. Users can add random additional fields, there are no mandatory fields, it shares some of its fields with the on-chain metadata (which, unsurprisingly, don’t 👏necessarily 👏 share 👏 the 👏 same 👏 value 👏), it can even contain nested objects in places where one expects to find primitive values, and there are probably a dozen more cases that cause runtime exceptions that people haven’t come up with yet.


So, in order to keep things well behaved and avoid doing sanity checks on both the front end and the back end all the time, all data that does not comply with the standard specification is discarded. That is the first design decision regarding persistent storage of Metaplex metadata.
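
As an illustration of that filtering step, here is a minimal Python sketch, not the actual pipeline code. The function name `clean_attributes` is hypothetical, and the notion of "compliant" is an assumption: an attribute is kept only if it is an object with a string `trait_type` and a primitive `value` (a string, or a number as in the "Attributes Count" example above). Dropping individual attributes rather than the whole metadata document is also an assumption.

```python
from typing import Any


def clean_attributes(metadata: dict[str, Any]) -> list[dict[str, Any]]:
    """Return only the attributes that match the expected shape.

    Hypothetical helper: the compliance rules below are assumptions,
    not the rules used by the original pipeline.
    """
    raw = metadata.get("attributes")
    if not isinstance(raw, list):
        # Missing or malformed attributes array: nothing to keep.
        return []

    cleaned = []
    for attr in raw:
        if not isinstance(attr, dict):
            continue
        trait_type = attr.get("trait_type")
        value = attr.get("value")
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_primitive = isinstance(value, (str, int, float)) and not isinstance(value, bool)
        if isinstance(trait_type, str) and is_primitive:
            # Keep only the two known fields and drop any extra ones.
            cleaned.append({"trait_type": trait_type, "value": value})
    return cleaned
```

With this in place, nested objects, missing fields and stray non-object entries never reach the database, so later stages can rely on the shape of every stored attribute.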

Storing Attribute Information per Collection

Now that we got that rant out of the way, let’s discuss some of the more technical design decisions that had to be made when storing the attribute data associated with NFT collections.

The attribute data associated with an NFT collection should contain as much useful information as possible about the distribution of attributes throughout the collection. In order to talk more easily about attributes of the same kind, let’s define a trait as the collection of attributes that share the same name.

One natural way of gathering information about a trait is to create the set of all the different attribute values in the trait. In most normal collections, both the number of different traits and their individual sizes will be manageable.
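
The set-per-trait idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than the pipeline's actual code: the function name `collect_trait_values` is hypothetical, and each NFT is assumed to be represented by its already validated list of attributes.

```python
from collections import defaultdict


def collect_trait_values(nfts):
    """Map each trait name to the set of distinct values seen in the collection.

    `nfts` is assumed to be an iterable of attribute lists, one per NFT.
    """
    traits = defaultdict(set)
    for attributes in nfts:
        for attr in attributes:
            traits[attr["trait_type"]].add(attr["value"])
    return dict(traits)


collection = [
    [{"trait_type": "Type", "value": "Skeleton"}, {"trait_type": "Hat", "value": "Crown"}],
    [{"trait_type": "Type", "value": "Skeleton"}, {"trait_type": "Hat", "value": "Cap"}],
]
traits = collect_trait_values(collection)
# "Type" has a single distinct value, "Hat" has two.
```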

However, some of the less well-behaved collections end up with trait data so large that the current implementation of the pipeline cannot handle it, because of the size limits imposed by DynamoDB (400KB per item) and by SNS/SQS messages (256KB). The latter is the effective maximum size of a collection.

Collections larger than 256KB are rare, but they nevertheless exist. There are several reasons why some of the collection trait datasets can get so large, and here are some of them.
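
A size check of this kind might look like the sketch below. It is only an approximation under assumptions: the trait data is assumed to be serialized as JSON, the limit is taken to be the 256KB SNS/SQS figure (262,144 bytes), and the names `trait_data_size` and `is_oversized` are hypothetical. The real pipeline may measure size differently, for example including message envelope overhead.

```python
import json

# Assumed limit: the 256KB SNS/SQS message size, in bytes.
MAX_MESSAGE_BYTES = 256 * 1024


def trait_data_size(traits: dict) -> int:
    """Size in bytes of the trait data when serialized as UTF-8 JSON."""
    # Sets are not JSON-serializable, so convert them to sorted lists first.
    # key=str keeps the sort well defined for mixed string/number values.
    payload = {name: sorted(values, key=str) for name, values in traits.items()}
    return len(json.dumps(payload).encode("utf-8"))


def is_oversized(traits: dict, limit: int = MAX_MESSAGE_BYTES) -> bool:
    """True if the serialized trait data would not fit in one message."""
    return trait_data_size(traits) > limit
```

Only collections for which such a check returns true would need any of the special handling described next.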

NOTE: All the following methods are applied just to the problematic cases (0.04%) where there’s too much trait data per collection.

Numerical traits

Problem: Some collections have numeric traits. The values of these traits, especially if they’re floating-point numbers, are mostly unique across the NFTs in a collection. They’re mostly harmless, but if the collection has a huge number of NFTs, the size of such a trait can grow significantly.

Solution: If the trait data for a collection happens to be larger than the max size, the data is searched for all the numerical traits which are shared by a significant number of NFTs in the collection.

Significant number means that at least 25% of all the NFTs in a collection (both the ones which have this trait and the ones which don’t) need to have different attribute values for that trait. This check is done in order not to lose any information on rare traits that are possessed by only a handful of NFTs.

After these traits are identified, their numeric values are compressed into a histogram with 7 buckets. This number seemed like a good choice both for the back-end calculations and for the front-end display of such values, since 5 would be too low and 9 would be kind of cumbersome to keep track of. An even number of buckets would be just wrong. So instead of saving the attribute values in the usual way, a new field, histogramInfo, is added. It contains the min and max values together with a buckets field, which holds an array of 7 integers, each representing the number of occurrences of attribute values in that bucket’s range.
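
One way to read this rule is that a numeric trait qualifies when its number of distinct values is at least 25% of the total collection size. The Python sketch below implements that reading. The interpretation itself, the function name `compressible_numeric_traits`, and the expectation that values arrive already parsed as numbers (rather than numeric strings) are all assumptions.

```python
def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compressible_numeric_traits(traits: dict, collection_size: int, threshold: float = 0.25) -> list:
    """Names of numeric traits whose distinct-value count is 'significant'.

    `traits` maps a trait name to the set of its distinct values, and
    `collection_size` counts every NFT in the collection, including the
    ones that do not have the trait at all.
    """
    selected = []
    for name, values in traits.items():
        if not values or not all(_is_number(v) for v in values):
            continue  # only purely numeric traits are candidates
        if len(values) >= threshold * collection_size:
            selected.append(name)
    return selected


traits = {
    "Score": set(range(30)),    # 30 distinct values out of 100 NFTs
    "Level": {1, 2, 3},         # numeric, but too few distinct values
    "Hat": {"Crown", "Cap"},    # not numeric
}
selected = compressible_numeric_traits(traits, collection_size=100)
```

Here only "Score" is selected, so a rare numeric trait such as "Level" keeps its exact values.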
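
To make the compression step concrete, here is a small Python sketch of a 7-bucket histogram. The field names min, max and buckets follow the description above. The function name `build_histogram_info`, the equal-width buckets, and the rule that the maximum value falls into the last bucket are assumptions, since the text does not say how the bucket boundaries are chosen.

```python
def build_histogram_info(values, bucket_count: int = 7) -> dict:
    """Compress numeric values into min, max and per-bucket counts.

    Assumes equal-width buckets spanning [min, max]; the top edge is
    inclusive, so the maximum value lands in the last bucket.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot build a histogram from no values")

    lo, hi = min(values), max(values)
    width = (hi - lo) / bucket_count
    buckets = [0] * bucket_count
    for v in values:
        if width == 0:
            index = 0  # all values are identical
        else:
            # Clamp so that v == hi falls into the last bucket.
            index = min(int((v - lo) / width), bucket_count - 1)
        buckets[index] += 1
    return {"min": lo, "max": hi, "buckets": buckets}


histogram_info = build_histogram_info([0, 1, 2, 3, 4, 5, 6, 7])
```

However many distinct values the trait has, the stored result is always two numbers plus 7 integers, which is what keeps the trait data for a collection under the size limit.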

High-variance non-numerical traits

© Enver Podgorcevic