An Inquiry into the nature of Metaplex NFT Collections

Enver Podgorcevic, Solana NFTs

Note: Things are fortunately different now with the introduction of the Metaplex collections standard. I'm keeping this article here for historical purposes only.

Prologue

Until the recently published Metaplex Token Metadata Specification v1.1.0, NFT collections were quite an elusive subject. There wasn’t any kind of on-chain collection field where you could specify that an NFT belongs to a certain collection. Sure, you could have used, say, the symbol field, with a different value for each collection that you drop, but someone else could have used the same value and perhaps tricked people into thinking that their NFT belongs to your collection.

Or you could declare your collection in the off-chain metadata field collection, which holds the collection name and family.

The problem is that a lot of NFTs don’t have this field specified, so if we want an automatic way of grouping (pre-v1.1.0) NFTs into collections, we have to come up with some kind of heuristic for doing so. This document explores some of the most popular ways the community has used to group NFTs into collections.

Note: The dataset used for the analysis is the set of all Metaplex on-chain metadata acquired on January 13th, 2022, at ~19:00 CEST. It’s 3.2GB in size and contains 6270291 Metaplex NFT metadata instances.

Grouping NFTs into Collections

Pre-v1.1.0 Era

As previously noted, collection handling wasn’t that good for pre-v1.1.0 NFTs. Nonetheless, it would be good to have some sort of heuristic for grouping them, so that users could search not only for single NFTs but also for collections of them.

symbol field

This field seems to be the first candidate for collection grouping. NFTs from a lot of popular collections have their “unique” symbol (which can be used by anyone else). Let’s see how many different symbols there are. I used the following command to extract that data:

cat metadata.data | \
	jq ".data.symbol" | \
	sort | \
	uniq -c | \
	sort -r -n > \
	metadata.symbol

This command first extracts the symbol value for every NFT (jq), sorts the result lexicographically (sort), counts the number of adjacent duplicate lines ([uniq -c](https://man7.org/linux/man-pages/man1/uniq.1.html)), and then sorts the result again, this time in descending numerical order (sort -r -n). This way we get a file with all the different symbol values, each prefixed with its number of occurrences. Pretty nifty.

The first thing I noticed is that the most used symbol value is the empty string, “”, with 1453320 occurrences. That’s about 23.18% of all metadata instances.

There are 7508 different symbol values. If we exclude the symbols that occur only once (3117) and the empty string (1), we are left with 4390 different symbol names.
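The tally above can be reproduced from the counts file with a few lines of Python. This is only a sketch: it assumes the file has the `uniq -c` layout produced by the command above (a count, a space, then the JSON-quoted symbol), and the helper names `parse_counts` and `summarize` are made up for illustration. With the article's numbers the last step is simply 7508 - 3117 - 1 = 4390.

```python
def parse_counts(lines):
    """Parse `uniq -c` style lines ("<count> <value>") into (count, value) pairs."""
    pairs = []
    for line in lines:
        text = line.strip()  # uniq -c left-pads the count with spaces
        if not text:
            continue
        count, _, value = text.partition(" ")
        pairs.append((int(count), value))
    return pairs


def summarize(pairs, empty='""'):
    """Compute the figures discussed in the text from (count, value) pairs.

    `empty` is how the empty symbol appears in the file; jq without -r
    prints it as two double quotes (assumption based on the command above).
    """
    total = sum(count for count, _ in pairs)
    empty_count = sum(count for count, value in pairs if value == empty)
    return {
        "total": total,
        "distinct": len(pairs),
        "singletons": sum(1 for count, _ in pairs if count == 1),
        # symbols that occur more than once and are not the empty string
        "candidates": sum(
            1 for count, value in pairs if count > 1 and value != empty
        ),
        "empty_share": empty_count / total if total else 0.0,
    }


if __name__ == "__main__":
    # Toy data, not taken from the real dataset.
    sample = ['      5 ""', '      3 "DAPE"', '      1 "FOO"', '      1 "BAR"']
    print(summarize(parse_counts(sample)))
```

To run it on the real output, pass `open("metadata.symbol")` to `parse_counts` instead of the toy list.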

But this doesn’t tell us much about the actual collections, since everybody can use whatever symbol name they want. In order to get around that obstacle, we will have to consider additional fields.

Symbol + update authority fields

Combining the symbol and updateAuthority fields seems like the natural next step, since most NFTs from the same collection share the same update authority.

The following command saves a sorted list of the number of occurrences of all unique symbol + updateAuthority pairs:

cat metadata.data | \
	jq -r '"\"" + .data.symbol + "\" \"" + .updateAuthority + "\""' | \
	sort | \
	uniq -c | \
	sort -r -n > \
	metadata.symbol.update_auth

The only difference from the first command is the jq filter, which now emits the symbol and the update authority together as a quoted pair. The resulting dataset is much more sensible.
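For readers who prefer a script over a pipeline, the same pair counting can be sketched in Python. The sketch assumes that metadata.data holds one JSON object per line (jq itself is more lenient about layout) and that each object has the `.data.symbol` and `.updateAuthority` fields used in the jq filter. The function name `count_pairs` and the authority strings in the demo are hypothetical.

```python
import json
from collections import Counter


def count_pairs(lines):
    """Count (symbol, updateAuthority) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        pair = (record["data"]["symbol"], record["updateAuthority"])
        counts[pair] += 1
    # most_common() sorts by count in descending order, like `sort -r -n`
    return counts.most_common()


if __name__ == "__main__":
    # Toy records with placeholder authorities, not real addresses.
    sample = [
        '{"data": {"symbol": "DAPE"}, "updateAuthority": "AuthA"}',
        '{"data": {"symbol": "DAPE"}, "updateAuthority": "AuthA"}',
        '{"data": {"symbol": "DAPE"}, "updateAuthority": "AuthB"}',
    ]
    for (symbol, authority), count in count_pairs(sample):
        print(count, symbol, authority)
```

Keeping the pair as a tuple rather than a concatenated string avoids any ambiguity if a symbol itself contains quotes or spaces.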

The first interesting thing we see is that the collection with the most NFTs is the one with the symbol POWR. There are 100,000 of them, and the collection name is Fractals. To find the name I actually had to google the mint address, because it’s not written anywhere in either the on-chain or the off-chain metadata.

And sure enough, other big collections like DAPE and SolPunks can also be found at the top of the list.

Another interesting thing I found out is that this list can be used to find fake NFTs. Here’s one example of finding a fake DAPE NFT. If we look at the list of all symbols, we see that there are 11605 NFTs with the DAPE symbol. But if we look at the new list, we see that only 10039 DAPE NFTs share a common update authority. If we grep the list for all occurrences of DAPE, we see that there are a lot of fakes. Most of them come in groups of one per update authority, but some update authorities have almost 300 NFTs. Here’s a random fake that I found, and it’s a copy of this original. It is also interesting that there are more than 10,000 official Degenerate Apes. The 39 extra ones might be tests or maybe even some sort of secret NFTs, who knows.
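The grep step can be turned into a small helper. The sketch below is a heuristic, not a verdict: it assumes that the update authority holding the most NFTs for a symbol is the legitimate one and flags everything else as suspect. The function name `split_by_authority` and the authority strings are hypothetical, and the demo counts are toy values. With the article's figures, 11605 - 10039 = 1566 DAPE-symbol NFTs would fall outside the main authority.

```python
from collections import defaultdict


def split_by_authority(pair_counts, symbol):
    """Split one symbol's NFTs into the dominant authority and the rest.

    pair_counts maps (symbol, update_authority) -> number of NFTs.
    Returns (dominant_authority, dominant_count, others), where others
    maps every remaining authority to its count.
    """
    per_authority = defaultdict(int)
    for (sym, authority), count in pair_counts.items():
        if sym == symbol:
            per_authority[authority] += count
    if not per_authority:
        return None, 0, {}
    # Assumption: the largest group belongs to the original collection.
    dominant = max(per_authority, key=per_authority.get)
    others = {a: c for a, c in per_authority.items() if a != dominant}
    return dominant, per_authority[dominant], others


if __name__ == "__main__":
    # Toy counts with placeholder authorities.
    toy = {
        ("DAPE", "AuthMain"): 8,
        ("DAPE", "AuthX"): 2,
        ("DAPE", "AuthY"): 1,
        ("POWR", "AuthP"): 5,
    }
    dominant, count, others = split_by_authority(toy, "DAPE")
    print(dominant, count, sum(others.values()), "suspect NFTs")
```

The assumption can fail, for instance when a copycat mints more NFTs than the original, so any flagged group still deserves a manual look.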

But if we take a look at the second largest group sharing a symbol and an update authority, we see that it’s not an NFT collection in the usual sense of the word. It contains all kinds of assets from an NFT game. We could group these assets by their names, because similar assets have the same name, and thus create something similar to collections of NFTs.
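Grouping by name inside one such group could look like the sketch below. It assumes one JSON object per line and that the asset name sits at `.data.name`, next to `.data.symbol`; that path, the function name `names_within_group`, and all sample values are assumptions made for illustration.

```python
import json
from collections import Counter


def names_within_group(lines, symbol, update_authority):
    """Count asset names among records with the given symbol and authority."""
    names = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        data = record["data"]
        same_group = (
            data["symbol"] == symbol
            and record["updateAuthority"] == update_authority
        )
        if same_group:
            names[data["name"]] += 1
    return names


if __name__ == "__main__":
    # Toy records: a hypothetical game with two asset types.
    sample = [
        '{"data": {"symbol": "GAME", "name": "Sword"}, "updateAuthority": "AuthG"}',
        '{"data": {"symbol": "GAME", "name": "Sword"}, "updateAuthority": "AuthG"}',
        '{"data": {"symbol": "GAME", "name": "Shield"}, "updateAuthority": "AuthG"}',
        '{"data": {"symbol": "GAME", "name": "Sword"}, "updateAuthority": "Other"}',
    ]
    for name, count in names_within_group(sample, "GAME", "AuthG").most_common():
        print(count, name)
```

Each distinct name then plays the role of a sub-collection within the game's assets.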

© Enver Podgorcevic