Qazal

Posted on Nov 30, 2021 · Read on Mirror.xyz

A mathematical approach to NFT rarity

You might have just paid 20K for an NFT, but was it worth it?

NFTs come with a variety of attributes and they get high valuations for their uniqueness. But how do we find out what's unique?

NFTGO's new rarity model does exactly that. Under the hood, our model uses a set-similarity measure called Jaccard distance to make this possible.

What is Jaccard distance?

Jaccard distance is a metric for measuring how different two sets of data are: a distance of 0 means the sets are identical, and a distance of 1 means they have nothing in common. In our case, we treat each NFT as the set of its attributes, use Jaccard distance to compare NFTs with each other, and turn the results into a rarity score ranging from 0 to 100.

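Jaccard distance formula:

$d_J(A, B) = 1 - \frac{|A \cap B|}{|A \cup B|}$

The fraction $\frac{|A \cap B|}{|A \cup B|}$ is the Jaccard index $J(A, B)$, the share of attributes that A and B have in common.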

For two NFTs, the Jaccard index is the number of attributes they share divided by the total number of distinct attributes they have between them, and the Jaccard distance is one minus that ratio. Our model applies this to pairs of NFTs to measure quantitatively how similar they are. This opens up the possibility for us to estimate an NFT’s rarity scientifically.
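Both measures take only a few lines of code. The following is a minimal Python sketch, assuming each NFT is represented as a set of attribute strings; the function names and the attribute labels are hypothetical, and this is not NFTGO’s implementation.

```python
def jaccard_index(a: set, b: set) -> float:
    """Shared attributes divided by distinct attributes across both NFTs."""
    union = a | b
    if not union:
        # Assumption: two NFTs with no attributes at all count as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """One minus the Jaccard index: 0 for identical sets, 1 for disjoint sets."""
    return 1.0 - jaccard_index(a, b)


# Hypothetical attribute sets, for illustration only.
nft_1 = {"Background: Blue", "Hat: Cowboy"}
nft_2 = {"Background: Blue", "Hat: Fedora"}

print(jaccard_index(nft_1, nft_2))     # 1 shared attribute / 3 distinct attributes
print(jaccard_distance(nft_1, nft_2))  # 1 - 1/3
```

Python’s set operators `&` and `|` give the intersection and union directly, which keeps the sketch close to the formula.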


Why does Jaccard distance work? Part 1

This approach measures the overlap between two finite sets: the more attributes two NFTs share, the smaller the Jaccard distance between them. By measuring these distances across all the NFTs in a collection, we can gauge how rare a token is.

Deeper dive 

To understand this model on a deeper level, let’s step back from NFTs and consider a simple dataset with 3 sets: A, B, and C.

Let set A contain the numbers {1, 4, 8}, set B contain {9, 1, 10}, and set C contain {8, 4, 10}. We want to know how “rare” set A is relative to sets B and C.

We can do this by first calculating the Jaccard distance (JD) between A and B, and then doing the same for A and C. We take the average and then normalize the results across all three sets.

A and B have a total of 5 distinct values, {1, 4, 8, 9, 10}, and 1 value in common (1). Their Jaccard index (similarity) and Jaccard distance are:

J(A, B) = 1 / 5 = 0.2

d(A, B) = 1 - J(A, B) = 0.8

Now we perform the same calculation for A and C. They have two values in common (8 and 4) and a total of 4 distinct values, {1, 4, 8, 10}:

J(A, C) = 2 / 4 = 0.5

d(A, C) = 1 - J(A, C) = 0.5

The average JD between A and the other two sets is:

Average(A) = (0.8 + 0.5) / 2 = 0.65

If we calculate the average JD for B and C in the same way, we get 0.8 for B (it shares only one value with each of the other sets) and 0.65 for C.

Next, we normalize these averages with the formula below; later on, we will see how crucial this step is. We call the result the NFT’s z-score, although strictly speaking it is min-max normalization rather than a statistical z-score (which would subtract the mean and divide by the standard deviation).

z-score formula, where $x$ is an NFT’s average JD and $x_{\min}$ and $x_{\max}$ are the lowest and highest averages in the collection:

$z = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$

To get A’s z-score, we subtract the lowest average in the dataset from A’s average. Here the lowest average is 0.65, shared by A and C. Then we divide the result by the difference between the highest average (0.8, for B) and the lowest:

z(A) = (0.65 - 0.65) / (0.8 - 0.65) = 0

Finally, we multiply the z-score by 100 to get the rarity score:

A: 0

B: 100

C: 0

We can conclude that B, which overlaps least with the other sets, is the rarest, while A and C, which share two of their three values, are the least rare in this collection of 3 sets.
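To check the arithmetic, here is a minimal Python sketch of the toy example: pairwise Jaccard distance (1 - |A ∩ B| / |A ∪ B|), the average distance per set, min-max normalization, and scaling by 100. It illustrates the procedure described here and is not NFTGO’s production code; the function names are our own. Because it works with distances, set B, which shares only one value with each of the other sets, ends up with the highest score.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b| (assumes the sets are not both empty)."""
    return 1.0 - len(a & b) / len(a | b)


def rarity_scores(sets: dict[str, set]) -> dict[str, float]:
    """Average each set's distance to all others, then min-max scale to 0-100."""
    totals = {name: 0.0 for name in sets}
    for x, y in combinations(sets, 2):  # every unordered pair once
        d = jaccard_distance(sets[x], sets[y])
        totals[x] += d
        totals[y] += d
    n_others = len(sets) - 1
    averages = {name: total / n_others for name, total in totals.items()}

    lo, hi = min(averages.values()), max(averages.values())
    # Assumes hi > lo, i.e. the averages are not all identical.
    return {name: 100 * ((avg - lo) / (hi - lo)) for name, avg in averages.items()}


toy = {"A": {1, 4, 8}, "B": {9, 1, 10}, "C": {8, 4, 10}}
print(rarity_scores(toy))  # {'A': 0.0, 'B': 100.0, 'C': 0.0}
```

Dividing before multiplying by 100 keeps the top score at exactly 100.0 instead of a value that is off by floating-point rounding.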

Why does it work? Part 2

You now understand how the model transforms the data and gives you the final results. Now that it’s not a black box anymore, we can go back to the original question: “Why does it work?” This becomes clearer with some examples, this time from the real NFT world!

Let’s consider the CryptoPunks. This collection has 6969 NFTs in total. Our goal is to estimate the rarity of a single NFT in this collection relative to the other NFTs. Keep in mind that rarity is relative: in this model, we use data from the entire collection to determine an NFT’s rarity.

Let’s look at CryptoPunk #5577. This Punk has 2 attributes in total. Each of them may either add to its rarity or bring its score down. Let’s see how rare this NFT is.

A good way to visualize the model’s point of view is with Venn diagrams. We compare this Punk with each of the other 6968 NFTs and compute the Jaccard distance for every pair. As an example, here’s the computation for #5577 and #6965, showing each Punk’s rarity score (as computed by the model) and a Venn diagram of their attributes.

Figure: The rarest CryptoPunks

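The Jaccard index for these two is 1 / 3 (about 0.33), so their Jaccard distance is 1 - 1/3 = 2/3 (about 0.67). Remember that the index divides the number of shared attributes by the total number of distinct attributes across both NFTs, and the distance is one minus the index.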

We can see that these two NFTs are somewhat similar, but far from identical. The more similar two NFTs are, the closer their Jaccard distance is to zero. In the extreme case, two NFTs with identical attributes have a Jaccard distance of 0, so neither of them is “rare” relative to the other.

We repeat this calculation for every pair of NFTs in the collection. By averaging an NFT’s distances to all the others, we can see how rare it is relative to the rest of the collection.
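For a single NFT, the averaging step can be sketched in Python as follows. The collection is a hypothetical dict mapping token IDs to attribute sets (placeholder values, not real CryptoPunks data), and `average_distance` is a name of our own, not part of any NFTGO API.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical (assumption)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_distance(token_id: int, collection: dict[int, frozenset]) -> float:
    """Mean Jaccard distance from one NFT to every other NFT (needs 2+ NFTs)."""
    target = collection[token_id]
    others = [attrs for tid, attrs in collection.items() if tid != token_id]
    return sum(jaccard_distance(target, attrs) for attrs in others) / len(others)


# Hypothetical collection: token ID -> attribute set (placeholder values).
collection = {
    1: frozenset({"Type: Ape", "Hat: Cowboy"}),
    2: frozenset({"Type: Ape", "Hat: Fedora"}),
    3: frozenset({"Type: Male", "Hat: Fedora"}),
    4: frozenset({"Type: Male", "Hat: Cap"}),
}

for token_id in collection:
    print(token_id, round(average_distance(token_id, collection), 3))
```

Calling this function for every token compares each pair twice. A single pass over all distinct pairs needs n(n - 1) / 2 comparisons, which for 6969 NFTs is about 24.3 million.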

But the average alone is not enough: we also have to compare it with the average Jaccard distances of the other NFTs in the collection. This keeps us from overestimating or underestimating an NFT’s rarity based purely on its own score.

Assume that we have a collection of NFTs and want to calculate how rare one of them is. We do the previous two steps: we compute the Jaccard distances for all the pairs and then take the average. We’re happy because the NFT’s average distance is close to one, so we conclude that the NFT is super rare.

Can you guess what the problem is?

What if other NFTs have the same score? Let’s take the extreme case again. We have a collection of 100 NFTs, and all of them have one attribute: Color. Every NFT has a unique color, so no two NFTs share anything. If we aren’t aware of that and blindly average the Jaccard distances, we get the (seemingly) great result of 1, the largest possible distance.

Voilà! We have a super rare NFT on our radar.

Not quite.

The problem is that if we do the same two steps for all the other 99 NFTs, we get the same result. 
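A short Python sketch makes the problem concrete. It builds the hypothetical collection from this thought experiment, with placeholder color labels, and averages each NFT’s Jaccard distances to the other 99. Because no two NFTs share an attribute, every pair has a Jaccard index of 0 and therefore a distance of 1, so every NFT ends up with exactly the same average.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b| (both sets are non-empty here)."""
    return 1.0 - len(a & b) / len(a | b)


# The thought experiment: 100 NFTs, each with a single, unique "Color" attribute.
# The color labels are placeholders.
collection = [{f"Color {i}"} for i in range(100)]

averages = []
for i, nft in enumerate(collection):
    distances = [
        jaccard_distance(nft, other)
        for j, other in enumerate(collection)
        if j != i
    ]
    averages.append(sum(distances) / len(distances))

print(set(averages))  # {1.0}: every NFT gets the same, maximal average
```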

Is our NFT really unique and rare in that collection?

Well, we again go back to the fact that NFT rarity is a relative attribute. We can’t rely on results from only one NFT. 

Some approaches to NFT rarity calculation stop right here, but we now know why that isn’t enough: we have to continue with the last two steps, normalizing the averages and scaling the result to a rarity score from 0 to 100.

z-score formula, where $x$ is an NFT’s average Jaccard distance:

$z = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$

The numerator tells us how far the NFT’s average Jaccard distance sits above that of the least rare NFT in the collection.

In the denominator, we have the difference between the maximum average distance (the rarest NFT) and the minimum average distance (the least rare NFT). This measures the “diversification” of an NFT collection: if the collection consists of similar elements with few rare items, the difference between the maximum and minimum average Jaccard distances is small.

Dividing by this range keeps us from underestimating an NFT’s rarity just because its Jaccard distances to the other NFTs are small, when that same pattern runs through the whole collection. What counts is how the NFT compares with the rest.

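Going back to our example of 100 NFTs with one unique color each, every NFT has exactly the same average distance. The numerator is therefore zero for every NFT, and so is the range in the denominator: the formula tells us the truth, which is that no NFT in this collection is rarer than any other. (Because 0 / 0 is undefined, an implementation has to handle this case explicitly, for example by giving every NFT the same score.)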
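The normalization step, including the case where every average is identical, can be sketched in Python like this. The `normalize` name, the averages for the small collection, and the choice to give every NFT a score of 0 when the range is zero are all hypothetical; the source does not say how NFTGO handles that case.

```python
def normalize(averages: dict[str, float]) -> dict[str, float]:
    """Min-max scale average Jaccard distances to rarity scores from 0 to 100."""
    lo, hi = min(averages.values()), max(averages.values())
    if hi == lo:
        # Hypothetical policy for the 0 / 0 case: nobody stands out, so everyone gets 0.
        return {token: 0.0 for token in averages}
    # Divide first, then scale, so the top NFT gets exactly 100.0.
    return {token: 100 * ((avg - lo) / (hi - lo)) for token, avg in averages.items()}


# Hypothetical average distances for a small, varied collection.
print(normalize({"x": 0.25, "y": 0.5, "z": 0.75}))  # {'x': 0.0, 'y': 50.0, 'z': 100.0}

# The 100 single-color NFTs: every average distance is identical.
colors = normalize({f"NFT {i}": 1.0 for i in range(100)})
print(set(colors.values()))  # {0.0}
```

Any constant would do for the degenerate case; what matters is that the scores do not single out one NFT when the data gives no reason to.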

Stay focused on the truth

Nowadays, everyone is coming up with a new way of measuring NFT rarity. Many of these models are not understood very well by the users. My goal with this article was to give you the ultimate guide on how our rarity model actually works.

Trust is built through sharing knowledge. Now that you know the mechanism behind our rarity rankings, you can make a much more educated and precise decision when you use our Rarity metrics. 

NFTGO is the digital treasury for the metaverse. We aim to give you the best experience in the transition to a new world. The truth always lies behind the data and now that you know how we approach this data to calculate rarity, you can go and explore the model and know exactly how it works.

Don’t forget to share this with a friend and have fun showing off your rare NFTs!

Written by: Qazal