Hunter Lampson

Posted on May 17, 2022. Read on Mirror.xyz

Mapping the Ecosystem of Decentralized Cloud Storage (Part 1)

Understanding the Global Datasphere

The digitization of human life is creating explosive demand for data storage and retrieval. From 2015 to 2025, the global datasphere is estimated to grow at a CAGR of 26%, with over 180ZB of data created, stored, and replicated by 2025. If you were to stack enough 10-terabyte hard drives to fulfill the world’s data needs by 2025, the stack would literally reach the moon.
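
The moon comparison can be sanity-checked with a few lines of arithmetic. The sketch below is a rough check only: the drive thickness (about 26.1 mm for a 3.5-inch drive) and the average Earth-Moon distance (about 384,400 km) are assumptions that are not stated in this article.

```python
# Rough check of the "stack of drives to the moon" comparison.
ZETTABYTE = 10**21   # bytes, decimal definition
TERABYTE = 10**12    # bytes, decimal definition

datasphere_bytes = 180 * ZETTABYTE     # figure quoted in the text
drive_capacity_bytes = 10 * TERABYTE   # 10 TB per drive, as in the text

# Assumption (not from the article): a 3.5-inch drive is about 26.1 mm thick.
DRIVE_THICKNESS_M = 0.0261
# Assumption (not from the article): average Earth-Moon distance in km.
EARTH_MOON_DISTANCE_KM = 384_400

drives_needed = datasphere_bytes // drive_capacity_bytes
stack_height_km = drives_needed * DRIVE_THICKNESS_M / 1000

print(f"drives needed: {drives_needed:,}")
print(f"stack height:  {stack_height_km:,.0f} km")
print(f"reaches the moon: {stack_height_km >= EARTH_MOON_DISTANCE_KM}")
```

Under these assumptions the stack comes out somewhat taller than the Earth-Moon distance, so the comparison holds as an order-of-magnitude statement.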

The growth of the global datasphere from 2015 to 2025.

In economic terms, the cloud storage market was valued at roughly $76B in 2021; by 2028, it is projected to reach $390B (a CAGR of 26.2%). Despite such explosive economic growth, market share among cloud storage providers continues to consolidate. As of Q1 2022, the 3 largest cloud storage providers, namely Amazon Simple Storage Service (S3), Microsoft Azure, and Google Cloud Storage, captured 65% of market share. Importantly, the Big 3’s collective dominance is growing at an increasing rate. In the past 12 months, the Big 3 saw 34% YoY growth in cloud storage alone, while their competitors grew between 10% and 20% YoY. Since 2018, smaller players in the space have lost 12 percentage points of market share, falling from 48% to 36%.
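
The market-size figures can be cross-checked against the quoted growth rate. This is a minimal sketch of the standard CAGR formula; the function names `cagr` and `project` are illustrative helpers, not taken from any source cited here.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate implied by a start value and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


MARKET_2021_BN = 76.0    # $B, from the text
MARKET_2028_BN = 390.0   # $B, from the text
QUOTED_CAGR = 0.262      # from the text
YEARS = 2028 - 2021

implied_rate = cagr(MARKET_2021_BN, MARKET_2028_BN, YEARS)
projected_2028 = project(MARKET_2021_BN, QUOTED_CAGR, YEARS)

print(f"implied CAGR 2021-2028: {implied_rate:.1%}")
print(f"2028 value at the quoted CAGR: ${projected_2028:,.0f}B")
```

Both directions land close to the quoted numbers (an implied rate of roughly 26% and a 2028 value just under $390B), so the figures are internally consistent.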

The centralized cloud storage market is consolidating, enabling an oligopoly that can artificially manipulate prices and make it more difficult for new entrants to survive. The power possessed by centralized cloud storage (CCS) providers compounds their network effects, reputations, technological infrastructures, and balance sheets such that potential competitors are simply unable to compete.

Types of Cloud Storage Solutions

  1. On-Premise
  2. Centralized Cloud Storage (CCS)
  3. Decentralized Cloud Storage (DCS)

On-premise storage and CCS providers, including the Big 3 (Amazon, Azure, Google) as well as Alibaba Cloud, Salesforce, IBM Cloud, Microsoft OneDrive, Box, iCloud, Dell, Adobe, DigitalOcean, and more, are characterized by their location-centric storage approach. This means information is stored in a single location (or a small handful of locations, usually 3 or fewer), managed in a single database, and operated by a single entity. In short, both on-premise and CCS solutions are prone to adverse outcomes because they represent a single point of failure.

The proliferation of CCS solutions requires a historical glance at the economics of on-premise data storage. At first, companies and users stored data on their own hardware. This meant that data was stored and maintained in the same physical location as the entity that wished to store it (a user’s home computer or a company’s onsite data server). I refer to this as Phase I.

The three phases of cloud storage adoption.

As the network effects of cloud storage enabled cheaper (and oftentimes more secure) storage capabilities, consumers and companies gradually moved to the centralized cloud. At first, these solutions provided easier, cheaper, and more secure ways to access data—but not without making important trade-offs. Fundamentally, the architecture remained the same: one receptacle was responsible for 100% of an entity’s data. In exchange for lower pricing and higher levels of security, users gave up control over their data. Today, most companies rely partially or fully on CCS providers (Phase II).

In the CCS regime, what was once economically optimal has become economically imprudent. Given the exponential growth in demand for data storage, companies are now forced to be more selective in choosing their storage solutions due to the high costs of storage and the increased importance of safeguarding data.

Key Weaknesses of CCS Solutions

  1. Lack of Data Ownership
  2. Prone to Data Breaches & Outages
  3. Prone to Censorship
  4. High Relative Costs

Lack of Data Ownership

When users upload data to a CCS provider, they no longer (necessarily) own their data. Apple’s controversial decision (later reversed) to scan iCloud users’ photos for child pornography is a clear example of this. Apple boasts strict privacy-protecting policies when data is stored locally (on-premise) on a user’s hardware product (iPhone, Mac, etc.). But the moment a user uploads a single byte of data to iCloud, Apple considers this data to be in its domain, no longer in the domain of the user. This precedent implies that data stored in the cloud can come to belong to the storage provider.

Data Breaches & Outages

One need not look far to find massive data breaches among CCS providers. Amazon, Azure, and Google have each suffered breaches, given their single-point-of-failure structure.

The centralized construction of these providers allows them to build ‘large walls’ and provide a higher level of security relative to on-premise solutions. At the same time, the larger and more centralized a database becomes, the more attractive a target it becomes for attackers. Data outages, too, are commonplace among CCS solutions. Examples can be seen here: Amazon, Azure, Google.

Data Censorship

CCS providers not only lose data through breaches and outages, but they also intentionally remove it. Just last week the popular YouTube channel Bankless was terminated with no warning, notification, or justification. Google, which owns YouTube and stores its content on its cloud service, thankfully reinstated the channel, but the episode shows that Google and other CCS providers retain the ability to delete any data stored with them.

Cost

Perhaps the most critical drawback of CCS solutions is their high fees. Although the cost to store data has decreased by an average of 30.5% per year over the past 50 years, CCS prices have been flat for the past ~7 years. The previous decade of downward-trending prices has come to a halt. In theory, the amount of storage a dollar buys should be rising in step with the ~30.5% annual decline in cost; in practice, it has remained flat. This has not been the case for DCS providers.

The cost of data storage has decreased by an average of 30.5% per year for the past 50 years.

Despite the decrease in storage costs, pricing from the Big 3 has been flat for the past 7 years.

The primary reason for the delta between the price of storage and the cost of storage is the market dominance that CCS providers maintain today. Again, DCS models have followed a different path.
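
To see how quickly this gap compounds, the sketch below tracks the underlying cost of a unit of storage against a flat price over the 7-year window mentioned above. Both series are normalized to 1.0 in year 0, which is an illustrative assumption; only the 30.5% annual decline and the 7-year window come from this article.

```python
ANNUAL_COST_DECLINE = 0.305   # average yearly decline in storage cost, from the text
YEARS_FLAT = 7                # years of flat CCS pricing, from the text


def relative_cost(years: int, decline: float = ANNUAL_COST_DECLINE) -> float:
    """Cost of a unit of storage after `years`, relative to 1.0 in year 0."""
    return (1 - decline) ** years


FLAT_PRICE = 1.0   # assumption: price normalized to 1.0 and held constant

for year in range(YEARS_FLAT + 1):
    cost = relative_cost(year)
    gap = FLAT_PRICE - cost
    print(f"year {year}: cost {cost:.3f}  price {FLAT_PRICE:.3f}  gap {gap:.1%}")
```

If the historical decline rate had continued through those 7 years, the underlying cost would have fallen to under a tenth of its starting level while the price stayed where it was.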

DCS Solutions

Addressing the weaknesses of CCS, decentralized cloud storage (DCS) represents a paradigm shift in the data storage landscape (Phase III). DCS solutions take a content-centric approach to data storage, meaning that data is identified and retrieved by what it is rather than by where it is stored, which allows it to be recalled more efficiently. Through decentralized marketplaces, DCS providers enable the utilization of idle hard drive space across a geographically distributed set of nodes. This diversification principle removes the single-point-of-failure risk that is present in both on-premise and CCS solutions while lowering costs and returning data ownership to users.

Cost comparison of the Big 3 and the 5 most prominent DCS solutions.

DCS solutions efficiently link the supply and demand of excess storage space. Today, 75% of hard drives worldwide are less than 25% full. At the same time, the sum of all storage capacity in data centers is much smaller than the storage capacity owned by individuals (think: iPhones, laptops, etc.). DCS providers take advantage of this wasted storage space to achieve lower costs and higher levels of data redundancy, reliability, security, scalability, and efficiency.

The four most prominent DCS providers, by market cap and traction, are Filecoin, Arweave, Sia, and Storj. (Other providers, such as StorX, Internxt, SAFE Network, Filebase, Ceph, and Swarm are not explicitly dealt with in this piece, given their lower levels of adoption.) An in-depth competitive analysis of each project is forthcoming in Part 2. A high-level overview of the 4 key players helps us better understand the state of the DCS market today.

To borrow from Spencer Applebaum and Tushar Jain, a helpful primary distinction to make between DCS services is the difference between contract-based and permanent storage solutions. Simply put, all DCS services in the market today are contract-based models, with the one exception of Arweave.

Contract-Based vs Permanent Storage Models

Filecoin, Sia, and Storj utilize a contract-based pricing model, the same model deployed by CCS incumbents today. Contract-based pricing means that users pay to store their data on an ongoing basis, similar to how one would pay for a subscription. Though Filecoin, Sia, and Storj have their differences, they all compete directly with existing CCS providers, albeit in a decentralized manner.

Arweave, on the other hand, uses a permanent storage model. This means that users pay a single, upfront fee, and, in return, their data is stored permanently. Too often, Arweave is lazily and imprecisely compared to other DCS and CCS providers. Though Arweave shares some similarities with these solutions, its defining feature is entirely distinct from that of its competitors. It is not that Arweave is entirely dissimilar from these providers; it is that Arweave is creating an entirely new market. It is best, and most accurate, to understand Arweave as a fundamentally different service from its perceived competition, namely Filecoin, Sia, and Storj.

Market map of contract-based and permanent storage solutions.

A closer look at Filecoin, Arweave, Sia, and Storj helps us better understand their similarities and differences in comparison to CCS providers.

Comparison of the 4 most prominent DCS solutions.

Filecoin

Filecoin, which launched its mainnet in October 2020, is the most widely adopted and well-funded DCS project in the market today. As of May 14, 2022, Filecoin had a fully diluted market cap of $1.81B, with an all-time high of $12.3B. Juan Benet is the Founder and CEO of Protocol Labs, the company that builds Filecoin and the InterPlanetary File System (IPFS). To date, Filecoin has raised $258.2M in funding, the majority of which came via an Initial Coin Offering (ICO) in late 2017. Filecoin’s lead investor is Andreessen Horowitz, with participation from Sequoia, Alumni Ventures, Winklevoss Capital, Boost VC, BlueYard, USV, Naval Ravikant, Digital Currency Group, Y Combinator, and more. Notable users of IPFS include Brave Browser, Opera Browser, Decrypt, MetaMask, Piñata, and Uniswap.

To understand Filecoin, we must understand IPFS, a peer-to-peer (P2P) distributed system for storing and retrieving data. Built to address the shortcomings of the HTTP-based internet, IPFS uses content-addressing to identify data, meaning that information is searched for and delivered on the basis of its content, not its location. This is achieved by issuing each piece of data a content identifier (CID), which is derived from a cryptographic hash of the data itself. A request on IPFS can therefore be served by any of the many nodes that hold the desired data, rather than by the single, centralized location that would exist in the HTTP model. Because IPFS requests data by its cryptographic hash rather than by the address where it is hosted, the retrieved bytes can be verified against the identifier, which increases the security of the system while enabling greater accessibility via higher levels of bandwidth.
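
The idea of content-addressing can be illustrated with a toy store in which the lookup key is computed from the bytes themselves. This is a simplified sketch, not the IPFS implementation: real CIDs wrap the hash in additional encoding, whereas this example uses a bare SHA-256 hex digest, and the `ContentStore` class is a hypothetical name introduced for illustration.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the data itself."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Assumption: a plain SHA-256 hex digest stands in for a real CID.
        content_id = hashlib.sha256(data).hexdigest()
        self._blocks[content_id] = data
        return content_id

    def get(self, content_id: str) -> bytes:
        data = self._blocks[content_id]
        # The requester can re-hash the bytes to confirm they match the identifier.
        if hashlib.sha256(data).hexdigest() != content_id:
            raise ValueError("content does not match its identifier")
        return data


# Two independent nodes storing the same bytes derive the same identifier,
# so a request for that identifier can be served by either of them.
node_a = ContentStore()
node_b = ContentStore()
cid_a = node_a.put(b"hello decentralized storage")
cid_b = node_b.put(b"hello decentralized storage")

print(cid_a == cid_b)
print(node_b.get(cid_a))
```

Because the identifier depends only on the content, it does not matter which node answers the request, and any tampering with the returned bytes is detectable by re-hashing them.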

IPFS is the communication network through which data is stored and transmitted, and Filecoin is the economic system built on top. IPFS alone does not incentivize users to store the data of others; Filecoin does. This is accomplished with two proof mechanisms: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt). PoRep is run once to verify that a storage miner has the content they say they do. Every on-chain PoRep includes 10 SNARKs (Succinct Non-Interactive Arguments of Knowledge), which prove that the contract was completed successfully. PoSt, on the other hand, is run continuously to prove that a storage miner is dedicating storage space to that same data over time. The on-chain interaction required to validate such a process is data-intensive, so Filecoin uses zk-SNARKs (Zero-Knowledge Succinct Non-Interactive Arguments of Knowledge) to generate these proofs and compress them by up to 10x.

Sia

Of the four DCS solutions discussed, Sia was the first to launch and did so in June 2015. Founded by David Vorick and Luke Champine at HackMIT in 2013, Sia has seen strong user traction, boasting a fully diluted market cap of $267M, with an all-time high of $2.97B. In September of 2020, Sia raised $3M in private funding from lead investor Paradigm, with participation from Bessemer, Bain Capital, Hack VC, Dragonfly, A.Capital, SV Angel, Collaborative Fund, and more.

Sia was launched by Nebulous Labs, which was established in 2014. Now rebranded as Skynet, the company and its main network, Sia, seek to reinvent the internet through the use of decentralized storage, applications, and content delivery networks (CDNs). Like Filecoin, Sia divides uploaded data into smaller composite parts (in this case, shards) and disperses them to different hosts around the globe. Unlike Filecoin, Sia verifies storage via a different Proof-of-Storage (PoS) mechanism. To prove that they store the data they say they do, hosts are required to share a small percentage of randomly selected data every so often (usually every 90 days). This proof is stored on the Sia blockchain, and the host is rewarded with Siacoin.
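
One common way to build this kind of spot check is a Merkle-tree challenge: the renter keeps only a root hash, picks a random segment, and the host must return that segment along with a short proof linking it to the root. The sketch below illustrates the general technique under stated assumptions; the 64-byte segment size, the SHA-256 hash, and all function names are choices made for this example, and it is not Sia's actual implementation or a hardened construction.

```python
import hashlib
import secrets

SEGMENT_SIZE = 64  # assumption: segment size chosen for this sketch


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(segments: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first. An unpaired last node is paired with itself."""
    level = [_h(s) for s in segments]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [_h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def build_proof(segments: list[bytes], index: int) -> list[bytes]:
    """Host side: sibling hashes on the path from segment `index` to the root."""
    proof = []
    for level in merkle_levels(segments)[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):   # unpaired last node is its own sibling
            sibling = index
        proof.append(level[sibling])
        index //= 2
    return proof


def verify(root: bytes, segment: bytes, index: int, proof: list[bytes]) -> bool:
    """Renter side: recompute the root from the segment and the sibling hashes."""
    node = _h(segment)
    for sibling in proof:
        node = _h(node + sibling) if index % 2 == 0 else _h(sibling + node)
        index //= 2
    return node == root


# The renter records the root before handing the data to the host.
data = bytes(range(256)) * 5
segments = [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]
root = merkle_levels(segments)[-1][0]

# Later, a randomly chosen segment is challenged.
challenge = secrets.randbelow(len(segments))
proof = build_proof(segments, challenge)
print(verify(root, segments[challenge], challenge, proof))
```

The host cannot predict which segment will be requested, so the only reliable way to pass repeated challenges is to keep all of the data.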

Today, Sia is predominantly used by Filebase, the first decentralized storage network compatible with Amazon Simple Storage Service (S3), and by Arzen, a consumer-focused decentralized storage platform.

Storj

Storj, like Filecoin and Sia, has gained significant traction since its launch in October 2018. Founded by Shawn Wilkinson, Storj has raised $35.4M to date from Tank Stream Ventures, BnkToTheFuture, Iterative Capital Management, TechSquare Labs, GVA Capital, and more. As of May 14, 2022, Storj had a fully diluted market cap of $235M, with an all-time high of $1.04B. Today, Storj is used by companies such as Fastly and FileZilla.

Storj is best differentiated from Filecoin and Sia by its Proof-of-Retrievability (PoR) mechanism and native S3 compatibility. Technologically, what differentiates Storj from both Filecoin and Sia is that it uses erasure coding alone, not Proof-of-Replication, to increase data redundancy. This means that data durability is not linearly linked to the expansion factor (the ratio of stored bytes to original bytes), so higher durability does not require a proportional increase in bandwidth. Given node churn, erasure coding could prove to be valuable over the long term because it requires less disk space and less bandwidth for storage and repair, despite its increase in CPU runtime.
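
The relationship between durability and expansion factor can be made concrete with a simple probability model. In a k-of-n erasure code, a file is split into k pieces and encoded into n, and any k pieces are enough to rebuild it; plain replication is the special case k = 1. The sketch below assumes each piece is lost independently with the same probability, and the parameter choices and the 10% loss rate are illustrative values, not figures from Storj or any other provider.

```python
from math import comb


def loss_probability(n: int, k: int, p_piece_loss: float) -> float:
    """Probability that fewer than k of n pieces survive (independent losses)."""
    p_survive = 1 - p_piece_loss
    return sum(
        comb(n, s) * p_survive**s * p_piece_loss ** (n - s)
        for s in range(k)   # s = number of surviving pieces, from 0 to k - 1
    )


P_PIECE_LOSS = 0.10   # assumption: each piece is lost with 10% probability

# Illustrative schemes as (n, k); none are taken from a real network's config.
schemes = {
    "3x replication": (3, 1),
    "erasure 10-of-30": (30, 10),
    "erasure 20-of-40": (40, 20),
}

for name, (n, k) in schemes.items():
    expansion = n / k
    p_loss = loss_probability(n, k, P_PIECE_LOSS)
    print(f"{name:18s} expansion {expansion:.1f}x  loss probability {p_loss:.2e}")
```

Under this model, the 10-of-30 code has the same 3x expansion factor as triple replication but a far lower chance of losing the file, and the 20-of-40 code is also more durable while storing only twice the original bytes.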

Arweave

In contrast to Filecoin, Sia, and Storj, Arweave provides permanent data storage. Launched in June of 2018 by Sam Williams (CEO) and William Jones, Arweave had reached a fully diluted market cap of $1.22B as of May 14, 2022, with an all-time high market cap of $2.88B. Arweave has raised $22M in funding led by Andreessen Horowitz, with participation from Coinbase Ventures, USV, Multicoin, 1kx, FJ Syndicates, Techstars, and others.

Arweave seeks to provide permanent data storage in a decentralized manner for a one-time fee. This is accomplished by Arweave’s tokenomics mechanism. Given that the cost of data storage has decreased by 30.5% per year for the past 50 years, Arweave assumes that a dollar will buy more storage in the future than it does today. This delta drives Arweave’s endowment pool: the principal is the upfront fee paid by the user, and the ‘interest’ is the increase in storage purchasing power over time, denominated in today’s currency. Arweave’s conservative assumption of a 0.5% decrease in storage price per year is intended to ensure the long-term viability of the endowment pool.
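
The reason a one-time fee can cover storage indefinitely is that a steadily declining yearly cost adds up to a finite total. If storing the data costs c in the first year and the cost falls by a fraction d each year, the costs form a geometric series whose sum is c / d. The sketch below works through that arithmetic under simplifying assumptions: the first-year cost is normalized to 1.0, and token price movements, replication counts, and miner incentives are all ignored. It is not Arweave's actual pricing formula.

```python
def perpetual_cost(annual_cost_today: float, annual_decline: float) -> float:
    """Total cost of storing forever: sum of c * (1 - d)**t for t = 0, 1, 2, ..."""
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be strictly between 0 and 1")
    return annual_cost_today / annual_decline


def cost_over_horizon(annual_cost_today: float, annual_decline: float, years: int) -> float:
    """Total cost over a finite number of years, summed term by term."""
    return sum(annual_cost_today * (1 - annual_decline) ** t for t in range(years))


COST_TODAY = 1.0             # assumption: first-year cost normalized to 1.0
CONSERVATIVE_DECLINE = 0.005 # 0.5% per year, the assumption cited in the text
HISTORICAL_DECLINE = 0.305   # 30.5% per year, the historical average cited in the text

needed_conservative = perpetual_cost(COST_TODAY, CONSERVATIVE_DECLINE)
needed_historical = perpetual_cost(COST_TODAY, HISTORICAL_DECLINE)

print(f"endowment needed at 0.5% decline:  {needed_conservative:.1f}x first-year cost")
print(f"endowment needed at 30.5% decline: {needed_historical:.2f}x first-year cost")
print(f"200-year cost at 0.5% decline:     {cost_over_horizon(COST_TODAY, CONSERVATIVE_DECLINE, 200):.1f}x")
```

Pricing against the 0.5% assumption requires an endowment of about 200 times the first-year cost, while the historical rate would require only a few times that cost. The difference between the two is the safety margin the conservative assumption provides.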

Arweave’s current cost of ~$4.33/GB reflects the terminal (one-time) cost. In the short term, solutions like Sia and Filecoin (and even the Big 3) are cheaper, but over the long term, Arweave is a more sensible choice. Even over the short term, users pay a premium for something nobody else can offer: data permanence. For some, demand for permanent storage is relatively inelastic because data permanence is required.

Arweave is powered by the blockweave, a collection of blocks that are linked to multiple previous blocks in the network. This allows the miners to provide Proof-of-Access (PoA) in order to add new blocks to the network while ensuring that existing data is continuously replicated across various nodes. Miners on Arweave are thus incentivized to replicate new and existing data in exchange for the native token.

Built atop the Arweave protocol is the permaweb, which is similar to the world wide web today, only permanent. Arweave is the base layer which powers the permaweb; the permaweb is the layer that users interact with. Given that Arweave is built on HTTP, traditional browsers have access to all data stored on the network, resulting in seamless interoperability. Notably, Arweave recently partnered with Solana to serve as the storage layer for the protocol. It has also partnered with notable projects like DecentLand and Copper.

Part 2 Forthcoming…

To gauge adoption, we must come up with a measure other than market cap. As Terry Angelos, former Head of Crypto at Visa, says, “Pricing is the worst way to measure crypto adoption.” Instead, we must look at usage.

Part 2 of this analysis is a deep dive into the competition between these four DCS solutions, covering their ecosystem growth, tokenomics, traction, and future viability.

P.S. This article is written on Mirror and is therefore stored on Arweave permanently.