The Data Access financial asset class - Thesis
Web3 protocols allow Data to be packaged into tradeable instruments while preserving access (ownership, usage and monetization) rights. This core primitive will have several downstream effects, some of which I try to capture in this thesis.
I expect these effects to play out in the following broad categories which I describe in detail next.
I'd strongly suggest reading the previous articles (1, 2) in this series if you're unfamiliar with what financial assets are and how Data Access might be packaged into a financial asset.
a. Data Asset financialization
b. Types of Data Assets
c. Data Access valuation
d. Players in the Data Access economy
e. Emergent effects due to other Web3 developments
f. The role of financial and political regulators
g. Conclusion
Data Asset financialization
If we grant the premise that Data Access rights will be tokenized, a few things follow quite naturally. Data Asset ownership will be tokenized in some form of Non-Fungible Tokens (NFTs) while Data Asset usage (for real-world applications or for trading and speculation) will be tokenized in some form of Fungible Tokens.
Downstream of these broad token types, a number of financial instruments will emerge. For tokens representing Data Access with sufficient underlying liquidity and market demand, derivative and related instruments such as tokenized bonds, options, futures, insurance contracts and loan collateral will begin to trade. Money flows from Data Asset monetization will be distributed to Data Token holders.
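As a rough illustration of how monetization cash flows might reach token holders, here is a minimal sketch of a pro-rata split. The function name, the holder names and the balances are hypothetical, and a real protocol would do this on-chain with its own rounding and fee rules.

```python
from decimal import Decimal


def distribute_revenue(revenue, balances):
    """Split `revenue` among holders in proportion to their token balances.

    `balances` maps a (hypothetical) holder name to the number of fungible
    Data Tokens held. Decimal is used to avoid float rounding drift.
    """
    total = sum(balances.values())
    if total <= 0:
        raise ValueError("total token supply must be positive")
    return {
        holder: revenue * Decimal(balance) / Decimal(total)
        for holder, balance in balances.items()
    }


# Illustrative numbers only: 1000 units of revenue, three holders.
payouts = distribute_revenue(
    Decimal("1000"), {"alice": 600, "bob": 300, "carol": 100}
)
```

Each holder's payout scales linearly with their share of supply, which is the simplest possible distribution rule; vesting, lock-ups or fee tiers would sit on top of it.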
Much like our current day markets, value capture will be Pareto distributed. Large, mid and small-cap Data Asset indices will emerge.
Downstream of tokenization, Data Access will become commoditized. In other words, there will be a fiat (or equivalent crypto) denominated value to every row of Data in a structured database. Similar metrics will be created for unstructured Data. The market value of Data at any given time will be a reflection of supply-demand dynamics. Corporate finance will adapt to reflect the Current Monetary Value of Data (CVD) on balance sheets as an asset.
Arbitrage is the lifeblood of financial markets. Data Asset Markets will be no different. Opportunities for arbitrage will emerge as Data Assets get listed as tokens (fungible and non-fungible) on multiple exchanges.
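To make the arbitrage point concrete, here is a minimal sketch of the net profit from buying a Data Token on one exchange and selling it on another. The flat per-leg fee rate and all prices are assumptions for illustration; slippage, gas and transfer delays are ignored.

```python
def arbitrage_profit(buy_price, sell_price, quantity, fee_rate=0.003):
    """Net profit from buying on one venue and selling on another.

    A proportional fee (`fee_rate`, an assumed value) is charged on each leg.
    A negative result means the price gap does not cover the fees.
    """
    cost = buy_price * quantity * (1 + fee_rate)
    proceeds = sell_price * quantity * (1 - fee_rate)
    return proceeds - cost


# Hypothetical quotes for the same Data Token on two exchanges.
wide_gap = arbitrage_profit(buy_price=10.0, sell_price=10.5, quantity=100)
narrow_gap = arbitrage_profit(buy_price=10.0, sell_price=10.05, quantity=100)
```

With these assumed numbers the wide gap is profitable after fees and the narrow one is not, which is why listings on multiple venues only create arbitrage when the price difference exceeds total transaction costs.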
Types of Data Assets
Dynamic Data that is continuously updated and reflects changing trends will be highly valued. Static but rare Datasets will capture some value in terms of one-time sales at a higher cost, but will not capture significant staking value. The difference in valuation between the two will come down to the probability of value loss due to Data Leakage. Static Datasets will be more vulnerable in this regard.
As the Web2 -> Web3 transition gains steam, the semi-unstructured data in public blockchains will continue to get structured. As-a-service Data models for public blockchain Data will co-exist with tokenized variants. Since blockchain Data is public, value additions (frequency and variety of data, extracting ML-ready useful features from this Data, cross-chain joins) will be the basis upon which datasets are distinguished from each other.
Structured and Unstructured Data are complementary goods. As more unstructured data becomes available, its per-unit price will drop. This will result in a significant increase in demand for Structured Data.
[Figure: Demand for Structured Data (y-axis) vs Price of Unstructured Data (x-axis)]
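The complementary-goods relationship can be sketched with a constant-elasticity demand curve. This is only an illustration: the functional form, the scale and the cross-price elasticity of -0.5 are assumptions, not estimates. The one property that matters is that the elasticity is negative, so demand for Structured Data rises as Unstructured Data gets cheaper.

```python
def structured_demand(unstructured_price, scale=100.0, cross_elasticity=-0.5):
    """Demand for Structured Data as a function of Unstructured Data price.

    Uses Q = scale * P ** cross_elasticity. A negative cross-price
    elasticity is the defining property of complementary goods.
    All parameter values here are illustrative assumptions.
    """
    if unstructured_price <= 0:
        raise ValueError("price must be positive")
    return scale * unstructured_price ** cross_elasticity


# Demand at a few falling price points for Unstructured Data.
curve = [(p, structured_demand(p)) for p in (4.0, 1.0, 0.25)]
```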
If Data is the new Oil, Algorithms are the new Refineries. As a parallel to how Deep Transfer Learning developed, Tokenized Algorithms trained on Tokenized Datasets will replace custom-built ML models for a wide variety of industries.
Data Access valuation
In a world where Data Access is a commodity, an ecosystem dedicated to the continuous evaluation and pricing of that data will emerge.
Even if a small fraction of daily generated data (~400+ Exabytes as of 2022) finds its way out, we will have a deluge of Data Assets coming on to the market.
As a consequence, formal Data Asset quality estimation and valuation techniques will be developed. Data Asset valuation will be based on 3U's: Uniqueness, Utility and Usability. Expect these qualitative metrics to be quantified in the coming years.
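One way the 3U's might eventually be quantified is as a single composite score. The sketch below is purely hypothetical: it assumes each "U" has already been scored between 0 and 1 by some upstream process, and it combines them with a geometric mean so that a Data Asset scoring zero on any one dimension gets a composite score of zero.

```python
import math


def three_u_score(uniqueness, utility, usability):
    """Composite score from the 3U's, each assumed to lie in [0, 1].

    The geometric mean is an illustrative choice: a dataset that is
    unique and useful but completely unusable scores zero overall.
    """
    scores = (uniqueness, utility, usability)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each score must be between 0 and 1")
    return math.prod(scores) ** (1.0 / 3.0)


# Hypothetical datasets: one balanced, one unusable despite high uniqueness.
balanced = three_u_score(0.5, 0.5, 0.5)
unusable = three_u_score(0.9, 0.9, 0.0)
```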
Formal Data Valuation will become a subject of study. Formal techniques analogous to present-day discounted cash flow analysis for equity valuation will be created for Data Valuation. Formal Techniques for estimating the quality of Data Assets based on sample Data Assets will get developed to guide buyers. Data Science and Economics will merge (not just in the econometric sense) to create a new sub-discipline dedicated to monetary analysis of Data.
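For readers unfamiliar with discounted cash flow analysis, here is a minimal sketch of how it could be applied to a Data Asset. It assumes the asset's future access revenues can be forecast per period and that a suitable discount rate is known; both are big assumptions, and the numbers are made up.

```python
def discounted_cash_flow(cash_flows, discount_rate):
    """Present value of cash flows received at the end of periods 1..n.

    `cash_flows` would be forecast access/licensing revenue for a Data
    Asset; `discount_rate` is the per-period rate (e.g. 0.10 for 10%).
    """
    return sum(
        cf / (1 + discount_rate) ** t
        for t, cf in enumerate(cash_flows, start=1)
    )


# Hypothetical dataset whose access revenue decays as the data goes stale.
present_value = discounted_cash_flow([1000.0, 600.0, 300.0], discount_rate=0.10)
```

A dataset-specific version would likely need an extra decay or leakage term, since, unlike equity, a Data Asset can lose value when the underlying data is copied or becomes outdated.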
Formal techniques for estimating Data Publisher reputation (based on Data availability, quality of datasets, publisher's stake in datasets) will get developed.
Similar to present-day security audits of smart contract platforms, credit risk analysis and fundamental equity analysis, assessing Data quality and converting that into valuation criteria for investment will become a separate industry category. The equivalents of Moody's, S&P and Fitch for Data Access will be born.
A key difference between Data Access and other commodities such as gold, oil and Bitcoin is that the underlying Data itself is not the bearer asset.
This is a problem, because valuation of the underlying Data Asset is predicated on the availability of access to the Asset! This is a dilemma unique to Data Access as an asset class. There will be an inherent tradeoff between exposing enough Data to enable formal valuation on one hand while retaining most Data to prevent Data Leakage. Privacy preserving techniques such as Differential Privacy and Compute-to-Data will gain greater importance in this light.
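To give a flavor of how Differential Privacy navigates this tradeoff, below is a minimal sketch of the Laplace mechanism applied to a dataset mean. A publisher could release such noisy statistics to help buyers value a dataset without exposing individual rows. The sketch assumes values are clipped to a known range and that the number of rows is public; the function names and parameters are mine, not from any specific library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` with epsilon-differential privacy.

    Values are clipped to [lower, upper]. Assuming the row count is public,
    changing one row moves the mean by at most (upper - lower) / n, which
    is the sensitivity used to scale the Laplace noise.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Smaller epsilon means more noise: more privacy, less precise valuation.
sample = [0.0, 1.0] * 500
noisy = private_mean(sample, lower=0.0, upper=1.0, epsilon=0.5, rng=random.Random(7))
```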
Players in the Data Access economy
The Web3 vision of empowering users by providing genuine ownership of their data will empower players who are enfeebled in the existing Web2 Data economy while creating new power players in the new Data Access economy.
Individuals
Data that pertains to individuals - browsing Data, purchase Data, biometric Data, health records - is typically of no value at the level of the individual. When aggregated across millions of individuals, this data becomes invaluable.
This insight has been weaponized in the Web2 Data economy under the guise of "improving services". The same insight will lead to Data Unions becoming major players in the Web3 Data economy. Due to the inherent friction in tokenizing personal Data, individual Data will flow into Data Unions. Data Unions like Swash, Data Union and Basic Attention Token will proliferate and compete for access to Individual Data.
Data valuation techniques described earlier will create new Data-Monetization-as-a-Service models. Users will be able to sell access to their personal data to a single Data Union for a higher price or to multiple Data Unions for a lower price. Web2 Data Giants will incentivize users through partnerships with Data Unions or provide direct monetization opportunities to lock in user Data.
Tokenization of Data and Formal Data Valuation will lead to some surprising consequences for individual Data. Services for the creation, anonymization, storage and monetization of all personal Data as NFTs will be created. Each type of personal Data reflects user preference and behavior that informs an aggregate Data distribution.
As anomalous data allows a better estimation of the limits of a Data distribution, such data could become significantly more valuable than "typical" Data. Consequently, users with such "anomalous" personal Data might have personal data NFTs that are significantly more valuable than what is typical.
Corporations
Corporate Data is of three main types. The first is Data generated during the course of operation - payrolls, benefits, employee records. The second is a product of the value that the company is trying to generate. This could be experimental Data for a new product or a custom design for instance. The third is Data from customers using these products.
The first and third types of Data will be subject to the same dynamics as individual data described previously - they will flow into Data Unions. For businesses that depend entirely on customer Data in current usage environments - e.g. Google or Meta - sharing customer Data becomes a competitive disadvantage.
There is no evidence to suggest that these companies will move willingly to a Web3 business model where customers are free to move across services with their Data. A clear competition between these Web2 business models and Web3-based competitors will emerge. Web3 competitors that can provide a user-friendly experience while allowing users to retain and monetize their Data will win.
The second type of IP-restricted Data described above will be converted to monetizable Data Assets if the problem being solved is insurmountable by the company holding the Data. This Data will flow into collaborative Data Unions that are industry-specific. An example of such Data is training data for self-driving cars. The bigger players in this field (Waymo, Tesla, etc.) have orders of magnitude greater access to driving data with which to train better self-driving models. Players new to the game have created consortia in an attempt to gain parity.
Having access to a greater pool of Data levels the Data Access playing field and focuses competition towards better Data Science/ML. The tradeoff will be acceptable to all but the largest companies in the space, leading to a two-tier system - bigger Corporations with deep pockets to generate their own Data and smaller companies that will collaborate on Data collection but compete on Data Science/ML. The latter category will be a price race to the bottom, with scaled compute and ML becoming commoditized. This competition will also force Big Data Giants to compete on prices and leave end-customers better off.
Middleware that helps corporations monetize their Data more effectively will get developed. Data Science focused companies - e.g. Databricks, Snowflake - will integrate Data anonymization, packaging, samples-as-data-advertising and tokenization flows into their ETL flows and Data offerings.
Data Specialists
This category includes purely Data related occupations like Data Scientists, Data Engineers, Data Analysts, ML researchers and engineers. It will also come to include new occupations such as economists, cryptographers and mathematicians due to the need for formal Data Valuation and Data Privacy-preserving techniques.
In addition to Corporate staffing for these roles, consultancies and boutique firms dedicated to this field will emerge. The emergence of Data/AI focused collectives such as Algovera will lead to greater opportunities for individuals to monetize their skills in these areas.
Data Asset holders will incentivize staking on their Datasets by publishing optimally distributed (from privacy and accuracy standpoint) samples of Data. Greater access to such Data and the availability of formal techniques to evaluate Data Quality will exponentially increase TVL in staked Data Assets such as Datasets and Models. Data Specialists will power Smart Money in a world that starves for cash-flow backed yield.
Entirely new job descriptions - for instance, "Credentialed Web3 Data Curator" - will emerge from some combination of the following:
1. Increased demand for structured blockchain Data.
2. The ability to get verified credentials (that reside on the blockchain) and expertise on specific tasks.
3. The ability to earn income by staking on Data Assets.
Investors
Assets that offer the possibility of a positive real yield in a world starved of it will inevitably attract the attention of the professional investor class. As in the present day financial system, the Data Access asset class will attract participation from the usual suspects: Data Asset arbitrageurs, Hedge Funds, Traders, Asset Management firms, Pension funds, Sovereign Wealth Funds and so on.
This new set of professional players in the Data Asset ecosystem will demand a high level of sophistication in Data Asset valuation that is presently absent. This demand will fuel the advances in Data Asset valuation mentioned in the previous section. Professional participation will go hand in hand with the creation of new investment-related products: Indices, Index Funds, ETNs and ETFs, Data-as-collateral and so on.
In previously developed asset classes, retail investor participation typically followed professional investor participation. Most of the price gains that result from being early to an asset class were therefore never accessible to retail participants. Contrary to the usual timeline (and analogous to the broader crypto market), retail investors with a long-term view and the conviction to hold tokens/coins of protocols that make a real contribution to the Web3 Data Asset space will be handsomely rewarded.
Note that I haven't mentioned Venture Capitalists (VCs) here since it is well known (to the point that it is contentious) that VC funding is driving Web3 infrastructure and application building.
Emergent effects due to other Web3 developments
In a previous article, I wrote about how unanticipated applications will emerge as a consequence of the composability of simpler Web3 and AI building blocks. Given the explosive growth in Web3 applications, it is not possible to write about this topic in any great detail. That said, it is possible to anticipate some future applications that will exist only due to new technological primitives enabled by Web3.
Let's take the example of streaming micropayments. Our present inability to pay for and settle transactions in the sub-microcent range holds back per-access-monetization business models that would make flat-fee models redundant. Programmable cash-flow protocols like Superfluid and the Lightning Network will enable new Data Access monetization models.
For instance, Data NFTs bestowing ownership of an underlying Data Asset will be able to be rented out for time periods of the order of seconds. Streaming payments for the rental time period will be made to the original owner. Micro-access tokens will provide limited access to infinitesimally small portions of Datasets or limited-time access to Algorithms. Strategies like Active Learning on Data Streams and per prediction-payment will become feasible due to micro-payments with almost negligible overhead.
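The per-second rental idea can be sketched as a simple accrual calculation. This is not the API of Superfluid or the Lightning Network; it is a hypothetical, off-chain illustration of how much has been streamed to a Data NFT owner at any instant, assuming a constant flow rate over a fixed rental window.

```python
from decimal import Decimal


def streamed_amount(rate_per_second, start_ts, now_ts, end_ts):
    """Amount streamed to the Data NFT owner so far.

    The rental runs from `start_ts` to `end_ts` (Unix seconds). Nothing
    accrues before the start, and accrual stops once the rental ends.
    """
    elapsed = max(0, min(now_ts, end_ts) - start_ts)
    return rate_per_second * elapsed


# Hypothetical 60-second rental at one millionth of a token per second.
rate = Decimal("0.000001")
halfway = streamed_amount(rate, start_ts=0, now_ts=30, end_ts=60)
after_end = streamed_amount(rate, start_ts=0, now_ts=500, end_ts=60)
```

Because the amount owed is a pure function of elapsed time, the renter can stop at any second and pay only for what was used, which is the property that makes flat fees unnecessary.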
Blockchain Oracles such as Chainlink, Pyth and others will serve two simultaneous roles: They will convey real-time pricing Data for Data-related underlying financial instruments while packaging this Data into tokens for sale.
Fast L1s such as Solana and protocols such as Serum will provide alternatives to DEX-based markets. In all but the most liquid Data token markets, DEXes will be the better alternative for the new financial asset class.
DAOs will incentivize functionally similar groups of Data Specialists to collaborate on making publicly available Data tokenizable.
Web3 dispute resolution mechanisms such as Kleros will attempt to adjudicate on matters such as Data Access Right infringement.
Data composability will revolutionize the ability to access and build on data. This will have downstream effects on the availability of structured data - in addition to tokenized raw or structured data, we will have tokenized Data Models.
I could go on, but this section would never really be complete, so I'll hit the pause button on this topic for now.
The role of financial and political regulators
Congress, the SEC and/or equivalent authorities in most countries will attempt to create and enforce regulations around the type of Data that can be traded on exchanges. Definitions of tradeable Data will be kept intentionally broad for maximum prosecutorial flexibility.
The distinction between crypto-assets like Bitcoins and Data Access Tokens is subtle. And not just because one is a Coin while the other is a Token. Coins like Bitcoin are bearer instruments - the owner of the keys owns the Bitcoins. Contrast this to Data Access, where a number of issues such as jurisdiction (for legal disputes) and liability arise.
The owner or trader of a fungible Data token does not own copyright over the underlying Data. The original copyright holder/minter of a Data NFT, on the other hand, does have ownership and might have eventual culpability in case violations of law are determined.
Data-derivative products, especially those that provide passive income or remotely intersect the definition of a security, will increasingly come under regulatory scrutiny. (I am writing this section on the day BlockFi announced a $100M settlement with the SEC.)
The development of crypto as a technology and simultaneously as an asset class has continued in an environment of mainstream indifference at best and outright hostility at worst. Even so, actual regulatory action around crypto assets has, up to this point, been fairly minimal. This looks set to change. The application and enforcement of existing rules on the crypto industry will not leave the new Data economy unscathed.
On the flip side, ~50-100 year old securities laws in multiple countries will have to be modernized due to the combination of:
- Increased demand for Data Access products for growth, income and long term investment.
- Increased demand for investor protection after multiple incidents of fraudulent behavior.
- Inability to properly regulate decentralized protocols under the ambit of existing securities laws.
As a consequence of these developments, Corporates, protocol teams and research institutions will weigh economic incentives for Data financialization against the possibility of flouting sanctions. This will limit the kind of Data that will broadly be available to be financialized. This will have a chilling effect on the availability and sharing of potentially useful corporate and research-related Data (for instance, in sensitive areas such as semiconductors) despite the economic and societal incentives.
The geopolitics of Data Access financialization will not be an even landscape. For instance, the GAIA-X project in Europe has already made very forward-thinking moves with regard to regulations on Data sharing. It's not difficult to imagine that China and the US will have very divergent views on export control of Data Access in matters of national importance.
Conclusion
The tokenization and consequent financialization of Data Access is a process that has just begun in recent years. This process will play out fully in the coming years and create new players, ecosystems, winners and losers in its wake.
Fully anticipating the future Idea Maze of the downstream consequences of this financialization is impossible. Assuming that Data Access will be a financial asset in the future, however, does provide some firm ground in trying to make predictions of what is likely to support such a financial asset class.
There is, deservedly, enormous enthusiasm in communities that have currently invested their time and money into making Data Access a true Financial Asset. This thesis is written in acknowledgment of the enormous progress that has been achieved in this regard and the shining future that likely lies ahead.
That said, it is important to keep in mind that the process of financialization of an asset involves several factors besides just price and technology. Factors such as valuation, collaboration between new players in the space and proper regulation are arguably as important to successful, long-term financialization as the ability to simply trade Data Access.
Realizing the vision of Data Access as a bona fide financial asset will need a lot of the different pieces highlighted above (beyond price and technology) to fall into place. In this thesis, I have tried to highlight some of these aspects that are generally not mentioned in mainstream discussion of the new Data Economy. This is partly to consolidate my own vision for the space and partly to bring an alternative perspective to members involved in communities building Data into a financial asset.
I hope you found this series of articles useful. As usual, please do not forget to leave bouquets and brickbats on Twitter @antaraxia_kk. And please do not forget to subscribe to this free newsletter for new articles on topics at the intersection of Web3 and AI. Until next time, Adios!