Headjack - the base layer of cyberspace

Headjack is a blockchain that links sovereign identities to content at web-scale. Key points:

  • Creation is fundamentally different from transfers and exchange of value - the design space around trust & data availability for media and identity is different from finance.
  • Following the UNIX philosophy - in Headjack identity is simply an identifier (unique number) and anything orthogonal (KYC, profiles, privacy, finance) can be layered on top of it.

  • It solves single sign-on and allows for user experience similar to Web2 through hierarchical authorization management - keypairs are not required by default and even those with keys bound to their accounts may choose to not explicitly sign every interaction.

  • Consensus is reached on the absolute bare minimum - the history of authorizations, names, keys & off-chain content anchors (merkle roots) - the simplest mental model for developers.

  • Headjack can support billions of accounts and link unlimited amounts of off-chain activity to them. The entire web can be rebuilt on top of it - a claim that is easily provable.

  • Content addressing is with persistent & human-readable URIs (instead of hashes) - the link between identity and data is cryptographically provable even if keys & names have changed.

  • It doesn't deal with off-chain data storage and retrievability - those are separate problems and Headjack simply lets entities point to ways for others to retrieve addressable content.

Book structure

  • What is Headjack - How the protocol technically works and how things like applications, services, DMs, social graphs, preferences, etc. could be implemented - the building blocks necessary to recreate anything from Web2 and beyond.
  • Why Headjack - What's broken with the web and a blueprint of what could be possible - services, business models, infrastructure, algorithms, markets, metaverse, etc.

  • Implementation of Headjack - A detailed specification of the implementation.

What is Headjack

  1. Guiding principles & design goals
  2. Identity & authorization
  3. Content addressing
    1. Host vs data-centric
    2. Blobs & persistent URIs
    3. Names, paths, & more
  4. Messages
  5. IDMs, preferences & social graphs
  6. Storage & retrievability of data
  7. Blocks, state & proofs, oh my!
  8. Throughput numbers (scaling)
  9. Headjack vs the competition

Named after the data port at the back of the head of synthetically-grown humans in the Matrix.

Design - guiding principles

These are the guiding principles when aiming for mass adoption of Headjack:

Customer obsession & the best possible UX

It is highly improbable that the masses (and even most crypto natives) would tolerate services that are much worse (slower, more limited & more cumbersome) than what they are used to - and most of the competing attempts at decentralizing media are nowhere close. There are a few aspects to retaining the comforts and UX of Web2 that we've become so accustomed to:

  1. Nobody wants to deal with keys, wallets & self-custody because of all the headaches & complexities that come along with that. Creation & media are different from exchange & finance and it's OK to trust by default as long as there's a fallback. We should be aiming for better trust instead of trustlessness at the cost of UX. Users shouldn't have to manage keypairs on multiple devices & explicitly sign every interaction - by default they'll be logging into identity managers (IDMs) using email & passwords or SSO ("login with Google") and would then be using these IDMs as SSO to authorize applications to post on their behalf without requiring keys & signatures - by delegating trust. This way the majority of Web2 identity & authentication plumbing can be reused with Headjack underneath as just another backend. "Sign-in with Ethereum" doesn't scale - we should aim for familiarity.

    "With consumer products, simple and “wrong” beats complicated and “right.”" - @naval

    "The future of mass-market crypto experiences lies within apps that provide familiar, custodial experiences with the ability to graduate into non-custodial experiences." - a16z

  2. Users shouldn't have to think about and pay for the storage of their data & blockchain interactions by default - costs & complexity should be abstracted away & shifted to services in the background. Self-hosting is the opposite of customer obsession - let's aim for simplicity.

    "People don’t want to run their own servers, and never will." - Moxie

  3. Content addressing should be done with human-friendly URIs containing names & numbers instead of being full of hashes, as is typical for Web3. We're used to adequate URLs where the domain of the platform/website & even the user name are present when identifying content - hashes make URIs much longer & harder to remember. Contrast that to Headjack's addressing.

  4. The applications built on top of the network must match the responsiveness of Web2 and exceed its functionality. "Latency is not an option anymore" - Amazon found that every 100ms of latency cost them 1% in sales. 16 years ago Google found an extra 500ms in search page generation time dropped traffic by 20% - our irritable nature hasn't changed. Web2 isn't going anywhere - "market dynamics and the fundamental forces of centralization" dictate that the best services will be running on huge server racks in data centers with sophisticated caches & batch processing infrastructure due to data gravity.

Web-scale, blockspace & the UNIX philosophy

People grossly underestimate the size of the web and the required infrastructure. Here are some decade-old Twitter, Google and other statistics and a few articles about what it takes to run Twitter: 1, 2, 3, 4, 5. What was going on within a single minute of 2021 is truly mind-boggling.

Headjack follows the UNIX philosophy - it focuses only on identity (identifiers represented as numbers & name ownership) & linking data/actions to it without trying to do anything orthogonal (data storage, KYC, profiles, privacy, finance, etc.) that can be layered on top. It doesn't impose constraints on what could be built around it - separation of concerns. All kinds of systems with their own incentives, cryptoeconomics & guarantees can be implemented on top of this identity layer & addressing. The on-chain vs off-chain tradeoff and what goes into the blockspace is as follows:

  • Consensus should be reached on the absolute bare minimum - only identity (integers), the history of keypairs & authorizations, name ownership & anchors to off-chain activity need logical centralization and must be on-chain with guaranteed data availability.

  • All other activity & data is stored off-chain (IPFS & other protocols) because of the sheer volume - it's ephemeral and its relevance fades with time. Most of it won't be stored forever but any piece can be backed up through archives & IDMs. Events get cryptographically anchored with Merkle roots to the chain so that permissions, inclusion & sequence are provable.

"Developers care about risk." - Haseeb

It must be obvious & provable that the network has a credible path to handling billions of users if entrepreneurs are expected to jump on the opportunity. The easiest mental model will win over developers and users - singletons & opinionated frameworks with a concrete direction are much simpler than a fractured landscape of standards, chains & bridges.

"Consistency is incredibly important for creating a compelling user experience." - Moxie

Decentralization, neutrality & sovereignty

  • Sovereignty: Users should be able to own their identity & connections with a keypair even if by default their activity is managed by an IDM - resembling a custodial service.

  • Credible neutrality - Anyone can permissionlessly have a pseudonymous identity, buy a name, operate an IDM, serve media through an application and publish & broadcast through Headjack using their identity. Moderation & censorship happen at the application & infrastructure level - freedom of speech is guaranteed, but not freedom of reach as that is up to the applications & services that serve media. Individuals will always be able to contact one another for private communication.

  • Anyone can self-host & run software locally, browse the ecosystem, and fetch content & events from entities they've subscribed to (although quite bandwidth-intensive), but their experience will be extremely limited in that they won't be able to run any sort of query/filtration/feed algorithm at scale nor aggregate the activity of billions of people in real-time.

"You can build something centralized on something decentralized but you can’t build something decentralized on top of something centralized. Decentralization is always the base layer." - @RyanSAdams

Identity & authorization

There are 3 types of roles in Headjack (although a single entity may play all 3):

  • Normal accounts - represented by an integer ID on-chain - keypair association is optional.
  • IDMs - a superset of normal accounts - required to have a keypair - can manage other accounts by submitting changes for them (name handle, updating keypairs) & acting as SSO - authorizing applications to post on behalf of users (& ability to revoke the authorization). They will also be responsible for handling DMs as discussed here.
  • Applications - a superset of normal accounts - required to have a keypair - they are the media presentation layer. Users can authorize them through the use of an IDM to post on their behalf without having to explicitly sign every interaction (follow, post, comment, react).

All authorizations are represented & submitted on-chain as simple integer pairs (2131 => 83253, 6331 => 14415) that get aggregated in compact blobs & signed in bulk by IDMs - achieving a very high signal-to-noise ratio (few signatures), which improves the throughput in the valuable block space.
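As a rough illustration (field names, serialization and the signing function below are all assumptions, not the actual wire format), an IDM batching and signing such pairs might look like this:

```python
import hashlib
import json

# Hypothetical sketch - not the actual wire format: an IDM collects
# (user_id => application_id) authorization pairs and signs the whole
# batch once instead of each user signing every interaction.
authorizations = [(2131, 83253), (6331, 14415)]   # user_id -> application_id

payload = json.dumps(authorizations, separators=(",", ":")).encode()
digest = hashlib.sha256(payload).hexdigest()

def idm_sign(digest: str) -> str:
    # placeholder for whatever signature scheme the chain actually uses
    return "idm-signature-over-" + digest

signed_batch = {"authorizations": authorizations, "signature": idm_sign(digest)}
print(signed_batch)
```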

With this foundation we achieve the following range of usage scenarios:

  • Costs for using the blockchain can be shifted to IDMs & applications with business models to support that - users won't care that there's an underlying token (they'll always be able to interact with it directly through the mempool & pay for transactions if they wish).
  • Users won't need wallets & keypairs, which are risky and cumbersome to manage across multiple devices. Most will create accounts through IDMs & use email/pass or Web2 SSO ("login with Google") which will create on-chain integer IDs for them without associated keypairs - "owned" by the custodian. Users will be able to "log in" to applications using their IDM as SSO for Headjack which will authorize the application with a few bytes on-chain to post actions on behalf of users - all without requiring a single signature by the user - neither on-chain for the identity/authorizations (tiny bits of data - just integers & bit flags submitted by the IDM) nor for their off-chain content (posts, comments, reactions).
  • Users can revoke permissions to applications and even retroactively invalidate activity generated on their behalf by an application by saying "discard activity generated through application А from block X forward" through a small on-chain message published by their IDM because everything is sequenced. This is acceptable because in this blockchain such data is non-financial and fake activity has smaller consequences - it is still an enormous improvement compared to the current Web2 status quo.
  • At any point in time users can regain full sovereignty over their identities by binding a keypair through their IDM. Then they'll be able to cut that IDM off (revoke access) & even retroactively invalidate actions from it through another IDM or direct on-chain transactions.
  • Users can be completely anonymous by directly creating an identity with a keypair & paying for an on-chain transaction. They'll be able to use IDMs without having to sign with email/pass or a Web2 SSO - not revealing anything.
  • Applications will be usable by users that don't use an IDM but all their off-chain activity (posts, comments, reactions) will need explicit signatures.

So at the core of it all is just sequencing relations between integers & Merkle roots for content.

In practice, we expect that only cypherpunks & people that have something to lose (big audience/reputation) will go through the trouble to manage a keypair. Almost everyone will use IDMs - even most crypto natives don't want to explicitly sign every action and have their keys in hot wallets ready to get hacked. This way 99.9% of the user activity on-chain (mostly authorizations) ends up going through authorized services and gets batched in a compact way - requiring only that the service signs the aggregated payload and thus reducing the amount of signatures on-chain.

The vast majority of users will be lightweight: consumers & curators of content (through interactions & reactions) with very little creation on their part and little to no audience. At any point in time, they could shift to a more vocal role and start caring about archiving their off-chain data and not relying on the good grace of the infrastructure that sits beneath applications.

Key & session management (rotation, authorization & revocation) require ordering that is logically centralized. It is compatible with any type of DID - anything could be associated with an integer ID. The on-chain authorization has great synergy with the human-readable & persistent addressing for off-chain content.

Content addressing

The move from host-centric to data-centric addressing is a complete paradigm shift by itself but Headjack intertwines that with names, on-chain authorization and sequencing of anchors resulting in the best possible URIs in terms of human-readability & persistence - perhaps the most important aspect of the project. This chapter is broken down into a few sub-chapters:

  1. Host vs data-centric
  2. Blobs & persistent URIs
  3. Names, paths, & more

Host vs data-centric

Let's take a look at how the web works and at the building blocks & ideas that enable Headjack:

Today's web: host-centric

Today's web revolves around hosts & unicast communication - we query DNS to get the IP of servers and open direct connections with them to retrieve the data that they host. But domains, URI paths on servers & the actual files all change & go away which leads to link rot & content drift. Guidance such as "Cool URIs don't change" is just that - guidance - and the Internet Archive is just a bandaid that can hardly keep up with the digital memory hole. In the host-certified paradigm URLs at best point to a location at which a document may have been available at some point in time - devoid of any cryptographic proofs regarding the contents, the creator, an alternative way to retrieve it or the time of publication. The implications are explored in the host-centric section in the motivation.

"It is generally recognized that the current approach of using IP address both as a locator and as an identifier was a poor design choice." - David D. Clark, Designing an Internet

Data-centric computing

Data-centric computing is an emerging concept that has relevance in information architecture and data center design - data is stored independently of the applications, which can be upgraded without costly and complicated data migration. This is a radical shift in information systems that will be needed to address organizational needs for storing, retrieving, moving, and processing exponentially growing data sets. It increases agility by prioritizing data transfer and data computation. Applications become short-lived, constantly added, updated, or removed as algorithms come and go.

"Data is the center of the universe; applications are ephemeral." - The Data-Centric Manifesto

Content-addressable storage

Content-addressable storage (CAS) is a way to store information so it can be retrieved based on its content (not its location/name) and is a key piece of the puzzle. Identifiers are based on content and any change to a data element will necessarily change its content address. The most famous example of CAS is IPFS but it suffers from non-human-friendly addresses (hashes), performance issues, and extreme latency (tens of minutes) because of the global DHT if content is not widely cached/pinned.

Self-authenticating data moves authority from hosts to users. The three components that enable it are cryptographic identifiers, CAS, and an emerging area of research called verifiable computation, which is yet to be applied at any meaningful scale.

Information-centric networking

Information-centric networking (ICN) is an approach to evolving the Internet infrastructure away from a host-centric paradigm, based on perpetual connectivity and the end-to-end principle, to a network architecture in which the focal point is identified information (or content or data). Data becomes independent from location, application, storage, and means of transportation, enabling in-network caching and replication. The expected benefits are improved efficiency, better scalability with respect to information/bandwidth demand, and better robustness in challenging communication scenarios. In information-centric networking, the cache is a network-level solution, and it has rapidly changing cache states, higher request arrival rates, and smaller cache sizes.

Named Data Networking

Named Data Networking (NDN) is a Future Internet architecture that builds on top of the previous ideas (& an incarnation of ICN) in which data is requested by name and routed by the network. However, there are many unsolved challenges with it, like the need to reimplement foundational routing infrastructure to make it stateful, and its hierarchically structured names which require a root name authority to link them to keypairs - something outside of its scope. Here's a great lecture on the topic.

Enter Headjack - the final iteration

Headjack is a weird amalgamation inspired by everything above - it provides human-readable & persistent URIs for self-authenticating data (with Merkle proofs & the blockchain) along with the means for its retrieval without forcing a specific way (IPFS is just one option). It acts as the web-scale global index used to check the authenticity of documents (requires consulting with the chain), ownership of names, key management & sequence of events throughout time. It is an addressability layer on top of the current host-centric internet technologies.

Blobs & persistent URIs

This chapter will explain how all off-chain messages (actions/events/content) get published:

Blob construction - batching of user events

Applications accumulate off-chain activity from users and cryptographically anchor it in batches with a Merkle root on-chain. How often they do so is up to them (it doesn't have to be on every block) - those with little activity may submit only once per minute or even less often - the frequency is determined by the volume of activity and the on-chain publishing costs.

When enough activity has been collected it is time for the application to finalize the batch: it is packed into a blob and all the events generated since the last anchored batch are sorted & grouped by account in some deterministic way (for example, accounts ordered by index and actions by type/sequence) according to a schema, with the following steps:

  1. The intra-blob index (offset table) for lookup of content of specific accounts is generated.
  2. A Merkle root that touches every event is deterministically constructed following a schema.
  3. The IPFS CID (hash) for the blob is generated and it is pinned for others to download.

The only 2 things that are signed & submitted on-chain are thus the Merkle root and the IPFS CID for the next nonce (auto-increment counter) associated with the application account.
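A minimal sketch of this batching flow, assuming JSON-encoded events with hypothetical account_id/seq fields and a toy Merkle construction (the real schema is deliberately unspecified here):

```python
import hashlib
import json
from collections import defaultdict

def merkle_root(leaves: list[bytes]) -> bytes:
    """Toy Merkle root: pairwise-hash until a single node remains."""
    nodes = [hashlib.sha256(l).digest() for l in leaves] or [hashlib.sha256(b"").digest()]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        nodes = [hashlib.sha256(nodes[i] + nodes[i + 1]).digest()
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def build_blob(events: list[dict]) -> dict:
    # 1) deterministically sort & group all events since the last anchor by account
    grouped = defaultdict(list)
    for e in sorted(events, key=lambda e: (e["account_id"], e["seq"])):
        grouped[e["account_id"]].append(e)

    # 2) build the intra-blob offset table so specific accounts can be looked up
    body, offsets, cursor = b"", {}, 0
    for account_id, evs in grouped.items():
        chunk = json.dumps(evs).encode()
        offsets[account_id] = (cursor, len(chunk))
        body += chunk
        cursor += len(chunk)

    # 3) a Merkle root that touches every event - this (plus the blob's IPFS CID
    #    once it is pinned) is what gets signed & submitted on-chain
    leaves = [json.dumps(e, sort_keys=True).encode() for e in events]
    return {"header": {"offsets": offsets}, "body": body,
            "merkle_root": merkle_root(leaves).hex()}
```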

Stable intra-blob addressing before publishing

Applications maintain the logical order of events for the future batch in maps in order to provide intra-blob addressing even before it is fully constructed - as an example if a user posts an article and immediately after that comments on their own post - the comment should be able to refer to the post which is not yet committed on-chain. Applications will also display activity by other accounts that is not yet anchored and the interactions can still use the proper addressing when referring to the yet-to-be-anchored messages (the next nonce number is known in advance). Any type of interaction is addressable and sequenced in the blobs - including reactions (likes, etc).

Persistent & provable URIs

Each account has an associated auto-increment counter (nonce) for every time they submit an anchor for off-chain content. So if an application has submitted 2 times already, then the next submission will be with nonce == 3. The blockchain keeps a mapping in its state for each previous nonce value to the block number when it changed so that <application_id>/<nonce> can be translated to which block has the Merkle root anchor & IPFS CID for the blob that corresponds to that nonce for that account.

Once a blob is fetched through the IPFS CID (hash) we can address specific events by using the offset index in the blob header so a URI like <application_id>/<nonce>/<user_id>/<content_id> can point to a specific post, comment or even reaction (activity is grouped by users). The content ID for a specific user is usually a small single-digit number and is necessary only if there has been more than 1 interaction by that user through that application for the given nonce (which may be rare). Events can refer to each other using such URIs.
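A sketch of how such a URI might be resolved, where chain and fetch_blob are hypothetical stand-ins for the on-chain lookups and blob retrieval described above (none of this is a specified API):

```python
import json

def resolve(uri: str, chain, fetch_blob) -> dict:
    app_id, nonce, user_id, content_id = (int(p) for p in uri.split("/"))

    block = chain.block_for_nonce(app_id, nonce)          # <app_id>/<nonce> -> block number
    anchor = chain.anchor_in_block(block, app_id)         # Merkle root + IPFS CID of the blob
    blob = fetch_blob(anchor["ipfs_cid"])                 # e.g. via IPFS or any other means

    offset, length = blob["header"]["offsets"][user_id]   # intra-blob offset table
    user_events = json.loads(blob["body"][offset:offset + length])
    return user_events[content_id]                        # that user's N-th event in this blob
```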

The blockchain can be queried to check whether the application was authorized (probably through an IDM) to post content on behalf of the user at the block in which the anchor was published, in order to determine if the activity is authentic - the state keeps information for each account such as from which block number (and until which - all ranges) a given application was authorized to post on behalf of a user. Users may avoid using IDMs and explicitly sign their actions, in which case their data will be accompanied by their signatures within the data blobs and the only check required will be against the user's keypair at that specific block number.

Steps to prove the authenticity of a URI

To recap - to prove the authenticity of any event with a URI:

  • First check if the data is actually part of an anchored blob with a Merkle proof to a block. This requires either just the piece of data + a Merkle proof for inclusion in the blob or the entire blob in order to reconstruct the Merkle tree & proof.
  • Then check if the user actually submitted the event:
    • Either if at that point the application was authorized to post on behalf of the user which would require a Merkle proof for a part of the blockchain state (authorization ranges).
    • Or by checking for an explicit signature & the public key of that account at that time which would also require a Merkle proof for a part of the blockchain state (account key history).

URIs are persistent as long as someone hosts either the individual event + the Merkle proof or the entire blob (and can reconstruct the proof) and knows to which block it was anchored (from the <application_id>/<nonce> => <block_number> mapping). The following chapter shows how names in URIs are persistent too (even if user/application names change ownership at some point).
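A condensed sketch of that two-step check, with every helper name below being an assumption rather than a specified API:

```python
def verify_event(event, merkle_proof, uri: str, chain) -> bool:
    app_id, nonce, user_id, _ = (int(p) for p in uri.split("/"))
    block = chain.block_for_nonce(app_id, nonce)
    anchor_root = chain.anchor_in_block(block, app_id)["merkle_root"]

    # 1) the event is part of a blob anchored on-chain
    if not merkle_proof.verifies(leaf=event, root=anchor_root):
        return False

    # 2) the user actually submitted it: either the application was authorized
    #    to post on the user's behalf at that block, or the event carries an
    #    explicit signature matching the user's key history at that block
    if chain.was_authorized(user_id, app_id, block):
        return True
    return chain.key_at_block(user_id, block).verify(event)
```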

A few other notes

  • There can be many different & valid proofs for the same URI from different block heights.
  • Even private intranet data may be anchored but not retrievable by the public if the blob IPFS CID is never published or pinned/hosted - unified addressing for public & private.
  • Users should be able to see the URI of content even if created through another application and the origin should be displayed by default - acting as attribution for other applications.
  • Edits & updates to content come as messages with new unique URIs that reference the older message URIs and it is up to applications to properly handle this - either by showing that there have been changes and a newer version or automatically redirect to the latest. "Forks" are possible but they represent application failure to detect that an old version is being edited.
  • Accounts that anchor content on-chain cannot do so twice in the same block - for simplicity.

Names & paths

Headjack is also a name registry - accounts can own a handle and be identified with it. For specifics (constraints, subdomains, auctions, distribution, hoarding, leasing, etc.) please refer to their dedicated page.

Names in URIs

Users and applications don't need a name and can operate as an integer index just fine, but the preferred case will be with handles. Names can change ownership but the blockchain will be able to translate <application_name>/<nonce>/<user_name>/<content_id> with strings into the canonical integer form discussed previously by substituting the application & user names with account IDs.

Every name has an associated auto-increment nonce (just like account IDs) for every time they submit an anchor for off-chain content and the blockchain records maps of <name>/<nonce> to <id>/<nonce> which can then be used to resolve the URI as discussed in the previous chapter.

But we need to be able to translate not just the application name but also the user name, which may have changed ownership at any point - for that the blockchain keeps track of the account ID ownership of every name historically as ranges (from block X to block Y name N was owned by account A), so once we determine the block number for a given data blob we can check which account ID a name in a URI corresponded to at that time.

And thus we're able to have URIs such as twitter.com/55212/johnny/3 to identify any event by any actor - all we'd need to do is a few lookups and we'll be able to use Merkle proofs for any piece of content to prove authenticity. Most URIs could even omit the 4th part because probably there won't be more than 1 action by a user for a given batch by an application.

Note that the canonical form (numbers instead of names) of twitter.com/55212/johnny/3 could be something like 42/783/523/3 where only the last number would be the same and the nonce would most likely be different. Also twitter.com might no longer be owned by account 42 but what matters is that the blockchain can correctly determine who owned it at nonce 55212. Multiple names can be owned by an account but their nonces for one event will probably be different.
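A sketch of that name-to-canonical-form translation under the same assumptions (hypothetical chain lookups over the historical ownership ranges):

```python
def canonicalize(uri: str, chain) -> str:
    app_name, name_nonce, user_name, content_id = uri.split("/")

    # <name>/<nonce> -> <account_id>/<nonce>, as recorded on-chain
    app_id, app_nonce = chain.name_nonce_to_id_nonce(app_name, int(name_nonce))
    block = chain.block_for_nonce(app_id, app_nonce)

    # ownership is stored as block ranges, so ask who owned the user name then
    user_id = chain.name_owner_at_block(user_name, block)
    return f"{app_id}/{app_nonce}/{user_id}/{content_id}"

# e.g. canonicalize("twitter.com/55212/johnny/3", chain) could yield "42/783/523/3"
```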

What to ask the blockchain about a URI

To recap: we can ask the following questions about this URI: twitter.com/55212/johnny/3:

  1. To which application account ID & nonce does twitter.com/55212 correspond?
  2. To which block does the applicationID/nonce map correspond?
  3. What is the IPFS CID & Merkle root of the anchored blob at that block?
  4. What account ID does johnny correspond to in the block where this blob was anchored?
  5. Once we download the blob or just the blob header (using the IPFS CID or any other means):
    1. We can ask the offset table where within the blob is johnny's content № 3?
    2. Once we fetch the actual data, depending on whether it is explicitly signed or not, we check:
      1. either that the application was authorized to post on behalf of johnny at that time,
      2. or that the signature matches the keypair that was bound to johnny's account at the time of the anchored block.

Web3 URIs interoperable in Web2

Application accounts can point on-chain to a host with an IP address which can be used to display content published through them. Application names can also resemble traditional domain names so it will be possible to copy-paste such URIs directly into your browser and as long as they own the same domain in the traditional DNS they should be able to serve a webpage displaying the piece of content - enabling seamless interoperability during the transition from one paradigm to the other.

Content titles in URIs

Most Web3 platforms suffer from unreadable URIs but we've done a lot better - note the brevity and lack of hashes & hexadecimal symbols (0xf56a0...) - in fact, this is as good as it gets...

Or is it?! What about headlines of articles - can we have them included as well - something like twitter.com/55212/johnny/3/how-I-went-from-vegan-to-keto-and-back-again? Absolutely! The string is not at all necessary to resolve the piece of content (just like in StackOverflow where the database key for a question is just a number (example: question 4) but the page router always changes the URL when loading the page to include the title too). Message types for posts with titles will have a dedicated field which will get included in the content hash and thus spoofing the title will be rejected by conforming applications as it would be a trivial check.
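A sketch of how a conforming application might reject spoofed titles, assuming a hypothetical slug scheme (the title field itself is what's covered by the content hash):

```python
import re

def slugify(title: str) -> str:
    # illustrative slug scheme - not part of the protocol
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def title_matches(message: dict, uri_slug: str) -> bool:
    # the title is a dedicated, hashed field, so a spoofed slug is a trivial mismatch
    return slugify(message["title"]) == uri_slug

post = {"type": "post", "title": "How I went from vegan to keto and back again"}
assert title_matches(post, "how-i-went-from-vegan-to-keto-and-back-again")
assert not title_matches(post, "totally-different-clickbait-title")
```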

Addressing within content

Different schemas could be used for addressing within pieces of content (like a paragraph from an article or a clip from audio/video - without losing the context of the whole) and message types could have by default associated on-chain schemas (or the schema of choice could be embedded within the header of the message). For example, when medium.com/12475/elvis/0/learn-to-code/121/66 is being loaded the associated schema will be looked up depending on the type of message (in this case - an article) and used to interpret the last part (121/66) which could mean a character selection with an offset from the start and length. The embedded schema could be overridden by explicitly stating which one to use within the URI. As an example, medium.com/12475/elvis/0/learn-to-code/schema/42/121/187 could mean "use on-chain schema number 42" which could interpret the last part (121/187) as start offset and end offset instead of start & length - resulting in the same selection as before. Even individual pages & paragraphs of books should be referencable in such a manner and could be composed of multiple separate posts - and this is just scratching the surface!
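A sketch of interpreting such selections, where the schema numbers and their meanings are purely illustrative:

```python
def select(text: str, parts: list[str]) -> str:
    if parts[0] == "schema" and parts[1] == "42":
        # explicit on-chain schema 42 (assumed): start offset & end offset
        start, end = int(parts[2]), int(parts[3])
        return text[start:end]
    # default schema for this message type (assumed): start offset & length
    start, length = int(parts[0]), int(parts[1])
    return text[start:start + length]

article = "x" * 300
# medium.com/12475/elvis/0/learn-to-code/121/66             -> default schema
# medium.com/12475/elvis/0/learn-to-code/schema/42/121/187  -> explicit schema 42
assert select(article, ["121", "66"]) == select(article, ["schema", "42", "121", "187"])
```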

For big types of content (audio/video) the message could be broken down into chunks so that users can load only the message header and then depending on the schema used and the addressing within the content - only the necessary chunks could be requested.

Messages

The terms message/event/action/data/document/content are used interchangeably in this book and refer to any type of event/content a user might have generated - post/comment/reaction/etc.

IDMs, preferences & graphs

Data storage & retrievability

In this chapter we will see the different aspects of handling unlimited amounts of off-chain data:

Ingestion and transformation of blob data

Off-chain blobs with data will be fetched, processed and stored immediately after they are published in more optimal database formats for content to be later directly served by application infrastructure. Most of the cryptography checks will be happening instantly during this process but the proofs don't need to be stored. Users will always be able to request proofs for any event at any time (& cache them locally) because they can be regenerated on the fly as necessary.

Hierarchical data blobs & partial fetches

Blobs may be in a hierarchy such that the on-chain IPFS hash points only to the "root" blob that contains the header and the actual indexed data could be in child IPFS blobs (whose IPFS CIDs are contained in the root blob or header) so entities listening for events by specific accounts on Headjack may download only these headers and determine which "leaf" blobs they need to fetch for the data they are interested in (if any). Sparse bitsets & bloom filters could be used to quickly scan for the presence of activity by specific accounts.
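A sketch of such a root header and the corresponding partial fetch, with hypothetical field names and a plain set standing in for a sparse bitset or bloom filter:

```python
root_header = {
    "children": [
        {"cid": "bafy...child1", "accounts_hint": {17, 523, 90210}},
        {"cid": "bafy...child2", "accounts_hint": {42, 1337}},
    ]
}

def blobs_to_fetch(header: dict, followed: set) -> list:
    # fetch only the leaf blobs that can contain activity by followed accounts
    return [c["cid"] for c in header["children"] if c["accounts_hint"] & followed]

print(blobs_to_fetch(root_header, {42}))   # -> ['bafy...child2']
```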

Direct IPFS connections & horizontal scaling

Applications can advertise the multiaddress of their IPFS nodes on-chain so that each blob of content that gets published can be downloaded by others instantly by manually connecting with IPFS’s “swarm connect” functionality - avoiding the use of the DHT for each new blob CID, which may take tens of minutes. They can provide addresses to multiple IPFS nodes as a cluster for horizontal scaling and use pinset orchestration - designed for automated data availability and redundancy.

Applications may choose not to use IPFS at all - what they must do is anchor their blobs with a Merkle root and provide some on-chain advertised means to retrieve the data (example: REST/RPC endpoints in their on-chain account). We expect that IPFS will be the lowest common denominator and will always be used no matter what other solutions are also available.

Sharing data before anchoring it

Applications can talk to each other directly by using their on-chain advertised REST/RPC endpoints and may ask for the events & messages that are not yet published by the other applications. This way they could display "remote" events locally while they are still in the "mempool" and allow their own users to interact with those events from other applications. This is possible because URIs are stable even before publication - see Stable intra-blob addressing before publishing. High activity applications can interoperate and no longer be a slave to the block time. However:

  • Applications should display events that are not yet anchored in the UI differently - especially if coming from another application.
  • Events that refer to each other but are from different applications and have not yet been anchored on-chain could end up committed in the wrong order (if one of the applications skips a few blocks and commits at a later one) - such that an event from the past is referring to an event from the future - breaking referential integrity. However, messages have a timestamp field and could also have the current block height at the time of creation - useful for sorting.

How to retrieve data for a random URI

There are multiple options:

  • The entire original blob with an IPFS CID might still be retrievable from the original application account that posted it or anyone else that has pinned the data.
  • The user account might be using an archival service for all their activity and they can point to that archival service on-chain in their account for others to retrieve their messages.
  • Other well-known players without a direct on-chain connection to the application/user in a URI could be asked if they have the content:
    • Infrastructure companies that do the heavy lifting for applications and store everything.
    • The analog of the Internet Archive in this ecosystem that also stores everything.
  • IPFS can be forked & reused with the following change: instead of delivering content based on the CID hash it can deliver the data + the necessary proofs based on Headjack URIs or their hash (they are unique) - any individual off-chain message that's been anchored would be retrievable as long as someone is hosting it in this p2p network (which needs bootstrapping - could be part of Headjack nodes). However, this won't be very performant due to the granular nature of individual messages with a URI and the use of a global DHT.

Blocks, state & proofs, oh my!

Throughput & scalability

Everyone claims to be scalable, but here we'll prove that Headjack can handle billions of accounts and anchor unlimited amounts of off-chain content tied to identity with simple napkin math.

How big is a Headjack transaction

Applications post anchors to off-chain content with an IPFS CID hash and a merkle root. IDMs also anchor off-chain content (mainly user preferences & updates to social graph), but they also post authorizations to other accounts (applications) to post on behalf of users as integer pairs.

So the fields for a transaction by an application/IDM (which will be the majority) are:

  • version: 4 bytes
  • signature: 65 bytes
  • blob IPFS address: 32 bytes
  • blob merkle root: 32 bytes
  • nonce: 4 bytes auto-increment integer associated with the account - to prevent reordering of anchored off-chain blobs (which would mess up internal addressing based on that nonce)
  • value: 4 bytes amount of native token paid to validators for transaction inclusion

So far that is 141 bytes which almost every transaction by an application or IDM contains. IDMs also submit a list of authorizations (or revocations) as integer pairs. For example, 1000 accounts authorizing 15 different applications to post on their behalf would be 1000 integer pairs. Assuming 8 byte integers (up to 2^64) that would be 8 * 2 * 1000 = 16k bytes.
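The same arithmetic, spelled out:

```python
# Reproducing the napkin math above.
base = 4 + 65 + 32 + 32 + 4 + 4   # version + signature + IPFS CID + Merkle root + nonce + value
assert base == 141

pairs = 1000                      # 1000 (account, application) authorization pairs
auth_bytes = 8 * 2 * pairs        # two 8-byte integers per pair
assert auth_bytes == 16_000
```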

Naive scenario

The initial version will target block bandwidth of up to 100 kb/s. This is not a problem for ZK validiums as there are already DA solutions that offer 10 mb/s or even much more.

Assuming:

  • 1 MB block size & 10 second block time (100 kb/s of block bandwidth)
  • 1000 applications posting in every block
  • 100 IDMs authorizing as many users as possible - filling the remaining block space
  • no on-chain actions such as keypair & name changes, account creation & direct interaction with the chain by end users

We get:

  • 1100 actors (1000 applications + 100 IDMs) that post in every block at least 141 bytes for their transactions, which is 155100 bytes
  • the remaining 893476 bytes (1048576 (1MB) - 155100) can be filled with authorizations and since an authorization is 16 bytes (8 * 2) that would be 55842 authorizations/revocations every 10 seconds or 5584 authorizations/revocations per second
  • for 1 billion accounts that would be ~0.48 authorizations/revocations per person per day which is actually quite good - people on average do far fewer single sign-ons per day
Comparing the two protocols just to put things into perspective (they have completely different goals):

  • block size: 1 MB (Headjack) vs ~80 kb (Ethereum)
  • block time: 10 seconds vs ~13 seconds
  • blockchain bandwidth per second: 100 kb/s (x16 more than Ethereum) vs ~6.15 kb/s
  • blockchain bandwidth per day: 8640 mb/d vs ~528 mb/d
  • transactions/authorizations per second: 5584 APS vs ~14 TPS
  • transactions/authorizations per day: 482,457,600 vs 1,209,600
  • transactions/authorizations per person per day for 1 billion accounts: 0.482 (x400 more than Ethereum) vs 0.0012096
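The naive-scenario arithmetic above, reproduced for convenience:

```python
block_size = 1_048_576                       # 1 MB
block_time = 10                              # seconds
actors = 1000 + 100                          # applications + IDMs posting every block
fixed = actors * 141                         # per-actor transaction overhead in bytes
assert fixed == 155_100

remaining = block_size - fixed               # bytes left for authorizations
assert remaining == 893_476
auths_per_block = remaining // 16            # 16 bytes per authorization pair
auths_per_second = auths_per_block // block_time
print(auths_per_block, auths_per_second)     # 55842, 5584

per_person_per_day = auths_per_second * 86_400 / 1_000_000_000
print(round(per_person_per_day, 3))          # ~0.482 for 1 billion accounts
```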

Realistic scenario

The naive scenario does not include on-chain actions for specific accounts such as:

  • keypair changes (new pubkey (32 bytes) + signature (65 bytes) if there is an older key)
  • account creation (if done by an IDM then this is just a few bytes - no pubkey)
  • name registration & ownership changes (see the dedicated page for more details)
  • updating account fields such as a URI pointing towards an off-chain account directory (which could point to archived posts) or pointing to another account index for such services
  • signed transactions by individual accounts that want to directly interact with the chain
    • authorizing an IDM, rotating keys, or even publishing off-chain content as an application

However, the realistic scenario will not be far from the naive because:

  • Only a % of all accounts will have keypairs (even though 100% could) and will make just a few signed actions per year - leaving most block throughput for authorizations through IDMs.
  • Large % of accounts will rarely even be authorizing new applications - many people don't sign in to new services through SSO every single day. There could also be 2 types of log-ins: passive (viewing only - nothing on-chain) and authorized (allowing services to post on behalf of users).
  • Many applications that don't generate a lot of off-chain activity will publish less often than on every block in order to minimize on-chain block space costs.
  • The chain throughput can be further optimized & scaled by multiple orders of magnitude.

Optimizations & scaling

  • Throughput of 100 kb/s is just the start & can easily go to 1-10 mb/s as a ZK rollup.
  • The chain & state can be trivially sharded - there aren't problems such as fracturing liquidity or preventing composability because accounts don't care about each other - they mostly contain authorization block numbers & keypair history.
  • Integer indexes that only need 4 bytes can be compressed/batched together - it'll take many years to go beyond 4 billion accounts so the actual throughput is 2x of what is listed here.
  • A fee market can develop that tunes the cost of different actions so that actors don't just pay for on-chain bytes - the ways the system is used can be guided through incentives.
  • Other optimizations not listed here - this is just the starting point.

State growth

Headjack's main value proposition is keeping historical records of the sequence of authorizations, key changes & off-chain content anchors and being able to generate proofs for any specific piece of off-chain content.

TODO: finish this

https://ethereum.stackexchange.com/questions/268/ethereum-block-architecture

One difference from other blockchains is that accounts in Headjack are numbers (integers) and the state is append-only, so the state tree could be designed differently - and be easier on memory access patterns.

on eth state growth: https://twitter.com/SalomonCrypto/status/1587983584471633921 https://hackmd.io/@vbuterin/state_size_management

All on-chain changes just append data to one of the few attributes of:

  • accounts:
    • public keys: a map of keys and block height integer ranges (non-overlapping)
    • authorizations: a map of indexes and arrays of block height integer ranges
    • nonces: an array that maps autoincrement indexes to block numbers
      • appended only when publishing off-chain content (usually an application/IDM)
  • names:
    • owners: a map of owner indexes and block height integer ranges (non-overlapping)
    • nonces: an array that maps autoincrement indexes to account index & nonce pairs
      • appended only when publishing off-chain content (usually an application/IDM)
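A sketch of that append-only state layout using illustrative Python dataclasses (names and types are assumptions, not a specification):

```python
from dataclasses import dataclass, field

@dataclass
class AccountState:
    # public key -> non-overlapping (from_block, to_block) validity ranges
    public_keys: dict[str, list[tuple[int, int]]] = field(default_factory=dict)
    # authorized application index -> (from_block, to_block) ranges
    authorizations: dict[int, list[tuple[int, int]]] = field(default_factory=dict)
    # nonce (implicit array index) -> block number of that anchored blob
    nonces: list[int] = field(default_factory=list)

@dataclass
class NameState:
    # owner account index -> non-overlapping (from_block, to_block) ownership ranges
    owners: dict[int, list[tuple[int, int]]] = field(default_factory=dict)
    # nonce (implicit array index) -> (account index, account nonce)
    nonces: list[tuple[int, int]] = field(default_factory=list)
```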

TODO: should IPFS hashes & merkle roots be saved in the state? - no?

TODO: light clients? in addition to merkle proofs for inclusion of content they would need merkle proofs for the state of which applications a user has authorized to post on their behalf in a given block

Off-chain content

There are no limits for off-chain content as it is all just anchored with merkle roots - it could be as high as hundreds of terabytes per second. There isn't a more minimal design that can link unbounded amounts of off-chain data to billions of identities that can change keys & names and yet still provide the guarantees & mental model simplicity of Headjack - it achieves consensus on the absolute bare minimum.

Headjack vs the competition

This chapter focuses only on the disadvantages of some of the more high-profile competing solutions in the space. Most of the issues are solved in Headjack due to its guiding principles & design goals. This page doesn't list any of their positives as it would be too long (so not exhaustive by any means) but many of them have served as an inspiration for Headjack in one way or another.

Comparison table

This is the subjective understanding of Headjack's team - many of the claims lack official sources.

The projects compared: Headjack, Farcaster, DSNP & Frequency, Bluesky & AT Protocol, TBD web5 (slides & tweet), Ceramic & CyberConnect, and Lens Protocol.

Blockchain-related properties

  • Scalability & potential scope:
    • Headjack: can handle billions of users (proof) & underpin the entire web
    • Farcaster: perhaps could handle up to ~10 million - may need to move to its own rollup for more
    • Bluesky & AT Protocol: centralized consortium of servers
    • Lens Protocol: actions are on-chain as NFTs (follow, post's hash) - even a dedicated EVM chain will be futile
  • Users paying for TX fees & linking identity to financial accounts by default:
    • Headjack: blockchain costs are paid for by services by default
    • Farcaster: Ethereum L1 costs
    • DSNP & Frequency: initially planned for subsidy by services
    • Bluesky & AT Protocol: centralized consortium of servers - no TXs
    • TBD web5: the anchors (on-chain Merkle roots) get batched with others
  • Blockchain TX fee stability & predictability:
    • Headjack: as scalable as necessary => no congestion
    • Farcaster: Ethereum L1 - may need to migrate to its own rollup in the future
    • Bluesky & AT Protocol: centralized consortium of servers - no TXs
    • TBD web5: Bitcoin TX fees are low due to low economic activity
  • Block time for anchoring key operations:
    • Headjack: Ethereum ZK validium with multiple blocks in one L1 slot (TODO: add footnote about the security of such blocks)
    • Farcaster: Ethereum
    • DSNP & Frequency: Polkadot
    • Bluesky & AT Protocol: centralized consortium of servers
    • TBD web5: Bitcoin
    • Ceramic & CyberConnect: Ethereum
    • Lens Protocol: Polygon PoS
  • Time to finality
  • Contains a name registry for easy discoverability:
    • Headjack: yes - & tightly integrated with addressability - URIs aren't broken even if names change ownership
    • Farcaster: yes, also works with ENS
    • DSNP & Frequency: no, but will probably introduce one
    • Bluesky & AT Protocol: no - uses email-like usernames resolved with Webfinger to a DID & relies on DNS => centralized
    • TBD web5: no
    • Ceramic & CyberConnect: no, probably works with ENS
    • Lens Protocol: no, probably works with ENS
  • Decentralization for the most important parts:
    • Headjack: Ethereum ZK validium with external data availability (validium) - EigenDA?
    • Farcaster: Ethereum
    • DSNP & Frequency: Polkadot - not big enough set of validators
    • Bluesky & AT Protocol: centralized consortium of servers
    • TBD web5: Bitcoin
    • Ceramic & CyberConnect: Ethereum
    • Lens Protocol: Polygon PoS

Data availability, storage, retrievability & addressing

  • Human-readable & persistent URIs for data without any hashes:
    • Headjack: yes
    • Farcaster: URIs full of hashes (probably)
    • DSNP & Frequency: URIs full of hashes
    • Bluesky & AT Protocol: URIs full of hashes - CIDs for IPLD objects
    • TBD web5: URIs full of hashes (probably)
    • Ceramic & CyberConnect: URIs full of hashes
    • Lens Protocol: URIs full of hashes
  • Multiple ways to ask for a URI's document (in addition to caches/archives):
    • Headjack: multiple ways - 1) user's IDM, 2) source app identifiable from the URI, 3) IPFS blob from the block, 4) p2p network
    • Farcaster: 1) user's Hub, 2) p2p network
    • DSNP & Frequency: URIs contain only user id & content hash - without user Hubs (yet) & a p2p network
    • Bluesky & AT Protocol: 1) user's PDR, 2) maybe p2p network with the content CID
    • TBD web5: probably 1) user's DWN, 2) p2p network
    • Ceramic & CyberConnect: only p2p network as Ceramic streams are an abstraction over IPFS
    • Lens Protocol: unsure - maybe the on-chain NFT post
  • Big reliance on a p2p network for delivering fine-grained messages:
    • Headjack: pure addressing - storage & retrieval are orthogonal and the p2p network for specific URIs is bottom priority
    • Farcaster: using a gossip-based pubsub protocol between peers
  • Push (broadcast) vs pull (polling) for fetching new content:
    • Headjack: both - event batches are broadcasted & new/individual documents can be requested
    • Farcaster: pull only - requires polling a user's Hub for anything new
    • DSNP & Frequency: both - event batches are broadcasted & new/individual documents can be requested
    • Bluesky & AT Protocol: pull only - requires polling a user's PDR for anything new
    • TBD web5: pull only - requires polling a user's DWN for anything new
    • Ceramic & CyberConnect: both - events are broadcasted & new/individual documents can be requested
  • Self-authenticating documents:
    • Headjack: proofs are validated by the blockchain
    • Farcaster: need to talk to Ethereum AND the host-certified user directory which can disappear OR change Merkle roots
    • DSNP & Frequency: not present
    • Bluesky & AT Protocol: proofs are validated by the transparency log
  • Incentive layer & DA for key rotation/revocation & registries:
    • Farcaster: Ethereum L1 might introduce state rent

Ease of use for developers & users

  • Can leverage existing Web2 authenticating infrastructure:
    • Headjack: can leverage all existing OAuth code
  • Easy to work with mental model vs high cognitive load & complexity:
    • Headjack: a bit more complexity compared to Web2
  • Can use "custodial" hosted services while retaining ultimate control
  • Ease of indexing & building responsive UI:
    • Headjack: can be as performant as Web2 and not constrained by block time


What other projects get wrong

A list of problems with the contenders in the decentralized identity/media space:

  • No credible path to web-scale - some will hit a wall even at 1 million users. Most are vague around their scalability & data structures and don't put it front and center - obfuscating the most important bit. Instead of focusing on NFTs & developer APIs, start with the data and work up from that.
  • Complexity & lack of clarity - distributed systems engineers should easily figure out how they work & what the limitations are. Why build on something that others are probably having a hard time understanding as well and may not be around in the future?

    "Developers care about risk." - Haseeb

    "For the simplicity on this side of complexity, I wouldn't give you a fig. But for the simplicity on the other side of complexity, for that I would give you anything I have." - Oliver Wendell Holmes

  • Too financialized & trying to do too much - profiles & posts as NFTs, microtransactions, marketplaces, fan coins, tipping, content creator incentives.

    "However, a downside I’ve observed in social networks where content is monetized is that user behavior becomes transparently driven by monetary incentives in ways that feel less genuine. This applies to influencer culture on Instagram as well, but cryptocurrency social networks bake it in from the start." - Jay Gerber

    "The question remains: is the future of social media truly intrinsically linked to NFTs or is it a red herring?" - @mattigags

  • Users shouldn't need to use a token, use a wallet, or self-host to benefit from decentralized identity & an open social graph. Most people will always use custodial services.

    "People don’t want to run their own servers, and never will." - Moxie

  • Linking online identity to public financial accounts on Ethereum/Solana/etc will have unintended consequences - a bad default.

  • Federated ones lack logical centralization which leads to fragmentation and no discoverability.

  • Some are solving just identity & the graph - without easy & persistent content addressing.

  • Social media is about aggregated views at scale - not p2p and direct comms.

    "The emphasis of a social network is on "propagation" aka, propaganda." - didibus

  • Some use chains such as Ethereum for logical centralization & store vector commitments (Merkle roots) for events around key management (rotations, authorizations, sessions & revocations) but the data availability problem for whatever is committed is unsolved.

    • The complexity is not encapsulated - there are many open questions, edge cases & failure scenarios and it would inevitably lead to assumptions & trust.
    • Some anchor to Bitcoin but the time to finality matters a lot for UX - 10-minute block times with probabilistic finality is horrendous.
  • Some lack an economic incentive layer.

    "Show me the incentive and I will show you the outcome." - Charlie Munger

Farcaster

Their architecture: link. The account registry is on a blockchain and everything else is off-chain.

  • Registry on Ethereum L1 - for new accounts, name/host changes & key management.

    • No plans on moving to an L2 or their own chain. Also, state rent could eventually be introduced to Ethereum which would lead to further costs & complexity.
  • Keypairs & wallets required - harder mass adoption. Authorizations still require a signature from the root key.

  • Revocations invalidate all prior activity from a delegate:

    "Unfortunately, this means that all messages signed by that signer will be lost since we cannot tell which ones were signed by the attacker." - source

  • The p2p network's ability to scale by passing around granular casts is questionable - they are already discussing possible flooding and nodes having to shadow ban and flag accounts based on behavior.
  • Focus is on partial views of the network as opposed to mass scale aggregation & indexing - although that could easily be implemented.

  • Cast URIs will look something like farcaster://id:8789213729/cast:0xf00b4r which is less readable than what Headjack will be offering with its addressing.

Overall good intuition about the concept of sufficient decentralization (putting only what is absolutely necessary on a blockchain) but the p2p node implementation takes on too much responsibility, complexity & assumptions (consensus, CRDTs, trees, ordering, flooding & replay attacks, etc.) and is lacking in other areas.

DSNP, Frequency & Project Liberty

Frequency (a Polkadot parachain) is the first implementation of DSNP (Decentralized Social Networking Protocol - whitepaper) as a standalone blockchain and has had the most influence over Headjack's design but the two have diverged in some key respects - the biggest of which are scalability, content addressability, UX & choosing Polkadot. Some of the problems with them:

  • No names within the project - just integer IDs for accounts. Content addressing URIs are based on hashes without connection to the batch # / service that published it - example: dsnp://78187493520/0x1234567890abcdef0123456789abcdef0123456789abcdef (source). Addressing content is much worse compared to Headjack's human-readable & persistent URIs.

  • Delegating applications to be able to post on behalf of users (analogous to authorization in Headjack) happens on-chain but requires a signature from the user (bulky - limiting throughput). New applications (& revocation) require the user to have access to their keys. Hierarchical delegation would allow for UX comparable to Web2 and would even allow for users without keypairs at all but DSNP doesn't have that - Headjack does.

  • $100M of funding (so far) from just 1 person - Frank McCourt - and no other capital & connections to reputable investors & influencers from either the crypto or tech space - generating hype & booting up the network effect might be very hard. They've been around since 2019.

TBD

Jack Dorsey's new "web5" project - slides, announcement.

  • Only anchors DID events to Bitcoin with vector commitments (Merkle roots) using ION & the Sidetree protocol.

    • 10-minute block times with probabilistic finality. Factor in the loading times for the anchored content around key management that's on IPFS - not great at all if you want to log in/authorize a service or revoke access quickly.
  • The ION DID network is not incentivized (just like IPFS) and the anchored content around key management, rotations & revocations depends on the current cluster of ION nodes. They state not having a consensus mechanism as a plus - which is debatable - logical centralization, uptime, adequate finality & DA guarantees matter a lot when dealing with identity.

  • Doesn't have a human-readable global name registry - lacks discoverability.

  • Doesn't have human-readable content addressing.

  • Focus is on users self-hosting their own data, running software locally & handling keypairs.

  • Developing their own Decentralized Web Nodes (DWN) software that would be relaying messages p2p - can't handle web-scale on such a granular level and aggregation is not even in the picture.

CyberConnect

Built on the Ceramic protocol & network.

TODO: working on incentives for pinning https://twitter.com/joelthorst/status/1588863780301156352

  • Requires the use of keypairs & wallets.

  • Every user has their own Ceramic data stream on top of IPFS - it is yet to be proven that the DHT & p2p layers can scale to hundreds of millions or billions of people.

  • The persistence of the social graph is handled by pinning IPFS data on nodes operated by them without any cryptoeconomic incentive for the data availability - it will grow into the tens/hundreds of terabytes for web-scale (Twitter scale: 400M users with 700 connections on average) - especially because they don't have a compact integer-based representation and everything is based on big individually signed actions. The upcoming Ceramic blockchain does not seem to be geared towards storage incentivization and will not be the solution to that.

    "Long-term data retention on CyberConnect is guaranteed through Ceramic’s blockchain anchoring and a custom data pinning service." - source

DeSo

  • It requires wallets & users to pay for every interaction.

  • It puts everything on-chain and their plans to scale are with bigger blocks & sharding (see "Phase 4: Sharding") which is simply not practical for the true scale of the public web.

  • It financializes as much as possible (creator coins, etc.).

  • Their initial growth was fueled by huge sums of VC money but by now it has flatlined. It did reach a $1.66 billion market cap on the 2nd of October 2021 shortly after being listed.

Others

For details about ActivityPub, Matrix, Diaspora, Mastodon, Secure Scuttlebutt, Solid & others please refer to the excellent ecosystem review by the Bluesky project. Other good resources include:

Why Headjack

The web is broken on many fronts - this chapter explores many problematic aspects and how they can be completely solved or at least improved. Headjack's paradigm is an architectural reset of the web & opens doors to things which weren't possible before.

  1. Problems with the current web
  2. Today's information ecology
  3. Event streams & data legos
  4. The ledger of record
  5. Improved infrastructure
  6. Knowledge management
  7. Algorithms, feeds & aggregation
  8. Business models
  9. Startup case study
  10. What really is Headjack

"Millions saw the apple fall, but Newton was the one who asked why." - Bernard Baruch

Problems with the current web

"The internet changed the topology of human communication, with existing social platforms as a mere proof-of-concepts." - "Social Web 3" by Zee Prime Capital

The current host-centric web is just a local maximum due to gradient descent.

  1. The host-centric web
  2. The information ecology
  3. Centralization
  4. Silo vs interoperability
  5. Black boxes & bias
  6. Specific platforms

The host-centric web

The host-centric web & its decay

One major problem of the current host-centric internet architecture is that documents are host-certified - we refer to data by location instead of contents, but that leads to link rot & content drift. Information is fragile without an ecosystem of identity, reputation, references, context & liability - our digital history lacks a solid foundation. We can't expect everyone to be like @balajis - linking to articles from the Internet Archive (Example: look at what "Prussian" in that text is pointing to) - this doesn't scale, data is still not self-authenticating and is reliant on a central point of failure. The internet is a collective hallucination and is rotting. It's as permanent as a sand mandala and it's just a matter of time for it to go away. Some great quotes:

"More than 98% of the information on the web is lost within 20 years" - a16z Podcast

"The problem is that the foundations are shifting sands, and we need something that has significantly more integrity at the bottom layer, we can't just bolt URNs on as an afterthought. Some organizations are able to maintain persistent data over time, but it is in spite of the technology, not because of it." - tgbugs

"Society can’t understand itself if it can’t be honest with itself, and it can’t be honest with itself if it can only live in the present moment." - source

"People tend to overlook the decay of the modern web, when in fact these numbers are extraordinary—they represent a comprehensive breakdown in the chain of custody for facts." - source

"If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the web, anything can." - source

Lack of authenticity

Using screenshots of tweets in case the originals get deleted does not constitute evidence. The lack of authenticity is being routinely exploited - "Shedding light on fraudulent takedown notices".

"For example, thanks to the site’s record-keeping both of deletions and of the source and text of demands for removals, the law professor Eugene Volokh was able to identify a number of removal requests made with fraudulent documentation—nearly 200 out of 700 “court orders” submitted to Google that he reviewed turned out to have been apparently Photoshopped from whole cloth. The Texas attorney general has since sued a company for routinely submitting these falsified court orders to Google for the purpose of forcing content removals." - source

Today's information ecology

Cultural fragmentation & filter bubbles

The same document may be published through different platforms and because of the host-certified web of today it will get multiple different URLs. Discussion around it becomes fragmented & shallow in the different platforms with separate comment sections and there isn't a way to de-duplicate & unify it. This facilitates polarization as separate echo chambers can form without seeing the opinion of other types of people.

"Echo chambers are intellectual oppression - as opposed to idea labs where ideas are treated as experiments." - Tim Urban

Instead imagine being able to view the entire discussion around a specific event by tracing & aggregating all of the re-publications, references & re-tweets & quotes of it from anywhere and applying any type of filter to that.

That is what interoperable identity, content-addressing & broadcasted data enables - we can connect and de-duplicate everything and allow anyone to build tools around that - constructing a much bigger graph than what Google has created for themselves.

Moderation & censorship

This is an incredibly hairy topic with many aspects - here are just a few of them:

  • No clear rules for moderation & censorship - the terms of service are ambiguous and an ever moving goal post. Platform accountability is practically non-existent:
    • account reach can be down-regulated through opaque techniques like shadow banning
    • accounts can be removed subjectively (case in point: earlier Twitter accounts tracking Nancy Pelosi's public stock trades)
  • There is no way for users to "fork" a Reddit community if they no longer agree with the way moderation is happening - they have to recreate a new subreddit from scratch.
  • There is no market for solving certain types of spam such as financial scams - Twitter & YouTube are riddled with templatized messages and their internal vertically integrated teams are unable to deal with yet another problem in a world-class manner. In an open system such as e-mail the competition & innovation for solving spam has been tremendous.

Centralization

DNS & certificate authorities

Public key infrastructure is inherently centralized - the host-centric web relies on certificate authorities (https://en.wikipedia.org/wiki/Certificate_authority, https://en.wikipedia.org/wiki/Public_key_infrastructure), and the ledger of record can obsolete them for TLS.

"According to Netcraft in May 2015, the industry standard for monitoring active TLS certificates, "Although the global [TLS] ecosystem is competitive, it is dominated by a handful of major CAs — three certificate authorities (Symantec, Comodo, GoDaddy) account for three-quarters of all issued [TLS] certificates on public-facing web servers." - Wikipedia

The internet is only as decentralized as its weakest links - DNS & certificate authorities.

As of July 2022, just 3 certificate authorities (IdenTrust, DigiCert & Sectigo) were responsible for three-quarters of the entire market.

Certificate authorities should be a thing of the past.

Furthermore, content served by a website through HTTPS (so using SSL) cannot be cached & cryptographically frozen in time because if the certificate is revoked then there's no way to actually prove the order of events - when the data was signed (cannot rely on internal timestamps) and until when the certificate was valid. Headjack fixes this by anchoring all off-chain events on-chain with Merkle proofs & sequencing what's relevant.

"In a separate disclosure unrelated to Snowden, the French Trésor public, which runs a certificate authority, was found to have issued fake certificates impersonating Google in order to facilitate spying on French government employees via man-in-the-middle attacks." - wikipedia

There is yet more centralization in checking the validity of certificates: reliance on OCSP, which is vulnerable to DDoS & replay attacks. It's turtles all the way down - CAs delegate other entities to be OCSP responders. And if an attacker has compromised the private key of a server and is performing a MITM attack, then OCSP requests will also be going through them - rendering OCSP an unreliable means of mitigating HTTPS server key compromise.

"Because most clients will silently ignore OCSP if the query times out, OCSP is not a reliable means of mitigating HTTPS server key compromise." - Wikipedia

OCSP also leaks browsing behavior https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol

Our activity is being tracked when not using a VPN - even with HTTPS we leak to the network which big tech service we are contacting & using.

DNSSEC was introduced to fight a wide range of DNS-related attacks but it also requires the use of certificates & a trusted third party.

The world is built on top of chains of trust which rely on certificate authorities https://en.wikipedia.org/wiki/Chain_of_trust https://en.wikipedia.org/wiki/Root_certificate

The internet is under U.S. control

https://whois.icann.org/en/domain-name-registration-process

Centralized & fragmented identity/preferences

The convenience & simplicity sought by users have led to extreme centralization of identity in just a few players with network effects & single sign-on functionality.

"as of 2018 the consolidation of power and control over the social web by a few large corporations seems unparalleled" - Decentralizing the Social Web

But despite the concentration of SSO services a lot of identity-related activity is fragmented between platforms due to the lack of standards & interoperability: settings/preferences, direct messages, bookmarks, playlists, progress bars, etc.

"Identity on the internet today is fragmented across many centralized services, each with its own set of user data. Signing up for a new service requires making a brand new identity and re-entering all of your information. This is not only tedious but also means that a user’s identity is going to be inconsistent between services because they are not always going to update key information on every single service every time that something changes." - source

Linktree (valued at $1.3B) is just a bandaid for today’s fragmentation of identity - it is a symptom.

TODO: move this to another page

Contrast that to a world with interoperable & exportable identity/data:

“each time we go from one social network to another we do not need to restate who we are, what our interests are, or who we know” - Decentralizing the Social Web

Infrastructure centralization

Google is way more than just a search engine even though the majority of their revenue comes from advertising - they control large percentages of the plumbing of the web - key choke points such as submarine cables, routing, data centers, browsers, DNS, etc. David Vorick puts this perfectly into perspective in The Worrying Depth and Scope of Censorship on the Internet - some quotes:

"If Google decides they don’t like you, then for 65% of the world you simply stop existing. You have no recourse. The terrifying thing about this is that Google is not an elected entity. Google has turned themselves into unelected regulators of the Internet, and they are held accountable only to their own share price."

"As our economy and services become more deeply intertwined, an increasing number of players have more influence and ability to de-platform a greater number of businesses and users. And these requirements compound against each other. If one service provider is particularly opinionated and quick to de-platform, everybody else is forced to give them a large amount of breathing room and become more oppressive towards their users to avoid potential conflict."

"This does not scale. The end result will be a global monoculture where everybody is afraid to take risks or break the status quo because nobody can afford to upset even a single of the hundreds of services that they depend on. Our culture gets established and defined by giants like Facebook and Google rather than users and creators, because only Facebook and Google have the resources to bully everyone else into allowing changes to happen."

"The only way to avoid this endgame is to demand infrastructure that remains neutral. At the scale of today’s Internet and global economy, infrastructure that does not remain neutral will inevitably turn on its users and coerce them into a set of moral standards that are both arbitrary and enforced without consent."

Barriers to entry

Vertical integration vs markets/competition

Platforms do almost everything in-house in a closed way as providing access to third-party companies to their data to solve specific problems is hard due to complications around data privacy/regulation and the need to safeguard their competitive advantages & trade secrets.

This leads to:

  • lack of cooperation, interoperability, duplicated effort & stifled innovation
  • competition for scarce talent which leads to sub-par solutions
  • company bloat & inefficiencies
    • companies are harder to manage as they are way bigger than what they could be
    • bigger size demands higher revenue - pricing out many business models
  • differences in functionality between platforms => complexity for users

Contrast that to open protocols & exportable data where anyone can specialize, innovate & provide the best possible service for a specific vertical & sell it to others. The move from host to data-centric addressing and open blockchains enable interoperability and composability.

Growth, network effects & monopolies

Social media platforms are growth-at-all-costs stories because the goal is to achieve a network effect first and become a monopoly with a MOAT at which point the user value extraction and cannibalization of ecosystems built on top of APIs can ramp up (1, 2, 3, 4, 5, 6). At that point innovation is less necessary (+ is harder due to inertia) and even the quality of service may degrade. User data is the most valuable commodity and scale enables the best AI models & efficiency of value extraction in the advertising model which comes with a slew of problems & perverse incentives.

Users are usually locked in and effectively have neither voice nor exit, either because:

  • the network effects are insurmountable for incumbents and there are no alternatives
  • or if they leave for an alternative service they'd lose all their connections, audience & reputation and would have to start from scratch

"I think it might be reasonable to believe that single monolithic companies shouldn’t have monopolies on certain data that practically guarantees user lock-in. And that the internet might be better if some data were made completely open and available to any developer who wants to build on it while ensuring that the data can’t be edited by anyone that isn’t supposed to be able to edit it." - source

"Twitter was supposed to be a protocol allowing anyone to build products and services on top of it that drive value back to the parent company and investors. But it wasn’t a real protocol. It only pretended to be. As soon as the people behind the scenes changed their minds about what they wanted Twitter to be, the “protocol” side of Twitter got shut down. While this ruined a lot of businesses built on top of it at the time, it was perfectly predictable. Before web3, it was near impossible to build real application-specific protocols on the internet. And counter to the beliefs of the biggest web3 critics, web3 does allow you to build real, open, and neutral protocols." - source

The cold start problem for startups

The barrier to entry for most types of platforms is very high:

  • kickstarting a network effect & attracting a critical mass is very difficult
  • need to reinvent the wheel & vertically integrate many aspects instead of composing a service from already existing solutions

And thus few companies are started and even fewer are successful - leading to little innovation, slow progress & sub-par services.

Check out the startup case study expanding on why it would be easier with Headjack.

Black boxes & bias

Black boxes & algorithmic bias

The recommendation algorithms & the social graph are the architecture of virality - the dynamics of amplification & interaction dictate how ideas surface, propagate, compound & evolve. The people writing the algorithmic feeds are the most powerful in the world - @naval.

Ephemeral experiences such as search suggestions & results leave no trace and it's extremely hard to prove bias as Dr. Robert Epstein would attest - there is 0 accountability.

"But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm." - the original Google search engine whitepaper

Does such a search engine exist today? Competition & a lower barrier to entry are direly needed.

The explicit user preferences such as subscriptions & the social graph (following/connections) are routinely discounted in our feeds in favor of algorithm recommendations - platforms optimize for engagement & attention and not for utility & value to end users. We all respond to outrage & enjoy the occasional viral cat video but we should be able to tune & filter what gets shown to us. Have you ever heard a YouTuber tell you to hit the notification bell in addition to subscribing?

"I'm in an ongoing relationship with a moody, sensitive, grudge-holding, and generally crazy girlfriend called the Twitter algorithm. Everything will be going fine and then suddenly I'm getting the cold-shoulder and I don't even really know what I did and just have to wait it out." - Tim Urban

Specific platforms

A non-exhaustive list of additional problems (beyond what's already listed) with some platforms:

  • YouTube:
    • there is no longer a down vote count & like/dislike ratio
    • subscriptions are by now almost meaningless without the notification bell icon
    • the comment section is just an afterthought - they don't care about it
      • the presentation is extremely basic & limiting
      • you cannot even link to a specific comment with a URL
      • financial scams in comments are abundant - moderation is non-existent
  • Twitter:
    • we can't even sort the tweets of someone based on engagement
    • we pin threads of threads on our profiles and sequence them with X/YY numbers
    • cannot sort quotes/replies of a tweet based on engagement/age
    • no unrolled thread view option even though it's a no-brainer at this point
    • filtering & tuning what is shown in lists is nonexistent
      • lists don't show replies that are not to accounts in that list
      • can't display likes in lists
    • we can't see other people's feeds (although there's this third-party app)
    • no way to opt-out of recommendations in the main feed for topics you don't care about or unrelated activity such as X received a reply from someone you don't follow
    • find the beginning of this thread - is that readable & usable? There should be an alternative Reddit-style application

Twitter should have added this feature years ago: "follow" a thread (w/o having to comment as a hack way to do this).

When Twitter decides engagement is low it shoves nonsense algorithmic "recent tweets" notifications down your throat that you can't turn off - that's what it has devolved into. "See less often" from the dropdown menu does nothing. Misaligned incentives.

"Unfortunately, you cannot turn off recent tweets. This is because the feature drives up “user engagement”, which is a key metric that shareholders pay attention to." - source

History 2.0: the ledger of record

In the current web documents are host-certified and we refer to data by location instead of contents. Here we'll expand on problems with the status quo and list the benefits of building a web of trust at web-scale through data-centric addressing & self-authenticating documents tied to identity.

Authenticity

We'll be able to computationally verify the authenticity of any document & tie it to an identity as long as we also have the proofs for it - giving birth to the ledger of record where argument from cryptography begins superseding argument from authority.

Anyone might have saved a specific document (& updates to it) locally along with the necessary proofs for authenticity even if most infrastructure no longer stores/serves it. There is a 1 of N guarantee which allows documents that someone wants buried to be passed around with proofs and resurface in the public at a later point - improving accountability.
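
A minimal sketch of what that offline verification could look like, assuming (hypothetically) that the on-chain anchor is the root of a binary Merkle tree over message hashes - the real proof format may differ, but the 1-of-N property only requires that anyone holding a document plus its inclusion proof can re-derive the anchored root:

```typescript
// Hypothetical sketch: re-derive an anchored Merkle root from a saved document + proof.
import { createHash } from "crypto";

const sha256 = (data: Buffer): Buffer => createHash("sha256").update(data).digest();

interface ProofStep {
  sibling: string;        // hex hash of the sibling node
  siblingOnLeft: boolean; // whether the sibling is the left child at this level
}

// Recompute the root from the document bytes and its inclusion proof,
// then compare it against the root that was anchored on-chain.
function verifyAnchoredDocument(
  documentBytes: Buffer,
  proof: ProofStep[],
  anchoredRootHex: string
): boolean {
  let node = sha256(documentBytes);
  for (const step of proof) {
    const sibling = Buffer.from(step.sibling, "hex");
    node = step.siblingOnLeft
      ? sha256(Buffer.concat([sibling, node]))
      : sha256(Buffer.concat([node, sibling]));
  }
  return node.toString("hex") === anchoredRootHex;
}
```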

The global Git

Headjack is a global version control system with different data availability tradeoffs - storage and retrievability are not guaranteed. If someone processes everything that's linked to the blockchain they'd be able to track the creation and changes of each document - both the edits from the original authors and the forks & references from others, and also the sequence of authorizations that update who has authority to edit a document. All events are cryptographically sealed in time with on-chain commitments and the history cannot be tampered with.

We should be able to view the changes of any document with a diff view - similarly to what The Internet Archive provides (see this as an example) but with a lot more control and a wide range of different visualization tools - example: a slider for filtering/jumping through time like in Discourse.

"The public’s interest in seeing what’s changed—or at least being aware that a change has been made and why—is as legitimate as it is diffuse. And because it’s diffuse, few people are naturally in a position to speak on its behalf." - source

We ought to rebuild everything on top of this ledger of record - including Wikipedia (no more dead links!), open source code, science and peer review - under one global interlinked namespace where any public event is referencable so that others can comment on it.

The history of document updates

Today's web puts authenticity & certification of documents in the hands of hosts which can do whatever they want and rarely provide the option to see previous versions if edits have been made. The Internet Archive is hardly a mainstream tool - it doesn't provide any cryptographic authenticity guarantees and can be compromised.

"It is really tempting to cover for mistakes by pretending they never happened. Our technology now makes that alarmingly simple" - source

"Society can’t understand itself if it can’t be honest with itself, and it can’t be honest with itself if it can only live in the present moment. It’s long overdue to affirm and enact the policies and technologies that will let us see where we’ve been, including and especially where we’ve erred, so we might have a coherent sense of where we are and where we want to go." - source

In Headjack updates to URIs are broadcasted but the previous versions remain - applications ought to display the latest state but should allow browsing the entire history of changes - like using Git.

Deduplicating documents & traceability

An open paradigm with content addressing where data is shared between services would let us de-duplicate re-uploads based on their hash, as long as they are the same documents in terms of bytes. We'll be able to see when something first appeared & the discussion will be much less fractured between platforms and posts - leading to greater depth.

We'll be able to more easily address parts of documents and share ranges of entire videos without having to re-upload them as separate clips which breaks the contextual link. If this becomes as easy as (or even easier than) it currently is to crop & re-upload, then it will become the norm - we'll all prefer not losing the context. In this paradigm deepfakes will be easier to spot & fight - tracing the source of content authentically to identity is important & desirable.

TODO: regarding deepfakes - only official statements could be traced - unofficial leaks will still be unprovable

Verifiable credentials

Entities can sign messages that attest facts about other accounts - the creation of such verifiable credentials doesn't have to happen on-chain - they can be issued off-chain with a message that's only anchored on-chain and has a URI. "issuance is common, revocation is rare" - later revocations & updates can be handled in one of 2 ways:

  • On-chain revocation/updates: if the attestations are uniquely numbered with a counter from the issuer using a nonce, then the Headjack state can be extended to support a special revocation list field in which the chain can record revocations at specific blocks - then the validity of said attestations will be checkable with a single query to the blockchain state. For updates there would be a second list and in order to check the validity for an attestation after an update has been recorded for its nonce, users would need to fetch the off-chain anchored message corresponding to the update at the block at which it was flagged. The blockchain may charge periodic fees for state rent for these lists.
  • Fully off-chain: in which case there will be some liveness assumptions around the issuer for checking if an attestation has been revoked/updated.
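
A minimal sketch of the on-chain revocation variant from the first bullet - all names & shapes below are hypothetical and only illustrate why a single query against chain state is enough to check validity:

```typescript
// Hypothetical model of the relevant chain state: per-issuer revocation & update lists.
interface ChainState {
  revokedNonces: Map<string, Set<number>>;         // issuer -> revoked attestation nonces
  updatedNonces: Map<string, Map<number, number>>; // issuer -> nonce -> block of latest update
}

interface Attestation {
  issuer: string; // issuer account id
  nonce: number;  // unique per-issuer counter
  claim: string;  // the attested fact (anchored off-chain, referenced by URI)
}

type AttestationStatus =
  | { kind: "valid" }
  | { kind: "revoked" }
  | { kind: "updated"; atBlock: number }; // fetch the anchored update message at this block

// A single lookup in chain state answers whether the attestation still stands.
function checkAttestation(state: ChainState, att: Attestation): AttestationStatus {
  if (state.revokedNonces.get(att.issuer)?.has(att.nonce)) return { kind: "revoked" };
  const updatedAt = state.updatedNonces.get(att.issuer)?.get(att.nonce);
  if (updatedAt !== undefined) return { kind: "updated", atBlock: updatedAt };
  return { kind: "valid" };
}
```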

Reputation systems

"Reputations will be of central importance, far more important in dealings than even the credit ratings of today." - The Crypto Anarchist Manifesto

We don't need oracles, tokens, automatic on-chain settlement & markets through smart contracts to build reputation systems for predictions & promises - all we need is to immutably sequence predictive messages that are authentically linked to identity and plot the results - the open nature of the data would disincentivize platforms to display it incorrectly which is enough - we trust block explorers after all.

Take the Tipranks platform as an example - we can generalize it for anyone in the world - not just for certified financial advisors. The reality is that millions of people are effectively guilty of shilling, despite some prefacing it with the infamous "this is not financial advice". We can self-regulate the crypto & financial industries bottom-up in a decentralized way - steps:

  1. come up with the base set of extensible prediction message types
  2. build the tools that plot predictions versus a price feed
  3. demand that influencers use the specific types of messages for predictions
  4. refuse to listen to accounts that don't use that format and build the habit to check track records before listening to someone - this can (and will) become a social norm
  5. let the chips fall where they may

Message types can be in an extensible inheritance hierarchy and have "fallback" translation mechanisms defined in their on-chain schema for platforms that don't support specific leaf types. As an example: on-chain schema 42 can have the following template for serialization: "{asset} has an {probability} chance of being {above_or_below} {price} by {date}", and thus a basic application that encounters {message_type: "42", asset: "$BTC", date: "2025.02.12", above_or_below: "above", price: "100000$", probability: "80%"} could render "$BTC has an 80% chance of being above 100000$ by 2025.02.12". Or there could be a message type with spline curves. This way the system can evolve even if applications move at different paces and there's no consensus on the evolution of messages - it will naturally happen. Rigidness and/or lack of consensus for such standards has been the bane of many open systems.
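
A minimal sketch of such a fallback renderer - the schema & message shapes below are illustrative only:

```typescript
// Hypothetical shapes for an on-chain schema and a prediction message.
interface MessageSchema {
  type: string;
  template: string; // fallback serialization template registered on-chain
}

type PredictionMessage = Record<string, string>;

// Substitute every {field} placeholder with the corresponding message field.
function renderWithFallback(schema: MessageSchema, msg: PredictionMessage): string {
  return schema.template.replace(/\{(\w+)\}/g, (_, field) => msg[field] ?? `{${field}}`);
}

// The example from the text:
const schema42: MessageSchema = {
  type: "42",
  template: "{asset} has an {probability} chance of being {above_or_below} {price} by {date}",
};
const msg: PredictionMessage = {
  message_type: "42", asset: "$BTC", date: "2025.02.12",
  above_or_below: "above", price: "100000$", probability: "80%",
};
console.log(renderWithFallback(schema42, msg));
// -> "$BTC has an 80% chance of being above 100000$ by 2025.02.12"
```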

The argument that specialized message types are unnecessary because AI will eventually be able to classify things properly is moot - let's get something that is unambiguous and working now - structure is good. The use case for reputation goes beyond finance.

"Finally, self-authenticating data provides more mechanisms that can be used to establish trust. Self-authenticated data can retain metadata, like who published something and whether it was changed. Reputation and trust-graphs can be constructed on top of users, content, and services. The transparency provided by verifiable computation provides a new tool for establishing trust by showing precisely how the results were produced. We believe verifiable computation will present huge opportunities for sharing indexes and social algorithms without sacrificing trust, but the cryptographic primitives in this field are still being refined and will require active research before they work their way into any products." - bluesky

Science, peer review & DeSci

"Society, business & money are downstream of technology, which is itself downstream of science. Science applied is the engine of humanity." - @naval

Open source code is compiled, ran & verified by many independent actors - we should fix the replication crisis in science and push towards more reproducible research.

"More than 70% of researchers have tried and failed to reproduce another scientist's experiments, and more than half have failed to reproduce their own experiments." - source

"Imagine if we optimized for number of independent replications over number of citations." - @balajis

Papers can be split into text, data, code & results with all of them referencable with stable URIs & cryptographically tied to identity, reputation & open peer review. There will always be an element of trust for the input data coming from the physical world but the digital part can be locally verifiable & replicable. Citations could become function calls / imports so that we can trace the dependency graph in science and focus on re-testing the most important bits - we might save a billion or two and avoid lost decades. We then could easily change the data in one paper and see the ripple effects for everything that depends on it. Let's build the digital chain of custody for papers, science & facts.

"Composable science is reproducible science." - @balajis

TODO: bringing real-world data on-chain

If we onboard the world's information and build reputation systems, we will have solved the oracle problem of bringing facts & events to the blockchain. Example: the result of a UFC fight - there's no longer a need for an oracle as the UFC itself will post the result in an unambiguous way.

Event streams & data legos

"Composability is to software as compounding interest is to finance" - @cdixon

Same data - different views

"Data is the center of the universe; applications are ephemeral." - The Data-Centric Manifesto

The public conversation shouldn't be fractured between platforms such as Twitter, YouTube & Reddit - instead it should be one but viewed through different lenses (based on moderation / indexing / visualization). The Twitter view of a discussion is basically the same as just the top level comments (without the children) in a Reddit thread. Segregated discussion in the open web serves nobody - there should be canonical IDs for events & information that we can all refer to. It doesn't make sense that comments can be arbitrarily disabled for some document on one platform (YouTube) but enabled on another one where a URL from the first is shared. All content could be interlinked, deduplicated, referencable, quotable, commentable & shareable.

"When identities become portable, backends become liquid" - @balajis

Event streams

In an open data environment anything could become an event stream as long as someone is willing to pay for the processing costs (filtration, transformation, storage):

  • edits to a specific document identified by a URI
  • references/mentions of an account/entity/word/URI in public documents
  • any other type of filtration criteria - thresholds, exclude lists, etc.
  • complicated streams can be constructed by transforming/joining others - similar to Kafka
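
A minimal sketch of deriving & composing such streams from the firehose of anchored messages - the message shape and the "mentions" field are hypothetical stand-ins for whatever the on-chain schemas would actually define:

```typescript
// Hypothetical shape of a message that has been anchored on-chain.
interface AnchoredMessage {
  uri: string;        // persistent URI of this message
  author: string;     // account id of the author
  type: string;       // message type: "edit", "post", "reference", ...
  mentions: string[]; // URIs / accounts / entities referenced by this message
}

type StreamFilter = (msg: AnchoredMessage) => boolean;

// "edits to a specific document identified by a URI"
const editsOf = (docUri: string): StreamFilter =>
  (msg) => msg.type === "edit" && msg.mentions.includes(docUri);

// "references/mentions of an account/entity/word/URI in public documents"
const mentionsOf = (target: string): StreamFilter =>
  (msg) => msg.mentions.includes(target);

// Streams compose by joining/transforming others - similar to chaining Kafka processors.
const either = (...filters: StreamFilter[]): StreamFilter =>
  (msg) => filters.some((f) => f(msg));

// Lazily derive a new stream from the firehose of all anchored messages.
function* deriveStream(firehose: Iterable<AnchoredMessage>, filter: StreamFilter) {
  for (const msg of firehose) if (filter(msg)) yield msg;
}
```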

If someone implements speech-to-text and starts transcribing audio episodes and publishing the output it would immediately become available to anyone and would automatically end up being parsed, indexed & pushed through data pipelines. Composability. This is not possible with closed platforms - even if someone was willing to pay the processing costs.

https://openai.com/blog/whisper/ https://news.ycombinator.com/item?id=32927360

Notifications & subscriptions

Twitter decided that it needs to boost engagement and forced "recent tweet" notifications on us without the ability to turn them off - that needs to stop - explicit preferences should be honored.

"Notifications are just alarm clocks that someone else is setting for you." - @naval

When identity is decoupled from the presentation layer we could have IDMs that align with our needs - we could fine-tune how and when we want to be notified. The incentive for an IDM is not to suck all of our attention (as opposed to applications that usually serve ads) - there are other ways to monetize. We'd be able to set a threshold or filter on anything. Subscriptions can be granular & multidimensional for any type of event stream - like "show me everything from X unless from application A or message type T". Some IDMs could even offer the feature to show notifications only in specific time ranges of the day - for those addicted to dopamine hits.
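
A minimal sketch of the kind of subscription rule an IDM could store and honor ("show me everything from X unless from application A or message type T") - the field names are hypothetical:

```typescript
// Hypothetical subscription rule stored by the IDM, independent of any application.
interface SubscriptionRule {
  followedAccount: string;
  excludedApplications: string[]; // don't notify for events published through these apps
  excludedMessageTypes: string[]; // don't notify for these message types
  quietHours?: { fromHour: number; toHour: number }; // only notify outside this window
}

interface IncomingEvent {
  fromAccount: string;
  application: string;
  messageType: string;
  timestamp: Date;
}

// The IDM (not the application) decides whether the user gets pinged.
function shouldNotify(rule: SubscriptionRule, event: IncomingEvent): boolean {
  if (event.fromAccount !== rule.followedAccount) return false;
  if (rule.excludedApplications.includes(event.application)) return false;
  if (rule.excludedMessageTypes.includes(event.messageType)) return false;
  if (rule.quietHours) {
    const h = event.timestamp.getHours();
    const { fromHour, toHour } = rule.quietHours;
    const inQuietHours =
      fromHour <= toHour ? h >= fromHour && h < toHour : h >= fromHour || h < toHour;
    if (inQuietHours) return false; // respect the user's quiet hours
  }
  return true;
}
```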

TODO: follow anything

What if you wanted to get an event/notification each time someone follows something or someone? There's no way to configure the current platforms for that.

We should be able to follow playlists in social media - and be able to turn on notifications for them

Improved infrastructure

Tying data to identity and making it freely available & outside of silos through content-centric addressing enables a lot of composability, functionality & innovation.

Code as addressable data

Frontend code served by applications can be published and have its own URI. Updates to it would happen by broadcasting the next version along with a new URI and then pointing on-chain to it as the latest to use for viewing media. This way presentation layers could be cached locally and in a distributed way with proofs for authenticity - improving redundancy, latency, and throughput. Checking for a newer version would be a small query to the chain if there is a new URI - version control for frontends. This can work even for more dynamic applications that serve different versions depending on region/locale or which are A/B testing - the dynamic part could be served from a centralized host while smaller chunks of code could be referenced through URIs.
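
A minimal sketch of this "version control for frontends" flow - both query functions are hypothetical stand-ins for whatever APIs chain nodes & infrastructure providers would expose:

```typescript
// Hypothetical pointer committed on-chain for an application's latest frontend bundle.
import { createHash } from "crypto";

interface FrontendPointer {
  latestUri: string;   // persistent URI of the newest published bundle
  contentHash: string; // hex hash committed on-chain for that bundle
}

const sha256Hex = (data: Buffer) => createHash("sha256").update(data).digest("hex");

const localCache = new Map<string, Buffer>(); // uri -> cached bundle bytes

async function loadFrontend(
  appAccount: string,
  queryLatestFrontend: (app: string) => Promise<FrontendPointer>, // hypothetical chain query
  fetchByUri: (uri: string) => Promise<Buffer>                    // fetch from any peer/cache
): Promise<Buffer> {
  const pointer = await queryLatestFrontend(appAccount); // small query: is there a new URI?

  const cached = localCache.get(pointer.latestUri);
  if (cached && sha256Hex(cached) === pointer.contentHash) {
    return cached; // authentic copy already cached locally - no network round trip
  }

  const fresh = await fetchByUri(pointer.latestUri); // any host/peer can serve the bytes
  if (sha256Hex(fresh) !== pointer.contentHash) {
    throw new Error("fetched bundle does not match the on-chain commitment");
  }
  localCache.set(pointer.latestUri, fresh);
  return fresh;
}
```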

TODO: this wouldn't be necessary https://twitter.com/armada_infra/status/1584942215217836032

Better and more competitive search engines

  • Building indexes would be greatly simplified as they will be plugged to the global message bus and update only on events (push) - instead of periodic batch crawling of the public web (pull).
  • Message schemas will improve the indexing & information extraction from dynamic websites. The semantic web will also greatly empower search engines & unlock powerful queries.
  • The move to data-centric addressing and the desegregation of data will lead to a lot less duplicates and more rich & precise context around any event/message.
  • Currently ephemeral experiences (search suggestions) leave no trace and it's extremely hard to prove bias (ask Dr. Robert Epstein) - competition & a lower barrier to entry are direly needed.
  • Search engine sophistication would span the full spectrum - from data center scale to those that you can run locally at home, or the specialized - The Future of Search Is Boutique.

Optimal archiving (like the Internet Archive)

  • Actively polling all websites on earth periodically to check for changes and save snapshots won't be necessary - instead an archival service will just watch & save all incoming events and have a complete history without any redundant data & inefficiencies.
  • By decoupling content from presentation (HTML), only the essential could be saved. Applications can signal a change with a new message type in what they serve to browsers for presentation & rendering of content which the archival service could save throughout time as well to provide the historical views. Data duplication in snapshots can be driven to 0.
  • Content that is no longer accessible through the original application that published it and is not archived by the user that posted it (but hasn't been explicitly deleted) would still be accessible by anyone with the same persistent URIs when querying an archival service.

Redundancy, scaling & topological flexibility

Data-centric addressing with self-authenticating data allows for distributed & resilient to DDoS attacks architectures that span the entire globe with horizontal scaling & store-and-forward caches. Computed views such as indexes, graphs & aggregate metrics (counts) can also be made addressable, distributed & cached with either optimistic authenticity (trust by default but with a way to recreate them and check for equivalence) or even have a proof with verifiable computation. Furthermore, there are at least a few points to query for the data of a URI.
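
A minimal sketch of the optimistic flavor of that authenticity for a computed view (counts of likes): trust the aggregate served by an index provider by default, but keep the ability to recompute it from the underlying anchored events and check for equivalence - the event shape is illustrative only:

```typescript
// Hypothetical shape of an interaction event that has been anchored & broadcasted.
interface InteractionEvent {
  targetUri: string; // document being interacted with
  kind: "like" | "reply" | "repost";
}

type LikeCountView = Map<string, number>; // targetUri -> number of likes

// Deterministic recomputation from the raw events - anyone can run this.
function recomputeLikeCounts(events: Iterable<InteractionEvent>): LikeCountView {
  const counts: LikeCountView = new Map();
  for (const e of events) {
    if (e.kind !== "like") continue;
    counts.set(e.targetUri, (counts.get(e.targetUri) ?? 0) + 1);
  }
  return counts;
}

// Challenge a served view by checking it for equivalence against the recomputation.
function viewsAreEquivalent(served: LikeCountView, recomputed: LikeCountView): boolean {
  if (served.size !== recomputed.size) return false;
  for (const [uri, count] of recomputed) {
    if (served.get(uri) !== count) return false;
  }
  return true;
}
```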

Knowledge management


"However, to rely on purely idealism as a motivator of adoption is naive. We need a user experience that is much better than today and we need to invent tools that users will absolutely never attain in the Web 2 realm." - "Social Web 3" by Zee Prime Capital

Bookmarks & playlists

Universal bookmarks - they can have a single repository (your IDM) and work for any type of document from any application. They will be persistent and you could even cache the actual contents that a URI points to along with proofs - in case no one hosts it in the future.

Your personal knowledge base could be built with something like Logseq with URI references to external documents that can be locally cached. Looking up the discussion/commentary for a resource with a URI would be just 1 click away.

Playlists are lists of bookmarks and could work even with heterogeneous audio/video providers which anchor the tracks and provide URIs for them. Spotify could be just an application that uses your IDM for account storage and is paying to other media hosting providers for the streaming.

Intra-document addressing

In Medium you can tweet a selection (sentence/word/paragraph), but when going back to the article from the tweet you aren't shown the original selection. With some archival services you can point to a text selection - for example, this link has "Prussian Model" selected from the title when you open the page (#selection-635.4-635.18) and you can change the selection, which also changes the URL. That is possible only because there's a specific hash in the URL (O2D45) and the document is guaranteed not to change in the archive - however, that's not the case with Medium, where the authenticity of documents is host-certified and they can change over time.

With Headjack URIs point to a specific version of a document and as explained in the addressing chapter we could point to parts of documents in the URIs. If a document has been changed, updates will have their own new URIs and when an application is showing an old URI with intra-document addressing it could:

  • either show a label that there's a newer version of the document and the user can switch
  • or directly show the new version if it's possible to transfer the selection without conflicts

Headjack's intra-document addressing is universal - it works for audio & video too and the application from the startup case study could display this clip with this quote in a much better way:

"The internet creates 1 giant aggregator for everything" - @naval

This can be pushed further - any composition/remix/meme of media could contain the references to the original text/pictures/audio/video so the sources of something can be traced and credited - imagine something like a movie maker that composes from other clips and all metadata is retained.
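
A minimal sketch of how an application could resolve such an intra-document reference, assuming a hypothetical "#chars=<from>-<to>" fragment appended to the persistent URI of a specific document version - the real syntax is defined in the addressing chapter:

```typescript
interface ResolvedSelection {
  versionUri: string;       // the exact (immutable) document version that was addressed
  selectedText: string;     // the addressed excerpt
  newerVersionUri?: string; // present if the document has since been updated
}

// Resolve a URI with a hypothetical "#chars=<from>-<to>" fragment against the version
// it points to, and check whether a newer version exists so the UI can offer a switch.
function resolveSelection(
  uri: string,
  getDocumentText: (versionUri: string) => string,            // fetch by persistent URI
  getNewerVersion: (versionUri: string) => string | undefined // chain lookup for updates
): ResolvedSelection {
  const [versionUri, fragment = ""] = uri.split("#");
  const range = /chars=(\d+)-(\d+)/.exec(fragment);
  const text = getDocumentText(versionUri);
  const selectedText = range
    ? text.slice(Number(range[1]), Number(range[2]))
    : text; // no fragment - the whole document is addressed
  return { versionUri, selectedText, newerVersionUri: getNewerVersion(versionUri) };
}
```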

TODO:

https://subconscious.substack.com/i/49124972/text-fragments-select-excerpts-by-search https://wicg.github.io/scroll-to-text-fragment/ https://support.google.com/chrome/answer/10256233?hl=en&co=GENIE.Platform%3DDesktop= https://en.wikipedia.org/wiki/Vannevar_Bush#:~:text=wholly%20new%20forms%20of%20encyclopedias%20will%20appear%2C%20ready%20made%20with%20a%20mesh%20of%20associative%20trails%20running%20through%20them%2C%20ready%20to%20be%20dropped%20into%20the%20memex%20and%20there%20amplified

The Semantic Web (a.k.a. the original "Web3")

https://twitter.com/Golden https://golden.com/blog/golden-raises-40m-series-b/

The biggest hurdle for its adoption has been the host-centric paradigm and the hoarding of data in silos with no incentive for exporting & interoperability - Headjack changes that through data-centric addressing & broadcasting by default. While there will always be companies that enrich & tag data privately with their own ontologies and vocabularies to construct knowledge graphs for themselves, with open data by default and persistent URIs that always point to the same documents anyone will be able to broadcast similarly annotated versions of content with new URIs and relate them to the originals in a stable way for reuse by others. We can give birth to the public Giant Global Graph outside of large centralized systems such as Google and Facebook. Machine learning for processing unstructured data has its place but it can only go so far - structuring through the use of different message types and further annotations will make everything a lot more machine-readable.

Query & plot anything

There are no limits to the types of queries we should be able to make - some simple examples:

  • "Plot a timeline for all references to an event and filter based on some criteria."
  • "Show me responses/references from high-trust individuals to the top 5 controversial statements of person X and sort them somehow."
  • "When has person X talked about the gut microbiome and have they recommended/mentioned a product or company?"
  • "Has anyone I follow or is connected up to N degrees of separation with me shared/mentioned X in the last Y days?"

Companies with complex indexes & private knowledge graphs could charge for running queries.
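
A minimal sketch of the last query from the list above ("has anyone up to N degrees of separation from me mentioned X in the last Y days?") over openly available data - a real deployment would push this into an index provider, and the shapes here are illustrative only:

```typescript
// Hypothetical shapes for a mention event and a follow graph built from open data.
interface Mention {
  author: string;  // account id
  target: string;  // URI / entity / word that was mentioned
  timestamp: Date;
}

type SocialGraph = Map<string, string[]>; // account -> accounts it follows

// Breadth-first expansion of the follow graph up to N degrees of separation.
function accountsWithinDegrees(graph: SocialGraph, start: string, degrees: number): Set<string> {
  const reached = new Set([start]);
  let frontier = [start];
  for (let d = 0; d < degrees; d++) {
    const next: string[] = [];
    for (const account of frontier) {
      for (const followed of graph.get(account) ?? []) {
        if (!reached.has(followed)) {
          reached.add(followed);
          next.push(followed);
        }
      }
    }
    frontier = next;
  }
  reached.delete(start);
  return reached;
}

// "Has anyone up to N degrees of separation from me mentioned X in the last Y days?"
function recentMentionsByNetwork(
  mentions: Mention[], graph: SocialGraph, me: string, target: string, degrees: number, days: number
): Mention[] {
  const network = accountsWithinDegrees(graph, me, degrees);
  const cutoff = Date.now() - days * 24 * 60 * 60 * 1000;
  return mentions.filter(
    (m) => m.target === target && network.has(m.author) && m.timestamp.getTime() >= cutoff
  );
}
```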

Forking media & communities

Most applications will have some kind of moderation & filtration but at any point in time anyone will be able to create a competitor with a different set of rules and/or functionality. Migration can be seamless and the activity of accounts could (and in most cases will) be displayed in both at the same time - but the preferred application will be signaled through the URIs of public actions.

Content coming from competitors could be completely banned but that would be shortsighted - data coming from somewhere else should be displayed as long as it is properly structured and actions from banned accounts can simply be shown as deleted in reply threads. Migration can be gradual with little fracturing of communities & the conversation.

The future of publishing, knowledge & learning

Books as a medium imply transmissionism as the learning model - "people absorb knowledge by reading sentences" - which is wildly incorrect and inefficient. Let's ask this question:

"How might we design mediums which do the job of a non-fiction book—but which actually work reliably?" - Andy Matuschak - “Why books don’t work”

Here's a good start:

  • Outlines as hierarchical expandable trees (like a GitBook) and fractal reading (each chapter summarized in 1 paragraph, expandable on several levels if you want to dig deeper)
  • Interactive executable programs with exercises & examples embedded in documents.
  • Note taking, flash cards, etc.

But that's just scratching the surface - deduplicated open data, persistent URIs linked to identity, intra-document addressing, and the ability to cache resources with proofs enables a lot more:

  • View the discussion and comment on anything referenced externally with a URI - or parts of the book itself as they could also have their own URIs - or even paragraphs!
  • Use tools like Logseq, Roam Research & Obsidian but on top of stable URIs & the semantic web to build your own knowledge base as bookmarks & a graph and share it if you want.
  • Publications can be self-contained and permanent by including everything external for offline use with proofs for authenticity - that's how journalism should be done.
  • Visualize your progress of going through an interconnected book as a color-coded map - an overview of what you've already looked at, how much time you've spent, and what's left.

Imagine constructing stories as interconnected documents, concepts & entities while wearing a VR headset by pulling from the semantic web & the open internet and publishing that as a self-contained & authentic package that anyone can save offline, explore & build upon.

"The most powerful person in the world is the story teller. The storyteller sets the vision, values and agenda of an entire generation that is to come" - Steve Jobs

Let's empower them.

Algorithms, feeds & aggregation

The most important aspect of today's media are the recommendation systems that guide our attention and ultimately our thoughts, opinions & culture at scale. Most algorithms we use are black boxes - we don't know what they are nor how the ML models are parameterized & trained and if there's any (intentional) bias. And even if that was public information - there would be no way to check because we don't have access to the data either. Furthermore, nobody can build new competing indexes, models & recommendation engines and we're left with the lowest common denominator that optimizes mostly for engagement & time spent in the attention economy.

Is it a coincidence that one of the most important papers in AI/ML that introduced transformers was called "Attention is all you need"?

Verifiable algorithms, indexes & models

Within Headjack, all public data is freely accessible and the data network effect is shared by every actor in the ecosystem - anyone can train new ML models, compute views and analyze data in new ways. There will be demand for all kinds of indexes & models but because entry is permissionless a market will appear and the ones that process data the most efficiently will be rewarded. The results of such batch processing jobs could be public or private (available only to those who pay for access) and the product would be verifiable - either optimistically (trust by default but with the option to recreate them and check for equivalence at some cost) or through verifiable computation.

"Rather, many, many different individuals and organizations would be able to tweak the system to their own levels of comfort and share them with others—and allow the competition to happen at the implementation layer, rather than at the underlying social network level." - Protocols, Not Platforms: A Technological Approach to Free Speech

Feeds & home pages

We have no control over what social media shows us - we're presented with their best attempt at engaging us and nothing more. Feeds are primitive/limited and have their own agenda. Once access to data has been democratized we could finally have a choice as new entrants will differentiate themselves in ways that weren't possible before. What if:

  • you want the first thing you see to be a dashboard with graphs & charts about the things you care about instead of an endless feed to scroll?
  • instead of a feed you were shown a map with events grouped by category/people/activity so you could choose selectively which zones to "zoom" into based on your interest?
  • you're interested in all events from the past week - not just those from the last 24 hours?
  • you want things sorted differently, or to tune out certain kinds of content?
  • you want your video feed to show content that people you follow have engaged with?

Signaling preferences

Events coming from an application are an advertisement for it because the URIs point to it as the source - users are effectively signaling what UI, filtration & content moderation they prefer.

But this can be pushed a lot further - using public message types, more specific preferences within one application (and compatible with others) could be signaled. The choice of recommendation algorithms, filtration criteria & moderation levels could be displayed in account profiles throughout all applications either as badges or in some other way. An application could accommodate a wide range of preferences by using different indexes & models which would be handled by infrastructure layers beneath them - reusable by all other applications that need them too.

This way of signaling fine-grained preferences could allow us to collectively migrate to better algorithms - by showing which interests & perspectives we've adopted ourselves. One value to signal would be to use algorithms & models that are transparent and not some black box - we could study how virality happens. What if some algorithm promotes antisocial ultra viral content close to the borderline of acceptability just for the sake of engagement? We could boycott it. And why not choose indexes & algorithms that up & down regulate specific accounts? Moderation could be done through a set of filters to which users opt in/out by toggling & layering - choosing what to amplify and what to tone down. Such preferences can also be private.

Currently social media skews our perception of the world because the vocal become viral - most people are quiet and rarely (if ever) post anything, but they do consume. We could let them signal their preferences and better gauge our values. Preferences are a form of expression.

Aggregate sentiment - the big picture

The aggregate sentiment on Twitter’s backend is analogous to a liquidity order book with the spread being the Overton window - we could have a completely different understanding of society, history, and politics and have a societal mirror if not for the current information asymmetry that Big Tech has (point taken from here). Furthermore, we cannot see the "border" between Twitter and Facebook in terms of users - even they (the companies) cannot because the data & backends are private (point taken from here).

TODO: https://twitter.com/balajis/status/1581635513886253057 https://pbs.twimg.com/media/FfMYkpLVEAAOo_d?format=jpg&name=large

With open data, systems could show the overlap of communities and focus on what's shared and unites them - bridging the gap, making “the others” less foreign, and reducing polarization. We should be taking notes from Taiwan's civic hacktivism and their Computational Democracy Project & G0v movement - building better tools for social consensus & relaxing the culture war. Everything could be visualized through dashboards & graphs - including global heat maps of interests & activity based on topic/region/etc. Learning from history & making analogies could be much more quantifiable and precise if everything was in the ledger of record. We could leverage AI to surface human collective intelligence at scale in a transparent way.

"The general trajectory of institutionalization associated with steadily increasing specialization, urbanization, and bureaucracy may mean that mass media will continue to rise in importance, playing the role of the juicy gossiper in our increasingly separated existence from one another." - The Importance of Gossip Across Societies

Business models

Let's see how the current digital economy could be translated & reimplemented on top of Headjack.

Why build if vendor lock-in isn't possible?

Freedom can be a competitive advantage: Substack lets you leave & take your subscribers with you (an email list) - a conscious choice to compete on the quality of service. But they are an outlier.

Headjack's fundamental premise is that it unlocks many possibilities & value to users because the barrier to entry for services would be much lower and they could be composable & interoperable which would lead to greater innovation, quality, choice & freedom. So given that users would benefit more in this ecosystem and that their attention is finite & will be spent either way - it stands to reason that there is money to be made in an even bigger pie! But competition will look different.

"The whole is greater than the sum of its parts." - Aristotle

Unbundling the media stack with markets

"There’s only two ways I know of to make money– bundling, and unbundling." - Jim Barksdale

The rule book will be different - instead of vertically integrated services with tons of employees that reinvent the wheel, companies would specialize and compete on different layers of the stack. Markets Are Eating The World - there will be one for each of these:

  • Storage & retrieval of historical events, data & results of batch processes.
  • Indexing at scale and even custom indexing by specific criteria.
  • Training of AI models for recommendation systems, feeds, etc.
  • Ad serving engines that match users with ones that would be relevant to them.
  • Computed views (aggregate metrics) such as counts of likes & other interactions.
  • Content moderation & safety labelers (malware/copyright/offensive/spam) producing lists with flagged accounts, specific pieces of content or even entire applications.
    • Their output could be used not just for the final presentation layer but also before indexes, models & computed views are created as they could be gamed with spam.
  • Custom types of data streams, event subscriptions, and anything else one could think of!

Creating a new social application would require just a bit of frontend code by a few developers with all the heavy lifting of infrastructure, data storage & processing, content moderation, ad serving & recommendation engines being provided by pay-as-you-go services through APIs. The (data) network effects will be shared by every actor in the ecosystem. Launching a proof of concept would be trivial and new business models & use cases (no ads for social) would be enabled.

  • mention alchemy in business models page about infra companies https://www.alchemy.com/ https://twitter.com/n_x_y_z/status/1580136531552915456 https://n.xyz/

It will take some time for markets to mature and the dust to settle once specialization & competition are democratized, but then prices will be low and the efficiency & quality of services high.

Attribution

TODO: re-read from "A recent example" https://www.thepullrequest.com/p/attribution-rules-the-world-and-itll

"Ads are the cave art of the twentieth century." - Marshall McLuhan

The goal is democratized access, competition & innovation - not the end of the ad model, which is a pipe dream - ads aren't going anywhere. The current problem is that we don't have alternatives and that there's no transparency.

"For as long as humans have crafted disembodied versions of their voices, whether it be Pompeiian graffiti or the latest tweet, there have been attempts to both guide user attention in some remunerative direction, and measure the effectiveness of that attention-gathering." - source

"The attention economy has always had its ledger and its cash register, and Web 3 will be no different." - source

!! on data brokers https://themarkup.org/privacy/2021/04/01/the-little-known-data-broker-industry-is-spending-big-bucks-lobbying-congress

"Attribution is the accounting layer of the entire user-acquisition stack: social media, organic community building, referral programs or (gasp!) even ads, they’re all just inputs to attribution. It's much more than just bean-counting, though the bean-counting is important. It’s the capital ‘T’ Truth that the entire ecosystem depends on." - source

"Internet monetization is somewhat like a Soviet election: It doesn’t matter who clicks and where, it’s who counts those clicks that matters. The technology and business of that counting of clicks (and everything else you do online besides) goes by the dull-sounding name of attribution, and it determines the fate of trillion-dollar companies." - source

Let's examine the following example:

"A recent example of questionable attribution: I tweeted about my e-mountain bike, Jason Calacanis (of all people) saw it and asked about the model. Someone posted a review from bikeradar.com, and Jason (I don’t know if this is true) probably googled for it and maybe bought it. Who deserves the attributions credit? According to Google (surprise, surprise), it’ll be Google … and they’ll take all the credit, which is why Google is worth so much and Twitter so little." - source

With interoperable identity & openly broadcasted data, this entire flow could be tracked & attributed.

Furthermore, attribution services could credit accounts that retweet content and drive more interactions with it with a share of the profits.

De-duplication through content addressing & traceability of content help paint the picture of what happened, when, and by whom - aiding attribution. Infrastructure companies on which applications are deployed can handle such tracking internally, but since all the data is openly broadcasted, competing services could offer alternative business models.

"Whether it be circulation numbers for 19th-century newspapers (the start of the printed ads business), or Nielsen ratings for pre-cable TV that determined ads rates there, there’s never been a media ecosystem that didn’t have attribution." - source

"If you had to conjure some collective mechanism for storing aggregated data that was selectively shareable between publisher and advertiser, it would look much like a blockchain." - source

This can be done through verifiable computation and by sharing indexes between publishers & advertisers within the index infrastructure.

This article posits that attribution has to go on-chain but Headjack offers an alternative - through the infrastructure companies.

It is inevitable that mass media & aggregation will be handled by centralized services that can save user details in logs. What we could do is create a market for user data, make it easier to opt out of such data harvesting by providing alternatives, and give whistleblowers a way to prove when a contract hasn't been honoured.

Open question: would the history of actions and tracked events be private and shared only between infrastructure platforms and the parties paying for ads?

Open question: what happens when people jump between applications but aren't logged into some of them? The attribution chain is broken.

Applications could strategically choose which infrastructure companies they use in order to provide the best attribution for accounts.

The ad business needs competition: https://www.wired.com/story/google-antitrust-ad-market-lawsuit/

TODO: a graphic like this one: https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/388a0504-f0a1-4994-b1ab-a1084aee9008_3453x1949.png

Users could advertise a wallet for attribution micropayments - or this could be handled by their IDMs.

"Advertising will not go away—it never does—but who profits from it will change radically." - source

Advertising

Google currently gets the lion's share of ad revenue.

"For every 1,000 ad views, advertisers pay a certain rate to YouTube. YouTube then takes 45% and the creator gets the rest." - source

"Your take rate is my opportunity" - @cdixon

https://a16zcrypto.com/wp-content/uploads/2022/05/Screen-Shot-2022-05-16-at-10.43.30-PM-1024x582.png

"You know something is profoundly wrong with our economy when Big Tech has a higher take rate than the mafia." - U.S. Congressman Ritchie Torres

One reason Headjack would win is that there will be no wallet fragmentation - users will mostly have a single account.

Ads themselves are not the root evil - the problem is the lack of choice & ability to exit in today's monopolistic world, driven by the benefits of vertical integration in the host-centric paradigm.

TODO: re-read https://www.thepullrequest.com/p/the-right-to-never-be-forgotten

https://mobiledevmemo.com/app-tracking-transparency-codex-guide-to-idfa-deprecation-and-skadnetwork/

https://mobiledevmemo.com/how-apple-might-break-fingerprinting-in-ios-16/

TODO: look into Apple's ATT (App Tracking Transparency) & SKAdNetwork.

Attribution is tricky: https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/3d79eda8-590c-4984-97b1-799a21d7b0f7_3734x1183.png https://www.thepullrequest.com/p/everything-is-an-ad-network

Application attribution

All content is published through applications and the creator's choice for which one to use is forever embedded in the content URIs. This can serve as advertising for applications because even when a URI is viewed through competing ones, the original source should be displayed and there ought to be a way to view it through the original one - especially if the message type is not supported.

This can be a powerful way for new applications to get discovered - they can even pay creators to use them in order to attract new users and have their names in the URIs of content forever.
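As a rough illustration (the actual URI scheme is defined in the content addressing chapter - the layout below is an assumption for this example only), any client can extract and display the originating application from a content URI:

```ts
// Hypothetical human-readable layout: "hj://<application>/<account>/<block>/<index>".
// Competing clients can always show the original application and link back to it.
function originatingApplication(uri: string): string | null {
  const match = uri.match(/^hj:\/\/([^/]+)\//);
  return match ? match[1] : null;
}

originatingApplication("hj://some_app/alice/1337/42"); // => "some_app"
```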

Identity managers - the new storage cloud

IDMs will be the central hub to store all user preferences, social graph, DMs, progress on long-form media, personal notes & bookmarks, etc. But also pictures, videos and many other types of data - the one-stop shop for all things identity. There'll be plenty of ways to provide free tiers and yet monetize for more heavy & advanced use. Anyone will be able to host their own IDM.

IDMs could also provide a feed & a home page similar to current social networks with the most "relevant" content for users and even serve ads, but there'll be a lot more configuration options & choice because all IDMs will be in competition with each other & users will be able to migrate.

Creator monetization

Creators will need to shift how they monetize because there won't be platform lock-in. On the other hand, application attribution for content naturally leads users to check out the application it was created with - applications can and should pay creators to publish through them, with payouts potentially tied to the virality of that content.

Video, long-form media & streaming

Let's take YouTube as an example: currently it is & does many things at once - storage & streaming of video, a recommendation system, ad serving, social features, subscriptions, etc.

In the brave new world of Headjack it would be completely unbundled.

TODO:

  • is the video hash uploaded?
  • embedded player required?
  • serving ads or just charging apps for the data streams?

What if some platforms don't freely provide the content but just anchor it and provide APIs, similar to embedding YouTube?

Separate the content hosting & delivery from the recommendation system and the serving of ads.

TODO: streaming.

The future of influencer marketing

Currently, companies cannot target you directly with ads without the social media platforms as middlemen - but with open data, any company could analyze the world and decide who their target audience is. Today, Coca-Cola targets audiences with the same ad, albeit with some precision - imagine if they could target individuals directly with tailor-made ads based on who they are: what if an algorithm picks up that you're gay and Coca-Cola sends you an ad with a shirtless male (if you're male)? Point taken from Yuval Noah Harari here: https://youtu.be/j0uw7Xc0fLk?t=260

Imagine influencers being able to use algorithms to generate synthetic versions of themselves pitching different products to different people and having the application infrastructure serve different versions to different people based on who they are. The influencer is the magnet and the algorithms are the tailors. This is the future whether we like it or not. The good thing about an unbundled media stack is that some services will let you avoid ads for a subscription fee.

Influencer marketing is the end game.

The first mover advantage & the transition

No other solution has a seamless way to address content over HTTP and bridge with traditional DNS.

Traditional Web2 companies/apps/websites will be able to gradually transition and anchor their content into this namespace - the cost of entry would be marginal, and the first to do so would get indexed and start showing up in this ecosystem's search results ahead of their competitors.

Startup case study

Problem statement: We should be able to comment & annotate long-form media at specific points in time on its timeline as it is increasingly becoming the preferred medium.

The idea

Why not just show a histogram of where most of the comments are and provide a resizable window as an additional widget on the timeline (in addition to the progress cursor) which filters comments & annotations to the selected range and displays the threads below, Reddit-style, with sorting & filtration options? Here's a screenshot of precisely that (ignore the colors & bad style):

The ultimate audio/video player can offer a lot more than just a comment histogram - it has the potential to be a vibrant social experience:

  • 1-click repositioning & resizing of the filter window to timestamp ranges for different topics plotted as horizontal bars (already done & visible in the screenshot)
  • sharing links to specific ranges - as clips, but without losing the context
    • many channels re-upload clips from longer videos - this should instead be a simple "retweet" of the original material with a range (window) as a parameter
  • different types of comments: annotations, questions, personal (private) notes
  • search field for within the comments that are in the current filter window
  • crowdsourced annotations (tagging resources/events/concepts/entities)
  • plotting/toggling different types of histograms (not just comment density):
    • plotting different types of reactions
    • where other users spend most of their time
    • where the user has already played the episode
    • comments that have more than X upvotes or up/down ratio
    • where the most controversy or facts requiring a crowdsourced check are
    • highlighting comments that match the current search filter/query/regex

Here's an older video showcasing this UI (can't resize the window with the mouse yet).
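The core logic of the widget is simple - a minimal sketch with hypothetical data shapes: bucket comments by their position on the timeline to draw the density histogram, then filter & sort the thread list to the currently selected window.

```ts
interface TimedComment {
  timestampSec: number; // position on the media timeline
  text: string;
  upvotes: number;
}

// Comment density across the media duration, split into equal-width buckets.
function commentHistogram(comments: TimedComment[], durationSec: number, buckets = 100): number[] {
  const hist = new Array(buckets).fill(0);
  for (const c of comments) {
    const i = Math.min(buckets - 1, Math.floor((c.timestampSec / durationSec) * buckets));
    hist[i]++;
  }
  return hist;
}

// Comments inside the selected [startSec, endSec] window, sorted Reddit-style.
function commentsInWindow(comments: TimedComment[], startSec: number, endSec: number): TimedComment[] {
  return comments
    .filter(c => c.timestampSec >= startSec && c.timestampSec <= endSec)
    .sort((a, b) => b.upvotes - a.upvotes);
}
```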

A unique interface

This is nothing like SoundCloud, where tiny overlapping rectangles with profile pictures are rendered on the timeline and you have to hover over them to see the comment, nor like YouTube, where you can write a timestamp (hh:mm:ss) in a comment which becomes a clickable link that fast-forwards the player - fighting it out with the other 20,000 comments in the single, linear, vertically scrollable section of a 3-hour-long podcast and hoping to be noticed. Both are horribly insufficient and unusable. For both of them comments are just an afterthought - it is extremely hard to discuss specific parts of long-form media and for good, localized signal to actually surface & be noticed.

Challenges in the current web2 world

How would this work as a web2 company?

  • For it to provide value it would need many users commenting on the same video or otherwise there will be no histogram and the whole interface will be pointless. It needs a network effect.
  • Discoverability is hard - it would require that users actively share links to it on other platforms.
  • What if YouTube/Spotify cut it off & disallows embedding their players & using the APIs?
  • Unclear if it should try to be its own platform or just a (browser) plugin to other platforms:
    • The platform way is more ambitious but requires a ton more work & vertical integration: direct messages, a social graph, notifications, a feed & recommendation systems, etc. Would it need to compete with YouTube & Spotify at some point? The likely outcome.
    • The plugin way is a much more limited experience. How would users share links to comments & timeline ranges with others who don't have the plugin installed? How would users be notified of comments & replies? You'd be completely beholden to the platforms.

How Headjack's paradigm fosters innovation

Contrast that to a web built around Headjack:

  • The amount of work for this UI would be minimal - reusing components already present:
    • Identity & single sign-on, DMs, notifications, progress bars, preferences & connections will all be handled by IDMs.
    • The heavy lifting (processing & indexing all broadcast events, perhaps moderation & even ad serving) will be handled by infrastructure services which will charge the application just the right amount for the sets of API calls & indexes that it needs - pay-as-you-go. There will be different competing infrastructure providers, and applications will be able to migrate to the one with the best service or even utilize multiple ones at the same time - offering users the choice of which indexes & filters to use.
    • The costs & team required to implement this UI will be minimal and focus would be on delivering value to users. The barrier to entry would be lower and better UX & functionality would be unlocked - many more niche needs could be addressed.
  • Application discoverability - the moment someone with a following posts a comment through it that comment would immediately be visible to their audience on applications like Twitter and the URI for the comment will contain the name of the application that was used for its generation - serving as application attribution & advertisement of the alternative application. No need for influencers to explicitly share "hey! check out this platform where I made a comment and make sure to follow me there" to their audience - it'll be automatic & implicit.
  • No risk of platforms cutting access to their APIs to third-party apps & changing their ToS because in this ecosystem everything is a third-party app around unified identity and data is no longer host-certified - addressing is data-centric. There are no MOATs & lock-in - everyone focuses on their niche and competes on competency & offering the best possible service.

What REALLY is Headjack

Let's explore a few mental models, analogies & points of view when thinking about Headjack:

Twitter's original sin: protocol vs company

Twitter's potential has fascinated some of the brightest people in the world ever since its inception:

"It's a new messaging protocol, where you don't specify the recipients. New protocols are rare. Or more precisely, new protocols that take off are. There are only a handful of commonly used ones: TCP/IP (the Internet), SMTP (email), HTTP (the web), and so on. So any new protocol is a big deal. But Twitter is a protocol owned by a private company. That's even rarer." - Paul Graham

But the original sin is that it's owned by a company - something Jack Dorsey has called his biggest regret. No wonder he's been trying to turn it into a protocol - first with project Bluesky (now the AT Protocol) and more recently with TBD's Web5 (both covered in the comparison chapter).

"Evolution only builds on open formats and protocols. That's how technology layers." - source

Identity - the base layer of cyberspace

Web3 is often associated with user ownership of networks/services/data, governance, NFTs & micropayments, and while all of them will play a part in it, the main aspect is the decentralization of identity and making it sovereign. Headjack is an application-specific blockchain which can be thought of as an open state database where only the most important bits are decentralized and identity is linked to off-chain data & events with stable & human-readable addressing - giving birth to the ledger of record. Anything can be built on top of that.

"Compositionality is the principle that a system should be designed by composing together smaller subsystems, and reasoning about the system should be done recursively on its structure." - Jules Hedges, On Compositionality

The global event bus / pub-sub

One way to look at Headjack is as a global publish-subscribe messaging network where accounts, message types, tags, application sources & anything within documents can be treated as topics to subscribe to - a multidimensional notification highway. It's the ultimate user-friendly successor of the too technical RSS - "people jumped ship as soon as something better came along".

When compared to Kafka: it provides only logical addressing (allowing for the storage layer to evolve & migrate data seamlessly) and delegates the data availability (storage & retrievability) of the actual documents/events to the entities in the ecosystem (users/IDMs/applications/archives) on a best-effort basis without guarantees (but with multiple ways for retrieving data for a URI).
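In pub-sub terms, every field of a broadcast message can act as a topic. A minimal sketch of client-side topic filtering, with an illustrative (assumed) message shape:

```ts
// Illustrative shape of a broadcast message - any of its fields can be a "topic".
interface BroadcastMessage {
  author: string;    // account identifier
  app: string;       // application it was published through
  type: string;      // dynamically defined message type
  tags: string[];
  body: unknown;
}

type TopicFilter = Partial<{
  authors: Set<string>;
  apps: Set<string>;
  types: Set<string>;
  tags: Set<string>;
}>;

// A message matches a subscription if it passes every specified topic dimension.
function matches(msg: BroadcastMessage, f: TopicFilter): boolean {
  if (f.authors && !f.authors.has(msg.author)) return false;
  if (f.apps && !f.apps.has(msg.app)) return false;
  if (f.types && !f.types.has(msg.type)) return false;
  if (f.tags && !msg.tags.some(t => f.tags!.has(t))) return false;
  return true;
}
```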

Twitter's firehose allowed you to ingest all events on the platform in real time for a steep price and was eventually shut down. Headjack is the global, generalized & permissionless version of it.

The address space of the World Computer

Headjack's address space can be thought of as virtual memory and is enough for a practically infinite number of events with the concatenation of 4 integers (64bit/32bit/64bit/32bit - see addressing) + the ability for intra-document (fractal) addressing - much better for indexing than hashes. We can build an infinite-core world computer atop this distributed shared memory.
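A minimal sketch of that packing (field meanings are not specified here - see the addressing chapter for the actual layout): the four integers concatenate into a single 192-bit value that sorts & indexes naturally, unlike a hash.

```ts
// Pack/unpack a (64-bit, 32-bit, 64-bit, 32-bit) tuple into one 192-bit BigInt.
type EventAddress = { a: bigint; b: number; c: bigint; d: number };

function pack({ a, b, c, d }: EventAddress): bigint {
  return (a << 128n) | (BigInt(b) << 96n) | (c << 32n) | BigInt(d);
}

function unpack(packed: bigint): EventAddress {
  const mask32 = (1n << 32n) - 1n;
  const mask64 = (1n << 64n) - 1n;
  return {
    a: (packed >> 128n) & mask64,
    b: Number((packed >> 96n) & mask32),
    c: (packed >> 32n) & mask64,
    d: Number(packed & mask32),
  };
}
```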

Think of it as an information bus on top of which any type of distributed system can be architected thanks to the minimal semantics, self-describing messages, dynamically definable message types & permissionlessness. The service objects that deal with identity & authorization are on-chain with guaranteed data availability whereas all other data objects are off-chain, anchored & sequenced.

"The internet is the computer but it's missing identity and acls." - koalaman.

Ethereum is NOT the world computer - it's the world's settlement layer. The world computer will be built on identity coupled with authorization & a unified address space.

The Metaverse is the Matrix

And Headjack is the interface.

"The “metaverse” as I like to envision it, is a globally shared and permanent digital reality not owned by any single entity that any company, platform, or person can plug into, regardless of where they are or what device they’re using." - source

In the Metaverse entities will connect & interact with information under a common global namespace & surf the web through competing applications & views that present & filter commonly addressable data in any way imaginable. It will be built on top of a layered stack of technologies & open protocols and underneath everything sit Headjack's primitives: identity, authorization & content addressing. All events can & will be interlinked, tied to identity & made easily referencable. Anything can be built & made composable & interoperable on top of these building blocks as long as they don't impose any constraints - no more walled gardens. Furthermore, the Metaverse is mostly about agency in creation - transaction, exchange & finance are not at the forefront and will be handled by other layers & protocols like generalized smart contract chains & NFTs.

"The metaverse isn’t a 3D world owned by some corporation. It’s a permissionless market-network which respects and interconnects all user-owned and cryptographically-secured digital identities, reputations, wallets, communities, spaces, and objects." - @naval

"We think of the metaverse as the entirety of all composable and interoperable resources, identities, applications, platforms, services, and protocols that exist in cyberspace." - source

"The metaverse has nothing to do with “view” modalities — the tools you use to see the metaverse. That’s a convenient meme for those who have control over manufacturing hardware." - source

Refer to these 2 great resources on what else would be necessary for the Metaverse:

"Cyberspace does not lie within your borders. Do not think that you can build it, as though it were a public construction project. You cannot. It is an act of nature and it grows itself through our collective actions." - "A Declaration of the Independence of Cyberspace" by John Perry Barlow

The supermassive digital gravity well

Headjack is a confluence of multiple interrelated things (identity, names, authorization & addressing). Their synergy leads to the highest utility for names: they're embedded in content URIs. This results in a winner-take-all network effect with unprecedented gravity, pulling all data to be cryptographically anchored to it. It has the potential to truly decentralize DNS - something which Namecoin and Handshake would have had a much harder time doing on their own.

"The internet creates 1 giant aggregator for everything" - @naval

Headjack definitively aims to be the backbone of the entire web - acting as the coordination substrate of cyberspace. It will disaggregate traditional platforms such as Twitter, Reddit, YouTube & Instagram through unbundling, reconstruction & interoperability on top of Headjack's building blocks by mixing and matching various presentation layers, architectures, business models, content moderation policies, etc. "The whole is greater than the sum of its parts.".

"Few tech giants of the past have ever been unseated from their dominance via competition alone: Microsoft never lost the desktop, Google never lost search, Twitter has never lost the public square, Amazon will never lose e-commerce, and Apple will never lose mobile devices. The only way to get out from under those weary giants is creating a new playing field and absolutely dominating it before they figure out what’s going on." - source

Technology, media, the Internet & society

Evolution is 99.9% memetic at this point & accelerating exponentially. If finance is the market for promises, then media is the battleground of ideas and is just as fundamental.

"The medium is the message" - Marshall McLuhan proposes that a communication medium itself, not the messages it carries, should be the primary focus of study.

At the root of our greatest challenges are coordination failures. We are like an ant colony suffering from multiple personality disorder - trapped in multipolar traps, segregated into fabricated factions, oblivious to game theory/markets/economics/history, plagued by short-sightedness & nihilism, playing status games, and running exponential processes in a finite world.

"If code scripts machines, media scripts human beings." - @balajis

Cooperating flexibly in large numbers has led us to an evolutionary advantage - first through stories and then markets, clocks & bits. Mechanisms that make those more efficient benefit the species.

"The Internet is the largest engineering project the earth has ever seen - and we're just getting started" - Barrett Lyon, founder of OPTE Project

Biology is layers of dumb systems that cooperate and mediate between each other. Humans are just a collection of cells - including the "central" nervous system - cells that play along and cooperate which leads to the emergence of consciousness and intelligence. There’s no fundamental reason for humans to not be able to assemble into a unified global collective intelligence - a hive mind.

"I think just like the internet built information, super highways, I think blockchain is building the cooperation super, super highways." - Sreeram Kannan, source

Blockchains can play a major role in upgrading our systems of trust, reputation & coordination and it all depends on the arrangement of our technological building blocks. The things which unify us the most are: 1) geography, 2) language, 3) ethnicity & culture, 4) trade & currency, and 5) media, narrative & history - Headjack is focused on unifying & improving the last of those.

Civilization was born in the fog of war and Satoshi upgraded our capacity for trust & cooperation.

"Because it consists of billions of bidirectional interactions per day, Twitter can be thought of as a collective, cybernetic super-intelligence" - @elonmusk

"First we build the tools, then they build us." - Marshall McLuhan

Goals of Headjack

Mission

End the host-centric model by linking data to identity at scale & unbundle the media stack.

Headjack is the 15th and final standard for decentralized identity.

Execution (how)

How the blockchain & ecosystem are actually implemented (full specification).

Identity managers (IDM)

Handles (names)