Parcel Primer

Summary

Parcel is a data sharing and governance platform. It enables:

  • private data marketplaces
  • data governance to insulate your company from risks around directly holding user data
  • analytics cleanrooms
  • data tokenization

This document details the Parcel and data tokenization architectures at a level that’s useful for an integrator/app developer. It goes on to explain how you might actually perform that integration.

For more information on the motivation behind data tokenization, read our introductory blog here.

Concepts

Exemplary User Flows

Parcel

To better understand the components, let’s follow the journey of a developer (Dev) and one of Dev’s users (User). This section motivates the architecture that follows.

  1. Dev signs up to use Parcel and, in doing so, creates a Parcel app.
  2. Dev configures the Parcel app to request access to users’ documents.
  3. Dev integrates the Parcel app into their own application using the TypeScript client, REST API, or client generated from the OpenAPI spec.
  4. User begins to use Dev’s primary application, and Dev’s application prompts User to join Dev’s Parcel app so that the document grants can be made.
  5. Dev, via the Parcel app, processes User’s granted documents using an analytics program that runs in a Parcel Worker.
  6. The results of the analysis are returned to User.

A more complicated flow might involve any combination of Dev uploading documents on behalf of the user, Dev aggregating data across users, Dev delegating access to users’ data, and Dev’s Parcel app granting User document access. All things are possible, but that’s way too much detail for now!

Data Tokenization

  1. Data is uploaded to Parcel.
  2. The data owner transfers ownership of the data to the tokenization service and receives a minting token (capability).
  3. The data owner uses (optionally consuming) the minting token to issue several tranches of tokens, each with different access rights. For instance, tranches that allow access for one day, one week, and one month, respectively.
  4. Other Parcel users purchase the data token from a marketplace, and use the token to run jobs on the underlying data asset.

Beyond the fundamental tokenization, transfer, and access control mechanisms, the utility of data tokenization is up to the app you’re trying to build. Parcel data tokenization takes care of those boring details so that you can focus on harder concepts like adoption, incentives, and liquidity.

Parcel Architecture

Let’s start in the stratosphere and zoom into the details.

Bird’s-eye View

Here we have the primary components of Parcel. The general idea is that the client makes requests to the Parcel Gateway using conventional REST endpoints. Documents uploaded to the gateway are encrypted and stored off-chain with the encryption key passed to and secured by the Parcel Runtime. The runtime manages access to documents (via their encryption keys) through a system of grants, and access logs.

Clients can request from the gateway that a job run on one or more documents. The job is sent to a Parcel Worker, which runs the analysis program provided as an OCI (Docker) image. The outputs are re-uploaded as documents. The job inputs, outputs, and operation are all highly configurable. You can imagine it as an enclaved Kubernetes Job that can be granted special document access rights.

Now, you might be wondering why it’s okay to have key material floating about. If you look closely at the diagram, you’ll see some components with bold edges. Those bold edges denote that the component runs fully within a hardware enclave. Thus, if you accept the enclave’s attestation and you trust the code, you can be assured that the intermediate values are private and that the outputs are correct. In the case of the gateway, it means that clients can be lightweight since the gateway is trusted to handle encryption. In the case of the runtime, it means that access control will be applied and that the document encryption keys will remain safe. In the case of the worker, it means that the job’s computation is private and the outputs are genuine.

And here’s the same information in a table because everyone loves tables.

Street View

Hold up. Before you think “what is all of this stuff?!” let me assure you that it’s quite approachable. For starters, all that’s changed from the previous diagram is that there are now boxes inside of the Parcel Gateway and Parcel Runtime showing the internal components that you care about. The new concepts are the gateway API, the gateway-runtime interface, and the Parcel Runtime’s structure.

Let’s start by focusing on the APIs, the boxes in Parcel Gateway and Runtime RPCs. On the gateway, you can see the REST endpoints corresponding to each of the models mentioned in the Concepts section. You also can see corresponding RPC endpoints exposed by the runtime. You might already be able to guess that one of the primary functions of the gateway is to translate developer-friendly REST into efficient, point-to-point encrypted RPCs.

In front of each of the API endpoints is an auth layer. The gateway’s auth checks that you’re a registered user of the gateway, and validates the transaction against public data (e.g., failing fast instead of dispatching a destined-to-fail runtime transaction when someone tries to delete someone else’s identity). The serious auth happens in the runtime where access control over private data and write operations is applied. The runtime is where grants and permissions are checked. No secrets are released or models updated unless the runtime accepts the request.

After a request makes its way through the runtime API, it’s decoded by the “controller” (in the MVC sense) into database operations. The Parcel DB is a literal SQL database, but with encrypted columns for sensitive data. The cells within those columns are encrypted using keys derived from the runtime’s encryption key, as provided by the key-manager runtime that’s peered with the Parcel Runtime (and attests the Parcel Runtime before releasing the key).
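The actual key hierarchy inside the runtime isn’t documented here, but as a rough illustration, per-column keys might be derived from a single root key using HKDF with the table and column names as domain separators (this is a sketch of one plausible scheme, not the real one):

```typescript
import { hkdfSync } from "node:crypto";

// Illustrative only: derive a distinct key per encrypted column from a
// single root key via HKDF-SHA256, domain-separated by table and column.
function deriveColumnKey(rootKey: Buffer, table: string, column: string): Buffer {
  return Buffer.from(hkdfSync("sha256", rootKey, Buffer.alloc(0), `${table}.${column}`, 32));
}

const rootKey = Buffer.alloc(32, 7); // stand-in for the key-manager-provided key
const documentKeyKey = deriveColumnKey(rootKey, "documents", "encryption_key");
const tokenSecretKey = deriveColumnKey(rootKey, "tokens", "secret");
```

Deriving rather than storing per-column keys means the runtime only ever has to protect one root secret, and each column’s key can be recomputed on demand.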

And here’s a summary of all the above information in a table because, again, everyone loves tables.

A brief detour: Trust Links

Before we go deeper into the Parcel architecture, let’s use our knowledge of the top-level components to understand which components trust each other and why. In the following diagram, there exists an edge from A to B if A trusts B. If the edge is dashed, the entity is only relied upon for availability. If the edge is solid, it’s for confidentiality and/or integrity. The full trust model can be found by taking the transitive closure of this graph.

There are three trust anchors: the Oasis Network, the TEE vendor, and Oasis Labs. From a developer’s perspective,

  • The Oasis Network is secured by proof-of-stake BFT consensus.
  • The TEE vendor is trusted to create reliable TEE hardware (or at least try); there’s no reason to trust the TEE vendor other than reputation, but there’s not much choice (yet).
  • The Oasis Network schedules the Parcel Runtime on an Intel SGX TEE. SGX can sometimes be a bit leaky¹²³⁴, but Oasis Labs runs the compute nodes, and we assume that you trust that neither we nor our cloud provider actually attempts to exploit the side channels.
  • For the Parcel Worker, the TEE vendor is AMD and the product is SEV. The AMD SEV TEEs are provided through Google’s Confidential VMs product. AMD SEV, in the current -ES version, does not provide trustworthy attestations, so we need a workaround. We choose SEV despite this limitation because it allows running essentially unmodified code, somewhat like Graphene but without the shim OS and RWX pages.
  • Oasis Labs is (currently) trusted to provide secure code for network upgrades and to attest that we’re using trusted hardware where no hardware attestations exist (e.g., AMD SEV via Google Confidential VMs). Our goal is to reduce your trust requirement in Oasis Labs to zero. We can do this by integrating on-chain governance when it becomes available, and by switching to better TEEs as they become available.

In the Weeds

Grant Giving

The three most important concepts in Parcel are identities, documents, and grants. The first two should be fairly intuitive. Grants are also intuitive, but they’re more powerful than simply giving identity Z access to document A. This section offers a primer, and you’re encouraged to dig into the features further by reading the linked documents.

At their most basic, grants represent access rights given from one (app) identity to another. However, a grant can also carry a condition: a policy specified using a straightforward DSL that allows implementing attribute-based access control. Namely, access can be granted from identity to identity based on any combination (conjunctions and disjunctions) of any number of selectors. So, for instance, your app could request that participants grant you access to their data only if the data has been uploaded by you, and the data is only read from within a job that you endorse. If you so choose, you could add auditing capabilities by creating a new root, `$or`, where the new branch is “grant access to the identity id of an auditor.” It’s very flexible (and if you can think of anything else, you’re very welcome to suggest it).
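To make the shape of such conditions concrete, here’s a toy evaluator. The `$and`/`$or` combinators mirror the text above, but the selector names (`document.creator`, `job.image`, `accessor.identity`) are made up for exposition; the real condition grammar lives in the Parcel API docs:

```typescript
// A toy grant-condition evaluator: a condition is either an $and/$or
// combinator or a leaf selector matching facts about the access request.
type Condition =
  | { $and: Condition[] }
  | { $or: Condition[] }
  | Record<string, string>;

type Facts = Record<string, string>;

function evaluate(cond: Condition, facts: Facts): boolean {
  if ("$and" in cond) return (cond.$and as Condition[]).every((c) => evaluate(c, facts));
  if ("$or" in cond) return (cond.$or as Condition[]).some((c) => evaluate(c, facts));
  return Object.entries(cond).every(([key, want]) => facts[key] === want);
}

// "Allow access if the document was uploaded by our app AND is read from
// an endorsed job, OR if the accessor is the auditor."
const condition: Condition = {
  $or: [
    {
      $and: [
        { "document.creator": "app-sterling" },
        { "job.image": "endorsed-image@sha256:0123" },
      ],
    },
    { "accessor.identity": "id-auditor" },
  ],
};
```

The auditing branch from the text is exactly the second `$or` arm: adding it widens access without touching the original policy.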

One more feature of a grant is its capabilities, which define what operations are permissible to the grantee. For documents, the most common capability is READ, but there is one more of note: DELEGATE. The delegation capability allows the grantee to create a new grant, referencing the delegable grant, to a new grantee. In other words, a grant with the DELEGATE capability can re-grant the documents it covers to someone else. This is a useful feature for building, say, data marketplaces where you request that users grant the data they want to sell to you, so that you can make sales on their behalf. More information about grant capabilities can be found in the API docs.
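The delegation mechanic can be sketched as a simple data model; the field names here are illustrative, not the Parcel schema:

```typescript
// Toy model of grant capabilities: a grant with DELEGATE lets its
// grantee issue a (possibly narrower) grant to someone else.
type Capability = "READ" | "DELEGATE";

interface Grant {
  id: string;
  granter: string;
  grantee: string;
  documents: string[];
  capabilities: Capability[];
  parent?: string; // set when the grant was created via delegation
}

function delegate(parent: Grant, newGrantee: string, caps: Capability[]): Grant {
  if (!parent.capabilities.includes("DELEGATE")) {
    throw new Error("grant is not delegable");
  }
  return {
    id: `${parent.id}/d`,
    granter: parent.grantee, // the delegator re-grants in their own name
    grantee: newGrantee,
    documents: parent.documents,
    capabilities: caps,
    parent: parent.id,
  };
}
```

In the marketplace scenario from the text, the user’s grant to the marketplace would carry DELEGATE, and each sale would be a `delegate(...)` call issuing a READ-only grant to the buyer.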

Jobs & Worker Attestation

Jobs can be scheduled to run on any number of datasets owned by any number of identities.

The job is described by a job specification that includes security-critical details like what OCI image is being run, the command invoked within the image, the owners of output documents, and whether the job has network access. As mentioned above, grants can be made based on the values of these security-critical details. The job spec also contains non-security-related information like resource requests and (private) environment variables, which are intended for the worker and don’t affect the owners of the data. It is the responsibility of the data owners (or the data owner’s trusted delegate [e.g., a marketplace]) to ensure that the jobs are not malicious.
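A job spec along those lines might look like the following. The field names are guesses for exposition; the authoritative field list is in the API docs:

```typescript
// An illustrative job specification, separating the security-critical
// fields (which grants can condition on) from worker-only details.
interface JobSpec {
  name: string;
  image: string;                                           // OCI image (security-critical)
  cmd: string[];                                           // command in the image (security-critical)
  inputDocuments: { id: string; mountPath: string }[];
  outputDocuments: { mountPath: string; owner: string }[]; // output owners (security-critical)
  networkPolicy: "none" | "unrestricted";                  // security-critical
  env?: Record<string, string>;                            // private; doesn't affect data owners
  memoryBytes?: number;                                    // resource request; not security-critical
}

const spec: JobSpec = {
  name: "word-count",
  image: "example.com/analytics@sha256:deadbeef", // hypothetical image reference
  cmd: ["python", "count.py", "/parcel/data/in.txt"],
  inputDocuments: [{ id: "doc-123", mountPath: "/parcel/data/in.txt" }],
  outputDocuments: [{ mountPath: "/parcel/out/counts.json", owner: "user-789" }],
  networkPolicy: "none",
  env: { LOG_LEVEL: "info" },
};
```

The split matters because data owners only need to review the security-critical fields when deciding whether a job is malicious; `env` and resource requests can change without invalidating that review.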

After a job request is submitted, the gateway logs the details with the runtime and stores the private information there. The job will only begin if the job submitter has been granted access to the documents (perhaps under the condition that the data be supplied to a specific image).

Once the documents have been fetched and authenticated, the job runs to completion. The outputs are collected and uploaded back to Parcel as per the job spec. The provenance of each output document is recorded, so a worker with network access can be used as a sort of trusted oracle.

For examples of running compute jobs, please refer to the tutorials, and for a complete specification of job spec fields, the API docs.

Ecosystem Components

Oasis Labs has created some additional services alongside Parcel that can make your and your users’ lives easier if you’re willing to trust Oasis Labs.

Oasis Auth — an identity manager that allows you and your users to authenticate to the Parcel Gateway and Parcel Runtime using OAuth. OAuth login provides a more familiar sign-in (and recovery) experience when your users don’t mind trusting Oasis Labs with their accounts.

Oasis Portal — a developer-centric web app that you can use to manage your application, invite users, list documents shared with your app, and initiate jobs, among other things. You’d use Portal when you don’t mind trusting the web app and don’t need the flexibility of the API (so for common tasks).

Oasis Steward — a data management web app through which end-users can manage their data and permissions, and see access logs. You’d direct your users to Steward if they’re willing to trust Oasis with their account and/or you don’t want to build your own user dashboard.

Data Tokenization

Data tokenization is a Parcel add-on. Technically, it’s a layer on top of Parcel, but it’s tightly integrated, so we’ll just refer to it as a part of Parcel, for simplicity. If you don’t have a basic understanding of Parcel parceled away into your mind, you should first peruse the high-level overview.

Data tokenization, if you recall from the concepts, is about turning data into an asset that can participate in the wider DeFi ecosystem. A data token can be exchanged on your favorite DEX and subsequently used to run jobs on the data (or maybe just speculate, idk; you do you).

Tokenization in general has three steps:

  1. custodiation: The entity creating the token, the issuer, transfers ownership of the asset to the mint. This step gives the minted tokens value in terms of a real asset. (If you were to frame tokenization as obtaining a loan of data tokens, this step would be called collateralization.) The issuer receives the capability to mint tokens backed by the custodiated asset; this capability can be referred to as the minting token. For data tokenization, the asset is one or more pieces of data, and Parcel is the mint.
  2. tranching: The minting token is (optionally non-destructively) used to issue one or more classes of tokens, each with different rights. You can think of tranches like classes of stock, but with more than A, B, and C, and with the full Parcel grants system as rights. The issuer receives all of the tokens for further use.
  3. distribution: The issuer most likely wants to get the tokens into the hands of others. The tokens can be issued as synthetics on another chain using the Parcel tokenization platform’s cross-chain/paratime bridge.
  4. (optional) buyback: The issuer can repurchase the asset by buying back all of the outstanding tokens and then trading them back for the asset. The issuer can also specify that anybody be able to do a buyback.
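The custodiation → tranching → buyback lifecycle can be sketched as a minimal in-memory model. Everything here is a toy for illustrating the flow; the real mint is the Parcel tokenization service:

```typescript
// Toy mint: holds custodiated assets in escrow, hands out minting
// tokens, and issues tranches against them.
type Tranche = { assetId: string; name: string; supply: number; accessDays: number };

class ToyMint {
  private escrow = new Map<string, string>();       // assetId -> current owner
  private mintingTokens = new Map<string, string>(); // mintingTokenId -> assetId
  private nextId = 0;

  // custodiation: issuer transfers the asset in, receives a minting token
  custodiate(assetId: string, issuer: string): string {
    this.escrow.set(assetId, `escrow-for-${issuer}`);
    const id = `minting-${this.nextId++}`;
    this.mintingTokens.set(id, assetId);
    return id;
  }

  // tranching: issue token classes; optionally burn the minting token
  tranche(mintingTokenId: string, configs: Omit<Tranche, "assetId">[], burn = false): Tranche[] {
    const assetId = this.mintingTokens.get(mintingTokenId);
    if (assetId === undefined) throw new Error("unknown or burned minting token");
    if (burn) this.mintingTokens.delete(mintingTokenId);
    return configs.map((c) => ({ assetId, ...c }));
  }

  // buyback: after redeeming all outstanding tokens, the asset leaves escrow
  buyback(assetId: string, newOwner: string): void {
    this.escrow.set(assetId, newOwner);
  }
}
```

Burning the minting token during tranching is what makes further issuance impossible, which is why footnote 6 frames it as protection against data-token inflation.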

Some additional notes:

  • Holders of the raw data, or anyone able to recreate uploaded data they don’t own, can mint new tranches despite not holding a minting token by re-uploading the data and obtaining a new minting token backed by the same asset. There’s nothing we can technically do about this in general. This is the kind of thing that would be solved by a reputation system, or else some state held by, for instance, a trusted oracle that says whether the data has already been uploaded.
  • Multiple parties can mint an asset using a DAO/multisig-controlled Parcel identity. There is currently no plan to implement this, but it might arrive in a future version.
  • This feature isn’t implemented yet, but is scheduled for the near future.

Integration (How do I use it??)

For this section, let’s imagine that you want to create a tokenized data marketplace, the use-case that maximally leverages Parcel’s features. However, I’ll detail exactly what specs this marketplace has, so that it’s easy to figure out what you can omit if you’re not building a tokenized data marketplace.

Our imaginary product shall be dubbed Sterling, a privacy-preserving data marketplace implemented as a web/mobile app.

Goals:

  1. strives to provide a cohesive user experience but will expose Parcel to the user where it provides significant usability benefit
  2. maintains information about its users including role, reputation, and token balances
  3. allows users to upload their own private data
  4. allows users to onboard private data from a trusted source using an HTTPS-capable trusted oracle
  5. enables attachment of (machine-generated) endorsements to uploaded data
  6. facilitates exchange of data tokens for money tokens
  7. allows trusted computation on tokenized data according to the rights associated with the tranche (including pay-per-use)

Step 1: Setup

The first step is to create a Parcel app for Sterling through the Parcel Portal.

It has:

  • A frontend client that will be able to make Parcel API calls in the context of the user (it won’t act on behalf of the user, however, as it’s desirable to keep the requested scopes concise)
  • A permission that grants Sterling the TOKENIZE capability on documents tagged as participating in the marketplace

After the Sterling Parcel app has been created, we can start to integrate it with the frontend app. The web/mobile app will need a sign-in button that initiates a Login with Oasis flow to retrieve a Parcel access token (example). Users will be prompted to create an Oasis Identity if they have not already.
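A standard piece of that sign-in flow is generating a PKCE challenge for the authorization request. This sketch shows only the generic OAuth/PKCE mechanics; the endpoint URL and client ID are placeholders, and the actual parameters come from your Parcel app configuration:

```typescript
import { createHash, randomBytes } from "node:crypto";

// Generate a PKCE verifier/challenge pair (RFC 7636, S256 method).
function makePkcePair() {
  const verifier = randomBytes(32).toString("base64url");
  const challenge = createHash("sha256").update(verifier).digest("base64url");
  return { verifier, challenge };
}

const { verifier, challenge } = makePkcePair();

// Placeholder authorize URL; substitute the real Oasis Auth endpoint and
// the client ID of your Parcel app.
const authorizeUrl =
  "https://auth.example.com/authorize" +
  "?response_type=code" +
  "&client_id=YOUR_CLIENT_ID" +
  `&code_challenge=${challenge}` +
  "&code_challenge_method=S256";
```

The frontend keeps `verifier` private and sends it along when exchanging the authorization code for the Parcel access token.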

Step 2: Data Onboarding

Onboarding of users’ raw (i.e. unauthenticated) data is done by the user through Parcel Steward. Onboarding includes both upload of new data, and sharing of existing data. The reasons for onboarding through Steward and not Sterling are several:

  • Sterling does not need to implement data management and access-log viewing interfaces, which saves significant development time.
  • Users trust that Steward is a thin layer on top of the Parcel Gateway because of the strong Oasis Parcel branding, and also because of the third-party security audit (saving Sterling the cost and trouble of getting its own formal audit).
  • Users can share their existing Parcel data with Sterling. This becomes valuable as the Parcel ecosystem and thus data sovereignty develops.

Taking advantage of the features already provided by Parcel Steward, Sterling makes it easy for users to upload data by providing a clear “Upload Data” link that takes users directly to the Steward data contribution page.

Authenticated data (e.g., data coming from an HTTPS API) needs to be uploaded from a Parcel Worker. Parcel will eventually provide a trusted oracle service itself, but this can always be done using Parcel jobs.

To fetch authenticated data using a Parcel job, start by creating a custom compute image. The simplest image would be curl, with a command of `curl -o /parcel/out/data.bin $1 -H "Authorization: $USER_AUTH"`, where the requested API endpoint is the first argument to the job and the user’s credentials are in the (private) environment variable. More complicated images could implement an OAuth flow to access data from APIs like Google and Facebook. Users will need to trust Sterling with their login credentials, of course; credentials must not be sent anywhere other than the Parcel Gateway.

The compute image ID can then be used to assemble a job specification that takes zero input documents, has the user’s private credentials as env vars, passes the public URL(s)⁵ as arguments, requests network access (preferably scoped to just the requested URL, though it’s still up to the user to check that the job spec and image are trustworthy), and uploads the authenticated data back to the user, shared with Sterling.
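Put together, the oracle job spec might look like this. Field names are illustrative rather than the authoritative Parcel schema, and the URL, owner, and app IDs are placeholders:

```typescript
// Hypothetical spec for the curl-based trusted-oracle job described above.
const oracleJobSpec = {
  image: "docker.io/curlimages/curl",        // pin a digest in production
  cmd: ["curl", "-o", "/parcel/out/data.bin", "$1", "-H", "Authorization: $USER_AUTH"],
  args: ["https://api.example.com/v1/records"], // public URL: aids provenance display
  env: { USER_AUTH: "Bearer user-supplied-credential" }, // private env var
  inputDocuments: [],                                    // zero inputs
  network: { allowedHosts: ["api.example.com"] },        // scope access where possible
  outputDocuments: [
    { mountPath: "/parcel/out/data.bin", owner: "user-id", shareWith: ["sterling-app-id"] },
  ],
};
```

Because the URL arrives as a public argument while the credential stays in a private env var, anyone inspecting the job’s provenance can see where the data came from without seeing how it was authorized.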

Step 3: Data Management

Sell side (data producer)

The real fun begins once data is shared with the Sterling app.

Sterling can show users what documents they have available to tokenize by calling `parcel.listDocuments({ sharedWith: STERLING_APP_ID })`. Additional filter constraints are also possible.

Users can select one or more documents and request that Sterling tokenize them. Sterling calls `parcel.createToken({ documents })` to transfer ownership of the documents to the Parcel tokenization escrow account. The user will receive a minting token (i.e. capability) in response.

Sterling takes the user to a tranche issuance page where the tranches and conditions on tokens within the tranche can be configured. The final configuration is submitted using one or more calls to `parcel.createTranche({ config })`; possibly followed by a call to `parcel.burnToken({ id: mintingTokenId })` if so desired⁶. The tokens are held by the user, as is shown in the Sterling UI.

Through Sterling, the user can list the tokens on other chains/paratimes that have DeFi functionality. Sterling tracks account balances as tokens are moved, purchased, and sold.

Buy side (data consumer)

Data consumption starts with the purchase of a data token (created using the above process).

Sterling facilitates this by displaying users a list of tokens created on the platform (using `parcel.listTokens({ filter })`). Purchasing a token transfers ownership of the token, but not the data, to the purchaser.

When a data token is transferred on Parcel, the tokenization escrow account removes the grants to the previous holder and creates new grants to the new owner. This only happens when the token is transferred on Parcel: a transfer of a synthetic token on another chain will need to be materialized in Parcel to take effect (this eventually happens due to bridging, but it takes a little while).

Sterling allows data consumers to run jobs on the data underlying their held tokens. This is done by enqueuing a job using the Parcel API (the input datasets are those backing the token).

[1]: https://www.usenix.org/conference/usenixsecurity18/presentation/bulck

[2]: https://sgaxe.com/

[3]: https://plundervolt.com/

[4]: https://dl.acm.org/doi/abs/10.1145/3296957.3173204

[5]: Having the URL be public makes it easier for Sterling to display the document’s provenance.

[6]: For data that cannot be regenerated, burning the minting token would make issued tokens more valuable because there could never be data token inflation.