Conversation
…hit up intel when testing
use tracing::debug;

/// For fetching collateral directly from Intel
pub const PCS_URL: &str = "https://api.trustedservices.intel.com";
question:
should this be a pre-parsed (lazily?) url::Url, to avoid re-parsing it all the time?
The problem is that it is used as an argument to dcap_qvl::collateral::get_collateral_for_fmspc, which takes a &str.
crates/pccs/src/lib.rs
Outdated
#[derive(Clone)]
pub struct Pccs {
    /// The URL of the service used to fetch collateral (PCS / PCCS)
    pccs_url: String,
suggestion:
- pccs_url: String,
+ url: String,
^-- the struct is already Pccs, no need to repeat that prefix.
crates/pccs/src/lib.rs
Outdated
impl Pccs {
    /// Creates a new PCCS cache using the provided URL or Intel PCS default
    pub fn new(pccs_url: Option<String>) -> Self {
question:
maybe use builder pattern here?
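For illustration, a minimal builder sketch (type and field names are assumptions, not the crate's actual API) that would replace the `Option<String>` constructor argument:

```rust
/// Hypothetical builder for `Pccs`: callers set an explicit URL or fall
/// back to the Intel PCS default. Names are assumed for illustration.
const PCS_URL: &str = "https://api.trustedservices.intel.com";

#[derive(Clone)]
pub struct Pccs {
    pub url: String,
}

#[derive(Default)]
pub struct PccsBuilder {
    url: Option<String>,
}

impl PccsBuilder {
    /// Override the default Intel PCS URL with a custom PCCS
    pub fn url(mut self, url: impl Into<String>) -> Self {
        self.url = Some(url.into());
        self
    }

    pub fn build(self) -> Pccs {
        Pccs {
            // Fall back to the Intel PCS default when no URL is given
            url: self.url.unwrap_or_else(|| PCS_URL.to_string()),
        }
    }
}
```

Usage would then read `PccsBuilder::default().url("https://my-pccs.example").build()` instead of `Pccs::new(Some(...))`.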
.trim_end_matches("/sgx/certification/v4")
.trim_end_matches("/tdx/certification/v4")
question:
why do we need this?
In case the user gives a PCCS URL with the full route name for collateral fetching - which will not work, since we add it later. Strictly speaking we don't really need this.
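The normalisation described here can be sketched as a standalone function (the host name in the test is made up):

```rust
/// Sketch of the normalisation discussed above: if the user pastes a PCCS
/// URL that already includes the certification route, strip it, since the
/// crate appends that route itself when building requests.
fn normalise_pccs_url(url: &str) -> &str {
    url.trim_end_matches("/sgx/certification/v4")
        .trim_end_matches("/tdx/certification/v4")
}
```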
tokio::spawn(async move {
    let outcome = pccs_for_prewarm.startup_prewarm_all_tdx().await;
    pccs_for_prewarm.finish_prewarm(outcome);
});
question:
what happens if we are not in tokio runtime?
We panic.
Currently we use dcap_qvl to fetch quote collateral, which internally uses reqwest's async API for the fetch. Strictly speaking this doesn't need to mean a tokio runtime, but practically, on non-wasm targets, since async-std is now deprecated, there are not many other options.
My proposal would be to document that this requires a tokio runtime. The alternative would be to re-implement get_collateral_for_fmspc using reqwest's synchronous API if tokio::runtime::Handle::try_current() fails.
That said, there are some other reasons why we might want our own implementation of get_collateral_for_fmspc, as it does quite a bit of unneeded fetching during the initial pre-warm. But the disadvantage is more code to maintain ourselves.
/// Resolves when cache is pre-warmed with all available collateral
pub async fn ready(&self) -> Result<PrewarmSummary, PccsError> {
    let mut outcome_rx = self.prewarm_outcome_tx.subscribe();
    loop {
question:
can it happen that startup_prewarm_all_tdx panics (or gets aborted somehow)?
if "yes" then ready() can get stuck forever, right?
Yeah, I guess we could add a timeout and return an error on timeout.
The most likely thing to go wrong is the PCCS requests timing out - and dcap_qvl uses a 180 second timeout for these. But I guess there's no harm in adding a global timeout for that task.
Cargo.toml
Outdated
rustls = { version = "0.23.37", default-features = false, features = ["brotli"] }
tokio = { version = "1.50.0", features = ["default"] }
tokio-rustls = { version = "0.26.4", default-features = false }
dcap-qvl = { git = "https://github.com/Phala-Network/dcap-qvl.git" }
question:
should we pin this to some tag/sha?
Yep, good catch. I have pinned to the current head of master, and as soon as they make the next release I would switch to that (as we need something which is not yet released).
}

/// Extracts the earliest next update timestamp from collateral metadata
fn extract_next_update(collateral: &QuoteCollateralV3, now: i64) -> Result<i64, PccsError> {
question:
should we also check for revocations at *_crl fields?
Hmm yes, I think this should return the soonest timestamp, either the expiry of the collateral or the expiry of the CRL list, whichever is sooner. If that's what you mean.
Done in cc16825
This makes extract_next_update pick the soonest expiry date relevant to collateral, including those of CRL lists.
But it doesn't check the CRL lists - that is strictly speaking the job of the quote verifier, not the PCCS.
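The "soonest expiry" selection can be sketched in isolation (field access elided; the slice stands in for the next-update timestamps of the TCB info, QE identity, and the CRL lists):

```rust
/// Sketch of the selection described above: pick the soonest `next_update`
/// among all pieces of collateral metadata, so the cache refreshes before
/// *any* part of it goes stale. Timestamps are unix seconds.
fn soonest_next_update(candidates: &[Option<i64>], now: i64) -> Option<i64> {
    candidates
        .iter()
        .flatten()
        .copied()
        .filter(|&t| t > now) // skip timestamps already in the past
        .min()
}
```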
… relevant to the collateral
Adds an internal Provisioning Certificate Cache Service (PCCS) crate which pre-fetches collateral and pre-emptively refreshes it, with the goal of keeping collateral fetching out of the hot path when verifying attestations.
This copies the functionality from these PRs to attested-tls-proxy:
API Routes used internally
Here are the routes hit during initial caching, documented in the Intel PCCS spec:
GET
https://api.trustedservices.intel.com/sgx/certification/v4/fmspcs
Source: src/lib.rs:268
Then dcap_qvl::collateral::get_collateral_for_fmspc is called for each, with both 'processor' and 'platform' as the 'ca' argument. This hits:
GET
https://api.trustedservices.intel.com/sgx/certification/v4/pckcrl?ca={processor|platform}&encoding=der
Source builder: url_pckcrl():69
GET
https://api.trustedservices.intel.com/tdx/certification/v4/tcb?fmspc={FMSPC}
Source builder: url_tcb():77
GET
https://api.trustedservices.intel.com/tdx/certification/v4/qe/identity?update=standard
Source builder: url_qe_identity():81
- This gets the identity of the quoting enclave
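The four routes above can be sketched as plain URL builders (function signatures are assumed for illustration, not the crate's exact code; the FMSPC in the test is an arbitrary example value):

```rust
/// Sketch of the route builders named above, joined onto the configured base URL.

/// List all FMSPCs known to the service
fn url_fmspcs(base: &str) -> String {
    format!("{base}/sgx/certification/v4/fmspcs")
}

/// PCK CRL; `ca` is "processor" or "platform"
fn url_pckcrl(base: &str, ca: &str) -> String {
    format!("{base}/sgx/certification/v4/pckcrl?ca={ca}&encoding=der")
}

/// TCB info for one FMSPC
fn url_tcb(base: &str, fmspc: &str) -> String {
    format!("{base}/tdx/certification/v4/tcb?fmspc={fmspc}")
}

/// Identity of the quoting enclave
fn url_qe_identity(base: &str) -> String {
    format!("{base}/tdx/certification/v4/qe/identity?update=standard")
}
```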
Example demonstrating pre-warm
Included with the pccs crate is an example which demonstrates the warm-up cache filling using the Intel PCS. Here is an example output which shows which FMSPCs end up in the cache and which are rejected as not relevant for TDX quote verification:
Show output revealing which FMSPCs are cached
Possible optimisation
It takes quite a while. We could drastically reduce the number of API calls in the warm-up, as many are not FMSPC-dependent and are made redundantly. But this would come at the cost of rolling more of this ourselves and having less code maintained by Phala.
Show breakdown of how many API calls we could save
Assuming:
Right now, each of those 74 attempts triggers these PCS calls:
Plus the initial:
So current PCS call count is roughly:
If you cache the shared routes during prewarm:
That becomes:
So the saving is: