sccache is a compilation caching tool developed by mozilla, designed to act as a compilation wrapper and save compilation products to a storage backend to avoid duplicate compilation as much as possible, thus speeding up the overall compilation process. The sccache supports compilation backends such as gcc, clang, MSVC, rustc, NVCC and so on. Rust compilation, for example, can be used like this:

export RUSTC_WRAPPER=/path/to/sccache
cargo build

Recently the community asked if there was a way to speed up the compilation of databend, and I came up with the idea that I could use sccache to speed it up, with the following advantages.

  • Databend has many dependencies, most of which rarely change, and they are well cached
  • Databend's dev build runs on AWS's Self-hosted Runner and is already configured with an intranet S3 bucket

Start stepping in the pits

All things being equal, it looked like all I had to do was write a Github Action. It turns out that I was too optimistic:

The main pits are in two places.

  • To facilitate the maintenance of the test environment, Databend CI uses a build-tools image to run tests, which requires a lot of extra configuration with sccache.
  • The cloud platform enables node Role-based authentication for security reasons, AK/SK is not configured, and sccache does not support IMDSv2.

The first problem was solved by constantly harassing @everpcpc, the later one was relatively more problematic. We first submitted an Issue: Support IMDSv2 for AWS IAM Role authentication to sccache, and then started trying to solve the problem ourselves.

Debugging sccache

Debugging sccache is a painful process: sccache is a CS architecture, and each time the user invokes sccache, a server is started in the background, and communication between the client and the server determines whether to use the cache or not. All interactions with the storage backend are performed by the server, so you can't see the logs directly during the compilation process, you have to redirect them to a file and look at the output of the file. In addition, sccache has a long history and is not very debuggable in many places, e.g.

let res = self
    .client
    .execute(request)
    .await
    .with_context(move || format!("failed GET: {}", url))?;

if res.status().is_success() {
    let body = res.bytes().await.context("failed to read HTTP body")?;
    info!("Read {} bytes from {}", body.len(), url2);

    Ok(body.into_iter().collect())
} else {
    Err(BadHttpStatusError(res.status()).into())
}

When a request fails, it only hits a status code, no actual error content, and it took a lot of detours when debugging IMDSv2. In order to solve the problem as soon as possible, I directly forked sccache and modified the underlying storage to use opendal.

- let credentials = self
-     .provider
-     .credentials()
-     .await
-     .context("failed to get AWS credentials")?;
-
- let bucket = self.bucket.clone();
- let _ = bucket
-     .put(&key, data, &credentials)
-     .await
-     .context("failed to put cache entry in s3")?;
+ self.op.object(&key).write(data).await?;

This is much more comfortable, the problem was quickly located and fixed, and Databend is now using sccache successfully. After installing sccache, you only need to configure it as follows.

- name: Setup Build Tool
    uses: ./.github/actions/setup_build_tool
    with:
    image: ${{ inputs.target }}
    bypass_env_vars: RUSTFLAGS,RUST_LOG,RUSTC_WRAPPER,SCCACHE_BUCKET,SCCACHE_S3_KEY_PREFIX,SCCACHE_S3_USE_SSL,AWS_DEFAULT_REGION,AWS_REGION,AWS_ROLE_ARN,AWS_STS_REGIONAL_ENDPOINTS,AWS_WEB_IDENTITY_TOKEN_FILE

- name: Build Debug
    shell: bash
    run: cargo -Z sparse-registry build --target ${{ inputs.target }}
    env:
    RUSTC_WRAPPER: /opt/rust/cargo/bin/sccache
    SCCACHE_BUCKET: databend-ci
    SCCACHE_S3_KEY_PREFIX: cache/
    SCCACHE_S3_USE_SSL: true

Conclusion

Some relatively simple data is recorded here: the first full compilation takes some extra time to upload the compiled products, subsequent compilations can reuse these products and no longer require compilation.

  • Without any cache: 6m 20s
  • First compilation
    • Finished dev [unoptimized + debuginfo] target(s) in 9m 04s
  • Second compilation
    • Finished dev [unoptimized + debuginfo] target(s) in 5m 35s
  • After removing the debug log of sccache
    • Finished dev [unoptimized + debuginfo] target(s) in 4m 38s

The first compile time here also needs to take into account the performance impact of the sccache debug logs, which should not actually increase that much. However, there are still many places where Databend is not cached, so we can see if there are any optimizations that can be made later. I submitted a proposal to upstream: Use opendal to handle the cache storage operations to see if we can change the sccache storage backend to use opendal, so that it will be easier to dock the storage and debug ~