Skip to main content

3 posts tagged with "engineering"

View All Tags

· 3 min read
Lulu Zhang

When you use Buildbuddy with Bazel to build and test software, Buildbuddy captures information about each Bazel invocation, such as number of builds, build duration, remote cache performance, and more. Buildbuddy has a Trends page to visualize trends in this data over time.

The trends page allows you to see how improvements you are making to your builds affects your average build duration and other stats. It also exposes areas that might need improving. For example, if you see the cache hit rate go down over time, your build might have some non-deterministic build actions that could be improved, or some newly introduced dependencies that result in more frequent cache invalidations.

When we first created the Trends page, we used MySQL queries to aggregate build stats and generate the data we wanted to display. For a time this worked well, but we quickly ran into performance issues for customers that had very large numbers of builds. We were able to temporarily improve performance by adding various indices, and though this helped to reduce the number of rows read, it was not sufficient. Some customers do millions of builds monthly, and the Trends page (which can look back up to a year) for these customers was taking more than 20 minutes to load.

The queries behind the trends page require aggregation of multiple columns, such as cache hits and cache misses. A traditional row-based database like MySQL is not always ideal for such a use case. In row-based databases, data is stored row by row. When aggregating columns, more I/O seeks are required than a column-based database, which stores the data of each column in contiguous blocks. Moreover, column-based databases have a higher compression rate because consecutive values of the same column are of the same type and may repeat.

With a row-based store, we can see from this diagram that computing a sum of cache hit count would require us to load both block 1 and block 2. With a column-based store, all the cache hits data are stored in the same block.

Therefore, we felt that using ClickHouse, a column-based database, would improve the performance of required queries for the trends page. We validated ClickHouse’s performance against our use case: it took ClickHouse 0.317 seconds to process 1.5 million rows and calculate the stats. The same query took MySQL about 24 minutes.

One of our goals for data migration is to make sure the data is accurate. We added monitoring and compared data between MySQL and ClickHouse after we enabled double writing in production. One source of inconsistency was that data was inserted into ClickHouse both by the backfill script and production servers. Different to a traditional database, ClickHouse’s ReplacingMergeTree engine only deduplicates data in the background at an indeterminate time. As a result, we needed to manually run the OPTIMIZE operation to force ClickHouse to deduplicate data after the backfill was done. After we were confident in the data consistency, we finally enabled the Trends page to read from ClickHouse.

What's next

We are excited how ClickHouse unlocks more possibilities for us to provide analytical insights into builds, targets, tests and remote execution. For example, we want to add graphs that show how remote actions are spending most of their time. These insights can be used to guide remote execution performance optimizations.

We would love to hear your feedback about what stats and graphs you are interested in seeing. Join our Slack channel or email us at with any questions, comments, or thoughts.

· 5 min read
Brandon Duffany

Using a remote cache is a great way to speed up your Bazel builds! But by default, Bazel uploads almost everything to the remote cache.

If your network is slow and your build artifacts are very large (like a docker image) this can lead to poor performance.

To address this, and make it easier to fix, we built the new cache requests card.

In this post we'll explore what insights this card can give you into your builds, as well as some fun details about how the card works under the hood.

· 7 min read
Tyler Williams

Let's start with "what's BuildBuddy" for the kids in back. In short, we provide a UI, distributed cache, and remote execution platform for your Bazel builds. That means we securely compile your code, cache the artifacts, and help you visualize the results. We make it possible to build projects like Tensorflow from your laptop in under 5 minutes instead of 90 minutes.

Obviously to do all this, we have to handle some thorny engineering challenges, one of which is scheduling remote executions. For that, we have a scheduler. The scheduler just matches actions (basically jobs) received by our API to remote workers that actually do the work. If you think of a full build of something like Tensorflow as a 10 course meal, a single action is like a recipe for a tiny part of that meal. To make it easier to visualize, here's a real action from building BuildBuddy: