BYOC option for your Snowflake workloads

Snowflake is the best thing to happen to data professionals since Excel

Snowflake has transformed enterprise data management, arguably the most significant leap since Excel. It offers an on-demand data warehouse with a pay-as-you-go model, seamless scalability from zero to virtually unlimited capacity, and no need for configuration or tuning. For the first time, non-technical users could provision and use an analytical database entirely on their own. Affordable for most workloads, it builds on the familiar 50-year-old SQL standard with transactional semantics. At the time, the only comparable technology was BigQuery, which actually predated Snowflake by a few years. Because it was tightly bound to GCP, BigQuery was not as widely known, though no less successful.

A significant number of workloads and datasets cannot be shipped to Snowflake

Despite its strengths, Snowflake has limitations. Arguably the main one is that it cannot run workloads locally: all data must be shipped to Snowflake’s own cloud account. A significant number of datasets and workloads require local processing for diverse reasons: testing data transformation logic during development cycles, data residency and sovereignty constraints, credit overruns, prohibitively high costs for certain workloads or datasets, the impracticality of shipping certain datasets or workloads to Snowflake’s cloud account, shift-left initiatives, operating in clouds or cloud regions where the Snowflake service is not yet available, prepaid cloud credits that are incompatible with Snowflake, and more. While these may seem niche, Snowflake’s ~$4B ARR, projected to reach ~$10B by decade’s end, underscores the growing significance of these issues and the need to run Snowflake workloads locally.

Historically, it was impractical to build a system that could run Snowflake workloads natively

Until recently, building a system to run Snowflake workloads natively was impractical, demanding substantial capital and effort. Snowflake itself has invested ~$1B and a decade of engineering into building out its platform. However, recent advancements have made this feasible for a small, dedicated team. This is exactly what Embucket is working on.

Recent advancements put it within reach of our small team

Embucket is being open sourced today and can already run several Snowflake DBT workloads natively

These advancements have enabled Embucket’s small team to build a data platform rapidly and on a modest budget. It already runs many Snowflake workloads natively with DBT support. It is not yet ready for production use, but for adventurous rustaceans it might be good enough for experimentation. The entire codebase is being open sourced today. If you are attending DataCouncil.ai this week, please come say hi at our booth.

Camuel Gilyadov
Co-founder