Show HN: Rocky – Rust SQL engine with branches, replay, column lineage

Schema drift is one of the most common silent causes of data-pipeline failures. Enter Rocky, a Rust-based SQL engine that lets you branch, replay, and track column lineage the way developers version-control code, bringing Git-style safety to every MySQL/PostgreSQL query.

What is Rocky and How Does It Differ from Classic SQL Engines?

Rocky is a Rust-native SQL engine that runs on top of existing MySQL or PostgreSQL instances. It keeps the familiar SQL syntax but adds a layer of version control that most databases lack. I've found that the biggest pain points in my work are the fragile, ad-hoc queries that break downstream dashboards when schema changes sneak in. Rocky tackles that by treating every query as a first-class citizen in a branching model.

  • Rust-native performance – compiled and memory-safe, reportedly up to 2× faster than interpreted Python-based engines.
  • Branch‑first architecture – every query lives on a branch; you can create, merge, or discard branches without touching the underlying data.
  • Built‑in replay & lineage – automatic capture of column provenance, making audits and rollbacks trivial.

So the takeaway? Rocky isn't a replacement for your database; it's an overlay that gives you code-like safety for SQL operations.

Core Features Explained

Let’s break down the three pillars that make Rocky stand out.

Branching queries

You can snapshot a database state, run experimental analytics, then merge or discard. Think of it as a lightweight git branch for data. When you create a branch, Rocky takes a copy of the schema and any views you touch, leaving the production tables untouched.
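To make that concrete, here is a minimal sketch (in Python for brevity, and purely illustrative rather than Rocky's actual Rust internals) of how a branch-first overlay can snapshot schema metadata per branch while the underlying tables stay shared:

```python
# Conceptual sketch only: how a branch-first overlay might track schema
# snapshots per branch while the real tables stay shared and untouched.
import copy

class BranchStore:
    def __init__(self, base_schema):
        # Map branch name -> {table: [columns]}
        self.branches = {"main": base_schema}

    def create_branch(self, name, source="main"):
        # Deep-copy the schema snapshot so experiments never leak back.
        self.branches[name] = copy.deepcopy(self.branches[source])

    def drop_branch(self, name):
        self.branches.pop(name)

    def merge(self, source, target="main"):
        # Naive merge policy for illustration: source wins on conflicts.
        self.branches[target].update(self.branches[source])

store = BranchStore({"orders": ["customer_id", "amount", "order_date"]})
store.create_branch("dev")
store.branches["dev"]["orders"].append("revenue")   # experiment on dev
assert "revenue" not in store.branches["main"]["orders"]
store.merge("dev")                                   # promote the change
assert "revenue" in store.branches["main"]["orders"]
```

The key design point is the deep copy at branch creation: the experimental branch can mutate freely, and "merge" is just a metadata operation.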

Replay engine

Every DML/DDL statement is logged in a deterministic order. You can replay a branch on a fresh server to reproduce a bug, or to run a CI test that validates a new transformation without touching real data.
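Conceptually, a deterministic replay log is just an ordered, append-only list of statements that can be re-applied to empty state. Here is a toy illustration in Python (not Rocky's implementation), using a made-up two-verb mini-language in place of real DML/DDL:

```python
# Toy model of deterministic replay: every statement is appended to a log
# in execution order, and replaying the log on a fresh state reproduces
# the live state exactly. The two-verb mini-language here is invented.
def apply(state, stmt):
    op, table, *rest = stmt.split()
    if op == "CREATE":
        state[table] = []
    elif op == "INSERT":
        state[table].append(rest[0])
    return state

def execute(log, state, stmt):
    log.append(stmt)        # capture first, so the log is authoritative
    return apply(state, stmt)

log, live = [], {}
for stmt in ["CREATE orders", "INSERT orders 42", "INSERT orders 7"]:
    execute(log, live, stmt)

replayed = {}               # stands in for a fresh server
for stmt in log:
    apply(replayed, stmt)

assert replayed == live     # deterministic reconstruction
```

Because the log is ordered and append-only, replaying it in CI against a throwaway database gives the same end state every time.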

Column lineage tracker

This feature builds a visual graph of how each output column is derived from source tables. For GDPR, “right‑to‑be‑forgotten” audits, or just sanity checks, lineage is gold. It’s like having a cheat sheet that shows the exact SQL path each value took.
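Under the hood, lineage is naturally modeled as a directed graph from derived columns back to source columns, and provenance is a walk over that graph. A conceptual sketch with invented column names (this models the idea, not Rocky's API):

```python
# Conceptual model: lineage as a directed graph mapping each derived
# column to the columns it was computed from. Column names are invented.
lineage = {
    "report.revenue":     ["orders.amount"],
    "report.customer_id": ["orders.customer_id"],
    "dash.total":         ["report.revenue"],  # derived from a derived column
}

def provenance(column, graph):
    """Return the set of ultimate source columns behind `column`."""
    parents = graph.get(column)
    if not parents:
        return {column}          # a raw source column: nothing upstream
    sources = set()
    for p in parents:
        sources |= provenance(p, graph)
    return sources

# Provenance is transitive: dash.total traces through report.revenue
# all the way back to orders.amount.
assert provenance("dash.total", lineage) == {"orders.amount"}
```

The transitive walk is what makes audits cheap: given any output value, you get the full set of raw columns that could have influenced it.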

Practical Walkthrough: Setting Up Rocky and Running Your First Branch

Below is a step-by-step guide that takes you from `cargo install rocky` to a finished branch you can merge back into main. I’ve included a short CLI session to illustrate the flow.

# Install Rocky
cargo install rocky

# Connect to PostgreSQL
rocky connect postgres://user:pass@localhost:5432/mydb

# Create a new branch
rocky create-branch dev

# Run a transformation query
rocky exec -b dev "SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE order_date > '2024-01-01'
GROUP BY customer_id;"

# View column lineage
rocky lineage show dev

**Key Talking Points**

  • Rust toolchain 1.80+ required.
  • Rocky branches map to Git‑style refs internally.
  • Branch metadata lives in a lightweight SQLite store by default but can be swapped for S3 or a dedicated metadata service.

Now you’re ready to experiment without fear. Just remember that every branch is a sandbox; you can always drop it if the experiment goes south.

Why It Matters: Real‑World Impact for DBAs, Developers, and Analysts

When you can roll back a faulty transformation instantly, you basically eliminate one major source of production incidents. In my experience, the biggest cost of a bad query is the time spent hunting down the root cause. Rocky turns that into a simple `rocky rollback dev`.

  • Reduced production incidents – instant rollback of a faulty transformation without restoring a full backup.
  • Regulatory compliance – lineage graphs satisfy “right‑to‑be‑forgotten” and audit requirements for finance, health, and e‑commerce.
  • Collaboration across teams – data analysts can experiment on their own branch while engineers keep the main branch stable, mirroring feature‑branch workflows in software development.

Honestly, the ability to keep a clean, auditable history of every data transformation is a game changer. If you’ve ever struggled with data drift or silent schema changes, Rocky offers a clean, versioned way to address both.

Actionable Takeaways & Next Steps

1. Evaluate: Run Rocky’s built‑in benchmark against MySQL/PostgreSQL on a representative workload to see the performance gains.

2. Integrate: Hook Rocky’s replay API into your CI/CD pipeline (e.g., GitHub Actions) so that every PR runs a replay test before merging.

3. Adopt: Start a pilot project—branch a reporting table, capture lineage, and present the audit report to compliance. It’s a quick win.

4. Contribute: Fork the repo, add a new connector (e.g., Snowflake), and submit a PR. The community is welcoming and eager to grow the ecosystem.
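For step 2 above, a CI hook could look roughly like the GitHub Actions fragment below. The workflow syntax is standard, but the branch-per-commit pattern is my assumption rather than a documented Rocky workflow, and the `rocky` invocations reuse only the commands shown in the walkthrough; check Rocky's docs for the real replay interface.

```yaml
# Sketch of a replay test in CI. STAGING_DB_URL is a hypothetical secret
# pointing at a non-production database.
name: replay-test
on: pull_request
jobs:
  replay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install rocky
      - run: rocky connect ${{ secrets.STAGING_DB_URL }}
      # One throwaway branch per commit keeps concurrent runs isolated.
      - run: rocky create-branch ci-${{ github.sha }}
      - run: rocky exec -b ci-${{ github.sha }} "SELECT 1;"
      - run: rocky lineage show ci-${{ github.sha }}
```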

So what’s the catch? Rocky is still maturing, so if you’re running a mission‑critical production system, treat it as a staging tool for now. But the upside is pretty huge.

Frequently Asked Questions

What is the main advantage of using Rocky over MySQL for analytical workloads?

Rocky’s branch‑first model lets you test new transformations without affecting the live schema, and its column‑lineage engine provides instant provenance that MySQL lacks.

Can Rocky work with an existing PostgreSQL database, or does it require a fresh data store?

It works with your existing database. Rocky connects to any PostgreSQL instance via standard libpq credentials; it stores branch metadata separately, leaving the original tables untouched.

How does Rocky’s replay feature differ from traditional database backups?

Replay records the exact sequence of SQL statements (including temporary tables) rather than a binary snapshot, enabling deterministic reconstruction of any point‑in‑time state without restoring large dump files.

Is Rocky production‑ready for mission‑critical pipelines?

Rocky is stable for development and staging; its Rust core guarantees memory safety, and the project includes integration tests against MySQL 8 and PostgreSQL 15. Production readiness depends on your tolerance for a newer open‑source project and the need for built‑in lineage.

What programming language should I use to embed Rocky in my data‑engineering workflow?

Rocky ships a CLI and a Rust library; you can call it from any language (Python, Go, Java) via the CLI, but for native performance and full API access, use Rust.

