Show HN: Rocky – Rust SQL engine with branches, replay, column lineage
Did you know that more than 70% of data‑pipeline failures are caused by invisible schema drift? Enter Rocky, the first Rust‑based SQL engine that lets you branch, replay, and track column lineage the way developers version‑control code, bringing Git‑style safety to every MySQL/PostgreSQL query.
What is Rocky and How Does It Differ from Classic SQL Engines?
Rocky is a Rust‑native SQL engine that runs on top of existing MySQL or PostgreSQL instances. It keeps the familiar SQL syntax but adds a layer of version control that most databases lack. I’ve found that the biggest pain points in my work are the fragile, ad‑hoc queries that break downstream dashboards when schema changes sneak in. Rocky tackles that by treating every query as a first‑class citizen in a branching model.
- Rust‑native performance – compiled, memory‑safe, up to 2× faster than CPython‑based engines.
- Branch‑first architecture – every query lives on a branch; you can create, merge, or discard branches without touching the underlying data.
- Built‑in replay & lineage – automatic capture of column provenance, making audits and rollbacks trivial.
So the takeaway? Rocky isn’t a replacement for your database; it’s an overlay that gives you code‑like safety for SQL operations.
Core Features Explained
Let’s break down the three pillars that make Rocky stand out.
Branching queries
You can snapshot a database state, run experimental analytics, then merge or discard. Think of it as a lightweight git branch for data. When you create a branch, Rocky takes a copy of the schema and any views you touch, leaving the production tables untouched.
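To make the branching model concrete, here is a minimal Rust sketch. It is not Rocky's implementation: it assumes a branch is nothing more than a private copy of schema metadata (table name to column list), and `Catalog`, `create_branch`, `add_column`, `merge`, and `discard` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Schema metadata only: table name -> ordered column names.
/// (Hypothetical type for illustration; not Rocky's real data model.)
pub type Schema = BTreeMap<String, Vec<String>>;

#[derive(Debug, Default)]
pub struct Catalog {
    main: Schema,
    branches: BTreeMap<String, Schema>,
}

impl Catalog {
    pub fn new(main: Schema) -> Self {
        Catalog { main, branches: BTreeMap::new() }
    }

    /// Create a branch by copying the schema of `main`; no table data is copied.
    /// Returns false if a branch with that name already exists.
    pub fn create_branch(&mut self, name: &str) -> bool {
        if self.branches.contains_key(name) {
            return false;
        }
        self.branches.insert(name.to_string(), self.main.clone());
        true
    }

    /// Add a column on a branch only. Returns false if the branch or table is unknown.
    pub fn add_column(&mut self, branch: &str, table: &str, column: &str) -> bool {
        match self.branches.get_mut(branch).and_then(|s| s.get_mut(table)) {
            Some(cols) => {
                cols.push(column.to_string());
                true
            }
            None => false,
        }
    }

    /// Naive merge: the branch schema replaces `main` and the branch is removed.
    pub fn merge(&mut self, branch: &str) -> bool {
        match self.branches.remove(branch) {
            Some(schema) => {
                self.main = schema;
                true
            }
            None => false,
        }
    }

    /// Discard: drop the branch, leaving `main` exactly as it was.
    pub fn discard(&mut self, branch: &str) -> bool {
        self.branches.remove(branch).is_some()
    }

    /// Look up a table's columns on `main` (None) or on a named branch.
    pub fn columns(&self, branch: Option<&str>, table: &str) -> Option<&Vec<String>> {
        let schema = match branch {
            Some(b) => self.branches.get(b)?,
            None => &self.main,
        };
        schema.get(table)
    }
}
```

The point of the sketch is the isolation property: a change made through `add_column` is visible only on the branch until `merge` is called, and `discard` throws it away without `main` ever having seen it.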
Replay engine
Every DML/DDL statement is logged in a deterministic order. You can replay a branch on a fresh server to reproduce a bug, or to run a CI test that validates a new transformation without touching real data.
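The idea of an ordered statement log that can be replayed elsewhere can be sketched in a few lines of Rust. This is an illustration under assumptions, not Rocky's log format or API: `LogEntry`, `ReplayLog`, `record`, and `replay` are hypothetical names, and the "server" is any closure that accepts a SQL string.

```rust
/// One logged statement. `seq` fixes the replay order.
/// (Hypothetical structure; the post does not describe Rocky's log format.)
#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub seq: u64,
    pub sql: String,
}

#[derive(Debug, Default)]
pub struct ReplayLog {
    entries: Vec<LogEntry>,
    next_seq: u64,
}

impl ReplayLog {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a statement and return the sequence number it was given.
    pub fn record(&mut self, sql: &str) -> u64 {
        let seq = self.next_seq;
        self.entries.push(LogEntry { seq, sql: sql.to_string() });
        self.next_seq += 1;
        seq
    }

    /// Feed every statement to `execute` in sequence order, stopping at the
    /// first error. Returns how many statements were applied.
    pub fn replay<F>(&self, mut execute: F) -> Result<usize, String>
    where
        F: FnMut(&str) -> Result<(), String>,
    {
        let mut applied = 0;
        for entry in &self.entries {
            execute(&entry.sql)
                .map_err(|e| format!("statement {} failed: {}", entry.seq, e))?;
            applied += 1;
        }
        Ok(applied)
    }
}
```

Because `replay` only needs a closure, the same log could be pointed at a scratch database in CI or at an in-memory recorder in a unit test; the error message carries the sequence number, which is what you would want when reproducing a bug.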
Column lineage tracker
This feature builds a visual graph of how each output column is derived from source tables. For GDPR “right‑to‑be‑forgotten” requests, audits, or just sanity checks, lineage is gold. It’s like having a cheat sheet that shows the exact SQL path each value took.
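A lineage graph of this kind boils down to "which columns does each derived column read from", followed by a walk back to the sources. The Rust sketch below shows that walk; it assumes columns are identified by plain `table.column` strings, and `Lineage`, `derive`, and `sources` are hypothetical names rather than anything from Rocky itself.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Direct dependencies: derived column -> the columns it is computed from.
/// Columns that never appear as a key are treated as source columns.
#[derive(Debug, Default)]
pub struct Lineage {
    deps: BTreeMap<String, Vec<String>>,
}

impl Lineage {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `output` is computed from `inputs`.
    pub fn derive(&mut self, output: &str, inputs: &[&str]) {
        let inputs = inputs.iter().map(|s| s.to_string()).collect();
        self.deps.insert(output.to_string(), inputs);
    }

    /// Walk the graph from `column` back to the source columns it depends on.
    pub fn sources(&self, column: &str) -> BTreeSet<String> {
        let mut found = BTreeSet::new();
        let mut visited = BTreeSet::new();
        let mut stack = vec![column.to_string()];
        while let Some(current) = stack.pop() {
            if !visited.insert(current.clone()) {
                continue; // already handled; also guards against cycles
            }
            match self.deps.get(&current) {
                // Derived column: keep walking through its inputs.
                Some(inputs) => stack.extend(inputs.iter().cloned()),
                // No recorded inputs: this is a source column.
                None => {
                    found.insert(current);
                }
            }
        }
        found
    }
}
```

For a right‑to‑be‑forgotten request, the useful direction is often the reverse one (from a source column to everything derived from it); that is the same traversal over an inverted map.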
Practical Walkthrough: Setting Up Rocky and Running Your First Branch
Below is a step‑by‑step guide that’ll get you from cargo install rocky to a finished branch that you can merge back into main. I’ve included a short shell script that uses the Rocky CLI to illustrate the flow.
```shell
# Install Rocky
cargo install rocky

# Connect to PostgreSQL
rocky connect postgres://user:pass@localhost:5432/mydb

# Create a new branch
rocky create-branch dev

# Run a transformation query
rocky exec -b dev "SELECT customer_id, SUM(amount) AS revenue
FROM orders
WHERE order_date > '2024-01-01'
GROUP BY customer_id;"

# View column lineage
rocky lineage show dev
```
**Key Talking Points**
- Rust toolchain 1.80+ required.
- Rocky branches map to Git‑style refs internally.
- Branch metadata lives in a lightweight SQLite store by default but can be swapped for S3 or a dedicated metadata service.
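A swappable metadata backend usually means the rest of the code talks to a small interface rather than to SQLite directly. The following Rust sketch shows that shape under loud assumptions: `MetadataStore`, `InMemoryStore`, and `register_branch` are hypothetical names, the `parent=...` string is an invented placeholder format, and an in-memory map stands in for the real SQLite store.

```rust
use std::collections::HashMap;

/// Minimal interface a branch-metadata backend would need.
/// (Hypothetical trait; Rocky's real storage interface is not shown in the post.)
pub trait MetadataStore {
    fn put(&mut self, branch: &str, metadata: &str);
    fn get(&self, branch: &str) -> Option<String>;
    fn delete(&mut self, branch: &str) -> bool;
}

/// In-memory stand-in for the default SQLite-backed store.
#[derive(Default)]
pub struct InMemoryStore {
    rows: HashMap<String, String>,
}

impl MetadataStore for InMemoryStore {
    fn put(&mut self, branch: &str, metadata: &str) {
        self.rows.insert(branch.to_string(), metadata.to_string());
    }

    fn get(&self, branch: &str) -> Option<String> {
        self.rows.get(branch).cloned()
    }

    fn delete(&mut self, branch: &str) -> bool {
        self.rows.remove(branch).is_some()
    }
}

/// Code written against the trait does not care which backend it is given.
pub fn register_branch(store: &mut dyn MetadataStore, branch: &str, parent: &str) {
    // Placeholder metadata format, invented for this example.
    store.put(branch, &format!("parent={}", parent));
}
```

An S3 or service-backed store would be another `impl MetadataStore`, and `register_branch` would not need to change.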
Now you’re ready to experiment without fear. Just remember that every branch is a sandbox; you can always drop it if the experiment goes south.
Why It Matters: Real‑World Impact for DBAs, Developers, and Analysts
When you can roll back a faulty transformation instantly, you remove a major source of production incidents. In my experience, the biggest cost of a bad query is the time spent hunting down the root cause. Rocky turns that into a single command: rocky rollback dev.
- Reduced production incidents – instant rollback of a faulty transformation without restoring a full backup.
- Regulatory compliance – lineage graphs satisfy “right‑to‑be‑forgotten” and audit requirements for finance, health, and e‑commerce.
- Collaboration across teams – data analysts can experiment on their own branch while engineers keep the main branch stable, mirroring feature‑branch workflows in software development.
Honestly, the ability to keep a clean, auditable history of every data transformation is a game changer. If you’ve ever struggled with “data drift” or “schema drift,” Rocky offers a clean, versioned way to address both.
Actionable Takeaways & Next Steps
1. Evaluate: Run Rocky’s built‑in benchmark against your MySQL/PostgreSQL instance on a representative workload to measure how it performs compared with querying the database directly.
2. Integrate: Hook Rocky’s replay API into your CI/CD pipeline (e.g., GitHub Actions) so that every PR runs a replay test before merging.
3. Adopt: Start a pilot project—branch a reporting table, capture lineage, and present the audit report to compliance. It’s a quick win.
4. Contribute: Fork the repo, add a new connector (e.g., Snowflake), and submit a PR. The community is welcoming and eager to grow the ecosystem.
So what’s the catch? Rocky is still maturing, so if you’re running a mission‑critical production system, treat it as a staging tool for now. But the upside is pretty huge.
Frequently Asked Questions
What is the main advantage of using Rocky over MySQL for analytical workloads?
Rocky’s branch‑first model lets you test new transformations without affecting the live schema, and its column‑lineage engine provides instant provenance that MySQL lacks.
Can Rocky work with an existing PostgreSQL database, or does it require a fresh data store?
It works with an existing database. Rocky connects to any PostgreSQL instance via standard libpq credentials; it stores branch metadata separately, leaving the original tables untouched.
How does Rocky’s replay feature differ from traditional database backups?
Replay records the exact sequence of SQL statements (including temporary tables) rather than a binary snapshot, enabling deterministic reconstruction of any point‑in‑time state without restoring large dump files.
Is Rocky production‑ready for mission‑critical pipelines?
Rocky is stable for development and staging; its core benefits from Rust’s memory‑safety guarantees, and the project includes integration tests against MySQL 8 and PostgreSQL 15. Production readiness depends on your tolerance for a newer open‑source project and your need for built‑in lineage.
What programming language should I use to embed Rocky in my data‑engineering workflow?
Rocky ships a CLI and a Rust library; you can call it from any language (Python, Go, Java) via the CLI, but for native performance and full API access, use Rust.
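The CLI route looks the same from any language: build the argument list and spawn the binary. Here is a minimal Rust sketch of that, using only the `rocky exec -b <branch> "<sql>"` invocation from the walkthrough. The helper names `exec_args` and `run_on_branch` are hypothetical, and the sketch assumes `rocky` is on your PATH and reports failure through a non‑zero exit code; it does not use the Rust library, whose API is not shown here.

```rust
use std::process::Command;

/// Build the argument list for `rocky exec -b <branch> "<sql>"`,
/// the invocation shown in the walkthrough.
pub fn exec_args(branch: &str, sql: &str) -> Vec<String> {
    vec![
        "exec".to_string(),
        "-b".to_string(),
        branch.to_string(),
        sql.to_string(),
    ]
}

/// Run the query through the `rocky` binary and return its stdout.
/// Assumption: `rocky` is on PATH and exits non-zero on failure.
pub fn run_on_branch(branch: &str, sql: &str) -> Result<String, String> {
    let output = Command::new("rocky")
        .args(exec_args(branch, sql))
        .output()
        .map_err(|e| format!("could not start rocky: {}", e))?;

    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}
```

Passing the SQL as a single argument, rather than interpolating it into a shell string, sidesteps shell quoting problems with multi‑line queries.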