Git for Data

Version control for data and ETL pipelines. Roll back mistakes in minutes, not weeks.

DataVersion

main v42

CATALOG

sales
customer_data
orders
revenue_metrics
marketing_costs
product_sales
pricing_history
product_catalog
raw_transactions
sales_analysis
tax
support
warehouse

Query 1

Query 2

SELECT
  date,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS customers
FROM sales VERSION AS OF 40
GROUP BY date
ORDER BY date

Results

date	total_revenue	customers
2025-01-01	$45,230	128
2025-01-02	$52,180	142
2025-01-03	$58,920	156
2025-01-04	$61,450	163
2025-01-05	$47,890	131

The $50M Data Problem

Engineers waste up to 40% of their time monitoring, investigating, and fixing data. Even then, teams still can't trust the accuracy or freshness of their dashboards.

Impossible Rollbacks

When mistakes happen, data teams face 2-3 week backfill campaigns costing $50K-$200K per incident. One bad query can corrupt your entire pipeline.

AI Cannot Provide Proof

You want AI to answer data questions, but it can't show you proof or where data came from. Without lineage, AI-generated insights are just guesses.

Dependency Chaos

Tables depend on other tables, but tracking relationships manually is error-prone. One delayed upstream job breaks everything downstream.

Cost Explosion

Zero visibility into data spending. Companies waste 60-80% of budgets on redundant datasets, inefficient queries, and over-provisioned infrastructure.

Time Travel for Your Data Lake

Bring Git-like superpowers to your data pipelines. Branch, version, and roll back with confidence.

Instant Rollback & Cascade

95% faster incident response. One-click rollback with automatic downstream DAG cascade. Select any point in time, click Rollback, and the change propagates through every downstream table in the dependency tree automatically.

ROLLBACK TO VERSION 122
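
The cascade described above can be pictured as a graph traversal. The following is a minimal Python sketch under stated assumptions, not DataVersion's actual implementation: the table names come from the catalog mock-up, while the dependency edges in `DOWNSTREAM` and the helpers `downstream_of` and `rollback` are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical dependency edges: table -> tables that read from it.
DOWNSTREAM = {
    "raw_transactions": ["orders"],
    "orders": ["sales", "revenue_metrics"],
    "sales": ["sales_analysis"],
    "revenue_metrics": [],
    "sales_analysis": [],
}


def downstream_of(table, edges):
    """Return every table that transitively depends on `table`, breadth-first."""
    seen = {table}
    order = []
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for child in edges.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def rollback(table, version, versions, edges):
    """Point `table` at an earlier version and list the dependents to reprocess."""
    versions[table] = version
    return {
        "rolled_back": table,
        "version": version,
        "invalidated": downstream_of(table, edges),
    }
```

Calling `rollback("orders", 122, versions, DOWNSTREAM)` on this toy graph would move `orders` back to version 122 and report `sales`, `revenue_metrics`, and `sales_analysis` as needing reprocessing.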

Version Everything

Complete audit trail. Every query, transformation, and schema change is automatically versioned. Git-like branching for data and schema with 60%+ storage reduction.

SELECT * FROM sales VERSION AS OF 123
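
One way to think about `VERSION AS OF` is as a read against an append-only history of snapshots. The sketch below is a deliberately naive in-memory model in Python, assuming each commit produces a new numbered version; the `VersionedTable` class and its methods are hypothetical. Table formats such as Apache Iceberg track snapshots as metadata over shared data files rather than copying rows as this toy does.

```python
class VersionedTable:
    """Append-only list of snapshots; version N is the table after the Nth commit."""

    def __init__(self):
        self._snapshots = [[]]  # version 0 is the empty table

    @property
    def current_version(self):
        return len(self._snapshots) - 1

    def commit(self, rows):
        """Append rows as a new snapshot and return its version number."""
        self._snapshots.append(self._snapshots[-1] + list(rows))
        return self.current_version

    def as_of(self, version):
        """Read the table as it was at `version` without touching history."""
        if not 0 <= version <= self.current_version:
            raise ValueError(f"unknown version {version}")
        return list(self._snapshots[version])

    def rollback_to(self, version):
        """Commit a new snapshot equal to an older one, so the audit trail is kept."""
        self._snapshots.append(self.as_of(version))
        return self.current_version
```

In this model a rollback never deletes anything: it adds a new version whose contents match the older one, which is what keeps every earlier state queryable.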

Smart Dependencies & Lineage

Lineage captured at authoring time. Automatically track table dependencies at the partition level. Jobs run in the right order with change impact assessment.

DEPENDS ON sales.date = YESTERDAY
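
To illustrate what "jobs run in the right order" and a partition-level dependency like the one above could mean in practice, here is a small Python sketch. It assumes dependencies are stored as a mapping from each table to its upstream tables and that completed partitions are tracked as `(table, date)` pairs; the `UPSTREAM` edges and both helper functions are hypothetical.

```python
from datetime import date, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical edges: table -> upstream tables it reads from.
UPSTREAM = {
    "orders": {"raw_transactions"},
    "sales": {"orders"},
    "sales_analysis": {"sales", "marketing_costs"},
}


def run_order(upstream):
    """Return tables in an order where every table follows its upstreams."""
    return list(TopologicalSorter(upstream).static_order())


def is_ready(table, run_date, upstream, completed):
    """A table can run for `run_date` once every upstream table has
    finished the previous day's partition (the YESTERDAY rule)."""
    needed = run_date - timedelta(days=1)
    return all((parent, needed) in completed for parent in upstream.get(table, ()))
```

With `completed = {("sales", date(2025, 1, 4))}`, the check `is_ready("sales_analysis", date(2025, 1, 5), UPSTREAM, completed)` is false until the matching `marketing_costs` partition also lands.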

AI-Powered Discovery

Hours to seconds. AI agents understand schemas and lineage. Ask questions in natural language and get instant SQL generation and query execution.

"Show me sales by region last month"

Built for Your Role

One platform, every data persona

Data Engineer

"Zero-ETL pipeline creation. Write a query, click Save as Table, done."

That query you just wrote? Click "Save as Table" and it becomes a production-ready data pipeline with scheduling, dependency tracking, error handling, and automatic rollback propagation. No Airflow configs, no YAML files.

Data Scientist

"S3 data to queryable table in 30 seconds. No tickets, no waiting."

Point to any S3 location and we auto-detect the schema and create a queryable table instantly. The import wizard handles partitions, data types, everything. Start running Python/PySpark queries immediately.
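
As a rough picture of what schema auto-detection involves, the Python sketch below infers column types from a small CSV sample by trying progressively looser types. It is an illustration under stated assumptions only: the functions `infer_type` and `infer_schema`, the type names, and the choice of CSV input are all hypothetical, and the page does not say how the import wizard actually works.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that parses every sampled value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "string"


def infer_schema(csv_text):
    """Map each column in a CSV sample to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}
```

Given a sample with columns `date`, `revenue`, and `customers`, this sketch would label values like `2025-01-01` as `string`, `45230.5` as `double`, and `128` as `bigint`.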

Analytics Manager

"Git for your data. Feature branches, preview periods, safe merges."

Develop and test data transformations on isolated feature branches without touching production. Enable Preview mode to let stakeholders validate changes. Merge when ready.
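
The branch workflow can be modelled as named pointers into a history of snapshots, much like Git refs. The Python sketch below is a simplified illustration, assuming fast-forward merges only; the `Catalog` class and its methods are hypothetical and do not describe the product's API.

```python
class Catalog:
    """Branches are names pointing at immutable snapshots of {table: rows}."""

    def __init__(self):
        self.snapshots = {0: {}}   # snapshot id -> table contents
        self.parents = {0: None}   # snapshot id -> parent snapshot id
        self.branches = {"main": 0}
        self._next_id = 1

    def create_branch(self, name, source="main"):
        """A new branch starts at the same snapshot as its source; nothing is copied."""
        self.branches[name] = self.branches[source]

    def write(self, branch, table, rows):
        """Create a new snapshot on `branch` only; other branches are untouched."""
        base_id = self.branches[branch]
        snapshot = dict(self.snapshots[base_id])
        snapshot[table] = list(rows)
        new_id = self._next_id
        self._next_id += 1
        self.snapshots[new_id] = snapshot
        self.parents[new_id] = base_id
        self.branches[branch] = new_id
        return new_id

    def read(self, branch, table):
        return self.snapshots[self.branches[branch]].get(table, [])

    def merge(self, source, target="main"):
        """Fast-forward `target` to `source` if target has not moved since branching."""
        snapshot_id = self.branches[source]
        while snapshot_id is not None and snapshot_id != self.branches[target]:
            snapshot_id = self.parents[snapshot_id]
        if snapshot_id is None:
            raise RuntimeError("branches diverged; a real system would need a full merge")
        self.branches[target] = self.branches[source]
```

Writes to a feature branch stay invisible to readers of `main` until `merge` moves the `main` pointer, which is the isolation property the paragraph above describes.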

On-Call Engineer

"Pipeline delayed? One click per level to find the root cause."

Get paged about a late data pipeline? Open the table and see its upstream dependencies with completion status at a glance. Click the incomplete one. Repeat. Find the root cause in seconds.
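
The "click the incomplete one, repeat" routine amounts to walking upstream until a table has no incomplete parents. A minimal Python sketch, assuming an acyclic dependency map and a per-table completion flag; the function `find_root_cause` and the data shapes are hypothetical:

```python
def find_root_cause(table, upstream, complete):
    """Follow incomplete upstream tables from `table` and return the path walked.

    `upstream` maps a table to the list of tables it reads from (assumed acyclic);
    `complete` maps a table to True once its latest run has finished.
    The last element of the returned path is the deepest blocked table.
    """
    path = [table]
    current = table
    while True:
        blocked = [p for p in upstream.get(current, []) if not complete.get(p, False)]
        if not blocked:
            return path
        current = blocked[0]  # one click per level: open the first incomplete parent
        path.append(current)
```

If several parents are incomplete at the same level, this sketch follows only the first; an on-call engineer would check each of them in turn.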

Data Ops Engineer

"One-click rollback with automatic cascade correction."

Bad data got through? Roll back the source table to any previous snapshot. DataVersion automatically marks all downstream tables as invalid and triggers cascade reprocessing.

VP of Data

"Zero compute waste. Auto-provision, auto-shutdown, auto-savings."

No idle clusters burning money. EMR Serverless spins up only when jobs run, then shuts down automatically. Pay only for actual compute seconds.

Everything You Need

A complete data platform in one deployment

AI Assistant

Ask questions in plain English. Get executable SQL instantly. One click to run, one click to save.

Feature Branches

Develop and test on isolated branches. Preview for stakeholders. Merge to production when ready.

Lineage Tracking

Complete lineage from source to report, snapshot by snapshot. Full audit trail for compliance.

Easy Import

Point to S3, paste from Excel, or connect external tables. Auto-detect schema and partitions.

Single Sign-On

Enterprise SSO with AWS Cognito. Secure authentication, audit logs, and cross-account sharing.

Apache Iceberg

Industry-standard table format with time travel, schema evolution, and ACID transactions built-in.

Ready to Take Control of Your Data?

Download the desktop app or try our live demo

Try Live Demo

Windows: Windows 10 or later

macOS: macOS 10.15 or later

Linux: Ubuntu, Debian, or Fedora

Get in Touch

Email Us

GitHub

Community