Multi-engine compute with complete observability

Spark, DuckDB, and Polars on one lake. Every run and query captures logs, spans, lineage, and cost. Full context for your team and agents from day zero.

[01] SPARK AUTO-INVESTIGATION

[02] AUTO-FIX & REDEPLOY

[01] Compute

Compute with context

Build. Deploy. Debug. We deploy your Spark jobs, monitor them in production, and start investigating the moment something goes wrong.

[02] Storage

Lake storage & query engines

Work on your own data with an integrated Iceberg storage and query layer, with lineage and observability attached to every query. You can also bring your own Iceberg catalog along for the ride.

[03] Code

English to distributed SQL

Use your own agent or IDE, and it inherits our Spark and Iceberg expertise and context. Ask for PySpark against your oleander lake in plain language.

npx skills add oleanderHQ/skills

Skill pack

Build a PySpark script for oleander.default.global_flowers. Filter to poisonous flowers, normalize genus and continent values, and compute risk slices by continent, toxicity band, genus, and bloom season with record counts and confidence metrics. Write partitioned outputs to oleander.analytics.toxicity_by_continent, oleander.analytics.toxicity_by_genus, and oleander.analytics.high_risk_species, and include idempotent upsert behavior so downstream dashboards and anomaly monitors can consume each dataset safely.
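We don't reproduce the PySpark your agent would generate from the prompt above. As a rough illustration of two steps it names, here is a plain-Python sketch: normalize labels and count poisonous records per (continent, toxicity band) slice, then apply a keyed upsert that is idempotent, because re-applying the same slice counts leaves the target unchanged. The field names `is_poisonous`, `continent`, and `toxicity_band` and the helpers `normalize`, `risk_slices`, and `upsert` are hypothetical, since the table schema isn't shown. Genus, bloom season, confidence metrics, and partitioned writes are left out. In Spark on Iceberg, the upsert step would more typically be a `MERGE INTO` statement.

```python
from collections import Counter

# Hypothetical row shape: the schema of oleander.default.global_flowers is not
# shown here, so these field names are illustrative assumptions.
Row = dict[str, object]
SliceKey = tuple[str, str]  # (continent, toxicity_band)


def normalize(value: str) -> str:
    """Collapse whitespace and title-case a label: ' south  america ' -> 'South America'."""
    return " ".join(value.split()).title()


def risk_slices(rows: list[Row]) -> dict[SliceKey, int]:
    """Keep poisonous flowers only and count records per (continent, toxicity_band)."""
    counts: Counter = Counter()
    for row in rows:
        if not row["is_poisonous"]:
            continue
        key = (normalize(str(row["continent"])), str(row["toxicity_band"]))
        counts[key] += 1
    return dict(counts)


def upsert(target: dict[SliceKey, int], updates: dict[SliceKey, int]) -> None:
    """Keyed upsert: overwrite existing slices, insert new ones.

    Re-applying the same updates leaves `target` unchanged, which is what makes
    a retried or replayed run safe for downstream consumers.
    """
    for key, count in updates.items():
        target[key] = count


if __name__ == "__main__":
    rows = [
        {"is_poisonous": True, "continent": " south america", "toxicity_band": "high"},
        {"is_poisonous": True, "continent": "South America ", "toxicity_band": "high"},
        {"is_poisonous": False, "continent": "Europe", "toxicity_band": "low"},
    ]
    table: dict[SliceKey, int] = {}
    slices = risk_slices(rows)
    upsert(table, slices)
    upsert(table, slices)  # a replayed run: same result
    print(table)  # {('South America', 'high'): 2}
```

Keying the upsert on the slice, rather than appending rows, is the design choice that makes replays safe: running the job twice produces the same table instead of doubled counts.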

[04] Incidents

Automated incident investigations

Anomaly investigation starts the moment an alert fires, pulling deep context from your telemetry lake to help pinpoint the root cause before it hits downstream systems.

Incidents / Spark job rows written dropped 15x

Investigations / #381

[05] Root cause

Full context-aware root cause analysis

Skip dashboard hopping. Debug production issues using an independent metadata context layer, and trace root causes across your data infrastructure from day zero.

[06] Telemetry

Query telemetry data with SQL

Correlate metrics, logs, traces, and lineage metadata instantly so you can understand the intent behind every deployed pipeline with zero context switching.
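To make querying telemetry with SQL concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in engine. It assumes a hypothetical `runs` table with one row per pipeline run per day (`pipeline`, `run_date`, `rows_written`). oleander's actual telemetry tables and SQL dialect aren't documented here. A window function compares each run with the average of the seven runs before it and reports the percentage change.

```python
import sqlite3

# Hypothetical telemetry schema: one row per pipeline run. Table and column
# names are assumptions for illustration, not oleander's schema.
SCHEMA = """
CREATE TABLE runs (
    pipeline     TEXT NOT NULL,
    run_date     TEXT NOT NULL,     -- ISO date; one run per day in this sketch
    rows_written INTEGER NOT NULL
);
"""

# Compare each run with the average of the 7 runs before it (a 7-day average
# under the one-run-per-day assumption) and report the percentage change.
QUERY = """
SELECT pipeline,
       run_date,
       rows_written,
       ROUND(100.0 * (rows_written - prev_avg) / prev_avg, 1) AS pct_change
FROM (
    SELECT pipeline, run_date, rows_written,
           AVG(rows_written) OVER (
               PARTITION BY pipeline ORDER BY run_date
               ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
           ) AS prev_avg
    FROM runs
) AS w
WHERE prev_avg IS NOT NULL   -- the first run has no history to compare against
ORDER BY pipeline, run_date;
"""


def volume_changes(rows: list[tuple[str, str, int]]) -> list[tuple[str, str, int, float]]:
    """Load run rows into an in-memory table and return per-run volume changes."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Made-up numbers: seven steady days, then a sharp drop.
    history = [("example_pipeline", f"2025-01-{day}", 15000) for day in range(21, 28)]
    history.append(("example_pipeline", "2025-01-28", 900))
    print(volume_changes(history)[-1])  # ('example_pipeline', '2025-01-28', 900, -94.0)
```

The window frame excludes the current row (`1 PRECEDING`), so a bad run is measured against its history rather than diluting the baseline it is compared to.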

[07] Alerting

Quick, smart alerting with incident triage

Reduce on-call burden. Every alert is paired with a detailed knowledge graph of your data infrastructure, so you understand how everything fits together. Your whole team shares that context and solves incidents faster.

MCP setup

@alerting Investigate new production alerts and generate triage context with downstream blast radius and likely remediation steps.

Active Alert

severity: P1 high-impact anomaly
pipeline: finance.billing.process_pending_invoices
signal: rows_written down 94% vs 7-day avg
started_at: 2025-01-28 10:03:12 UTC

Triage Context

blast_radius: 4 downstream models + 2 dashboards
linked_runs: 3 upstream runs in last 30 minutes
ownership: Billing Data Platform
recommended_next_step: roll back run i7k2n + replay window
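A blast radius like the one in the triage context above is, at its core, a downstream walk over the lineage graph. The sketch below is not oleander's implementation. It assumes a hypothetical in-memory lineage map (each node maps to the nodes that read from it) with `kind:name` node IDs, walks breadth-first from the failing job, and groups everything it reaches by kind.

```python
from collections import deque


def blast_radius(lineage: dict[str, list[str]], start: str) -> dict[str, list[str]]:
    """Breadth-first walk downstream from `start`; returns affected nodes grouped by kind.

    Node IDs are assumed (for this sketch) to look like "kind:name".
    """
    seen = {start}
    queue = deque([start])
    affected: dict[str, list[str]] = {}
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child in seen:  # reached via another path already; count it once
                continue
            seen.add(child)
            queue.append(child)
            kind, _, name = child.partition(":")
            affected.setdefault(kind, []).append(name)
    return affected


if __name__ == "__main__":
    # Hypothetical lineage for illustration only.
    lineage = {
        "job:process_pending_invoices": ["model:invoices_clean"],
        "model:invoices_clean": ["model:ar_aging", "model:revenue_daily"],
        "model:revenue_daily": ["dashboard:finance_overview", "model:ar_aging"],
        "model:ar_aging": ["dashboard:collections"],
    }
    print(blast_radius(lineage, "job:process_pending_invoices"))
    # {'model': ['invoices_clean', 'ar_aging', 'revenue_daily'],
    #  'dashboard': ['collections', 'finance_overview']}
```

The `seen` set matters: `model:ar_aging` is reachable through two paths in the example but is reported once, which keeps the count of affected assets honest.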

[08] Insights

The context graph for your data infrastructure

Every run, commit, deployment, and dataset is connected. Search it. Trace it. Understand it. Share it.
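Tracing also works upstream: from a dataset back to the runs, deployments, and commits behind it. Here is a minimal sketch. It assumes a hypothetical edge list where each edge points from producer to consumer (commit -> deployment -> run -> dataset). The node names and `trace_upstream` are illustrative and are not part of any oleander API. The function reverses the edges and walks level by level, so the nearest causes come first.

```python
from collections import defaultdict

# Hypothetical context-graph edge: (producer, consumer).
Edge = tuple[str, str]


def trace_upstream(edges: list[Edge], target: str) -> list[str]:
    """Return every node upstream of `target`, nearest first (level-by-level reverse BFS)."""
    parents: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)

    order: list[str] = []
    seen = {target}
    frontier = [target]
    while frontier:
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return order


if __name__ == "__main__":
    edges = [
        ("commit:c1", "deploy:d1"),
        ("deploy:d1", "run:r1"),
        ("dataset:raw.invoices", "run:r1"),
        ("run:r1", "dataset:billing.pending_invoices"),
    ]
    print(trace_upstream(edges, "dataset:billing.pending_invoices"))
    # ['run:r1', 'deploy:d1', 'dataset:raw.invoices', 'commit:c1']
```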

MCP setup

@insights Analyze the last 30 days of finance.billing.process_pending_invoices. Investigate the 15x drop in output volume and map the downstream impact.

Deployment & Execution

activity: 28 days active across 45 runs
outages: 3 major outages (#381, #394, #412)
data: 12.5 TB in; 2.1 TB out, with drift
uptime: 99.2% uptime vs. 15x volume drop

Code & Schema Evolution

commits: 18 commits linked to active runs
schema_changes: 4 schema migrations
files_touched: 12 files touched in spark/jobs/finance/
investigations: 7 automated investigations triggered