//Walkthrough

Multi-engine compute with complete observability

Spark, DuckDB, and Polars on one lake. Every run and query captures logs, spans, lineage, and cost. Full context for your team and agents from day zero.

[01] SPARK AUTO-INVESTIGATION
[02] AUTO-FIX & REDEPLOY
//Core
[01] Compute

Compute with context

Build. Deploy. Debug. Let us deploy your Spark jobs and handle production monitoring, running investigations as soon as things go wrong.

process_pending_invoices.py
[02] Storage

Lake storage & query engines

Work on your own data with an integrated Iceberg storage and query layer, with lineage and observability attached to every query. You can also bring your own Iceberg catalog.

flower_power.sql
[03] Code

English to distributed SQL

Use your own agent or IDE, equipped with our Spark & Iceberg expertise and context. Ask for PySpark against your oleander lake in plain language.

npx skills add oleanderHQ/skills
Skill pack

Build a PySpark script for oleander.default.global_flowers. Filter to poisonous flowers, normalize genus and continent values, and compute risk slices by continent, toxicity band, genus, and bloom season with record counts and confidence metrics. Write partitioned outputs to oleander.analytics.toxicity_by_continent, oleander.analytics.toxicity_by_genus, and oleander.analytics.high_risk_species, and include idempotent upsert behavior so downstream dashboards and anomaly monitors can consume each dataset safely.
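The "idempotent upsert behavior" requested in the prompt above can be illustrated with a minimal plain-Python sketch. This is not the script the skill pack would generate: the slice key `(continent, toxicity_band)`, the `normalize` helper, and the sample rows are all hypothetical, and a real job against Iceberg tables would more likely use Spark SQL's `MERGE INTO`. The point is only that replaying the same batch leaves the target unchanged.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical slice key: (continent, toxicity_band).
Key = Tuple[str, str]


def normalize(value: str) -> str:
    """Collapse whitespace and title-case a genus or continent value."""
    return " ".join(value.split()).title()


def upsert(target: Dict[Key, dict], rows: Iterable[dict]) -> Dict[Key, dict]:
    """Insert or overwrite rows by key, so a replayed batch changes nothing."""
    for row in rows:
        key = (normalize(row["continent"]), row["toxicity_band"])
        target[key] = {"record_count": row["record_count"]}
    return target


# Made-up sample batch, for illustration only.
batch = [
    {"continent": "  south america ", "toxicity_band": "high", "record_count": 12},
    {"continent": "Europe", "toxicity_band": "low", "record_count": 7},
]

table: Dict[Key, dict] = {}
upsert(table, batch)
first_pass = dict(table)
upsert(table, batch)  # replay the same batch, as a retried run would
```

Because rows are keyed rather than appended, a retried or duplicated run cannot double-count records, which is what lets dashboards and monitors read the output safely.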

//Observability
[04] Incidents

Automated incident investigations

Investigation starts the moment an alert fires. Pull deep context from your telemetry lake to help pinpoint the root cause before the impact spreads downstream.

Incidents/ spark job rows written dropped 15x
Investigations/ #381
    [05] Root cause

    Full context-aware root cause analysis

    Skip dashboard hopping. Debug production issues using an independent metadata context layer and trace root causes across your data infrastructure on day zero.

    CLI
    [06] Telemetry

    Query telemetry data with SQL

    Correlate metrics, logs, traces, and lineage metadata instantly so you can understand the intent behind every deployed pipeline with zero context switching.

    pending_invoices.sql
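As a rough illustration of correlating run metadata with logs in SQL, the sketch below uses Python's built-in `sqlite3` with an in-memory database. The `runs` and `logs` tables, their columns, and every value in them are assumptions made up for this example; they do not reflect oleander's actual telemetry schema or query interface.

```python
import sqlite3

# In-memory stand-in for a telemetry store (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (run_id TEXT PRIMARY KEY, pipeline TEXT, rows_written INTEGER);
    CREATE TABLE logs (run_id TEXT, level TEXT, message TEXT);
    INSERT INTO runs VALUES
        ('r1', 'process_pending_invoices', 150000),
        ('r2', 'process_pending_invoices', 10000);
    INSERT INTO logs VALUES
        ('r1', 'INFO', 'write complete'),
        ('r2', 'WARN', 'upstream partition empty'),
        ('r2', 'INFO', 'write complete');
    """
)

# Join each run to its warning-level logs so low-volume runs surface
# together with the log evidence that may explain them.
query = """
    SELECT r.run_id, r.rows_written, COUNT(l.message) AS warnings
    FROM runs AS r
    LEFT JOIN logs AS l ON l.run_id = r.run_id AND l.level = 'WARN'
    GROUP BY r.run_id, r.rows_written
    ORDER BY r.rows_written
"""
results = conn.execute(query).fetchall()
for run_id, rows_written, warnings in results:
    print(run_id, rows_written, warnings)
```

The `LEFT JOIN` keeps runs that logged no warnings, so healthy and unhealthy runs appear side by side in one result set.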
    [07] Alerting

    Quick, smart alerting with incident triage

    Reduce on-call burden. Every alert is paired with a detailed knowledge graph of your data infrastructure so you understand how everything fits together. Share that context with your team to resolve incidents faster.

    @alerting Investigate new production alerts and generate triage context with downstream blast radius and likely remediation steps.

    Active Alert

    severity: P1 high-impact anomaly
    pipeline: finance.billing.process_pending_invoices
    signal: rows_written down 94% vs 7-day avg
    started_at: 2025-01-28 10:03:12 UTC
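A signal such as "rows_written down 94% vs 7-day avg" could be computed along the lines of the sketch below. The function names, the 50% alert threshold, and the sample volumes are assumptions chosen for illustration, not oleander's actual detection logic.

```python
from statistics import mean
from typing import Sequence


def pct_drop_vs_baseline(history: Sequence[int], current: int) -> float:
    """Percent drop of `current` against the mean of `history` (positive = drop)."""
    baseline = mean(history)
    if baseline == 0:
        return 0.0  # no baseline volume, so no meaningful drop to report
    return (baseline - current) * 100 / baseline


def should_alert(history: Sequence[int], current: int, threshold_pct: float = 50.0) -> bool:
    """Flag the run when the drop meets a (hypothetical) threshold."""
    return pct_drop_vs_baseline(history, current) >= threshold_pct


# Made-up volumes: seven daily runs at 100,000 rows, then a run at 6,000.
seven_day = [100_000] * 7
drop = pct_drop_vs_baseline(seven_day, 6_000)
print(f"rows_written down {drop:.0f}% vs 7-day avg")
```

A plain mean is the simplest baseline; a median or a seasonality-aware baseline would be less sensitive to a single unusual day.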

    Triage Context

    blast_radius: 4 downstream models + 2 dashboards
    linked_runs: 3 upstream runs in last 30 minutes
    ownership: Billing Data Platform
    recommended_next_step: rollback run i7k2n + replay window
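One plausible way to derive a blast radius like the one above is a breadth-first walk over lineage edges. The graph below is entirely hypothetical (the dataset and dashboard names are invented, and it is smaller than the alert's 4 models and 2 dashboards); it sketches the traversal only, not how oleander stores or queries lineage.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage edges: node -> direct downstream consumers.
LINEAGE: Dict[str, List[str]] = {
    "process_pending_invoices": ["invoices_clean"],
    "invoices_clean": ["revenue_daily", "ar_aging"],
    "revenue_daily": ["dashboard:finance_overview"],
    "ar_aging": ["dashboard:collections"],
}


def blast_radius(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable downstream of `start`, excluding `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guards against cycles and diamonds
                seen.add(child)
                queue.append(child)
    return seen - {start}


impacted = blast_radius(LINEAGE, "process_pending_invoices")
dashboards = sorted(n for n in impacted if n.startswith("dashboard:"))
print(len(impacted), "downstream assets, including", dashboards)
```

Tracking visited nodes keeps the walk linear in the number of edges even when several models feed the same dashboard.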
    [08] Insights

    The context graph for your data infrastructure

    Every run, commit, deployment and dataset is connected. Search it. Trace it. Understand it. Share it.

    @insights Analyze the last 30 days of finance.billing.process_pending_invoices. Investigate the 15x drop in output volume and map the downstream impact.

    Deployment & Execution

    activity: 28 days active across 45 runs
    outages: 3 major outages (#381, #394, #412)
    data: 12.5TB in; 2.1TB out with drift
    uptime: 99.2% uptime vs. 15x volume drop

    Code & Schema Evolution

    commits: 18 commits linked to active runs
    schema_changes: 4 schema migrations
    files_touched: 12 files touched in spark/jobs/finance/
    investigations: 7 automated investigations triggered