DataFlint  
Apache Spark simplified

End-to-end actionable observability platform, focused on Apache Spark.
DataFlint identifies and alerts on performance bottlenecks in real-time.

Mitigate your big data performance bottlenecks

With real-time insights across both compute and storage layers, DataFlint identifies inefficiencies and provides actionable solutions—helping you optimize Spark jobs that are slow, costly, or resource-intensive.

Storage layer insights

Get actionable alerts to fix storage bottlenecks in real time

  • Automatically detect when you're reading or writing small files, which can severely impact performance.
  • Iceberg support - Optimize storage by addressing inefficiencies like minimal record changes and excessive table file replacements.
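
To make the small-files problem concrete, here is a minimal sketch of how such a check could look. It is not DataFlint's implementation: the function names, the 8 MiB cutoff, and the 50% alert ratio are all hypothetical values chosen for illustration.

```python
from typing import Iterable

# Hypothetical cutoff: files under 8 MiB count as "small". Tune for your storage.
SMALL_FILE_THRESHOLD_BYTES = 8 * 1024 * 1024


def small_file_ratio(
    file_sizes_bytes: Iterable[int],
    threshold_bytes: int = SMALL_FILE_THRESHOLD_BYTES,
) -> float:
    """Return the fraction of files smaller than threshold_bytes (0.0 if no files)."""
    sizes = list(file_sizes_bytes)
    if not sizes:
        return 0.0
    small = sum(1 for size in sizes if size < threshold_bytes)
    return small / len(sizes)


def has_small_file_problem(
    file_sizes_bytes: Iterable[int],
    max_small_ratio: float = 0.5,  # hypothetical alert level
) -> bool:
    """Flag a read or write when more than max_small_ratio of its files are small."""
    return small_file_ratio(file_sizes_bytes) > max_small_ratio
```

The file sizes would come from whatever listing your storage layer exposes; the sketch only covers the arithmetic.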

Compute layer insights

Get actionable alerts to fix compute bottlenecks in real time

  • From single-node inefficiencies to cluster-wide insights, address key issues fast.
  • Resolve compute inefficiencies like partition skew, memory overuse, and wasted cores, improving resource utilization and query performance.
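
As an illustration of what partition skew means in practice, the sketch below compares the largest partition with the median one. This is an assumed, simplified metric rather than the one DataFlint uses, and `skew_ratio` is a hypothetical helper name.

```python
from statistics import median
from typing import Sequence


def skew_ratio(partition_sizes: Sequence[int]) -> float:
    """Ratio of the largest partition to the median partition size.

    A value of 1.0 means the largest partition is no bigger than the median.
    Larger values mean one task processes far more data than a typical task.
    """
    if not partition_sizes:
        raise ValueError("partition_sizes must not be empty")
    largest = max(partition_sizes)
    mid = median(partition_sizes)
    if mid == 0:
        # All-empty partitions are balanced; otherwise skew is unbounded.
        return float("inf") if largest > 0 else 1.0
    return largest / mid
```

A stage whose partitions hold 100, 100, 100, and 1000 records has a ratio of 10: one task does ten times the typical work while the other cores sit idle.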

Have a Spark Job to Optimize?
Start with Our Open Source Solution

DataFlint’s open-source tool enhances Apache Spark with a user-friendly interface that simplifies performance monitoring and debugging. By adding an intuitive tab to the Spark Web UI, it turns a complex interface into an easy-to-navigate, insightful tool, whether you're using Databricks, EMR, or Kubernetes.

Install with 2 lines of code

A DataFlint tab is added to the Spark UI; clicking it opens a real-time web app.

  • Supports installation with Scala, PySpark, and no-code.
  • Supports installation on Spark History Server.
  • Supports Databricks, AWS EMR, and Kubernetes.

from pyspark.sql import SparkSession

builder = (
    SparkSession.builder
    # ... your existing builder options
    # Download the DataFlint package when the session starts
    .config("spark.jars.packages", "io.dataflint:spark_2.12:0.2.3")
    # Register the plugin that adds the DataFlint tab to the Spark UI
    .config("spark.plugins", "io.dataflint.spark.SparkDataflintPlugin")
    # ... any further builder options
)
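
If several jobs need the same two settings, they can be kept in one place. The sketch below assumes a hypothetical helper named `with_dataflint`; it relies only on the builder's `config(key, value)` method and the two values shown above.

```python
# The two settings from the install snippet, kept in one place.
DATAFLINT_CONF = {
    "spark.jars.packages": "io.dataflint:spark_2.12:0.2.3",
    "spark.plugins": "io.dataflint.spark.SparkDataflintPlugin",
}


def with_dataflint(builder):
    """Apply the DataFlint settings to a SparkSession builder and return it.

    Note: this overwrites any existing value for the same keys. If you already
    set spark.jars.packages or spark.plugins, merge the values into one
    comma-separated string instead.
    """
    for key, value in DATAFLINT_CONF.items():
        builder = builder.config(key, value)
    return builder


# Usage (requires pyspark; "my-app" is a placeholder name):
#   from pyspark.sql import SparkSession
#   spark = with_dataflint(SparkSession.builder.appName("my-app")).getOrCreate()
```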

Access powerful features with DataFlint SaaS (in closed beta)

DataFlint Open Source gives you visibility into a single Spark job. With DataFlint SaaS, you get full observability for all your jobs across all your Spark applications, in one place.

Monitor your jobs

See all your application jobs in one place. Manage versions and see alerts.

Get Alerted

Alerts on performance issues and query failures

Resource Management

Tune your resource usage

Control all your applications

Manage all your Spark applications in one place