Data Cleaning

Cocoon uses LLMs to explore your tables, document their structure, and generate SQL for cleaning them. Its outputs are (1) SQL queries for cleaning, (2) a YAML file documenting the table, and (3) an HTML report summarizing the results.
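To make output (1) more concrete, here is a minimal, hypothetical sketch of the kind of cleaning SQL such a tool might generate. It runs against a toy table in Python's built-in `sqlite3`. The table name `raw`, its columns, and the query itself are invented for illustration and are not Cocoon's actual output.

```python
import sqlite3

# Hypothetical messy table (invented data, not one of the gallery datasets).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (name TEXT, abv TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO raw VALUES (?, ?, ?)",
    [
        ("  Pale Ale ", "0.05", " ca"),
        ("Stout", "N/A", "OR "),
        ("IPA", "0.07", "wa"),
    ],
)

# Illustrative cleaning query (an assumption, not Cocoon's real SQL):
# trim stray whitespace, upper-case state codes, and turn the sentinel
# string 'N/A' into NULL before casting ABV to a number.
CLEAN_SQL = """
CREATE TABLE cleaned AS
SELECT
    TRIM(name) AS name,
    CAST(NULLIF(TRIM(abv), 'N/A') AS REAL) AS abv,
    UPPER(TRIM(state)) AS state
FROM raw
"""
conn.execute(CLEAN_SQL)

rows = conn.execute("SELECT name, abv, state FROM cleaned ORDER BY name").fetchall()
for row in rows:
    print(row)
```

Keeping the cleaning logic in plain SQL means the same statements can be reviewed by a person and rerun inside the database instead of in a separate script.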

Get Started

Online Service

Quick Trial

Drop in your CSV file, and we will generate the profile in about 10 minutes.

Max 1 MB and 40 columns
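A file can be checked against these Quick Trial limits before uploading. The sketch below is a local pre-check written for illustration only. It is not part of Cocoon. It assumes that "1 MB" means 1,000,000 bytes and that columns are counted from the CSV header row. The function name `check_quick_trial_limits` is hypothetical.

```python
import csv
import os
import tempfile

# Assumptions (not stated by the service): 1 MB = 1,000,000 bytes,
# and the column count comes from the CSV header row.
MAX_BYTES = 1_000_000
MAX_COLUMNS = 40


def check_quick_trial_limits(path: str) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes; limit is {MAX_BYTES}")
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])  # empty file -> no columns
    if len(header) > MAX_COLUMNS:
        problems.append(f"file has {len(header)} columns; limit is {MAX_COLUMNS}")
    return problems


# Usage: one small file that fits and one file with 41 columns.
with tempfile.TemporaryDirectory() as tmp:
    small_path = os.path.join(tmp, "small.csv")
    with open(small_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["id", "name"], [1, "a"]])
    wide_path = os.path.join(tmp, "wide.csv")
    with open(wide_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([f"c{i}" for i in range(41)])
    small_result = check_quick_trial_limits(small_path)
    wide_result = check_quick_trial_limits(wide_path)

print(small_result)
print(wide_result)
```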

Open Source

Full Features

Cocoon is open source. Try it out in Google Colab.

This option requires an LLM API (e.g., GPT-4, Claude 3, Gemini Ultra, or a local LLM) but offers an interactive experience with no size or column limits. It also supports databases (e.g., Snowflake, DuckDB).

Need support or have questions? Contact Us

Gallery

More example results from Kaggle datasets

Hospital
deep clean

The table contains hospital data.

Beers
deep clean

The table is about different beers.

Flights
deep clean

The table is about flights, with details of their schedules and actual times.

Movies
deep clean

The table is about movies.

Rayyan
deep clean

The table contains bibliographic data for various scientific publications.

2012 SAT result
light clean

The table is about SAT exam results from schools in 2012.

Property Sales
light clean

The table contains 2020 property sales data.

Animal Shelter
light clean

The table contains data about cats in an animal shelter.

ATP Matches
light clean

The table contains data about ATP tennis matches from 2008.

Credit Data
light clean

The table contains credit data for individuals in Germany.

Korean Dramas
light clean

The table contains information about Korean drama TV series.

Customer Orders
light clean

The table contains order information for customers.

Patient Data
light clean

The table contains patient data.

Used Cars
light clean

The table contains data about used cars for sale.