Data profiling is the first step in understanding your tables and identifying any anomalies. Cocoon uses LLMs to bring semantic insights to your profile.
Cocoon is open-source. Try out Cocoon in Google Colab.
This requires access to an LLM API (e.g., GPT-4, Claude-3, Gemini-Ultra, or your local LLMs) but offers an interactive experience with no size or column limitations. It also supports databases (e.g., Snowflake, DuckDB).
Need support or have questions? Contact us.
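For readers new to profiling, the sketch below shows the kind of purely statistical column summary that a semantic profile builds on. It is not Cocoon's API or method: `basic_profile` and the sample rows are hypothetical, written only with the Python standard library, and the column names are assumptions loosely modeled on an air quality table.

```python
from collections import Counter
from typing import Any


def basic_profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column: missing count, distinct count, most common value.

    Hypothetical helper for illustration; column names are taken from the
    first row, and None is treated as a missing value.
    """
    columns = list(rows[0].keys()) if rows else []
    profile: dict[str, dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        counts = Counter(present)
        profile[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return profile


# Made-up sample rows, for illustration only.
sample_rows = [
    {"city": "Paris", "country": "France", "aqi": 42.0},
    {"city": "Lima", "country": "Peru", "aqi": 57.0},
    {"city": None, "country": "Peru", "aqi": None},
]

profile = basic_profile(sample_rows)
for column, stats in profile.items():
    print(column, stats)
```

Counts like these flag missing or low-cardinality columns, but they say nothing about what a column means; that interpretive layer is where the LLM-based semantic insights described above come in.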
More example profiles from Kaggle datasets
The table lists air quality information for different cities in various countries.
Table Profiling is based on the following research paper:
@inproceedings{huang2024cocoon,
title={Cocoon: Semantic Table Profiling Using Large Language Models},
author={Huang, Zezhou and Wu, Eugene},
booktitle={Proceedings of the Workshop on Human-In-the-Loop Data Analytics},
year={2024}
}