Table Standardization

When you standardize column values against a given vocabulary (e.g., for a fuzzy join during enrichment), perfect matches are rare. Cocoon uses LLMs to interpret the nuanced differences between values and recommend matches.
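As a rough illustration of the idea, and not Cocoon's actual implementation or API, the sketch below shortlists vocabulary terms by string similarity and then hands each shortlist to a pluggable chooser. The names `top_candidates`, `standardize`, and `pick_first` are hypothetical. In a real pipeline the chooser would be an LLM call; here `pick_first` is a stand-in that accepts the top-ranked candidate.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Sequence


def top_candidates(value: str, vocabulary: Sequence[str], k: int = 3) -> List[str]:
    """Rank vocabulary terms by string similarity to `value`; keep the best k."""
    def score(term: str) -> float:
        return SequenceMatcher(None, value.lower(), term.lower()).ratio()
    return sorted(vocabulary, key=score, reverse=True)[:k]


def standardize(
    values: Sequence[str],
    vocabulary: Sequence[str],
    choose: Callable[[str, List[str]], Optional[str]],
    k: int = 3,
) -> Dict[str, Optional[str]]:
    """Map each distinct value to a vocabulary term (or None if no match)."""
    mapping: Dict[str, Optional[str]] = {}
    for value in dict.fromkeys(values):  # distinct values, original order
        if value in vocabulary:
            mapping[value] = value  # exact match, nothing to decide
        else:
            # Hypothetical hook: an LLM would pick among the shortlist here.
            mapping[value] = choose(value, top_candidates(value, vocabulary, k))
    return mapping


def pick_first(value: str, candidates: List[str]) -> Optional[str]:
    """Stand-in for the LLM step (assumption): accept the top candidate."""
    return candidates[0] if candidates else None


vocabulary = ["United States", "United Kingdom", "Germany"]
column = ["United States", "Untied States", "Germny", "United States"]

mapping = standardize(column, vocabulary, pick_first)
standardized_column = [mapping[v] for v in column]
```

Resolving each distinct value once and then applying the mapping to the whole column keeps the number of chooser calls proportional to the number of distinct values, which matters when the chooser is an LLM request.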

Get Started

Cocoon is open-source. Try out Cocoon in Google Colab.

Cocoon connects to your data warehouse (e.g., Snowflake, DuckDB) and uses LLMs (e.g., GPT-4, Claude-3, Gemini-Ultra, or your local LLMs) to standardize tables.

Visit GitHub or Google Colab.

Need support? Contact me at zh2408@columbia.edu

Research

Table Standardization is based on the following research paper:

Disambiguate Entity Matching using Large Language Models through Relation Discovery

@inproceedings{huang2024disambiguate,
    title={Disambiguate Entity Matching using Large Language Models through Relation Discovery},
    author={Huang, Zezhou},
    booktitle={Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI},
    pages={36--39},
    year={2024}
}