When standardizing column values to a given vocabulary (e.g., for a fuzzy join during enrichment), perfect matches are rare. Cocoon uses LLMs to interpret nuanced differences and recommend matches.
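To make the matching problem concrete, here is a minimal sketch in Python. It is not Cocoon's API: the function names `split_exact_and_ambiguous` and `build_review_prompt`, the sample vocabulary, and the prompt wording are all assumptions made for illustration. It uses the standard library's `difflib` to shortlist candidates by string similarity, then prepares the kind of question an LLM could be asked for the values that remain ambiguous. No LLM is called here.

```python
from difflib import get_close_matches


def split_exact_and_ambiguous(values, vocabulary, n=3, cutoff=0.6):
    """Separate values that match the vocabulary exactly from those needing review.

    Hypothetical helper for illustration; not part of Cocoon.
    """
    # Case-insensitive lookup from normalized term to canonical term.
    lookup = {term.lower(): term for term in vocabulary}
    exact, ambiguous = {}, {}
    for value in values:
        key = value.strip().lower()
        if key in lookup:
            exact[value] = lookup[key]
        else:
            # Shortlist by string similarity; the list may be empty.
            close = get_close_matches(key, list(lookup), n=n, cutoff=cutoff)
            ambiguous[value] = [lookup[c] for c in close]
    return exact, ambiguous


def build_review_prompt(value, candidates, vocabulary):
    """Build a question for an LLM (assumed wording, purely illustrative)."""
    # Fall back to the full vocabulary when string similarity finds nothing.
    options = candidates or list(vocabulary)
    lines = [f'Which vocabulary term, if any, does "{value}" refer to?']
    lines += [f"- {option}" for option in options]
    lines.append("Answer with one term from the list, or NONE.")
    return "\n".join(lines)


vocabulary = ["United States", "United Kingdom", "Germany"]
values = ["united states", "Untied States", "Deutschland"]

exact, ambiguous = split_exact_and_ambiguous(values, vocabulary)
for value, candidates in ambiguous.items():
    print(build_review_prompt(value, candidates, vocabulary))
    print()
```

In this sample, "Untied States" is a typo that string similarity can shortlist, while "Deutschland" shares too few characters with "Germany" to be shortlisted at all. The second case is the kind of nuanced difference where interpretation by an LLM is useful.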
Cocoon is open-source. Try out Cocoon in Google Colab.
Cocoon connects to your data warehouses (e.g., Snowflake, DuckDB) and uses LLMs (e.g., GPT-4, Claude-3, Gemini-Ultra, or your local LLMs) to standardize tables.
Need support? Contact me at zh2408@columbia.edu
Table standardization is based on the following research paper:
@inproceedings{huang2024disambiguate,
title={Disambiguate Entity Matching using Large Language Models through Relation Discovery},
author={Huang, Zezhou},
booktitle={Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI},
pages={36--39},
year={2024}
}