Before Cocoon
GenAI is overwhelmed by the messy source data 🤯
It gets confused and starts making stuff up 🤪
You
🙂 What's the active target keyword that drives the most clicks?
GenAI
😵 Current RAG: Reading all the source data schema from databases... phew, it's a lot!
CREATE TABLE 'advertised_product_report_data' (
    ad_group_id BIGINT,
    ad_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    __advertised_asin DOUBLE,
    __advertised_sku DOUBLE
);

CREATE TABLE 'ad_group_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    default_bid DOUBLE,
    name VARCHAR,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'ad_group_level_report_data' (
    ad_group_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_bidding_strategy VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT
);

CREATE TABLE 'campaign_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    bidding_strategy VARCHAR,
    creation_date VARCHAR,
    budget BIGINT,
    end_date DOUBLE,
    name VARCHAR,
    portfolio_id BIGINT,
    profile_id BIGINT,
    serving_status VARCHAR,
    start_date VARCHAR,
    state VARCHAR,
    targeting_type VARCHAR,
    budget_type VARCHAR,
    effective_budget DOUBLE
);

CREATE TABLE 'campaign_level_report_data' (
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_applicable_budget_rule_id DOUBLE,
    campaign_applicable_budget_rule_name DOUBLE,
    campaign_bidding_strategy VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions DOUBLE,
    campaign_rule_based_budget_amount DOUBLE
);

CREATE TABLE 'keyword_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    bid DOUBLE,
    campaign_id BIGINT,
    creation_date VARCHAR,
    keyword_text VARCHAR,
    match_type VARCHAR,
    native_language_keyword DOUBLE,
    serving_status VARCHAR,
    state VARCHAR,
    native_language_locale DOUBLE
);

CREATE TABLE 'portfolio_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    budget_amount BIGINT,
    budget_currency_code VARCHAR,
    budget_end_date VARCHAR,
    budget_policy VARCHAR,
    budget_start_date VARCHAR,
    creation_date VARCHAR,
    in_budget BOOLEAN,
    name VARCHAR,
    profile_id BIGINT,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'product_ad_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    asin VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    serving_status VARCHAR,
    sku DOUBLE,
    state VARCHAR
);

CREATE TABLE 'profile_data' (
    id BIGINT,
    _fivetran_deleted BOOLEAN,
    _fivetran_synced VARCHAR,
    account_id VARCHAR,
    account_marketplace_string_id VARCHAR,
    account_name VARCHAR,
    account_sub_type VARCHAR,
    account_type VARCHAR,
    account_valid_payment_method BOOLEAN,
    country_code VARCHAR,
    currency_code VARCHAR,
    daily_budget BIGINT,
    timezone VARCHAR
);

CREATE TABLE 'search_term_ad_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    search_term VARCHAR,
    targeting VARCHAR
);

CREATE TABLE 'targeting_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    ad_keyword_status VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    keyword_type VARCHAR,
    match_type VARCHAR,
    targeting VARCHAR
);
GenAI
🤨 I've done my best with the SQL... fingers crossed it's right!
SELECT 
    khd.keyword_text,
    khd.match_type,
    SUM(tkrd.clicks) as total_clicks
FROM 
    keyword_history_data khd
JOIN 
    targeting_keyword_report_data tkrd ON khd.id = tkrd.keyword_id
WHERE 
    khd.state = 'enabled' -- Assuming 'enabled' represents active keywords
    AND khd.serving_status = 'SERVING' -- Ensure the keyword is currently serving
GROUP BY 
    khd.keyword_text,
    khd.match_type
ORDER BY 
    total_clicks DESC
LIMIT 1
GenAI
🤨 SQL ran successfully! The result is... empty??
You
😥 Oops, looks like you're making up those joins and predicates!
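One way to catch a made-up predicate is to look at the values a filter column actually holds before hard-coding a literal. The sketch below is only an illustration: it uses an in-memory SQLite stand-in for the source database, and the rows and status values are hypothetical toy data (the transcript does not show the real rows or the real cause of the empty result). Only the table and column names come from the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy rows; only the table and column names come from the schema above.
conn.executescript("""
CREATE TABLE keyword_history_data (
    id BIGINT, keyword_text VARCHAR, state VARCHAR, serving_status VARCHAR
);
INSERT INTO keyword_history_data VALUES
    (1, 'toy keyword a', 'ENABLED', 'TOY_STATUS_LIVE'),
    (2, 'toy keyword b', 'PAUSED',  'TOY_STATUS_PAUSED');
""")


def distinct_values(conn, table, column):
    """Return the sorted distinct non-null values stored in one column."""
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
    ).fetchall()
    return sorted(row[0] for row in rows)


# The literals the "Before" query guessed.
guessed = {"state": "enabled", "serving_status": "SERVING"}
for column, guess in guessed.items():
    actual = distinct_values(conn, "keyword_history_data", column)
    print(f"{column}: guessed {guess!r}, stored values {actual}, match={guess in actual}")
```

With the toy rows, neither guessed literal appears among the stored values, so a query filtering on them would return nothing even though it runs without error.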
After Cocoon
Cocoon prepares your data, making it RAG-ready ✨
GenAI finds just what it needs, no more wild guesses! 🎯
You
🙂 What's the active target keyword that drives the most clicks?
GenAI
😎 RAG from Cocoon: Putting the question in context using Cocoon's ER story... nice!
- relation_desc: Marketers create Account Profiles and organize Portfolios within
    them.
  relation_name: ProfilePortfolioManagement
- relation_desc: Marketers assign Campaigns to Portfolios within Account Profiles.
  relation_name: CampaignPortfolioProfileHierarchy
- relation_desc: Advertisers create Ad Groups within each Campaign.
  relation_name: CampaignAdGroups
- relation_desc: Advertisers add Keywords to Ad Groups in Campaigns.
  relation_name: CampaignAdGroupKeywordHierarchy
- relation_desc: Advertisers create Product Ads within Ad Groups and Campaigns.
  relation_name: AdCampaignHierarchy
- relation_desc: System tracks daily performance of targeted Keywords.
  relation_name: TargetedKeywordDailyPerformance
- relation_desc: System records daily performance data for search terms.
  relation_name: SearchTermAdKeywordPerformance
- relation_desc: System measures daily performance of Ads, Groups, and Campaigns.
  relation_name: CampaignAdPerformanceMetrics
🤓 We've found the related relations: CampaignAdGroupKeywordHierarchy, TargetedKeywordDailyPerformance, SearchTermAdKeywordPerformance, CampaignAdPerformanceMetrics

Story behind the relationships (only for those connecting >= 2 entities)

  1. [ProfilePortfolioManagement]: Marketers create Account Profiles and organize Portfolios within them.
  2. [CampaignPortfolioProfileHierarchy]: Marketers assign Campaigns to Portfolios within Account Profiles.
  3. [CampaignAdGroups]: Advertisers create Ad Groups within each Campaign.
  4. [CampaignAdGroupKeywordHierarchy]: Advertisers add Keywords to Ad Groups in Campaigns.
  5. [AdCampaignHierarchy]: Advertisers create Product Ads within Ad Groups and Campaigns.
  6. [TargetedKeywordDailyPerformance]: System tracks daily performance of targeted Keywords.
  7. [SearchTermAdKeywordPerformance]: System records daily performance data for search terms.
  8. [CampaignAdPerformanceMetrics]: System measures daily performance of Ads, Groups, and Campaigns.
[Diagram: entity-relationship graph linking each of the eight relationships above to the entities it involves (Account Profiles, Portfolios, Campaigns, Ad Groups, Keywords, Product Ads).]
Reasoning: The question "What's the active target keyword that drives the most clicks?" is asking about the performance of keywords in terms of generating clicks. In the story, it is asking for information about the targeted keywords that are currently active and their performance metrics, specifically focusing on which one generates the highest number of clicks. This involves analyzing the performance data of keywords used in advertising campaigns.
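The ER story can also be held as a plain mapping from each relationship to the entities it connects. The sketch below is not Cocoon's retrieval logic (in the transcript the model picks the relations); it only shows a naive lookup of the relationships touching one entity. The mapping is transcribed from the relationship graph in this turn, and the names `ER_STORY` and `relations_for` are hypothetical.

```python
# Relationship -> entities, transcribed from the relationship graph in this turn.
ER_STORY = {
    "ProfilePortfolioManagement": {"Portfolios", "Account Profiles"},
    "CampaignPortfolioProfileHierarchy": {"Campaigns", "Portfolios", "Account Profiles"},
    "CampaignAdGroups": {"Ad Groups", "Campaigns"},
    "CampaignAdGroupKeywordHierarchy": {"Ad Groups", "Campaigns", "Keywords"},
    "AdCampaignHierarchy": {"Ad Groups", "Campaigns", "Product Ads"},
    "TargetedKeywordDailyPerformance": {"Ad Groups", "Campaigns", "Keywords"},
    "SearchTermAdKeywordPerformance": {"Ad Groups", "Campaigns", "Keywords"},
    "CampaignAdPerformanceMetrics": {"Ad Groups", "Campaigns", "Product Ads"},
}


def relations_for(entity):
    """Return, in alphabetical order, the relationships that involve the given entity."""
    return sorted(name for name, entities in ER_STORY.items() if entity in entities)


print(relations_for("Keywords"))
```

This naive entity lookup returns three relationships for "Keywords"; the selection in the transcript additionally includes CampaignAdPerformanceMetrics, so the model's choice is evidently broader than a direct entity match.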
GenAI
😎 RAG from Cocoon: Checking out all the related tables Cocoon set up... cool stuff!
- table_name: stg_targeting_keyword_report_data
  table_desc: The table is about targeting keyword report data for a specific ad campaign.
    It includes daily performance metrics for a single keyword "x wing parts" in an
    ad group. The data shows clicks, cost, impressions, and other campaign settings
    across different dates. The campaign has a daily budget of $2000 USD and uses
    exact match targeting for the keyword with a $6 bid.
  attributes: '[''ad_group_id'', ''campaign_id'', ''keyword_id'', ''keyword_status'',
    ''daily_budget_amount'', ''budget_currency'', ''budget_type'', ''clicks'', ''cost'',
    ''impressions'', ''keyword_bid'', ''keyword_match_type'', ''keyword_match_specificity'',
    ''targeted_keyword'', ''date_'']'
- table_name: stg_campaign_level_report_data
  table_desc: The table is about advertising campaign performance. It contains daily
    data for each campaign, including campaign ID, date, budget details, bidding strategy,
    and performance metrics. Key fields are campaign_id, date_, campaign_budget_amount,
    campaign_budget_type, clicks, cost, and impressions. The data appears to be for
    a single campaign (ID 2187) over multiple dates, with consistent budget settings
    but no recorded performance (zero clicks, cost, and impressions).
  attributes: '[''campaign_id'', ''bidding_strategy'', ''daily_budget_amount'', ''budget_currency'',
    ''budget_type'', ''clicks'', ''cost'', ''budget_rule_id'', ''budget_rule_name'',
    ''date_'', ''impressions'']'
- table_name: stg_search_term_ad_keyword_report_data
  table_desc: The table is about search term ad keyword report data. It contains information
    on ad campaigns, including ad group ID, campaign ID, date, keyword ID, budget
    details, and performance metrics. The data includes clicks, cost, impressions,
    keyword bid, search term, and targeting. Each row represents a daily report for
    a specific keyword in an ad campaign.
  attributes: '[''ad_group_id'', ''campaign_id'', ''keyword_id'', ''daily_budget_amount'',
    ''budget_currency'', ''budget_type'', ''clicks'', ''cost'', ''impressions'', ''keyword_bid'',
    ''search_term'', ''targeting_criteria'', ''report_date'']'
- table_name: stg_ad_group_level_report_data
  table_desc: The table is about ad group level performance data. It includes details
    such as ad group ID, date, campaign bidding strategy, clicks, cost, and impressions.
    Each row represents daily performance metrics for a specific ad group. The data
    spans multiple dates and shows various performance indicators for advertising
    campaigns.
  attributes: '[''ad_group_id'', ''bidding_strategy'', ''clicks'', ''cost'', ''impressions'',
    ''date_'']'
- table_name: stg_advertised_product_report_data
  table_desc: The table is about advertised product report data. It contains details
    of ad campaigns including ad group ID, ad ID, campaign ID, date, budget amount,
    budget currency, budget type, clicks, cost, impressions, and advertised ASIN/SKU.
    The data shows daily performance metrics for different ads across various campaigns,
    tracking their budget, spend, and engagement.
  attributes: '[''ad_group_id'', ''ad_id'', ''campaign_id'', ''daily_budget_amount'',
    ''budget_currency'', ''budget_type'', ''clicks'', ''ad_spend'', ''impressions'',
    ''report_date'']'
- table_name: snapshot_ad_group_history_data
  table_desc: The table is about current ad group data. It tracks the most recent
    version of each ad group. The table includes details such as the ad group's ID,
    campaign ID, name, default bid, serving status, and state. Each row represents
    the latest state of an ad group. The creation_date shows when the ad group was
    initially created. This table provides a current snapshot of ad groups without
    historical changes.
  attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
    ''serving_status'', ''ad_group_state'', ''creation_date'']'
  history_table:
    attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
      ''serving_status'', ''ad_group_state'', ''creation_date'', ''last_updated_date'']'
    table_name: stg_ad_group_history_data
- table_name: snapshot_product_ad_history_data
  table_desc: The table is about current product ad status. It tracks the most recent
    version of each ad's information. Each row represents the latest state of a unique
    ad, identified by its ad_id. The table includes details like ad group ID, ASIN,
    campaign ID, current serving status, and state. It provides a snapshot of the
    current ad configurations without historical changes.
  attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
    ''ad_state'', ''creation_timestamp'', ''product_sku'']'
  history_table:
    attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
      ''ad_state'', ''creation_timestamp'', ''last_updated_timestamp'', ''product_sku'']'
    table_name: stg_product_ad_history_data
- table_name: snapshot_campaign_history_data
  table_desc: The table is about current campaign data. It tracks the most recent
    version of each campaign, including its bidding strategy, name, portfolio, profile,
    status, state, targeting type, budget details, and relevant dates. Each row represents
    the latest snapshot of a unique campaign, providing an up-to-date view of campaign
    configurations without historical versions.
  attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
    ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'', ''budget_type'',
    ''creation_datetime'', ''daily_budget'', ''end_date'', ''start_date'']'
  history_table:
    attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
      ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'',
      ''budget_type'', ''creation_datetime'', ''daily_budget'', ''end_date'', ''last_updated_datetime'',
      ''start_date'']'
    table_name: stg_campaign_history_data
- table_name: snapshot_keyword_history_data
  table_desc: The table is about current keyword data for advertising campaigns. It
    tracks the most recent version of each keyword, including its ID, ad group ID,
    campaign ID, bid amount, text, match type, and serving status. Each row represents
    the latest state of a unique keyword. The table provides up-to-date information
    on keyword settings and performance without historical data.
  attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
    ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
    ''native_language_keyword'', ''native_language_locale'']'
  history_table:
    attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
      ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
      ''last_updated_timestamp'', ''native_language_keyword'', ''native_language_locale'']'
    table_name: stg_keyword_history_data
🤓 We've identified the related tables: stg_targeting_keyword_report_data
[Diagram: foreign-key graph of the Cocoon tables. For example, stg_targeting_keyword_report_data references snapshot_campaign_history_data, snapshot_keyword_history_data, and snapshot_ad_group_history_data.]
🧠 Explanation: The available data is sufficient to answer the business question "What's the active target keyword that drives the most clicks?". We can use the 'stg_targeting_keyword_report_data' table as our primary source, which contains information about keywords, their status, and click data. We'll need to filter for active keywords and aggregate the clicks for each keyword. Here's how we can approach this:

  1. Use 'stg_targeting_keyword_report_data' table to get keyword data and clicks.
  2. Filter for active keywords using the 'keyword_status' column.
  3. Group by 'targeted_keyword' to aggregate clicks for each keyword.
  4. Order by total clicks in descending order to find the keyword with the most clicks.
  5. Limit the result to 1 to get the top keyword.

We don't need to join with other tables as all required information is present in the 'stg_targeting_keyword_report_data' table.
💡 Instruction to write SQL:

  1. Start with the 'stg_targeting_keyword_report_data' table.
  2. Filter rows where 'keyword_status' indicates an active status (exact value depends on how active status is represented in the data).
  3. Group by 'targeted_keyword'.
  4. Sum the 'clicks' for each keyword.
  5. Order the results by the sum of clicks in descending order.
  6. Limit the result to 1 to get the top keyword.

The SQL query would look something like this:
SELECT
    targeted_keyword,
    SUM(clicks) as total_clicks
FROM
    stg_targeting_keyword_report_data
WHERE
    keyword_status = 'active' -- replace 'active' with the actual value representing active status
GROUP BY
    targeted_keyword
ORDER BY
    total_clicks DESC
LIMIT 1;
GenAI
😎 RAG from Cocoon: Looking at how tables connect, thanks to Cocoon's key info... clever!
stg_targeting_keyword_report_data:
  foreign_keys:
  - column: ad_group_id
    reference:
      column: ad_group_id
      table_name: snapshot_ad_group_history_data
  - column: campaign_id
    reference:
      column: campaign_id
      table_name: snapshot_campaign_history_data
  - column: keyword_id
    reference:
      column: keyword_id
      table_name: snapshot_keyword_history_data
  primary_key: null
🤓 We've planned the join: To execute the SQL, no joins are needed. The query can be completed using only the 'stg_targeting_keyword_report_data' table.
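For questions that do need more than one table, the key info above is enough to derive JOIN clauses mechanically instead of guessing them. A minimal sketch, assuming the foreign keys are loaded into a Python dict; the names `FOREIGN_KEYS` and `join_clauses` are hypothetical and this is not Cocoon's own join planner.

```python
# Foreign keys transcribed from the key info above.
FOREIGN_KEYS = {
    "stg_targeting_keyword_report_data": [
        {"column": "ad_group_id", "ref_table": "snapshot_ad_group_history_data", "ref_column": "ad_group_id"},
        {"column": "campaign_id", "ref_table": "snapshot_campaign_history_data", "ref_column": "campaign_id"},
        {"column": "keyword_id", "ref_table": "snapshot_keyword_history_data", "ref_column": "keyword_id"},
    ],
}


def join_clauses(table, needed_tables):
    """Build one JOIN clause per declared foreign key that points at a needed table."""
    clauses = []
    for fk in FOREIGN_KEYS.get(table, []):
        if fk["ref_table"] in needed_tables:
            clauses.append(
                f'JOIN {fk["ref_table"]} '
                f'ON {table}.{fk["column"]} = {fk["ref_table"]}.{fk["ref_column"]}'
            )
    return clauses


# The question in this chat needs no other table, so no joins are produced.
print(join_clauses("stg_targeting_keyword_report_data", set()))
# A question that also needed keyword settings would join on the declared key.
print(join_clauses("stg_targeting_keyword_report_data", {"snapshot_keyword_history_data"}))
```

Because every clause comes from a declared key, a join on an undeclared column pair cannot be produced by this helper.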
GenAI
😎 RAG from Cocoon: Diving into column details Cocoon prepared... getting a bit technical here!
stg_targeting_keyword_report_data:
- name: keyword_status
  description: Current status of the keyword in the ad group
  domain: '[''ENABLED'']'
- name: targeted_keyword
  description: The targeted keyword phrase
- name: clicks
  description: Number of clicks received on the ad
🤔 Reasoning for column selection: Based on the SQL approach and join description provided, we only need to use the 'stg_targeting_keyword_report_data' table. The necessary columns are:

  1. 'keyword_status': This is needed for filtering active keywords.
  2. 'targeted_keyword': This is used for grouping and in the final output.
  3. 'clicks': This is used for summing and ordering the results.

No joins are required as all the necessary data is in a single table. The query filters for active keywords, groups by the targeted keyword, sums the clicks, and then orders and limits the results to get the top keyword by clicks.
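The column domain is what turns the placeholder 'active' into a literal that exists in the data. A minimal sketch of that substitution, assuming the domain is available as a Python list; the synonym list and the names `COLUMN_DOMAINS`, `ACTIVE_SYNONYMS`, and `pick_filter_value` are hypothetical, and in the transcript the model does this step itself.

```python
# Column domain transcribed from the column details above.
COLUMN_DOMAINS = {"keyword_status": ["ENABLED"]}

# Assumption: words a user might use to mean an "active" keyword.
ACTIVE_SYNONYMS = ("active", "enabled")


def pick_filter_value(domain, synonyms=ACTIVE_SYNONYMS):
    """Return the first domain value that matches a synonym case-insensitively, else None."""
    for value in domain:
        if value.lower() in synonyms:
            return value
    return None


status_value = pick_filter_value(COLUMN_DOMAINS["keyword_status"])

# Parameterized query: the literal is bound at execution time rather than typed by hand.
query = (
    "SELECT targeted_keyword, SUM(clicks) AS total_clicks "
    "FROM stg_targeting_keyword_report_data "
    "WHERE keyword_status = ? "
    "GROUP BY targeted_keyword ORDER BY total_clicks DESC LIMIT 1"
)
params = (status_value,)
print(query, params)
```

Returning None when nothing in the domain matches gives the caller a chance to stop and ask, rather than filtering on a value that is not in the data.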
GenAI
😊 We have written the SQL:
SELECT 
    targeted_keyword, 
    SUM(clicks) as total_clicks 
FROM 
    stg_targeting_keyword_report_data 
WHERE 
    keyword_status = 'ENABLED'
GROUP BY 
    targeted_keyword 
ORDER BY 
    total_clicks DESC 
LIMIT 1;
GenAI
✅ SQL ran successfully! Here is a sample of the result (first 5 rows):
   targeted_keyword  total_clicks
0      x wing parts        1007.0
You
🫡 Nailed it! Thanks a bunch!
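To try the final query without access to the warehouse, it can be run against a small in-memory SQLite table. This is only a sketch: the rows below are hypothetical toy data (including a 'PAUSED' status that the real column domain does not contain), so the click total will not match the 1007 reported in the chat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stg_targeting_keyword_report_data ("
    "targeted_keyword TEXT, keyword_status TEXT, clicks INTEGER, date_ TEXT)"
)
# Hypothetical toy rows, one per keyword per day.
conn.executemany(
    "INSERT INTO stg_targeting_keyword_report_data VALUES (?, ?, ?, ?)",
    [
        ("x wing parts", "ENABLED", 3, "2024-01-01"),
        ("x wing parts", "ENABLED", 4, "2024-01-02"),
        ("toy keyword b", "ENABLED", 5, "2024-01-01"),
        ("toy keyword c", "PAUSED", 100, "2024-01-01"),
    ],
)

# The final query from the chat, unchanged.
FINAL_SQL = """
SELECT
    targeted_keyword,
    SUM(clicks) as total_clicks
FROM
    stg_targeting_keyword_report_data
WHERE
    keyword_status = 'ENABLED'
GROUP BY
    targeted_keyword
ORDER BY
    total_clicks DESC
LIMIT 1;
"""

top = conn.execute(FINAL_SQL).fetchone()
print(top)
```

The paused toy keyword has the most clicks but is filtered out, which is the behavior the 'active' part of the question asks for.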