Before Cocoon
GenAI is overwhelmed by the messy source data 🤯
It gets confused and starts making stuff up 🤪
You
🙂 Which ad groups have the highest number of targeted keywords?
GenAI
😵 Current RAG: Reading all the source data schema from databases... phew, it's a lot!
CREATE TABLE 'advertised_product_report_data' (
    ad_group_id BIGINT,
    ad_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    __advertised_asin DOUBLE,
    __advertised_sku DOUBLE
);

CREATE TABLE 'ad_group_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    default_bid DOUBLE,
    name VARCHAR,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'ad_group_level_report_data' (
    ad_group_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_bidding_strategy VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT
);

CREATE TABLE 'campaign_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    bidding_strategy VARCHAR,
    creation_date VARCHAR,
    budget BIGINT,
    end_date DOUBLE,
    name VARCHAR,
    portfolio_id BIGINT,
    profile_id BIGINT,
    serving_status VARCHAR,
    start_date VARCHAR,
    state VARCHAR,
    targeting_type VARCHAR,
    budget_type VARCHAR,
    effective_budget DOUBLE
);

CREATE TABLE 'campaign_level_report_data' (
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_applicable_budget_rule_id DOUBLE,
    campaign_applicable_budget_rule_name DOUBLE,
    campaign_bidding_strategy VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions DOUBLE,
    campaign_rule_based_budget_amount DOUBLE
);

CREATE TABLE 'keyword_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    bid DOUBLE,
    campaign_id BIGINT,
    creation_date VARCHAR,
    keyword_text VARCHAR,
    match_type VARCHAR,
    native_language_keyword DOUBLE,
    serving_status VARCHAR,
    state VARCHAR,
    native_language_locale DOUBLE
);

CREATE TABLE 'portfolio_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    budget_amount BIGINT,
    budget_currency_code VARCHAR,
    budget_end_date VARCHAR,
    budget_policy VARCHAR,
    budget_start_date VARCHAR,
    creation_date VARCHAR,
    in_budget BOOLEAN,
    name VARCHAR,
    profile_id BIGINT,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'product_ad_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    asin VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    serving_status VARCHAR,
    sku DOUBLE,
    state VARCHAR
);

CREATE TABLE 'profile_data' (
    id BIGINT,
    _fivetran_deleted BOOLEAN,
    _fivetran_synced VARCHAR,
    account_id VARCHAR,
    account_marketplace_string_id VARCHAR,
    account_name VARCHAR,
    account_sub_type VARCHAR,
    account_type VARCHAR,
    account_valid_payment_method BOOLEAN,
    country_code VARCHAR,
    currency_code VARCHAR,
    daily_budget BIGINT,
    timezone VARCHAR
);

CREATE TABLE 'search_term_ad_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    search_term VARCHAR,
    targeting VARCHAR
);

CREATE TABLE 'targeting_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    ad_keyword_status VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    keyword_type VARCHAR,
    match_type VARCHAR,
    targeting VARCHAR
);
GenAI
🤨 I've done my best with the SQL... fingers crossed it's right!
SELECT 
    ag.id AS ad_group_id,
    ag.name AS ad_group_name,
    c.name AS campaign_name,
    COUNT(DISTINCT k.id) AS targeted_keyword_count
FROM 
    ad_group_history_data ag
JOIN 
    campaign_history_data c ON ag.campaign_id = c.id
LEFT JOIN 
    keyword_history_data k ON ag.id = k.ad_group_id
WHERE 
    ag.state = 'enabled' AND
    k.state = 'enabled' AND
    k.serving_status = 'ELIGIBLE'
GROUP BY 
    ag.id, ag.name, c.name
ORDER BY 
    targeted_keyword_count DESC
LIMIT 10
GenAI
🤨 SQL ran successfully! The result is... empty??
You
😥 Oops, looks like you're making up those joins and predicates!
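The transcript does not say why the result came back empty. One plausible cause can be sketched with Python's built-in sqlite3 module and made-up rows: a guessed literal such as `state = 'enabled'` matches nothing if the stored values are cased differently, and because the filter on `k.state` sits in the WHERE clause, it also discards the NULL rows the LEFT JOIN would otherwise keep. The row values, the 'ENABLED' casing, and the trimmed-down column lists are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ad_group_history_data (id INTEGER, name TEXT, state TEXT);
CREATE TABLE keyword_history_data (id INTEGER, ad_group_id INTEGER, state TEXT);
-- Hypothetical rows: the model never saw the real values of 'state'.
INSERT INTO ad_group_history_data VALUES (1, 'Group A', 'ENABLED'), (2, 'Group B', 'ENABLED');
INSERT INTO keyword_history_data VALUES (10, 1, 'ENABLED'), (11, 1, 'ENABLED');
""")

# Same shape as the guessed query: LEFT JOIN plus invented predicates.
guessed = """
SELECT ag.id, COUNT(DISTINCT k.id) AS n
FROM ad_group_history_data ag
LEFT JOIN keyword_history_data k ON ag.id = k.ad_group_id
WHERE ag.state = 'enabled' AND k.state = 'enabled'
GROUP BY ag.id
"""

# Same join without the guessed predicates.
no_filter = """
SELECT ag.id, COUNT(DISTINCT k.id) AS n
FROM ad_group_history_data ag
LEFT JOIN keyword_history_data k ON ag.id = k.ad_group_id
GROUP BY ag.id
ORDER BY ag.id
"""

guessed_rows = con.execute(guessed).fetchall()
unfiltered_rows = con.execute(no_filter).fetchall()
print(guessed_rows)     # no rows survive the guessed predicates
print(unfiltered_rows)  # every ad group appears, including one with zero keywords
```

The query is syntactically valid in both cases, which is why "it runs" says little about whether the joins and predicates match the data.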
After Cocoon
Cocoon prepares your data, making it RAG-ready ✨
GenAI finds just what it needs, no more wild guesses! 🎯
You
🙂 Which ad groups have the highest number of targeted keywords?
GenAI
😎 RAG from Cocoon: Putting the question in context using Cocoon's ER story... nice!
- relation_desc: Marketers create Account Profiles and organize Portfolios within
    them.
  relation_name: ProfilePortfolioManagement
- relation_desc: Marketers assign Campaigns to Portfolios within Account Profiles.
  relation_name: CampaignPortfolioProfileHierarchy
- relation_desc: Advertisers create Ad Groups within each Campaign.
  relation_name: CampaignAdGroups
- relation_desc: Advertisers add Keywords to Ad Groups in Campaigns.
  relation_name: CampaignAdGroupKeywordHierarchy
- relation_desc: Advertisers create Product Ads within Ad Groups and Campaigns.
  relation_name: AdCampaignHierarchy
- relation_desc: System tracks daily performance of targeted Keywords.
  relation_name: TargetedKeywordDailyPerformance
- relation_desc: System records daily performance data for search terms.
  relation_name: SearchTermAdKeywordPerformance
- relation_desc: System measures daily performance of Ads, Groups, and Campaigns.
  relation_name: CampaignAdPerformanceMetrics
🤓 We've found the related relations: CampaignAdGroups, CampaignAdGroupKeywordHierarchy, TargetedKeywordDailyPerformance, CampaignAdPerformanceMetrics

Story behind the relationships (only for those connecting >= 2 entities)

  1. [ProfilePortfolioManagement]: Marketers create Account Profiles and organize Portfolios within them.
  2. [CampaignPortfolioProfileHierarchy]: Marketers assign Campaigns to Portfolios within Account Profiles.
  3. [CampaignAdGroups]: Advertisers create Ad Groups within each Campaign.
  4. [CampaignAdGroupKeywordHierarchy]: Advertisers add Keywords to Ad Groups in Campaigns.
  5. [AdCampaignHierarchy]: Advertisers create Product Ads within Ad Groups and Campaigns.
  6. [TargetedKeywordDailyPerformance]: System tracks daily performance of targeted Keywords.
  7. [SearchTermAdKeywordPerformance]: System records daily performance data for search terms.
  8. [CampaignAdPerformanceMetrics]: System measures daily performance of Ads, Groups, and Campaigns.
[Figure: ER diagram linking each of the eight relationships above to the entities it connects (Account Profiles, Portfolios, Campaigns, Ad Groups, Keywords, Product Ads).]
Reasoning: The question "Which ad groups have the highest number of targeted keywords?" is asking about the relationship between ad groups and keywords. In the story, it is asking for information about how many targeted keywords are associated with each ad group, and specifically which ad groups have the most keywords. This relates to how advertisers structure their campaigns and organize their keywords within ad groups.
GenAI
😎 RAG from Cocoon: Checking out all the related tables Cocoon set up... cool stuff!
- table_name: stg_targeting_keyword_report_data
  table_desc: The table is about targeting keyword report data for a specific ad campaign.
    It includes daily performance metrics for a single keyword "x wing parts" in an
    ad group. The data shows clicks, cost, impressions, and other campaign settings
    across different dates. The campaign has a daily budget of $2000 USD and uses
    exact match targeting for the keyword with a $6 bid.
  attributes: '[''ad_group_id'', ''campaign_id'', ''keyword_id'', ''keyword_status'',
    ''daily_budget_amount'', ''budget_currency'', ''budget_type'', ''clicks'', ''cost'',
    ''impressions'', ''keyword_bid'', ''keyword_match_type'', ''keyword_match_specificity'',
    ''targeted_keyword'', ''date_'']'
- table_name: stg_campaign_level_report_data
  table_desc: The table is about advertising campaign performance. It contains daily
    data for each campaign, including campaign ID, date, budget details, bidding strategy,
    and performance metrics. Key fields are campaign_id, date_, campaign_budget_amount,
    campaign_budget_type, clicks, cost, and impressions. The data appears to be for
    a single campaign (ID 2187) over multiple dates, with consistent budget settings
    but no recorded performance (zero clicks, cost, and impressions).
  attributes: '[''campaign_id'', ''bidding_strategy'', ''daily_budget_amount'', ''budget_currency'',
    ''budget_type'', ''clicks'', ''cost'', ''budget_rule_id'', ''budget_rule_name'',
    ''date_'', ''impressions'']'
- table_name: stg_ad_group_level_report_data
  table_desc: The table is about ad group level performance data. It includes details
    such as ad group ID, date, campaign bidding strategy, clicks, cost, and impressions.
    Each row represents daily performance metrics for a specific ad group. The data
    spans multiple dates and shows various performance indicators for advertising
    campaigns.
  attributes: '[''ad_group_id'', ''bidding_strategy'', ''clicks'', ''cost'', ''impressions'',
    ''date_'']'
- table_name: stg_advertised_product_report_data
  table_desc: The table is about advertised product report data. It contains details
    of ad campaigns including ad group ID, ad ID, campaign ID, date, budget amount,
    budget currency, budget type, clicks, cost, impressions, and advertised ASIN/SKU.
    The data shows daily performance metrics for different ads across various campaigns,
    tracking their budget, spend, and engagement.
  attributes: '[''ad_group_id'', ''ad_id'', ''campaign_id'', ''daily_budget_amount'',
    ''budget_currency'', ''budget_type'', ''clicks'', ''ad_spend'', ''impressions'',
    ''report_date'']'
- table_name: snapshot_ad_group_history_data
  table_desc: The table is about current ad group data. It tracks the most recent
    version of each ad group. The table includes details such as the ad group's ID,
    campaign ID, name, default bid, serving status, and state. Each row represents
    the latest state of an ad group. The creation_date shows when the ad group was
    initially created. This table provides a current snapshot of ad groups without
    historical changes.
  attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
    ''serving_status'', ''ad_group_state'', ''creation_date'']'
  history_table:
    attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
      ''serving_status'', ''ad_group_state'', ''creation_date'', ''last_updated_date'']'
    table_name: stg_ad_group_history_data
- table_name: snapshot_product_ad_history_data
  table_desc: The table is about current product ad status. It tracks the most recent
    version of each ad's information. Each row represents the latest state of a unique
    ad, identified by its ad_id. The table includes details like ad group ID, ASIN,
    campaign ID, current serving status, and state. It provides a snapshot of the
    current ad configurations without historical changes.
  attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
    ''ad_state'', ''creation_timestamp'', ''product_sku'']'
  history_table:
    attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
      ''ad_state'', ''creation_timestamp'', ''last_updated_timestamp'', ''product_sku'']'
    table_name: stg_product_ad_history_data
- table_name: snapshot_campaign_history_data
  table_desc: The table is about current campaign data. It tracks the most recent
    version of each campaign, including its bidding strategy, name, portfolio, profile,
    status, state, targeting type, budget details, and relevant dates. Each row represents
    the latest snapshot of a unique campaign, providing an up-to-date view of campaign
    configurations without historical versions.
  attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
    ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'', ''budget_type'',
    ''creation_datetime'', ''daily_budget'', ''end_date'', ''start_date'']'
  history_table:
    attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
      ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'',
      ''budget_type'', ''creation_datetime'', ''daily_budget'', ''end_date'', ''last_updated_datetime'',
      ''start_date'']'
    table_name: stg_campaign_history_data
- table_name: snapshot_keyword_history_data
  table_desc: The table is about current keyword data for advertising campaigns. It
    tracks the most recent version of each keyword, including its ID, ad group ID,
    campaign ID, bid amount, text, match type, and serving status. Each row represents
    the latest state of a unique keyword. The table provides up-to-date information
    on keyword settings and performance without historical data.
  attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
    ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
    ''native_language_keyword'', ''native_language_locale'']'
  history_table:
    attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
      ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
      ''last_updated_timestamp'', ''native_language_keyword'', ''native_language_locale'']'
    table_name: stg_keyword_history_data
🤓 We've identified the related tables: stg_targeting_keyword_report_data, snapshot_ad_group_history_data
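The snapshot_* tables listed above are described as holding only the latest version of each entity, with the full change log kept in the matching stg_* history table. How Cocoon actually builds them is not shown here. As a minimal sketch of one common approach, using Python's sqlite3 and invented rows, the snippet below keeps the row with the greatest `last_updated_date` per `ad_group_id`. It assumes the dates are ISO-formatted strings, so that string comparison matches chronological order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_ad_group_history_data (
    ad_group_id INTEGER, ad_group_name TEXT, last_updated_date TEXT
);
-- Invented rows: ad group 1 was renamed once.
INSERT INTO stg_ad_group_history_data VALUES
    (1, 'Old name', '2024-01-01'),
    (1, 'New name', '2024-02-01'),
    (2, 'Other',    '2024-01-15');
""")

# Keep, for each ad group, only the row carrying its latest update date.
snapshot_sql = """
SELECT h.ad_group_id, h.ad_group_name
FROM stg_ad_group_history_data h
JOIN (
    SELECT ad_group_id, MAX(last_updated_date) AS latest
    FROM stg_ad_group_history_data
    GROUP BY ad_group_id
) m ON m.ad_group_id = h.ad_group_id AND m.latest = h.last_updated_date
ORDER BY h.ad_group_id
"""
snapshot_rows = con.execute(snapshot_sql).fetchall()
print(snapshot_rows)
```

With one row per ad group, joining a report table to the snapshot cannot multiply rows the way a join against the full history could.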
[Figure: foreign-key graph of the prepared tables. The stg_* report tables point to the snapshot tables they reference (keywords, product ads, ad groups, campaigns), and the snapshot tables point upward through campaigns and portfolios to stg_profile_data.]
🧠 Explanation: The available data is sufficient to answer the business question "Which ad groups have the highest number of targeted keywords?". We can use the 'stg_targeting_keyword_report_data' table, which contains information about keywords and their associated ad groups. To answer this question, we need to count the number of unique keywords for each ad group and then rank the ad groups based on this count.
💡 Instruction to write SQL:
1. Start with the 'stg_targeting_keyword_report_data' table.
2. Group the data by 'ad_group_id'.
3. Count the distinct 'keyword_id' for each ad group.
4. Order the results by the keyword count in descending order.
5. Optionally, limit the results to show only the top N ad groups.
6. Join with 'snapshot_ad_group_history_data' to get the ad group names for better readability.
The SQL query might look like this:
SELECT
    agd.ad_group_name,
    tkrd.ad_group_id,
    COUNT(DISTINCT tkrd.keyword_id) AS keyword_count
FROM
    stg_targeting_keyword_report_data tkrd
JOIN
    snapshot_ad_group_history_data agd ON tkrd.ad_group_id = agd.ad_group_id
GROUP BY
    tkrd.ad_group_id, agd.ad_group_name
ORDER BY
    keyword_count DESC
LIMIT 10; -- Adjust the limit as needed
This query will return the top 10 ad groups with the highest number of targeted keywords, along with their names and the count of unique keywords.
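Step 3 asks for a distinct count, and the table description suggests why: the report table holds daily performance rows, so the same keyword appears once per reporting date. A small sketch with Python's sqlite3 and invented rows (the dates and IDs are assumptions, not real data) shows the difference between counting rows and counting keywords.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE stg_targeting_keyword_report_data (
    ad_group_id INTEGER, keyword_id INTEGER, date_ TEXT
);
-- Invented rows: keyword 7 is reported on three days, keyword 8 on one.
INSERT INTO stg_targeting_keyword_report_data VALUES
    (1, 7, '2024-01-01'), (1, 7, '2024-01-02'), (1, 7, '2024-01-03'),
    (1, 8, '2024-01-01');
""")

# COUNT(*) counts daily report rows; COUNT(DISTINCT ...) counts keywords.
row_count, keyword_count = con.execute("""
SELECT COUNT(*), COUNT(DISTINCT keyword_id)
FROM stg_targeting_keyword_report_data
WHERE ad_group_id = 1
""").fetchone()
print(row_count, keyword_count)
```

Ranking ad groups by `COUNT(*)` would therefore favor ad groups with longer reporting histories rather than more keywords.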
GenAI
😎 RAG from Cocoon: Looking at how tables connect, thanks to Cocoon's key info... clever!
snapshot_ad_group_history_data:
  foreign_keys:
  - column: campaign_id
    reference:
      column: campaign_id
      table_name: snapshot_campaign_history_data
  primary_key: ad_group_id
stg_targeting_keyword_report_data:
  foreign_keys:
  - column: ad_group_id
    reference:
      column: ad_group_id
      table_name: snapshot_ad_group_history_data
  - column: campaign_id
    reference:
      column: campaign_id
      table_name: snapshot_campaign_history_data
  - column: keyword_id
    reference:
      column: keyword_id
      table_name: snapshot_keyword_history_data
  primary_key: null
🤓 We've planned the join: To execute the SQL, one join is needed. The join is between 'stg_targeting_keyword_report_data' and 'snapshot_ad_group_history_data' on the 'ad_group_id' column. It is an inner join.
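With declared keys available, the join condition can be looked up instead of guessed. The sketch below copies the key listing above into a plain Python dict and uses a hypothetical helper, `join_condition`, to find a foreign key connecting two tables. It illustrates the idea only and is not Cocoon's own code.

```python
# Key metadata copied from the listing above, as plain Python dicts.
keys = {
    "snapshot_ad_group_history_data": {
        "primary_key": "ad_group_id",
        "foreign_keys": [
            {"column": "campaign_id",
             "reference": {"table_name": "snapshot_campaign_history_data",
                           "column": "campaign_id"}},
        ],
    },
    "stg_targeting_keyword_report_data": {
        "primary_key": None,
        "foreign_keys": [
            {"column": "ad_group_id",
             "reference": {"table_name": "snapshot_ad_group_history_data",
                           "column": "ad_group_id"}},
            {"column": "campaign_id",
             "reference": {"table_name": "snapshot_campaign_history_data",
                           "column": "campaign_id"}},
            {"column": "keyword_id",
             "reference": {"table_name": "snapshot_keyword_history_data",
                           "column": "keyword_id"}},
        ],
    },
}


def join_condition(keys, left, right):
    """Hypothetical helper: return an ON clause if either table has a
    declared foreign key pointing at the other, else None."""
    for src, dst in ((left, right), (right, left)):
        for fk in keys.get(src, {}).get("foreign_keys", []):
            ref = fk["reference"]
            if ref["table_name"] == dst:
                return f"{src}.{fk['column']} = {dst}.{ref['column']}"
    return None


cond = join_condition(
    keys, "stg_targeting_keyword_report_data", "snapshot_ad_group_history_data"
)
print(cond)
```

When no declared key connects two tables, the helper returns `None`, which is a signal to stop rather than invent a join.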
GenAI
😎 RAG from Cocoon: Diving into column details Cocoon prepared... getting a bit technical here!
snapshot_ad_group_history_data:
- name: ad_group_id
  description: Unique identifier for the ad group
- name: ad_group_name
  description: Name of the ad group
stg_targeting_keyword_report_data:
- name: ad_group_id
  description: Unique identifier for the ad group
- name: keyword_id
  description: Unique identifier for the keyword
🤔 Reasoning for column selection: Based on the SQL approach and join description, we need to select specific columns from each table to fulfill the requirements. Here's why these columns are necessary:
1. From 'stg_targeting_keyword_report_data':
   - 'ad_group_id' is needed for grouping and joining with the ad group history table.
   - 'keyword_id' is required to count distinct keywords per ad group.
2. From 'snapshot_ad_group_history_data':
   - 'ad_group_id' is necessary for joining with the keyword report table.
   - 'ad_group_name' is needed for better readability in the final output.
These columns allow us to perform the required grouping, joining, and counting operations, as well as provide the necessary information for the final output as specified in the SQL approach.
GenAI
😊 We have written the SQL:
SELECT 
    agd.ad_group_name,
    tkrd.ad_group_id,
    COUNT(DISTINCT tkrd.keyword_id) AS keyword_count
FROM 
    stg_targeting_keyword_report_data tkrd
INNER JOIN 
    snapshot_ad_group_history_data agd ON tkrd.ad_group_id = agd.ad_group_id
GROUP BY 
    tkrd.ad_group_id, agd.ad_group_name
ORDER BY 
    keyword_count DESC
LIMIT 10;
GenAI
✅ SQL ran successfully! Here are the samples (first 5 rows):
   ad_group_name  ad_group_id  keyword_count
0  Red 7          421          1
You
🫡 Nailed it! Thanks a bunch!