Before Cocoon
GenAI is overwhelmed by the messy source data 🤯
It gets confused and starts making stuff up 🤪
You
🙂 How many active keywords are associated with each campaign?
GenAI
😵 Current RAG: Reading every source table schema from the database... phew, it's a lot!
CREATE TABLE 'advertised_product_report_data' (
    ad_group_id BIGINT,
    ad_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    __advertised_asin DOUBLE,
    __advertised_sku DOUBLE
);

CREATE TABLE 'ad_group_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    default_bid DOUBLE,
    name VARCHAR,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'ad_group_level_report_data' (
    ad_group_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_bidding_strategy VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT
);

CREATE TABLE 'campaign_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    bidding_strategy VARCHAR,
    creation_date VARCHAR,
    budget BIGINT,
    end_date DOUBLE,
    name VARCHAR,
    portfolio_id BIGINT,
    profile_id BIGINT,
    serving_status VARCHAR,
    start_date VARCHAR,
    state VARCHAR,
    targeting_type VARCHAR,
    budget_type VARCHAR,
    effective_budget DOUBLE
);

CREATE TABLE 'campaign_level_report_data' (
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_applicable_budget_rule_id DOUBLE,
    campaign_applicable_budget_rule_name DOUBLE,
    campaign_bidding_strategy VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions DOUBLE,
    campaign_rule_based_budget_amount DOUBLE
);

CREATE TABLE 'keyword_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    bid DOUBLE,
    campaign_id BIGINT,
    creation_date VARCHAR,
    keyword_text VARCHAR,
    match_type VARCHAR,
    native_language_keyword DOUBLE,
    serving_status VARCHAR,
    state VARCHAR,
    native_language_locale DOUBLE
);

CREATE TABLE 'portfolio_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    budget_amount BIGINT,
    budget_currency_code VARCHAR,
    budget_end_date VARCHAR,
    budget_policy VARCHAR,
    budget_start_date VARCHAR,
    creation_date VARCHAR,
    in_budget BOOLEAN,
    name VARCHAR,
    profile_id BIGINT,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'product_ad_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    asin VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    serving_status VARCHAR,
    sku DOUBLE,
    state VARCHAR
);

CREATE TABLE 'profile_data' (
    id BIGINT,
    _fivetran_deleted BOOLEAN,
    _fivetran_synced VARCHAR,
    account_id VARCHAR,
    account_marketplace_string_id VARCHAR,
    account_name VARCHAR,
    account_sub_type VARCHAR,
    account_type VARCHAR,
    account_valid_payment_method BOOLEAN,
    country_code VARCHAR,
    currency_code VARCHAR,
    daily_budget BIGINT,
    timezone VARCHAR
);

CREATE TABLE 'search_term_ad_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    search_term VARCHAR,
    targeting VARCHAR
);

CREATE TABLE 'targeting_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    ad_keyword_status VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    keyword_type VARCHAR,
    match_type VARCHAR,
    targeting VARCHAR
);
GenAI
🤨 I've done my best with the SQL... fingers crossed it's right!
SELECT 
    ch.id AS campaign_id,
    ch.name AS campaign_name,
    COUNT(DISTINCT kh.id) AS active_keywords_count
FROM 
    campaign_history_data ch
LEFT JOIN 
    keyword_history_data kh ON ch.id = kh.campaign_id
WHERE 
    ch.state = 'ENABLED' 
    AND kh.state = 'ENABLED'
    AND ch.serving_status = 'RUNNING'
    AND kh.serving_status = 'RUNNING'
    AND ch.last_updated_date = (
        SELECT MAX(last_updated_date) 
        FROM campaign_history_data 
        WHERE id = ch.id
    )
    AND kh.last_updated_date = (
        SELECT MAX(last_updated_date) 
        FROM keyword_history_data 
        WHERE id = kh.id
    )
GROUP BY 
    ch.id, ch.name
ORDER BY 
    active_keywords_count DESC
GenAI
🤨 SQL runs successfully! The result is... empty??
You
😥 Oops, looks like you're making up those predicates!
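Why the result is empty: the query above filters on literals such as 'ENABLED' and 'RUNNING' that the schema alone cannot confirm. Below is a minimal sketch of that failure mode, using Python's built-in sqlite3 module. The three keyword rows are invented for illustration; the lowercase 'enabled' value follows the column domain that Cocoon reports later in this demo.

```python
import sqlite3

# Hypothetical in-memory copy of a few keyword rows (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keyword_history_data (id INTEGER, campaign_id INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO keyword_history_data VALUES (?, ?, ?)",
    [(1, 2187, "enabled"), (2, 2187, "enabled"), (3, 2187, "enabled")],
)

# Guessed predicate: '=' on text is case-sensitive by default in SQLite,
# so 'ENABLED' does not match the stored value 'enabled'.
guessed = conn.execute(
    "SELECT COUNT(*) FROM keyword_history_data WHERE state = 'ENABLED'"
).fetchone()[0]

# Grounded predicate: first look at the values the column actually holds.
actual_states = [
    row[0] for row in conn.execute("SELECT DISTINCT state FROM keyword_history_data")
]
grounded = conn.execute(
    "SELECT COUNT(*) FROM keyword_history_data WHERE state = ?",
    (actual_states[0],),
).fetchone()[0]

print(guessed, actual_states, grounded)
```

The guessed literal matches nothing, while the value read from the data matches all three rows. Separately, filtering on kh columns in the WHERE clause discards the unmatched rows that the LEFT JOIN would otherwise keep.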
After Cocoon
Cocoon prepares your data, making it RAG-ready ✨
GenAI finds just what it needs, no more wild guesses! 🎯
You
🙂 How many active keywords are associated with each campaign?
GenAI
😎 RAG from Cocoon: Putting the question in context using Cocoon's ER story... nice!
- relation_desc: Marketers create Account Profiles and organize Portfolios within
    them.
  relation_name: ProfilePortfolioManagement
- relation_desc: Marketers assign Campaigns to Portfolios within Account Profiles.
  relation_name: CampaignPortfolioProfileHierarchy
- relation_desc: Advertisers create Ad Groups within each Campaign.
  relation_name: CampaignAdGroups
- relation_desc: Advertisers add Keywords to Ad Groups in Campaigns.
  relation_name: CampaignAdGroupKeywordHierarchy
- relation_desc: Advertisers create Product Ads within Ad Groups and Campaigns.
  relation_name: AdCampaignHierarchy
- relation_desc: System tracks daily performance of targeted Keywords.
  relation_name: TargetedKeywordDailyPerformance
- relation_desc: System records daily performance data for search terms.
  relation_name: SearchTermAdKeywordPerformance
- relation_desc: System measures daily performance of Ads, Groups, and Campaigns.
  relation_name: CampaignAdPerformanceMetrics
🤓 We've found the related relations: CampaignAdGroups, CampaignAdGroupKeywordHierarchy, TargetedKeywordDailyPerformance, CampaignAdPerformanceMetrics

Story behind the relationships (only for those connecting >= 2 entities)

  1. [ProfilePortfolioManagement]: Marketers create Account Profiles and organize Portfolios within them.
  2. [CampaignPortfolioProfileHierarchy]: Marketers assign Campaigns to Portfolios within Account Profiles.
  3. [CampaignAdGroups]: Advertisers create Ad Groups within each Campaign.
  4. [CampaignAdGroupKeywordHierarchy]: Advertisers add Keywords to Ad Groups in Campaigns.
  5. [AdCampaignHierarchy]: Advertisers create Product Ads within Ad Groups and Campaigns.
  6. [TargetedKeywordDailyPerformance]: System tracks daily performance of targeted Keywords.
  7. [SearchTermAdKeywordPerformance]: System records daily performance data for search terms.
  8. [CampaignAdPerformanceMetrics]: System measures daily performance of Ads, Groups, and Campaigns.
[ER diagram: each relationship and the entities it connects]
  CampaignAdGroups -- Ad Groups, Campaigns
  AdCampaignHierarchy -- Ad Groups, Campaigns, Product Ads
  CampaignPortfolioProfileHierarchy -- Campaigns, Portfolios, Account Profiles
  CampaignAdGroupKeywordHierarchy -- Ad Groups, Campaigns, Keywords
  ProfilePortfolioManagement -- Portfolios, Account Profiles
  CampaignAdPerformanceMetrics -- Ad Groups, Campaigns, Product Ads
  SearchTermAdKeywordPerformance -- Ad Groups, Campaigns, Keywords
  TargetedKeywordDailyPerformance -- Ad Groups, Campaigns, Keywords
Reasoning: The question "How many active keywords are associated with each campaign?" is asking for a count of active keywords linked to individual campaigns. In the story, it is asking for information about the relationship between keywords and campaigns. This involves understanding how keywords are organized within the campaign structure and which keywords are considered active.
GenAI
😎 RAG from Cocoon: Checking out all the related tables Cocoon set up... cool stuff!
- table_name: stg_targeting_keyword_report_data
  table_desc: The table is about targeting keyword report data for a specific ad campaign.
    It includes daily performance metrics for a single keyword "x wing parts" in an
    ad group. The data shows clicks, cost, impressions, and other campaign settings
    across different dates. The campaign has a daily budget of $2000 USD and uses
    exact match targeting for the keyword with a $6 bid.
  attributes: '[''ad_group_id'', ''campaign_id'', ''keyword_id'', ''keyword_status'',
    ''daily_budget_amount'', ''budget_currency'', ''budget_type'', ''clicks'', ''cost'',
    ''impressions'', ''keyword_bid'', ''keyword_match_type'', ''keyword_match_specificity'',
    ''targeted_keyword'', ''date_'']'
- table_name: stg_campaign_level_report_data
  table_desc: The table is about advertising campaign performance. It contains daily
    data for each campaign, including campaign ID, date, budget details, bidding strategy,
    and performance metrics. Key fields are campaign_id, date_, campaign_budget_amount,
    campaign_budget_type, clicks, cost, and impressions. The data appears to be for
    a single campaign (ID 2187) over multiple dates, with consistent budget settings
    but no recorded performance (zero clicks, cost, and impressions).
  attributes: '[''campaign_id'', ''bidding_strategy'', ''daily_budget_amount'', ''budget_currency'',
    ''budget_type'', ''clicks'', ''cost'', ''budget_rule_id'', ''budget_rule_name'',
    ''date_'', ''impressions'']'
- table_name: stg_ad_group_level_report_data
  table_desc: The table is about ad group level performance data. It includes details
    such as ad group ID, date, campaign bidding strategy, clicks, cost, and impressions.
    Each row represents daily performance metrics for a specific ad group. The data
    spans multiple dates and shows various performance indicators for advertising
    campaigns.
  attributes: '[''ad_group_id'', ''bidding_strategy'', ''clicks'', ''cost'', ''impressions'',
    ''date_'']'
- table_name: stg_advertised_product_report_data
  table_desc: The table is about advertised product report data. It contains details
    of ad campaigns including ad group ID, ad ID, campaign ID, date, budget amount,
    budget currency, budget type, clicks, cost, impressions, and advertised ASIN/SKU.
    The data shows daily performance metrics for different ads across various campaigns,
    tracking their budget, spend, and engagement.
  attributes: '[''ad_group_id'', ''ad_id'', ''campaign_id'', ''daily_budget_amount'',
    ''budget_currency'', ''budget_type'', ''clicks'', ''ad_spend'', ''impressions'',
    ''report_date'']'
- table_name: snapshot_ad_group_history_data
  table_desc: The table is about current ad group data. It tracks the most recent
    version of each ad group. The table includes details such as the ad group's ID,
    campaign ID, name, default bid, serving status, and state. Each row represents
    the latest state of an ad group. The creation_date shows when the ad group was
    initially created. This table provides a current snapshot of ad groups without
    historical changes.
  attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
    ''serving_status'', ''ad_group_state'', ''creation_date'']'
  history_table:
    attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
      ''serving_status'', ''ad_group_state'', ''creation_date'', ''last_updated_date'']'
    table_name: stg_ad_group_history_data
- table_name: snapshot_product_ad_history_data
  table_desc: The table is about current product ad status. It tracks the most recent
    version of each ad's information. Each row represents the latest state of a unique
    ad, identified by its ad_id. The table includes details like ad group ID, ASIN,
    campaign ID, current serving status, and state. It provides a snapshot of the
    current ad configurations without historical changes.
  attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
    ''ad_state'', ''creation_timestamp'', ''product_sku'']'
  history_table:
    attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
      ''ad_state'', ''creation_timestamp'', ''last_updated_timestamp'', ''product_sku'']'
    table_name: stg_product_ad_history_data
- table_name: snapshot_campaign_history_data
  table_desc: The table is about current campaign data. It tracks the most recent
    version of each campaign, including its bidding strategy, name, portfolio, profile,
    status, state, targeting type, budget details, and relevant dates. Each row represents
    the latest snapshot of a unique campaign, providing an up-to-date view of campaign
    configurations without historical versions.
  attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
    ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'', ''budget_type'',
    ''creation_datetime'', ''daily_budget'', ''end_date'', ''start_date'']'
  history_table:
    attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
      ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'',
      ''budget_type'', ''creation_datetime'', ''daily_budget'', ''end_date'', ''last_updated_datetime'',
      ''start_date'']'
    table_name: stg_campaign_history_data
- table_name: snapshot_keyword_history_data
  table_desc: The table is about current keyword data for advertising campaigns. It
    tracks the most recent version of each keyword, including its ID, ad group ID,
    campaign ID, bid amount, text, match type, and serving status. Each row represents
    the latest state of a unique keyword. The table provides up-to-date information
    on keyword settings and performance without historical data.
  attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
    ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
    ''native_language_keyword'', ''native_language_locale'']'
  history_table:
    attributes: '[''keyword_id'', ''ad_group_id'', ''keyword_bid'', ''campaign_id'',
      ''keyword_text'', ''match_type'', ''serving_status'', ''keyword_state'', ''creation_timestamp'',
      ''last_updated_timestamp'', ''native_language_keyword'', ''native_language_locale'']'
    table_name: stg_keyword_history_data
🤓 We've identified the related tables: snapshot_keyword_history_data, snapshot_campaign_history_data
[Table relationship diagram: each table and the tables it references]
  stg_search_term_ad_keyword_report_data -> snapshot_keyword_history_data, snapshot_ad_group_history_data, snapshot_campaign_history_data
  stg_advertised_product_report_data -> snapshot_product_ad_history_data, snapshot_ad_group_history_data, snapshot_campaign_history_data
  stg_campaign_level_report_data -> snapshot_campaign_history_data
  stg_targeting_keyword_report_data -> snapshot_keyword_history_data, snapshot_ad_group_history_data, snapshot_campaign_history_data
  stg_ad_group_level_report_data -> snapshot_ad_group_history_data
  snapshot_keyword_history_data -> snapshot_ad_group_history_data, snapshot_campaign_history_data
  snapshot_product_ad_history_data -> snapshot_ad_group_history_data, snapshot_campaign_history_data
  snapshot_ad_group_history_data -> snapshot_campaign_history_data
  snapshot_campaign_history_data -> stg_profile_data, snapshot_portfolio_history_data
  snapshot_portfolio_history_data -> stg_profile_data
🧠 Explanation: The available data is sufficient to answer the business question "How many active keywords are associated with each campaign?". We can use the 'snapshot_keyword_history_data' table, which contains information about keywords, their associated campaigns, and their status. The 'keyword_state' column can be used to determine if a keyword is active. We'll need to count the number of active keywords for each campaign.
💡 Instruction to write SQL:
1. Start with the 'snapshot_keyword_history_data' table.
2. Filter for active keywords using the 'keyword_state' column (assuming 'ACTIVE' or similar indicates an active state).
3. Group the results by 'campaign_id'.
4. Count the number of keywords in each group.
5. Optionally, join with 'snapshot_campaign_history_data' to include campaign names in the result.
The SQL query might look like this:
SELECT k.campaign_id, c.campaign_name, COUNT(DISTINCT k.keyword_id) as active_keyword_count
FROM snapshot_keyword_history_data k
LEFT JOIN snapshot_campaign_history_data c ON k.campaign_id = c.campaign_id
WHERE k.keyword_state = 'ACTIVE'
GROUP BY k.campaign_id, c.campaign_name
ORDER BY active_keyword_count DESC;
This query will give us the count of active keywords for each campaign, along with the campaign name for better readability.
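The snapshot_* tables listed above are described as holding only the most recent version of each entity, which is why the instruction needs no MAX(last_updated_date) subqueries. Below is a minimal sketch of how such a snapshot could be derived from a history table. This is an assumption for illustration, not Cocoon's actual implementation; the rows, the timestamps, and the 'paused' value are hypothetical, while the column names follow the stg_keyword_history_data attributes shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_keyword_history_data (
    keyword_id INTEGER,
    campaign_id INTEGER,
    keyword_state TEXT,
    last_updated_timestamp TEXT
);
-- Hypothetical history: keyword 1 has two versions, keyword 2 has one.
INSERT INTO stg_keyword_history_data VALUES
    (1, 2187, 'paused',  '2024-01-01 00:00:00'),
    (1, 2187, 'enabled', '2024-02-01 00:00:00'),
    (2, 2187, 'enabled', '2024-01-15 00:00:00');
""")

# Keep only the latest version of each keyword. ISO-formatted timestamps
# compare correctly as text, so MAX() picks the most recent one.
snapshot_rows = conn.execute("""
    SELECT h.keyword_id, h.campaign_id, h.keyword_state
    FROM stg_keyword_history_data h
    WHERE h.last_updated_timestamp = (
        SELECT MAX(last_updated_timestamp)
        FROM stg_keyword_history_data
        WHERE keyword_id = h.keyword_id
    )
    ORDER BY h.keyword_id
""").fetchall()

print(snapshot_rows)
```

With one row per keyword, a downstream query can filter on keyword_state directly without worrying about stale versions.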
GenAI
😎 RAG from Cocoon: Looking at how tables connect, thanks to Cocoon's key info... clever!
snapshot_campaign_history_data:
  foreign_keys:
  - column: portfolio_id
    reference:
      column: portfolio_id
      table_name: snapshot_portfolio_history_data
  - column: profile_id
    reference:
      column: profile_id
      table_name: stg_profile_data
  primary_key: campaign_id
snapshot_keyword_history_data:
  foreign_keys:
  - column: ad_group_id
    reference:
      column: ad_group_id
      table_name: snapshot_ad_group_history_data
  - column: campaign_id
    reference:
      column: campaign_id
      table_name: snapshot_campaign_history_data
  primary_key: keyword_id
🤓 We've planned the join: To execute the SQL, one join is needed. The join is between 'snapshot_keyword_history_data' and 'snapshot_campaign_history_data' tables. The join key is 'campaign_id'. It is a left outer join, with 'snapshot_keyword_history_data' as the left table and 'snapshot_campaign_history_data' as the right table.
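The join plan above can be read straight off the key metadata. Below is a minimal sketch, assuming the YAML has already been loaded into a Python dict; `find_join` is a hypothetical helper written for this illustration, not part of Cocoon.

```python
# Key metadata from the YAML above, written out as a plain dict.
key_info = {
    "snapshot_campaign_history_data": {
        "primary_key": "campaign_id",
        "foreign_keys": [
            {"column": "portfolio_id",
             "reference": {"table_name": "snapshot_portfolio_history_data",
                           "column": "portfolio_id"}},
            {"column": "profile_id",
             "reference": {"table_name": "stg_profile_data",
                           "column": "profile_id"}},
        ],
    },
    "snapshot_keyword_history_data": {
        "primary_key": "keyword_id",
        "foreign_keys": [
            {"column": "ad_group_id",
             "reference": {"table_name": "snapshot_ad_group_history_data",
                           "column": "ad_group_id"}},
            {"column": "campaign_id",
             "reference": {"table_name": "snapshot_campaign_history_data",
                           "column": "campaign_id"}},
        ],
    },
}


def find_join(metadata, left, right):
    """Return (left_column, right_column) if `left` has a foreign key into `right`, else None."""
    for fk in metadata.get(left, {}).get("foreign_keys", []):
        ref = fk["reference"]
        if ref["table_name"] == right:
            return fk["column"], ref["column"]
    return None


join = find_join(
    key_info, "snapshot_keyword_history_data", "snapshot_campaign_history_data"
)
reverse = find_join(
    key_info, "snapshot_campaign_history_data", "snapshot_keyword_history_data"
)
print(join, reverse)
```

The foreign key points from keywords to campaigns and not the other way around, which is consistent with the plan putting 'snapshot_keyword_history_data' on the left.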
GenAI
😎 RAG from Cocoon: Diving into column details Cocoon prepared... getting a bit technical here!
snapshot_campaign_history_data:
- name: campaign_id
  description: Unique identifier for the campaign
- name: campaign_name
  description: Name of the campaign
snapshot_keyword_history_data:
- name: campaign_id
  description: Identifier for the advertising campaign
- name: keyword_id
  description: Unique identifier for the keyword
- name: keyword_state
  description: Current state of the keyword (e.g., enabled)
  domain: '[''enabled'']'
🤔 Reasoning for column selection: Based on the SQL approach and join description, we need to select specific columns from each table to fulfill the requirements. For the 'snapshot_keyword_history_data' table, we need 'campaign_id' for joining and grouping, 'keyword_id' for counting distinct keywords, and 'keyword_state' for filtering active keywords. From the 'snapshot_campaign_history_data' table, we need 'campaign_id' for joining and 'campaign_name' for including in the final output. These columns are necessary to perform the join, apply the filter condition, group the results, count the keywords, and include the campaign names in the output as described in the SQL approach.
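The `domain` entry above is what lets the final query replace the earlier guess of 'ACTIVE' with a value that actually occurs in the data. Below is a minimal sketch of that check, assuming the column details are available as a Python dict; `check_literal` is a hypothetical helper for illustration only.

```python
# Column domains taken from the details above; only keyword_state lists one.
column_domains = {
    ("snapshot_keyword_history_data", "keyword_state"): ["enabled"],
}


def check_literal(domains, table, column, literal):
    """Return True/False if a domain is recorded for the column, None if it is unknown."""
    domain = domains.get((table, column))
    if domain is None:
        return None
    return literal in domain


guess_ok = check_literal(
    column_domains, "snapshot_keyword_history_data", "keyword_state", "ACTIVE"
)
grounded_ok = check_literal(
    column_domains, "snapshot_keyword_history_data", "keyword_state", "enabled"
)
unknown = check_literal(
    column_domains, "snapshot_campaign_history_data", "campaign_name", "Gold Leader"
)
print(guess_ok, grounded_ok, unknown)
```

Returning None for columns without a recorded domain keeps "not in the domain" distinct from "no domain known", so free-text columns such as campaign_name are not rejected.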
GenAI
😊 We have written the SQL:
SELECT 
    k.campaign_id,
    c.campaign_name,
    COUNT(DISTINCT k.keyword_id) AS active_keyword_count
FROM 
    snapshot_keyword_history_data k
LEFT JOIN 
    snapshot_campaign_history_data c ON k.campaign_id = c.campaign_id
WHERE 
    k.keyword_state = 'enabled'
GROUP BY 
    k.campaign_id, c.campaign_name
ORDER BY 
    active_keyword_count DESC
GenAI
✅ SQL runs successfully! Here are the samples (first 5 rows):
   campaign_id  campaign_name  active_keyword_count
0         2187    Gold Leader                     3
You
🫡 Nailed it! Thanks a bunch!