Before Cocoon
GenAI is overwhelmed by the messy source data ๐Ÿคฏ
It gets confused and starts making stuff up ๐Ÿคช
You
๐Ÿ™‚ What's the name of the account that has the most active ads?
GenAI
๐Ÿ˜ต Current RAG: Reading all the source data schema from databases... phew, it's a lot!
CREATE TABLE 'advertised_product_report_data' (
    ad_group_id BIGINT,
    ad_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    __advertised_asin DOUBLE,
    __advertised_sku DOUBLE
);

CREATE TABLE 'ad_group_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    default_bid DOUBLE,
    name VARCHAR,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'ad_group_level_report_data' (
    ad_group_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_bidding_strategy VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT
);

CREATE TABLE 'campaign_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    bidding_strategy VARCHAR,
    creation_date VARCHAR,
    budget BIGINT,
    end_date DOUBLE,
    name VARCHAR,
    portfolio_id BIGINT,
    profile_id BIGINT,
    serving_status VARCHAR,
    start_date VARCHAR,
    state VARCHAR,
    targeting_type VARCHAR,
    budget_type VARCHAR,
    effective_budget DOUBLE
);

CREATE TABLE 'campaign_level_report_data' (
    campaign_id BIGINT,
    date_ VARCHAR,
    _fivetran_synced VARCHAR,
    campaign_applicable_budget_rule_id DOUBLE,
    campaign_applicable_budget_rule_name DOUBLE,
    campaign_bidding_strategy VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions DOUBLE,
    campaign_rule_based_budget_amount DOUBLE
);

CREATE TABLE 'keyword_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    bid DOUBLE,
    campaign_id BIGINT,
    creation_date VARCHAR,
    keyword_text VARCHAR,
    match_type VARCHAR,
    native_language_keyword DOUBLE,
    serving_status VARCHAR,
    state VARCHAR,
    native_language_locale DOUBLE
);

CREATE TABLE 'portfolio_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    budget_amount BIGINT,
    budget_currency_code VARCHAR,
    budget_end_date VARCHAR,
    budget_policy VARCHAR,
    budget_start_date VARCHAR,
    creation_date VARCHAR,
    in_budget BOOLEAN,
    name VARCHAR,
    profile_id BIGINT,
    serving_status VARCHAR,
    state VARCHAR
);

CREATE TABLE 'product_ad_history_data' (
    id BIGINT,
    last_updated_date VARCHAR,
    _fivetran_synced VARCHAR,
    ad_group_id BIGINT,
    asin VARCHAR,
    campaign_id BIGINT,
    creation_date VARCHAR,
    serving_status VARCHAR,
    sku DOUBLE,
    state VARCHAR
);

CREATE TABLE 'profile_data' (
    id BIGINT,
    _fivetran_deleted BOOLEAN,
    _fivetran_synced VARCHAR,
    account_id VARCHAR,
    account_marketplace_string_id VARCHAR,
    account_name VARCHAR,
    account_sub_type VARCHAR,
    account_type VARCHAR,
    account_valid_payment_method BOOLEAN,
    country_code VARCHAR,
    currency_code VARCHAR,
    daily_budget BIGINT,
    timezone VARCHAR
);

CREATE TABLE 'search_term_ad_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    search_term VARCHAR,
    targeting VARCHAR
);

CREATE TABLE 'targeting_keyword_report_data' (
    ad_group_id BIGINT,
    campaign_id BIGINT,
    date_ VARCHAR,
    keyword_id BIGINT,
    _fivetran_synced VARCHAR,
    ad_keyword_status VARCHAR,
    campaign_budget_amount DOUBLE,
    campaign_budget_currency_code VARCHAR,
    campaign_budget_type VARCHAR,
    clicks BIGINT,
    cost DOUBLE,
    impressions BIGINT,
    keyword_bid DOUBLE,
    keyword_type VARCHAR,
    match_type VARCHAR,
    targeting VARCHAR
);
GenAI
๐Ÿคจ I've done my best with the SQL... fingers crossed it's right!
SELECT 
    p.account_name,
    COUNT(DISTINCT pa.id) AS active_ads_count
FROM 
    profile_data p
JOIN 
    campaign_history_data ch ON p.id = ch.profile_id
JOIN 
    ad_group_history_data ag ON ch.id = ag.campaign_id
JOIN 
    product_ad_history_data pa ON ag.id = pa.ad_group_id
WHERE 
    ch.state = 'ENABLED' 
    AND ag.state = 'ENABLED'
    AND pa.state = 'ENABLED'
    AND pa.serving_status = 'ELIGIBLE'
GROUP BY 
    p.account_name
ORDER BY 
    active_ads_count DESC
LIMIT 1
GenAI
๐Ÿคจ SQL runs successful! The result is ... empty??
You
๐Ÿ˜ฅ Oops, looks like you're making up those joins and predicates!
After Cocoon
Cocoon prepares your data, making it RAG-ready โœจ
GenAI finds just what it needs, no more wild guesses! ๐ŸŽฏ
You
๐Ÿ™‚ What's the name of the account that has the most active ads?
GenAI
๐Ÿ˜Ž RAG from Cocoon: Putting the question in context using Cocoon's ER story... nice!
- relation_desc: Marketers create Account Profiles and organize Portfolios within
    them.
  relation_name: ProfilePortfolioManagement
- relation_desc: Marketers assign Campaigns to Portfolios within Account Profiles.
  relation_name: CampaignPortfolioProfileHierarchy
- relation_desc: Advertisers create Ad Groups within each Campaign.
  relation_name: CampaignAdGroups
- relation_desc: Advertisers add Keywords to Ad Groups in Campaigns.
  relation_name: CampaignAdGroupKeywordHierarchy
- relation_desc: Advertisers create Product Ads within Ad Groups and Campaigns.
  relation_name: AdCampaignHierarchy
- relation_desc: System tracks daily performance of targeted Keywords.
  relation_name: TargetedKeywordDailyPerformance
- relation_desc: System records daily performance data for search terms.
  relation_name: SearchTermAdKeywordPerformance
- relation_desc: System measures daily performance of Ads, Groups, and Campaigns.
  relation_name: CampaignAdPerformanceMetrics
๐Ÿค“ We've found the related relations: ProfilePortfolioManagement, CampaignPortfolioProfileHierarchy, CampaignAdGroups, AdCampaignHierarchy, CampaignAdPerformanceMetrics

Story behind the relationships (only for those connecting >= 2 entities)

  1. [ProfilePortfolioManagement]: Marketers create Account Profiles and organize Portfolios within them.
  2. [CampaignPortfolioProfileHierarchy]: Marketers assign Campaigns to Portfolios within Account Profiles.
  3. [CampaignAdGroups]: Advertisers create Ad Groups within each Campaign.
  4. [CampaignAdGroupKeywordHierarchy]: Advertisers add Keywords to Ad Groups in Campaigns.
  5. [AdCampaignHierarchy]: Advertisers create Product Ads within Ad Groups and Campaigns.
  6. [TargetedKeywordDailyPerformance]: System tracks daily performance of targeted Keywords.
  7. [SearchTermAdKeywordPerformance]: System records daily performance data for search terms.
  8. [CampaignAdPerformanceMetrics]: System measures daily performance of Ads, Groups, and Campaigns.
%3 CampaignAdGroups CampaignAdGroups Ad Groups Ad Groups CampaignAdGroups--Ad Groups Campaigns Campaigns CampaignAdGroups--Campaigns AdCampaignHierarchy AdCampaignHierarchy AdCampaignHierarchy--Ad Groups AdCampaignHierarchy--Campaigns Product Ads Product Ads AdCampaignHierarchy--Product Ads CampaignPortfolioProfileHierarchy CampaignPortfolioProfileHierarchy CampaignPortfolioProfileHierarchy--Campaigns Portfolios Portfolios CampaignPortfolioProfileHierarchy--Portfolios Account Profiles Account Profiles CampaignPortfolioProfileHierarchy--Account Profiles CampaignAdGroupKeywordHierarchy CampaignAdGroupKeywordHierarchy CampaignAdGroupKeywordHierarchy--Ad Groups CampaignAdGroupKeywordHierarchy--Campaigns Keywords Keywords CampaignAdGroupKeywordHierarchy--Keywords ProfilePortfolioManagement ProfilePortfolioManagement ProfilePortfolioManagement--Portfolios ProfilePortfolioManagement--Account Profiles CampaignAdPerformanceMetrics CampaignAdPerformanceMetrics CampaignAdPerformanceMetrics--Ad Groups CampaignAdPerformanceMetrics--Campaigns CampaignAdPerformanceMetrics--Product Ads SearchTermAdKeywordPerformance SearchTermAdKeywordPerformance SearchTermAdKeywordPerformance--Ad Groups SearchTermAdKeywordPerformance--Campaigns SearchTermAdKeywordPerformance--Keywords TargetedKeywordDailyPerformance TargetedKeywordDailyPerformance TargetedKeywordDailyPerformance--Ad Groups TargetedKeywordDailyPerformance--Campaigns TargetedKeywordDailyPerformance--Keywords
Reasoning: The question is asking for the account with the most active ads. In the story, this would relate to the Account Profiles that have the highest number of currently running or active advertisements. To determine this, we need to look at the relationships that connect accounts to ads and provide information about ad activity.
GenAI
๐Ÿ˜Ž RAG from Cocoon: Checking out all the related tables Cocoon set up... cool stuff!
- table_name: stg_profile_data
  table_desc: The table is about account profiles. It contains details such as account
    ID, marketplace ID, name, type, and subtype. Financial information includes daily
    budget and currency. Location data covers country code and timezone. The table
    also tracks if the account has a valid payment method and if it has been deleted.
    Each profile has a unique identifier.
  attributes: '[''profile_id'', ''is_deleted'', ''account_id'', ''marketplace_id'',
    ''account_name'', ''account_subtype'', ''account_type'', ''has_valid_payment'',
    ''country_code'', ''currency_code'', ''timezone'', ''daily_budget'']'
- table_name: stg_campaign_level_report_data
  table_desc: The table is about advertising campaign performance. It contains daily
    data for each campaign, including campaign ID, date, budget details, bidding strategy,
    and performance metrics. Key fields are campaign_id, date_, campaign_budget_amount,
    campaign_budget_type, clicks, cost, and impressions. The data appears to be for
    a single campaign (ID 2187) over multiple dates, with consistent budget settings
    but no recorded performance (zero clicks, cost, and impressions).
  attributes: '[''campaign_id'', ''bidding_strategy'', ''daily_budget_amount'', ''budget_currency'',
    ''budget_type'', ''clicks'', ''cost'', ''budget_rule_id'', ''budget_rule_name'',
    ''date_'', ''impressions'']'
- table_name: stg_ad_group_level_report_data
  table_desc: The table is about ad group level performance data. It includes details
    such as ad group ID, date, campaign bidding strategy, clicks, cost, and impressions.
    Each row represents daily performance metrics for a specific ad group. The data
    spans multiple dates and shows various performance indicators for advertising
    campaigns.
  attributes: '[''ad_group_id'', ''bidding_strategy'', ''clicks'', ''cost'', ''impressions'',
    ''date_'']'
- table_name: stg_advertised_product_report_data
  table_desc: The table is about advertised product report data. It contains details
    of ad campaigns including ad group ID, ad ID, campaign ID, date, budget amount,
    budget currency, budget type, clicks, cost, impressions, and advertised ASIN/SKU.
    The data shows daily performance metrics for different ads across various campaigns,
    tracking their budget, spend, and engagement.
  attributes: '[''ad_group_id'', ''ad_id'', ''campaign_id'', ''daily_budget_amount'',
    ''budget_currency'', ''budget_type'', ''clicks'', ''ad_spend'', ''impressions'',
    ''report_date'']'
- table_name: snapshot_ad_group_history_data
  table_desc: The table is about current ad group data. It tracks the most recent
    version of each ad group. The table includes details such as the ad group's ID,
    campaign ID, name, default bid, serving status, and state. Each row represents
    the latest state of an ad group. The creation_date shows when the ad group was
    initially created. This table provides a current snapshot of ad groups without
    historical changes.
  attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
    ''serving_status'', ''ad_group_state'', ''creation_date'']'
  history_table:
    attributes: '[''ad_group_id'', ''campaign_id'', ''default_bid'', ''ad_group_name'',
      ''serving_status'', ''ad_group_state'', ''creation_date'', ''last_updated_date'']'
    table_name: stg_ad_group_history_data
- table_name: snapshot_product_ad_history_data
  table_desc: The table is about current product ad status. It tracks the most recent
    version of each ad's information. Each row represents the latest state of a unique
    ad, identified by its ad_id. The table includes details like ad group ID, ASIN,
    campaign ID, current serving status, and state. It provides a snapshot of the
    current ad configurations without historical changes.
  attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
    ''ad_state'', ''creation_timestamp'', ''product_sku'']'
  history_table:
    attributes: '[''ad_id'', ''ad_group_id'', ''product_asin'', ''campaign_id'', ''ad_serving_status'',
      ''ad_state'', ''creation_timestamp'', ''last_updated_timestamp'', ''product_sku'']'
    table_name: stg_product_ad_history_data
- table_name: snapshot_campaign_history_data
  table_desc: The table is about current campaign data. It tracks the most recent
    version of each campaign, including its bidding strategy, name, portfolio, profile,
    status, state, targeting type, budget details, and relevant dates. Each row represents
    the latest snapshot of a unique campaign, providing an up-to-date view of campaign
    configurations without historical versions.
  attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
    ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'', ''budget_type'',
    ''creation_datetime'', ''daily_budget'', ''end_date'', ''start_date'']'
  history_table:
    attributes: '[''campaign_id'', ''bidding_strategy'', ''campaign_name'', ''portfolio_id'',
      ''profile_id'', ''serving_status'', ''campaign_state'', ''targeting_type'',
      ''budget_type'', ''creation_datetime'', ''daily_budget'', ''end_date'', ''last_updated_datetime'',
      ''start_date'']'
    table_name: stg_campaign_history_data
- table_name: snapshot_portfolio_history_data
  table_desc: The table is about current portfolio data. It tracks the most recent
    version of each portfolio, including budget amounts, currencies, policies, and
    statuses. Key details include portfolio name, profile ID, serving status, state,
    and date ranges. The table provides a snapshot of active portfolios, their current
    budget allocations, and operational states without historical versions.
  attributes: '[''portfolio_id'', ''budget_amount'', ''currency'', ''budget_policy'',
    ''is_within_budget'', ''portfolio_name'', ''profile_id'', ''serving_status'',
    ''portfolio_state'', ''creation_timestamp'', ''end_date'', ''start_date'']'
  history_table:
    attributes: '[''portfolio_id'', ''budget_amount'', ''currency'', ''budget_policy'',
      ''is_within_budget'', ''portfolio_name'', ''profile_id'', ''serving_status'',
      ''portfolio_state'', ''creation_timestamp'', ''end_date'', ''last_updated_timestamp'',
      ''start_date'']'
    table_name: stg_portfolio_history_data
๐Ÿค“ We've identified the related tables: snapshot_product_ad_history_data, snapshot_campaign_history_data, stg_profile_data
%3 snapshot_portfolio_history_data snapshot_portfolio_history_data stg_profile_data stg_profile_data snapshot_portfolio_history_data->stg_profile_data snapshot_campaign_history_data snapshot_campaign_history_data snapshot_campaign_history_data->snapshot_portfolio_history_data snapshot_campaign_history_data->stg_profile_data stg_advertised_product_report_data stg_advertised_product_report_data stg_advertised_product_report_data->snapshot_campaign_history_data snapshot_product_ad_history_data snapshot_product_ad_history_data stg_advertised_product_report_data->snapshot_product_ad_history_data snapshot_ad_group_history_data snapshot_ad_group_history_data stg_advertised_product_report_data->snapshot_ad_group_history_data stg_targeting_keyword_report_data stg_targeting_keyword_report_data stg_targeting_keyword_report_data->snapshot_campaign_history_data snapshot_keyword_history_data snapshot_keyword_history_data stg_targeting_keyword_report_data->snapshot_keyword_history_data stg_targeting_keyword_report_data->snapshot_ad_group_history_data stg_search_term_ad_keyword_report_data stg_search_term_ad_keyword_report_data stg_search_term_ad_keyword_report_data->snapshot_campaign_history_data stg_search_term_ad_keyword_report_data->snapshot_keyword_history_data stg_search_term_ad_keyword_report_data->snapshot_ad_group_history_data stg_ad_group_level_report_data stg_ad_group_level_report_data stg_ad_group_level_report_data->snapshot_ad_group_history_data snapshot_keyword_history_data->snapshot_campaign_history_data snapshot_keyword_history_data->snapshot_ad_group_history_data snapshot_product_ad_history_data->snapshot_campaign_history_data snapshot_product_ad_history_data->snapshot_ad_group_history_data stg_campaign_level_report_data stg_campaign_level_report_data stg_campaign_level_report_data->snapshot_campaign_history_data snapshot_ad_group_history_data->snapshot_campaign_history_data
๐Ÿง  Explanation: The available data is sufficient to answer the business question "What's the name of the account that has the most active ads?". We can approach this by joining relevant tables and counting active ads for each account. Here's how we can do it: 1. We need to identify active ads. The 'snapshot_product_ad_history_data' table contains information about ads, including their status. 2. We need to link ads to accounts. This can be done by joining through campaigns and profiles to the 'stg_profile_data' table which contains account information. 3. We'll count active ads for each account and find the account with the highest count. 4. We'll use the most recent data, so we'll use snapshot tables rather than their historical counterparts.
๐Ÿ’ก Instruction to write SQL: 1. Start with 'snapshot_product_ad_history_data' and filter for active ads (where ad_state = 'enabled' or similar). 2. Join this with 'snapshot_campaign_history_data' using campaign_id to get the profile_id. 3. Join the result with 'stg_profile_data' using profile_id to get account information. 4. Group by account_id and account_name, count the number of ads. 5. Order by the count descending and limit to 1 to get the account with the most active ads. The SQL query might look like this: SELECT p.account_id, p.account_name, COUNT(DISTINCT pa.ad_id) as active_ad_count FROM snapshot_product_ad_history_data pa JOIN snapshot_campaign_history_data c ON pa.campaign_id = c.campaign_id JOIN stg_profile_data p ON c.profile_id = p.profile_id WHERE pa.ad_state = 'enabled' GROUP BY p.account_id, p.account_name ORDER BY active_ad_count DESC LIMIT 1;
GenAI
๐Ÿ˜Ž RAG from Cocoon: Looking at how tables connect, thanks to Cocoon's key info... clever!
snapshot_campaign_history_data:
  foreign_keys:
  - column: portfolio_id
    reference:
      column: portfolio_id
      table_name: snapshot_portfolio_history_data
  - column: profile_id
    reference:
      column: profile_id
      table_name: stg_profile_data
  primary_key: campaign_id
snapshot_product_ad_history_data:
  foreign_keys:
  - column: ad_group_id
    reference:
      column: ad_group_id
      table_name: snapshot_ad_group_history_data
  - column: campaign_id
    reference:
      column: campaign_id
      table_name: snapshot_campaign_history_data
  primary_key: ad_id
stg_profile_data:
  foreign_keys: []
  primary_key: profile_id
๐Ÿค“ We've planned the join: To execute the SQL, two joins are needed. For the first join, the join keys are campaign_id between 'snapshot_product_ad_history_data' and 'snapshot_campaign_history_data'. It is an inner join. For the second join, the join keys are profile_id between the result of the first join and 'stg_profile_data'. It is also an inner join. Both joins are likely many-to-one relationships, as multiple ads can belong to a campaign, and multiple campaigns can belong to a profile. This shouldn't cause issues with the count distinct operation on ad_id.
GenAI
๐Ÿ˜Ž RAG from Cocoon: Diving into column details Cocoon prepared... getting a bit technical here!
snapshot_campaign_history_data:
- name: campaign_id
  description: Unique identifier for the campaign
- name: profile_id
  description: ID of the profile associated with the campaign
snapshot_product_ad_history_data:
- name: ad_id
  description: Unique identifier for the ad
- name: campaign_id
  description: Identifier for the advertising campaign
- name: ad_state
  description: Current state of the ad (enabled or disabled)
  domain: '[''enabled'', ''disabled'']'
stg_profile_data:
- name: profile_id
  description: Unique identifier for the profile
- name: account_id
  description: Unique identifier for the account
- name: account_name
  description: Name of the account
  domain: '[''Ahsoka'', ''Amidala'', ''Chewbacca'', ''Kenobi'', ''Organa'', ''Palpatine'',
    ''Skywalker'', ''Solo'', ''Windu'', ''Yoda'']'
๐Ÿค” Reasoning for column selection: Based on the SQL approach and join description, we need to select specific columns from each table to fulfill the requirements: 1. From 'snapshot_product_ad_history_data', we need 'ad_id' for the count distinct operation, 'campaign_id' for joining with 'snapshot_campaign_history_data', and 'ad_state' for filtering active ads. 2. From 'snapshot_campaign_history_data', we need 'campaign_id' to join with 'snapshot_product_ad_history_data' and 'profile_id' to join with 'stg_profile_data'. 3. From 'stg_profile_data', we need 'profile_id' for joining, and 'account_id' and 'account_name' for the final output grouping and display. These columns cover all the necessary join keys, filtering conditions, and output requirements specified in the SQL approach.
GenAI
๐Ÿ˜Š We have written the SQL:
SELECT 
    p.account_id,
    p.account_name,
    COUNT(DISTINCT pa.ad_id) as active_ad_count
FROM 
    snapshot_product_ad_history_data pa
INNER JOIN 
    snapshot_campaign_history_data c ON pa.campaign_id = c.campaign_id
INNER JOIN 
    stg_profile_data p ON c.profile_id = p.profile_id
WHERE 
    pa.ad_state = 'enabled'
GROUP BY 
    p.account_id, p.account_name
ORDER BY 
    active_ad_count DESC
LIMIT 1;
GenAI
โœ… SQL runs successful! Here are the samples (first 5 rows):
account_id account_name active_ad_count
0 ENTITY31175 Skywalker 1
You
๐Ÿซก Nailed it! Thanks a bunch!
โŸณ New chat results may be available. Refresh to check.