Fivetran Apple Store

We display the source tables from the data warehouses to model.

app (first 100 rows)

id is_enabled name asset_token pre_order_info icon_url app_opt_in_rate ios tvos is_bundle _fivetran_synced
0 12345 True Super Cool Name Random Asset Token NaN Random Icon URL 10 True False False 2022-01-02 12:34:56.789000+00:00

app_store_platform_version_source_type (first 100 rows)

app_id date_ platform_version source_type meets_threshold impressions _fivetran_synced impressions_unique_device page_views page_views_unique_device
0 12345 2021-08-01 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
1 12345 2021-07-01 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
2 12345 2021-08-02 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
3 12345 2021-08-03 00:00:00+00:00 iOS 1.0 App Referrer True 30 2022-01-02 12:34:56.789000+00:00 28 30 28
4 12345 2021-08-04 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
5 12345 2021-04-18 00:00:00+00:00 iOS 1.0 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
6 12345 2021-04-19 00:00:00+00:00 iOS 1.0 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
7 12345 2020-12-04 00:00:00+00:00 iOS 1.0 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
8 12345 2021-01-31 00:00:00+00:00 iOS 1.0 Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
9 12345 2020-12-25 00:00:00+00:00 iOS 1.0 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0

app_store_source_type_device (first 100 rows)

app_id date_ device source_type meets_threshold impressions _fivetran_synced impressions_unique_device page_views page_views_unique_device
0 12345 2021-10-03 00:00:00+00:00 iPhone Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
1 12345 2021-08-23 00:00:00+00:00 iPad App Store Browse True 8 2022-01-02 12:34:56.789000+00:00 5 5 4
2 12345 2021-10-06 00:00:00+00:00 iPhone App Store Search True 1210 2022-01-02 12:34:56.789000+00:00 732 146 103
3 12345 2021-03-17 00:00:00+00:00 iPhone App Store Search True 1757 2022-01-02 12:34:56.789000+00:00 1113 209 151
4 12345 2021-02-24 00:00:00+00:00 Desktop Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
5 12345 2021-06-06 00:00:00+00:00 iPad Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
6 12345 2020-11-15 00:00:00+00:00 iPad Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
7 12345 2021-06-24 00:00:00+00:00 iPad Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
8 12345 2021-03-17 00:00:00+00:00 Desktop App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
9 12345 2021-07-12 00:00:00+00:00 Desktop Institutional Purchase True 0 2022-01-02 12:34:56.789000+00:00 0 0 0

app_store_territory_source_type (first 100 rows)

app_id date_ source_type territory meets_threshold impressions _fivetran_synced impressions_unique_device page_views page_views_unique_device
0 12345 2021-09-17 00:00:00+00:00 App Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
1 12345 2020-12-29 00:00:00+00:00 App Store Search Canada True 1 2022-01-02 12:34:56.789000+00:00 1 0 0
2 12345 2021-01-26 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
3 12345 2020-12-04 00:00:00+00:00 App Store Browse Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
4 12345 2021-06-09 00:00:00+00:00 Web Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
5 12345 2021-10-05 00:00:00+00:00 App Store Search Canada True 47 2022-01-02 12:34:56.789000+00:00 30 5 4
6 12345 2021-05-21 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
7 12345 2021-05-09 00:00:00+00:00 App Store Browse Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
8 12345 2021-05-18 00:00:00+00:00 App Store Search Canada True 3 2022-01-02 12:34:56.789000+00:00 2 1 1
9 12345 2021-10-21 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
10 12345 2021-10-23 00:00:00+00:00 App Store Search Kosovo True 0 2022-01-02 12:34:56.789000+00:00 0 0 0
11 12345 2021-11-24 00:00:00+00:00 App Store Search Kosovo True 0 2022-01-02 12:34:56.789000+00:00 0 0 3
12 12345 2021-10-24 00:00:00+00:00 App Store Search Côte d'Ivoire True 0 2022-01-02 12:34:56.789000+00:00 0 0 3
13 12345 2021-10-26 00:00:00+00:00 App Store Search Cote d'Ivoire True 0 2022-01-02 12:34:56.789000+00:00 0 0 3
14 12345 2021-11-26 00:00:00+00:00 App Store Search Cote d'Ivoire True 0 2022-01-02 12:34:56.789000+00:00 0 0 3
15 12345 2021-11-26 00:00:00+00:00 App Store Search Turkey True 0 2022-01-02 12:34:56.789000+00:00 0 0 3
16 12345 2021-11-26 00:00:00+00:00 App Store Search Türkiye True 0 2022-01-02 12:34:56.789000+00:00 0 0 3

crashes_app_version (first 100 rows)

app_id app_version date_ device meets_threshold crashes _fivetran_synced
0 12345 1.0.0 (iOS) 2021-01-22 00:00:00+00:00 iPod True 0 2022-01-02 12:34:56.789000+00:00
1 12345 1.0.0 (iOS) 2021-01-07 00:00:00+00:00 iPod True 0 2022-01-02 12:34:56.789000+00:00
2 12345 1.0.0 (iOS) 2020-11-15 00:00:00+00:00 iPod True 0 2022-01-02 12:34:56.789000+00:00
3 12345 1.0.0 (iOS) 2021-06-18 00:00:00+00:00 iPhone True 0 2022-01-02 12:34:56.789000+00:00
4 12345 1.0.0 (iOS) 2021-07-30 00:00:00+00:00 Desktop True 0 2022-01-02 12:34:56.789000+00:00
5 12345 1.0.0 (iOS) 2021-05-02 00:00:00+00:00 iPad True 0 2022-01-02 12:34:56.789000+00:00
6 12345 1.0.0 (iOS) 2021-06-19 00:00:00+00:00 iPhone True 0 2022-01-02 12:34:56.789000+00:00
7 12345 1.0.0 (iOS) 2020-12-05 00:00:00+00:00 Desktop True 0 2022-01-02 12:34:56.789000+00:00
8 12345 1.0.0 (iOS) 2020-12-21 00:00:00+00:00 iPod True 0 2022-01-02 12:34:56.789000+00:00
9 12345 1.0.0 (iOS) 2021-04-12 00:00:00+00:00 iPod True 0 2022-01-02 12:34:56.789000+00:00

crashes_platform_version (first 100 rows)

app_id date_ device platform_version meets_threshold crashes _fivetran_synced
0 12345 2021-05-10 00:00:00+00:00 iPad iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
1 12345 2021-05-13 00:00:00+00:00 iPad iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
2 12345 2021-05-12 00:00:00+00:00 iPod iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
3 12345 2021-05-11 00:00:00+00:00 iPod iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
4 12345 2021-07-25 00:00:00+00:00 iPhone iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
5 12345 2021-09-11 00:00:00+00:00 iPad iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
6 12345 2021-04-10 00:00:00+00:00 iPhone iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
7 12345 2021-04-11 00:00:00+00:00 iPhone iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
8 12345 2021-01-03 00:00:00+00:00 iPhone iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00
9 12345 2021-05-28 00:00:00+00:00 iPad iOS 1.0 True 0 2022-01-02 12:34:56.789000+00:00

downloads_platform_version_source_type (first 100 rows)

app_id date_ platform_version source_type meets_threshold first_time_downloads _fivetran_synced redownloads total_downloads
0 12345 2021-04-05 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0
1 12345 2021-05-27 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
2 12345 2021-04-06 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0
3 12345 2021-05-28 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
4 12345 2021-04-27 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0
5 12345 2021-05-11 00:00:00+00:00 iOS 1.0 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0
6 12345 2021-05-29 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
7 12345 2021-04-26 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0
8 12345 2021-06-17 00:00:00+00:00 iOS 1.0 Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
9 12345 2021-05-30 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0

downloads_source_type_device (first 100 rows)

app_id date_ device source_type meets_threshold first_time_downloads _fivetran_synced redownloads total_downloads
0 12345 2021-09-04 00:00:00+00:00 iPhone Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
1 12345 2022-01-07 00:00:00+00:00 iPad App Store Browse True 1 2022-01-02 12:34:56.789000+00:00 7 8
2 12345 2021-07-14 00:00:00+00:00 iPhone App Store Search True 153 2022-01-02 12:34:56.789000+00:00 0 153
3 12345 2021-05-01 00:00:00+00:00 iPhone App Store Browse True 1 2022-01-02 12:34:56.789000+00:00 0 1
4 12345 2021-09-15 00:00:00+00:00 Desktop Institutional Purchase True 0 2022-01-02 12:34:56.789000+00:00 0 0
5 12345 2021-08-31 00:00:00+00:00 iPod App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0
6 12345 2021-06-18 00:00:00+00:00 iPod App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0
7 12345 2021-04-02 00:00:00+00:00 iPod Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
8 12345 2022-03-08 00:00:00+00:00 Desktop Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0
9 12345 2021-09-02 00:00:00+00:00 iPhone App Referrer True 5 2022-01-02 12:34:56.789000+00:00 3 8

downloads_territory_source_type (first 100 rows)

app_id date_ source_type territory meets_threshold first_time_downloads _fivetran_synced redownloads total_downloads
0 12345 2021-10-25 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
1 12345 2021-05-08 00:00:00+00:00 Web Referrer Canada True 1 2022-01-02 12:34:56.789000+00:00 0 1
2 12345 2021-09-30 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
3 12345 2021-11-11 00:00:00+00:00 App Store Search Canada True 1 2022-01-02 12:34:56.789000+00:00 0 1
4 12345 2022-01-10 00:00:00+00:00 Web Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
5 12345 2021-06-18 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
6 12345 2021-11-27 00:00:00+00:00 Web Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
7 12345 2021-07-15 00:00:00+00:00 App Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
8 12345 2021-04-07 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0
9 12345 2021-06-05 00:00:00+00:00 App Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0

sales_account (first 100 rows)

id name _fivetran_synced
0 12345 Super Cool Name 2022-01-02 12:34:56.789000+00:00

sales_subscription_events (first 100 rows)

_filename account_number vendor_number _index event_date app_name days_canceled subscription_name consecutive_paid_periods previous_subscription_name cancellation_reason proceeds_reason subscription_apple_id standard_subscription_duration original_start_date device days_before_canceling quantity marketing_opt_in_duration promotional_offer_name state previous_subscription_apple_id event subscription_group_id country promotional_offer_id app_apple_id _fivetran_synced subscription_offer_type subscription_offer_duration
0 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 9 2021-04-25 Super Cool Name Super Cool Name 3 NaN None 12345 1 Month 2021-02-25 iPhone 1 NJ Renew 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
1 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 25 2021-04-25 Super Cool Name Super Cool Name 14 NaN Rate After One Year 12345 1 Month 2020-03-25 iPhone 1 FL Renew 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
2 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 41 2021-04-25 Super Cool Name Super Cool Name 1 NaN Canceled None 12345 1 Year 2020-04-25 iPhone 176 1 CA Cancel 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
3 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 57 2021-04-25 Super Cool Name Super Cool Name 2 NaN None 12345 1 Month 2021-03-25 iPhone 1 CA Renew 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
4 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 73 2021-04-25 Super Cool Name Super Cool Name 3 NaN Rate After One Year 12345 1 Year 2019-04-25 iPhone 1 Renew 12345 GB 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
5 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 89 2021-04-25 Super Cool Name Super Cool Name 13 NaN Rate After One Year 12345 1 Month 2020-04-24 iPhone 1 ON Renew 12345 CA 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
6 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 105 2021-04-25 Super Cool Name Super Cool Name 1 NaN Canceled None 12345 1 Year 2020-04-25 iPad 70 1 CA Cancel 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
7 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 121 2021-04-25 Super Cool Name Super Cool Name 13 NaN Rate After One Year 12345 1 Month 2020-04-25 iPad 1 Renew 12345 AT 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
8 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 137 2021-04-25 Super Cool Name Super Cool Name 1 NaN None 12345 1 Month 2021-04-25 iPhone 1 MO Subscribe 12345 US 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN
9 Subscription_Event_12345678_20220102_V1_2.txt.gz 12345 12345 153 2021-04-25 Super Cool Name Super Cool Name 3 NaN None 12345 1 Month 2021-02-25 iPhone 1 KA Renew 12345 IN 12345 2022-01-02 12:34:56.789000+00:00 NaN NaN

sales_subscription_summary (first 100 rows)

_filename account_number vendor_number _index developer_proceeds app_name free_trial_promotional_offer_subscriptions proceeds_currency subscription_name pay_as_you_go_promotional_offer_subscriptions customer_currency marketing_opt_ins pay_up_front_promotional_offer_subscriptions billing_retry proceeds_reason subscription_apple_id active_standard_price_subscriptions standard_subscription_duration grace_period device active_pay_up_front_introductory_offer_subscriptions customer_price promotional_offer_name state active_pay_as_you_go_introductory_offer_subscriptions subscription_group_id country active_free_trial_introductory_offer_subscriptions promotional_offer_id app_apple_id _fivetran_synced
0 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 9 24.72 Super Cool Name 0 EUR Super Cool Name 0 EUR 0 0 0 None 12345 30 1 Year 0 iPhone 0 42.99 0 12345 FR 0 12345 2022-01-02 12:34:56.789000+00:00
1 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 25 24.27 Super Cool Name 0 EUR Super Cool Name 0 EUR 0 0 0 None 12345 2 1 Year 0 iPhone 0 42.99 0 12345 FI 0 12345 2022-01-02 12:34:56.789000+00:00
2 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 41 24.20 Super Cool Name 0 Super Cool Name 0 EUR 0 0 0 None 12345 1 1 Year 0 iPhone 0 42.99 MI 0 12345 IT 0 12345 2022-01-02 12:34:56.789000+00:00
3 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 57 3.50 Super Cool Name 0 USD Super Cool Name 0 USD 0 0 0 None 12345 5 1 Month 0 iPhone 0 4.99 MS 0 12345 US 0 12345 2022-01-02 12:34:56.789000+00:00
4 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 73 3.50 Super Cool Name 0 Super Cool Name 0 USD 0 0 2 None 12345 7 1 Month 0 iPhone 0 4.99 NJ 0 12345 US 0 12345 2022-01-02 12:34:56.789000+00:00
5 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 89 31182.00 Super Cool Name 0 KRW Super Cool Name 0 KRW 0 0 0 None 12345 1 1 Year 0 iPhone 0 49000.00 부산 0 12345 KR 0 12345 2022-01-02 12:34:56.789000+00:00
6 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 105 3.50 Super Cool Name 0 Super Cool Name 0 USD 0 0 0 None 12345 1 1 Month 0 iPhone 0 4.99 AA 0 12345 US 0 12345 2022-01-02 12:34:56.789000+00:00
7 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 121 33.99 Super Cool Name 0 Super Cool Name 0 USD 0 0 0 Rate After One Year 12345 6 1 Year 0 iPhone 0 39.99 DC 0 12345 US 0 12345 2022-01-02 12:34:56.789000+00:00
8 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 137 3.50 Super Cool Name 0 Super Cool Name 0 USD 0 0 0 None 12345 6 1 Month 0 iPhone 0 4.99 AZ 0 12345 US 0 12345 2022-01-02 12:34:56.789000+00:00
9 Subscription_12345678_20210712_V1_2.txt.gz 12345 12345 153 29.67 Super Cool Name 0 EUR Super Cool Name 0 EUR 0 0 0 Rate After One Year 12345 1 1 Year 0 iPhone 0 42.99 PO 0 12345 IT 0 12345 2022-01-02 12:34:56.789000+00:00

usage_app_version_source_type (first 100 rows)

app_id app_version date_ source_type meets_threshold installations _fivetran_synced sessions active_devices active_devices_last_30_days deletions
0 12345 1.0.0 (iOS) 2021-03-20 00:00:00+00:00 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 1 0
1 12345 1.0.0 (iOS) 2021-08-08 00:00:00+00:00 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
2 12345 1.0.0 (iOS) 2021-09-25 00:00:00+00:00 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
3 12345 1.0.0 (iOS) 2021-03-26 00:00:00+00:00 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
4 12345 1.0.0 (iOS) 2020-11-03 00:00:00+00:00 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
5 12345 1.0.0 (iOS) 2021-07-05 00:00:00+00:00 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
6 12345 1.0.0 (iOS) 2021-04-01 00:00:00+00:00 Institutional Purchase True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
7 12345 1.0.0 (iOS) 2021-07-10 00:00:00+00:00 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
8 12345 1.0.0 (iOS) 2021-06-08 00:00:00+00:00 Unavailable True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
9 12345 1.0.0 (iOS) 2021-10-20 00:00:00+00:00 Institutional Purchase True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0

usage_platform_version_source_type (first 100 rows)

app_id date_ platform_version source_type meets_threshold installations _fivetran_synced sessions active_devices active_devices_last_30_days deletions
0 12345 2021-06-10 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
1 12345 2021-08-02 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
2 12345 2021-08-01 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
3 12345 2021-08-03 00:00:00+00:00 iOS 1.0 App Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
4 12345 2021-04-18 00:00:00+00:00 iOS 1.0 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
5 12345 2021-04-17 00:00:00+00:00 iOS 1.0 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
6 12345 2021-04-19 00:00:00+00:00 iOS 1.0 App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 2 0
7 12345 2021-07-01 00:00:00+00:00 iOS 1.0 App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 1 0
8 12345 2021-01-31 00:00:00+00:00 iOS 1.0 Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
9 12345 2021-01-30 00:00:00+00:00 iOS 1.0 Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 14 1

usage_source_type_device (first 100 rows)

app_id date_ device source_type meets_threshold installations _fivetran_synced sessions active_devices active_devices_last_30_days deletions
0 12345 2020-11-27 00:00:00+00:00 iPhone App Store Browse True 6 2022-01-02 12:34:56.789000+00:00 58 34 406 3
1 12345 2021-01-04 00:00:00+00:00 Desktop App Store Search True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
2 12345 2021-05-18 00:00:00+00:00 iPad App Referrer True 1 2022-01-02 12:34:56.789000+00:00 8 5 60 0
3 12345 2021-09-10 00:00:00+00:00 iPod App Store Browse True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
4 12345 2021-02-19 00:00:00+00:00 iPad App Store Browse True 1 2022-01-02 12:34:56.789000+00:00 10 3 55 1
5 12345 2021-07-13 00:00:00+00:00 iPod Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 1 0
6 12345 2020-12-22 00:00:00+00:00 iPod Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 1 0
7 12345 2021-07-31 00:00:00+00:00 iPod Web Referrer True 0 2022-01-02 12:34:56.789000+00:00 0 0 1 0
8 12345 2021-01-09 00:00:00+00:00 iPad Web Referrer True 6 2022-01-02 12:34:56.789000+00:00 41 21 273 1
9 12345 2021-02-25 00:00:00+00:00 iPhone App Store Browse True 3 2022-01-02 12:34:56.789000+00:00 65 27 374 5

usage_territory_source_type (first 100 rows)

app_id date_ source_type territory meets_threshold installations _fivetran_synced sessions active_devices active_devices_last_30_days deletions
0 12345 2021-05-21 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 4 3 44 2
1 12345 2020-12-28 00:00:00+00:00 Unavailable Canada True 1 2022-01-02 12:34:56.789000+00:00 0 0 2 0
2 12345 2021-06-07 00:00:00+00:00 Web Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
3 12345 2021-09-22 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 10 5 42 2
4 12345 2021-07-02 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 1 1 3 0
5 12345 2021-03-16 00:00:00+00:00 App Store Browse Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 2 0
6 12345 2021-09-30 00:00:00+00:00 Web Referrer Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
7 12345 2020-11-23 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 3 1 2 0
8 12345 2021-02-13 00:00:00+00:00 Unavailable Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
9 12345 2021-09-09 00:00:00+00:00 App Store Search Canada True 0 2022-01-02 12:34:56.789000+00:00 0 0 0 0
Source tables may have typos, unclear names, incorrect column types, etc. We clean these tables.

stg_downloads_source_type_device (first 100 rows)

device_type download_source threshold_met new_downloads redownloads total_downloads app_id timestamp_
0 iPhone Web Referrer True 0 0 0 12345 2021-09-04
1 iPad App Store Browse True 1 7 8 12345 2022-01-07
2 iPhone App Store Search True 153 0 153 12345 2021-07-14
3 iPhone App Store Browse True 1 0 1 12345 2021-05-01
4 Desktop Institutional Purchase True 0 0 0 12345 2021-09-15
5 iPod App Store Search True 0 0 0 12345 2021-08-31
6 iPod App Store Browse True 0 0 0 12345 2021-06-18
7 iPod Web Referrer True 0 0 0 12345 2021-04-02
8 Desktop Web Referrer True 0 0 0 12345 2022-03-08
9 iPhone App Referrer True 5 3 8 12345 2021-09-02

stg_downloads_source_type_device.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"downloads_source_type_device_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "device",
        "source_type",
        "meets_threshold",
        "first_time_downloads",
        "redownloads",
        "total_downloads"
    FROM "downloads_source_type_device"
),

"downloads_source_type_device_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> timestamp_
    -- device -> device_type
    -- source_type -> download_source
    -- meets_threshold -> threshold_met
    -- first_time_downloads -> new_downloads
    SELECT 
        app_id,
        date_ AS timestamp_,
        device AS device_type,
        source_type AS download_source,
        meets_threshold AS threshold_met,
        first_time_downloads AS new_downloads,
        redownloads,
        total_downloads
    FROM downloads_source_type_device_projected
),

"downloads_source_type_device_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- timestamp_: from VARCHAR to TIMESTAMP
    SELECT
        "device_type",
        "download_source",
        "threshold_met",
        "new_downloads",
        "redownloads",
        "total_downloads",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("timestamp_" AS TIMESTAMP) AS "timestamp_"
    FROM "downloads_source_type_device_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "downloads_source_type_device_projected_renamed_casted"

stg_downloads_source_type_device.yml (Document the table)

version: 2
models:
- name: stg_downloads_source_type_device
  description: The table is about app downloads statistics. It includes details like
    app ID, date, device type, download source, whether it meets a threshold, number
    of first-time downloads, redownloads, and total downloads. The data is broken
    down by different source types (e.g., App Store Browse, Web Referrer) and devices
    (e.g., iPhone, iPad).
  columns:
  - name: device_type
    description: Type of device used for download
    tests:
    - not_null
    - accepted_values:
        values:
        - iPhone
        - iPod
        - Desktop
        - iPad
        - Android Phone
        - Android Tablet
        - Windows Phone
        - Smart TV
        - Gaming Console
        - eReader
        - Smartwatch
        - Other
  - name: download_source
    description: Source or channel of the download
    tests:
    - not_null
    - accepted_values:
        values:
        - App Store Browse
        - Web Referrer
        - App Store Search
        - App Referrer
        - Institutional Purchase
        - Google Play Store Browse
        - Google Play Store Search
        - App Store Update
        - Google Play Store Update
        - Direct Link
        - Email Campaign
        - Social Media
        - In-App Advertisement
        - Push Notification
        - App Store Featured
        - Google Play Store Featured
        - Third-Party App Store
        - Pre-installed
        - QR Code Scan
        - SMS Link
  - name: threshold_met
    description: Indicates if download count meets a certain threshold
    tests:
    - not_null
  - name: new_downloads
    description: Number of first-time downloads
    tests:
    - not_null
  - name: redownloads
    description: Number of redownloads by existing users
    tests:
    - not_null
  - name: total_downloads
    description: Total number of downloads including first-time and redownloads
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: timestamp_
    description: Date and time of the download statistics
    tests:
    - not_null

stg_downloads_platform_version_source_type (first 100 rows)

download_source first_time_downloads redownloads total_downloads platform_version meets_threshold app_id statistics_date
0 App Store Search 0 0 0 iOS 1.0 True 12345 2021-04-05
1 App Referrer 0 0 0 iOS 1.0 True 12345 2021-05-27
2 App Store Search 0 0 0 iOS 1.0 True 12345 2021-04-06
3 App Referrer 0 0 0 iOS 1.0 True 12345 2021-05-28
4 App Store Search 0 0 0 iOS 1.0 True 12345 2021-04-27
5 None 0 0 0 iOS 1.0 True 12345 2021-05-11
6 App Referrer 0 0 0 iOS 1.0 True 12345 2021-05-29
7 App Store Search 0 0 0 iOS 1.0 True 12345 2021-04-26
8 Web Referrer 0 0 0 iOS 1.0 True 12345 2021-06-17
9 App Referrer 0 0 0 iOS 1.0 True 12345 2021-05-30

stg_downloads_platform_version_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"downloads_platform_version_source_type_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "platform_version",
        "source_type",
        "meets_threshold",
        "first_time_downloads",
        "redownloads",
        "total_downloads"
    FROM "downloads_platform_version_source_type"
),

"downloads_platform_version_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> statistics_date
    -- source_type -> download_source
    SELECT 
        app_id,
        date_ AS statistics_date,
        platform_version,
        source_type AS download_source,
        meets_threshold,
        first_time_downloads,
        redownloads,
        total_downloads
    FROM downloads_platform_version_source_type_projected
),

"downloads_platform_version_source_type_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- download_source: ['Unavailable']
    SELECT 
        CASE
            WHEN "download_source" = 'Unavailable' THEN NULL
            ELSE "download_source"
        END AS "download_source",
        "statistics_date",
        "first_time_downloads",
        "redownloads",
        "app_id",
        "total_downloads",
        "platform_version",
        "meets_threshold"
    FROM "downloads_platform_version_source_type_projected_renamed"
),

"downloads_platform_version_source_type_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- statistics_date: from VARCHAR to TIMESTAMP
    SELECT
        "download_source",
        "first_time_downloads",
        "redownloads",
        "total_downloads",
        "platform_version",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("statistics_date" AS TIMESTAMP) AS "statistics_date"
    FROM "downloads_platform_version_source_type_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "downloads_platform_version_source_type_projected_renamed_null_casted"

stg_downloads_platform_version_source_type.yml (Document the table)

version: 2
models:
- name: stg_downloads_platform_version_source_type
  description: The table is about app download statistics. It includes details like
    app ID, date, platform version, source type, and download counts. Each row represents
    download data for a specific app on a particular date, platform version, and source
    type. The table tracks first-time downloads, redownloads, and total downloads,
    along with whether the data meets a certain threshold.
  columns:
  - name: download_source
    description: Source of the app download
    tests:
    - not_null
    - accepted_values:
        values:
        - App Referrer
        - App Store Search
        - Web Referrer
        - Google Play Store
        - Apple App Store
        - Direct Download
        - In-App Advertisement
        - Social Media Link
        - Email Campaign
        - QR Code
        - SMS Link
        - Third-Party App Store
        - Pre-installed
        - Developer Website
        - Affiliate Link
  - name: first_time_downloads
    description: Number of first-time downloads for the app
    tests:
    - not_null
  - name: redownloads
    description: Number of times the app was redownloaded
    tests:
    - not_null
  - name: total_downloads
    description: Total number of downloads including first-time and redownloads
    tests:
    - not_null
  - name: platform_version
    description: Version of the operating system platform
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if the data meets a certain threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: statistics_date
    description: Date of the download statistics
    tests:
    - not_null

stg_app_store_territory_source_type (first 100 rows)

discovery_source unique_device_impressions page_views unique_device_page_views impressions territory meets_threshold app_id metric_date
0 App Referrer 0 0 0 0 Canada True 12345 2021-09-17
1 App Store Search 1 0 0 1 Canada True 12345 2020-12-29
2 None 0 0 0 0 Canada True 12345 2021-01-26
3 App Store Browse 0 0 0 0 Canada True 12345 2020-12-04
4 Web Referrer 0 0 0 0 Canada True 12345 2021-06-09
5 App Store Search 30 5 4 47 Canada True 12345 2021-10-05
6 None 0 0 0 0 Canada True 12345 2021-05-21
7 App Store Browse 0 0 0 0 Canada True 12345 2021-05-09
8 App Store Search 2 1 1 3 Canada True 12345 2021-05-18
9 App Store Search 0 0 0 0 Canada True 12345 2021-10-21
10 App Store Search 0 0 0 0 Kosovo True 12345 2021-10-23
11 App Store Search 0 0 3 0 Kosovo True 12345 2021-11-24
12 App Store Search 0 0 3 0 Côte d'Ivoire True 12345 2021-10-24
13 App Store Search 0 0 3 0 Côte d'Ivoire True 12345 2021-10-26
14 App Store Search 0 0 3 0 Côte d'Ivoire True 12345 2021-11-26
15 App Store Search 0 0 3 0 Türkiye True 12345 2021-11-26
16 App Store Search 0 0 3 0 Türkiye True 12345 2021-11-26

stg_app_store_territory_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"app_store_territory_source_type_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "source_type",
        "territory",
        "meets_threshold",
        "impressions",
        "impressions_unique_device",
        "page_views",
        "page_views_unique_device"
    FROM "app_store_territory_source_type"
),

"app_store_territory_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> metric_date
    -- source_type -> discovery_source
    -- impressions_unique_device -> unique_device_impressions
    -- page_views_unique_device -> unique_device_page_views
    SELECT 
        app_id,
        date_ AS metric_date,
        source_type AS discovery_source,
        territory,
        meets_threshold,
        impressions,
        impressions_unique_device AS unique_device_impressions,
        page_views,
        page_views_unique_device AS unique_device_page_views
    FROM app_store_territory_source_type_projected
),

"app_store_territory_source_type_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- territory: The problem is inconsistent naming for 'Cote d'Ivoire' vs 'Côte d'Ivoire', and 'Turkey' vs 'Türkiye'. The correct values are 'Côte d'Ivoire' and 'Türkiye', based on their more frequent occurrences. 
    SELECT
        "app_id",
        "metric_date",
        "discovery_source",
        CASE
            WHEN "territory" = 'Cote d''Ivoire' THEN 'Côte d''Ivoire'
            WHEN "territory" = 'Turkey' THEN 'Türkiye'
            ELSE "territory"
        END AS "territory",
        "meets_threshold",
        "impressions",
        "unique_device_impressions",
        "page_views",
        "unique_device_page_views"
    FROM "app_store_territory_source_type_projected_renamed"
),

"app_store_territory_source_type_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- discovery_source: ['Unavailable']
    SELECT 
        CASE
            WHEN "discovery_source" = 'Unavailable' THEN NULL
            ELSE "discovery_source"
        END AS "discovery_source",
        "unique_device_impressions",
        "page_views",
        "metric_date",
        "app_id",
        "unique_device_page_views",
        "impressions",
        "territory",
        "meets_threshold"
    FROM "app_store_territory_source_type_projected_renamed_cleaned"
),

"app_store_territory_source_type_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- metric_date: from VARCHAR to TIMESTAMP
    SELECT
        "discovery_source",
        "unique_device_impressions",
        "page_views",
        "unique_device_page_views",
        "impressions",
        "territory",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("metric_date" AS TIMESTAMP) AS "metric_date"
    FROM "app_store_territory_source_type_projected_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT * FROM "app_store_territory_source_type_projected_renamed_cleaned_null_casted"

stg_app_store_territory_source_type.yml (Document the table)

version: 2
models:
- name: stg_app_store_territory_source_type
  description: The table is about app performance metrics in the App Store. It includes
    details such as app ID, date, source type, territory, and various engagement metrics.
    The metrics cover impressions, page views, and unique device interactions. The
    data is segmented by different source types like App Store Search and Web Referrer.
    It also indicates whether the data meets a certain threshold.
  columns:
  - name: discovery_source
    description: Channel through which the app was discovered
    tests:
    - accepted_values:
        values:
        - App Store Search
        - App Store Browse
        - App Referrer
        - Web Referrer
        - Social Media
        - Word of Mouth
        - Advertisement
        - Email Marketing
        - Push Notification
        - In-App Promotion
        - App Store Featuring
        - Press/Media Coverage
        - Influencer Promotion
        - App Store Optimization
        - Google Play Store Search
        - Google Play Store Browse
        - Third-Party App Store
        - Pre-installed on Device
        - QR Code
        - SMS/Text Message
    cocoon_meta:
      missing_acceptable: No specific discovery source for some app impressions.
  - name: unique_device_impressions
    description: Number of unique devices that viewed the app
    tests:
    - not_null
  - name: page_views
    description: Number of times the app page was viewed
    tests:
    - not_null
  - name: unique_device_page_views
    description: Number of unique devices that viewed the app page
    tests:
    - not_null
  - name: impressions
    description: Number of times the app was viewed
    tests:
    - not_null
  - name: territory
    description: Geographic region of the data
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if data meets a certain threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: metric_date
    description: Date of the recorded metrics
    tests:
    - not_null

stg_usage_platform_version_source_type (first 100 rows)

os_version acquisition_source meets_threshold daily_installations daily_sessions daily_active_devices monthly_active_devices daily_deletions app_id record_date
0 iOS 1.0 App Store Search True 0 0 0 0 0 12345 2021-06-10
1 iOS 1.0 App Referrer True 0 0 0 0 0 12345 2021-08-02
2 iOS 1.0 App Referrer True 0 0 0 0 0 12345 2021-08-01
3 iOS 1.0 App Referrer True 0 0 0 0 0 12345 2021-08-03
4 iOS 1.0 App Store Browse True 0 0 0 0 0 12345 2021-04-18
5 iOS 1.0 App Store Browse True 0 0 0 0 0 12345 2021-04-17
6 iOS 1.0 App Store Browse True 0 0 0 2 0 12345 2021-04-19
7 iOS 1.0 App Store Search True 0 0 0 1 0 12345 2021-07-01
8 iOS 1.0 Web Referrer True 0 0 0 0 0 12345 2021-01-31
9 iOS 1.0 Web Referrer True 0 0 0 14 1 12345 2021-01-30

stg_usage_platform_version_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"usage_platform_version_source_type_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "platform_version",
        "source_type",
        "meets_threshold",
        "installations",
        "sessions",
        "active_devices",
        "active_devices_last_30_days",
        "deletions"
    FROM "usage_platform_version_source_type"
),

"usage_platform_version_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> record_date
    -- platform_version -> os_version
    -- source_type -> acquisition_source
    -- installations -> daily_installations
    -- sessions -> daily_sessions
    -- active_devices -> daily_active_devices
    -- active_devices_last_30_days -> monthly_active_devices
    -- deletions -> daily_deletions
    SELECT 
        app_id,
        date_ AS record_date,
        platform_version AS os_version,
        source_type AS acquisition_source,
        meets_threshold,
        installations AS daily_installations,
        sessions AS daily_sessions,
        active_devices AS daily_active_devices,
        active_devices_last_30_days AS monthly_active_devices,
        deletions AS daily_deletions
    FROM usage_platform_version_source_type_projected
),

"usage_platform_version_source_type_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- record_date: from VARCHAR to TIMESTAMP
    SELECT
        "os_version",
        "acquisition_source",
        "meets_threshold",
        "daily_installations",
        "daily_sessions",
        "daily_active_devices",
        "monthly_active_devices",
        "daily_deletions",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("record_date" AS TIMESTAMP) AS "record_date"
    FROM "usage_platform_version_source_type_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "usage_platform_version_source_type_projected_renamed_casted"

stg_usage_platform_version_source_type.yml (Document the table)

version: 2
models:
- name: stg_usage_platform_version_source_type
  description: The table is about app usage data. It includes details like app ID,
    date, platform version, source type, and various metrics. Metrics cover installations,
    sessions, active devices, and deletions. The data is segmented by platform version
    and acquisition source. It also indicates if the data meets a certain threshold.
  columns:
  - name: os_version
    description: Version of the operating system
    tests:
    - not_null
  - name: acquisition_source
    description: Channel through which the app was acquired
    tests:
    - not_null
    - accepted_values:
        values:
        - App Referrer
        - App Store Browse
        - App Store Search
        - Web Referrer
        - Social Media
        - Email Marketing
        - Push Notifications
        - Paid Advertising
        - Organic Search
        - Word of Mouth
        - In-App Referral
        - QR Code
        - App Store Featured
        - Cross-Promotion
        - Influencer Marketing
        - Offline Marketing
        - Pre-installed
  - name: meets_threshold
    description: Indicates if data meets a certain threshold
    tests:
    - not_null
  - name: daily_installations
    description: Number of app installations on the given date
    tests:
    - not_null
  - name: daily_sessions
    description: Number of app sessions on the given date
    tests:
    - not_null
  - name: daily_active_devices
    description: Number of active devices on the given date
    tests:
    - not_null
  - name: monthly_active_devices
    description: Number of active devices in the last 30 days
    tests:
    - not_null
  - name: daily_deletions
    description: Number of app deletions on the given date
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: record_date
    description: Date of the recorded data
    tests:
    - not_null

stg_sales_subscription_events (first 100 rows)

cancellation_reason promotional_offer_name user_state previous_subscription_apple_id promotional_offer_id subscription_name row_index source_file consecutive_paid_periods event_type device_type standard_subscription_duration proceeds_reason country_code subscription_quantity app_name account_number app_apple_id days_before_canceling days_canceled event_date original_start_date previous_subscription_name subscription_apple_id subscription_group_id subscription_offer_duration subscription_offer_type vendor_number
0 None None NJ None None Super Cool Name 9 Subscription_Event_12345678_20220102_V1_2.txt.gz 3 Renew iPhone 1 Month None US 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2021-02-25 None 12345 12345 None None 12345
1 None None FL None None Super Cool Name 25 Subscription_Event_12345678_20220102_V1_2.txt.gz 14 Renew iPhone 1 Month Rate After One Year US 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2020-03-25 None 12345 12345 None None 12345
2 Canceled None CA None None Super Cool Name 41 Subscription_Event_12345678_20220102_V1_2.txt.gz 1 Cancel iPhone 1 Year None US 1 Super Cool Name 12345 12345 176.0 NaN 2021-04-25 2020-04-25 None 12345 12345 None None 12345
3 None None CA None None Super Cool Name 57 Subscription_Event_12345678_20220102_V1_2.txt.gz 2 Renew iPhone 1 Month None US 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2021-03-25 None 12345 12345 None None 12345
4 None None None None None Super Cool Name 73 Subscription_Event_12345678_20220102_V1_2.txt.gz 3 Renew iPhone 1 Year Rate After One Year GB 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2019-04-25 None 12345 12345 None None 12345
5 None None None None None Super Cool Name 89 Subscription_Event_12345678_20220102_V1_2.txt.gz 13 Renew iPhone 1 Month Rate After One Year CA 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2020-04-24 None 12345 12345 None None 12345
6 Canceled None CA None None Super Cool Name 105 Subscription_Event_12345678_20220102_V1_2.txt.gz 1 Cancel iPad 1 Year None US 1 Super Cool Name 12345 12345 70.0 NaN 2021-04-25 2020-04-25 None 12345 12345 None None 12345
7 None None None None None Super Cool Name 121 Subscription_Event_12345678_20220102_V1_2.txt.gz 13 Renew iPad 1 Month Rate After One Year AT 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2020-04-25 None 12345 12345 None None 12345
8 None None MO None None Super Cool Name 137 Subscription_Event_12345678_20220102_V1_2.txt.gz 1 Subscribe iPhone 1 Month None US 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2021-04-25 None 12345 12345 None None 12345
9 None None KS None None Super Cool Name 153 Subscription_Event_12345678_20220102_V1_2.txt.gz 3 Renew iPhone 1 Month None IN 1 Super Cool Name 12345 12345 NaN NaN 2021-04-25 2021-02-25 None 12345 12345 None None 12345

stg_sales_subscription_events.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"sales_subscription_events_projected" AS (
    -- Projection: Selecting 29 out of 30 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "_filename",
        "account_number",
        "vendor_number",
        "_index",
        "event_date",
        "app_name",
        "days_canceled",
        "subscription_name",
        "consecutive_paid_periods",
        "previous_subscription_name",
        "cancellation_reason",
        "proceeds_reason",
        "subscription_apple_id",
        "standard_subscription_duration",
        "original_start_date",
        "device",
        "days_before_canceling",
        "quantity",
        "marketing_opt_in_duration",
        "promotional_offer_name",
        "state",
        "previous_subscription_apple_id",
        "event",
        "subscription_group_id",
        "country",
        "promotional_offer_id",
        "app_apple_id",
        "subscription_offer_type",
        "subscription_offer_duration"
    FROM "sales_subscription_events"
),

"sales_subscription_events_projected_renamed" AS (
    -- Rename: Renaming columns
    -- _filename -> source_file
    -- _index -> row_index
    -- device -> device_type
    -- quantity -> subscription_quantity
    -- state -> user_state
    -- event -> event_type
    -- country -> country_code
    SELECT 
        _filename AS source_file,
        account_number,
        vendor_number,
        _index AS row_index,
        event_date,
        app_name,
        days_canceled,
        subscription_name,
        consecutive_paid_periods,
        previous_subscription_name,
        cancellation_reason,
        proceeds_reason,
        subscription_apple_id,
        standard_subscription_duration,
        original_start_date,
        device AS device_type,
        days_before_canceling,
        quantity AS subscription_quantity,
        marketing_opt_in_duration,
        promotional_offer_name,
        state AS user_state,
        previous_subscription_apple_id,
        event AS event_type,
        subscription_group_id,
        country AS country_code,
        promotional_offer_id,
        app_apple_id,
        subscription_offer_type,
        subscription_offer_duration
    FROM sales_subscription_events_projected
),

"sales_subscription_events_projected_renamed_trimmed" AS (
    -- Trim Leading and Trailing Spaces
    SELECT
        "source_file",
        "account_number",
        "vendor_number",
        "row_index",
        "event_date",
        "app_name",
        "subscription_name",
        "consecutive_paid_periods",
        "previous_subscription_name",
        "proceeds_reason",
        "subscription_apple_id",
        "standard_subscription_duration",
        "original_start_date",
        "device_type",
        "subscription_quantity",
        "event_type",
        "subscription_group_id",
        "country_code",
        "app_apple_id",
        "subscription_offer_type",
        "subscription_offer_duration",
        TRIM("days_canceled") AS "days_canceled",
        TRIM("cancellation_reason") AS "cancellation_reason",
        TRIM("days_before_canceling") AS "days_before_canceling",
        TRIM("marketing_opt_in_duration") AS "marketing_opt_in_duration",
        TRIM("promotional_offer_name") AS "promotional_offer_name",
        TRIM("user_state") AS "user_state",
        TRIM("previous_subscription_apple_id") AS "previous_subscription_apple_id",
        TRIM("promotional_offer_id") AS "promotional_offer_id"
    FROM "sales_subscription_events_projected_renamed"
),

"sales_subscription_events_projected_renamed_trimmed_cleaned" AS (
    -- Clean unusual string values: 
    -- user_state: The problem is that this column contains non-standard US state abbreviations and empty values. 'KA' is likely a typo for 'KS' (Kansas). 'ON' is not a US state abbreviation, but could represent Ontario, Canada. Empty values should be preserved as they might represent missing data. The correct values should be standard US state abbreviations or empty strings for missing data. 
    SELECT
        "source_file",
        "account_number",
        "vendor_number",
        "row_index",
        "event_date",
        "app_name",
        "subscription_name",
        "consecutive_paid_periods",
        "previous_subscription_name",
        "proceeds_reason",
        "subscription_apple_id",
        "standard_subscription_duration",
        "original_start_date",
        "device_type",
        "subscription_quantity",
        "event_type",
        "subscription_group_id",
        "country_code",
        "app_apple_id",
        "subscription_offer_type",
        "subscription_offer_duration",
        "days_canceled",
        "cancellation_reason",
        "days_before_canceling",
        "marketing_opt_in_duration",
        "promotional_offer_name",
        CASE
            WHEN "user_state" = 'KA' THEN 'KS'
            WHEN "user_state" = 'ON' THEN ''
            ELSE "user_state"
        END AS "user_state",
        "previous_subscription_apple_id",
        "promotional_offer_id"
    FROM "sales_subscription_events_projected_renamed_trimmed"
),

"sales_subscription_events_projected_renamed_trimmed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- days_canceled: ['']
    -- cancellation_reason: ['']
    -- days_before_canceling: ['']
    -- marketing_opt_in_duration: ['']
    -- promotional_offer_name: ['']
    -- user_state: ['']
    -- previous_subscription_apple_id: ['']
    -- promotional_offer_id: ['']
    SELECT 
        CASE
            WHEN "days_canceled" = '' THEN NULL
            ELSE "days_canceled"
        END AS "days_canceled",
        CASE
            WHEN "cancellation_reason" = '' THEN NULL
            ELSE "cancellation_reason"
        END AS "cancellation_reason",
        CASE
            WHEN "days_before_canceling" = '' THEN NULL
            ELSE "days_before_canceling"
        END AS "days_before_canceling",
        CASE
            WHEN "marketing_opt_in_duration" = '' THEN NULL
            ELSE "marketing_opt_in_duration"
        END AS "marketing_opt_in_duration",
        CASE
            WHEN "promotional_offer_name" = '' THEN NULL
            ELSE "promotional_offer_name"
        END AS "promotional_offer_name",
        CASE
            WHEN "user_state" = '' THEN NULL
            ELSE "user_state"
        END AS "user_state",
        CASE
            WHEN "previous_subscription_apple_id" = '' THEN NULL
            ELSE "previous_subscription_apple_id"
        END AS "previous_subscription_apple_id",
        CASE
            WHEN "promotional_offer_id" = '' THEN NULL
            ELSE "promotional_offer_id"
        END AS "promotional_offer_id",
        "subscription_name",
        "app_apple_id",
        "row_index",
        "subscription_apple_id",
        "subscription_group_id",
        "previous_subscription_name",
        "source_file",
        "subscription_offer_duration",
        "consecutive_paid_periods",
        "account_number",
        "event_type",
        "device_type",
        "original_start_date",
        "standard_subscription_duration",
        "proceeds_reason",
        "country_code",
        "subscription_offer_type",
        "vendor_number",
        "event_date",
        "subscription_quantity",
        "app_name"
    FROM "sales_subscription_events_projected_renamed_trimmed_cleaned"
),

"sales_subscription_events_projected_renamed_trimmed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- account_number: from INT to VARCHAR
    -- app_apple_id: from INT to VARCHAR
    -- days_before_canceling: from VARCHAR to INT
    -- days_canceled: from VARCHAR to INT
    -- event_date: from VARCHAR to DATE
    -- original_start_date: from VARCHAR to DATE
    -- previous_subscription_name: from DECIMAL to VARCHAR
    -- subscription_apple_id: from INT to VARCHAR
    -- subscription_group_id: from INT to VARCHAR
    -- subscription_offer_duration: from DECIMAL to VARCHAR
    -- subscription_offer_type: from DECIMAL to VARCHAR
    -- vendor_number: from INT to VARCHAR
    SELECT
        "cancellation_reason",
        "marketing_opt_in_duration",
        "promotional_offer_name",
        "user_state",
        "previous_subscription_apple_id",
        "promotional_offer_id",
        "subscription_name",
        "row_index",
        "source_file",
        "consecutive_paid_periods",
        "event_type",
        "device_type",
        "standard_subscription_duration",
        "proceeds_reason",
        "country_code",
        "subscription_quantity",
        "app_name",
        CAST("account_number" AS VARCHAR) AS "account_number",
        CAST("app_apple_id" AS VARCHAR) AS "app_apple_id",
        CAST("days_before_canceling" AS INT) AS "days_before_canceling",
        CAST("days_canceled" AS INT) AS "days_canceled",
        CAST("event_date" AS DATE) AS "event_date",
        CAST("original_start_date" AS DATE) AS "original_start_date",
        CAST("previous_subscription_name" AS VARCHAR) AS "previous_subscription_name",
        CAST("subscription_apple_id" AS VARCHAR) AS "subscription_apple_id",
        CAST("subscription_group_id" AS VARCHAR) AS "subscription_group_id",
        CAST("subscription_offer_duration" AS VARCHAR) AS "subscription_offer_duration",
        CAST("subscription_offer_type" AS VARCHAR) AS "subscription_offer_type",
        CAST("vendor_number" AS VARCHAR) AS "vendor_number"
    FROM "sales_subscription_events_projected_renamed_trimmed_cleaned_null"
),

"sales_subscription_events_projected_renamed_trimmed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- marketing_opt_in_duration has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- proceeds_reason has 60.0 percent missing. Strategy: 🔄 Unchanged
    -- user_state has 30.0 percent missing. Strategy: 🔄 Unchanged
    SELECT
        "cancellation_reason",
        "promotional_offer_name",
        "user_state",
        "previous_subscription_apple_id",
        "promotional_offer_id",
        "subscription_name",
        "row_index",
        "source_file",
        "consecutive_paid_periods",
        "event_type",
        "device_type",
        "standard_subscription_duration",
        "proceeds_reason",
        "country_code",
        "subscription_quantity",
        "app_name",
        "account_number",
        "app_apple_id",
        "days_before_canceling",
        "days_canceled",
        "event_date",
        "original_start_date",
        "previous_subscription_name",
        "subscription_apple_id",
        "subscription_group_id",
        "subscription_offer_duration",
        "subscription_offer_type",
        "vendor_number"
    FROM "sales_subscription_events_projected_renamed_trimmed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "sales_subscription_events_projected_renamed_trimmed_cleaned_null_casted_missing_handled"

stg_sales_subscription_events.yml (Document the table)

version: 2
models:
- name: stg_sales_subscription_events
  description: The table is about subscription events for an app. It includes details
    like account number, event date, app name, subscription name, event type (renew/cancel),
    consecutive paid periods, subscription duration, country, and device. Each row
    represents a specific subscription event with associated attributes like renewal
    status, cancellation reasons, and promotional offers.
  columns:
  - name: cancellation_reason
    description: Reason for subscription cancellation
    tests:
    - accepted_values:
        values:
        - Too expensive
        - No longer needed
        - Switched to competitor
        - Poor service quality
        - Technical issues
        - Changed circumstances
        - Found better alternative
        - Billing problems
        - Lack of features
        - Unused subscription
        - Dissatisfied with product
        - Company policy change
        - Budget cuts
        - Temporary pause
        - Moving/Relocating
        - Other
        - Canceled
    cocoon_meta:
      missing_acceptable: Not applicable for renewals or non-canceled subscriptions
  - name: promotional_offer_name
    description: Name of promotional offer
    cocoon_meta:
      missing_acceptable: Not applicable if no promotional offer was used
  - name: user_state
    description: State or region of the user
    tests:
    - not_null
    - accepted_values:
        values:
        - AL
        - AK
        - AZ
        - AR
        - CA
        - CO
        - CT
        - DE
        - FL
        - GA
        - HI
        - ID
        - IL
        - IN
        - IA
        - KS
        - KY
        - LA
        - ME
        - MD
        - MA
        - MI
        - MN
        - MS
        - MO
        - MT
        - NE
        - NV
        - NH
        - NJ
        - NM
        - NY
        - NC
        - ND
        - OH
        - OK
        - OR
        - PA
        - RI
        - SC
        - SD
        - TN
        - TX
        - UT
        - VT
        - VA
        - WA
        - WV
        - WI
        - WY
  - name: previous_subscription_apple_id
    description: Apple ID of previous subscription
    cocoon_meta:
      missing_acceptable: Not applicable for first-time subscribers
  - name: promotional_offer_id
    description: Identifier for promotional offer
    cocoon_meta:
      missing_acceptable: Not applicable if no promotional offer was used
  - name: subscription_name
    description: Name of the subscription
    tests:
    - not_null
  - name: row_index
    description: Row index in the source file
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the row index in the source file. For this
        table, each row is a specific subscription event. The row_index appears to
        be unique across rows in the given sample, and it's likely to be unique across
        the entire dataset if it represents the actual row number in the source file.
  - name: source_file
    description: Name of the source file
    tests:
    - not_null
  - name: consecutive_paid_periods
    description: Number of consecutive paid subscription periods
    tests:
    - not_null
  - name: event_type
    description: Type of subscription event (e.g., Renew, Cancel)
    tests:
    - not_null
    - accepted_values:
        values:
        - Renew
        - Cancel
        - Subscribe
        - Upgrade
        - Downgrade
        - Pause
        - Resume
        - Trial
        - Expire
        - Refund
  - name: device_type
    description: Device used for the subscription
    tests:
    - not_null
    - accepted_values:
        values:
        - iPhone
        - iPad
        - Android phone
        - Android tablet
        - Windows phone
        - MacBook
        - Windows laptop
        - Desktop computer
        - Smart TV
        - Gaming console
        - Smartwatch
        - E-reader
  - name: standard_subscription_duration
    description: Duration of standard subscription
    tests:
    - not_null
    - accepted_values:
        values:
        - 1 Day
        - 1 Week
        - 2 Weeks
        - 1 Month
        - 3 Months
        - 6 Months
        - 1 Year
        - 2 Years
        - 3 Years
        - 5 Years
  - name: proceeds_reason
    description: Reason for proceeds change
    tests:
    - not_null
  - name: country_code
    description: Country code of the user
    tests:
    - not_null
  - name: subscription_quantity
    description: Quantity of subscriptions
    tests:
    - not_null
  - name: app_name
    description: Name of the application
    tests:
    - not_null
  - name: account_number
    description: Unique identifier for the user account
    tests:
    - not_null
  - name: app_apple_id
    description: Apple ID of the application
    tests:
    - not_null
  - name: days_before_canceling
    description: Days between subscription start and cancellation
    cocoon_meta:
      missing_acceptable: Not applicable for renewals or non-canceled subscriptions
  - name: days_canceled
    description: Number of days the subscription was canceled
    cocoon_meta:
      missing_acceptable: Not applicable for active subscriptions or renewals
  - name: event_date
    description: Date of the subscription event
    tests:
    - not_null
  - name: original_start_date
    description: Initial date of subscription
    tests:
    - not_null
  - name: previous_subscription_name
    description: Name of previous subscription
    cocoon_meta:
      missing_acceptable: Not applicable for first-time subscribers
  - name: subscription_apple_id
    description: Apple ID of the subscription
    tests:
    - not_null
  - name: subscription_group_id
    description: Group ID for the subscription
    tests:
    - not_null
  - name: subscription_offer_duration
    description: Duration of subscription offer
    cocoon_meta:
      missing_acceptable: Not applicable if no special offer was applied
  - name: subscription_offer_type
    description: Type of subscription offer
    cocoon_meta:
      missing_acceptable: Not applicable if no special offer was applied
  - name: vendor_number
    description: Vendor identification number
    tests:
    - not_null

stg_sales_subscription_summary (first 100 rows)

proceeds_currency customer_state grace_period_days subscription_name pay_as_you_go_promo_subs standard_sub_duration active_pay_as_you_go_intro_subs source_file customer_country pay_up_front_promo_subs active_pay_up_front_intro_subs free_trial_promo_subs developer_proceeds device_type marketing_opt_in_count proceeds_reason customer_price active_free_trial_intro_subs active_standard_subs customer_currency billing_retry_count row_index app_name account_number app_apple_id subscription_apple_id subscription_group_id vendor_number
0 EUR None 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz FR 0 0 0 24.72 iPhone 0 None 42.99 0 30 EUR 0 9 Super Cool Name 12345 12345 12345 12345 12345
1 EUR None 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz FI 0 0 0 24.27 iPhone 0 None 42.99 0 2 EUR 0 25 Super Cool Name 12345 12345 12345 12345 12345
2 None MI 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz IT 0 0 0 24.20 iPhone 0 None 42.99 0 1 EUR 0 41 Super Cool Name 12345 12345 12345 12345 12345
3 USD MS 0 Super Cool Name 0 1 Month 0 Subscription_12345678_20210712_V1_2.txt.gz US 0 0 0 3.50 iPhone 0 None 4.99 0 5 USD 0 57 Super Cool Name 12345 12345 12345 12345 12345
4 None NJ 0 Super Cool Name 0 1 Month 0 Subscription_12345678_20210712_V1_2.txt.gz US 0 0 0 3.50 iPhone 0 None 4.99 0 7 USD 2 73 Super Cool Name 12345 12345 12345 12345 12345
5 KRW None 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz KR 0 0 0 31182.00 iPhone 0 None 49000.00 0 1 KRW 0 89 Super Cool Name 12345 12345 12345 12345 12345
6 None AZ 0 Super Cool Name 0 1 Month 0 Subscription_12345678_20210712_V1_2.txt.gz US 0 0 0 3.50 iPhone 0 None 4.99 0 1 USD 0 105 Super Cool Name 12345 12345 12345 12345 12345
7 None DC 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz US 0 0 0 33.99 iPhone 0 Rate After One Year 39.99 0 6 USD 0 121 Super Cool Name 12345 12345 12345 12345 12345
8 None AZ 0 Super Cool Name 0 1 Month 0 Subscription_12345678_20210712_V1_2.txt.gz US 0 0 0 3.50 iPhone 0 None 4.99 0 6 USD 0 137 Super Cool Name 12345 12345 12345 12345 12345
9 EUR None 0 Super Cool Name 0 1 Year 0 Subscription_12345678_20210712_V1_2.txt.gz IT 0 0 0 29.67 iPhone 0 Rate After One Year 42.99 0 1 EUR 0 153 Super Cool Name 12345 12345 12345 12345 12345

stg_sales_subscription_summary.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"sales_subscription_summary_projected" AS (
    -- Projection: Selecting 30 out of 31 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "_filename",
        "account_number",
        "vendor_number",
        "_index",
        "developer_proceeds",
        "app_name",
        "free_trial_promotional_offer_subscriptions",
        "proceeds_currency",
        "subscription_name",
        "pay_as_you_go_promotional_offer_subscriptions",
        "customer_currency",
        "marketing_opt_ins",
        "pay_up_front_promotional_offer_subscriptions",
        "billing_retry",
        "proceeds_reason",
        "subscription_apple_id",
        "active_standard_price_subscriptions",
        "standard_subscription_duration",
        "grace_period",
        "device",
        "active_pay_up_front_introductory_offer_subscriptions",
        "customer_price",
        "promotional_offer_name",
        "state",
        "active_pay_as_you_go_introductory_offer_subscriptions",
        "subscription_group_id",
        "country",
        "active_free_trial_introductory_offer_subscriptions",
        "promotional_offer_id",
        "app_apple_id"
    FROM "sales_subscription_summary"
),

"sales_subscription_summary_projected_renamed" AS (
    -- Rename: Renaming columns
    -- _filename -> source_file
    -- _index -> row_index
    -- free_trial_promotional_offer_subscriptions -> free_trial_promo_subs
    -- pay_as_you_go_promotional_offer_subscriptions -> pay_as_you_go_promo_subs
    -- marketing_opt_ins -> marketing_opt_in_count
    -- pay_up_front_promotional_offer_subscriptions -> pay_up_front_promo_subs
    -- billing_retry -> billing_retry_count
    -- active_standard_price_subscriptions -> active_standard_subs
    -- standard_subscription_duration -> standard_sub_duration
    -- grace_period -> grace_period_days
    -- device -> device_type
    -- active_pay_up_front_introductory_offer_subscriptions -> active_pay_up_front_intro_subs
    -- promotional_offer_name -> promo_offer_name
    -- state -> customer_state
    -- active_pay_as_you_go_introductory_offer_subscriptions -> active_pay_as_you_go_intro_subs
    -- country -> customer_country
    -- active_free_trial_introductory_offer_subscriptions -> active_free_trial_intro_subs
    -- promotional_offer_id -> promo_offer_id
    SELECT 
        _filename AS source_file,
        account_number,
        vendor_number,
        _index AS row_index,
        developer_proceeds,
        app_name,
        free_trial_promotional_offer_subscriptions AS free_trial_promo_subs,
        proceeds_currency,
        subscription_name,
        pay_as_you_go_promotional_offer_subscriptions AS pay_as_you_go_promo_subs,
        customer_currency,
        marketing_opt_ins AS marketing_opt_in_count,
        pay_up_front_promotional_offer_subscriptions AS pay_up_front_promo_subs,
        billing_retry AS billing_retry_count,
        proceeds_reason,
        subscription_apple_id,
        active_standard_price_subscriptions AS active_standard_subs,
        standard_subscription_duration AS standard_sub_duration,
        grace_period AS grace_period_days,
        device AS device_type,
        active_pay_up_front_introductory_offer_subscriptions AS active_pay_up_front_intro_subs,
        customer_price,
        promotional_offer_name AS promo_offer_name,
        state AS customer_state,
        active_pay_as_you_go_introductory_offer_subscriptions AS active_pay_as_you_go_intro_subs,
        subscription_group_id,
        country AS customer_country,
        active_free_trial_introductory_offer_subscriptions AS active_free_trial_intro_subs,
        promotional_offer_id AS promo_offer_id,
        app_apple_id
    FROM sales_subscription_summary_projected
),

"sales_subscription_summary_projected_renamed_trimmed" AS (
    -- Trim Leading and Trailing Spaces
    SELECT
        "source_file",
        "account_number",
        "vendor_number",
        "row_index",
        "developer_proceeds",
        "app_name",
        "free_trial_promo_subs",
        "subscription_name",
        "pay_as_you_go_promo_subs",
        "customer_currency",
        "marketing_opt_in_count",
        "pay_up_front_promo_subs",
        "billing_retry_count",
        "proceeds_reason",
        "subscription_apple_id",
        "active_standard_subs",
        "standard_sub_duration",
        "grace_period_days",
        "device_type",
        "active_pay_up_front_intro_subs",
        "customer_price",
        "active_pay_as_you_go_intro_subs",
        "subscription_group_id",
        "customer_country",
        "active_free_trial_intro_subs",
        "app_apple_id",
        TRIM("proceeds_currency") AS "proceeds_currency",
        TRIM("promo_offer_name") AS "promo_offer_name",
        TRIM("customer_state") AS "customer_state",
        TRIM("promo_offer_id") AS "promo_offer_id"
    FROM "sales_subscription_summary_projected_renamed"
),

"sales_subscription_summary_projected_renamed_trimmed_cleaned" AS (
    -- Clean unusual string values: 
    -- customer_state: The problem is that some values in the customer_state column don't conform to standard US state abbreviations. The correct values should be two-letter US state codes, with the exception of 'DC' for District of Columbia. The empty string (''), 'AA', 'PO', and '부산' are unusual. 'AA' might be a typo for 'AZ' (Arizona), which is already present. 'PO' could be a typo for a state abbreviation but it's unclear which one. '부산' is a Korean city name and doesn't correspond to any US state. The empty string might indicate missing data. 
    -- promo_offer_id: The promo_offer_id column only contains empty strings, which do not identify any promotional offers. Since there are no other values to map to, we will leave the values as empty strings. 
    SELECT
        "source_file",
        "account_number",
        "vendor_number",
        "row_index",
        "developer_proceeds",
        "app_name",
        "free_trial_promo_subs",
        "subscription_name",
        "pay_as_you_go_promo_subs",
        "customer_currency",
        "marketing_opt_in_count",
        "pay_up_front_promo_subs",
        "billing_retry_count",
        "proceeds_reason",
        "subscription_apple_id",
        "active_standard_subs",
        "standard_sub_duration",
        "grace_period_days",
        "device_type",
        "active_pay_up_front_intro_subs",
        "customer_price",
        "active_pay_as_you_go_intro_subs",
        "subscription_group_id",
        "customer_country",
        "active_free_trial_intro_subs",
        "app_apple_id",
        "proceeds_currency",
        "promo_offer_name",
        CASE
            WHEN "customer_state" = 'AA' THEN 'AZ'
            WHEN "customer_state" = 'PO' THEN ''
            WHEN "customer_state" = '부산' THEN ''
            ELSE "customer_state"
        END AS "customer_state",
        "promo_offer_id"
    FROM "sales_subscription_summary_projected_renamed_trimmed"
),

"sales_subscription_summary_projected_renamed_trimmed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- proceeds_currency: ['']
    -- promo_offer_name: ['']
    -- customer_state: ['']
    -- promo_offer_id: ['']
    SELECT 
        CASE
            WHEN "proceeds_currency" = '' THEN NULL
            ELSE "proceeds_currency"
        END AS "proceeds_currency",
        CASE
            WHEN "promo_offer_name" = '' THEN NULL
            ELSE "promo_offer_name"
        END AS "promo_offer_name",
        CASE
            WHEN "customer_state" = '' THEN NULL
            ELSE "customer_state"
        END AS "customer_state",
        CASE
            WHEN "promo_offer_id" = '' THEN NULL
            ELSE "promo_offer_id"
        END AS "promo_offer_id",
        "grace_period_days",
        "subscription_name",
        "pay_as_you_go_promo_subs",
        "app_apple_id",
        "standard_sub_duration",
        "active_pay_as_you_go_intro_subs",
        "subscription_apple_id",
        "subscription_group_id",
        "source_file",
        "customer_country",
        "pay_up_front_promo_subs",
        "active_pay_up_front_intro_subs",
        "free_trial_promo_subs",
        "account_number",
        "developer_proceeds",
        "device_type",
        "marketing_opt_in_count",
        "proceeds_reason",
        "customer_price",
        "active_free_trial_intro_subs",
        "vendor_number",
        "active_standard_subs",
        "customer_currency",
        "billing_retry_count",
        "row_index",
        "app_name"
    FROM "sales_subscription_summary_projected_renamed_trimmed_cleaned"
),

"sales_subscription_summary_projected_renamed_trimmed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- account_number: from INT to VARCHAR
    -- app_apple_id: from INT to VARCHAR
    -- subscription_apple_id: from INT to VARCHAR
    -- subscription_group_id: from INT to VARCHAR
    -- vendor_number: from INT to VARCHAR
    SELECT
        "proceeds_currency",
        "promo_offer_name",
        "customer_state",
        "promo_offer_id",
        "grace_period_days",
        "subscription_name",
        "pay_as_you_go_promo_subs",
        "standard_sub_duration",
        "active_pay_as_you_go_intro_subs",
        "source_file",
        "customer_country",
        "pay_up_front_promo_subs",
        "active_pay_up_front_intro_subs",
        "free_trial_promo_subs",
        "developer_proceeds",
        "device_type",
        "marketing_opt_in_count",
        "proceeds_reason",
        "customer_price",
        "active_free_trial_intro_subs",
        "active_standard_subs",
        "customer_currency",
        "billing_retry_count",
        "row_index",
        "app_name",
        CAST("account_number" AS VARCHAR) AS "account_number",
        CAST("app_apple_id" AS VARCHAR) AS "app_apple_id",
        CAST("subscription_apple_id" AS VARCHAR) AS "subscription_apple_id",
        CAST("subscription_group_id" AS VARCHAR) AS "subscription_group_id",
        CAST("vendor_number" AS VARCHAR) AS "vendor_number"
    FROM "sales_subscription_summary_projected_renamed_trimmed_cleaned_null"
),

"sales_subscription_summary_projected_renamed_trimmed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 4 columns with unacceptable missing values
    -- proceeds_currency has 50.0 percent missing. Strategy: 🔄 Unchanged
    -- proceeds_reason has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- promo_offer_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- promo_offer_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "proceeds_currency",
        "customer_state",
        "grace_period_days",
        "subscription_name",
        "pay_as_you_go_promo_subs",
        "standard_sub_duration",
        "active_pay_as_you_go_intro_subs",
        "source_file",
        "customer_country",
        "pay_up_front_promo_subs",
        "active_pay_up_front_intro_subs",
        "free_trial_promo_subs",
        "developer_proceeds",
        "device_type",
        "marketing_opt_in_count",
        "proceeds_reason",
        "customer_price",
        "active_free_trial_intro_subs",
        "active_standard_subs",
        "customer_currency",
        "billing_retry_count",
        "row_index",
        "app_name",
        "account_number",
        "app_apple_id",
        "subscription_apple_id",
        "subscription_group_id",
        "vendor_number"
    FROM "sales_subscription_summary_projected_renamed_trimmed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "sales_subscription_summary_projected_renamed_trimmed_cleaned_null_casted_missing_handled"

stg_sales_subscription_summary.yml (Document the table)

version: 2
models:
- name: stg_sales_subscription_summary
  description: The table is about app subscription sales. It contains details such
    as account number, vendor number, app name, subscription name, and customer information.
    Each row represents a subscription sale, including financial data like developer
    proceeds and customer price. It also tracks subscription types, durations, and
    promotional offers. The table includes geographic information and device types
    used for subscriptions.
  columns:
  - name: proceeds_currency
    description: Currency of the proceeds
    tests:
    - not_null
  - name: customer_state
    description: State or province of the customer
    tests:
    - accepted_values:
        values:
        - AL
        - AK
        - AZ
        - AR
        - CA
        - CO
        - CT
        - DE
        - FL
        - GA
        - HI
        - ID
        - IL
        - IN
        - IA
        - KS
        - KY
        - LA
        - ME
        - MD
        - MA
        - MI
        - MN
        - MS
        - MO
        - MT
        - NE
        - NV
        - NH
        - NJ
        - NM
        - NY
        - NC
        - ND
        - OH
        - OK
        - OR
        - PA
        - RI
        - SC
        - SD
        - TN
        - TX
        - UT
        - VT
        - VA
        - WA
        - WV
        - WI
        - WY
        - DC
        - AS
        - GU
        - MP
        - PR
        - VI
        - AB
        - BC
        - MB
        - NB
        - NL
        - NS
        - NT
        - NU
        - 'ON'
        - PE
        - QC
        - SK
        - YT
    cocoon_meta:
      missing_acceptable: Not applicable for non-US customers or countries without
        states.
  - name: grace_period_days
    description: Number of grace period days
    tests:
    - not_null
  - name: subscription_name
    description: Name of the subscription
    tests:
    - not_null
  - name: pay_as_you_go_promo_subs
    description: Count of pay-as-you-go promo offer subscriptions
    tests:
    - not_null
  - name: standard_sub_duration
    description: Duration of the standard subscription
    tests:
    - not_null
    - accepted_values:
        values:
        - 1 Week
        - 2 Weeks
        - 1 Month
        - 3 Months
        - 6 Months
        - 1 Year
        - 2 Years
        - 3 Years
        - 5 Years
  - name: active_pay_as_you_go_intro_subs
    description: Count of active pay-as-you-go intro offer subscriptions
    tests:
    - not_null
  - name: source_file
    description: Name of the source file
    tests:
    - not_null
  - name: customer_country
    description: Country code of the customer
    tests:
    - not_null
  - name: pay_up_front_promo_subs
    description: Count of pay-up-front promo offer subscriptions
    tests:
    - not_null
  - name: active_pay_up_front_intro_subs
    description: Count of active pay-up-front intro offer subscriptions
    tests:
    - not_null
  - name: free_trial_promo_subs
    description: Count of free trial promo offer subscriptions
    tests:
    - not_null
  - name: developer_proceeds
    description: Amount received by the developer
    tests:
    - not_null
  - name: device_type
    description: Type of device used
    tests:
    - not_null
    - accepted_values:
        values:
        - iPhone
        - Android phone
        - iPad
        - Android tablet
        - Windows PC
        - Mac
        - Linux PC
        - Smart TV
        - Gaming console
        - Smartwatch
        - E-reader
        - Smart speaker
        - Other
  - name: marketing_opt_in_count
    description: Number of marketing opt-ins
    tests:
    - not_null
  - name: proceeds_reason
    description: Reason for the proceeds
    tests:
    - not_null
  - name: customer_price
    description: Price paid by the customer
    tests:
    - not_null
  - name: active_free_trial_intro_subs
    description: Count of active free trial intro offer subscriptions
    tests:
    - not_null
  - name: active_standard_subs
    description: Count of active standard price subscriptions
    tests:
    - not_null
  - name: customer_currency
    description: Currency used by the customer
    tests:
    - not_null
  - name: billing_retry_count
    description: Number of billing retry attempts
    tests:
    - not_null
  - name: row_index
    description: Row index in the source file
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the row index in the source file. For this
        table, each row is a unique subscription sale. The row_index appears to be
        unique across rows in the given sample.
  - name: app_name
    description: Name of the app
    tests:
    - not_null
  - name: account_number
    description: Unique identifier for the account
    tests:
    - not_null
  - name: app_apple_id
    description: Apple ID of the app
    tests:
    - not_null
  - name: subscription_apple_id
    description: Apple ID of the subscription
    tests:
    - not_null
  - name: subscription_group_id
    description: ID of the subscription group
    tests:
    - not_null
  - name: vendor_number
    description: Unique identifier for the vendor
    tests:
    - not_null

stg_app_store_source_type_device (first 100 rows)

discovery_source unique_impressions device_type unique_page_views total_page_views total_impressions meets_threshold app_id metric_date
0 None 0 iPhone 0 0 0 True 12345 2021-10-03
1 App Store Browse 5 iPad 4 5 8 True 12345 2021-08-23
2 App Store Search 732 iPhone 103 146 1210 True 12345 2021-10-06
3 App Store Search 1113 iPhone 151 209 1757 True 12345 2021-03-17
4 None 0 Desktop 0 0 0 True 12345 2021-02-24
5 None 0 iPad 0 0 0 True 12345 2021-06-06
6 None 0 iPad 0 0 0 True 12345 2020-11-15
7 None 0 iPad 0 0 0 True 12345 2021-06-24
8 App Store Search 0 Desktop 0 0 0 True 12345 2021-03-17
9 Institutional Purchase 0 Desktop 0 0 0 True 12345 2021-07-12

stg_app_store_source_type_device.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"app_store_source_type_device_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "device",
        "source_type",
        "meets_threshold",
        "impressions",
        "impressions_unique_device",
        "page_views",
        "page_views_unique_device"
    FROM "app_store_source_type_device"
),

"app_store_source_type_device_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> metric_date
    -- device -> device_type
    -- source_type -> discovery_source
    -- impressions -> total_impressions
    -- impressions_unique_device -> unique_impressions
    -- page_views -> total_page_views
    -- page_views_unique_device -> unique_page_views
    SELECT 
        app_id,
        date_ AS metric_date,
        device AS device_type,
        source_type AS discovery_source,
        meets_threshold,
        impressions AS total_impressions,
        impressions_unique_device AS unique_impressions,
        page_views AS total_page_views,
        page_views_unique_device AS unique_page_views
    FROM app_store_source_type_device_projected
),

"app_store_source_type_device_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- discovery_source: ['Unavailable']
    SELECT 
        CASE
            WHEN "discovery_source" = 'Unavailable' THEN NULL
            ELSE "discovery_source"
        END AS "discovery_source",
        "metric_date",
        "unique_impressions",
        "app_id",
        "device_type",
        "unique_page_views",
        "total_page_views",
        "total_impressions",
        "meets_threshold"
    FROM "app_store_source_type_device_projected_renamed"
),

"app_store_source_type_device_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- metric_date: from VARCHAR to TIMESTAMP
    SELECT
        "discovery_source",
        "unique_impressions",
        "device_type",
        "unique_page_views",
        "total_page_views",
        "total_impressions",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("metric_date" AS TIMESTAMP) AS "metric_date"
    FROM "app_store_source_type_device_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "app_store_source_type_device_projected_renamed_null_casted"

stg_app_store_source_type_device.yml (Document the table)

version: 2
models:
- name: stg_app_store_source_type_device
  description: The table is about app store performance metrics. It includes details
    like app ID, date, device type, source type, impressions, page views, and unique
    device counts. Each row represents performance data for a specific app on a particular
    date, device, and source type. The data tracks how users discover and interact
    with the app across different platforms and channels.
  columns:
  - name: discovery_source
    description: Channel through which the app was discovered
    tests:
    - accepted_values:
        values:
        - App Store Search
        - App Store Browse
        - Institutional Purchase
        - Word of Mouth
        - Social Media
        - App Store Ads
        - Web Search
        - Website Referral
        - Email Marketing
        - In-App Referral
        - App Store Featured
        - App Store Category
        - Television Ad
        - Print Ad
        - Radio Ad
        - Influencer Promotion
        - App Review Sites
        - App Recommendation
        - Pre-installed
        - Partner Promotion
    cocoon_meta:
      missing_acceptable: Not applicable for direct app access or unknown sources.
  - name: unique_impressions
    description: Number of unique devices that viewed the app
    tests:
    - not_null
  - name: device_type
    description: Type of device used (iPhone, iPad, Desktop)
    tests:
    - not_null
    - accepted_values:
        values:
        - iPhone
        - iPad
        - Desktop
        - Android Phone
        - Android Tablet
        - MacBook
        - Windows Laptop
        - Chromebook
        - Smart TV
        - Gaming Console
        - Smartwatch
        - E-reader
  - name: unique_page_views
    description: Number of unique devices that viewed the app page
    tests:
    - not_null
  - name: total_page_views
    description: Total number of app page views
    tests:
    - not_null
  - name: total_impressions
    description: Total number of times the app was viewed
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if the data meets a certain threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: metric_date
    description: Date of the recorded metrics
    tests:
    - not_null

stg_downloads_territory_source_type (first 100 rows)

acquisition_channel new_downloads country redownloads total_downloads meets_threshold app_id stat_date
0 App Store Search 0 Canada 0 0 True 12345 2021-10-25
1 Web Referrer 1 Canada 0 1 True 12345 2021-05-08
2 App Store Search 0 Canada 0 0 True 12345 2021-09-30
3 App Store Search 1 Canada 0 1 True 12345 2021-11-11
4 Web Referrer 0 Canada 0 0 True 12345 2022-01-10
5 App Store Search 0 Canada 0 0 True 12345 2021-06-18
6 Web Referrer 0 Canada 0 0 True 12345 2021-11-27
7 App Referrer 0 Canada 0 0 True 12345 2021-07-15
8 None 0 Canada 0 0 True 12345 2021-04-07
9 App Referrer 0 Canada 0 0 True 12345 2021-06-05

stg_downloads_territory_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"downloads_territory_source_type_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "source_type",
        "territory",
        "meets_threshold",
        "first_time_downloads",
        "redownloads",
        "total_downloads"
    FROM "downloads_territory_source_type"
),

"downloads_territory_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> stat_date
    -- source_type -> acquisition_channel
    -- territory -> country
    -- first_time_downloads -> new_downloads
    SELECT 
        app_id,
        date_ AS stat_date,
        source_type AS acquisition_channel,
        territory AS country,
        meets_threshold,
        first_time_downloads AS new_downloads,
        redownloads,
        total_downloads
    FROM downloads_territory_source_type_projected
),

"downloads_territory_source_type_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- acquisition_channel: ['Unavailable']
    SELECT 
        CASE
            WHEN "acquisition_channel" = 'Unavailable' THEN NULL
            ELSE "acquisition_channel"
        END AS "acquisition_channel",
        "stat_date",
        "new_downloads",
        "country",
        "redownloads",
        "app_id",
        "total_downloads",
        "meets_threshold"
    FROM "downloads_territory_source_type_projected_renamed"
),

"downloads_territory_source_type_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- stat_date: from VARCHAR to TIMESTAMP
    SELECT
        "acquisition_channel",
        "new_downloads",
        "country",
        "redownloads",
        "total_downloads",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("stat_date" AS TIMESTAMP) AS "stat_date"
    FROM "downloads_territory_source_type_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "downloads_territory_source_type_projected_renamed_null_casted"

stg_downloads_territory_source_type.yml (Document the table)

version: 2
models:
- name: stg_downloads_territory_source_type
  description: The table is about app downloads. It tracks download statistics for
    an app across different territories and source types. Details include the app
    ID, date, source type, territory, download counts (first-time, redownloads, total),
    and whether the data meets a threshold. The table allows analysis of app performance
    by location and acquisition channel over time.
  columns:
  - name: acquisition_channel
    description: Channel or method through which app was downloaded
    tests:
    - not_null
    - accepted_values:
        values:
        - App Store Search
        - Web Referrer
        - App Referrer
        - Google Play Store Search
        - Apple App Store Browse
        - Google Play Store Browse
        - Social Media Ad
        - Search Engine Ad
        - Display Ad
        - Email Campaign
        - SMS Campaign
        - Push Notification
        - Influencer Marketing
        - Word of Mouth
        - QR Code
        - TV Ad
        - Print Ad
        - Radio Ad
        - In-App Promotion
        - App Store Optimization (ASO)
        - Pre-installed
        - App Bundle
  - name: new_downloads
    description: Number of first-time downloads
    tests:
    - not_null
  - name: country
    description: Geographical location of the downloads
    tests:
    - not_null
  - name: redownloads
    description: Number of times the app was redownloaded
    tests:
    - not_null
  - name: total_downloads
    description: Total number of downloads including redownloads
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if data meets a certain threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: stat_date
    description: Date of the download statistics
    tests:
    - not_null

stg_sales_account (first 100 rows)

account_name account_id
0 Super Cool Name 12345

stg_sales_account.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"sales_account_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "name"
    FROM "sales_account"
),

"sales_account_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> account_id
    -- name -> account_name
    SELECT 
        id AS account_id,
        name AS account_name
    FROM sales_account_projected
),

"sales_account_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- account_id: from INT to VARCHAR
    SELECT
        "account_name",
        CAST("account_id" AS VARCHAR) AS "account_id"
    FROM "sales_account_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "sales_account_projected_renamed_casted"

stg_sales_account.yml (Document the table)

version: 2
models:
- name: stg_sales_account
  description: The table is about sales accounts. It contains details of individual
    accounts, including a unique identifier (id) and the account name. Each row likely
    represents a separate sales account in a system or database.
  columns:
  - name: account_name
    description: Name of the sales account
    tests:
    - not_null
  - name: account_id
    description: Unique identifier for the sales account
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the sales account.
        For this table, each row represents a distinct sales account. By definition,
        a unique identifier should be unique across all rows, making it a suitable
        candidate key.

stg_usage_territory_source_type (first 100 rows)

acquisition_source daily_deletions daily_sessions daily_active_devices monthly_active_devices daily_installations territory meets_threshold app_id metric_date
0 App Store Search 2 4 3 44 0 Canada True 12345 2021-05-21
1 None 0 0 0 2 1 Canada True 12345 2020-12-28
2 Web Referrer 0 0 0 0 0 Canada True 12345 2021-06-07
3 None 2 10 5 42 0 Canada True 12345 2021-09-22
4 None 0 1 1 3 0 Canada True 12345 2021-07-02
5 App Store Browse 0 0 0 2 0 Canada True 12345 2021-03-16
6 Web Referrer 0 0 0 0 0 Canada True 12345 2021-09-30
7 App Store Search 0 3 1 2 0 Canada True 12345 2020-11-23
8 None 0 0 0 0 0 Canada True 12345 2021-02-13
9 App Store Search 0 0 0 0 0 Canada True 12345 2021-09-09

stg_usage_territory_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"usage_territory_source_type_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "source_type",
        "territory",
        "meets_threshold",
        "installations",
        "sessions",
        "active_devices",
        "active_devices_last_30_days",
        "deletions"
    FROM "usage_territory_source_type"
),

"usage_territory_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> metric_date
    -- source_type -> acquisition_source
    -- installations -> daily_installations
    -- sessions -> daily_sessions
    -- active_devices -> daily_active_devices
    -- active_devices_last_30_days -> monthly_active_devices
    -- deletions -> daily_deletions
    SELECT 
        app_id,
        date_ AS metric_date,
        source_type AS acquisition_source,
        territory,
        meets_threshold,
        installations AS daily_installations,
        sessions AS daily_sessions,
        active_devices AS daily_active_devices,
        active_devices_last_30_days AS monthly_active_devices,
        deletions AS daily_deletions
    FROM usage_territory_source_type_projected
),

"usage_territory_source_type_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- acquisition_source: ['Unavailable']
    SELECT 
        CASE
            WHEN "acquisition_source" = 'Unavailable' THEN NULL
            ELSE "acquisition_source"
        END AS "acquisition_source",
        "daily_deletions",
        "daily_sessions",
        "daily_active_devices",
        "metric_date",
        "monthly_active_devices",
        "app_id",
        "daily_installations",
        "territory",
        "meets_threshold"
    FROM "usage_territory_source_type_projected_renamed"
),

"usage_territory_source_type_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- metric_date: from VARCHAR to TIMESTAMP
    SELECT
        "acquisition_source",
        "daily_deletions",
        "daily_sessions",
        "daily_active_devices",
        "monthly_active_devices",
        "daily_installations",
        "territory",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("metric_date" AS TIMESTAMP) AS "metric_date"
    FROM "usage_territory_source_type_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "usage_territory_source_type_projected_renamed_null_casted"

stg_usage_territory_source_type.yml (Document the table)

version: 2
models:
- name: stg_usage_territory_source_type
  description: The table is about app usage metrics. It contains data for a specific
    app across different dates, territories, and source types. Key metrics include
    installations, sessions, active devices, and deletions. The data is segmented
    by date, territory, and source type, allowing for analysis of app performance
    across these dimensions.
  columns:
  - name: acquisition_source
    description: Origin of the app installation or interaction
    tests:
    - accepted_values:
        values:
        - App Store Search
        - Web Referrer
        - App Store Browse
        - Direct Link
        - Social Media
        - App Store Ads
        - In-App Referral
        - Email Campaign
        - Push Notification
        - QR Code
        - App Store Featured
        - Google Play Store Search
        - Google Play Store Browse
        - Google Play Store Ads
        - Third-party App Store
        - Pre-installed
        - Word of Mouth
        - TV Advertisement
        - Print Advertisement
        - Influencer Promotion
    cocoon_meta:
      missing_acceptable: Not applicable when user didn't come from a specific source.
  - name: daily_deletions
    description: Number of app deletions on the specific date
    tests:
    - not_null
  - name: daily_sessions
    description: Number of app sessions on the specific date
    tests:
    - not_null
  - name: daily_active_devices
    description: Number of devices using the app on the specific date
    tests:
    - not_null
  - name: monthly_active_devices
    description: Number of devices using the app in the last 30 days
    tests:
    - not_null
  - name: daily_installations
    description: Number of app installations on the specific date
    tests:
    - not_null
  - name: territory
    description: Geographical region where the app is used
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if data meets a certain threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: metric_date
    description: Date of the recorded metrics
    tests:
    - not_null

stg_app_store_platform_version_source_type (first 100 rows)

traffic_source unique_device_impressions unique_device_page_views ios_version total_page_views total_impressions meets_threshold app_id performance_date
0 App Referrer 0 0 iOS 1.0 0 0 True 12345 2021-08-01
1 App Store Search 0 0 iOS 1.0 0 0 True 12345 2021-07-01
2 App Referrer 0 0 iOS 1.0 0 0 True 12345 2021-08-02
3 App Referrer 28 28 iOS 1.0 30 30 True 12345 2021-08-03
4 App Referrer 0 0 iOS 1.0 0 0 True 12345 2021-08-04
5 App Store Browse 0 0 iOS 1.0 0 0 True 12345 2021-04-18
6 App Store Browse 0 0 iOS 1.0 0 0 True 12345 2021-04-19
7 None 0 0 iOS 1.0 0 0 True 12345 2020-12-04
8 Web Referrer 0 0 iOS 1.0 0 0 True 12345 2021-01-31
9 None 0 0 iOS 1.0 0 0 True 12345 2020-12-25

stg_app_store_platform_version_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"app_store_platform_version_source_type_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "platform_version",
        "source_type",
        "meets_threshold",
        "impressions",
        "impressions_unique_device",
        "page_views",
        "page_views_unique_device"
    FROM "app_store_platform_version_source_type"
),

"app_store_platform_version_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> performance_date
    -- platform_version -> ios_version
    -- source_type -> traffic_source
    -- impressions -> total_impressions
    -- impressions_unique_device -> unique_device_impressions
    -- page_views -> total_page_views
    -- page_views_unique_device -> unique_device_page_views
    SELECT 
        app_id,
        date_ AS performance_date,
        platform_version AS ios_version,
        source_type AS traffic_source,
        meets_threshold,
        impressions AS total_impressions,
        impressions_unique_device AS unique_device_impressions,
        page_views AS total_page_views,
        page_views_unique_device AS unique_device_page_views
    FROM app_store_platform_version_source_type_projected
),

"app_store_platform_version_source_type_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- traffic_source: ['Unavailable']
    SELECT 
        CASE
            WHEN "traffic_source" = 'Unavailable' THEN NULL
            ELSE "traffic_source"
        END AS "traffic_source",
        "unique_device_impressions",
        "performance_date",
        "app_id",
        "unique_device_page_views",
        "ios_version",
        "total_page_views",
        "total_impressions",
        "meets_threshold"
    FROM "app_store_platform_version_source_type_projected_renamed"
),

"app_store_platform_version_source_type_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- performance_date: from VARCHAR to TIMESTAMP
    SELECT
        "traffic_source",
        "unique_device_impressions",
        "unique_device_page_views",
        "ios_version",
        "total_page_views",
        "total_impressions",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("performance_date" AS TIMESTAMP) AS "performance_date"
    FROM "app_store_platform_version_source_type_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "app_store_platform_version_source_type_projected_renamed_null_casted"

stg_app_store_platform_version_source_type.yml (Document the table)

version: 2
models:
- name: stg_app_store_platform_version_source_type
  description: The table is about app performance data on the App Store. It includes
    details such as app ID, date, iOS version, traffic source, impression and page
    view metrics. Each row represents a daily snapshot of an app's performance for
    a specific platform version and source type. The data tracks both total and unique
    device metrics for impressions and page views.
  columns:
  - name: traffic_source
    description: Source of app traffic or discovery
    tests:
    - accepted_values:
        values:
        - App Store Browse
        - App Store Search
        - Web Referrer
        - App Referrer
        - Social Media
        - Paid Advertising
        - Word of Mouth
        - Email Campaign
        - Push Notification
        - In-App Promotion
        - App Store Featuring
        - Press/Media Coverage
        - Organic Search
        - Direct Traffic
    cocoon_meta:
      missing_acceptable: No traffic source when impressions and page views are zero.
  - name: unique_device_impressions
    description: Number of unique devices with impressions
    tests:
    - not_null
  - name: unique_device_page_views
    description: Number of unique devices with page views
    tests:
    - not_null
  - name: ios_version
    description: iOS version of the app
    tests:
    - not_null
  - name: total_page_views
    description: Total number of page views for the app
    tests:
    - not_null
  - name: total_impressions
    description: Total number of impressions for the app
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if data meets reporting threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: performance_date
    description: Date of the performance data
    tests:
    - not_null

stg_crashes_platform_version (first 100 rows)

device_type os_version meets_threshold crash_count app_id timestamp_
0 iPad iOS 1.0 True 0 12345 2021-05-10
1 iPad iOS 1.0 True 0 12345 2021-05-13
2 iPod iOS 1.0 True 0 12345 2021-05-12
3 iPod iOS 1.0 True 0 12345 2021-05-11
4 iPhone iOS 1.0 True 0 12345 2021-07-25
5 iPad iOS 1.0 True 0 12345 2021-09-11
6 iPhone iOS 1.0 True 0 12345 2021-04-10
7 iPhone iOS 1.0 True 0 12345 2021-04-11
8 iPhone iOS 1.0 True 0 12345 2021-01-03
9 iPad iOS 1.0 True 0 12345 2021-05-28

stg_crashes_platform_version.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"crashes_platform_version_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "device",
        "platform_version",
        "meets_threshold",
        "crashes"
    FROM "crashes_platform_version"
),

"crashes_platform_version_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> timestamp_
    -- device -> device_type
    -- platform_version -> os_version
    -- crashes -> crash_count
    SELECT 
        app_id,
        date_ AS timestamp_,
        device AS device_type,
        platform_version AS os_version,
        meets_threshold,
        crashes AS crash_count
    FROM crashes_platform_version_projected
),

"crashes_platform_version_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- timestamp_: from VARCHAR to TIMESTAMP
    SELECT
        "device_type",
        "os_version",
        "meets_threshold",
        "crash_count",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("timestamp_" AS TIMESTAMP) AS "timestamp_"
    FROM "crashes_platform_version_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "crashes_platform_version_projected_renamed_casted"

stg_crashes_platform_version.yml (Document the table)

version: 2
models:
- name: stg_crashes_platform_version
  description: The table is about app crashes on different devices. It includes details
    such as app ID, date, device type, platform version, whether it meets a threshold,
    and crash count. Each row represents a specific instance of crash data for an
    app on a particular device and platform version on a given date.
  columns:
  - name: device_type
    description: Type of device (iPad, iPod, iPhone)
    tests:
    - not_null
    - accepted_values:
        values:
        - iPad
        - iPhone
        - iPod
  - name: os_version
    description: Version of the operating system
    tests:
    - not_null
  - name: meets_threshold
    description: Boolean indicating if crash count meets threshold
    tests:
    - not_null
  - name: crash_count
    description: Number of crashes for the app
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: timestamp_
    description: Date and time of the recorded data
    tests:
    - not_null

stg_usage_app_version_source_type (first 100 rows)

source_type daily_deletions daily_sessions daily_active_devices monthly_active_devices daily_installations app_version meets_threshold app_id record_date
0 None 0 0 0 1 0 1.0.0 (iOS) True 12345 2021-03-20
1 App Store Browse 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-08-08
2 None 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-09-25
3 App Store Browse 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-03-26
4 App Store Search 0 0 0 0 0 1.0.0 (iOS) True 12345 2020-11-03
5 App Referrer 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-07-05
6 Institutional Purchase 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-04-01
7 App Store Search 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-07-10
8 None 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-06-08
9 Institutional Purchase 0 0 0 0 0 1.0.0 (iOS) True 12345 2021-10-20

stg_usage_app_version_source_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"usage_app_version_source_type_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "app_version",
        "date_",
        "source_type",
        "meets_threshold",
        "installations",
        "sessions",
        "active_devices",
        "active_devices_last_30_days",
        "deletions"
    FROM "usage_app_version_source_type"
),

"usage_app_version_source_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> record_date
    -- installations -> daily_installations
    -- sessions -> daily_sessions
    -- active_devices -> daily_active_devices
    -- active_devices_last_30_days -> monthly_active_devices
    -- deletions -> daily_deletions
    SELECT 
        app_id,
        app_version,
        date_ AS record_date,
        source_type,
        meets_threshold,
        installations AS daily_installations,
        sessions AS daily_sessions,
        active_devices AS daily_active_devices,
        active_devices_last_30_days AS monthly_active_devices,
        deletions AS daily_deletions
    FROM usage_app_version_source_type_projected
),

"usage_app_version_source_type_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- source_type: ['Unavailable']
    SELECT 
        CASE
            WHEN "source_type" = 'Unavailable' THEN NULL
            ELSE "source_type"
        END AS "source_type",
        "daily_deletions",
        "daily_sessions",
        "daily_active_devices",
        "record_date",
        "monthly_active_devices",
        "app_id",
        "daily_installations",
        "app_version",
        "meets_threshold"
    FROM "usage_app_version_source_type_projected_renamed"
),

"usage_app_version_source_type_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- record_date: from VARCHAR to TIMESTAMP
    SELECT
        "source_type",
        "daily_deletions",
        "daily_sessions",
        "daily_active_devices",
        "monthly_active_devices",
        "daily_installations",
        "app_version",
        "meets_threshold",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("record_date" AS TIMESTAMP) AS "record_date"
    FROM "usage_app_version_source_type_projected_renamed_null"
)

-- COCOON BLOCK END
SELECT * FROM "usage_app_version_source_type_projected_renamed_null_casted"

stg_usage_app_version_source_type.yml (Document the table)

version: 2
models:
- name: stg_usage_app_version_source_type
  description: The table is about app usage data. It contains details of an app's
    performance metrics across different versions, dates, and source types. The data
    includes installations, sessions, active devices, and deletions. Each row represents
    a specific combination of app version, date, and source type, providing a comprehensive
    view of the app's usage patterns and user engagement over time.
  columns:
  - name: source_type
    description: Source of app discovery or installation
    tests:
    - accepted_values:
        values:
        - App Store Browse
        - App Store Search
        - Institutional Purchase
        - App Referrer
        - Website
        - Social Media
        - Advertisement
        - Word of Mouth
        - Pre-installed
        - App Store Featured
        - Third-party App Store
        - QR Code
        - Email Campaign
        - Push Notification
    cocoon_meta:
      missing_acceptable: No specific source for app discovery or installation.
  - name: daily_deletions
    description: Number of app deletions on the given date
    tests:
    - not_null
  - name: daily_sessions
    description: Number of app sessions on the given date
    tests:
    - not_null
  - name: daily_active_devices
    description: Number of active devices on the given date
    tests:
    - not_null
  - name: monthly_active_devices
    description: Number of active devices in the past 30 days
    tests:
    - not_null
  - name: daily_installations
    description: Number of app installations on the given date
    tests:
    - not_null
  - name: app_version
    description: Version of the app and platform
    tests:
    - not_null
  - name: meets_threshold
    description: Indicates if data meets reporting threshold
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: record_date
    description: Date of the recorded data
    tests:
    - not_null

stg_app (first 100 rows)

is_enabled app_name asset_token icon_url user_opt_in_percentage is_ios_compatible is_tvos_compatible is_bundle app_id pre_order_details
0 True Super Cool Name Random Asset Token Random Icon URL 10 True False False 12345 None

stg_app.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"app_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "is_enabled",
        "name",
        "asset_token",
        "pre_order_info",
        "icon_url",
        "app_opt_in_rate",
        "ios",
        "tvos",
        "is_bundle"
    FROM "app"
),

"app_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> app_id
    -- name -> app_name
    -- pre_order_info -> pre_order_details
    -- app_opt_in_rate -> user_opt_in_percentage
    -- ios -> is_ios_compatible
    -- tvos -> is_tvos_compatible
    SELECT 
        id AS app_id,
        is_enabled,
        name AS app_name,
        asset_token,
        pre_order_info AS pre_order_details,
        icon_url,
        app_opt_in_rate AS user_opt_in_percentage,
        ios AS is_ios_compatible,
        tvos AS is_tvos_compatible,
        is_bundle
    FROM app_projected
),

"app_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- pre_order_details: from DECIMAL to VARCHAR
    SELECT
        "is_enabled",
        "app_name",
        "asset_token",
        "icon_url",
        "user_opt_in_percentage",
        "is_ios_compatible",
        "is_tvos_compatible",
        "is_bundle",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("pre_order_details" AS VARCHAR) AS "pre_order_details"
    FROM "app_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "app_projected_renamed_casted"

stg_app.yml (Document the table)

version: 2
models:
- name: stg_app
  description: The table is about mobile applications. It contains details like the
    app's ID, name, whether it's enabled, asset token, icon URL, opt-in rate, platform
    availability (iOS, tvOS), and if it's a bundle. The table also includes a pre-order
    info field and tracks the app's opt-in rate. This data likely represents an app
    store or app management system's database.
  columns:
  - name: is_enabled
    description: Indicates if the app is currently active
    tests:
    - not_null
  - name: app_name
    description: The name of the application
    tests:
    - not_null
  - name: asset_token
    description: Unique identifier for app's digital assets
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for app's digital assets. It's
        likely to be unique for each app, as digital assets are typically managed
        individually.
  - name: icon_url
    description: URL link to the app's icon image
    tests:
    - not_null
  - name: user_opt_in_percentage
    description: Percentage of users who opted in to the app
    tests:
    - not_null
  - name: is_ios_compatible
    description: Indicates if the app is available on iOS
    tests:
    - not_null
  - name: is_tvos_compatible
    description: Indicates if the app is available on tvOS
    tests:
    - not_null
  - name: is_bundle
    description: Indicates if the app is part of a bundle
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the app. In
        app management systems, app_id is typically designed to be a unique identifier
        for each application.
  - name: pre_order_details
    description: Information about app pre-order availability
    cocoon_meta:
      missing_acceptable: Not applicable for apps not available for pre-order.

stg_usage_source_type_device (first 100 rows)

device_type acquisition_channel meets_threshold new_installations daily_sessions daily_active_devices monthly_active_devices app_deletions app_id record_date
0 iPhone App Store Browse True 6 58 34 406 3 12345 2020-11-27
1 Desktop App Store Search True 0 0 0 0 0 12345 2021-01-04
2 iPad App Referrer True 1 8 5 60 0 12345 2021-05-18
3 iPod App Store Browse True 0 0 0 0 0 12345 2021-09-10
4 iPad App Store Browse True 1 10 3 55 1 12345 2021-02-19
5 iPod Web Referrer True 0 0 0 1 0 12345 2021-07-13
6 iPod Web Referrer True 0 0 0 1 0 12345 2020-12-22
7 iPod Web Referrer True 0 0 0 1 0 12345 2021-07-31
8 iPad Web Referrer True 6 41 21 273 1 12345 2021-01-09
9 iPhone App Store Browse True 3 65 27 374 5 12345 2021-02-25

stg_usage_source_type_device.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"usage_source_type_device_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "date_",
        "device",
        "source_type",
        "meets_threshold",
        "installations",
        "sessions",
        "active_devices",
        "active_devices_last_30_days",
        "deletions"
    FROM "usage_source_type_device"
),

"usage_source_type_device_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> record_date
    -- device -> device_type
    -- source_type -> acquisition_channel
    -- installations -> new_installations
    -- sessions -> daily_sessions
    -- active_devices -> daily_active_devices
    -- active_devices_last_30_days -> monthly_active_devices
    -- deletions -> app_deletions
    SELECT 
        app_id,
        date_ AS record_date,
        device AS device_type,
        source_type AS acquisition_channel,
        meets_threshold,
        installations AS new_installations,
        sessions AS daily_sessions,
        active_devices AS daily_active_devices,
        active_devices_last_30_days AS monthly_active_devices,
        deletions AS app_deletions
    FROM usage_source_type_device_projected
),

"usage_source_type_device_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- record_date: from VARCHAR to TIMESTAMP
    SELECT
        "device_type",
        "acquisition_channel",
        "meets_threshold",
        "new_installations",
        "daily_sessions",
        "daily_active_devices",
        "monthly_active_devices",
        "app_deletions",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("record_date" AS TIMESTAMP) AS "record_date"
    FROM "usage_source_type_device_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "usage_source_type_device_projected_renamed_casted"

stg_usage_source_type_device.yml (Document the table)

version: 2
models:
- name: stg_usage_source_type_device
  description: The table is about app usage statistics. It includes details such as
    app ID, date, device type, source type, installation count, session count, active
    devices, and deletions. The data is broken down by device (iPhone, iPad, Desktop,
    iPod) and source type (App Store Browse, App Store Search, App Referrer). It tracks
    various metrics over time to analyze app performance and user engagement across
    different platforms and acquisition channels.
  columns:
  - name: device_type
    description: Type of device (iPhone, iPad, Desktop, iPod)
    tests:
    - not_null
    - accepted_values:
        values:
        - iPod
        - iPad
        - iPhone
        - Desktop
  - name: acquisition_channel
    description: Channel through which the app was discovered
    tests:
    - not_null
    - accepted_values:
        values:
        - App Store Browse
        - Web Referrer
        - App Referrer
        - App Store Search
        - Social Media
        - Word of Mouth
        - Email Marketing
        - Push Notification
        - In-App Promotion
        - App Store Ads
        - Google Ads
        - Facebook Ads
        - Instagram Ads
        - Twitter Ads
        - YouTube Ads
        - Influencer Marketing
        - TV Advertising
        - Print Media
        - Outdoor Advertising
        - Event Marketing
        - App Store Featuring
        - Pre-installed on Device
        - App Bundles
        - Cross-Promotion
        - QR Code
        - Direct URL
  - name: meets_threshold
    description: Boolean indicating if data meets reporting threshold
    tests:
    - not_null
  - name: new_installations
    description: Number of new app installations on the given date
    tests:
    - not_null
  - name: daily_sessions
    description: Number of app sessions on the given date
    tests:
    - not_null
  - name: daily_active_devices
    description: Number of active devices on the given date
    tests:
    - not_null
  - name: monthly_active_devices
    description: Number of active devices in the last 30 days
    tests:
    - not_null
  - name: app_deletions
    description: Number of app deletions on the given date
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the app
    tests:
    - not_null
  - name: record_date
    description: Date of the recorded data
    tests:
    - not_null

stg_crashes_app_version (first 100 rows)

app_version device_type threshold_met crash_count app_id record_datetime
0 1.0.0 (iOS) iPod True 0 12345 2021-01-22
1 1.0.0 (iOS) iPod True 0 12345 2021-01-07
2 1.0.0 (iOS) iPod True 0 12345 2020-11-15
3 1.0.0 (iOS) iPhone True 0 12345 2021-06-18
4 1.0.0 (iOS) Desktop True 0 12345 2021-07-30
5 1.0.0 (iOS) iPad True 0 12345 2021-05-02
6 1.0.0 (iOS) iPhone True 0 12345 2021-06-19
7 1.0.0 (iOS) Desktop True 0 12345 2020-12-05
8 1.0.0 (iOS) iPod True 0 12345 2020-12-21
9 1.0.0 (iOS) iPod True 0 12345 2021-04-12

stg_crashes_app_version.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"crashes_app_version_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "app_id",
        "app_version",
        "date_",
        "device",
        "meets_threshold",
        "crashes"
    FROM "crashes_app_version"
),

"crashes_app_version_projected_renamed" AS (
    -- Rename: Renaming columns
    -- date_ -> record_datetime
    -- device -> device_type
    -- meets_threshold -> threshold_met
    -- crashes -> crash_count
    SELECT 
        app_id,
        app_version,
        date_ AS record_datetime,
        device AS device_type,
        meets_threshold AS threshold_met,
        crashes AS crash_count
    FROM crashes_app_version_projected
),

"crashes_app_version_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- app_id: from INT to VARCHAR
    -- record_datetime: from VARCHAR to TIMESTAMP
    SELECT
        "app_version",
        "device_type",
        "threshold_met",
        "crash_count",
        CAST("app_id" AS VARCHAR) AS "app_id",
        CAST("record_datetime" AS TIMESTAMP) AS "record_datetime"
    FROM "crashes_app_version_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "crashes_app_version_projected_renamed_casted"

stg_crashes_app_version.yml (Document the table)

version: 2
models:
- name: stg_crashes_app_version
  description: The table is about app crash data. It includes the app ID, app version,
    date, device type, a threshold indicator, and number of crashes. Each row represents
    a specific instance of crash data for an app version on a particular device and
    date. The data appears to track crash occurrences across different versions and
    devices over time.
  columns:
  - name: app_version
    description: Version of the app, including platform
    tests:
    - not_null
  - name: device_type
    description: Type of device running the app
    tests:
    - not_null
    - accepted_values:
        values:
        - iPod
        - Desktop
        - iPhone
        - iPad
        - Android Phone
        - Android Tablet
        - Windows Phone
        - Windows Tablet
        - MacBook
        - Smart TV
        - Gaming Console
        - Smartwatch
        - E-reader
  - name: threshold_met
    description: Indicator if crash count meets certain threshold
    tests:
    - not_null
  - name: crash_count
    description: Number of crashes recorded
    tests:
    - not_null
  - name: app_id
    description: Unique identifier for the application
    tests:
    - not_null
  - name: record_datetime
    description: Date and time of the recorded data
    tests:
    - not_null
Some tables log change events, which may be redundant to query. Instead, we take a snapshot of the latest.

We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_crashes_platform_version stg_crashes_platform_version stg_app stg_app stg_crashes_platform_version->stg_app stg_app_store_territory_source_type stg_app_store_territory_source_type stg_app_store_territory_source_type->stg_app stg_usage_territory_source_type stg_usage_territory_source_type stg_usage_territory_source_type->stg_app stg_usage_app_version_source_type stg_usage_app_version_source_type stg_usage_app_version_source_type->stg_app stg_usage_source_type_device stg_usage_source_type_device stg_usage_source_type_device->stg_app stg_sales_subscription_events stg_sales_subscription_events stg_sales_subscription_events->stg_app stg_sales_account stg_sales_account stg_sales_subscription_events->stg_sales_account stg_app_store_platform_version_source_type stg_app_store_platform_version_source_type stg_app_store_platform_version_source_type->stg_app stg_downloads_source_type_device stg_downloads_source_type_device stg_downloads_source_type_device->stg_app stg_downloads_platform_version_source_type stg_downloads_platform_version_source_type stg_downloads_platform_version_source_type->stg_app stg_downloads_territory_source_type stg_downloads_territory_source_type stg_downloads_territory_source_type->stg_app stg_usage_platform_version_source_type stg_usage_platform_version_source_type stg_usage_platform_version_source_type->stg_app stg_sales_subscription_summary stg_sales_subscription_summary stg_sales_subscription_summary->stg_app stg_sales_subscription_summary->stg_sales_account stg_app_store_source_type_device stg_app_store_source_type_device stg_app_store_source_type_device->stg_app stg_crashes_app_version stg_crashes_app_version stg_crashes_app_version->stg_app

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_app
  primary_key: app_id
  foreign_keys: []
- table_name: stg_app_store_platform_version_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_app_store_source_type_device
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_app_store_territory_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_crashes_app_version
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_crashes_platform_version
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_downloads_platform_version_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_downloads_source_type_device
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_downloads_territory_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_usage_app_version_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_usage_platform_version_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_usage_source_type_device
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_usage_territory_source_type
  foreign_keys:
  - column: app_id
    reference:
      table_name: stg_app
      column: app_id
- table_name: stg_sales_subscription_events
  foreign_keys:
  - column: app_apple_id
    reference:
      table_name: stg_app
      column: app_id
  - column: account_number
    reference:
      table_name: stg_sales_account
      column: account_id
- table_name: stg_sales_subscription_summary
  foreign_keys:
  - column: app_apple_id
    reference:
      table_name: stg_app
      column: app_id
  - column: account_number
    reference:
      table_name: stg_sales_account
      column: account_id
- table_name: stg_sales_account
  primary_key: account_id
  foreign_keys: []
We identify the entities and relationships behind the tables, and tell the story among these relationships.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Applications
  entity_description: Represents mobile applications in an app store or app management
    system, containing details about each app's properties, availability, and performance
    metrics.
  table_name: stg_app
  primary_key: app_id
- entity_name: Sales Accounts
  entity_description: Represents individual sales accounts in a sales management system,
    containing basic identification information for each account.
  table_name: stg_sales_account
  primary_key: account_id
relations:
- relation_description: This table tracks daily performance metrics for Applications
    on the App Store across different iOS versions and traffic sources.
  table_name: stg_app_store_platform_version_source_type
  entities:
  - Applications
- relation_description: This table tracks performance metrics for Applications across
    different devices, discovery sources, and dates in app stores.
  table_name: stg_app_store_source_type_device
  entities:
  - Applications
- relation_description: This table stores performance metrics and engagement data
    for Applications in the App Store across different territories and discovery sources.
  table_name: stg_app_store_territory_source_type
  entities:
  - Applications
- relation_description: This table tracks crash data for different versions of Applications
    across various device types and dates.
  table_name: stg_crashes_app_version
  entities:
  - Applications
- relation_description: This table records crash data for Applications across different
    devices, platform versions, and dates.
  table_name: stg_crashes_platform_version
  entities:
  - Applications
- relation_description: This table tracks download statistics for Applications across
    different platform versions, download sources, and dates.
  table_name: stg_downloads_platform_version_source_type
  entities:
  - Applications
- relation_description: This stores the download statistics for Applications, including
    details about device types, sources, and download counts.
  table_name: stg_downloads_source_type_device
  entities:
  - Applications
- relation_description: This table tracks download statistics for Applications across
    different territories and acquisition channels over time.
  table_name: stg_downloads_territory_source_type
  entities:
  - Applications
- relation_description: This table captures detailed usage metrics for Applications
    across different versions, dates, and source types.
  table_name: stg_usage_app_version_source_type
  entities:
  - Applications
- relation_description: This table stores usage metrics and acquisition data for Applications
    across different platform versions and sources.
  table_name: stg_usage_platform_version_source_type
  entities:
  - Applications
- relation_description: This table tracks usage statistics and performance metrics
    for Applications across different devices and acquisition channels over time.
  table_name: stg_usage_source_type_device
  entities:
  - Applications
- relation_description: This table tracks usage metrics for Applications across different
    territories, dates, and acquisition sources.
  table_name: stg_usage_territory_source_type
  entities:
  - Applications
- relation_name: ApplicationSubscriptionEvents
  relation_description: This tracks subscription events for Applications, which are
    purchased and managed by Sales Accounts across various countries and devices.
  table_name: stg_sales_subscription_events
  entities:
  - Applications
  - Sales Accounts
- relation_name: ApplicationSubscriptionSummary
  relation_description: This table reports daily subscription summary for Applications,
    linking them to Sales Accounts and capturing detailed revenue.
  table_name: stg_sales_subscription_summary
  entities:
  - Applications
  - Sales Accounts
story:
- relation_name: ApplicationSubscriptionEvents
  story_line: Sales Accounts manages Application subscriptions.
- relation_name: ApplicationSubscriptionSummary
  story_line: System generates daily revenue reports for Application subscriptions for sales accounts.