Fivetran Shopify

Below are the source tables from the data warehouse that we want to model.

shopify_abandoned_checkout_data (first 100 rows)

id _fivetran_synced abandoned_checkout_url applied_discount_amount applied_discount_applicable applied_discount_description applied_discount_non_applicable_reason applied_discount_title applied_discount_value applied_discount_value_type billing_address_address_1 billing_address_address_0 billing_address_city billing_address_company billing_address_country billing_address_country_code billing_address_first_name billing_address_last_name billing_address_latitude billing_address_longitude billing_address_name billing_address_phone billing_address_province billing_address_province_code billing_address_zip buyer_accepts_marketing cart_token closed_at completed_at created_at credit_card_first_name credit_card_last_name credit_card_month credit_card_number credit_card_verification_value credit_card_year currency customer_id customer_locale device_id email gateway landing_site_base_url location_id name note phone referring_site shipping_address_address_1 shipping_address_address_0 shipping_address_city shipping_address_company shipping_address_country shipping_address_country_code shipping_address_first_name shipping_address_last_name shipping_address_latitude shipping_address_longitude shipping_address_name shipping_address_phone shipping_address_province shipping_address_province_code shipping_address_zip shipping_line shipping_rate_id shipping_rate_price shipping_rate_title source source_identifier source_name source_url subtotal_price taxes_included token total_discounts total_line_items_price total_price total_tax total_weight updated_at user_id note_attribute_littledata_updated_at note_attribute_segment_client_id billing_address_id billing_address_is_default presentment_currency shipping_address_id shipping_address_is_default total_duties note_attribute_email_client_id note_attributes note_attribute_google_client_id _fivetran_deleted
0 12111 2020-06-03 11:11:51.015110 https://kitties.com/1111311610/checkouts/f050eda125a10cca513162f01101b261/recover?key=bd0fdf1dc1a1af01aecbdaa3101ec063 NaN NaN NaN NaN NaN NaN NaN None None None NaN None None None None NaN NaN None NaN None None NaN False aaaa211622dfb133 NaN NaN 2020-11-12 10:06:50.111111 NaN NaN NaN NaN NaN NaN USD 121 en NaN tnyrnbs@hh.com paypal /collections/the-archive-sale NaN #10160311 NaN NaN None 123 main st Apt 02 Washington NaN United States US Pauly D 31.111511 -26.112602 DJ PAULY D (115) 061-1012 District of Columbia DC 12305 NaN NaN NaN NaN NaN NaN web NaN 56.00 False f050eda12f111b261 1.00 560.0 501.36 13.36 1 2020-11-12 10:51:10.111111 NaN NaN None NaN NaN None NaN NaN NaN NaN None NaN NaN
1 11111 2020-01-11 06:01:35.021111 https://kitties.com/1111311610/checkouts/6661ff02165dfd11b12db112f0111226/recover?key=51611efdff11e0caccc0fd30b0e1e202 NaN NaN NaN NaN NaN NaN NaN village Apt 0 daytona Beach NaN Florida US ohio Calles 1.126113 -21.502661 hi 5.026611e+10 Healdsburg PA-11 NaN False 611faa630ce5e6bcc0bacc2a105c0126 NaN NaN 2020-05-11 01:01:30.111111 NaN NaN NaN NaN NaN NaN USD 366525 en NaN hyrehher@gmail.com None /collections/sale NaN #13311 NaN NaN https://www.google.com/ 123 main st Pty 3 ghreiuhtg NaN United States US ohio Calle pty115 NaN NaN ohio Calle pty115 +12161115152 Florida FL 33120 NaN NaN NaN NaN NaN NaN web NaN 10.35 False a165dfd11226 16.65 111.0 10.35 1.00 1 2020-05-11 01:06:35.111111 NaN NaN None NaN NaN None NaN NaN NaN NaN None NaN NaN
2 66531 2021-11-11 11:02:30.112110 https://kitties.com/1111311610/checkouts/0abddd111c0211f1e616ec0d0c32021c/recover?key=abed6505d26f1a60a50aa0c02e01be31 NaN NaN NaN NaN NaN NaN NaN None None None NaN None None None None NaN NaN None NaN None None NaN False aaaaa61e1d11af3adfac1f0 NaN NaN 2021-11-11 02:05:13.111111 NaN NaN NaN NaN NaN NaN USD 160363 en NaN hernebbe@hr.com None /collections/new NaN #166531 NaN NaN https://l.facebook.com/ 11-01 01st St apt 0C Springfield NaN United States US dan the man NaN NaN dan the man +13021115311 New York NY 11111-020 NaN NaN NaN NaN NaN NaN web NaN 191.00 False l1abddd111c0211f2021c 1.00 111.0 111.00 1.00 1 2021-11-11 02:05:55.111111 NaN 125150.0 a111c-30fc-0bb6-a25e-06f201c6035c NaN NaN USD NaN NaN NaN NaN [{"name":"segment-clientID","value":"610a111c-30fc-0bb6-a25e-06f201c6035c"},{"name":"_updatedAt","value":"1613121625150"}] NaN NaN
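
The note_attributes column in the last sample row above holds a JSON array of name/value pairs. A minimal sketch of flattening such a string into a dictionary, using only Python's standard json module. The helper name note_attributes_to_dict is hypothetical and is not part of any Fivetran or Shopify package, and treating non-string input as "no attributes" is an assumption about how missing values arrive:

```python
import json

def note_attributes_to_dict(raw):
    """Convert a note_attributes JSON string into {name: value}.

    Non-string input (for example a NaN placeholder for a missing value)
    yields an empty dict. This handling is an assumption.
    """
    if not isinstance(raw, str):
        return {}
    return {pair["name"]: pair["value"] for pair in json.loads(raw)}

# Value copied from the sample row with id 66531 above.
sample = (
    '[{"name":"segment-clientID","value":"610a111c-30fc-0bb6-a25e-06f201c6035c"},'
    '{"name":"_updatedAt","value":"1613121625150"}]'
)
attrs = note_attributes_to_dict(sample)
print(attrs["segment-clientID"])
```

The values stay as strings here; converting _updatedAt to a timestamp would be a separate modeling decision.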

shopify_abandoned_checkout_discount_code_data (first 100 rows)

checkout_id index_ _fivetran_synced amount discount_id code created_at type updated_at usage_count
0 901163 0 2022-12-07 06:49:37.929000 0.0 NaN CYBER12 NaN percentage NaN NaN
1 4334827 0 2022-12-07 06:49:37.926000 0.0 NaN CYBER12 NaN percentage NaN NaN
2 4566403 0 2022-12-07 06:49:33.182000 0.0 NaN BONUS NaN percentage NaN NaN

shopify_abandoned_checkout_shipping_line_data (first 100 rows)

checkout_id index_ _fivetran_synced api_client_id carrier_identifier carrier_service_id code delivery_category discounted_price id markup phone price requested_fulfillment_service_id source title validation_context delivery_expectation_range delivery_expectation_type original_shop_markup original_shop_price presentment_title delivery_expectation_range_min delivery_expectation_range_max
0 653675 1 2023-01-09 06:48:18.093000 NaN NaN NaN Standard NaN NaN c3ce0972c2e30eaf7001bea 0.0 NaN 0.0 NaN shopify Standard NaN NaN NaN 0.0 0.0 Standard NaN NaN
1 379 1 2023-01-09 06:48:23.540000 NaN NaN NaN Standard NaN NaN bf7c90953344902c13 0.0 NaN 0.0 NaN shopify Standard NaN NaN NaN 0.0 0.0 Standard NaN NaN
2 635 1 2023-01-09 06:48:24.243000 NaN NaN NaN Standard NaN NaN 519ff4275cd972e282db 0.0 NaN 0.0 NaN shopify Standard NaN NaN NaN 0.0 0.0 Standard NaN NaN
3 3211 1 2023-01-09 06:48:18.068000 NaN NaN NaN Standard NaN NaN 8d18671d481ad46a 0.0 NaN 0.0 NaN shopify Standard NaN NaN NaN 0.0 0.0 Standard NaN NaN
4 381227 1 2023-01-09 06:48:16.985000 NaN NaN NaN Standard NaN NaN 8f2fab1b455ec9e597 0.0 NaN 0.0 NaN shopify Standard NaN NaN NaN 0.0 0.0 Standard NaN NaN

shopify_collection_data (first 100 rows)

id _fivetran_deleted _fivetran_synced handle published_at published_scope title updated_at disjunctive rules sort_order template_suffix body_html
0 997355 True 2021-09-01 05:53:25.838000 NaN NaN NaN NaN 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN
1 9930779 True 2021-09-01 05:53:26.673000 NaN NaN NaN NaN 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN
2 99967 True 2022-04-08 06:52:19.524000 NaN NaN NaN NaN 1970-01-01 00:00:00.000000 NaN NaN NaN NaN NaN

shopify_collection_product_data (first 100 rows)

collection_id product_id _fivetran_synced
0 37124 789131 2022-11-18 21:32:43.188000
1 9037124 74353899 2022-11-18 21:32:43.188000
2 37124 8891 2022-11-18 21:32:43.188000

shopify_customer_data (first 100 rows)

id first_name last_name email phone state orders_count total_spent created_at updated_at accepts_marketing tax_exempt verified_email default_address_id _fivetran_synced
0 3588998496353 29e00d3659d1c5e75f99e892f0c1a1f1 3f0e6a46fb84eb1e6f5f00d86aa53b1b ab0bf25ab8b2a6b78af26a141dd6f455 NaN disabled 0 0.00 2020-09-11 13:26:15.000 2020-09-11 13:26:15.000 False False True 3951726461025 2020-09-12 00:14:04.512
1 3589760876641 f0962b7a185488ecb752cedac1038349 aa35cb67c26e64bb81a1bf3f17e858ba 021cb20b5c78751fc7ddc091b6b69b3e NaN invited 1 2.80 2020-09-11 19:35:42.000 2020-09-11 19:41:04.000 True False True 3952669655137 2020-09-12 00:14:04.506
2 3584045351009 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 dce90c7b4e52e045e5975836aff49cf1 NaN disabled 2 9.18 2020-09-09 22:57:44.000 2020-09-09 23:01:55.000 False False True 3946055729249 2020-09-10 00:13:59.106

shopify_customer_tag_data (first 100 rows)

customer_id index_ _fivetran_synced value_
0 9919268 1 2022-12-03 06:49:03.314000 GGPP
1 4404 1 2022-12-03 06:48:53.295000 GGPP
2 5509188 1 2022-12-03 06:48:55.067000 GGPP

shopify_discount_code_data (first 100 rows)

id _fivetran_synced code created_at price_rule_id updated_at usage_count
0 4773499 2021-12-10 07:04:44.670000 CHECKVB34DDBQ3VH 2021-12-10 06:48:35.000000 32543 2021-12-10 06:48:35.000000 0.0
1 436267 2021-12-10 07:04:44.670000 CHECKVBLJG22DDD 2021-12-10 06:48:35.000000 12543 2021-12-10 06:48:35.000000 0.0
2 469035 2021-12-10 07:04:44.670000 CHECKV44CCCBCWB7 2021-12-10 06:48:35.000000 12543 2021-12-10 06:48:35.000000 0.0

shopify_fulfillment_data (first 100 rows)

id _fivetran_synced created_at location_id order_id status tracking_company tracking_number updated_at tracking_numbers tracking_urls shipment_status service name receipt_authorization
0 423844 2022-11-22 08:06:32.902000 2019-07-13 01:17:22.000000 123548 1228100 success NaN NaN 2019-07-13 01:17:22.000000 [] [] NaN manual #151212.1 NaN
1 8308 2022-11-22 08:06:33.863000 2019-07-13 01:17:21.000000 548 1274564 success NaN NaN 2019-07-13 01:17:22.000000 [] [] NaN manual #152317.1 NaN
2 548932 2022-11-22 08:06:56.262000 2019-07-13 01:17:21.000000 12348 1284 success NaN NaN 2019-07-13 01:17:21.000000 [] [] NaN manual #1555923.1 NaN

shopify_fulfillment_event_data (first 100 rows)

id _fivetran_synced address_1 city country created_at estimated_delivery_at fulfillment_id happened_at latitude longitude message order_id province shop_id status updated_at zip _fivetran_deleted
0 451435 2022-11-18 04:39:07.945000 NaN None None 2022-08-29 20:52:39.000000 None 40495 2022-08-29 20:52:39.000000 NaN NaN None 4502987 None 89440612 delivered 2022-08-29 20:52:39.000000 None False
1 48779 2022-11-18 05:48:01.773000 NaN LONDON GB 2022-09-13 08:07:57.000000 None 4064737 2022-08-15 12:41:00.000000 101.349998 -14.033300 Delay 4588203 None 320612 out_for_delivery 2022-09-13 08:07:57.000000 CR0 False
2 1481515 2022-11-18 05:41:00.745000 NaN ECHO PARK AU 2022-09-14 14:16:52.000000 2022-09-14 08:00:00.000000 4019339 2022-09-14 01:26:00.000000 -3.797699 190.783958 Delay 451915 None 89320612 delayed 2022-09-14 14:16:52.000000 2759 False
3 558955 2022-11-18 10:51:24.286000 NaN LAZYTOWN US 2022-08-13 12:40:26.000000 None 402947 2022-03-01 10:36:39.000000 22.337700 -71.731003 Delay 429188587 MA 89420612 in_transit 2022-08-13 12:40:26.000000 01505 False
4 6904235 2022-11-18 08:58:00.458000 NaN LA US 2022-08-24 06:29:21.000000 2022-08-24 23:59:59.000000 4060491 2022-08-24 05:30:57.000000 12.287498 -21.357399 Delay 4242667 MA 89420612 in_transit 2022-08-24 06:29:21.000000 01760 False

shopify_inventory_item_data (first 100 rows)

id _fivetran_synced cost created_at requires_shipping sku tracked updated_at country_code_of_origin province_code_of_origin _fivetran_deleted
0 4555 2021-12-18 06:56:22.877000 NaN NaN NaN NaN NaN NaN NaN NaN True
1 501419 2022-02-25 06:52:29.767000 NaN NaN NaN NaN NaN NaN NaN NaN True
2 851179 2022-02-24 06:52:33.361000 NaN NaN NaN NaN NaN NaN NaN NaN True

shopify_inventory_level_data (first 100 rows)

inventory_item_id location_id _fivetran_synced available updated_at
0 780939 287748 2021-11-13 08:02:21.760000 NaN NaN
1 6027 287748 2021-11-13 08:02:21.760000 NaN NaN
2 515 28748 2021-11-06 08:04:16.213000 NaN NaN

shopify_location_data (first 100 rows)

id _fivetran_synced active address_1 address_2 city country created_at legacy name phone province updated_at zip country_code country_name localized_country_name localized_province_name province_code _fivetran_deleted
0 8777748 2022-12-07 06:43:31.005000 True None NaN None US 2019-06-11 15:58:20.000000 True Plum NaN None 2019-06-11 15:58:20.000000 NaN US United States United States None None False
1 7748 2022-12-07 06:43:31.005000 True 111 Tree Road NaN Tree US 2018-12-10 16:24:07.000000 False Plum Express NaN NY 2019-05-16 13:37:39.000000 7394.0 US United States United States New York NY False

shopify_metafield_data (first 100 rows)

id _fivetran_synced created_at description key_ namespace owner_id owner_resource updated_at value_ value_type type
0 5445055 2022-11-19 10:06:09.531000 2019-10-28 20:06:39.000000 NaN returnAuthorizations blade_runner 390244 order 2019-10-28 20:06:39.000000 [{"id":"ce95-49e4-9daf-41f29bbbb799","totalValue":44444,"status":"RECEIVED","payload":{"totalReturnValue":4444,"validReturnItems":[{"UPC":"19073825552","Quantity":"1","Reason":"changed-mind","LineItem":"40055558892132"}]},"createdAt":"2019-10-28T20:06:39.569Z","modifiedAt":"2019-10-28T20:06:39.569Z"}] NaN json_string
1 6337647 2022-11-21 01:57:33.851000 2020-06-17 11:35:28.000000 NaN returnAuthorizations blade_runner 254671 order 2020-06-17 11:35:28.000000 [{"id":"557ece73-658b-cf694dcd3f7e","totalValue":4444,"status":"RECEIVED","payload":{"totalReturnValue":444.77,"validReturnItems":[{"UPC":"19055550468","Quantity":"1","Reason":"fit-issues","LineItem":"4935555579471"}]},"createdAt":"2020-06-17T11:35:28.469Z","modifiedAt":"2020-06-17T11:35:28.470Z"}] NaN json_string
2 576111 2022-11-21 03:19:59.064000 2020-06-10 18:35:44.000000 NaN returnAuthorizations blade_runner 22527 order 2020-06-10 18:35:44.000000 [{"id":"e461c20a-9dc7-d38de1c9012a","totalValue":4444,"status":"RECEIVED","payload":{"totalReturnValue":444,"validReturnItems":[{"UPC":"190735551121","Quantity":"1","Reason":"too-big","LineItem":"4925555231"}]},"createdAt":"2020-06-10T18:35:44.043Z","modifiedAt":"2020-06-10T18:35:44.043Z"}] NaN json_string
3 55241839 2022-11-21 01:29:09.347000 2020-07-15 21:24:16.000000 NaN returnAuthorizations blade_runner 2335775 order 2020-07-15 21:24:16.000000 [{"id":"0c79163e-f55b56f50aff","totalValue":44478.000000000004,"status":"RECEIVED","payload":{"totalReturnValue":4444.78000000000003,"validReturnItems":[{"UPC":"190555325","Quantity":"1","Reason":"fit-issues","LineItem":"5555599407"}]},"createdAt":"2020-07-15T21:24:16.210Z","modifiedAt":"2020-07-15T21:24:16.210Z"}] NaN json_string
4 4575 2022-11-21 03:07:20.669000 2020-06-24 17:23:12.000000 NaN returnAuthorizations blade_runner 220655 order 2020-06-24 17:23:12.000000 [{"id":"3679-4811-94fd-555bf9846753","totalValue":44581,"status":"BACKEND_GENERATED","payload":{"totalReturnValue":4444.81,"validReturnItems":[{"UPC":"190735558","Quantity":1,"Reason":"Changed My Mind","LineItem":"455555711"}]},"createdAt":"2020-06-24T17:23:12.272Z","modifiedAt":"2020-06-24T17:23:12.272Z"}] NaN json_string
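
The value_ column in the metafield rows above is a JSON string (type json_string) containing a list of return authorizations, each with a nested payload. Note that Quantity appears as the string "1" in most sample rows and as the integer 1 in the last one. The following is a rough sketch of flattening one such value into one record per returned item, using only the standard library. The function name and the output field names are hypothetical choices for illustration:

```python
import json

def parse_return_authorizations(value_):
    """Flatten one metafield value_ JSON string into a list of per-item dicts."""
    rows = []
    for auth in json.loads(value_):
        payload = auth.get("payload", {})
        for item in payload.get("validReturnItems", []):
            rows.append({
                "authorization_id": auth.get("id"),
                "status": auth.get("status"),
                "upc": item.get("UPC"),
                # Quantity is sometimes a string and sometimes an int in the samples.
                "quantity": int(item.get("Quantity", 0)),
                "reason": item.get("Reason"),
            })
    return rows

# Value copied from the sample row with id 576111 above.
sample = (
    '[{"id":"e461c20a-9dc7-d38de1c9012a","totalValue":4444,"status":"RECEIVED",'
    '"payload":{"totalReturnValue":444,"validReturnItems":[{"UPC":"190735551121",'
    '"Quantity":"1","Reason":"too-big","LineItem":"4925555231"}]},'
    '"createdAt":"2020-06-10T18:35:44.043Z","modifiedAt":"2020-06-10T18:35:44.043Z"}]'
)
items = parse_return_authorizations(sample)
print(items)
```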

shopify_order_adjustment_data (first 100 rows)

id order_id refund_id amount tax_amount kind reason amount_set tax_amount_set _fivetran_synced
0 109271056455 2712175083591 675617407047 -465 0.0 shipping_refund Shipping refund NaN NaN 2020-11-14 07:52:56.522
1 109277085767 2773486501959 675634708551 -95 0.0 shipping_refund Shipping refund NaN NaN 2020-11-14 07:54:41.682
2 109245956167 2771757826119 675548168263 -27 -1.6 shipping_refund Shipping refund NaN NaN 2020-11-14 07:44:24.602
3 109248118855 2771329908807 675555016775 -35 0.0 shipping_refund Shipping refund NaN NaN 2020-11-14 07:45:11.536
4 109275742279 2773429682247 675632644167 -515 0.0 refund_discrepancy Refund discrepancy NaN NaN 2020-11-14 07:54:31.054

shopify_order_data (first 100 rows)

id note email taxes_included currency subtotal_price total_tax total_price created_at updated_at name shipping_address_name shipping_address_first_name shipping_address_last_name shipping_address_company shipping_address_phone shipping_address_address_1 shipping_address_address_2 shipping_address_city shipping_address_country shipping_address_country_code shipping_address_province shipping_address_province_code shipping_address_zip shipping_address_latitude shipping_address_longitude billing_address_name billing_address_first_name billing_address_last_name billing_address_company billing_address_phone billing_address_address_1 billing_address_address_2 billing_address_city billing_address_country billing_address_country_code billing_address_province billing_address_province_code billing_address_zip billing_address_latitude billing_address_longitude customer_id location_id user_id number order_number financial_status fulfillment_status processed_at processing_method referring_site cancel_reason cancelled_at closed_at total_discounts total_line_items_price total_weight source_name browser_ip buyer_accepts_marketing token cart_token checkout_token test landing_site_base_url _fivetran_synced
0 2674098602081 71509c29301d2cc14e37ecb53f735608 021cb20b5c78751fc7ddc091b6b69b3e True GBP 2.8 0 2.80 2020-09-11 19:35:42.000 2020-09-11 19:35:46.000 d1743fc58a1e4d78769eaac49994a994 8b121314a4d97bc9dc15bfba8518ec88 f0962b7a185488ecb752cedac1038349 aa35cb67c26e64bb81a1bf3f17e858ba d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e d6f4a399883df85d9d4b3a02bf6e738a bc9b8576178dcd886639ba718f1d45c8 ac08c606d455cde42980f980524a8038 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b d41d8cd98f00b204e9800998ecf8427e NaN 00079ce435afddc28205639142773870 d97319f64674c02595f2989019970fc8 c08dae474c5d4d3326fd6764d2a0ebe6 8b121314a4d97bc9dc15bfba8518ec88 f0962b7a185488ecb752cedac1038349 aa35cb67c26e64bb81a1bf3f17e858ba d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e d6f4a399883df85d9d4b3a02bf6e738a bc9b8576178dcd886639ba718f1d45c8 ac08c606d455cde42980f980524a8038 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b d41d8cd98f00b204e9800998ecf8427e NaN 00079ce435afddc28205639142773870 d97319f64674c02595f2989019970fc8 c08dae474c5d4d3326fd6764d2a0ebe6 3589760876641 NaN NaN 4135 5135 paid None 2020-09-11 19:35:42.000 None None NaN NaN None 2.8 5.6 0 294517 None True 0f9c2880de17f71511eee5542c29b999 None None False None 2020-09-12 00:15:10.199
1 2669516488801 None dce90c7b4e52e045e5975836aff49cf1 True GBP 2.8 0 3.79 2020-09-09 23:01:54.000 2020-09-10 15:38:26.000 4fcb884b5b46413bae526a6e7e49d706 c8189c7add9755e66391b58ecc12b3e2 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e 1ff1de774005f8da13f42943881c655f 70111f8840ccbd8b1007cc3f387ced6b 1ac412baeba98370017c73df41c98a07 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b None NaN 2357e65b582faa0a2da3603b16fa4a7f 75c29d6dd29594a652fcbd7c4c279a29 75468fbebc28e02ec5d4f54f4cbd4099 c8189c7add9755e66391b58ecc12b3e2 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e 1ff1de774005f8da13f42943881c655f 70111f8840ccbd8b1007cc3f387ced6b 1ac412baeba98370017c73df41c98a07 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b None NaN 2357e65b582faa0a2da3603b16fa4a7f 75c29d6dd29594a652fcbd7c4c279a29 75468fbebc28e02ec5d4f54f4cbd4099 3584045351009 NaN NaN 4066 5066 paid fulfilled 2020-09-09 23:01:53.000 direct 2cc983716a820bc713b793a6e8e73f42 NaN NaN 2020-09-10 15:38:26.000 0.0 2.8 0 web 109.249.185.68 False fb489b3ccc0ae36ce47744d7595e9746 b1ff04883dfeab658cd5211050476729 7bdb994e1196de3e4f34586e357613f9 False 8584e97b29b0802fb393fa453a8b6a7a 2020-09-11 00:14:33.536
2 2669509541985 None dce90c7b4e52e045e5975836aff49cf1 True GBP 4.4 0 5.39 2020-09-09 22:57:51.000 2020-09-10 15:38:25.000 9e346f2e912c60e16679f4a4c8d29422 c8189c7add9755e66391b58ecc12b3e2 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e 1ff1de774005f8da13f42943881c655f 70111f8840ccbd8b1007cc3f387ced6b 1ac412baeba98370017c73df41c98a07 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b None NaN 2357e65b582faa0a2da3603b16fa4a7f 75c29d6dd29594a652fcbd7c4c279a29 75468fbebc28e02ec5d4f54f4cbd4099 c8189c7add9755e66391b58ecc12b3e2 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 d41d8cd98f00b204e9800998ecf8427e d41d8cd98f00b204e9800998ecf8427e 1ff1de774005f8da13f42943881c655f 70111f8840ccbd8b1007cc3f387ced6b 1ac412baeba98370017c73df41c98a07 89f9c9f489be2a83cf57e53b9197d288 79cba1185463850dedba31f172f1dc5b None NaN 2357e65b582faa0a2da3603b16fa4a7f 75c29d6dd29594a652fcbd7c4c279a29 75468fbebc28e02ec5d4f54f4cbd4099 3584045351009 NaN NaN 4065 5065 paid fulfilled 2020-09-09 22:57:50.000 direct 2cc983716a820bc713b793a6e8e73f42 NaN NaN 2020-09-10 15:38:25.000 0.0 4.4 0 web 109.249.185.68 False e44b7f04610a8f4032530cc7f12663de 9600543f4d4613db59ac58a1009ecbb9 cf0a9fe2c7c606b86559007dbb890a62 False 8584e97b29b0802fb393fa453a8b6a7a 2020-09-11 00:14:33.037

shopify_order_discount_code_data (first 100 rows)

index_ order_id _fivetran_synced amount code type
0 1 2674098602081 2022-11-20 08:14:52.957000 11.0 GIFTCARD percentage
1 2 2674098602081 2022-11-20 08:14:52.957000 5.0 SHIPPING2022 shipping
2 3 2674098602081 2022-11-20 08:14:52.957000 1.0 FIXED fixed_amount
3 1 2669516488801 2022-11-19 11:59:50.040000 0.0 SHIPPING2022 shipping
4 1 2669509541985 2022-11-20 10:22:23.877000 2.0 GIFTCARD percentage
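
An order can carry several discount codes, one row per index_, as the first three sample rows show. As an illustration only, here is a small sketch that sums the amount column per order_id in plain Python. Whether amounts of different types (percentage, shipping, fixed_amount) should be added together is an assumption made for this example, and discount_totals is a hypothetical helper name:

```python
from collections import defaultdict

# (index_, order_id, amount, code, type) copied from the sample rows above.
rows = [
    (1, 2674098602081, 11.0, "GIFTCARD", "percentage"),
    (2, 2674098602081, 5.0, "SHIPPING2022", "shipping"),
    (3, 2674098602081, 1.0, "FIXED", "fixed_amount"),
    (1, 2669516488801, 0.0, "SHIPPING2022", "shipping"),
    (1, 2669509541985, 2.0, "GIFTCARD", "percentage"),
]

def discount_totals(rows):
    """Sum the discount amount for each order_id."""
    totals = defaultdict(float)
    for _index, order_id, amount, _code, _type in rows:
        totals[order_id] += amount
    return dict(totals)

totals = discount_totals(rows)
print(totals)
```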

shopify_order_line_data (first 100 rows)

order_id id product_id variant_id name title vendor price quantity grams sku fulfillable_quantity fulfillment_service gift_card requires_shipping taxable index_ total_discount pre_tax_price fulfillment_status _fivetran_synced
0 2669509541985 5699743678561 4526236893281 31879811629153 327ea22d0f91783418e519cb45a4a3e9 327ea22d0f91783418e519cb45a4a3e9 13aea892c8de2d62f2608c6191cfab1f 4.4 1 0 854a136da51d43fb87c63c86a62ffad0 0 manual False True False 1 0 NaN fulfilled 2020-09-11 00:14:33.293
1 2669516488801 5699758784609 4506451050593 31814873481313 1fccbdc6ac5f6edabf76e56eb0460019 1fccbdc6ac5f6edabf76e56eb0460019 13aea892c8de2d62f2608c6191cfab1f 2.8 1 0 198369004c95b2b35f480f9691b14178 0 manual False True False 1 0 NaN fulfilled 2020-09-11 00:14:33.767
2 2674098602081 5708321914977 4505775439969 31812476895329 74c574cc1e545fef2beeaf9bbb148fcc 74c574cc1e545fef2beeaf9bbb148fcc 57403999f78b01b3fd325ba256eafe94 2.8 2 0 b988b358c81b47d3e438c99bfb1c4ee1 2 manual False True False 1 0 NaN None 2020-09-12 00:15:10.199

shopify_order_line_refund_data (first 100 rows)

id location_id refund_id restock_type quantity order_line_id _fivetran_synced subtotal total_tax_set subtotal_set total_tax
0 189012115527 3.213171e+10 679976206407 return 1 6113984839751 2020-11-14 07:52:56.522 415 NaN NaN 19.74
1 289901510727 3.213171e+10 800919683143 return 1 9698959196231 2020-11-14 07:52:56.522 415 NaN NaN 56.33
2 196428005447 3.213171e+10 686409187399 return 1 6423996530759 2020-11-14 07:52:56.522 415 NaN NaN 16.18
3 286567268423 NaN 798222680135 no_restock 1 6367161483335 2020-11-14 07:52:56.522 415 NaN NaN 26.17
4 185936773191 NaN 677359190087 no_restock 1 6009460064327 2020-11-14 07:52:56.522 415 NaN NaN 13.75

shopify_order_note_attribute_data (first 100 rows)

name order_id _fivetran_synced value_
0 last_name 34171115 2022-11-19 07:30:28.480000 "1418143823.1643992155"
1 first_name 34171115 2022-11-19 07:30:28.480000 "fb.1.1643992155109.1110590605"
2 updated_at 34171115 2022-11-19 07:30:28.480000 "1643992163253"
3 clientID 34171115 2022-11-19 07:30:28.480000 "a03d3118-4048-4159-b5bb-1b90d8abb69b"
4 name 34171115 2022-11-19 07:30:28.480000 "22707603636395"

shopify_order_shipping_line_data (first 100 rows)

id order_id _fivetran_synced carrier_identifier code delivery_category discounted_price phone price requested_fulfillment_service_id source title discounted_price_set price_set
0 54475 55 2022-11-19 14:09:18.923000 NaN Standard NaN 0.0 NaN 0.0 NaN shopify Standard {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 651 425579 2022-11-19 11:28:21.391000 NaN Standard NaN 0.0 NaN 0.0 NaN shopify Standard {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 188139 4599 2022-11-19 16:03:15.430000 NaN Standard NaN 0.0 NaN 0.0 NaN shopify Standard {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
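
The discounted_price_set and price_set columns above store a JSON object with shop_money and presentment_money entries, each holding an amount string and a currency_code. A minimal sketch of parsing one of these strings into exact decimal amounts with the standard library follows. The name parse_money_set is hypothetical, and returning None for non-string input is an assumption about how missing values appear:

```python
import json
from decimal import Decimal

def parse_money_set(raw):
    """Parse a *_set JSON string into {kind: (Decimal amount, currency_code)}.

    Returns None when the input is not a string (for example NaN).
    """
    if not isinstance(raw, str):
        return None
    data = json.loads(raw)
    return {
        kind: (Decimal(money["amount"]), money["currency_code"])
        for kind, money in data.items()
    }

# Value copied from the price_set column of the sample rows above.
sample = (
    '{"shop_money":{"amount":"0.00","currency_code":"USD"},'
    '"presentment_money":{"amount":"0.00","currency_code":"USD"}}'
)
money = parse_money_set(sample)
print(money["shop_money"])
```

Decimal is used instead of float so that amounts such as "0.00" keep their exact value and scale.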

shopify_order_shipping_tax_line_data (first 100 rows)

index_ order_shipping_line_id _fivetran_synced price rate title price_set
0 4 321291 2022-11-19 15:05:15.847000 0.0 0.000 GEIWIHG {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 3 5995 2022-11-19 11:24:24.596000 0.0 0.007 BANANAN {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 3 309131 2022-11-19 16:52:35.685000 0.0 0.010 TOMATO {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}

shopify_order_tag_data (first 100 rows)

index_ order_id _fivetran_synced value_
0 1 6411 2022-12-07 06:49:30.307000 #33333
1 1 47195 2022-12-07 06:49:26.771000 #22222
2 1 46553 2022-12-07 06:49:38.197000 #771222

shopify_order_url_tag_data (first 100 rows)

key_ order_id _fivetran_synced value_
0 image 40347 2022-11-19 10:29:18.624000 Image
1 utm_medium 4290347 2022-11-19 10:29:18.624000 email
2 prop_channel 47 2022-11-19 10:29:18.624000 flows

shopify_price_rule_data (first 100 rows)

id _fivetran_synced allocation_limit allocation_method created_at customer_selection ends_at once_per_customer prerequisite_quantity_range prerequisite_shipping_price_range prerequisite_subtotal_range quantity_ratio_entitled_quantity quantity_ratio_prerequisite_quantity starts_at target_selection target_type title updated_at usage_limit value_ value_type prerequisite_to_entitlement_purchase_prerequisite_amount
0 11443 2021-03-22 05:43:56.784000 NaN across 2021-03-09 18:57:54.000000 all 2021-03-22 07:00:59.000000 False NaN NaN 500.0 NaN NaN 2021-03-17 04:00:57.000000 all line_item GIFTCARD 2021-03-22 04:20:03.000000 NaN 0.0 percentage NaN
1 564075 2021-11-11 07:43:53.706000 NaN across 2021-11-10 22:26:31.000000 all 2021-11-30 14:00:59.000000 False NaN NaN NaN NaN NaN 2021-11-10 22:25:32.000000 entitled line_item THANKS 2021-11-10 22:26:31.000000 NaN 0.0 percentage NaN
2 9339 2021-12-03 06:47:21.433000 NaN across 2021-11-11 22:38:18.000000 all 2021-12-02 19:00:59.000000 False NaN NaN NaN NaN NaN 2021-11-23 21:30:38.000000 all line_item THANKS 2021-12-02 19:21:47.000000 NaN 0.0 percentage NaN

shopify_product_data (first 100 rows)

id title handle product_type vendor created_at updated_at published_at published_scope _fivetran_deleted _fivetran_synced
0 4506451050593 1fccbdc6ac5f6edabf76e56eb0460019 f4b6d0e4413a19b2e7a291f0ef4dc98f fdb42fcb90ecd31c015932ffcd313014 13aea892c8de2d62f2608c6191cfab1f 2020-02-14 19:18:05.000 2020-09-10 18:16:42.000 2020-02-14 19:02:02.000 web False 2020-09-11 00:14:09.592
1 4526236893281 327ea22d0f91783418e519cb45a4a3e9 129181bbc087330e216a6a4d7939f00b ec3bb3dd6e9d1f348a040ee7b45f1a72 13aea892c8de2d62f2608c6191cfab1f 2020-03-04 05:04:32.000 2020-09-10 15:06:03.000 2020-03-04 05:04:32.000 web False 2020-09-11 00:14:07.989
2 4505775439969 c6c6fea8419b94103b0b05d64a5bab10 f0a656254aca08bf40181226ac13418c fdb42fcb90ecd31c015932ffcd313014 57403999f78b01b3fd325ba256eafe94 2020-02-14 02:09:59.000 2020-09-11 21:21:21.000 2020-02-14 02:09:59.000 global False 2020-09-12 00:14:11.721

shopify_product_image_data (first 100 rows)

id product_id _fivetran_deleted _fivetran_synced alt created_at height position_ src updated_at width is_default variant_ids
0 14180 38804 False 2022-12-01 06:51:36.660000 NaN 2019-06-13 04:06:07.000000 1200 4 https://cdn.shopify.com/s/files/glassess-1784103173.jpg?v=1560398767 2019-06-13 04:06:07.000000 956 False []
1 748644 34804 False 2022-12-01 06:51:36.660000 NaN 2019-06-13 04:06:07.000000 1200 2 https://cdn.shopify.com/s/files/1/smile.jpg?v=1560398767 2019-06-13 04:06:07.000000 956 False []
2 679716 34604 False 2022-12-01 06:51:36.660000 NaN 2019-06-13 04:06:07.000000 1200 6 https://cdn.shopify.com/s/files/1/kitten.jpg?v=1560398767 2019-06-13 04:06:07.000000 956 False [2755330292,27559733,275597338,275597536,2755931364,2755973,2734989668]

shopify_product_tag_data (first 100 rows)

index_ product_id _fivetran_synced value_
0 9 1234 2022-12-01 06:51:36.480000 Type: Clothing
1 5 1234 2022-12-01 06:51:36.480000 Final Sale
2 7 1234 2022-12-01 06:51:36.480000 Sale
3 8 1234 2022-12-01 06:51:36.480000 StyleID:nice
4 3 1234 2022-12-01 06:51:36.480000 Collection: Bottoms

shopify_product_variant_data (first 100 rows)

id product_id inventory_item_id title price sku position_ inventory_policy compare_at_price fulfillment_service inventory_management created_at updated_at taxable barcode grams image_id inventory_quantity weight weight_unit old_inventory_quantity requires_shipping _fivetran_synced option_2 tax_code option_3 option_1
0 39262114414663 6540108431431 41356021661767 my title here 111 NaN 1 deny NaN manual None 2021-03-08 16:30:15.000 2021-04-12 19:49:43.000 False NaN 0 NaN 0 0 lb 0 False 2021-04-16 07:50:32.995 NaN None NaN my title here
1 39273118957639 6544066379847 41367035936839 my title here 222 NaN 1 deny NaN manual None 2021-03-17 16:39:45.000 2021-04-12 19:46:59.000 False NaN 0 NaN 0 0 lb 0 False 2021-04-16 07:50:29.241 NaN None NaN my title here
2 39290169262151 6548438188103 41384094924871 my title here 5 NaN 1 deny NaN manual inventory manager 2021-03-30 19:48:15.000 2021-03-30 19:48:15.000 True NaN 0 NaN 0 0 lb 0 True 2021-04-16 07:50:32.720 NaN None NaN my title here
3 39262115397703 6540109250631 41356022644807 my title here 333 NaN 1 deny NaN manual None 2021-03-08 16:31:31.000 2021-04-12 19:47:26.000 False NaN 0 NaN -5 0 lb -5 False 2021-04-16 07:50:29.822 NaN None NaN my title here
4 29217058947142 3879735590982 30309980143686 my other title 444 NaN 1 deny NaN manual inventory manager 2019-06-25 18:32:03.000 2019-10-01 23:40:09.000 True NaN 222 NaN 0 1 lb 0 True 2021-04-16 07:50:25.006 NaN TR9999 NaN my other title

shopify_refund_data (first 100 rows)

id created_at processed_at note restock user_id _fivetran_synced total_duties_set order_id
0 801704738887 2021-04-17 20:25:08.000 2021-04-17 20:25:08.000 None False 40467791943 2021-04-18 08:05:22.056 NaN 3726667481159
1 801695039559 2021-04-17 15:45:21.000 2021-04-17 15:45:21.000 None False 40467791943 2021-04-18 07:52:19.104 NaN 3725521846343
2 801704181831 2021-04-17 20:15:01.000 2021-04-17 20:15:01.000 None False 40467791943 2021-04-18 08:05:22.522 NaN 3726619476039
3 801703428167 2021-04-17 19:56:51.000 2021-04-17 19:56:51.000 my refund note False 40467791943 2021-04-18 08:05:22.841 NaN 3726370996295
4 801707360327 2021-04-17 21:32:50.000 2021-04-17 21:32:50.000 None False 40467791943 2021-04-18 08:02:24.256 NaN 3726858289223

shopify_shop_data (first 100 rows)

id _fivetran_deleted _fivetran_synced address_1 address_2 auto_configure_tax_inclusivity checkout_api_supported city cookie_consent_level country country_code country_name county_taxes created_at currency customer_email domain_ eligible_for_card_reader_giveaway eligible_for_payments email enabled_presentment_currencies force_ssl google_apps_domain google_apps_login_enabled has_discounts has_gift_cards has_storefront iana_timezone latitude longitude money_format money_in_emails_format money_with_currency_format money_with_currency_in_emails_format multi_location_enabled myshopify_domain name password_enabled phone plan_display_name plan_name pre_launch_enabled primary_locale primary_location_id province province_code requires_extra_payments_agreement setup_required shop_owner source tax_shipping taxes_included timezone updated_at visitor_tracking_consent_preference weight_unit zip
0 689 False 2022-12-07 06:49:41.652000 1 Main Street 200th Floor NaN True New York implicit US US United States True 2018-12-10 16:24:00.000000 USD noreply@kitties.com kitties.com True True abc@kitties.com ["USD"] True NaN NaN True True True America/New_York 80.1234 -123.12345 ${{amount}} ${{amount}} ${{amount}} USD ${{amount}} USD True kitties.myshopify.com Garrett & Alfredo False 13373 Shopify Plus shopify_plus False en 1234646345 New York NY False False Garrett & Alfredo NaN NaN False (GMT-05:00) America/New_York 2022-12-07 00:26:36.000000 allow_all lb 10014

shopify_tax_line_data (first 100 rows)

index_ order_line_id _fivetran_synced price rate title price_set
0 1 29227 2022-11-19 05:30:34.023000 0.0 0.0 VAT {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 1 1839083 2022-11-19 07:14:05.023000 0.0 0.0 VAT {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 1 11995 2022-11-19 05:30:34.023000 0.0 0.0 VAT {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
3 1 10751 2022-11-19 07:14:05.024000 0.0 0.0 VAT {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
4 1 194763 2022-11-19 05:30:34.023000 0.0 0.0 VAT {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}

shopify_tender_transaction_data (first 100 rows)

id _fivetran_synced amount currency order_id payment_details_credit_card_company payment_details_credit_card_number payment_method processed_at remote_reference test user_id
0 34283 2022-12-01 06:51:34.004000 2895.74 USD 45379 NaN NaN other 2022-11-30 18:14:37.000000 NaN False NaN
1 905707 2022-12-01 06:51:42.309000 5900.75 USD 45243 NaN NaN other 2022-12-01 02:00:39.000000 NaN False NaN
2 411 2022-12-01 06:51:29.718000 -164.72 USD 4559467 NaN NaN other 2022-11-30 14:29:13.000000 NaN False NaN
3 55179 2022-12-01 06:51:41.198000 5180.19 USD 35 NaN NaN other 2022-11-30 23:55:45.000000 NaN False NaN
4 16923 2022-12-01 06:51:42.358000 3004.30 USD 45955 NaN NaN other 2022-12-01 02:09:47.000000 NaN False NaN

shopify_transaction_data (first 100 rows)

id order_id refund_id amount authorization_ created_at processed_at device_id gateway source_name message currency location_id parent_id payment_avs_result_code kind currency_exchange_id currency_exchange_adjustment currency_exchange_original_amount currency_exchange_final_amount currency_exchange_currency error_code status test user_id _fivetran_synced payment_credit_card_bin payment_cvv_result_code payment_credit_card_number payment_credit_card_company receipt
0 2667417567303 2181743870023 NaN 415.00 abcd999999 2020-02-27 16:05:37.000 2020-02-27 16:05:37.000 NaN gateway_here source_name message_here USD NaN NaN Z sale NaN NaN NaN NaN NaN NaN success False NaN 2020-10-28 20:33:09.797 NaN NaN NaN NaN { "charges": { "data": [ { "balance_transaction": { "exchange_rate": null } }] }}
1 2572210896967 2089104834631 NaN 415.00 abcd888888 2020-01-12 20:06:37.000 2020-01-12 20:06:37.000 NaN gateway_here source_name message_here USD NaN NaN Y sale NaN NaN NaN NaN NaN NaN success False NaN 2020-10-28 17:05:27.756 NaN NaN NaN NaN None
2 2664325611591 2179107356743 NaN 415.00 abcd77777 2020-02-26 00:12:37.000 2020-02-26 00:12:37.000 NaN gateway_here source_name message_here USD NaN NaN None sale NaN NaN NaN NaN NaN NaN success False NaN 2020-10-28 20:23:50.344 NaN NaN NaN NaN { "charges": { "data": [ { "balance_transaction": { "exchange_rate": "0.523" } }] }}
3 2595729735751 2114590769223 NaN 15.95 abcd66666 2020-01-26 11:04:41.000 2020-01-26 11:04:41.000 NaN gateway_here source_name message_here USD NaN NaN Y sale NaN NaN NaN NaN NaN NaN success False NaN 2020-10-28 18:10:27.604 NaN NaN NaN NaN None
4 2705030512711 2214516916295 NaN 212.12 abcd5555 2020-03-18 00:17:24.000 2020-03-18 00:17:24.000 NaN gateway_here source_name message_here USD NaN NaN None sale NaN NaN NaN NaN NaN NaN success False NaN 2020-10-28 22:14:02.944 NaN NaN NaN NaN { "charges": { "data": [ { "balance_transaction": { "exchange_rate": "0.96581" } }] }}

Source tables may contain typos, unclear names, incorrect column types, and similar issues. We clean these tables in the staging models that follow.

stg_shopify_order_discount_code_data (first 100 rows)

discount_order discount_value discount_code discount_type order_id
0 1 11.0 GIFTCARD percentage 2674098602081
1 2 5.0 SHIPPING2022 shipping 2674098602081
2 3 1.0 FIXED fixed_amount 2674098602081
3 1 0.0 SHIPPING2022 shipping 2669516488801
4 1 2.0 GIFTCARD percentage 2669509541985

stg_shopify_order_discount_code_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_discount_code_data_projected" AS (
    -- Projection: Selecting 5 out of 6 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "order_id",
        "amount",
        "code",
        "type"
    FROM "shopify_order_discount_code_data"
),

"shopify_order_discount_code_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> discount_order
    -- amount -> discount_value
    -- code -> discount_code
    -- type -> discount_type
    SELECT 
        "index_" AS "discount_order",
        "order_id",
        "amount" AS "discount_value",
        "code" AS "discount_code",
        "type" AS "discount_type"
    FROM "shopify_order_discount_code_data_projected"
),

"shopify_order_discount_code_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- order_id: from INT to VARCHAR
    SELECT
        "discount_order",
        "discount_value",
        "discount_code",
        "discount_type",
        CAST("order_id" AS VARCHAR) AS "order_id"
    FROM "shopify_order_discount_code_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_discount_code_data_projected_renamed_casted"
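
To make the three steps above concrete (project, rename, cast), here is a minimal sketch that runs the same CTE chain against an in-memory SQLite database. SQLite is only a stand-in chosen because it ships with Python; the warehouse dialect used by the original model is not stated here and may behave differently. The CTE names are shortened, and the `_fivetran_synced` value is a hypothetical placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source table with the six columns named in the projection comment.
conn.execute(
    'CREATE TABLE "shopify_order_discount_code_data" '
    '("index_" INTEGER, "order_id" INTEGER, "amount" REAL, '
    '"code" TEXT, "type" TEXT, "_fivetran_synced" TEXT)'
)
conn.executemany(
    'INSERT INTO "shopify_order_discount_code_data" VALUES (?, ?, ?, ?, ?, ?)',
    [
        # The last value is a hypothetical sync timestamp, not taken from the source.
        (1, 2674098602081, 11.0, "GIFTCARD", "percentage", "2022-01-01 00:00:00"),
        (2, 2674098602081, 5.0, "SHIPPING2022", "shipping", "2022-01-01 00:00:00"),
    ],
)

STAGING_SQL = """
WITH projected AS (
    -- Drop _fivetran_synced by listing the columns we keep.
    SELECT "index_", "order_id", "amount", "code", "type"
    FROM "shopify_order_discount_code_data"
),
renamed AS (
    SELECT
        "index_" AS "discount_order",
        "order_id",
        "amount" AS "discount_value",
        "code" AS "discount_code",
        "type" AS "discount_type"
    FROM projected
),
casted AS (
    -- IDs are labels, not quantities, so they are stored as text.
    SELECT
        "discount_order",
        "discount_value",
        "discount_code",
        "discount_type",
        CAST("order_id" AS VARCHAR) AS "order_id"
    FROM renamed
)
SELECT * FROM casted ORDER BY "discount_order"
"""

cursor = conn.execute(STAGING_SQL)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

for row in rows:
    print(dict(zip(columns, row)))
```

Keeping each step in its own CTE means a single transformation can be inspected by selecting from that CTE alone.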

stg_shopify_order_discount_code_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_discount_code_data
  description: The table is about order discounts in a Shopify store. It contains
    details of discount codes applied to orders. Each row represents a discount, including
    the order ID, discount amount, code used, and type of discount (percentage, shipping,
    or fixed amount). Multiple discounts can be applied to a single order.
  columns:
  - name: discount_order
    description: Order of application for multiple discounts
    tests:
    - not_null
  - name: discount_value
    description: Value of the discount applied
    tests:
    - not_null
  - name: discount_code
    description: Discount code used for the order
    tests:
    - not_null
  - name: discount_type
    description: Category of discount (percentage, shipping, or fixed_amount)
    tests:
    - not_null
    - accepted_values:
        values:
        - percentage
        - shipping
        - fixed_amount
  - name: order_id
    description: Unique identifier for the order
    tests:
    - not_null
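
As an illustration of what the `not_null` and `accepted_values` tests above assert, here is a small sketch that applies the same checks to a few sample rows in plain Python. The helper names are hypothetical and this is not how the test runner is implemented; it only mirrors the intent of the two tests. Nulls are skipped by the accepted-values check on the assumption that it behaves like a SQL `NOT IN` filter.

```python
ACCEPTED_DISCOUNT_TYPES = {"percentage", "shipping", "fixed_amount"}

sample_rows = [
    {"discount_order": 1, "discount_code": "GIFTCARD", "discount_type": "percentage"},
    {"discount_order": 2, "discount_code": "SHIPPING2022", "discount_type": "shipping"},
    {"discount_order": 3, "discount_code": "FIXED", "discount_type": "fixed_amount"},
]


def not_null_failures(rows, column):
    """Return the rows where the column is missing."""
    return [row for row in rows if row.get(column) is None]


def accepted_values_failures(rows, column, accepted):
    """Return the rows whose non-null value falls outside the accepted set."""
    return [
        row
        for row in rows
        if row.get(column) is not None and row[column] not in accepted
    ]


# A test passes when it returns no failing rows.
null_failures = not_null_failures(sample_rows, "discount_code")
value_failures = accepted_values_failures(
    sample_rows, "discount_type", ACCEPTED_DISCOUNT_TYPES
)

# A hypothetical bad row shows what a failure looks like.
bad_rows = sample_rows + [
    {"discount_order": 4, "discount_code": "BOGO", "discount_type": "buy_one_get_one"}
]
bad_value_failures = accepted_values_failures(
    bad_rows, "discount_type", ACCEPTED_DISCOUNT_TYPES
)
```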

stg_shopify_price_rule_data (first 100 rows)

price_rule_id allocation_method customer_eligibility one_time_use subtotal_prerequisite discount_target target_type price_rule_name discount_value discount_type allocation_limit creation_date expiration_date last_updated start_date usage_limit
0 11443 across all False 500.0 all line_item GIFTCARD 0.0 percentage None 2021-03-09 18:57:54 2021-03-22 07:00:59 2021-03-22 04:20:03 2021-03-17 04:00:57 None
1 564075 across all False NaN entitled line_item THANKS 0.0 percentage None 2021-11-10 22:26:31 2021-11-30 14:00:59 2021-11-10 22:26:31 2021-11-10 22:25:32 None
2 9339 across all False NaN all line_item THANKS 0.0 percentage None 2021-11-11 22:38:18 2021-12-02 19:00:59 2021-12-02 19:21:47 2021-11-23 21:30:38 None

stg_shopify_price_rule_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_price_rule_data_projected" AS (
    -- Projection: Selecting 21 out of 22 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "allocation_limit",
        "allocation_method",
        "created_at",
        "customer_selection",
        "ends_at",
        "once_per_customer",
        "prerequisite_quantity_range",
        "prerequisite_shipping_price_range",
        "prerequisite_subtotal_range",
        "quantity_ratio_entitled_quantity",
        "quantity_ratio_prerequisite_quantity",
        "starts_at",
        "target_selection",
        "target_type",
        "title",
        "updated_at",
        "usage_limit",
        "value_",
        "value_type",
        "prerequisite_to_entitlement_purchase_prerequisite_amount"
    FROM "shopify_price_rule_data"
),

"shopify_price_rule_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> price_rule_id
    -- created_at -> creation_date
    -- customer_selection -> customer_eligibility
    -- ends_at -> expiration_date
    -- once_per_customer -> one_time_use
    -- prerequisite_quantity_range -> quantity_prerequisite
    -- prerequisite_shipping_price_range -> shipping_price_prerequisite
    -- prerequisite_subtotal_range -> subtotal_prerequisite
    -- quantity_ratio_entitled_quantity -> entitled_quantity_ratio
    -- quantity_ratio_prerequisite_quantity -> prerequisite_quantity_ratio
    -- starts_at -> start_date
    -- target_selection -> discount_target
    -- title -> price_rule_name
    -- updated_at -> last_updated
    -- value_ -> discount_value
    -- value_type -> discount_type
    -- prerequisite_to_entitlement_purchase_prerequisite_amount -> entitlement_purchase_prerequisite
    SELECT 
        "id" AS "price_rule_id",
        "allocation_limit",
        "allocation_method",
        "created_at" AS "creation_date",
        "customer_selection" AS "customer_eligibility",
        "ends_at" AS "expiration_date",
        "once_per_customer" AS "one_time_use",
        "prerequisite_quantity_range" AS "quantity_prerequisite",
        "prerequisite_shipping_price_range" AS "shipping_price_prerequisite",
        "prerequisite_subtotal_range" AS "subtotal_prerequisite",
        "quantity_ratio_entitled_quantity" AS "entitled_quantity_ratio",
        "quantity_ratio_prerequisite_quantity" AS "prerequisite_quantity_ratio",
        "starts_at" AS "start_date",
        "target_selection" AS "discount_target",
        "target_type",
        "title" AS "price_rule_name",
        "updated_at" AS "last_updated",
        "usage_limit",
        "value_" AS "discount_value",
        "value_type" AS "discount_type",
        "prerequisite_to_entitlement_purchase_prerequisite_amount" AS "entitlement_purchase_prerequisite"
    FROM "shopify_price_rule_data_projected"
),

"shopify_price_rule_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- allocation_limit: from DECIMAL to VARCHAR
    -- creation_date: from VARCHAR to TIMESTAMP
    -- entitled_quantity_ratio: from DECIMAL to VARCHAR
    -- entitlement_purchase_prerequisite: from DECIMAL to VARCHAR
    -- expiration_date: from VARCHAR to TIMESTAMP
    -- last_updated: from VARCHAR to TIMESTAMP
    -- prerequisite_quantity_ratio: from DECIMAL to VARCHAR
    -- quantity_prerequisite: from DECIMAL to VARCHAR
    -- shipping_price_prerequisite: from DECIMAL to VARCHAR
    -- start_date: from VARCHAR to TIMESTAMP
    -- usage_limit: from DECIMAL to VARCHAR
    SELECT
        "price_rule_id",
        "allocation_method",
        "customer_eligibility",
        "one_time_use",
        "subtotal_prerequisite",
        "discount_target",
        "target_type",
        "price_rule_name",
        "discount_value",
        "discount_type",
        CAST("allocation_limit" AS VARCHAR) AS "allocation_limit",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("entitled_quantity_ratio" AS VARCHAR) AS "entitled_quantity_ratio",
        CAST("entitlement_purchase_prerequisite" AS VARCHAR) AS "entitlement_purchase_prerequisite",
        CAST("expiration_date" AS TIMESTAMP) AS "expiration_date",
        CAST("last_updated" AS TIMESTAMP) AS "last_updated",
        CAST("prerequisite_quantity_ratio" AS VARCHAR) AS "prerequisite_quantity_ratio",
        CAST("quantity_prerequisite" AS VARCHAR) AS "quantity_prerequisite",
        CAST("shipping_price_prerequisite" AS VARCHAR) AS "shipping_price_prerequisite",
        CAST("start_date" AS TIMESTAMP) AS "start_date",
        CAST("usage_limit" AS VARCHAR) AS "usage_limit"
    FROM "shopify_price_rule_data_projected_renamed"
),

"shopify_price_rule_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 5 columns with unacceptable missing values
    -- entitled_quantity_ratio has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- entitlement_purchase_prerequisite has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- prerequisite_quantity_ratio has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- quantity_prerequisite has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_price_prerequisite has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "price_rule_id",
        "allocation_method",
        "customer_eligibility",
        "one_time_use",
        "subtotal_prerequisite",
        "discount_target",
        "target_type",
        "price_rule_name",
        "discount_value",
        "discount_type",
        "allocation_limit",
        "creation_date",
        "expiration_date",
        "last_updated",
        "start_date",
        "usage_limit"
    FROM "shopify_price_rule_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_price_rule_data_projected_renamed_casted_missing_handled"
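
The missing-value step above drops columns that are 100 percent missing, yet `allocation_limit` and `usage_limit` are kept even though the sample rows show no values for them. A plausible reading, sketched here under that assumption, is that a column is dropped only when it is fully missing and its missingness has not been judged acceptable. The function names, the threshold parameter, and the `acceptable_missing` set are hypothetical.

```python
def missing_percent(rows, column):
    """Percentage of rows in which the column is None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return 100.0 * missing / len(rows)


def columns_to_drop(rows, columns, acceptable_missing=frozenset(), threshold=100.0):
    """Columns at or above the missing threshold, unless missingness is acceptable."""
    return [
        column
        for column in columns
        if column not in acceptable_missing
        and missing_percent(rows, column) >= threshold
    ]


# Three rows shaped like the sample: one subtotal prerequisite, no limits set.
price_rules = [
    {"subtotal_prerequisite": 500.0, "quantity_prerequisite": None, "usage_limit": None},
    {"subtotal_prerequisite": None, "quantity_prerequisite": None, "usage_limit": None},
    {"subtotal_prerequisite": None, "quantity_prerequisite": None, "usage_limit": None},
]

dropped = columns_to_drop(
    price_rules,
    ["subtotal_prerequisite", "quantity_prerequisite", "usage_limit"],
    acceptable_missing={"usage_limit"},  # "no limit" is a meaningful state
)
```

Here `subtotal_prerequisite` survives because it is only partly missing, and `usage_limit` survives because an empty value carries meaning.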

stg_shopify_price_rule_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_price_rule_data
  description: The table is about Shopify price rules. It contains details of discount
    configurations. Each rule has an ID, creation date, and expiration date. Rules
    specify customer eligibility, discount type, and value. They can target specific
    items or all products. Additional fields set prerequisites like minimum purchase
    amounts. The table allows for flexible discount creation and management in the
    Shopify platform.
  columns:
  - name: price_rule_id
    description: Unique identifier for the price rule
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each price rule. For this
        table, each row represents a distinct price rule configuration. price_rule_id
        is unique across rows as it's designed to be the primary identifier for each
        rule.
  - name: allocation_method
    description: Method for allocating discount across products
    tests:
    - not_null
    - accepted_values:
        values:
        - proportional
        - equal
        - first item
        - last item
        - highest priced item
        - lowest priced item
        - random
        - across
  - name: customer_eligibility
    description: Specifies which customers are eligible
    tests:
    - not_null
    - accepted_values:
        values:
        - all
        - new
        - existing
        - premium
        - standard
        - vip
        - loyalty_program
        - first_time
        - returning
        - age_18_plus
        - age_21_plus
        - students
        - seniors
        - military
        - corporate
  - name: one_time_use
    description: Indicates if discount is one-time use
    tests:
    - not_null
  - name: subtotal_prerequisite
    description: Required subtotal range for discount eligibility
    cocoon_meta:
      missing_acceptable: Not applicable when no minimum purchase is required.
  - name: discount_target
    description: Specifies which items the discount applies to
    tests:
    - not_null
    - accepted_values:
        values:
        - all
        - entitled
        - specific
  - name: target_type
    description: Type of target for the discount
    tests:
    - not_null
    - accepted_values:
        values:
        - line_item
        - order
        - shipping
        - product
        - category
        - customer
        - customer_group
  - name: price_rule_name
    description: Name or title of the price rule
    tests:
    - not_null
  - name: discount_value
    description: Numerical value of the discount
    tests:
    - not_null
  - name: discount_type
    description: Type of value (percentage or fixed amount)
    tests:
    - not_null
    - accepted_values:
        values:
        - percentage
        - fixed_amount
  - name: allocation_limit
    description: Limits how discount is allocated
    cocoon_meta:
      missing_acceptable: Not applicable when allocation method is 'across'.
  - name: creation_date
    description: Timestamp when the price rule was created
    tests:
    - not_null
  - name: expiration_date
    description: Timestamp when the price rule expires
    tests:
    - not_null
  - name: last_updated
    description: Timestamp of last update to the rule
    tests:
    - not_null
  - name: start_date
    description: Timestamp when the price rule becomes active
    tests:
    - not_null
  - name: usage_limit
    description: Maximum number of times rule can be used
    cocoon_meta:
      missing_acceptable: Not applicable when there's no limit on usage.

stg_shopify_order_shipping_line_data (first 100 rows)

shipping_line_id order_id shipping_code discounted_price_numeric price_numeric shipping_source shipping_method_title carrier_id discounted_price_details price_details
0 54475 55 Standard 0.0 0.0 shopify Standard None {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 651 425579 Standard 0.0 0.0 shopify Standard None {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 188139 4599 Standard 0.0 0.0 shopify Standard None {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}} {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}

stg_shopify_order_shipping_line_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_shipping_line_data_projected" AS (
    -- Projection: Selecting 13 out of 14 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "order_id",
        "carrier_identifier",
        "code",
        "delivery_category",
        "discounted_price",
        "phone",
        "price",
        "requested_fulfillment_service_id",
        "source",
        "title",
        "discounted_price_set",
        "price_set"
    FROM "shopify_order_shipping_line_data"
),

"shopify_order_shipping_line_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> shipping_line_id
    -- carrier_identifier -> carrier_id
    -- code -> shipping_code
    -- delivery_category -> delivery_type
    -- discounted_price -> discounted_price_numeric
    -- phone -> shipping_phone
    -- price -> price_numeric
    -- requested_fulfillment_service_id -> fulfillment_service_id
    -- source -> shipping_source
    -- title -> shipping_method_title
    -- discounted_price_set -> discounted_price_details
    -- price_set -> price_details
    SELECT 
        "id" AS "shipping_line_id",
        "order_id",
        "carrier_identifier" AS "carrier_id",
        "code" AS "shipping_code",
        "delivery_category" AS "delivery_type",
        "discounted_price" AS "discounted_price_numeric",
        "phone" AS "shipping_phone",
        "price" AS "price_numeric",
        "requested_fulfillment_service_id" AS "fulfillment_service_id",
        "source" AS "shipping_source",
        "title" AS "shipping_method_title",
        "discounted_price_set" AS "discounted_price_details",
        "price_set" AS "price_details"
    FROM "shopify_order_shipping_line_data_projected"
),

"shopify_order_shipping_line_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- carrier_id: from DECIMAL to VARCHAR
    -- delivery_type: from DECIMAL to VARCHAR
    -- discounted_price_details: from VARCHAR to JSON
    -- fulfillment_service_id: from DECIMAL to VARCHAR
    -- price_details: from VARCHAR to JSON
    -- shipping_phone: from DECIMAL to VARCHAR
    SELECT
        "shipping_line_id",
        "order_id",
        "shipping_code",
        "discounted_price_numeric",
        "price_numeric",
        "shipping_source",
        "shipping_method_title",
        CAST("carrier_id" AS VARCHAR) AS "carrier_id",
        CAST("delivery_type" AS VARCHAR) AS "delivery_type",
        CAST("discounted_price_details" AS JSON) AS "discounted_price_details",
        CAST("fulfillment_service_id" AS VARCHAR) AS "fulfillment_service_id",
        CAST("price_details" AS JSON) AS "price_details",
        CAST("shipping_phone" AS VARCHAR) AS "shipping_phone"
    FROM "shopify_order_shipping_line_data_projected_renamed"
),

"shopify_order_shipping_line_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- delivery_type has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- fulfillment_service_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_phone has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "shipping_line_id",
        "order_id",
        "shipping_code",
        "discounted_price_numeric",
        "price_numeric",
        "shipping_source",
        "shipping_method_title",
        "carrier_id",
        "discounted_price_details",
        "price_details"
    FROM "shopify_order_shipping_line_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_shipping_line_data_projected_renamed_casted_missing_handled"
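
The `price_details` and `discounted_price_details` columns are cast to JSON above. As a rough sketch of how a downstream consumer might read them, the snippet below parses one sample value and pulls out an amount and currency. The function name `money_amount` is hypothetical, and using `Decimal` rather than `float` is a design choice made here to avoid rounding drift on currency values.

```python
import json
from decimal import Decimal

# One price_details value copied from the sample rows.
price_details = (
    '{"shop_money":{"amount":"0.00","currency_code":"USD"},'
    '"presentment_money":{"amount":"0.00","currency_code":"USD"}}'
)


def money_amount(price_set_json, kind="shop_money"):
    """Return (amount, currency_code) for shop_money or presentment_money."""
    parsed = json.loads(price_set_json)
    money = parsed[kind]
    return Decimal(money["amount"]), money["currency_code"]


shop_amount, shop_currency = money_amount(price_details)
presentment_amount, presentment_currency = money_amount(
    price_details, kind="presentment_money"
)
```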

stg_shopify_order_shipping_line_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_shipping_line_data
  description: The table is about shipping information for Shopify orders. It includes
    details such as the order ID, shipping carrier, shipping method, and pricing.
    Each row represents a shipping line item for a specific order. The table contains
    both discounted and regular prices, each as a numeric value and as JSON with shop
    and presentment currency amounts. All sampled rows show standard shipping at zero
    cost, suggesting possible free shipping offers.
  columns:
  - name: shipping_line_id
    description: Unique identifier for the shipping line item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each shipping line
        item. For this table, each row represents a single shipping line item for
        a specific order. The shipping_line_id is likely to be unique across rows
        as it's designed to identify each shipping line individually.
  - name: order_id
    description: Identifier of the associated order
    tests:
    - not_null
  - name: shipping_code
    description: Code representing the shipping method
    tests:
    - not_null
    - accepted_values:
        values:
        - Standard
        - Express
        - Overnight
        - Two-Day
        - Ground
        - Priority
        - Economy
        - Same-Day
        - International
        - Freight
  - name: discounted_price_numeric
    description: Discounted shipping price as a numeric value
    tests:
    - not_null
  - name: price_numeric
    description: Regular shipping price as a numeric value
    tests:
    - not_null
  - name: shipping_source
    description: Source of the shipping information
    tests:
    - not_null
    - accepted_values:
        values:
        - shopify
        - manual
        - api
        - csv_import
        - third_party_logistics
        - marketplace
        - dropshipping
        - erp_system
        - order_management_system
        - custom_integration
  - name: shipping_method_title
    description: Title or name of the shipping method
    tests:
    - not_null
    - accepted_values:
        values:
        - Standard
        - Express
        - Overnight
        - Two-Day
        - Ground
        - Economy
        - Priority
        - Same Day
        - International
        - Free Shipping
        - Local Pickup
        - Flat Rate
  - name: carrier_id
    description: Identifier for the shipping carrier
    cocoon_meta:
      missing_acceptable: Not applicable when shipping is handled by Shopify.
  - name: discounted_price_details
    description: Detailed discounted price information in JSON format
    tests:
    - not_null
  - name: price_details
    description: Detailed regular price information in JSON format
    tests:
    - not_null

stg_shopify_refund_data (first 100 rows)

refund_note items_restocked customer_id original_order_id refund_created_at refund_duties refund_id refund_processed_at
0 None False 40467791943 3726667481159 2021-04-17 20:25:08 None 801704738887 2021-04-17 20:25:08
1 None False 40467791943 3725521846343 2021-04-17 15:45:21 None 801695039559 2021-04-17 15:45:21
2 None False 40467791943 3726619476039 2021-04-17 20:15:01 None 801704181831 2021-04-17 20:15:01
3 None False 40467791943 3726370996295 2021-04-17 19:56:51 None 801703428167 2021-04-17 19:56:51
4 None False 40467791943 3726858289223 2021-04-17 21:32:50 None 801707360327 2021-04-17 21:32:50

stg_shopify_refund_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_refund_data_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "processed_at",
        "note",
        "restock",
        "user_id",
        "total_duties_set",
        "order_id"
    FROM "shopify_refund_data"
),

"shopify_refund_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> refund_id
    -- created_at -> refund_created_at
    -- processed_at -> refund_processed_at
    -- note -> refund_note
    -- restock -> items_restocked
    -- user_id -> customer_id
    -- total_duties_set -> refund_duties
    -- order_id -> original_order_id
    SELECT 
        "id" AS "refund_id",
        "created_at" AS "refund_created_at",
        "processed_at" AS "refund_processed_at",
        "note" AS "refund_note",
        "restock" AS "items_restocked",
        "user_id" AS "customer_id",
        "total_duties_set" AS "refund_duties",
        "order_id" AS "original_order_id"
    FROM "shopify_refund_data_projected"
),

"shopify_refund_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- refund_note: The problem is that 'my refund note' appears to be a placeholder value rather than genuine refund notes. It's unusual because it's generic and doesn't provide any specific information about individual refunds. The correct values should be actual refund notes or an empty string if no specific note is available. 
    SELECT
        "refund_id",
        "refund_created_at",
        "refund_processed_at",
        CASE
            WHEN "refund_note" = 'my refund note' THEN ''
            ELSE "refund_note"
        END AS "refund_note",
        "items_restocked",
        "customer_id",
        "refund_duties",
        "original_order_id"
    FROM "shopify_refund_data_projected_renamed"
),

"shopify_refund_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- refund_note: ['']
    SELECT 
        CASE
            WHEN "refund_note" = '' THEN NULL
            ELSE "refund_note"
        END AS "refund_note",
        "original_order_id",
        "refund_created_at",
        "refund_duties",
        "customer_id",
        "refund_processed_at",
        "items_restocked",
        "refund_id"
    FROM "shopify_refund_data_projected_renamed_cleaned"
),

"shopify_refund_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- customer_id: from INT to VARCHAR
    -- original_order_id: from INT to VARCHAR
    -- refund_created_at: from VARCHAR to TIMESTAMP
    -- refund_duties: from DECIMAL to VARCHAR
    -- refund_id: from INT to VARCHAR
    -- refund_processed_at: from VARCHAR to TIMESTAMP
    SELECT
        "refund_note",
        "items_restocked",
        CAST("customer_id" AS VARCHAR) AS "customer_id",
        CAST("original_order_id" AS VARCHAR) AS "original_order_id",
        CAST("refund_created_at" AS TIMESTAMP) AS "refund_created_at",
        CAST("refund_duties" AS VARCHAR) AS "refund_duties",
        CAST("refund_id" AS VARCHAR) AS "refund_id",
        CAST("refund_processed_at" AS TIMESTAMP) AS "refund_processed_at"
    FROM "shopify_refund_data_projected_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_refund_data_projected_renamed_cleaned_null_casted"
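
The refund model above cleans `refund_note` in two passes: the placeholder text becomes an empty string, and the empty string then becomes NULL. The sketch below reproduces that net effect in plain Python so the combined behavior is easy to check. The function name and the `PLACEHOLDER_NOTES` set are hypothetical, and `None` stands in for SQL NULL.

```python
# Values treated as placeholders rather than real notes (from the SQL comment above).
PLACEHOLDER_NOTES = {"my refund note"}


def clean_refund_note(note):
    """Apply both cleaning passes to a single refund note."""
    # Pass 1: replace known placeholder text with an empty string.
    if note in PLACEHOLDER_NOTES:
        note = ""
    # Pass 2: treat the empty string as a disguised missing value.
    if note == "":
        return None
    return note


raw_notes = [None, "my refund note", "", "Customer returned damaged item"]
cleaned_notes = [clean_refund_note(note) for note in raw_notes]
```

A note that was already missing stays missing, which matches the SQL, where a NULL comparison falls through to the ELSE branch.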

stg_shopify_refund_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_refund_data
  description: The table is about Shopify refunds. It includes details such as refund
    ID, creation and processing timestamps, notes, restock status, user ID, total
    duties, and associated order ID. Each row represents a single refund transaction.
    The table allows tracking of refund activities, linking them to specific orders
    and users in the Shopify system.
  columns:
  - name: refund_note
    description: Optional note associated with the refund
    cocoon_meta:
      missing_acceptable: No additional notes were necessary for these refunds.
  - name: items_restocked
    description: Boolean indicating if items were restocked
    tests:
    - not_null
  - name: customer_id
    description: Identifier of the user associated with the refund
    tests:
    - not_null
  - name: original_order_id
    description: Identifier of the order being refunded
    tests:
    - not_null
  - name: refund_created_at
    description: Timestamp when the refund was created
    tests:
    - not_null
  - name: refund_duties
    description: Total duties set for the refund
    cocoon_meta:
      missing_acceptable: No duties charged or refunded for these transactions.
  - name: refund_id
    description: Unique identifier for the refund
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each refund. For
        this table, each row is for a single refund transaction. The refund_id is
        designed to be unique across all refunds in the Shopify system.
  - name: refund_processed_at
    description: Timestamp when the refund was processed
    tests:
    - not_null

stg_shopify_collection_product_data (first 100 rows)

collection_id product_id
0 37124 789131
1 9037124 74353899
2 37124 8891

stg_shopify_collection_product_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_collection_product_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "collection_id",
        "product_id"
    FROM "shopify_collection_product_data"
),

"shopify_collection_product_data_projected_casted" AS (
    -- Column Type Casting: 
    -- collection_id: from INT to VARCHAR
    -- product_id: from INT to VARCHAR
    SELECT
        CAST("collection_id" AS VARCHAR) AS "collection_id",
        CAST("product_id" AS VARCHAR) AS "product_id"
    FROM "shopify_collection_product_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_collection_product_data_projected_casted"

stg_shopify_collection_product_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_collection_product_data
  description: The table represents the association between Shopify collections and
    products. Each row links a collection to a product. Collections can contain multiple
    products. Products can belong to multiple collections. The table uses IDs to uniquely
    identify each collection and product.
  columns:
  - name: collection_id
    description: Identifier of the Shopify collection (can repeat across rows)
    tests:
    - not_null
  - name: product_id
    description: Identifier of the Shopify product (can repeat across rows)
    tests:
    - not_null
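
The YAML above tests each column for `not_null` only. Because the table is a many-to-many link, one might also want to check that no (collection_id, product_id) pair repeats. The sketch below shows one way to do that in plain Python. The helper name `duplicate_pairs` is hypothetical, and this check is not part of the generated tests.

```python
from collections import Counter

def duplicate_pairs(rows):
    """Return the (collection_id, product_id) pairs that appear more than once."""
    counts = Counter((r["collection_id"], r["product_id"]) for r in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)

# The three preview rows, with IDs as strings as in the staged table.
rows = [
    {"collection_id": "37124", "product_id": "789131"},
    {"collection_id": "9037124", "product_id": "74353899"},
    {"collection_id": "37124", "product_id": "8891"},
]
print(duplicate_pairs(rows))
```

Note that collection 37124 appears twice in the preview, which is expected: only the pair needs to be distinct.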

stg_shopify_customer_data (first 100 rows)

encrypted_first_name encrypted_last_name encrypted_email account_state orders_count total_spent marketing_consent tax_exempt email_verified account_creation_date customer_id default_address_id last_updated_date phone
0 29e00d3659d1c5e75f99e892f0c1a1f1 3f0e6a46fb84eb1e6f5f00d86aa53b1b ab0bf25ab8b2a6b78af26a141dd6f455 disabled 0 0.00 False False True 2020-09-11 13:26:15 3588998496353 3951726461025 2020-09-11 13:26:15 None
1 f0962b7a185488ecb752cedac1038349 aa35cb67c26e64bb81a1bf3f17e858ba 021cb20b5c78751fc7ddc091b6b69b3e invited 1 2.80 True False True 2020-09-11 19:35:42 3589760876641 3952669655137 2020-09-11 19:41:04 None
2 d3bae70c9d49bb7cb5a74cdd0eae7fc4 0dd89cff60965dff8f9ea2bc952a5474 dce90c7b4e52e045e5975836aff49cf1 disabled 2 9.18 False False True 2020-09-09 22:57:44 3584045351009 3946055729249 2020-09-09 23:01:55 None

stg_shopify_customer_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_customer_data_projected" AS (
    -- Projection: Selecting 14 out of 15 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "first_name",
        "last_name",
        "email",
        "phone",
        "state",
        "orders_count",
        "total_spent",
        "created_at",
        "updated_at",
        "accepts_marketing",
        "tax_exempt",
        "verified_email",
        "default_address_id"
    FROM "shopify_customer_data"
),

"shopify_customer_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> customer_id
    -- first_name -> encrypted_first_name
    -- last_name -> encrypted_last_name
    -- email -> encrypted_email
    -- state -> account_state
    -- created_at -> account_creation_date
    -- updated_at -> last_updated_date
    -- accepts_marketing -> marketing_consent
    -- verified_email -> email_verified
    SELECT 
        "id" AS "customer_id",
        "first_name" AS "encrypted_first_name",
        "last_name" AS "encrypted_last_name",
        "email" AS "encrypted_email",
        "phone",
        "state" AS "account_state",
        "orders_count",
        "total_spent",
        "created_at" AS "account_creation_date",
        "updated_at" AS "last_updated_date",
        "accepts_marketing" AS "marketing_consent",
        "tax_exempt",
        "verified_email" AS "email_verified",
        "default_address_id"
    FROM "shopify_customer_data_projected"
),

"shopify_customer_data_projected_renamed_trimmed" AS (
    -- Trim Leading and Trailing Spaces
    SELECT
        "customer_id",
        "encrypted_first_name",
        "encrypted_last_name",
        "encrypted_email",
        "phone",
        "account_state",
        "orders_count",
        "total_spent",
        "marketing_consent",
        "tax_exempt",
        "email_verified",
        "default_address_id",
        TRIM("account_creation_date") AS "account_creation_date",
        TRIM("last_updated_date") AS "last_updated_date"
    FROM "shopify_customer_data_projected_renamed"
),

"shopify_customer_data_projected_renamed_trimmed_casted" AS (
    -- Column Type Casting: 
    -- account_creation_date: from VARCHAR to TIMESTAMP
    -- customer_id: from INT to VARCHAR
    -- default_address_id: from INT to VARCHAR
    -- last_updated_date: from VARCHAR to TIMESTAMP
    -- phone: from DECIMAL to VARCHAR
    SELECT
        "encrypted_first_name",
        "encrypted_last_name",
        "encrypted_email",
        "account_state",
        "orders_count",
        "total_spent",
        "marketing_consent",
        "tax_exempt",
        "email_verified",
        CAST("account_creation_date" AS TIMESTAMP) AS "account_creation_date",
        CAST("customer_id" AS VARCHAR) AS "customer_id",
        CAST("default_address_id" AS VARCHAR) AS "default_address_id",
        CAST("last_updated_date" AS TIMESTAMP) AS "last_updated_date",
        CAST("phone" AS VARCHAR) AS "phone"
    FROM "shopify_customer_data_projected_renamed_trimmed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_customer_data_projected_renamed_trimmed_casted"
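
The trim step runs before the timestamp cast so that stray whitespace does not make the cast fail. A minimal Python sketch of the same order of operations follows. The format string is an assumption inferred from the preview rows, and `to_timestamp` is a hypothetical helper, not part of the generated model.

```python
from datetime import datetime

def to_timestamp(raw):
    """Strip surrounding whitespace, then parse 'YYYY-MM-DD HH:MM:SS'.
    None (a missing value) passes through unchanged."""
    if raw is None:
        return None
    return datetime.strptime(raw.strip(), "%Y-%m-%d %H:%M:%S")

# A preview value padded with spaces to mimic untrimmed input.
print(to_timestamp("  2020-09-11 13:26:15 "))
```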

stg_shopify_customer_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_customer_data
  description: This table holds Shopify customer records. It contains customer details
    such as name and email (both encrypted) and phone. It tracks customer order history,
    including order count and total spent, and customer preferences such as marketing
    consent and tax exemption status. Each customer has a unique ID and timestamps
    for account creation and last update.
  columns:
  - name: encrypted_first_name
    description: Customer's first name (encrypted)
    tests:
    - not_null
  - name: encrypted_last_name
    description: Customer's last name (encrypted)
    tests:
    - not_null
  - name: encrypted_email
    description: Customer's email address (encrypted)
    tests:
    - not_null
  - name: account_state
    description: Current state of the customer account
    tests:
    - not_null
    - accepted_values:
        values:
        - disabled
        - invited
        - active
        - suspended
        - pending
        - closed
        - archived
  - name: orders_count
    description: Number of orders placed by the customer
    tests:
    - not_null
  - name: total_spent
    description: Total amount spent by the customer
    tests:
    - not_null
  - name: marketing_consent
    description: Indicates if the customer agrees to receive marketing
    tests:
    - not_null
  - name: tax_exempt
    description: Indicates if the customer is exempt from taxes
    tests:
    - not_null
  - name: email_verified
    description: Indicates if the customer's email is verified
    tests:
    - not_null
  - name: account_creation_date
    description: Timestamp when the customer account was created
    tests:
    - not_null
  - name: customer_id
    description: Unique identifier for the customer
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each customer. For
        this table, each row is for a unique customer. Customer_id is designed to
        be unique across all customers and is typically used as a primary key in database
        systems.
  - name: default_address_id
    description: ID of the customer's default shipping address
    tests:
    - not_null
  - name: last_updated_date
    description: Timestamp of the last update to the customer record
    tests:
    - not_null
  - name: phone
    description: Customer's phone number
    cocoon_meta:
      missing_acceptable: Phone number may not be required for all customers.
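
To make the `accepted_values` test on `account_state` concrete, the sketch below checks a set of rows against the list given in the YAML. The helper name `unexpected_values` is hypothetical, and skipping null values is a choice of this sketch rather than a statement about how the test runner behaves.

```python
# Accepted values copied from the account_state test above.
ACCOUNT_STATES = {
    "disabled", "invited", "active", "suspended", "pending", "closed", "archived",
}

def unexpected_values(rows, column, accepted):
    """Return the sorted non-null values of `column` that are not in `accepted`."""
    seen = {r[column] for r in rows if r[column] is not None}
    return sorted(seen - set(accepted))

# account_state values from the three preview rows.
rows = [
    {"account_state": "disabled"},
    {"account_state": "invited"},
    {"account_state": "disabled"},
]
print(unexpected_values(rows, "account_state", ACCOUNT_STATES))
```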

stg_shopify_shop_data (first 100 rows)

setup_required timezone ssl_enforced weight_unit county_taxes_applied plan_display_name gift_cards_offered cookie_consent_level checkout_api_support is_deleted payment_processing_eligible longitude discounts_offered shopify_domain country_name primary_address shop_timezone password_protection_enabled shop_domain storefront_active state_province_code owner_email iso_country_code multi_location_enabled primary_language money_with_currency_format taxes_included primary_currency latitude shop_owner pre_launch_enabled city money_format plan_name email_currency_format store_name tracking_consent_preference card_reader_promo_eligible shop_id country_code customer_contact_email extra_payment_agreement_required email_currency_display_format state_province creation_timestamp enabled_currencies google_apps_domain google_apps_login_enabled last_updated phone postal_code primary_location_id tax_on_shipping
0 False (GMT-05:00) America/New_York True lb True Shopify Plus True implicit True False True -123.12345 True kitties.myshopify.com United States 1 Main Street America/New_York False kitties.com True NY abc@kitties.com US True en ${{amount}} USD False USD 80.1234 Garrett & Alfredo False New York ${{amount}} shopify_plus ${{amount}} Garrett & Alfredo allow_all True 689 US noreply@kitties.com False ${{amount}} USD New York 2018-12-10 16:24:00 [USD] None NaN 2022-12-07 00:26:36 13373 10014 1234646345 None

stg_shopify_shop_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_shop_data_projected" AS (
    -- Projection: Selecting 56 out of 57 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "address_1",
        "address_2",
        "auto_configure_tax_inclusivity",
        "checkout_api_supported",
        "city",
        "cookie_consent_level",
        "country",
        "country_code",
        "country_name",
        "county_taxes",
        "created_at",
        "currency",
        "customer_email",
        "domain_",
        "eligible_for_card_reader_giveaway",
        "eligible_for_payments",
        "email",
        "enabled_presentment_currencies",
        "force_ssl",
        "google_apps_domain",
        "google_apps_login_enabled",
        "has_discounts",
        "has_gift_cards",
        "has_storefront",
        "iana_timezone",
        "latitude",
        "longitude",
        "money_format",
        "money_in_emails_format",
        "money_with_currency_format",
        "money_with_currency_in_emails_format",
        "multi_location_enabled",
        "myshopify_domain",
        "name",
        "password_enabled",
        "phone",
        "plan_display_name",
        "plan_name",
        "pre_launch_enabled",
        "primary_locale",
        "primary_location_id",
        "province",
        "province_code",
        "requires_extra_payments_agreement",
        "setup_required",
        "shop_owner",
        "source",
        "tax_shipping",
        "taxes_included",
        "timezone",
        "updated_at",
        "visitor_tracking_consent_preference",
        "weight_unit",
        "zip"
    FROM "shopify_shop_data"
),

"shopify_shop_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> shop_id
    -- _fivetran_deleted -> is_deleted
    -- address_1 -> primary_address
    -- address_2 -> secondary_address
    -- auto_configure_tax_inclusivity -> auto_tax_inclusivity
    -- checkout_api_supported -> checkout_api_support
    -- country -> country_code
    -- country_code -> iso_country_code
    -- county_taxes -> county_taxes_applied
    -- created_at -> creation_timestamp
    -- currency -> primary_currency
    -- customer_email -> customer_contact_email
    -- domain_ -> shop_domain
    -- eligible_for_card_reader_giveaway -> card_reader_promo_eligible
    -- eligible_for_payments -> payment_processing_eligible
    -- email -> owner_email
    -- enabled_presentment_currencies -> enabled_currencies
    -- force_ssl -> ssl_enforced
    -- has_discounts -> discounts_offered
    -- has_gift_cards -> gift_cards_offered
    -- has_storefront -> storefront_active
    -- iana_timezone -> shop_timezone
    -- money_in_emails_format -> email_currency_format
    -- money_with_currency_in_emails_format -> email_currency_display_format
    -- myshopify_domain -> shopify_domain
    -- name -> store_name
    -- password_enabled -> password_protection_enabled
    -- primary_locale -> primary_language
    -- province -> state_province
    -- province_code -> state_province_code
    -- requires_extra_payments_agreement -> extra_payment_agreement_required
    -- source -> creation_source
    -- tax_shipping -> tax_on_shipping
    -- updated_at -> last_updated
    -- visitor_tracking_consent_preference -> tracking_consent_preference
    -- zip -> postal_code
    SELECT 
        "id" AS "shop_id",
        "_fivetran_deleted" AS "is_deleted",
        "address_1" AS "primary_address",
        "address_2" AS "secondary_address",
        "auto_configure_tax_inclusivity" AS "auto_tax_inclusivity",
        "checkout_api_supported" AS "checkout_api_support",
        "city",
        "cookie_consent_level",
        "country" AS "country_code",
        "country_code" AS "iso_country_code",
        "country_name",
        "county_taxes" AS "county_taxes_applied",
        "created_at" AS "creation_timestamp",
        "currency" AS "primary_currency",
        "customer_email" AS "customer_contact_email",
        "domain_" AS "shop_domain",
        "eligible_for_card_reader_giveaway" AS "card_reader_promo_eligible",
        "eligible_for_payments" AS "payment_processing_eligible",
        "email" AS "owner_email",
        "enabled_presentment_currencies" AS "enabled_currencies",
        "force_ssl" AS "ssl_enforced",
        "google_apps_domain",
        "google_apps_login_enabled",
        "has_discounts" AS "discounts_offered",
        "has_gift_cards" AS "gift_cards_offered",
        "has_storefront" AS "storefront_active",
        "iana_timezone" AS "shop_timezone",
        "latitude",
        "longitude",
        "money_format",
        "money_in_emails_format" AS "email_currency_format",
        "money_with_currency_format",
        "money_with_currency_in_emails_format" AS "email_currency_display_format",
        "multi_location_enabled",
        "myshopify_domain" AS "shopify_domain",
        "name" AS "store_name",
        "password_enabled" AS "password_protection_enabled",
        "phone",
        "plan_display_name",
        "plan_name",
        "pre_launch_enabled",
        "primary_locale" AS "primary_language",
        "primary_location_id",
        "province" AS "state_province",
        "province_code" AS "state_province_code",
        "requires_extra_payments_agreement" AS "extra_payment_agreement_required",
        "setup_required",
        "shop_owner",
        "source" AS "creation_source",
        "tax_shipping" AS "tax_on_shipping",
        "taxes_included",
        "timezone",
        "updated_at" AS "last_updated",
        "visitor_tracking_consent_preference" AS "tracking_consent_preference",
        "weight_unit",
        "zip" AS "postal_code"
    FROM "shopify_shop_data_projected"
),

"shopify_shop_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- secondary_address: '200th Floor' is an implausible address value. Buildings with 200 floors are virtually non-existent; the current tallest building in the world (Burj Khalifa) has only 163 floors. The value is likely a data entry error or a placeholder/test value. The correct address is unknown, so the value is mapped to an empty string to indicate missing data.
    SELECT
        "shop_id",
        "is_deleted",
        "primary_address",
        CASE
            WHEN "secondary_address" = '200th Floor' THEN ''
            ELSE "secondary_address"
        END AS "secondary_address",
        "auto_tax_inclusivity",
        "checkout_api_support",
        "city",
        "cookie_consent_level",
        "country_code",
        "iso_country_code",
        "country_name",
        "county_taxes_applied",
        "creation_timestamp",
        "primary_currency",
        "customer_contact_email",
        "shop_domain",
        "card_reader_promo_eligible",
        "payment_processing_eligible",
        "owner_email",
        "enabled_currencies",
        "ssl_enforced",
        "google_apps_domain",
        "google_apps_login_enabled",
        "discounts_offered",
        "gift_cards_offered",
        "storefront_active",
        "shop_timezone",
        "latitude",
        "longitude",
        "money_format",
        "email_currency_format",
        "money_with_currency_format",
        "email_currency_display_format",
        "multi_location_enabled",
        "shopify_domain",
        "store_name",
        "password_protection_enabled",
        "phone",
        "plan_display_name",
        "plan_name",
        "pre_launch_enabled",
        "primary_language",
        "primary_location_id",
        "state_province",
        "state_province_code",
        "extra_payment_agreement_required",
        "setup_required",
        "shop_owner",
        "creation_source",
        "tax_on_shipping",
        "taxes_included",
        "timezone",
        "last_updated",
        "tracking_consent_preference",
        "weight_unit",
        "postal_code"
    FROM "shopify_shop_data_projected_renamed"
),

"shopify_shop_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- secondary_address: ['']
    SELECT 
        CASE
            WHEN "secondary_address" = '' THEN NULL
            ELSE "secondary_address"
        END AS "secondary_address",
        "setup_required",
        "timezone",
        "ssl_enforced",
        "weight_unit",
        "county_taxes_applied",
        "plan_display_name",
        "primary_location_id",
        "gift_cards_offered",
        "cookie_consent_level",
        "enabled_currencies",
        "checkout_api_support",
        "google_apps_domain",
        "last_updated",
        "is_deleted",
        "payment_processing_eligible",
        "google_apps_login_enabled",
        "longitude",
        "creation_source",
        "discounts_offered",
        "shopify_domain",
        "country_name",
        "primary_address",
        "postal_code",
        "shop_timezone",
        "password_protection_enabled",
        "shop_domain",
        "storefront_active",
        "state_province_code",
        "owner_email",
        "iso_country_code",
        "multi_location_enabled",
        "primary_language",
        "tax_on_shipping",
        "money_with_currency_format",
        "auto_tax_inclusivity",
        "taxes_included",
        "primary_currency",
        "latitude",
        "phone",
        "shop_owner",
        "pre_launch_enabled",
        "city",
        "money_format",
        "plan_name",
        "email_currency_format",
        "store_name",
        "tracking_consent_preference",
        "card_reader_promo_eligible",
        "shop_id",
        "country_code",
        "customer_contact_email",
        "extra_payment_agreement_required",
        "email_currency_display_format",
        "creation_timestamp",
        "state_province"
    FROM "shopify_shop_data_projected_renamed_cleaned"
),

"shopify_shop_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- auto_tax_inclusivity: from DECIMAL to BOOLEAN
    -- creation_source: from DECIMAL to VARCHAR
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- enabled_currencies: from VARCHAR to ARRAY
    -- google_apps_domain: from DECIMAL to VARCHAR
    -- google_apps_login_enabled: from DECIMAL to BOOLEAN
    -- last_updated: from VARCHAR to TIMESTAMP
    -- phone: from INT to VARCHAR
    -- postal_code: from INT to VARCHAR
    -- primary_location_id: from INT to VARCHAR
    -- tax_on_shipping: from DECIMAL to VARCHAR
    SELECT
        "secondary_address",
        "setup_required",
        "timezone",
        "ssl_enforced",
        "weight_unit",
        "county_taxes_applied",
        "plan_display_name",
        "gift_cards_offered",
        "cookie_consent_level",
        "checkout_api_support",
        "is_deleted",
        "payment_processing_eligible",
        "longitude",
        "discounts_offered",
        "shopify_domain",
        "country_name",
        "primary_address",
        "shop_timezone",
        "password_protection_enabled",
        "shop_domain",
        "storefront_active",
        "state_province_code",
        "owner_email",
        "iso_country_code",
        "multi_location_enabled",
        "primary_language",
        "money_with_currency_format",
        "taxes_included",
        "primary_currency",
        "latitude",
        "shop_owner",
        "pre_launch_enabled",
        "city",
        "money_format",
        "plan_name",
        "email_currency_format",
        "store_name",
        "tracking_consent_preference",
        "card_reader_promo_eligible",
        "shop_id",
        "country_code",
        "customer_contact_email",
        "extra_payment_agreement_required",
        "email_currency_display_format",
        "state_province",
        CAST("auto_tax_inclusivity" AS BOOLEAN) AS "auto_tax_inclusivity",
        CAST("creation_source" AS VARCHAR) AS "creation_source",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        from_json("enabled_currencies", '["VARCHAR"]') AS "enabled_currencies",
        CAST("google_apps_domain" AS VARCHAR) AS "google_apps_domain",
        CAST("google_apps_login_enabled" AS BOOLEAN) AS "google_apps_login_enabled",
        CAST("last_updated" AS TIMESTAMP) AS "last_updated",
        CAST("phone" AS VARCHAR) AS "phone",
        CAST("postal_code" AS VARCHAR) AS "postal_code",
        CAST("primary_location_id" AS VARCHAR) AS "primary_location_id",
        CAST("tax_on_shipping" AS VARCHAR) AS "tax_on_shipping"
    FROM "shopify_shop_data_projected_renamed_cleaned_null"
),

"shopify_shop_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- auto_tax_inclusivity has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- creation_source has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- secondary_address has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "setup_required",
        "timezone",
        "ssl_enforced",
        "weight_unit",
        "county_taxes_applied",
        "plan_display_name",
        "gift_cards_offered",
        "cookie_consent_level",
        "checkout_api_support",
        "is_deleted",
        "payment_processing_eligible",
        "longitude",
        "discounts_offered",
        "shopify_domain",
        "country_name",
        "primary_address",
        "shop_timezone",
        "password_protection_enabled",
        "shop_domain",
        "storefront_active",
        "state_province_code",
        "owner_email",
        "iso_country_code",
        "multi_location_enabled",
        "primary_language",
        "money_with_currency_format",
        "taxes_included",
        "primary_currency",
        "latitude",
        "shop_owner",
        "pre_launch_enabled",
        "city",
        "money_format",
        "plan_name",
        "email_currency_format",
        "store_name",
        "tracking_consent_preference",
        "card_reader_promo_eligible",
        "shop_id",
        "country_code",
        "customer_contact_email",
        "extra_payment_agreement_required",
        "email_currency_display_format",
        "state_province",
        "creation_timestamp",
        "enabled_currencies",
        "google_apps_domain",
        "google_apps_login_enabled",
        "last_updated",
        "phone",
        "postal_code",
        "primary_location_id",
        "tax_on_shipping"
    FROM "shopify_shop_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_shop_data_projected_renamed_cleaned_null_casted_missing_handled"
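
The cleaning, NULL imputation, and missing-value steps above interact: `'200th Floor'` becomes `''`, `''` becomes NULL, and a column that is NULL in every row is then dropped. The Python sketch below walks one row through that chain. The helper names are hypothetical, and the single sample row is assembled from values mentioned above for illustration only.

```python
def clean_secondary_address(value):
    """Map the flagged value to '' and then '' to None, as the two CASE steps do."""
    if value == "200th Floor":
        value = ""
    return None if value == "" else value

def drop_fully_missing(rows, candidates):
    """Drop each candidate column whose value is None in every row."""
    to_drop = {c for c in candidates if all(r.get(c) is None for r in rows)}
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

rows = [{"shop_id": "689", "secondary_address": "200th Floor", "creation_source": None}]
for r in rows:
    r["secondary_address"] = clean_secondary_address(r["secondary_address"])

print(drop_fully_missing(rows, ["secondary_address", "creation_source"]))
```

If a later load contains a real `secondary_address`, the column would no longer be 100 percent missing, so the drop decision depends on the data seen when the model was generated.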

stg_shopify_shop_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_shop_data
  description: This table holds Shopify shop records. It contains shop information
    such as ID, address, currency, and domain, along with shop settings such as tax
    configuration, checkout options, and enabled features. It also has owner details,
    plan information, and location data. Together these form a profile of a Shopify
    store with its configuration and operational details.
  columns:
  - name: setup_required
    description: Store setup completion status
    tests:
    - not_null
  - name: timezone
    description: Store's timezone
    tests:
    - not_null
  - name: ssl_enforced
    description: Indicates if SSL is enforced
    tests:
    - not_null
  - name: weight_unit
    description: Unit of weight measurement
    tests:
    - not_null
    - accepted_values:
        values:
        - lb
        - kg
        - g
        - oz
        - t
        - mg
        - stone
        - cwt
        - "\xB5g"
        - slug
  - name: county_taxes_applied
    description: Indicates if county taxes are applied
    tests:
    - not_null
  - name: plan_display_name
    description: Displayed name of the Shopify plan
    tests:
    - not_null
    - accepted_values:
        values:
        - Basic
        - Shopify
        - Advanced
        - Shopify Plus
        - Starter
        - Lite
  - name: gift_cards_offered
    description: Indicates if the shop offers gift cards
    tests:
    - not_null
  - name: cookie_consent_level
    description: Level of cookie consent implemented
    tests:
    - not_null
    - accepted_values:
        values:
        - implicit
        - explicit
        - no_consent
        - partial
        - full
        - necessary_only
        - functional
        - analytical
        - marketing
  - name: checkout_api_support
    description: Indicates if checkout API is supported
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: payment_processing_eligible
    description: Eligibility for payment processing
    tests:
    - not_null
  - name: longitude
    description: Longitude coordinate of shop location
    tests:
    - not_null
  - name: discounts_offered
    description: Indicates if the shop offers discounts
    tests:
    - not_null
  - name: shopify_domain
    description: Shopify-provided domain for the store
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the Shopify-provided domain for each store.
        Each Shopify store has a unique myshopify.com domain, making this column unique
        across all rows.
  - name: country_name
    description: Full name of the country
    tests:
    - not_null
  - name: primary_address
    description: Primary address of the shop
    tests:
    - not_null
  - name: shop_timezone
    description: IANA timezone of the shop
    tests:
    - not_null
  - name: password_protection_enabled
    description: Store password protection status
    tests:
    - not_null
  - name: shop_domain
    description: Shop's domain name
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the custom domain name for the shop. Each
        shop is likely to have a unique domain name, making this column unique across
        all rows.
  - name: storefront_active
    description: Indicates if the shop has a storefront
    tests:
    - not_null
  - name: state_province_code
    description: State or province code
    tests:
    - not_null
    - accepted_values:
        values:
        - AL
        - AK
        - AZ
        - AR
        - CA
        - CO
        - CT
        - DE
        - FL
        - GA
        - HI
        - ID
        - IL
        - IN
        - IA
        - KS
        - KY
        - LA
        - ME
        - MD
        - MA
        - MI
        - MN
        - MS
        - MO
        - MT
        - NE
        - NV
        - NH
        - NJ
        - NM
        - NY
        - NC
        - ND
        - OH
        - OK
        - OR
        - PA
        - RI
        - SC
        - SD
        - TN
        - TX
        - UT
        - VT
        - VA
        - WA
        - WV
        - WI
        - WY
        - DC
        - AS
        - GU
        - MP
        - PR
        - VI
        - AB
        - BC
        - MB
        - NB
        - NL
        - NS
        - NT
        - NU
        - 'ON'
        - PE
        - QC
        - SK
        - YT
  - name: owner_email
    description: Shop owner's email address
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the shop owner's email address. For this table,
        each row is a unique Shopify shop. owner_email could be unique across rows
        as it's typically associated with a specific shop account.
  - name: iso_country_code
    description: ISO country code of shop location
    tests:
    - not_null
  - name: multi_location_enabled
    description: Multiple store locations enabled
    tests:
    - not_null
  - name: primary_language
    description: Primary language of the store
    tests:
    - not_null
  - name: money_with_currency_format
    description: Currency display format used when the currency is specified
    tests:
    - not_null
  - name: taxes_included
    description: Indicates if prices include taxes
    tests:
    - not_null
  - name: primary_currency
    description: Primary currency used by the shop
    tests:
    - not_null
  - name: latitude
    description: Latitude coordinate of shop location
    tests:
    - not_null
  - name: shop_owner
    description: Name of the store owner
    tests:
    - not_null
  - name: pre_launch_enabled
    description: Pre-launch mode status
    tests:
    - not_null
  - name: city
    description: City where the shop is located
    tests:
    - not_null
  - name: money_format
    description: Currency display format used when the currency isn't specified
    tests:
    - not_null
  - name: plan_name
    description: Internal name of the Shopify plan
    tests:
    - not_null
    - accepted_values:
        values:
        - basic
        - shopify
        - advanced
        - shopify_plus
        - lite
        - starter
  - name: email_currency_format
    description: Currency display format in emails when the currency isn't specified
    tests:
    - not_null
    - accepted_values:
        values:
        - '${{amount}}'
        - '{{amount}} USD'
        - '{{symbol}}{{amount}}'
        - '{{amount}} {{code}}'
        - '{{symbol}} {{amount}}'
        - '{{amount}}'
  - name: store_name
    description: Store name
    tests:
    - not_null
  - name: tracking_consent_preference
    description: Visitor tracking consent setting
    tests:
    - not_null
    - accepted_values:
        values:
        - allow_all
        - allow_essential
        - deny_all
  - name: card_reader_promo_eligible
    description: Eligibility for card reader promotion
    tests:
    - not_null
  - name: shop_id
    description: Unique identifier for the shop
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the unique identifier for the shop. For this
        table, each row is a unique Shopify shop. The shop_id is designed to be a
        unique identifier for each shop, ensuring it's unique across all rows.
  - name: country_code
    description: Country code where the shop is located
    tests:
    - not_null
  - name: customer_contact_email
    description: Email for customer communications
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the email address for customer communications.
        For this table, each row is for a unique Shopify shop. Customer contact email
        could be unique across shops, as it's likely to be a shop-specific email address.
  - name: extra_payment_agreement_required
    description: Additional payment agreement required
    tests:
    - not_null
  - name: email_currency_display_format
    description: Currency display format in emails when the currency is specified
    tests:
    - not_null
  - name: state_province
    description: Store's state or province
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp of shop creation
    tests:
    - not_null
  - name: enabled_currencies
    description: List of enabled currencies for transactions
    tests:
    - not_null
  - name: google_apps_domain
    description: Google Apps domain if applicable
    cocoon_meta:
      missing_acceptable: Not applicable if Google Apps integration isn't used.
  - name: google_apps_login_enabled
    description: Status of Google Apps login
    cocoon_meta:
      missing_acceptable: Not applicable if Google Apps integration isn't used.
  - name: last_updated
    description: Last update timestamp
    tests:
    - not_null
  - name: phone
    description: Store contact phone number
    tests:
    - not_null
  - name: postal_code
    description: Store's ZIP or postal code
    tests:
    - not_null
  - name: primary_location_id
    description: ID of the main store location
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the ID of the main store location. For this
        table, each row is for a unique Shopify shop. This ID is likely to be unique
        for each shop as it represents a specific location.
  - name: tax_on_shipping
    description: Shipping tax application status
    cocoon_meta:
      missing_acceptable: Not applicable when no tax is charged on shipping.

stg_shopify_order_line_data (first 100 rows)

product_name product_title vendor_id item_price quantity weight_grams sku fulfillable_quantity fulfillment_service is_gift_card requires_shipping is_taxable item_position fulfillment_status line_item_id order_id product_id total_discount variant_id
0 327ea22d0f91783418e519cb45a4a3e9 327ea22d0f91783418e519cb45a4a3e9 13aea892c8de2d62f2608c6191cfab1f 4.4 1 0 854a136da51d43fb87c63c86a62ffad0 0 manual False True False 1 fulfilled 5699743678561 2669509541985 4526236893281 0.0 31879811629153
1 1fccbdc6ac5f6edabf76e56eb0460019 1fccbdc6ac5f6edabf76e56eb0460019 13aea892c8de2d62f2608c6191cfab1f 2.8 1 0 198369004c95b2b35f480f9691b14178 0 manual False True False 1 fulfilled 5699758784609 2669516488801 4506451050593 0.0 31814873481313
2 74c574cc1e545fef2beeaf9bbb148fcc 74c574cc1e545fef2beeaf9bbb148fcc 57403999f78b01b3fd325ba256eafe94 2.8 2 0 b988b358c81b47d3e438c99bfb1c4ee1 2 manual False True False 1 None 5708321914977 2674098602081 4505775439969 0.0 31812476895329

stg_shopify_order_line_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_line_data_projected" AS (
    -- Projection: Selecting 20 out of 21 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "order_id",
        "id",
        "product_id",
        "variant_id",
        "name",
        "title",
        "vendor",
        "price",
        "quantity",
        "grams",
        "sku",
        "fulfillable_quantity",
        "fulfillment_service",
        "gift_card",
        "requires_shipping",
        "taxable",
        "index_",
        "total_discount",
        "pre_tax_price",
        "fulfillment_status"
    FROM "shopify_order_line_data"
),

"shopify_order_line_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> line_item_id
    -- name -> product_name
    -- title -> product_title
    -- vendor -> vendor_id
    -- price -> item_price
    -- grams -> weight_grams
    -- gift_card -> is_gift_card
    -- taxable -> is_taxable
    -- index_ -> item_position
    SELECT 
        "order_id",
        "id" AS "line_item_id",
        "product_id",
        "variant_id",
        "name" AS "product_name",
        "title" AS "product_title",
        "vendor" AS "vendor_id",
        "price" AS "item_price",
        "quantity",
        "grams" AS "weight_grams",
        "sku",
        "fulfillable_quantity",
        "fulfillment_service",
        "gift_card" AS "is_gift_card",
        "requires_shipping",
        "taxable" AS "is_taxable",
        "index_" AS "item_position",
        "total_discount",
        "pre_tax_price",
        "fulfillment_status"
    FROM "shopify_order_line_data_projected"
),

"shopify_order_line_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- line_item_id: from INT to VARCHAR
    -- order_id: from INT to VARCHAR
    -- pre_tax_price: from DECIMAL to VARCHAR
    -- product_id: from INT to VARCHAR
    -- total_discount: from INT to DECIMAL
    -- variant_id: from INT to VARCHAR
    SELECT
        "product_name",
        "product_title",
        "vendor_id",
        "item_price",
        "quantity",
        "weight_grams",
        "sku",
        "fulfillable_quantity",
        "fulfillment_service",
        "is_gift_card",
        "requires_shipping",
        "is_taxable",
        "item_position",
        "fulfillment_status",
        CAST("line_item_id" AS VARCHAR) AS "line_item_id",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("pre_tax_price" AS VARCHAR) AS "pre_tax_price",
        CAST("product_id" AS VARCHAR) AS "product_id",
        CAST("total_discount" AS DECIMAL) AS "total_discount",
        CAST("variant_id" AS VARCHAR) AS "variant_id"
    FROM "shopify_order_line_data_projected_renamed"
),

"shopify_order_line_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There is 1 column with unacceptable missing values
    -- pre_tax_price has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "product_name",
        "product_title",
        "vendor_id",
        "item_price",
        "quantity",
        "weight_grams",
        "sku",
        "fulfillable_quantity",
        "fulfillment_service",
        "is_gift_card",
        "requires_shipping",
        "is_taxable",
        "item_position",
        "fulfillment_status",
        "line_item_id",
        "order_id",
        "product_id",
        "total_discount",
        "variant_id"
    FROM "shopify_order_line_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_line_data_projected_renamed_casted_missing_handled"

stg_shopify_order_line_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_line_data
  description: The table is about Shopify order line items. It contains details such
    as order ID, product ID, variant ID, product name, price, quantity, SKU, fulfillment
    status, and other order-specific information. Each row represents a single item
    within an order, including its pricing, shipping requirements, and fulfillment
    details.
  columns:
  - name: product_name
    description: Name or identifier of the product
    tests:
    - not_null
  - name: product_title
    description: Title or name of the product
    tests:
    - not_null
  - name: vendor_id
    description: Identifier or name of the vendor
    tests:
    - not_null
  - name: item_price
    description: Price of the item
    tests:
    - not_null
  - name: quantity
    description: Number of items ordered
    tests:
    - not_null
  - name: weight_grams
    description: Weight of the item in grams
    tests:
    - not_null
  - name: sku
    description: Stock Keeping Unit identifier
    tests:
    - not_null
  - name: fulfillable_quantity
    description: Quantity of items available for fulfillment
    tests:
    - not_null
  - name: fulfillment_service
    description: Service used for order fulfillment
    tests:
    - not_null
    - accepted_values:
        values:
        - manual
        - amazon
        - shipwire
        - webgistix
        - shipstation
        - shopify_fulfillment
        - third_party
        - self_fulfilled
        - drop_ship
        - fba  # Fulfillment by Amazon
        - external
  - name: is_gift_card
    description: Indicates if the item is a gift card
    tests:
    - not_null
  - name: requires_shipping
    description: Indicates if the item needs shipping
    tests:
    - not_null
  - name: is_taxable
    description: Indicates if the item is taxable
    tests:
    - not_null
  - name: item_position
    description: Position of the item in the order
    tests:
    - not_null
  - name: fulfillment_status
    description: Current status of order fulfillment
    tests:
    - accepted_values:
        values:
        - fulfilled
        - unfulfilled
        - partially_fulfilled
        - cancelled
        - processing
        - on_hold
        - returned
    cocoon_meta:
      missing_acceptable: Not applicable for unfulfilled orders still in progress.
  - name: line_item_id
    description: Unique identifier for the line item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each line item in
        an order. For this table, where each row is a single item within an order,
        line_item_id should be unique across all rows.
  - name: order_id
    description: Identifier of the order this line item belongs to
    tests:
    - not_null
  - name: product_id
    description: Unique identifier for the product
    tests:
    - not_null
  - name: total_discount
    description: Total discount applied to the item
    tests:
    - not_null
  - name: variant_id
    description: Unique identifier for the product variant
    tests:
    - not_null
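
The not_null and unique tests declared above can be illustrated with a small sketch. It assumes the tests behave roughly like the two queries below (count NULLs; count values that occur more than once) and runs them with Python's built-in sqlite3 module rather than the actual warehouse. The helper names not_null_failures and unique_failures are hypothetical, and the three rows come from the preview of this table.

```python
import sqlite3

# Three rows from the table preview (only the columns needed for the checks).
rows = [
    ("5699743678561", "2669509541985", "fulfilled"),
    ("5699758784609", "2669516488801", "fulfilled"),
    ("5708321914977", "2674098602081", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE stg_shopify_order_line_data (
        "line_item_id" TEXT,
        "order_id" TEXT,
        "fulfillment_status" TEXT
    )
    """
)
conn.executemany("INSERT INTO stg_shopify_order_line_data VALUES (?, ?, ?)", rows)


def not_null_failures(column: str) -> int:
    """Number of rows where the column is NULL; a not_null test passes at 0."""
    query = (
        "SELECT COUNT(*) FROM stg_shopify_order_line_data "
        f'WHERE "{column}" IS NULL'
    )
    return conn.execute(query).fetchone()[0]


def unique_failures(column: str) -> int:
    """Number of values occurring more than once; a unique test passes at 0."""
    query = (
        "SELECT COUNT(*) FROM ("
        f'SELECT "{column}" FROM stg_shopify_order_line_data '
        f'WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1)'
    )
    return conn.execute(query).fetchone()[0]


print(not_null_failures("line_item_id"), unique_failures("line_item_id"))
print(not_null_failures("fulfillment_status"))
```

On these rows line_item_id passes both checks, while fulfillment_status has one missing value, which is why that column carries a missing_acceptable note instead of a not_null test.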

stg_shopify_order_url_tag_data (first 100 rows)

metadata_key metadata_value order_id
0 image Image 40347
1 utm_medium email 4290347
2 prop_channel flows 47

stg_shopify_order_url_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_url_tag_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "key_",
        "order_id",
        "value_"
    FROM "shopify_order_url_tag_data"
),

"shopify_order_url_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- key_ -> metadata_key
    -- value_ -> metadata_value
    SELECT 
        "key_" AS "metadata_key",
        "order_id",
        "value_" AS "metadata_value"
    FROM "shopify_order_url_tag_data_projected"
),

"shopify_order_url_tag_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- order_id: from INT to VARCHAR
    SELECT
        "metadata_key",
        "metadata_value",
        CAST("order_id" AS VARCHAR) AS "order_id"
    FROM "shopify_order_url_tag_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_url_tag_data_projected_renamed_casted"

stg_shopify_order_url_tag_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_url_tag_data
  description: The table is about Shopify orders and their associated metadata. It
    contains key-value pairs for each order, identified by an order_id. The keys represent
    different types of data like image, utm_medium, and prop_channel. The values provide
    specific information corresponding to each key for a given order.
  columns:
  - name: metadata_key
    description: Identifier for the type of metadata
    tests:
    - not_null
  - name: metadata_value
    description: Specific information corresponding to the metadata key
    tests:
    - not_null
  - name: order_id
    description: Identifier of the Shopify order the key-value pair belongs to
    tests:
    - not_null
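
Because this table stores one key-value pair per row, a consumer will often want the pairs regrouped per order. The following is a minimal sketch of that regrouping in Python, using the three preview rows; the function name tags_by_order is hypothetical, and the sketch assumes that if a key repeats for an order the last value wins.

```python
from collections import defaultdict

# The three preview rows, with order_id already cast to a string as in the model.
rows = [
    {"metadata_key": "image", "metadata_value": "Image", "order_id": "40347"},
    {"metadata_key": "utm_medium", "metadata_value": "email", "order_id": "4290347"},
    {"metadata_key": "prop_channel", "metadata_value": "flows", "order_id": "47"},
]


def tags_by_order(tag_rows):
    """Group key-value rows into one dict of tags per order_id."""
    grouped = defaultdict(dict)
    for row in tag_rows:
        # Assumption: a repeated key for the same order overwrites the earlier value.
        grouped[row["order_id"]][row["metadata_key"]] = row["metadata_value"]
    return dict(grouped)


tags = tags_by_order(rows)
print(tags["4290347"])
```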

stg_shopify_metafield_data (first 100 rows)

data_key namespace resource_type value_data_type created_at order_id record_id return_authorization_data updated_at
0 returnAuthorizations blade_runner order json_string 2019-10-28 20:06:39 390244 5445055 [{"id":"ce95-49e4-9daf-41f29bbbb799","totalValue":44444,"status":"RECEIVED","payload":{"totalReturnValue":4444,"validReturnItems":[{"UPC":"19073825552","Quantity":"1","Reason":"changed-mind","LineItem":"40055558892132"}]},"createdAt":"2019-10-28T20:06:39.569Z","modifiedAt":"2019-10-28T20:06:39.569Z"}] 2019-10-28 20:06:39
1 returnAuthorizations blade_runner order json_string 2020-06-17 11:35:28 254671 6337647 [{"id":"557ece73-658b-cf694dcd3f7e","totalValue":4444,"status":"RECEIVED","payload":{"totalReturnValue":444.77,"validReturnItems":[{"UPC":"19055550468","Quantity":"1","Reason":"fit-issues","LineItem":"4935555579471"}]},"createdAt":"2020-06-17T11:35:28.469Z","modifiedAt":"2020-06-17T11:35:28.470Z"}] 2020-06-17 11:35:28
2 returnAuthorizations blade_runner order json_string 2020-06-10 18:35:44 22527 576111 [{"id":"e461c20a-9dc7-d38de1c9012a","totalValue":4444,"status":"RECEIVED","payload":{"totalReturnValue":444,"validReturnItems":[{"UPC":"190735551121","Quantity":"1","Reason":"too-big","LineItem":"4925555231"}]},"createdAt":"2020-06-10T18:35:44.043Z","modifiedAt":"2020-06-10T18:35:44.043Z"}] 2020-06-10 18:35:44
3 returnAuthorizations blade_runner order json_string 2020-07-15 21:24:16 2335775 55241839 [{"id":"0c79163e-f55b56f50aff","totalValue":44478.000000000004,"status":"RECEIVED","payload":{"totalReturnValue":4444.78000000000003,"validReturnItems":[{"UPC":"190555325","Quantity":"1","Reason":"fit-issues","LineItem":"5555599407"}]},"createdAt":"2020-07-15T21:24:16.210Z","modifiedAt":"2020-07-15T21:24:16.210Z"}] 2020-07-15 21:24:16
4 returnAuthorizations blade_runner order json_string 2020-06-24 17:23:12 220655 4575 [{"id":"3679-4811-94fd-555bf9846753","totalValue":44581,"status":"BACKEND_GENERATED","payload":{"totalReturnValue":4444.81,"validReturnItems":[{"UPC":"190735558","Quantity":1,"Reason":"Changed My Mind","LineItem":"455555711"}]},"createdAt":"2020-06-24T17:23:12.272Z","modifiedAt":"2020-06-24T17:23:12.272Z"}] 2020-06-24 17:23:12

stg_shopify_metafield_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_metafield_data_projected" AS (
    -- Projection: Selecting 11 out of 12 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "description",
        "key_",
        "namespace",
        "owner_id",
        "owner_resource",
        "updated_at",
        "value_",
        "value_type",
        "type"
    FROM "shopify_metafield_data"
),

"shopify_metafield_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> record_id
    -- key_ -> data_key
    -- owner_id -> order_id
    -- owner_resource -> resource_type
    -- value_ -> return_authorization_data
    -- type -> value_data_type
    SELECT 
        "id" AS "record_id",
        "created_at",
        "description",
        "key_" AS "data_key",
        "namespace",
        "owner_id" AS "order_id",
        "owner_resource" AS "resource_type",
        "updated_at",
        "value_" AS "return_authorization_data",
        "value_type",
        "type" AS "value_data_type"
    FROM "shopify_metafield_data_projected"
),

"shopify_metafield_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- description: from DECIMAL to VARCHAR
    -- order_id: from INT to VARCHAR
    -- record_id: from INT to VARCHAR
    -- return_authorization_data: from VARCHAR to JSON
    -- updated_at: from VARCHAR to TIMESTAMP
    -- value_type: from DECIMAL to VARCHAR
    SELECT
        "data_key",
        "namespace",
        "resource_type",
        "value_data_type",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("description" AS VARCHAR) AS "description",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("record_id" AS VARCHAR) AS "record_id",
        CAST("return_authorization_data" AS JSON) AS "return_authorization_data",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at",
        CAST("value_type" AS VARCHAR) AS "value_type"
    FROM "shopify_metafield_data_projected_renamed"
),

"shopify_metafield_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- description has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- value_type has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "data_key",
        "namespace",
        "resource_type",
        "value_data_type",
        "created_at",
        "order_id",
        "record_id",
        "return_authorization_data",
        "updated_at"
    FROM "shopify_metafield_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_metafield_data_projected_renamed_casted_missing_handled"

stg_shopify_metafield_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_metafield_data
  description: The table is about order return authorizations. It contains metadata
    for each return, including a unique ID, total value, status, and creation date.
    The payload includes details such as the returned item's UPC, quantity, reason
    for return, and associated line item. The data is stored as JSON strings in a
    Shopify metafield.
  columns:
  - name: data_key
    description: Key identifier for the type of data
    tests:
    - not_null
  - name: namespace
    description: Namespace for the data (blade_runner in all cases)
    tests:
    - not_null
    - accepted_values:
        values:
        - blade_runner
  - name: resource_type
    description: Type of resource this data is associated with
    tests:
    - not_null
    - accepted_values:
        values:
        - order
        - product
        - customer
        - cart
        - payment
        - shipping
        - inventory
        - discount
        - review
        - wishlist
        - category
        - brand
        - store
        - return
        - refund
  - name: value_data_type
    description: Data type of the value field
    tests:
    - not_null
    - accepted_values:
        values:
        - json_string
        - json_number
        - json_boolean
        - json_null
        - json_object
        - json_array
        - json_integer
  - name: created_at
    description: Timestamp when the record was created
    tests:
    - not_null
  - name: order_id
    description: Identifier for the order associated with the return
    tests:
    - not_null
  - name: record_id
    description: Unique identifier for the record
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for the record. For this
        table, each row is a return authorization record. record_id appears to be
        unique across rows and is likely designed to be a primary key for the table.
  - name: return_authorization_data
    description: JSON array of return authorization details (cast from a JSON string)
    tests:
    - not_null
  - name: updated_at
    description: Timestamp when the record was last updated
    tests:
    - not_null
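
The nested structure of return_authorization_data described above can be unpacked as follows. This is a sketch only: it parses the value from the first preview row with Python's json module and flattens it to one record per returned item. The function name flatten_return_items and the output field names are hypothetical. Quantity appears both as a string ("1") and as a number (1) in the preview rows, so the sketch normalizes it to an integer.

```python
import json

# return_authorization_data from the first preview row, copied verbatim.
sample = (
    '[{"id":"ce95-49e4-9daf-41f29bbbb799","totalValue":44444,'
    '"status":"RECEIVED","payload":{"totalReturnValue":4444,'
    '"validReturnItems":[{"UPC":"19073825552","Quantity":"1",'
    '"Reason":"changed-mind","LineItem":"40055558892132"}]},'
    '"createdAt":"2019-10-28T20:06:39.569Z",'
    '"modifiedAt":"2019-10-28T20:06:39.569Z"}]'
)


def flatten_return_items(raw: str) -> list:
    """Return one flat dict per returned item across all authorizations."""
    flat = []
    for auth in json.loads(raw):
        for item in auth["payload"]["validReturnItems"]:
            flat.append(
                {
                    "authorization_id": auth["id"],
                    "status": auth["status"],
                    "upc": item["UPC"],
                    # Quantity is a string in some rows and a number in others.
                    "quantity": int(item["Quantity"]),
                    "reason": item["Reason"],
                    "line_item": item["LineItem"],
                }
            )
    return flat


items = flatten_return_items(sample)
print(items[0]["reason"], items[0]["quantity"])
```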

stg_shopify_inventory_item_data (first 100 rows)

item_id cost is_deleted creation_date is_tracked last_updated_date origin_country_code origin_province_code requires_shipping sku
0 4555 NaN True NaT NaN NaT None None NaN None
1 501419 NaN True NaT NaN NaT None None NaN None
2 851179 NaN True NaT NaN NaT None None NaN None

stg_shopify_inventory_item_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_inventory_item_data_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "cost",
        "created_at",
        "requires_shipping",
        "sku",
        "tracked",
        "updated_at",
        "country_code_of_origin",
        "province_code_of_origin",
        "_fivetran_deleted"
    FROM "shopify_inventory_item_data"
),

"shopify_inventory_item_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> item_id
    -- created_at -> creation_date
    -- tracked -> is_tracked
    -- updated_at -> last_updated_date
    -- country_code_of_origin -> origin_country_code
    -- province_code_of_origin -> origin_province_code
    -- _fivetran_deleted -> is_deleted
    SELECT 
        "id" AS "item_id",
        "cost",
        "created_at" AS "creation_date",
        "requires_shipping",
        "sku",
        "tracked" AS "is_tracked",
        "updated_at" AS "last_updated_date",
        "country_code_of_origin" AS "origin_country_code",
        "province_code_of_origin" AS "origin_province_code",
        "_fivetran_deleted" AS "is_deleted"
    FROM "shopify_inventory_item_data_projected"
),

"shopify_inventory_item_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- creation_date: from DECIMAL to TIMESTAMP
    -- is_tracked: from DECIMAL to BOOLEAN
    -- last_updated_date: from DECIMAL to TIMESTAMP
    -- origin_country_code: from DECIMAL to VARCHAR
    -- origin_province_code: from DECIMAL to VARCHAR
    -- requires_shipping: from DECIMAL to BOOLEAN
    -- sku: from DECIMAL to VARCHAR
    SELECT
        "item_id",
        "cost",
        "is_deleted",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("is_tracked" AS BOOLEAN) AS "is_tracked",
        CAST("last_updated_date" AS TIMESTAMP) AS "last_updated_date",
        CAST("origin_country_code" AS VARCHAR) AS "origin_country_code",
        CAST("origin_province_code" AS VARCHAR) AS "origin_province_code",
        CAST("requires_shipping" AS BOOLEAN) AS "requires_shipping",
        CAST("sku" AS VARCHAR) AS "sku"
    FROM "shopify_inventory_item_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_inventory_item_data_projected_renamed_casted"

stg_shopify_inventory_item_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_inventory_item_data
  description: The table is about Shopify inventory items. It includes fields for
    cost, creation date, shipping requirements, SKU, tracking status, update date,
    and origin location. The is_deleted column (renamed from "_fivetran_deleted")
    marks every sample row as a deleted item. Without non-deleted rows, it's difficult
    to provide more specific details about the data typically stored.
  columns:
  - name: item_id
    description: Unique identifier for the inventory item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each inventory item.
        For this table, each row is for a distinct inventory item. item_id is likely
        to be unique across rows, as it's designed to be a primary identifier for
        each item.
  - name: cost
    description: Price or value of the inventory item
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: is_deleted
    description: Indicates if the item has been deleted
    tests:
    - not_null
  - name: creation_date
    description: Date and time when the item was added
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: is_tracked
    description: Indicates if inventory is tracked for this item
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: last_updated_date
    description: Date and time of last update to the item
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: origin_country_code
    description: Country where the item originates from
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: origin_province_code
    description: Province or state where the item originates from
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: requires_shipping
    description: Indicates if the item needs to be shipped
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items
  - name: sku
    description: Stock Keeping Unit, unique product identifier
    cocoon_meta:
      missing_acceptable: Not applicable for deleted items

stg_shopify_fulfillment_data (first 100 rows)

all_tracking_numbers fulfillment_name fulfillment_service fulfillment_status created_at fulfillment_id location_id order_id tracking_urls updated_at
0 None #151212.1 manual success 2019-07-13 01:17:22 423844 123548 1228100 [] 2019-07-13 01:17:22
1 None #152317.1 manual success 2019-07-13 01:17:21 8308 548 1274564 [] 2019-07-13 01:17:22
2 None #1555923.1 manual success 2019-07-13 01:17:21 548932 12348 1284 [] 2019-07-13 01:17:21

stg_shopify_fulfillment_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_fulfillment_data_projected" AS (
    -- Projection: Selecting 14 out of 15 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "location_id",
        "order_id",
        "status",
        "tracking_company",
        "tracking_number",
        "updated_at",
        "tracking_numbers",
        "tracking_urls",
        "shipment_status",
        "service",
        "name",
        "receipt_authorization"
    FROM "shopify_fulfillment_data"
),

"shopify_fulfillment_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> fulfillment_id
    -- status -> fulfillment_status
    -- tracking_number -> primary_tracking_number
    -- tracking_numbers -> all_tracking_numbers
    -- service -> fulfillment_service
    -- name -> fulfillment_name
    SELECT 
        "id" AS "fulfillment_id",
        "created_at",
        "location_id",
        "order_id",
        "status" AS "fulfillment_status",
        "tracking_company",
        "tracking_number" AS "primary_tracking_number",
        "updated_at",
        "tracking_numbers" AS "all_tracking_numbers",
        "tracking_urls",
        "shipment_status",
        "service" AS "fulfillment_service",
        "name" AS "fulfillment_name",
        "receipt_authorization"
    FROM "shopify_fulfillment_data_projected"
),

"shopify_fulfillment_data_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- all_tracking_numbers: ['[]']
    SELECT 
        CASE
            WHEN "all_tracking_numbers" = '[]' THEN NULL
            ELSE "all_tracking_numbers"
        END AS "all_tracking_numbers",
        "receipt_authorization",
        "fulfillment_name",
        "fulfillment_service",
        "created_at",
        "primary_tracking_number",
        "order_id",
        "updated_at",
        "tracking_company",
        "tracking_urls",
        "fulfillment_id",
        "location_id",
        "fulfillment_status",
        "shipment_status"
    FROM "shopify_fulfillment_data_projected_renamed"
),

"shopify_fulfillment_data_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- fulfillment_id: from INT to VARCHAR
    -- location_id: from INT to VARCHAR
    -- order_id: from INT to VARCHAR
    -- primary_tracking_number: from DECIMAL to VARCHAR
    -- receipt_authorization: from DECIMAL to VARCHAR
    -- shipment_status: from DECIMAL to VARCHAR
    -- tracking_company: from DECIMAL to VARCHAR
    -- tracking_urls: from VARCHAR to JSON
    -- updated_at: from VARCHAR to TIMESTAMP
    SELECT
        "all_tracking_numbers",
        "fulfillment_name",
        "fulfillment_service",
        "fulfillment_status",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("fulfillment_id" AS VARCHAR) AS "fulfillment_id",
        CAST("location_id" AS VARCHAR) AS "location_id",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("primary_tracking_number" AS VARCHAR) AS "primary_tracking_number",
        CAST("receipt_authorization" AS VARCHAR) AS "receipt_authorization",
        CAST("shipment_status" AS VARCHAR) AS "shipment_status",
        CAST("tracking_company" AS VARCHAR) AS "tracking_company",
        CAST("tracking_urls" AS JSON) AS "tracking_urls",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at"
    FROM "shopify_fulfillment_data_projected_renamed_null"
),

"shopify_fulfillment_data_projected_renamed_null_casted_missing_handled" AS (
    -- Handling missing values: There are 4 columns with unacceptable missing values
    -- primary_tracking_number has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- receipt_authorization has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipment_status has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tracking_company has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "all_tracking_numbers",
        "fulfillment_name",
        "fulfillment_service",
        "fulfillment_status",
        "created_at",
        "fulfillment_id",
        "location_id",
        "order_id",
        "tracking_urls",
        "updated_at"
    FROM "shopify_fulfillment_data_projected_renamed_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_fulfillment_data_projected_renamed_null_casted_missing_handled"
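
The NULL imputation step in the model above replaces the disguised missing value '[]' in all_tracking_numbers with a real NULL. A minimal sketch of the same CASE expression, run here against an in-memory SQLite table instead of the actual warehouse: the table name fulfillment_demo and the tracking number "TRACK-1" are made up for illustration, while the fulfillment IDs come from the preview rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE fulfillment_demo ("fulfillment_id" TEXT, "all_tracking_numbers" TEXT)'
)
conn.executemany(
    "INSERT INTO fulfillment_demo VALUES (?, ?)",
    [
        ("423844", "[]"),           # disguised missing value
        ("8308", '["TRACK-1"]'),    # hypothetical non-empty value
    ],
)

# Same CASE expression as the "_null" step of the model.
imputed = conn.execute(
    """
    SELECT
        "fulfillment_id",
        CASE
            WHEN "all_tracking_numbers" = '[]' THEN NULL
            ELSE "all_tracking_numbers"
        END AS "all_tracking_numbers"
    FROM fulfillment_demo
    ORDER BY rowid
    """
).fetchall()

print(imputed)
```

The empty-array string becomes None on the Python side, and any other value passes through unchanged. Note that the model applies this rule only to all_tracking_numbers; tracking_urls keeps its '[]' values.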

stg_shopify_fulfillment_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_fulfillment_data
  description: The table is about Shopify order fulfillments. It contains details
    like fulfillment ID, creation date, location ID, order ID, status, tracking information,
    shipping method, and fulfillment name. Each row represents a single fulfillment
    record. The table tracks the shipping and delivery status of orders processed
    through Shopify's platform.
  columns:
  - name: all_tracking_numbers
    description: Array of all tracking numbers
    cocoon_meta:
      missing_acceptable: Manual fulfillment may not require tracking numbers.
  - name: fulfillment_name
    description: Fulfillment name or identifier
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the name or identifier for a fulfillment.
        For this table, each row represents a single fulfillment record. The fulfillment_name
        appears to be unique across rows, as it includes an order number and a suffix
        (e.g., "#151212.1").
  - name: fulfillment_service
    description: Fulfillment service used
    tests:
    - not_null
    - accepted_values:
        values:
        - manual
        - amazon
        - shopify
        - fedex
        - ups
        - dhl
        - usps
        - third_party
        - dropshipping
        - in_house
        - outsourced
  - name: fulfillment_status
    description: Status of the fulfillment process
    tests:
    - not_null
    - accepted_values:
        values:
        - success
        - pending
        - processing
        - failed
        - cancelled
        - partial
        - completed
  - name: created_at
    description: Timestamp when the fulfillment was created
    tests:
    - not_null
  - name: fulfillment_id
    description: Unique identifier for the fulfillment
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the fulfillment.
        For this table, each row represents a single fulfillment record. By definition,
        a unique identifier should be unique across all rows.
  - name: location_id
    description: Identifier for the fulfillment location
    tests:
    - not_null
  - name: order_id
    description: Identifier for the associated order
    tests:
    - not_null
  - name: tracking_urls
    description: Array of tracking URLs
    tests:
    - not_null
  - name: updated_at
    description: Timestamp of the last update
    tests:
    - not_null

stg_shopify_fulfillment_event_data (first 100 rows)

shipping_city shipping_zip_code shipping_latitude shipping_longitude event_message shipping_province fulfillment_status is_deleted shipping_country_code estimated_delivery_at event_created_at event_id event_occurred_at event_updated_at fulfillment_id order_id shop_id
0 None None NaN NaN None None delivered False None NaT 2022-08-29 20:52:39 451435 2022-08-29 20:52:39 2022-08-29 20:52:39 40495 4502987 89440612
1 LONDON None 101.349998 -14.033300 Delay None out_for_delivery False GB NaT 2022-09-13 08:07:57 48779 2022-08-15 12:41:00 2022-09-13 08:07:57 4064737 4588203 320612
2 ECHO PARK 02759 -3.797699 190.783958 Delay None delayed False AU 2022-09-14 08:00:00 2022-09-14 14:16:52 1481515 2022-09-14 01:26:00 2022-09-14 14:16:52 4019339 451915 89320612
3 None 01505 22.337700 -71.731003 Delay MA in_transit False US NaT 2022-08-13 12:40:26 558955 2022-03-01 10:36:39 2022-08-13 12:40:26 402947 429188587 89420612
4 LOS ANGELES 01760 12.287498 -21.357399 Delay MA in_transit False US 2022-08-24 23:59:59 2022-08-24 06:29:21 6904235 2022-08-24 05:30:57 2022-08-24 06:29:21 4060491 4242667 89420612

stg_shopify_fulfillment_event_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_fulfillment_event_data_projected" AS (
    -- Projection: Selecting 18 out of 19 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "address_1",
        "city",
        "country",
        "created_at",
        "estimated_delivery_at",
        "fulfillment_id",
        "happened_at",
        "latitude",
        "longitude",
        "message",
        "order_id",
        "province",
        "shop_id",
        "status",
        "updated_at",
        "zip",
        "_fivetran_deleted"
    FROM "shopify_fulfillment_event_data"
),

"shopify_fulfillment_event_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> event_id
    -- address_1 -> shipping_address_line1
    -- city -> shipping_city
    -- country -> shipping_country_code
    -- created_at -> event_created_at
    -- happened_at -> event_occurred_at
    -- latitude -> shipping_latitude
    -- longitude -> shipping_longitude
    -- message -> event_message
    -- province -> shipping_province
    -- status -> fulfillment_status
    -- updated_at -> event_updated_at
    -- zip -> shipping_zip_code
    -- _fivetran_deleted -> is_deleted
    SELECT 
        "id" AS "event_id",
        "address_1" AS "shipping_address_line1",
        "city" AS "shipping_city",
        "country" AS "shipping_country_code",
        "created_at" AS "event_created_at",
        "estimated_delivery_at",
        "fulfillment_id",
        "happened_at" AS "event_occurred_at",
        "latitude" AS "shipping_latitude",
        "longitude" AS "shipping_longitude",
        "message" AS "event_message",
        "order_id",
        "province" AS "shipping_province",
        "shop_id",
        "status" AS "fulfillment_status",
        "updated_at" AS "event_updated_at",
        "zip" AS "shipping_zip_code",
        "_fivetran_deleted" AS "is_deleted"
    FROM "shopify_fulfillment_event_data_projected"
),

"shopify_fulfillment_event_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- shipping_city: The problem is inconsistency in city naming conventions and potentially incorrect data. 'LA' is an abbreviation for 'Los Angeles' and should be written in full to match the format of other cities. 'LAZYTOWN' appears to be a fictional place and is likely an error or placeholder. The correct values should be full city names, consistent with the format used for 'LONDON' and 'ECHO PARK'. 
    -- shipping_zip_code: The problem is that the zip code '2759' is missing a leading zero, which is required for standard 5-digit US zip codes. 'CR0' is not a valid US zip code format at all, suggesting it might be an international postal code or an error. The correct values for US zip codes should be 5-digit numbers, starting with a leading zero for codes less than 10000. 
    SELECT
        "event_id",
        "shipping_address_line1",
        CASE
            WHEN "shipping_city" = 'LA' THEN 'LOS ANGELES'
            WHEN "shipping_city" = 'LAZYTOWN' THEN ''
            ELSE "shipping_city"
        END AS "shipping_city",
        "shipping_country_code",
        "event_created_at",
        "estimated_delivery_at",
        "fulfillment_id",
        "event_occurred_at",
        "shipping_latitude",
        "shipping_longitude",
        "event_message",
        "order_id",
        "shipping_province",
        "shop_id",
        "fulfillment_status",
        "event_updated_at",
        CASE
            WHEN "shipping_zip_code" = '2759' THEN '02759'
            WHEN "shipping_zip_code" = 'CR0' THEN ''
            ELSE "shipping_zip_code"
        END AS "shipping_zip_code",
        "is_deleted"
    FROM "shopify_fulfillment_event_data_projected_renamed"
),

"shopify_fulfillment_event_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- shipping_city: ['']
    -- shipping_zip_code: ['']
    SELECT 
        CASE
            WHEN "shipping_city" = '' THEN NULL
            ELSE "shipping_city"
        END AS "shipping_city",
        CASE
            WHEN "shipping_zip_code" = '' THEN NULL
            ELSE "shipping_zip_code"
        END AS "shipping_zip_code",
        "estimated_delivery_at",
        "event_occurred_at",
        "shipping_address_line1",
        "event_id",
        "shipping_latitude",
        "shipping_longitude",
        "event_message",
        "shipping_province",
        "order_id",
        "shop_id",
        "event_created_at",
        "fulfillment_id",
        "fulfillment_status",
        "is_deleted",
        "event_updated_at",
        "shipping_country_code"
    FROM "shopify_fulfillment_event_data_projected_renamed_cleaned"
),

"shopify_fulfillment_event_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- estimated_delivery_at: from VARCHAR to TIMESTAMP
    -- event_created_at: from VARCHAR to TIMESTAMP
    -- event_id: from INT to VARCHAR
    -- event_occurred_at: from VARCHAR to TIMESTAMP
    -- event_updated_at: from VARCHAR to TIMESTAMP
    -- fulfillment_id: from INT to VARCHAR
    -- order_id: from INT to VARCHAR
    -- shipping_address_line1: from DECIMAL to VARCHAR
    -- shop_id: from INT to VARCHAR
    SELECT
        "shipping_city",
        "shipping_zip_code",
        "shipping_latitude",
        "shipping_longitude",
        "event_message",
        "shipping_province",
        "fulfillment_status",
        "is_deleted",
        "shipping_country_code",
        CAST("estimated_delivery_at" AS TIMESTAMP) AS "estimated_delivery_at",
        CAST("event_created_at" AS TIMESTAMP) AS "event_created_at",
        CAST("event_id" AS VARCHAR) AS "event_id",
        CAST("event_occurred_at" AS TIMESTAMP) AS "event_occurred_at",
        CAST("event_updated_at" AS TIMESTAMP) AS "event_updated_at",
        CAST("fulfillment_id" AS VARCHAR) AS "fulfillment_id",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("shipping_address_line1" AS VARCHAR) AS "shipping_address_line1",
        CAST("shop_id" AS VARCHAR) AS "shop_id"
    FROM "shopify_fulfillment_event_data_projected_renamed_cleaned_null"
),

"shopify_fulfillment_event_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 8 columns with unacceptable missing values
    -- event_message has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_address_line1 has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_city has 40.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_country_code has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_latitude has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_longitude has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_province has 60.0 percent missing. Strategy: 🔄 Unchanged
    -- shipping_zip_code has 40.0 percent missing. Strategy: 🔄 Unchanged
    SELECT
        "shipping_city",
        "shipping_zip_code",
        "shipping_latitude",
        "shipping_longitude",
        "event_message",
        "shipping_province",
        "fulfillment_status",
        "is_deleted",
        "shipping_country_code",
        "estimated_delivery_at",
        "event_created_at",
        "event_id",
        "event_occurred_at",
        "event_updated_at",
        "fulfillment_id",
        "order_id",
        "shop_id"
    FROM "shopify_fulfillment_event_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_fulfillment_event_data_projected_renamed_cleaned_null_casted_missing_handled"
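
The cleaning step above hard-codes the single value '2759' seen in the sample. A more general rule could pad any all-digit ZIP code shorter than five characters, restricted to rows whose country code is US. The following is a minimal sketch under stated assumptions, not part of the generated model: it runs against an in-memory SQLite table as a stand-in for the warehouse, the table name "events" and the helper pad_us_zip are hypothetical, and the SQLite functions used (substr, GLOB) may need a different spelling in the target dialect.

```python
import sqlite3

# Hypothetical rule: left-pad all-digit US ZIP codes to five characters.
# Non-US rows, NULLs, and non-numeric codes pass through unchanged.
PAD_US_ZIP_SQL = """
SELECT
    "event_id",
    CASE
        WHEN "shipping_country_code" = 'US'
             AND "shipping_zip_code" <> ''
             AND "shipping_zip_code" NOT GLOB '*[^0-9]*'
             AND length("shipping_zip_code") < 5
        THEN substr('00000' || "shipping_zip_code", -5)
        ELSE "shipping_zip_code"
    END AS "shipping_zip_code"
FROM "events"
ORDER BY "event_id"
"""


def pad_us_zip(rows):
    """Load (event_id, country_code, zip) rows into SQLite and apply the rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "events" ('
        '"event_id" TEXT, "shipping_country_code" TEXT, "shipping_zip_code" TEXT)'
    )
    conn.executemany('INSERT INTO "events" VALUES (?, ?, ?)', rows)
    result = conn.execute(PAD_US_ZIP_SQL).fetchall()
    conn.close()
    return result
```

Restricting the rule by country matters because some countries use postal codes shorter than five digits; Australian postcodes, for example, are four digits long. In the preview above, the row with ZIP code 02759 carries country code AU, so a country-aware rule would leave its original value untouched.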

stg_shopify_fulfillment_event_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_fulfillment_event_data
  description: Shopify fulfillment events for order shipments. Each row represents
    one event in the fulfillment process and carries the order ID, fulfillment ID,
    shop ID, shipping destination (city, province, ZIP code, country code, coordinates),
    status, and event timestamps. Statuses track delivery stages such as in_transit,
    out_for_delivery, and delivered; the event_message column records notes such as
    delays.
  columns:
  - name: shipping_city
    description: City of the shipping destination
    tests:
    - not_null
  - name: shipping_zip_code
    description: Postal or ZIP code of the shipping destination
    tests:
    - not_null
  - name: shipping_latitude
    description: Latitude coordinate of the shipping destination
    tests:
    - not_null
  - name: shipping_longitude
    description: Longitude coordinate of the shipping destination
    tests:
    - not_null
  - name: event_message
    description: Additional information or notes about the event
    tests:
    - not_null
    - accepted_values:
        values:
        - Delay
        - Cancellation
        - On Time
        - Early
        - Rescheduled
        - Postponed
        - Extended
        - Shortened
        - Moved
        - Merged
        - Split
        - Modified
        - Completed
        - In Progress
        - Not Started
        - Suspended
        - Resumed
  - name: shipping_province
    description: Province or state of the shipping destination
    tests:
    - not_null
  - name: fulfillment_status
    description: Current status of the fulfillment
    tests:
    - not_null
    - accepted_values:
        values:
        - pending
        - processing
        - in_transit
        - delayed
        - out_for_delivery
        - delivered
        - cancelled
        - returned
  - name: is_deleted
    description: Indicates if the record has been deleted
    tests:
    - not_null
  - name: shipping_country_code
    description: Country code of the shipping destination
    tests:
    - not_null
  - name: estimated_delivery_at
    description: Estimated delivery date and time
    cocoon_meta:
      missing_acceptable: Not applicable for already delivered or in-transit items.
  - name: event_created_at
    description: Timestamp when the event was created
    tests:
    - not_null
  - name: event_id
    description: Unique identifier for the event
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each event in the
        fulfillment process. For this table, each row is a distinct event, and event_id
        appears to be unique across rows.
  - name: event_occurred_at
    description: Timestamp when the event occurred
    tests:
    - not_null
  - name: event_updated_at
    description: Timestamp when the event was last updated
    tests:
    - not_null
  - name: fulfillment_id
    description: Identifier of the fulfillment this event belongs to
    tests:
    - not_null
  - name: order_id
    description: Identifier of the order associated with the fulfillment
    tests:
    - not_null
  - name: shop_id
    description: Identifier of the shop that owns the order
    tests:
    - not_null
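
The missing-value step in the SQL reports a NULL percentage per column (for example, 40.0 percent for shipping_city). A minimal sketch of how such a rate could be computed is shown below. It is an illustration under assumptions rather than Cocoon's own method: it uses in-memory SQLite as a stand-in for the warehouse, and the table name "events" and the helper null_rate are hypothetical.

```python
import sqlite3

# Subset of columns used for illustration only.
COLUMNS = ("shipping_city", "shipping_zip_code")


def null_rate(rows, column):
    """Percentage of NULL values in `column` (hypothetical helper)."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "events" ("shipping_city" TEXT, "shipping_zip_code" TEXT)'
    )
    conn.executemany('INSERT INTO "events" VALUES (?, ?)', rows)
    query = (
        f'SELECT 100.0 * SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) '
        '/ COUNT(*) FROM "events"'
    )
    (rate,) = conn.execute(query).fetchone()
    conn.close()
    return rate


# The five preview rows, reduced to the two columns above.
PREVIEW_ROWS = [
    (None, None),
    ("LONDON", None),
    ("ECHO PARK", "02759"),
    (None, "01505"),
    ("LOS ANGELES", "01760"),
]
```

For the preview rows, null_rate returns 40.0 for both columns, in line with the rates reported in the missing-value step. A column with a non-zero rate will not pass a not_null test until the gaps are resolved upstream.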

stg_shopify_product_data (first 100 rows)

product_title product_handle product_type vendor_id visibility_scope is_deleted created_at product_id published_at updated_at
0 1fccbdc6ac5f6edabf76e56eb0460019 f4b6d0e4413a19b2e7a291f0ef4dc98f fdb42fcb90ecd31c015932ffcd313014 13aea892c8de2d62f2608c6191cfab1f web False 2020-02-14 19:18:05 4506451050593 2020-02-14 19:02:02 2020-09-10 18:16:42
1 327ea22d0f91783418e519cb45a4a3e9 129181bbc087330e216a6a4d7939f00b ec3bb3dd6e9d1f348a040ee7b45f1a72 13aea892c8de2d62f2608c6191cfab1f web False 2020-03-04 05:04:32 4526236893281 2020-03-04 05:04:32 2020-09-10 15:06:03
2 c6c6fea8419b94103b0b05d64a5bab10 f0a656254aca08bf40181226ac13418c fdb42fcb90ecd31c015932ffcd313014 57403999f78b01b3fd325ba256eafe94 global False 2020-02-14 02:09:59 4505775439969 2020-02-14 02:09:59 2020-09-11 21:21:21

stg_shopify_product_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_product_data_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "title",
        "handle",
        "product_type",
        "vendor",
        "created_at",
        "updated_at",
        "published_at",
        "published_scope",
        "_fivetran_deleted"
    FROM "shopify_product_data"
),

"shopify_product_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> product_id
    -- title -> product_title
    -- handle -> product_handle
    -- vendor -> vendor_id
    -- published_scope -> visibility_scope
    -- _fivetran_deleted -> is_deleted
    SELECT 
        "id" AS "product_id",
        "title" AS "product_title",
        "handle" AS "product_handle",
        "product_type",
        "vendor" AS "vendor_id",
        "created_at",
        "updated_at",
        "published_at",
        "published_scope" AS "visibility_scope",
        "_fivetran_deleted" AS "is_deleted"
    FROM "shopify_product_data_projected"
),

"shopify_product_data_projected_renamed_trimmed" AS (
    -- Trim Leading and Trailing Spaces
    SELECT
        "product_id",
        "product_title",
        "product_handle",
        "product_type",
        "vendor_id",
        "visibility_scope",
        "is_deleted",
        TRIM("created_at") AS "created_at",
        TRIM("updated_at") AS "updated_at",
        TRIM("published_at") AS "published_at"
    FROM "shopify_product_data_projected_renamed"
),

"shopify_product_data_projected_renamed_trimmed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- product_id: from INT to VARCHAR
    -- published_at: from VARCHAR to TIMESTAMP
    -- updated_at: from VARCHAR to TIMESTAMP
    SELECT
        "product_title",
        "product_handle",
        "product_type",
        "vendor_id",
        "visibility_scope",
        "is_deleted",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("product_id" AS VARCHAR) AS "product_id",
        CAST("published_at" AS TIMESTAMP) AS "published_at",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at"
    FROM "shopify_product_data_projected_renamed_trimmed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_product_data_projected_renamed_trimmed_casted"

stg_shopify_product_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_product_data
  description: Shopify product data. Each row represents one product and carries its
    ID, title, handle, type, vendor, visibility scope, deletion flag, and creation,
    update, and publication timestamps. The table tracks product information and
    lifecycle on the Shopify platform.
  columns:
  - name: product_title
    description: Name or title of the product
    tests:
    - not_null
  - name: product_handle
    description: Unique URL-friendly string for the product
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique URL-friendly string for the product.
        For this table, each row is for a unique product. The product handle is typically
        generated to be unique for each product in Shopify, making it a good candidate
        for a key.
  - name: product_type
    description: Category or type of the product
    tests:
    - not_null
  - name: vendor_id
    description: Identifier for the product's vendor
    tests:
    - not_null
  - name: visibility_scope
    description: Visibility scope of the product (web/global)
    tests:
    - not_null
    - accepted_values:
        values:
        - web
        - global
  - name: is_deleted
    description: Indicates if the product has been deleted
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the product was created
    tests:
    - not_null
  - name: product_id
    description: Unique identifier for the product
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for the product. For
        this table, each row is for a unique product. Product IDs are designed to
        be unique across all products in a Shopify store, making it an ideal candidate
        key.
  - name: published_at
    description: Timestamp when the product was published
    tests:
    - not_null
  - name: updated_at
    description: Timestamp when the product was last updated
    tests:
    - not_null
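
The unique tests on product_id and product_handle assert that no non-NULL value appears more than once. A minimal sketch of an equivalent duplicate check follows, assuming in-memory SQLite as a stand-in for the warehouse; the table name "products", the column "key_value", and the helper duplicate_values are hypothetical.

```python
import sqlite3


def duplicate_values(values):
    """Return (value, count) pairs for non-NULL values that occur more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "products" ("key_value" TEXT)')
    conn.executemany(
        'INSERT INTO "products" VALUES (?)', [(v,) for v in values]
    )
    dupes = conn.execute(
        'SELECT "key_value", COUNT(*) AS "n_records" '
        'FROM "products" '
        'WHERE "key_value" IS NOT NULL '
        'GROUP BY "key_value" '
        'HAVING COUNT(*) > 1 '
        'ORDER BY "key_value"'
    ).fetchall()
    conn.close()
    return dupes


# product_id values from the three preview rows.
PREVIEW_PRODUCT_IDS = ["4506451050593", "4526236893281", "4505775439969"]
```

An empty result means the column is a valid candidate key for the rows checked. The three preview rows cannot establish uniqueness for the full table, so the check would need to run against the complete model.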

stg_shopify_order_data (first 100 rows)

shipping_company shipping_address_line2 billing_full_name billing_first_name billing_last_name billing_company billing_phone billing_address_line1 billing_address_line2 billing_city billing_country billing_country_code billing_province billing_zip order_source referring_site payment_status order_number order_identifier order_token order_notes total_discounts subtotal_price landing_page_url total_line_items_price customer_ip checkout_token customer_email currency order_total taxes_included is_test_order shipping_address_line1 shipping_status processing_method cart_token marketing_consent alt_order_number billing_latitude billing_longitude billing_province_code cancel_reason cancelled_at closed_at created_at customer_id last_updated order_id order_tax order_weight processed_at
0 None None None None None None None None None None None None None None None None paid 4135 d1743fc58a1e4d78769eaac49994a994 0f9c2880de17f71511eee5542c29b999 71509c29301d2cc14e37ecb53f735608 2.8 2.8 None 5.6 None None 021cb20b5c78751fc7ddc091b6b69b3e GBP 2.80 True False d6f4a399883df85d9d4b3a02bf6e738a None None None True 5135 NaN NaN None None NaT NaT 2020-09-11 19:35:42 3589760876641 2020-09-11 19:35:46 2674098602081 0.0 0.0 2020-09-11 19:35:42
1 None None None None None None None None None None None None None None web 2cc983716a820bc713b793a6e8e73f42 paid 4066 4fcb884b5b46413bae526a6e7e49d706 fb489b3ccc0ae36ce47744d7595e9746 None 0.0 2.8 8584e97b29b0802fb393fa453a8b6a7a 2.8 109.249.185.68 7bdb994e1196de3e4f34586e357613f9 dce90c7b4e52e045e5975836aff49cf1 GBP 3.79 True False 1ff1de774005f8da13f42943881c655f fulfilled direct b1ff04883dfeab658cd5211050476729 False 5066 NaN NaN None None NaT 2020-09-10 15:38:26 2020-09-09 23:01:54 3584045351009 2020-09-10 15:38:26 2669516488801 0.0 0.0 2020-09-09 23:01:53
2 None None None None None None None None None None None None None None web 2cc983716a820bc713b793a6e8e73f42 paid 4065 9e346f2e912c60e16679f4a4c8d29422 e44b7f04610a8f4032530cc7f12663de None 0.0 4.4 8584e97b29b0802fb393fa453a8b6a7a 4.4 109.249.185.68 cf0a9fe2c7c606b86559007dbb890a62 dce90c7b4e52e045e5975836aff49cf1 GBP 5.39 True False 1ff1de774005f8da13f42943881c655f fulfilled direct 9600543f4d4613db59ac58a1009ecbb9 False 5065 NaN NaN None None NaT 2020-09-10 15:38:25 2020-09-09 22:57:51 3584045351009 2020-09-10 15:38:25 2669509541985 0.0 0.0 2020-09-09 22:57:50

stg_shopify_order_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_data_projected" AS (
    -- Projection: Selecting 65 out of 66 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "note",
        "email",
        "taxes_included",
        "currency",
        "subtotal_price",
        "total_tax",
        "total_price",
        "created_at",
        "updated_at",
        "name",
        "shipping_address_name",
        "shipping_address_first_name",
        "shipping_address_last_name",
        "shipping_address_company",
        "shipping_address_phone",
        "shipping_address_address_1",
        "shipping_address_address_2",
        "shipping_address_city",
        "shipping_address_country",
        "shipping_address_country_code",
        "shipping_address_province",
        "shipping_address_province_code",
        "shipping_address_zip",
        "shipping_address_latitude",
        "shipping_address_longitude",
        "billing_address_name",
        "billing_address_first_name",
        "billing_address_last_name",
        "billing_address_company",
        "billing_address_phone",
        "billing_address_address_1",
        "billing_address_address_2",
        "billing_address_city",
        "billing_address_country",
        "billing_address_country_code",
        "billing_address_province",
        "billing_address_province_code",
        "billing_address_zip",
        "billing_address_latitude",
        "billing_address_longitude",
        "customer_id",
        "location_id",
        "user_id",
        "number",
        "order_number",
        "financial_status",
        "fulfillment_status",
        "processed_at",
        "processing_method",
        "referring_site",
        "cancel_reason",
        "cancelled_at",
        "closed_at",
        "total_discounts",
        "total_line_items_price",
        "total_weight",
        "source_name",
        "browser_ip",
        "buyer_accepts_marketing",
        "token",
        "cart_token",
        "checkout_token",
        "test",
        "landing_site_base_url"
    FROM "shopify_order_data"
),

"shopify_order_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> order_id
    -- note -> order_notes
    -- email -> customer_email
    -- total_tax -> order_tax
    -- total_price -> order_total
    -- updated_at -> last_updated
    -- name -> order_identifier
    -- shipping_address_name -> shipping_full_name
    -- shipping_address_first_name -> shipping_first_name
    -- shipping_address_last_name -> shipping_last_name
    -- shipping_address_company -> shipping_company
    -- shipping_address_phone -> shipping_phone
    -- shipping_address_address_1 -> shipping_address_line1
    -- shipping_address_address_2 -> shipping_address_line2
    -- shipping_address_city -> shipping_city
    -- shipping_address_country -> shipping_country
    -- shipping_address_country_code -> shipping_country_code
    -- shipping_address_province -> shipping_province
    -- shipping_address_province_code -> shipping_province_code
    -- shipping_address_zip -> shipping_zip
    -- shipping_address_latitude -> shipping_latitude
    -- shipping_address_longitude -> shipping_longitude
    -- billing_address_name -> billing_full_name
    -- billing_address_first_name -> billing_first_name
    -- billing_address_last_name -> billing_last_name
    -- billing_address_company -> billing_company
    -- billing_address_phone -> billing_phone
    -- billing_address_address_1 -> billing_address_line1
    -- billing_address_address_2 -> billing_address_line2
    -- billing_address_city -> billing_city
    -- billing_address_country -> billing_country
    -- billing_address_country_code -> billing_country_code
    -- billing_address_province -> billing_province
    -- billing_address_province_code -> billing_province_code
    -- billing_address_zip -> billing_zip
    -- billing_address_latitude -> billing_latitude
    -- billing_address_longitude -> billing_longitude
    -- location_id -> store_location_id
    -- number -> order_number
    -- order_number -> alt_order_number
    -- financial_status -> payment_status
    -- fulfillment_status -> shipping_status
    -- total_weight -> order_weight
    -- source_name -> order_source
    -- browser_ip -> customer_ip
    -- buyer_accepts_marketing -> marketing_consent
    -- token -> order_token
    -- test -> is_test_order
    -- landing_site_base_url -> landing_page_url
    SELECT 
        "id" AS "order_id",
        "note" AS "order_notes",
        "email" AS "customer_email",
        "taxes_included",
        "currency",
        "subtotal_price",
        "total_tax" AS "order_tax",
        "total_price" AS "order_total",
        "created_at",
        "updated_at" AS "last_updated",
        "name" AS "order_identifier",
        "shipping_address_name" AS "shipping_full_name",
        "shipping_address_first_name" AS "shipping_first_name",
        "shipping_address_last_name" AS "shipping_last_name",
        "shipping_address_company" AS "shipping_company",
        "shipping_address_phone" AS "shipping_phone",
        "shipping_address_address_1" AS "shipping_address_line1",
        "shipping_address_address_2" AS "shipping_address_line2",
        "shipping_address_city" AS "shipping_city",
        "shipping_address_country" AS "shipping_country",
        "shipping_address_country_code" AS "shipping_country_code",
        "shipping_address_province" AS "shipping_province",
        "shipping_address_province_code" AS "shipping_province_code",
        "shipping_address_zip" AS "shipping_zip",
        "shipping_address_latitude" AS "shipping_latitude",
        "shipping_address_longitude" AS "shipping_longitude",
        "billing_address_name" AS "billing_full_name",
        "billing_address_first_name" AS "billing_first_name",
        "billing_address_last_name" AS "billing_last_name",
        "billing_address_company" AS "billing_company",
        "billing_address_phone" AS "billing_phone",
        "billing_address_address_1" AS "billing_address_line1",
        "billing_address_address_2" AS "billing_address_line2",
        "billing_address_city" AS "billing_city",
        "billing_address_country" AS "billing_country",
        "billing_address_country_code" AS "billing_country_code",
        "billing_address_province" AS "billing_province",
        "billing_address_province_code" AS "billing_province_code",
        "billing_address_zip" AS "billing_zip",
        "billing_address_latitude" AS "billing_latitude",
        "billing_address_longitude" AS "billing_longitude",
        "customer_id",
        "location_id" AS "store_location_id",
        "user_id",
        "number" AS "order_number",
        "order_number" AS "alt_order_number",
        "financial_status" AS "payment_status",
        "fulfillment_status" AS "shipping_status",
        "processed_at",
        "processing_method",
        "referring_site",
        "cancel_reason",
        "cancelled_at",
        "closed_at",
        "total_discounts",
        "total_line_items_price",
        "total_weight" AS "order_weight",
        "source_name" AS "order_source",
        "browser_ip" AS "customer_ip",
        "buyer_accepts_marketing" AS "marketing_consent",
        "token" AS "order_token",
        "cart_token",
        "checkout_token",
        "test" AS "is_test_order",
        "landing_site_base_url" AS "landing_page_url"
    FROM "shopify_order_data_projected"
),

"shopify_order_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- shipping_full_name: The problem is that the shipping_full_name column contains hashed or encrypted strings instead of human-readable full names. These values are meaningless for practical use and do not represent actual customer names. The correct values should be the decrypted or unhashed full names, but without access to the decryption key or original data, it's impossible to recover the real names. In this case, the best approach is to map these encrypted values to empty strings to indicate that the real names are unknown or unavailable. 
    -- shipping_first_name: The problem is that the shipping_first_name column contains hashed or encrypted values instead of readable first names. These values are not meaningful or usable as actual names. The correct values should be decrypted first names, but without access to the decryption key, we cannot recover the original names. 
    -- shipping_last_name: The problem is that both values in the shipping_last_name column appear to be hashed or encrypted strings instead of actual last names. These values are likely placeholders or the result of data anonymization, and do not represent real last names. The correct values should be actual last names, but since we don't have access to the original data, we cannot map these to real names. 
    -- shipping_company: The problem is that the shipping_company column contains an MD5 hash value instead of an actual shipping company name. MD5 hash 'd41d8cd98f00b204e9800998ecf8427e' is known to be the hash of an empty string, which suggests that this column was likely left empty and then hashed, possibly as a placeholder or due to a data processing error. The correct values should be actual shipping company names, but since we don't have that information, the best approach is to map this to an empty string to indicate missing data. 
    -- shipping_phone: The problem is that the shipping_phone column contains an MD5 hash value instead of an actual phone number. MD5 hashes are 32-character hexadecimal strings, which is what we see here. This value 'd41d8cd98f00b204e9800998ecf8427e' is actually the MD5 hash of an empty string. It's likely that this hash was used as a placeholder or default value when no phone number was provided. The correct value for a missing phone number should be an empty string or null value, not an MD5 hash. 
    -- shipping_address_line2: The problem is that the shipping_address_line2 column contains hexadecimal strings instead of typical address information. These values appear to be some kind of hashed or encrypted data rather than actual address details. Since we don't have the means to decrypt these values and they don't represent valid address information, the correct approach is to map them to empty strings. 
    -- shipping_city: The problem is that the shipping_city column contains hashed or encrypted strings instead of readable city names. These values are meaningless for analysis or human interpretation. Since we don't have a way to decrypt these hashes back to the original city names, and we don't have any additional information about what cities they might represent, the correct approach is to map these to empty strings. 
    -- shipping_country: The problem is that the shipping_country column contains an encoded or hashed value instead of a proper country name. This value '89f9c9f489be2a83cf57e53b9197d288' appears to be a 32-character hexadecimal string, which is likely the result of a hashing algorithm (possibly MD5). This is unusual because we expect country names to be human-readable text. The correct values should be actual country names, but without additional information or a way to decode this hash, we cannot determine the intended country.  
    -- shipping_country_code: The problem is that the value '79cba1185463850dedba31f172f1dc5b' is not a valid country code. It appears to be a hash or some form of encoded data rather than a standard 2 or 3 letter country code. Without more context about the data source or what this value is supposed to represent, it's impossible to map it to a correct country code. The correct values for this column should be standard ISO 3166-1 alpha-2 or alpha-3 country codes. 
    -- shipping_province: The problem is that the shipping_province column contains an MD5 hash value ('d41d8cd98f00b204e9800998ecf8427e') instead of actual province names or abbreviations. This hash value is unusual and meaningless in the context of shipping provinces. The correct values should be actual province names or abbreviations, but since we don't have that information, we should map this to an empty string to indicate missing data. 
    -- shipping_zip: The problem is that both values in the shipping_zip column are hashed or encrypted strings instead of standard ZIP codes. ZIP codes in the United States are typically 5-digit numbers, sometimes followed by a hyphen and 4 additional digits (ZIP+4 code). The current values are clearly not in this format and appear to be some form of obfuscated data. Since we don't have access to the decryption method or original ZIP codes, we can't map these to actual ZIP codes. The correct approach would be to treat these as invalid or unknown ZIP codes. 
    -- shipping_latitude: The problem is that both values in the shipping_latitude column are hash-like strings instead of numerical latitude values. Latitude values should typically be decimal numbers ranging from -90 to 90 degrees. These hash-like strings are meaningless for geographical coordinates and appear to be some kind of encoding or error. Without additional information to decode these strings into actual latitude values, the correct approach is to map them to empty strings to indicate missing or invalid data. 
    -- shipping_longitude: The problem is that the shipping_longitude column contains hashed or encoded strings instead of actual longitude values. Longitude values should be numeric, typically ranging from -180 to 180 degrees. The current values appear to be MD5 hashes or some other form of encoded data, which are not meaningful for geographic coordinates. Since we don't have the actual longitude values and can't decode these hashes, the correct approach is to map these to empty strings to indicate missing data. 
    -- billing_full_name: The problem is that the billing_full_name column contains hash-like strings instead of human-readable names. These values are not meaningful for analysis or display purposes. The original names cannot be recovered from them, so the values are mapped to empty strings to indicate that the real names are not available.
    -- billing_first_name: The problem is that both values in the billing_first_name column are hash-like strings instead of actual first names. These values cannot be used to identify individuals or for any meaningful analysis. The original first names cannot be recovered from them, so both values are mapped to empty strings to indicate that the real first names are not available.
    -- billing_last_name: The problem is that the billing_last_name column contains hash-like strings instead of readable last names. Last names should be human-readable text, not hexadecimal digests. The original last names cannot be recovered from these values, so they are mapped to empty strings to indicate that the real last names are not available.
    -- billing_company: The problem is that the billing_company column contains an MD5 hash value instead of a recognizable company name. This hash value ('d41d8cd98f00b204e9800998ecf8427e') is actually the MD5 hash of an empty string. This suggests that the column was likely encrypted or hashed for data protection purposes, or it's being used as a placeholder for missing data. The correct value in this case should be an empty string, as the hash represents no data. 
    -- billing_phone: The problem is that the billing_phone column contains an MD5 hash value instead of an actual phone number. The value 'd41d8cd98f00b204e9800998ecf8427e' is the MD5 hash of an empty string, which suggests that empty phone values were hashed, either for privacy reasons or through a data processing error. No phone number can be reconstructed from it, so the value is mapped to an empty string to indicate missing data.
    -- billing_address_line1: The problem is that both values in the billing_address_line1 column appear to be hash-like strings rather than readable address information. This is unusual because billing addresses are typically stored as plain text for practical use. The original address lines cannot be recovered from these values, so both are mapped to empty strings to indicate missing data.
    -- billing_address_line2: The problem is that both values in the billing_address_line2 column appear to be hashed or encrypted data rather than readable text for address lines. This suggests that the data has been obfuscated, possibly for privacy reasons. However, address line 2 is typically optional and often left blank. Since we cannot decrypt or reverse the hashing to obtain the original values, and address line 2 is commonly empty, the most appropriate action is to map these unusual values to an empty string. 
    -- billing_city: The problem is that the billing_city column contains hash-like strings instead of readable city names. These values are not meaningful for analysis or display purposes. Because the original city names cannot be recovered, these values are mapped to the placeholder 'UNKNOWN', which the NULL imputation step later converts to NULL.
    -- billing_country: The problem is that the billing_country column contains a single value that appears to be a 32-character hexadecimal hash instead of an actual country name. This is incorrect for a country field, which should hold country names. The original country cannot be recovered from the hash, so the value is mapped to an empty string to indicate missing data.
    -- billing_country_code: The problem is that the value '79cba1185463850dedba31f172f1dc5b' is not a valid country code. It appears to be a hash or some form of encoded data, which is not appropriate for a country code field. Country codes are typically 2 or 3 letter abbreviations (e.g., 'US' for United States, 'GB' for Great Britain). Since we don't have any information about what this value is supposed to represent or what the correct country code should be, we can't map it to a valid country code. In this case, it's best to map it to an empty string to indicate missing or invalid data. 
    -- billing_province: The problem is that the billing_province column contains a hash-like string instead of readable province names. The value 'd41d8cd98f00b204e9800998ecf8427e' is the MD5 hash of an empty string. MD5 is a hash function used for checksums and verification, not for encryption or for representing geographical locations, so this value is meaningless as a province name. The correct values should be actual province names, or an empty string if the information is not available.
    -- billing_zip: The problem is that both values in the billing_zip column are unusual because they are long alphanumeric strings, not standard ZIP code formats. ZIP codes in the United States are typically 5 digits, or sometimes 9 digits (ZIP+4 format). These values appear to be hashed or encrypted data, possibly due to a data processing error or security measure. Since we don't have access to the original ZIP codes and can't decode these values, we can't map them to correct ZIP codes. The most appropriate action is to map these unusual values to an empty string, indicating that the true ZIP code is unknown or unavailable. 
    -- billing_latitude: The problem is that the billing_latitude column contains hashed strings instead of numerical latitude values. Latitude values should typically be decimal numbers between -90 and 90 degrees. The hashed strings are meaningless in the context of geographical coordinates and cannot be directly converted to valid latitudes. Since we don't have access to the original data or the hashing algorithm, we cannot recover the actual latitude values. 
    -- billing_longitude: The problem is that both values in the billing_longitude column appear to be hashed or encrypted data rather than actual longitude values. Longitude values should be numeric, typically ranging from -180 to 180 degrees. The current values are clearly not valid longitude coordinates. Since we don't have the key to decrypt these values or any way to determine the actual longitudes they represent, the correct approach is to map them to empty strings to indicate missing data. 
    -- order_source: The problem is that '294517' is a numeric string that does not describe an order source, unlike 'web', which is a clear and common order source. 'web' is the only valid value observed in this dataset, so '294517' is mapped to an empty string to indicate missing data.
    SELECT
        "order_id",
        "order_notes",
        "customer_email",
        "taxes_included",
        "currency",
        "subtotal_price",
        "order_tax",
        "order_total",
        "created_at",
        "last_updated",
        "order_identifier",
        CASE
            WHEN "shipping_full_name" = 'c8189c7add9755e66391b58ecc12b3e2' THEN ''
            WHEN "shipping_full_name" = '8b121314a4d97bc9dc15bfba8518ec88' THEN ''
            ELSE "shipping_full_name"
        END AS "shipping_full_name",
        CASE
            WHEN "shipping_first_name" = 'd3bae70c9d49bb7cb5a74cdd0eae7fc4' THEN ''
            WHEN "shipping_first_name" = 'f0962b7a185488ecb752cedac1038349' THEN ''
            ELSE "shipping_first_name"
        END AS "shipping_first_name",
        CASE
            WHEN "shipping_last_name" = '0dd89cff60965dff8f9ea2bc952a5474' THEN ''
            WHEN "shipping_last_name" = 'aa35cb67c26e64bb81a1bf3f17e858ba' THEN ''
            ELSE "shipping_last_name"
        END AS "shipping_last_name",
        CASE
            WHEN "shipping_company" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "shipping_company"
        END AS "shipping_company",
        CASE
            WHEN "shipping_phone" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "shipping_phone"
        END AS "shipping_phone",
        "shipping_address_line1",
        CASE
            WHEN "shipping_address_line2" = '70111f8840ccbd8b1007cc3f387ced6b' THEN ''
            WHEN "shipping_address_line2" = 'bc9b8576178dcd886639ba718f1d45c8' THEN ''
            ELSE "shipping_address_line2"
        END AS "shipping_address_line2",
        CASE
            WHEN "shipping_city" = '1ac412baeba98370017c73df41c98a07' THEN ''
            WHEN "shipping_city" = 'ac08c606d455cde42980f980524a8038' THEN ''
            ELSE "shipping_city"
        END AS "shipping_city",
        CASE
            WHEN "shipping_country" = '89f9c9f489be2a83cf57e53b9197d288' THEN ''
            ELSE "shipping_country"
        END AS "shipping_country",
        CASE
            WHEN "shipping_country_code" = '79cba1185463850dedba31f172f1dc5b' THEN ''
            ELSE "shipping_country_code"
        END AS "shipping_country_code",
        CASE
            WHEN "shipping_province" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "shipping_province"
        END AS "shipping_province",
        "shipping_province_code",
        CASE
            WHEN "shipping_zip" = '2357e65b582faa0a2da3603b16fa4a7f' THEN ''
            WHEN "shipping_zip" = '00079ce435afddc28205639142773870' THEN ''
            ELSE "shipping_zip"
        END AS "shipping_zip",
        CASE
            WHEN "shipping_latitude" = '75c29d6dd29594a652fcbd7c4c279a29' THEN ''
            WHEN "shipping_latitude" = 'd97319f64674c02595f2989019970fc8' THEN ''
            ELSE "shipping_latitude"
        END AS "shipping_latitude",
        CASE
            WHEN "shipping_longitude" = '75468fbebc28e02ec5d4f54f4cbd4099' THEN ''
            WHEN "shipping_longitude" = 'c08dae474c5d4d3326fd6764d2a0ebe6' THEN ''
            ELSE "shipping_longitude"
        END AS "shipping_longitude",
        CASE
            WHEN "billing_full_name" = 'c8189c7add9755e66391b58ecc12b3e2' THEN ''
            WHEN "billing_full_name" = '8b121314a4d97bc9dc15bfba8518ec88' THEN ''
            ELSE "billing_full_name"
        END AS "billing_full_name",
        CASE
            WHEN "billing_first_name" = 'd3bae70c9d49bb7cb5a74cdd0eae7fc4' THEN ''
            WHEN "billing_first_name" = 'f0962b7a185488ecb752cedac1038349' THEN ''
            ELSE "billing_first_name"
        END AS "billing_first_name",
        CASE
            WHEN "billing_last_name" = '0dd89cff60965dff8f9ea2bc952a5474' THEN ''
            WHEN "billing_last_name" = 'aa35cb67c26e64bb81a1bf3f17e858ba' THEN ''
            ELSE "billing_last_name"
        END AS "billing_last_name",
        CASE
            WHEN "billing_company" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "billing_company"
        END AS "billing_company",
        CASE
            WHEN "billing_phone" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "billing_phone"
        END AS "billing_phone",
        CASE
            WHEN "billing_address_line1" = '1ff1de774005f8da13f42943881c655f' THEN ''
            WHEN "billing_address_line1" = 'd6f4a399883df85d9d4b3a02bf6e738a' THEN ''
            ELSE "billing_address_line1"
        END AS "billing_address_line1",
        CASE
            WHEN "billing_address_line2" = '70111f8840ccbd8b1007cc3f387ced6b' THEN ''
            WHEN "billing_address_line2" = 'bc9b8576178dcd886639ba718f1d45c8' THEN ''
            ELSE "billing_address_line2"
        END AS "billing_address_line2",
        CASE
            WHEN "billing_city" = '1ac412baeba98370017c73df41c98a07' THEN 'UNKNOWN'
            WHEN "billing_city" = 'ac08c606d455cde42980f980524a8038' THEN 'UNKNOWN'
            ELSE "billing_city"
        END AS "billing_city",
        CASE
            WHEN "billing_country" = '89f9c9f489be2a83cf57e53b9197d288' THEN ''
            ELSE "billing_country"
        END AS "billing_country",
        CASE
            WHEN "billing_country_code" = '79cba1185463850dedba31f172f1dc5b' THEN ''
            ELSE "billing_country_code"
        END AS "billing_country_code",
        CASE
            WHEN "billing_province" = 'd41d8cd98f00b204e9800998ecf8427e' THEN ''
            ELSE "billing_province"
        END AS "billing_province",
        "billing_province_code",
        CASE
            WHEN "billing_zip" = '2357e65b582faa0a2da3603b16fa4a7f' THEN ''
            WHEN "billing_zip" = '00079ce435afddc28205639142773870' THEN ''
            ELSE "billing_zip"
        END AS "billing_zip",
        CASE
            WHEN "billing_latitude" = '75c29d6dd29594a652fcbd7c4c279a29' THEN ''
            WHEN "billing_latitude" = 'd97319f64674c02595f2989019970fc8' THEN ''
            ELSE "billing_latitude"
        END AS "billing_latitude",
        CASE
            WHEN "billing_longitude" = '75468fbebc28e02ec5d4f54f4cbd4099' THEN ''
            WHEN "billing_longitude" = 'c08dae474c5d4d3326fd6764d2a0ebe6' THEN ''
            ELSE "billing_longitude"
        END AS "billing_longitude",
        "customer_id",
        "store_location_id",
        "user_id",
        "order_number",
        "alt_order_number",
        "payment_status",
        "shipping_status",
        "processed_at",
        "processing_method",
        "referring_site",
        "cancel_reason",
        "cancelled_at",
        "closed_at",
        "total_discounts",
        "total_line_items_price",
        "order_weight",
        CASE
            WHEN "order_source" = '294517' THEN ''
            ELSE "order_source"
        END AS "order_source",
        "customer_ip",
        "marketing_consent",
        "order_token",
        "cart_token",
        "checkout_token",
        "is_test_order",
        "landing_page_url"
    FROM "shopify_order_data_projected_renamed"
),

"shopify_order_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- shipping_full_name: ['']
    -- shipping_first_name: ['']
    -- shipping_last_name: ['']
    -- shipping_company: ['']
    -- shipping_phone: ['']
    -- shipping_address_line2: ['']
    -- shipping_city: ['']
    -- shipping_country: ['']
    -- shipping_country_code: ['']
    -- shipping_province: ['']
    -- shipping_zip: ['']
    -- shipping_latitude: ['']
    -- shipping_longitude: ['']
    -- billing_full_name: ['']
    -- billing_first_name: ['']
    -- billing_last_name: ['']
    -- billing_company: ['']
    -- billing_phone: ['']
    -- billing_address_line1: ['']
    -- billing_address_line2: ['']
    -- billing_city: ['UNKNOWN']
    -- billing_country: ['']
    -- billing_country_code: ['']
    -- billing_province: ['']
    -- billing_zip: ['']
    -- billing_latitude: ['']
    -- billing_longitude: ['']
    -- order_source: ['']
    SELECT 
        CASE
            WHEN "shipping_full_name" = '' THEN NULL
            ELSE "shipping_full_name"
        END AS "shipping_full_name",
        CASE
            WHEN "shipping_first_name" = '' THEN NULL
            ELSE "shipping_first_name"
        END AS "shipping_first_name",
        CASE
            WHEN "shipping_last_name" = '' THEN NULL
            ELSE "shipping_last_name"
        END AS "shipping_last_name",
        CASE
            WHEN "shipping_company" = '' THEN NULL
            ELSE "shipping_company"
        END AS "shipping_company",
        CASE
            WHEN "shipping_phone" = '' THEN NULL
            ELSE "shipping_phone"
        END AS "shipping_phone",
        CASE
            WHEN "shipping_address_line2" = '' THEN NULL
            ELSE "shipping_address_line2"
        END AS "shipping_address_line2",
        CASE
            WHEN "shipping_city" = '' THEN NULL
            ELSE "shipping_city"
        END AS "shipping_city",
        CASE
            WHEN "shipping_country" = '' THEN NULL
            ELSE "shipping_country"
        END AS "shipping_country",
        CASE
            WHEN "shipping_country_code" = '' THEN NULL
            ELSE "shipping_country_code"
        END AS "shipping_country_code",
        CASE
            WHEN "shipping_province" = '' THEN NULL
            ELSE "shipping_province"
        END AS "shipping_province",
        CASE
            WHEN "shipping_zip" = '' THEN NULL
            ELSE "shipping_zip"
        END AS "shipping_zip",
        CASE
            WHEN "shipping_latitude" = '' THEN NULL
            ELSE "shipping_latitude"
        END AS "shipping_latitude",
        CASE
            WHEN "shipping_longitude" = '' THEN NULL
            ELSE "shipping_longitude"
        END AS "shipping_longitude",
        CASE
            WHEN "billing_full_name" = '' THEN NULL
            ELSE "billing_full_name"
        END AS "billing_full_name",
        CASE
            WHEN "billing_first_name" = '' THEN NULL
            ELSE "billing_first_name"
        END AS "billing_first_name",
        CASE
            WHEN "billing_last_name" = '' THEN NULL
            ELSE "billing_last_name"
        END AS "billing_last_name",
        CASE
            WHEN "billing_company" = '' THEN NULL
            ELSE "billing_company"
        END AS "billing_company",
        CASE
            WHEN "billing_phone" = '' THEN NULL
            ELSE "billing_phone"
        END AS "billing_phone",
        CASE
            WHEN "billing_address_line1" = '' THEN NULL
            ELSE "billing_address_line1"
        END AS "billing_address_line1",
        CASE
            WHEN "billing_address_line2" = '' THEN NULL
            ELSE "billing_address_line2"
        END AS "billing_address_line2",
        CASE
            WHEN "billing_city" = 'UNKNOWN' THEN NULL
            ELSE "billing_city"
        END AS "billing_city",
        CASE
            WHEN "billing_country" = '' THEN NULL
            ELSE "billing_country"
        END AS "billing_country",
        CASE
            WHEN "billing_country_code" = '' THEN NULL
            ELSE "billing_country_code"
        END AS "billing_country_code",
        CASE
            WHEN "billing_province" = '' THEN NULL
            ELSE "billing_province"
        END AS "billing_province",
        CASE
            WHEN "billing_zip" = '' THEN NULL
            ELSE "billing_zip"
        END AS "billing_zip",
        CASE
            WHEN "billing_latitude" = '' THEN NULL
            ELSE "billing_latitude"
        END AS "billing_latitude",
        CASE
            WHEN "billing_longitude" = '' THEN NULL
            ELSE "billing_longitude"
        END AS "billing_longitude",
        CASE
            WHEN "order_source" = '' THEN NULL
            ELSE "order_source"
        END AS "order_source",
        "shipping_province_code",
        "user_id",
        "referring_site",
        "payment_status",
        "order_number",
        "order_identifier",
        "cancel_reason",
        "order_token",
        "order_notes",
        "total_discounts",
        "subtotal_price",
        "order_id",
        "landing_page_url",
        "last_updated",
        "billing_province_code",
        "total_line_items_price",
        "order_weight",
        "closed_at",
        "customer_ip",
        "checkout_token",
        "customer_email",
        "customer_id",
        "currency",
        "order_tax",
        "order_total",
        "cancelled_at",
        "processed_at",
        "taxes_included",
        "is_test_order",
        "alt_order_number",
        "shipping_address_line1",
        "shipping_status",
        "processing_method",
        "cart_token",
        "created_at",
        "store_location_id",
        "marketing_consent"
    FROM "shopify_order_data_projected_renamed_cleaned"
),

"shopify_order_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- alt_order_number: from INT to VARCHAR
    -- billing_latitude: from VARCHAR to DECIMAL
    -- billing_longitude: from VARCHAR to DECIMAL
    -- billing_province_code: from DECIMAL to VARCHAR
    -- cancel_reason: from DECIMAL to VARCHAR
    -- cancelled_at: from DECIMAL to TIMESTAMP
    -- closed_at: from VARCHAR to TIMESTAMP
    -- created_at: from VARCHAR to TIMESTAMP
    -- customer_id: from INT to VARCHAR
    -- last_updated: from VARCHAR to TIMESTAMP
    -- order_id: from INT to VARCHAR
    -- order_tax: from INT to DECIMAL
    -- order_weight: from INT to DECIMAL
    -- processed_at: from VARCHAR to TIMESTAMP
    -- shipping_latitude: from VARCHAR to DECIMAL
    -- shipping_longitude: from VARCHAR to DECIMAL
    -- shipping_province_code: from DECIMAL to VARCHAR
    -- store_location_id: from DECIMAL to VARCHAR
    -- user_id: from DECIMAL to VARCHAR
    SELECT
        "shipping_full_name",
        "shipping_first_name",
        "shipping_last_name",
        "shipping_company",
        "shipping_phone",
        "shipping_address_line2",
        "shipping_city",
        "shipping_country",
        "shipping_country_code",
        "shipping_province",
        "shipping_zip",
        "billing_full_name",
        "billing_first_name",
        "billing_last_name",
        "billing_company",
        "billing_phone",
        "billing_address_line1",
        "billing_address_line2",
        "billing_city",
        "billing_country",
        "billing_country_code",
        "billing_province",
        "billing_zip",
        "order_source",
        "referring_site",
        "payment_status",
        "order_number",
        "order_identifier",
        "order_token",
        "order_notes",
        "total_discounts",
        "subtotal_price",
        "landing_page_url",
        "total_line_items_price",
        "customer_ip",
        "checkout_token",
        "customer_email",
        "currency",
        "order_total",
        "taxes_included",
        "is_test_order",
        "shipping_address_line1",
        "shipping_status",
        "processing_method",
        "cart_token",
        "marketing_consent",
        CAST("alt_order_number" AS VARCHAR) AS "alt_order_number",
        CAST("billing_latitude" AS DECIMAL) AS "billing_latitude",
        CAST("billing_longitude" AS DECIMAL) AS "billing_longitude",
        CAST("billing_province_code" AS VARCHAR) AS "billing_province_code",
        CAST("cancel_reason" AS VARCHAR) AS "cancel_reason",
        CAST("cancelled_at" AS TIMESTAMP) AS "cancelled_at",
        CAST("closed_at" AS TIMESTAMP) AS "closed_at",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("customer_id" AS VARCHAR) AS "customer_id",
        CAST("last_updated" AS TIMESTAMP) AS "last_updated",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("order_tax" AS DECIMAL) AS "order_tax",
        CAST("order_weight" AS DECIMAL) AS "order_weight",
        CAST("processed_at" AS TIMESTAMP) AS "processed_at",
        CAST("shipping_latitude" AS DECIMAL) AS "shipping_latitude",
        CAST("shipping_longitude" AS DECIMAL) AS "shipping_longitude",
        CAST("shipping_province_code" AS VARCHAR) AS "shipping_province_code",
        CAST("store_location_id" AS VARCHAR) AS "store_location_id",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "shopify_order_data_projected_renamed_cleaned_null"
),

"shopify_order_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 23 columns with unacceptable missing values
    -- cart_token has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- checkout_token has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- closed_at has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- customer_ip has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- landing_page_url has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- order_notes has 66.67 percent missing. Strategy: 🔄 Unchanged
    -- order_source has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- processing_method has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- referring_site has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- shipping_city has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_country has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_country_code has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_first_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_full_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_last_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_latitude has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_longitude has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_phone has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_province has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_province_code has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- shipping_zip has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- store_location_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- user_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "shipping_company",
        "shipping_address_line2",
        "billing_full_name",
        "billing_first_name",
        "billing_last_name",
        "billing_company",
        "billing_phone",
        "billing_address_line1",
        "billing_address_line2",
        "billing_city",
        "billing_country",
        "billing_country_code",
        "billing_province",
        "billing_zip",
        "order_source",
        "referring_site",
        "payment_status",
        "order_number",
        "order_identifier",
        "order_token",
        "order_notes",
        "total_discounts",
        "subtotal_price",
        "landing_page_url",
        "total_line_items_price",
        "customer_ip",
        "checkout_token",
        "customer_email",
        "currency",
        "order_total",
        "taxes_included",
        "is_test_order",
        "shipping_address_line1",
        "shipping_status",
        "processing_method",
        "cart_token",
        "marketing_consent",
        "alt_order_number",
        "billing_latitude",
        "billing_longitude",
        "billing_province_code",
        "cancel_reason",
        "cancelled_at",
        "closed_at",
        "created_at",
        "customer_id",
        "last_updated",
        "order_id",
        "order_tax",
        "order_weight",
        "processed_at"
    FROM "shopify_order_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_data_projected_renamed_cleaned_null_casted_missing_handled"
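
The cleaning and NULL imputation steps above list every hash literal by hand. As a rough illustration only, the same idea can be sketched in Python with a pattern check. This is not how the SQL above was generated. The helper names (`clean_value`, `missing_percent`), the three-row sample, and the assumption that obfuscated values are always lowercase 32-character hex strings are all hypothetical; only the hash literals and the '294517' value are taken from the SQL.

```python
import hashlib
import re

# Assumption: obfuscated values are lowercase 32-character hex digests,
# as in the CASE branches of the cleaning step.
MD5_PATTERN = re.compile(r"[0-9a-f]{32}")

# MD5 of the empty string; this is the literal used for the company,
# phone, and province columns.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def clean_value(value, invalid=()):
    """Return None for MD5-shaped strings, empty strings, and any value
    listed in `invalid`; return every other value unchanged."""
    if isinstance(value, str):
        if value == "" or value in invalid or MD5_PATTERN.fullmatch(value):
            return None
    return value


def missing_percent(rows, column):
    """Percentage of rows whose value in `column` is None, rounded to 2 places."""
    missing = sum(1 for row in rows if row[column] is None)
    return round(100.0 * missing / len(rows), 2)


# Hypothetical three-row sample; the real table is not reproduced here.
rows = [
    {"shipping_city": "1ac412baeba98370017c73df41c98a07", "order_source": "web"},
    {"shipping_city": "ac08c606d455cde42980f980524a8038", "order_source": "web"},
    {"shipping_city": "ac08c606d455cde42980f980524a8038", "order_source": "294517"},
]

# '294517' is not hash-shaped, so it needs an explicit per-column rule.
INVALID = {"order_source": {"294517"}}

cleaned = [
    {col: clean_value(val, INVALID.get(col, ())) for col, val in row.items()}
    for row in rows
]

for col in ("shipping_city", "order_source"):
    print(col, missing_percent(cleaned, col))
```

A pattern check would also null out any legitimate value that happens to be a 32-character hex string, which is one reason a pipeline might prefer explicit literals as in the SQL above. Whether a column with missing values is dropped or kept remains a per-column decision and is not modeled in this sketch.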

stg_shopify_order_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_data
  description: This table contains Shopify orders. It includes order details such
    as ID, total price, currency, and timestamps. Customer information, such as email
    and shipping/billing addresses, is provided, along with order status, payment
    details, and fulfillment information. Each row represents a unique order with
    its associated data.
  columns:
  - name: shipping_company
    description: Company name in shipping address
    cocoon_meta:
      missing_acceptable: No company associated with this shipping address
  - name: shipping_address_line2
    description: Second line of shipping address
    cocoon_meta:
      missing_acceptable: No secondary shipping address line needed
  - name: billing_full_name
    description: Full name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for this order
  - name: billing_first_name
    description: First name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for this order
  - name: billing_last_name
    description: Last name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for this order
  - name: billing_company
    description: Company name in billing address
    cocoon_meta:
      missing_acceptable: No company associated with this billing address
  - name: billing_phone
    description: Phone number in billing address
    cocoon_meta:
      missing_acceptable: No billing phone number provided for this order
  - name: billing_address_line1
    description: First line of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for this order
  - name: billing_address_line2
    description: Second line of billing address
    cocoon_meta:
      missing_acceptable: No secondary billing address line needed
  - name: billing_city
    description: City of billing address
    cocoon_meta:
      missing_acceptable: No billing city provided for this order
  - name: billing_country
    description: Country of billing address
    cocoon_meta:
      missing_acceptable: No billing country provided for this order
  - name: billing_country_code
    description: Country code of billing address
    cocoon_meta:
      missing_acceptable: No billing country code provided for this order
  - name: billing_province
    description: Province or state in billing address
    cocoon_meta:
      missing_acceptable: No billing province/state provided for this order
  - name: billing_zip
    description: Zip or postal code of billing address
    cocoon_meta:
      missing_acceptable: No billing zip/postal code provided for this order
  - name: order_source
    description: Source of the order
    tests:
    - not_null
    - accepted_values:
        values:
        - web
        - mobile_app
        - phone
        - in_store
        - email
        - fax
        - mail_order
        - social_media
        - third_party_marketplace
        - kiosk
        - voice_assistant
        - sms
        - chatbot
  - name: referring_site
    description: Website that referred the order
    tests:
    - not_null
  - name: payment_status
    description: Payment status of the order
    tests:
    - not_null
    - accepted_values:
        values:
        - paid
        - pending
        - failed
        - refunded
        - cancelled
        - partially_paid
  - name: order_number
    description: Order number
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each order. For this
        table, each row is for a distinct order, and order_number appears to be unique
        across rows.
  - name: order_identifier
    description: Order name or identifier
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column seems to be a unique identifier for each order, possibly
        in a different format. For this table, each row represents a distinct order,
        and order_identifier appears to be unique across rows.
  - name: order_token
    description: Unique token for the order
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column appears to be a unique token associated with each order.
        For this table, each row represents a distinct order, and order_token seems
        to be unique across rows.
  - name: order_notes
    description: Additional notes for the order
    tests:
    - not_null
  - name: total_discounts
    description: Total discounts applied to the order
    tests:
    - not_null
  - name: subtotal_price
    description: Subtotal price of the order
    tests:
    - not_null
  - name: landing_page_url
    description: Base URL of the landing page
    tests:
    - not_null
  - name: total_line_items_price
    description: Total price of all line items
    tests:
    - not_null
  - name: customer_ip
    description: IP address of customer's browser
    tests:
    - not_null
  - name: checkout_token
    description: Unique identifier for checkout process
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for the checkout process.
        For this table, each row represents a distinct order, and checkout_token appears
        to be unique across rows.
  - name: customer_email
    description: Customer's email address
    tests:
    - not_null
  - name: currency
    description: Currency used for the order
    tests:
    - not_null
    - accepted_values:
        values:
        - GBP
        - USD
        - EUR
        - JPY
        - CHF
        - CAD
        - AUD
        - CNY
        - HKD
        - NZD
        - SEK
        - NOK
        - DKK
        - SGD
        - MXN
        - INR
        - BRL
        - ZAR
        - RUB
        - TRY
  - name: order_total
    description: Total price of the order
    tests:
    - not_null
  - name: taxes_included
    description: Indicates if taxes are included in price
    tests:
    - not_null
  - name: is_test_order
    description: Indicates if this is a test order
    tests:
    - not_null
  - name: shipping_address_line1
    description: First line of shipping address
    tests:
    - not_null
  - name: shipping_status
    description: Shipping status of the order
    tests:
    - accepted_values:
        values:
        - fulfilled
        - pending
        - shipped
        - delivered
        - cancelled
        - returned
        - processing
        - on hold
        - backordered
        - partial
    cocoon_meta:
      missing_acceptable: Not applicable for orders that haven't been shipped yet.
  - name: processing_method
    description: Method used to process the order
    tests:
    - not_null
    - accepted_values:
        values:
        - direct
        - online
        - phone
        - mail
        - in-store
        - fax
        - email
        - mobile app
        - third-party marketplace
        - social media
        - voice assistant
        - text message
  - name: cart_token
    description: Unique identifier for shopping cart
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for the shopping cart.
        For this table, each row represents a distinct order, and cart_token appears
        to be unique across rows.
  - name: marketing_consent
    description: Customer's marketing preferences
    tests:
    - not_null
  - name: alt_order_number
    description: Alternative order number
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column seems to be an alternative order number. For this table,
        each row represents a distinct order, and alt_order_number appears to be unique
        across rows.
  - name: billing_latitude
    description: Latitude of billing address
    cocoon_meta:
      missing_acceptable: No billing location coordinates provided
  - name: billing_longitude
    description: Longitude of billing address
    cocoon_meta:
      missing_acceptable: No billing location coordinates provided
  - name: billing_province_code
    description: Province or state code in billing address
    cocoon_meta:
      missing_acceptable: No billing province/state code provided for this order
  - name: cancel_reason
    description: Reason for order cancellation
    cocoon_meta:
      missing_acceptable: Order not cancelled
  - name: cancelled_at
    description: Timestamp of order cancellation
    cocoon_meta:
      missing_acceptable: Order not cancelled
  - name: closed_at
    description: Timestamp when order was closed
    tests:
    - not_null
  - name: created_at
    description: Timestamp when order was created
    tests:
    - not_null
  - name: customer_id
    description: Unique identifier for the user who placed the order
    tests:
    - not_null
  - name: last_updated
    description: Timestamp of the last update to the order
    tests:
    - not_null
  - name: order_id
    description: Unique identifier for the order
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for the order. For this table,
        each row represents a unique order. Order IDs are typically designed to be
        unique for each order in e-commerce systems. Based on the sample data, we
        can see that each row has a different order_id, suggesting it's unique across
        all rows.
  - name: order_tax
    description: Total tax amount for the order
    tests:
    - not_null
  - name: order_weight
    description: Total weight of the order
    tests:
    - not_null
  - name: processed_at
    description: Timestamp of order processing
    tests:
    - not_null
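
The tests entries in the YAML above use dbt's generic tests (not_null, unique, accepted_values). As a rough illustration of what each one checks, here is a minimal Python sketch over in-memory rows. The helper names and the sample rows are hypothetical, and the null handling (unique and accepted_values skip missing values) is an assumption about dbt's default behavior rather than something stated in this document.

```python
from collections import Counter


def not_null_failures(rows, column):
    """Rows where the column is missing (None)."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows, column):
    """Non-null values that occur more than once."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)


def accepted_values_failures(rows, column, values):
    """Non-null values that are not in the accepted list."""
    allowed = set(values)
    return sorted({r[column] for r in rows
                   if r.get(column) is not None and r[column] not in allowed})


# Hypothetical rows for illustration only; not taken from the warehouse.
rows = [
    {"order_id": "1", "payment_status": "paid"},
    {"order_id": "2", "payment_status": "pending"},
    {"order_id": "2", "payment_status": "chargeback"},
    {"order_id": None, "payment_status": None},
]
```

A test passes when its failure list is empty, so a column with legitimately missing values should not carry not_null.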

stg_shopify_product_image_data (first 100 rows)

is_deleted is_default image_url product_id image_id display_order height width created_at updated_at variant_ids
0 False False https://cdn.shopify.com/s/files/glassess-1784103173.jpg?v=1560398767 38804 14180 4 1200 956 2019-06-13 04:06:07 2019-06-13 04:06:07 NaN
1 False False https://cdn.shopify.com/s/files/1/smile.jpg?v=1560398767 34804 748644 2 1200 956 2019-06-13 04:06:07 2019-06-13 04:06:07 NaN
2 False False https://cdn.shopify.com/s/files/1/kitten.jpg?v=1560398767 34604 679716 6 1200 956 2019-06-13 04:06:07 2019-06-13 04:06:07 [None, 27559733, 275597338, 275597536, None, 2755973, None]

stg_shopify_product_image_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_product_image_data_projected" AS (
    -- Projection: Selecting 12 out of 13 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "product_id",
        "_fivetran_deleted",
        "alt",
        "created_at",
        "height",
        "position_",
        "src",
        "updated_at",
        "width",
        "is_default",
        "variant_ids"
    FROM "shopify_product_image_data"
),

"shopify_product_image_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> image_id
    -- _fivetran_deleted -> is_deleted
    -- alt -> alt_text
    -- position_ -> display_order
    -- src -> image_url
    SELECT 
        "id" AS "image_id",
        "product_id",
        "_fivetran_deleted" AS "is_deleted",
        "alt" AS "alt_text",
        "created_at",
        "height",
        "position_" AS "display_order",
        "src" AS "image_url",
        "updated_at",
        "width",
        "is_default",
        "variant_ids"
    FROM "shopify_product_image_data_projected"
),

"shopify_product_image_data_projected_renamed_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- variant_ids: ['[]']
    SELECT 
        CASE
            WHEN "variant_ids" = '[]' THEN NULL
            ELSE "variant_ids"
        END AS "variant_ids",
        "updated_at",
        "is_deleted",
        "is_default",
        "image_url",
        "product_id",
        "image_id",
        "alt_text",
        "display_order",
        "height",
        "created_at",
        "width"
    FROM "shopify_product_image_data_projected_renamed"
),

"shopify_product_image_data_projected_renamed_null_casted" AS (
    -- Column Type Casting: 
    -- alt_text: from DECIMAL to VARCHAR
    -- created_at: from VARCHAR to TIMESTAMP
    -- updated_at: from VARCHAR to TIMESTAMP
    -- variant_ids: from VARCHAR to ARRAY
    SELECT
        "is_deleted",
        "is_default",
        "image_url",
        "product_id",
        "image_id",
        "display_order",
        "height",
        "width",
        CAST("alt_text" AS VARCHAR) AS "alt_text",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at",
        from_json("variant_ids", '["INTEGER"]') AS "variant_ids"
    FROM "shopify_product_image_data_projected_renamed_null"
),

"shopify_product_image_data_projected_renamed_null_casted_missing_handled" AS (
    -- Handling missing values: There is 1 column with unacceptable missing values
    -- alt_text has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "is_deleted",
        "is_default",
        "image_url",
        "product_id",
        "image_id",
        "display_order",
        "height",
        "width",
        "created_at",
        "updated_at",
        "variant_ids"
    FROM "shopify_product_image_data_projected_renamed_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_product_image_data_projected_renamed_null_casted_missing_handled"
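
The two transformations applied to variant_ids in the SQL above (turn the disguised missing value '[]' into NULL, then parse the JSON text into an integer array) can be sketched outside the warehouse as follows. This is a minimal sketch, assuming the raw column holds JSON text; parse_variant_ids is a hypothetical helper, and the warehouse's from_json may behave differently on malformed input.

```python
import json


def parse_variant_ids(raw):
    """Mirror the two SQL steps: '[]' -> NULL, then JSON text -> list of ints."""
    # NULL imputation: an empty JSON array is treated as a missing value.
    if raw is None or raw == "[]":
        return None
    # Type cast: parse the JSON text and keep null elements as None.
    values = json.loads(raw)
    return [None if v is None else int(v) for v in values]
```

Null elements inside the array are kept, which matches the sample row whose variant_ids contains None entries.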

stg_shopify_product_image_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_product_image_data
  description: The table is about Shopify product images. It contains image details
    such as ID, product ID, creation date, dimensions, URL, and position. Each row
    represents one image. The table includes information on whether the image is default
    and which product variants it's associated with. It also tracks if the image has
    been deleted from the system.
  columns:
  - name: is_deleted
    description: Indicates if the image has been deleted
    tests:
    - not_null
  - name: is_default
    description: Indicates if this is the default product image
    tests:
    - not_null
  - name: image_url
    description: URL source of the image
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the URL source of the image. For this table,
        each row represents a unique image. The image_url is likely to be unique across
        rows as it points to a specific image file on Shopify's CDN.
  - name: product_id
    description: ID of the product associated with the image
    tests:
    - not_null
  - name: image_id
    description: Unique identifier for the image
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for the image. For this
        table, each row represents a unique image, and the image_id is designed to
        be a unique identifier for each image.
  - name: display_order
    description: Order of the image in product gallery
    tests:
    - not_null
  - name: height
    description: Height of the image in pixels
    tests:
    - not_null
  - name: width
    description: Width of the image in pixels
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the image was created
    tests:
    - not_null
  - name: updated_at
    description: Timestamp when the image was last updated
    tests:
    - not_null
  - name: variant_ids
    description: List of product variant IDs associated with the image
    cocoon_meta:
      missing_acceptable: Not all products have variants or multiple versions.

stg_shopify_tender_transaction_data (first 100 rows)

transaction_amount currency payment_method is_test_transaction credit_card_company order_id processing_timestamp transaction_id
0 2895.74 USD other False None 45379 2022-11-30 18:14:37 34283
1 5900.75 USD other False None 45243 2022-12-01 02:00:39 905707
2 -164.72 USD other False None 4559467 2022-11-30 14:29:13 411
3 5180.19 USD other False None 35 2022-11-30 23:55:45 55179
4 3004.30 USD other False None 45955 2022-12-01 02:09:47 16923

stg_shopify_tender_transaction_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_tender_transaction_data_projected" AS (
    -- Projection: Selecting 11 out of 12 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "amount",
        "currency",
        "order_id",
        "payment_details_credit_card_company",
        "payment_details_credit_card_number",
        "payment_method",
        "processed_at",
        "remote_reference",
        "test",
        "user_id"
    FROM "shopify_tender_transaction_data"
),

"shopify_tender_transaction_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> transaction_id
    -- amount -> transaction_amount
    -- payment_details_credit_card_company -> credit_card_company
    -- payment_details_credit_card_number -> masked_card_number
    -- processed_at -> processing_timestamp
    -- remote_reference -> external_reference
    -- test -> is_test_transaction
    SELECT 
        "id" AS "transaction_id",
        "amount" AS "transaction_amount",
        "currency",
        "order_id",
        "payment_details_credit_card_company" AS "credit_card_company",
        "payment_details_credit_card_number" AS "masked_card_number",
        "payment_method",
        "processed_at" AS "processing_timestamp",
        "remote_reference" AS "external_reference",
        "test" AS "is_test_transaction",
        "user_id"
    FROM "shopify_tender_transaction_data_projected"
),

"shopify_tender_transaction_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- credit_card_company: from DECIMAL to VARCHAR
    -- external_reference: from DECIMAL to VARCHAR
    -- masked_card_number: from DECIMAL to VARCHAR
    -- order_id: from INT to VARCHAR
    -- processing_timestamp: from VARCHAR to TIMESTAMP
    -- transaction_id: from INT to VARCHAR
    -- user_id: from DECIMAL to VARCHAR
    SELECT
        "transaction_amount",
        "currency",
        "payment_method",
        "is_test_transaction",
        CAST("credit_card_company" AS VARCHAR) AS "credit_card_company",
        CAST("external_reference" AS VARCHAR) AS "external_reference",
        CAST("masked_card_number" AS VARCHAR) AS "masked_card_number",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("processing_timestamp" AS TIMESTAMP) AS "processing_timestamp",
        CAST("transaction_id" AS VARCHAR) AS "transaction_id",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "shopify_tender_transaction_data_projected_renamed"
),

"shopify_tender_transaction_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- external_reference has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- masked_card_number has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- user_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "transaction_amount",
        "currency",
        "payment_method",
        "is_test_transaction",
        "credit_card_company",
        "order_id",
        "processing_timestamp",
        "transaction_id"
    FROM "shopify_tender_transaction_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_tender_transaction_data_projected_renamed_casted_missing_handled"

stg_shopify_tender_transaction_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_tender_transaction_data
  description: The table is about financial transactions in a Shopify store. It includes
    details such as transaction ID, amount, currency, order ID, payment method, processing
    time, and whether it was a test transaction. The table captures both positive
    and negative amounts, suggesting it covers both sales and refunds. In the sampled
    rows, all transactions are in USD and use the "other" payment method.
  columns:
  - name: transaction_amount
    description: Transaction amount, in the currency given by the currency column
    tests:
    - not_null
  - name: currency
    description: Currency code of the transaction
    tests:
    - not_null
  - name: payment_method
    description: Method used for payment
    tests:
    - not_null
    - accepted_values:
        values:
        - cash
        - credit card
        - debit card
        - bank transfer
        - check
        - money order
        - PayPal
        - Apple Pay
        - Google Pay
        - cryptocurrency
        - gift card
        - store credit
        - loyalty points
        - installment plan
        - wire transfer
        - mobile payment
        - contactless payment
        - electronic wallet
        - direct debit
        - other
  - name: is_test_transaction
    description: Indicates if this is a test transaction
    tests:
    - not_null
  - name: credit_card_company
    description: Name of the credit card company
    cocoon_meta:
      missing_acceptable: Payment method is 'other', not a credit card.
  - name: order_id
    description: ID of the associated order
    tests:
    - not_null
  - name: processing_timestamp
    description: Date and time of transaction processing
    tests:
    - not_null
  - name: transaction_id
    description: Unique identifier for the transaction
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for the transaction. For this
        table, each row is a financial transaction. As it's designed to be a unique
        identifier, it should be unique across all rows.
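
The model description notes that transaction_amount holds both positive and negative values. A minimal sketch of separating the two, using the five sample rows shown for this table, is given below. Reading negative amounts as refunds follows the description's suggestion and is not confirmed by the source; split_by_sign is a hypothetical helper.

```python
from decimal import Decimal

# transaction_amount values copied from the five sample rows of this table.
amounts = [
    Decimal("2895.74"),
    Decimal("5900.75"),
    Decimal("-164.72"),
    Decimal("5180.19"),
    Decimal("3004.30"),
]


def split_by_sign(values):
    """Return (total of positive amounts, total of negative amounts)."""
    positive = sum((v for v in values if v > 0), Decimal("0"))
    negative = sum((v for v in values if v < 0), Decimal("0"))
    return positive, negative


# Assumption: positive amounts are sales and negative amounts are refunds.
sales, refunds = split_by_sign(amounts)
net = sales + refunds
```

Decimal is used instead of float so that the currency totals are exact.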

stg_shopify_order_adjustment_data (first 100 rows)

adjustment_amount_cents tax_amount_cents adjustment_type adjustment_reason adjustment_id order_id refund_id
0 -465 0.0 shipping_refund Shipping refund 109271056455 2712175083591 675617407047
1 -95 0.0 shipping_refund Shipping refund 109277085767 2773486501959 675634708551
2 -27 -1.6 shipping_refund Shipping refund 109245956167 2771757826119 675548168263
3 -35 0.0 shipping_refund Shipping refund 109248118855 2771329908807 675555016775
4 -515 0.0 refund_discrepancy Refund discrepancy 109275742279 2773429682247 675632644167

stg_shopify_order_adjustment_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_adjustment_data_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "order_id",
        "refund_id",
        "amount",
        "tax_amount",
        "kind",
        "reason",
        "amount_set",
        "tax_amount_set"
    FROM "shopify_order_adjustment_data"
),

"shopify_order_adjustment_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> adjustment_id
    -- amount -> adjustment_amount_cents
    -- tax_amount -> tax_amount_cents
    -- kind -> adjustment_type
    -- reason -> adjustment_reason
    -- amount_set -> currency_info
    -- tax_amount_set -> tax_currency_info
    SELECT 
        "id" AS "adjustment_id",
        "order_id",
        "refund_id",
        "amount" AS "adjustment_amount_cents",
        "tax_amount" AS "tax_amount_cents",
        "kind" AS "adjustment_type",
        "reason" AS "adjustment_reason",
        "amount_set" AS "currency_info",
        "tax_amount_set" AS "tax_currency_info"
    FROM "shopify_order_adjustment_data_projected"
),

"shopify_order_adjustment_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- adjustment_id: from INT to VARCHAR
    -- currency_info: from DECIMAL to VARCHAR
    -- order_id: from INT to VARCHAR
    -- refund_id: from INT to VARCHAR
    -- tax_currency_info: from DECIMAL to VARCHAR
    SELECT
        "adjustment_amount_cents",
        "tax_amount_cents",
        "adjustment_type",
        "adjustment_reason",
        CAST("adjustment_id" AS VARCHAR) AS "adjustment_id",
        CAST("currency_info" AS VARCHAR) AS "currency_info",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("refund_id" AS VARCHAR) AS "refund_id",
        CAST("tax_currency_info" AS VARCHAR) AS "tax_currency_info"
    FROM "shopify_order_adjustment_data_projected_renamed"
),

"shopify_order_adjustment_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- currency_info has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tax_currency_info has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "adjustment_amount_cents",
        "tax_amount_cents",
        "adjustment_type",
        "adjustment_reason",
        "adjustment_id",
        "order_id",
        "refund_id"
    FROM "shopify_order_adjustment_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_adjustment_data_projected_renamed_casted_missing_handled"

stg_shopify_order_adjustment_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_adjustment_data
  description: The table is about Shopify order adjustments. It includes details such
    as order ID, refund ID, adjustment amount, tax amount, kind of adjustment, and
    reason. The main types of adjustments are shipping refunds and refund discrepancies.
    Each row represents a specific adjustment made to an order, with associated amounts
    and reasons.
  columns:
  - name: adjustment_amount_cents
    description: Adjustment amount in cents
    tests:
    - not_null
  - name: tax_amount_cents
    description: Tax amount associated with the adjustment in cents
    tests:
    - not_null
  - name: adjustment_type
    description: Type of adjustment (e.g., shipping_refund, refund_discrepancy)
    tests:
    - not_null
    - accepted_values:
        values:
        - shipping_refund
        - refund_discrepancy
        - price_adjustment
        - tax_adjustment
        - coupon_adjustment
        - fee_adjustment
        - partial_refund
        - full_refund
        - return_adjustment
        - exchange_adjustment
        - credit_adjustment
        - promotional_adjustment
        - loyalty_point_adjustment
        - gift_card_adjustment
        - handling_fee_adjustment
        - currency_exchange_adjustment
        - inventory_adjustment
        - damaged_goods_adjustment
        - miscellaneous_adjustment
  - name: adjustment_reason
    description: Explanation for the adjustment
    tests:
    - not_null
    - accepted_values:
        values:
        - Shipping refund
        - Refund discrepancy
        - Price adjustment
        - Damaged item
        - Missing item
        - Wrong item shipped
        - Coupon/discount applied
        - Customer satisfaction
        - Bulk order discount
        - Loyalty program credit
        - Warranty claim
        - Return processing fee
        - Exchange difference
        - Partial shipment adjustment
        - Canceled order
        - Promotional offer
        - Tax adjustment
        - Currency exchange rate
        - Shipping upgrade
        - Shipping downgrade
        - Late delivery compensation
        - Product recall
        - Price match
        - Inventory error
        - Payment processing error
  - name: adjustment_id
    description: Unique identifier for the adjustment
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each adjustment.
        For this table, each row is a specific adjustment made to an order. The adjustment_id
        is likely to be unique across all rows as it's designed to distinctly identify
        each adjustment.
  - name: order_id
    description: Unique identifier for the associated order
    tests:
    - not_null
  - name: refund_id
    description: Unique identifier for the associated refund
    tests:
    - not_null
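
Since each row is one adjustment with a type and an amount, a common follow-up is to total the amounts per adjustment_type. The sketch below does this for the five sample rows shown for this table. The unit (cents) is taken from the column name and is not verified here; total_by_type is a hypothetical helper.

```python
from collections import defaultdict

# (adjustment_type, adjustment_amount_cents) from the five sample rows.
sample = [
    ("shipping_refund", -465),
    ("shipping_refund", -95),
    ("shipping_refund", -27),
    ("shipping_refund", -35),
    ("refund_discrepancy", -515),
]


def total_by_type(rows):
    """Sum the adjustment amounts for each adjustment type."""
    totals = defaultdict(int)
    for adjustment_type, amount in rows:
        totals[adjustment_type] += amount
    return dict(totals)


totals = total_by_type(sample)
```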

stg_shopify_location_data (first 100 rows)

is_deleted location_name is_active province_state is_legacy local_province_name country_name province_state_code primary_address iso_country_code location_id local_country_name country_code creation_timestamp last_update_timestamp postal_code secondary_address
0 False Plum True None True None United States None None US 8777748 United States US 2019-06-11 15:58:20 2019-06-11 15:58:20 None None
1 False Plum Express True NY False New York United States NY 111 Tree Road US 7748 United States US 2018-12-10 16:24:07 2019-05-16 13:37:39 7394.0 None

stg_shopify_location_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_location_data_projected" AS (
    -- Projection: Selecting 19 out of 20 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "active",
        "address_1",
        "address_2",
        "city",
        "country",
        "created_at",
        "legacy",
        "name",
        "phone",
        "province",
        "updated_at",
        "zip",
        "country_code",
        "country_name",
        "localized_country_name",
        "localized_province_name",
        "province_code",
        "_fivetran_deleted"
    FROM "shopify_location_data"
),

"shopify_location_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> location_id
    -- active -> is_active
    -- address_1 -> primary_address
    -- address_2 -> secondary_address
    -- country -> country_code
    -- created_at -> creation_timestamp
    -- legacy -> is_legacy
    -- name -> location_name
    -- phone -> phone_number
    -- province -> province_state
    -- updated_at -> last_update_timestamp
    -- zip -> postal_code
    -- country_code -> iso_country_code
    -- localized_country_name -> local_country_name
    -- localized_province_name -> local_province_name
    -- province_code -> province_state_code
    -- _fivetran_deleted -> is_deleted
    SELECT 
        "id" AS "location_id",
        "active" AS "is_active",
        "address_1" AS "primary_address",
        "address_2" AS "secondary_address",
        "city",
        "country" AS "country_code",
        "created_at" AS "creation_timestamp",
        "legacy" AS "is_legacy",
        "name" AS "location_name",
        "phone" AS "phone_number",
        "province" AS "province_state",
        "updated_at" AS "last_update_timestamp",
        "zip" AS "postal_code",
        "country_code" AS "iso_country_code",
        "country_name",
        "localized_country_name" AS "local_country_name",
        "localized_province_name" AS "local_province_name",
        "province_code" AS "province_state_code",
        "_fivetran_deleted" AS "is_deleted"
    FROM "shopify_location_data_projected"
),

"shopify_location_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- city: 'Tree' is not a valid city name; the column appears to hold misplaced data. Since the correct city is unknown, 'Tree' is mapped to an empty string to mark it as missing.
    -- local_province_name: 'New Yorl' is a misspelling of 'New York', likely a data entry error in which 'k' was typed as 'l', so it is corrected to 'New York'.
    SELECT
        "location_id",
        "is_active",
        "primary_address",
        "secondary_address",
        CASE
            WHEN "city" = 'Tree' THEN ''
            ELSE "city"
        END AS "city",
        "country_code",
        "creation_timestamp",
        "is_legacy",
        "location_name",
        "phone_number",
        "province_state",
        "last_update_timestamp",
        "postal_code",
        "iso_country_code",
        "country_name",
        "local_country_name",
        CASE
            WHEN "local_province_name" = 'New Yorl' THEN 'New York'
            ELSE "local_province_name"
        END AS "local_province_name",
        "province_state_code",
        "is_deleted"
    FROM "shopify_location_data_projected_renamed"
),

"shopify_location_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- city: ['']
    SELECT 
        CASE
            WHEN "city" = '' THEN NULL
            ELSE "city"
        END AS "city",
        "is_deleted",
        "location_name",
        "is_active",
        "province_state",
        "is_legacy",
        "local_province_name",
        "country_name",
        "province_state_code",
        "postal_code",
        "primary_address",
        "iso_country_code",
        "location_id",
        "secondary_address",
        "local_country_name",
        "phone_number",
        "country_code",
        "creation_timestamp",
        "last_update_timestamp"
    FROM "shopify_location_data_projected_renamed_cleaned"
),

"shopify_location_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- last_update_timestamp: from VARCHAR to TIMESTAMP
    -- phone_number: from DECIMAL to VARCHAR
    -- postal_code: from DECIMAL to VARCHAR
    -- secondary_address: from DECIMAL to VARCHAR
    SELECT
        "city",
        "is_deleted",
        "location_name",
        "is_active",
        "province_state",
        "is_legacy",
        "local_province_name",
        "country_name",
        "province_state_code",
        "primary_address",
        "iso_country_code",
        "location_id",
        "local_country_name",
        "country_code",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST("last_update_timestamp" AS TIMESTAMP) AS "last_update_timestamp",
        CAST("phone_number" AS VARCHAR) AS "phone_number",
        CAST("postal_code" AS VARCHAR) AS "postal_code",
        CAST("secondary_address" AS VARCHAR) AS "secondary_address"
    FROM "shopify_location_data_projected_renamed_cleaned_null"
),

"shopify_location_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 7 columns with unacceptable missing values
    -- city has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- local_province_name has 50.0 percent missing. Strategy: 🔄 Unchanged
    -- phone_number has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- postal_code has 50.0 percent missing. Strategy: 🔄 Unchanged
    -- primary_address has 50.0 percent missing. Strategy: 🔄 Unchanged
    -- province_state has 50.0 percent missing. Strategy: 🔄 Unchanged
    -- province_state_code has 50.0 percent missing. Strategy: 🔄 Unchanged
    SELECT
        "is_deleted",
        "location_name",
        "is_active",
        "province_state",
        "is_legacy",
        "local_province_name",
        "country_name",
        "province_state_code",
        "primary_address",
        "iso_country_code",
        "location_id",
        "local_country_name",
        "country_code",
        "creation_timestamp",
        "last_update_timestamp",
        "postal_code",
        "secondary_address"
    FROM "shopify_location_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_location_data_projected_renamed_cleaned_null_casted_missing_handled"
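
The string cleaning and NULL imputation steps in the SQL above chain together: 'Tree' becomes an empty string, and the empty string then becomes NULL, which is why city ends up fully missing and is dropped. A minimal Python sketch of the same two CASE expressions follows; the function names are hypothetical and None stands in for SQL NULL.

```python
def clean_city(value):
    """'Tree' -> '' (cleaning step), then '' -> None (NULL imputation step)."""
    if value == "Tree":
        value = ""
    return None if value == "" else value


def clean_local_province_name(value):
    """Correct the known misspelling and leave every other value unchanged."""
    return "New York" if value == "New Yorl" else value
```

As in the SQL, a value that is already missing passes through both functions unchanged.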

stg_shopify_location_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_location_data
  description: The table contains details about Shopify store locations. It includes
    information such as location ID, name, address, province or state, country, postal
    code, and status (active/inactive). Each row represents a unique store location
    with its associated attributes. Address fields are empty for some locations, which
    may indicate locations without a physical address.
  columns:
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: location_name
    description: Name of the store location
    tests:
    - not_null
  - name: is_active
    description: Indicates if the location is currently active
    tests:
    - not_null
  - name: province_state
    description: Province or state of the location
    tests:
    - not_null
    - accepted_values:
        values:
        - AL
        - AK
        - AZ
        - AR
        - CA
        - CO
        - CT
        - DE
        - FL
        - GA
        - HI
        - ID
        - IL
        - IN
        - IA
        - KS
        - KY
        - LA
        - ME
        - MD
        - MA
        - MI
        - MN
        - MS
        - MO
        - MT
        - NE
        - NV
        - NH
        - NJ
        - NM
        - NY
        - NC
        - ND
        - OH
        - OK
        - OR
        - PA
        - RI
        - SC
        - SD
        - TN
        - TX
        - UT
        - VT
        - VA
        - WA
        - WV
        - WI
        - WY
  - name: is_legacy
    description: Indicates if the location is a legacy entry
    tests:
    - not_null
  - name: local_province_name
    description: Province name in local language
    tests:
    - not_null
  - name: country_name
    description: Full name of the country
    tests:
    - not_null
  - name: province_state_code
    description: Code for the province or state
    tests:
    - not_null
  - name: primary_address
    description: Primary address line of the location
    tests:
    - not_null
  - name: iso_country_code
    description: ISO country code of the location
    tests:
    - not_null
  - name: location_id
    description: Unique identifier for the location
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the location.
        For this table, each row is for a unique store location. Given it's explicitly
        described as a unique identifier, it should be unique across rows.
  - name: local_country_name
    description: Country name in local language
    tests:
    - not_null
  - name: country_code
    description: Country code where the location is situated
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp when the location was created
    tests:
    - not_null
  - name: last_update_timestamp
    description: Timestamp when the location was last updated
    tests:
    - not_null
  - name: postal_code
    description: Postal or ZIP code of the location
    tests:
    - not_null
  - name: secondary_address
    description: Secondary address line of the location
    cocoon_meta:
      missing_acceptable: Not all locations have or need a secondary address.

stg_shopify_product_tag_data (first 100 rows)

tag_id tag_value product_id
0 9 Type: Clothing 1234
1 5 Final Sale 1234
2 7 Sale 1234
3 8 StyleID: Nice 1234
4 3 Collection: Bottoms 1234

stg_shopify_product_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_product_tag_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "product_id",
        "value_"
    FROM "shopify_product_tag_data"
),

"shopify_product_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> tag_id
    -- value_ -> tag_value
    SELECT 
        "index_" AS "tag_id",
        "product_id",
        "value_" AS "tag_value"
    FROM "shopify_product_tag_data_projected"
),

"shopify_product_tag_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- tag_value: The problem is inconsistent formatting and an outlier value. Most values use a colon followed by a space to separate categories, except for "StyleID:nice" which lacks a space after the colon. "Final Sale" and "Sale" don't follow the category:value pattern at all. The correct values should follow the "Category: Value" format consistently, or be a single descriptive term for sales items. 
    SELECT
        "tag_id",
        "product_id",
        CASE
            WHEN "tag_value" = 'StyleID:nice' THEN 'StyleID: Nice'
            ELSE "tag_value"
        END AS "tag_value"
    FROM "shopify_product_tag_data_projected_renamed"
),

"shopify_product_tag_data_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- product_id: from INT to VARCHAR
    SELECT
        "tag_id",
        "tag_value",
        CAST("product_id" AS VARCHAR) AS "product_id"
    FROM "shopify_product_tag_data_projected_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_product_tag_data_projected_renamed_cleaned_casted"
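
The cleaning step above fixes one malformed tag by exact match and passes every other value through. A rough Python equivalent is sketched below. `TAG_FIXES` and `clean_tag_value` mirror the CASE expression; `split_tag` is a hypothetical extra that separates "Category: Value" tags into two parts and is not something the staging model does.

```python
from __future__ import annotations

# Exact-match fixes, mirroring the CASE expression in the SQL above.
TAG_FIXES = {"StyleID:nice": "StyleID: Nice"}


def clean_tag_value(tag_value: str) -> str:
    """Replace known malformed tags; leave every other value untouched."""
    return TAG_FIXES.get(tag_value, tag_value)


def split_tag(tag_value: str) -> tuple[str | None, str]:
    """Split 'Category: Value' into (category, value).

    Tags without a colon, such as 'Sale', come back as (None, tag).
    Illustration only: the staging model keeps a single tag_value column.
    """
    category, sep, value = tag_value.partition(":")
    if not sep:
        return None, tag_value
    return category.strip(), value.strip()


if __name__ == "__main__":
    for raw in ["Type: Clothing", "Final Sale", "StyleID:nice"]:
        print(split_tag(clean_tag_value(raw)))
```

An exact-match table keeps the fix auditable, at the cost of missing any new malformed value that shows up later.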

stg_shopify_product_tag_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_product_tag_data
  description: The table is about product tags in a Shopify system. It contains product
    IDs and associated tag values. Each product can have multiple tags. Tags include
    information like product type, sale status, style ID, and collection category.
    The table allows for flexible categorization and labeling of products.
  columns:
  - name: tag_id
    description: Unique identifier for each tag entry
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each tag entry. For
        this table, each row represents a specific tag associated with a product.
        The tag_id appears to be unique across rows, as it's described as a "Unique
        identifier for each tag entry".
  - name: tag_value
    description: The actual tag content or description
    tests:
    - not_null
  - name: product_id
    description: Identifier for the product associated with the tag
    tests:
    - not_null

stg_shopify_tax_line_data (first 100 rows)

row_id tax_amount tax_rate tax_type order_line_id tax_price_set
0 1 0.0 0.0 VAT 29227 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 1 0.0 0.0 VAT 1839083 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 1 0.0 0.0 VAT 11995 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
3 1 0.0 0.0 VAT 10751 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
4 1 0.0 0.0 VAT 194763 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}

stg_shopify_tax_line_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_tax_line_data_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "order_line_id",
        "price",
        "rate",
        "title",
        "price_set"
    FROM "shopify_tax_line_data"
),

"shopify_tax_line_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> row_id
    -- price -> tax_amount
    -- rate -> tax_rate
    -- title -> tax_type
    -- price_set -> tax_price_set
    SELECT 
        "index_" AS "row_id",
        "order_line_id",
        "price" AS "tax_amount",
        "rate" AS "tax_rate",
        "title" AS "tax_type",
        "price_set" AS "tax_price_set"
    FROM "shopify_tax_line_data_projected"
),

"shopify_tax_line_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- order_line_id: from INT to VARCHAR
    -- tax_price_set: from VARCHAR to JSON
    SELECT
        "row_id",
        "tax_amount",
        "tax_rate",
        "tax_type",
        CAST("order_line_id" AS VARCHAR) AS "order_line_id",
        CAST("tax_price_set" AS JSON) AS "tax_price_set"
    FROM "shopify_tax_line_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_tax_line_data_projected_renamed_casted"
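
The `tax_price_set` column is cast to JSON, and the sample rows show a `shop_money` and a `presentment_money` entry, each with an `amount` string and a `currency_code`. A small Python sketch of reading one entry is given below, assuming every row has the same shape as the sample; the helper name `money_from_price_set` is hypothetical.

```python
import json
from decimal import Decimal

# One tax_price_set value copied from the sample rows above.
SAMPLE_TAX_PRICE_SET = (
    '{"shop_money":{"amount":"0.00","currency_code":"USD"},'
    '"presentment_money":{"amount":"0.00","currency_code":"USD"}}'
)


def money_from_price_set(price_set_json, key="shop_money"):
    """Return (amount, currency_code) for one entry of a price set.

    The amount is parsed as Decimal to avoid float rounding on money.
    """
    money = json.loads(price_set_json)[key]
    return Decimal(money["amount"]), money["currency_code"]


if __name__ == "__main__":
    print(money_from_price_set(SAMPLE_TAX_PRICE_SET))
    print(money_from_price_set(SAMPLE_TAX_PRICE_SET, "presentment_money"))
```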

stg_shopify_tax_line_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_tax_line_data
  description: The table is about tax information for Shopify order lines. It includes
    details such as order line ID, tax amount, tax rate, tax type (always "VAT" in
    the samples), and a price set with shop and presentment money amounts. All sample
    entries show zero tax, suggesting these may be tax-exempt transactions or orders
    from regions without applicable taxes.
  columns:
  - name: row_id
    description: Identifier for the table row
    tests:
    - not_null
  - name: tax_amount
    description: Tax amount for the order line
    tests:
    - not_null
  - name: tax_rate
    description: Tax rate applied to the order line
    tests:
    - not_null
  - name: tax_type
    description: Type of tax applied
    tests:
    - not_null
    - accepted_values:
        values:
        - VAT
        - Sales Tax
        - Income Tax
        - Property Tax
        - Capital Gains Tax
        - Corporate Tax
        - Excise Tax
        - Payroll Tax
        - Estate Tax
        - Gift Tax
        - Customs Duty
        - Stamp Duty
        - Wealth Tax
        - Carbon Tax
        - Sin Tax
        - Withholding Tax
  - name: order_line_id
    description: Unique identifier for the order line
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each order line. For this
        table, each row represents a tax entry for an order line. The order_line_id
        appears to be unique across rows, as each value in the sample is different.
  - name: tax_price_set
    description: Detailed price information in different currencies
    tests:
    - not_null

stg_shopify_inventory_level_data (first 100 rows)

inventory_item_id location_id
0 780939 287748
1 6027 287748
2 515 28748

stg_shopify_inventory_level_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_inventory_level_data_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "inventory_item_id",
        "location_id",
        "available",
        "updated_at"
    FROM "shopify_inventory_level_data"
),

"shopify_inventory_level_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- available -> quantity_available
    -- updated_at -> last_updated
    SELECT 
        "inventory_item_id",
        "location_id",
        "available" AS "quantity_available",
        "updated_at" AS "last_updated"
    FROM "shopify_inventory_level_data_projected"
),

"shopify_inventory_level_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- last_updated: from DECIMAL to TIMESTAMP
    -- quantity_available: from DECIMAL to INT
    SELECT
        "inventory_item_id",
        "location_id",
        CAST("last_updated" AS TIMESTAMP) AS "last_updated",
        CAST("quantity_available" AS INT) AS "quantity_available"
    FROM "shopify_inventory_level_data_projected_renamed"
),

"shopify_inventory_level_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- last_updated has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- quantity_available has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "inventory_item_id",
        "location_id"
    FROM "shopify_inventory_level_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_inventory_level_data_projected_renamed_casted_missing_handled"

stg_shopify_inventory_level_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_inventory_level_data
  description: The table is about inventory levels in a Shopify store. Each row
    represents a specific inventory item at a particular location. The source also
    carries available quantities and update timestamps, but both columns are entirely
    missing and are dropped, so only the item and location identifiers remain.
  columns:
  - name: inventory_item_id
    description: Unique identifier for the inventory item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each inventory item.
        For this table, each row represents a specific inventory item at a particular
        location, and every inventory_item_id in the sample is distinct. If the same
        item is stocked at several locations, inventory_item_id alone would repeat
        and the pair of inventory_item_id and location_id would identify a row.
  - name: location_id
    description: Unique identifier for the store location
    tests:
    - not_null
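
Whether `inventory_item_id` is unique on its own depends on whether an item can appear at more than one location. The Python sketch below checks both candidate keys against the three sample rows and against one extra row that is purely hypothetical (item 515 at a second location). The helper name `duplicated_keys` is illustrative.

```python
from collections import Counter

# (inventory_item_id, location_id) pairs from the sample rows above.
SAMPLE_LEVELS = [
    (780939, 287748),
    (6027, 287748),
    (515, 28748),
]


def duplicated_keys(rows, key_indexes):
    """Return key values that occur more than once for the chosen columns."""
    counts = Counter(tuple(row[i] for i in key_indexes) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # In the sample, inventory_item_id alone has no duplicates.
    print(duplicated_keys(SAMPLE_LEVELS, [0]))
    # Hypothetical row: the same item stocked at a second location.
    extended = SAMPLE_LEVELS + [(515, 287748)]
    print(duplicated_keys(extended, [0]))     # item id now repeats
    print(duplicated_keys(extended, [0, 1]))  # the pair is still unique
```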

stg_shopify_abandoned_checkout_shipping_line_data (first 100 rows)

shipping_option_order shipping_method_code shipping_line_id shipping_markup shipping_price shipping_option_source shipping_option_title original_shop_markup original_shop_price display_title api_client_id carrier_identifier carrier_service_id checkout_id delivery_category delivery_expectation_range delivery_expectation_type discounted_price fulfillment_service_id max_delivery_days min_delivery_days shipping_phone
0 1 Standard c3ce0972c2e30eaf7001bea 0.0 0.0 shopify Standard 0.0 0.0 Standard None None None 653675 None None None None None None None None
1 1 Standard bf7c90953344902c13 0.0 0.0 shopify Standard 0.0 0.0 Standard None None None 379 None None None None None None None None
2 1 Standard 519ff4275cd972e282db 0.0 0.0 shopify Standard 0.0 0.0 Standard None None None 635 None None None None None None None None
3 1 Standard 8d18671d481ad46a 0.0 0.0 shopify Standard 0.0 0.0 Standard None None None 3211 None None None None None None None None
4 1 Standard 8f2fab1b455ec9e597 0.0 0.0 shopify Standard 0.0 0.0 Standard None None None 381227 None None None None None None None None

stg_shopify_abandoned_checkout_shipping_line_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_abandoned_checkout_shipping_line_data_projected" AS (
    -- Projection: Selecting 23 out of 24 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "checkout_id",
        "index_",
        "api_client_id",
        "carrier_identifier",
        "carrier_service_id",
        "code",
        "delivery_category",
        "discounted_price",
        "id",
        "markup",
        "phone",
        "price",
        "requested_fulfillment_service_id",
        "source",
        "title",
        "validation_context",
        "delivery_expectation_range",
        "delivery_expectation_type",
        "original_shop_markup",
        "original_shop_price",
        "presentment_title",
        "delivery_expectation_range_min",
        "delivery_expectation_range_max"
    FROM "shopify_abandoned_checkout_shipping_line_data"
),

"shopify_abandoned_checkout_shipping_line_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> shipping_option_order
    -- code -> shipping_method_code
    -- id -> shipping_line_id
    -- markup -> shipping_markup
    -- phone -> shipping_phone
    -- price -> shipping_price
    -- requested_fulfillment_service_id -> fulfillment_service_id
    -- source -> shipping_option_source
    -- title -> shipping_option_title
    -- presentment_title -> display_title
    -- delivery_expectation_range_min -> min_delivery_days
    -- delivery_expectation_range_max -> max_delivery_days
    SELECT 
        "checkout_id",
        "index_" AS "shipping_option_order",
        "api_client_id",
        "carrier_identifier",
        "carrier_service_id",
        "code" AS "shipping_method_code",
        "delivery_category",
        "discounted_price",
        "id" AS "shipping_line_id",
        "markup" AS "shipping_markup",
        "phone" AS "shipping_phone",
        "price" AS "shipping_price",
        "requested_fulfillment_service_id" AS "fulfillment_service_id",
        "source" AS "shipping_option_source",
        "title" AS "shipping_option_title",
        "validation_context",
        "delivery_expectation_range",
        "delivery_expectation_type",
        "original_shop_markup",
        "original_shop_price",
        "presentment_title" AS "display_title",
        "delivery_expectation_range_min" AS "min_delivery_days",
        "delivery_expectation_range_max" AS "max_delivery_days"
    FROM "shopify_abandoned_checkout_shipping_line_data_projected"
),

"shopify_abandoned_checkout_shipping_line_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- api_client_id: from DECIMAL to VARCHAR
    -- carrier_identifier: from DECIMAL to VARCHAR
    -- carrier_service_id: from DECIMAL to VARCHAR
    -- checkout_id: from INT to VARCHAR
    -- delivery_category: from DECIMAL to VARCHAR
    -- delivery_expectation_range: from DECIMAL to VARCHAR
    -- delivery_expectation_type: from DECIMAL to VARCHAR
    -- discounted_price: from DECIMAL to VARCHAR
    -- fulfillment_service_id: from DECIMAL to VARCHAR
    -- max_delivery_days: from DECIMAL to VARCHAR
    -- min_delivery_days: from DECIMAL to VARCHAR
    -- shipping_phone: from DECIMAL to VARCHAR
    -- validation_context: from DECIMAL to VARCHAR
    SELECT
        "shipping_option_order",
        "shipping_method_code",
        "shipping_line_id",
        "shipping_markup",
        "shipping_price",
        "shipping_option_source",
        "shipping_option_title",
        "original_shop_markup",
        "original_shop_price",
        "display_title",
        CAST("api_client_id" AS VARCHAR) AS "api_client_id",
        CAST("carrier_identifier" AS VARCHAR) AS "carrier_identifier",
        CAST("carrier_service_id" AS VARCHAR) AS "carrier_service_id",
        CAST("checkout_id" AS VARCHAR) AS "checkout_id",
        CAST("delivery_category" AS VARCHAR) AS "delivery_category",
        CAST("delivery_expectation_range" AS VARCHAR) AS "delivery_expectation_range",
        CAST("delivery_expectation_type" AS VARCHAR) AS "delivery_expectation_type",
        CAST("discounted_price" AS VARCHAR) AS "discounted_price",
        CAST("fulfillment_service_id" AS VARCHAR) AS "fulfillment_service_id",
        CAST("max_delivery_days" AS VARCHAR) AS "max_delivery_days",
        CAST("min_delivery_days" AS VARCHAR) AS "min_delivery_days",
        CAST("shipping_phone" AS VARCHAR) AS "shipping_phone",
        CAST("validation_context" AS VARCHAR) AS "validation_context"
    FROM "shopify_abandoned_checkout_shipping_line_data_projected_renamed"
),

"shopify_abandoned_checkout_shipping_line_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There is 1 column with unacceptable missing values
    -- validation_context has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "shipping_option_order",
        "shipping_method_code",
        "shipping_line_id",
        "shipping_markup",
        "shipping_price",
        "shipping_option_source",
        "shipping_option_title",
        "original_shop_markup",
        "original_shop_price",
        "display_title",
        "api_client_id",
        "carrier_identifier",
        "carrier_service_id",
        "checkout_id",
        "delivery_category",
        "delivery_expectation_range",
        "delivery_expectation_type",
        "discounted_price",
        "fulfillment_service_id",
        "max_delivery_days",
        "min_delivery_days",
        "shipping_phone"
    FROM "shopify_abandoned_checkout_shipping_line_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_abandoned_checkout_shipping_line_data_projected_renamed_casted_missing_handled"

stg_shopify_abandoned_checkout_shipping_line_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_abandoned_checkout_shipping_line_data
  description: The table is about shipping details for abandoned Shopify checkouts.
    It includes checkout ID, shipping method details, pricing information, and delivery
    expectations. All rows show "Standard" shipping with no cost. The data seems to
    capture basic shipping line information for checkouts that were not completed.
  columns:
  - name: shipping_option_order
    description: Order of the shipping option
    tests:
    - not_null
  - name: shipping_method_code
    description: Shipping method code
    tests:
    - not_null
    - accepted_values:
        values:
        - Standard
        - Express
        - Overnight
        - Two-Day
        - Ground
        - Priority
        - Economy
        - International
        - Local
        - Same-Day
        - Freight
  - name: shipping_line_id
    description: Unique identifier for the shipping line
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each shipping line.
        For this table, each row represents a shipping option for an abandoned checkout.
        The shipping_line_id appears to be unique across rows, as it's a specific
        identifier for each shipping line.
  - name: shipping_markup
    description: Additional charge on top of shipping cost
    tests:
    - not_null
  - name: shipping_price
    description: Price of the shipping option
    tests:
    - not_null
  - name: shipping_option_source
    description: Source of the shipping option
    tests:
    - not_null
    - accepted_values:
        values:
        - shopify
        - manual
        - third_party_api
        - carrier_calculated
        - flat_rate
        - weight_based
        - local_delivery
        - pickup
        - free_shipping
        - real_time
        - custom
  - name: shipping_option_title
    description: Title of the shipping option
    tests:
    - not_null
    - accepted_values:
        values:
        - Standard
        - Express
        - Overnight
        - Two-Day
        - Economy
        - Priority
        - Same-Day
        - Free
        - Flat Rate
        - International
        - Local Pickup
  - name: original_shop_markup
    description: Original markup set by the shop
    tests:
    - not_null
  - name: original_shop_price
    description: Original price set by the shop
    tests:
    - not_null
  - name: display_title
    description: Display title for the shipping option
    tests:
    - not_null
    - accepted_values:
        values:
        - Standard
        - Express
        - Overnight
        - Two-Day
        - Economy
        - Same-Day
        - Priority
        - First Class
        - Ground
        - International
  - name: api_client_id
    description: API client identifier
    cocoon_meta:
      missing_acceptable: Not needed for standard internal shipping method
  - name: carrier_identifier
    description: Shipping carrier identifier
    cocoon_meta:
      missing_acceptable: Not applicable for standard internal shipping
  - name: carrier_service_id
    description: Unique ID for carrier service
    cocoon_meta:
      missing_acceptable: Not used for standard internal shipping
  - name: checkout_id
    description: Unique identifier for the checkout
    tests:
    - not_null
  - name: delivery_category
    description: Category of delivery service
    cocoon_meta:
      missing_acceptable: Not relevant for standard shipping option
  - name: delivery_expectation_range
    description: Expected delivery timeframe
    cocoon_meta:
      missing_acceptable: Not specified for standard shipping
  - name: delivery_expectation_type
    description: Type of delivery expectation
    cocoon_meta:
      missing_acceptable: Not defined for standard shipping
  - name: discounted_price
    description: Price after applying discounts
    cocoon_meta:
      missing_acceptable: No discount applied to standard shipping
  - name: fulfillment_service_id
    description: ID of requested fulfillment service
    cocoon_meta:
      missing_acceptable: Not used for standard internal shipping
  - name: max_delivery_days
    description: Maximum days for expected delivery
    cocoon_meta:
      missing_acceptable: Not specified for standard shipping
  - name: min_delivery_days
    description: Minimum days for expected delivery
    cocoon_meta:
      missing_acceptable: Not specified for standard shipping
  - name: shipping_phone
    description: Contact phone number for shipping
    cocoon_meta:
      missing_acceptable: Not required for standard shipping method
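
Several columns above carry an `accepted_values` list, while only `shopify` and `Standard` actually appear in the sample rows. The Python sketch below shows roughly what such a check amounts to: any non-null value outside the list is reported. Letting nulls pass is an assumption of this sketch, and the helper name `unexpected_values` is hypothetical.

```python
# Accepted values for shipping_option_source, copied from the YAML above.
ACCEPTED_SOURCES = [
    "shopify", "manual", "third_party_api", "carrier_calculated",
    "flat_rate", "weight_based", "local_delivery", "pickup",
    "free_shipping", "real_time", "custom",
]


def unexpected_values(values, accepted):
    """Return the distinct non-null values that are not in the accepted list."""
    allowed = set(accepted)
    return sorted({v for v in values if v is not None and v not in allowed})


if __name__ == "__main__":
    # The five sample rows all have shipping_option_source = 'shopify'.
    print(unexpected_values(["shopify"] * 5, ACCEPTED_SOURCES))
    # The comparison is case-sensitive, so 'Shopify' would be flagged.
    print(unexpected_values(["shopify", "Shopify", None], ACCEPTED_SOURCES))
```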

stg_shopify_order_note_attribute_data (first 100 rows)

attribute_name attribute_value order_id
0 last_name "1418143823.1643992155" 34171115
1 first_name "fb.1.1643992155109.1110590605" 34171115
2 updated_at "1643992163253" 34171115
3 clientID "a03d3118-4048-4159-b5bb-1b90d8abb69b" 34171115
4 name "22707603636395" 34171115

stg_shopify_order_note_attribute_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_note_attribute_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "name",
        "order_id",
        "value_"
    FROM "shopify_order_note_attribute_data"
),

"shopify_order_note_attribute_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- name -> attribute_name
    -- value_ -> attribute_value
    SELECT 
        "name" AS "attribute_name",
        "order_id",
        "value_" AS "attribute_value"
    FROM "shopify_order_note_attribute_data_projected"
),

"shopify_order_note_attribute_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- order_id: from INT to VARCHAR
    SELECT
        "attribute_name",
        "attribute_value",
        CAST("order_id" AS VARCHAR) AS "order_id"
    FROM "shopify_order_note_attribute_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_note_attribute_data_projected_renamed_casted"
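
The staging model keeps note attributes in a long format, one row per attribute name and value. For downstream use it is often easier to gather them into one record per order. The Python sketch below does that for the sample rows, assuming the double quotes around each sample value mean the values are JSON-encoded strings; the helper name `attributes_by_order` is hypothetical.

```python
import json
from collections import defaultdict

# (attribute_name, attribute_value, order_id) from the sample rows above.
SAMPLE_ATTRIBUTES = [
    ("last_name", '"1418143823.1643992155"', "34171115"),
    ("first_name", '"fb.1.1643992155109.1110590605"', "34171115"),
    ("updated_at", '"1643992163253"', "34171115"),
    ("clientID", '"a03d3118-4048-4159-b5bb-1b90d8abb69b"', "34171115"),
    ("name", '"22707603636395"', "34171115"),
]


def attributes_by_order(rows):
    """Group name/value rows into one dict of attributes per order_id."""
    grouped = defaultdict(dict)
    for name, raw_value, order_id in rows:
        try:
            value = json.loads(raw_value)  # assumed JSON-encoded string
        except json.JSONDecodeError:
            value = raw_value  # fall back to the raw text
        grouped[order_id][name] = value
    return dict(grouped)


if __name__ == "__main__":
    for order_id, attrs in attributes_by_order(SAMPLE_ATTRIBUTES).items():
        print(order_id, attrs)
```

If two rows share the same order and attribute name, the later row wins in this sketch.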

stg_shopify_order_note_attribute_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_note_attribute_data
  description: The table is about Shopify order attributes. It contains various details
    related to a specific order, identified by the order_id. The attributes include
    customer information (first name, last name), order-specific data (updated timestamp,
    clientID), and possibly product information (name attribute with a numeric value).
    Each row represents a different attribute for the same order.
  columns:
  - name: attribute_name
    description: Attribute name or type of information
    tests:
    - not_null
  - name: attribute_value
    description: Corresponding value for the attribute
    tests:
    - not_null
  - name: order_id
    description: Unique identifier for the Shopify order
    tests:
    - not_null

stg_shopify_product_variant_data (first 100 rows)

title display_position inventory_policy fulfillment_service inventory_management is_taxable weight_grams stock_quantity weight_unit previous_stock_quantity requires_shipping tax_code option_1 created_at image_id inventory_item_id price product_id updated_at variant_id weight
0 my title here 1 deny manual None False 0 0 lb 0 False None my title here 2021-03-08 16:30:15 None 41356021661767 111 6540108431431 2021-04-12 19:49:43 39262114414663 0.0
1 my title here 1 deny manual None False 0 0 lb 0 False None my title here 2021-03-17 16:39:45 None 41367035936839 222 6544066379847 2021-04-12 19:46:59 39273118957639 0.0
2 my title here 1 deny manual inventory manager True 0 0 lb 0 True None my title here 2021-03-30 19:48:15 None 41384094924871 5 6548438188103 2021-03-30 19:48:15 39290169262151 0.0
3 my title here 1 deny manual None False 0 -5 lb -5 False None my title here 2021-03-08 16:31:31 None 41356022644807 333 6540109250631 2021-04-12 19:47:26 39262115397703 0.0
4 my other title 1 deny manual inventory manager True 222 0 lb 0 True TR9999 my other title 2019-06-25 18:32:03 None 30309980143686 444 3879735590982 2019-10-01 23:40:09 29217058947142 1.0

stg_shopify_product_variant_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_product_variant_data_projected" AS (
    -- Projection: Selecting 26 out of 27 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "product_id",
        "inventory_item_id",
        "title",
        "price",
        "sku",
        "position_",
        "inventory_policy",
        "compare_at_price",
        "fulfillment_service",
        "inventory_management",
        "created_at",
        "updated_at",
        "taxable",
        "barcode",
        "grams",
        "image_id",
        "inventory_quantity",
        "weight",
        "weight_unit",
        "old_inventory_quantity",
        "requires_shipping",
        "option_2",
        "tax_code",
        "option_3",
        "option_1"
    FROM "shopify_product_variant_data"
),

"shopify_product_variant_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> variant_id
    -- position_ -> display_position
    -- compare_at_price -> original_price
    -- taxable -> is_taxable
    -- grams -> weight_grams
    -- inventory_quantity -> stock_quantity
    -- old_inventory_quantity -> previous_stock_quantity
    SELECT 
        "id" AS "variant_id",
        "product_id",
        "inventory_item_id",
        "title",
        "price",
        "sku",
        "position_" AS "display_position",
        "inventory_policy",
        "compare_at_price" AS "original_price",
        "fulfillment_service",
        "inventory_management",
        "created_at",
        "updated_at",
        "taxable" AS "is_taxable",
        "barcode",
        "grams" AS "weight_grams",
        "image_id",
        "inventory_quantity" AS "stock_quantity",
        "weight",
        "weight_unit",
        "old_inventory_quantity" AS "previous_stock_quantity",
        "requires_shipping",
        "option_2",
        "tax_code",
        "option_3",
        "option_1"
    FROM "shopify_product_variant_data_projected"
),

"shopify_product_variant_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- barcode: from DECIMAL to VARCHAR
    -- created_at: from VARCHAR to TIMESTAMP
    -- image_id: from DECIMAL to VARCHAR
    -- inventory_item_id: from INT to VARCHAR
    -- option_2: from DECIMAL to VARCHAR
    -- option_3: from DECIMAL to VARCHAR
    -- original_price: from DECIMAL to VARCHAR
    -- price: from INT to VARCHAR
    -- product_id: from INT to VARCHAR
    -- sku: from DECIMAL to VARCHAR
    -- updated_at: from VARCHAR to TIMESTAMP
    -- variant_id: from INT to VARCHAR
    -- weight: from INT to DECIMAL
    SELECT
        "title",
        "display_position",
        "inventory_policy",
        "fulfillment_service",
        "inventory_management",
        "is_taxable",
        "weight_grams",
        "stock_quantity",
        "weight_unit",
        "previous_stock_quantity",
        "requires_shipping",
        "tax_code",
        "option_1",
        CAST("barcode" AS VARCHAR) AS "barcode",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("image_id" AS VARCHAR) AS "image_id",
        CAST("inventory_item_id" AS VARCHAR) AS "inventory_item_id",
        CAST("option_2" AS VARCHAR) AS "option_2",
        CAST("option_3" AS VARCHAR) AS "option_3",
        CAST("original_price" AS VARCHAR) AS "original_price",
        CAST("price" AS VARCHAR) AS "price",
        CAST("product_id" AS VARCHAR) AS "product_id",
        CAST("sku" AS VARCHAR) AS "sku",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at",
        CAST("variant_id" AS VARCHAR) AS "variant_id",
        CAST("weight" AS DECIMAL) AS "weight"
    FROM "shopify_product_variant_data_projected_renamed"
),

"shopify_product_variant_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 7 columns with unacceptable missing values
    -- barcode has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- inventory_management has 60.0 percent missing. Strategy: 🔄 Unchanged
    -- option_2 has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- option_3 has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- original_price has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- sku has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tax_code has 80.0 percent missing. Strategy: 🔄 Unchanged
    SELECT
        "title",
        "display_position",
        "inventory_policy",
        "fulfillment_service",
        "inventory_management",
        "is_taxable",
        "weight_grams",
        "stock_quantity",
        "weight_unit",
        "previous_stock_quantity",
        "requires_shipping",
        "tax_code",
        "option_1",
        "created_at",
        "image_id",
        "inventory_item_id",
        "price",
        "product_id",
        "updated_at",
        "variant_id",
        "weight"
    FROM "shopify_product_variant_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_product_variant_data_projected_renamed_casted_missing_handled"
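
The missing-value percentages quoted in the comments above (for example 60.0 percent for inventory_management and 80.0 percent for tax_code) can be reproduced with a simple aggregate. The following is a minimal sketch, not the query Cocoon itself runs. The toy_variants table and its five rows are made up for illustration, and the syntax is assumed to work on DuckDB-style engines that accept a column list on a CTE.

```sql
-- Hypothetical toy data standing in for the casted variant table.
WITH "toy_variants" ("variant_id", "inventory_management", "tax_code") AS (
    VALUES
        ('1', 'shopify', 'A1'),
        ('2', 'shopify', NULL),
        ('3', NULL,      NULL),
        ('4', NULL,      NULL),
        ('5', NULL,      NULL)
)
SELECT
    -- COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the number of missing values.
    100.0 * (COUNT(*) - COUNT("inventory_management")) / COUNT(*) AS "inventory_management_pct_missing",
    100.0 * (COUNT(*) - COUNT("tax_code")) / COUNT(*) AS "tax_code_pct_missing"
FROM "toy_variants";
```

On the toy rows this gives 60 percent and 80 percent. Pointing the same SELECT at the real casted table would show whether the documented percentages still hold after a new sync.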

stg_shopify_product_variant_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_product_variant_data
  description: The table is about Shopify product variants. It contains details like
    variant ID, product ID, price, inventory information, creation and update
    timestamps, shipping requirements, and tax status. Each row represents a specific
    variant of a product, with attributes such as title, price, weight, and inventory
    quantity. The SKU, barcode, and second and third option columns were entirely
    empty in the source and are not part of this model. The table likely serves as
    a central record for managing product variants in a Shopify e-commerce system.
  columns:
  - name: title
    description: Title or name of the variant
    tests:
    - not_null
  - name: display_position
    description: Position of the variant in listings
    tests:
    - not_null
  - name: inventory_policy
    description: Policy for handling out-of-stock items
    tests:
    - not_null
    - accepted_values:
        values:
        - deny
        - backorder
        - substitute
        - notify
        - waitlist
  - name: fulfillment_service
    description: Service used for order fulfillment
    tests:
    - not_null
    - accepted_values:
        values:
        - manual
        - amazon
        - shipwire
        - webgistix
        - shipstation
        - shopify_fulfillment
        - third_party
        - self_fulfilled
        - drop_ship
        - fba (Fulfillment by Amazon)
        - external
  - name: inventory_management
    description: Method used for inventory management
    tests:
    - not_null
    - accepted_values:
        values:
        - inventory manager
        - just-in-time (JIT)
        - economic order quantity (EOQ)
        - abc analysis
        - first-in, first-out (FIFO)
        - last-in, first-out (LIFO)
        - safety stock
        - vendor-managed inventory (VMI)
        - consignment inventory
        - dropshipping
        - perpetual inventory system
        - periodic inventory system
        - barcode system
        - radio-frequency identification (RFID)
        - cycle counting
        - min-max inventory method
        - reorder point planning
        - materials requirement planning (MRP)
        - batch tracking
        - demand forecasting
  - name: is_taxable
    description: Indicates if the variant is taxable
    tests:
    - not_null
  - name: weight_grams
    description: Weight of the product in grams
    tests:
    - not_null
  - name: stock_quantity
    description: Current quantity in stock
    tests:
    - not_null
  - name: weight_unit
    description: Unit of measurement for weight
    tests:
    - not_null
    - accepted_values:
        values:
        - lb
        - kg
        - g
        - oz
        - stone
        - ton
        - metric ton
        - mg
  - name: previous_stock_quantity
    description: Previous quantity in stock
    tests:
    - not_null
  - name: requires_shipping
    description: Indicates if shipping is required
    tests:
    - not_null
  - name: tax_code
    description: Tax code for the variant
    tests:
    - not_null
  - name: option_1
    description: Primary product option
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the variant was created
    tests:
    - not_null
  - name: image_id
    description: Identifier for the variant's image
    cocoon_meta:
      missing_acceptable: Not all variants require an image.
  - name: inventory_item_id
    description: Identifier for inventory tracking
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is an identifier for inventory tracking. For this table,
        each row is for a specific product variant. As it's an identifier specifically
        for inventory items, it's likely to be unique for each variant.
  - name: price
    description: Current price of the variant
    tests:
    - not_null
  - name: product_id
    description: Identifier of the parent product
    tests:
    - not_null
  - name: updated_at
    description: Timestamp of last update
    tests:
    - not_null
  - name: variant_id
    description: Unique identifier for the variant
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is the unique identifier for the variant. For this table,
        each row is for a specific product variant. As it's explicitly described as
        a unique identifier, it should be unique across all rows.
  - name: weight
    description: Weight of the product
    tests:
    - not_null
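
The accepted_values tests above list the values each column is expected to take. One way to see what such a test looks for is to select the values that fall outside the list. The sketch below does this for weight_unit on hypothetical toy rows. It only approximates what a dbt accepted_values test checks and is not the SQL that dbt compiles.

```sql
-- Hypothetical toy data; 'pound' is deliberately outside the accepted list.
WITH "toy_variants" ("variant_id", "weight_unit") AS (
    VALUES
        ('1', 'lb'),
        ('2', 'kg'),
        ('3', 'pound')
)
SELECT
    "weight_unit",
    COUNT(*) AS "n_rows"
FROM "toy_variants"
-- Same list as the accepted_values entry for weight_unit in the YAML above.
WHERE "weight_unit" NOT IN ('lb', 'kg', 'g', 'oz', 'stone', 'ton', 'metric ton', 'mg')
GROUP BY "weight_unit";
```

Only the 'pound' row is returned here. Rows with a NULL weight_unit are not returned by NOT IN, so missing values remain the job of the separate not_null test.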

stg_shopify_collection_data (first 100 rows)

collection_id is_deleted is_disjunctive last_updated
0 997355 True None 1970-01-01
1 9930779 True None 1970-01-01
2 99967 True None 1970-01-01

stg_shopify_collection_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_collection_data_projected" AS (
    -- Projection: Selecting 12 out of 13 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "handle",
        "published_at",
        "published_scope",
        "title",
        "updated_at",
        "disjunctive",
        "rules",
        "sort_order",
        "template_suffix",
        "body_html"
    FROM "shopify_collection_data"
),

"shopify_collection_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> collection_id
    -- _fivetran_deleted -> is_deleted
    -- handle -> url_slug
    -- published_at -> publish_date
    -- published_scope -> visibility_scope
    -- title -> collection_name
    -- updated_at -> last_updated
    -- disjunctive -> is_disjunctive
    -- rules -> product_rules
    -- sort_order -> product_sort_order
    -- template_suffix -> page_template
    -- body_html -> description_html
    SELECT 
        "id" AS "collection_id",
        "_fivetran_deleted" AS "is_deleted",
        "handle" AS "url_slug",
        "published_at" AS "publish_date",
        "published_scope" AS "visibility_scope",
        "title" AS "collection_name",
        "updated_at" AS "last_updated",
        "disjunctive" AS "is_disjunctive",
        "rules" AS "product_rules",
        "sort_order" AS "product_sort_order",
        "template_suffix" AS "page_template",
        "body_html" AS "description_html"
    FROM "shopify_collection_data_projected"
),

"shopify_collection_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- collection_name: from DECIMAL to VARCHAR
    -- description_html: from DECIMAL to VARCHAR
    -- is_disjunctive: from DECIMAL to VARCHAR
    -- last_updated: from VARCHAR to TIMESTAMP
    -- page_template: from DECIMAL to VARCHAR
    -- product_rules: from DECIMAL to VARCHAR
    -- product_sort_order: from DECIMAL to VARCHAR
    -- publish_date: from DECIMAL to VARCHAR
    -- url_slug: from DECIMAL to VARCHAR
    -- visibility_scope: from DECIMAL to VARCHAR
    SELECT
        "collection_id",
        "is_deleted",
        CAST("collection_name" AS VARCHAR) AS "collection_name",
        CAST("description_html" AS VARCHAR) AS "description_html",
        CAST("is_disjunctive" AS VARCHAR) AS "is_disjunctive",
        CAST("last_updated" AS TIMESTAMP) AS "last_updated",
        CAST("page_template" AS VARCHAR) AS "page_template",
        CAST("product_rules" AS VARCHAR) AS "product_rules",
        CAST("product_sort_order" AS VARCHAR) AS "product_sort_order",
        CAST("publish_date" AS VARCHAR) AS "publish_date",
        CAST("url_slug" AS VARCHAR) AS "url_slug",
        CAST("visibility_scope" AS VARCHAR) AS "visibility_scope"
    FROM "shopify_collection_data_projected_renamed"
),

"shopify_collection_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 8 columns with unacceptable missing values
    -- collection_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- description_html has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- page_template has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- product_rules has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- product_sort_order has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- publish_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- url_slug has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- visibility_scope has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "collection_id",
        "is_deleted",
        "is_disjunctive",
        "last_updated"
    FROM "shopify_collection_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_collection_data_projected_renamed_casted_missing_handled"

stg_shopify_collection_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_collection_data
  description: The table is about Shopify collections. It contains collection IDs,
    deletion status, the disjunctive flag, and update timestamps. The source columns
    for handles, publication details, titles, and other collection-specific attributes
    were entirely empty and have been dropped. The data seems to represent deleted
    collections, as is_deleted (renamed from _fivetran_deleted) is set to True. The
    table likely stores historical data of collections that were once active in a
    Shopify store.
  columns:
  - name: collection_id
    description: Unique identifier for the collection
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each Shopify collection.
        For this table, each row represents a deleted collection. The collection_id
        is likely to be unique across rows as it's typically assigned by Shopify to
        uniquely identify each collection.
  - name: is_deleted
    description: Indicates if the collection has been deleted
    tests:
    - not_null
  - name: is_disjunctive
    description: Determines if products must match all or any rules
    cocoon_meta:
      missing_acceptable: Not applicable for non-filterable or single-category collections.
  - name: last_updated
    description: Date and time of last update to the collection
    tests:
    - not_null

stg_shopify_order_shipping_tax_line_data (first 100 rows)

tax_name row_index shipping_tax_amount shipping_tax_rate order_shipping_line_id tax_amount_currencies
0 None 4 0.0 0.000 321291 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
1 BANANA 3 0.0 0.007 5995 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}
2 TOMATO 3 0.0 0.010 309131 {"shop_money":{"amount":"0.00","currency_code":"USD"},"presentment_money":{"amount":"0.00","currency_code":"USD"}}

stg_shopify_order_shipping_tax_line_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_shipping_tax_line_data_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "order_shipping_line_id",
        "price",
        "rate",
        "title",
        "price_set"
    FROM "shopify_order_shipping_tax_line_data"
),

"shopify_order_shipping_tax_line_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> row_index
    -- price -> shipping_tax_amount
    -- rate -> shipping_tax_rate
    -- title -> tax_name
    -- price_set -> tax_amount_currencies
    SELECT 
        "index_" AS "row_index",
        "order_shipping_line_id",
        "price" AS "shipping_tax_amount",
        "rate" AS "shipping_tax_rate",
        "title" AS "tax_name",
        "price_set" AS "tax_amount_currencies"
    FROM "shopify_order_shipping_tax_line_data_projected"
),

"shopify_order_shipping_tax_line_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- tax_name: The problem is that 'BANANAN' is a misspelling of 'BANANA', and 'GEIWIHG' is an unrecognizable term that doesn't appear to be a valid fruit or vegetable name. The correct values should be common fruit or vegetable names. 'TOMATO' is already correct and doesn't need to be changed. 
    SELECT
        "row_index",
        "order_shipping_line_id",
        "shipping_tax_amount",
        "shipping_tax_rate",
        CASE
            WHEN "tax_name" = 'BANANAN' THEN 'BANANA'
            WHEN "tax_name" = 'GEIWIHG' THEN ''
            ELSE "tax_name"
        END AS "tax_name",
        "tax_amount_currencies"
    FROM "shopify_order_shipping_tax_line_data_projected_renamed"
),

"shopify_order_shipping_tax_line_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- tax_name: ['']
    SELECT 
        CASE
            WHEN "tax_name" = '' THEN NULL
            ELSE "tax_name"
        END AS "tax_name",
        "order_shipping_line_id",
        "row_index",
        "tax_amount_currencies",
        "shipping_tax_amount",
        "shipping_tax_rate"
    FROM "shopify_order_shipping_tax_line_data_projected_renamed_cleaned"
),

"shopify_order_shipping_tax_line_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- order_shipping_line_id: from INT to VARCHAR
    -- tax_amount_currencies: from VARCHAR to JSON
    SELECT
        "tax_name",
        "row_index",
        "shipping_tax_amount",
        "shipping_tax_rate",
        CAST("order_shipping_line_id" AS VARCHAR) AS "order_shipping_line_id",
        CAST("tax_amount_currencies" AS JSON) AS "tax_amount_currencies"
    FROM "shopify_order_shipping_tax_line_data_projected_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_shipping_tax_line_data_projected_renamed_cleaned_null_casted"
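
The cleaning step and the NULL imputation step above work as a pair: an unrecognizable tax_name is first mapped to an empty string, and the empty string is then turned into NULL. The sketch below shows the combined effect on three toy values taken from the cleaning comment. Folding the two steps into one NULLIF is an illustration only and is not how the generated block is written.

```sql
-- Toy input using the three tax_name values mentioned in the cleaning comment.
WITH "toy_tax_lines" ("tax_name") AS (
    VALUES ('BANANAN'), ('GEIWIHG'), ('TOMATO')
)
SELECT
    "tax_name" AS "raw_tax_name",
    -- Inner CASE: same mapping as the "cleaned" CTE.
    -- Outer NULLIF: same effect as the "cleaned_null" CTE ('' becomes NULL).
    NULLIF(
        CASE
            WHEN "tax_name" = 'BANANAN' THEN 'BANANA'
            WHEN "tax_name" = 'GEIWIHG' THEN ''
            ELSE "tax_name"
        END,
        ''
    ) AS "tax_name"
FROM "toy_tax_lines";
```

BANANAN comes out as BANANA, GEIWIHG comes out as NULL, and TOMATO passes through unchanged, which matches the missing_acceptable note on tax_name in the model documentation.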

stg_shopify_order_shipping_tax_line_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_shipping_tax_line_data
  description: The table is about shipping tax line details for Shopify orders. It
    includes information such as the order shipping line ID, tax price, tax rate,
    tax title, and price set in different currencies. Each row represents a specific
    tax line associated with a shipping line of an order. The price set contains the
    tax amount in both shop currency and presentment currency.
  columns:
  - name: tax_name
    description: Name or code of the tax applied
    cocoon_meta:
      missing_acceptable: No tax applied when shipping_tax_rate is 0.0.
  - name: row_index
    description: Row identifier or index number
    tests:
    - not_null
  - name: shipping_tax_amount
    description: Tax amount for the shipping line
    tests:
    - not_null
  - name: shipping_tax_rate
    description: Tax rate applied to the shipping line
    tests:
    - not_null
  - name: order_shipping_line_id
    description: Unique identifier for the order shipping line
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the unique identifier for the order shipping
        line. For this table, each row is for a specific tax line associated with
        a shipping line of an order. order_shipping_line_id is unique across rows
        only if each shipping line carries at most one tax line, which is assumed
        here.
  - name: tax_amount_currencies
    description: Tax amount in shop and presentment currencies
    tests:
    - not_null
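
The unique test on order_shipping_line_id rests on the assumption that a shipping line has a single tax line. A duplicate check like the one below is a quick way to probe that assumption. The toy rows are hypothetical and deliberately contain a shipping line with two tax lines. Whether such duplicates exist in the real data is not known from this report.

```sql
-- Hypothetical toy data: shipping line '5995' carries two tax lines.
WITH "toy_tax_lines" ("order_shipping_line_id", "row_index") AS (
    VALUES
        ('5995',   1),
        ('5995',   2),
        ('321291', 1)
)
SELECT
    "order_shipping_line_id",
    COUNT(*) AS "n_rows"
FROM "toy_tax_lines"
GROUP BY "order_shipping_line_id"
-- Any row returned here is an id that would fail a uniqueness test.
HAVING COUNT(*) > 1;
```

On the toy rows this returns '5995' with a count of 2. If the real model returns rows for the same query, the pair of order_shipping_line_id and row_index may be a better candidate key than order_shipping_line_id alone.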

stg_shopify_abandoned_checkout_data (first 100 rows)

billing_address_line2 billing_first_name billing_full_name currency display_currency billing_latitude payment_gateway accepts_marketing billing_address_line1 billing_country customer_locale email checkout_token discount_amount taxes_included customer_id order_number landing_page_url billing_province_code referral_source billing_country_code recovery_url source_name billing_longitude discount_value cart_token billing_city subtotal billing_province billing_last_name abandoned_at billing_address_id billing_company billing_phone billing_zip cc_cvv cc_exp_month cc_exp_year cc_first_name cc_last_name cc_number checkout_id custom_attributes discount_description discount_non_applicable_reason discount_title discount_value_type last_updated_at
0 None None None USD None NaN paypal False None None en tnyrnbs@hh.com f050eda12f111b261 NaN False 121 #10160311 /collections/the-archive-sale None None None https://kitties.com/1111311610/checkouts/f050eda125a10cca513162f01101b261/recover?key=bd0fdf1dc1a1af01aecbdaa3101ec063 web NaN NaN aaaa211622dfb133 None 56.00 None None 2020-11-12 10:06:50.111111 None None None None None None None None None None 12111 None None None None None 2020-11-12 10:51:10.111111
1 None None None USD None 1.126113 None False Apt 0 USA en hyrehher@gmail.com a165dfd11226 NaN False 366525 #13311 /collections/sale PA-11 https://www.google.com/ US https://kitties.com/1111311610/checkouts/6661ff02165dfd11b12db112f0111226/recover?key=51611efdff11e0caccc0fd30b0e1e202 web -21.502661 NaN 611faa630ce5e6bcc0bacc2a105c0126 Daytona Beach 10.35 CA Calles 2020-05-11 01:01:30.111111 None None 50266111110.0 None None None None None None None 11111 None None None None None 2020-05-11 01:06:35.111111
2 None None None USD USD NaN None False None None en hernebbe@hr.com l1abddd111c0211f2021c NaN False 160363 #166531 /collections/new None https://l.facebook.com/ None https://kitties.com/1111311610/checkouts/0abddd111c0211f1e616ec0d0c32021c/recover?key=abed6505d26f1a60a50aa0c02e01be31 web NaN NaN aaaaa61e1d11af3adfac1f0 None 191.00 None None 2021-11-11 02:05:13.111111 None None None None None None None None None None 66531 [{"name":"segment-clientID","value":"610a111c-30fc-0bb6-a25e-06f201c6035c"},{"name":"_updatedAt","value":"1613121625150"}] None None None None 2021-11-11 02:05:55.111111

stg_shopify_abandoned_checkout_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_abandoned_checkout_data_removeWideColumns" AS (
    -- Remove wide columns with pattern. The regex and columns are:
    -- ^shipping_address_.*$: shipping_address_address_0, shipping_address_address_1, shipping_address_city, shipping_address_company, shipping_address_country, shipping_address_country_code, shipping_address_first_name, shipping_address_id, shipping_address_is_default, shipping_address_last_name ...
    -- ^shipping_rate_.*$: shipping_rate_id, shipping_rate_price, shipping_rate_title
    -- ^note_attribute_.*$: note_attribute_email_client_id, note_attribute_google_client_id, note_attribute_littledata_updated_at, note_attribute_segment_client_id
    -- ^total_.*$: total_discounts, total_duties, total_line_items_price, total_price, total_tax, total_weight
    SELECT 
        "_fivetran_deleted",
        "_fivetran_synced",
        "abandoned_checkout_url",
        "applied_discount_amount",
        "applied_discount_applicable",
        "applied_discount_description",
        "applied_discount_non_applicable_reason",
        "applied_discount_title",
        "applied_discount_value",
        "applied_discount_value_type",
        "billing_address_address_0",
        "billing_address_address_1",
        "billing_address_city",
        "billing_address_company",
        "billing_address_country",
        "billing_address_country_code",
        "billing_address_first_name",
        "billing_address_id",
        "billing_address_is_default",
        "billing_address_last_name",
        "billing_address_latitude",
        "billing_address_longitude",
        "billing_address_name",
        "billing_address_phone",
        "billing_address_province",
        "billing_address_province_code",
        "billing_address_zip",
        "buyer_accepts_marketing",
        "cart_token",
        "closed_at",
        "completed_at",
        "created_at",
        "credit_card_first_name",
        "credit_card_last_name",
        "credit_card_month",
        "credit_card_number",
        "credit_card_verification_value",
        "credit_card_year",
        "currency",
        "customer_id",
        "customer_locale",
        "device_id",
        "email",
        "gateway",
        "id",
        "landing_site_base_url",
        "location_id",
        "name",
        "note",
        "note_attributes",
        "phone",
        "presentment_currency",
        "referring_site",
        "shipping_line",
        "source",
        "source_identifier",
        "source_name",
        "source_url",
        "subtotal_price",
        "taxes_included",
        "token",
        "updated_at",
        "user_id"
    FROM "shopify_abandoned_checkout_data"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected" AS (
    -- Projection: Selecting 62 out of 63 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "_fivetran_deleted",
        "abandoned_checkout_url",
        "applied_discount_amount",
        "applied_discount_applicable",
        "applied_discount_description",
        "applied_discount_non_applicable_reason",
        "applied_discount_title",
        "applied_discount_value",
        "applied_discount_value_type",
        "billing_address_address_0",
        "billing_address_address_1",
        "billing_address_city",
        "billing_address_company",
        "billing_address_country",
        "billing_address_country_code",
        "billing_address_first_name",
        "billing_address_id",
        "billing_address_is_default",
        "billing_address_last_name",
        "billing_address_latitude",
        "billing_address_longitude",
        "billing_address_name",
        "billing_address_phone",
        "billing_address_province",
        "billing_address_province_code",
        "billing_address_zip",
        "buyer_accepts_marketing",
        "cart_token",
        "closed_at",
        "completed_at",
        "created_at",
        "credit_card_first_name",
        "credit_card_last_name",
        "credit_card_month",
        "credit_card_number",
        "credit_card_verification_value",
        "credit_card_year",
        "currency",
        "customer_id",
        "customer_locale",
        "device_id",
        "email",
        "gateway",
        "id",
        "landing_site_base_url",
        "location_id",
        "name",
        "note",
        "note_attributes",
        "phone",
        "presentment_currency",
        "referring_site",
        "shipping_line",
        "source",
        "source_identifier",
        "source_name",
        "source_url",
        "subtotal_price",
        "taxes_included",
        "token",
        "updated_at",
        "user_id"
    FROM "shopify_abandoned_checkout_data_removeWideColumns"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected_renamed" AS (
    -- Rename: Renaming columns
    -- _fivetran_deleted -> is_deleted
    -- abandoned_checkout_url -> recovery_url
    -- applied_discount_amount -> discount_amount
    -- applied_discount_applicable -> is_discount_applicable
    -- applied_discount_description -> discount_description
    -- applied_discount_non_applicable_reason -> discount_non_applicable_reason
    -- applied_discount_title -> discount_title
    -- applied_discount_value -> discount_value
    -- applied_discount_value_type -> discount_value_type
    -- billing_address_address_0 -> billing_address_line1
    -- billing_address_address_1 -> billing_address_line2
    -- billing_address_city -> billing_city
    -- billing_address_company -> billing_company
    -- billing_address_country -> billing_country
    -- billing_address_country_code -> billing_country_code
    -- billing_address_first_name -> billing_first_name
    -- billing_address_is_default -> is_default_billing_address
    -- billing_address_last_name -> billing_last_name
    -- billing_address_latitude -> billing_latitude
    -- billing_address_longitude -> billing_longitude
    -- billing_address_name -> billing_full_name
    -- billing_address_phone -> billing_phone
    -- billing_address_province -> billing_province
    -- billing_address_province_code -> billing_province_code
    -- billing_address_zip -> billing_zip
    -- buyer_accepts_marketing -> accepts_marketing
    -- created_at -> abandoned_at
    -- credit_card_first_name -> cc_first_name
    -- credit_card_last_name -> cc_last_name
    -- credit_card_month -> cc_exp_month
    -- credit_card_number -> cc_number
    -- credit_card_verification_value -> cc_cvv
    -- credit_card_year -> cc_exp_year
    -- gateway -> payment_gateway
    -- id -> checkout_id
    -- landing_site_base_url -> landing_page_url
    -- name -> order_number
    -- note -> order_notes
    -- note_attributes -> custom_attributes
    -- presentment_currency -> display_currency
    -- referring_site -> referral_source
    -- shipping_line -> shipping_details
    -- source -> checkout_source
    -- source_identifier -> source_id
    -- subtotal_price -> subtotal
    -- token -> checkout_token
    -- updated_at -> last_updated_at
    SELECT 
        "_fivetran_deleted" AS "is_deleted",
        "abandoned_checkout_url" AS "recovery_url",
        "applied_discount_amount" AS "discount_amount",
        "applied_discount_applicable" AS "is_discount_applicable",
        "applied_discount_description" AS "discount_description",
        "applied_discount_non_applicable_reason" AS "discount_non_applicable_reason",
        "applied_discount_title" AS "discount_title",
        "applied_discount_value" AS "discount_value",
        "applied_discount_value_type" AS "discount_value_type",
        "billing_address_address_0" AS "billing_address_line1",
        "billing_address_address_1" AS "billing_address_line2",
        "billing_address_city" AS "billing_city",
        "billing_address_company" AS "billing_company",
        "billing_address_country" AS "billing_country",
        "billing_address_country_code" AS "billing_country_code",
        "billing_address_first_name" AS "billing_first_name",
        "billing_address_id",
        "billing_address_is_default" AS "is_default_billing_address",
        "billing_address_last_name" AS "billing_last_name",
        "billing_address_latitude" AS "billing_latitude",
        "billing_address_longitude" AS "billing_longitude",
        "billing_address_name" AS "billing_full_name",
        "billing_address_phone" AS "billing_phone",
        "billing_address_province" AS "billing_province",
        "billing_address_province_code" AS "billing_province_code",
        "billing_address_zip" AS "billing_zip",
        "buyer_accepts_marketing" AS "accepts_marketing",
        "cart_token",
        "closed_at",
        "completed_at",
        "created_at" AS "abandoned_at",
        "credit_card_first_name" AS "cc_first_name",
        "credit_card_last_name" AS "cc_last_name",
        "credit_card_month" AS "cc_exp_month",
        "credit_card_number" AS "cc_number",
        "credit_card_verification_value" AS "cc_cvv",
        "credit_card_year" AS "cc_exp_year",
        "currency",
        "customer_id",
        "customer_locale",
        "device_id",
        "email",
        "gateway" AS "payment_gateway",
        "id" AS "checkout_id",
        "landing_site_base_url" AS "landing_page_url",
        "location_id",
        "name" AS "order_number",
        "note" AS "order_notes",
        "note_attributes" AS "custom_attributes",
        "phone",
        "presentment_currency" AS "display_currency",
        "referring_site" AS "referral_source",
        "shipping_line" AS "shipping_details",
        "source" AS "checkout_source",
        "source_identifier" AS "source_id",
        "source_name",
        "source_url",
        "subtotal_price" AS "subtotal",
        "taxes_included",
        "token" AS "checkout_token",
        "updated_at" AS "last_updated_at",
        "user_id"
    FROM "shopify_abandoned_checkout_data_removeWideColumns_projected"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- billing_address_line2: The problem is that 'village' is too generic and lacks specific information for an address line 2. Typically, address line 2 should contain more specific details like apartment numbers, suite numbers, or building names. The value 'village' doesn't provide any meaningful information in this context. The correct value in this case should be an empty string, as there's no specific information to include. 
    -- billing_city: The problem is that 'daytona Beach' is not properly capitalized. City names should have their first letters capitalized. The correct value should be 'Daytona Beach'. 
    -- billing_country: The problem is that 'Florida' is a state in the United States, not a country, and it appears in a column named 'billing_country'. The correct value should be the country that Florida is part of, which is the United States of America (USA). 
    -- billing_first_name: The problem is that 'ohio' is a state name, not a typical first name for billing information. This column should contain personal first names. Since we don't have any additional information about the correct first name for this entry, we can't map it to a valid name. The correct value should be an empty string to indicate missing data. 
    -- billing_full_name: The problem is that 'hi' is not a valid full name for billing purposes. A full name typically consists of at least a first name and a last name. The value 'hi' appears to be a greeting or placeholder rather than an actual name. For billing purposes, we need accurate and complete customer information. Since there are no valid names provided, we should map this meaningless value to an empty string. 
    -- billing_province: The problem is that 'Healdsburg' is a city name, not a province or state. For a billing_province column, we would expect to see state or province names. Since Healdsburg is a city in California, the correct value should be the state abbreviation 'CA' for California. 
    SELECT
        "is_deleted",
        "recovery_url",
        "discount_amount",
        "is_discount_applicable",
        "discount_description",
        "discount_non_applicable_reason",
        "discount_title",
        "discount_value",
        "discount_value_type",
        "billing_address_line1",
        CASE
            WHEN "billing_address_line2" = 'village' THEN ''
            ELSE "billing_address_line2"
        END AS "billing_address_line2",
        CASE
            WHEN "billing_city" = 'daytona Beach' THEN 'Daytona Beach'
            ELSE "billing_city"
        END AS "billing_city",
        "billing_company",
        CASE
            WHEN "billing_country" = 'Florida' THEN 'USA'
            ELSE "billing_country"
        END AS "billing_country",
        "billing_country_code",
        CASE
            WHEN "billing_first_name" = 'ohio' THEN ''
            ELSE "billing_first_name"
        END AS "billing_first_name",
        "billing_address_id",
        "is_default_billing_address",
        "billing_last_name",
        "billing_latitude",
        "billing_longitude",
        CASE
            WHEN "billing_full_name" = 'hi' THEN ''
            ELSE "billing_full_name"
        END AS "billing_full_name",
        "billing_phone",
        CASE
            WHEN "billing_province" = 'Healdsburg' THEN 'CA'
            ELSE "billing_province"
        END AS "billing_province",
        "billing_province_code",
        "billing_zip",
        "accepts_marketing",
        "cart_token",
        "closed_at",
        "completed_at",
        "abandoned_at",
        "cc_first_name",
        "cc_last_name",
        "cc_exp_month",
        "cc_number",
        "cc_cvv",
        "cc_exp_year",
        "currency",
        "customer_id",
        "customer_locale",
        "device_id",
        "email",
        "payment_gateway",
        "checkout_id",
        "landing_page_url",
        "location_id",
        "order_number",
        "order_notes",
        "custom_attributes",
        "phone",
        "display_currency",
        "referral_source",
        "shipping_details",
        "checkout_source",
        "source_id",
        "source_name",
        "source_url",
        "subtotal",
        "taxes_included",
        "checkout_token",
        "last_updated_at",
        "user_id"
    FROM "shopify_abandoned_checkout_data_removeWideColumns_projected_renamed"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- billing_address_line2: ['']
    -- billing_first_name: ['']
    -- billing_full_name: ['']
    SELECT 
        CASE
            WHEN "billing_address_line2" = '' THEN NULL
            ELSE "billing_address_line2"
        END AS "billing_address_line2",
        CASE
            WHEN "billing_first_name" = '' THEN NULL
            ELSE "billing_first_name"
        END AS "billing_first_name",
        CASE
            WHEN "billing_full_name" = '' THEN NULL
            ELSE "billing_full_name"
        END AS "billing_full_name",
        "phone",
        "cc_last_name",
        "currency",
        "display_currency",
        "cc_exp_month",
        "billing_latitude",
        "billing_zip",
        "completed_at",
        "cc_exp_year",
        "payment_gateway",
        "shipping_details",
        "billing_address_id",
        "accepts_marketing",
        "billing_address_line1",
        "billing_country",
        "discount_description",
        "customer_locale",
        "email",
        "checkout_token",
        "billing_company",
        "discount_non_applicable_reason",
        "order_notes",
        "cc_number",
        "device_id",
        "location_id",
        "is_default_billing_address",
        "discount_amount",
        "abandoned_at",
        "user_id",
        "discount_value_type",
        "last_updated_at",
        "taxes_included",
        "checkout_source",
        "customer_id",
        "order_number",
        "landing_page_url",
        "billing_province_code",
        "discount_title",
        "is_deleted",
        "source_url",
        "referral_source",
        "billing_country_code",
        "recovery_url",
        "source_name",
        "billing_longitude",
        "billing_phone",
        "closed_at",
        "cc_cvv",
        "source_id",
        "is_discount_applicable",
        "discount_value",
        "cc_first_name",
        "checkout_id",
        "cart_token",
        "billing_city",
        "custom_attributes",
        "subtotal",
        "billing_province",
        "billing_last_name"
    FROM "shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- abandoned_at: from VARCHAR to TIMESTAMP
    -- billing_address_id: from DECIMAL to VARCHAR
    -- billing_company: from DECIMAL to VARCHAR
    -- billing_phone: from DECIMAL to VARCHAR
    -- billing_zip: from DECIMAL to VARCHAR
    -- cc_cvv: from DECIMAL to VARCHAR
    -- cc_exp_month: from DECIMAL to VARCHAR
    -- cc_exp_year: from DECIMAL to VARCHAR
    -- cc_first_name: from DECIMAL to VARCHAR
    -- cc_last_name: from DECIMAL to VARCHAR
    -- cc_number: from DECIMAL to VARCHAR
    -- checkout_id: from INT to VARCHAR
    -- checkout_source: from DECIMAL to VARCHAR
    -- closed_at: from DECIMAL to TIMESTAMP
    -- completed_at: from DECIMAL to TIMESTAMP
    -- custom_attributes: from VARCHAR to JSON
    -- device_id: from DECIMAL to VARCHAR
    -- discount_description: from DECIMAL to VARCHAR
    -- discount_non_applicable_reason: from DECIMAL to VARCHAR
    -- discount_title: from DECIMAL to VARCHAR
    -- discount_value_type: from DECIMAL to VARCHAR
    -- is_default_billing_address: from DECIMAL to BOOLEAN
    -- is_deleted: from DECIMAL to BOOLEAN
    -- is_discount_applicable: from DECIMAL to BOOLEAN
    -- last_updated_at: from VARCHAR to TIMESTAMP
    -- location_id: from DECIMAL to VARCHAR
    -- order_notes: from DECIMAL to VARCHAR
    -- phone: from DECIMAL to VARCHAR
    -- shipping_details: from DECIMAL to JSON
    -- source_id: from DECIMAL to VARCHAR
    -- source_url: from DECIMAL to VARCHAR
    -- user_id: from DECIMAL to VARCHAR
    SELECT
        "billing_address_line2",
        "billing_first_name",
        "billing_full_name",
        "currency",
        "display_currency",
        "billing_latitude",
        "payment_gateway",
        "accepts_marketing",
        "billing_address_line1",
        "billing_country",
        "customer_locale",
        "email",
        "checkout_token",
        "discount_amount",
        "taxes_included",
        "customer_id",
        "order_number",
        "landing_page_url",
        "billing_province_code",
        "referral_source",
        "billing_country_code",
        "recovery_url",
        "source_name",
        "billing_longitude",
        "discount_value",
        "cart_token",
        "billing_city",
        "subtotal",
        "billing_province",
        "billing_last_name",
        CAST("abandoned_at" AS TIMESTAMP) AS "abandoned_at",
        CAST("billing_address_id" AS VARCHAR) AS "billing_address_id",
        CAST("billing_company" AS VARCHAR) AS "billing_company",
        CAST("billing_phone" AS VARCHAR) AS "billing_phone",
        CAST("billing_zip" AS VARCHAR) AS "billing_zip",
        CAST("cc_cvv" AS VARCHAR) AS "cc_cvv",
        CAST("cc_exp_month" AS VARCHAR) AS "cc_exp_month",
        CAST("cc_exp_year" AS VARCHAR) AS "cc_exp_year",
        CAST("cc_first_name" AS VARCHAR) AS "cc_first_name",
        CAST("cc_last_name" AS VARCHAR) AS "cc_last_name",
        CAST("cc_number" AS VARCHAR) AS "cc_number",
        CAST("checkout_id" AS VARCHAR) AS "checkout_id",
        CAST("checkout_source" AS VARCHAR) AS "checkout_source",
        CAST("closed_at" AS TIMESTAMP) AS "closed_at",
        CAST("completed_at" AS TIMESTAMP) AS "completed_at",
        CAST("custom_attributes" AS JSON) AS "custom_attributes",
        CAST("device_id" AS VARCHAR) AS "device_id",
        CAST("discount_description" AS VARCHAR) AS "discount_description",
        CAST("discount_non_applicable_reason" AS VARCHAR) AS "discount_non_applicable_reason",
        CAST("discount_title" AS VARCHAR) AS "discount_title",
        CAST("discount_value_type" AS VARCHAR) AS "discount_value_type",
        CAST("is_default_billing_address" AS BOOLEAN) AS "is_default_billing_address",
        CAST("is_deleted" AS BOOLEAN) AS "is_deleted",
        CAST("is_discount_applicable" AS BOOLEAN) AS "is_discount_applicable",
        CAST("last_updated_at" AS TIMESTAMP) AS "last_updated_at",
        CAST("location_id" AS VARCHAR) AS "location_id",
        CAST("order_notes" AS VARCHAR) AS "order_notes",
        CAST("phone" AS VARCHAR) AS "phone",
        CAST("shipping_details" AS JSON) AS "shipping_details",
        CAST("source_id" AS VARCHAR) AS "source_id",
        CAST("source_url" AS VARCHAR) AS "source_url",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null"
),

"shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 17 columns with unacceptable missing values
    -- checkout_source has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- closed_at has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- completed_at has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- custom_attributes has 66.67 percent missing. Strategy: 🔄 Unchanged
    -- device_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- display_currency has 66.67 percent missing. Strategy: 🔄 Unchanged
    -- is_default_billing_address has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- is_deleted has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- is_discount_applicable has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- location_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- order_notes has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- phone has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- referral_source has 33.33 percent missing. Strategy: 🔄 Unchanged
    -- shipping_details has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- source_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- source_url has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- user_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "billing_address_line2",
        "billing_first_name",
        "billing_full_name",
        "currency",
        "display_currency",
        "billing_latitude",
        "payment_gateway",
        "accepts_marketing",
        "billing_address_line1",
        "billing_country",
        "customer_locale",
        "email",
        "checkout_token",
        "discount_amount",
        "taxes_included",
        "customer_id",
        "order_number",
        "landing_page_url",
        "billing_province_code",
        "referral_source",
        "billing_country_code",
        "recovery_url",
        "source_name",
        "billing_longitude",
        "discount_value",
        "cart_token",
        "billing_city",
        "subtotal",
        "billing_province",
        "billing_last_name",
        "abandoned_at",
        "billing_address_id",
        "billing_company",
        "billing_phone",
        "billing_zip",
        "cc_cvv",
        "cc_exp_month",
        "cc_exp_year",
        "cc_first_name",
        "cc_last_name",
        "cc_number",
        "checkout_id",
        "custom_attributes",
        "discount_description",
        "discount_non_applicable_reason",
        "discount_title",
        "discount_value_type",
        "last_updated_at"
    FROM "shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_abandoned_checkout_data_removeWideColumns_projected_renamed_cleaned_null_casted_missing_handled"
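
The missing-value step above drops columns that are 100 percent missing and leaves partially missing columns unchanged. A minimal Python sketch of that rule follows. The function names, the exact threshold, and the three sample rows are illustrative assumptions, not Cocoon's implementation.

```python
from typing import Any, Dict, List


def missing_percent(rows: List[Dict[str, Any]], column: str) -> float:
    """Return the percentage of rows where `column` is None, rounded to 2 places."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return round(100.0 * missing / len(rows), 2)


def choose_strategy(percent_missing: float) -> str:
    """Hypothetical rule: drop fully missing columns, keep everything else."""
    return "drop" if percent_missing == 100.0 else "unchanged"


# Made-up rows for the sketch; they are not taken from the source table.
sample_rows = [
    {"device_id": None, "display_currency": "USD", "referral_source": None},
    {"device_id": None, "display_currency": None, "referral_source": "https://example.com"},
    {"device_id": None, "display_currency": None, "referral_source": "https://example.com"},
]

strategies = {
    column: choose_strategy(missing_percent(sample_rows, column))
    for column in ("device_id", "display_currency", "referral_source")
}
```

With these rows, device_id is dropped while display_currency and referral_source are kept, mirroring the strategies listed in the SQL comments.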

stg_shopify_abandoned_checkout_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_abandoned_checkout_data
  description: Abandoned checkouts on a Shopify store. Each row represents a single
    abandoned checkout and records customer information, the billing address, discount
    and pricing details, and the recovery URL, along with fields such as email,
    currency, subtotal, and timestamps. The table tracks customer behavior and potential
    sales that were not completed.
  columns:
  - name: billing_address_line2
    description: Second line of billing address
    cocoon_meta:
      missing_acceptable: No secondary address line needed
  - name: billing_first_name
    description: First name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for the transaction
  - name: billing_full_name
    description: Full name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for the transaction
  - name: currency
    description: Currency used for the transaction
    tests:
    - not_null
  - name: display_currency
    description: Currency presented to the customer
    tests:
    - not_null
  - name: billing_latitude
    description: Latitude of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: payment_gateway
    description: Payment gateway used
    tests:
    - accepted_values:
        values:
        - PayPal
        - Stripe
        - Square
        - Authorize.Net
        - Braintree
        - 2Checkout
        - Amazon Pay
        - Google Pay
        - Apple Pay
        - Skrill
        - Klarna
        - Adyen
        - WorldPay
        - Sage Pay
        - Dwolla
        - WePay
        - Payoneer
        - BlueSnap
        - Checkout.com
        - Alipay
        - paypal
    cocoon_meta:
      missing_acceptable: Not applicable when payment hasn't been processed yet.
  - name: accepts_marketing
    description: Indicates if buyer accepts marketing emails
    tests:
    - not_null
  - name: billing_address_line1
    description: First line of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: billing_country
    description: Country of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: customer_locale
    description: Language/region setting of the customer
    tests:
    - not_null
  - name: email
    description: Customer's email address
    tests:
    - not_null
  - name: checkout_token
    description: Unique token for the abandoned checkout
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique token for each abandoned checkout.
        For this table, each row is an abandoned checkout. Checkout token is designed
        to be a unique identifier for each checkout session.
  - name: discount_amount
    description: Amount of discount applied to the order
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: taxes_included
    description: Whether taxes are included in the price
    tests:
    - not_null
  - name: customer_id
    description: Unique identifier for the customer who abandoned the cart
    tests:
    - not_null
  - name: order_number
    description: Order number or identifier
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains an order number or identifier. For this table,
        each row is an abandoned checkout. Order numbers are typically unique for
        each order or checkout attempt.
  - name: landing_page_url
    description: URL of the page where customer entered site
    tests:
    - not_null
  - name: billing_province_code
    description: Province or state code of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: referral_source
    description: Website that referred the customer
    tests:
    - not_null
  - name: billing_country_code
    description: Country code of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: recovery_url
    description: URL for recovering the abandoned checkout
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the URL for recovering the abandoned checkout.
        For this table, each row is an abandoned checkout. The recovery URL appears
        to be unique for each abandoned checkout, as it contains a unique token.
  - name: source_name
    description: Name of the checkout source
    tests:
    - not_null
    - accepted_values:
        values:
        - web
        - mobile
        - desktop
        - tablet
        - kiosk
        - api
        - in-store
        - phone
        - mail
        - fax
        - social_media
        - voice_assistant
        - smartwatch
        - smart_tv
        - game_console
        - iot_device
  - name: billing_longitude
    description: Longitude of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: discount_value
    description: Value of the applied discount
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: cart_token
    description: Unique identifier for the shopping cart
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for the shopping cart.
        For this table, each row is an abandoned checkout. The cart token is likely
        to be unique for each abandoned cart.
  - name: billing_city
    description: City of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: subtotal
    description: Subtotal of the order before taxes/shipping
    tests:
    - not_null
  - name: billing_province
    description: Province or state of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: billing_last_name
    description: Last name in billing address
    cocoon_meta:
      missing_acceptable: No billing name provided for the transaction
  - name: abandoned_at
    description: Timestamp of when the checkout was abandoned
    tests:
    - not_null
  - name: billing_address_id
    description: Unique identifier for billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: billing_company
    description: Company name in billing address
    cocoon_meta:
      missing_acceptable: No company associated with the billing
  - name: billing_phone
    description: Phone number in billing address
    cocoon_meta:
      missing_acceptable: No phone number provided for billing
  - name: billing_zip
    description: ZIP or postal code of billing address
    cocoon_meta:
      missing_acceptable: No billing address provided for the transaction
  - name: cc_cvv
    description: CVV of the credit card
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: cc_exp_month
    description: Expiration month of the credit card
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: cc_exp_year
    description: Expiration year of the credit card
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: cc_first_name
    description: First name on the credit card
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: cc_last_name
    description: Last name on the credit card
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: cc_number
    description: Credit card number (likely masked)
    cocoon_meta:
      missing_acceptable: Credit card not used for the transaction
  - name: checkout_id
    description: Unique identifier for the abandoned checkout
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for the abandoned checkout. For
        this table, each row represents a unique abandoned cart. As it's designed
        to be a unique identifier, it should be unique across all rows and can identify
        each abandoned cart uniquely.
  - name: custom_attributes
    description: Custom attributes for the order
    tests:
    - not_null
  - name: discount_description
    description: Description of the applied discount
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: discount_non_applicable_reason
    description: Reason why discount is not applicable
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: discount_title
    description: Title of the applied discount
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: discount_value_type
    description: Type of discount value (percentage or fixed)
    cocoon_meta:
      missing_acceptable: No discount applied to the transaction
  - name: last_updated_at
    description: Timestamp of when the abandoned cart was last updated
    tests:
    - not_null
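
The not_null and unique tests declared above amount to queries that should return no failing rows. The sketch below runs comparable checks in Python against an in-memory SQLite table. The table name `stg`, its rows, and the helper functions are assumptions for illustration, and the queries approximate rather than reproduce what dbt compiles. Since the cleaning SQL reports display_currency as 66.67 percent missing and leaves it unchanged, a not_null test on that column would be expected to report failures.

```python
import sqlite3

# In-memory table with invented rows; only two columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg (checkout_token TEXT, display_currency TEXT)")
conn.executemany(
    "INSERT INTO stg VALUES (?, ?)",
    [("tok_a", "USD"), ("tok_b", None), ("tok_c", None)],
)


def not_null_failures(column: str) -> int:
    """Count rows that would fail a not_null check on `column`."""
    query = f'SELECT COUNT(*) FROM stg WHERE "{column}" IS NULL'
    return conn.execute(query).fetchone()[0]


def unique_failures(column: str) -> int:
    """Count non-null values that appear more than once in `column`."""
    query = (
        "SELECT COUNT(*) FROM ("
        f'SELECT "{column}" FROM stg WHERE "{column}" IS NOT NULL '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1'
        ") AS dupes"
    )
    return conn.execute(query).fetchone()[0]
```

A result of zero means the check passes; any positive count identifies how many rows or values would need attention.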

stg_shopify_order_tag_data (first 100 rows)

tag_group_id order_id color_tag
0 1 6411 #333333
1 1 47195 #222222
2 1 46553 #771222

stg_shopify_order_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_tag_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "order_id",
        "value_"
    FROM "shopify_order_tag_data"
),

"shopify_order_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> tag_group_id
    -- value_ -> color_tag
    SELECT 
        "index_" AS "tag_group_id",
        "order_id",
        "value_" AS "color_tag"
    FROM "shopify_order_tag_data_projected"
),

"shopify_order_tag_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- color_tag: '#22222' and '#33333' are invalid hex color codes because they have only 5 digits; a hex color code has 6 digits (or 3 in shorthand notation). The fix assumes the last digit was accidentally omitted and duplicates it to produce valid 6-digit codes.
    SELECT
        "tag_group_id",
        "order_id",
        CASE
            WHEN "color_tag" = '#22222' THEN '#222222'
            WHEN "color_tag" = '#33333' THEN '#333333'
            ELSE "color_tag"
        END AS "color_tag"
    FROM "shopify_order_tag_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_tag_data_projected_renamed_cleaned"
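
The cleaning step above hard-codes two replacements. As a rough sketch of the same idea in Python, the helper below validates hex color codes and applies the repair rule described in the comment (repeat the last digit of a 5-digit code). The function names and the generalization beyond the two listed values are assumptions, not part of the generated SQL.

```python
import re

# A valid code is '#' followed by exactly 3 or exactly 6 hex digits.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})")
FIVE_DIGIT_HEX = re.compile(r"#[0-9a-fA-F]{5}")


def is_valid_hex_color(value: str) -> bool:
    """Return True when `value` is a 3- or 6-digit hex color code."""
    return HEX_COLOR.fullmatch(value) is not None


def repair_hex_color(value: str) -> str:
    """Assumed repair rule: a 5-digit code gets its last digit repeated."""
    if FIVE_DIGIT_HEX.fullmatch(value):
        return value + value[-1]
    return value
```

Values that are already valid, such as '#771222', pass through unchanged.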

stg_shopify_order_tag_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_tag_data
  description: Shopify order tags. Each row represents a tag associated with an order
    and contains a tag group identifier, an order ID, and a tag value. The tag values
    appear to be color codes starting with '#'. This table likely allows attaching
    additional metadata or categorization to Shopify orders.
  columns:
  - name: tag_group_id
    description: Identifier for grouping related tags
    tests:
    - not_null
  - name: order_id
    description: Unique identifier for a Shopify order
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for a Shopify order.
        For this table, each row represents a tag associated with an order. Since
        each order can have only one tag in this table structure, order_id is likely
        to be unique across rows.
  - name: color_tag
    description: Color code tag associated with the order
    tests:
    - not_null

stg_shopify_order_line_refund_data (first 100 rows)

store_location_id restock_type refunded_quantity refund_tax_amount original_order_line_id refund_id refund_line_item_id refund_subtotal
0 3.213171e+10 return 1 19.74 6113984839751 679976206407 189012115527 415.0
1 3.213171e+10 return 1 56.33 9698959196231 800919683143 289901510727 415.0
2 3.213171e+10 return 1 16.18 6423996530759 686409187399 196428005447 415.0
3 NaN no_restock 1 26.17 6367161483335 798222680135 286567268423 415.0
4 NaN no_restock 1 13.75 6009460064327 677359190087 185936773191 415.0

stg_shopify_order_line_refund_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_order_line_refund_data_projected" AS (
    -- Projection: Selecting 10 out of 11 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "location_id",
        "refund_id",
        "restock_type",
        "quantity",
        "order_line_id",
        "subtotal",
        "total_tax_set",
        "subtotal_set",
        "total_tax"
    FROM "shopify_order_line_refund_data"
),

"shopify_order_line_refund_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> refund_line_item_id
    -- location_id -> store_location_id
    -- quantity -> refunded_quantity
    -- order_line_id -> original_order_line_id
    -- subtotal -> refund_subtotal
    -- total_tax_set -> tax_amount_set
    -- total_tax -> refund_tax_amount
    SELECT 
        "id" AS "refund_line_item_id",
        "location_id" AS "store_location_id",
        "refund_id",
        "restock_type",
        "quantity" AS "refunded_quantity",
        "order_line_id" AS "original_order_line_id",
        "subtotal" AS "refund_subtotal",
        "total_tax_set" AS "tax_amount_set",
        "subtotal_set",
        "total_tax" AS "refund_tax_amount"
    FROM "shopify_order_line_refund_data_projected"
),

"shopify_order_line_refund_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- original_order_line_id: from INT to VARCHAR
    -- refund_id: from INT to VARCHAR
    -- refund_line_item_id: from INT to VARCHAR
    -- refund_subtotal: from INT to DECIMAL
    -- subtotal_set: from DECIMAL to VARCHAR
    -- tax_amount_set: from DECIMAL to VARCHAR
    SELECT
        "store_location_id",
        "restock_type",
        "refunded_quantity",
        "refund_tax_amount",
        CAST("original_order_line_id" AS VARCHAR) AS "original_order_line_id",
        CAST("refund_id" AS VARCHAR) AS "refund_id",
        CAST("refund_line_item_id" AS VARCHAR) AS "refund_line_item_id",
        CAST("refund_subtotal" AS DECIMAL) AS "refund_subtotal",
        CAST("subtotal_set" AS VARCHAR) AS "subtotal_set",
        CAST("tax_amount_set" AS VARCHAR) AS "tax_amount_set"
    FROM "shopify_order_line_refund_data_projected_renamed"
),

"shopify_order_line_refund_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- subtotal_set has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tax_amount_set has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "store_location_id",
        "restock_type",
        "refunded_quantity",
        "refund_tax_amount",
        "original_order_line_id",
        "refund_id",
        "refund_line_item_id",
        "refund_subtotal"
    FROM "shopify_order_line_refund_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_order_line_refund_data_projected_renamed_casted_missing_handled"

stg_shopify_order_line_refund_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_order_line_refund_data
  description: Shopify order line refunds. Each row represents a single refund line
    item associated with an order and includes the refund ID, store location ID,
    restock type, refunded quantity, original order line ID, subtotal, and tax amount.
    The table covers both returns and no-restock refunds, providing financial and
    operational information for each refunded item.
  columns:
  - name: store_location_id
    description: Identifier for the store location
    cocoon_meta:
      missing_acceptable: Not applicable for 'no_restock' refund types.
  - name: restock_type
    description: Indicates if item is returned or not restocked
    tests:
    - not_null
    - accepted_values:
        values:
        - return
        - no_restock
  - name: refunded_quantity
    description: Number of items refunded
    tests:
    - not_null
  - name: refund_tax_amount
    description: Total tax amount refunded
    tests:
    - not_null
  - name: original_order_line_id
    description: Identifier for the original order line item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the identifier for the original order line
        item. For this table, each row is a single refund line item associated with
        an order. The original_order_line_id is likely to be unique across rows as
        each refund typically corresponds to a unique order line.
  - name: refund_id
    description: Unique identifier for the overall refund
    tests:
    - not_null
  - name: refund_line_item_id
    description: Unique identifier for the refund line item
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is the unique identifier for the refund line item. For
        this table, each row represents a distinct refund line item. Therefore, the
        refund_line_item_id should be unique across all rows.
  - name: refund_subtotal
    description: Refunded amount before tax
    tests:
    - not_null

stg_shopify_discount_code_data (first 100 rows)

discount_id discount_code price_rule_id usage_count created_at updated_at
0 4773499 CHECKVB34DDBQ3VH 32543 0.0 2021-12-10 06:48:35 2021-12-10 06:48:35
1 436267 CHECKVBLJG22DDD 12543 0.0 2021-12-10 06:48:35 2021-12-10 06:48:35
2 469035 CHECKV44CCCBCWB7 12543 0.0 2021-12-10 06:48:35 2021-12-10 06:48:35

stg_shopify_discount_code_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_discount_code_data_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "code",
        "created_at",
        "price_rule_id",
        "updated_at",
        "usage_count"
    FROM "shopify_discount_code_data"
),

"shopify_discount_code_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> discount_id
    -- code -> discount_code
    SELECT 
        "id" AS "discount_id",
        "code" AS "discount_code",
        "created_at",
        "price_rule_id",
        "updated_at",
        "usage_count"
    FROM "shopify_discount_code_data_projected"
),

"shopify_discount_code_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- updated_at: from VARCHAR to TIMESTAMP
    SELECT
        "discount_id",
        "discount_code",
        "price_rule_id",
        "usage_count",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at"
    FROM "shopify_discount_code_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_discount_code_data_projected_renamed_casted"
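
The cast step above converts created_at and updated_at from VARCHAR to TIMESTAMP. A small Python sketch of an equivalent parse follows, assuming the 'YYYY-MM-DD HH:MM:SS' layout seen in the preview rows. The format constant and function name are illustrative, and the ordering comparison is a sanity check one might add, not something the generated SQL performs.

```python
from datetime import datetime

# Assumed layout, based on the preview values such as '2021-12-10 06:48:35'.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(text: str) -> datetime:
    """Parse a timestamp string; raises ValueError if the layout differs."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


created_at = parse_timestamp("2021-12-10 06:48:35")
updated_at = parse_timestamp("2021-12-10 06:48:35")

# Optional sanity check: an update should not precede creation.
is_ordered = updated_at >= created_at
```

Strings in any other layout raise an error instead of being converted silently, which makes malformed values visible early.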

stg_shopify_discount_code_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_discount_code_data
  description: Shopify discount codes. Each row represents a specific discount code
    with its unique identifier, the code itself, creation and update timestamps,
    the associated price rule ID, and the usage count. The table tracks the information
    needed to manage and apply discounts in an online store.
  columns:
  - name: discount_id
    description: Unique identifier for the discount code entry
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each discount code
        entry. For this table, each row represents a specific discount code, and discount_id
        is unique across rows.
  - name: discount_code
    description: Unique discount code for customer use
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the actual discount code that customers use.
        For this table, each row represents a specific discount code, and discount_code
        is unique across rows as each code is designed to be distinct.
  - name: price_rule_id
    description: ID of the associated pricing rule
    tests:
    - not_null
  - name: usage_count
    description: Number of times the discount code has been used
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the discount code was created
    tests:
    - not_null
  - name: updated_at
    description: Timestamp of the last update to the entry
    tests:
    - not_null

stg_shopify_abandoned_checkout_discount_code_data (first 100 rows)

checkout_id discount_index discount_amount discount_code discount_type discount_created_at discount_updated_at
0 901163 0 0.0 CYBER12 percentage NaT NaT
1 4334827 0 0.0 CYBER12 percentage NaT NaT
2 4566403 0 0.0 BONUS percentage NaT NaT

stg_shopify_abandoned_checkout_discount_code_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_abandoned_checkout_discount_code_data_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "checkout_id",
        "index_",
        "amount",
        "discount_id",
        "code",
        "created_at",
        "type",
        "updated_at",
        "usage_count"
    FROM "shopify_abandoned_checkout_discount_code_data"
),

"shopify_abandoned_checkout_discount_code_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> discount_index
    -- amount -> discount_amount
    -- code -> discount_code
    -- created_at -> discount_created_at
    -- type -> discount_type
    -- updated_at -> discount_updated_at
    -- usage_count -> discount_usage_count
    SELECT 
        "checkout_id",
        "index_" AS "discount_index",
        "amount" AS "discount_amount",
        "discount_id",
        "code" AS "discount_code",
        "created_at" AS "discount_created_at",
        "type" AS "discount_type",
        "updated_at" AS "discount_updated_at",
        "usage_count" AS "discount_usage_count"
    FROM "shopify_abandoned_checkout_discount_code_data_projected"
),

"shopify_abandoned_checkout_discount_code_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- discount_created_at: from DECIMAL to TIMESTAMP
    -- discount_id: from DECIMAL to VARCHAR
    -- discount_updated_at: from DECIMAL to TIMESTAMP
    -- discount_usage_count: from DECIMAL to INT
    SELECT
        "checkout_id",
        "discount_index",
        "discount_amount",
        "discount_code",
        "discount_type",
        CAST("discount_created_at" AS TIMESTAMP) AS "discount_created_at",
        CAST("discount_id" AS VARCHAR) AS "discount_id",
        CAST("discount_updated_at" AS TIMESTAMP) AS "discount_updated_at",
        CAST("discount_usage_count" AS INT) AS "discount_usage_count"
    FROM "shopify_abandoned_checkout_discount_code_data_projected_renamed"
),

"shopify_abandoned_checkout_discount_code_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- discount_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- discount_usage_count has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "checkout_id",
        "discount_index",
        "discount_amount",
        "discount_code",
        "discount_type",
        "discount_created_at",
        "discount_updated_at"
    FROM "shopify_abandoned_checkout_discount_code_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_abandoned_checkout_discount_code_data_projected_renamed_casted_missing_handled"
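
The "Drop Column" strategy above depends on knowing what share of each column is missing. A minimal sketch of that measurement follows, assuming an in-memory SQLite database driven from Python; the table name `casted`, the helper `percent_missing`, and the three toy rows are illustrative stand-ins, not part of the generated pipeline.

```python
import sqlite3

# Toy stand-in for the casted CTE; table name and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE casted ("checkout_id" INTEGER, "discount_code" TEXT, '
    '"discount_id" TEXT, "discount_usage_count" INTEGER)'
)
conn.executemany(
    "INSERT INTO casted VALUES (?, ?, ?, ?)",
    [
        (901163, "CYBER12", None, None),
        (4334827, "CYBER12", None, None),
        (4566403, "BONUS", None, None),
    ],
)


def percent_missing(conn, table, column):
    """Return the percentage of NULL values in one column."""
    # COUNT(col) skips NULLs, so COUNT(*) - COUNT(col) is the NULL count.
    sql = (
        f'SELECT 100.0 * (COUNT(*) - COUNT("{column}")) / COUNT(*) '
        f'FROM "{table}"'
    )
    return conn.execute(sql).fetchone()[0]


# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute('PRAGMA table_info("casted")')]
missing = {col: percent_missing(conn, "casted", col) for col in columns}

# Columns that are entirely NULL are the candidates for dropping.
to_drop = [col for col, pct in missing.items() if pct == 100.0]
print(to_drop)
```

Only columns that are 100 percent missing are listed here, matching the two columns dropped in the block above. A lower cutoff would be a separate design decision.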

stg_shopify_abandoned_checkout_discount_code_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_abandoned_checkout_discount_code_data
  description: The table is about discount codes applied to abandoned checkouts in
    Shopify. It includes details like checkout ID, discount code, amount, and type
    (e.g., percentage). Each row represents a specific checkout with an applied discount
    code. The table tracks information about discounts offered to encourage completion
    of abandoned carts.
  columns:
  - name: checkout_id
    description: Unique identifier for the abandoned checkout
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each abandoned checkout.
        For this table, each row represents a specific checkout with an applied discount
        code. The checkout_id is unique across rows, as it's designed to uniquely
        identify each abandoned cart.
  - name: discount_index
    description: Position or order of the discount
    tests:
    - not_null
  - name: discount_amount
    description: Discount amount applied to the checkout
    tests:
    - not_null
  - name: discount_code
    description: Discount code applied to the checkout
    tests:
    - not_null
  - name: discount_type
    description: Type of discount (e.g., percentage)
    tests:
    - not_null
    - accepted_values:
        values:
        - percentage
        - fixed amount
        - buy one get one free (BOGO)
        - free shipping
        - bundle discount
        - loyalty points
        - seasonal discount
        - first-time customer discount
        - volume discount
        - rebate
  - name: discount_created_at
    description: Timestamp when the discount was created
    cocoon_meta:
      missing_acceptable: Not applicable for discounts without a recorded creation timestamp.
  - name: discount_updated_at
    description: Timestamp when the discount was last updated
    cocoon_meta:
      missing_acceptable: Not applicable for discounts that haven't been updated.

stg_shopify_customer_tag_data (first 100 rows)

tag_index tag_value customer_id
0 1 GGPP 9919268
1 1 GGPP 4404
2 1 GGPP 5509188

stg_shopify_customer_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_customer_tag_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "customer_id",
        "index_",
        "value_"
    FROM "shopify_customer_tag_data"
),

"shopify_customer_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> tag_index
    -- value_ -> tag_value
    SELECT 
        "customer_id",
        "index_" AS "tag_index",
        "value_" AS "tag_value"
    FROM "shopify_customer_tag_data_projected"
),

"shopify_customer_tag_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- customer_id: from INT to VARCHAR
    SELECT
        "tag_index",
        "tag_value",
        CAST("customer_id" AS VARCHAR) AS "customer_id"
    FROM "shopify_customer_tag_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_customer_tag_data_projected_renamed_casted"

stg_shopify_customer_tag_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_customer_tag_data
  description: The table is about customer tags in a Shopify system. It contains customer
    IDs and associated tag values. Each row pairs a customer identifier with one
    tag. The 'tag_index' column (renamed from 'index_') suggests there might be
    multiple tags per customer, but only one tag ('GGPP') is shown in the samples.
  columns:
  - name: tag_index
    description: Potential indicator for multiple tags per customer
    tests:
    - not_null
  - name: tag_value
    description: The tag value associated with the customer
    tests:
    - not_null
  - name: customer_id
    description: Unique identifier for each customer
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each customer. For this table,
        each row represents a tag associated with a customer. customer_id appears
        to be unique across rows in the sample data, and it's described as a "Unique
        identifier for each customer" in the given information.
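
Whether customer_id is unique depends on whether any customer carries more than one tag, which the three sample rows cannot settle. A minimal sketch of a duplicate-key check follows, assuming an in-memory SQLite table named `tags` and a hypothetical helper `duplicated_keys`; the second tag for customer 4404 is invented for illustration and does not appear in the sample.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE tags ("tag_index" INTEGER, "tag_value" TEXT, "customer_id" TEXT)'
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        (1, "GGPP", "9919268"),
        (1, "GGPP", "4404"),
        (2, "VIP", "4404"),  # hypothetical second tag, not in the sample data
        (1, "GGPP", "5509188"),
    ],
)


def duplicated_keys(conn, table, key_columns):
    """Return key values that occur on more than one row, with their counts."""
    cols = ", ".join(f'"{c}"' for c in key_columns)
    sql = (
        f'SELECT {cols}, COUNT(*) FROM "{table}" '
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


# customer_id alone repeats as soon as one customer has two tags ...
by_customer = duplicated_keys(conn, "tags", ["customer_id"])
# ... while the pair (customer_id, tag_index) stays unique in this toy data.
by_customer_and_index = duplicated_keys(conn, "tags", ["customer_id", "tag_index"])
print(by_customer, by_customer_and_index)
```

If the first query returns rows on the real table, a composite key of customer_id and tag_index may be the better candidate for the uniqueness test.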

stg_shopify_transaction_data (first 100 rows)

transaction_type amount avs_result_code transaction_status authorization_code currency_code is_test_transaction created_at credit_card_bin credit_card_company credit_card_number cvv_result_code exchange_adjustment exchange_currency exchange_final_amount exchange_id exchange_original_amount order_id parent_transaction_id processed_at receipt_details refund_id transaction_id
0 sale 415.00 Z success abcd999999 USD False 2020-02-27 16:05:37 None None None None None None None None None 2181743870023 None 2020-02-27 16:05:37 { "charges": { "data": [ { "balance_transaction": { "exchange_rate": null } }] }} None 2667417567303
1 sale 415.00 Y success abcd888888 USD False 2020-01-12 20:06:37 None None None None None None None None None 2089104834631 None 2020-01-12 20:06:37 None None 2572210896967
2 sale 415.00 None success abcd77777 USD False 2020-02-26 00:12:37 None None None None None None None None None 2179107356743 None 2020-02-26 00:12:37 { "charges": { "data": [ { "balance_transaction": { "exchange_rate": "0.523" } }] }} None 2664325611591
3 sale 15.95 Y success abcd66666 USD False 2020-01-26 11:04:41 None None None None None None None None None 2114590769223 None 2020-01-26 11:04:41 None None 2595729735751
4 sale 212.12 None success abcd5555 USD False 2020-03-18 00:17:24 None None None None None None None None None 2214516916295 None 2020-03-18 00:17:24 { "charges": { "data": [ { "balance_transaction": { "exchange_rate": "0.96581" } }] }} None 2705030512711
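
The receipt_details values above are nested JSON, with the exchange rate stored as a string or null. A minimal sketch of extracting it follows, assuming Python's standard `json` module; the helper name `exchange_rate` is hypothetical, and the receipts are copied from the sample rows.

```python
import json


def exchange_rate(receipt_text):
    """Return the first charge's exchange rate as a float, or None if absent."""
    if receipt_text is None:
        return None
    receipt = json.loads(receipt_text)
    charges = (receipt.get("charges") or {}).get("data") or []
    if not charges:
        return None
    balance = charges[0].get("balance_transaction") or {}
    rate = balance.get("exchange_rate")
    # The sample stores rates as strings ("0.523") or JSON null.
    return float(rate) if rate is not None else None


receipts = [
    '{ "charges": { "data": [ { "balance_transaction": { "exchange_rate": null } }] }}',
    None,
    '{ "charges": { "data": [ { "balance_transaction": { "exchange_rate": "0.523" } }] }}',
]
rates = [exchange_rate(r) for r in receipts]
print(rates)
```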

stg_shopify_transaction_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"shopify_transaction_data_projected" AS (
    -- Projection: Selecting 30 out of 31 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "order_id",
        "refund_id",
        "amount",
        "authorization_",
        "created_at",
        "processed_at",
        "device_id",
        "gateway",
        "source_name",
        "message",
        "currency",
        "location_id",
        "parent_id",
        "payment_avs_result_code",
        "kind",
        "currency_exchange_id",
        "currency_exchange_adjustment",
        "currency_exchange_original_amount",
        "currency_exchange_final_amount",
        "currency_exchange_currency",
        "error_code",
        "status",
        "test",
        "user_id",
        "payment_credit_card_bin",
        "payment_cvv_result_code",
        "payment_credit_card_number",
        "payment_credit_card_company",
        "receipt"
    FROM "shopify_transaction_data"
),

"shopify_transaction_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> transaction_id
    -- authorization_ -> authorization_code
    -- gateway -> payment_gateway
    -- message -> transaction_message
    -- currency -> currency_code
    -- parent_id -> parent_transaction_id
    -- payment_avs_result_code -> avs_result_code
    -- kind -> transaction_type
    -- currency_exchange_id -> exchange_id
    -- currency_exchange_adjustment -> exchange_adjustment
    -- currency_exchange_original_amount -> exchange_original_amount
    -- currency_exchange_final_amount -> exchange_final_amount
    -- currency_exchange_currency -> exchange_currency
    -- status -> transaction_status
    -- test -> is_test_transaction
    -- payment_credit_card_bin -> credit_card_bin
    -- payment_cvv_result_code -> cvv_result_code
    -- payment_credit_card_number -> credit_card_number
    -- payment_credit_card_company -> credit_card_company
    -- receipt -> receipt_details
    SELECT 
        "id" AS "transaction_id",
        "order_id",
        "refund_id",
        "amount",
        "authorization_" AS "authorization_code",
        "created_at",
        "processed_at",
        "device_id",
        "gateway" AS "payment_gateway",
        "source_name",
        "message" AS "transaction_message",
        "currency" AS "currency_code",
        "location_id",
        "parent_id" AS "parent_transaction_id",
        "payment_avs_result_code" AS "avs_result_code",
        "kind" AS "transaction_type",
        "currency_exchange_id" AS "exchange_id",
        "currency_exchange_adjustment" AS "exchange_adjustment",
        "currency_exchange_original_amount" AS "exchange_original_amount",
        "currency_exchange_final_amount" AS "exchange_final_amount",
        "currency_exchange_currency" AS "exchange_currency",
        "error_code",
        "status" AS "transaction_status",
        "test" AS "is_test_transaction",
        "user_id",
        "payment_credit_card_bin" AS "credit_card_bin",
        "payment_cvv_result_code" AS "cvv_result_code",
        "payment_credit_card_number" AS "credit_card_number",
        "payment_credit_card_company" AS "credit_card_company",
        "receipt" AS "receipt_details"
    FROM "shopify_transaction_data_projected"
),

"shopify_transaction_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- payment_gateway: 'gateway_here' is a placeholder, not a real payment gateway name (e.g., PayPal, Stripe, Square). The real gateway is unknown, so the value is mapped to an empty string to mark it as missing.
    -- source_name: the value 'source_name' is the column header repeated as data, not an actual source name. The real source is unknown, so the value is mapped to an empty string.
    -- transaction_message: 'message_here' is a placeholder and the only value in the column, so real transaction messages were not recorded. The value is mapped to an empty string.
    SELECT
        "transaction_id",
        "order_id",
        "refund_id",
        "amount",
        "authorization_code",
        "created_at",
        "processed_at",
        "device_id",
        CASE
            WHEN "payment_gateway" = 'gateway_here' THEN ''
            ELSE "payment_gateway"
        END AS "payment_gateway",
        CASE
            WHEN "source_name" = 'source_name' THEN ''
            ELSE "source_name"
        END AS "source_name",
        CASE
            WHEN "transaction_message" = 'message_here' THEN ''
            ELSE "transaction_message"
        END AS "transaction_message",
        "currency_code",
        "location_id",
        "parent_transaction_id",
        "avs_result_code",
        "transaction_type",
        "exchange_id",
        "exchange_adjustment",
        "exchange_original_amount",
        "exchange_final_amount",
        "exchange_currency",
        "error_code",
        "transaction_status",
        "is_test_transaction",
        "user_id",
        "credit_card_bin",
        "cvv_result_code",
        "credit_card_number",
        "credit_card_company",
        "receipt_details"
    FROM "shopify_transaction_data_projected_renamed"
),

"shopify_transaction_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- payment_gateway: ['']
    -- source_name: ['']
    -- transaction_message: ['']
    SELECT 
        CASE
            WHEN "payment_gateway" = '' THEN NULL
            ELSE "payment_gateway"
        END AS "payment_gateway",
        CASE
            WHEN "source_name" = '' THEN NULL
            ELSE "source_name"
        END AS "source_name",
        CASE
            WHEN "transaction_message" = '' THEN NULL
            ELSE "transaction_message"
        END AS "transaction_message",
        "exchange_currency",
        "location_id",
        "transaction_type",
        "error_code",
        "credit_card_company",
        "amount",
        "transaction_id",
        "user_id",
        "order_id",
        "exchange_final_amount",
        "credit_card_number",
        "avs_result_code",
        "cvv_result_code",
        "parent_transaction_id",
        "refund_id",
        "transaction_status",
        "authorization_code",
        "credit_card_bin",
        "currency_code",
        "device_id",
        "exchange_adjustment",
        "exchange_original_amount",
        "is_test_transaction",
        "exchange_id",
        "created_at",
        "processed_at",
        "receipt_details"
    FROM "shopify_transaction_data_projected_renamed_cleaned"
),

"shopify_transaction_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- credit_card_bin: from DECIMAL to VARCHAR
    -- credit_card_company: from DECIMAL to VARCHAR
    -- credit_card_number: from DECIMAL to VARCHAR
    -- cvv_result_code: from DECIMAL to VARCHAR
    -- device_id: from DECIMAL to VARCHAR
    -- error_code: from DECIMAL to VARCHAR
    -- exchange_adjustment: from DECIMAL to VARCHAR
    -- exchange_currency: from DECIMAL to VARCHAR
    -- exchange_final_amount: from DECIMAL to VARCHAR
    -- exchange_id: from DECIMAL to VARCHAR
    -- exchange_original_amount: from DECIMAL to VARCHAR
    -- location_id: from DECIMAL to VARCHAR
    -- order_id: from INT to VARCHAR
    -- parent_transaction_id: from DECIMAL to VARCHAR
    -- processed_at: from VARCHAR to TIMESTAMP
    -- receipt_details: from VARCHAR to JSON
    -- refund_id: from DECIMAL to VARCHAR
    -- transaction_id: from INT to VARCHAR
    -- user_id: from DECIMAL to VARCHAR
    SELECT
        "payment_gateway",
        "source_name",
        "transaction_message",
        "transaction_type",
        "amount",
        "avs_result_code",
        "transaction_status",
        "authorization_code",
        "currency_code",
        "is_test_transaction",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("credit_card_bin" AS VARCHAR) AS "credit_card_bin",
        CAST("credit_card_company" AS VARCHAR) AS "credit_card_company",
        CAST("credit_card_number" AS VARCHAR) AS "credit_card_number",
        CAST("cvv_result_code" AS VARCHAR) AS "cvv_result_code",
        CAST("device_id" AS VARCHAR) AS "device_id",
        CAST("error_code" AS VARCHAR) AS "error_code",
        CAST("exchange_adjustment" AS VARCHAR) AS "exchange_adjustment",
        CAST("exchange_currency" AS VARCHAR) AS "exchange_currency",
        CAST("exchange_final_amount" AS VARCHAR) AS "exchange_final_amount",
        CAST("exchange_id" AS VARCHAR) AS "exchange_id",
        CAST("exchange_original_amount" AS VARCHAR) AS "exchange_original_amount",
        CAST("location_id" AS VARCHAR) AS "location_id",
        CAST("order_id" AS VARCHAR) AS "order_id",
        CAST("parent_transaction_id" AS VARCHAR) AS "parent_transaction_id",
        CAST("processed_at" AS TIMESTAMP) AS "processed_at",
        CAST("receipt_details" AS JSON) AS "receipt_details",
        CAST("refund_id" AS VARCHAR) AS "refund_id",
        CAST("transaction_id" AS VARCHAR) AS "transaction_id",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "shopify_transaction_data_projected_renamed_cleaned_null"
),

"shopify_transaction_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 7 columns with unacceptable missing values
    -- device_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- error_code has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- location_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- payment_gateway has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- source_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- transaction_message has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- user_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "transaction_type",
        "amount",
        "avs_result_code",
        "transaction_status",
        "authorization_code",
        "currency_code",
        "is_test_transaction",
        "created_at",
        "credit_card_bin",
        "credit_card_company",
        "credit_card_number",
        "cvv_result_code",
        "exchange_adjustment",
        "exchange_currency",
        "exchange_final_amount",
        "exchange_id",
        "exchange_original_amount",
        "order_id",
        "parent_transaction_id",
        "processed_at",
        "receipt_details",
        "refund_id",
        "transaction_id"
    FROM "shopify_transaction_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "shopify_transaction_data_projected_renamed_cleaned_null_casted_missing_handled"
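
The model above removes placeholders in two steps: placeholder to empty string, then empty string to NULL. A minimal sketch of the combined effect follows, assuming an in-memory SQLite table named `tx`; the 'paypal' value is a hypothetical real gateway added so that one row survives, and merging the two steps into one CASE is a simplification, not the generated code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tx ("transaction_id" TEXT, "payment_gateway" TEXT)')
conn.executemany(
    "INSERT INTO tx VALUES (?, ?)",
    [
        ("2667417567303", "gateway_here"),  # placeholder seen in the source
        ("2572210896967", ""),              # disguised missing value
        ("2664325611591", "paypal"),        # hypothetical real value
    ],
)

# One CASE covers both the placeholder and the empty string.
rows = conn.execute(
    """
    SELECT
        "transaction_id",
        CASE
            WHEN "payment_gateway" IN ('gateway_here', '') THEN NULL
            ELSE "payment_gateway"
        END AS "payment_gateway"
    FROM tx
    ORDER BY "transaction_id"
    """
).fetchall()
print(rows)
```

Keeping the two steps separate, as the generated SQL does, leaves a record of which values were placeholders and which were already empty; the single CASE is shorter but loses that distinction.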

stg_shopify_transaction_data.yml (Document the table)

version: 2
models:
- name: stg_shopify_transaction_data
  description: The table is about financial transactions. It contains details like
    transaction ID, order ID, amount, currency, transaction type, and status. Each
    row represents a single transaction. The table includes information on payment
    processing, currency exchange, and credit card details. It also has timestamps
    for when transactions were created and processed.
  columns:
  - name: transaction_type
    description: Type of transaction
    tests:
    - not_null
    - accepted_values:
        values:
        - sale
        - purchase
        - refund
        - exchange
        - rental
        - subscription
        - deposit
        - withdrawal
        - transfer
        - payment
  - name: amount
    description: Transaction amount
    tests:
    - not_null
  - name: avs_result_code
    description: Address Verification System result code
    tests:
    - accepted_values:
        values:
        - A
        - B
        - C
        - D
        - E
        - F
        - G
        - I
        - M
        - N
        - P
        - R
        - S
        - U
        - W
        - X
        - Y
        - Z
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without address verification.
  - name: transaction_status
    description: Status of the transaction
    tests:
    - not_null
    - accepted_values:
        values:
        - success
        - failure
        - pending
        - cancelled
        - rejected
        - refunded
        - expired
        - authorized
        - captured
        - settled
  - name: authorization_code
    description: Authorization code for the transaction
    tests:
    - not_null
  - name: currency_code
    description: Currency code of the transaction
    tests:
    - not_null
  - name: is_test_transaction
    description: Indicates if transaction is a test
    tests:
    - not_null
  - name: created_at
    description: Timestamp of transaction creation
    tests:
    - not_null
  - name: credit_card_bin
    description: Bank Identification Number of credit card
    cocoon_meta:
      missing_acceptable: Not applicable for non-credit card transactions.
  - name: credit_card_company
    description: Credit card company
    cocoon_meta:
      missing_acceptable: Not applicable for non-credit card transactions.
  - name: credit_card_number
    description: Masked credit card number
    cocoon_meta:
      missing_acceptable: Not applicable for non-credit card transactions.
  - name: cvv_result_code
    description: Card Verification Value result code
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without CVV verification.
  - name: exchange_adjustment
    description: Adjustment for currency exchange
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without currency exchange.
  - name: exchange_currency
    description: Currency used in exchange
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without currency exchange.
  - name: exchange_final_amount
    description: Final amount after currency exchange
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without currency exchange.
  - name: exchange_id
    description: Identifier for currency exchange
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without currency exchange.
  - name: exchange_original_amount
    description: Original amount before currency exchange
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without currency exchange.
  - name: order_id
    description: Identifier for the associated order
    tests:
    - not_null
  - name: parent_transaction_id
    description: Identifier of parent transaction if applicable
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without a parent transaction.
  - name: processed_at
    description: Timestamp of transaction processing
    tests:
    - not_null
  - name: receipt_details
    description: Receipt details in JSON format
    cocoon_meta:
      missing_acceptable: Not applicable for transactions without detailed receipt
        information.
  - name: refund_id
    description: Identifier for associated refund
    cocoon_meta:
      missing_acceptable: Not applicable for transactions that are not refunds.
  - name: transaction_id
    description: Unique identifier for the transaction
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each transaction.
        For this table, each row is a unique transaction. Transaction IDs are typically
        designed to be unique across all transactions in a system.

Some tables log change events, so querying them directly returns redundant historical rows. Instead, we take a snapshot that keeps only the latest version of each record.

snapshot_shopify_location_data (first 100 rows)

is_deleted location_name is_active province_state is_legacy local_province_name country_name province_state_code primary_address iso_country_code location_id local_country_name country_code creation_timestamp postal_code secondary_address
0 False Plum True None True None United States None None US 8777748 United States US 2019-06-11 15:58:20 None None
1 False Plum Express True NY False New York United States NY 111 Tree Road US 7748 United States US 2018-12-10 16:24:07 7394.0 None

snapshot_shopify_location_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "location_id"
-- Effective date columns are "last_update_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "is_deleted",
    "location_name",
    "is_active",
    "province_state",
    "is_legacy",
    "local_province_name",
    "country_name",
    "province_state_code",
    "primary_address",
    "iso_country_code",
    "location_id",
    "local_country_name",
    "country_code",
    "creation_timestamp",
    "postal_code",
    "secondary_address"
FROM (
     SELECT 
            "is_deleted",
            "location_name",
            "is_active",
            "province_state",
            "is_legacy",
            "local_province_name",
            "country_name",
            "province_state_code",
            "primary_address",
            "iso_country_code",
            "location_id",
            "local_country_name",
            "country_code",
            "creation_timestamp",
            "postal_code",
            "secondary_address",
            ROW_NUMBER() OVER (
                PARTITION BY "location_id"
                ORDER BY "last_update_timestamp" DESC
            ) AS "cocoon_rn"
    FROM "stg_shopify_location_data"
) ranked
WHERE "cocoon_rn" = 1
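
The latest-snapshot pattern above can be tried on toy data. A minimal sketch follows, assuming an in-memory SQLite database (window functions need SQLite 3.25 or newer); the table name `stg_location`, the earlier "Plum Xpress" version, and its timestamps are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE stg_location ("location_id" TEXT, "location_name" TEXT, '
    '"last_update_timestamp" TEXT)'
)
conn.executemany(
    "INSERT INTO stg_location VALUES (?, ?, ?)",
    [
        ("7748", "Plum Xpress", "2018-12-10 16:24:07"),   # hypothetical older version
        ("7748", "Plum Express", "2019-01-05 09:00:00"),  # hypothetical latest version
        ("8777748", "Plum", "2019-06-11 15:58:20"),
    ],
)

# Rank the versions of each location from newest to oldest and keep rank 1.
# ISO-formatted timestamps sort correctly as text.
latest = conn.execute(
    """
    SELECT "location_id", "location_name"
    FROM (
        SELECT
            "location_id",
            "location_name",
            ROW_NUMBER() OVER (
                PARTITION BY "location_id"
                ORDER BY "last_update_timestamp" DESC
            ) AS "cocoon_rn"
        FROM stg_location
    ) ranked
    WHERE "cocoon_rn" = 1
    ORDER BY "location_id"
    """
).fetchall()
print(latest)
```

ROW_NUMBER assigns exactly one row per key the rank 1, so the result has one row per location_id even when two versions share a timestamp; which of the tied versions wins is then arbitrary.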

snapshot_shopify_location_data.yml (Document the table)

version: 2
models:
- name: snapshot_shopify_location_data
  description: The table contains the latest information about Shopify store locations.
    It tracks the most recent version of each unique location, identified by location_id.
    The table includes details such as location name, address, country, province/state,
    postal code, and status (active/inactive). It covers both physical and online
    store locations, omitting historical versions and update timestamps.
  columns:
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: location_name
    description: Name of the store location
    tests:
    - not_null
  - name: is_active
    description: Indicates if the location is currently active
    tests:
    - not_null
  - name: province_state
    description: Province or state of the location
    tests:
    - accepted_values:
        values:
        - AL
        - AK
        - AZ
        - AR
        - CA
        - CO
        - CT
        - DE
        - FL
        - GA
        - HI
        - ID
        - IL
        - IN
        - IA
        - KS
        - KY
        - LA
        - ME
        - MD
        - MA
        - MI
        - MN
        - MS
        - MO
        - MT
        - NE
        - NV
        - NH
        - NJ
        - NM
        - NY
        - NC
        - ND
        - OH
        - OK
        - OR
        - PA
        - RI
        - SC
        - SD
        - TN
        - TX
        - UT
        - VT
        - VA
        - WA
        - WV
        - WI
        - WY
  - name: is_legacy
    description: Indicates if the location is a legacy entry
    tests:
    - not_null
  - name: local_province_name
    description: Province name in local language
    cocoon_meta:
      missing_acceptable: Not applicable for locations without a recorded province.
  - name: country_name
    description: Full name of the country
    tests:
    - not_null
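  - name: province_state_code
    description: Code for the province or state
    cocoon_meta:
      missing_acceptable: Not applicable for locations without a recorded province.
  - name: primary_address
    description: Primary address line of the location
    cocoon_meta:
      missing_acceptable: Not applicable for locations without a recorded street address.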
  - name: iso_country_code
    description: ISO country code of the location
    tests:
    - not_null
  - name: location_id
    description: Unique identifier for the location
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: local_country_name
    description: Country name in local language
    tests:
    - not_null
  - name: country_code
    description: Country code where the location is situated
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp when the location was created
    tests:
    - not_null
  - name: postal_code
    description: Postal or ZIP code of the location
    cocoon_meta:
      missing_acceptable: Not applicable for locations without a recorded postal code.
  - name: secondary_address
    description: Secondary address line of the location
    cocoon_meta:
      missing_acceptable: Not all locations have or need a secondary address.
cocoon_meta:
  scd_base_table: stg_shopify_location_data

snapshot_shopify_product_data (first 100 rows)

product_title product_handle product_type vendor_id visibility_scope is_deleted created_at product_id published_at
0 1fccbdc6ac5f6edabf76e56eb0460019 f4b6d0e4413a19b2e7a291f0ef4dc98f fdb42fcb90ecd31c015932ffcd313014 13aea892c8de2d62f2608c6191cfab1f web False 2020-02-14 19:18:05 4506451050593 2020-02-14 19:02:02
1 c6c6fea8419b94103b0b05d64a5bab10 f0a656254aca08bf40181226ac13418c fdb42fcb90ecd31c015932ffcd313014 57403999f78b01b3fd325ba256eafe94 global False 2020-02-14 02:09:59 4505775439969 2020-02-14 02:09:59
2 327ea22d0f91783418e519cb45a4a3e9 129181bbc087330e216a6a4d7939f00b ec3bb3dd6e9d1f348a040ee7b45f1a72 13aea892c8de2d62f2608c6191cfab1f web False 2020-03-04 05:04:32 4526236893281 2020-03-04 05:04:32

snapshot_shopify_product_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "product_id"
-- Effective date columns are "updated_at"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "product_title",
    "product_handle",
    "product_type",
    "vendor_id",
    "visibility_scope",
    "is_deleted",
    "created_at",
    "product_id",
    "published_at"
FROM (
     SELECT 
            "product_title",
            "product_handle",
            "product_type",
            "vendor_id",
            "visibility_scope",
            "is_deleted",
            "created_at",
            "product_id",
            "published_at",
            ROW_NUMBER() OVER (
                PARTITION BY "product_id"
                ORDER BY "updated_at" DESC
            ) AS "cocoon_rn"
    FROM "stg_shopify_product_data"
) ranked
WHERE "cocoon_rn" = 1
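
The latest-snapshot pattern used above can be tried on a toy table. The following is a minimal sketch, not part of the generated model: the table name "stg_product", the three sample rows, and the alias "rn" are hypothetical, and it assumes the bundled SQLite is version 3.25 or later so that window functions are available.

```python
import sqlite3

# Hypothetical staging rows: two versions of product 1, one version of product 2.
rows = [
    (1, "old title", "2020-02-14 19:18:05"),
    (1, "new title", "2020-03-01 08:00:00"),
    (2, "other product", "2020-02-14 02:09:59"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "stg_product" '
    '("product_id" INTEGER, "product_title" TEXT, "updated_at" TEXT)'
)
conn.executemany('INSERT INTO "stg_product" VALUES (?, ?, ?)', rows)

# Rank the versions of each product from newest to oldest and keep rank 1.
# ISO-formatted timestamps sort correctly as text.
latest = conn.execute(
    '''
    SELECT "product_id", "product_title"
    FROM (
        SELECT
            "product_id",
            "product_title",
            ROW_NUMBER() OVER (
                PARTITION BY "product_id"
                ORDER BY "updated_at" DESC
            ) AS "rn"
        FROM "stg_product"
    ) ranked
    WHERE "rn" = 1
    ORDER BY "product_id"
    '''
).fetchall()

print(latest)
```

Only one row per "product_id" survives, and the effective-date column is used for ranking but not selected, which matches the Type 1 behaviour described in the SQL comments. If two versions shared the same timestamp, the row that receives rank 1 would be arbitrary.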

snapshot_shopify_product_data.yml (Document the table)

version: 2
models:
- name: snapshot_shopify_product_data
  description: The table contains the latest Shopify product data. It includes current
    product details such as ID, title, handle, type, vendor, visibility, and deletion
    status. The table tracks the most recent version of each unique product on the
    Shopify platform. It provides a snapshot of up-to-date product information without
    historical versions or update timestamps.
  columns:
  - name: product_title
    description: Name or title of the product
    tests:
    - not_null
  - name: product_handle
    description: Unique URL-friendly string for the product
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique URL-friendly string for the product.
        For this table, each row is for a unique product. The product handle is typically
        generated to be unique for each product in Shopify, making it a good candidate
        for a key.
  - name: product_type
    description: Category or type of the product
    tests:
    - not_null
  - name: vendor_id
    description: Identifier for the product's vendor
    tests:
    - not_null
  - name: visibility_scope
    description: Visibility scope of the product (web/global)
    tests:
    - not_null
    - accepted_values:
        values:
        - web
        - global
  - name: is_deleted
    description: Indicates if the product has been deleted
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the product was created
    tests:
    - not_null
  - name: product_id
    description: Unique identifier for the product
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: published_at
    description: Timestamp when the product was published
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_shopify_product_data

snapshot_shopify_price_rule_data (first 100 rows)

price_rule_id allocation_method customer_eligibility one_time_use subtotal_prerequisite discount_target target_type price_rule_name discount_value discount_type allocation_limit creation_date expiration_date start_date usage_limit
0 564075 across all False NaN entitled line_item THANKS 0.0 percentage None 2021-11-10 22:26:31 2021-11-30 14:00:59 2021-11-10 22:25:32 None
1 9339 across all False NaN all line_item THANKS 0.0 percentage None 2021-11-11 22:38:18 2021-12-02 19:00:59 2021-11-23 21:30:38 None
2 11443 across all False 500.0 all line_item GIFTCARD 0.0 percentage None 2021-03-09 18:57:54 2021-03-22 07:00:59 2021-03-17 04:00:57 None

snapshot_shopify_price_rule_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "price_rule_id"
-- Effective date columns are "last_updated"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "price_rule_id",
    "allocation_method",
    "customer_eligibility",
    "one_time_use",
    "subtotal_prerequisite",
    "discount_target",
    "target_type",
    "price_rule_name",
    "discount_value",
    "discount_type",
    "allocation_limit",
    "creation_date",
    "expiration_date",
    "start_date",
    "usage_limit"
FROM (
     SELECT 
            "price_rule_id",
            "allocation_method",
            "customer_eligibility",
            "one_time_use",
            "subtotal_prerequisite",
            "discount_target",
            "target_type",
            "price_rule_name",
            "discount_value",
            "discount_type",
            "allocation_limit",
            "creation_date",
            "expiration_date",
            "start_date",
            "usage_limit",
            ROW_NUMBER() OVER (
                PARTITION BY "price_rule_id"
                ORDER BY "last_updated" DESC
            ) AS "cocoon_rn"
    FROM "stg_shopify_price_rule_data"
) ranked
WHERE "cocoon_rn" = 1

snapshot_shopify_price_rule_data.yml (Document the table)

version: 2
models:
- name: snapshot_shopify_price_rule_data
  description: The table tracks the most recent versions of Shopify price rules. It
    contains details of current discount configurations, including rule IDs, customer
    eligibility, and discount types. Each rule specifies target products, discount
    values, and any prerequisites. The table omits historical versions and update
    timestamps. It provides a snapshot of active price rules for managing discounts
    in the Shopify platform.
  columns:
  - name: price_rule_id
    description: Unique identifier for the price rule
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: allocation_method
    description: Method for allocating discount across products
    tests:
    - not_null
    - accepted_values:
        values:
        - proportional
        - equal
        - first item
        - last item
        - highest priced item
        - lowest priced item
        - random
        - across
  - name: customer_eligibility
    description: Specifies which customers are eligible
    tests:
    - not_null
    - accepted_values:
        values:
        - all
        - new
        - existing
        - premium
        - standard
        - vip
        - loyalty_program
        - first_time
        - returning
        - age_18_plus
        - age_21_plus
        - students
        - seniors
        - military
        - corporate
  - name: one_time_use
    description: Indicates if discount is one-time use
    tests:
    - not_null
  - name: subtotal_prerequisite
    description: Required subtotal range for discount eligibility
    cocoon_meta:
      missing_acceptable: Not applicable when no minimum purchase is required.
  - name: discount_target
    description: Specifies which items the discount applies to
    tests:
    - not_null
    - accepted_values:
        values:
        - all
        - entitled
        - specific
  - name: target_type
    description: Type of target for the discount
    tests:
    - not_null
    - accepted_values:
        values:
        - line_item
        - order
        - shipping
        - product
        - category
        - customer
        - customer_group
  - name: price_rule_name
    description: Name or title of the price rule
    tests:
    - not_null
  - name: discount_value
    description: Numerical value of the discount
    tests:
    - not_null
  - name: discount_type
    description: Type of value (percentage or fixed amount)
    tests:
    - not_null
    - accepted_values:
        values:
        - percentage
        - fixed amount
  - name: allocation_limit
    description: Limits how discount is allocated
    cocoon_meta:
      missing_acceptable: Not applicable when allocation method is 'across'.
  - name: creation_date
    description: Timestamp when the price rule was created
    tests:
    - not_null
  - name: expiration_date
    description: Timestamp when the price rule expires
    tests:
    - not_null
  - name: start_date
    description: Timestamp when the price rule becomes active
    tests:
    - not_null
  - name: usage_limit
    description: Maximum number of times rule can be used
    cocoon_meta:
      missing_acceptable: Not applicable when there's no limit on usage.
cocoon_meta:
  scd_base_table: stg_shopify_price_rule_data
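
To make the column tests above concrete, here is a rough sketch of what not_null, unique, and accepted_values each check, written as plain Python over a list of dictionaries. The helper functions are hypothetical and only loosely mirror the idea of reporting failing rows; the sample rows reuse a few values from the preview of snapshot_shopify_price_rule_data.

```python
from collections import Counter

def not_null(rows, column):
    """Return the rows in which the column is missing (None)."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Return the non-null values that occur more than once in the column."""
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return sorted(v for v, n in counts.items() if n > 1)

def accepted_values(rows, column, values):
    """Return the non-null values that fall outside the allowed set."""
    allowed = set(values)
    return sorted(
        {r[column] for r in rows
         if r.get(column) is not None and r[column] not in allowed}
    )

# A few columns from the three preview rows shown earlier.
rows = [
    {"price_rule_id": 564075, "discount_target": "entitled",
     "discount_type": "percentage", "usage_limit": None},
    {"price_rule_id": 9339, "discount_target": "all",
     "discount_type": "percentage", "usage_limit": None},
    {"price_rule_id": 11443, "discount_target": "all",
     "discount_type": "percentage", "usage_limit": None},
]

print(not_null(rows, "price_rule_id"))
print(unique(rows, "price_rule_id"))
print(accepted_values(rows, "discount_target", ["all", "entitled", "specific"]))
print(len(not_null(rows, "usage_limit")))
```

An empty result means the check passes. The usage_limit column is empty in every preview row, which is consistent with it carrying a missing_acceptable note rather than a not_null test.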

snapshot_shopify_product_variant_data (first 100 rows)

title display_position inventory_policy fulfillment_service inventory_management is_taxable weight_grams stock_quantity weight_unit previous_stock_quantity requires_shipping tax_code option_1 created_at image_id inventory_item_id price product_id variant_id weight
0 my title here 1 deny manual None False 0 0 lb 0 False None my title here 2021-03-17 16:39:45 None 41367035936839 222 6544066379847 39273118957639 0.0
1 my other title 1 deny manual inventory manager True 222 0 lb 0 True TR9999 my other title 2019-06-25 18:32:03 None 30309980143686 444 3879735590982 29217058947142 1.0
2 my title here 1 deny manual None False 0 -5 lb -5 False None my title here 2021-03-08 16:31:31 None 41356022644807 333 6540109250631 39262115397703 0.0
3 my title here 1 deny manual inventory manager True 0 0 lb 0 True None my title here 2021-03-30 19:48:15 None 41384094924871 5 6548438188103 39290169262151 0.0
4 my title here 1 deny manual None False 0 0 lb 0 False None my title here 2021-03-08 16:30:15 None 41356021661767 111 6540108431431 39262114414663 0.0

snapshot_shopify_product_variant_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "variant_id"
-- Effective date columns are "updated_at"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "title",
    "display_position",
    "inventory_policy",
    "fulfillment_service",
    "inventory_management",
    "is_taxable",
    "weight_grams",
    "stock_quantity",
    "weight_unit",
    "previous_stock_quantity",
    "requires_shipping",
    "tax_code",
    "option_1",
    "created_at",
    "image_id",
    "inventory_item_id",
    "price",
    "product_id",
    "variant_id",
    "weight"
FROM (
     SELECT 
            "title",
            "display_position",
            "inventory_policy",
            "fulfillment_service",
            "inventory_management",
            "is_taxable",
            "weight_grams",
            "stock_quantity",
            "weight_unit",
            "previous_stock_quantity",
            "requires_shipping",
            "tax_code",
            "option_1",
            "created_at",
            "image_id",
            "inventory_item_id",
            "price",
            "product_id",
            "variant_id",
            "weight",
            ROW_NUMBER() OVER (
                PARTITION BY "variant_id"
                ORDER BY "updated_at" DESC
            ) AS "cocoon_rn"
    FROM "stg_shopify_product_variant_data"
) ranked
WHERE "cocoon_rn" = 1

snapshot_shopify_product_variant_data.yml (Document the table)

version: 2
models:
- name: snapshot_shopify_product_variant_data
  description: The table contains current Shopify product variants. It tracks the
    most recent version of each variant, including its title, price, inventory status,
    and shipping details. Each row represents a unique product variant identified
    by its variant ID. The table excludes historical versions and update timestamps,
    focusing on the latest information for each variant in the Shopify e-commerce
    system.
  columns:
  - name: title
    description: Title or name of the variant
    tests:
    - not_null
  - name: display_position
    description: Position of the variant in listings
    tests:
    - not_null
  - name: inventory_policy
    description: Policy for handling out-of-stock items
    tests:
    - not_null
    - accepted_values:
        values:
        - deny
        - backorder
        - substitute
        - notify
        - waitlist
  - name: fulfillment_service
    description: Service used for order fulfillment
    tests:
    - not_null
    - accepted_values:
        values:
        - manual
        - amazon
        - shipwire
        - webgistix
        - shipstation
        - shopify_fulfillment
        - third_party
        - self_fulfilled
        - drop_ship
        - fba (Fulfillment by Amazon)
        - external
  - name: inventory_management
    description: Method used for inventory management
    tests:
    - not_null
    - accepted_values:
        values:
        - inventory manager
        - just-in-time (JIT)
        - economic order quantity (EOQ)
        - abc analysis
        - first-in, first-out (FIFO)
        - last-in, first-out (LIFO)
        - safety stock
        - vendor-managed inventory (VMI)
        - consignment inventory
        - dropshipping
        - perpetual inventory system
        - periodic inventory system
        - barcode system
        - radio-frequency identification (RFID)
        - cycle counting
        - min-max inventory method
        - reorder point planning
        - materials requirement planning (MRP)
        - batch tracking
        - demand forecasting
  - name: is_taxable
    description: Indicates if the variant is taxable
    tests:
    - not_null
  - name: weight_grams
    description: Weight of the product in grams
    tests:
    - not_null
  - name: stock_quantity
    description: Current quantity in stock
    tests:
    - not_null
  - name: weight_unit
    description: Unit of measurement for weight
    tests:
    - not_null
    - accepted_values:
        values:
        - lb
        - kg
        - g
        - oz
        - stone
        - ton
        - metric ton
        - mg
  - name: previous_stock_quantity
    description: Previous quantity in stock
    tests:
    - not_null
  - name: requires_shipping
    description: Indicates if shipping is required
    tests:
    - not_null
  - name: tax_code
    description: Tax code for the variant
    tests:
    - not_null
  - name: option_1
    description: Primary product option
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the variant was created
    tests:
    - not_null
  - name: image_id
    description: Identifier for the variant's image
    cocoon_meta:
      missing_acceptable: Not all products require an image.
  - name: inventory_item_id
    description: Identifier for inventory tracking
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is an identifier for inventory tracking. For this table,
        each row is for a specific product variant. As it's an identifier specifically
        for inventory items, it's likely to be unique for each variant.
  - name: price
    description: Current price of the variant
    tests:
    - not_null
  - name: product_id
    description: Identifier of the parent product
    tests:
    - not_null
  - name: variant_id
    description: Unique identifier for the variant
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: weight
    description: Weight of the product
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_shopify_product_variant_data

We identify the primary key (PK) and foreign keys (FK) of each table, then build a join graph that connects each FK to the PK it references.

Join Graph (FK to PK)

[Figure: directed join graph. Each edge runs from the table that holds a foreign key to the table whose primary key it references; the same edges are listed in cocoon_join.yml below.]
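
As a rough illustration of how such a graph can be used, the sketch below stores a handful of the FK-to-PK edges as an adjacency list and walks them to find every table that can be joined to, directly or transitively, from a starting table. The edge list is a small subset of the graph, and the reachable helper is hypothetical.

```python
from collections import defaultdict

# (table holding the FK, table holding the referenced PK)
edges = [
    ("stg_shopify_order_line_data", "stg_shopify_order_data"),
    ("stg_shopify_order_line_data", "snapshot_shopify_product_data"),
    ("stg_shopify_order_line_data", "snapshot_shopify_product_variant_data"),
    ("snapshot_shopify_product_variant_data", "snapshot_shopify_product_data"),
    ("stg_shopify_order_data", "stg_shopify_customer_data"),
]

# Adjacency list: each table maps to the tables it references.
graph = defaultdict(list)
for fk_table, pk_table in edges:
    graph[fk_table].append(pk_table)

def reachable(start):
    """Return every table reachable from start by following FK -> PK edges."""
    seen, stack = set(), [start]
    while stack:
        table = stack.pop()
        for parent in graph.get(table, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)

print(reachable("stg_shopify_order_line_data"))
```

Starting from the order line table, the walk reaches orders, customers, products, and product variants, which is the set of tables an order-line query could pull attributes from by chaining joins.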

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_shopify_abandoned_checkout_data
  primary_key: checkout_id
  foreign_keys:
  - column: customer_id
    reference:
      table_name: stg_shopify_customer_data
      column: customer_id
- table_name: stg_shopify_abandoned_checkout_discount_code_data
  foreign_keys:
  - column: checkout_id
    reference:
      table_name: stg_shopify_abandoned_checkout_data
      column: checkout_id
- table_name: stg_shopify_abandoned_checkout_shipping_line_data
  foreign_keys:
  - column: checkout_id
    reference:
      table_name: stg_shopify_abandoned_checkout_data
      column: checkout_id
- table_name: stg_shopify_collection_data
  primary_key: collection_id
  foreign_keys: []
- table_name: stg_shopify_collection_product_data
  foreign_keys:
  - column: collection_id
    reference:
      table_name: stg_shopify_collection_data
      column: collection_id
  - column: product_id
    reference:
      table_name: snapshot_shopify_product_data
      column: product_id
- table_name: stg_shopify_customer_data
  primary_key: customer_id
  foreign_keys: []
- table_name: stg_shopify_customer_tag_data
  foreign_keys:
  - column: customer_id
    reference:
      table_name: stg_shopify_customer_data
      column: customer_id
- table_name: stg_shopify_order_data
  foreign_keys:
  - column: customer_id
    reference:
      table_name: stg_shopify_customer_data
      column: customer_id
  primary_key: order_id
- table_name: stg_shopify_refund_data
  foreign_keys:
  - column: customer_id
    reference:
      table_name: stg_shopify_customer_data
      column: customer_id
  primary_key: refund_id
- table_name: stg_shopify_fulfillment_data
  primary_key: fulfillment_id
  foreign_keys:
  - column: location_id
    reference:
      table_name: snapshot_shopify_location_data
      column: location_id
- table_name: stg_shopify_fulfillment_event_data
  foreign_keys:
  - column: fulfillment_id
    reference:
      table_name: stg_shopify_fulfillment_data
      column: fulfillment_id
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  - column: shop_id
    reference:
      table_name: stg_shopify_shop_data
      column: shop_id
- table_name: stg_shopify_inventory_item_data
  primary_key: item_id
  foreign_keys: []
- table_name: stg_shopify_inventory_level_data
  foreign_keys:
  - column: inventory_item_id
    reference:
      table_name: stg_shopify_inventory_item_data
      column: item_id
  - column: location_id
    reference:
      table_name: snapshot_shopify_location_data
      column: location_id
- table_name: snapshot_shopify_product_variant_data
  foreign_keys:
  - column: inventory_item_id
    reference:
      table_name: stg_shopify_inventory_item_data
      column: item_id
  - column: image_id
    reference:
      table_name: stg_shopify_product_image_data
      column: image_id
  - column: product_id
    reference:
      table_name: snapshot_shopify_product_data
      column: product_id
  primary_key: variant_id
- table_name: stg_shopify_order_adjustment_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  - column: refund_id
    reference:
      table_name: stg_shopify_refund_data
      column: refund_id
- table_name: stg_shopify_order_discount_code_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
- table_name: stg_shopify_order_line_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  - column: product_id
    reference:
      table_name: snapshot_shopify_product_data
      column: product_id
  - column: variant_id
    reference:
      table_name: snapshot_shopify_product_variant_data
      column: variant_id
  primary_key: line_item_id
- table_name: stg_shopify_order_note_attribute_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
- table_name: stg_shopify_order_shipping_line_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  primary_key: shipping_line_id
- table_name: stg_shopify_order_tag_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
- table_name: stg_shopify_order_url_tag_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
- table_name: stg_shopify_metafield_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
- table_name: stg_shopify_transaction_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  primary_key: transaction_id
- table_name: stg_shopify_tender_transaction_data
  foreign_keys:
  - column: order_id
    reference:
      table_name: stg_shopify_order_data
      column: order_id
  - column: transaction_id
    reference:
      table_name: stg_shopify_transaction_data
      column: transaction_id
- table_name: stg_shopify_order_line_refund_data
  foreign_keys:
  - column: original_order_line_id
    reference:
      table_name: stg_shopify_order_line_data
      column: line_item_id
  - column: refund_id
    reference:
      table_name: stg_shopify_refund_data
      column: refund_id
- table_name: stg_shopify_order_shipping_tax_line_data
  foreign_keys:
  - column: order_shipping_line_id
    reference:
      table_name: stg_shopify_order_shipping_line_data
      column: shipping_line_id
- table_name: stg_shopify_product_image_data
  primary_key: image_id
  foreign_keys:
  - column: product_id
    reference:
      table_name: snapshot_shopify_product_data
      column: product_id
- table_name: stg_shopify_shop_data
  primary_key: shop_id
  foreign_keys:
  - column: primary_location_id
    reference:
      table_name: snapshot_shopify_location_data
      column: location_id
- table_name: snapshot_shopify_location_data
  primary_key: location_id
  foreign_keys: []
- table_name: snapshot_shopify_price_rule_data
  primary_key: price_rule_id
  foreign_keys: []
- table_name: stg_shopify_discount_code_data
  foreign_keys:
  - column: price_rule_id
    reference:
      table_name: snapshot_shopify_price_rule_data
      column: price_rule_id
- table_name: snapshot_shopify_product_data
  primary_key: product_id
  foreign_keys: []
- table_name: stg_shopify_product_tag_data
  foreign_keys:
  - column: product_id
    reference:
      table_name: snapshot_shopify_product_data
      column: product_id
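
One property worth checking in a join specification like this is that every foreign key points at a column the referenced table declares as its primary key. The sketch below assumes the YAML has already been loaded into Python dictionaries of the same shape; the dangling_references helper and the deliberately broken entry are hypothetical.

```python
# Three entries copied from the join specification, as plain dictionaries.
join_graph = [
    {"table_name": "stg_shopify_inventory_item_data",
     "primary_key": "item_id", "foreign_keys": []},
    {"table_name": "snapshot_shopify_location_data",
     "primary_key": "location_id", "foreign_keys": []},
    {"table_name": "stg_shopify_inventory_level_data",
     "foreign_keys": [
         {"column": "inventory_item_id",
          "reference": {"table_name": "stg_shopify_inventory_item_data",
                        "column": "item_id"}},
         {"column": "location_id",
          "reference": {"table_name": "snapshot_shopify_location_data",
                        "column": "location_id"}},
     ]},
]

def dangling_references(spec):
    """List FKs whose target is not the declared PK of the referenced table."""
    primary_keys = {t["table_name"]: t.get("primary_key") for t in spec}
    problems = []
    for table in spec:
        for fk in table.get("foreign_keys", []):
            ref = fk["reference"]
            # Unknown tables and tables without a PK both yield None here.
            if primary_keys.get(ref["table_name"]) != ref["column"]:
                problems.append(
                    (table["table_name"], fk["column"],
                     ref["table_name"], ref["column"])
                )
    return problems

print(dangling_references(join_graph))

# A hypothetical, deliberately wrong entry to show what a failure looks like.
broken = join_graph + [
    {"table_name": "example_table",
     "foreign_keys": [
         {"column": "location_id",
          "reference": {"table_name": "snapshot_shopify_location_data",
                        "column": "id"}},
     ]},
]
print(dangling_references(broken))
```

The first call reports nothing for the three real entries. The second reports the hypothetical entry, because "id" is not the primary key declared for snapshot_shopify_location_data.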

We identify the entities and relationships behind the tables and describe how those relationships fit together.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Abandoned Checkouts
  entity_description: Represents incomplete orders or abandoned carts in a Shopify
    store
  table_name: stg_shopify_abandoned_checkout_data
  primary_key: checkout_id
- entity_name: Collections
  entity_description: Represents groups of products in a Shopify store
  table_name: stg_shopify_collection_data
  primary_key: collection_id
- entity_name: Customers
  entity_description: Represents individual customers who have interacted with the
    Shopify store
  table_name: stg_shopify_customer_data
  primary_key: customer_id
- entity_name: Fulfillments
  entity_description: Represents the process of preparing and shipping orders to customers
  table_name: stg_shopify_fulfillment_data
  primary_key: fulfillment_id
- entity_name: Inventory Items
  entity_description: Represents individual items in the store's inventory
  table_name: stg_shopify_inventory_item_data
  primary_key: item_id
- entity_name: Orders
  entity_description: Represents customer purchases or transactions in the Shopify
    store
  table_name: stg_shopify_order_data
  primary_key: order_id
- entity_name: Order Line Items
  entity_description: Represents individual products within an order
  table_name: stg_shopify_order_line_data
  primary_key: line_item_id
- entity_name: Order Shipping Lines
  entity_description: Represents shipping information for specific orders
  table_name: stg_shopify_order_shipping_line_data
  primary_key: shipping_line_id
- entity_name: Product Images
  entity_description: Represents images of products in the Shopify store
  table_name: stg_shopify_product_image_data
  primary_key: image_id
- entity_name: Refunds
  entity_description: Represents transactions where money is returned to customers
  table_name: stg_shopify_refund_data
  primary_key: refund_id
- entity_name: Shops
  entity_description: Represents individual Shopify stores with their configurations
    and details
  table_name: stg_shopify_shop_data
  primary_key: shop_id
- entity_name: Transactions
  entity_description: Represents financial transactions associated with orders
  table_name: stg_shopify_transaction_data
  primary_key: transaction_id
- entity_name: Locations
  entity_description: Represents physical or online locations associated with the
    Shopify store
  table_name: snapshot_shopify_location_data
  primary_key: location_id
- entity_name: Price Rules
  entity_description: Represents discount configurations and pricing rules in the
    Shopify store
  table_name: snapshot_shopify_price_rule_data
  primary_key: price_rule_id
- entity_name: Products
  entity_description: Represents items for sale in the Shopify store
  table_name: snapshot_shopify_product_data
  primary_key: product_id
- entity_name: Product Variants
  entity_description: Represents specific versions or variations of products in the
    Shopify store
  table_name: snapshot_shopify_product_variant_data
  primary_key: variant_id
relations:
- relation_name: CustomerAbandonedCheckouts
  relation_description: This tracks Abandoned Checkouts initiated by Customers who
    started but didn't complete the purchase process on the Shopify store.
  table_name: stg_shopify_abandoned_checkout_data
  entities:
  - Abandoned Checkouts
  - Customers
- relation_name: FulfillmentLocationAssociation
  relation_description: Fulfillments are processed at specific Locations within Shopify's
    platform for order shipping and delivery tracking.
  table_name: stg_shopify_fulfillment_data
  entities:
  - Fulfillments
  - Locations
- relation_name: CustomerOrders
  relation_description: This stores the Orders placed by Customers, including details
    of the purchase, shipping, and billing information.
  table_name: stg_shopify_order_data
  entities:
  - Orders
  - Customers
- relation_name: OrderLineItemDetails
  relation_description: Order Line Items detail specific Products and their Variants
    within Orders, connecting individual purchases to the broader Product catalog.
  table_name: stg_shopify_order_line_data
  entities:
  - Order Line Items
  - Orders
  - Products
  - Product Variants
- relation_name: OrderShippingDetails
  relation_description: Order Shipping Lines provide detailed shipping information
    for individual Orders, including pricing and carrier details.
  table_name: stg_shopify_order_shipping_line_data
  entities:
  - Order Shipping Lines
  - Orders
- relation_name: ProductImageAssociation
  relation_description: Product Images are visual representations of Products, with
    each Product potentially having multiple associated images.
  table_name: stg_shopify_product_image_data
  entities:
  - Product Images
  - Products
- relation_name: CustomerRefundDetails
  relation_description: This tracks Refunds issued to Customers, detailing the reimbursement
    process for specific orders in the Shopify system.
  table_name: stg_shopify_refund_data
  entities:
  - Refunds
  - Customers
- relation_name: ShopOperatesInLocations
  relation_description: Shops operate in one or more Locations, with each shop having
    a primary location and potentially multiple additional locations.
  table_name: stg_shopify_shop_data
  entities:
  - Shops
  - Locations
- relation_name: OrderTransactions
  relation_description: Transactions record financial details of Orders, including
    payment processing and currency information for each order.
  table_name: stg_shopify_transaction_data
  entities:
  - Transactions
  - Orders
- relation_name: ShopifyProductVariantDetails
  relation_description: Product Variants are specific versions of Products, associated
    with Inventory Items for stock management and optionally linked to Product Images
    for visual representation.
  table_name: snapshot_shopify_product_variant_data
  entities:
  - Product Variants
  - Inventory Items
  - Product Images
  - Products
- relation_description: This table tracks discount codes applied to abandoned checkouts
    in Shopify, providing details about each abandoned cart's associated discount.
  table_name: stg_shopify_abandoned_checkout_discount_code_data
  entities:
  - Abandoned Checkouts
- relation_description: This table captures shipping line details for abandoned checkouts
    in Shopify, representing unfulfilled purchase attempts.
  table_name: stg_shopify_abandoned_checkout_shipping_line_data
  entities:
  - Abandoned Checkouts
- relation_name: CollectionProductAssociation
  relation_description: This table associates Collections with Products, indicating
    which products are included in each collection and which collections contain each
    product.
  table_name: stg_shopify_collection_product_data
  entities:
  - Collections
  - Products
- relation_description: This table stores the tags associated with Customers in a
    Shopify store, representing customer attributes or classifications.
  table_name: stg_shopify_customer_tag_data
  entities:
  - Customers
- relation_name: ShopOrderFulfillmentEvents
  relation_description: Shops process Orders, which are then fulfilled through Fulfillments,
    tracking the shipping and delivery status of each order.
  table_name: stg_shopify_fulfillment_event_data
  entities:
  - Fulfillments
  - Orders
  - Shops
- relation_name: InventoryItemLocationQuantity
  relation_description: This table tracks the quantity of Inventory Items available
    at specific Locations within a Shopify store.
  table_name: stg_shopify_inventory_level_data
  entities:
  - Inventory Items
  - Locations
- relation_name: OrderRefundAdjustments
  relation_description: Orders can receive Refunds, which may include adjustments
    for shipping or discrepancies, affecting the final order amount.
  table_name: stg_shopify_order_adjustment_data
  entities:
  - Orders
  - Refunds
- relation_description: This table stores discount information applied to Orders in
    a Shopify store, including multiple discounts per order.
  table_name: stg_shopify_order_discount_code_data
  entities:
  - Orders
- relation_description: This table stores various attributes and details associated
    with individual Shopify orders, including customer information and order-specific
    data.
  table_name: stg_shopify_order_note_attribute_data
  entities:
  - Orders
- relation_description: This table stores color tag metadata associated with Shopify
    orders, allowing for additional categorization or visual identification of orders.
  table_name: stg_shopify_order_tag_data
  entities:
  - Orders
- relation_description: This table stores metadata associated with individual Shopify
    orders, including key-value pairs for various attributes like image, utm_medium,
    and prop_channel.
  table_name: stg_shopify_order_url_tag_data
  entities:
  - Orders
- relation_description: This table contains detailed metadata about return authorizations
    for orders, including return reasons, quantities, and values.
  table_name: stg_shopify_metafield_data
  entities:
  - Orders
- relation_description: This table captures the Tender Transactions (direct transfers
    of money) associated with Orders in a Shopify store, including sales and refunds.
  table_name: stg_shopify_tender_transaction_data
  entities:
  - Orders
  - Transactions
- relation_name: OrderLineItemRefunds
  relation_description: This table tracks Refunds applied to specific Order Line Items,
    detailing the refund amount, quantity, and restock information.
  table_name: stg_shopify_order_line_refund_data
  entities:
  - Order Line Items
  - Refunds
- relation_description: This table represents the tax details associated with shipping
    lines for individual Shopify orders.
  table_name: stg_shopify_order_shipping_tax_line_data
  entities:
  - Order Shipping Lines
- relation_description: This table stores discount codes associated with specific
    price rules, tracking their usage and creation details.
  table_name: stg_shopify_discount_code_data
  entities:
  - Price Rules
- relation_description: This table stores the tags associated with Products, allowing
    for flexible categorization and labeling of individual products in a Shopify store.
  table_name: stg_shopify_product_tag_data
  entities:
  - Products
story:
- relation_name: ShopOperatesInLocations
  story_line: Shops establish primary and additional operating locations.
- relation_name: ProductImageAssociation
  story_line: Shops upload multiple images for each product.
- relation_name: ShopifyProductVariantDetails
  story_line: Shops create product variants and link them to inventory items.
- relation_name: InventoryItemLocationQuantity
  story_line: Shops update inventory quantities across different locations.
- relation_name: CollectionProductAssociation
  story_line: Shops organize products into themed collections.
- relation_name: CustomerAbandonedCheckouts
  story_line: Customers add items to a cart but leave without completing the purchase.
- relation_name: CustomerOrders
  story_line: Customers place orders for desired products.
- relation_name: OrderLineItemDetails
  story_line: Orders list specific products and variants purchased.
- relation_name: OrderShippingDetails
  story_line: Orders include shipping information and carrier details.
- relation_name: OrderTransactions
  story_line: Orders process payments and record financial details.
- relation_name: FulfillmentLocationAssociation
  story_line: Shops assign orders to specific fulfillment locations.
- relation_name: ShopOrderFulfillmentEvents
  story_line: Shops process and track order shipping status.
- relation_name: CustomerRefundDetails
  story_line: Shops issue refunds to customers for returns.
- relation_name: OrderRefundAdjustments
  story_line: Refunds adjust for shipping or pricing discrepancies.
- relation_name: OrderLineItemRefunds
  story_line: Refunds detail specific items returned and restocked.