Fivetran Zendesk

We display the source tables from the data warehouses to model.

brand_data (first 100 rows)

id _fivetran_deleted _fivetran_synced active brand_url default_ has_help_center help_center_state logo_content_type logo_content_url logo_deleted logo_file_name logo_height logo_id logo_inline logo_mapped_content_url logo_size logo_url logo_width name subdomain url
0 36000 False 2021-02-16 20:08:56 True 5e52582af2ab7c3a6e7fa6856b4cd945 True True enabled image/png 5e52582af2ab7c3a6e7fa6856b4cd945 False 2abdc594c0ad6eb2438448b3cbf7da56 22 3600 False 5e52582af2ab7c3a6e7fa6856b4cd945 1961 5e52582af2ab7c3a6e7fa6856b4cd945 80 2abdc594c0ad6eb2438448b3cbf7da56 2abdc594c0ad6eb2438448b3cbf7da56 5e52582af2ab7c3a6e7fa6856b4cd945

daylight_time_data (first 100 rows)

time_zone year_ _fivetran_synced daylight_end_utc daylight_offset daylight_start_utc
0 Montevideo 1970 2020-03-05T05:04:14.920Z 1970-06-14T00:00:00 1 1970-04-25T00:00:00
1 Cairo 1970 2020-03-05T05:04:14.920Z 1970-10-01T03:00:00 1 1970-05-01T01:00:00

domain_name_data (first 100 rows)

index_ organization_id _fivetran_synced domain_name
0 0 370335 2020-08-25 08:08:42 f89dedacd29b4d5b0825d81a0c0fedfa
1 0 370355 2020-08-25 08:08:42 266cb5d33605c86846121770b4d312d1
2 0 370364 2020-08-25 08:08:42 7878cf470a679110108b3da5ea619a0c
3 0 370298 2020-08-04 02:12:24 6fc5425be3db5b4f70eaf61c4323672c
4 0 370298 2020-08-04 02:12:24 040fd698880646db8df23291671e4b57
5 0 370298 2020-08-04 02:12:24 421319ed5d78b90c3048510efd8df758
6 0 370298 2020-08-04 02:12:24 bafeca689fec90acd8878e2fb025e355
7 0 370298 2020-08-04 02:12:24 4ba5ff9deb372b42e67838630b39ea3c
8 0 370298 2020-08-04 02:12:24 7ae869ebf95fd6538c3af28cbbf29e53
9 0 370298 2020-08-04 02:12:24 2c94112ed0107c614f8c5ea6b1944962

group_data (first 100 rows)

id _fivetran_deleted _fivetran_synced created_at name updated_at url
0 360008426314 False 2021-01-19 20:08:24.256 2020-02-04 00:03:11 Non-Support - (Billing Tickets) 2020-02-04 00:40:57 https://zendesk.com/api/v2/groups/360008426314.json
1 360013670454 False 2021-01-19 20:08:24.256 2020-09-01 10:16:50 Assignment Group 7 2020-09-01 10:16:50 https://zendesk.com/api/v2/groups/360013670454.json
2 360014827554 False 2021-01-19 20:08:24.256 2020-10-07 21:53:42 Information Technology 2020-10-07 21:57:54 https://zendesk.com/api/v2/groups/360014827554.json
3 360008418973 False 2021-01-19 20:08:24.256 2020-02-04 00:41:49 Non-Support - (Prospect Tickets) 2020-02-04 00:41:49 https://zendesk.com/api/v2/groups/360008418973.json
4 360013670414 False 2021-01-19 20:08:24.255 2020-09-01 10:15:17 Assignment Group 4 2020-09-01 10:15:17 https://zendesk.com/api/v2/groups/360013670414.json
5 360013656353 False 2021-01-19 20:08:24.255 2020-09-01 10:15:55 Assignment Group 5 2020-09-01 10:15:55 https://zendesk.com/api/v2/groups/360013656353.json
6 360013656373 False 2021-01-19 20:08:24.255 2020-09-01 10:16:23 Assignment Group 6 2020-09-01 10:16:23 https://zendesk.com/api/v2/groups/360013656373.json
7 360013670374 False 2021-01-19 20:08:24.254 2020-09-01 10:14:39 Assignment Group 3 2020-09-01 10:14:39 https://zendesk.com/api/v2/groups/360013670374.json

organization_data (first 100 rows)

id _fivetran_synced created_at details external_id group_id name notes shared_comments shared_tickets updated_at url
0 370326104793 2020-04-24 14:13:07.083 2020-02-13 22:45:02 NaN NaN NaN name1 NaN False False 2020-02-13 22:45:02 https://zendesk.com/api/v2/organizations/370326104793.json
1 370326181493 2020-04-24 14:13:07.091 2020-02-13 22:50:18 NaN NaN NaN name2 NaN False False 2020-02-13 22:50:18 https://zendesk.com/api/v2/organizations/370326181493.json
2 370325990093 2020-04-24 14:13:02.924 2020-02-13 22:19:31 NaN NaN NaN name3 NaN False False 2020-02-13 22:19:31 https://zendesk.com/api/v2/organizations/370325990093.json
3 370557178694 2020-11-19 02:08:25.601 2020-11-18 21:11:48 NaN NaN NaN name4 NaN False False 2020-11-18 21:59:43 https://zendesk.com/api/v2/organizations/370557178694.json
4 370326279253 2020-04-24 14:13:07.111 2020-02-13 23:01:03 NaN NaN NaN name5 NaN False False 2020-02-13 23:01:03 https://zendesk.com/api/v2/organizations/370326279253.json
5 370324281773 2020-06-12 02:09:09.143 2020-02-11 05:51:37 NaN NaN NaN name6 NaN True True 2020-06-11 21:41:32 https://zendesk.com/api/v2/organizations/370324281773.json
6 370325960473 2020-04-24 14:13:02.767 2020-02-13 21:25:16 NaN NaN NaN name7 NaN False False 2020-02-13 21:25:16 https://zendesk.com/api/v2/organizations/370325960473.json
7 370326052813 2020-04-24 14:13:07.056 2020-02-13 22:31:14 NaN NaN NaN name8 NaN False False 2020-02-13 22:31:14 https://zendesk.com/api/v2/organizations/370326052813.json
8 370326014913 2020-04-24 14:13:07.006 2020-02-13 22:26:05 NaN NaN NaN name9 NaN False False 2020-02-13 22:26:05 https://zendesk.com/api/v2/organizations/370326014913.json
9 370301409353 2020-04-24 14:12:43.388 2019-12-16 23:28:32 NaN NaN NaN name10 NaN False False 2019-12-16 23:28:32 https://zendesk.com/api/v2/organizations/370301409353.json

organization_tag_data (first 100 rows)

organization_id tag _fivetran_synced
0 370364 63ad9d34f3503826e5f649ae6b7ac92c 2020-08-25 08:08:42
1 370401 63ad9d34f3503826e5f649ae6b7ac92c 2020-08-12 20:16:04
2 370401 63ad9d34f3503826e5f649ae6b7ac92c 2020-08-12 20:16:04
3 370297 63ad9d34f3503826e5f649ae6b7ac92c 2020-08-04 02:12:16
4 370297 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:20:08
5 370321 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:29:08
6 370354 63ad9d34f3503826e5f649ae6b7ac92c 2020-09-03 02:08:46
7 370319 63ad9d34f3503826e5f649ae6b7ac92c 2020-12-18 14:08:28
8 370321 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:20:40
9 370334 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:13:42
10 370334 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:13:42
11 370298 63ad9d34f3503826e5f649ae6b7ac92c 2020-04-24 14:13:42

schedule_data (first 100 rows)

end_time id start_time _fivetran_deleted _fivetran_synced end_time_utc name start_time_utc time_zone created_at
0 8400 360000310393 7680 False 2021-01-19 20:08:37.690 8820 SupportCA 8100 Pacific Time (US & Canada) 2019-11-18 20:23:34
1 5520 360000310393 4800 False 2021-01-19 20:08:37.690 5940 SupportCA 5220 Pacific Time (US & Canada) 2019-11-18 20:23:34
2 6960 360000310393 6240 False 2021-01-19 20:08:37.690 7380 SupportCA 6660 Pacific Time (US & Canada) 2019-11-18 20:23:34
3 2640 360000310393 1920 False 2021-01-19 20:08:37.689 3060 SupportCA 2340 Pacific Time (US & Canada) 2019-11-18 20:23:34
4 4080 360000310393 3360 False 2021-01-19 20:08:37.690 4500 SupportCA 3780 Pacific Time (US & Canada) 2019-11-18 20:23:34

schedule_holiday_data (first 100 rows)

id schedule_id _fivetran_deleted _fivetran_synced end_date name start_date
0 1163028 360000 False 2023-01-19 21:53:06.281 2022-12-27 Test Holiday 2022-12-26
1 1163027 360000 False 2023-01-19 21:53:06.281 2022-12-29 Test Holiday 2 2022-12-29

ticket_comment_data (first 100 rows)

id _fivetran_synced body created facebook_comment public_ ticket_id tweet user_id voice_comment
0 1030820361033 2020-03-05 04:47:57.580 body1 2020-02-11 07:49:14 False True 1077 False 402935450113 False
1 1198167194074 2020-09-07 08:09:18.997 body2 2020-06-26 20:47:27 False True 7636 False 413068955393 False
2 1227221959693 2020-09-02 14:08:34.410 body3 2020-07-21 19:33:46 False True 7367 False 413068955393 False
3 1227005852153 2020-09-02 14:08:34.410 body4 2020-07-21 17:07:54 False True 7367 False 413068955393 False
4 1225658181514 2020-09-02 14:08:34.410 body5 2020-07-20 20:16:47 False True 7367 False 413068955393 False
5 1189989966994 2020-09-07 08:09:18.991 body6 2020-06-19 20:40:41 False True 7636 False 413068955393 False
6 1228317430153 2020-09-02 14:08:34.412 body7 2020-07-22 16:41:28 False True 7367 False 413068955393 False
7 1227234557733 2020-09-02 14:08:34.411 body8 2020-07-21 19:43:27 False True 7367 False 413068955393 False
8 1227270483653 2020-09-02 14:08:34.411 body9 2020-07-21 20:07:17 False True 7367 False 413068955393 False
9 1335841328993 2020-11-04 14:09:01.386 body10 2020-10-19 13:56:05 False True 13628 False 424142574593 False
10 1338387572833 2020-11-04 14:09:01.388 body11 2020-10-21 09:08:41 False True 13628 False 424142574593 False
11 1326976058654 2020-11-06 14:09:43.392 body12 2020-10-11 13:13:55 False True 13095 False 423362500353 False
12 1326333791494 2020-10-26 02:11:04.021 body13 2020-10-10 15:40:36 False True 13057 False 423362500353 False
13 1326246720513 2020-10-26 02:11:04.021 body14 2020-10-10 14:39:46 False True 13057 False 423362500353 False
14 1326861102034 2020-11-06 14:09:43.391 body15 2020-10-11 08:02:56 False True 13095 False 423362500353 False
15 1326864438354 2020-11-06 14:09:43.391 body16 2020-10-11 08:08:54 False True 13095 False 423362500353 False
16 1350940278314 2020-11-15 08:12:04.339 body17 2020-11-01 02:20:20 False True 14352 False 423362500353 False
17 1326296442114 2020-11-06 14:09:43.388 body18 2020-10-10 14:51:51 False True 13095 False 423362500353 False
18 1326862829914 2020-10-26 02:11:04.025 body19 2020-10-11 08:05:48 False True 13057 False 423362500353 False
19 1326808961753 2020-10-26 02:11:04.023 body20 2020-10-11 07:53:43 False True 13057 False 423362500353 False

ticket_data (first 100 rows)

id _fivetran_synced allow_channelback assignee_id brand_id created_at description due_at external_id forum_topic_id group_id has_incidents is_public organization_id priority problem_id recipient requester_id status subject submitter_id system_client ticket_form_id type updated_at url via_channel via_source_from_id via_source_from_title via_source_rel via_source_to_address via_source_to_name merged_ticket_ids via_source_from_address followup_ids via_followup_source_id
0 1595 2020-03-20 02:32:49.426 False NaN 360003529474 2020-02-19 01:54:52 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 396331237134 deleted subject1 396331237134 NaN 360002048693 incident 2020-02-19 01:55:11 https://zendesk.com/api/v2/tickets/1595.json web NaN NaN NaN example@email.com None [] NaN None NaN
1 16988 2021-01-13 20:09:16.325 False 4.182841e+11 360003529474 2020-12-22 00:19:23 description1 NaN NaN NaN 3.600134e+11 False True 3.704691e+11 NaN NaN email@email.com 1500656884401 solved subject1 1500656884401 NaN 360002048693 None 2021-01-13 18:42:39 https://zendesk.com/api/v2/tickets/16988.json email NaN NaN NaN example@email.com Support [] NaN [] NaN
2 14173 2020-11-11 20:08:45.130 False 3.963717e+11 360003529474 2020-10-28 12:03:02 description1 NaN NaN NaN 3.600070e+11 False True 3.703211e+11 NaN NaN email@email.com 424883466453 closed subject1 424883466453 NaN 360002048693 None 2020-11-11 17:01:32 https://zendesk.com/api/v2/tickets/14173.json email NaN NaN NaN example@email.com Support [] NaN None NaN
3 11071 2020-10-02 14:08:33.216 False NaN 360003529474 2020-08-28 18:06:36 description1 NaN NaN NaN NaN False True NaN NaN NaN email@email.com 419755385214 deleted subject1 419755385214 NaN 360002048693 None 2020-09-02 11:01:27 https://zendesk.com/api/v2/tickets/11071.json email NaN NaN NaN X Support [] NaN None NaN
4 1966 2020-03-25 20:32:23.617 False 3.963154e+11 360003529474 2020-02-27 06:05:08 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 402813302773 closed subject1 402813302773 NaN 360002048693 None 2020-03-25 16:03:26 https://zendesk.com/api/v2/tickets/1966.json email NaN NaN NaN example@email.com Support [1967] NaN None NaN
5 11013 2020-10-02 20:08:20.449 False 4.028517e+11 360003529474 2020-08-27 23:09:52 description1 NaN NaN NaN 3.600084e+11 False True 3.702979e+11 NaN NaN email@email.com 419688934974 deleted subject1 419688934974 NaN 360002048693 None 2020-09-02 15:53:16 https://zendesk.com/api/v2/tickets/11013.json email NaN NaN NaN X Support [] NaN None NaN
6 1404 2020-03-05 04:53:46.466 False 3.963717e+11 360003529474 2020-02-13 21:43:58 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 403125197514 closed subject1 403125197514 NaN 360002048693 None 2020-02-28 01:01:57 https://zendesk.com/api/v2/tickets/1404.json email NaN NaN NaN example@email.com Support None NaN None NaN
7 4721 2020-05-14 20:12:36.297 False 3.963717e+11 360003529474 2020-04-20 14:31:46 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 402862357193 closed subject1 402862357193 NaN 360002048693 None 2020-05-14 20:04:34 https://zendesk.com/api/v2/tickets/4721.json email NaN NaN NaN example@email.com Support [] NaN None NaN
8 6171 2020-06-01 02:11:39.760 False 3.963344e+11 360003529474 2020-05-17 17:50:31 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 410930434074 closed subject1 410930434074 NaN 360002048693 None 2020-05-31 23:03:46 https://zendesk.com/api/v2/tickets/6171.json email NaN NaN NaN example@email.com Support [] NaN None NaN
9 6605 2020-06-10 02:10:24.202 False 3.963154e+11 360003529474 2020-05-26 22:29:50 description1 NaN NaN NaN 3.600070e+11 False True 3.702957e+11 NaN NaN email@email.com 410416672973 closed subject1 410416672973 NaN 360002048693 None 2020-06-09 23:03:49 https://zendesk.com/api/v2/tickets/6605.json email NaN NaN NaN example@email.com Support [] NaN None NaN

ticket_field_history_data (first 100 rows)

field_name ticket_id updated _fivetran_synced user_id value_
0 status 6964 2020-06-01 21:11:59 2020-07-02 02:09:05.984 NaN solved
1 status 974 2020-02-10 21:47:41 2020-03-12 02:32:23.808 NaN solved
2 priority 980 2020-02-10 22:06:57 2020-03-12 02:32:23.808 NaN solved
3 status 11071 2020-09-02 11:01:27 2020-10-02 14:08:33.216 NaN solved
4 status 8205 2020-07-07 23:01:47 2020-08-07 02:09:08.192 NaN solved
5 status 103 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
6 assignee_id 108 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN 1111
7 status 95 2020-02-10 08:36:39 2020-03-11 14:32:23.872 NaN solved
8 status 107 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
9 status 102 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
10 status 102 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
11 status 226 2020-02-10 08:35:52 2020-03-11 14:32:20.096 NaN solved
12 status 229 2020-02-10 08:35:52 2020-03-11 14:32:20.096 NaN solved
13 status 11016 2020-09-02 15:53:15 2020-10-02 20:08:19.701 NaN solved
14 status 102 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
15 status 982 2020-02-10 22:32:56 2020-03-12 02:32:29.045 NaN solved
16 status 141 2020-02-10 08:36:25 2020-03-11 14:32:23.861 NaN solved
17 status 1192 2020-02-12 00:22:26 2020-03-13 02:35:56.661 NaN solved
18 status 102 2020-02-10 08:36:38 2020-03-11 14:32:23.872 NaN solved
19 status 312 2020-02-10 08:35:18 2020-03-11 14:32:09.781 NaN solved

ticket_form_history_data (first 100 rows)

id updated_at _fivetran_deleted _fivetran_synced active created_at display_name end_user_visible name
0 360200 2019-12-18 18:52:02 False 2021-02-16 20:08:55 False 2019-11-07 23:14:54 2be0ef4b34cc85f6e3582b3bef65cd4e True 2be0ef4b34cc85f6e3582b3bef65cd4e
1 360204 2020-01-20 01:27:51 False 2020-02-05 20:16:15 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
2 360204 2020-09-14 22:45:49 False 2020-12-28 14:08:18 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
3 360204 2021-02-11 22:42:12 False 2021-02-16 20:08:55 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
4 360204 2020-02-10 03:34:33 False 2020-02-10 20:19:14 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
5 360204 2020-06-02 14:44:52 False 2020-09-14 20:08:14 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
6 360204 2021-02-02 17:33:15 False 2021-02-04 14:08:48 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
7 360204 2020-02-11 01:43:20 False 2020-02-15 14:20:08 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
8 360204 2021-02-04 16:29:37 False 2021-02-11 20:09:11 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
9 360204 2020-02-15 17:13:29 False 2020-03-05 02:23:07 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
10 360204 2020-03-05 02:29:38 False 2020-06-01 02:11:19 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
11 360204 2020-12-28 16:17:21 False 2021-02-02 14:08:59 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
12 360204 2020-02-05 20:27:15 False 2020-02-10 02:18:09 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
13 360204 2020-06-01 07:57:13 False 2020-06-02 14:11:12 True 2019-11-13 22:40:56 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83
14 360205 2020-12-23 17:11:16 False 2020-12-28 14:08:18 True 2019-11-13 22:12:51 41f1a810ab161e4c3ee35e1afec238dd True 41f1a810ab161e4c3ee35e1afec238dd

ticket_schedule_data (first 100 rows)

created_at ticket_id _fivetran_synced schedule_id
0 2020-02-04 15:26:26 123 2020-02-29 09:09:19.575 3600
1 2019-05-29 15:32:24 76900 2019-09-29 13:30:54.240 3600
2 2019-06-09 09:42:37 77569 2019-09-29 13:31:29.639 3600
3 2019-06-20 19:53:39 79336 2019-09-29 13:55:33.914 3600
4 2019-07-11 23:03:43 79616 2019-09-29 13:56:16.796 3600
5 2019-07-16 20:11:01 79972 2019-09-29 13:56:51.605 3600
6 2019-07-27 15:49:32 90640 2019-09-29 14:06:09.724 3600
7 2019-09-19 02:16:54 93969 2019-09-29 14:50:29.411 3600
8 2019-09-25 21:43:09 94490 2020-04-24 23:13:16.632 3600
9 2019-09-03 13:57:05 94490 2020-04-24 23:13:16.634 3600

ticket_tag_data (first 100 rows)

tag ticket_id _fivetran_synced
0 customer 123 2020-03-05 04:50:07.657
1 customer 123 2020-03-05 04:50:07.657
2 customer 123 2020-03-05 04:50:07.657
3 prospect 123 2020-03-05 04:50:07.657
4 customer 455 2020-03-05 04:50:30.253
5 customer 455 2020-03-05 04:50:30.253
6 prospect 455 2020-03-05 04:50:30.253
7 connector 789 2020-03-05 04:54:18.030
8 prospect 1011 2020-03-05 04:54:18.030
9 customer 1012 2020-03-05 04:54:31.717

time_zone_data (first 100 rows)

time_zone _fivetran_synced standard_offset
0 London 2022-01-19T03:03:33.969Z +00:00
1 Dublin 2022-01-19T03:03:33.969Z +00:00

user_data (first 100 rows)

id _fivetran_synced active alias authenticity_token chat_only created_at details email external_id last_login_at locale locale_id moderator name notes only_private_comments organization_id phone remote_photo_url restricted_agent role shared shared_agent signature suspended ticket_restriction time_zone two_factor_auth_enabled updated_at url verified
0 403958466973 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:55:12 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.702977e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:55:12 https://fivetran1813.zendesk.com/api/v2/users/403958466973.json True
1 403969371634 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:41:37 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False NaN NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:41:37 https://fivetran1813.zendesk.com/api/v2/users/403969371634.json True
2 403957746773 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:35:14 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703192e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:35:15 https://fivetran1813.zendesk.com/api/v2/users/403957746773.json True
3 403970285734 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 23:07:41 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703262e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 23:07:41 https://fivetran1813.zendesk.com/api/v2/users/403970285734.json True
4 403969943274 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:57:51 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703262e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:57:52 https://fivetran1813.zendesk.com/api/v2/users/403969943274.json True
5 403959084893 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 23:14:08 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703262e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 23:14:08 https://fivetran1813.zendesk.com/api/v2/users/403959084893.json True
6 403969141074 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:34:41 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703192e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:34:41 https://fivetran1813.zendesk.com/api/v2/users/403969141074.json True
7 403957960093 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:41:37 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703313e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:41:38 https://fivetran1813.zendesk.com/api/v2/users/403957960093.json True
8 403969141094 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 22:34:41 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703192e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 22:34:41 https://fivetran1813.zendesk.com/api/v2/users/403969141094.json True
9 403959218893 2020-03-05 05:03:34.208 True NaN NaN False 2020-02-26 23:18:06 NaN example@email.com NaN 2023-02-26 22:55:12 en-US 1 False John Doe NaN False 3.703261e+11 NaN NaN True end-user False False NaN False requested Pacific Time (US & Canada) False 2020-02-26 23:18:06 https://fivetran1813.zendesk.com/api/v2/users/403959218893.json True

user_tag_data (first 100 rows)

tag user_id _fivetran_synced
0 362ebc379b4b1b33b9656a2232a9e673 40092 2020-10-24 08:14:06
1 266e0d3d29830abfe7d4ed98b47966f7 40092 2020-10-24 08:14:06
2 266e0d3d29830abfe7d4ed98b47966f7 39882 2020-03-05 05:02:30
3 ef43d72019ff152f0b1209e962a2f4f2 40282 2020-10-22 14:12:10
4 ef43d72019ff152f0b1209e962a2f4f2 40284 2021-02-13 02:09:18
5 ef43d72019ff152f0b1209e962a2f4f2 40285 2021-02-13 02:09:18
6 d041ea3c4cb5f32804365f2f732dd88a 42370 2020-10-13 20:15:06
7 b545892c9465b8e00d32b20fcd55caf7 41244 2021-02-16 08:09:05
8 266e0d3d29830abfe7d4ed98b47966f7 40386 2021-02-16 08:09:05
9 362ebc379b4b1b33b9656a2232a9e673 40386 2021-02-16 08:09:05
Source tables may have typos, unclear names, incorrect column types, etc. We clean these tables.

stg_organization_data (first 100 rows)

organization_name has_shared_comments has_shared_tickets api_url creation_date last_updated_date organization_id
0 name1 False False https://zendesk.com/api/v2/organizations/370326104793.json 2020-02-13 22:45:02 2020-02-13 22:45:02 370326104793
1 name2 False False https://zendesk.com/api/v2/organizations/370326181493.json 2020-02-13 22:50:18 2020-02-13 22:50:18 370326181493
2 name3 False False https://zendesk.com/api/v2/organizations/370325990093.json 2020-02-13 22:19:31 2020-02-13 22:19:31 370325990093
3 name4 False False https://zendesk.com/api/v2/organizations/370557178694.json 2020-11-18 21:11:48 2020-11-18 21:59:43 370557178694
4 name5 False False https://zendesk.com/api/v2/organizations/370326279253.json 2020-02-13 23:01:03 2020-02-13 23:01:03 370326279253
5 name6 True True https://zendesk.com/api/v2/organizations/370324281773.json 2020-02-11 05:51:37 2020-06-11 21:41:32 370324281773
6 name7 False False https://zendesk.com/api/v2/organizations/370325960473.json 2020-02-13 21:25:16 2020-02-13 21:25:16 370325960473
7 name8 False False https://zendesk.com/api/v2/organizations/370326052813.json 2020-02-13 22:31:14 2020-02-13 22:31:14 370326052813
8 name9 False False https://zendesk.com/api/v2/organizations/370326014913.json 2020-02-13 22:26:05 2020-02-13 22:26:05 370326014913
9 name10 False False https://zendesk.com/api/v2/organizations/370301409353.json 2019-12-16 23:28:32 2019-12-16 23:28:32 370301409353

stg_organization_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"organization_data_projected" AS (
    -- Projection: Selecting 11 out of 12 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "details",
        "external_id",
        "group_id",
        "name",
        "notes",
        "shared_comments",
        "shared_tickets",
        "updated_at",
        "url"
    FROM "organization_data"
),

"organization_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> organization_id
    -- created_at -> creation_date
    -- details -> organization_details
    -- name -> organization_name
    -- notes -> organization_notes
    -- shared_comments -> has_shared_comments
    -- shared_tickets -> has_shared_tickets
    -- updated_at -> last_updated_date
    -- url -> api_url
    SELECT 
        "id" AS "organization_id",
        "created_at" AS "creation_date",
        "details" AS "organization_details",
        "external_id",
        "group_id",
        "name" AS "organization_name",
        "notes" AS "organization_notes",
        "shared_comments" AS "has_shared_comments",
        "shared_tickets" AS "has_shared_tickets",
        "updated_at" AS "last_updated_date",
        "url" AS "api_url"
    FROM "organization_data_projected"
),

"organization_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- creation_date: from VARCHAR to TIMESTAMP
    -- external_id: from DECIMAL to VARCHAR
    -- group_id: from DECIMAL to VARCHAR
    -- last_updated_date: from VARCHAR to TIMESTAMP
    -- organization_details: from DECIMAL to VARCHAR
    -- organization_id: from INT to VARCHAR
    -- organization_notes: from DECIMAL to VARCHAR
    SELECT
        "organization_name",
        "has_shared_comments",
        "has_shared_tickets",
        "api_url",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("external_id" AS VARCHAR) AS "external_id",
        CAST("group_id" AS VARCHAR) AS "group_id",
        CAST("last_updated_date" AS TIMESTAMP) AS "last_updated_date",
        CAST("organization_details" AS VARCHAR) AS "organization_details",
        CAST("organization_id" AS VARCHAR) AS "organization_id",
        CAST("organization_notes" AS VARCHAR) AS "organization_notes"
    FROM "organization_data_projected_renamed"
),

"organization_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 4 columns with unacceptable missing values
    -- external_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- group_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- organization_details has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- organization_notes has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "organization_name",
        "has_shared_comments",
        "has_shared_tickets",
        "api_url",
        "creation_date",
        "last_updated_date",
        "organization_id"
    FROM "organization_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "organization_data_projected_renamed_casted_missing_handled"

stg_organization_data.yml (Document the table)

version: 2
models:
- name: stg_organization_data
  description: The table is about organizations. It contains basic details for each
    organization including an ID, name, creation date, and URL. The table also tracks
    whether the organization has shared comments or tickets. Each organization has
    a unique external ID and can belong to a group. The 'details' and 'notes' fields
    allow for additional information storage. The 'updated_at' field shows when the
    organization's information was last modified.
  columns:
  - name: organization_name
    description: Name of the organization
    tests:
    - not_null
  - name: has_shared_comments
    description: Whether the organization has shared comments
    tests:
    - not_null
  - name: has_shared_tickets
    description: Whether the organization has shared tickets
    tests:
    - not_null
  - name: api_url
    description: API URL for the organization
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the API URL for each organization. For this
        table, each row is for a unique organization. The API URL appears to be unique
        for each organization as it includes the organization_id in its structure.
  - name: creation_date
    description: Date and time the organization was created
    tests:
    - not_null
  - name: last_updated_date
    description: Date and time the organization was last updated
    tests:
    - not_null
  - name: organization_id
    description: Unique identifier for the organization
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each organization.
        For this table, each row is for a unique organization. The organization_id
        is designed to be unique across all organizations.

stg_ticket_data (first 100 rows)

ticket_recipient_address is_public ticket_recipient_name api_url has_incidents ticket_description recipient_email ticket_subject organization_id ticket_status brand_id submission_channel channelback_allowed ticket_id ticket_type ticket_form_id group_id assignee_id creation_timestamp due_date followup_ids followup_source_id forum_topic_id last_update_timestamp merged_ticket_ids priority problem_id requester_id submitter_id
0 example@email.com True None https://zendesk.com/api/v2/tickets/1595.json False description1 email@email.com subject1 3.702957e+11 deleted 360003529474 web False 1595 incident 360002048693 3.600070e+11 None 2020-02-19 01:54:52 NaT NaN None None 2020-02-19 01:55:11 NaN None None 396331237134 396331237134
1 example@email.com True Support https://zendesk.com/api/v2/tickets/16988.json False description1 email@email.com subject1 3.704691e+11 solved 360003529474 email False 16988 None 360002048693 3.600134e+11 418284131934.0 2020-12-22 00:19:23 NaT [] None None 2021-01-13 18:42:39 NaN None None 1500656884401 1500656884401
2 example@email.com True Support https://zendesk.com/api/v2/tickets/14173.json False description1 email@email.com subject1 3.703211e+11 closed 360003529474 email False 14173 None 360002048693 3.600070e+11 396371699653.0 2020-10-28 12:03:02 NaT NaN None None 2020-11-11 17:01:32 NaN None None 424883466453 424883466453
3 None True Support https://zendesk.com/api/v2/tickets/11071.json False description1 email@email.com subject1 NaN deleted 360003529474 email False 11071 None 360002048693 NaN None 2020-08-28 18:06:36 NaT NaN None None 2020-09-02 11:01:27 NaN None None 419755385214 419755385214
4 example@email.com True Support https://zendesk.com/api/v2/tickets/1966.json False description1 email@email.com subject1 3.702957e+11 closed 360003529474 email False 1966 None 360002048693 3.600070e+11 396315360434.0 2020-02-27 06:05:08 NaT NaN None None 2020-03-25 16:03:26 [1967] None None 402813302773 402813302773
5 None True Support https://zendesk.com/api/v2/tickets/11013.json False description1 email@email.com subject1 3.702979e+11 deleted 360003529474 email False 11013 None 360002048693 3.600084e+11 402851697393.0 2020-08-27 23:09:52 NaT NaN None None 2020-09-02 15:53:16 NaN None None 419688934974 419688934974
6 example@email.com True Support https://zendesk.com/api/v2/tickets/1404.json False description1 email@email.com subject1 3.702957e+11 closed 360003529474 email False 1404 None 360002048693 3.600070e+11 396371699653.0 2020-02-13 21:43:58 NaT NaN None None 2020-02-28 01:01:57 NaN None None 403125197514 403125197514
7 example@email.com True Support https://zendesk.com/api/v2/tickets/4721.json False description1 email@email.com subject1 3.702957e+11 closed 360003529474 email False 4721 None 360002048693 3.600070e+11 396371706773.0 2020-04-20 14:31:46 NaT NaN None None 2020-05-14 20:04:34 NaN None None 402862357193 402862357193
8 example@email.com True Support https://zendesk.com/api/v2/tickets/6171.json False description1 email@email.com subject1 3.702957e+11 closed 360003529474 email False 6171 None 360002048693 3.600070e+11 396334400494.0 2020-05-17 17:50:31 NaT NaN None None 2020-05-31 23:03:46 NaN None None 410930434074 410930434074
9 example@email.com True Support https://zendesk.com/api/v2/tickets/6605.json False description1 email@email.com subject1 3.702957e+11 closed 360003529474 email False 6605 None 360002048693 3.600070e+11 396315360434.0 2020-05-26 22:29:50 NaT NaN None None 2020-06-09 23:03:49 NaN None None 410416672973 410416672973

stg_ticket_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_data_projected" AS (
    -- Projection: Selecting 35 out of 36 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "allow_channelback",
        "assignee_id",
        "brand_id",
        "created_at",
        "description",
        "due_at",
        "external_id",
        "forum_topic_id",
        "group_id",
        "has_incidents",
        "is_public",
        "organization_id",
        "priority",
        "problem_id",
        "recipient",
        "requester_id",
        "status",
        "subject",
        "submitter_id",
        "system_client",
        "ticket_form_id",
        "type",
        "updated_at",
        "url",
        "via_channel",
        "via_source_from_id",
        "via_source_from_title",
        "via_source_rel",
        "via_source_to_address",
        "via_source_to_name",
        "merged_ticket_ids",
        "via_source_from_address",
        "followup_ids",
        "via_followup_source_id"
    FROM "ticket_data"
),

"ticket_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> ticket_id
    -- allow_channelback -> channelback_allowed
    -- created_at -> creation_timestamp
    -- description -> ticket_description
    -- due_at -> due_date
    -- recipient -> recipient_email
    -- status -> ticket_status
    -- subject -> ticket_subject
    -- type -> ticket_type
    -- updated_at -> last_update_timestamp
    -- url -> api_url
    -- via_channel -> submission_channel
    -- via_source_from_id -> ticket_source_id
    -- via_source_from_title -> ticket_source_name
    -- via_source_rel -> ticket_source_relationship
    -- via_source_to_address -> ticket_recipient_address
    -- via_source_to_name -> ticket_recipient_name
    -- via_source_from_address -> source_email_address
    -- via_followup_source_id -> followup_source_id
    SELECT 
        "id" AS "ticket_id",
        "allow_channelback" AS "channelback_allowed",
        "assignee_id",
        "brand_id",
        "created_at" AS "creation_timestamp",
        "description" AS "ticket_description",
        "due_at" AS "due_date",
        "external_id",
        "forum_topic_id",
        "group_id",
        "has_incidents",
        "is_public",
        "organization_id",
        "priority",
        "problem_id",
        "recipient" AS "recipient_email",
        "requester_id",
        "status" AS "ticket_status",
        "subject" AS "ticket_subject",
        "submitter_id",
        "system_client",
        "ticket_form_id",
        "type" AS "ticket_type",
        "updated_at" AS "last_update_timestamp",
        "url" AS "api_url",
        "via_channel" AS "submission_channel",
        "via_source_from_id" AS "ticket_source_id",
        "via_source_from_title" AS "ticket_source_name",
        "via_source_rel" AS "ticket_source_relationship",
        "via_source_to_address" AS "ticket_recipient_address",
        "via_source_to_name" AS "ticket_recipient_name",
        "merged_ticket_ids",
        "via_source_from_address" AS "source_email_address",
        "followup_ids",
        "via_followup_source_id" AS "followup_source_id"
    FROM "ticket_data_projected"
),

"ticket_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- ticket_description: The problem is that 'description1' appears to be a placeholder value rather than actual ticket content. It's unusual because real ticket descriptions would typically contain varied and specific information about different issues or requests. The presence of only this single, generic value suggests that actual description data is missing or was not properly imported. In this case, there isn't a "correct" value to map to, as we don't have the real descriptions. 
    -- ticket_subject: The problem is that 'subject1' appears to be a placeholder value rather than actual ticket subject content. In a real dataset, ticket subjects would typically contain varied, descriptive text related to the issue being reported. The correct values should be actual ticket subject content, but since we don't have that information, we can't map it to a specific correct value. 
    -- ticket_recipient_address: The problem is that 'X' is not a valid email address format and likely represents missing or placeholder data. The correct value for missing or placeholder data in an email address field should be an empty string. 
    -- merged_ticket_ids: The problem is that the values in the merged_ticket_ids column are string representations of lists rather than actual list values. The correct values should be actual Python lists. The '[]' represents an empty list, while '[1967]' represents a list containing a single integer value. 
    SELECT
        "ticket_id",
        "channelback_allowed",
        "assignee_id",
        "brand_id",
        "creation_timestamp",
        CASE
            WHEN "ticket_description" = '''description1''' THEN ''''
            ELSE "ticket_description"
        END AS "ticket_description",
        "due_date",
        "external_id",
        "forum_topic_id",
        "group_id",
        "has_incidents",
        "is_public",
        "organization_id",
        "priority",
        "problem_id",
        "recipient_email",
        "requester_id",
        "ticket_status",
        CASE
            WHEN "ticket_subject" = '''subject1''' THEN ''''
            ELSE "ticket_subject"
        END AS "ticket_subject",
        "submitter_id",
        "system_client",
        "ticket_form_id",
        "ticket_type",
        "last_update_timestamp",
        "api_url",
        "submission_channel",
        "ticket_source_id",
        "ticket_source_name",
        "ticket_source_relationship",
        CASE
            WHEN "ticket_recipient_address" = '''X''' THEN ''''
            ELSE "ticket_recipient_address"
        END AS "ticket_recipient_address",
        "ticket_recipient_name",
        CASE
            WHEN "merged_ticket_ids" = '''[]''' THEN '[]'
            WHEN "merged_ticket_ids" = '''[1967]''' THEN '[1967]'
            ELSE "merged_ticket_ids"
        END AS "merged_ticket_ids",
        "source_email_address",
        "followup_ids",
        "followup_source_id"
    FROM "ticket_data_projected_renamed"
),

"ticket_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- ticket_recipient_address: ['X']
    -- merged_ticket_ids: ['[]']
    SELECT 
        CASE
            WHEN "ticket_recipient_address" = 'X' THEN NULL
            ELSE "ticket_recipient_address"
        END AS "ticket_recipient_address",
        CASE
            WHEN "merged_ticket_ids" = '[]' THEN NULL
            ELSE "merged_ticket_ids"
        END AS "merged_ticket_ids",
        "forum_topic_id",
        "last_update_timestamp",
        "system_client",
        "is_public",
        "due_date",
        "ticket_recipient_name",
        "followup_source_id",
        "creation_timestamp",
        "api_url",
        "has_incidents",
        "source_email_address",
        "ticket_description",
        "priority",
        "recipient_email",
        "ticket_subject",
        "submitter_id",
        "ticket_source_name",
        "assignee_id",
        "followup_ids",
        "organization_id",
        "ticket_source_relationship",
        "problem_id",
        "external_id",
        "ticket_status",
        "ticket_source_id",
        "brand_id",
        "submission_channel",
        "channelback_allowed",
        "ticket_id",
        "requester_id",
        "ticket_type",
        "ticket_form_id",
        "group_id"
    FROM "ticket_data_projected_renamed_cleaned"
),

"ticket_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- assignee_id: from DECIMAL to VARCHAR
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- due_date: from DECIMAL to DATE
    -- external_id: from DECIMAL to VARCHAR
    -- followup_ids: from VARCHAR to ARRAY
    -- followup_source_id: from DECIMAL to VARCHAR
    -- forum_topic_id: from DECIMAL to VARCHAR
    -- last_update_timestamp: from VARCHAR to TIMESTAMP
    -- merged_ticket_ids: from VARCHAR to ARRAY
    -- priority: from DECIMAL to VARCHAR
    -- problem_id: from DECIMAL to VARCHAR
    -- requester_id: from INT to VARCHAR
    -- source_email_address: from DECIMAL to VARCHAR
    -- submitter_id: from INT to VARCHAR
    -- system_client: from DECIMAL to VARCHAR
    -- ticket_source_id: from DECIMAL to VARCHAR
    -- ticket_source_name: from DECIMAL to VARCHAR
    -- ticket_source_relationship: from DECIMAL to VARCHAR
    SELECT
        "ticket_recipient_address",
        "is_public",
        "ticket_recipient_name",
        "api_url",
        "has_incidents",
        "ticket_description",
        "recipient_email",
        "ticket_subject",
        "organization_id",
        "ticket_status",
        "brand_id",
        "submission_channel",
        "channelback_allowed",
        "ticket_id",
        "ticket_type",
        "ticket_form_id",
        "group_id",
        CAST("assignee_id" AS VARCHAR) AS "assignee_id",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST("due_date" AS DATE) AS "due_date",
        CAST("external_id" AS VARCHAR) AS "external_id",
        CASE 
            WHEN "followup_ids" = '[]' THEN CAST('[]' AS INTEGER[])
            ELSE TRY_CAST("followup_ids" AS INTEGER[])
        END AS "followup_ids",
        CAST("followup_source_id" AS VARCHAR) AS "followup_source_id",
        CAST("forum_topic_id" AS VARCHAR) AS "forum_topic_id",
        CAST("last_update_timestamp" AS TIMESTAMP) AS "last_update_timestamp",
        from_json("merged_ticket_ids", '["INTEGER"]') AS "merged_ticket_ids",
        CAST("priority" AS VARCHAR) AS "priority",
        CAST("problem_id" AS VARCHAR) AS "problem_id",
        CAST("requester_id" AS VARCHAR) AS "requester_id",
        CAST("source_email_address" AS VARCHAR) AS "source_email_address",
        CAST("submitter_id" AS VARCHAR) AS "submitter_id",
        CAST("system_client" AS VARCHAR) AS "system_client",
        CAST("ticket_source_id" AS VARCHAR) AS "ticket_source_id",
        CAST("ticket_source_name" AS VARCHAR) AS "ticket_source_name",
        CAST("ticket_source_relationship" AS VARCHAR) AS "ticket_source_relationship"
    FROM "ticket_data_projected_renamed_cleaned_null"
),

"ticket_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 11 columns with unacceptable missing values
    -- assignee_id has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- external_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- group_id has 10.0 percent missing. Strategy: 🔄 Unchanged
    -- organization_id has 10.0 percent missing. Strategy: 🔄 Unchanged
    -- source_email_address has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- system_client has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- ticket_recipient_address has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- ticket_recipient_name has 10.0 percent missing. Strategy: 🔄 Unchanged
    -- ticket_source_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- ticket_source_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- ticket_source_relationship has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "ticket_recipient_address",
        "is_public",
        "ticket_recipient_name",
        "api_url",
        "has_incidents",
        "ticket_description",
        "recipient_email",
        "ticket_subject",
        "organization_id",
        "ticket_status",
        "brand_id",
        "submission_channel",
        "channelback_allowed",
        "ticket_id",
        "ticket_type",
        "ticket_form_id",
        "group_id",
        "assignee_id",
        "creation_timestamp",
        "due_date",
        "followup_ids",
        "followup_source_id",
        "forum_topic_id",
        "last_update_timestamp",
        "merged_ticket_ids",
        "priority",
        "problem_id",
        "requester_id",
        "submitter_id"
    FROM "ticket_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_data_projected_renamed_cleaned_null_casted_missing_handled"

stg_ticket_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_data
  description: The table is about support tickets. It contains details like ticket
    ID, creation date, status, requester, assignee, and subject. Each row represents
    a single ticket with its attributes. The table includes information on how the
    ticket was submitted (via web, email), its current status (solved, closed, deleted),
    and associated organizations or groups. It also tracks ticket updates and links
    to related tickets.
  columns:
  - name: ticket_recipient_address
    description: Email address or identifier of the ticket recipient
    tests:
    - not_null
  - name: is_public
    description: Indicates if the ticket is publicly visible
    tests:
    - not_null
  - name: ticket_recipient_name
    description: Name of the ticket recipient or department
    tests:
    - not_null
    - accepted_values:
        values:
        - Support
        - IT
        - Customer Service
        - Technical Support
        - Sales
        - Billing
        - Product Team
        - Quality Assurance
        - Operations
        - Escalations
        - Management
        - Human Resources
        - Legal
        - Security
  - name: api_url
    description: URL to access the ticket in the API
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique URL for each ticket. For this table,
        each row represents a single ticket, and the api_url is unique across rows
        as it includes the ticket_id in the URL.
  - name: has_incidents
    description: Indicates if ticket has associated incidents
    tests:
    - not_null
  - name: ticket_description
    description: Detailed description of the ticket issue
    tests:
    - not_null
  - name: recipient_email
    description: Email address of the ticket recipient
    tests:
    - not_null
  - name: ticket_subject
    description: Brief subject line of the ticket
    tests:
    - not_null
  - name: organization_id
    description: ID of the organization associated with the ticket
    tests:
    - not_null
  - name: ticket_status
    description: Current status of the ticket
    tests:
    - not_null
    - accepted_values:
        values:
        - closed
        - deleted
        - solved
        - open
        - pending
        - in progress
        - on hold
        - escalated
        - new
        - reopened
  - name: brand_id
    description: ID of the brand associated with the ticket
    tests:
    - not_null
  - name: submission_channel
    description: Channel through which the ticket was submitted
    tests:
    - not_null
    - accepted_values:
        values:
        - email
        - web
        - phone
        - in-person
        - mobile app
        - social media
        - chat
  - name: channelback_allowed
    description: Indicates if channelback is allowed
    tests:
    - not_null
  - name: ticket_id
    description: Unique identifier for the ticket
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each ticket. For this table,
        where each row is a ticket, the ticket_id is unique across rows.
  - name: ticket_type
    description: Type of the ticket
    tests:
    - accepted_values:
        values:
        - incident
        - problem
        - change
        - service request
        - task
        - alert
        - event
        - knowledge
        - project
        - maintenance
    cocoon_meta:
      missing_acceptable: Some tickets may not require a specific type.
  - name: ticket_form_id
    description: ID of the form used to create the ticket
    tests:
    - not_null
  - name: group_id
    description: ID of the group handling the ticket
    tests:
    - not_null
  - name: assignee_id
    description: ID of the agent assigned to the ticket
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp of ticket creation
    tests:
    - not_null
  - name: due_date
    description: Due date for the ticket
    cocoon_meta:
      missing_acceptable: Not all tickets require a due date.
  - name: followup_ids
    description: IDs of follow-up tickets
    cocoon_meta:
      missing_acceptable: Not all tickets have followups.
  - name: followup_source_id
    description: ID of the follow-up source
    cocoon_meta:
      missing_acceptable: Not all tickets have followups.
  - name: forum_topic_id
    description: ID of related forum topic
    cocoon_meta:
      missing_acceptable: Not all tickets are related to forum topics.
  - name: last_update_timestamp
    description: Timestamp of last ticket update
    tests:
    - not_null
  - name: merged_ticket_ids
    description: IDs of tickets merged with this one
    cocoon_meta:
      missing_acceptable: Not all tickets are merged with others.
  - name: priority
    description: Priority level of the ticket
    cocoon_meta:
      missing_acceptable: Not all tickets are assigned a priority.
  - name: problem_id
    description: ID of the problem associated with the ticket
    cocoon_meta:
      missing_acceptable: Not all tickets are linked to a specific problem.
  - name: requester_id
    description: ID of the person who requested the ticket
    tests:
    - not_null
  - name: submitter_id
    description: ID of the person who submitted the ticket
    tests:
    - not_null

stg_daylight_time_data (first 100 rows)

time_zone year_ daylight_saving_offset_hours daylight_saving_end_utc daylight_saving_start_utc
0 Montevideo 1970 1 1970-06-14 00:00:00 1970-04-25 00:00:00
1 Cairo 1970 1 1970-10-01 03:00:00 1970-05-01 01:00:00

stg_daylight_time_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"daylight_time_data_projected" AS (
    -- Projection: Selecting 5 out of 6 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "time_zone",
        "year_",
        "daylight_end_utc",
        "daylight_offset",
        "daylight_start_utc"
    FROM "daylight_time_data"
),

"daylight_time_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- daylight_end_utc -> daylight_saving_end_utc
    -- daylight_offset -> daylight_saving_offset_hours
    -- daylight_start_utc -> daylight_saving_start_utc
    SELECT 
        "time_zone",
        "year_",
        "daylight_end_utc" AS "daylight_saving_end_utc",
        "daylight_offset" AS "daylight_saving_offset_hours",
        "daylight_start_utc" AS "daylight_saving_start_utc"
    FROM "daylight_time_data_projected"
),

"daylight_time_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- daylight_saving_end_utc: from VARCHAR to TIMESTAMP
    -- daylight_saving_start_utc: from VARCHAR to TIMESTAMP
    SELECT
        "time_zone",
        "year_",
        "daylight_saving_offset_hours",
        CAST("daylight_saving_end_utc" AS TIMESTAMP) AS "daylight_saving_end_utc",
        CAST("daylight_saving_start_utc" AS TIMESTAMP) AS "daylight_saving_start_utc"
    FROM "daylight_time_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "daylight_time_data_projected_renamed_casted"

stg_daylight_time_data.yml (Document the table)

version: 2
models:
- name: stg_daylight_time_data
  description: The table is about daylight saving time data. It contains information
    for different time zones and years. The details include the start and end times
    of daylight saving in UTC, and the daylight offset. Each row represents a specific
    time zone and year combination with its corresponding daylight saving information.
  columns:
  - name: time_zone
    description: Name of the time zone
    tests:
    - not_null
  - name: year_
    description: Year of the daylight saving time data
    tests:
    - not_null
  - name: daylight_saving_offset_hours
    description: Number of hours added during daylight saving time
    tests:
    - not_null
  - name: daylight_saving_end_utc
    description: End date and time of daylight saving in UTC
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the end date and time of daylight saving
        in UTC. For this table, each row is for a specific time zone and year combination.
        The daylight_saving_end_utc could potentially be unique across rows, as it's
        unlikely for different time zones to have the exact same end time down to
        the second.
  - name: daylight_saving_start_utc
    description: Start date and time of daylight saving in UTC
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the start date and time of daylight saving
        in UTC. For this table, each row is for a specific time zone and year combination.
        The daylight_saving_start_utc could potentially be unique across rows, as
        it's unlikely for different time zones to have the exact same start time down
        to the second.

stg_ticket_field_history_data (first 100 rows)

changed_field ticket_id new_value update_timestamp
0 status 103 solved 2020-02-10 08:36:38
1 status 107 solved 2020-02-10 08:36:38
2 status 982 solved 2020-02-10 22:32:56
3 status 312 solved 2020-02-10 08:35:18
4 status 6964 solved 2020-06-01 21:11:59
5 assignee_id 108 1111 2020-02-10 08:36:38
6 status 95 solved 2020-02-10 08:36:39
7 status 11016 solved 2020-09-02 15:53:15
8 status 1192 solved 2020-02-12 00:22:26
9 status 974 solved 2020-02-10 21:47:41
10 priority 980 solved 2020-02-10 22:06:57
11 status 11071 solved 2020-09-02 11:01:27
12 status 102 solved 2020-02-10 08:36:38
13 status 226 solved 2020-02-10 08:35:52
14 status 141 solved 2020-02-10 08:36:25
15 status 8205 solved 2020-07-07 23:01:47
16 status 229 solved 2020-02-10 08:35:52

stg_ticket_field_history_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_field_history_data_projected" AS (
    -- Projection: Selecting 5 out of 6 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "field_name",
        "ticket_id",
        "updated",
        "user_id",
        "value_"
    FROM "ticket_field_history_data"
),

"ticket_field_history_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- field_name -> changed_field
    -- updated -> update_timestamp
    -- value_ -> new_value
    SELECT 
        "field_name" AS "changed_field",
        "ticket_id",
        "updated" AS "update_timestamp",
        "user_id",
        "value_" AS "new_value"
    FROM "ticket_field_history_data_projected"
),

"ticket_field_history_data_projected_renamed_dedup" AS (
    -- Deduplication: Removed 1 duplicated rows
    SELECT DISTINCT * FROM "ticket_field_history_data_projected_renamed"
),

"ticket_field_history_data_projected_renamed_dedup_casted" AS (
    -- Column Type Casting: 
    -- update_timestamp: from VARCHAR to TIMESTAMP
    -- user_id: from DECIMAL to VARCHAR
    SELECT
        "changed_field",
        "ticket_id",
        "new_value",
        CAST("update_timestamp" AS TIMESTAMP) AS "update_timestamp",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "ticket_field_history_data_projected_renamed_dedup"
),

"ticket_field_history_data_projected_renamed_dedup_casted_missing_handled" AS (
    -- Handling missing values: There are 1 columns with unacceptable missing values
    -- user_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "changed_field",
        "ticket_id",
        "new_value",
        "update_timestamp"
    FROM "ticket_field_history_data_projected_renamed_dedup_casted"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_field_history_data_projected_renamed_dedup_casted_missing_handled"

stg_ticket_field_history_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_field_history_data
  description: The table is about ticket field history data. It records changes to
    ticket fields like status and priority. Each row represents a change, including
    the field name, ticket ID, update timestamp, user ID (if applicable), and new
    value. This allows tracking of how ticket details evolve over time.
  columns:
  - name: changed_field
    description: Name of the ticket field that was changed
    tests:
    - not_null
    - accepted_values:
        values:
        - status
        - assignee_id
        - priority
        - subject
        - description
        - type
        - due_date
        - tags
        - requester_id
        - group_id
        - cc_emails
        - custom_fields
  - name: ticket_id
    description: Unique identifier for the ticket
    tests:
    - not_null
  - name: new_value
    description: New value of the changed field
    tests:
    - not_null
    - accepted_values:
        values:
        - solved
        - unsolved
        - '1111'
  - name: update_timestamp
    description: Timestamp of when the field was updated
    tests:
    - not_null

stg_brand_data (first 100 rows)

brand_id is_deleted is_active brand_url is_default has_help_center help_center_state logo_content_type logo_content_url is_logo_deleted logo_file_name logo_height logo_id is_logo_inline logo_mapped_content_url logo_size logo_url logo_width brand_name brand_subdomain brand_main_url
0 36000 False True 5e52582af2ab7c3a6e7fa6856b4cd945 True True enabled image/png 5e52582af2ab7c3a6e7fa6856b4cd945 False 2abdc594c0ad6eb2438448b3cbf7da56 22 3600 False 5e52582af2ab7c3a6e7fa6856b4cd945 1961 5e52582af2ab7c3a6e7fa6856b4cd945 80 2abdc594c0ad6eb2438448b3cbf7da56 2abdc594c0ad6eb2438448b3cbf7da56 5e52582af2ab7c3a6e7fa6856b4cd945

stg_brand_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"brand_data_projected" AS (
    -- Projection: Selecting 21 out of 22 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "active",
        "brand_url",
        "default_",
        "has_help_center",
        "help_center_state",
        "logo_content_type",
        "logo_content_url",
        "logo_deleted",
        "logo_file_name",
        "logo_height",
        "logo_id",
        "logo_inline",
        "logo_mapped_content_url",
        "logo_size",
        "logo_url",
        "logo_width",
        "name",
        "subdomain",
        "url"
    FROM "brand_data"
),

"brand_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> brand_id
    -- _fivetran_deleted -> is_deleted
    -- active -> is_active
    -- default_ -> is_default
    -- logo_deleted -> is_logo_deleted
    -- logo_inline -> is_logo_inline
    -- name -> brand_name
    -- subdomain -> brand_subdomain
    -- url -> brand_main_url
    SELECT 
        "id" AS "brand_id",
        "_fivetran_deleted" AS "is_deleted",
        "active" AS "is_active",
        "brand_url",
        "default_" AS "is_default",
        "has_help_center",
        "help_center_state",
        "logo_content_type",
        "logo_content_url",
        "logo_deleted" AS "is_logo_deleted",
        "logo_file_name",
        "logo_height",
        "logo_id",
        "logo_inline" AS "is_logo_inline",
        "logo_mapped_content_url",
        "logo_size",
        "logo_url",
        "logo_width",
        "name" AS "brand_name",
        "subdomain" AS "brand_subdomain",
        "url" AS "brand_main_url"
    FROM "brand_data_projected"
),

"brand_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- brand_name: The problem is that the brand_name column contains a value that appears to be a hash or encoded string ('2abdc594c0ad6eb2438448b3cbf7da56') rather than a readable brand name. This value is not meaningful or interpretable as a brand name. The correct value should be an empty string, as we don't have enough information to determine the actual brand name. 
    SELECT
        "brand_id",
        "is_deleted",
        "is_active",
        "brand_url",
        "is_default",
        "has_help_center",
        "help_center_state",
        "logo_content_type",
        "logo_content_url",
        "is_logo_deleted",
        "logo_file_name",
        "logo_height",
        "logo_id",
        "is_logo_inline",
        "logo_mapped_content_url",
        "logo_size",
        "logo_url",
        "logo_width",
        CASE
            WHEN "brand_name" = '''2abdc594c0ad6eb2438448b3cbf7da56''' THEN ''''
            ELSE "brand_name"
        END AS "brand_name",
        "brand_subdomain",
        "brand_main_url"
    FROM "brand_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "brand_data_projected_renamed_cleaned"

stg_brand_data.yml (Document the table)

version: 2
models:
- name: stg_brand_data
  description: The table is about brand data. It contains details of brands including
    their ID, name, URL, logo information, and help center status. The table tracks
    whether brands are active, have a default status, and includes various attributes
    of their logos such as file type, dimensions, and URLs. It also records subdomain
    information and whether the brand has an enabled help center.
  columns:
  - name: brand_id
    description: Unique identifier for the brand
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each brand. For this
        table, each row represents a unique brand. The brand_id is designed to be
        a unique identifier across rows.
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: is_active
    description: Indicates if the brand is currently active
    tests:
    - not_null
  - name: brand_url
    description: URL associated with the brand
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column likely represents a unique URL for each brand. For this
        table, each row represents a unique brand. The brand_url is likely to be unique
        across rows as it's typically a distinct identifier for each brand's page
        or profile.
  - name: is_default
    description: Indicates if this is the default brand
    tests:
    - not_null
  - name: has_help_center
    description: Indicates if the brand has a help center
    tests:
    - not_null
  - name: help_center_state
    description: Current state of the help center
    tests:
    - not_null
    - accepted_values:
        values:
        - enabled
        - disabled
  - name: logo_content_type
    description: File type of the brand logo
    tests:
    - not_null
    - accepted_values:
        values:
        - image/png
        - image/jpeg
        - image/gif
        - image/bmp
        - image/tiff
        - image/webp
        - image/svg+xml
  - name: logo_content_url
    description: URL to access the logo content
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the URL to access the logo content. While it's
        likely to be unique for each brand's logo, there's a small possibility of
        duplicate URLs. However, given the nature of content URLs, it could be considered
        a candidate key.
  - name: is_logo_deleted
    description: Indicates if the logo has been deleted
    tests:
    - not_null
  - name: logo_file_name
    description: File name of the brand logo
    tests:
    - not_null
  - name: logo_height
    description: Height of the logo in pixels
    tests:
    - not_null
  - name: logo_id
    description: Unique identifier for the logo
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each brand's logo.
        For this table, each row represents a unique brand. The logo_id is likely
        to be unique across rows as each brand would have its own distinct logo.
  - name: is_logo_inline
    description: Indicates if the logo is inline
    tests:
    - not_null
  - name: logo_mapped_content_url
    description: Mapped URL for logo content
    tests:
    - not_null
  - name: logo_size
    description: File size of the logo in bytes
    tests:
    - not_null
  - name: logo_url
    description: URL to access the logo
    tests:
    - not_null
  - name: logo_width
    description: Width of the logo in pixels
    tests:
    - not_null
  - name: brand_name
    description: Name of the brand
    tests:
    - not_null
  - name: brand_subdomain
    description: Subdomain associated with the brand
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column likely represents a subdomain associated with each brand.
        For this table, each row represents a unique brand. The brand_subdomain is
        likely to be unique across rows as it's typically a distinct identifier for
        each brand's subdomain.
  - name: brand_main_url
    description: Main URL for the brand
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the main URL for the brand. For this table,
        each row represents a unique brand. The brand_main_url is likely to be unique
        across rows as it's typically a distinct identifier for each brand's website.

stg_user_tag_data (first 100 rows)

tag_id user_id
0 362ebc379b4b1b33b9656a2232a9e673 40092
1 266e0d3d29830abfe7d4ed98b47966f7 40092
2 266e0d3d29830abfe7d4ed98b47966f7 39882
3 ef43d72019ff152f0b1209e962a2f4f2 40282
4 ef43d72019ff152f0b1209e962a2f4f2 40284
5 ef43d72019ff152f0b1209e962a2f4f2 40285
6 d041ea3c4cb5f32804365f2f732dd88a 42370
7 b545892c9465b8e00d32b20fcd55caf7 41244
8 266e0d3d29830abfe7d4ed98b47966f7 40386
9 362ebc379b4b1b33b9656a2232a9e673 40386

stg_user_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"user_tag_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "tag",
        "user_id"
    FROM "user_tag_data"
),

"user_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- tag -> tag_id
    SELECT 
        "tag" AS "tag_id",
        "user_id"
    FROM "user_tag_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "user_tag_data_projected_renamed"

stg_user_tag_data.yml (Document the table)

version: 2
models:
- name: stg_user_tag_data
  description: The table represents a many-to-many relationship between users and
    tags. Each row shows a tag associated with a user_id. Users can have multiple
    tags, and tags can be assigned to multiple users. The tag column contains hash-like
    strings, while user_id contains numeric identifiers. This structure allows for
    tracking which tags are associated with which users.
  columns:
  - name: tag_id
    description: Unique identifier for a tag
    tests:
    - not_null
  - name: user_id
    description: Numeric identifier for a user
    tests:
    - not_null

stg_schedule_holiday_data (first 100 rows)

holiday_id schedule_id is_deleted holiday_name end_date start_date
0 1163028 360000 False Test Holiday 2022-12-27 2022-12-26
1 1163027 360000 False Test Holiday 2 2022-12-29 2022-12-29

stg_schedule_holiday_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"schedule_holiday_data_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "schedule_id",
        "_fivetran_deleted",
        "end_date",
        "name",
        "start_date"
    FROM "schedule_holiday_data"
),

"schedule_holiday_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> holiday_id
    -- _fivetran_deleted -> is_deleted
    -- name -> holiday_name
    SELECT 
        "id" AS "holiday_id",
        "schedule_id",
        "_fivetran_deleted" AS "is_deleted",
        "end_date",
        "name" AS "holiday_name",
        "start_date"
    FROM "schedule_holiday_data_projected"
),

"schedule_holiday_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- end_date: from VARCHAR to DATE
    -- start_date: from VARCHAR to DATE
    SELECT
        "holiday_id",
        "schedule_id",
        "is_deleted",
        "holiday_name",
        CAST("end_date" AS DATE) AS "end_date",
        CAST("start_date" AS DATE) AS "start_date"
    FROM "schedule_holiday_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "schedule_holiday_data_projected_renamed_casted"

stg_schedule_holiday_data.yml (Document the table)

version: 2
models:
- name: stg_schedule_holiday_data
  description: The table is about holiday schedules. It contains details of holidays
    including an ID, schedule ID, deletion status, end date, name, and start date.
    Each row represents a specific holiday period within a schedule. The schedule_id
    likely links to a separate schedule table, allowing multiple holidays to be associated
    with one schedule.
  columns:
  - name: holiday_id
    description: Unique identifier for the holiday entry
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each holiday entry. For this
        table, each row represents a specific holiday period within a schedule. holiday_id
        is likely to be unique across rows as it's designed to be an identifier.
  - name: schedule_id
    description: Identifier linking to a separate schedule table
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record has been deleted
    tests:
    - not_null
  - name: holiday_name
    description: Name or description of the holiday
    tests:
    - not_null
  - name: end_date
    description: The last day of the holiday period
    tests:
    - not_null
  - name: start_date
    description: The first day of the holiday period
    tests:
    - not_null

stg_user_data (first 100 rows)

is_active is_chat_only email locale locale_id is_moderator full_name private_comments_only organization_id is_restricted_agent user_role is_shared is_shared_agent is_suspended ticket_restriction time_zone is_2fa_enabled api_url account_verified alias creation_date last_login_date last_update_date user_id
0 True False example@email.com en-US 1 False John Doe False 3.702977e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403958466973.json True None 2020-02-26 22:55:12 2023-02-26 22:55:12 2020-02-26 22:55:12 403958466973
1 True False example@email.com en-US 1 False John Doe False NaN True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403969371634.json True None 2020-02-26 22:41:37 2023-02-26 22:55:12 2020-02-26 22:41:37 403969371634
2 True False example@email.com en-US 1 False John Doe False 3.703192e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403957746773.json True None 2020-02-26 22:35:14 2023-02-26 22:55:12 2020-02-26 22:35:15 403957746773
3 True False example@email.com en-US 1 False John Doe False 3.703262e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403970285734.json True None 2020-02-26 23:07:41 2023-02-26 22:55:12 2020-02-26 23:07:41 403970285734
4 True False example@email.com en-US 1 False John Doe False 3.703262e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403969943274.json True None 2020-02-26 22:57:51 2023-02-26 22:55:12 2020-02-26 22:57:52 403969943274
5 True False example@email.com en-US 1 False John Doe False 3.703262e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403959084893.json True None 2020-02-26 23:14:08 2023-02-26 22:55:12 2020-02-26 23:14:08 403959084893
6 True False example@email.com en-US 1 False John Doe False 3.703192e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403969141074.json True None 2020-02-26 22:34:41 2023-02-26 22:55:12 2020-02-26 22:34:41 403969141074
7 True False example@email.com en-US 1 False John Doe False 3.703313e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403957960093.json True None 2020-02-26 22:41:37 2023-02-26 22:55:12 2020-02-26 22:41:38 403957960093
8 True False example@email.com en-US 1 False John Doe False 3.703192e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403969141094.json True None 2020-02-26 22:34:41 2023-02-26 22:55:12 2020-02-26 22:34:41 403969141094
9 True False example@email.com en-US 1 False John Doe False 3.703261e+11 True end-user False False False requested Pacific Time (US & Canada) False https://fivetran1813.zendesk.com/api/v2/users/403959218893.json True None 2020-02-26 23:18:06 2023-02-26 22:55:12 2020-02-26 23:18:06 403959218893

stg_user_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"user_data_projected" AS (
    -- Projection: Selecting 31 out of 32 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "active",
        "alias",
        "authenticity_token",
        "chat_only",
        "created_at",
        "details",
        "email",
        "external_id",
        "last_login_at",
        "locale",
        "locale_id",
        "moderator",
        "name",
        "notes",
        "only_private_comments",
        "organization_id",
        "phone",
        "remote_photo_url",
        "restricted_agent",
        "role",
        "shared",
        "shared_agent",
        "signature",
        "suspended",
        "ticket_restriction",
        "time_zone",
        "two_factor_auth_enabled",
        "updated_at",
        "url",
        "verified"
    FROM "user_data"
),

"user_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> user_id
    -- active -> is_active
    -- authenticity_token -> auth_token
    -- chat_only -> is_chat_only
    -- created_at -> creation_date
    -- details -> additional_details
    -- last_login_at -> last_login_date
    -- moderator -> is_moderator
    -- name -> full_name
    -- only_private_comments -> private_comments_only
    -- phone -> phone_number
    -- remote_photo_url -> photo_url
    -- restricted_agent -> is_restricted_agent
    -- role -> user_role
    -- shared -> is_shared
    -- shared_agent -> is_shared_agent
    -- suspended -> is_suspended
    -- two_factor_auth_enabled -> is_2fa_enabled
    -- updated_at -> last_update_date
    -- url -> api_url
    -- verified -> account_verified
    SELECT 
        "id" AS "user_id",
        "active" AS "is_active",
        "alias",
        "authenticity_token" AS "auth_token",
        "chat_only" AS "is_chat_only",
        "created_at" AS "creation_date",
        "details" AS "additional_details",
        "email",
        "external_id",
        "last_login_at" AS "last_login_date",
        "locale",
        "locale_id",
        "moderator" AS "is_moderator",
        "name" AS "full_name",
        "notes",
        "only_private_comments" AS "private_comments_only",
        "organization_id",
        "phone" AS "phone_number",
        "remote_photo_url" AS "photo_url",
        "restricted_agent" AS "is_restricted_agent",
        "role" AS "user_role",
        "shared" AS "is_shared",
        "shared_agent" AS "is_shared_agent",
        "signature",
        "suspended" AS "is_suspended",
        "ticket_restriction",
        "time_zone",
        "two_factor_auth_enabled" AS "is_2fa_enabled",
        "updated_at" AS "last_update_date",
        "url" AS "api_url",
        "verified" AS "account_verified"
    FROM "user_data_projected"
),

"user_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- additional_details: from DECIMAL to VARCHAR
    -- alias: from DECIMAL to VARCHAR
    -- auth_token: from DECIMAL to VARCHAR
    -- creation_date: from VARCHAR to TIMESTAMP
    -- external_id: from DECIMAL to VARCHAR
    -- last_login_date: from VARCHAR to TIMESTAMP
    -- last_update_date: from VARCHAR to TIMESTAMP
    -- notes: from DECIMAL to VARCHAR
    -- phone_number: from DECIMAL to VARCHAR
    -- photo_url: from DECIMAL to VARCHAR
    -- signature: from DECIMAL to VARCHAR
    -- user_id: from INT to VARCHAR
    SELECT
        "is_active",
        "is_chat_only",
        "email",
        "locale",
        "locale_id",
        "is_moderator",
        "full_name",
        "private_comments_only",
        "organization_id",
        "is_restricted_agent",
        "user_role",
        "is_shared",
        "is_shared_agent",
        "is_suspended",
        "ticket_restriction",
        "time_zone",
        "is_2fa_enabled",
        "api_url",
        "account_verified",
        CAST("additional_details" AS VARCHAR) AS "additional_details",
        CAST("alias" AS VARCHAR) AS "alias",
        CAST("auth_token" AS VARCHAR) AS "auth_token",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("external_id" AS VARCHAR) AS "external_id",
        CAST("last_login_date" AS TIMESTAMP) AS "last_login_date",
        CAST("last_update_date" AS TIMESTAMP) AS "last_update_date",
        CAST("notes" AS VARCHAR) AS "notes",
        CAST("phone_number" AS VARCHAR) AS "phone_number",
        CAST("photo_url" AS VARCHAR) AS "photo_url",
        CAST("signature" AS VARCHAR) AS "signature",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "user_data_projected_renamed"
),

"user_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 8 columns with unacceptable missing values
    -- additional_details has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- auth_token has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- external_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- notes has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- organization_id has 10.0 percent missing. Strategy: 🔄 Unchanged
    -- phone_number has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- photo_url has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- signature has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "is_active",
        "is_chat_only",
        "email",
        "locale",
        "locale_id",
        "is_moderator",
        "full_name",
        "private_comments_only",
        "organization_id",
        "is_restricted_agent",
        "user_role",
        "is_shared",
        "is_shared_agent",
        "is_suspended",
        "ticket_restriction",
        "time_zone",
        "is_2fa_enabled",
        "api_url",
        "account_verified",
        "alias",
        "creation_date",
        "last_login_date",
        "last_update_date",
        "user_id"
    FROM "user_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "user_data_projected_renamed_casted_missing_handled"

stg_user_data.yml (Document the table)

version: 2
models:
- name: stg_user_data
  description: The table is about user data in a system. It contains user details
    like ID, name, email, and account settings. The table includes information on
    user activity, roles, and preferences. It also stores authentication details,
    organization affiliations, and system-specific data like API URLs and timestamps
    for creation and updates.
  columns:
  - name: is_active
    description: Indicates if the user account is active
    tests:
    - not_null
  - name: is_chat_only
    description: User is restricted to chat interactions only
    tests:
    - not_null
  - name: email
    description: User's email address
    tests:
    - not_null
  - name: locale
    description: User's preferred language and region
    tests:
    - not_null
  - name: locale_id
    description: Identifier for user's locale
    tests:
    - not_null
  - name: is_moderator
    description: Indicates if user has moderator privileges
    tests:
    - not_null
  - name: full_name
    description: User's full name
    tests:
    - not_null
  - name: private_comments_only
    description: User can only make private comments
    tests:
    - not_null
  - name: organization_id
    description: Identifier of user's associated organization
    tests:
    - not_null
  - name: is_restricted_agent
    description: Indicates if user is a restricted agent
    tests:
    - not_null
  - name: user_role
    description: User's role in the system
    tests:
    - not_null
    - accepted_values:
        values:
        - end-user
        - admin
        - moderator
        - guest
        - editor
        - manager
        - developer
        - support
        - analyst
        - tester
  - name: is_shared
    description: Indicates if user account is shared
    tests:
    - not_null
  - name: is_shared_agent
    description: Indicates if user is a shared agent
    tests:
    - not_null
  - name: is_suspended
    description: Indicates if user account is suspended
    tests:
    - not_null
  - name: ticket_restriction
    description: User's ticket access restriction level
    tests:
    - not_null
    - accepted_values:
        values:
        - requested
        - pending
        - approved
        - denied
        - basic
        - standard
        - premium
        - vip
        - admin
        - superadmin
        - restricted
        - blocked
  - name: time_zone
    description: User's preferred time zone
    tests:
    - not_null
  - name: is_2fa_enabled
    description: Two-factor authentication status
    tests:
    - not_null
  - name: api_url
    description: API URL for user data
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique API URL for each user. For this table,
        each row represents a unique user in the system. The api_url appears to be
        unique across rows, as it includes the user_id in its structure.
  - name: account_verified
    description: User's account verification status
    tests:
    - not_null
  - name: alias
    description: Alternative name for the user
    cocoon_meta:
      missing_acceptable: Optional alternate name for users who don't use one.
  - name: creation_date
    description: Timestamp of user account creation
    tests:
    - not_null
  - name: last_login_date
    description: Timestamp of user's last login
    tests:
    - not_null
  - name: last_update_date
    description: Timestamp of last user account update
    tests:
    - not_null
  - name: user_id
    description: Unique identifier for the user
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each user. For this
        table, each row represents a unique user in the system. The user_id is designed
        to be unique across all users and appears to be unique across rows.

stg_ticket_comment_data (first 100 rows)

comment_content is_facebook_comment is_public ticket_id is_tweet is_voice_comment comment_id creation_timestamp user_id
0 body1 False True 1077 False False 1030820361033 2020-02-11 07:49:14 402935450113
1 body2 False True 7636 False False 1198167194074 2020-06-26 20:47:27 413068955393
2 body3 False True 7367 False False 1227221959693 2020-07-21 19:33:46 413068955393
3 body4 False True 7367 False False 1227005852153 2020-07-21 17:07:54 413068955393
4 body5 False True 7367 False False 1225658181514 2020-07-20 20:16:47 413068955393
5 body6 False True 7636 False False 1189989966994 2020-06-19 20:40:41 413068955393
6 body7 False True 7367 False False 1228317430153 2020-07-22 16:41:28 413068955393
7 body8 False True 7367 False False 1227234557733 2020-07-21 19:43:27 413068955393
8 body9 False True 7367 False False 1227270483653 2020-07-21 20:07:17 413068955393
9 body10 False True 13628 False False 1335841328993 2020-10-19 13:56:05 424142574593
10 body11 False True 13628 False False 1338387572833 2020-10-21 09:08:41 424142574593
11 body12 False True 13095 False False 1326976058654 2020-10-11 13:13:55 423362500353
12 body13 False True 13057 False False 1326333791494 2020-10-10 15:40:36 423362500353
13 body14 False True 13057 False False 1326246720513 2020-10-10 14:39:46 423362500353
14 body15 False True 13095 False False 1326861102034 2020-10-11 08:02:56 423362500353
15 body16 False True 13095 False False 1326864438354 2020-10-11 08:08:54 423362500353
16 body17 False True 14352 False False 1350940278314 2020-11-01 02:20:20 423362500353
17 body18 False True 13095 False False 1326296442114 2020-10-10 14:51:51 423362500353
18 body19 False True 13057 False False 1326862829914 2020-10-11 08:05:48 423362500353
19 body20 False True 13057 False False 1326808961753 2020-10-11 07:53:43 423362500353

stg_ticket_comment_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_comment_data_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "body",
        "created",
        "facebook_comment",
        "public_",
        "ticket_id",
        "tweet",
        "user_id",
        "voice_comment"
    FROM "ticket_comment_data"
),

"ticket_comment_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> comment_id
    -- body -> comment_content
    -- created -> creation_timestamp
    -- facebook_comment -> is_facebook_comment
    -- public_ -> is_public
    -- tweet -> is_tweet
    -- voice_comment -> is_voice_comment
    SELECT 
        "id" AS "comment_id",
        "body" AS "comment_content",
        "created" AS "creation_timestamp",
        "facebook_comment" AS "is_facebook_comment",
        "public_" AS "is_public",
        "ticket_id",
        "tweet" AS "is_tweet",
        "user_id",
        "voice_comment" AS "is_voice_comment"
    FROM "ticket_comment_data_projected"
),

"ticket_comment_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- comment_id: from INT to VARCHAR
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- user_id: from INT to VARCHAR
    SELECT
        "comment_content",
        "is_facebook_comment",
        "is_public",
        "ticket_id",
        "is_tweet",
        "is_voice_comment",
        CAST("comment_id" AS VARCHAR) AS "comment_id",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "ticket_comment_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_comment_data_projected_renamed_casted"

stg_ticket_comment_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_comment_data
  description: The table is about ticket comments. It contains details such as the
    comment ID, body, creation time, ticket ID, and user ID. It also includes boolean
    flags for Facebook comments, public visibility, tweets, and voice comments. Each
    row represents a single comment associated with a specific ticket and user.
  columns:
  - name: comment_content
    description: The content of the comment
    tests:
    - not_null
  - name: is_facebook_comment
    description: Indicates if the comment is from Facebook
    tests:
    - not_null
  - name: is_public
    description: Indicates if the comment is publicly visible
    tests:
    - not_null
  - name: ticket_id
    description: Identifier of the ticket associated with the comment
    tests:
    - not_null
  - name: is_tweet
    description: Indicates if the comment is a tweet
    tests:
    - not_null
  - name: is_voice_comment
    description: Indicates if the comment is a voice comment
    tests:
    - not_null
  - name: comment_id
    description: Unique identifier for the comment
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each comment. For this
        table, where each row represents a single comment, the comment_id should be
        unique across all rows, as it's designed to distinctly identify each comment.
  - name: creation_timestamp
    description: Timestamp when the comment was created
    tests:
    - not_null
  - name: user_id
    description: Identifier of the user who made the comment
    tests:
    - not_null

stg_group_data (first 100 rows)

is_deleted group_name api_url created_at group_id updated_at
0 False Non-Support - (Billing Tickets) https://zendesk.com/api/v2/groups/360008426314.json 2020-02-04 00:03:11 360008426314 2020-02-04 00:40:57
1 False Assignment Group 7 https://zendesk.com/api/v2/groups/360013670454.json 2020-09-01 10:16:50 360013670454 2020-09-01 10:16:50
2 False Information Technology https://zendesk.com/api/v2/groups/360014827554.json 2020-10-07 21:53:42 360014827554 2020-10-07 21:57:54
3 False Non-Support - (Prospect Tickets) https://zendesk.com/api/v2/groups/360008418973.json 2020-02-04 00:41:49 360008418973 2020-02-04 00:41:49
4 False Assignment Group 4 https://zendesk.com/api/v2/groups/360013670414.json 2020-09-01 10:15:17 360013670414 2020-09-01 10:15:17
5 False Assignment Group 5 https://zendesk.com/api/v2/groups/360013656353.json 2020-09-01 10:15:55 360013656353 2020-09-01 10:15:55
6 False Assignment Group 6 https://zendesk.com/api/v2/groups/360013656373.json 2020-09-01 10:16:23 360013656373 2020-09-01 10:16:23
7 False Assignment Group 3 https://zendesk.com/api/v2/groups/360013670374.json 2020-09-01 10:14:39 360013670374 2020-09-01 10:14:39

stg_group_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"group_data_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "created_at",
        "name",
        "updated_at",
        "url"
    FROM "group_data"
),

"group_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> group_id
    -- _fivetran_deleted -> is_deleted
    -- name -> group_name
    -- url -> api_url
    SELECT 
        "id" AS "group_id",
        "_fivetran_deleted" AS "is_deleted",
        "created_at",
        "name" AS "group_name",
        "updated_at",
        "url" AS "api_url"
    FROM "group_data_projected"
),

"group_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- group_id: from INT to VARCHAR
    -- updated_at: from VARCHAR to TIMESTAMP
    SELECT
        "is_deleted",
        "group_name",
        "api_url",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("group_id" AS VARCHAR) AS "group_id",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at"
    FROM "group_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "group_data_projected_renamed_casted"

stg_group_data.yml (Document the table)

version: 2
models:
- name: stg_group_data
  description: The table is about support groups. It contains group details such as
    ID, name, creation date, update date, and API URL. Each row represents a distinct
    group like "Non-Support - (Billing Tickets)" or "Information Technology". The
    table likely comes from a customer support platform like Zendesk, as evidenced
    by the URL structure. It tracks when groups were created and last updated, which
    is useful for auditing and management purposes.
  columns:
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: group_name
    description: Name of the support group
    tests:
    - not_null
  - name: api_url
    description: API endpoint URL for the group
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains the API endpoint URL for each group. For this
        table, each row represents a unique group, and the API URL is specific to
        each group. The URL structure includes the group_id, making it unique for
        each group.
  - name: created_at
    description: Timestamp when the group was created
    tests:
    - not_null
  - name: group_id
    description: Unique identifier for the group
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each group. For this
        table, each row represents a distinct group, and the group_id is designed
        to be a unique identifier across all groups in the system.
  - name: updated_at
    description: Timestamp when the group was last updated
    tests:
    - not_null

stg_time_zone_data (first 100 rows)

location utc_offset
0 London +00:00
1 Dublin +00:00

stg_time_zone_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"time_zone_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "time_zone",
        "standard_offset"
    FROM "time_zone_data"
),

"time_zone_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- time_zone -> location
    -- standard_offset -> utc_offset
    SELECT 
        "time_zone" AS "location",
        "standard_offset" AS "utc_offset"
    FROM "time_zone_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "time_zone_data_projected_renamed"

stg_time_zone_data.yml (Document the table)

version: 2
models:
- name: stg_time_zone_data
  description: The table is about time zones. It contains information on different
    cities or regions and their standard time offsets from UTC. The "time_zone" column
    lists locations like London and Dublin. The "standard_offset" column shows the
    time difference from UTC, with both examples having "+00:00" indicating they are
    in the same time zone.
  columns:
  - name: location
    description: Name of the city or region in the time zone
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the name of the city or region in the time
        zone. For this table, each row represents a unique location and its corresponding
        UTC offset. Since each city or region typically has its own distinct time
        zone setting, the location is likely to be unique across rows.
  - name: utc_offset
    description: Time difference from UTC in hours and minutes
    tests:
    - not_null

stg_domain_name_data (first 100 rows)

row_index organization_id encrypted_domain
0 0 370335 f89dedacd29b4d5b0825d81a0c0fedfa
1 0 370355 266cb5d33605c86846121770b4d312d1
2 0 370364 7878cf470a679110108b3da5ea619a0c
3 0 370298 6fc5425be3db5b4f70eaf61c4323672c
4 0 370298 040fd698880646db8df23291671e4b57
5 0 370298 421319ed5d78b90c3048510efd8df758
6 0 370298 bafeca689fec90acd8878e2fb025e355
7 0 370298 4ba5ff9deb372b42e67838630b39ea3c
8 0 370298 7ae869ebf95fd6538c3af28cbbf29e53
9 0 370298 2c94112ed0107c614f8c5ea6b1944962

stg_domain_name_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"domain_name_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "index_",
        "organization_id",
        "domain_name"
    FROM "domain_name_data"
),

"domain_name_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- index_ -> row_index
    -- domain_name -> encrypted_domain
    SELECT 
        "index_" AS "row_index",
        "organization_id",
        "domain_name" AS "encrypted_domain"
    FROM "domain_name_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "domain_name_data_projected_renamed"

stg_domain_name_data.yml (Document the table)

version: 2
models:
- name: stg_domain_name_data
  description: The table represents a relation between organizations and domain names.
    Each row links an organization ID to a domain name. Some organizations may have
    multiple domain names associated with them, as seen with organization 370298 which
    has two entries. The domain names appear to be hashed or encrypted values rather
    than readable URLs.
  columns:
  - name: row_index
    description: Constant value, possibly for indexing or sorting
    tests:
    - not_null
  - name: organization_id
    description: Unique identifier for the organization
    tests:
    - not_null
  - name: encrypted_domain
    description: Hashed or encrypted domain name associated with organization
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a hashed or encrypted domain name associated
        with an organization. For this table, each row represents a link between an
        organization and a domain name. The encrypted_domain appears to be unique
        across rows in the given sample, and it's reasonable to assume that each domain
        name would have a unique encrypted value.

stg_organization_tag_data (first 100 rows)

organization_id tag_hash
0 370364 63ad9d34f3503826e5f649ae6b7ac92c
1 370401 63ad9d34f3503826e5f649ae6b7ac92c
2 370297 63ad9d34f3503826e5f649ae6b7ac92c
3 370321 63ad9d34f3503826e5f649ae6b7ac92c
4 370354 63ad9d34f3503826e5f649ae6b7ac92c
5 370334 63ad9d34f3503826e5f649ae6b7ac92c
6 370319 63ad9d34f3503826e5f649ae6b7ac92c
7 370298 63ad9d34f3503826e5f649ae6b7ac92c

stg_organization_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"organization_tag_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "organization_id",
        "tag"
    FROM "organization_tag_data"
),

"organization_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- tag -> tag_hash
    SELECT 
        "organization_id",
        "tag" AS "tag_hash"
    FROM "organization_tag_data_projected"
),

"organization_tag_data_projected_renamed_dedup" AS (
    -- Deduplication: Removed 4 duplicated rows
    SELECT DISTINCT * FROM "organization_tag_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "organization_tag_data_projected_renamed_dedup"

stg_organization_tag_data.yml (Document the table)

version: 2
models:
- name: stg_organization_tag_data
  description: The table represents a many-to-many relationship between organizations
    and tags. Each row links an organization (identified by organization_id) to a
    tag (represented by a hash-like string). Organizations can have multiple tags,
    and tags can be associated with multiple organizations. The table allows for tracking
    which tags are applied to which organizations.
  columns:
  - name: organization_id
    description: Unique identifier for an organization
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for an organization.
        For this table, each row links an organization to a tag. The organization_id
        appears to be unique across rows in the sample data, and it's described as
        a "Unique identifier for an organization" in the provided information.
  - name: tag_hash
    description: Hash-like string representing a tag
    tests:
    - not_null

stg_ticket_tag_data (first 100 rows)

tag_name ticket_id
0 customer 123
1 customer 455
2 prospect 123
3 prospect 455
4 customer 1012
5 connector 789
6 prospect 1011

stg_ticket_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_tag_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "tag",
        "ticket_id"
    FROM "ticket_tag_data"
),

"ticket_tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- tag -> tag_name
    SELECT 
        "tag" AS "tag_name",
        "ticket_id"
    FROM "ticket_tag_data_projected"
),

"ticket_tag_data_projected_renamed_dedup" AS (
    -- Deduplication: Removed 2 duplicated rows
    SELECT DISTINCT * FROM "ticket_tag_data_projected_renamed"
),

"ticket_tag_data_projected_renamed_dedup_casted" AS (
    -- Column Type Casting: 
    -- ticket_id: from INT to VARCHAR
    SELECT
        "tag_name",
        CAST("ticket_id" AS VARCHAR) AS "ticket_id"
    FROM "ticket_tag_data_projected_renamed_dedup"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_tag_data_projected_renamed_dedup_casted"

stg_ticket_tag_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_tag_data
  description: The table represents the relationship between tags and tickets. Each
    ticket can have multiple tags. Tags can be applied to multiple tickets. The table
    stores tag names and ticket IDs. It allows for categorizing or labeling tickets
    with various attributes like "customer" or "prospect".
  columns:
  - name: tag_name
    description: Represents the label or category assigned to a ticket
    tests:
    - not_null
    - accepted_values:
        values:
        - customer
        - prospect
        - connector
        - lead
        - partner
        - vendor
        - employee
        - investor
        - affiliate
        - reseller
        - distributor
        - supplier
        - competitor
        - media
        - government
        - non-profit
        - other
  - name: ticket_id
    description: Unique identifier for each support ticket
    tests:
    - not_null

stg_ticket_schedule_data (first 100 rows)

ticket_id schedule_id created_at
0 123 3600 2020-02-04 15:26:26
1 76900 3600 2019-05-29 15:32:24
2 77569 3600 2019-06-09 09:42:37
3 79336 3600 2019-06-20 19:53:39
4 79616 3600 2019-07-11 23:03:43
5 79972 3600 2019-07-16 20:11:01
6 90640 3600 2019-07-27 15:49:32
7 93969 3600 2019-09-19 02:16:54
8 94490 3600 2019-09-25 21:43:09
9 94490 3600 2019-09-03 13:57:05

stg_ticket_schedule_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_schedule_data_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "created_at",
        "ticket_id",
        "schedule_id"
    FROM "ticket_schedule_data"
),

"ticket_schedule_data_projected_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    SELECT
        "ticket_id",
        "schedule_id",
        CAST("created_at" AS TIMESTAMP) AS "created_at"
    FROM "ticket_schedule_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_schedule_data_projected_casted"

stg_ticket_schedule_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_schedule_data
  description: 'The table represents a relation between tickets and schedules. It
    contains three main pieces of information: the creation timestamp, the ticket
    ID, and the schedule ID. Each row links a specific ticket to a particular schedule,
    indicating when the ticket was created and associated with that schedule. The
    consistent schedule ID (3600) suggests all tickets in the sample are linked to
    the same schedule.'
  columns:
  - name: ticket_id
    description: Unique identifier for each ticket
    tests:
    - not_null
  - name: schedule_id
    description: Identifier for the associated schedule
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the ticket was created
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the timestamp when a ticket was created.
        For this table, each row represents a unique ticket associated with a schedule.
        The created_at timestamp is likely to be unique across rows, as it's unlikely
        that two tickets would be created at the exact same second. However, in theory,
        it's possible for two tickets to have the same creation timestamp if they
        were created simultaneously or if the system doesn't have millisecond precision.

stg_schedule_data (first 100 rows)

shift_end_local shift_start_local is_deleted shift_end_utc team_name shift_start_utc time_zone created_at id
0 8400 7680 False 8820 SupportCA 8100 Pacific Time (US & Canada) 2019-11-18 20:23:34 360000310393
1 5520 4800 False 5940 SupportCA 5220 Pacific Time (US & Canada) 2019-11-18 20:23:34 360000310393
2 6960 6240 False 7380 SupportCA 6660 Pacific Time (US & Canada) 2019-11-18 20:23:34 360000310393
3 2640 1920 False 3060 SupportCA 2340 Pacific Time (US & Canada) 2019-11-18 20:23:34 360000310393
4 4080 3360 False 4500 SupportCA 3780 Pacific Time (US & Canada) 2019-11-18 20:23:34 360000310393

stg_schedule_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"schedule_data_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "end_time",
        "id",
        "start_time",
        "_fivetran_deleted",
        "end_time_utc",
        "name",
        "start_time_utc",
        "time_zone",
        "created_at"
    FROM "schedule_data"
),

"schedule_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- end_time -> shift_end_local
    -- start_time -> shift_start_local
    -- _fivetran_deleted -> is_deleted
    -- end_time_utc -> shift_end_utc
    -- name -> team_name
    -- start_time_utc -> shift_start_utc
    SELECT 
        "end_time" AS "shift_end_local",
        "id",
        "start_time" AS "shift_start_local",
        "_fivetran_deleted" AS "is_deleted",
        "end_time_utc" AS "shift_end_utc",
        "name" AS "team_name",
        "start_time_utc" AS "shift_start_utc",
        "time_zone",
        "created_at"
    FROM "schedule_data_projected"
),

"schedule_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- id: from INT to VARCHAR
    SELECT
        "shift_end_local",
        "shift_start_local",
        "is_deleted",
        "shift_end_utc",
        "team_name",
        "shift_start_utc",
        "time_zone",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("id" AS VARCHAR) AS "id"
    FROM "schedule_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "schedule_data_projected_renamed_casted"

stg_schedule_data.yml (Document the table)

version: 2
models:
- name: stg_schedule_data
  description: The table is about support team schedules. It contains shift details
    like start and end times, both in local and UTC formats. Each record has a unique
    ID, team name (SupportCA), time zone (Pacific Time), and creation timestamp. The
    table tracks multiple shifts for the same team, likely representing different
    work periods throughout the day or week.
  columns:
  - name: shift_end_local
    description: Local end time of the shift in minutes
    tests:
    - not_null
  - name: shift_start_local
    description: Local start time of the shift in minutes
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: shift_end_utc
    description: UTC end time of the shift in minutes
    tests:
    - not_null
  - name: team_name
    description: Name of the support team
    tests:
    - not_null
  - name: shift_start_utc
    description: UTC start time of the shift in minutes
    tests:
    - not_null
  - name: time_zone
    description: Time zone of the support team
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the record was created
    tests:
    - not_null
  - name: id
    description: Unique identifier for the record
    tests:
    - not_null

stg_ticket_form_history_data (first 100 rows)

form_id is_deleted is_active form_identifier is_visible_to_end_user form_name created_at updated_at
0 360200 False False 2be0ef4b34cc85f6e3582b3bef65cd4e True 2be0ef4b34cc85f6e3582b3bef65cd4e 2019-11-07 23:14:54 2019-12-18 18:52:02
1 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-01-20 01:27:51
2 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-09-14 22:45:49
3 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2021-02-11 22:42:12
4 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-02-10 03:34:33
5 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-06-02 14:44:52
6 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2021-02-02 17:33:15
7 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-02-11 01:43:20
8 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2021-02-04 16:29:37
9 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-02-15 17:13:29
10 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-03-05 02:29:38
11 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-12-28 16:17:21
12 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-02-05 20:27:15
13 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56 2020-06-01 07:57:13
14 360205 False True 41f1a810ab161e4c3ee35e1afec238dd True 41f1a810ab161e4c3ee35e1afec238dd 2019-11-13 22:12:51 2020-12-23 17:11:16

stg_ticket_form_history_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"ticket_form_history_data_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "updated_at",
        "_fivetran_deleted",
        "active",
        "created_at",
        "display_name",
        "end_user_visible",
        "name"
    FROM "ticket_form_history_data"
),

"ticket_form_history_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> form_id
    -- _fivetran_deleted -> is_deleted
    -- active -> is_active
    -- display_name -> form_identifier
    -- end_user_visible -> is_visible_to_end_user
    -- name -> form_name
    SELECT 
        "id" AS "form_id",
        "updated_at",
        "_fivetran_deleted" AS "is_deleted",
        "active" AS "is_active",
        "created_at",
        "display_name" AS "form_identifier",
        "end_user_visible" AS "is_visible_to_end_user",
        "name" AS "form_name"
    FROM "ticket_form_history_data_projected"
),

"ticket_form_history_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- updated_at: from VARCHAR to TIMESTAMP
    SELECT
        "form_id",
        "is_deleted",
        "is_active",
        "form_identifier",
        "is_visible_to_end_user",
        "form_name",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at"
    FROM "ticket_form_history_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "ticket_form_history_data_projected_renamed_casted"

stg_ticket_form_history_data.yml (Document the table)

version: 2
models:
- name: stg_ticket_form_history_data
  description: The table is about ticket form history data. It contains details of
    ticket forms including their ID, creation date, update date, active status, display
    name, and visibility to end users. Each row represents a version or update of
    a ticket form, with multiple entries for the same ID showing its change history
    over time.
  columns:
  - name: form_id
    description: Unique numeric identifier for the ticket form
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record was deleted in Fivetran
    tests:
    - not_null
  - name: is_active
    description: Indicates if the ticket form is currently active
    tests:
    - not_null
  - name: form_identifier
    description: Unique identifier or name for the ticket form
    tests:
    - not_null
  - name: is_visible_to_end_user
    description: Indicates if the form is visible to end users
    tests:
    - not_null
  - name: form_name
    description: Name or identifier of the ticket form
    tests:
    - not_null
  - name: created_at
    description: Timestamp of when the ticket form was created
    tests:
    - not_null
  - name: updated_at
    description: Timestamp of when the ticket form was last updated
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the timestamp when the ticket form was last
        updated. For this table, each row represents a version or update of a ticket
        form. The updated_at column is unique across rows, as it captures the exact
        moment of each update, which is unlikely to be exactly the same for different
        updates.
Some tables log change events, which may be redundant to query. Instead, we take a snapshot of the latest.

snapshot_ticket_field_history_data (first 100 rows)

changed_field ticket_id new_value
0 status 107 solved
1 status 141 solved
2 status 229 solved
3 status 1192 solved
4 status 6964 solved
5 status 102 solved
6 status 974 solved
7 priority 980 solved
8 status 103 solved
9 status 982 solved
10 status 226 solved
11 status 8205 solved
12 status 95 solved
13 status 11071 solved
14 assignee_id 108 1111
15 status 312 solved
16 status 11016 solved

snapshot_ticket_field_history_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "ticket_id", "changed_field"
-- Effective date columns are "update_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "changed_field",
    "ticket_id",
    "new_value"
FROM (
     SELECT 
            "changed_field",
            "ticket_id",
            "new_value",
            ROW_NUMBER() OVER (
                PARTITION BY "ticket_id", "changed_field" 
                ORDER BY "update_timestamp" 
            DESC) AS "cocoon_rn"
    FROM "stg_ticket_field_history_data"
) ranked
WHERE "cocoon_rn" = 1

snapshot_ticket_field_history_data.yml (Document the table)

version: 2
models:
- name: snapshot_ticket_field_history_data
  description: The table is about the current state of ticket fields. It tracks the
    most recent version of each ticket's field values. Each row represents the latest
    change for a specific field of a ticket. The table includes the ticket ID, changed
    field name, and the most up-to-date value for that field. This snapshot provides
    a quick overview of the current status of all tickets in the system.
  columns:
  - name: changed_field
    description: Name of the ticket field that was changed
    tests:
    - not_null
    - accepted_values:
        values:
        - status
        - assignee_id
        - priority
        - subject
        - description
        - type
        - due_date
        - tags
        - requester_id
        - group_id
        - cc_emails
        - custom_fields
  - name: ticket_id
    description: Unique identifier for the ticket
    tests:
    - not_null
  - name: new_value
    description: New value of the changed field
    tests:
    - not_null
    - accepted_values:
        values:
        - solved
        - unsolved
        - '1111'
cocoon_meta:
  scd_base_table: stg_ticket_field_history_data

snapshot_ticket_form_history_data (first 100 rows)

form_id is_deleted is_active form_identifier is_visible_to_end_user form_name created_at
0 360205 False True 41f1a810ab161e4c3ee35e1afec238dd True 41f1a810ab161e4c3ee35e1afec238dd 2019-11-13 22:12:51
1 360200 False False 2be0ef4b34cc85f6e3582b3bef65cd4e True 2be0ef4b34cc85f6e3582b3bef65cd4e 2019-11-07 23:14:54
2 360204 False True 1fa61d55d8504dbd25f92f31a9326b83 True 1fa61d55d8504dbd25f92f31a9326b83 2019-11-13 22:40:56

snapshot_ticket_form_history_data.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "form_id"
-- Effective date columns are "updated_at"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "form_id",
    "is_deleted",
    "is_active",
    "form_identifier",
    "is_visible_to_end_user",
    "form_name",
    "created_at"
FROM (
     SELECT 
            "form_id",
            "is_deleted",
            "is_active",
            "form_identifier",
            "is_visible_to_end_user",
            "form_name",
            "created_at",
            ROW_NUMBER() OVER (
                PARTITION BY "form_id" 
                ORDER BY "updated_at" 
            DESC) AS "cocoon_rn"
    FROM "stg_ticket_form_history_data"
) ranked
WHERE "cocoon_rn" = 1

snapshot_ticket_form_history_data.yml (Document the table)

version: 2
models:
- name: snapshot_ticket_form_history_data
  description: The table is about current ticket form data. It tracks the most recent
    version of each ticket form, including its ID, active status, identifier, visibility
    to end users, and name. Each row represents a unique ticket form with its latest
    information. The table excludes historical versions and update timestamps, focusing
    on the current state of each form.
  columns:
  - name: form_id
    description: Unique numeric identifier for the ticket form
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: is_deleted
    description: Indicates if the record was deleted in Fivetran
    tests:
    - not_null
  - name: is_active
    description: Indicates if the ticket form is currently active
    tests:
    - not_null
  - name: form_identifier
    description: Unique identifier or name for the ticket form
    tests:
    - not_null
  - name: is_visible_to_end_user
    description: Indicates if the form is visible to end users
    tests:
    - not_null
  - name: form_name
    description: Name or identifier of the ticket form
    tests:
    - not_null
  - name: created_at
    description: Timestamp of when the ticket form was created
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_ticket_form_history_data
We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_ticket_data stg_ticket_data stg_brand_data stg_brand_data stg_ticket_data->stg_brand_data stg_group_data stg_group_data stg_ticket_data->stg_group_data stg_organization_data stg_organization_data stg_ticket_data->stg_organization_data snapshot_ticket_form_history_data snapshot_ticket_form_history_data stg_ticket_data->snapshot_ticket_form_history_data stg_user_data stg_user_data stg_ticket_data->stg_user_data stg_ticket_data->stg_user_data stg_ticket_data->stg_user_data stg_ticket_comment_data stg_ticket_comment_data stg_ticket_comment_data->stg_ticket_data stg_ticket_comment_data->stg_user_data stg_domain_name_data stg_domain_name_data stg_domain_name_data->stg_organization_data stg_ticket_tag_data stg_ticket_tag_data stg_ticket_tag_data->stg_ticket_data stg_organization_tag_data stg_organization_tag_data stg_organization_tag_data->stg_organization_data stg_schedule_data stg_schedule_data stg_schedule_holiday_data stg_schedule_holiday_data stg_schedule_holiday_data->stg_schedule_data stg_daylight_time_data stg_daylight_time_data stg_time_zone_data stg_time_zone_data stg_daylight_time_data->stg_time_zone_data stg_user_data->stg_organization_data stg_ticket_schedule_data stg_ticket_schedule_data stg_ticket_schedule_data->stg_ticket_data stg_ticket_schedule_data->stg_schedule_data snapshot_ticket_field_history_data snapshot_ticket_field_history_data snapshot_ticket_field_history_data->stg_ticket_data stg_user_tag_data stg_user_tag_data stg_user_tag_data->stg_user_data

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_brand_data
  primary_key: brand_id
  foreign_keys: []
- table_name: stg_ticket_data
  foreign_keys:
  - column: brand_id
    reference:
      table_name: stg_brand_data
      column: brand_id
  - column: group_id
    reference:
      table_name: stg_group_data
      column: group_id
  - column: organization_id
    reference:
      table_name: stg_organization_data
      column: organization_id
  - column: assignee_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: requester_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: submitter_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: ticket_form_id
    reference:
      table_name: snapshot_ticket_form_history_data
      column: form_id
  primary_key: ticket_id
- table_name: stg_group_data
  primary_key: group_id
  foreign_keys: []
- table_name: stg_organization_data
  primary_key: organization_id
  foreign_keys: []
- table_name: stg_domain_name_data
  foreign_keys:
  - column: organization_id
    reference:
      table_name: stg_organization_data
      column: organization_id
- table_name: stg_organization_tag_data
  foreign_keys:
  - column: organization_id
    reference:
      table_name: stg_organization_data
      column: organization_id
- table_name: stg_user_data
  foreign_keys:
  - column: organization_id
    reference:
      table_name: stg_organization_data
      column: organization_id
  primary_key: user_id
- table_name: stg_schedule_data
  primary_key: id
  foreign_keys: []
- table_name: stg_schedule_holiday_data
  foreign_keys:
  - column: schedule_id
    reference:
      table_name: stg_schedule_data
      column: id
- table_name: stg_ticket_schedule_data
  foreign_keys:
  - column: schedule_id
    reference:
      table_name: stg_schedule_data
      column: id
  - column: ticket_id
    reference:
      table_name: stg_ticket_data
      column: ticket_id
- table_name: stg_ticket_comment_data
  foreign_keys:
  - column: ticket_id
    reference:
      table_name: stg_ticket_data
      column: ticket_id
  - column: user_id
    reference:
      table_name: stg_user_data
      column: user_id
- table_name: stg_ticket_tag_data
  foreign_keys:
  - column: ticket_id
    reference:
      table_name: stg_ticket_data
      column: ticket_id
- table_name: snapshot_ticket_field_history_data
  foreign_keys:
  - column: ticket_id
    reference:
      table_name: stg_ticket_data
      column: ticket_id
- table_name: stg_time_zone_data
  primary_key: location
  foreign_keys: []
- table_name: stg_daylight_time_data
  foreign_keys:
  - column: time_zone
    reference:
      table_name: stg_time_zone_data
      column: location
- table_name: stg_user_tag_data
  foreign_keys:
  - column: user_id
    reference:
      table_name: stg_user_data
      column: user_id
- table_name: snapshot_ticket_form_history_data
  primary_key: form_id
  foreign_keys: []
We identify the entities and relationships behind the tables, and tell the story among these relationships.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Brands
  entity_description: Represents different brands with their associated details and
    attributes.
  table_name: stg_brand_data
  primary_key: brand_id
- entity_name: Support Groups
  entity_description: Represents different support groups or teams within a customer
    support system.
  table_name: stg_group_data
  primary_key: group_id
- entity_name: Organizations
  entity_description: Represents client or customer organizations with their basic
    details and attributes.
  table_name: stg_organization_data
  primary_key: organization_id
- entity_name: Support Schedules
  entity_description: Represents work schedules or shifts for support teams.
  table_name: stg_schedule_data
  primary_key: id
- entity_name: Support Tickets
  entity_description: Represents individual customer support tickets with their details
    and status.
  table_name: stg_ticket_data
  primary_key: ticket_id
- entity_name: Time Zones
  entity_description: Represents different time zones with their standard offsets
    from UTC.
  table_name: stg_time_zone_data
  primary_key: location
- entity_name: Users
  entity_description: Represents individual users of the system with their account
    details and preferences.
  table_name: stg_user_data
  primary_key: user_id
- entity_name: Ticket Forms
  entity_description: Represents the current state of different ticket forms used
    in the support system.
  table_name: snapshot_ticket_form_history_data
  primary_key: form_id
relations:
- relation_name: SupportTicketManagement
  relation_description: Support Tickets are submitted by Users through Ticket Forms,
    assigned to Support Groups, managed under Brands, and associated with Organizations.
  table_name: stg_ticket_data
  entities:
  - Support Tickets
  - Brands
  - Support Groups
  - Organizations
  - Users
  - Ticket Forms
- relation_name: UserOrganizationMembership
  relation_description: Users belong to Organizations, with each user potentially
    associated with one organization through the organization_id field.
  table_name: stg_user_data
  entities:
  - Users
  - Organizations
- relation_description: This table associates Organizations with their encrypted domain
    names, where each Organization can have one or multiple domain names.
  table_name: stg_domain_name_data
  entities:
  - Organizations
- relation_description: This table associates Organizations with Tags, representing
    which tags are applied to each organization in a many-to-many relationship.
  table_name: stg_organization_tag_data
  entities:
  - Organizations
- relation_description: This table stores holiday periods within Support Schedules,
    defining specific dates when support services may be unavailable or modified.
  table_name: stg_schedule_holiday_data
  entities:
  - Support Schedules
- relation_name: TicketScheduleAssignment
  relation_description: This associates Support Tickets with specific Support Schedules,
    indicating when each ticket was created and assigned to a schedule.
  table_name: stg_ticket_schedule_data
  entities:
  - Support Schedules
  - Support Tickets
- relation_name: UserTicketComments
  relation_description: This captures the comments made by Users on Support Tickets,
    providing detailed information about each comment.
  table_name: stg_ticket_comment_data
  entities:
  - Support Tickets
  - Users
- relation_description: This table associates Support Tickets with multiple tags,
    allowing for categorization and labeling of tickets with various attributes.
  table_name: stg_ticket_tag_data
  entities:
  - Support Tickets
- relation_description: This table tracks the current state of Support Tickets by
    storing the most recent changes to their field values.
  table_name: snapshot_ticket_field_history_data
  entities:
  - Support Tickets
- relation_description: This stores daylight saving time information for various Time
    Zones across different years, including start and end times and offset hours.
  table_name: stg_daylight_time_data
  entities:
  - Time Zones
- relation_description: This table represents the association between Users and their
    assigned tags, allowing multiple tags per user and multiple users per tag.
  table_name: stg_user_tag_data
  entities:
  - Users
story:
- relation_name: UserOrganizationMembership
  story_line: Users register with organizations for support access.
- relation_name: SupportTicketManagement
  story_line: Users submit tickets through forms to specific brands.
- relation_name: TicketScheduleAssignment
  story_line: Support groups assign tickets to appropriate schedules.
- relation_name: UserTicketComments
  story_line: Users and support staff exchange comments on tickets.