Fivetran Asana

We display the source tables from the data warehouses to model.

project_data (first 100 rows)

id _fivetran_deleted _fivetran_synced archived color created_at current_status due_date modified_at name notes owner_id public_ team_id workspace_id
0 1142815262788834 False 2021-03-22 14:24:47 False light-purple 2019-10-02 16:02:54 None None 2020-02-27 17:26:31 75f4eda5755b226c07bdd5ef96624733 None NaN NaN 1109585467162456 2104505001950
1 339036947888917 False 2021-03-22 14:24:31 False dark-green 2019-09-18 17:03:10 None None 2020-02-27 17:25:56 3a704e169290367034b7a33b86a6c894 None NaN NaN 1109585467162456 2104505001950
2 1135568818569273 False 2021-03-22 14:23:20 False light-blue 2019-08-14 4:26:19 None None 2020-02-27 17:26:24 35a0789351b1e8bea1efce4b7e1b6d58 f2872ac60b5baf3040b01ae77f699889 NaN NaN 1109585467162456 2104505001950
3 339036947888916 False 2021-03-22 14:24:21 False light-purple 2019-09-13 18:17:47 None None 2019-09-21 22:37:42 5c7fac1b9516ef9ff51cbc30a6292a80 None 2.104505e+12 NaN 1130944552712711 2104505001950
4 1123411555652103 False 2021-03-22 14:21:52 False light-purple 2019-05-16 16:15:08 None None 2019-05-16 16:15:10 d3d4421c105c43597ec6bc36c21c7a05 None 2.349446e+12 NaN 2527546068509 2104505001950
5 994480050688368 False 2021-03-22 14:21:46 False None 2019-01-17 18:17:50 None None 2019-09-21 22:27:55 ea50dd49e3f04fc0b08874465c0568ed None 2.349446e+12 NaN 2527546068509 2104505001950
6 339036947888918 False 2021-03-22 14:21:26 False None 2017-05-10 18:58:18 None None 2019-09-21 22:27:28 44426ab2d0738bb56fec785a43de6e62 None 2.349446e+12 NaN 2527546068509 2104505001950
7 339014487764325 False 2021-03-22 14:21:25 False None 2017-05-10 17:07:21 None None 2019-09-21 22:27:08 72d7ec4b0127c0b11876d57b054b1201 None 2.349446e+12 NaN 2527546068509 2104505001950
8 339036947888913 False 2021-03-22 14:21:25 False None 2017-05-10 18:30:46 None None 2019-09-21 22:27:17 780efbde414f524f93e38332db3cc32d None 2.349446e+12 NaN 2527546068509 2104505001950
9 339282757803825 False 2021-03-22 14:21:27 False None 2017-05-10 22:25:19 None None 2019-09-21 22:27:21 f32c3edaacea72c0ddb30ecf0135c4de None 2.349446e+12 NaN 2527546068509 2104505001950
10 338932209783787 False 2021-03-22 14:21:24 False None 2017-05-10 16:44:24 None None 2019-09-21 22:27:15 f5e81e2cb4be3ae9d39fffc68d8af82c None 2.349446e+12 NaN 2527546068509 2104505001950
11 1119233841877068 False 2021-03-22 14:21:51 False light-blue 2019-04-18 14:25:19 None None 2019-04-18 14:25:21 df021e8d790008ba29b535b898402cd0 None 2.349446e+12 NaN 2527546068509 2104505001950
12 1103976536247089 False 2021-03-22 14:21:50 False None 2019-01-28 6:14:18 None None 2019-09-21 22:27:55 678884011246bebab096ff8b65f1b541 None 2.349446e+12 NaN 2527546068509 2104505001950
13 1133789517942526 False 2021-03-22 14:22:37 False dark-pink 2019-07-31 20:35:29 None 2019-09-11 0:00:00 2020-02-27 17:26:23 d704501c4aadd255aa9245d2c926529d d41d8cd98f00b204e9800998ecf8427e 3.389842e+14 NaN 1109585467162456 2104505001950
14 1127687633259843 False 2021-03-22 14:21:58 False light-warm-gray 2019-06-19 5:04:03 None None 2020-02-27 17:26:22 d704501c4aadd255aa9245d2c926529d None 3.389842e+14 NaN 1109585467162456 2104505001950
15 1138337102495418 False 2021-03-22 14:24:06 False light-blue 2019-09-04 0:06:05 752e3db7344085cd1bdb19f63d241ca8 None 2020-02-27 17:26:29 d704501c4aadd255aa9245d2c926529d ccf334d79abdfd11da2d0cba23fb1108 3.389842e+14 NaN 1109585467162456 2104505001950

project_task_data (first 100 rows)

project_id task_id _fivetran_synced
0 1145621382672906 1199893417793941 2021-02-04 02:24:35
1 1134396676363546 1199893417793941 2021-02-04 02:24:35

section_data (first 100 rows)

id _fivetran_synced created_at name project_id
0 1140999307201834 2021-03-22 14:21:03 2019-09-21 22:25:26 05a821cdf4370ff1dd2fc1d4465fb23e 2417304153254
1 1140999305179192 2021-03-22 14:21:03 2019-09-21 22:25:04 453486c410b442f5df1143816d2fa52f 2417304153254
2 1140999422113332 2021-03-22 14:21:04 2019-09-21 22:25:04 453486c410b442f5df1143816d2fa52f 2693467581098
3 1140999422891922 2021-03-22 14:21:06 2019-09-21 22:25:14 453486c410b442f5df1143816d2fa52f 2724895544714
4 1140999306720518 2021-03-22 14:21:08 2019-09-21 22:25:21 453486c410b442f5df1143816d2fa52f 3931050316709
5 1140999423455392 2021-03-22 14:21:11 2019-09-21 22:25:21 453486c410b442f5df1143816d2fa52f 3944161162427
6 1140999424217513 2021-03-22 14:21:11 2019-09-21 22:25:27 453486c410b442f5df1143816d2fa52f 4021281497638
7 1140999307427051 2021-03-22 14:21:14 2019-09-21 22:25:31 453486c410b442f5df1143816d2fa52f 4388687446288
8 1140999307646897 2021-03-22 14:21:14 2019-09-21 22:25:36 98f770b0af18ca763421bac22b4b6805 4388687446288
9 1140999307646896 2021-03-22 14:21:15 2019-09-21 22:25:35 453486c410b442f5df1143816d2fa52f 4404698904438
10 1140999307926262 2021-03-22 14:21:16 2019-09-21 22:25:39 453486c410b442f5df1143816d2fa52f 4428993261541
11 1140999428608188 2021-03-22 14:21:18 2019-09-21 22:26:19 23ce0eb7d180f3dc8391d4af48572d21 28303577354184
12 1140999427143974 2021-03-22 14:21:18 2019-09-21 22:26:02 453486c410b442f5df1143816d2fa52f 28303577354184

story_data (first 100 rows)

id _fivetran_synced created_at created_by_id hearted num_hearts source target_id text type
0 1199886380677330 2021-02-04 02:24:36 2021-02-04 00:22:02 1137799670337705 NaN 0 web 1199893417793941 e4d11d32473e2f36be6c080f2d4ba59b system
1 1199886372611183 2021-02-04 02:24:36 2021-02-04 00:11:27 1137799670337705 NaN 0 web 1199893417793941 1d2d2a1e18edac3a1d519b7df24ec16e system
2 1199886381437255 2021-02-04 02:24:36 2021-02-04 00:22:15 1137799670337705 NaN 0 web 1199893417793941 533b581ae821186aa349ca7cd8f19179 system

tag_data (first 100 rows)

id _fivetran_deleted _fivetran_synced color created_at message name notes workspace_id
0 163915846063865 False 2021-03-22 14:20:57 NaN 2016-08-04 17:26:14 NaN 54ce99fa85c92b1d87678436e956a2e8 NaN 2104505001950
1 155711202630775 False 2021-03-22 14:20:57 NaN 2016-07-14 23:52:42 NaN 5b6cf869265c13af8566f192b4ab3d2a NaN 2104505001950
2 205334637062213 False 2021-03-22 14:20:57 NaN 2016-11-01 15:28:12 NaN e7cc8f3f3aef4a5a55daa94bebb1b577 NaN 2104505001950
3 155711202630769 False 2021-03-22 14:20:57 NaN 2016-07-14 23:52:14 NaN 221d2a347ebfc15d48665188f639d1cb NaN 2104505001950
4 155711202630772 False 2021-03-22 14:20:57 NaN 2016-07-14 23:52:36 NaN 7cb15e416d62919b1b40298324fbe30b NaN 2104505001950
5 155711202630764 False 2021-03-22 14:20:57 NaN 2016-07-14 23:51:23 NaN 87f9f735a1d36793ceaecd4e47124b63 NaN 2104505001950
6 155711202630767 False 2021-03-22 14:20:57 NaN 2016-07-14 23:51:31 NaN 24f47cdbe9ddba774a7cc53e51d9032e NaN 2104505001950
7 87008074978192 False 2021-03-22 14:20:57 NaN 2016-02-03 18:36:17 NaN 11ff9f68afb6b8b5b8eda218d7c83a65 NaN 2104505001950
8 155711202630759 False 2021-03-22 14:20:57 NaN 2016-07-14 23:50:37 NaN e01aecb528730f3bfe10f9d57f1317bf NaN 2104505001950
9 155711202630747 False 2021-03-22 14:20:57 NaN 2016-07-14 23:49:45 NaN 21021ea0e52be8e9c599f4dff41e5be0 NaN 2104505001950
10 155711202630754 False 2021-03-22 14:20:57 NaN 2016-07-14 23:50:10 NaN bf78fa4160a6ed81e32eb6964d297cf9 NaN 2104505001950
11 155711202630745 False 2021-03-22 14:20:57 NaN 2016-07-14 23:49:31 NaN 8c6384dda98d8864d00e29dc96179970 NaN 2104505001950
12 155711202630751 False 2021-03-22 14:20:57 NaN 2016-07-14 23:50:03 NaN cc8d68c551c4a9a6d5313e07de4deafd NaN 2104505001950
13 1143821191383339 False 2021-03-22 14:20:57 NaN 2019-10-08 20:56:09 NaN 464fe4ea0557b9c34b4ef33c135f6a08 NaN 2104505001950
14 148696288165443 False 2021-03-22 14:20:57 NaN 2016-06-27 17:44:23 NaN da50c2fb0cbcdf1b923b0ca88e4d49ee NaN 2104505001950
15 146630888383184 False 2021-03-22 14:20:57 NaN 2016-06-21 19:34:34 NaN 2ceea82aab3ac5bc623e79186b389481 NaN 2104505001950
16 148289869797755 False 2021-03-22 14:20:57 NaN 2016-06-25 0:54:37 NaN 4007f46c835117c8761ebbaf8dd9969f NaN 2104505001950

task_data (first 100 rows)

id assignee_id completed completed_at completed_by_id created_at due_on due_at modified_at name parent_id start_on notes workspace_id
0 1138074657603006 1126539287026862 True 2019-09-18T16:48:31.176Z NaN 2019-09-02T14:53:02.685Z 2019-09-25T00:00:00Z NaN 2021-12-18T14:01:40.058Z cool name NaN NaN many notes 2104505001950

task_follower_data (first 100 rows)

task_id user_id _fivetran_synced
0 1199893417793941 1137799670337705 2021-02-04 02:24:35

task_section_data (first 100 rows)

section_id task_id _fivetran_synced
0 1145638605769770 1199893417793941 2021-02-04 02:24:35
1 1140999344898860 1199893417793941 2021-02-04 02:24:35

task_tag_data (first 100 rows)

tag_id task_id _fivetran_synced
0 163915846063865 1199893417793941 2021-02-04 02:24:35
1 155711202630775 1199893417793941 2021-02-04 02:24:35
2 205334637062213 1199893417793941 2021-02-04 02:24:35
3 155711202630769 1199893417793941 2021-02-04 02:24:35

team_data (first 100 rows)

id _fivetran_deleted _fivetran_synced name organization_id
0 1175290352849593 False 2021-03-22 14:20:55 c8a88aee2acd5dbf76978f421dd70dee 2104505001950
1 1165006878005385 False 2021-03-22 14:20:55 042627cc066e7a21b9b0b16ddf92a089 2104505001950
2 1175424836114178 False 2021-03-22 14:20:55 b1861e61b41ba3fff4e74757d74c4e90 2104505001950
3 1190726085138177 False 2021-03-22 14:20:55 e545301c9a02e33abe2917f4e38405e5 2104505001950
4 1163857948371041 False 2021-03-22 14:20:55 53e0dbc06e48e3d381ac224fa8bae3df 2104505001950
5 1175424836114180 False 2021-03-22 14:20:55 54dd0f611e06ada1547a2fa4b5ac4e2c 2104505001950
6 1145563915215145 False 2021-03-22 14:20:55 7cb15e416d62919b1b40298324fbe30b 2104505001950
7 1140675397099665 False 2021-03-22 14:20:55 7d10c031056d08b88a3b4fcf25631682 2104505001950
8 1131216624451101 False 2021-03-22 14:20:55 f9d8847b3da721264efcd1aaeebacf13 2104505001950
9 1127125465171855 False 2021-03-22 14:20:55 99992776e80912d4b03c9540a2255c1a 2104505001950
10 1131147572225013 False 2021-03-22 14:20:55 1458f7aba7fa08b3920d2836d95ce518 2104505001950
11 1130779217347436 False 2021-03-22 14:20:55 d98628a5ec525dab008c2262ad6669d4 2104505001950
12 1139142125254369 False 2021-03-22 14:20:55 29d94732ca1696b21b3cdd2df5795dab 2104505001950
13 1130944552712711 False 2021-03-22 14:20:55 31ff7243543d02f6537c4a43a4a18ce5 2104505001950
14 1125199132412749 False 2021-03-22 14:20:55 36cbc41c1c121f2c68f5776a118ea027 2104505001950
15 1126309244892859 False 2021-03-22 14:20:55 cd32106bcb6de321930cf34574ea388c 2104505001950
16 1126866959976249 False 2021-03-22 14:20:55 deb10517653c255364175796ace3553f 2104505001950
17 1126539229462849 False 2021-03-22 14:20:55 5f3c8189bb84d2ea4061e757daa9561f 2104505001950
18 1126687349073941 False 2021-03-22 14:20:55 c482980d384a9d0e7bc39e1140270870 2104505001950
19 1119148247636811 False 2021-03-22 14:20:55 5231dee7156b84b0d1854de1a18dd169 2104505001950
20 2527546068509 False 2021-03-22 14:20:55 a7ace9d7b0d52558c00f7540af56ed57 2104505001950
21 595727761265526 False 2021-03-22 14:20:55 c46c77edb3c31643a9260161249548c5 2104505001950
22 1109585467162456 False 2021-03-22 14:20:55 7cb15e416d62919b1b40298324fbe30b 2104505001950
23 1199557466985079 False 2021-03-22 14:20:55 cd32106bcb6de321930cf34574ea388c 2104505001950
24 1199576475848629 False 2021-03-22 14:20:55 deb10517653c255364175796ace3553f 2104505001950

user_data (first 100 rows)

id _fivetran_deleted _fivetran_synced email name
0 1145566427371773 False 2021-03-22 14:20:55 f8fca099b562801e43e671847ab63793 ba502a3057da84df93c1cee38be20d98
1 1140679468933567 False 2021-03-22 14:20:55 bf70064fcc790acb2640c9ea6658e7bf 784742a66a3a0c271feced5b149ff8db
2 1143584790050961 False 2021-03-22 14:20:55 0e8eb57f064990dd67d18c6457d3b7e4 3c065e105ba1ffa3cbcb06ce6274a0b4
3 1142174241683304 False 2021-03-22 14:20:55 74ea05adc4b1b53d8ed38de392bb6ae8 eea1bd8603b800df2834f07d362a1998
4 1140738529427046 False 2021-03-22 14:20:55 0920cafb10279abbb850ccf42906f584 dc4bc96708073c555c91c4909f3e7da2
5 874533751271912 False 2021-03-22 14:20:55 c2045671bfeec500fa785e46bda796b0 c2045671bfeec500fa785e46bda796b0
6 892002646367883 False 2021-03-22 14:20:55 cc9681199b8d7962c787a49aad79b28a c0639ca053b1211987bbbfd70e944bdc
7 1115060366861453 False 2021-03-22 14:20:55 1d4af9f6d0045aac6abf03629535e875 b31671608b9a4785bf89d02a868391c9
8 1109585316746864 False 2021-03-22 14:20:55 b9adcab5121811121c2c3d0aaaf9b2e8 b6e6868d2c745b987fd6bac86965e201
9 874533751409163 False 2021-03-22 14:20:55 64b62aabe443ca4e937bbc5576179c88 64b62aabe443ca4e937bbc5576179c88
10 892002646367901 False 2021-03-22 14:20:55 368eaa391757666b80b48d45492d5a62 413685a28fcb6d760c825c796a0fcecf
11 1119270776407458 False 2021-03-22 14:20:55 72af537b7a20cdc9125de91f5f6c9052 72af537b7a20cdc9125de91f5f6c9052
12 1116354963772542 False 2021-03-22 14:20:55 396b3a68984e542c218467d3354ce079 ac0df616713dcb9a53f04f106e973d81
13 1139503248172020 False 2021-03-22 14:20:55 5b4d3e849ff258a849b4cd99080097b6 527a7c3ec3b7736b0ebfa0b9e688cc4d
14 1139503169042444 False 2021-03-22 14:20:55 52537af58083acd49f69acaadfa6552c 4b4826feea5d1e05d9596eaa04fe154b
15 1139330627337223 False 2021-03-22 14:20:55 36e84c9f6590a44b2f10df1ecfc3649a a8ddbe83b3c803beff4f04d48f6f5312
16 388477550412326 False 2021-03-22 14:20:55 2be7fdf344cc0e93a63089083b2c3a53 e549f53a2e1eb4b27b65b3a04c315f4e
17 648669515172057 False 2021-03-22 14:20:55 76f8de0b00112335503a7d3734f52f79 00046eaa445242bb910688f9e1893b07
18 350290010811012 False 2021-03-22 14:20:55 46006954dd091c5dab1b6159a5e1524c 9605959095f1e07ba7628a197088bd70
19 719802245514819 False 2021-03-22 14:20:55 008e38cb1023225e4441f47ab5d01c70 a07d50fa47e3cfcad2a166929d680b45
Source tables may have typos, unclear names, incorrect column types, etc. We clean these tables.

stg_user_data (first 100 rows)

is_deleted hashed_email hashed_name user_id
0 False f8fca099b562801e43e671847ab63793 ba502a3057da84df93c1cee38be20d98 1145566427371773
1 False bf70064fcc790acb2640c9ea6658e7bf 784742a66a3a0c271feced5b149ff8db 1140679468933567
2 False 0e8eb57f064990dd67d18c6457d3b7e4 3c065e105ba1ffa3cbcb06ce6274a0b4 1143584790050961
3 False 74ea05adc4b1b53d8ed38de392bb6ae8 eea1bd8603b800df2834f07d362a1998 1142174241683304
4 False 0920cafb10279abbb850ccf42906f584 dc4bc96708073c555c91c4909f3e7da2 1140738529427046
5 False c2045671bfeec500fa785e46bda796b0 c2045671bfeec500fa785e46bda796b0 874533751271912
6 False cc9681199b8d7962c787a49aad79b28a c0639ca053b1211987bbbfd70e944bdc 892002646367883
7 False 1d4af9f6d0045aac6abf03629535e875 b31671608b9a4785bf89d02a868391c9 1115060366861453
8 False b9adcab5121811121c2c3d0aaaf9b2e8 b6e6868d2c745b987fd6bac86965e201 1109585316746864
9 False 64b62aabe443ca4e937bbc5576179c88 64b62aabe443ca4e937bbc5576179c88 874533751409163
10 False 368eaa391757666b80b48d45492d5a62 413685a28fcb6d760c825c796a0fcecf 892002646367901
11 False 72af537b7a20cdc9125de91f5f6c9052 72af537b7a20cdc9125de91f5f6c9052 1119270776407458
12 False 396b3a68984e542c218467d3354ce079 ac0df616713dcb9a53f04f106e973d81 1116354963772542
13 False 5b4d3e849ff258a849b4cd99080097b6 527a7c3ec3b7736b0ebfa0b9e688cc4d 1139503248172020
14 False 52537af58083acd49f69acaadfa6552c 4b4826feea5d1e05d9596eaa04fe154b 1139503169042444
15 False 36e84c9f6590a44b2f10df1ecfc3649a a8ddbe83b3c803beff4f04d48f6f5312 1139330627337223
16 False 2be7fdf344cc0e93a63089083b2c3a53 e549f53a2e1eb4b27b65b3a04c315f4e 388477550412326
17 False 76f8de0b00112335503a7d3734f52f79 00046eaa445242bb910688f9e1893b07 648669515172057
18 False 46006954dd091c5dab1b6159a5e1524c 9605959095f1e07ba7628a197088bd70 350290010811012
19 False 008e38cb1023225e4441f47ab5d01c70 a07d50fa47e3cfcad2a166929d680b45 719802245514819

stg_user_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:05:40.193939+00:00
WITH 
"user_data_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "email",
        "name"
    FROM "memory"."main"."user_data"
),

"user_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> user_id
    -- _fivetran_deleted -> is_deleted
    -- email -> hashed_email
    -- name -> hashed_name
    SELECT 
        "id" AS "user_id",
        "_fivetran_deleted" AS "is_deleted",
        "email" AS "hashed_email",
        "name" AS "hashed_name"
    FROM "user_data_projected"
),

"user_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- user_id: from INT to VARCHAR
    SELECT
        "is_deleted",
        "hashed_email",
        "hashed_name",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "user_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "user_data_projected_renamed_casted"

stg_user_data.yml (Document the table)

version: 2
models:
- name: stg_user_data
  description: The table is about user data. It contains user IDs, email addresses,
    and names. The email and name fields appear to be hashed or encrypted for privacy.
    There is also a "_fivetran_deleted" column, likely indicating if the record has
    been deleted in a data pipeline. The table stores basic user information in a
    secure manner.
  columns:
  - name: is_deleted
    description: Indicates if the record has been deleted
    tests:
    - not_null
  - name: hashed_email
    description: Hashed or encrypted email address of the user
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the hashed or encrypted email address of
        the user. For this table, each row represents a unique user. Email addresses
        are typically unique for each user, so the hashed_email is likely to be unique
        across rows.
  - name: hashed_name
    description: Hashed or encrypted name of the user
    tests:
    - not_null
  - name: user_id
    description: Unique identifier for the user
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for the user. For this
        table, each row represents a unique user. User IDs are specifically designed
        to be unique identifiers, so user_id is guaranteed to be unique across rows.

stg_task_follower_data (first 100 rows)

task_id user_id
0 1199893417793941 1137799670337705

stg_task_follower_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:03:47.783533+00:00
WITH 
"task_follower_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "task_id",
        "user_id"
    FROM "memory"."main"."task_follower_data"
),

"task_follower_data_projected_casted" AS (
    -- Column Type Casting: 
    -- task_id: from INT to VARCHAR
    -- user_id: from INT to VARCHAR
    SELECT
        CAST("task_id" AS VARCHAR) AS "task_id",
        CAST("user_id" AS VARCHAR) AS "user_id"
    FROM "task_follower_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "task_follower_data_projected_casted"

stg_task_follower_data.yml (Document the table)

version: 2
models:
- name: stg_task_follower_data
  description: 'The table represents the follower relation between tasks and users.
    It contains two columns: task_id and user_id. Each row indicates that a specific
    user is following or associated with a particular task. This structure allows
    tracking which users are following or involved with which tasks in a system.'
  columns:
  - name: task_id
    description: Unique identifier for a task
    tests:
    - not_null
  - name: user_id
    description: Unique identifier for a user
    tests:
    - not_null

stg_project_task_data (first 100 rows)

project_id task_id
0 1145621382672906 1199893417793941
1 1134396676363546 1199893417793941

stg_project_task_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 23:59:29.269778+00:00
WITH 
"project_task_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "project_id",
        "task_id"
    FROM "memory"."main"."project_task_data"
),

"project_task_data_projected_casted" AS (
    -- Column Type Casting: 
    -- project_id: from INT to VARCHAR
    -- task_id: from INT to VARCHAR
    SELECT
        CAST("project_id" AS VARCHAR) AS "project_id",
        CAST("task_id" AS VARCHAR) AS "task_id"
    FROM "project_task_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "project_task_data_projected_casted"

stg_project_task_data.yml (Document the table)

version: 2
models:
- name: stg_project_task_data
  description: The table represents a many-to-many relationship between projects and
    tasks. It shows which tasks are associated with which projects. Each row links
    a project (identified by project_id) to a task (identified by task_id). Multiple
    projects can be associated with the same task, and a project can have multiple
    tasks.
  columns:
  - name: project_id
    description: Unique identifier for a project
    tests:
    - not_null
  - name: task_id
    description: Unique identifier for a task
    tests:
    - not_null

stg_story_data (first 100 rows)

heart_count source_platform content entry_type author_id created_at entry_id is_hearted target_id
0 0 web e4d11d32473e2f36be6c080f2d4ba59b system 1137799670337705 2021-02-04 00:22:02 1199886380677330 NaN 1199893417793941
1 0 web 1d2d2a1e18edac3a1d519b7df24ec16e system 1137799670337705 2021-02-04 00:11:27 1199886372611183 NaN 1199893417793941
2 0 web 533b581ae821186aa349ca7cd8f19179 system 1137799670337705 2021-02-04 00:22:15 1199886381437255 NaN 1199893417793941

stg_story_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:01:13.008982+00:00
WITH 
"story_data_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "created_by_id",
        "hearted",
        "num_hearts",
        "source",
        "target_id",
        "text",
        "type"
    FROM "memory"."main"."story_data"
),

"story_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> entry_id
    -- created_by_id -> author_id
    -- hearted -> is_hearted
    -- num_hearts -> heart_count
    -- source -> source_platform
    -- text -> content
    -- type -> entry_type
    SELECT 
        "id" AS "entry_id",
        "created_at",
        "created_by_id" AS "author_id",
        "hearted" AS "is_hearted",
        "num_hearts" AS "heart_count",
        "source" AS "source_platform",
        "target_id",
        "text" AS "content",
        "type" AS "entry_type"
    FROM "story_data_projected"
),

"story_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- author_id: from INT to VARCHAR
    -- created_at: from VARCHAR to TIMESTAMP
    -- entry_id: from INT to VARCHAR
    -- is_hearted: from DECIMAL to BOOLEAN
    -- target_id: from INT to VARCHAR
    SELECT
        "heart_count",
        "source_platform",
        "content",
        "entry_type",
        CAST("author_id" AS VARCHAR) AS "author_id",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("entry_id" AS VARCHAR) AS "entry_id",
        CAST("is_hearted" AS BOOLEAN) AS "is_hearted",
        CAST("target_id" AS VARCHAR) AS "target_id"
    FROM "story_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "story_data_projected_renamed_casted"

stg_story_data.yml (Document the table)

version: 2
models:
- name: stg_story_data
  description: The table is about story or message entries. It contains details like
    unique ID, creation timestamp, creator ID, number of hearts, source platform,
    target ID, text content, and message type. Each row represents a single story/message
    item with its associated metadata and engagement metrics.
  columns:
  - name: heart_count
    description: Number of hearts received by the entry
    tests:
    - not_null
  - name: source_platform
    description: Platform or medium where the entry was created
    tests:
    - not_null
    - accepted_values:
        values:
        - web
        - mobile
        - desktop
        - tablet
        - social media
        - email
        - SMS
        - print
        - television
        - radio
        - podcast
        - video
        - blog
        - forum
        - app
        - API
        - IoT device
        - wearable
        - voice assistant
        - kiosk
        - game console
        - VR/AR
  - name: content
    description: Content of the message or story (possibly encrypted)
    tests:
    - not_null
  - name: entry_type
    description: Classification or category of the entry
    tests:
    - not_null
    - accepted_values:
        values:
        - system
        - user
        - application
        - security
        - network
        - hardware
        - software
        - database
        - file
        - process
        - service
        - device
        - driver
        - kernel
        - memory
        - storage
        - backup
        - recovery
        - performance
        - configuration
        - update
        - error
        - event
        - audit
        - monitoring
  - name: author_id
    description: Unique identifier of the user who created the entry
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the entry was created
    tests:
    - not_null
  - name: entry_id
    description: Unique identifier for the entry
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the entry. For
        this table, where each row represents a single story/message item, the entry_id
        should be unique across all rows, as it's designed to uniquely identify each
        entry.
  - name: is_hearted
    description: Indicates if the entry was hearted (seems unused)
    cocoon_meta:
      missing_acceptable: Empty when the entry has not been hearted.
  - name: target_id
    description: Identifier of the target or related entity
    tests:
    - not_null

stg_project_data (first 100 rows)

project_notes project_color is_archived is_deleted project_name creation_date is_public last_modified_date owner_id project_id team_id workspace_id
0 None light-purple False False 75f4eda5755b226c07bdd5ef96624733 2019-10-02 16:02:54 None 2020-02-27 17:26:31 None 1142815262788834 1109585467162456 2104505001950
1 None dark-green False False 3a704e169290367034b7a33b86a6c894 2019-09-18 17:03:10 None 2020-02-27 17:25:56 None 339036947888917 1109585467162456 2104505001950
2 f2872ac60b5baf3040b01ae77f699889 light-blue False False 35a0789351b1e8bea1efce4b7e1b6d58 2019-08-14 04:26:19 None 2020-02-27 17:26:24 None 1135568818569273 1109585467162456 2104505001950
3 None light-purple False False 5c7fac1b9516ef9ff51cbc30a6292a80 2019-09-13 18:17:47 None 2019-09-21 22:37:42 2104505001945.0 339036947888916 1130944552712711 2104505001950
4 None light-purple False False d3d4421c105c43597ec6bc36c21c7a05 2019-05-16 16:15:08 None 2019-05-16 16:15:10 2349446254454.0 1123411555652103 2527546068509 2104505001950
5 None None False False ea50dd49e3f04fc0b08874465c0568ed 2019-01-17 18:17:50 None 2019-09-21 22:27:55 2349446254454.0 994480050688368 2527546068509 2104505001950
6 None None False False 44426ab2d0738bb56fec785a43de6e62 2017-05-10 18:58:18 None 2019-09-21 22:27:28 2349446254454.0 339036947888918 2527546068509 2104505001950
7 None None False False 72d7ec4b0127c0b11876d57b054b1201 2017-05-10 17:07:21 None 2019-09-21 22:27:08 2349446254454.0 339014487764325 2527546068509 2104505001950
8 None None False False 780efbde414f524f93e38332db3cc32d 2017-05-10 18:30:46 None 2019-09-21 22:27:17 2349446254454.0 339036947888913 2527546068509 2104505001950
9 None None False False f32c3edaacea72c0ddb30ecf0135c4de 2017-05-10 22:25:19 None 2019-09-21 22:27:21 2349446254454.0 339282757803825 2527546068509 2104505001950
10 None None False False f5e81e2cb4be3ae9d39fffc68d8af82c 2017-05-10 16:44:24 None 2019-09-21 22:27:15 2349446254454.0 338932209783787 2527546068509 2104505001950
11 None light-blue False False df021e8d790008ba29b535b898402cd0 2019-04-18 14:25:19 None 2019-04-18 14:25:21 2349446254454.0 1119233841877068 2527546068509 2104505001950
12 None None False False 678884011246bebab096ff8b65f1b541 2019-01-28 06:14:18 None 2019-09-21 22:27:55 2349446254454.0 1103976536247089 2527546068509 2104505001950
13 None dark-pink False False d704501c4aadd255aa9245d2c926529d 2019-07-31 20:35:29 None 2020-02-27 17:26:23 338984235816349.0 1133789517942526 1109585467162456 2104505001950
14 None light-warm-gray False False d704501c4aadd255aa9245d2c926529d 2019-06-19 05:04:03 None 2020-02-27 17:26:22 338984235816349.0 1127687633259843 1109585467162456 2104505001950
15 ccf334d79abdfd11da2d0cba23fb1108 light-blue False False d704501c4aadd255aa9245d2c926529d 2019-09-04 00:06:05 None 2020-02-27 17:26:29 338984235816349.0 1138337102495418 1109585467162456 2104505001950

stg_project_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 23:59:07.456396+00:00
WITH 
"project_data_projected" AS (
    -- Projection: Selecting 14 out of 15 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "archived",
        "color",
        "created_at",
        "current_status",
        "due_date",
        "modified_at",
        "name",
        "notes",
        "owner_id",
        "public_",
        "team_id",
        "workspace_id"
    FROM "memory"."main"."project_data"
),

"project_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> project_id
    -- _fivetran_deleted -> is_deleted
    -- archived -> is_archived
    -- color -> project_color
    -- created_at -> creation_date
    -- current_status -> project_status
    -- due_date -> project_deadline
    -- modified_at -> last_modified_date
    -- name -> project_name
    -- notes -> project_notes
    -- public_ -> is_public
    SELECT 
        "id" AS "project_id",
        "_fivetran_deleted" AS "is_deleted",
        "archived" AS "is_archived",
        "color" AS "project_color",
        "created_at" AS "creation_date",
        "current_status" AS "project_status",
        "due_date" AS "project_deadline",
        "modified_at" AS "last_modified_date",
        "name" AS "project_name",
        "notes" AS "project_notes",
        "owner_id",
        "public_" AS "is_public",
        "team_id",
        "workspace_id"
    FROM "project_data_projected"
),

"project_data_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- project_status: The problem is that the project_status column contains a single value that appears to be a hashed or encoded string ('752e3db7344085cd1bdb19f63d241ca8') rather than a human-readable project status. This value is unusual because it doesn't provide any meaningful information about the status of the project. The correct values for a project status column would typically be descriptive terms like 'In Progress', 'Completed', 'On Hold', etc. Since we don't have any information about what this hash might represent, and there are no other values to compare it to, the safest approach is to map it to an empty string to indicate that the status is unknown or not provided. 
    SELECT
        "project_id",
        "is_deleted",
        "is_archived",
        "project_color",
        "creation_date",
        CASE
            WHEN "project_status" = '''752e3db7344085cd1bdb19f63d241ca8''' THEN ''''
            ELSE "project_status"
        END AS "project_status",
        "project_deadline",
        "last_modified_date",
        "project_name",
        "project_notes",
        "owner_id",
        "is_public",
        "team_id",
        "workspace_id"
    FROM "project_data_projected_renamed"
),

"project_data_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- project_notes: ['d41d8cd98f00b204e9800998ecf8427e']
    SELECT 
        CASE
            WHEN "project_notes" = 'd41d8cd98f00b204e9800998ecf8427e' THEN NULL
            ELSE "project_notes"
        END AS "project_notes",
        "team_id",
        "project_color",
        "project_id",
        "workspace_id",
        "creation_date",
        "owner_id",
        "last_modified_date",
        "is_archived",
        "is_deleted",
        "project_status",
        "project_name",
        "is_public",
        "project_deadline"
    FROM "project_data_projected_renamed_cleaned"
),

"project_data_projected_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- creation_date: from VARCHAR to TIMESTAMP
    -- is_public: from DECIMAL to VARCHAR
    -- last_modified_date: from VARCHAR to TIMESTAMP
    -- owner_id: from DECIMAL to VARCHAR
    -- project_id: from INT to VARCHAR
    -- team_id: from INT to VARCHAR
    -- workspace_id: from INT to VARCHAR
    SELECT
        "project_notes",
        "project_color",
        "is_archived",
        "is_deleted",
        "project_status",
        "project_name",
        "project_deadline",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("is_public" AS VARCHAR) AS "is_public",
        CAST("last_modified_date" AS TIMESTAMP) AS "last_modified_date",
        CAST("owner_id" AS VARCHAR) AS "owner_id",
        CAST("project_id" AS VARCHAR) AS "project_id",
        CAST("team_id" AS VARCHAR) AS "team_id",
        CAST("workspace_id" AS VARCHAR) AS "workspace_id"
    FROM "project_data_projected_renamed_cleaned_null"
),

"project_data_projected_renamed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 5 columns with unacceptable missing values
    -- owner_id has 18.75 percent missing. Strategy: 🔄 Unchanged
    -- project_color has 43.75 percent missing. Strategy: 🔄 Unchanged
    -- project_deadline has 93.75 percent missing. Strategy: 🗑️ Drop Column
    -- project_notes has 87.5 percent missing. Strategy: 🔄 Unchanged
    -- project_status has 93.75 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "project_notes",
        "project_color",
        "is_archived",
        "is_deleted",
        "project_name",
        "creation_date",
        "is_public",
        "last_modified_date",
        "owner_id",
        "project_id",
        "team_id",
        "workspace_id"
    FROM "project_data_projected_renamed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "project_data_projected_renamed_cleaned_null_casted_missing_handled"

stg_project_data.yml (Document the table)

version: 2
models:
- name: stg_project_data
  description: The table is about project data. It contains details such as project
    ID, name, creation date, modification date, color, status, due date, notes, owner
    ID, team ID, and workspace ID. Each row represents a unique project with its associated
    attributes. The table tracks project information including metadata like archival
    status and public visibility.
  columns:
  - name: project_notes
    description: Additional notes or comments about the project
    tests:
    - not_null
  - name: project_color
    description: Color assigned to the project for visual identification
    tests:
    - not_null
    - accepted_values:
        values:
        - light-blue
        - light-purple
        - dark-green
        - dark-pink
        - light-warm-gray
        - dark-blue
        - dark-purple
        - light-green
        - light-pink
        - dark-warm-gray
        - light-red
        - dark-red
        - light-orange
        - dark-orange
        - light-yellow
        - dark-yellow
        - light-gray
        - dark-gray
        - light-cyan
        - dark-cyan
  - name: is_archived
    description: Indicates if the project is archived
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record has been deleted
    tests:
    - not_null
  - name: project_name
    description: Name or title of the project
    tests:
    - not_null
  - name: creation_date
    description: Timestamp of project creation
    tests:
    - not_null
  - name: is_public
    description: Indicates if the project is publicly visible
    cocoon_meta:
      missing_acceptable: Boolean flag; absence likely means project is not public.
  - name: last_modified_date
    description: Timestamp of last modification
    tests:
    - not_null
  - name: owner_id
    description: Identifier of the project owner
    tests:
    - not_null
  - name: project_id
    description: Unique identifier for the project
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each project. For this
        table, each row represents a unique project. project_id is designed to be
        unique across all projects, ensuring that no two projects have the same ID.
  - name: team_id
    description: Identifier of the team associated with the project
    tests:
    - not_null
  - name: workspace_id
    description: Identifier of the workspace containing the project
    tests:
    - not_null

stg_tag_data (first 100 rows)

is_deleted tag_hash creation_timestamp tag_id workspace_id
0 False 54ce99fa85c92b1d87678436e956a2e8 2016-08-04 17:26:14 163915846063865 2104505001950
1 False 5b6cf869265c13af8566f192b4ab3d2a 2016-07-14 23:52:42 155711202630775 2104505001950
2 False e7cc8f3f3aef4a5a55daa94bebb1b577 2016-11-01 15:28:12 205334637062213 2104505001950
3 False 221d2a347ebfc15d48665188f639d1cb 2016-07-14 23:52:14 155711202630769 2104505001950
4 False 7cb15e416d62919b1b40298324fbe30b 2016-07-14 23:52:36 155711202630772 2104505001950
5 False 87f9f735a1d36793ceaecd4e47124b63 2016-07-14 23:51:23 155711202630764 2104505001950
6 False 24f47cdbe9ddba774a7cc53e51d9032e 2016-07-14 23:51:31 155711202630767 2104505001950
7 False 11ff9f68afb6b8b5b8eda218d7c83a65 2016-02-03 18:36:17 87008074978192 2104505001950
8 False e01aecb528730f3bfe10f9d57f1317bf 2016-07-14 23:50:37 155711202630759 2104505001950
9 False 21021ea0e52be8e9c599f4dff41e5be0 2016-07-14 23:49:45 155711202630747 2104505001950
10 False bf78fa4160a6ed81e32eb6964d297cf9 2016-07-14 23:50:10 155711202630754 2104505001950
11 False 8c6384dda98d8864d00e29dc96179970 2016-07-14 23:49:31 155711202630745 2104505001950
12 False cc8d68c551c4a9a6d5313e07de4deafd 2016-07-14 23:50:03 155711202630751 2104505001950
13 False 464fe4ea0557b9c34b4ef33c135f6a08 2019-10-08 20:56:09 1143821191383339 2104505001950
14 False da50c2fb0cbcdf1b923b0ca88e4d49ee 2016-06-27 17:44:23 148696288165443 2104505001950
15 False 2ceea82aab3ac5bc623e79186b389481 2016-06-21 19:34:34 146630888383184 2104505001950
16 False 4007f46c835117c8761ebbaf8dd9969f 2016-06-25 00:54:37 148289869797755 2104505001950

stg_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:02:09.098470+00:00
WITH 
"tag_data_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "color",
        "created_at",
        "message",
        "name",
        "notes",
        "workspace_id"
    FROM "memory"."main"."tag_data"
),

"tag_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> tag_id
    -- _fivetran_deleted -> is_deleted
    -- color -> tag_color
    -- created_at -> creation_timestamp
    -- message -> tag_message
    -- name -> tag_hash
    -- notes -> tag_notes
    SELECT 
        "id" AS "tag_id",
        "_fivetran_deleted" AS "is_deleted",
        "color" AS "tag_color",
        "created_at" AS "creation_timestamp",
        "message" AS "tag_message",
        "name" AS "tag_hash",
        "notes" AS "tag_notes",
        "workspace_id"
    FROM "tag_data_projected"
),

"tag_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- tag_color: from DECIMAL to VARCHAR
    -- tag_id: from INT to VARCHAR
    -- tag_message: from DECIMAL to VARCHAR
    -- tag_notes: from DECIMAL to VARCHAR
    -- workspace_id: from INT to VARCHAR
    SELECT
        "is_deleted",
        "tag_hash",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST("tag_color" AS VARCHAR) AS "tag_color",
        CAST("tag_id" AS VARCHAR) AS "tag_id",
        CAST("tag_message" AS VARCHAR) AS "tag_message",
        CAST("tag_notes" AS VARCHAR) AS "tag_notes",
        CAST("workspace_id" AS VARCHAR) AS "workspace_id"
    FROM "tag_data_projected_renamed"
),

"tag_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- tag_color has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tag_message has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- tag_notes has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "is_deleted",
        "tag_hash",
        "creation_timestamp",
        "tag_id",
        "workspace_id"
    FROM "tag_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "tag_data_projected_renamed_casted_missing_handled"

stg_tag_data.yml (Document the table)

version: 2
models:
- name: stg_tag_data
  description: The table is about tags in a workspace system. It contains tag details
    like id, name, color, creation date, and associated workspace id. Each row represents
    a unique tag. The 'name' field contains hash-like values, possibly for identification
    or security purposes. The table tracks if tags are deleted and allows for optional
    messages and notes.
  columns:
  - name: is_deleted
    description: Indicates if the tag has been deleted
    tests:
    - not_null
  - name: tag_hash
    description: Hashed name or identifier for the tag
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a hashed name or identifier for the tag.
        For this table, each row represents a unique tag. The tag_hash appears to
        be unique across rows in the given sample, and it's likely designed to be
        unique for each tag.
  - name: creation_timestamp
    description: Timestamp when the tag was created
    tests:
    - not_null
  - name: tag_id
    description: Unique identifier for the tag
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is explicitly described as a unique identifier for the
        tag. For this table, each row represents a unique tag. The tag_id is designed
        to be unique across all tags in the system.
  - name: workspace_id
    description: Identifier of the workspace the tag belongs to
    tests:
    - not_null

stg_section_data (first 100 rows)

section_hash created_at project_id section_id
0 05a821cdf4370ff1dd2fc1d4465fb23e 2019-09-21 22:25:26 2417304153254 1140999307201834
1 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:04 2417304153254 1140999305179192
2 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:04 2693467581098 1140999422113332
3 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:14 2724895544714 1140999422891922
4 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:21 3931050316709 1140999306720518
5 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:21 3944161162427 1140999423455392
6 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:27 4021281497638 1140999424217513
7 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:31 4388687446288 1140999307427051
8 98f770b0af18ca763421bac22b4b6805 2019-09-21 22:25:36 4388687446288 1140999307646897
9 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:35 4404698904438 1140999307646896
10 453486c410b442f5df1143816d2fa52f 2019-09-21 22:25:39 4428993261541 1140999307926262
11 23ce0eb7d180f3dc8391d4af48572d21 2019-09-21 22:26:19 28303577354184 1140999428608188
12 453486c410b442f5df1143816d2fa52f 2019-09-21 22:26:02 28303577354184 1140999427143974

stg_section_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:00:06.130210+00:00
WITH 
"section_data_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "created_at",
        "name",
        "project_id"
    FROM "memory"."main"."section_data"
),

"section_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> section_id
    -- name -> section_hash
    SELECT 
        "id" AS "section_id",
        "created_at",
        "name" AS "section_hash",
        "project_id"
    FROM "section_data_projected"
),

"section_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_at: from VARCHAR to TIMESTAMP
    -- project_id: from INT to VARCHAR
    -- section_id: from INT to VARCHAR
    SELECT
        "section_hash",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("project_id" AS VARCHAR) AS "project_id",
        CAST("section_id" AS VARCHAR) AS "section_id"
    FROM "section_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "section_data_projected_renamed_casted"

stg_section_data.yml (Document the table)

version: 2
models:
- name: stg_section_data
  description: The table is about project sections. It contains details of each section,
    including a unique ID, creation date, name (likely a hash), and the project ID
    it belongs to. The table allows tracking of different sections within various
    projects, with each section having a specific creation timestamp and identifier.
  columns:
  - name: section_hash
    description: Hash-like name or identifier for the section
    tests:
    - not_null
  - name: created_at
    description: Timestamp when the section was created
    tests:
    - not_null
  - name: project_id
    description: Identifier of the project the section belongs to
    tests:
    - not_null
  - name: section_id
    description: Unique identifier for the section
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each section. For this table,
        each row represents a unique section, and the section_id is designed to be
        unique across all rows. It serves as the primary identifier for each section
        entry.

stg_task_data (first 100 rows)

is_completed task_name task_description assignee_id completed_by_id completion_timestamp creation_timestamp due_date last_modified_timestamp task_id workspace_id
0 True cool name many notes 1126539287026862 None 2019-09-18 16:48:31.176 2019-09-02 14:53:02.685 2019-09-25 2021-12-18 14:01:40.058 1138074657603006 2104505001950

stg_task_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:03:32.255648+00:00
WITH 
"task_data_renamed" AS (
    -- Rename: Renaming columns
    -- id -> task_id
    -- completed -> is_completed
    -- completed_at -> completion_timestamp
    -- created_at -> creation_timestamp
    -- due_on -> due_date
    -- due_at -> due_timestamp
    -- modified_at -> last_modified_timestamp
    -- name -> task_name
    -- parent_id -> parent_task_id
    -- start_on -> start_date
    -- notes -> task_description
    SELECT 
        "id" AS "task_id",
        "assignee_id",
        "completed" AS "is_completed",
        "completed_at" AS "completion_timestamp",
        "completed_by_id",
        "created_at" AS "creation_timestamp",
        "due_on" AS "due_date",
        "due_at" AS "due_timestamp",
        "modified_at" AS "last_modified_timestamp",
        "name" AS "task_name",
        "parent_id" AS "parent_task_id",
        "start_on" AS "start_date",
        "notes" AS "task_description",
        "workspace_id"
    FROM "memory"."main"."task_data"
),

"task_data_renamed_casted" AS (
    -- Column Type Casting: 
    -- assignee_id: from INT to VARCHAR
    -- completed_by_id: from DECIMAL to VARCHAR
    -- completion_timestamp: from VARCHAR to TIMESTAMP
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- due_date: from VARCHAR to TIMESTAMP
    -- due_timestamp: from DECIMAL to TIMESTAMP
    -- last_modified_timestamp: from VARCHAR to TIMESTAMP
    -- parent_task_id: from DECIMAL to VARCHAR
    -- start_date: from DECIMAL to TIMESTAMP
    -- task_id: from INT to VARCHAR
    -- workspace_id: from INT to VARCHAR
    SELECT
        "is_completed",
        "task_name",
        "task_description",
        CAST("assignee_id" AS VARCHAR) AS "assignee_id",
        CAST("completed_by_id" AS VARCHAR) AS "completed_by_id",
        CAST("completion_timestamp" AS TIMESTAMP) AS "completion_timestamp",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST("due_date" AS TIMESTAMP) AS "due_date",
        CAST("due_timestamp" AS TIMESTAMP) AS "due_timestamp",
        CAST("last_modified_timestamp" AS TIMESTAMP) AS "last_modified_timestamp",
        CAST("parent_task_id" AS VARCHAR) AS "parent_task_id",
        CAST("start_date" AS TIMESTAMP) AS "start_date",
        CAST("task_id" AS VARCHAR) AS "task_id",
        CAST("workspace_id" AS VARCHAR) AS "workspace_id"
    FROM "task_data_renamed"
),

"task_data_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- due_timestamp has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- parent_task_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- start_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "is_completed",
        "task_name",
        "task_description",
        "assignee_id",
        "completed_by_id",
        "completion_timestamp",
        "creation_timestamp",
        "due_date",
        "last_modified_timestamp",
        "task_id",
        "workspace_id"
    FROM "task_data_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "task_data_renamed_casted_missing_handled"

stg_task_data.yml (Document the table)

version: 2
models:
- name: stg_task_data
  description: The table is about tasks. It contains details like task ID, assignee,
    completion status, due dates, and workspace. Each row represents a single task
    with its properties. The table tracks task creation, modification, and completion
    dates. It also includes task names, notes, and parent task relationships. This
    structure allows for managing and tracking tasks within a workspace.
  columns:
  - name: is_completed
    description: Indicates if the task is completed
    tests:
    - not_null
  - name: task_name
    description: Name or title of the task
    tests:
    - not_null
  - name: task_description
    description: Additional information or details about the task
    tests:
    - not_null
  - name: assignee_id
    description: ID of the person assigned to the task
    tests:
    - not_null
  - name: completed_by_id
    description: ID of the person who completed the task
    cocoon_meta:
      missing_acceptable: Task not completed by anyone if is_completed is False.
  - name: completion_timestamp
    description: Timestamp when the task was completed
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp when the task was created
    tests:
    - not_null
  - name: due_date
    description: Date when the task is due
    tests:
    - not_null
  - name: last_modified_timestamp
    description: Timestamp when the task was last modified
    tests:
    - not_null
  - name: task_id
    description: Unique identifier for the task
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains a unique identifier for each task. In a well-designed
        task management system, each task should have a unique ID to distinguish it
        from all other tasks.
  - name: workspace_id
    description: ID of the workspace the task belongs to
    tests:
    - not_null

stg_task_section_data (first 100 rows)

parent_task_id section_id
0 1199893417793941 1145638605769770
1 1199893417793941 1140999344898860

stg_task_section_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:04:09.011709+00:00
WITH 
"task_section_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "section_id",
        "task_id"
    FROM "memory"."main"."task_section_data"
),

"task_section_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- task_id -> parent_task_id
    SELECT 
        "section_id",
        "task_id" AS "parent_task_id"
    FROM "task_section_data_projected"
),

"task_section_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- parent_task_id: from INT to VARCHAR
    -- section_id: from INT to VARCHAR
    SELECT
        CAST("parent_task_id" AS VARCHAR) AS "parent_task_id",
        CAST("section_id" AS VARCHAR) AS "section_id"
    FROM "task_section_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "task_section_data_projected_renamed_casted"

stg_task_section_data.yml (Document the table)

version: 2
models:
- name: stg_task_section_data
  description: The table represents the relationship between task sections and tasks.
    Each row links a section to a task. A task can have multiple sections. The section_id
    is unique for each section. The task_id connects sections to their parent task.
    This structure allows for organizing tasks into multiple sections or subtasks.
  columns:
  - name: parent_task_id
    description: Identifier linking sections to their parent task
    tests:
    - not_null
  - name: section_id
    description: Unique identifier for each task section
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each task section.
        For this table, each row represents a link between a section and its parent
        task. The section_id is described as unique for each section, which means
        it is unique across rows.

stg_team_data (first 100 rows)

is_deleted encrypted_team_name organization_id team_id
0 False c8a88aee2acd5dbf76978f421dd70dee 2104505001950 1175290352849593
1 False 042627cc066e7a21b9b0b16ddf92a089 2104505001950 1165006878005385
2 False b1861e61b41ba3fff4e74757d74c4e90 2104505001950 1175424836114178
3 False e545301c9a02e33abe2917f4e38405e5 2104505001950 1190726085138177
4 False 53e0dbc06e48e3d381ac224fa8bae3df 2104505001950 1163857948371041
5 False 54dd0f611e06ada1547a2fa4b5ac4e2c 2104505001950 1175424836114180
6 False 7cb15e416d62919b1b40298324fbe30b 2104505001950 1145563915215145
7 False 7d10c031056d08b88a3b4fcf25631682 2104505001950 1140675397099665
8 False f9d8847b3da721264efcd1aaeebacf13 2104505001950 1131216624451101
9 False 99992776e80912d4b03c9540a2255c1a 2104505001950 1127125465171855
10 False 1458f7aba7fa08b3920d2836d95ce518 2104505001950 1131147572225013
11 False d98628a5ec525dab008c2262ad6669d4 2104505001950 1130779217347436
12 False 29d94732ca1696b21b3cdd2df5795dab 2104505001950 1139142125254369
13 False 31ff7243543d02f6537c4a43a4a18ce5 2104505001950 1130944552712711
14 False 36cbc41c1c121f2c68f5776a118ea027 2104505001950 1125199132412749
15 False cd32106bcb6de321930cf34574ea388c 2104505001950 1126309244892859
16 False deb10517653c255364175796ace3553f 2104505001950 1126866959976249
17 False 5f3c8189bb84d2ea4061e757daa9561f 2104505001950 1126539229462849
18 False c482980d384a9d0e7bc39e1140270870 2104505001950 1126687349073941
19 False 5231dee7156b84b0d1854de1a18dd169 2104505001950 1119148247636811
20 False a7ace9d7b0d52558c00f7540af56ed57 2104505001950 2527546068509
21 False c46c77edb3c31643a9260161249548c5 2104505001950 595727761265526
22 False 7cb15e416d62919b1b40298324fbe30b 2104505001950 1109585467162456
23 False cd32106bcb6de321930cf34574ea388c 2104505001950 1199557466985079
24 False deb10517653c255364175796ace3553f 2104505001950 1199576475848629

stg_team_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:05:01.424219+00:00
WITH 
"team_data_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "name",
        "organization_id"
    FROM "memory"."main"."team_data"
),

"team_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> team_id
    -- _fivetran_deleted -> is_deleted
    -- name -> encrypted_team_name
    SELECT 
        "id" AS "team_id",
        "_fivetran_deleted" AS "is_deleted",
        "name" AS "encrypted_team_name",
        "organization_id"
    FROM "team_data_projected"
),

"team_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- organization_id: from INT to VARCHAR
    -- team_id: from INT to VARCHAR
    SELECT
        "is_deleted",
        "encrypted_team_name",
        CAST("organization_id" AS VARCHAR) AS "organization_id",
        CAST("team_id" AS VARCHAR) AS "team_id"
    FROM "team_data_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "team_data_projected_renamed_casted"

stg_team_data.yml (Document the table)

version: 2
models:
- name: stg_team_data
  description: The table is about teams within an organization. Each row represents
    a team with a unique ID. The table includes the team's name (encrypted), whether
    it's deleted, and the organization ID it belongs to. All teams shown belong to
    the same organization (2104505001950). The table likely tracks teams across different
    organizations for management or analytics purposes.
  columns:
  - name: is_deleted
    description: Indicates if the team has been deleted
    tests:
    - not_null
  - name: encrypted_team_name
    description: Encrypted name of the team
    tests:
    - not_null
  - name: organization_id
    description: Identifier of the organization the team belongs to
    tests:
    - not_null
  - name: team_id
    description: Unique identifier for the team
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each team. For this
        table, each row represents a distinct team within the organization. The team_id
        appears to be unique across rows, as each entry has a different value.

stg_task_tag_data (first 100 rows)

tag_id task_id
0 163915846063865 1199893417793941
1 155711202630775 1199893417793941
2 205334637062213 1199893417793941
3 155711202630769 1199893417793941

stg_task_tag_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-06 00:04:25.812042+00:00
WITH 
"task_tag_data_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "tag_id",
        "task_id"
    FROM "memory"."main"."task_tag_data"
),

"task_tag_data_projected_casted" AS (
    -- Column Type Casting: 
    -- tag_id: from INT to VARCHAR
    -- task_id: from INT to VARCHAR
    SELECT
        CAST("tag_id" AS VARCHAR) AS "tag_id",
        CAST("task_id" AS VARCHAR) AS "task_id"
    FROM "task_tag_data_projected"
)

-- COCOON BLOCK END
SELECT * FROM "task_tag_data_projected_casted"

stg_task_tag_data.yml (Document the table)

version: 2
models:
- name: stg_task_tag_data
  description: The table represents a many-to-many relationship between tags and tasks.
    Each row links a tag (identified by tag_id) to a task (identified by task_id).
    Multiple tags can be associated with a single task, and a single tag can be applied
    to multiple tasks. This structure allows for flexible categorization and organization
    of tasks using tags.
  columns:
  - name: tag_id
    description: Unique identifier for a tag
    tests:
    - not_null
  - name: task_id
    description: Unique identifier for a task
    tests:
    - not_null
Some tables log change events, which may be redundant to query. Instead, we take a snapshot of the latest.

We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_section_data stg_section_data stg_project_data stg_project_data stg_section_data->stg_project_data stg_user_data stg_user_data stg_task_section_data stg_task_section_data stg_task_section_data->stg_section_data stg_task_data stg_task_data stg_task_section_data->stg_task_data stg_story_data stg_story_data stg_story_data->stg_user_data stg_story_data->stg_project_data stg_project_data->stg_user_data stg_team_data stg_team_data stg_project_data->stg_team_data stg_task_data->stg_user_data stg_task_data->stg_user_data stg_task_data->stg_project_data

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_user_data
  primary_key: user_id
  foreign_keys: []
- table_name: stg_story_data
  foreign_keys:
  - column: author_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: target_id
    reference:
      table_name: stg_project_data
      column: project_id
- table_name: stg_project_data
  foreign_keys:
  - column: owner_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: team_id
    reference:
      table_name: stg_team_data
      column: team_id
  primary_key: project_id
- table_name: stg_task_data
  foreign_keys:
  - column: assignee_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: completed_by_id
    reference:
      table_name: stg_user_data
      column: user_id
  - column: workspace_id
    reference:
      table_name: stg_project_data
      column: project_id
  primary_key: task_id
- table_name: stg_section_data
  foreign_keys:
  - column: project_id
    reference:
      table_name: stg_project_data
      column: project_id
  primary_key: section_id
- table_name: stg_team_data
  primary_key: team_id
  foreign_keys: []
- table_name: stg_task_section_data
  foreign_keys:
  - column: parent_task_id
    reference:
      table_name: stg_task_data
      column: task_id
  - column: section_id
    reference:
      table_name: stg_section_data
      column: section_id
We identify the entities and relationships behind the tables, and tell the story among these relationships.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Users
  entity_description: Represents individual users in the system with their basic information
    stored securely.
  table_name: stg_user_data
  primary_key: user_id
- entity_name: Projects
  entity_description: Represents individual projects with their associated details
    and metadata.
  table_name: stg_project_data
  primary_key: project_id
- entity_name: Teams
  entity_description: Represents teams within organizations, tracking team information
    across different organizations.
  table_name: stg_team_data
  primary_key: team_id
- entity_name: Tasks
  entity_description: Represents individual tasks with their properties, allowing
    for task management and tracking within workspaces.
  table_name: stg_task_data
  primary_key: task_id
- entity_name: Project Sections
  entity_description: Represents sections within projects, allowing for organization
    and tracking of different parts of a project.
  table_name: stg_section_data
  primary_key: section_id
relations:
- relation_name: ProjectTeamUserManagement
  relation_description: Projects are created and managed by Users, who are organized
    into Teams within a Workspace.
  table_name: stg_project_data
  entities:
  - Projects
  - Users
  - Teams
- relation_name: ProjectTaskAssignments
  relation_description: Users are assigned Tasks within Projects, tracking task details,
    completion status, and deadlines in a workspace.
  table_name: stg_task_data
  entities:
  - Tasks
  - Users
  - Projects
- relation_name: ProjectSectionOrganization
  relation_description: Project Sections are organizational units within Projects,
    allowing for structured division and management of project content.
  table_name: stg_section_data
  entities:
  - Project Sections
  - Projects
- relation_name: UserProjectStories
  relation_description: Users create and interact with stories or messages within
    Projects, tracking engagement and metadata for each entry.
  table_name: stg_story_data
  entities:
  - Users
  - Projects
- relation_name: TaskSectionAssociation
  relation_description: This associates Tasks with Project Sections, allowing a single
    Task to be organized into multiple Sections or subtasks.
  table_name: stg_task_section_data
  entities:
  - Tasks
  - Project Sections
story:
- relation_name: ProjectTeamUserManagement
  story_line: Users form Teams and create Projects in a Workspace.
- relation_name: ProjectSectionOrganization
  story_line: Users divide Projects into Sections for better organization.
- relation_name: ProjectTaskAssignments
  story_line: Users assign Tasks to team members within Projects.
- relation_name: TaskSectionAssociation
  story_line: Users categorize Tasks into relevant Project Sections.
- relation_name: UserProjectStories
  story_line: Users post stories and messages in Project discussions.