Fivetran Qualtrics

We display the source tables from the data warehouses to model.

block (first 100 rows)

id survey_id _fivetran_deleted _fivetran_synced block_locking block_visibility description randomize_questions type
0 BL_b8x54QK6VJE3A1f SV_8GsOFOOKnw9KAmx False 2023-06-05 03:05:38.341000 NaN NaN Block 1 NaN Standard
1 BL_9BOpMYC7bZTNZTn SV_8GsOFOOKnw9KAmx False 2023-06-05 03:05:38.341000 NaN NaN Block 2 NaN Standard
2 BL_797qWoQosqXXiJv SV_8GsOFOOKnw9KAmx False 2023-06-05 03:05:38.341000 NaN NaN Block 3 NaN Standard
3 BL_8pn7C2f84AUw3ZQ SV_afWZKdnyaANAL5j False 2023-06-05 03:05:52.732000 NaN NaN Block 3 NaN Standard
4 BL_3gZj0OfR5Qncg9T SV_1XGe0Xykmeo0lyB False 2023-06-05 03:05:01.276000 NaN NaN Default Question Block NaN Default

block_question (first 100 rows)

block_id question_id survey_id _fivetran_deleted _fivetran_synced
0 BL_9sjHtHF4P4LyAOF QID21 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.901000
1 BL_9sjHtHF4P4LyAOF QID24 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.901000
2 BL_9sjHtHF4P4LyAOF QID12 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.901000
3 BL_9sjHtHF4P4LyAOF QID11 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.901000
4 BL_diiw4C9MVCcKFHn QID23 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.901000

contact_mailing_list_membership (first 100 rows)

contact_lookup_id _fivetran_synced contact_id directory_id mailing_list_id name owner_id unsubscribe_date unsubscribed
0 CGC_BH2ZlyV81xz5w1R 2023-06-05 03:06:31.111000 CID_0VVnnbsmkvMGyrj POOL_vw3p9mEAf692Aql CG_T1Ppmm6s8WF9uRr TestMl UR_cYdNGsrFUemlDbD NaN False
1 CGC_bhc0s38kcQd67LH 2023-06-05 03:06:31.111000 CID_0VVnnbsmkvMGyrj POOL_vw3p9mEAf692Aql CG_3GrJHnIL1pkJw1P TodayML UR_cYdNGsrFUemlDbD NaN False
2 CGC_kVu3GCgKacGvOKf 2023-06-05 03:06:31.668000 CID_d060swLlNUKGa1L POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD NaN False
3 CGC_RSisAWjzoEqI8Lr 2023-06-05 03:06:31.145000 CID_0pkCodbTgpX4awd POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD NaN False
4 CGC_TReR1yFov0IUarm 2023-06-05 03:06:31.215000 CID_6fEsreE9bM67AVv POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD NaN False

core_contact (first 100 rows)

_fivetran_synced id mailing_list_id first_name last_name email external_data_reference language_ unsubscribed _fivetran_deleted
0 2023-06-05 03:06:31.097000 hersanid ML_553543t34t534 bing bong bingbong@gmail.com bingo123 EN False False

core_mailing_list (first 100 rows)

_fivetran_synced id library_id name category folder _fivetran_deleted
0 2023-06-05 03:06:31.097000 ML_553543t34t534 library_blahblah123 BingBong Bad Category Good Folder False

directory (first 100 rows)

id _fivetran_deleted _fivetran_synced deduplication_criteria_email deduplication_criteria_external_data_reference deduplication_criteria_first_name deduplication_criteria_last_name deduplication_criteria_phone is_default name
0 POOL_vw3p9mEAf692Aql False 2023-06-05 03:06:29.007000 True False False False False True Test_directory

directory_contact (first 100 rows)

directory_id id _fivetran_synced creation_date directory_unsubscribe_date directory_unsubscribed email email_domain embedded_data_last_active_time embedded_data_last_response_date embedded_data_login_date ext_ref first_name language_ last_modified last_name phone write_blanks
0 POOL_vw3p9mEAf692Aql CID_0VVnnbsmkvMGyrj 2023-06-05 03:06:31.110000 2020-07-09 08:32:46.467000 NaN False abc@abc.com abc.com NaN NaN NaN NaN S NaN 2020-08-04 09:53:30.138000 M 3232323.0 NaN
1 POOL_vw3p9mEAf692Aql CID_0pkCodbTgpX4awd 2023-06-05 03:06:31.145000 2020-06-24 15:44:46.380000 NaN False abc1@abc1.com abc1.com NaN NaN NaN NaN Saksham NaN 2020-06-24 15:44:46.380000 Malhotra NaN NaN
2 POOL_vw3p9mEAf692Aql CID_d060swLlNUKGa1L 2023-06-05 03:06:31.668000 2020-06-24 15:45:38.798000 NaN False abc2@abc2.com abc2.com NaN NaN NaN NaN Saksham NaN 2020-06-24 15:45:38.798000 None NaN NaN
3 POOL_vw3p9mEAf692Aql CID_6fEsreE9bM67AVv 2023-06-05 03:06:31.215000 2020-06-24 15:49:29.330000 NaN False abc3@abc3.com abc3.com NaN NaN NaN NaN None NaN 2020-06-24 15:49:29.330000 Malhotra NaN NaN
4 POOL_vw3p9mEAf692Aql CID_71bfBUYt4qS6Fsp 2023-06-05 03:06:31.097000 2020-08-06 05:50:32.126000 NaN False connect2me@gmail.com gmail.com NaN NaN NaN NaN me k NaN 2020-08-07 09:48:20.651000 Aggarwal NaN NaN

directory_mailing_list (first 100 rows)

directory_id id _fivetran_deleted _fivetran_synced creation_date last_modified_date name owner_id
0 POOL_vw3p9mEAf692Aql CG_3lVV6y8VRw0CovA False 2023-06-05 03:06:29.408000 2020-06-19 18:41:15.768000 2020-06-19 18:41:15.768000 TestMailingList2 UR_cYdNGsrFUemlDbD
1 POOL_vw3p9mEAf692Aql CG_2YVdk5sjpPIUfBW False 2023-06-05 03:06:29.408000 2020-08-10 11:43:01.844000 2020-08-10 11:43:01.844000 Test_0810 UR_cYdNGsrFUemlDbD
2 POOL_vw3p9mEAf692Aql CG_2TOcRsxw5QF7r8d False 2023-06-05 03:06:29.408000 2020-08-07 09:48:18.474000 2020-08-07 09:48:18.474000 New Contact List (8/7/20 3:17 PM) UR_cYdNGsrFUemlDbD
3 POOL_vw3p9mEAf692Aql CG_UPYDsHbOosxP51D False 2023-06-05 03:06:29.408000 2020-08-06 05:52:13.640000 2020-08-06 05:52:13.640000 Test060820 UR_cYdNGsrFUemlDbD
4 POOL_vw3p9mEAf692Aql CG_2Xamq59U7RFDz1a False 2023-06-05 03:06:29.408000 2023-02-24 10:47:53.810000 2023-02-24 10:47:53.810000 New Contact List (2/24/23 4:17 PM) UR_cYdNGsrFUemlDbD

distribution (first 100 rows)

id _fivetran_deleted _fivetran_synced created_date header_from_email header_from_name header_reply_to_email header_subject message_library_id message_message_id message_message_text modified_date organization_id owner_id parent_distribution_id recipient_contact_id recipient_library_id recipient_mailing_list_id recipient_sample_id request_status request_type send_date survey_link_expiration_date survey_link_link_type survey_link_survey_id
0 EMD_rXUNkFnt2LlVaYE False 2023-05-02 00:28:21.455000 2021-09-16 09:54:07.000000 None None None None NaN NaN NaN 2021-09-16 03:54:07.000000 puppydev UR_cYdNGsrFUemlDbD NaN None UR_cYdNGsrFUemlDbD CG_6nYnmPWONC4JziN None Generated GeneratedInvite 2021-09-16 09:54:07.000000 2021-11-15 09:54:06.000000 Individual SV_afWZKdnyaANAL5j
1 EMD_pLLqzPVqIC80qOQ False 2023-05-02 00:04:45.632000 2020-06-25 10:58:24.000000 None None None None NaN NaN NaN 2020-06-25 04:58:24.000000 puppydev UR_cYdNGsrFUemlDbD NaN None UR_cYdNGsrFUemlDbD CG_2B4IIbgp8ZC0NVn CG_3dVV5daI3GYd8KD Generated GeneratedInvite 2020-06-25 10:58:24.000000 2020-08-24 10:58:23.000000 Individual SV_1MKtxtk2owJjOol
2 EMD_kOb54z4ZpofISxX False 2023-05-02 00:11:56.931000 2022-12-08 12:48:07.000000 None None None None NaN NaN NaN 2022-12-08 05:53:39.000000 puppydev UR_cYdNGsrFUemlDbD NaN CGC_kKkUtjVs3mFxeAK UR_cYdNGsrFUemlDbD CG_OI48zJxTuMsKUq5 None Generated GeneratedInvite 2022-12-08 12:48:07.000000 2022-12-09 12:48:07.000000 Individual SV_6Jcja72TpOiHjdX
3 EMD_GgJDmizeU8yAwvI False 2023-05-02 00:34:44.169000 2022-04-06 12:16:14.000000 None None None None NaN NaN NaN 2022-04-06 06:16:14.000000 puppydev UR_cYdNGsrFUemlDbD NaN None UR_cYdNGsrFUemlDbD CG_6nYnmPWONC4JziN None Generated GeneratedInvite 2022-04-06 12:16:14.000000 2022-06-05 12:16:14.000000 Individual SV_djzINkbJIFJUs05
4 EMD_8AvFYfH0jD2axcz False 2023-05-02 00:34:44.169000 2022-04-06 11:27:16.000000 noreply@qemailserver.com Developers developers@puppy.com Test Survey NaN NaN NaN 2022-04-06 05:29:56.000000 puppydev UR_cYdNGsrFUemlDbD NaN None None None None Done TestEmail 2022-04-06 11:27:16.000000 None None SV_djzINkbJIFJUs05

distribution_contact (first 100 rows)

contact_id distribution_id _fivetran_synced contact_frequency_rule_id contact_lookup_id opened_at response_completed_at response_id response_started_at sent_at status survey_link survey_session_id
0 CID_4NG7JC9R7VTmHYi EMD_Q8AiTuBzgLZsODX 2023-05-02 00:18:19.286000 NaN CGC_kKkUtjVs3mFxeAK 2023-03-28 07:41:58.000000 NaN NaN NaN 2023-03-28 07:17:14.896000 Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_7Uw1HuxpN0OxbVQ?Q_CHL=email&Q_DL=EMD_Q8AiTuBzgLZsODX_7Uw1HuxpN0OxbVQ_CGC_kKkUtjVs3mFxeAK&_g_=g NaN
1 CID_4NG7JC9R7VTmHYi EMD_FV7lj9zfNfFAxFI 2023-05-02 00:18:18.931000 NaN CGC_T8oa4TlYPoGvnwW 2023-04-25 09:19:47.000000 NaN NaN NaN 2023-04-25 09:06:38.894000 Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_7Uw1HuxpN0OxbVQ?Q_CHL=email&Q_DL=EMD_FV7lj9zfNfFAxFI_7Uw1HuxpN0OxbVQ_CGC_T8oa4TlYPoGvnwW&_g_=g NaN
2 CID_4NG7JC9R7VTmHYi EMD_b4UhhqAF6Sx0Mcl 2023-05-02 00:11:58.321000 NaN CGC_kKkUtjVs3mFxeAK 2023-01-10 11:30:29.000000 NaN NaN NaN 2023-01-10 11:30:20.925000 Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_6Jcja72TpOiHjdX?Q_CHL=email&Q_DL=EMD_b4UhhqAF6Sx0Mcl_6Jcja72TpOiHjdX_CGC_kKkUtjVs3mFxeAK&_g_=g NaN
3 CID_9Fa5PKgMwjiExoh EMD_OY1XxLK7rncWDt8 2023-05-02 00:34:45.341000 NaN CGC_0ZqBiW44uMDQbon 2022-04-06 13:14:19.000000 NaN NaN NaN 2022-04-06 13:14:18.296000 Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_djzINkbJIFJUs05?Q_CHL=email&Q_DL=EMD_OY1XxLK7rncWDt8_djzINkbJIFJUs05_CGC_0ZqBiW44uMDQbon&_g_=g NaN
4 CID_9Fa5PKgMwjiExoh EMD_GgJDmizeU8yAwvI 2023-05-02 00:34:44.672000 NaN CGC_0ZqBiW44uMDQbon None NaN NaN NaN 2022-09-12 05:13:55.058000 Pending https://myorg.ca1.qualtrics.com/jfe/form/SV_djzINkbJIFJUs05?Q_CHL=gl&Q_DL=EMD_GgJDmizeU8yAwvI_djzINkbJIFJUs05_CGC_0ZqBiW44uMDQbon&_g_=g NaN

question (first 100 rows)

id survey_id _fivetran_deleted _fivetran_synced data_export_tag data_visibility_hidden data_visibility_private next_answer_id next_choice_id question_description question_description_option question_text question_text_unsafe question_type selector sub_selector validation_setting_force_response validation_setting_force_response_type validation_setting_type
0 QID26 SV_djzINkbJIFJUs05 False 2023-06-05 03:06:14.697000 Q21 NaN NaN 1 4 Click to write the question text UseText Click to write the question text Click to write the question text CS VRTL TX NaN NaN NaN
1 QID9 SV_afWZKdnyaANAL5j False 2023-06-05 03:05:52.723000 Q9 False False 1 6 Marks UseText Marks Marks CS VRTL TX NaN NaN NaN
2 QID9 SV_6Jcja72TpOiHjdX False 2023-06-05 03:05:16.052000 Q9 False False 1 6 Marks UseText Marks Marks CS VRTL TX NaN NaN NaN
3 QID9 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.889000 Q9 False False 1 6 Marks UseText Marks Marks CS VRTL TX NaN NaN NaN
4 QID1 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.885000 Q1 False False 1 4 Tell About experience ? UseText Tell About experience ?  Tell About experience ?  DB TB None NaN NaN NaN

question_option (first 100 rows)

key_ question_id survey_id _fivetran_deleted _fivetran_synced recode_value text
0 1 QID1 SV_5vVBVUfSVz94szc False 2023-06-05 03:05:08.604000 1 Testing 1
1 1 QID1 SV_08GFzuA5eDhxx53 False 2023-06-05 03:04:38.166000 1 Click to write Choice 1
2 1 QID1 SV_6zmBz8qb19bpMvH False 2023-06-05 03:05:23.519000 1 Click to write Choice 1
3 1 QID1 SV_1XGe0Xykmeo0lyB False 2023-06-05 03:05:01.276000 1 Click to write Choice 1
4 1 QID1 SV_0VZnz7EkhZquEKy False 2023-06-05 03:04:45.576000 1 Click to write Choice 1

question_response (first 100 rows)

_fivetran_id _fivetran_synced loop_id question question_id question_option_key response_id sub_question_key sub_question_text value_
0 vnzZKokL2ijFL+3cv266gKI+R3U= 2023-05-02 00:18:29.039000 NaN Drink QID2 NaN R_2w6q9pgBpebvjCg NaN NaN 2.0
1 ZOmuYJzFNtYua4Iwbtk474SXBsk= 2023-05-02 00:18:29.040000 NaN Drink QID2 NaN R_3RvEQNuWciDhUpl NaN NaN 2.0
2 KeKECHYR7V5729skoyowWs4Ah4o= 2023-05-02 00:18:29.041000 NaN Drink QID2 NaN R_2SqSR1Q6pLJZplH NaN NaN 1.0
3 j2kXb68qeBgQ9Tdsj/nyiypvEZ0= 2023-05-02 00:18:29.042000 NaN Drink QID2 NaN R_2xyo8lOlBsJkqNT NaN NaN 3.0
4 tm3O8dMY4osydnC+ZJuS+O5u8QI= 2023-05-02 00:25:43.221000 NaN state QID58 NaN R_1Qnonc63Wkk2Fur NaN NaN NaN

sub_question (first 100 rows)

key_ question_id survey_id _fivetran_deleted _fivetran_synced choice_data_export_tag text
0 2 QID4 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.892000 NaN Product
1 1 QID4 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.892000 NaN QA
2 3 QID4 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.892000 NaN Support
3 1 QID6 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.893000 NaN Java
4 3 QID6 SV_06auvDLqWENyV6t False 2023-06-05 03:04:30.893000 NaN C

survey (first 100 rows)

id _fivetran_deleted _fivetran_synced auto_scoring_category brand_base_url brand_id bundle_short_name composition_type creator_id default_scoring_category division_id is_active last_accessed last_activated last_modified option_active_response_set option_anonymize_response option_auto_confirm_start option_autoadvance option_autoadvance_pages option_autofocus option_available_languages option_back_button option_ballot_box_stuffing_prevention option_collect_geo_location option_confirm_start option_custom_styles option_email_thank_you option_eosredirect_url option_highlight_questions option_inactive_survey option_new_scoring option_next_button option_no_index option_page_transition option_partial_data option_partial_data_close_after option_password_protection option_previous_button option_progress_bar_display option_protect_selection_ids option_recaptcha_v_3 option_referer_check option_referer_url option_relevant_id option_relevant_idlockout_period option_response_summary option_save_and_continue option_secure_response_files option_show_export_tags option_skin option_skin_library option_skin_question_width option_skin_type option_survey_creation_date option_survey_expiration option_survey_language option_survey_meta_description option_survey_name option_survey_protection option_survey_termination option_survey_title option_validate_message owner_id project_category project_type registry_sha registry_version schema_version scoring_summary_after_questions scoring_summary_after_survey scoring_summary_category survey_name survey_status
0 SV_cARsuFqXn9bksDk False 2023-06-05 03:06:07.443000 NaN https://fivetrandev.ca1.qualtrics.com fivetrandev NaN NaN NaN NaN NaN NaN NaN None 2022-07-15 15:31:11.000000 RS_3zccqsZVQTIwYfQ NaN NaN False False False NaN False True NaN NaN {"altRowStyles":{"selector":".Skin .ReadableAlt, .Skin .LightBG","styles":{"backgroundColor":null}},"borderStyles":{"selector":".Skin .CS .horizontalbar table.sliderGrid tr.xlabel th, .Skin .CS .horizontalbar .ylabel, .Skin .CS .horizontalbar table.sliderGrid tr td.value, .Skin .Slider .horizontalbar table.sliderGrid tr.xlabel th, .Skin .BorderColor, .Skin .DarkBorderColor, .Skin .Slider .horizontalbar table.sliderGrid tr td.value, .Skin thead, .Skin .GAP .ChoiceStructure thead th ","styles":{"borderColor":null}},"choiceStyles":{"selector":".Skin .horizontalbar th.ylabel, .Skin .Matrix table td, .Skin .Matrix table th, .Skin .Matrix table thead th, .Skin .Matrix table thead td, .Skin .PGR .DragAndDrop .Items label, .Skin .QuestionBody ul.ChoiceStructure, .Skin .PGR .DragAndDrop .Group ul, .Skin .PGR .DragAndDrop .Group h2, .Skin .PGR .DragAndDrop .Items ul, .Skin .PGR .DragAndDrop .Items h2, .Skin .DD .QuestionBody table.ChoiceStructure, .Skin .PGR .DragAndDrop .NoColumns td.groupsContainerTd div ul, .Skin .PGR .DragAndDrop .NoColumns td.groupsContainerTd div h2, .Skin .SBS thead th, .Skin .SBS td, .Skin .SBS .Answers td, .Skin .SBS .Answers th, .SBS table.ChoiceStructure, .Skin .horizontalbar thead table.LabelDescriptions tr td, .Skin .horizontalbar thead tr td.NotApplicable, .QuestionBody th, .Skin .QuestionBody .MC .MAVR label, .Skin .QuestionBody .MC .SAVR label, .Skin .SBS thead th, .Skin .SBS td, .Skin .RO .DND ul li, .Skin .reg, .SkinInner .ChoiceStructure, .Skin thead, .Skin .GAP .ChoiceStructure thead th, .Skin .GAP .ChoiceStructure .c4, .Skin .GAP .ChoiceStructure th, .Skin .GAP .ChoiceStructure .WhyText, .SkinInner .ChoiceStructure .LightBG, .Skin .QuestionOuter.Highlight .Inner .reg, .Skin .QuestionOuter.Highlight .Inner .alt, .Skin .QuestionOuter.Highlight .Inner .ReadableAlt ","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}},"customCSS":null,"errorStyles":{"selector":".Skin .ValidationError","styles":{"backgroundColor":null,"borderColor":null,"color":null}},"errorStylesBG":{"selector":".Skin .HeaderValidationError, .Skin .HeaderValidationError li a","styles":{"backgroundColor":null,"color":null}},"footerStyles":{"selector":".Skin #Footer","styles":{"color":null}},"headerStyles":{"selector":".Skin div#Header","styles":{"color":null}},"highlightStyles":{"selector":".Skin .QuestionOuter.Highlight .Inner, .Skin .QuestionOuter.Highlight .Inner .reg, .Skin .QuestionOuter.Highlight .Inner .alt, .Skin .QuestionOuter.Highlight .Inner .ReadableAlt","styles":{"backgroundColor":null}},"highlightStylesAlt":{"selector":".Skin .QuestionOuter.Highlight .Inner .ReadableAlt, .Skin .QuestionOuter.Highlight .Inner .LightBG","styles":{"backgroundColor":null}},"pageStyles":{"selector":"#SurveyEngineBody","styles":{"backgroundColor":null}},"questionSeparatorStyles":{"selector":".Skin .Separator","styles":{"backgroundColor":null,"display":null}},"questionStyles":{"selector":".Skin .QuestionText","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}},"textEntryStyles":{"selector":".Skin select, .Skin .InputText","styles":{"color":null}},"textStyles":{"selector":".SkinInner","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}}} NaN NaN off NaN 1 Yes NaN +1 week NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN True True NaN "xmskin" Qualtrics 700.0 MQ 2022-07-15 15:30:27.000000 NaN EN NaN NaN PublicSurvey DefaultMessage None NaN UR_cYdNGsrFUemlDbD CX CustomerSatisfactionScore NaN NaN NaN False False NaN TestSurvey_FilterAPI Inactive
1 SV_08GFzuA5eDhxx53 False 2023-06-05 03:04:38.166000 NaN https://fivetrandev.ca1.qualtrics.com fivetrandev NaN NaN NaN NaN NaN NaN NaN None 2020-07-01 10:25:18.000000 RS_3XaYj2HxzK4NTs9 NaN NaN NaN NaN NaN NaN False False NaN NaN None NaN NaN None NaN 1 Yes NaN +1 week NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN True True NaN {"brandingId":null,"templateId":"*2014","overrides":null} fivetrandev NaN templated 2020-07-01 10:25:11.000000 NaN EN NaN NaN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions NaN UR_cYdNGsrFUemlDbD CX None NaN NaN NaN False False NaN Untitled Project Inactive
2 SV_8GsOFOOKnw9KAmx False 2023-06-05 03:05:38.340000 NaN https://fivetrandev.ca1.qualtrics.com fivetrandev NaN NaN NaN NaN NaN NaN NaN 2020-06-30 11:24:41.000000 2020-07-01 08:57:47.000000 RS_bjbsBC4lfPOyEhT NaN NaN NaN NaN NaN NaN False False NaN NaN None NaN NaN None NaN 1 Yes NaN +1 week NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN True True NaN {"brandingId":null,"templateId":"*2014","overrides":null} fivetrandev NaN templated 2020-06-30 11:22:54.000000 NaN EN NaN NaN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions NaN UR_cYdNGsrFUemlDbD CX None NaN NaN NaN False False NaN TestSurvey2 Active
3 SV_8vvMJsA88y5AZxA False 2023-06-05 03:05:45.573000 NaN https://fivetrandev.ca1.qualtrics.com fivetrandev NaN NaN NaN NaN NaN NaN NaN None 2021-02-04 10:59:03.000000 RS_38eltRqQKKndZSS NaN NaN NaN NaN NaN NaN False False NaN NaN None NaN NaN None NaN 1 Yes NaN +1 week NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN True True NaN {"brandingId":null,"templateId":"*2014","overrides":null} fivetrandev NaN templated 2021-02-04 10:55:54.000000 NaN EN NaN NaN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions NaN UR_cYdNGsrFUemlDbD CORE None NaN NaN NaN False False NaN SurveyWithMAQ1 Inactive
4 SV_1XGe0Xykmeo0lyB False 2023-06-05 03:05:01.275000 NaN https://fivetrandev.ca1.qualtrics.com fivetrandev NaN NaN NaN NaN NaN NaN NaN None 2020-08-06 05:38:26.000000 RS_1MNRbPFAJML4L0F NaN NaN NaN NaN NaN NaN False False NaN NaN None NaN NaN None NaN 1 Yes NaN +1 week NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN True True NaN {"brandingId":null,"templateId":"*2014","overrides":null} fivetrandev NaN templated 2020-08-06 05:38:07.000000 NaN EN NaN NaN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions NaN UR_cYdNGsrFUemlDbD CORE None NaN NaN NaN False False NaN Untitled Project Inactive

survey_embedded_data (first 100 rows)

import_id key_ response_id value_ _fivetran_synced
0 Website Website R_2SABUWvdPsqUvZf NaN 2023-05-02 00:28:32.500000
1 Website Website R_2sbtWBrH4K97W5X NaN 2023-05-02 00:28:57.167000
2 Website Website R_VQlBNvN3iJYpL1L NaN 2023-05-02 00:28:32.493000
3 Website Website R_2BxNumKW69fwQWV NaN 2023-05-02 00:28:32.507000
4 country country R_1Qnonc63Wkk2Fur NaN 2023-05-02 00:25:43.232000

survey_response (first 100 rows)

id _fivetran_synced distribution_channel duration_in_seconds end_date finished ip_address last_modified_date location_latitude location_longitude progress recipient_email recipient_first_name recipient_last_name recorded_date start_date status survey_id user_language
0 R_3e9aZTqhcgXRH0o 2023-05-02 00:11:25.529000 email 9 2022-12-07 08:59:21.000000 0 182.69.177.44 2022-12-14 09:02:00.000000 NaN NaN 20 jatin.aggarwal@fivetran.com None None 2022-12-14 09:02:00.000000 2022-12-07 08:59:11.000000 0 SV_6Jcja72TpOiHjdX EN
1 R_2DUphVbjGZGkvkg 2023-05-02 00:11:13.034000 anonymous 118 2022-10-17 10:52:34.000000 0 182.69.177.103 2022-10-24 10:52:34.000000 NaN NaN 12 None None None 2022-10-24 10:52:34.000000 2022-10-17 10:50:36.000000 0 SV_6Jcja72TpOiHjdX EN
2 R_1LIOcMY4EdVCU3r 2023-05-02 00:11:25.566000 anonymous 26 2022-12-07 10:04:28.000000 0 182.69.177.44 2022-12-14 10:04:31.000000 NaN NaN 20 None None None 2022-12-14 10:04:31.000000 2022-12-07 10:04:01.000000 0 SV_6Jcja72TpOiHjdX EN
3 R_3q9HmgV7GfTQP6o 2023-05-02 00:11:25.381000 anonymous 8 2022-10-19 13:38:21.000000 0 49.204.210.134 2022-12-07 07:58:37.000000 NaN NaN 32 None None None 2022-10-26 14:47:34.000000 2022-10-19 13:38:13.000000 0 SV_6Jcja72TpOiHjdX EN
4 R_3lDZ9xxqR5Ptexd 2023-05-02 00:11:43.953000 email 32 2023-02-28 12:40:15.000000 1 122.162.147.254 2023-03-21 12:47:06.000000 NaN NaN 23 jatin.aggarwal1406@gmail.com Jatin Aggarwak=l 2023-02-28 12:44:17.000000 2023-02-28 12:39:43.000000 0 SV_6Jcja72TpOiHjdX EN

survey_version (first 100 rows)

id survey_id _fivetran_deleted _fivetran_synced creation_date description published user_id version_number was_published
0 9223351 SV_06auvDLqWENyV6t False 2023-06-05 03:04:37.796000 2020-08-05 04:25:37.000000 Autosaved version False UR_cYdNGsrFUemlDbD 1 False
1 255659202 SV_08GFzuA5eDhxx53 False 2023-06-05 03:04:45.105000 2020-07-01 10:25:16.000000 Autosaved version False UR_cYdNGsrFUemlDbD 1 False
2 20510032 SV_0VZnz7EkhZquEKy False 2023-06-05 03:04:52.742000 2022-09-29 06:51:05.000000 Autosaved version True UR_cYdNGsrFUemlDbD 1 True
3 40331332 SV_1MKtxtk2owJjOol False 2023-06-05 03:05:00.836000 2020-06-23 12:14:04.000000 Autosaved version False UR_cYdNGsrFUemlDbD 3 False
4 63473806 SV_1MKtxtk2owJjOol False 2023-06-05 03:05:00.836000 2020-06-25 13:21:42.000000 Autosaved version False UR_cYdNGsrFUemlDbD 4 False

user (first 100 rows)

id _fivetran_deleted _fivetran_synced account_creation_date account_expiration_date account_status division_id email first_name language_ last_login_date last_name organization_id password_expiration_date password_last_changed_date response_count_auditable response_count_deleted response_count_generated time_zone unsubscribed user_type username
0 UR_cYdNGsrFUemlDbD False 2023-06-04 15:06:38.944000 NaN NaN active NaN developers@puppy.com Puppy NaN NaN Developers NaN NaN NaN NaN NaN NaN NaN NaN UT_BRANDADMIN developers@fivetran.com
1 UR_0AnU60plqo2I4DL False 2023-06-04 15:06:38.944000 NaN NaN active NaN cat@fivetran.com Cat NaN NaN Meow NaN NaN NaN NaN NaN NaN NaN NaN UT_BRANDADMIN colin@fivetran.com
Source tables may have typos, unclear names, incorrect column types, etc. We clean these tables.

stg_core_contact (first 100 rows)

id mailing_list_id first_name last_name email external_data_reference language_ unsubscribed _fivetran_deleted
0 hersanid ML_553543t34t534 bing bong bingbong@gmail.com bingo123 EN False False

stg_core_contact.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:55.483602+00:00
WITH 
"core_contact_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "mailing_list_id",
        "first_name",
        "last_name",
        "email",
        "external_data_reference",
        "language_",
        "unsubscribed",
        "_fivetran_deleted"
    FROM "memory"."main"."core_contact"
)

-- COCOON BLOCK END
SELECT * FROM "core_contact_projected"

stg_core_contact.yml (Document the table)

version: 2
models:
- name: stg_core_contact
  description: Please summarize the table
  columns:
  - name: id
    description: id
    tests:
    - not_null
  - name: mailing_list_id
    description: mailing_list_id
    tests:
    - not_null
  - name: first_name
    description: first_name
    tests:
    - not_null
  - name: last_name
    description: last_name
    tests:
    - not_null
  - name: email
    description: email
    tests:
    - not_null
  - name: external_data_reference
    description: external_data_reference
    tests:
    - not_null
  - name: language_
    description: language_
    tests:
    - not_null
  - name: unsubscribed
    description: unsubscribed
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null

stg_survey_response (first 100 rows)

response_id distribution_channel survey_duration_seconds respondent_ip_address survey_progress_percentage recipient_email recipient_first_name recipient_last_name response_status survey_id respondent_language is_survey_completed last_modified_datetime response_end_datetime response_recorded_datetime response_start_datetime
0 R_3e9aZTqhcgXRH0o email 9 182.69.177.44 20 jatin.aggarwal@fivetran.com None None 0 SV_6Jcja72TpOiHjdX EN b'\x00\x00\x00\x00\x00\x00\x00\x00\x00' 2022-12-14 09:02:00 2022-12-07 08:59:21 2022-12-14 09:02:00 2022-12-07 08:59:11
1 R_2DUphVbjGZGkvkg anonymous 118 182.69.177.103 12 None None None 0 SV_6Jcja72TpOiHjdX EN b'\x00\x00\x00\x00\x00\x00\x00\x00\x00' 2022-10-24 10:52:34 2022-10-17 10:52:34 2022-10-24 10:52:34 2022-10-17 10:50:36
2 R_1LIOcMY4EdVCU3r anonymous 26 182.69.177.44 20 None None None 0 SV_6Jcja72TpOiHjdX EN b'\x00\x00\x00\x00\x00\x00\x00\x00\x00' 2022-12-14 10:04:31 2022-12-07 10:04:28 2022-12-14 10:04:31 2022-12-07 10:04:01
3 R_3q9HmgV7GfTQP6o anonymous 8 49.204.210.134 32 None None None 0 SV_6Jcja72TpOiHjdX EN b'\x00\x00\x00\x00\x00\x00\x00\x00\x00' 2022-12-07 07:58:37 2022-10-19 13:38:21 2022-10-26 14:47:34 2022-10-19 13:38:13
4 R_3lDZ9xxqR5Ptexd email 32 122.162.147.254 23 jatin.aggarwal1406@gmail.com Jatin Aggarwak=l 0 SV_6Jcja72TpOiHjdX EN b'\x00\x00\x00\x00\x00\x00\x00\x00\x01' 2023-03-21 12:47:06 2023-02-28 12:40:15 2023-02-28 12:44:17 2023-02-28 12:39:43

stg_survey_response.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:59:48.468752+00:00
WITH 
"survey_response_projected" AS (
    -- Projection: Selecting 18 out of 19 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "distribution_channel",
        "duration_in_seconds",
        "end_date",
        "finished",
        "ip_address",
        "last_modified_date",
        "location_latitude",
        "location_longitude",
        "progress",
        "recipient_email",
        "recipient_first_name",
        "recipient_last_name",
        "recorded_date",
        "start_date",
        "status",
        "survey_id",
        "user_language"
    FROM "memory"."main"."survey_response"
),

"survey_response_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> response_id
    -- duration_in_seconds -> survey_duration_seconds
    -- end_date -> response_end_datetime
    -- finished -> is_survey_completed
    -- ip_address -> respondent_ip_address
    -- last_modified_date -> last_modified_datetime
    -- location_latitude -> respondent_latitude
    -- location_longitude -> respondent_longitude
    -- progress -> survey_progress_percentage
    -- recorded_date -> response_recorded_datetime
    -- start_date -> response_start_datetime
    -- status -> response_status
    -- user_language -> respondent_language
    SELECT 
        "id" AS "response_id",
        "distribution_channel",
        "duration_in_seconds" AS "survey_duration_seconds",
        "end_date" AS "response_end_datetime",
        "finished" AS "is_survey_completed",
        "ip_address" AS "respondent_ip_address",
        "last_modified_date" AS "last_modified_datetime",
        "location_latitude" AS "respondent_latitude",
        "location_longitude" AS "respondent_longitude",
        "progress" AS "survey_progress_percentage",
        "recipient_email",
        "recipient_first_name",
        "recipient_last_name",
        "recorded_date" AS "response_recorded_datetime",
        "start_date" AS "response_start_datetime",
        "status" AS "response_status",
        "survey_id",
        "user_language" AS "respondent_language"
    FROM "survey_response_projected"
),

"survey_response_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- recipient_last_name: The problem is that the last name "Aggarwak=l" contains an equals sign (=), which is not typical in names. It appears that the equals sign was mistakenly included, possibly due to a data entry error or parsing issue. The correct value should be "Aggarwal", which is a common Indian surname. 
    SELECT
        "response_id",
        "distribution_channel",
        "survey_duration_seconds",
        "response_end_datetime",
        "is_survey_completed",
        "respondent_ip_address",
        "last_modified_datetime",
        "respondent_latitude",
        "respondent_longitude",
        "survey_progress_percentage",
        "recipient_email",
        "recipient_first_name",
        CASE
            WHEN "recipient_last_name" = '''Aggarwak=l''' THEN '''Aggarwal'''
            ELSE "recipient_last_name"
        END AS "recipient_last_name",
        "response_recorded_datetime",
        "response_start_datetime",
        "response_status",
        "survey_id",
        "respondent_language"
    FROM "survey_response_projected_renamed"
),

"survey_response_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- is_survey_completed: from INT to BIT
    -- last_modified_datetime: from VARCHAR to TIMESTAMP
    -- response_end_datetime: from VARCHAR to TIMESTAMP
    -- response_recorded_datetime: from VARCHAR to TIMESTAMP
    -- response_start_datetime: from VARCHAR to TIMESTAMP
    SELECT
        "response_id",
        "distribution_channel",
        "survey_duration_seconds",
        "respondent_ip_address",
        "respondent_latitude",
        "respondent_longitude",
        "survey_progress_percentage",
        "recipient_email",
        "recipient_first_name",
        "recipient_last_name",
        "response_status",
        "survey_id",
        "respondent_language",
        CAST("is_survey_completed" AS BIT) AS "is_survey_completed",
        CAST("last_modified_datetime" AS TIMESTAMP) AS "last_modified_datetime",
        CAST("response_end_datetime" AS TIMESTAMP) AS "response_end_datetime",
        CAST("response_recorded_datetime" AS TIMESTAMP) AS "response_recorded_datetime",
        CAST("response_start_datetime" AS TIMESTAMP) AS "response_start_datetime"
    FROM "survey_response_projected_renamed_cleaned"
),

"survey_response_projected_renamed_cleaned_casted_missing_handled" AS (
    -- Handling missing values: There are 4 columns with unacceptable missing values
    -- recipient_first_name has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- recipient_last_name has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- respondent_latitude has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- respondent_longitude has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "response_id",
        "distribution_channel",
        "survey_duration_seconds",
        "respondent_ip_address",
        "survey_progress_percentage",
        "recipient_email",
        "recipient_first_name",
        "recipient_last_name",
        "response_status",
        "survey_id",
        "respondent_language",
        "is_survey_completed",
        "last_modified_datetime",
        "response_end_datetime",
        "response_recorded_datetime",
        "response_start_datetime"
    FROM "survey_response_projected_renamed_cleaned_casted"
)

-- COCOON BLOCK END
SELECT * FROM "survey_response_projected_renamed_cleaned_casted_missing_handled"

stg_survey_response.yml (Document the table)

version: 2
models:
- name: stg_survey_response
  description: The table is about survey responses. It contains details like response
    ID, distribution channel, duration, start/end dates, IP address, respondent information,
    survey progress, and survey ID. Each row represents a single survey response with
    associated metadata. The table tracks both completed and incomplete survey attempts
    across different channels (email, anonymous) with timestamps and respondent details
    when available.
  columns:
  - name: response_id
    description: Unique identifier for the survey response
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each survey response.
        For this table, each row is for a single survey response. response_id is unique
        across rows.
  - name: distribution_channel
    description: Method used to distribute the survey
    tests:
    - not_null
    - accepted_values:
        values:
        - anonymous
        - email
        - phone
        - mail
        - in-person
        - online
        - social media
  - name: survey_duration_seconds
    description: Time taken to complete the survey in seconds
    tests:
    - not_null
  - name: respondent_ip_address
    description: IP address of the respondent
    tests:
    - not_null
  - name: survey_progress_percentage
    description: Percentage of survey completed
    tests:
    - not_null
  - name: recipient_email
    description: Email address of the survey recipient
    cocoon_meta:
      missing_acceptable: Not applicable for anonymous distribution channels
  - name: recipient_first_name
    description: First name of the survey recipient
    tests:
    - not_null
  - name: recipient_last_name
    description: Last name of the survey recipient
    tests:
    - not_null
  - name: response_status
    description: Status of the survey response
    tests:
    - not_null
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
  - name: respondent_language
    description: Language used by the respondent
    tests:
    - not_null
    - accepted_values:
        values:
        - EN
        - ES
        - FR
        - DE
        - IT
        - PT
        - RU
        - ZH
        - JA
        - KO
        - AR
        - HI
        - BN
        - UR
        - FA
        - TR
        - VI
        - TH
        - ID
        - NL
        - PL
        - SV
        - FI
        - 'NO'
        - DA
  - name: is_survey_completed
    description: Indicates if the survey was completed (0=No, 1=Yes)
    tests:
    - not_null
  - name: last_modified_datetime
    description: Date and time of last modification to the response
    tests:
    - not_null
  - name: response_end_datetime
    description: Date and time when the survey response ended
    tests:
    - not_null
  - name: response_recorded_datetime
    description: Date and time when the response was recorded
    tests:
    - not_null
  - name: response_start_datetime
    description: Date and time when the survey response started
    tests:
    - not_null

stg_directory_contact (first 100 rows)

directory_id id creation_date directory_unsubscribed email email_domain first_name last_modified last_name phone
0 POOL_vw3p9mEAf692Aql CID_0VVnnbsmkvMGyrj 2020-07-09 08:32:46.467000 False abc@abc.com abc.com S 2020-08-04 09:53:30.138000 M 3232323.0
1 POOL_vw3p9mEAf692Aql CID_0pkCodbTgpX4awd 2020-06-24 15:44:46.380000 False abc1@abc1.com abc1.com Saksham 2020-06-24 15:44:46.380000 Malhotra NaN
2 POOL_vw3p9mEAf692Aql CID_d060swLlNUKGa1L 2020-06-24 15:45:38.798000 False abc2@abc2.com abc2.com Saksham 2020-06-24 15:45:38.798000 None NaN
3 POOL_vw3p9mEAf692Aql CID_6fEsreE9bM67AVv 2020-06-24 15:49:29.330000 False abc3@abc3.com abc3.com None 2020-06-24 15:49:29.330000 Malhotra NaN
4 POOL_vw3p9mEAf692Aql CID_71bfBUYt4qS6Fsp 2020-08-06 05:50:32.126000 False connect2me@gmail.com gmail.com me k 2020-08-07 09:48:20.651000 Aggarwal NaN

stg_directory_contact.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:42:00.767074+00:00
WITH 
"directory_contact_projected" AS (
    -- Projection: Selecting 17 out of 18 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "directory_id",
        "id",
        "creation_date",
        "directory_unsubscribe_date",
        "directory_unsubscribed",
        "email",
        "email_domain",
        "embedded_data_last_active_time",
        "embedded_data_last_response_date",
        "embedded_data_login_date",
        "ext_ref",
        "first_name",
        "language_",
        "last_modified",
        "last_name",
        "phone",
        "write_blanks"
    FROM "memory"."main"."directory_contact"
),

"directory_contact_projected_missing_handled" AS (
    -- Handling missing values: There are 10 columns with unacceptable missing values
    -- directory_unsubscribe_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- embedded_data_last_active_time has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- embedded_data_last_response_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- embedded_data_login_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- ext_ref has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- first_name has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- language_ has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- last_name has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- phone has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- write_blanks has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "directory_id",
        "id",
        "creation_date",
        "directory_unsubscribed",
        "email",
        "email_domain",
        "first_name",
        "last_modified",
        "last_name",
        "phone"
    FROM "directory_contact_projected"
)

-- COCOON BLOCK END
SELECT * FROM "directory_contact_projected_missing_handled"

stg_directory_contact.yml (Document the table)

version: 2
models:
- name: stg_directory_contact
  description: Please summarize the table
  columns:
  - name: directory_id
    description: directory_id
    tests:
    - not_null
  - name: id
    description: id
    tests:
    - not_null
  - name: creation_date
    description: creation_date
    tests:
    - not_null
  - name: directory_unsubscribed
    description: directory_unsubscribed
    tests:
    - not_null
  - name: email
    description: email
    tests:
    - not_null
  - name: email_domain
    description: email_domain
    tests:
    - not_null
  - name: first_name
    description: first_name
    tests:
    - not_null
  - name: last_modified
    description: last_modified
    tests:
    - not_null
  - name: last_name
    description: last_name
    tests:
    - not_null
  - name: phone
    description: phone
    tests:
    - not_null

stg_contact_mailing_list_membership (first 100 rows)

contact_lookup_id contact_id directory_id mailing_list_id name owner_id unsubscribed
0 CGC_BH2ZlyV81xz5w1R CID_0VVnnbsmkvMGyrj POOL_vw3p9mEAf692Aql CG_T1Ppmm6s8WF9uRr TestMl UR_cYdNGsrFUemlDbD False
1 CGC_bhc0s38kcQd67LH CID_0VVnnbsmkvMGyrj POOL_vw3p9mEAf692Aql CG_3GrJHnIL1pkJw1P TodayML UR_cYdNGsrFUemlDbD False
2 CGC_kVu3GCgKacGvOKf CID_d060swLlNUKGa1L POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD False
3 CGC_RSisAWjzoEqI8Lr CID_0pkCodbTgpX4awd POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD False
4 CGC_TReR1yFov0IUarm CID_6fEsreE9bM67AVv POOL_vw3p9mEAf692Aql CG_2B4IIbgp8ZC0NVn MailAgain UR_cYdNGsrFUemlDbD False

stg_contact_mailing_list_membership.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:53.401328+00:00
WITH 
"contact_mailing_list_membership_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "contact_lookup_id",
        "contact_id",
        "directory_id",
        "mailing_list_id",
        "name",
        "owner_id",
        "unsubscribe_date",
        "unsubscribed"
    FROM "memory"."main"."contact_mailing_list_membership"
),

"contact_mailing_list_membership_projected_missing_handled" AS (
    -- Handling missing values: There are 1 columns with unacceptable missing values
    -- unsubscribe_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "contact_lookup_id",
        "contact_id",
        "directory_id",
        "mailing_list_id",
        "name",
        "owner_id",
        "unsubscribed"
    FROM "contact_mailing_list_membership_projected"
)

-- COCOON BLOCK END
SELECT * FROM "contact_mailing_list_membership_projected_missing_handled"

stg_contact_mailing_list_membership.yml (Document the table)

version: 2
models:
- name: stg_contact_mailing_list_membership
  description: Please summarize the table
  columns:
  - name: contact_lookup_id
    description: contact_lookup_id
    tests:
    - not_null
  - name: contact_id
    description: contact_id
    tests:
    - not_null
  - name: directory_id
    description: directory_id
    tests:
    - not_null
  - name: mailing_list_id
    description: mailing_list_id
    tests:
    - not_null
  - name: name
    description: name
    tests:
    - not_null
  - name: owner_id
    description: owner_id
    tests:
    - not_null
  - name: unsubscribed
    description: unsubscribed
    tests:
    - not_null

stg_survey (first 100 rows)

survey_id is_deleted brand_base_url brand_id active_response_set autoadvance autoadvance_pages autofocus show_back_button prevent_ballot_stuffing highlight_questions new_scoring_enabled next_button_text no_index_enabled partial_data_allowed previous_button_text save_and_continue_enabled secure_response_files_enabled skin_library_name skin_question_width skin_type survey_language survey_protection_type survey_termination_message survey_title owner_id project_category project_type show_question_scoring show_survey_scoring survey_name survey_status custom_styles last_activated last_modified survey_creation_date survey_expiration_date survey_skin_settings
0 SV_cARsuFqXn9bksDk False https://fivetrandev.ca1.qualtrics.com fivetrandev RS_3zccqsZVQTIwYfQ False False False False True off 1 Yes +1 week True True Qualtrics 700.0 MQ EN PublicSurvey DefaultMessage None UR_cYdNGsrFUemlDbD CX CustomerSatisfactionScore False False TestSurvey_FilterAPI Inactive {"altRowStyles":{"selector":".Skin .ReadableAlt, .Skin .LightBG","styles":{"backgroundColor":null}},"borderStyles":{"selector":".Skin .CS .horizontalbar table.sliderGrid tr.xlabel th, .Skin .CS .horizontalbar .ylabel, .Skin .CS .horizontalbar table.sliderGrid tr td.value, .Skin .Slider .horizontalbar table.sliderGrid tr.xlabel th, .Skin .BorderColor, .Skin .DarkBorderColor, .Skin .Slider .horizontalbar table.sliderGrid tr td.value, .Skin thead, .Skin .GAP .ChoiceStructure thead th ","styles":{"borderColor":null}},"choiceStyles":{"selector":".Skin .horizontalbar th.ylabel, .Skin .Matrix table td, .Skin .Matrix table th, .Skin .Matrix table thead th, .Skin .Matrix table thead td, .Skin .PGR .DragAndDrop .Items label, .Skin .QuestionBody ul.ChoiceStructure, .Skin .PGR .DragAndDrop .Group ul, .Skin .PGR .DragAndDrop .Group h2, .Skin .PGR .DragAndDrop .Items ul, .Skin .PGR .DragAndDrop .Items h2, .Skin .DD .QuestionBody table.ChoiceStructure, .Skin .PGR .DragAndDrop .NoColumns td.groupsContainerTd div ul, .Skin .PGR .DragAndDrop .NoColumns td.groupsContainerTd div h2, .Skin .SBS thead th, .Skin .SBS td, .Skin .SBS .Answers td, .Skin .SBS .Answers th, .SBS table.ChoiceStructure, .Skin .horizontalbar thead table.LabelDescriptions tr td, .Skin .horizontalbar thead tr td.NotApplicable, .QuestionBody th, .Skin .QuestionBody .MC .MAVR label, .Skin .QuestionBody .MC .SAVR label, .Skin .SBS thead th, .Skin .SBS td, .Skin .RO .DND ul li, .Skin .reg, .SkinInner .ChoiceStructure, .Skin thead, .Skin .GAP .ChoiceStructure thead th, .Skin .GAP .ChoiceStructure .c4, .Skin .GAP .ChoiceStructure th, .Skin .GAP .ChoiceStructure .WhyText, .SkinInner .ChoiceStructure .LightBG, .Skin .QuestionOuter.Highlight .Inner .reg, .Skin .QuestionOuter.Highlight .Inner .alt, .Skin .QuestionOuter.Highlight .Inner .ReadableAlt ","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}},"customCSS":null,"errorStyles":{"selector":".Skin .ValidationError","styles":{"backgroundColor":null,"borderColor":null,"color":null}},"errorStylesBG":{"selector":".Skin .HeaderValidationError, .Skin .HeaderValidationError li a","styles":{"backgroundColor":null,"color":null}},"footerStyles":{"selector":".Skin #Footer","styles":{"color":null}},"headerStyles":{"selector":".Skin div#Header","styles":{"color":null}},"highlightStyles":{"selector":".Skin .QuestionOuter.Highlight .Inner, .Skin .QuestionOuter.Highlight .Inner .reg, .Skin .QuestionOuter.Highlight .Inner .alt, .Skin .QuestionOuter.Highlight .Inner .ReadableAlt","styles":{"backgroundColor":null}},"highlightStylesAlt":{"selector":".Skin .QuestionOuter.Highlight .Inner .ReadableAlt, .Skin .QuestionOuter.Highlight .Inner .LightBG","styles":{"backgroundColor":null}},"pageStyles":{"selector":"#SurveyEngineBody","styles":{"backgroundColor":null}},"questionSeparatorStyles":{"selector":".Skin .Separator","styles":{"backgroundColor":null,"display":null}},"questionStyles":{"selector":".Skin .QuestionText","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}},"textEntryStyles":{"selector":".Skin select, .Skin .InputText","styles":{"color":null}},"textStyles":{"selector":".SkinInner","styles":{"color":null,"fontFamily":null,"fontSize":null,"fontStyle":null,"fontWeight":null,"textDecoration":null}}} NaT 2022-07-15 15:31:11 2022-07-15 15:30:27 NaT "xmskin"
1 SV_08GFzuA5eDhxx53 False https://fivetrandev.ca1.qualtrics.com fivetrandev RS_3XaYj2HxzK4NTs9 NaN NaN NaN False False None 1 Yes +1 week True True fivetrandev NaN templated EN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions UR_cYdNGsrFUemlDbD CX None False False Untitled Project Inactive None NaT 2020-07-01 10:25:18 2020-07-01 10:25:11 NaT {"brandingId":null,"templateId":"*2014","overrides":null}
2 SV_8GsOFOOKnw9KAmx False https://fivetrandev.ca1.qualtrics.com fivetrandev RS_bjbsBC4lfPOyEhT NaN NaN NaN False False None 1 Yes +1 week True True fivetrandev NaN templated EN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions UR_cYdNGsrFUemlDbD CX None False False TestSurvey2 Active None 2020-06-30 11:24:41 2020-07-01 08:57:47 2020-06-30 11:22:54 NaT {"brandingId":null,"templateId":"*2014","overrides":null}
3 SV_8vvMJsA88y5AZxA False https://fivetrandev.ca1.qualtrics.com fivetrandev RS_38eltRqQKKndZSS NaN NaN NaN False False None 1 Yes +1 week True True fivetrandev NaN templated EN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions UR_cYdNGsrFUemlDbD CORE None False False SurveyWithMAQ1 Inactive None NaT 2021-02-04 10:59:03 2021-02-04 10:55:54 NaT {"brandingId":null,"templateId":"*2014","overrides":null}
4 SV_1XGe0Xykmeo0lyB False https://fivetrandev.ca1.qualtrics.com fivetrandev RS_1MNRbPFAJML4L0F NaN NaN NaN False False None 1 Yes +1 week True True fivetrandev NaN templated EN PublicSurvey DefaultMessage Online Survey Software | Qualtrics Survey Solutions UR_cYdNGsrFUemlDbD CORE None False False Untitled Project Inactive None NaT 2020-08-06 05:38:26 2020-08-06 05:38:07 NaT {"brandingId":null,"templateId":"*2014","overrides":null}

stg_survey.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:56:59.737905+00:00
WITH 
"survey_projected" AS (
    -- Projection: Selecting 73 out of 74 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "auto_scoring_category",
        "brand_base_url",
        "brand_id",
        "bundle_short_name",
        "composition_type",
        "creator_id",
        "default_scoring_category",
        "division_id",
        "is_active",
        "last_accessed",
        "last_activated",
        "last_modified",
        "option_active_response_set",
        "option_anonymize_response",
        "option_auto_confirm_start",
        "option_autoadvance",
        "option_autoadvance_pages",
        "option_autofocus",
        "option_available_languages",
        "option_back_button",
        "option_ballot_box_stuffing_prevention",
        "option_collect_geo_location",
        "option_confirm_start",
        "option_custom_styles",
        "option_email_thank_you",
        "option_eosredirect_url",
        "option_highlight_questions",
        "option_inactive_survey",
        "option_new_scoring",
        "option_next_button",
        "option_no_index",
        "option_page_transition",
        "option_partial_data",
        "option_partial_data_close_after",
        "option_password_protection",
        "option_previous_button",
        "option_progress_bar_display",
        "option_protect_selection_ids",
        "option_recaptcha_v_3",
        "option_referer_check",
        "option_referer_url",
        "option_relevant_id",
        "option_relevant_idlockout_period",
        "option_response_summary",
        "option_save_and_continue",
        "option_secure_response_files",
        "option_show_export_tags",
        "option_skin",
        "option_skin_library",
        "option_skin_question_width",
        "option_skin_type",
        "option_survey_creation_date",
        "option_survey_expiration",
        "option_survey_language",
        "option_survey_meta_description",
        "option_survey_name",
        "option_survey_protection",
        "option_survey_termination",
        "option_survey_title",
        "option_validate_message",
        "owner_id",
        "project_category",
        "project_type",
        "registry_sha",
        "registry_version",
        "schema_version",
        "scoring_summary_after_questions",
        "scoring_summary_after_survey",
        "scoring_summary_category",
        "survey_name",
        "survey_status"
    FROM "memory"."main"."survey"
),

"survey_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> survey_id
    -- _fivetran_deleted -> is_deleted
    -- option_active_response_set -> active_response_set
    -- option_anonymize_response -> anonymize_response
    -- option_auto_confirm_start -> auto_confirm_start
    -- option_autoadvance -> autoadvance
    -- option_autoadvance_pages -> autoadvance_pages
    -- option_autofocus -> autofocus
    -- option_available_languages -> available_languages
    -- option_back_button -> show_back_button
    -- option_ballot_box_stuffing_prevention -> prevent_ballot_stuffing
    -- option_collect_geo_location -> collect_geolocation
    -- option_confirm_start -> confirm_start
    -- option_custom_styles -> custom_styles
    -- option_email_thank_you -> send_thank_you_email
    -- option_eosredirect_url -> end_of_survey_redirect_url
    -- option_highlight_questions -> highlight_questions
    -- option_inactive_survey -> inactive_survey_behavior
    -- option_new_scoring -> new_scoring_enabled
    -- option_next_button -> next_button_text
    -- option_no_index -> no_index_enabled
    -- option_page_transition -> page_transition_effect
    -- option_partial_data -> partial_data_allowed
    -- option_partial_data_close_after -> partial_data_close_time
    -- option_password_protection -> password_protected
    -- option_previous_button -> previous_button_text
    -- option_progress_bar_display -> progress_bar_display
    -- option_protect_selection_ids -> protect_selection_ids
    -- option_recaptcha_v_3 -> recaptcha_v3_settings
    -- option_referer_check -> referer_check_enabled
    -- option_referer_url -> allowed_referer_url
    -- option_relevant_id -> survey_relevant_id
    -- option_relevant_idlockout_period -> relevant_id_lockout_period
    -- option_response_summary -> response_summary_enabled
    -- option_save_and_continue -> save_and_continue_enabled
    -- option_secure_response_files -> secure_response_files_enabled
    -- option_show_export_tags -> show_export_tags
    -- option_skin -> survey_skin_settings
    -- option_skin_library -> skin_library_name
    -- option_skin_question_width -> skin_question_width
    -- option_skin_type -> skin_type
    -- option_survey_creation_date -> survey_creation_date
    -- option_survey_expiration -> survey_expiration_date
    -- option_survey_language -> survey_language
    -- option_survey_meta_description -> survey_meta_description
    -- option_survey_protection -> survey_protection_type
    -- option_survey_termination -> survey_termination_message
    -- option_survey_title -> survey_title
    -- option_validate_message -> validation_message
    -- scoring_summary_after_questions -> show_question_scoring
    -- scoring_summary_after_survey -> show_survey_scoring
    -- scoring_summary_category -> scoring_category
    SELECT 
        "id" AS "survey_id",
        "_fivetran_deleted" AS "is_deleted",
        "auto_scoring_category",
        "brand_base_url",
        "brand_id",
        "bundle_short_name",
        "composition_type",
        "creator_id",
        "default_scoring_category",
        "division_id",
        "is_active",
        "last_accessed",
        "last_activated",
        "last_modified",
        "option_active_response_set" AS "active_response_set",
        "option_anonymize_response" AS "anonymize_response",
        "option_auto_confirm_start" AS "auto_confirm_start",
        "option_autoadvance" AS "autoadvance",
        "option_autoadvance_pages" AS "autoadvance_pages",
        "option_autofocus" AS "autofocus",
        "option_available_languages" AS "available_languages",
        "option_back_button" AS "show_back_button",
        "option_ballot_box_stuffing_prevention" AS "prevent_ballot_stuffing",
        "option_collect_geo_location" AS "collect_geolocation",
        "option_confirm_start" AS "confirm_start",
        "option_custom_styles" AS "custom_styles",
        "option_email_thank_you" AS "send_thank_you_email",
        "option_eosredirect_url" AS "end_of_survey_redirect_url",
        "option_highlight_questions" AS "highlight_questions",
        "option_inactive_survey" AS "inactive_survey_behavior",
        "option_new_scoring" AS "new_scoring_enabled",
        "option_next_button" AS "next_button_text",
        "option_no_index" AS "no_index_enabled",
        "option_page_transition" AS "page_transition_effect",
        "option_partial_data" AS "partial_data_allowed",
        "option_partial_data_close_after" AS "partial_data_close_time",
        "option_password_protection" AS "password_protected",
        "option_previous_button" AS "previous_button_text",
        "option_progress_bar_display" AS "progress_bar_display",
        "option_protect_selection_ids" AS "protect_selection_ids",
        "option_recaptcha_v_3" AS "recaptcha_v3_settings",
        "option_referer_check" AS "referer_check_enabled",
        "option_referer_url" AS "allowed_referer_url",
        "option_relevant_id" AS "survey_relevant_id",
        "option_relevant_idlockout_period" AS "relevant_id_lockout_period",
        "option_response_summary" AS "response_summary_enabled",
        "option_save_and_continue" AS "save_and_continue_enabled",
        "option_secure_response_files" AS "secure_response_files_enabled",
        "option_show_export_tags" AS "show_export_tags",
        "option_skin" AS "survey_skin_settings",
        "option_skin_library" AS "skin_library_name",
        "option_skin_question_width" AS "skin_question_width",
        "option_skin_type" AS "skin_type",
        "option_survey_creation_date" AS "survey_creation_date",
        "option_survey_expiration" AS "survey_expiration_date",
        "option_survey_language" AS "survey_language",
        "option_survey_meta_description" AS "survey_meta_description",
        "option_survey_name",
        "option_survey_protection" AS "survey_protection_type",
        "option_survey_termination" AS "survey_termination_message",
        "option_survey_title" AS "survey_title",
        "option_validate_message" AS "validation_message",
        "owner_id",
        "project_category",
        "project_type",
        "registry_sha",
        "registry_version",
        "schema_version",
        "scoring_summary_after_questions" AS "show_question_scoring",
        "scoring_summary_after_survey" AS "show_survey_scoring",
        "scoring_summary_category" AS "scoring_category",
        "survey_name",
        "survey_status"
    FROM "survey_projected"
),

"survey_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- allowed_referer_url: from DECIMAL to VARCHAR
    -- anonymize_response: from DECIMAL to VARCHAR
    -- auto_confirm_start: from DECIMAL to VARCHAR
    -- auto_scoring_category: from DECIMAL to VARCHAR
    -- available_languages: from DECIMAL to VARCHAR
    -- bundle_short_name: from DECIMAL to VARCHAR
    -- collect_geolocation: from DECIMAL to VARCHAR
    -- composition_type: from DECIMAL to VARCHAR
    -- confirm_start: from DECIMAL to VARCHAR
    -- creator_id: from DECIMAL to VARCHAR
    -- custom_styles: from VARCHAR to JSON
    -- default_scoring_category: from DECIMAL to VARCHAR
    -- division_id: from DECIMAL to VARCHAR
    -- end_of_survey_redirect_url: from DECIMAL to VARCHAR
    -- inactive_survey_behavior: from DECIMAL to VARCHAR
    -- is_active: from DECIMAL to VARCHAR
    -- last_accessed: from DECIMAL to TIMESTAMP
    -- last_activated: from VARCHAR to TIMESTAMP
    -- last_modified: from VARCHAR to TIMESTAMP
    -- option_survey_name: from DECIMAL to VARCHAR
    -- page_transition_effect: from DECIMAL to VARCHAR
    -- partial_data_close_time: from DECIMAL to VARCHAR
    -- password_protected: from DECIMAL to VARCHAR
    -- progress_bar_display: from DECIMAL to VARCHAR
    -- protect_selection_ids: from DECIMAL to VARCHAR
    -- recaptcha_v3_settings: from DECIMAL to VARCHAR
    -- referer_check_enabled: from DECIMAL to VARCHAR
    -- registry_sha: from DECIMAL to VARCHAR
    -- registry_version: from DECIMAL to VARCHAR
    -- relevant_id_lockout_period: from DECIMAL to VARCHAR
    -- response_summary_enabled: from DECIMAL to VARCHAR
    -- schema_version: from DECIMAL to VARCHAR
    -- scoring_category: from DECIMAL to VARCHAR
    -- send_thank_you_email: from DECIMAL to VARCHAR
    -- show_export_tags: from DECIMAL to VARCHAR
    -- survey_creation_date: from VARCHAR to TIMESTAMP
    -- survey_expiration_date: from DECIMAL to TIMESTAMP
    -- survey_meta_description: from DECIMAL to VARCHAR
    -- survey_relevant_id: from DECIMAL to VARCHAR
    -- survey_skin_settings: from VARCHAR to JSON
    -- validation_message: from DECIMAL to VARCHAR
    SELECT
        "survey_id",
        "is_deleted",
        "brand_base_url",
        "brand_id",
        "active_response_set",
        "autoadvance",
        "autoadvance_pages",
        "autofocus",
        "show_back_button",
        "prevent_ballot_stuffing",
        "highlight_questions",
        "new_scoring_enabled",
        "next_button_text",
        "no_index_enabled",
        "partial_data_allowed",
        "previous_button_text",
        "save_and_continue_enabled",
        "secure_response_files_enabled",
        "skin_library_name",
        "skin_question_width",
        "skin_type",
        "survey_language",
        "survey_protection_type",
        "survey_termination_message",
        "survey_title",
        "owner_id",
        "project_category",
        "project_type",
        "show_question_scoring",
        "show_survey_scoring",
        "survey_name",
        "survey_status",
        CAST("allowed_referer_url" AS VARCHAR) AS "allowed_referer_url",
        CAST("anonymize_response" AS VARCHAR) AS "anonymize_response",
        CAST("auto_confirm_start" AS VARCHAR) AS "auto_confirm_start",
        CAST("auto_scoring_category" AS VARCHAR) AS "auto_scoring_category",
        CAST("available_languages" AS VARCHAR) AS "available_languages",
        CAST("bundle_short_name" AS VARCHAR) AS "bundle_short_name",
        CAST("collect_geolocation" AS VARCHAR) AS "collect_geolocation",
        CAST("composition_type" AS VARCHAR) AS "composition_type",
        CAST("confirm_start" AS VARCHAR) AS "confirm_start",
        CAST("creator_id" AS VARCHAR) AS "creator_id",
        CAST("custom_styles" AS JSON) AS "custom_styles",
        CAST("default_scoring_category" AS VARCHAR) AS "default_scoring_category",
        CAST("division_id" AS VARCHAR) AS "division_id",
        CAST("end_of_survey_redirect_url" AS VARCHAR) AS "end_of_survey_redirect_url",
        CAST("inactive_survey_behavior" AS VARCHAR) AS "inactive_survey_behavior",
        CAST("is_active" AS VARCHAR) AS "is_active",
        CAST("last_accessed" AS TIMESTAMP) AS "last_accessed",
        CAST("last_activated" AS TIMESTAMP) AS "last_activated",
        CAST("last_modified" AS TIMESTAMP) AS "last_modified",
        CAST("option_survey_name" AS VARCHAR) AS "option_survey_name",
        CAST("page_transition_effect" AS VARCHAR) AS "page_transition_effect",
        CAST("partial_data_close_time" AS VARCHAR) AS "partial_data_close_time",
        CAST("password_protected" AS VARCHAR) AS "password_protected",
        CAST("progress_bar_display" AS VARCHAR) AS "progress_bar_display",
        CAST("protect_selection_ids" AS VARCHAR) AS "protect_selection_ids",
        CAST("recaptcha_v3_settings" AS VARCHAR) AS "recaptcha_v3_settings",
        CAST("referer_check_enabled" AS VARCHAR) AS "referer_check_enabled",
        CAST("registry_sha" AS VARCHAR) AS "registry_sha",
        CAST("registry_version" AS VARCHAR) AS "registry_version",
        CAST("relevant_id_lockout_period" AS VARCHAR) AS "relevant_id_lockout_period",
        CAST("response_summary_enabled" AS VARCHAR) AS "response_summary_enabled",
        CAST("schema_version" AS VARCHAR) AS "schema_version",
        CAST("scoring_category" AS VARCHAR) AS "scoring_category",
        CAST("send_thank_you_email" AS VARCHAR) AS "send_thank_you_email",
        CAST("show_export_tags" AS VARCHAR) AS "show_export_tags",
        CAST("survey_creation_date" AS TIMESTAMP) AS "survey_creation_date",
        CAST("survey_expiration_date" AS TIMESTAMP) AS "survey_expiration_date",
        CAST("survey_meta_description" AS VARCHAR) AS "survey_meta_description",
        CAST("survey_relevant_id" AS VARCHAR) AS "survey_relevant_id",
        CAST("survey_skin_settings" AS JSON) AS "survey_skin_settings",
        CAST("validation_message" AS VARCHAR) AS "validation_message"
    FROM "survey_projected_renamed"
),

"survey_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 39 columns with unacceptable missing values
    -- allowed_referer_url has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- anonymize_response has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- auto_confirm_start has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- auto_scoring_category has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- available_languages has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- bundle_short_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- collect_geolocation has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- composition_type has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- confirm_start has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- creator_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- custom_styles has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- default_scoring_category has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- division_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- end_of_survey_redirect_url has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- inactive_survey_behavior has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- is_active has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- last_accessed has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- last_activated has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- option_survey_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- page_transition_effect has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- partial_data_close_time has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- password_protected has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- progress_bar_display has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- protect_selection_ids has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- recaptcha_v3_settings has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- referer_check_enabled has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- registry_sha has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- registry_version has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- relevant_id_lockout_period has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- response_summary_enabled has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- schema_version has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- scoring_category has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- send_thank_you_email has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- show_export_tags has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- skin_question_width has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- survey_meta_description has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- survey_relevant_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- survey_title has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- validation_message has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "survey_id",
        "is_deleted",
        "brand_base_url",
        "brand_id",
        "active_response_set",
        "autoadvance",
        "autoadvance_pages",
        "autofocus",
        "show_back_button",
        "prevent_ballot_stuffing",
        "highlight_questions",
        "new_scoring_enabled",
        "next_button_text",
        "no_index_enabled",
        "partial_data_allowed",
        "previous_button_text",
        "save_and_continue_enabled",
        "secure_response_files_enabled",
        "skin_library_name",
        "skin_question_width",
        "skin_type",
        "survey_language",
        "survey_protection_type",
        "survey_termination_message",
        "survey_title",
        "owner_id",
        "project_category",
        "project_type",
        "show_question_scoring",
        "show_survey_scoring",
        "survey_name",
        "survey_status",
        "custom_styles",
        "last_activated",
        "last_modified",
        "survey_creation_date",
        "survey_expiration_date",
        "survey_skin_settings"
    FROM "survey_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "survey_projected_renamed_casted_missing_handled"

stg_survey.yml (Document the table)

version: 2
models:
- name: stg_survey
  description: The table is about surveys. Each row represents a single survey with
    various configuration options. It includes details like survey ID, brand information,
    creator ID, response settings, scoring options, language preferences, and survey
    status. The table also contains metadata such as creation date, last modified
    date, and various customization options for the survey's appearance and behavior.
  columns:
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each survey. For
        this table, each row represents a unique survey, and the survey_id appears
        to be a unique string for each row. This column is likely to be unique across
        all rows and can serve as a primary key for the table.
  - name: is_deleted
    description: Indicates if the record was deleted
    tests:
    - not_null
  - name: brand_base_url
    description: Base URL for the brand's Qualtrics instance
    tests:
    - not_null
  - name: brand_id
    description: Unique identifier for the brand
    tests:
    - not_null
  - name: active_response_set
    description: Active response set option
    tests:
    - not_null
  - name: autoadvance
    description: Option to automatically advance questions
    cocoon_meta:
      missing_acceptable: Feature may not be enabled for all surveys
  - name: autoadvance_pages
    description: Option to automatically advance pages
    cocoon_meta:
      missing_acceptable: Feature may not be enabled for all surveys
  - name: autofocus
    description: Option to automatically focus on first question
    cocoon_meta:
      missing_acceptable: Feature may not be enabled for all surveys
  - name: show_back_button
    description: Option to show back button
    tests:
    - not_null
  - name: prevent_ballot_stuffing
    description: Option to prevent ballot box stuffing
    tests:
    - not_null
  - name: highlight_questions
    description: Option to highlight questions
    tests:
    - accepted_values:
        values:
        - 'off'
        - 'on'
    cocoon_meta:
      missing_acceptable: Feature may not be enabled for all surveys
  - name: new_scoring_enabled
    description: New scoring system enabled
    tests:
    - not_null
  - name: next_button_text
    description: Text or symbol for next button
    tests:
    - not_null
    - accepted_values:
        values:
        - "\u2192"
        - '>'
        - '>>'
        - "\u23F5"
        - "\u21D2"
        - "\u2794"
        - "\u279C"
        - "\u25B6"
        - Next
        - Forward
        - Continue
        - Proceed
        - Advance
        - " \u2192 "
  - name: no_index_enabled
    description: Prevent search engine indexing
    tests:
    - not_null
    - accepted_values:
        values:
        - 'Yes'
        - 'No'
  - name: partial_data_allowed
    description: Allow partial data submission
    tests:
    - not_null
  - name: previous_button_text
    description: Text or symbol for previous button
    tests:
    - not_null
    - accepted_values:
        values:
        - "\u2190"
        - "\u25C4"
        - "\xAB"
        - "\u27E8"
        - "\u2B9C"
        - "\u2BC7"
        - Previous
        - Prev
        - Back
        - <
        - '<<'
        - "\u276E"
        - "\u23EA"
        - "\U0001F844"
        - "\U0001F878"
        - " \u2190 "
  - name: save_and_continue_enabled
    description: Save and continue feature enabled
    tests:
    - not_null
  - name: secure_response_files_enabled
    description: Secure response files enabled
    tests:
    - not_null
  - name: skin_library_name
    description: Skin library name
    tests:
    - not_null
  - name: skin_question_width
    description: Width of question in skin
    tests:
    - not_null
  - name: skin_type
    description: Type of skin used
    tests:
    - not_null
    - accepted_values:
        values:
        - MQ
        - Bootstrap
        - Foundation
        - Bulma
        - Tailwind
        - Material
        - Semantic UI
        - UIkit
        - Skeleton
        - Pure
        - Milligram
        - Materialize
        - Primer
        - Spectre
        - Picnic
        - Chota
        - Water.css
        - Sakura
        - MVP.css
        - Mini.css
        - templated
  - name: survey_language
    description: Survey language
    tests:
    - not_null
    - accepted_values:
        values:
        - EN
        - ES
        - FR
        - DE
        - IT
        - PT
        - RU
        - ZH
        - JA
        - KO
        - AR
        - HI
        - BN
        - PA
        - TE
        - MR
        - TA
        - UR
        - GU
        - KN
  - name: survey_protection_type
    description: Survey protection type
    tests:
    - not_null
    - accepted_values:
        values:
        - PublicSurvey
        - PasswordProtected
        - InvitationOnly
        - IPRestricted
        - TimeRestricted
        - GeographicallyRestricted
        - SingleResponse
        - AnonymousResponses
        - EncryptedResponses
        - CaptchaProtected
  - name: survey_termination_message
    description: Survey termination message
    tests:
    - not_null
  - name: survey_title
    description: Title displayed for the survey
    tests:
    - not_null
  - name: owner_id
    description: Unique identifier of the survey owner
    tests:
    - not_null
  - name: project_category
    description: Category of the survey project
    tests:
    - not_null
    - accepted_values:
        values:
        - CX
        - CORE
        - NPS
        - CSAT
        - Employee Experience
        - Market Research
        - Product Feedback
        - Brand Awareness
        - Customer Loyalty
        - Voice of Customer
        - User Experience
        - Website Feedback
        - Event Feedback
        - Business-to-Business
        - Academic Research
  - name: project_type
    description: Type of the survey project
    tests:
    - accepted_values:
        values:
        - CustomerSatisfactionScore
        - NetPromoterScore
        - ProductFeedback
        - EmployeeSatisfaction
        - MarketResearch
        - BrandAwareness
        - EventFeedback
        - WebsiteUsability
        - CourseEvaluation
        - PatientExperience
        - PoliticalPolling
        - SocialMediaSentiment
        - ConceptTesting
        - CustomerNeeds
        - CompetitiveAnalysis
        - UserExperience
        - ServiceQuality
        - PricingSensitivity
        - AdvertisingEffectiveness
        - DemographicStudy
    cocoon_meta:
      missing_acceptable: Some surveys might not belong to a specific project type
  - name: show_question_scoring
    description: Display scoring summary after each question
    tests:
    - not_null
  - name: show_survey_scoring
    description: Display scoring summary after survey completion
    tests:
    - not_null
  - name: survey_name
    description: Name of the survey
    tests:
    - not_null
  - name: survey_status
    description: Current status of the survey
    tests:
    - not_null
    - accepted_values:
        values:
        - Inactive
        - Active
  - name: custom_styles
    description: Custom CSS styles for the survey
    tests:
    - not_null
  - name: last_activated
    description: Timestamp of last survey activation
    tests:
    - not_null
  - name: last_modified
    description: Timestamp of last survey modification
    tests:
    - not_null
  - name: survey_creation_date
    description: Survey creation date
    tests:
    - not_null
  - name: survey_expiration_date
    description: Survey expiration date
    cocoon_meta:
      missing_acceptable: Surveys without expiration dates don't need this field.
  - name: survey_skin_settings
    description: Survey skin or theme settings
    tests:
    - not_null

stg_survey_embedded_data (first 100 rows)

data_category data_key response_id
0 Website Website R_2SABUWvdPsqUvZf
1 Website Website R_2sbtWBrH4K97W5X
2 Website Website R_VQlBNvN3iJYpL1L
3 Website Website R_2BxNumKW69fwQWV
4 country country R_1Qnonc63Wkk2Fur

stg_survey_embedded_data.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:57:52.211204+00:00
WITH 
"survey_embedded_data_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "import_id",
        "key_",
        "response_id",
        "value_"
    FROM "memory"."main"."survey_embedded_data"
),

"survey_embedded_data_projected_renamed" AS (
    -- Rename: Renaming columns
    -- import_id -> data_category
    -- key_ -> data_key
    -- value_ -> data_value
    SELECT 
        "import_id" AS "data_category",
        "key_" AS "data_key",
        "response_id",
        "value_" AS "data_value"
    FROM "survey_embedded_data_projected"
),

"survey_embedded_data_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- data_value: from DECIMAL to VARCHAR
    SELECT
        "data_category",
        "data_key",
        "response_id",
        CAST("data_value" AS VARCHAR) AS "data_value"
    FROM "survey_embedded_data_projected_renamed"
),

"survey_embedded_data_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 1 columns with unacceptable missing values
    -- data_value has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "data_category",
        "data_key",
        "response_id"
    FROM "survey_embedded_data_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "survey_embedded_data_projected_renamed_casted_missing_handled"

stg_survey_embedded_data.yml (Document the table)

version: 2
models:
- name: stg_survey_embedded_data
  description: The table is about survey embedded data. It contains information related
    to survey responses. The table has columns for import_id, key_, response_id, and
    value_. The import_id and key_ seem to represent categories like "Website" and
    "country". The response_id is a unique identifier for each survey response. The
    value_ column appears to be empty in the given samples.
  columns:
  - name: data_category
    description: Category or type of embedded data
    tests:
    - not_null
  - name: data_key
    description: Specific identifier for the embedded data type
    tests:
    - not_null
  - name: response_id
    description: Unique identifier for each survey response
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each survey response.
        For this table, each row appears to be a single piece of embedded data for
        a survey response. The response_id is likely to be unique across rows for
        each unique survey response, even though it may appear multiple times for
        different data categories of the same response.

stg_survey_version (first 100 rows)

survey_id is_deleted description is_published user_id version_number was_ever_published creation_date version_id
0 SV_06auvDLqWENyV6t False Autosaved version False UR_cYdNGsrFUemlDbD 1 False 2020-08-05 04:25:37 9223351
1 SV_08GFzuA5eDhxx53 False Autosaved version False UR_cYdNGsrFUemlDbD 1 False 2020-07-01 10:25:16 255659202
2 SV_0VZnz7EkhZquEKy False Autosaved version True UR_cYdNGsrFUemlDbD 1 True 2022-09-29 06:51:05 20510032
3 SV_1MKtxtk2owJjOol False Autosaved version False UR_cYdNGsrFUemlDbD 3 False 2020-06-23 12:14:04 40331332
4 SV_1MKtxtk2owJjOol False Autosaved version False UR_cYdNGsrFUemlDbD 4 False 2020-06-25 13:21:42 63473806

stg_survey_version.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 23:00:30.364905+00:00
WITH 
"survey_version_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "survey_id",
        "_fivetran_deleted",
        "creation_date",
        "description",
        "published",
        "user_id",
        "version_number",
        "was_published"
    FROM "memory"."main"."survey_version"
),

"survey_version_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> version_id
    -- _fivetran_deleted -> is_deleted
    -- published -> is_published
    -- was_published -> was_ever_published
    SELECT 
        "id" AS "version_id",
        "survey_id",
        "_fivetran_deleted" AS "is_deleted",
        "creation_date",
        "description",
        "published" AS "is_published",
        "user_id",
        "version_number",
        "was_published" AS "was_ever_published"
    FROM "survey_version_projected"
),

"survey_version_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- creation_date: from VARCHAR to TIMESTAMP
    -- version_id: from INT to VARCHAR
    SELECT
        "survey_id",
        "is_deleted",
        "description",
        "is_published",
        "user_id",
        "version_number",
        "was_ever_published",
        CAST("creation_date" AS TIMESTAMP) AS "creation_date",
        CAST("version_id" AS VARCHAR) AS "version_id"
    FROM "survey_version_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "survey_version_projected_renamed_casted"

stg_survey_version.yml (Document the table)

version: 2
models:
- name: stg_survey_version
  description: The table is about survey versions. It contains details like survey
    ID, creation date, description, publication status, user ID, and version number.
    Each row represents a specific version of a survey. The table tracks whether a
    version was published or autosaved. It also includes a flag for deleted records
    and a unique identifier for each version.
  columns:
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
  - name: is_deleted
    description: Flag indicating if the record was deleted
    tests:
    - not_null
  - name: description
    description: Brief description of the survey version
    tests:
    - not_null
  - name: is_published
    description: Flag indicating if the version is currently published
    tests:
    - not_null
  - name: user_id
    description: Unique identifier of the user who created the version
    tests:
    - not_null
  - name: version_number
    description: Sequential number of the survey version
    tests:
    - not_null
  - name: was_ever_published
    description: Flag indicating if the version was ever published
    tests:
    - not_null
  - name: creation_date
    description: Date and time when the survey version was created
    tests:
    - not_null
  - name: version_id
    description: Unique identifier for the survey version
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the survey version.
        For this table, each row represents a specific version of a survey. Given
        that it's explicitly stated as a unique identifier, it should be unique across
        all rows.

stg_question (first 100 rows)

question_text raw_question_text question_description question_id description_display_option question_type is_deleted next_choice_id selector_type export_tag next_answer_id survey_id sub_selector_type is_hidden is_private
0 None None None QID26 UseText CS False 4 VRTL Q21 1 SV_djzINkbJIFJUs05 TX None None
1 Marks Marks Marks QID9 UseText CS False 6 VRTL Q9 1 SV_afWZKdnyaANAL5j TX false false
2 Marks Marks Marks QID9 UseText CS False 6 VRTL Q9 1 SV_6Jcja72TpOiHjdX TX false false
3 Marks Marks Marks QID9 UseText CS False 6 VRTL Q9 1 SV_06auvDLqWENyV6t TX false false
4 Tell About experience ?&nbsp; Tell About experience ?&nbsp; Tell About experience ? QID1 UseText DB False 4 TB Q1 1 SV_06auvDLqWENyV6t None false false

stg_question.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:48:57.242024+00:00
WITH 
"question_projected" AS (
    -- Projection: Selecting 18 out of 19 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "survey_id",
        "_fivetran_deleted",
        "data_export_tag",
        "data_visibility_hidden",
        "data_visibility_private",
        "next_answer_id",
        "next_choice_id",
        "question_description",
        "question_description_option",
        "question_text",
        "question_text_unsafe",
        "question_type",
        "selector",
        "sub_selector",
        "validation_setting_force_response",
        "validation_setting_force_response_type",
        "validation_setting_type"
    FROM "memory"."main"."question"
),

"question_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> question_id
    -- _fivetran_deleted -> is_deleted
    -- data_export_tag -> export_tag
    -- data_visibility_hidden -> is_hidden
    -- data_visibility_private -> is_private
    -- question_description_option -> description_display_option
    -- question_text_unsafe -> raw_question_text
    -- selector -> selector_type
    -- sub_selector -> sub_selector_type
    -- validation_setting_force_response -> force_response
    -- validation_setting_force_response_type -> force_response_type
    -- validation_setting_type -> validation_type
    SELECT 
        "id" AS "question_id",
        "survey_id",
        "_fivetran_deleted" AS "is_deleted",
        "data_export_tag" AS "export_tag",
        "data_visibility_hidden" AS "is_hidden",
        "data_visibility_private" AS "is_private",
        "next_answer_id",
        "next_choice_id",
        "question_description",
        "question_description_option" AS "description_display_option",
        "question_text",
        "question_text_unsafe" AS "raw_question_text",
        "question_type",
        "selector" AS "selector_type",
        "sub_selector" AS "sub_selector_type",
        "validation_setting_force_response" AS "force_response",
        "validation_setting_force_response_type" AS "force_response_type",
        "validation_setting_type" AS "validation_type"
    FROM "question_projected"
),

"question_projected_renamed_trimmed" AS (
    -- Trim Leading and Trailing Spaces
    SELECT
        "question_id",
        "survey_id",
        "is_deleted",
        "export_tag",
        "is_hidden",
        "is_private",
        "next_answer_id",
        "next_choice_id",
        "description_display_option",
        "question_text",
        "raw_question_text",
        "question_type",
        "selector_type",
        "sub_selector_type",
        "force_response",
        "force_response_type",
        "validation_type",
        TRIM("question_description") AS "question_description"
    FROM "question_projected_renamed"
),

"question_projected_renamed_trimmed_cleaned" AS (
    -- Clean unusual string values: 
    -- question_text: The problem is that 'Click to write the question text' is a placeholder value that should be replaced with actual content. 'Tell About experience ? ' has typos, improper capitalization, and an HTML entity that should be removed. 'Marks' appears to be the only valid entry. The correct values should be consistent and meaningful question text. 
    -- raw_question_text: The problem is that this column contains some unusual values. "Click to write the question text" is likely a placeholder and not an actual question. "Tell About experience ? " contains an HTML entity ( ) which should be removed. "Marks" seems to be the only valid value. The correct values should be either "Marks" or an empty string for invalid entries. 
    -- question_description: The problem is that 'Click to write the question text' is a placeholder value that should not appear in actual data, and 'Tell About experience ?' has grammatical errors and inconsistent capitalization. The correct values should be either 'Marks' (which appears to be a valid category) or a properly formatted question about experience. 
    SELECT
        "question_id",
        "survey_id",
        "is_deleted",
        "export_tag",
        "is_hidden",
        "is_private",
        "next_answer_id",
        "next_choice_id",
        "description_display_option",
        CASE
            WHEN "question_text" = '''Click to write the question text''' THEN ''''
            WHEN "question_text" = '''Tell About experience ? ''' THEN '''Tell about experience?'''
            ELSE "question_text"
        END AS "question_text",
        CASE
            WHEN "raw_question_text" = '''Click to write the question text''' THEN ''''
            WHEN "raw_question_text" = '''Tell About experience ? ''' THEN '''Tell About experience ?'''
            ELSE "raw_question_text"
        END AS "raw_question_text",
        "question_type",
        "selector_type",
        "sub_selector_type",
        "force_response",
        "force_response_type",
        "validation_type",
        CASE
            WHEN "question_description" = '''Click to write the question text''' THEN ''''
            WHEN "question_description" = '''Tell About experience ?''' THEN '''Tell about experience?'''
            ELSE "question_description"
        END AS "question_description"
    FROM "question_projected_renamed_trimmed"
),

"question_projected_renamed_trimmed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- question_text: ['Click to write the question text']
    -- raw_question_text: ['Click to write the question text']
    -- question_description: ['Click to write the question text']
    SELECT 
        CASE
            WHEN "question_text" = 'Click to write the question text' THEN NULL
            ELSE "question_text"
        END AS "question_text",
        CASE
            WHEN "raw_question_text" = 'Click to write the question text' THEN NULL
            ELSE "raw_question_text"
        END AS "raw_question_text",
        CASE
            WHEN "question_description" = 'Click to write the question text' THEN NULL
            ELSE "question_description"
        END AS "question_description",
        "question_id",
        "description_display_option",
        "question_type",
        "is_hidden",
        "is_deleted",
        "next_choice_id",
        "selector_type",
        "is_private",
        "force_response",
        "export_tag",
        "next_answer_id",
        "survey_id",
        "sub_selector_type",
        "force_response_type",
        "validation_type"
    FROM "question_projected_renamed_trimmed_cleaned"
),

"question_projected_renamed_trimmed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- force_response: from DECIMAL to VARCHAR
    -- force_response_type: from DECIMAL to VARCHAR
    -- is_hidden: from BOOLEAN to VARCHAR
    -- is_private: from BOOLEAN to VARCHAR
    -- validation_type: from DECIMAL to VARCHAR
    SELECT
        "question_text",
        "raw_question_text",
        "question_description",
        "question_id",
        "description_display_option",
        "question_type",
        "is_deleted",
        "next_choice_id",
        "selector_type",
        "export_tag",
        "next_answer_id",
        "survey_id",
        "sub_selector_type",
        CAST("force_response" AS VARCHAR) AS "force_response",
        CAST("force_response_type" AS VARCHAR) AS "force_response_type",
        CAST("is_hidden" AS VARCHAR) AS "is_hidden",
        CAST("is_private" AS VARCHAR) AS "is_private",
        CAST("validation_type" AS VARCHAR) AS "validation_type"
    FROM "question_projected_renamed_trimmed_cleaned_null"
),

"question_projected_renamed_trimmed_cleaned_null_casted_missing_handled" AS (
    -- Handling missing values: There are 8 columns with unacceptable missing values
    -- force_response has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- force_response_type has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- is_hidden has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- is_private has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- question_description has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- question_text has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- raw_question_text has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- validation_type has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "question_text",
        "raw_question_text",
        "question_description",
        "question_id",
        "description_display_option",
        "question_type",
        "is_deleted",
        "next_choice_id",
        "selector_type",
        "export_tag",
        "next_answer_id",
        "survey_id",
        "sub_selector_type",
        "is_hidden",
        "is_private"
    FROM "question_projected_renamed_trimmed_cleaned_null_casted"
)

-- COCOON BLOCK END
SELECT * FROM "question_projected_renamed_trimmed_cleaned_null_casted_missing_handled"

stg_question.yml (Document the table)

version: 2
models:
- name: stg_question
  description: The table is about survey questions. It contains details like question
    ID, survey ID, question text, question type, and various settings. Each row represents
    a unique question within a survey. The table includes fields for visibility, data
    export tags, validation settings, and selector types. This structure allows for
    flexible question configuration across different surveys.
  columns:
  - name: question_text
    description: The text of the question
    tests:
    - not_null
    - accepted_values:
        values:
        - A+
        - A
        - A-
        - B+
        - B
        - B-
        - C+
        - C
        - C-
        - D+
        - D
        - D-
        - F
        - Pass
        - Fail
        - Incomplete
        - Withdrawn
        - '1'
        - '2'
        - '3'
        - '4'
        - '5'
        - Marks
        - Tell About experience ? 
  - name: raw_question_text
    description: Unprocessed question text
    tests:
    - not_null
    - accepted_values:
        values:
        - A+
        - A
        - A-
        - B+
        - B
        - B-
        - C+
        - C
        - C-
        - D+
        - D
        - D-
        - F
        - 100%
        - 90-99%
        - 80-89%
        - 70-79%
        - 60-69%
        - 50-59%
        - 0-49%
        - Pass
        - Fail
        - Incomplete
        - Withdrawn
        - Marks
        - Tell About experience ? 
  - name: question_description
    description: Description of the question
    tests:
    - not_null
  - name: question_id
    description: Unique identifier for the question
    tests:
    - not_null
  - name: description_display_option
    description: Option for question description display
    tests:
    - not_null
    - accepted_values:
        values:
        - UseText
        - UseHTML
        - NoDisplay
        - UseMarkdown
  - name: question_type
    description: Type of question (e.g., CS, DB)
    tests:
    - not_null
    - accepted_values:
        values:
        - CS
        - DB
        - Math
        - Physics
        - Chemistry
        - Biology
        - History
        - Geography
        - Literature
        - Language
        - Economics
        - Psychology
        - Sociology
        - Philosophy
        - Art
        - Music
        - Political Science
        - Engineering
        - Environmental Science
        - Anthropology
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: next_choice_id
    description: ID for the next choice
    tests:
    - not_null
  - name: selector_type
    description: Selector type for the question
    tests:
    - not_null
  - name: export_tag
    description: Tag for data export purposes
    tests:
    - not_null
  - name: next_answer_id
    description: ID for the next answer
    tests:
    - not_null
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
  - name: sub_selector_type
    description: Sub-selector type for the question
    cocoon_meta:
      missing_acceptable: Not applicable for question types without sub-selectors.
  - name: is_hidden
    description: Indicates if the question is hidden
    tests:
    - not_null
    - accepted_values:
        values:
        - 'True'
        - 'False'
        - 'false'
  - name: is_private
    description: Indicates if the question is private
    tests:
    - not_null
    - accepted_values:
        values:
        - 'True'
        - 'False'
        - 'false'

stg_core_mailing_list (first 100 rows)

id library_id name category folder _fivetran_deleted
0 ML_553543t34t534 library_blahblah123 BingBong Bad Category Good Folder False

stg_core_mailing_list.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:57.026219+00:00
WITH 
"core_mailing_list_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "library_id",
        "name",
        "category",
        "folder",
        "_fivetran_deleted"
    FROM "memory"."main"."core_mailing_list"
)

-- COCOON BLOCK END
SELECT * FROM "core_mailing_list_projected"

stg_core_mailing_list.yml (Document the table)

version: 2
models:
- name: stg_core_mailing_list
  description: Please summarize the table
  columns:
  - name: id
    description: id
    tests:
    - not_null
  - name: library_id
    description: library_id
    tests:
    - not_null
  - name: name
    description: name
    tests:
    - not_null
  - name: category
    description: category
    tests:
    - not_null
  - name: folder
    description: folder
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null

stg_question_response (first 100 rows)

record_id question_topic question_id response_id response_value loop_id option_key sub_question_key sub_question_text
0 vnzZKokL2ijFL+3cv266gKI+R3U= Drink QID2 R_2w6q9pgBpebvjCg 2.0 None None None None
1 ZOmuYJzFNtYua4Iwbtk474SXBsk= Drink QID2 R_3RvEQNuWciDhUpl 2.0 None None None None
2 KeKECHYR7V5729skoyowWs4Ah4o= Drink QID2 R_2SqSR1Q6pLJZplH 1.0 None None None None
3 j2kXb68qeBgQ9Tdsj/nyiypvEZ0= Drink QID2 R_2xyo8lOlBsJkqNT 3.0 None None None None
4 tm3O8dMY4osydnC+ZJuS+O5u8QI= state QID58 R_1Qnonc63Wkk2Fur NaN None None None None

stg_question_response.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:50:48.307661+00:00
WITH 
"question_response_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "_fivetran_id",
        "loop_id",
        "question",
        "question_id",
        "question_option_key",
        "response_id",
        "sub_question_key",
        "sub_question_text",
        "value_"
    FROM "memory"."main"."question_response"
),

"question_response_projected_renamed" AS (
    -- Rename: Renaming columns
    -- _fivetran_id -> record_id
    -- question -> question_topic
    -- question_option_key -> option_key
    -- value_ -> response_value
    SELECT 
        "_fivetran_id" AS "record_id",
        "loop_id",
        "question" AS "question_topic",
        "question_id",
        "question_option_key" AS "option_key",
        "response_id",
        "sub_question_key",
        "sub_question_text",
        "value_" AS "response_value"
    FROM "question_response_projected"
),

"question_response_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- loop_id: from DECIMAL to VARCHAR
    -- option_key: from DECIMAL to VARCHAR
    -- sub_question_key: from DECIMAL to VARCHAR
    -- sub_question_text: from DECIMAL to VARCHAR
    SELECT
        "record_id",
        "question_topic",
        "question_id",
        "response_id",
        "response_value",
        CAST("loop_id" AS VARCHAR) AS "loop_id",
        CAST("option_key" AS VARCHAR) AS "option_key",
        CAST("sub_question_key" AS VARCHAR) AS "sub_question_key",
        CAST("sub_question_text" AS VARCHAR) AS "sub_question_text"
    FROM "question_response_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "question_response_projected_renamed_casted"

stg_question_response.yml (Document the table)

version: 2
models:
- name: stg_question_response
  description: The table is about survey question responses. It contains details like
    question ID, response ID, and response value. Each row represents a response to
    a specific question. The "question" column shows the question topic, while "value_"
    contains the numerical response. Some questions may have sub-questions or options,
    as indicated by additional columns. The table appears to track responses across
    different survey questions.
  columns:
  - name: record_id
    description: Unique identifier for each record
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column appears to be a unique identifier for each record in
        the survey. For this table, each row represents a response to a specific question.
        The record_id seems to be a hashed value that is unique across rows.
  - name: question_topic
    description: Topic or subject of the survey question
    tests:
    - not_null
  - name: question_id
    description: Unique identifier for each question
    tests:
    - not_null
  - name: response_id
    description: Unique identifier for each response
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column appears to be a unique identifier for each response
        in the survey. For this table, each row represents a response to a specific
        question. The response_id seems to be a unique identifier across rows.
  - name: response_value
    description: Numerical value of the response
    tests:
    - not_null
  - name: loop_id
    description: Identifier for looped questions (currently empty)
    cocoon_meta:
      missing_acceptable: Not applicable for non-looped questions.
  - name: option_key
    description: Key for question options (currently empty)
    cocoon_meta:
      missing_acceptable: Not applicable for non-multiple choice questions.
  - name: sub_question_key
    description: Key for sub-questions (currently empty)
    cocoon_meta:
      missing_acceptable: Not applicable for main questions without sub-questions.
  - name: sub_question_text
    description: Text of sub-questions (currently empty)
    cocoon_meta:
      missing_acceptable: Not applicable for main questions without sub-questions.

stg_distribution_contact (first 100 rows)

contact_id distribution_id contact_lookup_id survey_status survey_link response_completed_at response_id response_started_at survey_opened_at survey_sent_at
0 CID_4NG7JC9R7VTmHYi EMD_Q8AiTuBzgLZsODX CGC_kKkUtjVs3mFxeAK Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_7Uw1HuxpN0OxbVQ?Q_CHL=email&Q_DL=EMD_Q8AiTuBzgLZsODX_7Uw1HuxpN0OxbVQ_CGC_kKkUtjVs3mFxeAK&_g_=g NaT None NaT 2023-03-28 07:41:58 2023-03-28 07:17:14.896
1 CID_4NG7JC9R7VTmHYi EMD_FV7lj9zfNfFAxFI CGC_T8oa4TlYPoGvnwW Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_7Uw1HuxpN0OxbVQ?Q_CHL=email&Q_DL=EMD_FV7lj9zfNfFAxFI_7Uw1HuxpN0OxbVQ_CGC_T8oa4TlYPoGvnwW&_g_=g NaT None NaT 2023-04-25 09:19:47 2023-04-25 09:06:38.894
2 CID_4NG7JC9R7VTmHYi EMD_b4UhhqAF6Sx0Mcl CGC_kKkUtjVs3mFxeAK Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_6Jcja72TpOiHjdX?Q_CHL=email&Q_DL=EMD_b4UhhqAF6Sx0Mcl_6Jcja72TpOiHjdX_CGC_kKkUtjVs3mFxeAK&_g_=g NaT None NaT 2023-01-10 11:30:29 2023-01-10 11:30:20.925
3 CID_9Fa5PKgMwjiExoh EMD_OY1XxLK7rncWDt8 CGC_0ZqBiW44uMDQbon Opened https://myorg.ca1.qualtrics.com/jfe/form/SV_djzINkbJIFJUs05?Q_CHL=email&Q_DL=EMD_OY1XxLK7rncWDt8_djzINkbJIFJUs05_CGC_0ZqBiW44uMDQbon&_g_=g NaT None NaT 2022-04-06 13:14:19 2022-04-06 13:14:18.296
4 CID_9Fa5PKgMwjiExoh EMD_GgJDmizeU8yAwvI CGC_0ZqBiW44uMDQbon Pending https://myorg.ca1.qualtrics.com/jfe/form/SV_djzINkbJIFJUs05?Q_CHL=gl&Q_DL=EMD_GgJDmizeU8yAwvI_djzINkbJIFJUs05_CGC_0ZqBiW44uMDQbon&_g_=g NaT None NaT NaT 2022-09-12 05:13:55.058

stg_distribution_contact.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:46:51.239370+00:00
WITH 
"distribution_contact_projected" AS (
    -- Projection: Selecting 12 out of 13 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "contact_id",
        "distribution_id",
        "contact_frequency_rule_id",
        "contact_lookup_id",
        "opened_at",
        "response_completed_at",
        "response_id",
        "response_started_at",
        "sent_at",
        "status",
        "survey_link",
        "survey_session_id"
    FROM "memory"."main"."distribution_contact"
),

"distribution_contact_projected_renamed" AS (
    -- Rename: Renaming columns
    -- opened_at -> survey_opened_at
    -- sent_at -> survey_sent_at
    -- status -> survey_status
    SELECT 
        "contact_id",
        "distribution_id",
        "contact_frequency_rule_id",
        "contact_lookup_id",
        "opened_at" AS "survey_opened_at",
        "response_completed_at",
        "response_id",
        "response_started_at",
        "sent_at" AS "survey_sent_at",
        "status" AS "survey_status",
        "survey_link",
        "survey_session_id"
    FROM "distribution_contact_projected"
),

"distribution_contact_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- contact_frequency_rule_id: from DECIMAL to VARCHAR
    -- response_completed_at: from DECIMAL to TIMESTAMP
    -- response_id: from DECIMAL to VARCHAR
    -- response_started_at: from DECIMAL to TIMESTAMP
    -- survey_opened_at: from VARCHAR to TIMESTAMP
    -- survey_sent_at: from VARCHAR to TIMESTAMP
    -- survey_session_id: from DECIMAL to VARCHAR
    SELECT
        "contact_id",
        "distribution_id",
        "contact_lookup_id",
        "survey_status",
        "survey_link",
        CAST("contact_frequency_rule_id" AS VARCHAR) AS "contact_frequency_rule_id",
        CAST("response_completed_at" AS TIMESTAMP) AS "response_completed_at",
        CAST("response_id" AS VARCHAR) AS "response_id",
        CAST("response_started_at" AS TIMESTAMP) AS "response_started_at",
        CAST("survey_opened_at" AS TIMESTAMP) AS "survey_opened_at",
        CAST("survey_sent_at" AS TIMESTAMP) AS "survey_sent_at",
        CAST("survey_session_id" AS VARCHAR) AS "survey_session_id"
    FROM "distribution_contact_projected_renamed"
),

"distribution_contact_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- contact_frequency_rule_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- survey_opened_at has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- survey_session_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "contact_id",
        "distribution_id",
        "contact_lookup_id",
        "survey_status",
        "survey_link",
        "response_completed_at",
        "response_id",
        "response_started_at",
        "survey_opened_at",
        "survey_sent_at"
    FROM "distribution_contact_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "distribution_contact_projected_renamed_casted_missing_handled"

stg_distribution_contact.yml (Document the table)

version: 2
models:
- name: stg_distribution_contact
  description: The table shows the distribution of surveys to contacts. It includes
    contact IDs, distribution IDs, and contact lookup IDs. It tracks survey status
    (e.g., Opened, Pending), sent and opened timestamps, and survey links. The table
    relates contacts to specific survey distributions and captures response data,
    including start and completion times. It represents the lifecycle of survey distribution
    and response tracking.
  columns:
  - name: contact_id
    description: Unique identifier for the contact
    tests:
    - not_null
  - name: distribution_id
    description: Unique identifier for the survey distribution
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each survey distribution.
        For this table, each row represents a specific survey distribution to a contact.
        The distribution_id appears to be unique across rows, as it's designed to
        identify each unique distribution.
  - name: contact_lookup_id
    description: Lookup identifier for the contact
    tests:
    - not_null
  - name: survey_status
    description: Current status of the survey
    tests:
    - not_null
    - accepted_values:
        values:
        - Opened
        - Pending
        - Closed
        - Completed
        - Canceled
        - Archived
  - name: survey_link
    description: URL link to the survey
    tests:
    - not_null
  - name: response_completed_at
    description: Timestamp when the survey response was completed
    cocoon_meta:
      missing_acceptable: Not applicable for surveys not yet completed.
  - name: response_id
    description: Unique identifier for the survey response
    cocoon_meta:
      missing_acceptable: Not applicable for surveys not yet responded to.
  - name: response_started_at
    description: Timestamp when the survey response was started
    cocoon_meta:
      missing_acceptable: Not applicable for surveys not yet started.
  - name: survey_opened_at
    description: Timestamp when the survey was opened
    tests:
    - not_null
  - name: survey_sent_at
    description: Timestamp when the survey was sent
    tests:
    - not_null

stg_block (first 100 rows)

id survey_id _fivetran_deleted description type
0 BL_b8x54QK6VJE3A1f SV_8GsOFOOKnw9KAmx False Block 1 Standard
1 BL_9BOpMYC7bZTNZTn SV_8GsOFOOKnw9KAmx False Block 2 Standard
2 BL_797qWoQosqXXiJv SV_8GsOFOOKnw9KAmx False Block 3 Standard
3 BL_8pn7C2f84AUw3ZQ SV_afWZKdnyaANAL5j False Block 3 Standard
4 BL_3gZj0OfR5Qncg9T SV_1XGe0Xykmeo0lyB False Default Question Block Default

stg_block.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:50.303242+00:00
WITH 
"block_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "survey_id",
        "_fivetran_deleted",
        "block_locking",
        "block_visibility",
        "description",
        "randomize_questions",
        "type"
    FROM "memory"."main"."block"
),

"block_projected_missing_handled" AS (
    -- Handling missing values: There are 3 columns with unacceptable missing values
    -- block_locking has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- block_visibility has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- randomize_questions has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "id",
        "survey_id",
        "_fivetran_deleted",
        "description",
        "type"
    FROM "block_projected"
)

-- COCOON BLOCK END
SELECT * FROM "block_projected_missing_handled"

stg_block.yml (Document the table)

version: 2
models:
- name: stg_block
  description: Please summarize the table
  columns:
  - name: id
    description: id
    tests:
    - not_null
  - name: survey_id
    description: survey_id
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null
  - name: description
    description: description
    tests:
    - not_null
  - name: type
    description: type
    tests:
    - not_null

stg_directory_mailing_list (first 100 rows)

directory_id id _fivetran_deleted creation_date last_modified_date name owner_id
0 POOL_vw3p9mEAf692Aql CG_3lVV6y8VRw0CovA False 2020-06-19 18:41:15.768000 2020-06-19 18:41:15.768000 TestMailingList2 UR_cYdNGsrFUemlDbD
1 POOL_vw3p9mEAf692Aql CG_2YVdk5sjpPIUfBW False 2020-08-10 11:43:01.844000 2020-08-10 11:43:01.844000 Test_0810 UR_cYdNGsrFUemlDbD
2 POOL_vw3p9mEAf692Aql CG_2TOcRsxw5QF7r8d False 2020-08-07 09:48:18.474000 2020-08-07 09:48:18.474000 New Contact List (8/7/20 3:17 PM) UR_cYdNGsrFUemlDbD
3 POOL_vw3p9mEAf692Aql CG_UPYDsHbOosxP51D False 2020-08-06 05:52:13.640000 2020-08-06 05:52:13.640000 Test060820 UR_cYdNGsrFUemlDbD
4 POOL_vw3p9mEAf692Aql CG_2Xamq59U7RFDz1a False 2023-02-24 10:47:53.810000 2023-02-24 10:47:53.810000 New Contact List (2/24/23 4:17 PM) UR_cYdNGsrFUemlDbD

stg_directory_mailing_list.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:42:02.933253+00:00
WITH 
"directory_mailing_list_projected" AS (
    -- Projection: Selecting 7 out of 8 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "directory_id",
        "id",
        "_fivetran_deleted",
        "creation_date",
        "last_modified_date",
        "name",
        "owner_id"
    FROM "memory"."main"."directory_mailing_list"
)

-- COCOON BLOCK END
SELECT * FROM "directory_mailing_list_projected"

stg_directory_mailing_list.yml (Document the table)

version: 2
models:
- name: stg_directory_mailing_list
  description: Please summarize the table
  columns:
  - name: directory_id
    description: directory_id
    tests:
    - not_null
  - name: id
    description: id
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null
  - name: creation_date
    description: creation_date
    tests:
    - not_null
  - name: last_modified_date
    description: last_modified_date
    tests:
    - not_null
  - name: name
    description: name
    tests:
    - not_null
  - name: owner_id
    description: owner_id
    tests:
    - not_null

stg_directory (first 100 rows)

id _fivetran_deleted deduplication_criteria_email deduplication_criteria_external_data_reference deduplication_criteria_first_name deduplication_criteria_last_name deduplication_criteria_phone is_default name
0 POOL_vw3p9mEAf692Aql False True False False False False True Test_directory

stg_directory.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:58.109139+00:00
WITH 
"directory_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "deduplication_criteria_email",
        "deduplication_criteria_external_data_reference",
        "deduplication_criteria_first_name",
        "deduplication_criteria_last_name",
        "deduplication_criteria_phone",
        "is_default",
        "name"
    FROM "memory"."main"."directory"
)

-- COCOON BLOCK END
SELECT * FROM "directory_projected"

stg_directory.yml (Document the table)

version: 2
models:
- name: stg_directory
  description: Please summarize the table
  columns:
  - name: id
    description: id
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null
  - name: deduplication_criteria_email
    description: deduplication_criteria_email
    tests:
    - not_null
  - name: deduplication_criteria_external_data_reference
    description: deduplication_criteria_external_data_reference
    tests:
    - not_null
  - name: deduplication_criteria_first_name
    description: deduplication_criteria_first_name
    tests:
    - not_null
  - name: deduplication_criteria_last_name
    description: deduplication_criteria_last_name
    tests:
    - not_null
  - name: deduplication_criteria_phone
    description: deduplication_criteria_phone
    tests:
    - not_null
  - name: is_default
    description: is_default
    tests:
    - not_null
  - name: name
    description: name
    tests:
    - not_null

stg_block_question (first 100 rows)

block_id question_id survey_id _fivetran_deleted
0 BL_9sjHtHF4P4LyAOF QID21 SV_06auvDLqWENyV6t False
1 BL_9sjHtHF4P4LyAOF QID24 SV_06auvDLqWENyV6t False
2 BL_9sjHtHF4P4LyAOF QID12 SV_06auvDLqWENyV6t False
3 BL_9sjHtHF4P4LyAOF QID11 SV_06auvDLqWENyV6t False
4 BL_diiw4C9MVCcKFHn QID23 SV_06auvDLqWENyV6t False

stg_block_question.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:41:51.567164+00:00
WITH 
"block_question_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "block_id",
        "question_id",
        "survey_id",
        "_fivetran_deleted"
    FROM "memory"."main"."block_question"
)

-- COCOON BLOCK END
SELECT * FROM "block_question_projected"

stg_block_question.yml (Document the table)

version: 2
models:
- name: stg_block_question
  description: Please summarize the table
  columns:
  - name: block_id
    description: block_id
    tests:
    - not_null
  - name: question_id
    description: question_id
    tests:
    - not_null
  - name: survey_id
    description: survey_id
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null

stg_sub_question (first 100 rows)

answer_choice_id question_id survey_id is_deleted answer_text export_tag
0 2 QID4 SV_06auvDLqWENyV6t False Product None
1 1 QID4 SV_06auvDLqWENyV6t False QA None
2 3 QID4 SV_06auvDLqWENyV6t False Support None
3 1 QID6 SV_06auvDLqWENyV6t False Java None
4 3 QID6 SV_06auvDLqWENyV6t False C None

stg_sub_question.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:51:31.889107+00:00
WITH 
"sub_question_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "key_",
        "question_id",
        "survey_id",
        "_fivetran_deleted",
        "choice_data_export_tag",
        "text"
    FROM "memory"."main"."sub_question"
),

"sub_question_projected_renamed" AS (
    -- Rename: Renaming columns
    -- key_ -> answer_choice_id
    -- _fivetran_deleted -> is_deleted
    -- choice_data_export_tag -> export_tag
    -- text -> answer_text
    SELECT 
        "key_" AS "answer_choice_id",
        "question_id",
        "survey_id",
        "_fivetran_deleted" AS "is_deleted",
        "choice_data_export_tag" AS "export_tag",
        "text" AS "answer_text"
    FROM "sub_question_projected"
),

"sub_question_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- export_tag: from DECIMAL to VARCHAR
    SELECT
        "answer_choice_id",
        "question_id",
        "survey_id",
        "is_deleted",
        "answer_text",
        CAST("export_tag" AS VARCHAR) AS "export_tag"
    FROM "sub_question_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "sub_question_projected_renamed_casted"

stg_sub_question.yml (Document the table)

version: 2
models:
- name: stg_sub_question
  description: The table represents a relation between survey questions and their
    possible answers or options. Each row contains a key, question ID, survey ID,
    and text representing an answer choice. The question_id links multiple options
    to a single question. The survey_id associates questions with a specific survey.
    This structure allows for organizing multiple choice questions within surveys.
  columns:
  - name: answer_choice_id
    description: Unique identifier for the answer choice
    tests:
    - not_null
  - name: question_id
    description: Identifier for the question
    tests:
    - not_null
  - name: survey_id
    description: Identifier for the survey
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: answer_text
    description: The text of the answer choice
    tests:
    - not_null
    - accepted_values:
        values:
        - C
        - Java
        - Product
        - QA
        - Support
        - Python
        - JavaScript
        - DevOps
        - UI/UX
        - Data Science
        - Machine Learning
        - Mobile Development
        - Cloud Computing
        - Security
        - Database
        - Frontend
        - Backend
        - Full Stack
        - Project Management
        - Customer Success
        - Sales
        - Marketing
        - HR
        - Finance
        - Legal
  - name: export_tag
    description: Additional tag for data export purposes
    cocoon_meta:
      missing_acceptable: Optional tag, not required for all answer choices.

stg_distribution (first 100 rows)

distribution_id is_deleted sender_email sender_name reply_to_email email_subject organization_id owner_id recipient_contact_id recipient_library_id mailing_list_id sample_id distribution_status distribution_type survey_link_type survey_id created_date modified_date send_date survey_expiration_date
0 EMD_rXUNkFnt2LlVaYE False None None None None puppydev UR_cYdNGsrFUemlDbD None UR_cYdNGsrFUemlDbD CG_6nYnmPWONC4JziN None Generated GeneratedInvite Individual SV_afWZKdnyaANAL5j 2021-09-16 09:54:07 2021-09-16 03:54:07 2021-09-16 09:54:07 2021-11-15 09:54:06
1 EMD_pLLqzPVqIC80qOQ False None None None None puppydev UR_cYdNGsrFUemlDbD None UR_cYdNGsrFUemlDbD CG_2B4IIbgp8ZC0NVn CG_3dVV5daI3GYd8KD Generated GeneratedInvite Individual SV_1MKtxtk2owJjOol 2020-06-25 10:58:24 2020-06-25 04:58:24 2020-06-25 10:58:24 2020-08-24 10:58:23
2 EMD_kOb54z4ZpofISxX False None None None None puppydev UR_cYdNGsrFUemlDbD CGC_kKkUtjVs3mFxeAK UR_cYdNGsrFUemlDbD CG_OI48zJxTuMsKUq5 None Generated GeneratedInvite Individual SV_6Jcja72TpOiHjdX 2022-12-08 12:48:07 2022-12-08 05:53:39 2022-12-08 12:48:07 2022-12-09 12:48:07
3 EMD_GgJDmizeU8yAwvI False None None None None puppydev UR_cYdNGsrFUemlDbD None UR_cYdNGsrFUemlDbD CG_6nYnmPWONC4JziN None Generated GeneratedInvite Individual SV_djzINkbJIFJUs05 2022-04-06 12:16:14 2022-04-06 06:16:14 2022-04-06 12:16:14 2022-06-05 12:16:14
4 EMD_8AvFYfH0jD2axcz False noreply@qemailserver.com Developers developers@puppy.com Test Survey puppydev UR_cYdNGsrFUemlDbD None None None None Done TestEmail None SV_djzINkbJIFJUs05 2022-04-06 11:27:16 2022-04-06 05:29:56 2022-04-06 11:27:16 NaT

stg_distribution.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:45:16.233443+00:00
WITH 
"distribution_projected" AS (
    -- Projection: Selecting 24 out of 25 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "created_date",
        "header_from_email",
        "header_from_name",
        "header_reply_to_email",
        "header_subject",
        "message_library_id",
        "message_message_id",
        "message_message_text",
        "modified_date",
        "organization_id",
        "owner_id",
        "parent_distribution_id",
        "recipient_contact_id",
        "recipient_library_id",
        "recipient_mailing_list_id",
        "recipient_sample_id",
        "request_status",
        "request_type",
        "send_date",
        "survey_link_expiration_date",
        "survey_link_link_type",
        "survey_link_survey_id"
    FROM "memory"."main"."distribution"
),

"distribution_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> distribution_id
    -- _fivetran_deleted -> is_deleted
    -- header_from_email -> sender_email
    -- header_from_name -> sender_name
    -- header_reply_to_email -> reply_to_email
    -- header_subject -> email_subject
    -- message_message_id -> message_id
    -- message_message_text -> message_text
    -- recipient_mailing_list_id -> mailing_list_id
    -- recipient_sample_id -> sample_id
    -- request_status -> distribution_status
    -- request_type -> distribution_type
    -- survey_link_expiration_date -> survey_expiration_date
    -- survey_link_link_type -> survey_link_type
    -- survey_link_survey_id -> survey_id
    SELECT 
        "id" AS "distribution_id",
        "_fivetran_deleted" AS "is_deleted",
        "created_date",
        "header_from_email" AS "sender_email",
        "header_from_name" AS "sender_name",
        "header_reply_to_email" AS "reply_to_email",
        "header_subject" AS "email_subject",
        "message_library_id",
        "message_message_id" AS "message_id",
        "message_message_text" AS "message_text",
        "modified_date",
        "organization_id",
        "owner_id",
        "parent_distribution_id",
        "recipient_contact_id",
        "recipient_library_id",
        "recipient_mailing_list_id" AS "mailing_list_id",
        "recipient_sample_id" AS "sample_id",
        "request_status" AS "distribution_status",
        "request_type" AS "distribution_type",
        "send_date",
        "survey_link_expiration_date" AS "survey_expiration_date",
        "survey_link_link_type" AS "survey_link_type",
        "survey_link_survey_id" AS "survey_id"
    FROM "distribution_projected"
),

"distribution_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- created_date: from VARCHAR to TIMESTAMP
    -- message_id: from DECIMAL to VARCHAR
    -- message_library_id: from DECIMAL to VARCHAR
    -- message_text: from DECIMAL to VARCHAR
    -- modified_date: from VARCHAR to TIMESTAMP
    -- parent_distribution_id: from DECIMAL to VARCHAR
    -- send_date: from VARCHAR to TIMESTAMP
    -- survey_expiration_date: from VARCHAR to TIMESTAMP
    SELECT
        "distribution_id",
        "is_deleted",
        "sender_email",
        "sender_name",
        "reply_to_email",
        "email_subject",
        "organization_id",
        "owner_id",
        "recipient_contact_id",
        "recipient_library_id",
        "mailing_list_id",
        "sample_id",
        "distribution_status",
        "distribution_type",
        "survey_link_type",
        "survey_id",
        CAST("created_date" AS TIMESTAMP) AS "created_date",
        CAST("message_id" AS VARCHAR) AS "message_id",
        CAST("message_library_id" AS VARCHAR) AS "message_library_id",
        CAST("message_text" AS VARCHAR) AS "message_text",
        CAST("modified_date" AS TIMESTAMP) AS "modified_date",
        CAST("parent_distribution_id" AS VARCHAR) AS "parent_distribution_id",
        CAST("send_date" AS TIMESTAMP) AS "send_date",
        CAST("survey_expiration_date" AS TIMESTAMP) AS "survey_expiration_date"
    FROM "distribution_projected_renamed"
),

"distribution_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 7 columns with unacceptable missing values
    -- message_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- message_library_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- message_text has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- parent_distribution_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- recipient_contact_id has 80.0 percent missing. Strategy: 🔄 Unchanged
    -- recipient_library_id has 20.0 percent missing. Strategy: 🔄 Unchanged
    -- survey_expiration_date has 20.0 percent missing. Strategy: 🔄 Unchanged
    SELECT
        "distribution_id",
        "is_deleted",
        "sender_email",
        "sender_name",
        "reply_to_email",
        "email_subject",
        "organization_id",
        "owner_id",
        "recipient_contact_id",
        "recipient_library_id",
        "mailing_list_id",
        "sample_id",
        "distribution_status",
        "distribution_type",
        "survey_link_type",
        "survey_id",
        "created_date",
        "modified_date",
        "send_date",
        "survey_expiration_date"
    FROM "distribution_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "distribution_projected_renamed_casted_missing_handled"

stg_distribution.yml (Document the table)

version: 2
models:
- name: stg_distribution
  description: The table is about distribution of surveys. It contains unique distribution
    IDs. The table includes details like creation date, sender information, and recipient
    data. It specifies survey links, expiration dates, and request types. The table
    tracks the status of distributions. It associates distributions with specific
    organizations and owners. The data covers various survey-related actions like
    invites and test emails.
  columns:
  - name: distribution_id
    description: Unique identifier for the distribution
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each distribution. For this
        table, each row represents a unique distribution, and the distribution_id
        is unique across rows.
  - name: is_deleted
    description: Indicates if the record has been deleted
    tests:
    - not_null
  - name: sender_email
    description: Email address of the sender
    cocoon_meta:
      missing_acceptable: Not applicable for GeneratedInvite distribution types
  - name: sender_name
    description: Name of the sender
    cocoon_meta:
      missing_acceptable: Not applicable for GeneratedInvite distribution types
  - name: reply_to_email
    description: Reply-to email address for the distribution
    cocoon_meta:
      missing_acceptable: Not applicable for GeneratedInvite distribution types
  - name: email_subject
    description: Subject line of the distribution email
    cocoon_meta:
      missing_acceptable: Not applicable for GeneratedInvite distribution types
  - name: organization_id
    description: ID of the organization associated with distribution
    tests:
    - not_null
  - name: owner_id
    description: ID of the owner of the distribution
    tests:
    - not_null
  - name: recipient_contact_id
    description: ID of the recipient contact
    tests:
    - not_null
  - name: recipient_library_id
    description: ID of the recipient library
    tests:
    - not_null
  - name: mailing_list_id
    description: ID of the recipient mailing list
    cocoon_meta:
      missing_acceptable: Not applicable for TestEmail distribution types
  - name: sample_id
    description: ID of the recipient sample
    cocoon_meta:
      missing_acceptable: Not applicable for non-sample based distributions
  - name: distribution_status
    description: Current status of the distribution request
    tests:
    - not_null
    - accepted_values:
        values:
        - Generated
        - Done
        - In Progress
        - Pending
        - Failed
        - Cancelled
        - On Hold
  - name: distribution_type
    description: Type of distribution request
    tests:
    - not_null
    - accepted_values:
        values:
        - GeneratedInvite
        - TestEmail
        - ManualInvite
        - BulkInvite
        - ReminderInvite
        - FollowUpInvite
        - EventSpecificInvite
  - name: survey_link_type
    description: Type of survey link
    tests:
    - accepted_values:
        values:
        - Individual
        - Group
        - Anonymous
        - Public
        - Private
        - One-time
        - Reusable
        - Embedded
        - QR Code
        - Email
        - SMS
        - Social Media
    cocoon_meta:
      missing_acceptable: Not applicable for TestEmail distribution types
  - name: survey_id
    description: ID of the associated survey
    tests:
    - not_null
  - name: created_date
    description: Date and time when the distribution was created
    tests:
    - not_null
  - name: modified_date
    description: Date and time of last modification
    tests:
    - not_null
  - name: send_date
    description: Date and time when the distribution was sent
    tests:
    - not_null
  - name: survey_expiration_date
    description: Expiration date of the survey link
    tests:
    - not_null

stg_question_option (first 100 rows)

option_id question_id survey_id is_deleted option_value option_text
0 1 QID1 SV_5vVBVUfSVz94szc False 1 Testing 1
1 1 QID1 SV_08GFzuA5eDhxx53 False 1 Click to write Choice 1
2 1 QID1 SV_6zmBz8qb19bpMvH False 1 Click to write Choice 1
3 1 QID1 SV_1XGe0Xykmeo0lyB False 1 Click to write Choice 1
4 1 QID1 SV_0VZnz7EkhZquEKy False 1 Click to write Choice 1

stg_question_option.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 22:49:45.805052+00:00
WITH 
"question_option_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "key_",
        "question_id",
        "survey_id",
        "_fivetran_deleted",
        "recode_value",
        "text"
    FROM "memory"."main"."question_option"
),

"question_option_projected_renamed" AS (
    -- Rename: Renaming columns
    -- key_ -> option_id
    -- _fivetran_deleted -> is_deleted
    -- recode_value -> option_value
    -- text -> option_text
    SELECT 
        "key_" AS "option_id",
        "question_id",
        "survey_id",
        "_fivetran_deleted" AS "is_deleted",
        "recode_value" AS "option_value",
        "text" AS "option_text"
    FROM "question_option_projected"
)

-- COCOON BLOCK END
SELECT * FROM "question_option_projected_renamed"

stg_question_option.yml (Document the table)

version: 2
models:
- name: stg_question_option
  description: The table is about question options in surveys. It contains details
    of each option, including a unique key, question ID, survey ID, deletion status,
    recode value, and option text. Each row represents a single option for a specific
    question in a particular survey. The table allows for tracking and managing multiple
    choice options across different surveys and questions.
  columns:
  - name: option_id
    description: Unique identifier for the option
    tests:
    - not_null
  - name: question_id
    description: Identifier for the question
    tests:
    - not_null
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: option_value
    description: Numerical value assigned to the option
    tests:
    - not_null
  - name: option_text
    description: Text content of the option
    tests:
    - not_null

stg_user (first 100 rows)

user_id is_deleted account_status email first_name last_name user_type username account_expiration_date
0 UR_cYdNGsrFUemlDbD False active developers@puppy.com Puppy Developers UT_BRANDADMIN developers@fivetran.com NaT
1 UR_0AnU60plqo2I4DL False active cat@fivetran.com Cat Meow UT_BRANDADMIN colin@fivetran.com NaT

stg_user.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-07-05 23:02:37.875113+00:00
WITH 
"user_projected" AS (
    -- Projection: Selecting 21 out of 22 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "account_creation_date",
        "account_expiration_date",
        "account_status",
        "division_id",
        "email",
        "first_name",
        "language_",
        "last_login_date",
        "last_name",
        "organization_id",
        "password_expiration_date",
        "password_last_changed_date",
        "response_count_auditable",
        "response_count_deleted",
        "response_count_generated",
        "time_zone",
        "unsubscribed",
        "user_type",
        "username"
    FROM "memory"."main"."user"
),

"user_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> user_id
    -- _fivetran_deleted -> is_deleted
    -- language_ -> preferred_language
    -- response_count_auditable -> auditable_response_count
    -- response_count_deleted -> deleted_response_count
    -- response_count_generated -> generated_response_count
    -- unsubscribed -> is_unsubscribed
    SELECT 
        "id" AS "user_id",
        "_fivetran_deleted" AS "is_deleted",
        "account_creation_date",
        "account_expiration_date",
        "account_status",
        "division_id",
        "email",
        "first_name",
        "language_" AS "preferred_language",
        "last_login_date",
        "last_name",
        "organization_id",
        "password_expiration_date",
        "password_last_changed_date",
        "response_count_auditable" AS "auditable_response_count",
        "response_count_deleted" AS "deleted_response_count",
        "response_count_generated" AS "generated_response_count",
        "time_zone",
        "unsubscribed" AS "is_unsubscribed",
        "user_type",
        "username"
    FROM "user_projected"
),

"user_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- first_name: The problem is that 'Cat' and 'Puppy' are animal names, not typical human first names. They are likely placeholder or test data that was mistakenly left in the dataset. Since these values do not represent actual human first names, they should be replaced with empty strings to indicate missing or invalid data. 
    -- last_name: The problem is that both "Developers" and "Meow" are unusual for last names. "Developers" appears to be a job title or group name rather than a family name. "Meow" is typically associated with the sound a cat makes and is not a common surname. Neither of these represents typical family names. Since we don't have additional information to determine the correct last names for these entries, the best approach is to map these unusual values to an empty string, indicating that the last name is unknown or not provided. 
    SELECT
        "user_id",
        "is_deleted",
        "account_creation_date",
        "account_expiration_date",
        "account_status",
        "division_id",
        "email",
        CASE
            WHEN "first_name" = '''Cat''' THEN ''''
            WHEN "first_name" = '''Puppy''' THEN ''''
            ELSE "first_name"
        END AS "first_name",
        "preferred_language",
        "last_login_date",
        CASE
            WHEN "last_name" = '''Developers''' THEN ''''
            WHEN "last_name" = '''Meow''' THEN ''''
            ELSE "last_name"
        END AS "last_name",
        "organization_id",
        "password_expiration_date",
        "password_last_changed_date",
        "auditable_response_count",
        "deleted_response_count",
        "generated_response_count",
        "time_zone",
        "is_unsubscribed",
        "user_type",
        "username"
    FROM "user_projected_renamed"
),

"user_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- account_creation_date: from DECIMAL to TIMESTAMP
    -- account_expiration_date: from DECIMAL to TIMESTAMP
    -- auditable_response_count: from DECIMAL to INT
    -- deleted_response_count: from DECIMAL to INT
    -- division_id: from DECIMAL to VARCHAR
    -- generated_response_count: from DECIMAL to INT
    -- is_unsubscribed: from DECIMAL to BOOLEAN
    -- last_login_date: from DECIMAL to TIMESTAMP
    -- organization_id: from DECIMAL to VARCHAR
    -- password_expiration_date: from DECIMAL to TIMESTAMP
    -- password_last_changed_date: from DECIMAL to TIMESTAMP
    -- preferred_language: from DECIMAL to VARCHAR
    -- time_zone: from DECIMAL to VARCHAR
    SELECT
        "user_id",
        "is_deleted",
        "account_status",
        "email",
        "first_name",
        "last_name",
        "user_type",
        "username",
        CAST("account_creation_date" AS TIMESTAMP) AS "account_creation_date",
        CAST("account_expiration_date" AS TIMESTAMP) AS "account_expiration_date",
        CAST("auditable_response_count" AS INT) AS "auditable_response_count",
        CAST("deleted_response_count" AS INT) AS "deleted_response_count",
        CAST("division_id" AS VARCHAR) AS "division_id",
        CAST("generated_response_count" AS INT) AS "generated_response_count",
        CAST("is_unsubscribed" AS BOOLEAN) AS "is_unsubscribed",
        CAST("last_login_date" AS TIMESTAMP) AS "last_login_date",
        CAST("organization_id" AS VARCHAR) AS "organization_id",
        CAST("password_expiration_date" AS TIMESTAMP) AS "password_expiration_date",
        CAST("password_last_changed_date" AS TIMESTAMP) AS "password_last_changed_date",
        CAST("preferred_language" AS VARCHAR) AS "preferred_language",
        CAST("time_zone" AS VARCHAR) AS "time_zone"
    FROM "user_projected_renamed_cleaned"
),

"user_projected_renamed_cleaned_casted_missing_handled" AS (
    -- Handling missing values: There are 12 columns with unacceptable missing values
    -- account_creation_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- auditable_response_count has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- deleted_response_count has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- division_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- generated_response_count has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- is_unsubscribed has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- last_login_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- organization_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- password_expiration_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- password_last_changed_date has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- preferred_language has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- time_zone has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "user_id",
        "is_deleted",
        "account_status",
        "email",
        "first_name",
        "last_name",
        "user_type",
        "username",
        "account_expiration_date"
    FROM "user_projected_renamed_cleaned_casted"
)

-- COCOON BLOCK END
SELECT * FROM "user_projected_renamed_cleaned_casted_missing_handled"

stg_user.yml (Document the table)

version: 2
models:
- name: stg_user
  description: The table is about user accounts. It contains user details like ID,
    email, name, and account status. The table includes information on account creation,
    expiration, and last login dates. It also tracks user types, organizations, and
    response counts. Some fields like language and time zone suggest customization
    options for users.
  columns:
  - name: user_id
    description: Unique identifier for the user
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each user. For this table,
        each row represents a distinct user account. user_id appears to be unique
        across rows, as it's designed to be a specific identifier for each user.
  - name: is_deleted
    description: Indicates if the record is deleted
    tests:
    - not_null
  - name: account_status
    description: Current status of the user account
    tests:
    - not_null
    - accepted_values:
        values:
        - active
        - inactive
        - suspended
        - pending
        - locked
        - closed
        - banned
        - archived
  - name: email
    description: User's email address
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the user's email address. In most systems,
        email addresses are unique for each user account. However, there's a slight
        possibility that in some systems, multiple accounts could share an email (e.g.,
        work accounts). Based on the sample data, it appears to be unique.
  - name: first_name
    description: User's first name
    tests:
    - not_null
  - name: last_name
    description: User's last name
    tests:
    - not_null
  - name: user_type
    description: Type or role of the user
    tests:
    - not_null
    - accepted_values:
        values:
        - UT_BRANDADMIN
        - UT_ADMIN
        - UT_USER
        - UT_MANAGER
        - UT_SUPPORT
        - UT_GUEST
        - UT_DEVELOPER
        - UT_ANALYST
        - UT_MODERATOR
        - UT_EDITOR
  - name: username
    description: User's username for login
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents the user's username for login. In most systems,
        usernames are unique for each user account. Based on the sample data, it appears
        to be unique across rows.
  - name: account_expiration_date
    description: Date when the user account expires
    cocoon_meta:
      missing_acceptable: Active accounts may not have an expiration date.
Some tables log change events, which may be redundant to query. Instead, we take a snapshot of the latest.

snapshot_directory_contact (first 100 rows)

directory_id id creation_date directory_unsubscribed email email_domain first_name last_name phone
0 POOL_vw3p9mEAf692Aql CID_0pkCodbTgpX4awd 2020-06-24 15:44:46.380000 False abc1@abc1.com abc1.com Saksham Malhotra NaN
1 POOL_vw3p9mEAf692Aql CID_0VVnnbsmkvMGyrj 2020-07-09 08:32:46.467000 False abc@abc.com abc.com S M 3232323.0
2 POOL_vw3p9mEAf692Aql CID_6fEsreE9bM67AVv 2020-06-24 15:49:29.330000 False abc3@abc3.com abc3.com None Malhotra NaN
3 POOL_vw3p9mEAf692Aql CID_71bfBUYt4qS6Fsp 2020-08-06 05:50:32.126000 False connect2me@gmail.com gmail.com me k Aggarwal NaN
4 POOL_vw3p9mEAf692Aql CID_d060swLlNUKGa1L 2020-06-24 15:45:38.798000 False abc2@abc2.com abc2.com Saksham None NaN

snapshot_directory_contact.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "directory_id", "id"
-- Effective date columns are "last_modified"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "directory_id",
    "id",
    "creation_date",
    "directory_unsubscribed",
    "email",
    "email_domain",
    "first_name",
    "last_name",
    "phone"
FROM (
     SELECT 
            "directory_id",
            "id",
            "creation_date",
            "directory_unsubscribed",
            "email",
            "email_domain",
            "first_name",
            "last_name",
            "phone",
            ROW_NUMBER() OVER (
                PARTITION BY "directory_id", "id" 
                ORDER BY "last_modified" 
            DESC) AS "cocoon_rn"
    FROM "stg_directory_contact"
) ranked
WHERE "cocoon_rn" = 1

snapshot_directory_contact.yml (Document the table)

version: 2
models:
- name: snapshot_directory_contact
  description: The table is about contact information in a directory. It tracks the
    most recent version of each contact, identified by directory_id and id. The table
    includes fields for email, name, phone, and subscription status. It omits version-related
    information like creation date and last modified date. Each row represents the
    latest data for a unique contact in the directory.
  columns:
  - name: directory_id
    description: directory_id
    tests:
    - not_null
  - name: id
    description: id
    tests:
    - not_null
  - name: creation_date
    description: creation_date
    tests:
    - not_null
  - name: directory_unsubscribed
    description: directory_unsubscribed
    tests:
    - not_null
  - name: email
    description: email
    tests:
    - not_null
  - name: email_domain
    description: email_domain
    tests:
    - not_null
  - name: first_name
    description: first_name
    tests:
    - not_null
  - name: last_name
    description: last_name
    tests:
    - not_null
  - name: phone
    description: phone
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_directory_contact

snapshot_directory_mailing_list (first 100 rows)

directory_id id _fivetran_deleted creation_date name owner_id
0 POOL_vw3p9mEAf692Aql CG_3lVV6y8VRw0CovA False 2020-06-19 18:41:15.768000 TestMailingList2 UR_cYdNGsrFUemlDbD
1 POOL_vw3p9mEAf692Aql CG_2Xamq59U7RFDz1a False 2023-02-24 10:47:53.810000 New Contact List (2/24/23 4:17 PM) UR_cYdNGsrFUemlDbD
2 POOL_vw3p9mEAf692Aql CG_UPYDsHbOosxP51D False 2020-08-06 05:52:13.640000 Test060820 UR_cYdNGsrFUemlDbD
3 POOL_vw3p9mEAf692Aql CG_2TOcRsxw5QF7r8d False 2020-08-07 09:48:18.474000 New Contact List (8/7/20 3:17 PM) UR_cYdNGsrFUemlDbD
4 POOL_vw3p9mEAf692Aql CG_2YVdk5sjpPIUfBW False 2020-08-10 11:43:01.844000 Test_0810 UR_cYdNGsrFUemlDbD

snapshot_directory_mailing_list.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "directory_id", "id"
-- Effective date columns are "last_modified_date"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "directory_id",
    "id",
    "_fivetran_deleted",
    "creation_date",
    "name",
    "owner_id"
FROM (
     SELECT 
            "directory_id",
            "id",
            "_fivetran_deleted",
            "creation_date",
            "name",
            "owner_id",
            ROW_NUMBER() OVER (
                PARTITION BY "directory_id", "id" 
                ORDER BY "last_modified_date" 
            DESC) AS "cocoon_rn"
    FROM "stg_directory_mailing_list"
) ranked
WHERE "cocoon_rn" = 1

snapshot_directory_mailing_list.yml (Document the table)

version: 2
models:
- name: snapshot_directory_mailing_list
  description: The table is about mailing lists in a directory. It tracks the most
    recent version of each mailing list, identified by a unique combination of directory_id
    and id. The table includes information such as the creation date, name, and owner
    of each mailing list. It excludes version-related information like last modified
    date and deletion status.
  columns:
  - name: directory_id
    description: directory_id
    tests:
    - not_null
  - name: id
    description: id
    tests:
    - not_null
  - name: _fivetran_deleted
    description: _fivetran_deleted
    tests:
    - not_null
  - name: creation_date
    description: creation_date
    tests:
    - not_null
  - name: name
    description: name
    tests:
    - not_null
  - name: owner_id
    description: owner_id
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_directory_mailing_list

snapshot_survey_version (first 100 rows)

survey_id is_deleted description is_published user_id was_ever_published creation_date version_id
0 SV_1MKtxtk2owJjOol False Autosaved version False UR_cYdNGsrFUemlDbD False 2020-06-25 13:21:42 63473806
1 SV_08GFzuA5eDhxx53 False Autosaved version False UR_cYdNGsrFUemlDbD False 2020-07-01 10:25:16 255659202
2 SV_0VZnz7EkhZquEKy False Autosaved version True UR_cYdNGsrFUemlDbD True 2022-09-29 06:51:05 20510032
3 SV_06auvDLqWENyV6t False Autosaved version False UR_cYdNGsrFUemlDbD False 2020-08-05 04:25:37 9223351

snapshot_survey_version.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "survey_id"
-- Effective date columns are "version_number"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "survey_id",
    "is_deleted",
    "description",
    "is_published",
    "user_id",
    "was_ever_published",
    "creation_date",
    "version_id"
FROM (
     SELECT 
            "survey_id",
            "is_deleted",
            "description",
            "is_published",
            "user_id",
            "was_ever_published",
            "creation_date",
            "version_id",
            ROW_NUMBER() OVER (
                PARTITION BY "survey_id" 
                ORDER BY "version_number" 
            DESC) AS "cocoon_rn"
    FROM "stg_survey_version"
) ranked
WHERE "cocoon_rn" = 1

snapshot_survey_version.yml (Document the table)

version: 2
models:
- name: snapshot_survey_version
  description: The table is about the latest versions of surveys. It tracks the most
    recent state of each survey, including its ID, deletion status, description, publication
    status, associated user, and creation date. The table indicates whether a survey
    was ever published and provides a unique identifier for each survey's latest version.
    It excludes version-specific information, focusing on the current state of each
    survey.
  columns:
  - name: survey_id
    description: Unique identifier for the survey
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: is_deleted
    description: Flag indicating if the record was deleted
    tests:
    - not_null
  - name: description
    description: Brief description of the survey version
    tests:
    - not_null
  - name: is_published
    description: Flag indicating if the version is currently published
    tests:
    - not_null
  - name: user_id
    description: Unique identifier of the user who created the version
    tests:
    - not_null
  - name: was_ever_published
    description: Flag indicating if the version was ever published
    tests:
    - not_null
  - name: creation_date
    description: Date and time when the survey version was created
    tests:
    - not_null
  - name: version_id
    description: Unique identifier for the survey version
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the survey version.
        For this table, each row represents a specific version of a survey. Given
        that it's explicitly stated as a unique identifier, it should be unique across
        all rows.
cocoon_meta:
  scd_base_table: stg_survey_version
We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_survey_response stg_survey_response stg_survey stg_survey stg_survey_response->stg_survey stg_core_contact stg_core_contact stg_core_mailing_list stg_core_mailing_list stg_core_contact->stg_core_mailing_list stg_question_response stg_question_response stg_question_response->stg_survey_response stg_question stg_question stg_question_response->stg_question stg_distribution_contact stg_distribution_contact stg_distribution_contact->stg_survey_response stg_distribution_contact->stg_core_contact stg_contact_mailing_list_membership stg_contact_mailing_list_membership stg_distribution_contact->stg_contact_mailing_list_membership stg_distribution stg_distribution stg_distribution_contact->stg_distribution snapshot_directory_contact snapshot_directory_contact stg_directory stg_directory snapshot_directory_contact->stg_directory stg_user stg_user stg_question->stg_survey stg_survey->stg_user stg_contact_mailing_list_membership->stg_core_contact stg_contact_mailing_list_membership->stg_user stg_contact_mailing_list_membership->stg_directory stg_contact_mailing_list_membership->stg_core_mailing_list snapshot_survey_version snapshot_survey_version snapshot_survey_version->stg_user snapshot_survey_version->stg_survey stg_distribution->stg_core_contact stg_distribution->stg_user stg_distribution->stg_survey stg_distribution->stg_core_mailing_list stg_block stg_block stg_block->stg_survey

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_core_contact
  primary_key: id
  foreign_keys:
  - column: mailing_list_id
    reference:
      table_name: stg_core_mailing_list
      column: id
- table_name: stg_contact_mailing_list_membership
  foreign_keys:
  - column: contact_id
    reference:
      table_name: stg_core_contact
      column: id
  - column: directory_id
    reference:
      table_name: stg_directory
      column: id
  - column: mailing_list_id
    reference:
      table_name: stg_core_mailing_list
      column: id
  - column: owner_id
    reference:
      table_name: stg_user
      column: user_id
  primary_key: contact_lookup_id
- table_name: stg_distribution
  foreign_keys:
  - column: recipient_contact_id
    reference:
      table_name: stg_core_contact
      column: id
  - column: mailing_list_id
    reference:
      table_name: stg_core_mailing_list
      column: id
  - column: owner_id
    reference:
      table_name: stg_user
      column: user_id
  - column: survey_id
    reference:
      table_name: stg_survey
      column: survey_id
  primary_key: distribution_id
- table_name: stg_distribution_contact
  foreign_keys:
  - column: contact_id
    reference:
      table_name: stg_core_contact
      column: id
  - column: distribution_id
    reference:
      table_name: stg_distribution
      column: distribution_id
  - column: contact_lookup_id
    reference:
      table_name: stg_contact_mailing_list_membership
      column: contact_lookup_id
  - column: response_id
    reference:
      table_name: stg_survey_response
      column: response_id
- table_name: stg_directory
  primary_key: id
  foreign_keys: []
- table_name: snapshot_directory_contact
  foreign_keys:
  - column: directory_id
    reference:
      table_name: stg_directory
      column: id
- table_name: stg_core_mailing_list
  primary_key: id
  foreign_keys: []
- table_name: stg_user
  primary_key: user_id
  foreign_keys: []
- table_name: stg_survey
  foreign_keys:
  - column: owner_id
    reference:
      table_name: stg_user
      column: user_id
  primary_key: survey_id
- table_name: snapshot_survey_version
  foreign_keys:
  - column: user_id
    reference:
      table_name: stg_user
      column: user_id
  - column: survey_id
    reference:
      table_name: stg_survey
      column: survey_id
- table_name: stg_block
  foreign_keys:
  - column: survey_id
    reference:
      table_name: stg_survey
      column: survey_id
- table_name: stg_question
  foreign_keys:
  - column: survey_id
    reference:
      table_name: stg_survey
      column: survey_id
  primary_key: question_id
- table_name: stg_survey_response
  foreign_keys:
  - column: survey_id
    reference:
      table_name: stg_survey
      column: survey_id
  primary_key: response_id
- table_name: stg_question_response
  foreign_keys:
  - column: response_id
    reference:
      table_name: stg_survey_response
      column: response_id
  - column: question_id
    reference:
      table_name: stg_question
      column: question_id
We identify the entities and relationships behind the tables, and tell the story among these relationships.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Contacts
  entity_description: Represents individual contacts in the system
  table_name: stg_core_contact
  primary_key: id
- entity_name: Directories
  entity_description: Represents organizational directories or structures
  table_name: stg_directory
  primary_key: id
- entity_name: Mailing Lists
  entity_description: Represents groups of contacts for mailing purposes
  table_name: stg_core_mailing_list
  primary_key: id
- entity_name: Users
  entity_description: Represents user accounts with their details and preferences
  table_name: stg_user
  primary_key: user_id
- entity_name: Surveys
  entity_description: Represents individual surveys with their configurations and
    settings
  table_name: stg_survey
  primary_key: survey_id
- entity_name: Survey Distributions
  entity_description: Represents the distribution of surveys to recipients
  table_name: stg_distribution
  primary_key: distribution_id
- entity_name: Mailing List Memberships
  entity_description: Represents the association between contacts and mailing lists
  table_name: stg_contact_mailing_list_membership
  primary_key: contact_lookup_id
- entity_name: Survey Responses
  entity_description: Represents individual responses to surveys
  table_name: stg_survey_response
  primary_key: response_id
- entity_name: Survey Questions
  entity_description: Represents individual questions within surveys
  table_name: stg_question
  primary_key: question_id
relations:
- relation_name: MailingListContacts
  relation_description: This stores the Contacts who are subscribed to specific Mailing
    Lists, including their personal information and subscription status.
  table_name: stg_core_contact
  entities:
  - Contacts
  - Mailing Lists
- relation_name: UserCreatedSurveys
  relation_description: Users create and manage Surveys, configuring various settings
    and options for each survey.
  table_name: stg_survey
  entities:
  - Surveys
  - Users
- relation_name: SurveyDistributionManagement
  relation_description: Survey Distributions are created by Users to send Surveys
    to Contacts or Mailing Lists,  tracking distribution details and recipient information.
  table_name: stg_distribution
  entities:
  - Survey Distributions
  - Contacts
  - Mailing Lists
  - Users
  - Surveys
- relation_name: ContactMailingListMembership
  relation_description: This tracks Contacts' memberships in Mailing Lists, which
    are managed by Users and organized within Directories.
  table_name: stg_contact_mailing_list_membership
  entities:
  - Mailing List Memberships
  - Contacts
  - Directories
  - Mailing Lists
  - Users
- relation_name: SurveyResponseSubmissions
  relation_description: Survey Responses are individual submissions to Surveys, capturing
    respondent details and progress for each attempt.
  table_name: stg_survey_response
  entities:
  - Survey Responses
  - Surveys
- relation_name: SurveyQuestionContent
  relation_description: Survey Questions are specific inquiries or prompts that belong
    to and comprise the content of Surveys.
  table_name: stg_question
  entities:
  - Survey Questions
  - Surveys
- relation_name: SurveyDistributionResponseTracking
  relation_description: Contacts receive Survey Distributions through Mailing List
    Memberships, potentially generating Survey Responses upon interaction with the
    distributed surveys.
  table_name: stg_distribution_contact
  entities:
  - Contacts
  - Survey Distributions
  - Mailing List Memberships
  - Survey Responses
- relation_description: This table stores contact information for individuals within
    a specific directory, identified by a unique directory ID.
  table_name: snapshot_directory_contact
  entities:
  - Directories
- relation_name: UserSurveyManagement
  relation_description: Users create and manage Surveys, tracking their latest versions,
    publication status, and other attributes.
  table_name: snapshot_survey_version
  entities:
  - Users
  - Surveys
- relation_description: This table represents the blocks within surveys, where each
    survey can contain multiple blocks for organizing questions.
  table_name: stg_block
  entities:
  - Surveys
- relation_name: QuestionResponseMapping
  relation_description: Survey Questions are answered by respondents, generating Survey
    Responses that capture specific values for each question.
  table_name: stg_question_response
  entities:
  - Survey Responses
  - Survey Questions
story:
- relation_name: UserCreatedSurveys
  story_line: Users design and configure new surveys.
- relation_name: SurveyQuestionContent
  story_line: Users add specific questions to their surveys.
- relation_name: UserSurveyManagement
  story_line: Users publish and manage their created surveys.
- relation_name: MailingListContacts
  story_line: System stores contact information for potential survey recipients.
- relation_name: ContactMailingListMembership
  story_line: Users organize contacts into mailing lists within directories.
- relation_name: SurveyDistributionManagement
  story_line: Users send surveys to contacts or mailing lists.
- relation_name: SurveyDistributionResponseTracking
  story_line: System tracks survey distribution and recipient interactions.
- relation_name: SurveyResponseSubmissions
  story_line: Recipients submit responses to distributed surveys.
- relation_name: QuestionResponseMapping
  story_line: System records specific answers for each survey question.