Synthea

We display the source tables from the data warehouses to model.

allergies

Table Preview(first 5 rows)

START STOP PATIENT ENCOUNTER CODE SYSTEM DESCRIPTION TYPE CATEGORY REACTION1 DESCRIPTION1 SEVERITY1 REACTION2 DESCRIPTION2 SEVERITY2
0 1974-09-22 NaN d84815a3-c5b3-8ca2-025f-6323a4ec59ef 703ebfb9-fa6b-eb01-664b-f145749cd9f9 288328004 SNOMED-CT Bee venom (substance) allergy environment NaN None None NaN None None
1 1974-09-22 NaN d84815a3-c5b3-8ca2-025f-6323a4ec59ef 703ebfb9-fa6b-eb01-664b-f145749cd9f9 84489001 SNOMED-CT Mold (organism) allergy environment NaN None None NaN None None
2 1974-09-22 NaN d84815a3-c5b3-8ca2-025f-6323a4ec59ef 703ebfb9-fa6b-eb01-664b-f145749cd9f9 260147004 SNOMED-CT House dust mite (organism) allergy environment NaN None None NaN None None
3 1974-09-22 NaN d84815a3-c5b3-8ca2-025f-6323a4ec59ef 703ebfb9-fa6b-eb01-664b-f145749cd9f9 264287008 SNOMED-CT Animal dander (substance) allergy environment 878820003.0 Rhinoconjunctivitis (disorder) MODERATE 247472004.0 Wheal (finding) MILD
4 1974-09-22 NaN d84815a3-c5b3-8ca2-025f-6323a4ec59ef 703ebfb9-fa6b-eb01-664b-f145749cd9f9 256277009 SNOMED-CT Grass pollen (substance) allergy environment NaN None None NaN None None

careplans

Table Preview(first 5 rows)

Id START STOP PATIENT ENCOUNTER CODE DESCRIPTION REASONCODE REASONDESCRIPTION
0 e5cc23db-c84f-3ded-411d-deb020dfcb4c 2016-04-11 2016-07-23 3dfb065a-67df-5b8a-3901-49bfd834bed1 1600b01f-01b3-fd15-135f-c7c4a30d867f 385691007 Fracture care 58150001.0 Fracture of clavicle
1 d70b9f2f-745e-f594-4983-b48790e4291f 1988-04-27 1988-05-12 db80575b-5e9b-921b-fad9-1e3a20929dc7 3c131d0c-6d15-7fe1-1372-a3d8316b6d82 225358003 Wound care 284551006.0 Laceration of foot
2 54b5c4bb-1df6-0892-7097-416ecea0f44f 1989-09-03 1990-07-10 db80575b-5e9b-921b-fad9-1e3a20929dc7 d6df3fd6-e4a0-9b51-ebd5-c6394be154eb 53950000 Respiratory therapy NaN None
3 254cfb18-3266-be35-b8c4-8a6a89ce0547 2014-01-20 2014-02-03 2ffa361e-5858-877e-e022-ce81fe32da1b 17d8fd78-be39-d2f7-6032-74d712972fa3 225358003 Wound care 370247008.0 Facial laceration
4 7e07f691-b437-c62b-4379-0a8e55812747 2017-06-08 2017-09-02 3dfb065a-67df-5b8a-3901-49bfd834bed1 5fbc8a13-6944-a854-c4cb-bbd04e969ee1 47387005 Head injury rehabilitation 62106007.0 Concussion with no loss of consciousness

claims

Table Preview(first 5 rows)

Id PATIENTID PROVIDERID PRIMARYPATIENTINSURANCEID SECONDARYPATIENTINSURANCEID DEPARTMENTID PATIENTDEPARTMENTID DIAGNOSIS1 DIAGNOSIS2 DIAGNOSIS3 DIAGNOSIS4 DIAGNOSIS5 DIAGNOSIS6 DIAGNOSIS7 DIAGNOSIS8 REFERRINGPROVIDERID APPOINTMENTID CURRENTILLNESSDATE SERVICEDATE SUPERVISINGPROVIDERID STATUS1 STATUS2 STATUSP OUTSTANDING1 OUTSTANDING2 OUTSTANDINGP LASTBILLEDDATE1 LASTBILLEDDATE2 LASTBILLEDDATEP HEALTHCARECLAIMTYPEID1 HEALTHCARECLAIMTYPEID2
0 9f4426b7-f32c-2b00-8ab4-1152cd0915ce 2ffa361e-5858-877e-e022-ce81fe32da1b 5ceef3de-8e50-3c8f-8224-d277539f4100 0 0 20 20 5251000175109 160903007.0 NaN NaN NaN NaN NaN NaN NaN c6600278-5c90-c0c7-db52-140b3b98d1e8 1962-07-25T12:00:39Z 1962-07-25T12:00:39Z 5ceef3de-8e50-3c8f-8224-d277539f4100 CLOSED CLOSED CLOSED 0.0 0.0 0.0 1962-07-25T12:15:39Z 1962-07-25T12:15:39Z 1962-07-25T12:15:39Z 1 1
1 e22515cc-2422-7aac-49bd-ecc3a2ffdf86 2ffa361e-5858-877e-e022-ce81fe32da1b 5ceef3de-8e50-3c8f-8224-d277539f4100 0 0 20 20 160903007 NaN NaN NaN NaN NaN NaN NaN NaN 5d6bce0a-93b6-1c60-4399-97a717a3f843 1963-07-31T12:00:39Z 1963-07-31T12:00:39Z 5ceef3de-8e50-3c8f-8224-d277539f4100 CLOSED CLOSED CLOSED 0.0 0.0 0.0 1963-07-31T12:15:39Z 1963-07-31T12:15:39Z 1963-07-31T12:15:39Z 1 1
2 7ac8ce29-5517-d3d5-5004-f11942ae3225 2ffa361e-5858-877e-e022-ce81fe32da1b 5ceef3de-8e50-3c8f-8224-d277539f4100 0 0 20 20 160903007 NaN NaN NaN NaN NaN NaN NaN NaN 829830eb-ca6e-ae6a-99a2-728751b0ea8e 1966-08-03T12:00:39Z 1966-08-03T12:00:39Z 5ceef3de-8e50-3c8f-8224-d277539f4100 CLOSED CLOSED CLOSED 0.0 0.0 0.0 1966-08-03T12:15:39Z 1966-08-03T12:15:39Z 1966-08-03T12:15:39Z 1 1
3 92a599e7-23f3-c14d-343e-635eded378a6 2ffa361e-5858-877e-e022-ce81fe32da1b 5ceef3de-8e50-3c8f-8224-d277539f4100 0 0 20 20 160903007 NaN NaN NaN NaN NaN NaN NaN NaN 0f6ff0c6-c365-84f5-6742-226325f0ab22 1972-08-09T12:00:39Z 1972-08-09T12:00:39Z 5ceef3de-8e50-3c8f-8224-d277539f4100 CLOSED CLOSED CLOSED 0.0 0.0 0.0 1972-08-09T12:15:39Z 1972-08-09T12:15:39Z 1972-08-09T12:15:39Z 1 1
4 c409c86b-ece6-a55c-79a9-c066fe56433a 3dfb065a-67df-5b8a-3901-49bfd834bed1 5053c59c-2f74-3246-9fda-cf2fbe4f6972 26aab0cd-6aba-3e1b-ac5b-05c8867e762c 0 10 10 410620009 NaN NaN NaN NaN NaN NaN NaN NaN 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13T01:05:00Z 2013-01-13T01:05:00Z 5053c59c-2f74-3246-9fda-cf2fbe4f6972 CLOSED CLOSED CLOSED 0.0 0.0 0.0 2013-01-13T01:20:00Z 2013-01-13T01:20:00Z 2013-01-13T01:20:00Z 1 1

claims_transactions

Table Preview(first 5 rows)

ID CLAIMID CHARGEID PATIENTID TYPE AMOUNT METHOD FROMDATE TODATE PLACEOFSERVICE PROCEDURECODE MODIFIER1 MODIFIER2 DIAGNOSISREF1 DIAGNOSISREF2 DIAGNOSISREF3 DIAGNOSISREF4 UNITS DEPARTMENTID NOTES UNITAMOUNT TRANSFEROUTID TRANSFERTYPE PAYMENTS ADJUSTMENTS TRANSFERS OUTSTANDING APPOINTMENTID LINENOTE PATIENTINSURANCEID FEESCHEDULEID PROVIDERID SUPERVISINGPROVIDERID
0 da18b712-59be-0489-1ba0-4830aef61968 9f4426b7-f32c-2b00-8ab4-1152cd0915ce 0 2ffa361e-5858-877e-e022-ce81fe32da1b CHARGE 136.8 None 1962-07-25T12:00:39Z 1962-07-25T12:15:39Z f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 162673000 NaN NaN 1 2.0 NaN NaN 1 20 General examination of patient (procedure) 136.8 NaN p 0.0 0.0 NaN 0.0 c6600278-5c90-c0c7-db52-140b3b98d1e8 NaN None 1 5ceef3de-8e50-3c8f-8224-d277539f4100 5ceef3de-8e50-3c8f-8224-d277539f4100
1 907b5d5f-4d34-385d-8fd9-564e782a8743 9f4426b7-f32c-2b00-8ab4-1152cd0915ce 1 2ffa361e-5858-877e-e022-ce81fe32da1b PAYMENT NaN CHECK 1962-07-25T12:00:39Z 1962-07-25T12:15:39Z f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 162673000 NaN NaN 1 2.0 NaN NaN 1 20 General examination of patient (procedure) 0.0 NaN None 136.8 0.0 NaN 0.0 c6600278-5c90-c0c7-db52-140b3b98d1e8 NaN None 1 5ceef3de-8e50-3c8f-8224-d277539f4100 5ceef3de-8e50-3c8f-8224-d277539f4100
2 bb0601d7-9d50-d1cc-488d-7e29cd95b49c e22515cc-2422-7aac-49bd-ecc3a2ffdf86 3 2ffa361e-5858-877e-e022-ce81fe32da1b CHARGE 136.8 None 1963-07-31T12:00:39Z 1963-07-31T12:15:39Z f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 162673000 NaN NaN 1 NaN NaN NaN 1 20 General examination of patient (procedure) 136.8 NaN p 0.0 0.0 NaN 0.0 5d6bce0a-93b6-1c60-4399-97a717a3f843 NaN None 1 5ceef3de-8e50-3c8f-8224-d277539f4100 5ceef3de-8e50-3c8f-8224-d277539f4100
3 555840f3-d0fc-4bb2-52fc-477e525e9ff1 e22515cc-2422-7aac-49bd-ecc3a2ffdf86 4 2ffa361e-5858-877e-e022-ce81fe32da1b PAYMENT NaN CASH 1963-07-31T12:00:39Z 1963-07-31T12:15:39Z f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 162673000 NaN NaN 1 NaN NaN NaN 1 20 General examination of patient (procedure) 0.0 NaN None 136.8 0.0 NaN 0.0 5d6bce0a-93b6-1c60-4399-97a717a3f843 NaN None 1 5ceef3de-8e50-3c8f-8224-d277539f4100 5ceef3de-8e50-3c8f-8224-d277539f4100
4 7e46adfb-5b4f-4454-037a-2d834eefc4e8 7ac8ce29-5517-d3d5-5004-f11942ae3225 6 2ffa361e-5858-877e-e022-ce81fe32da1b CHARGE 136.8 None 1966-08-03T12:00:39Z 1966-08-03T12:15:39Z f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 162673000 NaN NaN 1 NaN NaN NaN 1 20 General examination of patient (procedure) 136.8 NaN p 0.0 0.0 NaN 0.0 829830eb-ca6e-ae6a-99a2-728751b0ea8e NaN None 1 5ceef3de-8e50-3c8f-8224-d277539f4100 5ceef3de-8e50-3c8f-8224-d277539f4100

conditions

Table Preview(first 5 rows)

START STOP PATIENT ENCOUNTER CODE DESCRIPTION
0 1962-07-25 None 2ffa361e-5858-877e-e022-ce81fe32da1b c6600278-5c90-c0c7-db52-140b3b98d1e8 5251000175109 Received certificate of high school equivalency (finding)
1 1962-07-25 1963-07-31 2ffa361e-5858-877e-e022-ce81fe32da1b c6600278-5c90-c0c7-db52-140b3b98d1e8 160903007 Full-time employment (finding)
2 1963-07-31 1966-08-03 2ffa361e-5858-877e-e022-ce81fe32da1b 5d6bce0a-93b6-1c60-4399-97a717a3f843 160903007 Full-time employment (finding)
3 1966-08-03 1969-08-06 2ffa361e-5858-877e-e022-ce81fe32da1b 829830eb-ca6e-ae6a-99a2-728751b0ea8e 160903007 Full-time employment (finding)
4 1972-08-09 1975-08-13 2ffa361e-5858-877e-e022-ce81fe32da1b 0f6ff0c6-c365-84f5-6742-226325f0ab22 160903007 Full-time employment (finding)

devices

Table Preview(first 5 rows)

START STOP PATIENT ENCOUNTER CODE DESCRIPTION UDI
0 2004-12-16T19:00:39Z None 2ffa361e-5858-877e-e022-ce81fe32da1b 7cde5dee-2065-db54-2aca-5335fca85bd5 170615005 Home nebulizer (physical object) (01)22513743438051(11)041125(17)291210(10)68326934928628(21)359700838
1 2022-09-25T02:44:04Z None eb247227-e839-88d3-447d-b5972468f33b 1cedec53-a362-320c-d52a-ddbc57fbce63 228869008 Manual wheelchair (physical object) (01)10803398532575(11)220904(17)470919(10)77584230(21)506502350562
2 2003-01-26T18:35:55Z None d84815a3-c5b3-8ca2-025f-6323a4ec59ef becb96b4-e1aa-1b80-0aa9-8b06e258871f 170615005 Home nebulizer (physical object) (01)91908954162042(11)030105(17)280120(10)384840111239630881(21)2214828292399
3 2017-04-19T15:37:58Z 2017-05-15T15:37:58Z 7ec76836-c039-d9bf-8bb9-fe488c66d452 4e5eeece-1a14-8ae0-874f-15d9c6f42a34 228869008 Manual wheelchair (physical object) (01)75341347423725(11)170329(17)420413(10)91627316426(21)736039710205635138
4 2014-10-18T16:39:19Z 2015-01-09T17:11:06Z 734e5f3c-e660-6cbe-7c26-c5264cbde68e a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 228869008 Manual wheelchair (physical object) (01)44815148141981(11)140927(17)391012(10)395970756765435894(21)565882491188675

encounters

Table Preview(first 5 rows)

Id START STOP PATIENT ORGANIZATION PROVIDER PAYER ENCOUNTERCLASS CODE DESCRIPTION BASE_ENCOUNTER_COST TOTAL_CLAIM_COST PAYER_COVERAGE REASONCODE REASONDESCRIPTION
0 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13T01:05:00Z 2013-01-13T01:20:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 0e061004-bcc0-3e99-a237-d700823e817b 5053c59c-2f74-3246-9fda-cf2fbe4f6972 26aab0cd-6aba-3e1b-ac5b-05c8867e762c wellness 410620009 Well child visit (procedure) 136.8 272.80 272.8 NaN None
1 79b413f6-c668-bf41-bce7-b403c40cf7a4 2021-09-23T02:44:04Z 2021-09-23T02:59:04Z eb247227-e839-88d3-447d-b5972468f33b ea153d0a-9ccc-3769-9a25-934b417071b7 a9a4676a-e232-3eef-a874-e7af0fa54350 734afbd6-4794-363b-9bc0-6a3981533ed5 wellness 410620009 Well child visit (procedure) 136.8 871.12 0.0 NaN None
2 c6600278-5c90-c0c7-db52-140b3b98d1e8 1962-07-25T12:00:39Z 1962-07-25T12:15:39Z 2ffa361e-5858-877e-e022-ce81fe32da1b f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 5ceef3de-8e50-3c8f-8224-d277539f4100 b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 wellness 162673000 General examination of patient (procedure) 136.8 704.20 0.0 NaN None
3 5d6bce0a-93b6-1c60-4399-97a717a3f843 1963-07-31T12:00:39Z 1963-07-31T12:15:39Z 2ffa361e-5858-877e-e022-ce81fe32da1b f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 5ceef3de-8e50-3c8f-8224-d277539f4100 b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 wellness 162673000 General examination of patient (procedure) 136.8 704.20 0.0 NaN None
4 829830eb-ca6e-ae6a-99a2-728751b0ea8e 1966-08-03T12:00:39Z 1966-08-03T12:15:39Z 2ffa361e-5858-877e-e022-ce81fe32da1b f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 5ceef3de-8e50-3c8f-8224-d277539f4100 b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 wellness 162673000 General examination of patient (procedure) 136.8 778.78 0.0 NaN None

imaging_studies

Table Preview(first 5 rows)

Id DATE_ PATIENT ENCOUNTER SERIES_UID BODYSITE_CODE BODYSITE_DESCRIPTION MODALITY_CODE MODALITY_DESCRIPTION INSTANCE_UID SOP_CODE SOP_DESCRIPTION PROCEDURE_CODE
0 b7b3078a-86e4-fc51-1720-d33afcd4ce88 2016-04-11T03:12:57Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 1600b01f-01b3-fd15-135f-c7c4a30d867f 1.2.840.99999999.1.63419662.1460344377161 51299004 Clavicle DX Digital Radiography 1.2.840.99999999.1.1.35771508.1103224827588 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 168594001
1 6f1b4250-2e8e-eaaf-f6a7-655d8f8e5791 2022-09-25T02:44:04Z eb247227-e839-88d3-447d-b5972468f33b 1cedec53-a362-320c-d52a-ddbc57fbce63 1.2.840.99999999.1.83549147.1664073844511 344001 Ankle DX Digital Radiography 1.2.840.99999999.1.1.49048804.1413650359123 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 19490002
2 932a8e0d-20d5-573b-db47-8b9db9d3b99c 1991-05-28T22:13:31Z db80575b-5e9b-921b-fad9-1e3a20929dc7 59f48dfd-f1f6-faa7-f378-150137d46be4 1.2.840.99999999.1.10943528.675468811165 40983000 Arm DX Digital Radiography 1.2.840.99999999.1.1.93692690.675468811165 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 1225002
3 c16fda8c-2353-75d9-f7c2-d91c773d87de 2014-10-18T16:39:19Z 734e5f3c-e660-6cbe-7c26-c5264cbde68e a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 1.2.840.99999999.1.70861523.1413650359123 344001 Ankle DX Digital Radiography 1.2.840.99999999.1.1.49048804.1413650359123 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 19490002
4 ef1ee1a1-e906-6744-421a-ab8480b8b8ae 2013-11-23T00:28:23Z 750cdaf4-c264-e967-e76b-53a5a61abcab 0148026b-b4db-1663-19ed-a6a51d3c33b2 1.2.840.99999999.1.26012892.1385166503634 51299004 Clavicle DX Digital Radiography 1.2.840.99999999.1.1.35771508.1103224827588 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 168594001

immunizations

Table Preview(first 5 rows)

DATE_ PATIENT ENCOUNTER CODE DESCRIPTION BASE_COST
0 2013-01-13T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b80cac7-383e-b479-1a05-7d8a9de45d04 140 Influenza seasonal injectable preservative free 136.0
1 1986-06-17T21:35:49Z db80575b-5e9b-921b-fad9-1e3a20929dc7 07969a4d-f9f0-c63f-6a0e-ecb659a20571 140 Influenza seasonal injectable preservative free 136.0
2 2014-01-19T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b1ef1fb-3297-bf02-8442-f68af013a64c 21 varicella 136.0
3 2014-01-19T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b1ef1fb-3297-bf02-8442-f68af013a64c 10 IPV 136.0
4 2014-01-19T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b1ef1fb-3297-bf02-8442-f68af013a64c 140 Influenza seasonal injectable preservative free 136.0

medications

Table Preview(first 5 rows)

START STOP PATIENT PAYER ENCOUNTER CODE DESCRIPTION BASE_COST PAYER_COVERAGE DISPENSES TOTALCOST REASONCODE REASONDESCRIPTION
0 2009-09-02T12:00:39Z None 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a 546912c2-aa40-a2e0-f148-af23fa171af0 309362 Clopidogrel 75 MG Oral Tablet 631.42 0.00 162 102290.04 NaN None
1 2009-09-02T12:00:39Z None 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a 546912c2-aa40-a2e0-f148-af23fa171af0 312961 Simvastatin 20 MG Oral Tablet 323.20 29.40 162 52358.40 NaN None
2 2009-09-02T12:00:39Z None 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a 546912c2-aa40-a2e0-f148-af23fa171af0 866412 24 HR metoprolol succinate 100 MG Extended Release Oral Tablet 129.94 99.94 162 21050.28 NaN None
3 2009-09-02T12:00:39Z None 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a 546912c2-aa40-a2e0-f148-af23fa171af0 705129 Nitroglycerin 0.4 MG/ACTUAT Mucosal Spray 702.61 672.61 162 113822.82 NaN None
4 2016-04-11T04:55:57Z 2016-07-23T04:55:57Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c 1600b01f-01b3-fd15-135f-c7c4a30d867f 198405 Ibuprofen 100 MG Oral Tablet 332.42 0.00 3 997.26 NaN None

observations

Table Preview(first 5 rows)

DATE_ PATIENT ENCOUNTER CATEGORY CODE DESCRIPTION VALUE_ UNITS TYPE
0 2021-09-23T02:44:04Z eb247227-e839-88d3-447d-b5972468f33b 79b413f6-c668-bf41-bce7-b403c40cf7a4 vital-signs 8302-2 Body Height 54.4 cm numeric
1 2021-09-23T02:44:04Z eb247227-e839-88d3-447d-b5972468f33b 79b413f6-c668-bf41-bce7-b403c40cf7a4 vital-signs 72514-3 Pain severity - 0-10 verbal numeric rating [Score] - Reported 3.0 {score} numeric
2 2021-09-23T02:44:04Z eb247227-e839-88d3-447d-b5972468f33b 79b413f6-c668-bf41-bce7-b403c40cf7a4 vital-signs 29463-7 Body Weight 4.8 kg numeric
3 2013-01-13T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b80cac7-383e-b479-1a05-7d8a9de45d04 vital-signs 8302-2 Body Height 92.3 cm numeric
4 2013-01-13T01:05:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b80cac7-383e-b479-1a05-7d8a9de45d04 vital-signs 72514-3 Pain severity - 0-10 verbal numeric rating [Score] - Reported 3.0 {score} numeric

organizations

Table Preview(first 5 rows)

Id NAME ADDRESS CITY STATE ZIP LAT LON PHONE REVENUE UTILIZATION
0 74ab949d-17ac-3309-83a0-13b4405c66aa Fitchburg Outpatient Clinic 881 Main Street Fitchburg MA 1420 42.586487 -71.805210 978-342-9781 Or 978-342-9781 0.0 131
1 588f6ce6-b8db-3588-8189-29db2680a313 BOSTON HEALTH CARE FOR THE HOMELESS PROGRAM INC 461 WALNUT AVE JAMAICA PLAIN MA 21302331 42.311588 -71.098001 8576541550 0.0 30
2 e09d4c49-c2ef-3b0f-9a46-3719d9219306 UMASS MEMORIAL HEALTHALLIANCE CLINTON HOSPITAL INC 60 HOSPITAL RD LEOMINSTER MA 14533290 42.540319 -71.763130 9784662000 0.0 8
3 aa682136-a4df-3942-9f50-e6fcee6c0b73 SOUTHWOOD AT NORWELL NURSING CTR 501 CORDWAINER DRIVE NORWELL MA 20611630 42.164707 -70.881491 7819827450 0.0 12
4 ab701a70-a658-340e-8f69-ee196a7d40c6 CAREWELL URGENT CARE CENTERS OF MA PC 484 RTE 134 WORCESTER MA 16071728 42.262562 -71.801888 5086947901 0.0 3

patients

Table Preview(first 5 rows)

Id BIRTHDATE DEATHDATE SSN DRIVERS PASSPORT PREFIX FIRST_ LAST_ SUFFIX MAIDEN MARITAL RACE ETHNICITY GENDER BIRTHPLACE ADDRESS CITY STATE COUNTY FIPS ZIP LAT LON HEALTHCARE_EXPENSES HEALTHCARE_COVERAGE INCOME
0 eb247227-e839-88d3-447d-b5972468f33b 2021-09-23 None 999-41-1756 None None None Mel236 Bailey598 None None None white nonhispanic M Norton Center Massachusetts US 716 Wunsch Gardens Unit 48 Framingham Massachusetts Middlesex County 25017.0 1701 42.275650 -71.476367 2520.80 4323.64 170754
1 2ffa361e-5858-877e-e022-ce81fe32da1b 1944-05-31 None 999-33-4589 S99957814 X45639058X Mrs. Cheyenne169 Marks830 None Lowe577 M white nonhispanic F Longmeadow Massachusetts US 123 Bayer Camp Taunton Massachusetts Bristol County 25005.0 2718 41.892884 -71.066686 205342.20 94647.00 40526
2 3dfb065a-67df-5b8a-3901-49bfd834bed1 2009-02-08 None 999-59-2568 None None None Hunter736 Keebler762 None None None white nonhispanic M Maynard Massachusetts US 575 Jast Rue Unit 48 Winchendon Massachusetts Worcester County 25027.0 1475 42.670059 -72.074664 16381.92 17447.87 79884
3 db80575b-5e9b-921b-fad9-1e3a20929dc7 1979-06-26 1995-07-04 999-77-7700 S99968506 None None Herschel574 Ernser583 None None None asian nonhispanic M Somerville Massachusetts US 184 Langworth Parade Apt 10 Boston Massachusetts Suffolk County 25025.0 2131 42.395516 -71.059015 3850.00 44057.32 6420
4 d84815a3-c5b3-8ca2-025f-6323a4ec59ef 1973-05-31 None 999-29-2359 S99967405 X86891718X Mrs. Lacey714 Heathcote539 None Hegmann834 M white nonhispanic F Natick Massachusetts US 801 Morissette Divide Hingham Massachusetts Plymouth County 25023.0 2043 42.200723 -70.836590 66662.10 1777031.06 933420

payers

Table Preview(first 5 rows)

Id NAME OWNERSHIP ADDRESS CITY STATE_HEADQUARTERED ZIP PHONE AMOUNT_COVERED AMOUNT_UNCOVERED REVENUE COVERED_ENCOUNTERS UNCOVERED_ENCOUNTERS COVERED_MEDICATIONS UNCOVERED_MEDICATIONS COVERED_PROCEDURES UNCOVERED_PROCEDURES COVERED_IMMUNIZATIONS UNCOVERED_IMMUNIZATIONS UNIQUE_CUSTOMERS QOLS_AVG MEMBER_MONTHS
0 a735bf55-83e9-331a-899d-a82a60b9f60c Medicare GOVERNMENT NaN NaN NaN NaN NaN 3245258.85 57324.03 460290.60 927 0 918 0 1270 0 202 0 16 0.753073 2748
1 df166300-5a78-3502-a46a-832842197811 Medicaid GOVERNMENT NaN NaN NaN NaN NaN 6112421.45 103336.16 45600.00 1217 0 286 0 1908 0 534 0 19 0.953093 4524
2 d31fccc3-1767-390d-966a-22a5156f4219 UnitedHealthcare PRIVATE NaN NaN NaN NaN NaN 37929.59 28070.66 218949.27 49 0 10 0 36 0 41 0 3 0.800393 384
3 8fa6c185-e44e-3e34-8bd8-39be8694f4ce Cigna Health PRIVATE NaN NaN NaN NaN NaN 2110812.70 318590.30 2102717.53 727 0 236 0 765 0 412 0 13 0.939912 3660
4 d18ef2e6-ef40-324c-be54-34a5ee865625 Dual Eligible GOVERNMENT NaN NaN NaN NaN NaN 931603.10 8658.65 2990.00 201 0 84 0 231 0 48 0 5 0.878956 300

payer_transitions

Table Preview(first 5 rows)

PATIENT MEMBERID START_DATE END_DATE PAYER SECONDARY_PAYER PLAN_OWNERSHIP OWNER_NAME
0 eb247227-e839-88d3-447d-b5972468f33b 74d0f7af-d6c6-7848-9f62-c3f165170e09 2021-09-23T02:44:04Z 2022-09-29T02:44:04Z 734afbd6-4794-363b-9bc0-6a3981533ed5 None Guardian Solomon675 Bailey598
1 eb247227-e839-88d3-447d-b5972468f33b 74d0f7af-d6c6-7848-9f62-c3f165170e09 2022-09-29T02:44:04Z 2023-09-29T02:44:04Z 734afbd6-4794-363b-9bc0-6a3981533ed5 None Guardian Solomon675 Bailey598
2 3dfb065a-67df-5b8a-3901-49bfd834bed1 d4ef17f8-1714-3490-17af-70d4d0c16478 2009-02-08T01:05:00Z 2010-02-14T01:05:00Z 26aab0cd-6aba-3e1b-ac5b-05c8867e762c None Guardian Hong136 Keebler762
3 3dfb065a-67df-5b8a-3901-49bfd834bed1 d4ef17f8-1714-3490-17af-70d4d0c16478 2010-02-14T01:05:00Z 2011-02-20T01:05:00Z 26aab0cd-6aba-3e1b-ac5b-05c8867e762c None Guardian Hong136 Keebler762
4 3dfb065a-67df-5b8a-3901-49bfd834bed1 d4ef17f8-1714-3490-17af-70d4d0c16478 2011-02-20T01:05:00Z 2012-02-26T01:05:00Z 26aab0cd-6aba-3e1b-ac5b-05c8867e762c None Guardian Hong136 Keebler762

procedures

Table Preview(first 5 rows)

START STOP PATIENT ENCOUNTER CODE DESCRIPTION BASE_COST REASONCODE REASONDESCRIPTION
0 2014-01-19T01:05:00Z 2014-01-19T01:20:00Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 8b1ef1fb-3297-bf02-8442-f68af013a64c 430193006 Medication Reconciliation (procedure) 493.03 NaN None
1 2021-09-23T02:44:04Z 2021-09-23T02:59:04Z eb247227-e839-88d3-447d-b5972468f33b 79b413f6-c668-bf41-bce7-b403c40cf7a4 430193006 Medication Reconciliation (procedure) 523.74 NaN None
2 2021-10-28T02:44:04Z 2021-10-28T02:59:04Z eb247227-e839-88d3-447d-b5972468f33b a79858c9-a74e-8ca3-c9a0-b63897e8fc05 430193006 Medication Reconciliation (procedure) 536.16 NaN None
3 2016-04-11T03:12:57Z 2016-04-11T03:42:57Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 1600b01f-01b3-fd15-135f-c7c4a30d867f 168594001 Clavicle X-ray 431.40 NaN None
4 2016-04-11T03:12:57Z 2016-04-11T04:55:57Z 3dfb065a-67df-5b8a-3901-49bfd834bed1 1600b01f-01b3-fd15-135f-c7c4a30d867f 305428000 Admission to orthopedic department 431.40 58150001.0 Fracture of clavicle

providers

Table Preview(first 5 rows)

Id ORGANIZATION NAME GENDER SPECIALITY ADDRESS CITY STATE ZIP LAT LON ENCOUNTERS PROCEDURES
0 6e3f59d7-742a-3b38-a4b1-1b471ed651a2 74ab949d-17ac-3309-83a0-13b4405c66aa Gabriel934 Reilly981 F GENERAL PRACTICE 881 Main Street Fitchburg MA 1420 42.586487 -71.805210 131 0
1 63bf819f-a8ee-3bc1-b34a-787fe8e8e4dc 588f6ce6-b8db-3588-8189-29db2680a313 Terence292 Brakus656 M GENERAL PRACTICE 461 WALNUT AVE JAMAICA PLAIN MA 21302331 42.311588 -71.098001 30 0
2 ec207479-d18d-3b2d-9f80-f76ba864aefb e09d4c49-c2ef-3b0f-9a46-3719d9219306 Jules135 Emard19 M GENERAL PRACTICE 60 HOSPITAL RD LEOMINSTER MA 14533290 42.540319 -71.763130 8 0
3 f751fb18-fa7d-346e-8b95-7b8cb0dbcf32 aa682136-a4df-3942-9f50-e6fcee6c0b73 Precious140 Runolfsson901 F GENERAL PRACTICE 501 CORDWAINER DRIVE NORWELL MA 20611630 42.164707 -70.881491 12 0
4 5afd05ef-bd29-32bc-bb4f-a17144628cbd ab701a70-a658-340e-8f69-ee196a7d40c6 Eduardo902 Villegas15 M GENERAL PRACTICE 484 RTE 134 WORCESTER MA 16071728 42.262562 -71.801888 3 0

supplies

Table Preview(first 5 rows)

DATE_ PATIENT ENCOUNTER CODE DESCRIPTION QUANTITY
0 1993-07-26 b3b71304-fe5b-bda4-6822-bd901b2836d1 71154994-99cb-1859-da47-be6aefdc4632 337388004 Blood glucose testing strips (physical object) 50
1 1999-08-02 b3b71304-fe5b-bda4-6822-bd901b2836d1 4e7c8052-dcb0-0b2d-76e9-42638068f34d 337388004 Blood glucose testing strips (physical object) 50
2 2002-05-20 b3b71304-fe5b-bda4-6822-bd901b2836d1 f3905515-1215-b9dc-c4dd-e5113939678a 337388004 Blood glucose testing strips (physical object) 50
3 2004-05-24 b3b71304-fe5b-bda4-6822-bd901b2836d1 53550222-7920-d954-8131-bdc89637746d 337388004 Blood glucose testing strips (physical object) 50
4 2005-09-12 b3b71304-fe5b-bda4-6822-bd901b2836d1 027a6834-445b-9d67-f1a8-437d8c1af741 337388004 Blood glucose testing strips (physical object) 50
We display the stage tables, their SQL transformations, and their configurations.

stg_allergies

Table Preview (first 5 rows)

coding_system allergen_description sensitivity_type allergen_category primary_reaction_description primary_reaction_severity secondary_reaction_description secondary_reaction_severity allergen_code allergy_end_date allergy_start_date encounter_id patient_id primary_reaction_code secondary_reaction_code
0 SNOMED-CT Bee venom (substance) allergy environment None None None None 288328004 NaT 1974-09-22 703ebfb9-fa6b-eb01-664b-f145749cd9f9 d84815a3-c5b3-8ca2-025f-6323a4ec59ef None None
1 SNOMED-CT Mold (organism) allergy environment None None None None 84489001 NaT 1974-09-22 703ebfb9-fa6b-eb01-664b-f145749cd9f9 d84815a3-c5b3-8ca2-025f-6323a4ec59ef None None
2 SNOMED-CT House dust mite (organism) allergy environment None None None None 260147004 NaT 1974-09-22 703ebfb9-fa6b-eb01-664b-f145749cd9f9 d84815a3-c5b3-8ca2-025f-6323a4ec59ef None None
3 SNOMED-CT Animal dander (substance) allergy environment Rhinoconjunctivitis (disorder) MODERATE Wheal (finding) MILD 264287008 NaT 1974-09-22 703ebfb9-fa6b-eb01-664b-f145749cd9f9 d84815a3-c5b3-8ca2-025f-6323a4ec59ef 878820003.0 247472004.0
4 SNOMED-CT Grass pollen (substance) allergy environment None None None None 256277009 NaT 1974-09-22 703ebfb9-fa6b-eb01-664b-f145749cd9f9 d84815a3-c5b3-8ca2-025f-6323a4ec59ef None None

stg_allergies.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:46:26.795976+00:00
WITH 
"allergies_renamed" AS (
    -- Rename: Renaming columns
    -- START -> allergy_start_date
    -- STOP -> allergy_end_date
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> allergen_code
    -- SYSTEM -> coding_system
    -- DESCRIPTION -> allergen_description
    -- TYPE -> sensitivity_type
    -- CATEGORY -> allergen_category
    -- REACTION1 -> primary_reaction_code
    -- DESCRIPTION1 -> primary_reaction_description
    -- SEVERITY1 -> primary_reaction_severity
    -- REACTION2 -> secondary_reaction_code
    -- DESCRIPTION2 -> secondary_reaction_description
    -- SEVERITY2 -> secondary_reaction_severity
    SELECT 
        "START" AS "allergy_start_date",
        "STOP" AS "allergy_end_date",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "allergen_code",
        "SYSTEM" AS "coding_system",
        "DESCRIPTION" AS "allergen_description",
        "TYPE" AS "sensitivity_type",
        "CATEGORY" AS "allergen_category",
        "REACTION1" AS "primary_reaction_code",
        "DESCRIPTION1" AS "primary_reaction_description",
        "SEVERITY1" AS "primary_reaction_severity",
        "REACTION2" AS "secondary_reaction_code",
        "DESCRIPTION2" AS "secondary_reaction_description",
        "SEVERITY2" AS "secondary_reaction_severity"
    FROM "memory"."main"."allergies"
),

"allergies_renamed_casted" AS (
    -- Column Type Casting: 
    -- allergen_code: from INT to VARCHAR
    -- allergy_end_date: from DECIMAL to DATE
    -- allergy_start_date: from VARCHAR to DATE
    -- encounter_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- primary_reaction_code: from DECIMAL to VARCHAR
    -- secondary_reaction_code: from DECIMAL to VARCHAR
    SELECT
        "coding_system",
        "allergen_description",
        "sensitivity_type",
        "allergen_category",
        "primary_reaction_description",
        "primary_reaction_severity",
        "secondary_reaction_description",
        "secondary_reaction_severity",
        CAST("allergen_code" AS VARCHAR) 
        AS "allergen_code",
        CAST("allergy_end_date" AS DATE) 
        AS "allergy_end_date",
        CAST("allergy_start_date" AS DATE) 
        AS "allergy_start_date",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("primary_reaction_code" AS VARCHAR) 
        AS "primary_reaction_code",
        CAST("secondary_reaction_code" AS VARCHAR) 
        AS "secondary_reaction_code"
    FROM "allergies_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "allergies_renamed_casted"

stg_allergies.yml (Document the table)

version: 2
models:
- name: stg_allergies
  description: The table is about allergies. It contains patient and encounter IDs.
    Each row represents an allergy with its code, description, type, and category.
    Some allergies have associated reactions and their severities. The allergies are
    mostly environmental, starting on the same date for this patient. Common allergens
    include bee venom, mold, dust mites, animal dander, and grass pollen.
  columns:
  - name: coding_system
    description: Coding system used for the allergen code
    tests:
    - not_null
    - accepted_values:
        values:
        - SNOMED-CT
        - RxNorm
    cocoon_meta:
      future_accepted_values:
      - ICD-10
      - LOINC
      - NDC
      - CPT
      - HCPCS
      - ATC
      - MedDRA
      - UMLS
      - NDF-RT
      - UNII
      data_type:
        current_data_type: VARCHAR
  - name: allergen_description
    description: Description of the allergen
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: sensitivity_type
    description: Type of sensitivity (e.g., allergy)
    tests:
    - not_null
    - accepted_values:
        values:
        - allergy
        - intolerance
    cocoon_meta:
      future_accepted_values:
      - hypersensitivity
      - idiosyncrasy
      - autoimmune reaction
      - adverse reaction
      - toxicity
      - chemical sensitivity
      - food sensitivity
      - drug sensitivity
      - environmental sensitivity
      data_type:
        current_data_type: VARCHAR
  - name: allergen_category
    description: Category of the allergen (e.g., environment)
    tests:
    - not_null
    - accepted_values:
        values:
        - environment
        - food
        - medication
    cocoon_meta:
      future_accepted_values:
      - insect
      - latex
      - pet
      - occupational
      data_type:
        current_data_type: VARCHAR
  - name: primary_reaction_description
    description: Description of the first allergic reaction
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: primary_reaction_severity
    description: Severity of the first allergic reaction
    tests:
    - accepted_values:
        values:
        - MODERATE
        - MILD
        - SEVERE
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: secondary_reaction_description
    description: Description of the second allergic reaction
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: secondary_reaction_severity
    description: Severity of the second allergic reaction
    tests:
    - accepted_values:
        values:
        - MILD
        - MODERATE
    cocoon_meta:
      missing_reason: Unknown
      future_accepted_values:
      - SEVERE
      - CRITICAL
      - NONE
      data_type:
        current_data_type: VARCHAR
  - name: allergen_code
    description: Standardized code for the allergen
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: allergy_end_date
    description: End date of the allergy record, if applicable
    cocoon_meta:
      missing_reason: Ongoing allergies don't have an end date.
      data_type:
        current_data_type: DATE
  - name: allergy_start_date
    description: Start date of the allergy record
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DATE
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: primary_reaction_code
    description: Code for the first allergic reaction
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: secondary_reaction_code
    description: Code for the second allergic reaction
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_claims

Table Preview (first 5 rows)

secondary_insurance_id primary_claim_type_id patient_outstanding_amount claim_type_id secondary_outstanding_amount primary_claim_status department_id patient_claim_status secondary_claim_status primary_outstanding_amount patient_department_id appointment_id claim_id illness_onset_date octonary_diagnosis patient_id patient_last_billed_date primary_diagnosis primary_insurance_id primary_last_billed_date provider_id quaternary_diagnosis quinary_diagnosis referring_provider_id secondary_diagnosis secondary_last_billed_date senary_diagnosis septenary_diagnosis service_date supervising_provider_id tertiary_diagnosis
0 None 1 0.0 1 0.0 CLOSED 20 CLOSED CLOSED 0.0 20 c6600278-5c90-c0c7-db52-140b3b98d1e8 9f4426b7-f32c-2b00-8ab4-1152cd0915ce 1962-07-25 12:00:39 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1962-07-25 12:15:39 5251000175109 NaN 1962-07-25 12:15:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None None None 160903007.0 1962-07-25 12:15:39 None None 1962-07-25 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None
1 None 1 0.0 1 0.0 CLOSED 20 CLOSED CLOSED 0.0 20 5d6bce0a-93b6-1c60-4399-97a717a3f843 e22515cc-2422-7aac-49bd-ecc3a2ffdf86 1963-07-31 12:00:39 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1963-07-31 12:15:39 160903007 NaN 1963-07-31 12:15:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None None None None 1963-07-31 12:15:39 None None 1963-07-31 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None
2 None 1 0.0 1 0.0 CLOSED 20 CLOSED CLOSED 0.0 20 829830eb-ca6e-ae6a-99a2-728751b0ea8e 7ac8ce29-5517-d3d5-5004-f11942ae3225 1966-08-03 12:00:39 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1966-08-03 12:15:39 160903007 NaN 1966-08-03 12:15:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None None None None 1966-08-03 12:15:39 None None 1966-08-03 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None
3 None 1 0.0 1 0.0 CLOSED 20 CLOSED CLOSED 0.0 20 0f6ff0c6-c365-84f5-6742-226325f0ab22 92a599e7-23f3-c14d-343e-635eded378a6 1972-08-09 12:00:39 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1972-08-09 12:15:39 160903007 NaN 1972-08-09 12:15:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None None None None 1972-08-09 12:15:39 None None 1972-08-09 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None
4 None 1 0.0 1 0.0 CLOSED 10 CLOSED CLOSED 0.0 10 8b80cac7-383e-b479-1a05-7d8a9de45d04 c409c86b-ece6-a55c-79a9-c066fe56433a 2013-01-13 01:05:00 None 3dfb065a-67df-5b8a-3901-49bfd834bed1 2013-01-13 01:20:00 410620009 26aab0cd-6aba-3e1b-ac5b-05c8867e762c 2013-01-13 01:20:00 5053c59c-2f74-3246-9fda-cf2fbe4f6972 None None None None 2013-01-13 01:20:00 None None 2013-01-13 01:05:00 5053c59c-2f74-3246-9fda-cf2fbe4f6972 None

stg_claims.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:51:17.265171+00:00
WITH 
"claims_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> claim_id
    -- PATIENTID -> patient_id
    -- PROVIDERID -> provider_id
    -- PRIMARYPATIENTINSURANCEID -> primary_insurance_id
    -- SECONDARYPATIENTINSURANCEID -> secondary_insurance_id
    -- DEPARTMENTID -> department_id
    -- PATIENTDEPARTMENTID -> patient_department_id
    -- DIAGNOSIS1 -> primary_diagnosis
    -- DIAGNOSIS2 -> secondary_diagnosis
    -- DIAGNOSIS3 -> tertiary_diagnosis
    -- DIAGNOSIS4 -> quaternary_diagnosis
    -- DIAGNOSIS5 -> quinary_diagnosis
    -- DIAGNOSIS6 -> senary_diagnosis
    -- DIAGNOSIS7 -> septenary_diagnosis
    -- DIAGNOSIS8 -> octonary_diagnosis
    -- REFERRINGPROVIDERID -> referring_provider_id
    -- APPOINTMENTID -> appointment_id
    -- CURRENTILLNESSDATE -> illness_onset_date
    -- SERVICEDATE -> service_date
    -- SUPERVISINGPROVIDERID -> supervising_provider_id
    -- STATUS1 -> primary_claim_status
    -- STATUS2 -> secondary_claim_status
    -- STATUSP -> patient_claim_status
    -- OUTSTANDING1 -> primary_outstanding_amount
    -- OUTSTANDING2 -> secondary_outstanding_amount
    -- OUTSTANDINGP -> patient_outstanding_amount
    -- LASTBILLEDDATE1 -> primary_last_billed_date
    -- LASTBILLEDDATE2 -> secondary_last_billed_date
    -- LASTBILLEDDATEP -> patient_last_billed_date
    -- HEALTHCARECLAIMTYPEID1 -> primary_claim_type_id
    -- HEALTHCARECLAIMTYPEID2 -> claim_type_id
    SELECT 
        "Id" AS "claim_id",
        "PATIENTID" AS "patient_id",
        "PROVIDERID" AS "provider_id",
        "PRIMARYPATIENTINSURANCEID" AS "primary_insurance_id",
        "SECONDARYPATIENTINSURANCEID" AS "secondary_insurance_id",
        "DEPARTMENTID" AS "department_id",
        "PATIENTDEPARTMENTID" AS "patient_department_id",
        "DIAGNOSIS1" AS "primary_diagnosis",
        "DIAGNOSIS2" AS "secondary_diagnosis",
        "DIAGNOSIS3" AS "tertiary_diagnosis",
        "DIAGNOSIS4" AS "quaternary_diagnosis",
        "DIAGNOSIS5" AS "quinary_diagnosis",
        "DIAGNOSIS6" AS "senary_diagnosis",
        "DIAGNOSIS7" AS "septenary_diagnosis",
        "DIAGNOSIS8" AS "octonary_diagnosis",
        "REFERRINGPROVIDERID" AS "referring_provider_id",
        "APPOINTMENTID" AS "appointment_id",
        "CURRENTILLNESSDATE" AS "illness_onset_date",
        "SERVICEDATE" AS "service_date",
        "SUPERVISINGPROVIDERID" AS "supervising_provider_id",
        "STATUS1" AS "primary_claim_status",
        "STATUS2" AS "secondary_claim_status",
        "STATUSP" AS "patient_claim_status",
        "OUTSTANDING1" AS "primary_outstanding_amount",
        "OUTSTANDING2" AS "secondary_outstanding_amount",
        "OUTSTANDINGP" AS "patient_outstanding_amount",
        "LASTBILLEDDATE1" AS "primary_last_billed_date",
        "LASTBILLEDDATE2" AS "secondary_last_billed_date",
        "LASTBILLEDDATEP" AS "patient_last_billed_date",
        "HEALTHCARECLAIMTYPEID1" AS "primary_claim_type_id",
        "HEALTHCARECLAIMTYPEID2" AS "claim_type_id"
    FROM "memory"."main"."claims"
),

"claims_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- primary_insurance_id: The problem is that the '0' value does not match the UUID format of the other values in the primary_insurance_id column. All other values are valid UUIDs in the format of 8-4-4-4-12 hexadecimal characters. The '0' value is likely a placeholder or default value used when no valid UUID was available. The correct values should all be valid UUIDs or an empty string if no valid UUID is available.
    SELECT
        "claim_id",
        "patient_id",
        "provider_id",
        CASE
            WHEN "primary_insurance_id" = '0' THEN NULL
            ELSE "primary_insurance_id"
        END AS "primary_insurance_id",
        "secondary_insurance_id",
        "department_id",
        "patient_department_id",
        "primary_diagnosis",
        "secondary_diagnosis",
        "tertiary_diagnosis",
        "quaternary_diagnosis",
        "quinary_diagnosis",
        "senary_diagnosis",
        "septenary_diagnosis",
        "octonary_diagnosis",
        "referring_provider_id",
        "appointment_id",
        "illness_onset_date",
        "service_date",
        "supervising_provider_id",
        "primary_claim_status",
        "secondary_claim_status",
        "patient_claim_status",
        "primary_outstanding_amount",
        "secondary_outstanding_amount",
        "patient_outstanding_amount",
        "primary_last_billed_date",
        "secondary_last_billed_date",
        "patient_last_billed_date",
        "primary_claim_type_id",
        "claim_type_id"
    FROM "claims_renamed"
),

"claims_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- secondary_insurance_id: ['0']
    SELECT 
        CASE
            WHEN "secondary_insurance_id" = '0' THEN NULL
            ELSE "secondary_insurance_id"
        END AS "secondary_insurance_id",
        "patient_last_billed_date",
        "claim_id",
        "primary_claim_type_id",
        "patient_outstanding_amount",
        "primary_diagnosis",
        "provider_id",
        "claim_type_id",
        "secondary_outstanding_amount",
        "illness_onset_date",
        "appointment_id",
        "secondary_last_billed_date",
        "senary_diagnosis",
        "service_date",
        "quaternary_diagnosis",
        "patient_id",
        "primary_claim_status",
        "primary_insurance_id",
        "department_id",
        "octonary_diagnosis",
        "septenary_diagnosis",
        "patient_claim_status",
        "secondary_claim_status",
        "primary_last_billed_date",
        "primary_outstanding_amount",
        "patient_department_id",
        "referring_provider_id",
        "supervising_provider_id",
        "secondary_diagnosis",
        "quinary_diagnosis",
        "tertiary_diagnosis"
    FROM "claims_renamed_cleaned"
),

"claims_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- appointment_id: from VARCHAR to UUID
    -- claim_id: from VARCHAR to UUID
    -- illness_onset_date: from VARCHAR to TIMESTAMP
    -- octonary_diagnosis: from DECIMAL to VARCHAR
    -- patient_id: from VARCHAR to UUID
    -- patient_last_billed_date: from VARCHAR to TIMESTAMP
    -- primary_diagnosis: from INT to VARCHAR
    -- primary_insurance_id: from VARCHAR to UUID
    -- primary_last_billed_date: from VARCHAR to TIMESTAMP
    -- provider_id: from VARCHAR to UUID
    -- quaternary_diagnosis: from DECIMAL to VARCHAR
    -- quinary_diagnosis: from DECIMAL to VARCHAR
    -- referring_provider_id: from DECIMAL to VARCHAR
    -- secondary_diagnosis: from DECIMAL to VARCHAR
    -- secondary_last_billed_date: from VARCHAR to TIMESTAMP
    -- senary_diagnosis: from DECIMAL to VARCHAR
    -- septenary_diagnosis: from DECIMAL to VARCHAR
    -- service_date: from VARCHAR to TIMESTAMP
    -- supervising_provider_id: from VARCHAR to UUID
    -- tertiary_diagnosis: from DECIMAL to VARCHAR
    SELECT
        "secondary_insurance_id",
        "primary_claim_type_id",
        "patient_outstanding_amount",
        "claim_type_id",
        "secondary_outstanding_amount",
        "primary_claim_status",
        "department_id",
        "patient_claim_status",
        "secondary_claim_status",
        "primary_outstanding_amount",
        "patient_department_id",
        CAST("appointment_id" AS UUID) 
        AS "appointment_id",
        CAST("claim_id" AS UUID) 
        AS "claim_id",
        CAST("illness_onset_date" AS TIMESTAMP) 
        AS "illness_onset_date",
        CAST("octonary_diagnosis" AS VARCHAR) 
        AS "octonary_diagnosis",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("patient_last_billed_date" AS TIMESTAMP) 
        AS "patient_last_billed_date",
        CAST("primary_diagnosis" AS VARCHAR) 
        AS "primary_diagnosis",
        CAST("primary_insurance_id" AS UUID) 
        AS "primary_insurance_id",
        CAST("primary_last_billed_date" AS TIMESTAMP) 
        AS "primary_last_billed_date",
        CAST("provider_id" AS UUID) 
        AS "provider_id",
        CAST("quaternary_diagnosis" AS VARCHAR) 
        AS "quaternary_diagnosis",
        CAST("quinary_diagnosis" AS VARCHAR) 
        AS "quinary_diagnosis",
        CAST("referring_provider_id" AS VARCHAR) 
        AS "referring_provider_id",
        CAST("secondary_diagnosis" AS VARCHAR) 
        AS "secondary_diagnosis",
        CAST("secondary_last_billed_date" AS TIMESTAMP) 
        AS "secondary_last_billed_date",
        CAST("senary_diagnosis" AS VARCHAR) 
        AS "senary_diagnosis",
        CAST("septenary_diagnosis" AS VARCHAR) 
        AS "septenary_diagnosis",
        CAST("service_date" AS TIMESTAMP) 
        AS "service_date",
        CAST("supervising_provider_id" AS UUID) 
        AS "supervising_provider_id",
        CAST("tertiary_diagnosis" AS VARCHAR) 
        AS "tertiary_diagnosis"
    FROM "claims_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT *
FROM "claims_renamed_cleaned_null_casted"

stg_claims.yml (Document the table)

version: 2
models:
- name: stg_claims
  description: The table is about insurance claims. It contains details of each claim,
    including patient ID, provider ID, diagnosis codes, appointment details, and claim
    status. The table tracks claim processing information such as outstanding amounts,
    last billed dates, and healthcare claim type IDs. Each row represents a unique
    claim with its associated patient, provider, and billing information.
  columns:
  - name: secondary_insurance_id
    description: Secondary insurance identifier for the patient
    cocoon_meta:
      missing_reason: Not applicable if patient doesn't have secondary insurance.
      data_type:
        current_data_type: VARCHAR
  - name: primary_claim_type_id
    description: Type identifier for primary healthcare claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: patient_outstanding_amount
    description: Outstanding amount for patient claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: claim_type_id
    description: Identifies the type of healthcare claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: secondary_outstanding_amount
    description: Outstanding amount for secondary claim
    cocoon_meta:
      missing_reason: Not applicable if there's no secondary insurance claim.
      data_type:
        current_data_type: DECIMAL
  - name: primary_claim_status
    description: Status of primary claim
    tests:
    - not_null
    - accepted_values:
        values:
        - CLOSED
        - BILLED
    cocoon_meta:
      future_accepted_values:
      - OPEN
      - PENDING
      - APPROVED
      - DENIED
      - PAID
      - PARTIALLY PAID
      - APPEALED
      - UNDER REVIEW
      - CANCELLED
      data_type:
        current_data_type: VARCHAR
  - name: department_id
    description: Identifier for the department handling the claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: patient_claim_status
    description: Status of patient claim
    tests:
    - not_null
    - accepted_values:
        values:
        - CLOSED
        - BILLED
    cocoon_meta:
      future_accepted_values:
      - OPEN
      - PENDING
      - APPROVED
      - DENIED
      - UNDER REVIEW
      - APPEALED
      - PAID
      - PARTIALLY PAID
      - CANCELLED
      data_type:
        current_data_type: VARCHAR
  - name: secondary_claim_status
    description: Status of secondary claim
    tests:
    - accepted_values:
        values:
        - CLOSED
        - BILLED
    cocoon_meta:
      missing_reason: Not applicable if there's no secondary insurance claim.
      future_accepted_values:
      - OPEN
      - PENDING
      - DENIED
      - APPROVED
      - PAID
      - PARTIAL_PAID
      - APPEALED
      - CANCELLED
      - SUSPENDED
      data_type:
        current_data_type: VARCHAR
  - name: primary_outstanding_amount
    description: Outstanding amount for primary claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: patient_department_id
    description: Department identifier associated with the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: appointment_id
    description: Unique identifier for the appointment
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: claim_id
    description: Unique identifier for the claim
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for each claim. For
        this table, each row represents a unique insurance claim. The claim_id appears
        to be unique across rows, as it's a UUID-like string that's different for
        each sample.
      data_type:
        current_data_type: UUID
  - name: illness_onset_date
    description: Date of current illness onset
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: octonary_diagnosis
    description: Eighth diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_last_billed_date
    description: Last billed date for patient claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: primary_diagnosis
    description: Primary diagnosis code
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: primary_insurance_id
    description: Primary insurance identifier for the patient
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: UUID
  - name: primary_last_billed_date
    description: Last billed date for primary claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: provider_id
    description: Unique identifier for the healthcare provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: quaternary_diagnosis
    description: Fourth diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: quinary_diagnosis
    description: Fifth diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: referring_provider_id
    description: Identifier for the referring provider
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: secondary_diagnosis
    description: Secondary diagnosis code
    cocoon_meta:
      missing_reason: Not applicable if there's no secondary insurance claim.
      data_type:
        current_data_type: VARCHAR
  - name: secondary_last_billed_date
    description: Last billed date for secondary claim
    cocoon_meta:
      missing_reason: Not applicable if there's no secondary insurance claim.
      data_type:
        current_data_type: TIMESTAMP
  - name: senary_diagnosis
    description: Sixth diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: septenary_diagnosis
    description: Seventh diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: service_date
    description: Date of service provided
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: supervising_provider_id
    description: Identifier for the supervising provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: tertiary_diagnosis
    description: Third diagnosis code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_providers

Table Preview (first 5 rows)

provider_name GENDER SPECIALITY street_address CITY STATE latitude longitude patient_encounters procedures_performed ZIP organization_id provider_id
0 Gabriel Reilly F GENERAL PRACTICE 881 Main Street Fitchburg MA 42.586487 -71.805210 131 0 1420 74ab949d-17ac-3309-83a0-13b4405c66aa 6e3f59d7-742a-3b38-a4b1-1b471ed651a2
1 Terence Brakus M GENERAL PRACTICE 461 WALNUT AVE JAMAICA PLAIN MA 42.311588 -71.098001 30 0 21302331 588f6ce6-b8db-3588-8189-29db2680a313 63bf819f-a8ee-3bc1-b34a-787fe8e8e4dc
2 Jules Emard M GENERAL PRACTICE 60 HOSPITAL RD LEOMINSTER MA 42.540319 -71.763130 8 0 14533290 e09d4c49-c2ef-3b0f-9a46-3719d9219306 ec207479-d18d-3b2d-9f80-f76ba864aefb
3 Precious Runolfsson F GENERAL PRACTICE 501 CORDWAINER DRIVE NORWELL MA 42.164707 -70.881491 12 0 20611630 aa682136-a4df-3942-9f50-e6fcee6c0b73 f751fb18-fa7d-346e-8b95-7b8cb0dbcf32
4 Eduardo Villegas M GENERAL PRACTICE 484 RTE 134 WORCESTER MA 42.262562 -71.801888 3 0 16071728 ab701a70-a658-340e-8f69-ee196a7d40c6 5afd05ef-bd29-32bc-bb4f-a17144628cbd

stg_providers.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:16:20.447136+00:00
WITH 
"providers_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> provider_id
    -- ORGANIZATION -> organization_id
    -- NAME -> provider_name
    -- ADDRESS -> street_address
    -- LAT -> latitude
    -- LON -> longitude
    -- ENCOUNTERS -> patient_encounters
    -- PROCEDURES -> procedures_performed
    SELECT 
        "Id" AS "provider_id",
        "ORGANIZATION" AS "organization_id",
        "NAME" AS "provider_name",
        "GENDER",
        "SPECIALITY",
        "ADDRESS" AS "street_address",
        "CITY",
        "STATE",
        "ZIP",
        "LAT" AS "latitude",
        "LON" AS "longitude",
        "ENCOUNTERS" AS "patient_encounters",
        "PROCEDURES" AS "procedures_performed"
    FROM "memory"."main"."providers"
),

"providers_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- provider_name: The problem is that all the provider names in this column contain numbers, which is highly unusual for real names. The names appear to be a combination of a first name followed by numbers, then a last name followed by more numbers. This pattern is consistent across all entries and doesn't represent typical naming conventions. The correct values should be the names without the numbers.
    SELECT
        "provider_id",
        "organization_id",
        CASE
            WHEN "provider_name" = 'Afton574 Kulas532' THEN 'Afton Kulas'
            WHEN "provider_name" = 'Agustín529 Gaona896' THEN 'Agustín Gaona'
            WHEN "provider_name" = 'Alan320 Keeling57' THEN 'Alan Keeling'
            WHEN "provider_name" = 'Alvin56 Hayes766' THEN 'Alvin Hayes'
            WHEN "provider_name" = 'Ammie189 Kuvalis369' THEN 'Ammie Kuvalis'
            WHEN "provider_name" = 'Anderson154 Lemke654' THEN 'Anderson Lemke'
            WHEN "provider_name" = 'Angelic427 McCullough561' THEN 'Angelic McCullough'
            WHEN "provider_name" = 'Annita733 Frami345' THEN 'Annita Frami'
            WHEN "provider_name" = 'Antwan357 Lindgren255' THEN 'Antwan Lindgren'
            WHEN "provider_name" = 'Ariane992 Pagac496' THEN 'Ariane Pagac'
            WHEN "provider_name" = 'Arla414 Fritsch593' THEN 'Arla Fritsch'
            WHEN "provider_name" = 'Barrett790 Wolf938' THEN 'Barrett Wolf'
            WHEN "provider_name" = 'Bernardo699 Nieves278' THEN 'Bernardo Nieves'
            WHEN "provider_name" = 'Bradly656 Lang846' THEN 'Bradly Lang'
            WHEN "provider_name" = 'Brigitte394 Bartoletti50' THEN 'Brigitte Bartoletti'
            WHEN "provider_name" = 'Brock407 Bergstrom287' THEN 'Brock Bergstrom'
            WHEN "provider_name" = 'Carisa395 Kutch271' THEN 'Carisa Kutch'
            WHEN "provider_name" = 'Carlos172 Wiza601' THEN 'Carlos Wiza'
            WHEN "provider_name" = 'Carri827 Flatley871' THEN 'Carri Flatley'
            WHEN "provider_name" = 'Chantell995 Krajcik437' THEN 'Chantell Krajcik'
            WHEN "provider_name" = 'Charlena802 Lockman863' THEN 'Charlena Lockman'
            WHEN "provider_name" = 'Cherrie404 Conn188' THEN 'Cherrie Conn'
            WHEN "provider_name" = 'Chong355 Balistreri607' THEN 'Chong Balistreri'
            WHEN "provider_name" = 'Clemente531 Schamberger479' THEN 'Clemente Schamberger'
            WHEN "provider_name" = 'Clemente531 Treutel973' THEN 'Clemente Treutel'
            WHEN "provider_name" = 'Cristi782 Miller503' THEN 'Cristi Miller'
            WHEN "provider_name" = 'Cristy798 Auer97' THEN 'Cristy Auer'
            WHEN "provider_name" = 'Dana512 Wilkinson796' THEN 'Dana Wilkinson'
            WHEN "provider_name" = 'Danial835 Lehner980' THEN 'Danial Lehner'
            WHEN "provider_name" = 'Danita413 Becker968' THEN 'Danita Becker'
            WHEN "provider_name" = 'Dannie881 Howell947' THEN 'Dannie Howell'
            WHEN "provider_name" = 'Danuta945 Little434' THEN 'Danuta Little'
            WHEN "provider_name" = 'Dean966 Kris249' THEN 'Dean Kris'
            WHEN "provider_name" = 'Delicia67 Bernhard322' THEN 'Delicia Bernhard'
            WHEN "provider_name" = 'Dorcas534 Quigley282' THEN 'Dorcas Quigley'
            WHEN "provider_name" = 'Drew592 Ankunding277' THEN 'Drew Ankunding'
            WHEN "provider_name" = 'Eduardo902 Méndez913' THEN 'Eduardo Méndez'
            WHEN "provider_name" = 'Eduardo902 Villegas15' THEN 'Eduardo Villegas'
            WHEN "provider_name" = 'Elise948 Smitham825' THEN 'Elise Smitham'
            WHEN "provider_name" = 'Eliseo499 Reinger292' THEN 'Eliseo Reinger'
            WHEN "provider_name" = 'Ellen406 Farrell962' THEN 'Ellen Farrell'
            WHEN "provider_name" = 'Elmer371 Kutch271' THEN 'Elmer Kutch'
            WHEN "provider_name" = 'Elvera717 Gusikowski974' THEN 'Elvera Gusikowski'
            WHEN "provider_name" = 'Emerita401 Satterfield305' THEN 'Emerita Satterfield'
            WHEN "provider_name" = 'Errol226 Gaylord332' THEN 'Errol Gaylord'
            WHEN "provider_name" = 'Ervin886 Leannon79' THEN 'Ervin Leannon'
            WHEN "provider_name" = 'Erwin847 Stiedemann542' THEN 'Erwin Stiedemann'
            WHEN "provider_name" = 'Esteban536 Torres807' THEN 'Esteban Torres'
            WHEN "provider_name" = 'Eufemia350 Casper496' THEN 'Eufemia Casper'
            WHEN "provider_name" = 'Evelia928 Bogisich202' THEN 'Evelia Bogisich'
            WHEN "provider_name" = 'Fidel864 Swift555' THEN 'Fidel Swift'
            WHEN "provider_name" = 'Francisco472 Gusikowski974' THEN 'Francisco Gusikowski'
            WHEN "provider_name" = 'Gabriel934 Reilly981' THEN 'Gabriel Reilly'
            WHEN "provider_name" = 'Gerardo48 Montez99' THEN 'Gerardo Montez'
            WHEN "provider_name" = 'Germán350 Dueñas839' THEN 'Germán Dueñas'
            WHEN "provider_name" = 'Gonzalo160 Rodríguez701' THEN 'Gonzalo Rodríguez'
            WHEN "provider_name" = 'Gregory545 Toy286' THEN 'Gregory Toy'
            WHEN "provider_name" = 'Hal307 Wiegand701' THEN 'Hal Wiegand'
            WHEN "provider_name" = 'Hank686 Kemmer137' THEN 'Hank Kemmer'
            WHEN "provider_name" = 'Harvey63 Tromp100' THEN 'Harvey Tromp'
            WHEN "provider_name" = 'Haywood675 Bednar518' THEN 'Haywood Bednar'
            WHEN "provider_name" = 'Horace32 Grady603' THEN 'Horace Grady'
            WHEN "provider_name" = 'Humberto482 Mann644' THEN 'Humberto Mann'
            WHEN "provider_name" = 'Hyman89 Schmeler639' THEN 'Hyman Schmeler'
            WHEN "provider_name" = 'Ignacio928 O''Keefe54' THEN 'Ignacio O''Keefe'
            WHEN "provider_name" = 'Ilana185 Torp761' THEN 'Ilana Torp'
            WHEN "provider_name" = 'Irvin970 Emard19' THEN 'Irvin Emard'
            WHEN "provider_name" = 'Isaias604 Dickinson688' THEN 'Isaias Dickinson'
            WHEN "provider_name" = 'Isiah14 Nikolaus26' THEN 'Isiah Nikolaus'
            WHEN "provider_name" = 'Isiah14 Schaefer657' THEN 'Isiah Schaefer'
            WHEN "provider_name" = 'Jacobo456 Vergara204' THEN 'Jacobo Vergara'
            WHEN "provider_name" = 'Janeth814 Hyatt152' THEN 'Janeth Hyatt'
            WHEN "provider_name" = 'Jeana169 Cronin387' THEN 'Jeana Cronin'
            WHEN "provider_name" = 'Jefferey580 Luettgen772' THEN 'Jefferey Luettgen'
            WHEN "provider_name" = 'Joanna347 Abbott774' THEN 'Joanna Abbott'
            WHEN "provider_name" = 'Johna806 Klein929' THEN 'Johna Klein'
            WHEN "provider_name" = 'Joline447 Stark857' THEN 'Joline Stark'
            WHEN "provider_name" = 'Jolynn62 Adams676' THEN 'Jolynn Adams'
            WHEN "provider_name" = 'Joshua658 Borer986' THEN 'Joshua Borer'
            WHEN "provider_name" = 'Juan88 Friesen796' THEN 'Juan Friesen'
            WHEN "provider_name" = 'Jules135 Emard19' THEN 'Jules Emard'
            WHEN "provider_name" = 'Julio255 Rivas297' THEN 'Julio Rivas'
            WHEN "provider_name" = 'Kai187 Mann644' THEN 'Kai Mann'
            WHEN "provider_name" = 'Kathi607 Hilpert278' THEN 'Kathi Hilpert'
            WHEN "provider_name" = 'Keith571 Friesen796' THEN 'Keith Friesen'
            WHEN "provider_name" = 'Kendra609 Kassulke119' THEN 'Kendra Kassulke'
            WHEN "provider_name" = 'Kirby843 Rippin620' THEN 'Kirby Rippin'
            WHEN "provider_name" = 'Kisha966 Beahan375' THEN 'Kisha Beahan'
            WHEN "provider_name" = 'Kizzie166 Lehner980' THEN 'Kizzie Lehner'
            WHEN "provider_name" = 'Kristina583 Bashirian201' THEN 'Kristina Bashirian'
            WHEN "provider_name" = 'Kyoko885 Price929' THEN 'Kyoko Price'
            WHEN "provider_name" = 'Lane844 Gutmann970' THEN 'Lane Gutmann'
            WHEN "provider_name" = 'Lashanda692 Gutmann970' THEN 'Lashanda Gutmann'
            WHEN "provider_name" = 'Lasonya941 Kiehn525' THEN 'Lasonya Kiehn'
            WHEN "provider_name" = 'Latasha666 Fay398' THEN 'Latasha Fay'
            WHEN "provider_name" = 'Latoria810 Eichmann909' THEN 'Latoria Eichmann'
            WHEN "provider_name" = 'Latrina689 White193' THEN 'Latrina White'
            WHEN "provider_name" = 'Leif534 Hane680' THEN 'Leif Hane'
            WHEN "provider_name" = 'Leonarda398 Schumm995' THEN 'Leonarda Schumm'
            WHEN "provider_name" = 'Lindsay928 Leffler128' THEN 'Lindsay Leffler'
            WHEN "provider_name" = 'Linn541 Lynch190' THEN 'Linn Lynch'
            WHEN "provider_name" = 'Lino542 Feest103' THEN 'Lino Feest'
            WHEN "provider_name" = 'Livia401 Rippin620' THEN 'Livia Rippin'
            WHEN "provider_name" = 'Luvenia178 Breitenberg711' THEN 'Luvenia Breitenberg'
            WHEN "provider_name" = 'Lyman173 Kshlerin58' THEN 'Lyman Kshlerin'
            WHEN "provider_name" = 'Madelaine318 Walker122' THEN 'Madelaine Walker'
            WHEN "provider_name" = 'Magdalena964 Torphy630' THEN 'Magdalena Torphy'
            WHEN "provider_name" = 'Maragaret140 Erdman779' THEN 'Maragaret Erdman'
            WHEN "provider_name" = 'Maren639 Aufderhar910' THEN 'Maren Aufderhar'
            WHEN "provider_name" = 'Margarite168 Koepp521' THEN 'Margarite Koepp'
            WHEN "provider_name" = 'Margene509 Schamberger479' THEN 'Margene Schamberger'
            WHEN "provider_name" = 'Mariah942 Hilpert278' THEN 'Mariah Hilpert'
            WHEN "provider_name" = 'Mariana775 Menéndez746' THEN 'Mariana Menéndez'
            WHEN "provider_name" = 'Mariette443 Rau926' THEN 'Mariette Rau'
            WHEN "provider_name" = 'Matthew562 Bailey598' THEN 'Matthew Bailey'
            WHEN "provider_name" = 'Maynard46 Ward668' THEN 'Maynard Ward'
            WHEN "provider_name" = 'Melissa844 Yundt842' THEN 'Melissa Yundt'
            WHEN "provider_name" = 'Mertie42 Lakin515' THEN 'Mertie Lakin'
            WHEN "provider_name" = 'Milda157 Schoen8' THEN 'Milda Schoen'
            WHEN "provider_name" = 'Millard193 Doyle959' THEN 'Millard Doyle'
            WHEN "provider_name" = 'Nicholle822 Satterfield305' THEN 'Nicholle Satterfield'
            WHEN "provider_name" = 'Noma845 Mraz590' THEN 'Noma Mraz'
            WHEN "provider_name" = 'Norah104 Jenkins714' THEN 'Norah Jenkins'
            WHEN "provider_name" = 'Olen518 Farrell962' THEN 'Olen Farrell'
            WHEN "provider_name" = 'Orpha286 Marks830' THEN 'Orpha Marks'
            WHEN "provider_name" = 'Paris331 Ruecker817' THEN 'Paris Ruecker'
            WHEN "provider_name" = 'Patrina117 Strosin214' THEN 'Patrina Strosin'
            WHEN "provider_name" = 'Paulene52 Kihn564' THEN 'Paulene Kihn'
            WHEN "provider_name" = 'Precious140 Runolfsson901' THEN 'Precious Runolfsson'
            WHEN "provider_name" = 'Ramona980 Zavala169' THEN 'Ramona Zavala'
            WHEN "provider_name" = 'Randa356 Ritchie586' THEN 'Randa Ritchie'
            WHEN "provider_name" = 'Randy380 Bergstrom287' THEN 'Randy Bergstrom'
            WHEN "provider_name" = 'Randy380 Kilback373' THEN 'Randy Kilback'
            WHEN "provider_name" = 'Rebekah348 Rippin620' THEN 'Rebekah Rippin'
            WHEN "provider_name" = 'Reginia455 Johnson679' THEN 'Reginia Johnson'
            WHEN "provider_name" = 'Rex53 Gerhold939' THEN 'Rex Gerhold'
            WHEN "provider_name" = 'Rhonda22 Bins636' THEN 'Rhonda Bins'
            WHEN "provider_name" = 'Rod343 Durgan499' THEN 'Rod Durgan'
            WHEN "provider_name" = 'Rod343 Goyette777' THEN 'Rod Goyette'
            WHEN "provider_name" = 'Rodney21 Hettinger594' THEN 'Rodney Hettinger'
            WHEN "provider_name" = 'Rolland302 Lemke654' THEN 'Rolland Lemke'
            WHEN "provider_name" = 'Royce974 Kohler843' THEN 'Royce Kohler'
            WHEN "provider_name" = 'Rupert654 Walker122' THEN 'Rupert Walker'
            WHEN "provider_name" = 'Rusty501 Rodriguez71' THEN 'Rusty Rodriguez'
            WHEN "provider_name" = 'Sallie654 Barrows492' THEN 'Sallie Barrows'
            WHEN "provider_name" = 'Samuel331 Runte676' THEN 'Samuel Runte'
            WHEN "provider_name" = 'Santina680 Dicki44' THEN 'Santina Dicki'
            WHEN "provider_name" = 'Sean831 DuBuque211' THEN 'Sean DuBuque'
            WHEN "provider_name" = 'Sharilyn202 Wolff180' THEN 'Sharilyn Wolff'
            WHEN "provider_name" = 'Shasta644 King743' THEN 'Shasta King'
            WHEN "provider_name" = 'Shawanna357 Roob72' THEN 'Shawanna Roob'
            WHEN "provider_name" = 'Shayla126 White193' THEN 'Shayla White'
            WHEN "provider_name" = 'Shon148 Reinger292' THEN 'Shon Reinger'
            WHEN "provider_name" = 'Shon148 Swift555' THEN 'Shon Swift'
            WHEN "provider_name" = 'Tambra47 Heaney114' THEN 'Tambra Heaney'
            WHEN "provider_name" = 'Tamisha203 Wilderman619' THEN 'Tamisha Wilderman'
            WHEN "provider_name" = 'Tashina114 Towne435' THEN 'Tashina Towne'
            WHEN "provider_name" = 'Teodoro374 Koss676' THEN 'Teodoro Koss'
            WHEN "provider_name" = 'Terence292 Brakus656' THEN 'Terence Brakus'
            WHEN "provider_name" = 'Tomás404 Hinojosa147' THEN 'Tomás Hinojosa'
            WHEN "provider_name" = 'Trey250 Satterfield305' THEN 'Trey Satterfield'
            WHEN "provider_name" = 'Trinity427 Hartmann983' THEN 'Trinity Hartmann'
            WHEN "provider_name" = 'Troy560 Carter549' THEN 'Troy Carter'
            WHEN "provider_name" = 'Tyron580 Littel644' THEN 'Tyron Littel'
            WHEN "provider_name" = 'Venice604 Rogahn59' THEN 'Venice Rogahn'
            WHEN "provider_name" = 'Vito638 Barton704' THEN 'Vito Barton'
            WHEN "provider_name" = 'Waldo53 Hintz995' THEN 'Waldo Hintz'
            WHEN "provider_name" = 'Willian804 Batz141' THEN 'Willian Batz'
            WHEN "provider_name" = 'Wilson960 Turner526' THEN 'Wilson Turner'
            WHEN "provider_name" = 'Zane918 Bosco882' THEN 'Zane Bosco'
            ELSE "provider_name"
        END AS "provider_name",
        "GENDER",
        "SPECIALITY",
        "street_address",
        "CITY",
        "STATE",
        "ZIP",
        "latitude",
        "longitude",
        "patient_encounters",
        "procedures_performed"
    FROM "providers_renamed"
),

"providers_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- ZIP: from INT to VARCHAR
    -- organization_id: from VARCHAR to UUID
    -- provider_id: from VARCHAR to UUID
    SELECT
        "provider_name",
        "GENDER",
        "SPECIALITY",
        "street_address",
        "CITY",
        "STATE",
        "latitude",
        "longitude",
        "patient_encounters",
        "procedures_performed",
        CAST("ZIP" AS VARCHAR) 
        AS "ZIP",
        CAST("organization_id" AS UUID) 
        AS "organization_id",
        CAST("provider_id" AS UUID) 
        AS "provider_id"
    FROM "providers_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "providers_renamed_cleaned_casted"

stg_providers.yml (Document the table)

version: 2
models:
- name: stg_providers
  description: The table is about healthcare providers. It includes their unique ID,
    organization, name, gender, specialty, and contact details. The table also provides
    location information including address, city, state, ZIP code, latitude, and longitude.
    Additionally, it records the number of encounters and procedures for each provider.
    All providers listed specialize in general practice and are located in Massachusetts.
  columns:
  - name: provider_name
    description: Full name of the healthcare provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: GENDER
    description: Gender of the healthcare provider
    tests:
    - not_null
    - accepted_values:
        values:
        - F
        - M
    cocoon_meta:
      future_accepted_values:
      - O
      data_type:
        current_data_type: VARCHAR
  - name: SPECIALITY
    description: Medical specialty of the provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: street_address
    description: Street address of the provider's practice
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)$
        summary: Basic street address with number and street name
      - regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)(?:\s+(?:SUITE|STE)\s+\d+)?$
        summary: Address with suite or unit number
      - regex: ^[A-Z0-9\s]+$
        summary: General pattern allowing for various formats
      - regex: \d+\s+[A-Za-z\s&''\.]+\s+(Road|ST\.|Street|ROAD|PL)
        summary: Street addresses with number, name, and type (most strict)
      - regex: \d+\s+[A-Za-z\s&''\.]+
        summary: Street addresses with number and name (less strict)
      data_type:
        current_data_type: VARCHAR
  - name: CITY
    description: City where the provider's practice is located
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: STATE
    description: State where the provider's practice is located
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: latitude
    description: Latitude coordinate of the provider's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: longitude
    description: Longitude coordinate of the provider's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: patient_encounters
    description: Number of patient encounters for the provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: procedures_performed
    description: Number of procedures performed by the provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: ZIP
    description: ZIP code of the provider's practice
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: organization_id
    description: Identifier for the provider's organization
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: provider_id
    description: Unique identifier for the healthcare provider
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for the healthcare
        provider. For this table, each row is for a unique provider. The provider_id
        is designed to be unique for each provider, ensuring no two providers have
        the same ID.
      data_type:
        current_data_type: UUID
  tests: []

stg_supplies

Table Preview (first 5 rows)

supply_description quantity_ordered encounter_id order_date patient_id supply_code
0 Blood glucose testing strips (physical object) 50 027a6834-445b-9d67-f1a8-437d8c1af741 2005-09-12 b3b71304-fe5b-bda4-6822-bd901b2836d1 337388004
1 Blood glucose testing strips (physical object) 50 76530141-ca06-a819-537e-73a498808c84 2005-10-24 b3b71304-fe5b-bda4-6822-bd901b2836d1 337388004
2 Blood glucose testing strips (physical object) 50 e46e3f29-51a3-9d87-d57d-9d317b179c16 2008-10-13 b3b71304-fe5b-bda4-6822-bd901b2836d1 337388004
3 Blood glucose testing strips (physical object) 50 c563ea20-ad9e-2001-a4c9-c00890842509 2009-10-19 b3b71304-fe5b-bda4-6822-bd901b2836d1 337388004
4 Blood glucose testing strips (physical object) 50 c8e62551-765a-de49-e5cc-fd98f712ae73 2014-11-17 b3b71304-fe5b-bda4-6822-bd901b2836d1 337388004

stg_supplies.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:17:12.506926+00:00
WITH 
"supplies_renamed" AS (
    -- Rename: Renaming columns
    -- DATE_ -> order_date
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> supply_code
    -- DESCRIPTION -> supply_description
    -- QUANTITY -> quantity_ordered
    SELECT 
        "DATE_" AS "order_date",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "supply_code",
        "DESCRIPTION" AS "supply_description",
        "QUANTITY" AS "quantity_ordered"
    FROM "memory"."main"."supplies"
),

"supplies_renamed_dedup" AS (
    -- Deduplication: Removed 7 duplicated rows
    SELECT DISTINCT *
    FROM "supplies_renamed"
),

"supplies_renamed_dedup_cleaned" AS (
    -- Clean unusual string values: 
    -- supply_description: The supply_description column has several inconsistencies: 1. Inconsistent spacing: 'Air filter  device' has an extra space. 2. Inconsistent representations of CPAP/BPAP: Some use full words, others use abbreviations. 3. Inconsistent use of parentheses: Most entries have '(physical object)' at the end, but one has '(substance)' and another has '(product)'. 4. Inconsistent capitalization: Some entries start with capital letters, others don't. The correct values should have consistent spacing, use full words instead of abbreviations, use '(physical object)' consistently for devices, and have consistent capitalization.
    SELECT
        "order_date",
        "patient_id",
        "encounter_id",
        "supply_code",
        CASE
            WHEN "supply_description" = 'Air filter  device (physical object)' THEN 'Air filter device (physical object)'
            WHEN "supply_description" = 'CPAP/BPAP oral mask (physical object)' THEN 'Continuous positive airway pressure/Bilevel positive airway pressure oral mask (physical object)'
            WHEN "supply_description" = 'Solution (substance)' THEN 'Solution (physical object)'
            WHEN "supply_description" = 'Packed red blood cells (product)' THEN 'Packed red blood cells (physical object)'
            ELSE "supply_description"
        END AS "supply_description",
        "quantity_ordered"
    FROM "supplies_renamed_dedup"
),

"supplies_renamed_dedup_cleaned_casted" AS (
    -- Column Type Casting: 
    -- encounter_id: from VARCHAR to UUID
    -- order_date: from VARCHAR to DATE
    -- patient_id: from VARCHAR to UUID
    -- supply_code: from INT to VARCHAR
    SELECT
        "supply_description",
        "quantity_ordered",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("order_date" AS DATE) 
        AS "order_date",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("supply_code" AS VARCHAR) 
        AS "supply_code"
    FROM "supplies_renamed_dedup_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "supplies_renamed_dedup_cleaned_casted"

stg_supplies.yml (Document the table)

version: 2
models:
- name: stg_supplies
  description: The table is about medical supply orders. It shows the date, patient
    ID, encounter ID, supply code, description, and quantity ordered. All entries
    are for blood glucose testing strips. The same patient received 50 strips on multiple
    occasions between 1993 and 2005. Each order is associated with a unique encounter
    ID.
  columns:
  - name: supply_description
    description: Text description of the ordered medical supply
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: quantity_ordered
    description: Number of units ordered
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: encounter_id
    description: Unique identifier for each medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: order_date
    description: Date of the medical supply order
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DATE
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: supply_code
    description: Standardized code for the medical supply
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_encounters

Table Preview (first 5 rows)

encounter_type procedure_description base_cost total_cost payer_coverage reason_description encounter_end encounter_id encounter_start organization_id patient_id payer_id procedure_code provider_id reason_code
0 wellness Well child visit (procedure) 136.8 272.80 272.8 None 2013-01-13 01:20:00 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 0e061004-bcc0-3e99-a237-d700823e817b 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c 410620009 5053c59c-2f74-3246-9fda-cf2fbe4f6972 None
1 wellness Well child visit (procedure) 136.8 871.12 0.0 None 2021-09-23 02:59:04 79b413f6-c668-bf41-bce7-b403c40cf7a4 2021-09-23 02:44:04 ea153d0a-9ccc-3769-9a25-934b417071b7 eb247227-e839-88d3-447d-b5972468f33b 734afbd6-4794-363b-9bc0-6a3981533ed5 410620009 a9a4676a-e232-3eef-a874-e7af0fa54350 None
2 wellness General examination of patient (procedure) 136.8 704.20 0.0 None 1962-07-25 12:15:39 c6600278-5c90-c0c7-db52-140b3b98d1e8 1962-07-25 12:00:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 2ffa361e-5858-877e-e022-ce81fe32da1b b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 162673000 5ceef3de-8e50-3c8f-8224-d277539f4100 None
3 wellness General examination of patient (procedure) 136.8 704.20 0.0 None 1963-07-31 12:15:39 5d6bce0a-93b6-1c60-4399-97a717a3f843 1963-07-31 12:00:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 2ffa361e-5858-877e-e022-ce81fe32da1b b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 162673000 5ceef3de-8e50-3c8f-8224-d277539f4100 None
4 wellness General examination of patient (procedure) 136.8 778.78 0.0 None 1966-08-03 12:15:39 829830eb-ca6e-ae6a-99a2-728751b0ea8e 1966-08-03 12:00:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 2ffa361e-5858-877e-e022-ce81fe32da1b b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 162673000 5ceef3de-8e50-3c8f-8224-d277539f4100 None

stg_encounters.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:59:11.305102+00:00
WITH 
"encounters_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> encounter_id
    -- START -> encounter_start
    -- STOP -> encounter_end
    -- PATIENT -> patient_id
    -- ORGANIZATION -> organization_id
    -- PROVIDER -> provider_id
    -- PAYER -> payer_id
    -- ENCOUNTERCLASS -> encounter_type
    -- CODE -> procedure_code
    -- DESCRIPTION -> procedure_description
    -- BASE_ENCOUNTER_COST -> base_cost
    -- TOTAL_CLAIM_COST -> total_cost
    -- PAYER_COVERAGE -> payer_coverage
    -- REASONCODE -> reason_code
    -- REASONDESCRIPTION -> reason_description
    SELECT 
        "Id" AS "encounter_id",
        "START" AS "encounter_start",
        "STOP" AS "encounter_end",
        "PATIENT" AS "patient_id",
        "ORGANIZATION" AS "organization_id",
        "PROVIDER" AS "provider_id",
        "PAYER" AS "payer_id",
        "ENCOUNTERCLASS" AS "encounter_type",
        "CODE" AS "procedure_code",
        "DESCRIPTION" AS "procedure_description",
        "BASE_ENCOUNTER_COST" AS "base_cost",
        "TOTAL_CLAIM_COST" AS "total_cost",
        "PAYER_COVERAGE" AS "payer_coverage",
        "REASONCODE" AS "reason_code",
        "REASONDESCRIPTION" AS "reason_description"
    FROM "memory"."main"."encounters"
),

"encounters_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- procedure_description: The procedure_description column has several inconsistencies: 1. Some entries have "(procedure)" appended while others don't. 2. There are redundant entries like "Encounter for problem" and "Encounter for problem (procedure)". 3. Some entries have inconsistent capitalization or punctuation. 4. A few entries seem out of place or too specific compared to others.The correct values should follow a consistent pattern, preferably with "(procedure)" appended where appropriate, and use consistent capitalization and punctuation.
    -- reason_description: The problem is inconsistent labeling of medical conditions. Some values include '(disorder)' at the end while others don't. Additionally, some conditions are labeled as '(finding)' or '(situation)'. The correct values should consistently use the base condition name without these suffixes for uniformity.
    SELECT
        "encounter_id",
        "encounter_start",
        "encounter_end",
        "patient_id",
        "organization_id",
        "provider_id",
        "payer_id",
        "encounter_type",
        "procedure_code",
        CASE
            WHEN "procedure_description" = 'Follow-up encounter' THEN 'Follow-up encounter (procedure)'
            WHEN "procedure_description" = 'Encounter for problem' THEN 'Encounter for problem (procedure)'
            WHEN "procedure_description" = 'Prenatal visit' THEN 'Prenatal visit (procedure)'
            WHEN "procedure_description" = 'Patient encounter procedure' THEN 'Patient encounter procedure (procedure)'
            WHEN "procedure_description" = 'Encounter for ''check-up''' THEN 'Encounter for check up (procedure)'
            WHEN "procedure_description" = 'Emergency room admission (procedure)' THEN 'Emergency Room Admission (procedure)'
            WHEN "procedure_description" = 'Emergency Room Admission' THEN 'Emergency Room Admission (procedure)'
            WHEN "procedure_description" = 'Encounter for symptom' THEN 'Encounter for symptom (procedure)'
            WHEN "procedure_description" = 'Asthma follow-up' THEN 'Asthma follow-up (procedure)'
            WHEN "procedure_description" = 'Postnatal visit' THEN 'Postnatal visit (procedure)'
            WHEN "procedure_description" = 'Death Certification' THEN 'Death Certification (procedure)'
            WHEN "procedure_description" = 'Gynecology service (qualifier value)' THEN 'Gynecology service (procedure)'
            WHEN "procedure_description" = 'Screening surveillance (regime/therapy)' THEN 'Screening surveillance (procedure)'
            WHEN "procedure_description" = 'Encounter Inpatient' THEN 'Inpatient encounter (procedure)'
            WHEN "procedure_description" = 'Drug rehabilitation and detoxification' THEN 'Drug rehabilitation and detoxification (procedure)'
            WHEN "procedure_description" = 'Office Visit' THEN 'Office visit (procedure)'
            WHEN "procedure_description" = 'Emergency Encounter' THEN 'Emergency encounter (procedure)'
            WHEN "procedure_description" = 'Encounter for Problem' THEN 'Encounter for problem (procedure)'
            WHEN "procedure_description" = 'Patient-initiated encounter' THEN 'Patient-initiated encounter (procedure)'
            WHEN "procedure_description" = 'Admission to thoracic surgery department' THEN 'Admission to thoracic surgery department (procedure)'
            WHEN "procedure_description" = 'posttraumatic stress disorder' THEN 'Posttraumatic stress disorder follow-up (procedure)'
            WHEN "procedure_description" = 'Initial Psychiatric Interview with mental status evaluation' THEN 'Initial psychiatric interview (procedure)'
            ELSE "procedure_description"
        END AS "procedure_description",
        "base_cost",
        "total_cost",
        "payer_coverage",
        "reason_code",
        CASE
            WHEN "reason_description" = 'Chronic congestive heart failure (disorder)' THEN 'Chronic congestive heart failure'
            WHEN "reason_description" = 'Viral sinusitis (disorder)' THEN 'Viral sinusitis'
            WHEN "reason_description" = 'Acute bronchitis (disorder)' THEN 'Acute bronchitis'
            WHEN "reason_description" = 'Acute viral pharyngitis (disorder)' THEN 'Acute viral pharyngitis'
            WHEN "reason_description" = 'Malignant neoplasm of breast (disorder)' THEN 'Malignant neoplasm of breast'
            WHEN "reason_description" = 'Ischemic heart disease (disorder)' THEN 'Ischemic heart disease'
            WHEN "reason_description" = 'Anemia (disorder)' THEN 'Anemia'
            WHEN "reason_description" = 'Sinusitis (disorder)' THEN 'Sinusitis'
            WHEN "reason_description" = 'Sleep disorder (disorder)' THEN 'Sleep disorder'
            WHEN "reason_description" = 'Acute bacterial sinusitis (disorder)' THEN 'Acute bacterial sinusitis'
            WHEN "reason_description" = 'Streptococcal sore throat (disorder)' THEN 'Streptococcal sore throat'
            WHEN "reason_description" = 'Acute myeloid leukemia  disease (disorder)' THEN 'Acute myeloid leukemia'
            WHEN "reason_description" = 'Alzheimer''s disease (disorder)' THEN 'Alzheimer''s disease'
            WHEN "reason_description" = 'History of coronary artery bypass grafting (situation)' THEN 'History of coronary artery bypass grafting'
            WHEN "reason_description" = 'Abnormal findings diagnostic imaging heart+coronary circulat (finding)' THEN 'Abnormal findings diagnostic imaging heart and coronary circulation'
            WHEN "reason_description" = 'Acute ST segment elevation myocardial infarction (disorder)' THEN 'Acute ST segment elevation myocardial infarction'
            WHEN "reason_description" = 'Acute non-ST segment elevation myocardial infarction (disorder)' THEN 'Acute non-ST segment elevation myocardial infarction'
            WHEN "reason_description" = 'Pulmonary emphysema (disorder)' THEN 'Pulmonary emphysema'
            WHEN "reason_description" = 'Aortic valve stenosis (disorder)' THEN 'Aortic valve stenosis'
            WHEN "reason_description" = 'Sepsis (disorder)' THEN 'Sepsis'
            WHEN "reason_description" = 'Chronic obstructive bronchitis (disorder)' THEN 'Chronic obstructive bronchitis'
            WHEN "reason_description" = 'Fracture of forearm' THEN 'Fracture of Forearm'
            WHEN "reason_description" = 'Fracture subluxation of wrist' THEN 'Fracture Subluxation of Wrist'
            WHEN "reason_description" = 'Injury of medial collateral ligament of knee' THEN 'Injury of Medial Collateral Ligament of Knee'
            WHEN "reason_description" = 'Localized  primary osteoarthritis of the hand' THEN 'Localized Primary Osteoarthritis of the Hand'
            WHEN "reason_description" = 'Perennial allergic rhinitis' THEN 'Perennial Allergic Rhinitis'
            WHEN "reason_description" = 'Perennial allergic rhinitis with seasonal variation' THEN 'Perennial Allergic Rhinitis with Seasonal Variation'
            WHEN "reason_description" = 'Polyp of colon' THEN 'Polyp of Colon'
            ELSE "reason_description"
        END AS "reason_description"
    FROM "encounters_renamed"
),

"encounters_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- encounter_end: from VARCHAR to TIMESTAMP
    -- encounter_id: from VARCHAR to UUID
    -- encounter_start: from VARCHAR to TIMESTAMP
    -- organization_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- payer_id: from VARCHAR to UUID
    -- procedure_code: from INT to VARCHAR
    -- provider_id: from VARCHAR to UUID
    -- reason_code: from DECIMAL to VARCHAR
    SELECT
        "encounter_type",
        "procedure_description",
        "base_cost",
        "total_cost",
        "payer_coverage",
        "reason_description",
        CAST("encounter_end" AS TIMESTAMP) 
        AS "encounter_end",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("encounter_start" AS TIMESTAMP) 
        AS "encounter_start",
        CAST("organization_id" AS UUID) 
        AS "organization_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("payer_id" AS UUID) 
        AS "payer_id",
        CAST("procedure_code" AS VARCHAR) 
        AS "procedure_code",
        CAST("provider_id" AS UUID) 
        AS "provider_id",
        CAST("reason_code" AS VARCHAR) 
        AS "reason_code"
    FROM "encounters_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "encounters_renamed_cleaned_casted"

stg_encounters.yml (Document the table)

version: 2
models:
- name: stg_encounters
  description: The table is about medical encounters. It includes details like encounter
    ID, start and stop times, patient ID, provider ID, and payer ID. Each encounter
    has a class, code, description, and associated costs. The table links patients,
    healthcare providers, organizations, and payers involved in each encounter. It
    also includes information on claim costs and payer coverage.
  columns:
  - name: encounter_type
    description: Type or category of the medical encounter
    tests:
    - not_null
    - accepted_values:
        values:
        - ambulatory
        - wellness
        - outpatient
        - emergency
        - inpatient
        - home
        - urgentcare
        - snf
        - virtual
        - hospice
    cocoon_meta:
      future_accepted_values:
      - primary care
      - specialist
      - dental
      - vision
      - mental health
      - rehabilitation
      - palliative
      - long-term care
      data_type:
        current_data_type: VARCHAR
  - name: procedure_description
    description: Description of the procedure or visit
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: base_cost
    description: Base cost of the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: total_cost
    description: Total cost claimed for the encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: payer_coverage
    description: Amount covered by the insurance payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: reason_description
    description: Description of the reason for the encounter
    cocoon_meta:
      missing_reason: Not applicable for routine wellness visits without specific
        issues.
      data_type:
        current_data_type: VARCHAR
  - name: encounter_end
    description: End date and time of the encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: encounter_id
    description: Unique identifier for the encounter
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column is explicitly described as a unique identifier for
        the encounter. For this table, each row represents a unique medical encounter.
        The encounter_id is designed to be unique across all encounters.
      data_type:
        current_data_type: UUID
  - name: encounter_start
    description: Start date and time of the encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: organization_id
    description: Unique identifier for the healthcare organization
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: payer_id
    description: Unique identifier for the insurance payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: procedure_code
    description: Medical code for the procedure or visit
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: provider_id
    description: Unique identifier for the healthcare provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: reason_code
    description: Code for the reason of the encounter
    cocoon_meta:
      missing_reason: Not applicable for routine wellness visits without specific
        issues.
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_payers

Table Preview (first 5 rows)

payer_name ownership_type covered_amount uncovered_amount revenue covered_encounters uncovered_encounters covered_medications uncovered_medications covered_procedures uncovered_procedures covered_immunizations uncovered_immunizations unique_customers average_qol_score member_months city payer_id phone_number state street_address zip_code
0 Medicare GOVERNMENT 3245258.85 57324.03 460290.60 927 0 918 0 1270 0 202 0 16 0.753073 2748 None a735bf55-83e9-331a-899d-a82a60b9f60c None None None None
1 Medicaid GOVERNMENT 6112421.45 103336.16 45600.00 1217 0 286 0 1908 0 534 0 19 0.953093 4524 None df166300-5a78-3502-a46a-832842197811 None None None None
2 UnitedHealthcare PRIVATE 37929.59 28070.66 218949.27 49 0 10 0 36 0 41 0 3 0.800393 384 None d31fccc3-1767-390d-966a-22a5156f4219 None None None None
3 Cigna Health PRIVATE 2110812.70 318590.30 2102717.53 727 0 236 0 765 0 412 0 13 0.939912 3660 None 8fa6c185-e44e-3e34-8bd8-39be8694f4ce None None None None
4 Dual Eligible GOVERNMENT 931603.10 8658.65 2990.00 201 0 84 0 231 0 48 0 5 0.878956 300 None d18ef2e6-ef40-324c-be54-34a5ee865625 None None None None

stg_payers.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:12:14.679962+00:00
WITH 
"payers_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> payer_id
    -- NAME -> payer_name
    -- OWNERSHIP -> ownership_type
    -- ADDRESS -> street_address
    -- CITY -> city
    -- STATE_HEADQUARTERED -> state
    -- ZIP -> zip_code
    -- PHONE -> phone_number
    -- AMOUNT_COVERED -> covered_amount
    -- AMOUNT_UNCOVERED -> uncovered_amount
    -- REVENUE -> revenue
    -- COVERED_ENCOUNTERS -> covered_encounters
    -- UNCOVERED_ENCOUNTERS -> uncovered_encounters
    -- COVERED_MEDICATIONS -> covered_medications
    -- UNCOVERED_MEDICATIONS -> uncovered_medications
    -- COVERED_PROCEDURES -> covered_procedures
    -- UNCOVERED_PROCEDURES -> uncovered_procedures
    -- COVERED_IMMUNIZATIONS -> covered_immunizations
    -- UNCOVERED_IMMUNIZATIONS -> uncovered_immunizations
    -- UNIQUE_CUSTOMERS -> unique_customers
    -- QOLS_AVG -> average_qol_score
    -- MEMBER_MONTHS -> member_months
    SELECT 
        "Id" AS "payer_id",
        "NAME" AS "payer_name",
        "OWNERSHIP" AS "ownership_type",
        "ADDRESS" AS "street_address",
        "CITY" AS "city",
        "STATE_HEADQUARTERED" AS "state",
        "ZIP" AS "zip_code",
        "PHONE" AS "phone_number",
        "AMOUNT_COVERED" AS "covered_amount",
        "AMOUNT_UNCOVERED" AS "uncovered_amount",
        "REVENUE" AS "revenue",
        "COVERED_ENCOUNTERS" AS "covered_encounters",
        "UNCOVERED_ENCOUNTERS" AS "uncovered_encounters",
        "COVERED_MEDICATIONS" AS "covered_medications",
        "UNCOVERED_MEDICATIONS" AS "uncovered_medications",
        "COVERED_PROCEDURES" AS "covered_procedures",
        "UNCOVERED_PROCEDURES" AS "uncovered_procedures",
        "COVERED_IMMUNIZATIONS" AS "covered_immunizations",
        "UNCOVERED_IMMUNIZATIONS" AS "uncovered_immunizations",
        "UNIQUE_CUSTOMERS" AS "unique_customers",
        "QOLS_AVG" AS "average_qol_score",
        "MEMBER_MONTHS" AS "member_months"
    FROM "memory"."main"."payers"
),

"payers_renamed_casted" AS (
    -- Column Type Casting: 
    -- city: from DECIMAL to VARCHAR
    -- payer_id: from VARCHAR to UUID
    -- phone_number: from DECIMAL to VARCHAR
    -- state: from DECIMAL to VARCHAR
    -- street_address: from DECIMAL to VARCHAR
    -- zip_code: from DECIMAL to VARCHAR
    SELECT
        "payer_name",
        "ownership_type",
        "covered_amount",
        "uncovered_amount",
        "revenue",
        "covered_encounters",
        "uncovered_encounters",
        "covered_medications",
        "uncovered_medications",
        "covered_procedures",
        "uncovered_procedures",
        "covered_immunizations",
        "uncovered_immunizations",
        "unique_customers",
        "average_qol_score",
        "member_months",
        CAST("city" AS VARCHAR) 
        AS "city",
        CAST("payer_id" AS UUID) 
        AS "payer_id",
        CAST("phone_number" AS VARCHAR) 
        AS "phone_number",
        CAST("state" AS VARCHAR) 
        AS "state",
        CAST("street_address" AS VARCHAR) 
        AS "street_address",
        CAST("zip_code" AS VARCHAR) 
        AS "zip_code"
    FROM "payers_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "payers_renamed_casted"

stg_payers.yml (Document the table)

version: 2
models:
- name: stg_payers
  description: The table is about payers in the healthcare system. It includes government
    and private payers. Each payer has an ID, name, ownership type, and contact details.
    The table provides financial data like covered and uncovered amounts, revenue,
    and encounters. It also includes information on medications, procedures, immunizations,
    unique customers, quality of life scores, and member months.
  columns:
  - name: payer_name
    description: Name of the payer organization
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: ownership_type
    description: Type of ownership (government or private)
    tests:
    - not_null
    - accepted_values:
        values:
        - PRIVATE
        - GOVERNMENT
        - NO_INSURANCE
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: covered_amount
    description: Total amount covered by the payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: uncovered_amount
    description: Total amount not covered by the payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: revenue
    description: Total revenue generated by the payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: covered_encounters
    description: Number of covered healthcare encounters
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: uncovered_encounters
    description: Number of uncovered healthcare encounters
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: covered_medications
    description: Number of covered medications
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: uncovered_medications
    description: Number of uncovered medications
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: covered_procedures
    description: Number of covered medical procedures
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: uncovered_procedures
    description: Number of uncovered medical procedures
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: covered_immunizations
    description: Number of covered immunizations
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: uncovered_immunizations
    description: Number of uncovered immunizations
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: unique_customers
    description: Number of unique customers served
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: average_qol_score
    description: Average Quality of Life Score
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: member_months
    description: Total number of months members were enrolled
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: city
    description: City where the payer is located
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: payer_id
    description: Unique identifier for each payer
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for each payer. For
        this table, each row is for a distinct payer, and payer_id is unique across
        rows.
      data_type:
        current_data_type: UUID
  - name: phone_number
    description: Contact phone number for the payer
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: state
    description: State where the payer is headquartered
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: street_address
    description: Street address of the payer
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: zip_code
    description: ZIP code of the payer's location
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_patients

Table Preview (first 5 rows)

ssn drivers_license passport_number name_prefix first_name last_name name_suffix maiden_name marital_status race ethnicity gender birthplace street_address city state county fips_code latitude longitude healthcare_expenses healthcare_coverage annual_income birth_date death_date patient_id zip_code
0 None None None None Mel Bailey598 None None None white nonhispanic M Norton Center Massachusetts US 716 Wunsch Gardens Unit 48 Framingham Massachusetts Middlesex County 25017.0 42.275650 -71.476367 2520.80 4323.64 170754 2021-09-23 NaT eb247227-e839-88d3-447d-b5972468f33b 1701
1 None S99957814 X45639058X Mrs. Cheyenne Marks830 None Lowe M white nonhispanic F Longmeadow Massachusetts US 123 Bayer Camp Taunton Massachusetts Bristol County 25005.0 41.892884 -71.066686 205342.20 94647.00 40526 1944-05-31 NaT 2ffa361e-5858-877e-e022-ce81fe32da1b 2718
2 None None None None Hunter Keebler762 None None None white nonhispanic M Maynard Massachusetts US 575 Jast Rue Unit 48 Winchendon Massachusetts Worcester County 25027.0 42.670059 -72.074664 16381.92 17447.87 79884 2009-02-08 NaT 3dfb065a-67df-5b8a-3901-49bfd834bed1 1475
3 None S99968506 None None Herschel Ernser583 None None None asian nonhispanic M Somerville Massachusetts US 184 Langworth Parade Apt 10 Boston Massachusetts Suffolk County 25025.0 42.395516 -71.059015 3850.00 44057.32 6420 1979-06-26 1995-07-04 db80575b-5e9b-921b-fad9-1e3a20929dc7 2131
4 None S99967405 X86891718X Mrs. Lacey Heathcote539 None Hegmann M white nonhispanic F Natick Massachusetts US 801 Morissette Divide Hingham Massachusetts Plymouth County 25023.0 42.200723 -70.836590 66662.10 1777031.06 933420 1973-05-31 NaT d84815a3-c5b3-8ca2-025f-6323a4ec59ef 2043

stg_patients.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:11:16.831598+00:00
WITH 
"patients_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> patient_id
    -- BIRTHDATE -> birth_date
    -- DEATHDATE -> death_date
    -- SSN -> ssn
    -- DRIVERS -> drivers_license
    -- PASSPORT -> passport_number
    -- PREFIX -> name_prefix
    -- FIRST_ -> first_name
    -- LAST_ -> last_name
    -- SUFFIX -> name_suffix
    -- MAIDEN -> maiden_name
    -- MARITAL -> marital_status
    -- RACE -> race
    -- ETHNICITY -> ethnicity
    -- GENDER -> gender
    -- BIRTHPLACE -> birthplace
    -- ADDRESS -> street_address
    -- CITY -> city
    -- STATE -> state
    -- COUNTY -> county
    -- FIPS -> fips_code
    -- ZIP -> zip_code
    -- LAT -> latitude
    -- LON -> longitude
    -- HEALTHCARE_EXPENSES -> healthcare_expenses
    -- HEALTHCARE_COVERAGE -> healthcare_coverage
    -- INCOME -> annual_income
    SELECT 
        "Id" AS "patient_id",
        "BIRTHDATE" AS "birth_date",
        "DEATHDATE" AS "death_date",
        "SSN" AS "ssn",
        "DRIVERS" AS "drivers_license",
        "PASSPORT" AS "passport_number",
        "PREFIX" AS "name_prefix",
        "FIRST_" AS "first_name",
        "LAST_" AS "last_name",
        "SUFFIX" AS "name_suffix",
        "MAIDEN" AS "maiden_name",
        "MARITAL" AS "marital_status",
        "RACE" AS "race",
        "ETHNICITY" AS "ethnicity",
        "GENDER" AS "gender",
        "BIRTHPLACE" AS "birthplace",
        "ADDRESS" AS "street_address",
        "CITY" AS "city",
        "STATE" AS "state",
        "COUNTY" AS "county",
        "FIPS" AS "fips_code",
        "ZIP" AS "zip_code",
        "LAT" AS "latitude",
        "LON" AS "longitude",
        "HEALTHCARE_EXPENSES" AS "healthcare_expenses",
        "HEALTHCARE_COVERAGE" AS "healthcare_coverage",
        "INCOME" AS "annual_income"
    FROM "memory"."main"."patients"
),

"patients_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- ssn: The problem is that all the SSN values begin with 999, which is invalid for U.S. Social Security Numbers. Valid SSNs never start with 999. These appear to be placeholder or dummy SSNs rather than real ones. Since we can't determine the correct SSNs from this information alone, the best approach is to map these invalid values to an empty string to indicate missing data.
    -- first_name: The problem is that all first names have a random 3-digit number appended to them. These numbers appear to be meaningless and don't provide any useful information. The correct values should be just the first names without the numbers.
    -- last_name: The problem is that most last names follow a pattern of a capitalized English surname followed by 3 digits, but there are two exceptions: 1. "Bermúdez789" contains a non-English character (ú). 2. "Roldán470" contains a non-English character (á). These should be normalized to use standard English characters. The correct values should maintain the same naming pattern as the majority of entries.
    -- maiden_name: The problem is that all values in the maiden_name column are unusual because they combine surnames with three-digit numbers, which is not typical for real maiden names. This appears to be a data generation artifact rather than genuine maiden names. Since we can't determine the correct maiden names from this data, the best approach is to remove the numeric portion and keep only the surname part, which could potentially be valid maiden names.
    SELECT
        "patient_id",
        "birth_date",
        "death_date",
        CASE
            WHEN "ssn" = '999-10-1178' THEN NULL
            WHEN "ssn" = '999-10-6028' THEN NULL
            WHEN "ssn" = '999-12-9121' THEN NULL
            WHEN "ssn" = '999-14-9380' THEN NULL
            WHEN "ssn" = '999-14-9672' THEN NULL
            WHEN "ssn" = '999-16-4297' THEN NULL
            WHEN "ssn" = '999-17-2611' THEN NULL
            WHEN "ssn" = '999-20-4271' THEN NULL
            WHEN "ssn" = '999-20-5403' THEN NULL
            WHEN "ssn" = '999-22-2635' THEN NULL
            WHEN "ssn" = '999-23-4696' THEN NULL
            WHEN "ssn" = '999-23-9351' THEN NULL
            WHEN "ssn" = '999-26-4422' THEN NULL
            WHEN "ssn" = '999-26-8041' THEN NULL
            WHEN "ssn" = '999-29-2359' THEN NULL
            WHEN "ssn" = '999-29-4844' THEN NULL
            WHEN "ssn" = '999-29-7349' THEN NULL
            WHEN "ssn" = '999-30-8851' THEN NULL
            WHEN "ssn" = '999-32-4862' THEN NULL
            WHEN "ssn" = '999-33-4589' THEN NULL
            WHEN "ssn" = '999-35-8448' THEN NULL
            WHEN "ssn" = '999-36-1150' THEN NULL
            WHEN "ssn" = '999-37-8682' THEN NULL
            WHEN "ssn" = '999-38-7473' THEN NULL
            WHEN "ssn" = '999-41-1756' THEN NULL
            WHEN "ssn" = '999-42-9847' THEN NULL
            WHEN "ssn" = '999-43-4282' THEN NULL
            WHEN "ssn" = '999-44-2795' THEN NULL
            WHEN "ssn" = '999-44-9634' THEN NULL
            WHEN "ssn" = '999-48-5926' THEN NULL
            WHEN "ssn" = '999-49-9846' THEN NULL
            WHEN "ssn" = '999-50-5586' THEN NULL
            WHEN "ssn" = '999-50-5697' THEN NULL
            WHEN "ssn" = '999-51-3221' THEN NULL
            WHEN "ssn" = '999-55-3195' THEN NULL
            WHEN "ssn" = '999-55-6098' THEN NULL
            WHEN "ssn" = '999-56-9201' THEN NULL
            WHEN "ssn" = '999-57-7157' THEN NULL
            WHEN "ssn" = '999-58-7543' THEN NULL
            WHEN "ssn" = '999-59-2568' THEN NULL
            WHEN "ssn" = '999-61-4140' THEN NULL
            WHEN "ssn" = '999-61-6611' THEN NULL
            WHEN "ssn" = '999-61-6740' THEN NULL
            WHEN "ssn" = '999-62-7937' THEN NULL
            WHEN "ssn" = '999-70-2405' THEN NULL
            WHEN "ssn" = '999-70-4594' THEN NULL
            WHEN "ssn" = '999-71-8314' THEN NULL
            WHEN "ssn" = '999-73-5643' THEN NULL
            WHEN "ssn" = '999-77-7700' THEN NULL
            WHEN "ssn" = '999-79-2426' THEN NULL
            WHEN "ssn" = '999-79-3695' THEN NULL
            WHEN "ssn" = '999-85-2178' THEN NULL
            WHEN "ssn" = '999-88-1792' THEN NULL
            WHEN "ssn" = '999-89-9127' THEN NULL
            WHEN "ssn" = '999-91-9580' THEN NULL
            WHEN "ssn" = '999-93-7263' THEN NULL
            WHEN "ssn" = '999-95-3792' THEN NULL
            WHEN "ssn" = '999-96-6743' THEN NULL
            WHEN "ssn" = '999-99-2106' THEN NULL
            WHEN "ssn" = '999-99-2436' THEN NULL
            ELSE "ssn"
        END AS "ssn",
        "drivers_license",
        "passport_number",
        "name_prefix",
        CASE
            WHEN "first_name" = 'Hayden835' THEN 'Hayden'
            WHEN "first_name" = 'Adelia946' THEN 'Adelia'
            WHEN "first_name" = 'Andera917' THEN 'Andera'
            WHEN "first_name" = 'Antony83' THEN 'Antony'
            WHEN "first_name" = 'Burton124' THEN 'Burton'
            WHEN "first_name" = 'Carmelita854' THEN 'Carmelita'
            WHEN "first_name" = 'Carrol931' THEN 'Carrol'
            WHEN "first_name" = 'Cedrick207' THEN 'Cedrick'
            WHEN "first_name" = 'Cheyenne169' THEN 'Cheyenne'
            WHEN "first_name" = 'Christal240' THEN 'Christal'
            WHEN "first_name" = 'Cindy893' THEN 'Cindy'
            WHEN "first_name" = 'Coleman27' THEN 'Coleman'
            WHEN "first_name" = 'Corey514' THEN 'Corey'
            WHEN "first_name" = 'Damon455' THEN 'Damon'
            WHEN "first_name" = 'Daniela614' THEN 'Daniela'
            WHEN "first_name" = 'Dione665' THEN 'Dione'
            WHEN "first_name" = 'Dominick530' THEN 'Dominick'
            WHEN "first_name" = 'Elden718' THEN 'Elden'
            WHEN "first_name" = 'Ethel888' THEN 'Ethel'
            WHEN "first_name" = 'Fletcher87' THEN 'Fletcher'
            WHEN "first_name" = 'Garry927' THEN 'Garry'
            WHEN "first_name" = 'Gayle448' THEN 'Gayle'
            WHEN "first_name" = 'Grant908' THEN 'Grant'
            WHEN "first_name" = 'Guadalupe206' THEN 'Guadalupe'
            WHEN "first_name" = 'Herb645' THEN 'Herb'
            WHEN "first_name" = 'Herschel574' THEN 'Herschel'
            WHEN "first_name" = 'Hildred696' THEN 'Hildred'
            WHEN "first_name" = 'Huey641' THEN 'Huey'
            WHEN "first_name" = 'Hunter736' THEN 'Hunter'
            WHEN "first_name" = 'Irving123' THEN 'Irving'
            WHEN "first_name" = 'Janeth814' THEN 'Janeth'
            WHEN "first_name" = 'Jenae263' THEN 'Jenae'
            WHEN "first_name" = 'Jimmie93' THEN 'Jimmie'
            WHEN "first_name" = 'Kaycee352' THEN 'Kaycee'
            WHEN "first_name" = 'Kirk871' THEN 'Kirk'
            WHEN "first_name" = 'Lacey714' THEN 'Lacey'
            WHEN "first_name" = 'Lavette209' THEN 'Lavette'
            WHEN "first_name" = 'Leandro563' THEN 'Leandro'
            WHEN "first_name" = 'Lessie363' THEN 'Lessie'
            WHEN "first_name" = 'Lillia547' THEN 'Lillia'
            WHEN "first_name" = 'Linn541' THEN 'Linn'
            WHEN "first_name" = 'Lonny638' THEN 'Lonny'
            WHEN "first_name" = 'Luke971' THEN 'Luke'
            WHEN "first_name" = 'Manuel446' THEN 'Manuel'
            WHEN "first_name" = 'Martín25' THEN 'Martín'
            WHEN "first_name" = 'Mel236' THEN 'Mel'
            WHEN "first_name" = 'Mirta419' THEN 'Mirta'
            WHEN "first_name" = 'Moises22' THEN 'Moises'
            WHEN "first_name" = 'Raymon366' THEN 'Raymon'
            WHEN "first_name" = 'Shaquana156' THEN 'Shaquana'
            WHEN "first_name" = 'Shayla126' THEN 'Shayla'
            WHEN "first_name" = 'Shiela18' THEN 'Shiela'
            WHEN "first_name" = 'Steve819' THEN 'Steve'
            WHEN "first_name" = 'Stewart672' THEN 'Stewart'
            WHEN "first_name" = 'Tyrell880' THEN 'Tyrell'
            WHEN "first_name" = 'Vanesa40' THEN 'Vanesa'
            WHEN "first_name" = 'Whitney250' THEN 'Whitney'
            WHEN "first_name" = 'Yetta429' THEN 'Yetta'
            WHEN "first_name" = 'Zoila41' THEN 'Zoila'
            ELSE "first_name"
        END AS "first_name",
        CASE
            WHEN "last_name" = 'Bermúdez789' THEN 'Bermudez789'
            WHEN "last_name" = 'Roldán470' THEN 'Roldan470'
            WHEN "last_name" = 'Wilkinson796' THEN 'Wilkinson'
            WHEN "last_name" = 'Willms744' THEN 'Willms'
            WHEN "last_name" = 'Zboncak558' THEN 'Zboncak'
            ELSE "last_name"
        END AS "last_name",
        "name_suffix",
        CASE
            WHEN "maiden_name" = 'Barela183' THEN 'Barela'
            WHEN "maiden_name" = 'Beier427' THEN 'Beier'
            WHEN "maiden_name" = 'Deckow585' THEN 'Deckow'
            WHEN "maiden_name" = 'DuBuque211' THEN 'DuBuque'
            WHEN "maiden_name" = 'Gorczany269' THEN 'Gorczany'
            WHEN "maiden_name" = 'Hegmann834' THEN 'Hegmann'
            WHEN "maiden_name" = 'Hermann103' THEN 'Hermann'
            WHEN "maiden_name" = 'Kohler843' THEN 'Kohler'
            WHEN "maiden_name" = 'Lowe577' THEN 'Lowe'
            WHEN "maiden_name" = 'Predovic534' THEN 'Predovic'
            WHEN "maiden_name" = 'Shanahan202' THEN 'Shanahan'
            ELSE "maiden_name"
        END AS "maiden_name",
        "marital_status",
        "race",
        "ethnicity",
        "gender",
        "birthplace",
        "street_address",
        "city",
        "state",
        "county",
        "fips_code",
        "zip_code",
        "latitude",
        "longitude",
        "healthcare_expenses",
        "healthcare_coverage",
        "annual_income"
    FROM "patients_renamed"
),

"patients_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- birth_date: from VARCHAR to DATE
    -- death_date: from VARCHAR to DATE
    -- patient_id: from VARCHAR to UUID
    -- zip_code: from INT to VARCHAR
    SELECT
        "ssn",
        "drivers_license",
        "passport_number",
        "name_prefix",
        "first_name",
        "last_name",
        "name_suffix",
        "maiden_name",
        "marital_status",
        "race",
        "ethnicity",
        "gender",
        "birthplace",
        "street_address",
        "city",
        "state",
        "county",
        "fips_code",
        "latitude",
        "longitude",
        "healthcare_expenses",
        "healthcare_coverage",
        "annual_income",
        CAST("birth_date" AS DATE) 
        AS "birth_date",
        CAST("death_date" AS DATE) 
        AS "death_date",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("zip_code" AS VARCHAR) 
        AS "zip_code"
    FROM "patients_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "patients_renamed_cleaned_casted"

stg_patients.yml (Document the table)

version: 2
models:
- name: stg_patients
  description: The table is about patients' personal and healthcare information. It
    includes demographics like birth date, race, gender, and marital status. It also
    contains contact details such as address, city, and state. Healthcare-related
    information includes expenses, coverage, and income. The table provides a comprehensive
    profile of each patient, including unique identifiers like SSN and driver's license
    numbers.
  columns:
  - name: ssn
    description: Social Security Number of the patient
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: drivers_license
    description: Driver's license number of the patient
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: passport_number
    description: Passport number of the patient
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: name_prefix
    description: Title or honorific prefix of the patient
    tests:
    - accepted_values:
        values:
        - Mr.
        - Ms.
        - Mrs.
    cocoon_meta:
      missing_reason: Not applicable if person doesn't use a prefix.
      future_accepted_values:
      - Miss
      - Dr.
      - Prof.
      - Rev.
      - Sir
      - Lady
      - Dame
      - Lord
      - Capt.
      - Lt.
      - Col.
      - Gen.
      - Adm.
      - Mx.
      data_type:
        current_data_type: VARCHAR
  - name: first_name
    description: First name of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: last_name
    description: Last name of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: name_suffix
    description: Name suffix of the patient, if any
    tests:
    - accepted_values:
        values:
        - JD
        - MD
        - PhD
    cocoon_meta:
      missing_reason: Not applicable if person doesn't have a suffix.
      future_accepted_values:
      - Sr.
      - Jr.
      - I
      - II
      - III
      - IV
      - V
      - Esq.
      - DDS
      - DVM
      - DO
      - RN
      - BSN
      - MSN
      - PA
      - NP
      - CPA
      - PE
      data_type:
        current_data_type: VARCHAR
  - name: maiden_name
    description: Maiden name of the patient, if applicable
    cocoon_meta:
      missing_reason: Not applicable for males or unmarried females.
      data_type:
        current_data_type: VARCHAR
  - name: marital_status
    description: Marital status of the patient
    tests:
    - accepted_values:
        values:
        - M
        - S
    cocoon_meta:
      missing_reason: Not applicable for minors or very young individuals.
      future_accepted_values:
      - D
      - W
      - P
      - U
      data_type:
        current_data_type: VARCHAR
  - name: race
    description: Racial background of the patient
    tests:
    - not_null
    - accepted_values:
        values:
        - white
        - asian
        - black
    cocoon_meta:
      future_accepted_values:
      - White
      - Asian
      - Black
      - Hispanic or Latino
      - American Indian or Alaska Native
      - Native Hawaiian or Other Pacific Islander
      - Two or More Races
      - Other
      - Unknown
      - Prefer not to say
      data_type:
        current_data_type: VARCHAR
  - name: ethnicity
    description: Ethnic background of the patient
    tests:
    - not_null
    - accepted_values:
        values:
        - nonhispanic
        - hispanic
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: gender
    description: Gender of the patient
    tests:
    - not_null
    - accepted_values:
        values:
        - M
        - F
    cocoon_meta:
      future_accepted_values:
      - O
      data_type:
        current_data_type: VARCHAR
  - name: birthplace
    description: Place of birth of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: street_address
    description: Street address of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: city
    description: City of residence of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: state
    description: State of residence of the patient
    tests:
    - not_null
    - accepted_values:
        values:
        - Massachusetts
    cocoon_meta:
      future_accepted_values:
      - Alabama
      - Alaska
      - Arizona
      - Arkansas
      - California
      - Colorado
      - Connecticut
      - Delaware
      - Florida
      - Georgia
      - Hawaii
      - Idaho
      - Illinois
      - Indiana
      - Iowa
      - Kansas
      - Kentucky
      - Louisiana
      - Maine
      - Maryland
      - Michigan
      - Minnesota
      - Mississippi
      - Missouri
      - Montana
      - Nebraska
      - Nevada
      - New Hampshire
      - New Jersey
      - New Mexico
      - New York
      - North Carolina
      - North Dakota
      - Ohio
      - Oklahoma
      - Oregon
      - Pennsylvania
      - Rhode Island
      - South Carolina
      - South Dakota
      - Tennessee
      - Texas
      - Utah
      - Vermont
      - Virginia
      - Washington
      - West Virginia
      - Wisconsin
      - Wyoming
      data_type:
        current_data_type: VARCHAR
  - name: county
    description: County of residence of the patient
    tests:
    - not_null
    - accepted_values:
        values:
        - Hampden County
        - Plymouth County
        - Essex County
        - Middlesex County
        - Suffolk County
        - Worcester County
        - Bristol County
        - Barnstable County
        - Norfolk County
        - Berkshire County
        - Franklin County
    cocoon_meta:
      future_accepted_values:
      - Dukes County
      - Hampshire County
      - Nantucket County
      data_type:
        current_data_type: VARCHAR
  - name: fips_code
    description: Federal Information Processing Standards code for location
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: DECIMAL
  - name: latitude
    description: Latitude coordinate of the patient's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: longitude
    description: Longitude coordinate of the patient's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: healthcare_expenses
    description: Total healthcare expenses for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: healthcare_coverage
    description: Total healthcare coverage for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: annual_income
    description: Annual income of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: birth_date
    description: Date of birth of the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DATE
  - name: death_date
    description: Date of death of the patient, if applicable
    cocoon_meta:
      missing_reason: Not applicable for living individuals.
      data_type:
        current_data_type: DATE
  - name: patient_id
    description: Unique identifier for each patient record
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column is a unique identifier for each patient. For this
        table, each row represents a distinct patient, and patient_id appears to be
        unique across rows.
      data_type:
        current_data_type: UUID
  - name: zip_code
    description: ZIP code of the patient's residence
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_careplans

Table Preview (first 5 rows)

treatment_description diagnosis_code diagnosis_description care_plan_id encounter_id end_date patient_id start_date treatment_code
0 Fracture care 58150001.0 Fracture of clavicle e5cc23db-c84f-3ded-411d-deb020dfcb4c 1600b01f-01b3-fd15-135f-c7c4a30d867f 2016-07-23 3dfb065a-67df-5b8a-3901-49bfd834bed1 2016-04-11 385691007
1 Wound care 284551006.0 Laceration of foot d70b9f2f-745e-f594-4983-b48790e4291f 3c131d0c-6d15-7fe1-1372-a3d8316b6d82 1988-05-12 db80575b-5e9b-921b-fad9-1e3a20929dc7 1988-04-27 225358003
2 Respiratory therapy NaN None 54b5c4bb-1df6-0892-7097-416ecea0f44f d6df3fd6-e4a0-9b51-ebd5-c6394be154eb 1990-07-10 db80575b-5e9b-921b-fad9-1e3a20929dc7 1989-09-03 53950000
3 Wound care 370247008.0 Facial laceration 254cfb18-3266-be35-b8c4-8a6a89ce0547 17d8fd78-be39-d2f7-6032-74d712972fa3 2014-02-03 2ffa361e-5858-877e-e022-ce81fe32da1b 2014-01-20 225358003
4 Head injury rehabilitation 62106007.0 Concussion with no loss of consciousness 7e07f691-b437-c62b-4379-0a8e55812747 5fbc8a13-6944-a854-c4cb-bbd04e969ee1 2017-09-02 3dfb065a-67df-5b8a-3901-49bfd834bed1 2017-06-08 47387005

stg_careplans.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:47:43.534687+00:00
WITH 
"careplans_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> care_plan_id
    -- START -> start_date
    -- STOP -> end_date
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> treatment_code
    -- DESCRIPTION -> treatment_description
    -- REASONCODE -> diagnosis_code
    -- REASONDESCRIPTION -> diagnosis_description
    SELECT 
        "Id" AS "care_plan_id",
        "START" AS "start_date",
        "STOP" AS "end_date",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "treatment_code",
        "DESCRIPTION" AS "treatment_description",
        "REASONCODE" AS "diagnosis_code",
        "REASONDESCRIPTION" AS "diagnosis_description"
    FROM "memory"."main"."careplans"
),

"careplans_renamed_casted" AS (
    -- Column Type Casting: 
    -- care_plan_id: from VARCHAR to UUID
    -- encounter_id: from VARCHAR to UUID
    -- end_date: from VARCHAR to DATE
    -- patient_id: from VARCHAR to UUID
    -- start_date: from VARCHAR to DATE
    -- treatment_code: from INT to VARCHAR
    SELECT
        "treatment_description",
        "diagnosis_code",
        "diagnosis_description",
        CAST("care_plan_id" AS UUID) 
        AS "care_plan_id",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("end_date" AS DATE) 
        AS "end_date",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("start_date" AS DATE) 
        AS "start_date",
        CAST("treatment_code" AS VARCHAR) 
        AS "treatment_code"
    FROM "careplans_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "careplans_renamed_casted"

stg_careplans.yml (Document the table)

version: 2
models:
- name: stg_careplans
  description: The table is about care plans for patients. It shows the start and
    stop dates of treatments. Each care plan is linked to a patient and an encounter.
    The table includes codes and descriptions for the treatment and reason. It relates
    patients, encounters, and specific medical interventions over time. The data covers
    various medical issues like fractures, wounds, and head injuries.
  columns:
  - name: treatment_description
    description: Text description of the treatment or procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: diagnosis_code
    description: Numeric code for the reason of treatment
    cocoon_meta:
      missing_reason: Not applicable for non-diagnosis-related care plans.
      data_type:
        current_data_type: DECIMAL
  - name: diagnosis_description
    description: Text description of the reason for treatment
    cocoon_meta:
      missing_reason: Not applicable for non-diagnosis-related care plans.
      data_type:
        current_data_type: VARCHAR
  - name: care_plan_id
    description: Unique identifier for the care plan entry
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column is a unique identifier for each care plan entry.
        For this table, each row represents a distinct care plan. care_plan_id appears
        to be unique across rows, as it's designed to be a specific identifier for
        each care plan.
      data_type:
        current_data_type: UUID
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: end_date
    description: End date of the care plan
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: DATE
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: start_date
    description: Start date of the care plan
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DATE
  - name: treatment_code
    description: Numeric code for the treatment or procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_organizations

Table Preview (first 5 rows)

facility_name street_address CITY STATE latitude longitude PHONE revenue utilization_rate ZIP organization_id
0 FITCHBURG OUTPATIENT CLINIC 881 Main Street Fitchburg MA 42.586487 -71.805210 9783429781 0.0 131 1420 74ab949d-17ac-3309-83a0-13b4405c66aa
1 BOSTON HEALTH CARE FOR THE HOMELESS PROGRAM INC 461 WALNUT AVE JAMAICA PLAIN MA 42.311588 -71.098001 8576541550 0.0 30 21302331 588f6ce6-b8db-3588-8189-29db2680a313
2 UMASS MEMORIAL HEALTHALLIANCE CLINTON HOSPITAL INC 60 HOSPITAL RD LEOMINSTER MA 42.540319 -71.763130 9784662000 0.0 8 14533290 e09d4c49-c2ef-3b0f-9a46-3719d9219306
3 SOUTHWOOD AT NORWELL NURSING CENTER 501 CORDWAINER DRIVE NORWELL MA 42.164707 -70.881491 7819827450 0.0 12 20611630 aa682136-a4df-3942-9f50-e6fcee6c0b73
4 CAREWELL URGENT CARE CENTERS OF MA P.C. 484 RTE 134 WORCESTER MA 42.262562 -71.801888 5086947901 0.0 3 16071728 ab701a70-a658-340e-8f69-ee196a7d40c6

stg_organizations.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:08:13.429847+00:00
WITH 
"organizations_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> organization_id
    -- NAME -> facility_name
    -- ADDRESS -> street_address
    -- LAT -> latitude
    -- LON -> longitude
    -- REVENUE -> revenue
    -- UTILIZATION -> utilization_rate
    SELECT 
        "Id" AS "organization_id",
        "NAME" AS "facility_name",
        "ADDRESS" AS "street_address",
        "CITY",
        "STATE",
        "ZIP",
        "LAT" AS "latitude",
        "LON" AS "longitude",
        "PHONE",
        "REVENUE" AS "revenue",
        "UTILIZATION" AS "utilization_rate"
    FROM "memory"."main"."organizations"
),

"organizations_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- facility_name: The main problems are inconsistent use of 'INC' vs 'INC.', inconsistent spacing before 'LLC', and varying formats for company names. The correct values should use 'INC' without a period, remove extra spaces before 'LLC', and maintain consistent capitalization. Some entries also have typos or unusual abbreviations that need to be corrected.
    -- PHONE: The main problem is inconsistent formatting of phone numbers. Most numbers are either in a single string format (like '4134472000') or have parentheses for the area code (like '(978) 406-1327'). The value '413-731-6000 Or 413-731-6000' is particularly unusual as it repeats the same number and includes the word 'Or'. The correct values should follow the most common format, which appears to be a single string of digits without any separators.
    SELECT
        "organization_id",
        CASE
            WHEN "facility_name" = 'STEWARD GOOD SAMARITAN MEDICAL CENTER  INC.' THEN 'STEWARD GOOD SAMARITAN MEDICAL CENTER INC'
            WHEN "facility_name" = '135 BENTON DRIVE OPERATING COMPANY  LLC' THEN '135 BENTON DRIVE OPERATING COMPANY LLC'
            WHEN "facility_name" = '265 ESSEX STREET OPERATING COMPANY  LLC' THEN '265 ESSEX STREET OPERATING COMPANY LLC'
            WHEN "facility_name" = '4499 ACUSHNET AVENUE OPERATING COMPANY  LLC' THEN '4499 ACUSHNET AVENUE OPERATING COMPANY LLC'
            WHEN "facility_name" = 'ABBOTT HOME HEALTH CARE  INC' THEN 'ABBOTT HOME HEALTH CARE INC'
            WHEN "facility_name" = 'ARBOUR HOSPITAL  THE' THEN 'THE ARBOUR HOSPITAL'
            WHEN "facility_name" = 'ASSOCIATED PHYSICIANS OF HARVARD MEDICAL FACULTY PHYSICIANS AT BETH IS' THEN 'ASSOCIATED PHYSICIANS OF HARVARD MEDICAL FACULTY PHYSICIANS AT BETH ISRAEL'
            WHEN "facility_name" = 'BAYADA HOME HEALTH CARE  INC' THEN 'BAYADA HOME HEALTH CARE INC'
            WHEN "facility_name" = 'BEAR MT EAST LONGMEADOW LLC' THEN 'BEAR MOUNTAIN EAST LONGMEADOW LLC'
            WHEN "facility_name" = 'BEAR MT SPRINGFIELD LLC' THEN 'BEAR MOUNTAIN SPRINGFIELD LLC'
            WHEN "facility_name" = 'BERKSHIRE MEDICAL CENTER INC - 1' THEN 'BERKSHIRE MEDICAL CENTER INC'
            WHEN "facility_name" = 'BL HEALTHCARE INC DELEWARE' THEN 'BL HEALTHCARE INC DELAWARE'
            WHEN "facility_name" = 'BOSTON HOME  INC (THE)' THEN 'THE BOSTON HOME INC'
            WHEN "facility_name" = 'BOSTON MEDICAL CENTER CORPORATION-' THEN 'BOSTON MEDICAL CENTER CORPORATION'
            WHEN "facility_name" = 'BROCKTON HOSPITAL  INC.' THEN 'BROCKTON HOSPITAL INC'
            WHEN "facility_name" = 'CARECENTRAL URGENT CARE MEDICAL GROUP PC' THEN 'CARECENTRAL URGENT CARE MEDICAL GROUP P.C.'
            WHEN "facility_name" = 'CAREWELL URGENT CARE CENTERS OF MA  PC' THEN 'CAREWELL URGENT CARE CENTERS OF MA P.C.'
            WHEN "facility_name" = 'CARING HEALTH CENTER  INC' THEN 'CARING HEALTH CENTER INC'
            WHEN "facility_name" = 'CHARLES RIVER COMMUNITY HEALTH  INC' THEN 'CHARLES RIVER COMMUNITY HEALTH INC'
            WHEN "facility_name" = 'CHELMSFORD FAMILY PRACTICE  PC' THEN 'CHELMSFORD FAMILY PRACTICE P.C.'
            WHEN "facility_name" = 'COMPASSIONATE CARE HOSPICE OF SOUTHEASTERN MASS LL' THEN 'COMPASSIONATE CARE HOSPICE OF SOUTHEASTERN MASS LLC'
            WHEN "facility_name" = 'CONNECTICUT RIVER INTERNISTS. LLP' THEN 'CONNECTICUT RIVER INTERNISTS LLP'
            WHEN "facility_name" = 'DIMOCK COMMUNITY HEALTH CENTER  INC' THEN 'DIMOCK COMMUNITY HEALTH CENTER INC.'
            WHEN "facility_name" = 'DYOUVILLE TRANSITIONAL CARE INC' THEN 'D''YOUVILLE TRANSITIONAL CARE INC.'
            WHEN "facility_name" = 'EDWARD M KENNEDY COMMUNITY HEALTH CENTER INC' THEN 'EDWARD M. KENNEDY COMMUNITY HEALTH CENTER INC.'
            WHEN "facility_name" = 'ENCOMPASS HEALTH BRAINTREE HOSPITAL OF BRAINTREE' THEN 'ENCOMPASS HEALTH HOSPITAL OF BRAINTREE'
            WHEN "facility_name" = 'ENCOMPASS HEALTH REHAB HOSPITAL OF WESTERN MASS' THEN 'ENCOMPASS HEALTH REHAB HOSPITAL OF WESTERN MASS.'
            WHEN "facility_name" = 'EXCEL HOME CARE SERVICES INC' THEN 'EXCEL HOME CARE SERVICES INC.'
            WHEN "facility_name" = 'FALMOUTH HOSPITAL ASSOCIATION INC' THEN 'FALMOUTH HOSPITAL ASSOCIATION INC.'
            WHEN "facility_name" = 'FRANCISCAN HOSPITAL FOR CHILDREN INC' THEN 'FRANCISCAN HOSPITAL FOR CHILDREN INC.'
            WHEN "facility_name" = 'Fitchburg Outpatient Clinic' THEN 'FITCHBURG OUTPATIENT CLINIC'
            WHEN "facility_name" = 'HOLYOKE HEALTH CENTER INC' THEN 'HOLYOKE HEALTH CENTER INC.'
            WHEN "facility_name" = 'HOLYOKE MEDICAL CENTER INC' THEN 'HOLYOKE MEDICAL CENTER INC.'
            WHEN "facility_name" = 'LAHEY HOSPITAL & MEDICAL CENTER  BURLINGTON' THEN 'LAHEY HOSPITAL & MEDICAL CENTER BURLINGTON'
            WHEN "facility_name" = 'LIGHTHOUSE OF REVERE  INC.' THEN 'LIGHTHOUSE OF REVERE INC.'
            WHEN "facility_name" = 'LUTHERAN HOME OF JAMAICA PLAIN INC' THEN 'LUTHERAN HOME OF JAMAICA PLAIN INC.'
            WHEN "facility_name" = 'MELROSEWAKEFIELD HEALTHCARE  INC' THEN 'MELROSEWAKEFIELD HEALTHCARE, INC.'
            WHEN "facility_name" = 'MIDDLESEX INTERNAL MEDICINE ASSOCIATES  INC' THEN 'MIDDLESEX INTERNAL MEDICINE ASSOCIATES, INC.'
            WHEN "facility_name" = 'MORTON HOSPITAL  A STEWARD FAMILY HOSPITAL  INC.' THEN 'MORTON HOSPITAL, A STEWARD FAMILY HOSPITAL, INC.'
            WHEN "facility_name" = 'NEW BEDFORD INTERNAL MEDICINE & GERIATRICS  LLC' THEN 'NEW BEDFORD INTERNAL MEDICINE & GERIATRICS, LLC'
            WHEN "facility_name" = 'NEW ENGLAND FAMILY HEALTH LLC' THEN 'NEW ENGLAND FAMILY HEALTH, LLC'
            WHEN "facility_name" = 'NEW ENGLAND PROFESSIONAL HOME HEALTH CARE LLC' THEN 'NEW ENGLAND PROFESSIONAL HOME HEALTH CARE, LLC'
            WHEN "facility_name" = 'NORTH ADAMS COMMONS NURSING & REHABILITATION CENTE' THEN 'NORTH ADAMS COMMONS NURSING & REHABILITATION CENTER'
            WHEN "facility_name" = 'NORTH RIVER HOSPICE LLC' THEN 'NORTH RIVER HOSPICE, LLC'
            WHEN "facility_name" = 'NORTH SHORE MEDICAL CENTER INC' THEN 'NORTH SHORE MEDICAL CENTER, INC.'
            WHEN "facility_name" = 'OAKS  THE' THEN 'THE OAKS'
            WHEN "facility_name" = 'PALM SKILLED  NRSING CR & CTR FOR REHAB EXCELLENCE' THEN 'PALM SKILLED NURSING CARE & CENTER FOR REHAB EXCELLENCE'
            WHEN "facility_name" = 'PRIMARY & PREVENTIVE CARE  INC.' THEN 'PRIMARY & PREVENTIVE CARE, INC.'
            WHEN "facility_name" = 'Plymouth Outreach Clinic' THEN 'PLYMOUTH OUTREACH CLINIC'
            WHEN "facility_name" = 'QUINCY HEALTH AND REHABILITATION CENTER LLC' THEN 'QUINCY HEALTH AND REHABILITATION CENTER, LLC'
            WHEN "facility_name" = 'RIVER''S EDGE PRIMARY CARE LLC' THEN 'RIVER''S EDGE PRIMARY CARE, LLC'
            WHEN "facility_name" = 'ROYAL NORWELL NURSING & REHABILITATION CENTER LLC' THEN 'ROYAL NORWELL NURSING & REHABILITATION CENTER, LLC'
            WHEN "facility_name" = 'SEASONS HOSPICE & PALLIATIVE CARE OF MASS LLC' THEN 'SEASONS HOSPICE & PALLIATIVE CARE OF MASS, LLC'
            WHEN "facility_name" = 'SOUTH SHORE MEDICAL INVESTORS LLC' THEN 'SOUTH SHORE MEDICAL INVESTORS, LLC'
            WHEN "facility_name" = 'SOUTH SHORE PRIMARY AND URGENT CARE LLC' THEN 'SOUTH SHORE PRIMARY AND URGENT CARE, LLC'
            WHEN "facility_name" = 'SOUTHWOOD AT NORWELL NURSING CTR' THEN 'SOUTHWOOD AT NORWELL NURSING CENTER'
            WHEN "facility_name" = 'SUN MEDICAL CLINIC  PC' THEN 'SUN MEDICAL CLINIC, PC'
            WHEN "facility_name" = 'Springfield Outpatient Clinic' THEN 'SPRINGFIELD OUTPATIENT CLINIC'
            WHEN "facility_name" = 'TAUNTON MEDICAL CENTER  PC' THEN 'TAUNTON MEDICAL CENTER, PC'
            WHEN "facility_name" = 'TUFTS MEDICAL CENTER  INC' THEN 'TUFTS MEDICAL CENTER INC'
            WHEN "facility_name" = 'URGENT CARE MEDICAL ASSOCIATES  LLC' THEN 'URGENT CARE MEDICAL ASSOCIATES LLC'
            WHEN "facility_name" = 'URGENT CARE SPECIALISTS  P.C.' THEN 'URGENT CARE SPECIALISTS PC'
            WHEN "facility_name" = 'VIGILANT FAMILY HEALTH CLINIC  INC.' THEN 'VIGILANT FAMILY HEALTH CLINIC INC'
            WHEN "facility_name" = 'WESTBOROUGH BEHAVIORAL HEALTHCARE HOSPITAL  LLC' THEN 'WESTBOROUGH BEHAVIORAL HEALTHCARE HOSPITAL LLC'
            WHEN "facility_name" = 'WORCESTER INTERNAL MEDICINE  INC.' THEN 'WORCESTER INTERNAL MEDICINE INC'
            ELSE "facility_name"
        END AS "facility_name",
        "street_address",
        "CITY",
        "STATE",
        "ZIP",
        "latitude",
        "longitude",
        CASE
            WHEN "PHONE" = '(978) 406-1327' THEN '9784061327'
            WHEN "PHONE" = '413-731-6000 Or 413-731-6000' THEN '4137316000'
            WHEN "PHONE" = '978-342-9781 Or 978-342-9781' THEN '9783429781'
            WHEN "PHONE" = '800-865-3384' THEN '8008653384'
            ELSE "PHONE"
        END AS "PHONE",
        "revenue",
        "utilization_rate"
    FROM "organizations_renamed"
),

"organizations_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- ZIP: from INT to VARCHAR
    -- organization_id: from VARCHAR to UUID
    SELECT
        "facility_name",
        "street_address",
        "CITY",
        "STATE",
        "latitude",
        "longitude",
        "PHONE",
        "revenue",
        "utilization_rate",
        CAST("ZIP" AS VARCHAR) 
        AS "ZIP",
        CAST("organization_id" AS UUID) 
        AS "organization_id"
    FROM "organizations_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "organizations_renamed_cleaned_casted"

stg_organizations.yml (Document the table)

version: 2
models:
- name: stg_organizations
  description: The table is about healthcare organizations. It includes details such
    as unique ID, name, address, city, state, ZIP code, latitude, longitude, phone
    number, revenue, and utilization. The organizations appear to be various healthcare
    facilities like clinics, hospitals, and urgent care centers. They are located
    in Massachusetts. The table provides contact information and some operational
    data for each organization.
  columns:
  - name: facility_name
    description: Name of the healthcare facility
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: street_address
    description: Street address of the facility
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)$
        summary: Basic street address with number and street name
      - regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)(?:\s+(?:SUITE|STE)\s+\d+)?$
        summary: Address with suite or unit number
      - regex: ^[A-Z0-9\s]+$
        summary: General pattern allowing for various formats
      - regex: \d+\s+[A-Za-z\s&''\.]+\s+(Road|ST\.|Street|ROAD|PL)
        summary: Street addresses with number, name, and type (most strict)
      - regex: \d+\s+[A-Za-z\s&''\.]+
        summary: Street addresses with number and name (less strict)
      data_type:
        current_data_type: VARCHAR
  - name: CITY
    description: City where the facility is located
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: STATE
    description: State abbreviation for the facility's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: latitude
    description: Latitude coordinate of the facility
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: longitude
    description: Longitude coordinate of the facility
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: PHONE
    description: Contact phone number for the facility
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: \d{10}
        summary: 10-digit phone numbers without separators
      data_type:
        current_data_type: VARCHAR
  - name: revenue
    description: Revenue information for the facility
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: utilization_rate
    description: Measure of facility usage or capacity
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: ZIP
    description: ZIP code of the facility's location
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: organization_id
    description: Unique identifier for the healthcare organization
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for the healthcare
        organization. For this table, each row is for a unique healthcare facility.
        The organization_id appears to be a UUID, which is designed to be unique for
        each entry.
      data_type:
        current_data_type: UUID
  tests: []

stg_immunizations

Table Preview (first 5 rows)

vaccine_code vaccine_description base_cost encounter_id immunization_datetime patient_id
0 140 Influenza seasonal injectable preservative free 136.0 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 3dfb065a-67df-5b8a-3901-49bfd834bed1
1 140 Influenza seasonal injectable preservative free 136.0 07969a4d-f9f0-c63f-6a0e-ecb659a20571 1986-06-17 21:35:49 db80575b-5e9b-921b-fad9-1e3a20929dc7
2 21 varicella 136.0 8b1ef1fb-3297-bf02-8442-f68af013a64c 2014-01-19 01:05:00 3dfb065a-67df-5b8a-3901-49bfd834bed1
3 10 IPV 136.0 8b1ef1fb-3297-bf02-8442-f68af013a64c 2014-01-19 01:05:00 3dfb065a-67df-5b8a-3901-49bfd834bed1
4 140 Influenza seasonal injectable preservative free 136.0 8b1ef1fb-3297-bf02-8442-f68af013a64c 2014-01-19 01:05:00 3dfb065a-67df-5b8a-3901-49bfd834bed1

stg_immunizations.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:01:52.568671+00:00
WITH 
"immunizations_renamed" AS (
    -- Rename: Renaming columns
    -- DATE_ -> immunization_datetime
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> vaccine_code
    -- DESCRIPTION -> vaccine_description
    -- BASE_COST -> base_cost
    SELECT 
        "DATE_" AS "immunization_datetime",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "vaccine_code",
        "DESCRIPTION" AS "vaccine_description",
        "BASE_COST" AS "base_cost"
    FROM "memory"."main"."immunizations"
),

"immunizations_renamed_casted" AS (
    -- Column Type Casting: 
    -- encounter_id: from VARCHAR to UUID
    -- immunization_datetime: from VARCHAR to TIMESTAMP
    -- patient_id: from VARCHAR to UUID
    SELECT
        "vaccine_code",
        "vaccine_description",
        "base_cost",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("immunization_datetime" AS TIMESTAMP) 
        AS "immunization_datetime",
        CAST("patient_id" AS UUID) 
        AS "patient_id"
    FROM "immunizations_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "immunizations_renamed_casted"

stg_immunizations.yml (Document the table)

version: 2
models:
- name: stg_immunizations
  description: The table is about immunizations given to patients. It includes details
    such as the date of immunization, patient ID, encounter ID, vaccine code, vaccine
    description, and base cost. Each row represents a single immunization event. Multiple
    immunizations can be given to the same patient on the same date. All immunizations
    in the sample have the same base cost of $136.00.
  columns:
  - name: vaccine_code
    description: Numeric code representing the vaccine type
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: vaccine_description
    description: Text description of the vaccine administered
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: base_cost
    description: Base cost of the immunization in dollars
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: encounter_id
    description: Unique identifier for the healthcare encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: immunization_datetime
    description: Date and time of immunization
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  tests: []

stg_payer_transitions

Table Preview (first 5 rows)

plan_ownership owner_name coverage_end_date coverage_start_date member_id patient_id primary_payer_id secondary_payer_id
0 Guardian Solomon675 Bailey598 2022-09-29 02:44:04 2021-09-23 02:44:04 74d0f7af-d6c6-7848-9f62-c3f165170e09 eb247227-e839-88d3-447d-b5972468f33b 734afbd6-4794-363b-9bc0-6a3981533ed5 NaN
1 Guardian Solomon675 Bailey598 2023-09-29 02:44:04 2022-09-29 02:44:04 74d0f7af-d6c6-7848-9f62-c3f165170e09 eb247227-e839-88d3-447d-b5972468f33b 734afbd6-4794-363b-9bc0-6a3981533ed5 NaN
2 Guardian Hong136 Keebler762 2010-02-14 01:05:00 2009-02-08 01:05:00 d4ef17f8-1714-3490-17af-70d4d0c16478 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c NaN
3 Guardian Hong136 Keebler762 2011-02-20 01:05:00 2010-02-14 01:05:00 d4ef17f8-1714-3490-17af-70d4d0c16478 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c NaN
4 Guardian Hong136 Keebler762 2012-02-26 01:05:00 2011-02-20 01:05:00 d4ef17f8-1714-3490-17af-70d4d0c16478 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c NaN

stg_payer_transitions.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:13:46.432396+00:00
WITH 
"payer_transitions_renamed" AS (
    -- Rename: Renaming columns
    -- PATIENT -> patient_id
    -- MEMBERID -> member_id
    -- START_DATE -> coverage_start_date
    -- END_DATE -> coverage_end_date
    -- PAYER -> primary_payer_id
    -- SECONDARY_PAYER -> secondary_payer_id
    -- PLAN_OWNERSHIP -> plan_ownership
    -- OWNER_NAME -> owner_name
    SELECT 
        "PATIENT" AS "patient_id",
        "MEMBERID" AS "member_id",
        "START_DATE" AS "coverage_start_date",
        "END_DATE" AS "coverage_end_date",
        "PAYER" AS "primary_payer_id",
        "SECONDARY_PAYER" AS "secondary_payer_id",
        "PLAN_OWNERSHIP" AS "plan_ownership",
        "OWNER_NAME" AS "owner_name"
    FROM "memory"."main"."payer_transitions"
),

"payer_transitions_renamed_casted" AS (
    -- Column Type Casting: 
    -- coverage_end_date: from VARCHAR to TIMESTAMP
    -- coverage_start_date: from VARCHAR to TIMESTAMP
    -- member_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- primary_payer_id: from VARCHAR to UUID
    -- secondary_payer_id: from VARCHAR to UUID
    SELECT
        "plan_ownership",
        "owner_name",
        CASE
            WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:04Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
            WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:07Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
            WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
        END 
        AS "coverage_end_date",
        CASE
            WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T13:42:04Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
            WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T03:41:07Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
            WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T19:43:25Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
            WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
        END 
        AS "coverage_start_date",
        CAST("member_id" AS UUID) 
        AS "member_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("primary_payer_id" AS UUID) 
        AS "primary_payer_id",
        CAST("secondary_payer_id" AS UUID) 
        AS "secondary_payer_id"
    FROM "payer_transitions_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "payer_transitions_renamed_casted"

stg_payer_transitions.yml (Document the table)

version: 2
models:
- name: stg_payer_transitions
  description: The table represents payer transitions for patients. It shows insurance
    coverage periods for patients. Each row contains a patient ID, member ID, start
    and end dates of coverage, primary payer, secondary payer (if any), plan ownership,
    and owner name. The table tracks changes in insurance coverage over time for individual
    patients.
  columns:
  - name: plan_ownership
    description: Type of ownership for the insurance plan
    tests:
    - accepted_values:
        values:
        - Self
        - Guardian
        - Spouse
    cocoon_meta:
      missing_reason: Unknown
      future_accepted_values:
      - Parent
      - Child
      - Sibling
      - Domestic Partner
      - Other Family Member
      - Employer
      - Trust
      - Other
      data_type:
        current_data_type: VARCHAR
  - name: owner_name
    description: Name of the insurance plan owner
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: coverage_end_date
    description: Ending date of insurance coverage period
    tests:
    - not_null
    cocoon_meta:
      unusual_values: 'Two different time formats are used: 13:42:04Z for most dates
        and 03:41:07Z for two December dates.'
      data_type:
        current_data_type: TIMESTAMP
  - name: coverage_start_date
    description: Beginning date of insurance coverage period
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: member_id
    description: Unique identifier for the insurance member
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: primary_payer_id
    description: Primary insurance payer identifier
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: secondary_payer_id
    description: Secondary insurance payer identifier, if applicable
    cocoon_meta:
      missing_reason: Not all patients have a secondary insurance plan.
      data_type:
        current_data_type: UUID
  tests: []

stg_medications

Table Preview (first 5 rows)

medication_description base_cost payer_coverage_amount dispense_count total_cost reason_description encounter_id medication_code order_end_datetime order_start_datetime patient_id payer_id reason_code
0 Clopidogrel 75 MG Oral Tablet 631.42 0.00 162 102290.04 None 546912c2-aa40-a2e0-f148-af23fa171af0 309362 NaT 2009-09-02 12:00:39 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a None
1 Simvastatin 20 MG Oral Tablet 323.20 29.40 162 52358.40 None 546912c2-aa40-a2e0-f148-af23fa171af0 312961 NaT 2009-09-02 12:00:39 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a None
2 24 HR metoprolol succinate 100 MG Extended Release Oral Tablet 129.94 99.94 162 21050.28 None 546912c2-aa40-a2e0-f148-af23fa171af0 866412 NaT 2009-09-02 12:00:39 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a None
3 Nitroglycerin 0.4 MG/ACTUAT Mucosal Spray 702.61 672.61 162 113822.82 None 546912c2-aa40-a2e0-f148-af23fa171af0 705129 NaT 2009-09-02 12:00:39 2ffa361e-5858-877e-e022-ce81fe32da1b 0133f751-9229-3cfd-815f-b6d4979bdd6a None
4 Ibuprofen 100 MG Oral Tablet 332.42 0.00 3 997.26 None 1600b01f-01b3-fd15-135f-c7c4a30d867f 198405 2016-07-23 04:55:57 2016-04-11 04:55:57 3dfb065a-67df-5b8a-3901-49bfd834bed1 26aab0cd-6aba-3e1b-ac5b-05c8867e762c None

stg_medications.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:03:31.386317+00:00
WITH 
"medications_renamed" AS (
    -- Rename: Renaming columns
    -- START -> order_start_datetime
    -- STOP -> order_end_datetime
    -- PATIENT -> patient_id
    -- PAYER -> payer_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> medication_code
    -- DESCRIPTION -> medication_description
    -- BASE_COST -> base_cost
    -- PAYER_COVERAGE -> payer_coverage_amount
    -- DISPENSES -> dispense_count
    -- TOTALCOST -> total_cost
    -- REASONCODE -> reason_code
    -- REASONDESCRIPTION -> reason_description
    SELECT 
        "START" AS "order_start_datetime",
        "STOP" AS "order_end_datetime",
        "PATIENT" AS "patient_id",
        "PAYER" AS "payer_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "medication_code",
        "DESCRIPTION" AS "medication_description",
        "BASE_COST" AS "base_cost",
        "PAYER_COVERAGE" AS "payer_coverage_amount",
        "DISPENSES" AS "dispense_count",
        "TOTALCOST" AS "total_cost",
        "REASONCODE" AS "reason_code",
        "REASONDESCRIPTION" AS "reason_description"
    FROM "memory"."main"."medications"
),

"medications_renamed_casted" AS (
    -- Column Type Casting: 
    -- encounter_id: from VARCHAR to UUID
    -- medication_code: from INT to VARCHAR
    -- order_end_datetime: from VARCHAR to TIMESTAMP
    -- order_start_datetime: from VARCHAR to TIMESTAMP
    -- patient_id: from VARCHAR to UUID
    -- payer_id: from VARCHAR to UUID
    -- reason_code: from DECIMAL to VARCHAR
    SELECT
        "medication_description",
        "base_cost",
        "payer_coverage_amount",
        "dispense_count",
        "total_cost",
        "reason_description",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("medication_code" AS VARCHAR) 
        AS "medication_code",
        CAST("order_end_datetime" AS TIMESTAMP) 
        AS "order_end_datetime",
        CAST("order_start_datetime" AS TIMESTAMP) 
        AS "order_start_datetime",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("payer_id" AS UUID) 
        AS "payer_id",
        CAST("reason_code" AS VARCHAR) 
        AS "reason_code"
    FROM "medications_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "medications_renamed_casted"

stg_medications.yml (Document the table)

version: 2
models:
- name: stg_medications
  description: The table is about medication orders. It includes patient IDs, payer
    information, encounter IDs, and medication details. Each row represents a medication
    prescription with its start date, code, description, cost, coverage, and quantity
    dispensed. The table tracks both base cost and total cost for each medication.
    Some entries have reason codes and descriptions, though these are often empty
    in the sample data.
  columns:
  - name: medication_description
    description: Description of the medication
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: ^[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
        summary: Medication name followed by dosage (in MG, UNT, or ACTUAT) and form
          (e.g., Oral Tablet, Injection)
      - regex: ^\d+\sHR\s[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
        summary: Medication with time duration (e.g., 24 HR) at the beginning
      - regex: ^\d+\sML\s[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
        summary: Medication with volume (e.g., 1 ML) at the beginning
      - regex: ^[A-Za-z]+\s\d+\sDay\sPack$
        summary: Birth control packs (e.g., 28 Day Pack)
      - regex: ^.+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT|ML)(/[A-Z]+)?.*$
        summary: Looser pattern to catch most entries
      - regex: ^[A-Za-z]+\s\d+\.\d+/\d+\.\d+\sMG\sper\s\d+HR\s\d+\sDay\s[A-Za-z]+\s[A-Za-z]+$
        summary: Medication name followed by dosage in MG and administration method
      - regex: ^[A-Za-z]+\s\d+\.\d+/\d+\s\d+\sDay\sPack$
        summary: Medication name followed by dosage and pack information
      - regex: ^[A-Za-z]+/[A-Za-z]+$
        summary: Medication names separated by forward slash
      data_type:
        current_data_type: VARCHAR
  - name: base_cost
    description: Base cost of the medication
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: payer_coverage_amount
    description: Amount covered by the payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: dispense_count
    description: Number of times the medication was dispensed
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: total_cost
    description: Total cost of the medication
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: reason_description
    description: Description of the reason for prescription
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: medication_code
    description: Medication code
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: order_end_datetime
    description: End date and time of medication order
    cocoon_meta:
      missing_reason: Ongoing medications don't have an end date.
      data_type:
        current_data_type: TIMESTAMP
  - name: order_start_datetime
    description: Start date and time of medication order
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: payer_id
    description: Unique identifier for the insurance payer
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: reason_code
    description: Code for the reason of prescription
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_imaging_studies

Table Preview (first 5 rows)

series_uid bodysite_description modality_code modality_description instance_uid sop_code sop_description bodysite_code encounter_id patient_id procedure_code study_datetime study_id
0 1.2.840.99999999.1.63419662.1460344377161 Clavicle DX Digital Radiography 1.2.840.99999999.1.1.35771508.1103224827588 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 51299004 1600b01f-01b3-fd15-135f-c7c4a30d867f 3dfb065a-67df-5b8a-3901-49bfd834bed1 168594001 2016-04-11 03:12:57 b7b3078a-86e4-fc51-1720-d33afcd4ce88
1 1.2.840.99999999.1.83549147.1664073844511 Ankle DX Digital Radiography 1.2.840.99999999.1.1.49048804.1413650359123 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 344001 1cedec53-a362-320c-d52a-ddbc57fbce63 eb247227-e839-88d3-447d-b5972468f33b 19490002 2022-09-25 02:44:04 6f1b4250-2e8e-eaaf-f6a7-655d8f8e5791
2 1.2.840.99999999.1.10943528.675468811165 Arm DX Digital Radiography 1.2.840.99999999.1.1.93692690.675468811165 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 40983000 59f48dfd-f1f6-faa7-f378-150137d46be4 db80575b-5e9b-921b-fad9-1e3a20929dc7 1225002 1991-05-28 22:13:31 932a8e0d-20d5-573b-db47-8b9db9d3b99c
3 1.2.840.99999999.1.70861523.1413650359123 Ankle DX Digital Radiography 1.2.840.99999999.1.1.49048804.1413650359123 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 344001 a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 734e5f3c-e660-6cbe-7c26-c5264cbde68e 19490002 2014-10-18 16:39:19 c16fda8c-2353-75d9-f7c2-d91c773d87de
4 1.2.840.99999999.1.26012892.1385166503634 Clavicle DX Digital Radiography 1.2.840.99999999.1.1.35771508.1103224827588 1.2.840.10008.5.1.4.1.1.1.1 Digital X-Ray Image Storage 51299004 0148026b-b4db-1663-19ed-a6a51d3c33b2 750cdaf4-c264-e967-e76b-53a5a61abcab 168594001 2013-11-23 00:28:23 ef1ee1a1-e906-6744-421a-ab8480b8b8ae

stg_imaging_studies.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:01:06.123995+00:00
WITH 
"imaging_studies_renamed" AS (
    -- Rename: Renaming columns
    -- Id -> study_id
    -- DATE_ -> study_datetime
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- SERIES_UID -> series_uid
    -- BODYSITE_CODE -> bodysite_code
    -- BODYSITE_DESCRIPTION -> bodysite_description
    -- MODALITY_CODE -> modality_code
    -- MODALITY_DESCRIPTION -> modality_description
    -- INSTANCE_UID -> instance_uid
    -- SOP_CODE -> sop_code
    -- SOP_DESCRIPTION -> sop_description
    -- PROCEDURE_CODE -> procedure_code
    SELECT 
        "Id" AS "study_id",
        "DATE_" AS "study_datetime",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "SERIES_UID" AS "series_uid",
        "BODYSITE_CODE" AS "bodysite_code",
        "BODYSITE_DESCRIPTION" AS "bodysite_description",
        "MODALITY_CODE" AS "modality_code",
        "MODALITY_DESCRIPTION" AS "modality_description",
        "INSTANCE_UID" AS "instance_uid",
        "SOP_CODE" AS "sop_code",
        "SOP_DESCRIPTION" AS "sop_description",
        "PROCEDURE_CODE" AS "procedure_code"
    FROM "memory"."main"."imaging_studies"
),

"imaging_studies_renamed_casted" AS (
    -- Column Type Casting: 
    -- bodysite_code: from INT to VARCHAR
    -- encounter_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- procedure_code: from INT to VARCHAR
    -- study_datetime: from VARCHAR to TIMESTAMP
    -- study_id: from VARCHAR to UUID
    SELECT
        "series_uid",
        "bodysite_description",
        "modality_code",
        "modality_description",
        "instance_uid",
        "sop_code",
        "sop_description",
        CAST("bodysite_code" AS VARCHAR) 
        AS "bodysite_code",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("procedure_code" AS VARCHAR) 
        AS "procedure_code",
        CAST("study_datetime" AS TIMESTAMP) 
        AS "study_datetime",
        CAST("study_id" AS UUID) 
        AS "study_id"
    FROM "imaging_studies_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "imaging_studies_renamed_casted"

stg_imaging_studies.yml (Document the table)

version: 2
models:
- name: stg_imaging_studies
  description: The table is about imaging studies. It contains details of medical
    imaging procedures. Each row represents a unique imaging study. It includes the
    study ID, date, patient ID, encounter ID, body site, imaging modality, and procedure
    codes. The table also contains technical details like series UID, instance UID,
    and SOP codes. This data likely comes from a healthcare system's radiology department.
  columns:
  - name: series_uid
    description: Unique identifier for the image series
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for the image series.
        For this table, each row is for a unique imaging study. The series_uid appears
        to be unique across rows in the sample data.
      data_type:
        current_data_type: VARCHAR
  - name: bodysite_description
    description: Description of the body part imaged
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: modality_code
    description: Code representing the imaging modality
    tests:
    - not_null
    - accepted_values:
        values:
        - CR
        - DX
        - US
        - CT
    cocoon_meta:
      future_accepted_values:
      - MR
      - NM
      - PT
      - XA
      - RF
      - MG
      - IO
      - PX
      - GM
      - SM
      - XC
      - ES
      - ECG
      - HD
      - SR
      - OT
      - BI
      - CD
      - DD
      - LS
      - OPT
      data_type:
        current_data_type: VARCHAR
  - name: modality_description
    description: Description of the imaging modality
    tests:
    - not_null
    - accepted_values:
        values:
        - Computed Radiography
        - Digital Radiography
        - Ultrasound
        - Computed Tomography
    cocoon_meta:
      future_accepted_values:
      - Magnetic Resonance Imaging
      - Nuclear Medicine
      - Positron Emission Tomography
      - Fluoroscopy
      - Mammography
      - Angiography
      - Dual-Energy X-ray Absorptiometry
      - Optical Coherence Tomography
      data_type:
        current_data_type: VARCHAR
  - name: instance_uid
    description: Unique identifier for the image instance
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: sop_code
    description: Service-Object Pair code for DICOM standard
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: sop_description
    description: Description of the SOP code
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: bodysite_code
    description: Code representing the body part imaged
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: encounter_id
    description: Unique identifier for the healthcare encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: procedure_code
    description: Code representing the imaging procedure performed
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: study_datetime
    description: Date and time of the imaging study
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: study_id
    description: Unique identifier for the imaging study
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for the imaging study.
        For this table, each row is for a unique imaging study. The study_id appears
        to be unique across rows in the sample data.
      data_type:
        current_data_type: UUID
  tests: []

stg_devices

Table Preview (first 5 rows)

device_description device_udi device_code encounter_id patient_id usage_end_date usage_start_date
0 Home nebulizer (physical object) (01)22513743438051(11)041125(17)291210(10)68326934928628(21)359700838 170615005 7cde5dee-2065-db54-2aca-5335fca85bd5 2ffa361e-5858-877e-e022-ce81fe32da1b NaT 2004-12-16 19:00:39
1 Manual wheelchair (physical object) (01)10803398532575(11)220904(17)470919(10)77584230(21)506502350562 228869008 1cedec53-a362-320c-d52a-ddbc57fbce63 eb247227-e839-88d3-447d-b5972468f33b NaT 2022-09-25 02:44:04
2 Home nebulizer (physical object) (01)91908954162042(11)030105(17)280120(10)384840111239630881(21)2214828292399 170615005 becb96b4-e1aa-1b80-0aa9-8b06e258871f d84815a3-c5b3-8ca2-025f-6323a4ec59ef NaT 2003-01-26 18:35:55
3 Manual wheelchair (physical object) (01)75341347423725(11)170329(17)420413(10)91627316426(21)736039710205635138 228869008 4e5eeece-1a14-8ae0-874f-15d9c6f42a34 7ec76836-c039-d9bf-8bb9-fe488c66d452 2017-05-15 15:37:58 2017-04-19 15:37:58
4 Manual wheelchair (physical object) (01)44815148141981(11)140927(17)391012(10)395970756765435894(21)565882491188675 228869008 a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 734e5f3c-e660-6cbe-7c26-c5264cbde68e 2015-01-09 17:11:06 2014-10-18 16:39:19

stg_devices.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:56:59.986257+00:00
WITH 
"devices_renamed" AS (
    -- Rename: Renaming columns
    -- START -> usage_start_date
    -- STOP -> usage_end_date
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> device_code
    -- DESCRIPTION -> device_description
    -- UDI -> device_udi
    SELECT 
        "START" AS "usage_start_date",
        "STOP" AS "usage_end_date",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "device_code",
        "DESCRIPTION" AS "device_description",
        "UDI" AS "device_udi"
    FROM "memory"."main"."devices"
),

"devices_renamed_casted" AS (
    -- Column Type Casting: 
    -- device_code: from INT to VARCHAR
    -- encounter_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- usage_end_date: from VARCHAR to TIMESTAMP
    -- usage_start_date: from VARCHAR to TIMESTAMP
    SELECT
        "device_description",
        "device_udi",
        CAST("device_code" AS VARCHAR) 
        AS "device_code",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("usage_end_date" AS TIMESTAMP) 
        AS "usage_end_date",
        CAST("usage_start_date" AS TIMESTAMP) 
        AS "usage_start_date"
    FROM "devices_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "devices_renamed_casted"

stg_devices.yml (Document the table)

version: 2
models:
- name: stg_devices
  description: The table is about medical devices. It contains start and stop dates
    for device usage. Each row represents a device associated with a patient and encounter.
    The table includes device codes, descriptions, and unique device identifiers (UDI).
    Devices listed are home nebulizers and manual wheelchairs. The UDI contains encoded
    information about the device.
  columns:
  - name: device_description
    description: Text description of the device
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: device_udi
    description: Unique Device Identifier with encoded information
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column contains the Unique Device Identifier (UDI) for each
        medical device. For this table, each row represents a unique device usage
        instance associated with a patient and encounter. The device_udi appears to
        be unique across rows, as it contains encoded information specific to each
        device instance.
      patterns:
      - regex: \(01\)\d+\(11\)\d+\(17\)\d+\(10\)\d+\(21\)\d+
        summary: 'UDI code with 5 segments: (01), (11), (17), (10), and (21)'
      data_type:
        current_data_type: VARCHAR
  - name: device_code
    description: Numeric code identifying the device type
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: usage_end_date
    description: End date of device usage
    cocoon_meta:
      missing_reason: Device usage might be ongoing, without an end date.
      data_type:
        current_data_type: TIMESTAMP
  - name: usage_start_date
    description: Start date of device usage
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  tests: []

stg_observations

Table Preview (first 5 rows)

CATEGORY observation_code DESCRIPTION UNITS data_type encounter_id observation_datetime observation_value patient_id
0 vital-signs 29463-7 Body Weight kg numeric 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 11.1 3dfb065a-67df-5b8a-3901-49bfd834bed1
1 vital-signs 59576-9 Body mass index (BMI) [Percentile] Per age and gender % numeric 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 0.1 3dfb065a-67df-5b8a-3901-49bfd834bed1
2 vital-signs 8480-6 Systolic Blood Pressure mmHg numeric 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 127.0 3dfb065a-67df-5b8a-3901-49bfd834bed1
3 vital-signs 9279-1 Respiratory rate /min numeric 8b80cac7-383e-b479-1a05-7d8a9de45d04 2013-01-13 01:05:00 15.0 3dfb065a-67df-5b8a-3901-49bfd834bed1
4 vital-signs 72514-3 Pain severity - 0-10 verbal numeric rating [Score] - Reported None numeric 8b1ef1fb-3297-bf02-8442-f68af013a64c 2014-01-19 01:05:00 1.0 3dfb065a-67df-5b8a-3901-49bfd834bed1

stg_observations.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:06:20.318190+00:00
WITH 
"observations_renamed" AS (
    -- Rename: Renaming columns
    -- DATE_ -> observation_datetime
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> observation_code
    -- VALUE_ -> observation_value
    -- TYPE -> data_type
    SELECT 
        "DATE_" AS "observation_datetime",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CATEGORY",
        "CODE" AS "observation_code",
        "DESCRIPTION",
        "VALUE_" AS "observation_value",
        "UNITS",
        "TYPE" AS "data_type"
    FROM "memory"."main"."observations"
),

"observations_renamed_dedup" AS (
    -- Deduplication: Removed 556 duplicated rows
    SELECT DISTINCT *
    FROM "observations_renamed"
),

"observations_renamed_dedup_cleaned" AS (
    -- Clean unusual string values: 
    -- DESCRIPTION: The problem is inconsistent capitalization and punctuation in question-format values, as well as unexplained abbreviations. The correct values should have consistent capitalization (sentence case) and punctuation (question marks for questions), and abbreviated terms should be expanded for clarity.
    -- observation_value: '0.0' and 'No' likely represent the same concept. '1.0' and 'Yes' might also be redundant representations.
    -- UNITS: The problem is that some values in the UNITS column are not standard unit abbreviations and lack specificity. The unusual values are '{score}', '{nominal}', 'a', '{#}', '{INR}', '{presence}', '{SG}', and '{T-score}'. These are not standardized units of measurement but rather descriptors or placeholders. Additionally, some units have inconsistent formatting, such as '[pH]' vs 'pH', and '[degF]' which should be standardized. The correct values should be standardized units of measurement or left empty if they represent a score or nominal value without a specific unit.
    SELECT
        "observation_datetime",
        "patient_id",
        "encounter_id",
        "CATEGORY",
        "observation_code",
        CASE
            WHEN "DESCRIPTION" = 'At any point in the past 2 years  has season or migrant farm work been your or your family''s main source of income?' THEN 'At any point in the past 2 years, has season or migrant farm work been your or your family''s main source of income?'
            WHEN "DESCRIPTION" = 'During the past year  what was the total combined income for you and the family members you live with? This information will help us determine if you are eligible for any benefits.' THEN 'During the past year, what was the total combined income for you and the family members you live with? This information will help us determine if you are eligible for any benefits.'
            WHEN "DESCRIPTION" = 'Has lack of transportation kept you from medical appointments  meetings  work  or from getting things needed for daily living?' THEN 'Has lack of transportation kept you from medical appointments, meetings, work, or from getting things needed for daily living?'
            WHEN "DESCRIPTION" = 'How many family members  including yourself  do you currently live with?' THEN 'How many family members, including yourself, do you currently live with?'
            WHEN "DESCRIPTION" = 'How often do you see or talk to people that you care about and feel close to (For example: talking to friends on the phone  visiting friends or family  going to church or club meetings)?' THEN 'How often do you see or talk to people that you care about and feel close to? (For example: talking to friends on the phone, visiting friends or family, going to church or club meetings)'
            WHEN "DESCRIPTION" = 'In the past year  have you been afraid of your partner or ex-partner?' THEN 'In the past year, have you been afraid of your partner or ex-partner?'
            WHEN "DESCRIPTION" = 'In the past year  have you or any family members you live with been unable to get any of the following when it was really needed?' THEN 'In the past year, have you or any family members you live with been unable to get any of the following when it was really needed?'
            WHEN "DESCRIPTION" = 'In the past year  have you spent more than 2 nights in a row in a jail  prison  detention center  or juvenile correctional facility?' THEN 'In the past year, have you spent more than 2 nights in a row in a jail, prison, detention center, or juvenile correctional facility?'
            WHEN "DESCRIPTION" = 'Stress is when someone feels tense  nervous  anxious or can''t sleep at night because their mind is troubled. How stressed are you?' THEN 'Stress is when someone feels tense, nervous, anxious or can''t sleep at night because their mind is troubled. How stressed are you?'
            WHEN "DESCRIPTION" = 'DALY' THEN 'Disability-Adjusted Life Year (DALY)'
            WHEN "DESCRIPTION" = 'QALY' THEN 'Quality-Adjusted Life Year (QALY)'
            WHEN "DESCRIPTION" = 'QOLS' THEN 'Quality of Life Scale (QOLS)'
            WHEN "DESCRIPTION" = 'Total score [DAST-10]' THEN 'Total score [Drug Abuse Screening Test-10]'
            WHEN "DESCRIPTION" = 'Total score [HARK]' THEN 'Total score [Humiliation, Afraid, Rape, Kick Domestic Violence Screening Tool]'
            WHEN "DESCRIPTION" = 'Total score [AUDIT-C]' THEN 'Total score [Alcohol Use Disorders Identification Test-Concise]'
            WHEN "DESCRIPTION" = 'Patient Health Questionnaire-9: Modified for Teens total score [Reported.PHQ.Teen]' THEN 'Patient Health Questionnaire-9: Modified for Teens total score [PHQ-Teen]'
            WHEN "DESCRIPTION" = 'FEV1/FVC' THEN 'FEV1/FVC Ratio [Forced Expiratory Volume in 1 second / Forced Vital Capacity]'
            WHEN "DESCRIPTION" = 'Body temperature' THEN 'Body Temperature'
            WHEN "DESCRIPTION" = 'INR in Platelet poor plasma by Coagulation assay' THEN 'International Normalized Ratio (INR) in Platelet Poor Plasma by Coagulation Assay'
            WHEN "DESCRIPTION" = 'What number best describes how  during the past week  pain has interfered with your enjoyment of life?' THEN 'Pain Interference with Life Enjoyment (Past Week)'
            WHEN "DESCRIPTION" = 'What number best describes how  during the past week  pain has interfered with your general activity?' THEN 'Pain Interference with General Activity (Past Week)'
            WHEN "DESCRIPTION" = 'What number best describes your pain on average in the past week?' THEN 'Average Pain Level (Past Week)'
            WHEN "DESCRIPTION" = 'Head Occipital-frontal circumference' THEN 'Head Occipital-Frontal Circumference'
            WHEN "DESCRIPTION" = 'Head Occipital-frontal circumference Percentile' THEN 'Head Occipital-Frontal Circumference Percentile'
            WHEN "DESCRIPTION" = 'Weight-for-length Per age and sex' THEN 'Weight-for-Length Percentile by Age and Sex'
            WHEN "DESCRIPTION" = 'aPTT in Blood by Coagulation assay' THEN 'Activated Partial Thromboplastin Time (aPTT) in Blood by Coagulation Assay'
            WHEN "DESCRIPTION" = 'Bicarbonate [Moles/volume] in Arterial blood' THEN 'Bicarbonate [Moles/Volume] in Arterial Blood'
            WHEN "DESCRIPTION" = 'Carbon dioxide [Partial pressure] in Arterial blood' THEN 'Carbon Dioxide [Partial Pressure] in Arterial Blood'
            WHEN "DESCRIPTION" = 'Oxygen [Partial pressure] in Arterial blood' THEN 'Oxygen [Partial Pressure] in Arterial Blood'
            WHEN "DESCRIPTION" = 'pH of Arterial blood' THEN 'pH of Arterial Blood'
            WHEN "DESCRIPTION" = 'Phosphate [Mass/volume] in Serum or Plasma' THEN 'Phosphate [Mass/Volume] in Serum or Plasma'
            WHEN "DESCRIPTION" = 'Prothrombin time (PT)' THEN 'Prothrombin Time (PT)'
            WHEN "DESCRIPTION" = 'Patient Health Questionnaire 9 item (PHQ-9) total score [Reported]' THEN 'Patient Health Questionnaire-9 (PHQ-9) Total Score [Reported]'
            WHEN "DESCRIPTION" = 'Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method' THEN 'Cardiac Troponin I [Mass/Volume] in Serum or Plasma by High Sensitivity Method'
            WHEN "DESCRIPTION" = 'Drugs of abuse 5 panel - Urine by Screen method' THEN 'Drugs of Abuse 5-Panel - Urine Screen'
            WHEN "DESCRIPTION" = 'Iron binding capacity [Mass/volume] in Serum or Plasma' THEN 'Iron Binding Capacity [Mass/Volume] in Serum or Plasma'
            WHEN "DESCRIPTION" = 'Iron saturation [Mass Fraction] in Serum or Plasma' THEN 'Iron Saturation [Mass Fraction] in Serum or Plasma'
            WHEN "DESCRIPTION" = 'NT-proBNP' THEN 'N-Terminal Pro-B-Type Natriuretic Peptide (NT-proBNP)'
            WHEN "DESCRIPTION" = 'Total score [MMSE]' THEN 'Total Score [Mini-Mental State Examination (MMSE)]'
            WHEN "DESCRIPTION" = 'Natriuretic peptide.B prohormone N-Terminal [Mass/volume] in Blood by Immunoassay' THEN 'N-Terminal Pro-B-Type Natriuretic Peptide [Mass/Volume] in Blood by Immunoassay'
            WHEN "DESCRIPTION" = 'Left ventricular Ejection fraction' THEN 'Left Ventricular Ejection Fraction'
            WHEN "DESCRIPTION" = 'Operative Status Value' THEN 'Operative Status'
            WHEN "DESCRIPTION" = 'American house dust mite IgE Ab in Serum' THEN 'American House Dust Mite IgE Antibody in Serum'
            ELSE "DESCRIPTION"
        END AS "DESCRIPTION",
        "observation_value",
        CASE
            WHEN "UNITS" = '{score}' THEN NULL
            WHEN "UNITS" = 'mm[Hg]' THEN 'mmHg'
            WHEN "UNITS" = 'a' THEN '/year'
            WHEN "UNITS" = '{nominal}' THEN NULL
            WHEN "UNITS" = 'kg/m2' THEN 'kg/m²'
            WHEN "UNITS" = '{#}' THEN NULL
            WHEN "UNITS" = '10*3/uL' THEN '10³/µL'
            WHEN "UNITS" = '/a' THEN '/year'
            WHEN "UNITS" = '10*6/uL' THEN '10⁶/µL'
            WHEN "UNITS" = 'Cel' THEN '°C'
            WHEN "UNITS" = '{INR}' THEN NULL
            WHEN "UNITS" = 'mL/min/{1.73_m2}' THEN 'mL/min/1.73m²'
            WHEN "UNITS" = '[pH]' THEN 'pH'
            WHEN "UNITS" = '{presence}' THEN NULL
            WHEN "UNITS" = 'ug/dL' THEN 'µg/dL'
            WHEN "UNITS" = 'ug/L' THEN 'µg/L'
            WHEN "UNITS" = '[degF]' THEN '°F'
            WHEN "UNITS" = 'm[IU]/L' THEN 'mIU/L'
            WHEN "UNITS" = '{SG}' THEN NULL
            WHEN "UNITS" = '{T-score}' THEN NULL
            ELSE "UNITS"
        END AS "UNITS",
        "data_type"
    FROM "observations_renamed_dedup"
),

"observations_renamed_dedup_cleaned_casted" AS (
    -- Column Type Casting: 
    -- encounter_id: from VARCHAR to UUID
    -- observation_datetime: from VARCHAR to TIMESTAMP
    -- observation_value: from VARCHAR to DECIMAL
    -- patient_id: from VARCHAR to UUID
    SELECT
        "CATEGORY",
        "observation_code",
        "DESCRIPTION",
        "UNITS",
        "data_type",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("observation_datetime" AS TIMESTAMP) 
        AS "observation_datetime",
        "observation_value" 
        AS "observation_value",
        CAST("patient_id" AS UUID) 
        AS "patient_id"
    FROM "observations_renamed_dedup_cleaned"
)

-- COCOON BLOCK END
SELECT *
FROM "observations_renamed_dedup_cleaned_casted"

stg_observations.yml (Document the table)

version: 2
models:
- name: stg_observations
  description: The table is about medical observations. It contains details of patient
    health measurements. Each row represents a single observation. The table includes
    the date, patient ID, encounter ID, category, measurement code, description, value,
    units, and type of measurement. Observations cover vital signs like body height,
    weight, and pain severity scores.
  columns:
  - name: CATEGORY
    description: Category of the medical observation
    tests:
    - accepted_values:
        values:
        - laboratory
        - survey
        - vital-signs
        - social-history
        - procedure
        - exam
        - therapy
    cocoon_meta:
      missing_reason: Unknown
      future_accepted_values:
      - medication
      - imaging
      - diagnosis
      - allergy
      - immunization
      - family-history
      data_type:
        current_data_type: VARCHAR
  - name: observation_code
    description: Standardized code for the medical observation
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: \d{5,7}-\d
        summary: LOINC codes (5-7 digits, hyphen, 1 digit)
      - regex: '[A-Z]{3,4}'
        summary: Short alphabetic codes (3-4 uppercase letters)
      - regex: '[A-Za-z0-9-]+'
        summary: Alphanumeric codes (mix of digits and letters)
      data_type:
        current_data_type: VARCHAR
  - name: DESCRIPTION
    description: Description of the medical observation
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: UNITS
    description: Units of measurement for the observation
    cocoon_meta:
      missing_reason: Not all vital signs or measurements require units.
      data_type:
        current_data_type: VARCHAR
  - name: data_type
    description: Type of data for the observation
    tests:
    - not_null
    - accepted_values:
        values:
        - numeric
        - text
    cocoon_meta:
      future_accepted_values:
      - categorical
      - boolean
      - date
      - datetime
      - time
      data_type:
        current_data_type: VARCHAR
  - name: encounter_id
    description: Unique identifier for the medical encounter
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: UUID
  - name: observation_datetime
    description: Date and time of the medical observation
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: observation_value
    description: Numeric value of the medical observation
    tests:
    - not_null
    cocoon_meta:
      unusual_values: '''0.0'' and ''No'' likely represent the same concept. ''1.0''
        and ''Yes'' might also be redundant representations.'
      patterns:
      - regex: ^\d+(\.\d+)?$
        summary: Floating point numbers (including whole numbers)
      - regex: ^(Yes|No)$
        summary: Yes/No values
      - regex: ^(3 to 5 times a week|5 or more times a week)$
        summary: Frequency descriptions
      - regex: ^(More than high school|High school diploma or GED)$
        summary: Education levels
      - regex: ^(Private insurance|Medicare|Medicaid)$
        summary: Insurance types
      - regex: ^(Full-time work|Part-time or temporary work)$
        summary: Employment statuses
      - regex: ^(Finding of .+|\w+.+\(finding\))$
        summary: Medical findings (starting with "Finding of" or ending with "(finding)")
      - regex: ^.+\(qualifier value\)$
        summary: Qualifiers (ending with "(qualifier value)")
      data_type:
        current_data_type: VARCHAR
        expected_data_type: DECIMAL
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  tests: []

stg_conditions

Table Preview (first 5 rows)

condition_description condition_code condition_end_date condition_start_date encounter_id patient_id
0 Received certificate of high school equivalency (finding) 5251000175109 NaT 1962-07-25 c6600278-5c90-c0c7-db52-140b3b98d1e8 2ffa361e-5858-877e-e022-ce81fe32da1b
1 Full-time employment (finding) 160903007 1963-07-31 1962-07-25 c6600278-5c90-c0c7-db52-140b3b98d1e8 2ffa361e-5858-877e-e022-ce81fe32da1b
2 Full-time employment (finding) 160903007 1966-08-03 1963-07-31 5d6bce0a-93b6-1c60-4399-97a717a3f843 2ffa361e-5858-877e-e022-ce81fe32da1b
3 Full-time employment (finding) 160903007 1969-08-06 1966-08-03 829830eb-ca6e-ae6a-99a2-728751b0ea8e 2ffa361e-5858-877e-e022-ce81fe32da1b
4 Full-time employment (finding) 160903007 1975-08-13 1972-08-09 0f6ff0c6-c365-84f5-6742-226325f0ab22 2ffa361e-5858-877e-e022-ce81fe32da1b

stg_conditions.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:55:38.865223+00:00
WITH 
"conditions_renamed" AS (
    -- Rename: Renaming columns
    -- START -> condition_start_date
    -- STOP -> condition_end_date
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> condition_code
    -- DESCRIPTION -> condition_description
    SELECT 
        "START" AS "condition_start_date",
        "STOP" AS "condition_end_date",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "condition_code",
        "DESCRIPTION" AS "condition_description"
    FROM "memory"."main"."conditions"
),

"conditions_renamed_casted" AS (
    -- Column Type Casting: 
    -- condition_code: from INT to VARCHAR
    -- condition_end_date: from VARCHAR to DATE
    -- condition_start_date: from VARCHAR to DATE
    -- encounter_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    SELECT
        "condition_description",
        CAST("condition_code" AS VARCHAR) 
        AS "condition_code",
        CAST("condition_end_date" AS DATE) 
        AS "condition_end_date",
        CAST("condition_start_date" AS DATE) 
        AS "condition_start_date",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id"
    FROM "conditions_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "conditions_renamed_casted"

stg_conditions.yml (Document the table)

version: 2
models:
- name: stg_conditions
  description: The table is about patient conditions and employment status. It shows
    start and end dates for conditions. Each row links a patient to an encounter and
    a condition code. The conditions include high school equivalency and full-time
    employment. The same patient has multiple entries over different time periods.
    The table tracks changes in employment status for the patient across years.
  columns:
  - name: condition_description
    description: Text description of the condition or status
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: .+\s\((finding|disorder)\)$
        summary: Entries ending with "(finding)" or "(disorder)"
      - regex: ([A-Z][a-z]+\s?)+(\([a-z]+\))?
        summary: Capitalized phrases (allowing for some lowercase words)
      data_type:
        current_data_type: VARCHAR
  - name: condition_code
    description: Standardized code for the condition or status
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: condition_end_date
    description: End date of the condition or status
    cocoon_meta:
      missing_reason: Ongoing conditions or one-time events don't have end dates.
      data_type:
        current_data_type: DATE
  - name: condition_start_date
    description: Start date of the condition or status
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DATE
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  tests: []

stg_claims_transactions

Table Preview (first 5 rows)

service_description patient_insurance_id outstanding_balance charge_id service_units department_id transfer_type payment_amount unit_cost adjustment_amount transaction_amount transaction_type payment_method transfer_amount fee_schedule_id appointment_id claim_id line_note patient_id primary_diagnosis_ref procedure_code procedure_modifier_1 procedure_modifier_2 provider_id quaternary_diagnosis_ref secondary_diagnosis_ref service_end_datetime service_location_id service_start_datetime supervising_provider_id tertiary_diagnosis_ref transaction_id transfer_out_id
0 General examination of patient (procedure) None 0.0 0 1 20 1 0.0 136.8 0.0 136.8 CHARGE None NaN 1 c6600278-5c90-c0c7-db52-140b3b98d1e8 9f4426b7-f32c-2b00-8ab4-1152cd0915ce None 2ffa361e-5858-877e-e022-ce81fe32da1b 1 162673000 None None 5ceef3de-8e50-3c8f-8224-d277539f4100 None 2.0 1962-07-25 12:15:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 1962-07-25 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None da18b712-59be-0489-1ba0-4830aef61968 None
1 General examination of patient (procedure) None 0.0 1 1 20 None 136.8 0.0 0.0 NaN PAYMENT CHECK NaN 1 c6600278-5c90-c0c7-db52-140b3b98d1e8 9f4426b7-f32c-2b00-8ab4-1152cd0915ce None 2ffa361e-5858-877e-e022-ce81fe32da1b 1 162673000 None None 5ceef3de-8e50-3c8f-8224-d277539f4100 None 2.0 1962-07-25 12:15:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 1962-07-25 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None 907b5d5f-4d34-385d-8fd9-564e782a8743 None
2 General examination of patient (procedure) None 0.0 3 1 20 1 0.0 136.8 0.0 136.8 CHARGE None NaN 1 5d6bce0a-93b6-1c60-4399-97a717a3f843 e22515cc-2422-7aac-49bd-ecc3a2ffdf86 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1 162673000 None None 5ceef3de-8e50-3c8f-8224-d277539f4100 None None 1963-07-31 12:15:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 1963-07-31 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None bb0601d7-9d50-d1cc-488d-7e29cd95b49c None
3 General examination of patient (procedure) None 0.0 4 1 20 None 136.8 0.0 0.0 NaN PAYMENT CASH NaN 1 5d6bce0a-93b6-1c60-4399-97a717a3f843 e22515cc-2422-7aac-49bd-ecc3a2ffdf86 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1 162673000 None None 5ceef3de-8e50-3c8f-8224-d277539f4100 None None 1963-07-31 12:15:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 1963-07-31 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None 555840f3-d0fc-4bb2-52fc-477e525e9ff1 None
4 General examination of patient (procedure) None 0.0 6 1 20 1 0.0 136.8 0.0 136.8 CHARGE None NaN 1 829830eb-ca6e-ae6a-99a2-728751b0ea8e 7ac8ce29-5517-d3d5-5004-f11942ae3225 None 2ffa361e-5858-877e-e022-ce81fe32da1b 1 162673000 None None 5ceef3de-8e50-3c8f-8224-d277539f4100 None None 1966-08-03 12:15:39 f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 1966-08-03 12:00:39 5ceef3de-8e50-3c8f-8224-d277539f4100 None 7e46adfb-5b4f-4454-037a-2d834eefc4e8 None

stg_claims_transactions.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:54:34.603382+00:00
WITH 
"claims_transactions_renamed" AS (
    -- Rename: Renaming columns
    -- ID -> transaction_id
    -- CLAIMID -> claim_id
    -- CHARGEID -> charge_id
    -- PATIENTID -> patient_id
    -- TYPE -> transaction_type
    -- AMOUNT -> transaction_amount
    -- METHOD -> payment_method
    -- FROMDATE -> service_start_datetime
    -- TODATE -> service_end_datetime
    -- PLACEOFSERVICE -> service_location_id
    -- PROCEDURECODE -> procedure_code
    -- MODIFIER1 -> procedure_modifier_1
    -- MODIFIER2 -> procedure_modifier_2
    -- DIAGNOSISREF1 -> primary_diagnosis_ref
    -- DIAGNOSISREF2 -> secondary_diagnosis_ref
    -- DIAGNOSISREF3 -> tertiary_diagnosis_ref
    -- DIAGNOSISREF4 -> quaternary_diagnosis_ref
    -- UNITS -> service_units
    -- DEPARTMENTID -> department_id
    -- NOTES -> service_description
    -- UNITAMOUNT -> unit_cost
    -- TRANSFEROUTID -> transfer_out_id
    -- TRANSFERTYPE -> transfer_type
    -- PAYMENTS -> payment_amount
    -- ADJUSTMENTS -> adjustment_amount
    -- TRANSFERS -> transfer_amount
    -- OUTSTANDING -> outstanding_balance
    -- APPOINTMENTID -> appointment_id
    -- LINENOTE -> line_note
    -- PATIENTINSURANCEID -> patient_insurance_id
    -- FEESCHEDULEID -> fee_schedule_id
    -- PROVIDERID -> provider_id
    -- SUPERVISINGPROVIDERID -> supervising_provider_id
    SELECT 
        "ID" AS "transaction_id",
        "CLAIMID" AS "claim_id",
        "CHARGEID" AS "charge_id",
        "PATIENTID" AS "patient_id",
        "TYPE" AS "transaction_type",
        "AMOUNT" AS "transaction_amount",
        "METHOD" AS "payment_method",
        "FROMDATE" AS "service_start_datetime",
        "TODATE" AS "service_end_datetime",
        "PLACEOFSERVICE" AS "service_location_id",
        "PROCEDURECODE" AS "procedure_code",
        "MODIFIER1" AS "procedure_modifier_1",
        "MODIFIER2" AS "procedure_modifier_2",
        "DIAGNOSISREF1" AS "primary_diagnosis_ref",
        "DIAGNOSISREF2" AS "secondary_diagnosis_ref",
        "DIAGNOSISREF3" AS "tertiary_diagnosis_ref",
        "DIAGNOSISREF4" AS "quaternary_diagnosis_ref",
        "UNITS" AS "service_units",
        "DEPARTMENTID" AS "department_id",
        "NOTES" AS "service_description",
        "UNITAMOUNT" AS "unit_cost",
        "TRANSFEROUTID" AS "transfer_out_id",
        "TRANSFERTYPE" AS "transfer_type",
        "PAYMENTS" AS "payment_amount",
        "ADJUSTMENTS" AS "adjustment_amount",
        "TRANSFERS" AS "transfer_amount",
        "OUTSTANDING" AS "outstanding_balance",
        "APPOINTMENTID" AS "appointment_id",
        "LINENOTE" AS "line_note",
        "PATIENTINSURANCEID" AS "patient_insurance_id",
        "FEESCHEDULEID" AS "fee_schedule_id",
        "PROVIDERID" AS "provider_id",
        "SUPERVISINGPROVIDERID" AS "supervising_provider_id"
    FROM "memory"."main"."claims_transactions"
),

"claims_transactions_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- service_description: 'Encounter for problem' and 'Encounter for problem (procedure)' are redundant. Inconsistent use of '(procedure)' suffix.
    -- transfer_type: The problem is that the transfer_type column contains a mix of numeric and alphabetic values. The value 'p' is unusual because it's alphabetic while the others ('1' and '2') are numeric. Without more context, it's unclear what 'p' represents. However, since '1' is the most frequent value and the column seems to be intended for numeric codes, it's reasonable to map 'p' to '1'.
    SELECT
        "transaction_id",
        "claim_id",
        "charge_id",
        "patient_id",
        "transaction_type",
        "transaction_amount",
        "payment_method",
        "service_start_datetime",
        "service_end_datetime",
        "service_location_id",
        "procedure_code",
        "procedure_modifier_1",
        "procedure_modifier_2",
        "primary_diagnosis_ref",
        "secondary_diagnosis_ref",
        "tertiary_diagnosis_ref",
        "quaternary_diagnosis_ref",
        "service_units",
        "department_id",
        "service_description",
        "unit_cost",
        "transfer_out_id",
        CASE
            WHEN "transfer_type" = 'p' THEN '1'
            ELSE "transfer_type"
        END AS "transfer_type",
        "payment_amount",
        "adjustment_amount",
        "transfer_amount",
        "outstanding_balance",
        "appointment_id",
        "line_note",
        "patient_insurance_id",
        "fee_schedule_id",
        "provider_id",
        "supervising_provider_id"
    FROM "claims_transactions_renamed"
),

"claims_transactions_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- service_end_datetime: ['1970-01-01T00:00:00Z']
    SELECT 
        CASE
            WHEN "service_end_datetime" = '1970-01-01T00:00:00Z' THEN NULL
            ELSE "service_end_datetime"
        END AS "service_end_datetime",
        "service_description",
        "patient_insurance_id",
        "quaternary_diagnosis_ref",
        "claim_id",
        "procedure_code",
        "outstanding_balance",
        "provider_id",
        "charge_id",
        "appointment_id",
        "primary_diagnosis_ref",
        "line_note",
        "patient_id",
        "service_units",
        "department_id",
        "transfer_type",
        "payment_amount",
        "unit_cost",
        "service_start_datetime",
        "adjustment_amount",
        "transaction_id",
        "service_location_id",
        "transaction_amount",
        "supervising_provider_id",
        "procedure_modifier_2",
        "tertiary_diagnosis_ref",
        "transaction_type",
        "secondary_diagnosis_ref",
        "payment_method",
        "procedure_modifier_1",
        "transfer_amount",
        "transfer_out_id",
        "fee_schedule_id"
    FROM "claims_transactions_renamed_cleaned"
),

"claims_transactions_renamed_cleaned_null_casted" AS (
    -- Column Type Casting: 
    -- appointment_id: from VARCHAR to UUID
    -- claim_id: from VARCHAR to UUID
    -- line_note: from DECIMAL to VARCHAR
    -- patient_id: from VARCHAR to UUID
    -- primary_diagnosis_ref: from INT to VARCHAR
    -- procedure_code: from INT to VARCHAR
    -- procedure_modifier_1: from DECIMAL to VARCHAR
    -- procedure_modifier_2: from DECIMAL to VARCHAR
    -- provider_id: from VARCHAR to UUID
    -- quaternary_diagnosis_ref: from DECIMAL to VARCHAR
    -- secondary_diagnosis_ref: from DECIMAL to VARCHAR
    -- service_end_datetime: from VARCHAR to TIMESTAMP
    -- service_location_id: from VARCHAR to UUID
    -- service_start_datetime: from VARCHAR to TIMESTAMP
    -- supervising_provider_id: from VARCHAR to UUID
    -- tertiary_diagnosis_ref: from DECIMAL to VARCHAR
    -- transaction_id: from VARCHAR to UUID
    -- transfer_out_id: from DECIMAL to VARCHAR
    SELECT
        "service_description",
        "patient_insurance_id",
        "outstanding_balance",
        "charge_id",
        "service_units",
        "department_id",
        "transfer_type",
        "payment_amount",
        "unit_cost",
        "adjustment_amount",
        "transaction_amount",
        "transaction_type",
        "payment_method",
        "transfer_amount",
        "fee_schedule_id",
        CAST("appointment_id" AS UUID) 
        AS "appointment_id",
        CAST("claim_id" AS UUID) 
        AS "claim_id",
        CAST("line_note" AS VARCHAR) 
        AS "line_note",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("primary_diagnosis_ref" AS VARCHAR) 
        AS "primary_diagnosis_ref",
        CAST("procedure_code" AS VARCHAR) 
        AS "procedure_code",
        CAST("procedure_modifier_1" AS VARCHAR) 
        AS "procedure_modifier_1",
        CAST("procedure_modifier_2" AS VARCHAR) 
        AS "procedure_modifier_2",
        CAST("provider_id" AS UUID) 
        AS "provider_id",
        CAST("quaternary_diagnosis_ref" AS VARCHAR) 
        AS "quaternary_diagnosis_ref",
        CAST("secondary_diagnosis_ref" AS VARCHAR) 
        AS "secondary_diagnosis_ref",
        CAST("service_end_datetime" AS TIMESTAMP) 
        AS "service_end_datetime",
        CAST("service_location_id" AS UUID) 
        AS "service_location_id",
        CAST("service_start_datetime" AS TIMESTAMP) 
        AS "service_start_datetime",
        CAST("supervising_provider_id" AS UUID) 
        AS "supervising_provider_id",
        CAST("tertiary_diagnosis_ref" AS VARCHAR) 
        AS "tertiary_diagnosis_ref",
        CAST("transaction_id" AS UUID) 
        AS "transaction_id",
        CAST("transfer_out_id" AS VARCHAR) 
        AS "transfer_out_id"
    FROM "claims_transactions_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT *
FROM "claims_transactions_renamed_cleaned_null_casted"

stg_claims_transactions.yml (Document the table)

version: 2
models:
- name: stg_claims_transactions
  description: The table represents claims transactions in a healthcare system. It
    includes charges and payments for medical services. Each row is a transaction
    linked to a claim. Transactions have details like patient ID, procedure code,
    amount, and payment method. The table connects patients, providers, and financial
    aspects of healthcare services.
  columns:
  - name: service_description
    description: Description of the procedure or service
    tests:
    - not_null
    cocoon_meta:
      unusual_values: '''Encounter for problem'' and ''Encounter for problem (procedure)''
        are redundant. Inconsistent use of ''(procedure)'' suffix.'
      data_type:
        current_data_type: VARCHAR
  - name: patient_insurance_id
    description: Identifier for patient's insurance
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: outstanding_balance
    description: Remaining balance to be paid
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: charge_id
    description: Identifier for the specific charge within a claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: service_units
    description: Number of units of service provided
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: department_id
    description: Identifier for the department providing the service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: transfer_type
    description: Type of transfer (e.g., 'p' for payment)
    tests:
    - accepted_values:
        values:
        - '1'
        - '2'
    cocoon_meta:
      missing_reason: Not applicable when there's no transfer involved
      future_accepted_values:
      - p
      - t
      - w
      - d
      - r
      - f
      data_type:
        current_data_type: VARCHAR
  - name: payment_amount
    description: Amount paid for the service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: unit_cost
    description: Cost per unit of service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: adjustment_amount
    description: Adjustments made to the charge
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: transaction_amount
    description: Monetary value of the transaction
    cocoon_meta:
      missing_reason: Not applicable for PAYMENT transactions, only for CHARGES
      data_type:
        current_data_type: DECIMAL
  - name: transaction_type
    description: Type of transaction (e.g., CHARGE, PAYMENT)
    tests:
    - not_null
    - accepted_values:
        values:
        - PAYMENT
        - CHARGE
        - TRANSFERIN
        - TRANSFEROUT
    cocoon_meta:
      future_accepted_values:
      - REFUND
      - WITHDRAWAL
      - DEPOSIT
      - INTEREST
      - FEE
      - ADJUSTMENT
      - REVERSAL
      - CASHBACK
      - CREDIT
      - DEBIT
      data_type:
        current_data_type: VARCHAR
  - name: payment_method
    description: Payment method used (e.g., CHECK, CASH)
    tests:
    - accepted_values:
        values:
        - ECHECK
        - COPAY
        - CASH
        - CC
        - CHECK
    cocoon_meta:
      missing_reason: Not applicable for CHARGE transactions, only for PAYMENTS
      future_accepted_values:
      - DEBIT
      - WIRE
      - ACH
      - MONEY ORDER
      - PAYPAL
      - VENMO
      - APPLE PAY
      - GOOGLE PAY
      - CRYPTOCURRENCY
      - GIFT CARD
      - STORE CREDIT
      - LOYALTY POINTS
      data_type:
        current_data_type: VARCHAR
  - name: transfer_amount
    description: Amount transferred between accounts
    cocoon_meta:
      missing_reason: Not applicable when there's no transfer involved
      data_type:
        current_data_type: DECIMAL
  - name: fee_schedule_id
    description: Identifier for the fee schedule used
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: INT
  - name: appointment_id
    description: Identifier for the associated appointment
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: claim_id
    description: Identifier for the associated claim
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: line_note
    description: Additional notes for the transaction line
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: primary_diagnosis_ref
    description: Primary diagnosis reference
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: procedure_code
    description: Code identifying the medical procedure performed
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: procedure_modifier_1
    description: First modifier for the procedure code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: procedure_modifier_2
    description: Second modifier for the procedure code
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: provider_id
    description: Unique identifier for the healthcare provider
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: quaternary_diagnosis_ref
    description: Quaternary diagnosis reference
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: secondary_diagnosis_ref
    description: Secondary diagnosis reference
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: service_end_datetime
    description: End date and time of the service
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: TIMESTAMP
  - name: service_location_id
    description: Identifier for the location of service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: service_start_datetime
    description: Start date and time of the service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: supervising_provider_id
    description: ID of supervising provider for the service
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: tertiary_diagnosis_ref
    description: Tertiary diagnosis reference
    cocoon_meta:
      missing_reason: Unknown
      data_type:
        current_data_type: VARCHAR
  - name: transaction_id
    description: Unique identifier for the transaction
    tests:
    - not_null
    - unique
    cocoon_meta:
      unique_reason: This column represents a unique identifier for each transaction.
        For this table, each row is a distinct transaction (either a charge or a payment).
        The transaction_id appears to be unique for each row, as it's different for
        every sample provided.
      data_type:
        current_data_type: UUID
  - name: transfer_out_id
    description: Identifier for transfer out transactions
    cocoon_meta:
      missing_reason: Not applicable when there's no transfer involved
      data_type:
        current_data_type: VARCHAR
  tests: []

stg_procedures

Table Preview (first 5 rows)

procedure_description procedure_cost reason_description encounter_id patient_id procedure_code procedure_end_time procedure_start_time reason_code
0 Medication Reconciliation (procedure) 493.03 None 8b1ef1fb-3297-bf02-8442-f68af013a64c 3dfb065a-67df-5b8a-3901-49bfd834bed1 430193006 2014-01-19 01:20:00 2014-01-19 01:05:00 None
1 Medication Reconciliation (procedure) 523.74 None 79b413f6-c668-bf41-bce7-b403c40cf7a4 eb247227-e839-88d3-447d-b5972468f33b 430193006 2021-09-23 02:59:04 2021-09-23 02:44:04 None
2 Medication Reconciliation (procedure) 536.16 None a79858c9-a74e-8ca3-c9a0-b63897e8fc05 eb247227-e839-88d3-447d-b5972468f33b 430193006 2021-10-28 02:59:04 2021-10-28 02:44:04 None
3 Clavicle X-ray 431.40 None 1600b01f-01b3-fd15-135f-c7c4a30d867f 3dfb065a-67df-5b8a-3901-49bfd834bed1 168594001 2016-04-11 03:42:57 2016-04-11 03:12:57 None
4 Admission to orthopedic department 431.40 Fracture of clavicle 1600b01f-01b3-fd15-135f-c7c4a30d867f 3dfb065a-67df-5b8a-3901-49bfd834bed1 305428000 2016-04-11 04:55:57 2016-04-11 03:12:57 58150001.0

stg_procedures.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:14:58.976415+00:00
WITH 
"procedures_renamed" AS (
    -- Rename: Renaming columns
    -- START -> procedure_start_time
    -- STOP -> procedure_end_time
    -- PATIENT -> patient_id
    -- ENCOUNTER -> encounter_id
    -- CODE -> procedure_code
    -- DESCRIPTION -> procedure_description
    -- BASE_COST -> procedure_cost
    -- REASONCODE -> reason_code
    -- REASONDESCRIPTION -> reason_description
    SELECT 
        "START" AS "procedure_start_time",
        "STOP" AS "procedure_end_time",
        "PATIENT" AS "patient_id",
        "ENCOUNTER" AS "encounter_id",
        "CODE" AS "procedure_code",
        "DESCRIPTION" AS "procedure_description",
        "BASE_COST" AS "procedure_cost",
        "REASONCODE" AS "reason_code",
        "REASONDESCRIPTION" AS "reason_description"
    FROM "memory"."main"."procedures"
),

"procedures_renamed_casted" AS (
    -- Column Type Casting: 
    -- encounter_id: from VARCHAR to UUID
    -- patient_id: from VARCHAR to UUID
    -- procedure_code: from INT to VARCHAR
    -- procedure_end_time: from VARCHAR to TIMESTAMP
    -- procedure_start_time: from VARCHAR to TIMESTAMP
    -- reason_code: from DECIMAL to VARCHAR
    SELECT
        "procedure_description",
        "procedure_cost",
        "reason_description",
        CAST("encounter_id" AS UUID) 
        AS "encounter_id",
        CAST("patient_id" AS UUID) 
        AS "patient_id",
        CAST("procedure_code" AS VARCHAR) 
        AS "procedure_code",
        CAST("procedure_end_time" AS TIMESTAMP) 
        AS "procedure_end_time",
        CAST("procedure_start_time" AS TIMESTAMP) 
        AS "procedure_start_time",
        CAST("reason_code" AS VARCHAR) 
        AS "reason_code"
    FROM "procedures_renamed"
)

-- COCOON BLOCK END
SELECT *
FROM "procedures_renamed_casted"

stg_procedures.yml (Document the table)

version: 2
models:
- name: stg_procedures
  description: The table is about medical procedures performed on patients. It includes
    details like start and stop times, patient ID, encounter ID, procedure code and
    description, base cost, and reason codes. Each row represents a specific procedure
    performed on a patient during a medical encounter. The table links patients, encounters,
    and procedures, providing a comprehensive view of medical services rendered.
  columns:
  - name: procedure_description
    description: Brief description of the medical procedure
    tests:
    - not_null
    cocoon_meta:
      patterns:
      - regex: .+\s\(procedure\)$
        summary: Procedures ending with "(procedure)"
      - regex: .+\s\(regime\/therapy\)$
        summary: Regimes or therapies ending with "(regime/therapy)"
      - regex: .+\s\([a-z\/]+\)$
        summary: General pattern for procedures or regimes in parentheses
      data_type:
        current_data_type: VARCHAR
  - name: procedure_cost
    description: Base cost of the medical procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: DECIMAL
  - name: reason_description
    description: Description of the reason for the procedure
    cocoon_meta:
      missing_reason: Not applicable for procedures without specific medical reasons.
      data_type:
        current_data_type: VARCHAR
  - name: encounter_id
    description: Unique identifier for the medical encounter
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: patient_id
    description: Unique identifier for the patient
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: UUID
  - name: procedure_code
    description: Numerical code for the medical procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: VARCHAR
  - name: procedure_end_time
    description: End time of the medical procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: procedure_start_time
    description: Start time of the medical procedure
    tests:
    - not_null
    cocoon_meta:
      data_type:
        current_data_type: TIMESTAMP
  - name: reason_code
    description: Code indicating the reason for the procedure
    cocoon_meta:
      missing_reason: Not applicable for procedures without specific medical reasons.
      data_type:
        current_data_type: VARCHAR
  tests: []
We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_careplans_0 stg_careplans stg_patients_15 stg_patients stg_careplans_0->stg_patients_15 stg_encounters_17 stg_encounters stg_careplans_0->stg_encounters_17 stg_organizations_1 stg_organizations stg_immunizations_2 stg_immunizations stg_immunizations_2->stg_patients_15 stg_immunizations_2->stg_encounters_17 stg_allergies_3 stg_allergies stg_allergies_3->stg_patients_15 stg_allergies_3->stg_encounters_17 stg_medications_4 stg_medications stg_payers_6 stg_payers stg_medications_4->stg_payers_6 stg_medications_4->stg_patients_15 stg_medications_4->stg_encounters_17 stg_procedures_5 stg_procedures stg_procedures_5->stg_patients_15 stg_procedures_5->stg_encounters_17 stg_providers_7 stg_providers stg_providers_7->stg_organizations_1 stg_payer_transitions_8 stg_payer_transitions stg_payer_transitions_8->stg_payers_6 stg_payer_transitions_8->stg_payers_6 stg_payer_transitions_8->stg_patients_15 stg_supplies_9 stg_supplies stg_supplies_9->stg_patients_15 stg_supplies_9->stg_encounters_17 stg_imaging_studies_10 stg_imaging_studies stg_imaging_studies_10->stg_patients_15 stg_imaging_studies_10->stg_encounters_17 stg_claims_11 stg_claims stg_claims_11->stg_payers_6 stg_claims_11->stg_payers_6 stg_claims_11->stg_providers_7 stg_claims_11->stg_providers_7 stg_claims_11->stg_providers_7 stg_claims_11->stg_patients_15 stg_devices_12 stg_devices stg_devices_12->stg_patients_15 stg_devices_12->stg_encounters_17 stg_claims_transactions_13 stg_claims_transactions stg_claims_transactions_13->stg_providers_7 stg_claims_transactions_13->stg_providers_7 stg_claims_transactions_13->stg_claims_11 stg_claims_transactions_13->stg_patients_15 stg_observations_14 stg_observations stg_observations_14->stg_patients_15 stg_observations_14->stg_encounters_17 stg_conditions_16 stg_conditions stg_conditions_16->stg_patients_15 stg_conditions_16->stg_encounters_17 stg_encounters_17->stg_organizations_1 stg_encounters_17->stg_payers_6 stg_encounters_17->stg_providers_7 stg_encounters_17->stg_patients_15

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_encounters
  primary_key: encounter_id
  foreign_keys:
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  - column: payer_id
    reference:
      table_name: stg_payers
      column: payer_id
  - column: provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  - column: organization_id
    reference:
      table_name: stg_organizations
      column: organization_id
  time_keys:
  - encounter_end
  - encounter_start
- table_name: stg_allergies
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - allergy_end_date
  - allergy_start_date
- table_name: stg_careplans
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - end_date
  - start_date
- table_name: stg_conditions
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - condition_end_date
  - condition_start_date
- table_name: stg_devices
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - usage_end_date
  - usage_start_date
- table_name: stg_imaging_studies
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - study_datetime
- table_name: stg_immunizations
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - immunization_datetime
- table_name: stg_medications
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  - column: payer_id
    reference:
      table_name: stg_payers
      column: payer_id
  time_keys:
  - order_end_datetime
  - order_start_datetime
- table_name: stg_observations
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - observation_datetime
- table_name: stg_procedures
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - procedure_end_time
  - procedure_start_time
- table_name: stg_supplies
  foreign_keys:
  - column: encounter_id
    reference:
      table_name: stg_encounters
      column: encounter_id
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  time_keys:
  - order_date
- table_name: stg_patients
  primary_key: patient_id
  foreign_keys: []
  time_keys:
  - birth_date
  - death_date
- table_name: stg_claims
  foreign_keys:
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  - column: primary_insurance_id
    reference:
      table_name: stg_payers
      column: payer_id
  - column: secondary_insurance_id
    reference:
      table_name: stg_payers
      column: payer_id
  - column: provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  - column: referring_provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  - column: supervising_provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  primary_key: claim_id
  time_keys:
  - illness_onset_date
  - patient_last_billed_date
  - primary_last_billed_date
  - secondary_last_billed_date
  - service_date
- table_name: stg_claims_transactions
  foreign_keys:
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  - column: provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  - column: supervising_provider_id
    reference:
      table_name: stg_providers
      column: provider_id
  - column: claim_id
    reference:
      table_name: stg_claims
      column: claim_id
  time_keys:
  - service_end_datetime
  - service_start_datetime
- table_name: stg_payer_transitions
  foreign_keys:
  - column: patient_id
    reference:
      table_name: stg_patients
      column: patient_id
  - column: primary_payer_id
    reference:
      table_name: stg_payers
      column: payer_id
  - column: secondary_payer_id
    reference:
      table_name: stg_payers
      column: payer_id
  time_keys:
  - coverage_end_date
  - coverage_start_date
- table_name: stg_payers
  primary_key: payer_id
  foreign_keys: []
- table_name: stg_providers
  primary_key: provider_id
  foreign_keys:
  - column: organization_id
    reference:
      table_name: stg_organizations
      column: organization_id
- table_name: stg_organizations
  primary_key: organization_id
  foreign_keys: []
We illustrate the step-by-step process behind the data.

All

Process Story

cocoon_er.yml (Document the entity relationships)

groups: []
entities:
- entity_name: Encounters
  entity_description: Represents individual medical encounters or visits, including
    details about the patient, provider, payer, and associated costs.
  table_name: stg_encounters
  primary_key: encounter_id
- entity_name: Patients
  entity_description: Represents individual patients and their personal and healthcare-related
    information.
  table_name: stg_patients
  primary_key: patient_id
- entity_name: Payers
  entity_description: Represents healthcare payers, including both government and
    private entities, along with their financial and operational data.
  table_name: stg_payers
  primary_key: payer_id
- entity_name: Providers
  entity_description: Represents healthcare providers, including their personal information,
    specialties, and practice details.
  table_name: stg_providers
  primary_key: provider_id
- entity_name: Claims
  entity_description: Represents individual insurance claims, including details about
    the patient, provider, diagnosis, and claim processing information.
  table_name: stg_claims
  primary_key: claim_id
- entity_name: Organizations
  entity_description: Represents healthcare organizations such as clinics, hospitals,
    and urgent care centers, including their contact information and operational data.
  table_name: stg_organizations
  primary_key: organization_id
relations:
- relation_name: PatientEncounterDetails
  relation_description: Encounters link Patients with Providers at Organizations,
    documenting medical procedures and associated costs covered by Payers.
  table_name: stg_encounters
  entities:
  - Encounters
  - Patients
  - Payers
  - Providers
  - Organizations
- relation_name: ProviderOrganizationAffiliation
  relation_description: Providers are affiliated with Organizations and deliver healthcare
    services within those organizations' networks.
  table_name: stg_providers
  entities:
  - Providers
  - Organizations
- relation_name: PatientProviderClaimPayment
  relation_description: Claims are filed by Patients for medical services provided
    by Providers and are processed by Payers for reimbursement.
  table_name: stg_claims
  entities:
  - Claims
  - Patients
  - Payers
  - Providers
- relation_name: PatientAllergyEncounters
  relation_description: Patients experience allergies during Encounters, with each
    allergy recorded along with its details and potential reactions.
  table_name: stg_allergies
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterCarePlans
  relation_description: Patients receive care plans during Encounters, detailing specific
    treatments and diagnoses for medical conditions.
  table_name: stg_careplans
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterConditions
  relation_description: Patients have Encounters during which their conditions and
    employment status are recorded and tracked over time.
  table_name: stg_conditions
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterDevices
  relation_description: Patients receive medical devices during Encounters, tracking
    device usage periods and specifications.
  table_name: stg_devices
  entities:
  - Encounters
  - Patients
- relation_name: PatientImagingEncounters
  relation_description: Patients undergo imaging studies during Encounters, which
    are specific healthcare visits or interactions.
  table_name: stg_imaging_studies
  entities:
  - Encounters
  - Patients
- relation_name: PatientImmunizationEncounters
  relation_description: Patients receive immunizations during Encounters, with each
    immunization event recorded as a separate row.
  table_name: stg_immunizations
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterMedicationCoverage
  relation_description: Patients receive medication prescriptions during Encounters,
    which are covered by Payers who manage the financial aspects of the medications.
  table_name: stg_medications
  entities:
  - Encounters
  - Patients
  - Payers
- relation_name: PatientEncounterObservations
  relation_description: Patients undergo medical Encounters during which various health
    observations are recorded and measured.
  table_name: stg_observations
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterProcedures
  relation_description: Patients undergo medical procedures during Encounters, which
    are documented with details such as procedure codes, descriptions, and costs.
  table_name: stg_procedures
  entities:
  - Encounters
  - Patients
- relation_name: PatientEncounterSupplies
  relation_description: Patients receive medical supplies during Encounters, with
    each supply order linked to a specific Encounter.
  table_name: stg_supplies
  entities:
  - Encounters
  - Patients
- relation_name: PatientProviderClaimTransactions
  relation_description: Patients receive medical services from Providers, generating
    Claims that track charges, payments, and adjustments for these services.
  table_name: stg_claims_transactions
  entities:
  - Patients
  - Providers
  - Claims
- relation_name: PatientInsuranceCoverage
  relation_description: This tracks insurance coverage periods for Patients, showing
    their primary and secondary Payers over time.
  table_name: stg_payer_transitions
  entities:
  - Patients
  - Payers
story:
- name: PatientInsuranceCoverage
  description: Patient obtains insurance from primary and secondary payers.
  type: relation
- name: ProviderOrganizationAffiliation
  description: Healthcare providers join organizations to offer medical services.
  type: relation
- name: PatientEncounterDetails
  description: Patient visits provider at healthcare organization for treatment.
  type: relation
- name: PatientAllergyEncounters
  description: Provider records patient's allergies during the visit.
  type: relation
- name: PatientEncounterObservations
  description: Provider conducts and records health observations and measurements.
  type: relation
- name: PatientEncounterConditions
  description: Provider diagnoses and documents patient's medical conditions.
  type: relation
- name: PatientEncounterCarePlans
  description: Provider creates tailored care plan for patient's conditions.
  type: relation
- name: PatientImagingEncounters
  description: Patient undergoes necessary imaging studies during visit.
  type: relation
- name: PatientEncounterProcedures
  description: Provider performs and documents medical procedures for patient.
  type: relation
- name: PatientEncounterDevices
  description: Provider prescribes and records medical devices for patient.
  type: relation
- name: PatientEncounterSupplies
  description: Provider orders and documents medical supplies for patient.
  type: relation
- name: PatientEncounterMedicationCoverage
  description: Provider prescribes medications covered by patient's insurance.
  type: relation
- name: PatientImmunizationEncounters
  description: Patient receives and provider records necessary immunizations.
  type: relation
- name: PatientProviderClaimTransactions
  description: Provider submits claim for services rendered to patient.
  type: relation
- name: PatientProviderClaimPayment
  description: Insurance processes and pays claim for patient's treatment.
  type: relation