allergies
Table Preview(first 5 rows)
START | STOP | PATIENT | ENCOUNTER | CODE | SYSTEM | DESCRIPTION | TYPE | CATEGORY | REACTION1 | DESCRIPTION1 | SEVERITY1 | REACTION2 | DESCRIPTION2 | SEVERITY2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1974-09-22 | NaN | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | 288328004 | SNOMED-CT | Bee venom (substance) | allergy | environment | NaN | None | None | NaN | None | None |
1 | 1974-09-22 | NaN | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | 84489001 | SNOMED-CT | Mold (organism) | allergy | environment | NaN | None | None | NaN | None | None |
2 | 1974-09-22 | NaN | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | 260147004 | SNOMED-CT | House dust mite (organism) | allergy | environment | NaN | None | None | NaN | None | None |
3 | 1974-09-22 | NaN | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | 264287008 | SNOMED-CT | Animal dander (substance) | allergy | environment | 878820003.0 | Rhinoconjunctivitis (disorder) | MODERATE | 247472004.0 | Wheal (finding) | MILD |
4 | 1974-09-22 | NaN | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | 256277009 | SNOMED-CT | Grass pollen (substance) | allergy | environment | NaN | None | None | NaN | None | None |
careplans
Table Preview(first 5 rows)
Id | START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | REASONCODE | REASONDESCRIPTION | |
---|---|---|---|---|---|---|---|---|---|
0 | e5cc23db-c84f-3ded-411d-deb020dfcb4c | 2016-04-11 | 2016-07-23 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 385691007 | Fracture care | 58150001.0 | Fracture of clavicle |
1 | d70b9f2f-745e-f594-4983-b48790e4291f | 1988-04-27 | 1988-05-12 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 3c131d0c-6d15-7fe1-1372-a3d8316b6d82 | 225358003 | Wound care | 284551006.0 | Laceration of foot |
2 | 54b5c4bb-1df6-0892-7097-416ecea0f44f | 1989-09-03 | 1990-07-10 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | d6df3fd6-e4a0-9b51-ebd5-c6394be154eb | 53950000 | Respiratory therapy | NaN | None |
3 | 254cfb18-3266-be35-b8c4-8a6a89ce0547 | 2014-01-20 | 2014-02-03 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 17d8fd78-be39-d2f7-6032-74d712972fa3 | 225358003 | Wound care | 370247008.0 | Facial laceration |
4 | 7e07f691-b437-c62b-4379-0a8e55812747 | 2017-06-08 | 2017-09-02 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 5fbc8a13-6944-a854-c4cb-bbd04e969ee1 | 47387005 | Head injury rehabilitation | 62106007.0 | Concussion with no loss of consciousness |
claims
Table Preview(first 5 rows)
Id | PATIENTID | PROVIDERID | PRIMARYPATIENTINSURANCEID | SECONDARYPATIENTINSURANCEID | DEPARTMENTID | PATIENTDEPARTMENTID | DIAGNOSIS1 | DIAGNOSIS2 | DIAGNOSIS3 | DIAGNOSIS4 | DIAGNOSIS5 | DIAGNOSIS6 | DIAGNOSIS7 | DIAGNOSIS8 | REFERRINGPROVIDERID | APPOINTMENTID | CURRENTILLNESSDATE | SERVICEDATE | SUPERVISINGPROVIDERID | STATUS1 | STATUS2 | STATUSP | OUTSTANDING1 | OUTSTANDING2 | OUTSTANDINGP | LASTBILLEDDATE1 | LASTBILLEDDATE2 | LASTBILLEDDATEP | HEALTHCARECLAIMTYPEID1 | HEALTHCARECLAIMTYPEID2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | 2ffa361e-5858-877e-e022-ce81fe32da1b | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 0 | 0 | 20 | 20 | 5251000175109 | 160903007.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 1962-07-25T12:00:39Z | 1962-07-25T12:00:39Z | 5ceef3de-8e50-3c8f-8224-d277539f4100 | CLOSED | CLOSED | CLOSED | 0.0 | 0.0 | 0.0 | 1962-07-25T12:15:39Z | 1962-07-25T12:15:39Z | 1962-07-25T12:15:39Z | 1 | 1 |
1 | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 0 | 0 | 20 | 20 | 160903007 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | 1963-07-31T12:00:39Z | 1963-07-31T12:00:39Z | 5ceef3de-8e50-3c8f-8224-d277539f4100 | CLOSED | CLOSED | CLOSED | 0.0 | 0.0 | 0.0 | 1963-07-31T12:15:39Z | 1963-07-31T12:15:39Z | 1963-07-31T12:15:39Z | 1 | 1 |
2 | 7ac8ce29-5517-d3d5-5004-f11942ae3225 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 0 | 0 | 20 | 20 | 160903007 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 1966-08-03T12:00:39Z | 1966-08-03T12:00:39Z | 5ceef3de-8e50-3c8f-8224-d277539f4100 | CLOSED | CLOSED | CLOSED | 0.0 | 0.0 | 0.0 | 1966-08-03T12:15:39Z | 1966-08-03T12:15:39Z | 1966-08-03T12:15:39Z | 1 | 1 |
3 | 92a599e7-23f3-c14d-343e-635eded378a6 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 0 | 0 | 20 | 20 | 160903007 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0f6ff0c6-c365-84f5-6742-226325f0ab22 | 1972-08-09T12:00:39Z | 1972-08-09T12:00:39Z | 5ceef3de-8e50-3c8f-8224-d277539f4100 | CLOSED | CLOSED | CLOSED | 0.0 | 0.0 | 0.0 | 1972-08-09T12:15:39Z | 1972-08-09T12:15:39Z | 1972-08-09T12:15:39Z | 1 | 1 |
4 | c409c86b-ece6-a55c-79a9-c066fe56433a | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | 0 | 10 | 10 | 410620009 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13T01:05:00Z | 2013-01-13T01:05:00Z | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | CLOSED | CLOSED | CLOSED | 0.0 | 0.0 | 0.0 | 2013-01-13T01:20:00Z | 2013-01-13T01:20:00Z | 2013-01-13T01:20:00Z | 1 | 1 |
claims_transactions
Table Preview(first 5 rows)
ID | CLAIMID | CHARGEID | PATIENTID | TYPE | AMOUNT | METHOD | FROMDATE | TODATE | PLACEOFSERVICE | PROCEDURECODE | MODIFIER1 | MODIFIER2 | DIAGNOSISREF1 | DIAGNOSISREF2 | DIAGNOSISREF3 | DIAGNOSISREF4 | UNITS | DEPARTMENTID | NOTES | UNITAMOUNT | TRANSFEROUTID | TRANSFERTYPE | PAYMENTS | ADJUSTMENTS | TRANSFERS | OUTSTANDING | APPOINTMENTID | LINENOTE | PATIENTINSURANCEID | FEESCHEDULEID | PROVIDERID | SUPERVISINGPROVIDERID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | da18b712-59be-0489-1ba0-4830aef61968 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | 0 | 2ffa361e-5858-877e-e022-ce81fe32da1b | CHARGE | 136.8 | None | 1962-07-25T12:00:39Z | 1962-07-25T12:15:39Z | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 162673000 | NaN | NaN | 1 | 2.0 | NaN | NaN | 1 | 20 | General examination of patient (procedure) | 136.8 | NaN | p | 0.0 | 0.0 | NaN | 0.0 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | NaN | None | 1 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 5ceef3de-8e50-3c8f-8224-d277539f4100 |
1 | 907b5d5f-4d34-385d-8fd9-564e782a8743 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | 1 | 2ffa361e-5858-877e-e022-ce81fe32da1b | PAYMENT | NaN | CHECK | 1962-07-25T12:00:39Z | 1962-07-25T12:15:39Z | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 162673000 | NaN | NaN | 1 | 2.0 | NaN | NaN | 1 | 20 | General examination of patient (procedure) | 0.0 | NaN | None | 136.8 | 0.0 | NaN | 0.0 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | NaN | None | 1 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 5ceef3de-8e50-3c8f-8224-d277539f4100 |
2 | bb0601d7-9d50-d1cc-488d-7e29cd95b49c | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | 3 | 2ffa361e-5858-877e-e022-ce81fe32da1b | CHARGE | 136.8 | None | 1963-07-31T12:00:39Z | 1963-07-31T12:15:39Z | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 162673000 | NaN | NaN | 1 | NaN | NaN | NaN | 1 | 20 | General examination of patient (procedure) | 136.8 | NaN | p | 0.0 | 0.0 | NaN | 0.0 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | NaN | None | 1 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 5ceef3de-8e50-3c8f-8224-d277539f4100 |
3 | 555840f3-d0fc-4bb2-52fc-477e525e9ff1 | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | 4 | 2ffa361e-5858-877e-e022-ce81fe32da1b | PAYMENT | NaN | CASH | 1963-07-31T12:00:39Z | 1963-07-31T12:15:39Z | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 162673000 | NaN | NaN | 1 | NaN | NaN | NaN | 1 | 20 | General examination of patient (procedure) | 0.0 | NaN | None | 136.8 | 0.0 | NaN | 0.0 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | NaN | None | 1 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 5ceef3de-8e50-3c8f-8224-d277539f4100 |
4 | 7e46adfb-5b4f-4454-037a-2d834eefc4e8 | 7ac8ce29-5517-d3d5-5004-f11942ae3225 | 6 | 2ffa361e-5858-877e-e022-ce81fe32da1b | CHARGE | 136.8 | None | 1966-08-03T12:00:39Z | 1966-08-03T12:15:39Z | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 162673000 | NaN | NaN | 1 | NaN | NaN | NaN | 1 | 20 | General examination of patient (procedure) | 136.8 | NaN | p | 0.0 | 0.0 | NaN | 0.0 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | NaN | None | 1 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | 5ceef3de-8e50-3c8f-8224-d277539f4100 |
conditions
Table Preview(first 5 rows)
START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | |
---|---|---|---|---|---|---|
0 | 1962-07-25 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 5251000175109 | Received certificate of high school equivalency (finding) |
1 | 1962-07-25 | 1963-07-31 | 2ffa361e-5858-877e-e022-ce81fe32da1b | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 160903007 | Full-time employment (finding) |
2 | 1963-07-31 | 1966-08-03 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | 160903007 | Full-time employment (finding) |
3 | 1966-08-03 | 1969-08-06 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 160903007 | Full-time employment (finding) |
4 | 1972-08-09 | 1975-08-13 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0f6ff0c6-c365-84f5-6742-226325f0ab22 | 160903007 | Full-time employment (finding) |
devices
Table Preview(first 5 rows)
START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | UDI | |
---|---|---|---|---|---|---|---|
0 | 2004-12-16T19:00:39Z | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 7cde5dee-2065-db54-2aca-5335fca85bd5 | 170615005 | Home nebulizer (physical object) | (01)22513743438051(11)041125(17)291210(10)68326934928628(21)359700838 |
1 | 2022-09-25T02:44:04Z | None | eb247227-e839-88d3-447d-b5972468f33b | 1cedec53-a362-320c-d52a-ddbc57fbce63 | 228869008 | Manual wheelchair (physical object) | (01)10803398532575(11)220904(17)470919(10)77584230(21)506502350562 |
2 | 2003-01-26T18:35:55Z | None | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | becb96b4-e1aa-1b80-0aa9-8b06e258871f | 170615005 | Home nebulizer (physical object) | (01)91908954162042(11)030105(17)280120(10)384840111239630881(21)2214828292399 |
3 | 2017-04-19T15:37:58Z | 2017-05-15T15:37:58Z | 7ec76836-c039-d9bf-8bb9-fe488c66d452 | 4e5eeece-1a14-8ae0-874f-15d9c6f42a34 | 228869008 | Manual wheelchair (physical object) | (01)75341347423725(11)170329(17)420413(10)91627316426(21)736039710205635138 |
4 | 2014-10-18T16:39:19Z | 2015-01-09T17:11:06Z | 734e5f3c-e660-6cbe-7c26-c5264cbde68e | a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 | 228869008 | Manual wheelchair (physical object) | (01)44815148141981(11)140927(17)391012(10)395970756765435894(21)565882491188675 |
encounters
Table Preview(first 5 rows)
Id | START | STOP | PATIENT | ORGANIZATION | PROVIDER | PAYER | ENCOUNTERCLASS | CODE | DESCRIPTION | BASE_ENCOUNTER_COST | TOTAL_CLAIM_COST | PAYER_COVERAGE | REASONCODE | REASONDESCRIPTION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13T01:05:00Z | 2013-01-13T01:20:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 0e061004-bcc0-3e99-a237-d700823e817b | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | wellness | 410620009 | Well child visit (procedure) | 136.8 | 272.80 | 272.8 | NaN | None |
1 | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | 2021-09-23T02:44:04Z | 2021-09-23T02:59:04Z | eb247227-e839-88d3-447d-b5972468f33b | ea153d0a-9ccc-3769-9a25-934b417071b7 | a9a4676a-e232-3eef-a874-e7af0fa54350 | 734afbd6-4794-363b-9bc0-6a3981533ed5 | wellness | 410620009 | Well child visit (procedure) | 136.8 | 871.12 | 0.0 | NaN | None |
2 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 1962-07-25T12:00:39Z | 1962-07-25T12:15:39Z | 2ffa361e-5858-877e-e022-ce81fe32da1b | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | wellness | 162673000 | General examination of patient (procedure) | 136.8 | 704.20 | 0.0 | NaN | None |
3 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | 1963-07-31T12:00:39Z | 1963-07-31T12:15:39Z | 2ffa361e-5858-877e-e022-ce81fe32da1b | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | wellness | 162673000 | General examination of patient (procedure) | 136.8 | 704.20 | 0.0 | NaN | None |
4 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 1966-08-03T12:00:39Z | 1966-08-03T12:15:39Z | 2ffa361e-5858-877e-e022-ce81fe32da1b | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | wellness | 162673000 | General examination of patient (procedure) | 136.8 | 778.78 | 0.0 | NaN | None |
imaging_studies
Table Preview(first 5 rows)
Id | DATE_ | PATIENT | ENCOUNTER | SERIES_UID | BODYSITE_CODE | BODYSITE_DESCRIPTION | MODALITY_CODE | MODALITY_DESCRIPTION | INSTANCE_UID | SOP_CODE | SOP_DESCRIPTION | PROCEDURE_CODE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | b7b3078a-86e4-fc51-1720-d33afcd4ce88 | 2016-04-11T03:12:57Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 1.2.840.99999999.1.63419662.1460344377161 | 51299004 | Clavicle | DX | Digital Radiography | 1.2.840.99999999.1.1.35771508.1103224827588 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 168594001 |
1 | 6f1b4250-2e8e-eaaf-f6a7-655d8f8e5791 | 2022-09-25T02:44:04Z | eb247227-e839-88d3-447d-b5972468f33b | 1cedec53-a362-320c-d52a-ddbc57fbce63 | 1.2.840.99999999.1.83549147.1664073844511 | 344001 | Ankle | DX | Digital Radiography | 1.2.840.99999999.1.1.49048804.1413650359123 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 19490002 |
2 | 932a8e0d-20d5-573b-db47-8b9db9d3b99c | 1991-05-28T22:13:31Z | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 59f48dfd-f1f6-faa7-f378-150137d46be4 | 1.2.840.99999999.1.10943528.675468811165 | 40983000 | Arm | DX | Digital Radiography | 1.2.840.99999999.1.1.93692690.675468811165 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 1225002 |
3 | c16fda8c-2353-75d9-f7c2-d91c773d87de | 2014-10-18T16:39:19Z | 734e5f3c-e660-6cbe-7c26-c5264cbde68e | a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 | 1.2.840.99999999.1.70861523.1413650359123 | 344001 | Ankle | DX | Digital Radiography | 1.2.840.99999999.1.1.49048804.1413650359123 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 19490002 |
4 | ef1ee1a1-e906-6744-421a-ab8480b8b8ae | 2013-11-23T00:28:23Z | 750cdaf4-c264-e967-e76b-53a5a61abcab | 0148026b-b4db-1663-19ed-a6a51d3c33b2 | 1.2.840.99999999.1.26012892.1385166503634 | 51299004 | Clavicle | DX | Digital Radiography | 1.2.840.99999999.1.1.35771508.1103224827588 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 168594001 |
immunizations
Table Preview(first 5 rows)
DATE_ | PATIENT | ENCOUNTER | CODE | DESCRIPTION | BASE_COST | |
---|---|---|---|---|---|---|
0 | 2013-01-13T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 140 | Influenza seasonal injectable preservative free | 136.0 |
1 | 1986-06-17T21:35:49Z | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 07969a4d-f9f0-c63f-6a0e-ecb659a20571 | 140 | Influenza seasonal injectable preservative free | 136.0 |
2 | 2014-01-19T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 21 | varicella | 136.0 |
3 | 2014-01-19T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 10 | IPV | 136.0 |
4 | 2014-01-19T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 140 | Influenza seasonal injectable preservative free | 136.0 |
medications
Table Preview(first 5 rows)
START | STOP | PATIENT | PAYER | ENCOUNTER | CODE | DESCRIPTION | BASE_COST | PAYER_COVERAGE | DISPENSES | TOTALCOST | REASONCODE | REASONDESCRIPTION | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2009-09-02T12:00:39Z | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | 546912c2-aa40-a2e0-f148-af23fa171af0 | 309362 | Clopidogrel 75 MG Oral Tablet | 631.42 | 0.00 | 162 | 102290.04 | NaN | None |
1 | 2009-09-02T12:00:39Z | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | 546912c2-aa40-a2e0-f148-af23fa171af0 | 312961 | Simvastatin 20 MG Oral Tablet | 323.20 | 29.40 | 162 | 52358.40 | NaN | None |
2 | 2009-09-02T12:00:39Z | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | 546912c2-aa40-a2e0-f148-af23fa171af0 | 866412 | 24 HR metoprolol succinate 100 MG Extended Release Oral Tablet | 129.94 | 99.94 | 162 | 21050.28 | NaN | None |
3 | 2009-09-02T12:00:39Z | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | 546912c2-aa40-a2e0-f148-af23fa171af0 | 705129 | Nitroglycerin 0.4 MG/ACTUAT Mucosal Spray | 702.61 | 672.61 | 162 | 113822.82 | NaN | None |
4 | 2016-04-11T04:55:57Z | 2016-07-23T04:55:57Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 198405 | Ibuprofen 100 MG Oral Tablet | 332.42 | 0.00 | 3 | 997.26 | NaN | None |
observations
Table Preview(first 5 rows)
DATE_ | PATIENT | ENCOUNTER | CATEGORY | CODE | DESCRIPTION | VALUE_ | UNITS | TYPE | |
---|---|---|---|---|---|---|---|---|---|
0 | 2021-09-23T02:44:04Z | eb247227-e839-88d3-447d-b5972468f33b | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | vital-signs | 8302-2 | Body Height | 54.4 | cm | numeric |
1 | 2021-09-23T02:44:04Z | eb247227-e839-88d3-447d-b5972468f33b | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | vital-signs | 72514-3 | Pain severity - 0-10 verbal numeric rating [Score] - Reported | 3.0 | {score} | numeric |
2 | 2021-09-23T02:44:04Z | eb247227-e839-88d3-447d-b5972468f33b | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | vital-signs | 29463-7 | Body Weight | 4.8 | kg | numeric |
3 | 2013-01-13T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | vital-signs | 8302-2 | Body Height | 92.3 | cm | numeric |
4 | 2013-01-13T01:05:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | vital-signs | 72514-3 | Pain severity - 0-10 verbal numeric rating [Score] - Reported | 3.0 | {score} | numeric |
organizations
Table Preview(first 5 rows)
Id | NAME | ADDRESS | CITY | STATE | ZIP | LAT | LON | PHONE | REVENUE | UTILIZATION | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 74ab949d-17ac-3309-83a0-13b4405c66aa | Fitchburg Outpatient Clinic | 881 Main Street | Fitchburg | MA | 1420 | 42.586487 | -71.805210 | 978-342-9781 Or 978-342-9781 | 0.0 | 131 |
1 | 588f6ce6-b8db-3588-8189-29db2680a313 | BOSTON HEALTH CARE FOR THE HOMELESS PROGRAM INC | 461 WALNUT AVE | JAMAICA PLAIN | MA | 21302331 | 42.311588 | -71.098001 | 8576541550 | 0.0 | 30 |
2 | e09d4c49-c2ef-3b0f-9a46-3719d9219306 | UMASS MEMORIAL HEALTHALLIANCE CLINTON HOSPITAL INC | 60 HOSPITAL RD | LEOMINSTER | MA | 14533290 | 42.540319 | -71.763130 | 9784662000 | 0.0 | 8 |
3 | aa682136-a4df-3942-9f50-e6fcee6c0b73 | SOUTHWOOD AT NORWELL NURSING CTR | 501 CORDWAINER DRIVE | NORWELL | MA | 20611630 | 42.164707 | -70.881491 | 7819827450 | 0.0 | 12 |
4 | ab701a70-a658-340e-8f69-ee196a7d40c6 | CAREWELL URGENT CARE CENTERS OF MA PC | 484 RTE 134 | WORCESTER | MA | 16071728 | 42.262562 | -71.801888 | 5086947901 | 0.0 | 3 |
patients
Table Preview(first 5 rows)
Id | BIRTHDATE | DEATHDATE | SSN | DRIVERS | PASSPORT | PREFIX | FIRST_ | LAST_ | SUFFIX | MAIDEN | MARITAL | RACE | ETHNICITY | GENDER | BIRTHPLACE | ADDRESS | CITY | STATE | COUNTY | FIPS | ZIP | LAT | LON | HEALTHCARE_EXPENSES | HEALTHCARE_COVERAGE | INCOME | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | eb247227-e839-88d3-447d-b5972468f33b | 2021-09-23 | None | 999-41-1756 | None | None | None | Mel236 | Bailey598 | None | None | None | white | nonhispanic | M | Norton Center Massachusetts US | 716 Wunsch Gardens Unit 48 | Framingham | Massachusetts | Middlesex County | 25017.0 | 1701 | 42.275650 | -71.476367 | 2520.80 | 4323.64 | 170754 |
1 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1944-05-31 | None | 999-33-4589 | S99957814 | X45639058X | Mrs. | Cheyenne169 | Marks830 | None | Lowe577 | M | white | nonhispanic | F | Longmeadow Massachusetts US | 123 Bayer Camp | Taunton | Massachusetts | Bristol County | 25005.0 | 2718 | 41.892884 | -71.066686 | 205342.20 | 94647.00 | 40526 |
2 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 2009-02-08 | None | 999-59-2568 | None | None | None | Hunter736 | Keebler762 | None | None | None | white | nonhispanic | M | Maynard Massachusetts US | 575 Jast Rue Unit 48 | Winchendon | Massachusetts | Worcester County | 25027.0 | 1475 | 42.670059 | -72.074664 | 16381.92 | 17447.87 | 79884 |
3 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 1979-06-26 | 1995-07-04 | 999-77-7700 | S99968506 | None | None | Herschel574 | Ernser583 | None | None | None | asian | nonhispanic | M | Somerville Massachusetts US | 184 Langworth Parade Apt 10 | Boston | Massachusetts | Suffolk County | 25025.0 | 2131 | 42.395516 | -71.059015 | 3850.00 | 44057.32 | 6420 |
4 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 1973-05-31 | None | 999-29-2359 | S99967405 | X86891718X | Mrs. | Lacey714 | Heathcote539 | None | Hegmann834 | M | white | nonhispanic | F | Natick Massachusetts US | 801 Morissette Divide | Hingham | Massachusetts | Plymouth County | 25023.0 | 2043 | 42.200723 | -70.836590 | 66662.10 | 1777031.06 | 933420 |
payers
Table Preview(first 5 rows)
Id | NAME | OWNERSHIP | ADDRESS | CITY | STATE_HEADQUARTERED | ZIP | PHONE | AMOUNT_COVERED | AMOUNT_UNCOVERED | REVENUE | COVERED_ENCOUNTERS | UNCOVERED_ENCOUNTERS | COVERED_MEDICATIONS | UNCOVERED_MEDICATIONS | COVERED_PROCEDURES | UNCOVERED_PROCEDURES | COVERED_IMMUNIZATIONS | UNCOVERED_IMMUNIZATIONS | UNIQUE_CUSTOMERS | QOLS_AVG | MEMBER_MONTHS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | a735bf55-83e9-331a-899d-a82a60b9f60c | Medicare | GOVERNMENT | NaN | NaN | NaN | NaN | NaN | 3245258.85 | 57324.03 | 460290.60 | 927 | 0 | 918 | 0 | 1270 | 0 | 202 | 0 | 16 | 0.753073 | 2748 |
1 | df166300-5a78-3502-a46a-832842197811 | Medicaid | GOVERNMENT | NaN | NaN | NaN | NaN | NaN | 6112421.45 | 103336.16 | 45600.00 | 1217 | 0 | 286 | 0 | 1908 | 0 | 534 | 0 | 19 | 0.953093 | 4524 |
2 | d31fccc3-1767-390d-966a-22a5156f4219 | UnitedHealthcare | PRIVATE | NaN | NaN | NaN | NaN | NaN | 37929.59 | 28070.66 | 218949.27 | 49 | 0 | 10 | 0 | 36 | 0 | 41 | 0 | 3 | 0.800393 | 384 |
3 | 8fa6c185-e44e-3e34-8bd8-39be8694f4ce | Cigna Health | PRIVATE | NaN | NaN | NaN | NaN | NaN | 2110812.70 | 318590.30 | 2102717.53 | 727 | 0 | 236 | 0 | 765 | 0 | 412 | 0 | 13 | 0.939912 | 3660 |
4 | d18ef2e6-ef40-324c-be54-34a5ee865625 | Dual Eligible | GOVERNMENT | NaN | NaN | NaN | NaN | NaN | 931603.10 | 8658.65 | 2990.00 | 201 | 0 | 84 | 0 | 231 | 0 | 48 | 0 | 5 | 0.878956 | 300 |
payer_transitions
Table Preview(first 5 rows)
PATIENT | MEMBERID | START_DATE | END_DATE | PAYER | SECONDARY_PAYER | PLAN_OWNERSHIP | OWNER_NAME | |
---|---|---|---|---|---|---|---|---|
0 | eb247227-e839-88d3-447d-b5972468f33b | 74d0f7af-d6c6-7848-9f62-c3f165170e09 | 2021-09-23T02:44:04Z | 2022-09-29T02:44:04Z | 734afbd6-4794-363b-9bc0-6a3981533ed5 | None | Guardian | Solomon675 Bailey598 |
1 | eb247227-e839-88d3-447d-b5972468f33b | 74d0f7af-d6c6-7848-9f62-c3f165170e09 | 2022-09-29T02:44:04Z | 2023-09-29T02:44:04Z | 734afbd6-4794-363b-9bc0-6a3981533ed5 | None | Guardian | Solomon675 Bailey598 |
2 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 2009-02-08T01:05:00Z | 2010-02-14T01:05:00Z | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | None | Guardian | Hong136 Keebler762 |
3 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 2010-02-14T01:05:00Z | 2011-02-20T01:05:00Z | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | None | Guardian | Hong136 Keebler762 |
4 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 2011-02-20T01:05:00Z | 2012-02-26T01:05:00Z | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | None | Guardian | Hong136 Keebler762 |
procedures
Table Preview(first 5 rows)
START | STOP | PATIENT | ENCOUNTER | CODE | DESCRIPTION | BASE_COST | REASONCODE | REASONDESCRIPTION | |
---|---|---|---|---|---|---|---|---|---|
0 | 2014-01-19T01:05:00Z | 2014-01-19T01:20:00Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 430193006 | Medication Reconciliation (procedure) | 493.03 | NaN | None |
1 | 2021-09-23T02:44:04Z | 2021-09-23T02:59:04Z | eb247227-e839-88d3-447d-b5972468f33b | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | 430193006 | Medication Reconciliation (procedure) | 523.74 | NaN | None |
2 | 2021-10-28T02:44:04Z | 2021-10-28T02:59:04Z | eb247227-e839-88d3-447d-b5972468f33b | a79858c9-a74e-8ca3-c9a0-b63897e8fc05 | 430193006 | Medication Reconciliation (procedure) | 536.16 | NaN | None |
3 | 2016-04-11T03:12:57Z | 2016-04-11T03:42:57Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 168594001 | Clavicle X-ray | 431.40 | NaN | None |
4 | 2016-04-11T03:12:57Z | 2016-04-11T04:55:57Z | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 305428000 | Admission to orthopedic department | 431.40 | 58150001.0 | Fracture of clavicle |
providers
Table Preview(first 5 rows)
Id | ORGANIZATION | NAME | GENDER | SPECIALITY | ADDRESS | CITY | STATE | ZIP | LAT | LON | ENCOUNTERS | PROCEDURES | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 6e3f59d7-742a-3b38-a4b1-1b471ed651a2 | 74ab949d-17ac-3309-83a0-13b4405c66aa | Gabriel934 Reilly981 | F | GENERAL PRACTICE | 881 Main Street | Fitchburg | MA | 1420 | 42.586487 | -71.805210 | 131 | 0 |
1 | 63bf819f-a8ee-3bc1-b34a-787fe8e8e4dc | 588f6ce6-b8db-3588-8189-29db2680a313 | Terence292 Brakus656 | M | GENERAL PRACTICE | 461 WALNUT AVE | JAMAICA PLAIN | MA | 21302331 | 42.311588 | -71.098001 | 30 | 0 |
2 | ec207479-d18d-3b2d-9f80-f76ba864aefb | e09d4c49-c2ef-3b0f-9a46-3719d9219306 | Jules135 Emard19 | M | GENERAL PRACTICE | 60 HOSPITAL RD | LEOMINSTER | MA | 14533290 | 42.540319 | -71.763130 | 8 | 0 |
3 | f751fb18-fa7d-346e-8b95-7b8cb0dbcf32 | aa682136-a4df-3942-9f50-e6fcee6c0b73 | Precious140 Runolfsson901 | F | GENERAL PRACTICE | 501 CORDWAINER DRIVE | NORWELL | MA | 20611630 | 42.164707 | -70.881491 | 12 | 0 |
4 | 5afd05ef-bd29-32bc-bb4f-a17144628cbd | ab701a70-a658-340e-8f69-ee196a7d40c6 | Eduardo902 Villegas15 | M | GENERAL PRACTICE | 484 RTE 134 | WORCESTER | MA | 16071728 | 42.262562 | -71.801888 | 3 | 0 |
supplies
Table Preview(first 5 rows)
DATE_ | PATIENT | ENCOUNTER | CODE | DESCRIPTION | QUANTITY | |
---|---|---|---|---|---|---|
0 | 1993-07-26 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 71154994-99cb-1859-da47-be6aefdc4632 | 337388004 | Blood glucose testing strips (physical object) | 50 |
1 | 1999-08-02 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 4e7c8052-dcb0-0b2d-76e9-42638068f34d | 337388004 | Blood glucose testing strips (physical object) | 50 |
2 | 2002-05-20 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | f3905515-1215-b9dc-c4dd-e5113939678a | 337388004 | Blood glucose testing strips (physical object) | 50 |
3 | 2004-05-24 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 53550222-7920-d954-8131-bdc89637746d | 337388004 | Blood glucose testing strips (physical object) | 50 |
4 | 2005-09-12 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 027a6834-445b-9d67-f1a8-437d8c1af741 | 337388004 | Blood glucose testing strips (physical object) | 50 |
stg_allergies
Table Preview (first 5 rows)
coding_system | allergen_description | sensitivity_type | allergen_category | primary_reaction_description | primary_reaction_severity | secondary_reaction_description | secondary_reaction_severity | allergen_code | allergy_end_date | allergy_start_date | encounter_id | patient_id | primary_reaction_code | secondary_reaction_code | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | SNOMED-CT | Bee venom (substance) | allergy | environment | None | None | None | None | 288328004 | NaT | 1974-09-22 | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | None | None |
1 | SNOMED-CT | Mold (organism) | allergy | environment | None | None | None | None | 84489001 | NaT | 1974-09-22 | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | None | None |
2 | SNOMED-CT | House dust mite (organism) | allergy | environment | None | None | None | None | 260147004 | NaT | 1974-09-22 | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | None | None |
3 | SNOMED-CT | Animal dander (substance) | allergy | environment | Rhinoconjunctivitis (disorder) | MODERATE | Wheal (finding) | MILD | 264287008 | NaT | 1974-09-22 | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 878820003.0 | 247472004.0 |
4 | SNOMED-CT | Grass pollen (substance) | allergy | environment | None | None | None | None | 256277009 | NaT | 1974-09-22 | 703ebfb9-fa6b-eb01-664b-f145749cd9f9 | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | None | None |
stg_allergies.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:46:26.795976+00:00
WITH
"allergies_renamed" AS (
-- Rename: Renaming columns
-- START -> allergy_start_date
-- STOP -> allergy_end_date
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> allergen_code
-- SYSTEM -> coding_system
-- DESCRIPTION -> allergen_description
-- TYPE -> sensitivity_type
-- CATEGORY -> allergen_category
-- REACTION1 -> primary_reaction_code
-- DESCRIPTION1 -> primary_reaction_description
-- SEVERITY1 -> primary_reaction_severity
-- REACTION2 -> secondary_reaction_code
-- DESCRIPTION2 -> secondary_reaction_description
-- SEVERITY2 -> secondary_reaction_severity
SELECT
"START" AS "allergy_start_date",
"STOP" AS "allergy_end_date",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "allergen_code",
"SYSTEM" AS "coding_system",
"DESCRIPTION" AS "allergen_description",
"TYPE" AS "sensitivity_type",
"CATEGORY" AS "allergen_category",
"REACTION1" AS "primary_reaction_code",
"DESCRIPTION1" AS "primary_reaction_description",
"SEVERITY1" AS "primary_reaction_severity",
"REACTION2" AS "secondary_reaction_code",
"DESCRIPTION2" AS "secondary_reaction_description",
"SEVERITY2" AS "secondary_reaction_severity"
FROM "memory"."main"."allergies"
),
"allergies_renamed_casted" AS (
-- Column Type Casting:
-- allergen_code: from INT to VARCHAR
-- allergy_end_date: from DECIMAL to DATE
-- allergy_start_date: from VARCHAR to DATE
-- encounter_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- primary_reaction_code: from DECIMAL to VARCHAR
-- secondary_reaction_code: from DECIMAL to VARCHAR
SELECT
"coding_system",
"allergen_description",
"sensitivity_type",
"allergen_category",
"primary_reaction_description",
"primary_reaction_severity",
"secondary_reaction_description",
"secondary_reaction_severity",
CAST("allergen_code" AS VARCHAR)
AS "allergen_code",
CAST("allergy_end_date" AS DATE)
AS "allergy_end_date",
CAST("allergy_start_date" AS DATE)
AS "allergy_start_date",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("primary_reaction_code" AS VARCHAR)
AS "primary_reaction_code",
CAST("secondary_reaction_code" AS VARCHAR)
AS "secondary_reaction_code"
FROM "allergies_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "allergies_renamed_casted"
stg_allergies.yml (Document the table)
version: 2
models:
- name: stg_allergies
description: The table is about allergies. It contains patient and encounter IDs.
Each row represents an allergy with its code, description, type, and category.
Some allergies have associated reactions and their severities. The allergies are
mostly environmental, starting on the same date for this patient. Common allergens
include bee venom, mold, dust mites, animal dander, and grass pollen.
columns:
- name: coding_system
description: Coding system used for the allergen code
tests:
- not_null
- accepted_values:
values:
- SNOMED-CT
- RxNorm
cocoon_meta:
future_accepted_values:
- ICD-10
- LOINC
- NDC
- CPT
- HCPCS
- ATC
- MedDRA
- UMLS
- NDF-RT
- UNII
data_type:
current_data_type: VARCHAR
- name: allergen_description
description: Description of the allergen
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: sensitivity_type
description: Type of sensitivity (e.g., allergy)
tests:
- not_null
- accepted_values:
values:
- allergy
- intolerance
cocoon_meta:
future_accepted_values:
- hypersensitivity
- idiosyncrasy
- autoimmune reaction
- adverse reaction
- toxicity
- chemical sensitivity
- food sensitivity
- drug sensitivity
- environmental sensitivity
data_type:
current_data_type: VARCHAR
- name: allergen_category
description: Category of the allergen (e.g., environment)
tests:
- not_null
- accepted_values:
values:
- environment
- food
- medication
cocoon_meta:
future_accepted_values:
- insect
- latex
- pet
- occupational
data_type:
current_data_type: VARCHAR
- name: primary_reaction_description
description: Description of the first allergic reaction
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: primary_reaction_severity
description: Severity of the first allergic reaction
tests:
- accepted_values:
values:
- MODERATE
- MILD
- SEVERE
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: secondary_reaction_description
description: Description of the second allergic reaction
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: secondary_reaction_severity
description: Severity of the second allergic reaction
tests:
- accepted_values:
values:
- MILD
- MODERATE
cocoon_meta:
missing_reason: Unknown
future_accepted_values:
- SEVERE
- CRITICAL
- NONE
data_type:
current_data_type: VARCHAR
- name: allergen_code
description: Standardized code for the allergen
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: allergy_end_date
description: End date of the allergy record, if applicable
cocoon_meta:
missing_reason: Ongoing allergies don't have an end date.
data_type:
current_data_type: DATE
- name: allergy_start_date
description: Start date of the allergy record
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DATE
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: primary_reaction_code
description: Code for the first allergic reaction
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: secondary_reaction_code
description: Code for the second allergic reaction
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
tests: []
stg_claims
Table Preview (first 5 rows)
secondary_insurance_id | primary_claim_type_id | patient_outstanding_amount | claim_type_id | secondary_outstanding_amount | primary_claim_status | department_id | patient_claim_status | secondary_claim_status | primary_outstanding_amount | patient_department_id | appointment_id | claim_id | illness_onset_date | octonary_diagnosis | patient_id | patient_last_billed_date | primary_diagnosis | primary_insurance_id | primary_last_billed_date | provider_id | quaternary_diagnosis | quinary_diagnosis | referring_provider_id | secondary_diagnosis | secondary_last_billed_date | senary_diagnosis | septenary_diagnosis | service_date | supervising_provider_id | tertiary_diagnosis | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | 1 | 0.0 | 1 | 0.0 | CLOSED | 20 | CLOSED | CLOSED | 0.0 | 20 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | 1962-07-25 12:00:39 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1962-07-25 12:15:39 | 5251000175109 | NaN | 1962-07-25 12:15:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | None | 160903007.0 | 1962-07-25 12:15:39 | None | None | 1962-07-25 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
1 | None | 1 | 0.0 | 1 | 0.0 | CLOSED | 20 | CLOSED | CLOSED | 0.0 | 20 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | 1963-07-31 12:00:39 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1963-07-31 12:15:39 | 160903007 | NaN | 1963-07-31 12:15:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | None | None | 1963-07-31 12:15:39 | None | None | 1963-07-31 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
2 | None | 1 | 0.0 | 1 | 0.0 | CLOSED | 20 | CLOSED | CLOSED | 0.0 | 20 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 7ac8ce29-5517-d3d5-5004-f11942ae3225 | 1966-08-03 12:00:39 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1966-08-03 12:15:39 | 160903007 | NaN | 1966-08-03 12:15:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | None | None | 1966-08-03 12:15:39 | None | None | 1966-08-03 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
3 | None | 1 | 0.0 | 1 | 0.0 | CLOSED | 20 | CLOSED | CLOSED | 0.0 | 20 | 0f6ff0c6-c365-84f5-6742-226325f0ab22 | 92a599e7-23f3-c14d-343e-635eded378a6 | 1972-08-09 12:00:39 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1972-08-09 12:15:39 | 160903007 | NaN | 1972-08-09 12:15:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | None | None | 1972-08-09 12:15:39 | None | None | 1972-08-09 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
4 | None | 1 | 0.0 | 1 | 0.0 | CLOSED | 10 | CLOSED | CLOSED | 0.0 | 10 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | c409c86b-ece6-a55c-79a9-c066fe56433a | 2013-01-13 01:05:00 | None | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 2013-01-13 01:20:00 | 410620009 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | 2013-01-13 01:20:00 | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | None | None | None | None | 2013-01-13 01:20:00 | None | None | 2013-01-13 01:05:00 | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | None |
stg_claims.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:51:17.265171+00:00
WITH
"claims_renamed" AS (
-- Rename: Renaming columns
-- Id -> claim_id
-- PATIENTID -> patient_id
-- PROVIDERID -> provider_id
-- PRIMARYPATIENTINSURANCEID -> primary_insurance_id
-- SECONDARYPATIENTINSURANCEID -> secondary_insurance_id
-- DEPARTMENTID -> department_id
-- PATIENTDEPARTMENTID -> patient_department_id
-- DIAGNOSIS1 -> primary_diagnosis
-- DIAGNOSIS2 -> secondary_diagnosis
-- DIAGNOSIS3 -> tertiary_diagnosis
-- DIAGNOSIS4 -> quaternary_diagnosis
-- DIAGNOSIS5 -> quinary_diagnosis
-- DIAGNOSIS6 -> senary_diagnosis
-- DIAGNOSIS7 -> septenary_diagnosis
-- DIAGNOSIS8 -> octonary_diagnosis
-- REFERRINGPROVIDERID -> referring_provider_id
-- APPOINTMENTID -> appointment_id
-- CURRENTILLNESSDATE -> illness_onset_date
-- SERVICEDATE -> service_date
-- SUPERVISINGPROVIDERID -> supervising_provider_id
-- STATUS1 -> primary_claim_status
-- STATUS2 -> secondary_claim_status
-- STATUSP -> patient_claim_status
-- OUTSTANDING1 -> primary_outstanding_amount
-- OUTSTANDING2 -> secondary_outstanding_amount
-- OUTSTANDINGP -> patient_outstanding_amount
-- LASTBILLEDDATE1 -> primary_last_billed_date
-- LASTBILLEDDATE2 -> secondary_last_billed_date
-- LASTBILLEDDATEP -> patient_last_billed_date
-- HEALTHCARECLAIMTYPEID1 -> primary_claim_type_id
-- HEALTHCARECLAIMTYPEID2 -> claim_type_id
SELECT
"Id" AS "claim_id",
"PATIENTID" AS "patient_id",
"PROVIDERID" AS "provider_id",
"PRIMARYPATIENTINSURANCEID" AS "primary_insurance_id",
"SECONDARYPATIENTINSURANCEID" AS "secondary_insurance_id",
"DEPARTMENTID" AS "department_id",
"PATIENTDEPARTMENTID" AS "patient_department_id",
"DIAGNOSIS1" AS "primary_diagnosis",
"DIAGNOSIS2" AS "secondary_diagnosis",
"DIAGNOSIS3" AS "tertiary_diagnosis",
"DIAGNOSIS4" AS "quaternary_diagnosis",
"DIAGNOSIS5" AS "quinary_diagnosis",
"DIAGNOSIS6" AS "senary_diagnosis",
"DIAGNOSIS7" AS "septenary_diagnosis",
"DIAGNOSIS8" AS "octonary_diagnosis",
"REFERRINGPROVIDERID" AS "referring_provider_id",
"APPOINTMENTID" AS "appointment_id",
"CURRENTILLNESSDATE" AS "illness_onset_date",
"SERVICEDATE" AS "service_date",
"SUPERVISINGPROVIDERID" AS "supervising_provider_id",
"STATUS1" AS "primary_claim_status",
"STATUS2" AS "secondary_claim_status",
"STATUSP" AS "patient_claim_status",
"OUTSTANDING1" AS "primary_outstanding_amount",
"OUTSTANDING2" AS "secondary_outstanding_amount",
"OUTSTANDINGP" AS "patient_outstanding_amount",
"LASTBILLEDDATE1" AS "primary_last_billed_date",
"LASTBILLEDDATE2" AS "secondary_last_billed_date",
"LASTBILLEDDATEP" AS "patient_last_billed_date",
"HEALTHCARECLAIMTYPEID1" AS "primary_claim_type_id",
"HEALTHCARECLAIMTYPEID2" AS "claim_type_id"
FROM "memory"."main"."claims"
),
"claims_renamed_cleaned" AS (
-- Clean unusual string values:
-- primary_insurance_id: The problem is that the '0' value does not match the UUID format of the other values in the primary_insurance_id column. All other values are valid UUIDs in the format of 8-4-4-4-12 hexadecimal characters. The '0' value is likely a placeholder or default value used when no valid UUID was available. The correct values should all be valid UUIDs or an empty string if no valid UUID is available.
SELECT
"claim_id",
"patient_id",
"provider_id",
CASE
WHEN "primary_insurance_id" = '0' THEN NULL
ELSE "primary_insurance_id"
END AS "primary_insurance_id",
"secondary_insurance_id",
"department_id",
"patient_department_id",
"primary_diagnosis",
"secondary_diagnosis",
"tertiary_diagnosis",
"quaternary_diagnosis",
"quinary_diagnosis",
"senary_diagnosis",
"septenary_diagnosis",
"octonary_diagnosis",
"referring_provider_id",
"appointment_id",
"illness_onset_date",
"service_date",
"supervising_provider_id",
"primary_claim_status",
"secondary_claim_status",
"patient_claim_status",
"primary_outstanding_amount",
"secondary_outstanding_amount",
"patient_outstanding_amount",
"primary_last_billed_date",
"secondary_last_billed_date",
"patient_last_billed_date",
"primary_claim_type_id",
"claim_type_id"
FROM "claims_renamed"
),
"claims_renamed_cleaned_null" AS (
-- NULL Imputation: Impute Null to Disguised Missing Values
-- secondary_insurance_id: ['0']
SELECT
CASE
WHEN "secondary_insurance_id" = '0' THEN NULL
ELSE "secondary_insurance_id"
END AS "secondary_insurance_id",
"patient_last_billed_date",
"claim_id",
"primary_claim_type_id",
"patient_outstanding_amount",
"primary_diagnosis",
"provider_id",
"claim_type_id",
"secondary_outstanding_amount",
"illness_onset_date",
"appointment_id",
"secondary_last_billed_date",
"senary_diagnosis",
"service_date",
"quaternary_diagnosis",
"patient_id",
"primary_claim_status",
"primary_insurance_id",
"department_id",
"octonary_diagnosis",
"septenary_diagnosis",
"patient_claim_status",
"secondary_claim_status",
"primary_last_billed_date",
"primary_outstanding_amount",
"patient_department_id",
"referring_provider_id",
"supervising_provider_id",
"secondary_diagnosis",
"quinary_diagnosis",
"tertiary_diagnosis"
FROM "claims_renamed_cleaned"
),
"claims_renamed_cleaned_null_casted" AS (
-- Column Type Casting:
-- appointment_id: from VARCHAR to UUID
-- claim_id: from VARCHAR to UUID
-- illness_onset_date: from VARCHAR to TIMESTAMP
-- octonary_diagnosis: from DECIMAL to VARCHAR
-- patient_id: from VARCHAR to UUID
-- patient_last_billed_date: from VARCHAR to TIMESTAMP
-- primary_diagnosis: from INT to VARCHAR
-- primary_insurance_id: from VARCHAR to UUID
-- primary_last_billed_date: from VARCHAR to TIMESTAMP
-- provider_id: from VARCHAR to UUID
-- quaternary_diagnosis: from DECIMAL to VARCHAR
-- quinary_diagnosis: from DECIMAL to VARCHAR
-- referring_provider_id: from DECIMAL to VARCHAR
-- secondary_diagnosis: from DECIMAL to VARCHAR
-- secondary_last_billed_date: from VARCHAR to TIMESTAMP
-- senary_diagnosis: from DECIMAL to VARCHAR
-- septenary_diagnosis: from DECIMAL to VARCHAR
-- service_date: from VARCHAR to TIMESTAMP
-- supervising_provider_id: from VARCHAR to UUID
-- tertiary_diagnosis: from DECIMAL to VARCHAR
SELECT
"secondary_insurance_id",
"primary_claim_type_id",
"patient_outstanding_amount",
"claim_type_id",
"secondary_outstanding_amount",
"primary_claim_status",
"department_id",
"patient_claim_status",
"secondary_claim_status",
"primary_outstanding_amount",
"patient_department_id",
CAST("appointment_id" AS UUID)
AS "appointment_id",
CAST("claim_id" AS UUID)
AS "claim_id",
CAST("illness_onset_date" AS TIMESTAMP)
AS "illness_onset_date",
CAST("octonary_diagnosis" AS VARCHAR)
AS "octonary_diagnosis",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("patient_last_billed_date" AS TIMESTAMP)
AS "patient_last_billed_date",
CAST("primary_diagnosis" AS VARCHAR)
AS "primary_diagnosis",
CAST("primary_insurance_id" AS UUID)
AS "primary_insurance_id",
CAST("primary_last_billed_date" AS TIMESTAMP)
AS "primary_last_billed_date",
CAST("provider_id" AS UUID)
AS "provider_id",
CAST("quaternary_diagnosis" AS VARCHAR)
AS "quaternary_diagnosis",
CAST("quinary_diagnosis" AS VARCHAR)
AS "quinary_diagnosis",
CAST("referring_provider_id" AS VARCHAR)
AS "referring_provider_id",
CAST("secondary_diagnosis" AS VARCHAR)
AS "secondary_diagnosis",
CAST("secondary_last_billed_date" AS TIMESTAMP)
AS "secondary_last_billed_date",
CAST("senary_diagnosis" AS VARCHAR)
AS "senary_diagnosis",
CAST("septenary_diagnosis" AS VARCHAR)
AS "septenary_diagnosis",
CAST("service_date" AS TIMESTAMP)
AS "service_date",
CAST("supervising_provider_id" AS UUID)
AS "supervising_provider_id",
CAST("tertiary_diagnosis" AS VARCHAR)
AS "tertiary_diagnosis"
FROM "claims_renamed_cleaned_null"
)
-- COCOON BLOCK END
SELECT *
FROM "claims_renamed_cleaned_null_casted"
stg_claims.yml (Document the table)
version: 2
models:
- name: stg_claims
description: The table is about insurance claims. It contains details of each claim,
including patient ID, provider ID, diagnosis codes, appointment details, and claim
status. The table tracks claim processing information such as outstanding amounts,
last billed dates, and healthcare claim type IDs. Each row represents a unique
claim with its associated patient, provider, and billing information.
columns:
- name: secondary_insurance_id
description: Secondary insurance identifier for the patient
cocoon_meta:
missing_reason: Not applicable if patient doesn't have secondary insurance.
data_type:
current_data_type: VARCHAR
- name: primary_claim_type_id
description: Type identifier for primary healthcare claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: patient_outstanding_amount
description: Outstanding amount for patient claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: claim_type_id
description: Identifies the type of healthcare claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: secondary_outstanding_amount
description: Outstanding amount for secondary claim
cocoon_meta:
missing_reason: Not applicable if there's no secondary insurance claim.
data_type:
current_data_type: DECIMAL
- name: primary_claim_status
description: Status of primary claim
tests:
- not_null
- accepted_values:
values:
- CLOSED
- BILLED
cocoon_meta:
future_accepted_values:
- OPEN
- PENDING
- APPROVED
- DENIED
- PAID
- PARTIALLY PAID
- APPEALED
- UNDER REVIEW
- CANCELLED
data_type:
current_data_type: VARCHAR
- name: department_id
description: Identifier for the department handling the claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: patient_claim_status
description: Status of patient claim
tests:
- not_null
- accepted_values:
values:
- CLOSED
- BILLED
cocoon_meta:
future_accepted_values:
- OPEN
- PENDING
- APPROVED
- DENIED
- UNDER REVIEW
- APPEALED
- PAID
- PARTIALLY PAID
- CANCELLED
data_type:
current_data_type: VARCHAR
- name: secondary_claim_status
description: Status of secondary claim
tests:
- accepted_values:
values:
- CLOSED
- BILLED
cocoon_meta:
missing_reason: Not applicable if there's no secondary insurance claim.
future_accepted_values:
- OPEN
- PENDING
- DENIED
- APPROVED
- PAID
- PARTIAL_PAID
- APPEALED
- CANCELLED
- SUSPENDED
data_type:
current_data_type: VARCHAR
- name: primary_outstanding_amount
description: Outstanding amount for primary claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: patient_department_id
description: Department identifier associated with the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: appointment_id
description: Unique identifier for the appointment
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: claim_id
description: Unique identifier for the claim
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for each claim. For
this table, each row represents a unique insurance claim. The claim_id appears
to be unique across rows, as it's a UUID-like string that's different for
each sample.
data_type:
current_data_type: UUID
- name: illness_onset_date
description: Date of current illness onset
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: octonary_diagnosis
description: Eighth diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_last_billed_date
description: Last billed date for patient claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: primary_diagnosis
description: Primary diagnosis code
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: primary_insurance_id
description: Primary insurance identifier for the patient
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: UUID
- name: primary_last_billed_date
description: Last billed date for primary claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: provider_id
description: Unique identifier for the healthcare provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: quaternary_diagnosis
description: Fourth diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: quinary_diagnosis
description: Fifth diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: referring_provider_id
description: Identifier for the referring provider
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: secondary_diagnosis
description: Secondary diagnosis code
cocoon_meta:
missing_reason: Not applicable if there's no secondary insurance claim.
data_type:
current_data_type: VARCHAR
- name: secondary_last_billed_date
description: Last billed date for secondary claim
cocoon_meta:
missing_reason: Not applicable if there's no secondary insurance claim.
data_type:
current_data_type: TIMESTAMP
- name: senary_diagnosis
description: Sixth diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: septenary_diagnosis
description: Seventh diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: service_date
description: Date of service provided
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: supervising_provider_id
description: Identifier for the supervising provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: tertiary_diagnosis
description: Third diagnosis code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
tests: []
stg_providers
Table Preview (first 5 rows)
provider_name | GENDER | SPECIALITY | street_address | CITY | STATE | latitude | longitude | patient_encounters | procedures_performed | ZIP | organization_id | provider_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Gabriel Reilly | F | GENERAL PRACTICE | 881 Main Street | Fitchburg | MA | 42.586487 | -71.805210 | 131 | 0 | 1420 | 74ab949d-17ac-3309-83a0-13b4405c66aa | 6e3f59d7-742a-3b38-a4b1-1b471ed651a2 |
1 | Terence Brakus | M | GENERAL PRACTICE | 461 WALNUT AVE | JAMAICA PLAIN | MA | 42.311588 | -71.098001 | 30 | 0 | 21302331 | 588f6ce6-b8db-3588-8189-29db2680a313 | 63bf819f-a8ee-3bc1-b34a-787fe8e8e4dc |
2 | Jules Emard | M | GENERAL PRACTICE | 60 HOSPITAL RD | LEOMINSTER | MA | 42.540319 | -71.763130 | 8 | 0 | 14533290 | e09d4c49-c2ef-3b0f-9a46-3719d9219306 | ec207479-d18d-3b2d-9f80-f76ba864aefb |
3 | Precious Runolfsson | F | GENERAL PRACTICE | 501 CORDWAINER DRIVE | NORWELL | MA | 42.164707 | -70.881491 | 12 | 0 | 20611630 | aa682136-a4df-3942-9f50-e6fcee6c0b73 | f751fb18-fa7d-346e-8b95-7b8cb0dbcf32 |
4 | Eduardo Villegas | M | GENERAL PRACTICE | 484 RTE 134 | WORCESTER | MA | 42.262562 | -71.801888 | 3 | 0 | 16071728 | ab701a70-a658-340e-8f69-ee196a7d40c6 | 5afd05ef-bd29-32bc-bb4f-a17144628cbd |
stg_providers.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:16:20.447136+00:00
WITH
"providers_renamed" AS (
-- Rename: Renaming columns
-- Id -> provider_id
-- ORGANIZATION -> organization_id
-- NAME -> provider_name
-- ADDRESS -> street_address
-- LAT -> latitude
-- LON -> longitude
-- ENCOUNTERS -> patient_encounters
-- PROCEDURES -> procedures_performed
SELECT
"Id" AS "provider_id",
"ORGANIZATION" AS "organization_id",
"NAME" AS "provider_name",
"GENDER",
"SPECIALITY",
"ADDRESS" AS "street_address",
"CITY",
"STATE",
"ZIP",
"LAT" AS "latitude",
"LON" AS "longitude",
"ENCOUNTERS" AS "patient_encounters",
"PROCEDURES" AS "procedures_performed"
FROM "memory"."main"."providers"
),
"providers_renamed_cleaned" AS (
-- Clean unusual string values:
-- provider_name: The problem is that all the provider names in this column contain numbers, which is highly unusual for real names. The names appear to be a combination of a first name followed by numbers, then a last name followed by more numbers. This pattern is consistent across all entries and doesn't represent typical naming conventions. The correct values should be the names without the numbers.
SELECT
"provider_id",
"organization_id",
CASE
WHEN "provider_name" = 'Afton574 Kulas532' THEN 'Afton Kulas'
WHEN "provider_name" = 'Agustín529 Gaona896' THEN 'Agustín Gaona'
WHEN "provider_name" = 'Alan320 Keeling57' THEN 'Alan Keeling'
WHEN "provider_name" = 'Alvin56 Hayes766' THEN 'Alvin Hayes'
WHEN "provider_name" = 'Ammie189 Kuvalis369' THEN 'Ammie Kuvalis'
WHEN "provider_name" = 'Anderson154 Lemke654' THEN 'Anderson Lemke'
WHEN "provider_name" = 'Angelic427 McCullough561' THEN 'Angelic McCullough'
WHEN "provider_name" = 'Annita733 Frami345' THEN 'Annita Frami'
WHEN "provider_name" = 'Antwan357 Lindgren255' THEN 'Antwan Lindgren'
WHEN "provider_name" = 'Ariane992 Pagac496' THEN 'Ariane Pagac'
WHEN "provider_name" = 'Arla414 Fritsch593' THEN 'Arla Fritsch'
WHEN "provider_name" = 'Barrett790 Wolf938' THEN 'Barrett Wolf'
WHEN "provider_name" = 'Bernardo699 Nieves278' THEN 'Bernardo Nieves'
WHEN "provider_name" = 'Bradly656 Lang846' THEN 'Bradly Lang'
WHEN "provider_name" = 'Brigitte394 Bartoletti50' THEN 'Brigitte Bartoletti'
WHEN "provider_name" = 'Brock407 Bergstrom287' THEN 'Brock Bergstrom'
WHEN "provider_name" = 'Carisa395 Kutch271' THEN 'Carisa Kutch'
WHEN "provider_name" = 'Carlos172 Wiza601' THEN 'Carlos Wiza'
WHEN "provider_name" = 'Carri827 Flatley871' THEN 'Carri Flatley'
WHEN "provider_name" = 'Chantell995 Krajcik437' THEN 'Chantell Krajcik'
WHEN "provider_name" = 'Charlena802 Lockman863' THEN 'Charlena Lockman'
WHEN "provider_name" = 'Cherrie404 Conn188' THEN 'Cherrie Conn'
WHEN "provider_name" = 'Chong355 Balistreri607' THEN 'Chong Balistreri'
WHEN "provider_name" = 'Clemente531 Schamberger479' THEN 'Clemente Schamberger'
WHEN "provider_name" = 'Clemente531 Treutel973' THEN 'Clemente Treutel'
WHEN "provider_name" = 'Cristi782 Miller503' THEN 'Cristi Miller'
WHEN "provider_name" = 'Cristy798 Auer97' THEN 'Cristy Auer'
WHEN "provider_name" = 'Dana512 Wilkinson796' THEN 'Dana Wilkinson'
WHEN "provider_name" = 'Danial835 Lehner980' THEN 'Danial Lehner'
WHEN "provider_name" = 'Danita413 Becker968' THEN 'Danita Becker'
WHEN "provider_name" = 'Dannie881 Howell947' THEN 'Dannie Howell'
WHEN "provider_name" = 'Danuta945 Little434' THEN 'Danuta Little'
WHEN "provider_name" = 'Dean966 Kris249' THEN 'Dean Kris'
WHEN "provider_name" = 'Delicia67 Bernhard322' THEN 'Delicia Bernhard'
WHEN "provider_name" = 'Dorcas534 Quigley282' THEN 'Dorcas Quigley'
WHEN "provider_name" = 'Drew592 Ankunding277' THEN 'Drew Ankunding'
WHEN "provider_name" = 'Eduardo902 Méndez913' THEN 'Eduardo Méndez'
WHEN "provider_name" = 'Eduardo902 Villegas15' THEN 'Eduardo Villegas'
WHEN "provider_name" = 'Elise948 Smitham825' THEN 'Elise Smitham'
WHEN "provider_name" = 'Eliseo499 Reinger292' THEN 'Eliseo Reinger'
WHEN "provider_name" = 'Ellen406 Farrell962' THEN 'Ellen Farrell'
WHEN "provider_name" = 'Elmer371 Kutch271' THEN 'Elmer Kutch'
WHEN "provider_name" = 'Elvera717 Gusikowski974' THEN 'Elvera Gusikowski'
WHEN "provider_name" = 'Emerita401 Satterfield305' THEN 'Emerita Satterfield'
WHEN "provider_name" = 'Errol226 Gaylord332' THEN 'Errol Gaylord'
WHEN "provider_name" = 'Ervin886 Leannon79' THEN 'Ervin Leannon'
WHEN "provider_name" = 'Erwin847 Stiedemann542' THEN 'Erwin Stiedemann'
WHEN "provider_name" = 'Esteban536 Torres807' THEN 'Esteban Torres'
WHEN "provider_name" = 'Eufemia350 Casper496' THEN 'Eufemia Casper'
WHEN "provider_name" = 'Evelia928 Bogisich202' THEN 'Evelia Bogisich'
WHEN "provider_name" = 'Fidel864 Swift555' THEN 'Fidel Swift'
WHEN "provider_name" = 'Francisco472 Gusikowski974' THEN 'Francisco Gusikowski'
WHEN "provider_name" = 'Gabriel934 Reilly981' THEN 'Gabriel Reilly'
WHEN "provider_name" = 'Gerardo48 Montez99' THEN 'Gerardo Montez'
WHEN "provider_name" = 'Germán350 Dueñas839' THEN 'Germán Dueñas'
WHEN "provider_name" = 'Gonzalo160 Rodríguez701' THEN 'Gonzalo Rodríguez'
WHEN "provider_name" = 'Gregory545 Toy286' THEN 'Gregory Toy'
WHEN "provider_name" = 'Hal307 Wiegand701' THEN 'Hal Wiegand'
WHEN "provider_name" = 'Hank686 Kemmer137' THEN 'Hank Kemmer'
WHEN "provider_name" = 'Harvey63 Tromp100' THEN 'Harvey Tromp'
WHEN "provider_name" = 'Haywood675 Bednar518' THEN 'Haywood Bednar'
WHEN "provider_name" = 'Horace32 Grady603' THEN 'Horace Grady'
WHEN "provider_name" = 'Humberto482 Mann644' THEN 'Humberto Mann'
WHEN "provider_name" = 'Hyman89 Schmeler639' THEN 'Hyman Schmeler'
WHEN "provider_name" = 'Ignacio928 O''Keefe54' THEN 'Ignacio O''Keefe'
WHEN "provider_name" = 'Ilana185 Torp761' THEN 'Ilana Torp'
WHEN "provider_name" = 'Irvin970 Emard19' THEN 'Irvin Emard'
WHEN "provider_name" = 'Isaias604 Dickinson688' THEN 'Isaias Dickinson'
WHEN "provider_name" = 'Isiah14 Nikolaus26' THEN 'Isiah Nikolaus'
WHEN "provider_name" = 'Isiah14 Schaefer657' THEN 'Isiah Schaefer'
WHEN "provider_name" = 'Jacobo456 Vergara204' THEN 'Jacobo Vergara'
WHEN "provider_name" = 'Janeth814 Hyatt152' THEN 'Janeth Hyatt'
WHEN "provider_name" = 'Jeana169 Cronin387' THEN 'Jeana Cronin'
WHEN "provider_name" = 'Jefferey580 Luettgen772' THEN 'Jefferey Luettgen'
WHEN "provider_name" = 'Joanna347 Abbott774' THEN 'Joanna Abbott'
WHEN "provider_name" = 'Johna806 Klein929' THEN 'Johna Klein'
WHEN "provider_name" = 'Joline447 Stark857' THEN 'Joline Stark'
WHEN "provider_name" = 'Jolynn62 Adams676' THEN 'Jolynn Adams'
WHEN "provider_name" = 'Joshua658 Borer986' THEN 'Joshua Borer'
WHEN "provider_name" = 'Juan88 Friesen796' THEN 'Juan Friesen'
WHEN "provider_name" = 'Jules135 Emard19' THEN 'Jules Emard'
WHEN "provider_name" = 'Julio255 Rivas297' THEN 'Julio Rivas'
WHEN "provider_name" = 'Kai187 Mann644' THEN 'Kai Mann'
WHEN "provider_name" = 'Kathi607 Hilpert278' THEN 'Kathi Hilpert'
WHEN "provider_name" = 'Keith571 Friesen796' THEN 'Keith Friesen'
WHEN "provider_name" = 'Kendra609 Kassulke119' THEN 'Kendra Kassulke'
WHEN "provider_name" = 'Kirby843 Rippin620' THEN 'Kirby Rippin'
WHEN "provider_name" = 'Kisha966 Beahan375' THEN 'Kisha Beahan'
WHEN "provider_name" = 'Kizzie166 Lehner980' THEN 'Kizzie Lehner'
WHEN "provider_name" = 'Kristina583 Bashirian201' THEN 'Kristina Bashirian'
WHEN "provider_name" = 'Kyoko885 Price929' THEN 'Kyoko Price'
WHEN "provider_name" = 'Lane844 Gutmann970' THEN 'Lane Gutmann'
WHEN "provider_name" = 'Lashanda692 Gutmann970' THEN 'Lashanda Gutmann'
WHEN "provider_name" = 'Lasonya941 Kiehn525' THEN 'Lasonya Kiehn'
WHEN "provider_name" = 'Latasha666 Fay398' THEN 'Latasha Fay'
WHEN "provider_name" = 'Latoria810 Eichmann909' THEN 'Latoria Eichmann'
WHEN "provider_name" = 'Latrina689 White193' THEN 'Latrina White'
WHEN "provider_name" = 'Leif534 Hane680' THEN 'Leif Hane'
WHEN "provider_name" = 'Leonarda398 Schumm995' THEN 'Leonarda Schumm'
WHEN "provider_name" = 'Lindsay928 Leffler128' THEN 'Lindsay Leffler'
WHEN "provider_name" = 'Linn541 Lynch190' THEN 'Linn Lynch'
WHEN "provider_name" = 'Lino542 Feest103' THEN 'Lino Feest'
WHEN "provider_name" = 'Livia401 Rippin620' THEN 'Livia Rippin'
WHEN "provider_name" = 'Luvenia178 Breitenberg711' THEN 'Luvenia Breitenberg'
WHEN "provider_name" = 'Lyman173 Kshlerin58' THEN 'Lyman Kshlerin'
WHEN "provider_name" = 'Madelaine318 Walker122' THEN 'Madelaine Walker'
WHEN "provider_name" = 'Magdalena964 Torphy630' THEN 'Magdalena Torphy'
WHEN "provider_name" = 'Maragaret140 Erdman779' THEN 'Maragaret Erdman'
WHEN "provider_name" = 'Maren639 Aufderhar910' THEN 'Maren Aufderhar'
WHEN "provider_name" = 'Margarite168 Koepp521' THEN 'Margarite Koepp'
WHEN "provider_name" = 'Margene509 Schamberger479' THEN 'Margene Schamberger'
WHEN "provider_name" = 'Mariah942 Hilpert278' THEN 'Mariah Hilpert'
WHEN "provider_name" = 'Mariana775 Menéndez746' THEN 'Mariana Menéndez'
WHEN "provider_name" = 'Mariette443 Rau926' THEN 'Mariette Rau'
WHEN "provider_name" = 'Matthew562 Bailey598' THEN 'Matthew Bailey'
WHEN "provider_name" = 'Maynard46 Ward668' THEN 'Maynard Ward'
WHEN "provider_name" = 'Melissa844 Yundt842' THEN 'Melissa Yundt'
WHEN "provider_name" = 'Mertie42 Lakin515' THEN 'Mertie Lakin'
WHEN "provider_name" = 'Milda157 Schoen8' THEN 'Milda Schoen'
WHEN "provider_name" = 'Millard193 Doyle959' THEN 'Millard Doyle'
WHEN "provider_name" = 'Nicholle822 Satterfield305' THEN 'Nicholle Satterfield'
WHEN "provider_name" = 'Noma845 Mraz590' THEN 'Noma Mraz'
WHEN "provider_name" = 'Norah104 Jenkins714' THEN 'Norah Jenkins'
WHEN "provider_name" = 'Olen518 Farrell962' THEN 'Olen Farrell'
WHEN "provider_name" = 'Orpha286 Marks830' THEN 'Orpha Marks'
WHEN "provider_name" = 'Paris331 Ruecker817' THEN 'Paris Ruecker'
WHEN "provider_name" = 'Patrina117 Strosin214' THEN 'Patrina Strosin'
WHEN "provider_name" = 'Paulene52 Kihn564' THEN 'Paulene Kihn'
WHEN "provider_name" = 'Precious140 Runolfsson901' THEN 'Precious Runolfsson'
WHEN "provider_name" = 'Ramona980 Zavala169' THEN 'Ramona Zavala'
WHEN "provider_name" = 'Randa356 Ritchie586' THEN 'Randa Ritchie'
WHEN "provider_name" = 'Randy380 Bergstrom287' THEN 'Randy Bergstrom'
WHEN "provider_name" = 'Randy380 Kilback373' THEN 'Randy Kilback'
WHEN "provider_name" = 'Rebekah348 Rippin620' THEN 'Rebekah Rippin'
WHEN "provider_name" = 'Reginia455 Johnson679' THEN 'Reginia Johnson'
WHEN "provider_name" = 'Rex53 Gerhold939' THEN 'Rex Gerhold'
WHEN "provider_name" = 'Rhonda22 Bins636' THEN 'Rhonda Bins'
WHEN "provider_name" = 'Rod343 Durgan499' THEN 'Rod Durgan'
WHEN "provider_name" = 'Rod343 Goyette777' THEN 'Rod Goyette'
WHEN "provider_name" = 'Rodney21 Hettinger594' THEN 'Rodney Hettinger'
WHEN "provider_name" = 'Rolland302 Lemke654' THEN 'Rolland Lemke'
WHEN "provider_name" = 'Royce974 Kohler843' THEN 'Royce Kohler'
WHEN "provider_name" = 'Rupert654 Walker122' THEN 'Rupert Walker'
WHEN "provider_name" = 'Rusty501 Rodriguez71' THEN 'Rusty Rodriguez'
WHEN "provider_name" = 'Sallie654 Barrows492' THEN 'Sallie Barrows'
WHEN "provider_name" = 'Samuel331 Runte676' THEN 'Samuel Runte'
WHEN "provider_name" = 'Santina680 Dicki44' THEN 'Santina Dicki'
WHEN "provider_name" = 'Sean831 DuBuque211' THEN 'Sean DuBuque'
WHEN "provider_name" = 'Sharilyn202 Wolff180' THEN 'Sharilyn Wolff'
WHEN "provider_name" = 'Shasta644 King743' THEN 'Shasta King'
WHEN "provider_name" = 'Shawanna357 Roob72' THEN 'Shawanna Roob'
WHEN "provider_name" = 'Shayla126 White193' THEN 'Shayla White'
WHEN "provider_name" = 'Shon148 Reinger292' THEN 'Shon Reinger'
WHEN "provider_name" = 'Shon148 Swift555' THEN 'Shon Swift'
WHEN "provider_name" = 'Tambra47 Heaney114' THEN 'Tambra Heaney'
WHEN "provider_name" = 'Tamisha203 Wilderman619' THEN 'Tamisha Wilderman'
WHEN "provider_name" = 'Tashina114 Towne435' THEN 'Tashina Towne'
WHEN "provider_name" = 'Teodoro374 Koss676' THEN 'Teodoro Koss'
WHEN "provider_name" = 'Terence292 Brakus656' THEN 'Terence Brakus'
WHEN "provider_name" = 'Tomás404 Hinojosa147' THEN 'Tomás Hinojosa'
WHEN "provider_name" = 'Trey250 Satterfield305' THEN 'Trey Satterfield'
WHEN "provider_name" = 'Trinity427 Hartmann983' THEN 'Trinity Hartmann'
WHEN "provider_name" = 'Troy560 Carter549' THEN 'Troy Carter'
WHEN "provider_name" = 'Tyron580 Littel644' THEN 'Tyron Littel'
WHEN "provider_name" = 'Venice604 Rogahn59' THEN 'Venice Rogahn'
WHEN "provider_name" = 'Vito638 Barton704' THEN 'Vito Barton'
WHEN "provider_name" = 'Waldo53 Hintz995' THEN 'Waldo Hintz'
WHEN "provider_name" = 'Willian804 Batz141' THEN 'Willian Batz'
WHEN "provider_name" = 'Wilson960 Turner526' THEN 'Wilson Turner'
WHEN "provider_name" = 'Zane918 Bosco882' THEN 'Zane Bosco'
ELSE "provider_name"
END AS "provider_name",
"GENDER",
"SPECIALITY",
"street_address",
"CITY",
"STATE",
"ZIP",
"latitude",
"longitude",
"patient_encounters",
"procedures_performed"
FROM "providers_renamed"
),
"providers_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- ZIP: from INT to VARCHAR
-- organization_id: from VARCHAR to UUID
-- provider_id: from VARCHAR to UUID
SELECT
"provider_name",
"GENDER",
"SPECIALITY",
"street_address",
"CITY",
"STATE",
"latitude",
"longitude",
"patient_encounters",
"procedures_performed",
CAST("ZIP" AS VARCHAR)
AS "ZIP",
CAST("organization_id" AS UUID)
AS "organization_id",
CAST("provider_id" AS UUID)
AS "provider_id"
FROM "providers_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "providers_renamed_cleaned_casted"
stg_providers.yml (Document the table)
version: 2
models:
- name: stg_providers
description: The table is about healthcare providers. It includes their unique ID,
organization, name, gender, specialty, and contact details. The table also provides
location information including address, city, state, ZIP code, latitude, and longitude.
Additionally, it records the number of encounters and procedures for each provider.
All providers listed specialize in general practice and are located in Massachusetts.
columns:
- name: provider_name
description: Full name of the healthcare provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: GENDER
description: Gender of the healthcare provider
tests:
- not_null
- accepted_values:
values:
- F
- M
cocoon_meta:
future_accepted_values:
- O
data_type:
current_data_type: VARCHAR
- name: SPECIALITY
description: Medical specialty of the provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: street_address
description: Street address of the provider's practice
tests:
- not_null
cocoon_meta:
patterns:
- regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)$
summary: Basic street address with number and street name
- regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)(?:\s+(?:SUITE|STE)\s+\d+)?$
summary: Address with suite or unit number
- regex: ^[A-Z0-9\s]+$
summary: General pattern allowing for various formats
- regex: \d+\s+[A-Za-z\s&''\.]+\s+(Road|ST\.|Street|ROAD|PL)
summary: Street addresses with number, name, and type (most strict)
- regex: \d+\s+[A-Za-z\s&''\.]+
summary: Street addresses with number and name (less strict)
data_type:
current_data_type: VARCHAR
- name: CITY
description: City where the provider's practice is located
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: STATE
description: State where the provider's practice is located
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: latitude
description: Latitude coordinate of the provider's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: longitude
description: Longitude coordinate of the provider's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: patient_encounters
description: Number of patient encounters for the provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: procedures_performed
description: Number of procedures performed by the provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: ZIP
description: ZIP code of the provider's practice
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: organization_id
description: Identifier for the provider's organization
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: provider_id
description: Unique identifier for the healthcare provider
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for the healthcare
provider. For this table, each row is for a unique provider. The provider_id
is designed to be unique for each provider, ensuring no two providers have
the same ID.
data_type:
current_data_type: UUID
tests: []
stg_supplies
Table Preview (first 5 rows)
supply_description | quantity_ordered | encounter_id | order_date | patient_id | supply_code | |
---|---|---|---|---|---|---|
0 | Blood glucose testing strips (physical object) | 50 | 027a6834-445b-9d67-f1a8-437d8c1af741 | 2005-09-12 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 337388004 |
1 | Blood glucose testing strips (physical object) | 50 | 76530141-ca06-a819-537e-73a498808c84 | 2005-10-24 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 337388004 |
2 | Blood glucose testing strips (physical object) | 50 | e46e3f29-51a3-9d87-d57d-9d317b179c16 | 2008-10-13 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 337388004 |
3 | Blood glucose testing strips (physical object) | 50 | c563ea20-ad9e-2001-a4c9-c00890842509 | 2009-10-19 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 337388004 |
4 | Blood glucose testing strips (physical object) | 50 | c8e62551-765a-de49-e5cc-fd98f712ae73 | 2014-11-17 | b3b71304-fe5b-bda4-6822-bd901b2836d1 | 337388004 |
stg_supplies.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:17:12.506926+00:00
WITH
"supplies_renamed" AS (
-- Rename: Renaming columns
-- DATE_ -> order_date
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> supply_code
-- DESCRIPTION -> supply_description
-- QUANTITY -> quantity_ordered
SELECT
"DATE_" AS "order_date",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "supply_code",
"DESCRIPTION" AS "supply_description",
"QUANTITY" AS "quantity_ordered"
FROM "memory"."main"."supplies"
),
"supplies_renamed_dedup" AS (
-- Deduplication: Removed 7 duplicated rows
SELECT DISTINCT *
FROM "supplies_renamed"
),
"supplies_renamed_dedup_cleaned" AS (
-- Clean unusual string values:
-- supply_description: The supply_description column has several inconsistencies: 1. Inconsistent spacing: 'Air filter device' has an extra space. 2. Inconsistent representations of CPAP/BPAP: Some use full words, others use abbreviations. 3. Inconsistent use of parentheses: Most entries have '(physical object)' at the end, but one has '(substance)' and another has '(product)'. 4. Inconsistent capitalization: Some entries start with capital letters, others don't. The correct values should have consistent spacing, use full words instead of abbreviations, use '(physical object)' consistently for devices, and have consistent capitalization.
SELECT
"order_date",
"patient_id",
"encounter_id",
"supply_code",
CASE
WHEN "supply_description" = 'Air filter device (physical object)' THEN 'Air filter device (physical object)'
WHEN "supply_description" = 'CPAP/BPAP oral mask (physical object)' THEN 'Continuous positive airway pressure/Bilevel positive airway pressure oral mask (physical object)'
WHEN "supply_description" = 'Solution (substance)' THEN 'Solution (physical object)'
WHEN "supply_description" = 'Packed red blood cells (product)' THEN 'Packed red blood cells (physical object)'
ELSE "supply_description"
END AS "supply_description",
"quantity_ordered"
FROM "supplies_renamed_dedup"
),
"supplies_renamed_dedup_cleaned_casted" AS (
-- Column Type Casting:
-- encounter_id: from VARCHAR to UUID
-- order_date: from VARCHAR to DATE
-- patient_id: from VARCHAR to UUID
-- supply_code: from INT to VARCHAR
SELECT
"supply_description",
"quantity_ordered",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("order_date" AS DATE)
AS "order_date",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("supply_code" AS VARCHAR)
AS "supply_code"
FROM "supplies_renamed_dedup_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "supplies_renamed_dedup_cleaned_casted"
stg_supplies.yml (Document the table)
version: 2
models:
- name: stg_supplies
description: The table is about medical supply orders. It shows the date, patient
ID, encounter ID, supply code, description, and quantity ordered. All entries
are for blood glucose testing strips. The same patient received 50 strips on multiple
occasions between 1993 and 2005. Each order is associated with a unique encounter
ID.
columns:
- name: supply_description
description: Text description of the ordered medical supply
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: quantity_ordered
description: Number of units ordered
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: encounter_id
description: Unique identifier for each medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: order_date
description: Date of the medical supply order
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DATE
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: supply_code
description: Standardized code for the medical supply
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
tests: []
stg_encounters
Table Preview (first 5 rows)
encounter_type | procedure_description | base_cost | total_cost | payer_coverage | reason_description | encounter_end | encounter_id | encounter_start | organization_id | patient_id | payer_id | procedure_code | provider_id | reason_code | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | wellness | Well child visit (procedure) | 136.8 | 272.80 | 272.8 | None | 2013-01-13 01:20:00 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 0e061004-bcc0-3e99-a237-d700823e817b | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | 410620009 | 5053c59c-2f74-3246-9fda-cf2fbe4f6972 | None |
1 | wellness | Well child visit (procedure) | 136.8 | 871.12 | 0.0 | None | 2021-09-23 02:59:04 | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | 2021-09-23 02:44:04 | ea153d0a-9ccc-3769-9a25-934b417071b7 | eb247227-e839-88d3-447d-b5972468f33b | 734afbd6-4794-363b-9bc0-6a3981533ed5 | 410620009 | a9a4676a-e232-3eef-a874-e7af0fa54350 | None |
2 | wellness | General examination of patient (procedure) | 136.8 | 704.20 | 0.0 | None | 1962-07-25 12:15:39 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 1962-07-25 12:00:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 2ffa361e-5858-877e-e022-ce81fe32da1b | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | 162673000 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
3 | wellness | General examination of patient (procedure) | 136.8 | 704.20 | 0.0 | None | 1963-07-31 12:15:39 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | 1963-07-31 12:00:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 2ffa361e-5858-877e-e022-ce81fe32da1b | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | 162673000 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
4 | wellness | General examination of patient (procedure) | 136.8 | 778.78 | 0.0 | None | 1966-08-03 12:15:39 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 1966-08-03 12:00:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 2ffa361e-5858-877e-e022-ce81fe32da1b | b1c428d6-4f07-31e0-90f0-68ffa6ff8c76 | 162673000 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None |
stg_encounters.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:59:11.305102+00:00
WITH
"encounters_renamed" AS (
-- Rename: Renaming columns
-- Id -> encounter_id
-- START -> encounter_start
-- STOP -> encounter_end
-- PATIENT -> patient_id
-- ORGANIZATION -> organization_id
-- PROVIDER -> provider_id
-- PAYER -> payer_id
-- ENCOUNTERCLASS -> encounter_type
-- CODE -> procedure_code
-- DESCRIPTION -> procedure_description
-- BASE_ENCOUNTER_COST -> base_cost
-- TOTAL_CLAIM_COST -> total_cost
-- PAYER_COVERAGE -> payer_coverage
-- REASONCODE -> reason_code
-- REASONDESCRIPTION -> reason_description
SELECT
"Id" AS "encounter_id",
"START" AS "encounter_start",
"STOP" AS "encounter_end",
"PATIENT" AS "patient_id",
"ORGANIZATION" AS "organization_id",
"PROVIDER" AS "provider_id",
"PAYER" AS "payer_id",
"ENCOUNTERCLASS" AS "encounter_type",
"CODE" AS "procedure_code",
"DESCRIPTION" AS "procedure_description",
"BASE_ENCOUNTER_COST" AS "base_cost",
"TOTAL_CLAIM_COST" AS "total_cost",
"PAYER_COVERAGE" AS "payer_coverage",
"REASONCODE" AS "reason_code",
"REASONDESCRIPTION" AS "reason_description"
FROM "memory"."main"."encounters"
),
"encounters_renamed_cleaned" AS (
-- Clean unusual string values:
-- procedure_description: The procedure_description column has several inconsistencies: 1. Some entries have "(procedure)" appended while others don't. 2. There are redundant entries like "Encounter for problem" and "Encounter for problem (procedure)". 3. Some entries have inconsistent capitalization or punctuation. 4. A few entries seem out of place or too specific compared to others.The correct values should follow a consistent pattern, preferably with "(procedure)" appended where appropriate, and use consistent capitalization and punctuation.
-- reason_description: The problem is inconsistent labeling of medical conditions. Some values include '(disorder)' at the end while others don't. Additionally, some conditions are labeled as '(finding)' or '(situation)'. The correct values should consistently use the base condition name without these suffixes for uniformity.
SELECT
"encounter_id",
"encounter_start",
"encounter_end",
"patient_id",
"organization_id",
"provider_id",
"payer_id",
"encounter_type",
"procedure_code",
CASE
WHEN "procedure_description" = 'Follow-up encounter' THEN 'Follow-up encounter (procedure)'
WHEN "procedure_description" = 'Encounter for problem' THEN 'Encounter for problem (procedure)'
WHEN "procedure_description" = 'Prenatal visit' THEN 'Prenatal visit (procedure)'
WHEN "procedure_description" = 'Patient encounter procedure' THEN 'Patient encounter procedure (procedure)'
WHEN "procedure_description" = 'Encounter for ''check-up''' THEN 'Encounter for check up (procedure)'
WHEN "procedure_description" = 'Emergency room admission (procedure)' THEN 'Emergency Room Admission (procedure)'
WHEN "procedure_description" = 'Emergency Room Admission' THEN 'Emergency Room Admission (procedure)'
WHEN "procedure_description" = 'Encounter for symptom' THEN 'Encounter for symptom (procedure)'
WHEN "procedure_description" = 'Asthma follow-up' THEN 'Asthma follow-up (procedure)'
WHEN "procedure_description" = 'Postnatal visit' THEN 'Postnatal visit (procedure)'
WHEN "procedure_description" = 'Death Certification' THEN 'Death Certification (procedure)'
WHEN "procedure_description" = 'Gynecology service (qualifier value)' THEN 'Gynecology service (procedure)'
WHEN "procedure_description" = 'Screening surveillance (regime/therapy)' THEN 'Screening surveillance (procedure)'
WHEN "procedure_description" = 'Encounter Inpatient' THEN 'Inpatient encounter (procedure)'
WHEN "procedure_description" = 'Drug rehabilitation and detoxification' THEN 'Drug rehabilitation and detoxification (procedure)'
WHEN "procedure_description" = 'Office Visit' THEN 'Office visit (procedure)'
WHEN "procedure_description" = 'Emergency Encounter' THEN 'Emergency encounter (procedure)'
WHEN "procedure_description" = 'Encounter for Problem' THEN 'Encounter for problem (procedure)'
WHEN "procedure_description" = 'Patient-initiated encounter' THEN 'Patient-initiated encounter (procedure)'
WHEN "procedure_description" = 'Admission to thoracic surgery department' THEN 'Admission to thoracic surgery department (procedure)'
WHEN "procedure_description" = 'posttraumatic stress disorder' THEN 'Posttraumatic stress disorder follow-up (procedure)'
WHEN "procedure_description" = 'Initial Psychiatric Interview with mental status evaluation' THEN 'Initial psychiatric interview (procedure)'
ELSE "procedure_description"
END AS "procedure_description",
"base_cost",
"total_cost",
"payer_coverage",
"reason_code",
CASE
WHEN "reason_description" = 'Chronic congestive heart failure (disorder)' THEN 'Chronic congestive heart failure'
WHEN "reason_description" = 'Viral sinusitis (disorder)' THEN 'Viral sinusitis'
WHEN "reason_description" = 'Acute bronchitis (disorder)' THEN 'Acute bronchitis'
WHEN "reason_description" = 'Acute viral pharyngitis (disorder)' THEN 'Acute viral pharyngitis'
WHEN "reason_description" = 'Malignant neoplasm of breast (disorder)' THEN 'Malignant neoplasm of breast'
WHEN "reason_description" = 'Ischemic heart disease (disorder)' THEN 'Ischemic heart disease'
WHEN "reason_description" = 'Anemia (disorder)' THEN 'Anemia'
WHEN "reason_description" = 'Sinusitis (disorder)' THEN 'Sinusitis'
WHEN "reason_description" = 'Sleep disorder (disorder)' THEN 'Sleep disorder'
WHEN "reason_description" = 'Acute bacterial sinusitis (disorder)' THEN 'Acute bacterial sinusitis'
WHEN "reason_description" = 'Streptococcal sore throat (disorder)' THEN 'Streptococcal sore throat'
WHEN "reason_description" = 'Acute myeloid leukemia disease (disorder)' THEN 'Acute myeloid leukemia'
WHEN "reason_description" = 'Alzheimer''s disease (disorder)' THEN 'Alzheimer''s disease'
WHEN "reason_description" = 'History of coronary artery bypass grafting (situation)' THEN 'History of coronary artery bypass grafting'
WHEN "reason_description" = 'Abnormal findings diagnostic imaging heart+coronary circulat (finding)' THEN 'Abnormal findings diagnostic imaging heart and coronary circulation'
WHEN "reason_description" = 'Acute ST segment elevation myocardial infarction (disorder)' THEN 'Acute ST segment elevation myocardial infarction'
WHEN "reason_description" = 'Acute non-ST segment elevation myocardial infarction (disorder)' THEN 'Acute non-ST segment elevation myocardial infarction'
WHEN "reason_description" = 'Pulmonary emphysema (disorder)' THEN 'Pulmonary emphysema'
WHEN "reason_description" = 'Aortic valve stenosis (disorder)' THEN 'Aortic valve stenosis'
WHEN "reason_description" = 'Sepsis (disorder)' THEN 'Sepsis'
WHEN "reason_description" = 'Chronic obstructive bronchitis (disorder)' THEN 'Chronic obstructive bronchitis'
WHEN "reason_description" = 'Fracture of forearm' THEN 'Fracture of Forearm'
WHEN "reason_description" = 'Fracture subluxation of wrist' THEN 'Fracture Subluxation of Wrist'
WHEN "reason_description" = 'Injury of medial collateral ligament of knee' THEN 'Injury of Medial Collateral Ligament of Knee'
WHEN "reason_description" = 'Localized primary osteoarthritis of the hand' THEN 'Localized Primary Osteoarthritis of the Hand'
WHEN "reason_description" = 'Perennial allergic rhinitis' THEN 'Perennial Allergic Rhinitis'
WHEN "reason_description" = 'Perennial allergic rhinitis with seasonal variation' THEN 'Perennial Allergic Rhinitis with Seasonal Variation'
WHEN "reason_description" = 'Polyp of colon' THEN 'Polyp of Colon'
ELSE "reason_description"
END AS "reason_description"
FROM "encounters_renamed"
),
"encounters_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- encounter_end: from VARCHAR to TIMESTAMP
-- encounter_id: from VARCHAR to UUID
-- encounter_start: from VARCHAR to TIMESTAMP
-- organization_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- payer_id: from VARCHAR to UUID
-- procedure_code: from INT to VARCHAR
-- provider_id: from VARCHAR to UUID
-- reason_code: from DECIMAL to VARCHAR
SELECT
"encounter_type",
"procedure_description",
"base_cost",
"total_cost",
"payer_coverage",
"reason_description",
CAST("encounter_end" AS TIMESTAMP)
AS "encounter_end",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("encounter_start" AS TIMESTAMP)
AS "encounter_start",
CAST("organization_id" AS UUID)
AS "organization_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("payer_id" AS UUID)
AS "payer_id",
CAST("procedure_code" AS VARCHAR)
AS "procedure_code",
CAST("provider_id" AS UUID)
AS "provider_id",
CAST("reason_code" AS VARCHAR)
AS "reason_code"
FROM "encounters_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "encounters_renamed_cleaned_casted"
stg_encounters.yml (Document the table)
version: 2
models:
- name: stg_encounters
description: The table is about medical encounters. It includes details like encounter
ID, start and stop times, patient ID, provider ID, and payer ID. Each encounter
has a class, code, description, and associated costs. The table links patients,
healthcare providers, organizations, and payers involved in each encounter. It
also includes information on claim costs and payer coverage.
columns:
- name: encounter_type
description: Type or category of the medical encounter
tests:
- not_null
- accepted_values:
values:
- ambulatory
- wellness
- outpatient
- emergency
- inpatient
- home
- urgentcare
- snf
- virtual
- hospice
cocoon_meta:
future_accepted_values:
- primary care
- specialist
- dental
- vision
- mental health
- rehabilitation
- palliative
- long-term care
data_type:
current_data_type: VARCHAR
- name: procedure_description
description: Description of the procedure or visit
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: base_cost
description: Base cost of the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: total_cost
description: Total cost claimed for the encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: payer_coverage
description: Amount covered by the insurance payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: reason_description
description: Description of the reason for the encounter
cocoon_meta:
missing_reason: Not applicable for routine wellness visits without specific
issues.
data_type:
current_data_type: VARCHAR
- name: encounter_end
description: End date and time of the encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: encounter_id
description: Unique identifier for the encounter
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column is explicitly described as a unique identifier for
the encounter. For this table, each row represents a unique medical encounter.
The encounter_id is designed to be unique across all encounters.
data_type:
current_data_type: UUID
- name: encounter_start
description: Start date and time of the encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: organization_id
description: Unique identifier for the healthcare organization
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: payer_id
description: Unique identifier for the insurance payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: procedure_code
description: Medical code for the procedure or visit
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: provider_id
description: Unique identifier for the healthcare provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: reason_code
description: Code for the reason of the encounter
cocoon_meta:
missing_reason: Not applicable for routine wellness visits without specific
issues.
data_type:
current_data_type: VARCHAR
tests: []
stg_payers
Table Preview (first 5 rows)
payer_name | ownership_type | covered_amount | uncovered_amount | revenue | covered_encounters | uncovered_encounters | covered_medications | uncovered_medications | covered_procedures | uncovered_procedures | covered_immunizations | uncovered_immunizations | unique_customers | average_qol_score | member_months | city | payer_id | phone_number | state | street_address | zip_code | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Medicare | GOVERNMENT | 3245258.85 | 57324.03 | 460290.60 | 927 | 0 | 918 | 0 | 1270 | 0 | 202 | 0 | 16 | 0.753073 | 2748 | None | a735bf55-83e9-331a-899d-a82a60b9f60c | None | None | None | None |
1 | Medicaid | GOVERNMENT | 6112421.45 | 103336.16 | 45600.00 | 1217 | 0 | 286 | 0 | 1908 | 0 | 534 | 0 | 19 | 0.953093 | 4524 | None | df166300-5a78-3502-a46a-832842197811 | None | None | None | None |
2 | UnitedHealthcare | PRIVATE | 37929.59 | 28070.66 | 218949.27 | 49 | 0 | 10 | 0 | 36 | 0 | 41 | 0 | 3 | 0.800393 | 384 | None | d31fccc3-1767-390d-966a-22a5156f4219 | None | None | None | None |
3 | Cigna Health | PRIVATE | 2110812.70 | 318590.30 | 2102717.53 | 727 | 0 | 236 | 0 | 765 | 0 | 412 | 0 | 13 | 0.939912 | 3660 | None | 8fa6c185-e44e-3e34-8bd8-39be8694f4ce | None | None | None | None |
4 | Dual Eligible | GOVERNMENT | 931603.10 | 8658.65 | 2990.00 | 201 | 0 | 84 | 0 | 231 | 0 | 48 | 0 | 5 | 0.878956 | 300 | None | d18ef2e6-ef40-324c-be54-34a5ee865625 | None | None | None | None |
stg_payers.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:12:14.679962+00:00
WITH
"payers_renamed" AS (
-- Rename: Renaming columns
-- Id -> payer_id
-- NAME -> payer_name
-- OWNERSHIP -> ownership_type
-- ADDRESS -> street_address
-- CITY -> city
-- STATE_HEADQUARTERED -> state
-- ZIP -> zip_code
-- PHONE -> phone_number
-- AMOUNT_COVERED -> covered_amount
-- AMOUNT_UNCOVERED -> uncovered_amount
-- REVENUE -> revenue
-- COVERED_ENCOUNTERS -> covered_encounters
-- UNCOVERED_ENCOUNTERS -> uncovered_encounters
-- COVERED_MEDICATIONS -> covered_medications
-- UNCOVERED_MEDICATIONS -> uncovered_medications
-- COVERED_PROCEDURES -> covered_procedures
-- UNCOVERED_PROCEDURES -> uncovered_procedures
-- COVERED_IMMUNIZATIONS -> covered_immunizations
-- UNCOVERED_IMMUNIZATIONS -> uncovered_immunizations
-- UNIQUE_CUSTOMERS -> unique_customers
-- QOLS_AVG -> average_qol_score
-- MEMBER_MONTHS -> member_months
SELECT
"Id" AS "payer_id",
"NAME" AS "payer_name",
"OWNERSHIP" AS "ownership_type",
"ADDRESS" AS "street_address",
"CITY" AS "city",
"STATE_HEADQUARTERED" AS "state",
"ZIP" AS "zip_code",
"PHONE" AS "phone_number",
"AMOUNT_COVERED" AS "covered_amount",
"AMOUNT_UNCOVERED" AS "uncovered_amount",
"REVENUE" AS "revenue",
"COVERED_ENCOUNTERS" AS "covered_encounters",
"UNCOVERED_ENCOUNTERS" AS "uncovered_encounters",
"COVERED_MEDICATIONS" AS "covered_medications",
"UNCOVERED_MEDICATIONS" AS "uncovered_medications",
"COVERED_PROCEDURES" AS "covered_procedures",
"UNCOVERED_PROCEDURES" AS "uncovered_procedures",
"COVERED_IMMUNIZATIONS" AS "covered_immunizations",
"UNCOVERED_IMMUNIZATIONS" AS "uncovered_immunizations",
"UNIQUE_CUSTOMERS" AS "unique_customers",
"QOLS_AVG" AS "average_qol_score",
"MEMBER_MONTHS" AS "member_months"
FROM "memory"."main"."payers"
),
"payers_renamed_casted" AS (
-- Column Type Casting:
-- city: from DECIMAL to VARCHAR
-- payer_id: from VARCHAR to UUID
-- phone_number: from DECIMAL to VARCHAR
-- state: from DECIMAL to VARCHAR
-- street_address: from DECIMAL to VARCHAR
-- zip_code: from DECIMAL to VARCHAR
SELECT
"payer_name",
"ownership_type",
"covered_amount",
"uncovered_amount",
"revenue",
"covered_encounters",
"uncovered_encounters",
"covered_medications",
"uncovered_medications",
"covered_procedures",
"uncovered_procedures",
"covered_immunizations",
"uncovered_immunizations",
"unique_customers",
"average_qol_score",
"member_months",
CAST("city" AS VARCHAR)
AS "city",
CAST("payer_id" AS UUID)
AS "payer_id",
CAST("phone_number" AS VARCHAR)
AS "phone_number",
CAST("state" AS VARCHAR)
AS "state",
CAST("street_address" AS VARCHAR)
AS "street_address",
CAST("zip_code" AS VARCHAR)
AS "zip_code"
FROM "payers_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "payers_renamed_casted"
stg_payers.yml (Document the table)
version: 2
models:
- name: stg_payers
description: The table is about payers in the healthcare system. It includes government
and private payers. Each payer has an ID, name, ownership type, and contact details.
The table provides financial data like covered and uncovered amounts, revenue,
and encounters. It also includes information on medications, procedures, immunizations,
unique customers, quality of life scores, and member months.
columns:
- name: payer_name
description: Name of the payer organization
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: ownership_type
description: Type of ownership (government or private)
tests:
- not_null
- accepted_values:
values:
- PRIVATE
- GOVERNMENT
- NO_INSURANCE
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: covered_amount
description: Total amount covered by the payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: uncovered_amount
description: Total amount not covered by the payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: revenue
description: Total revenue generated by the payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: covered_encounters
description: Number of covered healthcare encounters
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: uncovered_encounters
description: Number of uncovered healthcare encounters
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: covered_medications
description: Number of covered medications
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: uncovered_medications
description: Number of uncovered medications
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: covered_procedures
description: Number of covered medical procedures
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: uncovered_procedures
description: Number of uncovered medical procedures
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: covered_immunizations
description: Number of covered immunizations
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: uncovered_immunizations
description: Number of uncovered immunizations
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: unique_customers
description: Number of unique customers served
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: average_qol_score
description: Average Quality of Life Score
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: member_months
description: Total number of months members were enrolled
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: city
description: City where the payer is located
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: payer_id
description: Unique identifier for each payer
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for each payer. For
this table, each row is for a distinct payer, and payer_id is unique across
rows.
data_type:
current_data_type: UUID
- name: phone_number
description: Contact phone number for the payer
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: state
description: State where the payer is headquartered
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: street_address
description: Street address of the payer
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: zip_code
description: ZIP code of the payer's location
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
tests: []
stg_patients
Table Preview (first 5 rows)
ssn | drivers_license | passport_number | name_prefix | first_name | last_name | name_suffix | maiden_name | marital_status | race | ethnicity | gender | birthplace | street_address | city | state | county | fips_code | latitude | longitude | healthcare_expenses | healthcare_coverage | annual_income | birth_date | death_date | patient_id | zip_code | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | None | None | None | None | Mel | Bailey598 | None | None | None | white | nonhispanic | M | Norton Center Massachusetts US | 716 Wunsch Gardens Unit 48 | Framingham | Massachusetts | Middlesex County | 25017.0 | 42.275650 | -71.476367 | 2520.80 | 4323.64 | 170754 | 2021-09-23 | NaT | eb247227-e839-88d3-447d-b5972468f33b | 1701 |
1 | None | S99957814 | X45639058X | Mrs. | Cheyenne | Marks830 | None | Lowe | M | white | nonhispanic | F | Longmeadow Massachusetts US | 123 Bayer Camp | Taunton | Massachusetts | Bristol County | 25005.0 | 41.892884 | -71.066686 | 205342.20 | 94647.00 | 40526 | 1944-05-31 | NaT | 2ffa361e-5858-877e-e022-ce81fe32da1b | 2718 |
2 | None | None | None | None | Hunter | Keebler762 | None | None | None | white | nonhispanic | M | Maynard Massachusetts US | 575 Jast Rue Unit 48 | Winchendon | Massachusetts | Worcester County | 25027.0 | 42.670059 | -72.074664 | 16381.92 | 17447.87 | 79884 | 2009-02-08 | NaT | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 1475 |
3 | None | S99968506 | None | None | Herschel | Ernser583 | None | None | None | asian | nonhispanic | M | Somerville Massachusetts US | 184 Langworth Parade Apt 10 | Boston | Massachusetts | Suffolk County | 25025.0 | 42.395516 | -71.059015 | 3850.00 | 44057.32 | 6420 | 1979-06-26 | 1995-07-04 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 2131 |
4 | None | S99967405 | X86891718X | Mrs. | Lacey | Heathcote539 | None | Hegmann | M | white | nonhispanic | F | Natick Massachusetts US | 801 Morissette Divide | Hingham | Massachusetts | Plymouth County | 25023.0 | 42.200723 | -70.836590 | 66662.10 | 1777031.06 | 933420 | 1973-05-31 | NaT | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | 2043 |
stg_patients.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:11:16.831598+00:00
WITH
"patients_renamed" AS (
-- Rename: Renaming columns
-- Id -> patient_id
-- BIRTHDATE -> birth_date
-- DEATHDATE -> death_date
-- SSN -> ssn
-- DRIVERS -> drivers_license
-- PASSPORT -> passport_number
-- PREFIX -> name_prefix
-- FIRST_ -> first_name
-- LAST_ -> last_name
-- SUFFIX -> name_suffix
-- MAIDEN -> maiden_name
-- MARITAL -> marital_status
-- RACE -> race
-- ETHNICITY -> ethnicity
-- GENDER -> gender
-- BIRTHPLACE -> birthplace
-- ADDRESS -> street_address
-- CITY -> city
-- STATE -> state
-- COUNTY -> county
-- FIPS -> fips_code
-- ZIP -> zip_code
-- LAT -> latitude
-- LON -> longitude
-- HEALTHCARE_EXPENSES -> healthcare_expenses
-- HEALTHCARE_COVERAGE -> healthcare_coverage
-- INCOME -> annual_income
SELECT
"Id" AS "patient_id",
"BIRTHDATE" AS "birth_date",
"DEATHDATE" AS "death_date",
"SSN" AS "ssn",
"DRIVERS" AS "drivers_license",
"PASSPORT" AS "passport_number",
"PREFIX" AS "name_prefix",
"FIRST_" AS "first_name",
"LAST_" AS "last_name",
"SUFFIX" AS "name_suffix",
"MAIDEN" AS "maiden_name",
"MARITAL" AS "marital_status",
"RACE" AS "race",
"ETHNICITY" AS "ethnicity",
"GENDER" AS "gender",
"BIRTHPLACE" AS "birthplace",
"ADDRESS" AS "street_address",
"CITY" AS "city",
"STATE" AS "state",
"COUNTY" AS "county",
"FIPS" AS "fips_code",
"ZIP" AS "zip_code",
"LAT" AS "latitude",
"LON" AS "longitude",
"HEALTHCARE_EXPENSES" AS "healthcare_expenses",
"HEALTHCARE_COVERAGE" AS "healthcare_coverage",
"INCOME" AS "annual_income"
FROM "memory"."main"."patients"
),
"patients_renamed_cleaned" AS (
-- Clean unusual string values:
-- ssn: The problem is that all the SSN values begin with 999, which is invalid for U.S. Social Security Numbers. Valid SSNs never start with 999. These appear to be placeholder or dummy SSNs rather than real ones. Since we can't determine the correct SSNs from this information alone, the best approach is to map these invalid values to an empty string to indicate missing data.
-- first_name: The problem is that all first names have a random 3-digit number appended to them. These numbers appear to be meaningless and don't provide any useful information. The correct values should be just the first names without the numbers.
-- last_name: The problem is that most last names follow a pattern of a capitalized English surname followed by 3 digits, but there are two exceptions: 1. "Bermúdez789" contains a non-English character (ú). 2. "Roldán470" contains a non-English character (á). These should be normalized to use standard English characters. The correct values should maintain the same naming pattern as the majority of entries.
-- maiden_name: The problem is that all values in the maiden_name column are unusual because they combine surnames with three-digit numbers, which is not typical for real maiden names. This appears to be a data generation artifact rather than genuine maiden names. Since we can't determine the correct maiden names from this data, the best approach is to remove the numeric portion and keep only the surname part, which could potentially be valid maiden names.
SELECT
"patient_id",
"birth_date",
"death_date",
CASE
WHEN "ssn" = '999-10-1178' THEN NULL
WHEN "ssn" = '999-10-6028' THEN NULL
WHEN "ssn" = '999-12-9121' THEN NULL
WHEN "ssn" = '999-14-9380' THEN NULL
WHEN "ssn" = '999-14-9672' THEN NULL
WHEN "ssn" = '999-16-4297' THEN NULL
WHEN "ssn" = '999-17-2611' THEN NULL
WHEN "ssn" = '999-20-4271' THEN NULL
WHEN "ssn" = '999-20-5403' THEN NULL
WHEN "ssn" = '999-22-2635' THEN NULL
WHEN "ssn" = '999-23-4696' THEN NULL
WHEN "ssn" = '999-23-9351' THEN NULL
WHEN "ssn" = '999-26-4422' THEN NULL
WHEN "ssn" = '999-26-8041' THEN NULL
WHEN "ssn" = '999-29-2359' THEN NULL
WHEN "ssn" = '999-29-4844' THEN NULL
WHEN "ssn" = '999-29-7349' THEN NULL
WHEN "ssn" = '999-30-8851' THEN NULL
WHEN "ssn" = '999-32-4862' THEN NULL
WHEN "ssn" = '999-33-4589' THEN NULL
WHEN "ssn" = '999-35-8448' THEN NULL
WHEN "ssn" = '999-36-1150' THEN NULL
WHEN "ssn" = '999-37-8682' THEN NULL
WHEN "ssn" = '999-38-7473' THEN NULL
WHEN "ssn" = '999-41-1756' THEN NULL
WHEN "ssn" = '999-42-9847' THEN NULL
WHEN "ssn" = '999-43-4282' THEN NULL
WHEN "ssn" = '999-44-2795' THEN NULL
WHEN "ssn" = '999-44-9634' THEN NULL
WHEN "ssn" = '999-48-5926' THEN NULL
WHEN "ssn" = '999-49-9846' THEN NULL
WHEN "ssn" = '999-50-5586' THEN NULL
WHEN "ssn" = '999-50-5697' THEN NULL
WHEN "ssn" = '999-51-3221' THEN NULL
WHEN "ssn" = '999-55-3195' THEN NULL
WHEN "ssn" = '999-55-6098' THEN NULL
WHEN "ssn" = '999-56-9201' THEN NULL
WHEN "ssn" = '999-57-7157' THEN NULL
WHEN "ssn" = '999-58-7543' THEN NULL
WHEN "ssn" = '999-59-2568' THEN NULL
WHEN "ssn" = '999-61-4140' THEN NULL
WHEN "ssn" = '999-61-6611' THEN NULL
WHEN "ssn" = '999-61-6740' THEN NULL
WHEN "ssn" = '999-62-7937' THEN NULL
WHEN "ssn" = '999-70-2405' THEN NULL
WHEN "ssn" = '999-70-4594' THEN NULL
WHEN "ssn" = '999-71-8314' THEN NULL
WHEN "ssn" = '999-73-5643' THEN NULL
WHEN "ssn" = '999-77-7700' THEN NULL
WHEN "ssn" = '999-79-2426' THEN NULL
WHEN "ssn" = '999-79-3695' THEN NULL
WHEN "ssn" = '999-85-2178' THEN NULL
WHEN "ssn" = '999-88-1792' THEN NULL
WHEN "ssn" = '999-89-9127' THEN NULL
WHEN "ssn" = '999-91-9580' THEN NULL
WHEN "ssn" = '999-93-7263' THEN NULL
WHEN "ssn" = '999-95-3792' THEN NULL
WHEN "ssn" = '999-96-6743' THEN NULL
WHEN "ssn" = '999-99-2106' THEN NULL
WHEN "ssn" = '999-99-2436' THEN NULL
ELSE "ssn"
END AS "ssn",
"drivers_license",
"passport_number",
"name_prefix",
CASE
WHEN "first_name" = 'Hayden835' THEN 'Hayden'
WHEN "first_name" = 'Adelia946' THEN 'Adelia'
WHEN "first_name" = 'Andera917' THEN 'Andera'
WHEN "first_name" = 'Antony83' THEN 'Antony'
WHEN "first_name" = 'Burton124' THEN 'Burton'
WHEN "first_name" = 'Carmelita854' THEN 'Carmelita'
WHEN "first_name" = 'Carrol931' THEN 'Carrol'
WHEN "first_name" = 'Cedrick207' THEN 'Cedrick'
WHEN "first_name" = 'Cheyenne169' THEN 'Cheyenne'
WHEN "first_name" = 'Christal240' THEN 'Christal'
WHEN "first_name" = 'Cindy893' THEN 'Cindy'
WHEN "first_name" = 'Coleman27' THEN 'Coleman'
WHEN "first_name" = 'Corey514' THEN 'Corey'
WHEN "first_name" = 'Damon455' THEN 'Damon'
WHEN "first_name" = 'Daniela614' THEN 'Daniela'
WHEN "first_name" = 'Dione665' THEN 'Dione'
WHEN "first_name" = 'Dominick530' THEN 'Dominick'
WHEN "first_name" = 'Elden718' THEN 'Elden'
WHEN "first_name" = 'Ethel888' THEN 'Ethel'
WHEN "first_name" = 'Fletcher87' THEN 'Fletcher'
WHEN "first_name" = 'Garry927' THEN 'Garry'
WHEN "first_name" = 'Gayle448' THEN 'Gayle'
WHEN "first_name" = 'Grant908' THEN 'Grant'
WHEN "first_name" = 'Guadalupe206' THEN 'Guadalupe'
WHEN "first_name" = 'Herb645' THEN 'Herb'
WHEN "first_name" = 'Herschel574' THEN 'Herschel'
WHEN "first_name" = 'Hildred696' THEN 'Hildred'
WHEN "first_name" = 'Huey641' THEN 'Huey'
WHEN "first_name" = 'Hunter736' THEN 'Hunter'
WHEN "first_name" = 'Irving123' THEN 'Irving'
WHEN "first_name" = 'Janeth814' THEN 'Janeth'
WHEN "first_name" = 'Jenae263' THEN 'Jenae'
WHEN "first_name" = 'Jimmie93' THEN 'Jimmie'
WHEN "first_name" = 'Kaycee352' THEN 'Kaycee'
WHEN "first_name" = 'Kirk871' THEN 'Kirk'
WHEN "first_name" = 'Lacey714' THEN 'Lacey'
WHEN "first_name" = 'Lavette209' THEN 'Lavette'
WHEN "first_name" = 'Leandro563' THEN 'Leandro'
WHEN "first_name" = 'Lessie363' THEN 'Lessie'
WHEN "first_name" = 'Lillia547' THEN 'Lillia'
WHEN "first_name" = 'Linn541' THEN 'Linn'
WHEN "first_name" = 'Lonny638' THEN 'Lonny'
WHEN "first_name" = 'Luke971' THEN 'Luke'
WHEN "first_name" = 'Manuel446' THEN 'Manuel'
WHEN "first_name" = 'Martín25' THEN 'Martín'
WHEN "first_name" = 'Mel236' THEN 'Mel'
WHEN "first_name" = 'Mirta419' THEN 'Mirta'
WHEN "first_name" = 'Moises22' THEN 'Moises'
WHEN "first_name" = 'Raymon366' THEN 'Raymon'
WHEN "first_name" = 'Shaquana156' THEN 'Shaquana'
WHEN "first_name" = 'Shayla126' THEN 'Shayla'
WHEN "first_name" = 'Shiela18' THEN 'Shiela'
WHEN "first_name" = 'Steve819' THEN 'Steve'
WHEN "first_name" = 'Stewart672' THEN 'Stewart'
WHEN "first_name" = 'Tyrell880' THEN 'Tyrell'
WHEN "first_name" = 'Vanesa40' THEN 'Vanesa'
WHEN "first_name" = 'Whitney250' THEN 'Whitney'
WHEN "first_name" = 'Yetta429' THEN 'Yetta'
WHEN "first_name" = 'Zoila41' THEN 'Zoila'
ELSE "first_name"
END AS "first_name",
CASE
WHEN "last_name" = 'Bermúdez789' THEN 'Bermudez789'
WHEN "last_name" = 'Roldán470' THEN 'Roldan470'
WHEN "last_name" = 'Wilkinson796' THEN 'Wilkinson'
WHEN "last_name" = 'Willms744' THEN 'Willms'
WHEN "last_name" = 'Zboncak558' THEN 'Zboncak'
ELSE "last_name"
END AS "last_name",
"name_suffix",
CASE
WHEN "maiden_name" = 'Barela183' THEN 'Barela'
WHEN "maiden_name" = 'Beier427' THEN 'Beier'
WHEN "maiden_name" = 'Deckow585' THEN 'Deckow'
WHEN "maiden_name" = 'DuBuque211' THEN 'DuBuque'
WHEN "maiden_name" = 'Gorczany269' THEN 'Gorczany'
WHEN "maiden_name" = 'Hegmann834' THEN 'Hegmann'
WHEN "maiden_name" = 'Hermann103' THEN 'Hermann'
WHEN "maiden_name" = 'Kohler843' THEN 'Kohler'
WHEN "maiden_name" = 'Lowe577' THEN 'Lowe'
WHEN "maiden_name" = 'Predovic534' THEN 'Predovic'
WHEN "maiden_name" = 'Shanahan202' THEN 'Shanahan'
ELSE "maiden_name"
END AS "maiden_name",
"marital_status",
"race",
"ethnicity",
"gender",
"birthplace",
"street_address",
"city",
"state",
"county",
"fips_code",
"zip_code",
"latitude",
"longitude",
"healthcare_expenses",
"healthcare_coverage",
"annual_income"
FROM "patients_renamed"
),
"patients_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- birth_date: from VARCHAR to DATE
-- death_date: from VARCHAR to DATE
-- patient_id: from VARCHAR to UUID
-- zip_code: from INT to VARCHAR
SELECT
"ssn",
"drivers_license",
"passport_number",
"name_prefix",
"first_name",
"last_name",
"name_suffix",
"maiden_name",
"marital_status",
"race",
"ethnicity",
"gender",
"birthplace",
"street_address",
"city",
"state",
"county",
"fips_code",
"latitude",
"longitude",
"healthcare_expenses",
"healthcare_coverage",
"annual_income",
CAST("birth_date" AS DATE)
AS "birth_date",
CAST("death_date" AS DATE)
AS "death_date",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("zip_code" AS VARCHAR)
AS "zip_code"
FROM "patients_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "patients_renamed_cleaned_casted"
stg_patients.yml (Document the table)
version: 2
models:
- name: stg_patients
description: The table is about patients' personal and healthcare information. It
includes demographics like birth date, race, gender, and marital status. It also
contains contact details such as address, city, and state. Healthcare-related
information includes expenses, coverage, and income. The table provides a comprehensive
profile of each patient, including unique identifiers like SSN and driver's license
numbers.
columns:
- name: ssn
description: Social Security Number of the patient
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: drivers_license
description: Driver's license number of the patient
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: passport_number
description: Passport number of the patient
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: name_prefix
description: Title or honorific prefix of the patient
tests:
- accepted_values:
values:
- Mr.
- Ms.
- Mrs.
cocoon_meta:
missing_reason: Not applicable if person doesn't use a prefix.
future_accepted_values:
- Miss
- Dr.
- Prof.
- Rev.
- Sir
- Lady
- Dame
- Lord
- Capt.
- Lt.
- Col.
- Gen.
- Adm.
- Mx.
data_type:
current_data_type: VARCHAR
- name: first_name
description: First name of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: last_name
description: Last name of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: name_suffix
description: Name suffix of the patient, if any
tests:
- accepted_values:
values:
- JD
- MD
- PhD
cocoon_meta:
missing_reason: Not applicable if person doesn't have a suffix.
future_accepted_values:
- Sr.
- Jr.
- I
- II
- III
- IV
- V
- Esq.
- DDS
- DVM
- DO
- RN
- BSN
- MSN
- PA
- NP
- CPA
- PE
data_type:
current_data_type: VARCHAR
- name: maiden_name
description: Maiden name of the patient, if applicable
cocoon_meta:
missing_reason: Not applicable for males or unmarried females.
data_type:
current_data_type: VARCHAR
- name: marital_status
description: Marital status of the patient
tests:
- accepted_values:
values:
- M
- S
cocoon_meta:
missing_reason: Not applicable for minors or very young individuals.
future_accepted_values:
- D
- W
- P
- U
data_type:
current_data_type: VARCHAR
- name: race
description: Racial background of the patient
tests:
- not_null
- accepted_values:
values:
- white
- asian
- black
cocoon_meta:
future_accepted_values:
- White
- Asian
- Black
- Hispanic or Latino
- American Indian or Alaska Native
- Native Hawaiian or Other Pacific Islander
- Two or More Races
- Other
- Unknown
- Prefer not to say
data_type:
current_data_type: VARCHAR
- name: ethnicity
description: Ethnic background of the patient
tests:
- not_null
- accepted_values:
values:
- nonhispanic
- hispanic
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: gender
description: Gender of the patient
tests:
- not_null
- accepted_values:
values:
- M
- F
cocoon_meta:
future_accepted_values:
- O
data_type:
current_data_type: VARCHAR
- name: birthplace
description: Place of birth of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: street_address
description: Street address of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: city
description: City of residence of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: state
description: State of residence of the patient
tests:
- not_null
- accepted_values:
values:
- Massachusetts
cocoon_meta:
future_accepted_values:
- Alabama
- Alaska
- Arizona
- Arkansas
- California
- Colorado
- Connecticut
- Delaware
- Florida
- Georgia
- Hawaii
- Idaho
- Illinois
- Indiana
- Iowa
- Kansas
- Kentucky
- Louisiana
- Maine
- Maryland
- Michigan
- Minnesota
- Mississippi
- Missouri
- Montana
- Nebraska
- Nevada
- New Hampshire
- New Jersey
- New Mexico
- New York
- North Carolina
- North Dakota
- Ohio
- Oklahoma
- Oregon
- Pennsylvania
- Rhode Island
- South Carolina
- South Dakota
- Tennessee
- Texas
- Utah
- Vermont
- Virginia
- Washington
- West Virginia
- Wisconsin
- Wyoming
data_type:
current_data_type: VARCHAR
- name: county
description: County of residence of the patient
tests:
- not_null
- accepted_values:
values:
- Hampden County
- Plymouth County
- Essex County
- Middlesex County
- Suffolk County
- Worcester County
- Bristol County
- Barnstable County
- Norfolk County
- Berkshire County
- Franklin County
cocoon_meta:
future_accepted_values:
- Dukes County
- Hampshire County
- Nantucket County
data_type:
current_data_type: VARCHAR
- name: fips_code
description: Federal Information Processing Standards code for location
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: DECIMAL
- name: latitude
description: Latitude coordinate of the patient's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: longitude
description: Longitude coordinate of the patient's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: healthcare_expenses
description: Total healthcare expenses for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: healthcare_coverage
description: Total healthcare coverage for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: annual_income
description: Annual income of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: birth_date
description: Date of birth of the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DATE
- name: death_date
description: Date of death of the patient, if applicable
cocoon_meta:
missing_reason: Not applicable for living individuals.
data_type:
current_data_type: DATE
- name: patient_id
description: Unique identifier for each patient record
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column is a unique identifier for each patient. For this
table, each row represents a distinct patient, and patient_id appears to be
unique across rows.
data_type:
current_data_type: UUID
- name: zip_code
description: ZIP code of the patient's residence
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
tests: []
stg_careplans
Table Preview (first 5 rows)
treatment_description | diagnosis_code | diagnosis_description | care_plan_id | encounter_id | end_date | patient_id | start_date | treatment_code | |
---|---|---|---|---|---|---|---|---|---|
0 | Fracture care | 58150001.0 | Fracture of clavicle | e5cc23db-c84f-3ded-411d-deb020dfcb4c | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 2016-07-23 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 2016-04-11 | 385691007 |
1 | Wound care | 284551006.0 | Laceration of foot | d70b9f2f-745e-f594-4983-b48790e4291f | 3c131d0c-6d15-7fe1-1372-a3d8316b6d82 | 1988-05-12 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 1988-04-27 | 225358003 |
2 | Respiratory therapy | NaN | None | 54b5c4bb-1df6-0892-7097-416ecea0f44f | d6df3fd6-e4a0-9b51-ebd5-c6394be154eb | 1990-07-10 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 1989-09-03 | 53950000 |
3 | Wound care | 370247008.0 | Facial laceration | 254cfb18-3266-be35-b8c4-8a6a89ce0547 | 17d8fd78-be39-d2f7-6032-74d712972fa3 | 2014-02-03 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 2014-01-20 | 225358003 |
4 | Head injury rehabilitation | 62106007.0 | Concussion with no loss of consciousness | 7e07f691-b437-c62b-4379-0a8e55812747 | 5fbc8a13-6944-a854-c4cb-bbd04e969ee1 | 2017-09-02 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 2017-06-08 | 47387005 |
stg_careplans.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:47:43.534687+00:00
WITH
"careplans_renamed" AS (
-- Rename: Renaming columns
-- Id -> care_plan_id
-- START -> start_date
-- STOP -> end_date
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> treatment_code
-- DESCRIPTION -> treatment_description
-- REASONCODE -> diagnosis_code
-- REASONDESCRIPTION -> diagnosis_description
SELECT
"Id" AS "care_plan_id",
"START" AS "start_date",
"STOP" AS "end_date",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "treatment_code",
"DESCRIPTION" AS "treatment_description",
"REASONCODE" AS "diagnosis_code",
"REASONDESCRIPTION" AS "diagnosis_description"
FROM "memory"."main"."careplans"
),
"careplans_renamed_casted" AS (
-- Column Type Casting:
-- care_plan_id: from VARCHAR to UUID
-- encounter_id: from VARCHAR to UUID
-- end_date: from VARCHAR to DATE
-- patient_id: from VARCHAR to UUID
-- start_date: from VARCHAR to DATE
-- treatment_code: from INT to VARCHAR
SELECT
"treatment_description",
"diagnosis_code",
"diagnosis_description",
CAST("care_plan_id" AS UUID)
AS "care_plan_id",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("end_date" AS DATE)
AS "end_date",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("start_date" AS DATE)
AS "start_date",
CAST("treatment_code" AS VARCHAR)
AS "treatment_code"
FROM "careplans_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "careplans_renamed_casted"
stg_careplans.yml (Document the table)
version: 2
models:
- name: stg_careplans
description: The table is about care plans for patients. It shows the start and
stop dates of treatments. Each care plan is linked to a patient and an encounter.
The table includes codes and descriptions for the treatment and reason. It relates
patients, encounters, and specific medical interventions over time. The data covers
various medical issues like fractures, wounds, and head injuries.
columns:
- name: treatment_description
description: Text description of the treatment or procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: diagnosis_code
description: Numeric code for the reason of treatment
cocoon_meta:
missing_reason: Not applicable for non-diagnosis-related care plans.
data_type:
current_data_type: DECIMAL
- name: diagnosis_description
description: Text description of the reason for treatment
cocoon_meta:
missing_reason: Not applicable for non-diagnosis-related care plans.
data_type:
current_data_type: VARCHAR
- name: care_plan_id
description: Unique identifier for the care plan entry
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column is a unique identifier for each care plan entry.
For this table, each row represents a distinct care plan. care_plan_id appears
to be unique across rows, as it's designed to be a specific identifier for
each care plan.
data_type:
current_data_type: UUID
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: end_date
description: End date of the care plan
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: DATE
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: start_date
description: Start date of the care plan
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DATE
- name: treatment_code
description: Numeric code for the treatment or procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
tests: []
stg_organizations
Table Preview (first 5 rows)
facility_name | street_address | CITY | STATE | latitude | longitude | PHONE | revenue | utilization_rate | ZIP | organization_id | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | FITCHBURG OUTPATIENT CLINIC | 881 Main Street | Fitchburg | MA | 42.586487 | -71.805210 | 9783429781 | 0.0 | 131 | 1420 | 74ab949d-17ac-3309-83a0-13b4405c66aa |
1 | BOSTON HEALTH CARE FOR THE HOMELESS PROGRAM INC | 461 WALNUT AVE | JAMAICA PLAIN | MA | 42.311588 | -71.098001 | 8576541550 | 0.0 | 30 | 21302331 | 588f6ce6-b8db-3588-8189-29db2680a313 |
2 | UMASS MEMORIAL HEALTHALLIANCE CLINTON HOSPITAL INC | 60 HOSPITAL RD | LEOMINSTER | MA | 42.540319 | -71.763130 | 9784662000 | 0.0 | 8 | 14533290 | e09d4c49-c2ef-3b0f-9a46-3719d9219306 |
3 | SOUTHWOOD AT NORWELL NURSING CENTER | 501 CORDWAINER DRIVE | NORWELL | MA | 42.164707 | -70.881491 | 7819827450 | 0.0 | 12 | 20611630 | aa682136-a4df-3942-9f50-e6fcee6c0b73 |
4 | CAREWELL URGENT CARE CENTERS OF MA P.C. | 484 RTE 134 | WORCESTER | MA | 42.262562 | -71.801888 | 5086947901 | 0.0 | 3 | 16071728 | ab701a70-a658-340e-8f69-ee196a7d40c6 |
stg_organizations.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:08:13.429847+00:00
WITH
"organizations_renamed" AS (
-- Rename: Renaming columns
-- Id -> organization_id
-- NAME -> facility_name
-- ADDRESS -> street_address
-- LAT -> latitude
-- LON -> longitude
-- REVENUE -> revenue
-- UTILIZATION -> utilization_rate
SELECT
"Id" AS "organization_id",
"NAME" AS "facility_name",
"ADDRESS" AS "street_address",
"CITY",
"STATE",
"ZIP",
"LAT" AS "latitude",
"LON" AS "longitude",
"PHONE",
"REVENUE" AS "revenue",
"UTILIZATION" AS "utilization_rate"
FROM "memory"."main"."organizations"
),
"organizations_renamed_cleaned" AS (
-- Clean unusual string values:
-- facility_name: The main problems are inconsistent use of 'INC' vs 'INC.', inconsistent spacing before 'LLC', and varying formats for company names. The correct values should use 'INC' without a period, remove extra spaces before 'LLC', and maintain consistent capitalization. Some entries also have typos or unusual abbreviations that need to be corrected.
-- PHONE: The main problem is inconsistent formatting of phone numbers. Most numbers are either in a single string format (like '4134472000') or have parentheses for the area code (like '(978) 406-1327'). The value '413-731-6000 Or 413-731-6000' is particularly unusual as it repeats the same number and includes the word 'Or'. The correct values should follow the most common format, which appears to be a single string of digits without any separators.
SELECT
"organization_id",
CASE
WHEN "facility_name" = 'STEWARD GOOD SAMARITAN MEDICAL CENTER INC.' THEN 'STEWARD GOOD SAMARITAN MEDICAL CENTER INC'
WHEN "facility_name" = '135 BENTON DRIVE OPERATING COMPANY LLC' THEN '135 BENTON DRIVE OPERATING COMPANY LLC'
WHEN "facility_name" = '265 ESSEX STREET OPERATING COMPANY LLC' THEN '265 ESSEX STREET OPERATING COMPANY LLC'
WHEN "facility_name" = '4499 ACUSHNET AVENUE OPERATING COMPANY LLC' THEN '4499 ACUSHNET AVENUE OPERATING COMPANY LLC'
WHEN "facility_name" = 'ABBOTT HOME HEALTH CARE INC' THEN 'ABBOTT HOME HEALTH CARE INC'
WHEN "facility_name" = 'ARBOUR HOSPITAL THE' THEN 'THE ARBOUR HOSPITAL'
WHEN "facility_name" = 'ASSOCIATED PHYSICIANS OF HARVARD MEDICAL FACULTY PHYSICIANS AT BETH IS' THEN 'ASSOCIATED PHYSICIANS OF HARVARD MEDICAL FACULTY PHYSICIANS AT BETH ISRAEL'
WHEN "facility_name" = 'BAYADA HOME HEALTH CARE INC' THEN 'BAYADA HOME HEALTH CARE INC'
WHEN "facility_name" = 'BEAR MT EAST LONGMEADOW LLC' THEN 'BEAR MOUNTAIN EAST LONGMEADOW LLC'
WHEN "facility_name" = 'BEAR MT SPRINGFIELD LLC' THEN 'BEAR MOUNTAIN SPRINGFIELD LLC'
WHEN "facility_name" = 'BERKSHIRE MEDICAL CENTER INC - 1' THEN 'BERKSHIRE MEDICAL CENTER INC'
WHEN "facility_name" = 'BL HEALTHCARE INC DELEWARE' THEN 'BL HEALTHCARE INC DELAWARE'
WHEN "facility_name" = 'BOSTON HOME INC (THE)' THEN 'THE BOSTON HOME INC'
WHEN "facility_name" = 'BOSTON MEDICAL CENTER CORPORATION-' THEN 'BOSTON MEDICAL CENTER CORPORATION'
WHEN "facility_name" = 'BROCKTON HOSPITAL INC.' THEN 'BROCKTON HOSPITAL INC'
WHEN "facility_name" = 'CARECENTRAL URGENT CARE MEDICAL GROUP PC' THEN 'CARECENTRAL URGENT CARE MEDICAL GROUP P.C.'
WHEN "facility_name" = 'CAREWELL URGENT CARE CENTERS OF MA PC' THEN 'CAREWELL URGENT CARE CENTERS OF MA P.C.'
WHEN "facility_name" = 'CARING HEALTH CENTER INC' THEN 'CARING HEALTH CENTER INC'
WHEN "facility_name" = 'CHARLES RIVER COMMUNITY HEALTH INC' THEN 'CHARLES RIVER COMMUNITY HEALTH INC'
WHEN "facility_name" = 'CHELMSFORD FAMILY PRACTICE PC' THEN 'CHELMSFORD FAMILY PRACTICE P.C.'
WHEN "facility_name" = 'COMPASSIONATE CARE HOSPICE OF SOUTHEASTERN MASS LL' THEN 'COMPASSIONATE CARE HOSPICE OF SOUTHEASTERN MASS LLC'
WHEN "facility_name" = 'CONNECTICUT RIVER INTERNISTS. LLP' THEN 'CONNECTICUT RIVER INTERNISTS LLP'
WHEN "facility_name" = 'DIMOCK COMMUNITY HEALTH CENTER INC' THEN 'DIMOCK COMMUNITY HEALTH CENTER INC.'
WHEN "facility_name" = 'DYOUVILLE TRANSITIONAL CARE INC' THEN 'D''YOUVILLE TRANSITIONAL CARE INC.'
WHEN "facility_name" = 'EDWARD M KENNEDY COMMUNITY HEALTH CENTER INC' THEN 'EDWARD M. KENNEDY COMMUNITY HEALTH CENTER INC.'
WHEN "facility_name" = 'ENCOMPASS HEALTH BRAINTREE HOSPITAL OF BRAINTREE' THEN 'ENCOMPASS HEALTH HOSPITAL OF BRAINTREE'
WHEN "facility_name" = 'ENCOMPASS HEALTH REHAB HOSPITAL OF WESTERN MASS' THEN 'ENCOMPASS HEALTH REHAB HOSPITAL OF WESTERN MASS.'
WHEN "facility_name" = 'EXCEL HOME CARE SERVICES INC' THEN 'EXCEL HOME CARE SERVICES INC.'
WHEN "facility_name" = 'FALMOUTH HOSPITAL ASSOCIATION INC' THEN 'FALMOUTH HOSPITAL ASSOCIATION INC.'
WHEN "facility_name" = 'FRANCISCAN HOSPITAL FOR CHILDREN INC' THEN 'FRANCISCAN HOSPITAL FOR CHILDREN INC.'
WHEN "facility_name" = 'Fitchburg Outpatient Clinic' THEN 'FITCHBURG OUTPATIENT CLINIC'
WHEN "facility_name" = 'HOLYOKE HEALTH CENTER INC' THEN 'HOLYOKE HEALTH CENTER INC.'
WHEN "facility_name" = 'HOLYOKE MEDICAL CENTER INC' THEN 'HOLYOKE MEDICAL CENTER INC.'
WHEN "facility_name" = 'LAHEY HOSPITAL & MEDICAL CENTER BURLINGTON' THEN 'LAHEY HOSPITAL & MEDICAL CENTER BURLINGTON'
WHEN "facility_name" = 'LIGHTHOUSE OF REVERE INC.' THEN 'LIGHTHOUSE OF REVERE INC.'
WHEN "facility_name" = 'LUTHERAN HOME OF JAMAICA PLAIN INC' THEN 'LUTHERAN HOME OF JAMAICA PLAIN INC.'
WHEN "facility_name" = 'MELROSEWAKEFIELD HEALTHCARE INC' THEN 'MELROSEWAKEFIELD HEALTHCARE, INC.'
WHEN "facility_name" = 'MIDDLESEX INTERNAL MEDICINE ASSOCIATES INC' THEN 'MIDDLESEX INTERNAL MEDICINE ASSOCIATES, INC.'
WHEN "facility_name" = 'MORTON HOSPITAL A STEWARD FAMILY HOSPITAL INC.' THEN 'MORTON HOSPITAL, A STEWARD FAMILY HOSPITAL, INC.'
WHEN "facility_name" = 'NEW BEDFORD INTERNAL MEDICINE & GERIATRICS LLC' THEN 'NEW BEDFORD INTERNAL MEDICINE & GERIATRICS, LLC'
WHEN "facility_name" = 'NEW ENGLAND FAMILY HEALTH LLC' THEN 'NEW ENGLAND FAMILY HEALTH, LLC'
WHEN "facility_name" = 'NEW ENGLAND PROFESSIONAL HOME HEALTH CARE LLC' THEN 'NEW ENGLAND PROFESSIONAL HOME HEALTH CARE, LLC'
WHEN "facility_name" = 'NORTH ADAMS COMMONS NURSING & REHABILITATION CENTE' THEN 'NORTH ADAMS COMMONS NURSING & REHABILITATION CENTER'
WHEN "facility_name" = 'NORTH RIVER HOSPICE LLC' THEN 'NORTH RIVER HOSPICE, LLC'
WHEN "facility_name" = 'NORTH SHORE MEDICAL CENTER INC' THEN 'NORTH SHORE MEDICAL CENTER, INC.'
WHEN "facility_name" = 'OAKS THE' THEN 'THE OAKS'
WHEN "facility_name" = 'PALM SKILLED NRSING CR & CTR FOR REHAB EXCELLENCE' THEN 'PALM SKILLED NURSING CARE & CENTER FOR REHAB EXCELLENCE'
WHEN "facility_name" = 'PRIMARY & PREVENTIVE CARE INC.' THEN 'PRIMARY & PREVENTIVE CARE, INC.'
WHEN "facility_name" = 'Plymouth Outreach Clinic' THEN 'PLYMOUTH OUTREACH CLINIC'
WHEN "facility_name" = 'QUINCY HEALTH AND REHABILITATION CENTER LLC' THEN 'QUINCY HEALTH AND REHABILITATION CENTER, LLC'
WHEN "facility_name" = 'RIVER''S EDGE PRIMARY CARE LLC' THEN 'RIVER''S EDGE PRIMARY CARE, LLC'
WHEN "facility_name" = 'ROYAL NORWELL NURSING & REHABILITATION CENTER LLC' THEN 'ROYAL NORWELL NURSING & REHABILITATION CENTER, LLC'
WHEN "facility_name" = 'SEASONS HOSPICE & PALLIATIVE CARE OF MASS LLC' THEN 'SEASONS HOSPICE & PALLIATIVE CARE OF MASS, LLC'
WHEN "facility_name" = 'SOUTH SHORE MEDICAL INVESTORS LLC' THEN 'SOUTH SHORE MEDICAL INVESTORS, LLC'
WHEN "facility_name" = 'SOUTH SHORE PRIMARY AND URGENT CARE LLC' THEN 'SOUTH SHORE PRIMARY AND URGENT CARE, LLC'
WHEN "facility_name" = 'SOUTHWOOD AT NORWELL NURSING CTR' THEN 'SOUTHWOOD AT NORWELL NURSING CENTER'
WHEN "facility_name" = 'SUN MEDICAL CLINIC PC' THEN 'SUN MEDICAL CLINIC, PC'
WHEN "facility_name" = 'Springfield Outpatient Clinic' THEN 'SPRINGFIELD OUTPATIENT CLINIC'
WHEN "facility_name" = 'TAUNTON MEDICAL CENTER PC' THEN 'TAUNTON MEDICAL CENTER, PC'
WHEN "facility_name" = 'TUFTS MEDICAL CENTER INC' THEN 'TUFTS MEDICAL CENTER INC'
WHEN "facility_name" = 'URGENT CARE MEDICAL ASSOCIATES LLC' THEN 'URGENT CARE MEDICAL ASSOCIATES LLC'
WHEN "facility_name" = 'URGENT CARE SPECIALISTS P.C.' THEN 'URGENT CARE SPECIALISTS PC'
WHEN "facility_name" = 'VIGILANT FAMILY HEALTH CLINIC INC.' THEN 'VIGILANT FAMILY HEALTH CLINIC INC'
WHEN "facility_name" = 'WESTBOROUGH BEHAVIORAL HEALTHCARE HOSPITAL LLC' THEN 'WESTBOROUGH BEHAVIORAL HEALTHCARE HOSPITAL LLC'
WHEN "facility_name" = 'WORCESTER INTERNAL MEDICINE INC.' THEN 'WORCESTER INTERNAL MEDICINE INC'
ELSE "facility_name"
END AS "facility_name",
"street_address",
"CITY",
"STATE",
"ZIP",
"latitude",
"longitude",
CASE
WHEN "PHONE" = '(978) 406-1327' THEN '9784061327'
WHEN "PHONE" = '413-731-6000 Or 413-731-6000' THEN '4137316000'
WHEN "PHONE" = '978-342-9781 Or 978-342-9781' THEN '9783429781'
WHEN "PHONE" = '800-865-3384' THEN '8008653384'
ELSE "PHONE"
END AS "PHONE",
"revenue",
"utilization_rate"
FROM "organizations_renamed"
),
"organizations_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- ZIP: from INT to VARCHAR
-- organization_id: from VARCHAR to UUID
SELECT
"facility_name",
"street_address",
"CITY",
"STATE",
"latitude",
"longitude",
"PHONE",
"revenue",
"utilization_rate",
CAST("ZIP" AS VARCHAR)
AS "ZIP",
CAST("organization_id" AS UUID)
AS "organization_id"
FROM "organizations_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "organizations_renamed_cleaned_casted"
stg_organizations.yml (Document the table)
version: 2
models:
- name: stg_organizations
description: The table is about healthcare organizations. It includes details such
as unique ID, name, address, city, state, ZIP code, latitude, longitude, phone
number, revenue, and utilization. The organizations appear to be various healthcare
facilities like clinics, hospitals, and urgent care centers. They are located
in Massachusetts. The table provides contact information and some operational
data for each organization.
columns:
- name: facility_name
description: Name of the healthcare facility
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: street_address
description: Street address of the facility
tests:
- not_null
cocoon_meta:
patterns:
- regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)$
summary: Basic street address with number and street name
- regex: ^\d+\s+[A-Z\s]+(?:ST|AVE|RD|DRIVE|STREET|AVENUE|ROAD)(?:\s+(?:SUITE|STE)\s+\d+)?$
summary: Address with suite or unit number
- regex: ^[A-Z0-9\s]+$
summary: General pattern allowing for various formats
- regex: \d+\s+[A-Za-z\s&''\.]+\s+(Road|ST\.|Street|ROAD|PL)
summary: Street addresses with number, name, and type (most strict)
- regex: \d+\s+[A-Za-z\s&''\.]+
summary: Street addresses with number and name (less strict)
data_type:
current_data_type: VARCHAR
- name: CITY
description: City where the facility is located
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: STATE
description: State abbreviation for the facility's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: latitude
description: Latitude coordinate of the facility
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: longitude
description: Longitude coordinate of the facility
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: PHONE
description: Contact phone number for the facility
tests:
- not_null
cocoon_meta:
patterns:
- regex: \d{10}
summary: 10-digit phone numbers without separators
data_type:
current_data_type: VARCHAR
- name: revenue
description: Revenue information for the facility
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: utilization_rate
description: Measure of facility usage or capacity
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: ZIP
description: ZIP code of the facility's location
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: organization_id
description: Unique identifier for the healthcare organization
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for the healthcare
organization. For this table, each row is for a unique healthcare facility.
The organization_id appears to be a UUID, which is designed to be unique for
each entry.
data_type:
current_data_type: UUID
tests: []
stg_immunizations
Table Preview (first 5 rows)
vaccine_code | vaccine_description | base_cost | encounter_id | immunization_datetime | patient_id | |
---|---|---|---|---|---|---|
0 | 140 | Influenza seasonal injectable preservative free | 136.0 | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
1 | 140 | Influenza seasonal injectable preservative free | 136.0 | 07969a4d-f9f0-c63f-6a0e-ecb659a20571 | 1986-06-17 21:35:49 | db80575b-5e9b-921b-fad9-1e3a20929dc7 |
2 | 21 | varicella | 136.0 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 2014-01-19 01:05:00 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
3 | 10 | IPV | 136.0 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 2014-01-19 01:05:00 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
4 | 140 | Influenza seasonal injectable preservative free | 136.0 | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 2014-01-19 01:05:00 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
stg_immunizations.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:01:52.568671+00:00
WITH
"immunizations_renamed" AS (
-- Rename: Renaming columns
-- DATE_ -> immunization_datetime
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> vaccine_code
-- DESCRIPTION -> vaccine_description
-- BASE_COST -> base_cost
SELECT
"DATE_" AS "immunization_datetime",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "vaccine_code",
"DESCRIPTION" AS "vaccine_description",
"BASE_COST" AS "base_cost"
FROM "memory"."main"."immunizations"
),
"immunizations_renamed_casted" AS (
-- Column Type Casting:
-- encounter_id: from VARCHAR to UUID
-- immunization_datetime: from VARCHAR to TIMESTAMP
-- patient_id: from VARCHAR to UUID
SELECT
"vaccine_code",
"vaccine_description",
"base_cost",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("immunization_datetime" AS TIMESTAMP)
AS "immunization_datetime",
CAST("patient_id" AS UUID)
AS "patient_id"
FROM "immunizations_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "immunizations_renamed_casted"
stg_immunizations.yml (Document the table)
version: 2
models:
- name: stg_immunizations
description: The table is about immunizations given to patients. It includes details
such as the date of immunization, patient ID, encounter ID, vaccine code, vaccine
description, and base cost. Each row represents a single immunization event. Multiple
immunizations can be given to the same patient on the same date. All immunizations
in the sample have the same base cost of $136.00.
columns:
- name: vaccine_code
description: Numeric code representing the vaccine type
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: vaccine_description
description: Text description of the vaccine administered
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: base_cost
description: Base cost of the immunization in dollars
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: encounter_id
description: Unique identifier for the healthcare encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: immunization_datetime
description: Date and time of immunization
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
tests: []
stg_payer_transitions
Table Preview (first 5 rows)
plan_ownership | owner_name | coverage_end_date | coverage_start_date | member_id | patient_id | primary_payer_id | secondary_payer_id | |
---|---|---|---|---|---|---|---|---|
0 | Guardian | Solomon675 Bailey598 | 2022-09-29 02:44:04 | 2021-09-23 02:44:04 | 74d0f7af-d6c6-7848-9f62-c3f165170e09 | eb247227-e839-88d3-447d-b5972468f33b | 734afbd6-4794-363b-9bc0-6a3981533ed5 | NaN |
1 | Guardian | Solomon675 Bailey598 | 2023-09-29 02:44:04 | 2022-09-29 02:44:04 | 74d0f7af-d6c6-7848-9f62-c3f165170e09 | eb247227-e839-88d3-447d-b5972468f33b | 734afbd6-4794-363b-9bc0-6a3981533ed5 | NaN |
2 | Guardian | Hong136 Keebler762 | 2010-02-14 01:05:00 | 2009-02-08 01:05:00 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | NaN |
3 | Guardian | Hong136 Keebler762 | 2011-02-20 01:05:00 | 2010-02-14 01:05:00 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | NaN |
4 | Guardian | Hong136 Keebler762 | 2012-02-26 01:05:00 | 2011-02-20 01:05:00 | d4ef17f8-1714-3490-17af-70d4d0c16478 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | NaN |
stg_payer_transitions.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:13:46.432396+00:00
WITH
"payer_transitions_renamed" AS (
-- Rename: Renaming columns
-- PATIENT -> patient_id
-- MEMBERID -> member_id
-- START_DATE -> coverage_start_date
-- END_DATE -> coverage_end_date
-- PAYER -> primary_payer_id
-- SECONDARY_PAYER -> secondary_payer_id
-- PLAN_OWNERSHIP -> plan_ownership
-- OWNER_NAME -> owner_name
SELECT
"PATIENT" AS "patient_id",
"MEMBERID" AS "member_id",
"START_DATE" AS "coverage_start_date",
"END_DATE" AS "coverage_end_date",
"PAYER" AS "primary_payer_id",
"SECONDARY_PAYER" AS "secondary_payer_id",
"PLAN_OWNERSHIP" AS "plan_ownership",
"OWNER_NAME" AS "owner_name"
FROM "memory"."main"."payer_transitions"
),
"payer_transitions_renamed_casted" AS (
-- Column Type Casting:
-- coverage_end_date: from VARCHAR to TIMESTAMP
-- coverage_start_date: from VARCHAR to TIMESTAMP
-- member_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- primary_payer_id: from VARCHAR to UUID
-- secondary_payer_id: from VARCHAR to UUID
SELECT
"plan_ownership",
"owner_name",
CASE
WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:04Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:07Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
WHEN regexp_full_match("coverage_end_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z') THEN CAST("coverage_end_date" AS TIMESTAMP)
END
AS "coverage_end_date",
CASE
WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T13:42:04Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T03:41:07Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T19:43:25Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
WHEN regexp_full_match("coverage_start_date", '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z') THEN CAST("coverage_start_date" AS TIMESTAMP)
END
AS "coverage_start_date",
CAST("member_id" AS UUID)
AS "member_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("primary_payer_id" AS UUID)
AS "primary_payer_id",
CAST("secondary_payer_id" AS UUID)
AS "secondary_payer_id"
FROM "payer_transitions_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "payer_transitions_renamed_casted"
stg_payer_transitions.yml (Document the table)
version: 2
models:
- name: stg_payer_transitions
description: The table represents payer transitions for patients. It shows insurance
coverage periods for patients. Each row contains a patient ID, member ID, start
and end dates of coverage, primary payer, secondary payer (if any), plan ownership,
and owner name. The table tracks changes in insurance coverage over time for individual
patients.
columns:
- name: plan_ownership
description: Type of ownership for the insurance plan
tests:
- accepted_values:
values:
- Self
- Guardian
- Spouse
cocoon_meta:
missing_reason: Unknown
future_accepted_values:
- Parent
- Child
- Sibling
- Domestic Partner
- Other Family Member
- Employer
- Trust
- Other
data_type:
current_data_type: VARCHAR
- name: owner_name
description: Name of the insurance plan owner
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: coverage_end_date
description: Ending date of insurance coverage period
tests:
- not_null
cocoon_meta:
unusual_values: 'Two different time formats are used: 13:42:04Z for most dates
and 03:41:07Z for two December dates.'
data_type:
current_data_type: TIMESTAMP
- name: coverage_start_date
description: Beginning date of insurance coverage period
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: member_id
description: Unique identifier for the insurance member
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: primary_payer_id
description: Primary insurance payer identifier
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: secondary_payer_id
description: Secondary insurance payer identifier, if applicable
cocoon_meta:
missing_reason: Not all patients have a secondary insurance plan.
data_type:
current_data_type: UUID
tests: []
stg_medications
Table Preview (first 5 rows)
medication_description | base_cost | payer_coverage_amount | dispense_count | total_cost | reason_description | encounter_id | medication_code | order_end_datetime | order_start_datetime | patient_id | payer_id | reason_code | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Clopidogrel 75 MG Oral Tablet | 631.42 | 0.00 | 162 | 102290.04 | None | 546912c2-aa40-a2e0-f148-af23fa171af0 | 309362 | NaT | 2009-09-02 12:00:39 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | None |
1 | Simvastatin 20 MG Oral Tablet | 323.20 | 29.40 | 162 | 52358.40 | None | 546912c2-aa40-a2e0-f148-af23fa171af0 | 312961 | NaT | 2009-09-02 12:00:39 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | None |
2 | 24 HR metoprolol succinate 100 MG Extended Release Oral Tablet | 129.94 | 99.94 | 162 | 21050.28 | None | 546912c2-aa40-a2e0-f148-af23fa171af0 | 866412 | NaT | 2009-09-02 12:00:39 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | None |
3 | Nitroglycerin 0.4 MG/ACTUAT Mucosal Spray | 702.61 | 672.61 | 162 | 113822.82 | None | 546912c2-aa40-a2e0-f148-af23fa171af0 | 705129 | NaT | 2009-09-02 12:00:39 | 2ffa361e-5858-877e-e022-ce81fe32da1b | 0133f751-9229-3cfd-815f-b6d4979bdd6a | None |
4 | Ibuprofen 100 MG Oral Tablet | 332.42 | 0.00 | 3 | 997.26 | None | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 198405 | 2016-07-23 04:55:57 | 2016-04-11 04:55:57 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 26aab0cd-6aba-3e1b-ac5b-05c8867e762c | None |
stg_medications.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:03:31.386317+00:00
WITH
"medications_renamed" AS (
-- Rename: Renaming columns
-- START -> order_start_datetime
-- STOP -> order_end_datetime
-- PATIENT -> patient_id
-- PAYER -> payer_id
-- ENCOUNTER -> encounter_id
-- CODE -> medication_code
-- DESCRIPTION -> medication_description
-- BASE_COST -> base_cost
-- PAYER_COVERAGE -> payer_coverage_amount
-- DISPENSES -> dispense_count
-- TOTALCOST -> total_cost
-- REASONCODE -> reason_code
-- REASONDESCRIPTION -> reason_description
SELECT
"START" AS "order_start_datetime",
"STOP" AS "order_end_datetime",
"PATIENT" AS "patient_id",
"PAYER" AS "payer_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "medication_code",
"DESCRIPTION" AS "medication_description",
"BASE_COST" AS "base_cost",
"PAYER_COVERAGE" AS "payer_coverage_amount",
"DISPENSES" AS "dispense_count",
"TOTALCOST" AS "total_cost",
"REASONCODE" AS "reason_code",
"REASONDESCRIPTION" AS "reason_description"
FROM "memory"."main"."medications"
),
"medications_renamed_casted" AS (
-- Column Type Casting:
-- encounter_id: from VARCHAR to UUID
-- medication_code: from INT to VARCHAR
-- order_end_datetime: from VARCHAR to TIMESTAMP
-- order_start_datetime: from VARCHAR to TIMESTAMP
-- patient_id: from VARCHAR to UUID
-- payer_id: from VARCHAR to UUID
-- reason_code: from DECIMAL to VARCHAR
SELECT
"medication_description",
"base_cost",
"payer_coverage_amount",
"dispense_count",
"total_cost",
"reason_description",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("medication_code" AS VARCHAR)
AS "medication_code",
CAST("order_end_datetime" AS TIMESTAMP)
AS "order_end_datetime",
CAST("order_start_datetime" AS TIMESTAMP)
AS "order_start_datetime",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("payer_id" AS UUID)
AS "payer_id",
CAST("reason_code" AS VARCHAR)
AS "reason_code"
FROM "medications_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "medications_renamed_casted"
stg_medications.yml (Document the table)
version: 2
models:
- name: stg_medications
description: The table is about medication orders. It includes patient IDs, payer
information, encounter IDs, and medication details. Each row represents a medication
prescription with its start date, code, description, cost, coverage, and quantity
dispensed. The table tracks both base cost and total cost for each medication.
Some entries have reason codes and descriptions, though these are often empty
in the sample data.
columns:
- name: medication_description
description: Description of the medication
tests:
- not_null
cocoon_meta:
patterns:
- regex: ^[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
summary: Medication name followed by dosage (in MG, UNT, or ACTUAT) and form
(e.g., Oral Tablet, Injection)
- regex: ^\d+\sHR\s[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
summary: Medication with time duration (e.g., 24 HR) at the beginning
- regex: ^\d+\sML\s[A-Za-z\s-]+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT)(/[A-Z]+)?\s[A-Za-z\s]+(\[[\w\s]+\])?$
summary: Medication with volume (e.g., 1 ML) at the beginning
- regex: ^[A-Za-z]+\s\d+\sDay\sPack$
summary: Birth control packs (e.g., 28 Day Pack)
- regex: ^.+\s\d+(\.\d+)?\s(MG|UNT|ACTUAT|ML)(/[A-Z]+)?.*$
summary: Looser pattern to catch most entries
- regex: ^[A-Za-z]+\s\d+\.\d+/\d+\.\d+\sMG\sper\s\d+HR\s\d+\sDay\s[A-Za-z]+\s[A-Za-z]+$
summary: Medication name followed by dosage in MG and administration method
- regex: ^[A-Za-z]+\s\d+\.\d+/\d+\s\d+\sDay\sPack$
summary: Medication name followed by dosage and pack information
- regex: ^[A-Za-z]+/[A-Za-z]+$
summary: Medication names separated by forward slash
data_type:
current_data_type: VARCHAR
- name: base_cost
description: Base cost of the medication
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: payer_coverage_amount
description: Amount covered by the payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: dispense_count
description: Number of times the medication was dispensed
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: total_cost
description: Total cost of the medication
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: reason_description
description: Description of the reason for prescription
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: medication_code
description: Medication code
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: order_end_datetime
description: End date and time of medication order
cocoon_meta:
missing_reason: Ongoing medications don't have an end date.
data_type:
current_data_type: TIMESTAMP
- name: order_start_datetime
description: Start date and time of medication order
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: payer_id
description: Unique identifier for the insurance payer
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: reason_code
description: Code for the reason of prescription
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
tests: []
stg_imaging_studies
Table Preview (first 5 rows)
series_uid | bodysite_description | modality_code | modality_description | instance_uid | sop_code | sop_description | bodysite_code | encounter_id | patient_id | procedure_code | study_datetime | study_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.2.840.99999999.1.63419662.1460344377161 | Clavicle | DX | Digital Radiography | 1.2.840.99999999.1.1.35771508.1103224827588 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 51299004 | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 168594001 | 2016-04-11 03:12:57 | b7b3078a-86e4-fc51-1720-d33afcd4ce88 |
1 | 1.2.840.99999999.1.83549147.1664073844511 | Ankle | DX | Digital Radiography | 1.2.840.99999999.1.1.49048804.1413650359123 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 344001 | 1cedec53-a362-320c-d52a-ddbc57fbce63 | eb247227-e839-88d3-447d-b5972468f33b | 19490002 | 2022-09-25 02:44:04 | 6f1b4250-2e8e-eaaf-f6a7-655d8f8e5791 |
2 | 1.2.840.99999999.1.10943528.675468811165 | Arm | DX | Digital Radiography | 1.2.840.99999999.1.1.93692690.675468811165 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 40983000 | 59f48dfd-f1f6-faa7-f378-150137d46be4 | db80575b-5e9b-921b-fad9-1e3a20929dc7 | 1225002 | 1991-05-28 22:13:31 | 932a8e0d-20d5-573b-db47-8b9db9d3b99c |
3 | 1.2.840.99999999.1.70861523.1413650359123 | Ankle | DX | Digital Radiography | 1.2.840.99999999.1.1.49048804.1413650359123 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 344001 | a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 | 734e5f3c-e660-6cbe-7c26-c5264cbde68e | 19490002 | 2014-10-18 16:39:19 | c16fda8c-2353-75d9-f7c2-d91c773d87de |
4 | 1.2.840.99999999.1.26012892.1385166503634 | Clavicle | DX | Digital Radiography | 1.2.840.99999999.1.1.35771508.1103224827588 | 1.2.840.10008.5.1.4.1.1.1.1 | Digital X-Ray Image Storage | 51299004 | 0148026b-b4db-1663-19ed-a6a51d3c33b2 | 750cdaf4-c264-e967-e76b-53a5a61abcab | 168594001 | 2013-11-23 00:28:23 | ef1ee1a1-e906-6744-421a-ab8480b8b8ae |
stg_imaging_studies.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:01:06.123995+00:00
WITH
"imaging_studies_renamed" AS (
-- Rename: Renaming columns
-- Id -> study_id
-- DATE_ -> study_datetime
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- SERIES_UID -> series_uid
-- BODYSITE_CODE -> bodysite_code
-- BODYSITE_DESCRIPTION -> bodysite_description
-- MODALITY_CODE -> modality_code
-- MODALITY_DESCRIPTION -> modality_description
-- INSTANCE_UID -> instance_uid
-- SOP_CODE -> sop_code
-- SOP_DESCRIPTION -> sop_description
-- PROCEDURE_CODE -> procedure_code
SELECT
"Id" AS "study_id",
"DATE_" AS "study_datetime",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"SERIES_UID" AS "series_uid",
"BODYSITE_CODE" AS "bodysite_code",
"BODYSITE_DESCRIPTION" AS "bodysite_description",
"MODALITY_CODE" AS "modality_code",
"MODALITY_DESCRIPTION" AS "modality_description",
"INSTANCE_UID" AS "instance_uid",
"SOP_CODE" AS "sop_code",
"SOP_DESCRIPTION" AS "sop_description",
"PROCEDURE_CODE" AS "procedure_code"
FROM "memory"."main"."imaging_studies"
),
"imaging_studies_renamed_casted" AS (
-- Column Type Casting:
-- bodysite_code: from INT to VARCHAR
-- encounter_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- procedure_code: from INT to VARCHAR
-- study_datetime: from VARCHAR to TIMESTAMP
-- study_id: from VARCHAR to UUID
SELECT
"series_uid",
"bodysite_description",
"modality_code",
"modality_description",
"instance_uid",
"sop_code",
"sop_description",
CAST("bodysite_code" AS VARCHAR)
AS "bodysite_code",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("procedure_code" AS VARCHAR)
AS "procedure_code",
CAST("study_datetime" AS TIMESTAMP)
AS "study_datetime",
CAST("study_id" AS UUID)
AS "study_id"
FROM "imaging_studies_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "imaging_studies_renamed_casted"
stg_imaging_studies.yml (Document the table)
version: 2
models:
- name: stg_imaging_studies
description: The table is about imaging studies. It contains details of medical
imaging procedures. Each row represents a unique imaging study. It includes the
study ID, date, patient ID, encounter ID, body site, imaging modality, and procedure
codes. The table also contains technical details like series UID, instance UID,
and SOP codes. This data likely comes from a healthcare system's radiology department.
columns:
- name: series_uid
description: Unique identifier for the image series
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for the image series.
For this table, each row is for a unique imaging study. The series_uid appears
to be unique across rows in the sample data.
data_type:
current_data_type: VARCHAR
- name: bodysite_description
description: Description of the body part imaged
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: modality_code
description: Code representing the imaging modality
tests:
- not_null
- accepted_values:
values:
- CR
- DX
- US
- CT
cocoon_meta:
future_accepted_values:
- MR
- NM
- PT
- XA
- RF
- MG
- IO
- PX
- GM
- SM
- XC
- ES
- ECG
- HD
- SR
- OT
- BI
- CD
- DD
- LS
- OPT
data_type:
current_data_type: VARCHAR
- name: modality_description
description: Description of the imaging modality
tests:
- not_null
- accepted_values:
values:
- Computed Radiography
- Digital Radiography
- Ultrasound
- Computed Tomography
cocoon_meta:
future_accepted_values:
- Magnetic Resonance Imaging
- Nuclear Medicine
- Positron Emission Tomography
- Fluoroscopy
- Mammography
- Angiography
- Dual-Energy X-ray Absorptiometry
- Optical Coherence Tomography
data_type:
current_data_type: VARCHAR
- name: instance_uid
description: Unique identifier for the image instance
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: sop_code
description: Service-Object Pair code for DICOM standard
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: sop_description
description: Description of the SOP code
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: bodysite_code
description: Code representing the body part imaged
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: encounter_id
description: Unique identifier for the healthcare encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: procedure_code
description: Code representing the imaging procedure performed
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: study_datetime
description: Date and time of the imaging study
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: study_id
description: Unique identifier for the imaging study
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for the imaging study.
For this table, each row is for a unique imaging study. The study_id appears
to be unique across rows in the sample data.
data_type:
current_data_type: UUID
tests: []
stg_devices
Table Preview (first 5 rows)
device_description | device_udi | device_code | encounter_id | patient_id | usage_end_date | usage_start_date | |
---|---|---|---|---|---|---|---|
0 | Home nebulizer (physical object) | (01)22513743438051(11)041125(17)291210(10)68326934928628(21)359700838 | 170615005 | 7cde5dee-2065-db54-2aca-5335fca85bd5 | 2ffa361e-5858-877e-e022-ce81fe32da1b | NaT | 2004-12-16 19:00:39 |
1 | Manual wheelchair (physical object) | (01)10803398532575(11)220904(17)470919(10)77584230(21)506502350562 | 228869008 | 1cedec53-a362-320c-d52a-ddbc57fbce63 | eb247227-e839-88d3-447d-b5972468f33b | NaT | 2022-09-25 02:44:04 |
2 | Home nebulizer (physical object) | (01)91908954162042(11)030105(17)280120(10)384840111239630881(21)2214828292399 | 170615005 | becb96b4-e1aa-1b80-0aa9-8b06e258871f | d84815a3-c5b3-8ca2-025f-6323a4ec59ef | NaT | 2003-01-26 18:35:55 |
3 | Manual wheelchair (physical object) | (01)75341347423725(11)170329(17)420413(10)91627316426(21)736039710205635138 | 228869008 | 4e5eeece-1a14-8ae0-874f-15d9c6f42a34 | 7ec76836-c039-d9bf-8bb9-fe488c66d452 | 2017-05-15 15:37:58 | 2017-04-19 15:37:58 |
4 | Manual wheelchair (physical object) | (01)44815148141981(11)140927(17)391012(10)395970756765435894(21)565882491188675 | 228869008 | a3f6aeb1-56b5-7e85-9f6f-8eb5ce364257 | 734e5f3c-e660-6cbe-7c26-c5264cbde68e | 2015-01-09 17:11:06 | 2014-10-18 16:39:19 |
stg_devices.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:56:59.986257+00:00
WITH
"devices_renamed" AS (
-- Rename: Renaming columns
-- START -> usage_start_date
-- STOP -> usage_end_date
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> device_code
-- DESCRIPTION -> device_description
-- UDI -> device_udi
SELECT
"START" AS "usage_start_date",
"STOP" AS "usage_end_date",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "device_code",
"DESCRIPTION" AS "device_description",
"UDI" AS "device_udi"
FROM "memory"."main"."devices"
),
"devices_renamed_casted" AS (
-- Column Type Casting:
-- device_code: from INT to VARCHAR
-- encounter_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- usage_end_date: from VARCHAR to TIMESTAMP
-- usage_start_date: from VARCHAR to TIMESTAMP
SELECT
"device_description",
"device_udi",
CAST("device_code" AS VARCHAR)
AS "device_code",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("usage_end_date" AS TIMESTAMP)
AS "usage_end_date",
CAST("usage_start_date" AS TIMESTAMP)
AS "usage_start_date"
FROM "devices_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "devices_renamed_casted"
stg_devices.yml (Document the table)
version: 2
models:
- name: stg_devices
description: The table is about medical devices. It contains start and stop dates
for device usage. Each row represents a device associated with a patient and encounter.
The table includes device codes, descriptions, and unique device identifiers (UDI).
Devices listed are home nebulizers and manual wheelchairs. The UDI contains encoded
information about the device.
columns:
- name: device_description
description: Text description of the device
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: device_udi
description: Unique Device Identifier with encoded information
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column contains the Unique Device Identifier (UDI) for each
medical device. For this table, each row represents a unique device usage
instance associated with a patient and encounter. The device_udi appears to
be unique across rows, as it contains encoded information specific to each
device instance.
patterns:
- regex: \(01\)\d+\(11\)\d+\(17\)\d+\(10\)\d+\(21\)\d+
summary: 'UDI code with 5 segments: (01), (11), (17), (10), and (21)'
data_type:
current_data_type: VARCHAR
- name: device_code
description: Numeric code identifying the device type
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: usage_end_date
description: End date of device usage
cocoon_meta:
missing_reason: Device usage might be ongoing, without an end date.
data_type:
current_data_type: TIMESTAMP
- name: usage_start_date
description: Start date of device usage
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
tests: []
stg_observations
Table Preview (first 5 rows)
CATEGORY | observation_code | DESCRIPTION | UNITS | data_type | encounter_id | observation_datetime | observation_value | patient_id | |
---|---|---|---|---|---|---|---|---|---|
0 | vital-signs | 29463-7 | Body Weight | kg | numeric | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 11.1 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
1 | vital-signs | 59576-9 | Body mass index (BMI) [Percentile] Per age and gender | % | numeric | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 0.1 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
2 | vital-signs | 8480-6 | Systolic Blood Pressure | mmHg | numeric | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 127.0 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
3 | vital-signs | 9279-1 | Respiratory rate | /min | numeric | 8b80cac7-383e-b479-1a05-7d8a9de45d04 | 2013-01-13 01:05:00 | 15.0 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
4 | vital-signs | 72514-3 | Pain severity - 0-10 verbal numeric rating [Score] - Reported | None | numeric | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 2014-01-19 01:05:00 | 1.0 | 3dfb065a-67df-5b8a-3901-49bfd834bed1 |
stg_observations.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:06:20.318190+00:00
WITH
"observations_renamed" AS (
-- Rename: Renaming columns
-- DATE_ -> observation_datetime
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> observation_code
-- VALUE_ -> observation_value
-- TYPE -> data_type
SELECT
"DATE_" AS "observation_datetime",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CATEGORY",
"CODE" AS "observation_code",
"DESCRIPTION",
"VALUE_" AS "observation_value",
"UNITS",
"TYPE" AS "data_type"
FROM "memory"."main"."observations"
),
"observations_renamed_dedup" AS (
-- Deduplication: Removed 556 duplicated rows
SELECT DISTINCT *
FROM "observations_renamed"
),
"observations_renamed_dedup_cleaned" AS (
-- Clean unusual string values:
-- DESCRIPTION: The problem is inconsistent capitalization and punctuation in question-format values, as well as unexplained abbreviations. The correct values should have consistent capitalization (sentence case) and punctuation (question marks for questions), and abbreviated terms should be expanded for clarity.
-- observation_value: '0.0' and 'No' likely represent the same concept. '1.0' and 'Yes' might also be redundant representations.
-- UNITS: The problem is that some values in the UNITS column are not standard unit abbreviations and lack specificity. The unusual values are '{score}', '{nominal}', 'a', '{#}', '{INR}', '{presence}', '{SG}', and '{T-score}'. These are not standardized units of measurement but rather descriptors or placeholders. Additionally, some units have inconsistent formatting, such as '[pH]' vs 'pH', and '[degF]' which should be standardized. The correct values should be standardized units of measurement or left empty if they represent a score or nominal value without a specific unit.
SELECT
"observation_datetime",
"patient_id",
"encounter_id",
"CATEGORY",
"observation_code",
CASE
WHEN "DESCRIPTION" = 'At any point in the past 2 years has season or migrant farm work been your or your family''s main source of income?' THEN 'At any point in the past 2 years, has season or migrant farm work been your or your family''s main source of income?'
WHEN "DESCRIPTION" = 'During the past year what was the total combined income for you and the family members you live with? This information will help us determine if you are eligible for any benefits.' THEN 'During the past year, what was the total combined income for you and the family members you live with? This information will help us determine if you are eligible for any benefits.'
WHEN "DESCRIPTION" = 'Has lack of transportation kept you from medical appointments meetings work or from getting things needed for daily living?' THEN 'Has lack of transportation kept you from medical appointments, meetings, work, or from getting things needed for daily living?'
WHEN "DESCRIPTION" = 'How many family members including yourself do you currently live with?' THEN 'How many family members, including yourself, do you currently live with?'
WHEN "DESCRIPTION" = 'How often do you see or talk to people that you care about and feel close to (For example: talking to friends on the phone visiting friends or family going to church or club meetings)?' THEN 'How often do you see or talk to people that you care about and feel close to? (For example: talking to friends on the phone, visiting friends or family, going to church or club meetings)'
WHEN "DESCRIPTION" = 'In the past year have you been afraid of your partner or ex-partner?' THEN 'In the past year, have you been afraid of your partner or ex-partner?'
WHEN "DESCRIPTION" = 'In the past year have you or any family members you live with been unable to get any of the following when it was really needed?' THEN 'In the past year, have you or any family members you live with been unable to get any of the following when it was really needed?'
WHEN "DESCRIPTION" = 'In the past year have you spent more than 2 nights in a row in a jail prison detention center or juvenile correctional facility?' THEN 'In the past year, have you spent more than 2 nights in a row in a jail, prison, detention center, or juvenile correctional facility?'
WHEN "DESCRIPTION" = 'Stress is when someone feels tense nervous anxious or can''t sleep at night because their mind is troubled. How stressed are you?' THEN 'Stress is when someone feels tense, nervous, anxious or can''t sleep at night because their mind is troubled. How stressed are you?'
WHEN "DESCRIPTION" = 'DALY' THEN 'Disability-Adjusted Life Year (DALY)'
WHEN "DESCRIPTION" = 'QALY' THEN 'Quality-Adjusted Life Year (QALY)'
WHEN "DESCRIPTION" = 'QOLS' THEN 'Quality of Life Scale (QOLS)'
WHEN "DESCRIPTION" = 'Total score [DAST-10]' THEN 'Total score [Drug Abuse Screening Test-10]'
WHEN "DESCRIPTION" = 'Total score [HARK]' THEN 'Total score [Humiliation, Afraid, Rape, Kick Domestic Violence Screening Tool]'
WHEN "DESCRIPTION" = 'Total score [AUDIT-C]' THEN 'Total score [Alcohol Use Disorders Identification Test-Concise]'
WHEN "DESCRIPTION" = 'Patient Health Questionnaire-9: Modified for Teens total score [Reported.PHQ.Teen]' THEN 'Patient Health Questionnaire-9: Modified for Teens total score [PHQ-Teen]'
WHEN "DESCRIPTION" = 'FEV1/FVC' THEN 'FEV1/FVC Ratio [Forced Expiratory Volume in 1 second / Forced Vital Capacity]'
WHEN "DESCRIPTION" = 'Body temperature' THEN 'Body Temperature'
WHEN "DESCRIPTION" = 'INR in Platelet poor plasma by Coagulation assay' THEN 'International Normalized Ratio (INR) in Platelet Poor Plasma by Coagulation Assay'
WHEN "DESCRIPTION" = 'What number best describes how during the past week pain has interfered with your enjoyment of life?' THEN 'Pain Interference with Life Enjoyment (Past Week)'
WHEN "DESCRIPTION" = 'What number best describes how during the past week pain has interfered with your general activity?' THEN 'Pain Interference with General Activity (Past Week)'
WHEN "DESCRIPTION" = 'What number best describes your pain on average in the past week?' THEN 'Average Pain Level (Past Week)'
WHEN "DESCRIPTION" = 'Head Occipital-frontal circumference' THEN 'Head Occipital-Frontal Circumference'
WHEN "DESCRIPTION" = 'Head Occipital-frontal circumference Percentile' THEN 'Head Occipital-Frontal Circumference Percentile'
WHEN "DESCRIPTION" = 'Weight-for-length Per age and sex' THEN 'Weight-for-Length Percentile by Age and Sex'
WHEN "DESCRIPTION" = 'aPTT in Blood by Coagulation assay' THEN 'Activated Partial Thromboplastin Time (aPTT) in Blood by Coagulation Assay'
WHEN "DESCRIPTION" = 'Bicarbonate [Moles/volume] in Arterial blood' THEN 'Bicarbonate [Moles/Volume] in Arterial Blood'
WHEN "DESCRIPTION" = 'Carbon dioxide [Partial pressure] in Arterial blood' THEN 'Carbon Dioxide [Partial Pressure] in Arterial Blood'
WHEN "DESCRIPTION" = 'Oxygen [Partial pressure] in Arterial blood' THEN 'Oxygen [Partial Pressure] in Arterial Blood'
WHEN "DESCRIPTION" = 'pH of Arterial blood' THEN 'pH of Arterial Blood'
WHEN "DESCRIPTION" = 'Phosphate [Mass/volume] in Serum or Plasma' THEN 'Phosphate [Mass/Volume] in Serum or Plasma'
WHEN "DESCRIPTION" = 'Prothrombin time (PT)' THEN 'Prothrombin Time (PT)'
WHEN "DESCRIPTION" = 'Patient Health Questionnaire 9 item (PHQ-9) total score [Reported]' THEN 'Patient Health Questionnaire-9 (PHQ-9) Total Score [Reported]'
WHEN "DESCRIPTION" = 'Troponin I.cardiac [Mass/volume] in Serum or Plasma by High sensitivity method' THEN 'Cardiac Troponin I [Mass/Volume] in Serum or Plasma by High Sensitivity Method'
WHEN "DESCRIPTION" = 'Drugs of abuse 5 panel - Urine by Screen method' THEN 'Drugs of Abuse 5-Panel - Urine Screen'
WHEN "DESCRIPTION" = 'Iron binding capacity [Mass/volume] in Serum or Plasma' THEN 'Iron Binding Capacity [Mass/Volume] in Serum or Plasma'
WHEN "DESCRIPTION" = 'Iron saturation [Mass Fraction] in Serum or Plasma' THEN 'Iron Saturation [Mass Fraction] in Serum or Plasma'
WHEN "DESCRIPTION" = 'NT-proBNP' THEN 'N-Terminal Pro-B-Type Natriuretic Peptide (NT-proBNP)'
WHEN "DESCRIPTION" = 'Total score [MMSE]' THEN 'Total Score [Mini-Mental State Examination (MMSE)]'
WHEN "DESCRIPTION" = 'Natriuretic peptide.B prohormone N-Terminal [Mass/volume] in Blood by Immunoassay' THEN 'N-Terminal Pro-B-Type Natriuretic Peptide [Mass/Volume] in Blood by Immunoassay'
WHEN "DESCRIPTION" = 'Left ventricular Ejection fraction' THEN 'Left Ventricular Ejection Fraction'
WHEN "DESCRIPTION" = 'Operative Status Value' THEN 'Operative Status'
WHEN "DESCRIPTION" = 'American house dust mite IgE Ab in Serum' THEN 'American House Dust Mite IgE Antibody in Serum'
ELSE "DESCRIPTION"
END AS "DESCRIPTION",
"observation_value",
CASE
WHEN "UNITS" = '{score}' THEN NULL
WHEN "UNITS" = 'mm[Hg]' THEN 'mmHg'
WHEN "UNITS" = 'a' THEN '/year'
WHEN "UNITS" = '{nominal}' THEN NULL
WHEN "UNITS" = 'kg/m2' THEN 'kg/m²'
WHEN "UNITS" = '{#}' THEN NULL
WHEN "UNITS" = '10*3/uL' THEN '10³/µL'
WHEN "UNITS" = '/a' THEN '/year'
WHEN "UNITS" = '10*6/uL' THEN '10⁶/µL'
WHEN "UNITS" = 'Cel' THEN '°C'
WHEN "UNITS" = '{INR}' THEN NULL
WHEN "UNITS" = 'mL/min/{1.73_m2}' THEN 'mL/min/1.73m²'
WHEN "UNITS" = '[pH]' THEN 'pH'
WHEN "UNITS" = '{presence}' THEN NULL
WHEN "UNITS" = 'ug/dL' THEN 'µg/dL'
WHEN "UNITS" = 'ug/L' THEN 'µg/L'
WHEN "UNITS" = '[degF]' THEN '°F'
WHEN "UNITS" = 'm[IU]/L' THEN 'mIU/L'
WHEN "UNITS" = '{SG}' THEN NULL
WHEN "UNITS" = '{T-score}' THEN NULL
ELSE "UNITS"
END AS "UNITS",
"data_type"
FROM "observations_renamed_dedup"
),
"observations_renamed_dedup_cleaned_casted" AS (
-- Column Type Casting:
-- encounter_id: from VARCHAR to UUID
-- observation_datetime: from VARCHAR to TIMESTAMP
-- observation_value: from VARCHAR to DECIMAL
-- patient_id: from VARCHAR to UUID
SELECT
"CATEGORY",
"observation_code",
"DESCRIPTION",
"UNITS",
"data_type",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("observation_datetime" AS TIMESTAMP)
AS "observation_datetime",
"observation_value"
AS "observation_value",
CAST("patient_id" AS UUID)
AS "patient_id"
FROM "observations_renamed_dedup_cleaned"
)
-- COCOON BLOCK END
SELECT *
FROM "observations_renamed_dedup_cleaned_casted"
stg_observations.yml (Document the table)
version: 2
models:
- name: stg_observations
description: The table is about medical observations. It contains details of patient
health measurements. Each row represents a single observation. The table includes
the date, patient ID, encounter ID, category, measurement code, description, value,
units, and type of measurement. Observations cover vital signs like body height,
weight, and pain severity scores.
columns:
- name: CATEGORY
description: Category of the medical observation
tests:
- accepted_values:
values:
- laboratory
- survey
- vital-signs
- social-history
- procedure
- exam
- therapy
cocoon_meta:
missing_reason: Unknown
future_accepted_values:
- medication
- imaging
- diagnosis
- allergy
- immunization
- family-history
data_type:
current_data_type: VARCHAR
- name: observation_code
description: Standardized code for the medical observation
tests:
- not_null
cocoon_meta:
patterns:
- regex: \d{5,7}-\d
summary: LOINC codes (5-7 digits, hyphen, 1 digit)
- regex: '[A-Z]{3,4}'
summary: Short alphabetic codes (3-4 uppercase letters)
- regex: '[A-Za-z0-9-]+'
summary: Alphanumeric codes (mix of digits and letters)
data_type:
current_data_type: VARCHAR
- name: DESCRIPTION
description: Description of the medical observation
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: UNITS
description: Units of measurement for the observation
cocoon_meta:
missing_reason: Not all vital signs or measurements require units.
data_type:
current_data_type: VARCHAR
- name: data_type
description: Type of data for the observation
tests:
- not_null
- accepted_values:
values:
- numeric
- text
cocoon_meta:
future_accepted_values:
- categorical
- boolean
- date
- datetime
- time
data_type:
current_data_type: VARCHAR
- name: encounter_id
description: Unique identifier for the medical encounter
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: UUID
- name: observation_datetime
description: Date and time of the medical observation
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: observation_value
description: Numeric value of the medical observation
tests:
- not_null
cocoon_meta:
unusual_values: '''0.0'' and ''No'' likely represent the same concept. ''1.0''
and ''Yes'' might also be redundant representations.'
patterns:
- regex: ^\d+(\.\d+)?$
summary: Floating point numbers (including whole numbers)
- regex: ^(Yes|No)$
summary: Yes/No values
- regex: ^(3 to 5 times a week|5 or more times a week)$
summary: Frequency descriptions
- regex: ^(More than high school|High school diploma or GED)$
summary: Education levels
- regex: ^(Private insurance|Medicare|Medicaid)$
summary: Insurance types
- regex: ^(Full-time work|Part-time or temporary work)$
summary: Employment statuses
- regex: ^(Finding of .+|\w+.+\(finding\))$
summary: Medical findings (starting with "Finding of" or ending with "(finding)")
- regex: ^.+\(qualifier value\)$
summary: Qualifiers (ending with "(qualifier value)")
data_type:
current_data_type: VARCHAR
expected_data_type: DECIMAL
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
tests: []
stg_conditions
Table Preview (first 5 rows)
condition_description | condition_code | condition_end_date | condition_start_date | encounter_id | patient_id | |
---|---|---|---|---|---|---|
0 | Received certificate of high school equivalency (finding) | 5251000175109 | NaT | 1962-07-25 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 2ffa361e-5858-877e-e022-ce81fe32da1b |
1 | Full-time employment (finding) | 160903007 | 1963-07-31 | 1962-07-25 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 2ffa361e-5858-877e-e022-ce81fe32da1b |
2 | Full-time employment (finding) | 160903007 | 1966-08-03 | 1963-07-31 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | 2ffa361e-5858-877e-e022-ce81fe32da1b |
3 | Full-time employment (finding) | 160903007 | 1969-08-06 | 1966-08-03 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 2ffa361e-5858-877e-e022-ce81fe32da1b |
4 | Full-time employment (finding) | 160903007 | 1975-08-13 | 1972-08-09 | 0f6ff0c6-c365-84f5-6742-226325f0ab22 | 2ffa361e-5858-877e-e022-ce81fe32da1b |
stg_conditions.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:55:38.865223+00:00
WITH
"conditions_renamed" AS (
-- Rename: Renaming columns
-- START -> condition_start_date
-- STOP -> condition_end_date
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> condition_code
-- DESCRIPTION -> condition_description
SELECT
"START" AS "condition_start_date",
"STOP" AS "condition_end_date",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "condition_code",
"DESCRIPTION" AS "condition_description"
FROM "memory"."main"."conditions"
),
"conditions_renamed_casted" AS (
-- Column Type Casting:
-- condition_code: from INT to VARCHAR
-- condition_end_date: from VARCHAR to DATE
-- condition_start_date: from VARCHAR to DATE
-- encounter_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
SELECT
"condition_description",
CAST("condition_code" AS VARCHAR)
AS "condition_code",
CAST("condition_end_date" AS DATE)
AS "condition_end_date",
CAST("condition_start_date" AS DATE)
AS "condition_start_date",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("patient_id" AS UUID)
AS "patient_id"
FROM "conditions_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "conditions_renamed_casted"
stg_conditions.yml (Document the table)
version: 2
models:
- name: stg_conditions
description: The table is about patient conditions and employment status. It shows
start and end dates for conditions. Each row links a patient to an encounter and
a condition code. The conditions include high school equivalency and full-time
employment. The same patient has multiple entries over different time periods.
The table tracks changes in employment status for the patient across years.
columns:
- name: condition_description
description: Text description of the condition or status
tests:
- not_null
cocoon_meta:
patterns:
- regex: .+\s\((finding|disorder)\)$
summary: Entries ending with "(finding)" or "(disorder)"
- regex: ([A-Z][a-z]+\s?)+(\([a-z]+\))?
summary: Capitalized phrases (allowing for some lowercase words)
data_type:
current_data_type: VARCHAR
- name: condition_code
description: Standardized code for the condition or status
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: condition_end_date
description: End date of the condition or status
cocoon_meta:
missing_reason: Ongoing conditions or one-time events don't have end dates.
data_type:
current_data_type: DATE
- name: condition_start_date
description: Start date of the condition or status
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DATE
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
tests: []
stg_claims_transactions
Table Preview (first 5 rows)
service_description | patient_insurance_id | outstanding_balance | charge_id | service_units | department_id | transfer_type | payment_amount | unit_cost | adjustment_amount | transaction_amount | transaction_type | payment_method | transfer_amount | fee_schedule_id | appointment_id | claim_id | line_note | patient_id | primary_diagnosis_ref | procedure_code | procedure_modifier_1 | procedure_modifier_2 | provider_id | quaternary_diagnosis_ref | secondary_diagnosis_ref | service_end_datetime | service_location_id | service_start_datetime | supervising_provider_id | tertiary_diagnosis_ref | transaction_id | transfer_out_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | General examination of patient (procedure) | None | 0.0 | 0 | 1 | 20 | 1 | 0.0 | 136.8 | 0.0 | 136.8 | CHARGE | None | NaN | 1 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1 | 162673000 | None | None | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | 2.0 | 1962-07-25 12:15:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 1962-07-25 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | da18b712-59be-0489-1ba0-4830aef61968 | None |
1 | General examination of patient (procedure) | None | 0.0 | 1 | 1 | 20 | None | 136.8 | 0.0 | 0.0 | NaN | PAYMENT | CHECK | NaN | 1 | c6600278-5c90-c0c7-db52-140b3b98d1e8 | 9f4426b7-f32c-2b00-8ab4-1152cd0915ce | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1 | 162673000 | None | None | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | 2.0 | 1962-07-25 12:15:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 1962-07-25 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | 907b5d5f-4d34-385d-8fd9-564e782a8743 | None |
2 | General examination of patient (procedure) | None | 0.0 | 3 | 1 | 20 | 1 | 0.0 | 136.8 | 0.0 | 136.8 | CHARGE | None | NaN | 1 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1 | 162673000 | None | None | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | 1963-07-31 12:15:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 1963-07-31 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | bb0601d7-9d50-d1cc-488d-7e29cd95b49c | None |
3 | General examination of patient (procedure) | None | 0.0 | 4 | 1 | 20 | None | 136.8 | 0.0 | 0.0 | NaN | PAYMENT | CASH | NaN | 1 | 5d6bce0a-93b6-1c60-4399-97a717a3f843 | e22515cc-2422-7aac-49bd-ecc3a2ffdf86 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1 | 162673000 | None | None | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | 1963-07-31 12:15:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 1963-07-31 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | 555840f3-d0fc-4bb2-52fc-477e525e9ff1 | None |
4 | General examination of patient (procedure) | None | 0.0 | 6 | 1 | 20 | 1 | 0.0 | 136.8 | 0.0 | 136.8 | CHARGE | None | NaN | 1 | 829830eb-ca6e-ae6a-99a2-728751b0ea8e | 7ac8ce29-5517-d3d5-5004-f11942ae3225 | None | 2ffa361e-5858-877e-e022-ce81fe32da1b | 1 | 162673000 | None | None | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | None | 1966-08-03 12:15:39 | f1c6e386-ed52-3f97-a2b0-4e21ad2d0966 | 1966-08-03 12:00:39 | 5ceef3de-8e50-3c8f-8224-d277539f4100 | None | 7e46adfb-5b4f-4454-037a-2d834eefc4e8 | None |
stg_claims_transactions.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 00:54:34.603382+00:00
WITH
"claims_transactions_renamed" AS (
-- Rename: Renaming columns
-- ID -> transaction_id
-- CLAIMID -> claim_id
-- CHARGEID -> charge_id
-- PATIENTID -> patient_id
-- TYPE -> transaction_type
-- AMOUNT -> transaction_amount
-- METHOD -> payment_method
-- FROMDATE -> service_start_datetime
-- TODATE -> service_end_datetime
-- PLACEOFSERVICE -> service_location_id
-- PROCEDURECODE -> procedure_code
-- MODIFIER1 -> procedure_modifier_1
-- MODIFIER2 -> procedure_modifier_2
-- DIAGNOSISREF1 -> primary_diagnosis_ref
-- DIAGNOSISREF2 -> secondary_diagnosis_ref
-- DIAGNOSISREF3 -> tertiary_diagnosis_ref
-- DIAGNOSISREF4 -> quaternary_diagnosis_ref
-- UNITS -> service_units
-- DEPARTMENTID -> department_id
-- NOTES -> service_description
-- UNITAMOUNT -> unit_cost
-- TRANSFEROUTID -> transfer_out_id
-- TRANSFERTYPE -> transfer_type
-- PAYMENTS -> payment_amount
-- ADJUSTMENTS -> adjustment_amount
-- TRANSFERS -> transfer_amount
-- OUTSTANDING -> outstanding_balance
-- APPOINTMENTID -> appointment_id
-- LINENOTE -> line_note
-- PATIENTINSURANCEID -> patient_insurance_id
-- FEESCHEDULEID -> fee_schedule_id
-- PROVIDERID -> provider_id
-- SUPERVISINGPROVIDERID -> supervising_provider_id
SELECT
"ID" AS "transaction_id",
"CLAIMID" AS "claim_id",
"CHARGEID" AS "charge_id",
"PATIENTID" AS "patient_id",
"TYPE" AS "transaction_type",
"AMOUNT" AS "transaction_amount",
"METHOD" AS "payment_method",
"FROMDATE" AS "service_start_datetime",
"TODATE" AS "service_end_datetime",
"PLACEOFSERVICE" AS "service_location_id",
"PROCEDURECODE" AS "procedure_code",
"MODIFIER1" AS "procedure_modifier_1",
"MODIFIER2" AS "procedure_modifier_2",
"DIAGNOSISREF1" AS "primary_diagnosis_ref",
"DIAGNOSISREF2" AS "secondary_diagnosis_ref",
"DIAGNOSISREF3" AS "tertiary_diagnosis_ref",
"DIAGNOSISREF4" AS "quaternary_diagnosis_ref",
"UNITS" AS "service_units",
"DEPARTMENTID" AS "department_id",
"NOTES" AS "service_description",
"UNITAMOUNT" AS "unit_cost",
"TRANSFEROUTID" AS "transfer_out_id",
"TRANSFERTYPE" AS "transfer_type",
"PAYMENTS" AS "payment_amount",
"ADJUSTMENTS" AS "adjustment_amount",
"TRANSFERS" AS "transfer_amount",
"OUTSTANDING" AS "outstanding_balance",
"APPOINTMENTID" AS "appointment_id",
"LINENOTE" AS "line_note",
"PATIENTINSURANCEID" AS "patient_insurance_id",
"FEESCHEDULEID" AS "fee_schedule_id",
"PROVIDERID" AS "provider_id",
"SUPERVISINGPROVIDERID" AS "supervising_provider_id"
FROM "memory"."main"."claims_transactions"
),
"claims_transactions_renamed_cleaned" AS (
-- Clean unusual string values:
-- service_description: 'Encounter for problem' and 'Encounter for problem (procedure)' are redundant. Inconsistent use of '(procedure)' suffix.
-- transfer_type: The problem is that the transfer_type column contains a mix of numeric and alphabetic values. The value 'p' is unusual because it's alphabetic while the others ('1' and '2') are numeric. Without more context, it's unclear what 'p' represents. However, since '1' is the most frequent value and the column seems to be intended for numeric codes, it's reasonable to map 'p' to '1'.
SELECT
"transaction_id",
"claim_id",
"charge_id",
"patient_id",
"transaction_type",
"transaction_amount",
"payment_method",
"service_start_datetime",
"service_end_datetime",
"service_location_id",
"procedure_code",
"procedure_modifier_1",
"procedure_modifier_2",
"primary_diagnosis_ref",
"secondary_diagnosis_ref",
"tertiary_diagnosis_ref",
"quaternary_diagnosis_ref",
"service_units",
"department_id",
"service_description",
"unit_cost",
"transfer_out_id",
CASE
WHEN "transfer_type" = 'p' THEN '1'
ELSE "transfer_type"
END AS "transfer_type",
"payment_amount",
"adjustment_amount",
"transfer_amount",
"outstanding_balance",
"appointment_id",
"line_note",
"patient_insurance_id",
"fee_schedule_id",
"provider_id",
"supervising_provider_id"
FROM "claims_transactions_renamed"
),
"claims_transactions_renamed_cleaned_null" AS (
-- NULL Imputation: Impute Null to Disguised Missing Values
-- service_end_datetime: ['1970-01-01T00:00:00Z']
SELECT
CASE
WHEN "service_end_datetime" = '1970-01-01T00:00:00Z' THEN NULL
ELSE "service_end_datetime"
END AS "service_end_datetime",
"service_description",
"patient_insurance_id",
"quaternary_diagnosis_ref",
"claim_id",
"procedure_code",
"outstanding_balance",
"provider_id",
"charge_id",
"appointment_id",
"primary_diagnosis_ref",
"line_note",
"patient_id",
"service_units",
"department_id",
"transfer_type",
"payment_amount",
"unit_cost",
"service_start_datetime",
"adjustment_amount",
"transaction_id",
"service_location_id",
"transaction_amount",
"supervising_provider_id",
"procedure_modifier_2",
"tertiary_diagnosis_ref",
"transaction_type",
"secondary_diagnosis_ref",
"payment_method",
"procedure_modifier_1",
"transfer_amount",
"transfer_out_id",
"fee_schedule_id"
FROM "claims_transactions_renamed_cleaned"
),
"claims_transactions_renamed_cleaned_null_casted" AS (
-- Column Type Casting:
-- appointment_id: from VARCHAR to UUID
-- claim_id: from VARCHAR to UUID
-- line_note: from DECIMAL to VARCHAR
-- patient_id: from VARCHAR to UUID
-- primary_diagnosis_ref: from INT to VARCHAR
-- procedure_code: from INT to VARCHAR
-- procedure_modifier_1: from DECIMAL to VARCHAR
-- procedure_modifier_2: from DECIMAL to VARCHAR
-- provider_id: from VARCHAR to UUID
-- quaternary_diagnosis_ref: from DECIMAL to VARCHAR
-- secondary_diagnosis_ref: from DECIMAL to VARCHAR
-- service_end_datetime: from VARCHAR to TIMESTAMP
-- service_location_id: from VARCHAR to UUID
-- service_start_datetime: from VARCHAR to TIMESTAMP
-- supervising_provider_id: from VARCHAR to UUID
-- tertiary_diagnosis_ref: from DECIMAL to VARCHAR
-- transaction_id: from VARCHAR to UUID
-- transfer_out_id: from DECIMAL to VARCHAR
SELECT
"service_description",
"patient_insurance_id",
"outstanding_balance",
"charge_id",
"service_units",
"department_id",
"transfer_type",
"payment_amount",
"unit_cost",
"adjustment_amount",
"transaction_amount",
"transaction_type",
"payment_method",
"transfer_amount",
"fee_schedule_id",
CAST("appointment_id" AS UUID)
AS "appointment_id",
CAST("claim_id" AS UUID)
AS "claim_id",
CAST("line_note" AS VARCHAR)
AS "line_note",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("primary_diagnosis_ref" AS VARCHAR)
AS "primary_diagnosis_ref",
CAST("procedure_code" AS VARCHAR)
AS "procedure_code",
CAST("procedure_modifier_1" AS VARCHAR)
AS "procedure_modifier_1",
CAST("procedure_modifier_2" AS VARCHAR)
AS "procedure_modifier_2",
CAST("provider_id" AS UUID)
AS "provider_id",
CAST("quaternary_diagnosis_ref" AS VARCHAR)
AS "quaternary_diagnosis_ref",
CAST("secondary_diagnosis_ref" AS VARCHAR)
AS "secondary_diagnosis_ref",
CAST("service_end_datetime" AS TIMESTAMP)
AS "service_end_datetime",
CAST("service_location_id" AS UUID)
AS "service_location_id",
CAST("service_start_datetime" AS TIMESTAMP)
AS "service_start_datetime",
CAST("supervising_provider_id" AS UUID)
AS "supervising_provider_id",
CAST("tertiary_diagnosis_ref" AS VARCHAR)
AS "tertiary_diagnosis_ref",
CAST("transaction_id" AS UUID)
AS "transaction_id",
CAST("transfer_out_id" AS VARCHAR)
AS "transfer_out_id"
FROM "claims_transactions_renamed_cleaned_null"
)
-- COCOON BLOCK END
SELECT *
FROM "claims_transactions_renamed_cleaned_null_casted"
stg_claims_transactions.yml (Document the table)
version: 2
models:
- name: stg_claims_transactions
description: The table represents claims transactions in a healthcare system. It
includes charges and payments for medical services. Each row is a transaction
linked to a claim. Transactions have details like patient ID, procedure code,
amount, and payment method. The table connects patients, providers, and financial
aspects of healthcare services.
columns:
- name: service_description
description: Description of the procedure or service
tests:
- not_null
cocoon_meta:
unusual_values: '''Encounter for problem'' and ''Encounter for problem (procedure)''
are redundant. Inconsistent use of ''(procedure)'' suffix.'
data_type:
current_data_type: VARCHAR
- name: patient_insurance_id
description: Identifier for patient's insurance
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: outstanding_balance
description: Remaining balance to be paid
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: charge_id
description: Identifier for the specific charge within a claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: service_units
description: Number of units of service provided
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: department_id
description: Identifier for the department providing the service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: transfer_type
description: Type of transfer (e.g., 'p' for payment)
tests:
- accepted_values:
values:
- '1'
- '2'
cocoon_meta:
missing_reason: Not applicable when there's no transfer involved
future_accepted_values:
- p
- t
- w
- d
- r
- f
data_type:
current_data_type: VARCHAR
- name: payment_amount
description: Amount paid for the service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: unit_cost
description: Cost per unit of service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: adjustment_amount
description: Adjustments made to the charge
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: transaction_amount
description: Monetary value of the transaction
cocoon_meta:
missing_reason: Not applicable for PAYMENT transactions, only for CHARGES
data_type:
current_data_type: DECIMAL
- name: transaction_type
description: Type of transaction (e.g., CHARGE, PAYMENT)
tests:
- not_null
- accepted_values:
values:
- PAYMENT
- CHARGE
- TRANSFERIN
- TRANSFEROUT
cocoon_meta:
future_accepted_values:
- REFUND
- WITHDRAWAL
- DEPOSIT
- INTEREST
- FEE
- ADJUSTMENT
- REVERSAL
- CASHBACK
- CREDIT
- DEBIT
data_type:
current_data_type: VARCHAR
- name: payment_method
description: Payment method used (e.g., CHECK, CASH)
tests:
- accepted_values:
values:
- ECHECK
- COPAY
- CASH
- CC
- CHECK
cocoon_meta:
missing_reason: Not applicable for CHARGE transactions, only for PAYMENTS
future_accepted_values:
- DEBIT
- WIRE
- ACH
- MONEY ORDER
- PAYPAL
- VENMO
- APPLE PAY
- GOOGLE PAY
- CRYPTOCURRENCY
- GIFT CARD
- STORE CREDIT
- LOYALTY POINTS
data_type:
current_data_type: VARCHAR
- name: transfer_amount
description: Amount transferred between accounts
cocoon_meta:
missing_reason: Not applicable when there's no transfer involved
data_type:
current_data_type: DECIMAL
- name: fee_schedule_id
description: Identifier for the fee schedule used
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: INT
- name: appointment_id
description: Identifier for the associated appointment
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: claim_id
description: Identifier for the associated claim
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: line_note
description: Additional notes for the transaction line
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: primary_diagnosis_ref
description: Primary diagnosis reference
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: procedure_code
description: Code identifying the medical procedure performed
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: procedure_modifier_1
description: First modifier for the procedure code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: procedure_modifier_2
description: Second modifier for the procedure code
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: provider_id
description: Unique identifier for the healthcare provider
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: quaternary_diagnosis_ref
description: Quaternary diagnosis reference
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: secondary_diagnosis_ref
description: Secondary diagnosis reference
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: service_end_datetime
description: End date and time of the service
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: TIMESTAMP
- name: service_location_id
description: Identifier for the location of service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: service_start_datetime
description: Start date and time of the service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: supervising_provider_id
description: ID of supervising provider for the service
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: tertiary_diagnosis_ref
description: Tertiary diagnosis reference
cocoon_meta:
missing_reason: Unknown
data_type:
current_data_type: VARCHAR
- name: transaction_id
description: Unique identifier for the transaction
tests:
- not_null
- unique
cocoon_meta:
unique_reason: This column represents a unique identifier for each transaction.
For this table, each row is a distinct transaction (either a charge or a payment).
The transaction_id appears to be unique for each row, as it's different for
every sample provided.
data_type:
current_data_type: UUID
- name: transfer_out_id
description: Identifier for transfer out transactions
cocoon_meta:
missing_reason: Not applicable when there's no transfer involved
data_type:
current_data_type: VARCHAR
tests: []
stg_procedures
Table Preview (first 5 rows)
procedure_description | procedure_cost | reason_description | encounter_id | patient_id | procedure_code | procedure_end_time | procedure_start_time | reason_code | |
---|---|---|---|---|---|---|---|---|---|
0 | Medication Reconciliation (procedure) | 493.03 | None | 8b1ef1fb-3297-bf02-8442-f68af013a64c | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 430193006 | 2014-01-19 01:20:00 | 2014-01-19 01:05:00 | None |
1 | Medication Reconciliation (procedure) | 523.74 | None | 79b413f6-c668-bf41-bce7-b403c40cf7a4 | eb247227-e839-88d3-447d-b5972468f33b | 430193006 | 2021-09-23 02:59:04 | 2021-09-23 02:44:04 | None |
2 | Medication Reconciliation (procedure) | 536.16 | None | a79858c9-a74e-8ca3-c9a0-b63897e8fc05 | eb247227-e839-88d3-447d-b5972468f33b | 430193006 | 2021-10-28 02:59:04 | 2021-10-28 02:44:04 | None |
3 | Clavicle X-ray | 431.40 | None | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 168594001 | 2016-04-11 03:42:57 | 2016-04-11 03:12:57 | None |
4 | Admission to orthopedic department | 431.40 | Fracture of clavicle | 1600b01f-01b3-fd15-135f-c7c4a30d867f | 3dfb065a-67df-5b8a-3901-49bfd834bed1 | 305428000 | 2016-04-11 04:55:57 | 2016-04-11 03:12:57 | 58150001.0 |
stg_procedures.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
-- Generated at 2024-08-24 01:14:58.976415+00:00
WITH
"procedures_renamed" AS (
-- Rename: Renaming columns
-- START -> procedure_start_time
-- STOP -> procedure_end_time
-- PATIENT -> patient_id
-- ENCOUNTER -> encounter_id
-- CODE -> procedure_code
-- DESCRIPTION -> procedure_description
-- BASE_COST -> procedure_cost
-- REASONCODE -> reason_code
-- REASONDESCRIPTION -> reason_description
SELECT
"START" AS "procedure_start_time",
"STOP" AS "procedure_end_time",
"PATIENT" AS "patient_id",
"ENCOUNTER" AS "encounter_id",
"CODE" AS "procedure_code",
"DESCRIPTION" AS "procedure_description",
"BASE_COST" AS "procedure_cost",
"REASONCODE" AS "reason_code",
"REASONDESCRIPTION" AS "reason_description"
FROM "memory"."main"."procedures"
),
"procedures_renamed_casted" AS (
-- Column Type Casting:
-- encounter_id: from VARCHAR to UUID
-- patient_id: from VARCHAR to UUID
-- procedure_code: from INT to VARCHAR
-- procedure_end_time: from VARCHAR to TIMESTAMP
-- procedure_start_time: from VARCHAR to TIMESTAMP
-- reason_code: from DECIMAL to VARCHAR
SELECT
"procedure_description",
"procedure_cost",
"reason_description",
CAST("encounter_id" AS UUID)
AS "encounter_id",
CAST("patient_id" AS UUID)
AS "patient_id",
CAST("procedure_code" AS VARCHAR)
AS "procedure_code",
CAST("procedure_end_time" AS TIMESTAMP)
AS "procedure_end_time",
CAST("procedure_start_time" AS TIMESTAMP)
AS "procedure_start_time",
CAST("reason_code" AS VARCHAR)
AS "reason_code"
FROM "procedures_renamed"
)
-- COCOON BLOCK END
SELECT *
FROM "procedures_renamed_casted"
stg_procedures.yml (Document the table)
version: 2
models:
- name: stg_procedures
description: The table is about medical procedures performed on patients. It includes
details like start and stop times, patient ID, encounter ID, procedure code and
description, base cost, and reason codes. Each row represents a specific procedure
performed on a patient during a medical encounter. The table links patients, encounters,
and procedures, providing a comprehensive view of medical services rendered.
columns:
- name: procedure_description
description: Brief description of the medical procedure
tests:
- not_null
cocoon_meta:
patterns:
- regex: .+\s\(procedure\)$
summary: Procedures ending with "(procedure)"
- regex: .+\s\(regime\/therapy\)$
summary: Regimes or therapies ending with "(regime/therapy)"
- regex: .+\s\([a-z\/]+\)$
summary: General pattern for procedures or regimes in parentheses
data_type:
current_data_type: VARCHAR
- name: procedure_cost
description: Base cost of the medical procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: DECIMAL
- name: reason_description
description: Description of the reason for the procedure
cocoon_meta:
missing_reason: Not applicable for procedures without specific medical reasons.
data_type:
current_data_type: VARCHAR
- name: encounter_id
description: Unique identifier for the medical encounter
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: patient_id
description: Unique identifier for the patient
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: UUID
- name: procedure_code
description: Numerical code for the medical procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: VARCHAR
- name: procedure_end_time
description: End time of the medical procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: procedure_start_time
description: Start time of the medical procedure
tests:
- not_null
cocoon_meta:
data_type:
current_data_type: TIMESTAMP
- name: reason_code
description: Code indicating the reason for the procedure
cocoon_meta:
missing_reason: Not applicable for procedures without specific medical reasons.
data_type:
current_data_type: VARCHAR
tests: []
Join Graph (FK to PK)
cocoon_join.yml (Document the joins)
join_graph:
- table_name: stg_encounters
primary_key: encounter_id
foreign_keys:
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
- column: payer_id
reference:
table_name: stg_payers
column: payer_id
- column: provider_id
reference:
table_name: stg_providers
column: provider_id
- column: organization_id
reference:
table_name: stg_organizations
column: organization_id
time_keys:
- encounter_end
- encounter_start
- table_name: stg_allergies
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- allergy_end_date
- allergy_start_date
- table_name: stg_careplans
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- end_date
- start_date
- table_name: stg_conditions
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- condition_end_date
- condition_start_date
- table_name: stg_devices
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- usage_end_date
- usage_start_date
- table_name: stg_imaging_studies
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- study_datetime
- table_name: stg_immunizations
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- immunization_datetime
- table_name: stg_medications
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
- column: payer_id
reference:
table_name: stg_payers
column: payer_id
time_keys:
- order_end_datetime
- order_start_datetime
- table_name: stg_observations
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- observation_datetime
- table_name: stg_procedures
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- procedure_end_time
- procedure_start_time
- table_name: stg_supplies
foreign_keys:
- column: encounter_id
reference:
table_name: stg_encounters
column: encounter_id
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
time_keys:
- order_date
- table_name: stg_patients
primary_key: patient_id
foreign_keys: []
time_keys:
- birth_date
- death_date
- table_name: stg_claims
foreign_keys:
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
- column: primary_insurance_id
reference:
table_name: stg_payers
column: payer_id
- column: secondary_insurance_id
reference:
table_name: stg_payers
column: payer_id
- column: provider_id
reference:
table_name: stg_providers
column: provider_id
- column: referring_provider_id
reference:
table_name: stg_providers
column: provider_id
- column: supervising_provider_id
reference:
table_name: stg_providers
column: provider_id
primary_key: claim_id
time_keys:
- illness_onset_date
- patient_last_billed_date
- primary_last_billed_date
- secondary_last_billed_date
- service_date
- table_name: stg_claims_transactions
foreign_keys:
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
- column: provider_id
reference:
table_name: stg_providers
column: provider_id
- column: supervising_provider_id
reference:
table_name: stg_providers
column: provider_id
- column: claim_id
reference:
table_name: stg_claims
column: claim_id
time_keys:
- service_end_datetime
- service_start_datetime
- table_name: stg_payer_transitions
foreign_keys:
- column: patient_id
reference:
table_name: stg_patients
column: patient_id
- column: primary_payer_id
reference:
table_name: stg_payers
column: payer_id
- column: secondary_payer_id
reference:
table_name: stg_payers
column: payer_id
time_keys:
- coverage_end_date
- coverage_start_date
- table_name: stg_payers
primary_key: payer_id
foreign_keys: []
- table_name: stg_providers
primary_key: provider_id
foreign_keys:
- column: organization_id
reference:
table_name: stg_organizations
column: organization_id
- table_name: stg_organizations
primary_key: organization_id
foreign_keys: []
All
Process Story
cocoon_er.yml (Document the entity relationships)
groups: []
entities:
- entity_name: Encounters
entity_description: Represents individual medical encounters or visits, including
details about the patient, provider, payer, and associated costs.
table_name: stg_encounters
primary_key: encounter_id
- entity_name: Patients
entity_description: Represents individual patients and their personal and healthcare-related
information.
table_name: stg_patients
primary_key: patient_id
- entity_name: Payers
entity_description: Represents healthcare payers, including both government and
private entities, along with their financial and operational data.
table_name: stg_payers
primary_key: payer_id
- entity_name: Providers
entity_description: Represents healthcare providers, including their personal information,
specialties, and practice details.
table_name: stg_providers
primary_key: provider_id
- entity_name: Claims
entity_description: Represents individual insurance claims, including details about
the patient, provider, diagnosis, and claim processing information.
table_name: stg_claims
primary_key: claim_id
- entity_name: Organizations
entity_description: Represents healthcare organizations such as clinics, hospitals,
and urgent care centers, including their contact information and operational data.
table_name: stg_organizations
primary_key: organization_id
relations:
- relation_name: PatientEncounterDetails
relation_description: Encounters link Patients with Providers at Organizations,
documenting medical procedures and associated costs covered by Payers.
table_name: stg_encounters
entities:
- Encounters
- Patients
- Payers
- Providers
- Organizations
- relation_name: ProviderOrganizationAffiliation
relation_description: Providers are affiliated with Organizations and deliver healthcare
services within those organizations' networks.
table_name: stg_providers
entities:
- Providers
- Organizations
- relation_name: PatientProviderClaimPayment
relation_description: Claims are filed by Patients for medical services provided
by Providers and are processed by Payers for reimbursement.
table_name: stg_claims
entities:
- Claims
- Patients
- Payers
- Providers
- relation_name: PatientAllergyEncounters
relation_description: Patients experience allergies during Encounters, with each
allergy recorded along with its details and potential reactions.
table_name: stg_allergies
entities:
- Encounters
- Patients
- relation_name: PatientEncounterCarePlans
relation_description: Patients receive care plans during Encounters, detailing specific
treatments and diagnoses for medical conditions.
table_name: stg_careplans
entities:
- Encounters
- Patients
- relation_name: PatientEncounterConditions
relation_description: Patients have Encounters during which their conditions and
employment status are recorded and tracked over time.
table_name: stg_conditions
entities:
- Encounters
- Patients
- relation_name: PatientEncounterDevices
relation_description: Patients receive medical devices during Encounters, tracking
device usage periods and specifications.
table_name: stg_devices
entities:
- Encounters
- Patients
- relation_name: PatientImagingEncounters
relation_description: Patients undergo imaging studies during Encounters, which
are specific healthcare visits or interactions.
table_name: stg_imaging_studies
entities:
- Encounters
- Patients
- relation_name: PatientImmunizationEncounters
relation_description: Patients receive immunizations during Encounters, with each
immunization event recorded as a separate row.
table_name: stg_immunizations
entities:
- Encounters
- Patients
- relation_name: PatientEncounterMedicationCoverage
relation_description: Patients receive medication prescriptions during Encounters,
which are covered by Payers who manage the financial aspects of the medications.
table_name: stg_medications
entities:
- Encounters
- Patients
- Payers
- relation_name: PatientEncounterObservations
relation_description: Patients undergo medical Encounters during which various health
observations are recorded and measured.
table_name: stg_observations
entities:
- Encounters
- Patients
- relation_name: PatientEncounterProcedures
relation_description: Patients undergo medical procedures during Encounters, which
are documented with details such as procedure codes, descriptions, and costs.
table_name: stg_procedures
entities:
- Encounters
- Patients
- relation_name: PatientEncounterSupplies
relation_description: Patients receive medical supplies during Encounters, with
each supply order linked to a specific Encounter.
table_name: stg_supplies
entities:
- Encounters
- Patients
- relation_name: PatientProviderClaimTransactions
relation_description: Patients receive medical services from Providers, generating
Claims that track charges, payments, and adjustments for these services.
table_name: stg_claims_transactions
entities:
- Patients
- Providers
- Claims
- relation_name: PatientInsuranceCoverage
relation_description: This tracks insurance coverage periods for Patients, showing
their primary and secondary Payers over time.
table_name: stg_payer_transitions
entities:
- Patients
- Payers
story:
- name: PatientInsuranceCoverage
description: Patient obtains insurance from primary and secondary payers.
type: relation
- name: ProviderOrganizationAffiliation
description: Healthcare providers join organizations to offer medical services.
type: relation
- name: PatientEncounterDetails
description: Patient visits provider at healthcare organization for treatment.
type: relation
- name: PatientAllergyEncounters
description: Provider records patient's allergies during the visit.
type: relation
- name: PatientEncounterObservations
description: Provider conducts and records health observations and measurements.
type: relation
- name: PatientEncounterConditions
description: Provider diagnoses and documents patient's medical conditions.
type: relation
- name: PatientEncounterCarePlans
description: Provider creates tailored care plan for patient's conditions.
type: relation
- name: PatientImagingEncounters
description: Patient undergoes necessary imaging studies during visit.
type: relation
- name: PatientEncounterProcedures
description: Provider performs and documents medical procedures for patient.
type: relation
- name: PatientEncounterDevices
description: Provider prescribes and records medical devices for patient.
type: relation
- name: PatientEncounterSupplies
description: Provider orders and documents medical supplies for patient.
type: relation
- name: PatientEncounterMedicationCoverage
description: Provider prescribes medications covered by patient's insurance.
type: relation
- name: PatientImmunizationEncounters
description: Patient receives and provider records necessary immunizations.
type: relation
- name: PatientProviderClaimTransactions
description: Provider submits claim for services rendered to patient.
type: relation
- name: PatientProviderClaimPayment
description: Insurance processes and pays claim for patient's treatment.
type: relation