Fivetran Jira

We display the source tables from the data warehouses to model.

comment (first 100 rows)

id _fivetran_synced author_id body created is_public issue_id update_author_id updated
0 10001 2020-11-12 12:20:53.148 557058:a9572f6a-3041-435a-96ed-5286e3811b33 Hello 2020-11-10 19:19:41.224 True 10025 557058:a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-10 19:19:41.224
1 10013 2020-11-10 19:21:48.619 557058:a9572f6a-3041-435a-96ed-5286e3811b33 To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago 2020-11-07 02:45:38.717 True 10035 557058:a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717
2 10012 2020-11-10 19:21:48.618 557058:a9572f6a-3041-435a-96ed-5286e3811b33 Joined Sample Sprint 2 7 days 9 hours 10 minutes ago 2020-11-07 02:45:38.717 True 10035 557058:a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717

component (first 100 rows)

id _fivetran_synced description name project_id
0 10004 2020-11-23 12:21:02.159 NaN Component 5 10001
1 10003 2020-11-23 12:21:02.159 NaN Component 4 10001
2 10002 2020-11-23 12:21:02.158 NaN Component 3 10001
3 10019 2020-11-23 12:21:02.157 NaN Component 2 10001

epic (first 100 rows)

id _fivetran_synced done key_ name summary
0 10014 2020-11-17 12:20:54.934 False TP-11 Test Epic just for testing

field (first 100 rows)

id _fivetran_synced is_array is_custom name
0 timeoriginalestimate 2020-11-23 22:20:39.685 False False Original Estimate
1 creator 2020-11-23 22:20:39.711 False False Creator
2 issuerestriction 2020-11-23 22:20:39.643 False False Restrict to
3 components 2020-11-23 22:20:40.643 False False Components
4 summary 2020-11-23 22:20:41.553 False False Summary
5 epiclink 2020-11-23 22:20:41.553 False False Epic-Link
6 epic 2020-11-23 22:20:41.553 False False Epic

field_option (first 100 rows)

id _fivetran_synced name
0 10104 2020-11-23 12:21:00.979 opt 2
1 10016 2020-11-17 12:20:52.552 To Do
2 10106 2020-11-17 12:20:52.227 opt 4
3 10019 2020-11-19 12:20:53.110 Impediment
4 10026 2020-11-20 12:20:54.221 This is the summary of the issue.

issue (first 100 rows)

id _fivetran_deleted _fivetran_synced _original_estimate _remaining_estimate _time_spent assignee created creator description due_date environment issue_type key_ last_viewed original_estimate parent_id priority project remaining_estimate reporter resolution resolved status status_category_changed summary time_spent updated work_ratio
0 10011 False 2020-11-06 22:23:38 NaN NaN NaN NaN 2020-04-01 08:33:58 5c3326c24b248c315badf3ee Yes, this is an issue to test permissions NaN NaN 10103 TP-8 None NaN NaN 3 10001 NaN 5c3326c24b248c315badf3ee NaN 2020-05-29 11:33:53 3 2020-04-01 08:34:02 We want to test permissions NaN 2020-05-29 11:33:53 -1
1 10015 False 2020-11-06 22:23:41 NaN NaN NaN NaN 2020-06-11 07:38:03 5ed0edbe2999b60c23255b4f None NaN NaN 10103 TP-12 None NaN NaN 5 10001 NaN 5ed0edbe2999b60c23255b4f NaN 2020-06-11 07:38:03 10003 2020-06-11 07:38:03 Part of test epic NaN 2020-06-11 07:38:03 -1
2 10018 False 2020-11-12 12:20:50 NaN NaN NaN NaN 2020-11-10 19:11:45 557058:a9572f6a-3041-435a-96ed-5286e3811b33 None NaN NaN 10103 TCP-1 2020-11-10 19:11:46 NaN NaN 3 10005 NaN 557058:a9572f6a-3041-435a-96ed-5286e3811b33 10000.0 2020-11-28 19:11:45 10003 2020-11-28 19:11:46 this is a story NaN 2020-11-28 19:11:45 -1

issue_field_history (first 100 rows)

field_id issue_id time_ _fivetran_synced value_
0 created 10027 2020-11-10 19:19:41.472 2020-11-12 12:20:53.478 2020-11-10T19:19:41.472Z
1 customfield_10104 10027 2020-11-10 19:19:41.472 2020-11-12 12:20:53.472 3.0
2 summary 10027 2020-11-10 19:19:41.472 2020-11-12 12:20:53.500 As a developer, I'd like to update story status during the sprint >> Click the Active sprints link at the top right of the screen to go to the Active sprints where the current Sprint's items can be updated
3 components 10018 2020-11-10 19:19:41.472 2020-11-19 12:20:53.500 10019
4 summary 10018 2020-11-10 19:19:41.472 2020-11-19 12:20:53.500 10026
5 epic 10015 2020-11-10 19:19:41.472 2020-11-19 12:20:53.500 1111111
6 epiclink 10018 2020-11-10 19:19:41.472 2020-11-12 12:20:53.500 https://ulr-here/epic-key

issue_multiselect_history (first 100 rows)

_fivetran_id time_ _fivetran_synced field_id issue_id value_
0 w4UiT+hPMxJp3RhL/YFJm3uWL5U= 2020-11-10 19:19:41.472 2020-11-12 12:20:53.506 subtasks 10027 0
1 4pVgGn0qSqR2hCmMdo4wWHXmgew= 2020-11-10 19:19:41.472 2020-11-12 12:20:53.479 customfield_10021 10027 0
2 /zrY8m6q0VMW6ia1jGIerXqLIgQ= 2020-11-10 19:19:41.472 2020-11-12 12:20:53.479 customfield_10020 10027 0
3 3p3gGn0qSqR2hCmMdo4wWHXa32m= 2020-11-10 19:19:41.472 2020-11-12 12:20:53.479 components 10027 0

issue_type (first 100 rows)

id _fivetran_synced description name subtask
0 10117 2020-11-23 22:20:40.176 Bugs track problems or errors. Bug False
1 10106 2020-11-23 22:20:40.171 A problem or error. Bug False
2 10101 2020-11-23 22:20:40.177 Track large pieces of work. Epic False

priority (first 100 rows)

id _fivetran_synced description name
0 2 2020-11-06 22:23:39.764 Serious problem that could block progress. High
1 5 2020-11-06 22:23:41.009 Trivial problem with little or no impact on progress. Lowest
2 1 2020-11-20 12:20:54.303 This problem will block progress. Highest
3 3 2020-11-25 12:20:42.395 Has the potential to affect progress. Medium
4 4 2020-11-17 12:20:52.382 Minor problem or easily worked around. Low

project (first 100 rows)

id _fivetran_synced description key_ lead_id name permission_scheme_id project_category_id project_type_key
0 10008 2020-11-17 12:20:50.283 NaN TBTP 557058:a9572f6a-3041-435a-96ed-5286e3811b33 test bug tracking project 10001 NaN software
1 10005 2020-11-12 12:20:49.801 NaN TCP 557058:a9572f6a-3041-435a-96ed-5286e3811b33 test classique project 10001 NaN software
2 10001 2020-11-23 12:21:02.150 NaN TP 5c3326c24b248c315badf3ee Test Project 10001 10000.0 software

project_board (first 100 rows)

board_id project_id _fivetran_synced
0 7 10006 2020-11-22 12:20:57.685
1 5 10004 2020-11-22 12:20:59.875

project_category (first 100 rows)

id _fivetran_synced description name
0 10000 2020-11-23 12:21:02.157 will this create a project_category table? test-category

resolution (first 100 rows)

id _fivetran_synced description name
0 10000 2020-11-22 12:20:59.338 Work has been completed on this issue. Done

sprint (first 100 rows)

id _fivetran_synced board_id complete_date end_date name start_date
0 4 2020-11-22 12:20:59.513 5 None 2020-11-26 21:25:12 TNGP Sprint 1 2020-11-12 21:25:17.778
1 1 2020-11-22 12:20:57.012 0 2020-11-10 20:52:38.262 2020-11-17 10:29:44.653 Sample Sprint 2 2020-11-03 10:09:44.653
2 3 2020-11-22 12:20:57.298 7 None 2020-11-26 21:55:00 Sample Sprint 2 2020-11-12 21:55:42.220

status (first 100 rows)

id _fivetran_synced description name status_category_id
0 10010 2020-11-23 22:20:40.409 None In Progress 4
1 3 2020-11-23 22:20:40.391 This issue is being actively worked on at the moment by the assignee. In Progress 4
2 10000 2020-11-23 22:20:40.404 None To Do 2

status_category (first 100 rows)

id _fivetran_synced name
0 3 2020-11-23 22:20:40.410 Done
1 2 2020-11-23 22:20:40.411 To Do
2 4 2020-11-23 22:20:40.411 In Progress

user (first 100 rows)

id _fivetran_synced email locale name time_zone username
0 557058:a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-23 12:21:02.355 NaN en_US Scooby Doo America/Los_Angeles NaN
1 5ed0edbe2999b60c23255b4f 2020-11-23 12:21:02.562 NaN en_US Scrappy Cornelius Doo America/Los_Angeles NaN
2 5c3326c24b248c315badf3ee 2020-11-23 12:21:02.757 NaN en_US Norville Shaggy Rogers America/Los_Angeles NaN

user_group (first 100 rows)

group_name user_id _fivetran_synced
0 jira-software-users 5faf114ad74f69007579568f 2020-11-20 12:20:54.925
1 site-admins 5d577670dbb98c0d9c232bb2 2020-11-23 12:21:01.943
2 confluence-users 557058:a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-23 12:21:02.355

version (first 100 rows)

id _fivetran_synced archived description name overdue project_id release_date released start_date
0 10001 2021-06-17 14:20:59.271 False second version upd v2.0 True 10001 2020-06-01 False 2020-05-01
1 10003 2021-04-13 16:31:23.435 False None Version 2.0 True 10006 2020-11-17 False None
2 10004 2021-04-13 16:31:23.435 False None Version 3.0 False 10006 None False None
Source tables may have typos, unclear names, incorrect column types, etc. We clean these tables.

stg_field (first 100 rows)

field_id is_multi_value is_custom_field field_name
0 timeoriginalestimate False False Original Estimate
1 creator False False Creator
2 issuerestriction False False Restrict to
3 components False False Components
4 summary False False Summary
5 epiclink False False Epic-Link
6 epic False False Epic

stg_field.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"field_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "is_array",
        "is_custom",
        "name"
    FROM "field"
),

"field_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> field_id
    -- is_array -> is_multi_value
    -- is_custom -> is_custom_field
    -- name -> field_name
    SELECT 
        "id" AS "field_id",
        "is_array" AS "is_multi_value",
        "is_custom" AS "is_custom_field",
        "name" AS "field_name"
    FROM "field_projected"
)

-- COCOON BLOCK END
SELECT * FROM "field_projected_renamed"

stg_field.yml (Document the table)

version: 2
models:
- name: stg_field
  description: The table is about fields in an issue tracking system. It contains
    information on field IDs, names, and properties. Each row represents a different
    field. The "is_array" and "is_custom" columns are boolean flags indicating field
    characteristics. The table includes standard fields like summary, components,
    and creator.
  columns:
  - name: field_id
    description: Unique identifier for the field
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each field in the
        issue tracking system. For this table, each row represents a different field.
        The field_id is likely to be unique across rows as it's designed to be an
        identifier.
  - name: is_multi_value
    description: Boolean flag indicating if field can have multiple values
    tests:
    - not_null
  - name: is_custom_field
    description: Boolean flag indicating if field is custom-defined
    tests:
    - not_null
  - name: field_name
    description: Human-readable name of the field
    tests:
    - not_null

stg_project (first 100 rows)

project_id project_key project_name permission_scheme_id project_type category_id
0 10008 TBTP Test Project 10001 software None
1 10005 TCP Test Project 10001 software None
2 10001 TP Test Project 10001 software 10000.0

stg_project.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"project_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "key_",
        "lead_id",
        "name",
        "permission_scheme_id",
        "project_category_id",
        "project_type_key"
    FROM "project"
),

"project_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> project_id
    -- description -> project_description
    -- key_ -> project_key
    -- name -> project_name
    -- project_category_id -> category_id
    -- project_type_key -> project_type
    SELECT 
        "id" AS "project_id",
        "description" AS "project_description",
        "key_" AS "project_key",
        "lead_id",
        "name" AS "project_name",
        "permission_scheme_id",
        "project_category_id" AS "category_id",
        "project_type_key" AS "project_type"
    FROM "project_projected"
),

"project_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- project_name: The problem is inconsistent capitalization and specificity in project names. "Test Project" is capitalized and generic, while "test bug tracking project" and "test classique project" are lowercase and more specific. The most frequent and generic form is "Test Project", which should be used for consistency. 
    SELECT
        "project_id",
        "project_description",
        "project_key",
        "lead_id",
        CASE
            WHEN "project_name" = 'test bug tracking project' THEN 'Test Project'
            WHEN "project_name" = 'test classique project' THEN 'Test Project'
            ELSE "project_name"
        END AS "project_name",
        "permission_scheme_id",
        "category_id",
        "project_type"
    FROM "project_projected_renamed"
),

"project_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- category_id: from DECIMAL to VARCHAR
    -- lead_id: from VARCHAR to UUID
    -- project_description: from DECIMAL to VARCHAR
    SELECT
        "project_id",
        "project_key",
        "project_name",
        "permission_scheme_id",
        "project_type",
        CAST("category_id" AS VARCHAR) AS "category_id",
        CASE 
            WHEN "lead_id" ~ ':[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
            THEN CAST(REGEXP_EXTRACT("lead_id", '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$') AS UUID)
            ELSE TRY_CAST("lead_id" AS UUID)
        END AS "lead_id",
        CAST("project_description" AS VARCHAR) AS "project_description"
    FROM "project_projected_renamed_cleaned"
),

"project_projected_renamed_cleaned_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- lead_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- project_description has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "project_id",
        "project_key",
        "project_name",
        "permission_scheme_id",
        "project_type",
        "category_id"
    FROM "project_projected_renamed_cleaned_casted"
)

-- COCOON BLOCK END
SELECT * FROM "project_projected_renamed_cleaned_casted_missing_handled"

stg_project.yml (Document the table)

version: 2
models:
- name: stg_project
  description: The table is about projects. Each project has an ID, description, key,
    lead ID, name, permission scheme ID, project category ID, and project type key.
    The table includes details like project names, keys, and associated IDs. All projects
    listed are software projects with different names and keys.
  columns:
  - name: project_id
    description: Unique identifier for the project
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each project. For
        this table, each row represents a distinct project, and project_id is unique
        across rows.
  - name: project_key
    description: Short code or abbreviation for the project
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a short code or abbreviation for the project.
        For this table, each row is a distinct project, and the project_key appears
        to be unique across rows based on the sample data.
  - name: project_name
    description: Full name of the project
    tests:
    - not_null
  - name: permission_scheme_id
    description: ID of the permission scheme applied to the project
    tests:
    - not_null
  - name: project_type
    description: Key indicating the type of project
    tests:
    - not_null
    - accepted_values:
        values:
        - software
        - hardware
        - web
        - mobile
        - desktop
        - embedded
        - cloud
        - AI/ML
        - data science
        - IoT
        - cybersecurity
        - blockchain
        - game development
        - enterprise
        - e-commerce
        - social media
        - education
        - healthcare
        - fintech
        - research
        - open source
  - name: category_id
    description: ID of the category the project belongs to
    cocoon_meta:
      missing_acceptable: Not all projects require categorization.

stg_version (first 100 rows)

version_id is_archived version_description version_name is_overdue project_id is_released release_date start_date
0 10001 False second version upd Version 2.0 True 10001 False 2020-06-01 2020-05-01
1 10003 False None Version 2.0 True 10006 False 2020-11-17 NaT
2 10004 False None Version 3.0 False 10006 False NaT NaT

stg_version.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"version_projected" AS (
    -- Projection: Selecting 9 out of 10 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "archived",
        "description",
        "name",
        "overdue",
        "project_id",
        "release_date",
        "released",
        "start_date"
    FROM "version"
),

"version_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> version_id
    -- archived -> is_archived
    -- description -> version_description
    -- name -> version_name
    -- overdue -> is_overdue
    -- released -> is_released
    SELECT 
        "id" AS "version_id",
        "archived" AS "is_archived",
        "description" AS "version_description",
        "name" AS "version_name",
        "overdue" AS "is_overdue",
        "project_id",
        "release_date",
        "released" AS "is_released",
        "start_date"
    FROM "version_projected"
),

"version_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- version_name: The problem is inconsistent formatting for version 2.0. It appears as both "Version 2.0" and "v2.0". The correct values should follow the most common format, which is "Version X.0". "Version 2.0" is already in the correct format and is the most frequent, so "v2.0" should be changed to match it. 
    SELECT
        "version_id",
        "is_archived",
        "version_description",
        CASE
            WHEN "version_name" = 'v2.0' THEN 'Version 2.0'
            ELSE "version_name"
        END AS "version_name",
        "is_overdue",
        "project_id",
        "release_date",
        "is_released",
        "start_date"
    FROM "version_projected_renamed"
),

"version_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- release_date: from VARCHAR to DATE
    -- start_date: from VARCHAR to DATE
    SELECT
        "version_id",
        "is_archived",
        "version_description",
        "version_name",
        "is_overdue",
        "project_id",
        "is_released",
        CAST("release_date" AS DATE) AS "release_date",
        CAST("start_date" AS DATE) AS "start_date"
    FROM "version_projected_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT * FROM "version_projected_renamed_cleaned_casted"

stg_version.yml (Document the table)

version: 2
models:
- name: stg_version
  description: The table is about software versions. It includes details like version
    ID, name, description, project ID, release dates, and status flags. Each row represents
    a specific version with its attributes. The table tracks whether versions are
    archived, overdue, or released, along with their start and release dates.
  columns:
  - name: version_id
    description: Unique identifier for the version
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each software version.
        For this table, each row represents a specific version of software. The version_id
        is likely to be unique across rows as it's designed to distinctly identify
        each version.
  - name: is_archived
    description: Indicates if the version is archived
    tests:
    - not_null
  - name: version_description
    description: Brief description of the version
    tests:
    - not_null
  - name: version_name
    description: Name or number of the version
    tests:
    - not_null
  - name: is_overdue
    description: Indicates if the version is past due date
    tests:
    - not_null
  - name: project_id
    description: Identifier of the project the version belongs to
    tests:
    - not_null
  - name: is_released
    description: Indicates if the version has been released
    tests:
    - not_null
  - name: release_date
    description: Planned or actual date of version release
    cocoon_meta:
      missing_acceptable: Not applicable for unreleased versions.
  - name: start_date
    description: Date when work on the version began
    tests:
    - not_null

stg_status_category (first 100 rows)

status_id status_name
0 3 Done
1 2 To Do
2 4 In Progress

stg_status_category.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"status_category_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "name"
    FROM "status_category"
),

"status_category_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> status_id
    -- name -> status_name
    SELECT 
        "id" AS "status_id",
        "name" AS "status_name"
    FROM "status_category_projected"
)

-- COCOON BLOCK END
SELECT * FROM "status_category_projected_renamed"

stg_status_category.yml (Document the table)

version: 2
models:
- name: stg_status_category
  description: The table is about status categories. It contains different stages
    of task completion. Each category has an ID and a name. The categories include
    "Done", "To Do", and "In Progress". This table likely serves as a lookup for task
    management systems to assign status to tasks or projects.
  columns:
  - name: status_id
    description: Unique identifier for each status category
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each status category.
        For this table, each row represents a distinct status category. The status_id
        is likely to be unique across rows as it serves as an identifier.
  - name: status_name
    description: Descriptive name of the status category
    tests:
    - not_null
    - unique
    - accepted_values:
        values:
        - Done
        - In Progress
        - To Do
        - Not Started
        - Blocked
        - Cancelled
        - On Hold
        - Deferred
        - Under Review
        - Completed
        - Pending Approval
    cocoon_meta:
      uniqueness: This column contains the descriptive name of the status category.
        For this table, each row represents a distinct status category. While status
        names are likely to be unique in this context, it's possible (though unlikely)
        that two different statuses could have the same name in some systems.

stg_sprint (first 100 rows)

sprint_id board_id sprint_name completion_date end_date start_date
0 4 5 TNGP Sprint 1 NaT 2020-11-26 21:25:12 2020-11-12 21:25:17.778
1 1 0 Sample Sprint 2 2020-11-10 20:52:38.262 2020-11-17 10:29:44 2020-11-03 10:09:44.653
2 3 7 Sample Sprint 2 NaT 2020-11-26 21:55:00 2020-11-12 21:55:42.220

stg_sprint.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"sprint_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "board_id",
        "complete_date",
        "end_date",
        "name",
        "start_date"
    FROM "sprint"
),

"sprint_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> sprint_id
    -- complete_date -> completion_date
    -- name -> sprint_name
    SELECT 
        "id" AS "sprint_id",
        "board_id",
        "complete_date" AS "completion_date",
        "end_date",
        "name" AS "sprint_name",
        "start_date"
    FROM "sprint_projected"
),

"sprint_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- end_date: The problem is that one of the date-time values ('2020-11-17 10:29:44.653') includes milliseconds, while the others don't. The correct values should all follow the same format without milliseconds, which is the most common format in the dataset. 
    SELECT
        "sprint_id",
        "board_id",
        "completion_date",
        CASE
            WHEN "end_date" = '2020-11-17 10:29:44.653' THEN '2020-11-17 10:29:44'
            ELSE "end_date"
        END AS "end_date",
        "sprint_name",
        "start_date"
    FROM "sprint_projected_renamed"
),

"sprint_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- completion_date: from VARCHAR to TIMESTAMP
    -- end_date: from VARCHAR to TIMESTAMP
    -- start_date: from VARCHAR to TIMESTAMP
    SELECT
        "sprint_id",
        "board_id",
        "sprint_name",
        CAST("completion_date" AS TIMESTAMP) AS "completion_date",
        CAST("end_date" AS TIMESTAMP) AS "end_date",
        CAST("start_date" AS TIMESTAMP) AS "start_date"
    FROM "sprint_projected_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT * FROM "sprint_projected_renamed_cleaned_casted"

stg_sprint.yml (Document the table)

version: 2
models:
- name: stg_sprint
  description: The table is about sprints. It contains details of each sprint, including
    an ID, associated board ID, start date, end date, and name. Some sprints have
    a complete date, while others do not. The table tracks the timeline and status
    of different sprints across various project boards.
  columns:
  - name: sprint_id
    description: Unique identifier for the sprint
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is a unique identifier for each sprint. For this table,
        each row represents a distinct sprint, and sprint_id is unique across rows.
  - name: board_id
    description: Identifier for the associated project board
    tests:
    - not_null
  - name: sprint_name
    description: Name or title of the sprint
    tests:
    - not_null
  - name: completion_date
    description: Date when the sprint was finished
    cocoon_meta:
      missing_acceptable: Sprint may not be completed yet.
  - name: end_date
    description: Planned end date of the sprint
    tests:
    - not_null
  - name: start_date
    description: Date when the sprint began
    tests:
    - not_null

stg_project_category (first 100 rows)

category_id
0 10000

stg_project_category.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"project_category_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name"
    FROM "project_category"
),

"project_category_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> category_id
    -- description -> category_description
    -- name -> category_name
    SELECT 
        "id" AS "category_id",
        "description" AS "category_description",
        "name" AS "category_name"
    FROM "project_category_projected"
),

"project_category_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- category_description: The problem is that the category_description column contains a question about database structure instead of an actual category description. This is completely inappropriate for a category description field. The correct value should be an empty string or NULL, as we don't have the actual category description information. 
    -- category_name: The problem is that 'test-category' appears to be a placeholder value rather than a genuine category name. It's likely used as a temporary or default value during testing or development, and doesn't represent an actual product category. In a real dataset, we would expect to see meaningful category names that describe the types of products being sold. 
    SELECT
        "category_id",
        CASE
            WHEN "category_description" = 'will this create a project_category table?' THEN ''
            ELSE "category_description"
        END AS "category_description",
        CASE
            WHEN "category_name" = 'test-category' THEN ''
            ELSE "category_name"
        END AS "category_name"
    FROM "project_category_projected_renamed"
),

"project_category_projected_renamed_cleaned_null" AS (
    -- NULL Imputation: Impute Null to Disguised Missing Values
    -- category_description: ['']
    -- category_name: ['']
    SELECT 
        CASE
            WHEN "category_description" = '' THEN NULL
            ELSE "category_description"
        END AS "category_description",
        CASE
            WHEN "category_name" = '' THEN NULL
            ELSE "category_name"
        END AS "category_name",
        "category_id"
    FROM "project_category_projected_renamed_cleaned"
),

"project_category_projected_renamed_cleaned_null_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- category_description has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- category_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "category_id"
    FROM "project_category_projected_renamed_cleaned_null"
)

-- COCOON BLOCK END
SELECT * FROM "project_category_projected_renamed_cleaned_null_missing_handled"

stg_project_category.yml (Document the table)

version: 2
models:
- name: stg_project_category
  description: The table is about project categories. It contains unique identifiers
    for each category. It stores the name of each category. It includes a description
    for each category. The table likely serves to organize and classify different
    types of projects within a system.
  columns:
  - name: category_id
    description: Unique identifier for each project category
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each project category.
        For this table, each row corresponds to a distinct project category. The category_id
        is unique across rows, as it's described as a "Unique identifier for each
        project category".

stg_issue_field_history (first 100 rows)

field_identifier new_field_value change_timestamp issue_identifier
0 created 2020-11-10T19:19:41.472Z 2020-11-10 19:19:41.472 10027
1 customfield_10104 3.0 2020-11-10 19:19:41.472 10027
2 summary As a developer, I'd like to update story status during the sprint >> Click the Active sprints link at the top right of the screen to go to the Active sprints where the current Sprint's items can be updated 2020-11-10 19:19:41.472 10027
3 components 10019 2020-11-10 19:19:41.472 10018
4 summary 10026 2020-11-10 19:19:41.472 10018
5 epic 1111111 2020-11-10 19:19:41.472 10015
6 epiclink https://ulr-here/epic-key 2020-11-10 19:19:41.472 10018

stg_issue_field_history.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"issue_field_history_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "field_id",
        "issue_id",
        "time_",
        "value_"
    FROM "issue_field_history"
),

"issue_field_history_projected_renamed" AS (
    -- Rename: Renaming columns
    -- field_id -> field_identifier
    -- issue_id -> issue_identifier
    -- time_ -> change_timestamp
    -- value_ -> new_field_value
    SELECT 
        "field_id" AS "field_identifier",
        "issue_id" AS "issue_identifier",
        "time_" AS "change_timestamp",
        "value_" AS "new_field_value"
    FROM "issue_field_history_projected"
),

"issue_field_history_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- change_timestamp: from VARCHAR to TIMESTAMP
    -- issue_identifier: from INT to VARCHAR
    SELECT
        "field_identifier",
        "new_field_value",
        CAST("change_timestamp" AS TIMESTAMP) AS "change_timestamp",
        CAST("issue_identifier" AS VARCHAR) AS "issue_identifier"
    FROM "issue_field_history_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "issue_field_history_projected_renamed_casted"

stg_issue_field_history.yml (Document the table)

version: 2
models:
- name: stg_issue_field_history
  description: The table is about issue field history. It tracks changes to various
    fields of issues in a tracking system. Each row represents a change, including
    the field ID, issue ID, timestamp, and new value. Fields can be system fields
    like "created" or custom fields. The table allows for tracking the evolution of
    issues over time.
  columns:
  - name: field_identifier
    description: Identifier for the issue field
    tests:
    - not_null
  - name: new_field_value
    description: New value of the changed field
    tests:
    - not_null
  - name: change_timestamp
    description: Timestamp of the field change
    tests:
    - not_null
  - name: issue_identifier
    description: Unique identifier for the issue
    tests:
    - not_null

stg_field_option (first 100 rows)

field_id option_name
0 10104 Option 2
1 10016 To Do
2 10106 Option 4
3 10019 Impediment
4 10026 Summary

stg_field_option.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"field_option_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "name"
    FROM "field_option"
),

"field_option_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> field_id
    -- name -> option_name
    SELECT 
        "id" AS "field_id",
        "name" AS "option_name"
    FROM "field_option_projected"
),

"field_option_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- option_name: The problem is that the option_name column contains inconsistent values. 'This is the summary of the issue.' is a full sentence among shorter labels, which is unusual. 'opt 2' and 'opt 4' are inconsistently named compared to the other options. The correct values should be concise labels that describe the option or status. 'Impediment' and 'To Do' are already in the correct format. 
    SELECT
        "field_id",
        CASE
            WHEN "option_name" = 'This is the summary of the issue.' THEN 'Summary'
            WHEN "option_name" = 'opt 2' THEN 'Option 2'
            WHEN "option_name" = 'opt 4' THEN 'Option 4'
            ELSE "option_name"
        END AS "option_name"
    FROM "field_option_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "field_option_projected_renamed_cleaned"

stg_field_option.yml (Document the table)

version: 2
models:
- name: stg_field_option
  description: The table is about field options. It contains IDs and names for various
    options. These options appear to be status labels or descriptors, possibly used
    in a task management or issue tracking system. Each option has a unique numeric
    ID and a corresponding text name.
  columns:
  - name: field_id
    description: Unique numeric identifier for each field
    tests:
    - not_null
  - name: option_name
    description: Text label or description of the option
    tests:
    - not_null
    - accepted_values:
        values:
        - Impediment
        - Option 2
        - Option 4
        - Summary
        - To Do
        - In Progress
        - Done
        - Blocked
        - Backlog
        - Ready for Review
        - Under Review
        - Approved
        - Rejected
        - Deferred

stg_issue_multiselect_history (first 100 rows)

record_id field_id issue_id change_timestamp new_field_value
0 w4UiT+hPMxJp3RhL/YFJm3uWL5U= subtasks 10027 2020-11-10 19:19:41.472 0
1 4pVgGn0qSqR2hCmMdo4wWHXmgew= customfield_10021 10027 2020-11-10 19:19:41.472 0
2 /zrY8m6q0VMW6ia1jGIerXqLIgQ= customfield_10020 10027 2020-11-10 19:19:41.472 0
3 3p3gGn0qSqR2hCmMdo4wWHXa32m= components 10027 2020-11-10 19:19:41.472 0

stg_issue_multiselect_history.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"issue_multiselect_history_projected" AS (
    -- Projection: Selecting 5 out of 6 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "_fivetran_id",
        "time_",
        "field_id",
        "issue_id",
        "value_"
    FROM "issue_multiselect_history"
),

"issue_multiselect_history_projected_renamed" AS (
    -- Rename: Renaming columns
    -- _fivetran_id -> record_id
    -- time_ -> change_timestamp
    -- value_ -> new_field_value
    SELECT 
        "_fivetran_id" AS "record_id",
        "time_" AS "change_timestamp",
        "field_id",
        "issue_id",
        "value_" AS "new_field_value"
    FROM "issue_multiselect_history_projected"
),

"issue_multiselect_history_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- change_timestamp: from VARCHAR to TIMESTAMP
    -- new_field_value: from INT to VARCHAR
    SELECT
        "record_id",
        "field_id",
        "issue_id",
        CAST("change_timestamp" AS TIMESTAMP) AS "change_timestamp",
        CAST("new_field_value" AS VARCHAR) AS "new_field_value"
    FROM "issue_multiselect_history_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "issue_multiselect_history_projected_renamed_casted"

stg_issue_multiselect_history.yml (Document the table)

version: 2
models:
- name: stg_issue_multiselect_history
  description: The table is about issue field value history. It tracks changes to
    various fields of an issue over time. Each row represents a change, recording
    the field ID, issue ID, new value, and timestamp. The _fivetran_id appears to
    be a unique identifier for each record.
  columns:
  - name: record_id
    description: Unique identifier for each record
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column appears to be a unique identifier for each change record.
        For this table, each row represents a single change to an issue field. The
        record_id seems to be unique across rows, as it's described as a unique identifier
        for each record.
  - name: field_id
    description: Identifier of the issue field that changed
    tests:
    - not_null
  - name: issue_id
    description: Identifier of the issue being tracked
    tests:
    - not_null
  - name: change_timestamp
    description: Timestamp of when the field value changed
    tests:
    - not_null
  - name: new_field_value
    description: New value of the field after the change
    tests:
    - not_null

stg_user_group (first 100 rows)

group_name user_identifier
0 jira-software-users 5faf114ad74f69007579568f
1 site-admins 5d577670dbb98c0d9c232bb2
2 confluence-users 557058:a9572f6a-3041-435a-96ed-5286e3811b33

stg_user_group.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"user_group_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "group_name",
        "user_id"
    FROM "user_group"
),

"user_group_projected_renamed" AS (
    -- Rename: Renaming columns
    -- user_id -> user_identifier
    SELECT 
        "group_name",
        "user_id" AS "user_identifier"
    FROM "user_group_projected"
)

-- COCOON BLOCK END
SELECT * FROM "user_group_projected_renamed"

stg_user_group.yml (Document the table)

version: 2
models:
- name: stg_user_group
  description: The table represents a many-to-many relationship between user groups
    and users. It shows which users belong to which groups. The 'group_name' column
    contains the names of different user groups. The 'user_id' column holds unique
    identifiers for individual users. This structure allows users to be part of multiple
    groups and groups to have multiple users.
  columns:
  - name: group_name
    description: Name of the user group
    tests:
    - not_null
    - accepted_values:
        values:
        - confluence-users
        - jira-software-users
        - site-admins
        - bitbucket-users
        - bamboo-users
        - fisheye-users
        - crucible-users
        - crowd-users
        - jira-core-users
        - jira-service-desk-users
        - confluence-administrators
        - jira-administrators
        - bitbucket-administrators
        - bamboo-administrators
        - fisheye-administrators
        - crucible-administrators
        - crowd-administrators
        - system-administrators
        - project-administrators
        - read-only-users
        - developers
        - testers
        - product-managers
        - support-team
        - marketing-team
        - sales-team
        - hr-team
  - name: user_identifier
    description: Unique identifier for individual users
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column contains unique identifiers for individual users. For
        this table, each row represents a relationship between a user group and a
        user. The user_identifier is likely to be unique across rows, as it's described
        as a unique identifier for users.

stg_priority (first 100 rows)

priority_id priority_impact priority_level
0 2 Serious problem that could block progress. High
1 5 Trivial problem with little or no impact on progress. Lowest
2 1 This problem will block progress. Highest
3 3 Has the potential to affect progress. Medium
4 4 Minor problem or easily worked around. Low

stg_priority.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"priority_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name"
    FROM "priority"
),

"priority_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> priority_id
    -- description -> priority_impact
    -- name -> priority_level
    SELECT 
        "id" AS "priority_id",
        "description" AS "priority_impact",
        "name" AS "priority_level"
    FROM "priority_projected"
)

-- COCOON BLOCK END
SELECT * FROM "priority_projected_renamed"

stg_priority.yml (Document the table)

version: 2
models:
- name: stg_priority
  description: The table is about priority levels. It contains five priority levels,
    each with an ID, description, and name. The levels range from Lowest to Highest.
    Descriptions explain the impact of each priority level on progress. The table
    helps categorize and understand the severity of problems or issues.
  columns:
  - name: priority_id
    description: Unique identifier for each priority level
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each priority level.
        For this table, each row corresponds to a distinct priority level. The priority_id
        is unique across rows as it's designed to be an identifier.
  - name: priority_impact
    description: Explains the impact of the priority level
    tests:
    - not_null
    - accepted_values:
        values:
        - Has the potential to affect progress.
        - Minor problem or easily worked around.
        - Serious problem that could block progress.
        - This problem will block progress.
        - Trivial problem with little or no impact on progress.
  - name: priority_level
    description: Short label for the priority level
    tests:
    - not_null
    - accepted_values:
        values:
        - High
        - Highest
        - Low
        - Lowest
        - Medium

stg_resolution (first 100 rows)

resolution_id status_description status_name
0 10000 Work has been completed on this issue. Done

stg_resolution.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"resolution_projected" AS (
    -- Projection: Selecting 3 out of 4 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name"
    FROM "resolution"
),

"resolution_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> resolution_id
    -- description -> status_description
    -- name -> status_name
    SELECT 
        "id" AS "resolution_id",
        "description" AS "status_description",
        "name" AS "status_name"
    FROM "resolution_projected"
)

-- COCOON BLOCK END
SELECT * FROM "resolution_projected_renamed"

stg_resolution.yml (Document the table)

version: 2
models:
- name: stg_resolution
  description: The table is about resolution statuses for issues or tasks. It contains
    details of different resolution states. Each status has an ID, a description explaining
    what it means, and a short name. The example shows the "Done" status, indicating
    when work on an issue is completed.
  columns:
  - name: resolution_id
    description: Unique identifier for the resolution status
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each resolution status.
        For this table, each row represents a distinct resolution status. The resolution_id
        is designed to be unique across rows, as it's typically used to uniquely identify
        each status.
  - name: status_description
    description: Detailed explanation of the resolution status
    tests:
    - not_null
  - name: status_name
    description: Short name of the resolution status
    tests:
    - not_null
    - accepted_values:
        values:
        - To Do
        - In Progress
        - Done
        - Blocked
        - On Hold
        - Cancelled
        - Rejected
        - Reopened
        - Resolved
        - Closed

stg_issue_type (first 100 rows)

issue_description issue_type is_subtask issue_type_id
0 A problem or error. Bug False 10117
1 A problem or error. Bug False 10106
2 Track large pieces of work. Epic False 10101

stg_issue_type.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"issue_type_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name",
        "subtask"
    FROM "issue_type"
),

"issue_type_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> issue_type_id
    -- description -> issue_description
    -- name -> issue_type
    -- subtask -> is_subtask
    SELECT 
        "id" AS "issue_type_id",
        "description" AS "issue_description",
        "name" AS "issue_type",
        "subtask" AS "is_subtask"
    FROM "issue_type_projected"
),

"issue_type_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- issue_description: The problem is that 'A problem or error.' and 'Bugs track problems or errors.' are redundant representations of the same concept. They both refer to issues or bugs in a system. The correct values should be distinct and represent different types of issues. 'Track large pieces of work.' is a distinct value and doesn't need to be changed. 
    SELECT
        "issue_type_id",
        CASE
            WHEN "issue_description" = 'Bugs track problems or errors.' THEN 'A problem or error.'
            ELSE "issue_description"
        END AS "issue_description",
        "issue_type",
        "is_subtask"
    FROM "issue_type_projected_renamed"
),

"issue_type_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- issue_type_id: from INT to VARCHAR
    SELECT
        "issue_description",
        "issue_type",
        "is_subtask",
        CAST("issue_type_id" AS VARCHAR) AS "issue_type_id"
    FROM "issue_type_projected_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT * FROM "issue_type_projected_renamed_cleaned_casted"

stg_issue_type.yml (Document the table)

version: 2
models:
- name: stg_issue_type
  description: The table is about issue types. It contains details of different issue
    categories. Each issue type has an ID, description, name, and a flag indicating
    if it's a subtask. The table includes examples like Bug and Epic. It provides
    a brief explanation of each issue type's purpose or characteristics.
  columns:
  - name: issue_description
    description: Brief explanation of the issue type
    tests:
    - not_null
    - accepted_values:
        values:
        - A problem or error
        - Track large pieces of work
        - A new feature request
        - An improvement suggestion
        - A task to be completed
        - A question or inquiry
        - Documentation update
        - Security vulnerability
        - Performance issue
        - User interface/experience issue
        - Testing/QA task
        - Maintenance task
        - Integration issue
        - Configuration change
        - Data-related issue
        - A problem or error.
        - Track large pieces of work.
  - name: issue_type
    description: Short name of the issue type
    tests:
    - not_null
    - accepted_values:
        values:
        - Bug
        - Epic
        - Task
        - Story
        - Subtask
        - Improvement
        - New Feature
        - Problem
        - Incident
        - Change Request
        - Defect
        - Risk
        - Spike
  - name: is_subtask
    description: Indicates if the issue type is a subtask
    tests:
    - not_null
  - name: issue_type_id
    description: Unique identifier for the issue type
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each issue type.
        For this table, each row represents a distinct issue type. The issue_type_id
        is likely to be unique across rows as it's described as a "Unique identifier
        for the issue type".

stg_issue (first 100 rows)

issue_id is_deleted creator_id description issue_type_id issue_key priority_id project_id reporter_id status_id summary assignee created_at due_date environment last_viewed_at original_estimate original_estimate_seconds parent_issue_id remaining_estimate remaining_estimate_seconds resolution_id resolved_at status_category_changed_at time_spent time_spent_seconds updated_at work_ratio
0 10011 False 5c3326c24b248c315badf3ee Yes, this is an issue to test permissions 10103 TCP-8 3 10001 5c3326c24b248c315badf3ee 3 We want to test permissions None 2020-04-01 08:33:58 NaT None NaT None NaN NaN None NaN NaN 2020-05-29 11:33:53 2020-04-01 08:34:02 None NaN 2020-05-29 11:33:53 -1.0
1 10015 False 5ed0edbe2999b60c23255b4f None 10103 TCP-12 5 10001 5ed0edbe2999b60c23255b4f 10003 Part of test epic None 2020-06-11 07:38:03 NaT None NaT None NaN NaN None NaN NaN 2020-06-11 07:38:03 2020-06-11 07:38:03 None NaN 2020-06-11 07:38:03 -1.0
2 10018 False 557058:a9572f6a-3041-435a-96ed-5286e3811b33 None 10103 TCP-1 3 10005 557058:a9572f6a-3041-435a-96ed-5286e3811b33 10003 this is a story None 2020-11-10 19:11:45 NaT None 2020-11-10 19:11:46 None NaN NaN None NaN 10000.0 2020-11-28 19:11:45 2020-11-28 19:11:46 None NaN 2020-11-28 19:11:45 -1.0

stg_issue.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"issue_projected" AS (
    -- Projection: Selecting 28 out of 29 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "_fivetran_deleted",
        "_original_estimate",
        "_remaining_estimate",
        "_time_spent",
        "assignee",
        "created",
        "creator",
        "description",
        "due_date",
        "environment",
        "issue_type",
        "key_",
        "last_viewed",
        "original_estimate",
        "parent_id",
        "priority",
        "project",
        "remaining_estimate",
        "reporter",
        "resolution",
        "resolved",
        "status",
        "status_category_changed",
        "summary",
        "time_spent",
        "updated",
        "work_ratio"
    FROM "issue"
),

"issue_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> issue_id
    -- _fivetran_deleted -> is_deleted
    -- _original_estimate -> original_estimate_seconds
    -- _remaining_estimate -> remaining_estimate_seconds
    -- _time_spent -> time_spent_seconds
    -- created -> created_at
    -- creator -> creator_id
    -- issue_type -> issue_type_id
    -- key_ -> issue_key
    -- last_viewed -> last_viewed_at
    -- parent_id -> parent_issue_id
    -- priority -> priority_id
    -- project -> project_id
    -- reporter -> reporter_id
    -- resolution -> resolution_id
    -- resolved -> resolved_at
    -- status -> status_id
    -- status_category_changed -> status_category_changed_at
    -- updated -> updated_at
    SELECT 
        "id" AS "issue_id",
        "_fivetran_deleted" AS "is_deleted",
        "_original_estimate" AS "original_estimate_seconds",
        "_remaining_estimate" AS "remaining_estimate_seconds",
        "_time_spent" AS "time_spent_seconds",
        "assignee",
        "created" AS "created_at",
        "creator" AS "creator_id",
        "description",
        "due_date",
        "environment",
        "issue_type" AS "issue_type_id",
        "key_" AS "issue_key",
        "last_viewed" AS "last_viewed_at",
        "original_estimate",
        "parent_id" AS "parent_issue_id",
        "priority" AS "priority_id",
        "project" AS "project_id",
        "remaining_estimate",
        "reporter" AS "reporter_id",
        "resolution" AS "resolution_id",
        "resolved" AS "resolved_at",
        "status" AS "status_id",
        "status_category_changed" AS "status_category_changed_at",
        "summary",
        "time_spent",
        "updated" AS "updated_at",
        "work_ratio"
    FROM "issue_projected"
),

"issue_projected_renamed_cleaned" AS (
    -- Clean unusual string values: 
    -- issue_key: The problem is inconsistent prefixes in the issue_key column. The prefixes 'TCP' and 'TP' are likely meant to represent the same concept but are inconsistently applied. Given that 'TCP' appears in the most frequent value ('TCP-1'), it's likely the intended prefix. The correct values should all use the 'TCP' prefix for consistency. 
    SELECT
        "issue_id",
        "is_deleted",
        "original_estimate_seconds",
        "remaining_estimate_seconds",
        "time_spent_seconds",
        "assignee",
        "created_at",
        "creator_id",
        "description",
        "due_date",
        "environment",
        "issue_type_id",
        CASE
            WHEN "issue_key" = 'TP-12' THEN 'TCP-12'
            WHEN "issue_key" = 'TP-8' THEN 'TCP-8'
            ELSE "issue_key"
        END AS "issue_key",
        "last_viewed_at",
        "original_estimate",
        "parent_issue_id",
        "priority_id",
        "project_id",
        "remaining_estimate",
        "reporter_id",
        "resolution_id",
        "resolved_at",
        "status_id",
        "status_category_changed_at",
        "summary",
        "time_spent",
        "updated_at",
        "work_ratio"
    FROM "issue_projected_renamed"
),

"issue_projected_renamed_cleaned_casted" AS (
    -- Column Type Casting: 
    -- assignee: from DECIMAL to VARCHAR
    -- created_at: from VARCHAR to TIMESTAMP
    -- due_date: from DECIMAL to DATE
    -- environment: from DECIMAL to VARCHAR
    -- last_viewed_at: from VARCHAR to TIMESTAMP
    -- original_estimate: from DECIMAL to VARCHAR
    -- original_estimate_seconds: from DECIMAL to INT
    -- parent_issue_id: from DECIMAL to INT
    -- remaining_estimate: from DECIMAL to VARCHAR
    -- remaining_estimate_seconds: from DECIMAL to INT
    -- resolution_id: from DECIMAL to INT
    -- resolved_at: from VARCHAR to TIMESTAMP
    -- status_category_changed_at: from VARCHAR to TIMESTAMP
    -- time_spent: from DECIMAL to VARCHAR
    -- time_spent_seconds: from DECIMAL to INT
    -- updated_at: from VARCHAR to TIMESTAMP
    -- work_ratio: from INT to DECIMAL
    SELECT
        "issue_id",
        "is_deleted",
        "creator_id",
        "description",
        "issue_type_id",
        "issue_key",
        "priority_id",
        "project_id",
        "reporter_id",
        "status_id",
        "summary",
        CAST("assignee" AS VARCHAR) AS "assignee",
        CAST("created_at" AS TIMESTAMP) AS "created_at",
        CAST("due_date" AS DATE) AS "due_date",
        CAST("environment" AS VARCHAR) AS "environment",
        CAST("last_viewed_at" AS TIMESTAMP) AS "last_viewed_at",
        CAST("original_estimate" AS VARCHAR) AS "original_estimate",
        CAST("original_estimate_seconds" AS INT) AS "original_estimate_seconds",
        CAST("parent_issue_id" AS INT) AS "parent_issue_id",
        CAST("remaining_estimate" AS VARCHAR) AS "remaining_estimate",
        CAST("remaining_estimate_seconds" AS INT) AS "remaining_estimate_seconds",
        CAST("resolution_id" AS INT) AS "resolution_id",
        CAST("resolved_at" AS TIMESTAMP) AS "resolved_at",
        CAST("status_category_changed_at" AS TIMESTAMP) AS "status_category_changed_at",
        CAST("time_spent" AS VARCHAR) AS "time_spent",
        CAST("time_spent_seconds" AS INT) AS "time_spent_seconds",
        CAST("updated_at" AS TIMESTAMP) AS "updated_at",
        CAST("work_ratio" AS DECIMAL) AS "work_ratio"
    FROM "issue_projected_renamed_cleaned"
)

-- COCOON BLOCK END
SELECT * FROM "issue_projected_renamed_cleaned_casted"

stg_issue.yml (Document the table)

version: 2
models:
- name: stg_issue
  description: The table is about issue tracking. It contains details like issue ID,
    type, status, priority, and project. It also includes information on who created
    and assigned the issue, when it was created and updated, and its resolution status.
    The table tracks time estimates and actual time spent on issues.
  columns:
  - name: issue_id
    description: Unique identifier for the issue
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each issue. For this
        table, each row represents a distinct issue, and issue_id is unique across
        rows.
  - name: is_deleted
    description: Indicates if the record was deleted
    tests:
    - not_null
  - name: creator_id
    description: User who created the issue
    tests:
    - not_null
  - name: description
    description: Detailed description of the issue
    tests:
    - not_null
  - name: issue_type_id
    description: Type of the issue (e.g., bug, task)
    tests:
    - not_null
  - name: issue_key
    description: Unique key for the issue
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique key for the issue. It appears to
        be a combination of project code and issue number, which should be unique
        for each issue.
  - name: priority_id
    description: Priority level of the issue
    tests:
    - not_null
  - name: project_id
    description: Project to which the issue belongs
    tests:
    - not_null
  - name: reporter_id
    description: User who reported the issue
    tests:
    - not_null
  - name: status_id
    description: Current status of the issue
    tests:
    - not_null
  - name: summary
    description: Brief summary of the issue
    tests:
    - not_null
  - name: assignee
    description: User assigned to the issue
    cocoon_meta:
      missing_acceptable: Issue may not be assigned to anyone yet
  - name: created_at
    description: Date and time when the issue was created
    tests:
    - not_null
  - name: due_date
    description: Deadline for the issue
    cocoon_meta:
      missing_acceptable: Issue may not have a set deadline
  - name: environment
    description: Environment where the issue occurs
    cocoon_meta:
      missing_acceptable: Issue may not be environment-specific
  - name: last_viewed_at
    description: Date and time when the issue was last viewed
    tests:
    - not_null
  - name: original_estimate
    description: Original time estimate (human-readable format)
    cocoon_meta:
      missing_acceptable: Time estimate may not be required for all issues
  - name: original_estimate_seconds
    description: Original time estimate for the issue
    cocoon_meta:
      missing_acceptable: Time estimate may not be required for all issues
  - name: parent_issue_id
    description: ID of the parent issue
    cocoon_meta:
      missing_acceptable: Issue may not be a sub-task of another issue
  - name: remaining_estimate
    description: Remaining time estimate (human-readable format)
    cocoon_meta:
      missing_acceptable: Time estimate may not be required for all issues
  - name: remaining_estimate_seconds
    description: Remaining time estimate for the issue
    cocoon_meta:
      missing_acceptable: Time estimate may not be required for all issues
  - name: resolution_id
    description: Resolution identifier
    tests:
    - not_null
  - name: resolved_at
    description: Date and time when the issue was resolved
    tests:
    - not_null
  - name: status_category_changed_at
    description: Date and time when status category last changed
    tests:
    - not_null
  - name: time_spent
    description: Time spent on the issue (human-readable format)
    cocoon_meta:
      missing_acceptable: No time may have been logged on the issue
  - name: time_spent_seconds
    description: Actual time spent on the issue
    cocoon_meta:
      missing_acceptable: No time may have been logged on the issue
  - name: updated_at
    description: Date and time when the issue was last updated
    tests:
    - not_null
  - name: work_ratio
    description: Ratio of work done to total work
    tests:
    - not_null

stg_comment (first 100 rows)

comment_id comment_text is_public issue_id author_id creation_timestamp last_update_author_id last_update_timestamp
0 10001 Hello True 10025 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-10 19:19:41.224 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-10 19:19:41.224
1 10013 To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago True 10035 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717
2 10012 Joined Sample Sprint 2 7 days 9 hours 10 minutes ago True 10035 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717

stg_comment.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"comment_projected" AS (
    -- Projection: Selecting 8 out of 9 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "author_id",
        "body",
        "created",
        "is_public",
        "issue_id",
        "update_author_id",
        "updated"
    FROM "comment"
),

"comment_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> comment_id
    -- body -> comment_text
    -- created -> creation_timestamp
    -- update_author_id -> last_update_author_id
    -- updated -> last_update_timestamp
    SELECT 
        "id" AS "comment_id",
        "author_id",
        "body" AS "comment_text",
        "created" AS "creation_timestamp",
        "is_public",
        "issue_id",
        "update_author_id" AS "last_update_author_id",
        "updated" AS "last_update_timestamp"
    FROM "comment_projected"
),

"comment_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- author_id: from VARCHAR to UUID
    -- creation_timestamp: from VARCHAR to TIMESTAMP
    -- last_update_author_id: from VARCHAR to UUID
    -- last_update_timestamp: from VARCHAR to TIMESTAMP
    SELECT
        "comment_id",
        "comment_text",
        "is_public",
        "issue_id",
        CAST(SPLIT_PART("author_id", ':', 2) AS UUID) AS "author_id",
        CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
        CAST(SUBSTRING("last_update_author_id" FROM POSITION(':' IN "last_update_author_id") + 1) AS UUID) AS "last_update_author_id",
        CAST("last_update_timestamp" AS TIMESTAMP) AS "last_update_timestamp"
    FROM "comment_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "comment_projected_renamed_casted"

stg_comment.yml (Document the table)

version: 2
models:
- name: stg_comment
  description: The table is about comments on issues. Each comment has an ID, author,
    body text, creation date, public visibility status, associated issue ID, update
    author, and update date. Comments are linked to specific issues and can be updated.
    The table tracks both the original creation and any subsequent updates to the
    comment.
  columns:
  - name: comment_id
    description: Unique identifier for the comment
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each comment. For
        this table, each row represents a distinct comment, and comment_id is designed
        to be unique across rows.
  - name: comment_text
    description: Text content of the comment
    tests:
    - not_null
  - name: is_public
    description: Visibility status of the comment
    tests:
    - not_null
  - name: issue_id
    description: ID of the associated issue
    tests:
    - not_null
  - name: author_id
    description: ID of the comment author
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp of comment creation
    tests:
    - not_null
  - name: last_update_author_id
    description: ID of the author who last updated the comment
    tests:
    - not_null
  - name: last_update_timestamp
    description: Timestamp of the last update
    tests:
    - not_null

stg_component (first 100 rows)

component_id component_name project_id component_description
0 10004 Component 5 10001 None
1 10003 Component 4 10001 None
2 10002 Component 3 10001 None
3 10019 Component 2 10001 None

stg_component.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"component_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name",
        "project_id"
    FROM "component"
),

"component_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> component_id
    -- description -> component_description
    -- name -> component_name
    SELECT 
        "id" AS "component_id",
        "description" AS "component_description",
        "name" AS "component_name",
        "project_id"
    FROM "component_projected"
),

"component_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- component_description: from DECIMAL to VARCHAR
    SELECT
        "component_id",
        "component_name",
        "project_id",
        CAST("component_description" AS VARCHAR) AS "component_description"
    FROM "component_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "component_projected_renamed_casted"

stg_component.yml (Document the table)

version: 2
models:
- name: stg_component
  description: The table is about components. Each component has an ID, name, and
    description. Components are associated with projects through a project_id. The
    description field appears to be empty for these samples. The table likely represents
    a list of components within a project management or software development system.
  columns:
  - name: component_id
    description: Unique identifier for the component
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each component. For
        this table, each row represents a distinct component, and component_id appears
        to be unique across rows as it's an ID field.
  - name: component_name
    description: Name of the component
    tests:
    - not_null
  - name: project_id
    description: Identifier of the project the component belongs to
    tests:
    - not_null
  - name: component_description
    description: Brief description of the component
    cocoon_meta:
      missing_acceptable: Optional field, not all components require a description.

stg_status (first 100 rows)

status_description status_name category_id status_id
0 None In Progress 4 10010
1 This issue is being actively worked on at the moment by the assignee. In Progress 4 3
2 None To Do 2 10000

stg_status.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"status_projected" AS (
    -- Projection: Selecting 4 out of 5 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "description",
        "name",
        "status_category_id"
    FROM "status"
),

"status_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> status_id
    -- description -> status_description
    -- name -> status_name
    -- status_category_id -> category_id
    SELECT 
        "id" AS "status_id",
        "description" AS "status_description",
        "name" AS "status_name",
        "status_category_id" AS "category_id"
    FROM "status_projected"
),

"status_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- status_id: from INT to VARCHAR
    SELECT
        "status_description",
        "status_name",
        "category_id",
        CAST("status_id" AS VARCHAR) AS "status_id"
    FROM "status_projected_renamed"
)

-- COCOON BLOCK END
SELECT * FROM "status_projected_renamed_casted"

stg_status.yml (Document the table)

version: 2
models:
- name: stg_status
  description: The table is about issue statuses. It contains unique identifiers,
    descriptions, names, and category IDs for various statuses. The statuses include
    "In Progress" and "To Do". Some entries have detailed descriptions, while others
    are left blank. Each status is associated with a specific category through the
    status_category_id field.
  columns:
  - name: status_description
    description: Detailed explanation of the status
    cocoon_meta:
      missing_acceptable: Not needed for standard/predefined statuses.
  - name: status_name
    description: Name of the status
    tests:
    - not_null
    - accepted_values:
        values:
        - To Do
        - In Progress
        - Blocked
        - In Review
        - Done
        - Cancelled
  - name: category_id
    description: ID of the category the status belongs to
    tests:
    - not_null
  - name: status_id
    description: Unique identifier for the status
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column is described as a unique identifier for the status.
        For this table, each row represents a unique status. The status_id is designed
        to be unique across all rows, ensuring that each status has a distinct identifier.

stg_user (first 100 rows)

user_id language_preference full_name time_zone
0 557058:a9572f6a-3041-435a-96ed-5286e3811b33 en_US Scooby Doo America/Los_Angeles
1 5ed0edbe2999b60c23255b4f en_US Scrappy Cornelius Doo America/Los_Angeles
2 5c3326c24b248c315badf3ee en_US Norville Shaggy Rogers America/Los_Angeles

stg_user.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"user_projected" AS (
    -- Projection: Selecting 6 out of 7 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "id",
        "email",
        "locale",
        "name",
        "time_zone",
        "username"
    FROM "user"
),

"user_projected_renamed" AS (
    -- Rename: Renaming columns
    -- id -> user_id
    -- locale -> language_preference
    -- name -> full_name
    SELECT 
        "id" AS "user_id",
        "email",
        "locale" AS "language_preference",
        "name" AS "full_name",
        "time_zone",
        "username"
    FROM "user_projected"
),

"user_projected_renamed_casted" AS (
    -- Column Type Casting: 
    -- email: from DECIMAL to VARCHAR
    -- username: from DECIMAL to VARCHAR
    SELECT
        "user_id",
        "language_preference",
        "full_name",
        "time_zone",
        CAST("email" AS VARCHAR) AS "email",
        CAST("username" AS VARCHAR) AS "username"
    FROM "user_projected_renamed"
),

"user_projected_renamed_casted_missing_handled" AS (
    -- Handling missing values: There are 2 columns with unacceptable missing values
    -- email has 100.0 percent missing. Strategy: 🗑️ Drop Column
    -- username has 100.0 percent missing. Strategy: 🗑️ Drop Column
    SELECT
        "user_id",
        "language_preference",
        "full_name",
        "time_zone"
    FROM "user_projected_renamed_casted"
)

-- COCOON BLOCK END
SELECT * FROM "user_projected_renamed_casted_missing_handled"

stg_user.yml (Document the table)

version: 2
models:
- name: stg_user
  description: The table is about user profiles. It contains unique identifiers, email
    addresses, language preferences, full names, time zones, and usernames. The samples
    show fictional characters like Scooby Doo and Shaggy Rogers. Some fields are empty,
    suggesting optional information. The table likely represents user account data
    for a system or application.
  columns:
  - name: user_id
    description: Unique identifier for the user account
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each user account.
        For this table, each row represents a distinct user profile. The user_id appears
        to be unique across rows, as it contains both numeric and UUID-style identifiers.
  - name: language_preference
    description: User's language and region preference
    tests:
    - not_null
  - name: full_name
    description: User's full name
    tests:
    - not_null
  - name: time_zone
    description: User's preferred time zone
    tests:
    - not_null

stg_project_board (first 100 rows)

board_id project_id
0 7 10006
1 5 10004

stg_project_board.sql (clean the table)

-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH 
"project_board_projected" AS (
    -- Projection: Selecting 2 out of 3 columns
    -- Columns projected out: ['_fivetran_synced']
    SELECT 
        "board_id",
        "project_id"
    FROM "project_board"
)

-- COCOON BLOCK END
SELECT * FROM "project_board_projected"

stg_project_board.yml (Document the table)

version: 2
models:
- name: stg_project_board
  description: The table 'project_board' represents a relation between project boards
    and projects. It links each board to a specific project. The 'board_id' column
    identifies unique boards. The 'project_id' column associates each board with a
    particular project. This structure allows multiple boards to be organized within
    different projects.
  columns:
  - name: board_id
    description: Unique identifier for each project board
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column represents a unique identifier for each project board.
        For this table, each row represents a distinct project board. The board_id
        is unique across rows as it's described as a unique identifier.
  - name: project_id
    description: Identifier of the project associated with the board
    tests:
    - not_null
Some tables log change events, which may be redundant to query. Instead, we take a snapshot of the latest.

snapshot_issue_multiselect_history (first 100 rows)

record_id field_id issue_id new_field_value
0 /zrY8m6q0VMW6ia1jGIerXqLIgQ= customfield_10020 10027 0
1 w4UiT+hPMxJp3RhL/YFJm3uWL5U= subtasks 10027 0
2 4pVgGn0qSqR2hCmMdo4wWHXmgew= customfield_10021 10027 0
3 3p3gGn0qSqR2hCmMdo4wWHXa32m= components 10027 0

snapshot_issue_multiselect_history.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "issue_id", "field_id"
-- Effective date columns are "change_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "record_id",
    "field_id",
    "issue_id",
    "new_field_value"
FROM (
     SELECT 
            "record_id",
            "field_id",
            "issue_id",
            "new_field_value",
            ROW_NUMBER() OVER (
                PARTITION BY "issue_id", "field_id" 
                ORDER BY "change_timestamp" 
            DESC) AS "cocoon_rn"
    FROM "stg_issue_multiselect_history"
) ranked
WHERE "cocoon_rn" = 1

snapshot_issue_multiselect_history.yml (Document the table)

version: 2
models:
- name: snapshot_issue_multiselect_history
  description: The table is about current issue field values. It tracks the most recent
    state of various fields for each issue. Each row represents the latest value for
    a specific field of an issue. The table includes the issue ID, field ID, and the
    current field value. It provides a snapshot of the current state of issues without
    historical changes or timestamps.
  columns:
  - name: record_id
    description: Unique identifier for each record
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: This column appears to be a unique identifier for each change record.
        For this table, each row represents a single change to an issue field. The
        record_id seems to be unique across rows, as it's described as a unique identifier
        for each record.
  - name: field_id
    description: Identifier of the issue field that changed
    tests:
    - not_null
  - name: issue_id
    description: Identifier of the issue being tracked
    tests:
    - not_null
  - name: new_field_value
    description: New value of the field after the change
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_issue_multiselect_history

snapshot_comment (first 100 rows)

comment_id comment_text is_public issue_id author_id creation_timestamp last_update_author_id
0 10012 Joined Sample Sprint 2 7 days 9 hours 10 minutes ago True 10035 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717 a9572f6a-3041-435a-96ed-5286e3811b33
1 10001 Hello True 10025 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-10 19:19:41.224 a9572f6a-3041-435a-96ed-5286e3811b33
2 10013 To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago True 10035 a9572f6a-3041-435a-96ed-5286e3811b33 2020-11-07 02:45:38.717 a9572f6a-3041-435a-96ed-5286e3811b33

snapshot_comment.sql (clean the table)

-- Slowly Changing Dimension: Dimension keys are "comment_id"
-- Effective date columns are "last_update_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT 
    "comment_id",
    "comment_text",
    "is_public",
    "issue_id",
    "author_id",
    "creation_timestamp",
    "last_update_author_id"
FROM (
     SELECT 
            "comment_id",
            "comment_text",
            "is_public",
            "issue_id",
            "author_id",
            "creation_timestamp",
            "last_update_author_id",
            ROW_NUMBER() OVER (
                PARTITION BY "comment_id" 
                ORDER BY "last_update_timestamp" 
            DESC) AS "cocoon_rn"
    FROM "stg_comment"
) ranked
WHERE "cocoon_rn" = 1

snapshot_comment.yml (Document the table)

version: 2
models:
- name: snapshot_comment
  description: The table is about the latest version of comments on issues. It tracks
    the most recent comment text, public visibility status, associated issue ID, original
    author, and creation timestamp for each unique comment. Each comment is linked
    to a specific issue and author. The table provides a current snapshot of all comments
    without version history.
  columns:
  - name: comment_id
    description: Unique identifier for the comment
    tests:
    - not_null
    - unique
    cocoon_meta:
      uniqueness: Unique dimension key, derived from the slowly changing dimension
  - name: comment_text
    description: Text content of the comment
    tests:
    - not_null
  - name: is_public
    description: Visibility status of the comment
    tests:
    - not_null
  - name: issue_id
    description: ID of the associated issue
    tests:
    - not_null
  - name: author_id
    description: ID of the comment author
    tests:
    - not_null
  - name: creation_timestamp
    description: Timestamp of comment creation
    tests:
    - not_null
  - name: last_update_author_id
    description: ID of the author who last updated the comment
    tests:
    - not_null
cocoon_meta:
  scd_base_table: stg_comment
We identify the primary key (PK) and foreign key (FK) from tables. We build a join graph that connects FK to PK.

Join Graph (FK to PK)

%3 stg_issue_type stg_issue_type stg_field stg_field stg_status stg_status stg_project_category stg_project_category stg_status->stg_project_category stg_project stg_project stg_project->stg_project_category stg_project_board stg_project_board stg_project_board->stg_project stg_status_category stg_status_category stg_status_category->stg_status stg_issue_field_history stg_issue_field_history stg_issue_field_history->stg_field stg_issue stg_issue stg_issue_field_history->stg_issue stg_issue->stg_issue_type stg_issue->stg_status stg_issue->stg_project stg_resolution stg_resolution stg_issue->stg_resolution stg_priority stg_priority stg_issue->stg_priority stg_user stg_user stg_issue->stg_user stg_issue->stg_user stg_issue_link stg_issue_link stg_issue_link->stg_issue stg_issue_link->stg_issue snapshot_comment snapshot_comment snapshot_comment->stg_issue snapshot_comment->stg_user snapshot_comment->stg_user stg_sprint stg_sprint stg_sprint->stg_project_board stg_version stg_version stg_version->stg_project stg_component stg_component stg_component->stg_project stg_field_option stg_field_option stg_field_option->stg_field snapshot_issue_multiselect_history snapshot_issue_multiselect_history snapshot_issue_multiselect_history->stg_issue

cocoon_join.yml (Document the joins)

join_graph:
- table_name: stg_field
  primary_key: field_id
  foreign_keys: []
- table_name: stg_field_option
  foreign_keys:
  - column: field_id
    reference:
      table_name: stg_field
      column: field_id
- table_name: stg_issue_field_history
  foreign_keys:
  - column: field_identifier
    reference:
      table_name: stg_field
      column: field_id
  - column: issue_identifier
    reference:
      table_name: stg_issue
      column: issue_id
- table_name: stg_issue
  primary_key: issue_id
  foreign_keys:
  - column: issue_type_id
    reference:
      table_name: stg_issue_type
      column: issue_type_id
  - column: priority_id
    reference:
      table_name: stg_priority
      column: priority_id
  - column: project_id
    reference:
      table_name: stg_project
      column: project_id
  - column: resolution_id
    reference:
      table_name: stg_resolution
      column: resolution_id
  - column: status_id
    reference:
      table_name: stg_status
      column: status_id
  - column: creator_id
    reference:
      table_name: stg_user
      column: user_id
  - column: reporter_id
    reference:
      table_name: stg_user
      column: user_id
- table_name: stg_issue_link
  foreign_keys:
  - column: issue_id
    reference:
      table_name: stg_issue
      column: issue_id
  - column: related_issue_id
    reference:
      table_name: stg_issue
      column: issue_id
- table_name: snapshot_comment
  foreign_keys:
  - column: issue_id
    reference:
      table_name: stg_issue
      column: issue_id
  - column: author_id
    reference:
      table_name: stg_user
      column: user_id
  - column: last_update_author_id
    reference:
      table_name: stg_user
      column: user_id
- table_name: snapshot_issue_multiselect_history
  foreign_keys:
  - column: issue_id
    reference:
      table_name: stg_issue
      column: issue_id
- table_name: stg_issue_type
  primary_key: issue_type_id
  foreign_keys: []
- table_name: stg_priority
  primary_key: priority_id
  foreign_keys: []
- table_name: stg_project
  primary_key: project_id
  foreign_keys:
  - column: category_id
    reference:
      table_name: stg_project_category
      column: category_id
- table_name: stg_component
  foreign_keys:
  - column: project_id
    reference:
      table_name: stg_project
      column: project_id
- table_name: stg_project_board
  foreign_keys:
  - column: project_id
    reference:
      table_name: stg_project
      column: project_id
  primary_key: board_id
- table_name: stg_version
  foreign_keys:
  - column: project_id
    reference:
      table_name: stg_project
      column: project_id
- table_name: stg_sprint
  foreign_keys:
  - column: board_id
    reference:
      table_name: stg_project_board
      column: board_id
- table_name: stg_project_category
  primary_key: category_id
  foreign_keys: []
- table_name: stg_status
  foreign_keys:
  - column: category_id
    reference:
      table_name: stg_project_category
      column: category_id
  primary_key: status_id
- table_name: stg_resolution
  primary_key: resolution_id
  foreign_keys: []
- table_name: stg_status_category
  foreign_keys:
  - column: status_id
    reference:
      table_name: stg_status
      column: status_id
- table_name: stg_user
  primary_key: user_id
  foreign_keys: []
We identify the entities and relationships behind the tables, and tell the story among these relationships.

cocoon_er.yml (Document the ER model)

entities:
- entity_name: Fields
  entity_description: Represents the various fields used in an issue tracking system,
    including both standard and custom fields.
  table_name: stg_field
  primary_key: field_id
- entity_name: Issues
  entity_description: Represents individual issues or tasks in an issue tracking system,
    including their details, status, and time tracking information.
  table_name: stg_issue
  primary_key: issue_id
- entity_name: Issue Types
  entity_description: Represents different categories or types of issues that can
    be created in the issue tracking system.
  table_name: stg_issue_type
  primary_key: issue_type_id
- entity_name: Priorities
  entity_description: Represents the different priority levels that can be assigned
    to issues, indicating their importance or urgency.
  table_name: stg_priority
  primary_key: priority_id
- entity_name: Projects
  entity_description: Represents individual projects within the system, including
    their details and associated metadata.
  table_name: stg_project
  primary_key: project_id
- entity_name: Project Boards
  entity_description: Represents the boards associated with projects, typically used
    for visualizing and managing project tasks or issues.
  table_name: stg_project_board
  primary_key: board_id
- entity_name: Project Categories
  entity_description: Represents the categories used to classify and organize different
    types of projects within the system.
  table_name: stg_project_category
  primary_key: category_id
- entity_name: Resolutions
  entity_description: Represents the various resolution statuses that can be applied
    to issues when they are completed or closed.
  table_name: stg_resolution
  primary_key: resolution_id
- entity_name: Statuses
  entity_description: Represents the different status options that can be assigned
    to issues, indicating their current state in the workflow.
  table_name: stg_status
  primary_key: status_id
- entity_name: Users
  entity_description: Represents user accounts and profiles within the system, including
    personal information and preferences.
  table_name: stg_user
  primary_key: user_id
relations:
- relation_name: IssueTrackingSystem
  relation_description: Issues are created within Projects, categorized by Issue Types,
    assigned Priorities, tracked through Statuses,  addressed by Resolutions, and
    managed by Users as creators, reporters, or assignees.
  table_name: stg_issue
  entities:
  - Issues
  - Issue Types
  - Priorities
  - Projects
  - Resolutions
  - Statuses
  - Users
- relation_name: ProjectCategorization
  relation_description: Projects are organized into Project Categories, which group
    related projects together for better management and classification.
  table_name: stg_project
  entities:
  - Projects
  - Project Categories
- relation_name: ProjectBoardAssociation
  relation_description: Project Boards are organized within Projects, allowing multiple
    boards to be associated with and managed under specific projects.
  table_name: stg_project_board
  entities:
  - Project Boards
  - Projects
- relation_name: StatusCategoryAssignment
  relation_description: Statuses are assigned to specific Project Categories, indicating
    the current state of tasks or issues within those categories.
  table_name: stg_status
  entities:
  - Statuses
  - Project Categories
- relation_description: This table stores the available options or status labels for
    different fields in a task management or issue tracking system.
  table_name: stg_field_option
  entities:
  - Fields
- relation_name: IssueFieldChangeHistory
  relation_description: Issues contain multiple Fields, and the history of changes
    to these Fields is tracked over time for each Issue.
  table_name: stg_issue_field_history
  entities:
  - Fields
  - Issues
- relation_description: This table represents relationships between Issues, showing
    how one issue is linked to another through specific connection types.
  table_name: stg_issue_link
  entities:
  - Issues
- relation_name: UserIssueComments
  relation_description: Users create and update comments on specific Issues, with
    each comment having visibility settings and timestamps.
  table_name: snapshot_comment
  entities:
  - Issues
  - Users
- relation_description: This table stores the current field values for various Issues,
    providing a snapshot of their latest state.
  table_name: snapshot_issue_multiselect_history
  entities:
  - Issues
- relation_description: This stores the Components that are associated with Projects,
    where each Component belongs to a specific Project.
  table_name: stg_component
  entities:
  - Projects
- relation_description: This stores the versions of software projects, tracking their
    release status, dates, and other attributes.
  table_name: stg_version
  entities:
  - Projects
- relation_description: This table tracks sprints associated with project boards,
    detailing their timelines, status, and identifying information.
  table_name: stg_sprint
  entities:
  - Project Boards
- relation_description: This table stores the Status categories that can be assigned
    to tasks or projects in a task management system.
  table_name: stg_status_category
  entities:
  - Statuses
story:
- relation_name: ProjectCategorization
  story_line: Administrators group projects into specific categories for organization.
- relation_name: ProjectBoardAssociation
  story_line: Project managers create boards within projects for task visualization.
- relation_name: StatusCategoryAssignment
  story_line: Team leads assign statuses to project categories for progress tracking.
- relation_name: IssueTrackingSystem
  story_line: Team members create issues in projects with types, priorities, and assignees.
- relation_name: UserIssueComments
  story_line: Users add comments to issues for collaboration and updates.
- relation_name: IssueFieldChangeHistory
  story_line: System logs changes to issue fields for audit purposes.