comment (first 100 rows)
id | _fivetran_synced | author_id | body | created | is_public | issue_id | update_author_id | updated | |
---|---|---|---|---|---|---|---|---|---|
0 | 10001 | 2020-11-12 12:20:53.148 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | Hello | 2020-11-10 19:19:41.224 | True | 10025 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-10 19:19:41.224 |
1 | 10013 | 2020-11-10 19:21:48.619 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago | 2020-11-07 02:45:38.717 | True | 10035 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 |
2 | 10012 | 2020-11-10 19:21:48.618 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | Joined Sample Sprint 2 7 days 9 hours 10 minutes ago | 2020-11-07 02:45:38.717 | True | 10035 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 |
component (first 100 rows)
id | _fivetran_synced | description | name | project_id | |
---|---|---|---|---|---|
0 | 10004 | 2020-11-23 12:21:02.159 | NaN | Component 5 | 10001 |
1 | 10003 | 2020-11-23 12:21:02.159 | NaN | Component 4 | 10001 |
2 | 10002 | 2020-11-23 12:21:02.158 | NaN | Component 3 | 10001 |
3 | 10019 | 2020-11-23 12:21:02.157 | NaN | Component 2 | 10001 |
epic (first 100 rows)
id | _fivetran_synced | done | key_ | name | summary | |
---|---|---|---|---|---|---|
0 | 10014 | 2020-11-17 12:20:54.934 | False | TP-11 | Test Epic | just for testing |
field (first 100 rows)
id | _fivetran_synced | is_array | is_custom | name | |
---|---|---|---|---|---|
0 | timeoriginalestimate | 2020-11-23 22:20:39.685 | False | False | Original Estimate |
1 | creator | 2020-11-23 22:20:39.711 | False | False | Creator |
2 | issuerestriction | 2020-11-23 22:20:39.643 | False | False | Restrict to |
3 | components | 2020-11-23 22:20:40.643 | False | False | Components |
4 | summary | 2020-11-23 22:20:41.553 | False | False | Summary |
5 | epiclink | 2020-11-23 22:20:41.553 | False | False | Epic-Link |
6 | epic | 2020-11-23 22:20:41.553 | False | False | Epic |
field_option (first 100 rows)
id | _fivetran_synced | name | |
---|---|---|---|
0 | 10104 | 2020-11-23 12:21:00.979 | opt 2 |
1 | 10016 | 2020-11-17 12:20:52.552 | To Do |
2 | 10106 | 2020-11-17 12:20:52.227 | opt 4 |
3 | 10019 | 2020-11-19 12:20:53.110 | Impediment |
4 | 10026 | 2020-11-20 12:20:54.221 | This is the summary of the issue. |
issue (first 100 rows)
id | _fivetran_deleted | _fivetran_synced | _original_estimate | _remaining_estimate | _time_spent | assignee | created | creator | description | due_date | environment | issue_type | key_ | last_viewed | original_estimate | parent_id | priority | project | remaining_estimate | reporter | resolution | resolved | status | status_category_changed | summary | time_spent | updated | work_ratio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10011 | False | 2020-11-06 22:23:38 | NaN | NaN | NaN | NaN | 2020-04-01 08:33:58 | 5c3326c24b248c315badf3ee | Yes, this is an issue to test permissions | NaN | NaN | 10103 | TP-8 | None | NaN | NaN | 3 | 10001 | NaN | 5c3326c24b248c315badf3ee | NaN | 2020-05-29 11:33:53 | 3 | 2020-04-01 08:34:02 | We want to test permissions | NaN | 2020-05-29 11:33:53 | -1 |
1 | 10015 | False | 2020-11-06 22:23:41 | NaN | NaN | NaN | NaN | 2020-06-11 07:38:03 | 5ed0edbe2999b60c23255b4f | None | NaN | NaN | 10103 | TP-12 | None | NaN | NaN | 5 | 10001 | NaN | 5ed0edbe2999b60c23255b4f | NaN | 2020-06-11 07:38:03 | 10003 | 2020-06-11 07:38:03 | Part of test epic | NaN | 2020-06-11 07:38:03 | -1 |
2 | 10018 | False | 2020-11-12 12:20:50 | NaN | NaN | NaN | NaN | 2020-11-10 19:11:45 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | None | NaN | NaN | 10103 | TCP-1 | 2020-11-10 19:11:46 | NaN | NaN | 3 | 10005 | NaN | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 10000.0 | 2020-11-28 19:11:45 | 10003 | 2020-11-28 19:11:46 | this is a story | NaN | 2020-11-28 19:11:45 | -1 |
issue_field_history (first 100 rows)
field_id | issue_id | time_ | _fivetran_synced | value_ | |
---|---|---|---|---|---|
0 | created | 10027 | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.478 | 2020-11-10T19:19:41.472Z |
1 | customfield_10104 | 10027 | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.472 | 3.0 |
2 | summary | 10027 | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.500 | As a developer, I'd like to update story status during the sprint >> Click the Active sprints link at the top right of the screen to go to the Active sprints where the current Sprint's items can be updated |
3 | components | 10018 | 2020-11-10 19:19:41.472 | 2020-11-19 12:20:53.500 | 10019 |
4 | summary | 10018 | 2020-11-10 19:19:41.472 | 2020-11-19 12:20:53.500 | 10026 |
5 | epic | 10015 | 2020-11-10 19:19:41.472 | 2020-11-19 12:20:53.500 | 1111111 |
6 | epiclink | 10018 | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.500 | https://ulr-here/epic-key |
issue_link (first 100 rows)
issue_id | related_issue_id | _fivetran_synced | relationship | |
---|---|---|---|---|
0 | 10002 | 10004 | 2020-11-06 22:23:39.405 | duplicates |
1 | 10004 | 10006 | 2020-11-23 12:21:01.461 | duplicates |
2 | 10004 | 10005 | 2020-11-23 12:21:01.454 | duplicates |
issue_multiselect_history (first 100 rows)
_fivetran_id | time_ | _fivetran_synced | field_id | issue_id | value_ | |
---|---|---|---|---|---|---|
0 | w4UiT+hPMxJp3RhL/YFJm3uWL5U= | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.506 | subtasks | 10027 | 0 |
1 | 4pVgGn0qSqR2hCmMdo4wWHXmgew= | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.479 | customfield_10021 | 10027 | 0 |
2 | /zrY8m6q0VMW6ia1jGIerXqLIgQ= | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.479 | customfield_10020 | 10027 | 0 |
3 | 3p3gGn0qSqR2hCmMdo4wWHXa32m= | 2020-11-10 19:19:41.472 | 2020-11-12 12:20:53.479 | components | 10027 | 0 |
issue_type (first 100 rows)
id | _fivetran_synced | description | name | subtask | |
---|---|---|---|---|---|
0 | 10117 | 2020-11-23 22:20:40.176 | Bugs track problems or errors. | Bug | False |
1 | 10106 | 2020-11-23 22:20:40.171 | A problem or error. | Bug | False |
2 | 10101 | 2020-11-23 22:20:40.177 | Track large pieces of work. | Epic | False |
priority (first 100 rows)
id | _fivetran_synced | description | name | |
---|---|---|---|---|
0 | 2 | 2020-11-06 22:23:39.764 | Serious problem that could block progress. | High |
1 | 5 | 2020-11-06 22:23:41.009 | Trivial problem with little or no impact on progress. | Lowest |
2 | 1 | 2020-11-20 12:20:54.303 | This problem will block progress. | Highest |
3 | 3 | 2020-11-25 12:20:42.395 | Has the potential to affect progress. | Medium |
4 | 4 | 2020-11-17 12:20:52.382 | Minor problem or easily worked around. | Low |
project (first 100 rows)
id | _fivetran_synced | description | key_ | lead_id | name | permission_scheme_id | project_category_id | project_type_key | |
---|---|---|---|---|---|---|---|---|---|
0 | 10008 | 2020-11-17 12:20:50.283 | NaN | TBTP | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | test bug tracking project | 10001 | NaN | software |
1 | 10005 | 2020-11-12 12:20:49.801 | NaN | TCP | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | test classique project | 10001 | NaN | software |
2 | 10001 | 2020-11-23 12:21:02.150 | NaN | TP | 5c3326c24b248c315badf3ee | Test Project | 10001 | 10000.0 | software |
project_board (first 100 rows)
board_id | project_id | _fivetran_synced | |
---|---|---|---|
0 | 7 | 10006 | 2020-11-22 12:20:57.685 |
1 | 5 | 10004 | 2020-11-22 12:20:59.875 |
project_category (first 100 rows)
id | _fivetran_synced | description | name | |
---|---|---|---|---|
0 | 10000 | 2020-11-23 12:21:02.157 | will this create a project_category table? | test-category |
resolution (first 100 rows)
id | _fivetran_synced | description | name | |
---|---|---|---|---|
0 | 10000 | 2020-11-22 12:20:59.338 | Work has been completed on this issue. | Done |
sprint (first 100 rows)
id | _fivetran_synced | board_id | complete_date | end_date | name | start_date | |
---|---|---|---|---|---|---|---|
0 | 4 | 2020-11-22 12:20:59.513 | 5 | None | 2020-11-26 21:25:12 | TNGP Sprint 1 | 2020-11-12 21:25:17.778 |
1 | 1 | 2020-11-22 12:20:57.012 | 0 | 2020-11-10 20:52:38.262 | 2020-11-17 10:29:44.653 | Sample Sprint 2 | 2020-11-03 10:09:44.653 |
2 | 3 | 2020-11-22 12:20:57.298 | 7 | None | 2020-11-26 21:55:00 | Sample Sprint 2 | 2020-11-12 21:55:42.220 |
status (first 100 rows)
id | _fivetran_synced | description | name | status_category_id | |
---|---|---|---|---|---|
0 | 10010 | 2020-11-23 22:20:40.409 | None | In Progress | 4 |
1 | 3 | 2020-11-23 22:20:40.391 | This issue is being actively worked on at the moment by the assignee. | In Progress | 4 |
2 | 10000 | 2020-11-23 22:20:40.404 | None | To Do | 2 |
status_category (first 100 rows)
id | _fivetran_synced | name | |
---|---|---|---|
0 | 3 | 2020-11-23 22:20:40.410 | Done |
1 | 2 | 2020-11-23 22:20:40.411 | To Do |
2 | 4 | 2020-11-23 22:20:40.411 | In Progress |
user (first 100 rows)
id | _fivetran_synced | locale | name | time_zone | username | ||
---|---|---|---|---|---|---|---|
0 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-23 12:21:02.355 | NaN | en_US | Scooby Doo | America/Los_Angeles | NaN |
1 | 5ed0edbe2999b60c23255b4f | 2020-11-23 12:21:02.562 | NaN | en_US | Scrappy Cornelius Doo | America/Los_Angeles | NaN |
2 | 5c3326c24b248c315badf3ee | 2020-11-23 12:21:02.757 | NaN | en_US | Norville Shaggy Rogers | America/Los_Angeles | NaN |
user_group (first 100 rows)
group_name | user_id | _fivetran_synced | |
---|---|---|---|
0 | jira-software-users | 5faf114ad74f69007579568f | 2020-11-20 12:20:54.925 |
1 | site-admins | 5d577670dbb98c0d9c232bb2 | 2020-11-23 12:21:01.943 |
2 | confluence-users | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-23 12:21:02.355 |
version (first 100 rows)
id | _fivetran_synced | archived | description | name | overdue | project_id | release_date | released | start_date | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 10001 | 2021-06-17 14:20:59.271 | False | second version upd | v2.0 | True | 10001 | 2020-06-01 | False | 2020-05-01 |
1 | 10003 | 2021-04-13 16:31:23.435 | False | None | Version 2.0 | True | 10006 | 2020-11-17 | False | None |
2 | 10004 | 2021-04-13 16:31:23.435 | False | None | Version 3.0 | False | 10006 | None | False | None |
stg_field (first 100 rows)
field_id | is_multi_value | is_custom_field | field_name | |
---|---|---|---|---|
0 | timeoriginalestimate | False | False | Original Estimate |
1 | creator | False | False | Creator |
2 | issuerestriction | False | False | Restrict to |
3 | components | False | False | Components |
4 | summary | False | False | Summary |
5 | epiclink | False | False | Epic-Link |
6 | epic | False | False | Epic |
stg_field.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"field_projected" AS (
-- Projection: Selecting 4 out of 5 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"is_array",
"is_custom",
"name"
FROM "field"
),
"field_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> field_id
-- is_array -> is_multi_value
-- is_custom -> is_custom_field
-- name -> field_name
SELECT
"id" AS "field_id",
"is_array" AS "is_multi_value",
"is_custom" AS "is_custom_field",
"name" AS "field_name"
FROM "field_projected"
)
-- COCOON BLOCK END
SELECT * FROM "field_projected_renamed"
stg_field.yml (Document the table)
version: 2
models:
- name: stg_field
description: The table is about fields in an issue tracking system. It contains
information on field IDs, names, and properties. Each row represents a different
field. The "is_array" and "is_custom" columns are boolean flags indicating field
characteristics. The table includes standard fields like summary, components,
and creator.
columns:
- name: field_id
description: Unique identifier for the field
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each field in the
issue tracking system. For this table, each row represents a different field.
The field_id is likely to be unique across rows as it's designed to be an
identifier.
- name: is_multi_value
description: Boolean flag indicating if field can have multiple values
tests:
- not_null
- name: is_custom_field
description: Boolean flag indicating if field is custom-defined
tests:
- not_null
- name: field_name
description: Human-readable name of the field
tests:
- not_null
stg_project (first 100 rows)
project_id | project_key | project_name | permission_scheme_id | project_type | category_id | |
---|---|---|---|---|---|---|
0 | 10008 | TBTP | Test Project | 10001 | software | None |
1 | 10005 | TCP | Test Project | 10001 | software | None |
2 | 10001 | TP | Test Project | 10001 | software | 10000.0 |
stg_project.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"project_projected" AS (
-- Projection: Selecting 8 out of 9 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"key_",
"lead_id",
"name",
"permission_scheme_id",
"project_category_id",
"project_type_key"
FROM "project"
),
"project_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> project_id
-- description -> project_description
-- key_ -> project_key
-- name -> project_name
-- project_category_id -> category_id
-- project_type_key -> project_type
SELECT
"id" AS "project_id",
"description" AS "project_description",
"key_" AS "project_key",
"lead_id",
"name" AS "project_name",
"permission_scheme_id",
"project_category_id" AS "category_id",
"project_type_key" AS "project_type"
FROM "project_projected"
),
"project_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- project_name: The problem is inconsistent capitalization and specificity in project names. "Test Project" is capitalized and generic, while "test bug tracking project" and "test classique project" are lowercase and more specific. The most frequent and generic form is "Test Project", which should be used for consistency.
SELECT
"project_id",
"project_description",
"project_key",
"lead_id",
CASE
WHEN "project_name" = 'test bug tracking project' THEN 'Test Project'
WHEN "project_name" = 'test classique project' THEN 'Test Project'
ELSE "project_name"
END AS "project_name",
"permission_scheme_id",
"category_id",
"project_type"
FROM "project_projected_renamed"
),
"project_projected_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- category_id: from DECIMAL to VARCHAR
-- lead_id: from VARCHAR to UUID
-- project_description: from DECIMAL to VARCHAR
SELECT
"project_id",
"project_key",
"project_name",
"permission_scheme_id",
"project_type",
CAST("category_id" AS VARCHAR) AS "category_id",
CASE
WHEN "lead_id" ~ ':[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
THEN CAST(REGEXP_EXTRACT("lead_id", '([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$') AS UUID)
ELSE TRY_CAST("lead_id" AS UUID)
END AS "lead_id",
CAST("project_description" AS VARCHAR) AS "project_description"
FROM "project_projected_renamed_cleaned"
),
"project_projected_renamed_cleaned_casted_missing_handled" AS (
-- Handling missing values: There are 2 columns with unacceptable missing values
-- lead_id has 100.0 percent missing. Strategy: 🗑️ Drop Column
-- project_description has 100.0 percent missing. Strategy: 🗑️ Drop Column
SELECT
"project_id",
"project_key",
"project_name",
"permission_scheme_id",
"project_type",
"category_id"
FROM "project_projected_renamed_cleaned_casted"
)
-- COCOON BLOCK END
SELECT * FROM "project_projected_renamed_cleaned_casted_missing_handled"
stg_project.yml (Document the table)
version: 2
models:
- name: stg_project
description: The table is about projects. Each project has an ID, description, key,
lead ID, name, permission scheme ID, project category ID, and project type key.
The table includes details like project names, keys, and associated IDs. All projects
listed are software projects with different names and keys.
columns:
- name: project_id
description: Unique identifier for the project
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each project. For
this table, each row represents a distinct project, and project_id is unique
across rows.
- name: project_key
description: Short code or abbreviation for the project
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a short code or abbreviation for the project.
For this table, each row is a distinct project, and the project_key appears
to be unique across rows based on the sample data.
- name: project_name
description: Full name of the project
tests:
- not_null
- name: permission_scheme_id
description: ID of the permission scheme applied to the project
tests:
- not_null
- name: project_type
description: Key indicating the type of project
tests:
- not_null
- accepted_values:
values:
- software
- hardware
- web
- mobile
- desktop
- embedded
- cloud
- AI/ML
- data science
- IoT
- cybersecurity
- blockchain
- game development
- enterprise
- e-commerce
- social media
- education
- healthcare
- fintech
- research
- open source
- name: category_id
description: ID of the category the project belongs to
cocoon_meta:
missing_acceptable: Not all projects require categorization.
stg_version (first 100 rows)
version_id | is_archived | version_description | version_name | is_overdue | project_id | is_released | release_date | start_date | |
---|---|---|---|---|---|---|---|---|---|
0 | 10001 | False | second version upd | Version 2.0 | True | 10001 | False | 2020-06-01 | 2020-05-01 |
1 | 10003 | False | None | Version 2.0 | True | 10006 | False | 2020-11-17 | NaT |
2 | 10004 | False | None | Version 3.0 | False | 10006 | False | NaT | NaT |
stg_version.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"version_projected" AS (
-- Projection: Selecting 9 out of 10 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"archived",
"description",
"name",
"overdue",
"project_id",
"release_date",
"released",
"start_date"
FROM "version"
),
"version_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> version_id
-- archived -> is_archived
-- description -> version_description
-- name -> version_name
-- overdue -> is_overdue
-- released -> is_released
SELECT
"id" AS "version_id",
"archived" AS "is_archived",
"description" AS "version_description",
"name" AS "version_name",
"overdue" AS "is_overdue",
"project_id",
"release_date",
"released" AS "is_released",
"start_date"
FROM "version_projected"
),
"version_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- version_name: The problem is inconsistent formatting for version 2.0. It appears as both "Version 2.0" and "v2.0". The correct values should follow the most common format, which is "Version X.0". "Version 2.0" is already in the correct format and is the most frequent, so "v2.0" should be changed to match it.
SELECT
"version_id",
"is_archived",
"version_description",
CASE
WHEN "version_name" = 'v2.0' THEN 'Version 2.0'
ELSE "version_name"
END AS "version_name",
"is_overdue",
"project_id",
"release_date",
"is_released",
"start_date"
FROM "version_projected_renamed"
),
"version_projected_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- release_date: from VARCHAR to DATE
-- start_date: from VARCHAR to DATE
SELECT
"version_id",
"is_archived",
"version_description",
"version_name",
"is_overdue",
"project_id",
"is_released",
CAST("release_date" AS DATE) AS "release_date",
CAST("start_date" AS DATE) AS "start_date"
FROM "version_projected_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT * FROM "version_projected_renamed_cleaned_casted"
stg_version.yml (Document the table)
version: 2
models:
- name: stg_version
description: The table is about software versions. It includes details like version
ID, name, description, project ID, release dates, and status flags. Each row represents
a specific version with its attributes. The table tracks whether versions are
archived, overdue, or released, along with their start and release dates.
columns:
- name: version_id
description: Unique identifier for the version
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each software version.
For this table, each row represents a specific version of software. The version_id
is likely to be unique across rows as it's designed to distinctly identify
each version.
- name: is_archived
description: Indicates if the version is archived
tests:
- not_null
- name: version_description
description: Brief description of the version
tests:
- not_null
- name: version_name
description: Name or number of the version
tests:
- not_null
- name: is_overdue
description: Indicates if the version is past due date
tests:
- not_null
- name: project_id
description: Identifier of the project the version belongs to
tests:
- not_null
- name: is_released
description: Indicates if the version has been released
tests:
- not_null
- name: release_date
description: Planned or actual date of version release
cocoon_meta:
missing_acceptable: Not applicable for unreleased versions.
- name: start_date
description: Date when work on the version began
tests:
- not_null
stg_status_category (first 100 rows)
status_id | status_name | |
---|---|---|
0 | 3 | Done |
1 | 2 | To Do |
2 | 4 | In Progress |
stg_status_category.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"status_category_projected" AS (
-- Projection: Selecting 2 out of 3 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"name"
FROM "status_category"
),
"status_category_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> status_id
-- name -> status_name
SELECT
"id" AS "status_id",
"name" AS "status_name"
FROM "status_category_projected"
)
-- COCOON BLOCK END
SELECT * FROM "status_category_projected_renamed"
stg_status_category.yml (Document the table)
version: 2
models:
- name: stg_status_category
description: The table is about status categories. It contains different stages
of task completion. Each category has an ID and a name. The categories include
"Done", "To Do", and "In Progress". This table likely serves as a lookup for task
management systems to assign status to tasks or projects.
columns:
- name: status_id
description: Unique identifier for each status category
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each status category.
For this table, each row represents a distinct status category. The status_id
is likely to be unique across rows as it serves as an identifier.
- name: status_name
description: Descriptive name of the status category
tests:
- not_null
- unique
- accepted_values:
values:
- Done
- In Progress
- To Do
- Not Started
- Blocked
- Cancelled
- On Hold
- Deferred
- Under Review
- Completed
- Pending Approval
cocoon_meta:
uniqueness: This column contains the descriptive name of the status category.
For this table, each row represents a distinct status category. While status
names are likely to be unique in this context, it's possible (though unlikely)
that two different statuses could have the same name in some systems.
stg_sprint (first 100 rows)
sprint_id | board_id | sprint_name | completion_date | end_date | start_date | |
---|---|---|---|---|---|---|
0 | 4 | 5 | TNGP Sprint 1 | NaT | 2020-11-26 21:25:12 | 2020-11-12 21:25:17.778 |
1 | 1 | 0 | Sample Sprint 2 | 2020-11-10 20:52:38.262 | 2020-11-17 10:29:44 | 2020-11-03 10:09:44.653 |
2 | 3 | 7 | Sample Sprint 2 | NaT | 2020-11-26 21:55:00 | 2020-11-12 21:55:42.220 |
stg_sprint.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"sprint_projected" AS (
-- Projection: Selecting 6 out of 7 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"board_id",
"complete_date",
"end_date",
"name",
"start_date"
FROM "sprint"
),
"sprint_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> sprint_id
-- complete_date -> completion_date
-- name -> sprint_name
SELECT
"id" AS "sprint_id",
"board_id",
"complete_date" AS "completion_date",
"end_date",
"name" AS "sprint_name",
"start_date"
FROM "sprint_projected"
),
"sprint_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- end_date: The problem is that one of the date-time values ('2020-11-17 10:29:44.653') includes milliseconds, while the others don't. The correct values should all follow the same format without milliseconds, which is the most common format in the dataset.
SELECT
"sprint_id",
"board_id",
"completion_date",
CASE
WHEN "end_date" = '2020-11-17 10:29:44.653' THEN '2020-11-17 10:29:44'
ELSE "end_date"
END AS "end_date",
"sprint_name",
"start_date"
FROM "sprint_projected_renamed"
),
"sprint_projected_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- completion_date: from VARCHAR to TIMESTAMP
-- end_date: from VARCHAR to TIMESTAMP
-- start_date: from VARCHAR to TIMESTAMP
SELECT
"sprint_id",
"board_id",
"sprint_name",
CAST("completion_date" AS TIMESTAMP) AS "completion_date",
CAST("end_date" AS TIMESTAMP) AS "end_date",
CAST("start_date" AS TIMESTAMP) AS "start_date"
FROM "sprint_projected_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT * FROM "sprint_projected_renamed_cleaned_casted"
stg_sprint.yml (Document the table)
version: 2
models:
- name: stg_sprint
description: The table is about sprints. It contains details of each sprint, including
an ID, associated board ID, start date, end date, and name. Some sprints have
a complete date, while others do not. The table tracks the timeline and status
of different sprints across various project boards.
columns:
- name: sprint_id
description: Unique identifier for the sprint
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column is a unique identifier for each sprint. For this table,
each row represents a distinct sprint, and sprint_id is unique across rows.
- name: board_id
description: Identifier for the associated project board
tests:
- not_null
- name: sprint_name
description: Name or title of the sprint
tests:
- not_null
- name: completion_date
description: Date when the sprint was finished
cocoon_meta:
missing_acceptable: Sprint may not be completed yet.
- name: end_date
description: Planned end date of the sprint
tests:
- not_null
- name: start_date
description: Date when the sprint began
tests:
- not_null
stg_project_category (first 100 rows)
category_id | |
---|---|
0 | 10000 |
stg_project_category.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"project_category_projected" AS (
-- Projection: Selecting 3 out of 4 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name"
FROM "project_category"
),
"project_category_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> category_id
-- description -> category_description
-- name -> category_name
SELECT
"id" AS "category_id",
"description" AS "category_description",
"name" AS "category_name"
FROM "project_category_projected"
),
"project_category_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- category_description: The problem is that the category_description column contains a question about database structure instead of an actual category description. This is completely inappropriate for a category description field. The correct value should be an empty string or NULL, as we don't have the actual category description information.
-- category_name: The problem is that 'test-category' appears to be a placeholder value rather than a genuine category name. It's likely used as a temporary or default value during testing or development, and doesn't represent an actual product category. In a real dataset, we would expect to see meaningful category names that describe the types of products being sold.
SELECT
"category_id",
CASE
WHEN "category_description" = 'will this create a project_category table?' THEN ''
ELSE "category_description"
END AS "category_description",
CASE
WHEN "category_name" = 'test-category' THEN ''
ELSE "category_name"
END AS "category_name"
FROM "project_category_projected_renamed"
),
"project_category_projected_renamed_cleaned_null" AS (
-- NULL Imputation: Impute Null to Disguised Missing Values
-- category_description: ['']
-- category_name: ['']
SELECT
CASE
WHEN "category_description" = '' THEN NULL
ELSE "category_description"
END AS "category_description",
CASE
WHEN "category_name" = '' THEN NULL
ELSE "category_name"
END AS "category_name",
"category_id"
FROM "project_category_projected_renamed_cleaned"
),
"project_category_projected_renamed_cleaned_null_missing_handled" AS (
-- Handling missing values: There are 2 columns with unacceptable missing values
-- category_description has 100.0 percent missing. Strategy: 🗑️ Drop Column
-- category_name has 100.0 percent missing. Strategy: 🗑️ Drop Column
SELECT
"category_id"
FROM "project_category_projected_renamed_cleaned_null"
)
-- COCOON BLOCK END
SELECT * FROM "project_category_projected_renamed_cleaned_null_missing_handled"
stg_project_category.yml (Document the table)
version: 2
models:
- name: stg_project_category
description: The table is about project categories. It contains unique identifiers
for each category. It stores the name of each category. It includes a description
for each category. The table likely serves to organize and classify different
types of projects within a system.
columns:
- name: category_id
description: Unique identifier for each project category
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each project category.
For this table, each row corresponds to a distinct project category. The category_id
is unique across rows, as it's described as a "Unique identifier for each
project category".
stg_issue_field_history (first 100 rows)
field_identifier | new_field_value | change_timestamp | issue_identifier | |
---|---|---|---|---|
0 | created | 2020-11-10T19:19:41.472Z | 2020-11-10 19:19:41.472 | 10027 |
1 | customfield_10104 | 3.0 | 2020-11-10 19:19:41.472 | 10027 |
2 | summary | As a developer, I'd like to update story status during the sprint >> Click the Active sprints link at the top right of the screen to go to the Active sprints where the current Sprint's items can be updated | 2020-11-10 19:19:41.472 | 10027 |
3 | components | 10019 | 2020-11-10 19:19:41.472 | 10018 |
4 | summary | 10026 | 2020-11-10 19:19:41.472 | 10018 |
5 | epic | 1111111 | 2020-11-10 19:19:41.472 | 10015 |
6 | epiclink | https://ulr-here/epic-key | 2020-11-10 19:19:41.472 | 10018 |
stg_issue_field_history.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"issue_field_history_projected" AS (
-- Projection: Selecting 4 out of 5 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"field_id",
"issue_id",
"time_",
"value_"
FROM "issue_field_history"
),
"issue_field_history_projected_renamed" AS (
-- Rename: Renaming columns
-- field_id -> field_identifier
-- issue_id -> issue_identifier
-- time_ -> change_timestamp
-- value_ -> new_field_value
SELECT
"field_id" AS "field_identifier",
"issue_id" AS "issue_identifier",
"time_" AS "change_timestamp",
"value_" AS "new_field_value"
FROM "issue_field_history_projected"
),
"issue_field_history_projected_renamed_casted" AS (
-- Column Type Casting:
-- change_timestamp: from VARCHAR to TIMESTAMP
-- issue_identifier: from INT to VARCHAR
SELECT
"field_identifier",
"new_field_value",
CAST("change_timestamp" AS TIMESTAMP) AS "change_timestamp",
CAST("issue_identifier" AS VARCHAR) AS "issue_identifier"
FROM "issue_field_history_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "issue_field_history_projected_renamed_casted"
stg_issue_field_history.yml (Document the table)
version: 2
models:
- name: stg_issue_field_history
description: The table is about issue field history. It tracks changes to various
fields of issues in a tracking system. Each row represents a change, including
the field ID, issue ID, timestamp, and new value. Fields can be system fields
like "created" or custom fields. The table allows for tracking the evolution of
issues over time.
columns:
- name: field_identifier
description: Identifier for the issue field
tests:
- not_null
- name: new_field_value
description: New value of the changed field
tests:
- not_null
- name: change_timestamp
description: Timestamp of the field change
tests:
- not_null
- name: issue_identifier
description: Unique identifier for the issue
tests:
- not_null
stg_field_option (first 100 rows)
field_id | option_name | |
---|---|---|
0 | 10104 | Option 2 |
1 | 10016 | To Do |
2 | 10106 | Option 4 |
3 | 10019 | Impediment |
4 | 10026 | Summary |
stg_field_option.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"field_option_projected" AS (
-- Projection: Selecting 2 out of 3 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"name"
FROM "field_option"
),
"field_option_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> field_id
-- name -> option_name
SELECT
"id" AS "field_id",
"name" AS "option_name"
FROM "field_option_projected"
),
"field_option_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- option_name: The problem is that the option_name column contains inconsistent values. 'This is the summary of the issue.' is a full sentence among shorter labels, which is unusual. 'opt 2' and 'opt 4' are inconsistently named compared to the other options. The correct values should be concise labels that describe the option or status. 'Impediment' and 'To Do' are already in the correct format.
SELECT
"field_id",
CASE
WHEN "option_name" = 'This is the summary of the issue.' THEN 'Summary'
WHEN "option_name" = 'opt 2' THEN 'Option 2'
WHEN "option_name" = 'opt 4' THEN 'Option 4'
ELSE "option_name"
END AS "option_name"
FROM "field_option_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "field_option_projected_renamed_cleaned"
stg_field_option.yml (Document the table)
version: 2
models:
- name: stg_field_option
description: The table is about field options. It contains IDs and names for various
options. These options appear to be status labels or descriptors, possibly used
in a task management or issue tracking system. Each option has a unique numeric
ID and a corresponding text name.
columns:
- name: field_id
description: Unique numeric identifier for each field
tests:
- not_null
- name: option_name
description: Text label or description of the option
tests:
- not_null
- accepted_values:
values:
- Impediment
- Option 2
- Option 4
- Summary
- To Do
- In Progress
- Done
- Blocked
- Backlog
- Ready for Review
- Under Review
- Approved
- Rejected
- Deferred
stg_issue_link (first 100 rows)
issue_id | related_issue_id | relationship_type | |
---|---|---|---|
0 | 10002 | 10004 | duplicates |
1 | 10004 | 10006 | duplicates |
2 | 10004 | 10005 | duplicates |
stg_issue_link.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"issue_link_projected" AS (
-- Projection: Selecting 3 out of 4 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"issue_id",
"related_issue_id",
"relationship"
FROM "issue_link"
),
"issue_link_projected_renamed" AS (
-- Rename: Renaming columns
-- relationship -> relationship_type
SELECT
"issue_id",
"related_issue_id",
"relationship" AS "relationship_type"
FROM "issue_link_projected"
)
-- COCOON BLOCK END
SELECT * FROM "issue_link_projected_renamed"
stg_issue_link.yml (Document the table)
version: 2
models:
- name: stg_issue_link
description: The table is about issue relationships. It shows how different issues
are linked. Each row connects two issues with an "issue_id" and a "related_issue_id".
The "relationship" column describes how they are connected, such as "duplicates".
This structure allows tracking of issue dependencies and similarities across a
project or system.
columns:
- name: issue_id
description: Identifier for the primary issue
tests:
- not_null
- name: related_issue_id
description: Identifier for the related issue
tests:
- not_null
- name: relationship_type
description: Type of connection between the issues
tests:
- not_null
- accepted_values:
values:
- duplicates
- blocks
- is blocked by
- relates to
- parent of
- child of
- clones
- is cloned by
- precedes
- follows
- causes
- is caused by
- similar to
- depends on
- is depended on by
stg_issue_multiselect_history (first 100 rows)
record_id | field_id | issue_id | change_timestamp | new_field_value | |
---|---|---|---|---|---|
0 | w4UiT+hPMxJp3RhL/YFJm3uWL5U= | subtasks | 10027 | 2020-11-10 19:19:41.472 | 0 |
1 | 4pVgGn0qSqR2hCmMdo4wWHXmgew= | customfield_10021 | 10027 | 2020-11-10 19:19:41.472 | 0 |
2 | /zrY8m6q0VMW6ia1jGIerXqLIgQ= | customfield_10020 | 10027 | 2020-11-10 19:19:41.472 | 0 |
3 | 3p3gGn0qSqR2hCmMdo4wWHXa32m= | components | 10027 | 2020-11-10 19:19:41.472 | 0 |
stg_issue_multiselect_history.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"issue_multiselect_history_projected" AS (
-- Projection: Selecting 5 out of 6 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"_fivetran_id",
"time_",
"field_id",
"issue_id",
"value_"
FROM "issue_multiselect_history"
),
"issue_multiselect_history_projected_renamed" AS (
-- Rename: Renaming columns
-- _fivetran_id -> record_id
-- time_ -> change_timestamp
-- value_ -> new_field_value
SELECT
"_fivetran_id" AS "record_id",
"time_" AS "change_timestamp",
"field_id",
"issue_id",
"value_" AS "new_field_value"
FROM "issue_multiselect_history_projected"
),
"issue_multiselect_history_projected_renamed_casted" AS (
-- Column Type Casting:
-- change_timestamp: from VARCHAR to TIMESTAMP
-- new_field_value: from INT to VARCHAR
SELECT
"record_id",
"field_id",
"issue_id",
CAST("change_timestamp" AS TIMESTAMP) AS "change_timestamp",
CAST("new_field_value" AS VARCHAR) AS "new_field_value"
FROM "issue_multiselect_history_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "issue_multiselect_history_projected_renamed_casted"
stg_issue_multiselect_history.yml (Document the table)
version: 2
models:
- name: stg_issue_multiselect_history
description: The table is about issue field value history. It tracks changes to
various fields of an issue over time. Each row represents a change, recording
the field ID, issue ID, new value, and timestamp. The _fivetran_id appears to
be a unique identifier for each record.
columns:
- name: record_id
description: Unique identifier for each record
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column appears to be a unique identifier for each change record.
For this table, each row represents a single change to an issue field. The
record_id seems to be unique across rows, as it's described as a unique identifier
for each record.
- name: field_id
description: Identifier of the issue field that changed
tests:
- not_null
- name: issue_id
description: Identifier of the issue being tracked
tests:
- not_null
- name: change_timestamp
description: Timestamp of when the field value changed
tests:
- not_null
- name: new_field_value
description: New value of the field after the change
tests:
- not_null
stg_user_group (first 100 rows)
group_name | user_identifier | |
---|---|---|
0 | jira-software-users | 5faf114ad74f69007579568f |
1 | site-admins | 5d577670dbb98c0d9c232bb2 |
2 | confluence-users | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 |
stg_user_group.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"user_group_projected" AS (
-- Projection: Selecting 2 out of 3 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"group_name",
"user_id"
FROM "user_group"
),
"user_group_projected_renamed" AS (
-- Rename: Renaming columns
-- user_id -> user_identifier
SELECT
"group_name",
"user_id" AS "user_identifier"
FROM "user_group_projected"
)
-- COCOON BLOCK END
SELECT * FROM "user_group_projected_renamed"
stg_user_group.yml (Document the table)
version: 2
models:
- name: stg_user_group
description: The table represents a many-to-many relationship between user groups
and users. It shows which users belong to which groups. The 'group_name' column
contains the names of different user groups. The 'user_id' column holds unique
identifiers for individual users. This structure allows users to be part of multiple
groups and groups to have multiple users.
columns:
- name: group_name
description: Name of the user group
tests:
- not_null
- accepted_values:
values:
- confluence-users
- jira-software-users
- site-admins
- bitbucket-users
- bamboo-users
- fisheye-users
- crucible-users
- crowd-users
- jira-core-users
- jira-service-desk-users
- confluence-administrators
- jira-administrators
- bitbucket-administrators
- bamboo-administrators
- fisheye-administrators
- crucible-administrators
- crowd-administrators
- system-administrators
- project-administrators
- read-only-users
- developers
- testers
- product-managers
- support-team
- marketing-team
- sales-team
- hr-team
- name: user_identifier
description: Unique identifier for individual users
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column contains unique identifiers for individual users. For
this table, each row represents a relationship between a user group and a
user. The user_identifier is likely to be unique across rows, as it's described
as a unique identifier for users.
stg_priority (first 100 rows)
priority_id | priority_impact | priority_level | |
---|---|---|---|
0 | 2 | Serious problem that could block progress. | High |
1 | 5 | Trivial problem with little or no impact on progress. | Lowest |
2 | 1 | This problem will block progress. | Highest |
3 | 3 | Has the potential to affect progress. | Medium |
4 | 4 | Minor problem or easily worked around. | Low |
stg_priority.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"priority_projected" AS (
-- Projection: Selecting 3 out of 4 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name"
FROM "priority"
),
"priority_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> priority_id
-- description -> priority_impact
-- name -> priority_level
SELECT
"id" AS "priority_id",
"description" AS "priority_impact",
"name" AS "priority_level"
FROM "priority_projected"
)
-- COCOON BLOCK END
SELECT * FROM "priority_projected_renamed"
stg_priority.yml (Document the table)
version: 2
models:
- name: stg_priority
description: The table is about priority levels. It contains five priority levels,
each with an ID, description, and name. The levels range from Lowest to Highest.
Descriptions explain the impact of each priority level on progress. The table
helps categorize and understand the severity of problems or issues.
columns:
- name: priority_id
description: Unique identifier for each priority level
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each priority level.
For this table, each row corresponds to a distinct priority level. The priority_id
is unique across rows as it's designed to be an identifier.
- name: priority_impact
description: Explains the impact of the priority level
tests:
- not_null
- accepted_values:
values:
- Has the potential to affect progress.
- Minor problem or easily worked around.
- Serious problem that could block progress.
- This problem will block progress.
- Trivial problem with little or no impact on progress.
- name: priority_level
description: Short label for the priority level
tests:
- not_null
- accepted_values:
values:
- High
- Highest
- Low
- Lowest
- Medium
stg_resolution (first 100 rows)
resolution_id | status_description | status_name | |
---|---|---|---|
0 | 10000 | Work has been completed on this issue. | Done |
stg_resolution.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"resolution_projected" AS (
-- Projection: Selecting 3 out of 4 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name"
FROM "resolution"
),
"resolution_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> resolution_id
-- description -> status_description
-- name -> status_name
SELECT
"id" AS "resolution_id",
"description" AS "status_description",
"name" AS "status_name"
FROM "resolution_projected"
)
-- COCOON BLOCK END
SELECT * FROM "resolution_projected_renamed"
stg_resolution.yml (Document the table)
version: 2
models:
- name: stg_resolution
description: The table is about resolution statuses for issues or tasks. It contains
details of different resolution states. Each status has an ID, a description explaining
what it means, and a short name. The example shows the "Done" status, indicating
when work on an issue is completed.
columns:
- name: resolution_id
description: Unique identifier for the resolution status
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each resolution status.
For this table, each row represents a distinct resolution status. The resolution_id
is designed to be unique across rows, as it's typically used to uniquely identify
each status.
- name: status_description
description: Detailed explanation of the resolution status
tests:
- not_null
- name: status_name
description: Short name of the resolution status
tests:
- not_null
- accepted_values:
values:
- To Do
- In Progress
- Done
- Blocked
- On Hold
- Cancelled
- Rejected
- Reopened
- Resolved
- Closed
stg_issue_type (first 100 rows)
issue_description | issue_type | is_subtask | issue_type_id | |
---|---|---|---|---|
0 | A problem or error. | Bug | False | 10117 |
1 | A problem or error. | Bug | False | 10106 |
2 | Track large pieces of work. | Epic | False | 10101 |
stg_issue_type.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"issue_type_projected" AS (
-- Projection: Selecting 4 out of 5 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name",
"subtask"
FROM "issue_type"
),
"issue_type_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> issue_type_id
-- description -> issue_description
-- name -> issue_type
-- subtask -> is_subtask
SELECT
"id" AS "issue_type_id",
"description" AS "issue_description",
"name" AS "issue_type",
"subtask" AS "is_subtask"
FROM "issue_type_projected"
),
"issue_type_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- issue_description: The problem is that 'A problem or error.' and 'Bugs track problems or errors.' are redundant representations of the same concept. They both refer to issues or bugs in a system. The correct values should be distinct and represent different types of issues. 'Track large pieces of work.' is a distinct value and doesn't need to be changed.
SELECT
"issue_type_id",
CASE
WHEN "issue_description" = 'Bugs track problems or errors.' THEN 'A problem or error.'
ELSE "issue_description"
END AS "issue_description",
"issue_type",
"is_subtask"
FROM "issue_type_projected_renamed"
),
"issue_type_projected_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- issue_type_id: from INT to VARCHAR
SELECT
"issue_description",
"issue_type",
"is_subtask",
CAST("issue_type_id" AS VARCHAR) AS "issue_type_id"
FROM "issue_type_projected_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT * FROM "issue_type_projected_renamed_cleaned_casted"
stg_issue_type.yml (Document the table)
version: 2
models:
- name: stg_issue_type
description: The table is about issue types. It contains details of different issue
categories. Each issue type has an ID, description, name, and a flag indicating
if it's a subtask. The table includes examples like Bug and Epic. It provides
a brief explanation of each issue type's purpose or characteristics.
columns:
- name: issue_description
description: Brief explanation of the issue type
tests:
- not_null
- accepted_values:
values:
- A problem or error
- Track large pieces of work
- A new feature request
- An improvement suggestion
- A task to be completed
- A question or inquiry
- Documentation update
- Security vulnerability
- Performance issue
- User interface/experience issue
- Testing/QA task
- Maintenance task
- Integration issue
- Configuration change
- Data-related issue
- A problem or error.
- Track large pieces of work.
- name: issue_type
description: Short name of the issue type
tests:
- not_null
- accepted_values:
values:
- Bug
- Epic
- Task
- Story
- Subtask
- Improvement
- New Feature
- Problem
- Incident
- Change Request
- Defect
- Risk
- Spike
- name: is_subtask
description: Indicates if the issue type is a subtask
tests:
- not_null
- name: issue_type_id
description: Unique identifier for the issue type
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each issue type.
For this table, each row represents a distinct issue type. The issue_type_id
is likely to be unique across rows as it's described as a "Unique identifier
for the issue type".
stg_issue (first 100 rows)
issue_id | is_deleted | creator_id | description | issue_type_id | issue_key | priority_id | project_id | reporter_id | status_id | summary | assignee | created_at | due_date | environment | last_viewed_at | original_estimate | original_estimate_seconds | parent_issue_id | remaining_estimate | remaining_estimate_seconds | resolution_id | resolved_at | status_category_changed_at | time_spent | time_spent_seconds | updated_at | work_ratio | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10011 | False | 5c3326c24b248c315badf3ee | Yes, this is an issue to test permissions | 10103 | TCP-8 | 3 | 10001 | 5c3326c24b248c315badf3ee | 3 | We want to test permissions | None | 2020-04-01 08:33:58 | NaT | None | NaT | None | NaN | NaN | None | NaN | NaN | 2020-05-29 11:33:53 | 2020-04-01 08:34:02 | None | NaN | 2020-05-29 11:33:53 | -1.0 |
1 | 10015 | False | 5ed0edbe2999b60c23255b4f | None | 10103 | TCP-12 | 5 | 10001 | 5ed0edbe2999b60c23255b4f | 10003 | Part of test epic | None | 2020-06-11 07:38:03 | NaT | None | NaT | None | NaN | NaN | None | NaN | NaN | 2020-06-11 07:38:03 | 2020-06-11 07:38:03 | None | NaN | 2020-06-11 07:38:03 | -1.0 |
2 | 10018 | False | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | None | 10103 | TCP-1 | 3 | 10005 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | 10003 | this is a story | None | 2020-11-10 19:11:45 | NaT | None | 2020-11-10 19:11:46 | None | NaN | NaN | None | NaN | 10000.0 | 2020-11-28 19:11:45 | 2020-11-28 19:11:46 | None | NaN | 2020-11-28 19:11:45 | -1.0 |
stg_issue.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"issue_projected" AS (
-- Projection: Selecting 28 out of 29 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"_fivetran_deleted",
"_original_estimate",
"_remaining_estimate",
"_time_spent",
"assignee",
"created",
"creator",
"description",
"due_date",
"environment",
"issue_type",
"key_",
"last_viewed",
"original_estimate",
"parent_id",
"priority",
"project",
"remaining_estimate",
"reporter",
"resolution",
"resolved",
"status",
"status_category_changed",
"summary",
"time_spent",
"updated",
"work_ratio"
FROM "issue"
),
"issue_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> issue_id
-- _fivetran_deleted -> is_deleted
-- _original_estimate -> original_estimate_seconds
-- _remaining_estimate -> remaining_estimate_seconds
-- _time_spent -> time_spent_seconds
-- created -> created_at
-- creator -> creator_id
-- issue_type -> issue_type_id
-- key_ -> issue_key
-- last_viewed -> last_viewed_at
-- parent_id -> parent_issue_id
-- priority -> priority_id
-- project -> project_id
-- reporter -> reporter_id
-- resolution -> resolution_id
-- resolved -> resolved_at
-- status -> status_id
-- status_category_changed -> status_category_changed_at
-- updated -> updated_at
SELECT
"id" AS "issue_id",
"_fivetran_deleted" AS "is_deleted",
"_original_estimate" AS "original_estimate_seconds",
"_remaining_estimate" AS "remaining_estimate_seconds",
"_time_spent" AS "time_spent_seconds",
"assignee",
"created" AS "created_at",
"creator" AS "creator_id",
"description",
"due_date",
"environment",
"issue_type" AS "issue_type_id",
"key_" AS "issue_key",
"last_viewed" AS "last_viewed_at",
"original_estimate",
"parent_id" AS "parent_issue_id",
"priority" AS "priority_id",
"project" AS "project_id",
"remaining_estimate",
"reporter" AS "reporter_id",
"resolution" AS "resolution_id",
"resolved" AS "resolved_at",
"status" AS "status_id",
"status_category_changed" AS "status_category_changed_at",
"summary",
"time_spent",
"updated" AS "updated_at",
"work_ratio"
FROM "issue_projected"
),
"issue_projected_renamed_cleaned" AS (
-- Clean unusual string values:
-- issue_key: The problem is inconsistent prefixes in the issue_key column. The prefixes 'TCP' and 'TP' are likely meant to represent the same concept but are inconsistently applied. Given that 'TCP' appears in the most frequent value ('TCP-1'), it's likely the intended prefix. The correct values should all use the 'TCP' prefix for consistency.
SELECT
"issue_id",
"is_deleted",
"original_estimate_seconds",
"remaining_estimate_seconds",
"time_spent_seconds",
"assignee",
"created_at",
"creator_id",
"description",
"due_date",
"environment",
"issue_type_id",
CASE
WHEN "issue_key" = 'TP-12' THEN 'TCP-12'
WHEN "issue_key" = 'TP-8' THEN 'TCP-8'
ELSE "issue_key"
END AS "issue_key",
"last_viewed_at",
"original_estimate",
"parent_issue_id",
"priority_id",
"project_id",
"remaining_estimate",
"reporter_id",
"resolution_id",
"resolved_at",
"status_id",
"status_category_changed_at",
"summary",
"time_spent",
"updated_at",
"work_ratio"
FROM "issue_projected_renamed"
),
"issue_projected_renamed_cleaned_casted" AS (
-- Column Type Casting:
-- assignee: from DECIMAL to VARCHAR
-- created_at: from VARCHAR to TIMESTAMP
-- due_date: from DECIMAL to DATE
-- environment: from DECIMAL to VARCHAR
-- last_viewed_at: from VARCHAR to TIMESTAMP
-- original_estimate: from DECIMAL to VARCHAR
-- original_estimate_seconds: from DECIMAL to INT
-- parent_issue_id: from DECIMAL to INT
-- remaining_estimate: from DECIMAL to VARCHAR
-- remaining_estimate_seconds: from DECIMAL to INT
-- resolution_id: from DECIMAL to INT
-- resolved_at: from VARCHAR to TIMESTAMP
-- status_category_changed_at: from VARCHAR to TIMESTAMP
-- time_spent: from DECIMAL to VARCHAR
-- time_spent_seconds: from DECIMAL to INT
-- updated_at: from VARCHAR to TIMESTAMP
-- work_ratio: from INT to DECIMAL
SELECT
"issue_id",
"is_deleted",
"creator_id",
"description",
"issue_type_id",
"issue_key",
"priority_id",
"project_id",
"reporter_id",
"status_id",
"summary",
CAST("assignee" AS VARCHAR) AS "assignee",
CAST("created_at" AS TIMESTAMP) AS "created_at",
CAST("due_date" AS DATE) AS "due_date",
CAST("environment" AS VARCHAR) AS "environment",
CAST("last_viewed_at" AS TIMESTAMP) AS "last_viewed_at",
CAST("original_estimate" AS VARCHAR) AS "original_estimate",
CAST("original_estimate_seconds" AS INT) AS "original_estimate_seconds",
CAST("parent_issue_id" AS INT) AS "parent_issue_id",
CAST("remaining_estimate" AS VARCHAR) AS "remaining_estimate",
CAST("remaining_estimate_seconds" AS INT) AS "remaining_estimate_seconds",
CAST("resolution_id" AS INT) AS "resolution_id",
CAST("resolved_at" AS TIMESTAMP) AS "resolved_at",
CAST("status_category_changed_at" AS TIMESTAMP) AS "status_category_changed_at",
CAST("time_spent" AS VARCHAR) AS "time_spent",
CAST("time_spent_seconds" AS INT) AS "time_spent_seconds",
CAST("updated_at" AS TIMESTAMP) AS "updated_at",
CAST("work_ratio" AS DECIMAL) AS "work_ratio"
FROM "issue_projected_renamed_cleaned"
)
-- COCOON BLOCK END
SELECT * FROM "issue_projected_renamed_cleaned_casted"
stg_issue.yml (Document the table)
version: 2
models:
- name: stg_issue
description: The table is about issue tracking. It contains details like issue ID,
type, status, priority, and project. It also includes information on who created
and assigned the issue, when it was created and updated, and its resolution status.
The table tracks time estimates and actual time spent on issues.
columns:
- name: issue_id
description: Unique identifier for the issue
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each issue. For this
table, each row represents a distinct issue, and issue_id is unique across
rows.
- name: is_deleted
description: Indicates if the record was deleted
tests:
- not_null
- name: creator_id
description: User who created the issue
tests:
- not_null
- name: description
description: Detailed description of the issue
tests:
- not_null
- name: issue_type_id
description: Type of the issue (e.g., bug, task)
tests:
- not_null
- name: issue_key
description: Unique key for the issue
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique key for the issue. It appears to
be a combination of project code and issue number, which should be unique
for each issue.
- name: priority_id
description: Priority level of the issue
tests:
- not_null
- name: project_id
description: Project to which the issue belongs
tests:
- not_null
- name: reporter_id
description: User who reported the issue
tests:
- not_null
- name: status_id
description: Current status of the issue
tests:
- not_null
- name: summary
description: Brief summary of the issue
tests:
- not_null
- name: assignee
description: User assigned to the issue
cocoon_meta:
missing_acceptable: Issue may not be assigned to anyone yet
- name: created_at
description: Date and time when the issue was created
tests:
- not_null
- name: due_date
description: Deadline for the issue
cocoon_meta:
missing_acceptable: Issue may not have a set deadline
- name: environment
description: Environment where the issue occurs
cocoon_meta:
missing_acceptable: Issue may not be environment-specific
- name: last_viewed_at
description: Date and time when the issue was last viewed
tests:
- not_null
- name: original_estimate
description: Original time estimate (human-readable format)
cocoon_meta:
missing_acceptable: Time estimate may not be required for all issues
- name: original_estimate_seconds
description: Original time estimate for the issue
cocoon_meta:
missing_acceptable: Time estimate may not be required for all issues
- name: parent_issue_id
description: ID of the parent issue
cocoon_meta:
missing_acceptable: Issue may not be a sub-task of another issue
- name: remaining_estimate
description: Remaining time estimate (human-readable format)
cocoon_meta:
missing_acceptable: Time estimate may not be required for all issues
- name: remaining_estimate_seconds
description: Remaining time estimate for the issue
cocoon_meta:
missing_acceptable: Time estimate may not be required for all issues
- name: resolution_id
description: Resolution identifier
tests:
- not_null
- name: resolved_at
description: Date and time when the issue was resolved
tests:
- not_null
- name: status_category_changed_at
description: Date and time when status category last changed
tests:
- not_null
- name: time_spent
description: Time spent on the issue (human-readable format)
cocoon_meta:
missing_acceptable: No time may have been logged on the issue
- name: time_spent_seconds
description: Actual time spent on the issue
cocoon_meta:
missing_acceptable: No time may have been logged on the issue
- name: updated_at
description: Date and time when the issue was last updated
tests:
- not_null
- name: work_ratio
description: Ratio of work done to total work
tests:
- not_null
stg_comment (first 100 rows)
comment_id | comment_text | is_public | issue_id | author_id | creation_timestamp | last_update_author_id | last_update_timestamp | |
---|---|---|---|---|---|---|---|---|
0 | 10001 | Hello | True | 10025 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-10 19:19:41.224 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-10 19:19:41.224 |
1 | 10013 | To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago | True | 10035 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 |
2 | 10012 | Joined Sample Sprint 2 7 days 9 hours 10 minutes ago | True | 10035 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 |
stg_comment.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"comment_projected" AS (
-- Projection: Selecting 8 out of 9 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"author_id",
"body",
"created",
"is_public",
"issue_id",
"update_author_id",
"updated"
FROM "comment"
),
"comment_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> comment_id
-- body -> comment_text
-- created -> creation_timestamp
-- update_author_id -> last_update_author_id
-- updated -> last_update_timestamp
SELECT
"id" AS "comment_id",
"author_id",
"body" AS "comment_text",
"created" AS "creation_timestamp",
"is_public",
"issue_id",
"update_author_id" AS "last_update_author_id",
"updated" AS "last_update_timestamp"
FROM "comment_projected"
),
"comment_projected_renamed_casted" AS (
-- Column Type Casting:
-- author_id: from VARCHAR to UUID
-- creation_timestamp: from VARCHAR to TIMESTAMP
-- last_update_author_id: from VARCHAR to UUID
-- last_update_timestamp: from VARCHAR to TIMESTAMP
SELECT
"comment_id",
"comment_text",
"is_public",
"issue_id",
CAST(SPLIT_PART("author_id", ':', 2) AS UUID) AS "author_id",
CAST("creation_timestamp" AS TIMESTAMP) AS "creation_timestamp",
CAST(SUBSTRING("last_update_author_id" FROM POSITION(':' IN "last_update_author_id") + 1) AS UUID) AS "last_update_author_id",
CAST("last_update_timestamp" AS TIMESTAMP) AS "last_update_timestamp"
FROM "comment_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "comment_projected_renamed_casted"
stg_comment.yml (Document the table)
version: 2
models:
- name: stg_comment
description: The table is about comments on issues. Each comment has an ID, author,
body text, creation date, public visibility status, associated issue ID, update
author, and update date. Comments are linked to specific issues and can be updated.
The table tracks both the original creation and any subsequent updates to the
comment.
columns:
- name: comment_id
description: Unique identifier for the comment
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each comment. For
this table, each row represents a distinct comment, and comment_id is designed
to be unique across rows.
- name: comment_text
description: Text content of the comment
tests:
- not_null
- name: is_public
description: Visibility status of the comment
tests:
- not_null
- name: issue_id
description: ID of the associated issue
tests:
- not_null
- name: author_id
description: ID of the comment author
tests:
- not_null
- name: creation_timestamp
description: Timestamp of comment creation
tests:
- not_null
- name: last_update_author_id
description: ID of the author who last updated the comment
tests:
- not_null
- name: last_update_timestamp
description: Timestamp of the last update
tests:
- not_null
stg_component (first 100 rows)
component_id | component_name | project_id | component_description | |
---|---|---|---|---|
0 | 10004 | Component 5 | 10001 | None |
1 | 10003 | Component 4 | 10001 | None |
2 | 10002 | Component 3 | 10001 | None |
3 | 10019 | Component 2 | 10001 | None |
stg_component.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"component_projected" AS (
-- Projection: Selecting 4 out of 5 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name",
"project_id"
FROM "component"
),
"component_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> component_id
-- description -> component_description
-- name -> component_name
SELECT
"id" AS "component_id",
"description" AS "component_description",
"name" AS "component_name",
"project_id"
FROM "component_projected"
),
"component_projected_renamed_casted" AS (
-- Column Type Casting:
-- component_description: from DECIMAL to VARCHAR
SELECT
"component_id",
"component_name",
"project_id",
CAST("component_description" AS VARCHAR) AS "component_description"
FROM "component_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "component_projected_renamed_casted"
stg_component.yml (Document the table)
version: 2
models:
- name: stg_component
description: The table is about components. Each component has an ID, name, and
description. Components are associated with projects through a project_id. The
description field appears to be empty for these samples. The table likely represents
a list of components within a project management or software development system.
columns:
- name: component_id
description: Unique identifier for the component
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each component. For
this table, each row represents a distinct component, and component_id appears
to be unique across rows as it's an ID field.
- name: component_name
description: Name of the component
tests:
- not_null
- name: project_id
description: Identifier of the project the component belongs to
tests:
- not_null
- name: component_description
description: Brief description of the component
cocoon_meta:
missing_acceptable: Optional field, not all components require a description.
stg_status (first 100 rows)
status_description | status_name | category_id | status_id | |
---|---|---|---|---|
0 | None | In Progress | 4 | 10010 |
1 | This issue is being actively worked on at the moment by the assignee. | In Progress | 4 | 3 |
2 | None | To Do | 2 | 10000 |
stg_status.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"status_projected" AS (
-- Projection: Selecting 4 out of 5 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"description",
"name",
"status_category_id"
FROM "status"
),
"status_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> status_id
-- description -> status_description
-- name -> status_name
-- status_category_id -> category_id
SELECT
"id" AS "status_id",
"description" AS "status_description",
"name" AS "status_name",
"status_category_id" AS "category_id"
FROM "status_projected"
),
"status_projected_renamed_casted" AS (
-- Column Type Casting:
-- status_id: from INT to VARCHAR
SELECT
"status_description",
"status_name",
"category_id",
CAST("status_id" AS VARCHAR) AS "status_id"
FROM "status_projected_renamed"
)
-- COCOON BLOCK END
SELECT * FROM "status_projected_renamed_casted"
stg_status.yml (Document the table)
version: 2
models:
- name: stg_status
description: The table is about issue statuses. It contains unique identifiers,
descriptions, names, and category IDs for various statuses. The statuses include
"In Progress" and "To Do". Some entries have detailed descriptions, while others
are left blank. Each status is associated with a specific category through the
status_category_id field.
columns:
- name: status_description
description: Detailed explanation of the status
cocoon_meta:
missing_acceptable: Not needed for standard/predefined statuses.
- name: status_name
description: Name of the status
tests:
- not_null
- accepted_values:
values:
- To Do
- In Progress
- Blocked
- In Review
- Done
- Cancelled
- name: category_id
description: ID of the category the status belongs to
tests:
- not_null
- name: status_id
description: Unique identifier for the status
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column is described as a unique identifier for the status.
For this table, each row represents a unique status. The status_id is designed
to be unique across all rows, ensuring that each status has a distinct identifier.
stg_user (first 100 rows)
user_id | language_preference | full_name | time_zone | |
---|---|---|---|---|
0 | 557058:a9572f6a-3041-435a-96ed-5286e3811b33 | en_US | Scooby Doo | America/Los_Angeles |
1 | 5ed0edbe2999b60c23255b4f | en_US | Scrappy Cornelius Doo | America/Los_Angeles |
2 | 5c3326c24b248c315badf3ee | en_US | Norville Shaggy Rogers | America/Los_Angeles |
stg_user.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"user_projected" AS (
-- Projection: Selecting 6 out of 7 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"id",
"email",
"locale",
"name",
"time_zone",
"username"
FROM "user"
),
"user_projected_renamed" AS (
-- Rename: Renaming columns
-- id -> user_id
-- locale -> language_preference
-- name -> full_name
SELECT
"id" AS "user_id",
"email",
"locale" AS "language_preference",
"name" AS "full_name",
"time_zone",
"username"
FROM "user_projected"
),
"user_projected_renamed_casted" AS (
-- Column Type Casting:
-- email: from DECIMAL to VARCHAR
-- username: from DECIMAL to VARCHAR
SELECT
"user_id",
"language_preference",
"full_name",
"time_zone",
CAST("email" AS VARCHAR) AS "email",
CAST("username" AS VARCHAR) AS "username"
FROM "user_projected_renamed"
),
"user_projected_renamed_casted_missing_handled" AS (
-- Handling missing values: There are 2 columns with unacceptable missing values
-- email has 100.0 percent missing. Strategy: 🗑️ Drop Column
-- username has 100.0 percent missing. Strategy: 🗑️ Drop Column
SELECT
"user_id",
"language_preference",
"full_name",
"time_zone"
FROM "user_projected_renamed_casted"
)
-- COCOON BLOCK END
SELECT * FROM "user_projected_renamed_casted_missing_handled"
stg_user.yml (Document the table)
version: 2
models:
- name: stg_user
description: The table is about user profiles. It contains unique identifiers, email
addresses, language preferences, full names, time zones, and usernames. The samples
show fictional characters like Scooby Doo and Shaggy Rogers. Some fields are empty,
suggesting optional information. The table likely represents user account data
for a system or application.
columns:
- name: user_id
description: Unique identifier for the user account
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each user account.
For this table, each row represents a distinct user profile. The user_id appears
to be unique across rows, as it contains both numeric and UUID-style identifiers.
- name: language_preference
description: User's language and region preference
tests:
- not_null
- name: full_name
description: User's full name
tests:
- not_null
- name: time_zone
description: User's preferred time zone
tests:
- not_null
stg_project_board (first 100 rows)
board_id | project_id | |
---|---|---|
0 | 7 | 10006 |
1 | 5 | 10004 |
stg_project_board.sql (clean the table)
-- COCOON BLOCK START: PLEASE DO NOT MODIFY THIS BLOCK FOR SELF-MAINTENANCE
WITH
"project_board_projected" AS (
-- Projection: Selecting 2 out of 3 columns
-- Columns projected out: ['_fivetran_synced']
SELECT
"board_id",
"project_id"
FROM "project_board"
)
-- COCOON BLOCK END
SELECT * FROM "project_board_projected"
stg_project_board.yml (Document the table)
version: 2
models:
- name: stg_project_board
description: The table 'project_board' represents a relation between project boards
and projects. It links each board to a specific project. The 'board_id' column
identifies unique boards. The 'project_id' column associates each board with a
particular project. This structure allows multiple boards to be organized within
different projects.
columns:
- name: board_id
description: Unique identifier for each project board
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column represents a unique identifier for each project board.
For this table, each row represents a distinct project board. The board_id
is unique across rows as it's described as a unique identifier.
- name: project_id
description: Identifier of the project associated with the board
tests:
- not_null
snapshot_issue_multiselect_history (first 100 rows)
record_id | field_id | issue_id | new_field_value | |
---|---|---|---|---|
0 | /zrY8m6q0VMW6ia1jGIerXqLIgQ= | customfield_10020 | 10027 | 0 |
1 | w4UiT+hPMxJp3RhL/YFJm3uWL5U= | subtasks | 10027 | 0 |
2 | 4pVgGn0qSqR2hCmMdo4wWHXmgew= | customfield_10021 | 10027 | 0 |
3 | 3p3gGn0qSqR2hCmMdo4wWHXa32m= | components | 10027 | 0 |
snapshot_issue_multiselect_history.sql (clean the table)
-- Slowly Changing Dimension: Dimension keys are "issue_id", "field_id"
-- Effective date columns are "change_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT
"record_id",
"field_id",
"issue_id",
"new_field_value"
FROM (
SELECT
"record_id",
"field_id",
"issue_id",
"new_field_value",
ROW_NUMBER() OVER (
PARTITION BY "issue_id", "field_id"
ORDER BY "change_timestamp"
DESC) AS "cocoon_rn"
FROM "stg_issue_multiselect_history"
) ranked
WHERE "cocoon_rn" = 1
snapshot_issue_multiselect_history.yml (Document the table)
version: 2
models:
- name: snapshot_issue_multiselect_history
description: The table is about current issue field values. It tracks the most recent
state of various fields for each issue. Each row represents the latest value for
a specific field of an issue. The table includes the issue ID, field ID, and the
current field value. It provides a snapshot of the current state of issues without
historical changes or timestamps.
columns:
- name: record_id
description: Unique identifier for each record
tests:
- not_null
- unique
cocoon_meta:
uniqueness: This column appears to be a unique identifier for each change record.
For this table, each row represents a single change to an issue field. The
record_id seems to be unique across rows, as it's described as a unique identifier
for each record.
- name: field_id
description: Identifier of the issue field that changed
tests:
- not_null
- name: issue_id
description: Identifier of the issue being tracked
tests:
- not_null
- name: new_field_value
description: New value of the field after the change
tests:
- not_null
cocoon_meta:
scd_base_table: stg_issue_multiselect_history
snapshot_comment (first 100 rows)
comment_id | comment_text | is_public | issue_id | author_id | creation_timestamp | last_update_author_id | |
---|---|---|---|---|---|---|---|
0 | 10012 | Joined Sample Sprint 2 7 days 9 hours 10 minutes ago | True | 10035 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 | a9572f6a-3041-435a-96ed-5286e3811b33 |
1 | 10001 | Hello | True | 10025 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-10 19:19:41.224 | a9572f6a-3041-435a-96ed-5286e3811b33 |
2 | 10013 | To Do to In Progress 6 days 22 hours 26 minutes ago In Progress to Done 3 days 16 hours 34 minutes ago | True | 10035 | a9572f6a-3041-435a-96ed-5286e3811b33 | 2020-11-07 02:45:38.717 | a9572f6a-3041-435a-96ed-5286e3811b33 |
snapshot_comment.sql (clean the table)
-- Slowly Changing Dimension: Dimension keys are "comment_id"
-- Effective date columns are "last_update_timestamp"
-- We will create Type 1 SCD (latest snapshot)
SELECT
"comment_id",
"comment_text",
"is_public",
"issue_id",
"author_id",
"creation_timestamp",
"last_update_author_id"
FROM (
SELECT
"comment_id",
"comment_text",
"is_public",
"issue_id",
"author_id",
"creation_timestamp",
"last_update_author_id",
ROW_NUMBER() OVER (
PARTITION BY "comment_id"
ORDER BY "last_update_timestamp"
DESC) AS "cocoon_rn"
FROM "stg_comment"
) ranked
WHERE "cocoon_rn" = 1
snapshot_comment.yml (Document the table)
version: 2
models:
- name: snapshot_comment
description: The table is about the latest version of comments on issues. It tracks
the most recent comment text, public visibility status, associated issue ID, original
author, and creation timestamp for each unique comment. Each comment is linked
to a specific issue and author. The table provides a current snapshot of all comments
without version history.
columns:
- name: comment_id
description: Unique identifier for the comment
tests:
- not_null
- unique
cocoon_meta:
uniqueness: Unique dimension key, derived from the slowly changing dimension
- name: comment_text
description: Text content of the comment
tests:
- not_null
- name: is_public
description: Visibility status of the comment
tests:
- not_null
- name: issue_id
description: ID of the associated issue
tests:
- not_null
- name: author_id
description: ID of the comment author
tests:
- not_null
- name: creation_timestamp
description: Timestamp of comment creation
tests:
- not_null
- name: last_update_author_id
description: ID of the author who last updated the comment
tests:
- not_null
cocoon_meta:
scd_base_table: stg_comment
Join Graph (FK to PK)
cocoon_join.yml (Document the joins)
join_graph:
- table_name: stg_field
primary_key: field_id
foreign_keys: []
- table_name: stg_field_option
foreign_keys:
- column: field_id
reference:
table_name: stg_field
column: field_id
- table_name: stg_issue_field_history
foreign_keys:
- column: field_identifier
reference:
table_name: stg_field
column: field_id
- column: issue_identifier
reference:
table_name: stg_issue
column: issue_id
- table_name: stg_issue
primary_key: issue_id
foreign_keys:
- column: issue_type_id
reference:
table_name: stg_issue_type
column: issue_type_id
- column: priority_id
reference:
table_name: stg_priority
column: priority_id
- column: project_id
reference:
table_name: stg_project
column: project_id
- column: resolution_id
reference:
table_name: stg_resolution
column: resolution_id
- column: status_id
reference:
table_name: stg_status
column: status_id
- column: creator_id
reference:
table_name: stg_user
column: user_id
- column: reporter_id
reference:
table_name: stg_user
column: user_id
- table_name: stg_issue_link
foreign_keys:
- column: issue_id
reference:
table_name: stg_issue
column: issue_id
- column: related_issue_id
reference:
table_name: stg_issue
column: issue_id
- table_name: snapshot_comment
foreign_keys:
- column: issue_id
reference:
table_name: stg_issue
column: issue_id
- column: author_id
reference:
table_name: stg_user
column: user_id
- column: last_update_author_id
reference:
table_name: stg_user
column: user_id
- table_name: snapshot_issue_multiselect_history
foreign_keys:
- column: issue_id
reference:
table_name: stg_issue
column: issue_id
- table_name: stg_issue_type
primary_key: issue_type_id
foreign_keys: []
- table_name: stg_priority
primary_key: priority_id
foreign_keys: []
- table_name: stg_project
primary_key: project_id
foreign_keys:
- column: category_id
reference:
table_name: stg_project_category
column: category_id
- table_name: stg_component
foreign_keys:
- column: project_id
reference:
table_name: stg_project
column: project_id
- table_name: stg_project_board
foreign_keys:
- column: project_id
reference:
table_name: stg_project
column: project_id
primary_key: board_id
- table_name: stg_version
foreign_keys:
- column: project_id
reference:
table_name: stg_project
column: project_id
- table_name: stg_sprint
foreign_keys:
- column: board_id
reference:
table_name: stg_project_board
column: board_id
- table_name: stg_project_category
primary_key: category_id
foreign_keys: []
- table_name: stg_status
foreign_keys:
- column: category_id
reference:
table_name: stg_project_category
column: category_id
primary_key: status_id
- table_name: stg_resolution
primary_key: resolution_id
foreign_keys: []
- table_name: stg_status_category
foreign_keys:
- column: status_id
reference:
table_name: stg_status
column: status_id
- table_name: stg_user
primary_key: user_id
foreign_keys: []
cocoon_er.yml (Document the ER model)
entities:
- entity_name: Fields
entity_description: Represents the various fields used in an issue tracking system,
including both standard and custom fields.
table_name: stg_field
primary_key: field_id
- entity_name: Issues
entity_description: Represents individual issues or tasks in an issue tracking system,
including their details, status, and time tracking information.
table_name: stg_issue
primary_key: issue_id
- entity_name: Issue Types
entity_description: Represents different categories or types of issues that can
be created in the issue tracking system.
table_name: stg_issue_type
primary_key: issue_type_id
- entity_name: Priorities
entity_description: Represents the different priority levels that can be assigned
to issues, indicating their importance or urgency.
table_name: stg_priority
primary_key: priority_id
- entity_name: Projects
entity_description: Represents individual projects within the system, including
their details and associated metadata.
table_name: stg_project
primary_key: project_id
- entity_name: Project Boards
entity_description: Represents the boards associated with projects, typically used
for visualizing and managing project tasks or issues.
table_name: stg_project_board
primary_key: board_id
- entity_name: Project Categories
entity_description: Represents the categories used to classify and organize different
types of projects within the system.
table_name: stg_project_category
primary_key: category_id
- entity_name: Resolutions
entity_description: Represents the various resolution statuses that can be applied
to issues when they are completed or closed.
table_name: stg_resolution
primary_key: resolution_id
- entity_name: Statuses
entity_description: Represents the different status options that can be assigned
to issues, indicating their current state in the workflow.
table_name: stg_status
primary_key: status_id
- entity_name: Users
entity_description: Represents user accounts and profiles within the system, including
personal information and preferences.
table_name: stg_user
primary_key: user_id
relations:
- relation_name: IssueTrackingSystem
relation_description: Issues are created within Projects, categorized by Issue Types,
assigned Priorities, tracked through Statuses, addressed by Resolutions, and
managed by Users as creators, reporters, or assignees.
table_name: stg_issue
entities:
- Issues
- Issue Types
- Priorities
- Projects
- Resolutions
- Statuses
- Users
- relation_name: ProjectCategorization
relation_description: Projects are organized into Project Categories, which group
related projects together for better management and classification.
table_name: stg_project
entities:
- Projects
- Project Categories
- relation_name: ProjectBoardAssociation
relation_description: Project Boards are organized within Projects, allowing multiple
boards to be associated with and managed under specific projects.
table_name: stg_project_board
entities:
- Project Boards
- Projects
- relation_name: StatusCategoryAssignment
relation_description: Statuses are assigned to specific Project Categories, indicating
the current state of tasks or issues within those categories.
table_name: stg_status
entities:
- Statuses
- Project Categories
- relation_description: This table stores the available options or status labels for
different fields in a task management or issue tracking system.
table_name: stg_field_option
entities:
- Fields
- relation_name: IssueFieldChangeHistory
relation_description: Issues contain multiple Fields, and the history of changes
to these Fields is tracked over time for each Issue.
table_name: stg_issue_field_history
entities:
- Fields
- Issues
- relation_description: This table represents relationships between Issues, showing
how one issue is linked to another through specific connection types.
table_name: stg_issue_link
entities:
- Issues
- relation_name: UserIssueComments
relation_description: Users create and update comments on specific Issues, with
each comment having visibility settings and timestamps.
table_name: snapshot_comment
entities:
- Issues
- Users
- relation_description: This table stores the current field values for various Issues,
providing a snapshot of their latest state.
table_name: snapshot_issue_multiselect_history
entities:
- Issues
- relation_description: This stores the Components that are associated with Projects,
where each Component belongs to a specific Project.
table_name: stg_component
entities:
- Projects
- relation_description: This stores the versions of software projects, tracking their
release status, dates, and other attributes.
table_name: stg_version
entities:
- Projects
- relation_description: This table tracks sprints associated with project boards,
detailing their timelines, status, and identifying information.
table_name: stg_sprint
entities:
- Project Boards
- relation_description: This table stores the Status categories that can be assigned
to tasks or projects in a task management system.
table_name: stg_status_category
entities:
- Statuses
story:
- relation_name: ProjectCategorization
story_line: Administrators group projects into specific categories for organization.
- relation_name: ProjectBoardAssociation
story_line: Project managers create boards within projects for task visualization.
- relation_name: StatusCategoryAssignment
story_line: Team leads assign statuses to project categories for progress tracking.
- relation_name: IssueTrackingSystem
story_line: Team members create issues in projects with types, priorities, and assignees.
- relation_name: UserIssueComments
story_line: Users add comments to issues for collaboration and updates.
- relation_name: IssueFieldChangeHistory
story_line: System logs changes to issue fields for audit purposes.