This SQL query creates a daily performance report for Amplitude events. It starts by joining event data with a date spine to ensure all dates are represented. It then aggregates event data to calculate daily counts of events, sessions, users, and new users for each event type. Finally, it combines this information, fills in any missing values with zeros, and generates a unique key for each daily record.
IntegrationAggregationFeaturizationCleaningWITH event_enhanced AS (
SELECT
*
FROM TEST.PUBLIC_amplitude.amplitude__event_enhanced
), date_spine AS (
SELECT
spine.*
FROM TEST.PUBLIC_int_amplitude.int_amplitude__date_spine AS spine
), agg_event_data AS (
SELECT
event_day,
event_type,
COUNT(DISTINCT unique_event_id) AS number_events,
COUNT(DISTINCT unique_session_id) AS number_sessions,
COUNT(DISTINCT amplitude_user_id) AS number_users,
COUNT(
DISTINCT (
CASE
WHEN CAST(DATE_TRUNC('DAY', user_creation_time) AS DATE) = event_day
THEN amplitude_user_id
END
)
) AS number_new_users
FROM event_enhanced
GROUP BY
1,
2
), spine_joined AS (
SELECT
date_spine.event_day,
date_spine.event_type,
agg_event_data.number_events,
agg_event_data.number_sessions,
agg_event_data.number_users,
agg_event_data.number_new_users
FROM date_spine
LEFT JOIN agg_event_data
ON date_spine.event_day = agg_event_data.event_day
AND date_spine.event_type = agg_event_data.event_type
), final AS (
SELECT
event_day,
event_type,
COALESCE(number_events, 0) AS number_events,
COALESCE(number_sessions, 0) AS number_sessions,
COALESCE(number_users, 0) AS number_users,
COALESCE(number_new_users, 0) AS number_new_users,
MD5(
CAST(COALESCE(CAST(event_day AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(event_type AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS daily_unique_key
FROM spine_joined
)
SELECT
*
FROM final
Name | Type | Comment |
---|---|---|
event_day | date | Day of event time |
event_type | text | The assigned type of event. Example 'Add Friend' |
number_events | bigint | Count of distinct events |
number_sessions | bigint | Count of distinct sessions |
number_users | bigint | Count of distinct visitors |
number_new_users | bigint | Count of distinct new visitors for that day |
daily_unique_key | text | None |
This SQL query combines and enhances event data from Amplitude. It deduplicates raw event data, joins it with event type and session data, and adds additional columns such as session and user event numbers. The query also incorporates various event and session-related metrics, and finally generates a unique key for each event.
DeduplicationIntegrationFeaturizationCleaningWITH event_data_raw AS (
SELECT
events.*
FROM TEST.PUBLIC__source_amplitude.stg_amplitude__event AS events
), event_data /* deduplicate */ AS (
SELECT
*
FROM (
SELECT
*,
CASE
WHEN NOT _insert_id IS NULL
THEN ROW_NUMBER() OVER (PARTITION BY _insert_id ORDER BY client_upload_time DESC)
ELSE ROW_NUMBER() OVER (PARTITION BY event_id, device_id, client_event_time, amplitude_user_id ORDER BY client_upload_time DESC)
END AS nth_event_record
FROM event_data_raw
) AS duplicates
WHERE
nth_event_record = 1
), event_type AS (
SELECT
*
FROM TEST.PUBLIC__source_amplitude.stg_amplitude__event_type
), session_data AS (
SELECT
*
FROM TEST.PUBLIC_amplitude.amplitude__sessions
), event_enhanced AS (
SELECT
event_data.unique_event_id,
event_data.unique_session_id,
CAST(event_data.amplitude_user_id AS TEXT) AS amplitude_user_id,
event_data.event_id,
event_data.event_type,
event_data.event_time,
event_data.event_day,
event_type.event_type_id,
event_type.event_type_name,
event_data.session_id,
ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_time ASC) AS session_event_number,
ROW_NUMBER() OVER (PARTITION BY amplitude_user_id ORDER BY event_time ASC) AS user_event_number,
event_data.group_types,
CAST(event_data.user_id AS TEXT) AS user_id,
event_data.user_creation_time,
event_data.amplitude_id,
event_data.app,
event_data.project_name,
event_data.version_name,
event_data.client_event_time,
event_data.client_upload_time,
event_data.server_received_time,
event_data.server_upload_time,
event_data.city,
event_data.country,
event_data.region,
event_data.data,
event_data.location_lat,
event_data.location_lng,
event_data.device_brand,
event_data.device_carrier,
event_data.device_family,
event_data.device_id,
event_data.device_manufacturer,
event_data.device_model,
event_data.device_type,
event_data.ip_address,
event_data.os_name,
event_data.os_version,
event_data.platform,
event_data.language,
event_data.dma,
event_data.schema,
event_data.start_version,
event_type.totals,
event_type.value,
session_data.events_per_session,
session_data.session_started_at,
session_data.session_ended_at,
session_data.user_session_number,
session_data.session_started_at_day,
session_data.session_ended_at_day,
session_data.session_length_in_minutes,
session_data.is_first_user_session,
session_data.minutes_in_between_sessions
FROM event_data
LEFT JOIN event_type
ON event_data.event_type_id = event_type.event_type_id
LEFT JOIN session_data
ON event_data.unique_session_id = session_data.unique_session_id
), final AS (
SELECT
*,
MD5(
CAST(COALESCE(CAST(unique_event_id AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(event_day AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS unique_key
FROM event_enhanced
)
SELECT
*
FROM final
Name | Type | Comment |
---|---|---|
unique_event_id | text | A unique key for each event, concatenating event_id, device_id, and client_event_time |
unique_session_id | text | Unique key of each session, concatenating user_id and session_id |
amplitude_user_id | text | Unique key for each user, which is user_id by default. If user_id does not exist, use amplitude_id |
event_id | bigint | A counter that distinguishes events |
event_type | text | The assigned type of event. Example 'Add Friend' |
event_time | timestamp without time zone | Amplitude timestamp (UTC) which is the client_event_time adjusted by the difference between server_received_time and client_upload_time, specifically: event_time = client_event_time + (server_received_time - client_upload_time) 2015-08-10T12:00:00.000000 We use this timestamp to organize events on Amplitude charts. Note: If the difference between server_received_time and client_upload_time is less than 60 seconds, the event_time will not be adjusted and will equal the client_event_time |
event_day | date | Day of event time |
session_id | bigint | The session start time in milliseconds since epoch. All events within the same session share the same session ID. By default, the setting in Amplitude for the session property is session ID. All events with the same session ID and the same user ID will be grouped into the same session. The session ID does not have to be unique across multiple users. Session IDs for events sent via the Amplitude SDKs are automatically generated and managed. However, for events sent via the HTTP API, Amplitude defaults to a session ID of -1. This means the event is excluded from all session metrics. |
amplitude_id | text | An internal ID used to count unique users. |
app | bigint | Project ID found in your project's Settings page |
project_name | bigint | Project ID |
version_name | integer | The app version |
client_event_time | timestamp without time zone | Local timestamp (UTC) of when the device logged the event. |
client_upload_time | timestamp without time zone | The local timestamp (UTC) of when the device uploaded the event. |
server_received_time | timestamp without time zone | Amplitude timestamp (UTC) of when servers received the event |
server_upload_time | timestamp without time zone | Amplitude timestamp (UTC) of when the event data was uploaded to Amplitude servers |
city | integer | City |
country | text | Country |
region | integer | Region |
data | text | Dictionary where certain fields such as first_event and merged_amplitude_id are stored. |
location_lat | integer | Latitude |
location_lng | integer | Longitude |
device_brand | integer | Device brand |
device_carrier | integer | Device carrier. |
device_family | integer | Device family. Apple iPhone |
device_id | text | The device specific identifier |
device_manufacturer | integer | Device manufacturer |
device_model | integer | The device model |
device_type | integer | Device type |
ip_address | text | IP address |
os_name | integer | OS name |
os_version | integer | OS version |
platform | integer | Platform of the device. |
language | integer | The language set by the user |
dma | integer | Designated marketing area (DMA) |
schema | bigint | Schema ID |
start_version | integer | App version the user was first tracked on |
user_creation_time | timestamp without time zone | Event_time (UTC) of the user's first event |
session_event_number | bigint | The number of the event within the session, ordered by time of event. |
group_types | text | Group types. "Groups" in the source table, renamed here. See the Accounts documentation for more information. |
user_id | text | A readable ID specified by you. Amplitude uses a combination of three different methods to identify your users: device IDs, Amplitude ID, and user IDs. The first comes directly from your users' devices, while the second is an ID that Amplitude automatically creates once it has enough information to conclusively identify a unique user. The user ID, however, is something you'd set up. In Amplitude, a user ID is a unique identifier applied to individual users. Using them is optional, but recommended: your product should set a user ID once a user has created an account, logged in, or is otherwise identified in your product. Amplitude can use a user ID to reconcile events across multiple devices under the same user ID. Additionally, a user's event data will be merged on the backend so that all anonymous events prior to the assignment of a user ID will be connected to the correct user. For this reason, you can wait to assign user IDs if that makes sense for your product. Conversely, this is also why you should not set user IDs for anonymous users. Once set, user IDs in Amplitude cannot be changed.Once set, user IDs in Amplitude cannot be changed. |
event_type_id | bigint | Event type ID |
event_type_name | text | Event type name |
totals | integer | The total number of times the event has happened this week |
value | text | Name of the event in the raw data. |
events_per_session | bigint | The count of events in a session |
session_started_at | timestamp without time zone | The time of the first event in a session |
session_ended_at | timestamp without time zone | The time of the last event in a session |
session_length | None | The length of the session, in seconds |
is_first_user_session | integer | Boolean of whether or not a session is the user's first session |
user_event_number | bigint | None |
user_session_number | bigint | None |
session_started_at_day | timestamp without time zone | None |
session_ended_at_day | timestamp without time zone | None |
session_length_in_minutes | double precision | None |
minutes_in_between_sessions | double precision | None |
unique_key | text | None |
This SQL query processes event data from Amplitude to create a comprehensive session analysis. It deduplicates raw event data, aggregates events into sessions, calculates session metrics (such as duration and events per session), ranks sessions by user, and determines time between sessions. The query also identifies first sessions for each user and calculates the time gap between consecutive sessions.
DeduplicationAggregationFeaturizationCleaningWITH event_data_raw AS (
SELECT
events.*
FROM TEST.PUBLIC__source_amplitude.stg_amplitude__event AS events
), event_data /* deduplicate */ AS (
SELECT
*
FROM (
SELECT
*,
CASE
WHEN NOT _insert_id IS NULL
THEN ROW_NUMBER() OVER (PARTITION BY _insert_id ORDER BY client_upload_time DESC)
ELSE ROW_NUMBER() OVER (PARTITION BY event_id, device_id, client_event_time, amplitude_user_id ORDER BY client_upload_time DESC)
END AS nth_event_record
FROM event_data_raw
) AS duplicates
WHERE
nth_event_record = 1
), session_agg AS (
SELECT
unique_session_id,
user_id,
COUNT(event_id) AS events_per_session,
MIN(event_time) AS session_started_at,
MAX(event_time) AS session_ended_at,
DATEDIFF(second, MIN(event_time), MAX(event_time)) / 60 AS session_length_in_minutes
FROM event_data
GROUP BY
1,
2
), session_ranking AS (
SELECT
unique_session_id,
user_id,
events_per_session,
session_started_at,
session_ended_at,
session_length_in_minutes,
DATE_TRUNC('DAY', session_started_at) AS session_started_at_day,
DATE_TRUNC('DAY', session_ended_at) AS session_ended_at_day,
CASE
WHEN NOT user_id IS NULL
THEN ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY session_started_at)
ELSE NULL
END AS user_session_number
FROM session_agg
), session_lag AS (
SELECT
*,
CASE
WHEN NOT user_id IS NULL
THEN LAG(session_ended_at, 1) OVER (PARTITION BY user_id ORDER BY session_ended_at)
ELSE NULL
END AS last_session_ended_at, /* determine prior sessions' end time, then in the following cte calculate the difference between current session's start time and last session's end time to determine the time in between sessions */
CASE
WHEN NOT user_id IS NULL
THEN LAG(session_ended_at_day, 1) OVER (PARTITION BY user_id ORDER BY session_ended_at_day)
ELSE NULL
END AS last_session_ended_at_day
FROM session_ranking
)
SELECT
*,
CASE WHEN user_session_number = 1 THEN 1 ELSE 0 END AS is_first_user_session,
CASE
WHEN NOT user_id IS NULL
THEN DATEDIFF(second, last_session_ended_at, SESSION_STARTED_AT) / 60
ELSE NULL
END AS minutes_in_between_sessions
FROM session_lag
Name | Type | Comment |
---|---|---|
unique_session_id | text | Unique key of each session, concatenating user_id and session_id |
user_id | text | A readable ID specified by you. Amplitude uses a combination of three different methods to identify your users: device IDs, Amplitude ID, and user IDs. The first comes directly from your users' devices, while the second is an ID that Amplitude automatically creates once it has enough information to conclusively identify a unique user. The user ID, however, is something you'd set up. In Amplitude, a user ID is a unique identifier applied to individual users. Using them is optional, but recommended: your product should set a user ID once a user has created an account, logged in, or is otherwise identified in your product. Amplitude can use a user ID to reconcile events across multiple devices under the same user ID. Additionally, a user's event data will be merged on the backend so that all anonymous events prior to the assignment of a user ID will be connected to the correct user. For this reason, you can wait to assign user IDs if that makes sense for your product. Conversely, this is also why you should not set user IDs for anonymous users. Once set, user IDs in Amplitude cannot be changed.Once set, user IDs in Amplitude cannot be changed. |
events_per_session | bigint | The count of events in a session |
session_started_at | timestamp without time zone | The time of the first event in a session |
session_ended_at | timestamp without time zone | The time of the last event in a session |
session_length | None | The length of the session, in seconds |
session_started_at_day | timestamp without time zone | The day the session started |
session_ended_at_day | timestamp without time zone | The day the session ended |
user_session_number | bigint | The number of the session, within a user's lifetime |
last_session_ended_at | timestamp without time zone | The time the prior session ended at |
last_session_ended_at_day | timestamp without time zone | The day the prior session ended |
is_first_user_session | integer | Boolean of whether or not a session is the user's first session |
seconds_in_between_sessions | None | The seconds in between sessions of a user |
minutes_in_between_sessions | double precision | The minutes in between sessions of a user |
days_in_between_sessions | None | The days in between sessions of a user |
session_length_in_minutes | double precision | None |
This query combines data from two tables (amplitude__event_enhanced and amplitude__sessions) to create a user-centric view. It calculates various user metrics including their creation time, first and last session times, total events and sessions, average session length, and average time between sessions. The results are grouped by the user ID, providing a comprehensive overview of user activity and engagement.
IntegrationAggregationFeaturizationWITH event_enhanced AS (
SELECT
*
FROM TEST.PUBLIC_amplitude.amplitude__event_enhanced
), session_data AS (
SELECT
*
FROM TEST.PUBLIC_amplitude.amplitude__sessions
)
SELECT
event_enhanced.amplitude_user_id,
MIN(event_enhanced.user_creation_time) AS user_created_at,
MIN(event_enhanced.session_started_at) AS first_session_at,
MAX(event_enhanced.session_ended_at) AS last_session_at,
COUNT(DISTINCT event_enhanced.unique_event_id) AS total_events_per_user,
COUNT(DISTINCT session_data.unique_session_id) AS total_sessions_per_user,
AVG(session_data.session_length_in_minutes) AS average_session_length_in_minutes,
AVG(session_data.minutes_in_between_sessions) AS average_minutes_in_between_sessions
FROM event_enhanced
LEFT JOIN session_data
ON event_enhanced.unique_session_id = session_data.unique_session_id
GROUP BY
1
Name | Type | Comment |
---|---|---|
amplitude_user_id | text | Unique key for each user, which is user_id by default. If user_id does not exist, use amplitude_id |
user_created_at | timestamp without time zone | Event_time (UTC) of the user's first event |
first_session_at | timestamp without time zone | Timestamp of user's first recorded activity |
last_session_at | timestamp without time zone | Timestamp of user's most recent activity |
total_events_per_user | bigint | The total number of events for a certain user |
total_sessions_per_user | bigint | The total number of sessions for a certain user |
average_session_length | None | Average length of sessions, in seconds |
average_seconds_in_between_sessions | None | Average time in between sessions for a user, in seconds |
average_minutes_in_between_sessions | double precision | Average time in between sessions for a user, in minutes |
average_days_in_between_sessions | None | Average time in between sessions for a user, in days |
average_session_length_in_minutes | double precision | None |
This SQL query creates a date spine (a continuous series of dates) for each event type in the amplitude__event_enhanced table. It generates dates from 2020-01-01 to 2024-08-01, then joins this with the event data to create a record for each event type on every day since its first occurrence. The query also generates a unique key for each date-event type combination.
IntegrationFeaturizationOtherWITH event_data AS (
SELECT
*
FROM TEST.PUBLIC_amplitude.amplitude__event_enhanced
), spine /* create end_date_adjust variable */ AS (
SELECT
spine.*
FROM (
WITH rawdata AS (
WITH p AS (
SELECT
0 AS generated_number
UNION ALL
SELECT
1
), unioned AS (
SELECT
p0.generated_number * POWER(2, 0) + p1.generated_number * POWER(2, 1) + p2.generated_number * POWER(2, 2) + p3.generated_number * POWER(2, 3) + p4.generated_number * POWER(2, 4) + p5.generated_number * POWER(2, 5) + p6.generated_number * POWER(2, 6) + p7.generated_number * POWER(2, 7) + p8.generated_number * POWER(2, 8) + p9.generated_number * POWER(2, 9) + p10.generated_number * POWER(2, 10) + 1 AS generated_number
FROM p AS p0
CROSS JOIN p AS p1
CROSS JOIN p AS p2
CROSS JOIN p AS p3
CROSS JOIN p AS p4
CROSS JOIN p AS p5
CROSS JOIN p AS p6
CROSS JOIN p AS p7
CROSS JOIN p AS p8
CROSS JOIN p AS p9
CROSS JOIN p AS p10
)
SELECT
*
FROM unioned
WHERE
generated_number <= 1674
ORDER BY
generated_number
), all_periods AS (
SELECT
(
DATEADD(day, ROW_NUMBER() OVER (ORDER BY 1) - 1, CAST('2020-01-01' AS DATE))
) AS date_day
FROM rawdata
), filtered AS (
SELECT
*
FROM all_periods
WHERE
date_day <= CAST('2024-08-01' AS DATE)
)
SELECT
*
FROM filtered
) AS spine
), date_spine AS (
SELECT DISTINCT
event_data.event_type,
CAST(spine.date_day AS DATE) AS event_day,
MD5(
CAST(COALESCE(CAST(spine.date_day AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(event_data.event_type AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS date_spine_unique_key
FROM spine
JOIN event_data
ON spine.date_day >= event_data.event_day /* each event_type will have a record for every day since their first day */
)
SELECT
*
FROM date_spine
Name | Type | Comment |
---|---|---|
event_type | text | None |
event_day | date | None |
date_spine_unique_key | text | None |
This SQL query processes data from an Amplitude event source. It performs several transformations including casting data types, creating derived columns (like event_day and unique_session_id), and applying filters based on event time. The query also generates a unique event ID using an MD5 hash of multiple fields. It selects a wide range of columns, some directly from the source and others with modifications or as newly calculated fields.
FilteringCleaningFeaturizationOtherWITH base AS (
SELECT
*
FROM TEST.PUBLIC__source_amplitude.stg_amplitude__event_tmp
), fields AS (
SELECT
CAST(NULL AS TIMESTAMP) AS _fivetran_synced,
CAST(NULL AS TEXT) AS _insert_id,
CAST(NULL AS TEXT) AS ad_id,
CAST(NULL AS TEXT) AS amplitude_id,
CAST(NULL AS TEXT) AS app,
CAST(NULL AS TEXT) AS city,
CAST(NULL AS TIMESTAMP) AS client_event_time,
CAST(NULL AS TIMESTAMP) AS client_upload_time,
CAST(NULL AS TEXT) AS country,
CAST(NULL AS TEXT) AS data,
CAST(NULL AS TEXT) AS device_brand,
CAST(NULL AS TEXT) AS device_carrier,
CAST(NULL AS TEXT) AS device_family,
CAST(NULL AS TEXT) AS device_id,
CAST(NULL AS TEXT) AS device_manufacturer,
CAST(NULL AS TEXT) AS device_model,
CAST(NULL AS TEXT) AS device_type,
CAST(NULL AS TEXT) AS dma,
CAST(NULL AS TEXT) AS event_properties,
CAST(NULL AS TIMESTAMP) AS event_time,
CAST(NULL AS TEXT) AS event_type,
CAST(NULL AS INT) AS event_type_id,
CAST(NULL AS TEXT) AS group_properties,
CAST(NULL AS INT) AS id,
CAST(NULL AS TEXT) AS idfa,
CAST(NULL AS TEXT) AS ip_address,
CAST(NULL AS BOOLEAN) AS is_attribution_event,
CAST(NULL AS TEXT) AS language,
CAST(NULL AS TEXT) AS library,
CAST(NULL AS TEXT) AS location_lat,
CAST(NULL AS TEXT) AS location_lng,
CAST(NULL AS TEXT) AS os_name,
CAST(NULL AS TEXT) AS os_version,
CAST(NULL AS BOOLEAN) AS paying,
CAST(NULL AS TEXT) AS platform,
CAST(NULL AS TIMESTAMP) AS processed_time,
CAST(NULL AS TEXT) AS project_name,
CAST(NULL AS TEXT) AS region,
CAST(NULL AS INT) AS schema,
CAST(NULL AS TIMESTAMP) AS server_received_time,
CAST(NULL AS TIMESTAMP) AS server_upload_time,
CAST(NULL AS INT) AS session_id,
CAST(NULL AS TEXT) AS start_version,
CAST(NULL AS TIMESTAMP) AS user_creation_time,
CAST(NULL AS TEXT) AS user_id,
CAST(NULL AS TEXT) AS user_properties,
CAST(NULL AS TEXT) AS uuid,
CAST(NULL AS TEXT) AS version_name,
CAST(NULL AS TEXT) AS group_types
FROM base
), final AS (
SELECT
id AS event_id,
CAST(event_time AS TIMESTAMP) AS event_time,
CAST(DATE_TRUNC('DAY', event_time) AS DATE) AS event_day,
MD5(
CAST(COALESCE(CAST(user_id AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(session_id AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS unique_session_id,
COALESCE(CAST(user_id AS TEXT), (
CAST(amplitude_id AS TEXT)
)) AS amplitude_user_id,
event_properties,
event_type,
event_type_id,
group_types,
group_properties,
session_id,
CAST(user_id AS TEXT) AS user_id,
user_properties,
CAST(amplitude_id AS TEXT) AS amplitude_id,
_insert_id,
ad_id,
app,
project_name,
CAST(client_event_time AS TIMESTAMP) AS client_event_time,
CAST(client_upload_time AS TIMESTAMP) AS client_upload_time,
city,
country,
data,
device_brand,
device_carrier,
device_family,
device_id,
device_manufacturer,
device_model,
device_type,
dma,
idfa,
ip_address,
language,
location_lat,
location_lng,
os_name,
os_version,
is_attribution_event,
library,
paying AS is_paying,
platform,
CAST(processed_time AS TIMESTAMP) AS processed_time,
region,
schema,
CAST(server_received_time AS TIMESTAMP) AS server_received_time,
CAST(server_upload_time AS TIMESTAMP) AS server_upload_time,
start_version,
CAST(user_creation_time AS TIMESTAMP) AS user_creation_time,
uuid,
version_name,
_fivetran_synced
FROM fields
WHERE
CAST(DATE_TRUNC('DAY', event_time) AS DATE) >= CAST('2020-01-01' AS DATE) /* filter to records past a specific date */
AND CAST(DATE_TRUNC('DAY', event_time) AS DATE) <= CAST(DATEADD(month, 1, DATE_TRUNC('DAY', CAST(CURRENT_TIMESTAMP() AS TIMESTAMP))) AS DATE) /* filter to records before a specific date */
), surrogate AS (
SELECT
*,
MD5(
CAST(COALESCE(CAST(event_id AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(device_id AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(client_event_time AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(amplitude_user_id AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS unique_event_id
FROM final
)
SELECT
*
FROM surrogate
Name | Type | Comment |
---|---|---|
unique_event_id | text | A unique key for each event, concatenating event_id, device_id, client_event_time, and amplitude_user_id. |
event_id | bigint | A counter that distinguishes events. |
unique_session_id | text | A unique key for each session, concatenating user_id and session_id. |
amplitude_id | text | An internal ID used to identify unique users. |
amplitude_user_id | text | Unique key for each user, which is user_id by default. If user_id does not exist, the package will use amplitude_id. |
ad_id | integer | (Android) Google Play Services advertising ID (AdID). This usually is wiped after ingestion and therefore will be blank. |
app | bigint | Project ID found in your project's Settings page. |
city | integer | City where the event was registered. |
client_event_time | timestamp without time zone | Local timestamp (UTC) of when the device logged the event. |
client_upload_time | timestamp without time zone | The local timestamp (UTC) of when the device uploaded the event. |
country | text | Country where event was registered. |
data | text | Dictionary where certain fields such as first_event and merged_amplitude_id are stored. Ex, {"first_event":true}. |
device_brand | integer | Brand of device where event was registered on. |
device_carrier | integer | Carrier of device where event was registered on. |
device_family | integer | Device family. Ex, Apple iPhone. |
device_id | text | The device specific identifier. |
device_manufacturer | integer | Manufacturer of device where event was registered on. |
device_model | integer | The model of device where event was registered on. |
device_type | integer | Type of device where event was registered on. |
dma | integer | Designated marketing area (DMA). |
event_properties | text | Event properties are attributes of the events your users fire, and which you then send to Amplitude. Each event will have its own set of event properties. The nature of these properties depends on both the type of product you have, and the specific information you're most interested in discovering. For instance, if Swipe is an event you're tracking, the event property Direction could have the values Left or Right. Some example event properties are description, category, type, duration, level, percent completed, count, source, status, number, lives, authenticated, error number, rank, action, and mode. |
event_time | timestamp without time zone | Amplitude timestamp (UTC) which is the client_event_time adjusted by the difference between server_received_time and client_upload_time, specifically: event_time = client_event_time + (server_received_time - client_upload_time) 2015-08-10T12:00:00.000000 We use this timestamp to organize events on Amplitude charts. Note: If the difference between server_received_time and client_upload_time is less than 60 seconds, the event_time will not be adjusted and will equal the client_event_time. |
event_day | date | The day of the event time, in date format. |
event_type | text | The assigned type of event. Example 'Add Friend' |
event_type_id | bigint | Internal identifier on Amplitude. |
group_properties | text | A dictionary of key-value pairs that represent data tied to the group. Each distinct value appears as a group segment on the Amplitude dashboard. You can store property values in an array, and date values are transformed into string values. |
group_types | text | Group types. "Groups" in the source table, renamed here. See the [Accounts documentation](https://help.amplitude.com/hc/en-us/articles/115001765532-Accounts) for more information. |
idfa | integer | (iOS) Identifier for Advertiser. This usually is wiped after ingestion and therefore will be blank. |
_insert_id | integer | A unique identifier for the event. Amplitude deduplicates subsequent events sent with the same device_id and insert_id within the past 7 days. Amplitude recommends generating a UUID or using some combination of device_id, user_id, event_type, event_id, and time. |
ip_address | text | The IP address of the device which triggered the event. |
is_attribution_event | boolean | Boolean of boolean of whether event was an attribution event. |
language | integer | The language set by the user. |
library | text | Library being used to send the event data. |
location_lat | integer | Latitude where event was registered. |
location_lng | integer | Longitude where event was registered. |
os_name | integer | OS name where event was registered on. |
os_version | integer | OS version where event was registered on. |
is_paying | integer | Boolean of boolean of whether the user has ever logged any revenue, otherwise '(none)'. Note the property value can be modified via the\_Identify API. |
platform | integer | Platform of the device where event was registered on. |
processed_time | timestamp without time zone | Time of when event was processed. |
project_name | bigint | The identifier of the project. |
region | integer | Region where event was registered on. |
schema | bigint | The schema ID. |
server_received_time | timestamp without time zone | Amplitude timestamp (UTC) of when servers received the event. |
server_upload_time | timestamp without time zone | Amplitude timestamp (UTC) of when the event data was uploaded to Amplitude servers. |
session_id | bigint | The session start time in milliseconds since epoch. |
start_version | integer | App version the user was first tracked on. |
user_creation_time | timestamp without time zone | Event_time (UTC) of the user's first event. |
user_id | text | A readable ID specified by you. Amplitude uses a combination of three different methods to identify your users: device IDs, Amplitude ID, and user IDs. The first comes directly from your users' devices, while the second is an ID that Amplitude automatically creates once it has enough information to conclusively identify a unique user. The user ID, however, is something you'd set up. In Amplitude, a user ID is a unique identifier applied to individual users. Using them is optional, but recommended: your product should set a user ID once a user has created an account, logged in, or is otherwise identified in your product. Amplitude can use a user ID to reconcile events across multiple devices under the same user ID. Additionally, a user's event data will be merged on the backend so that all anonymous events prior to the assignment of a user ID will be connected to the correct user. For this reason, you can wait to assign user IDs if that makes sense for your product. Conversely, this is also why you should not set user IDs for anonymous users. Once set, user IDs in Amplitude cannot be changed.Once set, user IDs in Amplitude cannot be changed. |
user_properties | text | An attribute that describes a useful detail about the user it's attached to. Amplitude sends user properties with every event. |
uuid | text | A unique identifier per row (event sent). |
version_name | integer | The app version name. |
_fivetran_synced | timestamp without time zone | Timestamp of when Fivetran synced a record. |
This SQL query selects all columns and rows from the 'event' table in the 'amplitude' schema of the 'TEST' database. It's a simple SELECT * statement without any filtering, transformation, or aggregation.
OtherSELECT
*
FROM TEST.amplitude.event
Name | Type | Comment |
---|---|---|
device_id | text | None |
id | bigint | None |
server_upload_time | timestamp without time zone | None |
_fivetran_synced | timestamp without time zone | None |
_insert_id | integer | None |
ad_id | integer | None |
amplitude_event_type | integer | None |
amplitude_id | bigint | None |
app | bigint | None |
city | integer | None |
client_event_time | timestamp without time zone | None |
client_upload_time | timestamp without time zone | None |
country | text | None |
data | text | None |
device_brand | integer | None |
device_carrier | integer | None |
device_family | integer | None |
device_manufacturer | integer | None |
device_model | integer | None |
device_type | integer | None |
dma | integer | None |
event_properties | text | None |
event_time | timestamp without time zone | None |
event_type | text | None |
event_type_id | bigint | None |
group_properties | text | None |
groups | text | None |
idfa | integer | None |
ip_address | text | None |
is_attribution_event | boolean | None |
language | integer | None |
library | text | None |
location_lat | integer | None |
location_lng | integer | None |
os_name | integer | None |
os_version | integer | None |
paying | integer | None |
platform | integer | None |
processed_time | timestamp without time zone | None |
project_name | bigint | None |
region | integer | None |
sample_rate | integer | None |
schema | bigint | None |
server_received_time | timestamp without time zone | None |
session_id | bigint | None |
start_version | integer | None |
user_creation_time | timestamp without time zone | None |
user_id | character varying(100) | None |
user_properties | text | None |
uuid | text | None |
version_name | integer | None |
This SQL query performs a series of transformations on data from a staging table for Amplitude event types. It casts columns to specific data types, renames and reorganizes fields, and creates a unique identifier for each event type using a hash function. The query doesn't filter or aggregate data, but rather focuses on restructuring and cleaning the existing information.
CleaningFeaturizationOtherWITH base AS (
SELECT
*
FROM TEST.PUBLIC__source_amplitude.stg_amplitude__event_type_tmp
), fields AS (
SELECT
CAST(NULL AS BOOLEAN) AS _fivetran_deleted,
CAST(NULL AS TIMESTAMP) AS _fivetran_synced,
CAST(NULL AS BOOLEAN) AS autohidden,
CAST(NULL AS BOOLEAN) AS deleted,
CAST(NULL AS TEXT) AS display,
CAST(NULL AS BOOLEAN) AS flow_hidden,
CAST(NULL AS BOOLEAN) AS hidden,
CAST(NULL AS INT) AS id,
CAST(NULL AS BOOLEAN) AS in_waitroom,
CAST(NULL AS TEXT) AS name,
CAST(NULL AS BOOLEAN) AS non_active,
CAST(NULL AS TEXT) AS project_name,
CAST(NULL AS BOOLEAN) AS timeline_hidden,
CAST(NULL AS INT) AS totals,
CAST(NULL AS INT) AS totals_delta,
CAST(NULL AS TEXT) AS value,
CAST(NULL AS TEXT) AS waitroom_approved
FROM base
), final AS (
SELECT
id AS event_type_id,
name AS event_type_name,
project_name,
display,
totals,
totals_delta,
value,
flow_hidden AS is_flow_hidden,
hidden AS is_hidden,
in_waitroom AS is_in_waitroom,
non_active AS is_non_active,
autohidden AS is_autohidden,
deleted AS is_deleted,
timeline_hidden AS is_timeline_hidden,
waitroom_approved AS is_waitroom_approved,
_fivetran_deleted,
_fivetran_synced
FROM fields
), surrogate AS (
SELECT
*,
MD5(
CAST(COALESCE(CAST(event_type_id AS TEXT), '_dbt_utils_surrogate_key_null_') || '-' || COALESCE(CAST(project_name AS TEXT), '_dbt_utils_surrogate_key_null_') AS TEXT)
) AS unique_event_type_id
FROM final
)
SELECT
*
FROM surrogate
Name | Type | Comment |
---|---|---|
unique_event_type_id | text | A unique key for each event, concatenating event_type_id and project_name. |
event_type_id | bigint | Event type ID. |
_fivetran_deleted | boolean | Timestamp of when Fivetran marked a record as deleted. |
_fivetran_synced | timestamp without time zone | Timestamp of when Fivetran synced a record. |
is_autohidden | boolean | Boolean of whether event type is hidden. |
is_deleted | boolean | Boolean of whether event type is deleted. |
display | text | The display name of the event. |
is_flow_hidden | boolean | Boolean of whether the event is hidden from Pathfinder/Pathfinder Users or not. |
is_hidden | boolean | Boolean of whether the event is hidden or not. |
is_in_waitroom | boolean | Boolean of whether event type is in waitroom. |
event_type_name | text | Event type name. |
is_non_active | boolean | Boolean of whether the event is marked inactive or not. |
project_name | bigint | Project name. |
is_timeline_hidden | boolean | Boolean of whether the event is hidden or not. |
totals | integer | The total number of times the event has happened this week. |
totals_delta | integer | Represents the change in event volume from the previous week. |
value | text | Name of the event in the raw data. |
is_waitroom_approved | integer | Boolean of whether is waitroom approved. |
This SQL query selects all columns and rows from the 'event_type' table in the 'amplitude' schema of the 'TEST' database. It's a simple SELECT * statement without any filtering, transformation, or aggregation.
OtherSELECT
*
FROM TEST.amplitude.event_type
Name | Type | Comment |
---|---|---|
id | bigint | None |
project_name | bigint | None |
_fivetran_deleted | boolean | None |
_fivetran_synced | timestamp without time zone | None |
autohidden | boolean | None |
deleted | boolean | None |
display | text | None |
flow_hidden | boolean | None |
hidden | boolean | None |
in_waitroom | boolean | None |
name | text | None |
non_active | boolean | None |
timeline_hidden | boolean | None |
totals | integer | None |
totals_delta | integer | None |
value | text | None |
waitroom_approved | integer | None |