SNAPSHOT CONTROL V0
This macro creates a snapshot table to control snapshot-based tables like PITs and Bridges. The snapshot table will hold daily snapshots starting at a specific start_date and has a configurable daytime. Usual application would involve creating one snapshot table per Data Vault environment, and therefore creating one dbt model using this macro. The model needs to be scheduled daily, with an execution time that matches your desired snapshot time, since the macro automatically inserts all snapshots until the current day, no matter which time it is, and a snapshot for 08:00:00 should be calculated at that time. So if the snapshot table is configured to have a daily_snapshot_time of 07:00:00, all snapshots in the table will have the timestamp 07:00:00. Therefore you need to schedule the building of the snapshot table also to 07:00:00.
In addition to the actual snapshot-datetimestamp (sdts), the macro generates the following columns:
| Column | Data Type | Explanation |
|---|---|---|
| replacement_sdts | timestamp | Allows users to replace an sdts with another one, without having to update the actual sdts column. By default this column is filled with the regular sdts. |
| caption | string | Allows users to title their snapshots. Examples would be something like: 'Christmas 2022', or 'End-of-year report 2021'. By default this is filled with 'Snapshot {sdts}', holding the respective sdts. |
| is_hourly | boolean | Captures if the time of an sdts is on an exact hour, meaning minutes=0 and seconds=0. All sdts created by this macro are daily and therefore always hourly, but this column enables future inserts of custom, user-defined sdts. |
| is_daily | boolean | Captures if the time of an sdts is on exactly midnight, meaning hours=0, minutes=0 and seconds=0. This depends on your desired daily_snapshot_time, but is not used by the downstream macros, and just generates additional metadata for potential future use. |
| is_beginning_of_week | boolean | Captures if a sdts is the first day of the week. This is fully configurable via the datavault4dbt.first_day_of_week variable. |
| is_end_of_week | boolean | Captures if a sdts is the last day of the week, automatically calculated based on your configured first day of the week. |
| is_beginning_of_month | boolean | Captures if a sdts is the first day of a month. |
| is_end_of_month | boolean | Captures if a sdts is the last day of a month. |
| is_beginning_of_quarter | boolean | Captures if a sdts is the first day of a quarter. |
| is_end_of_quarter | boolean | Captures if a sdts is the last day of a quarter. |
| is_beginning_of_year | boolean | Captures if a sdts is the first day of a year. |
| is_end_of_year | boolean | Captures if a sdts is the last day of a year. |
| comment | string | Allows users to write custom comments for each sdts. By default this column is set to NULL. |
Macro Parameters
| Parameters | Data Type | Required | Default Value | Explanation |
|---|---|---|---|---|
| start_date | date | mandatory | – | Defines the earliest date that should be available inside the snapshot_table. The format of this date must be 'YYYY-MM-DD'. |
| daily_snapshot_time | time | mandatory | – | Defines the time that your daily snapshots should have. Usually this is either something right before daily business starts, or after daily business is over. Needs to be in the format 'hh:mm:ss'. |
| sdts_alias | string | optional | datavault4dbt.sdts_alias | Defines the name of the snapshot date timestamp column inside the snapshot_table. |
| end_date | date | optional | current date/timestamp of the target database | Defines the latest date that should be available inside the snapshot_table. The format of this date must be 'YYYY-MM-DD'. When omitted, snapshots are generated up to the database's current date/timestamp. The end_date itself is included. |
DAY OF WEEK ADAPTER REFERENCE MATRIX
Because different database dialects use different standards and functions for extracting the day of the week, configuring your datavault4dbt.first_day_of_week variable requires knowing how your specific target database behaves natively.
Use the table below to determine the correct integer to input for your target database in your dbt_project.yml based on whether you want your weeks to start on Sunday or Monday.
| Dialect | Native Function | Sunday | Monday | Behavior |
|---|---|---|---|---|
| BigQuery | EXTRACT(DAYOFWEEK) | 1 | 2 | US Standard |
| Synapse | DATEPART(WEEKDAY) | 1 | 2 | US Standard |
| Fabric | DATEPART(WEEKDAY) | 1 | 2 | US Standard |
| Oracle | TRUNC(..., 'IW') Math | 1 | 2 | US Standard |
| SQLServer | DATEPART(WEEKDAY) | 1 | 2 | US Standard |
| Postgres | EXTRACT(ISODOW) | 7 | 1 | ISO Standard |
| Snowflake | EXTRACT(DAYOFWEEK_ISO) | 7 | 1 | ISO Standard |
| Databricks | dayofweek_iso() | 7 | 1 | ISO Standard |
| Exasol | TO_CHAR('ID') | 7 | 1 | ISO Standard |
| Redshift | EXTRACT(DOW) | 0 | 1 | Wild Card (0-6) |
EXAMPLE 1
{{ config(materialized='incremental') }}
{%- set yaml_metadata -%}
start_date: '2015-01-01'
daily_snapshot_time: '07:30:00'
{%- endset -%}
{{ datavault4dbt.control_snap_v0(yaml_metadata=yaml_metadata) }}
DESCRIPTION
- start_date: The parameter start_date has to be in the format ‘YYYY-MM-DD’. With the start_date set to ‘2015-01-01’, the snapshot table would hold daily snapshots beginning at 2015.
- daily_snapshot_time: ’07:30:00′ The snapshots inside this table would all have the time ’07:30:00′. Must be set in the format ‘hh:mm:ss’.