Non-Historized Satellite

This macro creates a non-historized satellite that should be materialized as an incremental table. It should be applied ‘on top’ of the staging layer, and is either connected to a Hub or a Link. Besides the missing hashdiff, a non-historized satellite applies the same loading logic as a regular version 0 satellite. Each satellite can only be loaded by one source model, since we typically recommend a satellite split by source system.

Features:

  • High-Performance loading of non-historized satellite data

Required Parameters

Parameters Data Type Required Default Value Explanation
parent_hashkey string mandatory Name of the hashkey column inside the stage of the object that this satellite is attached to.
src_payload string | list of strings mandatory A list of all the descriptive attributes that should be included in this satellite.
source_model string mandatory Name of the underlying staging model, must be available inside dbt as a model.

Optional Parameters

Parameters Data Type Required Default Value Explanation
source_is_single_batch boolean optional False See below for explanation.
src_ldts string optional datavault4dbt. ldts_alias Name of the ldts column inside the source models. Needs to use the same column name as defined as alias inside the staging model.
src_rsrc string optional datavault4dbt. rsrc_alias Name of the rsrc column inside the source models. Needs to use the same column name as defined as alias inside the staging model.

Example 1

{{ config(materialized='incremental') }}

{%- set yaml_metadata -%}
parent_hashkey: 'hk_creditcard_transactions_nl'
src_payload:
    - invoice_address
    - vendor_name
source_model: 'stage_creditcard_transactions'
{%- endset -%}    

{%- set metadata_dict = fromyaml(yaml_metadata) -%}

{{ datavault4dbt.nh_sat(parent_hashkey=metadata_dict.get('parent_hashkey'),
                        src_payload=metadata_dict.get('src_payload'),
                        source_model=metadata_dict.get('source_model'),
                        src_ldts=metadata_dict.get('src_ldts'),
                        src_rsrc=metadata_dict.get('src_rsrc')) }}               

Description

With this example, a non-historized Satellite is created. Under normal circumstances, all descriptive attributes of a transaction are integrated into a non-historized link. In this example, there are some attributes that contain personal identifiable information (PII), which has to be treated differently than the other attributes. For that split, the non-historized satellite comes into play: all PII-attributes are integrated into a non-historized satellite, which is attached to the corresponding non-historized link.

  • parent_hashkey:
    • hk_creditcard_transactions_nl: The satellite would be attached to the non-historized link creditcard_transaction, which has the column ‘hk_creditcard_transactions_nl’ as a hashkey column.
  • src_payload:
    • [‘invoice_address’,’vendor_name’]: This satellite would hold the columns ‘invoice_address’ and ‘vendor_name’ out of the underlying staging area.
  • source_model:
    • stage_creditcard_transactions: This satellite is loaded out of the stage for creditcard_transactions

 

Disabling Deduplication of the source data

To safely disable the deduplication, you have to verify that the underlying staging model of the NH Sat only holds one row per Link Hashkey.

We highly recommend setting up a uniqueness test within a yml file, covering the hashkey column of the staging model.

Once this is guaranteed, you just add the following parameter to your macro call:

source_is_single_batch=true

If this is set to true, this QUALIFY statement:

earliest_hk_over_all_sources AS (
{# Deduplicate the unionized records again to only insert the earliest one. #}

    SELECT
        lcte.*
    FROM {{ ns.last_cte }} AS lcte

    QUALIFY ROW_NUMBER() OVER (PARTITION BY {{ link_hashkey }} ORDER BY {{ src_ldts }}) = 1

    {%- set ns.last_cte = 'earliest_hk_over_all_sources' -%}

),

will not be activated!

To continuously ensure data validity within your NH Sat, we also recommend setting up a uniqueness test across the Link Hashkey on your NH Sat.