Bronze to Silver
These are the basic best practices followed when moving data from Bronze to Silver.
DO's
Case standardization
- Convert column names to lowercase or uppercase.
- Standardize text fields (e.g., convert all First Name, Last Name, Product names to title case).
Column renaming
- Adopt consistent naming conventions across tables
Whitespace
- Trim leading and trailing spaces from string fields.
- Remove extra spaces between words.
Data type conversions
- Convert string dates to proper date format
- Change numeric strings to appropriate integer or decimal types
Null handling
- Replace empty strings with NULL values
- Set default values for NULL fields where appropriate
Deduplication
- Remove exact duplicate records
- Handle near-duplicates based on business rules
Format standardization
- Normalize phone numbers to a consistent format
- Standardize address formats
Value normalization
- Convert units of measurement to a standard unit (e.g., all weights to pounds or kgs)
- Standardize currency / lat-long decimal positions.
Character encoding
- Convert all text to UTF-8 or another standard encoding.
Special character handling
- Remove or replace non-printable characters.
- Handle escape characters in text fields.
Data validation
- Check for values within expected ranges
- Validate against reference data (e.g., valid product codes)
Date and time standardization
- Convert all timestamps to UTC or a standard time zone
- Ensure consistent date formats across all fields
Calculated fields
- Add derived columns based on raw data (e.g., age from birth date)
Data enrichment
- Add geographic information based on zip codes
- Categorize products based on attributes
Error correction
- Fix common misspellings
- Correct known data entry errors
Structural changes:
- Split combined fields (e.g., full name into first and last name).
- Merge related fields for easier analysis.
Metadata addition:
- Add source system identifiers.
- Include data lineage information.
Sensitive data handling
- Mask or encrypt personally identifiable information (PII)
- Apply data governance rules
Outlier detection and handling
- Identify statistical outliers
- Apply business rules to handle extreme values
Column dropping
- Remove unnecessary or redundant columns
DON'Ts
Don't Lose Raw Data
- Avoid modifying or transforming data in a way that the original information is lost. Always keep the Bronze layer intact and unmodified as it serves as the source of truth.
- Maintain backups of critical data and metadata in case you need to revert transformations.
Don't Over-optimize Early
- Avoid excessive transformations or optimizations before ensuring data accuracy.
- Focus on clarity and correctness before performance tuning.
Don't Hard-code Business Logic
- Avoid hard-coding business rules that could change frequently. Instead, make them configurable to ensure flexibility for future updates.
Don't Skip Data Validation
- Don’t blindly trust incoming data from the Bronze layer. Always validate the quality, type, and range of data before moving it to Silver.
- Implement rules for handling invalid data, such as setting defaults for missing values or discarding outliers.
Don't Ignore Metadata
- Always retain important metadata, such as source system identifiers, data lineage, or event timestamps, to maintain traceability.
Don't Introduce Ambiguity
- Avoid renaming columns or values without proper documentation. Ensure that any renaming or transformation is clear, consistent, and well-documented to prevent confusion.
Don't Remove Critical Columns Without Backup
- Avoid dropping columns that might be required later for audits or debugging. Always ensure you have a backup or metadata for columns you drop.
Don't Assume Data Integrity
- Never assume that data from the Bronze layer is clean or consistent. Always apply deduplication, null handling, and data cleaning rules.
Don't Overload the Silver Layer with Business Logic
- Avoid implementing complex business rules in the Silver layer. Keep it focused on data cleansing and preparation. Reserve detailed business logic for the Gold layer.
Don't Over-transform Data
- Avoid excessive or unnecessary transformations in the Silver layer, which could make the data less flexible for future use cases.
Don't Change Granularity
- Don’t aggregate or change the granularity of data too early. Keep data at the same granularity as the Bronze layer to maintain flexibility for future aggregations in the Gold layer.
Don't Overlook Data Encoding and Formats
- Don’t assume uniform character encoding across all systems. Always validate encoding, such as converting text to UTF-8 where needed.
- Standardize date formats and ensure consistent time zones (preferably UTC) across all fields.
Don't Disregard Null and Missing Data
- Don't ignore patterns of missing data. Treat NULL values carefully, ensuring that they’re handled based on the specific business logic.
Don't Apply Unnecessary Joins
- Avoid applying joins that aren’t required in the Silver layer. Unnecessary joins can increase complexity and lead to performance degradation.
Don't Remove Contextual Information
- Don’t drop fields that provide necessary context, such as event timestamps, source system details, or audit trails.
Don't Assume All Data Sources Are Consistent
- Don’t assume that data from different sources (even within the Bronze layer) will have consistent formats, structures, or units of measurement. Always validate and normalize data as necessary.
Don't Ignore Data Governance
- Avoid handling sensitive data like PII carelessly. Ensure that you apply data masking, encryption, or other governance rules to stay compliant with regulations.
Don't Forget Documentation
- Avoid performing transformations or renaming columns without thoroughly documenting them. Clear documentation ensures data transformations are transparent and easily understood across teams.