PostgreSQL Data Management System

Overview

This collection of PostgreSQL (PL/pgSQL) functions analyzes table structures, creates optimized materialized views from the analysis results, and monitors and maintains those views over time. It is organized into two integrated subsystems, table analysis and materialized view management, which together aim to improve query performance, data quality, and maintenance efficiency.

Core Subsystems

1. Table Analysis Subsystem

This subsystem analyzes database tables to identify their characteristics, data quality, and optimal strategies for keys, partitioning, and ordering.

Key Features:

  • Statistical sampling for efficient analysis of large tables
  • Column-level fitness evaluation for primary/foreign key suitability
  • Data quality assessment with encoding issue detection
  • Identification of optimal column combinations for partitioning
  • Detection of timestamp columns suitable for ordering
  • Overall Data Quality Index (DQI) calculation
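
The sampling and key-fitness ideas above can be illustrated in plain SQL. This is a minimal sketch, not the code of grok_analyze_column_stats: the demo_customers table, the 10% BERNOULLI sample, and the rule that a key candidate has no NULLs and one distinct value per sampled row are assumptions made for illustration.

```sql
-- Hypothetical demo table standing in for a real source table.
CREATE TEMP TABLE demo_customers AS
SELECT g AS customer_id,
       'region_' || (g % 5) AS region,
       CASE WHEN g % 10 = 0 THEN NULL ELSE 'user' || g END AS email
FROM generate_series(1, 10000) AS g;

-- BERNOULLI keeps each row with ~10% probability; REPEATABLE fixes the
-- seed so reruns against the same table return the same sample.
CREATE TEMP TABLE demo_sample AS
SELECT * FROM demo_customers TABLESAMPLE BERNOULLI (10) REPEATABLE (42);

-- Per-column null fraction and distinct ratio within the sample. A column
-- with no NULLs and one distinct value per sampled row is a key candidate.
CREATE TEMP VIEW demo_column_fitness AS
SELECT col,
       round(null_count::numeric / nullif(total_count, 0), 3)     AS null_fraction,
       round(distinct_count::numeric / nullif(total_count, 0), 3) AS distinct_ratio,
       (null_count = 0 AND distinct_count = total_count)          AS key_candidate
FROM (
  SELECT 'customer_id' AS col,
         count(*) AS total_count,
         count(*) - count(customer_id) AS null_count,
         count(DISTINCT customer_id) AS distinct_count
  FROM demo_sample
  UNION ALL
  SELECT 'region', count(*), count(*) - count(region), count(DISTINCT region)
  FROM demo_sample
  UNION ALL
  SELECT 'email', count(*), count(*) - count(email), count(DISTINCT email)
  FROM demo_sample
) AS s;

SELECT * FROM demo_column_fitness;
```

Uniqueness observed in a sample is evidence, not proof: a column can be unique in a 10% sample and still contain duplicates in the full table, so a key choice should be confirmed against all rows (for example with a unique index) before relying on it.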

Primary Functions:

  • grok_analyze_table_fitness: Main entry point for table analysis
  • grok_analyze_column_stats: Analyzes individual column characteristics
  • grok_analyze_column_combinations: Evaluates column pairs for composite keys
  • grok_calculate_dqi: Calculates the overall Data Quality Index
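
Evaluating column pairs for composite-key suitability comes down to comparing the number of distinct value pairs with the row count. The helper below, demo_unique_pairs, is a hypothetical sketch of that idea; its name, signature, and the demo_order_lines table are assumptions and do not reflect the real grok_analyze_column_combinations interface.

```sql
-- Hypothetical demo table: 100 orders x 4 lines; SKUs repeat within an order.
CREATE TEMP TABLE demo_order_lines AS
SELECT o AS order_id, l AS line_no, 'sku_' || ((o + l) % 3) AS sku
FROM generate_series(1, 100) AS o,
     generate_series(1, 4) AS l;

-- Hypothetical helper: for every pair of the given columns, count distinct
-- value pairs and total rows. Equal counts mean the pair is unique.
CREATE OR REPLACE FUNCTION pg_temp.demo_unique_pairs(tbl regclass, cols text[])
RETURNS TABLE (col_a text, col_b text, distinct_pairs bigint, total_rows bigint)
LANGUAGE plpgsql AS $$
DECLARE
  n int := coalesce(array_length(cols, 1), 0);
BEGIN
  EXECUTE format('SELECT count(*) FROM %s', tbl) INTO total_rows;
  FOR i IN 1 .. n LOOP
    FOR j IN i + 1 .. n LOOP
      col_a := cols[i];
      col_b := cols[j];
      -- %I quotes column names safely; %s on a regclass yields a correctly
      -- quoted (and, if needed, schema-qualified) table name.
      EXECUTE format('SELECT count(*) FROM (SELECT DISTINCT %I, %I FROM %s) AS d',
                     col_a, col_b, tbl)
        INTO distinct_pairs;
      RETURN NEXT;
    END LOOP;
  END LOOP;
END;
$$;

SELECT col_a, col_b, distinct_pairs = total_rows AS unique_pair
FROM pg_temp.demo_unique_pairs('demo_order_lines',
                               ARRAY['order_id', 'line_no', 'sku']);
```

SELECT DISTINCT treats NULLs as equal to each other, so a pair reported as unique here can still be unusable as a primary key if either column contains NULLs; a real evaluation would combine this with a per-column null check.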

2. Materialized View Management Subsystem

This subsystem creates, monitors, and maintains optimized materialized views based on insights from the table analysis.

Key Features:

  • Optimized materialized view creation with proper indexing
  • Automatic handling of character encoding issues
  • Synthetic key generation for uniqueness
  • Content hash generation for efficient change detection
  • Health monitoring with staleness detection
  • Automated maintenance and remediation actions

Primary Functions:

  • grok_create_optimized_matv: Creates the full set of layered views (vtw_, matc_, vm_, vprob_) for a source table
  • grok_manage_matv_health: Monitors and maintains materialized view health
  • grok_check_matv_mismatches: Detects inconsistencies between source and materialized views
  • grok_perform_matv_action: Executes maintenance actions on materialized views
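
Synthetic keys and content hashes can be sketched with standard PostgreSQL features. This example assumes a scratch schema named demo_matv and a hypothetical customer_data table, and it is not the output of grok_create_optimized_matv: row_number() supplies a synthetic key that stays unique even when the source contains exact duplicate rows, and md5() over a row constructor produces a content hash that skips an excluded column.

```sql
-- Scratch schema (assumption) so the sketch can be rerun cleanly.
DROP SCHEMA IF EXISTS demo_matv CASCADE;
CREATE SCHEMA demo_matv;

-- Hypothetical source table; note the exact duplicate row.
CREATE TABLE demo_matv.customer_data (
  customer_id int,
  region      text,
  updated_at  timestamptz,
  modified_by text
);
INSERT INTO demo_matv.customer_data VALUES
  (1, 'north', '2024-01-01', 'alice'),
  (1, 'north', '2024-01-01', 'alice'),
  (2, 'south', '2024-02-01', 'bob');

CREATE MATERIALIZED VIEW demo_matv.matc_customer AS
SELECT
  -- Synthetic key: unique per row even when source rows are identical.
  row_number() OVER (ORDER BY c.customer_id, c.updated_at) AS synthetic_key,
  -- Content hash over the hashed columns only; modified_by is excluded.
  -- The row constructor keeps column positions and NULLs distinguishable.
  md5(ROW(c.customer_id, c.region, c.updated_at)::text) AS content_hash,
  c.*
FROM demo_matv.customer_data AS c;

-- A unique index on a plain column is what REFRESH ... CONCURRENTLY needs.
CREATE UNIQUE INDEX ON demo_matv.matc_customer (synthetic_key);
-- Secondary index for looking rows up by hash during change detection.
CREATE INDEX ON demo_matv.matc_customer (content_hash);

-- Succeeds because the view is populated and has the unique index above.
REFRESH MATERIALIZED VIEW CONCURRENTLY demo_matv.matc_customer;
```

Two caveats apply to this sketch. row_number() over a non-unique ordering can assign different keys to the same row on each refresh, which keeps a concurrent refresh correct but makes it rewrite more rows than necessary. And the text form of a timestamptz depends on the session TimeZone setting, so hashes computed under different settings will not match.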

Architecture & Design Patterns

The system implements several important design patterns:

  1. View Layering Pattern: Creates multiple views serving different purposes:

    • vtw_*: View To Watch (source view with data quality enhancement)
    • matc_*: MATerialized Copy (physical storage with indexes)
    • vm_*: View of Materialized view (clean data for querying)
    • vprob_*: View of PROBlematic data (encoding issues for review)
  2. Data Quality Management Pattern: Automatically detects, flags, and segregates problematic data:

    • Non-ASCII character detection
    • Cleansed versions of problematic text
    • Separate views for clean vs. problematic data
  3. Change Detection Pattern: Implements efficient methods to detect data changes:

    • Content hash generation from relevant columns
    • Timestamp-based staleness detection
    • Sampling-based consistency validation
  4. Maintenance Strategy Pattern: Provides multiple strategies for maintaining materialized views:

    • Refresh: Updates with fresh data from the source
    • Repair: Rebuilds indexes and constraints
    • Reindex: Rebuilds indexes without dropping them
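
The detection and segregation steps of the Data Quality Management Pattern can be sketched with a regular expression. This is an illustration under stated assumptions, not the system's generated SQL: the demo_names table and the demo_* view names are hypothetical, the materialized matc_ layer is omitted for brevity, and "problematic" is narrowed to "contains any character outside 7-bit ASCII".

```sql
-- Hypothetical source rows; row 2 contains a non-ASCII character.
CREATE TEMP TABLE demo_names (id int, name text);
INSERT INTO demo_names VALUES (1, 'Alice'), (2, 'José'), (3, 'Bob');

-- Source view with quality flags, playing the role of a vtw_ view.
-- [^\x01-\x7F] matches any character outside 7-bit ASCII; coalesce keeps
-- NULL names in the clean set instead of dropping them from both views.
CREATE TEMP VIEW demo_vtw_names AS
SELECT id,
       name,
       coalesce(name ~ '[^\x01-\x7F]', false) AS has_non_ascii,
       regexp_replace(name, '[^\x01-\x7F]', '', 'g') AS name_ascii
FROM demo_names;

-- Clean rows for querying (vm_ role).
CREATE TEMP VIEW demo_vm_names AS
SELECT id, name FROM demo_vtw_names WHERE NOT has_non_ascii;

-- Flagged rows plus a cleansed variant for review (vprob_ role).
CREATE TEMP VIEW demo_vprob_names AS
SELECT id, name, name_ascii FROM demo_vtw_names WHERE has_non_ascii;
```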

Usage Examples

Analyzing a Table

-- Analyze a table to identify key characteristics and data quality
SELECT config.grok_analyze_table_fitness(
  'public',           -- Source schema
  'customer_data',    -- Source table
  ARRAY['id', 'uid']  -- Columns to exclude from key fitness evaluation
);

Creating an Optimized Materialized View

-- Create an optimized materialized view system based on analysis results
SELECT config.grok_create_optimized_matv(
  'public',                        -- Source schema
  'customer_data',                 -- Source table
  'analytics',                     -- Target schema
  'matc_customer_summary',         -- Target materialized view name
  ARRAY['region', 'customer_type'], -- Partition columns
  ARRAY['updated_at', 'customer_id'], -- Order-by columns
  ARRAY['created_by', 'modified_by'], -- Columns to exclude from hash
  true                             -- Filter to latest records only
);

Monitoring Materialized View Health

-- Check health of a materialized view
SELECT config.grok_manage_matv_health(
  'analytics',              -- Schema
  'matc_customer_summary',  -- Materialized view name
  'daily',                  -- Validation type: 'quick', 'daily', or 'full'
  NULL                      -- Action (NULL for check only, 'refresh', 'repair', 'reindex')
);

Maintaining Materialized View Health

-- Refresh a stale materialized view
SELECT config.grok_manage_matv_health(
  'analytics',              -- Schema
  'matc_customer_summary',  -- Materialized view name
  'daily',                  -- Validation type
  'refresh'                 -- Action to perform
);
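
Before a 'refresh' action can use REFRESH MATERIALIZED VIEW CONCURRENTLY, it has to know whether PostgreSQL will accept it. The sketch below shows one way to make that decision from the system catalogs; demo_refresh_matv and the demo_refresh schema are hypothetical names, and this is not the code of grok_perform_matv_action.

```sql
-- Hypothetical helper: refresh concurrently when allowed, otherwise block.
CREATE OR REPLACE FUNCTION pg_temp.demo_refresh_matv(mv regclass)
RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
  can_concurrent boolean;
BEGIN
  -- CONCURRENTLY requires a populated view with at least one valid unique
  -- index that has no WHERE clause and no expression columns.
  SELECT c.relispopulated AND EXISTS (
           SELECT 1
           FROM pg_index AS i
           WHERE i.indrelid = c.oid
             AND i.indisunique
             AND i.indisvalid
             AND i.indpred IS NULL
             AND i.indexprs IS NULL)
    INTO can_concurrent
  FROM pg_class AS c
  WHERE c.oid = mv AND c.relkind = 'm';

  IF can_concurrent IS NULL THEN
    RAISE EXCEPTION '% is not a materialized view', mv;
  ELSIF can_concurrent THEN
    EXECUTE format('REFRESH MATERIALIZED VIEW CONCURRENTLY %s', mv);
    RETURN 'concurrent';
  ELSE
    EXECUTE format('REFRESH MATERIALIZED VIEW %s', mv);
    RETURN 'blocking';
  END IF;
END;
$$;

-- Scratch objects (hypothetical) to exercise both branches.
DROP SCHEMA IF EXISTS demo_refresh CASCADE;
CREATE SCHEMA demo_refresh;
CREATE MATERIALIZED VIEW demo_refresh.with_key AS
  SELECT g AS id FROM generate_series(1, 10) AS g;
CREATE UNIQUE INDEX ON demo_refresh.with_key (id);
CREATE MATERIALIZED VIEW demo_refresh.without_key AS
  SELECT g % 3 AS bucket FROM generate_series(1, 10) AS g;

SELECT pg_temp.demo_refresh_matv('demo_refresh.with_key');     -- 'concurrent'
SELECT pg_temp.demo_refresh_matv('demo_refresh.without_key');  -- 'blocking'
```

Falling back to a plain REFRESH keeps the action working for views without a suitable index, at the cost of an ACCESS EXCLUSIVE lock that blocks readers for the duration of the refresh.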

Performance Considerations

  • Sampling: The system uses statistical sampling for efficient analysis of large tables
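  • Concurrent Refresh: Uses REFRESH MATERIALIZED VIEW CONCURRENTLY when possible, which keeps the view readable during the refresh; PostgreSQL only allows this when the view is already populated and has at least one unique index on plain columns with no WHERE clause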
  • Validation Modes: Offers different validation modes with performance/thoroughness tradeoffs:
    • quick: Fastest, uses 0.1% sampling, 3-day staleness threshold
    • daily: Medium, uses 1% sampling, 1-day staleness threshold
    • full: Most thorough, uses 100% sampling, 12-hour staleness threshold
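
The three validation modes reduce to a small lookup of sampling rate and staleness threshold. The SQL function below, demo_validation_settings, is a hypothetical sketch that encodes only the numbers listed above; how grok_manage_matv_health actually stores or applies them is not documented here.

```sql
-- Hypothetical lookup of the documented per-mode settings.
CREATE OR REPLACE FUNCTION pg_temp.demo_validation_settings(
  p_mode text,
  OUT sample_percent numeric,
  OUT stale_after interval)
LANGUAGE sql IMMUTABLE AS $$
  SELECT v.pct, v.stale
  FROM (VALUES ('quick', 0.1,   interval '3 days'),
               ('daily', 1.0,   interval '1 day'),
               ('full',  100.0, interval '12 hours')) AS v(mode, pct, stale)
  WHERE v.mode = p_mode;
$$;

-- A view last refreshed two days ago is stale under 'daily' and 'full'
-- but still fresh under 'quick'.
SELECT m.mode,
       s.sample_percent,
       now() - interval '2 days' < now() - s.stale_after AS is_stale
FROM unnest(ARRAY['quick', 'daily', 'full']) AS m(mode)
CROSS JOIN LATERAL pg_temp.demo_validation_settings(m.mode) AS s;
```

In a sampled consistency check, sample_percent would be the natural argument for TABLESAMPLE BERNOULLI (...), which accepts a percentage between 0 and 100.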

Dependencies

This system depends on the following database objects:

  1. Table Fitness Audit Table:

    • config.table_fitness_audit: Stores table analysis results
  2. Materialized View Statistics Table:

    • public.c77_dbh_matv_stats: Stores materialized view refresh statistics

Best Practices

  1. Initial Analysis: Run table analysis before creating materialized views to identify optimal configuration
  2. Regular Health Checks: Schedule periodic health checks using grok_manage_matv_health
  3. Validation Types: Use quick for frequent checks, daily for daily maintenance, and full for critical views
  4. Monitoring: Track Data Quality Index (DQI) over time to detect data quality trends
  5. Maintenance Windows: Schedule refreshes during low-usage periods for large materialized views
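
If the pg_cron extension is available (an assumption; it is not listed among this system's dependencies), the scheduling advice above can be expressed inside the database. The job names and cron schedules are illustrative choices; the calls reuse the grok_manage_matv_health arguments shown in the usage examples.

```sql
-- Assumes pg_cron is installed and CREATE EXTENSION pg_cron has been run.
-- Hourly quick check (check only, no action).
SELECT cron.schedule(
  'matv-quick-check',
  '0 * * * *',
  $$SELECT config.grok_manage_matv_health('analytics', 'matc_customer_summary', 'quick', NULL)$$
);

-- Nightly refresh at 03:00, intended to fall in a low-usage window.
SELECT cron.schedule(
  'matv-nightly-refresh',
  '0 3 * * *',
  $$SELECT config.grok_manage_matv_health('analytics', 'matc_customer_summary', 'daily', 'refresh')$$
);
```

pg_cron interprets schedules in GMT by default (configurable through cron.timezone), so a "low-usage" hour should be translated accordingly.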

Error Handling

All functions include comprehensive error handling with:

  • Clear error messages indicating what went wrong
  • Processing notes to track execution steps
  • Safe failure modes that avoid leaving the database in an inconsistent state
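
The processing-notes and safe-failure behaviour described above match a common PL/pgSQL pattern: an inner BEGIN ... EXCEPTION block runs as a subtransaction, so its database changes are rolled back on error while local variables such as a notes array keep their values. The function demo_safe_step and its JSON output shape are hypothetical and only illustrate the pattern.

```sql
-- Hypothetical function illustrating the pattern, not the system's code.
CREATE OR REPLACE FUNCTION pg_temp.demo_safe_step(divisor int)
RETURNS jsonb
LANGUAGE plpgsql AS $$
DECLARE
  notes     text[] := ARRAY[]::text[];
  quotient  int;
  err_msg   text;
  err_state text;
BEGIN
  notes := notes || 'started'::text;
  BEGIN
    -- Runs as a subtransaction: database changes made here are rolled
    -- back if an error is raised, but PL/pgSQL variables keep their values.
    quotient := 100 / divisor;
    notes := notes || 'division completed'::text;
  EXCEPTION WHEN OTHERS THEN
    GET STACKED DIAGNOSTICS err_msg   = MESSAGE_TEXT,
                            err_state = RETURNED_SQLSTATE;
    notes := notes || 'division failed'::text;
    RETURN jsonb_build_object('status', 'error', 'error', err_msg,
                              'sqlstate', err_state, 'notes', to_jsonb(notes));
  END;
  RETURN jsonb_build_object('status', 'ok', 'quotient', quotient,
                            'notes', to_jsonb(notes));
END;
$$;

SELECT pg_temp.demo_safe_step(4);  -- status "ok", quotient 25
SELECT pg_temp.demo_safe_step(0);  -- status "error", sqlstate "22012"
```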

Troubleshooting

Common issues and solutions:

  1. Stale Materialized Views: Use grok_manage_matv_health with action='refresh'
  2. Encoding Issues: Use grok_manage_matv_health with action='repair'
  3. Index Performance Issues: Use grok_manage_matv_health with action='reindex'
  4. Missing Statistics: Ensure public.c77_dbh_matv_stats table is populated with refresh statistics

Extension Points

The system is designed to be extended in several ways:

  1. Add custom data quality checks in the vtw_ view creation
  2. Extend partition and order-by column validation logic
  3. Implement additional maintenance actions in grok_perform_matv_action
  4. Add custom health metrics to grok_manage_matv_health

Description

Postgres Extension for Managing Materialized Views Used as a Cache