c77_mvc/USAGE.md
2025-03-27 07:00:55 -05:00

11 KiB

c77_mvc PostgreSQL Extension Usage Guide

This guide provides detailed instructions on how to use the c77_mvc extension for materialized view management and table fitness analysis in PostgreSQL.

Table of Contents

  1. Overview
  2. Table Fitness Analysis
  3. Materialized View Management
  4. Materialized View Health Monitoring
  5. Advanced Use Cases
  6. Best Practices
  7. Function Reference

Overview

The c77_mvc extension provides two main sets of functionality:

  1. Table Fitness Analysis: Evaluate table structure for data quality, partitioning suitability, and optimization opportunities
  2. Materialized View Management: Create and maintain optimized materialized views with content hashing, synthetic keys, and encoding status tracking

Table Fitness Analysis

Table fitness analysis helps you evaluate table structure and data quality to inform optimization decisions.

Basic Table Analysis

To perform a basic analysis on a table:

SELECT * FROM public.c77_mvc_analyze_table_fitness('schema_name', 'table_name');

This returns a JSON object containing:

  • Column statistics
  • Recommended partition combinations
  • Order-by candidates
  • Data quality index
  • Analysis notes

Example with Excluding Key Columns

If you have columns that should not be considered for key combinations:

SELECT * FROM public.c77_mvc_analyze_table_fitness(
  'schema_name', 
  'table_name', 
  ARRAY['id', 'created_at']::text[]
);

Interpreting Analysis Results

The analysis result includes:

  1. Column Stats: Individual column metrics including:

    • Null ratio
    • Uniqueness ratio
    • Encoding issue ratio
    • Fitness score
  2. Recommended Partition Combinations: Column pairs that work well together for partitioning:

    • Uniqueness ratio
    • Discrimination power
    • Average fitness score
  3. Order-by Candidates: Columns suitable for ordering data:

    • Timestamp columns
    • Text columns parseable as timestamps
  4. Data Quality Index (DQI): Overall score from 0-100 indicating data quality

Example query to extract key information:

SELECT 
  run_id,
  run_timestamp,
  analysis_result->>'data_quality_index' as dqi,
  analysis_result->'recommended_partition_combinations' as partition_recommendations
FROM public.c77_mvc_table_fitness_audit
WHERE source_schema = 'schema_name' AND source_table = 'table_name'
ORDER BY run_id DESC
LIMIT 1;

Materialized View Management

The extension provides tools to create and manage optimized materialized views.

Creating Optimized Materialized Views

To create an optimized materialized view with synthetic keys and content hashing:

SELECT * FROM public.c77_mvc_create_optimized_matv(
  'source_schema',            -- Source schema name
  'source_table',             -- Source table name
  'target_schema',            -- Target schema for materialized view
  'matc_target_mv_name',      -- Target materialized view name (should start with matc_)
  ARRAY['column1', 'column2'],  -- Partition columns
  ARRAY['timestamp_column'],    -- Order-by columns
  ARRAY['exclude_column1'],     -- Columns to exclude from content hash calculation (optional)
  false                         -- Filter to get only latest records (optional)
);

This creates:

  1. vtw_target_mv_name: A view with regexp_replace for character columns
  2. matc_target_mv_name: A materialized view derived from the vtw_ view
  3. vm_target_mv_name: A view that filters out encoding issues
  4. vprob_target_mv_name: A view showing only records with encoding issues

View Structure and Purpose

When you create an optimized materialized view, multiple objects are created:

Object Type Naming Pattern Purpose
View vtw_* Source view with content hash, synthetic key, and encoding status
Materialized View matc_* Materialized copy of the vtw_ view
View vm_* Clean data view (excludes encoding issues)
View vprob_* Problematic data view (only encoding issues)

Example Use Case

Scenario: Creating a materialized view of customer data:

SELECT * FROM public.c77_mvc_create_optimized_matv(
  'sales', 
  'customers', 
  'reporting', 
  'matc_customer_summary',
  ARRAY['customer_id', 'region'], 
  ARRAY['last_updated'], 
  ARRAY['notes', 'internal_comments'],
  false
);

To query clean data:

SELECT * FROM reporting.vm_customer_summary;

To check for encoding issues:

SELECT * FROM reporting.vprob_customer_summary;

Materialized View Health Monitoring

The extension provides tools to monitor and maintain the health of materialized views.

Checking Materialized View Health

SELECT * FROM public.c77_mvc_manage_matv_health(
  'schema_name',       -- Schema containing the materialized view
  'matc_view_name',    -- Materialized view name (should start with matc_)
  'quick',             -- Validation type: 'quick', 'daily', or 'full'
  NULL                 -- Action: NULL, 'refresh', 'repair', or 'reindex'
);

Validation types:

  • quick: Fast check with 0.1% sample (3-day threshold)
  • daily: More thorough check with 1% sample (1-day threshold)
  • full: Complete check with 100% sample (12-hour threshold)

Automated Refresh

To check health and refresh if needed:

SELECT * FROM public.c77_mvc_manage_matv_health(
  'schema_name',
  'matc_view_name',
  'daily',
  'refresh'  -- Will refresh if stale based on thresholds
);

Monitoring Multiple Views

Example script to monitor all materialized views in a schema:

DO $$
DECLARE
  view_record RECORD;
  result JSONB;
BEGIN
  FOR view_record IN 
    SELECT matviewname 
    FROM pg_matviews 
    WHERE schemaname = 'target_schema'
      AND matviewname LIKE 'matc_%'
  LOOP
    RAISE NOTICE 'Checking view: %', view_record.matviewname;
    SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) INTO result;
    RAISE NOTICE 'Status: %', result->>'status';
  END LOOP;
END;
$$;

Advanced Use Cases

Customizing Character Encoding Handling

The extension detects non-ASCII characters using the regex pattern [^\x00-\x7F]. For custom handling:

  1. Create your own view that modifies the vtw_ view:
CREATE OR REPLACE VIEW custom_schema.my_custom_vtw AS
SELECT *,
  CASE
    WHEN column1 ~ '[^\x00-\x7F]' OR column2 ~ '[^\x20-\x7E]' THEN 'CUSTOM_ENCODING_ISSUE'
    ELSE 'CLEAN'
  END AS encoding_status
FROM schema_name.vtw_original_view;

Batch Refresh Strategy

Example of a batch refresh strategy based on analysis:

DO $$
DECLARE
  view_record RECORD;
  health_result JSONB;
  estimated_time INTERVAL;
  total_time INTERVAL := '0 seconds'::INTERVAL;
  max_batch_time INTERVAL := '2 hours'::INTERVAL;
  views_to_refresh TEXT[] := '{}';
BEGIN
  -- Gather health stats and estimated times
  FOR view_record IN 
    SELECT matviewname 
    FROM pg_matviews 
    WHERE schemaname = 'target_schema'
      AND matviewname LIKE 'matc_%'
  LOOP
    SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) 
    INTO health_result;
    
    IF health_result->>'status' = 'Stale' THEN
      estimated_time := (health_result->>'estimated_refresh_time')::INTERVAL;
      
      -- Add to batch if we don't exceed max time
      IF (total_time + estimated_time) < max_batch_time THEN
        views_to_refresh := array_append(views_to_refresh, view_record.matviewname);
        total_time := total_time + estimated_time;
      END IF;
    END IF;
  END LOOP;

  -- Refresh the batch
  FOREACH view_record.matviewname IN ARRAY views_to_refresh
  LOOP
    RAISE NOTICE 'Refreshing %', view_record.matviewname;
    PERFORM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', 'refresh');
  END LOOP;
END;
$$;

Best Practices

Table Fitness Analysis

  1. Analyze Regularly: Run table fitness analysis regularly to track data quality changes
  2. Compare Over Time: Store historical analysis results for trend tracking
  3. Sample Size Consideration: For very large tables, adjust confidence level and margin of error:
    SELECT public.c77_mvc_calculate_sample_size(10000000, 0.95, 0.05);
    

Materialized View Management

  1. Naming Convention: Follow the expected naming pattern:

    • Materialized views must start with matc_
    • Source views will be created with vtw_ prefix
    • Read views will be created with vm_ prefix
    • Problem record views will have vprob_ prefix
  2. Partition Column Selection: Choose partition columns based on:

    • Table fitness analysis recommendations
    • High uniqueness ratio
    • Low null ratio
    • Business requirements for data segmentation
  3. Order-by Column Selection: Choose columns that:

    • Represent timestamps or dates
    • Have a clear logical ordering in the data
    • Are regularly populated (low null ratio)
  4. Refresh Strategy: Consider:

    • Data change frequency
    • Query load patterns
    • Validation type based on criticality
  5. Performance Monitoring:

    • Track refresh times
    • Monitor the c77_dbh_matv_stats table for historical performance

Function Reference

Table Fitness Analysis

Function Description
c77_mvc_analyze_table_fitness(source_schema, source_table, exclude_key_columns) Main function for table fitness analysis
c77_mvc_calculate_sample_size(total_rows, confidence_level, margin_of_error) Calculate appropriate sample size for analysis
c77_mvc_analyze_column_stats(temp_table_name, col_name, column_type, sample_size, total_rows, exclude_key_columns) Analyze individual column statistics
c77_mvc_analyze_column_combinations(temp_table_name, column_stats, sample_size, total_rows, exclude_key_columns) Analyze column combinations for partitioning
c77_mvc_identify_order_by_candidates(temp_table_name, column_stats) Identify columns suitable for ordering
c77_mvc_calculate_dqi(column_stats) Calculate Data Quality Index

Materialized View Management

Function Description
c77_mvc_create_optimized_matv(source_schema, source_table, target_schema, target_matview, partition_columns, order_by_columns, exclude_columns_from_hash, filter_latest_only) Create an optimized materialized view
c77_mvc_manage_matv_health(target_schema, matview_name, validation_type, action) Check and manage materialized view health
c77_mvc_check_matv_mismatches(target_schema, matview_name, validation_type) Check for mismatches between source and materialized view
c77_mvc_create_indexes(target_schema, target_mv_name, partition_columns) Create indexes on a materialized view
c77_mvc_validate_matv_inputs(schema_name, matview_name, vtw_name) Validate materialized view inputs
c77_mvc_validate_order_by_columns(source_schema, source_table, order_by_columns) Validate order-by columns
c77_mvc_collect_matv_stats(full_matview_name, full_vtw_name) Collect materialized view statistics
c77_mvc_estimate_matv_refresh_time(full_matview_name) Estimate refresh time for a materialized view