# c77_mvc PostgreSQL Extension Usage Guide This guide provides detailed instructions on how to use the c77_mvc extension for materialized view management and table fitness analysis in PostgreSQL. ## Table of Contents 1. [Overview](#overview) 2. [Table Fitness Analysis](#table-fitness-analysis) 3. [Materialized View Management](#materialized-view-management) 4. [Materialized View Health Monitoring](#materialized-view-health-monitoring) 5. [Advanced Use Cases](#advanced-use-cases) 6. [Best Practices](#best-practices) 7. [Function Reference](#function-reference) ## Overview The c77_mvc extension provides two main sets of functionality: 1. **Table Fitness Analysis**: Evaluate table structure for data quality, partitioning suitability, and optimization opportunities 2. **Materialized View Management**: Create and maintain optimized materialized views with content hashing, synthetic keys, and encoding status tracking ## Table Fitness Analysis Table fitness analysis helps you evaluate table structure and data quality to inform optimization decisions. ### Basic Table Analysis To perform a basic analysis on a table: ```sql SELECT * FROM public.c77_mvc_analyze_table_fitness('schema_name', 'table_name'); ``` This returns a JSON object containing: - Column statistics - Recommended partition combinations - Order-by candidates - Data quality index - Analysis notes ### Example with Excluding Key Columns If you have columns that should not be considered for key combinations: ```sql SELECT * FROM public.c77_mvc_analyze_table_fitness( 'schema_name', 'table_name', ARRAY['id', 'created_at']::text[] ); ``` ### Interpreting Analysis Results The analysis result includes: 1. **Column Stats**: Individual column metrics including: - Null ratio - Uniqueness ratio - Encoding issue ratio - Fitness score 2. **Recommended Partition Combinations**: Column pairs that work well together for partitioning: - Uniqueness ratio - Discrimination power - Average fitness score 3. **Order-by Candidates**: Columns suitable for ordering data: - Timestamp columns - Text columns parseable as timestamps 4. **Data Quality Index (DQI)**: Overall score from 0-100 indicating data quality Example query to extract key information: ```sql SELECT run_id, run_timestamp, analysis_result->>'data_quality_index' as dqi, analysis_result->'recommended_partition_combinations' as partition_recommendations FROM public.c77_mvc_table_fitness_audit WHERE source_schema = 'schema_name' AND source_table = 'table_name' ORDER BY run_id DESC LIMIT 1; ``` ## Materialized View Management The extension provides tools to create and manage optimized materialized views. ### Creating Optimized Materialized Views To create an optimized materialized view with synthetic keys and content hashing: ```sql SELECT * FROM public.c77_mvc_create_optimized_matv( 'source_schema', -- Source schema name 'source_table', -- Source table name 'target_schema', -- Target schema for materialized view 'matc_target_mv_name', -- Target materialized view name (should start with matc_) ARRAY['column1', 'column2'], -- Partition columns ARRAY['timestamp_column'], -- Order-by columns ARRAY['exclude_column1'], -- Columns to exclude from content hash calculation (optional) false -- Filter to get only latest records (optional) ); ``` This creates: 1. `vtw_target_mv_name`: A view with regexp_replace for character columns 2. `matc_target_mv_name`: A materialized view derived from the vtw_ view 3. `vm_target_mv_name`: A view that filters out encoding issues 4. `vprob_target_mv_name`: A view showing only records with encoding issues ### View Structure and Purpose When you create an optimized materialized view, multiple objects are created: | Object Type | Naming Pattern | Purpose | |-------------|----------------|---------| | View | vtw_* | Source view with content hash, synthetic key, and encoding status | | Materialized View | matc_* | Materialized copy of the vtw_ view | | View | vm_* | Clean data view (excludes encoding issues) | | View | vprob_* | Problematic data view (only encoding issues) | ### Example Use Case Scenario: Creating a materialized view of customer data: ```sql SELECT * FROM public.c77_mvc_create_optimized_matv( 'sales', 'customers', 'reporting', 'matc_customer_summary', ARRAY['customer_id', 'region'], ARRAY['last_updated'], ARRAY['notes', 'internal_comments'], false ); ``` To query clean data: ```sql SELECT * FROM reporting.vm_customer_summary; ``` To check for encoding issues: ```sql SELECT * FROM reporting.vprob_customer_summary; ``` ## Materialized View Health Monitoring The extension provides tools to monitor and maintain the health of materialized views. ### Checking Materialized View Health ```sql SELECT * FROM public.c77_mvc_manage_matv_health( 'schema_name', -- Schema containing the materialized view 'matc_view_name', -- Materialized view name (should start with matc_) 'quick', -- Validation type: 'quick', 'daily', or 'full' NULL -- Action: NULL, 'refresh', 'repair', or 'reindex' ); ``` Validation types: - `quick`: Fast check with 0.1% sample (3-day threshold) - `daily`: More thorough check with 1% sample (1-day threshold) - `full`: Complete check with 100% sample (12-hour threshold) ### Automated Refresh To check health and refresh if needed: ```sql SELECT * FROM public.c77_mvc_manage_matv_health( 'schema_name', 'matc_view_name', 'daily', 'refresh' -- Will refresh if stale based on thresholds ); ``` ### Monitoring Multiple Views Example script to monitor all materialized views in a schema: ```sql DO $$ DECLARE view_record RECORD; result JSONB; BEGIN FOR view_record IN SELECT matviewname FROM pg_matviews WHERE schemaname = 'target_schema' AND matviewname LIKE 'matc_%' LOOP RAISE NOTICE 'Checking view: %', view_record.matviewname; SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) INTO result; RAISE NOTICE 'Status: %', result->>'status'; END LOOP; END; $$; ``` ## Advanced Use Cases ### Customizing Character Encoding Handling The extension detects non-ASCII characters using the regex pattern `[^\x00-\x7F]`. For custom handling: 1. Create your own view that modifies the `vtw_` view: ```sql CREATE OR REPLACE VIEW custom_schema.my_custom_vtw AS SELECT *, CASE WHEN column1 ~ '[^\x00-\x7F]' OR column2 ~ '[^\x20-\x7E]' THEN 'CUSTOM_ENCODING_ISSUE' ELSE 'CLEAN' END AS encoding_status FROM schema_name.vtw_original_view; ``` ### Batch Refresh Strategy Example of a batch refresh strategy based on analysis: ```sql DO $$ DECLARE view_record RECORD; health_result JSONB; estimated_time INTERVAL; total_time INTERVAL := '0 seconds'::INTERVAL; max_batch_time INTERVAL := '2 hours'::INTERVAL; views_to_refresh TEXT[] := '{}'; BEGIN -- Gather health stats and estimated times FOR view_record IN SELECT matviewname FROM pg_matviews WHERE schemaname = 'target_schema' AND matviewname LIKE 'matc_%' LOOP SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) INTO health_result; IF health_result->>'status' = 'Stale' THEN estimated_time := (health_result->>'estimated_refresh_time')::INTERVAL; -- Add to batch if we don't exceed max time IF (total_time + estimated_time) < max_batch_time THEN views_to_refresh := array_append(views_to_refresh, view_record.matviewname); total_time := total_time + estimated_time; END IF; END IF; END LOOP; -- Refresh the batch FOREACH view_record.matviewname IN ARRAY views_to_refresh LOOP RAISE NOTICE 'Refreshing %', view_record.matviewname; PERFORM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', 'refresh'); END LOOP; END; $$; ``` ## Best Practices ### Table Fitness Analysis 1. **Analyze Regularly**: Run table fitness analysis regularly to track data quality changes 2. **Compare Over Time**: Store historical analysis results for trend tracking 3. **Sample Size Consideration**: For very large tables, adjust confidence level and margin of error: ```sql SELECT public.c77_mvc_calculate_sample_size(10000000, 0.95, 0.05); ``` ### Materialized View Management 1. **Naming Convention**: Follow the expected naming pattern: - Materialized views must start with `matc_` - Source views will be created with `vtw_` prefix - Read views will be created with `vm_` prefix - Problem record views will have `vprob_` prefix 2. **Partition Column Selection**: Choose partition columns based on: - Table fitness analysis recommendations - High uniqueness ratio - Low null ratio - Business requirements for data segmentation 3. **Order-by Column Selection**: Choose columns that: - Represent timestamps or dates - Have a clear logical ordering in the data - Are regularly populated (low null ratio) 4. **Refresh Strategy**: Consider: - Data change frequency - Query load patterns - Validation type based on criticality 5. **Performance Monitoring**: - Track refresh times - Monitor the c77_dbh_matv_stats table for historical performance ## Function Reference ### Table Fitness Analysis | Function | Description | |----------|-------------| | `c77_mvc_analyze_table_fitness(source_schema, source_table, exclude_key_columns)` | Main function for table fitness analysis | | `c77_mvc_calculate_sample_size(total_rows, confidence_level, margin_of_error)` | Calculate appropriate sample size for analysis | | `c77_mvc_analyze_column_stats(temp_table_name, col_name, column_type, sample_size, total_rows, exclude_key_columns)` | Analyze individual column statistics | | `c77_mvc_analyze_column_combinations(temp_table_name, column_stats, sample_size, total_rows, exclude_key_columns)` | Analyze column combinations for partitioning | | `c77_mvc_identify_order_by_candidates(temp_table_name, column_stats)` | Identify columns suitable for ordering | | `c77_mvc_calculate_dqi(column_stats)` | Calculate Data Quality Index | ### Materialized View Management | Function | Description | |----------|-------------| | `c77_mvc_create_optimized_matv(source_schema, source_table, target_schema, target_matview, partition_columns, order_by_columns, exclude_columns_from_hash, filter_latest_only)` | Create an optimized materialized view | | `c77_mvc_manage_matv_health(target_schema, matview_name, validation_type, action)` | Check and manage materialized view health | | `c77_mvc_check_matv_mismatches(target_schema, matview_name, validation_type)` | Check for mismatches between source and materialized view | | `c77_mvc_create_indexes(target_schema, target_mv_name, partition_columns)` | Create indexes on a materialized view | | `c77_mvc_validate_matv_inputs(schema_name, matview_name, vtw_name)` | Validate materialized view inputs | | `c77_mvc_validate_order_by_columns(source_schema, source_table, order_by_columns)` | Validate order-by columns | | `c77_mvc_collect_matv_stats(full_matview_name, full_vtw_name)` | Collect materialized view statistics | | `c77_mvc_estimate_matv_refresh_time(full_matview_name)` | Estimate refresh time for a materialized view |