11 KiB
c77_mvc PostgreSQL Extension Usage Guide
This guide provides detailed instructions on how to use the c77_mvc extension for materialized view management and table fitness analysis in PostgreSQL.
Table of Contents
- Overview
- Table Fitness Analysis
- Materialized View Management
- Materialized View Health Monitoring
- Advanced Use Cases
- Best Practices
- Function Reference
Overview
The c77_mvc extension provides two main sets of functionality:
- Table Fitness Analysis: Evaluate table structure for data quality, partitioning suitability, and optimization opportunities
- Materialized View Management: Create and maintain optimized materialized views with content hashing, synthetic keys, and encoding status tracking
Table Fitness Analysis
Table fitness analysis helps you evaluate table structure and data quality to inform optimization decisions.
Basic Table Analysis
To perform a basic analysis on a table:
SELECT * FROM public.c77_mvc_analyze_table_fitness('schema_name', 'table_name');
This returns a JSON object containing:
- Column statistics
- Recommended partition combinations
- Order-by candidates
- Data quality index
- Analysis notes
Example with Excluding Key Columns
If you have columns that should not be considered for key combinations:
SELECT * FROM public.c77_mvc_analyze_table_fitness(
'schema_name',
'table_name',
ARRAY['id', 'created_at']::text[]
);
Interpreting Analysis Results
The analysis result includes:
-
Column Stats: Individual column metrics including:
- Null ratio
- Uniqueness ratio
- Encoding issue ratio
- Fitness score
-
Recommended Partition Combinations: Column pairs that work well together for partitioning:
- Uniqueness ratio
- Discrimination power
- Average fitness score
-
Order-by Candidates: Columns suitable for ordering data:
- Timestamp columns
- Text columns parseable as timestamps
-
Data Quality Index (DQI): Overall score from 0-100 indicating data quality
Example query to extract key information:
SELECT
run_id,
run_timestamp,
analysis_result->>'data_quality_index' as dqi,
analysis_result->'recommended_partition_combinations' as partition_recommendations
FROM public.c77_mvc_table_fitness_audit
WHERE source_schema = 'schema_name' AND source_table = 'table_name'
ORDER BY run_id DESC
LIMIT 1;
Materialized View Management
The extension provides tools to create and manage optimized materialized views.
Creating Optimized Materialized Views
To create an optimized materialized view with synthetic keys and content hashing:
SELECT * FROM public.c77_mvc_create_optimized_matv(
'source_schema', -- Source schema name
'source_table', -- Source table name
'target_schema', -- Target schema for materialized view
'matc_target_mv_name', -- Target materialized view name (should start with matc_)
ARRAY['column1', 'column2'], -- Partition columns
ARRAY['timestamp_column'], -- Order-by columns
ARRAY['exclude_column1'], -- Columns to exclude from content hash calculation (optional)
false -- Filter to get only latest records (optional)
);
This creates:
vtw_target_mv_name
: A view with regexp_replace for character columnsmatc_target_mv_name
: A materialized view derived from the vtw_ viewvm_target_mv_name
: A view that filters out encoding issuesvprob_target_mv_name
: A view showing only records with encoding issues
View Structure and Purpose
When you create an optimized materialized view, multiple objects are created:
Object Type | Naming Pattern | Purpose |
---|---|---|
View | vtw_* | Source view with content hash, synthetic key, and encoding status |
Materialized View | matc_* | Materialized copy of the vtw_ view |
View | vm_* | Clean data view (excludes encoding issues) |
View | vprob_* | Problematic data view (only encoding issues) |
Example Use Case
Scenario: Creating a materialized view of customer data:
SELECT * FROM public.c77_mvc_create_optimized_matv(
'sales',
'customers',
'reporting',
'matc_customer_summary',
ARRAY['customer_id', 'region'],
ARRAY['last_updated'],
ARRAY['notes', 'internal_comments'],
false
);
To query clean data:
SELECT * FROM reporting.vm_customer_summary;
To check for encoding issues:
SELECT * FROM reporting.vprob_customer_summary;
Materialized View Health Monitoring
The extension provides tools to monitor and maintain the health of materialized views.
Checking Materialized View Health
SELECT * FROM public.c77_mvc_manage_matv_health(
'schema_name', -- Schema containing the materialized view
'matc_view_name', -- Materialized view name (should start with matc_)
'quick', -- Validation type: 'quick', 'daily', or 'full'
NULL -- Action: NULL, 'refresh', 'repair', or 'reindex'
);
Validation types:
quick
: Fast check with 0.1% sample (3-day threshold)daily
: More thorough check with 1% sample (1-day threshold)full
: Complete check with 100% sample (12-hour threshold)
Automated Refresh
To check health and refresh if needed:
SELECT * FROM public.c77_mvc_manage_matv_health(
'schema_name',
'matc_view_name',
'daily',
'refresh' -- Will refresh if stale based on thresholds
);
Monitoring Multiple Views
Example script to monitor all materialized views in a schema:
DO $$
DECLARE
view_record RECORD;
result JSONB;
BEGIN
FOR view_record IN
SELECT matviewname
FROM pg_matviews
WHERE schemaname = 'target_schema'
AND matviewname LIKE 'matc_%'
LOOP
RAISE NOTICE 'Checking view: %', view_record.matviewname;
SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) INTO result;
RAISE NOTICE 'Status: %', result->>'status';
END LOOP;
END;
$$;
Advanced Use Cases
Customizing Character Encoding Handling
The extension detects non-ASCII characters using the regex pattern [^\x00-\x7F]
. For custom handling:
- Create your own view that modifies the
vtw_
view:
CREATE OR REPLACE VIEW custom_schema.my_custom_vtw AS
SELECT *,
CASE
WHEN column1 ~ '[^\x00-\x7F]' OR column2 ~ '[^\x20-\x7E]' THEN 'CUSTOM_ENCODING_ISSUE'
ELSE 'CLEAN'
END AS encoding_status
FROM schema_name.vtw_original_view;
Batch Refresh Strategy
Example of a batch refresh strategy based on analysis:
DO $$
DECLARE
view_record RECORD;
health_result JSONB;
estimated_time INTERVAL;
total_time INTERVAL := '0 seconds'::INTERVAL;
max_batch_time INTERVAL := '2 hours'::INTERVAL;
views_to_refresh TEXT[] := '{}';
BEGIN
-- Gather health stats and estimated times
FOR view_record IN
SELECT matviewname
FROM pg_matviews
WHERE schemaname = 'target_schema'
AND matviewname LIKE 'matc_%'
LOOP
SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL)
INTO health_result;
IF health_result->>'status' = 'Stale' THEN
estimated_time := (health_result->>'estimated_refresh_time')::INTERVAL;
-- Add to batch if we don't exceed max time
IF (total_time + estimated_time) < max_batch_time THEN
views_to_refresh := array_append(views_to_refresh, view_record.matviewname);
total_time := total_time + estimated_time;
END IF;
END IF;
END LOOP;
-- Refresh the batch
FOREACH view_record.matviewname IN ARRAY views_to_refresh
LOOP
RAISE NOTICE 'Refreshing %', view_record.matviewname;
PERFORM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', 'refresh');
END LOOP;
END;
$$;
Best Practices
Table Fitness Analysis
- Analyze Regularly: Run table fitness analysis regularly to track data quality changes
- Compare Over Time: Store historical analysis results for trend tracking
- Sample Size Consideration: For very large tables, adjust confidence level and margin of error:
SELECT public.c77_mvc_calculate_sample_size(10000000, 0.95, 0.05);
Materialized View Management
-
Naming Convention: Follow the expected naming pattern:
- Materialized views must start with
matc_
- Source views will be created with
vtw_
prefix - Read views will be created with
vm_
prefix - Problem record views will have
vprob_
prefix
- Materialized views must start with
-
Partition Column Selection: Choose partition columns based on:
- Table fitness analysis recommendations
- High uniqueness ratio
- Low null ratio
- Business requirements for data segmentation
-
Order-by Column Selection: Choose columns that:
- Represent timestamps or dates
- Have a clear logical ordering in the data
- Are regularly populated (low null ratio)
-
Refresh Strategy: Consider:
- Data change frequency
- Query load patterns
- Validation type based on criticality
-
Performance Monitoring:
- Track refresh times
- Monitor the c77_dbh_matv_stats table for historical performance
Function Reference
Table Fitness Analysis
Function | Description |
---|---|
c77_mvc_analyze_table_fitness(source_schema, source_table, exclude_key_columns) |
Main function for table fitness analysis |
c77_mvc_calculate_sample_size(total_rows, confidence_level, margin_of_error) |
Calculate appropriate sample size for analysis |
c77_mvc_analyze_column_stats(temp_table_name, col_name, column_type, sample_size, total_rows, exclude_key_columns) |
Analyze individual column statistics |
c77_mvc_analyze_column_combinations(temp_table_name, column_stats, sample_size, total_rows, exclude_key_columns) |
Analyze column combinations for partitioning |
c77_mvc_identify_order_by_candidates(temp_table_name, column_stats) |
Identify columns suitable for ordering |
c77_mvc_calculate_dqi(column_stats) |
Calculate Data Quality Index |
Materialized View Management
Function | Description |
---|---|
c77_mvc_create_optimized_matv(source_schema, source_table, target_schema, target_matview, partition_columns, order_by_columns, exclude_columns_from_hash, filter_latest_only) |
Create an optimized materialized view |
c77_mvc_manage_matv_health(target_schema, matview_name, validation_type, action) |
Check and manage materialized view health |
c77_mvc_check_matv_mismatches(target_schema, matview_name, validation_type) |
Check for mismatches between source and materialized view |
c77_mvc_create_indexes(target_schema, target_mv_name, partition_columns) |
Create indexes on a materialized view |
c77_mvc_validate_matv_inputs(schema_name, matview_name, vtw_name) |
Validate materialized view inputs |
c77_mvc_validate_order_by_columns(source_schema, source_table, order_by_columns) |
Validate order-by columns |
c77_mvc_collect_matv_stats(full_matview_name, full_vtw_name) |
Collect materialized view statistics |
c77_mvc_estimate_matv_refresh_time(full_matview_name) |
Estimate refresh time for a materialized view |