336 lines
11 KiB
Markdown
336 lines
11 KiB
Markdown
# c77_mvc PostgreSQL Extension Usage Guide
|
|
|
|
This guide provides detailed instructions on how to use the c77_mvc extension for materialized view management and table fitness analysis in PostgreSQL.
|
|
|
|
## Table of Contents
|
|
|
|
1. [Overview](#overview)
|
|
2. [Table Fitness Analysis](#table-fitness-analysis)
|
|
3. [Materialized View Management](#materialized-view-management)
|
|
4. [Materialized View Health Monitoring](#materialized-view-health-monitoring)
|
|
5. [Advanced Use Cases](#advanced-use-cases)
|
|
6. [Best Practices](#best-practices)
|
|
7. [Function Reference](#function-reference)
|
|
|
|
## Overview
|
|
|
|
The c77_mvc extension provides two main sets of functionality:
|
|
|
|
1. **Table Fitness Analysis**: Evaluate table structure for data quality, partitioning suitability, and optimization opportunities
|
|
2. **Materialized View Management**: Create and maintain optimized materialized views with content hashing, synthetic keys, and encoding status tracking
|
|
|
|
## Table Fitness Analysis
|
|
|
|
Table fitness analysis helps you evaluate table structure and data quality to inform optimization decisions.
|
|
|
|
### Basic Table Analysis
|
|
|
|
To perform a basic analysis on a table:
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_analyze_table_fitness('schema_name', 'table_name');
|
|
```
|
|
|
|
This returns a JSON object containing:
|
|
- Column statistics
|
|
- Recommended partition combinations
|
|
- Order-by candidates
|
|
- Data quality index
|
|
- Analysis notes
|
|
|
|
### Example with Excluding Key Columns
|
|
|
|
If you have columns that should not be considered for key combinations:
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_analyze_table_fitness(
|
|
'schema_name',
|
|
'table_name',
|
|
ARRAY['id', 'created_at']::text[]
|
|
);
|
|
```
|
|
|
|
### Interpreting Analysis Results
|
|
|
|
The analysis result includes:
|
|
|
|
1. **Column Stats**: Individual column metrics including:
|
|
- Null ratio
|
|
- Uniqueness ratio
|
|
- Encoding issue ratio
|
|
- Fitness score
|
|
|
|
2. **Recommended Partition Combinations**: Column pairs that work well together for partitioning:
|
|
- Uniqueness ratio
|
|
- Discrimination power
|
|
- Average fitness score
|
|
|
|
3. **Order-by Candidates**: Columns suitable for ordering data:
|
|
- Timestamp columns
|
|
- Text columns parseable as timestamps
|
|
|
|
4. **Data Quality Index (DQI)**: Overall score from 0-100 indicating data quality
|
|
|
|
Example query to extract key information:
|
|
|
|
```sql
|
|
SELECT
|
|
run_id,
|
|
run_timestamp,
|
|
analysis_result->>'data_quality_index' as dqi,
|
|
analysis_result->'recommended_partition_combinations' as partition_recommendations
|
|
FROM public.c77_mvc_table_fitness_audit
|
|
WHERE source_schema = 'schema_name' AND source_table = 'table_name'
|
|
ORDER BY run_id DESC
|
|
LIMIT 1;
|
|
```
|
|
|
|
## Materialized View Management
|
|
|
|
The extension provides tools to create and manage optimized materialized views.
|
|
|
|
### Creating Optimized Materialized Views
|
|
|
|
To create an optimized materialized view with synthetic keys and content hashing:
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_create_optimized_matv(
|
|
'source_schema', -- Source schema name
|
|
'source_table', -- Source table name
|
|
'target_schema', -- Target schema for materialized view
|
|
'matc_target_mv_name', -- Target materialized view name (should start with matc_)
|
|
ARRAY['column1', 'column2'], -- Partition columns
|
|
ARRAY['timestamp_column'], -- Order-by columns
|
|
ARRAY['exclude_column1'], -- Columns to exclude from content hash calculation (optional)
|
|
false -- Filter to get only latest records (optional)
|
|
);
|
|
```
|
|
|
|
This creates:
|
|
1. `vtw_target_mv_name`: A view with regexp_replace for character columns
|
|
2. `matc_target_mv_name`: A materialized view derived from the vtw_ view
|
|
3. `vm_target_mv_name`: A view that filters out encoding issues
|
|
4. `vprob_target_mv_name`: A view showing only records with encoding issues
|
|
|
|
### View Structure and Purpose
|
|
|
|
When you create an optimized materialized view, multiple objects are created:
|
|
|
|
| Object Type | Naming Pattern | Purpose |
|
|
|-------------|----------------|---------|
|
|
| View | vtw_* | Source view with content hash, synthetic key, and encoding status |
|
|
| Materialized View | matc_* | Materialized copy of the vtw_ view |
|
|
| View | vm_* | Clean data view (excludes encoding issues) |
|
|
| View | vprob_* | Problematic data view (only encoding issues) |
|
|
|
|
### Example Use Case
|
|
|
|
Scenario: Creating a materialized view of customer data:
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_create_optimized_matv(
|
|
'sales',
|
|
'customers',
|
|
'reporting',
|
|
'matc_customer_summary',
|
|
ARRAY['customer_id', 'region'],
|
|
ARRAY['last_updated'],
|
|
ARRAY['notes', 'internal_comments'],
|
|
false
|
|
);
|
|
```
|
|
|
|
To query clean data:
|
|
```sql
|
|
SELECT * FROM reporting.vm_customer_summary;
|
|
```
|
|
|
|
To check for encoding issues:
|
|
```sql
|
|
SELECT * FROM reporting.vprob_customer_summary;
|
|
```
|
|
|
|
## Materialized View Health Monitoring
|
|
|
|
The extension provides tools to monitor and maintain the health of materialized views.
|
|
|
|
### Checking Materialized View Health
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_manage_matv_health(
|
|
'schema_name', -- Schema containing the materialized view
|
|
'matc_view_name', -- Materialized view name (should start with matc_)
|
|
'quick', -- Validation type: 'quick', 'daily', or 'full'
|
|
NULL -- Action: NULL, 'refresh', 'repair', or 'reindex'
|
|
);
|
|
```
|
|
|
|
Validation types:
|
|
- `quick`: Fast check with 0.1% sample (3-day threshold)
|
|
- `daily`: More thorough check with 1% sample (1-day threshold)
|
|
- `full`: Complete check with 100% sample (12-hour threshold)
|
|
|
|
### Automated Refresh
|
|
|
|
To check health and refresh if needed:
|
|
|
|
```sql
|
|
SELECT * FROM public.c77_mvc_manage_matv_health(
|
|
'schema_name',
|
|
'matc_view_name',
|
|
'daily',
|
|
'refresh' -- Will refresh if stale based on thresholds
|
|
);
|
|
```
|
|
|
|
### Monitoring Multiple Views
|
|
|
|
Example script to monitor all materialized views in a schema:
|
|
|
|
```sql
|
|
DO $$
|
|
DECLARE
|
|
view_record RECORD;
|
|
result JSONB;
|
|
BEGIN
|
|
FOR view_record IN
|
|
SELECT matviewname
|
|
FROM pg_matviews
|
|
WHERE schemaname = 'target_schema'
|
|
AND matviewname LIKE 'matc_%'
|
|
LOOP
|
|
RAISE NOTICE 'Checking view: %', view_record.matviewname;
|
|
SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL) INTO result;
|
|
RAISE NOTICE 'Status: %', result->>'status';
|
|
END LOOP;
|
|
END;
|
|
$$;
|
|
```
|
|
|
|
## Advanced Use Cases
|
|
|
|
### Customizing Character Encoding Handling
|
|
|
|
The extension detects non-ASCII characters using the regex pattern `[^\x00-\x7F]`. For custom handling:
|
|
|
|
1. Create your own view that modifies the `vtw_` view:
|
|
```sql
|
|
CREATE OR REPLACE VIEW custom_schema.my_custom_vtw AS
|
|
SELECT *,
|
|
CASE
|
|
WHEN column1 ~ '[^\x00-\x7F]' OR column2 ~ '[^\x20-\x7E]' THEN 'CUSTOM_ENCODING_ISSUE'
|
|
ELSE 'CLEAN'
|
|
END AS encoding_status
|
|
FROM schema_name.vtw_original_view;
|
|
```
|
|
|
|
### Batch Refresh Strategy
|
|
|
|
Example of a batch refresh strategy based on analysis:
|
|
|
|
```sql
|
|
DO $$
|
|
DECLARE
|
|
view_record RECORD;
|
|
health_result JSONB;
|
|
estimated_time INTERVAL;
|
|
total_time INTERVAL := '0 seconds'::INTERVAL;
|
|
max_batch_time INTERVAL := '2 hours'::INTERVAL;
|
|
views_to_refresh TEXT[] := '{}';
|
|
BEGIN
|
|
-- Gather health stats and estimated times
|
|
FOR view_record IN
|
|
SELECT matviewname
|
|
FROM pg_matviews
|
|
WHERE schemaname = 'target_schema'
|
|
AND matviewname LIKE 'matc_%'
|
|
LOOP
|
|
SELECT * FROM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', NULL)
|
|
INTO health_result;
|
|
|
|
IF health_result->>'status' = 'Stale' THEN
|
|
estimated_time := (health_result->>'estimated_refresh_time')::INTERVAL;
|
|
|
|
-- Add to batch if we don't exceed max time
|
|
IF (total_time + estimated_time) < max_batch_time THEN
|
|
views_to_refresh := array_append(views_to_refresh, view_record.matviewname);
|
|
total_time := total_time + estimated_time;
|
|
END IF;
|
|
END IF;
|
|
END LOOP;
|
|
|
|
-- Refresh the batch
|
|
FOREACH view_record.matviewname IN ARRAY views_to_refresh
|
|
LOOP
|
|
RAISE NOTICE 'Refreshing %', view_record.matviewname;
|
|
PERFORM public.c77_mvc_manage_matv_health('target_schema', view_record.matviewname, 'quick', 'refresh');
|
|
END LOOP;
|
|
END;
|
|
$$;
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
### Table Fitness Analysis
|
|
|
|
1. **Analyze Regularly**: Run table fitness analysis regularly to track data quality changes
|
|
2. **Compare Over Time**: Store historical analysis results for trend tracking
|
|
3. **Sample Size Consideration**: For very large tables, adjust confidence level and margin of error:
|
|
```sql
|
|
SELECT public.c77_mvc_calculate_sample_size(10000000, 0.95, 0.05);
|
|
```
|
|
|
|
### Materialized View Management
|
|
|
|
1. **Naming Convention**: Follow the expected naming pattern:
|
|
- Materialized views must start with `matc_`
|
|
- Source views will be created with `vtw_` prefix
|
|
- Read views will be created with `vm_` prefix
|
|
- Problem record views will have `vprob_` prefix
|
|
|
|
2. **Partition Column Selection**: Choose partition columns based on:
|
|
- Table fitness analysis recommendations
|
|
- High uniqueness ratio
|
|
- Low null ratio
|
|
- Business requirements for data segmentation
|
|
|
|
3. **Order-by Column Selection**: Choose columns that:
|
|
- Represent timestamps or dates
|
|
- Have a clear logical ordering in the data
|
|
- Are regularly populated (low null ratio)
|
|
|
|
4. **Refresh Strategy**: Consider:
|
|
- Data change frequency
|
|
- Query load patterns
|
|
- Validation type based on criticality
|
|
|
|
5. **Performance Monitoring**:
|
|
- Track refresh times
|
|
- Monitor the c77_dbh_matv_stats table for historical performance
|
|
|
|
## Function Reference
|
|
|
|
### Table Fitness Analysis
|
|
|
|
| Function | Description |
|
|
|----------|-------------|
|
|
| `c77_mvc_analyze_table_fitness(source_schema, source_table, exclude_key_columns)` | Main function for table fitness analysis |
|
|
| `c77_mvc_calculate_sample_size(total_rows, confidence_level, margin_of_error)` | Calculate appropriate sample size for analysis |
|
|
| `c77_mvc_analyze_column_stats(temp_table_name, col_name, column_type, sample_size, total_rows, exclude_key_columns)` | Analyze individual column statistics |
|
|
| `c77_mvc_analyze_column_combinations(temp_table_name, column_stats, sample_size, total_rows, exclude_key_columns)` | Analyze column combinations for partitioning |
|
|
| `c77_mvc_identify_order_by_candidates(temp_table_name, column_stats)` | Identify columns suitable for ordering |
|
|
| `c77_mvc_calculate_dqi(column_stats)` | Calculate Data Quality Index |
|
|
|
|
### Materialized View Management
|
|
|
|
| Function | Description |
|
|
|----------|-------------|
|
|
| `c77_mvc_create_optimized_matv(source_schema, source_table, target_schema, target_matview, partition_columns, order_by_columns, exclude_columns_from_hash, filter_latest_only)` | Create an optimized materialized view |
|
|
| `c77_mvc_manage_matv_health(target_schema, matview_name, validation_type, action)` | Check and manage materialized view health |
|
|
| `c77_mvc_check_matv_mismatches(target_schema, matview_name, validation_type)` | Check for mismatches between source and materialized view |
|
|
| `c77_mvc_create_indexes(target_schema, target_mv_name, partition_columns)` | Create indexes on a materialized view |
|
|
| `c77_mvc_validate_matv_inputs(schema_name, matview_name, vtw_name)` | Validate materialized view inputs |
|
|
| `c77_mvc_validate_order_by_columns(source_schema, source_table, order_by_columns)` | Validate order-by columns |
|
|
| `c77_mvc_collect_matv_stats(full_matview_name, full_vtw_name)` | Collect materialized view statistics |
|
|
| `c77_mvc_estimate_matv_refresh_time(full_matview_name)` | Estimate refresh time for a materialized view |
|