
A data product schema defines the structure of your data. Each field represents a column or attribute.
Field Properties:
Field Definition:
├── Name (technical name)
├── Display Name (user-friendly)
├── Data Type (see types below)
├── Required (true/false)
├── Description
├── Business Rules
├── Default Value
├── Example Values
├── Validation Rules
└── Quality Attributes
Example Field:
Name: transaction_amount
Display Name: Transaction Amount
Data Type: Decimal(10,2)
Required: Yes
Description: Total transaction value including tax
Business Rules: Must be positive, max 1 million
Default Value: 0.00
Examples: 49.99, 1299.00, 15.50
Validation: Range(0, 1000000)
Quality Attributes: Non-null, Positive, Formatted
Supported Data Types:
| Type | Description | Example | Use Cases |
|---|---|---|---|
| String | Text data | "John Doe" | Names, descriptions |
| Integer | Whole numbers | 42 | Counts, IDs |
| Decimal | Floating point | 19.99 | Prices, percentages |
| Boolean | True/False | true | Flags, status |
| Date | Date only | 2025-01-25 | Birth dates, deadlines |
| Timestamp | Date + Time | 2025-01-25T14:30:00Z | Events, logs |
| UUID | Unique ID | 550e8400-e29b-41d4-a716-446655440000 | Primary keys |
| JSON | Structured data | Complex objects | |
| Array | List of values | [1, 2, 3] | Tags, categories |
| Enum | Fixed set | "ACTIVE" | Status codes |
String:
Max Length: 1 - 65535 characters
Format: UTF-8
Validation: Regex patterns, length constraints
Example Constraints:
- Email format
- Phone number format
- URL format
Decimal:
Precision: Total digits (e.g., 10)
Scale: Decimal places (e.g., 2)
Example: Decimal(10,2) allows 99999999.99
Enum:
Definition: ["ACTIVE", "INACTIVE", "PENDING", "SUSPENDED"]
Validation: Must be one of defined values
Case Sensitive: Yes (by default)
UUID:
Format: 8-4-4-4-12 hexadecimal
Example: 550e8400-e29b-41d4-a716-446655440000
Version: UUID v4 (random) recommended
Quality attributes ensure data integrity and reliability.
☑ Required: Field must have a value
☑ Non-Null: No null values allowed
☐ Optional: Value can be missing
Example:
customer_email: Required, Non-Null
customer_phone: Optional
☑ Unique: No duplicate values
☑ Primary Key: Unique identifier
☐ Non-Unique: Duplicates allowed
Example:
customer_id: Unique, Primary Key
order_date: Non-Unique
☑ Format Validation: Must match pattern
☑ Range Validation: Min/max bounds
☑ Enum Validation: Must be from list
Examples:
email: Format(email regex)
age: Range(0, 150)
status: Enum(['ACTIVE', 'INACTIVE'])
☑ Precision: Number of decimal places
☑ Scale: Total significant digits
☑ Reference Check: Foreign key validation
Example:
price: Precision(2), Scale(10)
customer_id: References(customers.id)
☑ Freshness: Max age of data
☑ Update Frequency: How often updated
Example:
stock_price: Freshness(5 minutes)
daily_summary: Update Frequency(Daily)
☑ Cross-Field Validation: Related fields match
☑ Referential Integrity: Valid references
Example:
start_date < end_date
order_total = SUM(line_items.amount)
{
"field": "transaction_amount",
"qualityAttributes": {
"required": true,
"non_null": true,
"min_value": 0,
"max_value": 1000000,
"precision": 2,
"format": "currency",
"freshness": "5 minutes",
"consistency_check": "matches_line_items_sum"
}
}
Schema creation methods are covered in detail in the Sync Strategies section: