Overview
The validation algorithm ensures that the IME engine transforms only valid Vietnamese syllables. It uses a whitelist-based approach with 6 sequential rules that check syllable structure, phonotactics, and spelling conventions.
Purpose
Validation keeps the engine from transforming input that is not Vietnamese, such as:
- Code identifiers (`function`, `const`)
- Foreign names (`John`, `Claude`)
- Loanwords (`pizza`, `email`)
- URLs and email addresses
Syllable Structure
Formula
Examples
| Input | C₁ | G | V | C₂ | Valid |
|---|---|---|---|---|---|
| a | - | - | a | - | ✓ |
| ban | b | - | a | n | ✓ |
| hoa | h | o | a | - | ✓ |
| qua | qu | - | a | - | ✓ |
| giau | gi | - | au | - | ✓ |
| nghieng | ngh | - | ie | ng | ✓ |
| duoc | d | - | uo | c | ✓ |
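The decomposition in the table can be sketched in a few lines of Rust. This is a minimal illustration, not the engine's parser: `Syllable`, `split_syllable`, and the abbreviated `INITIALS` and `FINALS` tables are hypothetical names that cover only the rows shown above, and glide detection is reduced to `o` before `a` or `e`.

```rust
/// Parts of a syllable, named after the table columns C₁, G, V, C₂.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
struct Syllable<'a> {
    initial: &'a str, // C₁
    glide: &'a str,   // G
    vowel: &'a str,   // V
    final_c: &'a str, // C₂
}

// Abbreviated, assumed tables. Longer entries come first so that
// "ngh" is tried before "ng" and "n".
const INITIALS: &[&str] = &["ngh", "ng", "gi", "qu", "b", "d", "h", "n"];
const FINALS: &[&str] = &["ng", "c", "n"];

fn is_vowel(c: char) -> bool {
    matches!(c, 'a' | 'e' | 'i' | 'o' | 'u' | 'y')
}

/// Splits an unaccented, lowercase syllable into its four parts.
/// Returns None when no vowel nucleus is left between C₁ and C₂.
fn split_syllable<'a>(s: &'a str) -> Option<Syllable<'a>> {
    // C₁: first (longest) matching initial, or empty.
    let initial = INITIALS
        .iter()
        .copied()
        .find(|i| s.starts_with(*i))
        .unwrap_or("");
    let rest = &s[initial.len()..];

    // C₂: first (longest) matching final, or empty.
    let final_c = FINALS
        .iter()
        .copied()
        .find(|f| rest.ends_with(*f))
        .unwrap_or("");
    let nucleus = &rest[..rest.len() - final_c.len()];

    // Whatever remains must be a non-empty run of vowels.
    if nucleus.is_empty() || !nucleus.chars().all(is_vowel) {
        return None;
    }

    // G: simplified rule, 'o' directly before 'a' or 'e' is a glide.
    let (glide, vowel) = if nucleus.starts_with("oa") || nucleus.starts_with("oe") {
        nucleus.split_at(1)
    } else {
        ("", nucleus)
    };

    Some(Syllable { initial, glide, vowel, final_c })
}
```

Under these assumptions, `split_syllable("nghieng")` yields `ngh` / (none) / `ie` / `ng`, while `split_syllable("function")` returns `None` because the remainder is not a pure vowel nucleus.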
Data Constants
Initial Consonants (C₁)
- Single (16)
- Double (11)
- Triple (1)
core/src/data/constants.rs
Final Consonants (C₂)
- Single (10)
- Double (3)
`k` is included as a final consonant to support names from ethnic minority languages (Đắk Lắk, Đắk Nông).
Spelling Rules
C / K / Q Distribution
| Consonant | Invalid before | Should use |
|---|---|---|
| c | e, i, y | → k |
| k | a, o, u | → c |
| q | (always with u) | → qu |
G / GH Distribution
| Consonant | Invalid before | Should use |
|---|---|---|
| g | e | → gh |
| gh | a, o, u | → g |
NG / NGH Distribution
| Consonant | Invalid before | Should use |
|---|---|---|
| ng | e, i | → ngh |
| ngh | a, o, u | → ng |
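The three distribution tables translate naturally into a single `match`. The sketch below is an illustration under assumptions, not the code in the engine: `spelling_fix` is a hypothetical helper, and it works on unaccented base letters only (for example `e` stands in for both `e` and `ê`).

```rust
/// Returns the initial that should have been used, or None when the
/// (initial, next letter) pair respects the distribution tables.
/// Hypothetical helper; operates on unaccented lowercase letters.
fn spelling_fix(initial: &str, next: char) -> Option<&'static str> {
    match (initial, next) {
        // C / K / Q distribution
        ("c", 'e' | 'i' | 'y') => Some("k"),
        ("k", 'a' | 'o' | 'u') => Some("c"),
        ("q", c) if c != 'u' => Some("qu"),
        // G / GH distribution
        ("g", 'e') => Some("gh"),
        ("gh", 'a' | 'o' | 'u') => Some("g"),
        // NG / NGH distribution
        ("ng", 'e' | 'i') => Some("ngh"),
        ("ngh", 'a' | 'o' | 'u') => Some("ng"),
        // Every other combination is acceptable as far as these tables go.
        _ => None,
    }
}
```

Returning the suggested initial rather than a plain boolean is a design choice of this sketch: it mirrors the "Should use" column, so a caller can either reject the input or report the expected spelling.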
Valid Vowel Patterns (Whitelist)
- Diphthongs (29)
- Triphthongs (11)
core/src/data/constants.rs
| Aspect | Inclusion (whitelist) | Exclusion (blacklist) |
|---|---|---|
| Coverage | Comprehensive - catches all invalid | Only catches listed patterns |
| Maintenance | Need to add Telex states | Easy to miss edge cases |
| Risk | False negative (need Telex states) | False positive (miss invalid) |
Vowel patterns that are absent from the whitelist and common in English words:
- `ea` → sea, beach, teacher, search
- `ou` → you, our, house, about, would
- `yo` → yoke, York, your, beyond
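A whitelist check of this kind can be sketched as a lookup in a constant slice. The list below is a small illustrative subset written in unaccented base letters; it is not the set of 29 diphthongs and 11 triphthongs defined in `core/src/data/constants.rs`, and `VALID_VOWEL_PATTERNS` and `is_whitelisted` are hypothetical names.

```rust
// Illustrative subset only (assumed), in unaccented base letters.
const VALID_VOWEL_PATTERNS: &[&str] = &[
    "ai", "ao", "au", "ay", "eo", "ia", "ie", "oa", "oi", "ua", "uo", "uy", // diphthongs
    "ieu", "uoi", // triphthongs
];

/// Accepts a multi-vowel pattern only if it is explicitly listed.
/// Anything not on the list, including "ea", "ou", and "yo", is rejected
/// without needing a separate blacklist entry.
fn is_whitelisted(vowels: &str) -> bool {
    VALID_VOWEL_PATTERNS.iter().any(|p| *p == vowels)
}
```

With this subset, `is_whitelisted("oa")` is `true` and `is_whitelisted("ea")` is `false`. The trade-off matches the table: a pattern that is valid but missing from the list (such as an intermediate Telex state) would be rejected until it is added.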
Validation Rules
Rule Execution Order
core/src/engine/validation.rs
Rule 3: All Chars Parsed
Rule 4: Spelling Rules
- ✓ `ca`, `ke`, `ghe`, `nghi`
- ✗ `ci`, `ce`, `cy`, `ka`, `ko`, `ku`, `ge`, `nge` (spelling violations)
Foreign Word Detection
Beyond validation, the engine detects foreign word patterns and skips transformation for them:
core/src/engine/validation.rs
Invalid Vowel Patterns
- `ou` (you, our, house)
- `yo` (yoke, York)
- `ea` (search, beach)
Consonant Clusters
- T+R (metric, matrix)
- P+R (spectrum)
- C+R (across)
Validation API
core/src/engine/validation.rs
Integration with Engine
Test Coverage
- Valid Syllables
- Invalid - No Vowel
- Invalid - Bad Initial
- Invalid - Spelling
- Invalid - Foreign
Performance Considerations
Fast Failure
Rules execute sequentially, and the first failure returns immediately.
Constant Lookup
All validation data lives in const arrays with O(1) access.
No Allocations
Validation works on slices and indices, with no heap allocations.
Zero-Copy
BufferSnapshot holds references to the original buffer.
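The sequential, fail-fast behavior can be sketched as a constant array of rule functions that is walked until one returns an error. Everything here is an assumption made for illustration: the `ValidationError` variants, the `Rule` type, and the three simplified rules are hypothetical and stand in for the engine's 6 rules, whose actual signatures are not shown on this page.

```rust
/// Hypothetical error type; variant names are illustrative.
#[derive(Debug, PartialEq)]
enum ValidationError {
    NoVowel,
    BadInitial,
    Spelling,
}

/// A rule inspects the input and either passes or reports why it failed.
type Rule = fn(&str) -> Result<(), ValidationError>;

fn has_vowel(s: &str) -> Result<(), ValidationError> {
    if s.chars().any(|c| "aeiouy".contains(c)) {
        Ok(())
    } else {
        Err(ValidationError::NoVowel)
    }
}

// Simplified: rejects a few letters assumed not to start Vietnamese syllables.
fn initial_ok(s: &str) -> Result<(), ValidationError> {
    if matches!(s.chars().next(), Some('f' | 'j' | 'w' | 'z')) {
        Err(ValidationError::BadInitial)
    } else {
        Ok(())
    }
}

// Simplified: covers only the c/k part of the spelling rules.
fn spelling_ok(s: &str) -> Result<(), ValidationError> {
    const BAD_PREFIXES: &[&str] = &["ce", "ci", "cy", "ka", "ko", "ku"];
    if BAD_PREFIXES.iter().any(|p| s.starts_with(*p)) {
        Err(ValidationError::Spelling)
    } else {
        Ok(())
    }
}

// Rules live in a const array: no allocation, fixed order.
const RULES: [Rule; 3] = [has_vowel, initial_ok, spelling_ok];

/// Runs the rules in order; `?` returns on the first failure,
/// so later rules never run for input that has already failed.
fn validate(s: &str) -> Result<(), ValidationError> {
    for rule in RULES.iter() {
        rule(s)?;
    }
    Ok(())
}
```

In this sketch, `validate("fz")` reports `NoVowel` rather than `BadInitial`, because the vowel rule runs first and short-circuits the chain. Ordering cheap, frequently failing rules first is what makes the fast-failure property pay off.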
See Also:
- Core Engine - 7-stage processing pipeline
- Vietnamese Language System - Language foundations
- System Architecture - High-level overview