Documentation
Overview
Package cefr provides CEFR (Common European Framework of Reference) difficulty level assessment for English text.
It uses a multi-feature fusion algorithm combining vocabulary analysis (50%), syntactic complexity (30%), and readability formulas (20%) to produce a CEFR level (A1–C2) and a continuous score (1.0–6.0) for any given English text.
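The weighted fusion described above can be sketched as follows. This is a minimal illustration and not the package's actual code: the function names `fuse` and `levelFor` are hypothetical, and the rule for turning a continuous score into a label (round to the nearest integer, then clamp) is an assumption, since the documentation does not state the cut-offs.

```go
package main

import (
	"fmt"
	"math"
)

// fuse combines three sub-scores (each on the 1.0 to 6.0 scale) with the
// default weights quoted in the overview: 50% vocabulary, 30% syntax,
// 20% readability. The name is hypothetical.
func fuse(vocab, syntax, readability float64) float64 {
	return 0.5*vocab + 0.3*syntax + 0.2*readability
}

// levelFor maps a continuous score to a CEFR label by rounding to the
// nearest integer and clamping to 1..6. ASSUMPTION: the real cut-offs
// used by the package are not documented.
func levelFor(score float64) string {
	labels := []string{"A1", "A2", "B1", "B2", "C1", "C2"}
	i := int(math.Round(score))
	if i < 1 {
		i = 1
	}
	if i > len(labels) {
		i = len(labels)
	}
	return labels[i-1]
}

func main() {
	score := fuse(4.0, 3.0, 2.0) // 2.0 + 0.9 + 0.4
	fmt.Printf("%.2f %s\n", score, levelFor(score))
}
```

With sub-scores of 4.0, 3.0 and 2.0 the fused score is about 3.3, which this sketch labels B1.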
All word frequency data (Oxford 5000, NGSL, AWL, irregular forms, etc.) is embedded at compile time via embed.FS, so the library has zero external dependencies and works fully offline.
Basic usage:
result, err := cefr.Assess("The cat sat on the mat.")
fmt.Println(result.Level, result.Score)
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Level
type Level int
Level represents a CEFR proficiency level as an integer.
CEFR level constants.
type Option
type Option func(*config)
Option configures the behavior of the Assess function.
func WithFullAnalysis
func WithFullAnalysis() Option
WithFullAnalysis forces full-text analysis instead of sampling for long texts.
func WithSamplingThreshold
WithSamplingThreshold sets a custom word-count threshold above which sampling is applied. n must be > 0; otherwise the option is ignored.
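The option behavior described above follows the functional options pattern. The sketch below is a standalone illustration under stated assumptions: the fields of `config`, the `int` parameter type of `WithSamplingThreshold`, the helper `newConfig`, and the default threshold of 1000 are all hypothetical, because the page does not show them.

```go
package main

import "fmt"

// config is a hypothetical stand-in for the package's unexported type.
type config struct {
	fullAnalysis      bool
	samplingThreshold int
}

// Option mutates a config, mirroring the documented signature.
type Option func(*config)

// WithFullAnalysis turns sampling off for long texts.
func WithFullAnalysis() Option {
	return func(c *config) { c.fullAnalysis = true }
}

// WithSamplingThreshold sets the word-count threshold. Values of n that
// are not positive leave the config untouched, as the docs describe.
// ASSUMPTION: the parameter type is int.
func WithSamplingThreshold(n int) Option {
	return func(c *config) {
		if n > 0 {
			c.samplingThreshold = n
		}
	}
}

// newConfig applies options in order on top of a placeholder default.
func newConfig(opts ...Option) config {
	c := config{samplingThreshold: 1000} // placeholder, not from the docs
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(WithSamplingThreshold(-5), WithFullAnalysis())
	fmt.Println(c.fullAnalysis, c.samplingThreshold) // true 1000
}
```

Ignoring invalid values inside the option keeps the call site free of error handling, at the cost of silent fallbacks that callers need to be aware of.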
func WithWeights
WithWeights sets custom weights for vocabulary, syntax and readability sub-scores. The three values must sum to 1.0 (tolerance ±0.001). If the constraint is not satisfied, the option is silently ignored and default weights are kept.
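The sum-to-one check described above can be sketched as follows. The `weights` type and the `applyWeights` helper are hypothetical names used only for illustration; the sketch checks the sum constraint alone, because that is the only rule the documentation states.

```go
package main

import (
	"fmt"
	"math"
)

// weights is a hypothetical container for the three sub-score weights.
type weights struct{ vocab, syntax, readability float64 }

// defaultWeights mirrors the 50/30/20 split quoted in the overview.
var defaultWeights = weights{0.5, 0.3, 0.2}

// applyWeights returns the new weights when they sum to 1.0 within a
// tolerance of 0.001; otherwise it keeps the current ones unchanged.
func applyWeights(current weights, v, s, r float64) weights {
	if math.Abs(v+s+r-1.0) > 0.001 {
		return current // constraint violated: silently ignored
	}
	return weights{v, s, r}
}

func main() {
	fmt.Println(applyWeights(defaultWeights, 0.4, 0.4, 0.2)) // accepted
	fmt.Println(applyWeights(defaultWeights, 0.5, 0.5, 0.5)) // rejected, defaults kept
}
```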
type ReadabilityResult
type ReadabilityResult struct {
    Score float64 // Readability sub-score (1.0–6.0)
    FKGL  float64 // Flesch-Kincaid Grade Level
    FRE   float64 // Flesch Reading Ease
    CLI   float64 // Coleman-Liau Index
}
ReadabilityResult holds readability formula outputs.
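For reference, the three formulas named in the struct can be computed from raw counts as in the sketch below. It uses the standard published coefficients; the `counts` type and function names are hypothetical, and the sketch does not cover how the package converts these values into the 1.0–6.0 sub-score, which the documentation does not describe.

```go
package main

import "fmt"

// counts holds raw text statistics. All fields must be non-zero where
// they are used as divisors (words and sentences).
type counts struct {
	letters, words, sentences, syllables float64
}

// fre computes Flesch Reading Ease (higher means easier).
func fre(c counts) float64 {
	return 206.835 - 1.015*(c.words/c.sentences) - 84.6*(c.syllables/c.words)
}

// fkgl computes the Flesch-Kincaid Grade Level.
func fkgl(c counts) float64 {
	return 0.39*(c.words/c.sentences) + 11.8*(c.syllables/c.words) - 15.59
}

// cli computes the Coleman-Liau Index from letters and sentences
// per 100 words.
func cli(c counts) float64 {
	l := c.letters / c.words * 100
	s := c.sentences / c.words * 100
	return 0.0588*l - 0.296*s - 15.8
}

func main() {
	c := counts{letters: 450, words: 100, sentences: 5, syllables: 150}
	fmt.Printf("FRE=%.1f FKGL=%.1f CLI=%.1f\n", fre(c), fkgl(c), cli(c))
}
```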
type Result
type Result struct {
    Level         string            // CEFR level label: "A1" through "C2"
    Score         float64           // Continuous score from 1.0 to 6.0
    Confidence    float64           // Confidence of the assessment from 0.0 to 1.0
    Vocab         VocabResult       // Vocabulary analysis details
    Syntax        SyntaxResult      // Syntactic complexity details
    Readability   ReadabilityResult // Readability formula details
    WordCount     int               // Total word count
    SentenceCount int               // Total sentence count
}
Result holds the complete CEFR assessment output for a text.
type SyntaxResult
type SyntaxResult struct {
    Score              float64 // Syntax sub-score (1.0–6.0)
    AvgSentenceLength  float64 // Average words per sentence
    SubordinationIndex float64 // Ratio of subordinate clauses
    PassiveRate        float64 // Ratio of passive voice constructions
    ConnectorDiversity int     // Number of distinct connector types used
}
SyntaxResult holds syntactic complexity metrics.
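Two of these metrics are simple enough to illustrate with a rough sketch. The code below is not the package's implementation: the sentence splitter is naive (it breaks on '.', '!' and '?'), and the `connectors` list is a tiny hypothetical sample rather than the package's real data.

```go
package main

import (
	"fmt"
	"strings"
)

// connectors is a small hypothetical sample of discourse connectors.
var connectors = map[string]bool{
	"because": true, "although": true, "however": true,
	"therefore": true, "while": true,
}

// splitSentences naively splits on terminal punctuation and drops
// empty fragments.
func splitSentences(text string) []string {
	parts := strings.FieldsFunc(text, func(r rune) bool {
		return r == '.' || r == '!' || r == '?'
	})
	var out []string
	for _, p := range parts {
		if strings.TrimSpace(p) != "" {
			out = append(out, p)
		}
	}
	return out
}

// avgSentenceLength returns the mean number of words per sentence.
func avgSentenceLength(text string) float64 {
	sentences := splitSentences(text)
	if len(sentences) == 0 {
		return 0
	}
	words := 0
	for _, s := range sentences {
		words += len(strings.Fields(s))
	}
	return float64(words) / float64(len(sentences))
}

// connectorDiversity counts distinct connectors that appear in the text.
func connectorDiversity(text string) int {
	seen := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		w = strings.Trim(w, ".,;:!?\"'()")
		if connectors[w] {
			seen[w] = true
		}
	}
	return len(seen)
}

func main() {
	text := "I stayed home because it rained. However, I was happy because I had a book."
	fmt.Println(avgSentenceLength(text), connectorDiversity(text)) // 7.5 2
}
```

Note that "because" occurs twice in the sample but counts once, since the metric tracks distinct connector types.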
type Token
type Token struct {
    Original   string // Original text (case-preserved)
    Lower      string // Lowercase form (for dictionary lookup)
    IsFirst    bool   // Whether this is a sentence-initial word
    IsProper   bool   // Whether this is a proper noun (skip for scoring)
    IsStopword bool   // Whether this is a stopword
    IsFiltered bool   // Whether this is filtered (number/punctuation/non-ASCII)
}
Token represents a processed word from the input text.
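To make the field semantics concrete, here is a rough tokenizer sketch that fills in four of the fields. It is an illustration under stated assumptions, not the package's tokenizer: the local `token` type, `tokenize`, and `isPlainWord` are hypothetical, whitespace splitting is a simplification, and proper-noun and stopword detection are left out because they depend on word lists not shown on this page.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// token mirrors a subset of the documented Token fields.
type token struct {
	Original   string
	Lower      string
	IsFirst    bool
	IsFiltered bool
}

// isPlainWord reports whether s consists only of ASCII letters,
// apostrophes and hyphens.
func isPlainWord(s string) bool {
	for _, r := range s {
		isLetter := r <= unicode.MaxASCII && unicode.IsLetter(r)
		if !isLetter && r != '\'' && r != '-' {
			return false
		}
	}
	return true
}

// tokenize splits on whitespace, trims surrounding punctuation, and
// marks sentence-initial and filtered words.
func tokenize(text string) []token {
	var out []token
	first := true
	for _, raw := range strings.Fields(text) {
		word := strings.TrimFunc(raw, unicode.IsPunct)
		if word != "" {
			out = append(out, token{
				Original:   word,
				Lower:      strings.ToLower(word),
				IsFirst:    first,
				IsFiltered: !isPlainWord(word),
			})
			first = false
		}
		// The next word starts a sentence if this one ended it.
		if strings.HasSuffix(raw, ".") || strings.HasSuffix(raw, "!") || strings.HasSuffix(raw, "?") {
			first = true
		}
	}
	return out
}

func main() {
	for _, t := range tokenize("The cat sat. It saw 3 mice!") {
		fmt.Printf("%+v\n", t)
	}
}
```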
type VocabResult
type VocabResult struct {
    Score        float64            // Vocabulary sub-score (1.0–6.0)
    Distribution map[string]float64 // CEFR level distribution of content words
    UnknownRatio float64            // Ratio of words not found in any word list
    ContentWords int                // Number of content words analyzed
}
VocabResult holds vocabulary analysis metrics.
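The sketch below shows one plausible way to derive these metrics from a word list. It rests on several assumptions: the four-entry `lexicon` is a hypothetical stand-in for the embedded word lists, `vocabStats` and `analyse` are illustrative names, and the distribution is normalized over all content words (known and unknown), which the documentation does not confirm.

```go
package main

import "fmt"

// lexicon is a tiny hypothetical word-to-level table.
var lexicon = map[string]string{
	"cat": "A1", "sat": "A1", "mat": "A2", "analyse": "B1",
}

// vocabStats mirrors the non-score fields of VocabResult.
type vocabStats struct {
	Distribution map[string]float64
	UnknownRatio float64
	ContentWords int
}

// analyse counts how many content words fall into each CEFR level and
// what share is missing from the lexicon. Input words are expected to
// be lowercase content words.
func analyse(words []string) vocabStats {
	stats := vocabStats{
		Distribution: map[string]float64{},
		ContentWords: len(words),
	}
	if len(words) == 0 {
		return stats
	}
	unknown := 0
	for _, w := range words {
		if level, ok := lexicon[w]; ok {
			stats.Distribution[level]++
		} else {
			unknown++
		}
	}
	total := float64(len(words))
	for level := range stats.Distribution {
		stats.Distribution[level] /= total // counts become ratios
	}
	stats.UnknownRatio = float64(unknown) / total
	return stats
}

func main() {
	s := analyse([]string{"cat", "sat", "mat", "zyx"})
	fmt.Println(s.Distribution["A1"], s.Distribution["A2"], s.UnknownRatio)
}
```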
Directories
| Path | Synopsis |
|---|---|
| | Package data holds embedded word frequency and linguistic reference data used by the cefr package for vocabulary lookup and analysis. |