Assessment Science

Built on Psychometric Best Practices

Our assessments are designed by language assessment experts, powered by advanced AI, and validated against the globally recognized CEFR.

Global Standard

CEFR-Aligned Scoring

The Common European Framework of Reference for Languages (CEFR) is the international standard for describing language ability. Our assessments map directly to CEFR levels A1 through C2, providing widely understood proficiency ratings.

A1-A2

Basic User

Can understand and use familiar everyday expressions

B1-B2

Independent User

Can handle most situations in the target language

C1-C2

Proficient User

Can express themselves fluently and precisely, with nuance

Sample Score Report

Detailed breakdown by skill dimension

Upper Intermediate (B2)

78/100

Speaking (20% each)

Pronunciation: 82

Fluency: 75

Grammar: 78

Vocabulary: 80

Coherence: 74

Writing (20% each)

Task Response: 79

Organization: 76

Vocabulary: 81

Grammar: 77

Conventions: 80
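A quick calculation shows that the sample numbers are internally consistent. The sketch below assumes each section score is the 20%-weighted sum of its five dimensions, and that the overall score is the plain average of the Speaking and Writing section scores, rounded to a whole number. The page does not say how the two sections are combined, so treat that step as an assumption. The function names are hypothetical.

```python
# Dimension scores copied from the sample report above.
SPEAKING = {"Pronunciation": 82, "Fluency": 75, "Grammar": 78,
            "Vocabulary": 80, "Coherence": 74}
WRITING = {"Task Response": 79, "Organization": 76, "Vocabulary": 81,
           "Grammar": 77, "Conventions": 80}

DIMENSION_WEIGHT = 0.20  # each of the five dimensions counts 20% within its section


def section_score(dimensions: dict[str, int]) -> float:
    """Weighted sum of the five dimension scores for one section."""
    return sum(score * DIMENSION_WEIGHT for score in dimensions.values())


def overall_score(speaking: dict[str, int], writing: dict[str, int]) -> int:
    """Assumption: overall = plain mean of the two section scores, rounded."""
    return round((section_score(speaking) + section_score(writing)) / 2)


if __name__ == "__main__":
    print(f"Speaking: {section_score(SPEAKING):.1f}")  # Speaking: 77.8
    print(f"Writing: {section_score(WRITING):.1f}")    # Writing: 78.6
    print(f"Overall: {overall_score(SPEAKING, WRITING)}/100")  # Overall: 78/100
```

With the sample scores, this yields 77.8 for Speaking and 78.6 for Writing. Their mean, 78.2, rounds to the 78/100 shown in the report.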

Test Format

Comprehensive Assessment Structure

Our assessments evaluate both productive skills, speaking and writing, through carefully designed sections that progress in complexity.

Speaking Assessment

~7 minutes • 4 questions • 3 sections

Warm-Up & Read Aloud

Self-introduction and passage reading to assess pronunciation

Topic Discussion

Extended responses on general topics, with 10 seconds of thinking time

Situational Response

Workplace scenarios requiring practical communication

Scoring Dimensions

Pronunciation

Fluency

Grammar

Vocabulary

Coherence

Writing Assessment

~15 minutes • 2 questions • 2 sections

Professional Email

5 minutes to compose a workplace email (50-100 words)

Extended Essay

10 minutes to write an opinion piece (150-250 words)
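Both writing tasks have explicit length targets. Below is a minimal sketch of a length check against those targets. It assumes words are counted by splitting on whitespace. The `WritingTask` type and the `check_length` helper are hypothetical, and the page does not say how out-of-range responses affect the score, so this version only reports the count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WritingTask:
    name: str
    minutes: int    # time limit for the task
    min_words: int  # lower bound of the target length
    max_words: int  # upper bound of the target length


# Limits taken from the task descriptions above.
EMAIL = WritingTask("Professional Email", minutes=5, min_words=50, max_words=100)
ESSAY = WritingTask("Extended Essay", minutes=10, min_words=150, max_words=250)


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return len(text.split())


def check_length(task: WritingTask, text: str) -> str:
    """Describe whether a response falls inside the task's word range."""
    n = word_count(text)
    if n < task.min_words:
        return f"{task.name}: {n} words, below the {task.min_words}-word minimum"
    if n > task.max_words:
        return f"{task.name}: {n} words, above the {task.max_words}-word maximum"
    return f"{task.name}: {n} words, within range"


if __name__ == "__main__":
    print(check_length(EMAIL, "word " * 60))  # Professional Email: 60 words, within range
```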

Scoring Dimensions

Task Response

Organization

Vocabulary

Grammar

Conventions

AI-Powered Analysis

Advanced Language Models with Guardrails

Our scoring engine combines state-of-the-art AI with rigorous quality controls to deliver consistent, fair, and accurate assessments.

Multi-Modal Analysis

Speaking and writing assessments use specialized AI models optimized for each modality, capturing nuances specific to verbal and written communication.

Native audio analysis (Google Gemini)
Advanced text understanding (OpenAI)
5 dimensions scored per skill

Security Guardrails

Our system includes multiple layers of protection against manipulation attempts, ensuring assessment integrity.

Prompt injection detection
Content sanitization
Score anomaly flagging
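The page does not describe how score anomaly flagging works. The following is a purely hypothetical sketch of one simple approach. It flags any dimension score that falls outside the 0 to 100 scale, or that sits far from the candidate's other dimensions in the same section. The 20-point threshold and the `flag_anomalies` name are assumptions.

```python
def flag_anomalies(scores: dict[str, float], max_gap: float = 20.0) -> list[str]:
    """Return the names of dimensions whose scores look anomalous (hypothetical rules)."""
    # Rule 1: any score outside the 0-100 scale is flagged outright.
    out_of_range = [name for name, s in scores.items() if not 0 <= s <= 100]
    valid = {name: s for name, s in scores.items() if name not in out_of_range}
    flagged = list(out_of_range)
    # Rule 2: flag a valid score that differs from the mean of the other
    # valid scores in the same section by more than max_gap points.
    for name, score in valid.items():
        others = [s for other, s in valid.items() if other != name]
        if others and abs(score - sum(others) / len(others)) > max_gap:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    print(flag_anomalies({"Pronunciation": 82, "Fluency": 75, "Grammar": 78,
                          "Vocabulary": 80, "Coherence": 74}))  # []
    print(flag_anomalies({"Task Response": 95, "Organization": 40, "Vocabulary": 42,
                          "Grammar": 41, "Conventions": 43}))  # ['Task Response']
```

In a setup like this, a flagged response could be routed to the human review described under Fairness & Validity.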

Structured Feedback

Every response receives detailed, actionable feedback with specific examples to help candidates improve.

“What went well” highlights
Specific improvement areas
Dimension-level reasoning
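The three feedback elements map naturally onto a small record type. The schema below is hypothetical and is not the platform's actual output format. Every class name, field name, and example string is an assumption chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DimensionFeedback:
    dimension: str  # e.g. "Organization"
    score: int      # 0-100 dimension score
    reasoning: str  # dimension-level reasoning behind the score


@dataclass
class ResponseFeedback:
    went_well: list[str] = field(default_factory=list)     # "What went well" highlights
    improvements: list[str] = field(default_factory=list)  # specific improvement areas
    dimensions: list[DimensionFeedback] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize to JSON; asdict() also converts the nested records."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    feedback = ResponseFeedback(
        went_well=["Clear opening that states the purpose of the email"],
        improvements=["Vary sentence openings to improve flow"],
        dimensions=[DimensionFeedback("Organization", 76, "Logical order; few transitions")],
    )
    print(feedback.to_json())
```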

Dynamic Question Bank

Expert-Crafted Prompt Library

Speaking Prompts

Writing Prompts

Prompt Types

Warm-Up

Topic Discussion

Situational

Read Aloud

Essay

Domains Covered

Workplace

Academic

Social

Travel

Technology

Question Design

Scenario-Based Prompts

Our question bank features carefully designed prompts that elicit authentic language use. Each prompt targets specific CEFR levels and assesses multiple competencies.

Randomized Selection

Each candidate receives a randomized combination of prompts, which discourages memorization and helps keep the assessment fair.
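Randomized selection can be sketched as drawing one prompt per prompt type from the bank. This sketch assumes one Speaking question per type: Warm-Up, Read Aloud, Topic Discussion, and Situational, which matches the four Speaking questions listed earlier. The prompt IDs are hypothetical. Independent random draws do not by themselves guarantee that two candidates never share a combination. The `count_forms` helper shows how many distinct combinations a given bank allows.

```python
import math
import random

# Hypothetical prompt bank: prompt IDs grouped by Speaking prompt type.
BANK: dict[str, list[str]] = {
    "Warm-Up": ["wu-01", "wu-02", "wu-03"],
    "Read Aloud": ["ra-01", "ra-02"],
    "Topic Discussion": ["td-01", "td-02", "td-03", "td-04"],
    "Situational": ["si-01", "si-02", "si-03"],
}


def draw_form(bank: dict[str, list[str]], rng: random.Random) -> dict[str, str]:
    """Pick one prompt per type, independently and uniformly at random."""
    return {ptype: rng.choice(prompts) for ptype, prompts in bank.items()}


def count_forms(bank: dict[str, list[str]]) -> int:
    """Number of distinct one-per-type combinations the bank can produce."""
    return math.prod(len(prompts) for prompts in bank.values())


if __name__ == "__main__":
    rng = random.Random(2024)  # hypothetical per-candidate seed, so a draw can be audited
    print(draw_form(BANK, rng))
    print(count_forms(BANK))  # 72
```

Seeding a separate `random.Random` per candidate makes each draw reproducible. That seeding scheme is also an assumption.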

Difficulty Calibration

Prompts are tagged with CEFR difficulty ranges and selected to match assessment goals.
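Difficulty calibration can be expressed as a range check: a prompt is eligible when the target CEFR level falls inside its tagged range. The minimal sketch below assumes each prompt carries a lowest and a highest CEFR tag. The `Prompt` type and the helper names are hypothetical.

```python
from dataclasses import dataclass

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # ordered lowest to highest


@dataclass(frozen=True)
class Prompt:
    prompt_id: str
    min_level: str  # lowest CEFR level the prompt is tagged for
    max_level: str  # highest CEFR level the prompt is tagged for


def covers(prompt: Prompt, target: str) -> bool:
    """True if the target level lies within the prompt's tagged CEFR range."""
    rank = CEFR_LEVELS.index
    return rank(prompt.min_level) <= rank(target) <= rank(prompt.max_level)


def eligible(prompts: list[Prompt], target: str) -> list[Prompt]:
    """Keep only prompts whose difficulty range includes the target level."""
    return [p for p in prompts if covers(p, target)]


if __name__ == "__main__":
    bank = [Prompt("td-01", "A2", "B1"), Prompt("td-02", "B1", "C1"), Prompt("td-03", "C1", "C2")]
    print([p.prompt_id for p in eligible(bank, "B2")])  # ['td-02']
```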

Skill Targeting

Each prompt is designed to elicit specific language skills, such as argumentation, formal register, or narrative ability.

Real-World Contexts

Questions simulate authentic workplace and social situations that candidates are likely to encounter.

Fairness & Validity

Commitment to Unbiased Assessment

We actively work to identify and mitigate bias in our assessments, ensuring fair evaluation regardless of accent, dialect, or background.

Accent-Neutral

Trained on diverse English accents from around the world

Content Focus

Evaluates language ability, not cultural knowledge

DIF Analysis

Statistical monitoring for differential item functioning
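The page does not name the DIF method in use. One widely used option for groups matched on overall score is the Mantel-Haenszel procedure. The sketch below assumes each item is scored right or wrong, for example whether or not a candidate meets a score threshold. Extensions for polytomous scores exist but are not shown. The function name and input layout are hypothetical.

```python
import math


def mantel_haenszel_d_dif(strata: list[tuple[int, int, int, int]]) -> float:
    """ETS MH D-DIF for one item scored right/wrong.

    Each stratum groups candidates with a similar overall score and holds a
    2x2 table: (ref_right, ref_wrong, focal_right, focal_wrong).
    Negative values mean the item favors the reference group.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("common odds ratio is undefined for these counts")
    alpha_mh = num / den  # Mantel-Haenszel common odds ratio across strata
    return -2.35 * math.log(alpha_mh)


if __name__ == "__main__":
    print(mantel_haenszel_d_dif([(30, 20, 30, 20), (40, 10, 40, 10)]))  # no DIF: ~0
    print(round(mantel_haenszel_d_dif([(40, 10, 25, 25)]), 2))          # -3.26
```

Values near zero indicate little DIF. The ETS A/B/C classification also requires a significance test, which this sketch omits.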

Human Review

Expert oversight for edge cases and quality assurance

Research-Backed

Continuous Validation

We continuously validate our assessments against human rater benchmarks and industry standards to ensure reliability and accuracy.

0.92

Inter-rater reliability target (Cohen's kappa)

0.85

Internal consistency target (Cronbach's alpha)

< 2%

AI-human deviation (average score difference)
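Each of the three validation metrics above has a standard definition. The minimal sketches below rest on three assumptions. Kappa is computed on categorical ratings such as CEFR levels. Alpha treats the five scoring dimensions as items. The AI-human deviation is the mean absolute score difference, expressed as a percentage of a 0 to 100 scale. The function names are hypothetical, and the page does not specify these computation details.

```python
from collections import Counter
from statistics import mean, variance


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical labels (e.g. CEFR levels)."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal proportions per category.
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; each inner list holds one item's scores across candidates."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(per_candidate) for per_candidate in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def ai_human_deviation(ai: list[float], human: list[float], scale_max: float = 100) -> float:
    """Mean absolute AI-human score difference as a percentage of the scale."""
    return mean(abs(a - h) for a, h in zip(ai, human)) / scale_max * 100


if __name__ == "__main__":
    print(cohens_kappa(["B1", "B2", "B2", "C1"], ["B1", "B2", "B1", "C1"]))  # 7/11, about 0.636
    print(ai_human_deviation([78, 80, 70], [77, 82, 70]))                    # 1.0
```

Because CEFR levels are ordered, a weighted kappa that gives partial credit for adjacent levels is often preferred. The unweighted version is shown here for brevity.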

See Research-Backed Assessment in Action

Try Evalingo free with 10 assessments — no credit card required. Experience how our AI-powered scoring delivers reliable, CEFR-aligned results.

Get Started Free