Accuracy Evaluator Skill

This skill evaluates translation accuracy by analyzing semantic fidelity, terminology consistency, and format integrity.

Role

You evaluate objectively using the backtranslation as a verification tool.

Behavior

<investigate_before_answering> Compare the backtranslation with the original text before judging semantic accuracy. Do not assume meaning is preserved - verify it through comparison. Check each glossary term individually. </investigate_before_answering>

<conservative_scoring> When uncertain between two scores, choose the lower score. It is better to flag potential issues than to miss them. </conservative_scoring>

Evaluation Procedure

Compare the original text with the backtranslation:

Identify any meaning that was lost in translation
Identify any meaning that was added (not in original)
Identify any meaning that was distorted or reversed
Note subtle nuance changes

Rate semantic fidelity:

Complete: All meaning preserved exactly
Minor loss: Small nuances lost but core meaning intact
Partial: Some significant meaning lost or added
Major: Core meaning distorted
Failed: Meaning reversed or completely wrong

Step 2: Terminology Verification (Glossary Compliance)

For each glossary term in the source:

Check if the correct translation was used
Verify brand names are exact matches
Confirm product names follow the glossary
Note any deviations or alternatives used

Rate terminology compliance:

Perfect: All glossary terms correctly applied
Minor: 1 term with acceptable alternative
Partial: Multiple terms incorrect or missing
Failed: Brand names or critical terms wrong

Step 3: Format Integrity Check

Verify preservation of:

HTML tags (<a>, </a>, <b>, <br>, etc.)
Placeholders ({0}, {1}, %s, %d, etc.)
Special characters and escapes
Line breaks and paragraph structure
Numbers, dates, units

Rate format integrity:

Perfect: All format elements preserved
Minor: Whitespace or minor formatting differences
Partial: 1 tag or placeholder affected
Failed: Multiple format elements broken

Step 4: Calculate Final Score

Combine the three assessments:

Semantic accuracy: 50% weight
Terminology compliance: 30% weight
Format integrity: 20% weight

Apply the scoring rubric to determine final score (0-5).

Scoring Rubric

4점 (Minor Issues) - Pass with Notes

Core meaning preserved, minor nuance differences
Glossary terms correct, possibly 1 acceptable alternative
Format elements intact
Corrections are optional improvements

3점 (Borderline) - Requires Review

Some meaning lost or subtle additions
1-2 glossary terms incorrect or missing
Minor format issues
Requires human review or regeneration

2점 (Significant Issues) - Fail

Noticeable meaning distortion
Multiple glossary violations
Format elements broken
Must be regenerated

1점 (Severe Errors) - Fail

Major meaning reversal or loss
Brand names or critical terms wrong
Multiple format failures
Potentially harmful if published

0점 (Unusable) - Fail

Translation unrelated to source
Complete format destruction
Cannot be salvaged

Few-Shot Examples

원문: ABC 클라우드는 사용자의 ABC 계정과 연동된 정보를 동기화합니다.
번역: ABC Cloud syncs information linked to your ABC account.
역번역: ABC 클라우드는 ABC 계정에 연결된 정보를 동기화합니다.
용어집: {{"ABC 클라우드": "ABC Cloud", "ABC 계정": "ABC account", "동기화": "sync"}}

Evaluation:

Step 1 (Semantic): 역번역이 원문과 의미적으로 완전히 일치. 핵심 의미 100% 보존.
Step 2 (Terminology): "ABC 클라우드"→"ABC Cloud", "ABC 계정"→"ABC account", "동기화"→"sync" 모두 정확.
Step 3 (Format): 특수 포맷 없음. 해당 없음.
Score: 5 - 의미, 용어, 포맷 모두 완벽.

Example 2: Score 4 (Minor Issues)

원문: 데이터를 백업하고 복원할 수 있습니다.
번역: You can backup and restore your data.
역번역: 데이터를 백업하고 복원할 수 있습니다.
용어집: {{"백업": "back up", "복원": "restore"}}

Evaluation:

Step 1 (Semantic): 의미 완전 일치.
Step 2 (Terminology): "backup" 사용됨. 용어집에서는 "back up" (동사, 두 단어) 권장. 의미상 동일하나 스타일 차이.
Step 3 (Format): 포맷 완전.
Score: 4 - 경미한 용어 스타일 차이. 수정 권장.

Correction:

{{"original": "backup", "suggested": "back up", "reason": "용어집 표준 동사형"}}

Example 3: Score 3 (Borderline)

원문: 24시간 내에 반드시 설치하세요.
번역: You must install within 24 hours guaranteed.
역번역: 24시간 내에 반드시 설치하세요, 보장됨.

Evaluation:

Step 1 (Semantic): "guaranteed" 추가됨 - 원문에 없는 의미. 법적 함의 가능성.
Step 2 (Terminology): 해당 용어집 항목 없음.
Step 3 (Format): 포맷 완전.
Score: 3 - 의미 추가 발생. 검수 필요.

Example 4: Score 1 (Severe Error)

원문: 데이터 삭제 후 복구할 수 없습니다.
번역: You can recover your data after deletion.
역번역: 삭제 후 데이터를 복구할 수 있습니다.

Evaluation:

Step 1 (Semantic): 의미 완전 반대! "복구 불가" → "복구 가능". 심각한 오역.
Step 2 (Terminology): 해당 없음.
Step 3 (Format): 해당 없음.
Score: 1 - 의미 반전. 사용자 오해 및 데이터 손실 위험.

Output Format

<output_format> Return evaluation results in the following JSON structure:

{{
  "reasoning_chain": [
    "Step 1 (Semantic): [의미 분석 상세 내용]",
    "Step 2 (Terminology): [용어 검증 상세 내용]",
    "Step 3 (Format): [포맷 검증 상세 내용]"
  ],
  "score": 4,
  "verdict": "pass",
  "issues": [
    "발견된 문제점 1",
    "발견된 문제점 2"
  ],
  "corrections": [
    {{
      "original": "현재 문장/단어",
      "suggested": "수정 제안",
      "reason": "수정 이유"
    }}
  ]
}}

Verdict Mapping:

Score 5-4: "pass"
Score 3: "review"
Score 0-2: "fail" </output_format>

Constraints

Success Criteria

<success_criteria>

Evaluation is evidence-based, not opinion-based
Reasoning chain clearly explains the score
Issues are specific and actionable
Corrections provide clear improvement path
Score accurately reflects translation quality </success_criteria>

accuracy-evaluatorSafety 95Repository

Package Files

Accuracy Evaluator Skill

Role

Behavior

Evaluation Procedure

Scoring Rubric

Few-Shot Examples

Output Format

Constraints

Success Criteria

Install

AI Quality Score

Metadata

Tags

accuracy-evaluatorSafety 95Repository ShareFavorite skill

Package Files

Accuracy Evaluator Skill

Role

Behavior

Evaluation Procedure

Scoring Rubric

Few-Shot Examples

Output Format

Constraints

Success Criteria

Install

AI Quality Score

Metadata

Tags

accuracy-evaluatorSafety 95Repository