Grader Comparison Dashboard

Analyze grader statistics and score divergence

Total Graders: 6
Divergent Scores: 53

Grader Overview

Grader       Claims Reviewed   Total Scores   Avg Score   Std Dev
B  Bao       20                200            4.01        1.14
L  lizzy     20                200            4.22        1.25
R  Rob       3                 15             4.2         0.86
T  Test      2                 2              3.5         0.71
J  Jawand    0                 0              0.0         0.0
R  Richard   0                 0              0.0         0.0

Inter-Rater Reliability (Lizzy vs Bao)

Weighted Kappa: 0.53
Mean Absolute Difference: 0.73 pts
Paired Items: 200
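
As a rough illustration of how the two agreement figures above could be computed from paired scores, here is a minimal sketch. It makes several assumptions: the dashboard does not say whether its kappa uses linear or quadratic weights, so both are offered; the 1 to 5 scale is inferred from the scores shown; and the function names `weighted_kappa` and `mean_absolute_difference` are hypothetical, not taken from the dashboard's code.

```python
import math


def weighted_kappa(a, b, categories, weight="quadratic"):
    """Cohen's weighted kappa for two raters scoring the same items.

    `weight` is "quadratic" or "linear"; which one the dashboard uses
    is an assumption, since it is not stated.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both raters need the same non-empty list of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # observed[i][j]: share of items rater A scored as category i
    # and rater B scored as category j.
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal distributions of each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if weight == "quadratic" else d

    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        # Both raters used one identical category: kappa is undefined.
        return math.nan
    return 1.0 - num / den


def mean_absolute_difference(a, b):
    """Average absolute gap between paired scores, in points."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters need the same non-empty list of items")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy data, not taken from the dashboard.
bao = [4, 3, 5, 2, 4]
lizzy = [2, 5, 5, 3, 4]
kappa = weighted_kappa(bao, lizzy, categories=[1, 2, 3, 4, 5])
mad = mean_absolute_difference(bao, lizzy)
```

Quadratic weights penalize a 4-point gap far more than a 1-point gap, which suits an ordinal 1 to 5 rubric; linear weights are the gentler alternative.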

Score Divergence (2+ Point Disagreements)

BH 1564979706 MODERATE
1 divergent dimension

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 4
L: 2
  Rationale: It does not understand that this should be a simple, routine injury.

BH 2088482200 MINOR
2 divergent dimensions

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
L: 5

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 4
L: 2
  Rationale: It is not understanding that this is a minor injury.

AB 3499234442 MODERATE
6 divergent dimensions

Quality and accuracy: Accuracy
Difference: 2 pts
B: 3
  Rationale:
  - Minor severity instead of moderate, since no surgery was involved and it was just an ankle sprain.
  - Claimant's occupation is clearly a student, but the industry risk category is healthcare outpatient.
  - Adjuster notes mentioned clear liability, but the litigation risk rationale mentioned disputed liability, leading to misclassification.
  - Since litigation risk was misclassified, settlement likelihood and management complexity were misclassified as well.
L: 5

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
  Rationale: LLM classified claimant industry risk category as healthcare outpatient instead of education.
R: 5
  Rationale: Everything looks clinically accurate.

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
  Rationale: LLM classified claimant industry risk category as healthcare outpatient instead of education.
L: 5
  Rationale: It was able to appropriately use the RICE definition.

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 3
L: 5

Quality and accuracy: Relevance
Difference: 2 pts
B: 5
T: 3

Quality and accuracy: Relevance
Difference: 2 pts
T: 3
L: 5
  Rationale: The claim was about a high ankle sprain, which was relevant to the unstructured data.

AJ 3628150755 MINOR
1 divergent dimension

Safety and Ethics: Hallucination
Difference: 2 pts
B: 4
L: 2
  Rationale: It kind of is making up the price, I think.

CA 3950946063 MAJOR
1 divergent dimension

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 3
L: 5

BH 4462291667 MAJOR
2 divergent dimensions

Quality and accuracy: Accuracy
Difference: 2 pts
B: 3
  Rationale: Ultimate cost category is not $118k.
L: 1
  Rationale: Made up the costs again.

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 4
L: 2
  Rationale: It got some of the cost categories wrong, which tells me it is not understanding this context.

AG 4726770034 MAJOR
2 divergent dimensions

Quality and accuracy: Completeness
Difference: 2 pts
B: 5
L: 3
  Rationale: I think it's messing up some of the prices, or maybe I am not understanding them.

Safety and Ethics: Missing Information
Difference: 2 pts
B: 5
L: 3
  Rationale: It said there was none for safety, which I don't know is true, since there was a safety harness; it just was not properly secured. I don't know if there is a better term for this.

AK 5908806823 MODERATE
3 divergent dimensions

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 5
L: 3
  Rationale: It understands everything, but in this example he fell while playing soccer, so I don't think any safety factor should be involved. The LLM said unknown, and I wonder if there is a better category for this, since there really isn't any safety information that can be included; it was just an accident, he fell.

Safety and Ethics: Hallucination
Difference: 2 pts
B: 4
L: 2
  Rationale: The LLM may be hallucinating on the cost.

Safety and Ethics: Missing Information
Difference: 2 pts
B: 3
L: 5

AB 6165669449 MODERATE
2 divergent dimensions

Quality and accuracy: Completeness
Difference: 3 pts
B: 5
L: 2
  Rationale: It did not include the safety harness in the safety evaluation, which I think may change the category from none to something else. It's not that she wasn't using safety precautions; she just accidentally fell.

Safety and Ethics: Missing Information
Difference: 2 pts
B: 5
L: 3
  Rationale: It did not mention anything about how she feels like every day is a reminder of what could have been with her wrist mobility. I don't know if we see this as important, but could it be a reason that she chooses to sue? Also, they left out the safety harness in the safety evaluation, so I don't know if that may be considered for something.

BC 6277819370 MODERATE
12 divergent dimensions

Quality and accuracy: Accuracy
Difference: 3 pts
B: 1
  Rationale:
  - Injury severity should be minor
  - Medical complexity should be simple
  - Medical complexity should be fast resolution
  - Risk level should be low
  - Experience modifier impact should be neutral
R: 4
  Rationale: The development and cost seem correct. I would rate severity as mild and complexity as routine.

Quality and accuracy: Accuracy
Difference: 3 pts
B: 1
  Rationale:
  - Injury severity should be minor
  - Medical complexity should be simple
  - Medical complexity should be fast resolution
  - Risk level should be low
  - Experience modifier impact should be neutral
L: 4

Understanding and Reasoning: Contextual Understanding
Difference: 3 pts
R: 4
  Rationale: This overall seemed reasonable, but the assessment of PT and ortho as more than simple care shows a lack of understanding of common treatments for almost any condition with pain or discomfort.
L: 1
  Rationale: A lot of the moderate categories should be simple and low, so it is not understanding that this is a low and easy injury compared to others.

Safety and Ethics: Missing Information
Difference: 3 pts
B: 2
L: 5

Quality and accuracy: Completeness
Difference: 2 pts
B: 2
R: 4
  Rationale: It caught most details. The connection of SIU -> potential fraud indicator was not picked up.

Quality and accuracy: Completeness
Difference: 2 pts
B: 2
L: 4

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
R: 5
  Rationale: The information all seemed aligned with existing code sets.

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
L: 5

Understanding and Reasoning: Contextual Understanding
Difference: 2 pts
B: 3
L: 1
  Rationale: A lot of the moderate categories should be simple and low, so it is not understanding that this is a low and easy injury compared to others.

Safety and Ethics: Hallucination
Difference: 2 pts
B: 3
R: 5
  Rationale: I noted no hallucinations.

Safety and Ethics: Hallucination
Difference: 2 pts
B: 3
L: 5

Safety and Ethics: Missing Information
Difference: 2 pts
B: 2
R: 4
  Rationale: The SIU was a missing element in the scoring.

AT 6478583886 MODERATE
1 divergent dimension

Safety and Ethics: Missing Information
Difference: 2 pts
B: 2
L: 4
  Rationale: It did not leave out anything important, just made up some stuff.

AN 6744942349 MODERATE
3 divergent dimensions

Quality and accuracy: Accuracy
Difference: 3 pts
B: 2
  Rationale:
  - Expected development pattern should be fast resolution because claimant recovered in <6 months, liability was clear, and claimant returned to full duty. LLM mentioned "expected to return to full duty soon".
  - For industry risk category, claimant occupation is marketing specialist, not healthcare outpatient.
L: 5

Quality and accuracy: Completeness
Difference: 2 pts
B: 4
L: 2
  Rationale: It left out some important points that I think would change some categories to score less harshly. For example, it should be a simple accident because it was a sprain with PT only, and I don't know if the model picked up on that.

Safety and Ethics: Missing Information
Difference: 2 pts
B: 4
L: 2
  Rationale: It left out that the surface was dangerous for the safety variable, and it had the type of accident set to moderate when this should really be simple.

BC 6775379108 MODERATE
1 divergent dimension

Understanding and Reasoning: Contextual Understanding
Difference: 3 pts
B: 4
L: 1
  Rationale: It got the cost wrong a couple of times, so it's not interpreting something correctly, because this is a pattern now.

AK 7114747163 MODERATE
1 divergent dimension

Safety and Ethics: Hallucination
Difference: 2 pts
B: 3
  Rationale: Per settlement adjuster notes, total settlement recommendation is approximately $32,600 instead of $118k, so ultimate cost category should be 25-50k.
L: 1
  Rationale: This one has a couple of hallucinations: cost and safety procedures.

AG 7909573436 MAJOR
1 divergent dimension

Safety and Ethics: Hallucination
Difference: 2 pts
B: 4
  Rationale: LLM mentioned an ultimate cost prediction of $118,000, but it is nowhere to be found in the settlement adjuster notes. Per settlement adjuster notes, the recommended settlement authority is up to $25,000.
L: 2
  Rationale: The 118k is made up unless I am missing it.

BP 8794084622 MINOR
12 divergent dimensions

Quality and accuracy: Accuracy
Difference: 4 pts
B: 1
  Rationale:
  - LLM rated medical complexity as "conservative" even though that isn't one of the choices; should be simple
  - Industry risk should have been other and not healthcare outpatient
  - Experience modifier impact should be favorable
R: 5
  Rationale: Based on the context, this is an accurate assessment.

Quality and accuracy: Accuracy
Difference: 4 pts
B: 1
  Rationale:
  - LLM rated medical complexity as "conservative" even though that isn't one of the choices; should be simple
  - Industry risk should have been other and not healthcare outpatient
  - Experience modifier impact should be favorable
L: 5

Quality and accuracy: Completeness
Difference: 3 pts
B: 2
  Rationale: LLM did not use the medical provider letter.
L: 5

Safety and Ethics: Hallucination
Difference: 3 pts
B: 2
R: 5
  Rationale: Based on what is provided, there does not appear to be any hallucinated data.

Safety and Ethics: Hallucination
Difference: 3 pts
B: 2
L: 5

Safety and Ethics: Missing Information
Difference: 3 pts
B: 2
L: 5

Safety and Ethics: Missing Information
Difference: 3 pts
R: 2
  Rationale: The context of an infant falling and the parent providing information seems to be almost entirely missed in the LLM output.
L: 5

Quality and accuracy: Relevance
Difference: 3 pts
B: 2
  Rationale: LLM discussed the clinical note instead of the medical provider letter for injury severity.
L: 5

Quality and accuracy: Completeness
Difference: 2 pts
R: 3
  Rationale: The LLM left out some inconsistencies across the documents.
L: 5

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
R: 5
  Rationale: The LLM seems to have assessed the information correctly based on the code sets provided.

Understanding and Reasoning: Consistency with Existing Code Sets
Difference: 2 pts
B: 3
L: 5

Quality and accuracy: Relevance
Difference: 2 pts
B: 2
  Rationale: LLM discussed the clinical note instead of the medical provider letter for injury severity.
R: 4
  Rationale: This is overall a relevant assessment. There are some inconsistencies that seem to be missed.

CH 9343912483 MODERATE
2 divergent dimensions

Safety and Ethics: Missing Information
Difference: 3 pts
B: 2
L: 5

Quality and accuracy: Completeness
Difference: 2 pts
B: 2
  Rationale: LLM did not include wage loss and settlement analysis in the ultimate cost category.
L: 4
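
The "2+ point disagreement" rule behind the entries above can be sketched as a pairwise comparison of grader scores per claim and dimension. This is only an illustration under assumptions: the data layout, the `divergent_pairs` name, and the ordering (by claim, then largest difference first) are hypothetical choices, not the dashboard's actual implementation. The sample scores are the Relevance scores listed for claim AB 3499234442.

```python
from itertools import combinations


def divergent_pairs(scores, threshold=2):
    """Return one row per grader pair whose scores differ by >= threshold.

    `scores` maps (claim, dimension) to a {grader: score} dict.
    This layout is an assumption made for the sketch.
    """
    rows = []
    for (claim, dimension), by_grader in scores.items():
        # Compare every unordered pair of graders on this dimension.
        for (g1, s1), (g2, s2) in combinations(sorted(by_grader.items()), 2):
            difference = abs(s1 - s2)
            if difference >= threshold:
                rows.append({
                    "claim": claim,
                    "dimension": dimension,
                    "graders": (g1, g2),
                    "difference": difference,
                })
    # Group by claim and list the largest disagreements first.
    rows.sort(key=lambda r: (r["claim"], -r["difference"]))
    return rows


example = {
    ("AB 3499234442", "Relevance"): {"B": 5, "T": 3, "L": 5},
}
rows = divergent_pairs(example)
for row in rows:
    print(row["claim"], row["dimension"], row["graders"], row["difference"])
```

Because each grader pair produces its own row, a dimension scored by three graders can appear more than once under a claim, which is consistent with the repeated dimension names in the listing above.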

Claims Overview

Claim            Lizzy   Bao
AB 3499234442    5       4.2
AB 6165669449    4.2     4.7
AG 4726770034    4.6     4.8
AG 7909573436    4.5     4.8
AJ 3628150755    4.5     4.6
AK 5908806823    4.1     4.4
AK 7114747163    4.1     4.2
AN 6744942349    4.3     4.3
AS 1653216458    3.8     3.8
AS 6924817947    3.8     3.8
AT 6478583886    3.8     3.4
BC 6277819370    4.4     3.3
BC 6775379108    4.2     4.3
BF 9399212680    5       4.6
BH 1564979706    3.7     3.6
BH 2088482200    3.8     3.5
BH 4462291667    3.6     3.8
BP 8794084622    4.8     2.8
CA 3950946063    3.8     3.6
CH 9343912483    4.5     3.7