edu.illinois.cs.cogcomp.lbj.coref.scorers
Class BCubedScorer

java.lang.Object
  extended by edu.illinois.cs.cogcomp.lbj.coref.scorers.Scorer<ChainSolution<T>>
      extended by edu.illinois.cs.cogcomp.lbj.coref.scorers.ChainScorer<Mention>
          extended by edu.illinois.cs.cogcomp.lbj.coref.scorers.BCubedBase
              extended by edu.illinois.cs.cogcomp.lbj.coref.scorers.BCubedUniformPerMentionBase
                  extended by edu.illinois.cs.cogcomp.lbj.coref.scorers.BCubedScorer

public class BCubedScorer
extends BCubedUniformPerMentionBase

Computes the within-document B-Cubed F-Score for a collection of documents.

The precision of a collection of documents is the average of the precisions of all mentions in all documents. The precision of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the predicted cluster containing m.

The recall of a collection of documents is the average of the recalls of all mentions in all documents. The recall of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the true cluster containing m.

The B-Cubed F-Score is the harmonic mean of the precision and recall defined above. It is weighted so that the precision and recall of every mention get equal weight.

This is the algorithm that Culotta says he used in Culotta, Wick, and McCallum (HLT 2007), modified to accept predicted solutions that contain different mentions from the key solutions: the overlap is counted as 0 for any mention that is not contained in both.
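The per-mention averaging described above can be sketched in a few lines of standalone Java. This is a minimal illustration, not the source of this class: the class name BCubedSketch, its helper methods, and the representation of a solution as a list of string sets are all hypothetical. It also assumes that precision is averaged over the mentions of the predicted solutions and recall over the mentions of the key solutions, a detail this page does not state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not part of the documented package.
class BCubedSketch {

    /** Builds one cluster from mention identifiers. */
    static Set<String> cluster(String... mentions) {
        return new HashSet<String>(Arrays.asList(mentions));
    }

    /** Maps each mention to the cluster that contains it. */
    static Map<String, Set<String>> index(List<Set<String>> clusters) {
        Map<String, Set<String>> byMention = new HashMap<String, Set<String>>();
        for (Set<String> c : clusters) {
            for (String m : c) {
                byMention.put(m, c);
            }
        }
        return byMention;
    }

    /** Intersection size; 0 when the mention is missing from one side. */
    static int overlap(Set<String> a, Set<String> b) {
        if (a == null || b == null) {
            return 0;
        }
        Set<String> common = new HashSet<String>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Returns {precision, recall, F}, pooled over all documents. */
    static double[] score(List<List<Set<String>>> keys,
                          List<List<Set<String>>> preds) {
        double pSum = 0.0, rSum = 0.0;
        int pCount = 0, rCount = 0;
        for (int d = 0; d < keys.size(); d++) {
            Map<String, Set<String>> key = index(keys.get(d));
            Map<String, Set<String>> pred = index(preds.get(d));
            // Assumption: precision is averaged over predicted mentions.
            for (Map.Entry<String, Set<String>> e : pred.entrySet()) {
                int common = overlap(key.get(e.getKey()), e.getValue());
                pSum += (double) common / e.getValue().size();
                pCount++;
            }
            // Assumption: recall is averaged over key mentions.
            for (Map.Entry<String, Set<String>> e : key.entrySet()) {
                int common = overlap(e.getValue(), pred.get(e.getKey()));
                rSum += (double) common / e.getValue().size();
                rCount++;
            }
        }
        double p = pCount == 0 ? 0.0 : pSum / pCount;
        double r = rCount == 0 ? 0.0 : rSum / rCount;
        double f = (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
        return new double[] {p, r, f};
    }

    public static void main(String[] args) {
        // One document: key {a,b,c},{d}; prediction {a,b},{c,d}.
        List<Set<String>> key = Arrays.asList(cluster("a", "b", "c"), cluster("d"));
        List<Set<String>> pred = Arrays.asList(cluster("a", "b"), cluster("c", "d"));
        List<List<Set<String>>> keys = Arrays.asList(key);
        List<List<Set<String>>> preds = Arrays.asList(pred);
        double[] s = score(keys, preds);
        // prints "P=0.7500 R=0.6667 F=0.7059"
        System.out.printf(Locale.ROOT, "P=%.4f R=%.4f F=%.4f%n", s[0], s[1], s[2]);
    }
}
```

Pooling the sums across documents before dividing is what gives every mention equal weight, regardless of how many mentions its document contains.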

See (Amit) Bagga and Baldwin (MUC-7 1998).

Author:
Eric Bengtson

Nested Class Summary
(package private) static interface BCubedScorer.MentionTypeTranslator
           
 
Constructor Summary
BCubedScorer()
          Default Constructor.
 
Method Summary
private  double[] calcPR(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> predictions)
           
private  java.util.Map<java.lang.String,double[]> calcPR(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> predictions, BCubedScorer.MentionTypeTranslator f)
          Computes the within-document B-Cubed precision and recall for a collection of documents.
 java.util.Map<java.lang.String,double[]> calcPRByType(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> predictions)
           
 double getPrecision(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> preds)
          Computes the within-document B-Cubed precision for a collection of documents.
 double getRecall(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> preds)
          Computes the within-document B-Cubed recall for a collection of documents.
 Score getScore(java.util.List<ChainSolution<Mention>> keys, java.util.List<ChainSolution<Mention>> preds)
          Computes the within-document B-Cubed F-Score for a collection of documents.
 
Methods inherited from class edu.illinois.cs.cogcomp.lbj.coref.scorers.BCubedBase
getPartition, getPrecision, getRecall, getScore, haveSameMembers
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BCubedScorer

public BCubedScorer()
Default Constructor.

Method Detail

getScore

public Score getScore(java.util.List<ChainSolution<Mention>> keys,
                      java.util.List<ChainSolution<Mention>> preds)
Computes the within-document B-Cubed F-Score for a collection of documents.

The precision of a collection of documents is the average of the precisions of all mentions in all documents. The precision of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the predicted cluster containing m.

The recall of a collection of documents is the average of the recalls of all mentions in all documents. The recall of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the true cluster containing m.

The B-Cubed F-Score is the harmonic mean of the precision and recall defined above. It is weighted so that the precision and recall of every mention get equal weight.

This is the algorithm that Culotta says he used in Culotta, Wick, and McCallum (HLT 2007), modified to accept predicted solutions that contain different mentions from the key solutions: the overlap is counted as 0 for any mention that is not contained in both.

Specified by:
getScore in class BCubedUniformPerMentionBase
Parameters:
keys - A collection of true (gold standard) solutions (for example, one per document)
preds - A collection of predicted solutions (for example, one per document)
Returns:
The within-document B-Cubed F-Score.
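The definitions in the description of getScore can be written compactly as follows. The notation is an assumption introduced here for illustration: M is the set of all mentions in all documents, K(m) is the true (key) cluster containing m, and C(m) is the predicted cluster containing m. For simplicity the formulas treat the key and predicted solutions as covering the same mentions; a mention that is not contained in both contributes an overlap of 0.

```latex
\[
P = \frac{1}{|M|} \sum_{m \in M} \frac{|K(m) \cap C(m)|}{|C(m)|},
\qquad
R = \frac{1}{|M|} \sum_{m \in M} \frac{|K(m) \cap C(m)|}{|K(m)|},
\qquad
F = \frac{2PR}{P + R}
\]
```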

getPrecision

public double getPrecision(java.util.List<ChainSolution<Mention>> keys,
                           java.util.List<ChainSolution<Mention>> preds)
Computes the within-document B-Cubed precision for a collection of documents.

The precision of a collection of documents is the average of the precisions of all mentions in all documents. The precision of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the predicted cluster containing m.

Specified by:
getPrecision in class BCubedUniformPerMentionBase
Parameters:
keys - A collection of true (gold standard) solutions (for example, one per document)
preds - A collection of predicted solutions (for example, one per document)
Returns:
The within-document B-Cubed precision.
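As a worked illustration with made-up data, take a single document whose key clusters are {a, b, c} and {d}, and whose predicted clusters are {a, b} and {c, d}. Writing p(m) for the precision of mention m (a symbol introduced here for illustration), each numerator is the size of the intersection of the true and predicted clusters of m, and each denominator is the size of the predicted cluster:

```latex
\[
p(a) = \tfrac{2}{2}, \quad
p(b) = \tfrac{2}{2}, \quad
p(c) = \tfrac{1}{2}, \quad
p(d) = \tfrac{1}{2},
\qquad
P = \frac{1 + 1 + \tfrac{1}{2} + \tfrac{1}{2}}{4} = \frac{3}{4}
\]
```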

getRecall

public double getRecall(java.util.List<ChainSolution<Mention>> keys,
                        java.util.List<ChainSolution<Mention>> preds)
Computes the within-document B-Cubed recall for a collection of documents.

The recall of a collection of documents is the average of the recalls of all mentions in all documents. The recall of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the true cluster containing m.

Specified by:
getRecall in class BCubedUniformPerMentionBase
Parameters:
keys - A collection of true (gold standard) solutions (for example, one per document)
preds - A collection of predicted solutions (for example, one per document)
Returns:
The within-document B-Cubed recall.
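As a worked illustration with made-up data, take a single document whose key clusters are {a, b, c} and {d}, and whose predicted clusters are {a, b} and {c, d}. Writing r(m) for the recall of mention m (a symbol introduced here for illustration), each numerator is the size of the intersection of the true and predicted clusters of m, and each denominator is the size of the true cluster:

```latex
\[
r(a) = \tfrac{2}{3}, \quad
r(b) = \tfrac{2}{3}, \quad
r(c) = \tfrac{1}{3}, \quad
r(d) = \tfrac{1}{1},
\qquad
R = \frac{\tfrac{2}{3} + \tfrac{2}{3} + \tfrac{1}{3} + 1}{4} = \frac{2}{3}
\]
```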

calcPR

private java.util.Map<java.lang.String,double[]> calcPR(java.util.List<ChainSolution<Mention>> keys,
                                                        java.util.List<ChainSolution<Mention>> predictions,
                                                        BCubedScorer.MentionTypeTranslator f)
Computes the within-document B-Cubed precision and recall for a collection of documents.

The precision of a collection of documents is the average of the precisions of all mentions in all documents. The precision of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the predicted cluster containing m.

The recall of a collection of documents is the average of the recalls of all mentions in all documents. The recall of a mention m is calculated as the number of mentions correctly predicted to be in the same cluster as m (including m) divided by the number of mentions in the true cluster containing m.

Parameters:
keys - A collection of true (gold standard) solutions (for example, one per document)
predictions - A collection of predicted solutions (for example, one per document)
Returns:
A map whose values are arrays containing the precision and the recall, in that order.

calcPR

private double[] calcPR(java.util.List<ChainSolution<Mention>> keys,
                        java.util.List<ChainSolution<Mention>> predictions)

calcPRByType

public java.util.Map<java.lang.String,double[]> calcPRByType(java.util.List<ChainSolution<Mention>> keys,
                                                             java.util.List<ChainSolution<Mention>> predictions)