footix.metrics package

Submodules

footix.metrics.confidence module

Confidence metrics derived from Bayesian posterior 1X2 samples.

This module provides utilities to convert posterior samples of match outcome probabilities into a single confidence score in [0, 100].

class footix.metrics.confidence.ConfidenceComponents(confidence, sharpness, disagreement)[source]

Bases: NamedTuple

Decomposed confidence metrics for a 1X2 prediction.

Parameters:

confidence (float)
sharpness (float)
disagreement (float)

confidence

Final confidence score in [0, 100].

Type: float

sharpness

Sharpness score in [0, 1] derived from normalized entropy.

Type: float

disagreement

Posterior disagreement score in [0, 1] derived from mutual information.

Type: float

confidence: float: Alias for field number 0

sharpness: float: Alias for field number 1

disagreement: float: Alias for field number 2

footix.metrics.confidence.confidence_curve(confidence, gamma=0.7)[source]

Rescale confidence with a monotone power curve.

This helper is intended for readability in user interfaces while preserving the match ranking induced by the raw confidence score.

The mapping is: c' = 100 * (clip(c, 0, 100) / 100) ** gamma.

Parameters:

confidence (float) – Raw confidence score.
gamma (float) – Positive exponent. Values below 1.0 raise mid-range scores; values above 1.0 lower them. The endpoints 0 and 100 are unchanged.

Returns:

Rescaled confidence in [0, 100].

Raises:

ValueError – If gamma is not strictly positive.

Return type:

float
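The mapping above can be sketched in a few lines of plain Python. This is an illustrative re-implementation of the documented formula only, not the library's source; the name confidence_curve_sketch is hypothetical.

```python
def confidence_curve_sketch(confidence: float, gamma: float = 0.7) -> float:
    """Illustrative version of c' = 100 * (clip(c, 0, 100) / 100) ** gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be strictly positive")
    # Clip the raw score into [0, 100] before applying the power curve.
    clipped = min(max(confidence, 0.0), 100.0)
    return 100.0 * (clipped / 100.0) ** gamma


lifted = confidence_curve_sketch(50.0)        # gamma < 1 raises a mid-range score
lowered = confidence_curve_sketch(50.0, 2.0)  # gamma > 1 lowers it (here to 25.0)
```

Because the power function is strictly increasing on [0, 1] for any positive gamma, two matches keep their relative order after rescaling.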

footix.metrics.confidence.confidence_1x2_from_samples_array(p_samples, eps=1e-12)[source]

Compute confidence from posterior 1X2 probability samples.

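The score combines:

- Sharpness: 1 - H(mean_p) / log(3)
- Posterior disagreement: MI / log(3), where MI = H(mean_p) - E[H(p_s)]

Here H is the Shannon entropy, mean_p is the mean of the posterior samples, and p_s is a single posterior sample.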

Final score: confidence = clip(100 * 4.5 * sharpness * (1 - disagreement), 0, 100).

The 4.5 factor is an empirical stretch used to spread mid-range raw sharpness values into a more readable 0-100 confidence scale before clipping.

Parameters:

p_samples (ndarray[tuple[Any, ...], dtype[floating]]) – Array with shape (n_samples, 3) containing posterior samples of [p_home, p_draw, p_away].
eps (float) – Numerical stability constant used for clipping.

Returns:

ConfidenceComponents with confidence in [0, 100].

Raises:

ValueError – If the input shape is invalid or no samples are provided.

Return type:

ConfidenceComponents
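The formulas above can be traced step by step with the standard library. The sketch below is an assumption-laden illustration, not the library's implementation: the names entropy and confidence_sketch are hypothetical, samples are passed as a list of 3-element lists rather than an ndarray, and eps is applied by clipping each probability before taking the logarithm, which may differ from what footix does.

```python
import math


def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of a probability vector, clipped at eps."""
    clipped = [min(max(x, eps), 1.0) for x in p]
    return -sum(x * math.log(x) for x in clipped)


def confidence_sketch(p_samples, eps=1e-12):
    """Return (confidence, sharpness, disagreement) for rows of [p_home, p_draw, p_away]."""
    if not p_samples or any(len(row) != 3 for row in p_samples):
        raise ValueError("expected a non-empty sequence of 3-element rows")
    n = len(p_samples)
    # Posterior mean probability of each outcome.
    mean_p = [sum(row[k] for row in p_samples) / n for k in range(3)]
    h_mean = entropy(mean_p, eps)                                # H(mean_p)
    h_expected = sum(entropy(row, eps) for row in p_samples) / n  # E[H(p_s)]
    sharpness = 1.0 - h_mean / math.log(3)
    disagreement = (h_mean - h_expected) / math.log(3)           # MI / log(3)
    raw = 100.0 * 4.5 * sharpness * (1.0 - disagreement)
    return min(max(raw, 0.0), 100.0), sharpness, disagreement


# Samples that agree give (numerically) zero disagreement.
agree = confidence_sketch([[0.8, 0.1, 0.1]] * 4)
# Samples that favour opposite outcomes give a positive disagreement.
split = confidence_sketch([[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]])
```

In this sketch the disagreement term only lowers the score: when every sample is identical, MI is zero and the confidence depends on sharpness alone.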

footix.metrics.confidence.confidence_1x2_from_samples(samples, eps=1e-12)[source]

Compute confidence from a SampleProbaResult object.

Parameters:

samples (SampleProbaResult) – Posterior samples for home/draw/away outcome probabilities.
eps (float) – Numerical stability constant used for clipping.

Returns:

ConfidenceComponents with confidence in [0, 100].

Raises:

ValueError – If sample arrays have incompatible shapes.

Return type:

ConfidenceComponents

footix.metrics.metrics_function module

footix.metrics.metrics_function.incertity(probas, outcome_idx)[source]

Compute the entropy (or incertity) metric.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – List of probabilities.
outcome_idx (int) – Index of the outcome: 0, 1, or 2 for Home, Draw, and Away.

Returns:

The entropy metric.

Return type:

float

footix.metrics.metrics_function.rps(probas, outcome_idx)[source]

Compute the Ranked Probability Score (RPS) for a single categorical forecast.

RPS measures the squared differences between cumulative forecast probabilities and the cumulative actual outcome. Lower scores indicate better forecasts.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – Sequence of forecast probabilities for each category (must sum to 1).
outcome_idx (int) – Index of the realized outcome (0-based).

Returns:

The RPS value.

Raises:

ValueError – If probabilities are invalid or outcome_idx is out of range.

Return type:

float
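As a rough illustration of the definition above, the sketch below accumulates squared differences between the cumulative forecast and the cumulative outcome. It is not the library's code: the name rps_sketch is hypothetical, and dividing by the number of categories minus one is a common convention assumed here, since this page does not state whether footix normalizes the score.

```python
def rps_sketch(probas, outcome_idx):
    """Ranked Probability Score for one forecast (normalization is an assumption)."""
    n = len(probas)
    if n < 2 or any(p < 0 for p in probas) or abs(sum(probas) - 1.0) > 1e-8:
        raise ValueError("probabilities must be non-negative and sum to 1")
    if not 0 <= outcome_idx < n:
        raise ValueError("outcome_idx is out of range")
    cum_forecast = 0.0
    total = 0.0
    # The last cumulative term is always (1 - 1) ** 2 = 0, so it is skipped.
    for k in range(n - 1):
        cum_forecast += probas[k]
        cum_outcome = 1.0 if outcome_idx <= k else 0.0
        total += (cum_forecast - cum_outcome) ** 2
    return total / (n - 1)


perfect = rps_sketch([1.0, 0.0, 0.0], 0)  # all mass on the realized outcome
worst = rps_sketch([0.0, 0.0, 1.0], 0)    # all mass on the farthest outcome
typical = rps_sketch([0.5, 0.3, 0.2], 0)
```

Because the score works on cumulative probabilities, a forecast that misses by one category (home win predicted, draw observed) is penalized less than one that misses by two.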

footix.metrics.metrics_function.zscore(probas, rps_observed, n_iter=10000, seed=None)[source]

Compute the z-score of an observed RPS against a Monte Carlo distribution.

This quantifies how many standard deviations the observed RPS lies from the RPS that would be expected if the forecast probabilities were perfectly calibrated.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – Sequence of forecast probabilities for each category (must sum to 1).
rps_observed (float) – The observed RPS value to evaluate.
n_iter (int) – Number of Monte Carlo samples (default: 10000).
seed (Optional[int]) – Random seed for reproducibility.

Returns:

A tuple containing (z_score, mean_rps, std_rps).

Return type:

RPSResult
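One way to picture the Monte Carlo step is sketched below, using only the standard library. Several details are assumptions rather than documented behavior: outcomes are simulated by drawing from probas, the z-score is taken as (observed - mean) / std, the population standard deviation is used, and the helper names _rps and zscore_sketch are hypothetical. The result is a plain tuple, not the library's RPSResult.

```python
import random
import statistics


def _rps(probas, outcome_idx):
    """Compact normalized RPS used only for this illustration."""
    cum, total = 0.0, 0.0
    for k in range(len(probas) - 1):
        cum += probas[k]
        total += (cum - (1.0 if outcome_idx <= k else 0.0)) ** 2
    return total / (len(probas) - 1)


def zscore_sketch(probas, rps_observed, n_iter=10000, seed=None):
    """Return (z_score, mean_rps, std_rps) from simulated outcomes."""
    rng = random.Random(seed)
    # Draw outcomes as if the forecast were the true distribution.
    outcomes = rng.choices(range(len(probas)), weights=probas, k=n_iter)
    simulated = [_rps(probas, o) for o in outcomes]
    mean_rps = statistics.fmean(simulated)
    std_rps = statistics.pstdev(simulated)
    # Note: std_rps is 0 for a degenerate forecast such as [1, 0, 0].
    z_score = (rps_observed - mean_rps) / std_rps
    return z_score, mean_rps, std_rps


z, mean_rps, std_rps = zscore_sketch([0.5, 0.3, 0.2], 0.145, n_iter=2000, seed=0)
```

Under this sign convention, a negative z-score means the observed RPS is lower (better) than what the forecast itself would lead one to expect.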

footix.metrics.standings module

Module to compute league standings from match results.

footix.metrics.standings.compute_standings(matches, points_win=3, points_draw=1, tiebreakers=None)[source]

Compute league standings table from a DataFrame of match results.

The input DataFrame should have at least the following columns: ‘home_team’, ‘away_team’, ‘fthg’ (Full Time Home Goals), ‘ftag’ (Full Time Away Goals). Rows with missing score values (NaN) are treated as unplayed and ignored.

Parameters:

matches (DataFrame) – DataFrame containing match results.
points_win (int) – Points awarded for a win. Defaults to 3.
points_draw (int) – Points awarded for a draw. Defaults to 1.
tiebreakers (List[str] | None) – Ordered list of criteria to break ties. Supported: ‘points’, ‘goal_difference’, ‘goals_for’. Defaults to [‘points’, ‘goal_difference’, ‘goals_for’].

Returns:

Sorted standings table with columns: ‘team’, ‘played’, ‘wins’, ‘draws’, ‘losses’, ‘gf’, ‘ga’, ‘gd’, ‘points’, ‘position’

Return type:

pd.DataFrame
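The aggregation described above can be illustrated without pandas. The sketch below is a simplified stand-in, not the library's implementation: standings_sketch is a hypothetical name, matches are plain dictionaries keyed by the documented column names, and only the default tiebreaker order (points, goal difference, goals for) is applied.

```python
import math
from collections import defaultdict


def standings_sketch(matches, points_win=3, points_draw=1):
    """Build a sorted standings list from dicts with home_team/away_team/fthg/ftag."""
    table = defaultdict(lambda: {"played": 0, "wins": 0, "draws": 0,
                                 "losses": 0, "gf": 0, "ga": 0, "points": 0})
    for m in matches:
        hg, ag = m["fthg"], m["ftag"]
        # Missing scores mark an unplayed match, which is ignored.
        if hg is None or ag is None or math.isnan(hg) or math.isnan(ag):
            continue
        sides = ((m["home_team"], hg, ag), (m["away_team"], ag, hg))
        for team, scored, conceded in sides:
            row = table[team]
            row["played"] += 1
            row["gf"] += scored
            row["ga"] += conceded
            if scored > conceded:
                row["wins"] += 1
                row["points"] += points_win
            elif scored == conceded:
                row["draws"] += 1
                row["points"] += points_draw
            else:
                row["losses"] += 1
    rows = [{"team": t, **r, "gd": r["gf"] - r["ga"]} for t, r in table.items()]
    # Default tiebreakers: points, then goal difference, then goals for.
    rows.sort(key=lambda r: (r["points"], r["gd"], r["gf"]), reverse=True)
    for position, row in enumerate(rows, start=1):
        row["position"] = position
    return rows


matches = [
    {"home_team": "A", "away_team": "B", "fthg": 2, "ftag": 0},
    {"home_team": "B", "away_team": "C", "fthg": 1, "ftag": 1},
    {"home_team": "C", "away_team": "A", "fthg": float("nan"), "ftag": float("nan")},
]
table = standings_sketch(matches)
```

In this example B and C are level on points, so goal difference decides their order; the unplayed C vs A fixture does not count toward either team's totals.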

footix.metrics.standings.get_team_form(matches, team, last_n=5)[source]

Get the recent form of a team.

Parameters:

matches (DataFrame) – DataFrame containing match results.
team (str) – Team name.
last_n (int) – Number of matches to retrieve. Defaults to 5.

Returns:

List of results (‘W’, ‘D’, ‘L’) from oldest to newest.

Return type:

List[str]
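A minimal sketch of the idea follows, under stated assumptions: team_form_sketch is a hypothetical name, matches are plain dictionaries already in chronological order (how the library orders a DataFrame is not described here), and unplayed matches with missing scores are skipped.

```python
import math


def team_form_sketch(matches, team, last_n=5):
    """Return the team's last_n results as 'W', 'D' or 'L', oldest first."""
    results = []
    for m in matches:
        if team not in (m["home_team"], m["away_team"]):
            continue
        hg, ag = m["fthg"], m["ftag"]
        if hg is None or ag is None or math.isnan(hg) or math.isnan(ag):
            continue  # unplayed match
        # Look at the score from the requested team's point of view.
        scored, conceded = (hg, ag) if m["home_team"] == team else (ag, hg)
        if scored > conceded:
            results.append("W")
        elif scored == conceded:
            results.append("D")
        else:
            results.append("L")
    return results[-last_n:] if last_n > 0 else []


fixtures = [
    {"home_team": "A", "away_team": "B", "fthg": 2, "ftag": 0},
    {"home_team": "B", "away_team": "C", "fthg": 1, "ftag": 1},
    {"home_team": "C", "away_team": "B", "fthg": 0, "ftag": 1},
]
form_b = team_form_sketch(fixtures, "B")
```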

Module contents

Evaluation metrics for prediction models and strategies.

This module provides metrics for assessing model performance including probabilistic calibration, ranking quality, and decision-making metrics.

Exported functions:

incertity: Prediction uncertainty metric, also known as the entropy value.
rps: Ranked Probability Score
zscore: Standardized score calculation

footix.metrics.incertity(probas, outcome_idx)[source]

Compute the entropy (or incertity) metric.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – List of probabilities.
outcome_idx (int) – Index of the outcome: 0, 1, or 2 for Home, Draw, and Away.

Returns:

The entropy metric.

Return type:

float

footix.metrics.rps(probas, outcome_idx)[source]

Compute the Ranked Probability Score (RPS) for a single categorical forecast.

RPS measures the squared differences between cumulative forecast probabilities and the cumulative actual outcome. Lower scores indicate better forecasts.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – Sequence of forecast probabilities for each category (must sum to 1).
outcome_idx (int) – Index of the realized outcome (0-based).

Returns:

The RPS value.

Raises:

ValueError – If probabilities are invalid or outcome_idx is out of range.

Return type:

float

footix.metrics.zscore(probas, rps_observed, n_iter=10000, seed=None)[source]

Compute the z-score of an observed RPS against a Monte Carlo distribution.

This quantifies how many standard deviations the observed RPS lies from the RPS that would be expected if the forecast probabilities were perfectly calibrated.

Parameters:

probas (Sequence[float] | ndarray[tuple[Any, ...], dtype[floating]]) – Sequence of forecast probabilities for each category (must sum to 1).
rps_observed (float) – The observed RPS value to evaluate.
n_iter (int) – Number of Monte Carlo samples (default: 10000).
seed (Optional[int]) – Random seed for reproducibility.

Returns:

A tuple containing (z_score, mean_rps, std_rps).

Return type:

RPSResult

class footix.metrics.ConfidenceComponents(confidence, sharpness, disagreement)[source]

Bases: NamedTuple

Decomposed confidence metrics for a 1X2 prediction.

Parameters:

confidence (float)
sharpness (float)
disagreement (float)

confidence

Final confidence score in [0, 100].

Type: float

sharpness

Sharpness score in [0, 1] derived from normalized entropy.

Type: float

disagreement

Posterior disagreement score in [0, 1] derived from mutual information.

Type: float

confidence: float: Alias for field number 0

sharpness: float: Alias for field number 1

disagreement: float: Alias for field number 2

footix.metrics.confidence_curve(confidence, gamma=0.7)[source]

Rescale confidence with a monotone power curve.

This helper is intended for readability in user interfaces while preserving the match ranking induced by the raw confidence score.

The mapping is: c' = 100 * (clip(c, 0, 100) / 100) ** gamma.

Parameters:

confidence (float) – Raw confidence score.
gamma (float) – Positive exponent. Values below 1.0 raise mid-range scores; values above 1.0 lower them. The endpoints 0 and 100 are unchanged.

Returns:

Rescaled confidence in [0, 100].

Raises:

ValueError – If gamma is not strictly positive.

Return type:

float

footix.metrics.confidence_1x2_from_samples(samples, eps=1e-12)[source]

Compute confidence from a SampleProbaResult object.

Parameters:

samples (SampleProbaResult) – Posterior samples for home/draw/away outcome probabilities.
eps (float) – Numerical stability constant used for clipping.

Returns:

ConfidenceComponents with confidence in [0, 100].

Raises:

ValueError – If sample arrays have incompatible shapes.

Return type:

ConfidenceComponents

footix.metrics.confidence_1x2_from_samples_array(p_samples, eps=1e-12)[source]

Compute confidence from posterior 1X2 probability samples.

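The score combines:

- Sharpness: 1 - H(mean_p) / log(3)
- Posterior disagreement: MI / log(3), where MI = H(mean_p) - E[H(p_s)]

Here H is the Shannon entropy, mean_p is the mean of the posterior samples, and p_s is a single posterior sample.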

Final score: confidence = clip(100 * 4.5 * sharpness * (1 - disagreement), 0, 100).

The 4.5 factor is an empirical stretch used to spread mid-range raw sharpness values into a more readable 0-100 confidence scale before clipping.

Parameters:

p_samples (ndarray[tuple[Any, ...], dtype[floating]]) – Array with shape (n_samples, 3) containing posterior samples of [p_home, p_draw, p_away].
eps (float) – Numerical stability constant used for clipping.

Returns:

ConfidenceComponents with confidence in [0, 100].

Raises:

ValueError – If the input shape is invalid or no samples are provided.

Return type:

ConfidenceComponents