footix.data_io package
Submodules
footix.data_io.base_scrapper module
footix.data_io.data_reader module
- class footix.data_io.data_reader.DataProtocol(*args, **kwargs)[source]
Bases: Protocol
Protocol for data readers.
- class footix.data_io.data_reader.MatchupResult(home_team, away_team, result, away_goals, home_goals)[source]
Bases: object
A dataclass representing the result of a football match.
- result
The final result of the match: (‘H’ for Home Win, ‘A’ for Away Win, ‘D’ for Draw).
- Type:
- static from_dict(dict_row)[source]
Factory method to create a MatchupResult object from a dictionary row.
- Parameters:
dict_row (dict) – A dictionary containing the match result, with keys:
- ‘HomeTeam’: The name of the home team.
- ‘AwayTeam’: The name of the away team.
- ‘FTR’: The final result (‘H’ for Home Win, ‘A’ for Away Win, ‘D’ for Draw).
- ‘FTAG’: The number of goals scored by the away team.
- ‘FTHG’: The number of goals scored by the home team.
- Returns:
An instance of the MatchupResult class populated with data from the dictionary row.
- Return type:
MatchupResult
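The key-to-field mapping described for from_dict can be illustrated with a small stand-in. This is a minimal sketch, not the footix implementation: the class name MatchupResultSketch is hypothetical, and the field types (str for names and result, int for goals) are assumptions that the reference above does not state.

```python
from dataclasses import dataclass


@dataclass
class MatchupResultSketch:
    """Hypothetical stand-in mirroring the documented MatchupResult fields."""

    home_team: str
    away_team: str
    result: str  # 'H' (home win), 'A' (away win) or 'D' (draw)
    away_goals: int  # assumed to be an int; not stated in the reference
    home_goals: int  # assumed to be an int; not stated in the reference

    @staticmethod
    def from_dict(dict_row: dict) -> "MatchupResultSketch":
        # Map the documented dictionary keys onto the dataclass fields.
        return MatchupResultSketch(
            home_team=dict_row["HomeTeam"],
            away_team=dict_row["AwayTeam"],
            result=dict_row["FTR"],
            away_goals=int(dict_row["FTAG"]),
            home_goals=int(dict_row["FTHG"]),
        )


row = {"HomeTeam": "Arsenal", "AwayTeam": "Chelsea", "FTR": "H", "FTAG": 1, "FTHG": 2}
match = MatchupResultSketch.from_dict(row)
```

A missing key raises KeyError in this sketch; whether the real factory method behaves the same way is not documented here.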
footix.data_io.footballdata module
Module for scraping and processing footballdata.co.uk data.
This module contains the ScrapFootballData class, which is responsible for downloading, storing, and preprocessing football match data from football-data.co.uk. It includes methods for data sanitization, team name mapping, and fixture retrieval.
- Classes:
ScrapFootballData: Handles the scraping and processing of football match data.
- Functions:
_process_season(season: str) -> str: Processes a season string into a standardized format.
- class footix.data_io.footballdata.ScrapFootballData(competition, season, path, force_reload=False, mapping_teams=None)[source]
Bases: Scraper
Scraper for downloading and processing football match data from football-data.co.uk.
This class handles the retrieval, local storage, and preprocessing of football match data for a given competition and season. It supports automatic downloading, file management, column sanitization, and team name mapping.
- Parameters:
competition (str) – The competition code (e.g., ‘E0’ for Premier League).
season (str) – The season string (e.g., ‘2020/2021’, ‘2020-2021’, or ‘2021’).
path (str) – Directory path to store the downloaded CSV files.
force_reload (bool, optional) – If True, forces re-download of data even if file exists.
mapping_teams (dict[str, str] | None, optional) – Optional mapping for team name normalization.
- path
Path object for data storage.
- Type:
Path
- df
Loaded and processed match data.
- Type:
pd.DataFrame
- load()[source]
Load the CSV for the configured competition and season into a pandas DataFrame.
If a file named “{competition}_{season}.csv” exists under self.path and self.force_reload is False, it is loaded with pandas.read_csv. Otherwise self.download() is invoked to (re)create the CSV, which is then read.
- Returns:
The loaded dataset.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If the expected CSV is not found after attempting download.
pandas.errors.EmptyDataError, pandas.errors.ParserError, OSError – Propagated from pandas.read_csv or filesystem operations.
Notes
Relies on the instance attributes self.path (Path or str), self.competition (str), self.season (str), and self.force_reload (bool). This method may have the side effect of calling self.download().
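The cache-or-download behaviour described for load() can be sketched as follows. This is an illustration under stated assumptions, not the library's code: load_cached_csv and fake_download are hypothetical names, the download step is passed in as a callable, and the standard-library csv module stands in for pandas.read_csv so that the sketch runs without third-party packages.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable


def load_cached_csv(
    path: Path,
    competition: str,
    season: str,
    force_reload: bool,
    download: Callable[[Path], None],
) -> list[dict[str, str]]:
    """Read '{competition}_{season}.csv' under path, downloading it first if needed."""
    csv_file = Path(path) / f"{competition}_{season}.csv"
    # (Re)create the file when it is missing or a reload is forced.
    if force_reload or not csv_file.exists():
        download(csv_file)
    if not csv_file.exists():
        raise FileNotFoundError(f"{csv_file} not found after download attempt")
    with csv_file.open(newline="") as handle:
        return list(csv.DictReader(handle))


calls = []


def fake_download(target: Path) -> None:
    # Hypothetical downloader: writes a one-row CSV instead of hitting the network.
    calls.append(target)
    target.write_text("HomeTeam,AwayTeam,FTR\nArsenal,Chelsea,H\n")


with tempfile.TemporaryDirectory() as tmp:
    rows = load_cached_csv(Path(tmp), "E0", "2021", False, fake_download)
    calls_after_first = len(calls)
    load_cached_csv(Path(tmp), "E0", "2021", False, fake_download)  # served from cache
    calls_after_cached = len(calls)
    load_cached_csv(Path(tmp), "E0", "2021", True, fake_download)  # forced reload
    calls_after_forced = len(calls)
```

Passing the downloader as an argument keeps the caching decision separate from the network call, which makes the three cases (missing file, cached file, forced reload) easy to exercise offline.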
footix.data_io.prediction_export module
Prediction export utilities for model predictions.
This module transforms model outputs into a normalized JSON-compatible structure for prediction record consumers.
- footix.data_io.prediction_export.build_prediction_records_from_predictions(fixtures, goal_matrices, samples, payload_metadata=None, team_normalizer=None, confidence_gamma=0.7)[source]
Build prediction records from existing prediction artifacts.
- Parameters:
fixtures (Sequence[Mapping[str, Any]]) – Raw fixtures payload from odds JSON.
goal_matrices (Mapping[str, GoalMatrix]) – Mapping from match key to score matrix predictions.
samples (Mapping[str, SampleProbaResult]) – Mapping from match key to posterior probability samples.
payload_metadata (Mapping[str, Any] | None) – Optional metadata extracted from odds payload.
team_normalizer (Callable[[str], str] | None) – Optional callable for team-name normalization.
confidence_gamma (float | None)
- Returns:
Tuple of valid records and technical error reports.
- Return type:
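The "valid records plus error reports" return shape can be illustrated with a simplified sketch. Everything here is an assumption made for illustration: the goal matrix is modelled as a nested list where matrix[i][j] is the probability that the home side scores i and the away side scores j, the match key format and the fixture keys home_team and away_team are hypothetical, and samples, payload_metadata, team_normalizer and confidence_gamma are left out entirely.

```python
from typing import Any, Mapping, Sequence


def outcome_probabilities(matrix: Sequence[Sequence[float]]) -> dict[str, float]:
    """Collapse a score matrix into home/draw/away probabilities.

    Assumes matrix[i][j] = P(home scores i, away scores j).
    """
    home = draw = away = 0.0
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            if i > j:
                home += p
            elif i == j:
                draw += p
            else:
                away += p
    return {"home": home, "draw": draw, "away": away}


def build_records_sketch(
    fixtures: Sequence[Mapping[str, Any]],
    goal_matrices: Mapping[str, Sequence[Sequence[float]]],
) -> tuple[list[dict[str, Any]], list[dict[str, str]]]:
    """Return (valid records, error reports) for the given fixtures."""
    records: list[dict[str, Any]] = []
    errors: list[dict[str, str]] = []
    for fixture in fixtures:
        # Hypothetical key format; the real match key may differ.
        key = f"{fixture['home_team']} - {fixture['away_team']}"
        matrix = goal_matrices.get(key)
        if matrix is None:
            # Report the problem instead of raising, so other fixtures still export.
            errors.append({"match_key": key, "error": "missing goal matrix"})
            continue
        records.append({"match_key": key, **outcome_probabilities(matrix)})
    return records, errors


fixtures = [
    {"home_team": "Arsenal", "away_team": "Chelsea"},
    {"home_team": "Leeds", "away_team": "Everton"},
]
goal_matrices = {"Arsenal - Chelsea": [[0.1, 0.2], [0.3, 0.4]]}
records, errors = build_records_sketch(fixtures, goal_matrices)
```

Collecting failures in a second list rather than raising lets a single malformed fixture be reported without discarding the rest of the batch, which matches the two-part return value described above.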
- footix.data_io.prediction_export.export_prediction_records_from_model(model, fixtures, payload_metadata=None, team_normalizer=None, predict_kwargs=None, sample_kwargs=None, confidence_gamma=0.7)[source]
Compute predictions from a model and export prediction records.
- Parameters:
model (PredictionExportModel) – Predictive model supporting predict/get_samples.
fixtures (Sequence[Mapping[str, Any]]) – Raw fixtures payload from odds JSON.
payload_metadata (Mapping[str, Any] | None) – Optional metadata extracted from odds payload.
team_normalizer (Callable[[str], str] | None) – Optional callable for team-name normalization.
predict_kwargs (Mapping[str, Any] | None) – Optional extra kwargs forwarded to predict.
sample_kwargs (Mapping[str, Any] | None) – Optional extra kwargs forwarded to get_samples.
confidence_gamma (float | None)
- Returns:
Tuple of valid records and technical error reports.
- Return type:
footix.data_io.understat module
- exception footix.data_io.understat.ShotDataNotFound[source]
Bases: RuntimeError
Raised when the expected shotsData <script> block is not present.
- exception footix.data_io.understat.FixtureDataNotFound[source]
Bases: RuntimeError
Raised when the fixture data are not present.
- class footix.data_io.understat.ScrapUnderstat(competition, season, path, force_reload=False, mapping_teams=None)[source]
Bases: Scraper
Scraper for downloading and processing football match data from understat.com. This class is heavily inspired by, and partly copied from, its counterpart in penaltyblog: https://github.com/martineastwood/penaltyblog
This class retrieves, parses, and processes football match data for a given competition and season from Understat. It extracts fixture details, expected goals (xG), forecasts, and normalizes team names. The data is returned as a processed pandas DataFrame.
- Parameters:
competition (str) – The competition code (e.g., ‘EPL’ for Premier League).
season (str) – The season string (e.g., ‘2020/2021’, ‘2020-2021’, or ‘2021’).
path (str) – Directory path for any required file operations.
force_reload (bool, optional) – If True, forces re-download or reprocessing of data.
mapping_teams (dict[str, str] | None, optional) – Optional mapping for team name normalization.
- Methods:
get_fixtures() -> pd.DataFrame: Downloads, parses, and returns processed match data.
_process_season(season: str) -> str: Processes the season string for URL usage.
- get_fixtures()[source]
Downloads and processes match fixtures using Understat’s API.
Uses the /getLeagueData/ API endpoint which requires specific headers.
- Returns:
Processed fixtures with match details, xG, and forecasts.
- Return type:
pd.DataFrame
- Raises:
FixtureDataNotFound – If no fixture data is found in the API response.
footix.data_io.utils_scrapper module
- footix.data_io.utils_scrapper.check_competition_exists(competition)[source]
Check if the competition exists in the MAPPING_COMPETITIONS dictionary.
- footix.data_io.utils_scrapper.to_snake_case(name)[source]
Convert the string name into a snake_case string. Shamelessly copied from: https://stackoverflow.com/questions/1175208/elegant-python-function-to-convert-camelcase-to-snake-case
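For reference, the two-pass regular-expression approach commonly given in that Stack Overflow thread looks like the sketch below. It is assumed, not confirmed, that the footix function follows the same approach; the name to_snake_case_sketch is hypothetical.

```python
import re


def to_snake_case_sketch(name: str) -> str:
    """Convert a CamelCase or mixedCase string to snake_case."""
    # First pass: split before a capitalised word, e.g. "HomeTeam" -> "Home_Team".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Second pass: split a lowercase letter or digit from a following capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


converted = [to_snake_case_sketch(s) for s in ("HomeTeam", "FTHG", "getHTTPResponseCode")]
```

Runs of capitals such as FTHG are kept together and simply lowercased, which suits the abbreviated column names used by football-data style CSVs.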
- footix.data_io.utils_scrapper.add_match_id(df)[source]
Add a stable match_id column in the form “Home - Away - YYYY-MM-DD”.
This normalizes the date formatting so match ids are consistent across scrapers that use different date string formats.
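The date normalization behind match_id can be sketched for a single row using only the standard library. The list of accepted input formats is an assumption chosen for illustration (day-first dates with slashes and ISO-style timestamps); the real function operates on a DataFrame and may accept other formats. The name make_match_id is hypothetical.

```python
from datetime import datetime

# Assumed input formats; the actual scrapers may emit others.
_DATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y", "%d/%m/%y")


def make_match_id(home: str, away: str, date: str) -> str:
    """Build an id of the form 'Home - Away - YYYY-MM-DD'."""
    for fmt in _DATE_FORMATS:
        try:
            parsed = datetime.strptime(date, fmt)
        except ValueError:
            continue  # try the next candidate format
        return f"{home} - {away} - {parsed:%Y-%m-%d}"
    raise ValueError(f"Unrecognized date format: {date!r}")


ids = {
    make_match_id("Arsenal", "Chelsea", "21/08/2021"),
    make_match_id("Arsenal", "Chelsea", "21/08/21"),
    make_match_id("Arsenal", "Chelsea", "2021-08-21 15:00:00"),
}
```

All three input spellings collapse to the same id, which is what allows rows from different scrapers to be joined on match_id.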
Module contents
Data input/output utilities for football data sources.
This module provides interfaces and implementations for scraping and reading football data from multiple sources (football-data.co.uk, Understat, etc.).
- Submodules:
footballdata: football-data.co.uk scraper
understat: Understat.com data reader
data_reader: Generic data reading utilities
base_scrapper: Base classes for data scrapers
utils_scrapper: Scraper utility functions
- class footix.data_io.ScrapFootballData(competition, season, path, force_reload=False, mapping_teams=None)[source]
Same class as footix.data_io.footballdata.ScrapFootballData; see the footix.data_io.footballdata module above for the full description.
- class footix.data_io.ScrapUnderstat(competition, season, path, force_reload=False, mapping_teams=None)[source]
Same class as footix.data_io.understat.ScrapUnderstat; see the footix.data_io.understat module above for the full description.
- footix.data_io.build_prediction_records_from_predictions(fixtures, goal_matrices, samples, payload_metadata=None, team_normalizer=None, confidence_gamma=0.7)[source]
Same function as footix.data_io.prediction_export.build_prediction_records_from_predictions; see the footix.data_io.prediction_export module above for the full description.
- footix.data_io.export_prediction_records_from_model(model, fixtures, payload_metadata=None, team_normalizer=None, predict_kwargs=None, sample_kwargs=None, confidence_gamma=0.7)[source]
Same function as footix.data_io.prediction_export.export_prediction_records_from_model; see the footix.data_io.prediction_export module above for the full description.