Combining Learned and Handcrafted Features for Injury Risk Estimation in Football
Authors
Rasmussen, Marcus Kassow ; Jensen, Anders Knudsen ; Hansen, Kenneth Krogh
Term
4. term
Education
Publication year
2023
Submitted on
2023-06-08
Pages
80
Abstract
Injuries are a major concern in football because they limit players’ availability and lead to both performance and financial costs for clubs. This project develops machine learning models that rank players by their risk of injury. The work is conducted with Aalborg Boldklub (AaB) using data from training and match sessions. We merge four datasets into a single dataset with 4,350 sessions, 89 of which include an injury. We compare four approaches: (1) a model that uses only manually designed, domain-informed features (handcrafted features), (2) a model that uses only automatically learned features, (3) a model that combines both, and (4) a model that learns player-specific representations by classifying player IDs to help assess injury risk. To address the data imbalance (far fewer injuries than non-injuries), we use cost-sensitive learning combined with binary cross-entropy as the loss function. The models can estimate injury risk to some extent and serve as a recommendation tool for medical staff. The best-performing model uses only handcrafted features and achieves a precision@k of 56.66% ± 9.08 with k = 5, and a Discounted Cumulative Gain (DCG) of 0.90 ± 0.08. Here, precision@k indicates the share of correctly identified injuries among the top k ranked players in a session, and DCG is a measure of ranking quality.
[This abstract has been rewritten with the help of AI based on the project's original abstract]
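The two ranking metrics and the class-weighted loss named in the abstract can be illustrated with a small sketch. This is not the project's code: the function names, the `pos_weight` parameter, and the toy session below are assumptions made for illustration, and the thesis may define the metrics differently (for example, by normalising DCG or by averaging precision@k differently across sessions).

```python
import math


def precision_at_k(scores, labels, k=5):
    """Share of the k highest-scored players who were actually injured.

    scores: predicted risk per player (higher means riskier).
    labels: 1 if the player was injured in the session, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def dcg(scores, labels, k=None):
    """Discounted Cumulative Gain: injuries ranked near the top count more."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    if k is not None:
        ranked = ranked[:k]
    return sum(
        label / math.log2(rank + 1)
        for rank, (_, label) in enumerate(ranked, start=1)
    )


def weighted_bce(probs, labels, pos_weight):
    """Binary cross-entropy where errors on the injury class cost more.

    pos_weight (hypothetical name) scales the loss term for injured
    samples, which is one common form of cost-sensitive learning.
    """
    eps = 1e-12  # keep log() away from 0
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Toy session with six players, one of whom was injured (made-up values).
session_scores = [0.9, 0.1, 0.4, 0.7, 0.2, 0.3]
session_labels = [0, 0, 1, 0, 0, 0]

p_at_5 = precision_at_k(session_scores, session_labels, k=5)
session_dcg = dcg(session_scores, session_labels)
loss = weighted_bce([0.5, 0.5], [1, 0], pos_weight=3.0)
```

With 89 injuries among 4,350 sessions, a weight near the ratio of non-injuries to injuries would be one plausible choice for `pos_weight`. The abstract does not state which weighting the authors used.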
Keywords
