Author(s)
Term
4th semester
Education
Publication year
2024
Submitted on
2024-06-10
Pages
26 pages
Abstract
For pharmaceutical companies to bring new drugs to market, they must first get clinical studies approved. This entails following rigid rules defined in large regulatory documents, which is both costly and time-intensive when done manually. The field of automated consistency checking (ACC) can help automate this process. Because regulatory documents are large, complex, and written in rich natural language, implementing ACC solutions is difficult. However, natural language processing (NLP) methods have become increasingly powerful in recent years, making ACC more feasible. This paper therefore investigates ACC in the pharmaceutical domain in collaboration with Novo Nordisk. The paper divides the problem of ACC into multiple NLP subproblems and presents a pipeline for ACC. The pipeline identifies sentences that represent rules in regulatory documents and extracts from these rules the data needed to serialize them into CDISC Core rules. The paper demonstrates how to construct the in-domain dataset needed to train machine learning models. Using this dataset, we train multiple machine learning models to solve each subproblem. For the first problem, identifying rules, an SVM classifier using TF-IDF embeddings obtains an F2 score of 0.79, outperforming the other baselines and fine-tuned versions of BERT models. To assign operators to the classified rules, an MLkNN classifier, also using TF-IDF embeddings, obtains an F2-micro score of 0.71. Lastly, to extract elements such as columns and values from the rule sentences, a fine-tuned version of LegalBERT can be used, obtaining an F2 score of 0.69. Using the output of these three models, we show that it is possible to generate simple rules that can be used to implement ACC on clinical trial study databases.
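The abstract reports F-measure scores, including a micro-averaged one for the multi-label operator classifier. As a minimal sketch of how such scores are computed, the snippet below derives an F-beta score from raw counts and micro-averages it over multi-label predictions. The function names, the operator labels, and the toy predictions are hypothetical and are not taken from the project; `beta` is left as a parameter because the choice of beta is an assumption here.

```python
from typing import Iterable, Set, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from raw counts; returns 0.0 when the score is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def micro_counts(
    gold: Iterable[Set[str]], pred: Iterable[Set[str]]
) -> Tuple[int, int, int]:
    """Pool true positives, false positives and false negatives over all
    samples and labels, which is what micro-averaging means."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    return tp, fp, fn


# Hypothetical operator labels for three rule sentences (illustration only).
gold = [{"equal_to"}, {"non_empty", "equal_to"}, {"less_than"}]
pred = [{"equal_to"}, {"non_empty"}, {"greater_than"}]

tp, fp, fn = micro_counts(gold, pred)
micro_f1 = f_beta(tp, fp, fn, beta=1.0)
micro_f2 = f_beta(tp, fp, fn, beta=2.0)
print(f"tp={tp} fp={fp} fn={fn} F1-micro={micro_f1:.3f} F2-micro={micro_f2:.3f}")
```

With beta greater than 1 the score weights recall more heavily than precision, which suits a setting where missing a rule costs more than flagging an extra sentence.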
Documents
Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University. Here, you can find and download publicly available bachelor's theses and master's projects from across the university dating from 2008 onwards. Student projects from before 2008 are available in printed form at Aalborg University Library.