Topic modelling for CEDA archive metadata
14 Jul 2020





Collaborating Facility: Centre for Environmental Data Analysis (CEDA)

The Centre for Environmental Data Analysis (CEDA) archive contains tens of thousands of archived files, comprising both real-world measurements and output from climate model simulations. Each file carries a wealth of semi-structured metadata that could be used for more intelligent document discovery and retrieval.
This project aims to apply machine learning and natural language processing (NLP) techniques to datasets within the CEDA archive, so that end users can discover and retrieve information more easily. It will examine the use of modern word embedding techniques to extract meaningful topics and relationships between documents.
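
As a rough illustration of how relationships between documents might be measured, the sketch below turns a few metadata descriptions into vectors and compares them with cosine similarity. It is only a minimal sketch under stated assumptions: the records are invented examples rather than real CEDA metadata, the function names are hypothetical, and simple TF-IDF weights stand in for the learned word embeddings the project proposes to investigate.

```python
import math
from collections import Counter

# Hypothetical metadata descriptions; these are NOT real CEDA records.
records = {
    "obs_temp": "surface air temperature measurements from weather stations",
    "sim_temp": "simulated surface air temperature from a climate model",
    "ocean_salinity": "ocean salinity profiles measured by floats",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document.

    TF-IDF is a stand-in here for learned word embeddings.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, vectors):
    """Return (other_name, similarity) for the record closest to `name`."""
    others = [
        (other, cosine(vectors[name], vec))
        for other, vec in vectors.items()
        if other != name
    ]
    return max(others, key=lambda pair: pair[1])


vectors = tfidf_vectors(records)
print(most_similar("obs_temp", vectors))
```

In this toy example the observed and simulated temperature records share several terms, so they score as more similar to each other than to the salinity record. Word embeddings would additionally let related but non-identical words (for example "measured" and "measurements") contribute to the similarity.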

Contact: Jackson, Samuel (STFC,RAL,SC)