The underlying data consists of MSE forum posts mentioning heat pumps, covering the period between 1 January 2016 and 22 May 2024. From forum posts we identified 4,875 questions. We use a topic model (BERTopic, with a minimum topic size of 100 and everything else as default) to identify groups of frequently asked questions. This is followed by the use of generative AI to summarise each group of frequently asked questions into one representative question (using OpenAI's GPT-4o mini model).
We further refined these FAQs manually and created sub-categories of questions. Manual validation is also conducted on a sample of questions in each topic. The questions presented above are not necessarily ordered by how many times those,or similar questions, are asked. More details on the methodology including data collection, data processing and FAQ identification can be found in this technical appendix (section 4.4 highlights the FAQ methodology).