Farid Ahmed, CTaLE Associate and Teaching Fellow in Economics & Public Policy at Imperial College London, shares his insights into the latest EconTEAching session.
CTaLE’s first EconTEAching seminar of the year explored the complexities of working with student data. I was eager to listen, learn and contribute, having recently embarked on a student-data-driven project myself. The seminar generated a lively conversation among the panellists and participants alike, and is a must-listen for anyone interested in undertaking pedagogic research in economics. While the panellists primarily specialised in using student data in the UK context, a diverse set of participants attended, some of whom offered interesting observations from their own contexts.
For anyone new to student data, the first 10 minutes of the session provided an excellent overview. Stefania Paredes (University of Warwick) opened the session with an explanation of the different types of student data that are available:

- Personal characteristics (e.g. gender, nationality, ethnicity) – likely to be protected under GDPR and not readily accessible without ethical approval.
- Student outcome data (e.g. attendance, marks and VLE engagement) – module leaders should have access to this, but may need permission to use it for research.
- Qualitative data (e.g. surveys, focus groups) – generated specifically for the research question being analysed.
Stefania also highlighted data available through the Higher Education Statistics Agency (HESA), which annually collects data on UK students across a variety of metrics. Depending on the pedagogic question being addressed, this can provide a wealth of information, as it covers students from universities throughout the UK, making it an invaluable tool for cross-university or time-series comparisons of student outcomes and choices. One caveat to keep in mind is that data requests to HESA can be costly, so each characteristic requested should be carefully considered and evaluated.
Student data is, of course, naturally available at the university, programme and module levels and can be used to address different questions. At each level, however, its use has to be carefully considered, ensuring that the relevant ethical approvals are obtained and that controls are in place for compliance with GDPR.
The seminar presenters, Gabriela Cagliesi (University of Sussex) and Anastasia Papadopoulou (University of Bristol), were then invited to share their insights on working with student data. Gabriela contextualised her understanding of student data through her experience as an applied macroeconomist. While macroeconomic data tends to be ‘clean’, student data often requires a significant amount of work before it is usable for the purposes for which it was collected. This is most often the case with self-reported student data. Gabriela spoke in the context of her pedagogic research on the attainment gap and evidence-based interventions in courses, such as changes to assessments – there can be attrition, and we may not be able to observe student outcomes both before and after a pedagogic intervention.
Gabriela drew a contrast with institutional student data, such as personal characteristics and grades, which tends to be far more complete. My own experience with student data is similar to Gabriela’s: while researching assessment outcomes in Econometrics following Covid, we were unable to obtain full compliance with self-reported data, as students often chose not to provide certain information, but the corresponding characteristics data for students in our sample, obtained from the University, was more or less complete.
Anastasia echoed Gabriela and spoke about student data collected from surveys, focusing mostly on primary data. She noted that with survey data, stated and revealed answers may differ, posing a challenge to identification in pedagogic research and giving rise to measurement error. There is also the challenge of encouraging students to respond to surveys: it is difficult to incentivise students to complete them, an important consideration when eliciting student responses. Surveying in class can yield a larger sample, but attrition rates may be high and it may be difficult to collect a repeated sample (before and after an intervention, for example). Alternative sources, according to Anastasia, such as focus groups and interviews, may provide valuable information that surveys are unable to capture. With reference to secondary data, Anastasia explained that it may not be replicable, and that one may at times need to combine secondary data from different sources, creating additional data collection issues. Student data collection thus poses unique challenges, and these tend to vary with the nature of the data being collected.
Chris Downey (University of Southampton) presented a perspective from the discipline of Education, seconding Gabriela and Anastasia on the difficulties of working with internal student data. Chris illustrated these difficulties through a research question he has been evaluating recently: investigating differences in student attainment before and after entering higher education. Here he mentioned a further challenge he has been facing with student data – establishing the equivalence of students’ pre-university qualifications. If a research question concerns student attainment, how should pedagogic researchers account for the heterogeneity that arises from students’ differing pre-university qualifications? This is an important question to consider and would affect virtually every study of attainment at university – not to mention the additional layer it creates in collecting and translating student data for analysis.
The live discussion and the ensuing question-and-answer session were equally lively. We learnt that universities in some countries, such as Germany, provide access to student data far more easily than those in the UK – surprising, given how much emphasis is placed on data protection regulations in the EU. Participants also discussed the challenges of publishing pedagogic research based on student data: reviewers’ queries can become impossible to answer once the cohorts and/or survey respondents in question are no longer accessible. On the practical front, there is the challenge of requesting and obtaining data from the right people at the university level. At times, the teams who hold the relevant student data find it difficult to generate the appropriate database queries and so cannot provide timely data even when the relevant approvals have been obtained, creating additional layers of complexity and additional relationships that must be managed. Panellists also emphasised that good student data, however hard-won, still needs a good research question and a robust methodology to lead to an impactful pedagogic project.
To conclude, I would echo Chris Downey: economists possess the skills and training needed to undertake pedagogic research, and colleagues in education and other social sciences would be eager to work with us, especially on work that leads to improved student learning and attainment. The session was an invaluable opportunity for existing and prospective pedagogic researchers in economics to gain an overview of the challenges of working with student data, and it will hopefully encourage more economics scholars to undertake pedagogic research.
Written by Farid Ahmed,
Teaching Fellow in Economics & Public Policy at Imperial College London, and Associate Member CTaLE
Image credit: Joshua Sortino
