Warwick Q-Step Masterclass: Automated Web Data Collection
About this Event
PLEASE DO NOT REGISTER FOR THIS EVENT IF YOU HAVE NO INTENTION OF ATTENDING.
Although places are free to students, the Q-Step Centre incurs substantial costs for these events, which cannot be refunded if registered students do not turn up on the day. If you are unable to attend, please make every effort to inform us in advance or cancel your place directly on Eventbrite.
The Warwick Q-Step Centre is delighted to host this Masterclass on Automated Web Data Collection, delivered by Theresa Gessler, of the Institute of Political Science, University of Zurich.
This Masterclass is aimed primarily at postgraduate students on our three Quantitative Social Science degrees. Final-year Q-Step undergraduate students interested in learning enhancement opportunities are also invited to attend. Depending on numbers, we may be able to open the event up to PG students in other relevant subject areas throughout the University.
Automated Web Data Collection
The increasing availability of large amounts of data is changing research in political science. In recent years, a variety of data – whether election results, press releases, parliamentary speeches or social media posts – has become available online. While data has become easier to find, in most cases it comes in an unstructured format. This makes collecting, cleaning and analysing this data challenging.
The goal of this Masterclass is to equip you to gather online data and process it in R for your own research. The major advantage of R for web scraping is that all the steps required during your research project can be done within the same program – data collection, data processing, data analysis and visualisation. Additionally, the functionality of R for scraping has grown immensely in recent years.
The course is an introduction to web scraping that will give you an applied overview of the skills required to automatically collect data from the web, as well as some experience with basic techniques. The course is hands-on, with lectures followed by in-class exercises where you will be able to apply and practise the new methods. If you want, feel free to bring examples from your own research projects to work on during in-class exercises.
Software
All the software we will be using is freely available and based on the R statistical computing language, which is required for this Masterclass. I recommend installing RStudio as a graphical interface for R. Most required packages (e.g. rvest) can be installed in advance or during the course. A definitive list will be emailed shortly before the course to allow you to prepare.
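To check that your installation works, and to get a flavour of the kind of task the course covers, you could try something like the short sketch below. It is only an illustration and not part of the course materials: the HTML snippet, the party names and the vote counts are all invented, and it assumes that a recent version of the rvest package (1.0 or later) is installed.

```r
# Assumes rvest (>= 1.0) is installed: install.packages("rvest")
library(rvest)

# A small, made-up HTML page stands in for a real website,
# so the example runs without an internet connection.
html <- '
<html><body>
  <table id="results">
    <tr><th>Party</th><th>Votes</th></tr>
    <tr><td>Party A</td><td>1200</td></tr>
    <tr><td>Party B</td><td>950</td></tr>
  </table>
</body></html>'

# Data collection: parse the HTML and pull out the table by its id
page <- read_html(html)
results <- page %>%
  html_element("#results") %>%
  html_table()

# Data processing: compute each party's share of the (invented) votes
results$share <- results$Votes / sum(results$Votes)

print(results)
```

If this runs without errors, R and rvest are set up correctly. With a real website, you would typically pass a URL to read_html() instead of a text snippet.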
PLEASE NOTE: This event is for Warwick University students only. There is no option to attend only part of the Masterclass; you must be able to attend for the whole day.
Attendance is free. Registration is required as places are strictly limited. So book your place now!