The information below provides more detail on the course topics being offered in 2023.
The detailed course list for the 2023 summer school is available here.
Please take a look to further explore the courses and our providers’ backgrounds. This document also provides detailed information about course content, pre-requisites, and any preparation required.
Please remember that our courses are subject to change, and updates will be made to this list if changes come through.
Best Practice Analytics: This course will look at methods and tools that can help us create high-quality analytics and reproducible results. We will also look at how to move from a single-analyst, spreadsheet-driven approach to collaborative analytics that follows a best practice governance model. Adopting practices from test-driven software development, we will look at how to establish an analytics process based on documentation, versioning, testing, peer review, collaboration and risk evaluation. We will use examples in R, Shiny, Python and Jupyter Notebooks to illustrate the ideas taught in the course. The aim is to give you an understanding of the challenges you will face when running your own real-world data analytics project and introduce you to a number of principles you can follow to achieve high-quality reproducible results.
Empowerment: Empowerment measures how much an agent is in control of the world it itself perceives. This information-theoretic intrinsic motivation measure was originally conceived to understand the development of more complex behaviours via simple gradients, but has since spread to many other domains. In this course we will first cover the core idea of Empowerment, introducing basic concepts of information theory, intrinsic motivation and the perception-action loop. After a short practical exercise we will then take a look at some neural network approximations, and the application of empowerment to fields such as deep learning, robot control and multi-agent reinforcement learning.
Data Protection, Security, Ethics and Liability in the Age of Big Data: This session aims to introduce the current EU and UK data protection regime and the changes brought in by the General Data Protection Regulation, applicable since May 2018 in spite of Brexit. Furthermore, the session will present and allow for discussion of the specific challenges big data analytics bring, especially in light of the reports published by various data protection regulators on big data at both UK and EU levels. Special attention will be given to security requirements in data protection law. The last two hours (with both speakers) will introduce the ethical issues arising from Big Data and present the correlative legal issues that may arise in light of data protection legislation and of criminal law. Torts and contracts will not be covered.
Introduction to Federated Machine Learning: This course will explore the range of training and networking challenges in relation to model exchange among client and server nodes. We will also cover recent developments in the field and some well-known solutions to enhance privacy and security in federated machine learning. The second half of the day will entail hands-on experiments using the state-of-the-art open-source European federated learning platform, FEDn, which will be hosted over the Network Convergence Laboratory (NCL) in UEssex. Through these theoretical and practical sessions, participants will gain a good understanding of the topic and the foundation required to further explore the field.
Introduction to Network Science: This one-day introductory course on network science will give a broad overview of the different concepts and methods commonly applied in social network analysis. We will first consider different kinds of network data and their representation and discuss the basics of network visualisation, including a hands-on example using the free software visone. We will also discuss different kinds of applications and usage scenarios of network science in business and social contexts. The second part of the course will introduce exploratory and descriptive methods for the analysis of networks, at three levels of granularity: at the node level, the subgroup level, and the network level. The third part of the course will introduce inferential or statistical network analysis, including the basic ideas behind a range of models like the exponential random graph model and its various extensions, latent space models, the quadratic assignment procedure, and related techniques. We will cover the implementation of these methods in a very cursory way using R, but the focus is on the methods, not their implementation. Overall, this course is an introductory-level teaser for interested academics, practitioners, and data scientists who would like to explore what they can possibly do with their relational data in the way of exploration and prediction.
Introduction to Python: This Introduction to Python course is for beginners. We aim to introduce fundamental programming concepts using Google Colab. We will introduce variables, data types, casting, strings, Booleans, operators, lists, tuples, loops, conditions, functions, and a bit of NumPy. This course is designed for those who are coming from a non-technical background.
Introduction to Startups for Data Scientists, Analysts and Machine Learning Engineers: An introductory course designed for technical professionals and students (data scientists, analysts, ML engineers, software developers) who are interested in understanding the world of startups. The course will cover the fundamentals of startups and how they function, the career opportunities available, and how to start your own startup. Participants will learn about the role of venture capital, pitch decks, business model canvases, and the concept of control and cap tables.
Focus 1: Machine Learning
Introduction to Machine Learning: The aim of this course is to provide an introduction to Machine Learning and a discussion of the types of problems it is suitable for. The course will then introduce Kernel Machines and show how they can provide robust but flexible classifiers when the number of training points is limited.
Tree-based Models for Machine Learning in Data Analytics: In this course, participants will learn how to work with tree-based models to solve data science problems in Python. Everything from using a single tree for regression or classification to more advanced ensemble methods will be covered. Participants will start by learning about basic CARTs (classification and regression trees), followed by the implementation of bagged trees, Random Forests, and boosted trees using the Gradient Boosting Machine, or GBM. The course will include dedicated practical sessions for these techniques and allow participants to create high-performance tree-based models for a real-world dataset.
Machine Learning for Causal Inference from Observational Data: This course will introduce the basic principles of causal modelling (potential outcomes, graphs, causal effects) while emphasising the key role of design and assumptions in obtaining robust estimates. It will also cover the basic principles of machine learning and the use of machine learning methods to do causal inference (e.g. methods stemming from domain adaptation and propensity scores). Lastly, it will show how to implement these techniques for causal analysis and interpret the results in illustrative examples. By the end of this course participants should: understand the distinction between causal effects and associations and appreciate the key role of design and possibly untestable assumptions in the estimation of causal effects; understand the role of training and testing models on data and the use of regularization to avoid overfitting; and be able to position machine learning within the causal tool chain.
Focus 2: R
Functions, Control flow, and Automation
Data Visualisation in R: In the era of misinformation and fake news, producing data visualisations that are clear and interpretable to an audience is essential in engaging people with data. Whilst there are many software packages available to produce data graphics, many offer limited customisation of graphics, or are not easily reproducible. This course will explore tools to produce high-quality graphics using the R programming language, focusing on the “ggplot2” package. The “ggplot2” package allows almost endless customisation of data visualisations, has a number of excellent extension packages that add further flexibility, and, being in R, is entirely script-based and therefore highly reproducible. This course will equip attendees with the skills to produce high-quality data visualisations using the “ggplot2” package and extensions, and would be beneficial to people working in any field where data visualisation is important. The course will be suitable for those with little to intermediate prior programming experience.
GIS Analysis in R: Geographic information systems (GIS) software is a powerful tool in the analysis of many types of spatial data, from understanding political trends in different areas, to mapping the spread of infectious diseases, to understanding the impacts of climate change across the globe. Commonly used GIS tools offer great power, but can be incredibly expensive for individual users, or offer limited reproducibility of analyses. In recent years, the “landscape” of GIS packages available in the R programming language has enabled R to become a powerful and richly functional tool in the world of GIS analysis. R, and all of its packages, are freely available and entirely script-based, allowing users to quickly and easily reproduce their analyses. This course will focus on the “sf” package, and will explore the merits and functionality of working with “simple features” based objects in geographical analyses. This one-day course will familiarise users with the array of GIS packages available in R, and enable users to carry out basic GIS operations on a variety of different geographical data formats. Some prior experience with geographical data and GIS software would be beneficial, but is not essential. The course is recommended for users with an intermediate level of programming experience (or who have attended the “Introduction to R” and “Data visualisation in R” courses).
Bayesian Analysis in R: Bayesian statistics are increasingly popular in many scientific disciplines. In this course, you will learn the theoretical underpinnings of Bayesian approaches and the differences between Bayesian and frequentist statistics. You will also learn how to implement, plot, and interpret Bayesian models in R. Finally, you will learn more about some of the advanced options for statistical modelling in this framework, including multi-level modelling and generalised linear approaches.
Focus 3: Optimisation topics
Synergy of Optimisation and Machine Learning: In the first part of the course, we will discuss modern optimisation approaches that do not require significant investment of expertise and time in algorithm development, but still allow us to tackle real-world problems. We will cover the following topics: the meaning of optimisation and its relevance to decision support and decision making, off-the-shelf solvers, algorithm complexity, simple exact algorithms, simple heuristics, metaheuristics, algorithm configuration and tuning. The second part of the course will provide foundations for exploiting the strong connections between optimisation and data science. In a series of exercises, you will see how the techniques studied in the first part of the course are used in machine learning. The aim is to enhance understanding, and so the usage, of optimisation within machine learning. Conversely, it is increasingly recognised that the control of optimisation algorithms would itself benefit from the application of data science techniques. We will present methods, with exercises, that are being developed in data science to improve the performance of existing optimisation methods in many real-world problems. Overall, the course presents the close interactions between data science and optimisation. You will gain a deeper understanding of the optimisation within machine learning and decision support systems, and so of how to make more effective use of them.
Focus 4: Deep learning
Introduction to TensorFlow and Deep Learning: The course introduces TensorFlow as a programming framework from scratch and shows how to use it to build simple neural networks and perform backpropagation. Students are encouraged to program along with the tutor. The basic underlying workings of TensorFlow and neural networks are taught without resorting to higher-level black box packages, so that students can gain a fundamental understanding of how deep learning works. The course also gives an introductory overview of popular deep learning models, including convolutional neural networks and recurrent neural networks.
Recurrent Neural Networks with Keras: This course teaches a deep understanding of how recurrent neural networks work, what they are used for, and how to implement them efficiently using Keras and TensorFlow. The day culminates with unique advanced recurrent neural network examples applied to control problems. Note that natural-language processing examples will not be covered.
Learning Under Different Training and Testing Distributions: Systems based on machine learning methods often face a major challenge when applied to real-world datasets: the conditions under which the system was developed will differ from those in which it is used. Examples include sophisticated email spam filtering, stock prediction, health diagnostic, and brain-computer interface (BCI) systems that took a few years to develop. Will such a system be usable, or will it need to be adapted because the distribution has changed since the system was first built? Almost any form of real-world data analysis is affected by such problems, which arise for reasons ranging from sample selection bias to operating in non-stationary environments. This tutorial will focus on the issues of dataset shift (e.g. covariate shift, prior-probability shift, and concept shift) and will cover transfer learning as a way to learn a satisfactory model.
Focus 5: Brain Computer Interfaces
A Hands-On Introduction to Non-Invasive Neural Interfaces: The course will introduce the basic neuroscientific principles and engineering concepts required to understand and participate in this exciting field. Participants will gain insight into the generation of electrophysiological signals in the nervous system and the corresponding methods of signal processing and feature extraction. Through practical exercises, participants will learn how to record electrophysiological signals, and how to design and develop brain-computer interfaces (BCIs) and peripheral nervous interfaces (PNIs). PNIs are devices that directly interact with the peripheral nervous system, while BCIs translate patterns of brain activity into messages for artificial devices.