If you work with data, you probably spend a lot of time cleaning it and wrangling it into the correct shape. With manual point-and-click methods this can be an extremely time-consuming task that is both frustrating and repetitive! This course will show you how you can use R to efficiently clean and wrangle your data into a format that’s ready for analysis. You will learn about the leading suite of packages known as the Tidyverse, what tidy data really is, and how to practically achieve it with packages such as {dplyr}, {tidyr}, {lubridate} and {forcats}.

- Programming Level: Foundation
- Type: Programming

Having trouble handling text data in R? If so, this course is certainly for you! One of the main problems data scientists face when importing data into R is inconsistencies within the raw data. For example, cells may have trailing whitespace, or names might not be in title case. We will be covering the {stringr} package, which can be used to solve these problems! We will also explore how to interpolate objects into strings using {glue} and how to perform text mining using {tidytext}.

- Programming Level: Intermediate
- Type: Programming

When working on data analysis projects, version control is essential for tracking project progress and aiding collaboration. Fortunately, it is now easier than ever to integrate version control into your project, using RStudio’s interface to the version control software Git and online code-sharing websites such as GitHub and GitLab.

- Programming Level: Foundation
- Type: Version Control

An important aspect of managing workflow in data science is being able to work in tandem with your colleagues! This course shows how effective Git is as a tool for version control in collaborative projects. We will be making use of the RStudio Git interface and remote project hosting platforms, such as GitHub and GitLab.

- Programming Level: Foundation
- Type: Version Control

This is a one-day course on the {tidyverse} package, {purrr}. {purrr} is a very powerful package that gives great flexibility to analysts by enhancing R’s functional programming toolkit. We will demonstrate how to use functions such as map(), map2() and pmap() to iteratively map functions over multi-element objects like vectors and lists. Emphasis will also be placed on how we can manipulate list outputs and how this can be applied to our data.

- Programming Level: Foundation
- Type: Programming

This is a one-day intensive course on R and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using “magic code”, and stress the importance of understanding what R is doing.

- Programming Level: Foundation
- Type: Programming

This is a one-day intensive course on advanced graphics with R. The standard plotting commands in R are known as the base graphics, but are starting to show their age. In this course, we cover more advanced graphics packages - in particular, {ggplot2}. The {ggplot2} package can create advanced and informative graphics.

- Programming Level: Intermediate
- Type: Analytics

The benefit of using a programming language such as R is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.

- Programming Level: Intermediate
- Type: Programming

Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform that facilitates this modelling: it provides an expressive modelling language and state-of-the-art algorithms for drawing Bayesian inferences. The course will teach participants how to interface with Stan through R!

- Programming Level: Intermediate
- Type: Stats/ML, Programming

Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform that facilitates this modelling: it provides an expressive modelling language and state-of-the-art algorithms for drawing Bayesian inferences. The course will teach participants how to interface with Stan through Python!

- Programming Level: Intermediate
- Type: Stats/ML, Programming

This is a one-day intensive course on the R package {shiny}. Shiny allows you to create cutting-edge interactive web graphics. From the Shiny documentation: “Shiny makes it incredibly easy to build interactive web applications with R. Automatic ‘reactive’ binding between inputs and outputs and extensive pre-built widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.”

- Programming Level: Intermediate
- Type: Reporting

Do you want to dynamically create static or interactive documents? Do you want your reports to update automatically when the data changes? Then this session is for you! R Markdown is easy to use and allows for dynamic report generation. Whether you want to generate HTML, PDF or Microsoft Word documents, or even slides for a presentation, R Markdown can be tailored to your needs.

- Programming Level: Intermediate
- Type: Reporting

This is a one-day intensive course on Python and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.

- Programming Level: Foundation
- Type: Programming
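As a flavour of the "no magic code" approach described above, here is a minimal sketch that imports a small CSV and summarises it using only Python's standard library. The column names and values are hypothetical, and the course itself may well use other tools for these steps.

```python
import csv
import io
import statistics

# Hypothetical data standing in for a CSV file on disk
RAW = """site,temperature
A,12.5
A,14.0
B,9.5
B,10.5
B,11.0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, converting temperature to a float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"site": row["site"], "temperature": float(row["temperature"])}
            for row in reader]

def mean_by_site(rows):
    """Group temperatures by site and return the mean for each site."""
    groups = {}
    for row in rows:
        groups.setdefault(row["site"], []).append(row["temperature"])
    return {site: statistics.mean(values) for site, values in groups.items()}

rows = load_rows(RAW)
print(mean_by_site(rows))
```

Each step (parse, convert, group, average) is spelled out explicitly, so it is clear what Python is doing at every point.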

The benefit of using a programming language such as Python is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them.

- Programming Level: Intermediate
- Type: Programming
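To illustrate how these three techniques fit together, the sketch below wraps a conditional expression in a function and applies it to a batch of values with a for loop. The pass/fail labels and the threshold of 50 are hypothetical choices for illustration only.

```python
def classify(value, threshold=50):
    """Conditional expression: label a value by comparing it with a threshold."""
    if value >= threshold:
        return "pass"
    else:
        return "fail"

def label_all(values, threshold=50):
    """For loop: apply classify() to every value and collect the labels."""
    labels = []
    for value in values:
        labels.append(classify(value, threshold))
    return labels

scores = [72, 45, 50, 38]
print(label_all(scores))  # ['pass', 'fail', 'pass', 'fail']
```

Once the rule lives in a function, the same logic can be reused on every new batch of data instead of being repeated by hand.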

Python has a number of packages for the effective creation of graphics to communicate your data insights. This one-day course will examine a range of packages for building impactful visualisations. During the training session, we’ll cover the main Python plotting libraries: plotly, matplotlib and seaborn. Additionally, we discuss how to use faceting and layers effectively in a graphic.

- Programming Level: Intermediate
- Type: Analytics

From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. This course covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).

- Programming Level: Intermediate
- Type: Stats/ML

As spatial data sets get larger, more sophisticated software needs to be harnessed for their analysis. R is now a widely used open source software platform for working with spatial data thanks to its powerful analysis and visualisation packages. The focus of this course is providing participants with the understanding needed to apply R’s powerful suite of geographical tools to their own problems.

- Programming Level: Advanced
- Type: Analytics

So you can write code? Great. But can you write code which is easy to read, simple to maintain, and reproducible? Under the pressure of deadlines, even the best of us can fall victim to bad practices. In this course we motivate the importance of good practices, and show how we can make best practices second nature by incorporating them into our normal workflow.

- Programming Level: Intermediate
- Type: Programming

Using databases is a fundamental part of a data scientist’s role. The main focus of this training course is to introduce SQL databases and how R can be used to retrieve and manipulate data stored in a relational database. The course uses both the {DBI} and {dbplyr} packages. We use the PostgreSQL database as an example for public courses. For in-house training, we are happy to adapt the course to match your database requirements.

- Programming Level: Intermediate
- Type: Programming

This is a one-day intensive course on building a package in R. The focus will be on getting a working R package ready for distribution.

- Programming Level: Advanced
- Type: Programming

This is a one-day Docker course aimed at R users. Docker is a popular platform for packaging, deploying, and running applications. These applications run in containers. Crucially, a container can be run on any system: a developer’s laptop, on-premises systems, or in the cloud. Applications are packaged as images that contain everything needed to run them: code, libraries, and configuration.

- Programming Level: Intermediate
- Type: Programming

In recent years Python has exploded onto the data-science scene, and with it has come a great swathe of data-oriented packages. However, as easy as these packages make analysis, using these tools efficiently requires much more know-how. By the end of this course participants will be able to locate and address bottlenecks in their data-science workflows, using a number of different techniques and tools.

- Programming Level: Intermediate
- Type: Programming
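Profiling is one common first step in locating a bottleneck. The sketch below uses the standard library's `cProfile` and `pstats` modules to profile a deliberately slow de-duplication function, then replaces its quadratic list lookup with an order-preserving dict. The functions and data are hypothetical examples, not course material.

```python
import cProfile
import io
import pstats

def slow_unique(items):
    """Order-preserving de-duplication; 'x not in seen' rescans a list, so O(n^2)."""
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def fast_unique(items):
    """Same result in O(n): dict keys keep insertion order and have O(1) average lookup."""
    return list(dict.fromkeys(items))

def profile(func, *args):
    """Run func(*args) under cProfile; return (result, top-5 report by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()

data = [i % 500 for i in range(20_000)]
result, report = profile(slow_unique, data)
print(report)
```

The report shows where time is spent; once the hot function is identified, swapping the data structure fixes the bottleneck without changing the output.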

This course is for anyone who wants to make their R code faster to type, faster to run and more scalable. During the course, we’ll cover the main R sins (and how to avoid them), dabble with hardware, look at running in parallel and think about efficient R data structures. This course should be useful to people with a range of skill levels.

- Programming Level: Advanced
- Type: Programming

We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces Gaussian processes as a statistical modelling technique, motivating principled prior models that avoid pathological behaviour. For full event information and booking details, please visit the event page.

- Programming Level: Advanced
- Type: Stats/ML

We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces exchangeability and hierarchical models with a strong focus on the inherent identifiability issues and their computational consequences, as well as strategies for moderating these issues. Completion of the Regression Modelling module is recommended.

- Programming Level: Advanced
- Type: Stats/ML

The capturing and quantification of uncertainty is a very important aspect of model-fitting and parameter inference. Bayesian inference represents a fully-probabilistic approach to parameter inference, allowing a practitioner to quantify their uncertainties through probability densities. However, fitting models in a Bayesian framework can be an involved and complicated affair, often necessitating the use of Markov chain Monte Carlo (MCMC) algorithms and their programmatic implementation.

- Programming Level: Foundation
- Type: Stats/ML
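To give a sense of what implementing MCMC involves, here is a minimal random-walk Metropolis sampler, sketched in Python as one possible choice of language. The data, the known standard deviation of 0.5 and the flat prior are all hypothetical assumptions chosen so that the exact posterior is known.

```python
import math
import random
import statistics

def metropolis(log_density, start, n_samples, step=1.0, seed=1):
    """Random-walk Metropolis sampler for a one-dimensional target.

    log_density returns the log of an (unnormalised) density.
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        proposal_lp = log_density(proposal)
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
        # Accept with probability min(1, p(proposal) / p(current))
        if math.log(u) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # on rejection, the current state is repeated
    return samples

DATA = [4.8, 5.1, 5.4, 4.9, 5.3]  # hypothetical observations
SIGMA = 0.5                       # assumed known measurement standard deviation

def log_posterior(mu):
    # Flat prior on mu: log posterior = normal log-likelihood up to a constant
    return -sum((y - mu) ** 2 for y in DATA) / (2 * SIGMA ** 2)

draws = metropolis(log_posterior, start=0.0, n_samples=20_000, step=0.3)[1_000:]  # drop burn-in
print(statistics.fmean(draws), statistics.stdev(draws))
```

With a flat prior, the exact posterior for the mean is normal with mean 5.1 and standard deviation 0.5/√5 ≈ 0.224, so the draws can be checked against it. Real problems rarely have such a closed form, which is where tools like Stan come in.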

RStudio Connect is an enterprise-grade publishing platform which gives you, the user, the ability to easily share code, documents and applications with collaborators, colleagues and clients. By the end of this course participants will be able to deploy their content to RStudio Connect, manage its access and settings, and tune how this content scales with usage.

- Programming Level: Intermediate
- Type: Management

Python (along with R) has become the dominant language in machine learning and data science. This two-day intensive course will equip you with the knowledge and tools to undertake a variety of tasks in a standard machine learning analytics pipeline. We stress the importance of data preparation, both in terms of data standardisation and feature selection, before tackling model building. We run a separate course on using TensorFlow and Keras with Python.

- Programming Level: Intermediate
- Type: Stats/ML
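Data standardisation typically means rescaling each feature to z = (x − mean) / sd, with the mean and standard deviation learned from the training data only and then reused on new data. The plain-Python sketch below assumes column-major lists of hypothetical features and uses the population standard deviation, the same convention scikit-learn's `StandardScaler` uses.

```python
import statistics

def fit_standardiser(columns):
    """Learn (mean, population sd) for each feature from the training data only."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in columns]

def transform(columns, params):
    """Apply z = (x - mean) / sd column by column; constant columns map to 0."""
    out = []
    for col, (mean, sd) in zip(columns, params):
        if sd == 0:
            out.append([0.0 for _ in col])
        else:
            out.append([(x - mean) / sd for x in col])
    return out

train = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]  # two hypothetical features
test = [[4.0], [40.0]]                         # unseen data, same features

params = fit_standardiser(train)
print(transform(train, params))
print(transform(test, params))
```

Fitting on the training set and only transforming the test set avoids leaking information from the test data into model building.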

Machine learning is the process of applying statistical techniques to gain systematic information about a quantity of interest. We will be specifically focusing on how we can use the {tidymodels} suite of packages to implement these techniques. We cover key reasons for model fitting, such as prediction and inference, on quantitative and qualitative responses.

- Programming Level: Intermediate
- Type: Stats/ML

We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces conditional exchangeability, marginal exchangeability, and multifactor modelling (also known as multilevel or random effects modelling) with a focus on efficient implementations. Completion of the Regression Modelling and Hierarchical Modelling modules is highly recommended.

- Programming Level: Advanced
- Type: Stats/ML

The training course will cover R object-oriented programming techniques. We’ll discuss what OOP is and the different varieties within R. Beginning with the popular S3 and S4 OOP frameworks, we’ll finish with the new {R6} package that is used extensively in Shiny applications. By the end of the course, participants will be able to use OOP within their own code.

- Programming Level: Advanced
- Type: Programming

We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. In this module we review a principled Bayesian workflow that guides the development of statistical models suited to the particular details of a given application. For full event information and booking details, please visit the event page.

- Programming Level: Advanced
- Type: Stats/ML

Deep learning is a cutting-edge machine learning technique for classification and regression. In the past few years, it has produced state-of-the-art results in fields such as image classification, natural language processing, bioinformatics and robotics. This course will cover the main ideas of deep learning, and how to implement it in practice with TensorFlow: a software framework for efficient and scalable deep learning.

- Programming Level: Intermediate
- Type: Programming

Python (along with R) has become the dominant language in machine learning and data science. PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook’s artificial-intelligence research group, and Uber’s “Pyro” software for probabilistic programming is built on it.

- Programming Level: Intermediate
- Type: Stats/ML

Jane produces weekly progress reports, as well as monthly, quarterly and annual overviews, for management and the board. She uses a variety of licensed software tools, because each one has its limitations. This course aims to take each individual through the fundamentals of using R programming in their current role. By the end of the course, the individual will be working towards automating all of their reports.

- Programming Level: Foundation, Intermediate
- Type: Programming, Analytics, Reporting

We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module presents linear and general linear regression techniques from a modelling perspective, using that context to motivate robust implementations. We will especially emphasize principled prior modelling strategies for linear, log, and logistic regression models. For full event information and booking details, please visit the event page.

- Programming Level: Advanced
- Type: Stats/ML

This course is aimed at statisticians and data scientists already familiar with a dynamic programming language (such as R, Python or Octave). Scala is a free, modern, powerful, strongly typed, functional programming language. In particular, it is fast and efficient, runs on the Java Virtual Machine (JVM), and is designed to exploit modern multi-core and distributed computing architectures easily.

- Programming Level: Advanced
- Type: Programming, Stats/ML

This course is a practical introduction to some of the everyday and more sophisticated tools used for the analysis of survival data.

- Programming Level: Intermediate
- Type: Stats/ML

This is a one-day course covering methods for tidy evaluation in R. We introduce the {rlang} package as a way of passing variables from a data set into a function. Furthermore, we cover {renv} and its uses in managing workflows by isolating your project’s R dependencies and managing library paths!

- Programming Level: Advanced
- Type: Programming

Predicting the future is a tough problem. Time series analysis makes it possible to assess whether or not predictions are possible and, if they are, to build a model which can generate informed predictions for the future with realistic estimates of uncertainty. This training course will introduce participants to the packages in the Tidyverts. “The best qualification of a prophet is to have a good memory.” – George Savile

- Programming Level: Intermediate
- Type: Stats/ML, Analytics

This is a half-day session that gives an overview of where and how R is used. Using a combination of lecture-based case studies and hands-on practicals, we’ll cover some of the latest developments in the R world. This course is intended to be interactive and is aimed at organisations that are considering why (or why not) to move to R.

- Programming Level: Foundation
- Type: Management

Moving you from data storage to data insights with our expert training courses.

Contact our support team if you have any questions about a specific course, or if you need a course tailored to your needs.