65th Annual Conference of the South African Statistical Association

Pre-Conference Workshops

Pre-conference workshops are only available with the 5-day registration package.

Prof Mohammad Arashi

Neural Networks and Deep Learning (with R and Python)

Date: 18 & 19 November 2024

Time: 08:30 - 13:00

For more information and workshop related content, click on the following link: https://bit.ly/4fupkZ2

Because of its capacity to handle and learn from massive volumes of data, deep learning (DL), a branch of machine learning within AI, has become a crucial technology for numerous applications. DL is widely used for handling complex data, for feature extraction and for its scalability. In this workshop we will study artificial neural networks and DL using R and Python. We begin with the fundamentals and an introduction to DL. We then cover shallow neural networks and build on previous knowledge of data analysis by using convolutional and recurrent neural networks in R and Python, assuming that the audience is familiar with both languages. Learning to work with neural networks in both languages is essential for those working in statistics and related fields. We also cover topics such as hyperparameter tuning, optimization, forward and backward propagation, and the mathematics behind DL.
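As a rough illustration of the shallow-network and propagation topics mentioned above, the following is a minimal sketch in plain Python, not workshop material. It assumes one tanh hidden layer, a sigmoid output, binary cross-entropy loss and full-batch gradient descent on the XOR problem; the function names, layer size, learning rate and epoch count are all illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, n_hidden, seed=0):
    """Random starting weights (illustrative initialisation, not a recommendation)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Forward propagation: input -> tanh hidden layer -> sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(p["W2"], h)) + p["b2"])
    return h, y_hat


def mean_loss(p, data):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        _, y_hat = forward(p, x)
        y_hat = min(max(y_hat, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return total / len(data)


def gradients(p, data):
    """Backward propagation: gradient of mean_loss for every parameter."""
    n_hidden, n_in = len(p["W1"]), len(p["W1"][0])
    g = {"W1": [[0.0] * n_in for _ in range(n_hidden)],
         "b1": [0.0] * n_hidden, "W2": [0.0] * n_hidden, "b2": 0.0}
    for x, y in data:
        h, y_hat = forward(p, x)
        d_out = (y_hat - y) / len(data)  # derivative w.r.t. output pre-activation
        g["b2"] += d_out
        for j in range(n_hidden):
            g["W2"][j] += d_out * h[j]
            d_hid = d_out * p["W2"][j] * (1 - h[j] ** 2)  # tanh derivative
            g["b1"][j] += d_hid
            for i in range(n_in):
                g["W1"][j][i] += d_hid * x[i]
    return g


def train(p, data, lr=0.5, epochs=2000):
    """Full-batch gradient descent; updates the parameters in place."""
    for _ in range(epochs):
        g = gradients(p, data)
        p["b2"] -= lr * g["b2"]
        for j in range(len(p["W1"])):
            p["W2"][j] -= lr * g["W2"][j]
            p["b1"][j] -= lr * g["b1"][j]
            for i in range(len(p["W1"][j])):
                p["W1"][j][i] -= lr * g["W1"][j][i]
    return p


XOR = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 0)]
params = init_params(n_in=2, n_hidden=4, seed=0)
loss_before = mean_loss(params, XOR)
train(params, XOR)
loss_after = mean_loss(params, XOR)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

In practice a framework such as Keras or PyTorch would compute these gradients automatically; writing them out by hand only serves to show what backward propagation does.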

Prof Carlos A. Coelho

Likelihood Ratio Tests in Multivariate Analysis whose statistics have quite simple finite form representations for their distribution

Date: 19 November 2024

Time: 08h30 – 13h00

For more information and workshop related content, click on the following link: https://bit.ly/4f9eMie

In this workshop a variety of Likelihood Ratio Tests (LRTs) that may be used in Multivariate Analysis are shown to have quite simple finite form representations for both the probability density and cumulative distribution functions (p.d.f. and c.d.f.) of their distributions. From the results of three theorems in the book ‘Finite Form Representations for Meijer G and Fox H Functions – Applied to Multivariate Likelihood Ratio Tests Using Mathematica®, Maxima and R’ (*) we can easily obtain rather simple expressions for the p.d.f.’s and c.d.f.’s of the exact distributions of several LRT statistics, which may then be used to compute exact quantiles and p-values.

We will address tests that cover

• the usual test for equality of mean vectors and the test for parallelism of profiles,

• the test of independence of two or more sets of variables,

• a test for outliers (each of the above tests for real as well as for complex random variables),

• the test of complete symmetrical spherical equivalence,

• the test of equality of mean vectors and profile parallelism with circular covariance matrices,

• the test of circularity of the covariance matrix,

• the simultaneous test of circularity of the covariance matrix and equality of means,

• the simultaneous test of independence of several sets of variables and the circularity of the covariance matrices.

For each test addressed we will deal with examples involving real datasets and we will use R functions (freely available to the users) to compute the value of the LRT statistic and to obtain the p-values and quantiles for each of the tests.

Facilitator: StatsNetSA (Statistics Supervision Network in South Africa)

Insights into NRF Grant Writing and Rating Applications

Date: 18 November 2024

Time: 08h30 – 17h00

For more information and workshop related content, click on the following link: https://bit.ly/40sq10I

This workshop will dive into the NRF grant application system and provide insight into how to write a successful grant application. The presenters, who sit on the relevant NRF panels, will discuss the rated, unrated, and Y-rated grant applications specifically, as well as how the applications are evaluated and scored. The workshop will nevertheless be beneficial for any grant application. In addition, a number of guest speakers will discuss NRF Rating applications, specifically for statisticians. Note that Y-rated researchers can apply for sabbatical and lecturer replacement funding through the NRF instruments. Early-career researchers in Statistics who have a large teaching workload and are struggling to find time to develop their research should consider this option. The workshop is aimed at early-career researchers in Statistics, but established researchers are also invited to join the discussion. During the workshop, we will assist with starting applications and with reviewing unsuccessful applications.

Mr Lucas van der Meer

Analyzing geospatial networks in R with sfnetworks

Date: 19 November 2024

Time: 08h30 – 13h00

For more information and workshop related content, click on the following link: https://luukvdmeer.github.io/sfnetworks-workshop/

Lucas van der Meer is a doctoral researcher in geoinformatics at the University of Salzburg in Austria. He obtained a bachelor's degree in spatial planning at the University of Groningen in The Netherlands, with an academic minor in mathematics and statistics. His master's degree in Geospatial Technologies was a joint degree from the University of Münster in Germany and the Nova Information Management School in Lisbon, Portugal. His research lies at the intersection of spatial data science and human behavioral science. It focuses on quantitative model development within human-centric urban planning practices, geospatial network analysis, and the assessment of sustainable transport accessibility in particular. Lucas is an advocate for open, reproducible science, and has authored multiple software packages in both R and Python.

Abstract

Geospatial networks are graphs embedded in geographical space. That means that both the nodes and edges in the graph can be represented as geographic features (e.g. points and lines) with a location somewhere on or near the surface of the earth. They play an important role in many different domains, ranging from transportation planning and logistics to ecology and epidemiology. The structure and characteristics of geospatial networks go beyond standard graph topology, and therefore it is crucial to explicitly take space into account when analyzing them. The R package sfnetworks was created to facilitate such an integrated workflow. It combines the forces of two popular R packages, sf for spatial data science and tidygraph for standard graph analysis, and extends them with functionalities that are specific to geospatial network analysis, such as geographic shortest path calculations, geospatial network cleaning, and topology modification. It also facilitates smooth integration with packages for statistical analysis on spatial linear networks, and is designed to fit seamlessly into tidy data wrangling workflows.

This workshop provides an introduction to the sfnetworks package for geospatial network analysis. We will start with simple examples on abstract dummy networks, and gradually move towards the analysis of real-world networks that we extract from OpenStreetMap. We will prepare several analytical tasks of varying difficulty to solve. If you are already working with geospatial networks, you are also encouraged to bring your own use cases.
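To make the idea of a geographic shortest path concrete, here is a small language-agnostic sketch written in plain Python. It does not use or imitate the sfnetworks API. It assumes a hypothetical dummy network whose nodes are (longitude, latitude) points, weights each edge by great-circle (haversine) distance with an assumed earth radius of 6371 km, and runs Dijkstra's algorithm over those weights.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def shortest_path(nodes, edges, source, target):
    """Dijkstra's algorithm on an undirected graph with geographic edge lengths."""
    graph = {name: [] for name in nodes}
    for u, v in edges:
        w = haversine_km(nodes[u], nodes[v])  # the edge weight comes from space
        graph[u].append((v, w))
        graph[v].append((u, w))

    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))

    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical dummy network: three nodes on one meridian plus a detour node.
nodes = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0), "D": (1.0, 1.0)}
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]

path, length_km = shortest_path(nodes, edges, "A", "C")
print(path, round(length_km, 1))
```

The topology alone cannot distinguish the two routes from A to C, since both have two edges. Only the geographic edge lengths make the route through B the shorter one, which is the reason for taking space into account explicitly.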

Prof Tanja Verster

SASA2024 Workshop: Credit Scorecard Development Tools

Date: 19 November 2024

Time: 14h00 – 17h00
For more information and workshop related content, click on the following link: https://bit.ly/3C6mrPH

This workshop has been designed to outline the high-level steps of credit scorecard development. Examples will be given in Excel. The focus will be an application scorecard within a retail banking environment, but the principles can be applied to any other type of scorecard (e.g. behavioural, collection, fraud scorecards). Note that although all the examples will be done in Excel, the logistic regression fit will be done in a choice of three software packages: SAS, Python or RStudio.

Prof Ding-Geng Chen

Dr Najmeh Nakhaeirad

Meta-Analysis and Network Meta-Analysis in Public Health Applications

Date: 18 November 2024

Time: 08h30 – 13h00
For more information and workshop related content, click on the following link: https://bit.ly/4f7OM70

This workshop provides a thorough presentation of models for meta-analysis and network meta-analysis for public health research and applications, with detailed step-by-step illustrations and implementation using R. The examples are compiled from the real health literature, and the analyses are illustrated step by step using the most appropriate R packages and functions. This should enable attendees to follow the logic and gain an understanding of the meta-analysis and network meta-analysis methods and their R implementation, so that they may use R to analyze their own data. Specifically, we start with an introduction to meta-analysis using both fixed-effects and random-effects models to incorporate within-study and between-study variation, as well as meta-regression to quantify heterogeneity and test the significance of heterogeneity among studies in a meta-analysis. These models will be illustrated using real data from studies on the efficacy of the Bacillus Calmette-Guerin (BCG) vaccine, along with the implementation in the commonly used R package “metafor”. We further discuss how to do network meta-analysis in the R package “netmeta”, using an example that compares 10 diabetes treatments for reducing blood glucose.
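The difference between the fixed-effects and random-effects models can be sketched in a few lines of plain Python. This is an illustration only and not the metafor interface. It assumes inverse-variance weighting and the DerSimonian-Laird estimator of the between-study variance, which is one common choice among several; the function name and the effect sizes below are hypothetical and are not the BCG data.

```python
import math


def pool_effects(effects, variances):
    """Pool study effects under fixed-effect and DerSimonian-Laird random-effects models."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of variation due to heterogeneity

    # Random-effects weights add tau2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    random_effect = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / sum_w),
        "random": random_effect,
        "random_se": math.sqrt(1.0 / sum_w_star),
        "Q": q,
        "tau2": tau2,
        "I2": i2,
    }


# Hypothetical log risk ratios and their sampling variances (illustrative only)
log_rr = [-0.9, -0.3, -1.2, 0.1]
var_log_rr = [0.10, 0.05, 0.20, 0.08]

for name, value in pool_effects(log_rr, var_log_rr).items():
    print(f"{name}: {value:.4f}")
```

When the studies disagree more than their sampling variances can explain, the estimated between-study variance becomes positive. The random-effects weights then become more equal across studies and the pooled standard error grows relative to the fixed-effects one.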

MDAG (Sugnet Lubbe, Niël J le Roux, Johané Nienkemper-Swanepoel, Raeesa Ganey, Ruan Buys, Zoë-Mae Adams and Peter Manefeldt)

User-friendly biplots in R with biplotEZ

Date: 18 November 2024

Time: 14:00 - 17:00

For more information and workshop related content, click on the following link: https://url.za.m.mimecastprotect.com/s/yp6uCk5jm7SORyjRLH2f8HG9Xyc

Biplots are valuable visualisation tools in exploratory data analysis. In their simplest form, biplots are regarded as generalised scatterplots for more than two variables. The rows of a data matrix are represented as sample points while the columns are represented as variable axes. Although the interpretation in terms of samples and variable axes dates from the work of Gower in the 1990s, the application has been limited by the availability of EZ-to-use software, which has restricted biplots to expert users. The need for an EZier-to-use package for practitioners wanting to visualise their data encouraged the development of a user-friendly R package. In this workshop we will look at the basic linear algebra behind popular forms of biplots: Principal Component Analysis (PCA), Canonical Variate Analysis (CVA) and biplots of Correspondence Analysis (CA), amongst others. You will be introduced to the main aspects of biplot methodology and receive access to the newly developed functions of the biplotEZ R package, with applications to real data in various contexts.