
## Department of Statistics

# 2019 Seminars


**A Kernel-Based Neural Network for High-dimensional Genetic Data Analysis**

Speaker: Qing Lu

Affiliation: Department of Biostatistics, University of Florida

When: Monday, 2 December 2019, 11:00 am to 12:00 pm

Where: 303-310

Artificial intelligence (AI) is a thriving research field with many successful applications in areas such as computer vision and speech recognition. Neural-network-based methods (e.g., deep learning) play a central role in modern AI technology. While neural-network-based methods also hold great promise for genetic research, the high-dimensionality of genetic data, the massive amounts of study samples, and complex relationships between genetic variants and disease outcomes bring tremendous analytic and computational challenges. To address these challenges, we propose a kernel-based neural network (KNN) method. KNN inherits features from both linear mixed models (LMM) and classical neural networks and is designed for high-dimensional genetic data analysis. Unlike the classic neural network, KNN summarizes a large number of genetic variants into kernel matrices and uses the kernel matrices as input matrices. Based on the kernel matrices, KNN builds a feedforward neural network to model the complex relationship between genetic variants and a disease outcome. Minimum norm quadratic unbiased estimation and batch training are implemented in KNN to accelerate the computation, making KNN applicable to massive datasets with millions of samples. Through simulations, we demonstrate the advantages of KNN over LMM in terms of prediction accuracy and computational efficiency. We also apply KNN to the large-scale UK Biobank dataset, evaluating the role of a large number of genetic variants on multiple complex diseases.

**Introduction to Template Model Builder (TMB)**

Speaker: Anders Nielsen

Affiliation:

When: Thursday, 14 November 2019, 11:00 am to 12:00 pm

Where: 303-G15

Template Model Builder[1] (TMB) is an R package for general optimization of statistical models developed by Kasper Kristensen from DTU Aqua. TMB is inspired by the program AD Model Builder[2,3] and, like AD Model Builder, it uses a combination of automatic differentiation and Laplace approximation to handle models with random effects. TMB is especially well suited for large nonlinear models with random effects because 1) the matrix operations are highly optimized, 2) the sparseness pattern is automatically detected, and 3) parallel computations are well supported. The workflow and optimization are controlled from within R, with only the joint likelihood itself written as a C++ function. TMB will be introduced via simple examples. It will be demonstrated how to define models, including passing data, defining model parameters, and reporting quantities of interest. The focus will be on TMB's features for models with random effects, including model validation[4] and validation of the Laplace approximation.

References:

[1] K Kristensen, A Nielsen, CW Berg, HJ Skaug, B Bell 2016. TMB: Automatic differentiation and Laplace approximation. Journal of Statistical Software 70 (5), 1-21.

[2] DA Fournier, HJ Skaug, J Ancheta, J Ianelli, A Magnusson, MN Maunder, A Nielsen, JR Sibert 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optimization Methods and Software 27 (2), 233-249.

[3] RB Millar 2011. Maximum likelihood estimation and inference: with examples in R, SAS and ADMB. John Wiley & Sons.

[4] UH Thygesen, CM Albertsen, CW Berg, K Kristensen, A Nielsen 2017. Validation of ecological state-space models using the Laplace approximation. Environmental and Ecological Statistics 24 (2), 317-339.
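As a rough illustration of the Laplace approximation mentioned in the abstract, the sketch below integrates out a single random effect in plain Python. It is not TMB code (TMB models are written in C++ and driven from R), and everything in it is an assumption made for illustration: the toy model (one Poisson count `y` with log-mean `b + u` and a normal random effect `u`), the parameter values, and the function names. The derivatives are written by hand here, whereas TMB would obtain them by automatic differentiation.

```python
import math


def neg_log_joint(u, y, b, sigma):
    """Negative log of p(y | u) * p(u) for one Poisson count and one random effect."""
    eta = b + u
    log_lik = y * eta - math.exp(eta) - math.lgamma(y + 1)
    log_prior = -0.5 * (u / sigma) ** 2 - 0.5 * math.log(2 * math.pi * sigma ** 2)
    return -(log_lik + log_prior)


def laplace_log_marginal(y, b, sigma, n_iter=50):
    """Laplace approximation to log of the integral of p(y | u) p(u) over u."""
    u = 0.0
    for _ in range(n_iter):
        # Newton steps towards the mode of the joint density in u.
        grad = math.exp(b + u) - y + u / sigma ** 2
        hess = math.exp(b + u) + 1.0 / sigma ** 2
        u -= grad / hess
    hess = math.exp(b + u) + 1.0 / sigma ** 2
    # Gaussian integral around the mode: -f(u_hat) + 0.5 * log(2 * pi / f''(u_hat)).
    return -neg_log_joint(u, y, b, sigma) + 0.5 * math.log(2 * math.pi / hess)


def quadrature_log_marginal(y, b, sigma, lo=-10.0, hi=10.0, n=20001):
    """Brute-force trapezoid rule, used only as a reference value."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * step
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-neg_log_joint(u, y, b, sigma))
    return math.log(total * step)


laplace_value = laplace_log_marginal(y=4, b=0.5, sigma=1.0)
reference_value = quadrature_log_marginal(y=4, b=0.5, sigma=1.0)
print(laplace_value, reference_value)
```

Comparing the two printed values gives a simple check of how well the Laplace approximation does on this toy model; with many random effects the brute-force integral is no longer feasible, which is where the approximation becomes useful.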

**Constrained Estimation for Correlated Data from Two-phase Designs**

Speaker: Yujin Kim

Affiliation: The University of Auckland

When: Wednesday, 13 November 2019, 11:00 am to 12:00 pm

Where: 303-310

In resource-limited settings, a cost-efficient approach is to implement a two-phase design and make use of big data information available from the first phase. This information is generally available from large data sources such as census data and disease registries, and it can also be obtained from public websites in the form of group-level information.

Some existing methods for the analysis of two-phase samples, such as calibration or estimation of sampling weights, have been shown to yield large improvements in efficiency, as has a recently proposed method called constrained maximum likelihood. This method builds regression models based on individual-level data from an internal study while using summary-level information from an external big data source (first phase). A set of general constraints is proposed to link the internal and external models. The methodology is cleaner because the investigator does not intervene in the process. The approach is well established, but only when participants are assumed independent. In situations where participants are naturally clustered (families, clinics), existing methods can yield invalid inference and conclusions. Thus, we aim to fill this gap by developing novel statistical methods that efficiently analyse clustered data from two-phase studies. This work aims to develop constrained maximum likelihood estimation for the case where the first phase is cluster-correlated and the external model is a cluster-level model.

**Statistical aspects of Close-Kin Mark-Recapture methods**

Speaker: Professor Hans Skaug

Affiliation: University of Bergen

When: Tuesday, 12 November 2019, 3:00 pm to 4:00 pm

Where: 303-G15

Mark-recapture methods have a long tradition in the study of wildlife populations. Taking advantage of modern genetics, one can generalize from "recapturing the same individual" to "recapturing a closely related kin", hence the name Close-Kin Mark-Recapture (CKMR). So far, CKMR has been successfully applied to a number of fish populations, but it also has potential for terrestrial populations. In this talk I will explain basic CKMR principles, and how these lead to a pseudo-likelihood for parameter estimation. Results from mathematical demography will be used. I will also compare CKMR to standard mark-recapture, and finally I will discuss optimal design of CKMR studies.

**Probabilistic Deep Learning in Autonomous Driving**

Speaker: Christian Hubschneider

Affiliation: FZI Research Center for Information Technology / Karlsruhe Institute of Technology

When: Wednesday, 6 November 2019, 11:00 am to 12:00 pm

Where: 303-310

Christian will talk about his work on bringing probabilistic mechanisms into the deep learning used for autonomous driving, utilising special neural network architectures and training procedures to obtain uncertainty measures from models that would otherwise have to be treated as black boxes. His application is training deep neural networks in an end-to-end fashion to achieve direct vehicle control and vehicle trajectory generation. To help place his work within the field of autonomous driving, the talk will also give an introduction to self-driving cars and autonomous driving, the kinds of sensors used in such vehicles, and the common architectural paradigms currently used and discussed in the scientific community and at car manufacturers.

**Spatial capture-recapture—applications to acoustic surveys of cetacean populations**

Speaker: David Chan

Affiliation: The University of Auckland

When: Wednesday, 23 October 2019, 11:00 am to 12:00 pm

Where: 303-310

Acoustic surveys are rapidly becoming one of the most common ways to assess cetacean populations, and can be vastly cheaper than alternatives such as visual surveys. This is because visual surveys have low detection rates, even under ideal conditions. In contrast, acoustic surveys can detect large numbers of vocalising individuals, if the monitoring devices and survey design are appropriate for the target population. Acoustic spatial capture-recapture (ASCR) can estimate call density from these surveys. ASCR has not been widely applied to acoustic surveys of cetacean populations, although it is acknowledged in the literature.

In this talk I will cover the current state of my PhD research on applying ASCR methods to acoustic surveys of cetacean populations. I will present an introduction to ASCR methods utilising the data from two case studies to illustrate key concepts of ASCR. This will be followed by describing the key findings of these two case studies and how they impact the future direction of my research. Other motivating problems in the field of ASCR will be briefly discussed as well.

**Risk prediction Modelling using multi-omics data**

Speaker: Xiaqiong Wang

Affiliation: The University of Auckland

When: Wednesday, 25 September 2019, 3:00 pm to 4:00 pm

Where: 303-610

Accurate disease risk prediction is an essential step towards precision medicine, an emerging model of health care that tailors treatment strategies to individuals' profiles. Recently emerging multi-layer omics data (e.g., genomic, transcriptomic, epigenomic and proteomic) provide unprecedented opportunities for investigating the predictive effects of biomarkers and their interplay at various molecular levels. However, their ultra-high dimensionality and complex intra- and inter-relationships have brought tremendous analytical challenges. Therefore, for risk prediction analysis on high-dimensional multi-omics data, it is important to develop a model that can efficiently select predictors and account for their complex relationships. In this talk, I will first introduce existing methods for dimension reduction and multi-omics data integration. Thereafter, I will discuss preliminary results comparing the existing methods. Finally, I will briefly discuss future work.

**SociaLab: A census-based simulation tool for public policy inquiry**

Speaker: Professor Peter Davis

Affiliation: University of Auckland

When: Wednesday, 25 September 2019, 11:00 am to 12:00 pm

Where: 303-310

It is usually neither practical nor ethical to conduct large-scale experiments in public policy with standard methodologies. One alternative for the prior testing of policy options is to use simulation, a prime contemporary example being climate change projections.

A tool – SociaLab – was developed for the counterfactual modelling of public policy drawing on longitudinal data from the New Zealand census and using microsimulation techniques.

SociaLab potentially provides an open-source tool for inquiry in policy development. It has now been fully written up in Simulating Societal Change, co-authored with Roy Lay-Yee, and published by Springer in the series Computational Social Sciences.

Peter Davis is Honorary Professor in the Department of Statistics and Emeritus Professor in Population Health and Social Science at the University of Auckland. Earlier in his career, he was the founding director of the Centre of Methods and Policy Application in the Social Sciences (COMPASS) in the Faculty of Arts and before that a health and applied sociologist in the Faculty of Medical and Health Sciences.

**Examining measures of family and household socioeconomic position in the New Zealand context.**

Speaker: Natalia Boven

Affiliation: Department of Statistics, The University of Auckland.

When: Wednesday, 18 September 2019, 11:00 am to 12:00 pm

Where: 303-310

Health and social science research tends to rely on individual-level measures of socioeconomic position. However, individuals are often embedded within families and households with shared resources and health risks. While some research has examined the adequacy of different methods of assigning socioeconomic position within families, much of this took place prior to recent increases in family complexity and female labour force participation. Furthermore, to my knowledge there is no research examining the adequacy of these methods in the New Zealand context.

This research will use the Integrated Data Infrastructure from Statistics New Zealand, drawing on the following data sets: the 2013 Census, MoH chronic conditions, MoH B4 school check, DIA birth records, MoE schools data, the personal details table and the estimated resident population table. The research will examine the adequacy of established methods of combining socioeconomic information for couples, parental units and households in the New Zealand context, using a range of outcomes. Additionally, the research will seek to develop methods to incorporate other adults in the household into household socioeconomic measures. All analyses will be stratified by age group, ethnic group, gender, and same-sex and opposite-sex couples (where sample sizes permit) due to potential differences in family structures and gender norms across groups. This research will be conducted as part of a PhD, and this presentation will outline the proposed research.

**Conditional likelihood for two-phase designs: continuous responses and relative risk.**

Speaker: Claudia Rivera-Rodriguez

Affiliation: The University of Auckland, Auckland, New Zealand

When: Wednesday, 11 September 2019, 11:00 am to 12:00 pm

Where: 303-310

C. Rivera-Rodriguez, G. Amorim

The University of Auckland, Auckland, New Zealand

Department of Biostatistics, Vanderbilt School of Medicine, U.S.A

*email: c.rodriguez@auckland.ac.nz

Summary:

Several methods can be used for the analysis of two-phase samples. A popular approach is weighted likelihood, which can be improved by adjusting the sampling weights using information available at the first phase. The best choice of adjusting variables is the influence functions, which are usually unknown and therefore need to be estimated. For adjusted weights, the method has been shown to be efficient, but it is prone to error and implementation can be complex. An alternative method is conditional maximum likelihood (CML) estimation, which is computationally more demanding, but does not require the imputation step. Another advantage is that non-experts can use it without interfering in the estimation process while still obtaining gains in efficiency. We implement a CML approach for the analysis of two-phase samples and for distributions that belong to the exponential family, both continuous and discrete. Parametric modeling of the sampling probabilities through calibration or estimation is also considered. We conduct simulation studies to evaluate the methods under different sampling designs using i) a continuous normal model and ii) a model for the relative risk. We apply the methods to a costing study of immunization in Brazil and a log-model for the relative risk from a subset of the Breast Cancer Surveillance Consortium (BCSC).

**Longitudinal analysis of dietary patterns**

Speaker: Beatrix Jones

Affiliation: Department of Statistics, The University of Auckland

When: Wednesday, 7 August 2019, 11:00 am to 12:00 pm

Where: 303-310

Joint work with Larisa Morales, Clare Wall, and John Thompson

When study participants complete a food frequency questionnaire, it is common to summarise their diet with a few principal components. We look at ways of extending this to track dietary consumption over time, in the context of the Auckland Birth Cohort Study. The children in this study had their diet assessed at ages 3.5, 7, 11, and 16. There are some changes to the questionnaire used over this time period, as well as to the typical foods eaten (e.g. coffee and tea figure much more prominently for 16-year-olds than 3.5-year-olds). As well as principal component based strategies, we consider JIVE (Joint and Individual Variation Explained, O’Connell and Lock, 2016), which was developed for looking at high throughput molecular data from different sources. We augment these strategies with visualisations to understand to what extent diets follow a predictable trajectory through childhood and adolescence.

**Latent socio-economic health and causal modelling**

Speaker: Fui Swen Kuh

Affiliation: Research School of Finance, Actuarial Studies and Statistics, College of Business and Economics, The Australian National University

When: Wednesday, 24 July 2019, 11:00 am to 12:00 pm

Where: 303-310

This research develops a model-based socio-economic health index for nations which incorporates spatial dependence and examines the impact of policies through a causal modelling framework. As Gross Domestic Product (GDP) has come to be regarded as a dated measure and tool for benchmarking a nation’s economic performance, there has been a growing consensus in favour of an alternative holistic performance measure [1]. Many conventional ways of constructing health indices involve combining different observable metrics to form an index [2,3]. However, health is inherently latent, with metrics actually being observable indicators of health. Much effort has been made to provide this alternative measure, but to our knowledge none so far uses a model-based approach to reflect the concept of latent health. In contrast to the GDP or other conventional health indices, our approach provides a holistic quantification of the overall ‘health’ of a nation. We extend the latent health factor index (LHFI) approach that has been used in assessing ecological/ecosystem health [4,5]. This framework integratively models the relationship between metrics, the unobservable latent health, and the covariates that drive the notion of health. In our work, the LHFI structure is integrated with spatial modelling and statistical causal modelling, so as to evaluate the impact of a policy variable (mandatory maternity leave days) on a nation’s socio-economic health, while formally accounting for spatial dependency among the 125 countries in 2015. We implement our model using data pertaining to different aspects of societal health and potential explanatory variables. The approach is structured in a Bayesian hierarchical framework and results are obtained by Markov chain Monte Carlo techniques.

Fui Swen Kuh*, Grace S. Chiu^, Anton H. Westveld*

Affiliations: *Research School of Finance, Actuarial Studies and Statistics, College of Business and Economics, The Australian National University

^William & Mary’s Virginia Institute of Marine Science

[1] Conceicao, P. and Bandura, R. (2008). Measuring subjective wellbeing: A summary review of the literature. United Nations Development Programme (UNDP) Development Studies, Working Paper.

[2] Sachs, J. D., Layard, R., Helliwell, J. F. et al. (2018). World Happiness Report 2018 Technical Report.

[3] United Nations Development Programme (2018). Human Development Indices and Indicators: 2018 Statistical Update Technical Report.

[4] Chiu, G. S., Guttorp, P., Westveld, A. H., Khan, S. A., & Liang, J. (2011). Latent health factor index: a statistical modelling approach for ecological health assessment. Environmetrics, 22(3), 243-255.

[5] Chiu, G. S., Wu, M. A., & Lu, L. (2013). Model-based assessment of estuary ecosystem health using the latent health factor index, with application to the Richibucto Estuary. PloS one, 8(6), e65697.

**Multiple and adaptive importance sampling schemes for Bayesian inference**

Speaker: Victor Elvira

Affiliation: School of Engineering, IMT Lille Douai, France

When: Wednesday, 3 July 2019, 11:00 am to 12:00 pm

Where: 303-310

In many problems of statistics and signal processing, the interest is in estimating unknown static variables given a set of observations. The hidden parameters and the available data are usually related through a specific model. Under the probabilistic Bayesian framework, the objective is more ambitious than simply calculating point-wise estimates of the unknowns and amounts to obtaining their posterior distributions. While for simple models the posterior distributions can be characterized in closed form, in most practical scenarios they cannot be computed. Importance sampling (IS) is an elegant, theoretically sound, and simple-to-understand methodology for approximating moments of distributions. The only condition is that the targeted distribution can be evaluated point-wise. The basic mechanism of IS consists of (a) drawing samples from simple proposal densities, (b) weighting the samples by accounting for the mismatch between the targeted and the proposal densities, and (c) approximating the moments of interest with the weighted samples. The performance of IS methods directly depends on the choice of the proposal functions. For that reason, the proposals have to be updated and improved over iterations so that samples are generated in regions of interest. In this talk, we will first introduce the basics of IS and multiple IS (MIS), motivating the need for adapting the proposal densities. Then, the focus will be on describing an encompassing framework of adaptive IS (AIS) algorithms available in the current literature, including a few recent methods. Finally, we will briefly present some numerical examples where we will study the performance of various algorithms.
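The three-step mechanism (a) to (c) described in the abstract can be sketched in a few lines of Python. This is only a minimal, non-adaptive illustration under assumed choices that do not come from the talk: the target is an unnormalised N(3, 1) density, the proposal is a wide N(0, 5²) density, and the self-normalised form of the estimator is used so that the target only needs to be evaluated point-wise up to a constant. All function names are hypothetical.

```python
import math
import random


def target_unnormalised(x):
    """Point-wise evaluation of the target, here N(3, 1) without its constant."""
    return math.exp(-0.5 * (x - 3.0) ** 2)


def proposal_pdf(x, mu=0.0, sigma=5.0):
    """Density of the simple proposal, here N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def importance_sampling_mean(n_samples, seed=0):
    rng = random.Random(seed)
    # (a) draw samples from the proposal density
    samples = [rng.gauss(0.0, 5.0) for _ in range(n_samples)]
    # (b) weight each sample by the target / proposal mismatch
    weights = [target_unnormalised(x) / proposal_pdf(x) for x in samples]
    # (c) approximate the moment of interest (the mean) with the weighted samples
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, samples)) / total


estimate = importance_sampling_mean(20000)
print(estimate)
```

With this deliberately poor proposal, most samples receive negligible weight, which is the kind of inefficiency that adapting the proposal is meant to reduce.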

**Designing learning tasks for statistical modelling that blend the use of GUI-driven and code-driven tools**

Speaker: Anna Fergusson

Affiliation: Department of Statistics, The University of Auckland

When: Wednesday, 19 June 2019, 11:00 am to 12:00 pm

Where: 303-310

The advent of data science has led to statistics education researchers re-thinking and expanding their ideas about tasks and tools for teaching and learning. A common thread in discussions about data science education is that students need to integrate both statistical and computational thinking, and that this necessitates students developing at least some computer programming skills. Although most of the recommendations for data science education are framed within the context of the tertiary level, within the context of high school statistics, teaching computational thinking and programming is also consistent with the goals of the New Zealand digital technologies curriculum. However, statistics has only recently emerged as a separate subject from mathematics at the high school level, and one of the enablers for this has been the increased affordability, availability and usability of computers. In particular, the teaching of simulation-based inference has been facilitated by software such as TinkerPlots™ and specialist tools such as VIT (Visual Inference Tools), which have gained popularity due to their intuitive design and use of animation, graphics and interactivity to support statistical thinking and reasoning. In considering the implementation of data science as a subject at high school level, an issue to resolve is what kinds of computational tools students should use to learn from data. Arguments exist for and against introducing the use of code-driven tools or retaining the use of GUI-driven tools. I will argue in this PhD proposal that rather than adopting a “one tool to rule them all” type mentality, both GUI-driven and code-driven tools should be used in combination. Within the learning area of simulation-based inference, as an example from my pilot study, I propose that education research has the potential to reveal and resolve the tensions that may exist when teaching both statistical and computational thinking.

**Optimal Multi-wave sampling for regression modelling**

Speaker: Tong Chen

Affiliation: University of Auckland

When: Wednesday, 29 May 2019, 1:00 pm to 2:00 pm

Where: 303-B07

Two-phase designs involve measuring extra variables on a subset of a cohort on which some variables have already been measured. The goal is to choose a subsample of people from the sampled sub-cohort and analyse that subsample efficiently. There is a large body of literature on statistical inference for two-phase designs. However, compared with estimation methods, much less attention has been paid to the design aspect. It is desirable to obtain an optimal design that leads to the most efficient estimation. In this talk, I will first introduce two-phase sampling and the corresponding estimation methods. Thereafter, I will present a multi-wave sampling strategy and what we currently know about optimal design. I will focus on design-based estimation without making strong assumptions about the model. Finally, I will briefly discuss future work.

**Visualisation and Non-parametric Mixture Density Estimation for Circular Data**

Speaker: Lois Xu

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 29 May 2019, 11:00 am to 12:00 pm

Where: 303-310

The talk consists of two parts: a new general visualisation method for circular data, and using non-parametric mixtures for their density estimation.

Circular data can be understood as points that lie on the unit circle in a 2-dimensional space, e.g., wind directions or animal migratory orientations. It is therefore more appropriate to display such data or their (relative) frequencies in a 2-dimensional space to accommodate their circular nature. However, it is common in the literature to directly use frequency values in a 2-dimensional space, and this results in a perspective distortion. In the first part, I will describe a new, general area-proportional method that does not suffer from this problem and can be easily used via a simple transformation formula to produce various circular plots, e.g., smooth density curves, histograms, rose diagrams, dot plots, and plots for multi-class data.

In the second part, I will describe how to use non-parametric mixtures for non-parametric density estimation with circular data. Unlike a finite mixture, a non-parametric mixture distribution mixes together component distributions with a mixing distribution that takes a completely unspecified form. This results in non-parametric density estimates that are much simpler in form than the kernel-based ones. I will also report some numerical studies and results.

**Addressing contemporary issues in fish stock assessment methods**

Speaker: Craig Marsh

Affiliation: The University of Auckland

When: Wednesday, 22 May 2019, 11:00 am to 12:00 pm

Where: 303-310

Fisheries management relies on statistical models to provide advice on sustainable harvesting of fish populations. This PhD focuses on integrated Statistical Age-Structured (iSAS) models, which are used to assess socially, culturally and economically important fish populations in New Zealand and worldwide. Age-structured models partition a population into age cohorts, and apply process dynamics that describe how age cohorts change over time, such as growth, recruitment, fishing, and maturation. State-space methodology extends historically applied iSAS models by incorporating process variation via time- and age-varying stock dynamics. This extra complexity has added modelling challenges regarding parameter identifiability and parameter biases. This PhD investigates these challenges through simulation studies. In particular, it looks at the model formulations and data conditions under which these issues, which have been documented in the literature, arise. As well as simulation-based investigation, this PhD plans to develop and apply an iSAS state-space model for hoki, an important commercial stock in New Zealand. This will extend the research beyond simulated data and demonstrate the applicability of state-space methodology to New Zealand iSAS models.

**Limiting distribution of particles near the frontier in the catalytic branching Brownian motion**

Speaker: Sergey Bocharov

Affiliation: Zhejiang University

When: Wednesday, 17 April 2019, 3:00 pm to 4:00 pm

Where: 303-610

Catalytic branching Brownian motion (catalytic BBM) is a spatial population model in which individuals, referred to as particles, move in space according to the law of standard Brownian motion and split in two at a spatially-inhomogeneous branching rate $\beta \delta_0(\cdot)$, where $\delta_0(\cdot)$ is the Dirac delta measure and $\beta > 0$ is some constant.

We shall discuss various asymptotic results concerning the spatial spread of particles in such a model and in particular show that the distribution of particles near the frontier converges to a Poisson point process with a random intensity.

**Data Science on Music Data**

Speaker: Prof. Claus Weihs

Affiliation: TU Dortmund University, Germany

When: Wednesday, 17 April 2019, 11:00 am to 12:00 pm

Where: 303-310

The talk discusses the structure of the field of data science and substantiates the key premise that statistics is one of the most important disciplines in data science and the most important discipline to analyze and quantify uncertainty. As an application, the talk demonstrates data science methods on music data for automatic transcription and automatic genre determination, both on the basis of signal-based features from audio recordings of music pieces.

Literature:

Claus Weihs and Katja Ickstadt (2018): Data Science: The Impact of Statistics; International Journal of Data Science and Analytics 6, 189–194

Claus Weihs, Dietmar Jannach, Igor Vatolkin and Günter Rudolph (Eds.) (2017): Music Data Analysis: Foundations and Applications; CRC Press, Taylor & Francis, 675 pages

**Bayesian modelling of forensic evidence**

Speaker: Jason Wen

Affiliation: Department of Statistics, University of Auckland

When: Wednesday, 27 March 2019, 3:00 pm to 4:00 pm

Where: 303-610

The term trace evidence is used to describe (usually non-biological) evidence left at a crime scene, or perhaps recovered from a person of interest (POI). There are many questions of interest, but they usually ultimately devolve to “How much more likely does this evidence make it that the accused is guilty?” The answer to this question, and the central quantity of interest for statisticians involved in the statistical interpretation of evidence, is the likelihood ratio. This is usually given as

$$\mathrm{LR} = \frac{\Pr(\text{Evidence} \mid H_p)}{\Pr(\text{Evidence} \mid H_d)}$$

where $H_p$ and $H_d$ are competing explanations for the evidence. Evaluation of the LR depends on statistical models which represent forensic expert knowledge, and incorporate experimental and observational data. My thesis is concerned with developing and refining models in a number of trace evidence disciplines.

In this presentation, I will present my initial work on a Bayesian semi-parametric model (an infinite mixture model) for the distribution of recovered glass. This work can be seen as an expansion, or refinement, of models for the denominator of the likelihood ratio.

I will also outline the directions I intend to take in my future work.

**Bayesian demographic estimation and forecasting**

Speaker: John Bryant

Affiliation: Research Associate, University of Waikato

When: Wednesday, 27 March 2019, 11:00 am to 12:00 pm

Where: 303-310

Demographers face some tricky statistical problems. Users of demographic statistics are demanding estimates and forecasts that are as disaggregated as possible, including not just age and sex, but also ethnicity, area, income, and much else besides. Administrative data, and big data more generally, offer new possibilities. But these new data sources tend to be noisy, incomplete, and mutually inconsistent. The talk will describe a long-term project to develop Bayesian methods, and the associated software, to address these problems. We will look at some concrete examples, including a probabilistic demographic account for New Zealand.

About the Speaker:

John Bryant has worked at universities in New Zealand and Thailand, at the New Zealand Treasury, and (until February 2019) at Stats NZ. He has a PhD in Demography from the Australian National University. He is the author, with Junni Zhang, of the book Bayesian Demographic Estimation and Forecasting, published by CRC Press in 2018.

**Empowering Signal Processing Using Machine Learning: Applications in Speech Reconstruction and Forensic Investigations**

Speaker: Hamid Sharifzadeh

Affiliation: Senior Lecturer, School of Computing at Unitec Institute of Technology

When: Wednesday, 20 March 2019, 11:00 am to 12:00 pm

Where: 303-310

Advances in machine learning find rapid adoption in many fields ranging from communications, signal processing, and the automotive industry to healthcare, law, and forensics. In this talk, we focus on two research projects revolving around machine learning and signal processing: a) forensic investigation through vein pattern recognition, b) speech reconstruction for aphonic patients. Both research projects rely on cutting-edge machine learning algorithms applied to speech and image processing, one for the rehabilitation of post-laryngectomised patients and the other for helping law enforcement agencies identify criminals/victims in child exploitation material.

Bio: Hamid Sharifzadeh is currently a Senior Lecturer in the School of Computing at Unitec Institute of Technology. Hamid completed his Ph.D. at Nanyang Technological University (NTU), Singapore in 2012. Following the completion of his studies, he undertook two postdoctoral fellowships at NTU focussing on the areas of speech and image processing before joining Unitec as a Lecturer in 2014.

**Multi-kernel linear mixed model with adaptive lasso for complex phenotype prediction**

Speaker: Yalu Wen

Affiliation: The University of Auckland

When: Wednesday, 6 March 2019, 11:00 am to 12:00 pm

Where: 303-310

Linear mixed models (LMMs) and their extensions have been widely used for prediction purposes with high-dimensional genomic data. However, LMMs used to date have lacked theoretical justification for selecting disease predictive regions and failed to account for complex genetic architectures. In this work we present a multi-kernel linear mixed model with adaptive lasso (KLMM-AL) to predict phenotypes using high-dimensional data. We developed an efficient algorithm for parameter estimation and also established the asymptotic properties when only one dependent observation is available. The proposed KLMM-AL can account for heterogeneous effect sizes from different genomic regions, capture both additive and non-additive genetic effects, and adaptively and efficiently select predictive genomic regions. Through simulation studies, we demonstrate that KLMM-AL is overall comparable to, if not better than, most of the existing methods, and that it achieves high sensitivity and specificity in selecting predictive genomic regions. KLMM-AL is further illustrated by an application to a real data set.

**Variable Selection and Dimension Reduction methods for high dimensional and Big-Data Set.**

Speaker: Benoit Liquet

Affiliation: University of Pau & Pays de L'Adour, ACEMS (QUT)

When: Wednesday, 27 February 2019, 11:00 am to 12:00 pm

Where: 303-310

It is well established that incorporating prior knowledge of the structure in the data, for potential grouping of the covariates, is key to more accurate prediction and improved interpretability. In this talk, I will present new multivariate methods that incorporate grouping structure, in both Bayesian and frequentist methodology, for variable selection and dimension reduction in the analysis of high-dimensional and big data sets.

**Data depth and generalized sign tests**

Speaker: Christine Mueller

Affiliation: Department of Statistics, University of Dortmund, Germany

When: Wednesday, 20 February 2019, 11:00 am to 12:00 pm

Where: 303-310

Data depth is one of the approaches to generalize the outlier-robust median to more complex situations. In this talk, I show how this can be done for multivariate data and regression. The concepts of half-space depth and simplicial depth are crucial for the generalization to multivariate data, while the concept of nonfit was used for defining regression depth and simplicial regression depth. Simplicial regression depth often leads to a so-called sign depth and corresponding generalized sign tests. These generalized sign tests can be used as soon as residuals are available and are much more powerful than the classical sign test. I demonstrate this for

**Critical issues in recent guidelines**

Speaker: Prof Markus Neuhaeuser

Affiliation: Dept. of Mathematics and Technology, Koblenz University of Applied Sciences, Remagen, Germany

When: Tuesday, 12 February 2019, 11:00 am to 12:00 pm

Where: 303-B09

To increase rigor and reproducibility, some medical journals provide detailed guidelines for experimental design and statistical analysis. Although this development is positive, quite a few recommendations are problematic because they reduce the power or are indefensible from a statistical point of view. This is shown using two current examples, namely the checklist of the journal Circulation Research published in 2017 [Circulation Research. 2017; 121:472-9] and the guideline of the British Journal of Pharmacology published in 2018 [British Journal of Pharmacology. 2018; 175:987-93]. Topics discussed are the analysis of variance in case of heteroscedasticity, the question of balanced sample sizes, the power calculation including so-called post-hoc power analyses, minimum group sizes, and the t test for small samples.

https://www.hs-koblenz.de/en/profilepages/profil/neuhaeuser/

**Modelling block arrivals in the Bitcoin blockchain**

Speaker: Peter Taylor

Affiliation: University of Melbourne

When: Wednesday, 30 January 2019, 3:00 pm to 4:00 pm

Where: 303-610

In 2009 the pseudonymous Satoshi Nakamoto published a short paper on the Internet, together with accompanying software, that proposed an ‘electronic equivalent of cash’ called Bitcoin. At its most basic level, Bitcoin is a payment system where transactions are verified and stored in a distributed data structure called the blockchain. The Bitcoin system allows electronic transfer of funds without the presence of a trusted third party. It achieves this by making it ‘very hard work’ to create the payment record, so that it is not computationally feasible for a malicious player to subsequently repudiate a transaction and create the forward history without it.

The Nakamoto paper contained a simple stochastic model, used to show that the above-mentioned malicious player would be very unlikely to succeed. Unfortunately, this calculation contained an error, which I shall discuss and show how to correct.
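For orientation, the calculation in the Nakamoto paper can be sketched as follows. This is a Python transcription of the formula as published in that paper, so it reproduces the original calculation and not the correction discussed in the talk. The function and argument names are illustrative: `q` is the attacker's share of the total mining power and `z` is the number of blocks by which the attacker trails.

```python
import math


def attacker_success_probability(q, z):
    """Probability that an attacker ever catches up from z blocks behind,
    following the formula published in the Nakamoto paper.

    q: attacker's fraction of total mining power (assumed 0 < q < 0.5)
    z: number of blocks the attacker is behind (non-negative integer)
    """
    if not 0.0 < q < 0.5:
        raise ValueError("this sketch assumes 0 < q < 0.5")
    p = 1.0 - q                      # honest miners' fraction
    lam = z * (q / p)                # Poisson mean of the attacker's progress
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


for z in (0, 1, 2, 5, 10):
    print(z, attacker_success_probability(0.1, z))
```

Under this formula the probability falls off quickly as `z` grows, which is the basis for the claim that the malicious player would be very unlikely to succeed.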

The Bitcoin payment record is stored in a data structure called the blockchain. Blocks are added to this structure by ‘miners’ working across a distributed peer-to-peer network to solve a computationally difficult problem. With reference to historical data, I shall describe the block mining process, and present a second stochastic model that gives insight into the block arrival process.

Finally, I shall make some brief comments about how stochastic modelling can be used to address the current concerns that the transaction processing rate of the Bitcoin system is not high enough.

**Bursty Markovian Arrival Processes**

Speaker: Azam Asanjarani

Affiliation: The University of Auckland

When: Wednesday, 23 January 2019, 3:00 pm to 4:00 pm

Where: 303-610

We consider stationary Markovian Arrival Processes (MAPs) where both the squared coefficient of variation of inter-event times and the asymptotic index of dispersion of counts are greater than unity. We refer to such MAPs as bursty. The simplest bursty MAP is a Hyperexponential Renewal Process (H-renewal process). Applying matrix analytic methods (MAM), we establish further classes of MAPs as bursty MAPs: the Markov Modulated Poisson Process (MMPP), the Markov Transition Counting Process (MTCP) and the Markov Switched Poisson Process (MSPP). Of these, MMPP has been used most often in applications, but as we illustrate, MTCP and MSPP may serve as alternative models of bursty traffic. Hence understanding MTCPs, MSPPs, and MMPPs and their relationships is important from a data modelling perspective. We establish a duality in terms of first and second moments of counts between MTCPs and a rich class of MMPPs which we refer to as slow-MMPPs (modulation is slower than the events).
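To make the burstiness condition concrete for the simplest case, the sketch below computes the squared coefficient of variation (SCV) of a hyperexponential inter-event time, i.e. a mixture of exponentials. The function name and the example rates are illustrative assumptions. For a renewal process the asymptotic index of dispersion of counts equals this SCV, so one number covers both conditions in that case.

```python
def hyperexponential_scv(probs, rates):
    """Squared coefficient of variation of a mixture of exponentials.

    probs: mixing probabilities (assumed to sum to 1)
    rates: rate of the exponential chosen with each probability
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    mean = sum(p / r for p, r in zip(probs, rates))
    second_moment = sum(2.0 * p / r ** 2 for p, r in zip(probs, rates))
    return second_moment / mean ** 2 - 1.0


# A single exponential phase gives SCV = 1 (the Poisson process).
print(hyperexponential_scv([1.0], [2.0]))

# Mixing a slow and a fast phase (hypothetical rates) gives SCV > 1.
print(hyperexponential_scv([0.5, 0.5], [1.0, 10.0]))
```

The further apart the rates are, the larger the SCV, which matches the intuition of short gaps arriving in bursts separated by long pauses.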

**Critical two-point function for long-range models with power-law couplings: The marginal case for $d\ge d_c$**

Speaker: Akira Sakai

Affiliation: Department of Mathematics, Hokkaido University, Japan

When: Wednesday, 16 January 2019, 3:00 pm to 4:00 pm

Where: 303-610

Consider the long-range models on $\mathbb{Z}^d$ of random walk, self-avoiding walk, percolation and the Ising model, whose translation-invariant 1-step distribution/coupling coefficient decays as $|x|^{-d-\alpha}$ for some $\alpha>0$. In the previous work (Ann. Probab., 43, 639--681, 2015), we have shown in a unified fashion for all $\alpha\ne2$ that, assuming a bound on the "derivative" of the $n$-step distribution (the compound-zeta distribution satisfies this assumed bound), the critical two-point function $G_{p_c}(x)$ decays as $|x|^{\alpha\wedge2-d}$ above the upper-critical dimension $d_c\equiv(\alpha\wedge2)m$, where $m=2$ for self-avoiding walk and the Ising model and $m=3$ for percolation.

In this talk, I will show in a much simpler way, without assuming a bound on the derivative of the $n$-step distribution, that $G_{p_c}(x)$ for the marginal case $\alpha=2$ decays as $|x|^{2-d}/\log|x|$ whenever $d\ge d_c$ (with a large spread-out parameter $L$). This solves the conjecture in the previous work, extended all the way down to $d=d_c$, and confirms a part of predictions in physics (Brezin, Parisi, Ricci-Tersenghi, J. Stat. Phys., 157, 855--868, 2014). The proof is based on the lace expansion and new convolution bounds on power functions with log corrections.