Student research opportunities
Mining Big Data
Project Code: CECS_845
This project is available at the following levels:
Honours, PhD
Keywords:
Gaussian processes, Big data, spatio-temporal modelling
Supervisor:
Dr Warren Jin
Outline:
The era of Big Data has begun: Web service companies, retailers, government and science agencies are talking about capturing, managing, and processing data on the scale of a terabyte or more per day. For example, to understand how species adapt to climate change in Australia, ecologists would like to see how species respond to high-resolution weather data (say, on a 250 m by 250 m grid) across the continent. Such big data sets are difficult to process using on-hand data management tools or traditional data processing applications. Most scalable data mining techniques, such as sparse Gaussian process regression and the Gaussian predictive process, have a computational complexity of O(n) for a data set of size n. To be practical, mining big data needs techniques with far lower computational and storage complexity. This project will investigate various approximation techniques, such as variational inference and feature extraction, to close this gap.
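To make the linear-in-n cost concrete, the following is a minimal sketch of one sparse Gaussian process regression scheme, the subset-of-regressors approximation with m inducing inputs. It is an illustration only, not the method this project will develop. The squared-exponential kernel, the fixed hyperparameters, the evenly spaced inducing inputs, and the function names `rbf_kernel` and `sor_predict_mean` are all assumptions made for the example.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of a (p x d) and b (q x d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def sor_predict_mean(x_train, y_train, x_test, inducing, noise_var=0.01, jitter=1e-8):
    """Subset-of-regressors predictive mean with m inducing inputs.

    The dominant cost is the product K_mn @ K_mn.T, which takes O(n m^2)
    time and O(n m) memory: linear in n for fixed m, but not sublinear.
    """
    k_mm = rbf_kernel(inducing, inducing)
    k_mn = rbf_kernel(inducing, x_train)
    k_sm = rbf_kernel(x_test, inducing)
    # Solve the m x m system (noise_var * K_mm + K_mn K_nm) w = K_mn y.
    a = noise_var * k_mm + k_mn @ k_mn.T + jitter * np.eye(len(inducing))
    w = np.linalg.solve(a, k_mn @ y_train)
    return k_sm @ w


if __name__ == "__main__":
    # Synthetic 1-D data (an assumption for the demo): noisy samples of sin(x).
    rng = np.random.default_rng(0)
    n, m = 2000, 15
    x = rng.uniform(0.0, 10.0, size=(n, 1))
    y = np.sin(x[:, 0]) + 0.1 * rng.standard_normal(n)
    z = np.linspace(0.0, 10.0, m)[:, None]      # inducing inputs on a grid
    x_new = np.linspace(1.0, 9.0, 50)[:, None]  # test inputs
    mean = sor_predict_mean(x, y, x_new, z)
    print("max abs error vs sin(x):", np.max(np.abs(mean - np.sin(x_new[:, 0]))))
```

Because every training point still enters the product `k_mn @ k_mn.T`, the cost grows linearly with n even though only an m by m system is solved, which is the limit a sublinear technique would have to overcome.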
Goals of this project
The project will develop sophisticated modelling and computational techniques for big data whose computational complexity is sublinear in the data size n. The techniques developed will be applicable to various important environmental problems such as daily climate projection for Australia, subseasonal forecasting, and climate change adaptation, to name just a few. The project will also have an impact on these areas by combining sophisticated statistical modelling techniques with modern computational techniques.
Requirements/Prerequisites
- Applicants are expected to have a strong background in statistics/maths and/or machine learning.
- An interest in environmental problems.
- Preferably a strong background in statistical machine learning.
- Preferably excellent programming skills (R, Python or C/C++).
Student Gain
A student working on this project can expect:
- to learn state-of-the-art data mining techniques;
- to learn state-of-the-art statistical computation techniques;
- to be involved in developing cutting-edge techniques to handle real-world environmental challenges while working with a research group delivering great science and innovative solutions for Australian society and the economy.
Background Literature
- Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition.
- Khandoker Shuvo Bakar, Philip Kokic and Huidong Jin. Hierarchical spatially and temporally varying coefficient process models using spTimer version 2.0. Under review, 2014
- Park, Trevor and George Casella. "The Bayesian Lasso." Journal of the American Statistical Association Volume 103, Issue 482 (2008): 681-686.
- Hensman, James, Magnus Rattray, and Neil D. Lawrence. "Fast variational inference in the conjugate exponential family." In Advances in Neural Information Processing Systems, pp. 2888-2896. 2012.
- Porcu et al. (eds.), Advances and Challenges in Space-time Modelling of Natural Events. Springer, 2012.
- Hoffman, Matthew D., David M. Blei, Chong Wang, and John Paisley. "Stochastic variational inference." The Journal of Machine Learning Research 14, no. 1 (2013): 1303-1347.






