Title: Optimal Subsampling Algorithm for Big Data Generalized Linear Models
Speaker: Professor Ai Mingyao (艾明要)
Time: Friday, April 19, 2019, 15:00-16:00
Venue: Conference Room 422, School of Finance
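About the speaker:
Ai Mingyao (艾明要), male, received his Ph.D. from Nankai University in 2003 and has worked at the School of Mathematical Sciences, Peking University, ever since. From August 2007 to January 2009 he was a visiting scholar in the Department of Industrial and Systems Engineering at the Georgia Institute of Technology, USA. He is currently Director of the Statistics Teaching and Research Section, professor, and doctoral supervisor at the School of Mathematical Sciences, Peking University. He also serves as Secretary-General of the Chinese Society of Probability and Statistics; Executive Council Member of the Chinese Association for Applied Statistics, Chair of its Experimental Design Branch, and Vice-Chair of its High-Dimensional Data Statistics Branch; associate editor of the international statistics journals Statistica Sinica, Journal of Statistical Planning and Inference, Statistics and Probability Letters, and STAT; editorial board member of the Chinese core journal 《系統科學與數學》 (Journal of Systems Science and Mathematical Sciences); and editorial board member of the Science Press book series 《統計與數據科學系列叢書》 (Statistics and Data Science Series).
His main teaching and research interests are the design and analysis of experiments, computer experiments, big data analysis, and applied statistics. He has published more than sixty papers in leading domestic and international journals, including Ann Statist, JASA, Biometrika, Technometrics, and Statist Sinica. He has led and completed five General Program projects and one Key Program sub-project of the National Natural Science Foundation of China, and has participated in and completed two 973 Program projects of the Ministry of Science and Technology.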
Abstract:
To quickly approximate the maximum likelihood estimator (MLE) with massive data, this paper studies the optimal subsampling method under the A-optimality criterion for generalized linear models (GLMs). The consistency and asymptotic normality of the estimator from a general subsampling algorithm are established, and optimal subsampling probabilities under the A- and L-optimality criteria are derived. Furthermore, using a Frobenius-norm matrix concentration inequality, finite-sample properties of the subsample estimator based on the optimal subsampling probabilities are also derived. Since the optimal subsampling probabilities depend on the full-data estimate, an adaptive two-step algorithm is developed. Asymptotic normality and optimality of the estimator from this adaptive algorithm are established. The proposed methods are illustrated and evaluated through numerical experiments on simulated and real datasets.
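To make the adaptive two-step idea concrete, the following is a minimal sketch for logistic regression, one particular GLM. It is not the speaker's algorithm: the abstract does not state the optimal probabilities, so the score |y_i - p_i| * ||x_i|| used here is an assumed stand-in of the kind commonly seen in the subsampling literature. The function names, the pilot size `r0`, and the second-stage size `r` are all hypothetical.

```python
import numpy as np


def weighted_logistic_mle(X, y, w, n_iter=50, tol=1e-8):
    """Weighted logistic-regression MLE via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def two_step_subsample_estimate(X, y, r0, r, rng):
    """Two-step subsampling sketch (assumed form, not the paper's exact method).

    Step 1: draw a uniform pilot subsample of size r0 and fit a pilot estimate.
    Step 2: use the pilot estimate to build data-dependent sampling
            probabilities, draw r more points, and fit an
            inverse-probability-weighted MLE on the combined subsample.
    """
    n = X.shape[0]

    # Step 1: uniform pilot subsample (each point has probability 1/n).
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Step 2: assumed score |y_i - p_i| * ||x_i||, normalized to sum to one.
    p = 1.0 / (1.0 + np.exp(-X @ beta0))
    scores = np.abs(y - p) * np.linalg.norm(X, axis=1)
    pi = scores / scores.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)

    # Inverse-probability weights: n for pilot draws, 1/pi_i for step-2 draws.
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.full(r0, float(n)), 1.0 / pi[idx1]])
    w = w / w.mean()  # rescaling does not change the maximizer
    return weighted_logistic_mle(X[idx], y[idx], w), pi
```

The inverse-probability weights are what keep the subsample estimator targeting the full-data MLE even though the second-stage draws deliberately over-represent points with large scores.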