Title: Optimal Subsampling Algorithm for Big Data Generalized Linear Models
Speaker: Prof. Mingyao Ai (艾明要)
Time: Friday, April 19, 2019, 15:00-16:00
Venue: Conference Room 422, School of Finance
About the speaker:
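Mingyao Ai received his Ph.D. from Nankai University in 2003 and has worked at the School of Mathematical Sciences, Peking University, ever since. From August 2007 to January 2009 he was a visiting scholar in the Department of Industrial and Systems Engineering at the Georgia Institute of Technology, USA. He is currently head of the statistics teaching and research section, professor, and doctoral supervisor at the School of Mathematical Sciences, Peking University. He also serves as Secretary-General of the Chinese Society of Probability and Statistics; executive council member of the Chinese Association for Applied Statistics, chair of its Experimental Design branch, and vice chair of its High-Dimensional Data Statistics branch; associate editor of the international statistics journals Statistica Sinica, Journal of Statistical Planning and Inference, Statistics and Probability Letters, and STAT; editorial board member of the Chinese core journal Journal of Systems Science and Mathematical Sciences (《系统科学与数学》); and editorial board member of the Science Press book series "Statistics and Data Science".
His teaching and research focus on the design and analysis of experiments, computer experiments, big data analysis, and applied statistics. He has published more than sixty papers in leading journals in China and abroad, including Ann Statist, JASA, Biometrika, Technometrics, and Statist Sinica. He has led five completed General Program projects and one Key Program sub-project of the National Natural Science Foundation of China, and has taken part in two completed 973 Program projects of the Ministry of Science and Technology.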
Abstract:
To quickly approximate the maximum likelihood estimator (MLE) for massive data, this paper studies optimal subsampling under the A-optimality criterion for generalized linear models (GLMs). The consistency and asymptotic normality of the estimator from a general subsampling algorithm are established, and optimal subsampling probabilities under the A- and L-optimality criteria are derived. Furthermore, using a matrix concentration inequality for the Frobenius norm, finite-sample properties of the subsample estimator based on the optimal subsampling probabilities are derived. Since the optimal subsampling probabilities depend on the full-data estimate, an adaptive two-step algorithm is developed. Asymptotic normality and optimality of the estimator from this adaptive algorithm are established. The proposed methods are illustrated and evaluated through numerical experiments on simulated and real datasets.
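The adaptive two-step idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the speaker's algorithm: it takes logistic regression as the GLM, and it assumes subsampling probabilities proportional to |y_i - p_i| * ||x_i||, an L-optimality-style form that the abstract itself does not spell out. The function names `weighted_logistic_mle` and `two_step_subsample` and the subsample sizes `r0` and `r` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_logistic_mle(X, y, w, max_iter=100, tol=1e-10):
    """Maximize the weighted logistic log-likelihood by Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def two_step_subsample(X, y, r0, r, rng):
    """Pilot estimate from a uniform subsample, then a second subsample
    drawn with probabilities that depend on the pilot estimate."""
    n = X.shape[0]

    # Step 1: uniform pilot subsample of size r0 gives a rough estimate beta0.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_logistic_mle(X[idx0], y[idx0], np.ones(r0))

    # Step 2: plug beta0 into the ASSUMED probabilities
    # pi_i proportional to |y_i - p_i(beta0)| * ||x_i||.
    score = np.abs(y - sigmoid(X @ beta0)) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx1 = rng.choice(n, size=r, replace=True, p=pi)

    # Combine both subsamples and weight each point by its inverse sampling
    # probability (uniform draws have probability 1/n, so their weight is 1
    # after scaling by 1/n).
    idx = np.concatenate([idx0, idx1])
    w = np.concatenate([np.ones(r0), 1.0 / (n * pi[idx1])])
    beta = weighted_logistic_mle(X[idx], y[idx], w)
    return beta, pi
```

Calling `two_step_subsample(X, y, r0=500, r=2000, rng=np.random.default_rng(0))` on a design matrix `X` with an intercept column and a 0/1 response `y` returns the subsample estimate together with the probabilities used in the second step. The inverse-probability weights are what keep the estimator targeting the full-data MLE even though the second subsample is drawn non-uniformly.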