《计算机应用研究》|Application Research of Computers

Survey on performance modeling of database systems

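Authors Zhang Jinwen, Niu Baoning
Affiliation School of Computer Science & Technology, Taiyuan University of Technology, Taiyuan 030024, China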
Article ID 1001-3695(2019)03-001-0641-08
DOI 10.19734/j.issn.1001-3695.2018.01.0060
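Abstract Current performance models fall into two main types, analytical and statistical. Analytical models require an in-depth study of the query execution process of the database system. They adapt well to dynamic queries and need no costly sampling experiments, but modeling is complex when queries execute concurrently, and different database systems require different theoretical models. Statistical models do not analyze the query execution process; instead, they collect query execution parameters and use them to train a mathematical model. Building a statistical model is simple, and such models describe query interactions well and predict accurately, but sampling is very costly and they adapt poorly to dynamic queries. This paper surveys the main literature on database system performance modeling, focusing on the main modeling methods, and discusses the strengths and weaknesses of the two types of models, the difficulties of modeling, and strategies for addressing them. On this basis, it gives an outlook on research in database system performance modeling as a reference for work in this field.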
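The statistical approach described in the abstract (collect execution parameters, then train a mathematical model) can be illustrated with a minimal sketch. It assumes a single feature, the number of concurrently running queries, and a linear model fitted by ordinary least squares; the sample values and the names `fit_line` and `predict_latency` are hypothetical and are not taken from the surveyed work.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical samples: (number of concurrent queries, observed latency in seconds).
samples = [(1, 2.0), (2, 3.1), (4, 4.9), (8, 9.2)]
a, b = fit_line([n for n, _ in samples], [t for _, t in samples])


def predict_latency(concurrent_queries):
    """Predict latency from the fitted line; only meaningful near the sampled range."""
    return a + b * concurrent_queries


print(round(predict_latency(6), 2))
```

The sketch also shows why sampling cost matters: every entry in `samples` stands for a measured run on the real system, and a query type that was never sampled has no reliable prediction.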
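Keywords database system performance model; database system performance management; query interaction; query scheduling; machine learning
Funding National Natural Science Foundation of China (61572345)
Article URL http://www.arocmag.com/article/01-2019-03-001.html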
English title Survey on performance modeling of database systems
Authors (English) Zhang Jinwen, Niu Baoning
Affiliation (English) School of Computer Science & Technology, Taiyuan University of Technology, Taiyuan 030024, China
English abstract Current performance models fall into two categories: analytical models and statistical models. Analytical models are built by studying the query execution process of a specific database system. They adapt well to dynamic queries and need no costly sampling experiments. However, describing the execution of concurrent queries is complex, and a different theoretical model has to be developed for each database system. In contrast, statistical models predict the performance of a database system by training a mathematical model on data collected from that system, which avoids studying the query execution process in detail and gives better prediction accuracy. Because they rely on sampling, statistical models are costly to build and adapt poorly to dynamic queries. This paper surveys the literature on performance modeling of database systems, with emphasis on the major modeling methods. It then discusses the pros and cons of the two types of models, the challenges in building them, and the corresponding approaches for addressing those challenges. Finally, it discusses future research directions in database performance modeling.
English keywords database system performance model; database system performance management; query interaction; query schedule; machine learning
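As a contrast with trained models, an analytical model derives its prediction from the query's execution steps. The following minimal sketch assumes a model that multiplies per-operation counts from a query plan by per-operation time costs; the classes `CostUnits` and `PlanCounts`, the function `estimate_time`, and all numeric values are hypothetical illustrations, not a method taken from the surveyed work.

```python
from dataclasses import dataclass


@dataclass
class CostUnits:
    """Assumed time per basic operation, in seconds (system-specific)."""
    seq_page: float   # one sequential page read
    rand_page: float  # one random page read
    cpu_tuple: float  # processing one tuple


@dataclass
class PlanCounts:
    """Assumed operation counts for one query plan."""
    seq_pages: int
    rand_pages: int
    tuples: int


def estimate_time(counts: PlanCounts, units: CostUnits) -> float:
    """Estimate execution time as the sum of count * unit cost per operation type."""
    return (counts.seq_pages * units.seq_page
            + counts.rand_pages * units.rand_page
            + counts.tuples * units.cpu_tuple)


# Hypothetical unit costs for one system and counts for one query plan.
units = CostUnits(seq_page=1e-4, rand_page=1e-3, cpu_tuple=1e-6)
plan = PlanCounts(seq_pages=1000, rand_pages=10, tuples=50000)
print(estimate_time(plan, units))
```

No sampling run is needed to score a new query, which matches the adaptability to dynamic queries noted in the abstract. The unit costs and the formula itself are tied to one system, and the sketch ignores interference between concurrent queries, which is the part the abstract identifies as hard to model analytically.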
Received 2018/1/8
Revised 2018/3/7
Pages 641-648,656
CLC number TP311.13
Document code A