Dec 11

Position paper – The role of statistics for Big Data, Data Literacy, Machine Learning, AI, Analytics and Data Science

The role of statistics

Why the digitized information and knowledge society needs statistical skills

Understanding data and extracting knowledge from data are of growing importance for science, the economy and society. Deficits are being met with initiatives and programs on topics such as Data Literacy, Statistical Literacy, Artificial Intelligence and Data Engineering. Data Science is establishing itself as a new scientific field that spans parts of computer science, statistics and mathematics and pursues solutions for data analysis and knowledge extraction from new digital data (Big Data).

As the established science of data analysis and inference, statistics must take a central role in these processes and structures. Without statistical know-how, these efforts risk missing their goals.

The German Statistical Society understands Data Literacy and Data Science as integral parts of the domain of statistics, has incorporated this into its own structure, and promotes research and education on these topics.

The increased demand for skills and abilities in the fields of Data Literacy, digitalization, Big Data, data analysis and Data Science requires substantial contributions from statistics. Developing and teaching the required knowledge belongs in the hands of qualified statisticians.

Therefore, the German Statistical Society calls for a comprehensive expansion of statistics in research and teaching (including statistical consulting) at universities and colleges.

Data, and the strongly growing need to understand them and to extract information, knowledge and insight from them, are now more ubiquitous than ever. The problems and deficits in the availability of analytical skills that society, business and politics perceive, some of which are real, are regarded as critical factors. Problems and solutions are discussed under the terms Data Literacy, Statistical Literacy, Data and Business Analytics, Artificial Intelligence (AI), Data Engineering, Knowledge Engineering and Data Science. In the science system this has already led to extensive activities and initiatives under these labels, ranging from 100 additional AI chairs and new research structures to new training opportunities and degree programs, the latter often with a distinct online and e-learning component.

This process is largely driven by demand-side actors, who often have no relevant statistical training. It raises questions about the role and importance of statistics, in particular in relation to Data Science.

The data sciences (Data Science) are forming internationally as a scientific field that is interdisciplinary and driven by applications in business, large parts of the sciences and official statistics. Their concerns are the research and application of scientific methodologies for gaining information and knowledge from data, and data-driven solutions through the preparation, processing, analysis and inference of very large, high-dimensional data sets (Big Data, new digital data). The field of artificial intelligence pursues this in particular with the aim of extracting intelligent behavior from data for self-optimizing AI systems, especially with methods of machine learning and deep learning networks. This requires skills that in the past were often available only in distributed form across the subjects of computer science, statistics and mathematics.

Although the understanding of Data Science has grown out of this problem setting and is still in flux, models, methods and findings from statistics and computer science, as well as from optimization, numerics and the field of application (domain knowledge), are always drawn on, used and developed further in a problem-related way. Important special problems can usefully be researched independently within the individual disciplines, and have already enriched them substantially and sustainably and in parts reshaped them. Nevertheless, Data Science as a scientific topic is shaped by interdisciplinary thinking and draws its special profile from it.

The German Statistical Society has taken these developments, which have also been developments of statistics itself, into account and has firmly integrated Data Science, alongside Computational Statistics and Statistical Literacy, into its subject structure and annual meetings. The development and classification of Data Science and the special role of statistics, as well as literacy issues, have also been discussed in the Society's committees and at its annual meetings.

Statistics occupies a central position in the data sciences. It is the science and practical discipline that made possible the solution of humanity's first Big Data problems, censuses and population statistics. Building on probability theory, it has always researched core issues of data understanding and knowledge extraction, namely data description, data exploration and data analysis, as well as sampling theory and statistical inference. Its theoretical foundations were laid and applied in practice before the beginning of the computer age. Statistics has not only delivered essential foundations for the theory of many machine learning methods through Statistical Learning Theory, but has also developed, with random forest classifiers and bagging methods, some of the machine learning procedures most commonly used today for data analysis and prediction. These, as well as further developments of classical statistical methods, are now used extensively in areas such as medical diagnostics, business analytics, image processing and autonomous systems. With interpretable models, well-founded approaches for quantifying uncertainty and assessing replicability, and substantial advances in statistical inference for Big Data analysis, modern statistics and stochastics also decisively support current developments and research trends. Statistical expertise is also relevant in many places for improving algorithms and understanding them. For example, statistical cross-validation is an important tool in the training phase of deep learners for achieving good generalization.


The specialization of statistics as a discipline on individual scientific fields, and the incorporation of expert knowledge from those fields, has led to the establishment of sub-disciplines such as biometry/biostatistics, environmental statistics, industrial statistics and econometrics. Technological developments in computing and the digitization of society, the economy and empirical research were taken up in statistics at an early stage and have significantly changed many of its subfields. In particular, Computational Statistics emerged as a new sub-discipline. Likewise, High-Dimensional Statistics and Statistical Machine Learning have become established as research fields and, in the form of an extended range of methods, have found their way into application-related areas (especially econometrics, empirical economic research, biostatistics and technical statistics). Important improvements in hardware and software also make it possible to establish and apply significantly more complex stochastic models and methods, together with the statistical theory they require.

Many of the issues and problems addressed under the term Data Science find their natural links and scientific citation base in these developments of statistics, although they certainly also pose new challenges. This applies both to disciplinary research in mathematical and applied statistics and stochastics and to interdisciplinary research and data analysis. Econometrics, industrial statistics, education and teaching, and official statistics deserve particular mention. It is notable that independent initiatives on Data Science or Data Literacy arise especially in those scientific fields in which the triumph of statistics in the last century did not take lasting hold in the form of established statistics professorships or the formation of an independent statistical sub-discipline.

The valid analysis of large data sets requires substantial contributions from statistics and thus qualified statisticians who have comprehensive expertise in areas such as machine learning, data privacy and data literacy, parallel computing, algorithmics and optimization. A purely algorithmic viewpoint, which understands Data Science as an engineering or craft branch of computer science and reduces data processing to an algorithmic perspective, falls well short and will not be able to reach self-defined goals such as uncertainty quantification or the explainability of AI. In particular, it runs the risk of ignoring a fundamental insight from the theory of science: interpreting results and evaluating the uncertainty of data require statistics in the form of scientific, falsifiable models that take the mechanisms of data generation into account.

Statistics shows the possibilities and limits of knowledge extraction from data and thus provides the basis for a critical approach to data. Statistics was and is the science of gaining knowledge from data, and any data science is unthinkable without statistics.

Against the background of an analysis of recently launched degree programs and established programs, nationally and internationally, as well as newly created research structures at renowned institutions, the German Statistical Society makes the following recommendations and demands:

Positions and recommendations of the German Statistical Society in detail:

  • As a professional society for theoretical, applied and practical statistics, which also represents statisticians with expertise from fields of application, the German Statistical Society understands Data Science and Data Literacy as integral parts of its field.
  • The German Statistical Society advocates and calls for the expansion of statistics at universities through the establishment of new professorships and positions for research assistants, in order to meet the substantially increased demand for skills and abilities in the fields of statistics, digitalization, Data Literacy and Data Science in teaching and research.

    This is particularly necessary so that the substantially growing teaching and training needs do not come at the expense of research quality.

    An increase in positions is especially necessary in economics and engineering faculties and degree programs, in order to do justice to the increased importance of the topic for Germany as a business location.
  • Statistical consulting and non-curricular statistical training courses at universities must be expanded as needed.
  • Teachers and researchers trained and qualified in statistics must be involved in teaching and research structures for Data Science, Machine Learning, AI and Data Literacy (e.g. funding programs, degree programs, doctoral programs, research associations, research programs) and in digitized teaching and training offerings (e-learning). They should be involved in management and coordination. Building structures in teaching and research with no connection to existing statistical institutes or professorships and without significant statistical expertise does not meet the requirements. The development and teaching of data analysis skills belong in the hands of qualified statisticians. The German Statistical Society strongly recommends taking this into account in further development and, where necessary, making corrections.
  • Standardized, broad-based (for example interdisciplinary) online learning offerings on topics such as Data Science and Data Literacy are to be welcomed as a complementary offering. They must not restrict teachers or the variety of teaching, but should enrich it, and they cannot replace specialist teaching.

PDF file for download: Position Paper German Statistical Society