Since 2018, the Project Data Science and Big Data at School (ProDaBi) has been developing teaching units for secondary level and professional development courses for teachers, in collaboration with teachers and facilitators. The learning materials are being evaluated, and the accompanying research follows a design research paradigm. It focuses on learning opportunities and obstacles for students and teachers.
The project was initiated and is funded by the “Deutsche Telekom Stiftung”. It is associated with the German Centre for Mathematics Teacher Education and is situated at Paderborn University.
Project Phase 1: 2018-2019 (ProDaBi I)
Project Phase 2: 2019-2023 (ProDaBi II)
Co-directors: Rolf Biehler (Mathematics and Statistics Education) and Carsten Schulte (Computer Science Education)
Since the school year 2018/2019, Paderborn University (Didactics of Mathematics, Didactics of Computer Science) has been conducting a curriculum development and research project on the topic of Data Science and Big Data for secondary school level with funding from the “Deutsche Telekom Stiftung”. The first evaluation took place in the context of a project course in cooperation with the Gymnasium Theodorianum, Paderborn.
The project work was preceded by an international symposium, “Perspectives for data science education at school level – Educational contributions from statistics, computer science and sociocultural studies”, held on 13–15 November 2017 in Paderborn (Proceedings: https://doi.org/10.17619/UNIPB/1-374).
Overview of teaching modules and courses
Grade | Topic |
---|---|
12 | Year-long project course “Big Data and Data Science”: Data exploration, machine learning, conducting a data science project |
8-10 | Units for about 12-20 lessons |
5-6 | Units for about 4-8 lessons |
Publications
- Biehler, R. (2019). Software for learning and for doing statistics and probability – Looking back and looking forward from a personal perspective. In J. M. Contreras, M. M. Gea, M. M. López-Martín, & E. Molina-Portillo (Eds.), Proceedings of the Third International Virtual Congress of Statistical Education. University of Granada. www.ugr.es/local/fqm126/civeest.html
- Biehler, R., Budde, L., Frischemeier, D., Heinemann, B., Podworny, S., Schulte, C., & Wassong, T. (Eds.). (2018). Paderborn Symposium on Data Science Education at School Level 2017: The Collected Extended Abstracts. Paderborn: Universitätsbibliothek Paderborn. https://doi.org/10.17619/UNIPB/1-374
- Biehler, R., & Fleischer, Y. (2021). Introducing students to machine learning with decision trees using CODAP and Jupyter Notebooks. Teaching Statistics, 43(S1), S133-S142. https://doi.org/10.1111/test.12279
- Biehler, R., Fleischer, Y., Budde, L., Frischemeier, D., Gerstenberger, D., Podworny, S., & Schulte, C. (2020). Data science education in secondary schools: Teaching and learning decision trees with CODAP and Jupyter Notebooks as an example of integrating machine learning into statistics education. In P. Arnold (Ed.), New Skills in the Changing World of Statistics Education: Proceedings of the Roundtable conference of the International Association for Statistical Education (IASE), July 2020. ISI/IASE.
- Biehler, R., Frischemeier, D., Podworny, S., Wassong, T., Budde, L., Heinemann, B., & Schulte, C. (2018). Data Science and Big Data in Upper Secondary Schools: A Module to Build up First Components of Statistical Thinking in a Data Science Curriculum. Archives of Data Science, Series A (Online First), 5(1), 28. http://doi.org/10.5445/KSP/1000087327/28
- Biehler, R., & Schulte, C. (2018). Perspectives for an interdisciplinary data science curriculum at German secondary schools. In R. Biehler, L. Budde, D. Frischemeier, B. Heinemann, S. Podworny, C. Schulte, & T. Wassong (Eds.), Paderborn Symposium on Data Science Education at School Level 2017: The Collected Extended Abstracts (pp. 2-14). Universitätsbibliothek Paderborn. https://doi.org/10.17619/UNIPB/1-374
- Budde, L., Frischemeier, D., Biehler, R., Fleischer, Y., Gerstenberger, D., Podworny, S., & Schulte, C. (2020). Data science education in secondary school: How to develop statistical reasoning when exploring data using CODAP. In P. Arnold (Ed.), New Skills in the Changing World of Statistics Education: Proceedings of the Roundtable conference of the International Association for Statistical Education (IASE), July 2020. ISI/IASE.
- Frischemeier, D., Biehler, R., Podworny, S., & Budde, L. (2021). A first introduction to data science education in secondary schools: Teaching and learning about data exploration with CODAP using survey data. Teaching Statistics, 43(S1), S182-S189. https://doi.org/10.1111/test.12283
- Heinemann, B., Opel, S., Budde, L., Schulte, C., Frischemeier, D., Biehler, R., Podworny, S. & Wassong, T. (2018). Drafting a Data Science Curriculum for Secondary Schools. Proceedings of the 18th Koli Calling International Conference on Computing Education Research – Koli Calling ’18, (17), 1–5. http://doi.org/10.1145/3279720.3279737
More information about the teaching modules
Human vs. Machine! – Game

The game was developed in the context of the “Science Year 2019” and allows young people to experience how an AI system works by illustrating how a “simulated AI” gets “better and better” at playing the game “Hexapawn”.
You can download the material for the game here.
The Project Course “Big Data and Data Science” (Grade 12)
The development of the project course formed the basis for the development of further teaching modules on the topic of data science and artificial intelligence (machine learning). The general goal of the project course is to introduce upper secondary school students to data science and artificial intelligence within the framework of a meaningful and authentic data project that aims to improve the traffic flow in Paderborn in order to reduce emissions. In this project, the students collaborate with the local administrations, which makes for authentic project work with real clients who are interested in deploying the results.
In this course, the students develop their own machine learning model for predicting the occupancy of several parking lots in Paderborn for the next few hours, in order to help users find a parking space faster. The focus is not only on the product, but also on documenting the data science process in a so-called computational essay, in order to make it reproducible and transparent for others. The CRISP-DM cycle is used as a basis. Python and Jupyter Notebooks are used as computational tools. CODAP (https://codap.concord.org) is used in the introductory phase of the course.
The project course was improved in three development cycles.
Collaborative development of modules for teaching at lower secondary level (grades 5-6 and 9-10) – including professional development courses for teachers
Based on these experiences, the developed material and the accompanying research, we started a collaborative project with regional school administrations in the state of North Rhine-Westphalia in 2020. Together with a group of experienced facilitators, new versions of our teaching modules were developed for younger students. These modules are being tested in various classrooms and form the basis of professional development courses starting in the fall of 2021.
Grades 8-10
1. Data Science and Artificial Intelligence (without programming)
In this teaching module, two teaching units introduce Data Science through data exploration and through decision trees as a machine learning method.
Unit 1: In the first unit, students explore survey data on youth media use (including anonymized data from their own school) using the web-based data science tool CODAP (https://codap.concord.org), with a focus on elementary methods for data analysis and data exploration. The students examine the multivariate data set with respect to several questions and develop analyses and interpretations in a project-like manner. They present a statistical report at the end of the unit.
Unit 2: In the second unit, the results from the first unit can be used to make predictions about various questions using the decision tree method. The fictitious application context is an online platform that aims to place targeted advertisements for youth on behalf of various clients and uses media use data already collected from the students. The focus is on learning and understanding decision trees as a data-based decision model. The students first create decision trees intuitively and manually using the CODAP software. Afterwards, the students systematize the creation process step by step to understand how an algorithm can create decision trees automatically. Finally, the final decision models are evaluated on test data. The confusion matrix, different kinds of misclassification and fairness issues are discussed.
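The step from manual to systematic tree building, followed by evaluation on test data, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the material used in the module: the records, the variable names (`console`, `streaming`, `target`) and the helper functions are all invented for this example. The sketch builds a one-level tree by choosing the variable whose majority rule misclassifies the fewest training cases, then tabulates a confusion matrix on separate test cases.

```python
from collections import Counter

# Hypothetical, invented records; "target" is the label the tree should predict.
TRAIN = [
    {"console": "yes", "streaming": "daily",  "target": "gamer"},
    {"console": "yes", "streaming": "rarely", "target": "gamer"},
    {"console": "yes", "streaming": "daily",  "target": "gamer"},
    {"console": "no",  "streaming": "daily",  "target": "non-gamer"},
    {"console": "no",  "streaming": "rarely", "target": "non-gamer"},
    {"console": "no",  "streaming": "daily",  "target": "gamer"},
]
TEST = [
    {"console": "yes", "streaming": "rarely", "target": "gamer"},
    {"console": "no",  "streaming": "daily",  "target": "non-gamer"},
    {"console": "yes", "streaming": "daily",  "target": "non-gamer"},
    {"console": "no",  "streaming": "rarely", "target": "gamer"},
]


def majority_rule(rows, feature):
    """Map each value of `feature` to the most common target among matching rows."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], Counter())[row["target"]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in groups.items()}


def misclassified(rows, feature, rule):
    """Count rows whose predicted label differs from the actual label."""
    return sum(1 for row in rows if rule.get(row[feature]) != row["target"])


def best_split(rows, features):
    """Pick the feature whose one-level tree misclassifies the fewest rows."""
    scored = []
    for feature in features:
        rule = majority_rule(rows, feature)
        scored.append((misclassified(rows, feature, rule), feature, rule))
    errors, feature, rule = min(scored, key=lambda item: item[0])
    return feature, rule, errors


def confusion_matrix(rows, feature, rule):
    """Count (actual, predicted) pairs for the given one-level tree."""
    return Counter((row["target"], rule.get(row[feature])) for row in rows)


feature, rule, train_errors = best_split(TRAIN, ["console", "streaming"])
print("split on:", feature, "rule:", rule, "training errors:", train_errors)
for (actual, predicted), count in sorted(confusion_matrix(TEST, feature, rule).items()):
    print(f"actual={actual:10s} predicted={predicted:10s} count={count}")
```

The off-diagonal cells of the confusion matrix separate the two kinds of misclassification, which is the starting point for discussing which error is more costly in a given context.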
2. Data Science and Artificial Intelligence (with programming)
This teaching module comprises two units on artificial intelligence, each of which introduces a machine learning method (decision trees, artificial neural networks). In addition to examples that illustrate basic concepts of machine learning, Jupyter Notebooks with Python are used in an age-appropriate way.
Unit 1: At the beginning, the students learn about the rough structure of an ideal machine learning process (supervised learning) using image recognition as an example. The students then investigate such a process in more detail on a less complex dataset about beetles. With the help of prepared Jupyter Notebooks that use interactive widgets, the students first explore this data set and then the process of creating a decision tree. Based on data, decision trees are first created manually and later automatically. Students then interpret and evaluate the decision trees they have created. Finally, students apply their knowledge to another, somewhat more complex data set.
Unit 2: At the beginning, the students learn about the rough structure of an ideal machine learning process (supervised learning) using image recognition as an example. They then learn about the structure of an artificial neural network in an unplugged activity that introduces the components and functions of neural networks. Next, this structural knowledge is linked to the image recognition example, and the ideas about how the network functions are elaborated. Then, using prepared Jupyter Notebooks, students examine a neural network for a less complex dataset about beetles, on which they can build up ideas about the learning process of a neural network. Finally, students apply their knowledge to another data set and create a neural network for image recognition using prepared Jupyter Notebooks.
Students can use our Jupyter Notebooks with different levels of programming knowledge. Several options are offered depending on the available teaching time. Teachers need basic knowledge of Python programming and of Jupyter Notebooks in order to adapt our notebooks to their teaching and to explain algorithmic details if students ask for or need this knowledge.
3. Data Awareness – Data, Individual and Society
This teaching module consists of one teaching unit in two parts. It aims to create an awareness of the role of data in everyday situations, looking at the interactions between data-driven technologies and both the individual and society. In this way, we connect different areas of computer science education.
Part 1: The first, shorter part of the teaching module is an exploration of real location data (see the first module for grades 5/6). In this part, the students examine this data in a prepared Jupyter Notebook and find out as much information as possible about the initially unknown person.
Part 2: In the second part of the teaching module, students create their own recommendation system for movies in a prepared Jupyter Notebook using real data with explicit movie ratings. The benefits and risks of recommendation services are reflected upon during the teaching units and also in the final lesson.
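To make the idea of a recommendation system built on explicit ratings concrete, here is a minimal sketch of one common approach, user-based collaborative filtering. It is an assumption about how such a system could work, not the notebook used in the module: the users, movies, ratings and function names are invented, and cosine similarity over shared movies is one simple choice among several.

```python
from math import sqrt

# Hypothetical explicit ratings on a 1-5 scale; users and movies are invented.
RATINGS = {
    "ana":  {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "ben":  {"Movie A": 4, "Movie B": 5, "Movie D": 5},
    "cleo": {"Movie A": 1, "Movie C": 5, "Movie D": 2, "Movie E": 3},
}


def similarity(a, b):
    """Cosine similarity over the movies both users rated (0.0 if none shared)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    own = ratings[user]
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = similarity(own, other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for movie, rating in other_ratings.items():
            if movie not in own:
                totals[movie] = totals.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    scores = {movie: totals[movie] / weights[movie] for movie in totals}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for movie, score in recommend("ana", RATINGS):
    print(f"{movie}: predicted rating {score:.2f}")
```

Because the prediction for one person is computed entirely from other people's ratings, the sketch also shows why such services depend on collecting data from many users, which ties in with the reflection on benefits and risks.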
4. Data Projects and Data Exploration: Collecting and Analyzing Environmental Data with Sensors
This teaching module offers two variants of a series of lessons in which students use professional tools to carry out their own data analysis with data they have collected themselves with sensors. Environmental data such as CO2, particulate matter, temperature or humidity data from their own environment is used. In one variant, the students develop and program the measuring instrument (an Arduino with various measuring sensors) on their own. In the other variant, they use a measuring instrument that has already been prepared, or download data from their own environment via the website https://opensensemap.org.
The students develop their own research questions, which they would like to pursue in the data analysis. Then, suitable data are collected using the measuring instruments, which they may have built themselves, and are analyzed with the support of a prepared Jupyter Notebook. The students create their own Jupyter Notebook in which they perform and comment on the data analysis step by step and describe and interpret the results and visualizations. Through this self-directed data analysis, students gain new insights into programming, data analysis and exploration, and their own environment.
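One analysis step of the kind described above can be sketched as follows. The example assumes a hypothetical research question ("In which hours is the CO2 level in our classroom high?"), invented readings, and an illustrative threshold of 1000 ppm. None of these come from the module itself, and the function names are made up for this sketch.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical CO2 readings in ppm as (ISO timestamp, value); the numbers are invented.
READINGS = [
    ("2021-03-01T08:05:00", 620),
    ("2021-03-01T08:35:00", 780),
    ("2021-03-01T09:10:00", 1150),
    ("2021-03-01T09:40:00", 1250),
    ("2021-03-01T10:15:00", 700),
]


def hourly_means(readings):
    """Group readings by hour of day and return the mean value per hour."""
    by_hour = defaultdict(list)
    for stamp, value in readings:
        by_hour[datetime.fromisoformat(stamp).hour].append(value)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def hours_above(readings, threshold):
    """Return the hours whose mean reading exceeds the threshold."""
    return [hour for hour, avg in hourly_means(readings).items() if avg > threshold]


for hour, avg in hourly_means(READINGS).items():
    print(f"{hour:02d}:00  mean CO2 = {avg} ppm")
print("hours above 1000 ppm:", hours_above(READINGS, 1000))
```

In a notebook, each of these steps would be followed by a short written interpretation, for example whether the high values coincide with lessons in a closed room.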
Grades 5/6
1. Where, how and for what purpose is data collected and processed? – Exploration of location data
In this teaching module, the context of location data in the mobile telephone network is addressed. First, the structure and functioning of the mobile telephone network are discussed with the help of a puzzle in order to explore why location data is collected, something users are usually not aware of. The telephone company needs this location data, but its use is regulated by very strict laws. Students should experience the dangers that would arise if such data were publicly available. For this purpose, they work with data from a data protection activist who published his own telephone location data some years ago. As part of the teaching module, the students explore these real location data with the help of an interactive web application and create a profile of the person from whom the data was collected. This usually results in various interpretations that can be profitably discussed. The collection and possible processing of users' location data is reflected upon and discussed on both an individual and a societal level.
2. Data as a model of the world – How we learn about nutrition with food data cards and understand machine learning with decision trees
Data cards have a strong tradition as a way of introducing young children to elementary multivariate statistics, based on an enactivist learning paradigm. Building on these experiences, we created a card game with data cards that carry information on various nutrition variables of food. Food is classified as recommendable or not recommendable based on the “Big 7” nutritional values. With the help of playing cards about foods with the corresponding nutritional values per 100 g, students gradually work out one-level and later two-level decision trees by hand. These decision trees are validated with test cards. The knowledge built up in the process is used to understand how decision trees can be created from data and subsequently used as rule systems. Finally, students reflect on how a computer could create decision trees automatically and how these can be used to classify previously unknown cases.
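How a computer might automate the one-level case can be sketched briefly. The following is an illustration under assumptions, not the card game's actual data or rules: the foods, their values per 100 g, the labels and the function names are invented. The sketch tries every observed value of every variable as a threshold for the rule "recommendable if value <= threshold", keeps the rule with the fewest misclassified cards, and then checks it against test cards.

```python
# Hypothetical food cards: values per 100 g and labels are invented for illustration.
CARDS = [
    {"name": "apple",           "sugar": 10, "fat": 0,  "ok": True},
    {"name": "carrot",          "sugar": 5,  "fat": 0,  "ok": True},
    {"name": "wholemeal bread", "sugar": 4,  "fat": 2,  "ok": True},
    {"name": "gummy bears",     "sugar": 46, "fat": 0,  "ok": False},
    {"name": "chocolate",       "sugar": 50, "fat": 30, "ok": False},
    {"name": "crisps",          "sugar": 1,  "fat": 35, "ok": False},
]
TEST_CARDS = [
    {"name": "banana", "sugar": 17, "fat": 0, "ok": True},
    {"name": "cola",   "sugar": 11, "fat": 0, "ok": False},
]


def errors(cards, variable, threshold):
    """Count cards misclassified by the rule: recommendable if value <= threshold."""
    return sum(1 for card in cards if (card[variable] <= threshold) != card["ok"])


def best_rule(cards, variables):
    """Try every observed value of every variable as a threshold; keep the best.

    Ties are broken by variable name and then by threshold, purely for determinism.
    """
    candidates = [
        (errors(cards, variable, card[variable]), variable, card[variable])
        for variable in variables
        for card in cards
    ]
    return min(candidates)


train_errors, variable, threshold = best_rule(CARDS, ["sugar", "fat"])
print(f"rule: recommendable if {variable} <= {threshold} ({train_errors} training errors)")
print("errors on test cards:", errors(TEST_CARDS, variable, threshold))
```

With these invented cards, no single question classifies every card correctly, which is the kind of observation that motivates adding a second level to the tree.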