Previous Sessions

Session #01, Part 1: October 27, 2021, 4:00 p.m. (CEST, UTC + 2):

Data Detective Clubs in the Time of COVID-19
Jan Mokros and Bill Finzer (USA)

Abstract:

The COVID-19 pandemic presents an opportunity to engage young people in exploring how data can be used to understand a public health crisis, make decisions, and save lives. In this session we will describe a multifaceted project involving an adventure story about COVID-19 that is connected to data challenges in which CODAP (Common Online Data Analysis Platform) is used to explore time series of pandemic data. This work takes place in out-of-school clubs around the US, comprising 20 hours over two to three months with students who are 10-14 years old.

We will focus on two aspects of this work: First, we’ll demonstrate and discuss the affordances of CODAP and accompanying datasets in understanding how a dynamic pandemic unfolds. For example, CODAP interacts well with NetLogo, which means students (and those attending our session) can set parameters for infectivity and run multiple simulations to see how many people get sick, how many recover, and how long the outbreak lasts. In addition, CODAP’s “Story Builder” feature enables youth to combine graphs, photos, and text to tell the story of what happens over time with COVID under different circumstances and in different places.

Second, we’ll discuss the challenges and opportunities of working with real-time, highly relevant data that transcend the boundaries of school curricula. Most students do not study epidemiology in secondary school, though the subject area is an ideal vehicle for learning about data. Students’ work with data from pandemics also integrates social science, public policy, and the science of viruses.

The session will conclude with a discussion of the social-emotional aspects of using sensitive data. COVID data, like most data that truly matter, elicit a range of social and emotional issues, and we believe it is part of our role as data science educators to address these concerns.


Bio Jan Mokros

Dr. Jan Mokros is a developmental psychologist who is currently directing National Science Foundation-funded projects focused on how youth learn about data in out-of-school clubs.


Jan’s work with data science education introduces youth to topics including Lyme disease, teens’ use of time, sports injuries, and COVID-19. The COVID project centers on using a combination of CODAP, data activities, and a young adult adventure book, “The Case of the COVID Crisis”, by Pendred Noyce, to explore infectious disease epidemics. Afterschool programs around the US are using this program with youth who are underserved with respect to STEM. Jan is a Senior Research Scientist at Science Education Solutions. In prior positions, she has designed curriculum and conducted research at TERC and at the Maine Mathematics and Science Alliance. She has authored three books, including one for museum educators on incorporating math into exhibits and programs, and one for parents on exploring math in everyday life. She has been involved as a writer and researcher for the math curriculum Investigations in Number, Data and Space.

Bio Bill Finzer

Bill Finzer’s work has long centered on getting students using data in every subject they study.


He led the Fathom Dynamic Data Software development team at KCP Technologies before joining the Concord Consortium in 2014, where he leads development of the Common Online Data Analysis Platform (CODAP). He has been a classroom teacher, curriculum developer, teacher professional development course designer and leader, and educational software developer. Bill works with staff of many projects both inside and outside the Concord Consortium to help them make use of CODAP. He loves nothing better than fixing bugs and implementing new features.

Session #01, Part 2: October 27, 2021, 5:10 p.m. (CEST, UTC + 2):

Teaching machine learning in school: Some emerging research trajectories
Matti Tedre and Henriikka Vartiainen (Finland)

Abstract:

A major technological shift has recently triggered discussions about the need to amend computing education at all education levels. Traditional, rule-based automation has been joined by machine learning (ML), which, when provided with enough computing power and data, has enabled new classes of jobs to be automated, and thus expedited automation in society, the workplace, and people’s everyday lives.

Although ML has become an integral part of our lives, communities, and societies, it has gained very little attention in K–12 (school) computing education, which mainly focuses on rule-based programming and computational thinking. This talk will map the emerging trajectories in educational practice, theory, and technology related to teaching machine learning in K–12 education. It will situate that research in the broader context of computing education and describe what changes ML necessitates in the classroom. The talk will outline the paradigm shift that will be required in order to successfully integrate machine learning into the broader K–12 computing curricula.


Bio Matti Tedre

Dr. Matti Tedre is a professor of computer science, especially computing education and the philosophy of computer science, at the University of Eastern Finland.

His 2019 book “Computational Thinking” (The MIT Press, with P.J. Denning) presented a rich picture of computing’s disciplinary ways of thinking and practicing, and his 2014 book “Science of Computing” (Taylor & Francis / CRC Press) portrayed the conceptual and technical history of computing as a discipline.

Bio Henriikka Vartiainen

Dr. Henriikka Vartiainen is a senior researcher at the University of Eastern Finland, School of Applied Educational Science and Teacher Education. Currently, her research focuses especially on learning Machine Learning through co-design as well as on the ways to support children’s data agency.

Session #02, Part 1: November 24, 2021, 4:00 p.m. (CET, UTC + 1):

Beyond Bias. Locating questions of injustice in Data Science and Artificial Intelligence
Tobias Matzner (Germany)

Abstract:

Data Science and Artificial Intelligence have repeatedly come under scrutiny because of injustices they produce. AI-based services only work for a certain part of the populace, scoring and ranking systems are skewed towards certain groups, errors concentrate on specific persons, etc.

Thus, when teaching Data Science and Artificial Intelligence, such issues should be part of the curriculum. Most often in research and in educational concepts, injustice is discussed as biased outputs of information processing. In social studies of technology, some people have recently begun to challenge this idea of equating injustice with bias. The talk will briefly present why this is the case, and how this challenge from current research opens up new ways of discussing the social impact of Data Science and Artificial Intelligence in schools.

Bio Tobias Matzner

Dr. Tobias Matzner is professor for “Media, Algorithms, and Society” at Paderborn University in Germany. His work combines theories of (digital) media and technologies with approaches from political philosophy, cultural studies, and social theory.


He studied philosophy and computer science in Karlsruhe, Rome, and Berlin. Prior to his appointment in Paderborn, he worked at the International Centre for Ethics in the Sciences and Humanities at the University of Tübingen as well as the New School for Social Research in New York.

Session #02, Part 2: November 24, 2021, 5:10 p.m. (CET, UTC + 1):

Bringing together statistics and computer science education: Machine learning by decision trees grounded in students’ data exploration experiences 
Rolf Biehler and Yannik Fleischer (Germany) 

Abstract:

Trees can be used to visualize decision rules for classifications, and students may have encountered trees for different purposes in the mathematics or computer science classroom already. Everyday decisions can be supported by using simple decision trees. A new idea for students is to use trees for predictive modeling (classification) in multivariate data sets.

The human construction of trees has to be based on insights into the data and its context. Based on such experiences, algorithms for the automatic creation of trees can be developed and critically evaluated. Essential elements of predictive modeling can be discussed, such as the distinction between training and test data, overfitting, consequences of bias in the data (random or systematic sources), and different evaluation criteria based on the confusion matrix.
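As a rough illustration of what "evaluation criteria based on the confusion matrix" can mean for a binary classifier, the following Python sketch counts the four cells of the matrix on held-out test data and derives three common criteria from them. The function names and the tiny label lists are hypothetical and are not taken from the project's materials.

```python
def confusion_matrix(actual, predicted, positive):
    """Count the four confusion-matrix cells for one 'positive' class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly predicted positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive):
    """Derive accuracy, precision, and recall from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(actual, predicted, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test-set labels and the predictions of some decision tree.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

print(confusion_matrix(actual, predicted, positive="yes"))
print(evaluate(actual, predicted, positive="yes"))
```

Comparing these numbers on training data and on separate test data is one simple way to make overfitting visible: a tree that scores much better on the data it was built from than on unseen data has likely memorized noise.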

We have developed materials and educational guidelines for their use for several educational levels (grade 5/6, 9/10, and 11/12). We use different computational tools. CODAP (codap.concord.org), a web-based, easy-to-use data exploration tool with a plug-in for creating decision trees, serves as a starting point. Various types of Jupyter Notebooks, based on Python, require different levels of coding skills from the students. As a rule, we start with unplugged activities at all levels. For instance, we have developed a decision game with data cards for young kids. In their basic version, the Jupyter Notebooks appear menu-driven. In their advanced version, students get worked examples for computational essays that they can adapt for their own data and predictive modeling problems, including the adaptation and enhancement of code. All notebooks use libraries for data exploration and decision tree machine learning that we have adapted for educational purposes from professional sources.

Students encounter various multivariate data sets. These include data on nutritional values of food, data on (social) media use of adolescents, and data from medicine on heart diseases. In addition, parking lot occupancy data from their town were used, where predictive modeling is applied to help reduce parking search traffic and related emissions.

We will present some of our materials and the first results from studies in the classroom where we used the material.


Bio Rolf Biehler

Dr. Rolf Biehler is professor for didactics of mathematics at Paderborn University. His research interests include probability, statistics and data science education, university mathematics education and the professional development of mathematics teachers.

He was a co-founder and co-director of the Centre for Research in University Mathematics Education. He is engaged in the International Association for Statistical Education (IASE) and has worked as an editor or editorial board member for several international journals and book series in mathematics education. He is currently co-directing the Project Data Science and Big Data at School.

Bio Yannik Fleischer

Yannik Fleischer is a PhD student in mathematics education research at Paderborn University, Germany.

His main research interest is developing a concept for teaching machine learning methods in school, with a focus on decision trees, and evaluating it by developing and examining teaching materials in practice. Since 2019, he has been teaching year-long project courses on data science in upper secondary school and developing, implementing, and evaluating teaching modules for different levels in secondary school, mainly about machine learning with decision trees.

Session #03, Part 1: January 12, 2022, 4:00 p.m. (CET, UTC + 1)

Learning data science through civic engagement with open data
Graham Dove (USA)

Abstract:

In this talk I will discuss work undertaken for the project “Learning Data Science Through Civic Engagement With Open Data”. This project, which is funded through the National Science Foundation’s Advancing Informal STEM Learning (AISL) program, studies the informal data science learning that takes place within workshops, and other events and activities, that have been developed to support community engagement with civic open data in New York City (NYC).

NYC is a leader in Open Data initiatives, which are centered around the NYC Open Data portal, and which have become enshrined in the City Charter. It also has a large and highly diverse population, including many traditionally underserved communities. As government service provision becomes increasingly digital, large amounts of data are generated and subsequently used to assess need, drive service delivery decisions, and evaluate effectiveness. Services producing these data include education, transport, and 311 service requests (a non-emergency municipal service available in many cities for reporting problems such as noise or public safety concerns), and the data they produce can be probed to ask many important questions such as: “How do City agencies respond to noise in my neighborhood?”, “How do waste and recycling services in my neighborhood compare with others?”, and “Are there more construction permits issued for my neighborhood than similar areas?”. To better understand how diverse communities might access and analyze these data to answer questions, share narratives about issues of concern, and respond to data-driven policy and resource allocation, we are studying programs offered by the Mayor’s Office of Data Analytics (MODA) and BetaNYC. MODA is the NYC agency with overall responsibility for the City’s Open Data programs, while BetaNYC is a leading nonprofit organization working to improve the lives of NYC residents through civic design, technology, and engagement with government open data. We study the ecosystem that has emerged around the programs these organizations offer as a possible model for identifying, validating, and evaluating best practices, including questions of participation and potential barriers to entry.

Bio Graham Dove

Dr. Graham Dove is a human-computer interaction researcher, with experience in participatory approaches to design and citizen science.


Based in NYU Tandon’s Dept. of Technology Management and Innovation (TMI), and the Center for Urban Science and Progress (CUSP), Graham investigates ways that people who are not experts in data science can use quantitative data and artificial intelligence to inform decision making, advocacy, and creativity in design. Current projects include investigating the informal learning that takes place around NYC Open Data; designing data-rich interfaces to support future healthcare work practices; and SONYC (Sounds of New York City), which investigates approaches to monitoring and mitigating noise pollution. He has previously worked in Denmark and the UK.

Session #03, Part 2: January 12, 2022, 5:10 p.m. (CET, UTC + 1)

Why should students take a data science course?
Rob Gould (USA)

Abstract:

The Mobilize Intro to Data Science (IDS, www.introdatascience.org) course was first offered in 2017 and was, at that time, the only data science course designed for secondary students in the U.S., and possibly anywhere. IDS was developed through a partnership between the Los Angeles Unified School District (the second-largest district in the U.S.), the UCLA Department of Computer Science, the UCLA Graduate School of Education and Information Sciences, and the UCLA Department of Statistics.

The IDS curriculum has several special features. First, it relies on a data collection paradigm, Participatory Sensing (Burke et al. 2006), in which students use mobile devices to collect multivariate data together. Second, the curriculum teaches students to use R, via the interface RStudio and the mobilizR package, to organize, prepare, and analyze data. Finally, it relies on a student-centered, activity-based pedagogy. The primary goal of IDS is to develop in students the ability to synthesize statistical and computational thinking (DeVeaux et al. 2017; Gould 2021) in order to work, and live, ethically and productively in a data-driven world.

Since 2017, interest in preK-12 data science and “data literacy” has blossomed in the U.S. (and elsewhere), and varying visions and purposes for data science education have emerged. Data science education is many things to many people; it can be seen as an approach to developing “data literacy”, improving programming skills, increasing “college readiness”, improving equity in mathematics (Burdman 2015), and developing students’ appreciation for and skill in mathematics. While we agree that many of these purposes are welcome consequences of a well-designed data science course, in our vision of data science education these are secondary to the primary goal, which is to teach students to analyze complex, multivariate data and to develop what has been called “data acumen”. Many in the U.S. see secondary-level Data Science as a sub-discipline of mathematics (Levitt 2019), which naturally affects the scope and intent of a course.

In this presentation, we’ll describe the IDS vision of data science and IDS’s role in developing data literacy for secondary students. We’ll also discuss some of the challenges that remain in implementing this vision, including teacher preparation and university acceptance.


Bio Rob Gould

Dr. Rob Gould is a teaching professor and vice-chair of undergraduate studies in the Department of Statistics at UCLA, active in statistics and data science education since 1994. He is the founder of DataFest, a 48-hour undergraduate data analysis competition sponsored by the American Statistical Association and held at over 40 sites around the world.


He is co-author with Colleen Ryan and Rebecca Wong of an introductory statistics book. He was the lead Principal Investigator of the NSF-funded Mobilize project, which produced the Introduction to Data Science curriculum, the first high school data science curriculum in the U.S. Rob was elected Fellow of the American Statistical Association in 2012; in 2019 he was awarded the CAUSE Lifetime Achievement Award for Statistics Education and the ASA Waller Distinguished Teaching Career Award. He received his B.S. from Harvey Mudd College in 1987 and his PhD in Mathematics (concentration in Statistics) from the University of California, San Diego, in 1994. He is Vice-President of the International Association for Statistical Education (IASE).

Session #04, Part 1: April 27, 2022, 4:00 p.m. (CEST, UTC + 2):

Why Computing Education, and Especially CT, Needs a Broader Perspective!
Arnold Pears (Sweden)

Abstract:

Computing education has focussed on introductory programming, nearly to the exclusion of all other CS content, a mistake that plagues the discipline. Computational Thinking (CT) runs the risk of making a parallel error by focussing on aspects of computation unique to the imperative programming paradigm and sequential execution. An inordinate focus on loops, sequences and alternation runs the risk of impoverishing the computing discipline, and by ignoring vital areas such as concurrency and data parallelism, runs the risk of educating future generations in an obsolete programming tradition.


Bio Arnold Pears

Arnold Pears is Professor and Chair of the Department of Learning in Engineering Sciences at the KTH Royal Institute of Technology, Sweden. He also holds a professorship in Computer Science at Uppsala University, Sweden. Professor Pears received his BSc(Hons) in 1986 and PhD in 1994, both from La Trobe University, Melbourne, Australia.


In the late 1990s, together with colleagues Dr. Berglund and Prof. Daniels, Prof. Pears established the UpCERG research group in computing and engineering education research at Uppsala University. As foundation professor at KTH he has led research in all areas of technical and engineering education since 2017. His recent work includes several articles on computing in schools. He has published over 100 articles in leading Computing and Engineering education journals and conferences. He has delivered a number of keynote addresses, and is well known as a computing and engineering education researcher through his professional activities in the ACM and IEEE.

Contributions to the academic and professional community include his roles as a member of the Board of Governors of the IEEE Computer Society 2012-2014, where he is active coordinating education conferences; serving on the steering committee of the Frontiers in Education Conference and as Chair of the Special Technical Community (STC) for Education. In addition, he is a Director of CeTUSS (The Swedish National Center for Pedagogical Development of Technology Education in a Societal and Student Oriented Context, www.cetuss.se) and the IEEE Education Society Nordic Chapter. He also serves as a reviewer for a number of major journals and conferences, including the Computer Science Education Journal (Taylor and Francis), the ACM SIGCSE and ITiCSE and Koli Calling International Computer Science Education conferences.

Recent publications include “Does Quality Assurance Enhance the Quality of Computing Education?”, in the Proceedings of the 12th Australasian Computer Science Education Conference, 2010, and models for research driven education in Computing, “Conveying Conceptions of Quality through Instruction”, in the 7th International Conference on the Quality of Information and Communications Technology, 2010.

Prior appointments include lecturer and senior lecturer at La Trobe University between 1991 and 1998. From 1999 he was senior lecturer at Uppsala University, Sweden, where he was awarded the Uppsala University Pedagogy Prize in 2008 and promoted to Associate Professor of Computing Education Research in May 2011 and to Professor in 2017. Roles at Uppsala University include appointment to the University Academic Senate, Programme Director for the IT Engineering programme, member of the selection committee for the Uppsala University Pedagogy Prize, and member of the educational advisory board of the Faculty of Technology and Natural Sciences.

Session #04, Part 2: April 27, 2022, 5:10 p.m. (CEST, UTC + 2):

Education for a fast-changing world: Conceptions of Statistical Literacy and Data Science
Jim Ridgway (England)

Abstract:

The data landscape is in a continued state of flux. New sorts and sources of data emerge from new creators; new ways to interact with data are created; we poorly understand the ways new information shapes beliefs and actions. Here, I will map out some of the elements in the evidence ecosystem: producers and consumers, both well- and ill-intentioned. What do students need to learn, if they are to navigate this brave new world? We will explore the needs of future citizens in their roles as spectators, referees and players in this ecology, and consider how well these needs map onto conceptual frameworks describing statistical literacy and data science.


Bio Jim Ridgway

Jim Ridgway is an emeritus professor at Durham University, with a background in cognitive psychology. Past work has included the creation of materials to develop mathematical thinking on undergraduate courses in the USA; creation of computer-based materials to identify students in poorly-supported communities who have a flair for STEM (subsequently used in 20+ countries); work with the House of Commons Library to provide (huge amounts of) data accessible to citizens via their phones (along with some gamification); design and delivery of the first OECD workshop for politicians and policy makers on evidence-informed decision making; EU-funded projects on girls and STEM; and ProCivicStat, an Erasmus-funded collaboration between 5 countries which has developed materials to engage students with issues such as poverty, migration, gender inequality and racism.

A current project entitled “Firing up the epistemological engine” plans to use AI (and conventional methods) to challenge some current research practices and conclusions in science and medicine.
Latest book: Teaching Data Science and Statistics (eds. MacGillivray, Gould, Ridgway), Special Edition of Teaching Statistics (vol. 43, Summer 2021)
Forthcoming: J. Ridgway (Ed.), Statistics for Empowerment and Social Engagement: teaching civic statistics to develop informed citizens. Springer.

Session #05, Part 1: May 18, 2022, 4:00 p.m. (CEST, UTC + 2):

Data Awareness: Be aware of the data!
Lukas Höper & Carsten Schulte (Germany) 

Abstract:

In data and in digital literacy one core issue is to enable students to cope with the datafication of their everyday lives, and hence to become data literate. Based on this line of reasoning the debate then focusses on what skills to teach. We add to this discussion by trying a slightly different angle: to make students aware of the data flows they create and probably can influence when interacting with digital artefacts like social messengers, recommendation systems of video streaming portals, or simply when using a mobile phone.

In order to enable students to understand these processes of data collection and processing when using data-driven technologies, we developed the data awareness framework. The goal is to enable students to become aware of and understand the collection and processing of data about them during interaction with digital artefacts. It also aims to provide students with appropriate skills and adequate knowledge to apply this to their own daily lives, and to enable them to evaluate data-driven systems and their impact. This is intended to create the basis for them to be able to shape the data-driven world (in the sense of agency).

In this talk we will first introduce the data awareness framework. We will then present two exemplary teaching units for fostering data awareness. The first is about exploring the mobile phone system and location data traces of one user; the second is about a recommender system for movies and how it works (e.g., using the k-nearest-neighbour method).


Bio Lukas Höper

Lukas Höper is a PhD student in computing education research at Paderborn University, Germany.

His main research interest is developing the concept of data awareness for computing education and evaluating it within design-based research by developing and empirically examining teaching materials in practice. Since 2020, he has been working on data awareness in the ProDaBi project, in which curriculum ideas, teaching materials and teacher education approaches regarding data awareness, artificial intelligence and data science in schools are developed.

Bio Carsten Schulte

Carsten Schulte is professor for computing education research at Paderborn University, Germany.

His work and research interests are the philosophy of computing education and empirical research into teaching-learning processes (including eye movement research). Since 2017, he has been working together with Didactics of Mathematics (Paderborn University) in the ProDaBi project, in which Data Science and Artificial Intelligence are prepared as teaching topics. He is also PI in the collaborative research centre ‘Constructing Explainability’ on explainable AI.

Session #05, Part 2: May 18, 2022, 5:10 p.m. (CEST, UTC + 2):

Teaching Core Principles of Machine Learning with a Simple Machine Learning Algorithm: The Case of the KNN Algorithm 
Orit Hazzan & Koby Mike (Israel) 

Abstract:

Data science is a new interdisciplinary science that focuses on extracting insights and value from data. Upon scanning introductory data science courses, one usually finds that they include several machine learning algorithms of different kinds.

In this talk, we propose that only one simple algorithm may be sufficient for such courses, illustrating our approach using the KNN algorithm. The main reason we propose the KNN algorithm is that it is simple to understand both from a mathematical perspective and from an algorithmic perspective. This approach is implemented in the basic level of the data science unit of the Israeli high school computer science curriculum. We highlight our approach from three perspectives: computational, cognitive and pedagogical. We show that despite the simplicity of the KNN algorithm, it makes it possible to expose novice data science learners to the main ideas of machine learning and to pose interesting questions that address its core concepts. We also discuss how such an approach may eliminate barriers, which new teachers may encounter, to both learning the topic and teaching it. In the discussion, we invite the audience to suggest other algorithms that may serve as the sole algorithm taught in introductory data science courses.
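To give a sense of how compact the KNN algorithm is, here is a minimal Python sketch of a k-nearest-neighbours classifier using Euclidean distance and majority voting. It is an illustration only, not the implementation used in the curriculum; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort the training points by Euclidean distance to the query, keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # The most frequent label among the k neighbours is the prediction.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]

print(knn_predict(train, (2.0, 2.0)))  # all three nearest neighbours are "A"
print(knn_predict(train, (6.0, 5.0)))  # all three nearest neighbours are "B"
```

Even this short sketch raises the kinds of questions the abstract alludes to: how to choose k, which distance measure to use, and what happens when features are on different scales.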

Bio Orit Hazzan

Professor Orit Hazzan has been a faculty member at the Technion’s Department of Education in Science and Technology since October 2000. Her research focuses on computer science, software engineering and data science education.


Within this framework she researches cognitive and social processes on the individual, the team and the organization levels, in all kinds of organizations. She has published about 130 papers in professional refereed journals and conference proceedings, and seven books.
In 2006–2008 she served as the Technion’s Associate Dean of Undergraduate Studies. In 2007–2010 she chaired the High School Computer Science Curriculum Committee assigned by the Israeli Ministry of Education. In 2011–2015 Hazzan was the faculty Dean. From 2017 to 2019, Hazzan served as the Technion’s Dean of Undergraduate Studies.

Additional details can be found on her personal homepage.

Bio Koby Mike

Koby Mike is a Ph.D. student in the Technion’s Department of Education in Science and Technology under the supervision of Professor Orit Hazzan.


He holds a B.Sc. and an M.Sc. in electrical engineering. Koby’s doctoral research focuses on data science education. As part of his research, he teaches data science in high school and at Tel Aviv University. Prior to his doctoral studies, Koby gained extensive experience in the Israeli hi-tech industry.

Session #06, Part 1: June 22, 2022, 4:00 p.m. (CEST, UTC + 2)

My AI discriminates? How could this happen and who is to blame?
Marc Hauer (Germany)

Abstract:

For some years now, artificial intelligence methods have been used in many areas of daily life. Many applications have been criticized for being discriminatory. There are several ways to deal with such cases: training datasets can be improved to reduce discriminatory behavior, discriminatory model outputs can be modified post hoc, or processes can be established to make discriminatory results usable. In any case, the preliminary assumption is that discrimination can be measured.

The development process of an AI consists of several steps; collection and processing of data is only one of them. Errors can occur at all steps, and they affect later steps. This means control processes are needed at all steps and at the transitions between them. In addition, responsibilities must be assigned at all these points so that it is clear who must react to errors and problems. At this point, the concept of the “Long Chain of Responsibilities” is introduced, which helps to clarify these responsibilities.

In this talk, we will talk about how discrimination can enter AI, who is responsible for it, and how discrimination can be operationalized.

Bio Marc Hauer

Marc Hauer is a PhD candidate on the question of how to make software development processes and AI products accountable.

Additionally, he works as a media education consultant for the Landesmedienzentrum Baden-Württemberg, educating students, parents, and teachers on the topic of computer science and society, and as an AI consultant to companies for TrustedAI GmbH.


Session #06, Part 2: June 22, 2022, 5:10 p.m. (CEST, UTC + 2)

A Framework for Exploring the Purposes and Processes of Data Wrangling in Complex Self-Directed Analysis Tasks
Michelle Hoda Wilkerson (USA)

Abstract: (Click to view the full abstract)

The data science education community at large is advocating to include open-ended exploration of large datasets in curriculum and instruction. Introducing this level of flexibility, however, requires teachers and students to wrangle data, that is, to transform complex datasets so that they can be used to pursue particular lines of inquiry.

In this talk, I will draw empirical examples from two data science education projects that engaged teens and young adults in analyzing complex public datasets about socioscientific issues (e.g. public transit, genetics, environmental justice): Data Science Games and Writing Data Stories. Through these examples, I will present a framework for understanding the process of data wrangling as novice analysts determine
(a) whether a particular question can be explored using a given dataset;
(b) what transformation needs to be applied to the dataset in order to pursue that question; and
(c) how to execute that transformation using the tools at hand.

The framework and examples lend insight into how actions by learners that may initially seem inappropriate (such as rejecting a dataset or applying unexpected transformations) can be understood as sensible from the perspective of learners’ goals and contexts. More generally, they highlight how the interplay of learners’ investigative goals, the data context, and the available tools all shape the ways in which a complex data investigation unfolds. I will discuss how considering these elements of data exploration in interaction can lead to the more thoughtful development of educational data science tools, activities, and assessment.

Bio Michelle Hoda Wilkerson (Click to view the full bio)

Michelle Hoda Wilkerson is an Associate Professor in the Graduate School of Education and the Graduate Group in Science and Mathematics Education at the University of California, Berkeley. Her research broadly explores the question: How is computing changing what is important to teach and learn in middle and high school science and mathematics classes?


This has led her to study how young people learn with and about scientific computing artifacts such as simulations, data analysis tools, and interactive visualizations. Recently, she has explored how learners’ relationships with data (for instance as consumers, subjects, and creators of data) shape how they understand and engage in data analysis. Michelle’s research has been supported by the United States National Science Foundation (NSF), the George Lucas Education Foundation, and Google Education Research. Her work has appeared in general and STEM-specific venues including Educational Researcher, Journal of the Learning Sciences, Science Education, and the Journal of Science Teacher Education. In 2020, her work was recognized with the American Educational Research Association’s Jan Hawkins Award for Humanistic Research and Scholarship in Learning Technologies.

Information on the colloquium

Data science, artificial intelligence, machine learning, data literacy, and statistical literacy concerning secondary education are currently discussed in the communities of scientists and educators in statistics, mathematics, computer science, social and natural sciences, and media education. Our colloquium intends to bring together these perspectives and communities to create an interdisciplinary community for scientific exchange.  

Since data science and artificial intelligence have become more and more relevant in industrial and economic automation processes, marketing processes, and monitoring in politics, both topics permeate nearly all areas of life. These influences raise questions about future possibilities for social participation, self-determination, and self-realization in the professional and private sector, resulting in the need for educational processes that address these issues in school. Completely new challenges have emerged for the teaching of mathematics and computer science, as well as for the social science subjects and cross-curricular media education.

In our colloquium, we want to take up these issues and discuss the state of the art and future trends of education in data science and artificial intelligence that can inspire ideas for teaching data science in secondary schools. We also want to discuss fundamental ideas of data science as they are conceptualized by experts in this field, since a broad perspective of data science as a scientific discipline is needed to inform curriculum development. Contributions to the colloquium will also present practice-oriented research as well as research on teachers’ professional development.

ProDaBi (Project Data Science and Big Data at School) develops research-based teaching material and professional development courses for teaching data science and artificial intelligence for grades 5 to 12. It was initiated by the Deutsche Telekom Stiftung, which has funded it since 2018.

Registration

The colloquium is open and free for everyone and will be held via Zoom. To register for sessions #01-#03 of the colloquium, please fill out the form, which you can access via the link below.
After registering, you will be emailed the information for the sessions, including the Zoom access details (which are the same for all three sessions). If you have any questions, please do not hesitate to contact us at the following email address: prodabi@mail.upb.de
Please distribute the information to interested colleagues so that they can also register.

Please register here for the colloquium