1 Introduction

Recognizing the critical role of proof in mathematics, policy and curricular documents worldwide emphasize teaching proof across grade levels and mathematical topics (e.g., ACARA, 2022; MINEDUC, 2019; NCTM, 2009; NGA & CCSSI, 2010). Teacher education programs are expected to prepare their graduates to implement this vision of teaching. Many secondary teacher preparation programs provide strong content preparation, including courses on proof taught in mathematics departments by mathematicians, to both mathematics and mathematics education majors (Blömeke et al., 2014; Tatto, 2013). Yet studies have shown that prospective secondary teachers (PSTs) experience difficulties with proof at the university level, struggling with topics such as understanding the relationship between empirical and deductive reasoning, including the role of examples in proving (e.g., Weber, 2010); reasoning with conditional statements (e.g., Durand-Guerrier, 2003); proof production and comprehension (e.g., Hodds et al., 2014); and proof by contradiction (Antonini & Mariotti, 2008).

With few opportunities for making connections between university proof courses and their future classroom practice, PSTs often develop views of proof as formal exercises in university mathematics, disconnected from secondary classrooms (Stylianides et al., 2017). For example, Schwarz et al. (2008) examined future teachers’ professional knowledge of argumentation and proof in Germany, Hong Kong, and Australia. Across all three countries, the authors concluded that “possessing a tertiary mathematical background as required for teaching and having a high affinity with proving in mathematics teaching at the lower secondary level are not sufficient preparation for teaching proof” (p. 808).

Felix Klein (1932) alluded to the problem of double discontinuity—the feeling of disconnect that future teachers experience when first encountering university-level mathematics, and then again at the start of classroom teaching. While such feelings of disconnect apply to university mathematics in general (e.g., Goulding et al., 2003; Winsløw & Grønbæk, 2014), the situation with proof seems unique due to the explicit expectation in policy documents and national curricula that teachers will integrate proof into their classroom teaching. This suggests that universities need to develop more effective approaches for supporting PSTs in developing proof-specific knowledge and practices.

Toward this end, we conducted a three-year design research study (Sandoval, 2014) in which we developed a capstone course, Mathematical Reasoning and Proving for Secondary Teachers, as a culminating experience of a mathematics education program. The course was offered in the mathematics department and taught by a mathematics education faculty member, the first author of this paper. The course consists of four modules, each addressing a topic identified in the literature as posing persistent difficulties for students and teachers alike: (a) direct proof and argument evaluation, (b) conditional statements, (c) quantification and the role of examples in proving, and (d) indirect reasoning. The course activities were intended to help PSTs crystalize their mathematical knowledge, connect it to the secondary curriculum, learn about students’ conceptions, and apply this knowledge by designing lessons that integrate proof with secondary mathematical topics and enacting them in local schools (Buchbinder & McCrone, 2020).

During the three-year design research project, we systematically studied the impact of the course on PSTs’ content and pedagogical knowledge of proof. This paper provides evidence that the course design and activities were conducive to PSTs’ learning and points to the design principles supporting that learning. Due to space constraints, we focus on one course module: Quantification and the Role of Examples in Proving (QRE) and formulate our research questions with respect to it:

  1. How did PSTs’ knowledge and practices related to quantification and the role of examples in proving change as a result of participation in the course?

  2. How did the course activities contribute to the observed changes in PSTs’ knowledge and practices?

The broader project examined similar questions with respect to other modules and the whole course; presentation of all results is beyond the scope of this paper.

2 Theoretical perspectives

2.1 Reasoning and proving at the secondary level: the focus on QRE

One of the main challenges to conceptualizing reasoning and proving in schools has been to differentiate school-level proofs from formal, university-level proofs while preserving the integrity of proof as a hallmark of mathematical validity (Harel & Sowder, 2007). To address these challenges, we adopt a definition of proof as “a mathematical argument for or against a mathematical claim that is both mathematically sound and conceptually accessible to the members of the local community where the argument is offered” (Stylianides & Stylianides, 2017, p. 121). This definition can equally apply to a community of mathematicians and to a community of school students. Importantly, it suggests that the validity of student arguments should be grounded in deductive reasoning rather than the authority of a teacher, a textbook, or empirical evidence, and it does not require a proof to be formal or to follow a particular format.

Recent reviews of international literature (e.g., Mariotti et al., 2018; Stylianides et al., 2017) show that proof has been a challenging topic to learn and to teach. Students at all levels, as well as prospective and even practicing teachers, experience challenges with understanding the relationship between empirical arguments and deductive reasoning. A recurring finding around the world is that students rely on supportive examples as proof, without realizing the limitations of such reasoning (Stylianides & Stylianides, 2017). PSTs, too, tend to consider empirical arguments more convincing than deductive proofs (Ko, 2010), and both pre- and in-service teachers have been shown to struggle to distinguish between valid and invalid arguments (Harel & Sowder, 2007; Ko, 2010). Thus, teachers may miss opportunities to address the limitations of empirical arguments with their students, or may unintentionally reinforce this problematic conception.

Another well-documented finding is that students and teachers treat counterexamples as exceptions rather than as disproofs, or tend to prefer multiple counterexamples (e.g., Lee, 2016; Weber, 2010). Tabach et al. (2010) identified challenges related to existential (there exist) statements. Teachers in that study struggled to accept students’ correct proofs of existential statements when those relied on a supportive example, or to reject incorrect “disproofs” by non-supportive examples.

As teachers are charged with helping students develop mathematically accepted notions of proof and the roles of examples in proving/disproving quantified statements, it is important that teachers themselves have a strong mathematical and pedagogical knowledge base.

2.2 Mathematical Knowledge for Teaching Proof: MKT-P with the focus on QRE

Researchers have conjectured that teaching mathematical reasoning and proving requires a special type of teacher knowledge: Mathematical Knowledge for Teaching Proof (MKT-P). Several frameworks delineating this concept have been proposed over the years (e.g., Corleis et al., 2008; Harel, 2008; Lesseig, 2016; Lin et al., 2011; Stylianides, 2011). Following Stylianides’ (2011) and Harel’s (2008) approaches, we conceptualize MKT-P as comprising three facets: Knowledge of the Logical Aspects of Proof (content knowledge of proof), Knowledge of Content and Students specific to proof (KCS-P), and Knowledge of Content and Teaching specific to proof (KCT-P) (Buchbinder & McCrone, 2020).

Knowledge of the Logical Aspects of Proof includes knowledge of different types of proofs, valid and invalid modes of reasoning, logical relations, a range of definitions and theorems, and the roles of examples in proving. The latter includes the types of inferences that can be drawn from examples with respect to two types of quantified statements: universal statements—“all objects in a domain D satisfy some property P(x)”—and existential statements—“there is an object in D that satisfies P(x).” Examples that are in the domain and satisfy the property support but do not prove a universal statement, while one such example proves that the existential statement is true. Examples that are in the domain but do not satisfy the property are counterexamples that disprove a universal statement, but they are merely non-confirming for an existential statement and are insufficient to disprove it. Examples that are not in the domain are irrelevant to proving or disproving either type of statement [cf. Buchbinder and Zaslavsky’s (2019) Role of Examples in Proving framework]. In classrooms, teacher knowledge of the logical aspects of proof is manifested in the use of clear and accurate mathematical language and notation, and in the ability to identify and correct students’ logical mistakes.
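These inferences can be summarized compactly. The following display is our own formalization of the framework described above, not notation used in the course:

\[
\begin{array}{l|l|l}
\text{Example } x_0 & \forall x \in D,\ P(x) & \exists x \in D,\ P(x) \\
\hline
x_0 \in D,\ P(x_0) & \text{supports, does not prove} & \text{proves} \\
x_0 \in D,\ \neg P(x_0) & \text{counterexample: disproves} & \text{non-confirming, does not disprove} \\
x_0 \notin D & \text{irrelevant} & \text{irrelevant}
\end{array}
\]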

Knowledge of Content and Students specific to proof involves knowledge of students’ proof-related conceptions, misconceptions, and common mistakes. With respect to the role of examples in proving, KCS-P involves recognizing student challenges with this topic, such as confusion between universal and existential quantifiers, difficulty discerning between examples, counterexamples, and irrelevant examples for a given statement, and making inferences that consider both the type of quantifier and the type of example (Buchbinder & Zaslavsky, 2019; Durand-Guerrier, 2003). Related classroom practices involve the teacher’s ability to identify and anticipate students’ proof-related misconceptions, facilitate discussions, and explain proof concepts.

Stylianides (2011) notes that while classroom discussions may offer opportunities to discuss the differences between empirical examples and proof, helping students to overcome their proof-related misconceptions requires carefully planned classroom interventions. Thus, Knowledge of Content and Teaching specific to proof (KCT-P) involves pedagogical strategies such as identifying curriculum opportunities for reasoning and proving, and designing and enacting proof-related tasks. Specific to the roles of examples in proving, it involves designing tasks with opportunities for students to learn about mathematically acceptable ways to prove or disprove quantified statements and the roles of examples in these processes.

The three facets of MKT-P are closely intertwined. Drawing distinctions between them serves the operational purpose of capturing and assessing MKT-P. The conceptualization of the classroom manifestations of MKT-P draws on the literature connecting teacher knowledge to quality of instruction (e.g., Charalambous, 2020; Kunter et al., 2013). The theoretical perspective on the multidimensionality of MKT-P, as comprising content knowledge, pedagogical knowledge, beliefs, and practices, aligns with other general frameworks on teacher competence (Blömeke et al., 2015; Kunter et al., 2013), mathematical knowledge for teaching (Ball et al., 2008; Shulman, 1986), and frameworks utilized in international studies such as MT21 (Schmidt, 2013) and TEDS-M (Tatto, 2013). Our MKT-P framework distills elements specific to reasoning and proof from this research base and the MKT-P literature.

3 Method

3.1 Design research methodology

Design research methodology intertwines instructional design and educational research (Gravemeijer & Prediger, 2019) through iterative stages of design, implementation, analysis, reflection, and refinement of learning environments. Researchers concurrently design environments and study phenomena emerging in them to develop local (domain- or topic-specific) learning theories that connect instructional design with learning outcomes (Fig. 1).

Fig. 1 Four phases of design-based research

The final stage of design research is a retrospective analysis of all data and partial theories from earlier cycles to produce contextually sensitive design principles (Sandoval, 2014). We report on the results of this retrospective analysis following three iterations and provide evidence for how the finalized design principles contributed to the development of PSTs’ knowledge and practices specific to the Quantification and the Role of Examples module.

3.2 The capstone course design

The initial design principles for the course Mathematical Reasoning and Proving for Secondary Teachers came from the analysis of literature on students’ and teachers’ conceptions of proof and difficulties with proving (e.g., Ko, 2010; Schwarz et al., 2008). Through this analysis, we identified four proof themes for the course modules: direct proof and argument evaluation, conditional statements, quantification and the role of examples in proving, and indirect reasoning.

Teacher knowledge and practices outlined in our MKT-P framework represent the desired learning outcomes and the course objectives. To identify pedagogical strategies for supporting these objectives, we consulted literature on PSTs’ learning in undergraduate programs (e.g., Grossman et al., 2009; Kunter et al., 2013; Wasserman et al., 2019). The key elements emerging from the literature inspired the three types of course activities: crystalize, connect, and apply. Later, after iteratively testing the course design and conducting retrospective analysis, these types of activities were formulated as design principles (Sandoval, 2014). Crystalize activities provide PSTs with opportunities to refresh and enhance their content knowledge of the four proof themes. Connect activities focus on connecting university-level knowledge of proof to secondary mathematics and on increasing PSTs’ awareness of students’ difficulties with proving. Apply activities provide opportunities to enact content and pedagogical knowledge of proof. These types of activities integrate various aspects of MKT-P as they support proof-specific content and pedagogical knowledge and practices. The crystalize–connect–apply structure was reproduced in each module with specific activities (Buchbinder & McCrone, 2020), including the QRE module, described below.

3.3 The QRE module

3.3.1 What Can You Infer from This Example?

The module included two What Can You Infer from This Example? activities (Buchbinder et al., 2017), completed by PSTs individually, online. In Activity 1 PSTs were given a false universal statement, “A quadrilateral whose diagonals are congruent and perpendicular to each other is a kite,” and asked to determine its logical structure and truth-value and to justify their response. Next, PSTs examined five fictitious students’ examples and determined what can be inferred from each example about the statement: (1) a square and (2) a non-convex kite with congruent and perpendicular diagonals only support the statement; (3) an isosceles trapezoid with perpendicular (and, necessarily, congruent) diagonals and (4) a general quadrilateral with congruent and perpendicular diagonals disprove the statement; and (5) a general convex kite is irrelevant to the statement, since its diagonals are not congruent (Fig. 2). Next, the PSTs could revise their original assessment of the statement’s truth-value. Then, they watched a cartoon-based classroom scenario of students confused about which quadrilateral is “the best” counterexample: an isosceles trapezoid that has congruent and perpendicular diagonals but is not a kite, or a general kite, which does not have congruent diagonals. The PSTs wrote a scenario (Zazkis et al., 2013) describing how they, as teachers, would lead the discussion to resolve the confusion.

Fig. 2 What Can You Infer from This Example? (Graphics are © 2021, The Regents of the University of Michigan, used with permission)

In Activity 2 the PSTs analyzed a true existential statement: “There exist three consecutive even numbers whose sum is divisible by four.” They examined six fictitious students’ examples, determining for each example whether it proves the statement, disproves it, neither proves nor disproves it, or cannot be used to evaluate the statement. For instance, (8, 10, 12) are consecutive even numbers whose sum is not divisible by 4; they neither prove nor disprove the statement, since an existential statement cannot be disproved by non-supportive examples. The triplet (4, 6, 10) contains non-consecutive even numbers, making it irrelevant, even though the sum is divisible by 4. These distinctions were not clarified in advance but were left for the PSTs to deduce from the activity. Next, the PSTs could revise their original assessment of the truth-value of the statement.
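For completeness, a short verification (our addition; it was not shown to the PSTs, since the distinctions were left for them to deduce): three consecutive even numbers can be written as 2k, 2k+2, 2k+4, so

\[
2k + (2k+2) + (2k+4) = 6k + 6 = 6(k+1),
\]

which is divisible by 4 exactly when k is odd. Hence k = 1 gives 2 + 4 + 6 = 12 = 4 · 3, a single supportive example that proves the existential statement, whereas k = 4 gives 8 + 10 + 12 = 30, a non-supportive example that does not disprove it.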

Although these activities are embedded in a school context, they are intended to crystalize PSTs’ content knowledge of QRE, since identifying what can be inferred from a given example does not require any pedagogical knowledge. Analyzing the classroom scenario and writing its continuation supports pedagogical knowledge of QRE, embodying the connect design principle.

3.3.2 Task analysis: true–false vs. always–sometimes–never

In this activity the PSTs solved two tasks with the same mathematical content but different requirements. The True–False task required sorting six statements into non-mutually exclusive categories: universal, existential, require a general proof, require a disproof by a counterexample, require a proof by a supportive example, have only supportive examples, have both supportive examples and counterexamples, and have no supportive examples. The Always–Sometimes–Never task contained the equations from the True–False task, not embedded in quantified statements (Fig. 3). For each equation, the PSTs determined whether it is true for all, some, or no values of the parameters. This was intended to emphasize that while a predicate P(x) can be true or false depending on the variable x, a quantified statement must have a single truth-value. Next, for all six equations, the PSTs wrote their own true universal or existential statements.
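To illustrate the intended contrast with a hypothetical item of our own (the actual task items appear in Fig. 3): the equation x² = 2x, viewed as a predicate, is true for some values of x (x = 0 and x = 2) and false for others, whereas the corresponding quantified statements each have a single truth-value:

\[
\forall x \in \mathbb{R},\ x^2 = 2x \ \text{(false)}, \qquad \exists x \in \mathbb{R},\ x^2 = 2x \ \text{(true)}.
\]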

Fig. 3 a, b Items from the True–False and Always–Sometimes–Never tasks

The PSTs completed these tasks in groups, each group with one version of the task, and then compared their work. This activity is mainly focused on crystalizing content knowledge of QRE. The connect aspect is manifested in the use of secondary school mathematical content, and in the follow-up discussion where PSTs contemplated learning opportunities for reasoning and proving afforded by these tasks.

3.3.3 QRE-integrated lesson cycle

For the apply activity, each PST reached out to their cooperating school teacher to find out the mathematical topic to be taught. The PSTs then designed a lesson plan integrating that topic with some key ideas of QRE. To support the PSTs, class time was devoted to lesson planning and to consultations with peers and the course instructor. Next, the PSTs taught small groups of students in a local school, video-recorded the lesson, and wrote a reflection report. PSTs’ lesson plans were graded for scope and richness of proof-related tasks and correctness of mathematical explanations; the classroom teaching was not assessed, to reduce performance pressure; the post-lesson reflections were graded for completion. For research purposes, we re-analyzed the entire data corpus, as described below.

3.4 Participants

The participants were 34 PSTs (22 females, 12 males) who took the capstone course during the three years of the project. The PSTs were in the final, fourth year of a secondary mathematics education program, having completed most of their mathematics courses alongside mathematics majors, including proof-intensive courses such as Mathematical Proof, Geometry, and Abstract Algebra (some PSTs took the latter concurrently with the capstone course). Also, all PSTs had completed one or two mathematics education courses, which had no classroom practicum component.

3.5 Data sources and analysis

We used a combination of descriptive statistics and qualitative analyses (Table 1).

Table 1 Summary of evaluation methods by data source and knowledge type

To assess changes in PSTs’ knowledge we compared their pre- and post-course performance on the MKT-P questionnaire (Cronbach’s alpha = 0.892, indicating good internal consistency). The number of QRE items varied from 9 to 11 across versions of the questionnaire, which changed slightly over the years. The items were distributed among the three facets of MKT-P and the mathematical topics of algebra, functions, and geometry, but we report on them in aggregate, since there were not enough data points in each set. Each item was scored on a scale from 0 to 3. Three points were given for a correct answer supported by a correct explanation; intermediate scores of 1 or 2 were given for partially correct responses. We used ANOVA and paired t tests to compare pre- to post-course means, with Bonferroni correction for multiple comparisons. To compensate for unequal variances, we used Tukey’s Honest Significant Difference test.
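As an illustration of this comparison procedure, here is a minimal Python sketch (our own; the data values are hypothetical and the function name is ours). It shows a paired t test on per-PST mean item scores with a Bonferroni-adjusted threshold; the original analysis also included ANOVA and Tukey’s HSD, which are not reproduced here.

# Minimal sketch (ours) of the pre/post comparison pattern described above.
import numpy as np
from scipy import stats

def compare_pre_post(pre: np.ndarray, post: np.ndarray,
                     n_comparisons: int = 3, alpha: float = 0.05) -> dict:
    """Paired t test on per-PST mean item scores (0-3 scale),
    with a Bonferroni-adjusted significance threshold."""
    t_stat, p_value = stats.ttest_rel(post, pre)        # paired t test
    diff = post - pre
    cohens_d = diff.mean() / diff.std(ddof=1)            # effect size for paired data
    return {
        "mean_pre": pre.mean(),
        "mean_post": post.mean(),
        "t": t_stat,
        "p": p_value,
        "significant": p_value < alpha / n_comparisons,  # Bonferroni correction
        "cohens_d": cohens_d,
    }

# Hypothetical illustration: mean QRE scores for one cohort
pre = np.array([1.1, 0.9, 1.4, 1.2, 0.8, 1.5, 1.0, 1.3])
post = np.array([1.9, 1.6, 2.2, 2.0, 1.5, 2.4, 1.8, 2.1])
print(compare_pre_post(pre, post))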

Questionnaire data were triangulated using multiple sources: PSTs’ responses to the course activities, lesson plans, and written artifacts. For each data source, we developed an analytic rubric based on our MKT-P framework. To examine the development of PSTs’ content knowledge, we analyzed shifts in their responses to the What Can You Infer from This Example? tasks before and after the PSTs examined the students’ examples, coding for mathematical correctness and types of justifications.

To examine the PSTs’ proof-specific knowledge of content and teaching, we analyzed the extent to which they integrated QRE in their lesson plans. Lesson planning is one of the core tasks of teaching, often used to assess pedagogical knowledge. Of the common criteria for evaluating lesson plans, we focused on objectives, formative assessment, explanations (adaptation of mathematical content to facilitate learning), and task design (Blömeke et al., 2008; Silver et al., 2009). Each lesson plan was scored on four parameters: (1) the ratio of QRE-related lesson objectives; (2) the ratio of QRE-specific discussion prompts in the lesson’s summary; (3) a dichotomous score of 1 or 0, based on whether an explanation of the roles of examples in proving was included; and (4) the ratio of tasks containing opportunities for students to engage with QRE. The unit of analysis was the smallest unit in which students were asked to do something, e.g., calculate, justify, generalize, or prove. Summing the four components yielded a numeric score ranging from 0, for lessons with no proof content, to 4, for lessons devoted entirely to QRE.
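A minimal sketch of this scoring rule in Python (our illustration; the class and field names are hypothetical, but the four components follow the description above):

# Sketch (ours) of the lesson-plan QRE integration score described above.
from dataclasses import dataclass

@dataclass
class LessonPlan:
    qre_objectives: int              # objectives addressing QRE
    total_objectives: int
    qre_summary_prompts: int         # QRE-specific prompts in the lesson summary
    total_summary_prompts: int
    explains_role_of_examples: bool  # explicit explanation present (1) or not (0)
    qre_tasks: int                   # tasks engaging students with QRE
    total_tasks: int                 # unit of analysis: smallest student action

def qre_integration_score(plan: LessonPlan) -> float:
    """Sum of four components, each in [0, 1]; total ranges from 0 to 4."""
    return (plan.qre_objectives / plan.total_objectives
            + plan.qre_summary_prompts / plan.total_summary_prompts
            + (1 if plan.explains_role_of_examples else 0)
            + plan.qre_tasks / plan.total_tasks)

# Emily's lesson (Sect. 4.2.1): 1 of 6 objectives, 1 of 3 summary prompts,
# 12 of 17 tasks; assuming the explanation component was credited,
# the score is about 2.2, matching the reported value.
emily = LessonPlan(1, 6, 1, 3, True, 12, 17)
print(round(qre_integration_score(emily), 1))  # ~2.2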

Following lesson enactment, the PSTs watched the video of their lesson, wrote a reflection on what they noticed (about 8–9 comments per 50-min video), and responded to prompts such as: What QRE-related ideas were included in your lesson? How do you know if students understood them? We analyzed PSTs’ pedagogical learning from these reflections using the concept of reflective noticing (Buchbinder et al., 2021), which combines teacher noticing (Dindyal et al., 2021) and reflection (Moore-Russo & Wilsey, 2014). We also analyzed PSTs’ summative, post-course reflections using open coding (Patton, 2002), to identify recurring themes in PSTs’ perceived challenges with, and learning from, the course activities.

4 Results

4.1 Enhancement of PSTs’ knowledge of QRE

4.1.1 Evidence from the MKT-P questionnaire

We present evidence of qualitative and quantitative improvements in PSTs’ pre- to post-course performance. Figure 4 shows a KCS-P item in which a student, Kacey, attempted to prove a general conjecture with random examples. Successful completion of the item entails recognizing the flaw in Kacey’s argument, rating it low, and explaining the misconception.

Fig. 4 A KCS-P item: Kacey

To illustrate improvement in PSTs’ performance, consider Eva’s response. On the pre-test Eva rated Kacey’s argument high (3 out of 4) since “it is not a reasoning that can be used with only one example.” The high rating and the focus on a single example may suggest Eva’s own fragile understanding. On the post-test Eva rated Kacey’s work lower (2 out of 4) and explained:

Kacey shows a supporting example of the conjecture. Her example supports the statement. However, this one example is not enough information to prove the statement is true in general, she needs a general proof.

Here Eva used precise vocabulary and provided a clear indication of understanding the limitation of supportive examples for proving universal statements, which require a general proof.

These types of changes were typical of many PSTs, reflecting stronger content knowledge and improved ability to identify students’ misconceptions. The mean post-score on the Kacey item in years two and three increased by a full point; in year one the mean score increased by only 0.4 points, due to a high mean pre-score of 2.57 out of 3 (on a scale from 0 to 3, as described in Sect. 3.5).

Figure 5 shows a KCT-P item in which a student, Sam, conjectured that 0 and 2 are the only numbers whose product is equal to their sum. This non-existence statement is equivalent to a universal statement: all numbers except 0 and 2 do not have this property. Another student, Calvin, finds only pairs of numbers whose sum is not equal to their product and concludes that the conjecture is true. But the inability to find supportive examples is insufficient to prove non-existence. In fact, there are infinitely many pairs of numbers whose product is equal to their sum, e.g., 3 and 3/2, 4 and 4/3. Regardless of whether the PSTs noticed this, they were expected to identify the flaw in Calvin’s argument and provide feedback on his work.
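A short derivation (ours) of why infinitely many such pairs exist: setting the product equal to the sum and solving for one of the numbers gives

\[
ab = a + b \;\Longrightarrow\; b(a-1) = a \;\Longrightarrow\; b = \frac{a}{a-1} \quad (a \neq 1),
\]

so every a ≠ 1 determines such a pair; a = 3 gives b = 3/2 and a = 4 gives b = 4/3, the examples mentioned above.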

Fig. 5 A KCT-P item: Calvin

On the pre-test Bella complimented Calvin on trying multiple types of numbers but criticized the use of examples as proof, writing: “errors and weaknesses are that you can’t prove by example with just a handful.” This response raises the question of whether a larger number of examples would have been acceptable to Bella. On the post-test she wrote:

He used proof by examples to show no more exist, and that is a flaw in thinking for universal statement. Calvin tried more examples than Sam but that is not enough to prove that 2, 0 are only ones. His thinking of finding a counterexample is correct, but his counterexample must be a number that works to show 0, 2 aren’t only it. Showing proof by example here isn’t good enough.

This answer illustrates typical pre- to post-course changes in PSTs’ responses: increased use of correct vocabulary and a correct explanation of the inapplicability of proof by example. It also specifies what type of object constitutes a counterexample to the conjecture. The change in the mean score on this item was negligible in year one, but in years two and three it increased by 0.77 and 1.22 points, respectively.

Table 2 shows the aggregated mean change in pre-post-course performance on all QRE items. The number of data points is the product of the number of participants and the number of QRE items.

Table 2 Change in performance on QRE portion of MKT-P questionnaire

The effect size in year one was small, due to this cohort’s relatively high pre-course mean, which left limited space for growth. The effect size in years two and three was medium. In each year, the pre- to post-course performance improved significantly. These results may be partially explained by repeated exposure to proof content, but not entirely: despite having taken several proof-intensive courses, PSTs’ pre-course means in all years were quite low on the 0–3 scale (Table 2). Thus, PSTs’ experiences in the capstone course likely contributed to the observed growth. PSTs’ self-report data support this assumption, as this quote shows:

As a student who was previously a pure mathematics major and have taken many proof-based classes before this course, this was the only course that allowed students to truly explore these types of proof and apply the knowledge gained from these ways of proving in an alternative setting.

This quote, and similar ones, suggest that PSTs perceived their engagement with proof in the capstone course as qualitatively different from their experiences in other mathematics courses. The PSTs expressed appreciation for the opportunities to contextualize their knowledge of proof and apply it in situations approximating teaching practice.

4.1.2 Evidence from the What Can You Infer from This Example? activities

Tables 3 and 4 show the shifts in PSTs’ justifications for the truth-value of the two statements, after examining student work.

Table 3 Justifications of the truth-value of the statement: a quadrilateral whose diagonals are congruent and perpendicular to each other is a kite (N = 34)
Table 4 Justifications for the truth-value of the statement: there exist three consecutive even numbers whose sum is divisible by four (N = 34)

Almost all PSTs correctly identified the truth-value of both the universal and the existential statement. However, only 6 out of 34 PSTs correctly justified the falsehood of the universal statement by referring to the existence of a counterexample: a quadrilateral with congruent and perpendicular diagonals which is not a kite. This means that most of the PSTs’ initial justifications were incorrect (Table 3). Eight PSTs considered a square or a rhombus to be counterexamples, which is incorrect, since both are kites. Half of the PSTs wrote that “kites do not have congruent diagonals.” This is geometrically imprecise and logically inapplicable for disproving the statement.

After examining students’ examples, the number of PSTs’ incorrect justifications and irrelevant counterexamples decreased, while the number of correct responses more than doubled. These shifts occurred even before the whole class discussion, solely due to the PSTs’ interaction with the activity. The increase from 0 to 8 in the number of justifications that use both correct and irrelevant counterexamples is problematic. It suggests that the PSTs maintained their initial incorrect idea while also accepting correct counterexamples as legitimate.

For the existential statement (Table 4), almost all PSTs initially proved it with a correct supportive example, explained that such an example exists, or both. After exploring the student work, there was an increase in the number of correct general explanations that a single supportive example proves an existential statement. We interpret this as PSTs’ increased ability to verbalize the mathematical warrant behind existential proof.

The follow-up whole-class discussion centered on crystalizing content and pedagogical knowledge of proof through collective analysis of the hypothetical student work and of the PSTs’ own anonymized responses. The PSTs found this process beneficial, as this comment shows:

Sometimes, a student’s reasoning seemed valid and correct to me, but I later learned how their reasoning could be more developed, or why it was invalid. Over time, I improved upon recognizing invalid proofs and techniques.

The PSTs attributed their improvement to the situated nature of the activities and the opportunity to analyze student mathematical work both individually and collectively.

4.2 Enhancement of PSTs’ QRE-related practices

4.2.1 Lesson planning

The apply activities involved lesson planning, enactment, and reflection. The PSTs designed QRE-integrated lessons on a variety of mathematical topics and grade levels, such as equations, proportional reasoning, functions, congruent triangles, parallel lines, standard deviation, and matrix operations.

The success of QRE integration in the lesson plans varied. There were 16 lesson plans (47%) with no or low QRE integration, with scores of 0–1.3 (see the Method section). For example, Jane’s high-school-level lesson on matrices had a QRE integration score of 0.4 out of 4. The plan contained no explanation of quantified statements or the role of examples in proving, and no relevant summary questions, as Jane expected students to “learn about quantified statements indirectly.” Of the eight tasks in the lesson plan, two were True–False questions about matrix multiplication, one universal and one existential.

There were 12 lessons (35%) with medium QRE integration scores of 1.3–2.6. These lessons mentioned QRE in either the objectives, explanations, or summary, and/or had a relatively high ratio of QRE-related tasks. For example, Emily’s lesson on quadratic functions scored 2.2 on QRE integration. Of the six lesson objectives, one was “students will learn about quantification and the role of examples in proving.” Of the 17 tasks, 12 were devoted to QRE. Emily had students examine six Always–Sometimes–Never questions about quadratic functions, e.g., “knowing the vertex and the y-intercept we can graph the parabola,” and then consider how many examples would be sufficient to prove/disprove each statement, depending on whether it is always, sometimes, or never true. Emily used these questions to discuss the role of examples in proving/disproving quantified statements. One of the three lesson summary questions assessed student understanding of this topic.

Six lesson plans (18%) had high QRE integration scores of 2.6–4. Such lessons contained explicit explanations of the role of examples in proving/disproving quantified statements, QRE-related objectives and summative questions, and at least half of the tasks dealt directly with QRE. For example, in Silvia’s lesson on congruent triangles (QRE integration score of 3.15), students had to prove or disprove eight pairs of quantified statements, which differed only by the type of quantifier (Fig. 6). Moreover, students had to formulate the statements themselves from the given information about the pairs of triangles. The lesson contained an exposition on proving/disproving quantified statements; three out of four lesson objectives and two out of five summative questions specifically addressed QRE.

Fig. 6 Silvia’s worksheet integrating QRE in a lesson on congruent triangles

A prevalent theme in the PSTs’ post-course reflections was that lesson planning was “the most challenging” but also “most worthwhile” aspect of the course. Angie wrote:

While incorporating the proof themes into our lessons was challenging, it was also very eye-opening into the multitude of ways that higher-level mathematics topics can be brought into lower-level subjects.

In summary, 53% of the lesson plans had medium or high QRE integration scores, suggesting that despite the challenges, about half of the PSTs succeeded in designing lesson plans that creatively integrated QRE with regular mathematics topics. Notably, 60% of the lessons used True–False or Always–Sometimes–Never task formats. The post-course reflections provided supportive evidence that PSTs drew inspiration from the course activities in their lesson plans.

4.2.2 PSTs’ learning from reflection on enacted lessons

We outline the main findings of the analysis of PSTs’ reflections on the enacted lessons to illustrate their effect on PSTs’ pedagogical growth (for more details see Buchbinder et al., 2021). Effective reflection entails teachers noticing multiple aspects of the classroom environment, critically analyzing them, and making connections to past experiences, theoretical principles, and future actions (Moore-Russo & Wilsey, 2014). The analysis revealed four broad categories of PSTs’ noticing: mathematical content, teaching, students, and interactions. The two modal categories, together accounting for 45% of the codes, were PSTs’ noticing of their own teaching of mathematics and of their interactions with students. This is typical of novice teachers, who tend to focus on aspects of the classroom situation that directly involve them (Dindyal et al., 2021).

Next, we used literature-based categories to identify evidence of PSTs’ learning from reflection (shown in Table 5 in descending order of frequency).

Table 5 Categories indicating PSTs’ learning from reflection

Given the foci of PSTs’ noticing, it is not surprising that the modal category (33% of codes) was reflecting on one’s teaching in relation to student learning. Post-course reflections support this observation, as this comment shows:

It was a learning experience to teach the lesson, but a great deal of learning occurred watching the videos because I could analyze my lesson in depth. I was able to go back in time to my lesson and see how what I said or what I did affected the discourse that took place.

The added value of this analysis is illustrating that PSTs’ reflections bear characteristics of effective reflection, which may have contributed to PSTs’ learning.

5 Discussion

5.1 Summary and limitations

We presented evidence of PSTs’ learning from the Quantification and the Role of Examples in Proving (QRE) module and from the course overall. The evidence of strengthened content knowledge lies in the significant improvement in PSTs’ performance on the QRE portion of the MKT-P questionnaire and in the increased number of correct justifications on the What Can You Infer from This Example? activities, even prior to the whole class discussion. The evidence of enhanced pedagogical knowledge and practices came from PSTs’ lesson plans and reflections on enacted lessons. PSTs’ written comments provide additional support, connecting course activities to their learning, although self-reported data should be treated with caution.

These results must be interpreted within the methodological limitations of the study. First, the number of participants was small, which does not allow for generalization of the outcomes. Second, the feasibility of assessing MKT with questionnaires has been contested due to the situated nature of teacher knowledge (Charalambous, 2020). Our study followed the established methodological tradition of using questionnaires for assessing cognitive aspects of MKT-P (cf. Krauss et al., 2008; Tatto, 2013), while practices were assessed using scenario-based instruments (e.g., Zazkis et al., 2013) and a lesson enactment cycle. Third, since the course design and measurements are aligned, as is typical of design research (Sandoval, 2014), the observed improvement in MKT-P scores may be, at least partially, attributed to this alignment. To mitigate these challenges we used multiple data sources to triangulate evidence of PSTs’ enhanced knowledge of QRE. Another limitation relates to the nature of MKT: if teacher knowledge or competence is special to the teaching profession and grows with experience (Krauss et al., 2008; Shulman, 1986), it is unclear how much change can occur in a single semester. We addressed this challenge by treating lesson enactment as a non-assessed learning experience, and by considering only lesson planning and reflective noticing as indicators of growth of MKT-P (cf. Blömeke et al., 2008; Dindyal et al., 2021).

Finally, the absence of a control group, although typical of design research (Gravemeijer & Prediger, 2019), makes it difficult to attribute the outcomes to the course. In a related study (Buchbinder et al., 2022) we partially addressed this issue by comparing PSTs’ MKT-P performance to that of other mathematically knowledgeable groups, such as undergraduate mathematics majors and in-service secondary teachers. That study showed that PSTs’ post-course performance was closer to that of in-service teachers, who outperformed all other groups. However, such a comparison is not a substitute for a controlled study, which can be conducted in the future.

5.2 Design principles: crystalize–connect–apply

University mathematics programs seek to bridge the double discontinuity (Klein, 1932) and to address the needs of future mathematics teachers in various ways (Tatto, 2013). Instructors may explicate connections between university and school mathematics by introducing tasks grounded in secondary contexts or by adopting curricula emphasizing such connections (e.g., Wasserman et al., 2023). To the best of our knowledge, these efforts are less common with respect to the topic of proof. Our study addressed this gap by designing a novel capstone course bridging university-level proof and secondary teaching. Although the course targets both content and pedagogical aspects of MKT-P, its focus is inherently mathematical, organized around four proof themes. This approach asserts the primacy of subject matter knowledge in teachers’ knowledge (Harel, 2008) and positions teaching as a form of applied mathematics (Stylianides & Stylianides, 2010).

Since adding a whole course to an existing program may not be feasible for most universities, it is important to clarify the contributions of our study. The outcomes of design research are not intended to be generalized in the same way as those of experimental research, but the design principles generated in the retrospective analysis may apply beyond the research context (Gravemeijer & Prediger, 2019). The motivation for the three design principles, crystalize–connect–apply, came from extensive analysis of the research literature (see the Method section), which typically treats each principle separately. Our study distills these design principles from various strands of literature and illustrates how they can be integrated into a holistic design, embodied in specific activities, which correspond to three types of learning opportunities afforded by the course.

In the QRE module, the crystalize design principle, Use activities that strengthen (crystalize) PSTs’ content knowledge specific to proof, was embodied in the activities What Can You Infer from This Example?, True–False, and Always–Sometimes–Never. These activities aimed to crystalize PSTs’ knowledge of the roles of examples in proving/disproving quantified statements, relying solely on content knowledge. The use of secondary mathematics content in the statements was intended to focus PSTs’ attention on the logical aspects of QRE rather than on the complexity of university-level content (Dawkins, 2017). Concurrently, the use of secondary mathematics content in these tasks supported the second design principle: Connect university-level knowledge of proof with secondary school mathematics. In our study, this principle entails two types of connections: (1) mathematical connections between university-level proof and the secondary curriculum, and (2) pedagogical connections to student mathematical conceptions. In the QRE module these connections were closely intertwined and linked with the first design principle. That is, the opportunities to strengthen content knowledge of QRE were situated in secondary classroom contexts and linked with pedagogical opportunities to examine students’ conceptions. By presenting mathematical arguments as products of student work, the activities support the positioning of PSTs as future teachers, affecting their interactions with the tasks (e.g., Baldinger & Lai, 2019).

The crystalize and connect design principles may be implemented in a variety of courses. Tasks requiring analysis of fictitious students’ arguments have been successfully used in university courses such as Calculus, Real Analysis, Abstract Algebra, and capstone courses to strengthen connections between university and school mathematics (Álvarez et al., 2022; Wasserman et al., 2019; Winsløw & Grønbæk, 2014). Analyzing written arguments, identifying mistakes, and responding to incorrect solutions provide students in university mathematics courses with opportunities to reflect on their own mathematical understanding, become aware of their own possible misconceptions, and engage in self-explanation and justification (Hodds et al., 2014). Thus, tasks embedding the crystalize and connect design principles could benefit PSTs, and other majors, in a variety of university mathematics courses. Future studies may examine applications of our tasks, or modifications thereof, in proof-intensive university courses.

The apply design principle, Provide opportunities to apply (enact) content and pedagogical knowledge specific to proof in environments approximating school teaching, was realized through the plan–enact–reflect lesson cycle. Originally, we envisioned this as an assessment component, but early on we realized that it is a learning opportunity for the PSTs to develop practical skills for integrating proof in teaching mathematics. The benefits of including this design principle in our study are evident in the relative success of PSTs in designing QRE-oriented lessons, the improved performance on pedagogical items of the MKT-P questionnaire, and PSTs’ self-reports. The apply design principle emphasizes the importance of engaging future teachers with approximations of practice (Grossman et al., 2009) and the value of reflecting on one’s practice (Moore-Russo & Wilsey, 2014).

Implementing the apply design principle in our study required conditions that may be challenging to replicate: an instructor specializing in mathematics education, a course offered to PSTs exclusively, and coordination with local schools. However, some alternatives to lesson planning and enactment can be utilized in other university courses. For example, scripting tasks and lesson plays, in which students explain a concept or a procedure in the form of a written scenario and visual images, have been used in courses such as Abstract Algebra and problem solving for secondary teachers, among others (Zazkis et al., 2013; Zazkis & Herbst, 2017).

Our study serves as an existence proof of the possibility of supporting PSTs’ learning to teach reasoning and proving at the secondary level through a university mathematics course. Future studies may explore the effects of individual design principles, modules, or activities in other settings.