
For an undergraduate statistics course, I am developing a standardized list of point deductions with the TAs (doctoral students) so that graders are consistent in how they deduct points for intermediate errors. For example, most problems are worth 10 points total, and my proposed deductions for intermediate math errors are:

  • -2 pts, erroneous +, -, *, /
  • -2 pts, erroneous sign, e.g. 3.02 instead of -3.02
  • -3 pts, failed to square, e.g. (x) instead of (x)^2
  • -3 pts, failed to take square root, e.g. (x) instead of sqrt(x)

Suppose that, after grading, you discover on a particular exam that the final answers to five 10-point questions are all incorrect, each because of a single minor -2 point intermediate error. Losing only 2 points per question, the student could still score 80% (40/50).
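To make the arithmetic concrete, here is a minimal sketch in Python of how the deduction table plays out over such an exam; the error codes, function name, and data structures are my own illustration, not part of the course materials:

```python
# Hypothetical encoding of the proposed deduction list; graders would record
# a list of error codes per question.
DEDUCTIONS = {
    "arithmetic": 2,  # erroneous +, -, *, /
    "sign": 2,        # e.g. 3.02 instead of -3.02
    "square": 3,      # (x) instead of (x)^2
    "sqrt": 3,        # (x) instead of sqrt(x)
}

def question_score(errors, total=10):
    """Score one question by subtracting all recorded deductions (floored at 0)."""
    return max(total - sum(DEDUCTIONS[e] for e in errors), 0)

# Five 10-point questions, each with a single arithmetic slip:
exam = [["arithmetic"]] * 5
print(sum(question_score(q) for q in exam), "/", 50)  # 40 / 50 = 80%
```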

However, in statistics there is a contextual element to every question, not just solving for a numerical answer: in addition to the worked problem, students need to write a text-based response for each of the following:

  • (2 pts) state whether the hypothesis test is significant or not
  • (2 pts) state whether the null hypothesis is rejected or accepted
  • (2 pts) state whether the p-value is less than 0.05 or not.

So if a single minor (-2 point) intermediate error causes an incorrect final numerical answer, the student will also respond incorrectly to the text-based items above.

Thus, would you take off, e.g., -2 points for the intermediate error, plus further points for the incorrect final numerical answer, as well as -6 points for missing the three text-based sub-items listed above?

In other words, for a complex (multi-step) algebra or calculus question in which only a minor intermediate step was erroneous, would you deduct only the -2 points, or would you also deduct for the incorrect final numerical answer?

Maybe I could propose that the TAs augment the point-deduction list with the following (a sketch of the combined effect appears after the list):

  • -1 pt, incorrect final numerical answer
  • -1 pt, incorrect statement of whether the hypothesis test is significant
  • -1 pt, incorrect statement of whether the null hypothesis is rejected or accepted
  • -1 pt, incorrect statement of whether the p-value is less than 0.05
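Under this augmented list, the worst case for a single slip would look like the following rough sketch (the item names are hypothetical; it assumes one -2 intermediate error cascades into the final answer and all three text items):

```python
# Hypothetical worst case: one -2 intermediate error also costs the -1
# final-answer deduction and all three -1 text items.
FOLLOW_ON = {"final_numeric": 1, "significance": 1, "null_decision": 1, "p_vs_alpha": 1}

def cascaded_score(intermediate_penalty=2, total=10):
    """Score a 10-point question when every follow-on item is also wrong."""
    return total - intermediate_penalty - sum(FOLLOW_ON.values())

print(cascaded_score())  # 4 -> five such questions would give 20/50 = 40%
```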

4 Answers


When designing a grading system for a question on an exam, I would identify the key skills being tested by that question. This can involve looking at other questions on the exam: if other questions test the same skill, I like to weight it less unless it's a vital skill. A good exam is actually created starting from a list of skills you want to test.

Then allot some points for each key skill being tested on the problem. For each skill, I would suggest either two possible scores (all or nothing) or maybe an additional "demonstrated a significant partial understanding of the skill" score.

I advise against emphasizing skills which are not taught or reviewed in the course. For instance, I won't deduct points for an arithmetic mistake, since arithmetic is not a skill taught in the course. However, there are related skills that are taught: using the right formula, knowing how to use that formula (what plugs in where), and checking to make sure the answer makes intuitive sense.

What this means for multi-step problems is: if the student got step 1 wrong, they still get points for step 2 if:

  • Their answer would have been correct if the answer to step 1 had been what they said
  • They demonstrated the skill in question

So, for instance, if they get the wrong p-value on a question, but interpret that incorrect p-value correctly to reach the corresponding (but incorrect) conclusion, they get full points for demonstrating the "interpret p-values" skill. If they interpret their incorrect p-value incorrectly so as to arrive at the interpretation of the correct answer, they don't get points for demonstrating the "interpret p-values" skill.
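A minimal sketch of that follow-through rule for the "interpret p-values" skill (the helper name and the alpha = 0.05 threshold are my own assumptions, taken from the question):

```python
ALPHA = 0.05  # significance level assumed from the question

def interpret_points(student_p, student_rejects, points=2):
    """Award the 'interpret p-values' skill only if the stated decision
    is consistent with the student's OWN (possibly wrong) p-value."""
    consistent = student_rejects == (student_p < ALPHA)
    return points if consistent else 0

# Wrong p-value (0.12) interpreted consistently: full credit.
print(interpret_points(0.12, student_rejects=False))  # 2
# Wrong p-value "interpreted" to match the correct answer: no credit.
print(interpret_points(0.12, student_rejects=True))   # 0
```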

Of course the best test of a grading system is to actually look at graded work and see if it gives reasonable scores. Try whatever you choose on some sample student work before locking it in.

Also, depending on the number of exams, questions, and TAs, consider having each TA grade a specific problem across all sections. This improves uniformity and distances the grader from the student.

  • There are actually both graders and TAs for the course. TAs are committed to holding office hours to assist students, whereas graders are managed by the TAs and only grade quizzes, homework, and exams. Graders agree among themselves to each grade a specific set of questions on all assessments to minimize bias. Given your skills-based approach, if a student misses every question on a calculus exam, but only because of algebraic mistakes, and correctly performed the calculus steps, would you give them 100%?
    – wjktrs
    Commented Nov 29, 2022 at 5:24
  • This is an idealization, but yes, I don't feel that whether a student gets an answer correct or not should impact their grade. There are a few subtleties, such as the fact that I include a review of some precalculus skills in my calculus courses (which do impact students' grades on the questions intended to test those skills). The really valuable information for students and educators coming out of an assessment is what skills students have demonstrated proficiency in and which they haven't yet (you can just report that and skip the number grades entirely!)
    – TomKern
    Commented Nov 29, 2022 at 6:24
  • They are both important (method and getting the right answer). And getting the right answer is one of the easiest and least ambiguous things to check. Furthermore, it's very important for students to develop the facility to NOT make "dumb mistakes", because they confuse you and hold you back in longer problems, derivations, and even rigorous topics like epsilon-delta proofs or series solutions.
    – guest
    Commented Nov 30, 2022 at 20:07
  • If you want, you can include "avoiding mistakes over long calculations under exam pressure" as a skill, and even include specific problems to test that skill. But if you test that skill, you should teach it, and pay close attention to how much that skill contributes to students' final grades. It seems like the poster of the original question was concerned that small mistakes contributed too much to students' final grades. It's worth noting that this skill doesn't often transfer to the real world: real-world situations that require accurate computation often give people more time to check work.
    – TomKern
    Commented Nov 30, 2022 at 22:03
  • @guest I have dyscalculia. I make dumb computational mistakes all the time because my brain is wired up in such a way that I am bound to make these mistakes. In a real world setting, when I have time and resources to check my work, I am perfectly capable of finding my errors and correcting them (I mean, I did manage to earn a PhD, in part on the back of some rather involved hard analysis). But in an exam setting, under pressure, I will make mistakes. I do not believe that these kinds of mistakes are nearly as problematic as fundamental conceptual misunderstandings.
    – Xander Henderson
    Commented Dec 1, 2022 at 12:59

I favor an additive grading scheme, where points are earned toward a possible maximum (say 10), instead of deducting points for the myriad possible mistakes one could make. Here, I would adopt a set of markers to look for, awarding points when they appear in the written work. This could help in standardizing your grading.

To avoid the situation you mentioned (where a student loses 6 points because they came to the wrong numerical conclusion and thereby has the wrong verbal interpretation), I might recast your items as markers to award points for, like:

  • (+2 pts) correctly/appropriately calculate the p-value
  • (+1 pt) state whether the p-value is less than 0.05
  • (+1 pt) if answer above matches the calculated p-value
  • (+1 pt) state whether the hypothesis test is significant or not
  • (+1 pt) if statement above matches the calculated p-value
  • (+1 pt) state whether the null hypothesis is rejected or accepted
  • (+1 pt) if statement above matches the calculated p-value
  • etc.

This way, a student can get the wrong p-value, but still answer the rest of the problem "correctly" and receive points.
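Here is a minimal sketch of this additive rubric, under my own assumption that each "matches" point is checked against the student's own calculated p-value (the dictionary keys and function are illustrative, not prescribed):

```python
ALPHA = 0.05  # threshold assumed from the question

def additive_score(true_p, work, tol=1e-3):
    """Hypothetical additive rubric: points are earned, never deducted.
    `work` records what appears on the page; None means no statement made."""
    p = work["p_value"]
    pts = 2 if p is not None and abs(p - true_p) < tol else 0
    own_sig = p is not None and p < ALPHA  # what the student's own p implies
    for item in ("p_below_alpha", "significant", "reject_null"):
        stated = work.get(item)  # True/False if stated, None if missing
        if stated is not None:
            pts += 1                              # +1 for making the statement
            pts += 1 if stated == own_sig else 0  # +1 if consistent with own p
    return pts

# Wrong p-value (0.30 vs. true 0.02), but every statement consistent with it:
student = {"p_value": 0.30, "p_below_alpha": False,
           "significant": False, "reject_null": False}
print(additive_score(0.02, student))  # 6 of a possible 8
```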

  • I do like what you are proposing, since you essentially work from the bottom up and add points. For example, was the final numerical answer correct? Was the statement about the final answer correct? etc. That's easier to standardize - especially if you can't identify an intermediate point-reduction marker, i.e., what was done incorrectly.
    – wjktrs
    Commented Nov 28, 2022 at 23:20
  • @jaskij The real problem with this grading method is the work involved. If a student makes a mistake in step 1 or 2, but every subsequent calculation is correct given the initial wrong step, you or your doctoral graders will have to redo the same calculations with the student's wrong value, since you now also grade the path the student takes, not just the result.
    – eagle275
    Commented Nov 29, 2022 at 11:58
  • @jaskij, I once confused a teacher by using a thoroughly unorthodox method to find the vertex of a parabola from an equation. Instead of using algebraic manipulation to put it into standard form, I took the derivative (which cleared out most of the clutter) and found the zero of the resulting line.
    – Mark
    Commented Nov 29, 2022 at 23:35
  • I use this system with a couple of additional variations. 1. Correct step answers without any supporting work or justification get 0 credit. 2. If a step is done correctly but the numeric answer or expression is off because of an error in a previous step, I don't penalize the error twice. 3. If the student makes an error in one step and undoes the error in a later step (without explanation), they lose credit on both steps.
    Commented Nov 30, 2022 at 22:11
  • @user0123456789 Regarding reminding students not to make small math errors: this is unlikely to be helpful; no one intentionally decides to make, or probably even wittingly makes, a math error. (It might be helpful to remind them to check for small math errors, but I think we as instructors all too easily forget the huge stress of the test environment. I sometimes warn: "I will ask you this question, to which you will want to give the wrong answer ___, whereas the right answer is ___". It doesn't make that much difference!)
    – LSpice
    Commented Dec 1, 2022 at 18:37

I advise being less intricate and putting less load on the graders. Personally, I would go with all, half, or zero credit for every question (a minimal scoring sketch follows the list below):

  • All: a correct answer, with some reasonable explication (not an essay, but also not a bare number).
  • Half: shows some decent knowledge of the process but founders partway, or has a "dumb mistake" in the algebra/arithmetic.
  • Zero: for a mess.
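The bookkeeping is then trivial; a minimal sketch, with labels and weights of my own choosing:

```python
# All/half/zero scheme: the grader picks one label per question,
# and the rest is arithmetic. Weights are the obvious 1.0 / 0.5 / 0.0.
WEIGHTS = {"all": 1.0, "half": 0.5, "zero": 0.0}

def exam_score(labels, points_per_question=10):
    """Total score for a list of per-question labels."""
    return sum(WEIGHTS[l] for l in labels) * points_per_question

print(exam_score(["all", "half", "all", "zero", "half"]))  # 30.0 of 50
```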

After all, there is more than one question on a test, more than one test in the course, and more than one course that the students take. Life is statistical, so these things even out.

Don't invest too much time in intricate grading. The cost/benefit is not worth it.

  • I don't know how well students would take discrete grading that weights each question's total points by 0, 0.5, or 1, but there is some merit to this: it prevents the "nickel-and-diming" of all the points taken off by a grader (worst case) when negotiating with a student during re-grading.
    – wjktrs
    Commented Nov 28, 2022 at 23:26
  • I think this approach makes sense for a course where there are a lot of "tries" (i.e., graded problems) for a student (e.g., frequent tests, long tests, etc.), but it could be pretty severe if you stack it up with other choices professors sometimes make for their own convenience (only one midterm and one final, <= 5 questions on the one midterm, etc.).
    – Steve
    Commented Nov 30, 2022 at 22:01
  • Good point and agreed, Steve. I hate that approach personally (but realize it is common in American colleges). I loved my math courses in high school, with a weekly period-long Friday exam. It felt high-stakes enough to drive study, effort, and seriousness, but also gave frequent feedback and was not too final. My college was very high-school-like in attitude as well (not just a midterm and final), but I went to a place that was stuck in the 19th century (in a good way; it also had small classes, etc.)
    – guest
    Commented Nov 30, 2022 at 22:39

Part of the problem is writing the exam questions in the first place. Others have noted that, when designing a grading rubric, you should identify what the key skills in the problem are. This seems backwards to me. First identify the key skills that you want to test, and then write exam questions which hit those skills.

Once the exam has been written, I would very much recommend that you make the rubric as simple as possible, given that you want consistent grading across a (possibly large) group of TAs. I typically grade on a 3-point scale:

  • [3] The answer is nearly perfect.
  • [2] The answer contains errors that are mechanical in nature (e.g. missing signs, incorrect computations, etc), but not conceptual. The mechanical errors are minor or are not central to the skill(s) being tested by the question.
  • [1] There are serious mechanical errors and/or conceptual errors, but something correct or relevant has been written on the page, in a way which clearly demonstrates at least some conceptual understanding.
  • [0] The answer is essentially ungradable (it is blank, or nonsensical, or whatever).

I will note that my [2] and [1] are essentially the 50% category in this answer. I think that it is worthwhile to distinguish between "dumb" arithmetic mistakes and more fundamental conceptual errors. That being said, my scheme is essentially the same idea—simple and quick to implement.

Experience has shown me that students really don't like this system. The students I work with are used to a grading scale in which an A is anything over 90%, a B is anything over 80%, and so on. Thus when they get 2 points out of 3 (67%), they feel like they are failing (since 67% is a D). However, my feeling is that [2] represents, roughly speaking, B or C level response, while a [1] represents a D or low C. This is something which has to be thought about when assigning letter grades—either add more "free" points into the course elsewhere, or grade on a different percentage scale, or accept that more students are going to fail.

If you are really stuck on a 90/80/70% scale, then (for example) remap [3] to 5 points, [2] to 4 points, [1] to 3 points, and [0] to 0 points.

If you want to weight different questions differently, continue to grade each on the 3-point scale, but multiply by a per-question weight (easy-peasy). Students will be most happy if all of the questions are worth multiples of 3 points (because they don't really want to think about weighting).
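For example, a minimal sketch combining the remap above with per-question weights (both the remapped values and these weights are illustrative, not prescribed):

```python
# Remap the 0-3 rubric onto a friendlier percentage scale, as described above,
# then apply (hypothetical) per-question weights.
REMAP = {3: 5, 2: 4, 1: 3, 0: 0}

def weighted_total(rubric_scores, weights):
    """rubric_scores: one 0-3 score per question; weights: relative weights."""
    return sum(REMAP[s] * w for s, w in zip(rubric_scores, weights))

def max_total(weights):
    return sum(REMAP[3] * w for w in weights)

scores, weights = [3, 2, 1], [1, 2, 1]
print(weighted_total(scores, weights), "/", max_total(weights))  # 16 / 20 = 80%
```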

In any event, the overall goal is to construct a grading scheme which is fast (you don't want your graders to have to spend a lot of time on things), and consistent (different graders should score a given response in the same way). Creating lots of deductions or opportunities for partial credit makes grading slower, hence I would tend to avoid it. Consistency is also easier to attain if there are fewer categories.

  • I don't think that the "discrete" nature is problematic. The scale from 0-10 is also "discrete", it just has more noise. The biggest difficulty that I have had with students vis-a-vis this grading scheme is that it is different from what they are used to, and people don't like it when you move their cheese. But it is a conversation which can be had with students.
    – Xander Henderson
    Commented Nov 29, 2022 at 23:49
  • I think the student-psychology aspect you mention is important, and it is really punitive (and honestly not great in terms of differentiating answer quality) to score an answer with an arithmetic error and one with a conceptual error equally, with both meriting the "50% = F" described in one of the other responses to this question.
    – Steve
    Commented Nov 30, 2022 at 22:05
  • "Dumb errors" are a sort of static that jams up the works. I'm not looking for two-person, submarine-nuclear-codes certainty, but very strong accuracy (98-99%+) is desirable for basic calculational work. It's like typing or piano playing... gotta hit the right keys. (Yes, own goal with my typing errors!) Yes, concepts are important also... but unless you are teaching Olympiad students, those will come out easily, too.
    – guest
    Commented Dec 1, 2022 at 9:48
  • In the working world, you aren't getting a passing grade for an Excel spreadsheet that has a dumb mistake. Concepts AND bug-free execution are both important. 50% is enough really... especially since there are MULTIPLE questions.
    – guest
    Commented Dec 1, 2022 at 9:48
  • @guest Exams are not a real world setting. In the real world, there are checks and balances which make it possible to catch silly mistakes in a way that is generally not possible on a timed exam in a mathematics classroom. Real code gets debugged. Real computation is done with a calculator or computer (either as a check against by-hand work, or checked by by-hand work). A student who has mastered the conceptual content of a course is likely to be able to spot mechanical errors in a real world setting, even if they make mechanical errors on an exam.
    – Xander Henderson
    Commented Dec 1, 2022 at 12:47
