Assessment support

toolbox

Assessment analysis

After the test has been taken, it is highly recommended to run an analysis on the results, preferably before the grades are communicated. Based on the analysis you can:

Assess the overall student results

First of all you can have a look at and reflect upon the student results for the whole test. You can look at:
+  How many students passed/failed? Number/percentage?
+  What is the range and distribution of the grades? Frequencies?
+  What is the average grade?
+  Standard deviation/variance (this tells you how spread out the scores are around the mean).
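As a sketch, these overall statistics can be computed with a short script; the grades, scale and pass mark below are hypothetical:

```python
# Sketch: overall test statistics for a hypothetical set of grades.
from statistics import mean, stdev

grades = [4.5, 5.8, 6.1, 6.7, 7.0, 7.2, 7.5, 8.1, 8.6, 9.0]  # assumed 1-10 scale
pass_mark = 5.5                                               # assumed pass mark

passed = sum(1 for g in grades if g >= pass_mark)
print(f"passed: {passed}/{len(grades)} ({100 * passed / len(grades):.0f}%)")
print(f"range: {min(grades)} - {max(grades)}")
print(f"average: {mean(grades):.2f}")
print(f"standard deviation: {stdev(grades):.2f}")
```

For the distribution you could additionally bin the grades (e.g. with `collections.Counter` on rounded grades) to see the frequencies.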

To reflect: are these results as expected? Can you account for and explain the results?

Test analysis on item level

The next step is to look at the values for each of the items (questions). For open questions, this can be done in a qualitative, holistic way and/or in a quantitative way. Based on the conclusions, measures can be taken for scoring and grading and for evaluative purposes.

Qualitative way: during the grading process you get a notion of the common mistakes that were made. It may be necessary to adapt your answering and scoring model (check the already scored tests again!). You can make notes about these common mistakes and anything that stands out. If more assessors are involved, you can ask all assessors to do the same and afterwards compare and discuss the notes.

Quantitative way: calculate the psychometric values for all items, interpret the values, draw conclusions and, if necessary, take action.
The data provide signals; you still have to check what really happened. For instance, if hardly any student chose the right answer to an MC question,
it may have been a very difficult or unclear question, more than one answer may have turned out to be correct, or perhaps the wrong key (the letter indicating the correct answer) was simply entered in the system by mistake.
When digital systems such as Remindo or Contest are used, you get these data automatically.

Quantitative test item analysis

For a psychometric analysis, you can look at the following values:

  • P-values - Item difficulty

    The P-value gives an indication of the item difficulty. It shows the percentage of students who answered an item correctly.
    A low P-value (e.g. below 0.30) indicates that many students found this a difficult item; a high P-value (e.g. above 0.90) indicates that almost all students found this question easy to answer.

    For open questions
    The formula for calculating it is:   P-value = average score of the class / maximum possible score
    Example: the maximum score for question A was 20 points. Students on average got 14 points (sum of all scores / number of participants).
    P-value = 14 / 20 = 0.70 
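    As a sketch of this calculation, with hypothetical individual scores that average to 14:

```python
# Sketch: P-value for an open question = average score of the class / maximum score.
scores = [12, 14, 16, 15, 13]  # hypothetical scores of 5 students on question A
max_score = 20

p_value = (sum(scores) / len(scores)) / max_score
print(round(p_value, 2))  # 0.7
```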

    Optimal P-value for open questions
    Theoretically a question that discriminates optimally, has a P-value of 0.50. It is not too easy and not too difficult. In practice you want to construct a test with questions that are a bit easier (e.g. 0.80 or higher), questions which will be a bit more difficult (e.g. below 0.40), and most questions will be in between and in the range around 0.50.  The exact distribution is difficult to indicate, but could be 20-60-20, for example.

    For closed questions
    The formula for calculating now is:  P-value = total who answered the item correctly / total number of participants
    Example: 30 students gave a correct answer. Total number of students: 50.
    P-value = 30/50 = 0.60 

    Optimal P-value for closed questions
    Students may guess and you want to take that into account. Optimal P-value: slightly higher than midway between chance (1 divided by the number of answer options) and a perfect score (1) for the item. Formula: (1 + (1 / nr. alt.)) / 2 
    Example: take an MC item with 5 alternatives, scored 0 or 1. Guessing chance = 1/5 = 0.20 (20%). Optimal P-value: (1 + 0.20) / 2 = 0.60.
    Optimal values:  2 answer options: 0.75    ||     3 answer options: 0.67     ||   4 answer options: 0.62
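    Both formulas can be sketched in a few lines (the example numbers are taken from the text above):

```python
# Sketch: closed-question P-value and its optimal value, correcting for guessing.
def p_value(correct: int, participants: int) -> float:
    """P-value = number who answered correctly / number of participants."""
    return correct / participants

def optimal_p(n_alternatives: int) -> float:
    """Halfway between guessing chance (1/n) and a perfect score (1)."""
    return (1 + 1 / n_alternatives) / 2

print(p_value(30, 50))         # 0.6
print(optimal_p(5))            # 0.6
print(round(optimal_p(3), 2))  # 0.67
```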


  • Distractor values

    This applies only to closed questions.
    When working with closed questions, besides looking at the P-value it is also interesting to look at the A-values: the values for the alternatives, i.e. the incorrect answers. The alternatives are also known as “distractors”. You can execute a distractor analysis.

    You look at the number and percentage of students who chose each of the distractors.
    A-value =  total who choose a certain distractor / total of participants taking the test 

    The distractors should be plausible but incorrect. How to interpret the values? Look at the outliers.
    Look at very low percentages (e.g. 2%). Such an alternative is not a good distractor: it needs revision or replacement for next time (otherwise it makes the question easier).
    Look at values higher than the P-value. This can indicate that something is wrong with the question. E.g. are the answers mutually exclusive? Was the item miskeyed? Or it points to a misconception shared by many students (evaluative information).
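    A distractor analysis along these lines can be sketched as follows; the option labels, answer counts and the 5% cut-off are hypothetical:

```python
# Sketch: A-value (distractor) analysis for one MC item with hypothetical counts.
answer_counts = {"A": 30, "B": 12, "C": 7, "D": 1}  # choices of 50 students
key = "A"                                           # correct alternative
total = sum(answer_counts.values())
key_value = answer_counts[key] / total              # this is the item's P-value

for option, count in answer_counts.items():
    value = count / total
    flag = ""
    if option != key and value < 0.05:              # assumed cut-off for "implausible"
        flag = "  <- hardly chosen: revise or replace this distractor"
    elif option != key and value > key_value:
        flag = "  <- chosen more often than the key: check the question"
    label = "P-value (key)" if option == key else "A-value"
    print(f"{option}: {label} = {value:.2f}{flag}")
```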

  • Rit / Rir values - Item discrimination

    Item-total correlation (Rit)  [a.k.a. Point Biserial correlation]. 

    The Rit-value provides an indication of the correlation between the item and the total score. This item discrimination index shows the extent to which students with high overall test results also got a certain question correct, which is what you would expect. A high Rit-value indicates this.
    It becomes interesting when this is not the case: when students who did not perform well in general answered a certain question correctly, while those who did do well on the test overall chose an incorrect answer. That is a signal that something may be wrong with this question.

    The Rit ranges from -1.00 to 1.00.
    As a general rule, the value should be higher than 0.20.

    + ) Positive (high) indicates that those scoring high on the total exam answered a test item correctly more frequently than low-scoring students.
    - ) Negative indicates that low-scoring students on the total test did better on a test item than high-scoring students.

    The Rir-value is generally used when a test does not have many questions. It has the same function, but it does not include the item itself in the total score against which the item is correlated (item-rest correlation).

    For those interested, this is the Rit (point-biserial) formula:   Rit = ((Mp - Mq) / Sx) × √(p × q), where Mp is the mean total score of the students who answered the item correctly, Mq the mean total score of those who answered it incorrectly, Sx the standard deviation of the total scores, p the proportion who answered correctly and q = 1 - p.
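    As a sketch, with hypothetical item scores (0/1) and total scores for eight students, the point-biserial (Rit) calculation looks like this:

```python
# Sketch: Rit (point-biserial) for one dichotomous item, hypothetical data.
from statistics import mean, pstdev

item   = [1, 1, 1, 0, 0, 1, 0, 1]           # 0/1 score per student on this item
totals = [34, 30, 28, 18, 20, 31, 22, 27]   # total test score per student

p = mean(item)                               # proportion who answered correctly
q = 1 - p
m_p = mean(t for t, i in zip(totals, item) if i == 1)  # mean total, correct group
m_q = mean(t for t, i in zip(totals, item) if i == 0)  # mean total, incorrect group

rit = (m_p - m_q) / pstdev(totals) * (p * q) ** 0.5
print(round(rit, 2))
```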

    For those who want to learn more, the following sites are informative:  
    Use point-biserial to discriminate high and low performers | GradeHub 
    > To see how it is calculated: Item Statistics for Classroom Assessments Part 2: Computing P-Values and Point-Biserial Correlations - Educational Data Systems (eddata.com) 

  • Cronbach's alpha

    Cronbach's alpha is a measure of internal consistency. It shows the extent to which the questions of a test provide consistent information regarding the students’ mastery of the course content. 
    It takes a lot of calculation, but if you administer the test via Remindo or Contest, you will get the value automatically.

    The range is from 0 to 1.     As a rule of thumb:  < 0.70  bad / moderate    ||     > 0.80  sufficient / good

    The value is influenced by the homogeneity/heterogeneity of the items (topics) and of the group, and by the number of participants and items.
    It is not reliable for small or very heterogeneous groups, for tests with just a few questions, or for content that is not expected to be coherent (for instance two subject areas combined in one test).
    So yes, it can be used to get an impression of the test as a whole, but be careful with the interpretation.
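    For those who want to see the calculation itself, Cronbach's alpha (α = k/(k−1) × (1 − Σ item variances / variance of total scores)) can be sketched with a small, hypothetical score matrix:

```python
# Sketch: Cronbach's alpha from a students-by-items score matrix (hypothetical data).
from statistics import pvariance

# rows = students, columns = items (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

k = len(scores[0])                                    # number of items
item_vars = [pvariance(col) for col in zip(*scores)]  # variance of each item
total_var = pvariance([sum(row) for row in scores])   # variance of total scores

alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

Note that with this few students and items the value is only illustrative; as stated above, alpha is not reliable for very small groups or short tests.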

    For those who want to read more about this:
    > In this guide the formula is explained: An Instructor’s Guide to Understanding Test Reliability. Craig S. Wells &  James A. Wollack, 2003. 

Useful tools and resources

When you make use of digital test systems, such as Remindo, you will get the psychometric data automatically, which helps a lot. If this is not the case, you can for instance create your own Excel (SPSS or R) file or make use of existing template files for these data. Below are some useful resources.