Article

Estimation of Standard Error, Linking Error, and Total Error for Robust and Nonrobust Linking Methods in the Two-Parameter Logistic Model

by
Alexander Robitzsch
1,2
1
IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Stats 2024, 7(3), 592-612; https://doi.org/10.3390/stats7030036
Submission received: 29 April 2024 / Revised: 17 June 2024 / Accepted: 19 June 2024 / Published: 21 June 2024
(This article belongs to the Special Issue Robust Statistics in Action II)

Abstract

The two-parameter logistic (2PL) item response theory model is a statistical model for analyzing multivariate binary data. In this article, two groups are brought onto a common metric by applying linking methods to the 2PL model. The linking methods of mean–mean linking, mean–geometric–mean linking, and Haebara linking are investigated in nonrobust and robust specifications in the presence of differential item functioning (DIF). M-estimation theory is applied to derive linking errors for the studied linking methods. However, estimated linking errors are prone to sampling error in estimated item parameters, which artificially inflates the linking error estimates in finite samples. For this reason, a bias-corrected linking error estimate is proposed. The usefulness of the modified linking error estimate is demonstrated in a simulation study. It is shown that a simultaneous assessment of the standard error and linking error in a total error must be conducted to obtain valid statistical inference. In the computation of the total error, using the bias-corrected linking error estimate instead of the usually employed linking error provides more accurate coverage rates.

1. Introduction

Item response theory (IRT) models [1,2,3] are multivariate statistical models for multivariate binary random variables. These models are frequently used to model cognitive testing data from educational or psychological applications. For example, IRT models are operationally utilized in educational large-scale assessments [4,5], such as the Programme for International Student Assessment (PISA; [6]) study.
In this article, we only treat unidimensional IRT models [7]. Let $X = (X_1, \ldots, X_I)$ be the vector of $I$ dichotomous random variables $X_i \in \{0, 1\}$ (also referred to as items or (scored) item responses). A unidimensional IRT model [8] is a statistical model for the probability distribution $P(X = x)$ for $x = (x_1, \ldots, x_I) \in \{0, 1\}^I$, where
$$P(X = x; \delta, \gamma) = \int \prod_{i=1}^{I} P_i(\theta; \gamma_i)^{x_i} \left[ 1 - P_i(\theta; \gamma_i) \right]^{1 - x_i} \phi(\theta; \mu, \sigma) \, d\theta , \tag{1}$$
where $\phi$ denotes the density of the normal distribution, with the mean $\mu$ and the standard deviation $\sigma$. The distribution parameters of the latent variable $\theta$ (also referred to as the factor variable, trait, or ability) are contained in the vector $\delta = (\mu, \sigma)$. The vector $\gamma = (\gamma_1, \ldots, \gamma_I)$ contains all the estimated item parameters of the item response functions (IRFs), where $P_i(\theta; \gamma_i) = P(X_i = 1 \mid \theta)$ ($i = 1, \ldots, I$). The two-parameter logistic (2PL) model [9] possesses the following IRF:
$$P_i(\theta; \gamma_i) = \Psi\left( a_i (\theta - b_i) \right) \tag{2}$$
with the item discrimination $a_i$ and the item difficulty $b_i$, where $\Psi(x) = (1 + \exp(-x))^{-1}$ denotes the logistic distribution function. The 2PL model could also be estimated for non-normal distributions [10,11,12,13,14,15]. In this case, an identification constraint is typically applied to an item $X_{i_0}$ such that $a_{i_0} = 1$ and $b_{i_0} = 0$.
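To make the 2PL model concrete, the following minimal Python sketch evaluates the IRF in (2) and approximates the marginal pattern probability in (1) by a simple rectangle rule. The grid settings (`n_nodes`, `width`) and the two-item example are illustrative assumptions and are not taken from the article.

```python
import math


def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def irf_2pl(theta, a, b):
    """2PL item response function Psi(a * (theta - b))."""
    return logistic(a * (theta - b))


def normal_density(theta, mu, sigma):
    """Density of the normal distribution with mean mu and SD sigma."""
    z = (theta - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def pattern_probability(x, a, b, mu=0.0, sigma=1.0, n_nodes=201, width=8.0):
    """Approximate P(X = x) by a rectangle rule on an equidistant theta grid.

    The grid (n_nodes points within mu +/- width * sigma) is an assumption
    made for this sketch.
    """
    lo = mu - width * sigma
    step = 2.0 * width * sigma / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        theta = lo + k * step
        likelihood = 1.0
        for xi, ai, bi in zip(x, a, b):
            p = irf_2pl(theta, ai, bi)
            likelihood *= p if xi == 1 else 1.0 - p
        total += likelihood * normal_density(theta, mu, sigma) * step
    return total


# Illustrative two-item test: the four pattern probabilities sum to one.
a = [0.73, 1.25]
b = [-1.31, 1.44]
patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [pattern_probability(x, a, b) for x in patterns]
total_prob = sum(probs)
```

In practice, MML software uses more refined quadrature; the rectangle rule is chosen here only because it keeps the sketch short.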
The Rasch model [16] is obtained from the 2PL model as a particular case in which all item discriminations equal 1. Some researchers believe that the Rasch model offers particular measurement (i.e., metrological) properties in contrast to the 2PL model (e.g., [17,18,19]). However, in our view, the main advantage of the Rasch model over the 2PL model is that conditional maximum likelihood estimation is applicable [20]. Moreover, the Rasch model possesses the unweighted sum score as a sufficient statistic for θ, which offers many interpretational advantages [21,22,23,24,25,26]. There is a belief that group comparisons can only be conducted with the Rasch model because it has a so-called property of separability, which entails specific objective comparisons [27,28]. However, this reasoning is incorrect and can be disproved by empirical data [29]. In fact, any IRT model with invariant item parameters across groups allows for invariant group comparisons [7,30], although proponents of the Rasch model frequently claim otherwise [31,32].
If independent and identically distributed observations $x_1, \ldots, x_N$ of $N$ persons from the distribution of the random variable $X$ are available, the unknown model parameters of the IRT model in (1) can be estimated by marginal maximum likelihood (MML) using an expectation–maximization algorithm [33,34].
IRT models are frequently used to compare the performance of two groups in a test (i.e., on a set of items) regarding the factor variable θ in the IRT model (1). In the following, we only discuss the 2PL model. Two primary approaches can be distinguished [35]. First, concurrent calibration can be applied in which a joint IRT model is estimated in the two groups by assuming common (i.e., invariant) item discriminations and item difficulties in the two groups. While the mean and the standard deviation of θ are fixed in the first group for identification reasons, the mean μ and the standard deviation σ can be identified for the second group. Hence, these two parameters summarize the group difference regarding the factor variable θ . Second, the 2PL model can be separately estimated in each of the two groups. This approach allows items to function differently across groups, which is a property that is referred to as differential item functioning (DIF; see [36,37,38]). In the second step, the differences in item parameters are used to determine the group difference regarding the θ variable by means of a linking method [39,40,41]. The occurrence of DIF causes additional variability in the estimated mean μ and standard deviation σ [42,43,44,45,46]. Therefore, the estimated distribution parameters ( μ , σ ) depend on the choice of selected items, even for infinite sample sizes of persons. This variability is quantified in the linking error [47,48,49,50,51,52].
An anonymous reviewer pointed out that a simultaneous estimation, while allowing for group-specific DIF effects, would also be possible [53,54,55,56]. In fact, the two-step procedure of separate scaling with subsequent linking can be equivalently formulated as a one-step simultaneous estimation (i.e., concurrent calibration) with nonlinear constraints on group-specific item parameters [57].
This article investigates the computation of the total uncertainty of linking methods in a general treatment based on M-estimation theory [58]. A new bias-corrected linking error is derived, which improves the coverage rates of the confidence intervals constructed for linking estimates. In particular, the newly proposed bias-corrected linking error estimate has a smaller bias for the linking error than the estimators currently employed in the literature.
The rest of the article is organized as follows. Section 2 formalizes the linking methods in the statistical language of estimating equations (i.e., M-estimation). Section 3 presents examples of linking methods that are subsequently investigated in two simulation studies. In Section 4, M-estimation theory is applied to compute the linking error, standard error, total error, and the newly proposed bias-corrected linking error. Then, Section 5 and Section 6 present the findings from two simulation studies. Finally, the article closes with a discussion in Section 7.

2. Linking Method

In this section, we formally define linking methods as a particular M-estimation [58,59] problem. We refer to Section 3 for examples of linking methods. Throughout this paper, it is assumed that $\hat{\gamma}_i = (\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})$ contains the estimated item parameters from the 2PL model for item $i$ in both groups. One can assume that there exists a true item parameter $\gamma_i = (a_{i1}, b_{i1}, a_{i2}, b_{i2})$ in the population (i.e., for an infinite sample size). The quantity $\hat{\gamma}_i$ is a consistent estimate for $\gamma_i$ (under the scheme $N \to \infty$). The goal of a linking method consists of estimating the mean $\mu$ and the standard deviation $\sigma$ in the second group based on all the estimated item parameters $\hat{\gamma}_i$.

2.1. One-Step Linking Method

In a one-step linking method, the parameter estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ is obtained as the minimizer of a nonlinear optimization function $H$ that is a sum of terms, in which each term refers to a single item. Formally, we define
$$\hat{\delta} = \arg\min_{\delta} H(\delta) \quad \text{with} \quad H(\delta) = \sum_{i=1}^{I} h(\delta; \hat{\gamma}_i) . \tag{3}$$
Now, we define the partial derivatives $H_\mu = \partial H / \partial \mu$, $H_\sigma = \partial H / \partial \sigma$, $h_\mu = \partial h / \partial \mu$, and $h_\sigma = \partial h / \partial \sigma$, as well as $H_\delta = (H_\mu, H_\sigma)$ and $h_\delta = (h_\mu, h_\sigma)$ for brevity. The parameter estimate $\hat{\delta}$ solves the following estimating equation:
$$H_\delta(\delta) = \sum_{i=1}^{I} h_\delta(\delta; \hat{\gamma}_i) = 0 . \tag{4}$$
M-estimation theory is applied for computing a variance estimate for $\hat{\delta}$ under the assumption that $I \to \infty$ and that $\{\hat{\gamma}_i\}_{i = 1, \ldots, I}$ are independent realizations from a distribution [58]. In this sense, the variance estimate quantifies the uncertainty of $\hat{\delta}$ with respect to the selected set of items.

2.2. Two-Step Linking Method

In a two-step linking method, the standard deviation $\sigma$ is estimated in the first step. Afterward, the mean $\mu$ is estimated in the second step. The estimate $\hat{\sigma}$ of the standard deviation $\sigma$ is determined as a root of a nonlinear equation that is additive in items:
$$H_\sigma(\sigma) = \sum_{i=1}^{I} h_\sigma(\sigma; \hat{\gamma}_i) = 0 . \tag{5}$$
In the second step, the estimate $\hat{\mu}$ is obtained as the root of the estimating equation
$$H_\mu(\mu, \hat{\sigma}) = \sum_{i=1}^{I} h_\mu(\mu, \hat{\sigma}; \hat{\gamma}_i) = 0 . \tag{6}$$
Note that the two-step method can be alternatively considered as a one-step linking method, because the estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ solves the stacked estimating equation (see [58])
$$H_\delta(\delta) = 0 , \quad \text{where} \quad H_\delta(\delta) = \begin{pmatrix} H_\mu(\mu, \sigma) \\ H_\sigma(\sigma) \end{pmatrix} \quad \text{for} \quad \delta = (\mu, \sigma) . \tag{7}$$
Hence, one-step and two-step linking methods can be simultaneously analyzed using M-estimation theory.

2.3. Statistical Inference

This article is concerned with estimating the uncertainty of the estimated mean and standard deviation that are contained in the vector $\hat{\delta}$. The uncertainty is due to the sampling (or selection) of persons (i.e., under the scheme $N \to \infty$) and the selection (or sampling/modeling the randomness in group comparisons) of items (i.e., under the scheme $I \to \infty$).
Most of the linking literature treats the uncertainty in $\hat{\gamma}$ by computing a standard error (SE) of $\hat{\delta}$ for a fixed number of items [60,61,62]. In this case, the variability arises because the estimated item parameters $\hat{\gamma}_i$ are subject to sampling variability.
We have argued that the estimated item parameters $\hat{\gamma}_i$ have a population analogue $\gamma_i$ for an infinite sample size. If there is a model error (i.e., DIF), the estimated linking parameter $\hat{\delta}$ depends on the chosen set of items, even for an infinite sample size. This variability due to item selection is referred to as the linking error (LE; see [48,63]), and its variance estimation operates under the scheme $I \to \infty$.
The total error (TE) includes both sources of uncertainty: the standard error due to the randomness in persons and the linking error due to the randomness in items [47,50,51,64]. However, it has been argued that the ordinary linking error estimate is partly affected by the sampling error [65]. In this article, a bias-corrected linking error, which results in a bias-corrected total error estimate, is examined in order to reduce the portion of the estimated linking error variance that is due to the sampling error. We outline the statistical underpinnings of the estimators in Section 4.

3. Robust and Nonrobust Linking Methods

In this section, we discuss the most frequently employed linking methods for the 2PL model.

3.1. Mean–Mean Linking (MM)

The mean–mean (MM) linking method is a two-step linking method [40]. The standard deviation σ is estimated in the first step as the ratio of the means of the item discriminations in the two groups.
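$$\hat{\sigma} = \frac{\frac{1}{I} \sum_{i=1}^{I} \hat{a}_{i2}}{\frac{1}{I} \sum_{i=1}^{I} \hat{a}_{i1}} . \tag{8}$$
In the second step, the mean $\mu$ is estimated by
$$\hat{\mu} = -\hat{\sigma} \, \frac{1}{I} \sum_{i=1}^{I} \hat{b}_{i2} + \frac{1}{I} \sum_{i=1}^{I} \hat{b}_{i1} . \tag{9}$$
The estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ can be written as the solution $\delta = (\mu, \sigma)$ of the estimating equations
$$\sum_{i=1}^{I} \left( \sigma \hat{b}_{i2} - \hat{b}_{i1} + \mu \right) = 0 \quad \text{and} \quad \sum_{i=1}^{I} \left( \sigma \hat{a}_{i1} - \hat{a}_{i2} \right) = 0 . \tag{10}$$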
The MM linking method is considered nonrobust to outliers in the item parameter differences between groups, because the mean is an outlier-sensitive location measure.
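The two MM steps can be sketched in a few lines of Python. The example assumes DIF-free items and an infinite sample size, so that the separately calibrated second-group parameters equal σ·a and (b − μ)/σ (this follows from standardizing θ in the second group); the five items and the function name `mean_mean_linking` are illustrative choices.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_mean_linking(a1, b1, a2, b2):
    """Mean-mean linking: returns (mu_hat, sigma_hat).

    a1, b1: estimated discriminations/difficulties in group 1
    a2, b2: estimated discriminations/difficulties in group 2
    """
    # Step 1: ratio of mean discriminations.
    sigma = mean(a2) / mean(a1)
    # Step 2: mean difference of difficulties after rescaling group 2.
    mu = mean(b1) - sigma * mean(b2)
    return mu, sigma


# Illustrative DIF-free example with true mu = 0.3 and sigma = 1.2.
a = [0.73, 1.25, 1.20, 1.47, 0.97]
b = [-1.31, 1.44, -1.20, 0.10, 0.10]
mu_true, sigma_true = 0.3, 1.2
a1, b1 = a, b
a2 = [sigma_true * ai for ai in a]            # assumed identified parameters
b2 = [(bi - mu_true) / sigma_true for bi in b]
mu_hat, sigma_hat = mean_mean_linking(a1, b1, a2, b2)
```

Without DIF and sampling error, the sketch recovers the true group difference exactly up to floating-point error.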

3.2. Mean–Geometric–Mean Linking (MGM and RMGM)

The mean–geometric–mean (MGM) linking [40] (also referred to as log–mean–mean linking in [66]) is another two-step linking method that estimates σ as the ratio of the geometric means of the item discriminations. Using a general loss function ρ , the standard deviation σ is estimated in the first step as
$$\hat{\sigma} = \arg\min_{\sigma} \sum_{i=1}^{I} \rho\left( \log \hat{a}_{i2} - \log(\sigma) - \log \hat{a}_{i1} \right) , \tag{11}$$
where $\rho(x) = (x^2 + \varepsilon)^{p/2}$ is a differentiable approximation of the $L_p$ loss function $x \mapsto |x|^p$ for $p \in (0, 2]$ and a sufficiently small $\varepsilon > 0$ (see [67]). In this article, we use $\varepsilon = 0.01$ in the simulation studies [68]. The ordinary mean is obtained using the $L_p$ loss function for $p = 2$. The median as a location measure can be approximately obtained for $p = 1$. The choice $p = 0.5$ is advantageous for asymmetrically distributed error distributions for DIF effects [69] and appears in invariance alignment [68,70]. In the second step, the mean $\mu$ is estimated by
$$\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{I} \rho\left( \hat{\sigma} \hat{b}_{i2} - \hat{b}_{i1} + \mu \right) . \tag{12}$$
By defining $\rho_1 = d\rho / dx$, the solution in MGM linking is given as the root of the estimating equations as follows:
$$\sum_{i=1}^{I} \rho_1\left( \log \hat{a}_{i2} - \log(\sigma) - \log \hat{a}_{i1} \right) = 0 \quad \text{and} \quad \sum_{i=1}^{I} \rho_1\left( \sigma \hat{b}_{i2} - \hat{b}_{i1} + \mu \right) = 0 . \tag{13}$$
MGM linking in a nonrobust variant is defined with p = 2 , while robust MGM (abbreviated as RMGM) linking is defined by choosing p = 0.5 in the loss function ρ .
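The effect of the power p can be illustrated with a small Python sketch of the two-step minimization. The one-dimensional optimizer (a grid search followed by golden-section refinement) is an assumption made to keep the example dependency-free and is not the optimizer used in the article; the example data, with a single item showing a large uniform DIF effect, are likewise hypothetical.

```python
import math


def rho(x, p=2.0, eps=0.01):
    """Differentiable approximation (x^2 + eps)^(p/2) of the L_p loss."""
    return (x * x + eps) ** (p / 2.0)


def argmin_1d(f, lo, hi, n_grid=2001, n_iter=80):
    """Grid search, then golden-section refinement around the best grid point."""
    step = (hi - lo) / (n_grid - 1)
    best = min((lo + k * step for k in range(n_grid)), key=f)
    left, right = max(lo, best - step), min(hi, best + step)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_iter):
        c = right - inv_phi * (right - left)
        d = left + inv_phi * (right - left)
        if f(c) < f(d):
            right = d
        else:
            left = c
    return 0.5 * (left + right)


def mgm_linking(a1, b1, a2, b2, p=2.0, eps=0.01):
    """MGM (p = 2) or robust RMGM (p = 0.5) linking: returns (mu_hat, sigma_hat)."""
    # Step 1: log(sigma) is a location measure of log a2 - log a1.
    d = [math.log(x2) - math.log(x1) for x1, x2 in zip(a1, a2)]
    log_sigma = argmin_1d(lambda s: sum(rho(di - s, p, eps) for di in d),
                          min(d) - 1.0, max(d) + 1.0)
    sigma = math.exp(log_sigma)
    # Step 2: rho is even, so mu is a location measure of b1 - sigma * b2.
    r = [x1 - sigma * x2 for x1, x2 in zip(b1, b2)]
    mu = argmin_1d(lambda m: sum(rho(ri - m, p, eps) for ri in r),
                   min(r) - 1.0, max(r) + 1.0)
    return mu, sigma


# Hypothetical example: true mu = 0.3, sigma = 1.2, first item has DIF of 1.0.
a = [0.73, 1.25, 1.20, 1.47, 0.97]
b = [-1.31, 1.44, -1.20, 0.10, 0.10]
a1 = a
b1 = [b[0] + 1.0] + b[1:]
a2 = [1.2 * ai for ai in a]
b2 = [(bi - 0.3) / 1.2 for bi in b]
mu_l2, sigma_l2 = mgm_linking(a1, b1, a2, b2, p=2.0)
mu_rob, sigma_rob = mgm_linking(a1, b1, a2, b2, p=0.5)
```

With p = 2, the outlying item shifts the estimated mean by the DIF effect divided by the number of items, whereas p = 0.5 keeps the estimate close to the value supported by the remaining items.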

3.3. Symmetric Haebara Linking (HAE and RHAE)

Haebara linking is a one-step linking method [71] that relies on aligning IRFs to determine $\mu$ and $\sigma$. Using a discrete grid of $\theta$ points $\theta_t$ ($t = 1, \ldots, T$) and weights $\omega_t$, the mean $\mu$ and the standard deviation $\sigma$ are estimated by minimizing a weighted distance between the IRFs, which is given by
$$H(\mu, \sigma) = \sum_{i=1}^{I} \sum_{t=1}^{T} \omega_t \, \rho\left( P(\sigma \theta_t + \mu; \hat{a}_{i1}, \hat{b}_{i1}) - P(\theta_t; \hat{a}_{i2}, \hat{b}_{i2}) \right) , \tag{14}$$
where $P(\theta; a, b) = \Psi(a(\theta - b))$, and $\rho$ again denotes a differentiable approximation of the $L_p$ loss function. For example, the $\theta$ grid can be equidistantly chosen between $-6$ and 6, and $\omega_t$ could be proportional to the values of the density of a normal distribution with a standard deviation of 2. Some researchers alternatively prefer to set all weights $\omega_t$ equal to 1. The nonrobust loss function $\rho(x) = x^2$ was originally proposed by Haebara [71] and is widely used. The robust loss function with $p = 1$ has been proposed in Refs. [72,73]. The general robust $L_p$ loss function in Haebara linking for $p \le 1$ was utilized in [74]. Corresponding estimating equations to (14) are obtained by computing the partial derivatives of $H$ with respect to $\mu$ and $\sigma$.
The linking function H in (14) is referred to as asymmetric Haebara linking, because it aligns the IRF of the first group with the IRF of the second group. This kind of asymmetry induces non-negligibly biased estimates for the standard deviation and for the mean to a smaller extent [66]. To this end, symmetric Haebara linking [75] has been proposed to simultaneously align the IRFs of both groups and to reduce the bias [66]. The linking function of symmetric Haebara linking is defined by
$$H(\mu, \sigma) = \sum_{i=1}^{I} \sum_{t=1}^{T} \omega_t \, \rho\left( P(\sigma \theta_t + \mu; \hat{a}_{i1}, \hat{b}_{i1}) - P(\theta_t; \hat{a}_{i2}, \hat{b}_{i2}) \right) + \sum_{i=1}^{I} \sum_{t=1}^{T} \omega_t \, \rho\left( P(\theta_t; \hat{a}_{i1}, \hat{b}_{i1}) - P(\sigma^{-1} \theta_t - \sigma^{-1} \mu; \hat{a}_{i2}, \hat{b}_{i2}) \right) . \tag{15}$$
The estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ is obtained by minimizing $H$ in (15) with respect to $\delta = (\mu, \sigma)$. Nonrobust symmetric Haebara (HAE) linking is obtained with the choice $p = 2$ of the $L_p$ loss function, while robust symmetric Haebara (RHAE) linking is obtained with the choice $p = 0.5$.
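As a rough illustration, the symmetric Haebara objective can be evaluated with the following Python sketch. The grid size `n_theta = 61` is an assumption; the grid range (−6 to 6) and the weights (proportional to a normal density with a standard deviation of 2) follow the example given in the text. The identified second-group parameters σ·a and (b − μ)/σ are again an assumed DIF-free, infinite-sample setting.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def irf(theta, a, b):
    """2PL item response function P(theta; a, b)."""
    return logistic(a * (theta - b))


def rho(x, p=2.0, eps=0.01):
    """Differentiable approximation of the L_p loss."""
    return (x * x + eps) ** (p / 2.0)


def haebara_symmetric(mu, sigma, a1, b1, a2, b2, p=2.0, eps=0.01, n_theta=61):
    """Symmetric Haebara linking function H(mu, sigma)."""
    thetas = [-6.0 + 12.0 * t / (n_theta - 1) for t in range(n_theta)]
    raw = [math.exp(-0.5 * (th / 2.0) ** 2) for th in thetas]
    weights = [w / sum(raw) for w in raw]
    value = 0.0
    for ai1, bi1, ai2, bi2 in zip(a1, b1, a2, b2):
        for th, w in zip(thetas, weights):
            # Align group 1 with group 2 on the metric of group 2.
            d1 = irf(sigma * th + mu, ai1, bi1) - irf(th, ai2, bi2)
            # Align group 2 with group 1 on the metric of group 1.
            d2 = irf(th, ai1, bi1) - irf((th - mu) / sigma, ai2, bi2)
            value += w * (rho(d1, p, eps) + rho(d2, p, eps))
    return value


# DIF-free illustration with true mu = 0.3 and sigma = 1.2.
a = [0.73, 1.25, 1.20, 1.47, 0.97]
b = [-1.31, 1.44, -1.20, 0.10, 0.10]
a2 = [1.2 * ai for ai in a]
b2 = [(bi - 0.3) / 1.2 for bi in b]
h_true = haebara_symmetric(0.3, 1.2, a, b, a2, b2, p=2.0, eps=0.0)
h_other = haebara_symmetric(0.0, 1.0, a, b, a2, b2, p=2.0, eps=0.0)
```

Setting `eps=0.0` with `p=2.0` reproduces the nonrobust loss ρ(x) = x², for which the objective vanishes at the true parameters when there is no DIF. A full implementation would pass this function to a numerical optimizer.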

4. Estimation of Standard Error, Linking Error, and Total Error

In this section, we derive the standard error, linking error, and total error for linking estimates in the framework of M-estimation theory [58,59,76]. The treatment in this section is an extension of the material presented in [65]. Assume the item parameter estimate $\hat{\gamma}_i = (\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})$ with an estimated variance matrix $V_{\hat{\gamma}_i}$. Moreover, let $\hat{\gamma} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_I)$ be the vector of all the item parameters with an estimated variance matrix $V_{\hat{\gamma}}$. The corresponding population analogs of the estimators are denoted by $\gamma_i$ and $\gamma$ and are effectively the estimates for an infinite sample size. As described in Section 2, the linking method provides an estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ for the population parameter $\delta = (\mu, \sigma)$ as a root of the estimating equation
$$H_\delta(\hat{\delta}) = \sum_{i=1}^{I} h_\delta(\hat{\delta}; \hat{\gamma}_i) = 0 . \tag{16}$$

4.1. Linking Error

First, we derive the linking error of the estimate $\hat{\delta}$ defined by the estimating equation (16), which quantifies the uncertainty in the estimate due to the selection (or randomness) of items. As is usual in M-estimation theory, we carry out a Taylor approximation of $H_\delta$ around the true parameter $\delta$ (sometimes also referred to as the pseudotrue parameter; [58]):
$$H_\delta(\hat{\delta}) = H_\delta(\delta) + H_{\delta\delta}(\delta) (\hat{\delta} - \delta) = 0 , \tag{17}$$
where $H_{\delta\delta}$ denotes the matrix of partial derivatives of $H_\delta$ with respect to $\delta$. Moreover, $H_\delta(\delta)$ has an expected value of zero across items because of the definition of the true parameter [58]. Hence, we obtain from (17) the following:
$$\hat{\delta} - \delta = -H_{\delta\delta}(\delta)^{-1} H_\delta(\delta) . \tag{18}$$
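M-estimation theory provides the variance matrix of $\hat{\delta}$ as the sandwich variance estimate:
$$V_{\mathrm{LE}} = \mathrm{Var}(\hat{\delta}) = A^{-1} B A^{-\top} , \quad \text{where} \tag{19}$$
$$A = H_{\delta\delta}(\delta) = \sum_{i=1}^{I} h_{\delta\delta}(\delta; \hat{\gamma}_i) \quad \text{and} \tag{20}$$
$$B = \mathrm{Var}\left( H_\delta(\delta) \right) = \sum_{i=1}^{I} \mathrm{Var}\left( h_\delta(\delta; \hat{\gamma}_i) \right) . \tag{21}$$
In (21), we used the approximate independence of the item parameters across items. In M-estimation, the matrix $A$ is called the bread matrix, and $B$ is the meat matrix. The unknown quantities in (19) can be estimated by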
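$$\hat{A} = \sum_{i=1}^{I} h_{\delta\delta}(\hat{\delta}; \hat{\gamma}_i) \quad \text{and} \tag{22}$$
$$\hat{B} = \sum_{i=1}^{I} h_\delta(\hat{\delta}; \hat{\gamma}_i) \, h_\delta(\hat{\delta}; \hat{\gamma}_i)^\top . \tag{23}$$
Hence, an estimate of the variance matrix $V_{\mathrm{LE}}$ is given by
$$\hat{V}_{\mathrm{LE}} = \frac{I}{I - 1} \cdot \hat{A}^{-1} \hat{B} \hat{A}^{-\top} . \tag{24}$$
The factor $I / (I - 1)$ in (24) is included to correct for finite-sample bias [65,77,78,79].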
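For mean–mean linking, the bread and meat matrices have simple closed forms, which makes the sandwich estimate easy to sketch in Python. The per-item terms used below, h_δ = (σ·b̂_i2 − b̂_i1 + μ, σ·â_i1 − â_i2) and its derivative with respect to (μ, σ), are read off the MM estimating equations; the helper names and the example DIF effects are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]


def mm_linking_error(a1, b1, a2, b2):
    """Linking errors (LE_mu, LE_sigma) of MM linking via the sandwich formula."""
    n_items = len(a1)
    sigma = sum(a2) / sum(a1)
    mu = sum(b1) / n_items - sigma * sum(b2) / n_items
    A = [[0.0, 0.0], [0.0, 0.0]]  # bread: sum of derivatives of h_delta
    B = [[0.0, 0.0], [0.0, 0.0]]  # meat: sum of outer products of h_delta
    for ai1, bi1, ai2, bi2 in zip(a1, b1, a2, b2):
        h = [sigma * bi2 - bi1 + mu, sigma * ai1 - ai2]
        A[0][0] += 1.0   # d h_mu / d mu
        A[0][1] += bi2   # d h_mu / d sigma
        A[1][1] += ai1   # d h_sigma / d sigma
        for r in range(2):
            for c in range(2):
                B[r][c] += h[r] * h[c]
    a_inv = inv2(A)
    V = matmul2(matmul2(a_inv, B), transpose2(a_inv))
    factor = n_items / (n_items - 1)
    return (math.sqrt(max(factor * V[0][0], 0.0)),
            math.sqrt(max(factor * V[1][1], 0.0)))


# Hypothetical example: uniform DIF effects e_i only (true mu = 0.3, sigma = 1.2).
a = [0.73, 1.25, 1.20, 1.47, 0.97]
b = [-1.31, 1.44, -1.20, 0.10, 0.10]
e = [0.4, -0.2, 0.1, -0.3, 0.0]
b1 = [bi - ei / 2 for bi, ei in zip(b, e)]
a2 = [1.2 * ai for ai in a]
b2 = [(bi + ei / 2 - 0.3) / 1.2 for bi, ei in zip(b, e)]
le_mu, le_sigma = mm_linking_error(a, b1, a2, b2)
```

In this special case without DIF in item discriminations, the linking error of the mean reduces to the sample standard deviation of the DIF effects divided by the square root of the number of items.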

4.2. Standard Error

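We now compute the standard error of $\hat{\delta}$ due to the sampling of persons (see [80,81,82,83,84,85]). A Taylor approximation of $h_\delta$ around $(\delta, \gamma_i)$ is carried out and results in
$$h_\delta(\hat{\delta}; \hat{\gamma}_i) = h_\delta(\delta, \gamma_i) + h_{\delta\gamma}(\delta, \gamma_i) (\hat{\gamma}_i - \gamma_i) + h_{\delta\delta}(\delta, \gamma_i) (\hat{\delta} - \delta) . \tag{25}$$
We can now use
$$\sum_{i=1}^{I} h_\delta(\hat{\delta}; \hat{\gamma}_i) = 0 \quad \text{and} \quad \sum_{i=1}^{I} h_\delta(\delta, \gamma_i) = 0 , \tag{26}$$
and arrive at
$$\hat{\delta} - \delta = -\left[ \sum_{i=1}^{I} h_{\delta\delta}(\delta, \gamma_i) \right]^{-1} \sum_{i=1}^{I} h_{\delta\gamma}(\delta, \gamma_i) (\hat{\gamma}_i - \gamma_i) = -A^{-1} C (\hat{\gamma} - \gamma) , \quad \text{where} \tag{27}$$
$$C = \begin{pmatrix} h_{\delta\gamma}(\delta; \gamma_1) & \cdots & h_{\delta\gamma}(\delta; \gamma_I) \end{pmatrix} . \tag{28}$$
This allows us to compute the variance matrix in $\hat{\delta}$ due to the sampling error as follows:
$$V_{\mathrm{SE}} = A^{-1} D A^{-\top} \quad \text{with} \quad D = C V_{\hat{\gamma}} C^\top . \tag{29}$$
The unknown quantities in (29) can be estimated using $\hat{A}$ in (22), and we obtain the following:
$$\hat{D} = \hat{C} V_{\hat{\gamma}} \hat{C}^\top \quad \text{with} \quad \hat{C} = \begin{pmatrix} h_{\delta\gamma}(\hat{\delta}, \hat{\gamma}_1) & \cdots & h_{\delta\gamma}(\hat{\delta}, \hat{\gamma}_I) \end{pmatrix} . \tag{30}$$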

4.3. Total Error and Bias-Corrected Linking Error

We now compute the total uncertainty in $\hat{\delta}$ (i.e., the total error). The variance matrix of the total error has been defined as the sum of the variance matrices due to the sampling error and the linking error (see [50,65]):
$$V_{\mathrm{TE}} = V_{\mathrm{SE}} + V_{\mathrm{LE}} \quad \text{and} \quad \hat{V}_{\mathrm{TE}} = \hat{V}_{\mathrm{SE}} + \hat{V}_{\mathrm{LE}} . \tag{31}$$
We now derive a bias-corrected estimate of the linking error variance matrix $V_{\mathrm{LE}}$, which, in turn, allows us to compute a bias-corrected variance matrix for the total error. The estimated meat matrix in the variance matrix for the linking error is given as
$$\hat{B} = \sum_{i=1}^{I} h_\delta(\hat{\delta}; \hat{\gamma}_i) \, h_\delta(\hat{\delta}; \hat{\gamma}_i)^\top . \tag{32}$$
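However, the linking error should only be computed based on the true item parameters $\gamma_i$ instead of the estimated item parameters $\hat{\gamma}_i$, which appear in (32). A Taylor approximation provides
$$h_\delta(\hat{\delta}; \hat{\gamma}_i) = h_\delta(\hat{\delta}; \gamma_i) + h_{\delta\gamma}(\hat{\delta}, \gamma_i) (\hat{\gamma}_i - \gamma_i) . \tag{33}$$
Hence, the inflated variance contribution in $\hat{B}$ due to the sampling error can be determined as
$$\mathrm{Var}\left( \sum_{i=1}^{I} h_{\delta\gamma}(\hat{\delta}, \gamma_i) (\hat{\gamma}_i - \gamma_i) \right) = \sum_{i=1}^{I} h_{\delta\gamma}(\hat{\delta}, \gamma_i) \, V_{\hat{\gamma}_i} \, h_{\delta\gamma}(\hat{\delta}, \gamma_i)^\top , \tag{34}$$
where we used the approximate independence of the item parameters across items. As a result, we compute a bias-corrected meat matrix as follows:
$$\hat{B}_{\mathrm{bc}} = \hat{B} - \tilde{D} \quad \text{with} \quad \tilde{D} = \sum_{i=1}^{I} h_{\delta\gamma}(\hat{\delta}, \hat{\gamma}_i) \, V_{\hat{\gamma}_i} \, h_{\delta\gamma}(\hat{\delta}, \hat{\gamma}_i)^\top . \tag{35}$$
Note that the correction term $\tilde{D}$ in (35) corresponds to the matrix $\hat{D}$ in (29) in the variance due to standard errors if the item parameters $\hat{\gamma}_i$ were uncorrelated across items. Next, a bias-corrected variance matrix due to linking error is given as
$$\hat{V}_{\mathrm{LE,bc}} = \frac{I}{I - 1} \cdot \hat{A}^{-1} \hat{B}_{\mathrm{bc}} \hat{A}^{-\top} , \tag{36}$$
and the variance matrix for the total error is given by
$$\hat{V}_{\mathrm{TE,bc}} = \hat{V}_{\mathrm{SE}} + \hat{V}_{\mathrm{LE,bc}} . \tag{37}$$
To sum up, the variance matrix referring to the total error can be written as
$$\hat{V}_{\mathrm{TE}} = \hat{A}^{-1} \left( \frac{I}{I - 1} \hat{B} + \hat{D} \right) \hat{A}^{-\top} \quad \text{and} \quad \hat{V}_{\mathrm{TE,bc}} = \hat{A}^{-1} \left( \frac{I}{I - 1} \hat{B} + \hat{D} - \frac{I}{I - 1} \tilde{D} \right) \hat{A}^{-\top} . \tag{38}$$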
To obtain standard errors, linking errors, bias-corrected linking errors, total errors and bias-corrected total errors, the square root of the diagonal elements of the corresponding matrices can be taken. In cases of negative variances for bias-corrected estimates, a corresponding linking error estimate is set to zero.
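The following Python sketch combines the standard error, the linking error, and the bias-corrected variants for mean–mean linking. It assumes that item parameter estimates are uncorrelated across items, so that the matrices D̂ and D̃ coincide and only per-item 4×4 variance matrices are needed; the function name `mm_errors`, the DIF effects, and the variance values of 0.01 are hypothetical.

```python
import math


def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mm_errors(a1, b1, a2, b2, v_gamma):
    """SE, LE, LE_bc, TE, and TE_bc for MM linking.

    v_gamma[i] is the 4x4 variance matrix of (a_i1, b_i1, a_i2, b_i2);
    independence across items is assumed, so D_hat equals D_tilde.
    """
    n = len(a1)
    sigma = sum(a2) / sum(a1)
    mu = sum(b1) / n - sigma * sum(b2) / n
    A = [[0.0, 0.0], [0.0, 0.0]]
    B = [[0.0, 0.0], [0.0, 0.0]]
    D = [[0.0, 0.0], [0.0, 0.0]]
    for ai1, bi1, ai2, bi2, v in zip(a1, b1, a2, b2, v_gamma):
        h = [sigma * bi2 - bi1 + mu, sigma * ai1 - ai2]
        h_dd = [[1.0, bi2], [0.0, ai1]]           # derivative w.r.t. (mu, sigma)
        h_dg = [[0.0, -1.0, 0.0, sigma],          # derivative w.r.t. gamma_i
                [sigma, 0.0, -1.0, 0.0]]
        d_i = matmul(matmul(h_dg, v), transpose(h_dg))
        for r in range(2):
            for c in range(2):
                A[r][c] += h_dd[r][c]
                B[r][c] += h[r] * h[c]
                D[r][c] += d_i[r][c]
    a_inv = inv2(A)

    def sandwich(meat):
        return matmul(matmul(a_inv, meat), transpose(a_inv))

    f = n / (n - 1)
    v_se = sandwich(D)
    v_le = sandwich(B)
    v_le_bc = sandwich([[B[r][c] - D[r][c] for c in range(2)] for r in range(2)])
    out = {}
    for k, name in enumerate(("mu", "sigma")):
        se2 = max(v_se[k][k], 0.0)
        le2 = max(f * v_le[k][k], 0.0)
        le_bc2 = max(f * v_le_bc[k][k], 0.0)  # negative variances are set to zero
        out[name] = {"SE": math.sqrt(se2), "LE": math.sqrt(le2),
                     "LE_bc": math.sqrt(le_bc2),
                     "TE": math.sqrt(se2 + le2),
                     "TE_bc": math.sqrt(se2 + le_bc2)}
    return out


# Hypothetical example with uniform DIF and variances of 0.01 per parameter.
a = [0.73, 1.25, 1.20, 1.47, 0.97]
b = [-1.31, 1.44, -1.20, 0.10, 0.10]
e = [0.4, -0.2, 0.1, -0.3, 0.0]
b1 = [bi - ei / 2 for bi, ei in zip(b, e)]
a2 = [1.2 * ai for ai in a]
b2 = [(bi + ei / 2 - 0.3) / 1.2 for bi, ei in zip(b, e)]
v_item = [[0.01 if r == c else 0.0 for c in range(4)] for r in range(4)]
errors = mm_errors(a, b1, a2, b2, [v_item] * len(a))
```

Because the subtracted term is positive semidefinite, the bias-corrected linking error in this sketch can never exceed the uncorrected one, and it is truncated at zero when the correction exceeds the estimated meat matrix.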

5. Simulation Study 1: Assessing Linking Error in Infinite Sample Size

In Simulation Study 1, the validity of the variance estimates based on M-estimation (see Section 4) for the estimated mean $\hat{\mu}$ and the estimated standard deviation $\hat{\sigma}$ was investigated for an infinite sample size of persons.

5.1. Method

In this study, only item parameters were simulated in each replication. No item responses were simulated, because the case of an infinite sample size $N$ was investigated. The 2PL model was used as the IRF in the IRT model. There were two groups. For identification reasons, the mean and the standard deviation of the factor variable $\theta$ in the first group were set to 0 and 1, respectively. The mean $\mu$ and the standard deviation $\sigma$ of the second group parametrized the group differences. Throughout the simulation, the choices $\mu = 0.3$ and $\sigma = 1.2$ were made.
We simulated item parameters for $I = 10$, 20, and 40 items. The group-specific item parameters $a_{ig}$ and $b_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$ relied on base item parameters that were fixed in the simulation and a random DIF effect that was simulated in each replication of the simulation study. The base item discriminations $a_{i0}$ in the case of $I = 10$ items were chosen as 0.73, 1.25, 1.20, 1.47, 0.97, 1.38, 1.05, 1.14, 1.15, and 0.67. The base item difficulties $b_{i0}$ were chosen as −1.31, 1.44, −1.20, 0.10, 0.10, −0.74, 1.48, −0.61, 0.82, and −0.07. The same item parameters were also chosen in [65]. For item numbers as multiples of 10, we duplicated the item parameters of the 10 items accordingly. The group-specific item difficulties $b_{ig}$ ($g = 1, 2$) were simulated as
$$b_{i1} = b_{i0} - e_i / 2 \quad \text{and} \quad b_{i2} = b_{i0} + e_i / 2 , \tag{39}$$
where $e_i$ is a random DIF effect. Note that $e_i = b_{i2} - b_{i1}$ parametrizes a uniform DIF effect as the difference in group-specific item difficulties. Group-specific item discriminations $a_{ig}$ ($g = 1, 2$) were simulated as
$$a_{i1} = a_{i0} \exp(-f_i / 2) \quad \text{and} \quad a_{i2} = a_{i0} \exp(f_i / 2) , \tag{40}$$
where $f_i$ is another random DIF effect. The nonuniform DIF effect $f_i$ can be computed as the difference of logarithms of item discriminations (i.e., $f_i = \log a_{i2} - \log a_{i1}$). In the simulation, we assumed that $e_i$ and $f_i$ were uncorrelated. Both DIF effects had zero means, with a standard deviation of $\tau$ for $e_i$ and $0.3 \times \tau$ for $f_i$. Two distributions of $e_i$ and $f_i$ were specified: a normal distribution or a scaled $t$ distribution with three degrees of freedom (denoted as $t_3$). In this simulation study, we varied the DIF standard deviation $\tau$ of the DIF effects $e_i$ in item difficulties as 0.25 and 0.50. Accordingly, the respective DIF standard deviations of the DIF effects $f_i$ in (logarithmized) item discriminations were 0.075 and 0.15.
Five different linking methods were utilized to estimate the mean difference $\hat{\mu}$ and the standard deviation $\hat{\sigma}$: mean–mean (MM) linking, mean–geometric–mean (MGM) linking, robust mean–geometric–mean (RMGM) linking, symmetric Haebara (HAE) linking, and robust symmetric Haebara (RHAE) linking. The linking methods rely on estimated item discriminations $\hat{a}_{ig}$ and item difficulties $\hat{b}_{ig}$ ($i = 1, \ldots, I$; $g = 1, 2$). In an infinite sample size, these identified item parameters are given as
$$\hat{a}_{i1} = a_{i1} , \quad \hat{b}_{i1} = b_{i1} , \quad \hat{a}_{i2} = \sigma a_{i2} , \quad \text{and} \quad \hat{b}_{i2} = \sigma^{-1} (b_{i2} - \mu) . \tag{41}$$
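One replication of this data-generating process can be sketched in Python as follows. The sketch assumes that the scaled t distribution is rescaled to have the stated standard deviation (a t variable with three degrees of freedom has variance 3) and that the identified second-group parameters are σ·a_i2 and (b_i2 − μ)/σ; the function names and the seed handling are illustrative.

```python
import math
import random

A0 = [0.73, 1.25, 1.20, 1.47, 0.97, 1.38, 1.05, 1.14, 1.15, 0.67]
B0 = [-1.31, 1.44, -1.20, 0.10, 0.10, -0.74, 1.48, -0.61, 0.82, -0.07]


def draw_dif(rng, sd, dist="normal"):
    """Draw one DIF effect with mean 0 and standard deviation sd."""
    if dist == "normal":
        return rng.gauss(0.0, sd)
    # t3 variate as Z / sqrt(chi2_3 / 3), rescaled to unit variance (assumption).
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
    t3 = z / math.sqrt(chi2 / 3.0)
    return sd * t3 / math.sqrt(3.0)


def simulate_item_parameters(n_items=10, tau=0.25, dist="normal",
                             mu=0.3, sigma=1.2, seed=1):
    """Simulate identified item parameters for one replication (infinite N)."""
    if n_items % 10 != 0:
        raise ValueError("n_items must be a multiple of 10")
    rng = random.Random(seed)
    reps = n_items // 10
    items = []
    for a0, b0 in zip(A0 * reps, B0 * reps):
        e = draw_dif(rng, tau, dist)         # uniform DIF effect
        f = draw_dif(rng, 0.3 * tau, dist)   # nonuniform DIF effect
        a1, a2 = a0 * math.exp(-f / 2.0), a0 * math.exp(f / 2.0)
        b1, b2 = b0 - e / 2.0, b0 + e / 2.0
        items.append({
            "a1_hat": a1, "b1_hat": b1,
            "a2_hat": sigma * a2, "b2_hat": (b2 - mu) / sigma,
            "e": e, "f": f,
        })
    return items


items = simulate_item_parameters(n_items=20, tau=0.5, dist="t3")
```

Transforming the second-group parameters back with the true μ and σ recovers the simulated DIF effects, which is a convenient consistency check for such a generator.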
In each of the 2 (type of distribution) × 2 (DIF standard deviation τ ) × 3 (number of items I) = 12 cells of the simulation, 5000 replications were conducted. We computed bias and root mean square error (RMSE) for the estimated mean μ ^ and the estimated standard deviation σ ^ . A relative percentage RMSE was computed as the ratio of the RMSE values of a particular linking method and the chosen reference method of MGM linking. We also assessed the coverage rate for μ ^ and σ ^ at the 95% confidence level based on the normal distribution as the percentage of events in which an estimated confidence interval contained the true value μ = 0.3 or σ = 1.2 , respectively.
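The performance measures described above can be written as short Python functions. This is a minimal sketch: the function names are hypothetical, and the normal quantile 1.959964 corresponds to the 95% confidence level.

```python
import math


def bias(estimates, true_value):
    """Average estimate minus the true value."""
    return sum(estimates) / len(estimates) - true_value


def rmse(estimates, true_value):
    """Root mean square error across replications."""
    return math.sqrt(sum((est - true_value) ** 2 for est in estimates)
                     / len(estimates))


def relative_rmse(estimates, reference_estimates, true_value):
    """RMSE in percent of the RMSE of a reference method (here: MGM linking)."""
    return 100.0 * rmse(estimates, true_value) / rmse(reference_estimates, true_value)


def coverage_rate(estimates, errors, true_value, z=1.959964):
    """Percentage of normal-theory confidence intervals containing the true value."""
    hits = sum(1 for est, err in zip(estimates, errors)
               if abs(est - true_value) <= z * err)
    return 100.0 * hits / len(estimates)


# Tiny illustration with made-up estimates of mu = 0.3.
example_bias = bias([0.2, 0.4], 0.3)
example_rmse = rmse([0.2, 0.4], 0.3)
example_rel = relative_rmse([0.2, 0.4], [0.1, 0.5], 0.3)
example_cov = coverage_rate([0.2, 0.4, 0.9], [0.1, 0.1, 0.1], 0.3)
```

A relative RMSE below 100 indicates that a method is more efficient than the reference method, and a value above 100 indicates an efficiency loss.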
The R software (Version 4.3.0; [86]) was used for the entire analysis in this simulation study. We wrote an R function linking_2groups_dich() that allows for the computation of estimates and their standard errors of any user-defined linking method. The function and replication material for this Simulation Study 1 can be found at https://osf.io/6bp3t (accessed on 29 April 2024).

5.2. Results

It turned out that all five linking methods resulted in approximately unbiased estimates for the mean μ ^ and the standard deviation σ ^ . Table 1 displays the relative RMSE for the estimates as a function of the DIF effect distribution type, the DIF standard deviation τ , and the number of items I. The MM linking method performed similarly to the MGM method with respect to the mean μ (i.e., the MM had relative RMSE values close to 100; that is, those of the MGM) but was slightly less efficient with respect to the standard deviation σ , particularly for DIF effects that followed the t 3 distribution. The RMGM linking method had substantial efficiency losses for normally distributed DIF effects (i.e., it had RMSE values much larger than 100). On the other hand, the RMGM method provided large efficiency gains compared to the MGM method for the heavy-tailed t 3 distribution for μ and σ (i.e., it had RMSE values much smaller than 100). Note that the RHAE method outperformed the HAE method for DIF effects with a t 3 distribution when analyzing the RMSE for the estimated mean μ ^ . However, RHAE (or HAE) should not be preferred over RMGM (or MGM) with respect to the RMSE of the estimated standard deviation σ ^ , because it had RMSE values much larger than 100.
Table 2 reports the coverage rate for μ ^ and σ ^ for the five linking methods. Coverage rates that are within the interval [92.5,97.5] indicate acceptable performance. Overall, the coverage rates based on M-estimation theory performed satisfactorily for at least 20 items, except for the RMGM linking method, which resulted in undercoverage for μ ^ (i.e., it had coverage rates much lower than 92.5). In this case, the estimated linking errors were, on average, too small compared to the standard deviation of the μ ^ estimates across replications in this simulation study. As expected, the coverage rates improved with an increasing number of items I.

6. Simulation Study 2: Assessing Total Error in Finite Sample Sizes

In Simulation Study 2, we investigated the statistical performance of the linking estimation methods in finite samples. In this case, there was uncertainty due to the sampling of persons, which is reflected in the standard error, and due to the randomness in the DIF effects, which is reflected in the linking error. Both sources of error can be summarized in the total error.

6.1. Method

The item responses were simulated according to the 2PL model for a test with $I = 10$, 20, or 40 items. The same item parameters as in Simulation Study 1 (see Section 5.1) were used. The factor variable $\theta$ was assumed to be normally distributed in both groups. As in Simulation Study 1, the mean and the standard deviation of the factor variable $\theta$ in the first group were fixed at 0 and 1, respectively. The variable $\theta$ in the second group had a mean of $\mu = 0.3$ and a standard deviation of $\sigma = 1.2$. We used the normal distribution and the scaled $t_3$ distribution for DIF effects and varied the DIF standard deviation $\tau$ as 0.25 and 0.5. Moreover, we simulated a condition of no DIF effects (i.e., $\tau = 0$). The sample sizes $N = 500$, 1000, and 2000 were chosen in order to mimic sample sizes that are typically found in applications of the 2PL model [8].
In contrast to Simulation Study 1, the item parameters of the 2PL model were separately estimated for the two groups in the first step using MML estimation. In the second step, the performance of the five linking methods MM, MGM, RMGM, HAE, and RHAE was studied. The estimated mean μ ^ and the estimated standard deviation σ ^ for the five methods were compared regarding the bias, RMSE, and relative RMSE. As in Simulation Study 1, MGM linking was used as the reference method for computing the RMSE.
In total, 5000 replications were conducted in each of the 5 (type of distribution for DIF effects combined with DIF standard deviation $\tau$) × 3 (number of items $I$) × 3 (sample size $N$) = 45 cells of the simulation.
In the analysis, we computed the median of the linking error estimate $\mathrm{LE}$ based on (24) and the median of the bias-corrected linking error estimate $\mathrm{LE}_{\mathrm{bc}}$ based on (36). Moreover, we compared the coverage rates for the estimates $\hat{\mu}$ and $\hat{\sigma}$ based on the standard error $\mathrm{SE}$, the (uncorrected) total error $\mathrm{TE}$ based on (31), and the bias-corrected total error $\mathrm{TE}_{\mathrm{bc}}$ based on (37).
The R software [86] was used for the entire analysis in this simulation study. The 2PL model was estimated with the sirt::xxirt() function in the R package sirt [87]. As in Simulation Study 1, the R function linking_2groups_dich() was used for computing the estimates μ ^ and σ ^ and their standard errors for the five linking methods. Replication material for this Simulation Study 2 can be found at https://osf.io/6bp3t (accessed on 29 April 2024).

6.2. Results

All five linking methods were approximately unbiased in all conditions of the simulation study. Table 3 presents the relative RMSE as a function of the different DIF distribution types, the DIF effect standard deviation τ , the number of items I, and the sample size N. As expected from the literature, the HAE method was the most efficient linking method in the condition of no DIF (i.e., τ = 0 ). Across all conditions, the MM method had comparable performance to the MGM method regarding μ ^ , but it was slightly more efficient for σ ^ . Efficiency gains of the RMGM method were only realized for the heavy-tailed t 3 distribution in large sample sizes. In these situations, the RHAE method outperformed the RMGM method for the estimated mean but not for the estimated standard deviation.
Table 4 presents the median of the estimated linking error LE and bias-corrected linking error LE_bc for the estimated mean μ̂. The results for the DIF effects with a t_3 distribution and a standard deviation of τ = 0.5 were omitted for space reasons. In Table 4, we have also reported the estimated linking error for an infinite sample size (i.e., N = Inf) that was obtained from Simulation Study 1. It can be seen that the estimated linking errors LE and LE_bc (almost always) converged to the LE for an infinite sample size with an increasing sample size.
It turned out that the estimated linking error LE was positively biased, while the bias-corrected linking error LE_bc was negatively biased (to a lesser extent). In particular, the median estimated linking error of 0.061 for the MM method for 10 items and a small sample size N = 500 was substantially larger than the true value of 0 in the condition of no DIF (i.e., τ = 0). On the other hand, the median of the estimated bias-corrected linking error LE_bc was 0 in all situations in which no DIF was simulated in the item parameters. Overall, one could conclude that the bias in both linking error types can be reduced with an increasing sample size and an increasing number of items. For all linking methods except for the RMGM method, the uncorrected linking error LE had worse performance compared to the bias-corrected linking error LE_bc. Hence, LE_bc could be the preferred choice for a reported linking error.
Table 5 reports the median values of the estimated linking error LE and bias-corrected linking error LE_bc for the estimated standard deviation σ̂. Overall, we observed a similar pattern of findings as in the case of the estimated mean μ̂. Again, the bias-corrected linking error estimates for the RMGM method were unsatisfactory. The bias in the uncorrected linking error estimates was slightly larger for σ̂ than for μ̂. A simple idea might be to use the mean of both of the linking error estimates LE and LE_bc as another linking error to improve the performance of the linking error estimate.
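The averaging idea can be illustrated with a short sketch. The inputs are the median values reported in Table 4 for MM linking with normally distributed DIF effects, τ = 0.25, and I = 10, together with the corresponding value for an infinite sample size (0.076). Two points are assumptions of this sketch and not statements from the article: the average is taken on the scale of the linking error itself (not on the variance scale), and it is applied to medians across replications, not within each replication. The function name is hypothetical.

```python
def averaged_linking_error(le, le_bc):
    """Arithmetic mean of the uncorrected and the bias-corrected linking error (assumed scale)."""
    return 0.5 * (le + le_bc)


# Median (LE, LE_bc) from Table 4: MM linking, normal DIF, tau = 0.25, I = 10.
table4_mm = {500: (0.102, 0.068), 1000: (0.091, 0.074), 2000: (0.086, 0.077)}
le_inf = 0.076  # linking error for an infinite sample size in the same condition

for n, (le, le_bc) in table4_mm.items():
    le_avg = averaged_linking_error(le, le_bc)
    # Deviations from the infinite-sample value for LE, LE_bc, and their average.
    print(n, round(le - le_inf, 4), round(le_bc - le_inf, 4), round(le_avg - le_inf, 4))
```

By construction, the average lies between the two estimates, so it can only help when LE and LE_bc err in opposite directions by a similar amount.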
In Table 6, the coverage rate for the estimated mean μ̂ is displayed. In the no-DIF condition (τ = 0), the coverage rates based on the standard error SE were satisfactory. In the presence of DIF, the uncertainty in the estimated mean μ̂ was underestimated when only the standard error was used for computing confidence intervals, thus resulting in substantial undercoverage. The confidence intervals based on the total error TE tended to have slightly increased coverage rates. In such situations, the coverage rates for the confidence intervals based on the bias-corrected total error TE_bc were slightly better. Generally, the RMGM linking method did not have adequate coverage rates in many situations.
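The undercoverage of intervals based on the standard error alone can be reproduced in a small sketch. It assumes that the total error combines both sources as TE = sqrt(SE² + LE²) and that confidence intervals are symmetric normal-theory intervals; the numerical values of SE and LE, the true value, and the function names are hypothetical and are not taken from the simulation study.

```python
import math
import random


def total_error(se, le):
    """Total error, assuming the variances due to sampling and linking error add up."""
    return math.sqrt(se ** 2 + le ** 2)


def coverage_rate(estimates, errors, true_value, z=1.959964):
    """Percentage of replications whose 95% interval estimate +/- z * error covers the true value."""
    hits = sum(1 for est, err in zip(estimates, errors) if abs(est - true_value) <= z * err)
    return 100.0 * hits / len(estimates)


# Hypothetical setting: sampling variability (se) plus variability due to random DIF (le).
rng = random.Random(2024)
true_mu, se, le = 0.0, 0.05, 0.08
n_rep = 5000
estimates = [true_mu + rng.gauss(0, se) + rng.gauss(0, le) for _ in range(n_rep)]

cov_se = coverage_rate(estimates, [se] * n_rep, true_mu)
cov_te = coverage_rate(estimates, [total_error(se, le)] * n_rep, true_mu)
print(round(cov_se, 1))  # clearly below the nominal 95 percent
print(round(cov_te, 1))  # close to the nominal 95 percent
```

In this idealized setting, the true SE and LE are plugged in. In the simulation study, both quantities are estimated, which is why a biased linking error estimate translates into coverage rates that are too high or too low.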
Finally, Table 7 reports the coverage rates for the estimated standard deviation σ̂. The bias-corrected total error outperformed the uncorrected total error regarding coverage rates. In many situations, the coverage rates based on the total error were too high. However, the RMGM method had substantial overcoverage in many conditions for confidence intervals based on TE and TE_bc, particularly for fewer items or smaller sample sizes.

7. Discussion

In this article, we simultaneously treated standard errors and linking errors for linking methods in the 2PL model. We proposed a bias-corrected linking error estimate, which delivers a bias-corrected total error estimate. This bias-corrected total error outperformed the usually employed total error, which is obtained by simply adding the variances due to the standard error and the usual uncorrected linking error. In a simulation study, it turned out that the confidence intervals for the linking parameters based on the bias-corrected total error outperformed those based on the usual total error regarding coverage rates. Moreover, the bias-corrected linking error estimate was less biased than the uncorrected linking error estimate.
As with any simulation study, our study had several limitations. First, our study only treated the 2PL model for dichotomous item responses. However, the performance of the linking estimators and their variance estimates could also be investigated for the simpler Rasch model for dichotomous item responses [16] or the generalized partial credit model for polytomous item responses [88]. Furthermore, the theory in this article could also be adapted to the chain linking of multiple groups [83,84]. In addition, the distribution types of the DIF effects in the simulation studies were restricted to the symmetric normal distribution and the t distribution with three degrees of freedom. Future research could focus on alternative and asymmetric distributions such as mixture, uniform, or discrete distributions. Moreover, the factor variable θ was assumed to be normally distributed in both simulation studies. The 2PL model could also be estimated with non-normal θ distributions [10,13], which could be investigated in future studies. Next, follow-up research could focus on linking with smaller sample sizes, such as N = 250, as well as the case of unbalanced group sizes. Furthermore, we only employed 10, 20, or 40 items in the two simulation studies. Future research could also investigate a larger number of items. We do not think that linking should be conducted with an even smaller number of items, such as I = 5 items, because the group comparisons will likely become unstable in the presence of DIF, and the representativeness of the link items might be questioned (but see [89]). Also, the extent of nonuniform DIF was not manipulated independently of the extent of uniform DIF in the two simulation studies. Finally, the performance of our proposed error estimates could also be investigated for misspecified IRT models. For example, the 2PL model could be employed for linking if the item response data were generated from the logistic positive exponent model [90,91] or the monotonic polynomial IRT model [92,93]. All of these limitations could be addressed in extensive future research.
As a final side note, I would like to add that a comparison of two groups regarding the distribution of the factor variable θ could also be conducted using concurrent calibration by assuming invariant (i.e., the same) item parameters across the groups. Some researchers argue that linking uncertainty is reduced by assuming invariant item parameters (see [94,95]). I think that this belief is unjustified. The variability due to item selection does not disappear merely because the variability in the model parameters is not represented in the statistical model. The computation of the linking errors under the assumption of invariant item parameters in the statistical model has been worked out in Ref. [96].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
2PL: two-parameter logistic
HAE: Haebara
IRF: item response function
IRT: item response theory
LE: linking error
MGM: mean–geometric–mean
MM: mean–mean
MML: marginal maximum likelihood
RHAE: robust Haebara
RMGM: robust mean–geometric–mean
SE: standard error
TE: total error

References

  1. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. Stat. Sci. 2023; epub ahead of print. Available online: https://imstat.org/journals-and-publications/statistical-science/statistical-science-future-papers/ (accessed on 29 April 2024).
  2. Bock, R.D.; Moustaki, I. Item response theory in a general framework. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 469–513.
  3. Bock, R.D.; Gibbons, R.D. Item Response Theory; Wiley: Hoboken, NJ, USA, 2021.
  4. Rutkowski, L.; von Davier, M.; Rutkowski, D. (Eds.) A Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Chapman Hall/CRC Press: New York, NY, USA, 2013.
  5. Berezner, A.; Adams, R.J. Why large-scale assessments use scaling and item response theory. In Implementation of Large-Scale Education Assessments; Lietz, P., Cresswell, J.C., Rust, K.F., Adams, R.J., Eds.; Wiley: New York, NY, USA, 2017; pp. 323–356.
  6. OECD. PISA 2018. Technical Report; OECD: Paris, France, 2020; Available online: https://bit.ly/3zWbidA (accessed on 29 April 2024).
  7. van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30.
  8. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
  9. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
  10. Bartolucci, F. A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika 2007, 72, 141–157.
  11. Casabianca, J.M.; Lewis, C. IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. J. Educ. Behav. Stat. 2015, 40, 547–578.
  12. von Davier, M. A general diagnostic model applied to language testing data. Br. J. Math. Stat. Psychol. 2008, 61, 287–307.
  13. Xu, X.; von Davier, M. Fitting the Structured General Diagnostic Model to NAEP Data; Research Report No. RR-08-28; Educational Testing Service: Princeton, NJ, USA, 2008.
  14. Woods, C.M.; Lin, N. Item response theory with estimation of the latent density using Davidian curves. Appl. Psychol. Meas. 2009, 33, 102–117.
  15. Woods, C.M. Estimating the latent density in unidimensional IRT to permit non-normality. In Handbook of Item Response Theory Modeling; Reise, S.P., Revicki, D.A., Eds.; Routledge: New York, NY, USA, 2014; pp. 78–102.
  16. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960.
  17. Bond, T.; Yan, Z.; Heene, M. Applying the Rasch Model; Routledge: New York, NY, USA, 2020.
  18. Linacre, J.M. Understanding Rasch measurement: Estimation methods for Rasch measures. J. Outcome Meas. 1999, 3, 382–405. Available online: https://bit.ly/2UV6Eht (accessed on 29 April 2024).
  19. Salzberger, T. The illusion of measurement: Rasch versus 2-PL. Rasch Meas. Trans. 2002, 16, 882. Available online: https://tinyurl.com/25wzmzb5 (accessed on 29 April 2024).
  20. van der Linden, W.J. Fundamental measurement and the fundamentals of Rasch measurement. In Objective Measurement: Theory Into Practice. Vol. 2; Wilson, M., Ed.; Ablex Publishing Corporation: Hillsdale, NJ, USA, 1994; pp. 3–24.
  21. Camilli, G. IRT scoring and test blueprint fidelity. Appl. Psychol. Meas. 2018, 42, 393–400.
  22. Edelsbrunner, P.A. A model and its fit lie in the eye of the beholder: Long live the sum score. Front. Psychol. 2022, 13, 986767.
  23. Hemker, B.T. To a or not to a: On the use of the total score. In Essays on Contemporary Psychometrics; van der Ark, L.A., Emons, W.H.M., Meijer, R.R., Eds.; Springer: Cham, Switzerland, 2023; pp. 251–270.
  24. Robitzsch, A. On the choice of the item response model for scaling PISA data: Model selection based on information criteria and quantifying model uncertainty. Entropy 2022, 24, 760.
  25. Robitzsch, A.; Lüdtke, O. Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies. Meas. Instrum. Soc. Sci. 2022, 4, 9.
  26. White, M. A peculiarity in educational measurement practices. PsyArXiv 2024.
  27. Engelhard, G. Invariant Measurement; Routledge: New York, NY, USA, 2012.
  28. Wind, S.A.; Engelhard, G. How invariant and accurate are domain ratings in writing assessment? Assess. Writ. 2013, 18, 278–299.
  29. Heene, M.; Bollmann, S.; Bühner, M. Much ado about nothing, or much to do about something? J. Individ. Differ. 2014, 35, 245–249.
  30. Ballou, D. Test scaling and value-added measurement. Educ. Financ. Policy 2009, 4, 351–383.
  31. Briggs, D.; Maul, A.; McGrane, J. On the nature of measurement. PsyArXiv 2023.
  32. Heine, J.H.; Heene, M. Measurement and mind: Unveiling the self-delusion of metrification in psychology. Meas. Interdiscip. Res. Persp. 2024; epub ahead of print.
  33. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236.
  34. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459.
  35. Robitzsch, A.; Lüdtke, O. A review of different scaling approaches under full invariance, partial invariance, and noninvariance for cross-sectional country comparisons in large-scale assessments. Psychol. Test Assess. Model. 2020, 62, 233–279.
  36. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
  37. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
  38. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
  39. Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673.
  40. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
  41. Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica 2017, 77, 329–352.
  42. Brennan, R.L. Generalizability Theory; Springer: New York, NY, USA, 2001.
  43. Michaelides, M.P. A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating. Front. Psychol. 2010, 1, 167.
  44. Michaelides, M.P.; Haertel, E.H. Selection of common items as an unrecognized source of variability in test equating: A bootstrap approximation assuming random sampling of common items. Appl. Meas. Educ. 2014, 27, 46–57.
  45. Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. 2016, 53, 152–171.
  46. Sachse, K.A.; Haag, N. Standard errors for national trends in international large-scale assessments in the case of cross-national differential item functioning. Appl. Meas. Educ. 2017, 30, 102–116.
  47. Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika 2017, 82, 610–636.
  48. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335.
  49. OECD. PISA 2012. Technical Report; OECD: Paris, France, 2014; Available online: https://bit.ly/2YLG24g (accessed on 29 April 2024).
  50. Robitzsch, A.; Lüdtke, O. Linking errors in international large-scale assessments: Calculation of standard errors for trend estimation. Assess. Educ. 2019, 26, 444–465.
  51. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
  52. Wu, M. Measurement, sampling, and equating errors in large-scale assessments. Educ. Meas. 2010, 29, 15–27.
  53. Melin, J.; Cano, S.; Pendrill, L. The role of entropy in construct specification equations (CSE) to improve the validity of memory tests. Entropy 2021, 23, 212.
  54. Melin, J.; Cano, S.; Flöel, A.; Göschel, L.; Pendrill, L. The role of entropy in construct specification equations (CSE) to improve the validity of memory tests: Extension to word lists. Entropy 2022, 24, 934.
  55. Melin, J.; Fridberg, H.; Hansson, E.E.; Smedberg, D.; Pendrill, L. Exploring a new application of construct specification equations (CSEs) and entropy: A pilot study with balance measurements. Entropy 2023, 25, 940.
  56. Tennant, A.; Pallant, J.F. DIF matters: A practical approach to test if differential item functioning makes a difference. Rasch Meas. Trans. 2007, 20, 1082–1084.
  57. von Davier, M.; von Davier, A.A. A unified approach to IRT scale linking and scale transformations. Methodology 2007, 3, 115–124.
  58. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
  59. Stefanski, L.A.; Boos, D.D. The calculus of M-estimation. Am. Stat. 2002, 56, 29–38.
  60. Andersson, B. Asymptotic variance of linking coefficient estimators for polytomous IRT models. Appl. Psychol. Meas. 2018, 42, 192–205.
  61. Battauz, M. equateIRT: An R package for IRT test equating. J. Stat. Softw. 2015, 68, 1–22.
  62. Jewsbury, P.A. Error Variance in Common Population Linking Bridge Studies; Research Report No. RR-19-42; Educational Testing Service: Princeton, NJ, USA, 2019.
  63. Monseur, C.; Sibberns, H.; Hastedt, D. Linking errors in trend estimation for international surveys in education. IERI Monogr. Ser. 2008, 1, 113–122.
  64. Haberman, S.J.; Lee, Y.H.; Qian, J. Jackknifing Techniques for Evaluation of Equating Accuracy; Research Report No. RR-09-02; Educational Testing Service: Princeton, NJ, USA, 2009.
  65. Robitzsch, A. Linking error in the 2PL model. J 2023, 6, 58–84.
  66. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144.
  67. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283.
  68. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
  69. Robitzsch, A. Comparing robust linking and regularized estimation for linking two groups in the 1PL and 2PL models in the presence of sparse uniform differential item functioning. Stats 2023, 6, 192–208.
  70. Robitzsch, A. Examining differences of invariance alignment in the Mplus software and the R package sirt. Mathematics 2024, 12, 770.
  71. Haebara, T. Equating logistic ability scales by a weighted least squares method. Jpn. Psychol. Res. 1980, 22, 144–149.
  72. He, Y.; Cui, Z.; Osterlind, S.J. New robust scale transformation methods in the presence of outlying common items. Appl. Psychol. Meas. 2015, 39, 613–626.
  73. He, Y.; Cui, Z. Evaluating robust scale transformation methods with multiple outlying common items under IRT true score equating. Appl. Psychol. Meas. 2020, 44, 296–310.
  74. Robitzsch, A. Robust Haebara linking for many groups: Performance in the case of uniform DIF. Psych 2020, 2, 155–173.
  75. Weeks, J.P. plink: An R package for linking mixed-format tests using IRT-based methods. J. Stat. Softw. 2010, 35, 1–33.
  76. Zeileis, A. Object-oriented computation of sandwich estimators. J. Stat. Softw. 2006, 16, 1–16.
  77. Fay, M.P.; Graubard, B.I. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics 2001, 57, 1198–1206.
  78. Li, P.; Redden, D.T. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat. Med. 2015, 34, 281–296.
  79. Zeileis, A.; Köll, S.; Graham, N. Various versatile variances: An object-oriented implementation of clustered covariances in R. J. Stat. Softw. 2020, 95, 1–36.
  80. Ogasawara, H. Standard errors of item response theory equating/linking by response function methods. Appl. Psychol. Meas. 2001, 25, 53–67.
  81. Ogasawara, H. Item response theory true score equatings and their standard errors. J. Educ. Behav. Stat. 2001, 26, 31–50.
  82. Ogasawara, H. Applications of asymptotic expansion in item response theory linking. In Statistical Models for Test Equating, Scaling, and Linking; von Davier, A., Ed.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 261–280.
  83. Battauz, M. IRT test equating in complex linkage plans. Psychometrika 2013, 78, 464–480.
  84. Battauz, M. Factors affecting the variability of IRT equating coefficients. Stat. Neerl. 2015, 69, 85–101.
  85. Zhang, Z. Asymptotic standard errors of generalized partial credit model true score equating using characteristic curve methods. Appl. Psychol. Meas. 2021, 45, 331–345.
  86. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 15 March 2023).
  87. Robitzsch, A. sirt: Supplementary Item Response Theory Models. R Package Version 4.1-15. 2024. Available online: https://cran.r-project.org/web/packages/sirt/index.html (accessed on 29 April 2024).
  88. Muraki, E. A generalized partial credit model: Application of an EM algorithm. Appl. Psychol. Meas. 1992, 16, 159–176.
  89. Pibal, F.; Cesnik, H.S. Evaluating the quantity-quality trade-off in the selection of anchor items: A vertical scaling approach. Pract. Assess. Res. Eval. 2011, 16, 6.
  90. Samejima, F. Logistic positive exponent family of models: Virtue of asymmetric item characteristic curves. Psychometrika 2000, 65, 319–335.
  91. Huang, Q.; Bolt, D.M.; Lyu, W. Investigating item complexity as a source of cross-national DIF in TIMSS math and science. Large-scale Assess. Educ. 2024, 12, 12.
  92. Falk, C.F.; Cai, L. Semiparametric item response functions in the context of guessing. J. Educ. Meas. 2016, 53, 229–247.
  93. Feuerstahler, L. Flexible item response modeling in R with the flexmet package. Psych 2021, 3, 447–478.
  94. OECD. PISA 2015; Technical Report; OECD: Paris, France, 2017; Available online: https://bit.ly/32buWnZ (accessed on 29 April 2024).
  95. Robitzsch, A.; Lüdtke, O. An examination of the linking error currently used in PISA. Meas. Interdiscip. Res. Persp. 2024, 22, 61–77.
  96. Robitzsch, A. Analytical approximation of the jackknife linking error in item response models utilizing a Taylor expansion of the log-likelihood function. AppliedMath 2023, 3, 49–59.
Table 1. Simulation Study 1: Relative RMSE for the estimated mean μ̂ and for the estimated standard deviation σ̂ as a function of different DIF distributions, DIF standard deviation τ, and number of items I for an infinite sample size.
                μ̂                                  σ̂
Dist  τ     I   MM     MGM    RMGM   HAE    RHAE   MM     MGM    RMGM   HAE    RHAE
norm  0.25  10  100.0  100    125.8  104.6  105.2  102.8  100    104.7  189.3  175.5
norm  0.25  20  100.0  100    126.0  104.4  105.7  102.9  100    104.4  189.0  175.6
norm  0.25  40  100.0  100    125.5  103.9  105.4  102.6  100    104.9  185.0  172.0
norm  0.5   10  100.0  100    141.1  104.5  110.8  102.0  100    114.3  186.6  166.9
norm  0.5   20  100.1  100    141.4  104.3  109.9  102.9  100    113.7  186.6  168.2
norm  0.5   40  100.0  100    143.5  104.5  110.9  102.8  100    114.8  187.6  166.1
t_3   0.25  10  100.1  100    76.5   99.3   79.9   110.6  100    69.8   163.9  129.9
t_3   0.25  20  100.1  100    74.9   96.7   78.4   102.8  100    74.3   175.2  137.6
t_3   0.25  40  100.2  100    75.4   97.5   78.9   108.8  100    73.3   170.7  132.9
t_3   0.5   10  100.1  100    86.5   94.3   77.4   105.1  100    74.8   169.5  127.9
t_3   0.5   20  100.2  100    82.5   93.3   76.1   114.8  100    71.9   167.4  125.2
t_3   0.5   40  100.1  100    74.6   87.0   69.7   114.3  100    70.9   161.7  123.5
Note. Dist = type of DIF distribution; norm = normal distribution; t_3 = t distribution with three degrees of freedom; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. MGM was the reference method for computing the relative RMSE (entries of 100). Relative RMSE values larger than 110.0 are printed in bold font.
Table 2. Simulation Study 1: Coverage rate at 95% confidence level for the estimated mean μ̂ and for the estimated standard deviation σ̂ as a function of different DIF distributions, DIF standard deviation τ, and number of items I for an infinite sample size.
                μ̂                                  σ̂
Dist  τ     I   MM     MGM    RMGM   HAE    RHAE   MM     MGM    RMGM   HAE    RHAE
norm  0.25  10  91.8   91.7   84.3   89.8   91.1   92.2   93.5   93.5   88.4   90.3
norm  0.25  20  93.4   93.3   88.3   92.9   93.5   93.6   94.4   94.2   92.1   93.1
norm  0.25  40  94.0   94.0   91.0   93.9   94.2   95.5   96.0   95.3   93.7   94.2
norm  0.5   10  91.2   91.2   68.9   89.9   89.7   92.7   93.7   90.9   89.3   91.2
norm  0.5   20  92.9   92.8   78.0   92.5   92.9   93.6   94.9   93.4   92.2   93.2
norm  0.5   40  93.9   93.8   84.5   93.9   93.9   94.7   95.6   94.1   93.6   94.5
t_3   0.25  10  92.2   92.1   89.8   91.2   92.6   93.7   95.0   94.6   91.0   92.2
t_3   0.25  20  93.7   93.7   92.2   93.5   94.1   95.4   95.7   95.5   92.6   93.1
t_3   0.25  40  94.0   93.8   93.3   93.8   93.9   95.9   96.6   95.8   94.4   94.6
t_3   0.5   10  93.2   92.9   80.9   92.5   92.8   94.2   95.1   93.8   90.1   92.0
t_3   0.5   20  93.8   93.7   86.8   93.8   94.2   95.4   96.2   95.0   93.4   93.8
t_3   0.5   40  94.1   94.0   90.2   93.7   94.2   95.8   96.4   95.6   93.6   93.9
Note. Dist = type of DIF distribution; norm = normal distribution; t_3 = t distribution with three degrees of freedom; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. Coverage rates smaller than 92.5 and larger than 97.5 are printed in bold font.
Table 3. Simulation Study 2: Relative RMSE for the estimated mean μ̂ and for the estimated standard deviation σ̂ as a function of different DIF distributions, DIF standard deviation τ, number of items I, and sample size N.
                      μ̂                                  σ̂
Dist  τ     I   N     MM     MGM    RMGM   HAE    RHAE   MM     MGM    RMGM   HAE    RHAE
-     0     10  500   99.6   100    100.7  92.0   89.9   96.9   100    118.5  96.2   96.5
-     0     10  1000  99.9   100    100.1  92.6   91.0   98.2   100    112.8  95.8   96.0
-     0     10  2000  99.9   100    99.4   94.0   92.9   99.1   100    107.9  95.7   95.2
-     0     20  500   100.1  100    100.4  94.2   93.6   97.7   100    108.7  97.7   97.7
-     0     20  1000  100.2  100    99.5   95.1   94.3   98.3   100    105.4  98.1   97.8
-     0     20  2000  100.1  100    98.8   95.8   95.3   98.4   100    103.7  98.1   97.9
-     0     40  500   100.0  100    99.2   96.3   95.8   98.4   100    102.7  99.2   98.6
-     0     40  1000  100.1  100    100.1  97.5   97.4   98.8   100    101.7  99.4   99.3
-     0     40  2000  100.1  100    99.8   97.9   97.7   99.4   100    100.9  99.3   99.0
norm  0.25  10  500   99.8   100    118.0  95.9   97.0   97.6   100    118.5  106.4  107.5
norm  0.25  10  1000  99.9   100    122.4  99.3   100.6  99.1   100    114.4  118.1  116.0
norm  0.25  10  2000  100.0  100    123.6  100.6  102.0  99.2   100    110.8  132.1  128.0
norm  0.25  20  500   100.0  100    112.3  96.5   97.0   98.7   100    110.6  104.3  104.3
norm  0.25  20  1000  100.0  100    116.5  99.2   100.0  98.5   100    107.8  115.0  112.6
norm  0.25  20  2000  100.0  100    119.5  100.7  101.1  99.1   100    105.9  124.4  120.7
norm  0.25  40  500   100.0  100    107.6  97.7   98.1   98.4   100    105.0  104.3  103.5
norm  0.25  40  1000  100.1  100    111.5  99.4   100.0  99.4   100    105.4  110.4  109.0
norm  0.25  40  2000  100.0  100    116.3  101.4  102.3  99.3   100    103.6  116.9  114.6
norm  0.5   10  500   99.8   100    130.9  99.6   104.7  98.0   100    122.3  129.4  127.0
norm  0.5   10  1000  99.9   100    133.6  101.8  107.4  100.1  100    118.8  143.2  134.0
norm  0.5   10  2000  99.9   100    135.6  102.5  109.0  99.8   100    117.6  156.5  144.8
norm  0.5   20  500   99.9   100    130.5  100.5  105.0  98.2   100    115.1  125.6  122.2
norm  0.5   20  1000  100.0  100    138.1  101.6  108.5  99.8   100    114.6  139.0  132.5
norm  0.5   20  2000  100.0  100    138.5  102.5  108.6  100.5  100    113.5  155.1  144.7
norm  0.5   40  500   99.9   100    126.5  100.4  104.6  98.7   100    108.2  115.9  113.3
norm  0.5   40  1000  100.0  100    131.8  102.2  106.5  99.4   100    108.5  129.4  125.1
norm  0.5   40  2000  100.1  100    139.8  103.3  108.9  100.9  100    111.6  140.9  132.1
t_3   0.25  10  500   99.8   100    99.2   94.6   87.0   97.8   100    116.6  106.7  103.9
t_3   0.25  10  1000  99.8   100    94.5   94.1   84.6   98.8   100    112.8  113.9  106.4
t_3   0.25  10  2000  100.0  100    90.4   97.8   85.2   100.8  100    102.8  125.8  113.3
t_3   0.25  20  500   99.9   100    100.7  95.4   90.9   98.1   100    108.4  106.0  102.7
t_3   0.25  20  1000  100.0  100    94.9   95.0   87.7   98.8   100    105.8  110.6  104.2
t_3   0.25  20  2000  100.1  100    89.1   97.9   86.8   100.1  100    100.8  120.8  109.0
t_3   0.25  40  500   100.0  100    99.2   97.1   94.1   98.5   100    103.9  103.7  101.5
t_3   0.25  40  1000  100.0  100    96.6   97.4   92.1   99.1   100    102.0  108.2  103.6
t_3   0.25  40  2000  100.1  100    91.7   97.5   88.7   99.8   100    100.1  115.9  107.3
t_3   0.5   10  500   99.8   100    93.4   91.9   81.0   99.5   100    115.6  122.6  112.4
t_3   0.5   10  1000  99.9   100    92.5   92.2   80.4   100.9  100    107.2  134.9  116.5
t_3   0.5   10  2000  100.0  100    91.3   94.5   80.2   101.6  100    99.3   147.3  121.9
t_3   0.5   20  500   100.0  100    97.0   93.4   84.1   98.7   100    107.8  116.0  108.5
t_3   0.5   20  1000  100.0  100    92.7   93.6   82.4   101.0  100    101.9  127.6  112.7
t_3   0.5   20  2000  100.0  100    90.7   93.9   80.0   102.4  100    93.1   135.5  112.8
t_3   0.5   40  500   100.0  100    97.5   95.5   88.7   98.8   100    104.5  112.2  106.4
t_3   0.5   40  1000  100.0  100    92.0   94.7   83.9   100.3  100    100.3  121.0  109.6
t_3   0.5   40  2000  100.0  100    87.1   93.3   80.0   101.9  100    96.0   132.0  111.6
Note. Dist = type of DIF distribution; norm = normal distribution; t_3 = t distribution with three degrees of freedom; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. MGM was the reference method for computing the relative RMSE (entries of 100). Relative RMSE values larger than 110.0 are printed in bold font.
Table 4. Simulation Study 2: Median of the estimated linking error LE and the estimate bias-corrected linking error LE bc for the estimated mean μ ^ as a function of different DIF distributions, DIF standard deviation τ , number of items I, and sample size N.
Table 4. Simulation Study 2: Median of the estimated linking error LE and the estimate bias-corrected linking error LE bc for the estimated mean μ ^ as a function of different DIF distributions, DIF standard deviation τ , number of items I, and sample size N.
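| Dist | τ | I | N | MM LE | MM LE_bc | MGM LE | MGM LE_bc | RMGM LE | RMGM LE_bc | HAE LE | HAE LE_bc | RHAE LE | RHAE LE_bc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0 | 10 | 500 | 0.061 | 0.000 | 0.061 | 0.000 | 0.066 | 0.000 | 0.051 | 0.000 | 0.052 | 0.000 |
| | | | 1000 | 0.042 | 0.000 | 0.042 | 0.000 | 0.045 | 0.000 | 0.036 | 0.000 | 0.037 | 0.000 |
| | | | 2000 | 0.030 | 0.000 | 0.030 | 0.000 | 0.032 | 0.000 | 0.025 | 0.000 | 0.025 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| | | 20 | 500 | 0.043 | 0.000 | 0.043 | 0.000 | 0.045 | 0.000 | 0.036 | 0.000 | 0.036 | 0.000 |
| | | | 1000 | 0.030 | 0.000 | 0.030 | 0.000 | 0.031 | 0.000 | 0.026 | 0.000 | 0.026 | 0.000 |
| | | | 2000 | 0.021 | 0.000 | 0.021 | 0.000 | 0.022 | 0.000 | 0.018 | 0.000 | 0.018 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| | | 40 | 500 | 0.030 | 0.000 | 0.030 | 0.000 | 0.032 | 0.000 | 0.025 | 0.000 | 0.025 | 0.000 |
| | | | 1000 | 0.021 | 0.000 | 0.021 | 0.000 | 0.022 | 0.000 | 0.018 | 0.000 | 0.018 | 0.000 |
| | | | 2000 | 0.015 | 0.000 | 0.015 | 0.000 | 0.015 | 0.000 | 0.013 | 0.000 | 0.013 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| norm | 0.25 | 10 | 500 | 0.102 | 0.068 | 0.102 | 0.068 | 0.100 | 0.000 | 0.094 | 0.066 | 0.102 | 0.066 |
| | | | 1000 | 0.091 | 0.074 | 0.091 | 0.074 | 0.090 | 0.000 | 0.085 | 0.071 | 0.091 | 0.075 |
| | | | 2000 | 0.086 | 0.077 | 0.086 | 0.077 | 0.086 | 0.039 | 0.081 | 0.074 | 0.088 | 0.079 |
| | | | Inf | 0.076 | | 0.075 | | 0.077 | | 0.075 | | 0.080 | |
| | | 20 | 500 | 0.072 | 0.050 | 0.071 | 0.050 | 0.078 | 0.000 | 0.068 | 0.052 | 0.070 | 0.051 |
| | | | 1000 | 0.064 | 0.053 | 0.064 | 0.053 | 0.071 | 0.000 | 0.062 | 0.054 | 0.065 | 0.056 |
| | | | 2000 | 0.060 | 0.055 | 0.060 | 0.055 | 0.066 | 0.040 | 0.060 | 0.056 | 0.062 | 0.057 |
| | | | Inf | 0.054 | | 0.054 | | 0.061 | | 0.056 | | 0.058 | |
| | | 40 | 500 | 0.050 | 0.036 | 0.050 | 0.036 | 0.059 | 0.000 | 0.048 | 0.038 | 0.050 | 0.037 |
| | | | 1000 | 0.045 | 0.038 | 0.045 | 0.038 | 0.054 | 0.000 | 0.045 | 0.039 | 0.046 | 0.040 |
| | | | 2000 | 0.042 | 0.039 | 0.042 | 0.039 | 0.050 | 0.033 | 0.043 | 0.040 | 0.044 | 0.041 |
| | | | Inf | 0.039 | | 0.039 | | 0.046 | | 0.041 | | 0.041 | |
| norm | 0.5 | 10 | 500 | 0.174 | 0.155 | 0.174 | 0.154 | 0.129 | 0.000 | 0.166 | 0.152 | 0.181 | 0.156 |
| | | | 1000 | 0.167 | 0.157 | 0.167 | 0.157 | 0.122 | 0.000 | 0.161 | 0.153 | 0.175 | 0.162 |
| | | | 2000 | 0.163 | 0.159 | 0.163 | 0.159 | 0.119 | 0.065 | 0.160 | 0.156 | 0.171 | 0.165 |
| | | | Inf | 0.151 | | 0.150 | | 0.116 | | 0.151 | | 0.162 | |
| | | 20 | 500 | 0.122 | 0.110 | 0.122 | 0.110 | 0.116 | 0.000 | 0.121 | 0.112 | 0.129 | 0.115 |
| | | | 1000 | 0.117 | 0.112 | 0.117 | 0.111 | 0.111 | 0.021 | 0.118 | 0.114 | 0.127 | 0.120 |
| | | | 2000 | 0.115 | 0.112 | 0.114 | 0.112 | 0.107 | 0.074 | 0.116 | 0.114 | 0.124 | 0.121 |
| | | | Inf | 0.109 | | 0.109 | | 0.105 | | 0.113 | | 0.119 | |
| | | 40 | 500 | 0.085 | 0.078 | 0.085 | 0.077 | 0.096 | 0.000 | 0.086 | 0.081 | 0.092 | 0.083 |
| | | | 1000 | 0.082 | 0.078 | 0.082 | 0.078 | 0.093 | 0.038 | 0.084 | 0.082 | 0.090 | 0.085 |
| | | | 2000 | 0.080 | 0.078 | 0.080 | 0.078 | 0.091 | 0.069 | 0.083 | 0.082 | 0.088 | 0.086 |
| | | | Inf | 0.077 | | 0.077 | | 0.087 | | 0.081 | | 0.086 | |
| t3 | 0.25 | 10 | 500 | 0.092 | 0.051 | 0.092 | 0.051 | 0.091 | 0.000 | 0.083 | 0.048 | 0.086 | 0.041 |
| | | | 1000 | 0.079 | 0.058 | 0.078 | 0.057 | 0.077 | 0.000 | 0.073 | 0.055 | 0.074 | 0.054 |
| | | | 2000 | 0.072 | 0.061 | 0.071 | 0.061 | 0.067 | 0.018 | 0.067 | 0.058 | 0.068 | 0.059 |
| | | | Inf | 0.061 | | 0.061 | | 0.054 | | 0.060 | | 0.059 | |
| | | 20 | 500 | 0.066 | 0.041 | 0.065 | 0.041 | 0.068 | 0.000 | 0.061 | 0.042 | 0.059 | 0.035 |
| | | | 1000 | 0.057 | 0.045 | 0.057 | 0.045 | 0.055 | 0.000 | 0.054 | 0.045 | 0.052 | 0.040 |
| | | | 2000 | 0.053 | 0.047 | 0.053 | 0.047 | 0.049 | 0.021 | 0.052 | 0.047 | 0.048 | 0.042 |
| | | | Inf | 0.046 | | 0.046 | | 0.039 | | 0.046 | | 0.042 | |
| | | 40 | 500 | 0.047 | 0.031 | 0.047 | 0.031 | 0.048 | 0.000 | 0.044 | 0.033 | 0.042 | 0.026 |
| | | | 1000 | 0.041 | 0.033 | 0.041 | 0.033 | 0.040 | 0.000 | 0.040 | 0.034 | 0.037 | 0.029 |
| | | | 2000 | 0.038 | 0.034 | 0.038 | 0.034 | 0.035 | 0.017 | 0.038 | 0.035 | 0.034 | 0.030 |
| | | | Inf | 0.034 | | 0.034 | | 0.028 | | 0.035 | | 0.030 | |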
Note. Dist = type of DIF distribution; norm = normal distribution; t3 = t distribution with three degrees of freedom; Inf = infinite sample size; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. Linking error estimates with an absolute bias larger than 0.01 are printed in bold font.
Table 5. Simulation Study 2: Median of the estimated linking error LE and the estimated bias-corrected linking error LE_bc for the estimated standard deviation σ̂ as a function of different DIF distributions, DIF standard deviation τ, number of items I, and sample size N.
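| Dist | τ | I | N | MM LE | MM LE_bc | MGM LE | MGM LE_bc | RMGM LE | RMGM LE_bc | HAE LE | HAE LE_bc | RHAE LE | RHAE LE_bc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0 | 10 | 500 | 0.086 | 0.000 | 0.087 | 0.000 | 0.093 | 0.000 | 0.074 | 0.000 | 0.079 | 0.000 |
| | | | 1000 | 0.061 | 0.000 | 0.061 | 0.000 | 0.068 | 0.000 | 0.052 | 0.000 | 0.055 | 0.000 |
| | | | 2000 | 0.042 | 0.000 | 0.043 | 0.000 | 0.047 | 0.000 | 0.036 | 0.000 | 0.038 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| | | 20 | 500 | 0.052 | 0.000 | 0.053 | 0.000 | 0.060 | 0.000 | 0.048 | 0.000 | 0.050 | 0.000 |
| | | | 1000 | 0.037 | 0.000 | 0.037 | 0.000 | 0.042 | 0.000 | 0.034 | 0.000 | 0.034 | 0.000 |
| | | | 2000 | 0.026 | 0.000 | 0.026 | 0.000 | 0.028 | 0.000 | 0.024 | 0.000 | 0.024 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| | | 40 | 500 | 0.034 | 0.000 | 0.035 | 0.000 | 0.040 | 0.000 | 0.032 | 0.000 | 0.033 | 0.000 |
| | | | 1000 | 0.024 | 0.000 | 0.025 | 0.000 | 0.027 | 0.000 | 0.023 | 0.000 | 0.023 | 0.000 |
| | | | 2000 | 0.017 | 0.000 | 0.017 | 0.000 | 0.019 | 0.000 | 0.016 | 0.000 | 0.016 | 0.000 |
| | | | Inf | 0.000 | | 0.000 | | 0.000 | | 0.000 | | 0.000 | |
| norm | 0.25 | 10 | 500 | 0.092 | 0.006 | 0.093 | 0.009 | 0.102 | 0.000 | 0.091 | 0.017 | 0.098 | 0.016 |
| | | | 1000 | 0.066 | 0.007 | 0.067 | 0.010 | 0.076 | 0.000 | 0.071 | 0.033 | 0.077 | 0.033 |
| | | | 2000 | 0.051 | 0.019 | 0.052 | 0.021 | 0.059 | 0.006 | 0.061 | 0.042 | 0.065 | 0.044 |
| | | | Inf | 0.029 | | 0.029 | | 0.032 | | 0.046 | | 0.048 | |
| | | 20 | 500 | 0.056 | 0.008 | 0.057 | 0.009 | 0.066 | 0.000 | 0.060 | 0.026 | 0.063 | 0.015 |
| | | | 1000 | 0.042 | 0.015 | 0.043 | 0.014 | 0.049 | 0.000 | 0.050 | 0.032 | 0.051 | 0.029 |
| | | | 2000 | 0.033 | 0.018 | 0.034 | 0.019 | 0.038 | 0.005 | 0.044 | 0.034 | 0.044 | 0.033 |
| | | | Inf | 0.021 | | 0.021 | | 0.022 | | 0.035 | | 0.034 | |
| | | 40 | 500 | 0.037 | 0.009 | 0.038 | 0.009 | 0.045 | 0.000 | 0.042 | 0.022 | 0.042 | 0.013 |
| | | | 1000 | 0.028 | 0.013 | 0.029 | 0.013 | 0.033 | 0.000 | 0.035 | 0.024 | 0.035 | 0.021 |
| | | | 2000 | 0.023 | 0.014 | 0.023 | 0.014 | 0.025 | 0.004 | 0.031 | 0.025 | 0.030 | 0.023 |
| | | | Inf | 0.015 | | 0.015 | | 0.015 | | 0.026 | | 0.024 | |
| norm | 0.5 | 10 | 500 | 0.104 | 0.037 | 0.106 | 0.041 | 0.113 | 0.000 | 0.125 | 0.087 | 0.135 | 0.072 |
| | | | 1000 | 0.085 | 0.051 | 0.086 | 0.052 | 0.098 | 0.000 | 0.112 | 0.092 | 0.118 | 0.089 |
| | | | 2000 | 0.073 | 0.055 | 0.074 | 0.057 | 0.085 | 0.019 | 0.104 | 0.094 | 0.107 | 0.093 |
| | | | Inf | 0.057 | | 0.058 | | 0.065 | | 0.091 | | 0.091 | |
| | | 20 | 500 | 0.068 | 0.038 | 0.069 | 0.038 | 0.081 | 0.000 | 0.089 | 0.070 | 0.090 | 0.058 |
| | | | 1000 | 0.056 | 0.039 | 0.057 | 0.040 | 0.066 | 0.003 | 0.081 | 0.071 | 0.080 | 0.064 |
| | | | 2000 | 0.049 | 0.041 | 0.050 | 0.041 | 0.058 | 0.027 | 0.077 | 0.072 | 0.074 | 0.066 |
| | | | Inf | 0.041 | | 0.042 | | 0.047 | | 0.070 | | 0.067 | |
| | | 40 | 500 | 0.045 | 0.028 | 0.047 | 0.028 | 0.056 | 0.000 | 0.062 | 0.050 | 0.061 | 0.041 |
| | | | 1000 | 0.038 | 0.029 | 0.039 | 0.029 | 0.046 | 0.004 | 0.057 | 0.051 | 0.055 | 0.045 |
| | | | 2000 | 0.034 | 0.029 | 0.035 | 0.030 | 0.040 | 0.023 | 0.054 | 0.051 | 0.052 | 0.047 |
| | | | Inf | 0.030 | | 0.030 | | 0.033 | | 0.051 | | 0.047 | |
| t3 | 0.25 | 10 | 500 | 0.091 | 0.005 | 0.092 | 0.007 | 0.098 | 0.000 | 0.086 | 0.011 | 0.092 | 0.008 |
| | | | 1000 | 0.066 | 0.007 | 0.067 | 0.010 | 0.075 | 0.000 | 0.066 | 0.020 | 0.070 | 0.018 |
| | | | 2000 | 0.050 | 0.013 | 0.050 | 0.015 | 0.056 | 0.003 | 0.054 | 0.031 | 0.057 | 0.031 |
| | | | Inf | 0.024 | | 0.024 | | 0.022 | | 0.038 | | 0.037 | |
| | | 20 | 500 | 0.056 | 0.007 | 0.057 | 0.009 | 0.064 | 0.000 | 0.058 | 0.020 | 0.059 | 0.009 |
| | | | 1000 | 0.041 | 0.013 | 0.042 | 0.013 | 0.047 | 0.000 | 0.046 | 0.026 | 0.046 | 0.019 |
| | | | 2000 | 0.032 | 0.016 | 0.032 | 0.016 | 0.034 | 0.002 | 0.039 | 0.029 | 0.038 | 0.025 |
| | | | Inf | 0.018 | | 0.018 | | 0.015 | | 0.029 | | 0.026 | |
| | | 40 | 500 | 0.037 | 0.008 | 0.038 | 0.008 | 0.043 | 0.000 | 0.040 | 0.018 | 0.039 | 0.007 |
| | | | 1000 | 0.028 | 0.011 | 0.028 | 0.011 | 0.031 | 0.000 | 0.032 | 0.020 | 0.031 | 0.015 |
| | | | 2000 | 0.022 | 0.012 | 0.022 | 0.012 | 0.023 | 0.002 | 0.028 | 0.022 | 0.026 | 0.018 |
| | | | Inf | 0.013 | | 0.013 | | 0.011 | | 0.022 | | 0.019 | |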
Note. Dist = type of DIF distribution; norm = normal distribution; t3 = t distribution with three degrees of freedom; Inf = infinite sample size; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. Linking error estimates with an absolute bias larger than 0.01 are printed in bold font.
Table 6. Simulation Study 2: Coverage rate at 95% confidence level based on the standard error SE, total error TE, and bias-corrected total error TE_bc for the estimated mean μ̂ as a function of different DIF distributions, DIF standard deviation τ, number of items I, and sample size N.
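| Dist | τ | I | N | SE MM | SE MGM | SE RMGM | SE HAE | SE RHAE | TE MM | TE MGM | TE RMGM | TE HAE | TE RHAE | TE_bc MM | TE_bc MGM | TE_bc RMGM | TE_bc HAE | TE_bc RHAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0 | 10 | 500 | 94.6 | 94.6 | 98.7 | 94.9 | 95.2 | 97.7 | 97.6 | 99.4 | 97.4 | 97.9 | 95.0 | 95.0 | 98.8 | 95.1 | 95.4 |
| | | | 1000 | 94.4 | 94.5 | 97.7 | 94.9 | 95.0 | 97.9 | 97.9 | 99.1 | 97.7 | 97.8 | 94.9 | 94.9 | 97.7 | 95.1 | 95.2 |
| | | | 2000 | 94.6 | 94.5 | 97.0 | 94.9 | 95.0 | 98.0 | 98.0 | 98.7 | 97.7 | 97.8 | 95.0 | 95.0 | 97.2 | 95.1 | 95.3 |
| | | 20 | 500 | 94.5 | 94.3 | 98.2 | 94.6 | 94.9 | 97.0 | 97.0 | 99.1 | 96.9 | 97.1 | 94.6 | 94.5 | 98.2 | 94.7 | 95.0 |
| | | | 1000 | 94.5 | 94.5 | 97.3 | 94.9 | 95.2 | 96.8 | 96.7 | 98.5 | 96.7 | 97.0 | 94.7 | 94.7 | 97.3 | 95.0 | 95.2 |
| | | | 2000 | 94.5 | 94.5 | 96.7 | 95.0 | 95.0 | 97.2 | 97.2 | 98.4 | 96.9 | 97.1 | 94.8 | 94.7 | 96.7 | 95.1 | 95.0 |
| | | 40 | 500 | 94.6 | 94.6 | 98.0 | 95.3 | 95.6 | 96.7 | 96.6 | 98.7 | 96.5 | 96.8 | 94.8 | 94.7 | 98.0 | 95.3 | 95.6 |
| | | | 1000 | 95.3 | 95.2 | 96.9 | 95.8 | 95.7 | 96.7 | 96.6 | 98.0 | 96.6 | 96.6 | 95.4 | 95.3 | 96.9 | 95.8 | 95.7 |
| | | | 2000 | 94.6 | 94.6 | 96.4 | 95.3 | 95.3 | 96.3 | 96.1 | 97.3 | 96.4 | 96.4 | 94.7 | 94.8 | 96.4 | 95.3 | 95.4 |
| norm | 0.25 | 10 | 500 | 86.0 | 86.0 | 95.3 | 84.4 | 86.3 | 96.3 | 96.4 | 97.5 | 95.8 | 96.4 | 92.0 | 92.0 | 95.5 | 91.7 | 92.9 |
| | | | 1000 | 78.7 | 78.7 | 88.5 | 76.7 | 77.3 | 95.4 | 95.4 | 94.8 | 94.6 | 95.4 | 92.0 | 92.0 | 90.1 | 91.5 | 92.2 |
| | | | 2000 | 66.1 | 65.9 | 78.6 | 64.0 | 64.7 | 94.5 | 94.4 | 91.7 | 93.0 | 93.6 | 92.0 | 92.0 | 84.4 | 90.6 | 91.3 |
| | | 20 | 500 | 88.3 | 88.0 | 96.4 | 87.9 | 89.1 | 95.7 | 95.6 | 98.3 | 95.9 | 95.9 | 93.4 | 93.3 | 96.4 | 93.8 | 93.9 |
| | | | 1000 | 82.3 | 82.2 | 91.8 | 81.8 | 82.5 | 95.6 | 95.5 | 96.8 | 95.3 | 95.8 | 93.0 | 92.8 | 92.8 | 93.4 | 93.7 |
| | | | 2000 | 75.1 | 75.1 | 83.6 | 72.7 | 74.1 | 95.3 | 95.4 | 95.6 | 95.0 | 95.5 | 93.7 | 93.8 | 89.9 | 93.5 | 94.1 |
| | | 40 | 500 | 90.7 | 90.6 | 96.7 | 91.0 | 91.6 | 95.1 | 95.0 | 98.3 | 95.1 | 95.5 | 93.3 | 93.2 | 96.8 | 93.8 | 94.2 |
| | | | 1000 | 87.4 | 87.4 | 93.3 | 87.3 | 88.0 | 95.4 | 95.3 | 97.2 | 95.4 | 95.5 | 93.8 | 93.8 | 94.1 | 94.3 | 94.2 |
| | | | 2000 | 81.4 | 80.9 | 86.1 | 80.5 | 80.5 | 95.3 | 95.2 | 96.0 | 95.4 | 95.4 | 94.6 | 94.4 | 92.3 | 94.7 | 94.7 |
| norm | 0.5 | 10 | 500 | 66.4 | 66.4 | 85.0 | 63.6 | 67.4 | 93.9 | 93.7 | 89.9 | 92.5 | 92.9 | 91.0 | 91.0 | 85.7 | 90.2 | 89.6 |
| | | | 1000 | 54.2 | 54.2 | 73.5 | 50.5 | 55.0 | 93.0 | 92.9 | 84.9 | 91.4 | 92.4 | 91.4 | 91.2 | 77.6 | 89.7 | 90.0 |
| | | | 2000 | 40.3 | 40.5 | 58.3 | 38.2 | 41.2 | 92.2 | 92.2 | 78.8 | 90.9 | 90.6 | 91.2 | 91.2 | 69.9 | 90.1 | 89.3 |
| | | 20 | 500 | 74.6 | 74.6 | 89.7 | 72.9 | 74.0 | 95.1 | 95.0 | 94.4 | 95.1 | 95.1 | 93.6 | 93.5 | 90.2 | 93.8 | 93.0 |
| | | | 1000 | 64.0 | 63.6 | 78.2 | 61.8 | 62.3 | 94.6 | 94.5 | 90.0 | 94.5 | 94.0 | 93.3 | 93.3 | 82.1 | 93.7 | 92.8 |
| | | | 2000 | 48.8 | 48.8 | 64.5 | 47.2 | 48.9 | 94.5 | 94.5 | 84.7 | 93.3 | 92.9 | 93.9 | 93.8 | 78.0 | 92.9 | 92.1 |
| | | 40 | 500 | 82.3 | 82.2 | 92.8 | 81.7 | 82.5 | 95.5 | 95.4 | 96.4 | 95.7 | 95.5 | 94.4 | 94.4 | 93.1 | 94.8 | 94.5 |
| | | | 1000 | 72.9 | 72.5 | 84.4 | 71.7 | 72.5 | 95.0 | 94.9 | 94.2 | 95.0 | 95.1 | 94.5 | 94.3 | 87.8 | 94.5 | 94.6 |
| | | | 2000 | 60.7 | 60.4 | 71.1 | 58.7 | 58.6 | 94.8 | 94.7 | 90.9 | 94.7 | 94.6 | 94.3 | 94.3 | 85.6 | 94.4 | 94.0 |
| t3 | 0.25 | 10 | 500 | 87.0 | 86.9 | 97.3 | 85.8 | 90.0 | 97.1 | 97.3 | 98.6 | 96.7 | 97.6 | 93.1 | 93.0 | 97.4 | 92.6 | 93.8 |
| | | | 1000 | 80.7 | 80.6 | 94.1 | 78.9 | 84.0 | 95.9 | 95.9 | 97.3 | 95.2 | 96.2 | 92.4 | 92.3 | 94.4 | 91.8 | 92.5 |
| | | | 2000 | 70.7 | 70.5 | 88.3 | 69.3 | 74.5 | 96.0 | 95.9 | 96.2 | 94.7 | 95.5 | 92.5 | 92.4 | 91.8 | 91.6 | 92.9 |
| | | 20 | 500 | 89.6 | 89.2 | 97.6 | 89.2 | 91.4 | 96.3 | 96.2 | 98.7 | 96.1 | 96.6 | 93.4 | 93.4 | 97.6 | 93.6 | 93.9 |
| | | | 1000 | 84.2 | 84.1 | 94.9 | 84.3 | 87.9 | 95.9 | 95.9 | 98.0 | 95.6 | 96.3 | 93.7 | 93.5 | 95.3 | 93.7 | 94.0 |
| | | | 2000 | 75.0 | 75.1 | 90.0 | 74.5 | 80.3 | 95.1 | 95.0 | 96.9 | 94.8 | 95.1 | 93.0 | 92.9 | 92.9 | 93.1 | 93.3 |
| | | 40 | 500 | 91.5 | 91.5 | 97.5 | 92.4 | 93.4 | 96.2 | 96.0 | 98.9 | 96.0 | 96.4 | 94.1 | 94.0 | 97.5 | 95.0 | 94.9 |
| | | | 1000 | 87.8 | 87.6 | 95.4 | 88.5 | 90.6 | 96.1 | 96.1 | 98.0 | 96.1 | 96.2 | 94.1 | 94.0 | 95.7 | 94.7 | 94.7 |
| | | | 2000 | 82.5 | 82.2 | 92.5 | 83.5 | 87.1 | 95.6 | 95.6 | 97.0 | 95.6 | 95.9 | 94.2 | 94.2 | 94.5 | 94.7 | 94.5 |
| t3 | 0.5 | 10 | 500 | 70.7 | 70.8 | 92.3 | 69.6 | 78.5 | 95.2 | 95.1 | 95.8 | 94.2 | 96.0 | 92.1 | 91.9 | 92.7 | 91.4 | 91.9 |
| | | | 1000 | 59.8 | 59.9 | 84.9 | 58.4 | 66.9 | 94.4 | 94.3 | 91.8 | 93.0 | 94.7 | 92.1 | 92.0 | 86.7 | 90.9 | 91.9 |
| | | | 2000 | 48.1 | 48.1 | 73.3 | 46.1 | 55.5 | 94.4 | 94.2 | 89.7 | 93.1 | 94.0 | 93.0 | 93.0 | 81.8 | 91.8 | 92.4 |
| | | 20 | 500 | 77.7 | 77.8 | 95.5 | 77.9 | 84.9 | 95.7 | 95.7 | 97.8 | 95.9 | 96.8 | 94.2 | 94.2 | 95.6 | 94.3 | 94.8 |
| | | | 1000 | 65.7 | 65.5 | 89.0 | 65.6 | 73.5 | 95.3 | 95.1 | 95.8 | 94.6 | 95.6 | 93.8 | 93.7 | 90.7 | 93.7 | 93.7 |
| | | | 2000 | 53.0 | 52.8 | 78.7 | 52.1 | 62.5 | 94.8 | 94.7 | 93.0 | 94.5 | 95.0 | 93.9 | 93.9 | 86.7 | 93.9 | 94.1 |
| | | 40 | 500 | 82.7 | 82.3 | 96.4 | 83.9 | 88.4 | 96.1 | 96.0 | 98.4 | 95.8 | 96.1 | 94.7 | 94.6 | 96.6 | 95.1 | 94.8 |
| | | | 1000 | 73.6 | 73.5 | 91.3 | 74.8 | 81.2 | 95.0 | 95.0 | 96.8 | 95.3 | 95.9 | 94.1 | 94.1 | 93.1 | 94.5 | 94.8 |
| | | | 2000 | 62.3 | 61.9 | 84.2 | 63.9 | 71.3 | 95.3 | 95.0 | 95.3 | 95.0 | 95.8 | 94.5 | 94.4 | 90.9 | 94.4 | 94.7 |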
Note. Dist = type of DIF distribution; norm = normal distribution; t3 = t distribution with three degrees of freedom; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. Coverage rates smaller than 92.5 and larger than 97.5 are printed in bold font.
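How a coverage rate of this kind could be computed from simulation replications is sketched below. The sketch rests on two assumptions that are not spelled out in this section: that the total error combines the standard error and the linking error as TE = sqrt(SE² + LE²), and that the 95% confidence interval is the estimate plus or minus 1.96 times the chosen error. The function names and the toy numbers are hypothetical.

```python
import math


def total_error(se, le):
    """Combine standard error and linking error (assumed: TE = sqrt(SE^2 + LE^2))."""
    return math.sqrt(se ** 2 + le ** 2)


def coverage_rate(estimates, errors, true_value, z=1.96):
    """Percentage of replications whose interval estimate +/- z * error
    contains the true parameter value."""
    hits = sum(
        1
        for est, err in zip(estimates, errors)
        if abs(est - true_value) <= z * err
    )
    return 100.0 * hits / len(estimates)


# Toy replications (hypothetical): estimates of a group mean whose true value is 0.
estimates = [0.00, 0.10, 0.30, -0.50]
se = [0.10, 0.10, 0.10, 0.10]   # standard errors per replication
le = [0.12, 0.12, 0.12, 0.12]   # linking errors per replication

cov_se = coverage_rate(estimates, se, true_value=0.0)
cov_te = coverage_rate(
    estimates, [total_error(s, l) for s, l in zip(se, le)], true_value=0.0
)
```

In the toy data, intervals based on SE alone cover the true value less often than intervals based on TE, which mirrors the qualitative pattern in the table when DIF is present. Passing errors built from a bias-corrected linking error instead of LE would give the TE_bc variant.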
Table 7. Simulation Study 2: Coverage rate at 95% confidence level based on the standard error SE, total error TE, and bias-corrected total error TE_bc for the estimated standard deviation σ̂ as a function of different DIF distributions, DIF standard deviation τ, number of items I, and sample size N.
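| Dist | τ | I | N | SE MM | SE MGM | SE RMGM | SE HAE | SE RHAE | TE MM | TE MGM | TE RMGM | TE HAE | TE RHAE | TE_bc MM | TE_bc MGM | TE_bc RMGM | TE_bc HAE | TE_bc RHAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| - | 0 | 10 | 500 | 95.1 | 95.3 | 99.2 | 94.8 | 95.9 | 98.9 | 98.9 | 99.5 | 98.4 | 98.8 | 96.1 | 96.1 | 99.2 | 95.3 | 96.3 |
| | | | 1000 | 94.8 | 95.1 | 99.0 | 95.3 | 96.0 | 99.1 | 99.0 | 99.7 | 98.8 | 99.0 | 95.9 | 95.9 | 99.0 | 96.0 | 96.4 |
| | | | 2000 | 94.7 | 94.8 | 98.0 | 95.2 | 95.3 | 99.0 | 99.0 | 99.5 | 99.0 | 99.1 | 95.6 | 95.6 | 98.1 | 95.6 | 95.8 |
| | | 20 | 500 | 95.0 | 95.0 | 99.1 | 95.1 | 95.7 | 98.3 | 98.4 | 99.5 | 98.0 | 98.4 | 95.4 | 95.3 | 99.1 | 95.5 | 96.0 |
| | | | 1000 | 94.9 | 94.8 | 98.3 | 94.5 | 95.0 | 98.0 | 98.1 | 99.2 | 97.7 | 98.1 | 95.2 | 95.2 | 98.3 | 94.8 | 95.3 |
| | | | 2000 | 95.7 | 95.5 | 97.2 | 95.3 | 95.4 | 98.3 | 98.1 | 99.0 | 98.0 | 98.1 | 96.1 | 95.9 | 97.4 | 95.5 | 95.6 |
| | | 40 | 500 | 94.4 | 94.8 | 98.3 | 94.6 | 95.2 | 97.2 | 97.4 | 98.8 | 96.8 | 97.2 | 94.5 | 95.0 | 98.3 | 94.8 | 95.4 |
| | | | 1000 | 95.4 | 95.4 | 97.7 | 95.1 | 95.2 | 97.7 | 97.5 | 98.9 | 97.1 | 97.3 | 95.6 | 95.5 | 97.7 | 95.2 | 95.3 |
| | | | 2000 | 95.6 | 95.7 | 97.2 | 95.7 | 95.7 | 97.7 | 97.7 | 98.5 | 97.7 | 97.6 | 95.8 | 95.9 | 97.2 | 95.8 | 95.9 |
| norm | 0.25 | 10 | 500 | 93.7 | 93.8 | 99.1 | 91.1 | 93.4 | 98.7 | 98.7 | 99.5 | 97.9 | 98.7 | 95.1 | 94.9 | 99.1 | 93.2 | 95.2 |
| | | | 1000 | 92.7 | 92.7 | 98.6 | 85.9 | 88.9 | 98.6 | 98.8 | 99.4 | 97.2 | 97.9 | 94.6 | 94.5 | 98.6 | 91.4 | 93.4 |
| | | | 2000 | 89.8 | 90.0 | 96.6 | 77.3 | 80.6 | 98.4 | 98.4 | 98.8 | 95.3 | 96.5 | 93.7 | 93.9 | 96.8 | 88.8 | 91.0 |
| | | 20 | 500 | 93.1 | 93.2 | 98.9 | 91.2 | 93.3 | 98.0 | 98.1 | 99.4 | 97.3 | 97.8 | 94.1 | 94.5 | 98.9 | 93.7 | 94.5 |
| | | | 1000 | 93.0 | 93.2 | 97.8 | 88.1 | 90.3 | 97.9 | 97.8 | 99.1 | 97.0 | 97.7 | 94.8 | 94.5 | 97.8 | 92.7 | 94.2 |
| | | | 2000 | 91.0 | 91.2 | 95.4 | 81.4 | 84.2 | 97.3 | 97.5 | 98.8 | 95.9 | 96.9 | 94.0 | 94.2 | 96.0 | 93.0 | 93.7 |
| | | 40 | 500 | 94.5 | 94.4 | 97.9 | 93.1 | 94.0 | 97.0 | 97.2 | 99.0 | 97.0 | 97.7 | 95.0 | 94.9 | 97.9 | 94.7 | 94.8 |
| | | | 1000 | 94.1 | 94.2 | 97.0 | 90.5 | 91.9 | 97.1 | 97.4 | 98.6 | 96.6 | 97.2 | 94.9 | 95.1 | 97.1 | 94.4 | 94.7 |
| | | | 2000 | 92.9 | 92.7 | 94.8 | 86.2 | 88.4 | 97.2 | 97.4 | 98.3 | 96.3 | 96.6 | 94.8 | 95.1 | 95.4 | 94.3 | 94.7 |
| norm | 0.5 | 10 | 500 | 89.8 | 89.9 | 98.3 | 78.2 | 86.4 | 98.1 | 97.8 | 99.1 | 95.4 | 97.1 | 93.2 | 93.5 | 98.3 | 90.1 | 92.5 |
| | | | 1000 | 84.0 | 84.8 | 96.1 | 66.5 | 77.4 | 97.5 | 97.7 | 98.4 | 94.0 | 96.0 | 91.5 | 92.2 | 96.3 | 88.9 | 91.2 |
| | | | 2000 | 76.4 | 76.5 | 91.5 | 54.8 | 64.2 | 96.3 | 96.5 | 96.9 | 91.8 | 94.4 | 91.3 | 92.1 | 92.6 | 88.7 | 90.3 |
| | | 20 | 500 | 91.2 | 91.3 | 98.5 | 82.6 | 88.2 | 97.9 | 97.8 | 99.4 | 95.9 | 97.4 | 94.2 | 94.7 | 98.6 | 92.9 | 93.9 |
| | | | 1000 | 87.1 | 87.6 | 96.0 | 73.8 | 80.1 | 97.4 | 97.5 | 98.7 | 94.8 | 96.3 | 93.9 | 94.5 | 96.3 | 92.5 | 93.3 |
| | | | 2000 | 80.2 | 81.2 | 91.1 | 59.3 | 67.2 | 96.6 | 97.0 | 97.6 | 93.6 | 95.1 | 94.3 | 94.7 | 94.0 | 91.9 | 92.8 |
| | | 40 | 500 | 92.2 | 92.3 | 97.5 | 86.5 | 90.1 | 96.6 | 96.9 | 98.8 | 96.1 | 97.2 | 94.3 | 94.5 | 97.6 | 94.1 | 94.5 |
| | | | 1000 | 90.0 | 90.7 | 95.9 | 79.2 | 84.3 | 97.0 | 97.2 | 98.5 | 96.2 | 96.8 | 94.7 | 95.0 | 96.3 | 94.4 | 94.4 |
| | | | 2000 | 84.2 | 85.4 | 90.8 | 68.7 | 75.1 | 96.4 | 96.7 | 98.1 | 95.6 | 96.3 | 94.6 | 95.0 | 94.4 | 94.4 | 95.1 |
| t3 | 0.25 | 10 | 500 | 93.9 | 94.1 | 99.3 | 91.7 | 94.4 | 98.9 | 99.0 | 99.6 | 98.0 | 98.7 | 95.3 | 95.5 | 99.3 | 94.2 | 95.5 |
| | | | 1000 | 92.5 | 92.7 | 98.7 | 87.6 | 91.2 | 98.6 | 98.5 | 99.3 | 97.4 | 98.1 | 94.6 | 94.6 | 98.7 | 92.2 | 94.0 |
| | | | 2000 | 89.9 | 90.4 | 97.4 | 80.7 | 86.0 | 98.2 | 98.3 | 99.1 | 96.3 | 97.3 | 93.9 | 94.3 | 97.6 | 90.8 | 92.8 |
| | | 20 | 500 | 93.4 | 93.9 | 98.7 | 91.3 | 93.6 | 97.8 | 98.0 | 99.4 | 97.3 | 97.8 | 94.6 | 94.7 | 98.7 | 93.5 | 94.7 |
| | | | 1000 | 93.3 | 93.5 | 97.8 | 89.5 | 92.4 | 98.0 | 98.3 | 99.3 | 97.5 | 98.2 | 95.0 | 95.2 | 97.8 | 94.1 | 94.6 |
| | | | 2000 | 91.1 | 91.2 | 96.3 | 83.3 | 88.4 | 98.0 | 98.2 | 99.0 | 96.5 | 97.2 | 94.7 | 94.7 | 96.6 | 93.3 | 93.8 |
| | | 40 | 500 | 94.1 | 94.2 | 98.2 | 93.2 | 94.4 | 96.9 | 97.0 | 99.1 | 97.0 | 97.5 | 94.6 | 94.8 | 98.2 | 94.7 | 95.0 |
| | | | 1000 | 93.4 | 93.6 | 97.0 | 90.6 | 92.2 | 97.1 | 97.0 | 98.7 | 96.7 | 97.2 | 94.6 | 94.8 | 97.0 | 94.1 | 94.1 |
| | | | 2000 | 92.3 | 92.5 | 95.1 | 87.4 | 90.8 | 97.0 | 97.3 | 98.1 | 95.9 | 96.5 | 94.4 | 94.8 | 95.4 | 93.7 | 94.4 |
| t3 | 0.5 | 10 | 500 | 91.1 | 91.3 | 98.8 | 82.1 | 90.4 | 98.4 | 98.6 | 99.2 | 96.8 | 98.0 | 94.6 | 94.6 | 98.8 | 91.8 | 93.6 |
| | | | 1000 | 86.1 | 86.6 | 97.5 | 71.2 | 82.6 | 97.7 | 97.8 | 98.8 | 95.6 | 97.0 | 93.3 | 93.8 | 97.5 | 89.9 | 91.8 |
| | | | 2000 | 79.7 | 80.0 | 94.6 | 61.1 | 73.4 | 97.0 | 97.2 | 98.2 | 93.7 | 95.1 | 92.1 | 92.3 | 94.9 | 90.0 | 91.2 |
| | | 20 | 500 | 91.6 | 92.0 | 98.3 | 84.6 | 90.4 | 97.5 | 97.9 | 99.4 | 96.6 | 97.6 | 94.7 | 94.9 | 98.3 | 93.3 | 93.7 |
| | | | 1000 | 87.8 | 87.9 | 96.7 | 77.6 | 85.5 | 97.6 | 97.7 | 98.7 | 95.9 | 97.0 | 94.3 | 94.4 | 96.9 | 93.2 | 93.4 |
| | | | 2000 | 81.0 | 81.9 | 94.1 | 66.4 | 78.5 | 97.3 | 97.5 | 98.4 | 94.8 | 96.3 | 94.7 | 94.8 | 95.4 | 92.8 | 93.6 |
| | | 40 | 500 | 91.8 | 92.6 | 98.1 | 88.2 | 92.0 | 97.1 | 97.4 | 99.1 | 96.3 | 96.8 | 94.5 | 94.8 | 98.1 | 94.0 | 94.2 |
| | | | 1000 | 90.0 | 90.3 | 95.8 | 82.7 | 88.4 | 96.9 | 97.0 | 98.4 | 95.9 | 96.5 | 94.8 | 94.9 | 96.0 | 93.9 | 94.4 |
| | | | 2000 | 86.8 | 87.6 | 93.8 | 73.9 | 82.3 | 96.9 | 97.2 | 98.3 | 95.6 | 96.4 | 95.5 | 95.8 | 95.2 | 94.4 | 94.9 |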
Note. Dist = type of DIF distribution; norm = normal distribution; t3 = t distribution with three degrees of freedom; MM = mean-mean linking; MGM = mean-geometric-mean linking; RMGM = robust mean-geometric-mean linking; HAE = symmetric Haebara linking; RHAE = robust symmetric Haebara linking. Coverage rates smaller than 92.5 and larger than 97.5 are printed in bold font.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Robitzsch, A. Estimation of Standard Error, Linking Error, and Total Error for Robust and Nonrobust Linking Methods in the Two-Parameter Logistic Model. Stats 2024, 7, 592-612. https://doi.org/10.3390/stats7030036

