Abstract
Construct validity is essential to evaluate the generalizability of findings on literacy and dyslexia. Operational definitions of reading literacy determine the measurement method, yielding territory- or country-wide literacy rates. This practice echoes the norm in diagnosis and prevalence estimates of dyslexia. International Large-Scale Assessments (ILSA) of literacy such as the Programme for International Student Assessment (PISA) compare countries’ performances in relation to how well their students are reading. In this paper, we reexamine the validity claims and evidence using the examples of countries in Southeast Asia—Indonesia, Malaysia and Thailand—which are purported to have high proportions of poor readers. The challenge of characterizing reading performance and designing suitable measures for valid international comparisons is similar across phases of reading development and proficiency. The importance of the specificity of scripts and languages for reading abilities and impairments is highlighted. We suggest ways in which researchers can approach the assessment of reading proficiency from a cross-cultural and an interdisciplinary perspective. These perspectives can foster contextual caveats for generating and interpreting evidence.
Introduction
To what extent are international rankings of literacy achievement across countries and regions accurate? Given the centrality of basic reading and writing skills for higher level learning, such international rankings appear to be critical as global markers of educational quality. Indeed, most governments prioritize high levels of literacy. Presumably, some of their willingness to participate in international literacy surveys is attributable to the fact that they strive to understand and track the literacy skills of their populations. After all, with the rapid rise of international literacy assessments and their influential role in education policy, reading proficiency of different countries is promoted as an objective proxy of the effectiveness of their education systems. In this review, we consider the question of validity of international literacy measures using the example of PISA through the lens of dyslexia.
“Thinking about validity can be frustrating, and trying to do something about validity can be even more frustrating,” Kane (1992, p. 230) candidly stated. Trying to evaluate the validity arguments of an international ranking on PISA reading and trying to say something about dyslexia can be equally frustrating. The former involves reading achievement of adolescents indexed by countries or economies; the latter centers on word reading difficulties. International large-scale assessments can be a useful avenue for unpacking research on reading impairments across languages. In most educational assessments, the test consumer focuses on the interpretation and uses of scores while the test developer emphasises examining both the instrument and the theory that informed its construction. Literacy researchers with the goal of educational equity have an obligation to take on both these roles.
Definitions of dyslexia vary widely across countries (e.g., McBride, 2019), and reading skills are also associated with economic factors at the country, school, and family levels (e.g., Chiu & McBride-Chang, 2006, 2010). Thus, we must scrutinize the extent to which international literacy comparisons are valid given the myriad of differences across countries and territories. We offer suggestions to consider in understanding the complexity of reading scores, including the poorest readers.
According to the International Dyslexia Association (2016), the global prevalence rate of dyslexia is approximately 15–20%. The variation of dyslexia prevalence in alphabetic writing systems ranges from as low as 2% (Fluss et al., 2008; Miles et al., 1998), to an average of 12–15%, and as high as 17.5% (Peterson & Pennington, 2012; Shaywitz, 1998) and 19.90% (Jiménez et al., 2011; Prior et al., 1995). One of the earliest reviews of dyslexia in non-alphabetic scripts was amongst Chinese and Japanese readers (Stevenson et al., 1982). Since then, there has been a gradually increasing interest in dyslexia research in Asia. In Chinese-speaking school children, for example, a dyslexia prevalence rate of 3.0–12.6% (Gu et al., 2018) is reported, with one estimate of 9.7% in Hong Kong (Chan et al., 2007). Globally, the rate of dyslexia is about 9.7% (Sharma & Sagar, 2017). Yang et al. (2022) analysed published results from the 1950s to June 2021 and estimated an average rate of developmental dyslexia of 7.10% (7.26% in alphabetic scripts and 6.97% in logographic scripts). Although studies on the prevalence rate of dyslexia are common, it is important to note that across countries and regions, the definitions and identification of dyslexia may vary greatly, thus potentially rendering comparison of prevalence rates somewhat meaningless (see McBride, 2019).
Since Elley (1992) asked How in the World Do Students Read?, the Programme for International Student Assessment (PISA) reading literacy data have become ubiquitous as a default reference for how well students are reading and where they stand in comparison to their peers from other countries. We focus here on PISA results because they are among the best-known literacy comparisons. Given the tradition of such global literacy indices, to what extent are they comparable? As prevalence rates of dyslexia across countries and regions tend to be difficult or impossible to compare, how can ranking of reading ability be done meaningfully?
The aim of PISA reading literacy and the validity argument
The major goal of PISA is to assess whether adolescents with an average age of 15 years, upon completion of at least 6 years of formal compulsory schooling, can apply what they have learned in school to real-life situations (OECD, 2019a). What are the OECD's arguments for the comparability of PISA across cultures? First, PISA reading adopts a broad literacy approach (Hopfenbeck et al., 2018); hence, the content is independent of the curricula mandated by a specific school board or government. Second, there are uniform testing conditions, enabled through the standard training that all test administrators across countries undergo. Third, the adaptation and translation of test items is done to ensure linguistic equivalence (McQueen & Mendelovits, 2003).
Despite all of this, measurement invariance persists as a problem for valid cross-lingual comparisons (Huang et al., 2016; Padilla & Benítez, 2014). What are some of these difficulties? Measurement invariance or equivalence assumes that the psychometric properties of PISA are equal (i.e., invariant or equivalent) across groups (i.e., countries, economies, languages, scripts). Equivalence testing seeks to answer whether reported rankings based on observed score differences are due to (i) actual differences in reading ability or the efficacy of school systems or (ii) differences in how the test measured reading across contexts and languages. Without measurement equivalence, group comparisons can lead to incorrect conclusions and inferences (Chen, 2007). Addressing this issue is essential to build a comprehensive and culturally valid framework of literacy, spanning from the basic process of decoding to the more complicated ones of comprehension and inference making.
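The stakes of a lack of measurement invariance can be illustrated with a toy simulation. The sketch below is a minimal illustration under invented assumptions, not PISA's actual methodology: two groups have identical latent reading ability, but a subset of a 28-item test is harder in one group's (hypothetical) language version, producing a spurious gap in observed scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 28  # examinees per group, items (28 is an illustrative test length)

# Identical latent reading ability distributions in both groups
theta_a = rng.normal(0.0, 1.0, n)
theta_b = rng.normal(0.0, 1.0, n)

# Hypothetical item difficulties; group B sees the "same" test,
# but 8 items are harder in its language version (difficulty shift of +0.8)
b_a = rng.uniform(-1.0, 1.0, k)
b_b = b_a.copy()
b_b[:8] += 0.8

def rasch_scores(theta, b, rng):
    """Simulate dichotomous responses under a Rasch model and sum them."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random((theta.size, b.size)) < p).sum(axis=1)

score_a = rasch_scores(theta_a, b_a, rng)
score_b = rasch_scores(theta_b, b_b, rng)

print(f"latent means:   A={theta_a.mean():+.2f}  B={theta_b.mean():+.2f}")
print(f"observed means: A={score_a.mean():.2f}   B={score_b.mean():.2f}")
```

Although the two groups are equally able by construction, group B's observed mean is lower; a ranking based on observed scores alone would misattribute the gap to ability or schooling rather than to non-equivalent items.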
Recognition of the relationship between literacy achievement and language development warrants the examination of measurement invariance (Asil & Brown, 2016; Oliveri & von Davier, 2011) and of the ecological validity of the measures (Papadopoulos et al., 2021). Measurement equivalence includes construct equivalence, test equivalence, and equivalence of testing conditions (Ercikan & Lyons-Thomas, 2013). Thus, there is a need for language-specific consideration in item construction and selection to ensure comparability. Item-level equivalence has been examined in similar international assessments such as the Trends in International Mathematics and Science Study (TIMSS) (Ercikan & Koh, 2005), and in comparisons of science items between the National Assessment of Educational Progress (of the United States) and PISA (Stephens & Coleman, 2007). In the traditional factor-analytic approach used in such assessments, the goal of the primary level of invariance is to establish that indicators of the construct (that is, reading proficiency) load on the same items across groups. However, item-level and scale-level comparability and consistency are difficult to attain for two main reasons. First, such a configuration requires a compelling theoretical justification that reading ability and its processes and experiences are almost equivalent across countries and languages. Yet this assertion of a universal model of reading remains questionable (Plaut, 2012). Second, evidence of such configural invariance is best generated with multi-method research (Luong & Flake, 2022), which is rarely the norm. If these fundamental measurement issues persist unaddressed, the claims of cross-context literacy achievement or underachievement are less compelling scientifically and less informative for classroom instruction.
There is still major disagreement about the sources of validity evidence required to justify inferences from and uses of test scores (Cizek et al., 2008). This could be because intended score meaning and intended test uses (Cizek, 2012) are incompatible as definitions of validity. To begin with, there are multiple perspectives on validity. This suggests that researchers’ understanding and application of validity are less standardized, perhaps impeding generalizability. The foundational psychometric framework of construct validity proposed by Meehl and colleagues was formally incorporated in the Technical Recommendations of the American Psychological Association (1954). According to Cronbach and Meehl (1955), construct validity is to be treated as an evaluation program with an explicit theoretical definition of the construct reflected in the measures developed and used. This shifted the original emphasis of construct validity on internal consistency (Thurstone, 1952) to validity in terms of interpretations (Cronbach, 1971). The obligations of test developers and users are taken to be inherent in this process (Cronbach, 1988). Messick (1989) proposed a unified view of construct validity with an evidential basis of justified interpretations and uses of test scores and a consequential basis of value implications and societal outcomes. Taking forward the validity-as-argument approach of early proponents such as Cronbach, the perspective of Kane (1992, 2016) has, as discussed in the following paragraph, emerged as the most influential in educational assessment, particularly for large-scale measurements (Chapelle et al., 2010). Using PISA scores, we discuss the challenges of test construction and the validation process of reading as a construct across levels of proficiency and languages.
At the outset, we concede there is no doubt that the OECD has highly trained measurement experts who are involved in the sophisticated methodologies of the PISA (Berliner, 2020; Takayama, 2018).
Reexamining the construct validity basis of the PISA reading measure
The validity evidence of PISA is purportedly informed by the argument-based approach of Kane (OECD, 2018a). The fundamental target of Kane’s approach is to put forth different evidence of validity, alternative interpretations and hypotheses (Kane, 1992) when there are diverse contexts and stakeholders (Kane, 2001). The goal is to accommodate the potential for change in the degree of validity in light of new evidence from different contexts, a likely scenario for international assessments. However, the validation procedure in PISA does not necessarily conform to this intention or to the claims of the argument approach. As noted by Addey et al. (2020), the validity practice of the PISA measure contradicts this approach because, according to Kane (2013, 2016), validity is a framework of justified interpretation and uses of test scores, and not an intrinsic feature of the test or its performance.
Construct validity of the PISA (OECD, 2019b) has been primarily established on the basis of (1) the defined purpose of the instrument, (2) evidence from field trials of items testing the PISA theoretical framework, and (3) adherence to all technical standards throughout the process of assessment. In the ‘National Project Manager Manual’ (OECD, 2019b) of PISA 2021, a similar argument of technical standards as validity evidence is put forward. However, the actual process and the final documentation of validity arguments is a prototype of assembled validity (Addey et al., 2020). This means that the cumulative practice of generating and establishing evidence of validity, from the field trials to the publication of the report, involves constant negotiation. The different stakeholders have varying levels of socio-political power and material resources. Of special relevance is the instance of PISA for Development (PISA-D) for low- and middle-income countries, first launched in 2013 (OECD, 2016a). Addey et al. (2020) distinguished between authorized and unauthorized validity arguments, highlighting that the crucial discussions of PISA-D items have, for the most part, happened not in official meetings but in unofficial conversations. India’s revoking of its decision to participate in PISA is also a relevant instance. Only two states, Tamil Nadu and Himachal Pradesh, participated, an event which the OECD declared did ‘not meet the PISA standards for student sampling’ (Bloem, 2013, p. 21). Accordingly, the results from these data were rejected by the Indian Government because the PISA test was not appropriate for the diverse contexts of the country (Chakraborty et al., 2019). If negotiation fails or new data contest the existing status of the scientific instrument of reading (OECD, 2018b), how does such a singular event inform the attempt to gather new evidence of validity?
The OECD (1999) claims that most PISA items were developed in English for practical reasons. Indeed, the PISA reading test items have two original source versions—English and French. All translations into other languages are based on one of these two versions. Compared to the original English version, the Finnish version was on average 8% longer, the Irish version 11% longer, and the German version 17% longer in one study (Eivers, 2010). Moreover, these translated measures are judged to be more comparable if they share linguistic (Grisay et al., 2009) and geographical (Grisay & Monseur, 2007; Kankaraš & Moors, 2014) proximities with the original source. Of the 30 countries that submitted reading items for PISA 2009, only two are from Asia—Korea and Macao, China (OECD, 2009). Takayama (2018) noted that the Reading Expert Group (REG) of the same year included two representatives from Asia—Japan and Korea. The same nominee from Japan was also part of the PISA 2000 Reading Function Expert Group (OECD, 2001). Though this individual was intended to be the expert from outside Europe and North America, the representative was described as having ‘no expertise in reading at all’ (Takayama, 2018, p. 226). Given such an evidently skewed capture of expertise, arguably reflected in the PISA framework of literacy, relooking at what poor reading and proficient reading actually mean in both a practical and theoretical sense is warranted. If assessment aims to inform educational policy and practice, a dyslexia lens needs to be inherent in this framework. For example, there is no definitive answer as to “What 15-year-old students in Malaysia know and can do” (OECD, 2018c) based on a reading measure unless we have clear evidence as to how emergent word decoding contributed to this knowing and doing.
As mentioned above, the measurement invariance of the PISA test is consistently questioned (see Arffman, 2013). For example, Söyler et al. (2021) found substantial differences in the item thresholds and factor loadings of the PISA 2015 reading test between countries testing native English speakers and those testing non-native speakers. Scores from the first group (Canada, the USA, and the UK) were compared with those of the second (Japan, Thailand, and Turkey); the authors reported that eight of the twenty-eight items had limited invariance. In light of such significant non-invariance, what inference about reading and educational attainment can we make from the international ranks of these countries?
The PISA ranking analysis is based on marginal maximum likelihood (MML) estimates of item and population parameters. This estimation assumes a normal distribution of reading ability as a latent variable; however, the participating countries differ markedly in languages and educational policies (Kreiner & Christensen, 2014). The scaling of PISA items follows the Rasch model, which is the basis of the person parameter estimation. This model assumes that a unidimensional trait gives rise to the scores and, most importantly, that each item has the same difficulty level in every country (Berliner, 2020). Kreiner and Christensen (2014) reported negative estimates of the Rasch parameter for all countries. This indicates that the assumption of a unidimensional trait—the difference between the student’s reading ability and the difficulty of the item—giving rise to the scores requires more scrutiny. Moreover, using the conditional likelihood ratio (CLR) test, they demonstrated that PISA reading item parameters are not the same in all countries.
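For reference, the Rasch model underlying this scaling can be stated compactly. The formulation below is the standard dichotomous Rasch model, not a reproduction of the OECD's exact operational specification:

```latex
% Rasch model: probability that person j answers item i correctly
P(X_{ij} = 1 \mid \theta_j, b_i) \;=\; \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}
```

Here $\theta_j$ is person $j$'s latent reading ability and $b_i$ is the difficulty of item $i$. Cross-country comparability rests on each $b_i$ being identical in every country, which is precisely the assumption that Kreiner and Christensen's CLR test evaluates.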
In addition, the imprimatur of objective assessments purportedly co-constructed with local experts is often used to substantiate the contextual claims of PISA (Lockheed et al., 2015). “It is an international assessment, so we cannot shape it very much” (Gorur et al., 2019, p. 319) was a comment made by an official of the Research and Test Development department of the Zambian PISA-D implementation team. With the primary focus on standardised procedures of test construction and adaptation, the need for contextualization remains an unresolved tension (Gorur et al., 2019). For researchers studying reading development across languages, this can be reframed as two salient challenges. Conceptually, how and when do word reading difficulties converge with comprehension impairments, and does this pattern replicate across languages and socio-cultural and educational contexts? Methodologically, how can we refine psychometric measures to capture the construct comprehensively, so that variation does not deter the larger scientific goal of generalization? We address these by presenting the case of reading development in three countries of Southeast Asia.
Sampling and comparisons from Southeast Asia: reading in multilingual contexts
Indeed, the primary barrier to evaluating the comparability of PISA scores is the difficulty of sampling (Bloem, 2013, 2015), specifically from regions that are allegedly performing at lower levels. Sampling comparability is critical in order to estimate trends in literacy development and impairment over time. To illustrate this issue more concretely, we selected three low-performing countries in Southeast Asia for comparison: Indonesia, Thailand, and Malaysia, countries with populations of roughly 273 million, 70 million, and 32 million, respectively.
Our rationale for selecting these three Southeast Asian countries was threefold. First, they all fall under the middle-income, low-performers group (The World Bank Group, 2019). Second, the measure was administered in one of the regional languages in all three of these countries. This is important because taking a standardized test in one’s native language is presumably helpful. In a meta-analysis, Melby-Lervåg and Lervåg (2014) found that second-language learners exhibited poorer reading and language comprehension skills than first-language learners, with a medium-to-large effect size. In the Philippines, by contrast, the test is administered in English (OECD, 2019c), a language that children all learn in school but one that is likely not the native language of the vast majority of schoolchildren. Third, all three countries had participated in PISA prior to 2018, allowing us to make a trend comparison. Moreover, few studies have been published using PISA scores from Southeast Asia. In fact, most research publications on PISA are of the USA (114), Australia (72), Germany (69), the UK (52), and Ireland (31) (Hopfenbeck et al., 2018).
We examined students’ PISA reading performances in Indonesia, Thailand, and Malaysia (Table 1) over a decade: the scores from 2009, 2012, and 2015 were compared to those of 2018. According to the OECD (2018c), the 2015 PISA scores of Malaysia are internationally incomparable ‘due to the potential of bias introduced by low response rates in the original PISA sample’ (p. 3). Approximately 5,000 students were sampled from each country, allowing aggregate-level comparisons.
Table 1 demonstrates a general trend: all three countries included in the analysis either declined or stayed relatively steady in PISA scores across 10 years. None of them showed substantial improvement. In contrast, for reference, consider a country that has consistently secured high ranks, namely South Korea. The performance of high-achieving students contributed to South Korea's significant 31-point increase in reading between PISA 2000 and PISA 2006 (Schleicher, 2009). New questions arise from this contrast. First, if it is primarily the advanced readers of a country who start scoring higher, can this alone improve the average performance and, hence, the overall rank of that country? Second, how effectively do the PISA items actually assess the low-achieving students, or the poorer readers?
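The first question has a simple arithmetic core. Using hypothetical score values (not actual PISA data), a gain confined to the top decile still lifts the country mean:

```python
# Toy illustration with invented numbers: 100 students, the top 10% are advanced readers.
scores = [420] * 90 + [560] * 10
mean_before = sum(scores) / len(scores)

# Only the top decile improves, by 50 points each; the other 90% are unchanged.
improved = [420] * 90 + [560 + 50] * 10
mean_after = sum(improved) / len(improved)

print(mean_before, mean_after)  # 434.0 439.0: the mean rises 5 points overall
```

A country's average, and hence its rank, can therefore move without any change among the weakest readers, which is why aggregate scores say little about the tail of the distribution.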
PISA reading Level 2 is the baseline level of proficiency. Though it is not a starting point, students at this level can locate multiple pieces of information in a text of moderate length and understand the relationships between them. When there is little or no extraneous information, they can identify the central message relatively easily. Those who score below Level 2 are identified as low achievers on the PISA reading scale (OECD, 2016a). Since the analyses for PISA 2012, performance at Level 2 and below has been further divided into sub-levels (namely, 1a, 1b, 1c, and below).
As shown in Table 2, the majority of students in all three countries are either baseline or poor achievers. That is, the percentages of participants from Indonesia, Malaysia and Thailand who are at Level 2 and below are 91.7%, 77.3% and 85.6%, respectively. This raises the question of how more than half of the students tested in these countries can be poor readers. For reference, the percentages of those in the US, UK, and Australia who fell into the category of Level 2 and below were 40.31%, 40% and 40.7%, respectively.
Is it possible that the experience of PISA might differ across regions? Do items differ strongly based on language and/or educational practices and, hence, influence performance? In Asia, it is likely that the PISA measures are generally less comparable across linguistic and cultural dimensions (Grisay & Monseur, 2007; Grisay et al., 2009). One of the cases of ‘PISA shock’ (Santos & Centeno, 2021) in Asia was Japan's PISA rank dropping from 8th in 2000 to 14th in 2003 (Takayama, 2008). Japanese students omitted 9% of the items in the PISA 2009 reading measure (Okumura, 2014). This omission tendency was larger for the open-ended items than the closed ones, supposedly because of a lack of experience in generating sentences. As a concrete example of this phenomenon elsewhere, Hatzinikita et al. (2008) reported discrepancies between the language used in Greek textbooks and that of PISA items: the former uses specialized scientific textual content whereas the linguistic mode of the latter is nonspecialized narratives focusing on scientific method. The authors hypothesized that years of experience shaped by the textbook standard influenced the reading and inferences students make. Based on an analysis of data from English-, French-, and German-speaking countries, Blum et al. (2001) argued that the comparability claim is falsified by the significant association of item success rate with the language used.
Moreover, multilingualism is the norm for most people across the globe. Consequently, perhaps the majority of children worldwide learn to read and write in a language different from the one spoken in their home environment (e.g., McBride, 2016). As mentioned above, dyslexia is indeed a cross-cultural phenomenon that occurs in all languages and scripts (Shaywitz et al., 2008). However, difficulties in reading are also likely to be more pronounced in the context of diglossia, defined here as the linguistic distance between the language students learn to speak and the written form they are exposed to in literacy instruction (see Saiegh-Haddad et al., 2022). This is one reason that Vagh and Nag (2019) argued that Generalizability Theory and Item Response Theory are inadequate for international usage and for Akshara languages. For example, for about 80 percent of school students in Indonesia, Bahasa Indonesia is their second or third language (Elley, 1992). However, PISA testing in Indonesia consistently takes place in Bahasa Indonesia. Previous research has demonstrated that children’s literacy learning may be impeded when the home and school languages differ, particularly in low- and middle-income countries (Nag et al., 2019). It is not uncommon that children living in Asia, particularly in Indonesia and Malaysia, might not have the opportunity to choose a school language (i.e., medium of instruction) that best reflects their home language (see McBride et al., 2022). Furthermore, even in cases in which such children’s home language is indeed similar to the school language, namely Malay, the variant of Malay to which they are exposed may be quite different. For example, Sneddon (2003) stated that most, if not all, Indonesian children are only exposed to the ‘informal’ variant of Malay (i.e., Bahasa Indonesia); only via schooling are they exposed to the ‘formal’ variant.
Previous studies among young children in Singapore have shown that, because the formal and informal variants of Malay (i.e., Bahasa Melayu) differ in grammar, vocabulary, and phonology, exposure to the nonstandard ‘informal’ variant leads to difficulties and challenges in learning standardized written Malay (Jalil & Liow, 2008; but see Habib et al., 2022 for different results); such difficulties also tend to affect children's acquisition of literacy skills. The topic of diglossia in Malay warrants more research across Malay-speaking countries.
Diglossia is common in various parts of the world. The distance between spoken Arabic and Modern Standard Arabic is one of the most prevalent instances (Maamouri, 1998; Saiegh-Haddad, 2003). The PISA-D report on Senegal estimated that for about 93.7% of grade 7 students, French, their medium of instruction in school, is not their home language (OECD, 2017, p. 49). The most recent 2020 PISA-D report noted that only 28% of the respondents in Senegal speak French in their home environment. Likewise, only 17% of the participants in Paraguay reported speaking Spanish at home (OECD, 2020, p. 11). In Zambia, about 83% of students have a home language different from the language of instruction and of the PISA items, and about 80% of students score below Level 1a (The Ministry of General Education, Zambia, 2017). According to the PISA reading proficiency criteria, this means that these readers struggle to understand the literal meaning of a short and simple text and to identify the explicit relevant information or the purpose of the passages (OECD, 2018d). In all these countries, linguistic distance is a major hindrance to overall school performance, contributing to outcomes such as grade repetition (Delprato, 2021).
In the 2018 PISA data for Thailand, students' and schools’ economic, social, and cultural status (ESCS) together explained 37.7% of the variance in reading scores within schools (OECD, 2019d). Using this same index and PISA scores from 2003 to 2018, Lam and Zhou (2021) showed that the high-performing education systems in East Asia also have consistent and significant socio-economic achievement gaps. This disparity is smallest in Macao, where academic achievement and opportunities do not differ much for students across socio-economic strata.
How reliably can reading proficiency be compared across such diverse linguistic and cultural contexts? Perhaps identifying struggling readers can be an equitable and effective starting point. To reiterate, similar to the challenge of construct equivalence in PISA, reading impairments can manifest differently in different languages. Lopes et al. (2020) analyzed 800 studies of dyslexia undertaken over the past two decades and found that clear criteria for participant recruitment were rarely made explicit. The norm of requiring an IQ–reading discrepancy still persists (Tzouriadou, 2022), despite substantial evidence that this discrepancy requirement is not helpful for understanding persistent reading difficulties (Siegel, 1989). Elliott and Grigorenko (2014) made a provocative plea to replace the term dyslexia with reading disability. Amidst the debates, differences, and controversies, the term ‘dyslexia’ is here to stay for the foreseeable future (Elliott, 2020). If it is true that scientific knowledge generation and verification struggle to keep pace with colloquial parlance and societal discussion, what then should the role of scientists in society be? Perhaps recognizing the diversity of languages and learning contexts can be the central tenet of both science and policy.
Asia has many languages and scripts. Each country faces unique challenges of educational curricula requiring mastery of multiple languages and scripts, often dissimilar to the ones spoken in students’ home environments. Dyslexia is described and diagnosed differently across the globe, but the typical consensus is of impairment at the level of word reading. However, most international estimates of literacy assess reading and language comprehension, not word reading. Given the technical conditions and contextual challenges discussed hitherto, what can international large-scale literacy assessments teach us about dyslexia, and what are their perils and prospects? A preliminary step could be to revisit the global perspective on literacy and refine the understanding of the association between early word reading and comprehension.
How is word reading related to reading comprehension across languages and scripts?
One fundamental question that remains critical is the extent to which word reading involves the same processes across languages and scripts. This issue of “universals” (Frost, 2012) and “specifics” of reading (Plaut, 2012) has centered on word reading. It is all the more challenging to extend what is known about dyslexia to reading comprehension difficulties. Moreover, how these early processes relate to PISA scores is unknown. Previous work (e.g., Daniels & Share, 2018; McBride & Mo, 2021) has underscored sources of variability in word reading and word writing across cultures. Differences across phonological, semantic, and visuo-orthographic aspects of print may or may not influence reading speed and accuracy. A consideration of these may optimize construct validity across cultures. Treating and reporting construct validity explicitly and accurately can facilitate the comparability of scores across contexts.
Some unique aspects of reading are best illustrated by Thai. In Thai, both early and skilled readers frequently use syllable segmentation strategies and tone markers as salient cues. This is a critical language-specific aspect embedded within the instructional context: the formal teaching method for Thai primarily focuses on teaching children correspondences between whole spoken and written syllables rather than grapheme–phoneme correspondences (Winskel, 2013).
At the semantic level, Thai often involves substantial compounding. Thai also involves considerable top-down processing to disambiguate ambiguous phrases and sentences; more flexibility in processing appears to be required for Thai print than for most other scripts and languages (e.g., Aroonmanakun, 2002). At the level of visuo-orthographic processing, most students’ literacy instruction in school involves reading and spelling similar words (Winskel & Ratitamkul, 2019). Use of a lexical strategy is inferred from the higher occurrence of lexical errors in reading both words and nonwords. Early Thai readers tend to segment monosyllables inaccurately (Winskel & Iemwanthong, 2010). Text is also typically presented unspaced, potentially slowing down reading in children (e.g., Kohsom & Gobet, 1997; Winskel et al., 2009). Unspaced Thai texts, such as those used for the items presented in PISA, can result in lower scores due to slower processing, longer reading times, and more errors. If most cognitive resources (Ehri, 2005) are spent on reading unspaced words and lexical quality remains low, reading comprehension can be severely impaired. Conversely, if better word processing facilitates the formation of compounds, a crucial characteristic of Thai morphology, students’ reading comprehension may also be enhanced.
What matters? The extent to which PISA reading performance is affected by the variables mentioned above is not yet known. For example, it is difficult to judge with certainty whether and how diglossia in Malay might affect reading comprehension. Equally, we do not know whether the several aspects of Thai text reading that differ substantially from English influence the speed of reading comprehension. Our focus here is primarily to point out what could matter for reading performance and to stimulate further research on these phenomena, particularly vis-à-vis the issue of poor reading performance, including dyslexia.
Given these illustrations, we have three main suggestions for enhancing construct validity in future PISA and related work focused on identifying proportions of poor readers across regions. These are informed by lessons from diverse fields of inquiry, since global dyslexia discourse necessitates interdisciplinary perspectives. Petscher et al. (2020) proposed characteristics of “team translational science”, which they consider a roadmap for reading research (Solari et al., 2020). Inspired by this, we propose recommendations that correspond to the aforementioned challenges. They should be considered coexisting, interdependent components rather than disparate, independent elements.
Suggestion 1: including a dyslexia lens in rethinking the international literacy framework for educational policy consequences
What might large-scale literacy assessments look like if they were conceptualized and designed to identify poor decoders rather than proficient comprehenders? The explicit assumption here is that reading proficiency is continuously distributed: most students are neither dyslexic nor highly advanced but, by definition, in the middle of the distribution of all readers. Given this, a potential consideration is to have well-defined criteria for functional illiteracy (Vágvölgyi et al., 2021), categorically distinct from both developmental dyslexia and skilled reading. One might argue that the same measure that reports proficient readers in some languages and countries disproportionately detects poor readers in others.
If improving reading proficiency is the target and identifying struggling readers is the starting point, then the formative question to ask is “What is the purpose of assessment?” We emphasize that the question “How does this facilitate a student?” should be the consistent underlying theme for literacy stakeholders across all levels of decision making and action. This is also echoed by Winograd et al. (1991), who recommend that the goals of assessment be explicitly and directly linked to interpretation for instruction. Applications for instruction can and should be a consistent anchor for all stages of dyslexia research design and implementation.
How can international large-scale assessment of reading and local small-scale identification of, and intervention for, reading disability inform each other? This question is particularly important in countries in which the research and development system for dyslexia is negligible (Mather et al., 2020). The “science of reading” also needs to be understood within the context of the sociology of education and cultural anthropology. A truly international approach to dyslexia and broader literacy should also account for cultural practices and collective attitudes towards learning. The evidence of literacy without schooling (Scribner & Cole, 1978) alerted the scientific community to the importance of re-examining our conventional understanding of cognitive development and literacy. There are both major commonalities in reading across scripts and major cultural variations across the world; the two need not be antithetical.
Suggestion 2: methodological rigor for educational equity
Methodological rigor entails measure construction, sample selection, and data collection and reporting. At the outset, the equivalence of the construct and the measure across languages must be established (Papadopoulos et al., 2021). Before the initial phase of item construction and selection, a team of experts including researchers and teachers from the region should illustrate the linguistic distinctiveness of reading a particular script and how this poses challenges within their educational context (for a review, see Daniels & Share, 2018). Formal documentation and presentation of the arguments and evidence should be published, beyond a mere claim of consultation. Variation within the standardization, accounting for the correspondence between students’ background knowledge and their experiences with assessment (Snyder et al., 2005), might improve comparability. For instance, for Thailand, one might include a statement of how items assessing information retrieval in Thai could be weighted differently than for languages/scripts with spaced words, since time and errors can be influenced by spacing.
A potential solution for measurement invariance could be the alignment method (Asparouhov & Muthén, 2014), which treats invariance as an optimization problem. The primary assumption of this method is that most items are approximately or partially invariant, and the goal is to minimize non-invariance by reducing the differences between factor loadings and item intercepts across groups. Used as an exploratory procedure to identify non-invariant items, the alignment method is perhaps best suited for group comparisons of latent mean scores (Luong & Flake, 2022). Suggested cutoffs include no more than 25% (Asparouhov & Muthén, 2014) or 29% (Luong & Flake, 2022) non-invariant parameters. The method allows researchers to consider the extent to which the non-invariance is theoretically and practically significant. For example, does the non-invariance tell us how the distinctive challenges of students with difficulties in reading Thai differ from those of students reading Bahasa Indonesia? This decision about the meaningful interpretation of differences at the factor and item levels can be made in the construction and design phase, prior to analysis. Running simulations across levels of measurement invariance with specific features of different groups could help in making a more accurate decision about the continuity of reading from decoding and fluency to text comprehension.
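To make the optimization at the heart of alignment concrete, the following is a minimal toy sketch, not a production implementation (alignment is typically run in Mplus or the R package sirt). It assumes a one-factor model, equal group sizes (so the sample-size weights of the original method are omitted), and configural estimates already in hand; the function names are ours, and the simplicity loss follows Asparouhov and Muthén (2014).

```python
import numpy as np
from scipy.optimize import minimize


def component_loss(x, eps=0.01):
    # Simplicity function from Asparouhov & Muthen (2014): favors solutions
    # with many near-zero differences over a few moderate ones.
    return np.sqrt(np.sqrt(x ** 2 + eps))


def align(loadings, intercepts):
    """Toy alignment of configural one-factor CFA solutions across groups.

    loadings, intercepts: (G, J) arrays from a configural model in which
    every group's factor mean is fixed at 0 and variance at 1. Group 0 is
    the reference (alpha = 0, psi = 1). Returns per-group factor means
    (alpha) and variances (psi) that minimize the total simplicity loss.
    """
    G, _ = loadings.shape

    def total_loss(params):
        alpha = np.concatenate(([0.0], params[: G - 1]))
        psi = np.concatenate(([1.0], np.exp(params[G - 1:])))  # variances > 0
        lam = loadings / np.sqrt(psi)[:, None]       # aligned loadings
        nu = intercepts - alpha[:, None] * lam       # aligned intercepts
        loss = 0.0
        for g1 in range(G):                          # sum over group pairs
            for g2 in range(g1 + 1, G):
                loss += component_loss(lam[g1] - lam[g2]).sum()
                loss += component_loss(nu[g1] - nu[g2]).sum()
        return loss

    res = minimize(total_loss, np.zeros(2 * (G - 1)), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-10, "maxiter": 50000})
    alpha = np.concatenate(([0.0], res.x[: G - 1]))
    psi = np.concatenate(([1.0], np.exp(res.x[G - 1:])))
    return alpha, psi
```

With fully invariant measurement parameters, the loss can be driven to its floor only at the true group means and variances, so the sketch recovers them; with partial non-invariance, the loss surface illustrates the trade-off the method negotiates between "many small" and "few large" differences.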
Contrary to the ‘large-scale’ norm, Wagner (2011) proposed an alternative Smaller, Quicker, Cheaper (SQC) approach to literacy assessment. Informed by it, we suggest small but strategic sampling, going beyond the norm of studying easily accessible groups, together with more frequent intervals of assessment. The former can be a means of including students and languages that have historically been excluded from any formal system of research and intervention; the latter might enable capturing at-risk students without further delay. Wagner also suggests shareability to equitably negotiate the conflict between the etic objective (international comparison) and the emic approach (local contexts). This pertains to transparency in methods and measures development, ensuring replicability by the sites concerned without perpetual dependence on external experts.
For PISA sampling, a question that has not received enough scrutiny is “Where are the 15-year-olds?” (OECD, 2016b). Whom we study informs what is reported as the norm and the outliers. Barrett (2020) calls for the new wave of cross-cultural cognitive science to adopt ‘principled sampling of people and phenomena’ (p. 683), using hypothesis-driven sampling and representative sampling. Most international assessments explicitly aim for the latter; however, comparability would be redundant if there were no a priori hypotheses of variation. If we hypothesize, for example, that reading in English and Thai shares the influence of phonemic awareness on early word reading but differs in the influence of morphological knowledge on comprehension, then the test items should reflect this accordingly. An often-neglected fact is that there is a large middle ground between proficient readers and dyslexic readers. For measures to capture the expected similarity and/or variability, we suggest principled sampling. A common data collection and management framework mandating transparency in the psychometric properties of measures (Flake, 2021) and the statistical analyses of data is called for, to monitor progress and evaluate the effectiveness of an international assessment project.
Of particular relevance now is the feasibility of online data collection, a need accentuated by the Covid-19 pandemic. It can be harnessed to reach hitherto seldom-included students from marginalized communities. At the same time, it can become yet another double-edged sword, because internet and technology access is still a function of socio-economic resources in most countries. A way forward can be training and incentivizing researchers and other stakeholders from these settings to collect and share their data and findings in a crowd-sourced database such as the recently launched Global Literacy Assessment Dashboard (GLAD) (Patel, 2021). Since the GLAD is in its initial stage, it can be a useful platform from which to adapt lessons from similar global initiatives and also to bridge the chasm between literacy and dyslexia research.
Suggestion 3: communication and collaboration for accurate interpretation and effective implementation
In interpreting reading scores from a single country, as well as comparing similarities or differences between countries, we make generalizations from the sample to the population. The generalizability challenge is the “horse before the cart” of valid cross-country comparisons. Generalizability extends beyond population differences or the statistical analyses of relationships; an essential strategy is to reconsider the causal relationships between variables and the exact mechanisms by which countries and/or students differ. Taking a lesson from the proposal of “Constraints on generality” (Simons et al., 2017) for empirical research, demonstrated in a cross-cultural study by Tiokhin et al. (2019), international literacy reports can include similar statements. For example, PISA introduced a fluency test, not included in the cumulative reading scores, purportedly to account for readers at a lower proficiency level. For this test, it is advisable to highlight that for certain languages and countries the scores may vary significantly if interpreted within the word-level items only. Such an attempt to improve validity and generalizability amidst the variation across contexts can also be valuable for dyslexia discourse across cultures.
‘Different countries, different evidence?’, asked Strassheim and Kettunen (2014) in light of international comparisons for science-informed policy. It is incumbent on us not to let this remain an ostensibly rhetorical remark when discussing literacy achievements and impairments globally. For evidence-based dyslexia research to mature into effective, implementable programs, we recommend instruction-focused and culturally grounded institutionalized practices, from conceptualization to communication.
Conclusion
The multiplicity of problems and contexts need not be a deterrent to refining the construct of reading. We demonstrated this using the PISA validity framework, including its procedures and evidence, with relevant examples of multilingual contexts of literacy development. A global scientific and educational literacy movement is a worthwhile collective goal. The primary step could be the re-examination of the theoretical and operational definitions of reading abilities and disabilities that inform international assessments. Further, research and implementation systems should be explicitly designed and mandated to promote transparent collaboration, equity, and effectiveness. With common goals and strategies for enhanced literacy and learning, global networks with local partnerships can be tenable and mutually beneficial. The suggestions made here, if optimally adapted, can enhance the accuracy and validity of assessments of global literacy achievements. Thus, concerted institutional initiatives can enable the capture of a clearer and more granular picture of how students in the world read.
References
Addey, C., Maddox, B., & Zumbo, B. D. (2020). Assembled validity: Rethinking Kane’s argument-based approach in the context of International Large-Scale Assessments (ILSAs). Assessment in Education: Principles, Policy & Practice, 27(6), 588–606. https://doi.org/10.1080/0969594X.2020.1843136
American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51(2, Pt.2), 1–38. https://doi.org/10.1037/h0053479
Arffman, I. (2013). Problems and issues in translating international educational achievement tests. Educational Measurement: Issues and Practice, 32(2), 2–14. https://doi.org/10.1111/emip.12007
Aroonmanakun, W. (2002). Collocation and Thai word segmentation. In T. Theeramunkong & V. Sornlertlamvanich (Eds.), Proceedings of the fifth symposium on natural language processing and the fifth oriental COCOSDA (International Committee for the Coordination and Standardization of Speech Databases and Assessment Techniques) workshop (pp. 68–75). Sirindhorn International Institute of Technology.
Asil, M., & Brown, G. T. L. (2016). Comparing OECD PISA reading in English to other languages: Identifying potential sources of non-invariance. International Journal of Testing, 16(1), 71–93. https://doi.org/10.1080/15305058.2015.1064431
Asparouhov, T., & Muthén, B. (2014). Multiple-group factor analysis alignment. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 495–508. https://doi.org/10.1080/10705511.2014.919210
Barrett, H. C. (2020). Towards a cognitive science of the human: Cross-cultural approaches and their urgency. Trends in Cognitive Sciences, 24(8), 620–638. https://doi.org/10.1016/j.tics.2020.05.007
Berliner, D. C. (2020). The implications of understanding that PISA is simply another standardized achievement test. In G. Fan & T. Popkewitz (Eds.), Handbook of education policy studies. Springer. https://doi.org/10.1007/978-981-13-8343-4_13
Bloem, S. (2013). PISA in low and middle income countries. OECD Education Working Papers, 93. doi: https://doi.org/10.1787/5k41tm2gx2vd-en
Bloem, S. (2015). PISA for low- and middle-income countries. Compare: A Journal of Comparative and International Education, 45(3), 481–486. https://doi.org/10.1080/03057925.2015.1027513
Blum, A., Goldstein, H., & Guérin-Pace, F. (2001). International Adult Literacy Survey (IALS): An analysis of international comparisons of adult literacy. Assessment in Education Principles Policy and Practice, 8(2), 225–246. https://doi.org/10.1080/09695940123977
Chakraborty, S., Elde Mølstad, C., Feng, J., & Pettersson, D. (2019). The reception of large-scale assessments in China and India. In New practices of comparison, quantification and expertise in education: Conducting empirically based research. Routledge.
Chan, D. W., Ho, C.S.-H., Tsang, S.-M., Lee, S.-H., & Chung, K. K. H. (2007). Prevalence, gender ratio and gender differences in reading-related cognitive abilities among Chinese children with dyslexia in Hong Kong. Educational Studies, 33(2), 249–265. https://doi.org/10.1080/03055690601068535
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13. https://doi.org/10.1111/j.1745-3992.2009.00165.x
Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834
Chiu, M. M., & McBride-Chang, C. (2006). Gender, context, and reading: A comparison of students in 43 countries. Scientific Studies of Reading, 10(4), 331–362. https://doi.org/10.1207/s1532799xssr1004_1
Chiu, M. M., & McBride-Chang, C. (2010). Family and reading in 41 countries: Differences across cultures and students. Scientific Studies of Reading, 14(6), 514–543. https://doi.org/10.1080/10888431003623520
Cizek, G. J. (2012). Defining and distinguishing validity: Interpretations of score meaning and justifications of test use. Psychological Methods, 17(1), 31–43. https://doi.org/10.1037/a0026975
Cizek, G. J., Rosenberg, S. L., & Koons, H. H. (2008). Sources of validity evidence for educational and psychological tests. Educational and Psychological Measurement, 68(3), 397–412. https://doi.org/10.1177/0013164407310130
Cronbach, L. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 443–507). American Council on Education.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3–17). Lawrence Erlbaum Associates Inc.
Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302. https://doi.org/10.1037/h0040957
Daniels, P. T., & Share, D. L. (2018). Writing system variation and its consequences for reading and dyslexia. Scientific Studies of Reading, 22(1), 101–116. https://doi.org/10.1080/10888438.2017.1379082
Delprato, M. (2021). Indigenous learning gaps and home language instruction: New evidence from PISA-D. International Journal of Educational Research, 109, 101800. https://doi.org/10.1016/j.ijer.2021.101800
Ehri, L. C. (2005). Development of sight word reading: Phases and findings. In M. J. Snowling & C. Hulme (Eds.), The science of reading: A handbook (pp. 135–154). Blackwell Publishing. https://doi.org/10.1002/9780470757642.ch8
Eivers, E. (2010). PISA: Issues in implementation and interpretation. The Irish Journal of Education, 38, 94–118. http://www.jstor.org/stable/20789130. Accessed 10 Feb 2022
Elley, W. (1992). How in the world do students read? The IEA study of reading literacy. International Association for the Evaluation of Educational Achievement.
Elliott, J. G. (2020). It’s time to be scientific about dyslexia. Reading Research Quarterly, 55(S1), 61–75. https://doi.org/10.1002/rrq.333
Elliott, J. G., & Grigorenko, E. L. (2014). The dyslexia debate. Cambridge University Press.
Ercikan, K., & Koh, K. (2005). Examining the construct comparability of the English and French versions of TIMSS. International Journal of Testing, 5(1), 23–35. https://doi.org/10.1207/s15327574ijt0501_3
Ercikan, K., & Lyons-Thomas, J. (2013). Adapting tests for use in other languages and cultures. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I.C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology, Vol. 3. Testing and assessment in school psychology and education (pp. 545–569). American Psychological Association. https://doi.org/10.1037/14049-026
Flake, J. K. (2021). Strengthening the foundation of educational psychology by integrating construct validation into open science reform. Educational Psychologist, 56(2), 132–141. https://doi.org/10.1080/00461520.2021.1898962
Fluss, J., Ziegler, J., Ecalle, J., Magnan, A., Warszawski, J., Ducot, B., Richard, G., & Billard, C. (2008). Prevalence of reading disabilities in early elementary school: Impact of socioeconomic environment on reading development in 3 different educational zones. Archives of Pediatrics, 15(6), 1049–1057. https://doi.org/10.1016/j.arcped.2008.02.012
Frost, R. (2012). Towards a universal model of reading. Behavioral and Brain Sciences, 35(5), 263–279. https://doi.org/10.1017/S0140525X11001841
Gorur, R., Sørensen, E., & Maddox, B. (2019). Standardizing the context and contextualizing the standard: Translating PISA into PISA-D. In M. Prutsch (Ed.), Science, numbers and politics. Palgrave-Macmillan. https://doi.org/10.1007/978-3-030-11208-0_14
Grisay, A., Gonzalez, E., & Monseur, C. (2009). Equivalence of item difficulties across national versions of the PIRLS and PISA reading assessments. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 63–83.
Grisay, A., & Monseur, C. (2007). Measuring the equivalence of item difficulty in the various versions of an international test. Studies in Educational Evaluation, 33(1), 69–86. https://doi.org/10.1016/j.stueduc.2007.01.006
Gu, H., Hou, F., Liu, L., Luo, X., Nkomola, P. D., Xie, X., Li, X., & Song, R. (2018). Genetic variants in the CNTNAP2 gene are associated with gender differences among dyslexic children in China. eBioMedicine, 34, 165–170. https://doi.org/10.1016/j.ebiom.2018.07.007
Habib, M., Arshad, N. A., & O’Brien, B. A. (2022). Acquiring literacy in the diglossic contexts of Malay and Tamil in Singapore: Problems and prospects in early childhood classrooms. In E. Saiegh-Haddad, L. Laks, & C. McBride (Eds.), Handbook of literacy in diglossia and in dialectal contexts (pp. 273–301). Springer. https://doi.org/10.1007/978-3-030-80072-7_13
Hatzinikita, V., Dimopoulous, K., & Christidou, V. (2008). PISA test items and school textbooks related to science: A textual comparison. Science Education, 92(4), 664–687. https://doi.org/10.1002/sce.20256
Hopfenbeck, T. N., Lenkeit, J., El Masri, Y., Cantrell, K., Ryan, J., & Baird, J. (2018). Lessons learned from PISA: A systematic review of peer-reviewed articles on the programme for international student assessment. Scandinavian Journal of Educational Research, 62(3), 333–353. https://doi.org/10.1080/00313831.2016.1258726
Huang, X., Wilson, M., & Wang, L. (2016). Exploring plausible causes of differential item functioning in the PISA science assessment: Language, curriculum or culture. Educational Psychology, 36(2), 378–390. https://doi.org/10.1080/01443410.2014.946890
International Dyslexia Association. (2016). How widespread is dyslexia. https://dyslexiaida.org/how-widespread-is-dyslexia/. Accessed 10 Feb 2022
Jalil, S. B., & Liow, S. J. R. (2008). How does home language influence early spellings? Phonologically plausible errors of diglossic Malay children. Applied Psycholinguistics, 29(4), 535–552. https://doi.org/10.1017/S0142716408080235
Jiménez, J. E., de la Cadena, C. G., Siegel, L. S., O’Shanahan, I., García, E., & Rodríguez, C. (2011). Gender ratio and cognitive profiles in dyslexia: A cross-national study. Reading and Writing, 24(7), 729–747. https://doi.org/10.1007/s11145-009-9222-6
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527–535. https://doi.org/10.1037/0033-2909.112.3.527
Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319–342. https://doi.org/10.1111/j.1745-3984.2001.tb01130.x
Kane, M. T. (2013). Validating the interpretation and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. https://doi.org/10.1111/jedm.12000
Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
Kankaraš, M., & Moors, G. B. D. (2014). Analysis of cross-cultural comparability of PISA 2009 scores. Journal of Cross-Cultural Psychology, 45(3), 381–399. https://doi.org/10.1177/0022022113511297
Kohsom, C., & Gobet, F. (1997). Adding spaces to Thai and English: Effects on reading. In Proceedings of the 19th Annual Meeting of the Cognitive Science Society (pp. 388–393). Erlbaum. bura.brunel.ac.uk/bitstream/2438/2122/1/Cogsci%2797-thai.pdf. Accessed 20 Feb 2022
Kreiner, S., & Christensen, K. B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79, 210–231. https://doi.org/10.1007/s11336-013-9347-z
Lam, S. M., & Zhou, Y. (2021). SES-achievement gaps in East Asia: Evidence from PISA 2003–2018. The Asia-Pacific Education Researcher. https://doi.org/10.1007/s40299-021-00620-7
Lockheed, M., Prokic-Bruer, T., & Shadrova, A. (2015). The experience of middle-income countries participating in PISA 2000–2015. PISA, The World Bank, Washington, D.C./OECD Publishing. https://doi.org/10.1787/9789264246195-en
Lopes, J. A., Gomes, C., Oliveira, C. R., & Elliott, J. G. (2020). Research studies on dyslexia: Participant inclusion and exclusion criteria. European Journal of Special Needs Education, 35(5), 587–602. https://doi.org/10.1080/08856257.2020.1732108
Luong, R., & Flake, J. K. (2022). Measurement invariance testing using confirmatory factor analysis and alignment optimization: A tutorial for transparent analysis planning and reporting. Psychological Methods. https://doi.org/10.1037/met0000441
Maamouri, M. (1998). Language education and human development: Arabic diglossia and its impact on the quality of education in the Arab region. Washington, DC: World Bank, Mediterranean Development Forum. https://files.eric.ed.gov/fulltext/ED456669.pdf. Accessed 10 Feb 2022
Mather, N., White, J., & Youman, M. (2020). Dyslexia around the world: A snapshot. Learning Disabilities, 25(1), 1–17. https://doi.org/10.18666/LDMJ-2020-V25-I1-9552
McBride, C. A. (2016). Is Chinese special? Four aspects of Chinese literacy acquisition that might distinguish learning Chinese from learning alphabetic orthographies. Educational Psychology Review, 28(3), 523–549. https://doi.org/10.1007/s10648-015-9318-2
McBride, C. (2019). Coping with dyslexia, dysgraphia and ADHD: A global perspective. Routledge/Taylor & Francis Group.
McBride, C., Inoue, T., Cheah, Z. R. E., & Pamei, G. (2022). Dyslexia in Asia. In G. Elbeheri & S. Lee (Eds.), The Routledge international handbook of dyslexia in education. Routledge. https://doi.org/10.4324/9781003162520-47
McBride, C., & Mo, J. (2021). Tower of Babel? Literacy development and impairment across cultures. In M. J. Gelfand, C.-y Chiu, & Y.-y Hong (Eds.), Handbook of advances in culture and psychology (pp. 120–162). Oxford University Press. https://doi.org/10.1093/oso/9780190079741.003.0003
McQueen, J., & Mendelovits, J. (2003). PISA reading: Cultural equivalence in a cross-cultural study. Language Testing, 20(2), 208–224. https://doi.org/10.1191/0265532203lt253oa
Melby-Lervåg, M., & Lervåg, A. (2014). Reading comprehension and its underlying components in second-language learners: A meta-analysis of studies comparing first-and second-language learners. Psychological Bulletin, 140(2), 409–433. https://doi.org/10.1037/a0033890
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). Macmillan Publishing Co Inc, American Council on Education.
Miles, T. R., Haslum, M. N., & Wheeler, T. J. (1998). Gender ratio in dyslexia. Annals of Dyslexia, 48, 27–55. https://doi.org/10.1007/s11881-998-0003-8
Nag, S., Vagh, S. B., Dulay, K. M., & Snowling, M. J. (2019). Home language, school language and children’s literacy attainments: A systematic review of evidence from low-and middle-income countries. Review of Education, 7(1), 91–150. https://doi.org/10.1002/rev3.3130
OECD. (1999). Translation of test instruments and survey material. In: PISA 2000 Field trial national project manager’s manual. Australian Council for Educational Research, pp. 21–54.
OECD. (2001). Knowledge and skills for life: First results from the OECD Programme for International Student Assessment (PISA) 2000. www.oecd-ilibrary.org/docserver/9789264195905-en.pdf?expires=1663502886&id=id&accname=ocid177302&checksum=3C5C4C9BA7D08D15DBB296F04311624F. Accessed 10 Feb 2022
OECD. (2007). Item submission guidelines for reading for PISA 2009. Paris: Author. www.acer.edu.au/files/itemsubguide_rd_pisa09_1.pdf. Accessed 10 Feb 2022
OECD. (2016a). PISA for development (Brochure). www.oecd.org/pisa/pisa-for-development/PISA-D_brochure_2016a_ENG.pdf. Accessed 10 Feb 2022
OECD. (2016b). Education in China: A snapshot. https://www.oecd.org/china/Education-in-China-a-snapshot.pdf. Accessed 10 Feb 2022
OECD. (2017). PISA for Development: Senegal national report. www.oecd.org/pisa/pisa-for-development/Senegal_PISA_D_national_report.pdf
OECD. (2018a). ‘PISA for development construct validity’, (PISA-D Policy Brief, number 24). www.oecd.org/pisa/pisa-for-development/24-PISA-D-validity.pdf. Accessed 10 Feb 2022
OECD. (2018b). PISA Technical Report: Proficiency Scale Construction. www.oecd.org/pisa/data/pisa2018technicalreport/PISA2018b%20TecReport-Ch-15-Proficiency-Scales.pdf. Accessed 10 Feb 2022
OECD. (2018c). Programme for International Student Assessment (PISA) Results from PISA www.oecd.org/pisa/publications/PISA2018c_CN_MYS.pdf. Accessed 10 Feb 2022
OECD. (2018d). PISA 2018 reading literacy framework. www.educacionyfp.gob.es/dam/jcr:49ede102-244b-4acb-b28e-a7978d9883ed/ReadingFramework.pdf
OECD (2019a). PISA 2018 Results (Volume I): What students know and can do. doi: https://doi.org/10.1787/5f07c754-en
OECD (2019b). PISA national project manager manual. www.oecd.org/pisa/pisaproducts/PISA-2022-National-Project-Manager-NPM-Manual.pdf. Accessed 10 Feb 2022
OECD (2019c). Philippines - Country note - PISA 2018 results. www.oecd.org/pisa/publications/PISA2018_CN_PHL.pdf. Accessed 10 Feb 2022
OECD (2019d). Thailand - Country note - PISA 2018 results. www.oecd.org/pisa/publications/PISA2018_CN_THA.pdf. Accessed 10 Feb 2022
OECD. (2020). The Programme for International Student Assessment For Development (PISA-D). Out-of-school-assessment Results in Focus www.oecd-ilibrary.org/docserver/491fb74a-en.pdf?expires=1663208145&id=id&accname=guest&checksum=216201D3848975E2C491372EF1BE2BCE. Accessed 10 Feb 2022
Okumura, T. (2014). Empirical differences in omission tendency and reading ability in PISA: An application of tree-based item response models. Educational and Psychological Measurement, 74(4), 611–626. https://doi.org/10.1177/0013164413516976
Oliveri, M. E., & von Davier, M. (2011). Investigation of model fit and score scale comparability in international assessments. Psychological Test and Assessment Modeling, 53(3), 315–333.
Padilla, J. L., & Benítez, I. (2014). Validity evidence based on response processes. Psicothema, 26(1), 136–144. https://doi.org/10.7334/psicothema2013.259
Papadopoulos, T. C., Csépe, V., Aro, M., Caravolas, M., Diakidoy, I.-A., & Olive, T. (2021). Methodological issues in literacy research across languages: Evidence from alphabetic orthographies. Reading Research Quarterly, 56(S1), 351–370. https://doi.org/10.1002/rrq.407
Patel, S. (2021). The global literacy assessment dashboard. Results for Development (R4D). r4d.org/resources/global-literacy-assessment-dashboard-glad/. Accessed 15 Mar 2022
Peterson, R. L., & Pennington, B. F. (2012). Developmental dyslexia. The Lancet, 379(9830), 1997–2007. https://doi.org/10.1016/S0140-6736(12)60198-6
Petscher, Y., Terry, N. P., Gaab, N., & Hart, S. A. (2020). Widening the lens of translational science through team science. https://doi.org/10.31234/osf.io/a8xs6
Plaut, D. (2012). Giving theories of reading a sporting chance. Behavioral and Brain Sciences, 35(5), 301–302. https://doi.org/10.1017/S0140525X12000301
Prior, M., Sanson, A., Smart, D., & Oberklaid, F. (1995). Reading disability in an Australian community sample. Australian Journal of Psychology, 47(1), 32–37. https://doi.org/10.1080/00049539508258766
Saiegh-Haddad, E. (2003). Linguistic distance and initial reading acquisition: The case of Arabic diglossia. Applied Psycholinguistics, 24(3), 431–451. https://doi.org/10.1017/s0142716403000225
Saiegh-Haddad, E., Laks, L., & McBride, C. (2022). Handbook of literacy in diglossia and in dialectal contexts. Springer. https://doi.org/10.1007/978-3-030-80072-7_13
Santos, Í., & Centeno, V. G. (2021). Inspirations from abroad: the impact of PISA on countries’ choice of reference societies in education. Compare: A Journal of Comparative and International Education. https://doi.org/10.1080/03057925.2021.1906206
Schleicher, A. (2009). Securing quality and equity in education: Lessons from PISA. Prospects, 39, 251–263. https://doi.org/10.1007/s11125-009-9126-x
Scribner, S., & Cole, M. (1978). Unpackaging literacy. Social Science Information, 17(1), 19–40. https://doi.org/10.1177/053901847801700102
Sharma, P., & Sagar, R. (2017). Unfolding the genetic pathways of dyslexia in Asian population: A review. Asian Journal of Psychiatry, 30, 225–229. https://doi.org/10.1016/j.ajp.2017.06.006
Shaywitz, S. E. (1998). Dyslexia. The New England Journal of Medicine, 338(5), 307–312. https://doi.org/10.1056/NEJM199801293380507
Shaywitz, S. E., Morris, R., & Shaywitz, B. A. (2008). The education of dyslexic children from childhood to young adulthood. Annual Review of Psychology, 59(1), 451–475. https://doi.org/10.1146/annurev.psych.59.103006.093633
Siegel, L. S. (1989). IQ is irrelevant to the definition of learning disabilities. Journal of Learning Disabilities, 22(8), 469–478. https://doi.org/10.1177/002221948902200803
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on Generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128. https://doi.org/10.1177/1745691617708630
Sneddon, J. N. (2003). Diglossia in Indonesian. Journal of the Humanities and Social Sciences of Southeast Asia and Oceania, 159(4), 519–549. https://doi.org/10.1163/22134379-90003741
Snyder, L., Caccamise, D., & Wise, B. (2005). The assessment of reading comprehension: Considerations and cautions. Topics in Language Disorders, 25(1), 33–50. https://doi.org/10.1097/00011363-200501000-00005
Solari, E., Terry, N. P., Gaab, N., Hogan, T., Nelson, N., Pentimonti, J., Petscher, Y., & Sayko, S. (2020). Translational science: A road map for the science of reading. Reading Research Quarterly, 55(S1), 347–360. https://doi.org/10.1002/rrq.357
Söyler, P. B., Aydin, B., & Atilgan, H. (2021). PISA 2015 reading test item parameters across language groups: A measurement invariance study with binary variables. Journal of Measurement and Evaluation in Education and Psychology, 12(2), 112–128. https://doi.org/10.21031/epod.800697
Stephens, M., & Coleman, M. (2007). Comparing PIRLS and PISA with NAEP in reading, mathematics, and science. Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. https://nces.ed.gov/surveys/pisa/pdf/comppaper12082004.pdf. Accessed 10 Feb 2022
Stevenson, H. W., Stigler, J. W., Lucker, G. W., Lee, S., Hsu, C., & Kitamura, S. (1982). Reading disabilities: The case of Chinese, Japanese, and English. Child Development, 53(5), 1164–1181. pubmed.ncbi.nlm.nih.gov/7140425/
Strassheim, H., & Kettunen, P. (2014). When does evidence-based policy turn into policy-based evidence? Configurations, contexts and mechanisms. Evidence & Policy, 10(2), 259–277. https://doi.org/10.1332/174426514X13990433991320
Takayama, K. (2008). The politics of international league tables: PISA in Japan’s achievement crisis debate. Comparative Education, 44(4), 387–407. https://doi.org/10.1080/03050060802481413
Takayama, K. (2018). How to mess with PISA: Learning from Japanese kokugo curriculum experts. Curriculum Inquiry, 48(2), 220–237. https://doi.org/10.1080/03626784.2018.1435975
The Ministry of General Education, Zambia. (2017). Education in Zambia: Findings from Zambia’s experience in PISA for Development. www.oecd.org/pisa/pisa-for-development/Zambia_PISA_D_national_report.pdf. Accessed 15 Sept 2022
The World Bank Group Education. (2019). Programme for International Student Assessment (PISA) 2018: East Asia and Pacific regional brief. documents1.worldbank.org/curated/en/876861593415668827/pdf/East-Asia-and-Pacific-Regional-Brief-Programme-for-International-Student-Assessment-PISA-2018.pdf. Accessed 10 Mar 2022
Thurstone, L. L. (1952). Applications of psychology. Harper & Brothers.
Tiokhin, L., Hackman, J., Munira, S., Jesmin, K., & Hruschka, D. (2019). Generalizability is not optional: Insights from a cross-cultural study of social discounting. Royal Society Open Science, 6(2), 181386. https://doi.org/10.1098/rsos.181386
Tzouriadou, M. (2022). Assessment and learning disabilities. In M. Tzouriadou & S. Tzivinikou (Eds.), Learning disabilities: From assessment to intervention (pp. 38–73). Cambridge Scholars Publishing.
Vagh, S. B., & Nag, S. (2019). The assessment of emergent and early literacy skills in the akshara languages. In R. Joshi & C. McBride (Eds.), Handbook of Literacy in Akshara Orthography. Literacy Studies (Perspectives from Cognitive Neurosciences, Linguistics, Psychology and Education) (Vol. 17, pp. 235–260). Springer.
Vágvölgyi, R., Bergström, K., & Bulajić, A. (2021). Functional illiteracy and developmental dyslexia: Looking for common roots. A systematic review. Journal of Cultural Cognitive Science, 5, 159–179. https://doi.org/10.1007/s41809-021-00074-9
Wagner, D. A. (2011). Smaller, quicker, cheaper: Improving learning assessments for developing countries. Paris: UNESCO-IIEP. http://repository.upenn.edu/literacyorg_chapters/4. Accessed 20 May 2022
Winograd, P., Paris, S., & Bridge, C. (1991). Improving the assessment of literacy. The Reading Teacher, 45(2), 108–115.
Winskel, H. (2013). Reading and writing in Southeast Asian languages. Procedia-Social and Behavioral Sciences, 97(6), 437–442. https://doi.org/10.1016/j.sbspro.2013.10.256
Winskel, H., & Iemwanthong, K. (2010). Reading and spelling acquisition in Thai children. Reading and Writing, 23, 1021–1053. https://doi.org/10.1007/s11145-009-9194-6
Winskel, H., Radach, R., & Luksaneeyanawin, S. (2009). Eye movements when reading spaced and unspaced Thai and English: A comparison of Thai-English bilinguals and English monolinguals. Journal of Memory and Language, 61(3), 339–351. https://doi.org/10.1016/j.jml.2009.07.002
Winskel, H., & Ratitamkul, T. (2019). Learning to read and write in Thai. In R. M. Joshi & C. McBride (Eds.), Handbook of Literacy in Akshara Orthography. Literacy Studies (Perspectives from Cognitive Neurosciences, Linguistics, Psychology and Education) (Vol. 17, pp. 217–231). Springer.
Yang, L., Li, C., Li, X., Zhai, M., An, Q., Zhang, Y., Zhao, J., & Weng, X. (2022). Prevalence of developmental dyslexia in primary school children: A systematic review and meta-analysis. Brain Science, 12(2), 240. https://doi.org/10.3390/brainsci12020240
Funding
This work was sponsored in part by grant T44-410/21-N under the theme-based research scheme (C. McBride, PC).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Pamei, G., Cheah, Z.R.E. & McBride, C. Construct validity of international literacy measures: implications for dyslexia across cultures. J Cult Cogn Sci 7, 159–173 (2023). https://doi.org/10.1007/s41809-022-00115-x