
Enrollment Projection Methods White Paper [849KB]

by Educational Data Systems/Planware
June 23, 2020

Introduction: Planware™, a division of Educational Data Systems, has prepared this white paper with three main goals related to enrollment projections:

  1. Provide a guide to understanding the terminology used when generating enrollment projections.
  2. Demonstrate how to produce one-year projections using methods that require only historical enrollment data and common spreadsheet software, with no need for a statistical consultant (see the sketch following this list).
  3. Describe a method for determining the “best” projection method for your district.
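
The white paper's own worked examples are not reproduced on this page. As a rough illustration of the kind of spreadsheet-friendly calculation it refers to, the sketch below applies the common cohort-survival (grade-progression ratio) approach to made-up enrollment data. The grade labels, the counts, and the assumption that the paper covers this particular method are ours, not the paper's.

    # Minimal sketch of a one-year cohort-survival (grade-progression ratio)
    # projection. The grade labels and enrollment counts are illustrative only.

    # Historical enrollment by school year and grade (made-up data).
    history = {
        2018: {"K": 100, "1": 102, "2": 98,  "3": 95},
        2019: {"K": 104, "1": 101, "2": 103, "3": 97},
        2020: {"K": 99,  "1": 105, "2": 100, "3": 104},
    }

    grades = ["K", "1", "2", "3"]
    years = sorted(history)

    # Average grade-progression ratio: enrollment in grade g+1 next year
    # divided by enrollment in grade g this year, averaged over history.
    ratios = {}
    for g, g_next in zip(grades, grades[1:]):
        r = [history[y + 1][g_next] / history[y][g] for y in years[:-1]]
        ratios[g_next] = sum(r) / len(r)

    # Project next year: entry grade (K) from a simple average of past K
    # enrollment; each later grade from this year's prior grade times its ratio.
    latest = years[-1]
    projection = {"K": round(sum(history[y]["K"] for y in years) / len(years))}
    for g, g_next in zip(grades, grades[1:]):
        projection[g_next] = round(history[latest][g] * ratios[g_next])

    print(projection)

The same ratios can be computed directly in a spreadsheet by dividing each grade's enrollment by the prior grade's enrollment in the previous year and averaging across years.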

How to Measure the “Objectivity” of a Test [928KB]

Presentation by Mark H. Moulton, Ph.D.
November 2018, California Educational Research Association (CERA) Conference

Abstract: Although test objectivity (or “specific objectivity”, or “invariance”) has, since the work of Georg Rasch in the 1960s, been a stated goal of educational measurement, no “objectivity” statistic has yet been defined for evaluating tests, datasets, and psychometric models. Yet the properties of objectivity—comparability across tests, comparability over time, reproducibility of results—are often claimed for tests and student measures at all levels of the assessment system, from district-developed interim assessments to large-scale state assessments. This presentation will critique the basis of those claims and propose a fresh way to look at “objectivity.” An alternative set of statistical criteria will be identified that makes it possible to support the claim of objectivity in a very direct and useful way: a) the ability to predict missing values in a dataset; b) parameter reproducibility across samples; and c) item independence. Combined into a single “objectivity statistic,” these criteria will be shown to yield a generalized method for evaluating the results of a test analysis—for evaluating test quality, the suitability of the IRT (Item Response Theory) models under consideration, and the optimal dimensionality of multidimensional tests. Examples will be drawn from real and simulated datasets.
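
The combined statistic itself is defined in the presentation rather than here. To make criterion (a) concrete, the short sketch below hides a random sample of cells in a small dichotomous dataset, predicts them with a deliberately simple person-plus-item model, and scores the predictions against the hidden values. The dataset and the predictor are illustrative assumptions, not the presentation's model.

    # Illustrative sketch of criterion (a): how well a model predicts cells
    # that were hidden from it. The tiny dataset and the simple additive
    # predictor are assumptions for demonstration; the presentation's actual
    # "objectivity statistic" combines this with the other criteria.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.integers(0, 2, size=(200, 20)).astype(float)  # persons x items

    # Hide 10% of cells ("pseudo-missing") and keep their true values aside.
    mask = rng.random(data.shape) < 0.10
    observed = np.where(mask, np.nan, data)

    # A deliberately simple predictor: person mean + item mean - grand mean,
    # computed only from the cells that remain observed.
    person_mean = np.nanmean(observed, axis=1, keepdims=True)
    item_mean = np.nanmean(observed, axis=0, keepdims=True)
    grand_mean = np.nanmean(observed)
    predicted = np.clip(person_mean + item_mean - grand_mean, 0, 1)

    # Prediction accuracy on the hidden cells (mean absolute error).
    mae = np.abs(predicted - data)[mask].mean()
    print(f"Mean absolute error on pseudo-missing cells: {mae:.3f}")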

Ready for Any Format: Using Paper-based and Online Assessments Effectively and Efficiently [265KB]

by Susan M. McMillan, Ph.D.
November 2018, California Educational Research Association (CERA) Conference

Abstract: In a climate where statewide educational assessments seem to be shifting toward online administration formats, we at Educational Data Systems wondered whether school districts are changing how they administer local assessments. We surveyed a subset of California public school districts, and in this presentation we report our findings about the use of computer- and paper-based local assessments. Most districts report using a mix of assessment formats in their efforts to make efficient and effective use of district resources. We also offer information on tools that may be useful for districts in managing and scoring their paper-based assessments, and make the case that paper sometimes makes the most sense, even in today's world of “everything” going digital.

California English Language Development Test (CELDT) Replication Study [3.6MB]

by Educational Data Systems
February, 2017

Abstract: For the 2016–17 and 2017–18 editions of the California English Language Development Test (CELDT), Educational Data Systems, the prime contractor for the California Department of Education, assumed responsibility for psychometric, analytical, and technical reporting tasks previously held by its subcontractor, Educational Testing Service (ETS). To ensure the validity and consistency of testing across CELDT editions and a smooth transition, psychometricians at Educational Data Systems conducted an independent replication study of the analyses previously conducted by ETS for the 2014–15 edition of the CELDT, the most recently completed analyses at the time this activity began. Our replication report provides the results of that study. To summarize, we are able to replicate exactly the classical item analysis, descriptive statistics, and correlation analysis in the Technical Report. We find that the item calibration and scale transformation results produced using the open-source IRT software, jMetrik, are comparable to those produced by ETS using other software. We recommend adopting Rudner's IRT-based method for computing classification accuracy and consistency. Finally, we discuss options for continuing to explore item-fit statistics for the CELDT.
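
The replication tables are in the linked report. For readers unfamiliar with the recommended approach, the sketch below shows the usual form of a Rudner-style computation of classification accuracy and consistency from IRT ability estimates, their standard errors, and cut scores. The cut scores and examinee values are invented for illustration and are not CELDT results.

    # Sketch of Rudner-style classification accuracy and consistency.
    # Cut scores, ability estimates, and standard errors are illustrative.
    import numpy as np
    from scipy.stats import norm

    cuts = np.array([-0.8, 0.0, 0.9])          # logit cut scores between 4 levels
    theta = np.array([-1.2, -0.3, 0.4, 1.5])   # estimated abilities
    se = np.array([0.35, 0.30, 0.30, 0.40])    # conditional standard errors

    # Band edges, including -inf and +inf.
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))

    # P[i, k] = probability that examinee i's true ability lies in level k,
    # treating true ability as Normal(theta_i, se_i).
    upper = norm.cdf((edges[1:][None, :] - theta[:, None]) / se[:, None])
    lower = norm.cdf((edges[:-1][None, :] - theta[:, None]) / se[:, None])
    P = upper - lower

    # Level actually assigned from the point estimate.
    observed_level = np.searchsorted(cuts, theta, side="right")

    accuracy = P[np.arange(len(theta)), observed_level].mean()   # P(correct classification)
    consistency = (P ** 2).sum(axis=1).mean()                    # P(same level on a retest)
    print(f"classification accuracy = {accuracy:.3f}, consistency = {consistency:.3f}")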

Objectivity & Multidimensionality: An Alternating Least Squares Algorithm For Imposing Rasch-Like Standards Of Objectivity On Highly Multidimensional Datasets [937KB]

by Mark H. Moulton, Ph.D.
May, 2013

Abstract: To an increasing degree, psychometric applications (e.g., predicting music preferences) are characterized by highly multidimensional, incomplete datasets. While the data mining and machine learning fields offer effective algorithms for such data, few specify Rasch-like conditions of objectivity. On the other hand, while Rasch models specify conditions of objectivity—made necessary by the imperative of fairness in educational testing—they do not decisively extend those conditions to multidimensional spaces. This paper asks the following questions: What must a multidimensional psychometric model do in order to be classified as “objective” in Rasch’s sense? What algorithm can meet these requirements? The paper describes a form of “alternating least squares” matrix decomposition (NOUS) that meets these requirements to a large degree. It shows that when certain well-defined empirical criteria are met, such as fit to the model, ability to predict “pseudo-missing” cells, and structural invariance, NOUS person and item parameters and their associated predictions can be assumed to be invariant and sample-free with all the benefits this implies. The paper also describes those conditions under which the model can be expected to fail. Demonstrations of NOUS mathematical properties are performed using an open-source implementation of NOUS called Damon.
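
NOUS itself, and its open-source implementation Damon, are described in the paper and are not reproduced here. As a generic illustration of the alternating-least-squares machinery the paper builds on, the sketch below decomposes an incomplete matrix and checks how well the fitted coordinates predict pseudo-missing cells. The rank, data, and noise level are illustrative, and the code makes no claim to implement NOUS.

    # Generic alternating least squares on an incomplete matrix, with the
    # pseudo-missing prediction check described in the abstract. This is not
    # NOUS/Damon itself; sizes, rank, and data are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n_persons, n_items, rank = 100, 15, 2
    true = rng.normal(size=(n_persons, rank)) @ rng.normal(size=(rank, n_items))
    mask = rng.random(true.shape) < 0.85             # True where a cell is observed
    X = np.where(mask, true + 0.1 * rng.normal(size=true.shape), np.nan)

    P = rng.normal(size=(n_persons, rank))           # person coordinates
    Q = rng.normal(size=(n_items, rank))             # item coordinates

    for _ in range(50):
        # Fix items and solve least squares for each person over its observed
        # cells, then fix persons and do the same for each item.
        for i in range(n_persons):
            obs = mask[i]
            P[i] = np.linalg.lstsq(Q[obs], X[i, obs], rcond=None)[0]
        for j in range(n_items):
            obs = mask[:, j]
            Q[j] = np.linalg.lstsq(P[obs], X[obs, j], rcond=None)[0]

    pred = P @ Q.T
    rmse = np.sqrt(np.mean((pred - true)[~mask] ** 2))
    print(f"RMSE on pseudo-missing cells: {rmse:.3f}")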

EdScale: How to Measure Growth using Formative Exams Without Common Persons or Items [384KB]

by Mark H. Moulton, Ph.D.
March, 2010

Abstract: A new method is proposed for equating multidimensional formative exams without common persons or items. A multidimensional IRT (MIRT) model called NOUS is used to link students who take different formative exams by exploiting scores received on a common test administered at some point in the recent past, such as the California Standards Test. The “past test” vector is projected into the multidimensional subspace of the two formative exams, and students located in the same subspace are projected onto the common “past test” vector, allowing apples-to-apples comparisons between students on a common, well-understood metric. Growth measurement is handled by applying a time-series function to expected growth rates computed from previous years. The methodology is presented in connection with a scaling product developed in California, called EdScale, which is used to measure student growth on benchmark exams developed or purchased independently by school districts.
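
The EdScale/NOUS procedure is described in the paper. As a rough geometric analogue of the projection idea in the abstract, the sketch below places students who took different formative exams on a common "past test" metric by regressing the past-test score on each exam's subscores and scoring every student along the fitted direction. The data, the two exams, and the regression-based shortcut are illustrative assumptions, not the EdScale method.

    # Rough sketch of the projection idea: use a common past test as a
    # reference direction inside each formative exam's score space.
    # Not the EdScale/NOUS procedure; data and exams are illustrative.
    import numpy as np

    rng = np.random.default_rng(2)

    def common_scale_scores(formative_scores, past_test_scores):
        """Regress the past-test score on the formative subscores and return
        fitted values, i.e. each student's position along the past-test
        direction within this exam's subscore space."""
        X = np.column_stack([np.ones(len(formative_scores)), formative_scores])
        beta, *_ = np.linalg.lstsq(X, past_test_scores, rcond=None)
        return X @ beta

    # Exam A has two subscores per student; Exam B has three different ones.
    exam_a = rng.normal(size=(60, 2))
    exam_b = rng.normal(size=(80, 3))
    past_a = exam_a @ [0.7, 0.3] + 0.2 * rng.normal(size=60)
    past_b = exam_b @ [0.4, 0.4, 0.2] + 0.2 * rng.normal(size=80)

    # Students from either exam now sit on the same past-test metric.
    scale_a = common_scale_scores(exam_a, past_a)
    scale_b = common_scale_scores(exam_b, past_b)
    print(scale_a[:3], scale_b[:3])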

The California Reading First Year 7 Evaluation Report [1.2MB]

by Diane Haager, Ph.D.; Renuka Dhar; Mark H. Moulton, Ph.D.; Susan McMillan, Ph.D.
November, 2009

Abstract: The last in a seven-year series of evaluation reports of the California Reading First program (2003-09), this report generally replicates the findings of previous reports, with particular focus on schools that have been in the program since the second year of Reading First funding (Cohort 2). It finds that growth for Cohort 2 schools has been significant, that higher-implementing schools post larger gains than lower-implementing schools, and that Reading First schools generally outperform a statistical control group, though the differences are not significant in all cases. Reading First schools have posted higher growth rates than non-Reading First schools and have proven to be particularly effective in helping low-performing students and English Learners move to higher proficiency levels. These effects, while smaller in Cohort 2 schools than for the Reading First population as a whole, are in substantial agreement with a meta-analysis of California Reading First effect sizes presented in the Year 6 Evaluation Report, which found that, overall, the Reading First effect is educationally meaningful and has a high degree of statistical significance. The report also finds that program implementation declined in 2008-09, that principal participation and teacher perceptions are the strongest predictors of success, that the program is responsible for building capacity at state and local levels, and that it has created a sustainable, comprehensive structure for reading/language arts instruction. The report concludes with lessons learned over the seven-year course of the evaluation.

In addition to the primary report, the Appendices [5.5MB] are also available.

The California Reading First Year 6 Evaluation Report [1.1MB]

by Diane Haager, Ph.D.; Craig Heimbichner; Renuka Dhar; Mark H. Moulton, Ph.D.; Susan McMillan, Ph.D.
December, 2008

Abstract: Replicating the results of previous Reading First Evaluation Reports, the Year 6 Report documents the achievement and implementation of more than 800 schools that have received Reading First funding since 2003. Achievement is measured using STAR scores. Implementation is measured with data from surveys administered to Reading First teachers, coaches, and principals, using a 3-Facet Rasch Model. The report finds that Reading First schools show significantly higher gains (p < 0.05) than a "statistical control group," and that High Implementation Reading First schools show significantly higher gains than Low Implementation Reading First schools. This is true even for achievement in grade 5, despite the fact that Reading First is administered only in grades K-3. The Year 6 Report includes a meta-analysis of the Reading First effect to date, which effectively removes any doubt regarding the overall efficacy of the program. It also includes quantitative and qualitative information regarding the efficacy of individual program elements and the usefulness of the interim assessment system, and devotes chapters to the effect of Reading First on English Learners and Special Education classrooms. In addition to the primary report, the Appendices [1MB] are also available.

Reading First Supplemental Survey Report [225KB]

by Craig Heimbichner, Susan McMillan, Ph.D., Mark H. Moulton, Ph.D., Renuka Dhar
March, 2008

Abstract: Developed in response to 2007-08 Senate Bill (SB) 77, Provision 6, this report presents results of a survey given online to administrators of California school districts (Local Educational Agencies, or LEAs) that were eligible to participate in the federal Reading First program. The survey solicited information regarding why districts chose to participate in the Reading First program or not, the program's perceived strengths and weaknesses, and, for districts that chose not to participate, the reading programs, coaching, and professional development that these districts offer in grades K-3 instead. The report compares the achievement of eligible participating and non-participating districts on the grade 2 and grade 3 English Language Arts California Standards Test over a five-year period. Subject to limitations in sample size, findings include: eligible non-Reading First LEAs cited the program's many requirements as a reason for not applying; approximately 50%-60% of eligible non-Reading First LEAs use the Houghton-Mifflin reading program, one of the two programs required in Reading First; teachers in eligible non-Reading First LEAs receive less professional development than those in Reading First LEAs; reading coaches are less available and under-utilized in eligible non-Reading First LEAs; perceptions of Reading First by participating LEAs are very positive; and Reading First schools and LEAs show stronger achievement growth than eligible non-participating schools and LEAs on the CSTs for English Language Arts. Both the survey and the achievement results support the finding that Reading First has been beneficial for participating LEAs, replicating results in The California Reading First Year 5 Evaluation Report.

In addition to the primary report, the Appendices [185KB] are also available.

The California Reading First Year 5 Evaluation Report [1MB]

by Diane Haager, Ph.D.; Renuka Dhar; Mark H. Moulton, Ph.D.; Susan McMillan, Ph.D.
January, 2008

Abstract: Replicating the results of the Year 4 Report, the Year 5 Reading First Evaluation Report, commissioned by the California Department of Education in accordance with No Child Left Behind, documents the achievement and implementation of three cohorts of schools that have received Reading First funding since 2003. Achievement was measured primarily using STAR scores. Implementation was measured with data from surveys administered to Reading First teachers, coaches, and principals using a 4-Facet Rasch Model. The report finds that Reading First schools show significantly higher gains (p < 0.05) than a "statistical control group," and that High Implementation Reading First schools show significantly higher gains than Low Implementation Reading First schools. The report concludes that California's Reading First program is effective, and that its effectiveness is directly related to the degree it is implemented. Additional chapters document perceptions of the various reading program elements, the performance of English learners, and the performance of students in waivered classrooms. We find that Reading First is effective with the English learner population, and that non-waivered classrooms are more effective than waivered classrooms. In addition to the primary report, the Appendices [7.5MB] are also available.

One Ruler, Many Tests: A Primer on Test Equating [750KB]

by Mark H. Moulton, Ph.D.
December, 2007

Abstract: Given the variety of languages, cultures, and curricular priorities across APEC countries, it would seem difficult to unite around a common set of teaching and learning standards for purposes of international communication. Yet the nascent field of “psychometrics” offers practical solutions to such problems by its ability to “equate” tests that differ in difficulty and even content, and by its ability to set standards that have the same meaning across tests and countries. After summarizing the principles behind classical and modern educational measurement, the paper discusses several technologies that can make it possible for APEC countries to jump the language barrier without sacrificing local imperatives. These technologies include: local, national and international item banks, computer adaptive testing, and the Lexile framework.
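
The primer explains equating at a conceptual level. As one concrete illustration, not taken from the paper itself, the snippet below performs the simplest Rasch-style common-item (mean-shift) linking between two forms; the anchor-item difficulties are invented for the example.

    # Illustrative Rasch mean-shift linking over common (anchor) items, one of
    # the simplest ways two forms of different difficulty can be put on a
    # single scale. The item difficulties below are made up for the example.
    old_form_anchor = {"A1": -0.50, "A2": 0.10, "A3": 0.80, "A4": 1.20}  # logits on old scale
    new_form_anchor = {"A1": -0.20, "A2": 0.45, "A3": 1.05, "A4": 1.55}  # logits on new scale

    # Linking constant: mean difficulty difference on the anchor items.
    common = old_form_anchor.keys() & new_form_anchor.keys()
    shift = sum(old_form_anchor[i] - new_form_anchor[i] for i in common) / len(common)

    # Any measure from the new form is placed on the old scale by adding the shift.
    new_form_person_measure = 0.90
    equated = new_form_person_measure + shift
    print(f"linking constant = {shift:+.2f} logits, equated measure = {equated:.2f}")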

The California Reading First Year 4 Evaluation Report [740KB]

by Diane Haager, Ph.D.; Renuka Dhar; Mark H. Moulton, Ph.D.; Susan McMillan, Ph.D.
December, 2006

Abstract: Replicating the results of the Year 3 Report, the Year 4 Reading First Evaluation Report, commissioned by the California Department of Education in accordance with No Child Left Behind, documents the achievement and implementation of three cohorts of schools that have received Reading First funding since 2003. Achievement was measured primarily using STAR scores. Implementation was measured with data from surveys administered to Reading First teachers, coaches, and principals using a 4-Facet Rasch Model. The report finds that Reading First schools show significantly higher gains (p < 0.05) than a "statistical control group," and that High Implementation Reading First schools show significantly higher gains than Low Implementation Reading First schools. The report concludes that California’s Reading First program is effective, and that its effectiveness is directly related to the degree it is implemented. In addition to the primary report, the Appendices [740KB] are also available.

Multidimensional Equating [275KB]

by Mark H. Moulton, Ph.D.; Howard A. Silsdorf
April, 2006

Abstract: Form equating methods have proceeded under the assumption that test forms should be unidimensional, both across forms and within each form. This assumption is necessary when the data are fit to a unidimensional model, such as Rasch. When the assumption is violated, variations in the dimensional mix of the items on each test form, as well as in the mix of skills in the student population, can lead to problematic testing anomalies. The assumption ceases to be necessary, however, when data are fit to an appropriate multidimensional model. In such a scenario, it becomes possible to reproduce the same composite dimension rigorously across multiple test forms, even when the relative mix of dimensions embodied in the items on each form varies substantially. This paper applies one such multidimensional model, NOUS, to a simulated multidimensional dataset and shows how it avoids the pitfalls that can arise when fitting the same data to a single dimension. Some implications of equating multidimensional forms are discussed.

The California Reading First Year 3 Evaluation Report [1MB]

by Diane Haager, Ph.D.; Renuka Dhar; Mark H. Moulton, Ph.D.; Seema Varma, Ph.D.
November, 2005

Abstract: The Year 3 Reading First Evaluation Report, commissioned by the California Department of Education in accordance with No Child Left Behind, documents the achievement and implementation of three cohorts of schools that have received Reading First funding since 2003. Achievement was measured primarily using STAR scores. Implementation was measured with data from surveys administered to Reading First teachers, coaches, and principals, analyzed using a 4-Facet Rasch Model. The report finds that Reading First schools show somewhat higher gains than comparable non-Reading First schools, and that High Implementation Reading First schools show significantly higher gains than Low Implementation Reading First schools and comparable non-Reading First schools. The report concludes that California’s Reading First program is effective, and that its effectiveness is directly related to the degree it is implemented.

In addition to the primary report, the Appendices [650KB] are also available.

One Use of a Non-Unidimensional Scaling (NOUS) Model [125KB]
Transferring Information Across Dimensions and Subscales

by Mark H. Moulton, Ph.D.
April, 2004

Abstract: Test administrators sometimes ask for student performance on test subscales having few items, rendering them unreliable and hard to equate. Worse, subscales sometimes embody orthogonally distinct secondary dimensions as well. Traditional Rasch analysis offers reasonable solutions in some cases, but not all, and is not a general solution. This paper proposes a general solution using a Rasch-derived non-unidimensional scaling measurement model, called NOUS, which transfers information across items, subscales, and dimensions. Drawing examples from a recent state exam, it shows that NOUS yields measures for short subscales that are comparable to unidimensional measures computed using long forms of the same subscale. It concludes by discussing applications for multidimensional equating, student-level diagnostics, and measurement of performance on open-ended items.

Weighting and Calibration [195KB]
Merging Rasch Reading and Math Subscale Measures into a Composite Measure

by Mark H. Moulton, Ph.D.
April, 2004

Abstract: While the emergence of Rasch and related IRT methodologies has made it routine to update tests across administrations without altering the original Pass/Fail standard, their insistence on unidimensionality raises a problem when the standard combines performance on multiple dimensions, such as mathematics and language. How can a student’s mathematics and language measures be combined into a Pass/Fail decision on composite ability when the two scales embody different dimensions and logit units and lack common items? Using client-determined weights and student expected scores, we present a simple method for combining the unrelated subscales encountered in a recent high-stakes certification exam to produce composite logit measures without sacrificing the advantages of unidimensional IRT methodologies.
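
The paper's procedure is described in the linked document. The sketch below is only a guess at its flavor: each subscale logit measure is converted to an expected proportion-correct over that subscale's items, the expectations are combined with client-chosen weights, and the result is expressed back in logits. Item difficulties, measures, and weights are invented for illustration.

    # Hedged sketch of combining two unrelated Rasch subscales with
    # client-chosen weights. An illustration of the general idea only, not
    # the paper's exact procedure; all numbers are made up.
    import math

    def expected_proportion(theta, item_difficulties):
        """Expected proportion correct for a Rasch measure over a set of items."""
        probs = [1 / (1 + math.exp(-(theta - b))) for b in item_difficulties]
        return sum(probs) / len(probs)

    math_items = [-1.0, -0.2, 0.5, 1.3]       # math subscale difficulties (logits)
    lang_items = [-0.6, 0.1, 0.9]             # language subscale difficulties (logits)

    math_theta, lang_theta = 0.8, -0.3        # one student's subscale measures
    w_math, w_lang = 0.6, 0.4                 # client-determined weights

    p = (w_math * expected_proportion(math_theta, math_items)
         + w_lang * expected_proportion(lang_theta, lang_items))
    composite_logit = math.log(p / (1 - p))   # weighted expectation back on a logit scale
    print(f"composite expected score = {p:.3f}, composite logit = {composite_logit:+.3f}")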

Rasch Demo Spreadsheet [197KB]

by Mark H. Moulton, Ph.D.
August, 2003

Abstract: This Excel file was developed to help students and practitioners of the Rasch Model get a simple and intuitive look at what goes on “under the hood” of most Rasch programs for dichotomous data. The Excel workbook shows all the matrices, formulas, and iterations needed to understand how person abilities, item difficulties, standard errors, expected values, and misfit statistics are computed. It also allows the user to perform simple experiments to see the effects of missing and misfitting data. You will find that, beneath the sophistication and apparent inscrutability of modern Rasch software, the model and its estimation algorithm are surprisingly easy to understand.

The workbook most closely emulates the UCON algorithm described by Wright and Stone in Best Test Design.
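
For readers who prefer code to spreadsheet formulas, a compact Python rendering of the same alternating Newton-Raphson logic for dichotomous data might look like the following. It omits the extreme-score handling and bias correction a production UCON implementation includes, and the response matrix is made-up demo data rather than anything distributed with the workbook.

    # Compact sketch of UCON-style joint estimation: alternate Newton-Raphson
    # updates of person abilities and item difficulties for dichotomous data.
    import numpy as np

    rng = np.random.default_rng(3)
    resp = rng.integers(0, 2, size=(30, 10)).astype(float)   # persons x items (demo data)

    # Drop extreme raw scores (all right / all wrong), which UCON cannot estimate.
    resp = resp[(resp.sum(axis=1) > 0) & (resp.sum(axis=1) < resp.shape[1])]
    resp = resp[:, (resp.sum(axis=0) > 0) & (resp.sum(axis=0) < resp.shape[0])]

    theta = np.zeros(resp.shape[0])   # person abilities (logits)
    b = np.zeros(resp.shape[1])       # item difficulties (logits)

    for _ in range(100):
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))          # expected values
        theta += (resp - p).sum(axis=1) / (p * (1 - p)).sum(axis=1)   # person step
        p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
        b -= (resp - p).sum(axis=0) / (p * (1 - p)).sum(axis=0)       # item step
        b -= b.mean()                                                 # anchor: mean item difficulty = 0

    info = p * (1 - p)                         # cell information at convergence
    se_theta = 1 / np.sqrt(info.sum(axis=1))   # person standard errors
    se_b = 1 / np.sqrt(info.sum(axis=0))       # item standard errors
    print(np.round(b, 2), np.round(se_b, 2))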

The spreadsheet is fully documented using Excel comments, which may be turned on and off as desired.

NOUS 2003 Demo Spreadsheet [1.5MB]

by Mark H. Moulton, Ph.D.
August, 2003

Abstract: At EDS we make use of three powerful psychometric tools: WinSteps (Rasch Model), Facets (the Many Facets Rasch Model), and NOUS (Non-Unidimensional Scaling). The first two are well-known but NOUS, developed by Mark Moulton and Howard Silsdorf, is the secret tool behind much of EDS’s psychometric work, especially in the area of equating benchmark exams. It combines important principles of the Rasch Model with concepts drawn from multidimensional matrix methods such as Singular Value Decomposition and Alternating Least Squares to yield a tool that is able to analyze dimensionally complex data sets and solve equating problems that are widely considered to be intractable. For all of its apparent complexity, the algorithm is actually quite simple and is presented here in complete detail with Excel formulas and helpful annotations.

This spreadsheet requires Microsoft Excel 2007.

Preliminary Item Statistics Using Point-Biserial Correlation and P-Values [54KB]

by Seema Varma, Ph.D.
August, 2003

Abstract: This document demonstrates the usefulness of the point-biserial correlation for item analysis. Step-by-step computation of the point-biserial correlation is shown in an Excel demo sheet. The SPSS syntax for computing the correlation is also provided. A dummy dataset is used to help the reader interpret the statistic.
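
The Excel demo sheet and SPSS syntax are in the linked files. For readers working elsewhere, the same statistics can be computed in a few lines; the small response matrix below is dummy data, and the point-biserial is taken here against the total score with the item removed (the corrected form), which the demo sheet may or may not do.

    # Point-biserial item analysis on a small dummy dataset: p-value
    # (proportion correct) and the corrected point-biserial, i.e. the Pearson
    # correlation between each item and the total score with that item removed.
    import numpy as np

    # Rows = examinees, columns = items (1 = correct, 0 = incorrect); demo data.
    resp = np.array([
        [1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0],
        [1, 1, 0, 0, 1],
    ])

    total = resp.sum(axis=1)
    for j in range(resp.shape[1]):
        item = resp[:, j]
        p_value = item.mean()                 # item difficulty (p-value)
        rest = total - item                   # total score excluding this item
        r_pb = np.corrcoef(item, rest)[0, 1]  # corrected point-biserial
        print(f"item {j + 1}: p = {p_value:.2f}, corrected point-biserial = {r_pb:.2f}")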

Point Biserials Demo Spreadsheet [68KB]

by Seema Varma, Ph.D.
August, 2003

Abstract: This Excel file accompanies the above document and demonstrates how point-biserials can be computed with Microsoft Excel.

The spreadsheet is fully documented using Excel comments, which may be turned on and off as desired.


Redistribution policy: These publications may be quoted or distributed without prior consent of the authors subject to the condition that the authors be correctly and fully acknowledged and that quotations be in context.