Flow and diffusion of high-stakes test scores

Edited by Leo P. Kadanoff, University of Chicago, Chicago, IL, and approved August 18, 2009
October 13, 2009
106 (41) 17267-17270

Abstract

We apply visualization and modeling methods for convective and diffusive flows to public school mathematics test scores from Texas. We obtain plots that show the most likely future and past scores of students, the effects of random processes such as guessing, and the rate at which students appear in and disappear from schools. We show that student outcomes depend strongly upon economic class, and identify the grade levels where flows of different groups diverge most strongly. Changing the effectiveness of instruction in one grade naturally leads to strongly nonlinear effects on student outcomes in subsequent grades.
Texas began testing almost every student in almost every public school in grades 3-11 in 2003 with the Texas Assessment of Knowledge and Skills (TAKS). Every other state in the United States administers similar tests and gathers similar data, either because of its own testing history, or because of the Elementary and Secondary Education Act of 2001 (No Child Left Behind, or NCLB). Texas mathematics scores for the years 2003 through 2007 comprise a data set involving more than 17 million examinations of over 4.6 million distinct students. Here we borrow techniques from statistical mechanics (1) developed to describe particle flows with convection and diffusion and apply them to these mathematics scores. The methods we use to display data are motivated by the desire to let the numbers speak for themselves with minimal filtering by expectations or theories.
The most similar previous work describes schools using Markov models. “Demographic accounting” (2) predicts changes in the distribution of a population over time using Markov models and has been used to try to predict student enrollment year to year (3, 4), likely graduation times for students (5), and the production of and demand for teachers (6). We obtain a more detailed description of students based on large quantities of testing data that are just starting to become available. Working in a space of score and time we pursue approximations that lead from general Markov models to Fokker–Planck equations, and obtain the advantages in physical interpretation that follow from the ideas of convection and diffusion.

Results

Fig. 1 A compares Texas mathematics scores taken in spring 2006 to those taken in spring 2007. Each arrow represents a group of students whose score fell into a range such as 80%-89% in 2006. The tail of each arrow is centered in the bin where students start. The tip of the arrow points in the direction of the average change in score of the students in this group. The number of students accounted for by the arrow is shown by its area (not its length). The question one answers by following the flow is “If I know students' scores when they are in third grade, what is the most likely set of scores for them to have as they head towards 11th?” The plot shows a snapshot of student motion across all grades in one year, and does not show the motion of particular individuals all the way from third to eleventh grade. The number of students represented by each arrow is large (40,000 students for the larger arrows in Fig. 1) so the standard error of the mean for changes in score is around 0.1%. Three shaded bands indicate the cut scores that divide commended from passing and failing performance.
Fig. 1.
Flow plots recording score changes from spring 2006 to spring 2007. The top row has data from students not eligible for free and reduced meals, whereas the bottom row shows students who are eligible. That is, students from wealthier families are on top, and those from poorer families are below. Flow arrows $v_S^{\rightarrow}$ show the most likely future path of students whose starting point is known. Reverse flow arrows $v_S^{\leftarrow}$ can be followed backwards to determine the most likely history of students whose ending point is known. The top band highlights commended students, the middle band highlights passing scores, and the bottom band highlights failing scores, using the cut scores published for each year and grade by the Texas Education Agency.
Knowing students' scores in one year does not completely determine their scores the next year. Score changes have a random component. The degree of randomness can be reduced by judiciously grouping similar students, but cannot be eliminated. Variations between students and schools, and the fact that students guess at problems they do not know, lead to uncertainty on the order of 10% in the change of individual scores from year to year. Adopting the language of fluids, both convection and diffusion contribute to the flow of students. Eq. 3 presents a Fokker–Planck equation (1) that makes this idea precise.
The diffusive contribution to the flow due to guessing can be modeled mathematically: students invariably guess the answers to questions they do not know, because there is no penalty for guessing, and the fraction of correct guesses can be modeled by a binomial distribution. Each question has four responses, so the probability of guessing correctly is $f = 1/4$, and students who know $n$ questions out of $N$ total questions on an exam will guess at the remaining $M = N - n$, resulting in a mean (normalized) score of $(n + fM)/N$ with a variance of $f(1 - f)M/N^2$. This variance provides a lower limit for the amount of diffusion in the absence of any other diffusive terms. The actual diffusion, as measured from the data, is several times greater than this lower limit in all cases, indicating that randomness due to guessing only provides a small part of the diffusion.
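The guessing model can be illustrated with a minimal Python sketch. The function name and the 50-question exam are hypothetical choices for illustration; only the formulas for the mean and variance come from the text.

```python
def guessing_stats(n_known, n_total, f=0.25):
    """Mean and variance of the normalized score when unknown items are guessed.

    n_known: questions the student actually knows (n).
    n_total: questions on the exam (N).
    f: probability of guessing one item correctly (1/4 for four responses).
    """
    m = n_total - n_known                      # items that must be guessed, M = N - n
    mean = (n_known + f * m) / n_total         # (n + fM)/N
    variance = f * (1 - f) * m / n_total**2    # f(1 - f)M/N^2
    return mean, variance


# Hypothetical student who knows 30 of 50 questions.
mean, variance = guessing_stats(n_known=30, n_total=50)
print(f"mean = {mean:.3f}, std = {variance ** 0.5:.3f}")
```

Under these assumed numbers the student averages 70% with a standard deviation of about 3.9 percentage points from guessing alone, which is the kind of lower limit on diffusion the paragraph describes.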
One consequence of diffusion is that not all students follow the path predicted by their flow arrows. Fig. 2 A is a graphical representation of the amount by which students' scores are raised above or lowered below the main flow by diffusion. These differences are due both to guessing and to differences in education, school, etc. A more subtle consequence of diffusion is that following students from the future into the past is different from following them from the past into the future. One can obtain a different set of flow arrows by choosing students whose score fell into a range such as 80%-89% in 2007 and computing their average score the prior year. This flow, which is displayed as Fig. 1 B, answers the question “If I know students' scores when they are in 11th grade, what is the most likely set of scores for them to have had coming from third?”
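The difference between forward and reverse flow arrows can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: the function name, the ten equal-width bins, and the four sample score pairs are all hypothetical.

```python
from collections import defaultdict


def binned_flow(pairs, n_bins=10, reverse=False):
    """Average score change per score bin.

    pairs: iterable of (score_a, score_b), normalized scores in [0, 1]
           for the same student in two consecutive years.
    Forward flow bins students by score_a; reverse flow bins by score_b.
    Returns {bin_index: (count, mean_change)} with change = score_b - score_a.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for score_a, score_b in pairs:
        key = score_b if reverse else score_a
        b = min(int(key * n_bins), n_bins - 1)   # a score of 1.0 goes in the top bin
        totals[b][0] += 1
        totals[b][1] += score_b - score_a
    return {b: (count, total / count) for b, (count, total) in sorted(totals.items())}


# Invented sample: two students start near 85%, two start near 55%.
pairs = [(0.85, 0.80), (0.85, 0.90), (0.55, 0.85), (0.55, 0.45)]
forward = binned_flow(pairs)                  # grouped by where students start
reverse = binned_flow(pairs, reverse=True)    # grouped by where students end
print(forward[8], reverse[8])
```

In this invented sample the students who start in the 80%-89% bin show no average change, while the students who end in that bin gained 12.5 points on average, so the two sets of arrows are not mirror images of each other.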
Fig. 2.
Student diffusion, disappearance, and retention. (A) Diffusion plots. These show the changes in student numbers due to random processes such as guessing on the exam (Eq. 11). To find the total change in numbers of students in every cell including contributions from diffusion, add these vertical arrows to the convective arrows of Fig. 1 A. (B) Students appearing and disappearing from school or retained in a grade, deduced from mathematics exams administered in spring 2006 and spring 2007. Vertical arrows show the net result of students appearing and disappearing from school, whereas horizontal arrows show the numbers of students required to repeat a grade. Downward pointing arrows mean that more children are disappearing from a grade than appearing in it. Areas of arrows are proportional to the numbers of children involved. The scale of arrows and meaning of the colored bands is the same as in Fig. 1.
The upper and lower plots in Fig. 1 divide students into two groups according to their level of economic need. The upper plots show students not eligible for free and reduced meals (called “not low income”), and the bottom plots show those who are eligible (called “low income”). Many other groupings are possible, including those by race and gender, economic need according to school rather than by individual, or combining race and gender with economic need.
There are two additional phenomena that contribute to student flows. Students appear and disappear from one year to the next, and students can be required to repeat a grade. These contributions are reflected to some extent in the sizes of arrows in Fig. 1, but it is useful to bring them out directly. Vertical arrows in Fig. 2 B show the number of students who appeared in each grade and were missing or had zero score the previous year minus those who had been present and now vanish or get zero score. The horizontal arrows show the numbers of students who repeat a grade.
Fig. 3 shows how flow fields evolve over time for low-income students. The broad outlines of the flow pattern remain remarkably constant, while at the same time there are some systematic changes such as a rapid increase in the numbers of students obtaining commended scores at 10th grade.
Fig. 3.
Flow plots for low-income students 2003–2007. The conventions for the arrows and colored bands in these plots are the same as in Fig. 1. The figure shows that the main features of the flow are very consistent from year to year. Arrows circled in the upper right corners highlight the small but rapidly increasing numbers of low-income students performing at the highest levels on the test at 10th grade.

Discussion

A characteristic pattern in Fig. 1 is a strong horizontal flow with arrows of decreasing size above and below it pointing towards the flow center. In fluids, this phenomenon results from the competition between fluctuations and dissipation: a particle that is moving much faster than those around it because it has just received a particularly large random kick is most probably going to slow down. In statistics this phenomenon is called regression to the mean (7) and explains why arrows above the center of the flow tend to point down. Regression to the mean can be caused by several factors, including the mathematics of guessing, or by the small likelihood of students having exceptional teachers several years in a row.
Educational outcomes for students from wealthy and poor families are very different in Texas. The flow fields show where the greatest divergences between these groups occur. The flow patterns in the top and bottom rows of Fig. 1 start out in nearly the same direction until the transition to middle school between fifth and seventh grade, when students from economically disadvantaged backgrounds flow downwards at a higher pace than their less disadvantaged counterparts and never recover. Ninth grade is another crucial time because students who are not passing the mathematics exams are forced to repeat a grade and consequently disappear from schools in large numbers. This effect is much stronger for those who are economically disadvantaged than for those who are not, as shown in Fig. 2 B.
Flow fields address many questions about the educational system. There is a debate over the student variables that should be used to describe effects of teachers and schools. Sanders (8) states that “models should not include socio-economic or ethnic accommodations but should only include measures of previous achievement of individual students.” In this view, prior year scores contain everything one needs to know about the state of the students. However, differences between flow directions have great statistical significance. For example, sixth graders not eligible for free and reduced meals with mathematics scores between 90% and 100% in 2006/2007 drop on average in score by 4.4% the next year, whereas those eligible for free and reduced meals drop in score by 7.0% ($N \sim 30{,}000$, $t = 34$, $p < 10^{-9}$). Similar statistical significance applies to the differences between virtually all the arrows in the upper and lower rows of Fig. 1. Changes in scores depend strongly, reproducibly, and with high statistical significance upon poverty level, even after controlling for previous achievements of students. It is possible that this difference in score changes is entirely due to the lower quality of teachers assigned to the least affluent students. However, it is difficult to reach such a conclusion simply from test data; the conclusion that ineffective teachers are largely to blame for unsatisfactory student performance risks being circular (9) if ineffective teachers are defined to be those whose students' test scores decrease (10). Drawing conclusions about school effectiveness from test data presents comparable difficulties (11).
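The size of the quoted t statistic can be checked roughly with a two-sample (Welch) calculation. The sketch below is only an order-of-magnitude illustration: the mean changes come from the text, but the 10% spread of individual score changes and the equal group sizes of 30,000 are assumptions, and the text does not state which test was used.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for the difference between two group means."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


# Mean score changes are from the text; spreads and group sizes are assumed.
t = welch_t(mean1=-0.044, sd1=0.10, n1=30_000,
            mean2=-0.070, sd2=0.10, n2=30_000)
print(f"t = {t:.1f}")
```

With these assumed inputs the statistic comes out in the low thirties, the same order as the reported value, which shows how a 2.6-point difference becomes highly significant once tens of thousands of students are averaged.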
Another claim is that the difficulty of items on the TAKS exams is carefully chosen so as to maintain students' scores at the same level over time (12). This claim is only partly supported by the data. Most flow vectors are close to horizontal but the slopes are not negligible and, over the course of several years, students flow to very different regions from where they began, a change which depends strongly on variables such as students' ethnic group and economic class.
Because the impetus to pass No Child Left Behind came from Texas, there have been debates about why Texas test scores have been rising. McNeil et al. (13) suggest that rises in Texas test scores can be attributed to an increasing pattern of retaining low-income students at ninth grade until they disappear. The data support this claim. Retention of low-income students (those eligible for free and reduced meals) at ninth grade increased between 2003/2004 and 2006/2007 from 31,200 to more than 33,000, whereas the number of low-income students disappearing from ninth grade increased from 23,200 to 28,500.
Finally, we note that linear modeling is pervasive in analysis of educational data (14), but we see many effects that are inherently nonlinear. For example, suppose that through some form of improved instruction it is possible to increase the score gains of low-income students in sixth and seventh grades. This will have the effect of diverting the entire flow pattern slightly upward. In 10th grade, the number of low-income students in the highest score bracket (90%-100%) constitutes the exponentially small tail of a distribution centered at around 60%. Thus small motions of the distribution upward result in exponentially varying gains at the top. Such gains are evident. For example, consider the low-income students scoring > 90% in 10th grade circled in Fig. 3. Between 2003 and 2006, the number of these students grew by 15–50% per year, starting with 2,150 students in 2003 and ending with 7,088 in 2006. The model presented here is nonparametric and makes minimal assumptions about the form of the underlying probability distributions for student score changes.
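The sensitivity of the top bracket to a small upward shift can be illustrated with a short Python sketch. The normal shape, the 15-point standard deviation, and the 3-point shift are assumptions made only for this example; the model in the paper is nonparametric and does not assume a normal distribution.

```python
import math


def fraction_above(threshold, mean, sd):
    """Fraction of a normal distribution lying above `threshold`."""
    z = (threshold - mean) / (sd * math.sqrt(2))
    return 0.5 * math.erfc(z)


# Assumed distribution centered at 60% with a 15-point spread.
base = fraction_above(0.90, mean=0.60, sd=0.15)
# Same distribution moved upward by 3 points.
shifted = fraction_above(0.90, mean=0.63, sd=0.15)
relative_gain = shifted / base - 1
print(f"base = {base:.4f}, shifted = {shifted:.4f}, gain = {relative_gain:.0%}")
```

Under these assumptions a shift of 3 points in the center of the distribution raises the share of students above 90% by more than half, which is the nonlinear amplification the paragraph describes.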
It should be possible to use our convection and diffusion models to predict quantitatively how improvements at lower grades affect the flow of students at higher grades. These predictions would apply to the average behavior of large numbers of students, although individuals would display considerable variation. The success of these predictions will partly hinge on the extent to which score changes in successive years are statistically independent. Our preliminary examination of this question indicates that knowledge of two prior years' scores only improves prediction by ≈ 10%, so the assumption of independence is acceptable.
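If score changes in successive years are treated as independent, a prediction of this kind amounts to pushing a score distribution through a chain of grade-to-grade transition matrices. The sketch below illustrates that idea only; the function name, the three score bins, and every probability in the matrix are invented for the example.

```python
def propagate(distribution, transitions):
    """Push a score distribution through successive grade-to-grade transitions.

    distribution: list of student counts per score bin in the starting grade.
    transitions: list of row-stochastic matrices, one per grade step, with
                 T[i][j] = probability of moving from bin i to bin j.
    Assumes score changes in successive years are independent (Markov).
    """
    current = list(distribution)
    for T in transitions:
        following = [0.0] * len(T[0])
        for i, count in enumerate(current):
            for j, probability in enumerate(T[i]):
                following[j] += count * probability
        current = following
    return current


# Hypothetical bins (low, middle, high) and invented transition probabilities.
T = [[0.8, 0.2, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.3, 0.7]]
result = propagate([100, 100, 100], [T, T])
print(result)
```

Changing the matrix used for one early grade step and rerunning the chain would show how an improvement at that grade propagates to later grades, in the average sense described above.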

Materials and Methods

Dataset.

We obtained all TAKS-related data that the Texas Education Agency, the government agency charged with administering and evaluating standardized tests in Texas, is able to release. Each row of the dataset contains information about a single examination of a single student, including that student's demographic details as well as their responses to each question of the examination along with their score. Each student has a globally unique, anonymized identifier that allows us to follow them through time as they move between schools. Students are described by race, gender, and eligibility for free and reduced-price meals, which is an indication of whether family income is low or high. Every answer they have bubbled on each test is provided, and can be compared with correct answers. The schools and districts in which students take the exams are named. There are some things we would have liked to know that are not included: there is no indication of whether students have changed schools in the middle of the year. The State probably does not know. More puzzling, there is no information on the teacher under whose care the student took the exam. The State certainly does have this information, because it returns to each teacher a record of their students' scores, and provides reports on the performance of every teacher to each district. However, they must not retain the data, because the Texas Education Agency is required to release all information in its possession in accord with the Texas Freedom of Information Act, and data linking students to teachers are not available.
The data set has defects, some of which can partially be remedied and some of which cannot. There are over 27,000 students with invalid records who end up coded with the same unique identifier and must be removed. Such defects appear to involve tens of thousands of students but there is nothing to be done about them. Out of a population of millions we do not believe that these defects are likely to distort our results.
We transformed the dataset to make analysis more manageable, but without changing any entries. We created normalized sets of tables to speed up searches in MySQL. We also produced condensed files containing all relevant information about each student on one line, useful for analysis with Python scripts. We normalize all students' scores by dividing by the maximum possible score for that student for that exam.

Statistical Methods.

Let $N_S^a$ and $N_S^b$ be the numbers of students in years $a$ and $b$ with score $S$. $N$ will always depend upon some other variables as well, such as the grade level, perhaps economic need or race of the student, but we suppress additional indices for the moment so as to focus on the primary variables of scores and time. Let $R^{ab}_{S' \to S}$ be the number of students with score $S'$ in year $a$ who score $S$ in year $b$. The master equation is
Transitions to and from the state with score zero have to be treated separately because they correspond to students who were sick, absent for other reasons, left school, left the country, or have an invalid exam. We do not distinguish between students who show up in the dataset with a zero score and those who do not appear at all. We define
to be the disappearance of students with score S between years a and b.
To obtain a Fokker–Planck equation, assume that R is slowly varying as a function of S, although not slowly varying as a function of δS. Then, to second order in score changes,
This gives
where the forward flow $v_S^{\rightarrow}$ and the forward diffusion $D_S^{\rightarrow}$ are defined by
The forward flow $v_S^{\rightarrow}$ gives the average score change of students with score $S$ in year $a$ who also have a (nonzero) score in year $b$. The diffusion coefficient $D_S^{\rightarrow}$ sets the magnitude of random variations in scores. One can repeat the derivation of Eq. 4 but Taylor expand the second term of Eq. 1 rather than the first. This leads to the reverse flow $v_S^{\leftarrow}$ and reverse diffusion $D_S^{\leftarrow}$:
The reverse flow $v_S^{\leftarrow}$ answers the question “If a student has a score between 80% and 89% in 12th grade, what is the most likely path to have been followed since third grade?” The average of the forward and reverse flows is a current that predicts changes in student numbers without diffusion:
Subtracting Eq. 4 from Eq. 9 gives the identity
which means that the difference between the forward and reverse flows or the sum of the forward and reverse diffusions provides a measure of total diffusion.
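The displayed equations of this derivation are not reproduced in the text above. One set of expressions consistent with the stated definitions is sketched here as an assumption, not as the authors' exact formulas. The arrow superscripts marking forward and reverse quantities, the factor of 1/2 in the diffusion coefficients, the sign conventions, and the restriction of the sums to nonzero scores are choices made for this sketch, and no equation numbers are assigned.

```latex
\begin{align*}
N_S^b - N_S^a &= \sum_{S'} \left[ R^{ab}_{S' \to S} - R^{ab}_{S \to S'} \right]
  && \text{(master equation)} \\
\Delta_S^{ab} &= R^{ab}_{0 \to S} - R^{ab}_{S \to 0}
  && \text{(net appearance from the zero-score state)} \\
N_S^b - N_S^a &\approx \Delta_S^{ab}
  - \frac{\partial}{\partial S}\left[ v_S^{\rightarrow} N_S^a \right]
  + \frac{\partial^2}{\partial S^2}\left[ D_S^{\rightarrow} N_S^a \right]
  && \text{(forward form)} \\
v_S^{\rightarrow} N_S^a &= \sum_{S' \neq 0} (S' - S)\, R^{ab}_{S \to S'}, \qquad
D_S^{\rightarrow} N_S^a = \tfrac{1}{2} \sum_{S' \neq 0} (S' - S)^2\, R^{ab}_{S \to S'} \\
N_S^b - N_S^a &\approx \Delta_S^{ab}
  - \frac{\partial}{\partial S}\left[ v_S^{\leftarrow} N_S^b \right]
  - \frac{\partial^2}{\partial S^2}\left[ D_S^{\leftarrow} N_S^b \right]
  && \text{(reverse form)} \\
v_S^{\leftarrow} N_S^b &= \sum_{S' \neq 0} (S - S')\, R^{ab}_{S' \to S}, \qquad
D_S^{\leftarrow} N_S^b = \tfrac{1}{2} \sum_{S' \neq 0} (S - S')^2\, R^{ab}_{S' \to S} \\
J &= \tfrac{1}{2}\left( v_S^{\rightarrow} N_S^a + v_S^{\leftarrow} N_S^b \right), \qquad
N_S^b - N_S^a \approx \Delta_S^{ab} - \frac{\partial J}{\partial S} \\
J_D &= J - v_S^{\rightarrow} N_S^a
  = \tfrac{1}{2}\left( v_S^{\leftarrow} N_S^b - v_S^{\rightarrow} N_S^a \right)
  \approx -\tfrac{1}{2} \frac{\partial}{\partial S}
    \left[ D_S^{\rightarrow} N_S^a + D_S^{\leftarrow} N_S^b \right]
\end{align*}
```

In this sketch the forward form follows from Taylor expanding the first term of the master equation in the starting score, and the reverse form from expanding the second term in the ending score. Subtracting one from the other gives the last line, which ties the difference between forward and reverse flows to the sum of the forward and reverse diffusions.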
We now can interpret more precisely what we have plotted. Fig. 1 A displays the vector $(\delta t, v_S^{\rightarrow}) N_S^a$, where $\delta t = 1$ year is the horizontal distance from one grade level to the next, with the vector scaled in the vertical direction so that $v_S^{\rightarrow} = 1$ corresponds to the height difference between 0 and 100%. The flow plots show streamlines of the most likely future path of students. Fig. 1 B displays the vector $-(\delta t, v_S^{\leftarrow}) N_S^b$. Fig. 2 A plots triangles of height $J_D$ whose width is proportional to $N_S^a + N_S^b$. Note that $J_D + v_S^{\rightarrow} N_S^a = J$, so the vertical components of the vectors plotted under Flow and Diffusion sum to the average score changes of all students including both the effects of convective flow and diffusion. The vertical arrows in Fig. 2 B are the disappearance rate $\Delta_S^{ab}$. The horizontal arrows are computed similarly, and are obtained from the total number of students in two consecutive years found to be repeating a grade.
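The plotted quantities can be computed from a table of transition counts, as in the following Python sketch. It is an illustration under assumptions rather than the authors' code: the function name, the convention that state 0 holds zero-score or absent students, and the small count table are hypothetical, and the current is taken to be half the sum of the forward and reverse flow terms.

```python
def flow_quantities(R, scores):
    """Forward and reverse flows from a table of transition counts.

    R[i][j]: number of students in state i in year a and state j in year b.
    State 0 is the zero-score/absent state; scores[i] is the score of state i.
    Returns per-state lists for states 1..K-1.
    """
    K = len(R)
    out = {"delta": [], "v_fwd": [], "v_rev": [], "J": [], "J_D": []}
    for s in range(1, K):
        n_a = sum(R[s][j] for j in range(1, K))   # start at s, nonzero score in year b
        n_b = sum(R[i][s] for i in range(1, K))   # end at s, nonzero score in year a
        fwd = sum((scores[j] - scores[s]) * R[s][j] for j in range(1, K))  # v_fwd * n_a
        rev = sum((scores[s] - scores[i]) * R[i][s] for i in range(1, K))  # v_rev * n_b
        J = 0.5 * (fwd + rev)                     # current without diffusion
        out["delta"].append(R[0][s] - R[s][0])    # appeared minus disappeared
        out["v_fwd"].append(fwd / n_a if n_a else 0.0)
        out["v_rev"].append(rev / n_b if n_b else 0.0)
        out["J"].append(J)
        out["J_D"].append(J - fwd)                # so that J_D + v_fwd * n_a = J
    return out


# Invented counts: state 0 = absent, state 1 = 25%, state 2 = 75%.
scores = [0.0, 0.25, 0.75]
R = [[0, 5, 3],
     [4, 60, 40],
     [2, 20, 80]]
q = flow_quantities(R, scores)
print(q)
```

In this invented table the diffusive term is negative for the low state and positive for the high state, meaning that diffusion spreads students away from the center of the distribution while the forward arrows point back toward it.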

Acknowledgments.

We thank Philip Kromer for helping to put the data in normalized form, Stephen Stigler for posing stimulating questions following a presentation, the Texas Education Agency for supplying preliminary data, and the University of Texas Dallas Educational Research Center and the Ray Marshall Center at the University of Texas at Austin for access to final unfiltered data. This work was supported by National Science Foundation Grant DMR 0701373.

References

1. NG van Kampen, Stochastic Processes in Physics and Chemistry (3rd Ed, North-Holland, Amsterdam), pp. 197–200 (2007).
2. R Stone, Demographic Accounting and Model-Building (Organization for Economic Cooperation and Development, Washington, DC, 1971).
3. J Gani, Formulae for projecting enrolments and degrees awarded in universities. J R Stat Soc 126, 400–409 (1963).
4. MG Nicholls, Short term prediction of student numbers in the Victorian secondary education system. Aust New Zealand J Stat 24, 179–190 (1982).
5. C Shah, G Burke, An undergraduate student flow model: Australian higher education. High Educ 37, 359–375 (1999).
6. G Burke, Demographic accounting and modeling: An application to trainee secondary teachers in Victoria. Aust Econ Pap 15, 240–251 (1976).
7. SM Stigler, Statistics on the Table (Harvard Univ Press, Cambridge, MA), pp. 157–188 (1999).
8. WL Sanders, Value-added assessment from student achievement data: Opportunities and hurdles. J Pers Eval Educ 14, 329–339 (2000).
9. H Kupermintz, Teacher effects and teacher effectiveness: A validity investigation of the Tennessee Value Added Assessment System. Educ Eval Policy Anal 25, 287–298 (2003).
10. HR Jordan, RL Mendro, D Weerasinghe, Teacher Effects on Longitudinal Student Achievement: A Report on Research in Progress (CREATE, Dallas, TX, 1997).
11. E Haertel, Using a longitudinal student tracking system to improve the design for public school accountability in California. Available at http://ed.stanford.edu/suse/faculty/haertel/Haertel-Value-Added.pdf (2005).
12. W Stroup, What Bernie Madoff can teach us about accountability in education. Education Weekly 28, 22–23 (2009).
13. LM McNeil, E Coppola, J Radigan, JV Heilig, Avoidable losses: High-stakes accountability and the dropout crisis. Educ Policy Anal Arch 16, 1–45 (2008).
14. D Weerasinghe, How to Compute School and Classroom Effectiveness Indices: The Value-Added Model Implemented in Dallas Independent School District. Office of Institutional Research (Dallas Independent School District, Dallas, TX, 2007).

Information & Authors

Information

Published in

Proceedings of the National Academy of Sciences
Vol. 106 | No. 41
October 13, 2009
PubMed: 19805049

Submission history

Received: December 2, 2008
Published online: October 13, 2009
Published in issue: October 13, 2009

Keywords

  1. Fokker–Planck equation
  2. convection
  3. education

Notes

This article is a PNAS Direct Submission.

Authors

Affiliations

Center for Nonlinear Dynamics and Department of Physics, University of Texas, Austin, TX 78712
D. Bansal
Center for Nonlinear Dynamics and Department of Physics, University of Texas, Austin, TX 78712

Notes

1
To whom correspondence should be addressed. E-mail: [email protected]
Author contributions: M.M. designed research; M.M. and D.B. performed research; M.M. and D.B. analyzed data; and M.M. and D.B. wrote the paper.

Competing Interests

The authors declare no conflict of interest.
