Fatigue and vigilance in medical experts detecting breast cancer

Significance
For over 70 years, researchers have believed that as time on search tasks increases, humans make more errors detecting target “events” (and take longer): a “vigilance decrement.” Previous research has been undertaken in laboratory settings, on tasks where participants had little control over the presentation rate, but has been generalized to real-world scenarios, leading to regulations limiting continuous viewing time in cancer screening. We demonstrate that in a large, controlled study in clinical practice, where readers self-pace their reading and rest breaks, reduced accuracy is not observed. Overall accuracy increases with time on task, with fewer false alarms. Instead of limiting continuous viewing time, work environments for breast screening should allow experts uninterrupted sessions of self-chosen length, thus improving accuracy and reducing unnecessary further tests.

This supplement contains additional details of the study methods (section 1, figure S1), descriptive statistics (section 2, table S1) and full model outputs for the adjusted and unadjusted models (section 3, tables S2-S5). Results for speed of reading are presented using the median rather than the mean, demonstrating that excluding outliers had no impact on the results (section 4, figure S2).
Results for cancer detection rate, recall rate and speed of reading are presented both with and without cases that were examined out of the intended order, demonstrating that this exclusion did not affect the results (section 5, figure S3).

Supplementary Methods
Defining how many cases had been examined in each reading session
Experts examine mammograms in screening practice in predefined sessions. Each session consists of the women screened during a full or half day on one screening mammography machine at one location. Sessions are created alphabetically by surname from lists of eligible women taken from general practitioners' databases. In the radiology software, each session is opened separately and is displayed as a list of women, for each of whom the expert sequentially decides whether to recall her for further tests. The software records any cases that were not examined in the assigned sequential order. After completing a session, experts often immediately start another, which can be done simply and quickly with a click of the mouse. The primary analysis in this paper considers an expert to have kept working continuously, even when switching sessions, as long as they did not spend longer than the predefined break time between decisions for consecutive women.
To create the new sessions, we put the mammograms examined by each expert into chronological order (using the date/time stamp). Individual experts read some mammograms as first expert and others as second expert, so we separated the data for each mammogram into two records, one for each expert. The dataset was sorted by centre, expert and date/time to give a list of the cases read by each expert in chronological order (regardless of whether they were read as first or second expert). The difference between the time stamps of consecutive cases was calculated to give the time taken by that expert for the later case.
The new sessions were then created (with new session numbers) by assigning each case to the same session as the previous case if the time taken was less than the break definition, or to a new session if it was greater. This was done for each value of the break definition (10, 20, 60, 180 and 480 minutes).
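The session-building procedure above can be sketched in Python. This is a minimal illustration under assumptions, not the study's code: the record layout (centre, expert ID, time stamp, case ID) and the function name `assign_sessions` are hypothetical, and a gap exactly equal to the break definition is treated as staying in the same session, a case the text does not specify.

```python
from datetime import datetime, timedelta

# Break definitions (minutes) used in the analysis.
BREAK_DEFINITIONS_MIN = (10, 20, 60, 180, 480)


def assign_sessions(records, break_minutes):
    """Assign new session numbers and positions to reads.

    records: iterable of (centre, expert_id, timestamp, case_id) tuples, one per
    expert read (hypothetical layout). Returns one dict per read, in
    centre/expert/time order, with the session number, position within the
    session and minutes taken (None for the first case of a session).
    """
    gap = timedelta(minutes=break_minutes)
    # Sort by centre, expert and date/time, regardless of first/second reader role.
    ordered = sorted(records, key=lambda r: (r[0], r[1], r[2]))
    out, session_no, position, prev = [], 0, 0, None
    for centre, expert, ts, case_id in ordered:
        same_reader = prev is not None and prev[:2] == (centre, expert)
        # Time taken for this case = gap since the same expert's previous decision.
        taken = ts - prev[2] if same_reader else None
        if taken is None or taken > gap:
            # New reader, or a gap longer than the break definition: new session.
            session_no += 1
            position = 1
            taken = None  # time taken is undefined for the first case of a session
        else:
            position += 1
        out.append({
            "centre": centre, "expert": expert, "case_id": case_id,
            "session": session_no, "position": position,
            "minutes_taken": None if taken is None else taken.total_seconds() / 60,
        })
        prev = (centre, expert, ts)
    return out
```

Running the function once per value in `BREAK_DEFINITIONS_MIN` gives one session structure per break definition; longer definitions merge more consecutive reads into the same session.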
Where mammograms from different original sessions were combined into the same new session, the new session could contain some mammograms read as first expert and others read as second expert. The dataset used for modelling contains each mammogram once, with its session position and outcomes taken from the first expert only.
This also ensures that the mammograms were examined independently without consulting the other expert's decision.
The trial data included a field indicating whether each mammogram was read in the intended order. Mammograms that were not read in the intended order were excluded from the models, as they may be systematically different (for example, occasional difficult cases that were put aside for later review). Mammograms read in the first position of a new session (as defined by each break definition) were also excluded, as the time taken to read them cannot be determined.
It was also necessary to exclude data from centres where it was not possible to distinguish between different experts using the expert ID code in the dataset.
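The exclusion rules above can be expressed as a single filter. This is a sketch under assumptions: the dictionary keys (`centre`, `reader_role`, `in_intended_order`, `position`) and the function name `modelling_dataset` are hypothetical stand-ins for the trial's actual fields, and `position` is assumed to come from a session-assignment step in which position 1 is the first read after a break.

```python
# Hypothetical per-read record layout (not the trial's actual field names):
# {"centre", "case_id", "reader_role" ("first"/"second"),
#  "in_intended_order" (bool), "position" (int, 1 = first in session)}


def modelling_dataset(reads, excluded_centres=frozenset()):
    """Keep one record per mammogram for modelling, applying the exclusions above."""
    kept = []
    for r in reads:
        if r["centre"] in excluded_centres:
            continue  # centres where experts cannot be told apart by ID code
        if r["reader_role"] != "first":
            continue  # session position and outcomes come from the first expert only
        if not r["in_intended_order"]:
            continue  # e.g. difficult cases put aside for later review
        if r["position"] == 1:
            continue  # time taken cannot be determined for the first case in a session
        kept.append(r)
    return kept
```

Keeping only first-expert reads is what guarantees each mammogram appears once and that the decision was made without sight of the other expert's opinion.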
The distributions of the session position numbers used in the models are shown in Figure S1.

Supplementary tables - Descriptive statistics
Excluding mammograms with missing data to identify the expert (71,695) and those that the first expert did not examine in the intended order (a further 52,886) reduced the dataset to 1,069,566 women's mammograms, of which 37 were a second set of mammograms from the same woman.
The number of women included at each session position depends on the session definition and is shown in figure S1. Using the first expert's decisions only, the recall rate was 4.8% and the cancer detection rate 7.4 per 1,000 women screened.
There were 410 experts in the study, identified by their pseudonymised login to the computer system at each breast screening centre. However, only 360 pseudonymised codes were unique across the whole dataset, because some codes appeared at more than one centre. It was not possible to determine whether the same expert had worked at two different breast screening centres or whether the shared code was a coincidence, so we conservatively report only 360 readers.
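The rates and counts above follow simple definitions, sketched here for checking. The function names are hypothetical; the rates are computed on whatever denominator is supplied (here, women screened, with the first expert's decision), and the reconciliation line is only arithmetic on the reported counts, giving the number of mammogram sets implied before the two exclusions (1,194,147).

```python
def recall_rate_percent(n_recalled, n_screened):
    """Recall rate as a percentage of women screened."""
    return 100 * n_recalled / n_screened


def cancer_detection_per_1000(n_cancers, n_screened):
    """Cancer detection rate per 1,000 women screened."""
    return 1000 * n_cancers / n_screened


# Reconciling the counts reported above: mammogram sets before the two exclusions.
REMAINING = 1_069_566
EXCLUDED_NO_EXPERT_ID = 71_695
EXCLUDED_OUT_OF_ORDER = 52_886
IMPLIED_BEFORE_EXCLUSIONS = REMAINING + EXCLUDED_NO_EXPERT_ID + EXCLUDED_OUT_OF_ORDER
```

For example, 48 recalls among 1,000 women screened gives a recall rate of 4.8%.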
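The difference between the two reader counts comes from counting logins per centre versus counting distinct codes across the whole dataset. A minimal sketch, assuming a hypothetical list of (centre, pseudonymised code) pairs; the function name `count_readers` is illustrative only.

```python
def count_readers(logins):
    """Count readers two ways from (centre, pseudonymised_code) pairs.

    Returns (per_centre_count, unique_code_count): a code seen at two centres
    counts twice in the first figure and once in the conservative second figure.
    """
    pairs = set(logins)  # one entry per distinct login at each centre
    per_centre = len(pairs)
    unique_codes = len({code for _, code in pairs})
    return per_centre, unique_codes
```

Applied to the study logins, these two counts correspond to the 410 and 360 figures reported above.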
Descriptive statistics of the three outcomes under the different thresholds are shown in Table S1.
Table S1. Descriptive statistics of the outcomes for each threshold.

Supplementary tables - Describing the models
The model coefficients for the main model, adjusted for the woman's age and whether she had previously attended, are given in table S2, with fitted values in table S3. The coefficients of an unadjusted but otherwise equivalent model are given in table S4, with fitted values in table S5.
Table S2. Model coefficients (with 95% confidence intervals) and random effect standard deviations for the models of the three outcomes at the different break definitions. For recall and cancer detection, the coefficients from the logistic models are shown as odds ratios (OR). For time taken to read, the coefficients are shown on the linear scale. The models for recall and time taken used a linear basis spline for session position, with knots at positions 20 and 40. Instead of the spline coefficients, the gradient of each of the three line segments is shown, as an OR for recall and on the linear scale for time taken. Each gradient is given per five session positions rather than per one. Age was standardised in the models; the age coefficients shown have been rescaled (divided by the standard deviation of age, before conversion to the odds ratio scale) to show the effect of a one-year increase in age on the outcome. The intercept terms should be interpreted as the outcome at session position two for a mammogram at the mean age that is incident (not the woman's first mammogram). The coefficient for a prevalent (the woman's first) rather than an incident mammogram is also shown in the tables. The standard deviation of the random effects at each level is abbreviated to "RE SD". The cancer detection and recall results are shown to three decimal places (except for the cancer detection intercept and session position), and the time taken results to three significant digits.
Table S3. Fitted values, with 95% confidence intervals, for selected session positions from the models for the different break definitions, for all three outcomes. The session positions listed are chosen from throughout the range and include the knot points of the session-position basis spline used in the models (positions 20 and 40).
Table S4. Model coefficients (with 95% confidence intervals) and random effect standard deviations for the models of the three outcomes at the different break definitions for sessions, not adjusted for age and prevalence status. For recall and cancer detection, the coefficients from the logistic models are shown as odds ratios (OR). For time taken to read, the coefficients are shown on the linear scale. The models for recall and time taken used a linear basis spline for session position, with knots at positions 20 and 40. Instead of the spline coefficients, the gradient of each of the three line segments is shown, as an OR for recall and on the linear scale for time taken. Each gradient is given per five session positions rather than per one. The intercept terms should be interpreted as the outcome at session position two. The standard deviation of the random effects at each level is abbreviated to "RE SD". The cancer detection and recall results are shown to three decimal places (except for the cancer detection intercept and session position), and the time taken results to three significant digits.
Table S5. Fitted values, with 95% confidence intervals, for selected session positions from the models for the different thresholds not adjusted for age and prevalence status, for all three outcomes. The session positions listed are chosen from throughout the range and include the knot points of the session-position basis spline used in the recall and time taken models (positions 20 and 40).
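The coefficient conversions described in the table captions can be sketched as follows. This assumes a truncated-power parameterisation of the piecewise-linear spline (columns x, (x - 20)+, (x - 40)+), which spans the same space as a degree-1 B-spline basis with the same knots but has different individual coefficients; under this assumption the slope of each segment is the running sum of the coefficients. All function names are hypothetical, and the inputs are coefficients on the model's link scale.

```python
import math

KNOTS = (20, 40)  # knot positions used for session position


def linear_spline_basis(position, knots=KNOTS):
    """Truncated-power basis for a piecewise-linear spline: [x, (x-k1)+, (x-k2)+]."""
    return [position] + [max(0.0, position - k) for k in knots]


def segment_slopes(betas):
    """Per-position slope of each segment (below 20, 20 to 40, above 40)."""
    slopes, total = [], 0.0
    for b in betas:
        total += b  # each knot adds its coefficient to the previous slope
        slopes.append(total)
    return slopes


def recall_or_per_five_positions(betas):
    """Odds ratio for a five-position increase within each segment (logistic model)."""
    return [math.exp(5 * s) for s in segment_slopes(betas)]


def time_change_per_five_positions(betas):
    """Change in time taken over five positions within each segment (linear model)."""
    return [5 * s for s in segment_slopes(betas)]


def or_per_year_of_age(beta_standardised, age_sd):
    """Rescale a standardised-age log-odds coefficient to one year, then exponentiate."""
    return math.exp(beta_standardised / age_sd)
```

Dividing by the standard deviation before exponentiating matters: exponentiating first and then dividing would not give an odds ratio per year.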

Median speed of reading
The primary measure of the time taken to examine each case was the mean, with cases taking longer than 10 minutes excluded so that the mean was not overly influenced by the tail of the distribution. The median time taken, with no upper limit, is shown in figure S2. It shows the same pattern of decreasing time taken per case with increasing time on task. Only mammograms read in the intended order are included, and those with a time taken to read of zero are excluded, but there is no upper limit on the time taken.
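The two summaries can be computed per session position as follows. This is a sketch under assumptions: the input layout (session position, minutes taken) and the name `time_summaries` are hypothetical, and applying the zero-time exclusion to the median only (not to the capped mean) is an assumption, since the text states it only for the median figure.

```python
from collections import defaultdict
from statistics import mean, median


def time_summaries(reads, mean_cap_minutes=10):
    """reads: iterable of (session_position, minutes_taken) pairs (hypothetical layout).

    Returns {position: (capped_mean, median)}: the mean drops cases taking longer
    than the cap; the median drops only zero times and has no upper limit.
    """
    by_pos = defaultdict(list)
    for pos, minutes in reads:
        by_pos[pos].append(minutes)
    out = {}
    for pos in sorted(by_pos):
        times = by_pos[pos]
        capped = [t for t in times if t <= mean_cap_minutes]
        nonzero = [t for t in times if t > 0]
        out[pos] = (mean(capped) if capped else None,
                    median(nonzero) if nonzero else None)
    return out
```

Because the median is insensitive to a few very long reads, it can be computed without any upper cutoff, which is why it serves as a check on the capped mean.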

Models including cases examined out of intended order
Figure S3. Cancer detection rate, recall rate and time taken to examine each woman's mammograms, with and without cases examined out of the intended order. Time on task is represented by the number of women's mammograms examined consecutively without a break. A break is defined as 20 minutes without inputting a decision on the computer. Models were adjusted for the woman's age and whether she had previously attended screening, with clustering for expert and screening centre.

Break duration
To investigate a possible interaction between the length of the reader's breaks and the vigilance decrement or improvement, outcomes were plotted for sessions starting after a short, a moderate or a long break (figure S4). This shows that specificity and criterion are lower after a long break of >12 hours than after a short break of <1 hour.
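Sessions can be grouped by the length of the break that precedes them, using the categories in figure S4. A minimal sketch: the function names are hypothetical, and the handling of values exactly at 1, 3 and 12 hours is an assumption because the categories do not specify it.

```python
from datetime import datetime


def preceding_break_hours(last_decision_before, first_decision_of_session):
    """Hours between the last decision of one session and the first of the next."""
    return (first_decision_of_session - last_decision_before).total_seconds() / 3600


def break_category(hours):
    """Classify the break preceding a session (boundary handling is an assumption)."""
    if hours < 1:
        return "short (<1 h)"
    if hours <= 3:
        return "moderate (1-3 h)"
    if hours <= 12:
        return "moderate (3-12 h)"
    return "long (>12 h)"
```

With a 20-minute session definition, every preceding break is at least 20 minutes long, so the short category covers breaks of 20 minutes to 1 hour.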

Time taken to read
To investigate the difference between the mean and median time taken to examine each woman's mammograms, the distribution of the time taken to read was plotted using a 20-minute threshold (figure S5).

Figure S1. Cumulative distribution of women in the dataset by number of women's

Figure S2. Median time taken to read for each position in a reading session, by break definition.

Figure S4. Outcomes for sessions starting after a short break of <1 hour, a moderate break of 1 to 3 or 3 to 12 hours, or a long break of >12 hours, which represents the next working day. In all definitions, a session is considered ended after more than 20 minutes without entering an opinion.

Figure S5. Histogram of time taken to read, using a 20-minute threshold; horizontal axis limited to 180 s.