Research Article

Classroom sound can be used to classify teaching practices in college science courses

Melinda T. Owens, Shannon B. Seidel, Mike Wong, Travis E. Bejines, Susanne Lietz, Joseph R. Perez, Shangheng Sit, Zahur-Saleh Subedar, Gigi N. Acker, Susan F. Akana, Brad Balukjian, Hilary P. Benton, J. R. Blair, Segal M. Boaz, Katharyn E. Boyer, Jason B. Bram, Laura W. Burrus, Dana T. Byrd, Natalia Caporale, Edward J. Carpenter, Yee-Hung Mark Chan, Lily Chen, Amy Chovnick, Diana S. Chu, Bryan K. Clarkson, Sara E. Cooper, Catherine Creech, Karen D. Crow, José R. de la Torre, Wilfred F. Denetclaw, Kathleen E. Duncan, Amy S. Edwards, Karen L. Erickson, Megumi Fuse, Joseph J. Gorga, Brinda Govindan, L. Jeanette Green, Paul Z. Hankamp, Holly E. Harris, Zheng-Hui He, Stephen Ingalls, Peter D. Ingmire, J. Rebecca Jacobs, Mark Kamakea, Rhea R. Kimpo, Jonathan D. Knight, Sara K. Krause, Lori E. Krueger, Terrye L. Light, Lance Lund, Leticia M. Márquez-Magaña, Briana K. McCarthy, Linda J. McPheron, Vanessa C. Miller-Sims, Christopher A. Moffatt, Pamela C. Muick, Paul H. Nagami, Gloria L. Nusse, Kristine M. Okimura, Sally G. Pasion, Robert Patterson, Pleuni S. Pennings, Blake Riggs, Joseph Romeo, Scott W. Roy, Tatiane Russo-Tait, Lisa M. Schultheis, Lakshmikanta Sengupta, Rachel Small, Greg S. Spicer, Jonathon H. Stillman, Andrea Swei, Jennifer M. Wade, Steven B. Waters, Steven L. Weinstein, Julia K. Willsie, Diana W. Wright, Colin D. Harrison, Loretta A. Kelley, Gloriana Trujillo, Carmen R. Domingo, Jeffrey N. Schinske, and Kimberly D. Tanner
PNAS March 21, 2017 114 (12) 3085-3090; first published March 6, 2017; https://doi.org/10.1073/pnas.1618693114
Author affiliations:

aDepartment of Biology, San Francisco State University, San Francisco, CA 94132; bDepartment of Biology, Pacific Lutheran University, Tacoma, WA 98447; cCenter for Computing for Life Sciences, San Francisco State University, San Francisco, CA 94132; dDepartment of Biology, De Anza College, Cupertino, CA 95014; eNutrition, Food Science, and Packaging Department, San Jose State University, San Jose, CA 95192; fBiology Department, City College of San Francisco, San Francisco, CA 94112; gBiology Department, Laney College, Oakland, CA 94607; hDepartment of Biology, Foothill College, Los Altos Hills, CA 94022; iBiology Department, Las Positas College, Livermore, CA 94551; jRomberg Tiburon Center for Environmental Studies, San Francisco State University, Tiburon, CA 94920; kDepartment of Neurobiology, Physiology, and Behavior, University of California, Davis, CA 95616; lDepartment of Biological Science, Diablo Valley College, Pleasant Hill, CA 94523; mDepartment of Biology, Portland Community College, Portland, OR 97219; nMath and Sciences Department, Diablo Valley College, San Ramon, CA 94582; oScience and Technology Division, Cañada College, Redwood City, CA 94061; pBiology Department, College of San Mateo, San Mateo, CA 94402; qDivision of Undergraduate Education and Academic Planning, San Francisco State University, San Francisco, CA 94132; rLife Science Department, Chabot College, Hayward, CA 94545; sScience/Mathematics/Technology Division, Skyline College, San Bruno, CA 94066; tLife Sciences Department, Palomar College, San Marcos, CA 92069; uBiology Department, Solano Community College, Fairfield, CA 94534; vDepartment of Biological Sciences, California State University, Sacramento, CA 95819; wBiology Department, Los Medanos College, Pittsburg, CA 94565; xScience Department, Berkeley City College, Berkeley, CA 94704; yBiological Sciences Department, Contra Costa College, San Pablo, CA 94806; zDepartment of Biological Science, Holy Names University, Oakland, CA 94619; aaDepartment of Earth and Climate Sciences, San Francisco State University, San Francisco, CA 94132; bbDepartment of Curriculum and Instruction, STEM Education, University of Texas at Austin, Austin, TX 78712; ccDepartment of Chemistry and Biochemistry, San Francisco State University, San Francisco, CA 94132; ddDepartment of Biology, University of San Francisco, San Francisco, CA 94117; eeBiological, Health & Environmental Sciences Division, De Anza College, Cupertino, CA 95014; ffSchool of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332; ggKelley, Petterson, and Associates, Inc., San Francisco, CA 94127; hhOffice of the Vice Provost for Teaching and Learning, Stanford University, Stanford, CA 94305

For correspondence: kdtanner@sfsu.edu
Edited by Bruce Alberts, University of California, San Francisco, CA, and approved January 31, 2017 (received for review November 20, 2016)


Significance

Although the United States needs to expand its STEM (science, technology, engineering, mathematics) workforce, United States postsecondary institutions struggle to retain and effectively teach students in STEM disciplines. Using teaching techniques beyond lecture, such as pair discussions and reflective writing, has been shown to boost student learning, but it is unknown what proportion of STEM faculty use these active-learning pedagogies. Here we describe DART: Decibel Analysis for Research in Teaching, a machine-learning–derived algorithm that analyzes classroom sound to predict with high accuracy the learning activities used in classrooms, and its application to thousands of class session recordings. DART can be used for large-scale examinations of STEM teaching practices, evaluating the extent to which educators maximize opportunities for effective STEM learning.

Abstract

Active-learning pedagogies have been repeatedly demonstrated to produce superior learning gains with large effect sizes compared with lecture-based pedagogies. Shifting large numbers of college science, technology, engineering, and mathematics (STEM) faculty to include any active learning in their teaching may retain and more effectively educate far more students than having a few faculty completely transform their teaching, but the extent to which STEM faculty are changing their teaching methods is unclear. Here, we describe the development and application of the machine-learning–derived algorithm Decibel Analysis for Research in Teaching (DART), which can analyze thousands of hours of STEM course audio recordings quickly, with minimal costs, and without need for human observers. DART analyzes the volume and variance of classroom recordings to predict the quantity of time spent on single voice (e.g., lecture), multiple voice (e.g., pair discussion), and no voice (e.g., clicker question thinking) activities. Applying DART to 1,486 recordings of class sessions from 67 courses, a total of 1,720 h of audio, revealed varied patterns of lecture (single voice) and nonlecture activity (multiple and no voice) use. We also found that there was significantly more use of multiple and no voice strategies in courses for STEM majors compared with courses for non-STEM majors, indicating that DART can be used to compare teaching strategies in different types of courses. Therefore, DART has the potential to systematically inventory the presence of active learning with ∼90% accuracy across thousands of courses in diverse settings with minimal effort.

  • active learning
  • evidence-based teaching
  • science education
  • lecture
  • assessment

Current college STEM (science, technology, engineering, and mathematics) teaching in the United States continues to be lecture-based and is relatively ineffective in promoting learning (1, 2). Undergraduate instructors continue to struggle to engage, effectively teach, and retain postsecondary students, both generally and particularly among women and students of color (3, 4). Federal analyses suggest that a 10% increase in retention of undergraduate STEM students could address anticipated STEM workforce shortfalls (5). Replacing the standard lecture format with more active teaching strategies has been shown to increase retention, and hundreds of millions of dollars have been invested by national and federal agencies to this end (2). Even for those students retained in STEM, active-learning pedagogies have been repeatedly demonstrated to produce superior learning gains with large effect sizes compared with lecture-based pedagogies (6–9). All of the evidence suggests that shifting large numbers of STEM faculty to include even small amounts of active learning in their teaching may retain and more effectively educate far more students than having a few faculty completely transform their teaching (10).

The extent to which large numbers of STEM faculty are changing their teaching methods to include active learning is unclear. What proportion of United States STEM faculty use anything but lecture with question/answer (Q/A) of individual students? What is the probability that a student would encounter any active learning across all STEM courses in a single department or institution? To address these questions, one would need a measurement tool that could systematically inventory the presence and frequency of active learning not only in one course but also across dozens of departmental courses, multiple STEM departments, and thousands of colleges and universities. Currently available classroom observation tools [e.g., Teaching Dimensions Observation Protocol (TDOP), Reformed Teaching Observation Protocol (RTOP), Classroom Observation Protocol for Undergraduate STEM (COPUS), Practical Observation Rubric To Assess Active Learning (PORTAAL)] (11–14) require trained human observers and are not feasible for addressing questions at this scale. Previous research into automatic classification of classroom activities largely focuses on K–12 education and has required special recording equipment (15, 16), analyzed small numbers of teachers (17–19), or not focused on active-learning pedagogies (17), making these methods insufficient for large-scale analysis of the presence of active learning in college classrooms.

To meet this need, we developed DART: Decibel Analysis for Research in Teaching. DART is a machine-learning–based algorithm that can rapidly analyze thousands of audio-recorded class sessions per day, with minimal costs and without need for human observers, to measure the use of teaching strategies beyond traditional lecture in undergraduate STEM courses. Below we describe the development and validation of DART and report results from over 60 STEM courses drawn from community colleges and a 4-y university.

Results

Our key insight from observations of classroom environments was that nonlecture activities are typically associated with either unusually high noise levels (e.g., pair discussions, small group discussions) or unusually low noise levels (e.g., individual clicker question response, minute paper writing). This suggests that variation in the sound level of a classroom may indicate variation in teaching strategies. To test this hypothesis, an initial 45 audio recordings from 8 instructors teaching different courses (Table 1, pilot group) were analyzed by extracting audio decibel levels at a 2-Hz sampling rate (every 0.5 s) and graphing sound waveforms. To analyze DART’s performance in diverse teaching settings, these instructors were purposefully drawn from an atypical pool consisting of people from many different institutions who had undergone over 40 h of professional development in scientific teaching. To determine if patterns of variation in waveforms correlated with activity types, a three-person team listened to all recorded class sessions and individually annotated them using six emergent annotation codes (lecture with Q/A, discussion, silent, video, transition, and other) (Table S1). Sound-level patterns in class sessions primarily using lecture with Q/A were visibly different from the patterns in class sessions with varied learning activities (Fig. 1 A and C).
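To make the sound-extraction step concrete, the following is a minimal Python sketch (ours, not the authors' code) of computing a 2-Hz decibel waveform from a class-session recording; the mono WAV input and the relative (non-SPL) decibel scale are illustrative assumptions:

import numpy as np
from scipy.io import wavfile

def decibel_waveform(path, frame_s=0.5):
    """Return one relative dB value per 0.5-s frame of a mono WAV file."""
    rate, samples = wavfile.read(path)
    samples = samples.astype(np.float64)
    n = int(rate * frame_s)                    # audio samples per 0.5-s frame
    n_frames = len(samples) // n
    frames = samples[: n_frames * n].reshape(n_frames, n)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return 20 * np.log10(rms + 1e-12)          # dB relative to full scale

Plotting the returned series against time reproduces the kind of sound waveform graphs described above.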

Table 1.

Overview of DART study participants

Fig. 1.

Sound analysis can differentiate lecture and nonlecture classroom activities. All: Sound levels over time sampled at 2 Hz, with each tickmark indicating 2 min. Typical results are shown. (A) Class session with mostly lecture (94 min) with human annotation codes indicated above the waveform. (B) Background color indicates DART prediction for the recording shown in A. (C) Class session with varied learning activities (108 min) with human annotation codes indicated. (D) Background colors indicate DART predictions for recording in C. (E) DART prediction, small class (n = 15 students; 98 min). (F) DART prediction, large class (n = 287 students; 49 min). (G) Examples of DART learning activity footprints from different class sessions: thinking, writing, or clicker response; pair or group discussion; lecture; think-pair-share.

Table S1.

Description of human annotation codes

Developing an Algorithm to Automate the Classification of Classroom Noise.

To develop DART, human annotations were used to design and optimize a machine-learning–based algorithm that reports what types of activities are going on in a classroom based on sound waveforms. To do this task, we applied methods from the field of audio segmentation, which applies machine learning to classify sound into different categories based on statistical characterizations (20). Because some of the human annotation categories yielded waveforms that were statistically similar to each other, we collapsed the six human annotation categories into four activity prediction modes with distinct waveform profiles: single voice, multiple voice, no voice, and other. Lecture with Q/A and video were aggregated into the mode “single voice”; discussion and transition were aggregated into the mode “multiple voice”; silent was assigned to the mode “no voice”; and other was assigned to the mode “other” (Table S1).

To prepare the classroom audio-recording waveforms for the optimization procedure, we tagged each 0.5-s sample of sound from each recording from the pilot group (640,152 samples in total) with three pieces of data: its label from human annotation (S for single voice, M for multiple voice, or N for no voice), the normalized mean volume of the 15-s window of audio around it, and the normalized SD in that window’s volume (Fig. S1A). Both the mean volume and the SD of the volume of each sample were normalized with respect to their class session.
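As a rough illustration of this tagging step (a sketch under stated assumptions, not the authors' implementation), the function below computes both features for every 0.5-s sample, taking db as the 2-Hz decibel series of one class session; the 31-sample window (15 s at 2 Hz), edge padding, and z-score normalization are our assumptions, since the text specifies only a 15-s window and per-session normalization:

import numpy as np

def window_features(db, win_samples=31):       # 15-s window at 2 Hz
    db = np.asarray(db, dtype=float)
    half = win_samples // 2
    padded = np.pad(db, half, mode="edge")     # assumption: extend edges
    means = np.array([padded[i:i + win_samples].mean() for i in range(len(db))])
    sds = np.array([padded[i:i + win_samples].std() for i in range(len(db))])
    z = lambda x: (x - x.mean()) / x.std()     # normalize per class session
    return z(means), z(sds)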

Fig. S1.

Using machine learning to optimize the DART algorithm for classifying classroom noise as single voice, multiple voice, or no voice. (A) Each 0.5-s sample from each recording from the pilot group was tagged with its label from human annotation (S for single voice, M for multiple voice, or N for no voice), the mean volume of the 15-s window of audio around it, and the SD (std) in that window’s volume. Mean volume and SD were normalized with respect to their class session. (B) Ensemble of binary decision trees used to classify classroom audio recordings. (C and D) Optimizing parameters for identifying the nature of classroom noise samples using 10-fold stratified cross-validation with grid search. The example below shows the process of optimizing parameters for classifying samples as single voice. (C) Samples were sorted into single voice (n = 493,862) and nonsingle voice (n = 146,290) based on human annotation and further randomly and equally divided into 10 groups each (S1–S10 and NS1–NS10). These groups were recombined 10 times to make 10 folds, each of which contained all of the data. Each fold had a different pair of groups (i.e., S1/NS1 or S2/NS2) designated as the test set, with all other groups forming the validation set. These folds were all tested using the grid search method that empirically tested all volume and SD parameters and measured error for each of these parameter sets. (D) Grid search for choosing cut-off parameters for classifying samples as either belonging to a given annotation category or not. Different combinations of mean volume in window and SD of the window volume were tried as cut-off parameters on each of the 10 folds. The error rates (percentage of samples where the computer and human annotations did not match) for the validation set and the test set were calculated and are represented as heat maps, with red showing high validation error and blue showing low validation error for each fold. The parameters were first tested at a low resolution (0.5-SD intervals), and the parameters that yielded the lowest validation error were then explored at a higher resolution (0.01-SD intervals). The combination of cut-offs for mean volume and mean SD of volume with the lowest average validation error over all folds was selected for the final version of the DART algorithm. The test error was an estimate of generalized model performance.

Then, to sort the samples into the four prediction modes (single voice, multiple voice, no voice, and other), we used an ensemble of binary decision trees composed of four nodes connected serially. A binary decision tree is a series of decisions to either sort or not sort a given input into a certain category based on the values of the input. Here, the inputs were the 0.5-s samples of classroom audio, and the sorting decisions were based on each sample’s normalized mean volume and SD of the volume. In our tree, each node represented one activity prediction mode, and the nodes for each mode were connected in order of decreasing frequency from the pilot data, so that the dominant class activity (single voice) was detected first and less-frequent class activities followed (multiple voice, no voice, and other, in that order) (Fig. S1B). This ordering emphasized the importance of predicting the common activities correctly while allowing some prediction flexibility for the less-frequent activities.
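In code, such a serial cascade of binary decisions might look like the sketch below; the node ordering follows the text, but the decision functions and thresholds are hypothetical placeholders standing in for the grid-searched cut-offs described next:

def classify_sample(mean_v, sd_v, nodes):
    # nodes: ordered (mode, decision function) pairs; the first node whose
    # decision accepts the sample determines its prediction mode.
    for mode, accepts in nodes:
        if accepts(mean_v, sd_v):
            return mode
    return "other"                              # fall-through catch-all

# Hypothetical cut-offs for illustration only (real values came from grid search):
example_nodes = [
    ("single voice",   lambda m, s: s > 0.5),   # moderate volume, high variance
    ("multiple voice", lambda m, s: m > 0.5),   # loud, steady
    ("no voice",       lambda m, s: m < -0.5),  # quiet, steady
]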

Next, we optimized the selection parameters that would determine which audio samples were sorted into which activity modes. To accomplish this, we used machine learning, specifically grid search (Fig. S1 C and D). Grid search is a brute-force method that selects the optimal parameters for each mode by first evaluating each possible combination of the two selection parameters, the normalized average volume and the normalized average SD, and then choosing the pair of parameter values that yielded the model with the best match to human annotation, defined as the fewest errors. This grid search process was conducted three times—once each for single voice, multiple voice, and no voice—to find the optimal parameters for each activity prediction mode. For more details of the development of the DART algorithm, refer to SI Methods, Development of DART Algorithm with Machine Learning.
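A sketch of the coarse stage of such a grid search for one node follows; here X is an (n, 2) array of the normalized [mean volume, SD] features, y marks the samples humans assigned to this node's mode, and the decide function, grid range, and 0.5-SD step are illustrative assumptions:

import itertools
import numpy as np

def grid_search_node(X, y, decide, step=0.5):
    # decide(X, v_cut, s_cut) -> boolean predictions for this node's mode.
    cuts = np.arange(-3.0, 3.0 + step, step)
    best_params, best_err = None, np.inf
    for v_cut, s_cut in itertools.product(cuts, cuts):
        err = np.mean(decide(X, v_cut, s_cut) != y)  # mismatch with humans
        if err < best_err:
            best_params, best_err = (v_cut, s_cut), err
    return best_params, best_err

In the procedure described in the paper, the best coarse cut-offs are then refined at 0.01-SD resolution.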

We found that the resulting algorithm, DART, is able to classify each 0.5-s sample of a recording into one of three DART prediction modes: single voice, multiple voice, or no voice. (The final algorithm never categorizes samples as other, probably because the human annotation “other” was assigned only 0.9% of the time to a variety of instances that were difficult to categorize in the pilot data.) Single-voice samples, characterized by one person speaking at a time (e.g., lecture, question/answer, and so forth), were of average volume but high variance. Single voice typically indicated nonactive teaching strategies given that only a single active voice was heard, with all other individuals passively listening. In contrast, multiple-voice samples, characterized by many people speaking simultaneously (e.g., pair discussions), were of high mean volume and low variance. No-voice samples, characterized by quiet throughout the classroom (e.g., silent writing), were of low mean volume and low variance. As verified by human annotations, multiple and no voice generally indicated active learning because many or all students were actively engaged in a task.

DART Classifies Classroom Noise with High Accuracy.

To assess the accuracy of DART, we compared DART’s classifications of classroom noise to the human annotations in various ways, both in the original dataset of 45 class sessions collected from 8 instructors and in a new, larger dataset comprising 1,486 class sessions collected from 49 instructors, representing 67 courses taught across 15 community colleges and a 4-y university, a total of 1,720 h of recordings (Table 1). Qualitatively, we saw that DART was able to differentiate between lecture and nonlecture classroom activities. For example, DART predicted a class session that was annotated as 98% lecture with Q/A to be solely single voice (Fig. 1 A and B) and a class session with varied activities, like silent writing and discussion, to have a variety of modes (Fig. 1 C and D). DART identification of varied learning activities was robust in both small and large classes (Fig. 1 E and F). Its predictions reveal that waveform “footprints” are indicative of specific teaching techniques (Fig. 1G). For example, the common active-learning technique “think-pair-share” actually consists of three distinct activities in response to an instructor’s question to the class: first students silently think or write about the answer, then they discuss it in pairs or small groups, and finally some students share their responses individually with the class. A human would annotate these three phases, in order, as silent, discussion, and lecture with Q/A. Similarly, DART assigns no voice (think), multiple voice (pair), and single voice (share) (Fig. 1G).

We also assessed DART’s accuracy quantitatively by measuring how often DART predictions matched the human annotations. In the original dataset used for optimizing the algorithm, DART classification matched the human annotations 90% of the time across all modes. In comparison, human annotators agreed with each other only 93% of the time, showing that DART was almost as accurate at identifying classroom activities as human annotators were. To see if this high rate of accuracy was retained in a new context, we randomly chose one class session from each of the 67 courses recorded as part of the new, larger dataset, performed human annotation, and compared DART’s classifications to the human annotation. We again obtained a very high accuracy of 87%, suggesting that DART can be accurately applied to many different classroom contexts.
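The agreement measure itself is simple; a minimal sketch, assuming two equal-length arrays of per-sample mode labels:

import numpy as np

def agreement(dart_labels, human_labels):
    # Fraction of 0.5-s samples where DART matches the human annotation.
    return np.mean(np.asarray(dart_labels) == np.asarray(human_labels))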

To further assess DART’s ability to discern the presence of activities that may indicate active learning or traditional lecture, we used signal-detection theory to analyze DART’s accuracy by mode. In the original dataset, we discriminated for each mode (single voice, multiple voice, and no voice) between correct inclusions (hits) and incorrect exclusions (misses) (21). We also used this method to determine the rates of correct exclusions (correct rejections) and incorrect inclusions (false alarms) for each of the three modes (21). The results are given in Fig. 2. DART correctly identifies nearly all instances of lecture and Q/A as single voice (hit rate = 98.0%) (Fig. 2A). In addition, the false-alarm rates for multiple voice and no voice are low (2.3% and <0.1%, respectively) (Fig. 2 B and C). Combined, these rates mean that most errors over- rather than underestimate lecture, minimizing the potential for falsely indicating the presence of active learning in class sessions.
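A sketch of these per-mode quantities, treating the human annotation as ground truth and using the conventional z-transform definition of d′ (in practice, hit or false-alarm rates of exactly 0 or 1 need the usual edge correction before the transform):

import numpy as np
from scipy.stats import norm

def detection_stats(pred, truth, mode):
    pred, truth = np.asarray(pred), np.asarray(truth)
    hit_rate = np.mean(pred[truth == mode] == mode)   # hits vs. misses
    fa_rate = np.mean(pred[truth != mode] == mode)    # false alarms vs. correct rejections
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)  # sensitivity index
    return hit_rate, fa_rate, d_prime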

Fig. 2.

DART accurately identifies single voice and conservatively estimates multiple and no voice. Recordings from eight instructors from two colleges teaching one course each were used to produce this data. Pie charts on the Left show rates for hits (dark purple) and misses (light purple) and on the Right show rates for correct rejections (dark teal) and false alarms (light teal) for each DART mode. Both the number in parentheses and the area of the pie chart represent the proportion of each mode present in human annotations. d′, the sensitivity index, is a measurement of the difference between the signal and noise distributions. (A) Single voice, (B) multiple voice, (C) no voice.

DART Can Be Used to Perform Large-Scale Analysis of Classrooms.

We sought to explore how DART could be used to analyze classroom audio recordings on a larger scale, so we performed DART analysis on the larger dataset consisting of 1,720 h of recordings of 67 courses. DART analysis revealed that a range of instructional strategies was represented in these courses. Although all courses (n = 67) used single voice a majority of the time, ranging from 69 to 100%, among individual class sessions (n = 1,486), time spent in single voice ranged from 15 to 100% (Fig. 3 A and B). Within a course, we observed that the time spent in single voice could vary from 15% in one class session to 90% in another (Fig. 3C). In addition, some instructors who used no multiple or no voice in some class sessions nevertheless spent up to 37% of the time in these categories in another class session of the same course (Fig. 3D). This within-course variability highlights the need for a tool that can efficiently analyze every class session of a course.

Fig. 3.

DART can be used to analyze large numbers of courses. (A) Percentage of absolute time spent in single voice (SV), multiple voice (MV), and no voice (NV) for all eligible courses (n = 67). Courses ordered in increasing order of single voice percentage. Boxes indicate minimum and maximum percentages spent in single voice. (B) Percentage of absolute time spent in various modes for all class sessions from eligible courses (n = 1,486). Class sessions ordered in increasing order of single voice. Boxes indicate minimum and maximum percentages spent in single voice. (C and D) Percentage of time spent in multiple or no voice in each class session in time order for two representative courses, course 1 and course 2. (E) Proportion of courses where all class sessions have some multiple or no voice (<100% single voice) (Left) and where at least half of all class sessions have some multiple or no voice (Right). (F) Average time spent in multiple or no voice for courses with one female (n = 36) or male (n = 26) instructor (cotaught courses excluded). Error bars represent SE. n.s.: P = 0.10. (G) Average time spent in multiple or no voice for biology majors’ (n = 32) and nonbiology majors’ (n = 35) courses. Error bars represent SE. *P = 0.01.

To determine the likelihood that a student experienced active learning in any one of these courses, we calculated the percentage of class sessions within each course that included any multiple or no voice (<100% single voice). Whereas only 31% of the courses had multiple or no-voice activities in all class sessions, 88% of courses had multiple or no-voice activities in at least half of their class sessions (Fig. 3E), indicating that many of these instructors are using active-learning strategies, which is likely unusual among undergraduate STEM instructors.
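A minimal sketch of this course-level summary, assuming a dictionary mapping each course ID to the per-session single-voice fractions produced by DART:

def share_of_active_sessions(courses):
    # For each course, the fraction of class sessions with any multiple or
    # no voice, i.e., less than 100% single voice.
    return {course: sum(f < 1.0 for f in fracs) / len(fracs)
            for course, fracs in courses.items()}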

DART also has the potential to reveal differences in how courses are taught across instructors and courses in particular departments or institutions. In this course sample, we found that the percentage of time spent in multiple or no voice did not vary by instructor gender (n = 36 female, n = 26 male; P = 0.10) but was significantly higher in courses for biology majors (n = 32) than for nonbiology majors (n = 35; P = 0.01) (Fig. 3 F and G).

SI Methods

Development of DART Algorithm with Machine Learning.

To develop DART, the human annotations were used to design and optimize a machine-learning algorithm that reports what types of activities are going on in a classroom based on sound waveforms. As stated in the main text, to do this task we applied methods from the field of audio segmentation, which applies machine learning to classify sound into different categories based on statistical characterizations (15). Because some of the human annotation categories yielded waveforms that were statistically similar to each other, we collapsed the six human annotation categories into four activity prediction modes with distinct waveform profiles: single voice, multiple voice, no voice, and other. Lecture with Q/A and video were aggregated into the mode single voice, discussion and transition were aggregated into the mode multiple voice, silent was assigned to the mode no voice, and other was assigned to the mode other (Table S1).

To prepare the classroom audio recording waveforms for the optimization procedure, we tagged each 0.5-s sample of sound from each recording from the pilot group (640,152 samples in total) with three pieces of data: its label from human annotation (S for single voice, M for multiple voice, or N for no voice), the normalized mean volume of the 15-s window of audio around it, and the normalized SD in that window’s volume (Fig. S1A). Both the mean volume and the SD of the volume of each sample were normalized with respect to their class session.

Then, to sort the samples into the four prediction modes (single voice, multiple voice, no voice, and other), we used an ensemble of binary decision trees comprised of four nodes connected serially. A binary decision tree is a series of decisions to either sort or not sort a given input into a certain category based on the values of the input. Here, the inputs were the 0.5-s samples of classroom audio, and the sorting decisions were based on each sample’s normalized mean volume and SD of the volume. In our tree, each node represented one activity prediction mode, and the nodes for each mode were connected in order of decreasing frequency from the pilot data, so that the dominant class activity (single voice) was detected first, and less-frequent class activities follow (multiple voice, no voice, and other, in that order) (Fig. S1B). This ordering emphasized the importance of predicting the common activities correctly while allowing some prediction flexibility for the less-frequent activities.

To optimize the cut-off parameters that would sort the audio samples at each node, we used machine learning, specifically 10-fold stratified cross validation with grid search (Fig. S1 C and D). This process was repeated three times to find the optimal cut-off parameters for each node.

We first created stratified folds for cross-validation. To create the folds for the single-voice/nonsingle-voice node, for example, all samples annotated by humans as single voice were equally and randomly divided into 10 single-voice groups (S1–S10), whereas all other samples were equally and randomly divided into 10 nonsingle-voice groups (NS1–NS10) (Fig. S1C). A fold consisted of a set of all 20 of these groups (i.e., all of the pilot data) with one pair of groups, for example S1 and NS1, designated as the “test set,” whereas the remaining 18 groups were designated the “validation set” (Fig. S1C). All 10 such folds were created, each with a different pair of groups designated as the test set (Fig. S1C).

We then performed a grid search to look for the optimal cut-offs for each mode. Different combinations of mean volume in window and SD of the window volume were tried as cut-off parameters on each of the 10 folds (Fig. S1D). For each fold, the error rates (percentage of samples where the computer and human annotations did not match) for the validation set and the test set were calculated. The parameters were first tested at a low resolution (0.5 SD intervals), and the parameters that yielded the lowest validation error were then explored at a higher resolution (0.01 SD intervals). The combination of cut-offs with the lowest average validation error over all 10 folds was selected for DART (Fig. S1D). The test error was used as an estimate of generalized model performance. This approach avoided selecting a model that overfit the data and overestimated prediction performance.
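For illustration, stratified splits of the same flavor could be produced as below; scikit-learn's StratifiedKFold is an assumed stand-in for the custom paired-group construction described above (it preserves the single-voice/nonsingle-voice proportions in each split but does not reproduce the exact grouping of Fig. S1C):

from sklearn.model_selection import StratifiedKFold

def stratified_folds(X, y_binary, n_folds=10, seed=0):
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Each split yields (validation indices, held-out test indices) in the
    # paper's terminology; cut-offs are chosen on the validation part and
    # the test part estimates generalized performance.
    return list(skf.split(X, y_binary))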

As a side note, the final algorithm, DART, never predicts “other.” As this mode was marked by humans only 0.9% of the time, the fact that this mode is never used does not greatly affect model accuracy.

Participant Recruitment.

Participants were recruited in two phases, a pilot phase in spring 2014 and a large-scale analysis phase in spring 2015.

Pilot group.

Data from this group were used to train the DART algorithm. We invited participants to be in the pilot group if they: (i) were a community college biology instructor teaching at a quarter-system institution, (ii) had attended a weeklong intensive scientific teaching institute, and (iii) were teaching one or more nonlaboratory courses in spring 2014. Thirteen participants were invited and nine accepted. Data from one participant were excluded because the instructor left the course midquarter. Therefore, eight instructors participated, for an overall participation rate of 61.5%.

Large-scale analysis group.

Data from this group were used to test the effectiveness of the DART algorithm for large-scale analyses. Therefore, we invited a much larger set of participants, all community college or comprehensive university instructors who had previously attended a scientific teaching institute. Seventy-five community college instructors at either semester or quarter institutions were invited; at the time of invitation, it was unknown whether they were teaching in spring 2015. Twenty-eight agreed, for a participation rate of 36.8% (Table S2). These instructors were given a $500 stipend for their collaboration. Forty-two comprehensive university instructors were invited, all of whom were teaching a course in spring 2015. Thirty-one instructors agreed, for a participation rate of 73.8% (Table S2). These instructors were given summer salary or course release in a future term. Although the participation rates for these two groups differed greatly, such differences were anticipated given the differences in recruitment method and teaching context.

Further DART Accuracy Measures.

We analyzed DART’s accuracy by prediction mode for the pilot data. First, we quantified how often each human annotation code was classified into each DART prediction mode and vice versa (Fig. S2). Quantification of how often each human annotation code was classified into each DART prediction mode shows that nearly all of the time annotated as lecture with Q/A was assigned correctly to single voice (98.5% of the time) (Fig. S2A). For comparison, DART was modestly accurate at identifying discussion as multiple voice (73.9%) and less accurate at identifying silent as no voice (56.0%) (Fig. S2A); the latter result is not surprising, considering that extraneous classroom sounds, instructor comments, or other human speech may cause DART to incorrectly classify silent activities as single voice (34.8%). Common DART errors are described in Table S3. In addition, quantification of which human annotations were classified as single voice shows that single voice was primarily composed of lecture with Q/A (86.4%) (Fig. S2B). For comparison, multiple voice was mostly composed of discussion (75.2%), and no voice was overwhelmingly composed of silent (91.6%) (Fig. S2B).

Fig. S2.

DART can accurately identify when lecture with Q/A occurs. (A) Percentage of the time each human annotation code was labeled by the DART prediction as single voice, multiple voice, or no voice. Shaded boxes represent the DART prediction mode that was most often assigned to that row’s human annotation code. (B) Percentage of the time each DART prediction mode was labeled by each human annotation code. Shaded boxes represent the human annotation code that is most represented in that row’s DART prediction mode.

Discussion

In summary, we have described the development and validation of DART, an analytical tool that uses sound levels to predict classroom activities, as well as results from applying DART to 67 STEM courses. We show that DART is robust to varying class sizes and can determine the presence and quantity of single-voice (e.g., lecture), multiple-voice (e.g., pair or group discussion), or no-voice (e.g., clicker question, thinking, or quiet writing) learning activities with ∼90% accuracy. At this level of accuracy, ease, and time efficiency (∼5 min per 2-h class session), one could analyze and draw broad conclusions about millions of hours of class sessions at periodic intervals over time. Because DART only analyzes sound levels, it protects the anonymity of instructors and students. Furthermore, because DART detected differences in the extent of nonlecture in courses for nonbiology majors versus biology majors, DART additionally promises to reveal differences among other types of courses, instructors, disciplines, and institutions that were previously not feasible for study.

DART is relevant to many educational stakeholders, from individual instructors to institutions and researchers. For individual instructors, DART holds promise as a tool for professional development. Although previous studies have shown that many STEM faculty aspire to change their teaching (22), detailed observations of classroom videos suggest that instructors overestimate the extent to which they have integrated reformed teaching practices in their classrooms (23). DART could provide instructors with quick and quantitative evidence for self-study. DART can easily identify those class sessions with minimal to no learning activities for students and enable faculty to specifically target how they spend their limited time for pedagogical innovation. For disciplinary programs or departments and the faculty developers who support their teaching efforts, DART could supplement ongoing program assessment, providing insight into the nature of the learning activities happening in different courses with varying student outcomes. It could quickly reveal differences in the teaching strategies used across a department, allowing faculty to have discussions of teaching goals across the curriculum. For institutions, DART may provide a means for describing to prospective students and skeptical parents the added value of a STEM education at their particular campus. Increasingly, parents and students seek information about the added value of an education at a particular institution, going beyond academic reputation and research profile, and DART could help institutions make transparent the extent to which their students experience active engagement and their faculty use pedagogically effective teaching methods in their courses. Finally, for federal and private agencies attempting to foster change in STEM faculty teaching practices, DART has the potential to dramatically increase systematic measurement of classroom practices and expand insights being gained from current evaluation approaches through self-report, occasional classroom observations, and time-consuming videotape analyses. In addition, although DART emerged from studies of STEM classrooms, it also has the potential to address similar inquiries about university classrooms in other subjects or about precollege settings. DART’s efficiency could allow for studying correlations between DART’s quantitative metrics and a variety of variables associated with STEM courses, including positive outcomes, such as overall learning gains, pass rates, and success in follow-on courses, as well as negative outcomes, such as persistent achievement gaps correlated with student gender or cultural background. It is important to note that DART is not suitable for ranking or evaluating individual instructors, both because of the possibility of errors and because DART is not intended to measure the quality of teaching. Although much research has established that any form of active learning appears to produce higher learning gains than lecture alone (9), it is not known how much or what patterns of active learning may be adequate or optimal for learning.

So, what proportion of STEM instructors in the United States and internationally regularly use teaching strategies beyond lecture? What is the probability that an undergraduate STEM student would have the opportunity to speak, write, or discuss their ideas with peers in every class session? Analyzing classroom noise can quickly and anonymously reveal what is happening in classrooms, making DART a measurement tool with the potential to systematically inventory the presence of active learning across all types of higher education institutions. Given pressing needs to expand and diversify STEM workforces in the United States and beyond, DART can also be used to characterize the extent to which educators are maximizing opportunities for effective STEM learning. Because DART will be available online at dart.sfsu.edu, thousands of instructors, students, or other stakeholders could soon perform DART analyses, opening a variety of new lines of research and inquiry.

Methods

Audio Recording.

Each audio recording analyzed in this paper was obtained from a Sony audio recorder (model ICD-PX333). Decibel analysis has also been completed on recordings made with the iPhone Voice Memo app and on sound levels live-recorded in the classroom with the iPhone Decibel 10th app. Instructors were given audio recorders and asked to record every class session of at least one of the courses they were teaching. They were instructed to place the audio recorders at the front of the classroom (e.g., on a lectern) with the microphone pointing in the general direction of students. Before analysis, recordings were trimmed by hand at the beginning and end to exclude noise associated with student arrival and departure.
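
As an illustration of the kind of preprocessing described here, the sketch below (not the authors’ released DART pipeline) computes a per-second sound-level trace, in dB relative to full scale, from a mono 16-bit PCM WAV export of a class-session recording. The file name and trim amounts are hypothetical.

```python
import wave

import numpy as np


def decibel_trace(path, trim_start_s=0, trim_end_s=0):
    """Return one dBFS value per second of audio, with optional trimming."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)   # assumes mono 16-bit PCM
    # Trim arrival/departure noise at the start and end, as described above.
    start = trim_start_s * rate
    stop = len(samples) - trim_end_s * rate
    x = samples[start:stop].astype(np.float64) / 32768.0   # scale to [-1, 1]
    # Root-mean-square level of each 1-s window, converted to decibels.
    n_secs = len(x) // rate
    windows = x[: n_secs * rate].reshape(n_secs, rate)
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    return 20.0 * np.log10(np.maximum(rms, 1e-10))          # avoid log(0)


levels = decibel_trace("class_session.wav", trim_start_s=60, trim_end_s=60)
```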

Instructor Population.

Courses analyzed in this study were taught by collaborators on the Talk Matters Project, an advanced collaborative research project on scientific teaching. Participating instructors were drawn from two faculty development programs focusing on scientific teaching: Community College Biology Faculty Enhancement through Scientific Teaching (CCB FEST), for community college biology faculty; and Biology Faculty Explorations in Scientific Teaching (Biology FEST), for biology faculty at a single 4-y university. They included part-time, full-time, and tenured/tenure-track faculty teaching a variety of biology courses, including lower- and upper-division courses and courses for biology majors and nonbiology majors. Course enrollments ranged from 4 to 287 students, with a median course size of 48 students.

Faculty were recruited in two phases: a pilot phase in Spring 2014 and a large-scale analysis phase in Spring 2015. Because the research was a collaboration among dozens of faculty instructors, there were no human subjects and no need for informed consent. Each instructor who contributed recordings has a letter of collaboration on file with San Francisco State University’s Institutional Review Board, which approved the research described in this report under exempt protocols #E14-141a-d. For more information about faculty recruitment and participation rates, see SI Methods, Participant Recruitment and Table S2.

Table S2. Instructor participation rates for each phase of DART development

Human Annotation of Pilot Data.

The development of annotation codes was an iterative process. A team of three people annotated a total of 45 class-session recordings split among the eight instructors in the pilot group. Initially, human annotation was unstructured: coders were asked to individually listen to audio files, observe audio waveforms, and develop codes that correlated with the types of activities occurring in class sessions. For each new activity lasting more than 15 s, annotators recorded a start time (minutes and seconds) and a code. Emergent codes from all three annotators were compared and collapsed into six categories (lecture with Q/A, discussion, silent, transition, video, and other) (Table S1). The predominant annotation code was lecture with Q/A, which took up 73.5% of the time, followed by discussion at 13.8%; silent, transition, video, and other each took up less than 5% of the time (Table S1).
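
The sketch below illustrates this annotation format: each entry marks where a new activity begins, and the activity runs until the next entry starts. The times and codes shown are hypothetical examples using the categories above.

```python
annotations = [
    ("0:00", "lecture with Q/A"),
    ("12:30", "discussion"),
    ("15:45", "lecture with Q/A"),
    ("48:10", "silent"),
]


def to_per_second_codes(entries, session_length_s):
    """Expand (mm:ss, code) entries into one code per second of class time."""
    def seconds(stamp):
        minutes, secs = stamp.split(":")
        return int(minutes) * 60 + int(secs)

    starts = [seconds(stamp) for stamp, _ in entries] + [session_length_s]
    labels = []
    # Each activity spans from its start time to the next activity's start.
    for (_, code), begin, end in zip(entries, starts, starts[1:]):
        labels.extend([code] * (end - begin))
    return labels


codes = to_per_second_codes(annotations, session_length_s=50 * 60)
```

Expanding annotations to per-second labels makes them directly comparable with per-second model output when computing agreement.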

One class session from each of the instructors (17% of total annotation) was used to test interrater reliability; all other class sessions were annotated by only one person. The mean Fleiss’ κ, a metric appropriate for measuring agreement among multiple annotators of categorical ratings, was κ = 0.567, indicating moderate to substantial agreement (24). Fleiss’ κ was calculated by hand and in Excel. In addition, annotators agreed with one another 93.2% of the time, further indicating good interrater reliability.
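
For reference, the standard Fleiss’ κ calculation is sketched below as an equivalent Python reimplementation (the paper computed it by hand and in Excel; this is not the authors’ spreadsheet). Each row of the input counts how many of the m annotators assigned a given time segment to each category.

```python
import numpy as np


def fleiss_kappa(counts):
    """counts[i][j] = number of annotators assigning segment i to category j;
    every row sums to the number of annotators m."""
    counts = np.asarray(counts, dtype=np.float64)
    n_segments = counts.shape[0]
    m = counts[0].sum()                      # annotators per segment
    # Observed agreement: fraction of annotator pairs agreeing per segment.
    p_i = (np.sum(counts ** 2, axis=1) - m) / (m * (m - 1))
    p_bar = p_i.mean()
    # Chance agreement from overall category proportions.
    p_j = counts.sum(axis=0) / (n_segments * m)
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 3 annotators, 4 segments, 3 of the 6 codes.
print(fleiss_kappa([[3, 0, 0], [2, 1, 0], [0, 3, 0], [1, 1, 1]]))
```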

Measurement of DART’s Accuracy.

Pilot data.

The final DART model was 89.5% accurate overall on the pilot data. Accuracy was calculated as the percentage of time the model’s prediction matched the human annotation across all 66 annotations (of 45 class sessions; some class sessions were annotated by multiple people). By the same metric, the human annotators achieved an accuracy of 93.2%, because annotators did not always agree with one another. We also analyzed the accuracy of DART with signal-detection theory (Fig. 2). Signal-detection theory calculations of hit, miss, false positive, and correct rejection rates used the equations outlined in Stanislaw and Todorov (21) and were performed in Excel.
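
The signal-detection quantities reduce to simple counts once per-second predictions are aligned with the human annotation. The sketch below is an independent reimplementation of the cited calculations (21), not the authors’ Excel workbook: for one activity category, seconds the human labeled as that category are “signal” trials and all other seconds are “noise” trials.

```python
from scipy.stats import norm


def signal_detection(truth, predicted, category):
    """truth, predicted: equal-length sequences of per-second activity codes."""
    pairs = list(zip(truth, predicted))
    signal = [p == category for t, p in pairs if t == category]
    noise = [p == category for t, p in pairs if t != category]
    hit_rate = sum(signal) / len(signal)   # hits / (hits + misses)
    fa_rate = sum(noise) / len(noise)      # false positives / (fp + correct rejections)
    # Sensitivity d' via the z transform; rates of exactly 0 or 1 need the
    # corrections discussed in Stanislaw and Todorov (21) before transforming.
    d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
    return hit_rate, fa_rate, d_prime
```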

For further analyses of DART’s accuracy on the pilot group data, see SI Materials and Methods, Further DART Accuracy Measures and Fig. S2. Common DART errors are described in Table S3.

Table S3. Potential DART limitations and coding misclassifications

Large-scale analysis data.

To calculate DART’s accuracy on the large-scale analysis dataset, one class session from each of this dataset’s 67 courses was randomly chosen and annotated by a new two-person team trained on the previous annotation team’s codes and files. We compared how often the human annotations matched DART’s predictions, obtaining an accuracy of 87%.

DART Analysis of a Large Set of Courses.

Fifty-seven instructors recorded at least one class session in 78 distinct courses. Of these 78 courses, we included only nonlaboratory biology courses in which at least 30% of class sessions were recorded. We therefore excluded three courses for being laboratories and eight courses for having too few recordings, giving an inclusion rate of 67 of 78 courses (85.9%).

DART was used to calculate the time spent in single voice, multiple voice, and no voice for each class session. To compare DART data between different groups of courses, we ran t tests in Excel on logit-transformed DART data to account for the bounded nature of percentage data.
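
A minimal sketch of that comparison, using SciPy in place of Excel, is shown below. The per-course proportions are hypothetical placeholders; proportions of exactly 0 or 1 would need a small adjustment before the logit transform.

```python
import numpy as np
from scipy.stats import ttest_ind


def logit(p):
    """Map proportions in (0, 1) to the real line: log(p / (1 - p))."""
    p = np.asarray(p, dtype=np.float64)
    return np.log(p / (1 - p))


nonmajors = [0.32, 0.18, 0.25, 0.40]   # hypothetical proportions of nonlecture time
majors = [0.10, 0.15, 0.08, 0.22]
t_stat, p_value = ttest_ind(logit(nonmajors), logit(majors))
print(t_stat, p_value)
```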

Acknowledgments

We thank John Coley, Heidi McLaughlin, Sarah Bissonnette, Kristin de Nesnera, the National Science Foundation-funded Community College Biology Faculty Enhancement through Scientific Teaching community, and the Howard Hughes Medical Institute-funded Biology Faculty Enhancement through Scientific Teaching community for insightful discussion and support. We also thank the Center for Computing for Life Sciences at San Francisco State University for extensive support. This work was funded by Howard Hughes Medical Institute Undergraduate Science Education Award 52007556 and National Science Foundation Transforming Undergraduate Education in Science, Technology, Engineering, and Mathematics Award DUE-1226361.

Footnotes

  • 1M.T.O., S.B.S., and M.W. contributed equally to this work.

  • 2To whom correspondence should be addressed. Email: kdtanner@sfsu.edu.
  • Author contributions: M.T.O., S.B.S., M.W., J.N.S., and K.D.T. designed research; M.T.O., S.B.S., M.W., T.E.B., S.L., J.R.P., S.S., Z.-S.S., G.N.A., S.F.A., B.B., H.P.B., J.R.B., S.M.B., K.E.B., J.B.B., L.W.B., D.T.B., N.C., E.J.C., Y.-H.M.C., L.C., A.C., D.S.C., B.K.C., S.E.C., C.C., K.D.C., J.R.d.l.T., W.F.D., K.E.D., A.S.E., K.L.E., M.F., J.J.G., B.G., L.J.G., P.Z.H., H.E.H., Z.-H.H., S.I., P.D.I., J.R.J., M.K., R.R.K., J.D.K., S.K.K., L.E.K., T.L.L., L.L., L.M.M.-M., B.K.M., L.J.M., V.C.M.-S., C.A.M., P.C.M., P.H.N., G.L.N., K.M.O., S.G.P., R.P., P.S.P., B.R., J.R., S.W.R., T.R.-T., L.M.S., L.S., R.S., G.S.S., J.H.S., A.S., J.M.W., S.B.W., S.L.W., J.K.W., D.W.W., C.D.H., L.A.K., G.T., C.R.D., J.N.S., and K.D.T. performed research; M.T.O., S.B.S., M.W., J.N.S., and K.D.T. contributed new reagents/analytic tools; M.T.O., S.B.S., M.W., T.E.B., S.L., J.R.P., S.S., Z.-S.S., J.N.S., and K.D.T. analyzed data; and M.T.O., S.B.S., M.W., T.E.B., J.R.P., J.N.S., and K.D.T. wrote the paper.

  • Conflict of interest statement: K.D.T., J.N.S., M.W., S.B.S., and M.T.O. have filed a provisional patent on the subject of this report, DART (US Provisional Patent Application No. 62/398,888).

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1618693114/-/DCSupplemental.

Freely available online through the PNAS open access option.


References

  1. Arum R, Roksa J (2010) Academically Adrift: Limited Learning on College Campuses (Univ of Chicago Press, Chicago).
  2. Singer SR, Nielsen NR, Schweingruber HA, eds (2012) Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering (National Academies, Washington, DC).
  3. Seymour E, Hewitt NM (1997) Talking About Leaving: Why Undergraduates Leave the Sciences (Westview Press, Boulder, CO).
  4. Graham MJ, Frederick J, Byars-Winston A, Hunter A-B, Handelsman J (2013) Increasing persistence of college students in STEM. Science 341(6153):1455–1456.
  5. President’s Council of Advisors on Science and Technology (2012) Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics (Executive Office of the President, Washington, DC).
  6. Eddy SL, Hogan KA (2014) Getting under the hood: How and for whom does increasing course structure work? CBE Life Sci Educ 13(3):453–468.
  7. Hake RR (1998) Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. Am J Phys 66(1):64–74.
  8. Halloun IA, Hestenes D (1985) The initial knowledge state of college physics students. Am J Phys 53(11):1043–1055.
  9. Freeman S, et al. (2014) Active learning increases student performance in science, engineering, and mathematics. Proc Natl Acad Sci USA 111(23):8410–8415.
  10. Fairweather J (2008) Linking evidence and promising practices in STEM undergraduate education. NRC Workshop on Evidence on Selected Promising Practices in Undergraduate Science, Technology, Engineering, and Mathematics (STEM) Education (Board on Science Education, National Research Council, The National Academies, Washington, DC). Available at https://nsf.gov/attachments/117803/public/Xc–Linking_Evidence–Fairweather.pdf. Accessed September 9, 2016.
  11. Hora MT, Oleson A, Ferrare JJ (2008) Teaching Dimensions Observation Protocol (TDOP) (Wisconsin Center for Education Research, Madison, WI).
  12. Sawada D, et al. (2002) Measuring reform practices in science and mathematics classrooms: The reformed teaching observation protocol. Sch Sci Math 102(6):245–253.
  13. Smith MK, Jones FHM, Gilbert SL, Wieman CE (2013) The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE Life Sci Educ 12(4):618–627.
  14. Eddy SL, Converse M, Wenderoth MP (2015) PORTAAL: A classroom observation tool assessing evidence-based teaching practices for active learning in large science, technology, engineering, and mathematics classes. CBE Life Sci Educ 14(2):ar23.
  15. Donnelly PJ, et al. (2016) Multi-sensor modeling of teacher instructional segments in live classrooms. Proceedings of the 18th ACM International Conference on Multimodal Interaction – ICMI 2016 (ACM, New York), pp 177–184.
  16. Wang Z, Pan X, Miller KF, Cortina KS (2014) Automatic classification of activities in classroom discourse. Comput Educ 78:115–123.
  17. Li Y, Dorai C (2006) Instructional video content analysis using audio information. IEEE Trans Audio Speech Lang Process 14(6):2264–2274.
  18. Donnelly PJ, et al. (2016) Automatic teacher modeling from live classroom audio. Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization – UMAP ’16 (ACM, New York), pp 45–53.
  19. Brdiczka O, Maisonnasse J, Reignier P (2005) Automatic detection of interaction groups. Proceedings of the 7th International Conference on Multimodal Interfaces – ICMI ’05 (ACM, New York), p 32.
  20. Lu L, Zhang H-J, Li SZ (2003) Content-based audio classification and segmentation by using support vector machines. Multimedia Syst 8(6):482–492.
  21. Stanislaw H, Todorov N (1999) Calculation of signal detection theory measures. Behav Res Methods Instrum Comput 31(1):137–149.
  22. Savkar V, Lokere J (2010) Time to Decide: The Ambivalence of the World of Science Toward Education (Nature Education, Cambridge, MA).
  23. Ebert-May D, et al. (2011) What we say is not what we do: Effective evaluation of faculty professional development programs. Bioscience 61(7):550–558.
  24. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174.