COLLOQUIUM PAPERS
Mixed-membership models of scientific publications


*Department of Statistics, School of Social Work, and Center for Statistics and the Social Sciences, University of Washington, Seattle, WA 98195; and Department of Statistics, ¶Computer Science Department, and Center for Automated Learning and Discovery, Carnegie Mellon University, Pittsburgh, PA 15213
PNAS is one of the world's most cited multidisciplinary scientific journals. The official PNAS classification structure of subjects is reflected in topic labels submitted by the authors of articles, largely related to traditionally established disciplines. These include broad field classifications into physical sciences, biological sciences, and social sciences, with further subtopic classifications within the fields. Focusing on biological sciences, we explore an internal soft-classification structure of articles based only on semantic decompositions of abstracts and bibliographies and compare it with the formal discipline classifications. Our model assumes that there is a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). Soft classification for each article is based on proportions of the article's content coming from each category. We discuss the appropriateness of the model for the PNAS database as well as other features of the data relevant to soft classification.
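
The generative idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Dirichlet draws used to build the toy distributions, the sizes `K`, `V`, and `R`, and the helper names `sample_dirichlet` and `sample_article` are all assumptions made here for illustration. Each article has a vector of membership proportions over a fixed number of categories; every word and every reference first picks a category according to those proportions and is then drawn from that category's multinomial distribution.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector by normalizing independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_article(membership, word_dists, ref_dists, n_words, n_refs, rng):
    """Generate word and reference indices for one synthetic article.

    membership: proportions over categories (the article's soft classification).
    word_dists / ref_dists: one multinomial per category, over the vocabulary
    and over the pool of citable references, respectively.
    """
    categories = range(len(membership))
    words, refs = [], []
    for _ in range(n_words):
        # Pick the category this word comes from, then the word itself.
        k = rng.choices(categories, weights=membership)[0]
        words.append(rng.choices(range(len(word_dists[k])), weights=word_dists[k])[0])
    for _ in range(n_refs):
        # References follow the same two-step draw with their own multinomials.
        k = rng.choices(categories, weights=membership)[0]
        refs.append(rng.choices(range(len(ref_dists[k])), weights=ref_dists[k])[0])
    return words, refs


# Toy sizes, chosen arbitrarily for this sketch (not taken from the PNAS data).
K, V, R = 3, 20, 10  # categories, vocabulary size, number of distinct references
rng = random.Random(0)
word_dists = [sample_dirichlet([1.0] * V, rng) for _ in range(K)]
ref_dists = [sample_dirichlet([1.0] * R, rng) for _ in range(K)]
membership = sample_dirichlet([1.0] * K, rng)
words, refs = sample_article(membership, word_dists, ref_dists, 50, 15, rng)
```

In this sketch the `membership` vector plays the role of the soft classification: an article with proportions concentrated on one category behaves like a conventionally classified article, while spread-out proportions describe an article that mixes content from several categories.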
To whom correspondence should be addressed. E-mail: elena{at}stat.washington.edu.