Functional connectome fingerprinting using shallow feedforward neural networks
- (a) Center for Functional MRI, University of California San Diego, La Jolla, CA 92093;
- (b) Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA 92093;
- (c) Department of Radiology, University of California San Diego, La Jolla, CA 92093;
- (d) Department of Psychiatry, University of California San Diego, La Jolla, CA 92093
Edited by Huda Akil, University of Michigan–Ann Arbor, Ann Arbor, MI, and approved March 10, 2021 (received for review October 20, 2020)

Abstract
Although individual subjects can be identified with high accuracy using correlation matrices computed from resting-state functional MRI (rsfMRI) data, the performance significantly degrades as the scan duration is decreased. Recurrent neural networks can achieve high accuracy with short-duration (72 s) data segments but are designed to use temporal features not present in the correlation matrices. Here we show that shallow feedforward neural networks that rely solely on the information in rsfMRI correlation matrices can achieve state-of-the-art identification accuracies (
Functional connectome fingerprinting based on the similarity of correlation coefficient matrices computed from resting-state functional MRI (rsfMRI) data can identify individuals with high accuracy (
The two networks considered are shown in Fig. 1 A and B. The input to the correlation neural network (CorrNN) consists of the upper triangular elements of the correlation coefficient matrix C estimated from a data matrix X consisting of z-normalized time series (of length N) from M regions of interest (ROIs). For identification of L subjects, the network structure consists of a fully connected classification layer with L units, a batch normalization layer, and a softmax layer. The norm-based neural network (NormNN) uses the z-normalized data X as the input. The first stage is a fully connected layer that projects the data onto K hidden units using the
Fig. 1. (A and B) CorrNN and NormNN model structures. (C and D) Top rows are maps showing the relative importance of the ROIs for identification accuracy, with maximum importance of 1.0 indicated in yellow. The remaining rows are thresholded to show the locations of the top 15 to 60 ROIs. (E and F) Mean identification accuracies as a function of the number of time points and ROIs.
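The CorrNN input and layer sequence described above can be illustrated with a minimal NumPy sketch. It is not the authors' Keras/TensorFlow implementation: the function names, the synthetic data, the random untrained weights, the exclusion of the diagonal from the upper triangular elements, and the omission of the learned batch-normalization scale and shift are all assumptions made for illustration.

```python
import numpy as np


def corr_features(ts):
    """Map an (N, M) array of ROI time series to upper-triangular correlations."""
    n_points, n_rois = ts.shape
    # z-normalize each ROI time series over time (zero mean, unit variance).
    x = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    # For z-normalized columns, the correlation matrix is X^T X / N.
    c = x.T @ x / n_points
    # Assumption: keep only the elements strictly above the diagonal.
    rows, cols = np.triu_indices(n_rois, k=1)
    return c[rows, cols]


def corrnn_forward(features, w, b, bn_mean, bn_var, eps=1e-3):
    """Dense layer with L units, inference-mode batch norm, then softmax."""
    z = features @ w + b
    # Simplified batch normalization (no learned scale or shift).
    z = (z - bn_mean) / np.sqrt(bn_var + eps)
    z = z - z.max()  # subtract the max for a numerically stable softmax
    p = np.exp(z)
    return p / p.sum()


rng = np.random.default_rng(0)
n_points, n_rois, n_subjects = 100, 15, 4      # synthetic N, M, L
ts = rng.standard_normal((n_points, n_rois))   # stand-in for ROI time series
features = corr_features(ts)

# Random, untrained parameters, used only to show the shapes involved.
w = 0.1 * rng.standard_normal((features.size, n_subjects))
b = np.zeros(n_subjects)
probs = corrnn_forward(features, w, b, np.zeros(n_subjects), np.ones(n_subjects))
predicted_subject = int(np.argmax(probs))
```

Under the stated assumption, the feature vector has M(M-1)/2 elements (105 for M = 15), so the classification layer grows quadratically with the number of ROIs.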
Results
We assessed the performance of the two networks using data from the Human Connectome Project (HCP) (6). Two rsfMRI scans acquired on day 1 were used for training, while the two scans from day 2 were used for validation and testing.
For
We used a greedy search algorithm to assess the relative importance of the ROIs with respect to model accuracy. Importance maps are shown in the top rows of Fig. 1 C and D for CorrNN and NormNN, respectively, with the subsequent rows thresholded to highlight the top 15 to 60 ROIs. When considering the top 60 ROIs, the highest numbers of ROIs are found in region 22 (dorsolateral prefrontal cortex), followed by regions 17 (inferior parietal cortex), 14 (lateral temporal cortex), 16 (superior parietal cortex; for CorrNN), 21 (inferior frontal cortex), and 3 (dorsal stream visual cortex), where brain regions are as defined in ref. 7.
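The general idea of a greedy search over ROIs can be sketched as follows. This shows one common variant, forward selection, and is not necessarily the procedure the authors used. The function `greedy_forward_selection`, the `score` callback, and the toy per-ROI gains are hypothetical; in practice the score would be the validation accuracy of a model restricted to the candidate ROI subset.

```python
from typing import Callable, List, Sequence


def greedy_forward_selection(
    candidates: Sequence[int],
    score: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Repeatedly add the candidate that most improves the score."""
    selected: List[int] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Evaluate each remaining candidate together with the current selection.
        best = max(remaining, key=lambda r: score(selected + [r]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-ROI contributions standing in for model accuracy.
toy_gain = {0: 0.10, 1: 0.50, 2: 0.30, 3: 0.05}


def toy_score(subset: List[int]) -> float:
    return sum(toy_gain[r] for r in subset)


# ROIs in order of selection, most important first.
ranking = greedy_forward_selection(list(toy_gain), toy_score, k=3)
```

The order in which ROIs are selected gives a relative importance ranking, which can then be thresholded to a chosen number of top ROIs.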
We used the top ROIs to evaluate CorrNN and NormNN performance with 15 to 60 ROIs and 5 to 1,000 time points, as shown in Fig. 1 E and F, respectively. As the number of ROIs decreases, the number of time points needed to achieve higher accuracy increases. Defining
To further explore the dependence on the number of ROIs and time points, we considered combinations
Fig. 2. (A) CorrNN and (B) NormNN identification accuracies for combinations
For NormNN, the number of parameters exhibits a linear dependence on the number of ROIs (M) as compared to the quadratic dependence for CorrNN (see Fig. 2 legend). To better compare the models, we increased K by powers of 2 up to the value
As shown by the histograms, the high mean CorrNN and NormNN accuracies correspond to robust identification performance, with the majority of the trials demonstrating 100% prediction accuracy. These accuracies were obtained with global signal regression (GSR), and were significantly greater than those obtained without GSR for both CorrNN (
Using the ROIs determined from the first 100 subjects, we evaluated performance on the second set of 100 subjects for the combinations denoted in Fig. 2. High mean CorrNN accuracies (
For both sets of subjects, the mean number of CorrNN prediction errors was not significantly correlated (across subjects) with the mean framewise displacement (FD) measure of subject motion (
For NormNN, we find that the first layer trained weights are randomly distributed so that the features after the
Discussion
We have shown that shallow feedforward models can identify subjects based solely on information in rsfMRI correlation matrices, robustly achieving high accuracies (
Consistent with prior observations (1), high performance can be achieved when using a subset of the ROIs, including those located in frontoparietal and lateral temporal regions. The same set of ROIs can be used to achieve high performance across independent datasets, suggesting that the predictive value of intersubject variability in the functional boundaries and connectivity of these regions generalizes across datasets.
While combinations with span lengths as short as 27 points (19.5 s; CorrNN
As in prior studies (1–3), the current study utilized the HCP dataset, in which the data were acquired on two consecutive days (6). Although substantial variations in functional connectivity can occur on short time scales (i.e., minutes to hours) due to factors such as temporal fluctuations in vigilance (8), our results indicate that high performance can be obtained over a 1-day interval even in the presence of these factors. Future large-scale studies will be needed to assess whether high identification accuracy can be obtained over longer intervals (i.e., weeks to years).
The effectiveness of the feedforward networks in distinguishing individuals from relatively little data suggests that similar approaches may make fuller use of the information contained in rsfMRI data to better identify disease-related differences.
Materials and Methods
HCP preprocessing of the data included motion correction, detrending, denoising, and registration (7). The 379 ROIs were defined using 360 cortical ROIs from ref. 7 and 19 subcortical ROIs from ref. 6. Data were averaged within each ROI, and GSR was applied. Training, testing, and validation of the models were performed with Keras and TensorFlow. Further details are provided in SI Appendix, Extended Methods.
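As an illustration of the GSR step, the sketch below regresses a global signal out of each ROI time series with ordinary least squares. It makes two assumptions that the text does not confirm: the global signal is taken as the mean across ROI time series, and an intercept is included in the regression. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def global_signal_regression(ts):
    """Remove the global signal from an (N, M) array of ROI time series."""
    # Assumption: global signal is the mean across ROIs at each time point.
    g = ts.mean(axis=1)
    # Design matrix with an intercept column and the global signal.
    design = np.column_stack([np.ones_like(g), g])
    # Least-squares fit of every ROI time series on the design matrix.
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    # The residuals are the GSR-cleaned time series.
    return ts - design @ beta


rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))              # synthetic common fluctuation
ts = shared + 0.5 * rng.standard_normal((200, 10))  # 200 time points, 10 ROIs
g = ts.mean(axis=1)
cleaned = global_signal_regression(ts)
```

After the regression, each cleaned time series is orthogonal to the global signal and has zero mean over time.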
Data Availability
Analysis code, summary data, and anonymized fMRI data have been deposited at Bitbucket and Open Science Framework (9, 10).
Acknowledgments
This work was supported, in part, by NIH Grant R21MH112155. We thank Eric Wong, Garrison Cottrell, Jiawei Ren, and Shili Wang for their assistance.
Footnotes
- 1. To whom correspondence may be addressed. Email: ttliu@ucsd.edu.
Author contributions: G.S., B.R., and T.L. designed research; G.S. and T.L. performed research; G.S. and T.L. contributed new reagents/analytic tools; G.S. and T.L. analyzed data; and G.S. and T.L. wrote the paper.
The authors declare no competing interest.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2021852118/-/DCSupplemental.
- Copyright © 2021 the Author(s). Published by PNAS.
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
References
- 1.
- 2. S. Chen, X. Hu
- 3. L. Wang, K. Li, X. Chen, X. P. Hu
- 4. G. Sarar, S. Wang, J. Ren, T. T. Liu
- 5. D. Bartz, K. Hatrick, C. W. Hesse, K.-R. Müller, S. Lemm
- 6.
- 7.
- 8. T. T. Liu, M. Falahpour
- 9. T. T. Liu
- 10. T. T. Liu
Article Classifications
- Biological Sciences
- Neuroscience