Controlling False Disrupted Connectivities in Neuroimaging Studies
Dulal K Bhaumik1, Yuanbo Song2, Dan Zhao2*, Olusola Ajilore1
1Department of Psychiatry, University of
Illinois at Chicago, USA
2Department of Epidemiology and Biostatistics, University of Illinois at Chicago, USA
*Corresponding author: Dulal K. Bhaumik, Department of Epidemiology and Biostatistics, University of Illinois at Chicago, USA. Email: dbhaumik@uic.edu; Tel: +1(312) 413-4455
Received Date: 15 October, 2018; Accepted
Date: 30 October, 2018; Published Date: 07 November, 2018
Citation:
Bhaumik DK, Song Y, Zhao D, Ajilore O (2018)
Controlling False Disrupted Connectivities in Neuroimaging Studies. J Biostat Biom: JBSB-108.
DOI:
10.29011/JBSB-108.100008.
Abstract
Detection of disrupted connectivities is an important problem in neuroimaging research for targeting treatment interventions in order to get optimal therapeutic benefits. Disrupted connectivities are generally detected by comparing the disease group with a healthy control group utilizing the whole brain, resulting in thousands of comparisons. However, standard neuroimaging studies devoted to making such large scale multiple comparisons towards discovering statistically significant disrupted connectivities are seen to produce inflated false discoveries. Thus, development of statistical methods for multiple comparisons built upon a solid theoretical foundation in neuroimaging studies for controlling false discoveries is urgently needed. This article explores several approaches to control the false discovery rate, and recommends one best suited for detecting disrupted connectivities. Results are illustrated with a live data set generated from a late life depression study.
Keywords: Functional magnetic resonance imaging; False discovery rate; Local false discovery rate; Mixed-effects linear models; Neuroconnectivity
Introduction
Controlling the False Discovery Rate (FDR) is
critically important for a detailed understanding of how a healthy brain
differs with neurological diseases (e.g., depression), which is the fundamental
requirement to the development and application of treatments for these
conditions. Most of the neuroimaging research studies either do not address the
issue of Multiple Testing (MT), or inadequately address MT by failing to
incorporate the underlying dependency structure. Software packages often used
for fMRI analysis (SPM, FSL, AFNI) can result in making false-positive rates of
up to 70%, when it is fixed at 5% [1]. Inadequate use of statistical methods
for MT is hindering the search for truly disrupted connectivities. Excessive
false discoveries lead to wrong conclusion and may mar the importance of the
study. Thus, a proper procedure to control the false discovery rate in neuro
connectivity research is urgently needed.
Currently functional connectivity data are
analyzed using model based methods such as cross correlation analysis or
statistical parametric mapping [2-8]. Functional connectivity is also analyzed
by data-driven methods such as independent component analysis, or principal
component analysis [9-11]. Generally, structural connectivity data are obtained
using diffusion tensor imaging, with measurements such as fractional
anisotropy, relative anisotropy, mean diffusivity, or axial diffusivity
[12-17]. For multimodal analysis, the existing Bayesian approach utilizes
structural connectivity as a prior for functional connectivity [18-21], and
frequentist statistical methods are also used with or without prior information
[11,22-38]. In all neuroconnectivity studies based on either the whole brain,
or any network (e.g. Salience Network, Default Mode Network), multiple testing
becomes an integral part when two groups (e.g. Depression and Healthy Control)
are compared. So, it is important to know the performance of different methods
that are commonly used to control the false discovery rate in neuroconnectivity
analysis.
Traditional approaches of large sample tests
based on likelihood ratios and Wald-type statistics cannot address the issue of
multiple comparison in neuroconnectivity analysis. Bonferroni correction for
multiple testing becomes over conservative for a large number of comparisons. In
rare cases, to control the false discovery rate, the approach by Benjamini and
Hochberg (BH) [39] is used. However, the BH approach produces inflated false
discovery rates in certain cases, and hence poses a problem in correctly
identifying true discoveries with small samples.
The goal of this article is to develop
statistical approaches to precisely detect disruptive neuroconnectivity in
diseased populations and apply the findings to optimize benefits of clinical
interventions. The first novelty of this article is in the development of a
statistical model for neuroconnectivity data collected from neurological
conditions (e.g., psychiatric, cognitive impairment etc.) and healthy control
populations to identify significant changes in neuroconnectivity. For that
purpose, we develop a mixed-effects model with some important characteristics
in order to address within and between-subject heterogeneities, and also
variabilities across links. The second novelty is in providing a simple
interpretation to the q-value cut off which represents the minimum FDR at which
the test can be called significant. Disrupted findings are next used to guide
decisions on where to apply a neuro-modulatory intervention intended to improve
behavioural disorders. The optimal approach of this article will indirectly
reduce the cost of longitudinal and/or cluster neuroimaging studies, and
provide the ability to study populations that are more difficult to recruit,
such as minorities and traumatic brain injury patients. In summary, this article
addresses modeling of neuroconnectivity data, and controlling of FDR in
multiple testing for selecting the target of treatment applications.
We organize the article as follows. In
Section 2, we motivate our problem with a study related to Late Life Depression
(LLD) and argue that instead of controlling the type I error rate, we should
control the false discovery rate for within or between group neuroconnectivity
comparisons. In this section, we also discuss mixed-effects models and multiple
testing procedures. In Section 3, we perform a simulation study for identifying
an FDR approach that suits the best for neuroconnectivity comparisons and apply
it to the LLD study for controlling the false discovery rate in section 4. We
conclude the paper in Section 5.
Material
and Methods
In this section, we discuss and develop some
statistical methodologies intended to implement for detecting disrupted
connectivities in group comparison studies. Towards that we start with a
motivational example.
Motivational
Example: Late Life Depression (LLD)
We motivate our research problem with a live data set from a study known as LLD recently conducted by Dr. Ajilore at University of Illinois at Chicago. LLD refers to major depressive episodes in elderly patients (usually over 50 or 60 years of age). This study recruited 23 subjects (13 healthy control (HC), and 10 LLD subjects), and collected data from 87 brain regions (i.e. a whole brain study, see Table 1) of each subject. The inclusion criteria for all subjects were over 55 years of age, medication-naive or anti-depressant free for at least two weeks (in the case of our LLD subjects) and no history of unstable cardiac or neurological diseases. The objective of this study is to detect brain regions and/or connectivity that are impaired during depression, and find out whether the identified regions can inform us about the pathophysiology of LLD Table 1.
Imaging data were collected using a Philips
Achieva 3.0T scanner (Philips Medical Systems, Best, The Netherlands) with an
8-channel sensitivity encoded (SENSE) head coil. To reduce noise, headphones
and foam pads were used and head movement was minimized. Subjects were
instructed to stay still with their eyes closed yet awake without “thinking
anything in particular” throughout the scanning process. Other sources of
confounding effects such as motion artfact, white matter, and CSF were also
regressed out before analysis. High resolution 3D T1-weighted images were
attained by a MPRAGE (Magnetization Prepared Rapid Acquisition Gradient Echo)
sequence with the following parameters: FOV = 240mm, TR/TE = 8.4/3.9 ms, flip
angle = 8 degree, voxel size = 1.1 × 1.1 × 1.1 mm, and 134 contiguous axial
slices. Resting-state imaging data were collected using a single-shot
gradient-echo EPI sequence (EPI factor = 47, FOV = 23×23×15 cm3,
TR/TE = 2000/30ms, Flip angle = 80, in-plane resolution = 3×3 mm2,
slice thickness/gap = 5/0 mm, slice number = 30, SENSE reduction factor = 1.8,
NEX = 200, total scan time = 6: 52. The resting-state functional neuronetworks
were configured using CONN [40], a Matlab toolbox. In brief, we preprocessed
our raw EPI images involving four steps namely: realignment, co-registration,
normalization, and smoothing.
The adjacency matrix of functional network as shown in (1), it has 87 × 86/2 = 3471 unique measures of functional connectivity. To satisfy the key assumption of normal distribution of our model, we applied the Fisher’s Z transformation to these Pearson Correlations (PC).
Modeling
of Data
Mixed-effects regression models are
frequently used to analyze neuroimaging data, as it can borrow strengths from
multiple sources and provides reliable estimates of model parameters [4,18,41-48].
Fixed parameters of mixed-effects regression models provide population related
information while random parameters provide subject specific information beyond
the population mean [49-53]. Let Yijg/ be the fisher Z transformed
pearson correlation measure of the ith link (i.e. connectivity between two
regions) of the jth subject nested within the gth group
(e.g. intervention, disease, control etc.).
where i (= 1, 2..., m) denotes the ith link
(connection between two brain regions), j (= 1, 2...,Ng) denotes the
j subject nested within the gth group, xijg = 0, if the jth
subject is from the control group (i.e. g = control), and xijg= 1,
if the jth subject is from the other group (i.e. g = intervention or
disease). We assume the random noise εijg follows a normal
distribution N (0, σ
Parameters in the model are estimated using
an EM algorithm. In the E step, with the current values of the other
parameters, we compute the “expected posteriori” or Empirical Bayes (EB)
estimates of random effects as well as the conditional variances of random
effects given data. In the M step, given the current estimated values of random
effects, we obtain the Maximum Marginal Likelihood (MML) estimates of the
regression coefficients, error variances, and variances of random effects. The
algorithm iterates between the EB and MML estimates until convergence.
q-value based
FDR Procedure
To detect disrupted connectivities in a neuroimaging study, we generally perform between group comparisons. In whole brain study, the number of Regions of Interest (ROI) in our example is 87 with a total number of 3471 connectivities. In a single hypothesis testing problem, we generally control the type I error rate at a pre-specified level (e. g. α = .05), and determine a rejection region to maximize the probability of detecting the true alternative hypothesis. When multiple hypotheses are tested simultaneously, it is important to control the expected proportion of false rejections among all rejections, and the concept of False Discovery Rate (FDR) [39] prevails there. The ingredients that we need to formally define the FDR are presented in Table 2.
By
definition, FDR is the expected proportion of Type I errors among the rejected
hypotheses.
Hence, FDR = E(V/(V + S) = E(V/R), where V isthe number of false
rejections, R is the total number of rejections, and we define V/R=0, if R=0.
Algorithm to Control FDR
We start with
the Benjamini and Hochberg [39] procedure to control the FDR. Consider a
multiple testing problem consists of m hypotheses H1, …, Hm with
the corresponding p values p1…, pm. In the literature of
FDR, [54, 55] defined a q-value as an analog of the p value that incorporates
FDR-based multiple testing correction. q-value is the minimum positive FDR that
can occur when rejecting a statistic with a threshold for the set of nested
significance regions. The general procedure of the algorithm that controls the
FDR at a threshold value of q is as follows:
1)
Obtain p-values of all test statistics and sort those in an
increasing order, i.e. p (1) ≤ p (2) ≤ … ≤ p(m).
2)
For a given q, find the largest k for which p(k)≤
kq/m holds.
3)
Declare discoveries for all H(i), where i = 1,
2…, k.
First the BH
procedure needs a cut-off q-value, then starts from the largest p value (i.e. p(m))
and moves towards the smallest one, consequently we reject all null hypotheses
with p-values less than the aforementioned threshold.
Determination of FDR Level q
In a single hypothesis testing, generally we fix the
individual type I error rate α at
5%, and try to achieve 80% power. Similarly, in multiple hypothesis testing, we
need to determine a q-value. However, no such standard value of the FDR level q
exists in the literature, mainly because of some other factors that play
crucial roles in this process. Suppose in a multiple testing scenario with 100
(out of a total of 200) hypotheses are truly null (i.e., the proportion of
nulls is p0= 0.50), we would expect 5 null rejections
(out of 100) when α is
individually fixed at 5%. With 80 rejections of the hypotheses that are truly
non-null (i.e., 80% power), the observed theFDR level q = 5/(5+80) ≈0.059. A second scenario also conforms to
multiple testing problem, with 90 of the 100 hypotheses being truly null (i.e.,
p0 0.90), and 10 of these being truly false. In
this scenario, we should expect observed FDR level q close to 4/(4+8) ≈0.333.
If this procedure continues for a large value of p0(e.g. p0=.99), q will be close to 1 which will lead to less power and is undesirable. Thus, in a multiple testing problem, the FDR level q should be adjusted with the value of p0, together with the individual type I error rate and power, such that q does not exceed to a desired threshold (e.g., 0.20 or 0.30). In order to better understand the interrelationship between individual type I error rate
Figure 1: mFDR level vs.
Figure 2: Adjusted type I error rate (
Region | Description | Region | Description |
1 | cerebellum cortex | 23 | Middletemporal |
2 | thalamus proper | 24 | parahippocampal |
3 | caudate magnetic | 25 | paracentral |
4 | putamen | 26 | parsopercularis |
5 | pallidum | 27 | parsorbitalis |
6 | hippocampus | 28 | parstriangularis |
7 | amygdala | 29 | pericalcarine |
8 | accumbens area | 30 | postcentral |
9 | VentralDC | 31 | osteriorcingulate |
10 | Bankssts | 32 | precentral |
11 | caudalanteriorcingulate | 33 | precuneus |
12 | caudalmiddlefrontal | 34 | rostralanteriorcingulate |
13 | cuneus | 35 | rostralmiddlefrontal |
14 | entorhinal | 36 | superiorfrontal |
15 | fusiform | 37 | superiorparietal |
16 | inferiorparietal | 38 | superiortemporal |
17 | inferiortemporal | 39 | supramarginal |
18 | isthmuscingulate | 40 | frontalpole |
19 | lateraloccipital | 41 | temporalpole |
20 | lateralorbitofrontal | 42 | transversetemporal |
21 | lingual | 43 | insula |
22 | medialorbitofrontal | 44 | brain stem (central) |
Table 1: Brain regions (left and right) in whole brain analysis.
H0
Accepted |
H0
Rejected |
Total |
|
H0True |
U |
V |
m0 |
H0False |
T |
S |
m-m0 |
Total |
m-R |
R |
m |
Table
2: Different types of outcomes in multiple
testing.
μ | σ | p0 | |
Theorical Null | 0 | 1 | 0.901 |
Efron | 0.038 | 1.121 | 0.988 |
Jin and Cai | -0.025 | 1.116 | 0.985 |
Table 3: Estimates of parameters by different procedures.
UnderConnectivity | Over Connectivity Connectivity | Total | ||
q-value | Adaptive BH | 22 | 37 | 59 |
Theorical Null | 18 | 51 | 69 | |
z-value | Efron | 12 | 11 | 12 |
Jin and Cai | 1 | 13 | 14 |
Table 4: Number of significant connectivities by various fdr procedures.
| Region 1 | Region 2 | |
28 | Left isthmuscingulate | 56 | Right caudalmiddlefrontal |
12 | Right thalamus_Proper | 56 | Right caudalmiddlefrontal |
19 | Right ventralDC | 59 | Right fusiform |
45 | Left rostralmiddlefrontal | 56 | Right caudalmiddlefrontal |
15 | Right pallidum | 56 | Right caudalmiddlefrontal |
15 | Right pallidum | 26 | Left inferiorparietal |
18 | Right accumbens_area | 28 | Left isthmuscingulate |
41 | Left posteriorcingulate | 79 | Right rostralmiddlefrontal |
55 | Right caudalanteriorcingulate | 56 | Right caudalmiddlefrontal |
46 | Left superiorfrontal | 62 | Right isthmuscingulate |
56 | Right caudalmiddlefrontal | 62 | Right isthmuscingulate |
25 | Left fusiform | 49 | Left supramarginal |
Table 5: Significant links.
45.
Chen G,
Saad ZS, Britton JC, Pine DS, Cox RW (2013) Linear mixed-effects modeling
approach to FMRI group analysis. Neuroimage 73: 176-190.
50.
Hedeker D,
Gibbons RD (2006) Longitudinal data analysis. John Wiley & Sons.
51.
Fitzmaurice
GM, Laird NM, Ware JH (2012) Applied longitudinal analysis. John Wiley &
Sons.
53.
Seltman HJ
(2012) Experimental design and analysis.
66.
Sarkar
SK (2008) On methods controlling the false discovery rate. Sankhya: The Indian
Journal of Statistics, Series A 70: 135-168.