Controlling False Disrupted Connectivities in Neuroimaging Studies

Dulal K Bhaumik; Yuanbo Song; Dan Zhao; Olusola Ajilore

Controlling False Disrupted Connectivities in Neuroimaging Studies

Dulal K Bhaumik¹, Yuanbo Song², Dan Zhao^2*, Olusola Ajilore¹

¹Department of Psychiatry, University of Illinois at Chicago, USA

²Department of Epidemiology and Biostatistics, University of Illinois at Chicago, USA

^*Corresponding author: Dulal K. Bhaumik, Department of Epidemiology and Biostatistics, University of Illinois at Chicago, USA. Email: dbhaumik@uic.edu; Tel: +1(312) 413-4455

Received Date: 15 October, 2018; Accepted Date: 30 October, 2018; Published Date: 07 November, 2018

Citation: Bhaumik DK, Song Y, Zhao D, Ajilore O (2018) Controlling False Disrupted Connectivities in Neuroimaging Studies. J Biostat Biom: JBSB-108. DOI: 10.29011/JBSB-108.100008.

Abstract

Detection of disrupted connectivities is an important problem in neuroimaging research for targeting treatment interventions in order to get optimal therapeutic benefits. Disrupted connectivities are generally detected by comparing the disease group with a healthy control group utilizing the whole brain, resulting in thousands of comparisons. However, standard neuroimaging studies devoted to making such large scale multiple comparisons towards discovering statistically significant disrupted connectivities are seen to produce inflated false discoveries. Thus, development of statistical methods for multiple comparisons built upon a solid theoretical foundation in neuroimaging studies for controlling false discoveries is urgently needed. This article explores several approaches to control the false discovery rate, and recommends one best suited for detecting disrupted connectivities. Results are illustrated with a live data set generated from a late life depression study.

Keywords: Functional magnetic resonance imaging; False discovery rate; Local false discovery rate; Mixed-effects linear models; Neuroconnectivity

Introduction

Controlling the False Discovery Rate (FDR) is critically important for a detailed understanding of how a healthy brain differs with neurological diseases (e.g., depression), which is the fundamental requirement to the development and application of treatments for these conditions. Most of the neuroimaging research studies either do not address the issue of Multiple Testing (MT), or inadequately address MT by failing to incorporate the underlying dependency structure. Software packages often used for fMRI analysis (SPM, FSL, AFNI) can result in making false-positive rates of up to 70%, when it is fixed at 5% [1]. Inadequate use of statistical methods for MT is hindering the search for truly disrupted connectivities. Excessive false discoveries lead to wrong conclusion and may mar the importance of the study. Thus, a proper procedure to control the false discovery rate in neuro connectivity research is urgently needed.

Currently functional connectivity data are analyzed using model based methods such as cross correlation analysis or statistical parametric mapping [2-8]. Functional connectivity is also analyzed by data-driven methods such as independent component analysis, or principal component analysis [9-11]. Generally, structural connectivity data are obtained using diffusion tensor imaging, with measurements such as fractional anisotropy, relative anisotropy, mean diffusivity, or axial diffusivity [12-17]. For multimodal analysis, the existing Bayesian approach utilizes structural connectivity as a prior for functional connectivity [18-21], and frequentist statistical methods are also used with or without prior information [11,22-38]. In all neuroconnectivity studies based on either the whole brain, or any network (e.g. Salience Network, Default Mode Network), multiple testing becomes an integral part when two groups (e.g. Depression and Healthy Control) are compared. So, it is important to know the performance of different methods that are commonly used to control the false discovery rate in neuroconnectivity analysis.

Traditional approaches of large sample tests based on likelihood ratios and Wald-type statistics cannot address the issue of multiple comparison in neuroconnectivity analysis. Bonferroni correction for multiple testing becomes over conservative for a large number of comparisons. In rare cases, to control the false discovery rate, the approach by Benjamini and Hochberg (BH) [39] is used. However, the BH approach produces inflated false discovery rates in certain cases, and hence poses a problem in correctly identifying true discoveries with small samples.

The goal of this article is to develop statistical approaches to precisely detect disruptive neuroconnectivity in diseased populations and apply the findings to optimize benefits of clinical interventions. The first novelty of this article is in the development of a statistical model for neuroconnectivity data collected from neurological conditions (e.g., psychiatric, cognitive impairment etc.) and healthy control populations to identify significant changes in neuroconnectivity. For that purpose, we develop a mixed-effects model with some important characteristics in order to address within and between-subject heterogeneities, and also variabilities across links. The second novelty is in providing a simple interpretation to the q-value cut off which represents the minimum FDR at which the test can be called significant. Disrupted findings are next used to guide decisions on where to apply a neuro-modulatory intervention intended to improve behavioural disorders. The optimal approach of this article will indirectly reduce the cost of longitudinal and/or cluster neuroimaging studies, and provide the ability to study populations that are more difficult to recruit, such as minorities and traumatic brain injury patients. In summary, this article addresses modeling of neuroconnectivity data, and controlling of FDR in multiple testing for selecting the target of treatment applications.

We organize the article as follows. In Section 2, we motivate our problem with a study related to Late Life Depression (LLD) and argue that instead of controlling the type I error rate, we should control the false discovery rate for within or between group neuroconnectivity comparisons. In this section, we also discuss mixed-effects models and multiple testing procedures. In Section 3, we perform a simulation study for identifying an FDR approach that suits the best for neuroconnectivity comparisons and apply it to the LLD study for controlling the false discovery rate in section 4. We conclude the paper in Section 5.

Material and Methods

In this section, we discuss and develop some statistical methodologies intended to implement for detecting disrupted connectivities in group comparison studies. Towards that we start with a motivational example.

Motivational Example: Late Life Depression (LLD)

We motivate our research problem with a live data set from a study known as LLD recently conducted by Dr. Ajilore at University of Illinois at Chicago. LLD refers to major depressive episodes in elderly patients (usually over 50 or 60 years of age). This study recruited 23 subjects (13 healthy control (HC), and 10 LLD subjects), and collected data from 87 brain regions (i.e. a whole brain study, see Table 1) of each subject. The inclusion criteria for all subjects were over 55 years of age, medication-naive or anti-depressant free for at least two weeks (in the case of our LLD subjects) and no history of unstable cardiac or neurological diseases. The objective of this study is to detect brain regions and/or connectivity that are impaired during depression, and find out whether the identified regions can inform us about the pathophysiology of LLD Table 1.

Imaging data were collected using a Philips Achieva 3.0T scanner (Philips Medical Systems, Best, The Netherlands) with an 8-channel sensitivity encoded (SENSE) head coil. To reduce noise, headphones and foam pads were used and head movement was minimized. Subjects were instructed to stay still with their eyes closed yet awake without “thinking anything in particular” throughout the scanning process. Other sources of confounding effects such as motion artfact, white matter, and CSF were also regressed out before analysis. High resolution 3D T1-weighted images were attained by a MPRAGE (Magnetization Prepared Rapid Acquisition Gradient Echo) sequence with the following parameters: FOV = 240mm, TR/TE = 8.4/3.9 ms, flip angle = 8 degree, voxel size = 1.1 × 1.1 × 1.1 mm, and 134 contiguous axial slices. Resting-state imaging data were collected using a single-shot gradient-echo EPI sequence (EPI factor = 47, FOV = 23×23×15 cm³, TR/TE = 2000/30ms, Flip angle = 80, in-plane resolution = 3×3 mm², slice thickness/gap = 5/0 mm, slice number = 30, SENSE reduction factor = 1.8, NEX = 200, total scan time = 6: 52. The resting-state functional neuronetworks were configured using CONN [40], a Matlab toolbox. In brief, we preprocessed our raw EPI images involving four steps namely: realignment, co-registration, normalization, and smoothing.

The adjacency matrix of functional network as shown in (1), it has 87 × 86/2 = 3471 unique measures of functional connectivity. To satisfy the key assumption of normal distribution of our model, we applied the Fisher’s Z transformation to these Pearson Correlations (PC).

Modeling of Data

Mixed-effects regression models are frequently used to analyze neuroimaging data, as it can borrow strengths from multiple sources and provides reliable estimates of model parameters [4,18,41-48]. Fixed parameters of mixed-effects regression models provide population related information while random parameters provide subject specific information beyond the population mean [49-53]. Let Y_ijg/ be the fisher Z transformed pearson correlation measure of the ith link (i.e. connectivity between two regions) of the j^th subject nested within the g^th group (e.g. intervention, disease, control etc.).

where i (= 1, 2..., m) denotes the ith link (connection between two brain regions), j (= 1, 2...,N_g) denotes the j subject nested within the g^th group, x_ijg = 0, if the j^th subject is from the control group (i.e. g = control), and x_ijg= 1, if the j^th subject is from the other group (i.e. g = intervention or disease). We assume the random noise ε_ijgfollows a normal distribution N (0, σ ), in addition, random subject effect ν_jg follows a normal distribution N0, σ , and these two normal distributions are independent. The vector w_jg represents measures of additional fixed covariates such as age, race, sex etc. Thus, our decision related to significance of link connectivity will be adjusted by such fixed covariates. In this model, γ_1i differentiates the intercept of the intervention group from that of the control group at the i^th link. Note that this model allows each link to have its own mean and variance parameters, and those vary from one group to the other group. To compare two groups at the i^th link, we will test null hypotheses H₀: γ_1i = 0, i = 1, 2,…m. Thus, we have a total number of m hypotheses, which brings the notion of multiple testing.

Parameters in the model are estimated using an EM algorithm. In the E step, with the current values of the other parameters, we compute the “expected posteriori” or Empirical Bayes (EB) estimates of random effects as well as the conditional variances of random effects given data. In the M step, given the current estimated values of random effects, we obtain the Maximum Marginal Likelihood (MML) estimates of the regression coefficients, error variances, and variances of random effects. The algorithm iterates between the EB and MML estimates until convergence.

q-value based FDR Procedure

To detect disrupted connectivities in a neuroimaging study, we generally perform between group comparisons. In whole brain study, the number of Regions of Interest (ROI) in our example is 87 with a total number of 3471 connectivities. In a single hypothesis testing problem, we generally control the type I error rate at a pre-specified level (e. g. α = .05), and determine a rejection region to maximize the probability of detecting the true alternative hypothesis. When multiple hypotheses are tested simultaneously, it is important to control the expected proportion of false rejections among all rejections, and the concept of False Discovery Rate (FDR) [39] prevails there. The ingredients that we need to formally define the FDR are presented in Table 2.

By definition, FDR is the expected proportion of Type I errors among the rejected hypotheses.

Hence, FDR = E(V/(V + S) = E(V/R), where V isthe number of false rejections, R is the total number of rejections, and we define V/R=0, if R=0.

Algorithm to Control FDR

We start with the Benjamini and Hochberg [39] procedure to control the FDR. Consider a multiple testing problem consists of m hypotheses H₁, …, H_mwith the corresponding p values p₁…, p_m. In the literature of FDR, [54, 55] defined a q-value as an analog of the p value that incorporates FDR-based multiple testing correction. q-value is the minimum positive FDR that can occur when rejecting a statistic with a threshold for the set of nested significance regions. The general procedure of the algorithm that controls the FDR at a threshold value of q is as follows:

1) Obtain p-values of all test statistics and sort those in an increasing order, i.e. p (1) ≤ p (2) ≤ … ≤ p_(m).

2) For a given q, find the largest k for which p_(k)≤ kq/m holds.

3) Declare discoveries for all H_(i), where i = 1, 2…, k.

First the BH procedure needs a cut-off q-value, then starts from the largest p value (i.e. p_(m)) and moves towards the smallest one, consequently we reject all null hypotheses with p-values less than the aforementioned threshold.

Determination of FDR Level q

In a single hypothesis testing, generally we fix the individual type I error rate α at 5%, and try to achieve 80% power. Similarly, in multiple hypothesis testing, we need to determine a q-value. However, no such standard value of the FDR level q exists in the literature, mainly because of some other factors that play crucial roles in this process. Suppose in a multiple testing scenario with 100 (out of a total of 200) hypotheses are truly null (i.e., the proportion of nulls is p₀= 0.50), we would expect 5 null rejections (out of 100) when α is individually fixed at 5%. With 80 rejections of the hypotheses that are truly non-null (i.e., 80% power), the observed theFDR level q = 5/(5+80) ≈0.059. A second scenario also conforms to multiple testing problem, with 90 of the 100 hypotheses being truly null (i.e., p₀0.90), and 10 of these being truly false. In this scenario, we should expect observed FDR level q close to 4/(4+8) ≈0.333.

If this procedure continues for a large value of p₀(e.g. p₀=.99), q will be close to 1 which will lead to less power and is undesirable. Thus, in a multiple testing problem, the FDR level q should be adjusted with the value of p₀, together with the individual type I error rate and power, such that q does not exceed to a desired threshold (e.g., 0.20 or 0.30). In order to better understand the interrelationship between individual type I error rate

Figure 1: mFDR level vs.

Figure 2: Adjusted type I error rate (

Region	Description	Region	Description
1	cerebellum cortex	23	Middletemporal
2	thalamus proper	24	parahippocampal
3	caudate magnetic	25	paracentral
4	putamen	26	parsopercularis
5	pallidum	27	parsorbitalis
6	hippocampus	28	parstriangularis
7	amygdala	29	pericalcarine
8	accumbens area	30	postcentral
9	VentralDC	31	osteriorcingulate
10	Bankssts	32	precentral
11	caudalanteriorcingulate	33	precuneus
12	caudalmiddlefrontal	34	rostralanteriorcingulate
13	cuneus	35	rostralmiddlefrontal
14	entorhinal	36	superiorfrontal
15	fusiform	37	superiorparietal
16	inferiorparietal	38	superiortemporal
17	inferiortemporal	39	supramarginal
18	isthmuscingulate	40	frontalpole
19	lateraloccipital	41	temporalpole
20	lateralorbitofrontal	42	transversetemporal
21	lingual	43	insula
22	medialorbitofrontal	44	brain stem (central)

Table 1: Brain regions (left and right) in whole brain analysis.

	H₀ Accepted	H₀ Rejected	Total
H₀True	U	V	m₀
H₀False	T	S	m-m₀
Total	m-R	R	m

Table 2: Different types of outcomes in multiple testing.

	μ	σ	p0
Theorical Null	0	1	0.901
Efron	0.038	1.121	0.988
Jin and Cai	-0.025	1.116	0.985

Table 3: Estimates of parameters by different procedures.

		UnderConnectivity	Over Connectivity Connectivity	Total
q-value	Adaptive BH	22	37	59
	Theorical Null	18	51	69
z-value	Efron	12	11	12
	Jin and Cai	1	13	14

Table 4: Number of significant connectivities by various fdr procedures.

	Region 1		Region 2
28	Left isthmuscingulate	56	Right caudalmiddlefrontal
12	Right thalamus_Proper	56	Right caudalmiddlefrontal
19	Right ventralDC	59	Right fusiform
45	Left rostralmiddlefrontal	56	Right caudalmiddlefrontal
15	Right pallidum	56	Right caudalmiddlefrontal
15	Right pallidum	26	Left inferiorparietal
18	Right accumbens_area	28	Left isthmuscingulate
41	Left posteriorcingulate	79	Right rostralmiddlefrontal
55	Right caudalanteriorcingulate	56	Right caudalmiddlefrontal
46	Left superiorfrontal	62	Right isthmuscingulate
56	Right caudalmiddlefrontal	62	Right isthmuscingulate
25	Left fusiform	49	Left supramarginal