Class 2: One Health, BioStatistics I: The Power and Crisis

1. Introduction and Recap of Last Class

1.1. Recap of Last Class

  • Introduction to Biostatistics
  • First Part Content
  • Second Part Content
  • Laboratory

1.2. Lecture Outline

  • Motivations
  • The Power
  • Probability, Expectations & Variance
  • Intervals, Testing & p-Values
  • StatsBiol
  • Laboratory

2. Motivations

2.1. The Crisis

2.1.1. The Crisis

reproducibility-crisis.jpg

2.1.2. Objectives & Agenda

  • Critical Thinking & Objectivity
  • Bias in Knowledge & Beliefs
  • Customs & Best Practice
  • Societal Pressures

2.1.3. Where we are

  • Ask yourself

    reproducibility-graphic-online1.png

    • ~90% recognize a reproducibility crisis
  • Trust your field?

    reproducibility-graphic-online2.jpg

    • Does quantification make a difference?
    • Physicists & chemists are more confident
  • Have you?

    reproducibility-graphic-online3.jpg

    • Failed to reproduce results:
      • someone else's experiments: 60-80%
      • their own experiments: 40-60%
    • Publishing is harder for failed replications:
      • 13% have published a failed reproduction
      • vs 24% a successful reproduction
  • Why?

    reproducibility-graphic-online4.jpg

    • ~70% cite fraud as a contributing factor
    • >80% cite poor experimental design
    • ~90% cite selective reporting & pressure to publish
  • What to Change?

    reproducibility-graphic-online5.jpg

    • ~90%: better statistical understanding
    • More robust experimental design
    • Better mentoring
    • Better practices
  • Did you?

    reproducibility-graphic-online6.jpg

    • 34% have taken no action
    • 33% took action within the last 5 yrs
    • 7% more than 5 yrs ago
    • 26% from the beginning
2.1.4. Replication Studies

  • The replication crisis is a serious issue: many scientific studies are difficult to reproduce or replicate.
  • In cancer research, only about 10-25% of published studies could be validated or reproduced.
  • In psychology, only about 36% of studies were reproduced.
  • Other affected fields:
    • Medicine
    • Genetics
    • Economics
    • Neuroscience

2.1.5. Reasons

  • Inappropriate scientific practices:
    • HARKing (Hypothesizing After the Results are Known)
    • p-hacking
    • Selective reporting of positive results
    • Poor research design
    • Lack of raw data

2.1.6. Pharma

  • Bayer

    BayerRep.png

    • Oncology, women's health, cardiovascular.
    • 65% of published findings were not reproducible.
  • Amgen

    AmgenPharma.png

    • Oncology and hematology.
    • Of 53 landmark studies, only 6 (11%) were confirmed.
2.1.7. Give me the Power

  • Power Failure
    • The median statistical power in neuroscience is 18-21%.
    • Neuroimaging studies: ~8%.
    • Animal model studies: 18-31%.

    NeuroPowerDist.png

  • Sample Size

    NeuroPowerTable.png

  • Power Effects

    NeuroPower.png

    • Low power
      • The chance of discovering effects that are genuinely true is low.
      • Low-powered studies produce more false negatives than high-powered studies.
    • Low PPV, the positive predictive value (see the sketch after this list)
      • \(PPV = \frac{(1 - \beta)\, R}{(1 - \beta)\, R + \alpha}\), where \(R\) is the pre-study odds that the probed effect is real.
    • Effect inflation
      • Effect inflation occurs whenever claims of discovery are based on thresholds of statistical significance
        • for example, p < 0.05, or other selection filters.
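
    A minimal Python sketch of the PPV formula above (illustrative, not lecture code); PPV collapses when power or the pre-study odds \(R\) are low:

      # PPV as a function of power (1 - beta), pre-study odds R, and alpha.
      def ppv(power, R, alpha=0.05):
          """Positive predictive value: P(true effect | significant result)."""
          return (power * R) / (power * R + alpha)

      print(ppv(0.80, R=1.0))   # well-powered, plausible hypothesis: ~0.94
      print(ppv(0.20, R=0.1))   # low-powered long shot: ~0.29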
  • Low power and other biases
    • Low-powered studies are more likely to provide a wide range of estimates of the magnitude of an effect.
    • Publication bias, selective data analysis and selective reporting are more likely to affect low-powered studies.
    • Small studies may be of lower quality in other aspects of their design as well.
  • More Power

    The probability that a research finding is indeed true depends on:

    • the prior probability of it being true (before doing the study),
    • the statistical power of the study,
    • and the level of statistical significance.
  • PPV
    • After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV.
  • Graphical Assessment

    PowerGraph.png

  • Power & Bias

                          True relationship
    Research finding      Yes                       No                       Total
    Yes                   \(c(1-\beta)R/(R+1)\)     \(c\alpha/(R+1)\)        \(c(R+\alpha-\beta R)/(R+1)\)
    No                    \(c\beta R/(R+1)\)        \(c(1-\alpha)/(R+1)\)    \(c(1-\alpha+\beta R)/(R+1)\)
    Total                 \(cR/(R+1)\)              \(c/(R+1)\)              \(c\)

    • (\(c\) = the number of relationships being probed in the field)
  • Bias

    A combination of various factors that tend to produce research findings when they should not be produced, including:

    • Design
    • Data
    • Analysis
    • Presentation factors
  • Corollaries

    PPV.png

    1. The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
    2. The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
    3. The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true.
    4. The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
    5. The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
    6. The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
  • Some Estimates

    \(1-\beta\)   R        Bias, \(u\)   Example                                                                        PPV
    0.80          1:1      0.10          Adequately powered RCT with little bias and 1:1 pre-study odds                0.85
    0.95          2:1      0.30          Confirmatory meta-analysis of good-quality RCTs                               0.85
    0.80          1:10     0.30          Adequately powered exploratory epidemiological study                          0.20
    0.20          1:1000   0.20          Discovery-oriented exploratory research with massive testing, limited bias    0.0015

    • RCT = randomized controlled trial. (A numeric check of these values follows.)
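
    These estimates follow the bias-adjusted PPV formula from Ioannidis (2005), \(PPV = \frac{(1-\beta)R + u\beta R}{R + \alpha - \beta R + u - u\alpha + u\beta R}\). A minimal check in Python (illustrative only):

      # Bias-adjusted PPV; u is the proportion of analyses that become
      # "findings" only because of bias (Ioannidis 2005).
      def ppv_bias(power, R, u, alpha=0.05):
          beta = 1 - power
          num = power * R + u * beta * R
          den = R + alpha - beta * R + u - u * alpha + u * beta * R
          return num / den

      for power, R, u in [(0.80, 1.0, 0.10), (0.95, 2.0, 0.30),
                          (0.80, 0.1, 0.30), (0.20, 0.001, 0.20)]:
          print(ppv_bias(power, R, u))  # ~0.85, 0.85, 0.20, 0.0015, matching the table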
2.1.8. Erroneous Interactions

  • Even the best families: top-ranking journals
    • Behavioural and systems neuroscience.
      • Only ~50% used correct procedures to compare two experimental effects.
      • In ~2/3 of the erroneous cases the error may have had serious consequences.

      ErrInteractions1.png

  • Even the best families: top-ranking journals
    • Cellular and molecular neuroscience.
      • Of 120 additional articles, none used the correct statistical procedure to compare effect sizes.
      • 25 used incorrect procedures to compare significance levels.
  • Comparison Errors

    ErrInteractions2.png

    • Three situations where effect-size comparisons are incorrectly made.
2.1.9. Data Sets

  • Raw data withdrawal

    RawDataRequest.png

  • Absence of raw data means absence of science
  • Open Science, Open Data
3. The Power

    • After a break?

    3.1. Objectives

    • Probability & Statistics
    • Descriptive/Exploratory
    • Inference
    • Hypothesis Testing
    • Some Recommendations for Biology

    3.2. Intro

    3.2.1. Basic Definition

    • Statistical inference is the process of drawing formal conclusions from data.
    • Statistical inference occurs when one wants to infer facts about a population from noisy data, where uncertainty must be taken into account.
    • Statistical inference requires assessing assumptions and tools, and thinking about how to draw conclusions from data.

    3.2.2. Some Inference Goals

    • Benchmarking
      • Effectiveness of a treatment
    • Quantify
      • Proportion of voting
    • Relationship
      • Slope of Hooke's law
    • Impact
      • Confinements
    • Probability
      • Raining tomorrow

    3.2.3. Some tools in Inference

    • Randomization.
      • Unobserved variables may confound inferences of interest.
    • Random sampling.
      • Data representative of a population.
    • Sampling models.
      • Creating a model for the sampling process.
      • Independent Identically Distributed (i.i.d).
    • Hypothesis testing.
      • Decision making under uncertainty.
    • Confidence intervals.
      • Quantify uncertainty in estimation.
    • Probability Models.
      • Formal connection between the data and population of interest.
    • Study Design.
      • Experiment to minimize biases and variability.
    • Nonparametric bootstrapping.
      • Using the data itself to perform inference with minimal probability-model assumptions (see the sketch after this list).
    • Permutation.
      • Randomization and exchangeability testing to perform inferences.
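
    A minimal nonparametric bootstrap sketch in Python (hypothetical data, for illustration): resample the observed values with replacement and recompute the statistic, giving a confidence interval with minimal distributional assumptions.

      import numpy as np

      rng = np.random.default_rng(42)
      data = rng.normal(loc=5.0, scale=2.0, size=30)  # stand-in for an observed sample

      # Resample with replacement, recompute the mean each time.
      boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                             for _ in range(10_000)])

      # Percentile 95% confidence interval for the population mean.
      lo, hi = np.percentile(boot_means, [2.5, 97.5])
      print(lo, hi)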

    3.2.4. Schools of Inference

    • Frequentist probability & inference
      • Probability is the long-run proportion of times an event occurs in independent, identically distributed repetitions.
      • Uses this interpretation of probability to control error rates.
      • Given my data, I control the long-run proportion of mistakes I make at a tolerable level.
    • Bayesian probability & inference
      • Probability is a calculus of beliefs, which follows certain rules.
      • Inference is performed via the Bayesian probability representation of beliefs.
      • Combines subjective beliefs with the objective information from the data to draw inferences.

4. Probability, Expectations & Variance

    4.1. Probability

    4.1.1. Probability Definition

    Given a random variable (an experiment, say rolling a die), a probability measure is a population quantity that summarizes its randomness. A probability measure is:

    • a number between 0 and 1;
    • such that the probability that something occurs is 1 (the die must be rolled); and
    • such that the probability of the union of any two sets of outcomes that have nothing in common (mutually exclusive) is the sum of their respective probabilities.

    4.1.2. Rules probability must follow

    The Russian mathematician Andrey Nikolaevich Kolmogorov formalized these rules.

    • The probability that nothing occurs is 0.
    • The probability that something occurs is 1.
    • The probability of something is 1 minus the probability that the opposite occurs.
    • The probability of at least one of two or more things that cannot simultaneously occur (mutually exclusive) is the sum of their respective probabilities.
    • More interestingly:
      • If an event \(A\) implies the occurrence of event \(B\), then the probability of \(A\) occurring is less than the probability that \(B\) occurs.
      • For any two events, the probability that at least one occurs is the sum of their probabilities minus the probability of their intersection.

    4.1.3. Simple Example

    • Event/condition X has an incidence of 3% in the population.
    • Whereas 10% of the population has event/condition Y.
    • Does this imply that 13% of people will have at least one of these events/conditions?
      • Answer: NO. If the events can occur simultaneously, they are not mutually exclusive.
    • Let:
    \begin{eqnarray*} A_1 & = & \{\mbox{Event X}\} \\ A_2 & = & \{\mbox{Event Y}\} \end{eqnarray*}
    • Then
    \begin{eqnarray*} P(A_1 \cup A_2 ) & = & P(A_1) + P(A_2) - P(A_1 \cap A_2) \\ & = & 0.13 - \mbox{Probability of having both} \end{eqnarray*}
    • Likely, some fraction of the population has both, so the answer is less than 13% (see the simulation below).
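
    A quick simulation check (Python, illustrative; independence of X and Y is a hypothetical assumption): with independent \(P(X) = 0.03\) and \(P(Y) = 0.10\), \(P(X \cap Y) = 0.003\), so \(P(X \cup Y) = 0.127\), not 0.13.

      import numpy as np

      rng = np.random.default_rng(0)
      n = 1_000_000
      x = rng.random(n) < 0.03   # event/condition X
      y = rng.random(n) < 0.10   # event/condition Y

      print(np.mean(x | y))                # ~0.127, empirical P(X or Y)
      print(0.03 + 0.10 - np.mean(x & y))  # same value by inclusion-exclusion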

    4.1.4. Random variables

    • A random variable is a numerical outcome of an experiment.
    • Random variables come in two varieties, discrete and continuous.
      • Discrete random variables take on only a countable number of possibilities; we assign a probability to each specific value.
      • Continuous random variables can take any value on the real line, or some subset of it; we assign probabilities to ranges of values.

    4.1.5. Quantiles

    • Famous sample quantiles:
      • If you scored in the 95th percentile on an exam, 95% of people scored worse than you and 5% scored better.
    • These have population analogs.
    • Definition
      • The \(\alpha^{th}\) quantile of a distribution with distribution function \(F\) is the point \(x_\alpha\) so that \[F(x_\alpha) = \alpha\]
      • A percentile is simply a quantile with \(\alpha\) expressed as a percent.
      • The median is the \(50^{th}\) percentile.
    • For example
      • The \(75^{th}\) percentile of a distribution is the point such that:
        • the probability that a random variable drawn from the population is less than it is 75%;
        • the probability that a random variable drawn from the population is more than it is 25%.
      • (A short sketch follows.)
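
    A minimal sketch of sample vs population quantiles (Python with numpy/scipy, illustrative only):

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      sample = rng.normal(size=1_000)

      print(np.quantile(sample, 0.75))  # sample 75th percentile, ~0.67
      print(stats.norm.ppf(0.75))       # population analog for a standard normal, 0.674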
    4.1.6. Conditional Probability

    • Motivating example
      • The probability of getting a one when rolling a (standard) die is usually assumed to be one sixth.
      • Suppose you were given the extra information that the die roll was an odd number (hence 1, 3 or 5).
      • Conditional on this new information, the probability of a one is now one third.
    • Definition
      • Let \(B\) be an event so that \(P(B) > 0\).
      • Then the conditional probability of an event \(A\) given that \(B\) has occurred is \(P(A ~|~ B) = \frac{P(A \cap B)}{P(B)}\)
      • Notice that if \(A\) and \(B\) are independent, then \(P(A ~|~ B) = \frac{P(A) P(B)}{P(B)} = P(A)\)
      • \(\cap = \mbox{intersection}\)
    • Little example
      • Consider our die roll example, \(P(\mbox{one given that roll is odd}) = P^*\).
        • \(A = \{1\}\) and \(B = \{1, 3, 5\}\). Then

          \begin{eqnarray*} P^* & = & P(A ~|~ B) \\ \\ & = & \frac{P(A \cap B)}{P(B)} \\ \\ & = & \frac{P(A)}{P(B)} = \frac{1/6}{3/6} = \frac{1}{3} \end{eqnarray*}
    4.1.7. Bayes' rule

    • Bayes' rule allows us to reverse the conditioning set provided that we know some marginal probabilities:
      • \[ P(B ~|~ A) = \frac{P(A ~|~ B) P(B)}{P(A ~|~ B) P(B) + P(A ~|~ \neg B)P(\neg B)}\]
      • where \(P(\neg B)\) is the initial degree of belief in not-\(B\) (\(B\) is false), and \(P(\neg B) = 1 - P(B)\).
    • Diagnostic tests
      • Let \(+\) and \(-\) be the events that the result of a diagnostic test is positive or negative respectively.
      • Let \(D\) and \(D^c\) be the events that the subject of the test has or does not have the disease respectively.
      • The sensitivity is the probability that the test is positive given that the subject actually has the disease, \(P(+ ~|~ D)\).
      • The specificity is the probability that the test is negative given that the subject does not have the disease, \(P(- ~|~ D^c)\).
    • More definitions
      • The positive predictive value is the probability that the subject has the disease given that the test is positive, \(P(D ~|~ +)\).
      • The negative predictive value is the probability that the subject does not have the disease given that the test is negative, \(P(D^c ~|~ -)\).
      • The prevalence of the disease is the marginal probability of disease, \(P(D)\).
    4.1.8. Using Bayes' formula

    • Suppose a test has sensitivity \(P(+|D) = 0.997\), specificity \(P(-|D^c) = 0.985\), and the disease has prevalence \(P(D) = 0.001\). Then:

    \begin{eqnarray*} P(D | +) & = &\frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|D^c)P(D^c)}\\ \\ & = & \frac{P(+|D)P(D)}{P(+|D)P(D) + \{1-P(-|D^c)\}\{1 - P(D)\}} \\ \\ & = & \frac{.997\times .001}{.997 \times .001 + .015 \times .999} = 0.062 \end{eqnarray*}
    • Then,
      • A positive test result only implies about a 6% probability that the subject actually has the disease.
      • The positive predictive value is 6% for this test (see the sketch below).
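
    The same computation as a small Python sketch (hypothetical helper names), which also previews the diagnostic likelihood ratio used in the next subsection:

      def positive_predictive_value(sens, spec, prev):
          """P(D | +) via Bayes' rule for a diagnostic test."""
          return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

      sens, spec, prev = 0.997, 0.985, 0.001
      print(positive_predictive_value(sens, spec, prev))  # ~0.062

      # Equivalently with the positive diagnostic likelihood ratio:
      dlr_plus = sens / (1 - spec)              # ~66.5
      post_odds = dlr_plus * prev / (1 - prev)  # post-test odds of disease
      print(post_odds / (1 + post_odds))        # same ~0.062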

    4.1.9. Likelihood ratios, using Bayes rule

    \[ P(D|+) = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|D^c)P(D^c)} \] \[P(D^c|+) = \frac{P(+|D^c)P(D^c)}{P(+|D)P(D) + P(+|D^c)P(D^c)}\]

    • Therefore
      • \[\frac{P(D|+)}{P(D^c|+)} = \frac{P(+|D)}{P(+|D^c)}\times \frac{P(D)}{P(D^c)}\] i.e. \[\mbox{post-test odds of }D = DLR_+\times\mbox{pre-test odds of }D\]
        • \(DLR_+\) is the diagnostic likelihood ratio for a positive test result.
        • Similarly, \(DLR_-\) relates the decrease in the odds of the disease after a negative test result to the odds of disease prior to the test.

    4.2. Expectations

    4.2.1. Expected values

    • Expected values are useful for characterizing a distribution.
    • The mean is a characterization of its center.
    • The variance and standard deviation are characterizations of how spread out it is.
    • Our sample expected values (the sample mean and variance) will estimate the population versions.

    4.2.2. The population mean

    • The expected value or mean of a random variable is the center of its distribution.
    • For a discrete random variable \(X\) with PMF \(p(x)\), it is defined as follows: \[ E[X] = \sum_x xp(x). \] where the sum is taken over the possible values of \(x\).
    • \(E[X]\) represents the center of mass of a collection of locations and weights, \(\{x, p(x)\}\)

    4.2.3. The sample mean

    • The sample mean estimates this population mean.
    • The center of mass of the data is the empirical mean:

    \[ \bar X = \sum_{i=1}^n x_i p(x_i) \] where \(p(x_i) = 1/n\)

    4.2.4. Example

    • Find the center of mass of the bars

      galton.png

    4.2.5. What about a biased coin?

    • Suppose that a random variable, \(X\), is so that
    • \(P(X=1) = p\) and \(P(X=0) = (1 - p)\)
    • (This is a biased coin when \(p\neq 0.5\))
    • What is its expected value?
    • \[E[X] = 0 * (1 - p) + 1 * p = p\]

    4.2.6. Continuous random variables

    • For a continuous random variable, \(X\), with density, \(f\), the expected value is again exactly the center of mass of the density.

    4.2.7. Summary of Expected Values

    • Expected values are properties of distributions.
    • The average of random variables is itself a random variable and its associated distribution has an expected value.
    • The center of this distribution is the same as that of the original distribution.
    • Therefore, the expected value of the sample mean is the population mean that it is trying to estimate.
    • When the expected value of an estimator is what it is trying to estimate, we say that the estimator is unbiased.

    4.3. Variance

    4.3.1. The variance

    • The variance of a random variable is a measure of spread
    • If \(X\) is a random variable with mean \(\mu\), the variance of \(X\) is defined as
      • \(Var(X) = E[(X - \mu)^2] = E[X^2] - E[X]^2\)
    • The expected (squared) distance from the mean
    • Densities with a higher variance are more spread out than densities with a lower variance
    • The square root of the variance is called the standard deviation
    • The standard deviation has the same units as \(X\)

    4.3.2. Examples Variance

    • Example 1
      • What’s the variance from the result of a toss of a die?
        • \(E[X] = 3.5\)
        • \(E[X^2] = 1 ^ 2 \times \frac{1}{6} + 2 ^ 2 \times \frac{1}{6} + 3 ^ 2 \times \frac{1}{6} + \\ 4 ^ 2 \times \frac{1}{6} + 5 ^ 2 \times \frac{1}{6} + 6 ^ 2 \times \frac{1}{6} = 15.17\)
      • \(Var(X) = E[X^2] - E[X]^2 \approx 2.92\)

  • Example 2
    • What's the variance from the result of the toss of a coin with probability of heads (1) of \(p\)? (A simulation check follows.)
      • \(E[X] = 0 \times (1 - p) + 1 \times p = p\)
      • \(E[X^2] = E[X] = p\)

    \[Var(X) = E[X^2] - E[X]^2 = p - p^2 = p(1 - p)\]
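
    A quick numerical check of both examples (Python, illustrative):

      import numpy as np

      rng = np.random.default_rng(2)

      # Die: Var(X) = E[X^2] - E[X]^2 = 91/6 - 3.5^2 ~ 2.92.
      rolls = rng.integers(1, 7, size=1_000_000)
      print(rolls.var())  # ~2.92

      # Biased coin with p = 0.3: Var(X) = p(1 - p) = 0.21.
      flips = rng.random(1_000_000) < 0.3
      print(flips.var())  # ~0.21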

    4.3.3. The sample variance

    • The sample variance is
      • \(S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}\)
      • (almost, but not quite, the average squared deviation from the sample mean)
    • It is also a random variable
      • It has an associated population distribution.
      • Its expected value is the population variance.
      • Its distribution gets more concentrated around the population variance with more data.
    • Its square root is the sample standard deviation.

    4.3.4. Recall the mean

    • Recall that the average of a random sample from a population is itself a random variable.
    • We know that this distribution is centered around the population mean, \(E[\bar X] = \mu\).
    • We also know what its variance is: \(Var(\bar X) = \sigma^2 / n\).
    • This is very useful, since we don't have repeated sample means to estimate their variance; now we know how it relates to the population variance.
    • We call the standard deviation of a statistic its standard error.

    4.3.5. To summarize

    • The sample variance, \(S^2\), estimates the population variance, \(\sigma^2\).
    • The distribution of the sample variance is centered around \(\sigma^2\).
    • The variance of the sample mean is \(\sigma^2 / n\).
      • Its logical estimate is \(S^2 / n\).
      • The logical estimate of the standard error is \(S / \sqrt{n}\).
    • \(S\), the standard deviation, talks about how variable the population is.
    • \(S/\sqrt{n}\), the standard error, talks about how variable averages of random samples of size \(n\) from the population are (see the sketch below).
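
    A minimal simulation sketch (Python, illustrative) showing that the standard deviation of many sample means matches \(\sigma/\sqrt{n}\):

      import numpy as np

      rng = np.random.default_rng(3)
      sigma, n = 2.0, 25

      # 10,000 samples of size n; take each sample's mean.
      means = rng.normal(0, sigma, size=(10_000, n)).mean(axis=1)

      print(means.std())         # ~0.40, empirical standard error
      print(sigma / np.sqrt(n))  # 0.40, theoretical sigma / sqrt(n)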

    4.3.6. Summarizing what we know about variances

    • The sample variance estimates the population variance
    • The distribution of the sample variance is centered at what its estimating
    • It gets more concentrated around the population variance with larger sample sizes
    • The variance of the sample mean is the population variance divided by \(n\)
      • The square root is the standard error

5. Intervals, Testing & p-Values

    5.1. Hypothesis Testing

    5.1.1. Hypothesis testing

    • Hypothesis testing is concerned with making decisions using data.
    • A null hypothesis is specified that represents the status quo, usually labeled \(H_0\).
    • The null hypothesis is assumed true and statistical evidence is required to reject it in favor of a research or alternative hypothesis.

    5.1.2. Hypothesis testing decision

    • The alternative hypotheses are typically of the form \(<\), \(>\) or \(\neq\).
    • Note that there are four possible outcomes of our statistical decision process:

      Truth      Decide     Result
      \(H_0\)    \(H_0\)    Correctly accept null
      \(H_0\)    \(H_a\)    Type I error
      \(H_a\)    \(H_a\)    Correctly reject null
      \(H_a\)    \(H_0\)    Type II error

    5.1.3. General rules

    • The \(Z\) test for \(H_0:\mu = \mu_0\), versus
      • \(H_1: \mu < \mu_0\)
      • \(H_2: \mu \neq \mu_0\)
      • \(H_3: \mu > \mu_0\)
    • Test statistic \(TS = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}\)
    • Reject the null hypothesis when, respectively,
      • \(TS \leq Z_{\alpha} = -Z_{1 - \alpha}\)
      • \(|TS| \geq Z_{1 - \alpha / 2}\)
      • \(TS \geq Z_{1 - \alpha}\)
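
    A minimal numeric sketch of these rejection rules (Python with scipy; the sample summary is hypothetical):

      import numpy as np
      from scipy import stats

      # Test H0: mu = 30 against H3: mu > 30.
      xbar, mu0, s, n, alpha = 32.0, 30.0, 6.0, 36, 0.05

      ts = (xbar - mu0) / (s / np.sqrt(n))  # Z statistic = 2.0
      z_crit = stats.norm.ppf(1 - alpha)    # Z_{1 - alpha} = 1.645

      print(ts >= z_crit)  # True: reject H0 at the 5% level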

    5.1.4. Notes

    • We:
      • fix \(\alpha\) to be low, so if we reject \(H_0\), either our model is wrong or there is a low probability that we have made an error;
      • have not fixed the probability of a type II error, \(\beta\); so we tend to say "fail to reject \(H_0\)" rather than "accept \(H_0\)".
    • Statistical significance is not the same as scientific significance.

    5.1.5. Connections with confidence intervals

    • Consider testing \(H_0: \mu = \mu_0\) versus \(H_a: \mu \neq \mu_0\).
    • Take the set of all possible values for which you fail to reject \(H_0\), this set is a \((1-\alpha)100\%\) confidence interval for \(\mu\).
    • The same works in reverse; if a \((1-\alpha)100\%\) interval contains \(\mu_0\), then we fail to reject \(H_0\).

    5.2. p-Values

    5.2.1. P-values

    • Most common measure of statistical significance.
    • Their ubiquity, along with concern over their interpretation and use, makes them controversial among statisticians.

    5.2.2. What is a P-value?

    Idea: Suppose nothing is going on - how unusual is it to see the estimate we got? Approach:

    1. Define the hypothetical distribution of a data summary (statistic) when “nothing is going on” (null hypothesis)
    2. Calculate the summary/statistic with the data we have (test statistic)
    3. Compare what we calculated to our hypothetical distribution and see if the value is “extreme” (p-value)

    5.2.3. P-values

    • The P-value is the probability under the null hypothesis of obtaining evidence as extreme or more extreme than that obtained
    • If the P-value is small, then either \(H_0\) is true and we have observed a rare event or \(H_0\) is false
    • Suppose that you get a \(T\) statistic of \(2.5\) for 15 df testing \(H_0:\mu = \mu_0\) versus \(H_a : \mu > \mu_0\).
      • What’s the probability of getting a \(T\) statistic as large as \(2.5\)?

    • Therefore, the probability of seeing evidence as extreme or more extreme than that actually obtained under \(H_0\) is 0.0123

    5.2.4. The attained significance level

    • Recall the \(Z\) test sketch above: our test statistic was \(2\) for \(H_0 : \mu_0 = 30\) versus \(H_a:\mu > 30\).
    • Notice that we rejected the one-sided test when \(\alpha = 0.05\); would we reject if \(\alpha = 0.01\)? How about \(0.001\)?
    • The smallest value of \(\alpha\) at which you would still reject the null hypothesis is called the attained significance level.
    • This is equivalent to, but philosophically a little different from, the P-value.

    5.2.5. Notes

    • By reporting a p-value, the reader can perform the hypothesis test at whatever \(\alpha\) level they choose.
    • If the p-value is less than \(\alpha\), you reject the null hypothesis.
    • For a two-sided hypothesis test, double the smaller of the two one-sided P-values.

6. StatsBiol

    Networks and Science Complexity

    6.1. Descriptive Statistics

    • Standard deviation, \[s.d.=\sqrt{\sum(X-\bar{X})^2/(N-1)}\]
      • Meaning: the typical difference between each value and the mean value.
      • Use: describing how broadly the sample values are distributed.
    • Standard error of the mean (s.e.m.), \[s.e.m.=s.d./\sqrt{N}\]
      • Meaning: an estimate of how variable the means will be if the experiment is repeated multiple times.
      • Use: inferring where the population mean is likely to lie, or whether sets of samples are likely to come from the same population.
    • Confidence interval (95% CI), \[CI=mean\pm s.e.m. \times t_{(N-1)}\]
      • Meaning: with 95% confidence, the population mean will lie in this interval.
      • Use: to infer where the population mean lies, and to compare two populations.
    • Independent data
      • Meaning: values from separate samples of the same type that are not linked.
      • Use: testing hypotheses about the population.
    • Replicate data
      • Meaning: values from repeated measurements of the same experiment, kept as similar as possible.
      • Use: serves as an internal check on the performance of an experiment.
    • Sampling error
      • Meaning: variation caused by sampling part of a population rather than measuring the whole population.
      • Use: can reveal bias in the data or problems with the conduct of the experiment. For a binomial distribution the expected s.d. is \[\sqrt{Np(1-p)}\]; for a Poisson distribution the expected s.d. is \[\sqrt{mean}\].
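
    A minimal sketch computing these quantities for a hypothetical sample (Python with scipy for the t quantile):

      import numpy as np
      from scipy import stats

      x = np.array([4.2, 5.1, 4.8, 5.6, 4.9, 5.3])  # hypothetical measurements
      N = x.size

      sd = x.std(ddof=1)                # sample s.d., N - 1 in the denominator
      sem = sd / np.sqrt(N)             # standard error of the mean
      t = stats.t.ppf(0.975, df=N - 1)  # two-sided 95% t quantile
      print(sd, sem, (x.mean() - t * sem, x.mean() + t * sem))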

    6.2. Statistical Hypothesis Testing

    • Null hypothesis examples
      • The null hypothesis for Pearson's correlation test is that there is no relationship between the two variables.
      • The null hypothesis for Student's t test is that there is no difference between the means of two populations.

    6.3. p-value (p)

    • A p-value is the probability of observing the result given that the null hypothesis is true
      • and not the reverse, as is often the case with misinterpretations.
    • \(p \le \alpha\): reject \(H_0\) (e.g., conclude the distributions differ).
    • \(p > \alpha\): fail to reject \(H_0\) (e.g., cannot conclude the distributions differ).

    6.4. Errors

    • There are two types of error:
    • Type I error: rejecting the null hypothesis when there is in fact no significant effect - a false positive.
      • The p-value is optimistically small.
    • Type II error: not rejecting the null hypothesis when there is a significant effect - a false negative.
      • The p-value is pessimistically large.

    6.5. What Is Statistical Power?

    • Statistical power, or the power of a hypothesis test, is the probability that the test correctly rejects the null hypothesis.
      • Power = 1 - Type II Error
      • Pr(True Positive) = 1 - Pr(False Negative)

    More intuitively, the statistical power can be thought of as the probability of accepting an alternative hypothesis, when the alternative hypothesis is true.

    • Low Statistical Power:
      • Large risk of committing Type II errors.
    • High Statistical Power:
      • Small risk of committing Type II errors.

    6.6. Statistical Power

    • The statistical power of a hypothesis test is the probability of detecting an effect, if there is a true effect present to detect.

    6.7. Power Analysis

    • Effect Size.
      • The quantified magnitude of a result present in the population.
      • Effect size is calculated using a specific statistical measure, such as Pearson’s correlation coefficient for the relationship between variables.
    • Sample Size.
      • The number of observations in the sample.
    • Significance.
      • The significance level used in the statistical test, e.g. alpha. Often set to 5% or 0.05.
    • Statistical Power.
      • The probability of accepting the alternative hypothesis if it is true.
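
    These four quantities are linked: fixing any three determines the fourth. A minimal sketch (Python with statsmodels, assuming a two-sample t test; the library choice is illustrative, not the lecture's):

      from statsmodels.stats.power import TTestIndPower

      # Sample size per group to detect a medium effect (Cohen's d = 0.5)
      # with alpha = 0.05 and power = 0.80.
      n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
      print(round(n))  # ~64 per group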

7. Class Recap

    • Motivations
    • The Power
    • Probability, Expectations & Variance
    • Intervals, Testing & p-Values
    • StatsBiol
    • Laboratory

8. Laboratory