From this 2011 paper:
They estimate the time from the width of the DNA blocks that form through admixture. They treat all the admixture as having happened at a single moment, rather than in several episodes at different times.
I highlighted in red how they try to deal with the resulting inaccuracy, but I don't understand what it means.
Results and discussion
Overview of the method
The idea behind the method is straightforward: when two populations admix, genetic recombination starts breaking ancestral genomes into blocks of different sizes, so that the genomes of the descendants of an admixture event are composed of different combinations of these ancestral blocks (Figure 1). Hence, by screening the genome of an individual of mixed ancestry, we identify stretches of the genome that are inherited from either of the ancestral populations. Moreover, the structure of an admixed genome contains information on the timing of the admixture event itself. The number of admixture blocks reflects past recombination events, and similarly the width of such blocks also contains temporal information, as more recombination events would result in narrower blocks that are more evenly spread along and among chromosomes.
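A toy simulation can make the "more recombination, narrower blocks" intuition concrete. This is a sketch under simplifying assumptions that are not from the paper: one haploid chromosome of 1 Morgan, recombination breakpoints accumulated over t generations modeled as a Poisson process with rate t per Morgan, and each segment between breakpoints independently assigned to population A with probability m. The function names `simulate_blocks` and `mean_block_width` are hypothetical.

```python
import random


def simulate_blocks(t, length_morgans=1.0, m=0.5, rng=None):
    """Ancestry block widths (Morgans) on one haploid chromosome, toy model.

    Assumed model (not the paper's): t generations after a single admixture
    pulse, breakpoints form a Poisson process with rate t per Morgan, and each
    segment between breakpoints comes from population A with probability m.
    """
    rng = rng or random.Random()
    # Breakpoint positions: exponential gaps give a Poisson process of rate t.
    cuts = [0.0]
    x = rng.expovariate(t) if t > 0 else length_morgans
    while x < length_morgans:
        cuts.append(x)
        x += rng.expovariate(t)
    cuts.append(length_morgans)
    # Label each segment, then merge neighbouring segments with the same ancestry.
    widths, label, block_start = [], None, 0.0
    for start in cuts[:-1]:
        new_label = "A" if rng.random() < m else "B"
        if new_label != label:
            if label is not None:
                widths.append(start - block_start)
            label, block_start = new_label, start
    widths.append(length_morgans - block_start)
    return widths


def mean_block_width(t, n_chromosomes=2000, seed=1):
    """Average block width over many independently simulated chromosomes."""
    rng = random.Random(seed)
    widths = []
    for _ in range(n_chromosomes):
        widths.extend(simulate_blocks(t, rng=rng))
    return sum(widths) / len(widths)


if __name__ == "__main__":
    for t in (5, 10, 20, 40):
        print(t, round(mean_block_width(t), 4))
```

The printed mean width shrinks as the number of generations t grows; under this toy model it falls off roughly like 1/t, which is the kind of temporal signal the method relies on.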
We start by performing a sequential stepwise PCA (StepPCO) along each chromosome of an individual from an admixed population and of individuals from the respective parental populations.
We consider an admixed population as a mixture of two ancestral populations, in which the admixture occurred at a single timepoint, and assume that no genetic drift occurred after the admixture event. These, of course, are simplifying assumptions, as most human populations are expected not only to have many incidents of admixture occurring at different points in time, and between different populations, but also to experience genetic drift relative to the parental populations.
We try to circumvent this issue by finding the first principal axis (PA1) from the samples of the proposed ancestral populations (or their proxies) and then projecting the admixed dataset onto the axis of variation defined by these ancestral populations, thereby excluding any signal that could potentially originate from drift and/or other sources of ancestry [8,20]. We then consider a sliding window along each chromosome. The size of this window is not fixed; at each position it is determined by the statistical properties of the collection of SNPs in the window.
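The projection step can be illustrated with a minimal NumPy sketch: the principal axis is fitted on the parental samples only, and the admixed genomes are merely projected onto it, so variation that exists only in the admixed sample cannot reshape the axis. Assumptions not taken from the paper: genotypes coded 0/1/2 in an individuals × SNPs matrix, PA1 obtained via SVD of the mean-centered parental data, synthetic allele frequencies, and the hypothetical helper names `first_principal_axis`, `project`, and `demo`.

```python
import numpy as np


def first_principal_axis(parental):
    """Mean and PA1 loadings (one weight per SNP) from the parental samples only."""
    mean = parental.mean(axis=0)
    # Rows of vt are the principal axes of the centered parental data.
    _, _, vt = np.linalg.svd(parental - mean, full_matrices=False)
    return mean, vt[0]


def project(genotypes, mean, pa1):
    """Coordinates of samples on the parental PA1; admixed data never shape the axis."""
    return (genotypes - mean) @ pa1


def demo(seed=0, n_snps=200, n_per_pop=50):
    """Mean PA1 coordinate of parental A, parental B and toy admixed samples."""
    rng = np.random.default_rng(seed)
    fa = rng.uniform(0.1, 0.9, n_snps)
    fb = np.clip(fa + rng.normal(0.0, 0.3, n_snps), 0.01, 0.99)
    pop_a = rng.binomial(2, fa, size=(n_per_pop, n_snps)).astype(float)
    pop_b = rng.binomial(2, fb, size=(n_per_pop, n_snps)).astype(float)
    # Toy "admixed" genomes: one allele drawn with A's frequencies, one with B's.
    adm = (rng.binomial(1, fa, size=(20, n_snps))
           + rng.binomial(1, fb, size=(20, n_snps))).astype(float)
    mean, pa1 = first_principal_axis(np.vstack([pop_a, pop_b]))
    return tuple(project(g, mean, pa1).mean() for g in (pop_a, pop_b, adm))


if __name__ == "__main__":
    print(demo())
```

In this toy setup the admixed samples land between the two parental clusters on PA1. The sign of PA1 is arbitrary, so only the relative order of the three means is meaningful.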
We take evenly spaced points along each chromosome (evenly spaced in terms of genetic distance), and each point serves as the center of a window. The number of points (windows) is chosen so that the windows span the entire chromosome, leaving no gaps in between. To simplify the subsequent wavelet transform analysis, we also want the number of windows (or bins) to be a power of two. Starting from the center of each window, we enlarge the window until the mean PC1 coordinates of the parental populations are separated by three standard deviations from each mean. The goal is to achieve a complete separation of the parental populations within each window, so that there is no ambiguity in assigning chromosomal segments in an admixed genome to either ancestral population. Because human populations are closely related, there is an obvious trade-off between signal resolution and uncertainty in ancestry estimation; by making the size of the window variable and dependent on the number of informative sites within a given chromosomal region, we always find the smallest possible window that gives optimal signal resolution without introducing excess errors into the ancestry estimation. Using the PA1 coefficients as weights, we find the average value of the SNPs within each window. The resulting values are then normalized so that the ancestral populations correspond to values with means of 1 and -1, respectively. Thus, for each individual we obtain a value for each of the windows, and the windows are evenly spaced along the chromosome. For an admixed individual, the value in each window will either correspond to one of the ancestral populations or take an intermediate value corresponding to having one chromosomal segment from each ancestral population (we use unphased data, as phasing at the level of an entire chromosome infers haplotypes with significant phasing (switch) errors [14,21], making such data unusable for estimating the time since admixture). Thus, for each individual and each chromosome we obtain a StepPCO signal, consisting of a sequence of values along the given chromosome.
This part of the method is similar to a recently published approach [10], in which local genomic admixture estimates are inferred using PC analysis on a grid of points along the genome (rather than genome-wide); unlike our method, that approach works with very small windows of 15 SNPs and requires a Hidden Markov Model (HMM) to infer the ancestry state within each window. Our implementation also differs in that we not only estimate the local genomic level of admixture but also use the identified ancestry block structure to date admixture events.
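The window-growing and normalization steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The "three standard deviations" rule is interpreted here as: the gap between the two parental window means must be at least `k_sd` times the larger parental standard deviation. Windows grow one SNP per side rather than by genetic distance, and the no-gap coverage condition is not enforced. The data are synthetic, and `window_scores`, `stepwise_signal`, and `demo` are hypothetical names.

```python
import numpy as np


def window_scores(geno, weights, lo, hi):
    """PA1-weighted average of the SNPs in columns [lo, hi) for every individual."""
    return geno[:, lo:hi] @ weights[lo:hi] / (hi - lo)


def stepwise_signal(pos_cm, weights, geno_a, geno_b, geno_adm, n_windows=64, k_sd=3.0):
    """Toy StepPCO-style signal for one chromosome (hypothetical helper).

    pos_cm: sorted genetic positions of the SNPs (cM); weights: PA1 loading per SNP;
    geno_*: genotype matrices (individuals x SNPs) coded 0/1/2.
    Returns an (n_admixed x n_windows) array scaled so parental means map to +1 / -1.
    """
    if n_windows & (n_windows - 1):
        raise ValueError("n_windows should be a power of two")
    n_snps = len(pos_cm)
    centers = np.linspace(pos_cm[0], pos_cm[-1], n_windows)  # evenly spaced in cM
    signal = np.full((geno_adm.shape[0], n_windows), np.nan)
    for w, c in enumerate(centers):
        mid = min(int(np.searchsorted(pos_cm, c)), n_snps - 1)
        half = 1
        while True:
            # Grow the window one SNP on each side until the parental groups separate.
            lo, hi = max(0, mid - half), min(n_snps, mid + half + 1)
            sa = window_scores(geno_a, weights, lo, hi)
            sb = window_scores(geno_b, weights, lo, hi)
            gap = abs(sa.mean() - sb.mean())
            if gap > 0 and gap >= k_sd * max(sa.std(), sb.std()):
                break
            if lo == 0 and hi == n_snps:  # whole chromosome used; stop growing
                break
            half += 1
        mu_a, mu_b = sa.mean(), sb.mean()
        if mu_a != mu_b:
            # Linear rescaling: parental mean A -> +1, parental mean B -> -1.
            raw = window_scores(geno_adm, weights, lo, hi)
            signal[:, w] = (raw - (mu_a + mu_b) / 2) / ((mu_a - mu_b) / 2)
    return signal


def demo(seed=0, n_admixed=20):
    """Mean signal left and right of 50 cM for toy admixed genomes."""
    rng = np.random.default_rng(seed)
    n_snps = 2000
    pos = np.sort(rng.uniform(0, 100, n_snps))
    fa = rng.uniform(0.05, 0.95, n_snps)
    fb = np.clip(fa + rng.normal(0, 0.25, n_snps), 0.01, 0.99)
    ga = rng.binomial(2, fa, size=(40, n_snps)).astype(float)
    gb = rng.binomial(2, fb, size=(40, n_snps)).astype(float)
    parental = np.vstack([ga, gb])
    _, _, vt = np.linalg.svd(parental - parental.mean(axis=0), full_matrices=False)
    # Admixed genomes: left of 50 cM two A copies, right of 50 cM one A + one B copy.
    left = pos < 50
    hom_a = rng.binomial(2, fa, size=(n_admixed, n_snps))
    het = (rng.binomial(1, fa, size=(n_admixed, n_snps))
           + rng.binomial(1, fb, size=(n_admixed, n_snps)))
    adm = np.where(left, hom_a, het).astype(float)
    sig = stepwise_signal(pos, vt[0], ga, gb, adm, n_windows=16)
    centers = np.linspace(pos[0], pos[-1], 16)
    return sig[:, centers < 50].mean(), sig[:, centers > 50].mean()


if __name__ == "__main__":
    print(demo())
```

In the toy example, windows in the homozygous-A half average close to +1 and windows in the one-copy-each half average close to 0, mirroring the "either one ancestral population or an intermediate value" reading of the signal. Because the PA1 sign is arbitrary, the rescaling by the parental means is what fixes A at +1 and B at -1.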