Topic: 67-marker datasets and Markov chain STR mutation  (Read 3743 times)


Offline Clavis (topic author)

  • Семенов Михаил Юрьевич
  • Posts: 1495
  • Country: ru
  • Rating: +111/-0
    • https://m.vk.com/@clavis1953
  • Y-DNA: G2a2 L1264
  • mtDNA: HV9, formerly known as HV3a
67-marker datasets and Markov chain STR mutation
« 22 March 2010, 06:47:31 »
http://f1.grp.yahoofs.com/v1/MN2mS0zNLPmQ4c96_q7CwReJRt21p4TSZUTv5QLkK_TJHpO_yeA-8qn5B1dokZY-WZfux4DC9QtMGLIVwFlgo-nyb1r9PyWTPCC5KrzlLeUonlOxEgQ/MarkosSTRMutationModel.pdf
The present study addresses the statistical properties of human Y chromosome short tandem repeat
(STR) loci.
The motivation was the availability of 67 STR marker data from various public projects associated
with ftdna.com [1] and the interest to develop various genetic time estimation methods based
on STR statistics. A statistical description of STR loci with 5 per-locus parameters was developed
[2]. The per-locus parameters were found with a study of phylogenetic networks for 5 major y-DNA
haplogroups, E, G, I, J, and R, based on 4,409 samples at the 67-marker level. The haplogroup Q
was also studied to some extent. However, some major branches of these haplogroups were poorly
represented, which may have some impact on the results.
Statistical models are based on direct observational data and the use of indirect methods. In the
present case, the use of direct empirical studies is limited to overall rate of change calibration via reported
direct observations [3]. However, indirect observational studies of STR evolution are possible
with the use of publicly available STR databases. The use of such databases without any pedigree
information means that the phylogenetic structures have to be inferred from the data. Without prior
statistical knowledge, such phylogenetic inference has to be based on the use of non-parametric
methods. Fortunately, there is some hope that the number of available STR loci is large enough (67)
to make such studies meaningful.
The non-parametrically inferred segments of evolutionary paths can be studied in a framework
of a statistical model. Model parameters and time values can then be inferred by maximization of the
overall probability of the observed evolution (maximum likelihood principle). The main advantage
of the approach is the large number of transmission events that can be indirectly covered in this way;
with a time depth (tMRCA) of 10,000 years, 4,409 samples would cover indirectly about 200,000
transmission events and about 10 million marker mutation opportunities.
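As a rough order-of-magnitude check on the figures just quoted: if every transmission event is one mutation opportunity for each of the 67 markers, the two numbers are consistent. The function name is purely illustrative and not taken from the paper.

```python
def mutation_opportunities(transmission_events: int, markers: int) -> int:
    """Each transmission event gives every marker one chance to mutate."""
    return transmission_events * markers

total = mutation_opportunities(200_000, 67)
print(f"{total:,}")  # 13,400,000, i.e. on the order of 10 million
```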
It is known that STR mutation rates depend on the length of the DNA segment involved; longer
segments could be subject to a larger copy error rate, for example. Some STR loci also show
preference for repeat number increases or decreases. It is also possible that the change is by more
than one repeat unit in a single mutation event; such changes are seen in many loci while in some
loci they are not seen. The generalized STR statistical model proposed here includes a description of
these features. The common behavior found is that both up and down mutation rates increase with
the STR repeat number. Some loci show only weak repeat number dependence, e.g., CDYa/b. In
the most common case, the down rate starts from a lower level but increases faster than the up rate;
an example of such locus is DYS388 where mutation rates increase very significantly and become
more symmetric moving from 12 to 15 repeat units; multistep changes are also relatively commonly
seen in this locus. For time calculations, the overall effect of allele number dependence is, however,
smaller than or of similar magnitude to the impact of multistep effects. This is most likely because some of
the length dependence effect tends to average out, while multistep changes always produce larger
time estimates.
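The qualitative behavior described above (up and down rates that grow with repeat number, a down rate that starts lower but rises faster, and occasional multistep changes) can be sketched as a single Markov step per transmission. This is only a minimal illustration: the linear rate form, the parameter names, and all numeric values are assumptions made here, not the paper's actual parameterization.

```python
import random

def transmit(allele, up0, down0, up_slope, down_slope, p_multi, ref=12, rng=random):
    """Return the allele after one father-to-son transmission (one Markov step).

    Hypothetical model: rates change linearly with the distance from `ref`.
    """
    offset = allele - ref
    up = max(0.0, up0 + up_slope * offset)        # probability of gaining repeats
    down = max(0.0, down0 + down_slope * offset)  # probability of losing repeats
    u = rng.random()
    if u < up:
        direction = 1
    elif u < up + down:
        direction = -1
    else:
        return allele                             # no mutation in this transmission
    size = 2 if rng.random() < p_multi else 1     # occasional two-step change
    return max(1, allele + direction * size)

def simulate(allele, generations, rng, **params):
    """Run the chain for a number of generations along one lineage."""
    for _ in range(generations):
        allele = transmit(allele, rng=rng, **params)
    return allele

# Illustrative values only: the down rate starts lower but increases faster.
params = dict(up0=0.001, down0=0.0005, up_slope=0.0003, down_slope=0.0006, p_multi=0.06)
print(simulate(12, 400, random.Random(0), **params))
```

Because the down rate overtakes the up rate at higher repeat numbers, long alleles in such a chain tend to drift back, which is one way the length dependence can partly average out.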
For the set of haplogroup diagrams, per-locus parameters of a generalized STR statistical model
were determined for the 67 loci used. Parameters were found by solving the related maximum
likelihood problem with respect to node time estimates and locus parameters several times until the
values converged to a self-consistent optimum. Mutation rate models were calibrated by using the
data collected by YHRD for a set of reference loci.
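The idea of alternating between node times and locus parameters until the values are self-consistent, and then fixing the overall scale with reference loci, can be sketched with a much simpler model than the one in the paper. The sketch below assumes a single Poisson rate per locus and known mutation counts per branch; the function name, the input layout, and the sample numbers are all hypothetical.

```python
def fit_times_and_rates(counts, ref_locus, ref_rate, iters=50):
    """Alternating maximum likelihood for counts[b][l] ~ Poisson(rate[l] * time[b]).

    counts    : mutation counts per branch (rows) and locus (columns)
    ref_locus : index of a locus whose rate is taken as known (calibration)
    ref_rate  : that known rate, in mutations per generation
    """
    n_branches, n_loci = len(counts), len(counts[0])
    rates = [ref_rate] * n_loci
    times = [1.0] * n_branches
    for _ in range(iters):
        # Best branch times for the current rates.
        total_rate = sum(rates)
        times = [sum(row) / total_rate for row in counts]
        # Best locus rates for the current times.
        total_time = sum(times)
        rates = [sum(counts[b][l] for b in range(n_branches)) / total_time
                 for l in range(n_loci)]
        # Only the product rate * time is identifiable, so pin the scale
        # to the reference locus (assumes it has at least one mutation).
        scale = ref_rate / rates[ref_locus]
        rates = [r * scale for r in rates]
        times = [t / scale for t in times]
    return times, rates

times, rates = fit_times_and_rates([[2, 4], [1, 2]], ref_locus=0, ref_rate=0.002)
print(times, rates)
```

In this toy setting the iteration settles almost immediately; with length-dependent rates and multistep terms, as in the model described above, several passes would presumably be needed.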
The phylogenetic trees were based on joint SNP and STR information. Actual SNP test data is
sparse, but more extensive predicted SNP haplogroup data is often reported in the various project
pages. The networks were constructed by using a locally improved Neighbor Joining (NJ) [4] algorithm.
The STR values inside the network after the distance-based NJ stage were found by the
parsimony principle (each STR value is by construction a median of the three adjacent STR values
in the related binary tree). A branch-and-bound local improvement algorithm [5] was employed to
ensure that arrangements inside a certain radius cannot improve the overall Hamming distance of
the network connections. The relative weight of the SNP data was chosen to be large enough to enforce
the SNP splits in the resulting network. The STR loci were uniformly weighted in NJ distance
calculations and in the Hamming distance evaluations in the local improvement stage.
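Two of the building blocks mentioned above are easy to show in isolation: the per-locus median of three adjacent haplotypes used for interior nodes, and the uniformly weighted Hamming distance between haplotypes. This is a small sketch with made-up three-locus haplotypes; it does not reproduce the NJ or branch-and-bound stages.

```python
from statistics import median

def hamming(a, b):
    """Number of loci at which two haplotypes differ (all loci weighted equally)."""
    return sum(x != y for x, y in zip(a, b))

def median_haplotype(n1, n2, n3):
    """Interior-node haplotype: per-locus median of the three neighbours."""
    return [median(values) for values in zip(n1, n2, n3)]

# Made-up repeat counts for three neighbouring nodes.
a = [12, 14, 30]
b = [12, 15, 29]
c = [13, 14, 30]

node = median_haplotype(a, b, c)
print(node)                                     # [12, 14, 30]
print([hamming(node, h) for h in (a, b, c)])    # [0, 2, 1]
```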
Time estimates for the oldest nodes in the phylogenetic diagrams of the major yDNA haplogroups
suggest that many major branches of various haplogroups appeared near the Pleistocene–
Holocene transition about 11,000 ... 13,000 years before present (yBP). It has been proposed that
this is related to population dynamical effects near the Younger Dryas climatological period [6],
such as global temperature behavior and possible population expansion related to the Bølling-Allerød
period 14,700 – 12,700 yBP and early Holocene 11,500 – 10,000 yBP [7]. Results also suggest that
commonly used variance-based time estimates are too large by 20% ... 30% due to the fact that about
6% of the STR mutation events involve changes by more than one repeat unit.
Section 2 provides a brief summary of the statistical models and principles.
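A back-of-the-envelope view of why multistep events inflate variance-based estimates: under a symmetric stepwise model, each mutation adds the square of its step size to the expected variance, so a method that assumes single steps overestimates time by the mean squared step. The sketch assumes, purely for illustration, that all of the roughly 6% multistep events are exactly two-step.

```python
def variance_inflation(step_fractions):
    """Mean squared step size; step_fractions maps |step| -> share of mutations."""
    return sum(share * step * step for step, share in step_fractions.items())

factor = variance_inflation({1: 0.94, 2: 0.06})
print(round(factor, 2))  # 1.18
```

Under that assumption the overestimate is about 18%; any events of three or more repeats would raise the factor further.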

Offline Nimissin

  • Posts: 2402
  • Rating: +759/-0
  • Y-DNA: N-M178 L839+ P298+ M2019+ M2118+ M1991+ M1988+
  • mtDNA: C4b12a
Re: 67-marker datasets and Markov chain STR mutation
« Reply #1: 22 March 2010, 10:05:23 »
Dear Clavis, your link does not open. Could you please post the bibliographic details of the article?

Offline asan-kaygy

  • ...
  • Posts: 9613
  • Country: kz
  • Rating: +945/-5
  • Y-DNA: R1a1a1b2a1a-L657+,Y9+,Y944+
Re: 67-marker datasets and Markov chain STR mutation
« Reply #2: 22 March 2010, 10:06:45 »
It does not open for me either.

Offline Аббат Бузони

  • ...
  • Posts: 19888
  • Country: ru
  • Rating: +1818/-60
  • Y-DNA: I1-SHTR7+
  • mtDNA: H16-a1-T152C!
Re: 67-marker datasets and Markov chain STR mutation
« Reply #3: 22 March 2010, 10:11:58 »
It opened early this morning.

Offline Clavis (topic author)

  • Семенов Михаил Юрьевич
  • Posts: 1495
  • Country: ru
  • Rating: +111/-0
    • https://m.vk.com/@clavis1953
  • Y-DNA: G2a2 L1264
  • mtDNA: HV9, formerly known as HV3a
67-marker datasets and Markov chain STR mutation
« Reply #4: 22 March 2010, 12:45:19 »
It opens for me right now. In any case, here is another route:
1) http://tech.groups.yahoo.com/group/HaploGNewsGrp/
At the top there is a link:
Activity within 7 days:1 New File - New Questions
click it, and then click once more
Ray Banks has posted two pictures and the article in PDF format there
I have already given the title; the author and date are:
M.T. Heinilä
February 27, 2010
« Last edit: 22 March 2010, 13:04:19 by Clavis »

Offline Centurion

  • 100% Earth (Solar System) genofond
  • Administrator
  • *****
  • Posts: 9548
  • Country: ru
  • Rating: +571/-2
Re: 67-marker datasets and Markov chain STR mutation
« Reply #5: 22 March 2010, 12:58:20 »
Clavis, all of this is probably accessible only to members of the group.

Offline Nimissin

  • Posts: 2402
  • Rating: +759/-0
  • Y-DNA: N-M178 L839+ P298+ M2019+ M2118+ M1991+ M1988+
  • mtDNA: C4b12a
Re: 67-marker datasets and Markov chain STR mutation
« Reply #6: 22 March 2010, 13:57:38 »
I could not find the pictures or the article. Trying the search engine turned up nothing.

Offline Nozdrin

  • Posts: 485
  • Rating: +33/-0
  • V13
Re: 67-marker datasets and Markov chain STR mutation
« Reply #7: 22 March 2010, 14:11:39 »
And at this point I ran out of my quota for posting images. Sorry..
http://www.radikal.ru/   as an option
or re-upload the document

Offline Nimissin

  • Posts: 2402
  • Rating: +759/-0
  • Y-DNA: N-M178 L839+ P298+ M2019+ M2118+ M1991+ M1988+
  • mtDNA: C4b12a
Re: 67-marker datasets and Markov chain STR mutation
« Reply #8: 23 March 2010, 08:56:09 »
Thank you, dear Clavis!

Offline Овод

  • Chief moderator
  • *****
  • Posts: 1769
  • Rating: +390/-3
  • Omnia mea mecum porto
  • Y-DNA: R1a-M198
  • mtDNA: U4a
Re: 67-marker datasets and Markov chain STR mutation
« Reply #9: 23 March 2010, 12:00:47 »
Dear Clavis, could you e-mail me the article in its original format? The small print is very hard for me to read. It looks like a sound piece of work.
 
It would also be good to have it in the Abbot's "knowledge repository".
 
Thank you.

Offline Clavis (topic author)

  • Семенов Михаил Юрьевич
  • Posts: 1495
  • Country: ru
  • Rating: +111/-0
    • https://m.vk.com/@clavis1953
  • Y-DNA: G2a2 L1264
  • mtDNA: HV9, formerly known as HV3a
Re: 67-marker datasets and Markov chain STR mutation
« Reply #10: 23 March 2010, 13:27:49 »
Sent.

 

© 2007 Molecular Genealogy (MolGen)

Attention! All posts reflect only the opinions of their authors.
All rights to the materials belong to their authors (owners) and to the online publications from which they were taken.