Category: genetic structure

PopGen June 2019

Two papers.

The first:

Abstract
In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe.

That’s interesting, I suppose, but what is really needed from population genetics is two things. First, global assays of genetic kinship. Second, application of genetic structure and genetic integration (e.g., Gillet and Gregorius) to human genetic data. These things are consistently not being done. Is it because they are viewed as uninteresting to the field, or is it because the findings would be politically unpalatable to the field?

Author summary
We introduce a novel statistical method to infer migration rates and population sizes across space in recent time periods. Our approach builds upon the previously developed EEMS method, which infers effective migration rates under a dense lattice. Similarly, we infer demographic parameters under a lattice and use a (Voronoi) prior to regularize parameters of the model. However, our method differs from EEMS in a few key respects. First, we use the coalescent model parameterized by migration rates and population sizes while EEMS uses a resistance model. As another key difference, our method uses haplotype data while EEMS uses the average genetic distance. A consequence of using haplotype data is that our method can separately estimate migration rates and population sizes, which in essence is done by using a recombination rate map to calibrate the decay of haplotypes over time. An additional useful feature of haplotype data is that, by varying the lengths analyzed, we can infer demography associated with different recent time periods. We call our method MAPS for estimating Migration And Population-size Surfaces. To illustrate MAPS on real data, we analyze a genome-wide SNP dataset on 2224 individuals of European ancestry.

I’m not going to judge the validity of this approach without more data; however, any cursory look at current population genetic studies illustrates how the “testing companies” are behind the cutting edge of methodology.

Largely speaking, the spatial variation in inferred dispersal rates and population densities is remarkably consistent across the different time scales (Fig 4). In the MAPS dispersal surfaces, several regions with consistently low estimated dispersal rates coincide with geographic features that would be expected to reduce gene flow, including the English Channel, Adriatic Sea and the Alps. 

In general, geographic barriers have historically impeded (but obviously not abrogated) gene flow.

In addition we see consistently high dispersal across the region between the UK and Norway, which may reflect the known genetic effects of the Viking expansion [22]. 

See more on this below.

These features are consistent with visual inspection of the raw lPSC sharing data (S4b Fig). The MAPS population density surfaces consistently show lowest density in Ireland, Switzerland, Iberia, and the southwest region of the Balkans. This is consistent with samples within each of these areas having among the highest PSC segment sharing (S4a Fig). The MAPS inferred country population sizes are also highly correlated with estimated current census population sizes from [36] and [37] (S5 Fig) which can be mainly attributed to the fact that lPSC segments are highly informative of current census population sizes (Fig 5).

And then:

We do note the lower estimated dispersal rates between Portugal and Spain compared to the rest of Europe in the analyses of longer PSC segments (5-10 and > 10cM), and the higher estimated dispersal rates through the Baltic Sea (> 10cM segments), possibly reflecting changing gene flow in these regions in recent history.

I’m not sure what to make of that Iberian data.  I’m not aware of any significant geographical barrier there, so is that an example of political barriers affecting gene flow?  The data of this paper call into question “testing companies” using generalized “Iberian” or “British/Irish” ancestral categories.

Our estimates of dispersal distances and population density from the POPRES data are among the first such estimates using a spatial model for Europe (though see [30]). The features observed in the dispersal and population density surfaces are in principle discernible by careful inspection of the numbers of shared PSC segments between pairs of countries (e.g. using average pairwise numbers of shared segments, S4b Fig, as in [20]). For example, high connectivity across the North Sea is reflected in the raw PSC calls: samples from the British Isles share a relatively high number of PSC segments with those from Sweden (S4b Fig). 

This is consistent with what is mentioned above, compatible with the historically known gene flow from Scandinavia to the British Isles, particularly England, during the Viking age.

Also the low estimated dispersal between Switzerland and Italy is consistent with Swiss samples sharing relatively few PSC segments with Italians given their close proximity (S4b Fig). 

The Alps being one of the geographical barriers mentioned above.  This of course is not compatible with Der Movement dogma of Northern Italians being “Celto-Germanic Nordics.”

However, identifying interesting patterns directly from the PSC segment sharing data is not straightforward, and one goal of MAPS (and EEMS) is to produce visualizations that point to patterns in the data that suggest deviations from simple isolation by distance.

The inferred population size surfaces for the POPRES data show a general increase in sizes through time, with small fluctuations across geography; In our results, the smallest inferred population sizes are in the Balkans and Eastern Europe more generally. This is in agreement with the signal seen previously [20]; however, taken at face value, our results suggest that high PSC sharing in these regions may be due more to consistently low population densities than to historical expansions (such as the Slavic or Hunnic expansions).

Relative population density may be a driver of genetic history, and one ignored by Der Movement in lieu of more colorful stories about expansions and admixture.

Second paper:

The roles of migration, admixture and acculturation in the European transition to farming have been debated for over 100 years. Genome-wide ancient DNA studies indicate predominantly Aegean ancestry for continental Neolithic farmers, but also variable admixture with local Mesolithic hunter-gatherers. Neolithic cultures first appear in Britain circa 4000 BC, a millennium after they appeared in adjacent areas of continental Europe. The pattern and process of this delayed British Neolithic transition remain unclear. We assembled genome-wide data from 6 Mesolithic and 67 Neolithic individuals found in Britain, dating 8500-2500 BC. Our analyses reveal persistent genetic affinities between Mesolithic British and Western European hunter-gatherers. We find overwhelming support for agriculture being introduced to Britain by incoming continental farmers, with small, geographically structured levels of hunter-gatherer ancestry. Unlike other European Neolithic populations, we detect no resurgence of hunter-gatherer ancestry at any time during the Neolithic in Britain. Genetic affinities with Iberian Neolithic individuals indicate that British Neolithic people were mostly descended from Aegean farmers who followed the Mediterranean route of dispersal. We also infer considerable variation in pigmentation levels in Europe by circa 6000 BC.

Contra Duchesne, ancestry deriving from Neolithic farmers is not restricted to Southern Europe; it is just much more concentrated there.

Genetic Structure and Altruistic Self-Sacrifice

A more precise accounting is required.

We are all aware of Haldane’s oft-quoted assertion that he would lay down his life for two brothers or eight cousins, the genetic payoff of such altruistic self-sacrifice being the equivalence – as measured by “bean-bag” genetics – of the numbers of gene copies between these sets of relatives.
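The break-even arithmetic behind the quip is simple enough to spell out. What follows is a minimal sketch, assuming the standard coefficients of relatedness for full siblings (1/2) and first cousins (1/8); the names `RELATEDNESS` and `break_even_count` are illustrative only.

```python
from fractions import Fraction

# Assumed standard coefficients of relatedness (fraction of gene copies
# shared by descent with oneself, who counts as 1).
RELATEDNESS = {
    "full sibling": Fraction(1, 2),
    "first cousin": Fraction(1, 8),
}

def break_even_count(r):
    """Number of relatives whose summed relatedness equals one copy of self."""
    return 1 / r

for relative, r in RELATEDNESS.items():
    print(relative, break_even_count(r))
# full sibling 2
# first cousin 8
```

This is pure gene-copy counting and says nothing about how those copies are arranged.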

In general, I am in broad agreement with the sentiment, although as we shall see, it requires modification. Even more broadly, those on the Far Right invoke this paradigm to support the idea of altruistic self-sacrifice in favor of larger numbers of an ethny, in defense of ethnic genetic interests. Likewise, I support that as well, with the proper modifications as with the smaller-scale examples of familial relatives.

Even though at first glance, Haldane’s reasoning seems sound, likely most people would be hesitant to follow that advice.  In large part, this is the natural impulse of self-preservation, but there are other reasonable objections that can be made.

One could argue, all else being equal, that judging between two sets of equivalent genetics, it’s better to preserve yourself for reasons of control.  A person concerned enough with genetic continuity that they would consider such altruistic self-sacrifice is someone likely to start a family, care for children, and properly actualize the continuity. Can you be sure your two brothers would do the same?  Why are they in the position that they need your sacrifice to begin with?  Are they stupid?  Reckless? Are you sure they’ll act in support of your (in this case indirect) genetic continuity with the same vigor you would do for yourself?  So, to be safe, maybe you need to raise the bar for self-sacrifice to three brothers or ten cousins?

A more important reason, and one that may be intuitively sensed by most people even though they wouldn’t be able to explain it, or likely even articulate their feeling about it, is that there is more to kinship than mere numbers of gene copies. Genetic structure is important – what genes are coinherited and, to the layman’s eye, what phenotypic traits (derived from those genes) are inherited together. Of course, family is going to be more similar here than (co-ethnic) strangers, but similarity is not identity. Even with siblings (apart from identical twins, which are a special case), recombination and independent assortment will ensure that your brothers will have a distinct genetic structure from you. Now, granted, these same processes, even with a co-ethnic mate, will ensure that your children will also have a different genetic structure than you, but, all else being equal, your brothers’ children will be more unlike you, with respect to genetic structure, than your own children, as the “starting point” (you vs. your brothers) is already different. So, when genetic structure is taken into account, two brothers are not really your genetic equivalent. Apart from an identical twin, you have no genetic equivalent, just degrees of relative similarity and difference, even after numbers of gene copies are accounted for. Then how many brothers are sufficient for self-sacrifice? This requires a more rigorous analysis, which will be dependent upon accurate measures of genetic structure, and that’s not something we can expect SJW population geneticists to do. However, while the overall Haldane argument – and its Salterian extension – makes sense, the numbers given based on “bean bag” genetics are going to be an underestimate of where you need to draw the line in sacrificing yourself for others.

On the other hand, the reverse is true – if you have to choose between your brothers and strangers, or between co-ethnics and non-ethnics, taking genetic structure into account means that helping your brothers and your co-ethnics is even more important than before, because in comparison to more genetically alien peoples, genetic structure amplifies how much closer you are to your brothers and your co-ethnics. It’s a double-edged sword: it makes your own preservation a bit more important, but it also makes the preservation of those more similar to you more important than those more distant.

Now, one can argue that after several generations of recombination and independent assortment – even assuming endogamous mating within the ethny – genetic structures derived from your posterity and those of your brothers will be more or less the same, converging on the common pool of ethny-specific genetic structures. So, while in the first generation, your offspring and that of your brothers may be distinct with respect to genetic structure, that difference would be attenuated over time and, as long as endogamous mating is maintained, your posterity and theirs would reflect similar genetic structures. But there are problems here. First, a rigorous analysis is required; perhaps some differences would continue over at least several generations; even if these differences are small, they nevertheless would need to be accounted for. Second, if it is true that familial genetic structures would tend, over time, to converge on more generalized ethny-specific structures, then why bother favoring two brothers over two random co-ethnics? The brothers would share more of your genes, yes, and be more similar as far as genetic structure, but if one invokes “long term intergenerational effects” with respect to questioning the need to account for structure in modifying Haldane’s argument, then one can use the same “intergenerational effect” to directly question Haldane’s original premise. The answer I believe is that one must do the best they can at a given time in maximizing their genetic payoff, and hope that subsequent generations do the same. In the absence of the required analysis, one can simply argue that looking to the next generation, differences in genetic structure are important and, hence, two brothers are not quite the genetic equivalent of yourself. Your structure is different from theirs and the genetic payoff of your reproduction is greater for you than both of theirs combined.

So, maybe you need to hold out and sacrifice for three (or more) brothers instead, including for the other reason outlined above. Note that these fine points deal with very close genetic similarity. When we are talking about racially alien peoples, the genetic distance becomes even more amplified with genetic structure, and in the absence of panmixia, ethny-specific patterns of genetic structure are broadly stable over evolutionary time (we can see that the Iceman is genetically more similar to Europeans than to, say, Asians or Africans, as one example).

In the absence of the sort of careful quantitative analysis that population geneticists won’t do, from a qualitative standpoint, it would be prudent to require more of a genetic payoff before engaging in Haldane-style altruistic self-sacrifice.  On the other hand, when considering a choice in investing between two genetic entities, picking the group genetically closer to you is even more important when considering genetic structure.  So, when the choice is between self vs. family or family vs. ethny, genetic structure will require a larger genetic payoff before agreeing to sacrifice the interests of the former for the latter. However, when considering a relative choice between ethny one vs. ethny two, genetic structure means that choosing the more similar-to-you ethny is even more important than with “bean-bag” genetics.  

The overall Salterian imperative remains the same as before, once these adjustments are made.

Yet Even More DifferInt

More DifferInt model results.

Note that the genepool is exactly the same between both populations, but rearranging genotype combinations gives some differentiation at single and multiple locus measurements even when including elementary genic differences, and there is complete differentiation at the level of multiple locus genotypes neglecting elementary genic differences, even though the genepools are identical and there is not a very large number of genotype rearrangements between the populations. This shows how rapidly complete differentiation is achieved when considering discrete genotype combinations.

(A = 1, T = 2, C = 3, G = 4, first number = number of individuals per genotype)

#Population1

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

#Population2

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  4 4

Genepool: 0.0000

Single locus including elementary genic differences: 0.0167

Single locus neglecting elementary genic differences: 0.0333

Multiple locus including elementary genic differences: 0.0410

Multiple locus neglecting elementary genic differences: 1.0000

Yet More DifferInt

More on genetic integration.

Some interesting quotes from this paper; emphasis added:

The elementary genic difference does not distinguish homologous from non-homologous genes. Hence, the homologous and non-homologous gene arrangements within the objects affect the elementary genic differences between them only through their sum. For example, in the case of diploid individuals scored at two gene loci A and B, say, the genotypes A1A1/B1B2 and A1A2/B1B3 represent three (A1, B1, B2) and four (A1, A2, B1, B3), respectively, of the total of five gene-types. A1 is represented by two copies in the first genotype and by one copy in the second, and the remaining four gene-types are represented by at most one copy in each of the two genotypes. The sum of copy number differences between the two genotypes thus equals four. After division by twice the number of individual genes in a genotype (i.e. 2·4), this yields 0.5 as the elementary genic difference. The same result is obtained for the two genotypes A1A2/B1B2 and A1A2/B3B3, even though all genic differences are now due to the alleles at a single locus (B).
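The arithmetic in the quoted passage can be reproduced with a short sketch. The function name `elementary_genic_difference` and the string labels for gene-types are illustrative choices, not taken from the paper or from DifferInt; the calculation follows only the quoted description (sum of copy-number differences over gene-types, divided by twice the number of genes in a genotype).

```python
from collections import Counter

def elementary_genic_difference(genotype_a, genotype_b):
    """Sum of per-gene-type copy-number differences, divided by twice the
    number of genes in a genotype (following the quoted description)."""
    if len(genotype_a) != len(genotype_b):
        raise ValueError("genotypes must contain the same number of genes")
    copies_a, copies_b = Counter(genotype_a), Counter(genotype_b)
    gene_types = set(copies_a) | set(copies_b)
    total = sum(abs(copies_a[t] - copies_b[t]) for t in gene_types)
    return total / (2 * len(genotype_a))

# A1A1/B1B2 versus A1A2/B1B3: differences spread over both loci
print(elementary_genic_difference(["A1", "A1", "B1", "B2"],
                                  ["A1", "A2", "B1", "B3"]))  # 0.5
# A1A2/B1B2 versus A1A2/B3B3: all differences at locus B
print(elementary_genic_difference(["A1", "A2", "B1", "B2"],
                                  ["A1", "A2", "B3", "B3"]))  # 0.5
```

Both pairs give 0.5 because the measure only sees the summed copy-number differences, not which locus they come from.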

Proceeding from lower to higher levels of integration, one expects an increase in differentiation among populations simply because of the larger varietal potential inherent in more complex structures. Since differentiation is based on distances, the distance between two populations should therefore also increase, or at least not decrease, with integration level.

…it appears that differentiation among populations with respect to their forms of gene association may be a normal occurrence. This insight questions the common practice of restricting the measurement of population differentiation to the allelic level (e.g. FST), thereby ignoring the considerable effects of gene association on population differentiation.

One major finding of the paper is that model data routinely give no increase in differentiation (measured including elementary genic differences) with increasing genetic integration, but real data does show increases.  One wonders if large scale human SNP data would demonstrate such differences, as opposed to the limited SNP data or model systems I have used, which demonstrate increased differentiation only when elementary genic differences are neglected.  On the other hand, as I’ve previously written, neglecting elementary genic differences is, I believe, more compatible with my idea of genetic structure.

That said, one can, by choosing allele structure carefully, produce models that do the exact opposite: equality at the lower levels of genetic integration, but differentiation at the highest level.

Here is an interesting population model I devised and tested with DifferInt; the differences between the two populations are highlighted.  Note that total numbers of each allele are the same, and the total numbers of single locus genotypes are the same as well.  Thus, genepool differentiation is zero (0.000), as is single locus genotype differentiation, also zero (0.000).  The arrangement of the first and ninth single locus genotypes, together, were changed in six of ten individuals between the two populations, thus producing differentiation specifically at the level of multilocus genotypes. 

(A = 1, T = 2, C = 3, G = 4; first number = number of individuals)

MLG with EGD: 0.0246

MLG w/o EGD: 0.6000 (6/10 individuals per population altered)

#Population1

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

#Population2

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4
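The frequency-based levels of this model can be checked independently of DifferInt. The sketch below is a simplification resting on stated assumptions: it uses half the sum of absolute frequency differences as the distance between the two populations, which covers only the genepool, single-locus genotype, and multilocus genotype measures that neglect elementary genic differences. It does not attempt the "with EGD" value. The names `TEMPLATE`, `individual`, and `freq_distance` are made up for illustration, and the data are a compact re-encoding of the listing above (only the first and ninth loci vary).

```python
from collections import Counter

# Twelve single-locus genotypes shared by all individuals, except that
# locus 1 and locus 9 (indices 0 and 8) vary between individuals.
TEMPLATE = ["11", "22", "23", "33", "11", "14",
            "11", "22", "33", "33", "11", "14"]

def individual(locus1, locus9):
    g = list(TEMPLATE)
    g[0], g[8] = locus1, locus9
    return tuple(g)

POP1 = ([individual("11", "33")] * 3 + [individual("11", "23")] * 2
        + [individual("12", "23")] * 5)
POP2 = ([individual("12", "33")] * 3 + [individual("11", "23")] * 2
        + [individual("12", "23")] * 2 + [individual("11", "23")] * 3)

def freq_distance(items_a, items_b):
    """Half the sum of absolute differences in relative frequency."""
    ca, cb = Counter(items_a), Counter(items_b)
    na, nb = sum(ca.values()), sum(cb.values())
    return 0.5 * sum(abs(ca[k] / na - cb[k] / nb) for k in set(ca) | set(cb))

def genes(pop):
    """Every (locus index, allele) copy in the population."""
    return [(i, a) for g in pop for i, locus in enumerate(g) for a in locus]

n_loci = len(TEMPLATE)
genepool = freq_distance(genes(POP1), genes(POP2))
single_locus = sum(
    freq_distance([g[i] for g in POP1], [g[i] for g in POP2])
    for i in range(n_loci)
) / n_loci
multilocus = freq_distance(POP1, POP2)

print(genepool)              # 0.0
print(single_locus)          # 0.0
print(round(multilocus, 4))  # 0.6
```

Under these assumptions the sketch agrees with the zero genepool and single-locus values and the 0.6000 multilocus value reported above.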