Category: genetic integration

PopGen June 2019

Two papers.

The first:

Abstract
In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe.

That’s interesting, I suppose, but what is really needed from population genetics is two things. First, global assays of genetic kinship. Second, application of genetic structure and genetic integration (e.g., Gillet and Gregorius) to human genetic data. These things are consistently not being done. Is it because they are viewed as uninteresting to the field, or is it because the findings would be politically unpalatable to the field?

Author summary
We introduce a novel statistical method to infer migration rates and population sizes across space in recent time periods. Our approach builds upon the previously developed EEMS method, which infers effective migration rates under a dense lattice. Similarly, we infer demographic parameters under a lattice and use a (Voronoi) prior to regularize parameters of the model. However, our method differs from EEMS in a few key respects. First, we use the coalescent model parameterized by migration rates and population sizes while EEMS uses a resistance model. As another key difference, our method uses haplotype data while EEMS uses the average genetic distance. A consequence of using haplotype data is that our method can separately estimate migration rates and population sizes, which in essence is done by using a recombination rate map to calibrate the decay of haplotypes over time. An additional useful feature of haplotype data is that, by varying the lengths analyzed, we can infer demography associated with different recent time periods. We call our method MAPS for estimating Migration And Population-size Surfaces. To illustrate MAPS on real data, we analyze a genome-wide SNP dataset on 2224 individuals of European ancestry.

I’m not going to judge the validity of this approach without more data; however, any cursory look at current population genetic studies illustrates how the “testing companies” are behind the cutting edge of methodology.

Largely speaking, the spatial variation in inferred dispersal rates and population densities is remarkably consistent across the different time scales (Fig 4). In the MAPS dispersal surfaces, several regions with consistently low estimated dispersal rates coincide with geographic features that would be expected to reduce gene flow, including the English Channel, Adriatic Sea and the Alps. 

In general, geographic barriers have historically impeded (but obviously not abrogated) gene flow.

In addition we see consistently high dispersal across the region between the UK and Norway, which may reflect the known genetic effects of the Viking expansion [22]. 

See more on this below.

These features are consistent with visual inspection of the raw lPSC sharing data (S4b Fig). The MAPS population density surfaces consistently show lowest density in Ireland, Switzerland, Iberia, and the southwest region of the Balkans. This is consistent with samples within each of these areas having among the highest PSC segment sharing (S4a Fig). The MAPS inferred country population sizes are also highly correlated with estimated current census population sizes from [36] and [37] (S5 Fig) which can be mainly attributed to the fact that lPSC segments are highly informative of current census population sizes (Fig 5).

And then:

We do note the lower estimated dispersal rates between Portugal and Spain compared to the rest of Europe in the analyses of longer PSC segments (5-10 and > 10cM), and the higher estimated dispersal rates through the Baltic Sea (> 10cM segments), possibly reflecting changing gene flow in these regions in recent history.

I’m not sure what to make of that Iberian data.  I’m not aware of any significant geographical barrier there, so is that an example of political barriers affecting gene flow?  The data of this paper call into question “testing companies” using generalized “Iberian” or “British/Irish” ancestral categories.

Our estimates of dispersal distances and population density from the POPRES data are among the first such estimates using a spatial model for Europe (though see [30]). The features observed in the dispersal and population density surfaces are in principle discernible by careful inspection of the numbers of shared PSC segments between pairs of countries (e.g. using average pairwise numbers of shared segments, S4b Fig, as in [20]). For example, high connectivity across the North Sea is reflected in the raw PSC calls: samples from the British Isles share a relatively high number of PSC segments with those from Sweden (S4b Fig). 

This is consistent with what is mentioned above, compatible with the historically known gene flow from Scandinavia to the British Isles, particularly England, during the Viking age.

Also the low estimated dispersal between Switzerland and Italy is consistent with Swiss samples sharing relatively few PSC segments with Italians given their close proximity (S4b Fig). 

The Alps being one of the geographical barriers mentioned above.  This of course is not compatible with Der Movement dogma of Northern Italians being “Celto-Germanic Nordics.”

However, identifying interesting patterns directly from the PSC segment sharing data is not straightforward, and one goal of MAPS (and EEMS) is to produce visualizations that point to patterns in the data that suggest deviations from simple isolation by distance.

The inferred population size surfaces for the POPRES data show a general increase in sizes through time, with small fluctuations across geography; in our results, the smallest inferred population sizes are in the Balkans and Eastern Europe more generally. This is in agreement with the signal seen previously [20]; however, taken at face value, our results suggest that high PSC sharing in these regions may be due more to consistently low population densities than to historical expansions (such as the Slavic or Hunnic expansions).

Relative population density may be a driver of genetic history, and one ignored by Der Movement in lieu of more colorful stories about expansions and admixture.

Second paper:

The roles of migration, admixture and acculturation in the European transition to farming have been debated for over 100 years. Genome-wide ancient DNA studies indicate predominantly Aegean ancestry for continental Neolithic farmers, but also variable admixture with local Mesolithic hunter-gatherers. Neolithic cultures first appear in Britain circa 4000 BC, a millennium after they appeared in adjacent areas of continental Europe. The pattern and process of this delayed British Neolithic transition remain unclear. We assembled genome-wide data from 6 Mesolithic and 67 Neolithic individuals found in Britain, dating 8500-2500 BC. Our analyses reveal persistent genetic affinities between Mesolithic British and Western European hunter-gatherers. We find overwhelming support for agriculture being introduced to Britain by incoming continental farmers, with small, geographically structured levels of hunter-gatherer ancestry. Unlike other European Neolithic populations, we detect no resurgence of hunter-gatherer ancestry at any time during the Neolithic in Britain. Genetic affinities with Iberian Neolithic individuals indicate that British Neolithic people were mostly descended from Aegean farmers who followed the Mediterranean route of dispersal. We also infer considerable variation in pigmentation levels in Europe by circa 6000 BC.

Contra Duchesne, ancestry deriving from Neolithic farmers is not restricted to Southern Europe; it is just much more concentrated there.

Yet Even More DifferInt

More DifferInt model results.

Note that the genepool is exactly the same in both populations, but rearranging genotype combinations gives some differentiation at the single-locus and multilocus levels even when elementary genic differences are included, and there is complete differentiation at the level of multilocus genotypes when elementary genic differences are neglected. This holds even though the genepools are identical and the number of genotype rearrangements between the populations is not very large. It shows how rapidly complete differentiation is reached when discrete genotype combinations are considered.

(A = 1, T = 2, C = 3, G = 4, first number = number of individuals per genotype)

#Population1

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

#Population2

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  3 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  4 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  4 4

Genepool: 0.0000

Single locus including elementary genic differences: 0.0167

Single locus neglecting elementary genic differences: 0.0333

Multiple locus including elementary genic differences: 0.0410

Multiple locus neglecting elementary genic differences: 1.0000
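The two extreme values above (genepool 0.0000 and multilocus 1.0000) can be checked by hand. The sketch below is not DifferInt itself. It assumes that, when elementary genic differences are neglected, the multilocus distance reduces to half the sum of absolute differences between genotype frequencies, and it checks the genepool only by comparing per-locus allele counts. The intermediate values (0.0167, 0.0333, 0.0410) depend on how DifferInt weights differences between individuals and are not reproduced here. The names `pop1`, `pop2`, `allele_counts`, `freqs` and `half_l1` are illustrative only.

```python
from collections import Counter

# Genotypes at the four loci that vary in the listing above (loci 1, 2, 9, 12).
# The other eight loci are identical in every individual, so they cannot
# change either result. "3 2" in the listing is written "23" here because
# allele order within a locus is ignored.
pop1 = ([("11", "22", "33", "44")] * 2 + [("11", "22", "33", "14")]
        + [("11", "23", "23", "14")] * 2 + [("12", "22", "23", "14")] * 5)
pop2 = ([("12", "22", "33", "14")] * 3 + [("11", "22", "23", "14")] * 3
        + [("12", "23", "23", "14")] * 2 + [("11", "22", "23", "44")] * 2)

def allele_counts(pop):
    """Allele counts at each of the four loci, one Counter per locus."""
    return [Counter("".join(ind[i] for ind in pop)) for i in range(4)]

def freqs(pop):
    """Relative frequency of each distinct multilocus genotype."""
    return {g: c / len(pop) for g, c in Counter(pop).items()}

def half_l1(p, q):
    """Half the summed absolute frequency differences (assumed measure)."""
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))

same_genepool = allele_counts(pop1) == allele_counts(pop2)
shared = set(pop1) & set(pop2)
d = half_l1(freqs(pop1), freqs(pop2))

print(same_genepool)   # identical allele counts at every locus
print(len(shared))     # number of multilocus genotypes found in both populations
print(f"{d:.4f}")
```

Under these assumptions the allele counts match at every locus, no multilocus genotype occurs in both populations, and the distance comes out at 1.0000, in line with the reported genepool and multilocus values.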

Yet More DifferInt

More on genetic integration.

Some interesting quotes from this paper; emphasis added:

The elementary genic difference does not distinguish homologous from non-homologous genes. Hence, the homologous and non-homologous gene arrangements within the objects affect the elementary genic differences between them only through their sum. For example, in the case of diploid individuals scored at two gene loci A and B, say, the genotypes A1A1/B1B2 and A1A2/B1B3 represent three (A1, B1, B2) and four (A1, A2, B1, B3), respectively, of the total of five gene-types. A1 is represented by two copies in the first genotype and by one copy in the second, and the remaining four gene-types are represented by at most one copy in each of the two genotypes. The sum of copy number differences between the two genotypes thus equals four. After division by twice the number of individual genes in a genotype (i.e. 2·4), this yields 0.5 as the elementary genic difference. The same result is obtained for the two genotypes A1A2/B1B2 and A1A2/B3B3, even though all genic differences are now due to the alleles at a single locus (B).
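The arithmetic in the quoted passage can be written out as a few lines of Python. This is a minimal sketch of the calculation as the quote describes it (sum of copy-number differences over gene-types, divided by twice the number of genes in a genotype), not code taken from DifferInt; the function name and the list-of-labels representation are my own choices.

```python
from collections import Counter

def elementary_genic_difference(g1, g2):
    """Elementary genic difference between two genotypes.

    g1 and g2 are lists of gene-type labels, one entry per gene copy,
    and are assumed to contain the same number of genes.
    """
    c1, c2 = Counter(g1), Counter(g2)
    # Sum of copy-number differences over every gene-type seen in either genotype.
    copy_diff = sum(abs(c1[t] - c2[t]) for t in set(c1) | set(c2))
    # Divide by twice the number of individual genes in a genotype.
    return copy_diff / (2 * len(g1))

# First pair from the quote: A1A1/B1B2 versus A1A2/B1B3.
d_first = elementary_genic_difference(["A1", "A1", "B1", "B2"],
                                      ["A1", "A2", "B1", "B3"])
# Second pair: A1A2/B1B2 versus A1A2/B3B3 (all differences at locus B).
d_second = elementary_genic_difference(["A1", "A2", "B1", "B2"],
                                       ["A1", "A2", "B3", "B3"])
print(d_first, d_second)  # 0.5 0.5
```

Both pairs give 4 / (2·4) = 0.5, which illustrates the point of the quote: the measure sees only copy numbers, not which locus the differences sit at.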

Proceeding from lower to higher levels of integration, one expects an increase in differentiation among populations simply because of the larger varietal potential inherent in more complex structures. Since differentiation is based on distances, the distance between two populations should therefore also increase, or at least not decrease, with integration level.

…it appears that differentiation among populations with respect to their forms of gene association may be a normal occurrence. This insight questions the common practice of restricting the measurement of population differentiation to the allelic level (e.g. FST), thereby ignoring the considerable effects of gene association on population differentiation.

One major finding of the paper is that model data routinely give no increase in differentiation (measured including elementary genic differences) with increasing genetic integration, but real data do show increases. One wonders if large-scale human SNP data would demonstrate such differences, as opposed to the limited SNP data or model systems I have used, which demonstrate increased differentiation only when elementary genic differences are neglected. On the other hand, as I’ve previously written, neglecting elementary genic differences is, I believe, more compatible with my idea of genetic structure.

That said, if one chooses the allele structure carefully, one can produce models that do the exact opposite: equality at the lower levels of genetic integration, but differentiation at the highest level.

Here is an interesting population model I devised and tested with DifferInt; the two populations differ only at the first and ninth loci. Note that the total numbers of each allele are the same, and the total numbers of single-locus genotypes are the same as well. Thus, both genepool differentiation and single-locus genotype differentiation are zero (0.000). The arrangement of the first and ninth single-locus genotypes, together, was changed in six of ten individuals between the two populations, thus producing differentiation specifically at the level of multilocus genotypes.

(A = 1, T = 2, C = 3, G = 4; first number = number of individuals)

MLG with EGD: 0.0246

MLG w/o EGD: 0.6000 (6/10 individuals per population altered)

#Population1

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

#Population2

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4

1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4
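The claims made for this model (identical allele counts, identical single-locus genotype counts, and 0.6000 at the multilocus level without elementary genic differences) can be checked directly from the listing. The sketch below is not DifferInt. It assumes that the measure neglecting elementary genic differences is half the sum of absolute differences between multilocus genotype frequencies, and it takes every row to stand for one individual. The value with elementary genic differences (0.0246) is not reproduced. The row labels `A` to `D` and all function names are illustrative.

```python
from collections import Counter

# The four distinct rows that occur in the listing above.
A = "1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4"
B = "1  1 1  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4"
C = "1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  2 3  3 3  1 1  1 4"
D = "1  1 2  2 2  2 3  3 3  1 1  1 4 1 1  2 2  3 3  3 3  1 1  1 4"
pop1 = [A] * 3 + [B] * 2 + [C] * 5
pop2 = [D] * 3 + [B] * 2 + [C] * 2 + [B] * 3

def parse(row):
    """Return (number of individuals, tuple of unordered single-locus genotypes)."""
    nums = [int(x) for x in row.split()]
    count, alleles = nums[0], nums[1:]
    loci = tuple(tuple(sorted(alleles[i:i + 2])) for i in range(0, len(alleles), 2))
    return count, loci

def locus_counts(rows, by_allele):
    """Per-locus counts of alleles (by_allele=True) or single-locus genotypes."""
    tables = [Counter() for _ in parse(rows[0])[1]]
    for row in rows:
        n, genotype = parse(row)
        for table, locus in zip(tables, genotype):
            for key in (locus if by_allele else [locus]):
                table[key] += n
    return tables

def mlg_freqs(rows):
    """Relative frequency of each distinct multilocus genotype."""
    counts = Counter()
    for row in rows:
        n, genotype = parse(row)
        counts[genotype] += n
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def half_l1(p, q):
    """Half the summed absolute frequency differences (assumed measure)."""
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in set(p) | set(q))

same_alleles = locus_counts(pop1, True) == locus_counts(pop2, True)
same_single_locus = locus_counts(pop1, False) == locus_counts(pop2, False)
d_mlg = half_l1(mlg_freqs(pop1), mlg_freqs(pop2))
print(same_alleles, same_single_locus, f"{d_mlg:.4f}")
```

Under these assumptions the allele counts and single-locus genotype counts match at all twelve loci, and the multilocus distance is 0.6000, matching the figure given above.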

More Genetic Structure and DifferInt Analysis

An important topic.

I have been looking a bit more at the DifferInt program (currently unable to find anything better), testing some model genotypes to better understand the relationship between different levels of integration with respect to the amount of differentiation. One clear finding is that the program is scalable at the highest level of integration only when genetic differentiation between groups, at the lowest (genepool) level, is shallow.

A test model was devised with two populations of eleven individuals each. Six loci were considered. Initially, the two populations were constructed to be genetically identical. Four individuals of the second population had alleles at one locus rearranged so that four heterozygotes were made into four homozygotes (two of each type), without changing the total number of each allele type for that locus in that population. After this change, the genepool differentiation was 0.0303, but the multilocus genotype differentiation neglecting elementary genic differences (MGNEGD) was 0.3636 – a twelve-fold increase in differentiation. In this simple model of shallow genetic difference, a discrete representation of genetic structure (MGNEGD) is seen to exhibit sharply increased (and quantitatively scalable) differentiation with even a small change in allele structuring in genetically similar (model) populations.

However, when differentiation at the genepool level is already fairly high, then MGNEGD rises to complete differentiation quickly, and the ability to evaluate genetic structure becomes non-scalable using this program.  It could be that the SNP database I utilized for my initial human study was enriched in SNPs that sharply differentiate between ethnies and so all levels of differentiation were high in the analysis; perhaps completely random SNPs would be better? On the other hand, we are most concerned about the distinctive genome (with respect to EGI).  

In a more realistic model of human genetic differentiation, two populations were set up, each consisting of ten individuals, each assayed over 100 loci.  90 of these loci were absolutely identical between the two populations and 10 loci differed between the populations with respect to the frequencies of alleles at the loci.  In some cases, it was 100%  of one allele pair compared to 100% of another; in other cases it was more subtle – for example one population having 20% AA, 60% AT, and 20% TT while the other population was 20% AA, 50% AT, and 30% TT for the same locus.  The genepool differentiation between the two populations was 0.0370; the MGNEGD was 1.000 – complete differentiation.  This again shows that with enough loci studied and differentiated populations, analysis of discrete sets of multilocus genotypes (see my definition of genetic structure below) will reach complete differentiation.  The implications for genetic interests should be obvious.

It might be a good idea to review my idea of genetic structure again here.

Genetic structure as per my definition can be viewed as a form of linkage disequilibrium of alleles over all the loci in the genome, or the distinctive genome, or at least whatever number of loci were assayed. Each specific permutation of multilocus genotypes is a discrete entity, so that one would expect, of course, distinct genetic structures between any set of individuals who are not identical twins; there would be differences in genetic structure within families, never mind within ethnies.

However – and this is the key point that separates my idea from the run-of-the-mill evaluations of genetic structure – I envision genetic structure to be defined by specific ranges of multilocus genotypes. Therefore, while there is going to be, naturally, individual variation of discrete multilocus genotypes within families, there will be a family-specific range of multilocus genotypes, a range within which all the individual genotypes of that family will fall. Likewise, there will be ethny-specific ranges of multilocus genotypes, so that members of an ethny will exhibit genotypes that – while they differ on an individual level – will fall within a range, a set, of genotypes characteristic of that ethny.

It then follows that while multilocus genotypes will be differentiated from each other, the extent of that differentiation will differ. Different families will exhibit different ranges, or sets, of possible multilocus genotypes, but families belonging to the same ethny will exhibit ranges that are more similar to each other than that of families of different ethnies (the same goes for individuals of course, across families or across ethnies). Ethnies belonging to the same continental population group (i.e., intra-racial) will exhibit more similar ranges of possibilities of multilocus genotypes than that of inter-racial comparisons. One could think of it also as frequency distributions of multilocus genotypes, of all the allele possibilities at all the relevant loci considered together as a discrete entity, and one can compare how similar the frequency distributions are, with more overlap from those more similar.

One would also expect a solid correlation, or association, between the differentiation as measured by an allele-by-allele genepool/beanbag approach, single locus genotypes, and multilocus genotypes. The relative extent of differences should correlate in at least a qualitative sense between these levels of “genetic integration.”  Hence, as previously noted at this blog, “complete differentiation” at the multilocus genotype level should differ in extent dependent upon how similar or different the genotypes are from each other.  One should in theory be able to quantitate this in a continuous fashion, rather than just having a binary yes/no undifferentiated/completely differentiated choice.

This is obviously an important topic.  If we are to make decisions based on genetic interests, don’t we need to have a better understanding about what those interests actually are, quantitatively speaking?

It’s true that we know enough right now to justify taking action in defense of genetic interests; even at the lowest levels of genetic integration, and even with estimates of child equivalents based on Fst, we already know that mass migration of alien peoples is genocide.

So, yes, I’m sympathetic to the argument that in general, qualitatively speaking, it is more important to actualize a defense of the interests we already know about than to fine-tune our understanding of these interests. But why not both? Nothing stops us from both organizing on a political and metapolitical level while at the same time continuing to refine our understanding of this topic. While most of my work now concerns the political and metapolitical implications of defending EGI and of actualizing a High Culture, surely there is also a place for a better understanding of EGI and for a better understanding of Spenglerian cycles and how to control them for civilizational benefit.