Category: genetic variation

PopGen June 2019

Two papers.

The first:

Abstract
In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe.

That’s interesting, I suppose, but what is really needed from population genetics is two things. First, global assays of genetic kinship. Second, application of genetic structure and genetic integration (e.g., Gillet and Gregorius) to human genetic data. These things are consistently not being done. Is it because they are viewed as uninteresting to the field, or is it because the findings would be politically unpalatable to the field?

Author summary
We introduce a novel statistical method to infer migration rates and population sizes across space in recent time periods. Our approach builds upon the previously developed EEMS method, which infers effective migration rates under a dense lattice. Similarly, we infer demographic parameters under a lattice and use a (Voronoi) prior to regularize parameters of the model. However, our method differs from EEMS in a few key respects. First, we use the coalescent model parameterized by migration rates and population sizes while EEMS uses a resistance model. As another key difference, our method uses haplotype data while EEMS uses the average genetic distance. A consequence of using haplotype data is that our method can separately estimate migration rates and population sizes, which in essence is done by using a recombination rate map to calibrate the decay of haplotypes over time. An additional useful feature of haplotype data is that, by varying the lengths analyzed, we can infer demography associated with different recent time periods. We call our method MAPS for estimating Migration And Population-size Surfaces. To illustrate MAPS on real data, we analyze a genome-wide SNP dataset on 2224 individuals of European ancestry.
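The link between segment length and time period described above can be sketched with a textbook approximation: a segment inherited from a common ancestor g generations back has passed through roughly 2g meioses, so its length is approximately exponential with mean 100/(2g) cM. The following is only an illustration of that idea, not the MAPS method itself. The function names, the toy data, and the lowest length class are assumptions; only the 5-10 cM and >10 cM classes are mentioned in the paper excerpts quoted here, and the real age distribution of segments also depends on demography.

```python
from collections import Counter


def expected_length_cm(generations):
    """Mean shared-segment length (cM) for a common ancestor `generations` back.

    Textbook approximation: about 2 * generations meioses separate the two
    carriers, so length ~ Exponential with mean 100 / (2 * generations) cM.
    """
    return 100.0 / (2 * generations)


def approx_generations(length_cm):
    """Crude inversion of the approximation above (ignores demography)."""
    return 100.0 / (2 * length_cm)


# Hypothetical length classes in cM: lower bound inclusive, upper exclusive.
LENGTH_BINS = [(2.0, 5.0), (5.0, 10.0), (10.0, float("inf"))]


def count_shared_segments(segments, bins=LENGTH_BINS):
    """Count shared segments per (pair of individuals, length class).

    `segments` is an iterable of (individual_a, individual_b, length_cm).
    Returns a Counter keyed by ((a, b), (low, high)), with the pair sorted
    so that (a, b) and (b, a) are treated as the same pair.
    """
    counts = Counter()
    for a, b, length in segments:
        pair = tuple(sorted((a, b)))
        for low, high in bins:
            if low <= length < high:
                counts[(pair, (low, high))] += 1
                break
    return counts


if __name__ == "__main__":
    # Made-up segments, purely for illustration.
    toy = [("i1", "i2", 3.2), ("i2", "i1", 12.5), ("i1", "i3", 6.0)]
    for key, n in sorted(count_shared_segments(toy).items()):
        print(key, n)
```

Counting segments separately per length class is what lets longer segments speak to more recent periods and shorter segments to older ones; segments below the smallest class are simply ignored in this sketch.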

I’m not going to judge the validity of this approach without more data; however, any cursory look at current population genetic studies illustrates how the “testing companies” are behind the cutting edge of methodology.

Largely speaking, the spatial variation in inferred dispersal rates and population densities is remarkably consistent across the different time scales (Fig 4). In the MAPS dispersal surfaces, several regions with consistently low estimated dispersal rates coincide with geographic features that would be expected to reduce gene flow, including the English Channel, Adriatic Sea and the Alps. 

In general, geographic barriers have historically impeded (but obviously not abrogated) gene flow.

In addition we see consistently high dispersal across the region between the UK and Norway, which may reflect the known genetic effects of the Viking expansion [22]. 

See more on this below.

These features are consistent with visual inspection of the raw lPSC sharing data (S4b Fig). The MAPS population density surfaces consistently show lowest density in Ireland, Switzerland, Iberia, and the southwest region of the Balkans. This is consistent with samples within each of these areas having among the highest PSC segment sharing (S4a Fig). The MAPS inferred country population sizes are also highly correlated with estimated current census population sizes from [36] and [37] (S5 Fig) which can be mainly attributed to the fact that lPSC segments are highly informative of current census population sizes (Fig 5).

And then:

We do note the lower estimated dispersal rates between Portugal and Spain compared to the rest of Europe in the analyses of longer PSC segments (5-10 and > 10cM), and the higher estimated dispersal rates through the Baltic Sea (> 10cM segments), possibly reflecting changing gene flow in these regions in recent history.

I’m not sure what to make of that Iberian data.  I’m not aware of any significant geographical barrier there, so is that an example of political barriers affecting gene flow?  The data of this paper call into question “testing companies” using generalized “Iberian” or “British/Irish” ancestral categories.

Our estimates of dispersal distances and population density from the POPRES data are among the first such estimates using a spatial model for Europe (though see [30]). The features observed in the dispersal and population density surfaces are in principle discernible by careful inspection of the numbers of shared PSC segments between pairs of countries (e.g. using average pairwise numbers of shared segments, S4b Fig, as in [20]). For example, high connectivity across the North Sea is reflected in the raw PSC calls: samples from the British Isles share a relatively high number of PSC segments with those from Sweden (S4b Fig). 

This is consistent with what is mentioned above, compatible with the historically known gene flow from Scandinavia to the British Isles, particularly England, during the Viking age.

Also the low estimated dispersal between Switzerland and Italy is consistent with Swiss samples sharing relatively few PSC segments with Italians given their close proximity (S4b Fig). 

The Alps being one of the geographical barriers mentioned above.  This of course is not compatible with Der Movement dogma of Northern Italians being “Celto-Germanic Nordics.”

However, identifying interesting patterns directly from the PSC segment sharing data is not straightforward, and one goal of MAPS (and EEMS) is to produce visualizations that point to patterns in the data that suggest deviations from simple isolation by distance.

The inferred population size surfaces for the POPRES data show a general increase in sizes through time, with small fluctuations across geography; in our results, the smallest inferred population sizes are in the Balkans and Eastern Europe more generally. This is in agreement with the signal seen previously [20]; however, taken at face value, our results suggest that high PSC sharing in these regions may be due more to consistently low population densities than to historical expansions (such as the Slavic or Hunnic expansions).

Relative population density may be a driver of genetic history, and one ignored by Der Movement in lieu of more colorful stories about expansions and admixture.

Second paper:

The roles of migration, admixture and acculturation in the European transition to farming have been debated for over 100 years. Genome-wide ancient DNA studies indicate predominantly Aegean ancestry for continental Neolithic farmers, but also variable admixture with local Mesolithic hunter-gatherers. Neolithic cultures first appear in Britain circa 4000 BC, a millennium after they appeared in adjacent areas of continental Europe. The pattern and process of this delayed British Neolithic transition remain unclear. We assembled genome-wide data from 6 Mesolithic and 67 Neolithic individuals found in Britain, dating 8500-2500 BC. Our analyses reveal persistent genetic affinities between Mesolithic British and Western European hunter-gatherers. We find overwhelming support for agriculture being introduced to Britain by incoming continental farmers, with small, geographically structured levels of hunter-gatherer ancestry. Unlike other European Neolithic populations, we detect no resurgence of hunter-gatherer ancestry at any time during the Neolithic in Britain. Genetic affinities with Iberian Neolithic individuals indicate that British Neolithic people were mostly descended from Aegean farmers who followed the Mediterranean route of dispersal. We also infer considerable variation in pigmentation levels in Europe by circa 6000 BC.

Contra Duchesne, ancestry deriving from Neolithic farmers is not restricted to Southern Europe; it is just much more concentrated there.

Autism, Spengler, Lewontin, Der Movement, and Moobs

In der news.

First, about autism, emphasis added:

Autism spectrum disorder (ASD) manifests as alterations in complex human behaviors including social communication and stereotypies. In addition to genetic risks, the gut microbiome differs between typically developing (TD) and ASD individuals, though it remains unclear whether the microbiome contributes to symptoms. We transplanted gut microbiota from human donors with ASD or TD controls into germ-free mice and reveal that colonization with ASD microbiota is sufficient to induce hallmark autistic behaviors. The brains of mice colonized with ASD microbiota display alternative splicing of ASD-relevant genes. Microbiome and metabolome profiles of mice harboring human microbiota predict that specific bacterial taxa and their metabolites modulate ASD behaviors. Indeed, treatment of an ASD mouse model with candidate microbial metabolites improves behavioral abnormalities and modulates neuronal excitability in the brain. We propose that the gut microbiota regulates behaviors in mice via production of neuroactive metabolites, suggesting that gut-brain connections contribute to the pathophysiology of ASD.

So, instead of giving mice MMR vaccinations to induce autism, the “gut microbiota” of autistic humans was sufficient to do the job, including alterations of “alternative splicing of ASD-relevant genes” in the mouse brains.

What could affect the “gut microbiota?”  There’s the initial colonization during gestation, birth, and early years of life. There is diet. And there is antibiotic use.  Conspicuously missing from that list are the “Big Pharma” vaccines injected by dastardly “Jew doctors.” But alas, the festering microbial environment of “Mama,” the lousy diets, and those tasty pink-colored antibiotic spoonfuls (given liberally, even for viral infections against which they are useless) are not “scary needles,” so it’s all A-OK!  After all, those scary injections are violating our bodily integrity and contaminating our precious bodily fluids, so we can’t have that!  Just take dem dere pills and spoonfuls, eat dat dere junk food, and it’ll all be fine!  Also, make sure to expose your child’s amygdala to bizarre NEC phenotypes as well.  What could go wrong?

Lewontin’s Fallacy and “genetic variation within vs. between” mentioned here.  Of course, several months ago, EGI Notes posted a comprehensive refutation of Lewontin, demonstrating, using genetic data and calculations, that ANY human group, no matter how randomly chosen, will always exhibit more genetic variation within than between.  Indeed, random groupings of Whites and Blacks mixed together will also demonstrate the same pattern – conclusively showing that the pattern is due to random distribution of genetic variation among all humans, with no relevance to racial classification whatsoever.  A comparison of human racial Fst to that of dog breeds was also discussed.

That post got ZERO attention from Der Movement; after all, the Quota Queens have circled the wagons and have established a cordon sanitaire around the Sallis Groupuscule, lest I threaten their tin cup panhandling and their by-birthright-affirmative-action-positions.

Comment about Spengler from Counter-Currents:

BroncoColorado
Posted June 5, 2019 at 2:16 pm | Permalink
Spengler deserves, but does not receive, more criticism from rightist sources. As has been mentioned in other posts two other philosophers of history, Piritim Sorokin and Lawrence Brown have both ably demonstrated the inaccuracy and danger in using biological metaphors to describe the development of a culture.
Although Spengler was a teacher of mathematics his literary style is not precise or particularly clear, he writes more as a poet than as a scientist. A scientist with desire to explain history should use his training to identify the causes for a culture taking a particular development path, and not ascribe that path to a destiny that is chosen for it. Such a description is too mystical. In some ways Spengler’s outlook has parallels with his near contemporary Hans Driesch and his thoughts related to biological determinism as observed in embryos.
We need to distance ourselves from Spengler’s brand of determinism. History is open-ended, and it is imperative that we change roads at the next off-ramp, if one isn’t in sight then prepare for some off road driving.

Revilo Oliver, many years ago, critiqued Spengler; I have done so more recently.  Of course, my critique was ignored by Der Movement as well.  Surprise!

Spencer is right.  But he’s not calling out the appropriate people.  David French?  Pshaw!  As Derb would say.  Let’s see. I can think of two prominent racialist types who have, in recent months, publicly stated that we are winning and that our victory is inevitable.  One of these is someone Spencer despises (and vice versa). Then we had this:

Jared Taylor’s short answer to “why we are winning” was “because we’re right and our opponents are wrong.”

Ball in your court, Mr. Spencer.

Der Movement likes to talk about “soyboys,” but based on recent findings, we had better talk about “beer bros” instead.  Is this the reason why the Type I Nutzis are so inept?  Is it that all of the “Sieg Heil and pass the beer” crowd are drenched in estrogen?  Adjust those bra straps, fellas!

New Fst and Kinship Estimators

And a statement on Identity.

In all cases, emphasis added.

The abstract:

Kinship coefficients and FST, which measure genetic relatedness and the overall population structure, respectively, have important biomedical applications. However, existing estimators are only accurate under restrictive conditions that most natural population structures do not satisfy. We recently derived new kinship and FST estimators for arbitrary population structures [1, 2]. Our estimates on human datasets reveal a complex population structure driven by founder effects due to dispersal from Africa and admixture. Notably, our new approach estimates larger FST values of 26% for native worldwide human populations and 23% for admixed Hispanic individuals, whereas the existing approach estimates 9.8% and 2.6%, respectively. While previous work correctly measured FST between subpopulation pairs, our generalized FST measures genetic distances among all individuals and their most recent common ancestor (MRCA) population, revealing that genetic differentiation is greater than previously appreciated. This analysis demonstrates that estimating kinship and FST under more realistic assumptions is important for modern population genetic analysis.

I’m not a fan of Fst for genetic distance estimates for reasons discussed at this blog, and based on peer-reviewed literature, but it is used for that by many, so let’s see what this paper says.

From the main text:

However, the most commonly-used standard kinship estimator [9, 10, 13–19] is accurate only in the absence of population structure [2, 20]. Likewise, current FST estimators assume that individuals are partitioned into statistically-independent subpopulations [4, 5, 21–23], which does not hold for human and other complex population structures.

About Hispanics:

In particular, since differentiation increases from AFR to EUR to AMR (Fig. 3), the greatest kinship is between individuals with higher AMR ancestry, and the lowest kinship is between individuals with higher AFR ancestry (Fig. 4B and C).

So, it would seem that Hispanics like Mexicans and Peruvians have greater kinship among them than do the Caribbean-type Hispanics who stress Negro admixture to a greater extent.  Genetic differentiation (and kinship) seems highest among Amerindians and Pacific Islanders.

Fst between populations may be “substantially larger” than previously determined:

Remarkably, our estimated FST of 0.260 is substantially larger than estimates around 0.098 from existing approaches (Fig. 3) and previous measurements based on FST [30, 45] or related variance component models [31, 46, 47] — except for some AMOVA ΦST estimates [48] (pairwise FST estimates [23, 49–52] are not generally comparable to our estimate). Existing approaches underestimate FST because they assume zero kinship between subpopulations, clearly incorrect as seen in Fig. 1C, whereas our new approach models arbitrary kinship between individuals and leverages kinship to estimate FST.
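For reference, the kind of "existing approach" the excerpt contrasts itself with can be sketched as the classical heterozygosity-based FST, which treats subpopulations as independent units. This is a minimal sketch of that classical formula, (H_T - H_S) / H_T combined across loci as a ratio of sums; it is not the paper's new estimator, and the function name and the allele frequencies used below are made up for illustration.

```python
def fst_ratio_of_averages(freqs, weights=None):
    """Classical FST from per-subpopulation allele frequencies.

    `freqs` is a list of loci; each locus is a list with one reference-allele
    frequency per subpopulation. `weights` are subpopulation weights that sum
    to 1 (equal weights if omitted).

    Per locus: H_T = 2 * p_bar * (1 - p_bar) is the expected heterozygosity
    of the pooled population, and H_S is the weighted mean of the
    within-subpopulation heterozygosities. Loci are combined as
    sum(H_T - H_S) / sum(H_T).
    """
    n_pops = len(freqs[0])
    if weights is None:
        weights = [1.0 / n_pops] * n_pops
    numerator = 0.0
    denominator = 0.0
    for locus in freqs:
        p_bar = sum(w * p for w, p in zip(weights, locus))
        h_t = 2 * p_bar * (1 - p_bar)
        h_s = sum(w * 2 * p * (1 - p) for w, p in zip(weights, locus))
        numerator += h_t - h_s
        denominator += h_t
    # With no variable loci FST is undefined; return 0.0 by convention here.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Two subpopulations, three loci, invented frequencies.
    toy = [[0.2, 0.8], [0.5, 0.5], [0.1, 0.3]]
    print(fst_ratio_of_averages(toy))
```

Note that this formula measures differentiation relative to the pooled sample and has no term for kinship between subpopulations, which is the limitation the quoted passage points to.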

Consistent with the “genes follow geography” paradigm, with genetic variation being both clinal and discontinuous.

We typically see that each ancestry cluster is concentrated in a certain geographical region, and this ancestry is also present to a lesser extent in neighboring regions and diminishes with geographical distance from its point of greatest concentration. This again argues for a complex population structure where relatedness at the population level falls on a continuum rather than taking on discrete values. The most notable geographic discontinuities in ancestry were observed for cluster 3, which is roughly West Eurasian ancestry.

And within West Eurasians?

Among West Eurasians, kinship is higher within Europe, reflecting another bottleneck.

So much for those that have denied any differences among West Eurasians.

It would be useful to use the new kinship estimator to get quantitative data for groups and transform those into child equivalents as well. That would be important for biopolitical considerations, an important component, but not the only component, of biopolitical identity. Identity – particularly from the general Yockeyian perspective I espouse – has multiple components.

Interestingly, the authors of this paper take a similar perspective; thus:

This partition into subpopulation is based on geography, history, language families, and our kinship estimates.

If “history” includes cultural/civilizational components, which are the major proximate interests, then this tracks well with my idea of Identity, composed both from the key ultimate interest (genetic kinship) and the major proximate interests. These different sets of interests synergize to form sharp discontinuities which are not present when only one interest is considered in isolation.

Now, I do not agree with the authors including the Ashkenazim in the European subpopulation, but that does not mean their approach is wrong – they are simply following the same simplistic mindset reflected by the testing companies that “they are found in Europe so they are European,” ignoring the history of the Ashkenazim as a Diaspora group akin to the Roma.

But, that’s a minor detail. The major approach of synergistic Identity is sound.

More on Cancer Cell Lines

Race, race, race.

Read here.  Excerpts, emphasis added:

Assessing the role of ancestry-associated genetic variations in disease etiology is further complicated by the recent admixture that characterizes various populations of the world (24). Hence, an individual’s ancestry can be described by quantifying the proportion of the genome derived from each contributing population (global ancestry). Heterogeneity is also observed locally in the genome, as variability is observed in the ancestral origins of any particular segment of chromosomes (local ancestry; ref. 25). Ultimately, genetics plays a role in the biological characteristics of a cancer in the form of both germline variation and somatic alterations. Further research is needed to determine the extent to which genetic differences align with ancestral genetic changes (26).

Cell lines reported as “African” or “Black” clustered with African-American populations in 81.6% of the cases, emphasizing the ambiguity of the existing nomenclature. In fact, the proportion of the genome inferred to be of European origin in these cell lines averaged 18.32% (ranging from 0% to 95.09%). Another type of ambiguity concerns the cell line Hs 698.T labeled as originating from an “American Indian,” which clusters with populations of South Asia, suggesting an origin in India rather than from a Native/Indigenous American individual. A total of 26 cell lines were reported as Caucasian but clustered genetically with other populations including African (n = 2), African American (n = 6), East Asian (n = 1), Hispanic/Latinos (n = 16), and South Asian (n = 1). Interestingly, 89% of the cell lines identified as Hispanic/Latino from admixture patterns and clustering are reported as “Caucasian.” Several groups have reported a concordance between self- or observer-reported belonging to major racial/ethnic groups (141–143). However, these categories do not capture the inherent heterogeneity of admixed populations (144–147). What appears as inconsistencies in self-report and genetic data may result from individuals having limited knowledge of their ancestral origins, or culturally identifying to an ethnic group that is not representative of one’s admixture proportions (18). Sociological, behavioral, and biological factors that underlie race, ethnicity, and ancestry are likely to interact (148). Consequently, from a biomedical research perspective, both self-reports of race/ethnicity group as well as genetically determined clustering and admixture are expected to be relevant in understanding disease susceptibility, and ultimately, the causes of health disparities (18, 148, 149).

Note the last phrase.  Also, importantly, there is misclassification.  Given that people are not always accurate about their own self-reported ancestry, what can we say about the ancestry testing companies that use customer samples to inflate their pathetically limited parental/reference population datasets?

Also consider Figure 1 in the paper. It looks to me like the cancer cell lines exhibit more admixture than the actual human population samples. At the very least, there are observable differences in ancestral proportions. Some of that of course is simply the well known admixture in “African Americans,” but what about the other populations? That could be due to the misclassification mentioned above, there are of course issues about sample size, and concerns over how accurate the ancestry testing is. Cancer cell lines also tend to have high mutation rates, reflecting the situation in the tumor of origin. However, even with all those caveats, can we consider the possibility that increased admixture is associated with a higher cancer risk; hence, cancer cell lines show more admixture because cancer patients are on average more admixed than is typical of the general population? Given how prevalent cancer is, the differences are not great, as we are dividing populations into two relatively similar “chunks” (the difference being cancer vs. non-cancer); but still, if there are going to be any differences between the two “chunks” – perhaps the cancer “chunk” exhibits more admixture than the non-cancer “chunk?” Anyone willing to test the hypothesis? Or, we can consider the more general hypothesis of statistically significant differences in ancestry between cancer vs. non-cancer for each population group (regardless of admixture, or which group has more admixture, etc.).

We Are Not All the Same

Genes, Race, IQ, and disease.

One refutation of Lynn, and three papers with emphasis added.

Refuting Lynn, refuting the Alt Wrong/Alt Yellow.  Amren weeps.

Read here.

BACKGROUND:
Although cell lines are an essential resource for studying cancer biology, many are of unknown ancestral origin, and their use may not be optimal for evaluating the biology of all patient populations.
METHODS:
An admixture analysis was performed using genome-wide chip data from the Catalogue of Somatic Mutations in Cancer (COSMIC) Cell Lines Project to calculate genetic ancestry estimates for 1018 cancer cell lines. After stratifying the analyses by tissue and histology types, linear models were used to evaluate the influence of ancestry on gene expression and somatic mutation frequency.
RESULTS:
For the 701 cell lines with unreported ancestry, 215 were of East Asian origin, 30 were of African or African American origin, and 453 were of European origin. Notable imbalances were observed in ancestral representation across tissue type, with the majority of analyzed tissue types having few cell lines of African American ancestral origin, and with Hispanic and South Asian ancestry being almost entirely absent across all cell lines. In evaluating gene expression across these cell lines, expression levels of the genes neurobeachin like 1 (NBEAL1), solute carrier family 6 member 19 (SLC6A19), HEAT repeat containing 6 (HEATR6), and epithelial cell transforming 2 like (ECT2L) were associated with ancestry. Significant differences were also observed in the proportions of somatic mutation types across cell lines with varying ancestral proportions.
CONCLUSIONS:
By estimating genetic ancestry for 1018 cancer cell lines, the authors have produced a resource that cancer researchers can use to ensure that their cell lines are ancestrally representative of the populations they intend to affect. Furthermore, the novel ancestry-specific signal identified underscores the importance of ancestral awareness when studying cancer.

Racial genetic differences mean that results obtained with cancer cell lines from one race may very well be NOT applicable to other races.  There are indeed racial differences in gene sequences and gene expression, with clinically significant implications for patients.

Read here.

BACKGROUND:
We examined racial differences in the expression of eight genes and their associations with risk of recurrence among 478 white and 495 black women who participated in the Carolina Breast Cancer Study Phase 3.
METHODS:
Breast tumor samples were analyzed for PAM50 subtype and for eight genes previously found to be differentially expressed by race and associated with breast cancer survival: ACOX2, MUC1, FAM177A1, GSTT2, PSPH, PSPHL, SQLE, and TYMS. The expression of these genes according to race was assessed using linear regression and each gene was evaluated in association with recurrence using Cox regression.
RESULTS:
Compared to white women, black women had lower expression of MUC1, a suspected good prognosis gene, and higher expression of GSTT2, PSPHL, SQLE, and TYMS, suspected poor prognosis genes, after adjustment for age and PAM50 subtype. High expression (greater than median versus less than or equal to median) of FAM177A1 and PSPH was associated with a 63% increase (hazard ratio (HR) = 1.63, 95% confidence interval (CI) = 1.09-2.46) and 76% increase (HR = 1.76, 95% CI = 1.15-2.68), respectively, in risk of recurrence after adjustment for age, race, PAM50 subtype, and ROR-PT score. Log2-transformed SQLE expression was associated with a 20% increase (HR = 1.20, 95% CI = 1.03-1.41) in recurrence risk after adjustment. A continuous multi-gene score comprised of eight genes was also associated with increased risk of recurrence among all women (HR = 1.11, 95% CI = 1.04-1.19) and among white (HR = 1.14, 95% CI = 1.03-1.27) and black (HR = 1.11, 95% CI = 1.02-1.20) women.
CONCLUSIONS:
Racial differences in gene expression may contribute to the survival disparity observed between black and white women diagnosed with breast cancer.

Health disparity differences in outcome for breast cancer in White vs. Black women have a genetic basis.

Read this.

Age at menarche (AM) and age at natural menopause (ANM) define the boundaries of the reproductive lifespan in women. Their timing is associated with various diseases, including cancer and cardiovascular disease. Genome-wide association studies have identified several genetic variants associated with either AM or ANM in populations of largely European or Asian descent women. The extent to which these associations generalize to diverse populations remains unknown. Therefore, we sought to replicate previously reported AM and ANM findings and to identify novel AM and ANM variants using the Metabochip (n = 161,098 SNPs) in 4,159 and 1,860 African American women, respectively, in the Women’s Health Initiative (WHI) and Atherosclerosis Risk in Communities (ARIC) studies, as part of the Population Architecture using Genomics and Epidemiology (PAGE) Study. We replicated or generalized one previously identified variant for AM, rs1361108/CENPW, and two variants for ANM, rs897798/BRSK1 and rs769450/APOE, to our African American cohort. Overall, generalization of the majority of previously-identified variants for AM and ANM, including LIN28B and MCM8, was not observed in this African American sample. We identified three novel loci associated with ANM that reached significance after multiple testing correction (LDLR rs189596789, p = 5×10⁻⁰⁸; KCNQ1 rs79972789, p = 1.9×10⁻⁰⁷; COL4A3BP rs181686584, p = 2.9×10⁻⁰⁷). Our most significant AM association was upstream of RSF1, a gene implicated in ovarian and breast cancers (rs11604207, p = 1.6×10⁻⁰⁶). While most associations were identified in either AM or ANM, we did identify genes suggestively associated with both: PHACTR1 and ARHGAP42. The lack of generalization coupled with the potentially novel associations identified here emphasize the need for additional genetic discovery efforts for AM and ANM in diverse populations.

There seems to be genetic differences underlying reproductive lifespan in women of different races.  I hypothesize that Negro females would tend to possess variants promoting earlier reproduction.  Blacks and Hispanics have earlier puberty than Whites.

Ancestral Graphics

A more visual explanation.

Let us explore some of the ideas broached here in a simplistic visual manner, to make some of the basic concepts more understandable to drooling Nutzi Type I retards. Note that all of the below is obviously very highly simplified so as to make the concepts clear to “movement” “activists” and their below-room-temperature IQs.

Also note that the first graphic uses, again for the sake of simplicity, a one-dimensional continuum, as opposed to the two-dimensional PCA plots used in many population genetics studies (and true biological reality is multi-dimensional, more complex than any PCA plot).  It shows clinal genetic variation.  Blue and green are European populations, while purple and yellow are non-European.  The other X’s are other populations that lie along the continuum of genetic variation. The red and orange-brown X’s represent populations even more genetically distant from Europeans than are the purple and yellow; these are presented for the sake of illustrating clinal variation and will not be relevant to the following analysis.


xxXxxxXxxxxXxxx—-xxxxxXxxxXxxxx—xxxxxxXxxxxxxXxx

A company calculates ancestry based on SNP gene frequencies, and chooses purple, blue, and yellow as parental populations (we can assume red and orange-brown are chosen also, but, again, for the sake of simplicity, we will not discuss those populations). Thus, the  company chooses blue as a population representing “European.”  Green is not a parental population for this company.

So, a green individual (i.e., someone of “green” ancestry), represented by the purple-blue-yellow parental populations, might be, say 85-90% blue and 10-15% yellow.  Blue individuals and individuals from the X’s adjacent to blue, would test out as close to 100% blue (European). What if green was chosen as a European parental population instead of blue?  Then green individuals (and persons from related groups) would be close to 100% green (European) and blue individuals may show significant fractions of purple.  Of course, including both blue and green as parental populations would be best.
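The dependence on the choice of parental populations can be sketched in a few lines of Python. This is only a toy under loud assumptions: the positions on the one-dimensional continuum (purple 0, blue 4, green 5, yellow 12) are invented for illustration, and the hypothetical `cline_admixture` function simply expresses an individual as a mix of the two flanking parental populations. No company's actual algorithm is being reproduced here.

```python
def cline_admixture(position, references):
    """Express a point on a 1-D cline as a mix of the two flanking references.

    position   -- hypothetical location of the tested individual on the cline
    references -- dict mapping parental population name -> location on the cline
    Returns a dict mapping population name -> ancestry fraction.
    """
    ordered = sorted(references.items(), key=lambda item: item[1])
    # Individuals beyond either end of the panel are assigned to the nearest reference.
    if position <= ordered[0][1]:
        return {ordered[0][0]: 1.0}
    if position >= ordered[-1][1]:
        return {ordered[-1][0]: 1.0}
    # Otherwise interpolate linearly between the two neighboring references.
    for (left_name, left_pos), (right_name, right_pos) in zip(ordered, ordered[1:]):
        if left_pos <= position <= right_pos:
            weight = (position - left_pos) / (right_pos - left_pos)
            return {left_name: 1.0 - weight, right_name: weight}


# Invented positions, chosen only to mirror the color example in the text.
BLUE, GREEN = 4, 5
panel_with_blue = {"purple": 0, "blue": BLUE, "yellow": 12}
panel_with_green = {"purple": 0, "green": GREEN, "yellow": 12}

green_vs_blue_panel = cline_admixture(GREEN, panel_with_blue)
blue_vs_green_panel = cline_admixture(BLUE, panel_with_green)
print(green_vs_blue_panel)   # green person scored against the blue panel
print(blue_vs_green_panel)   # blue person scored against the green panel
```

With these made-up positions, the green individual comes out mostly blue with a minority yellow fraction under the first panel, and the blue individual picks up a purple fraction under the second. Same genomes, different panels, different "admixture."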


Given that a company (deCODE back when they were offering their own ancestry test) openly admitted that clinal genetic variation coupled with a limited set of parental populations could result in artefactual “admixture,” the above analysis, however simplified, is a reflection of the reality of these tests. The more similar someone is to the parental populations, the greater the probability of getting high-percentage (i.e., close to 100%) matches to their actual ancestry. The more distant, the lower the probability.  The finer the level of distinction required, the greater the need for more parental populations.  At the level that these companies purport to assay, at racial, subracial, and ethnic levels, of course you will need a very broad array of parental populations, which they do not have. And, yes, of course they know this.  After all, why do they occasionally add more parental populations to their limited databases?  If it really didn’t matter, we could just go back to the days of DNAPrint and use CEU Romneyites from Utah as “European” and not bother with anything else.  But, alas, then Germans would start getting “East Asian admixture” and we can’t have that.

Apologists for ancestry testing companies would argue that some of those companies use chromosome blocks (haplotypes) to make their ancestry estimates, rather than just SNP gene frequencies.  As I wrote in the above-linked post, this is even worse.  Let’s consider what can go wrong here, again using a simplified example suitable for brain-addled Nutzi freaks.

Let’s assume an individual from the green ethny is tested via the haplotype/chromosome block method, using the same blue-purple-yellow parental populations.

At the most conservative, highest confidence level of 90% (that is still less than the 95% typically used in scientific publications, although there is obviously subjectivity on where to draw the line), this person gets 58% blue, 40% unassigned (black), and 2% yellow (that can be real or artefactual).  That can be crudely represented as follows (with a single continuum representing all the chromosomes for the sake of simplicity):

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

But, at the pathetically comical 50% confidence level (flip a coin!), the 40% that was unassigned at 90% becomes: 3% still unassigned, 27% looks a bit more blue than yellow and so is assigned to blue, and 10% looks a bit more yellow than blue and so is assigned to yellow.

Now the person is 85% blue, 12% yellow, and 3% unassigned.  Again we assume blue = European, and yellow is some non-European group.  That’s crudely shown here:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
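The arithmetic of the two confidence levels can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: the segment sizes are taken from the percentages above, but the per-segment probabilities are invented, and the hypothetical `paint` function (assign a segment to its most probable population only if that probability exceeds the threshold) is a simplification, not any company's documented procedure.

```python
def paint(segments, threshold):
    """Sum genome percentages per label at a given confidence threshold.

    segments  -- list of (percent_of_genome, {population: probability}) pairs
    threshold -- a segment is assigned only if its top probability exceeds this
    Returns a dict mapping label -> percent of genome.
    """
    totals = {}
    for percent, probabilities in segments:
        best = max(probabilities, key=probabilities.get)
        label = best if probabilities[best] > threshold else "unassigned"
        totals[label] = totals.get(label, 0) + percent
    return totals


# Hypothetical green individual tested against a blue-purple-yellow panel.
# Percentages follow the text; the probabilities are made up.
green_individual = [
    (58, {"blue": 0.95, "yellow": 0.05}),                  # confidently blue
    (2,  {"yellow": 0.92, "blue": 0.08}),                  # confidently yellow
    (27, {"blue": 0.60, "yellow": 0.40}),                  # leans slightly blue
    (10, {"yellow": 0.60, "blue": 0.40}),                  # leans slightly yellow
    (3,  {"blue": 0.45, "yellow": 0.45, "purple": 0.10}),  # no clear call
]

conservative = paint(green_individual, 0.90)
default = paint(green_individual, 0.50)
print(conservative)
print(default)
```

Nothing about the genome changes between the two calls. Only the threshold moves, and 37% of the genome goes from "unassigned" to a named ancestry.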

The default setting is at 50% confidence for the company to report their results and the Nutzis start their heavy breathing excitement.  Admixture!

But what if the parental populations were purple, green, and yellow?  Then all of the above would hold, but substituting green for blue, and purple for yellow.  Here, green = European, and purple and yellow = non-European, the green individual would now be 98-100% green and 0-2% yellow. 


xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


This green individual would have little unassigned even at 90%, while the blue individual would now exhibit the same problems the green individual had before (albeit with different color combinations).  So, at 50% we would have the following for a blue individual:


xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx


Now, at this point, the Nutzis will be screaming about how “crazy” and “wrong” the test is.  


By the way, if there is any question about the validity of the haplotype discussion in this post, see these admissions from the heroes at 23andMe (spun in a manner to make them look less culpable).  The highlights:

Each prediction is also linked to our confidence that the call is correct. By default, Ancestry Composition requires that our confidence in a prediction be greater than 50%.

Two points.  First, “greater than 50%” could be 51% or 50.1% or 52.25% or whatever measure (e.g., “55%” – see their Japanese example immediately below) slightly greater than the probability of a coin flip.  Second, even with that, the “calls” are based on what their parental population database is.  The “correct call” here does not really mean matching biological reality; instead, it merely means “correct” within the confines of the test’s parameters.

For example, if a segment of your DNA has a 55 percent chance of being Japanese, then that segment will be painted as Japanese at the 50 percent confidence level, but it will be painted with a more broad ancestry…at the 60 to 90 percent confidence levels.

Exactly what I have been writing all along about this.  And, of course, the individual in question might not be Japanese or part Japanese.  Perhaps this is someone of an Asian ethny not part of their parental population database, so the company is trying to assign chromosomal fragments based on the fragments’ relative similarity to that of ethnies that are part of their database.  If the actual population was included as a reference, then this would not be necessary (at least not to this extent).


And this also demonstrates why the haplotype/chromosome block (“chromosome painting”) method is even more sensitive to test parameters than is the more general SNP frequency method, particularly at low confidence levels.  A shift in probability from 49% to 51% can result in an entire chunk of the genome being reassigned to a different ancestry at the 50% confidence level, and that subtle shift could result from differential representation of parental populations.


Default reporting of such low confidence levels is ludicrous.

Grading the Testers

Best and worst of the worst: A survey of some ancestry testing companies. Introducing the term “parental privilege.”  In all cases, emphasis added.

This is an opinion piece on some examples of the “state-of-the-art” (such as it is) in commercially available ancestry testing; this is not meant to be comprehensive.  I’m not going to discuss online accusations that the companies fudge results to “stick it to racists.”  For the most part I’ll discuss the actual product, with a few words here and there about certain other issues.

Before we begin, let’s take a look at “movement” Type I droolcup commentary:

Tostig 

I think that it is weird how some people clutch at microscopic bits of DNA and pretend that they are something they are not. My DNA is 90% Southern England, 5% German and 5% Norwegian. The two 5%s, is due to those Vikings raping and pillaging everywhere. Actually, at least 30% is down to the hunter gatherers who followed the retreat of the ice. I do not desire any exotic mixture

Alfred the Great Tostig

Your DNA is very similar to mine. The admixture that we have is from kindred, white races. So we are pure.

Yeah, Alfred.  How about this instead – you derive from ethnies well represented in your testing company’s parental population database, ancestral components labeled as “European.”  You are essentially being compared to yourself.  Congratulations.

The perfect “historical” example of this was DNAPrint Genomics using Hapmap CEU – essentially Anglo-Mormons from Utah – as the parental population for “European,” followed by Pennsylvania German-Americans getting significant levels of “East Asian admixture.”  By those standards, Mitt Romney was undoubtedly a pureblood.  So, tell me, Alfie – if the PA Germans had been used to define “European” instead of Utah Mormons, what do you think would have happened to all that “East Asian admixture?”

I’m sure the Type I peanut gallery response to that question would be: (((((crickets)))))

For information aimed at Normies see the following:

Also, see Dienekes’ criticism of these tests and their use of parental reference populations for “training data.”  Note that many companies include customer data as part of their parental populations, and that data is not verifiable to the extent data derived from academic publications are.  Are customers’ reported ancestries accurate?

Please keep in mind I am focusing here on ancestry, not health data, and I am focusing on autosomal ancestry data, not NRY or mitochondrial DNA, single locus markers that I have zero interest in. If you are into health data, 23andMe provides some of that, although people have complained about the accuracy of such data (stories about that are found online), and several companies provide NRY and Mito data that seem reasonably accurate.

The following comments are based on my reading about, and analyzing, the tests and some online results, based on my own scientific knowledge, the population genetics literature, what is known about human population history, as well as logic and common sense.  The viewpoint is informed by a concern for politically relevant EGI, rather than “movement” obsessions about “purity.”

I have long criticized 23andMe, which is an absolutely terrible test – it seems like the most popular such test among both Normies and Nutzis, it has majestic flaws, it is constantly misinterpreted, and in my opinion the company’s lack of transparency about certain realities borders on fundamental dishonesty.  As one example of the latter, consider: 

23andMe amusingly explains “unassigned” this way:

It is also possible to see a percentage of your DNA listed as “Unassigned.” There are two reasons why a piece of DNA might have unassigned ancestry:

The piece of DNA matches many different populations from around the world.

The piece of DNA does not match any of the reference populations very well.

That’s amusing because obviously the second explanation is what makes sense; the first is absurd.  The whole purpose of the test is to identify and distinguish ancestral components. Since humans share 99+% of their DNA, excuse #1 should in theory apply to everyone equally.  If the riposte is that we are talking about specific haplotypes stretching across chromosomal fragments, and those are not so widely shared, then how could it be possible for a stretch of DNA (chosen for examination for its value in assigning ancestry) to be so widely shared as to be “unassigned” to begin with?  And why would most of this “unassigned” show up only at the highest, most conservative, confidence level?  How come this “matches many populations” fragment is assigned to specific populations at the 50% interval?  And why is it that the “unassigned” just so happens to show up for those individuals for whom parental population coverage is relatively lacking?  Coincidence?

It’s obviously point #2 – the fragment doesn’t match the limited reference populations. At the most conservative setting they admit this is “unassigned,” but at the lower settings they just pick whatever population samples they have available at the time that may be slightly more similar to the fragment than are others. To say that 30% or 40% or 50% or whatever of someone’s genome, which is being used to distinguish ancestry in the first place, is common to many populations is ludicrous if you think about it – it is just coming from a population that may be more intermediate in the clinal genetic range than the parental populations they use.  If the DNA fragment is one that is similar between “many populations” then how can they distinguish it in some people and not others?  Simple answer – the “privilege” of being a member of a population well represented as a parental population. This is what I term “parental privilege” in ancestry testing – some people derive from ethnies well represented as parental populations, so those individuals get good matches and relatively “pure” results. For these lucky people, suddenly the “shared by many populations” (faux) problem no longer exists. This company makes it sound (for point 1) that, hey, it’s just a generalized piece of DNA, but then they ignore that it just so happens – a coincidence no doubt! – that people who derive from certain parental populations can have that fragment very easily assigned. It is “common to many populations” only when no matching parental population can be found in their database. “Parental privilege” is analogous to a form of affirmative action in ancestry testing. 

A perfect example of parental privilege can be read here.  Note that, with the most conservative confidence level for 23andMe this person was getting only 0.3% unassigned ancestry. 0.3%!  Meanwhile, other people, at the same confidence level, get in the range of 30-50% – two orders of magnitude higher!  How can you compare the accuracy of those sets of results?  It is absurdity.  Someone who at the highest, most stringent, confidence level offered by the company, has only 0.3% of their ancestry unassigned is obviously getting much more dependable results than someone who has 30% or 40% or 50% or whatever high level of their ancestry as unassigned.  Amusingly, this person is still not satisfied by the fact that they are essentially being compared to themselves; thus:

The lack of defined reference samples from specific countries within the British Isles also sometimes gives confusing results. I have one project member with seven of his eight great-grandparents born in Wales and one great-grandparent born in Devon. At AncestryDNA he comes out as 64% Irish and 12% Great Britain, 12% Scandinavia and 11% Trace Regions. At Family Tree DNA his ancestry is reported as being 97% from the British Isles and 3% from Finland and Siberia.

Well, others have it worse.  And then read this, an excerpt:

One could thus reasonably infer that, rather than ancestry, commercial DNA test results represent current geographic distribution of various population groups living wherever they happened to be living when the companies collected their samples. A customer’s DNA matches inform them of not where their ancestors might have come from in the past, i.e., their ancestry, but rather current geographic distribution of similar patterns of DNA bits that each company happens to probe for. Is that DNA-based heritage or ancestry, i.e., hints of ancestral people customers may have descended from? Sounds far from it.

I agree and that is one of the points I make in this post; one flaw in these tests is using extant populations to model past “admixture” events, and this is particularly comical since even “pure” extant populations are mixtures of earlier groups.  Further, to use extant populations in a fair manner you need broad parental populations, which as we have seen, and will see more below, do not exist.

Getting back to the “unassigned” issue, online comments by customers speculating that “unassigned” regions may be due to “admixed regions” also fail as a logical excuse.  First, it doesn’t consistently fit with the idea of a fragment being read as shared by groups (excuse 1) – being shared by different groups is not necessarily the same as being a mixture of those groups. Second, on an even more fundamental logical level, it fails because all extant ethnies are mixtures of past groups, but are being definitively assigned as a specific ethny if that ethny is well represented as a parental population today. If an ethny is a parental population, then a similar fragment is assigned to that ethny, regardless of the ethnic history that created that stretch of gene sequences. So, this excuse basically conflates with the more realistic excuse 2 – insufficient coverage of parental populations. It’s not a mysterious piece of DNA widely shared but yet distinct for any group represented in the parental populations, and it cannot be shrugged off at the same time as admixture. It is poor coverage. Adding more parental populations will create matches to those regions even at a 90% (“conservative”) confidence level, and make all the results more accurate and realistic.

As stated above, using extant ethnies to estimate deep ancestry is not going to give consistent results, as we in fact observe. Then there is the problem that people take seriously the labels companies give to particular ancestral components – as if a name is more than just a convenient label and instead carries some deep and objective meaning about the underlying objective ancestry. If a company decided to label some type of ancestry “Martian” does that mean people with that ancestry are descended from little green men?  True enough, the labels do have some meaning on the broad scale.  No company is going to label Sino-Japanese ancestry as “European.”  They have some standards.  But as we dig deeper, the correlation between names and objective meaning starts to fall apart.

An example of the labeling problem is that some of these companies label Ashkenazi ancestry as “European” while ancestral components that are part of the European genepool (e.g., enriched in Southern Europe), entering from the Neolithic to Bronze Age to Classical Age (and to some extent possibly from more modern intrusive invasions and migrations) are labeled “Western Asian” or “Southwest Asian” or “Middle Eastern” or “North African” or “Turkish/Caucasus” due to similarity to gene sequences found in modern populations from that region.  Greg Johnson’s admixture component, if real and not an artifact, is likely from such an ancient source. 

Thus, a problem here is that Jewish genes are labelled “European” while parts of the European genepool are labeled as something else. Again, labels are not the things themselves; dependent on the biases and parent populations of a given company, a given ancestry can be assigned to different continental population groups.  For example, why not label Ashkenazi as Middle Eastern? Why not invent a “Neolithic” or “Mediterranean” label (instead of “West Asian” or “Southwest Asian”) for component autosomal ancestries enriched in those regions where J2 NRY is common?  One labeling scheme is as justified as the other.  If the idea is “we label based on where the ancestries are most enriched in modern times,” then great – last time I looked Israel is not in Europe.  Inconsistent much?

Also, if we were to assume some, any, or all, of this purported admixture is real (and only at the highest confidence levels should it really be possibly so considered), and if we note that current populations are being used as the parental populations, then it is clear that “Western Asian” or “Southwest Asian” almost certainly tracks with the dispersal of J2 NRY and would most likely be ancient Neolithic, Bronze Age, and perhaps Classical population movements. Later invasions would be Berber-Arab and would track with North African/Arabian ancestral components – although some of these can be ancient as well, particularly with contacts between Southern Europe (particularly Iberia) and North Africa in ancient times (as well as the modern Moorish intrusive elements).  We should not conflate “West Asian” with “North African” – these are not the same racially or historically. Consistently with these tests, ancestral components like Anatolian genetics seem to track with J2 NRY, so it appears it is an ancient component, and showing up in European populations because of poor parental population coverage.

And again we come back to the issue of parental populations.  European ethny “X” – not a parental population – is characterized by a test as having some degree of “admixture” compared to the parental populations available. However, if “X” itself is used as a parental population, then individuals from “X” will see most (and in some cases all) of the “admixture” disappear, since they are being compared to the consensus of their own ethny.  

The riposte to that would be that “that is an unfair obfuscation of the underlying genetic realities.”  Perhaps.  But why can’t the same be said about other groups used as parental populations?  As noted above, when DNAPrint Genomics was using CEU Utah residents as the European parental population – basically using Anglo-American Mitt Romney types as the reference population for “European” – some German-Americans (*) were getting “East Asian admixture.”  Most tests today have Germans as one of the parental populations, so few if any Germans are getting any such admixture. So, groups used as parental populations are “privileged” (see above) in the sense that members of such groups, or genetically closely related groups, are going to get minimal to no “admixture,” as they are being compared to their own ethny.  The riposte to that would be that “well, we ‘know’ from ‘racial history’ that some groups are more admixed than others, so the choices of parental populations makes sense.”  Perhaps, but that is mostly subjective, and when based on genetics data it is circular reasoning.

Objectively, we could just use raw genetics data for genetic kinship analysis – by its nature kinship analysis includes all sources of autosomal genetic variation, including admixture – but people seem not to want that and/or companies refuse to offer it. In any case, the companies can’t be so stupid as to not know that the choice of parental populations directly affects the results.  They (with the one exception below) just don’t want to admit it.  

One point I’d like to make is that although 23andMe’s “chromosome painting” has some advantages if done correctly – identifying chromosome blocks and the timing of putative admixture events – the key point is “if done correctly.”  In most cases, just looking at SNP frequency data is going to be much more dependable, because the higher-level analyses are increasingly dependent on proper parental population samples (as well as an overall proper methodology).  If you misidentify several SNPs out of the many used, well, that’s not good but not “fatal” to reasonably accurate results, as long as the rest of the SNPs are more or less correctly characterized. But if you misidentify an entire chunk of someone’s chromosomes, then you are going to markedly alter their ancestral composition.  I trust it is clear why this is so – it is the difference between misidentifying individual alleles vs. misidentifying haplotypes that cover significant portions of the genome.  The latter situation amplifies the error because the error constitutes such a large percentage of the ancestral calculation, while the former error is relatively minuscule.
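A back-of-the-envelope comparison makes the difference in scale concrete. The numbers below are round, hypothetical figures chosen only for illustration (a 600,000-SNP chip, an autosomal genome of roughly 3,500 cM, a single 30 cM block), and treating the ancestry estimate as a simple proportion of markers or of genome length is a crude simplification of what any real test does.

```python
def shift_from_snps(misassigned_snps, total_snps):
    """Fraction of the ancestry estimate affected by scattered SNP errors."""
    return misassigned_snps / total_snps


def shift_from_block(block_cm, genome_cm):
    """Fraction of the ancestry estimate affected by one misassigned block."""
    return block_cm / genome_cm


# All values are illustrative assumptions, not measurements from any test.
snp_shift = shift_from_snps(misassigned_snps=20, total_snps=600_000)
block_shift = shift_from_block(block_cm=30, genome_cm=3_500)

print(f"20 scattered SNP errors: {snp_shift:.4%} of the estimate")
print(f"one 30 cM block error:   {block_shift:.4%} of the estimate")
print(f"block error is about {block_shift / snp_shift:.0f} times larger")
```

Under these assumptions a single wrongly painted block moves the estimate by a couple of orders of magnitude more than a handful of scattered SNP errors, which is the amplification described above.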

So, I’d trust the data based on SNP frequencies more given equal parental population representation.  That doesn’t mean the SNP frequency data are correct – the company may have made errors in that as well – but we are talking about relative probabilities here.

In summary, 23andMe gets a C for people who have good parental population coverage, they get an F for those who do not, so the overall grade for 23andMe is a D.  And that’s not good. It’s terrible in fact. By comparison, evaluating DNAPrint’s test by today’s standards, it would be a D- or F, while by the standards of its own day it was maybe a C+ or B-. In a relative sense, 23andMe is far worse, and in a gross sense, it is at best only marginally better. It’s a disaster. I’m not impressed by DNATribes either – parental population coverage there is relatively good, but…STR analysis? An F for them.  They’ve announced they are going out of business – should they be upgraded to an A for that?  In my opinion, DNATribes is/was even worse than 23andMe, and we can only hope 23andMe follows DNATribes’ lead in closing up shop. And, I’ll give FamilyTree DNA an F – F for FBI.  Genetic privacy matters. Enough said about that.  What about other tests?

We can consider AncestryDNA, yet another substandard test.  If you look at their website, they make it sound like they have a really large number of parental populations, for example “see all regions” at their website. However, when the customers get their results, it is the same old story with the standard reference populations. True enough, the company will tell customers, in a qualitative sense, where the more specific place of origin of their majority ancestry most likely is, but that’s it.  The more specific subregions are not being used as reference (parental) populations, and they are not directly used in a quantitative sense to give the ancestry proportions.  The company’s website is therefore in my opinion highly misleading.

As a positive, they give error bars; however, the range they give is sometimes extremely broad.  Results can vary over a range of 10-20%, etc. That’s not very precise, and demonstrates why these tests cannot be used to determine exact cut-offs. A person “100% pure” may actually be, say, 85%, and a person “85%” may actually be 100%.

From an online forum about this company’s test:

thednageek says:

September 13, 2018 at 10:00 am

Thank you for the kind words. I’d love to hear how your new results compare to your tree. Northern Europeans seem to be quite happy with the new estimates; southern Europeans less so.

I can’t say I’m surprised.  Look at the reference populations. In general, the sample sizes for Southern Europe are smaller than those for Northern or Eastern Europe. Italy has the most at 1000, but that is less than France, “Germanic Europe,” England/Wales/”Northwestern Europe,” as well as “Eastern Europe”/Russia.  Population genetics studies have shown greater genetic heterogeneity in Southern Europe than the North. So, good coverage is particularly important in the South. Consider that the “movement” likes to tell us how Northern and Southern Italy are radically different, racially speaking. If that is so, then those regions should have their own separate reference populations. Or are they really similar? You can’t have it both ways. If Lombards and Sicilians are less similar than are Norwegians and Swedes to each other (the company has Norwegians and Swedes as separate reference populations), then the different Italian subgroups should have their own reference (parental) populations. On the other hand, if those Italian groups are so similar that a general “Italy” category is sufficient, then all the fetishists should stop foaming at the mouth over intra-Italian differences. Again, you can’t have it both ways. In general though, a test that distinguishes Norway from Sweden, and England from Scotland, should probably break apart places like Italy into subregions – which would be more honest given how they advertise the test on their website.

Actually, even some of those well represented regions have problems.  What is “Germanic Europe?” Why not Germany alone?  Why not separate North and South Germany?  Different regions of France?  Separate Russia from other Eastern European nations?  England vs. Wales vs. “Northwestern Europe?”  And Ireland and Scotland combined?  Why?  Now, as I have said, areas with greater genetic heterogeneity require more coverage, but, still, e.g., the English and Welsh are not identical and should not be lumped together as such.

I also read that the newest version of the test (like 23andMe) uses haplotypes rather than individual SNPs. If you do that, you MUST have excellent coverage for your reference (parental) populations.  An error in misidentifying an entire chromosome block is going to be a lot more damaging than getting scattered SNPs incorrect.  That amplifies the problem of insufficient reference population coverage and is another explanation why Southern European results have gotten worse after the change.

So, AncestryDNA gets a D/D+ for overall results, which would have been upgraded to D+ for giving error bars (however broad), but because they are (in my opinion) misleading customers as to what the reference base actually is and how detailed it is for subregions, they get downgraded to a D.

Now we will consider another terribly flawed and incompetent test – the National Geographic Geno 2.0 (Helix) test, which uses Next Generation Sequencing and is purported to be designed to look at “deep ancestry,” but which makes the error, consistent with other companies, of using extant, narrow, parental populations as proxies for “deep” ancestry, which is a major flaw.  Their “reference populations” are extremely limited (as usual – the typical “parental privilege”), the labels they give ancestral components are strange, and the website is reported by some customers to be difficult to use.  We will consider the various versions of the Geno tests, of which Helix is one.

Putting aside this person’s (somewhat dated) opinions of the tests (keeping in mind she derives from populations that may have better parental coverage – even at that time – than others), I find it interesting that a person who is predominantly of Northern European heritage has a substantial contribution of “Mediterranean” and “Southwest Asian” ancestral components as measured by one (older?) version of the National Geographic “deep ancestry” test.  Granted that there is an unknown component in her genealogical ancestry, still, I believe that these data – to the extent they are in any way meaningful – likely represent Neolithic (and perhaps Bronze Age) influences.  In other words, these components – including “Southwest Asian” – are a natural part of the European genepool, albeit represented to different degrees in different parts of Europe. Of course, I disagree with their “Mediterranean” category that lumps together genetically and historically disparate groups; however, in that case, it may represent a common thread (Neolithic?) of these groups, with the rest of the total ancestry of these groups being different. In any case, once again, we see the danger of taking labels literally, and also the problem of using current extant parental populations to represent ancient ancestral components.  

See this.  We note several things here.  There isn’t a good range of parental populations. We note that all European populations – including Northern European populations – are being represented as being composed of different ratios of Northern European, Mediterranean, and Southwest Asian ancestral components (with some populations having low levels of other ancestries).  Thus, different ethnies are represented as diagnostic ancestral components.  Also, some of these populations are considered by 23andMe as distinct, discrete “pure” populations but are here represented as mixes of various ancient ancestral components.  

Here is yet another (“next generation”) characterization of reference populations with their respective ancestral components.  We notice three crucially important things.  First, many of the populations are the same as in the original list (discussed above) but the ancestral components are different. The same populations, with the same gene sequences, are being represented differently with alternate sets of ancestral components (each component given descriptive labels by the company). Thus, how a population’s ancestral components are represented, and how those components are labeled, can change over time; differing between various versions of a test and of course varying between different companies’ tests. Second, again we see that European populations are composed of different components; they are all “admixed” to some degree based on the ancient components identified by the test. Third – and this applies to both versions of the National Geographic reference populations – what is considered mixes here would be considered “pure” in 23andMe, demonstrating how concepts of “purity” differ with what reference populations are used, how companies decide how to represent those populations, and what labels are used for description. Thus, in 23andMe, “European” includes “Greek/Balkan” as a category, as that is represented as part of their parental population base.  In theory, someone genetically similar to 23andMe’s Greek/Balkan reference population could be “100% Greek/Balkan” and hence “100% European” – while that same ancestry in the National Geographic test will be shown as a mix of different ancestries, mostly European but some non-European.  It’s the same gene sequences, the same ancestry, but interpreted in widely divergent ways by the companies and the tests. What one company labels “pure” another company – digging deeper in the ancestral mix – considers to be “admixed.”  It’s all relative, not something definitive and set in stone. 
There’s nuance and interpretation, shades of gray, not black and white.  And both Nutzi fetishists and Normie ignoramuses cannot understand this.

Ancestry results are not something that can be interpreted as absolutes; they are dependent upon methodology, parental populations, and the labels given to ancestral components, all of which determine whether the company is assaying more recent ancestry or “deeper” ancient ancestry.  The “purity” myth is on display here, since “100% pure” ancestries in one test will be represented as mixtures of components in a different test. Labels and interpretations are not the same as objective reality. And this is a crucially important point. The ancestral components themselves are certainly made up of mixtures of earlier population groups.  For example, with respect to the “Eastern European” component, which most likely reflects Slavic ancestry, the company states (emphasis added):

The large Eastern European component is typical for the region, and is itself a genetic composite of years of migration through the region.

So, again, this is something “movement” fetishists don’t understand – the ancestral components that they perceive as “pure” are themselves mixtures from earlier times, mixtures containing components that may well trace from outside of Europe.  That is the nature of human biological reality.  There is no “purity.”  Instead, there are greater or lesser degrees of genetic similarity and difference.

If “Eastern European” does in fact reflect a basic Slavic ancestry, and if these results can be trusted, then it is interesting that Balkan South Slavs like the Bulgarians are heavily Slavic, only a few percentage points less than Russians and Poles, and more than the Czechs, all groups typically considered “more Slavic” than are Balkan groups.  So, there may well be evidence for a common Slavic ethnoracial foundation for all these groups. Also note that Romanians are more “Southern European” than are Bulgarians, despite the fact that Romania is just to the north of Bulgaria, and based on simple gene flow you’d expect the results to be the opposite.  Maybe there is something to the idea that there is a significant “Latin” “Roman” component to the Romanian ethny in addition to Slavic and other elements.  What about “Diaspora Jewish?”  Described as a distinct category here (and in 23andMe more specifically as “Ashkenazi Jewish”), academic population genetics suggests that this is in actuality a combination of Middle Eastern and European genetics.  Once again we see a category that is either a single distinct “pure” ancestral component, or a mixed component, dependent on how it is analyzed and interpreted.

What about statistical significance?  Confidence levels?  Error bars? And, more fundamentally, what was the reason for changing the ancestral components between the different versions of the tests?  Whatever the reasons, there’s no explanation that I find satisfactory; the overall attitude of all these companies tends to be “trust us, we’re the experts,” and the customer base accepts that, with some grumbling from those more skeptical and better informed.  None of these companies provide the nuanced interpretations and more detailed explanations that I am providing with this post.

The National Geographic test does tell customers the two groups they are most similar to.  Fine, but not enough.  There needs to be a complete list, with quantitative measurements of genetic kinship.

In summary, although some of the ideas behind the National Geographic test are interesting, the test itself is as bad as 23andMe (or worse).  The basic problems are the same – lack of sufficient reference populations, lack of nuanced understanding of the meaning of the ancestral components, lack of real statistics, and the subjective labels given to ancestral components.  If we couple this to a bad website, lack of explanation, and changes between all the different versions of the test (without sufficient explanation), this test is lucky to get a D, and not a D-.

Then we have LivingDNA, which has a leftist anti-racist narrative behind its founding, and which has received some criticism from customers online (but, then, of course, all these companies have their share of dissatisfied customers).  The results from this company seem to be slightly more plausible than those generated by 23andMe, which isn’t saying much, but suffer from the same basic problem – individuals from ethnies likely not well represented in their parental database get skewed results.  I say “likely” because the company provides remarkably little information (that I can find) on their methodology and parental population database, but given the results they generate and given the general history of companies having weak representation of certain ethnies, it’s a fair bet that this company also exemplifies “parental privilege” for certain ethnies. So, basically, it is a really bad test, only slightly better (if that) than 23andMe and National Geographic.

They also exhibit the curious result that a person of 100% genealogical ancestry X turns out to be a mix of X, Y and Z – despite the fact that, e.g., Y and Z are known to be components of X. This is the same problem with all of these companies.  It may well be that X is not represented well in their parental database; hence, the problem.  That is more likely than the alternative, that the X person is really so much Y and Z that it presents in addition to the Y and Z inherent in X.  Of course, the companies explain none of this nuance to their customers.

Indeed, a major weakness of this company (besides their politics and the questionable results) is the relative lack of information they provide about the test itself, and about the results, to their customers.  On the one hand, it’s a weakness, but then, given that much of the information provided by other companies is questionable at best and bogus at worst, maybe being reticent is a positive.  Addition by subtraction, so to speak.

No surprise of course that results from this test can differ very markedly from those obtained from, for example, 23andMe. Who expects consistency, what with different methodologies, parental population databases, gaps in those databases, labels given to ancestral components, etc.?  Don’t expect careful statistical analysis either. We certainly can’t have that!

I note that they say results can be “refined” in the future as their database expands, a tacit admission that they do not presently have good coverage of certain ethnies.  That also emphasizes the impossibility of utilizing precise cut-offs, as the always-fuzzy boundaries are ever-shifting.

So, with all these weaknesses, balanced out by (possibly) marginally more plausible results than 23andMe, this company gets a “healthy” D+ for their efforts.  Really, I could have given them a D, but they seem to be relatively new, so I’ll be generous for now, and we’ll see if they improve or get worse (more likely).  I do not like their politics, but I’m not grading them on that.  I’ll expect them to ruin their test with “upgrades” the same as every other company; in that case, they would then get the D (or D-) they likely really deserve.

Getting back to inconsistency of results – as we can read in various online articles and blog posts, people who use multiple companies typically get markedly divergent results.  The main ancestry is usually similar but after that it all falls apart. Now, if the tests and their interpretations were all sound and consistent, how could that be possible?  The answer of course is that with different sets of narrowly defined parental populations with insufficient coverage and different ways of breaking down ancestral components and different approaches to labeling those components, of course the results will be different.  And, lacking sufficient information, as well as statistical information, how can we say one result is more accurate than another?  The only thing we can go on is how well the results match what academic population genetics data say about the ethny or ethnies making up a person’s genealogical ancestry.  If that’s the case, then why take the tests?  Just go to the published papers. And, laughably, the companies do not even give customers remotely similar calculations for percentages of Neanderthal ancestry. What is it?  Do they use different caveman reference populations?  One company uses Fred Flintstone and the other uses Barney Rubble?

The deCODEme site used to have a free, good (albeit qualitative) kinship comparison based on 23andMe data – ranking relatedness to global ethnic groups, arranged by continent – and those results seemed reasonable, but it seems to be no longer offered. The original 23andMe site used to have a more quantitative estimate of relatedness at the continental and sub-continental (e.g., Northern vs. Southern European) level, as well as a PCA plot, but unfortunately they did away with that in favor of material less politically relevant (or not relevant at all).

I suppose if someone has the money to try every testing service they could look into it, and try all the companies, for personal interest. Again, this essay is not meant to be a comprehensive analysis of every company; I may have missed a test that is particularly good or bad. This post is instead meant as a brief and cursory survey of some of the main current competitors in the field, coupled with some general commentary on the tests themselves.

In any case, I agree with Johnson here.  Past “Old World” admixture is part of the European genepool.  Certainly, we can always strive to improve the genetic situation (e.g., eugenics), but we are what we are. We have to look to the future, not the past.

Grades for (autosomal) ancestry testing companies:

23andMe: D

FamilyTree: F 

AncestryDNA: D

National Geographic: D

LivingDNA: D+

DNA Tribes: F

Others are not worth mentioning or I have insufficient data.

The pattern is of very low grades, reflective of the reality that the overall state of current commercially available ancestry testing is poor.  And just as the companies claim that the data they present to their customers may change and become more “refined” with more parental population coverage, so may the grades I give these companies change (likely for the worse, given their poor performance heretofore) and become more refined with more data as well.  So, expect grade updates in the future.  Also, new companies may come into existence and those may be evaluated as well.

The most urgent need is proper parental (or “reference”) population coverage.  No more “parental privilege” affirmative action for some groups and not for others.  Either add more parental population coverage or have the integrity only to offer the tests to customers who match the reference profiles.  Otherwise, it is all a misleading fraud.

In addition, these tests need to be interpreted in a relative (e.g., greater or lesser degrees of different ancestral components comparatively speaking) rather than an absolute (e.g., definitive results with hard cutoffs, concerns about “purity”) fashion.  Given the realities of uncertainty and methodology, even a good test would need to be interpreted in such a fashion, much less the mediocrity we have to deal with. Of course, Nutzis will remain incapable of understanding any of this.

Really, what is needed is genetic kinship assays on all populations, comparing individuals and populations to each other, but I suppose such a biopolitically relevant metric is nothing we should expect any time soon (or ever).

One could argue that ancestry testing as it exists today could be, at best, an amusing personal hobby for individuals, if it wasn’t being politicized by actors on both the Retard Right (see quotes at the beginning of this post) and the Loony Left (LivingDNA’s anti-racialist agenda, deCODES’s “gotcha” of Watson, and the Cobb setup debacle).  But we live in an age where everything is politicized, for better or worse. In that case, we had better focus on genetic kinship, which is politically relevant with respect to EGI.

But instead we’ll have more juvenile ignorant blustering from entitled Nutzis basking in their “parental privilege” affirmative action ancestry results.

Needless to say, I was very, very surprised with the results of my DNAPrint “geographic ancestry” test results when I received it, and it showed a 21% East Asian content and 79% European instead of a 100% European which I had expected. In discussing this with AncestrybyDNA lab personnel I have learned that surprisingly to them some other PA Germans tested have had similar significant high teen, low 20’s% East Asian content results. At present they have no clear explanation as to why.

The “clear explanation” seems obvious in retrospect. Compared to the Romneyite parental population for “European,” some Germans would appear to be 4/5 Romney and 1/5 Chairman Mao.  If the parental population had been “PA Germans” then all those folks would have been “100% European.”

I’ll say it again for the mentally slow: The results of ancestral component testing are going to absolutely and directly depend on the choice of parental populations.

I need to summarize the whole “parental privilege” problem for the Nutzi crowd.  I’ll try to make it as simple as possible.  Let’s consider it first in outline form.

1. A company defines a particular ancestral component as “European.”

2. The reason for that label is that the ancestral component is defined by a parental (or reference) population (or populations) that is European.

3. But why does the company label a particular parental/reference population as European?  Well, it is because the population is historically tied to Europe, it derives from a nation or region within the boundaries of Europe, the population came into existence as a distinct group within Europe.  All of which essentially matches much of what I define as an indigenous population.

4. Very good.  So an ancestral component is European if it is derived from, or defined by, or represented by, a population that is European. European populations tend to possess ancestral components that are “European” because those components are defined from an analysis of European populations.

This is saved from being circular reasoning by the fact that the initial definition of a population as European is not based on the ancestral components (that are themselves defined as European because they come from populations labeled as European), but instead because of the historical existence of the population within Europe, as an indigenous population of Europe, so defined.

5. OK.  But, if population groups A, B, C, and D are all historically European ethnies, if they all historically exist and existed within specific regions of Europe, then why should A and B be among the parental populations that define European ancestral components, and not C and D?  There is no reason to privilege A and B over C and D.  The only practical reason is that the company simply doesn’t have any, or enough, samples from C and D, while they have many samples from A and B.

6. Because of this deficit of C and D, and presence of A and B, individuals of ethnic background A and B, or ethnies very similar to A and B, are essentially being compared to themselves in the test.  If A and B define the ancestral components of “European,” and your ancestry is A and/or B (or something similar), it stands to reason you will test out as being close to, or at, “100% European,” with the subpopulation being A and/or B. Again, you are essentially being compared to yourself.  

On the other hand, individuals from C and D are being compared against a standard defined by A and B.  So, individuals from C and D will be represented as “mostly A and/or B” but with some “E and F” – with “E and F” being ancestral components labeled as from other, non-European, populations that happen to be well-defined in the parental population database.

7. On the other hand, if C and D were included as parental populations, then their ancestral components would be included as “European” and results for individuals of C and D ancestry would be similarly “European” as for A and B, with the subpopulations in this case being C and/or D.
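Points 5 through 7 can be illustrated with a toy calculation. What follows is only a minimal sketch under loudly stated assumptions: the allele frequencies are made-up numbers, the population names A, C, and E are the placeholders from the outline, and the brute-force least-squares fit (the hypothetical `estimate_ancestry` function) is a stand-in for illustration, not the method any company actually uses. Population C is constructed to sit near A on a cline toward E, without being an actual mixture of the two.

```python
# Toy illustration of reference-panel dependence in ancestry estimates.
# All frequencies are hypothetical; nothing here is real genetic data.

# Allele frequencies at six loci for three placeholder populations.
FREQS = {
    "A": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60],  # European, in the panel
    "C": [0.21, 0.23, 0.36, 0.36, 0.49, 0.54],  # European, often missing
    "E": [0.90, 0.80, 0.70, 0.20, 0.10, 0.30],  # non-European, in the panel
}


def compositions(total, parts):
    """Yield every way to split `total` into `parts` non-negative integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def estimate_ancestry(sample, panel, steps=100):
    """Grid-search the mixture weights (summing to 1) over the panel
    populations that best reproduce the sample, by least squares."""
    names = sorted(panel)
    best_weights, best_error = None, float("inf")
    for counts in compositions(steps, len(names)):
        weights = [c / steps for c in counts]
        error = 0.0
        for locus, observed in enumerate(sample):
            predicted = sum(w * panel[n][locus] for w, n in zip(weights, names))
            error += (predicted - observed) ** 2
        if error < best_error:
            best_weights, best_error = weights, error
    return dict(zip(names, best_weights))


# Simplification: the "individual" is represented by population C's
# expected frequencies rather than by sampled genotypes.
sample = FREQS["C"]

# Panel without C: the sample must be explained using A and E only.
narrow = estimate_ancestry(sample, {"A": FREQS["A"], "E": FREQS["E"]})

# Panel that includes C: the sample can be matched against itself.
full = estimate_ancestry(sample, FREQS)

print("panel A+E:  ", narrow)
print("panel A+C+E:", full)
```

With these made-up numbers, the panel lacking C assigns the sample roughly 89% A and 11% E, while the panel that includes C assigns it 100% C. The sample is identical in both runs; only the choice of parental populations differs.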

And in the rare cases in which testing companies decide to be honest, they admit the reality of “parental privilege” – although of course they do not term it as such.  Thus, we read:

On the old deCODEme site (login was required, so no URL available), the following was admitted (emphasis added):

The reference population samples were obtained from the HapMap project – they are:

1) European Americans from Utah – who most likely have a majority of north European ancestry

2) Yoruban Nigerians

3) Chinese from Beijing and Japanese from Tokyo.

The characteristics of these reference population samples and the clinal nature of human genetic variation (i.e. the fact that people typically become gradually more different as you travel further from your country) have several minor implications for the interpretation of the results. For example, a deCODEme user with a majority of ancestors (during the past >2 generations) from south-east Europe, will typically see higher percentages of African and Asian ancestry than a deCODEme user whose ancestry is mainly from north-west Europe. The difference will be small, but present.

So, deCODEme at least had the honesty to admit that populations not represented in the parentals would exhibit artefactual “admixture” due to clinal differences in gene frequencies.  As to what level of difference is “small” they do not say, but keep in mind that another company was stating that close to 9% “admixture” was close to the levels of statistical significance.

Here’s the response from our scientist who developed the algorithm underlying ancestry painting: “There’s no case that I’ve seen where 9% Asian ancestry does not indicate genuine East Asian or Native American ancestry. I’ve looked at order thousands of individuals of known ancestry, that approximately cover the gamut of human diversity. Thus I would regard 9% as a reliable indication of East Asian or Native American ancestry. That said, 9% is close to the threshold above which the following statement can be made, so it is still theoretically possible, albeit very unlikely, that the prediction is not true.”

If that is so, and then you add to that the extra uncertainty due to “parental privilege,” what are we talking about here as potential error for non-privileged populations?  10%?  15%? More?  In some cases that falls within the error bars provided by companies like AncestryDNA!

Now, of course, there really is some (modern, historical) admixture in Europe, higher in some regions than in others.  But the amount of real admixture is much lower than what one would suppose from looking at commercially available ancestry testing that inflates admixture for the reasons explained above – an inflation that, by some happy coincidence, just so happens to be compatible with the leftist political views of the companies, their founders, and their employees.

While single locus markers are absolutely useless on an individual basis, they do have some utility for populations, with results averaged out over large sample sizes. Such data suggest that real admixture in Europe tops out at about 5%.  And much of that is non-European Caucasian or Central Asian.  More divergent sub-Saharan African or East Asian admixture is going to be significantly less than 5%.

So, in the end, the real reason why something like the post linked here is essentially correct is that the typical “movement” activist is too stupid to understand all of the points made in my post that you are currently reading here at EGI Notes. Even when the companies themselves admit that “parental privilege” is real, even when the companies admit the fairly large statistical error, and even when confronted with the obvious logic that someone essentially compared to themselves is going to be, by necessity, ”pure,” the Nutzi retards still won’t get it.  Or, maybe it is not that they are too stupid, but that they lack the incentive. After all, those who benefit from affirmative action rarely criticize the program; the same applies to “parental privilege.”  Let some testing company start using, say, Sardinians as the reference population to define “European,” and all the Nutzis suddenly start getting “exotic mixture,” and I’m sure they’ll all cry bloody murder.  All of a sudden, everything written here, and all the open admissions of the companies themselves, will become crystal clear and acceptable.