Ancestral Graphics

A more visual explanation.

Let us explore some of the ideas broached here in a simplified visual manner, to make the basic concepts understandable even to the Nutzi Type I crowd. Note that everything below is obviously highly simplified, so as to make the concepts clear to “movement” “activists.”

Also note that the first graphic uses, again for the sake of simplicity, a one-dimensional continuum, as opposed to the two-dimensional PCA plots used in many population genetics studies (and true biological reality is multi-dimensional, more complex than any PCA plot).  It shows clinal genetic variation.  Blue and green are European populations, while purple and yellow are non-European.  The other X’s are other populations that lie along the continuum of genetic variation. The red and orange-brown X’s represent populations even more genetically distant from Europeans than are the purple and yellow; these are presented for the sake of illustrating clinal variation and will not be relevant to the following analysis.


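[Graphic: a row of X's, each a population along a one-dimensional genetic continuum; the blue, green, purple, yellow, red, and orange-brown populations are shown as colored X's.]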

A company calculates ancestry based on SNP gene frequencies, and chooses purple, blue, and yellow as parental populations (we can assume red and orange-brown are chosen also, but, again, for the sake of simplicity, we will not discuss those populations). Thus, the  company chooses blue as a population representing “European.”  Green is not a parental population for this company.

So, a green individual (i.e., someone of “green” ancestry), represented by the purple-blue-yellow parental populations, might be, say, 85-90% blue and 10-15% yellow.  Blue individuals, and individuals from the X’s adjacent to blue, would test out as close to 100% blue (European). What if green were chosen as a European parental population instead of blue?  Then green individuals (and persons from related groups) would be close to 100% green (European), and blue individuals might show significant fractions of purple.  Of course, including both blue and green as parental populations would be best.
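The effect of the parental-population choice can be sketched numerically. This is a toy model only, not how any company computes ancestry: each population is given a hypothetical position on the one-dimensional continuum (the numbers 0, 4, 5, and 12 are assumptions chosen for illustration), and an individual is expressed as a mix of the two parental populations that bracket that position.

```python
def admixture_on_cline(individual, parentals):
    """Express a position on a 1-D cline as a mix of the two parental
    populations that bracket it (simple linear interpolation)."""
    ordered = sorted(parentals.items(), key=lambda kv: kv[1])
    # Beyond either end of the parental range: 100% the nearest parental.
    if individual <= ordered[0][1]:
        return {ordered[0][0]: 1.0}
    if individual >= ordered[-1][1]:
        return {ordered[-1][0]: 1.0}
    for (left, lpos), (right, rpos) in zip(ordered, ordered[1:]):
        if lpos <= individual <= rpos:
            w_right = (individual - lpos) / (rpos - lpos)
            return {left: 1 - w_right, right: w_right}


# Hypothetical positions on the continuum (assumed, for illustration only).
POSITIONS = {"purple": 0, "blue": 4, "green": 5, "yellow": 12}


def pick(*names):
    """Select a subset of populations to serve as the parental panel."""
    return {name: POSITIONS[name] for name in names}


# Panel without green: the green individual looks "admixed".
print(admixture_on_cline(POSITIONS["green"], pick("purple", "blue", "yellow")))
# Panel without blue: now the blue individual looks "admixed".
print(admixture_on_cline(POSITIONS["blue"], pick("purple", "green", "yellow")))
```

With these assumed positions, the green individual comes out 87.5% blue and 12.5% yellow under the purple-blue-yellow panel, while the blue individual comes out roughly 80% green and 20% purple under the purple-green-yellow panel. Nothing about either individual changed; only the panel did.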


Given that a company (deCODE back when they were offering their own ancestry test) openly admitted that clinal genetic variation coupled with a limited set of parental populations could result in artefactual “admixture,” the above analysis, however simplified, is a reflection of the reality of these tests. The more similar someone is to the parental populations, the greater the probability of getting high percentage (i.e., close to 100%) matches to their actual ancestry. The more distant, the lower the probability.  The finer the level of distinction required, the greater the need for more parental populations.  At the level that these companies purport to assay, at racial, subracial, and ethnic levels, of course you will need a very broad array of parental populations, which they do not have. And, yes, of course they know this.  After all, why do they occasionally add more parental populations to their limited databases?  If it really didn’t matter, we could just go back to the days of DNAPrint and use CEU Romneyites from Utah as “European” and not bother with anything else.  But, alas, then Germans would start getting “East Asian admixture” and we can’t have that.

Apologists for ancestry testing companies would argue that some of those companies use chromosome blocks (haplotypes) to make their ancestry estimates, rather than just SNP gene frequencies.  As I wrote in the above-linked post, this is even worse.  Let’s consider what can go wrong here, again using a simplified example suitable for brain-addled Nutzi freaks.

Let’s assume an individual from the green ethny is tested via the haplotype/chromosome block method, using the same blue-purple-yellow parental populations.

At the most conservative, highest confidence level of 90% (that is still less than the 95% typically used in scientific publications, although there is obviously subjectivity on where to draw the line), this person gets 58% blue, 40% unassigned (black), and 2% yellow (that can be real or artefactual).  That can be crudely represented as follows (with a single continuum representing all the chromosomes for the sake of simplicity):

[Graphic: a single bar representing all chromosomes, colored 58% blue, 40% black (unassigned), and 2% yellow.]

But, at the pathetically comical 50% confidence level (flip a coin!), the 40% that was unassigned at 90% becomes: 3% still unassigned, 27% looks a bit more blue than yellow and so is assigned to blue, and 10% looks a bit more yellow than blue and so is assigned to yellow.

Now the person is 85% blue, 12% yellow, and 3% unassigned.  Again we assume blue = European, and yellow is some non-European group.  That’s crudely shown here:
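The arithmetic of that reassignment can be sketched as follows. The segment shares come from the example above; the per-segment probabilities are hypothetical values picked only so that the segments fall on the right side of each threshold, and the "strictly greater than the threshold" rule is an assumption about how such a cutoff might be applied.

```python
from collections import Counter

# (share of genome in %, probability per parental population).
# Shares follow the example in the text; probabilities are hypothetical.
SEGMENTS = [
    (58, {"blue": 0.95, "yellow": 0.05}),  # confidently blue
    (2,  {"blue": 0.08, "yellow": 0.92}),  # confidently yellow
    (27, {"blue": 0.60, "yellow": 0.40}),  # a bit more blue than yellow
    (10, {"blue": 0.45, "yellow": 0.55}),  # a bit more yellow than blue
    (3,  {"blue": 0.50, "yellow": 0.50}),  # a true coin flip
]


def paint(segments, threshold):
    """Assign each segment to its most probable population if that
    probability exceeds the threshold; otherwise leave it unassigned."""
    totals = Counter()
    for share, probs in segments:
        best, p = max(probs.items(), key=lambda kv: kv[1])
        label = best if p > threshold else "unassigned"
        totals[label] += share
    return dict(totals)


print(paint(SEGMENTS, 0.9))  # conservative setting
print(paint(SEGMENTS, 0.5))  # default-style setting
```

At the 0.9 threshold this gives 58% blue, 2% yellow, and 40% unassigned; at 0.5 it gives 85% blue, 12% yellow, and 3% unassigned. The underlying data are identical in both runs.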

[Graphic: a single bar representing all chromosomes at 50% confidence, colored 85% blue, 12% yellow, and 3% black (unassigned).]

The company reports its results at the 50% confidence level by default, and the Nutzis start their heavy-breathing excitement.  Admixture!

But what if the parental populations were purple, green, and yellow?  Then all of the above would hold, but substituting green for blue, and purple for yellow.  Here, green = European, and purple and yellow = non-European; the green individual would now be 98-100% green and 0-2% yellow.


[Graphic: a single bar for the green individual, colored 98-100% green and 0-2% yellow.]


This green individual would have little unassigned even at 90%, while the blue individual would now exhibit the same problems the green individual had before (albeit with different color combinations).  So, at 50% we would have the following for a blue individual:


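[Graphic: a single bar for the blue individual at 50% confidence, showing the same kind of mixed assignment the green individual had before, in different colors.]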


Now, at this point, the Nutzis will be screaming about how “crazy” and “wrong” the test is.  


By the way, if there is any question about the validity of the haplotype discussion in this post, see these admissions from the heroes at 23andMe (spun in a manner to make them look less culpable).  The highlights:

Each prediction is also linked to our confidence that the call is correct. By default, Ancestry Composition requires that our confidence in a prediction be greater than 50%.

Two points.  First, “greater than 50%” could be 51% or 50.1% or 52.25% or whatever measure (e.g., “55%” – see their Japanese example immediately below) slightly greater than the probability of a coin flip.  Second, even with that, the “calls” are based on what their parental population database is.  The “correct call” here does not really mean matching biological reality; instead, it merely means “correct” within the confines of the test’s parameters.

For example, if a segment of your DNA has a 55 percent chance of being Japanese, then that segment will be painted as Japanese at the 50 percent confidence level, but it will be painted with a more broad ancestry…at the 60 to 90 percent confidence levels.

Exactly what I have been writing all along about this.  And, of course, the individual in question might not be Japanese or part Japanese.  Perhaps this is someone of an Asian ethny not part of their parental population database, so the company is trying to assign chromosomal fragments based on the fragments’ relative similarity to those of ethnies that are part of their database.  If the actual population were included as a reference, then this would not be necessary (at least not to this extent).
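The quoted Japanese example amounts to a fallback rule: paint the most specific label whose probability clears the threshold, otherwise fall back to a broader one. A minimal sketch of such a rule follows. The 0.55 figure is from the quote; the "broader category" label and its 0.95 probability are hypothetical, and this is not presented as the company's actual algorithm.

```python
def paint_segment(levels, threshold):
    """Return the most specific label whose probability exceeds the threshold.

    `levels` is ordered from most specific to broadest.
    """
    for label, prob in levels:
        if prob > threshold:
            return label
    return "unassigned"


# 0.55 comes from the quoted example; the broader entry is assumed.
SEGMENT = [("Japanese", 0.55), ("broader category", 0.95)]

for threshold in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(threshold, paint_segment(SEGMENT, threshold))
```

The segment is painted "Japanese" only at the 0.5 threshold; at every stricter setting it falls back to the broader label, even though the DNA itself is the same in every case.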


And this also demonstrates why the haplotype/chromosome block (“chromosome painting”) method is even more sensitive to test parameters than is the more general SNP frequency method, particularly at low confidence levels.  A shift in probability from 49% to 51% can result in an entire chunk of the genome being reassigned to a different ancestry at the 50% confidence level, and that subtle shift could result from differential representation of parental populations.


Default reporting of such low confidence levels is ludicrous.
