TL;DR: Ethics aside, embryo selection doesn’t work. Genes are not a crystal ball for predicting individual outcomes (and never will be).
Like many people my age, I can’t think about genetically engineering babies without recalling the 1997 sci-fi movie Gattaca, where Ethan Hawke’s “naturally” conceived character suffers job discrimination because he’s considered genetically inferior.
While Gattaca came out well before the completion of the Human Genome Project in 2003, it foreshadowed a future that has already arrived. Several companies are now offering screening that lets “parents pursuing IVF see and understand the complete genetic profile of each of their embryos.”
Yep, this is real. (https://mynucleus.com/embryo)
Intriguing. So why wouldn’t we want to use science to select a healthier embryo?
Pre-implantation genetic testing of embryos to screen for genetic disorders has been happening for years, but these new screening tests are different. Why? Diseases caused by a single-gene mutation, like Huntington’s or cystic fibrosis, are not the norm. Most health or behavioral outcomes are “polygenic”, meaning their likelihood is influenced by the combination of small effects from hundreds or even thousands of genes. This means that you can’t screen for just one gene to avoid a bad outcome.
These thousands of small genetic effects can be combined into a “Polygenic Risk Score” that summarizes how a person’s genes correlate with the risk of a disease or other outcome of interest, such as height. The distribution of a polygenic score looks like a bell curve, so most people have “average” risk:
Source: https://polygenicscores.org/explained/
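For readers who like to see the arithmetic, here is a minimal sketch of how such a score is put together: a weighted sum of how many copies of each variant a person carries. Everything in it is made up for illustration (the number of variants, the effect sizes, the 50% allele frequency, and the function names); real weights come from genome-wide association studies.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_VARIANTS = 500   # hypothetical number of variants in the score
N_PEOPLE = 1000    # hypothetical sample size

# Hypothetical per-variant effect sizes; real ones are estimated in a GWAS.
effect_sizes = [random.gauss(0, 0.01) for _ in range(N_VARIANTS)]


def random_genotype():
    """Copies (0, 1, or 2) of each variant, assuming a 50% allele frequency."""
    return random.choices([0, 1, 2], weights=[1, 2, 1], k=N_VARIANTS)


def polygenic_score(genotype, weights):
    """Weighted sum: each variant's copy count times its small effect."""
    return sum(copies * weight for copies, weight in zip(genotype, weights))


scores = [polygenic_score(random_genotype(), effect_sizes) for _ in range(N_PEOPLE)]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)
share_near_average = sum(abs(s - mean) <= sd for s in scores) / N_PEOPLE

print(f"Share of people within one SD of the mean score: {share_near_average:.2f}")
```

Because the score adds up many small independent pieces, the simulated scores pile up around the middle, which is where the bell curve comes from.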
But unlike single-gene diseases like cystic fibrosis, having a higher polygenic risk score doesn’t cause a disease; it just means your combination of genes is associated with a slightly higher probability of the outcome. So most genetic “effects” are probabilistic, not deterministic.
Polygenic Scores Are Like Risk Scores for Home Insurance: Useful for Groups, Noisy for Individuals
Imagine you are a home insurance company. You can’t know exactly which home will catch fire, flood, or get hit by a tree; the world is not that predictable. But you do have data that can inform which homes are more likely to suffer major damage over time. For the company to stay financially afloat, it must balance risk across all its insured homes. Calculating a risk score based on factors such as the geography of flooding or whether the house has a fire sprinkler system helps a company price its policies and balance risk across a large portfolio of insured homes. The insurer is not predicting individual disasters, but managing overall population-level risk.
Polygenic scores are like these home risk scores. A high polygenic score doesn’t guarantee disease, just like living in a fire-prone area doesn’t mean your house will burn down.
Coming back to embryo selection, perhaps you can start to see the problem...
Instead of insuring 10,000 houses, a couple may select one or two embryos from 5-10 to implant. Like insurance risk scores, polygenic scores are correlated with the outcome at the population level, but the scores are poor predictors for individuals. For me this idea is best visualized with scatterplots:
Height is one of the most genetically predictable outcomes, but even for height, a polygenic score that combines thousands of genes accounts for only around 35-40% of the variability in height outcomes (like the middle panel above). For traits like cognitive ability, this predictive ability is even lower (<15%, right panel above). Compared to a perfect correlation (far left panel), these weaker correlations mean that the outcome for any individual (one dot) is only loosely tied to their polygenic score. The same score can lead to many different outcomes.
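To put rough numbers on “loosely tied,” here is a small simulation sketch. It assumes a simple model that is my own simplification, not taken from any of the studies cited here: scores and outcomes are bell-shaped, measured in standard deviation (SD) units, and the score explains a fixed share (r2) of the variation in the outcome. The function names and the 1 SD cutoff are hypothetical choices for illustration.

```python
import math
import random


def simulate(r2, n=20000, seed=1):
    """Return (score, outcome) pairs in SD units; the score explains r2 of outcome variance."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        score = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)  # everything the score does not capture
        outcome = math.sqrt(r2) * score + math.sqrt(1 - r2) * noise
        pairs.append((score, outcome))
    return pairs


def leftover_spread(r2):
    """SD of outcomes among people with the same score, as a share of the overall SD."""
    return math.sqrt(1 - r2)


def share_high_score_low_outcome(pairs):
    """Among people scoring more than 1 SD above the mean, share with a below-average outcome."""
    high = [outcome for score, outcome in pairs if score > 1]
    return sum(outcome < 0 for outcome in high) / len(high)


for r2 in (0.40, 0.15):  # roughly the height-like and cognition-like cases
    pairs = simulate(r2)
    print(
        f"r2={r2:.2f}  leftover spread={leftover_spread(r2):.2f}  "
        f"high scorers with below-average outcome={share_high_score_low_outcome(pairs):.2f}"
    )
```

Under this model, the spread of outcomes among people with the same score is the square root of (1 - r2) times the overall spread. Even when a score explains 40% of the variation, that leaves about 77% of the original spread, which is why the dots form a cloud.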
Let’s look at a real-world example. These data are from a national sample of US adults and show the association between a polygenic score for educational attainment (x-axis) and the number of years a person actually went to school (y-axis). Each green dot is one person in the sample.
Source: Okbay, A., Wu, Y., Wang, N. et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat Genet 54, 437–449 (2022).
Here we see that on average at the population-level, a higher polygenic score is correlated with more years of education, as indicated by the upward-sloping regression line. However, the correlation is very weak, meaning the scatterplot looks like a cloud rather than being tightly clustered around the line.
This picture brings home the point about individual prediction. If you look along the horizontal (x) axis, knowing an individual’s polygenic score tells you very little about their actual level of education (y-axis). For someone with a polygenic score one standard deviation below the mean (quite low), the completed years of education range from a low of 7 years to a high of 22 years, and everything in between. What’s more, many people with “high” polygenic scores for education end up with lower education than people on the low end of the polygenic scale.
This is the essence of why polygenic prediction for embryo selection is largely a waste of money (and not a big risk for advancing eugenics). Even for the most strongly determined traits, like height, you are very likely to get an outcome that is contrary to what you are trying to achieve. Genes are simply not deterministic enough to engineer the outcomes we want (and likely never will be).
A worked example for height
Karavani and colleagues worked out the math of embryo selection for traits like height. Assuming ten tested embryos, they found an average gain of 2.5 cm (slightly less than an inch) from choosing the embryo with the highest polygenic score for height compared to a random embryo. But remember, this represents the average gain in a large sample of couples making this choice (like the average increase in education with higher polygenic scores along the regression line above), and there is a lot of variability around this “expected” gain. The authors even looked at genetic and height data from real families with lots of kids (all now adults). They found:
The offspring with the highest polygenic score (red squares) among the siblings was the tallest in only 7 of the 28 families.
Across all families, the tallest child was on average roughly 3.0 cm taller than the child with the tallest predicted height.
An Analysis of Selection for Height in 28 Real Families with up to 20 Adult Offspring Each. Figure 5 from E. Karavani, O. Zuk, D. Zeevi, N. Barzilai, N. C. Stefanis, A. Hatzimanolis, et al. Screening Human Embryos for Polygenic Traits Has Limited Utility, Cell 2019.
In panel A above you can see the variability around the average expected “gain” in height by choosing the highest scoring embryo, depending on the number of embryos. The dots represent real data from the large families. Choosing the highest scoring embryo leads to minimal or even lower than expected height gains in many cases.
Panel B shows the variability in height among offspring from the same family. The red square corresponds to the offspring with the highest polygenic score for height. You can see for example that:
In Family 1, the highest scorer is also the tallest offspring.
In Family 2, the highest scorer is the shortest of the offspring.
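The gap between the average gain and what any one family gets can be sketched with a toy simulation. To be clear, this is not the model or the data from Karavani and colleagues. The numbers are illustrative assumptions of mine: a height SD of 6 cm, a score that explains 25% of the variation in height, and half of both the score variation and the remaining variation falling between siblings in the same family.

```python
import math
import random

TRAIT_SD_CM = 6.0    # assumed SD of adult height, in cm
R2 = 0.25            # assumed share of height variance explained by the score
N_EMBRYOS = 10
N_FAMILIES = 20000   # simulated couples


def simulate_family(rng):
    """Return (score, height deviation in cm) for each embryo of one family.

    Only differences between siblings are simulated. Assumption: half of the
    score variance and half of the remaining variance lie within families.
    """
    embryos = []
    for _ in range(N_EMBRYOS):
        score_part = rng.gauss(0, math.sqrt(R2 / 2))
        other_part = rng.gauss(0, math.sqrt((1 - R2) / 2))  # not captured by the score
        embryos.append((score_part, TRAIT_SD_CM * (score_part + other_part)))
    return embryos


rng = random.Random(42)
gains = []
top_is_tallest = 0
top_below_family_average = 0

for _ in range(N_FAMILIES):
    embryos = simulate_family(rng)
    heights = [height for _score, height in embryos]
    family_average = sum(heights) / N_EMBRYOS
    chosen_height = max(embryos, key=lambda embryo: embryo[0])[1]  # highest score
    gains.append(chosen_height - family_average)
    top_is_tallest += chosen_height == max(heights)
    top_below_family_average += chosen_height < family_average

mean_gain = sum(gains) / N_FAMILIES
share_top_is_tallest = top_is_tallest / N_FAMILIES
share_top_below_average = top_below_family_average / N_FAMILIES

print(f"Average gain over the family average: {mean_gain:.1f} cm")
print(f"Top scorer is also the tallest: {share_top_is_tallest:.0%} of families")
print(f"Top scorer is below the family average: {share_top_below_average:.0%} of families")
```

The point of the sketch is the last two lines of output: an average gain across thousands of simulated couples can sit alongside many individual families in which the top-scoring embryo is not the tallest, or is even shorter than its siblings’ average.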
Let’s recall that height is one of the most genetically predictable traits that we have. The predictive ability of polygenic scores for other “complex” traits, such as cognitive ability or chronic diseases like heart disease or depression is much lower (imagine big clouds for scatterplots like the example above). This means that selecting embryos based on risk scores for these outcomes is unlikely to result in better outcomes than choosing randomly.
But maybe we will get better at genetic prediction?
Not likely. Statistical geneticists have estimated “upper bounds” for how good a genetic predictor might be, and we’re approaching this actual limit for most outcomes. At its core, this poor predictability is not a failure of the science of genetics, but a measure of the unpredictability and complexity of human lives. Like geography and house fires, genes are associated with many behavioral traits and health outcomes, but only a little bit. The rest of the variation among people is explained by the accumulated experiences of our bodies interacting with the world around us, and a lot of randomness (good or bad luck).
This may seem disappointing for those who hoped that mapping the human genome would give us the codebook for preventing all disease. But it’s actually very good news that genes are not as deterministic as we may have imagined. For our species to have survived and thrived for this long, our genome had to be enormously plastic and adaptive to the environment when expressing itself.
So for me, the best argument against the utility of polygenic scores for individual prediction (including embryo selection) is that they don’t achieve what they are advertised to do. Statements like this are blatantly false:
Since most families won’t have an infinite number of repeated IVF cycles, the opportunities to achieve even the “average” gain from choosing the highest score for any trait are extremely limited. Paying for polygenic testing for embryo selection is at best a big waste of money for the illusion of genetic control.
For many, the idea of embryo selection raises serious ethical concerns. Wealthy people paying to implant embryos with the “best” genes sounds way too close to dystopian Gattaca or to historical eugenics horrors. While these ethical questions are hugely important, I take comfort in the fact that polygenic scores would be largely useless even in the hands of eugenicists.
Even in a Gattaca-esque future where every baby born has been pre-screened as an embryo, it would still be very hard to dramatically change the average level of cognitive ability in the population based on polygenic scores. Again, this comes down to the inherently low genetic predictability of complex traits. Like predicting the weather, there is a limit to how good we can get when high levels of complexity and randomness are at play.
But even if genetic prediction were more accurate, here are a couple more reasons why embryo selection using polygenic scores is a bad idea:
Most genes are associated with many different outcomes (known as pleiotropy).
Selecting for one trait necessarily means tinkering with others in unintended ways. For example, selecting an embryo with the highest score for educational attainment also means a higher risk score for bipolar disorder, anorexia, and schizophrenia. While the companies advertising these services claim you can get a “holistic picture” of the genetic profiles of your embryos for many different traits, biology is way more complex. There is no scientific way to “optimize” the combination of different polygenic scores to achieve a healthier or smarter baby.
When I think about people selecting embryos based on something like height, I also cringe on a personal level. I am a little over 5’8” and definitely enjoy being on the tall side for a woman. My daughter, the most amazing human ever, is 5’1” (5’ 1 and a half if you ask her). Genetic shuffling from parent to child is a wondrous thing, and choosing an embryo on a single feature like height (if it worked well) could mean trading off some other unknown but wonderful trait. Imagine a score that selects against this perfect human?
Polygenic scores are not fixed biological quantities; they are context-specific.
Polygenic scores capture the observed association between genes and a certain outcome at a particular time and place.
Especially for cognitive and behavioral traits, genes must work through the environment to manifest as a specific outcome. The best example I’ve heard for this intuition was from a talk by Princeton sociogenomics researcher Sam Trejo which went something like this:
Imagine, 75 years ago, a genetic variant that increased the propensity for a baby to put things in their mouth. Because there was a lot of lead paint in houses and toys at this time, a proclivity to suck on things would likely be associated with lower cognitive ability in adults. This gene has nothing to do with cognition in a direct way, but only indirectly through this very specific environmental exposure. But the genome-wide association studies (GWAS) that we use to calculate polygenic scores can’t make this distinction; they just see that some genetic variant is statistically associated with the outcome, no matter the mechanism.
For babies born today, this same lead-paint eating gene would not confer a risk of lower cognitive ability (because there is no lead around). So, polygenic scores constructed from older populations today (which is the norm) will not necessarily be accurate for people born today, because their genes are interacting with a very different world.
Indeed, we see lots of examples of polygenic scores being less predictive of an outcome when tried in a different population, time, or place. This variation underscores how important the environment is for translating any genetic propensities into real outcomes.
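The lead-paint thought experiment is easy to mimic in a toy simulation. All the numbers below are invented for illustration (how common the variant is, how often babies mouth objects, how much lead exposure lowers a test score), and the function name is hypothetical. The only thing that changes between the two runs is how much lead is in the environment.

```python
import random


def carrier_gap(lead_exposure_rate, n=100000, seed=3):
    """Average test score of variant carriers minus non-carriers in a simulated cohort."""
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        carrier = rng.random() < 0.3  # hypothetical variant frequency
        # Hypothetical: carriers put things in their mouths more often.
        mouths_objects = rng.random() < (0.6 if carrier else 0.2)
        lead_nearby = rng.random() < lead_exposure_rate
        score = rng.gauss(100, 15)  # test score unrelated to the variant itself
        if mouths_objects and lead_nearby:
            score -= 5  # hypothetical penalty from ingesting lead
        totals[carrier] += score
        counts[carrier] += 1
    return totals[True] / counts[True] - totals[False] / counts[False]


gap_then = carrier_gap(lead_exposure_rate=0.5)  # lots of lead paint around
gap_now = carrier_gap(lead_exposure_rate=0.0)   # no lead around

print(f"Carrier gap in the high-lead world: {gap_then:.2f} points")
print(f"Carrier gap in the lead-free world: {gap_now:.2f} points")
```

In the high-lead world, the variant shows up as associated with lower scores even though it never touches cognition directly. Remove the lead and the association shrinks to sampling noise, so a weight estimated in the first world would mislead in the second.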
You can probably tell there is a lot more to discuss about the use of polygenic scores for genetic prediction, including the potential for “precision medicine” or targeted interventions. Besides embryo selection, companies are also selling people their own polygenic scores. The researchers who develop these scores are themselves warning about what they can and can’t say, emphasizing “...it is important that participants/users understand that these individual results are not meaningful predictions and should be regarded essentially as entertainment. Failure to make this point clear risks sowing confusion and undermining trust in genetics research.”
The tension between useful population-level associations and noisy individual-level prediction is also relevant for lots of other measurements, including epigenetic “clocks” that supposedly tell you your true “biological age.” Stay tuned for more on this…
BOTTOM LINE:
Except for rare single-gene mutations, genetic prediction for individuals doesn’t work.
The good news is we’re not in (this particular) Black Mirror episode just yet.
We all want our kids to be as healthy and happy as possible, but genetic tinkering is not the answer. A safe and loving home (and world) gives kids the best chance to reach their full potential no matter their specific combination of genes.
Stay well,
Jenn
My perspective comes from being a researcher/scholar of population statistics in general, and being an avid academic follower of this area of genetics, with lots of close colleagues working in this area. I also tried to describe things in a way that would be most understandable for a non-specialist audience, which likely trades off some nuance and precision. For some beautifully written, deeper, and more technical expert dives into these topics, I highly recommend the Substack of statistical geneticist Sasha Gusev, and particularly these pieces most relevant to this post:
For the super keen I also recommend the Substack of Eric Turkheimer, and the book The Genetic Lottery: Why DNA Matters for Social Equality by Paige Harden for a thoughtful and provocative take on the intersection of social science and genetics. And I have not read this new book yet, but Dalton Conley is always very insightful on these topics.
Yes, Jenn's article is right.
Going a step further: calling risk scores “polygenic” demonstrates ignorance about what the word means. “Polygenic” means that the disease or condition is caused entirely by a set of genes acting together. In reality, almost every disease or condition depends on non-genetic factors, aka the environment, and is also influenced somewhat by genes. That’s called “multifactorial”. The word doesn’t sound as cool. Scientific honesty is more important than cool sounding words.
I read about this a few months back. At the time, my stance was that what the company is trying to sell is ethically wrong. I feel like we're trying to make robots. What if parents are not satisfied with their child afterwards? There are too many loopholes.