Darlene Storm

Researchers exploit flaw to identify anonymous DNA donors and their families

January 24, 2013 1:30 PM EST

We’ve looked at police bees vs DNA hackers before, and at how freaky future weaponized viruses and zero-day exploits may aim to infect your brain. Well, lots of cool and creepy things are happening in the area of DNA again: biological 'hard drives'; a vulnerability exploited to expose the identities of supposedly anonymous genetic donors, along with people in their family tree who never donated DNA; and Homeland Security discussing a potential “social conditioning campaign” so people won’t freak out about plans to develop and deploy rapid DNA analyzers.

A single gram of DNA could store enough data to fill 468,000 DVDs

Picture something as tiny as a “speck of dust” that can store a text file containing the entire collection of Shakespeare's 154 sonnets, a 26-second MP3 from Martin Luther King Jr.'s "I have a dream" speech, a PDF of the first research paper describing the double helical nature of DNA by James Watson and Francis Crick, and a JPG photo from the European Bioinformatics Institute (EMBL-EBI). Since DNA strands about the size of a dust speck stored all that, Emily Leproust of Agilent Technologies said that a cupful of DNA could hold “a hundred million hours of high-definition video.” Researchers in the United Kingdom told Nature that they improved upon the DNA encoding scheme and raised storage density to 2.2 petabytes per gram, which is three times better than the last effort.

A team led by molecular biologists Nick Goldman and Ewan Birney of the European Bioinformatics Institute, with help from Agilent Technologies, also added an error correction scheme so the data could be read back with 100% accuracy. To put the storage density another way, a single gram of DNA could store what we can currently fit on 468,000 DVDs, about 2.2 million gigabytes of data. If stored in a cool and dry place, DNA can be stable for thousands of years. According to the EBI researchers, DNA data storage is now only cost-effective for data that needs to be archived for 600 years or more. “But if the costs of DNA synthesis, currently the most expensive part of the enterprise, drop 100-fold, that break-even number would drop to about 50 years.” Even so, DNA used as a biological hard drive could prove invaluable for facilities like the Large Hadron Collider, which creates about 15 petabytes of data per year.
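The density figures above are easy to sanity-check with a little arithmetic. The sketch below is only a back-of-the-envelope check, not anything from the researchers: it assumes decimal (SI) units and a single-layer DVD capacity of 4.7 GB, neither of which is stated in the reporting, and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the DNA storage figures quoted above.
# Assumptions (not from the article): decimal SI units and a
# single-layer DVD holding 4.7 GB.
PETABYTE = 10**15
GIGABYTE = 10**9

DENSITY_BYTES_PER_GRAM = 2.2 * PETABYTE   # reported density
DVD_BYTES = 4.7 * GIGABYTE                # assumed DVD capacity
LHC_BYTES_PER_YEAR = 15 * PETABYTE        # reported LHC output


def dvds_per_gram():
    """How many assumed-size DVDs one gram of DNA would replace."""
    return DENSITY_BYTES_PER_GRAM / DVD_BYTES


def gigabytes_per_gram():
    """The reported density expressed in gigabytes per gram."""
    return DENSITY_BYTES_PER_GRAM / GIGABYTE


def grams_for_lhc_year():
    """Grams of DNA needed to hold one year of LHC data at that density."""
    return LHC_BYTES_PER_YEAR / DENSITY_BYTES_PER_GRAM


if __name__ == "__main__":
    print(f"DVDs per gram:      {dvds_per_gram():,.0f}")
    print(f"Gigabytes per gram: {gigabytes_per_gram():,.0f}")
    print(f"Grams per LHC year: {grams_for_lhc_year():.1f}")
```

Under those assumptions the numbers line up with the 468,000-DVD figure, and a year of Large Hadron Collider data would fit in roughly seven grams of DNA, ignoring any overhead for error correction or indexing.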

From DNA slick tricks to scarier DNA news: Hacking privacy of 'anonymous' donors

You may be used to hearing about vulnerabilities leading to security and privacy breaches, but in a new twist, scientists exploited vulnerabilities in the security of genetic data posted online from supposedly anonymous donors. “Using only a computer, an Internet connection, and publicly accessible online resources, a team of Whitehead Institute researchers has been able to identify nearly 50 individuals who had submitted personal genetic material as participants in genomic studies.” Not only that, but they were able to find their entire families, even though the relatives had not donated DNA. The scientists published the results of their research in Science magazine.

Whitehead Fellow Yaniv Erlich used to work as a white hat hacker, pen testing banks for vulnerabilities. This time, using public databases, he searched out an easily found type of DNA pattern on the Y chromosome that is passed from father to son; it “looks like stutters among billions of chemical letters in human DNA.” Since “there is a strong link in men between their surname and unique markings on the male, or Y, chromosome,” Erlich took the Y chromosome's short “stutters” and then searched a genealogy database for men with those same repeating DNA patterns. That gave him the surnames of the paternal and maternal grandfather. A quick Google search for those people turned up an obituary, which then gave him the family tree.
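To make the idea concrete, here is a toy sketch of the kind of lookup described above: compare the repeat counts at a few Y chromosome markers against genealogy records and see which surnames turn up. This is not the researchers' actual code or database. The marker names are real Y-STR loci, but every repeat count, surname, and function name below is invented for illustration, and a real search would involve far more markers and records.

```python
from collections import Counter

# Hypothetical genealogy records: a surname plus repeat counts at a few
# Y-STR markers. All values and surnames here are invented.
GENEALOGY_DB = [
    {"surname": "Archer", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}},
    {"surname": "Archer", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}},
    {"surname": "Baker", "markers": {"DYS19": 14, "DYS390": 23, "DYS391": 11, "DYS393": 13}},
    {"surname": "Collins", "markers": {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 13}},
]

# Hypothetical profile read from an "anonymous" donor's published genome.
QUERY = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}


def marker_distance(a, b):
    """Count shared markers whose repeat counts differ (None if none are shared)."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(1 for marker in shared if a[marker] != b[marker])


def infer_surnames(query, db, max_mismatches=0):
    """Return (surname, hit count) pairs for records within the mismatch tolerance."""
    hits = Counter()
    for record in db:
        distance = marker_distance(query, record["markers"])
        if distance is not None and distance <= max_mismatches:
            hits[record["surname"]] += 1
    return hits.most_common()


if __name__ == "__main__":
    print(infer_surnames(QUERY, GENEALOGY_DB))
    print(infer_surnames(QUERY, GENEALOGY_DB, max_mismatches=1))
```

The `max_mismatches` knob is there because distant paternal relatives may differ at a marker or two; loosening it widens the net, at the cost of pulling in unrelated surnames.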

Melissa Gymrek, a member of Erlich’s team, explained, “We show that if, for example, your Uncle Dave submitted his DNA to a genetic genealogy database, you could be identified. In fact, even your fourth cousin Patrick, whom you’ve never met, could identify you if his DNA is in the database, as long as he is paternally related to you.”

Erlich said, “This is an important result that points out the potential for breaches of privacy in genomic studies. Our aim is to better illuminate the current status of identifiability of genetic data. More knowledge empowers participants to weigh the risk and benefits and make more informed decisions when considering whether to share their own data.” He added, “We also hope that this study will eventually result in better security algorithms, better policy guidelines, and better legislation to help mitigate some of the risks.”

During a Science Magazine podcast, Gymrek pointed out one of the scarier risks of what could be done with this data. “So the big example that comes to mind is something like insurance companies. If insurance companies can know who you are, know your DNA sequence, they can determine if you’re predisposed to certain disorders, and they can use that information against you to raise your premiums and to make your life bad. You can think of scenarios like that where these are people that you don’t want getting a hold of your genetic data that might be able to get a hold of it.”

Eric D. Green, director of the National Human Genome Research Institute at the National Institutes of Health, said, "We are in what I call an awareness moment." Dr. Amy L. McGuire, an attorney and ethicist at Baylor College of Medicine, said, "To have the illusion you can fully protect privacy or make data anonymous is no longer a sustainable position." Mildred Cho of Stanford University's Center for Integration of Research on Genetics and Ethics added, "Nobody can promise privacy." Basically, if your DNA is public, then so are you and your family.

DHS plans for Rapid DNA Analyzers

CODIS, a database that is administered through the FBI and enables state and local crime laboratories to exchange and compare DNA profiles electronically

Why else could this be potentially dangerous for you? Because, as the EFF explained in an unrelated bit of scary DNA news, Rapid DNA analyzers that can “process DNA in 90 minutes or less” are “coming soon to a police department or immigration office near you.” Rapid DNA Analyzers are “about the size of a laser printer” and “are designed to be used in the field by non-scientists.” Manufacturers are telling the U.S. government the devices “will soon revolutionize the use of DNA by making it a routine identification and investigational tool.” Documents from the US Citizenship and Immigration Services (USCIS) and DHS’s Science & Technology show that funds have been earmarked “to develop a Rapid DNA analyzer that can verify familial relationships for refugee and asylum applications for as little as $100.”

The EFF added:

DHS and USCIS acknowledge that “DNA collection may create controversy.” One USCIS employee advocated for “DHS, with the help of expert public relation professionals,” to “launch a social conditioning campaign” to “dispel the myths and promote the benefits of DNA technology.” Another document feared that “If DHS fails to provide an adequate response to [inquiries about its Rapid DNA Test Program] quickly, civil rights/civil liberties organizations may attempt to shut down the test program.”