
Machine learning reveals unexpected genetic roots of cancers, autism and other disorders

December 18, 2014

University of Toronto researchers from Engineering, Biology and Medicine teach computers to ‘read the human genome’ and rate likelihood of mutations causing disease, opening vast new possibilities for medicine

TORONTO, ON – In the decade since the human genome was sequenced in 2003, scientists, engineers and doctors have struggled to answer an all-consuming question: Which DNA mutations cause disease?

A new computational technique developed at the University of Toronto may now be able to tell us.

A Canadian research team led by professor Brendan Frey has developed the first method for ‘ranking’ genetic mutations based on how living cells ‘read’ DNA, revealing how likely any given alteration is to cause disease. They used their method to discover unexpected genetic determinants of autism, hereditary cancers and spinal muscular atrophy, a leading genetic cause of infant mortality.

Their findings appear in today’s issue of Science, a leading journal.

Think of the human genome as a mysterious text, made up of three billion letters. “Over the past decade, a huge amount of effort has been invested into searching for mutations in the genome that cause disease, without a rational approach to understanding why they cause disease,” says Frey, also a senior fellow at the Canadian Institute for Advanced Research.

“This is because scientists didn’t have the means to understand the text of the genome and how mutations in it can change the meaning of that text.” Biologist Eric Lander of the Massachusetts Institute of Technology captured this puzzle in his famous quote: “Genome. Bought the book. Hard to read.”

What was Frey’s approach? We know that certain sections of the text, called exons, describe the proteins that are the building blocks of all living cells. What wasn’t appreciated until recently is that other sections, called introns, contain instructions for how to cut and paste exons together, determining which proteins will be produced. This ‘splicing’ process is a crucial step in how the cell converts DNA into proteins, and its disruption is known to contribute to many diseases.

Most research into the genetic roots of disease has focused on mutations within exons, but increasingly scientists are finding that many diseases can’t be explained by these mutations alone. Frey’s team took a completely different approach, examining changes to the text that provides instructions for splicing, most of which is in introns.

Frey’s team used a new technology called ‘deep learning’ to teach a computer system to scan a piece of DNA, read the genetic instructions that specify how to splice together sections that code for proteins, and determine which proteins will be produced.

Unlike other machine learning methods, deep learning can make sense of incredibly complex relationships, such as those found in living systems in biology and medicine. “The success of our project relied crucially on using the latest deep learning methods to analyze the most advanced experimental biology data,” says Frey, whose team included members from the University of Toronto’s Faculty of Applied Science & Engineering, Temerty Faculty of Medicine and the Terrence Donnelly Centre for Cellular and Biomolecular Research, as well as Microsoft Research and the Cold Spring Harbor Laboratory. “My collaborators and our graduate students and postdoctoral fellows are world-leading experts in these areas.”

Once they had taught their system how to read the text of the genome, Frey’s team used it to search for mutations that cause splicing to go wrong. They found that their method correctly predicted 94 percent of the genetic culprits behind well-studied diseases such as spinal muscular atrophy and colorectal cancer, but more importantly, made accurate predictions for mutations that had never been seen before.

They then launched a huge effort to tackle a condition with complex genetic underpinnings: autism spectrum disorder. “With autism there are only a few dozen genes definitely known to be involved and these account for a small proportion of individuals with this condition,” says Frey.

In collaboration with Dr. Stephen Scherer, senior scientist and director of The Centre for Applied Genomics at SickKids and the University of Toronto McLaughlin Centre, Frey’s team studied mutations present in the whole genome sequences of children with autism but absent from those of controls. Following the traditional approach of studying protein-coding regions, they found no differences. However, when they used their deep learning system to rank mutations according to how much they change splicing, surprising patterns appeared.

“When we ranked mutations using our method, striking patterns emerged, revealing 39 novel genes having a potential role in autism susceptibility,” Frey says.

And autism is just the beginning: this mutation indexing method is ready to be applied to any number of diseases, and even to non-disease traits that differ between individuals.

Dr. Juan Valcárcel Juárez, a researcher with the Center for Genomic Regulation in Barcelona, Spain, who was not involved in this research, says: “In a way it is like having a language translator: it allows you to understand another language, even if full command of that language will require that you also study the underlying grammar. The work provides important information for personalized medicine, clearly a key component of future therapies.”

-30-

Notes to editors:
Brendan Frey is a professor in The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, with cross-appointments to the Donnelly Centre for Cellular and Biomolecular Research, the Department of Molecular Genetics, the Department of Computer Science and the McLaughlin Centre at the University of Toronto. He is a senior fellow of the Canadian Institute for Advanced Research, and a member of the Technical Advisory Board of Microsoft Research.

This research was supported by:
Canadian Institute for Advanced Research (CIFAR); Canadian Institutes of Health Research (CIHR); Natural Sciences and Engineering Research Council of Canada (NSERC); McLaughlin Centre; Autism Speaks; Genome Canada; Autism Training Program.

Media contacts:

RJ Taylor
Communications & Media Relations Strategist
Faculty of Applied Science & Engineering
University of Toronto
Tel: 416-978-4498; rj.taylor@utoronto.ca