Machine learning reveals unexpected genetic roots of cancers, autism and other disorders

December 18, 2014

TORONTO, ON — In the decade since the genome was sequenced in 2003, sci­en­tists, engi­neers and doc­tors have strug­gled to answer an all-con­sum­ing ques­tion: Which DNA muta­tions cause dis­ease?

A new com­pu­ta­tion­al tech­nique devel­oped at the Uni­ver­si­ty of Toron­to may now be able to tell us.

A Cana­di­an research team led by pro­fes­sor Bren­dan Frey has devel­oped the first method for ‘rank­ing’ genet­ic muta­tions based on how liv­ing cells ‘read’ DNA, reveal­ing how like­ly any giv­en alter­ation is to cause dis­ease. They used their method to dis­cov­er unex­pect­ed genet­ic deter­mi­nants of autism, hered­i­tary can­cers and spinal mus­cu­lar atro­phy, a lead­ing genet­ic cause of infant mor­tal­i­ty.

Their findings appear in today’s issue of Science, a leading journal.

Think of the human genome as a mys­te­ri­ous text, made up of three bil­lion let­ters. “Over the past decade, a huge amount of effort has been invest­ed into search­ing for muta­tions in the genome that cause dis­ease, with­out a ratio­nal approach to under­stand­ing why they cause dis­ease,” says Frey, also a senior fel­low at the Cana­di­an Insti­tute for Advanced Research.

“This is because sci­en­tists didn’t have the means to under­stand the text of the genome and how muta­tions in it can change the mean­ing of that text.” Biol­o­gist Eric Lan­der of the Mass­a­chu­setts Insti­tute of Tech­nol­o­gy cap­tured this puz­zle in his famous quote: “Genome. Bought the book. Hard to read.”

What was Frey’s approach? We know that cer­tain sec­tions of the text, called exons, describe the pro­teins that are the build­ing blocks of all liv­ing cells. What wasn’t appre­ci­at­ed until recent­ly is that oth­er sec­tions, called introns, con­tain instruc­tions for how to cut and paste exons togeth­er, deter­min­ing which pro­teins will be pro­duced. This ‘splic­ing’ process is a cru­cial step in the cell’s process of con­vert­ing DNA into pro­teins, and its dis­rup­tion is known to con­tribute to many dis­eases.

Most research into the genet­ic roots of dis­ease has focused on muta­tions with­in exons, but increas­ing­ly sci­en­tists are find­ing that dis­eases can’t be explained by these muta­tions. Frey’s team took a com­plete­ly dif­fer­ent approach, exam­in­ing changes to text that pro­vides instruc­tions for splic­ing, most of which is in introns.

Frey’s team used a new tech­nol­o­gy called ‘deep learn­ing’ to teach a com­put­er sys­tem to scan a piece of DNA, read the genet­ic instruc­tions that spec­i­fy how to splice togeth­er sec­tions that code for pro­teins, and deter­mine which pro­teins will be pro­duced.

Unlike other machine learning methods, deep learning can make sense of incredibly complex relationships, such as those found in living systems in biology and medicine. “The success of our project relied crucially on using the latest deep learning methods to analyze the most advanced experimental biology data,” says Frey, whose team included members from the University of Toronto’s Faculty of Applied Science & Engineering, Temerty Faculty of Medicine and the Terrence Donnelly Centre for Cellular and Biomolecular Research, as well as Microsoft Research and the Cold Spring Harbor Laboratory. “My collaborators and our graduate students and postdoctoral fellows are world-leading experts in these areas.”

Once they had taught their sys­tem how to read the text of the genome, Frey’s team used it to search for muta­tions that cause splic­ing to go wrong. They found that their method cor­rect­ly pre­dict­ed 94 per­cent of the genet­ic cul­prits behind well-stud­ied dis­eases such as spinal mus­cu­lar atro­phy and col­orec­tal can­cer, but more impor­tant­ly, made accu­rate pre­dic­tions for muta­tions that had nev­er been seen before.

They then launched a huge effort to tack­le a con­di­tion with com­plex genet­ic under­pin­nings: autism spec­trum dis­or­der. “With autism there are only a few dozen genes def­i­nite­ly known to be involved and these account for a small pro­por­tion of indi­vid­u­als with this con­di­tion,” says Frey.

In collaboration with Dr. Stephen Scherer, senior scientist and director of The Centre for Applied Genomics at SickKids and the University of Toronto McLaughlin Centre, Frey’s team examined mutations found in the whole-genome sequences of children with autism but not in controls. Following the traditional approach of studying protein-coding regions, they found no differences. However, when they used their deep learning system to rank mutations according to how much they change splicing, surprising patterns appeared.

“When we ranked muta­tions using our method, strik­ing pat­terns emerged, reveal­ing 39 nov­el genes hav­ing a poten­tial role in autism sus­cep­ti­bil­i­ty,” Frey says.

And autism is just the beginning—this muta­tion index­ing method is ready to be applied to any num­ber of dis­eases, and even non-dis­ease traits that dif­fer between indi­vid­u­als.

Dr. Juan Val­cár­cel Juárez, a researcher with the Cen­ter for Genom­ic Reg­u­la­tion in Barcelona, Spain, who was not involved in this research, says: “In a way it is like hav­ing a lan­guage trans­la­tor: it allows you to under­stand anoth­er lan­guage, even if full com­mand of that lan­guage will require that you also study the under­ly­ing gram­mar. The work pro­vides impor­tant infor­ma­tion for per­son­al­ized med­i­cine, clear­ly a key com­po­nent of future ther­a­pies.”

-30-

Notes to edi­tors:
Brendan Frey is a professor in The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, with cross-appointments to the Donnelly Centre for Cellular and Biomolecular Research, the Department of Molecular Genetics, the Department of Computer Science, and the McLaughlin Centre at the University of Toronto. He is a senior fellow of the Canadian Institute for Advanced Research and a member of the Technical Advisory Board of Microsoft Research.

This research was sup­port­ed by:
Cana­di­an Insti­tute for Advanced Research (CIFAR); Cana­di­an Insti­tutes of Health Research (CIHR); Nat­ur­al Sci­ences and Engi­neer­ing Research Coun­cil of Cana­da (NSERC); McLaugh­lin Cen­tre; Autism Speaks; Genome Cana­da; Autism Train­ing Pro­gram.

Media con­tacts:

RJ Tay­lor
Com­mu­ni­ca­tions & Media Rela­tions Strate­gist
Fac­ul­ty of Applied Sci­ence & Engi­neer­ing
Uni­ver­si­ty of Toron­to
Tel: 416-978-4498; rj.taylor@utoronto.ca