The National Cancer Institute (NCI) is the U.S. Government's principal
agency for cancer research and training. NCI coordinates the National
Cancer Program, which conducts and supports research, training,
health information dissemination, and other programs with respect
to the cause, diagnosis, prevention, and treatment of cancer, rehabilitation
from cancer, and the continuing care of cancer patients and the
families of cancer patients.
Star-P software is being used by scientists at NCI’s Pediatric
Oncology Branch to mine vast public databases of genomic information
for potential new medical discoveries. With Star-P, scientists can
now tap powerful high performance computers (HPCs) to dramatically
accelerate the process of genomic profiling, which could yield new
insights into the genetic risk factors for cancer, foster new procedures
for testing tumors, or identify genetic changes that may result
from treatments and therapies.
Using a specialized software application called CORR4DB, researchers
correlate one genomic array against a database of 100,000 probes,
each a piece of a gene, in search of specific DNA components or attributes.
The correlations help them understand the relationship of genes,
and their conclusions can provide the basis for additional genomic
research.
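To make the idea of correlating one array against a database concrete, here is a minimal sketch in Python. It is not CORR4DB's code, and the source does not say which statistic CORR4DB computes; the sketch assumes a plain Pearson correlation. The function names, probe names, and toy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / sqrt(var_x * var_y)


def correlate_against_database(query, database):
    """Correlate one profile against every probe; strongest |r| first."""
    scores = [(probe_id, pearson(query, values))
              for probe_id, values in database.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one query profile and three probes (assumed values).
query = [1.0, 2.0, 3.0, 4.0]
database = {
    "probe_a": [2.0, 4.0, 6.0, 8.0],  # rises with the query
    "probe_b": [4.0, 3.0, 2.0, 1.0],  # falls as the query rises
    "probe_c": [1.0, 3.0, 2.0, 4.0],  # only partly follows the query
}

ranked = correlate_against_database(query, database)
for probe_id, r in ranked:
    print(f"{probe_id}: r = {r:+.3f}")
```

Each probe is scored independently of the others, which is why this kind of workload is a natural candidate for spreading across many processors.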
CORR4DB is developed in MATLAB®, a highly productive desktop
tool favored by scientists. But sample sizes were growing into the
tens of thousands of genomic arrays, overwhelming the capabilities
of their desktop MATLAB® environment and hindering interactivity
with the data. Scientists knew that larger correlations could be
completed faster if the calculation were parallelized to run
on an HPC system.
“Running a single correlation on a desktop computer could
take a week or more to complete,” says Bill Strecker, chief
technical officer at Interactive Supercomputing (ISC). “An explosion in the amount of genomic
data available to researchers has made their work increasingly difficult.
Their tasks require more computing power, more system memory, and
– all too often – more time. And in the race to understand
how genetics and cancer are linked, time is precious.”
Star-P is an interactive parallel computing platform that lets NCI scientists
continue to work with CORR4DB on their desktops, but run the
correlations interactively on high performance computing servers.
This eliminates the need to re-program the application in C, FORTRAN
and MPI to run on the parallel computer. As a result, the answers
to some researchers’ questions are arriving up to 200 times
faster than ever before, in minutes instead of days.
With a more powerful parallel system at their disposal, researchers
may also try even more complex searches that previously weren't
an option. For example, the group has estimated that, using today's
database, the largest potential correlation, with a data matrix
of 100,000 by 100,000, would require more than 256GB of memory
to solve. Star-P fundamentally transforms the workflow, giving researchers
the ability to run more samples and approach problems differently
than they would have before.
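A quick back-of-envelope calculation shows why a problem of this size is out of reach for a desktop. The sketch below assumes dense storage at 8 bytes per value (double precision, MATLAB's default numeric type) and counts only a single 100,000 by 100,000 matrix. The source does not break down its 256GB estimate, so the input data and any working copies that make up the rest are not modeled here. The function name is hypothetical.

```python
def dense_matrix_bytes(rows, cols, bytes_per_value=8):
    """Bytes needed to hold a dense rows x cols matrix in memory."""
    return rows * cols * bytes_per_value


n = 100_000
one_matrix = dense_matrix_bytes(n, n)

print(f"{one_matrix:,} bytes")                      # 80,000,000,000 bytes
print(f"{one_matrix / 1e9:.1f} GB (decimal)")       # 80.0 GB
print(f"{one_matrix / 2**30:.1f} GiB (binary)")     # about 74.5 GiB
```

One such matrix alone needs about 80 GB, so a computation that holds the input, the result, and intermediate copies at once can plausibly exceed the memory of any single desktop machine.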
The Star-P approach has yielded significant advantages, says Dr.
Mark Potts, president of HPC Applications, Inc., a consulting firm
contracted to get NCI’s software up and running on the SGI
Altix. “If your goal is to take the same interactive environment
and transfer it to a parallel processing system with a lot more
memory, then you’ll look for the easiest way to get there,”
says Potts. “NCI is accustomed to working in MATLAB® and with
certain formatted files, and the Star-P approach retains that environment.”
Metrics:
- 200X Speedup on 8-processor server
- Ability to process larger data sets (>256GB) from desktop MATLAB®
- Transformation of research workflow
For More Information:
• http://www.interactivesupercomputing.com/success/genomic_correlation.php