Genomic Analysis at the National Cancer Institute


The National Cancer Institute (NCI) is the U.S. Government's principal agency for cancer research and training. NCI coordinates the National Cancer Program, which conducts and supports research, training, health information dissemination, and other programs with respect to the cause, diagnosis, prevention, and treatment of cancer, rehabilitation from cancer, and the continuing care of cancer patients and the families of cancer patients.

Star-P software is being used by scientists at NCI’s Pediatric Oncology Branch to mine vast public databases of genomic information for potential new medical discoveries. With Star-P, scientists can now tap powerful high performance computers (HPCs) to dramatically accelerate the process of genomic profiling, which could yield new insights into the genetic risk factors for cancer, foster new procedures for testing tumors, or identify genetic changes that may result from treatments and therapies.

Using a specialized software application called CORR4DB, researchers correlate one genomic array against a database of 100,000 probe pieces of a gene in search of specific DNA components or attributes. The correlations help them understand the relationship of genes, and their conclusions can provide the basis for additional genomic research.
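The kind of search described above can be illustrated with a minimal sketch. CORR4DB itself is not shown in this article, so everything here is an assumption made for illustration: the use of the Pearson coefficient, the function names `pearson` and `correlate_against_database`, and the toy probe data. It scores one query array against every entry in a small database and ranks the entries by strength of correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0.0:
        # Correlation is undefined for constant input; treating it as 0.0
        # is an assumption made here so the ranking stays well-defined.
        return 0.0
    return cov / norm


def correlate_against_database(query, database):
    """Return (probe_id, r) pairs, strongest absolute correlation first."""
    scores = [(pid, pearson(query, values)) for pid, values in database.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: one query array and three database probes.
query = [1.0, 2.0, 3.0, 4.0]
database = {
    "probe_a": [2.0, 4.0, 6.0, 8.0],  # rises with the query
    "probe_b": [4.0, 3.0, 2.0, 1.0],  # falls as the query rises
    "probe_c": [1.0, 3.0, 2.0, 4.0],  # only partly follows the query
}

ranked = correlate_against_database(query, database)
for probe_id, r in ranked:
    print(f"{probe_id}: r = {r:+.3f}")
```

Each probe is scored independently of the others, which is why a search of this shape is a natural candidate for running in parallel across many processors.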

CORR4DB is developed in MATLAB®, a highly productive desktop tool favored by scientists. But sample sizes were growing into the tens of thousands of genomic arrays, overwhelming the capabilities of their desktop MATLAB® environment and hindering interactivity with the data. Scientists knew that larger correlations could be completed faster if their calculation could be parallelized to run on a parallel HPC.

“Running a single correlation on a desktop computer could take a week or more to complete,” says Bill Strecker, chief technical officer at ISC. “An explosion in the amount of genomic data available to researchers has made their work increasingly difficult. Their tasks require more computing power, more system memory, and – all too often – more time. And in the race to understand how genetics and cancer are linked, time is precious.”

Star-P is an interactive parallel computing platform that lets NCI scientists continue to work with CORR4DB on their desktops while running the correlations interactively on high performance computing servers. This eliminates the need to re-program the application in C, FORTRAN and MPI to run on the parallel computer. As a result, the answers to some researchers’ questions are arriving up to 200 times faster than before, in minutes instead of days.

With a more powerful parallel system at their disposal, researchers may also try more complex searches that previously weren't an option. For example, the group has estimated that, using today's database, the largest potential correlation, with a data matrix of 100,000 by 100,000, would require more than 256GB of memory to solve. Star-P fundamentally transforms the workflow, giving researchers the ability to run more samples and approach problems differently than they would have before.
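To give a feel for the scale involved, the short calculation below sizes a single dense 100,000 by 100,000 matrix. It assumes 8-byte double-precision elements, which the article does not state, and the helper name `dense_matrix_bytes` is made up for this sketch. The group's estimate of more than 256GB presumably also covers input data and intermediate work arrays, but that breakdown is not given here, so the code only sizes one matrix.

```python
def dense_matrix_bytes(rows, cols, bytes_per_element=8):
    """Storage for a dense matrix; 8 bytes per element assumes doubles."""
    return rows * cols * bytes_per_element


n = 100_000
result_bytes = dense_matrix_bytes(n, n)
result_gb = result_bytes / 1e9  # decimal gigabytes

print(f"One {n:,} x {n:,} matrix of doubles: {result_gb:.0f} GB")
```

Even one such matrix is far beyond the memory of a typical desktop machine, which is consistent with the need for a large-memory parallel server.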

The Star-P approach has yielded significant advantages, said Dr. Mark Potts, president of HPC Applications, Inc., a consulting firm contracted to get NCI’s software up and running on the SGI Altix. “If your goal is to take the same interactive environment and transfer it to a parallel processing system with a lot more memory, then you’ll look for the easiest way to get there,” says Potts. “NCI is accustomed to working in MATLAB® and with certain formatted files, and the Star-P approach retains that environment.”

Metrics:

  • 200X Speedup on 8-processor server
  • Ability to process larger data sets (>256GB) from desktop MATLAB®
  • Transformation of research workflow

For More Information:
http://www.interactivesupercomputing.com/success/genomic_correlation.php

   


©Copyright 2007 Interactive Supercomputing, Inc. and its licensors. All rights reserved.
Interactive Supercomputing, Inc. | 135 Beaver St. | Waltham, MA 02452
Phone: +1.781.419.5050 | Fax: +1.781.419.6050 www.interactivesupercomputing.com
STAR-P™ and the "star" logo are trademarks of Interactive Supercomputing. MATLAB® is a registered trademark of The MathWorks, Inc.