SAS Institute Inc. World Headquarters
SAS Campus Drive, Cary, NC 27513
Tel (800) 727-0025
Fax (919) 677-4444
www.sas.com/success

The power of the grid

Texas Tech accelerates research with SAS® and grid computing

In many fields of academic study – including physics, genetics, medicine and others – scientific research requires an intimate look at huge amounts of data and information. Indeed, probability modeling and data analysis have permeated modern scientific theories to the point where even a simple hypothesis can quickly become a complicated statistical problem that can be addressed only with simulation and validation – important processes that determine the statistical significance of results.

"Whether you're working in genomics and analyzing genes from different gene banks; working in chemistry and looking through all kinds of molecular databases; or working in astrophysics and looking through terabytes of microscopic data, more and more researchers are using huge databases and sophisticated algorithms to attack their investigations," explains Peter Westfall, director of the Center for Advanced Analytics and Business Intelligence at Texas Tech University in Lubbock. Westfall holds a doctorate in statistics.

At Texas Tech, Westfall and many of his colleagues are using SAS and grid computing to develop their computer-intensive research methods faster than ever before. Grid computing is a method of harnessing the power of many computers in a network to solve problems that require a large number of processing cycles and involve huge volumes of data.

Grid computing taps the unused processor capacity of hundreds, sometimes even thousands, of computers to reduce processing times for computer-intensive problems. The SAS/CONNECT grid at Texas Tech contains 200 computers, all running SAS. In this way, users can get results on large, complicated projects much faster and at a much lower cost.
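To make the idea concrete, here is a minimal sketch in Python of the pattern grid computing relies on: split a large job into independent chunks, run each chunk on its own worker, and combine the results. It is not the SAS/CONNECT setup used at Texas Tech. The workload (`simulate_chunk`), the function names and the 5 percent hit rate are all hypothetical, and a local thread pool stands in for the networked machines purely to show the partitioning.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate_chunk(seed, n_trials):
    """Hypothetical workload: run n_trials independent trials, count the 'hits'."""
    rng = random.Random(seed)  # each chunk gets its own seeded generator
    return sum(1 for _ in range(n_trials) if rng.random() < 0.05)


def run_on_grid(total_trials, n_workers):
    """Split total_trials into near-equal chunks and run them on a worker pool."""
    base, extra = divmod(total_trials, n_workers)
    sizes = [base + (1 if i < extra else 0) for i in range(n_workers)]
    # A thread pool stands in for the grid nodes; on a real grid each chunk
    # would be shipped to a separate machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = list(pool.map(simulate_chunk, range(n_workers), sizes))
    return sizes, sum(counts)


sizes, hits = run_on_grid(total_trials=10_000, n_workers=8)
print(sizes, hits)
```

Because the chunks share no state, adding workers shortens the wall-clock time without changing the combined result, which is why simulation-heavy research suits a grid.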

In an academic environment, as in a business environment, time savings can mean a huge return on investment. "Research is an ongoing process," says Westfall. "The ability to move quickly and adapt quickly to change is appealing to academics."

Grid computing offers substantial returns
Westfall coordinates SAS grid computing with a team of researchers at Texas Tech. In a project that sought to help financial companies identify new ways to manage multiple portfolios, stock portfolios were created by repeatedly sampling investment data from large financial databases. The analysis was accomplished in 14 days. Without the grid, Westfall says, it would have required more than 500 days of continuous computing time on a dedicated machine.
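As a rough check on those figures, the short Python calculation below works out the implied speedup. The 500-day and 14-day numbers come from the text above. Using all 200 grid machines for this project is an assumption made only for illustration, and because the serial estimate is "more than 500 days," both results are lower bounds.

```python
serial_days = 500   # "more than 500 days" on one dedicated machine
grid_days = 14      # elapsed time reported on the grid
machines = 200      # assumption: the full 200-computer grid was used

speedup = serial_days / grid_days   # roughly 35.7 times faster
efficiency = speedup / machines     # share of the ideal 200x speedup

print(f"speedup: at least {speedup:.1f}x")
print(f"efficiency: at least {efficiency:.0%} of ideal linear scaling")
```

A grid that borrows idle cycles from shared machines would not be expected to reach perfect linear scaling, so a figure well below 100 percent is unsurprising.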

In addition, Westfall used grid computing to identify how anomalies, such as hurricanes and terror threats, can affect stock prices. In this project, the team developed and evaluated a bootstrap statistical methodology, which essentially reshuffles stock market data thousands of times, that the financial community can use to better predict and manage such situations.
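The article does not give the details of Westfall's procedure, so the following is only a generic illustration of bootstrapping in Python: resample a data set with replacement many times, recompute a statistic each time, and read an interval off the resulting distribution. The daily returns, the choice of the mean as the statistic and the function names are all hypothetical.

```python
import random
import statistics


def bootstrap_means(returns, n_resamples, seed=0):
    """Resample the data with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(returns)
    return [statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, alpha=0.05):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    lo = ordered[int((alpha / 2) * len(ordered))]
    hi = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lo, hi


# Hypothetical daily returns, in percent (made-up illustration data).
daily_returns = [0.4, -1.2, 0.3, 0.8, -0.5, 1.1, -0.2, 0.0, 0.6, -0.9]
means = bootstrap_means(daily_returns, n_resamples=2000)
print(percentile_interval(means))
```

Each resample is independent of the others, so the thousands of repetitions can be divided among grid machines.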

"Evaluating the bootstrap methodology required an additional simulation layer, effectively squaring the needed time of what was already a computer-intensive technique," says Westfall. The SAS grid cut the overall computing time from more than a week to a single afternoon. As a result, Westfall was able to complete his research on this project in time for publication. "We had a big deadline that I don't think we would have met without grid computing on our side," he recalls.
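A small Python sketch shows why the extra layer multiplies the cost. One common way to evaluate a bootstrap method, assumed here and not confirmed by the article, is to simulate many data sets from a known distribution, run the full bootstrap on each, and count how often the resulting interval covers the true value. The distribution, sample size and function names below are hypothetical.

```python
import random
import statistics


def bootstrap_interval(sample, n_resamples, rng, alpha=0.05):
    """Inner layer: percentile bootstrap interval for the mean of one data set."""
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def estimate_coverage(n_datasets, n_resamples, sample_size=30, seed=0):
    """Outer layer: simulate data sets with a known mean, bootstrap each one."""
    rng = random.Random(seed)
    true_mean = 0.0
    hits = 0
    for _ in range(n_datasets):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(sample_size)]
        lo, hi = bootstrap_interval(sample, n_resamples, rng)
        hits += lo <= true_mean <= hi
    # Total resamples drawn: one full bootstrap per simulated data set.
    return hits / n_datasets, n_datasets * n_resamples


coverage, resamples_drawn = estimate_coverage(n_datasets=50, n_resamples=200)
print(coverage, resamples_drawn)
```

The total work is the product of the two loop counts, so when both layers use a similar number of repetitions the cost grows with the square of that number. The outer iterations are independent of one another, so they can be farmed out to separate machines.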

"If you can cut down the time of your research cycle, that's a significant ROI," adds Westfall. "When you can get something done in one-thirtieth of the time – and our time savings are much larger than that – but even a reduction of that level gives you a much stronger ability to move forward with your research."

Genetic research with significant results
In addition to financial research, Texas Tech also has used SAS to grid-enable various genomic applications, including one that performs "wildcard searches" for combinations of base elements – represented by the letters A, C, G and T – among the millions of sequential letters in a single gene. Biologists use the results in their search to locate specific portions of DNA that may inhibit or otherwise affect specific diseases.

The wildcard aspect of the search allows researchers to search not only for simple combinations like A, T, T, C, A but also for more complex possibilities of letter combinations such as A, [C or T], [any letter], G, C, A. Additionally, the application randomly scrambles the entire DNA sequence and searches repeatedly to determine the significance of the find.
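The search and the scrambling step can be sketched in a few lines of Python. This is an illustration of the idea only, not the SAS application described here: the wildcard pattern is written as a regular expression, and the significance estimate uses a simple permutation test with an add-one correction, which is an assumed convention. The toy sequence and function names are hypothetical.

```python
import random
import re

# A, [C or T], [any letter], G, C, A written as a regular expression.
PATTERN = "A[CT][ACGT]GCA"


def count_matches(sequence, pattern):
    """Count every start position where the pattern matches, overlaps included."""
    return len(re.findall(f"(?=({pattern}))", sequence))


def scramble_p_value(sequence, pattern, n_scrambles=1000, seed=0):
    """Shuffle the letters repeatedly; see how often chance does as well."""
    rng = random.Random(seed)
    observed = count_matches(sequence, pattern)
    letters = list(sequence)
    at_least = 0
    for _ in range(n_scrambles):
        rng.shuffle(letters)
        if count_matches("".join(letters), pattern) >= observed:
            at_least += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (at_least + 1) / (n_scrambles + 1)


dna = "GGACAGCATTATTGCACCGTACGGCATT"  # toy sequence for illustration
print(scramble_p_value(dna, PATTERN, n_scrambles=500))
```

Scrambling preserves the letter frequencies of the sequence while destroying their order, so a small p-value suggests the pattern occurs more often than composition alone would explain. Each scramble is independent, which makes the repeated searches easy to spread across grid machines.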

Using the grid with SAS for this application has allowed researchers to identify potential DNA regulators more quickly than they would have with a single desktop application. In the long run, that early identification means the SAS grid is helping scientists move one step closer to treating dozens of genetic diseases.

SAS ideal for grid computing
Westfall developed the SAS grid along with Phil Smith, director of the High Performance Computing Center at Texas Tech. "SAS complements our other grid technologies by providing a minimal programming interface to access grid capabilities," says Smith. "Many researchers do not have the time or inclination to develop C or FORTRAN codes that are necessary to utilize campus grids. SAS empowers its users by making grid computing available through SAS/CONNECT."

Overall, Westfall enjoys having access to the analytic power of SAS in combination with the grid's tremendous processing power straight from his desktop. Plus, SAS provides everything he needs in one package. "Only SAS offers database management, statistical analysis, optimization and econometrics in one complete package," he says. "Having all the tools right there inside the same package makes SAS ideal for grid computing."

Copyright © SAS Institute Inc. All Rights Reserved.

Texas Tech University

Challenge:
Solve complex business and academic problems that involve tremendous amounts of data and highly complex statistical procedures.
Solution:
SAS Analytics, running on hundreds of computers within a large grid computing structure, helps researchers solve computer-intensive projects in significantly less time.

When you can get something done in one-thirtieth of the time … a reduction of that level gives you a much stronger ability to move forward with your research.

—Peter Westfall

Director of the Center for Advanced Analytics and Business Intelligence, Texas Tech University