Sequencing the genomes of the world's leading crops with SAS

The laboratory of crop physiology and plant improvement at UCL's Faculty of bio-engineering, agronomy and environment (Louvain-la-Neuve) uses the SAS system successfully in its genetic research, both fundamental (e.g. to better understand how the root system forms in certain plants) and applied (improving the characteristics and performance of specific food plants).

The Challenge

Develop appropriate database systems with sophisticated query facilities to capture large amounts of data quickly and produce intelligible results efficiently from genetic research on plants. Develop a software system to facilitate the archiving of data from screening macroarrays of ordered DNA clones, and allow navigation of these data to detect errors and uncertainties.

 

The sturdiness and statistical power of SAS tools, combined with their open systems design, have enabled us to build a comprehensive data warehousing system for our fundamental and applied research in a very short period of time.
Dr. Xavier Draye, Faculty of bio-engineering, agronomy and environment, UCL

The Solution

Implement a data entry graphical interface and a data warehouse system with a sophisticated exploration tool in the application development environment of the SAS software, to achieve a significant reduction in the time required for experimental setup and data entry, and in the risk of data entry mistakes.

Sequencing the genomes of the world's leading crops

UCL's Faculty of bio-engineering, agronomy and environment at Louvain-la-Neuve carries out genetic research on populations, using quantitative genetics to better understand the consequences of selective breeding and the evolutionary effects of natural selection, in order to establish the basic prerequisites for improving animal and plant species.

Many of the world's food and feed crops are members of the Poaceae family, which includes rice, wheat, maize, sorghum, sugarcane, barley, oat, rye, millet, and others.

Sorghum (Sorghum bicolor (L.) Moench) is well suited to positional cloning because of its relatively small genome. This small genome provides an important template for studying closely related large-genome crops such as maize and sugarcane. Moreover, sorghum is a leading cereal in arid and semi-arid agriculture, ranking fifth in importance among the world's grain crops.

Dr. Xavier Draye, a researcher at UCL's laboratory of crop physiology and plant improvement, confirmed that genetic research on sorghum and other plants should make it possible to improve crop quality and yields while reducing the environmental impact of growing these crops (for example, lower fertilizer residues in the soil).

"For example, we try to gather detailed knowledge of the size and development of the plant's root system," he stated. "We look in the genome to try to find the genes responsible for a specific growth pattern or form of the roots that makes them better adapted to certain soils and environments. This involves two types of analysis, a comparison of root systems in the soil and an analysis of the plant's genome, plus comparisons between the two. Our goal is to exploit the genetic variation among individuals."

A genetic detective's work

In recent years, direct sequencing of genomes (about 300 million base pairs for rice and billions for maize) has become routine. The hard work now is annotating the DNA sequence, that is, finding the genes (each a particular sequence of base pairs) and assigning their functions. One promising strategy is to look for associations between the presence of certain sequences in the genome (known as DNA markers) and desired characteristics of the roots.
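
The article does not say which statistics the laboratory used for this. As a minimal sketch of the marker/trait association idea, assuming each individual is scored 0 or 1 for the presence of one marker and measured for one root trait, the hypothetical Python function below computes Welch's t statistic for the difference in trait means between carriers and non-carriers. Real QTL analyses use more elaborate models; this is only an illustration of the principle.

```python
from math import sqrt
from statistics import mean, variance


def marker_trait_association(genotypes, trait):
    """Hypothetical helper: Welch's t statistic comparing a root trait between
    individuals that carry a DNA marker (score 1) and those that lack it (0).

    genotypes: list of 0/1 marker scores, one per individual
    trait:     list of trait values (e.g. root length), in the same order
    Returns (difference in means, t statistic).
    """
    if len(genotypes) != len(trait):
        raise ValueError("genotypes and trait must have the same length")
    with_marker = [t for g, t in zip(genotypes, trait) if g == 1]
    without_marker = [t for g, t in zip(genotypes, trait) if g == 0]
    if len(with_marker) < 2 or len(without_marker) < 2:
        raise ValueError("each marker class needs at least two individuals")

    diff = mean(with_marker) - mean(without_marker)
    # Welch standard error: the two groups may have unequal variances.
    se = sqrt(variance(with_marker) / len(with_marker)
              + variance(without_marker) / len(without_marker))
    return diff, diff / se
```

For example, `marker_trait_association([1, 1, 1, 0, 0, 0], [12.0, 14.0, 13.0, 9.0, 10.0, 8.0])` gives a mean difference of 4.0 and t of about 4.90. Repeating the call for every marker and ranking markers by |t| gives a crude genome scan; the function returns only the statistic, not a p-value.
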
More than two decades of genomics research have yielded high-density maps of DNA markers for many major crops. These maps provide a valuable foundation for basic research into genome organization and evolution. One essential requirement for map-based cloning and physical analysis of large chromosomal regions is the availability of libraries containing large inserts of genomic DNA. Such libraries have contributed greatly to the production of physical maps of large regions and to the isolation of many important genes. A large-insert Sorghum propinquum BAC (bacterial artificial chromosome) library has been constructed at Texas A&M University to analyze the physical organization of the sorghum genome and facilitate positional cloning of sorghum genes.

The need for a data warehouse

"We are speaking of really high numbers here," said Dr. Xavier Draye. "Just to give you an idea, with 1,000 to 10,000 markers, we try to map 30,000 to 150,000 clones. We use grids to physically spot the occurrence of certain clones, with more than 18,000 clones on a single grid. Because grids can be reused for screening (although they age and become harder to read accurately), because batches tend to overlap, and because researchers' involvement also overlaps over time, we needed a monitoring system and a robust data warehouse built on some 10 different databases."
The system had to enable easy, accurate identification of the grids, while guiding the user through the various steps intuitively.
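
The article gives the grid capacity but not how BAC-DMS tracks grid wear. The Python sketch below is a hypothetical illustration only: it uses the quoted figure of 18,000 clones per grid to bound how many grids a library occupies, and keeps a per-grid screening history with an assumed reuse limit (`MAX_REUSES` is invented; the source states no such number).

```python
import math
from dataclasses import dataclass, field

# From the article: "more than 18,000 clones on a single grid".
CLONES_PER_GRID = 18_000
# Hypothetical wear limit: the article does not say how often a grid can be reused.
MAX_REUSES = 10


def grids_needed(n_clones: int, per_grid: int = CLONES_PER_GRID) -> int:
    """Upper bound on the number of grids needed to spot every clone once."""
    return math.ceil(n_clones / per_grid)


@dataclass
class Grid:
    """One reusable macroarray and its screening history (hypothetical schema)."""
    grid_id: str
    # One (marker, researcher) entry per hybridization performed on this grid.
    uses: list = field(default_factory=list)

    def record_use(self, marker: str, researcher: str) -> None:
        self.uses.append((marker, researcher))

    @property
    def worn(self) -> bool:
        """True once the grid has reached the assumed reuse limit."""
        return len(self.uses) >= MAX_REUSES

    def researchers(self) -> set:
        """Everyone who has screened this grid, useful when staff overlap over time."""
        return {who for _, who in self.uses}
```

With these figures, 30,000 clones fit on at most 2 grids and 150,000 clones on at most 9. The volume therefore comes less from the number of grids than from screening the same grids repeatedly with thousands of markers, which is what a per-grid history like this would track.
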

SAS software was chosen for this complex statistical work. "SAS/AF enabled us to develop the BAC-DMS application very rapidly: it took less than three months working part-time. SAS has a very open systems design and links easily to other systems. What's more, SAS is a highly developed language and a very robust and comprehensive statistical analysis tool, which means that an experienced SAS user can create a new application in less than half an hour. The fact that everything is done in a single environment also speeds up the work," added Dr. Xavier Draye.
The program now provides a versatile "click-a-clone" data entry graphical interface, a data warehouse including a sophisticated exploration tool, a routine to export groups of overlapping clones for interactive clone ordering, and a monitoring system for the practical management of a large number of reusable macroarrays.
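
The article does not describe how BAC-DMS forms its groups of overlapping clones. A common assumption in hybridization-based physical mapping is that clones which light up with the same marker probe overlap, so groups are the connected components of a clone graph linked through shared markers. The Python sketch below (hypothetical function and clone names) builds those components with a union-find structure; it does not attempt to filter false-positive hits, which a real system would need to flag.

```python
from collections import defaultdict


def overlapping_clone_groups(hits):
    """Group clones linked, directly or transitively, by hybridizing to the
    same DNA marker (hypothetical helper).

    hits: iterable of (marker, clone) pairs scored positive on a grid.
    Returns a list of sorted clone groups, largest group first.
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clones_by_marker = defaultdict(list)
    for marker, clone in hits:
        clones_by_marker[marker].append(clone)
        find(clone)  # register clones that hit only a single marker

    # All clones positive for one marker are merged into one group.
    for clones in clones_by_marker.values():
        for other in clones[1:]:
            union(clones[0], other)

    groups = defaultdict(list)
    for clone in parent:
        groups[find(clone)].append(clone)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

For instance, hits `[("M1", "a01"), ("M1", "b07"), ("M2", "b07"), ("M2", "c12"), ("M3", "d04")]` yield the groups `["a01", "b07", "c12"]` and `["d04"]`: clone `b07` bridges markers M1 and M2, so all three clones end up in one group that could then be exported for clone ordering.
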
"The ability to use BAC-DMS does not depend on any SAS skills," Dr. Xavier Draye concluded. "However, all native SAS windows can be accessed during a BAC-DMS session, allowing almost any database transaction to be performed by a knowledgeable SAS user."

Université Catholique de Louvain

SAS DNA Markers Mapping at UCL

SAS software was chosen for the complex statistical work involved in mapping DNA markers for genome sequencing.

The results illustrated in this article are specific to the particular situations, business models, data input, and computing environments described herein. Each SAS customer’s experience is unique based on business and technical variables and all statements must be considered non-typical. Actual savings, results, and performance characteristics will vary depending on individual customer configurations and conditions. SAS does not guarantee or represent that every customer will achieve similar results. The only warranties for SAS products and services are those that are set forth in the express warranty statements in the written agreement for such products and services. Nothing herein should be construed as constituting an additional warranty. Customers have shared their successes with SAS as part of an agreed-upon contractual exchange or project success summarization following a successful implementation of SAS software. Brand and product names are trademarks of their respective companies.