Four steps to developing reusable code and production processes
This case study was adapted from the SAS Global Forum 2010 paper Creating Easily Reusable and Extensible Processes: Code that Thinks for Itself
By Faisal Dosani, Royal Bank of Canada; Lisa Eckler, Lisa Eckler Consulting Inc.; and Marje Fecht, Prowerk Consulting Ltd.
It's easy to write code that answers only one need, but more challenging to develop a hands-off process that adapts to many needs. Focusing on the bigger picture when projects and requests come across your desk allows you to create flexible, extensible solutions that avoid maintainability issues and speed results to market.
In creating a framework that consists of easily reusable and extensible code and processes, your aim should be a multi-layered, multi-component approach. The processes need to be adaptable and need to accommodate the presence or absence of any of the layers and some of the components.
The following are four steps for achieving that objective.
Step 1: Planning and requirements gathering
Defining a hierarchy of components at this stage will aid in identifying which are required and which are optional. The process must flow smoothly whether or not the optional components are present. This will support a modular design and will also help break the project into manageable sub-tasks. For example, you might need information on the enterprise, division or project component.
It's important to distinguish the consistent elements from the distinctive ones: What will always be included, either at the enterprise level or the division level, and what must always be customized for each division or project? You must also determine whether some of the customizations are filters that could be controlled by metadata or parameters, or whether they are truly unique requirements.
Identifying the intended recipients or users of each type of result at this early stage is helpful because it may expose details on what information should or shouldn't be included. Likely, the key users represent a cross-section, perhaps including business users, analysts and project planners. Each of these users will have different needs, which should be identified early.
The next step is to control the process. Can you anticipate what changes to the overall requirements are likely to occur in the future? If so, this should influence the design to maximize flexibility and ease maintainability.
The exploration of requirements should include a discussion of how you can deliver and validate your results. Delivering intermediate products for review and user acceptance during development will help structure the project and build respect and acceptance.
Step 2: Design
2.a Process and documentation
An equally important part of preparing the framework is properly documenting it. Documentation can include such things as data definition tables, database table definitions, flow diagrams and reporting templates. The documentation should also outline how the various components communicate with each other to ensure a symbiotic relationship throughout.
Designing the framework to incorporate a metadata module from the beginning will save time now and in the future, as metadata will drive which sections of the framework are called upon and when. Metadata will not only help centralize conditional logic but will help with processing, scheduling, delivering and automating. Each project will have a set of definitions within metadata describing certain cues or functionality that the framework will handle.
Instead of manually checking what gets run when or where the results need to be delivered, you can place some of those conditions in metadata. This cuts down on quality-assurance processing and development work in the future and offers much more flexibility.
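As a sketch, a metadata control dataset can be as simple as one row per project. The table name, columns and values below are illustrative assumptions, not part of the original case study:

```sas
/* Hypothetical metadata control dataset: one row per project.   */
/* Columns are illustrative; real cues depend on your framework. */
data work.project_metadata;
   length project_id $8 frequency $8 deliver_to $40 run_reports $1;
   input project_id $ frequency $ deliver_to $ run_reports $;
   datalines;
PROJ01 MONTHLY /shared/divA/reports Y
PROJ02 WEEKLY  /shared/divB/reports N
;
run;
```

Downstream modules read this table to decide which steps run and where results go, rather than hard-coding those decisions in every program.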
It is important to document the coding modules you anticipate building. They should follow similar naming conventions, storage locations and input/output styles.
2.b Modularizing your code
There are many possible levels of code and algorithm modularization. When determining which to employ, you should consider code complexity, applicability for generalization and ease of use.
Let's explore two modularization techniques used in this project: driver/source programs and macro modules.
Consider one very common approach to program creation: locate an existing program with functionality similar to what you require, copy the code and store it as a new and unrelated program, tailor it to the current task, and then run and test it as if it had never been used before. This results in many lengthy programs, all of which may require changes as business rules and data change.
Now consider the approach where you recognize that sections of your programs can easily be reused by just supplying the information that changes via macro variables or control datasets. You proceed to break the code into those sections and store them as separate, callable modules. This approach is what we call driver/source.
A repeatable process like this requires additional and extensive testing to confirm that changes to the source code modules accommodate all drivers that call the modules. But the payoff is that when business rules change, there is only one source program to revise rather than a multitude. And once the flexible process is set up, reuse of code is simple.
Assuming that you have generalized your processes into source modules, when you are ready to start reporting for a new project, just create a driver with necessary parameter values as input to your source modules.
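To illustrate, a driver for a new project might contain nothing but parameter assignments followed by calls to the shared source modules. All names and paths below are hypothetical:

```sas
/* driver_proj01.sas -- supplies only what changes per project. */
%let project_id   = PROJ01;
%let report_month = 2010-03;
%let out_path     = /shared/divA/reports;

/* The generic source modules do the real work (names assumed). */
%include "/framework/src/extract_data.sas";
%include "/framework/src/summarize.sas";
%include "/framework/src/build_report.sas";
```

When business rules change, only the source modules are revised; every driver that calls them picks up the change automatically.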
It should not come as a surprise that the SAS Macro language will likely be the choice for creating reusable modules within a framework. Macros have always been a great way to generalize by building code and passing parameters through a process to influence the results. We can write generic code that can handle several situations rather than repeating code over and over again. Macro tools will allow us to build the robust and versatile framework we desire.
Macro variables serve a similar purpose and are a component of the macro language. They can feed information into code that differs only by certain parameters, and they can pass instructions and conditions from metadata at a global level within your framework. This allows your code to stay generic and adapt when instructed to do so. Using metadata coupled with macros will truly enable hands-off processing, and through several iterations of code it will become obvious where these techniques can be applied.
Macros can also be used to control which process and logic gets executed. Metadata coupled with macro logic can help execute code based on certain conditions.
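As a minimal sketch of that idea, assuming a metadata table with a run_reports flag (the table, column and macro names are illustrative), a macro can look up a project's cue and execute logic only when the flag says so:

```sas
%macro run_project(project_id);
   /* Look up this project's cue in the metadata table. */
   %local run_flag;
   proc sql noprint;
      select run_reports into :run_flag trimmed
      from work.project_metadata
      where project_id = "&project_id";
   quit;

   %if &run_flag = Y %then %do;
      %put NOTE: Running reports for &project_id..;
      /* call the generic reporting module here */
   %end;
   %else %put NOTE: Reports suppressed for &project_id..;
%mend run_project;

%run_project(PROJ01)
```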
2.c Managing results
Each run of a program should produce its own log file. The log filename should include the SAS program name, for easy reference, along with a date-time stamp corresponding to when the program ran.
To automatically create the log file with each run of a program, use PROC PRINTTO to begin log writing at the top of the SAS program. We recommend further generalization of the program by using macro variables for the filename and file pathing to enable easy changes and usage for other purposes within the program (such as naming other files).
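A sketch of that pattern, with assumed names and paths held in macro variables so they can be reused elsewhere in the program:

```sas
/* Route the log to a file named for the program and its run time. */
%let pgm_name = build_report;                      /* assumed program name */
%let log_path = /framework/logs;                   /* assumed location     */
%let run_dttm = %sysfunc(datetime(), B8601DT15.);  /* e.g. 20100301T081500 */

proc printto log="&log_path/&pgm_name._&run_dttm..log" new;
run;

/* ... main program logic ... */

/* Restore the log to its default destination. */
proc printto;
run;
```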
What about SAS datasets?
When creating permanent SAS datasets, including a date-time stamp in the name may be difficult due to the 32-character limit. Further, the dataset name may need to be more static for downstream programs. Instead, consider generation datasets.
Each dataset in a generation data group has the same member name (dataset name) but a different version number. Every time the dataset is updated, a new generation dataset is created and the version numbers of the older versions are incremented. The most recent version is called the base version and is the one used by default when no version is specified.
What are the advantages of using generation datasets? Downstream programs can always reference the base version by its unchanging name, older versions remain available for comparison or rollback, and SAS automatically deletes the oldest version once the maximum number of generations is exceeded.
For further information on creating and using generation datasets, see the genmax and gennum dataset options in SAS online documentation.
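As a brief sketch (library and dataset names assumed), GENMAX sets how many generations to keep and GENNUM selects a specific or relative version:

```sas
libname proj "/framework/data";          /* assumed permanent library */

/* Keep up to four generations of the summary dataset. */
data proj.monthly_summary (genmax=4);
   set work.summary_current;
run;

/* Base (most recent) version -- referenced by the plain name. */
proc print data=proj.monthly_summary;
run;

/* Previous generation, one version back. */
proc print data=proj.monthly_summary (gennum=-1);
run;
```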
Step 3: Testing and quality assurance
Essentials of QA include testing each module on its own, testing each driver end to end, and confirming that changes to shared source modules still accommodate every driver that calls them.
Step 4: Rollout
You must consider how the new data requests will be handled, how and when jobs will be scheduled, how results will get sent to users and how to automate it all.
Metadata will be the driving force in addressing ways to handle the rollout process. It's used to help describe the processes you're running and will be the gatekeeper to managing their execution and delivery.
Imagine trying to determine which processes need to run monthly by working through a list of jobs and checking start and end dates. This is tedious, not to mention prone to human error. New data requests will involve the creation of new driver programs, which can be as simple as changing the project identifier: when the driver executes, it refers back to the metadata that describes the execution rules for that project. This type of processing greatly simplifies new data or report delivery, since metadata drives the processing.
Delivering the data to end-users also can be driven from metadata cues. You can set up delivery locations or methods conditionally, based on this metadata.
A logical extension of generating a driver program for each unique project is a single super-driver program that invokes the relevant processes for the active projects in each reporting period. The examples we have discussed can be taken a level higher, where driver programs are automatically generated and executed based on metadata. This means the metadata must be fairly extensive in describing all the rules and relationships needed to work properly. A master program could read the metadata and generate the conditional calls and rules to run the processes, as well as when to run them. The project start and end dates tell us when the framework needs to be invoked for each project. Divisional indicators can describe who receives results and where they receive them. Project definitions individualize the data to the requestors' needs. These metadata elements drive the use of the modular items created and designed in earlier sections.
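One way to sketch such a super-driver, assuming a metadata table that carries start_date and end_date columns and a %run_project macro that handles one project (all names here are hypothetical):

```sas
/* Generate one macro call per project that is active today. */
data _null_;
   set work.project_metadata;
   where start_date <= today() <= end_date;
   call execute(cats('%nrstr(%run_project(', project_id, '))'));
run;
```

CALL EXECUTE queues the generated macro calls and runs them after the DATA step ends, so the metadata alone determines which projects execute in a given period.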
Keep in mind that a repeatable process like this requires extensive testing to confirm that changes to the source code modules accommodate all drivers that call the modules. In the long run, time and effort is saved by building a robust modular process with wide applicability.
A more detailed version of this case study, complete with code samples, can be found on the SAS Customer Support Site.
The results illustrated in this article are specific to the particular situations, business models, data input, and computing environments described herein. Each SAS customer’s experience is unique based on business and technical variables and all statements must be considered non-typical. Actual savings, results, and performance characteristics will vary depending on individual customer configurations and conditions. SAS does not guarantee or represent that every customer will achieve similar results. The only warranties for SAS products and services are those that are set forth in the express warranty statements in the written agreement for such products and services. Nothing herein should be construed as constituting an additional warranty. Customers have shared their successes with SAS as part of an agreed-upon contractual exchange or project success summarization following a successful implementation of SAS software. Brand and product names are trademarks of their respective companies.
Copyright © SAS Institute Inc. All Rights Reserved.