Is high-performance analytics a catalyst for change?
Computing undergoes an evolutionary leap
There's a concept in evolutionary biology called "punctuated equilibria," used to explain why the fossil record seems to change abruptly after long periods of apparent calm. Think of a stairway: the depth of each tread represents how long a species survives before another life form adapts to take its place, and the height of each riser represents the degree of change from the older, pre-existing organism to the next step up.
This idea can be applied to data processing as well: technological conditions can remain relatively constant for extended periods, only to be followed by sudden and dramatic change.
Examples abound, but baby boomers like me can recall a time before personal computers (PCs), when most office work was done on a typewriter. People had used typewriters for more than a hundred years before the shift to PCs, which happened fairly quickly in the early 1980s. Nowadays, I challenge anyone to find a typewriter on which they could write a letter.
A giant leap forward in computing
We’re in the midst of another punctuated equilibria moment in technology: “distributed, (shared) in-memory processing,” collectively referred to at SAS as high-performance analytics.
While distributed computing and in-memory processing have been around separately as independent concepts for quite some time, the idea of combining them with a re-architecture of enabling code represents another giant leap forward in the computing world.
But what is distributed, in-memory processing, and how does it differ from other processing models?
Essentially, the new processing paradigm dissects a single problem into many different subsets, so that each computer completes work on a small portion of the problem while maintaining a two-way communication channel with all the other computers working on different parts of the same problem.
The in-memory piece allows for faster consumption of the inputs because there is no costly delay in reading or acquiring data from disk, which is where most of the time is spent, as opposed to doing calculations.
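As a loose illustration of that trade-off, here is a minimal Python sketch (the file path, dataset, and function names are all hypothetical) contrasting a disk-bound model, which re-reads the data on every analytic pass, with an in-memory model, which pays the read cost once:

```python
import os
import tempfile

# Write a small sample dataset to disk (a stand-in for a large data store).
path = os.path.join(tempfile.gettempdir(), "sample_data.txt")
with open(path, "w") as f:
    f.write("\n".join(str(x) for x in range(10_000)))

def passes_from_disk(n_passes):
    """Disk-bound model: every analytic pass re-reads the data from disk."""
    results = []
    for _ in range(n_passes):
        with open(path) as f:          # costly I/O, repeated on each pass
            data = [int(line) for line in f]
        results.append(sum(data))
    return results

def passes_in_memory(n_passes):
    """In-memory model: read once, then run every pass against memory."""
    with open(path) as f:              # I/O cost paid exactly once
        data = [int(line) for line in f]
    return [sum(data) for _ in range(n_passes)]
```

Both versions produce identical answers; the difference is that the in-memory model touches the comparatively slow storage layer once instead of once per pass.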
Additionally, there is an automatic step at the end in which the individual, disparate results are collected and assembled for final interpretation and presentation.
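The scatter/gather flow just described can be sketched in a few lines of Python, with a thread pool standing in for a cluster of machines. The names here are illustrative assumptions, not SAS's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def process_slice(chunk):
    # Each worker completes work on its small portion of the problem.
    return sum(x * x for x in chunk)

def scatter_gather(data, n_workers=4):
    # Scatter: dissect the single problem into many smaller subsets.
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Process: each worker handles one subset (on a real cluster these
    # would be separate machines, each holding its slice in memory).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_slice, chunks))
    # Gather: the automatic step that assembles the disparate partial
    # results into a final answer.
    return sum(partials)
```

The same pattern generalizes to any operation whose partial results can be combined, which is the kind of re-architecture of enabling code the article describes.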
Think of a pie-eating contest where one person who's trying to finish a whole pie by himself races against a group of pie eaters who devour small slices of their pie until the whole pie is consumed. By having more pie eaters working on smaller slices, the group will finish the whole pie faster (as long as the overhead of slicing the pie and delivering the slices is low). And because each member of the group can communicate before and after eating, they can share information that might make their eating go faster.
Grid computing was the first logical extension of using multiple groups of processors to solve a specific problem. To accomplish its work, SAS Grid Computing, also a high-performance product, replicates the entire work problem on each computer instead of sub-sectioning it into smaller pieces.
In our pie-eating analogy, an optimized grid computing environment essentially attempts to bake smaller, but still whole, pies. The pies can be reduced in size, but all have to be consumed separately, and none of the eating group can communicate with one another before or after the task at hand.
Additionally, at the end of the contest, all the results have to be manually tabulated, since there is no automated results aggregation.
Inevitably, there are more manual steps and some duplication involved in grid computing as compared to distributed, in-memory processing, but both are still orders of magnitude faster than having a single processor (even if it is multi-threaded) solve the entire problem.
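The grid pattern described above can be sketched as each worker repeating the whole job independently, with no cross-communication and with results tabulated afterward. Again, this is a hypothetical illustration, not SAS Grid Computing's actual interface:

```python
from concurrent.futures import ThreadPoolExecutor

def whole_job(scenario):
    # Each grid node runs the *entire* problem on its own copy of the work,
    # here distinguished only by a scenario parameter; the workers never
    # exchange intermediate results with one another.
    return scenario, sum(x * scenario for x in range(1000))

scenarios = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=len(scenarios)) as pool:
    independent_runs = dict(pool.map(whole_job, scenarios))
# There is no built-in aggregation step: collecting and tabulating these
# independent results is left to the user.
```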
What has been missing is a lightweight communication layer that could pass data processing instructions from a single origination point to many other data processing points, or nodes, that were connected in a vast network or array type of framework. SAS has now created a standardized communication layer and is currently well ahead of any competitor in this field.
This innovative work has created a foundation upon which the future of the company will be based, a point often articulated by SAS CEO Jim Goodnight. As early as January 2008, Goodnight directed his development teams to begin redesigning certain SAS analytic procedures specifically to take advantage of server farms, as he correctly envisioned this hardware configuration (also referred to as "cloud computing") as the dominant form of future computing.
However, the magnitude of the problem faced by R&D was truly monumental. The difficulty was not simply achieving the first set of results, but rather implementing a conceptual methodology that all R&D teams could easily learn and apply. Overcoming this initial obstacle laid the groundwork for integrating these speed improvements into many different SAS solutions.
As a result of our development breakthrough, these are exciting times at SAS! SAS has chosen to lead this pioneering effort and act as a true catalyst for change in the software industry.
The positive impact on our customers, in the form of more efficient business cycles, is just beginning to be felt, but there is already good evidence that a large portion of business processes will be radically transformed within the next decade because of this new technology.
So why bring up the idea of punctuated equilibria in the first place? Basically, it's the age-old story we have always told our customers: Adapt or be displaced by the competition. Just as the scientific evolutionary model predicts, the businesses that find ways to better exploit their data through improved technology will be better positioned to out-compete their less nimble rivals.