SAS Analytics AND Open Source: Why Should Organizations Consider SAS?
By Steve Holder, National Lead Analytics SAS Canada and Tina Schweihofer, Senior Solutions Principal SAS Canada
When Linus Torvalds introduced the Linux operating system, the pioneer of Free and Open Source Software (FOSS), it was hard to predict what impact open source would have on computing. It took Linux nearly 10 years to find its place and become an enterprise standard; today the time to standardization is much shorter. This acceleration means the open source community is developing enterprise-ready software faster, which affects the entire software landscape. To prove the point, today Apache Tomcat dominates the web server software market and the various distributions of Linux constitute the go-to operating system on enterprise servers.
There are several factors driving the growth of open source in the enterprise.
* Low cost to get started. The software is free, though vendors are allowed to repackage open source software with various utilities and customized functionality and sell it.
* A large and very active community of developers. By its very nature, it’s hard to tell how many there are. In 2012, Marten Mickos, CEO of HackerOne, estimated there may be as many as 300,000 open source projects and 450,000 contributors—not including those with full-time day jobs developing open source.
* Rapid innovation. Any improvement or change to the code of an open source project can be made quickly and immediately goes back into the pool. Users can choose to adopt it or not based on their needs, and other contributors can then repurpose the code if need be. This is possible because open source software is released under licenses, such as the GNU General Public License (GPL), that make it free to use and distribute. The result is more people working on creating and innovating.
SAS AND OPEN SOURCE
One of the most exciting developments on the open source front recently has been Hadoop, a framework for distributed storage and processing of huge data sets on clusters of commodity hardware. An entire ecosystem of software packages that can be run on top of or alongside the base elements (Hadoop Distributed File System for storage, MapReduce for processing) has sprung up. This ability to handle Big Data problems means Hadoop and analytics products from SAS Institute are ideally complementary.
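To make the MapReduce processing model mentioned above concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. On a real cluster, the map and reduce steps would run in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data on commodity hardware"]
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, the work can be split across commodity machines, which is what makes the model a good fit for very large data sets.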
SAS products aren’t meant to replace the functionality of open source projects. They’re meant to augment it to create an enterprise analytics platform. There are essentially two ways to achieve this integration, both of which massively improve performance and usability—running SAS inside an open source environment, or running open source within SAS.
SAS IN OPEN SOURCE
Running SAS in an open source environment eases the transition, in a number of ways, for users who aren’t familiar with SAS.
* Base SAS offers a Java object to respond to a variety of external programming languages, including open source Python.
* SAS Procedures can be called from open source tools.
* The Jupyter Kernel for SAS brings data manipulation and analytics capabilities to the IPython-based Jupyter notebook.
Data scientists can maintain a familiar user interface, while allowing SAS to extend the productivity and scalability of open source applications.
OPEN SOURCE IN SAS
SAS products offer integration features that leverage the power of open source applications. SAS Enterprise Miner, for example, has support for R, a programming language and software environment for statistical analysis. It supports a variety of R packages and can import any R code. An open source node imports any open source model.
This allows the creation of ensemble or blended models, combining models from SAS and open source applications. These models can be converted to score code for deployment with a drag-and-drop interface. SAS automatically documents best practices to promote collaboration and reduce turnover risk.
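The idea of a blended model can be sketched in a few lines of Python. This is a generic illustration of weighted-average ensembling, not SAS Enterprise Miner's implementation or API. The two scoring functions are hypothetical stand-ins for a model built in SAS and a model built in an open source tool, and their thresholds are made up for the example.

```python
from typing import Callable, List, Optional, Sequence

# A scorer takes a feature vector and returns a probability-like score.
Scorer = Callable[[Sequence[float]], float]


def model_a(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a model exported from one tool."""
    return 0.8 if features[0] > 1.0 else 0.2


def model_b(features: Sequence[float]) -> float:
    """Hypothetical stand-in for a model exported from another tool."""
    return 0.6 if features[1] > 0.0 else 0.4


def blend(scorers: List[Scorer], weights: Optional[List[float]] = None) -> Scorer:
    """Return a scorer that outputs the weighted average of the given scorers."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    if weights is None:
        weights = [1.0] * len(scorers)  # default: simple average
    if len(weights) != len(scorers):
        raise ValueError("weights and scorers must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    def blended(features: Sequence[float]) -> float:
        return sum(w * s(features) for w, s in zip(weights, scorers)) / total

    return blended


if __name__ == "__main__":
    ensemble = blend([model_a, model_b])
    print(ensemble([2.0, 1.0]))
```

Treating every model as a plain scoring function is the design choice that matters here: once models from different tools expose the same interface, combining them is independent of where each one was built.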
Integrating SAS and open source systems reduces data movement. Moving data slows production and can leave stray, out-of-date data elements behind, opening the door to inaccuracies.
SAS supports Hadoop throughout the entire analytics lifecycle: preparation of data, exploration of data, modeling, inventory and execution.
DEPLOYMENT IN THE CLOUD
Open source has changed the economics and ethos of enterprise computing. Another long-growing trend is cloud computing, which has changed the IT deployment model at its foundation. According to a survey by RightScale, companies are using on average six clouds (three public and three private), while hybrid cloud operations continue to gain popularity. Research firm Gartner predicts $114 billion in cloud spending for 2016, a number that will rise to $216 billion by 2020. That’s a shift of $1 trillion in spending from traditional hardware to cloud computing.
In addition to augmenting open source environments, SAS has embraced open source to help enterprises deploy their SAS analytics platforms. SAS embraces cloud-enabling technologies such as Docker, a software container deployment engine; Cloud Foundry, a platform-as-a-service that makes it easier to build, deploy, run and scale applications; Ansible, an automation engine for cloud deployments; and GitHub, which tracks configuration changes and offers code and public repositories.
VIYA: THE FUTURE IS OPEN
To leverage these technological innovations, SAS has created a new analytics architecture: Viya. SAS Viya is designed to be:
* Cloud-ready. SAS Viya is built to be elastic and scalable for both private and public clouds using the industry-standard Cloud Foundry deployment platform. Multi-cloud architectures can natively run in a range of cloud infrastructures.
* Open. Users who don’t code in SAS can access it in Python, Lua, Java, and other programming languages, as well as by using public REST APIs.
* Unified. From a visual interface, using one common code base, organizations can manage their entire Analytical Lifecycle, from data management to model building – including the necessary visualization and reporting.
* Simple. Write code once; deploy it anywhere. Users can run and score analytical code in-memory, in a data stream, in-database, in-Hadoop, in the cloud, or even on a device. SAS Viya also governs and inventories models built with SAS and non-SAS tools.
SAS remains committed to ensuring our tools work in open source environments, as evidenced by our participation in the Open Data Platform Initiative and Data Governance Initiative. But this article barely scratches the surface of the integration of SAS products and open source. For a more comprehensive discussion, read our point of view, SAS in the Open Ecosystem.