Developing a central platform for overcoming integration challenges in big data exploration

The Statistical Office of the Republic of Slovenia (SURS) has used SAS to develop general software solutions for statistical data processing and to overcome the challenges of integrating business processes.

In the past, SURS, the main provider and coordinator of Slovenian national statistics, carried out statistical processing on the central computer of the Government Centre for Informatics. With the transition from the central computer to the LAN environment, SURS had to find an alternative to certain standard software packages that could not be used in the LAN environment. “SURS already used SAS on a central computer; however, its use has expanded enormously with the transition to the LAN environment. SAS played a key role in the smooth transition to the LAN environment with effective and statistician-friendly tools,” says Genovefa Ružić, SURS General Director.

“The main SAS advantages are user-friendly tools and an open architecture, which follows the latest trends.”
Genovefa Ružić, General Director at SURS

SAS as a central development platform for data preparation

With the transition to the LAN environment, SURS started to use a variety of tools, which, according to General Director Ružić, made it an advanced user compared with other statistical offices that use SAS. “Most of the other statistical offices use SAS for analytical purposes, as a solution for business intelligence. Few statistical offices use SAS as a platform for data integration, and fewer still use it as a major development platform,” explains Ružić.

SURS developed general software solutions for statistical data processing by means of SAS. The solutions offered by SAS enable SURS to overcome the challenges of business process integration and to introduce service-oriented architecture (SOA) in the statistical production system. “With its open service-oriented development platform, SAS enables rapid and efficient development of generic building blocks of the statistical production system and their integration with other development platforms at SURS,” adds Ružić. She notes that SAS is the central platform for data preparation (which includes cleaning, arrangement, and transformation) and for the implementation of statistical methods and analyses. It also enables advanced “in-memory” statistical analysis, which is especially important when analysing mass data.

For simple and highly complex statistical procedures

The SAS tools support both very simple procedures, such as the calculation of basic descriptive statistics, and highly complex ones based on statistical models.

These tools play a key role in the Process and Communication Division and the Information Infrastructure and Technology Division, which prepare samples for research and general solutions. The tools also allow substantive methodologists to carry out statistical considerations, data aggregation, and the calculation of quality rates.

SAS® Contextual Analysis for the analysis of big data sources

Every statistical office strives to innovate in the field of statistical services. The same applies to SURS, which also deals with the exploration of big data sources. “One of the characteristics of big data sources is their non-structured nature, which prevents their classification and, consequently, the detection of the characteristics of this data,” says the General Director.

The SAS® Contextual Analysis solution is also intended for users who are not experts in programming or statistical models. SURS is successfully using the tool in a pilot programme, primarily for collecting data on job vacancies. In the first phase, it searches the given companies’ websites for those URLs that are potentially associated with vacancy advertisements. In the second phase, it uses various machine learning methods to detect ads, determine the number of vacancies, and classify texts into particular groups. In this respect, SAS® Contextual Analysis is seen as a potentially very useful tool whose graphical interfaces make it relatively easy to use.
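As a rough illustration of the first phase only, the following Python sketch flags URLs whose path or query string contains a vacancy-related term. It is not the SURS or SAS® Contextual Analysis implementation: the keyword list `VACANCY_HINTS`, the function names, and the sample URLs are all hypothetical, and the source does not say which rules or terms the pilot actually uses. The machine learning step of the second phase is not shown.

```python
from urllib.parse import urlparse

# Hypothetical hint terms (English and Slovenian); the real pilot's
# selection criteria are not described in the source.
VACANCY_HINTS = ("job", "career", "vacanc", "zaposlitev", "kariera")


def is_vacancy_candidate(url: str) -> bool:
    """Return True if the URL path or query contains a vacancy-related hint."""
    parsed = urlparse(url.lower())
    # Only the path and query are inspected; the host belongs to the
    # company whose site is being searched.
    haystack = f"{parsed.path}?{parsed.query}"
    return any(hint in haystack for hint in VACANCY_HINTS)


def filter_candidates(urls):
    """Keep only URLs that may lead to vacancy advertisements."""
    return [u for u in urls if is_vacancy_candidate(u)]


if __name__ == "__main__":
    # Illustrative URLs under a placeholder domain.
    sample = [
        "https://example.com/careers/open-positions",
        "https://example.com/products",
        "https://example.com/Kariera",
    ]
    for url in filter_candidates(sample):
        print(url)
```

A keyword filter of this kind only narrows the set of pages to inspect; deciding whether a page really contains an advertisement, and how many vacancies it lists, is the task the article assigns to the second phase.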

Better efficiency and usability of procedures

“The main SAS advantages are user-friendly tools and an open architecture, which follows the latest trends,” concludes the General Director. Their application results in significantly improved efficiency and usability of procedures and processes. 

SAS user-friendly tools and open architecture enable the exploration of non-structured big data sources and their detection and classification for further use.

