SAS® Text Miner

Capitalize on the value hidden in textual information

SAS Text Miner discovers information buried in unstructured text collections, saving time and money by automating the tasks of reading and comprehending electronic text. Analyze legacy data stored in your IT department – and dynamically reach outside to retrieve pertinent, fresh Web content. Through interactive drill-down reporting and visualizations, SAS Text Miner helps you distill insight, grasp topical trends and act on new opportunities more efficiently and with less risk. The software builds on the core data mining solution of SAS® Enterprise Miner™.

Benefits

  • Reduce decision time through automated processes.
  • Enhance the discovery process by uncovering associations and relationships.
  • Present a high-level view of data with deep drill-down capability.
  • Recognize trends and spot business opportunities.

Features

  • Universal data access and text import node
  • Support for multiple languages
  • User-friendly, flexible interface
  • Text parsing node
  • Text filter node
  • Dimension reduction techniques
  • Text topic and text clustering nodes
  • 360-degree view of your data

"No other software delivers this depth and breadth of analytic functionality."

—Patricia Cerrito, PhD

Professor of Mathematics

University of Louisville

How SAS® Is Different

SAS Text Miner provides a rich suite of linguistic and analytical modeling tools for discovering and extracting knowledge from across text collections.

  • Advanced data capture, with more than 27 supported languages and dialects, provides built-in capabilities to transcode and identify the language, convert the file format, and import from file systems and the Web.
  • Within a single, integrated environment, text mining results extend the understanding of structured information and can be readily included in both descriptive and predictive modeling.
  • SAS Text Miner's interactive results browser enables analysts to explore concepts and relationships between documents and dynamically make modifications to further tailor analyses.
  • The software lets you embed text analytics results directly into operational systems or into common reporting systems, such as Microsoft Outlook, Excel or Word – for intelligence you can act on.

Benefits

  • Reduce decision time through automated processes. Intelligent algorithms and natural language processing techniques automate time-consuming activities previously done manually – such as theme identification, tagging or topic library building, and document index creation – and execute them efficiently.
  • Enhance the discovery process by uncovering associations and relationships. Extend the value of your text mining efforts by importing custom (known) entities, facts and events into SAS Text Miner; then apply data-driven methods to reveal buried relationships and identify concepts that build upon what you already know.
  • Present a high-level view of data with deep drill-down capability. SAS Text Miner offers a visual presentation of the entire data-mining process and allows users to drill down to relevant details, illustrating the connections and exploring the links between items in document collections; an interactive interface lets you investigate derived topics and fine-tune the model.
  • Recognize trends and spot business opportunities. Building on the full range of predictive modeling tools in SAS Enterprise Miner, you can analyze information like customer letters and call center notes to understand customer, service and product needs – and predict opportunities for timely exploitation.

Features

Universal data access and text import node
  • Lets you dynamically create data sets from files contained in a directory or from the Web.
  • Gives access to numerous forms of textual data, including PDF, Microsoft Word and other Microsoft Office formats, extended ASCII text, HTML, spreadsheets, presentations, email and database formats.
  • Extracts, transforms and loads textual data into a SAS data set for mining.
  • Accepts even potentially proprietary formats, converts the formats, and filters or extracts the text from the files, placing a copy in a plain text file and referencing the data to a SAS data set for mining.
  • Provides Web-crawling capabilities, covering sources such as Twitter discussions and news feeds, that retrieve files and bring them to the common directory before filtering; the output can be used by the text parsing node.
  • Identifies each document's language and transcodes it to the session encoding format.
Support for multiple languages
  • Support for Latin-1, Double Byte Character and UTF-8 encodings.
  • European languages (Latin-1 encoding): Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish and Vietnamese.
  • Eastern languages (Double Byte Character Support): Arabic, Chinese, Japanese, Korean.
  • Dialects: Simplified and Traditional Chinese, US and UK English, Parisian and Canadian French, Old and New World German, Nynorsk and Bokmål Norwegian, European and Brazilian Portuguese, and Spanish from both South America and Spain.
  • The product ships with English and, where the native language is not English, with that language as well. Additional languages are licensed as add-ons.
User-friendly, flexible interface
  • Text mining is encapsulated into four different nodes corresponding to common tasks that can be combined in any way.
  • Reusability of existing assets can be improved by creating synonym data sets and importing previously defined synonyms into the text filter node.
  • The text filter node allows editing of any existing subset of documents.
  • Both the text filter and text topic viewers permit you to search for specific terms of interest.
  • Text topic view allows creation of a user-specified number of topics, and also:
    • Permits export of raw rotated singular value decomposition (SVD) topic values (for use in any predictive modeling nodes).
    • Retains user-specified term and document cutoff values across reruns of the text topic node.
  • Table editing lets you sort columns and insert and delete multiple rows.
  • The concept link diagram displays a visual relationship between terms.
  • Process-flow diagrams of text mining analysis can be modified, saved and shared with others.
  • Flexible reporting allows results to be published in a concise HTML format.
  • Text nodes operate directly with a variety of SAS Enterprise Miner nodes.
  • Text nodes can be extended further by customizing algorithms or declaring new user-written business rules for predictive modeling, clustering, visualization and reporting – deployable as SAS score code.
Text parsing node
  • Default or customized stop lists will remove terms with little or no informational value from your analysis.
  • Automated spelling correction.
  • Automatic stemming to identify root words.
  • Automatic part-of-speech tagging based on sentence context.
  • Noun group extraction for identifying phrase-level concepts such as "competitive intelligence."
  • Out-of-the-box support for many different entity types, including person and company names, locations, dates, addresses, measurements, and email and URL addresses.
    • Entities are customized for every language supported.
    • Custom entities can be included with the SAS Concept Creation for SAS Text Miner add-on.
  • User-defined multiword tokens, such as "point and click."
  • User-customized and default synonym lists.
  • Comprehensive capabilities include compound word splitting into distinct subterms.
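
The stop list, synonym list and multiword token features above can be pictured with a small sketch. This is not SAS code and does not reflect how the text parsing node is implemented; the lists, the underscore convention for multiword tokens and the parse function are all hypothetical, chosen only to illustrate the ideas.

```python
import re

# Hypothetical resources; a real stop list or synonym list would be far larger.
STOP_LIST = {"the", "a", "and", "of", "to", "is"}
SYNONYMS = {"autos": "car", "automobile": "car", "cars": "car"}
MULTIWORD_TOKENS = ["point and click"]


def parse(text):
    """Return the parsed terms of one document, in order of appearance."""
    text = text.lower()
    # Protect multiword tokens by joining their words with underscores
    # (an assumed convention, so that "and" inside them is not dropped).
    for phrase in MULTIWORD_TOKENS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    # Map each token to its synonym parent, then drop stop-list terms.
    terms = [SYNONYMS.get(tok, tok) for tok in tokens]
    return [t for t in terms if t not in STOP_LIST]


print(parse("The automobile is a point and click purchase"))
```

Stemming, part-of-speech tagging and entity extraction are left out of this sketch because they depend on language-specific resources.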
Text filter node
  • Contains a concise view of documents and vocabulary on all terms discovered during parsing, with metrics such as frequency counts.
  • Automates spell checking by mapping misspelled words to the correctly spelled terms they most likely stand for.
  • Applies Google-like searches or SQL WHERE clauses to subset the documents for analysis (for example, conducting a separate warranty analysis for each make or model of automobile).
  • Can programmatically and interactively distinguish and filter out unimportant terms, easily map abbreviations and represent other equivalent terms.
  • Provides integrated, full-text search capabilities, and an interactive query will retrieve individual documents matching whatever search parameters you specify.
  • Lets you base filters on any characteristic, including presence or absence of terms, and offers interactive visualization so you can probe until you find documents and terms that meet your specifications.
  • Includes concept maps to link terms, phrases and entities in a visual, interactive manner to highlight previously undetected patterns.
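
As a rough illustration of the frequency metrics and presence/absence filters described above, the following sketch works on already-parsed documents. The document ids, the term lists and both function names are hypothetical, and the sketch makes no claim about how the text filter node works internally.

```python
from collections import Counter

# Hypothetical parsed documents: document id -> list of terms.
DOCS = {
    "d1": ["brake", "noise", "warranty"],
    "d2": ["engine", "noise"],
    "d3": ["brake", "recall", "brake"],
}


def term_metrics(docs):
    """Count total occurrences and the number of documents for each term."""
    freq = Counter()
    doc_count = Counter()
    for terms in docs.values():
        freq.update(terms)            # every occurrence counts
        doc_count.update(set(terms))  # each document counts once per term
    return freq, doc_count


def filter_docs(docs, present=(), absent=()):
    """Return ids of documents with every `present` term and no `absent` term."""
    return sorted(
        doc_id
        for doc_id, terms in docs.items()
        if set(present) <= set(terms) and not set(absent) & set(terms)
    )


freq, doc_count = term_metrics(DOCS)
print(freq["brake"], doc_count["brake"])
print(filter_docs(DOCS, present=["brake"], absent=["recall"]))
```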
Dimension reduction techniques
  • Roll-up terms automatically identify the n highest-weighted terms in a document.
  • Singular value decomposition (SVD) transforms each document into an n-dimensional space where the closer two documents are in that space, the more similar they are.
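
A minimal sketch of the roll-up idea follows, together with a cosine similarity function to show what "closer means more similar" looks like for document vectors. The weights, document names and function names are hypothetical. The sketch does not perform SVD, which would normally come from a linear algebra library; it only compares documents on their rolled-up terms.

```python
import math

# Hypothetical term weights per document (for example, weighted frequencies).
DOC_WEIGHTS = {
    "d1": {"brake": 2.0, "noise": 1.0, "dealer": 0.2},
    "d2": {"brake": 1.5, "noise": 1.2, "paint": 0.1},
    "d3": {"paint": 2.5, "rust": 1.8, "dealer": 0.3},
}


def top_terms(weights, n):
    """Return the n highest-weighted terms of one document (ties broken by name)."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:n]


def rolled_vector(weights, n):
    """Sparse document vector restricted to its n highest-weighted terms."""
    return {t: weights[t] for t in top_terms(weights, n)}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


vecs = {d: rolled_vector(w, 2) for d, w in DOC_WEIGHTS.items()}
print(top_terms(DOC_WEIGHTS["d1"], 2))
print(cosine(vecs["d1"], vecs["d2"]) > cosine(vecs["d1"], vecs["d3"]))
```

In this toy data the two documents about brakes and noise score close to 1, while the document about paint and rust shares no rolled-up term with them and scores 0.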
Text topic and text clustering nodes
  • Documents can belong to exactly one cluster (text clustering) or to multiple topics at once (text topics).
  • Taxonomy browser displays the default topics automatically generated, as well as the manually created topics defined by the user.
  • Documents can be categorized as belonging to zero, one or even many different topics.
  • Topics can be customized interactively in an easy-to-comprehend and intuitive visual environment.
  • Expectation-maximization clustering groups documents into discrete nonoverlapping clusters (also known as hard clustering) using spatial clustering techniques.
  • Users have enhanced control over clustering parameters with the text cluster node.
  • Hierarchical clustering facilitates automatic grouping of documents into taxonomies.
  • Overall analysis (e.g., by age, purchase propensity or churn) is enhanced as the software profiles clusters and topics by incorporating structured data from original documents.
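
The difference between topic membership (zero, one or many topics per document) and hard clustering (exactly one group per document) can be sketched as follows. The topic definitions, the cutoff value and the function names are hypothetical. The hard assignment here simply picks the best-scoring topic to show the contrast; it is not expectation-maximization or hierarchical clustering.

```python
# Hypothetical topics: each maps terms to weights.
TOPICS = {
    "brakes": {"brake": 1.0, "pad": 0.8, "noise": 0.4},
    "engine": {"engine": 1.0, "stall": 0.9, "noise": 0.4},
}
DOC_CUTOFF = 0.9  # assumed document cutoff; a lower score means no membership


def topic_scores(terms, topics=TOPICS):
    """Score a document against each topic by summing weights of terms it contains."""
    present = set(terms)
    return {
        name: sum(w for term, w in weights.items() if term in present)
        for name, weights in topics.items()
    }


def soft_membership(terms, cutoff=DOC_CUTOFF):
    """Topics whose score reaches the cutoff: may be zero, one or many."""
    return sorted(n for n, s in topic_scores(terms).items() if s >= cutoff)


def hard_assignment(terms):
    """Exactly one label: the best-scoring topic (ties broken by name)."""
    scores = topic_scores(terms)
    return min(scores, key=lambda n: (-scores[n], n))


print(soft_membership(["brake", "engine", "noise"]))
print(soft_membership(["paint"]))
print(hard_assignment(["engine", "stall"]))
```

Raising or lowering the cutoff changes how many documents fall into each topic, which is one reason a retained cutoff value matters when an analysis is rerun.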
360-degree view of your data
  • Combines textual data with traditional structured data mining to automate, visualize, classify and deploy your predictive modeling results.
  • Displays performance assessments of multiple models side-by-side, helping you to select the best one to deploy in your operations for scoring new documents.
  • Directly integrates output from SAS Enterprise Content Categorization into your text mining analysis.
    • Discovered topics and themes produced by SAS Text Miner are valuable input for SAS Enterprise Content Categorization, especially in situations where taxonomies did not previously exist.

Screenshots

Visually explore term frequencies

Visually explore term frequencies and other metrics automatically generated from document collections.

Search and filter terms

Filter viewing capability includes find/repeat find and the ability to import synonyms.

Concept linking

The concept linking window allows visual exploration of grouped terms.

Task linking

Link together data and analysis tasks for text and structured data mining.

System Requirements

Supported platforms
  • HP-UX Itanium: HP-UX 11iv3 (11.31)
  • IBM AIX: Version 6.1 and 7.1 on POWER architectures
  • Linux for x64 (EM64T/AMD64): RHEL 5 and 6, SuSE SLES 10 and 11
  • Microsoft Windows (x86-32): Windows XP Professional, Windows Vista*, Windows 7**, Windows Server 2003 family, Windows Server 2008 family
  • Microsoft Windows on x64 (EM64T/AMD64): Windows XP Professional for x64, Windows Vista* for x64, Windows 7** for x64, Windows Server 2003 for x64, Windows Server 2008 for x64
  • Solaris on SPARC: Version 10
  • Solaris on x64: Version 10
Supported Web browsers
  • Internet Explorer 7 and 8 on Windows XP Pro, Windows Vista* and Windows 7**
  • Firefox 3.6 on Windows XP Pro, Windows Vista*, Windows 7** and Linux 32-bit, Linux x64
Middle tier required/optional software
  • SAS client and middle tier require Sun JRE 1.6
Required software
  • SAS Enterprise Miner is required and must be installed on the same machine as SAS Text Miner; or, SAS Enterprise Miner for Desktop is required and must be installed on the same machine as SAS Text Miner for Desktop
SAS® Text Miner for Desktop

Client tier (only)

  • Microsoft Windows (x86-32): Windows XP Professional, Windows Vista*, Windows 7**
  • Microsoft Windows (x64): Windows XP Professional for x64, Windows Vista* for x64, Windows 7** for x64

*NOTE: Windows Vista editions that are supported include Enterprise, Business and Ultimate.
**NOTE: Windows 7 supported editions are Professional, Enterprise and Ultimate.

Ready to learn more?

Call us at 1-800-727-0025 (US and Canada) or request more information.