SAS® Text Miner

Discover new insights from in-depth content analysis

SAS Text Miner adds advanced linguistic capabilities to the core data mining solution, SAS® Enterprise Miner™. Combining the analysis of structured (quantitative) data with the analysis of unstructured (free-form text) data gives a more complete view and more meaningful insights within an integrated predictive modeling environment. By automating the manual review of textual data sources, providing interactive drill-down reporting, and delivering algorithms for rigorous advanced analyses, the software helps you anticipate trends and act on new opportunities more efficiently and with less risk.

Benefits

  • Reduce decision time with automated processes.
  • Enhance the discovery process with subject-matter expertise.
  • Present a high-level view of data with transparent drill-down capability.
  • Recognize trends and spot business opportunities.

Features

  • Automatic Boolean rule generation makes it easy to classify content
  • User-friendly, flexible interface
  • Integrated document filtering
  • Visual analysis of results
  • Choose predefined entities, define your own, or create custom entities for fact and event extraction
  • Interactive interface for importing text from the Web or internal file systems
  • Natively supports multiple languages

"With our SAS solution, we can greatly reduce the number of customers that ever see a failure. The result is obvious – that we'll have more satisfied customers."

—Joshua Becker

Manager of Reliability

Sub-Zero Freezer Company



How SAS® Is Different

  • SAS Text Miner delivers superior results by automating tedious, time-consuming tasks while allowing subject-matter experts to intervene if they need to refine and retrain the sophisticated text algorithms.
  • The software enhances the value of your predictive models by including social media conversations (blogs, Twitter, Facebook, etc.), news, comments, and research innovations as they occur across the globe.
  • SAS Text Miner's interactive results browser enables analysts to explore concepts and relationships between documents and dynamically make modifications to further tailor analyses and understand details.
  • The software lets you embed text analytics results directly into operational systems or into common reporting tools, such as Microsoft Outlook, Excel or Word, as well as SAS Business Intelligence technologies, to support better decisions.
  • For organizations that must uncover new insights from millions or tens of millions of documents and run models in minutes or seconds, high-performance text miner nodes can run in a distributed, in-memory environment (SAS High-Performance Analytics Server, which is licensed separately).

Benefits

  • Reduce decision time with automated processes. By implementing intelligent algorithms and natural language processing techniques, time-consuming manual activities – such as synonym identification or topic building – can be automated and executed consistently and efficiently.
  • Enhance the discovery process with subject-matter expertise. Through a unique, data-driven method of identifying key concepts, you can use interactive GUIs to modify relevance scores and guide machine-learning results with human insight – supporting active learning. You can also extend text mining efforts beyond start and stop lists – as well as what the software automatically discovers – using predefined and custom entities.
  • Present a high-level view of data with transparent drill-down capability. SAS Text Miner offers a visual presentation of the entire data mining process and allows users to drill to relevant details, illustrating and exploring the term connections. The interactive interface also lets you investigate derived topics and fine-tune models.
  • Recognize trends and spot business opportunities. SAS Text Miner can structure text into numeric representations that summarize the collection and become insightful inputs to a full range of predictive and data mining modeling techniques. In turn, you can better understand customer, service and product needs – and predict opportunities for timely exploitation.
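
As a rough illustration of what "structuring text into numeric representations" can mean, the sketch below builds a plain term-by-document count matrix in Python. It is not SAS Text Miner's implementation; the function name, the whitespace tokenizer and the sample documents are all assumptions made for this example.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the count of vocab[j] in docs[i]."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for term, count in Counter(tokens).items():
            row[column[term]] = count
        matrix.append(row)
    return vocab, matrix

# Hypothetical service notes, invented for illustration.
docs = ["freezer door seal failed", "door hinge noisy", "seal failed again"]
vocab, matrix = term_document_matrix(docs)
```

Each row of `matrix` is a numeric vector describing one document, which is the kind of input a predictive model can consume alongside structured variables.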

Features

Automatic Boolean rule generation makes it easy to classify content
  • Lets you describe and predict a target variable based on the detailed terms. Resulting rules can be used to categorize documents based on rule matches.
  • Easily export the Boolean rules as a starter rule set for SAS Enterprise Content Categorization.
  • Enables active learning by:
    • Allowing you to interactively build the algorithm.
    • Providing automated, machine-generated suggestions of categories and topics that can be recharacterized by the user.
    • Regenerating rules based on user modifications – the model is updated accordingly.
    • Creating highly refined rules by combining built-in machine-learning guidance with refinement by human subject-matter experts.
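
To make the idea of categorizing documents by Boolean rule matches concrete, here is a minimal Python sketch. It does not reproduce the rule generator node or its rule syntax; the `BooleanRule` class, the `categorize` function and the example rules are hypothetical, and each rule is simply assumed to be a set of terms that must appear plus a set of terms that must not.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BooleanRule:
    """Hypothetical rule: all `required` terms present, no `excluded` term present."""
    category: str
    required: frozenset
    excluded: frozenset = frozenset()

    def matches(self, terms):
        return self.required <= terms and not (self.excluded & terms)

def categorize(text, rules):
    """Return the sorted categories whose rules match the document text."""
    terms = set(text.lower().split())  # naive tokenization, an assumption
    return sorted({rule.category for rule in rules if rule.matches(terms)})

# Illustrative rules; in practice they would be machine-generated and then
# refined by a subject-matter expert.
rules = [
    BooleanRule("seal_failure", frozenset({"seal", "failed"})),
    BooleanRule("noise", frozenset({"noisy"}), frozenset({"resolved"})),
]
```

For example, `categorize("Door seal failed again", rules)` returns `["seal_failure"]`. Editing a rule's term sets and re-running `categorize` mirrors, in a very simplified way, the refine-and-regenerate loop described above.
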
User-friendly, flexible interface
  • Merge topics together into one user topic for simplifying similar results.
  • Use topic displays to show document terms/all terms, highlighting why a document was assigned to a particular topic.
  • Use view mode to illustrate just the terms in a single document or within a topic, or to sort text documents.
  • Obtain document-level sentiment insights with an AFINN sentiment list, available as a sample data set with more than 2,000 terms and preassigned polarity weights.
  • Modify, save and share process-flow diagrams of text mining analyses.
  • Use flexible reporting tools to publish results in a concise HTML format.
  • Extend text nodes further by customizing algorithms or declaring new user-written business rules for predictive modeling, clustering, visualization and reporting – deployed as SAS score code.
  • Conforms to accessibility standards for the Windows platform. Accessibility features relate to standards for electronic information technology that were adopted by the US government under Section 508 of the US Rehabilitation Act of 1973.
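
A term list with preassigned polarity weights, like the sentiment list mentioned above, can be applied at the document level by summing the weights of the terms a document contains. The Python sketch below shows that idea only; the function name and the tiny lexicon are invented for illustration and are not values from the sample data set.

```python
def document_sentiment(text, lexicon):
    """Sum the polarity weights of known terms; unknown terms count as 0."""
    tokens = text.lower().split()  # naive tokenization, an assumption
    return sum(lexicon.get(token, 0) for token in tokens)

# Toy lexicon with made-up weights (not taken from any published list).
toy_lexicon = {"satisfied": 2, "failure": -2, "great": 3, "broken": -1}

score = document_sentiment("Great service but broken handle", toy_lexicon)
```

Here `score` is 2: one positive term weighted 3 and one negative term weighted -1.
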
Integrated document filtering
  • Employ sophisticated dimension reduction techniques that enable advanced filtering through weighting, integrated spell checking and transformation of qualitative data into compact formats.
  • Create synonym data sets and import previously defined synonym lists into the text filter node to improve reusability of existing assets.
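
The sketch below illustrates, in Python, how a synonym list and a simple frequency-based filter can reduce the number of distinct terms before analysis. It is only an approximation of the general technique, not the text filter node; the function names, the `min_docs` threshold and the sample data are assumptions.

```python
from collections import Counter

def normalize(tokens, synonyms):
    """Replace each token with its canonical form when a synonym is defined."""
    return [synonyms.get(token, token) for token in tokens]

def filter_terms(docs, synonyms, min_docs=2):
    """Map synonyms, then keep only terms found in at least `min_docs` documents."""
    normalized = [normalize(doc.lower().split(), synonyms) for doc in docs]
    doc_freq = Counter(term for tokens in normalized for term in set(tokens))
    kept = {term for term, n in doc_freq.items() if n >= min_docs}
    return [[term for term in tokens if term in kept] for tokens in normalized]

# Hypothetical synonym list and documents.
synonyms = {"fridge": "refrigerator", "icebox": "refrigerator"}
docs = ["fridge door stuck", "icebox door open", "refrigerator humming"]
filtered = filter_terms(docs, synonyms)
```

Because the synonym dictionary is plain data, the same list can be saved and reused across analyses, which is the reusability benefit described above.
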
Visual analysis of results
  • Use the concept link diagram to analyze results visually and to effectively explore the relationships between terms.
  • Use interactive diagrams to communicate results to key stakeholders:
    • Employ diagrams that cluster results, derive topic assessments and link associations among terms.
Choose predefined entities, define your own, or create custom entities for fact and event extraction
  • Define your own multiword terms (phrases such as "drag and drop").
  • Choose from 18 predefined entity definitions for address, company, date, phone number, SSN, time and others to ensure extraction from input content.
  • Create your own custom entities to be extracted from text inputs, including a list of predefined entities (such as defined districts or product codes), using the SAS Concept Creation for SAS Text Miner add-on.
Interactive interface for importing text from the Web or internal file systems
  • Lets you dynamically create data sets from files contained in a directory or crawled from the Web.
  • Gives access to numerous forms of textual data, including PDF, extended ASCII text, HTML, Microsoft Word and other Microsoft Office formats, spreadsheets, presentations, email and database formats.
  • Extracts, transforms and loads textual data into a SAS data set for mining.
  • Accepts even proprietary formats, converts them, and filters or extracts the text from the files, placing a copy in a plain-text file that is referenced from SAS.
  • Identifies each document's language and transcodes it to the session encoding format.
Natively supports multiple languages
  • Supports Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish and Vietnamese. Dialects include Simplified and Traditional Chinese, Parisian and Canadian French, Old and New World German, Nynorsk and Bokmål Norwegian, Portuguese from both Portugal and Brazil, and Spanish from both South America and Spain.

Screenshots

Automatically generate Boolean rules

You can generate Boolean rules automatically with the rule generator node and actively teach the model directly by overriding the software-derived results.

Examine terms through an interactive GUI

Examine the terms driving topic membership in an interactive GUI and, where terms or topics are similar, merge or reassign them to produce the desired results.

Classify content with the rule generator node

Use the rule generator node to automatically derive Boolean rules that classify content – and use results directly in SAS Enterprise Content Categorization to provide an initial rules list.

Visually explore term frequencies

Visually explore term frequencies and other metrics automatically generated from document collections.

Search and filter terms

Filter viewing capability includes find/repeat find and the ability to import synonyms.

Concept linking

The concept linking window allows visual exploration of grouped terms.

New capabilities include text parsing, text filtering and text topic identification

SAS Text Miner 4.2 includes three new nodes (Text Parsing, Text Filter and Text Topic). Multiple-word phrases, full-text search features and the ability to incorporate user-defined custom entities are just a few of the new capabilities.

System Requirements

Supported platforms
  • HP-UX Itanium: HP-UX 11iv3 (11.31)
  • IBM AIX: Version 6.1 and 7.1 on POWER architectures
  • Linux for x64 (EM64T/AMD64): RHEL 5 and 6, SuSE SLES 10 and 11
  • Microsoft Windows (x86-32): Windows XP Professional, Windows Vista*, Windows 7**, Windows Server 2003 family, Windows Server 2008 family
  • Microsoft Windows on x64 (EM64T/AMD64): Windows XP Professional for x64, Windows Vista* for x64, Windows 7** for x64, Windows Server 2003 for x64, Windows Server 2008 for x64
  • Solaris on SPARC: Version 10
  • Solaris on x64: Version 10
Supported Web browsers
  • Internet Explorer 7 and 8 on Windows XP Pro, Windows Vista* and Windows 7**
  • Firefox 3.6 on Windows XP Pro, Windows Vista*, Windows 7** and Linux 32-bit, Linux x64
Middle tier required/optional software
  • SAS client and middle tier require Sun JRE 1.6
Required software
  • SAS Enterprise Miner is required and must be installed on the same machine as SAS Text Miner. For SAS Text Miner for Desktop, SAS Enterprise Miner for Desktop is required and must be installed on the same machine.
SAS® Text Miner for Desktop

Client tier (only)

  • Microsoft Windows (x86-32): Windows XP Professional, Windows Vista*, Windows 7**
  • Microsoft Windows (x64): Windows XP Professional for x64, Windows Vista* for x64, Windows 7** for x64

*NOTE: Windows Vista editions that are supported include Enterprise, Business and Ultimate.
**NOTE: Windows 7 supported editions are Professional, Enterprise and Ultimate.

Ready to learn more?

Call us at 1-800-727-0025 (US and Canada) or request more information.