SAS United Kingdom

Text Mining with SAS® Text Miner

Capitalize on the value hidden in textual information

Features

Universal data access

  • Access to numerous forms of textual data, including PDF, extended ASCII text, HTML and Microsoft Word.
  • Web crawling capabilities.
  • Ability to extract, transform and load textual data into a SAS data set for mining.
  • Support for multiple languages:
      • Total language list: Danish, Dutch, English, Finnish, French, German, Italian, Japanese, Korean, Norwegian (Bokmål), Portuguese, Spanish, Swedish, Traditional Chinese and Simplified Chinese.
      • European languages (Latin-1 encoding): Danish, Dutch, English, Finnish, French, German, Italian, Norwegian (Bokmål), Portuguese, Spanish and Swedish.
      • Far-Eastern languages (double-byte character support): Japanese, Korean, Simplified Chinese and Traditional Chinese.
      • Support for Latin-1, double-byte character and Unicode UTF-8 encodings.

Self-documenting interface

  • User-friendly interface replaces manual coding with visual process flow diagrams.
  • Process flow diagrams can be modified, saved and shared with others.
  • Flexible reporting allows results to be published in a concise HTML format.

Comprehensive text preprocessing capabilities

  • Capture and distill the most important underlying information within a document collection.
  • Default or customized stop lists for each language to remove terms with little or no informational value.
  • Automated spelling correction.
  • Stemming to identify root words.
  • Part-of-speech tagging based on sentence context.
  • Noun group extraction for identifying phrase-level concepts such as "competitive intelligence."
  • User-defined multiword tokens, such as "point and click."
  • User-customized and default synonym lists.
  • Compound word splitting into distinct subterms.
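
As a rough illustration of three of the preprocessing steps listed above (stop lists, user-defined multiword tokens and synonym lists), here is a minimal Python sketch. It is not SAS Text Miner code; the names `STOP_LIST`, `MULTIWORD_TOKENS`, `SYNONYMS` and `preprocess` are hypothetical, and the simple substring replacement used for multiword tokens is an assumption made to keep the example short.

```python
import re

# Hypothetical resources for illustration; not SAS Text Miner's own lists.
STOP_LIST = {"the", "a", "of", "is", "and", "to", "for"}
MULTIWORD_TOKENS = ["point and click", "competitive intelligence"]
SYNONYMS = {"automobile": "car", "cars": "car"}


def preprocess(text):
    """Lowercase, keep multiword tokens whole, drop stop words, map synonyms."""
    text = text.lower()
    # Join each multiword token with underscores so it survives tokenization.
    for phrase in MULTIWORD_TOKENS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    kept = [t for t in tokens if t not in STOP_LIST]
    # Map each remaining term to its canonical synonym, if one is defined.
    return [SYNONYMS.get(t, t) for t in kept]


terms = preprocess("The automobile is easy to use: point and click for cars.")
print(terms)  # ['car', 'easy', 'use', 'point_and_click', 'car']
```

The multiword tokens are joined before stop-word removal so that "and" inside "point and click" is not discarded as a stop word.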

Extensive feature extraction

  • Broad, customizable data dictionaries can extract particular pieces of information such as names of people, products, organizations, URLs and addresses.
  • Extracted entities are then normalized and included in a matrix table.
  • Entity extraction is available for English, French, German and Spanish.

Dimension reduction techniques

  • Textual data is preprocessed into an information-rich matrix for application of powerful dimension reduction techniques.
  • Rollup terms automatically identify the n highest-weighted terms in a document.
  • Singular value decomposition (SVD) projects each document into an n-dimensional subspace.
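
The idea of keeping the n highest-weighted terms in a document can be sketched in a few lines of Python. This is an illustration only: the TF-IDF weighting, the alphabetical tie-break and the function names `tfidf_weights` and `rollup_terms` are assumptions, not a description of how SAS Text Miner computes term weights.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document (TF-IDF, an assumed weighting)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def rollup_terms(doc_weights, n):
    """Keep the n highest-weighted terms of one document (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical tokenized documents.
docs = [
    ["loan", "rate", "rate", "bank"],
    ["bank", "branch", "queue"],
    ["bank", "loan", "default", "default", "default"],
]
weights = tfidf_weights(docs)
top_terms = [rollup_terms(w, 2) for w in weights]
print(top_terms)  # [['rate', 'loan'], ['branch', 'queue'], ['default', 'loan']]
```

Because "bank" occurs in every document, its weight is zero and it never survives the rollup, which is the kind of low-information term that dimension reduction is meant to discard.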

Text clustering algorithms

  • Group documents based on their content.
  • Expectation-maximization clustering groups documents using spatial clustering techniques.
  • Hierarchical clustering using Ward’s agglomerative method facilitates automatic grouping of documents into taxonomies. Documents grouped into hierarchical clusters belong to one leaf cluster as well as its parent clusters.
  • Cluster documents downstream in the Process Flow Diagram using K-means or SOM/Kohonen clustering.
  • Profile clusters using additional structured data from original documents (age, purchase propensity, etc.).
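
To make the hierarchical clustering bullet more concrete, the following Python sketch applies Ward's agglomerative criterion to a handful of document vectors. It is a naive illustration, not SAS Text Miner's implementation; the function names and the sample vectors are hypothetical. At each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares.

```python
from itertools import combinations


def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_clustering(vectors):
    """Return the merge history as (members_a, members_b) tuples, in merge order."""
    # Each cluster is (member indices, centroid); start with one cluster per document.
    clusters = [((i,), tuple(v)) for i, v in enumerate(vectors)]
    merges = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: ward_cost(
                len(pair[0][0]), pair[0][1], len(pair[1][0]), pair[1][1]
            ),
        )
        n_a, n_b = len(a[0]), len(b[0])
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = tuple(
            (n_a * x + n_b * y) / (n_a + n_b) for x, y in zip(a[1], b[1])
        )
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append((a[0] + b[0], centroid))
        merges.append((a[0], b[0]))
    return merges


# Hypothetical 2-D document vectors, e.g. after dimension reduction.
doc_vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
merges = ward_clustering(doc_vectors)
print(merges)  # [((0,), (1,)), ((2,), (3,)), ((0, 1), (2, 3))]
```

The merge history reads as a small taxonomy: document 0 sits in its own leaf, in the parent cluster (0, 1) and finally in the root that contains all four documents.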

Copyright © 2009 SAS Institute Inc. All Rights Reserved