Text Mining with SAS® Text Miner
Capitalize on the value hidden in textual information
Features
Universal data access
Self-documenting interface
- User-friendly interface eliminates manual coding with visual diagrams.
- Process flow diagrams can be modified, saved and shared with others.
- Flexible reporting allows results to be published in a concise HTML format.
Comprehensive text preprocessing capabilities
- Capture and distill the most important underlying information within a document collection.
- Default or customized stop lists for each language remove terms with little or no informational value.
- Automated spelling correction.
- Stemming to identify root words.
- Part-of-speech tagging based on sentence context.
- Noun group extraction for identifying phrase-level concepts such as "competitive intelligence."
- User-defined multiword tokens, such as "point and click."
- User-customized and default synonym lists.
- Compound word splitting into distinct subterms.
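As a rough illustration of how a stop list, a synonym list and user-defined multiword tokens work together, here is a minimal Python sketch. It is not SAS Text Miner code; the lists, the `preprocess` function and the underscore-joining convention are all hypothetical, chosen only to show the idea.

```python
import re

# Hypothetical lists for illustration only; a real project would supply its own.
STOP_LIST = {"the", "a", "of", "is", "to"}
SYNONYMS = {"automobile": "car", "cars": "car"}
MULTIWORD_TOKENS = ["point and click"]


def preprocess(text):
    """Lowercase, protect multiword tokens, map synonyms, drop stop terms."""
    text = text.lower()
    # Join each multiword token with underscores so it survives tokenization.
    for phrase in MULTIWORD_TOKENS:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    tokens = re.findall(r"[a-z_]+", text)
    # Replace each term with its preferred synonym, if one is defined.
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    # Remove terms that carry little or no informational value.
    return [t for t in tokens if t not in STOP_LIST]


print(preprocess("The automobile is easy to point and click"))
```

Spelling correction, stemming and part-of-speech tagging are left out of this sketch because they depend on language resources that a few lines cannot reproduce.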
Extensive feature extraction
- Broad customizable data dictionaries can extract particular pieces of information such as names of people, products, organizations, URLs and addresses.
- Extracted entities are then normalized and included in a matrix table.
- Entity extraction is available for English, French, German and Spanish.
Dimension reduction techniques
- Textual data is preprocessed into an information-rich matrix for application of powerful dimension reduction techniques.
- Rollup terms automatically identify the n highest-weighted terms in a document.
- Singular value decomposition (SVD) projects each document into an n-dimensional subspace.
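The rollup idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not the product's algorithm: the weighting used here (term count times log inverse document frequency) is one common choice, and the `rollup_terms` function and sample documents are hypothetical.

```python
import math
from collections import Counter


def rollup_terms(docs, n):
    """Return the n highest-weighted terms for each tokenized document.

    Weight = term count * log(number of docs / docs containing the term).
    This weighting is an assumption made for illustration.
    """
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        # Sort by descending weight; break ties alphabetically for determinism.
        top = sorted(weights, key=lambda t: (-weights[t], t))[:n]
        result.append(top)
    return result


docs = [
    ["price", "price", "service", "good"],
    ["service", "slow", "slow"],
    ["service", "price", "refund"],
]
print(rollup_terms(docs, 2))
```

A term that appears in every document, such as "service" here, receives a weight of zero and is kept only when a document has too few other terms to fill its n slots.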
Text clustering algorithms
- Group documents based on their content.
- Expectation-maximization clustering groups documents using spatial clustering techniques.
- Hierarchical clustering using Ward’s agglomerative method facilitates automatic grouping of documents into taxonomies. Documents grouped into hierarchical clusters belong to one leaf cluster as well as its parent clusters.
- Cluster documents downstream in the Process Flow Diagram using K-means or SOM/Kohonen clustering.
- Profile clusters using additional structured data from original documents (age, purchase propensity, etc.).
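To make the downstream clustering step concrete, the following is a minimal K-means sketch in plain Python. It assumes each document has already been reduced to a short numeric vector (for example by SVD); the `kmeans` function, its first-k initialization and the sample points are hypothetical and do not reflect how the product implements clustering.

```python
def kmeans(points, k, iterations=10):
    """Basic K-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical two-dimensional document vectors.
points = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
```

Once each document carries a cluster label, structured fields such as age or purchase propensity can be summarized per cluster to profile it.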
Download the complete SAS Text Miner Fact Sheet.