SAS® Content Categorization

Drive faster, more efficient information organization, access and findability with automated content categorization

SAS Content Categorization, powered by Teragram technology, applies natural language processing and advanced linguistic techniques to automatically categorize large volumes of multilingual content, whether that content is newly acquired, newly generated or already stored in a repository. It parses and analyzes content for entities and events, which are then used to create metadata and trigger business processes. This drives faster, more efficient information organization, access and findability, while drastically reducing the overhead associated with the content categorization process.

Benefits

  • Purge content chaos that spans multiple enterprise repositories.
  • Enable users to find the information they need quickly.

Features

  • Taxonomy creation
  • Category classification
  • Entity extraction
  • Support for more than 30 languages
  • Collaboration

Screenshot

SAS Content Categorization parses and analyzes content, which is then used to create metadata and trigger business processes.



How SAS® Is Different

  • SAS Content Categorization, powered by Teragram, drives faster, more efficient information organization and access by processing large volumes of content and eliminating manual and redundant content tagging processes.
  • SAS Content Categorization uses advanced linguistic and natural language processing techniques to recognize and analyze parts of speech in more than 30 languages, enabling organizations to better manage and govern multilingual content.
  • With SAS you can fully leverage content assets and ensure reuse across disparate departmental repositories, regardless of who owns the content or where it was generated.

Benefits

  • Purge content chaos that spans multiple enterprise repositories. All too often, enterprise information is managed in silos based on different types of data, storage and characterizations. To be useful, content must be integrated, organized and managed with automated content categorization. SAS Content Categorization lets you apply linguistic rules to unique identifying terms and define category rules that classify the documents matching those rules, which drastically reduces the overhead associated with the content categorization process.
  • Enable users to find the information they need quickly. Findability is the ability of users to find the information they need, whether or not they have seen it before or know where it resides. Effective findability retrieves content in context and provides intuitive interaction between users and the content. It provides multiple, tailored ways to retrieve content, and includes the necessary security controls. By processing large volumes of content and eliminating manual and redundant tagging processes, SAS Content Categorization drives faster and more efficient information organization and access.

Features

Taxonomy creation
  • Intuitive interfaces for developing taxonomies and writing category rules and concept definitions that classify the taxonomy nodes.
  • Unlimited number of taxonomy nodes to apply generated categories and concepts to large volumes of input documents.
  • Hierarchical taxonomy development where related topics are grouped together or flat taxonomy development where there is no relationship between nodes in the taxonomy tree.
  • Prebuilt, out-of-the-box taxonomies for news and publishing organizations, libraries and other enterprises.
  • Taxonomy services include:
    • Tutorials on metadata generation and deployment analyses.
    • Services addressing taxonomy creation, rules for document classification and definitions for entity extraction.
    • Integration requirements, including workflow analysis and implementation.
    • Benchmarks and throughput analysis within each customer's environment.
    • Return on investment (ROI) analyses.
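
The difference between hierarchical and flat taxonomy development described above can be sketched in a few lines of Python. This is an illustration only, not the SAS Content Categorization data model: the `TaxonomyNode` class, its methods and the sample node names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonomyNode:
    """One node in an illustrative taxonomy tree (hypothetical structure)."""
    name: str
    children: List["TaxonomyNode"] = field(default_factory=list)

    def add(self, name: str) -> "TaxonomyNode":
        """Attach a child node and return it so calls can be chained."""
        child = TaxonomyNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> List[str]:
        """Return the full path of this node and of every descendant."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result


def build_hierarchical() -> TaxonomyNode:
    """Related topics are grouped under a shared parent node."""
    root = TaxonomyNode("News")
    sports = root.add("Sports")
    sports.add("Football")
    sports.add("Tennis")
    root.add("Business")
    return root


def build_flat() -> List[TaxonomyNode]:
    """Every node stands alone; no node is the parent of another."""
    return [TaxonomyNode(name) for name in ("Football", "Tennis", "Business")]


if __name__ == "__main__":
    for path in build_hierarchical().paths():
        print(path)
    print([node.name for node in build_flat()])
```

In the hierarchical version a document filed under "Football" is implicitly also a "Sports" document, whereas the flat version records no such relationship.
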
Category classification
  • Category rule definitions that classify documents matching the rule, while excluding texts that do not.
  • Automatic application of natural language processing and advanced linguistic technologies to classify and identify key information.
  • Linguistic rules and Boolean operators for added specificity.
  • Ability to create simple or complex category rules and concept definitions.
  • Ability to develop a list of unique identifying terms for each category rule.
  • Weighting of selected terms or categories to create more exclusive membership requirements.
  • Testing and document interfaces for validating how rules and definitions apply to batches of documents, entire documents or components of content.
  • Automatic application of rules and definitions to incoming texts using the client APIs in C, C++, C#.NET, Java, Perl or Python.
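
To make the ideas of Boolean operators, identifying terms and term weights more concrete, here is a minimal sketch of a rule-based classifier. It does not use the product's rule syntax or client APIs, which this page does not show. The `CategoryRule` class, its field names and the sample "Finance" rule are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class CategoryRule:
    """Hypothetical rule; all terms must be single lowercase tokens."""
    name: str
    all_of: List[str] = field(default_factory=list)    # AND: every term required
    any_of: List[str] = field(default_factory=list)    # OR: at least one term required
    none_of: List[str] = field(default_factory=list)   # NOT: any of these excludes the text
    weights: Dict[str, float] = field(default_factory=dict)
    threshold: float = 0.0                              # minimum summed weight

    def matches(self, text: str) -> bool:
        tokens = tokenize(text)
        if any(term in tokens for term in self.none_of):
            return False
        if not all(term in tokens for term in self.all_of):
            return False
        if self.any_of and not any(term in tokens for term in self.any_of):
            return False
        # Weighted terms make membership more exclusive than plain Boolean logic.
        score = sum(w for term, w in self.weights.items() if term in tokens)
        return score >= self.threshold


def classify(text: str, rules: List[CategoryRule]) -> List[str]:
    """Return the names of all rules that the text satisfies."""
    return [rule.name for rule in rules if rule.matches(text)]


FINANCE = CategoryRule(
    name="Finance",
    all_of=["bank"],
    any_of=["loan", "interest", "mortgage"],
    none_of=["river"],
    weights={"loan": 1.0, "mortgage": 1.0, "interest": 0.5},
    threshold=1.0,
)

if __name__ == "__main__":
    print(classify("The bank raised its mortgage interest rates", [FINANCE]))
    print(classify("We walked along the river bank", [FINANCE]))
```

A text that mentions only "bank" and "interest" passes the Boolean tests but scores 0.5, below the threshold of 1.0, so it is not classified as "Finance". That is the sense in which weights tighten membership.
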
Entity extraction
  • Distill vast quantities of information into a few, easily understandable pieces of information.
  • Dictionary-based, grammar-based and regular expression-based concepts to simplify the process of locating related data.
  • An intuitive GUI for performing complex information tasks.
  • Automated application of customized classifications and entities to large volumes of multilingual content.
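
As a rough illustration of dictionary-based and regular expression-based concepts, the sketch below combines a small lookup table with one pattern. Grammar-based concepts are not covered. The dictionary entries, the `PHONE` pattern and the `extract_entities` function are hypothetical and are not taken from the product.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical dictionary concept: lowercase surface form -> entity label.
DICTIONARY: Dict[str, str] = {
    "sas": "COMPANY",
    "teragram": "COMPANY",
}

# Hypothetical regular-expression concept: label -> compiled pattern.
PATTERNS: Dict[str, Pattern[str]] = {
    "PHONE": re.compile(r"\b\d-\d{3}-\d{3}-\d{4}\b"),
}


def extract_entities(text: str) -> List[Tuple[str, str, int]]:
    """Return (label, matched text, start offset) tuples in document order."""
    found: List[Tuple[str, str, int]] = []

    # Dictionary lookup: check each alphabetic word against the table.
    for match in re.finditer(r"[A-Za-z]+", text):
        label = DICTIONARY.get(match.group().lower())
        if label is not None:
            found.append((label, match.group(), match.start()))

    # Pattern lookup: scan the whole text with each regular expression.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))

    return sorted(found, key=lambda entity: entity[2])


if __name__ == "__main__":
    sample = "Call SAS at 1-800-727-0025 about Teragram."
    for label, value, _offset in extract_entities(sample):
        print(label, value)
```

The extracted tuples are the kind of output that could be stored as metadata alongside the document.
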
Support for more than 30 languages
  • Language tools: Advanced linguistic technologies that leverage:
    • Part-of-speech recognition and tagging: Recognizes nouns, verbs, adjectives, etc.
    • Stemming: Locates the various forms of an input noun or verb.
    • Case sensitivity: Specify uppercase and/or lowercase recognition for concepts.
  • Compound recognition and compound decomposition for Germanic and Asian languages: Breaks recognized compound words into their component words.
  • Segmentation for Asian languages.
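
The language tools listed above are far more sophisticated than anything that fits in a short example, but toy versions can show what each term means. The three functions below are hypothetical stand-ins written for this illustration: a crude English suffix stripper for stemming, a whole-word matcher with a case-sensitivity switch, and a lexicon-driven splitter for compound decomposition that ignores German linking elements.

```python
from typing import List, Optional, Set


def naive_stem(word: str) -> str:
    """Strip one common English suffix; a toy stand-in for real stemming."""
    for suffix in ("ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contains_concept(text: str, concept: str, case_sensitive: bool) -> bool:
    """Check whether a single-word concept occurs as a whole word in the text."""
    if case_sensitive:
        return concept in text.split()
    return concept.lower() in text.lower().split()


def split_compound(word: str, lexicon: Set[str]) -> Optional[List[str]]:
    """Split a word into lexicon entries, trying longer leading parts first.

    Returns None when the word cannot be fully covered by lexicon entries.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


if __name__ == "__main__":
    print({naive_stem(w) for w in ("parses", "parsed", "parsing")})
    print(contains_concept("the sas team", "SAS", case_sensitive=True))
    print(split_compound("Dampfschifffahrt", {"dampf", "schiff", "fahrt"}))
```

Stemming lets one rule term cover several inflected forms, case sensitivity keeps an acronym such as "SAS" from matching unrelated lowercase text, and decomposition exposes the words hidden inside a compound so that rules can match them.
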
Collaboration
  • Provide several taxonomists and developers, working individually or in teams, with secure access to projects.
  • Multiple users can access projects under development.
  • Permission levels include read, write, category rules and concept definitions.

Screenshots

SAS Content Categorization parses and analyzes content for entities and events, which are then used to create metadata and trigger business processes. A GUI makes it easy to define and test metadata. Shown here, the terms that match the metadata are highlighted in red.

System Requirements

SAS Content Categorization, powered by Teragram technology, is a standalone product that requires no other SAS modules.

Client environment
  • Microsoft Windows (x86-32): Windows 2000 Professional, Windows XP Professional, Windows Vista*, Windows Server 2003 family

Server environment
  • AIX: Version 4.3 (x86-32), Versions 5.3 and 6.1 (x64) on POWER architectures
  • FreeBSD 4.8 (x86-32) and 6.0 (x64)
  • HP-UX PA-RISC: HP-UX 11iv2 (11.23), 11iv3 (11.31)
  • HP-UX Itanium: HP-UX 11iv2 (11.23), 11iv3 (11.31)
  • Linux for x86 (x86-32): RHEL 4, SuSE SLES 9
  • Linux for x64 (EM64T/AMD64): RHEL 4, SuSE SLES 9
  • Macintosh: Mac OS X 10.4.8 or higher
  • Microsoft Windows (x86-32): Windows 2000, Windows XP Professional, Windows Server 2003, Windows Vista*
  • Microsoft Windows on x64 (EM64T/AMD64): Windows XP Professional for x64, Windows Vista* for x64, Windows Server 2003 for x64
  • Solaris on SPARC: Versions 6, 8, 9 and 10
  • Solaris on x64: Versions 8 and 10

* NOTE: Windows Vista editions that are supported include Enterprise, Business and Ultimate.

Ready to learn more?

Call us at 1-800-727-0025 (US and Canada) or request more information.