Teragram

OEM Products

European and Arabic Linguistic Suite

Teragram provides linguistic, information extraction, knowledge management, and text processing technologies for all major European languages, including the Nordic and Eastern European languages. Teragram has established itself as the leading provider of technologies and services such as linguistic analysis, pattern matching, linguistic retrieval, and international dictionaries for search and e-commerce, media companies, document management, and high-demand Internet applications and services.

PRODUCTS

Morphological Stemming
Spelling Correction
Part of Speech Tagging
Text Normalization
Word and Sentence Tokenizer
Phonetic Transcription

APIs & SUPPORT

Client-Server APIs
Java Support
Perl Support

Morphological Stemming

Morphological stemming identifies all possible root forms of an inflected or conjugated word. Stemming that is true to the language, rather than the crude approximation produced by less careful software, is required for real meaning extraction in text applications. For example, Teragram morphological stemming correctly produces child for the input word children, hang for hung, and goose for geese.

The multilingual dictionaries used in Teragram morphological stemming are designed with great care. They are updated quarterly to reflect the latest word usages, using advanced data gathering techniques that linguistically process terabytes of real data. This guarantees that all Teragram solutions correctly handle the wide range of word usages which spring up around new technologies, as we have seen for the Internet.

Morphological stemming is available for most European languages.
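
As a rough illustration of the idea, dictionary-based stemming can be sketched as a lookup that returns every possible root form of an input word. The tiny lexicon and the function name stem below are assumptions made for this example only; they are not Teragram's dictionaries or API.

```python
# Toy lexicon mapping an inflected form to every possible root form.
# The entries and the function name are illustrative assumptions only.
ROOTS = {
    "children": ["child"],
    "hung": ["hang"],
    "geese": ["goose"],
    "left": ["leave", "left"],  # past tense of "leave", or the word "left" itself
}


def stem(word: str) -> list[str]:
    """Return all known root forms; fall back to the word itself."""
    key = word.lower()
    return ROOTS.get(key, [key])


if __name__ == "__main__":
    for w in ("children", "hung", "geese", "left"):
        print(w, "->", stem(w))
```

An ambiguous form such as left returns more than one root, which is why stemming is described above as identifying all possible root forms.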

Derivational Stemming identifies the derivational root forms of words. Like Morphological Stemming, this technique is very powerful because it relates word forms to the concepts expressed in your documents. Teragram's Derivational Stemming is linguistically correct and robust in every available language, so that applications obtain the right mapping for a user's intentions. For example, this software outputs incorporate for the input words incorporation and incorporating.

Teragram Morphological Generation is the inverse process of morphological analysis. It inflects a word according to a given morphological feature. For example, given the noun child and the feature plural, this software outputs the word children.
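
Generation can be pictured as the reverse lookup: a root, a grammatical category, and a feature go in, and an inflected form comes out. The following is a minimal sketch under that assumption; the table, the feature names, and the function generate are hypothetical, and the regular-plural fallback covers only simple English nouns.

```python
# Irregular forms keyed by (root, category, feature). Illustrative data only.
IRREGULAR = {
    ("child", "noun", "plural"): "children",
    ("goose", "noun", "plural"): "geese",
    ("hang", "verb", "past"): "hung",
}


def generate(root: str, category: str, feature: str):
    """Return the inflected form, or None when no rule applies."""
    key = (root.lower(), category, feature)
    if key in IRREGULAR:
        return IRREGULAR[key]
    # Fallback: regular English noun plural (a deliberately small rule).
    if category == "noun" and feature == "plural":
        suffix = "es" if key[0].endswith(("s", "x", "ch", "sh")) else "s"
        return key[0] + suffix
    return None


if __name__ == "__main__":
    print(generate("child", "noun", "plural"))
    print(generate("box", "noun", "plural"))
```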

Spelling Correction

Teragram Query Speller is Teragram's essential spell-checking solution. Teragram Query Speller sets a new standard of quality in spell-checking software, by providing the most technologically advanced spelling correction system available on the market today.

Teragram Query Speller provides superior word verification and spelling correction, including correction for typographic (insertion, deletion, replacement and transposition of any number of characters) and cognitive errors, as well as errors in capitalization, proper nouns, contractions, acronyms, accents, company names, hyphenated words, and abbreviations.

In addition, Teragram Query Speller allows the user to construct words phonetically, by applying the pronunciation rules and exceptions which are unique to each language. The ability to type words phonetically is extremely important for words which are difficult to spell. Conventional technologies fail to recognize the user's intentions, especially in international settings. Teragram Query Speller's powerful phonetic feature can put writers from different backgrounds on a more equal footing, enabling them to focus on content. For example, Teragram Query Speller corrects azmatik to asthmatic, and offtalmologist to ophthalmologist.
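
One common way to approximate phonetic correction is to reduce both the typed word and the dictionary words to a pronunciation key and compare the keys. The sketch below assumes a handful of hand-written English rewrite rules and a three-word dictionary, all invented for this example; a real system would need per-language pronunciation rules and exceptions, as described above.

```python
import re

# Ordered rewrite rules, longest patterns first. Illustrative English-only assumption.
RULES = [("phth", "ft"), ("ph", "f"), ("sth", "s"), ("th", "t"), ("c", "k"), ("z", "s")]

# Tiny stand-in dictionary.
WORDS = ["asthmatic", "ophthalmologist", "rhythm"]


def phonetic_key(word: str) -> str:
    """Map a word to a rough pronunciation key."""
    key = word.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    return re.sub(r"(.)\1+", r"\1", key)  # collapse doubled letters


# Index the dictionary by phonetic key.
INDEX = {phonetic_key(w): w for w in WORDS}


def correct_phonetic(word: str):
    """Return the dictionary word that sounds like the input, or None."""
    return INDEX.get(phonetic_key(word))


if __name__ == "__main__":
    print(correct_phonetic("azmatik"))
    print(correct_phonetic("offtalmologist"))
```

Both azmatik and asthmatic reduce to the key asmatik, and both offtalmologist and ophthalmologist reduce to oftalmologist, so each misspelling finds its intended word.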

By suggesting the shortest and most accurate list possible for the spelling alternatives of each misspelled word, Teragram Query Speller immediately improves the user’s productivity. Since alternatives are ordered by likelihood, the desired correction is most often the top, if not the only, choice among candidates.
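
The typographic error classes listed earlier (insertion, deletion, replacement, and transposition) together with likelihood ordering can be sketched as follows. This is a simplified, single-edit illustration, not Teragram's algorithm: the word frequencies are made-up numbers, and the names edits1 and suggest are assumptions for the example.

```python
import string

# Made-up frequency counts standing in for a real dictionary.
FREQ = {"receive": 120, "recipe": 40, "relieve": 15, "the": 5000, "then": 900, "than": 800}


def edits1(word: str) -> set:
    """All strings one insertion, deletion, replacement, or transposition away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def suggest(word: str, limit: int = 3) -> list:
    """Return known words within one edit, most frequent first."""
    word = word.lower()
    if word in FREQ:
        return [word]
    candidates = [w for w in edits1(word) if w in FREQ]
    return sorted(candidates, key=lambda w: (-FREQ[w], w))[:limit]


if __name__ == "__main__":
    print(suggest("recieve"))
    print(suggest("teh"))
```

For recieve, both receive (a transposition) and relieve (a replacement) are one edit away; ordering by frequency puts the more likely correction first and keeps the list short.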

Teragram Query Speller has the best word coverage of any spelling program available. Other spelling programs rely on word lists constructed from paper dictionaries that are revised every ten years. The dictionaries of Teragram Query Speller are generated dynamically from terabytes of real data using advanced retrieval methods, and are completely refreshed quarterly. This unique approach to dictionary construction guarantees that all words in current usage make their way into its dictionaries, with preference given to frequent words over archaic or rare ones. Very soon after a word appears in the media (for example, a place name such as Chechnya, also spelled Chechnia), Teragram has correctly incorporated it into its intelligent wordlists. No other system in the world can offer such linguistically advanced, up-to-date coverage. And thanks to its specialized compression technologies, Teragram Query Speller's breadth of coverage does not come at the cost of larger space requirements.

In summary, Teragram Query Speller applies state-of-the-art linguistic technologies to improve a writer’s productivity, enabling users to focus on the message, not on the form.

Teragram Query Speller is available to original equipment manufacturers (OEMs) and to application developers. It is provided as a software development kit with a complete application programming interface, and is available for a variety of platforms, including Windows, Macintosh and UNIX. In addition to licensing agreements, Teragram provides technical support services for its products. Teragram’s highly trained professionals can develop customized approaches that suit your particular spelling correction needs and platform requirements.

Part of Speech Tagging

Teragram Part-of-Speech Tagging selects from the possible candidates, or "disambiguates," the grammatical category of each word in its context. Accurate tagging is a vital process for any text application hoping to act on the user's precise meaning. For example, the grammatical category of the ambiguous word left is different in each of the following sentences:

  1. He turned left at the light.
  2. Yesterday he left work early.
  3. Please sit on my left.

A part-of-speech tagger must be able to give the unique grammatical category for each occurrence of a word in a given context: in the first sentence left is an adverb, in the second a verb, and in the third a noun.
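
To make the disambiguation step concrete, here is a deliberately tiny sketch that picks a tag for an ambiguous word from the tag of the word before it. The lexicon, the tag names, and the three context rules are assumptions chosen to cover only the example sentences above; a production tagger models each language far more deeply.

```python
# Possible tags per word (tiny illustrative lexicon).
LEXICON = {
    "he": {"PRON"}, "turned": {"VERB"}, "sit": {"VERB"}, "my": {"DET"},
    "on": {"PREP"}, "at": {"PREP"}, "the": {"DET"},
    "left": {"ADV", "VERB", "NOUN"},
}

# Preferred tag for an ambiguous word, given the previous word's tag.
CONTEXT = {"VERB": "ADV", "PRON": "VERB", "DET": "NOUN"}


def tag(sentence: str) -> list:
    """Return (token, tag) pairs for a simple, period-terminated sentence."""
    tags = []
    prev = None
    for token in sentence.lower().rstrip(".").split():
        options = LEXICON.get(token, {"UNK"})
        if len(options) == 1:
            choice = next(iter(options))
        else:
            preferred = CONTEXT.get(prev)
            # Fall back to a fixed (alphabetical) choice when no rule applies.
            choice = preferred if preferred in options else sorted(options)[0]
        tags.append((token, choice))
        prev = choice
    return tags


if __name__ == "__main__":
    for s in ("He turned left at the light.",
              "Yesterday he left work early.",
              "Please sit on my left."):
        print(dict(tag(s))["left"], "<-", s)
```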

Because every Teragram Part-of-Speech Tagger is built on a deep and efficient model of the rules of its particular language, tagging is accurate, fast, and easy.

Text Normalization

Teragram Text Normalization builds on top of Teragram Spelling Correction for various languages. It normalizes the expected variations in textual forms, allowing applications to handle the information from textual data sources more consistently and accurately. For example, this feature is successfully used for text indexing, enabling standard spelling variations to be indexed under one single concept. Additionally, Teragram text normalization recognizes the complex linguistic variations that exist for dates, dollar amounts, and company names in every language. For example, in English the third of February, February 3rd, and 2/3 are all normalized to the same category. Such behavior is necessary and expected in modern information processing.
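
The date example can be sketched with a few patterns that map each surface form to one canonical value. This is an illustration only: the function name normalize_date is hypothetical, the ordinal table is intentionally truncated, and the numeric form is assumed to follow the US month/day order.

```python
import re

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Intentionally truncated table of spelled-out ordinals.
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}


def normalize_date(text: str):
    """Return (month, day), or None if the form is not recognized."""
    t = text.lower().strip()
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})", t)  # 2/3 (assumed month/day)
    if m:
        return int(m.group(1)), int(m.group(2))
    m = re.fullmatch(r"([a-z]+) (\d{1,2})(?:st|nd|rd|th)?", t)  # February 3rd
    if m and m.group(1) in MONTHS:
        return MONTHS[m.group(1)], int(m.group(2))
    m = re.fullmatch(r"(?:the )?([a-z]+) of ([a-z]+)", t)  # the third of February
    if m and m.group(1) in ORDINALS and m.group(2) in MONTHS:
        return MONTHS[m.group(2)], ORDINALS[m.group(1)]
    return None


if __name__ == "__main__":
    for form in ("the third of February", "February 3rd", "2/3"):
        print(form, "->", normalize_date(form))
```

All three forms map to the same (month, day) pair, so an index built on the normalized value treats them as one concept.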

Word and Sentence Tokenizer

Teragram's Word Tokenizer separates punctuation marks from text, enabling further text processing. Teragram's Sentence Tokenizer builds on top of the word tokenizer by breaking down the stream of text into a stream of sentences. This process requires sophisticated techniques since, for example, in English the period may function as an end of sentence marker as well as a character in abbreviations and ellipses. Good tokenization can make or break a sophisticated text processing application, and Teragram Word and Sentence Tokenization is exceptional.
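
The ambiguity of the period can be illustrated with a small sketch that treats a period as a sentence boundary unless it belongs to a known abbreviation or an ellipsis. The abbreviation list and both function names are assumptions for this example, and the rule that an ellipsis never ends a sentence is a simplification.

```python
import re

# Small illustrative abbreviation list (lowercase, with trailing period).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "inc.", "e.g.", "i.e.", "etc."}


def split_sentences(text: str) -> list:
    """Split on ., ! or ? unless the token is an abbreviation or ends in an ellipsis."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        terminal = token.endswith((".", "!", "?"))
        if terminal and token.lower() not in ABBREVIATIONS and not token.endswith("..."):
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


def tokenize_words(sentence: str) -> list:
    """Separate punctuation from words, keeping abbreviations and ellipses whole."""
    tokens = []
    for chunk in sentence.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            tokens.extend(re.findall(r"\.\.\.|\w+(?:'\w+)*|[^\w\s]", chunk))
    return tokens


if __name__ == "__main__":
    text = "Dr. Smith waited... and then left. Acme Inc. shipped it."
    for s in split_sentences(text):
        print(tokenize_words(s))
```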

Phonetic Transcription

Phonetic transcription consists of transcribing the pronunciation of words. Teragram Phonetic Transcription is available for several languages.

Client-Server APIs

All of Teragram's software and tools are designed to support client-server applications. Teragram's libraries are further designed for use within multi-threaded environments.

Java Support

Teragram's tools can easily be integrated within Java applications.

Perl Support

Many of Teragram's tools can be provided as Perl packages. This functionality is particularly useful for integrating language capabilities within web servers, and it gives developers tremendous flexibility coupled with familiar simplicity.