OEM Products

Language & Character Encoding Identification

Now you can recognize the language and character encoding of any text with superb speed and accuracy by using Teragram’s Language and Character Encoding Identification software. Recognition is the essential first step in preparing any text for tokenizing, spelling correction, stemming, and indexing (to name just a few common text manipulations). Teragram’s solution is the standard now used by major World Wide Web search engines. Across the half billion Internet documents now estimated to be in existence, Teragram’s recognizer has been run a billion times or more, invisibly and reliably.

Teragram’s proprietary technology is combined with its highly accurate lexicons and scalable methodologies, enabling the identification of more than 100 unique language and encoding pairs while adding almost no time to reading the file. Teragram’s Language Recognition covers all the major European and Asian languages, for all the major encoding standards found in both modern and legacy documents. No additional pre-filtering is required for plain text or for the HTML, SGML, and XML markup languages. Teragram’s identification technology easily handles the terabytes of data that are increasingly common in the global computing world, giving you the capability and confidence you need to run advanced text applications.