Text Processing Capabilities Features and Functions
|
- Content analysis of short alphanumeric variables (up to 255 characters) and of longer texts in ANSI, RTF, and other formats
- Captures and distills the most important underlying information within a document collection
- Default or customized stop lists for each language remove terms with little or no informational value
- Calls external text pre-processing routines packaged as an EXE or a DLL
- Integrated multilingual spell-checking
- Integrated thesaurus to assist the creation of taxonomies and comprehensive categorization schemas
- Case filtering on any numeric or alphanumeric field and on code occurrence (with AND, OR, and NOT Boolean operators)
- Excludes pronouns, conjunctions, etc. based on user-defined exclusion lists (stop lists)
- Categorizes words or phrases using existing or user-defined dictionaries
- Categorizes words based on Boolean (AND, OR, NOT) and proximity rules (NEAR, AFTER, BEFORE)
- Substitutes and scores words and phrases using wildcards and weighting
- Frequency analysis on keywords, phrases, derived categories or concepts, or user-defined codes entered manually within a text
- Interactive development and maintenance of hierarchical dictionaries, taxonomies, or categorization schemas
- Restricts the analysis to specific portions of a text or excludes comments and annotations
- Stemming to identify root words
- Part-of-speech tagging based on sentence context
- Noun group extraction for identifying phrase-level concepts such as "competitive intelligence"
- Compound word splitting into distinct sub-terms
- Dictionary building assistant to find related words (including synonyms, antonyms, holonyms, meronyms, hypernyms, and hyponyms)
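
Several of the features above (stop lists, wildcard-based dictionary categorization, proximity rules, and frequency analysis) can be illustrated with a minimal Python sketch. This is not the product's implementation or API: the stop list, the dictionary entries, the function names, and the reading of NEAR as "within a fixed number of tokens, in either order" are all assumptions made for illustration.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical stop list and categorization dictionary (illustration only).
STOP_LIST = {"the", "and", "or", "it", "they", "of", "a", "to", "is", "on"}
DICTIONARY = {
    "COMPETITION": ["compet*", "rival*"],        # * matches any run of characters
    "FINANCE": ["profit*", "revenue", "cost?"],  # ? matches exactly one character
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_list=STOP_LIST):
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in stop_list]


def categorize(token, dictionary=DICTIONARY):
    """Return the first category with a matching wildcard pattern, else None."""
    for category, patterns in dictionary.items():
        if any(fnmatchcase(token, p) for p in patterns):
            return category
    return None


def category_frequencies(text):
    """Count how often each dictionary category occurs in the text."""
    tokens = remove_stop_words(tokenize(text))
    return Counter(c for c in map(categorize, tokens) if c is not None)


def near(tokens, a, b, window=3):
    """Assumed NEAR semantics: a and b occur within `window` tokens, any order."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(i != j and abs(i - j) <= window for i in pos_a for j in pos_b)


if __name__ == "__main__":
    sample = "The rivals compete on costs and profits; competition is the cost of revenue."
    print(category_frequencies(sample))
    # A Boolean rule combined with a proximity rule: "rivals" NEAR "costs" AND NOT "loss".
    tokens = tokenize(sample)
    print(near(tokens, "rivals", "costs") and "loss" not in tokens)
```

Matching patterns against single tokens keeps the sketch short; phrase-level entries and weighted scoring would need a matcher that works over token sequences rather than individual words.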
|
|
Text Mining Features and Functions
|