Discovery DPS solutions rest on advances in computational linguistics and artificial intelligence, which have produced a set of algorithms and technologies that can ‘learn’ from textual data with limited human intervention. The results of this machine learning are displayed in the DPS system.

Our text mining system is based on a constellation of state-of-the-art methods and algorithms, both developed internally and acquired from third parties. A key component of our text mining architecture is its layered approach: we apply technologies developed internally alongside technologies from a dozen other industry leaders, and aggregate the results. This lets us harvest the strengths of leading providers and consolidate the information for display in the DPS interface. Where one technology may excel at recognizing companies and organizations, another may excel at extracting dates and currencies.
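The layered approach can be illustrated with a minimal Python sketch. It is not the DPS implementation: the two "engines" below are hypothetical toy extractors built on regular expressions, and the names `engine_a`, `engine_b`, and `aggregate` are assumptions made for illustration only. The sketch shows how results from several extractors might be merged while recording which engine reported each entity.

```python
import re
from collections import defaultdict


def engine_a(text):
    """Hypothetical engine that is good at organizations and also finds dates."""
    orgs = re.finditer(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b", text)
    dates = re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", text)
    return [("ORG", m.group(0)) for m in orgs] + [("DATE", m.group(0)) for m in dates]


def engine_b(text):
    """Hypothetical engine that is good at dates and currency amounts."""
    dates = re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", text)
    amounts = re.finditer(r"\$\d+(?:,\d{3})*(?:\.\d+)?", text)
    return [("DATE", m.group(0)) for m in dates] + [("CURRENCY", m.group(0)) for m in amounts]


def aggregate(text, engines):
    """Merge entity results, mapping each (label, value) to the engines that found it."""
    merged = defaultdict(set)
    for name, extract in engines.items():
        for entity in extract(text):
            merged[entity].add(name)
    return dict(merged)


SAMPLE = "Acme Corp paid $1,200.50 to Globex Inc on 2009-03-15."
RESULT = aggregate(SAMPLE, {"engine_a": engine_a, "engine_b": engine_b})

for (label, value), sources in sorted(RESULT.items()):
    print(label, value, sorted(sources))
```

An entity reported by more than one engine, such as the date in the sample sentence, could be treated as higher confidence when the consolidated results are displayed; how DPS actually weighs agreement between engines is not described here.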

The aim of these algorithms and technologies is to automate typical text analysis tasks that, until recently, could only be performed by human analysts.

Prosearch DPS uses computational linguistics and artificial intelligence algorithms and techniques to perform the following:

·     key concept extraction;

·     recognition of named entities in texts such as personal names, geographical names, names of organizations, dates, currencies, holidays;

·     intelligent search: flexible search methods that take into account word forms, synonymy, word generalizations, and word associations; search for documents semantically close to a given document; semi-automated refinement of search results;

·     document categorization: unsupervised splitting of a document collection into a hierarchical set of categories according to semantic proximity, taking into account word forms and synonyms;

·     document classification and evaluation: prediction of certain document properties, such as relevance or importance, on the basis of a document's contents and available metainformation;

·     morphological analysis of known English words and heuristic morphological analysis of unknown words;

·     semantic analysis using comprehensive modern thesauri, such as WordNet and others;

·     structural text analysis, including breaking text into sentences and extracting special entities such as proper names, numbers, URLs, and others;

·     frequent phrase extraction;

·     advanced statistical analysis of the frequencies of words and other objects in a text, for ranking and classification;

·     artificial intelligence for unsupervised machine learning.
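As one concrete illustration of the list above, frequent phrase extraction can be approximated by counting word n-grams within sentences. The Python sketch below is a simplified stand-in and not the method used in DPS; the function name `frequent_phrases` and its parameters `n` and `min_count` are hypothetical.

```python
import re
from collections import Counter


def frequent_phrases(text, n=2, min_count=2):
    """Return (phrase, count) pairs for word n-grams seen at least min_count times."""
    counts = Counter()
    # Split on sentence-ending punctuation so phrases never span two sentences.
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z0-9']+", sentence)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(phrase, c) for phrase, c in counts.most_common() if c >= min_count]


SAMPLE = "Text mining finds patterns. Text mining saves time. Analysts trust text mining."
PHRASES = frequent_phrases(SAMPLE)
print(PHRASES)
```

A production system would likely add the morphological and synonym handling mentioned in the list, so that variants such as "mine text" and "text mining" could be counted together; this sketch counts exact lowercase word sequences only.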