
Discovery DPS solutions build on advances in computational linguistics
and artificial intelligence, which have produced a set of algorithms
and technologies that can learn from textual data with limited human
intervention. The results of this machine learning are displayed in the
DPS system.

Our text mining system is based on a constellation of state-of-the-art
methods and algorithms, both developed internally and acquired from
third parties. A key component of our text mining architecture is its
layered approach: we apply technologies developed internally alongside
technologies from a dozen other industry leaders, and aggregate the
results. We harvest the strengths of leading providers and consolidate
the information for display in the DPS interface. Where one technology
may excel at recognizing companies and organizations, another may excel
at extracting dates and currencies.
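
The layered approach can be illustrated with a minimal sketch. It is not
the DPS implementation: the function aggregate_entities, the two toy
extractors, and the engine names "orgs" and "dates_money" are all
hypothetical stand-ins for the internal and third-party technologies.
The sketch only shows the idea of running several specialized
extractors over the same text and merging what they report.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

# An entity is a (surface text, type) pair, e.g. ("Acme Corp", "ORGANIZATION").
Entity = Tuple[str, str]
Extractor = Callable[[str], Iterable[Entity]]


def toy_org_extractor(text: str) -> Iterable[Entity]:
    # Hypothetical engine: capitalized words followed by a company suffix.
    for m in re.finditer(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Corp|Ltd)\b", text):
        yield m.group(0), "organization"


def toy_date_money_extractor(text: str) -> Iterable[Entity]:
    # Hypothetical engine: ISO-style dates and dollar amounts only.
    for m in re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", text):
        yield m.group(0), "date"
    for m in re.finditer(r"\$\d+(?:\.\d+)?", text):
        yield m.group(0), "currency"


def aggregate_entities(
    text: str, extractors: Dict[str, Extractor]
) -> Dict[Entity, Set[str]]:
    """Run every extractor and record which engines reported each entity."""
    merged: Dict[Entity, Set[str]] = defaultdict(set)
    for engine_name, extract in extractors.items():
        for surface, kind in extract(text):
            # Normalize so the same entity from two engines shares one key.
            merged[(surface.strip(), kind.upper())].add(engine_name)
    return dict(merged)


engines = {"orgs": toy_org_extractor, "dates_money": toy_date_money_extractor}
found = aggregate_entities("Acme Corp paid $250 on 2020-01-15.", engines)
for entity, sources in sorted(found.items()):
    print(entity, sorted(sources))
```

Keeping the set of reporting engines per entity is one possible design
choice: it lets a later step prefer entities confirmed by several
engines, or trust a particular engine for a particular entity type.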

The aim of these algorithms and technologies is to automate typical
text analysis tasks that until recently could be performed only by
human analysts.

Prosearch DPS uses computational linguistics and artificial
intelligence algorithms and techniques to perform the following:
· key concepts extraction;
· recognition of named entities in texts, such as personal names,
  geographical names, names of organizations, dates, currencies,
  holidays;
· intelligent search: flexible search methods taking into account word
  forms, synonymy, word generalizations, and word associations; search
  for documents semantically close to a given one; semi-automated
  search result refinement;
· document categorization: unsupervised splitting of a document
  collection into a hierarchical set of categories according to
  semantic proximity, taking into account word forms and synonyms;
· document classification and evaluation: prediction of certain
  document properties, such as relevance or importance, on the basis of
  the document's contents and available metainformation;
· morphological analysis of known English words and heuristic
  morphological analysis of unknown words;
· semantic analysis using the most complete modern thesauri, including
  WordNet and others;
· structural text analysis, including breaking text into sentences and
  extraction of special entities such as proper names, numbers, URLs,
  and others;
· frequent phrase extraction;
· advanced statistical methods of analysis of the frequencies of words
  and other objects in a text for ranking and classification;
· artificial intelligence for unsupervised machine learning.
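
Two of the tasks listed above, breaking text into sentences and
frequent phrase extraction, can be sketched in a few lines. This is a
deliberately naive illustration and not the method used in DPS: the
function names split_sentences and frequent_phrases are hypothetical,
sentences are split on terminal punctuation only, and a "phrase" is
assumed to be a run of n consecutive words inside one sentence.

```python
import re
from collections import Counter
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def frequent_phrases(
    text: str, n: int = 2, min_count: int = 2
) -> List[Tuple[str, int]]:
    """Count word n-grams within sentences; keep those seen min_count+ times."""
    counts: Counter = Counter()
    for sentence in split_sentences(text):
        # Lowercase so "Text mining" and "text mining" count as one phrase.
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return [(p, c) for p, c in counts.most_common() if c >= min_count]


sample = (
    "Text mining finds patterns. Text mining saves time! "
    "Does text mining scale?"
)
print(split_sentences(sample))
print(frequent_phrases(sample))
```

Counting n-grams per sentence rather than over the whole text keeps
phrases from spanning sentence boundaries. A production system would
also need the morphological and synonym handling described in the list
above, which this sketch omits.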