LARGE SCALE DATA MINING

XRCE's Large Scale Data Mining research area is the point of reference for new data mining algorithms within Xerox. We focus on learning and inference from high-dimensional data that is heterogeneous and evolves over time. Our algorithms have been applied to mining logs from networks and from devices such as printers, enabling behavior prediction, optimization, visualization and diagnostics. We also produce text categorization and clustering tools that have been applied in numerous business settings, from filtering and routing incoming mail to analyzing Voice of Customer and survey data. Scientifically, our text analysis tools are unique in that a single model unifies both categorization and clustering.
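The paragraph above says that one model unifies categorization and clustering, but it does not say which model. The sketch below is for illustration only and is not XRCE's actual method. It shows a standard technique with that property: a multinomial (naive Bayes) mixture trained with EM. Labeled documents keep a fixed component, which is categorization. Unlabeled documents are softly assigned by EM, which is clustering. The function name `semi_supervised_mixture`, the smoothing parameter `alpha` and the toy corpus are hypothetical.

```python
import numpy as np

def semi_supervised_mixture(X, labels, n_components, n_iter=50, alpha=1.0, seed=0):
    """Multinomial mixture trained with EM (illustrative sketch, hypothetical API).

    X: (n_docs, n_words) term-count matrix.
    labels: component index for labeled docs, -1 for unlabeled docs.
    Labeled docs keep a fixed assignment (categorization); unlabeled docs are
    softly assigned by EM (clustering).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_docs, _ = X.shape
    rng = np.random.default_rng(seed)
    labeled = labels >= 0
    # Responsibilities: one-hot rows for labeled docs, random rows for unlabeled docs.
    R = rng.dirichlet(np.ones(n_components), size=n_docs)
    R[labeled] = np.eye(n_components)[labels[labeled]]
    for _ in range(n_iter):
        # M-step: smoothed mixing weights and per-component word distributions.
        pi = (R.sum(axis=0) + alpha) / (n_docs + alpha * n_components)
        counts = R.T @ X + alpha                      # shape (K, V)
        log_theta = np.log(counts / counts.sum(axis=1, keepdims=True))
        # E-step: posterior over components per doc, computed in log space.
        log_post = np.log(pi) + X @ log_theta.T       # shape (n_docs, K)
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # Only unlabeled docs are reassigned; labeled docs stay fixed.
        R[~labeled] = post[~labeled]
    return R, pi, np.exp(log_theta)

# Toy corpus (hypothetical): words 0-1 form one topic, words 2-3 another.
X = np.array([[5, 3, 0, 0],   # labeled, class 0
              [0, 0, 4, 6],   # labeled, class 1
              [4, 4, 0, 1],   # unlabeled
              [0, 1, 5, 5]])  # unlabeled
R, pi, theta = semi_supervised_mixture(X, labels=[0, 1, -1, -1], n_components=2)
print(R.argmax(axis=1))
```

A component that no labeled document points to behaves as a pure cluster. That is one way such a model could surface emerging topics, but this is an assumption and not a claim from the source.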
Print Infrastructure Mining

We create software that manages infrastructures of hundreds of printers. By applying data mining tools to print data, we help administrators identify usage patterns, detect abnormal behavior and optimize printer placement and settings.

Device Log Mining

Like aircraft, high-end printers are exceptionally complex to monitor and maintain. We tackle this challenge with dynamic Bayesian networks applied to large volumes of sensor data from our worldwide printer fleet.

Text Categorization and Clustering

We design text analysis tools that address several practical business issues. These include hierarchical structure (tree-structured taxonomies and multi-level language models) and dynamic collections (emergence of new topics, vocabulary drift). They also cover mixed textual and quantitative data, as in survey analysis, and noise, fuzziness and uncertainty in documents, such as those resulting from OCR, translation or speech-to-text processes. Finally, the tools reduce the annotation burden by optimally combining active and semi-supervised learning.

Hybrid Text-Image Information Access

Multimedia information access (categorizing or clustering multimedia documents, querying a multimedia database) requires algorithms that bridge the gap between media by providing "translational" links and exploiting cross-media information. These algorithms are applied to multimedia tasks such as automatic image annotation, automatic text illustration, cross-media categorization and searching an image database with text queries.
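The Device Log Mining paragraph mentions dynamic Bayesian networks but does not describe their structure. The simplest dynamic Bayesian network is a hidden Markov model, with one hidden state and one observation per time slice. The example below is hypothetical and is not the production model. It uses the forward algorithm to track a two-state printer health variable ("healthy" or "degrading") from a binary sensor-alarm stream. All probabilities are made up for illustration, as are the names `filter_states`, `TRANSITION`, `EMISSION` and `PRIOR`.

```python
import numpy as np

# Hypothetical two-state model: 0 = "healthy", 1 = "degrading".
TRANSITION = np.array([[0.95, 0.05],   # P(next state | healthy)
                       [0.10, 0.90]])  # P(next state | degrading)
# Hypothetical sensor model: columns are P(alarm = 0 | state), P(alarm = 1 | state).
EMISSION = np.array([[0.9, 0.1],       # healthy: alarms are rare
                     [0.3, 0.7]])      # degrading: alarms are frequent
PRIOR = np.array([0.99, 0.01])         # hypothetical initial belief

def filter_states(observations, prior=PRIOR, transition=TRANSITION, emission=EMISSION):
    """Return P(state_t | obs_1..t) for every time step (forward algorithm)."""
    belief = prior.copy()
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            belief = belief @ transition      # predict: advance one time slice
        belief = belief * emission[:, obs]    # update: weight by sensor likelihood
        belief /= belief.sum()                # normalize to a probability distribution
        history.append(belief.copy())
    return np.array(history)

# Usage: two quiet steps followed by a run of alarms.
alarms = [0, 0, 1, 1, 1, 1]
posterior = filter_states(alarms)
print(posterior[:, 1].round(3))  # belief that the printer is degrading
```

Each step first predicts through the transition model and then reweights by the sensor likelihood. A real fleet model would have many interacting variables per time slice, and that is what separates a general dynamic Bayesian network from this single-chain special case.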