ACTIVITIES
The increased ubiquity and resolution of digital still cameras and mobile phone cameras mean that these devices can now act as convenient document capture tools. For instance, cameras are portable, have no desktop footprint and can capture curved and projected documents. The price paid for this convenience is a collection of image processing challenges, since cameras operate under far less constrained conditions than conventional document scanners. Our research has produced new algorithms for overcoming these challenges, leading to better print quality, improved text recognition performance and the applicability of document-specific image compression methods. This work led to the PageCam, CopyFinder and Portable Document Camera products. We have also worked on extending Xerox scanning platforms to camera imaging. Technically, this work has produced new methods giving state-of-the-art performance for correcting lighting variations and perspective distortions, for reconstructing and normalizing color camera images, and for automatic zoom control, as well as camera-specific optimizations of techniques for text and embedded data decoding.
The phenomenal growth of document and digital asset repositories makes the ability to search and mine them a central problem. Although images play a key role in most documents, technologies for semantic categorization of their image content are still in their infancy. We are exploring fundamental algorithms for generic visual categorization through the IST Project LAVA, of which we are the coordinating partner. Applications of these technologies include image tagging, auto-illustration and text-image cross-referencing. These techniques can also be applied to enable new, more automatic methods for enhancing generic color photos, for instance to correct "red eye." Technically, the problem of visual categorization is essentially one of generalizing over the natural variations in appearance inherent in a category of images or objects, and over viewing and imaging conditions. While successful categorization methods have recently been developed for individual categories, such as faces and cars, we are the first to produce a truly generic visual categorization system that handles multiple object types simultaneously. Our system exploits analogies to successful machine learning methods for language processing problems, thus ensuring that generalization is accomplished efficiently and robustly.
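The analogy to language processing can be illustrated with a small sketch. The text above does not name the method used, so what follows is an assumption made for illustration only: a "bag of words" style representation in which local image descriptors are mapped to the nearest entry of a visual vocabulary, and the resulting word-frequency histogram is compared against per-category prototypes. The function names, the toy codebook and the prototype histograms are all hypothetical.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def visual_word_histogram(descriptors, codebook):
    """Count visual-word occurrences and normalize to relative frequencies."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

def categorize(histogram, prototypes):
    """Pick the category whose prototype histogram is nearest (Euclidean)."""
    return min(prototypes, key=lambda name: dist(histogram, prototypes[name]))

# Toy data (hypothetical): a 3-word vocabulary over 2-D descriptors,
# and one prototype histogram per category.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
prototypes = {"face": [0.7, 0.3, 0.0], "car": [0.0, 0.2, 0.8]}

descriptors = [(4.8, 5.1), (5.2, 4.9), (0.9, 1.2), (5.0, 5.3)]
hist = visual_word_histogram(descriptors, codebook)
label = categorize(hist, prototypes)
print(hist, label)
```

Because the histogram discards where each descriptor came from, the same representation serves any object type, which is one plausible way a single system could handle several categories at once. A practical system would learn the vocabulary and the classifier from training images rather than fix them by hand.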
Images are widely used in marketing and customer communications, since they are known to have a substantial impact on a recipient's response to a document. As documents become ever more customized and personalized for their recipients, there is an increased need to automate graphic design processes involving images. In this work we are exploring new technologies for analyzing the perceptual and esthetic characteristics of images, which are needed to ensure that automatically generated documents are exciting, interesting to read and beautifully presented. Applications include automated document proofing and generation of image-text combinations. These will improve the range, effectiveness and quality of documents that can be produced in variable document production solutions.