Third block: Deep Learning
Chair: Thomas Smits (Radboud University)
6) Aligning Images and Text in a Digital Library (Jack Hessel & David Mimno)
- In this work, the researchers train machine learning models to match images from book scans with the text on the pages surrounding those images.
- Using 400K images drawn from 65K volumes published between the 14th and 20th centuries and released into the public domain by the British Library, they build information retrieval systems capable of cross-modal retrieval, i.e., searching for images using text and vice versa.
- Previous multi-modal work:
- Datasets: Microsoft Common Objects in Context (COCO) and Flickr (images with user-provided tags);
- Tasks: cross-modal information retrieval (ImageCLEF) and caption search/generation.
- Project Goals:
- Use text to provide context for the images we see in digital libraries, and as a noisy “label” for computer vision tasks
- Use images to provide grounding for text.
- Why is this hard? Most relationships between text and images are only weakly aligned, that is, the text relates to the image only vaguely. A caption is an example of strong alignment between text and an image; an article is an example of weak alignment.
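To make the retrieval setup concrete, here is a minimal Python sketch of cross-modal retrieval. It assumes that images and texts have already been mapped into a shared embedding space by some trained encoders (not shown); the function names and toy vectors are hypothetical and do not reproduce Hessel and Mimno's system. Once both modalities live in one space, searching images with text and searching text with images reduce to the same operation: ranking candidates by cosine similarity.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors in the shared embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query: Sequence[float], candidates: List[Sequence[float]], k: int = 5) -> List[int]:
    """Indices of the k candidates most similar to the query.

    The same function serves both directions (text -> images, image -> texts),
    because both modalities are assumed to share one embedding space.
    """
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:k]

def recall_at_k(queries, targets, k: int = 1) -> float:
    """Fraction of queries whose paired target (same index) appears in the top k."""
    hits = sum(1 for i, q in enumerate(queries) if i in retrieve(q, targets, k))
    return hits / len(queries)

# Hypothetical toy embeddings: text i is paired with image i
text_emb = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
image_emb = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]

print(retrieve(text_emb[0], image_emb, k=1))   # text -> image search: [0]
print(retrieve(image_emb[2], text_emb, k=1))   # image -> text search: [2]
print(recall_at_k(text_emb, image_emb, k=1))   # 1.0
```

With real data, weak alignment would tend to show up as a lower recall@k: when the surrounding text only loosely describes an image, its paired image is less likely to rank near the top.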
7) Visual Trends in Dutch Newspaper Advertisements (Melvin Wevers & Juliette Lonij)
- The context of advertisements for historical research:
- “insight into the ideals and aspirations of past realities …”
- “show the state of technology, the social functions of products, and provide information on the society in which a product was sold” (Marchand, 1985).
- Research question: How can we combine non-textual information with textual information to study trends in advertisements?
- Data: ~1.6M advertisements from two Dutch national newspapers, Algemeen Handelsblad and NRC Handelsblad, published between 1948 and 1995;
- Metadata: title, date, newspaper, size, position (x, y), OCR text, page number, total number of pages.
- Approach (visual similarity):
- Group images together based on visual cues;
- Demo: SIAMESE (SImilar AdvertiseMEnt SEarch);
- Approximate nearest-neighbor search over features from the penultimate layer of an Inception model trained on ImageNet.
- Final remarks:
- The object-detection and visual-similarity approaches reveal trends at different levels, analogous to close and distant reading;
- Visual similarity is not always conceptual similarity;
- Combination of text/semantic and visual similarity as a way to find related advertisements.
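The similarity search described above can be sketched in Python as follows. This is an illustrative sketch, not the SIAMESE implementation: it assumes each advertisement already has a feature vector (for example, penultimate-layer activations from an ImageNet-trained Inception model) and OCR text, uses exact brute-force search in place of an approximate nearest-neighbor index, and adds a hypothetical weight `alpha` that mixes visual cosine similarity with a simple token-overlap (Jaccard) score on the OCR text. The advertisement IDs, vectors, and OCR strings are invented toy data.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two OCR strings (a deliberately simple text measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def similar_ads(query_id: str, features: Dict[str, Sequence[float]],
                ocr: Dict[str, str], k: int = 3, alpha: float = 1.0) -> List[str]:
    """Rank other ads by alpha * visual + (1 - alpha) * textual similarity.

    alpha=1.0 is purely visual search; smaller values mix in OCR text.
    Exact brute-force search stands in for an approximate nearest-neighbor index.
    """
    scores = {}
    for ad_id, feat in features.items():
        if ad_id == query_id:
            continue
        visual = cosine(features[query_id], feat)
        textual = jaccard(ocr[query_id], ocr[ad_id])
        scores[ad_id] = alpha * visual + (1 - alpha) * textual
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: tiny stand-ins for CNN feature vectors and OCR text
features = {"ad1": [0.9, 0.1, 0.0], "ad2": [0.8, 0.2, 0.1], "ad3": [0.0, 0.1, 0.9]}
ocr = {"ad1": "radio nieuw model", "ad2": "koud bier", "ad3": "radio voordelig"}

print(similar_ads("ad1", features, ocr, k=1))             # ['ad2'] (visual only)
print(similar_ads("ad1", features, ocr, k=1, alpha=0.0))  # ['ad3'] (text only)
```

With `alpha=1.0` the visually closest advertisement (`ad2`) ranks first; with `alpha=0.0` the advertisement sharing product words (`ad3`) does, a toy illustration of why visual similarity is not always conceptual similarity. At the scale of ~1.6M advertisements, an approximate index would replace the linear scan.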
8) Deep Learning Tools for Foreground-Aware Analysis of Film Colors (Barbara Flueckiger, Noyan Evirgen, Enrique G. Paredes, Rafael Ballester-Ripoll, Renato Pajarola)
The research project FilmColors, funded by an Advanced Grant of the European Research Council, aims at a systematic investigation into the relationship between film color technologies and aesthetics.
Initially, the research team analyzed a corpus of 400 films from 1895 to 1995, using a protocol of about 600 items per segment, to identify stylistic and aesthetic patterns of color in film.
This human-based approach is now being extended with advanced software that detects the figure-ground configuration and plots the results as color schemes in a perceptually uniform color space (see Flueckiger 2011 and Flueckiger 2017, in press).
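As a rough illustration of color analysis in a perceptually uniform space, the Python sketch below converts sRGB pixels to CIE L*a*b* (CIELAB, D65 white point) and averages figure and ground colors separately. It assumes a binary foreground mask is already available, for instance from a segmentation model; the mask, pixel values, and function names are hypothetical, and the FilmColors software may use a different color space and pipeline. CIELAB is only approximately perceptually uniform, but Euclidean distance in it (the CIE76 color difference) is a common first measure of how different two colors look.

```python
from typing import List, Optional, Sequence, Tuple

Lab = Tuple[float, float, float]

# D65 reference white in CIE XYZ (Y normalized to 1.0)
XN, YN, ZN = 0.95047, 1.0, 1.08883

def _linearize(c: int) -> float:
    """Undo the sRGB gamma curve for one 8-bit channel."""
    v = c / 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

def _f(t: float) -> float:
    """CIELAB companding function (cube root with a linear segment near zero)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def srgb_to_lab(rgb: Sequence[int]) -> Lab:
    """Convert one 8-bit sRGB pixel to CIELAB (D65)."""
    r, g, b = (_linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def mean_lab(colors: List[Lab]) -> Optional[Lab]:
    """Average color in CIELAB; None if there are no pixels."""
    if not colors:
        return None
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))

def figure_ground_colors(pixels, mask):
    """Mean CIELAB color of foreground (mask True) and background (mask False) pixels."""
    fg = [srgb_to_lab(p) for p, m in zip(pixels, mask) if m]
    bg = [srgb_to_lab(p) for p, m in zip(pixels, mask) if not m]
    return mean_lab(fg), mean_lab(bg)

def delta_e76(c1: Lab, c2: Lab) -> float:
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

# Hypothetical toy frame: a white figure on a black ground, flattened to a pixel list
pixels = [(255, 255, 255), (0, 0, 0), (0, 0, 0)]
mask = [True, False, False]
fg, bg = figure_ground_colors(pixels, mask)
print(fg, bg, delta_e76(fg, bg))
```

Averaging in CIELAB rather than in raw RGB keeps the figure and ground summaries closer to how the colors are perceived, which is the point of plotting color schemes in a perceptually uniform space.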