DH2017 – Computer Vision in DH workshop (Papers – Second Block)

Second block: Tools
Chair: Melvin Wevers (Utrecht University)

4) A Web-Based Interface for Art Historical Research (Sabine Lang & Björn Ommer)

Abstract
Slides
Computer Vision Group (University of Heidelberg)

  • Area: art history <-> computer vision
  • First experiment: Can computers propel the understanding and reconstruction of drawing processes?
  • Goal: Study the production process and understand the types and degrees of transformation between an original piece of art and its reproductions.
  • Experiment 2: Can computers help with the analysis of large image corpora, e.g. find gestures?
  • Goal: Find visual similarities and do formal analysis.
  • Central questions: Which gestures can we identify? Are there varying types of one gesture?
  • Results: Visuelle Bildsuche (interface for art historical research)
Visuelle Bildsuche – interface start screen. Data collection: Sachsenspiegel (c. 1220)
  • An interesting and promising feature: within an image, you can mark up areas and find other images with visual similarities:
Search results with visual similarities based on selected bounding boxes
Bautista, Miguel A., Artsiom Sanakoyeu, Ekaterina Sutter, and Björn Ommer. “CliqueCNN: Deep Unsupervised Exemplar Learning.” arXiv:1608.08792 [Cs], August 31, 2016. http://arxiv.org/abs/1608.08792.

5) The Media Ecology Project’s Semantic Annotation Tool and Knight Prototype Grant (Mark Williams, John Bell, Dimitrios Latsis, Lorenzo Torresani)

Abstract
Slides
Media Ecology Project (Dartmouth)

The Semantic Annotation Tool (SAT)

SAT is a drop-in module that facilitates the creation and sharing of time-based media annotations on the Web.
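For concreteness, here is a minimal sketch (in Python) of what a time-based annotation can look like, assuming SAT follows the W3C Web Annotation data model with a media-fragment selector; the URLs, id, and comment text are made-up placeholders:

    import json

    # A minimal time-based media annotation in the W3C Web Annotation style.
    # The target points at seconds 12.5-37.0 of a video via a media fragment.
    # The URLs, id, and body text below are hypothetical placeholders.
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": "http://example.org/annotations/1",
        "type": "Annotation",
        "body": {
            "type": "TextualBody",
            "value": "Establishing shot of the factory floor",
            "format": "text/plain",
        },
        "target": {
            "source": "http://example.org/films/educational-film.mp4",
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": "t=12.5,37.0",  # start and end, in seconds
            },
        },
    }

    print(json.dumps(annotation, indent=2))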

Knight News Challenge Prototype Grant

The Knight Foundation has awarded a Prototype Grant for Media Innovation to The Media Ecology Project (MEP) and Prof. Lorenzo Torresani’s Visual Learning Group at Dartmouth, in conjunction with the Internet Archive and the VEMI Lab at the University of Maine.

“Unlocking Film Libraries for Discovery and Search” will apply existing software for algorithmic object, action, and speech recognition to a varied collection of 100 educational films held by the Internet Archive and Dartmouth Library. We will evaluate the resulting data to plan future multimodal metadata generation tools that improve video discovery and accessibility in libraries.

 

DH2017 – Computer Vision in DH workshop (Papers – First Block)

Seven papers were selected by a review committee, and each author had 15 minutes to present during the workshop. Papers were divided into three thematic blocks:

First block: Research results using computer vision
Chair: Mark Williams (Dartmouth College)

1) Extracting Meaningful Data from Decomposing Bodies (Alison Langmead, Paul Rodriguez, Sandeep Puthanveetil Satheesan, and Alan Craig)

Abstract
Slides
Full Paper

This presentation was about Decomposing Bodies, a large-scale, lab-based digital humanities project housed in the Visual Media Workshop at the University of Pittsburgh that is examining the system of criminal identification introduced in France in the late 19th century by Alphonse Bertillon.

Each card used a pre-established set of eleven anthropometrical measurements (such as height, length of left foot, and width of the skull) as an index for other identifying information about each individual (such as the crime committed, their nationality, and a pair of photographs).

  • Data: Bertillon identification cards for American prisoners from Ohio.
  • Tool: OpenFace – free and open-source face recognition with deep neural networks.
  • Goal: An end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion, plus the ability to browse the original data and generated metadata through a web interface.
  • Character recognition: MNIST database (see the sketch after this list).
  • “Mechanical Turk: we need to talk about it”: consider Mechanical Turk if the data are in the public domain and the task is easy.
  • Findings: Humans deal very well with understanding discrepancies. We should not ask the computer to find these discrepancies for us; instead, we should build visualizations that allow us to visually compare images and identify the similarities and discrepancies.
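Since MNIST came up as the character-recognition route, here is a minimal digit-classifier sketch in Keras; this illustrates the MNIST approach only, not the project’s actual pipeline:

    import tensorflow as tf

    # Load the MNIST handwritten-digit dataset (60k train / 10k test images).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    # A small fully connected network already reaches ~98% accuracy on MNIST.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))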

2) Distant Viewing TV (Taylor Arnold and Lauren Tilton, University of Richmond)

Abstract
Slides

Distant Viewing TV applies computational methods to the study of television series, utilizing and developing cutting-edge techniques in computer vision to analyze moving image culture on a large scale.

Screenshots of analysis of Bewitched
  • Code on Github
  • Both presenters are authors of Humanities Data in R
  • The project builds on libraries that extract low-level features (dlib, cvv, and OpenCV) plus many papers that attempt to identify mid-level features. Still:
    • code is often nonexistent;
    • a prototype is not a library;
    • methods are not generalizable;
    • there is no interoperability
  • Abstract features, such as genre and emotion, are new territory
Feature taxonomy
  • Pilot study: Bewitched (TV series)
  • Goal: measure character presence and position in the scene
  • Algorithm for shot detection (see the sketch after this list)
  • Algorithm for face detection
  • Video example
  • Next steps:
    • Audio features
    • Build a formal testing set
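For the shot-detection step, here is a minimal sketch of one common approach, color-histogram differencing between consecutive frames with OpenCV; the threshold and file name are made up, and this is not necessarily the algorithm Distant Viewing TV uses:

    import cv2

    def detect_shots(path, threshold=0.5):
        """Return frame indices where a shot boundary likely occurs."""
        cap = cv2.VideoCapture(path)
        boundaries, prev_hist, i = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # A color histogram is a cheap summary of the frame's appearance.
            hist = cv2.calcHist([frame], [0, 1, 2], None,
                                [8, 8, 8], [0, 256] * 3)
            hist = cv2.normalize(hist, hist).flatten()
            if prev_hist is not None:
                # A large distance between consecutive histograms suggests a cut.
                dist = cv2.compareHist(prev_hist, hist,
                                       cv2.HISTCMP_BHATTACHARYYA)
                if dist > threshold:
                    boundaries.append(i)
            prev_hist, i = hist, i + 1
        cap.release()
        return boundaries

    print(detect_shots("bewitched_episode.mp4"))  # hypothetical file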

3) Match, compare, classify, annotate: computer vision tools for the modern humanist (Giles Bergel)

Abstract
Slides
The Printing Machine (Giles Bergel’s research blog)

This presentation related the University of Oxford’s Visual Geometry Group’s experience in making images computationally addressable for humanities research.

The Visual Geometry Group has built a number of systems for humanists, variously implementing (i) visual search, in which an image is made retrievable; (ii) comparison, which assists the discovery of similarity and difference; (iii) classification, which applies a descriptive vocabulary to images; and (iv) annotation, in which images are further described for both computational and offline analysis.

a) Main project: Seebibyte

  • Idea: Seebibyte – Visual Search for the Era of Big Data is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the EPSRC (Engineering and Physical Sciences Research Council) and will run from 2015 to 2020.
  • Objectives: to carry out fundamental research to develop next-generation computer vision methods able to analyse, describe, and search image and video content with human-like capabilities, and to transfer these methods to industry and to other academic disciplines (such as archaeology, art, geology, medicine, plant sciences, and zoology)
  • Demo: BBC News Search (Visual Search of BBC News)

Tool: VGG Image Classification Engine (VIC)

This is a technical demo of the large-scale, on-the-fly web search technologies under development in the Oxford University Visual Geometry Group, using data provided by BBC R&D comprising over five years of prime-time news broadcasts from six channels. The demo consists of three components, which can be used to query the dataset on the fly for three different query types: object search, image search, and people search.

An item of interest can be specified at run time by a text query, and a discriminative classifier for that item is then learnt on the fly using images downloaded from Google Image search.
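A sketch of how such on-the-fly learning can work, under stated assumptions: positives are the images fetched for the text query, negatives are generic distractors, features come from a pretrained CNN, and a linear classifier is fit at query time. This mirrors the description above, not VGG’s actual implementation; all file lists are hypothetical:

    import numpy as np
    import tensorflow as tf
    from sklearn.svm import LinearSVC

    # Pretrained CNN used as a fixed feature extractor (pooled conv features).
    extractor = tf.keras.applications.VGG16(weights="imagenet",
                                            include_top=False, pooling="avg")

    def features(paths):
        imgs = [tf.keras.utils.img_to_array(
                    tf.keras.utils.load_img(p, target_size=(224, 224)))
                for p in paths]
        return extractor.predict(
            tf.keras.applications.vgg16.preprocess_input(np.stack(imgs)))

    query_imgs = ["query_0.jpg", "query_1.jpg"]    # fetched for the text query
    background_imgs = ["bg_0.jpg", "bg_1.jpg"]     # generic distractors
    corpus_imgs = ["frame_0.jpg", "frame_1.jpg"]   # dataset to be searched

    # Learn a discriminative classifier for the query item on the fly.
    X = np.vstack([features(query_imgs), features(background_imgs)])
    y = np.array([1] * len(query_imgs) + [0] * len(background_imgs))
    clf = LinearSVC().fit(X, y)

    scores = clf.decision_function(features(corpus_imgs))
    ranking = np.argsort(-scores)  # corpus images ranked by relevance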

Approach: Image classification through machine learning.
Tool: VGG Image Classification Engine (VIC)

The objective of this research is to find objects in paintings by learning classifiers from photographs on the internet. A live demo allows a user to search for an object of their choosing (such as “baby”, “bird”, or “dog”) in a dataset of over 200,000 paintings, in a matter of seconds.

The system allows computers to recognize objects in images; what is distinctive about this work is that it also recovers the 2D outline of the object. Currently, the model has been trained to recognize 20 classes. The demo allows users to test the algorithm on their own images.

b) Other projects

Approach: Image searching
Tool: VGG Image Search Engine (VISE)

Approach: Image annotation
Tool: VGG Image Annotator (VIA)

 

DH2017 – Computer Vision in DH workshop (Keynote)

Robots Reading Vogue Project

A keynote by Lindsay King & Peter Leonard (Yale University) on “Processing Pixels: Towards Visual Culture Computation”.

SLIDES HERE

Abstract: This talk will focus on an array of algorithmic image analysis techniques, from simple to cutting-edge, on materials ranging from 19th century photography to 20th century fashion magazines. We’ll consider colormetrics, hue extraction, facial detection, and neural network-based visual similarity. We’ll also consider the opportunities and challenges of obtaining and working with large-scale image collections.

Project: Robots Reading Vogue at the Digital Humanities Lab, Yale University Library

1) The project:

  • 121 years of Vogue (2,700 covers, 400,000 pages, 6 TB of data). First experiments: n-grams, topic modeling.
  • With images (“distant viewing”), humans can spot patterns with their own eyes more readily than with text (“distant reading”)
  • A simple interface laying out covers by month and year reveals Vogue’s seasonal patterns
  • The interface is not technically difficult to implement
  • It does not use computer vision for analysis

2) Image analysis in RRV (sorting covers by color to enable browsing)

    • Media visualization (Manovich) to show saturation and hue by month. Result: differences across the seasons of the year. Tool used: ImagePlot
    • “The average color problem”. Solution: slice histograms (the visualization Peter showed).

The slice histograms give us a zoomed-out view unlike any other visualizations we’ve tried. We think of them as “visual fingerprints” that capture a macroscopic view of how the covers of Vogue changed through time.
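My reconstruction of the slice idea, as a minimal sketch (not the RRV code): squeeze each cover into a one-pixel-wide column by averaging across its width, then stack the columns left to right in date order:

    import numpy as np
    from PIL import Image

    def slice_column(path, height=200):
        """Collapse a cover into a 1-pixel-wide column of row-averaged colors."""
        img = np.asarray(Image.open(path).convert("RGB").resize((100, height)))
        return img.mean(axis=1, keepdims=True).astype(np.uint8)  # height x 1 x 3

    # Hypothetical, chronologically sorted list of cover scans.
    cover_paths = ["cover_1892_01.jpg", "cover_1892_02.jpg"]
    strip = np.concatenate([slice_column(p) for p in cover_paths], axis=1)
    Image.fromarray(strip).save("slice_histogram.png")  # time runs left to right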
  • “Face detection is kind of a hot topic people talk about, but I think it is only of use when combined with other techniques”; see e.g. face detection combined with geography below

3) Face detection + geography

  • Photogrammar
Face detection + geography
  • Code on Github
  • Idea: place each image as a thumbnail on a map
  • Face detection + composition
Face detection + composition

4) Visual similarity

  • What if we could search for pictures that are visually similar to a given image?
  • Neural-network approach
  • Demo of the Visual Similarity experiment:
In the main interface, you select an image and it shows its closest neighbors.
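One standard way to build such a neighbor search is pretrained CNN embeddings plus k-nearest neighbors; the sketch below makes that assumption and is not necessarily the RRV implementation (file names are placeholders):

    import numpy as np
    import tensorflow as tf
    from sklearn.neighbors import NearestNeighbors

    model = tf.keras.applications.ResNet50(weights="imagenet",
                                           include_top=False, pooling="avg")

    def embed(paths):
        imgs = [tf.keras.utils.img_to_array(
                    tf.keras.utils.load_img(p, target_size=(224, 224)))
                for p in paths]
        return model.predict(
            tf.keras.applications.resnet50.preprocess_input(np.stack(imgs)))

    # Hypothetical image collection; cosine distance on CNN embeddings
    # then defines "visually similar".
    image_paths = ["cover_0001.jpg", "cover_0002.jpg", "cover_0003.jpg"]
    embeddings = embed(image_paths)
    index = NearestNeighbors(n_neighbors=3, metric="cosine").fit(embeddings)
    _, neighbors = index.kneighbors(embeddings[:1])  # neighbors of the first image
    print([image_paths[i] for i in neighbors[0]])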

Other related works on Visual Similarities:

  • John Resig’s Ukiyo-e (Japanese woodblock prints project). Article: Resig, John. “Aggregating and Analyzing Digitized Japanese Woodblock Prints.” Japanese Association for Digital Humanities conference, 2013.
  • TinEye MatchEngine, which Resig used (finds duplicate, modified, and even derivative images in an image collection).
  • Carl Stahmer – Arch Vision (Early English Broadside / Ballad Impression Archive)
  • Article: Stahmer, Carl. “Arch-V: A Platform for Image-Based Search and Retrieval of Digital Archives.” Digital Humanities 2014: Conference Abstracts, 2014.
  • ARCHIVE-VISION Github code here
  • Peter also referred to the paper Benoit presented in Kraków.

5) Final thoughts and next steps

  • Towards Visual Culture Computation
  • Neural networks are “indescribable”… but we can dig in to look at the pixels that contribute to classifications (see the sketch after this list): http://cs231n.github.io/understanding-cnn/
  • The Digital Humanities Lab at Yale University Library is currently applying a deep learning approach to an image dataset from the Yale library to detect visual similarities.
  • This project is called Neural Neighbors, and there is a live demo of neural-network visual similarity on 19th-century photos
Neural Neighbors seeks to show visual similarities in 80,000 19th-century photographs
  • The idea is to combine signal from pixels with signal from text
  • Question: how to organize this logistically?
  • Consider the intrinsic metadata of available collections
  • Approaches to handling copyright and licensing restrictions (perpetual licenses and transformative use)
  • Increase the number of open image collections available: museums, government collections, social media
  • Partner with computer science departments working on computer vision, which need training datasets.
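A minimal sketch of that kind of “digging in” (vanilla gradient saliency in Keras, in the spirit of the cs231n link; the model choice and image file are placeholders):

    import numpy as np
    import tensorflow as tf

    # Any Keras classifier works the same way; MobileNetV2 is just small.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")
    img = tf.keras.utils.load_img("photo.jpg", target_size=(224, 224))  # hypothetical file
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.expand_dims(tf.keras.utils.img_to_array(img), 0))
    x = tf.convert_to_tensor(x)

    with tf.GradientTape() as tape:
        tape.watch(x)                                # track gradients w.r.t. input pixels
        preds = model(x)
        score = preds[:, int(tf.argmax(preds[0]))]   # score of the top class

    # |d score / d pixel|: large values mark pixels driving the classification.
    saliency = tf.reduce_max(tf.abs(tape.gradient(score, x)), axis=-1)[0]
    print(saliency.shape)  # (224, 224) heat map over the image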

 

DH2017 – Computer Vision in DH workshop (Hands-on)

Hands-on Part I – Computer Vision basics, theory, and tools

SLIDES HERE

Instructor: Benoit Seguin (Image and Visual Representation Lab, École Polytechnique Fédérale de Lausanne)

An introduction to basic notions about the challenges of computer vision, and a feel for the simple, low-level operations necessary for the next stage.

Tools:
Python
Basic image operations: scikit-image
Face/object detection + identification: dlib
Deep Learning: Keras
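As a taste of the dlib item above, a minimal face-detection sketch (the image file is hypothetical):

    import dlib
    from skimage import io

    detector = dlib.get_frontal_face_detector()  # HOG-based frontal-face detector
    img = io.imread("group_photo.jpg")           # hypothetical input image
    faces = detector(img, 1)                     # 1 = upsample once to find smaller faces
    for rect in faces:
        print(f"face at left={rect.left()}, top={rect.top()}, "
              f"right={rect.right()}, bottom={rect.bottom()}")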

What is CV?
How to gain high-level understanding from digital images or videos.
It tries to resolve tasks that humans can do (Wikipedia)

Human Visual System (HVS) versus digital image processing (what the computer sees)

Our human understanding of images is far more complex than their digital representation (arrays of pixels)
Convolution illustrated

Practice:
– Jupyter notebooks (an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text);
– perform basic image operations;
– play with different convolutions to develop intuition (see the sketch below).
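A minimal sketch of that convolution exercise with scikit-image and SciPy (kernels chosen for illustration):

    import numpy as np
    from scipy.ndimage import convolve
    from skimage import color, data

    img = color.rgb2gray(data.astronaut())  # built-in sample image, values in [0, 1]

    kernels = {
        "blur":    np.ones((3, 3)) / 9.0,   # box blur: average of the 3x3 neighborhood
        "sharpen": np.array([[ 0, -1,  0],
                             [-1,  5, -1],
                             [ 0, -1,  0]]),
        "edges":   np.array([[-1, -1, -1],
                             [-1,  8, -1],
                             [-1, -1, -1]]),  # Laplacian-style edge detector
    }
    for name, kernel in kernels.items():
        out = convolve(img, kernel)           # 2-D convolution over the image
        print(name, float(out.min()), float(out.max()))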

Hands-on Part II – Deep Learning and its applications

 

DH2017 – Computer Vision in DH workshop

During the DH2017 conference in Montreal, I attended the ‘Computer Vision in Digital Humanities’ workshop organized by the AVinDH SIG (Special Interest Group on AudioVisual material in Digital Humanities). All information about the workshop can be found here.

An abstract about the workshop was published in the DH2017 proceedings and can be found here.

Computer Vision in Digital Humanities Workshop: Keynote by Lindsay King & Peter Leonard.
Workshop Computer Vision in Digital Humanities: hands-on session.

This workshop focused on how computer vision can be applied within the realm of audiovisual materials in digital humanities. The workshop included:

  • A keynote by Lindsay King & Peter Leonard (Yale University) on “Processing Pixels: Towards Visual Culture Computation”.
  • Paper presentations (papers were selected by a review committee).
  • A hands-on session to experiment with open-source computer vision tools.
  • Lightning talks, allowing participants to share their ideas, projects, or ongoing work in a short presentation of two minutes.

 

Digital Humanities Conference (DH2017)

Today I start a series of posts about the Digital Humanities Conference hosted by McGill University (Montréal) from 8 to 11 August.

Organized by The Alliance of Digital Humanities Organizations (ADHO), the conference addresses many aspects of digital humanities such as:

  • Humanities research enabled through digital media, artificial intelligence or machine learning, software studies, or information design and modeling;
  • Social, institutional, global, multilingual, and multicultural aspects of digital humanities;
  • Computer applications in literary, linguistic, cultural, and historical studies, including public humanities and interdisciplinary aspects of modern scholarship;
  • Quantitative stylistics and philology, including big data and text mining studies;
  • Digital arts, architecture, music, film, theatre, new media, digital games, and electronic literature;
  • Emerging technologies such as physical computing, single-board computers, minimal computing, wearable devices, applied to humanities research; and
  • Digital humanities in pedagogy and academic curricula.

I participated in the conference with a poster presentation called “Exploring Rio-2016 image dataset through Deep Learning and visualization techniques”.