Gugelmann Galaxy

Gugelmann Galaxy is an interactive demo by Mathias Bernhard exploring items from the Gugelmann Collection, a group of 2336 works by the Schweizer Kleinmeister – Swiss 18th century masters. Gugelmann Galaxy is built on Three.js, a lightweight JavaScript library that makes it possible to create animated 3D visualizations in the browser using WebGL.

The images are grouped according to specific parameters that are automatically calculated by image analysis and text analysis from metadata. A high-dimensional space is then projected onto a 3D space, while preserving topological neighborhoods between images in the original space. More explanation about the dimensionality reduction can be read here.
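One way to check whether a projection keeps neighborhoods intact is to compare each image's nearest neighbors before and after the reduction. The sketch below is only an illustration: the post does not say which reduction method or quality measure the project uses, and the function names and toy feature vectors are hypothetical.

```python
import math

def nearest_neighbors(points, k):
    """Return, for every point, the set of indices of its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result

def neighborhood_preservation(high_dim, low_dim, k):
    """Average share of each point's k nearest neighbors kept by the projection."""
    before = nearest_neighbors(high_dim, k)
    after = nearest_neighbors(low_dim, k)
    shared = [len(b & a) / k for b, a in zip(before, after)]
    return sum(shared) / len(shared)

# Hypothetical 5-D feature vectors whose last two coordinates carry no information.
features = [
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (1.0, 0.2, 0.0, 0.0, 0.0),
    (0.1, 2.0, 0.3, 0.0, 0.0),
    (3.0, 3.0, 1.0, 0.0, 0.0),
    (0.5, 0.4, 4.0, 0.0, 0.0),
]
projected = [f[:3] for f in features]  # a 3-D "projection" that drops the empty axes
score = neighborhood_preservation(features, projected, k=2)
print(score)
```

A score of 1.0 means every image keeps the same closest neighbors in 3D as in the original space; lower values indicate that the projection has torn some neighborhoods apart.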

The user interface allows four types of image arrangement: by color distribution, by technique, by description and by composition.  As the mouse hovers over the items, an info box with some metadata is displayed on the left. The user can also perform rotation, zooming, and panning.

The author wrote on his site:

The project renounces to come up with a rigid ontology and forcing the items to fit in premade categories. It rather sees clusters emerge from attributes contained in the images and texts themselves. Groupings can be derived but are not dictated.


My presentation at HDRio2018

During the paper session “Social networks and visualizations”, held on April 11 at the HDRio2018 Congress, I presented the work “Perspectivas para integração do Design nas Humanidades Digitais frente ao desafio da análise de artefatos visuais” (“Perspectives for integrating Design in Digital Humanities in the face of the challenge of visual artifacts analysis”).

In this work, I outline initial considerations of a broader and ongoing research that seeks to reflect on the contributions offered by the field of Design in the conception of a graphical user interface that, along with computer vision and machine learning technologies, supports browsing and exploration of large collections of images.

I believe my contribution raises three main discussions for the field of Digital Humanities:

  1. The investigation of large collections of images (photographs, paintings, illustrations, videos, GIFs, etc.) using image recognition techniques through a Machine Learning approach;
  2. The valorization of texts and media produced on social networks as a valid source of cultural heritage for Digital Humanities studies;
  3. Integration of Design principles and methodologies (HCI and visualization techniques) in the development of tools to retrieve, explore and visualize large image collections.

Slides from this presentation can be accessed here (Portuguese only).

First Digital Humanities Conference in Brazil

The I International Congress on Digital Humanities – HDRio2018, held at the Getulio Vargas Foundation (FGV), Rio de Janeiro, from April 9 to 13, 2018, initiated in Brazil a broad and international debate on this relevant and emerging field. It was a timely opportunity for academics, scientists and technologists of Arts, Culture and Social Sciences, Humanities and Computation to reflect on, among other topics, the impact of information technologies, communication networks and the digitization of collections and processes on individuals’ daily lives, and their effects on local and global institutions and societies, especially in the Brazilian context.

HDRio2018’s program included Opening and Closing Ceremony, 6 workshops, 8 panels, 8 paper sessions (featuring 181 presentations) and 1 poster session. Accepted papers can be found here.

Organizers: The Laboratory of Digital Humanities – LHuD of the Centre for Research and Documentation of Contemporary History of Brazil (CPDOC) at the Getulio Vargas Foundation (FGV) and the Laboratory for Preservation and Management of Digital Collections (LABOGAD) at the Federal University of the State of Rio de Janeiro (UNIRIO).

Visualizing time, texture and themes in historical drawings

Past Visions is a collection of historical drawings visualized in a thematic and temporal arrangement. The interface highlights general trends in the overall collection and gives access to rich details of individual items.

The case study examines the potential of visualization when applied to, and developed for, cultural heritage collections. It specifically explores how techniques aimed at visualizing the quantitative structure of a collection can be coupled with a more qualitative mode that allows for detailed examination of the artifacts and their contexts by displaying high-resolution views of digitized cultural objects with detailed art historical research findings.

Past Visions is a research project by the Urban Complexity Lab at Potsdam University of Applied Sciences.

Reference: “Past Visions and Reconciling Views: Visualizing Time, Texture and Themes in Cultural Collections.” ResearchGate. Accessed March 8, 2018.

Visualizing cultural collections

Browsing the content from Information Plus Conference (2016 edition) I bumped into a really interesting presentation regarding the use of graphical user interfaces and data visualization to support the exploration of large-scale digital cultural heritage.

One View is Not Enough: High-level Visualizations of Large Cultural Collections is a contribution by the Urban Complexity Lab, from the University of Applied Sciences Potsdam. Check the talk by Marian Dörk:

As many cultural heritage institutions, such as museums, archives, and libraries, are digitizing their assets, a pressing question arises: how can we give access to these large-scale and complex inventories? How can we present them in a way that lets people draw meaning from them, get inspired and entertained, and maybe even educated?

The Urban Complexity Lab tackles this open problem by investigating and developing graphical user interfaces and different kinds of data visualizations to explore and visualize cultural collections in a way that shows high-level patterns and relationships.

In this specific talk, Marian presents two projects conducted at the Lab. The first, DDB visualized, is a project in partnership with the Deutsche Digitale Bibliothek. Four interactive visualizations make the vast extent of the German Digital Library visible and explorable. Periods, places and persons are three of the categories, while keywords provide links to browsable pages of the library itself.


The second, GEI – Digital, is a project in partnership with the Georg Eckert Institute. This data dossier provides multi-faceted perspectives on GEI-Digital, a digital library of historical schoolbooks created and maintained by the Georg Eckert Institute for International Textbook Research.


The problem of gender bias in the depiction of activities such as cooking and sports in images

The challenge of teaching machines to understand the world without reproducing prejudices. Researchers from the University of Virginia have found that intelligent systems have started to link the action of cooking in images much more to women than to men.

Gender bias test with artificial intelligence for the activity “cooking”: women are associated with it more often, even when there is a man in the image.

Just as search engines, of which Google is the prime example, do not work under absolute neutrality, free of any bias or prejudice, machines equipped with artificial intelligence trained to identify and categorize what they see in photos do not work in a neutral way either.

Article on Wired.

Article on Nexo (Portuguese)

Reference: Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. “Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-Level Constraints.” arXiv:1707.09457 [Cs, Stat], July 28, 2017.

DH2017 – Computer Vision in DH workshop (lightning talks part 1)

To facilitate the exchange of current ongoing work, projects or plans, the workshop allowed participants to give very short lightning talks and project pitches of max 5 minutes.

Part 1
Chair: Martijn Kleppe (National Library of the Netherlands)

1. How can Caffe be used to segment historical images into different categories?
Thomas Smits (Radboud University)

Number of images by identified categories.
  • Challenge: how to attack the “unknown” category and make data more discoverable?

2. The Analysis Of Colors By Means Of Contrasts In Movies 
Niels Walkowski (BBAW / KU Leuven)

  • Slides 
  • Cinemetrics, Colour Analysis & Digital Humanities:
    • Brodbeck (2011) “Cinemetrics”: the project is about measuring and visualizing movie data, in order to reveal the characteristics of films and to create a visual “fingerprint” for them. Information such as the editing structure, color, speech or motion are extracted, analyzed and transformed into graphic representations so that movies can be seen as a whole and easily interpreted or compared side by side.

      Film Data Visualization
    • Burghardt (2016) “Movieanalyzer”

      Movieanalyzer (2016)

3. New project announcement INSIGHT: Intelligent Neural Networks as Integrated Heritage Tools
Mike Kestemont (Universiteit Antwerpen)

  • Slides
  • Data from two museums: Royal Museums of Fine Arts of Belgium and Royal Museums of Art and History;
  • Research opportunity: how can multimodal representation learning (NLP + Vision) help to organize and explore this data;
  • Transfer knowledge approach:
    • Large players in the field have massive datasets;
    • How easily can we transfer knowledge from large to small collections? E.g. automatic dating or object description;
  • Partner up: the Departments of Literature and Linguistics (Faculty of Arts and Philosophy) of the University of Antwerp and the Montefiore Institute (Faculty of Applied Sciences) of the University of Liège are seeking to fill two full-time (100%) vacancies for Doctoral Grants in the area of machine/deep learning, language technology, and/or computer vision for enriching heritage collections. More information.

4. Introduction of CODH computer vision and machine learning datasets such as old Japanese books and characters
Asanobu KITAMOTO (CODH -National Institute of Informatics)

  • Slides;
  • Center for Open Data in the Humanities (CODH);
  • It’s a research center in Tokyo, Japan, officially launched on April 1, 2017;
  • Scope: (1) humanities research using information technology and (2) other fields of research using humanities data.
  • Released datasets:
    • Dataset of Pre-Modern Japanese Text (PMJT): image and text data of Pre-Modern Japanese Text, owned by the National Institute of Japanese Literature, released as open data. In addition, some texts have description, transcription, and tagging data.

      Pre-Modern Japanese Text Dataset: currently 701 books
    • PMJT Character Shapes;
    • IIIF Curation Viewer

      Curation Viewer
  • CODH is looking for a project researcher who is interested in applying computer vision to humanities data. Contact:

5. Introduction to the new South African Centre for Digital Language Resources (SADiLaR)
Juan Steyn

  • Slides;
  • SADiLaR is a new research infrastructure set up by the Department of Science and Technology (DST) forming part of the new South African Research Infrastructure Roadmap (SARIR).
  • Officially launched in October 2016;
  • SADiLaR runs two programs:
    • Digitisation program: which entails the systematic creation of relevant digital text, speech and multi-modal resources related to all official languages of South Africa, as well as the development of appropriate natural language processing software tools for research and development purposes;
    • A Digital Humanities program: which facilitates research capacity building by promoting and supporting the use of digital data and innovative methodological approaches within the Humanities and Social Sciences. (See

DH2017 – Computer Vision in DH workshop (Papers – Third Block)

Third block: Deep Learning
Chair: Thomas Smits (Radboud University)

6) Aligning Images and Text in a Digital Library (Jack Hessel & David Mimno)

Website David Mimno
Website Jack Hessel

Problem: correspondence between text and images.
  • In this work, the researchers train machine learning algorithms to match images from book scans with the text in the pages surrounding those images.
  • Using 400K images collected from 65K volumes published between the 14th and 20th centuries released to the public domain by the British Library, they build information retrieval systems capable of performing cross-modal retrieval, i.e., searching images using text, and vice-versa.
  • Previous multi-modal work:
    • Datasets: Microsoft Common Objects in Context (COCO) and Flickr (images with user-provided tags);
    • Tasks: Cross-modal information retrieval (ImageCLEF) and Caption search / generation
  • Project Goals:
    • Use text to provide context for the images we see in digital libraries, and as a noisy “label” for computer vision tasks
    • Use images to provide grounding for text.
  • Why is this hard? Most relationships between text and images are weakly aligned, that is, very vague. A caption is an example of strong alignment between text and images. An article is an example of weak alignment.
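Cross-modal retrieval of the kind described above can be pictured as ranking items of one modality by their similarity to a query from the other. The following is a minimal sketch, assuming that text and images have already been embedded into one shared vector space; the embeddings, item names and the `retrieve` helper are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, candidates, top_k=2):
    """Rank candidate ids by cosine similarity to the query, best first."""
    ranked = sorted(candidates,
                    key=lambda cid: cosine(query_vec, candidates[cid]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical embeddings, assumed to live in one shared text-image space.
image_vecs = {
    "ship_engraving": (0.9, 0.1, 0.0),
    "portrait_plate": (0.1, 0.9, 0.1),
    "map_of_coast": (0.7, 0.0, 0.6),
}
text_vecs = {
    "the vessel left the harbour": (1.0, 0.0, 0.1),
    "a likeness of the author": (0.0, 1.0, 0.0),
}

# Text -> image search, then image -> text search with the same function.
images_for_text = retrieve(text_vecs["the vessel left the harbour"], image_vecs)
texts_for_image = retrieve(image_vecs["portrait_plate"], text_vecs, top_k=1)
print(images_for_text, texts_for_image)
```

Because both directions use the same similarity function, the hard part is not the search itself but learning embeddings from weakly aligned page text, which this sketch takes as given.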

7) Visual Trends in Dutch Newspaper Advertisements (Melvin Wevers & Juliette Lonij)


Live Demo of SIAMESE: Similar advertisement search.
  • The context of advertisements for historical research:
    • “insight into the ideals and aspirations of past realities …”
    • “show the state of technology, the social functions of products, and provide information on the society in which a product was sold” (Marchand, 1985).
  • Research question: How can we combine non-textual information with textual information to study trends in advertisements?
  • Data: ~1.6M advertisements from two Dutch national newspapers, Algemeen Handelsblad and NRC Handelsblad, published between 1948 and 1995
  • Metadata: title, date, newspaper, size, position (x, y), ocr, page number, total number of pages.
  • Approach: Visual Similarity:
    • Group images together based on visual cues;
    • Demo: SIAMESE: SImilar AdvertiseMEnt SEarch;
    • Approximate nearest neighbors in the penultimate layer of an Inception model trained on ImageNet.
  • Final remarks:
    • Object detection and visual similarity approach offer trends on different layers, similar to close and distant reading;
    • Visual Similarity is not always Conceptual Similarity;
    • Combination of text/semantic and visual similarity as a way to find related advertisements.
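The notes mention approximate nearest neighbors but do not name the algorithm behind SIAMESE. As an illustration only, the sketch below uses random-hyperplane locality-sensitive hashing, one common approximate technique: vectors that point in similar directions tend to fall into the same bucket, so a query is compared only against its bucket instead of the whole collection. The feature vectors stand in for penultimate-layer activations, and all names are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (as normal vectors) through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per side of each hyperplane."""
    return tuple(1 if sum(a * b for a, b in zip(vec, p)) >= 0 else 0
                 for p in planes)

def build_index(vectors, planes):
    """Group item ids into buckets keyed by their signature."""
    buckets = {}
    for item_id, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(item_id)
    return buckets

def candidates(query, planes, buckets):
    """Return only the items that share the query's bucket."""
    return buckets.get(signature(query, planes), [])

# Hypothetical 4-D features; real network activations have far more dimensions.
ads = {
    "ad_soap_1955": [0.9, 0.1, 0.0, 0.2],
    "ad_soap_1956": [0.8, 0.2, 0.1, 0.2],
    "ad_car_1970": [-0.7, 0.9, -0.3, 0.1],
}
planes = make_hyperplanes(dim=4, n_bits=8)
buckets = build_index(ads, planes)
hits = candidates(ads["ad_soap_1955"], planes, buckets)
print(hits)
```

More bits give smaller, purer buckets at the cost of missing some true neighbors, which is why practical systems usually combine several hash tables or use tree-based indexes.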

8) Deep Learning Tools for Foreground-Aware Analysis of Film Colors (Barbara Flueckiger, Noyan Evirgen, Enrique G. Paredes, Rafael Ballester-Ripoll, Renato Pajarola)

The research project FilmColors, funded by an Advanced Grant of the European Research Council, aims at a systematic investigation into the relationship between film color technologies and aesthetics.

Initially, the research team analyzed a large group of 400 films from 1895 to 1995 with a protocol that consists of about 600 items per segment to identify stylistic and aesthetic patterns of color in film.

This human-based approach is now being extended by advanced software that is able to detect the figure-ground configuration and to plot the results into corresponding color schemes based on a perceptually uniform color space (see Flueckiger 2011 and Flueckiger 2017, in press).
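To make the idea of a perceptually uniform color space concrete, here is a small sketch that converts sRGB values to CIELAB and measures the distance between a figure color and a ground color. CIELAB is assumed here as a typical example; the text does not state which color space the FilmColors software uses, and the two sample colors are hypothetical.

```python
import math

D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white commonly used with sRGB

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (D65 white point)."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear RGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)

# Hypothetical mean colors of a detected figure (red) and its ground (white).
figure_lab = srgb_to_lab((255, 0, 0))
ground_lab = srgb_to_lab((255, 255, 255))
contrast = delta_e(figure_lab, ground_lab)
print(figure_lab, ground_lab, contrast)
```

In such a space, equal distances correspond roughly to equal perceived differences, which is what makes figure and ground color schemes comparable across films.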

ERC Advanced Grant FilmColors

DH2017 – Computer Vision in DH workshop (Papers – Second Block)

Second block: Tools
Chair: Melvin Wevers (Utrecht University)

4) A Web-Based Interface for Art Historical Research (Sabine Lang & Bjorn Ommer)

Computer Vision Group (University of Heidelberg)

  • Area: art history <-> computer vision
  • First experiment: Can computers propel the understanding and reconstruction of drawing processes?
  • Goal: Study production process. Understand the types and degrees of transformation between an original piece of art and its reproductions.
  • Experiment 2: Can computers help with the analysis of large image corpora, e.g. find gestures?
  • Goal: Find visual similarities and do formal analysis.
  • Central questions: which gestures can we identify? Do there exist varying types of one gesture?
  • Results: Visuelle Bildsuche (interface for art historical research)
Visuelle Bildsuche – Interface start screen. Data collection Sachsenspiegel (c1220)
  • Interesting feature with potential: you can mark up areas in the image and find other images with visual similarities:
Search results with visual similarities based on selected bounding boxes
Bautista, Miguel A., Artsiom Sanakoyeu, Ekaterina Sutter, and Björn Ommer. “CliqueCNN: Deep Unsupervised Exemplar Learning.” arXiv:1608.08792 [Cs], August 31, 2016.

5) The Media Ecology Project’s Semantic Annotation Tool and Knight Prototype Grant (Mark Williams, John Bell, Dimitrios Latsis, Lorenzo Torresani)

Media Ecology Project (Dartmouth)

The Semantic Annotation Tool (SAT)

SAT is a drop-in module that facilitates the creation and sharing of time-based media annotations on the Web.

Knight News Challenge Prototype Grant

Knight Foundation has awarded a Prototype Grant for Media Innovation to The Media Ecology Project (MEP) and Prof. Lorenzo Torresani’s Visual Learning Group at Dartmouth, in conjunction with The Internet Archive and the VEMI Lab at The University of Maine.

“Unlocking Film Libraries for Discovery and Search” will apply existing software for algorithmic object, action, and speech recognition to a varied collection of 100 educational films held by the Internet Archive and Dartmouth Library. We will evaluate the resulting data to plan future multimodal metadata generation tools that improve video discovery and accessibility in libraries.


DH2017 – Computer Vision in DH workshop (Papers – First Block)

Seven papers have been selected by a review commission and authors had 15 minutes to present during the Workshop. Papers were divided into three thematic blocks:

First block: Research results using computer vision
Chair: Mark Williams (Dartmouth College)

1) Extracting Meaningful Data from Decomposing Bodies (Alison Langmead, Paul Rodriguez, Sandeep Puthanveetil Satheesan, and Alan Craig)

Full Paper

Each card used a pre-established set of eleven anthropometrical measurements (such as height, length of left foot, and width of the skull) as an index for other identifying information about each individual (such as the crime committed, their nationality, and a pair of photographs).

This presentation is about Decomposing Bodies, a large-scale, lab-based, digital humanities project housed in the Visual Media Workshop at the University of Pittsburgh that is examining the system of criminal identification introduced in France in the late 19th century by Alphonse Bertillon.

  • Data: System of criminal identification from American prisoners from Ohio.
  • Tool: OpenFace. Free and open source face recognition with deep neural networks.
  • Goal: An end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion and also the ability to browse through the original data and generated metadata using a web interface.
  • Character recognition: MNIST database
  • “Mechanical Turk: we need to talk about it”: consider Mechanical Turk if the data is in the public domain and the task is easy.
  • Findings: Humans deal very well with understanding discrepancies. We should not ask the computer to find these discrepancies for us, but we should build visualizations that allow us to visually compare images and identify the similarities and discrepancies.

2) Distant Viewing TV (Taylor Arnold and Lauren Tilton, University of Richmond)


Distant Viewing TV applies computational methods to the study of television series, utilizing and developing cutting-edge techniques in computer vision to analyze moving image culture on a large scale.

Screenshots of analysis of Bewitched
  • Code on Github
  • Both presenters are authors of Humanities Data in R
  • The project was built on work with libraries with low-level features (dlib, cvv and OpenCV) + many papers that attempt to identify mid-level features. Still:
    • code often nonexistent;
    • a prototype is not a library;
    • not generalizable;
    • no interoperability
  • Abstract features, such as genre and emotion, are new territory
Feature taxonomy
  • Pilot study: Bewitched (TV series)
  • Goal: measure character presence and position in the scene
  • Algorithm for shot detection 
  • Algorithm for face detection
  • Video example
  • Next steps:
    • Audio features
    • Build a formal testing set
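The notes list an algorithm for shot detection without describing it. A common baseline, sketched here under that caveat, compares the grey-level histograms of consecutive frames and declares a cut when they differ strongly. This is not the Distant Viewing code; the frames, bin count and threshold are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalised grey-level histogram of a frame given as a flat pixel list."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]

def shot_boundaries(frames, threshold=0.5):
    """Indices at which a new shot is assumed to start."""
    cuts = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        # Half the L1 distance lies in [0, 1]: 0 = identical, 1 = disjoint.
        change = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if change > threshold:
            cuts.append(index)
        previous = current
    return cuts

# Hypothetical 4-pixel "frames": two dark frames followed by two bright ones.
frames = [
    [10, 20, 30, 40],
    [12, 22, 28, 41],
    [200, 210, 220, 230],
    [205, 211, 219, 228],
]
cuts = shot_boundaries(frames)
print(cuts)  # [2]: the third frame opens a new shot
```

Once shots are delimited, a face detector can be run per shot to measure which characters are present and where they appear in the frame.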

3) Match, compare, classify, annotate: computer vision tools for the modern humanist (Giles Bergel)

The Printing Machine (Giles Bergel research blog)

This presentation related the University of Oxford’s Visual Geometry Group’s experience in making images computationally addressable for humanities research.

The Visual Geometry Group has built a number of systems for humanists, variously implementing (i) visual search, in which an image is made retrievable; (ii) comparison, which assists the discovery of similarity and difference; (iii) classification, which applies a descriptive vocabulary to images; and (iv) annotation, in which images are further described for both computational and offline analysis.

a) Main Project Seebibyte

  • Idea: Visual Search for the Era of Big Data is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the EPSRC (Engineering and Physical Sciences Research Council), and will run from 2015 – 2020.
  • Objectives: to carry out fundamental research to develop next generation computer vision methods that are able to analyse, describe and search image and video content with human-like capabilities. To transfer these methods to industry and to other academic disciplines (such as Archaeology, Art, Geology, Medicine, Plant sciences and Zoology)
  • Demo: BBC News Search (Visual Search of BBC News)

Tool: VGG Image Classification (VIC) Engine

This is a technical demo of the large-scale on-the-fly web search technologies which are under development in the Oxford University Visual Geometry Group, using data provided by BBC R&D comprising over five years of prime-time news broadcasts from six channels. The demo consists of three different components, which can be used to query the dataset on-the-fly for three different query types: object search, image search and people search.

An item of interest can be specified at run time by a text query, and a discriminative classifier for that item is then learnt on-the-fly using images downloaded from Google Image search.
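The on-the-fly idea can be pictured as follows: features of the images returned for the text query act as positive examples, a fixed pool of unrelated images acts as negatives, and a linear scorer trained on both ranks the archive. The sketch below assumes that feature vectors are already extracted and uses a simple difference-of-means scorer as a stand-in; the demo's actual classifier and features are not described in this post, and all names and numbers are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train_on_the_fly(positives, negatives):
    """Linear scorer pointing from the negative mean towards the positive mean."""
    pos_mean = mean_vector(positives)
    neg_mean = mean_vector(negatives)
    weights = [p - n for p, n in zip(pos_mean, neg_mean)]
    midpoint = [(p + n) / 2 for p, n in zip(pos_mean, neg_mean)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias

def score(vec, model):
    """Signed score: positive means closer to the query concept."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, vec)) + bias

def search(archive, model):
    """Ids of archive items with a positive score, best first."""
    scored = {item: score(vec, model) for item, vec in archive.items()}
    matches = [item for item in scored if scored[item] > 0]
    return sorted(matches, key=lambda item: scored[item], reverse=True)

# Hypothetical features: positives from a web image search, fixed negatives.
positives = [(0.9, 0.1), (0.8, 0.2)]
negatives = [(0.1, 0.9), (0.2, 0.7)]
archive = {
    "frame_a": (0.95, 0.05),
    "frame_b": (0.10, 0.80),
    "frame_c": (0.60, 0.30),
}
model = train_on_the_fly(positives, negatives)
results = search(archive, model)
print(results)
```

Because the negatives are fixed in advance, only the positive examples need to be fetched and processed at query time, which is what keeps this kind of search interactive.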

Approach: Image classification through Machine Learning.
Tool: VGG Image Classification Engine (VIC)

The objective of this research is to find objects in paintings by learning classifiers from photographs on the internet. There is a live demo that allows a user to search for an object of their choosing (such as “baby”, “bird”, or “dog”) in a dataset of over 200,000 paintings, in a matter of seconds.

It allows computers to recognize objects in images; what is distinctive about our work is that we also recover the 2D outline of the object. Currently, the project has trained this model to recognize 20 classes. The demo allows users to test our algorithm on their own images.

b) Other projects

Approach: Image searching
Tool: VGG Image Search Engine (VISE)

Approach: Image annotation
Tool: VGG Image Annotator (VIA)