The School of Applied Mathematics (EMAp) and the School of Social Sciences (CPDOC) of the Getúlio Vargas Foundation (FGV) will organize and host the First Panorama in Digital Technologies for Museums (I Panorama em Tecnologias Digitais para Museus) on November 27, 2018.
The objective of this Panorama is to present the demands of the museological sector, as well as reflections on previous experiences. Given the recent disaster at the National Museum of UFRJ, a reaction is needed from all the actors involved in the theme: managers, researchers, educators and other sectors of society.
The event will discuss the strengthening of a knowledge network around the use of digital technologies in the museum context. It is also necessary to consider the impact of disseminating these museums' collections, understanding that society's engagement with the issue, as well as the development of a close relationship between the population and museums, is one of the ways of preserving collections and of attracting and maintaining investment in these institutions.
Representatives of diverse institutions will participate as speakers in this event. Among them is my Ph.D. co-advisor and coordinator of the Visgraf Laboratory, Luiz Velho.
Last week, I co-hosted a workshop at the Thought For Food Academy Program, an international event dedicated to engaging and empowering the next generation of innovators to solve the complex and important challenges facing our food system. To that end, the annual TFF Academy and Summit bring together interdisciplinary professionals from science, entrepreneurship, industry, policy, and design to explore, debate and create ‘what’s next’ in food and agriculture. The TFF Academy Program took place at Escola Eleva, Rio de Janeiro, from 23 to 26 July. The full TFF Program can be accessed here.
I had the opportunity to propose a hands-on activity on data visualization for spatial data analysis as part of the Big Data and GIS specialization track offered to young students and entrepreneurs from all over the world. In total, 35 participants of 20 different nationalities took part in the workshop. I co-hosted this track with Brittany Dahl, from ESRI Australia, and Vinicius Filier, from Imagem Soluções de Inteligência Geográfica.
The resources for this hands-on activity (slides and instructions) can be found on my personal website.
A special thanks to Leandro Amorim, Henrique Ilidio and Erlan Carvalho, from Café Design Studio, who helped to line up this activity.
The first People + AI Research Symposium brings together academics, researchers and artists to discuss such topics as augmented intelligence, model interpretability, and human–AI collaboration.
The Symposium is part of the PAIR initiative, a Google Artificial Intelligence project, and is scheduled to be livestreamed on September 26, 2017, at 9 am (GMT-4).
The livestream content will be available at this link.
1) Welcome: John Giannandrea (4:55); Martin Wattenberg and Fernanda Viegas (20:06)
2) Jess Holbrook, PAIR Google (UX lead for the AI project. Talks about the concept of Human-centered Machine Learning)
3) Karrie …, University of Illinois
4) Hae Won Park, MIT
5) Maya Gupta, Google
6) Antonio Torralba, MIT
7) John Zimmerman, Carnegie Mellon University
Webinar abstract: It took nature and evolution more than 500 million years to develop a powerful visual system in humans. The journey for AI and computer vision is about half of a century. In this talk, Dr. Li will briefly discuss the key ideas and the cutting-edge advances in the quest for visual intelligence in computers, focusing on work done to develop ImageNet over the years.
Some highlights of this webinar:
1) The impact of ImageNet on AI/ML research:
First, what’s ImageNet? It’s an image database, a “… large-scale ontology of images built upon the backbone of the WordNet structure”;
ImageNet became a key driving force for deep learning implementation and helped to spread the culture of building structured datasets for specific domains:
Kaggle: a platform for predictive modeling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data
“Datasets – not algorithms – might be the key limiting factor to development of human-level artificial intelligence.” (Alexander Wissner-Gross, 2016)
2) The background of ImageNet
The beginning: Publication about ImageNet in CVPR (2009);
There are a lot of previous datasets that should be acknowledged:
The reason why ImageNet became so popular is that this dataset has the right characteristics for tackling Computer Vision (CV) tasks with a Machine Learning (ML) approach;
By 2005, the marriage of ML and CV became a trend in the scientific community;
There was a shift in the way ML was applied for visual recognition tasks: from a modeling-oriented approach to having lots of data.
This shift was partly enabled by the rapid growth of internet data, which created the opportunity to collect large-scale visual data.
3) From WordNet to ImageNet
ImageNet was built upon the backbone of WordNet, a tremendous dataset that enabled work in Natural Language Processing (NLP) and related tasks.
What’s WordNet? It’s a large lexical database of English. The original paper (3) by George Miller et al. has been cited over 5k times. The database organizes over 150k words into 117k categories. It establishes ontological and lexical relationships that are used in NLP and related tasks.
The idea to move from language to image:
A three-step shift:
Step 1: ontological structures based on WordNet;
Step 2: populate categories with thousands of images from the internet;
Step 3: clean bad results manually. By cleaning the errors you ensure your dataset is accurate.
There were three attempts to populate, train and test the dataset. The first two failed. The third was successful due to a new technology that became available by that time: Amazon Mechanical Turk, a crowdsourcing platform. ImageNet had the help of 49k workers from 167 countries (2007-2010).
After three years, ImageNet went live in 2009 (50M images organized by 10K concept categories)
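The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ImageNet team's actual tooling: the ontology slice, the file names, the votes and the 0.6 agreement threshold are all made up, and the functions `clean_category` and `build_dataset` are hypothetical names.

```python
from collections import Counter

# Step 1: a tiny, made-up slice of a WordNet-style ontology (category -> parent).
ONTOLOGY = {"husky": "dog", "beagle": "dog", "dog": "animal"}

# Step 2: hypothetical candidate images collected from the web per category.
CANDIDATES = {
    "husky": ["husky_01.jpg", "husky_02.jpg", "wolf_07.jpg"],
    "beagle": ["beagle_01.jpg", "cartoon_03.jpg"],
}

# Step 3 input: hypothetical yes/no votes from crowd workers for each candidate.
VOTES = {
    "husky_01.jpg": [True, True, True],
    "husky_02.jpg": [True, False, True],
    "wolf_07.jpg": [False, False, True],
    "beagle_01.jpg": [True, True, False],
    "cartoon_03.jpg": [False, False, False],
}


def clean_category(images, votes, min_agreement=0.6):
    """Keep only images whose share of 'yes' votes reaches min_agreement."""
    kept = []
    for image in images:
        ballots = votes.get(image, [])
        if not ballots:
            continue  # no votes means no evidence, so the image is dropped
        yes_share = Counter(ballots)[True] / len(ballots)
        if yes_share >= min_agreement:
            kept.append(image)
    return kept


def build_dataset(ontology, candidates, votes):
    """Return {category: verified images} for every populated category."""
    return {
        category: clean_category(candidates[category], votes)
        for category in ontology
        if category in candidates
    }


dataset = build_dataset(ONTOLOGY, CANDIDATES, VOTES)
print(dataset)
```

Majority voting is just one simple way to do the cleaning step; the point is that the ontology fixes the categories first, and human judgment filters the noisy web results afterwards.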
4) What did they do right?
Based on ML needs, ImageNet targeted scale:
Besides, the database cared about:
image quality (high resolution to better replicate human visual acuity);
accurate annotations (to create a benchmarking dataset and advance the state of machine perception);
free of charge (to ensure immediate application and a sense of community -> democratization)
Emphasis on community: the ILSVRC challenge was launched in 2010;
ILSVRC was inspired by the PASCAL VOC challenge (PASCAL: Pattern Analysis, Statistical Modelling, and Computational Learning; VOC: Visual Object Classes), which ran from 2005 to 2012.
Participation and performance: the number of entries increased; classification errors (top-5) went down; the average precision for object detection went up:
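For readers unfamiliar with the top-5 metric mentioned above: a prediction counts as correct if the true class is among the five highest-scoring classes. A minimal sketch, assuming per-class scores are given as plain lists; the function name `top_k_error` and the numbers are made up for illustration.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest scores."""
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest score for this example.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Made-up scores for 3 images over 6 classes, with their true class indices.
scores = [
    [0.05, 0.10, 0.50, 0.20, 0.10, 0.05],
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
    [0.02, 0.03, 0.05, 0.10, 0.20, 0.60],
]
labels = [2, 1, 0]

print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

The second image is a top-1 miss but a top-5 hit, which is why the top-5 error is always lower than or equal to the top-1 error.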
5) Where has ImageNet invested effort, and where is it still investing?
Lack of detail: just one category was annotated per image. Object detection made it possible to recognize more than one class per image (through bounding boxes);
Fine-grained recognition: recognize similar objects (class of cars, for example):
6) Expected outcomes
ImageNet became a benchmark
It meant a breakthrough in object recognition
Machine learning advanced and changed dramatically
7) Unexpected outcomes
Neural nets became popular in academic research again
Together with the increase in accurate, available datasets and high-performance GPUs, they drove a deep learning revolution:
Maximize specificity in ontological structures:
Still, relatively few works use ontological structures;
Human comparing versus machine comparing:
8) What lies ahead
moving from object recognition to human-level understanding (from perception to cognition):
That’s the concept behind Microsoft COCO (Common Objects in Context) (5), a “dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding”;
More recently there is the Visual Genome (6), a dataset, a knowledge base, an ongoing effort to connect structural image concepts to language:
The Visual Genome dataset was further used to advance the state of the art in CV:
Image retrieval with scene graphs;
Visual question answering.
The future of vision intelligence relies upon the integration of perception, understanding, and action;
From now on, ImageNet ILSVRC challenge will be organized by Kaggle, a data science community that organizes competitions and makes datasets available.
(1) Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
(2) Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115, no. 3 (December 2015): 211–52. doi:10.1007/s11263-015-0816-y.
(3) Miller, George A. “WordNet: A Lexical Database for English.” Communications of the ACM 38, no. 11 (1995): 39–41.
(4) Deng, Jia, Alexander C. Berg, Kai Li, and Li Fei-Fei. “What Does Classifying More than 10,000 Image Categories Tell Us?” In European Conference on Computer Vision, 71–84. Springer, 2010. https://link.springer.com/chapter/10.1007/978-3-642-15555-0_6.
(5) Lin, Tsung-Yi, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. “Microsoft COCO: Common Objects in Context.” arXiv:1405.0312 [Cs], May 1, 2014. http://arxiv.org/abs/1405.0312.
(6) Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, et al. “Visual Genome.” Accessed September 27, 2017. https://pdfs.semanticscholar.org/fdc2/d05c9ee932fa19df3edb9922b4f0406538a4.pdf.
Cinemetrics, Colour Analysis & Digital Humanities:
Brodbeck (2011) “Cinemetrics”: the project is about measuring and visualizing movie data, in order to reveal the characteristics of films and to create a visual “fingerprint” for them. Information such as the editing structure, color, speech or motion are extracted, analyzed and transformed into graphic representations so that movies can be seen as a whole and easily interpreted or compared side by side.
Data from two museums: the Royal Museums of Fine Arts of Belgium and the Royal Museums of Art and History;
Research opportunity: how can multimodal representation learning (NLP + Vision) help to organize and explore this data?
Knowledge transfer approach:
Large players in the field have massive datasets;
How easily can we transfer knowledge from large to small collections? E.g. automatic dating or object description;
Partner up: the Departments of Literature and Linguistics (Faculty of Arts and Philosophy) of the University of Antwerp and the Montefiore Institute (Faculty of Applied Sciences) of the University of Liège are seeking to fill two full-time (100%) vacancies for Doctoral Grants in the area of machine/deep learning, language technology, and/or computer vision for enriching heritage collections. More information.
4. Introduction of CODH computer vision and machine learning datasets such as old Japanese books and characters
Asanobu KITAMOTO (CODH, National Institute of Informatics)
It’s a research center in Tokyo, Japan, officially launched on April 1, 2017;
Scope: (1) humanities research using information technology and (2) other fields of research using humanities data.
Dataset of Pre-Modern Japanese Text (PMJT): image and text data of pre-modern Japanese texts owned by the National Institute of Japanese Literature, released as open data. In addition, some texts have description, transcription, and tagging data.
SADiLaR is a new research infrastructure set up by the Department of Science and Technology (DST) forming part of the new South African Research Infrastructure Roadmap (SARIR).
Officially launched in October 2016;
SADiLaR runs two programs:
A Digitisation program, which entails the systematic creation of relevant digital text, speech and multi-modal resources related to all official languages of South Africa, as well as the development of appropriate natural language processing software tools for research and development purposes;
A Digital Humanities program, which facilitates research capacity building by promoting and supporting the use of digital data and innovative methodological approaches within the Humanities and Social Sciences. (See http://www.digitalhumanities.org.za)
In this work, the researchers train machine learning algorithms to match images from book scans with the text in the pages surrounding those images.
Using 400K images collected from 65K volumes published between the 14th and 20th centuries released to the public domain by the British Library, they build information retrieval systems capable of performing cross-modal retrieval, i.e., searching images using text, and vice-versa.
Previous multi-modal work:
Datasets: Microsoft Common Objects in Context (COCO) and Flickr (images with user-provided tags);
Tasks: Cross-modal information retrieval (ImageCLEF) and Caption search / generation
Use text to provide context for the images we see in digital libraries, and as a noisy “label” for computer vision tasks
Use images to provide grounding for text.
Why is this hard? Most relationships between text and images are weakly aligned, that is, very vague. A caption is an example of strong alignment between text and an image; an article is an example of weak alignment.
The research project FilmColors, funded by an Advanced Grant of the European Research Council, aims at a systematic investigation into the relationship between film color technologies and aesthetics.
Initially, the research team analyzed a large group of 400 films from 1895 to 1995 with a protocol that consists of about 600 items per segment to identify stylistic and aesthetic patterns of color in film.
This human-based approach is now being extended by advanced software that is able to detect the figure-ground configuration and to plot the results into corresponding color schemes based on a perceptually uniform color space (see Flueckiger 2011 and Flueckiger 2017, in press).
Is a drop-in module that facilitates the creation and sharing of time-based media annotations on the Web
Knight News Challenge Prototype Grant
Knight Foundation has awarded a Prototype Grant for Media Innovation to The Media Ecology Project (MEP) and Prof. Lorenzo Torresani’s Visual Learning Group at Dartmouth, in conjunction with The Internet Archive and the VEMI Lab at The University of Maine.
“Unlocking Film Libraries for Discovery and Search” will apply existing software for algorithmic object, action, and speech recognition to a varied collection of 100 educational films held by the Internet Archive and Dartmouth Library. We will evaluate the resulting data to plan future multimodal metadata generation tools that improve video discovery and accessibility in libraries.
Abstract: This talk will focus on an array of algorithmic image analysis techniques, from simple to cutting-edge, on materials ranging from 19th century photography to 20th century fashion magazines. We’ll consider colormetrics, hue extraction, facial detection, and neural network-based visual similarity. We’ll also consider the opportunities and challenges of obtaining and working with large-scale image collections.
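Of the techniques listed in the abstract, hue extraction is the simplest to illustrate. The sketch below is not the speaker's code; it assumes an image is already available as a list of (R, G, B) tuples in the 0-255 range, and the function names, the 12-bin histogram and the saturation/value cut-offs are arbitrary choices made for illustration. It uses only Python's standard `colorsys` module.

```python
import colorsys


def hue_histogram(pixels, bins=12, min_saturation=0.2, min_value=0.2):
    """Count pixels per hue bin, skipping near-grey and near-black pixels."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s < min_saturation or v < min_value:
            continue  # hue is unreliable for washed-out or very dark pixels
        counts[min(int(h * bins), bins - 1)] += 1
    return counts


def dominant_hue_degrees(pixels, bins=12):
    """Return the centre (in degrees) of the most populated hue bin, or None."""
    counts = hue_histogram(pixels, bins)
    if sum(counts) == 0:
        return None
    best = counts.index(max(counts))
    return (best + 0.5) * 360 / bins


# A made-up "image": mostly red pixels, a few blue ones, and some grey.
pixels = [(220, 30, 30)] * 6 + [(30, 30, 200)] * 3 + [(128, 128, 128)] * 4

print(hue_histogram(pixels))
print(dominant_hue_degrees(pixels))
```

Computed per photograph or per magazine page, such histograms give a compact color signature that can be compared across a large collection.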
What if we could search for pictures that are visually similar to a given image?
Neural networks approach
Demo of Visual Similarity experiment:
In the main interface, you select an image and it shows its closest neighbors.
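The "closest neighbors" idea can be sketched as follows. This is an assumption about how such a demo might work, not the code behind it: each image is assumed to have a feature vector already extracted by a neural network, and neighbors are ranked by cosine similarity. The image ids and three-dimensional vectors are made up (real features typically have hundreds or thousands of dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_neighbors(query_id, features, k=3):
    """Return the ids of the k images most similar to the query image."""
    query = features[query_id]
    scored = [
        (cosine_similarity(query, vector), image_id)
        for image_id, vector in features.items()
        if image_id != query_id  # the query itself is not a neighbor
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:k]]


# Hypothetical precomputed feature vectors, one per image.
FEATURES = {
    "portrait_a": [0.9, 0.1, 0.0],
    "portrait_b": [0.8, 0.2, 0.1],
    "landscape_a": [0.1, 0.9, 0.2],
    "landscape_b": [0.0, 0.8, 0.3],
    "still_life": [0.3, 0.3, 0.9],
}

print(closest_neighbors("portrait_a", FEATURES, k=2))
```

A linear scan like this is fine for a few thousand images; larger collections usually need an approximate nearest-neighbor index instead.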
Other related works on Visual Similarities:
John Resig’s Ukiyo-e (Japanese woodblock prints project). Article: Resig, John. “Aggregating and Analyzing Digitized Japanese Woodblock Prints.” Japanese Association of Digital Humanities conference, 2013.
John Resig’s TinEye MatchEngine (Finds duplicate, modified and even derivative images in your image collection).
Carl Stahmer – Arch Vision (Early English Broadside / Ballad Impression Archive)
Article: Stahmer, Carl. (2014). “Arch-V: A platform for image-based search and retrieval of digital archives.” Digital Humanities 2014: Conference Abstracts
An introduction to basic notions about the challenges of computer vision, giving a feel for the simple, low-level operations necessary for the next stage.
Basic image operations: scikit-image
Face/object detection + identification: dlib
Deep Learning: Keras
What is CV?
How to gain high-level understanding from digital images or videos.
It tries to resolve tasks that humans can do (Wikipedia)
Human Vision System (HVS) versus Digital Image Processing (what the computer sees)
– Jupyter system (an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text);
– perform basic image operations;
– Play with different convolutions to develop intuition.
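To give a flavor of the convolution exercise without any library, here is a minimal pure-Python sketch. It is not the workshop notebook (which used scikit-image inside Jupyter); the function `convolve2d`, the box-blur and Sobel kernels, and the tiny test image are my own illustrative choices.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution of a list-of-lists image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel; without the flip it is cross-correlation.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * flipped[di][dj]
            row.append(total)
        output.append(row)
    return output


BOX_BLUR = [[1 / 9] * 3 for _ in range(3)]           # averages a 3x3 neighborhood
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]       # responds to vertical edges

# A 4x5 image that is dark on the left and bright on the right.
IMAGE = [
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
]

edges = convolve2d(IMAGE, SOBEL_X)
blurred = convolve2d(IMAGE, BOX_BLUR)
print(edges)
print(blurred)
```

Swapping the kernel is the whole experiment: the edge kernel responds strongly where the image changes from dark to bright and gives zero in flat regions, while the blur kernel smooths that transition.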
Hands-on Part II – Deep Learning and its application
During the DH2017 conference in Montreal, I attended the ‘Computer Vision in Digital Humanities‘ workshop organized by AVinDH SIG (Special Interest Group AudioVisual material in Digital Humanities). All information about the workshop can be found here.
An abstract about the workshop was published on DH2017 Proceedings and can be found here.
This workshop focused on how computer vision can be applied within the realm of Audiovisual Materials in Digital Humanities. The workshop included: