Visualizing time, texture and themes in historical drawings

Past Visions is a collection of historical drawings visualized in a thematic and temporal arrangement. The interface highlights general trends in the overall collection and gives access to rich details of individual items.

The case study examines the potential of visualization when applied to, and developed for, cultural heritage collections. It specifically explores how techniques aimed at visualizing the quantitative structure of a collection can be coupled with a more qualitative mode that supports detailed examination of the artifacts and their contexts, displaying high-resolution views of digitized cultural objects alongside detailed art-historical research findings.

Past Visions is a research project by the Urban Complexity Lab at the University of Applied Sciences Potsdam.

Reference: “Past Visions and Reconciling Views: Visualizing Time, Texture and Themes in Cultural Collections.” ResearchGate. Accessed March 8, 2018.

Visualizing cultural collections

Browsing the content from the Information Plus Conference (2016 edition), I came across a really interesting presentation on the use of graphical user interfaces and data visualization to support the exploration of large-scale digital cultural heritage.

One View is Not Enough: High-level Visualizations of Large Cultural Collections is a contribution by the Urban Complexity Lab, from the University of Applied Sciences Potsdam. Check the talk by Marian Dörk:

As many cultural heritage institutions, such as museums, archives, and libraries, digitize their assets, a pressing question arises: how can we give access to these large-scale and complex inventories? How can we present them in a way that lets people draw meaning from them, get inspired, entertained, and maybe even educated?

The Urban Complexity Lab tackles this open problem by investigating and developing graphical user interfaces and different kinds of data visualization to explore cultural collections in ways that reveal high-level patterns and relationships.

In this specific talk, Marian presents two projects conducted at the Lab. The first, DDB visualized, is a project in partnership with the Deutsche Digitale Bibliothek. Four interactive visualizations make the vast extent of the German Digital Library visible and explorable. Periods, places and persons are three of the categories, while keywords provide links to browsable pages of the library itself.

 

The second, GEI – Digital, is a project in partnership with the Georg Eckert Institute. This data dossier provides multi-faceted perspectives on GEI-Digital, a digital library of historical schoolbooks created and maintained by the Georg Eckert Institute for International Textbook Research.

 

Mind-reading machines

A new AI model sort of reconstructs what you see from brain scans.

Schematics of our reconstruction approach. (A) Model training. We use an adversarial training strategy adopted from Dosovitskiy and Brox (2016b), which consists of 3 DNNs: a generator, a comparator, and a discriminator. The training images are presented to a human subject, while brain activity is measured by fMRI. The fMRI activity is used as an input to the generator. The generator is trained to reconstruct the images from the fMRI activity to be as similar to the presented training images in both pixel and feature space. The adversarial loss constrains the generator to generate reconstructed images that fool the discriminator to classify them as the true training images. The discriminator is trained to distinguish between the reconstructed image and the true training image. The comparator is a pre-trained DNN, which was trained to recognize the object in natural images. Both the reconstructed and true training images are used as an input to the comparator, which compares the image similarity in feature space. (B) Model test. In the test phase, the images are reconstructed by providing the fMRI activity of the test image as the input to the generator. (Shen et al, 2018)
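
To make the moving parts concrete, here is a minimal, hypothetical sketch of that three-network training step in PyTorch. It is not the authors' code: the module names, loss weights, and optimizer handling are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical networks: the generator maps fMRI activity to an image, the
# discriminator judges true vs. reconstructed images, and the comparator is a
# fixed, pre-trained recognition network used as a feature space.
def train_step(generator, discriminator, comparator, fmri, true_image,
               g_opt, d_opt, w_pix=1.0, w_feat=1.0, w_adv=0.01):
    # --- Discriminator update: distinguish true images from reconstructions ---
    with torch.no_grad():
        fake = generator(fmri)
    d_real = discriminator(true_image)
    d_fake = discriminator(fake)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Generator update: match the true image in pixel space, match it in the
    # comparator's feature space, and fool the discriminator ---
    recon = generator(fmri)
    pix_loss = F.mse_loss(recon, true_image)
    feat_loss = F.mse_loss(comparator(recon), comparator(true_image))
    adv_logits = discriminator(recon)
    adv_loss = F.binary_cross_entropy_with_logits(adv_logits, torch.ones_like(adv_logits))
    g_loss = w_pix * pix_loss + w_feat * feat_loss + w_adv * adv_loss
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return g_loss.item(), d_loss.item()
```
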

Check the journal article here.

Reference: Shen, Guohua, Kshitij Dwivedi, Kei Majima, Tomoyasu Horikawa, and Yukiyasu Kamitani. “End-to-End Deep Image Reconstruction from Human Brain Activity.” BioRxiv, February 27, 2018, 272518. https://doi.org/10.1101/272518.

A computer vision algorithm for identifying images in different lighting

Computer vision has come a long way since ImageNet, a large, open-source data set of labeled images, was released in 2009 for researchers to use to train AI—but images with tricky or bad lighting can still confuse algorithms.

A new paper by researchers from MIT and DeepMind details a process that can identify images in different lighting without having to hand-code rules or train on a huge data set. The process, called a rendered intrinsics network (RIN), automatically separates an image into reflectance, shape, and lighting layers. It then recombines the layers into a reconstruction of the original image.
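
As a rough illustration of that decompose-and-recombine idea (not the paper's actual architecture; the module names and the simple multiplicative recombination are assumptions), a skeleton might look like this:

```python
import torch.nn as nn

class RenderedIntrinsicsSketch(nn.Module):
    """Illustrative skeleton: a decomposer predicts intrinsic layers and a
    learned shading module recombines them into a reconstruction."""
    def __init__(self, decomposer, shader):
        super().__init__()
        self.decomposer = decomposer  # image -> (reflectance, shape, lighting)
        self.shader = shader          # (shape, lighting) -> shading map

    def forward(self, image):
        reflectance, shape, lighting = self.decomposer(image)
        shading = self.shader(shape, lighting)
        # Recombine: reflectance modulated by shading approximates the input,
        # so a reconstruction loss can supervise the decomposition.
        reconstruction = reflectance * shading
        return reconstruction, (reflectance, shape, lighting)
```
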

AI is learning how to invent new fashions

In a paper published on arXiv, researchers from the University of California and Adobe have outlined a way for AI not only to learn a person's style but also to create computer-generated images of items that match that style. This kind of computer vision task is being called “predictive fashion” and could let retailers create personalized pieces of clothing.

The model can be used for both personalized recommendation and design. Personalized recommendation is achieved by using a ‘visually aware’ recommender based on Siamese CNNs; generation is achieved by using a Generative Adversarial Net to synthesize new clothing items in the user’s personal style. (Kang et al., 2017).
Reference: Kang, Wang-Cheng, Chen Fang, Zhaowen Wang, and Julian McAuley. “Visually-Aware Fashion Recommendation and Design with Generative Image Models.” arXiv:1711.02231 [Cs], November 6, 2017. http://arxiv.org/abs/1711.02231.
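
A toy sketch of the "visually aware" scoring idea (my own simplification, not the authors' code): the recommender ranks catalogue items by combining a learned user vector with CNN features extracted from each item's image; in the paper, a GAN can instead synthesize new item images whose features score highly for the user.

```python
import numpy as np

# Hypothetical, illustrative scoring: a user's preference for an item mixes a
# non-visual latent term with a visual term based on CNN image features.
rng = np.random.default_rng(0)
user_latent, user_visual = rng.normal(size=16), rng.normal(size=512)
items_latent = rng.normal(size=(100, 16))    # 100 candidate items
items_visual = rng.normal(size=(100, 512))   # CNN features of their images

scores = items_latent @ user_latent + items_visual @ user_visual
print(scores.argmax())                        # index of the top recommendation
```
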

Artificial Intelligence that can create convincing spoof photos and videos

I wonder if Peter Burke would rethink the documentary and historical status of photography now that we are starting to see AI and deep learning systems (like generative adversarial networks – GANs) being used to create fake but believable images at scale.

Reproduction from Ian Goodfellow’s presentation at EmTech MIT 2017.
Reference: Snow, J. “AI Could Send Us Back 100 Years When It Comes to How We Consume News.” MIT Technology Review. Accessed November 9, 2017. https://www.technologyreview.com/s/609358/ai-could-send-us-back-100-years-when-it-comes-to-how-we-consume-news/.

Machine Learning Foundations – Week 1: course overview

I decided to take the online course “Machine Learning Foundations – A Case Study Approach” offered by Coursera and taught by Carlos Guestrin and Emily Fox (professors at the University of Washington).

This introductory, intuition-oriented course treats machine learning methods as black boxes. The idea is to learn ML concepts through a case study approach, so the course doesn’t go deep into how ML models are specified and optimized.

It’s a 6-week course and I’ll share here the highlights related to my research.

Week 1 – course overview

Slides
Videos

Machine learning is changing the world: in fact, if you look at some of the most successful companies in industry today – companies that are called disruptive – they’re often differentiated by intelligent applications, by intelligence that uses machine learning at its core. So, for example, in its early days Amazon really disrupted the retail market by bringing product recommendations into its website. We saw Google disrupting the advertising market by targeting advertising with machine learning to figure out what people would click on. You saw Netflix, the movie distribution company, really change how movies are seen. Now we don’t go to a shop and rent movies anymore; we go to the web and stream them. Netflix really changed that, and at the core there was a recommender system that helped me find the movies that I liked, the movies that are good for me, out of the many thousands of movies they were serving. You see companies like Pandora providing a music recommendation system where I find music that I like, streams that are good for the morning when I’m sleepy or at night when I’m ready to go to bed and want to listen to different music. And you see that in many places, in many industries: Facebook connecting me with people I might want to be friends with, and even companies like Uber disrupting the taxi industry by optimizing how to connect drivers with riders in real time. So, in all these areas, machine learning is one of the core technologies, the technology that makes the company’s product really special.

The machine learning pipeline: the data-to-intelligence pipeline. We start from data and bring in a machine learning method that provides a new kind of analysis of that data, and that analysis gives us intelligence – intelligence like which product I am likely to buy right now.

Case study 1: Predicting house prices

Machine learning can be used to predict house values. The intelligence we’re deriving is a value associated with some house that’s not on the market: we don’t know its value, and we want to learn it from data. And what’s our data? In this case, we look at other houses and their sale prices to inform the value of the house we’re interested in. In addition to the sale prices, we look at other features of the houses, like the number of bedrooms, the number of bathrooms, the square footage, and so on. What the machine learning method does is relate the house attributes to the sale price: if we can learn this model – the relationship from house-level features to observed sale prices – then we can use it to make a prediction for the new house. We take its attributes and predict its sale price. This method is called regression.
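
A minimal sketch of this regression setup with scikit-learn, using made-up numbers rather than the course's dataset:

```python
# Toy regression example: predict sale price from house attributes.
from sklearn.linear_model import LinearRegression

X = [[3, 1, 1180], [2, 1, 770], [4, 3, 1960], [3, 2, 1680]]  # beds, baths, sqft
y = [221_900, 180_000, 604_000, 510_000]                      # observed sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[3, 2, 1500]]))  # estimated price for a house not on the market
```
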

Case study 2: Sentiment analysis

Machine learning can also be used for a sentiment analysis task where the training data are restaurant reviews. A review might say the sushi was awesome, the drink was awesome, but the service was awful. A possible ML goal in this scenario is to take a single review and classify whether or not it has positive sentiment: if it’s a good review, thumbs up; if it has negative sentiment, thumbs down. To do so, the ML pipeline analyzes a lot of other reviews (the training data), considering both the text and the rating of each review, in order to learn the relationship needed to classify sentiment. For example, the model might analyze the text of a review in terms of how many times the word “awesome” versus the word “awful” was used, and, doing so for all reviews, learn – based on the balance of usage of these words – a decision boundary between positive and negative reviews. The way the model learns from these other reviews is based on the ratings associated with their text. This method is called classification.
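
A minimal sketch of such a classifier with scikit-learn, on made-up reviews rather than the course's data:

```python
# Toy classification example: a bag-of-words model learns a decision boundary
# between positive and negative reviews.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = ["the sushi was awesome and the drink was awesome",
           "the service was awful and the food was awful",
           "awesome place, great sushi",
           "awful experience, terrible service"]
labels = [1, 0, 1, 0]  # 1 = thumbs up, 0 = thumbs down

classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(reviews, labels)
print(classifier.predict(["the sushi was awesome but the service was awful"]))
```
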

Case study 3: Document retrieval

The third case study is about a document retrieval task. From a huge collection of articles and books (the dataset), the challenge is to use machine learning to surface the readings most interesting to a specific person. In this case, the ML model tries to find structure in the dataset based on groups of related articles (e.g. sports, world news, entertainment, science, etc.). By finding this structure and annotating the corpus (the collection of documents), the machine can use the resulting labels to build a document retrieval engine. If a reader is currently reading an article about world news and wants to retrieve another one, the system, aware of the article’s label, knows which category to keep searching in. This type of approach is called clustering.
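
A minimal clustering sketch with scikit-learn, on made-up documents:

```python
# Toy clustering example: group documents by TF-IDF similarity so that related
# articles can be retrieved together.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = ["the team won the championship game",
        "elections dominate this week's world news",
        "the striker scored twice in the final",
        "diplomats meet to discuss the trade agreement"]

X = TfidfVectorizer().fit_transform(docs)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)  # documents in the same cluster are candidates for retrieval
```
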

Case study 4: Product recommendation

The fourth case study addresses an approach called collaborative filtering, which has had a lot of impact in many domains over the last decade. Specifically, the task is to build a product recommendation application, where the ML model knows the customer’s past purchases and tries to use those to recommend a set of other products the customer might be interested in buying. The relation the model tries to learn is between the products the customer bought before and what he or she is likely to buy in the future. To learn this relation, the model looks at the purchase histories of many past customers, and possibly features of those customers (e.g. age, gender, family role, location …).
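
A toy sketch of the idea (illustrative only; real collaborative filtering systems use far richer models such as matrix factorization):

```python
# Toy collaborative filtering: recommend items that are frequently bought
# together with the items in a customer's purchase history.
import numpy as np

purchases = np.array([[1, 1, 0, 0],    # customer A
                      [1, 1, 1, 0],    # customer B
                      [0, 0, 1, 1]])   # customer C

# Item-to-item co-occurrence: how often two products are bought together.
co_occurrence = purchases.T @ purchases
np.fill_diagonal(co_occurrence, 0)

customer = purchases[0]                # customer A's history
scores = co_occurrence @ customer      # score items by co-purchase counts
scores[customer == 1] = 0              # don't recommend what they already own
print(scores.argmax())                 # index of the recommended product
```
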

Case study 5: Visual product recommender

The last case study is about a visual product recommender. The idea is much like the previous example: the task is also a recommendation application, but the ML model learns from the visual features of an image, and the outcome is also an image. Here, the input is an image (e.g. a black shoe, black boot, high heel, running shoe or some other shoe) chosen by a user in a browser, and the goal of the application is to retrieve a set of images of shoes that are visually similar to the input image. The model does so by learning visual relations between different shoes. Usually, these models are built on a specific kind of architecture called a Convolutional Neural Network (CNN). In a CNN, every layer of the network provides more and more descriptive features: the first layer may just detect features like edges, by the second layer the model begins to detect corners and more complex features, and as we go deeper we observe more intricate visual features arising.
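
A toy sketch of the retrieval step (illustrative only): in practice the feature vectors would come from a layer of a pretrained CNN rather than from random numbers.

```python
# Toy visual recommender: treat each catalogue image as a feature vector
# (in practice produced by a CNN layer) and retrieve the most similar items.
import numpy as np
from sklearn.neighbors import NearestNeighbors

catalogue_features = np.random.rand(1000, 512)  # hypothetical CNN features of shoe images
query_features = np.random.rand(1, 512)         # features of the user's chosen shoe

index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(catalogue_features)
distances, neighbours = index.kneighbors(query_features)
print(neighbours)  # indices of the visually most similar shoes
```
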

PAIR Symposium 2017

The first People + AI Research Symposium brings together academics, researchers and artists to discuss such topics as augmented intelligence, model interpretability, and human–AI collaboration.

PAIR Symposium

The Symposium is part of the PAIR initiative, a Google artificial intelligence project, and is scheduled to be livestreamed on September 26, 2017, at 9 am (GMT-4).

The livestream content will be available at this link.

Morning Program:
1) Welcome: John Giannandrea (4:55); Martin Wattenberg and Fernanda Viegas (20:06)
2) Jess Holbrook, PAIR Google
(UX lead for the AI project. Talks about the concept of Human-centered Machine Learning)
3) Karrie …, University of Illinois
4)  Hae Won Park, MIT
5) Maya Gupta, Google
6) Antonio Torralba, MIT
7) John Zimmerman, Carnegie Mellon University

Webinar: ImageNet – Where have we been? Where are we going?

ACM Learning Webinar
ImageNet: Where have we been? Where are we going?
Speaker: Fei-Fei Li
Chief Scientist of AI/ML at Google Cloud; Associate Professor at Stanford, Director of Stanford A.I. Lab

Slides

Webinar abstract: It took nature and evolution more than 500 million years to develop a powerful visual system in humans. The journey for AI and computer vision is about half of a century. In this talk, Dr. Li will briefly discuss the key ideas and the cutting-edge advances in the quest for visual intelligence in computers, focusing on work done to develop ImageNet over the years.

_____

Some highlights of this webinar:

1) The impact of ImageNet on AI/ML research:
  • First, what’s ImageNet? It’s an image database, a “… large-scale ontology of images built upon the backbone of the WordNet structure”;
  • The article “ImageNet: A Large-Scale Hierarchical Image Database” (1) had ~4,386 citations on Google Scholar at the time of the talk;
  • The dataset gave rise to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) (2), a benchmark in image classification and object detection that has run annually since 2010;
  • Many ImageNet Challenge Contestants became Startups (e.g. Clarifai; VizSense);
  • ImageNet became a key driving force for deep learning and helped spread the culture of building structured, annotated datasets for specific domains:
Annotated datasets for specific domains.
  • Kaggle: a platform for predictive modeling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data

“Datasets – not algorithms – might be the key limiting factor to development of human-level artificial intelligence.” (Alexander Wissner-Gross, 2016)

2) The background of ImageNet
  • The beginning: Publication about ImageNet in CVPR (2009);
  • There are a lot of previous datasets that should be acknowledged:
Previous image datasets.
  • The reason ImageNet became so popular is that the dataset has the right characteristics for tackling Computer Vision (CV) tasks with a Machine Learning (ML) approach;
  • By 2005, the marriage of ML and CV became a trend in the scientific community;
  • There was a shift in the way ML was applied for visual recognition tasks: from a modeling-oriented approach to having lots of data.
  • This shift was partly enabled by the rapid growth of data on the internet, which created the opportunity to collect large-scale visual data.
3) From WordNet to ImageNet
  • ImageNet was built upon the backbone of the WordNet, a tremendous dataset that enabled work in Natural Language Processing (NLP) and related tasks.
  • What’s WordNet? It’s a large lexical database of English. The original paper (3) by George Miller et al. has been cited over 5,000 times. The database organizes over 150k words into 117k categories, establishing ontological and lexical relationships for NLP and related tasks.
  • The idea to move from language to image:
From WordNet to ImageNet.
  • Three steps shift:
    • Step 1: ontological structures based on wordnet;
    • Step 2: populate categories with thousands of images from the internet;
    • Step 3: clean bad results manually. By cleaning the errors you ensure your dataset is accurate.
From WordNet to ImageNet: three steps.
  • Three attempts to populate, train and test the dataset. The first two failed. The third succeeded thanks to a new technology that became available at the time: Amazon Mechanical Turk, a crowdsourcing platform. ImageNet had the help of 49k workers from 167 countries (2007-2010).
  • After three years, ImageNet goes live in 2009 (50M images organized by 10K concept categories)
4) What did they do right?
  • Based on ML needs, ImageNet targeted scale:
ImageNet: large-scale visual data
  • Besides, the database cared about:
    • image quality (high resolution to better replicate human visual acuity);
    • accurate annotations (to create a benchmarking dataset and advance the state of machine perception);
    • free of charge (to ensure immediate application and a sense of community -> democratization)
  • Emphasis on community: the ILSVRC challenge was launched, with its first run in 2010;
  • ILSVRC was inspired by PASCAL VOC (Pattern Analysis, Statistical Modelling, and Computational Learning), which ran from 2005 to 2012.
  • Participation and performance: the number of entries increased, classification (top-5) error went down, and the average precision for object detection went up (a short sketch of how top-5 error is computed follows the figure caption below):
Participation and performance at ILSVRC (2010-2017)
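
For reference, “top-5 error” counts a prediction as wrong only if the true class is absent from the model’s five highest-scoring classes. A small illustrative sketch (random scores, not ILSVRC data):

```python
import numpy as np

def top5_error(scores, true_labels):
    """scores: (n_images, n_classes) class scores; true_labels: (n_images,)."""
    top5 = np.argsort(-scores, axis=1)[:, :5]          # five highest-scoring classes
    hits = (top5 == true_labels[:, None]).any(axis=1)  # is the true class among them?
    return 1.0 - hits.mean()

scores = np.random.rand(10, 1000)           # e.g. 1000 ImageNet classes
labels = np.random.randint(0, 1000, 10)
print(top5_error(scores, labels))
```
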
5) Where has ImageNet invested effort, and where is it still investing?
  • Lack of detail: originally just one category was annotated per image. Object detection made it possible to recognize more than one class per image (through bounding boxes);
  • Hierarchical annotation:
Confusion matrix and sub-matrices of classifying the 7404 leaf categories in ImageNet7K, ordered by a depth-first traversal of the WordNet hierarchy (J. Deng, A. Berg & L. Fei-Fei, ECCV, 2010) (4)
  • Fine-grained recognition: recognize similar objects (class of cars, for example):
Fine-Grained Recognition (Gebru, Krause, Deng, Fei-Fei, CHI 2017)
6) Expected outcomes
  • ImageNet became a benchmark
  • It meant a breakthrough in object recognition
  • Machine learning advanced and changed dramatically
7) Unexpected outcomes
  • Neural nets became popular in academic research again
  • Together with the increase in accurate, available datasets and high-performance GPUs, they promoted a deep learning revolution:
  • Maximize specificity in ontological structures:
Maximizing specificity (Deng, Krause, Berg & Fei-Fei, CVPR 2012)
  • Still, relatively few works use ontological structures;
  • Human comparing versus machine comparing:
How humans and machines compare (Andrej Karpathy, 2014)
8) What lies ahead
  • moving from object recognition to human-level understanding (from perception to cognition):
This means more than recognizing objects: AI will allow scene understanding, that is, understanding the relations between people, actions and artifacts in an image.
  • That’s the concept behind Microsoft COCO (Common Objects in Context) (5), a “dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding”;
  • More recently there is the Visual Genome (6), a dataset, a knowledge base, an ongoing effort to connect structural image concepts to language:
    • Specs:
      • 108,249 images (COCO images)
      • 4.2M image descriptions
      • 1.8M Visual QA (7W)
      • 1.4M objects, 75.7K obj. classes
      • 1.5M relationships, 40.5K rel. classes
      • 1.7M attributes, 40.5K attr. classes
      • Vision and language correspondences
      • Everything mapped to WordNet Synsets
    • Exploratory interface:
The interface allows searching for images and selecting different image attributes.
  • The Visual Genome dataset was further used to advance the state of the art in CV:
    • Paragraph generation;
    • Relationship prediction;
    • Image retrieval with scene graph;
    • Visual question answering.
  • The future of vision intelligence relies upon the integration of perception, understanding, and action;
  • From now on, ImageNet ILSVRC challenge will be organized by Kaggle, a data science community that organizes competitions and makes datasets available.
References 

(1) J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009. 

(2) Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115, no. 3 (December 2015): 211–52. doi:10.1007/s11263-015-0816-y.

(3) Miller, George A. “WordNet: A Lexical Database for English.” Communications of the ACM 38, no. 11 (1995): 39–41.

(4) Deng, Jia, Alexander C. Berg, Kai Li, and Li Fei-Fei. “What Does Classifying More than 10,000 Image Categories Tell Us?” In European Conference on Computer Vision, 71–84. Springer, 2010. https://link.springer.com/chapter/10.1007/978-3-642-15555-0_6.

(5) Lin, Tsung-Yi, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. “Microsoft COCO: Common Objects in Context.” arXiv:1405.0312 [Cs], May 1, 2014. http://arxiv.org/abs/1405.0312.

(6) Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, et al. “Visual Genome.” Accessed September 27, 2017. https://pdfs.semanticscholar.org/fdc2/d05c9ee932fa19df3edb9922b4f0406538a4.pdf.

Your face in 3D

Reconstructing a 3-D model of a face is a fundamental Computer Vision problem that usually requires multiple images. But a recent publication presents an artificial intelligence approach to tackle this problem. And it does an impressive job!

In this work, the authors train a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. See more information at their project website.

Try their online demo!
Reference: Jackson, Aaron S., Adrian Bulat, Vasileios Argyriou, and Georgios Tzimiropoulos. “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression.” arXiv:1703.07834 [Cs], March 22, 2017. http://arxiv.org/abs/1703.07834.