A map that reveals patterns of arrangement of buildings

A dataset containing 125,192,184 computer-generated building footprints in all 50 US states is the source for The New York Times’s map of every building in America.

Published on October 12, the map represents every building in the US as a black speck, reflecting the built legacy of the United States.

The dataset was publicly released by Microsoft earlier this year. The company’s computer engineers trained a neural network to analyze satellite imagery and then trace the shapes of buildings across the country.

DNN architecture: the network’s foundation is ResNet34. The model is fully convolutional, meaning it can be applied to an image of any size (constrained only by GPU memory; 4096×4096 in this case).
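The practical consequence of a fully-convolutional design is that the output resolution simply scales with the input resolution, so no fixed input size is baked into the network. A minimal sketch of that property, using a plain numpy convolution rather than Microsoft’s actual ResNet34 model (the kernel and image sizes here are invented for illustration):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

kernel = np.ones((3, 3)) / 9.0  # a simple averaging filter

# The same convolutional operator accepts any input size;
# the output size grows with the input size.
small = conv2d_valid(np.random.rand(64, 64), kernel)
large = conv2d_valid(np.random.rand(128, 128), kernel)
print(small.shape)  # (62, 62)
print(large.shape)  # (126, 126)
```

The same holds for a stack of such layers, which is why one trained model can sweep satellite tiles of varying sizes.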

The map reveals patterns in the arrangement of buildings. Traditional road maps highlight streets and highways; here, they show up as a linear absence. As a result, “… you can read history in the transition from curving, paved-over cow paths in old downtowns to suburban sprawl; you can detect signals of wealth and poverty, sometimes almost next door to each other.”

South of New Orleans, it is possible to notice the layout of buildings along a narrow spit of land on either side of a Louisiana bayou, which may reflect the imprint of the region’s history under France: “… ‘long lot’ development, which stretched skinny holdings laterally away from important waterways. Geography shapes settlement, but culture does, as well.”
Buildings along Louisiana bayou

Experiments by Google explore how machines understand artworks

The Google Arts & Culture initiative promotes experiments at the crossroads of art and technology, created by artists and creative coders. I selected two experiments that apply Machine Learning methods to detect objects in photographs and artworks and to generate machine-based tags. These tags are then used to enhance the accessibility and exploration of cultural collections.

Tags and Life Tags

These two demo experiments explore how computers read and tag artworks through a Machine Learning approach.

Tags: without human intervention, keywords were generated by an algorithm also used in Google Photos, which analyzed the artworks by looking only at the images, without any metadata.

The user interface shows a list of tags (keywords), each followed by its number of occurrences in the artwork collection. Selecting the tag ‘man’ reveals artworks containing what the machine understands to be a man. Hovering over an artwork reveals the other tags detected in that specific representation.

Life Tags: over 4 million images from the LIFE magazine archives, organized into an interactive interface that looks like an encyclopedia. The terms of the “encyclopedia” were generated by an algorithm based on a deep neural network used in Google photo search, trained on millions of images and labels to recognize categories for labels and pictures.

Labels were clustered into categories using a nearest neighbor algorithm, which finds related labels based on image feature vectors. Each image has multiple labels linked to the elements that are recognized. The full-size image viewer shows dotted lines revealing the objects detected by the computer.
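As a rough sketch of this clustering idea (with invented labels and feature vectors, not Google’s actual embeddings or algorithm), related labels can be found by a nearest-neighbor search over cosine similarities between their feature vectors:

```python
import numpy as np

# Toy feature vectors standing in for image-derived label embeddings.
# Labels and values here are invented purely for illustration.
features = {
    "kitchen": np.array([0.90, 0.10, 0.00]),
    "oven":    np.array([0.80, 0.20, 0.10]),
    "stove":   np.array([0.85, 0.15, 0.05]),
    "beach":   np.array([0.10, 0.90, 0.20]),
    "ocean":   np.array([0.05, 0.95, 0.10]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_labels(query, k=2):
    """Return the k labels whose feature vectors are closest to the query's."""
    q = features[query]
    sims = {lbl: cosine(q, v) for lbl, v in features.items() if lbl != query}
    return sorted(sims, key=sims.get, reverse=True)[:k]

print(nearest_labels("kitchen"))  # ['stove', 'oven']
```

Grouping each label with its nearest neighbors yields clusters like the “kitchen” category shown in the interface.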

The overall interface of Life Tags looks like an encyclopedia
Kitchen is a category that clusters related labels using a nearest neighbor algorithm.
Selecting a specific photo expands it and reveals the labels recognized by the machine.

Digital Humanities 2018: a selection of sessions I would like to attend

As Digital Humanities 2018 approaches, I took some time to look at its program. Unfortunately, I didn’t have a contribution to submit this year, so I won’t attend the conference. But I had the pleasure of being a reviewer for this edition, and I’ll stay tuned on Twitter during the conference!

My main topic of interest in Digital Humanities bridges the analysis of large-scale visual archives and graphical user interfaces for browsing and making sense of them. So I selected the following contributions I would like to attend if I were at DH2018.


Distant Viewing with Deep Learning: An Introduction to Analyzing Large Corpora of Images

by Taylor Arnold, Lauren Tilton (University of Richmond)

Taylor and Lauren coordinate Distant Viewing, a laboratory that develops computational techniques to analyze moving image culture at a large scale. Previously, they contributed to Photogrammar, a web-based platform for organizing, searching, and visualizing 170,000 photographs. The project was first presented at Digital Humanities 2016 (abstract here), and I mentioned this work in my presentation at HDRIO2018 (slides here, Portuguese only).

Workshop and panel sessions:
  • Beyond Image Search: Computer Vision in Western Art History, with Miriam Posner, Leonardo Impett, Peter Bell, Benoit Seguin and Björn Ommer;
  • Computer Vision in DH, with Lauren Tilton, Taylor Arnold, Thomas Smits, Melvin Wevers, Mark Williams, Lorenzo Torresani, Maksim Bolonkin, John Bell, Dimitrios Latsis;
  • Building Bridges With Interactive Visual Technologies, with Adeline Joffres, Rocio Ruiz Rodarte, Roberto Scopigno, George Bruseker, Anaïs Guillem, Marie Puren, Charles Riondet, Pierre Alliez, Franco Niccolucci

Paper session: Art History, Archives, Media

  • The (Digital) Space Between: Notes on Art History and Machine Vision Learning, by Benjamin Zweig (from Center for Advanced Study in the Visual Arts, National Gallery of Art);
  • Modeling the Fragmented Archive: A Missing Data Case Study from Provenance Research, by Matthew Lincoln and Sandra van Ginhoven (from Getty Research Institute);
  • Urban Art in a Digital Context: A Computer-Based Evaluation of Street Art and Graffiti Writing, by Sabine Lang and Björn Ommer (from Heidelberg Collaboratory for Image Processing);
  • Extracting and Aligning Artist Names in Digitized Art Historical Archives by Benoit Seguin, Lia Costiner, Isabella di Lenardo, Frédéric Kaplan (from EPFL, Switzerland);
  • Métodos digitales para el estudio de la fotografía compartida. Una aproximación distante a tres ciudades iberoamericanas en Instagram [Digital methods for the study of shared photography: a distant approach to three Ibero-American cities on Instagram] (by Gabriela Elisa Sued)

Paper session: Visual Narratives
  • Computational Analysis and Visual Stylometry of Comics using Convolutional Neural Networks, by Jochen Laubrock and David Dubray (from University of Potsdam, Germany);
  • Automated Genre and Author Distinction in Comics: Towards a Stylometry for Visual Narrative, by Alexander Dunst and Rita Hartel (from University of Paderborn, Germany);
  • Metadata Challenges to Discoverability in Children’s Picture Book Publishing: The Diverse BookFinder Intervention, by Kathi Inman Berens, Christina Bell (from Portland State University and Bates College, United States of America)

Poster sessions:
  • Chromatic Structure and Family Resemblance in Large Art Collections — Exemplary Quantification and Visualizations (by Loan Tran, Poshen Lee, Jevin West and Maximilian Schich);
  • Modeling the Genealogy of Imagetexts: Studying Images and Texts in Conjunction using Computational Methods (by Melvin Wevers, Thomas Smits and Leonardo Impett);
  • A Graphical User Interface for LDA Topic Modeling (by Steffen Pielström, Severin Simmler, Thorsten Vitt and Fotis Jannidis)

Multiplicity project at the 123data exhibition in Paris

Multiplicity is a collective photographic portrait of Paris. Conceived and designed by Moritz Stefaner for the 123 data exhibition, this interactive installation provides an immersive dive into the image space spanned by hundreds of thousands of photos taken across the Paris city area and shared on social media.

Content selection and curation aspects

The original image dataset consisted of 6.2 million geo-located social media photos posted in Paris in 2017. However, for reasons not fully explained (perhaps technical constraints), a custom selection of 25,000 photos was chosen according to a list of criteria. Moritz highlights that his intention was not to measure the city, but to portray it: “Rather than statistics, the project presents a stimulating arrangement of qualitative contents, open for exploration and to interpretation — consciously curated and pre-arranged, but not pre-interpreted.” This curatorial method was used not only for data selection but also for bridging the t-SNE visualization and the grid visualization; watch the transition effect in the video below. As a researcher interested in user interfaces and visualization techniques that support knowledge discovery in digital image collections, I wonder whether such a curatorial approach could be adopted in Digital Humanities work.

Data Processing

Using machine learning techniques, the images are organized by similarity and image contents, allowing to visually explore niches and microgenres of image styles and contents. More precisely, it uses t-SNE dimensionality reduction to visualize the features from the last layer of a pre-trained neural network to cluster images of Paris. The author says: “I used feature vectors normally intended for classification to calculate pairwise similarities between the images. The map arrangement was calculated using t-SNE — an algorithm that finds an optimal 2D layout so that similar images are close together.”
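A minimal sketch of that last step, assuming scikit-learn is available and using random vectors in place of the real network features (the dimensions and cluster structure here are invented for illustration):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-ins for last-layer feature vectors of a pre-trained network:
# two loose clusters of 64-dimensional points.
features = np.vstack([
    rng.normal(0.0, 0.1, size=(15, 64)),
    rng.normal(1.0, 0.1, size=(15, 64)),
])

# t-SNE finds a 2D layout in which points with similar features end up
# close together; perplexity must be smaller than the number of samples.
layout = TSNE(n_components=2, perplexity=5.0, random_state=0).fit_transform(features)
print(layout.shape)  # (30, 2)
```

Each row of `layout` would then become the screen position of one photo in the map arrangement.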

While the t-SNE algorithm takes care of the clustering and neighborhood structure, manual annotations help identify curated map areas. These areas can be zoomed on demand, enabling close viewing of similar photos.


Introducing deepfakes

Have you heard about the so-called deepfakes? The word, a portmanteau of “deep learning” and “fake”, refers to a new AI-assisted human image synthesis technique that generates realistic face-swaps.

The technology behind deepfakes is relatively easy to understand. In short, you show a set of images of an individual to a machine (a computer program or an app such as FakeApp) and, through an artificial intelligence approach, it finds common ground between the two faces and stitches one over the other.
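Under the hood, the common face-swap setup trains one shared encoder with a separate decoder per identity; decoding person A’s latent code with person B’s decoder produces the swap. A toy, untrained numpy sketch of that wiring (layer shapes and sizes invented for illustration; real systems use deep convolutional networks):

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(dim_in, dim_out):
    """Random weight matrix standing in for a trained layer."""
    return rng.normal(0, 0.1, size=(dim_in, dim_out))

# One encoder shared by both identities: it learns a common face
# representation (pose, expression). Each identity gets its own decoder.
encoder   = linear(64 * 64, 128)   # image -> latent code
decoder_a = linear(128, 64 * 64)   # latent -> face A
decoder_b = linear(128, 64 * 64)   # latent -> face B

def swap_face(image_a, decoder=decoder_b):
    """Encode a picture of person A, decode with B's decoder:
    B's face rendered in A's pose and expression (the face swap)."""
    latent = image_a.reshape(-1) @ encoder
    return (latent @ decoder).reshape(64, 64)

fake = swap_face(rng.random((64, 64)))
print(fake.shape)  # (64, 64)
```

The swap works because the shared encoder is forced to represent both faces in the same latent space.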

The deepfake phenomenon started to draw attention after a 2017 porn scandal, when an anonymous Reddit user posting under the pseudonym “deepfakes” shared several face-swapped porn videos on the Internet.

Deepfakes in politics

Deepfakes have been used to misrepresent well-known politicians on video portals and in chatrooms. For example, the face of Argentine President Mauricio Macri has been replaced by that of Adolf Hitler:

Also, Angela Merkel’s face was replaced with Donald Trump’s.

In April 2018, Jordan Peele, in a video produced with BuzzFeed, demonstrated the dangerous potential of deepfakes: a man who looks just like Barack Obama says the following: “So, for instance, they could have me say things like ‘Killmonger was right’ or ‘Ben Carson is in the Sunken Place,’ or ‘President Trump is a total and complete dipshit.'”

Curating photography with neural networks

“Computed Curation” is a 95-foot-long accordion photobook created by a computer. Taking the human editor out of the loop, it uses machine learning and computer vision tools to curate a series of photos from Philipp Schmitt’s personal archive.

The book features 207 photos taken between 2013 and 2017. Considering both image content and composition, the algorithms uncover unexpected connections and interpretations among the photographs that a human editor might have missed.

A spread of the accordion book reads like this: on one page, a photograph is centered with a caption above it: “a harbor filled with lots of traffic” [confidence: 56,75%]. Location and date appear next to the photo, like a credit: Los Angeles, USA. November 2016. At the bottom of the photo, some tags are listed: “marina, city, vehicle, dock, walkway, sport venue, port, harbor, infrastructure, downtown”. On the next page, the same layout with different content: a picture captioned “a crowd of people watching a large umbrella” [confidence: 67,66%]. Location and date: Berlin, Germany. August 2014. Tags: “crowd, people, spring, festival, tradition”.

Metadata from the camera device (date and location) is collected using Adobe Lightroom. Visual features (tags and colors) are extracted from the photos using Google’s Cloud Vision API. Automated captions, with their corresponding confidence scores, are generated using Microsoft’s Cognitive Services API. Finally, image composition is analyzed using histograms of oriented gradients (HOGs). These components are then fed to a t-SNE algorithm, which sorts the images in a two-dimensional space according to their similarities. A genetic TSP algorithm computes the shortest path through the arrangement, thereby defining the page order. You can check out the process in the video below:
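The final ordering step can be sketched as follows. This uses a simple greedy nearest-neighbor heuristic in place of the genetic TSP solver the project actually used, with random points standing in for the t-SNE coordinates:

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-ins for the 2D t-SNE coordinates of each photo.
coords = rng.random((10, 2))

def nearest_neighbor_path(points):
    """Greedy TSP heuristic: repeatedly hop to the closest unvisited
    point. A short path through the 2D arrangement = the page order,
    so neighboring pages hold visually similar photos."""
    unvisited = list(range(1, len(points)))
    path = [0]
    while unvisited:
        last = points[path[-1]]
        nxt = min(unvisited, key=lambda i: np.linalg.norm(points[i] - last))
        path.append(nxt)
        unvisited.remove(nxt)
    return path

page_order = nearest_neighbor_path(coords)
print(page_order)  # a permutation of 0..9
```

A genetic solver searches over many candidate paths instead of committing greedily, but the objective (shortest traversal of the similarity map) is the same.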



Mind-reading machines

A new AI model sort of reconstructs what you see from brain scans.

Schematics of our reconstruction approach. (A) Model training. We use an adversarial training strategy adopted from Dosovitskiy and Brox (2016b), which consists of 3 DNNs: a generator, a comparator, and a discriminator. The training images are presented to a human subject, while brain activity is measured by fMRI. The fMRI activity is used as an input to the generator. The generator is trained to reconstruct the images from the fMRI activity to be as similar to the presented training images in both pixel and feature space. The adversarial loss constrains the generator to generate reconstructed images that fool the discriminator to classify them as the true training images. The discriminator is trained to distinguish between the reconstructed image and the true training image. The comparator is a pre-trained DNN, which was trained to recognize the object in natural images. Both the reconstructed and true training images are used as an input to the comparator, which compares the image similarity in feature space. (B) Model test. In the test phase, the images are reconstructed by providing the fMRI activity of the test image as the input to the generator. (Shen et al., 2018)
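A simplified sketch of the generator’s training objective described in the caption, with toy stand-ins for the comparator and discriminator (the weights and shapes here are invented; the real model uses deep networks and fMRI inputs):

```python
import numpy as np

def generator_loss(reconstructed, target, comparator, discriminator,
                   w_pix=1.0, w_feat=1.0, w_adv=0.01):
    """Combined objective: pixel-space loss + feature-space loss
    (via the comparator) + adversarial loss (via the discriminator).
    `comparator` maps an image to a feature vector; `discriminator`
    returns the probability that an image is a real training image."""
    pixel_loss = np.mean((reconstructed - target) ** 2)
    feature_loss = np.mean((comparator(reconstructed) - comparator(target)) ** 2)
    # The generator wants the discriminator to output "real" (prob -> 1).
    adversarial_loss = -np.log(discriminator(reconstructed) + 1e-8)
    return w_pix * pixel_loss + w_feat * feature_loss + w_adv * adversarial_loss

# Toy stand-ins for the two auxiliary networks:
comparator = lambda img: img.mean(axis=0)   # fake "feature vector"
discriminator = lambda img: 0.3             # fake "realness" score
loss = generator_loss(np.zeros((8, 8)), np.ones((8, 8)), comparator, discriminator)
print(loss > 0)  # True
```

In training, this scalar would be minimized with respect to the generator’s weights while the discriminator is trained on the opposite objective.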

Check out the journal article here.

Reference: Shen, Guohua, Kshitij Dwivedi, Kei Majima, Tomoyasu Horikawa, and Yukiyasu Kamitani. “End-to-End Deep Image Reconstruction from Human Brain Activity.” BioRxiv, February 27, 2018, 272518. https://doi.org/10.1101/272518.

A computer vision algorithm for identifying images in different lighting

Computer vision has come a long way since ImageNet, a large, open-source dataset of labeled images, was released in 2009 for researchers to use to train AI systems. But images with tricky or bad lighting can still confuse algorithms.

A new paper by researchers from MIT and DeepMind details a process that can identify images in different lighting without having to hand-code rules or train on a huge dataset. The process, called a rendered intrinsics network (RIN), automatically separates an image into reflectance, shape, and lighting layers. It then recombines the layers into a reconstruction of the original image.
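The recombination step can be illustrated with a simple Lambertian shading model: a pixel’s value is its reflectance scaled by how directly its surface faces the light. This is only a hand-written sketch of the idea, whereas RIN learns both the decomposition and the recombination with neural networks:

```python
import numpy as np

def recombine(reflectance, normals, light_dir):
    """Recombine intrinsic layers into an image under Lambertian
    shading: pixel = reflectance * max(0, normal . light)."""
    l = light_dir / np.linalg.norm(light_dir)
    shading = np.clip(normals @ l, 0.0, None)       # (H, W) shading layer
    return reflectance * shading[..., None]         # (H, W, 3) image

h, w = 4, 4
reflectance = np.full((h, w, 3), 0.5)               # flat gray surface colour
normals = np.zeros((h, w, 3)); normals[..., 2] = 1  # surface facing the camera
image = recombine(reflectance, normals, np.array([0.0, 0.0, 1.0]))
print(image.shape)   # (4, 4, 3)
print(image[0, 0])   # [0.5 0.5 0.5]
```

Because the three layers are explicit, changing only `light_dir` relights the scene without touching reflectance or shape, which is what makes the decomposition useful under tricky lighting.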

AI is learning how to invent new fashions

In a paper published on the arXiv, researchers from the University of California and Adobe have outlined a way for AI not only to learn a person’s style but to create computer-generated images of items that match that style. This kind of computer vision task is being called “predictive fashion” and could let retailers create personalized pieces of clothing.

The model can be used for both personalized recommendation and design. Personalized recommendation is achieved by using a ‘visually aware’ recommender based on Siamese CNNs; generation is achieved by using a Generative Adversarial Net to synthesize new clothing items in the user’s personal style. (Kang et al., 2017).
Reference: Kang, Wang-Cheng, Chen Fang, Zhaowen Wang, and Julian McAuley. “Visually-Aware Fashion Recommendation and Design with Generative Image Models.” arXiv:1711.02231 [Cs], November 6, 2017. http://arxiv.org/abs/1711.02231.
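The recommendation half can be sketched as a “visually aware” scoring function: a learned user embedding dotted with each item’s CNN feature vector, recommending the best-scoring item. Both vectors below are random stand-ins; the paper’s Siamese CNN training and GAN generation are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(3)
user_style = rng.normal(size=16)          # learned user embedding (stand-in)
item_features = rng.normal(size=(5, 16))  # CNN features of 5 clothing items

# Score each item by how well it matches the user's visual style,
# then recommend the highest-scoring one.
scores = item_features @ user_style
best = int(np.argmax(scores))
print(best)
```

In the full system, the GAN would then synthesize new items whose features score highly under the same user embedding.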

Artificial Intelligence that can create convincing spoof photo and video

I wonder whether Peter Burke would rethink the documentary and historical status of photography now that AI and deep learning systems (like generative adversarial networks, or GANs) are being used to create fake yet believable images at scale.

Reproduction from Ian Goodfellow’s presentation at EmTech MIT 2017.
Reference: Snow, J. “AI Could Send Us Back 100 Years When It Comes to How We Consume News.” MIT Technology Review. Accessed November 9, 2017. https://www.technologyreview.com/s/609358/ai-could-send-us-back-100-years-when-it-comes-to-how-we-consume-news/.