The last seminar held by the Vision and Graphics Laboratory was about data mining in historical documents. Marcelo Ribeiro, a master's student at the Applied Mathematics School of the Getúlio Vargas Foundation (EMAp/FGV), presented results from applying topic modeling and natural language processing to the analysis of historical documents. This work was previously presented at the first International Digital Humanities Conference held in Brazil (HDRio2018) and had Renato Rocha Souza (professor and researcher at EMAp/FGV) and Alexandre Moreli (professor and researcher at USP) as co-authors.
The database used is part of the CPDOC-FGV collection and essentially comprises historical documents from the 1970s belonging to Antonio Azeredo da Silveira, former Minister of Foreign Affairs of Brazil.
• more than 10,000 documents
• more than 66,000 pages
• more than 14 million tokens (whether dictionary words or not)
• 5 languages, mainly Portuguese
Existing projects on visualization-based interfaces (interfaces that enable navigation through visualization) for cultural collections usually focus on making their content more accessible to specialists and the public.
Possibly one of the first attempts to explore new forms of knowledge discovery in cultural collections was SFMOMA ArtScope, developed by Stamen Design in 2007 (now decommissioned). The interface allows users to explore more than 6,000 artworks in a grid-based, zoomable visualization. Navigating the collection follows a visualization-first paradigm that is mainly exploratory (although the interface enables navigation through keyword search, the visualization canvas is clearly the protagonist). The artworks’ thumbnails are visually organized by when they were purchased by the museum. The user can pan the canvas by dragging it, and a lens serves as a selection tool, magnifying the selected work and revealing detailed information about it.
ArtScope is an attractive interface that offers the user an overview of the size and content of SFMOMA’s collection. However, the artworks on the canvas are organized only by time of acquisition, which is not very informative for most users (perhaps only for museum staff). Other dimensions (authorship, creation date, technique, subject, etc.) cannot be filtered or used to visually organize the canvas.
The video below illustrates the interface navigation:
Multiplicity is a collective photographic portrait of Paris. Conceived and designed by Moritz Stefaner for the 123 data exhibition, this interactive installation provides an immersive dive into the image space spanned by hundreds of thousands of photos taken across the Paris city area and shared on social media.
Content selection and curation aspects
The original image dataset consisted of 6.2 million geolocated social media photos posted in Paris in 2017. However, for reasons not fully explained (perhaps technical constraints?), a custom selection of 25,000 photos was made according to a list of criteria. Moritz highlights that his intention was not to measure the city, but to portray it. He says: “Rather than statistics, the project presents a stimulating arrangement of qualitative contents, open for exploration and to interpretation — consciously curated and pre-arranged, but not pre-interpreted.” This curatorial method was used not only for data selection but also for bridging the t-SNE visualization and the grid visualization. Watch the transition effect in the video below. As a researcher interested in user interfaces and visualization techniques that support knowledge discovery in digital image collections, I wonder whether such a curatorial method could be adopted in a Digital Humanities approach.
Using machine learning techniques, the images are organized by similarity and content, allowing users to visually explore niches and microgenres of image styles and subjects. More precisely, the installation applies t-SNE dimensionality reduction to the features from the last layer of a pre-trained neural network in order to cluster images of Paris. The author says: “I used feature vectors normally intended for classification to calculate pairwise similarities between the images. The map arrangement was calculated using t-SNE — an algorithm that finds an optimal 2D layout so that similar images are close together.”
While the t-SNE algorithm takes care of the clustering and neighborhood structure, manual annotations help identify curated map areas. These areas can be zoomed on demand, enabling close viewing of similar photos.
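As a minimal sketch of the layout step Stefaner describes, assuming scikit-learn and using random vectors as stand-ins for the last-layer CNN features (his actual network and feature dimensions are not specified):

```python
# Sketch of a Multiplicity-style map layout. The random vectors below
# are placeholders for per-image feature vectors from a pre-trained
# classification network.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 512))   # 200 "images", 512-d features

# t-SNE finds a 2D arrangement in which images with similar feature
# vectors land close together, producing the clustered map.
xy = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
print(xy.shape)
```

The resulting `xy` coordinates are what a front end would use to place each thumbnail on the zoomable canvas.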
The US military is funding an effort to determine whether AI-generated video and audio will soon be indistinguishable from the real thing—even for another AI.
The Defense Advanced Research Projects Agency (DARPA) is holding a contest this summer to generate the most convincing AI-created videos and the most effective tools to spot the counterfeits.
Some of the most realistic fake footage is created by generative adversarial networks, or GANs. GANs pit AI systems against each other to refine their creations and make a product real enough to fool the other AI. In other words, the final videos are literally made to dupe detection tools.
Why does it matter? The software to create these videos is becoming increasingly advanced and accessible, which could cause real harm. Earlier this year, actor and filmmaker Jordan Peele warned of the dangers of deepfakes by manipulating a video of a Barack Obama speech.
The images are grouped according to specific parameters that are automatically calculated by image analysis and text analysis from metadata. A high-dimensional space is then projected onto a 3D space, while preserving topological neighborhoods between images in the original space. More explanation about the dimensionality reduction can be read here.
The user interface allows four types of image arrangement: by color distribution, by technique, by description and by composition. As the mouse hovers over the items, an info box with some metadata is displayed on the left. The user can also perform rotation, zooming, and panning.
The author wrote on his site:
The project renounces a rigid ontology and does not force the items into premade categories. Rather, it lets clusters emerge from attributes contained in the images and texts themselves. Groupings can be derived but are not dictated.
Have you heard about the so-called deepfakes? The word, a portmanteau of “deep learning” and “fake”, refers to a new AI-assisted human image synthesis technique that generates realistic face-swaps.
The technology behind deepfakes is relatively easy to understand. In short, you show a machine (a computer program or an app such as FakeApp) a set of images of an individual and, through an artificial intelligence approach, it finds common ground between two faces and stitches one over the other.
The deepfake phenomenon started to draw attention after a 2017 porn scandal, when an anonymous Reddit user posting under the pseudonym “Deepfakes” published several face-swapped porn videos on the Internet.
Deepfakes in politics
Deepfakes have been used to misrepresent well-known politicians on video portals or chatrooms. For example, the face of the Argentine President Mauricio Macri has been replaced by the face of Adolf Hitler:
Also, Angela Merkel’s face was replaced with Donald Trump’s.
In April 2018, Jordan Peele, in partnership with BuzzFeed, demonstrated the dangerous potential of deepfakes with a video in which a man who looks just like Barack Obama says the following: “So, for instance, they could have me say things like ‘Killmonger was right’ or ‘Ben Carson is in the Sunken Place,’ or ‘President Trump is a total and complete dipshit.'”
Out of curiosity, I showed Edvard Munch’s “The Scream” to the Google Cloud Vision API and, to my surprise, its computer vision algorithm “saw” something I suspect most human eyes wouldn’t notice. The Cloud Vision API’s landmark detection feature, which detects popular natural and man-made structures within an image, printed out a bounding box along with the tag National Congress of Brazil over a specific area of the painting. Apparently, our Congress is a fright to the machine’s eyes.
Note: The Cloud Vision API doesn’t detect the National Congress of Brazil in all images of “The Scream” available on the web. The image I used was from this page.
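For readers who want to reproduce the experiment, a minimal sketch of the request the Cloud Vision REST endpoint (`images:annotate`) expects for landmark detection; the image bytes and API key are placeholders, and the actual network call is left commented out since it requires credentials:

```python
import base64
import json

def landmark_request(image_bytes: bytes) -> dict:
    """Build the JSON body for POST https://vision.googleapis.com/v1/images:annotate."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LANDMARK_DETECTION", "maxResults": 5}],
        }]
    }

# Placeholder bytes stand in for the downloaded painting image.
body = landmark_request(b"fake-image-bytes")
print(json.dumps(body)[:60])

# To actually call the API (requires an API key):
# import requests
# r = requests.post(
#     "https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY",
#     json=body)
# print(r.json())  # landmarkAnnotations carry the tag and bounding box
```

The response's `landmarkAnnotations` field is where a tag like "National Congress of Brazil" and its bounding polygon would appear.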
The subject of the MoMA R&D 24 Salon, held on April 3rd, 2018, was the impact of artificially intelligent systems on our daily lives through the lens of imperfection. Inspired by Stephen Hawking’s famous quote “Without imperfection, you or I would not exist,” this salon raised interesting questions:
If superintelligent machines will be capable of human-level performance on the full range of tasks that are often thought to be uniquely human capacities, including general intelligence and moral reasoning, how will this impact the way in which we conceive of ourselves? Will this translate into a radical end to human exceptionalism? Or, rather, will this force us to conclude that the most profound essence of our human nature lies in our fallibility? Will AI teach us that imperfection is what makes us human?
During the paper session “Social networks and visualizations”, held on April 11 at HDRio2018 Congress, I presented the work “Perspectivas para integração do Design nas Humanidades Digitais frente ao desafio da análise de artefatos visuais” (“Perspectives for integrating Design in Digital Humanities in the face of the challenge of visual artifacts analysis”).
In this work, I outline initial considerations of a broader, ongoing research effort that reflects on the contributions the field of Design can offer to the conception of a graphical user interface that, together with computer vision and machine learning technologies, supports browsing and exploration of large collections of images.
I believe my contribution raises three main discussions for the field of Digital Humanities:
The investigation of large collections of images (photographs, paintings, illustrations, videos, GIFs, etc.) using image recognition techniques through a Machine Learning approach;
The valorization of texts and media produced on social networks as a valid source of cultural heritage for Digital Humanities studies;
The integration of Design principles and methodologies (HCI and visualization techniques) into the development of tools to retrieve, explore and visualize large image collections.
Slides from this presentation can be accessed here (Portuguese only).
The 1st International Congress on Digital Humanities (HDRio2018), held at the Getulio Vargas Foundation (FGV) in Rio de Janeiro from April 9 to 13, 2018, initiated a broad, international debate on this relevant and emerging field in Brazil. It was a timely opportunity for academics, scientists and technologists from the Arts, Culture and Social Sciences, the Humanities and Computation to reflect on, among other topics, the impact of information technologies, communication networks and the digitization of collections and processes on individuals’ daily lives, and their effects on local and global institutions and societies, especially in the Brazilian reality.
HDRio2018’s program included opening and closing ceremonies, 6 workshops, 8 panels, 8 paper sessions (featuring 181 presentations) and 1 poster session. Accepted papers can be found here.
Organizers: The Laboratory Of Digital Humanities – LHuD from Centre for Research and Documentation of Contemporary History of Brazil (CPDOC) at Getulio Vargas Foundation (FGV) and the Laboratory for Preservation and Management of Digital Collections (LABOGAD) at Federal University of the State of Rio de Janeiro (UNIRIO).