Pauliceia 2.0: collaborative mapping of the history of São Paulo (1870-1940)

The Digital Humanities Laboratory (LHuD) of the School of Social Sciences (FGV CPDOC) is organizing an open lecture about the project “Pauliceia 2.0: collaborative mapping of the history of São Paulo (1870-1940)”, coordinated by professor Luis Ferla (Unifesp).

The project developed and made available a historical digital cartographic base of the city of São Paulo, referring to the period of its urban-industrial modernization (1870-1940). The lecture aims to discuss the online platform, disseminate its use and motivate the participation of scholars. The digital cartographic database is associated with an interface that allows interactivity and collaboration: researchers can both search for spatializable events on the map and feed the database with other geolocated events.

The lecture will take place at Acervo CPDOC (Rua Jornalista Orlando Dantas, 60, Botafogo, Rio de Janeiro) on May 29, 2019 (2:00 p.m.). Further information and registration can be found here.

The project was sponsored by Fapesp's eScience program.



PixPlot is a project by the Yale Digital Humanities Lab team. The tool facilitates the dynamic exploration of tens of thousands of images. Inspired by Benoît Seguin et al.’s paper at DH Krakow (2016), PixPlot uses the penultimate layer of a pre-trained convolutional neural network for image captioning to derive a robust featurization space in 2,048 dimensions.
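
To make the idea of a featurization space more concrete: once every image is a vector, "similar images" can be read as "vectors pointing in a similar direction". The sketch below is only an illustration of that idea, not PixPlot's code. The file names and the 4-dimensional toy vectors are made up and stand in for real 2,048-dimensional network features, and cosine similarity is an assumed choice of measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query, features):
    """Return the name of the image whose vector is most similar to `query`."""
    return max(features, key=lambda name: cosine_similarity(query, features[name]))

# Hypothetical toy vectors; real features would have 2,048 dimensions.
features = {
    "portrait_a.jpg": [0.9, 0.1, 0.0, 0.2],
    "portrait_b.jpg": [0.8, 0.2, 0.1, 0.1],
    "landscape.jpg": [0.0, 0.1, 0.9, 0.7],
}
query = [0.85, 0.15, 0.05, 0.15]
nearest = most_similar(query, features)
```

With these toy values the query lands next to the two portrait vectors and far from the landscape one, which is the behaviour the clustering relies on.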

Improved Dimensionality Reduction

In order to collapse those 2,048 dimensions into something that can be rendered on a computer screen, we turned to Uniform Manifold Approximation and Projection (UMAP), a dimensionality reduction technique similar to t-Distributed Stochastic Neighbor Embedding (t-SNE) that seeks to preserve both local clusters and an interpretable global shape.

Dynamic Visualization

The resulting WebGL-powered visualization consists of a two-dimensional projection within which similar images cluster together. Users can navigate the space by panning and zooming in and out of clusters of interest, or they can jump to designated “hotspots” that feature a representative image from each cluster, as identified by the computer.
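
The text does not say how the representative image of each cluster is chosen. One plausible reading, sketched here purely as an assumption, is to take the image closest to the centre of its cluster in the two-dimensional projection. The points, labels, and function name below are hypothetical.

```python
import math
from collections import defaultdict

def representatives(points, labels):
    """For each cluster label, return the index of the point nearest the cluster centroid."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    result = {}
    for label, members in clusters.items():
        # Centroid of the cluster in the 2D projection.
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        # The member with the smallest Euclidean distance to the centroid.
        result[label] = min(
            members,
            key=lambda i: math.hypot(points[i][0] - cx, points[i][1] - cy),
        )
    return result

# Hypothetical 2D positions of six images and their cluster assignments.
points = [(0.0, 0.0), (1.0, 0.0), (0.4, 0.1), (10.0, 10.0), (12.0, 10.0), (11.0, 10.2)]
labels = [0, 0, 0, 1, 1, 1]
hotspots = representatives(points, labels)
```

Here `hotspots` maps each cluster label to the index of one image, which a viewer could then use as the jump target for that cluster.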

Future Developments

PixPlot provides new ways of engaging large-scale visual collections. Initial experiments underway at Yale use the tool to look at thousands of cultural heritage images held in the Beinecke Rare Book & Manuscript Library, Yale Center for British Art, and the Medical Historical Library.

Video recordings of Information Plus Conference

Information Plus is a biennial conference on interdisciplinary practices in information design and visualization. The last edition took place in Potsdam, Germany, from 19 to 21 October.

Organizers have just updated the website with video recordings of the first conference day and photo documentation of the workshops, exhibition and dialog dinner. The remaining videos will follow over the next weeks.

Presentations I watched so far:

An interactive map that uses machine learning algorithms to detect fields and crops

OneSoil Map lets users explore and compare fields and crops in Europe and the United States (44 countries in total). The overview map helps users understand patterns of field sizes and crops in different regions. Zooming in shows a specific field in detail: the hectarage, the crop, and the field score. A key feature of the map is that it lets users see how these fields have changed over the past three years (2016 – 2018). The map reveals insights about local and global trends in crop production for farmers, advisers, and dealers. It helps to predict market performance at all levels and fosters smart decision-making.

Data collection and technology

The map was created by the startup OneSoil and is a continuation of the OneSoil digital farming platform, which automatically detects fields and identifies crops through satellite imagery analysis. The core technologies are based on AI, deep learning models, computer vision, IoT and original machine learning algorithms, which enable the company to process data in real time:

"First, we learned how to clean the satellite photos from artifacts to ensure correct processing of information. Second, we trained an algorithm to allocate field boundaries automatically. For the map, we simplified the boundaries so that the visualization is really fast. The accuracy of crop classification, or F1 score, is 0.91. Third, we trained another algorithm to automatically determine a crop that grows on a field. Fourth, we created what you can now see: the map."

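For readers unfamiliar with the metric quoted above, the F1 score is the harmonic mean of precision and recall. The sketch below shows the calculation for a single class. The counts are invented so that the result matches the quoted 0.91 and are not OneSoil's data, and the source does not say how the score was averaged across crops.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for one class."""
    if true_positives == 0:
        return 0.0  # avoids division by zero when nothing was correctly predicted
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Invented counts: 91 fields correctly labelled as a crop,
# 9 wrongly labelled as it, and 9 fields of that crop missed.
score = f1_score(true_positives=91, false_positives=9, false_negatives=9)
```

Because the harmonic mean is pulled towards the lower of the two values, a classifier cannot reach a high F1 score by doing well on precision alone or on recall alone.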


“Existência Numérica” – dataviz exhibition

The exhibition “Existência Numérica” (“Numerical Existence”), which will open on September 17 at Oi Futuro (Rio de Janeiro, Brazil), presents visualization works approached poetically. Migration flows, urban mobility in bicycle rental systems in New York, London and Rio, and investments in science and technology made in Brazil in recent years are some of the themes addressed by Brazilian and foreign artists who are at the forefront of data visualization, an area where art meets computer science.

The exhibition, conceived by Barbara Castro and Luiz Ludwig and curated by Doris Kosminsky (from Labvis Laboratory), will occupy the galleries of Oi Futuro, with dataviz projects by Pedro Miguel Cruz, Till Nagel & Christopher Pietsch (from the Urban Complexity Lab), Alice Bodanzky, Barbara Castro, Doris Kosminsky & Claudio Esperança and Luiz Ludwig.

A roundtable with the presence of artists and researchers will take place on September 19 from 3:00 p.m. to 6:00 p.m.

Hands-on activity on data visualization

Last week, I co-hosted a workshop at the Thought For Food Academy Program, an international event dedicated to engaging and empowering the next generation of innovators to solve the complex and important challenges facing our food system. To that end, the annual TFF Academy and Summit bring together interdisciplinary professionals from science, entrepreneurship, industry, policy, and design to explore, debate and create ‘what’s next’ in food and agriculture. The TFF Academy Program took place at Escola Eleva, Rio de Janeiro, from 23 to 26 July. The full TFF Program can be accessed here.

I had the opportunity to propose a hands-on activity on data visualization for spatial data analysis as part of the Big Data and GIS specialization track offered to young students and entrepreneurs from all over the world. In total, 35 participants of 20 different nationalities took part in the workshop. I co-hosted this track with Brittany Dahl, from ESRI Australia, and Vinicius Filier, from Imagem Soluções de Inteligência Geográfica.

The resources for this hands-on activity (slides and instructions) can be found on my personal website.

My hand-crafted presentation for the hands-on activity 🙂 See more here

A special thanks to Leandro Amorim, Henrique Ilidio and Erlan Carvalho, from Café Design Studio, who helped to line up this activity.


Data mining with historical documents

The last seminar held by the Vision and Graphics Laboratory was about data mining with historical documents. Marcelo Ribeiro, a master's student at the Applied Mathematics School of the Getúlio Vargas Foundation (EMAp/FGV), presented the results obtained by applying topic modeling and natural language processing to the analysis of historical documents. This work was previously presented at the first International Digital Humanities Conference held in Brazil (HDRIO2018) and had Renato Rocha Souza (professor and researcher at EMAp/FGV) and Alexandre Moreli (professor and researcher at USP) as co-authors.

The database used is part of the CPDOC-FGV collection and essentially comprises historical documents from the 1970s belonging to Antonio Azeredo da Silveira, former Minister of Foreign Affairs of Brazil.

The documents:

• +10 thousand documents
• +66 thousand pages
• +14 million tokens / words (whether found in a dictionary or not)
• 5 languages, mainly Portuguese

• Physical documents
• Images (.tif and .jpg)
• Texts (.txt)
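
As a rough illustration of what counting tokens "found in a dictionary or not" can look like for OCR output, here is a minimal sketch. The tiny word list, the sample sentence (including the misread "rnmistro"), and the function name are all hypothetical, and the project's actual tokenization is not described in this post.

```python
import re
from collections import Counter

# Runs of Unicode letters only, so accented Portuguese words stay whole.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+")

def token_stats(text, dictionary):
    """Count tokens in `text` and how many of them appear in `dictionary`."""
    tokens = [t.lower() for t in TOKEN_PATTERN.findall(text)]
    counts = Counter(tokens)
    in_dict = sum(n for tok, n in counts.items() if tok in dictionary)
    return {
        "total": len(tokens),
        "in_dictionary": in_dict,
        "out_of_dictionary": len(tokens) - in_dict,
    }

# Hypothetical mini-dictionary and OCR sample with one misread word.
dictionary = {"o", "ministro", "das", "relações", "exteriores"}
sample = "O ministro das Relações Exteriores rnmistro"
stats = token_stats(sample, dictionary)
```

Tokens that fall outside the dictionary are not necessarily errors: proper names and foreign words land there too, which matters for a collection written in five languages.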

The presentation addressed the steps of the project, from document digitization to the integration of results into the History-Lab platform.

The images below refer to the explanation of the OCR (Optical Character Recognition) phase and the topic modeling phase:

Presentation slides (in Portuguese) can be accessed here. This initiative is part of the History Lab project, organized by Columbia University, which uses data science methods to investigate history.

Visualizing cultural collections

Browsing the content from the Information Plus Conference (2016 edition), I bumped into a really interesting presentation on the use of graphical user interfaces and data visualization to support the exploration of large-scale digital cultural heritage.

One View is Not Enough: High-level Visualizations of Large Cultural Collections is a contribution by the Urban Complexity Lab, from the University of Applied Sciences Potsdam. Check the talk by Marian Dörk:

As many cultural heritage institutions, such as museums, archives, and libraries, digitize their assets, a pressing question arises: how can we give access to these large-scale and complex inventories? How can we present them in a way that lets people draw meaning from them, get inspired and entertained, and maybe even educated?

The Urban Complexity Lab tackles this open problem by investigating and developing graphical user interfaces and different kinds of data visualizations that explore cultural collections in a way that shows high-level patterns and relationships.

In this specific talk, Marian presents two projects conducted at the Lab. The first, DDB visualized, is a project in partnership with the Deutsche Digitale Bibliothek. Four interactive visualizations make the vast extent of the German Digital Library visible and explorable. Periods, places and persons are three of the categories, while keywords provide links to browsable pages of the library itself.


The second, GEI – Digital, is a project in partnership with the Georg Eckert Institute. This data dossier provides multi-faceted perspectives on GEI-Digital, a digital library of historical schoolbooks created and maintained by the Georg Eckert Institute for International Textbook Research.