Reconstructing a 3-D model of a face is a fundamental Computer Vision problem that usually requires multiple images. But a recent publication presents an artificial intelligence approach to tackle this problem. And it does an impressive job!
In this work, the authors train a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. See more information at their project website.
Reference: Jackson, Aaron S., Adrian Bulat, Vasileios Argyriou, and Georgios Tzimiropoulos. “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression.” arXiv:1703.07834 [Cs], March 22, 2017. http://arxiv.org/abs/1703.07834.
This Fast Company article approaches the application of Machine Learning to Logo Design and touches the issue of whether or not robots and automation are coming to take designer’s jobs.
More specifically, the article describes Mark Maker, a web-based platform that generates logo designs.
But how does it work? I’ll quote Fast Company’s explanation: “In Mark Maker, you type in a word. The system then uses a genetic algorithm–a kind of program that mimics natural selection–to generate an endless succession of logos. When you like a logo, you click a heart, which tells the system to generate more logos like it. By liking enough logos, the idea is that Mark Maker can eventually generate one that suits your needs, without ever employing a human designer”.
I’m not sure if we can say this tool is actually applying design to create logos. Either way, it still a fun web toy. Give it a try!
An insightful video by Google Creative Lab explaining how intelligent machines perpetuates humans bias.
Just because something is based on data, doesn’t automatically make it neutral. Even with good intention, it’s impossible to separate ourselves from our own human biases. So our human biases become part of the technology we create in many differente ways.
According to a Fast Company article, Adobe is applying machine learning and image recognition to graphic and web design. Using Sensei, the company has created tools that automate designers’ tasks, like cropping photos and designing web pages.
Instead of a designer deciding on layout, colors, photos, and photo sizes, the software platform automatically analyzes all the input and recommends design elements to the user. Using image recognition techniques, basic photo editing like cropping is automated, and an AI makes design recommendations for the pages. Using photos already in the client’s database (and the metadata attached to those photos), the AI–which, again, is layered into Adobe’s CMS–makes recommendations on elements to include and customizations for the designer to make.
Should designers be worried? I guess not. Machine learning helps automate tedious and boring tasks. The vast majority of graphic designers don’t have to worry about algorithms stealing their jobs.
While machine learning is great for understanding large data sets and making recommendations, it’s awful at analyzing subjective things such as taste.
The challenge of teaching machines to understand the world without reproducing prejudices. Researchers from Virginia University have identified that intelligent systems have started to link the cooking action in images much more to women than men.
Just like search engines – which Google has as its prime example – do not work under absolute neutrality, free of any bias or prejudice, machines equipped with artificial intelligence trained to identify and categorize what they see in photos also do not work in a neutral way.
Reference: Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. “Men Also Like Shopping: Reducing Gender Bias Amplification Using Corpus-Level Constraints.” arXiv:1707.09457 [Cs, Stat], July 28, 2017. http://arxiv.org/abs/1707.09457.
By comparing 1.6 million pairs of photos taken seven years apart, researchers from MIT’s Collective Learning Group now used a new computer vision system to quantify the physical improvement or deterioration of neighborhoods in five American cities, in an attempt to identify factors that predict urban change.
The project is called Streetchange. An article introducing the article can be found here.
Reference:Naik, Nikhil, Scott Duke Kominers, Ramesh Raskar, Edward L. Glaeser, and César A. Hidalgo. “Computer Vision Uncovers Predictors of Physical Urban Change.” Proceedings of the National Academy of Sciences 114, no. 29 (July 18, 2017): 7571–76. doi:10.1073/pnas.1619003114.
This presentation is about Decomposing Bodies, a large-scale, lab-based, digital humanities project housed in the Visual Media Workshop at the University of Pittsburgh that is examining the system of criminal identification introduced in France in the late 19th century by Alphonse Bertillon.
Data: System of criminal identification from American prisoners from Ohio.
Tool: OpenFace. Free and open source face recognition with deep neural networks.
Goal: An end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion and also the ability to browse through the original data and generated metadata using a web interface.
Mechanical Turk: we need to talk about it”: consider Mechanical Turk if public domain data and task is easy.
Findings: Humans deal very well with understanding discrepancies. We should not ask the computer to find these discrepancies to us, but we should build visualizations that allow us to visually compare images and identify de similarities and discrepancies.
2) Distant Viewing TV (Taylor Arnold and Lauren Tilton, University of Richmond)
Distant Viewing TV applies computational methods to the study of television series, utilizing and developing cutting-edge techniques in computer vision to analyze moving image culture on a large scale.
This presentation related the University of Oxford’s Visual Geometry Group’s experience in making images computationally addressable for humanities research.
The Visual Geometry Group has built a number of systems for humanists, variously implementing (i) visual search, in which an image is made retrievable; (ii) comparison, which assists the discovery of similarity and difference; (iii) classification, which applies a descriptive vocabulary to images; and (iv) annotation, in which images are further described for both computational and offline analysis
Idea: Visual Search for the Era of Big Data is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the EPSRC (Engineering and Physical Sciences Research Council), and will run from 2015 – 2020.
Objectives: to carry out fundamental research to develop next generation computer vision methods that are able to analyse, describe and search image and video content with human-like capabilities. To transfer these methods to industry and to other academic disciplines (such as Archaeology, Art, Geology, Medicine, Plant sciences and Zoology)
This is a technical demo of the large-scale on-the-fly web search technologies which are under development in the Oxford University Visual Geometry Group, using data provided by BBC R&D comprising over five years of prime-time news broadcasts from six channels. The demo consists of three different components, which can be used to query the dataset on-the-fly for three different query types: object search, image search and people search.
The objective of this research is to find objects in paintings by learning classifiers from photographs on the internet. There is a live demo that allows a user to search for an object of their choosing (such as “baby”, “bird”, or “dog, for example) in a dataset of over 200,000 paintings, in a matter of seconds.
It allows computers to recognize objects in images, what is distinctive about our work is that we also recover the 2D outline of the object. Currently, the project has trained this model to recognize 20 classes. The demo allows the user to test our algorithm on their images.
An introduction of basic notions about the challenges of computer vision. A feeling of the simple, low-level operations necessary for the next stage.
Basic image operations: scikit-image
Face-object identification + identification: dlib
Deep Learning: Keras
What is CV?
How to gain high-level understanding from digital images or videos.
It tries to resolve tasks that humans can do (Wikipedia)
Human Vision System (HVS) versus Digital Image Processing (what the computer sees)
– Jupyter system (an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text);
– perform basic image operations;
– Play with different convolutions to develop intuition.
Hands-on Part II – Deep Learning and its application
During the DH2017 conference in Montreal, I attended the ‘Computer Vision in Digital Humanities‘ workshop organized by AVinDH SIG (Special Interest Group AudioVisual material in Digital Humanities). All information about the workshop can be found here.
An abstract about the workshop was published on DH2017 Proceedings and can be found here.
This workshop focus on how computer vision can be applied within the realm of Audiovisual Materials in Digital Humanities. The workshop included:
The article “Human-Centered Machine Learning” by Jess Holbrook¹, addresses how ML is causing UX designers to rethink, restructure, displace, and consider new possibilities for every product or service they build.
Both texts made me think about the image search and comparison engine I’m proposing through an user-centered point of view. I can take the following user needs identified by Martin Wattenberg and Fernanda Viégas and try to apply them to the product I’m planning to implement and evaluate:
Engineers and researchers: AI is built by people. How might we make it easier for engineers to build and understand machine learning systems? What educational materials and practical tools do they need?
Domain experts: How can AI aid and augment professionals in their work? How might we support doctors, technicians, designers, farmers, and musicians as they increasingly use AI?
Everyday users: How might we ensure machine learning is inclusive, so everyone can benefit from breakthroughs in AI? Can design thinking open up entirely new AI applications? Can we democratize the technology behind AI?
In my opinion, my research expects to attend the needs of “domain experts” (eg. designers and other professionals interested on visual discovery) and everyday users. But how to design this image search and comparison engine through a ML-driven approach or what Jess Holbrook calls “Human-Centered Machine Learning”? In his text, there are 7 steps to stay focused on the user when designing with ML. However, I want to highlight a distinction between what I see to be a full ML-driven product (in the way of what Google creates) and what I understand to be a product that shows a ML approach in its conception but not in its entirety (that is, the engine proposed in my research).
A full ML-driven product results in an interface that dynamically responds to the user input. That is, the pre-trained model performs tasks during user interaction and the interface presents the desired output for the user input. Or even more: the model can be retrained from the user’s data during interaction and the interface will dynamically show the results.
On the other hand, in my research, the ML approach will be only used during the image classification phase, which does not include the final user. After we collect all images from Twitter (or Instagram) these data will be categorized by Google Vision API, which is driven by ML algorithms. The results of Google’s classification will be then selected and used to organize the images on a multimedia interface. Finally, the user will be able to search for image trough text queries or by selecting filters based on ML image classification. However, during user interaction, there are no ML tasks being performed.
1 UX Manager and UX Researcher in the Research and Machine Intelligence group at Google