The first People + AI Research Symposium brings together academics, researchers and artists to discuss such topics as augmented intelligence, model interpretability, and human–AI collaboration.
The Symposium is part of the PAIR initiative, a Google Artificial Intelligence project, and is scheduled to be livestreamed on September 26, 2017, at 9 am (GMT-4).
The livestream content will be available at this link.
1) Welcome: John Giannandrea (4:55); Martin Wattenberg and Fernanda Viegas (20:06)
2) Jess Holbrook, PAIR Google (UX lead for the AI project. Talks about the concept of Human-centered Machine Learning)
3) Karrie …, University of Illinois
4) Hae Won Park, MIT
5) Maya Gupta, Google
6) Antonio Torralba, MIT
7) John Zimmerman, Carnegie Mellon University
Webinar abstract: It took nature and evolution more than 500 million years to develop a powerful visual system in humans. The journey for AI and computer vision is about half of a century. In this talk, Dr. Li will briefly discuss the key ideas and the cutting-edge advances in the quest for visual intelligence in computers, focusing on work done to develop ImageNet over the years.
Some highlights of this webinar:
1) The impact of ImageNet on AI/ML research:
First, what’s ImageNet? It’s an image database, a “… large-scale ontology of images built upon the backbone of the WordNet structure”;
ImageNet became a key driving force for the adoption of deep learning and helped spread the culture of building structured datasets for specific domains:
Kaggle: a platform for predictive modeling and analytics competitions in which companies and researchers post data and statisticians and data miners compete to produce the best models for predicting and describing the data
“Datasets – not algorithms – might be the key limiting factor to development of human-level artificial intelligence.” (Alexander Wissner-Gross, 2016)
2) The background of ImageNet
The beginning: Publication about ImageNet in CVPR (2009);
There are a lot of previous datasets that should be acknowledged:
ImageNet became so popular because it has the right characteristics for tackling Computer Vision (CV) tasks with a Machine Learning (ML) approach;
By 2005, the marriage of ML and CV became a trend in the scientific community;
There was a shift in the way ML was applied to visual recognition tasks: from a modeling-oriented approach to one that relies on lots of data.
This shift was partly enabled by the rapid growth of internet data, which created the opportunity to collect large-scale visual data.
3) From WordNet to ImageNet
ImageNet was built upon the backbone of WordNet, a tremendous dataset that enabled work in Natural Language Processing (NLP) and related tasks.
What’s WordNet? It’s a large lexical database of English. The original paper (3) by George Miller et al. has been cited over 5k times. The database organizes over 150k words into 117k categories. It establishes ontological and lexical relationships used in NLP and related tasks.
The idea to move from language to image:
A three-step shift:
Step 1: ontological structures based on WordNet;
Step 2: populate categories with thousands of images from the internet;
Step 3: clean bad results manually. By cleaning the errors you ensure your dataset is accurate.
There were three attempts to populate, train and test the dataset. The first two failed. The third was successful thanks to a new technology that became available at that time: Amazon Mechanical Turk, a crowdsourcing platform. ImageNet had the help of 49k workers from 167 countries (2007-2010).
After three years, ImageNet went live in 2009 (50M images organized by 10K concept categories)
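The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the ImageNet team's actual pipeline: the tiny ontology, the image file names, the worker votes and the helper names (`leaf_categories`, `clean_by_majority_vote`) are all hypothetical, and majority voting is assumed here as one simple way to do the manual cleaning.

```python
# Step 1: a toy ontology in the spirit of WordNet (hypothetical, hand-written).
ONTOLOGY = {
    "animal": ["dog", "cat"],
    "dog": ["husky", "beagle"],
    "cat": [],
    "husky": [],
    "beagle": [],
}


def leaf_categories(ontology, root):
    """Return the most specific categories reachable from root."""
    children = ontology.get(root, [])
    if not children:
        return [root]
    leaves = []
    for child in children:
        leaves.extend(leaf_categories(ontology, child))
    return leaves


# Step 2: candidate images per category, as if returned by a web search.
CANDIDATES = {
    "husky": ["husky_01.jpg", "husky_02.jpg", "wolf_01.jpg"],
}

# Step 3: each candidate is shown to several workers who vote yes (True) or no (False).
VOTES = {
    "husky_01.jpg": [True, True, True],
    "husky_02.jpg": [True, False, True],
    "wolf_01.jpg": [False, False, True],
}


def clean_by_majority_vote(images, votes):
    """Keep an image only if more than half of its votes are positive."""
    kept = []
    for image in images:
        ballots = votes.get(image, [])
        if ballots and sum(ballots) > len(ballots) / 2:
            kept.append(image)
    return kept


categories = leaf_categories(ONTOLOGY, "animal")
clean_husky = clean_by_majority_vote(CANDIDATES["husky"], VOTES)
```

In this made-up example the wolf photo is voted out of the "husky" category, which is the kind of error the manual cleaning step is meant to catch.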
4) What did they do right?
Based on ML needs, ImageNet targeted scale:
Besides, the database cared about:
image quality (high resolution to better replicate human visual acuity);
accurate annotations (to create a benchmarking dataset and advance the state of machine perception);
free of charge (to ensure immediate application and a sense of community -> democratization)
Emphasis on community: the ILSVRC challenge was launched in 2010;
ILSVRC was inspired by the PASCAL VOC challenge (PASCAL stands for Pattern Analysis, Statistical Modelling, and Computational Learning), which ran from 2005 to 2012.
Participation and performance: the number of entries increased; classification errors (top-5) went down; the average precision for object detection went up:
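For readers unfamiliar with the top-5 metric mentioned above: a prediction counts as correct if the true label is among the five classes the model scores highest. Here is a minimal sketch, assuming each image comes with a list of per-class scores; the function name `top_k_error` and the scores are made up for illustration.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is not among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Made-up scores for 3 images over 6 classes.
scores = [
    [0.05, 0.12, 0.40, 0.20, 0.15, 0.08],  # true class 2 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true class 5 is ranked last (sixth)
    [0.12, 0.50, 0.05, 0.03, 0.20, 0.10],  # true class 0 is ranked third
]
labels = [2, 5, 0]

top5 = top_k_error(scores, labels, k=5)
top1 = top_k_error(scores, labels, k=1)
```

With these numbers the top-5 error is 1/3 (only the second image is missed) while the top-1 error is 2/3, which shows why top-5 is the more forgiving of the two measures.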
5) Where has ImageNet invested effort, and where is it still investing?
Lack of detail: just one category was annotated per image. Object detection made it possible to recognize more than one class per image (through bounding boxes);
Fine-grained recognition: recognize similar objects (different classes of cars, for example):
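Since bounding boxes come up above, it may help to see how a predicted box is usually compared with an annotated one. A common measure is intersection over union (IoU). The webinar notes do not mention IoU, so treat this as background; the box format `(x_min, y_min, x_max, y_max)` and the example coordinates are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)  # annotated box (made up)
prediction = (5, 0, 15, 10)    # predicted box (made up)
overlap = iou(ground_truth, prediction)  # 50 / 150
```

A detection is then typically counted as correct when its IoU with the annotated box exceeds some threshold chosen by the benchmark.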
6) Expected outcomes
ImageNet became a benchmark
It meant a breakthrough in object recognition
Machine learning advanced and changed dramatically
7) Unexpected outcomes
Neural Nets became popular in academic research again
Together with the growth of accurate, available datasets and high-performance GPUs, they drove a Deep Learning revolution:
Maximize specificity in ontological structures:
Still, relatively few works use ontological structures;
Human comparing versus machine comparing:
8) What lies ahead
moving from object recognition to human-level understanding (from perception to cognition):
That’s the concept behind Microsoft COCO (Common Objects in Context) (5), a “dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding”;
More recently there is the Visual Genome (6), a dataset, a knowledge base, an ongoing effort to connect structural image concepts to language:
The Visual Genome dataset was further used to advance the state of the art in CV:
Image retrieval with scene graphs;
Visual question answering
The future of vision intelligence relies upon the integration of perception, understanding, and action;
From now on, ImageNet ILSVRC challenge will be organized by Kaggle, a data science community that organizes competitions and makes datasets available.
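To make the scene-graph retrieval idea mentioned above a bit more concrete, here is a toy sketch. It assumes each image is described by a set of (subject, relation, object) triples and ranks images by how many query triples they contain. The image IDs, the triples and the `retrieve` helper are invented for illustration and are not taken from Visual Genome or from any published retrieval method.

```python
# Each image is described by (subject, relation, object) triples (made-up data).
SCENE_GRAPHS = {
    "img_001": {("man", "riding", "horse"), ("man", "wearing", "hat")},
    "img_002": {("woman", "riding", "bicycle"), ("dog", "next to", "bicycle")},
    "img_003": {("man", "feeding", "horse"), ("horse", "in", "field")},
}


def retrieve(query, scene_graphs):
    """Rank images by how many query triples they contain; drop images with none."""
    scored = []
    for image_id, triples in scene_graphs.items():
        matches = len(query & triples)
        if matches:
            scored.append((matches, image_id))
    # Most matches first; ties are broken by image ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [image_id for _, image_id in scored]


query = {
    ("man", "riding", "horse"),
    ("man", "wearing", "hat"),
    ("horse", "in", "field"),
}
results = retrieve(query, SCENE_GRAPHS)
```

Exact triple matching is the crudest possible scoring rule; it is used here only to show why structured descriptions allow more specific queries than a single label per image.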
(1) Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
(2) Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115, no. 3 (December 2015): 211–52. doi:10.1007/s11263-015-0816-y.
(3) Miller, George A. “WordNet: A Lexical Database for English.” Communications of the ACM 38, no. 11 (1995): 39–41.
(4) Deng, Jia, Alexander C. Berg, Kai Li, and Li Fei-Fei. “What Does Classifying More than 10,000 Image Categories Tell Us?” In European Conference on Computer Vision, 71–84. Springer, 2010. https://link.springer.com/chapter/10.1007/978-3-642-15555-0_6.
(5) Lin, Tsung-Yi, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. “Microsoft COCO: Common Objects in Context.” arXiv:1405.0312 [Cs], May 1, 2014. http://arxiv.org/abs/1405.0312.
(6) Krishna, Ranjay, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, et al. “Visual Genome.” Accessed September 27, 2017. https://pdfs.semanticscholar.org/fdc2/d05c9ee932fa19df3edb9922b4f0406538a4.pdf.
Reconstructing a 3-D model of a face is a fundamental Computer Vision problem that usually requires multiple images. But a recent publication presents an artificial intelligence approach to tackle this problem. And it does an impressive job!
In this work, the authors train a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. See more information at their project website.
Reference: Jackson, Aaron S., Adrian Bulat, Vasileios Argyriou, and Georgios Tzimiropoulos. “Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression.” arXiv:1703.07834 [Cs], March 22, 2017. http://arxiv.org/abs/1703.07834.
This Fast Company article looks at the application of Machine Learning to logo design and touches on the question of whether robots and automation are coming to take designers’ jobs.
More specifically, the article describes Mark Maker, a web-based platform that generates logo designs.
But how does it work? I’ll quote Fast Company’s explanation: “In Mark Maker, you type in a word. The system then uses a genetic algorithm–a kind of program that mimics natural selection–to generate an endless succession of logos. When you like a logo, you click a heart, which tells the system to generate more logos like it. By liking enough logos, the idea is that Mark Maker can eventually generate one that suits your needs, without ever employing a human designer”.
I’m not sure if we can say this tool is actually applying design to create logos. Either way, it is still a fun web toy. Give it a try!
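For the curious, the "like a logo, get more logos like it" loop from the quote can be sketched as a toy genetic algorithm. This is not Mark Maker's code: the three-gene logo encoding, the option lists, the mutation rate and the simulated user are all assumptions made up for this example.

```python
import random

# A "logo" is a tuple of three genes: (font, color, layout). All options are made up.
FONTS = ["serif", "sans", "mono", "script"]
COLORS = ["red", "blue", "green", "black"]
LAYOUTS = ["icon-left", "icon-top", "text-only"]
GENES = [FONTS, COLORS, LAYOUTS]


def random_logo(rng):
    return tuple(rng.choice(options) for options in GENES)


def crossover(parent_a, parent_b, rng):
    """Pick each gene from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(parent_a, parent_b))


def mutate(logo, rng, rate=0.2):
    """Occasionally replace a gene with a random option."""
    return tuple(
        rng.choice(options) if rng.random() < rate else gene
        for gene, options in zip(logo, GENES)
    )


def next_generation(population, liked, rng):
    """Keep the liked logos and breed the rest of the population from them."""
    parents = liked or population  # no likes yet: breed from everything
    children = list(liked)
    while len(children) < len(population):
        child = crossover(rng.choice(parents), rng.choice(parents), rng)
        children.append(mutate(child, rng))
    return children


def simulated_user_likes(logo):
    """Stand-in for the heart button: this pretend user likes blue sans-serif logos."""
    return logo[0] == "sans" and logo[1] == "blue"


rng = random.Random(0)
population = [random_logo(rng) for _ in range(12)]
for _ in range(10):
    liked = [logo for logo in population if simulated_user_likes(logo)]
    population = next_generation(population, liked, rng)
```

The hearts play the role of the fitness function: liked logos survive and become parents, so over the generations the population tends to drift toward whatever the user keeps clicking on.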
An insightful video by Google Creative Lab explaining how intelligent machines perpetuate human bias.
Just because something is based on data doesn’t automatically make it neutral. Even with good intentions, it’s impossible to separate ourselves from our own human biases. So our human biases become part of the technology we create in many different ways.
According to a Fast Company article, Adobe is applying machine learning and image recognition to graphic and web design. Using Sensei, the company has created tools that automate designers’ tasks, like cropping photos and designing web pages.
Instead of a designer deciding on layout, colors, photos, and photo sizes, the software platform automatically analyzes all the input and recommends design elements to the user. Using image recognition techniques, basic photo editing like cropping is automated, and an AI makes design recommendations for the pages. Using photos already in the client’s database (and the metadata attached to those photos), the AI–which, again, is layered into Adobe’s CMS–makes recommendations on elements to include and customizations for the designer to make.
Should designers be worried? I guess not. Machine learning helps automate tedious and boring tasks. The vast majority of graphic designers don’t have to worry about algorithms stealing their jobs.
While machine learning is great for understanding large data sets and making recommendations, it’s awful at analyzing subjective things such as taste.