Designing ML-driven products

The People + AI Research Initiative (PAIR), launched on 10th July 2017 by Google Brain Team, brings together researchers across Google to study and redesign the ways people interact with AI systems.

The article “Human-Centered Machine Learning” by Jess Holbrook¹, addresses how ML is causing UX designers to rethink, restructure, displace, and consider new possibilities for every product or service they build.

Both texts made me think about the image search and comparison engine I’m proposing through an user-centered point of view. I can take the following user needs identified by Martin Wattenberg and Fernanda Viégas and try to apply them to the product I’m planning to implement and evaluate:

  • Engineers and researchers: AI is built by people. How might we make it easier for engineers to build and understand machine learning systems? What educational materials and practical tools do they need?
  • Domain experts: How can AI aid and augment professionals in their work? How might we support doctors, technicians, designers, farmers, and musicians as they increasingly use AI?
  • Everyday users: How might we ensure machine learning is inclusive, so everyone can benefit from breakthroughs in AI? Can design thinking open up entirely new AI applications? Can we democratize the technology behind AI?

In my opinion, my research expects to attend the needs of “domain experts” (eg. designers and other professionals interested on visual discovery) and everyday users. But how to design this image search and comparison engine through a ML-driven approach or what Jess Holbrook calls “Human-Centered Machine Learning”? In his text, there are 7 steps to stay focused on the user when designing with ML. However, I want to highlight a distinction between what I see to be a full ML-driven product (in the way of what Google creates) and what I understand to be a product that shows a ML approach in its conception but not in its entirety (that is, the engine proposed in my research).

A full ML-driven product results in an interface that dynamically responds to the user input. That is, the pre-trained model performs tasks during user interaction and the interface presents the desired output for the user input. Or even more: the model can be retrained from the user’s data during interaction and the interface will dynamically show the results.

On the other hand, in my research, the ML approach will be only used during the image classification phase, which does not include the final user. After we collect all images from Twitter (or Instagram) these data will be categorized by Google Vision API, which is driven by ML algorithms. The results of Google’s classification will be then selected and used to organize the images on a multimedia interface. Finally, the user will be able to search for image trough text queries or by selecting filters based on ML image classification. However, during user interaction, there are no ML tasks being performed.


1 UX Manager and UX Researcher in the Research and Machine Intelligence group at Google

StreetStyle: Exploring world-wide clothing styles from millions of photos

Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends.

Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:1706.01869 [cs.CV]
(or arXiv:1706.01869v1 [cs.CV] for this version)

Content-based image retrieval

Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR) is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases (see this survey [1] for a recent scientific overview of the CBIR field). Content-based image retrieval is opposed to traditional concept-based approaches (see Concept-based image indexing).


[1] Content-based multimedia information retrieval: State of the art and challenges

Source: Wikipedia