The People + AI Research Initiative (PAIR), launched on 10 July 2017 by the Google Brain team, brings together researchers across Google to study and redesign the ways people interact with AI systems.
The article “Human-Centered Machine Learning” by Jess Holbrook¹ addresses how ML is causing UX designers to rethink, restructure, displace, and consider new possibilities for every product or service they build.
Both texts made me think about the image search and comparison engine I’m proposing from a user-centered point of view. I can take the following user needs identified by Martin Wattenberg and Fernanda Viégas and try to apply them to the product I’m planning to implement and evaluate:
- Engineers and researchers: AI is built by people. How might we make it easier for engineers to build and understand machine learning systems? What educational materials and practical tools do they need?
- Domain experts: How can AI aid and augment professionals in their work? How might we support doctors, technicians, designers, farmers, and musicians as they increasingly use AI?
- Everyday users: How might we ensure machine learning is inclusive, so everyone can benefit from breakthroughs in AI? Can design thinking open up entirely new AI applications? Can we democratize the technology behind AI?
In my opinion, my research aims to address the needs of “domain experts” (e.g. designers and other professionals interested in visual discovery) and everyday users. But how should this image search and comparison engine be designed through an ML-driven approach, or what Jess Holbrook calls “Human-Centered Machine Learning”? His text presents 7 steps for staying focused on the user when designing with ML. However, I want to highlight a distinction between what I see as a fully ML-driven product (of the kind Google creates) and what I understand to be a product that uses an ML approach in its conception but not in its entirety (that is, the engine proposed in my research).
A fully ML-driven product results in an interface that responds dynamically to user input. That is, the pre-trained model performs tasks during user interaction and the interface presents the desired output for the user’s input. Going further, the model can be retrained on the user’s data during interaction, and the interface will dynamically show the results.
In my research, on the other hand, the ML approach will be used only during the image classification phase, which does not involve the end user. After we collect all the images from Twitter (or Instagram), they will be categorized by the Google Vision API, which is driven by ML algorithms. The results of Google’s classification will then be selected and used to organize the images in a multimedia interface. Finally, the user will be able to search for images through text queries or by selecting filters based on the ML image classification. During user interaction, however, no ML tasks are performed.
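To make the distinction concrete, here is a minimal sketch of the query side of such an engine, assuming the classification phase has already run offline. The image names, the labels, and the helper functions `build_index` and `search` are all hypothetical; in the real pipeline the labels would be the selected results of Google’s classification rather than hard-coded values. The point is only that search reduces to a lookup over stored labels, with no model involved at interaction time.

```python
from collections import defaultdict

# Hypothetical output of the offline classification phase: image id -> labels.
# In the pipeline described above these labels would come from the Google
# Vision API; they are hard-coded here so the sketch runs without a network.
PRECOMPUTED_LABELS = {
    "img_001.jpg": ["dog", "grass", "outdoor"],
    "img_002.jpg": ["cat", "sofa", "indoor"],
    "img_003.jpg": ["dog", "beach", "outdoor"],
}


def build_index(labels_by_image):
    """Build an inverted index: label -> set of image ids carrying that label."""
    index = defaultdict(set)
    for image_id, labels in labels_by_image.items():
        for label in labels:
            index[label.lower()].add(image_id)
    return index


def search(index, query):
    """Return the images whose stored labels contain every word of the query.

    This is a plain set intersection over precomputed labels: no ML task is
    performed while the user interacts with the interface.
    """
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


index = build_index(PRECOMPUTED_LABELS)
print(search(index, "dog outdoor"))
print(search(index, "cat"))
```

Selecting a filter in the interface would map to the same lookup, with the chosen filter values taking the place of the typed query words.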
¹ UX Manager and UX Researcher in the Research and Machine Intelligence group at Google.