Seminar Series #1: Erdem Kaya

How can we interactively and visually explore very large, high-dimensional data? In this seminar, we lay out a number of design recommendations and demonstrate how we applied them to a real-world problem: credit card customer segmentation.

Abstract: In interactive data analysis processes, the dialogue between the human and the computer is the enabling mechanism that can lead to actionable observations about the phenomena being investigated. It is of paramount importance that this dialogue is not interrupted by slow computational mechanisms that ignore the known temporal characteristics of human-computer interaction and so fail to prioritise the perceptual and cognitive capabilities of the users. In cases where the analysis involves an integrated computational method, for instance to reduce the dimensionality of the data or to perform clustering, such interruptions are likely. To remedy this, progressive computations, where results are iteratively improved, are attracting increasing interest in visual analytics applications. In this paper, we present techniques and design considerations to incorporate progressive methods within interactive analysis processes that involve high-dimensional data. We define methodologies to facilitate processes that adhere to the perceptual characteristics of users and describe how online algorithms can be incorporated within these. A set of design recommendations and accompanying methods to support analysts in accomplishing high-dimensional data analysis tasks are then presented. Our arguments and decisions here are informed by observations gathered over a series of workshops with analysts from finance. We also demonstrate our approach on an additional use-case and evaluate the methods on financial analysis tasks carried out with our collaborators. We document observations and recommendations from this case study and present evidence on how our approach contributes to the efficiency and productivity of interactive visual analysis sessions involving high-dimensional data.
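
To make the idea of a progressive computation concrete, the following is a minimal illustrative sketch, not the method presented in the seminar. It assumes a sequential (online) k-means update as the stand-in algorithm; the function name progressive_kmeans, the chunk_size parameter, and the naive initialisation are hypothetical choices made for this example. The point it shows is that the computation hands back an intermediate result after every chunk of data, so a visualisation could be refreshed without waiting for the full pass to finish.

```python
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def progressive_kmeans(
    points: Sequence[Point], k: int, chunk_size: int = 100
) -> Iterator[Tuple[int, List[Point]]]:
    """Yield (points_seen, centroids) after each chunk of data.

    Hypothetical sketch: uses a sequential k-means update, where each
    centroid is the running mean of the points assigned to it so far.
    """
    # Naive initialisation (an assumption): the first k points are the centroids.
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k
    seen = k

    for start in range(k, len(points), chunk_size):
        for p in points[start:start + chunk_size]:
            # Assign the point to its nearest centroid (squared Euclidean distance).
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            # Move that centroid towards the point (running-mean update).
            counts[j] += 1
            rate = 1.0 / counts[j]
            centroids[j] = [b + rate * (a - b) for a, b in zip(p, centroids[j])]
            seen += 1
        # Hand back an intermediate result; a caller could redraw a view here.
        yield seen, [tuple(c) for c in centroids]
```

A caller would iterate over the generator and update its display on every yielded result, stopping early if the analyst changes the query. A smaller chunk_size gives more frequent but noisier updates; which trade-off suits a given interaction budget is left open here.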