Summarizing Data Through Visualizations

We had the pleasure of welcoming our first computer science professor to Northern Exposure this past Friday. Dr. David Koop, assistant professor from the Department of Computer Science, gave us a very interesting and engaging talk on summarizing data using visualization.

Dr. Koop started by giving us brief insight into the areas of data he studies, then enlightened us (at least the non-computer science majors) on what summarizing data means and the steps it involves. Using the example of power generation around the world, Dr. Koop explained that the first step in summarizing data is to look at it in tabular form, which, in simple terms, is data organized in rows and columns. The next step is to count the data to find the total number of each thing that exists within the data collected. The third step is to look at some of the statistics of the data, for example, the mean, median, or standard deviation. Finally, one must find any outliers, that is, data values that lie at an abnormal distance from the core data values. Dr. Koop further explained that summarizing a data set in this way gives us a numerical picture of the data, but his question is: what if we could get a visual picture of the data?
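For readers who want to try the four summary steps themselves, here is a minimal Python sketch. It is not taken from Dr. Koop's talk: the plant names, fuel types, and capacities are made up for illustration, and the outlier test (the common 1.5 × interquartile range rule) is an assumption, since the talk did not name a specific rule.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Step 1: tabular form. Each row is (plant name, fuel type, capacity in MW).
# Hypothetical values, invented for this example only.
plants = [
    ("A", "solar", 50),
    ("B", "solar", 65),
    ("C", "wind", 80),
    ("D", "wind", 95),
    ("E", "hydro", 110),
    ("F", "hydro", 120),
    ("G", "coal", 2000),
]


def summarize(rows):
    """Return counts, basic statistics, and outliers for the capacity column."""
    capacities = [capacity for _, _, capacity in rows]

    # Step 2: count how many rows fall into each category.
    counts = Counter(fuel for _, fuel, _ in rows)

    # Step 4 (assumed rule): flag values more than 1.5 * IQR outside the quartiles.
    q1, _, q3 = quantiles(capacities, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [name for name, _, cap in rows if cap < low or cap > high]

    return {
        "counts": counts,
        # Step 3: summary statistics.
        "mean": mean(capacities),
        "median": median(capacities),
        "stdev": stdev(capacities),
        "outliers": outliers,
    }


summary = summarize(plants)
print(summary["counts"])
print(summary["mean"], summary["median"], round(summary["stdev"], 1))
print(summary["outliers"])
```

In this toy table, the single very large plant pulls the mean far above the median, which is the kind of thing the outlier step is meant to surface.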

Constructing a visual picture of data, however, comes with a few problems. For one, if there is too much data, points can overlap in the picture, so that a data point gets covered up. Additionally, a display has a limited number of pixels and the human eye can resolve only so much detail, which can make the picture difficult to comprehend at a glance. One solution Dr. Koop has been examining is an interactive approach based on the visual information-seeking mantra (Shneiderman, 1996). Under this approach, the visual model first gives an overview of the larger data set, then allows one to zoom, filter, or get details for a specific data value on demand. This approach, however, presents its own challenges. For instance, how do we know where to zoom? What do we filter for? And how do we make sure we do not miss something?
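The overview, zoom, filter, and details-on-demand sequence can be illustrated with a small Python sketch. This is only one possible reading of the idea, not the method shown in the talk: the point records are hypothetical, coordinates are assumed to lie between 0 and 1, and the overview is approximated by counting points per grid cell so that overlapping points are tallied rather than hidden.

```python
from collections import Counter

# Hypothetical records: (id, x, y, category), with x and y assumed to be in [0, 1].
points = [
    (1, 0.10, 0.20, "a"),
    (2, 0.15, 0.25, "a"),
    (3, 0.80, 0.90, "b"),
    (4, 0.85, 0.95, "b"),
    (5, 0.82, 0.91, "a"),
]


def overview(pts, bins=2):
    """Overview first: count points per grid cell instead of drawing each one."""
    counts = Counter()
    for _, x, y, _ in pts:
        col = min(int(x * bins), bins - 1)  # clamp x == 1.0 into the last cell
        row = min(int(y * bins), bins - 1)
        counts[(col, row)] += 1
    return counts


def zoom(pts, x_min, x_max, y_min, y_max):
    """Zoom: keep only the points inside a rectangular region."""
    return [p for p in pts if x_min <= p[1] <= x_max and y_min <= p[2] <= y_max]


def filter_by(pts, category):
    """Filter: keep only the points in one category."""
    return [p for p in pts if p[3] == category]


def details(pts, point_id):
    """Details on demand: look up the full record for a single point."""
    return next((p for p in pts if p[0] == point_id), None)


cells = overview(points)
zoomed = zoom(points, 0.5, 1.0, 0.5, 1.0)
selected = filter_by(zoomed, "a")
print(dict(cells))
print([p[0] for p in zoomed])
print(details(selected, 5))
```

The sketch sidesteps the hard questions raised above: a person still has to decide which region to zoom into and which category to filter for.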

Dr. Koop explained that a good data visual needs a few things in order for the data to be well perceived despite these visual limitations. Among the guiding principles for making visual data models are that the overview must be excellent and that zooming should reveal greater detail. All in all, the key in visualizing data is not only to make a cool picture, but to gain greater insight into the data.

One of the cool data visualizations Dr. Koop presented to us was a graph summary overview of molecules. This summary was derived by combining multiple different molecules to give a general visual of their structures. An approach like this, as Dr. Koop explained, allows for a better overview of the data because it is condensed into one graph, and it lets us show or hide certain node names on the model for certain visual inquiries. Another technique that can feed a visual model is trajectory clustering, as with hurricane paths. Here, one can group the general trends of hurricane paths to produce a general picture of their trajectories; however, this still doesn’t produce a nice, clean image, as weather can be unpredictable. There is also trajectory bundling, where, in some fun experiments, one can take GPS tracks, without an existing map, and see whether the major roads can be mapped out from them. This presents some issues with merging roads in the final product, but it can map out the general paths of major roads. The same can be done with ship traffic, where a single trajectory bundle can be produced. Bundling the ship data reduces clutter and shows splits in the trajectories, and it could potentially help in finding solutions to environmental issues. One of the final models Dr. Koop showed us was for testing the effectiveness of a data visualization using gas prices and locations. We got to see different variations of the models and see how each has its pros and cons.

Among the next steps and potential inquiries for Dr. Koop is seeing how time data can be integrated into visual data models. For instance, how can we look at data trends over time and across periods? Dr. Koop would also like to look at different types of data based on personal interests. Maybe in the future, there will be avenues for Dr. Koop to collaborate with other faculty at NIU to assist them in visualizing some of their data and research. In the end, we are all so happy to finally have had a computer science presenter for NX, and all of us, comp. sci. or not, were very engaged. For any students interested in this area of work, Dr. Koop teaches classes on Programming in Python and Data Visualization, and is also looking for motivated students who want to work on projects involving reproducibility, data visualization, and trajectory analysis. Feel free to reach out to Dr. Koop if you are interested.