Dataviz tips and dad jokes
0:10 Chad
Hey, everyone. This is Chad Janacek, and welcome to a special episode of ZacCast. Special because it's just me today. Patrick is off doing whatever it is that he's doing, uh, but that's okay. Uh, he's not here. I can talk about him kind of behind his back. Uh, and also he wouldn't really have that much to contribute to this episode anyway, so... Uh, okay, that's not true. Uh, but I'm curious to see if he actually listens to this, so I'm gonna leave that in there. Uh, but anyway, I wanted to record a short episode today about data visualization. Uh, not the most exciting topic, but it is mid-August, which means you're probably trying to wrap up your budget approval here in the next few weeks, and that means that you're about to start building your adopted budget book pretty soon. Uh, submitting that budget book to GFOA, posting it on your website for the whole world to see, hoping that you designed it in a way that your residents can actually get a better idea of where and how their money is spent. And while the GFOA Budget Award is a great program, and it certainly has improved the overall quality of budget books, a lot of cities just focus on trying to get all the information that's required into their document without always thinking about the best ways to present that information. In addition, software like Excel, which I love by the way, uh, it also allows for creating some very ugly and unhelpful charts, uh, that we have a tendency to use just because we think they look fancier. So today I'm gonna do kind of a quick hit on five things that you should know about data visualization so that you can make your budget books, reports, presentations, all that kind of stuff, as easy to consume and as useful as possible. So let's get started. First, what is the purpose of data visualization? This may sound like kind of a cheap way to sort of pad this listicle of... Actually, I guess this is a podcast, so what do you call a listicle in podcast form? A listisode? 
Uh, I don't know. Anyway, it's worth starting by just explaining the purpose of data visualization so that we can all kind of be on the same page, and it's actually pretty simple. The purpose of data visualization is to assist in the digestion and comprehension of otherwise raw data. Uh, data do not have any particular form or organization on their own. Datum literally means a piece of information. Um, I mean, if you took every line item in your budget and you just put them all on their own index cards and threw them all over the office, uh, you know, the floor of your office, you'd have all the data in your budget book, but without some method of organization, it would be nearly impossible to comprehend. To paraphrase Stephen Few, numbers can't always speak for themselves, so we use visualizations to aid in this process. Now, that means both charts and graphs, but it also means things like tables and infographics, whether big or small, whether they're just individual, uh, data points that, uh, that you've sort of emphasized on your, uh, on your page. Perhaps images, but also the text that's involved in your budget books. That's all part of data visualization, and it's all based around the idea of making sense of these individual data points, so keep that in mind as we continue. Number two, the type of visualization that you choose should depend on the type of data you're presenting. Um, what that means is that you need to know what type of data you're actually presenting. Uh, in order to explain to someone else what to glean from some data that you have, you have to have a pretty good grasp of it yourself. So are you trying to communicate trends or changes over time? Are you trying to compare two or more things? Are you trying to compare two or more things over time? Uh, are you trying to show whether or not there's some correlation between data points? And what are the relationships in your data sets? 
Two big things to take away here are, one, understanding the difference between your categorical data and your quantitative data is really important. You can think of categorical data like categories, right? Departments, account codes, account types, council districts, et cetera. They don't necessarily have any inherent order, although they can. Um, council districts don't, but your cost centers do, right? 'Cause they're hierarchical. You have your department, or you have your, your fund, your department, your division, et cetera. Categorical data are the pieces of data that help you slice and dice your quantitative pieces of data. Categorical data deals with the qualities of a data point. Quantitative data, on the other hand, deals with the measurements, the actual quantities associated with the data point. Things that can be measured. Salaries, budgets, historical amounts, things like that. If you can measure it, it's quantitative. And, uh, if it describes some quality about a measurement, it's categorical. So each of those, those two different things have different ways of telling that story, and, uh, it's important to understand the difference between the two. Now, data visualization itself, the second takeaway, is about storytelling. And storytelling is about explaining the relationships between those things. Different types of relationships lend themselves to different types of visualizations. Sometimes a table is perfectly, uh, acceptable. Sometimes a chart is a little bit more helpful in terms of getting the point across. When you have two different types of visualizations that could work just fine for a particular relationship, you should probably prefer the one that tells the story more clearly and more succinctly. So sometimes that means a table is perfectly fine, even if it maybe isn't as pretty as a really cool chart. Um, the point is we're trying to convey information, so what's the most efficient way to do that? 
We're trying to expose those relationships as easily as possible rather than make the reader dig through the index cards that you threw on the floor. The third item is about pre-attentive processing. So humans are largely visual creatures. There are actually more than twice as many sense receptors that feed our visual sense than the other four combined. Our brains have honed methods of processing information before we're even consciously aware of it, and we call that pre-attentive processing. So the brain quickly evaluates the environment, it filters out those things that are most important, and then it gives them to you for further processing, for conscious processing. So you can and should use this to your advantage when you design your data visualizations. Uh, the four key areas, uh, or properties that, uh, we consider pre-attentive are color, form, spatial position, and movement. In a, uh, a document like a budget document or really any kind of, any kind of report that you're probably gonna be preparing if you're listening to this podcast, movement isn't really all that important, so I'm just gonna focus on the other three. So the first one's color. Color is a very easy way to call attention to important things. Um, there are two, two elements that we talk about, the intensity and the hue. The hue is the actual color itself, and the intensity is a combination of the saturation and lightness. So, uh, saturation is when you have a, a higher percentage of the true color, uh, in your color, like more of a true red, whereas a desaturated red, uh, is gonna be, is going to appear a little bit more faded. Uh, and then lightness refers to how light or dark that that same color will appear. So I'm not an expert in color theory, uh, so just sort of what I think about is how strong that color appears to you. Um, the more intense, the stronger that it will appear, the less intense, uh, the more faded or less strong that it will appear to you. 
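Editor's note: the hue and intensity idea above can be sketched in a few lines of Python. This is only an illustration, not something from the episode. The function name `adjust_intensity` and its two scale parameters are made up for this example; the color math comes from Python's standard `colorsys` module, which works in hue, lightness, and saturation.

```python
import colorsys


def adjust_intensity(hex_color, saturation_scale=1.0, lightness_scale=1.0):
    """Return hex_color with the same hue but scaled saturation and lightness.

    Hypothetical helper for illustration. Scales are multipliers, and the
    results are clamped to the valid 0..1 range.
    """
    hex_color = hex_color.lstrip("#")
    # Split "rrggbb" into three channels in the 0..1 range.
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # colorsys uses the order hue, lightness, saturation (HLS).
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    s = min(1.0, max(0.0, s * saturation_scale))
    l = min(1.0, max(0.0, l * lightness_scale))
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Same hue (red), less saturation: a more faded red.
faded_red = adjust_intensity("#ff0000", saturation_scale=0.5)
```

Keeping the hue fixed and changing only saturation or lightness is one way to get a "stronger" and a "more faded" version of the same color.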
So a couple of quick examples of how you can use color to benefit you. Um, be careful with some of the predefined charts in Excel. Uh, a lot of times, especially if you have, uh, a lot of different categories, they may just assign a bunch of random and often bright colors to each category, and this can make it hard for the brain to distinguish what's important. So consider using a smaller number of colors, um, but adjusting the intensity of those colors to highlight the things that are more important. Now, if you're gonna do this, make sure that you be careful or, uh, be aware that, uh, people with colorblindness can have trouble distinguishing certain shades and groups of colors. So just wanna make sure that your, uh, your visualizations are accessible to everybody. Another example is, uh, if you're using a bar chart, for example, to show data over time, you might want to use a more intense version of the, your main color to highlight key data points or maybe a less saturated version to reduce the, the visual importance of, uh, items that are less valuable. So for example, maybe you're showing a, a column chart of your monthly sales tax collections. In a chart like that, it might be important to highlight months where you had year-over-year declines. So in those charts, maybe it would make sense to either increase these, the intensity of those months or, uh, potentially just change the color altogether. So the point here is using color to highlight those things that are the most important. And the big challenge when we just use Excel prepackaged, uh, charts is that a lot of times they will throw in all kinds of different colors, and it makes it really hard to quickly distinguish what's important. The second area is form. Form deals with size, shape, distance between things, things like that. Humans are really, really good at quickly perceiving the differences between length, width, size, shape, and, uh, and so forth. 
We're not very good at perceiving the differences between angles, which is why pie charts suck, and I'll get back to that in just a little bit. Uh, but you can use this to your advantage as well to highlight key pieces of data. Each type of chart encodes its data in a different way. Bar charts, column charts, for example, use size and length, and those are really quick things that the human brain can pick up on differences between those. The third pre-attentive property is spatial position. This deals with how things are laid out in space. Humans are very good at perceiving position on a 2D plane, so a flat surface, you know, X and Y axis or axes. We're not as good at using depth as a pre-attentive attribute, so, uh, 3D charts are not as helpful. Uh, so, so consider that as well. The fourth item is avoiding visual clutter. So if the purpose of a data visualization is to aid in the comprehension of data, we should do everything that we can to put the data front and center. So that means avoiding chrome and clutter that might seem like it looks nice or fancy, but it actually makes the chart harder to read. And unfortunately, Excel makes it very easy to add this kind of clutter to your charts and graphs. So where possible, don't use background colors, uh, especially bright ones or dark ones, um, certainly not gradients. These things don't add any value to the data that's on your graph. Uh, they in fact just take away by adding additional visual noise that has to be filtered out. You might look at removing borders around your charts because, again, these don't really add anything. Um, they just... They're just there, right? So, um, it's n- there's no data encoded in borders that are around your charts when you copy them from Excel and you paste them into your, your budget document. So just get rid of those borders around them because it's not adding anything. Another option is if you can get rid of gridlines, that may be a good option for you. 
Um, gridlines are not data so much as they are just affordances. They help the reader determine where on the axis something that's in the middle or on the edge of the chart actually falls. Now, if you have a particular situation where maybe one or two of your data points to the far left of this column chart are significantly higher, and the story is that we're really dependent on these two things and everything else is not as important, then maybe you don't even need gridlines at all. Um, but if you do need that affordance so that, uh, you can more easily tell where a particular column or, or bar falls on the axis, then consider just reducing the visual impact of it. Soften the color a little bit so that it doesn't take away from the data itself. It's still there to help you read the chart, but it's not trying to compete with the actual data. The last item is a hard pass on 3D charts. So not only do they, we talked about this a second ago, the depth is not a great attribute for pre-attentive processing, but often these charts are laid out such that there's space between the actual, say, bar, uh, or column in your chart and the gridline behind it. So it's really hard to see exactly what the value is that's being encoded. And, and all of that extra work that your brain has to engage in just takes away from being able to understand what the data is showing. So the general rule here is if it's not helping you tell your story, get rid of it. If it's just there for es- aesthetics, it's probably making it more difficult to understand what you're trying to communicate. And the last thing, number five, don't use pie charts. Seriously, pie charts are the worst. There's almost no reason to ever use a pie chart, and I have saved this for last because it's gonna be a little bit of a rant. But bar and column charts encode information using length and width of the bars. Uh, we talked a minute ago about how, uh, length and width are pre-attentive attributes. 
They're very easy for us to pick up on. Line charts encode information using 2D spatial positioning. That's another pre-attentive attribute. Pie charts encode information using the angles of the slices. And as I mentioned before, we're not very good at quickly determining the differences among angles. Um, if I had to distill down why I think pie charts are not good, one, again, we're better at judging differences in length and spatial positioning than differences in angles. Uh, Excel will usually throw a lot of different colors on a chart, which makes it more difficult to distinguish what's important. Um, and because there's no space between the slices, uh, it's, it's kind of a necessity, right? You have to have different colors for all the different slices. Uh, Excel often doesn't sort the pieces by size, so there's often no inherent order to the pie chart, which makes it even more difficult to figure out which one's smaller, which one's bigger. And because of the way they're designed, they often include lots of arrows and labels and things like that, which just adds to the visual clutter. You can almost always convert a pie chart to a bar or a column chart and improve the ability of the reader to quickly understand the story that's being told. In fact, most of these problems just go away automatically. Uh, again, bar and column charts use length to encode data rather than angles. Um, often Excel will just use a single color for, um, a, a single series bar or column chart, so you don't have to worry about having all those different colors to distinguish. Uh, you may still need to sort your data to have it line up in order, but even if you don't, because it's easier for us to see the difference between lengths of lines, uh, it's, it's still easier for you to determine even if they aren't in a, uh, an actual sorted order, it's still easier for you to determine which one's bigger, which one's, uh, smaller. 
And because the labels are adjacent to the bars or the columns, you don't have the need for the extra visual clutter of the arrows and the call-outs. But if for some reason you do need to use a pie chart, a couple of suggestions. Make sure that you order your data, sort it before you create the chart from biggest to smallest so that Excel will put the pieces in size order. So at least you can have some kind of implicit order in, uh, the relative importance of those pieces. Rather than using just random colors, consider using shades of the same color. So, uh, your biggest piece may have the most intense, uh, variation of a particular color, maybe a blue or a green or whatever. And then as you move to smaller and smaller slices, just lower that intensity a little bit, make it a little bit more faded so that, um, it's easier to visually discern what is the most important piece. And if you have a lot of really small slices at the bottom, maybe just combine those into one category. So that's all I got. The truth is I really just kinda wanted to record this so I could rant about pie charts again. But hopefully you picked up a few tips that can help you tell your story a little bit more effectively. Uh, if you have any questions or comments, don't hesitate to reach out. Uh, you can find me on Twitter at Chad Janacek or, uh, chad@zactech.com. I will post some articles and resources in the show notes. Uh, if you enjoyed this, if it helped you at all, please consider giving us a rating in iTunes or whatever platform you listen to. Thanks for joining me. We'll see you next time.
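Editor's note: the three fallback suggestions for pie charts in this episode (sort biggest to smallest, fade the color as slices get smaller, and combine the tiny slices) can be sketched as a small data-preparation step in Python. Everything here is an assumption made for illustration: the function names, the 3% cutoff, the "Other" label, and the department amounts are invented and do not come from the episode or from any real budget.

```python
def prepare_slices(amounts, min_share=0.03, other_label="Other"):
    """Sort categories largest-first and fold small ones into one slice.

    amounts: dict mapping category name to a non-negative amount.
    min_share: hypothetical cutoff; categories below this share of the
    total are combined under other_label, which is placed last.
    """
    total = sum(amounts.values())
    ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
    kept = [(name, value) for name, value in ordered
            if value / total >= min_share]
    small_total = sum(value for _, value in ordered
                      if value / total < min_share)
    if small_total > 0:
        kept.append((other_label, small_total))
    return kept


def fade_steps(count, strongest=1.0, weakest=0.35):
    """Evenly spaced intensity factors, strongest first, one per slice."""
    if count == 1:
        return [strongest]
    step = (strongest - weakest) / (count - 1)
    return [strongest - step * i for i in range(count)]


# Made-up example amounts, in thousands of dollars.
budget = {"Police": 400, "Fire": 300, "Parks": 150,
          "Library": 120, "Clerk": 20, "Legal": 10}
slices = prepare_slices(budget)
factors = fade_steps(len(slices))
```

Each factor in `factors` could be used to scale the saturation of a single base color, so the largest slice gets the most intense shade. The same sorted list works just as well, and usually better, as the input to a bar or column chart.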