More Misunderstood Truths About Statistics
Published 4/2/2018
We've established that statistics are useful and more relevant in our day-to-day work life, but how do statistics affect our personal selves? In today's episode we're talking about what statistics mean for our personal selves and how we make our decisions.
Today's episode is sponsored by Linode.
In 2018, Linode is joining forces with Developer Tea listeners by offering you $20 of credit - that's 4 months of FREE service on the 1GB tier - for free! Head over to https://spec.fm/linode and use the code DEVELOPERTEA2018 at checkout.
Transcript (Generated by OpenAI Whisper)
So we've established that statistics are useful and perhaps more commonly referenced in our normal conversations than we may realize at the outset. But how can we say that statistics matter when we're looking at our own experiences? How is it that statistics apply to me? And more importantly, what can I do to understand my own statistics? It seems that I need a much larger group of people to measure rather than just measuring my own self. That's what we're talking about in today's episode. My name is Jonathan Cutrell and you're listening to Developer Tea. My goal on this show is to help you as a driven developer uncover your career purpose and to do better work that has a positive influence on the people around you.
So we talked about statistics in the last episode and it's something that I'm really kind of passionate about. I'm passionate that people understand what statistics mean to them. What is the place of statistics in your life and in your decision making? Decision making is a statistical process. When you break it down, if you have one option versus another option, even if your option is a non-option, for example, will I work out or not, you are still weighing two options when you make a decision. And ultimately, the way that humans tend to make decisions relies on one option kind of weighing more heavily in the positive direction than the other ones. This is what we talked about in the last episode, Bayesian decision theory. This is a decision theory that says that we trade off the costs and the benefits, the cost-benefit analysis. These are all statistical terms, but I do want to kind of go back to something we said in the last episode. It's something that I hear often about statistics and most often in individual cases. Why do statistics apply to me? We hear the phrase that you don't want to become a statistic. This has kind of a negative connotation: you don't want to be one of that group of people that something bad happened to.
And so you should, for example, wear your seatbelt or drink eight cups of water a day. But on the flip side, we also have kind of a reticence to view ourselves in terms of our quantified data. Now, this isn't true across the board. Certainly, the quantified-self movement is a big kind of vote in the direction of understanding yourself in terms of data. But I want to talk about why this is important first. And then secondly, areas where individual statistics actually do make a difference. They are measured and they are used on a regular basis. So we're actually going to do those in the opposite order. We're going to start by talking about a few types of information that are measurable at an individual level that we can apply to our decision-making process, to inform us into the future.
So very simple examples certainly come from sports, for example, batting average. In a given season, a player may have between 450 and 650 at bats. And certainly during practice, significantly more than that, of course. And so the measurement of batting percentage, the batting average, that's measuring how often that person, the batter, the player, gets what's called a safe hit. In other words, a hit that doesn't turn into a foul, for example. So this is expressed in terms of a ratio. Let's say, for example, that you have five out of ten. Well, your batting average would be 0.5. This batting average is totally unreasonable to expect because that would mean that every other ball that you get thrown, you hit safely. And the highest batting average on record in Major League Baseball belongs to Ty Cobb, unless somebody has unseated him since my quick Google searching, and Ty Cobb had a 0.366. So what does this tell you? And why is it useful to batters, to baseball players? Well, obviously, the direct information that it tells you is how often this person hits a ball.
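For listeners who think better in code, here is a minimal sketch of the batting-average ratio described above, written in Python. The function name `batting_average` and the 183-for-500 season are illustrative assumptions, not figures from the episode; only the five-out-of-ten case comes from the discussion.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """Safe hits divided by at-bats; returns 0.0 when there are no at-bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats


# The five-out-of-ten example from the episode.
print(round(batting_average(5, 10), 3))

# A hypothetical season (made-up numbers): 183 safe hits in 500 at-bats.
print(round(batting_average(183, 500), 3))
```

The zero at-bats guard is a design choice for the sketch; a real stats system might prefer to report "no data" rather than 0.0.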
But it may also tell you in a given game, is this person doing better than they normally do? Or are they doing worse than they normally do? And if you look at a few games in a row, if the batting average over the course of those few games is lower than their overall batting average, right, we can start to draw out new information. And this is all about a single person's performance.
Another very simple example of this is how many steps you take per day. For me, the number is almost always the same on a given day of the week. So Mondays often look the same as other Mondays and Sundays often look the same as other Sundays. And as it turns out, most of us have this particular information available to us, especially if we have enabled certain types of apps or if we have something like a Fitbit that's tracking this information. But what is this information telling us? Well, it tells us about our patterns. For example, I'm much less active on the weekends as a general rule because I tend to stay home and spend time with my family, whereas during the week I'm out at work, I'm doing walking meetings, you know, I might walk from one location to another in the middle of the day. And that typically doesn't happen as often on weekends. So this information is reflected in the quantified data. If I look at the information, what's interesting is you can also look back at times when I was traveling, perhaps walking through airports or, you know, maybe I was out touring an area of a new city that I hadn't been to. And you can see that my steps are significantly higher on those days, sometimes 13,000, 14,000 steps in a single day. But as a general rule, we can look at our activity. For me, it just so happens that the highest correlation is based on those weekday alignments, Mondays with Mondays. For you, maybe it's something else, but over the course of time you can look at that average trend. And most people will have similar numbers on a daily basis.
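The weekday pattern described above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the step counts are made up, the `(date, steps)` input shape is hypothetical, and a real tracker export would need its own parsing.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def average_steps_by_weekday(daily_steps):
    """Group (date, steps) pairs by weekday name and average each group."""
    buckets = defaultdict(list)
    for day, steps in daily_steps:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAYS[day.weekday()]].append(steps)
    return {weekday: mean(counts) for weekday, counts in buckets.items()}


# Made-up sample data, not measurements from the episode.
sample = [
    (date(2018, 3, 19), 8200),  # a Monday
    (date(2018, 3, 24), 3100),  # a Saturday
    (date(2018, 3, 26), 7800),  # a Monday
    (date(2018, 3, 31), 2900),  # a Saturday
]
print(average_steps_by_weekday(sample))
```

With more weeks of data, comparing each new Monday against the Monday average is one simple way to spot the travel days mentioned above.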
So what is this describing and why is it important? Well, again, we're taking this idea of statistics and applying it at a singular person level. There are many variables that every person faces every day. And it would be easy to use the variability of my day to make a claim that my step count is unreliable as a way of understanding my activity. But the data shows differently. If I go and look at the data, then I know that my average behavior stays about the same. Now this may produce different results for different people. There may actually be a high level of variability for most people. But the truth still remains that, at least for some people, for this particular metric, there is a reliable statistical analysis that can be done that can give you insight into your own behavior and how you may change it. And of course, we aren't hyper-focusing on step count. There are plenty of other quantifiable pieces of information that you can track throughout the day about yourself. Whether it's about your health, like, for example, tracking your calories, you'll probably find that you eat the same number of calories on a given day unless you are actively changing that number on purpose. Other numbers might include the amount of time that you sit in traffic per day. You may feel like traffic is heavier at the end of the week or at the beginning of the week. And in some cases, it may be heavier at the end of the week or at the beginning of the week. But you're likely to find that, on average, for the same trip, your amount of time traveled is going to be very close. And your perception can change this. Your perception can make traffic feel much longer. Now, this can be a problem, and that's what we're going to talk about after we talk about today's sponsor, Linode.
With Linode, you can get up and running in just a few minutes by picking your Linux distribution, your node location, and the resources you need for your application, and then pressing go.
You launch your Linode, and you have access to a ton of tools and support that Linode provides on day one. Linode has 24/7 customer support. They have developers on staff. They can even do your DevOps for you. They have professional development operations services that you can take advantage of so that you can focus on your business, rather than focusing on the technical implementation details that really don't add a bunch of value to your business. As a developer, that may mean simply focusing on the code or on a user experience or the things that you actually care to focus on, rather than getting your site back up after the server crashes. One of my favorite things about Linode is that they provide their services all at an hourly rate. What that means is you only pay for what you use. Go and check it out. It's spec.fm/linode. If you use the code DEVELOPERTEA2019, all one word, at checkout, they'll give you $20 worth of credit towards any of their services. That's not just the Linode itself, but also their supporting services. For example, NodeBalancers. Go and check it out, spec.fm/linode. Thank you again to Linode for sponsoring today's episode of Developer Tea.
So why is it important that we look at these statistics about ourselves when possible? And how do we know when we can discard them? Because it is true. The intuition is correct that we cannot always trust statistics. Sometimes we absolutely need a larger collection of information, a larger amount of information, for anything meaningful to come out of it. So if we only have one or five or maybe even 20 data points about our life, about a particular trend that we're trying to measure against, it's very unlikely that we have enough to create a trend. And what is going on here? Why is it that we can kind of not even look at that data because there's just not enough there?
The reality is, for any kind of data point, for any kind of information, depending on the possible outcomes, and depending on the complexity of the scenario, the number of variables that you are controlling for, there needs to be a minimum amount of data available. So I'll give you an example. If you flipped a coin 20 times, and every time you got heads, this is a very unusual scenario. Very unusual to get heads 20 times in a row. And perhaps at that point you might be questioning, maybe this coin has heads on both sides or maybe it's a trick coin, right? So this is an unusual outcome. And it's significant, right? It is significant because of the lack of complexity, the lack of variables that go into this. Typically with 20 flips, maybe you would see five in a row, or maybe you would even see ten in a row, but 20 in a row? And we're kind of using an arbitrary number here, so the number is not as important as the intuition.
On the other hand, let's say you are a business owner and you have a physical storefront, and 20 people have walked by, and you are in a relatively busy area. They've walked by and they haven't walked into your store, 20 people. Would it be reasonable to deduce from this information, which on its face says that 100% of the people walking by did not come in, should I then extrapolate out into the future and say no one will ever come into the store, or even on the average case, most people, almost all people will not come into the store? Now whether or not it is statistically reasonable is not really the question here, because the statistics would tell you to predict a 100% rate of no one coming into the store. However, you would probably be shutting your doors a little bit too early, because making that judgment call really relies on a significantly higher amount of traffic, especially if your store is a niche store, for example, right?
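To put rough numbers on those two intuitions, here is a hedged Python sketch. It assumes a fair coin and, for the storefront, that each passer-by decides independently with the same walk-in rate, which is a simplification the episode does not claim. Both function names are hypothetical.

```python
def prob_all_heads(flips: int, p_heads: float = 0.5) -> float:
    """Chance that every one of `flips` independent flips lands heads."""
    return p_heads ** flips


def max_rate_consistent_with_zero(n: int, confidence: float = 0.95) -> float:
    """Largest true walk-in rate p for which seeing zero walk-ins among n
    independent passers-by still has probability of at least 1 - confidence.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / n)


# 20 heads in a row with a fair coin: 1 in 2**20, i.e. 1 in 1,048,576.
print(prob_all_heads(20))

# Zero walk-ins out of 20 versus zero out of 1,000 passers-by.
print(round(max_rate_consistent_with_zero(20), 3))
print(round(max_rate_consistent_with_zero(1000), 4))
```

Under these assumptions, 20 people walking past is still compatible with a true walk-in rate above 10%, while 1,000 people walking past without a single visit pushes the plausible rate down to roughly 0.3% or less. That gap is why 20 heads in a row is striking but 20 passers-by is not.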
Perhaps your rate of acquisition of people should really only be measured once a thousand people have walked by the store, and if at that point no one has walked in, then perhaps you consider changing your storefront. So this is important intuition again about statistics, because it really is kind of the common reason to remove statistical thinking altogether. This is because we don't have enough information. It's not reliable enough. There's not a large enough sample size, and therefore we're not going to engage it at all. This is a reasonable response to unreliable data, right? It's a reasonable response when you have a small amount of data. And generally speaking, when you are a small company, like most startups, most agencies, most of the companies where people listening to this podcast work, you work at companies that are small enough that statistical thinking is probably not kind of at your core, right? This is definitely true for the company that I work at. Now does this mean that you can throw away statistics altogether and never come back to it? The answer to that question is absolutely not.
So what is it about statistical thinking that is important? Well, as we mentioned earlier with our traffic example, we may have an estimation of traffic that is very different from reality. Now what is it that causes this? Our perception versus a measured reality is very often different. We see this happening with estimation errors all the time. We have a perception that is very difficult to shake, by the way, even for seasoned programmers, even for people who consistently have issues with estimation. It's very difficult to shake this perception issue of estimating in a way that doesn't reflect a measured reality. In other words, we overestimate what we're able to do and we underestimate any kind of variable that may be outside of our control that will increase the timeline well beyond our estimate.
Now in our minds, this is not a reasonable thing to expect because we see things differently. Very often because of our perception, we have a warped view of reality. And this is, again, very difficult and it's very common. In fact, everyone has distortions of reality in their perceptions. However, when we can measure things, we can have a common ground that we look at together. It's much more difficult to warp a common ground, a measured common ground. Now just because we measure it, does that mean that it adequately represents reality at a large scale? For example, let's say that you have that storefront again and you're measuring people coming in and you're getting a 5% rate of people coming into your store. Well, does that mean that you should just blanket expect 5% of all people to come into your store? It's very unlikely that your measured statistics are going to represent a holistic view of reality that has no other variables. For example, there may have been a convention in town full of people who very much appreciate your store or very much don't appreciate your store. And this could skew your measurements. We have to take everything that we measure with a grain of salt and remember that the number of variables that may be affecting something is almost immeasurable, almost infinite. It's very difficult to measure the distant effects, even things like the butterfly effect come to mind, that have an impact on whatever it is that you're looking at. But what we don't want to do is avoid measuring altogether, or add to that pot of variables ourselves by relying on our perception, relying on intuition. These are things that we know are not very good at measuring. We know that our perception and our intuition are not good at measuring and coming to a common ground that we can share with other people, that we can use to predict the future, that we can use to describe our present situation.
Thank you again for listening to today's episode.
I hope you've enjoyed these last two episodes on statistics. I know this is a hot-button issue, especially between developers and other types of roles in companies. So I encourage you to do everything you can to make this practical and reliable and not to create a false loyalty to statistics just because you understand them or you believe in them. Instead, create a loyalty to another person. Remember that all of this should be viewed through the lens of connecting with other humans and helping make your efforts together better. Thank you again to Linode for sponsoring today's episode. Thank you so much for listening to today's episode of Developer Tea. We very likely will continue some discussions on statistics depending on how everyone is responding to these last two episodes. There's a lot more that we can unpack about how we can use statistics to make more reliable decisions. And in what ways do we skew these with our perception? You know, we kind of glossed over that in today's episode, but that is certainly an important part of this discussion. Thank you so much for listening to today's episode and until next time, enjoy your tea.