FULL TRANSCRIPT BELOW:
Daniel W.: Welcome to the American Small Business Institute. I’m Daniel Whittington. I am here with Cedric Yau.
Cedric Yau: Yes.
Daniel W.: How are you, sir?
Cedric Yau: I’m great.
Daniel W.: Now, Cedric is sort of a data king, and I use that term very loosely, but tell us what you do in the data world.
Cedric Yau: All right, let’s see. I run a company called Linebacker Data, which is in the middle of rebranding. We work with a hedge fund in New York. They have the problem of crunching a data set with over 20 billion transactions, about six or seven terabytes of data. We’ve taken that process, which could take about a year and a half to crunch through on a personal computer, and we now do it on Amazon Web Services in under an hour.
Daniel W.: What does that do for them?
Cedric Yau: It gives them insights almost, you could say, a trading day in advance.
Daniel W.: Okay, I have a full … You work with a hedge fund that gets a full day’s advance notice on data they can act on?
Cedric Yau: Yes.
Daniel W.: That’s some serious data crunching. The reason I dragged Cedric in here is not just that he’s an amazing human being, but also that all the time in small business we hear, “Oh, data this, big data that. Google has all this data. Everybody has this data, and it’s going to make you irrelevant, right?” One of the things Cedric has talked about many times is that there are problems data doesn’t solve. You were telling me about the flaws in viewing the world as only a data set.
Cedric Yau: Yeah, the issue is that it tends to expose our own inherent biases. It’s something I like to call the Green Jellybean Problem.
Daniel W.: I like the name of this.
Cedric Yau: The idea comes from the computer science comic XKCD, if you’ve seen those. Somebody comes in and says, “Oh, green jellybeans cause cancer.” The reason they give is that a scientific study found that in this one test, on this data sample, you have a so-called p-value of less than 0.05: less than a one-in-20 chance that a result like this would show up by pure chance, as a false positive. Anyway, you fast forward, and first they say, “Oh, well, we didn’t say that; it was inconclusive. Maybe it’s a particular color, maybe it’s the dye, or maybe it’s modern jellybeans.” So they run the test once per color, and the comic shows 20 panels, one for each color. Now it’s blue, or Red Number 1 or Blue 17, that causes cancer, just because one of those 20 tests happened to come in under 0.05.
That’s the challenge: a lot of times now, with all the data that’s out there, whether it’s something like Google Trends or all the stuff people talk about like the Facebook Graph, there’s data everywhere, and you can find anything for your business. The problem is that we often don’t ask: when could this data point be wrong, or are we taking it out of context? Richard Feynman, the Nobel laureate, once said that you must not fool yourself, and you are the easiest person to fool. The challenge these days is that because there’s so much data available, we can always cherry-pick something that agrees with us.
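The arithmetic behind the Green Jellybean Problem can be sketched in a few lines of Python. This is an illustration only, not anything from Linebacker Data: it assumes 20 independent tests (one per color), a 0.05 significance threshold, and no real effect anywhere, in which case each test’s p-value is uniformly distributed. The function names are hypothetical.

```python
import random


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests
    when there is no real effect in any of them: 1 - (1 - alpha)^n."""
    return 1 - (1 - alpha) ** num_tests


def simulate_jellybean_studies(num_colors: int = 20, alpha: float = 0.05,
                               trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated studies in which at least one color 'looks' significant.

    Assumption (hypothetical setup): jellybeans have no effect at all, so each
    test's p-value is uniform on [0, 1) and every hit is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One p-value per color tested in this simulated study.
        p_values = [rng.random() for _ in range(num_colors)]
        if min(p_values) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Exact chance of a false 'discovery': {familywise_error_rate(0.05, 20):.3f}")
    print(f"Simulated chance:                   {simulate_jellybean_studies():.3f}")
    # Bonferroni correction: divide the threshold by the number of tests run.
    print(f"With Bonferroni (0.05 / 20):        {familywise_error_rate(0.05 / 20, 20):.3f}")
```

With 20 colors the exact figure is 1 - 0.95^20, about 0.64, so finding a “significant” color is more likely than not even when none of them does anything. The last line shows one standard remedy, the Bonferroni correction (a textbook technique named here for illustration, not something discussed in the interview): dividing the threshold by the number of tests pulls the overall false-positive rate back under 0.05.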
Daniel W.: When I was growing up, my dad kept a whole bookcase full of authors he disagreed with, and he would always say, “If you can’t understand the things you disagree with, you’ll never look at the things you believe correctly.” It sounds like that’s what you’re saying: in the world of data, be careful not to wade in looking to prove yourself right.
Cedric Yau: Yeah, because you will always be able to do that. I find this is the top challenge we see, you could say, in the trading world. Somebody will say, “Oh, this data is so beautiful,” because it agrees with them. “Yup, I had this theory that this was going to happen, and here’s the data that backs it up. The data’s beautiful.” And the other great thing is, when the data doesn’t agree with you, we say, “Oh, it’s just noisy.”
Daniel W.: It’s like that book How to Lie with Statistics.
Cedric Yau: Yup.
Daniel W.: You can really make numbers dance, so I think that’s perfect. I think our takeaway is this: as a small business owner, you’re facing a data onslaught from every side. The thing you need to protect yourself against is going in to prove what you already believe, because data will definitely do that for you. Well, thanks for taking the time, Cedric.
Cedric Yau: Yeah.
Daniel W.: We’ll see you guys next week.