Welcome to the Orientation for the Statistics Self-Assessments.
The following five mini-cases provide a self-assessment that helps determine whether the reader has sufficient statistical knowledge to begin learning about business analytics. Each case contains questions that help assess your ability to apply statistical concepts. A commentary that highlights and discusses the statistical concepts being tested follows each mini-case.
This self-assessment has no formal scoring, but a reader who cannot answer the questions will benefit from taking a statistics tutorial before studying business analytics.
[[Click this link to begin with Mini-Case S1|S1 Customer Research at an Apparel Firm]]
The marketing manager for Sunflowers, an apparel firm, conducts research using a simple random sample of customers. The manager discovers that 80% of Sunflowers shoppers want the company to develop an online sales platform and that two-thirds of those shoppers indicated that they would buy significantly more Sunflowers products if they could have the convenience of buying online.
A second manager states that the company should only be interested in selling to above-average shoppers. Therefore, this manager argues, taking a random sample was the wrong approach.
A third manager summarizes the results from the company’s suggestion box, in which in-store shoppers can leave comments about the business. “The suggestion box reveals that our customers are very pleased with our physical, bricks-and-mortar locations.”
1. What is a “simple random sample”? Was the marketing manager correct to take a random sample?
2. The second manager seems interested in targeting the firm’s better-than-average customers, a reasonable business goal. When focusing on only part of a population, is taking a random sample a wrong approach?
3. What are the possible advantages and disadvantages of relying on the third manager’s suggestion box? Is using a suggestion box instead of taking a random sample a better way to support fact-based decision making?
* [[Review the Commentary for Mini-Case S1|S1 Commentary]]
* [[Continue with Mini-Case S2|S2 The Product Marketing Presenter]]
The Sunflowers product marketing team convenes a separate meeting. Dylan, the product marketing manager, begins the discussion.
“In the product marketing team, we all know the importance of using business statistics. After that seminar on data visualization that many of us attended last week, I thought it was important to visualize the results of our recent sampling of customers so that we could better understand our results.”
“Recall we asked that sample to rate our main product using a scale of 1 to 5 in which 1 was unsatisfactory and 5 was excellent. When I first visualized the results as a scatter plot, I unfortunately found there was no pattern to the data.”
Below are the product rating visualizations:
<img src="AnalysisOfProductRatingScatter.png" alt="AnalysisOfProductRatingScatter" style="max-width:100%; display:block; margin-bottom:10px;">
“I also constructed the following histogram, but I may have done something wrong, as I seem to recall that the middle bar should have been the tallest and that the chart should be symmetrical.”
<img src="AnalysisOfProductRatingBar.png" alt="AnalysisOfProductRatingBar" style="max-width:100%; display:block; margin-bottom:10px;">
1. What conceptual error about statistics underlies the scatter plot? When are scatter plots useful for visualizing data?
2. Was the histogram a good choice for visualizing the data? Explain.
3. Should one expect all histograms to be symmetrical, with a “middle bar” that is the tallest? Would one ever see such a histogram? Explain.
What would you like to do next?
* [[Review the Commentary for Mini-Case S2|S2 Commentary]]
* [[Continue with Mini-Case S3|S3 Thinking About the Future]]
Managers at Sunflowers seek to overhaul their signature product line of apparel. Being cautious, these managers first want to conduct a trial to discover customer reactions to two possible future versions of the product: an “A” version that has superficial packaging differences from the current version and a “B” version that contains a fully redesigned product.
Using a previously assembled list of customers, managers randomly select and contact customers until they have 30 customers for Group A, which will evaluate the “A” product version. Then, on the company’s Facebook page, the managers post a message asking customers to volunteer to try out a new version of the main product and select the first 30 customers to form Group B, which will evaluate the “B” product version.
After two weeks, Sunflowers managers ask the customers in the trial for their reactions to the product they are evaluating. The following table summarizes customer reactions.
<table border="1" cellspacing="0" cellpadding="5">
<tr>
<td><strong>Reaction</strong></td>
<td><strong>Group A</strong></td>
<td><strong>Group B</strong></td>
</tr>
<tr>
<td><strong>Like</strong></td>
<td>23</td>
<td>26</td>
</tr>
<tr>
<td><strong>Did not Like</strong></td>
<td>7</td>
<td>4</td>
</tr>
<tr>
<td><strong>Percentage Like</strong></td>
<td>77%</td>
<td>87%</td>
</tr>
</table>
The managers continue the trial for two more months and again contact the customers for their evaluation. This time, not all customers respond, but using the responses of those who did, the managers construct the following summary table.
<table border="1" cellspacing="0" cellpadding="5">
<tr>
<td><strong>Reaction</strong></td>
<td><strong>Group A</strong></td>
<td><strong>Group B</strong></td>
</tr>
<tr>
<td><strong>Like</strong></td>
<td>19</td>
<td>25</td>
</tr>
<tr>
<td><strong>Did not Like</strong></td>
<td>7</td>
<td>5</td>
</tr>
<tr>
<td><strong>Percentage Like</strong></td>
<td>63%</td>
<td>83%</td>
</tr>
</table>
Looking at these results, Sunflowers managers declare the radically overhauled product (“B”) to be an “overwhelming” success (83% to 63%) and begin plans to make the necessary changes in order to produce the “B” product version.
1. Were there any problems or issues with the way the trial was conducted?
2. Should the managers have done something else before committing to make the “necessary changes” in order to produce the “B” version of the product?
3. When are differences important and noteworthy? How can a manager know that a difference seen is important and actionable and not caused by natural, expected variation in the data?
* [[Review the Commentary for Mini-Case S3|S3 Commentary]]
* [[Continue with Mini-Case S4|S4 Talking Statistics]]
Someone discovers that some friends have been taking a course in introductory statistics. The person comments, “Numbers can be faked. Heck, last week, I heard one poll that had a candidate up by twelve percentage points in a local political race, while another poll had the same candidate trailing by four points. Haven’t you ever heard of the quote about there being ‘lies, damned lies, and statistics’?”
1. Is statistics just “numbers”? Is statistics something worse than “damned lies”?
2. What are some of the reasons that cause valid political poll results to vary?
* [[Review the Commentary for Mini-Case S4|S4 Commentary]]
* [[Continue with Mini-Case S5|S5 Streaming Media Players Cause Depression]]
A popular streaming media company faces a public relations challenge. Recently, on a nationally syndicated talk show, a famous social psychologist claimed that the company is the cause of increased depression in young adults.
The psychologist explains that ever since the introduction of the company’s media player, rates of depression disorders among young adults have increased in the same way that sales of the media player have increased.
“It’s obvious,” the psychologist says, “the increasing sales of streaming media players correlate with the rising rates of depression in young adults.”
1. Does a correlation between two things prove an actual relationship? Should correlations always be explored?
2. A critic of the social psychologist claims that the data actually show that “the rising rates of depression cause the increasing sales of streaming media players.” Would this opposite cause-and-effect be more valid than the cause-and-effect the social psychologist identifies?
3. What must exist in order to use changes in one variable to predict or estimate changes in a second variable?
* [[Review the Commentary for Mini-Case S5|S5 Commentary]]
* [[Continue with the Post-Assessment Review|Post-Assessment Review]]
Having taken the self-assessment, readers should reflect on how much of the commentary seemed new or incomprehensible and how much made sense.
A reader who could generally anticipate the answers to the questions has sufficient background to proceed with this book, even if that reader has not fully mastered all concepts (that mastery can be developed in tandem with reading this book).
Readers who could not answer most questions and had very little understanding of the concepts assessed will benefit from a statistics tutorial. (Readers enrolled in a course about business analytics should discuss their level of preparedness with their instructors.)
[[Redo Self-Assessment|Orientation]]
In a simple random sample, each item in the population (Sunflowers shoppers in this case) has an equal chance of being selected for the sample. Yes, the marketing manager was correct: proper sampling is needed to ensure that the sampling is consistent with the theoretical underpinnings of the statistical method to be used. One can focus on part of a population, but sampling from that part should still be proper sampling, such as selecting a (simple) random sample, so the second manager is a bit confused.
Data collected from a suggestion box do not constitute proper sampling, so such data cannot be properly analyzed with statistical techniques that assume random sampling. Suggestion box data can be biased, not properly reflecting the population of interest (all Sunflowers shoppers). However, such data might serve as an early-warning system to inform managers of possible developing issues that might be grounds for further investigation.
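The two sampling ideas above can be sketched in a few lines of Python. This is only an illustration: the shopper IDs, the spending figures, and the helper name `simple_random_sample` are hypothetical and do not come from the Sunflowers case.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical sampling frame: 1,000 shopper IDs with made-up annual spending.
frame_rng = random.Random(42)
shoppers = {f"S{i:04d}": frame_rng.uniform(50, 500) for i in range(1000)}

# Simple random sample from the whole population of shoppers.
all_sample = simple_random_sample(shoppers.keys(), 50, seed=1)

# Focusing on part of the population (above-average shoppers) still calls
# for random selection, just from within that part.
mean_spend = sum(shoppers.values()) / len(shoppers)
above_average = [sid for sid, spend in shoppers.items() if spend > mean_spend]
target_sample = simple_random_sample(above_average, 50, seed=2)
```

The second sample addresses the second manager's goal without abandoning random sampling: the population of interest is narrowed first, and the random draw happens inside it.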
* [[Continue with Mini-Case S2|S2 The Product Marketing Presenter]]
Dylan has made several errors in attempting to visualize the product ratings. Scatter plots visualize the relationship between two numerical variables.
Customer ID is not a numerical variable, even though its values are digits, because this variable does not hold a counted or measured quantity. Also, when developing a scatter plot, the variables chosen should have some plausible real-world relationship, something that does not exist between an ID number and a rating.
The histogram was a poor choice because histograms are used with continuous numerical data. Because the five-point scale defines five categories, some sort of bar chart would be one of several good choices. Dylan seems to be remembering some attributes of the normal distribution, an important foundational concept. Not all data sets are symmetrical like the normal distribution, so a histogram can be asymmetrical, with the tallest bar not being the “middle bar.”
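As a rough sketch of the bar-chart alternative, the ratings can be tallied by category and drawn as one bar per rating. The ratings below are invented for illustration and are not the Sunflowers survey results.

```python
from collections import Counter

# Hypothetical ratings on the 1-to-5 scale (not the actual survey data).
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4, 3, 5, 4, 4, 5]

# Treat each rating as a category and count how often it occurs.
counts = Counter(ratings)
bar_rows = [(rating, counts.get(rating, 0)) for rating in range(1, 6)]

# A plain-text bar chart: one bar per category.
for rating, count in bar_rows:
    print(f"{rating} | {'#' * count} ({count})")
```

Nothing requires the tallest bar to sit in the middle. In this made-up data the counts pile up at ratings 4 and 5.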
* [[Continue with Mini-Case S3|S3 Thinking About the Future]]
By using improper sampling to form Group B, the trial was flawed from the start.
In examining products A and B, customers were evaluating different attributes of the product and its packaging, confounding, if not invalidating, any direct comparison of A to B. (A much better comparison would have had both groups evaluate two versions of the same product attribute, such as two different packaging redesigns.)
The managers also ignored the missing data from the customers who chose not to respond the second time. Significantly, in this case all of the missing data come from Group A, which confounds the final result. (The 63% reported for Group A appears to divide the 19 favorable responses by the original 30 customers; among the 26 customers who actually responded, 19 of 26 is about 73%.)
The managers should have used a proper statistical procedure, one that would have generated useful data for deciding about the apparel line. Given the number of missing values, a larger original sample size would have been a better choice.
Eyeballing differences in a summary table is not a proper substitute for applying a statistical method designed to identify differences between two (properly constructed) groups.
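One such method, offered here only as a sketch, is a pooled two-proportion z-test. The function name is an assumption for illustration, and the test presumes two independent random samples of adequate size, which Group B (volunteers) does not satisfy. The counts are taken from the two summary tables, using the 26 actual respondents for Group A in the later table.

```python
import math

def two_proportion_z_test(likes_a, n_a, likes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = likes_a / n_a
    p_b = likes_b / n_b
    pooled = (likes_a + likes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# After two weeks: 23 of 30 (Group A) versus 26 of 30 (Group B).
z_first, p_first = two_proportion_z_test(23, 30, 26, 30)

# After two more months: 19 of 26 respondents (A) versus 25 of 30 (B).
z_later, p_later = two_proportion_z_test(19, 26, 25, 30)

print(f"two weeks:  z = {z_first:.2f}, p = {p_first:.3f}")
print(f"later poll: z = {z_later:.2f}, p = {p_later:.3f}")
```

Under these assumptions, both p-values come out well above the conventional 0.05 threshold, so differences of this size could plausibly arise from ordinary sampling variation. That is far from the “overwhelming” success the managers declared.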
* [[Continue with Mini-Case S4|S4 Talking Statistics]]
This assessment goes to the most important point about statistics and business analytics. Statistics and business analytics are tools to generate useful information for fact-based decision making.
Statistics, itself, does not lie, but by deliberate or accidental misuse of statistics, a distorted view of the things that a data set represents can be presented.
If the polls were conducted properly, two polls might still vary due to random factors (“random chance”), because each poll questions a different random sample of voters. Random chance is also why certain business analytics methods do not produce exactly repeatable results.
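A small simulation can make this variation concrete. The sketch below assumes a hypothetical two-candidate race in which the candidate's true support is 52% and each poll questions 400 randomly chosen voters; these numbers are illustrative and are not taken from the polls mentioned in the mini-case.

```python
import random

def simulate_poll_margin(true_support, sample_size, rng):
    """Return the candidate's lead in percentage points for one simulated poll."""
    # Each sampled voter backs the candidate with probability true_support.
    supporters = sum(rng.random() < true_support for _ in range(sample_size))
    share = supporters / sample_size
    # In a two-candidate race the opponent's share is 1 - share.
    return 100 * (share - (1 - share))

rng = random.Random(2024)
margins = [simulate_poll_margin(0.52, 400, rng) for _ in range(1000)]

mean_margin = sum(margins) / len(margins)
print(f"average lead across polls: {mean_margin:.1f} points")
print(f"smallest and largest lead: {min(margins):.1f}, {max(margins):.1f}")
```

Every simulated poll samples the same population in the same way, yet the reported lead differs from poll to poll, and some polls can even show the candidate trailing.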
* [[Continue with Mini-Case S5|S5 Streaming Media Players Cause Depression]]
Finding a correlation between variables does not imply any real-world cause, a principle summarized in the phrase “correlation is not causation.”
Random chance can produce correlations among data that have no causal connection, a point made humorously in Tyler Vigen’s Spurious Correlations. Only correlations for which one might imagine a real-world cause should be examined.
For example, one of Vigen’s spurious correlations notes how the annual divorce rate in the state of Maine correlates with the annual per capita consumption of margarine. Although these two things correlate quite well, in an “obvious” way, there is no reasonable explanation for the correlation other than random chance.
The psychologist’s critic makes the same error as the psychologist, suggesting a cause-and-effect relationship with no known basis.
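The following sketch illustrates both points with made-up numbers. Two series are generated independently of each other, each simply rising over ten years, and their Pearson correlation is computed in both directions. The variable names and values are hypothetical and are not data about any real company or health statistic.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(7)
years = range(10)

# Two independently generated series that both happen to trend upward.
player_sales = [100 + 20 * t + rng.gauss(0, 10) for t in years]
depression_rate = [5.0 + 0.3 * t + rng.gauss(0, 0.2) for t in years]

r_forward = pearson_r(player_sales, depression_rate)
r_reverse = pearson_r(depression_rate, player_sales)
print(f"r(sales, depression) = {r_forward:.3f}")
print(f"r(depression, sales) = {r_reverse:.3f}")
```

The correlation is strong even though neither series was built from the other, and it is identical whichever variable is listed first. A correlation coefficient therefore cannot support the psychologist’s direction of cause and effect any better than the critic’s.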
* [[Continue with the Post-Assessment Review|Post-Assessment Review]]