Hypothesis testing: a way to see if a survey or experiment has meaningful results
If a test is statistically significant, it means the results are unlikely to be explained by random chance alone
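Statistical power is the probability of getting a statistically significant result when a real effect exists. Usually, you need a statistical power of at least 0.8 (80%) for a test to be considered reliable enough to detect that effect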
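To make the 80% threshold concrete, here is a minimal sketch of how power could be computed for one simple case: a two-sided, one-sample z-test with a known standard deviation. The function names and the choice of a z-test are assumptions made for illustration; a t-test or a different design would need a different calculation.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, one-sample z-test.

    effect_size is the standardized effect: (true mean - null mean) / std dev.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = effect_size * n ** 0.5              # how far the true effect moves the test statistic
    # Probability that the test statistic lands in either rejection region
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


def min_sample_for_power(effect_size: float, target_power: float = 0.8,
                         alpha: float = 0.05) -> int:
    """Smallest sample size whose power reaches target_power (hypothetical helper)."""
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    n = 1
    while z_test_power(effect_size, n, alpha) < target_power:
        n += 1
    return n
```

Under these assumptions, `min_sample_for_power(0.5)` returns 32: a standardized effect of 0.5 needs about 32 observations to reach 80% power at a 5% significance level. Smaller effects need larger samples.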
The power of a test: t-test and hypothesis testing
Business scenario | How proxy data can be used |
---|---|
A new car model launched a few days ago and the auto dealership can’t wait until the end of the month for sales data to come in. They want sales projections now. | The analyst proxies the number of clicks on the car specifications on the dealership’s website as an estimate of potential sales at the dealership. |
A brand new plant-based meat product was only recently stocked in grocery stores and the supplier needs to estimate the demand over the next four years. | The analyst proxies the sales data for a turkey substitute made out of tofu that has been on the market for several years. |
The Chamber of Commerce wants to know how a tourism campaign is going to impact travel to their city, but the results from the campaign aren’t publicly available yet. | The analyst proxies the historical data for airline bookings to the city one to three months after a similar campaign was run six months earlier. |
Here's an example. A nasal version of a vaccine was recently made available. A clinic wants to know what to expect for contraindications but just started collecting first-party data from its patients.
A contraindication is a condition that makes taking a vaccine inadvisable because of the harm it could cause the patient.
To estimate the number of possible contraindications, a data analyst proxies an open dataset from a trial of the injection version of the vaccine.
The analyst selects a subset of the data with patient profiles most closely matching the makeup of the patients at the clinic.
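The subset-and-estimate step can be sketched roughly as follows. This is only an illustration: the record fields (`age`, `contraindicated`), the use of an age range as the matching criterion, and the function name are all hypothetical, and a real analysis would match on whatever profile attributes the clinic and the trial dataset actually share.

```python
def estimate_contraindications(trial_records, age_range, clinic_patient_count):
    """Estimate expected contraindications at the clinic from proxy trial data.

    trial_records: list of dicts with hypothetical keys "age" and "contraindicated".
    age_range: (low, high) ages that describe the clinic's patients (assumed criterion).
    """
    low, high = age_range
    # Keep only trial participants whose profile resembles the clinic's patients
    subset = [r for r in trial_records if low <= r["age"] <= high]
    if not subset:
        raise ValueError("no trial records match the clinic profile")
    # Contraindication rate observed in the matching proxy subset
    rate = sum(1 for r in subset if r["contraindicated"]) / len(subset)
    # Scale the proxy rate to the clinic's patient count
    return rate * clinic_patient_count
```

The estimate is only as good as the match between the proxy subset and the clinic's patients, which is why the analyst filters the trial data before computing a rate.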
Overcoming the challenges of insufficient data
Most industries hope for at least a 90% or 95% confidence level.
After you have plugged your information into one of these calculators, it will give you a recommended sample size. Keep in mind that the calculated sample size is the minimum number needed to achieve the confidence level and margin of error you entered.
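As a rough idea of what such a calculator does internally, the sketch below assumes Cochran's formula for a proportion with a finite population correction. Any particular calculator may use a different formula, and the function and parameter names here are illustrative only.

```python
import math
from statistics import NormalDist


def recommended_sample_size(population_size: int,
                            confidence_level: float = 0.95,
                            margin_of_error: float = 0.05,
                            proportion: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion (assumed Cochran's formula).

    proportion=0.5 is the most conservative guess: it yields the largest sample.
    """
    # z-score that leaves (1 - confidence_level) split across both tails
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction shrinks the requirement for small populations
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)  # round up, since the result is a minimum
```

With these assumptions, a population of 500 at a 95% confidence level and a 5% margin of error gives `recommended_sample_size(500)` equal to 218. Raising the confidence level or tightening the margin of error increases the result.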
Margin of error: the maximum amount that the sample results are expected to differ from those of the actual population
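For a survey proportion, this quantity is commonly approximated from the sample size and confidence level. The sketch below assumes a simple random sample from a large population and the normal approximation; the function name and defaults are illustrative.

```python
from statistics import NormalDist


def margin_of_error(sample_size: int,
                    confidence_level: float = 0.95,
                    proportion: float = 0.5) -> float:
    """Approximate margin of error for a sample proportion (normal approximation)."""
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of a proportion, scaled by the z-score
    return z * (proportion * (1 - proportion) / sample_size) ** 0.5
```

For example, `margin_of_error(100)` is roughly 0.098, so a result of 60% from 100 respondents would be read as 60% plus or minus about 10 percentage points at a 95% confidence level.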