Introducing Steam Gauge: Ars reveals Steam’s most popular games

@ 2014/04/16
After throwing out pages that are marked as private or are invalid for some reason, our crawler leaves us with a sample of about 80,000 to 90,000 valid Steam user pages every day. From that, we can estimate the percentage of all Steam users who have bought or played any particular game (and how many hours they’ve spent on that game). We then multiply that ratio out across the total size of the Steam Community ID universe (about 172 million but growing every day) to generate our sales and gameplay estimates.
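The extrapolation step can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the crawler's actual code: the function name `estimate_owners` and the figure of 4,250 owners in the sample are hypothetical, while the sample size and the 172 million population come from the numbers quoted in the text.

```python
def estimate_owners(sample_owners: int, sample_size: int, population: int) -> float:
    """Scale the share of sampled profiles that own a game up to the full population."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    share = sample_owners / sample_size  # fraction of valid sampled profiles owning the game
    return share * population


# Hypothetical example: 4,250 of 85,000 valid profiles own the game (5 percent).
estimate = estimate_owners(sample_owners=4_250, sample_size=85_000, population=172_000_000)
print(f"Estimated owners: {estimate:,.0f}")  # roughly 8.6 million
```

The same scaling would apply to hours played: average the hours across the sample, then multiply by the estimated number of owners.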

Sampling what amounts to just 0.04 percent of Steam Community pages every day might not seem like an effective methodology, but the power of random sampling means that should be enough to produce a margin of error of only 0.33 percent relative to the actual numbers, statistically speaking. (This is the same reason national political polls can be so accurate while sampling just a few thousand likely voters.)
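For readers who want to check that figure, here is a minimal sketch of the standard margin-of-error formula for a sampled proportion. It rests on assumptions the text does not spell out: a 95 percent confidence level (z = 1.96) and the worst-case proportion of 0.5. It also ignores the finite-population correction, which is negligible when the sample is such a tiny fraction of the whole. The function name `margin_of_error` is ours, for illustration only.

```python
import math


def margin_of_error(sample_size: int, proportion: float = 0.5, z: float = 1.96) -> float:
    """Margin of error for a sampled proportion, as a fraction (0.0033 means 0.33 points)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Assumed inputs: the daily sample sizes quoted in the text, worst-case proportion of 0.5.
for n in (80_000, 90_000):
    print(f"n = {n:,}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Under those assumptions, a daily sample of 80,000 to 90,000 profiles gives a margin of roughly 0.33 to 0.35 percentage points, in line with the number quoted above. The margin shrinks further for games owned by far fewer than half of all users, since `proportion * (1 - proportion)` peaks at 0.5.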

Still, to be as accurate as possible and to smooth out some noisiness in the day-to-day samples (especially for games that aren’t major sellers), we're using a three-day rolling sample to generate our final reported numbers. That means every day we “throw out” the data from three days prior and replace it with newer data from more recent crawling. Our rolling sample generally includes data from more than 250,000 valid Steam Community profiles at any time.
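The rolling window can be sketched with a fixed-length queue: each new day's counts push the oldest day out, and the estimate is computed from the pooled totals. This is an assumed implementation for illustration, not a description of the real pipeline; the class name `RollingSample` and the daily counts below are hypothetical.

```python
from collections import deque


class RollingSample:
    """Pool per-day (owners, profiles) counts over the most recent `days` days."""

    def __init__(self, days: int = 3):
        # A deque with maxlen discards the oldest entry automatically on append.
        self._window = deque(maxlen=days)

    def add_day(self, owners: int, profiles: int) -> None:
        self._window.append((owners, profiles))

    def total_profiles(self) -> int:
        return sum(profiles for _, profiles in self._window)

    def share(self) -> float:
        """Fraction of pooled profiles that own the game."""
        total = self.total_profiles()
        if total == 0:
            raise ValueError("no data in the window yet")
        return sum(owners for owners, _ in self._window) / total


# Hypothetical daily counts: (profiles owning the game, valid profiles crawled).
rolling = RollingSample(days=3)
for owners, profiles in [(3_900, 80_000), (4_600, 90_000), (4_250, 85_000), (4_400, 88_000)]:
    rolling.add_day(owners, profiles)

# Only the last three days remain; the first day's data has been thrown out.
print(f"Pooled profiles: {rolling.total_profiles():,}")
print(f"Pooled ownership share: {rolling.share():.4f}")
```

Pooling raw counts rather than averaging three daily percentages weights each day by how many valid profiles it actually contributed.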
