    Using Public Data In Your Business

    Imagine that you plan to open a new store. What information do you need to be confident you are making the best decision? Beyond your own business data, would you consider public data such as population, gender composition, education, number of children per family, distance from bus and train stations, average personal income, number of vehicle registrations, and school dates? I believe that combining public data with your own extends the scope of your research, improves the odds of your bet, and supports your long-term success.

    More examples of using public data

    Local data is as important as government census data. For example, John Gagnon1 wrote that a Super 8 hotel was able to charge high rates for the first Formula One race in 2007 and expected to do the same in 2013; that a national business moved its back-to-school inventory from place to place according to school opening dates; and that local businesses determined whether they could deliver to customers within a certain timeframe. Al Tompkins2 interviewed Doug Haddix about how journalists can use public data from the 2010 U.S. census for stories about their communities.

    Why public data is important

    Meta Brown3 listed the reasons census data are critical: they are “accurate and comprehensive”, “consistent over time and place”, “granular (city block groups)”, “comprehensive content”, and “timely”. Given these characteristics, a business can use census data to make better investment projections, to gauge how skilled its labor force is, and to project sales behavior and shopping preferences.

    Things to consider when using public data

    Public data should be treated as “secondary data” because it is not derived directly from your own source or sample, so it must be used with some caution. Don Patrick4 points out things to consider when using secondary data in research: look into the definition and scope of the data, check how the data were measured, and verify whether the data have been used in earlier studies. Joseph Rabianski5 pointed out similar factors to consider when using secondary data in research.

    Using public data in an auto sales study – proof of concept

    Suppose that we plan to analyze U.S. auto sales data by state. What other public data can we apply to our model? The data below are included in our study.

    1. Sales data: published by NADA6

    2. Demographic data: population, race, birth rate

    3. Economic data: personal income, gross domestic product, unemployment rate, housing sales, federal aid to states

    4. Crime data: motor vehicle theft

    5. Geographical data7: list of regions such as Northeast, Midwest, South, and West

    We can apply data mining to detect whether any hidden patterns emerge as we add more data to the model.
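
    As a rough illustration of the idea, the sketch below joins a state-level sales table with two public variables and computes a simple correlation for each, which is one basic way to start looking for patterns. It is only a minimal sketch under stated assumptions, not the method used in the study: the state names and every number are made-up placeholders rather than NADA or census figures, and the function names (join_by_state, pearson, correlations_with_sales) are hypothetical.

```python
from math import sqrt

# Made-up placeholder values for illustration only; these are NOT real statistics.
sales = {"State A": 120.0, "State B": 95.0, "State C": 150.0, "State D": 80.0}
public = {
    "personal_income": {"State A": 48.0, "State B": 41.0, "State C": 55.0, "State D": 38.0},
    "unemployment_rate": {"State A": 5.0, "State B": 6.5, "State C": 4.0, "State D": 7.5},
}


def join_by_state(sales, public):
    """Build one row per state that appears in the sales table and in every public table."""
    rows = []
    for state, sold in sales.items():
        if all(state in table for table in public.values()):
            row = {"state": state, "sales": sold}
            for name, table in public.items():
                row[name] = table[state]
            rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlations_with_sales(rows):
    """Correlate each public variable with sales across the joined rows."""
    names = [key for key in rows[0] if key not in ("state", "sales")]
    target = [row["sales"] for row in rows]
    return {name: pearson([row[name] for row in rows], target) for name in names}


rows = join_by_state(sales, public)
for name, r in correlations_with_sales(rows).items():
    print(f"{name}: r = {r:+.2f}")
```

    A correlation is only a first screen. With real data, each added variable would still need the secondary-data checks described earlier (definition, scope, and measurement) before it is trusted in a model.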


    Applying public data is nothing new. What is interesting about the recent trend is that more data have become available to the public, and the potential value of these data should not be overlooked in your research.






