A Simple Method for Analyzing Books

A recent Pew Research Center study found the following:

  • Americans 18 and older read an average of 17 books each year. 19% say they don’t read any books at all; only 5% say they read more than 50.
  • Fewer Americans are reading books now than in 1978.
  • 64% of respondents said they find the books they read through recommendations from family members, friends, or co-workers.
  • The average e-book reader read 24 books (the mean number) in the past 12 months; the average reader who does not read e-books read 15.

The first bullet above is pretty remarkable. At 17 books per year over, let’s say, 40 years of adult reading (after age 18), that’s 680 books read in adulthood. That’s a lot.

This got me thinking about how we decide which books to buy, and how those decisions adapt with each book we read. Are we in tune with our changing desires and interests? And is our feedback loop from both positive and negative reading experiences, well, accurate and efficient?

Some time ago, I began collecting data on my book reading experiences so that I could analyze exactly that. Given the Pew study, I figured I’d share my methodology in the hope that it makes sense to someone else. Star ratings such as those on Amazon are certainly helpful, but my goal is to understand precisely what works for me, so as to make my decisions on reading material accurate, efficient, and part of a lifelong journey for knowledge and inspiration.

Known Data Elements (Both Categorical and Quantitative)

  • Author
  • Type (Non-Fiction vs Fiction)
  • Genre (Thrillers/Suspense, Science/Technology, Current Affairs & Politics, etc.)
  • Number of Pages (using hardcover as a standard)
  • Date Published

Personal Data Inputs (upon book completion)

  • Date Completed
  • Tags/Notes
  • Readability, Flow, & Structure (RFS) – A score ranging from [0.0, 5.0] subjectively assigned to a book based on ease-of-read and the overall structure of the book.
  • Thought-Provoking, Engagement, & Educational Value (TEV) – A score ranging from [0.0, 5.0] subjectively assigned to a book based on how mentally stimulating it was in terms of knowledge and thought.
  • Entertainment, Suspense, & Likeability (ESL) – A score ranging from [0.0, 5.0] subjectively assigned to a book based on the entertainment value and overall likeability of the story, characters, and/or information presented.
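To make the fields above concrete, here is a minimal sketch of how one finished book could be stored, assuming Python 3.7+ and a `dataclass`. The class name `BookRecord` and every field name are hypothetical; they simply mirror the known data elements and personal inputs listed above (with "Tags/Notes" split into two fields), and the checks enforce the Fiction/Non-Fiction type and the [0.0, 5.0] range of each rating.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

SCORE_MIN, SCORE_MAX = 0.0, 5.0
BOOK_TYPES = ("Fiction", "Non-Fiction")


@dataclass
class BookRecord:
    """Hypothetical record for one finished book (field names are illustrative)."""

    # Known data elements
    title: str
    author: str
    book_type: str   # "Fiction" or "Non-Fiction"
    genre: str       # e.g. "Thrillers/Suspense"
    pages: int       # hardcover page count, used as the standard
    published: date
    # Personal data inputs, entered upon completion
    completed: date
    rfs: float       # Readability, Flow, & Structure
    tev: float       # Thought-Provoking, Engagement, & Educational Value
    esl: float       # Entertainment, Suspense, & Likeability
    tags: List[str] = field(default_factory=list)
    notes: str = ""

    def __post_init__(self) -> None:
        # Enforce the categorical type and the [0.0, 5.0] range of each rating.
        if self.book_type not in BOOK_TYPES:
            raise ValueError(f"unknown type: {self.book_type!r}")
        for name in ("rfs", "tev", "esl"):
            value = getattr(self, name)
            if not SCORE_MIN <= value <= SCORE_MAX:
                raise ValueError(f"{name} must lie in [0.0, 5.0], got {value}")
```

Validating at construction time means a mistyped rating (say, 45 instead of 4.5) is caught when the book is logged rather than later, when it would quietly skew an analysis.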

Those three metrics (RFS, TEV, ESL) allow one to create an overall score for the book. My overall score is simply the sum of the three metrics divided by the maximum possible score (15.0), expressed as a percentage from 0% to 100%. I have not yet run any correlation studies or categorical analyses on my data (which covers 42 books, starting in August 2004), but below is a snapshot. As for my next book, it’ll probably be a self-help guide to dropping the data obsession. 🙂
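The overall score is simple enough to express as a function. The sketch below (the name `overall_score` is hypothetical) sums RFS, TEV, and ESL, divides by the 15.0 maximum, and converts the result to a percentage; rounding to the nearest whole percent reproduces the SCORE column in the table.

```python
MAX_TOTAL = 15.0  # three metrics, each capped at 5.0


def overall_score(rfs: float, tev: float, esl: float) -> float:
    """Return (RFS + TEV + ESL) / 15.0, expressed as a percentage in [0, 100]."""
    for value in (rfs, tev, esl):
        if not 0.0 <= value <= 5.0:
            raise ValueError(f"metric outside [0.0, 5.0]: {value}")
    return (rfs + tev + esl) / MAX_TOTAL * 100.0


# Two rows from the table, rounded to whole percentages as displayed there.
print(round(overall_score(4.5, 5.0, 4.5)))  # 93 (A Short History of Nearly Everything)
print(round(overall_score(2.5, 3.0, 1.5)))  # 47 (This is Your Brain on Music)
```

Because the ratings in the table all move in 0.5 steps, every score is a multiple of 10/3 percent, so a rounding tie (and Python's round-half-to-even rule) never comes into play.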

Title | Author | Pages | RFS [0,5] | TEV [0,5] | ESL [0,5] | SCORE [0,100%]
A Short History of Nearly Everything | Bill Bryson | 560 | 4.5 | 5.0 | 4.5 | 93%
The Alchemist | Paulo Coelho | 208 | 4.5 | 4.5 | 4.5 | 90%
Life of Pi | Yann Martel | 336 | 4.5 | 4.0 | 4.5 | 87%
Moneyball: The Art of Winning an Unfair Game | Michael Lewis | 288 | 4.0 | 4.5 | 4.0 | 83%
Born to Be Good: The Science of a Meaningful Life | Dacher Keltner | 352 | 4.0 | 4.5 | 3.5 | 80%
The Tipping Point: How Little Things Can Make a Big Difference | Malcolm Gladwell | 288 | 4.0 | 4.0 | 4.0 | 80%
The Next 100 Years: A Forecast for the 21st Century | George Friedman | 272 | 4.0 | 4.5 | 3.5 | 80%
Super Freakonomics: Global Cooling, Patriotic Prostitutes, and Why Suicide Bombers Should Buy Life Insurance | Steven Levitt; Stephen Dubner | 288 | 4.0 | 4.0 | 4.0 | 80%
Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart | Ian Ayres | 272 | 4.0 | 4.0 | 4.0 | 80%
The Art of Strategy: A Game Theorist’s Guide to Success in Business & Life | Avinash Dixit; Barry Nalebuff | 512 | 4.0 | 4.5 | 3.5 | 80%
The Long Tail: Why the Future of Business is Selling Less of More | Chris Anderson | 256 | 4.0 | 4.0 | 3.5 | 77%
Outliers: The Story of Success | Malcolm Gladwell | 309 | 4.0 | 4.0 | 3.5 | 77%
Body of Lies | David Ignatius | 352 | 4.5 | 3.0 | 4.0 | 77%
A Walk in the Woods: Rediscovering America on the Appalachian Trail | Bill Bryson | 284 | 3.5 | 4.0 | 3.5 | 73%
Kill Alex Cross | James Patterson | 464 | 4.5 | 2.5 | 4.0 | 73%
The Increment | David Ignatius | 400 | 4.0 | 2.5 | 4.5 | 73%
A Whole New Mind: Why Right-Brainers Will Rule the Future | Daniel Pink | 272 | 4.0 | 4.0 | 3.0 | 73%
Blink: The Power of Thinking Without Thinking | Malcolm Gladwell | 288 | 3.5 | 4.0 | 3.0 | 70%
Physics of the Impossible: A Scientific Exploration into the World of Phasers, Force Fields, Teleportation, and Time Travel | Michio Kaku | 352 | 3.5 | 4.0 | 3.0 | 70%
The Bourne Dominion | Eric van Lustbader | 432 | 3.5 | 2.5 | 4.5 | 70%
Fortune’s Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street | William Poundstone | 400 | 3.0 | 4.0 | 3.5 | 70%
The Godfather | Mario Puzo | 448 | 3.5 | 2.5 | 4.5 | 70%
The Sicilian | Mario Puzo | 410 | 3.5 | 2.5 | 4.5 | 70%
The Invention of Air: A Story of Science, Faith, Revolution, and the Birth of America | Steven Johnson | 272 | 3.0 | 4.0 | 3.0 | 67%
The Drunkard’s Walk: How Randomness Rules Our Lives | Leonard Mlodinow | 272 | 3.0 | 3.5 | 3.5 | 67%
Cross Fire | James Patterson | 432 | 4.0 | 1.5 | 4.5 | 67%
The Social Animal: The Hidden Sources of Love, Character, and Achievement | David Brooks | 448 | 3.5 | 4.5 | 2.0 | 67%
The Golden Ratio: The Story of PHI, the World’s Most Astonishing Number | Mario Livio | 294 | 3.0 | 4.0 | 2.5 | 63%
Physics for Future Presidents: The Science Behind the Headlines | Richard Muller | 354 | 3.0 | 3.5 | 3.0 | 63%
The Future of Everything: The Science of Prediction | David Orrell | 464 | 3.0 | 3.5 | 3.0 | 63%
The Department of Mad Scientists | Michael Belfiore | 320 | 3.0 | 3.0 | 3.5 | 63%
For the President’s Eyes Only: Secret Intelligence and the American Presidency from Washington to Bush | Christopher Andrew | 672 | 3.0 | 3.5 | 3.0 | 63%
Born Standing Up: A Comic’s Life | Steve Martin | 209 | 4.0 | 2.0 | 3.0 | 60%
Science is Culture: Conversations at the New Intersection of Science + Society | Adam Bly (Seed Magazine) | 368 | 2.5 | 3.5 | 3.0 | 60%
1491: New Revelations of the Americas Before Columbus | Charles Mann | 480 | 2.5 | 3.5 | 2.5 | 57%
The Curious Incident of the Dog in the Night-Time | Mark Haddon | 226 | 3.0 | 3.0 | 2.0 | 53%
Group Theory in the Bedroom, and Other Mathematical Diversions | Brian Hayes | 288 | 2.0 | 3.5 | 2.0 | 50%
Euclid in the Rainforest: Discovering Universal Truth in Logic and Math | Joseph Mazur | 352 | 2.0 | 3.0 | 2.5 | 50%
This is Your Brain on Music: The Science of a Human Obsession | Daniel Levitin | 320 | 2.5 | 3.0 | 1.5 | 47%
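No correlation studies or categorical analyses have been run on this data yet, but a first pass needs nothing beyond the Python standard library. The sketch below is illustrative only: it uses just the table rows for the five authors who appear more than once, computes a Pearson correlation between TEV and ESL, and averages the overall score by author. The names `ROWS`, `pearson`, and `mean_score_by_author` are hypothetical.

```python
import math
from collections import defaultdict

# (author, RFS, TEV, ESL) for the table rows whose author appears more than once
ROWS = [
    ("Bill Bryson", 4.5, 5.0, 4.5),
    ("Bill Bryson", 3.5, 4.0, 3.5),
    ("Malcolm Gladwell", 4.0, 4.0, 4.0),
    ("Malcolm Gladwell", 4.0, 4.0, 3.5),
    ("Malcolm Gladwell", 3.5, 4.0, 3.0),
    ("David Ignatius", 4.5, 3.0, 4.0),
    ("David Ignatius", 4.0, 2.5, 4.5),
    ("James Patterson", 4.5, 2.5, 4.0),
    ("James Patterson", 4.0, 1.5, 4.5),
    ("Mario Puzo", 3.5, 2.5, 4.5),
    ("Mario Puzo", 3.5, 2.5, 4.5),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def mean_score_by_author(rows):
    """Average overall score (percent of the 15.0 maximum) per author."""
    scores = defaultdict(list)
    for author, rfs, tev, esl in rows:
        scores[author].append((rfs + tev + esl) / 15.0 * 100.0)
    return {author: sum(s) / len(s) for author, s in scores.items()}


tev = [row[2] for row in ROWS]
esl = [row[3] for row in ROWS]
print(f"TEV vs ESL correlation: {pearson(tev, esl):+.2f}")
for author, score in sorted(mean_score_by_author(ROWS).items(), key=lambda kv: -kv[1]):
    print(f"{author}: {score:.1f}%")
```

With only 11 rows, any coefficient printed here is anecdotal rather than a finding; the same two functions apply unchanged to the full 42-book log, and grouping by Type or Genre instead of author only changes the key used in `mean_score_by_author`.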

A List of Some Web Data Sources

Well, I needed to pull together a list of publicly available data sources for a project, so I figured I’d post them here as well. Some descriptions and tag lines have been taken directly from the websites, and some I quickly wrote on my own. This list is by no means comprehensive (I probably have about 100 links in the “Data” folder of my bookmarks…), but it’s a quick snapshot of some useful data sources on the web. That said, there are a lot of considerations when targeting a data set, and tomorrow’s need for data will most likely differ from today’s. Build and execute a targeted data strategy using the vast array of search engines, libraries, and social networks on the web, and you’ll be just fine.

AggData – The advantage of AggData is that the data is collected into one file that is very raw and portable, which makes it easy to integrate into any application or website. You can browse free data sets or purchase any of the many data sets from public and private organizations for a relatively small fee.

The Association of Religion Data Archives – The ARDA Data Archive is a collection of surveys, polls, and other data submitted by researchers and made available online by the ARDA. There are nearly 500 data files included in the ARDA collection. You can browse files by category, alphabetically, view the newest additions, most popular files, or search for a file. Once you select a file you can preview the results, read about how the data were collected, review the survey questions asked, save selected survey questions to your own file, and/or download the data file.

Census.gov American FactFinder – In American FactFinder you can obtain data in the form of maps, tables, and reports from a variety of Census Bureau sources. A good listing of available data sets, visualizations, and search functionalities is also available.

CIA World Factbook – Contains many country-level metrics and statistics, although they are not easily exported or available in table format.

City Population – Gazetteer of global geographic data and limited demographic statistics per location.

Data360 – This is essentially a wiki for data. Data360 is an open-source, collaborative, and free website. The site hosts a common, shared database that any person or organization committed to neutrality and non-partisanship (meaning “let the data speak”) can use to present reports and visualizations about the data.

Data.gov – The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead. For more information, view our How to Use Data.gov guide.
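For programmatic access, here is a minimal sketch assuming that the Data.gov catalog (catalog.data.gov) exposes the standard CKAN Action API and its `package_search` action; the helper names are hypothetical. Only `search()` touches the network, and it runs only when you call it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Data.gov catalog serves CKAN's Action API at this path.
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def search_url(query: str, rows: int = 10) -> str:
    # `q` is CKAN's free-text query; `rows` caps how many datasets come back
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "rows": rows})


def dataset_titles(response: dict) -> list:
    # CKAN responses look like {"success": true, "result": {"count": N, "results": [...]}}
    if not response.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [package["title"] for package in response["result"]["results"]]


def search(query: str, rows: int = 10) -> list:
    # The only function that performs a network call
    with urllib.request.urlopen(search_url(query, rows), timeout=30) as resp:
        return dataset_titles(json.load(resp))
```

Keeping URL construction and response parsing separate from the HTTP call makes both easy to check offline, and it keeps the sketch easy to adapt if the catalog's endpoint ever moves.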

Data Marketplace – Buy and/or sell data. You can request data sets for others to build and provide for a small fee.

DBpedia – DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
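As an illustration of such a query, here is a sketch assuming DBpedia’s public SPARQL endpoint at https://dbpedia.org/sparql, which follows the standard SPARQL protocol (the query text goes in a `query` parameter) and can return the W3C SPARQL JSON results format. The example query (works whose `dbo:author` is Malcolm Gladwell) and the helper names are hypothetical, and actual results depend on what DBpedia currently holds.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?work WHERE { ?work dbo:author dbr:Malcolm_Gladwell } LIMIT 10
"""


def build_request(query: str) -> urllib.request.Request:
    # SPARQL protocol: send the query text as the `query` parameter of a GET
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows_from_results(results: dict) -> list:
    # Flatten SPARQL JSON results into one {variable: value} dict per solution;
    # variables left unbound in a solution are simply omitted.
    variables = results["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in results["results"]["bindings"]
    ]


def run_query(query: str = QUERY) -> list:
    # Performs the network call; nothing is fetched until this is invoked
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return rows_from_results(json.load(resp))
```

Calling `run_query()` returns a list of dicts keyed by the `SELECT` variables, which drops straight into the kind of tabular analysis used for the book log.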

EconoMagic – A directory of data sets specific to US states.

Factual – Factual is a platform where anyone can share and mash open data on any subject. Factual was founded to provide open access to better structured data.

FedStats – Provides access to all federal statistical agencies (by geographic scope or listed alphabetically) with a search function to discover available data sets across all US federal statistical agencies.

Gapminder – A non-profit venture that, through an interactive visualization tool accompanied by a listing of available data tables, aims to “unveil the beauty of statistics for a fact based world view”.

GeoCommons Finder! – Upload, organize, and share your geographic data. Then you can use the built-in application called Maker! to map and visualize it.

GeoNames – The GeoNames geographical database covers all countries and contains over eight million placenames that are available for download free of charge.
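The GeoNames dump files (for example a per-country file such as `US.txt`, assumed here as the name) are UTF-8, tab-delimited text with no header row and 19 columns per place. Below is a minimal sketch that pulls out a few of those columns. The `Place` type and `read_places` helper are hypothetical, and the column positions follow the readme published alongside the GeoNames dumps.

```python
from typing import Iterable, Iterator, NamedTuple


class Place(NamedTuple):
    geonameid: int
    name: str
    latitude: float
    longitude: float
    feature_class: str  # e.g. "P" = populated place, "A" = administrative area
    country_code: str
    population: int


def read_places(lines: Iterable[str]) -> Iterator[Place]:
    """Parse GeoNames dump lines (tab-delimited, 19 columns, no header)."""
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 19:
            continue  # skip blank or malformed lines
        yield Place(
            geonameid=int(cols[0]),
            name=cols[1],
            latitude=float(cols[4]),
            longitude=float(cols[5]),
            feature_class=cols[6],
            country_code=cols[8],
            population=int(cols[14] or 0),
        )
```

Because `read_places` accepts any iterable of strings, you can pass it an open file directly (`open("US.txt", encoding="utf-8")`) and filter lazily, for instance keeping places whose `feature_class` is `"P"` and whose `population` exceeds some threshold, without loading the whole dump into memory. Splitting on tabs by hand, rather than using the `csv` module, sidesteps its default field-size limit on the long alternate-names column.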

Global Airport Database – Comprehensive set of global airport data (download available for free).

Global Health Facts – Search global data by health topic and/or country. You can also interactively compare data for up to five countries at a time.

Google Public Data – Beyond simply using the main Google search engine to look for a specific data set, you can turn to Google’s public data library, which offers some valuable sets for free.

Guardian.co.uk Data Store – Governments around the globe are opening up their data vaults – allowing you to check out the numbers for yourself. This is the Guardian’s gateway to that information. Search for government data here from the UK (including London), USA, Australia and New Zealand – and look out for new countries and places as we add them. Read more about this on the Datablog. The Guardian also maintains a full list of government data sites.

Harvard Geographic Information Systems – Contains a highly credible listing of various national and international data providers and data sources, with a strong focus on geographic data.

International Civil Aviation Organization (ICAO) – Global air traffic data available for a fee.

Infochimps – Request data sets, search for existing data sets, or post and sell your own data sets.

International Statistical Agencies
US Census Bureau: http://www.census.gov/aboutus/stat_int.html
US Bureau of Labor Statistics: http://www.bls.gov/bls/other.htm
United Nations: http://unstats.un.org/unsd/methods/inter-natlinks/sd_intstat.htm

MelissaData – Buy comprehensive zip code data for about $150. Tailored to businesses, particularly for marketing use.

NationMaster – NationMaster is a massive central data source and a handy way to graphically compare nations. NationMaster is a vast compilation of data from such sources as the CIA World Factbook, UN, and OECD. Using the site’s form, you can generate maps and graphs on all kinds of statistics with ease.

National Association of Counties (NACO) – Includes a US county data library.

Numbrary – Numbrary is a free online service dedicated to finding, using and sharing numbers on the web.

OECD Stat Extracts – OECD.Stat includes data and metadata for OECD (Organization for Economic Cooperation and Development) countries and selected non-member economies.

QuickFacts (US Census Bureau Site) – Quick, easy access to facts about people, business, and geography.

StateMaster – StateMaster is a unique statistical database which allows you to research and compare a multitude of different data on US states. We have compiled information from various primary sources such as the US Census Bureau, the FBI, and the National Center for Educational Statistics. More than just a mere collection of various data, StateMaster goes beyond the numbers to provide you with visualization technology like pie charts, maps, graphs and scatterplots. We also have thousands of map and flag images, state profiles, and correlations.

United Nations Development Programme (UNDP) – Includes UN Human Development reports and statistics such as the Human Development Index.

USA Counties (US Census Bureau Site) – A directory of data tables for US states and individual counties. Includes over 6,500 data items.

Weather Underground – Provides free access to historical weather data for cities around the globe.

Wolfram|Alpha – Billed as a “computational knowledge engine”, the W|A search and discovery tool is mathematically based and tries to turn queries (term-based or data-driven) into actionable knowledge, with visualizations of in-house data sets and information relevant to your query.

World Gazetteer – The World Gazetteer provides a comprehensive set of population data and related statistics.

World Port Source – Contains extensive data on global sea ports, characterized by size and searchable by shipping liners and other various data fields.