How to Study 20,000 Research Papers in 3 Weeks

After exploring opinionated research, I was drawn to the more rigorous academic studies I’d encountered along the way.

Seeking more academic research, I moved my understanding of the conversation on education away from debating the theoretical history of education and the ideals of its future, and toward identifying the factors which directly affect its quality so they can be leveraged.

I’ve focused on a single site as the source for this research. It would be the richest source of direct research, but far too much to read on my own.

So I would scrape the data, enlist the help of a distributed team to summarize the premise and findings of each article, and finally work with a trusted content writer to synthesize the available research into “thesis proposals,” which I could then flip into essay outlines full of cited research, quotes, and facts, ready for me to write up in my own voice.

Here goes:


I’ve identified 61 education-related topics on the site.

These were scraped using ParseHub (after also trying Kimono). I could have written more Scrapy scrapers in Python, but I wanted to see what the code-less tools could do.

This resulted in some 20,000 unique research articles, from which I selected 1,000 that sounded interesting.
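For a sense of what the code-based alternative would have involved, here is a minimal sketch of the extraction step in plain Python. The HTML structure, the `article-title` class, and the field names are all hypothetical; the real selectors would depend on the site being scraped.

```python
from html.parser import HTMLParser

class ArticleLinkParser(HTMLParser):
    """Collect (title, url) pairs from anchors marked with a hypothetical
    class="article-title" -- the real selector depends on the site scraped."""
    def __init__(self):
        super().__init__()
        self.articles = []
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "article-title" in attrs.get("class", ""):
            self._in_title = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.articles.append(("".join(self._text).strip(), self._href))
            self._in_title = False

html = '<div><a class="article-title" href="/p/1">Blended Learning Trends</a></div>'
parser = ArticleLinkParser()
parser.feed(html)
print(parser.articles)  # [('Blended Learning Trends', '/p/1')]
```

A full Scrapy spider would wrap this same idea in a crawl loop with pagination; the code-less tools do the equivalent with point-and-click selectors.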

Now, with 1,000 articles in hand, I wanted to understand the basic topics I would be reading about, and to find a means of highlighting the research most relevant to my interests.

I would tag all the articles, and derive a relevance score from their tags, starting with:

  • Technology
  • Assessment
  • Psychology

As I worked, I also refined my tags. By the end, I knew the topics I would be looking at:

  • Technology
  • Assessment – e.g., means of assessment, as well as definitions of standards.
  • Student Psychology – Usually motivation / engagement
  • Teacher Psychology – Ended up with lots on professional development
  • Community – Civic / social responsibility, or ways schools interfaced with or created community.
  • Philosophy – Anything relevant that didn’t seem to fit neatly into one of the above.

Each article received a score of 0 – 3 in each tag. This produced an overall score by which I could rank articles by priority, as well as slice them by categories and groups of categories. Additionally, I applied a binary “Must Read” tag to manually highlight articles I knew would be worth reading but that, for whatever reason, didn’t score highly on the tags themselves.
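The scoring scheme can be sketched in a few lines of Python. The tag names come from the list above; the dictionary layout and the sample articles are invented for illustration.

```python
# Each article holds a 0-3 score per tag, plus an optional "must_read" flag.
TAGS = ["technology", "assessment", "student_psych",
        "teacher_psych", "community", "philosophy"]

def relevance(article):
    """Total tag score; the Must Read flag floats an article to the top
    regardless of how it scored on the tags themselves."""
    base = sum(article.get(tag, 0) for tag in TAGS)
    return (article.get("must_read", False), base)

articles = [
    {"title": "Trends in Blended Learning", "technology": 3, "assessment": 1},
    {"title": "Teacher PD Case Study", "teacher_psych": 2, "must_read": True},
    {"title": "Civic Education", "community": 2},
]

ranked = sorted(articles, key=relevance, reverse=True)
print([a["title"] for a in ranked])
# ['Teacher PD Case Study', 'Trends in Blended Learning', 'Civic Education']
```

Returning a (flag, score) tuple means the sort handles the Must Read override for free: tuples compare element by element, so the flag wins before the score is even consulted.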

Then, I began creating sub-sets from the tagged articles using the QUERY function:

  • Assessment of Student Psychology (Articles with > 0 scores in Assessment and Student Psych.)
  • Assessment of Teacher Psychology (> 0 in Assessment and Teacher Psych.)
  • Assessment & Technology
  • Assessment of Community Impact (> 0 in Assessment and Community)

As well as sub-sets where articles used specific keywords in their titles:

  • Trend, as in “Trends in Education” or “The Trend Toward X”
  • Principles, as in “Guiding Principles of Primary Education”
  • Blended, as in all things “Blended Learning”. See also “transformative” and “flipped”.
  • Games, mind (as in mindfulness), curriculum
  • “Data”
  • Anything with the word “Effective” in its title
  • “Case”, as in case studies.
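In the spreadsheet these sub-sets came from the QUERY function; the same filters are easy to sketch in Python, assuming the score dictionaries from earlier. The function names and sample data here are illustrative, not the actual workbook.

```python
import re

def tag_subset(articles, *tags):
    """Articles scoring > 0 in every named tag -- mirroring, e.g., the
    'Assessment of Student Psychology' sub-set."""
    return [a for a in articles if all(a.get(t, 0) > 0 for t in tags)]

def title_subset(articles, keyword):
    """Articles whose title mentions the keyword, case-insensitively."""
    return [a for a in articles
            if re.search(keyword, a["title"], re.IGNORECASE)]

articles = [
    {"title": "Grading and Motivation", "assessment": 2, "student_psych": 3},
    {"title": "The Trend Toward Blended Learning", "technology": 3},
]

print([a["title"] for a in tag_subset(articles, "assessment", "student_psych")])
# ['Grading and Motivation']
print([a["title"] for a in title_subset(articles, "blended")])
# ['The Trend Toward Blended Learning']
```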

See the final scored and sorted results.

Hiring A Distributed Team

First, I set up a workspace in Google Drive to organize the distributed work. Each worker got a folder editable only by them, containing a spreadsheet where I would link articles for that worker to summarize, and where they would link back to the completed summaries.

I had a master workbook which collected all the summaries from all the workers in one place (using the IMPORTRANGE function), gave me overview statistics on everybody’s progress, and showed the summaries inline with my topic and keyword sub-sets.

The folder structure looked like this:

  • Home
    • All Scraped Articles.csv
    • 1000 Scored Articles.csv
    • Summaries
      • Summaries by Worker1 – Worker 1 has permission to edit from here, down.
        • Articles for Worker1.csv
        • summary 1
        • summary 2
      • Summaries by Worker2 – Worker 2 has permission to edit from here, down.
        • Articles for Worker2.csv
        • summary 1
        • summary 2
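If I were mirroring that layout on disk rather than in Drive, it would look something like this sketch with `pathlib`. The worker names and file names follow the structure above; `build_workspace` is a name I made up for illustration.

```python
from pathlib import Path
import tempfile

def build_workspace(root, workers):
    """Create the Home/Summaries/<worker> folder tree described above,
    with an empty assignment spreadsheet in each worker's folder."""
    home = Path(root) / "Home"
    for name in workers:
        folder = home / "Summaries" / f"Summaries by {name}"
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"Articles for {name}.csv").touch()
    return home

home = build_workspace(tempfile.mkdtemp(), ["Worker1", "Worker2"])
print(sorted(p.name for p in (home / "Summaries").iterdir()))
# ['Summaries by Worker1', 'Summaries by Worker2']
```

In Drive the extra step is sharing: each “Summaries by WorkerN” folder is the permission boundary, so one share per worker covers everything inside it.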

The overview was meant to highlight:

  1. How close am I to having all the summaries I want?
  2. Who works the fastest (so I can rely on them)?
  3. Who is almost out of work (I need to assign more articles, possibly pull from someone who’s backlogged)?

This overview looks like:

Worker Stats
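In the workbook, this overview came from IMPORTRANGE pulling every worker’s sheet into one place. A rough Python equivalent, assuming each worker’s sheet reduces to assigned and completed counts (the numbers and the threshold below are invented), looks like:

```python
# Hypothetical per-worker tallies pulled from each "Articles for WorkerN" sheet.
workers = {
    "Worker1": {"assigned": 40, "completed": 35},
    "Worker2": {"assigned": 40, "completed": 12},
}

def overview(workers, total_wanted):
    """Answer the three questions: overall progress, fastest worker,
    and who is about to run out of assigned articles."""
    done = sum(w["completed"] for w in workers.values())
    return {
        "percent_done": round(100 * done / total_wanted, 1),
        "fastest": max(workers, key=lambda n: workers[n]["completed"]),
        "nearly_out": [n for n, w in workers.items()
                       if w["assigned"] - w["completed"] < 10],
    }

print(overview(workers, total_wanted=100))
# {'percent_done': 47.0, 'fastest': 'Worker1', 'nearly_out': ['Worker1']}
```

Note that the fastest worker is often also the one nearly out of work, which is exactly the signal for pulling articles from someone who’s backlogged.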

Now, with my setup complete, I began hiring workers.

I created stock messages for:

  • Inviting a freelancer to apply
  • Accepting a freelancer’s proposal
  • Tips, FAQs, and other gotchas I found myself explaining more than once, which were then bundled up and sent to each subsequent worker shortly after hiring them.

Workers were hired from:

  • Elance
    • Made 4 hires, 2 of which I later canceled for low quality.
    • $5 / hr average
    • Tip: Work with individuals, rather than agencies.
    • Don’t be afraid to cut people off early. You will find other workers who will do the job well, and you’ll save a lot of time and money by not trying to “teach” bad workers to perform well.
  • Upwork
    • Probably hired 10. Only canceled one.
    • $3.5 / hr average
    • The UI is better than Elance’s, yet still nothing to be proud of: the design looks nice but is clunky to use.

Reviewing the Research to Synthesize New Ideas

I am in the middle of reading the summaries and drawing new conclusions.

Stay tuned.


To recap, the full process:

  • Scrape
  • Score
  • Hire
  • Review (with writer)
  • Synthesize (with writer)
  • Outline (just writer)
  • Write (just me)

Liked that post? Try this one next:

  • What’s Going On With Education, Today?

    A casual attempt to “catch up” on education reform (trends, state of the art, and status quo) results in a meandering exploration through ideals, and ends with an idea for how to get to the facts.


Jordan is a freelance engineer with full-stack chops, and an eye for analytics and growth.

