

Turn GPT-4 Into your Personal Literature Review Bot


👋 Hey Friends,

Getting into a new field of research means reading dozens of landmark and seminal papers to understand that field’s foundations. That usually involves a manual process: searching keywords on Google Scholar, chasing down papers, and polling friends, advisors, colleagues, and other experts for the most important ones. It’s an ad hoc process that can take months.

Even more time-intensive is conducting a proper, methodical literature review. Establishing a method, identifying the right keywords, finding the right databases: it’s difficult but vital work.

Lately, I’ve been thinking about how I can use ChatGPT to get researchers a summary of a field’s most important papers as quickly and digestibly as possible. That brings me to the first tool in The Academic’s ToolKit, the Scholar Scraper.

All the code can be found here: ScholarScraper

The Academics_Scholar_Scraper is a Python package that searches for and summarizes the most-cited articles matching a keyword. It uses the Elsevier Scopus API and GPT-4 to produce a CSV of the top articles with the following fields (a rough sketch of the pipeline follows the list):

  • Title: The title of the paper.
  • Authors: The authors of the paper.
  • Publication Name: The name of the publication where the paper was published.
  • Publication Date: The date when the paper was published.
  • DOI: The DOI of the paper.
  • Summary: A summary of the paper generated by the GPT-4 model.
  • Hypotheses: Hypotheses in the paper as interpreted by the GPT-4 model.
  • Methods: Methods used in the paper as interpreted by the GPT-4 model.
  • Findings: Findings in the paper as interpreted by the GPT-4 model.
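
To make the flow concrete, here’s a rough sketch of how such a search-then-summarize pipeline can be wired up. This is not the package’s actual source: the Scopus endpoint, query syntax, and response fields come from Elsevier’s public Scopus Search API, but the prompt wording, helper names, and the reduced set of CSV columns are illustrative assumptions of mine.

import csv
import os

import requests
from openai import OpenAI

SCOPUS_URL = "https://api.elsevier.com/content/search/scopus"

def top_articles(keyword, count=10):
    """Fetch the most-cited Scopus entries for a keyword."""
    resp = requests.get(
        SCOPUS_URL,
        headers={"X-ELS-APIKey": os.environ["ELSEVIER_API_KEY"]},
        params={
            "query": f"TITLE-ABS-KEY({keyword})",
            "count": count,
            "sort": "-citedby-count",  # most-cited papers first
        },
    )
    resp.raise_for_status()
    return resp.json()["search-results"]["entry"]

def summarize(title, client):
    """Ask GPT-4 for a short summary of a paper (illustrative prompt)."""
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Briefly summarize the paper titled: {title}"}],
    )
    return chat.choices[0].message.content

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("papers.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Title", "Authors", "DOI", "Summary"])
    writer.writeheader()
    for entry in top_articles("machine learning"):
        writer.writerow({
            "Title": entry.get("dc:title", ""),
            "Authors": entry.get("dc:creator", ""),
            "DOI": entry.get("prism:doi", ""),
            "Summary": summarize(entry.get("dc:title", ""), client),
        })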

Convinced? Let’s get into how to use it!

Installation

To install the Scholar Scraper, there are three steps:

  • Install Python on your system if you haven’t already.
  • Install the academics_scholar_scraper package using pip.
  • Set up your Elsevier API key and OpenAI API key as environment variables.

Let’s go through them together 😄

Installing the academics_scholar_scraper package

  1. Make sure you have Python installed on your system. You can check if Python is installed by running the command python --version in your terminal. If Python is not installed, you can download it from the official website at https://www.python.org/downloads/. Alternatively, you can use your operating system’s package manager to install it.
  2. Open a terminal or command prompt and run the following command to install the academics_scholar_scraper package:

pip install academics_scholar_scraper

This command will download and install the package and its dependencies.
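
If you want to sanity-check the install from Python itself, something like this should work; note the import name is my assumption, inferred from the pip package name:

import importlib.util

# Hypothetical check: assumes the installed package exposes a module
# named academics_scholar_scraper (matching the pip package name).
if importlib.util.find_spec("academics_scholar_scraper") is None:
    print("Package not found -- check which Python environment pip installed into.")
else:
    print("academics_scholar_scraper is installed.")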

Setting up the Elsevier API key

  1. To use the Elsevier API, you need to obtain an API key from the Elsevier Developer Portal. If you don’t have an account, you can create one for free at https://dev.elsevier.com/.

  2. Once you have an account, log in to the Elsevier Developer Portal and navigate to the “API Key Generator” page. Select the “SCOPUS” product and generate a new API key.

  3. Copy the API key and paste it into a text editor or note-taking app for safekeeping.

  4. In your terminal or command prompt, run the following command to set the ELSEVIER_API_KEY environment variable, replacing your_elsevier_api_key with the key you generated in step 2:

export ELSEVIER_API_KEY=your_elsevier_api_key

  5. To verify that the environment variable is set correctly, run the following command: echo $ELSEVIER_API_KEY

This should output your Elsevier API key.

Setting up the OpenAI API key

  1. To use the OpenAI GPT API, you need to obtain an API key from the OpenAI website. If you don’t have an account, you can create one for free at https://beta.openai.com/signup/.

  2. Once you have an account, log in to the OpenAI website and navigate to the “API Keys” page. Generate a new API key.

  3. Copy the API key and paste it into a text editor or note-taking app for safekeeping.

  4. In your terminal or command prompt, run the following command to set the OPENAI_API_KEY environment variable, replacing your_openai_api_key with the key you generated in step 2:

export OPENAI_API_KEY=your_openai_api_key

  5. To verify that the environment variable is set correctly, run the following command: echo $OPENAI_API_KEY

This should output your OpenAI API key.

That’s it! You have now installed the academics_scholar_scraper package and set up the necessary API keys to use it.
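
One caveat: export only sets a variable for the current shell session, so add those two export lines to your shell profile (e.g. ~/.bashrc or ~/.zshrc) if you want them to persist. You can also double-check both keys from Python with a small sketch like this:

import os

# Fail fast if either key is missing from the environment.
for var in ("ELSEVIER_API_KEY", "OPENAI_API_KEY"):
    if not os.environ.get(var):
        raise SystemExit(f"{var} is not set -- export it before running the scraper.")
print("Both API keys are set.")
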
Usage

To run the script from the command line, use the following arguments:

    keyword: The keyword to search for in the articles (required).
    -n, --num_papers: The number of papers to retrieve (default: 10).
    -o, --output: The output CSV file (default: papers.csv).
    -s, --subject: The Scopus subject area code (e.g., AGRI, ARTS, BIOC) (optional).

For example, if you want to search for ten papers related to machine learning in the computer science subject area and save the summaries in a file called “results.csv”, use the following command:

academics_scholar_scraper "machine learning" -n 10 -o results.csv -s COMP
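
As an aside, a command-line interface like this is typically wired up with Python’s argparse; here’s a hedged sketch of what the argument parsing might look like. The flag names mirror the list above, but everything else is my assumption rather than the package’s actual source:

import argparse

# Illustrative argparse setup mirroring the documented flags;
# not the package's actual source.
parser = argparse.ArgumentParser(description="Search and summarize top-cited papers.")
parser.add_argument("keyword", help="keyword to search for in the articles")
parser.add_argument("-n", "--num_papers", type=int, default=10,
                    help="number of papers to retrieve")
parser.add_argument("-o", "--output", default="papers.csv",
                    help="output CSV file")
parser.add_argument("-s", "--subject", default=None,
                    help="Scopus subject area code, e.g. COMP")
args = parser.parse_args()
print(args.keyword, args.num_papers, args.output, args.subject)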

Here’s a step-by-step guide on how to use the academics_scholar_scraper package to retrieve academic papers using the command line:

  1. Open a command prompt or terminal window on your computer. You can typically do this by searching for “Command Prompt” or “Terminal” in your computer’s search bar.
  2. If you installed the package with pip, the academics_scholar_scraper command is available from any directory, and you can skip this step. If you’re running from a copy of the source code instead, navigate to the directory where the main.py file is located using the cd command. For example, if main.py is in a folder called my_project: cd path/to/my_project

Replace path/to/my_project with the actual path to the my_project folder on your computer.

  3. Once you’re in the correct directory, run the academics_scholar_scraper package with the appropriate arguments: the required keyword, plus the optional -n/--num_papers, -o/--output, and -s/--subject flags described above.

Here’s a list of subject area codes that you can use with the --subject argument:

AGRI: Agriculture and Biological Sciences
ARTS: Arts and Humanities
BIOC: Biochemistry, Genetics and Molecular Biology
BUSI: Business, Management and Accounting
CHEM: Chemistry
COMP: Computer Science
DEC: Decision Sciences
DENT: Dentistry
EART: Earth and Planetary Sciences
ECON: Economics, Econometrics and Finance
ENGI: Engineering
ENVI: Environmental Science
HEAL: Health Professions
IMMU: Immunology and Microbiology
MATE: Materials Science
MATH: Mathematics
MED: Medicine
NEUR: Neuroscience
NURS: Nursing
PHAR: Pharmacology, Toxicology and Pharmaceutical Science
PHYS: Physics and Astronomy
PSYC: Psychology
SOCI: Social Sciences
VET: Veterinary Science and Veterinary Medicine

  4. To run the package with the appropriate arguments, use the following command structure:

academics_scholar_scraper "keyword" -n num_papers -o output_file -s subject

Replace keyword with the keyword you want to search for, num_papers with the number of papers you want to retrieve (if different from the default of 10), output_file with the name of the output CSV file you want to create (if different from the default of “papers.csv”), and subject with the subject area you want to search in (if applicable).


  5. Once you’ve entered the appropriate command, press Enter to run the script. The script will retrieve the requested number of papers for your keyword and subject area, and save the summaries to the output file in CSV format.

That’s all there is to it!
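
If you want to survey several subject areas (or several keywords) in one go, you can script the command line. Here’s a minimal sketch using Python’s subprocess module, assuming the academics_scholar_scraper command from above is on your PATH; the subject codes and file names are just examples:

import subprocess

# Run the scraper once per subject area, writing one CSV each.
# Assumes the academics_scholar_scraper command is on your PATH.
for subject in ["COMP", "PSYC", "NEUR"]:
    subprocess.run(
        ["academics_scholar_scraper", "machine learning",
         "-n", "10",
         "-o", f"ml_{subject.lower()}.csv",
         "-s", subject],
        check=True,  # stop if any run fails
    )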

Output

Here’s an example run of the package I did for a friend in psychology. Below is one row of the resulting CSV, formatted as key-value pairs for readability:

academics_scholar_scraper 'empathy' -n 10 -o test2.csv -s PSYC

{
   "Title": "Measuring individual differences in empathy: Evidence for a multidimensional approach",
   "Authors": "Davis M.",
   "Publication Name": "Journal of Personality and Social Psychology",
   "Publication Date": "1983-01-01",
   "DOI": "10.1037/0022-3514.44.1.113",
   "Summary": "This article explores the concept of empathy as a multidimensional construct and proposes a new method for measuring individual differences in empathic abilities.",
   "Hypotheses": "The author hypothesizes that empathy is a multidimensional construct and that it can be effectively measured using a multidimensional approach.",
   "Methods": "Davis developed the Interpersonal Reactivity Index (IRI), a self-report questionnaire designed to assess four dimensions of empathy (perspective-taking, empathic concern, personal distress, and fantasy), and tested its validity using various samples.",
   "Findings": "The results indicate that the IRI is a reliable and valid measure of individual differences in empathy, supporting the idea of a multidimensional approach to empathy assessment."
}
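
Since the output is a plain CSV, it’s easy to pull into pandas for sorting and filtering. A quick sketch, using the column names from the field list at the top of this post and the test2.csv file from the example run:

import pandas as pd

# Load the scraper's output and skim the key fields.
df = pd.read_csv("test2.csv")
print(df[["Title", "Publication Date", "DOI"]].head())

# Print each paper's summary under its title.
for _, row in df.iterrows():
    print(f"\n{row['Title']}\n{row['Summary']}")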

❗ Disclaimer

It is important to note that the script uses GPT-4 to generate summaries, which may not always be perfectly accurate. The generated summaries should be used as a starting point for further investigation, and you should always refer to the original articles for accurate information.
Conclusion

The ScholarScraper tool is a powerful and convenient way to search for and summarize scholarly articles. By using the Elsevier Scopus API and OpenAI’s GPT-4 model, it can help you quickly find relevant articles and get a high-level overview of their content. Give it a try and see how it can enhance your research process!

If a YouTube tutorial would be useful, let us know! Also, if there’s interest, I’ll do a follow-up blog post on how the code works, how to extend it, and any new features we at the Academic’s Field Guide to Writing Code implement along the way.


Cheers,
Nathan Laundry

✉️ Join my Email Newsletter #GuidingQuestions here