API web data capture

I’d go for scraping, as the url itself gives you more control over what you’re after. Also, you can easily get the tabular data with pandas.

For example:

import requests
import pandas as pd


headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.99 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}

url = "https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html"
html = requests.get(url).text

df = pd.read_html(html, flavor="html5lib")
df = pd.concat(df).drop([0, 1, 2], axis=1)
df.to_csv("golf.csv", index=False)

Gives you this:

enter image description here

You can then keep swapping the urls or modify the stat., y, and eon part of the URL to get different stats. For example, this is 2018 U.S. Open – https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top