I’d go for scraping, as the URL itself gives you more control over what you’re after. Also, you can easily get the tabular data with pandas.
For example:
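import io

import pandas as pd
import requests

# Browser-like headers; some sites refuse requests that lack them.
headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.99 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}
url = "https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html"

# Send the headers with the request and fail early on an HTTP error.
response = requests.get(url, headers=headers)
response.raise_for_status()

# read_html returns a list of DataFrames, one per <table> it finds.
# Newer pandas versions expect a file-like object rather than a raw HTML string.
tables = pd.read_html(io.StringIO(response.text), flavor="html5lib")

# Stack the tables, drop the columns labelled 0, 1 and 2, and save the rest.
df = pd.concat(tables).drop([0, 1, 2], axis=1)
df.to_csv("golf.csv", index=False)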
This writes the combined table to golf.csv.
You can then keep swapping the URLs, or modify the "stat.", "y", and "eon" parts of the URL to get different stats. For example, this is 2018 U.S. Open: https://www.pgatour.com/content/pgatour/stats/stat.02674.y2017.eon.t030.html
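If you want to generate those URLs rather than edit them by hand, something like the sketch below might do. The template and the meaning of each segment (stat ID, year, tournament ID) are assumptions inferred from the one example URL above, and build_stat_url is a made-up helper name, not anything the site documents. There is no guarantee that every combination points at a real page.

```python
from itertools import product

# Assumed template, inferred from the single example URL in this answer.
BASE = (
    "https://www.pgatour.com/content/pgatour/stats/"
    "stat.{stat}.y{year}.eon.t{tournament}.html"
)


def build_stat_url(stat, year, tournament):
    """Hypothetical helper: fill in the three variable URL segments."""
    return BASE.format(stat=stat, year=year, tournament=tournament)


# Only stat "02674", year 2017 and tournament "030" come from the answer;
# 2018 is added purely to show how the list grows.
stats = ["02674"]
years = [2017, 2018]
tournaments = ["030"]

urls = [build_stat_url(s, y, t) for s, y, t in product(stats, years, tournaments)]
for u in urls:
    print(u)
```

Each URL in the list can then go through the same requests and pandas steps as the single URL in the example.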