If you’re looking for a workaround, I recommend making the GET request with the requests library and feeding the response text to pandas:

import pandas as pd
import requests
from io import StringIO

url = "https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv"
df = pd.read_csv(StringIO(requests.get(url).text))
df.head()
  country_code  year        spi
0          AFG  2020  42.290001
1          AFG  2019  42.340000
2          AFG  2018  40.610001
3          AFG  2017  38.939999
4          AFG  2016  39.650002

As for the “why”: read_csv internally uses urllib for standard URLs, and the server in question apparently blocks that request, possibly because it thinks you are a crawler. If I repeat the same process but set a “User-Agent” header myself, the request succeeds.

TL;DR: what pandas does (and what fails):

from urllib.request import Request, urlopen

req = Request(<URL>)
urlopen(req).read() # fails

What pandas should have done for this to work:

req = Request(<URL>)
req.add_header('User-Agent', <literally anything>)
urlopen(req).read() # succeeds
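
If you’re on pandas 1.2 or newer, you can also pass the header straight to read_csv via storage_options (for HTTP(S) URLs, these key/value pairs are forwarded to urllib.request.Request as headers), which avoids the extra requests dependency. A minimal sketch, where "Mozilla/5.0" is just an arbitrary non-default User-Agent string:

import pandas as pd

url = "https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv"
# storage_options key/value pairs are forwarded to urllib.request.Request
# as headers for HTTP(S) URLs (requires pandas >= 1.2)
df = pd.read_csv(url, storage_options={"User-Agent": "Mozilla/5.0"})
df.head()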
