If you’re looking for a workaround, I recommend making the GET request yourself with the requests library and passing the response text to pandas:

import pandas as pd
import requests
from io import StringIO

url = "https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv"
df = pd.read_csv(StringIO(requests.get(url).text))
df.head()
  country_code  year        spi
0          AFG  2020  42.290001
1          AFG  2019  42.340000
2          AFG  2018  40.610001
3          AFG  2017  38.939999
4          AFG  2016  39.650002
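Part of why this workaround succeeds is that requests, unlike urllib, attaches a default User-Agent header to every request. You can confirm that without touching the network:

```python
import requests

# requests' default headers already include a User-Agent
# of the form "python-requests/<version>", so the server
# doesn't see a bare, header-less request
ua = requests.Session().headers["User-Agent"]
print(ua)
```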

As to the “why” part of it: read_csv internally uses urllib for standard URLs, and the server in question rejects the request, presumably because the missing header makes it look like a crawler. If I repeat the same process but add a “User-Agent” header, the request succeeds.

TL;DR — what pandas does, which fails:

from urllib.request import Request, urlopen

req = Request(<URL>)
urlopen(req).read() # fails

What pandas should have done for this to work:

req = Request(<URL>)
req.add_header('User-Agent', <literally anything>)
urlopen(req).read() # succeeds
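If you’d rather not leave read_csv at all: recent pandas versions (1.2+) accept a storage_options dict, and for plain HTTP(S) URLs its key-value pairs are forwarded to urllib.request.Request as headers. A minimal sketch, reusing the same URL (the exact User-Agent value is arbitrary):

```python
import pandas as pd

url = "https://gitlab.com/stragu/DSH/-/raw/master/Python/pandas/spi.csv"

# For http(s) URLs, storage_options entries are forwarded to
# urllib.request.Request as headers, supplying the missing User-Agent
df = pd.read_csv(url, storage_options={"User-Agent": "Mozilla/5.0"})
print(df.columns.tolist())
```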
