By latest I assume you want the link with the most recent date. As such you need to capture both the URL and the date given for each link. This can then be converted into a datetime object and added to a list.
After all URLs are found, the list can be easily sorted into date order, with the newest first. The latest URL can then be used to download the img file.
For example:
from bs4 import BeautifulSoup
import requests
from datetime import datetime

base_url = "https://dl.twrp.me"
req = requests.get(f"{base_url}/gauguin")
soup = BeautifulSoup(req.content, "html.parser")
urls = []

for a in soup.find_all('a', href=True):
    link = a['href']

    if link.endswith('.img.html'):
        date_text = a.find_next('em').get_text(strip=True)
        date_dt = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S %Z")
        urls.append([date_dt, link])

latest = sorted(urls, reverse=True)[0][1]   # choose the latest url

# Download the latest img file
url_img = base_url + latest.split('.html')[0]
filename = url_img.split('/')[-1]

with requests.get(url_img, stream=True, headers={'Referer': base_url + latest}) as req_img:
    with open(filename, 'wb') as f_img:
        for chunk in req_img.iter_content(chunk_size=2**15):
            f_img.write(chunk)
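The date lookup assumes that each .img.html link is followed by an <em> element whose text matches the "%Y-%m-%d %H:%M:%S %Z" format, e.g. "2021-10-04 09:12:25 UTC". A self-contained illustration of that assumption, using a made-up HTML fragment rather than the live page:

from bs4 import BeautifulSoup
from datetime import datetime

# Made-up fragment mimicking the assumed page structure
html = ('<a href="/gauguin/twrp-3.5.2_9-0-gauguin.img.html">twrp-3.5.2_9-0-gauguin.img</a>'
        '<em>2021-10-04 09:12:25 UTC</em>')
soup = BeautifulSoup(html, "html.parser")

a = soup.find('a', href=True)
date_text = a.find_next('em').get_text(strip=True)   # "2021-10-04 09:12:25 UTC"
parsed = datetime.strptime(date_text, "%Y-%m-%d %H:%M:%S %Z")
print(parsed)

If the site ever changes that layout, the find_next('em') lookup is the line that would need adjusting.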
This approach, based on the displayed date rather than the file name, should keep working even if the naming or numbering scheme changes. A Referer header is added to stop the website returning the HTML for the download page instead of the image. This results in a download of 131,072 KB.
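If the Referer check were ever insufficient and the server responded with the download page again, the loop above would silently save HTML under an .img file name. As a defensive variant, the response can be inspected before writing; this sketch assumes the server sets a Content-Type header, and download_image is just an illustrative helper:

import requests

def download_image(url_img, referer, filename, chunk_size=2**15):
    # Stream the file, but refuse to save it if the server sent HTML instead of binary data
    with requests.get(url_img, stream=True, headers={'Referer': referer}) as resp:
        resp.raise_for_status()                              # stop on HTTP errors
        content_type = resp.headers.get('Content-Type', '')  # header may be missing
        if 'text/html' in content_type:
            raise RuntimeError(f"Got an HTML page instead of an image: {content_type}")
        with open(filename, 'wb') as f_img:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f_img.write(chunk)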
Note, if you prefer to ignore the date and just sort on the version number, use the following approach:
from bs4 import BeautifulSoup
import requests
import re

base_url = "https://dl.twrp.me"
req = requests.get(f"{base_url}/gauguin")
soup = BeautifulSoup(req.content, "html.parser")
urls = []

for a in soup.find_all('a', href=True):
    link = a['href']

    if link.endswith('.img.html'):
        # Convert the digit groups to integers so the sort is numeric
        # (as strings, '10' would sort before '9')
        version = [int(n) for n in re.findall(r'(\d+)', link)]
        urls.append([version, link])

latest = sorted(urls, reverse=True)[0][1]   # choose the latest url
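To see what the version sort is comparing, re.findall extracts the digit groups from the file name, and converting them to integers keeps the comparison numeric rather than lexicographic (the file name below is only an example):

import re

link = "twrp-3.5.2_9-0-gauguin.img.html"                 # illustrative file name
version = [int(n) for n in re.findall(r'(\d+)', link)]
print(version)                                           # [3, 5, 2, 9, 0]

Once latest has been chosen this way, the download steps from the first example can be reused unchanged.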