You don’t really need regex for that. You might be just fine with bs4 and a CSS selector.
Try this:
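```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.altuzarra.com/en-bg/customer-service/contact").text
soup = BeautifulSoup(html, "html.parser")
# Select every <a> element whose href attribute starts with "mailto"
mailtos = soup.select('a[href^=mailto]')
# Wrap the hrefs in a set to drop duplicate links
print(list(set(m["href"] for m in mailtos)))
```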
Output:
```
['mailto:[email protected]', 'mailto:[email protected]', 'mailto:[email protected]', 'mailto:[email protected]']
```
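If you want to see what the selector is doing without hitting the live site, here is a rough offline sketch. It swaps bs4 for the standard library's `html.parser.HTMLParser`, so it runs without extra packages, and it approximates `a[href^=mailto]` by keeping `<a>` tags whose `href` starts with `mailto:`. The `MailtoCollector` class and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class MailtoCollector(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags that start
    with "mailto:", roughly what soup.select('a[href^=mailto]') returns."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; a value may be None
        href = dict(attrs).get("href") or ""
        if href.startswith("mailto:"):
            self.hrefs.append(href)


# Made-up sample markup standing in for the downloaded page
sample_html = """
<p>Contact <a href="mailto:info@example.com">us</a>
or <a href="mailto:press@example.com">press</a>.</p>
<a href="/about">About</a>
<a href="mailto:info@example.com">info again</a>
"""

collector = MailtoCollector()
collector.feed(sample_html)
# dict.fromkeys removes duplicates while keeping first-seen order
print(list(dict.fromkeys(collector.hrefs)))
```

Unlike `set`, `dict.fromkeys` keeps the addresses in the order they appear on the page, which can be handy if that order means something.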
And if you want “pure” emails, just change the last line to:
```python
print(list(set(m["href"].replace("mailto:", "") for m in mailtos)))
```
To get this:
```
['[email protected]', '[email protected]', '[email protected]', '[email protected]']
```
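The `.replace("mailto:", "")` trick is fine for plain links, but a `mailto:` href can also carry a query string (e.g. `?subject=...`), list several comma-separated recipients, or percent-encode characters. A sketch that handles those cases with the standard library's `urllib.parse` might look like this; `emails_from_mailto` and the sample hrefs are hypothetical, and in practice you would feed it `m["href"] for m in mailtos`.

```python
from urllib.parse import unquote, urlsplit


def emails_from_mailto(href):
    """Hypothetical helper: return the address(es) in a mailto: href,
    dropping any ?subject=... query. Non-mailto hrefs give []."""
    parts = urlsplit(href)
    if parts.scheme.lower() != "mailto":
        return []
    # For mailto: URLs the path holds the comma-separated recipient list;
    # unquote decodes escapes such as %40 -> @
    return [addr.strip() for addr in unquote(parts.path).split(",") if addr.strip()]


# Made-up sample hrefs
hrefs = [
    "mailto:info@example.com",
    "mailto:press@example.com?subject=Hello",
    "mailto:a@example.com,b@example.com",
    "mailto:info%40example.com",
]

emails = []
for href in hrefs:
    emails.extend(emails_from_mailto(href))
# Remove duplicates while keeping first-seen order
print(list(dict.fromkeys(emails)))
```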