How to replace unwanted text with BeautifulSoup

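1.) The text is not replaced because it contains the non-breaking space character \xa0 where your search strings have a regular space, so str.replace() never finds an exact match. Replace \xa0 with ' ' first.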

2.) To skip the content of <p> tags that are nested inside <script> tags, check each tag with the .find_parent() method and skip it when a <script> ancestor is found.
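Both points can be checked offline on small hypothetical strings. In this sketch the scraped phrase and the HTML snippet are made up for illustration, and a `<footer>` wrapper stands in for the ancestor you want to exclude; the `find_parent()` check is the same whatever tag name you pass.

```python
import unicodedata

from bs4 import BeautifulSoup

# Point 1: a phrase as get_text() might return it (hypothetical), with a
# non-breaking space (U+00A0) between the first two words.
scraped = 'Type\xa0something to search'
phrase = 'Type something to search'

# str.replace() needs an exact match, so nothing is removed here.
print(scraped.replace(phrase, '') == scraped)  # True

# Swapping \xa0 for a regular space first makes the phrase match.
print(scraped.replace('\xa0', ' ') == phrase)  # True

# NFKC normalization also maps U+00A0 to a regular space.
print(unicodedata.normalize('NFKC', scraped) == phrase)  # True

# Point 2: find_parent() returns the nearest enclosing tag with the given
# name, or None if there is no such ancestor. Hypothetical markup in which
# a <footer> plays the role of the ancestor to exclude.
html = (
    '<div><p>Product description</p>'
    '<footer><p>Your shopping cart is empty.</p></footer></div>'
)
soup = BeautifulSoup(html, 'html.parser')

# Keep only paragraphs that have no <footer> ancestor.
kept = [p.get_text() for p in soup.find_all('p') if p.find_parent('footer') is None]
print(kept)  # ['Product description']
```

The NFKC route is handy when a page mixes several kinds of compatibility spaces, while the plain `.replace('\xa0', ' ')` used in the answer is enough when \xa0 is the only culprit.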

Example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.cgtrader.com/3d-models'

response = requests.get(url)

soup = BeautifulSoup(response.content, 'html.parser')
paragraphs = soup.find_all('p')

# Phrases to blank out; they contain regular spaces, so \xa0 must be normalized first
unwanted = [
    'Type something to search',
    'Your shopping cart is empty.',
    'By subscribing you confirm that you have read and accept our Terms of Use',
]

for p in paragraphs:
    # skip <p> tags nested inside a <script> tag
    if p.find_parent('script'):
        continue
    # turn non-breaking spaces (\xa0) into regular spaces so the phrases can match
    text = p.get_text().replace('\xa0', ' ')
    for phrase in unwanted:
        text = text.replace(phrase, ' ')
    print(text)

Prints:

Find the exact right 3D content for your needs, including AR/VR, gaming, advertising, entertainment and others
Buy or free-download professional 3D models ready to be used in CG projects, film and video production, animation, visualizations, games, VR/AR, and others. Assets are available for download in many industry-accepted formats including MAX, OBJ, FBX, 3DS, STL, C4D, BLEND, MA, MB and other. If you are searching for high poly or real-time 3D assets, we have a leading digital art library for all your needs.
This category covers 3D aircraft. CG airplanes will fit into simulations, visualizations, advertisements and videos. Drone bodies and parts will delight fans of tiny flying vehicles. And the rigged models are ready to be imported into game engines.

...and so on.
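Because the live page can change (or block plain requests), here is a self-contained sketch of the same steps wrapped in a reusable function and run on a hard-coded snippet. The `HTML` string and the name `clean_paragraphs` are hypothetical, introduced only for illustration. One deliberate difference from the answer's loop: paragraphs that end up empty after the phrases are removed are dropped instead of printed as a blank space.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.content; the real page differs.
HTML = (
    '<p>Type\xa0something to search</p>'
    '<p>Buy or free-download professional 3D models.</p>'
    '<p>Your\xa0shopping cart is empty.</p>'
)

# Phrases written with ordinary spaces, as in the answer's code.
UNWANTED = [
    'Type something to search',
    'Your shopping cart is empty.',
    'By subscribing you confirm that you have read and accept our Terms of Use',
]


def clean_paragraphs(html, unwanted):
    """Return cleaned <p> texts, skipping <p> tags inside <script> and dropping blanks."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for p in soup.find_all('p'):
        # Skip paragraphs that have a <script> ancestor.
        if p.find_parent('script'):
            continue
        # Normalize non-breaking spaces first so the phrases can match.
        text = p.get_text().replace('\xa0', ' ')
        for phrase in unwanted:
            text = text.replace(phrase, ' ')
        text = text.strip()
        if text:  # drop paragraphs that consisted only of unwanted phrases
            results.append(text)
    return results


for line in clean_paragraphs(HTML, UNWANTED):
    print(line)
```

With the live page, you would pass `response.content` instead of `HTML`.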
