How can I find all hidden tab hrefs during web scraping?

The tab content on that page is loaded through ASP.NET AJAX postbacks rather than plain links, so you can replay the POST request for each tab index and collect the document links from every response:

import requests
from bs4 import BeautifulSoup


url = 'https://www.jud11.flcourts.org/Judge-Details?judgeid=1063&sectionid=2'

# Headers that mark the request as an ASP.NET partial-rendering (AJAX) postback
headers = {'X-MicrosoftAjax': 'Delta=true',
           'X-Requested-With': 'XMLHttpRequest'}

with requests.Session() as s:

    soup = BeautifulSoup(s.get(url).content, 'html.parser')

    # Harvest every named <input> (including the hidden __VIEWSTATE and
    # __EVENTVALIDATION fields) so the server accepts the postback
    data = {i['name']: i.get('value', '') for i in soup.select('input[name]')}

    for page in range(6):
        print('Tab no.{}..'.format(page))

        # Fields that tell the server-side tab control which tab to activate
        data['ScriptManager'] = 'ScriptManager|dnn$ctr1843$View$rtSectionHearingTypes'
        data['__EVENTTARGET'] = 'dnn$ctr1843$View$rtSectionHearingTypes'
        data['__EVENTARGUMENT'] = '{"type":0,"index":"' + str(page) + '"}'
        data['dnn_ctr1843_View_rtSectionHearingTypes_ClientState'] = (
            '{"selectedIndexes":["' + str(page) + '"],"logEntries":[],"scrollState":{}}'
        )
        data['__ASYNCPOST'] = 'true'
        data['RadAJAXControlID'] = 'dnn_ctr1843_View_RadAjaxManager1'

        # Parse the partial response and pull out links to document files
        soup = BeautifulSoup(s.post(url, headers=headers, data=data).content, 'html.parser')
        for a in soup.select('a[href*="documents"]'):
            print('https://www.jud11.flcourts.org' + a['href'])

Prints:

Tab no.0..
https://www.jud11.flcourts.org/documents/judges_forms/1062458802-Ex%20Parte%20Motions%20to%20Compel%20Discovery.pdf
https://www.jud11.flcourts.org/documents/judges_forms/1062459053-JointCaseMgtReport121.pdf
Tab no.1..
Tab no.2..
Tab no.3..
Tab no.4..
https://www.jud11.flcourts.org/documents/judges_forms/1422459010-Order%20Granting%20Motion%20to%20Withdraw.docx
https://www.jud11.flcourts.org/documents/judges_forms/1422459046-ORDER%20ON%20Attorneys%20Fees.docx
Tab no.5..
https://www.jud11.flcourts.org/documents/judges_forms/1512459051-Evidence%20Procedures.docx
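The key first step above — harvesting the hidden WebForms fields before posting — works on any ASP.NET page. A minimal offline sketch, using a hypothetical HTML fragment in place of the real form:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page's <form>; on the
# actual site the __VIEWSTATE and __EVENTVALIDATION values are set by
# the server and must be echoed back for the postback to be accepted.
html = '''
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="def456" />
  <input type="text" name="search" />
</form>
'''

soup = BeautifulSoup(html, 'html.parser')

# Same pattern as the scraper above: every named input, with a missing
# value attribute defaulting to an empty string.
data = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
print(data)
```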
