I thought it might be because the site requires a login, so I used a cURL converter to get the cookies and header information. I also thought it could be an outdated version of Python, so I installed the latest one and started a new project. I ran pip install for lxml, bs4, and requests to be sure.

Here's the code:

from bs4 import BeautifulSoup
import requests

# cookies and headers are the dicts produced by the cURL converter (omitted here)
html_text = requests.get('https://newsfilter.io/latest/news', cookies=cookies, headers=headers).text
soup = BeautifulSoup(html_text, 'lxml')
jobs = soup.find_all('div', class_='sc-dnqmqq bxsfdc')

print(jobs)
  • Most likely the web site uses JavaScript to populate the data dynamically. requests.get() just returns the source code.
    – Barmar
    Commented Oct 20, 2022 at 16:33

1 Answer

Using your browser's dev tools, look at the requests made to the server.

Look at the HTML and check that it is mostly empty, then notice that the page makes a JSON request to https://static.newsfilter.io/landing-page/articles-latest-news.json

It is then just a matter of fetching that JSON and parsing it.

import requests

resp = requests.get('https://static.newsfilter.io/landing-page/articles-latest-news.json')
if resp.status_code == 200:
    # The response body is a JSON array with one object per article
    for headline in resp.json():
        print(f'{headline["publishedAt"]}: {headline["title"]}')
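If you want something a bit more defensive, here is a minimal sketch of the same idea with the fetching and the formatting split into two functions. The names `format_headlines` and `fetch_articles`, the 10 second timeout, and the choice to skip entries that lack a key are my own assumptions, not anything the site documents. The only fields relied on are `publishedAt` and `title`, as in the snippet above.

```python
ARTICLES_URL = "https://static.newsfilter.io/landing-page/articles-latest-news.json"


def format_headlines(articles):
    """Return 'publishedAt: title' strings for a list of article dicts."""
    lines = []
    for article in articles:
        published = article.get("publishedAt")
        title = article.get("title")
        if published is None or title is None:
            # Assumption: an entry missing either field is simply skipped.
            continue
        lines.append(f"{published}: {title}")
    return lines


def fetch_articles(url=ARTICLES_URL, timeout=10):
    """Download the JSON feed and return the decoded list of articles."""
    # Imported here so format_headlines can be used without requests installed.
    import requests

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of silently doing nothing
    return resp.json()


# Usage (performs a network request):
#     for line in format_headlines(fetch_articles()):
#         print(line)
```

Keeping the formatting separate means you can try it on a hand-written list of dicts without hitting the site at all.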
  • Thanks so much bro, if you don't mind me asking, how did you get that static link? I've been looking all over for it in the Network tab
    – noah g
    Commented Oct 21, 2022 at 14:26
