403 Forbidden when crawling with Python

Kirito's Blog

Python

Kirito 2020. 4. 30. 12:26

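Occasionally a page cannot be crawled and returns 403 Forbidden, often because the server rejects the default User-Agent that urllib sends (Python-urllib/x.y).

To crawl a page that actually returns 403: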

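from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Send a browser-like User-Agent instead of urllib's default "Python-urllib/x.y"
headers = {'User-Agent': 'Chrome/81.0.4044.92'}
w_url = 'https://www.worldometers.info/coronavirus/'
w_req = Request(w_url, headers=headers)

# urlopen accepts a Request object, so the custom header is sent with the request
w_html = urlopen(w_req).read()
w_soup = BeautifulSoup(w_html, 'html.parser')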

Adding the header to the request, as above, is enough.
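
The same idea can be wrapped in a small helper that also reports when the server still refuses the request. This is only a sketch: `build_request`, `fetch_html`, `DEFAULT_USER_AGENT` and the 10-second timeout are names and values assumed here for illustration, not part of the original snippet.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed default for illustration; any browser-like string can be substituted.
DEFAULT_USER_AGENT = "Chrome/81.0.4044.92"


def build_request(url: str, user_agent: str = DEFAULT_USER_AGENT) -> Request:
    """Return a Request that carries a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, user_agent: str = DEFAULT_USER_AGENT,
               timeout: float = 10.0) -> Optional[bytes]:
    """Return the response body, or None if the server still answers 403."""
    try:
        with urlopen(build_request(url, user_agent), timeout=timeout) as response:
            return response.read()
    except HTTPError as error:
        if error.code == 403:
            # The header alone was not enough for this site.
            return None
        raise  # Other HTTP errors are left for the caller to handle.
```

Calling `fetch_html('https://www.worldometers.info/coronavirus/')` would return the raw bytes to pass to BeautifulSoup, or `None` if the site keeps answering 403 even with the header set.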

With Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
# Optional flags, left commented out:
# options.add_argument('--headless')
# options.add_argument('--disable-extensions')
# options.add_argument('--disable-infobars')
# options.add_argument('--window-size=1920,1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument('--lang=ko_KR')
# Override the User-Agent string that Chrome reports
options.add_argument(
    '--user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')

# Selenium 4 style: the driver path goes through Service, and options= replaces chrome_options=
driver = webdriver.Chrome(service=Service('<chromedriver location>/chromedriver.exe'), options=options)
driver.get('http://<site address>')

Add the user agent with add_argument, as above.

The commented-out headless and disable-* options are used to improve speed.
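
To keep the flags in one place, the argument list can be built separately from the driver. The sketch below assumes Selenium is installed for the second function only; `build_chrome_arguments` and `make_chrome_options` are hypothetical helper names, and the default `ko_KR` language simply mirrors the snippet above.

```python
from typing import List


def build_chrome_arguments(user_agent: str, headless: bool = False,
                           lang: str = "ko_KR") -> List[str]:
    """Return the Chrome command-line flags used for crawling."""
    arguments = [
        "--no-sandbox",
        "--disable-gpu",
        f"--lang={lang}",
        f"--user-agent={user_agent}",
    ]
    if headless:
        # Headless mode skips drawing a browser window.
        arguments.append("--headless")
    return arguments


def make_chrome_options(arguments: List[str]):
    """Apply each flag to a ChromeOptions object (requires Selenium)."""
    # Imported lazily so build_chrome_arguments works without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in arguments:
        options.add_argument(argument)
    return options
```

Once a driver has been created with these options, `driver.execute_script("return navigator.userAgent")` is one way to check that the overridden string is the one the page sees.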

License: Attribution, ShareAlike
