
Connecting to MySQL from a Python crawler

Source: SegmentFault

Published: 2023-01-14 15:53:50

This article pulls together experience from past interviews and problems I have solved in real-world development. I hope "Connecting to MySQL from a Python crawler" is useful to you; feel free to bookmark it and share it with anyone who needs it.

Preparation

  • Make sure the local MySQL server is running, then log in with the client:

    mysql -u root -p
  • Install pymysql:

    pip install pymysql

Create the table

CREATE DATABASE crawls;
-- SHOW DATABASES;
USE crawls;

CREATE TABLE IF NOT EXISTS baiduNews(
       id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
       ranking VARCHAR(30),
       title VARCHAR(60),
       datetime TIMESTAMP,
       hot VARCHAR(30));
-- SHOW TABLES;

Connecting to the database with pymysql

import pymysql

# Connect to the local MySQL server (adjust user/passwd to your own setup)
db = pymysql.connect(host='localhost', port=3306, user='root', passwd='123456',
                     db='crawls', charset='utf8')
cursor = db.cursor()
cursor.execute(sql_query)  # sql_query is any SQL statement string
db.commit()                # persist the changes

Driving MySQL from Python is fairly straightforward; with a little database background you can pick it up right away. Just don't forget the final commit() -- without it, your writes only sit in the open transaction and never reach the database.
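The commit point can be demonstrated without a running MySQL server: transaction semantics are the same across DB-API drivers, so this sketch uses the stdlib sqlite3 module as a stand-in. The file path and table are made up for the demo.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.db')

# 1) Insert WITHOUT commit: the open transaction is rolled back on close.
db = sqlite3.connect(path)
db.execute('CREATE TABLE news (title TEXT)')     # DDL runs in autocommit mode
db.execute("INSERT INTO news VALUES ('hello')")  # opens an implicit transaction
db.close()                                       # no commit() -> the INSERT is lost

db = sqlite3.connect(path)
count_without_commit = db.execute('SELECT COUNT(*) FROM news').fetchone()[0]
db.close()

# 2) The same insert WITH commit: the row survives.
db = sqlite3.connect(path)
db.execute("INSERT INTO news VALUES ('hello')")
db.commit()
db.close()

db = sqlite3.connect(path)
count_with_commit = db.execute('SELECT COUNT(*) FROM news').fetchone()[0]
db.close()

print(count_without_commit, count_with_commit)  # 0 1
```

The same pattern applies to pymysql: anything executed between connect() and commit() is invisible to other connections and discarded if the connection closes first.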

Complete example

Crawl the hottest news headlines on Baidu and store them in the database. I was too lazy to write comments -_- (make sure the local MySQL server is running first).

'''
Get the hottest news title on baidu page,
then save these data into mysql
'''
import datetime

import pymysql
from pyquery import PyQuery as pq
import requests
from requests.exceptions import ConnectionError

URL = 'https://wappass.baidu.com/static/captcha/tuxing.html?&logid=11151228204422475442&ak=c27bbc89afca0463650ac9bde68ebe06&backurl=https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%25E7%2583%25AD%25E7%2582%25B9&ext=x9G9QDmMXq%2FNo87gjGO0P4duDYWmTLah%2FsWlJ%2B%2Fs0zRWkhrGqVqihBVl6ZY8QtPHeUkK%2FLSi82sM2wFm%2BXofRA8QipFbArBY11xRs2OUQOCyuRtUIETqejFhi48WwtWcZaw2FQi2OfC72W%2FW5HwRPw%3D%3D&signature=86adae7de4d91d6adc7c4689b7348af3&timestamp=1673119049'
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
    'Upgrade-Insecure-Requests': '1'
}

def get_html(url):
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except ConnectionError as e:
        print(e.args)
        return None

def parse_html(html):
    doc = pq(html)
    trs = doc('.FYB_RD table.c-table tr').items()
    for tr in trs:
        index = tr('td:nth-child(1) span.c-index').text()
        title = tr('td:nth-child(1) span a').text()
        hot = tr('td:nth-child(2)').text().strip('"')
        yield {
            'index':index,
            'title':title,
            'hot':hot
        }

def save_to_mysql(items):
    try:
        db = pymysql.connect(host='localhost', port=3306, user='root', passwd='123456',
                             db='crawls', charset='utf8')
        cursor = db.cursor()
        cursor.execute('use crawls;')
        cursor.execute('CREATE TABLE IF NOT EXISTS baiduNews('
                       'id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,'
                       'ranking VARCHAR(30),'
                       'title VARCHAR(60),'
                       'datetime TIMESTAMP,'
                       'hot VARCHAR(30));')
        try:
            for item in items:
                print(item)
                now = datetime.datetime.now()
                now = now.strftime('%Y-%m-%d %H:%M:%S')
                sql_query = ('INSERT INTO baiduNews(ranking, title, datetime, hot) '
                             'VALUES (%s, %s, %s, %s)')
                cursor.execute(sql_query, (item['index'], item['title'], now, item['hot']))
                print('Save into mysql')
            db.commit()
        except pymysql.MySQLError as e:
            db.rollback()
            print(e.args)
            return
    except pymysql.MySQLError as e:
        print(e.args)
        return

def check_mysql():
    try:
        db = pymysql.connect(host='localhost', port=3306, user='root', passwd='123456',
                             db='crawls', charset='utf8')
        cursor = db.cursor()
        cursor.execute('use crawls;')
        sql_query = 'SELECT * FROM baiduNews'
        cursor.execute(sql_query)
        for row in cursor.fetchall():
            print(row)
    except pymysql.MySQLError as e:
        print(e.args)

def main():
    html = get_html(URL)
    items = parse_html(html)
    save_to_mysql(items)
    #check_mysql()

if __name__ == '__main__':
    main()
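A note on the INSERT in save_to_mysql: scraped text is the classic case for placeholder binding rather than %-formatting the values into the SQL string -- a headline containing quotes passes through untouched and SQL injection is ruled out. A server-free sketch with the stdlib sqlite3 driver (the item dict is made up; pymysql takes the same parameter tuple, with %s placeholders instead of ?):

```python
import sqlite3

# In-memory stand-in for the MySQL table (sqlite3 ships with Python).
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE baiduNews (ranking TEXT, title TEXT, hot TEXT)')

# A title like this would break naive %-formatting into the SQL string.
item = {'index': '1', 'title': '"Quoted" headline, with commas', 'hot': '4096万'}

# Placeholder binding: the driver escapes the values, not you.
db.execute('INSERT INTO baiduNews (ranking, title, hot) VALUES (?, ?, ?)',
           (item['index'], item['title'], item['hot']))
db.commit()

stored_title = db.execute('SELECT title FROM baiduNews').fetchone()[0]
print(stored_title)
db.close()
```

The round-tripped title comes back byte-for-byte identical, quotes and all, which the string-formatted version cannot guarantee.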

That concludes "Connecting to MySQL from a Python crawler". For more material on MySQL, follow the golang学习网 public account!
