A Simple Web Crawler Tutorial with Golang and colly

Published: 2025-06-30 14:42:15

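Building a simple crawler in Golang with the colly framework takes only a few steps:
1. Install colly: run go get github.com/gocolly/colly/v2.
2. Create a collector: call colly.NewCollector() and register an OnHTML callback to extract content from HTML elements, for example the page title.
3. Follow multiple links: use OnHTML to capture a tags and visit the linked pages recursively; set MaxDepth to limit the crawl depth.
4. Set request headers and delays: use OnRequest to set a browser-like User-Agent, and use Limit to control concurrency and the interval between requests, which lowers the risk of being blocked.
5. Save the data: write the results out as JSON or store them in a database for later analysis.
These steps cover the basic usage of colly and are enough to build a small crawler quickly.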

How to Implement a Simple Crawler in Golang: Hands-On with the colly Framework

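Writing a simple crawler is not hard, especially in Golang with the colly framework, which handles page fetching efficiently. Colly is one of the most popular crawling libraries for Go: it is simple to use, performs well, and suits both beginners and rapid development.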

Installing Colly

Before writing any code, install colly:

go get github.com/gocolly/colly/v2

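Once this completes, you can import colly in your project. Run the command inside a Go module (a directory that contains a go.mod file) and make sure your Go environment is set up correctly; otherwise you may run into dependency problems.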


Creating Your First Crawler: Scraping a Page Title

<p>We start with the most basic example: extracting the content of a page's <code>&lt;title&gt;</code> tag.</p>
<pre class="brush:go;toolbar:false;">package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Create a new collector.
	c := colly.NewCollector()

	// Runs for every element matching the selector on each fetched page.
	c.OnHTML("title", func(e *colly.HTMLElement) {
		fmt.Println("Page title:", e.Text)
	})

	// Start the crawl; Visit returns an error if the request fails.
	if err := c.Visit("https://example.com"); err != nil {
		fmt.Println("visit failed:", err)
	}
}</pre>
<p>This program visits <code>example.com</code> and prints the page title. The key points are:</p>
<ul>
<li>Create the collector with <code>colly.NewCollector()</code>.</li>
<li>Register a callback with <code>OnHTML</code>, passing a CSS selector that picks the elements to handle.</li>
<li>Send the request with <code>Visit</code>.</li>
</ul>
<p>You can replace <code>https://example.com</code> with any site you want to crawl.</p>
<hr>
<h3>Following Multiple Links: Iterating Over Every Hyperlink on a Page</h3>
<p>Often we want more than a single page and need to follow links onward. To do that, use <code>OnHTML</code> to capture <code>&lt;a&gt;</code> tags and visit them recursively.</p>
<pre class="brush:go;toolbar:false;">c.OnHTML("a[href]", func(e *colly.HTMLElement) {
	link := e.Attr("href")
	// Follow the link. e.Request.Visit resolves relative URLs against the
	// current page and carries the crawl depth forward, which MaxDepth
	// relies on; calling c.Visit here would restart the depth count.
	e.Request.Visit(link)
})</pre>
<p>Be aware that nothing here bounds the crawl. Colly skips URLs it has already visited by default, but the collector keeps following every new link it finds. The usual safeguard is to limit the crawl depth:</p>
<pre class="brush:go;toolbar:false;">c := colly.NewCollector(
	colly.MaxDepth(2), // crawl only two levels of pages
)</pre>
<p>This keeps the crawler away from large numbers of unrelated pages and keeps resource usage under control.</p>
<hr>
<h3>Setting Request Headers and Delays: Mimicking a Browser</h3>
<p>Some sites restrict crawlers. You can set a browser-like <code>User-Agent</code> header so that the request looks like it comes from a regular browser:</p>
<pre class="brush:go;toolbar:false;">c.OnRequest(func(r *colly.Request) {
	r.Headers.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36")
})</pre>
<p>To avoid putting pressure on the server, you can also set an interval between requests:</p>
<pre class="brush:go;toolbar:false;">// Requires importing the "time" package.
c.Limit(&colly.LimitRule{
	DomainGlob:  "*",
	Parallelism: 2,
	Delay:       1 * time.Second,
})</pre>
<p>With this rule, colly waits one second before issuing the next request to a matching domain and allows at most two requests in flight at once. <code>Parallelism</code> only takes effect when the collector runs asynchronously (<code>colly.Async(true)</code>); a default collector sends requests one at a time. Throttling like this lowers the risk of having your IP blocked.</p>
<hr>
<h3>Tip: Saving Data to a File or Database</h3>
<p>Scraped data needs to be stored somewhere. A common approach is to save it as a JSON or CSV file.</p>
<p>For example, to collect results for JSON output:</p>
<pre class="brush:go;toolbar:false;">type Result struct {
	Title string `json:"title"`
	URL   string `json:"url"`
}

var results []Result

c.OnHTML("title", func(e *colly.HTMLElement) {
	results = append(results, Result{
		Title: e.Text,
		URL:   e.Request.URL.String(),
	})
})

// After the crawl finishes, encode results with json.MarshalIndent
// and write them to a file.</pre>
<p>If you plan to do more complex analysis, consider storing the data in SQLite, MySQL, or MongoDB.</p>
<hr>
<p>That covers the basics. Colly is a capable library, and the examples above show only its most basic usage. In practice you can combine it with proxies or a distributed architecture to improve throughput. For most small-scale crawling needs, though, this is enough.</p>