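Web Scraping
The web scraping tool fetches and parses the content of a web page from its URL.
1) You can use the jsoup library to fetch and parse web page content. First, add the dependency to the project: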
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.19.1</version>
</dependency>
2) Write the web scraping tool class; it takes only a few lines of code:
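import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;

public class WebScrapingTool {

    @Tool(description = "Scrape the content of a web page")
    public String scrapeWebPage(@ToolParam(description = "URL of the web page to scrape") String url) {
        try {
            // Fetch the page and parse it into a jsoup Document
            Document doc = Jsoup.connect(url).get();
            // Return the full HTML of the parsed page
            return doc.html();
        } catch (IOException e) {
            // Report the failure as text so the caller gets a readable message
            return "Error scraping web page: " + e.getMessage();
        }
    }
}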
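Because the URL passed to scrapeWebPage may come from a model rather than a person, it can be worth checking it before connecting. The following is a minimal sketch using only the JDK; the class name UrlGuard and the method isScrapableUrl are hypothetical and not part of the original tool, and the rule applied (http or https scheme plus a host) is an assumption about what should be allowed.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the original tool class.
public class UrlGuard {

    // Returns true only for syntactically valid http/https URLs that have a host.
    public static boolean isScrapableUrl(String url) {
        if (url == null || url.isBlank()) {
            return false;
        }
        try {
            URI uri = new URI(url.trim());
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false; // relative reference such as "juejin.cn"
            }
            boolean isHttp = scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https");
            // Assumption: only web URLs with a host are acceptable (rejects file:, ftp:, etc.)
            return isHttp && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // not parseable as a URI at all
        }
    }
}
```

One possible use is to call UrlGuard.isScrapableUrl(url) at the top of scrapeWebPage and return an error message early when it is false.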
3) Write the unit test:
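import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class WebScrapingToolTest {

    @Test
    public void testScrapeWebPage() {
        WebScrapingTool tool = new WebScrapingTool();
        String url = "https://www.juejin.cn";
        // Requires network access; a failed request still returns a non-null error string
        String result = tool.scrapeWebPage(url);
        assertNotNull(result);
    }
}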
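Since scrapeWebPage returns an error string instead of throwing, assertNotNull passes even when the request fails. A stricter check could look at the error prefix that the tool itself uses. This is only a sketch: ScrapeResultCheck and isSuccessful are hypothetical names, and it assumes the prefix "Error scraping web page: " stays exactly as written in the tool class.

```java
// Hypothetical helper for tests; not part of the original code.
public class ScrapeResultCheck {

    // Must match the prefix used in WebScrapingTool's catch block.
    static final String ERROR_PREFIX = "Error scraping web page: ";

    // True when the result is non-null and does not start with the error prefix.
    public static boolean isSuccessful(String result) {
        return result != null && !result.startsWith(ERROR_PREFIX);
    }
}
```

In the unit test, assertTrue(ScrapeResultCheck.isSuccessful(result)) would then fail when the page could not be fetched, rather than passing silently.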