Web Scraping on a Schedule
Spring Boot application for web scraping with JSoup
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@SpringBootApplication
@EnableScheduling
public class WebScrapingApplication {

    public static void main(String[] args) {
        SpringApplication.run(WebScrapingApplication.class, args);
    }

    @Service
    public static class WebScrapingService {

        @Scheduled(fixedRate = 10000) // Runs every 10 seconds
        public void scrapeWebsite() {
            try {
                Document doc = Jsoup.connect("https://example.com").get();
                String title = doc.title();
                System.out.println("Website Title: " + title);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
TL;DR:
Setup: This Spring Boot application is configured to perform web scraping tasks using JSoup. It includes the @EnableScheduling annotation to enable scheduled tasks.
Scheduled Task: The WebScrapingService class contains a method, scrapeWebsite, annotated with @Scheduled and set to execute every 10 seconds. This method uses JSoup to connect to a specified URL, retrieves the document, and prints the website's title.
Running: Upon running the application, the scheduled task will automatically scrape the website at the defined interval, demonstrating a basic use case of web scraping in a Spring Boot application.
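To make the idea of a fixed-rate schedule concrete, here is a minimal sketch that uses only the JDK's ScheduledExecutorService instead of Spring. It is an illustration of the concept, not the configuration used in the application above. The class name FixedRateDemo, the helper runAtFixedRate, and the 200 ms period are all hypothetical choices made for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FixedRateDemo {

    // Hypothetical helper: runs `task` at a fixed rate until it has executed
    // `runs` times, then shuts the scheduler down. Returns true if all runs
    // finished before the timeout, false otherwise.
    public static boolean runAtFixedRate(Runnable task, int runs, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch remaining = new CountDownLatch(runs);
        try {
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // With the plain JDK executor, an uncaught exception cancels
                    // all further runs, so report it and keep the schedule alive.
                    System.err.println("Task failed: " + e.getMessage());
                } finally {
                    remaining.countDown();
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            // Wait for the expected number of runs, with generous slack.
            return remaining.await(runs * periodMillis + 5_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for the scraping call; prints a timestamp on each tick.
        boolean done = runAtFixedRate(
                () -> System.out.println("scrape tick at " + System.currentTimeMillis()),
                3, 200);
        System.out.println("completed: " + done);
    }
}
```

With a fixed rate, each run is scheduled relative to the start of the previous one rather than its end, so a slow scrape eats into the gap before the next run. If a pause between the end of one run and the start of the next is preferred, the JDK offers scheduleWithFixedDelay and Spring's @Scheduled offers a fixedDelay attribute.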
This example shows how little code is needed to add web scraping to a Spring Boot application: Spring's scheduling support triggers the task at a fixed interval, and JSoup fetches and parses the HTML.