Web Scraping on a Schedule

Spring Boot application for web scraping with JSoup

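import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@SpringBootApplication
@EnableScheduling // Turns on detection of @Scheduled methods
public class WebScrapingApplication {

    public static void main(String[] args) {
        SpringApplication.run(WebScrapingApplication.class, args);
    }

    // A static nested class is still picked up by component scanning and registered as a bean
    @Service
    public static class WebScrapingService {

        private static final Logger log = LoggerFactory.getLogger(WebScrapingService.class);

        @Scheduled(fixedRate = 10000) // Runs every 10 seconds (the value is in milliseconds)
        public void scrapeWebsite() {
            try {
                // Send an HTTP GET and parse the response body into a Document
                Document doc = Jsoup.connect("https://example.com").get();
                log.info("Website Title: {}", doc.title());
            } catch (IOException e) {
                // Network or HTTP error: log it and wait for the next scheduled run
                log.error("Failed to scrape website", e);
            }
        }
    }
}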
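To see roughly what @Scheduled(fixedRate = ...) does without Spring on the classpath, the plain-JDK sketch below schedules a task with ScheduledExecutorService.scheduleAtFixedRate on a single-threaded scheduler, which is approximately the mechanism Spring's default scheduler builds on. The class and method names (FixedRateSketch, runAtFixedRate) and the simulated failure are hypothetical and made up for illustration; no network call is made. It also illustrates why the task body catches its own exceptions: with a raw ScheduledExecutorService, an exception that escapes a periodic task suppresses all of its later executions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedRateSketch {

    // Hypothetical helper: schedules a simulated scrape every periodMillis ms with
    // scheduleAtFixedRate, lets `runs` executions happen, then stops the scheduler.
    // Returns the run numbers that completed without an exception.
    static List<Integer> runAtFixedRate(int runs, long periodMillis, int failingRun) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<Integer> succeeded = new ArrayList<>(); // written only by the scheduler thread
        AtomicInteger counter = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(runs);

        Runnable scrapeTask = () -> {
            int run = counter.incrementAndGet();
            if (run > runs) {
                return; // an extra run that started before shutdown; ignore it
            }
            try {
                if (run == failingRun) {
                    // Hypothetical stand-in for a network error during a scrape
                    throw new IllegalStateException("simulated fetch failure");
                }
                succeeded.add(run);
            } catch (RuntimeException e) {
                // If this exception escaped, scheduleAtFixedRate would suppress all later runs
                System.err.println("Run " + run + " failed: " + e.getMessage());
            } finally {
                finished.countDown();
            }
        };

        scheduler.scheduleAtFixedRate(scrapeTask, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            // Each add() happens-before its countDown(), so the list is safe to read here
            finished.await();
            return new ArrayList<>(succeeded);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for runs", e);
        } finally {
            scheduler.shutdownNow(); // cancel the periodic task and release the thread
        }
    }

    public static void main(String[] args) {
        System.out.println(runAtFixedRate(4, 50, 2)); // prints [1, 3, 4]
    }
}
```

Running main prints [1, 3, 4]: run 2 fails, is reported on standard error, and the schedule carries on with runs 3 and 4.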

TL;DR:

  • Setup: This Spring Boot application uses JSoup for web scraping. The @EnableScheduling annotation on the main class turns on Spring's processing of @Scheduled methods.

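  • Scheduled Task: The nested WebScrapingService class contains a scrapeWebsite method annotated with @Scheduled(fixedRate = 10000), so Spring invokes it every 10 seconds (the value is in milliseconds, measured between the start times of successive runs). The method uses JSoup to fetch https://example.com, parses the response into a Document, and writes the page's title to the console.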

  • Running: Once the application starts, the scheduled task scrapes the page and repeats at the configured interval until the application shuts down, a basic example of scheduled web scraping in Spring Boot.
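"Every 10 seconds" is a target rather than a guarantee. With fixedRate, each run is due a fixed period after the previous run was due, but on a single-threaded scheduler (the default) a run cannot begin until the previous one has finished. The deterministic sketch below models that rule alongside fixedDelay, which waits a fixed period after each run ends. It is a simplified model of the scheduling behavior, not Spring's code; the class name ScheduleTimeline and the run durations are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ScheduleTimeline {

    // Model of fixedRate on one scheduler thread: run n is due at n * period (ms),
    // but it cannot start before run n - 1 has finished.
    static List<Long> fixedRateStarts(long period, long[] durations) {
        List<Long> starts = new ArrayList<>();
        long previousEnd = 0;
        for (int n = 0; n < durations.length; n++) {
            long start = Math.max(n * period, previousEnd);
            starts.add(start);
            previousEnd = start + durations[n];
        }
        return starts;
    }

    // Model of fixedDelay: each run starts `delay` ms after the previous run ends.
    static List<Long> fixedDelayStarts(long delay, long[] durations) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        for (long duration : durations) {
            starts.add(start);
            start = start + duration + delay;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical run durations (ms): the second scrape hits a slow server
        long[] durations = {1_000, 25_000, 1_000, 1_000};
        System.out.println(fixedRateStarts(10_000, durations));  // prints [0, 10000, 35000, 36000]
        System.out.println(fixedDelayStarts(10_000, durations)); // prints [0, 11000, 46000, 57000]
    }
}
```

With a 25-second scrape in the second slot, fixedRate starts the next two runs late and back to back (at 35 s and 36 s) as it catches up, while fixedDelay pushes every later run back instead. If bunched requests to the target site are a concern, fixedDelay may be the gentler choice.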

This example shows a minimal way to add web scraping to a Spring Boot application: Spring's scheduling support triggers the task at a fixed interval, and JSoup handles fetching the page and parsing its HTML.


Need US-based Java developers?

Visit Katyella

Katyella LLC http://katyella.com