Spring Book

Katyella LLC http://katyella.com

Web Scraping on a Schedule

A Spring Boot application that uses JSoup to scrape a web page on a fixed schedule

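import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;

@SpringBootApplication
@EnableScheduling // Tells Spring to detect and run @Scheduled methods on beans
public class WebScrapingApplication {

    public static void main(String[] args) {
        SpringApplication.run(WebScrapingApplication.class, args);
    }

    // A static nested class is still picked up by component scanning
    @Service
    public static class WebScrapingService {

        private static final Logger log = LoggerFactory.getLogger(WebScrapingService.class);

        @Scheduled(fixedRate = 10000) // Starts a run every 10 seconds (10,000 ms)
        public void scrapeWebsite() {
            try {
                // Fetch the page over HTTP and parse it into a DOM Document
                Document doc = Jsoup.connect("https://example.com").get();
                log.info("Website Title: {}", doc.title());
            } catch (IOException e) {
                // Network and HTTP errors: log them instead of printing to stderr
                log.error("Failed to scrape https://example.com", e);
            }
        }
    }
}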
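To make the scheduling half of the recipe less magical, here is a rough plain-Java analogy of what @EnableScheduling plus @Scheduled(fixedRate = ...) sets up: a single-threaded ScheduledExecutorService running a task with scheduleAtFixedRate. This is a sketch under stated assumptions, not Spring's actual code. The class PlainSchedulerSketch and the method runAtFixedRate are hypothetical names, and a counter stands in for the JSoup call so the example runs offline.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PlainSchedulerSketch {

    // Hypothetical helper: runs `task` at a fixed rate until it has executed `times` times,
    // then shuts the scheduler down. Returns true if all runs finished within 5 seconds.
    static boolean runAtFixedRate(Runnable task, long periodMillis, int times) throws InterruptedException {
        // One worker thread, like Spring Boot's default task scheduler (pool size 1)
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // A plain executor stops repeating a task after an uncaught exception,
                // so the failure is reported here and the schedule keeps going
                System.err.println("Run failed: " + e);
            }
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS); // initial delay 0: first run starts immediately
        boolean finished = done.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for scrapeWebsite(); a real task would call Jsoup.connect(...).get()
        AtomicInteger runs = new AtomicInteger();
        boolean ok = runAtFixedRate(runs::incrementAndGet, 50, 3);
        System.out.println("finished=" + ok + ", runs=" + runs.get());
    }
}
```

One difference worth knowing: a bare ScheduledExecutorService silently suppresses all later runs once a task throws, which is why the sketch catches RuntimeException. Spring's scheduler logs errors from repeating @Scheduled methods and keeps the schedule going, so the try/catch in scrapeWebsite mainly controls how failures are reported.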

TL;DR:

  • Setup: This Spring Boot application uses JSoup for web scraping. The @EnableScheduling annotation tells Spring to find methods annotated with @Scheduled and run them on a task scheduler.

  • Scheduled Task: The WebScrapingService class has a scrapeWebsite method annotated with @Scheduled(fixedRate = 10000), so a new run is due every 10 seconds (10,000 ms), measured from the start of the previous run. Each run uses JSoup to fetch https://example.com, parse it into a Document, and output the page's title.

  • Running: Once the application starts, the task begins scraping shortly after startup and repeats at the configured interval until the application shuts down. No controller or manual trigger is needed.
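"Every 10 seconds" hides a detail that matters when a scrape is slow. With fixedRate, each run is due one period after the previous run was due; on a single-threaded scheduler (Spring Boot's default), a run that overruns its slot pushes the next one back, and that next run starts as soon as the previous one ends. With fixedDelay, the wait is instead measured from the end of each run. The hypothetical ScheduleTimeline class just computes start times under those assumptions, ignoring scheduler overhead; it does not call Spring.

```java
import java.util.ArrayList;
import java.util.List;

public class ScheduleTimeline {

    // Start times (ms) of the first n runs with fixedRate on a single thread:
    // run k is due at k * period, but cannot start before run k-1 has finished.
    static List<Long> fixedRateStarts(long period, long duration, int n) {
        List<Long> starts = new ArrayList<>();
        long previousEnd = 0;
        for (int k = 0; k < n; k++) {
            long start = Math.max(k * period, previousEnd);
            starts.add(start);
            previousEnd = start + duration;
        }
        return starts;
    }

    // Start times (ms) with fixedDelay: each wait begins when the previous run ends.
    static List<Long> fixedDelayStarts(long delay, long duration, int n) {
        List<Long> starts = new ArrayList<>();
        long start = 0;
        for (int k = 0; k < n; k++) {
            starts.add(start);
            start += duration + delay;
        }
        return starts;
    }

    public static void main(String[] args) {
        // A scrape that takes 3 s, scheduled every 10 s
        System.out.println(fixedRateStarts(10_000, 3_000, 3));  // [0, 10000, 20000]
        System.out.println(fixedDelayStarts(10_000, 3_000, 3)); // [0, 13000, 26000]
        // A scrape that takes 12 s overruns its 10 s slot, so runs go back-to-back
        System.out.println(fixedRateStarts(10_000, 12_000, 3)); // [0, 12000, 24000]
    }
}
```

If the target site should never see back-to-back requests from a slow scrape, @Scheduled(fixedDelay = 10000) guarantees at least 10 seconds of quiet between runs.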

This example shows how little code it takes to add scheduled web scraping to a Spring Boot application: Spring's scheduling support decides when the task runs, and JSoup handles fetching and parsing the HTML.


Last updated 5 months ago
