
How I Built a Family Events Platform with Next.js and 11 Data Sources

Tags: Next.js, data aggregation, PostgreSQL, web scraping, Swindo, full-stack

What Is Swindo?

Swindo.co.uk is a local platform for Swindon that helps families and residents find things to do. It aggregates events, venue information, and activity listings from 11 different data sources into a single, well-organised directory. Think of it as a local "what's on" guide, but powered by real data rather than manual curation.

I built it because Swindon's events and venue information is scattered across dozens of different websites, Facebook groups, council pages, and business listings. No single source has the complete picture. Swindo brings it all together.

The Data Sources

Getting data from 11 sources means dealing with 11 different formats, APIs, and reliability levels:

  • Google Places API: Venue details, ratings, opening hours, photos
  • Eventbrite API: Ticketed events
  • Facebook Events: Community events (scraped, since the API is restricted)
  • Swindon Borough Council: Council-run events and facilities
  • TripAdvisor: Reviews and visitor ratings
  • Meetup.com: Group activities and regular meetups
  • Local venue websites: Individual scrapers for major venues
  • Yelp: Additional business listings and reviews
  • OpenStreetMap: Geographic data and categorisation
  • Instagram: Visual content from venue accounts
  • Manual submissions: A form for venue owners to submit updates

The Normalisation Challenge

Each source represents data differently. Google gives you structured JSON with clear fields. Facebook gives you HTML that changes layout periodically. Council data comes as semi-structured HTML tables. The normalisation layer converts everything into a unified schema:

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class UnifiedEvent:
    title: str
    description: str
    venue_id: Optional[int]
    start_time: datetime
    end_time: Optional[datetime]
    source: str
    source_url: str
    categories: list[str]
    price_info: Optional[str]
    image_url: Optional[str]
    confidence_score: float  # how confident we are in the data quality
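
For illustration, one adapter in that layer might look like the following sketch. The raw field names here are assumptions modelled loosely on an Eventbrite-style payload, not the actual Swindo code; each source gets its own adapter that emits the same unified keys:

```python
from datetime import datetime

# Hypothetical adapter: maps one source's raw payload onto the unified
# schema. The nested field names on the raw side are assumptions.
def normalise_eventbrite(raw: dict) -> dict:
    return {
        "title": raw["name"]["text"],
        "description": (raw.get("description") or {}).get("text", ""),
        "venue_id": None,  # resolved later by venue matching
        "start_time": datetime.fromisoformat(raw["start"]["local"]),
        "end_time": (datetime.fromisoformat(raw["end"]["local"])
                     if raw.get("end") else None),
        "source": "eventbrite",
        "source_url": raw["url"],
        "categories": [c.strip() for c in raw.get("categories", [])],
        "price_info": None,
        "image_url": None,
        "confidence_score": 0.9,  # structured API data scores high
    }
```

The payoff of this shape is that everything downstream (deduplication, the frontend, SEO markup) only ever sees one schema, regardless of which of the 11 sources a record came from.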

Deduplication Across Sources

The same event often appears in multiple sources. A pub quiz might be listed on Facebook, on the venue's own website, and on Eventbrite. The deduplication system uses a combination of:

  • Venue matching (if we know the venue, events at the same venue on the same date are likely duplicates)
  • Title similarity using fuzzy matching with a threshold of 85%
  • Time overlap detection for events at the same location

When duplicates are found, the system merges them, keeping the richest description and the most complete metadata from across all sources.
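
A minimal sketch of those checks, using the standard library's difflib in place of whatever fuzzy-matching library the real system uses:

```python
from datetime import datetime
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # the 85% threshold from the dedup rules

def titles_match(a: str, b: str) -> bool:
    """Fuzzy title comparison, case- and whitespace-insensitive."""
    a, b = " ".join(a.lower().split()), " ".join(b.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= SIMILARITY_THRESHOLD

def likely_duplicates(e1: dict, e2: dict) -> bool:
    """Two events are probable duplicates if they share a known venue
    and date and their titles are close enough."""
    return (
        e1["venue_id"] is not None
        and e1["venue_id"] == e2["venue_id"]
        and e1["start_time"].date() == e2["start_time"].date()
        and titles_match(e1["title"], e2["title"])
    )
```

Venue and date are checked first because they are cheap exact comparisons; the fuzzy ratio only runs on the small set of candidates that survive them.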

The Next.js Frontend

I chose Next.js for the frontend because it gives me server-side rendering out of the box, which is essential for SEO. A local events platform lives or dies on organic search traffic.

The site is structured around three main views:

  • Events feed: Chronological listing of upcoming events with filtering by category, date, and area
  • Venue directory: Searchable directory of 300+ venues with detailed profiles
  • Family activities: A curated section specifically for family-friendly events and venues

SEO Strategy

For a local platform, SEO is everything. Every venue gets its own page with a unique URL, structured data markup (Schema.org), and content optimised for local search terms. Event pages are generated dynamically and include JSON-LD structured data:

function EventJsonLd({ event }) {
  // Schema.org Event markup; field names here are simplified
  const jsonLd = {
    '@context': 'https://schema.org',
    '@type': 'Event',
    name: event.title,
    startDate: event.startTime,
    location: { '@type': 'Place', name: event.venueName },
  };
  return (
    <script
      type="application/ld+json"
      dangerouslySetInnerHTML={{ __html: JSON.stringify(jsonLd) }}
    />
  );
}

Automated Content Pipeline

The data ingestion runs on a schedule:

  • Hourly: API sources (Google, Eventbrite, Meetup) are polled for new data
  • Every 6 hours: Scrapers run against web sources
  • Daily: Full deduplication pass and data quality checks
  • Weekly: Stale venue data is flagged for review

The entire pipeline is managed by a Python scheduler that logs every run and alerts me if a source fails consistently.

Handling Source Failures

When you depend on 11 external sources, something is always broken. Websites change their layouts. APIs hit rate limits. Services go down for maintenance. I designed the system with graceful degradation in mind:

  • Each source can fail independently without affecting others
  • Failed fetches are retried with exponential backoff
  • If a source fails for more than 48 hours, I get an alert
  • Stale data from a failed source remains visible but is marked with a freshness indicator
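
The retry rule above can be sketched as a small helper; this is a generic exponential-backoff wrapper, not the production code:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def fetch_with_backoff(fetch: Callable[[], T], retries: int = 4,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry a flaky fetch with exponential backoff (1s, 2s, 4s, ...).
    `sleep` is injectable so tests don't actually wait."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: let the monitor see the failure
            sleep(base_delay * 2 ** attempt)
```

Re-raising on the final attempt matters: the scheduler needs to see the failure so the 48-hour alert clock can start, rather than the error being swallowed.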

Content Freshness Strategy

Events have a natural expiry date, but venue information also goes stale. I implemented a freshness scoring system that considers when each piece of data was last verified. Sources that update frequently (like Google Places) get a higher freshness weight than sources that are scraped less often. When a venue's overall freshness score drops below a threshold, its listing gets a subtle indicator showing that some information may be outdated. This honest approach to data quality has actually built more trust with users than pretending everything is always current.
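
One way to sketch such a scoring function, with made-up source weights and a half-life decay that the real system may or may not use:

```python
from datetime import datetime, timedelta

# Hypothetical per-source weights: sources refreshed often (like the
# hourly Google Places poll) count for more than rarely scraped ones.
SOURCE_WEIGHTS = {"google_places": 1.0, "eventbrite": 0.9, "scraper": 0.5}

def freshness_score(verifications: dict[str, datetime],
                    now: datetime, half_life_days: float = 30.0) -> float:
    """Weighted freshness in [0, 1]: each source's contribution halves
    every `half_life_days` since it last verified the venue."""
    if not verifications:
        return 0.0
    total = weighted = 0.0
    for source, last_seen in verifications.items():
        w = SOURCE_WEIGHTS.get(source, 0.5)
        age_days = (now - last_seen).total_seconds() / 86400
        weighted += w * 0.5 ** (age_days / half_life_days)
        total += w
    return weighted / total
```

A listing whose score falls below some threshold (say 0.5) would get the "may be outdated" indicator described above.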

Mobile-First Design

Over 70% of Swindo's traffic comes from mobile devices, which makes sense for a local discovery platform. The Next.js frontend is designed mobile-first with large touch targets, easy-to-scan cards, and a bottom navigation bar. The map integration uses Leaflet with custom markers for different venue categories. Performance optimisation was critical since users on mobile often have variable connection quality. I use Next.js image optimisation, aggressive caching, and lazy loading for below-the-fold content to keep the time to interactive under 2 seconds on 3G connections.

Results

Swindo now tracks over 300 venues and surfaces dozens of events weekly. The organic search traffic has been growing steadily, driven largely by venue pages ranking for searches like "best restaurants in Swindon" and "things to do in Swindon this weekend."

The biggest lesson from this project is that data aggregation at this scale is mostly an engineering problem, not an AI problem. The value comes from reliable pipelines, good deduplication, and presenting clean data in a useful format. AI plays a supporting role in categorisation and content enrichment, but the foundation is solid data engineering.