By Bryan Vest — Feb 8, 2025

Building the Core

Technical Overview of the ISP Testbed

Network Architecture

The testbed replicates a real ISP environment with logically segmented routing, authentication, and service layers. Each VyOS router handles one role, such as upstream connectivity, management traffic, or customer segmentation.

This is still a work in progress. Right now the focus is on getting the foundation right: building out the network, setting up the core infrastructure, and making sure everything runs cleanly before diving into deep analysis. The goal is a system that learns from real network events and improves over time as operators feed it data and validate its decisions.

Core Routing and Traffic Flow

The network consists of multiple routers performing distinct functions:

router-core-1 (172.16.1.90): The primary core router, handling upstream connectivity and static route management.
router-core-mgmt-1 (10.0.0.2): Dedicated management network router that keeps administrative traffic isolated.
router-isp-core-1 (10.1.0.2): The interconnect between the business and residential service segments.
router-isp-mgmt-1 (10.1.1.2): Handles ISP backend services like authentication, databases, DHCP, and DNS.
router-isp-bus-1 (10.1.11.2): Carries segmented traffic for business customers.
router-isp-res-1 (10.1.21.2): Carries segmented traffic for residential customers.

Each router has defined static routes and NAT rules that control traffic flow. Core routing decisions follow from these static routes, combined with load balancing.
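To make the static-route behavior concrete, here is a minimal Python sketch of longest-prefix matching, the rule a router applies when several static routes cover the same destination. The router names come from the list above. The prefixes are assumptions made for illustration (the post lists router addresses but no subnet masks), and treating router-core-1 as the default route is also an assumption.

```python
import ipaddress

# Router names come from the list above. The prefixes are assumptions made
# for this sketch; the post gives router addresses but no subnet masks.
ASSUMED_ROUTES = [
    ("10.0.0.0/16", "router-core-mgmt-1"),
    ("10.1.0.0/16", "router-isp-core-1"),
    ("10.1.1.0/24", "router-isp-mgmt-1"),
    ("10.1.11.0/24", "router-isp-bus-1"),
    ("10.1.21.0/24", "router-isp-res-1"),
]
DEFAULT_ROUTER = "router-core-1"  # assumed to hold the default (upstream) route

ROUTE_TABLE = [(ipaddress.ip_network(p), r) for p, r in ASSUMED_ROUTES]


def responsible_router(destination: str) -> str:
    """Return the router whose most specific assumed prefix covers destination."""
    address = ipaddress.ip_address(destination)
    matches = [(net, router) for net, router in ROUTE_TABLE if address in net]
    if not matches:
        return DEFAULT_ROUTER
    # Longest-prefix match: the most specific network wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    for ip in ("10.1.1.55", "10.1.11.40", "10.0.1.50", "198.51.100.7"):
        print(ip, "->", responsible_router(ip))
```

A table like this is also a convenient place to sanity-check an addressing plan before any router is touched: if an address lands on an unexpected router, the prefixes overlap in a way that was not intended.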

Service Infrastructure and Automation

Beyond the routing layer, essential ISP services are deployed and automated for operational efficiency.

Core Servers

In addition to routers and service-specific infrastructure, several core servers handle DNS, automation, logging, and GIS data, alongside a development workstation:

Server	Function	IP Address
DNS-Core	Central DNS resolution for core systems	10.0.1.10
Ansible-Core	Manages automated deployments and configurations	10.0.1.50
Logs-Core-1	Centralized logging and event collection	10.0.1.100
GeoServer-Core	GIS-based network mapping and visualization	10.0.1.25
CodexMCP-Workstation	Primary development and management system	TBD

These core servers form the backbone for automating deployments, logging events across all infrastructure, and visualizing network data geographically.

Authentication, Addressing, and Service Handling

Several backend services operate to handle critical ISP functions:

Service	Function	IP Address
MariaDB	Stores structured ISP metadata and configuration	10.1.1.55
FreeRADIUS	Authentication and accounting for network access	10.1.1.5
DNS (Bind9)	Handles internal network resolution	10.1.1.10
DHCP (Kea)	Assigns dynamic IP addresses to clients	10.1.1.6
Mail Server	Handles outbound and inbound mail processing	10.1.1.100
Web Portal	Provides user self-service functions	10.1.1.200
Nextcloud	Internal documentation and file sharing	10.1.1.150
PBX (Asterisk)	Manages SIP trunking and call handling	10.1.1.20

Ansible configures these services consistently across multiple nodes. Provisioning brings each service up in a known-good state and lets it recover quickly if a failure occurs.
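One way to keep the service table and the automation in sync is to generate the Ansible inventory from a single source. The Python sketch below renders an INI-style inventory group from the addresses in the table above. The short host aliases and the group name `isp_services` are hypothetical, and this is not necessarily how the testbed's inventory is actually maintained.

```python
# Addresses come from the service table above. The short host aliases and the
# group name "isp_services" are hypothetical, chosen for this sketch.
SERVICES = {
    "mariadb": "10.1.1.55",
    "freeradius": "10.1.1.5",
    "bind9": "10.1.1.10",
    "kea": "10.1.1.6",
    "mail": "10.1.1.100",
    "portal": "10.1.1.200",
    "nextcloud": "10.1.1.150",
    "asterisk": "10.1.1.20",
}


def render_inventory(group: str, hosts: dict[str, str]) -> str:
    """Render an Ansible INI inventory group, rejecting duplicate addresses."""
    seen: dict[str, str] = {}
    for name, address in hosts.items():
        if address in seen:
            raise ValueError(f"{name} and {seen[address]} share {address}")
        seen[address] = name
    lines = [f"[{group}]"]
    lines += [f"{name} ansible_host={addr}" for name, addr in sorted(hosts.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_inventory("isp_services", SERVICES), end="")
```

The duplicate-address check is there because a copy-paste slip in an address table is easy to make and hard to spot once playbooks start running against the wrong host.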

Log Collection – Where We Are Now

The system isn’t doing any advanced analytics yet. The network and infrastructure are in place, and the first stage is simply collecting logs, letting the system absorb raw data from different sources so we can start making sense of it. This isn’t about grabbing logs for the sake of it; it’s about teaching the system to recognize what normal looks like before it can detect problems.

So far, we have:

Base logs from VyOS routers and newly installed Ubuntu 24 VMs flowing into a central log aggregator.
System-level logs (syslog, authentication, cron jobs) being stored but not yet processed.
Basic router logs being collected but not yet structured for analysis.

That’s it. No fancy machine learning, no complex automation, just raw data gathered so we can start training the system.
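Since the collected logs are not yet structured, a first step toward analysis is turning each raw line into named fields. The Python sketch below assumes the traditional BSD-style syslog layout (timestamp, host, program, optional PID, message); the aggregator's actual output format is not specified here, so the pattern may need adjusting. The sample line is made up for the example.

```python
import re
from typing import Optional

# Assumes the traditional BSD-style syslog layout:
#   "Mon DD HH:MM:SS host program[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Split one raw syslog line into fields, or return None if it doesn't match."""
    match = SYSLOG_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    # The PID is optional; convert it to an int only when present.
    record["pid"] = int(record["pid"]) if record["pid"] else None
    return record


if __name__ == "__main__":
    # Made-up sample line, not taken from the testbed.
    sample = "Feb  8 14:02:11 router-core-1 cron[812]: sample message"
    print(parse_line(sample))
```

Lines that fail to parse return `None` rather than raising, so they can be counted and inspected separately instead of silently dropped.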

Training the System – The Next Steps

The goal isn’t just passive monitoring. The system needs to learn how operators troubleshoot issues, then apply that knowledge dynamically. This requires a feedback loop where IT staff validate what the system suggests, correct errors, and refine detection methods. Over time, this will create an adaptive troubleshooting assistant that learns from real ISP incidents.
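As a rough sketch of what that feedback loop could record, the Python below stores each suggestion together with the operator's verdict and reports how often each rule was confirmed. The `Review` fields and the rule names are hypothetical; nothing here reflects an existing implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    """One system suggestion plus the operator's verdict (hypothetical schema)."""
    rule: str          # name of the rule that produced the suggestion
    suggestion: str    # what the system proposed
    confirmed: bool    # True if the operator agreed with it


def confirmation_rate(reviews: list[Review]) -> dict[str, float]:
    """Return, per rule, the fraction of suggestions operators confirmed."""
    counts = defaultdict(lambda: [0, 0])  # rule -> [confirmed, total]
    for review in reviews:
        counts[review.rule][1] += 1
        if review.confirmed:
            counts[review.rule][0] += 1
    return {rule: ok / total for rule, (ok, total) in counts.items()}


if __name__ == "__main__":
    history = [
        Review("auth-burst", "possible brute-force attempt", True),
        Review("auth-burst", "possible brute-force attempt", False),
        Review("dhcp-exhaustion", "address pool nearly full", True),
    ]
    print(confirmation_rate(history))
```

A rule whose confirmation rate stays low is a candidate for tightening or removal, which is one simple way operator corrections can feed back into detection.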

Current training priorities:

Establishing baseline behavior for system logs so that deviations can be flagged as potential issues.
Building early-stage rule sets for log correlation without overwhelming the system with false positives.
Refining log ingestion to ensure data is structured properly before deeper analysis begins.
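The first priority, flagging deviations from a baseline, can be illustrated with a simple z-score check on message counts per interval. This Python sketch is one possible starting point, not the method the testbed uses; the threshold of 3 standard deviations and the idea of counting messages per interval are assumptions.

```python
from statistics import mean, stdev


def is_deviation(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations
    from the mean of `history` (for example, log messages per five minutes)."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != baseline
    return abs(current - baseline) / spread > threshold


if __name__ == "__main__":
    counts = [100, 104, 98, 101, 97, 102]  # made-up per-interval message counts
    print(is_deviation(counts, 103))
    print(is_deviation(counts, 400))
```

Raising the threshold trades missed anomalies for fewer false positives, which matters for the second priority above: an early rule set that fires constantly will be ignored.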

Building Toward Intelligent Network Operations

At this point the system is raw: a collection of running services, basic logs, and an idea of what’s next. The real work starts as we begin training it to recognize patterns and troubleshoot faster than a human could alone.

This process will be documented here: every step of tuning, every issue encountered, and every breakthrough. This isn’t a theoretical exercise; it’s a hands-on approach to building a real, functional system that will assist in ISP operations and scale over time.

-Is it Live or ???
--Bryan