Building a Lean, On-Demand FTTH Monitoring Solution with OpenSearch, Logstash, and Perl
In 2024, as FTTH networks continue to expand, the demand for smarter, more efficient monitoring solutions grows. Instead of relying on traditional methods that can be heavy on resources and data storage, I’ve built a system that focuses on on-demand data enrichment and real-time metrics collection. This approach offers a precise, streamlined way to keep tabs on your FTTH network.
In this post, I’ll take you through the journey of how I constructed this system—from choosing the right technology stack to implementing custom scripts that make it all work seamlessly together.
The Challenge: Evolving FTTH Monitoring Techniques
Monitoring FTTH networks requires tools that can keep pace with the dynamic nature of these systems. Traditional approaches often involve collecting and storing large volumes of historical data for every Optical Network Terminal (ONT). While these methods provide valuable insights, they can be resource-intensive and may not always deliver the real-time responsiveness needed for immediate troubleshooting.
Recognizing the need for a more streamlined and efficient solution, I developed a system that focuses on gathering data on-demand, providing targeted insights exactly when they’re needed. This approach minimizes storage requirements while ensuring that network operators have access to the most current and relevant data during critical moments.
The Solution: On-Demand Data Enrichment
The key to this solution is a "less is more" approach—collecting the right data at the right time rather than storing everything continuously. Here’s how I implemented it:
Step 1: Setting Up the Stack
I deployed a LOGG stack—Linux, OpenSearch, Go, and Grafana—as the foundation of the system. This stack offers the flexibility and power needed to handle real-time data processing and visualization. Well, LOGG is my normal stack; I guess we would call this one LOGP, but that just doesn't sound as cool. I chose Perl instead of Go to collect this data because, well, XML.
- Linux has been my analytics OS platform for the last two decades.
- OpenSearch handles indexing and storing the enriched log data.
- Logstash ingests the CMS logs and drives the enrichment pipeline.
- Grafana provides powerful dashboards for real-time data visualization.
- Perl was chosen for the prototype due to its excellent XML processing capabilities, particularly for handling SOAP responses from the Calix Northbound API.
Step 2: Log Ingestion with Logstash
Logs from the Calix CMS are pushed into Logstash, where the real work begins. I use a Grok filter to parse the incoming logs and extract key values like shelfName and ontID, along with other relevant data.
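The exact Grok pattern depends on your CMS log layout; here is a minimal sketch, assuming a hypothetical key=value style line (the field layout and names are assumptions, not the real CMS format):

```
filter {
  grok {
    # Hypothetical layout -- adjust the pattern to the actual CMS log format.
    match => {
      "message" => "%{TIMESTAMP_ISO8601:log_time} shelf=%{DATA:shelfName} ont=%{NUMBER:ontID} %{GREEDYDATA:event_detail}"
    }
  }
}
```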
Step 3: Conditional API Calls with Perl
Once the necessary data is extracted, a Ruby filter in Logstash checks if both ontID and shelfName are present. If they are, the filter triggers a Perl script, passing these values as arguments.
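A minimal sketch of that conditional hand-off is below; the script path, argument order, and the ont_enrichment_raw field name are assumptions for illustration:

```
filter {
  if [ontID] and [shelfName] {
    ruby {
      code => '
        require "open3"
        # Hypothetical script path; pass the extracted values as arguments
        # and stash the raw JSON output for a later filter to parse.
        out, status = Open3.capture2("/opt/scripts/ont_enrich.pl",
                                     event.get("shelfName").to_s,
                                     event.get("ontID").to_s)
        event.set("ont_enrichment_raw", out) if status.success?
      '
    }
  }
}
```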
The Perl script handles the entire interaction with the Calix Northbound API (a condensed sketch follows this list):
- It sends an authentication envelope.
- It requests about 20 pieces of additional data from the ONT.
- It processes the returned XML, extracts the required information, and converts it to JSON.
- Finally, it sends the logout envelope. Remember, the Calix Northbound API requires you to log out when all of your transactions are complete, or you will fill up all available sessions.
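Here is a condensed sketch of that flow. The endpoint URL, envelope bodies, credentials, and XPath expressions are placeholders; the real CMS Northbound API schema will differ.

```perl
#!/usr/bin/perl
# Condensed sketch of the enrichment flow. The endpoint, envelope bodies,
# and XPath expressions are placeholders -- the real CMS schema differs.
use strict;
use warnings;
use LWP::UserAgent;
use XML::LibXML;
use JSON;

my ($shelf, $ont_id) = @ARGV;
die "usage: $0 <shelfName> <ontID>\n" unless defined $ont_id;

my $ua  = LWP::UserAgent->new(timeout => 10);
my $url = 'http://cms.example.net:18080/cmsweb/nc';    # assumed endpoint

# Post a SOAP envelope and return the parsed XML response.
sub soap_call {
    my ($body) = @_;
    my $res = $ua->post(
        $url,
        'Content-Type' => 'text/xml; charset=utf-8',
        Content        => $body,
    );
    die 'SOAP call failed: ' . $res->status_line unless $res->is_success;
    return XML::LibXML->load_xml(string => $res->decoded_content);
}

# Wrap a payload in a bare SOAP envelope (illustrative, not the CMS schema).
sub envelope {
    my ($payload) = @_;
    return qq{<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">}
         . qq{<soapenv:Body>$payload</soapenv:Body></soapenv:Envelope>};
}

# 1. Authenticate and capture the session id.
my $login = soap_call(envelope('<login><UserName>monitor</UserName><Password>secret</Password></login>'));
my ($sid) = map { $_->textContent } $login->findnodes('//SessionId');
die "no session id in login response\n" unless defined $sid;

# 2. Request the ONT's current metrics and configuration.
my $doc = soap_call(envelope(qq{<get-ont session="$sid" shelf="$shelf" ont="$ont_id"/>}));

# 3. Flatten the interesting XML elements into a hash.
my %metrics;
$metrics{ $_->nodeName } = $_->textContent for $doc->findnodes('//OntData/*');

# 4. Emit JSON on stdout for Logstash to merge back into the event.
print encode_json(\%metrics);

# 5. Always log out -- the CMS caps concurrent sessions, and leaked
#    sessions will eventually exhaust the pool.
soap_call(envelope(qq{<logout session="$sid"/>}));
```

Keeping the login and logout inside the same invocation means the session is released even for fast, one-off lookups.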
Gathering data for a single ONT takes about one second, so even when logs start coming in rapidly, Logstash can buffer the backlog and enrich each event in turn, ensuring that no critical insights are lost in the shuffle.
Step 4: Enriching the Log Data
The enriched JSON data from the Perl script is returned to Logstash, where it is merged back into the original log document. This creates a comprehensive record that includes both the original log entry and a snapshot of the ONT’s current metrics and configuration.
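One way to fold the script's output back into the event is Logstash's json filter. The source field name below matches the hypothetical ruby filter sketch above:

```
filter {
  json {
    # Parse the Perl script's JSON output and merge it into the event
    # under a dedicated "ont" field, then drop the raw string.
    source       => "ont_enrichment_raw"
    target       => "ont"
    remove_field => ["ont_enrichment_raw"]
  }
}
```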
Step 5: Data Storage and Visualization
The enriched logs are stored in a monthly OpenSearch index named calix_ont_analytics-yyyy.MM.
By using Logstash filters, the original timestamp of each log entry is preserved, ensuring accurate time-based analysis.
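A sketch of the relevant date filter and output block, assuming the log_time field captured by the Grok pattern earlier and an illustrative OpenSearch endpoint:

```
filter {
  date {
    # Use the CMS event time, not ingest time, as the document timestamp.
    match  => ["log_time", "ISO8601"]
    target => "@timestamp"
  }
}
output {
  opensearch {
    hosts => ["https://opensearch.example.net:9200"]  # assumed endpoint
    index => "calix_ont_analytics-%{+yyyy.MM}"
  }
}
```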
To make sense of the data, I built Grafana dashboards that allow for quick filtering and analysis down to the PON ID level on the infrastructure side and individual ONTs on the premises side. These dashboards include line charts, heat maps, and other visualizations that make it easy to spot ONT errors as soon as they occur, such as Signal Degraded Bit Error Rate (SDBER) alerts.
The hope is to be able to spot when a single ONT is causing errors in other ONTs, and this system has the potential to do exactly that.
The Benefits of This Approach
- Efficiency: By only collecting data when it's needed, this approach significantly reduces storage requirements and ensures that the system remains lean and efficient.
- Accuracy: Triggering data collection based on real-time log entries ensures that the information is always current and relevant, which is critical for troubleshooting.
- Comprehensive Monitoring: Unlike traditional tools that rely on static data, this system provides a dynamic and detailed snapshot of the ONT’s status, allowing for faster and more accurate detection of issues.
- Customizable and Scalable: The use of OpenSearch and Grafana allows for easy customization and scalability, making it adaptable to various network environments.
Looking Forward
At some point, I plan to convert the Perl API poller to Go for potentially greater efficiency and performance. However, I was able to prototype it in Perl much faster, which allowed me to quickly test and implement the necessary features. The flexibility of Perl, particularly in handling XML and SOAP APIs, made it the ideal choice for the initial development phase. This way, I could focus on solving the problem first, knowing that optimization can always come later.
Far Into the Future
Looking ahead, there's potential to take this system even further by integrating it with the Calix API not just for monitoring, but for proactive management. Imagine a scenario where the system automatically writes back to the API when it detects a new ONT arrival, registering and configuring the device in real-time.
This could transform the way networks are managed, allowing for fully automated deployment and maintenance of FTTH infrastructure. As new ONTs come online, the system could instantly gather configuration data, apply necessary settings, and even initiate testing protocols—creating a self-healing, self-optimizing network environment.
While this might seem far into the future, the foundations being laid today with on-demand data enrichment and real-time analytics are the first steps towards a truly intelligent, automated network management system. The possibilities are vast, and the technology is evolving rapidly, bringing these concepts closer to reality every day.
Conclusion
By taking a lean, on-demand approach to FTTH monitoring, this solution not only meets but exceeds the capabilities of traditional tools. It is highly efficient, accurate, and customizable, making it an invaluable tool for maintaining network reliability.
Whether you’re facing similar challenges in your FTTH deployments or looking to enhance your network monitoring capabilities, this approach offers a practical and powerful solution that can be tailored to your specific needs.
-All Data Tells A Story
--Bryan