
Static IP geolocation databases are becoming the dial-up modems of the modern web: outdated, inflexible, and often wrong. During the AIORI-2 Hackathon, Team GEOSTHIRA from the Vemana Institute of Technology set out to build a smarter alternative. We developed the GEOSTHIRA IP GEOLOCATOR, a supervised machine learning system that maps IPs to cities by analyzing the “heartbeat” of the network itself—latency, hop patterns, and path behavior.
1. Moving from Static Lookups to Dynamic Intelligence
Traditional geolocation relies on self-reported data, which can be stale. Our approach complements self-published geofeeds (RFC 8805) and follows the RFC 2330 framework for IP Performance Metrics by adding active measurements. Instead of only asking “who owns this IP?”, we ask “how long does a probe take to reach this IP from anchors at known locations?”
The Feature Set:
- RTT (Round Trip Time): ICMP echo (RFC 792) round-trip delay, used as a proxy for physical distance, since signal delay bounds how far away a host can be.
- rDNS Tokens: Scraping Reverse DNS strings for regional keywords (e.g., “blr” for Bengaluru).
- Cluster Density: Identifying the concentration of similar IP prefixes in specific geographic zones.
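As a rough illustration of the first two features, here is a minimal sketch rather than the team's actual extraction code. It assumes light travels about 200 km per millisecond in fibre, and the token table `RDNS_CITY_TOKENS` and both function names are hypothetical.

```python
import re

# Assumption: light in fibre covers roughly 200 km per millisecond (about 2/3 of c).
FIBRE_KM_PER_MS = 200.0

# Hypothetical token-to-city table; a real one would be far larger.
RDNS_CITY_TOKENS = {
    "blr": "Bengaluru",
    "bom": "Mumbai",
    "del": "Delhi",
    "maa": "Chennai",
}


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on anchor-to-target distance implied by a round-trip time."""
    one_way_ms = rtt_ms / 2.0  # the signal travels out and back
    return one_way_ms * FIBRE_KM_PER_MS


def rdns_city_hints(hostname: str) -> list[str]:
    """Return cities whose token appears as a fragment of the rDNS name."""
    # Split on dots, dashes, underscores, and digits so "blr02" yields "blr".
    tokens = re.split(r"[.\-_\d]+", hostname.lower())
    return [RDNS_CITY_TOKENS[t] for t in tokens if t in RDNS_CITY_TOKENS]
```

Under these assumptions a 10 ms RTT places the target within 1,000 km of the anchor. Real paths are rarely straight and include queuing delay, so the bound is loose, which is one reason to feed RTT into a classifier alongside other features instead of using it alone.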
2. The Architecture: ML Meets FastAPI
We didn’t just want a model; we wanted a tool. We built a 36,000-record dataset synthesized from real-world Indian network topologies and trained a Random Forest classifier.
The Pipeline:
- AIORI Testbed: Raw traceroute logs are pulled from Indian network anchors.
- Preprocessing: Outliers are scrubbed using IQR (Interquartile Range) filtering to remove network “noise.”
- The Brain: A Random Forest model predicts the city and—crucially—provides a Confidence Score.
- FastAPI Dashboard: A real-time interface that allows batch CSV processing and visualizes reliability via bar charts.
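To make the Confidence Score idea concrete, the sketch below treats confidence as the share of trees that vote for the winning city. This is an illustrative simplification, not the project's code: `predict_with_confidence` is a hypothetical helper, and libraries such as scikit-learn average per-tree class probabilities instead of counting hard votes.

```python
from collections import Counter


def predict_with_confidence(tree_votes):
    """Return (city, confidence), where confidence is the winning vote share."""
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    counts = Counter(tree_votes)
    city, top_count = counts.most_common(1)[0]
    return city, top_count / len(tree_votes)


# Hypothetical votes from a 10-tree forest for a single IP.
votes = ["Bengaluru"] * 7 + ["Chennai"] * 2 + ["Mysuru"]
city, confidence = predict_with_confidence(votes)
```

Here seven of ten trees agree, so the prediction is Bengaluru with a confidence of 0.7. A dashboard can then show that figure next to the city name so users see how contested each answer was.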
3. Results: Accuracy with Accountability
In our testing, the system achieved an 87.5% Top-1 city prediction accuracy. But in the world of Internet standards, accuracy isn’t enough; you need interpretability.
| Metric | Result | Standard Alignment |
|---|---|---|
| City-Level Accuracy | 87.5% | RFC 8805 (Self-Published Data) |
| Confidence Scoring | Probability-based | RFC 2330 (Performance Metrics) |
| Validation | WHOIS/RDAP Enrichment | RFC 9082 (RDAP Query Format) |
4. Bridging to IETF Standards
Our work puts RFC 2330 (the IP Performance Metrics framework) and RFC 7680 (the one-way loss metric) into practice. We found that:
- Geo-Variance Weighting: Essential for smaller cities where traceroute data is sparse.
- Confidence Calibration: Vital for identifying “low-certainty” cases such as VPNs, where the physical location is intentionally masked, and anycast IPs, where one address is served from several sites.
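One common way to check whether confidence scores are calibrated is the expected calibration error (ECE): group predictions by confidence and compare each group's average confidence with its observed accuracy. The sketch below is a generic illustration, not the method used in the project, and the function name and bin count are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted mean gap between average confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

An ECE near zero means a reported 0.9 really does correspond to being right about nine times in ten. A large gap suggests the scores need recalibrating before they are used to flag VPN or anycast cases.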
5. Lessons from the Sprints
The biggest hurdle wasn’t the code; it was the data noise. Network latency is volatile. Applying IQR-based outlier removal stabilized our Random Forest model, which showed us that network measurement is as much about cleaning the signal as it is about the algorithm.
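For readers unfamiliar with IQR filtering, here is a minimal sketch of the standard rule: keep samples within 1.5 × IQR of the first and third quartiles. The function name, the multiplier `k`, and the sample values are illustrative assumptions, not taken from the project.

```python
from statistics import quantiles


def iqr_filter(rtts_ms, k=1.5):
    """Keep samples inside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(rtts_ms) < 4:
        # Too few samples to estimate quartiles meaningfully; keep them all.
        return list(rtts_ms)
    q1, _, q3 = quantiles(rtts_ms, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in rtts_ms if low <= x <= high]


# Hypothetical RTT samples in milliseconds, with one congestion spike.
samples = [20, 21, 22, 23, 24, 25, 180]
cleaned = iqr_filter(samples)
```

The 180 ms spike falls well outside the fences and is dropped, while the samples clustered around 20 to 25 ms survive.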
“Working on GEOSTHIRA helped me understand how real Internet standards and machine learning can combine to create meaningful, open-source solutions for India’s digital future.” — Kavyashree K, Team Lead
6. Future Work: The Error Radius
Our next milestone is a Haversine Distance Module. Instead of only naming a city, the geolocator will report an error radius in the style of a “circular error probable” (e.g., “Bengaluru, within a 15 km radius”), giving location-based services an explicit measure of precision.
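The haversine formula itself is standard, so a sketch of what such a module might compute is straightforward. The helper names, the 15 km default, and the approximate city coordinates in the usage note are assumptions for illustration only.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(predicted, actual, radius_km=15.0):
    """True if the predicted (lat, lon) lies within radius_km of the actual one."""
    return haversine_km(*predicted, *actual) <= radius_km
```

With approximate coordinates for Bengaluru (12.9716, 77.5946) and Chennai (13.0827, 80.2707), `haversine_km` returns roughly 290 km, so a prediction that confuses the two cities would fall far outside a 15 km radius.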