
In an era where digital borders are fluid and routing tables shift constantly, knowing where an IP address actually resides is a cornerstone of network security and performance. During the AIORI-2 Hackathon, team Intelligent IP from Christ University developed a machine-learning-driven geolocation module that moves beyond static databases by integrating active network measurements and multi-model ensembles.
1. The Core Innovation: RTT-Enhanced Intelligence
Traditional geolocation often relies on administrative records that may be outdated. Our approach anchors itself in RFC 9092 (Finding and Using Geofeed Data) and RFC 8805 (A Format for Self-Published IP Geolocation Feeds), but adds a vital layer of ground truth: Round-Trip Time (RTT). By probing IPs from multiple vantage points (Bangalore and Sirsa), we generate a unique “latency signature” for every address.
- RFC 792 (ICMP): Used for gathering raw RTT samples.
- Ensemble Modeling: We utilize LightGBM, XGBoost, and Random Forest to process these signals. This multi-model approach ensures that if one model is biased toward ASN data, another can correct it using RTT-derived features.
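The two ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's code: the feature names, the sample RTT values, and the per-model probabilities are invented, and soft voting (averaging class probabilities) is only one plausible way to combine the three models, since the write-up does not say how their outputs are merged.

```python
from statistics import median


def latency_signature(rtt_ms_by_vantage):
    """Summarise raw RTT samples (ms) per vantage point into flat features.

    Feature names are hypothetical; the report does not list its columns.
    """
    features = {}
    for vantage, samples in sorted(rtt_ms_by_vantage.items()):
        features[f"{vantage}_rtt_min"] = min(samples)
        features[f"{vantage}_rtt_median"] = median(samples)
        features[f"{vantage}_rtt_spread"] = max(samples) - min(samples)
    return features


def soft_vote(per_model_probs):
    """Average class probabilities across models and rank cities by the mean.

    Assumption: soft voting stands in for whatever combination rule
    the actual ensemble uses.
    """
    cities = set()
    for probs in per_model_probs.values():
        cities.update(probs)
    n_models = len(per_model_probs)
    mean = {
        city: sum(p.get(city, 0.0) for p in per_model_probs.values()) / n_models
        for city in cities
    }
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative RTT samples (ms) from the two vantage points named above.
signature = latency_signature({
    "bangalore": [12.1, 11.8, 13.0],
    "sirsa": [48.5, 47.9, 50.2],
})

# Illustrative per-model probabilities; a real pipeline would take these
# from each model's predict_proba output.
ranked = soft_vote({
    "lightgbm": {"Bengaluru": 0.70, "Chennai": 0.20, "Delhi": 0.10},
    "xgboost": {"Bengaluru": 0.60, "Chennai": 0.30, "Delhi": 0.10},
    "random_forest": {"Bengaluru": 0.50, "Chennai": 0.25, "Delhi": 0.25},
})
```

Averaging probabilities rather than taking a majority vote keeps each model's confidence in play, so a model leaning heavily on ASN data can be outweighed when the RTT-driven models disagree strongly.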
2. Architecture & Pipeline
The system is designed for high-concurrency environments, using a Flask-based API and a sophisticated preprocessing pipeline that handles the noisy reality of Internet data.
Key Technical Achievements:
- City-Level Accuracy: Achieved 92.4% accuracy in Indian metropolitan regions.
- VPN/Proxy Gate: Integrated an initial check via ipwho.is to filter out non-actionable traffic before running the inference stack.
- Inference Speed: Under 100ms per query once model artifacts are cached in memory.
3. Performance & Validation
| Test | Result | Observation |
|---|---|---|
| Model Accuracy | 92.4% | Enriched RTT columns provided a 4% boost over static baselines. |
| Inference Time | < 100ms | Optimized by serving pre-loaded .joblib artifacts. |
| Missing Data | Robust | Median-based imputation handles intermittent probe failures gracefully. |
| Top-3 Accuracy | ≥ 97% | Provides a highly reliable “Best Guess” for network operators. |
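Two rows of the table describe computations that are easy to show concretely: median-based imputation and Top-3 accuracy. The sketch below uses only the standard library; the column names, RTT values, and city rankings are made up for illustration and are not taken from the team's dataset.

```python
from statistics import median


def impute_median(rows, columns):
    """Fill None entries in each column with that column's observed median."""
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        # If a column has no observations at all, leave its gaps as None.
        medians[col] = median(observed) if observed else None
    return [
        {**row, **{c: medians[c] if row[c] is None else row[c] for c in columns}}
        for row in rows
    ]


def top_k_accuracy(ranked_labels, truths, k=3):
    """Fraction of samples whose true label is among the k top-ranked guesses."""
    hits = sum(
        1 for ranked, truth in zip(ranked_labels, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


# None marks a probe that failed to return an RTT sample (hypothetical data).
rows = [
    {"bangalore_rtt": 12.0, "sirsa_rtt": 48.0},
    {"bangalore_rtt": None, "sirsa_rtt": 52.0},
    {"bangalore_rtt": 30.0, "sirsa_rtt": None},
]
filled = impute_median(rows, ["bangalore_rtt", "sirsa_rtt"])

# Each inner list is one query's candidate cities, most likely first.
ranked_labels = [
    ["Bengaluru", "Chennai", "Delhi", "Mumbai"],
    ["Delhi", "Mumbai", "Chennai", "Bengaluru"],
    ["Mumbai", "Delhi", "Chennai", "Bengaluru"],
]
truths = ["Chennai", "Bengaluru", "Mumbai"]

top1 = top_k_accuracy(ranked_labels, truths, k=1)
top3 = top_k_accuracy(ranked_labels, truths, k=3)
```

In a production pipeline the medians would normally be computed once on the training set and reused at inference time, so that a single incoming query with a failed probe can still be filled in.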
4. Overcoming Interoperability Hurdles
One of the major challenges we faced was Artifact Weight. Tree-based models like Random Forest can produce massive files (~350MB). Reloading these on every request caused significant latency. We pivoted to a startup-caching strategy, ensuring models are loaded once into memory, which reduced response times from seconds to milliseconds.
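The load-once pattern described here might look like the following. It is a minimal sketch, not the project's actual code: `ModelCache` is a hypothetical helper, and `pickle` stands in for the `.joblib` artifacts so that the example runs with the standard library alone. With joblib installed, `joblib.load` could be passed as the `loader` argument instead.

```python
import pickle
import tempfile
from pathlib import Path


class ModelCache:
    """Load each serialized model once, then serve it from memory."""

    def __init__(self, loader=None):
        # loader takes a path and returns a model object; pickle is the
        # stand-in default here (assumption, see the note above).
        self._loader = loader or self._load_pickle
        self._models = {}
        self.disk_loads = 0  # counts reads from disk, for demonstration

    @staticmethod
    def _load_pickle(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)

    def preload(self, paths):
        """Read every artifact from disk; call this once at startup."""
        for name, path in paths.items():
            self._models[name] = self._loader(path)
            self.disk_loads += 1

    def get(self, name):
        """Return the in-memory model without touching the disk."""
        return self._models[name]


# Demonstration with a tiny stand-in artifact instead of a ~350MB model.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "random_forest.pkl"
    artifact.write_bytes(pickle.dumps({"name": "stand-in model"}))

    cache = ModelCache()
    cache.preload({"random_forest": artifact})

# The file is gone by now, yet both lookups succeed from memory.
first = cache.get("random_forest")
second = cache.get("random_forest")
```

In a Flask service, the cache would typically be created and preloaded at module import time, before the first request arrives, so that request handlers only ever call `get`.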
“Bringing RTT data from Bangalore and Sirsa into the model made me appreciate how ground-level measurements can sharpen IP geolocation. It’s about turning numbers into real network stories.” — Team Intelligent IP
5. Future Roadmap: The Path to Standardization
Our future work focuses on expanding the “vantage point” network to cities like Delhi and Hyderabad to further refine metropolitan discrimination. We are also preparing a draft submission for the IETF IPPM Working Group to propose extensions for RFC 8805 that include measurement-based confidence metrics in geolocation feeds.