Cyberattacks are growing not only in number but also in complexity. Traditional, human-led defences cannot keep up, so scalable, automated solutions are needed. This paper proposes an Adaptive Deep Reinforcement Learning (DRL) Framework for Autonomous Threat Hunting (DRL-ATH), which models the cyber environment as a reinforcement learning problem. The framework trains its agent with the Proximal Policy Optimization (PPO) algorithm under a multi-objective reward optimization strategy, and it relies on a Big Data architecture built on Apache Spark to process telemetry in real time, enabling adaptive decision-making against threats. In tests on high-fidelity network datasets, the DRL-ATH agent showed a clear performance advantage over conventional methods: it achieved 96.8% detection accuracy, a 35% reduction in Mean Time to Detection (MTTD), bringing it down to 28 minutes, and a 25% reduction in human analyst workload. Real-world results may vary depending on data diversity and network conditions. These results indicate that integrating DRL with scalable data processing is key to building proactive, context-aware, and highly efficient next-generation cyber defence systems.
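To make the two core ingredients concrete, the following is a minimal sketch of how a multi-objective reward and the PPO clipped surrogate term could be expressed. It is illustrative only and is not the DRL-ATH implementation: the function names, the reward terms, and all weights (`w_detect`, `w_false`, `w_delay`, `w_workload`) are assumptions introduced here. Only the clipped-ratio form of the PPO objective is standard.

```python
import math


def multi_objective_reward(true_positive, false_positive, minutes_to_detect,
                           analyst_escalations, w_detect=1.0, w_false=0.5,
                           w_delay=0.01, w_workload=0.1):
    """Hypothetical scalarised reward: reward correct detections, penalise
    false alarms, detection delay, and escalations to human analysts.
    All weights are illustrative assumptions, not values from the paper."""
    reward = 0.0
    if true_positive:
        reward += w_detect
    if false_positive:
        reward -= w_false
    reward -= w_delay * minutes_to_detect
    reward -= w_workload * analyst_escalations
    return reward


def ppo_clipped_term(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for a single sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r = pi_new / pi_old."""
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A correct detection after 28 minutes with one analyst escalation.
    print(multi_objective_reward(True, False, 28.0, 1))
    # The policy doubled the action's probability; the gain is clipped at 1 + eps.
    print(ppo_clipped_term(math.log(2.0), 0.0, advantage=1.0))
```

Under these assumptions, the reward weights set the trade-off between detection accuracy, detection delay, and analyst workload, while the clipping limits how far a single policy update can move away from the policy that collected the data.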