Figure 1(b): CenRL workflow from the IEEE S&P 2026 paper.

Internet censorship measurements require continually testing reachability to many websites from diverse vantage points, yet measurement resources are limited and censorship evolves over time. We introduce CenRL, a reinforcement learning framework that optimizes and automates censorship measurements through sequential decision making. CenRL formulates censorship measurement as a multi-armed bandit problem, enabling an intelligent agent to maximize censorship detection within a limited time budget. CenRL supports two tasks: maximizing the discovery of blocked websites within a network, and rapidly detecting changes in blocking over time in dynamic environments. CenRL operates on large input lists of websites (e.g., Tranco) and uses features such as category, subdomain, TLD, website rank, and parent entity to capture censorship patterns. We evaluate CenRL in controlled environments with ground-truth datasets and demonstrate its effectiveness for real-world censorship measurements.
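To make the bandit formulation concrete, the following is a minimal sketch and not CenRL's actual implementation. It assumes each arm is a single website feature value (here, a category), that pulling an arm means probing one untested site from that category, and that the reward is 1 if the site is blocked. The selection rule is standard UCB1, swapped in for illustration. The function `ucb1_measure`, the `probe` callback, the `.example` domains, and the blocking rates are all hypothetical, and the ground truth is synthetic.

```python
import math
import random
from collections import defaultdict


def ucb1_measure(sites, probe, budget, arm_of=lambda s: s["category"]):
    """Spend up to `budget` probes, picking an arm (feature value) with UCB1."""
    # Group untested sites by arm (hypothetical choice: one arm per category).
    pools = defaultdict(list)
    for site in sites:
        pools[arm_of(site)].append(site)

    pulls = defaultdict(int)  # probes spent on each arm
    hits = defaultdict(int)   # blocked sites found via each arm
    blocked = []

    for t in range(1, budget + 1):
        # Only arms that still have untested sites are eligible.
        arms = [arm for arm, pool in pools.items() if pool]
        if not arms:
            break

        def score(arm):
            if pulls[arm] == 0:
                return float("inf")  # try every arm once first
            mean = hits[arm] / pulls[arm]
            bonus = math.sqrt(2 * math.log(t) / pulls[arm])
            return mean + bonus

        arm = max(arms, key=score)
        site = pools[arm].pop()
        reward = 1 if probe(site) else 0
        pulls[arm] += 1
        hits[arm] += reward
        if reward:
            blocked.append(site["domain"])

    return blocked, dict(pulls)


# Synthetic ground truth, fabricated purely for illustration.
rng = random.Random(0)
RATES = {"gambling": 0.8, "news": 0.1, "shopping": 0.02}
sites = [
    {"domain": f"{cat}{i}.example", "category": cat, "blocked": rng.random() < rate}
    for cat, rate in RATES.items()
    for i in range(100)
]


def probe(site):
    # Stand-in for a real reachability test from a vantage point.
    return site["blocked"]


blocked, pulls = ucb1_measure(sites, probe, budget=120)
print(f"found {len(blocked)} blocked sites; probes per category: {pulls}")
```

In this sketch the agent gradually shifts its probes toward categories with a higher observed blocking rate while still occasionally revisiting the others. A fuller treatment would combine several features (subdomain, TLD, rank, parent entity) and handle blocking that changes over time, which plain UCB1 does not address.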