Equipped with specialized IoT on-board devices, unmanned aerial vehicles (UAVs) can be orchestrated to assist the delivery of value-added services with improved quality of service. Typically, services are delegated in units of tasks to a designated leader UAV, which splits each task into sub-tasks and offloads them to a subset of nearby UAVs, known as helper UAVs, for timely processing. Such a decision-making process, often referred to as UAV task offloading, remains open and challenging to design due to various uncertainties, such as the resource availability and instantaneous workloads of helper UAVs. Meanwhile, existing solutions often assume full knowledge of system dynamics and make offloading decisions in an offline manner, incurring excessive control overhead and scalability issues. In this paper, we study the UAV task offloading problem in an online setting and formulate it as a multi-armed bandit (MAB) problem with time-varying resource constraints. We then propose VR-LATOS, a learning-aided offloading scheme that learns the unknown statistics from feedback signals while making effective offloading decisions online. Results from both theoretical analysis and simulations demonstrate that VR-LATOS outperforms state-of-the-art schemes.
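To make the MAB framing concrete, the sketch below illustrates the generic bandit view of helper selection: each helper UAV is an arm, and the leader learns helpers' unknown statistics from reward feedback. This is only a minimal UCB1-style illustration under assumed Bernoulli rewards; VR-LATOS itself, its constraint handling, and the names used here (`UCB1Offloader`, `true_rates`) are not taken from the paper.

```python
import math
import random

class UCB1Offloader:
    """Hedged sketch: UCB1 arm selection where arms model helper UAVs.

    Reward feedback (e.g., a proxy for successful, timely sub-task
    completion) is a hypothetical choice for illustration only.
    """

    def __init__(self, n_helpers):
        self.n = n_helpers
        self.counts = [0] * n_helpers   # times each helper was selected
        self.means = [0.0] * n_helpers  # empirical mean reward per helper

    def select(self):
        # Play every helper once before applying the UCB rule.
        for i in range(self.n):
            if self.counts[i] == 0:
                return i
        t = sum(self.counts)
        # Pick the helper maximizing empirical mean + exploration bonus.
        return max(
            range(self.n),
            key=lambda i: self.means[i]
            + math.sqrt(2.0 * math.log(t) / self.counts[i]),
        )

    def update(self, i, reward):
        # Incremental mean update from the observed feedback signal.
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]

# Toy simulation with three helpers of hypothetical success rates.
random.seed(0)
true_rates = [0.3, 0.8, 0.5]
agent = UCB1Offloader(len(true_rates))
for _ in range(2000):
    arm = agent.select()
    agent.update(arm, 1.0 if random.random() < true_rates[arm] else 0.0)

best = max(range(len(true_rates)), key=lambda i: agent.counts[i])
```

Over enough rounds, the leader concentrates offloading on the statistically best helper (index 1 here) without ever knowing the rates in advance, which is the essence of the online setting the paper targets; the time-varying resource constraints that distinguish VR-LATOS are omitted from this sketch.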