Designing the switch-controller association scheme is an essential problem for software-defined networking (SDN) systems. A natural approach is to treat it as a stable matching problem, since each switch (controller) typically prefers to be associated with controllers (switches) that incur lower communication costs and control traffic overhead. In practice, however, these system dynamics are usually unknown a priori, which makes the problem challenging and still open. In this paper, we study stable matching between switches and controllers with unknown communication costs from the perspective of multi-agent multi-armed bandit (MAMAB) learning. By integrating stable matching with online learning, we propose an effective Learning-aided Switch-controller Stable Matching (LS2M) scheme. Our theoretical analysis shows that LS2M achieves a switch-optimal stable matching with regret that grows sublinearly in the number of time slots. Moreover, numerical simulations verify that LS2M outperforms various baseline schemes.
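To make "switch-optimal stable matching with unknown costs" concrete, the sketch below combines two standard ingredients. It is not LS2M itself, whose details are not given here. The first ingredient is an optimistic lower-confidence-bound (LCB) estimate of each switch-controller cost, computed from observed samples. The second is switch-proposing deferred acceptance (Gale-Shapley) with controller capacities, which returns the switch-optimal stable matching for whatever cost estimates it is given. The helper names (`lcb_costs`, `switch_proposing_da`), the confidence radius `sqrt(2 ln t / n)`, the capacity model, and the assumption that controller-side preferences are known are all hypothetical choices made for illustration.

```python
import math

def lcb_costs(sum_costs, counts, t):
    """Hypothetical helper: optimistic (lower-confidence-bound) cost estimates.

    sum_costs[i][j], counts[i][j]: total observed cost and number of samples
    for switch i associated with controller j; t: current time slot (t >= 1).
    Lower estimated cost means more preferred, so subtracting the confidence
    radius makes rarely tried controllers look attractive (exploration).
    """
    est = []
    for i in range(len(counts)):
        row = []
        for j in range(len(counts[i])):
            n = counts[i][j]
            if n == 0:
                row.append(float("-inf"))  # never tried: most optimistic value
            else:
                row.append(sum_costs[i][j] / n - math.sqrt(2 * math.log(t) / n))
        est.append(row)
    return est

def switch_proposing_da(switch_cost, ctrl_cost, capacity):
    """Switch-proposing deferred acceptance with controller capacities.

    switch_cost[i][j]: cost switch i assigns to controller j (lower preferred).
    ctrl_cost[j][i]: cost controller j assigns to switch i (assumed known here).
    capacity[j]: maximum number of switches controller j can serve.
    Returns match[i] = index of the controller of switch i, or None.
    """
    n_sw, n_ct = len(switch_cost), len(capacity)
    # Each switch ranks controllers from lowest to highest (estimated) cost.
    prefs = [sorted(range(n_ct), key=lambda j: switch_cost[i][j])
             for i in range(n_sw)]
    next_idx = [0] * n_sw                # next controller each switch proposes to
    held = [[] for _ in range(n_ct)]     # tentatively accepted switches
    free = list(range(n_sw))
    while free:
        i = free.pop()
        if next_idx[i] >= n_ct:
            continue  # switch i was rejected by every controller
        j = prefs[i][next_idx[i]]
        next_idx[i] += 1
        held[j].append(i)
        if len(held[j]) > capacity[j]:
            # Over capacity: controller j rejects its least preferred switch.
            worst = max(held[j], key=lambda s: ctrl_cost[j][s])
            held[j].remove(worst)
            free.append(worst)
    match = [None] * n_sw
    for j, switches in enumerate(held):
        for i in switches:
            match[i] = j
    return match

# Usage: 3 switches, 2 controllers; every switch prefers controller 0,
# which can serve only one switch and prefers switch 1.
switch_cost = [[1, 5], [2, 3], [1, 4]]
ctrl_cost = [[3, 1, 2], [1, 2, 3]]
print(switch_proposing_da(switch_cost, ctrl_cost, [1, 2]))  # [1, 0, 1]
```

In a bandit loop under these assumptions, each time slot would recompute `lcb_costs` from the costs observed so far, feed the estimates to `switch_proposing_da` as `switch_cost`, apply the resulting association, and record the new cost observations. The proposing side matters: because switches propose, the outcome is the stable matching that every switch weakly prefers to any other stable matching, which is the switch-optimal notion the abstract refers to.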