With an increasing demand for spectrum, dynamic spectrum access (DSA) has been proposed as viable means for providing the flexibility and greater access to spectrum necessary to meet this demand. Within the DSA concept, unlicensed secondary users temporarily "borrow" or access licensed spectrum, while respecting the licensed primary user's rights to that spectrum. As key enablers for DSA, cognitive radios (CRs) are based on software-defined radios which allow them to sense, learn, and adapt to the spectrum environment. These radios can operate independently and rapidly switch channels. Thus, the initial setup and maintenance of cognitive radio networks are dependent upon the ability of CR nodes to find each other, in a process known as rendezvous, and create a link on a common channel for the exchange of data and control information. In this paper, we propose a novel rendezvous protocol, known as QLP, which is based on Q-learning and the p-persistent CSMA protocol. With the QLP protocol, CR nodes learn which channels are best for rendezvous and thus adapt their behavior to visit those channels more frequently. We demonstrate through simulation that the QLP protocol provides a rendevous capability for DSA environments with different dynamics of PU activity, while attempting to achieve the following performance goals: (1) minimize the average time-to-rendezvous, (2) maximize system throughput, (3) minimize primary user interference, and (4) minimize collisions among CR nodes.