American options allow early exercise, which introduces an additional challenge, beyond choosing the weight of each option, when optimizing a portfolio of American options. In this work, we construct strategies for a portfolio of American options by jointly determining optimal exercise times and optimal portfolio weights. To model such portfolios, we propose a reinforcement learning (Q-learning) algorithm that combines an iterative progressive hedging method with a regression-based quadratic approximation of the Q-values. Using Monte Carlo simulation and empirical experiments on data from the SPY options market, we evaluate the quality of our algorithms and examine their performance under various investment assumptions, such as different portfolio settings and different distributions of the underlying asset returns. With discretized exercise timings, our strategies perform better over relatively long time horizons and when the portfolio is hedged with the underlying instrument. Given the highly leveraged and risky nature of these strategies, they are shown to be unsuitable for strongly risk-averse investors.
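
The core idea of approximating the continuation value (the Q-value of the "hold" action) by a quadratic regression can be illustrated with a standard least-squares Monte Carlo sketch for a single American put. This is not the paper's full portfolio algorithm; the contract parameters, the GBM dynamics, and the single-option setting are all simplifying assumptions for illustration.

```python
import numpy as np

def american_put_lsm(s0=100.0, k=100.0, r=0.05, sigma=0.2,
                     t=1.0, n_steps=50, n_paths=20000, seed=0):
    """Least-squares Monte Carlo price of an American put.

    At each discretized exercise time, the continuation value is
    approximated by a quadratic regression on the underlying price,
    mirroring the quadratic Q-value approximation described above.
    All parameters here are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    disc = np.exp(-r * dt)

    # Simulate geometric Brownian motion paths of the underlying.
    z = rng.standard_normal((n_paths, n_steps))
    log_increments = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    s = s0 * np.exp(np.hstack([np.zeros((n_paths, 1)),
                               np.cumsum(log_increments, axis=1)]))

    # Backward induction: cash flows start as the terminal payoffs.
    cash = np.maximum(k - s[:, -1], 0.0)
    for i in range(n_steps - 1, 0, -1):
        cash *= disc  # discount one step back
        payoff = np.maximum(k - s[:, i], 0.0)
        itm = payoff > 0  # regress only on in-the-money paths
        if itm.sum() > 10:
            # Quadratic fit of discounted continuation on the price.
            coeffs = np.polyfit(s[itm, i], cash[itm], deg=2)
            continuation = np.polyval(coeffs, s[itm, i])
            # Exercise where the immediate payoff beats continuation.
            idx = np.where(itm)[0][payoff[itm] > continuation]
            cash[idx] = payoff[idx]
    return disc * cash.mean()
```

For the parameters above, the estimate should land near the known American put value (around 6), slightly above the European price, since the regression captures the value of early exercise.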