The starting point is that daily returns are not informative enough about vol. If a stock price went way up and then came back to where it started, the daily return could be zero. But the price path suggests there was significant volatility.
If you looked at the price path at higher frequencies, say every 5 minutes, you would see on that day there were a lot of positive returns followed by a lot of negative returns, i.e. plenty of volatility. So there's a benefit to looking at higher frequencies. 5 minutes is the common choice, but I'm not saying it's the best one in all cases.
If you have a dataset containing the price of a stock every 5 minutes, you can compute the volatility of 5-minute returns for each day in your dataset. This is what we'll use to compute the "X" variables in a regression.
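As a rough sketch of that step, assuming the prices sit in a pandas Series with a DatetimeIndex at 5-minute spacing (the function name `daily_five_minute_vol` and the toy numbers are made up for illustration), you could compute one volatility per day like this. The toy day at the end is the "went up and came back" case: the open-to-close return is zero, but the 5-minute volatility is not.

```python
import numpy as np
import pandas as pd


def daily_five_minute_vol(prices: pd.Series) -> pd.Series:
    """Standard deviation of 5-minute log returns, one value per trading day.

    Assumes `prices` is indexed by a DatetimeIndex at 5-minute spacing.
    """
    log_p = np.log(prices)
    day = log_p.index.normalize()
    # Difference within each day so the overnight gap is not counted
    # as a 5-minute return.
    intraday_ret = log_p.groupby(day).diff().dropna()
    return intraday_ret.groupby(intraday_ret.index.normalize()).std()


# Toy day: the price rises and then returns to where it started.
idx = pd.date_range("2024-01-02 09:30", periods=5, freq="5min")
prices = pd.Series([100.0, 102.0, 104.0, 102.0, 100.0], index=idx)

open_to_close = np.log(prices.iloc[-1] / prices.iloc[0])  # zero return
five_min_vol = daily_five_minute_vol(prices)              # positive volatility
```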
There are 3 problems (at least) with 5-minute volatilities. One is that you can only compute them while the markets are active, so they miss the overnight move that a close-to-close return includes. The second is that they are probably contaminated by bid-ask spreads, short-run liquidity effects, etc. The third is that volatility is mean reverting, so if yesterday's volatility is unusually high or low, tomorrow's will most likely be less so. So while the 5-minute volatilities will be informative, they will probably be biased. Thus, we would not want to assume, for instance, that yesterday's 5-minute volatility tells us directly what the volatility of today's close-to-close return is.
What HAR-RV does is to remove these biases by running a regression. The simplest version would be to take the 5-minute volatility from yesterday and transform it in two ways. First, square it so that it becomes a variance instead of a volatility. Second, and this has no effect other than interpretability, multiply it by the number of 5-minute intervals in the day. This transformed variable is your "X".
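In code the transformation is a one-liner. This is only a sketch: the helper name `to_daily_variance` is hypothetical, and the default of 78 intervals assumes a 6.5-hour trading session (6.5 hours times 12 five-minute intervals per hour), so change it for your market.

```python
def to_daily_variance(five_min_vol, intervals_per_day=78):
    """Square a 5-minute volatility and scale it up to a daily variance.

    intervals_per_day=78 is an assumption (a 6.5-hour session), not a rule.
    Works on floats, NumPy arrays, or pandas Series.
    """
    return intervals_per_day * five_min_vol ** 2
```

Since the scaling is a constant, it only rescales the slope coefficient in the regression; the fitted forecasts are the same either way.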
Your "Y" variable is the squared close-to-close return on the next day. So it's a predictive regression. A variance you see today is predicting a squared return you see tomorrow.
At a 1-day horizon, expected returns tend to be small, so the expected squared return is very close to the variance of the return. So if the regression is telling you that E[Y] = a + b*X, then a + b*X is your variance forecast and sqrt(a + b*X) is your volatility forecast.
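Here is a minimal sketch of the single-regressor version using plain OLS via NumPy. It assumes `rv` (the transformed 5-minute variance) and `sq_ret` (the squared close-to-close return) are arrays aligned by trading day; the function names are made up for illustration. Clipping the forecast at zero is my own safeguard, since nothing in OLS stops a + b*X from going negative.

```python
import numpy as np


def fit_variance_regression(rv, sq_ret):
    """OLS of tomorrow's squared return on today's 5-minute variance.

    Assumes rv[t] and sq_ret[t] refer to the same trading day t.
    Returns the intercept a and slope b.
    """
    rv = np.asarray(rv, dtype=float)
    sq_ret = np.asarray(sq_ret, dtype=float)
    x = rv[:-1]      # variance observed on day t
    y = sq_ret[1:]   # squared close-to-close return on day t+1
    design = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)


def forecast_vol(a, b, rv_today):
    """Volatility forecast sqrt(a + b*X) for the next day."""
    variance = a + b * rv_today
    # Added safeguard (an assumption): a linear fit can produce a
    # negative variance forecast, so floor it at zero.
    return float(np.sqrt(max(variance, 0.0)))
```

You would call `fit_variance_regression` once on your history and then `forecast_vol(a, b, rv[-1])` each day with the latest 5-minute variance.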
This is simpler than GARCH because no numerical optimization is required, and more accurate because it uses 5-minute returns. You can add additional X variables to capture different forms of mean reversion. A single lagged variance is probably simpler than you would want.
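One common way to add those extra X variables, and the one usually meant by "HAR", is to use trailing averages of the daily variance over roughly a day, a week and a month. The sketch below assumes windows of 1, 5 and 22 trading days, which is the usual convention rather than anything stated above, and the function names are hypothetical. It is still just OLS.

```python
import numpy as np
import pandas as pd


def har_design(rv: pd.Series, windows=(1, 5, 22)) -> pd.DataFrame:
    """Trailing means of the daily variance, one column per window.

    The default windows (1, 5, 22) are an assumed daily/weekly/monthly
    split; each mean ends on the current day.
    """
    return pd.DataFrame({f"rv_mean_{w}d": rv.rolling(w).mean() for w in windows})


def fit_har(rv: pd.Series, sq_ret: pd.Series, windows=(1, 5, 22)) -> pd.Series:
    """OLS of the next day's squared return on the trailing variance means."""
    X = har_design(rv, windows)
    y = sq_ret.shift(-1).rename("y")  # squared return on day t+1
    data = pd.concat([X, y], axis=1).dropna()
    design = np.column_stack([np.ones(len(data)), data[X.columns].to_numpy()])
    coef, *_ = np.linalg.lstsq(design, data["y"].to_numpy(), rcond=None)
    return pd.Series(coef, index=["const", *X.columns])
```

The variance forecast is then the constant plus the coefficients times the latest row of `har_design(rv)`, and the volatility forecast is its square root, same as before.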