Jackson Hole and Teton County Hospitality Forecasting
Methodology
Data
The data used in forecasting are provided by the JHCC. Because these data are proprietary,
they are described here but not reproduced.
The JHCC collects data from participating resorts and other lodging providers and
aggregates them to the destination level. For each month from 05-2020 to the
present, the data contain average monthly occupancy rates (OR), average
daily rate (ADR), revenue per available room (RevPAR), and total revenue collected
by all in-sample lodging providers (in-sample revenue [ISR]), along with other variables
not used in this analysis. In addition, the data have “on-the-books” (OTB) OR, ADR,
RevPAR, and ISR for each of the following six months. OTB data are calculated in the
same way as monthly data, but from the reservations on the books in that month. Thus, for example,
the month of 12-2023 has data on realized OR, ADR, RevPAR, and total revenue as well
as OTB data for each variable for 01-2024, 02-2024, 03-2024, 04-2024, 05-2024, and 06-2024.
OR is measured as the average proportion of total rooms available that are occupied
(or reserved in the case of OTB OR) in a month. ISR and ADR are measured in nominal
US dollars. Thus, adjustments for inflation may be necessary if using the forecasts
for economic purposes.
The data are reshaped so that each month is an observation with values for
realized OR, ADR, RevPAR, and total revenue, plus OTB values for each variable from
one, two, three, four, five, and six months prior. Table 1 gives an example of what OR looks like in the reshaped data:
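Table 1: Example data table for OR.

| Month | Actual OR | OTB OR one month prior | OTB OR two months prior | OTB OR three months prior | OTB OR four months prior | OTB OR five months prior | OTB OR six months prior |
|---|---|---|---|---|---|---|---|
| 10-2023 | # | # | # | # | # | # | # |
| 11-2023 | # | # | # | # | # | # | # |
| 12-2023 | # | # | # | # | # | # | # |
| 01-2024 | # | # | # | # | # | # | # |
| 02-2024 | NA | NA | # | # | # | # | # |
| 03-2024 | NA | NA | NA | # | # | # | # |
| 04-2024 | NA | NA | NA | NA | # | # | # |
| 05-2024 | NA | NA | NA | NA | NA | # | # |
| 06-2024 | NA | NA | NA | NA | NA | NA | # |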
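The reshaping step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `MM-YYYY` label format, the record layout, and all numbers are assumptions made for the example, since the JHCC data are proprietary.

```python
def parse_month(label):
    """Turn an 'MM-YYYY' label into a running month index."""
    month, year = label.split("-")
    return int(year) * 12 + int(month) - 1


def reshape_otb(actuals, otb_records, max_lead=6):
    """Build one row per target month: realized value plus OTB values by lead.

    actuals: dict mapping 'MM-YYYY' -> realized value
    otb_records: iterable of (report_month, target_month, value)
    Cells with no data stay None (the NA entries in Table 1).
    """
    rows = {}

    def row_for(month):
        empty = {"actual": None}
        empty.update({f"otb_{h}": None for h in range(1, max_lead + 1)})
        return rows.setdefault(month, empty)

    for month, value in actuals.items():
        row_for(month)["actual"] = value
    for report_month, target_month, value in otb_records:
        lead = parse_month(target_month) - parse_month(report_month)
        if 1 <= lead <= max_lead:
            row_for(target_month)[f"otb_{lead}"] = value
    # Return rows in calendar order.
    return dict(sorted(rows.items(), key=lambda item: parse_month(item[0])))


# Illustrative (made-up) occupancy figures.
actuals = {"12-2023": 0.61, "01-2024": 0.58}
otb_records = [
    ("12-2023", "01-2024", 0.52),  # on the books one month ahead
    ("12-2023", "02-2024", 0.40),  # on the books two months ahead
    ("01-2024", "02-2024", 0.47),  # on the books one month ahead
]
wide = reshape_otb(actuals, otb_records)
```

Here `wide["02-2024"]` has no realized value yet but carries OTB values at leads one and two, matching the pattern of NA cells in Table 1.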
Methodology
The base forecasts are those of OR, ADR, and ISR. These three variables are forecasted
directly from the data provided by the JHCC. Other variables incorporate these three
base forecasts into their methodology.
OR, ADR, and Total Revenue
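OR, ADR, and ISR are forecast using the equation
$$Y^k_t = \beta^k_{0,h} + \beta^k_{1,h}\, OTB^k_{t,h} + \sum_{j=1}^{3} \gamma^k_{j,h}\, Y^k_{t-h-j+1} + \varepsilon^k_{t,h} \tag{1}$$
where $k$ denotes the variable of interest, $h$ is the number of months in advance the forecast is made, $Y^k_t$ is the value for outcome $k$ in month $t$, $OTB^k_{t,h}$ is the OTB value from $h$ months before for month $t$, $\varepsilon^k_{t,h}$ is a random error term, and the $\beta$’s and $\gamma$’s are regression parameters. So, for example, the one-month-ahead forecast of OR
is estimated as
$$\widehat{OR}_t = \hat{\beta}^{OR}_{0,1} + \hat{\beta}^{OR}_{1,1}\, OTB^{OR}_{t,1} + \hat{\gamma}^{OR}_{1,1}\, OR_{t-1} + \hat{\gamma}^{OR}_{2,1}\, OR_{t-2} + \hat{\gamma}^{OR}_{3,1}\, OR_{t-3}$$
where hats denote estimated variables and parameters.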
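As a rough sketch of how a regression of this kind can be estimated, the code below regresses an outcome on an intercept, the OTB value recorded h months earlier, and three lagged actual values, which is the regressor set listed in Table 3. Everything here is an assumption for illustration: the helper names (`solve`, `ols`, `design`), the pure-Python least-squares solver, and the simulated noise-free series. It is not the software or data used for the report.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def design(actual, otb, h, n_lags=3):
    """Rows [1, OTB for month t, Y[t-h], ..., Y[t-h-n_lags+1]] with target Y[t].

    otb[t] is the value on the books for month t, recorded h months earlier.
    """
    X, y = [], []
    for t in range(h + n_lags - 1, len(actual)):
        lags = [actual[t - h - j] for j in range(n_lags)]
        X.append([1.0, otb[t]] + lags)
        y.append(actual[t])
    return X, y


# Simulated, noise-free data on a transformed scale (hypothetical coefficients).
rng = random.Random(0)
true_coefs = [0.2, 0.9, 0.05, 0.02, -0.01]
h = 1
otb = [rng.uniform(-1.0, 1.0) for _ in range(40)]
actual = [0.0, 0.1, -0.1]  # arbitrary starting values
for t in range(3, 40):
    regressors = [1.0, otb[t], actual[t - 1], actual[t - 2], actual[t - 3]]
    actual.append(sum(c * v for c, v in zip(true_coefs, regressors)))

X, y = design(actual, otb, h)
coefs = ols(X, y)

# One-month-ahead forecast given a hypothetical OTB value of 0.3.
next_month = [1.0, 0.3, actual[-1], actual[-2], actual[-3]]
forecast = sum(c * v for c, v in zip(coefs, next_month))
```

Because the simulated series has no noise, the fitted coefficients reproduce `true_coefs`; with real data the estimates would carry sampling error.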
Table 2 highlights the values that would be used in an example forecast for the month 04-2024
using the data available as of 01-2024.
Table 2: Example data table for OR. Highlighted values are those used to forecast
OR for 04-2024.

| Month | Actual OR | OTB OR one month prior | OTB OR two months prior | OTB OR three months prior | OTB OR four months prior | OTB OR five months prior | OTB OR six months prior |
|---|---|---|---|---|---|---|---|
| 10-2023 | # | # | # | # | # | # | # |
| 11-2023 | # | # | # | # | # | # | # |
| 12-2023 | # | # | # | # | # | # | # |
| 01-2024 | # | # | # | # | # | # | # |
| 02-2024 | NA | NA | # | # | # | # | # |
| 03-2024 | NA | NA | NA | # | # | # | # |
| 04-2024 | NA | NA | NA | NA | # | # | # |
| 05-2024 | NA | NA | NA | NA | NA | # | # |
| 06-2024 | NA | NA | NA | NA | NA | NA | # |
For OR, the data are first transformed using the logit function as the values of OR
are bounded between 0 and 1 (average proportion of rooms occupied). ADR and total
revenue are transformed using the natural log. In addition to forecasts, 95% confidence
intervals are constructed using the standard errors for each forecast.
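The transformations and their inverses can be sketched as below. The text does not say how intervals are mapped back to the original scale, so this sketch assumes a normal critical value of 1.96 and back-transforms the interval endpoints. The function names and example numbers are hypothetical.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map a real number back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def occupancy_interval(point_logit, std_error, z=1.96):
    """(lower, point, upper) for OR, back-transformed from the logit scale."""
    lower = inv_logit(point_logit - z * std_error)
    upper = inv_logit(point_logit + z * std_error)
    return lower, inv_logit(point_logit), upper


def log_scale_interval(point_log, std_error, z=1.96):
    """(lower, point, upper) for a log-transformed variable such as ISR."""
    lower = math.exp(point_log - z * std_error)
    upper = math.exp(point_log + z * std_error)
    return lower, math.exp(point_log), upper


# Hypothetical forecast: 60% occupancy with a standard error of 0.10 on the logit scale.
or_lower, or_point, or_upper = occupancy_interval(logit(0.60), 0.10)
```

One consequence of working on the logit scale is that the back-transformed bounds always stay between 0 and 1, which a symmetric interval on the raw proportion would not guarantee.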
One may wonder why Equation 1 uses OTB data from only a single month. For example, when forecasting OR for 04-2024 from
Table 2, there is OTB data for bookings made four, five, and six months in advance. Why,
then, does Equation 1 use only the OTB data from four months in advance? The primary reason is that the data
series used in forecasting is not long: as of 12-2024 there were only 56 months of
data. This means that adding variables to the analysis risks overfitting
the forecasting models to the data, leading to poorer forecast performance. Additionally,
for forecasts with shorter time horizons (1-3 months in advance), the additional variables
do not provide much extra forecasting power because the $R^2$ is already very high. More information on forecasting diagnostics is in Section 3.
An additional concern is the issue of stationarity. Equation 1 is an autoregressive model with three lags and an exogenous predictor,
the OTB value. Ordinarily, when forecasting using an autoregressive model, care needs to be taken
to test that the series is stationary. A stationary series is one that is generated
by a process that has no overall increasing or decreasing trend, and whose variance
is finite. However, testing shows that the OTB values are very good predictors independent
of any autoregressive terms. Thus, any forecasting bias introduced by nonstationarity
is likely corrected by using the OTB values.
A more pressing concern for the forecasts of ADR, OR, and total revenue is selection
bias. Resorts and other lodging providers opt in to providing data to the JHCC. Survey
participants likely have commonalities that influence their decision to submit data,
and likewise providers that did not submit data likely have traits in common. This
means that the sample of lodging providers in the JHCC data is likely not representative
of all lodging providers in the Jackson Hole or Teton County area. Individual hospitality
providers should be aware that there may be significant differences between their
operations and the operations of providers in the sample used to construct forecasts.
It is recommended that individual lodging providers track their own OR, ADR, revenues,
and RevPAR to determine how their own statistics differ from the forecasts presented
here before relying on these forecasts to make internal business decisions. At minimum,
the forecasts of OR, ADR, ISR, and RevPAR serve as a barometer for comparing an individual
provider’s operations to providers in the JHCC data. Forecasts of tax collections
and hospitality revenues for Teton County are corrected for selection bias as described
in Section 2.3.
RevPAR
Since RevPAR is a calculated statistic, that is,
$$RevPAR = ADR \times OR \tag{2}$$
it is not independently forecast. Instead, the RevPAR forecast is constructed using
forecasted ADR and OR following Equation 2. Confidence intervals for RevPAR are constructed by plugging the lower and upper bounds
of the confidence intervals for OR and ADR into Equation 2.
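A short sketch of this calculation follows, with made-up forecast values. The function name and the `(lower, point, upper)` tuple layout are assumptions for illustration.

```python
def revpar_interval(adr, occupancy):
    """Combine (lower, point, upper) triples for ADR and OR into RevPAR.

    Each RevPAR value is the product of the matching ADR and OR values:
    lower with lower, point with point, upper with upper.
    """
    return tuple(a * o for a, o in zip(adr, occupancy))


# Hypothetical forecasts: ADR in nominal US dollars, OR as a proportion.
adr_forecast = (380.0, 400.0, 420.0)
or_forecast = (0.5, 0.625, 0.75)
revpar = revpar_interval(adr_forecast, or_forecast)
```

Multiplying bound by bound is simple and keeps the interval ordered, but the resulting band need not have exactly 95% coverage, since it does not account for how the ADR and OR forecast errors move together.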
Taxes and County-Wide Lodging Sales
A statewide 3% tax is levied on lodging transactions in Wyoming that is used to fund
the Wyoming Office of Tourism. Teton County levies an additional 2% tax that is spent
on various tourism initiatives and programs within Teton County. Forecasting in-sample
tax collections would be a simple matter of multiplying ISR by 0.03 for the state tax and by 0.02 for the county tax. Unfortunately,
as mentioned previously, the JHCC sample is likely not representative of Teton County
lodging providers. Therefore, the forecast obtained from this method would not be
representative of Teton County.
However, in addition to data collected from lodging providers, the JHCC keeps a record
of the county (2%) lodging taxes collected in Teton County each month from 07-2021
to the present. Using this data allows us to correct for selection bias in estimating
tax collections for Teton County. Unfortunately, without more information about lodging
providers in Jackson Hole, it is not possible to correct for sample bias in forecasting
tax collections in Jackson Hole; therefore, we limit our attention to tax collections
county-wide. The method outlined below also allows us to forecast monthly lodging
sales for all of Teton County without selection bias.
Inspection of the data revealed that lodging tax collections lag sales by two months.
Forecasting tax collections involves two steps. First, monthly tax collections are
regressed on total revenues from the JHCC hospitality sample two months prior:
$$\hat{T}_t = \hat{\alpha}_0 + \hat{\alpha}_1\, ISR_{t-2} \tag{3}$$
where $\hat{T}_t$ is the estimated tax collected in month $t$ and $ISR_{t-2}$ are total lodging sales from the JHCC sample in month $t-2$. Notice that this model, unlike the models for OR, ADR, and ISR, has no autoregressive
terms in it. The reason for this is that there is a known relationship between lodging
revenues and tax collections, as a certain percentage of lodging sales must be collected
as taxes in each month, irrespective of collections the month before. Any autoregressive
components of lodging sector revenues would be irrelevant to the model.
Next, forecasted ISR values are plugged into Equation 3 to produce forecasted tax collections. To construct 95% confidence intervals for
tax forecasts, the upper and lower bounds of the 95% confidence interval
of forecasted revenues are also plugged into Equation 3. The lower bound of the 95% confidence interval for tax
collections is the value obtained from the lower bound of the 95% confidence interval for revenues.
The upper bound is calculated similarly.
Calculating the forecasted 3% (State) and 5% (Total) lodging taxes involves multiplying
the forecasted 2% (County) lodging taxes by $3/2 = 1.5$ and $5/2 = 2.5$, respectively. Similarly, forecasted total lodging sales are calculated by multiplying
forecasted 2% (County) lodging taxes by $1/0.02 = 50$.
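The scaling factors follow directly from the tax rates: the state tax is 3/2 of the county tax, the combined tax is 5/2 of it, and taxable sales are the county tax divided by 0.02. A minimal sketch, with a hypothetical function name and a made-up county tax figure:

```python
COUNTY_PCT = 2  # Teton County lodging tax rate, in percent
STATE_PCT = 3   # Wyoming statewide lodging tax rate, in percent


def scale_county_tax(county_tax):
    """Derive state tax, total tax, and lodging sales from the 2% county tax."""
    return {
        "state_tax": county_tax * STATE_PCT / COUNTY_PCT,
        "total_tax": county_tax * (STATE_PCT + COUNTY_PCT) / COUNTY_PCT,
        "lodging_sales": county_tax * 100 / COUNTY_PCT,
    }


# Hypothetical forecast of county lodging tax collections for one month.
example = scale_county_tax(1_000_000)
```

The same function can be applied to the lower and upper bounds of the county tax forecast to carry the interval through to the derived quantities.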
Economic Indicators
The final set of forecasts is for economic indicators. Specifically, we forecast
economic output, employee compensation, contribution to GDP (also known as value added),
and employment supported by the lodging sector in Teton County for the next six months.
Economic indicators are forecast using an economic impact analysis in IMPLAN, a regional
input-output modeling tool.
Economic Impact Analysis
Economic impact analyses are a widely accepted research approach used to better comprehend
how the operations of an economic entity impact the economy as a whole. They are also
used to study how a new event or a change in an industry changes local and state economies.
These analyses typically use input-output methodologies to re-create inter-industry
linkages and calculate the impact on a regional economy. We used the IMpact Analysis
for PLANning (IMPLAN) (version 3.1) software package to conduct our analysis.
An economic impact analysis calculates three kinds of effects from economic activity:
direct, indirect, and induced impacts. Direct impacts are the economic activity in the sector under examination.
Employees in the lodging sector, for example, are the direct impact on employment
of revenues in the lodging sector. Sales in the lodging sector are also spent on intermediate
goods: materials, supplies, and services from industries that support the lodging
sector. Economic activity in supporting sectors is counted in indirect impacts. For
example, if a hotel purchases cleaning supplies from a wholesaler in Teton County,
the employees supported by this transfer of funds are counted in indirect impacts
on employment. Indirect impacts also include impacts from suppliers of suppliers,
so long as they are within the region being studied. Finally, induced impacts are
the result of economic activity generated by workers all along the supply chain as
they spend their wages in the local economy. For example, if an employee of a hotel
supplier in Teton County eats a meal at a local restaurant, this induces economic
activity in the restaurant sector.
Total forecasted lodging sector sales for the next six months are used as input to
IMPLAN modeling. It is important to use total sales and not sales for each month because
IMPLAN is constructed to produce annual impacts, not monthly impacts. Forecasts for
individual months would need to be adjusted for seasonal effects that are not present
in IMPLAN. Forecasts for the total six-month period are, therefore, more accurate
in aggregate than forecasting for individual months.
Employment forecasts from IMPLAN must also be adjusted due to differences in the time
horizon of forecasted lodging sector sales and IMPLAN modeling assumptions. A unit
of employment in IMPLAN corresponds to one full-time job being present for a full
year; therefore, the employment forecast directly obtained from IMPLAN is the number
of jobs supported by revenue in the lodging industry if the revenue input is understood
to be the total revenue in the lodging industry for the full year. Since the lodging
sector sales forecast is for a six-month period, IMPLAN’s forecasted employment figure
underreports the average number of employees by about half. To harmonize the forecast
period, employment figures are multiplied by two. This results in a better estimate
of total employment supported by the lodging sector on average over the six-month
period.
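The two adjustments described above (aggregating six monthly forecasts into one input, then rescaling the employment figure) can be sketched as follows. IMPLAN itself is not called here; the function names and all figures are hypothetical.

```python
def six_month_sales(monthly_sales):
    """Total forecasted lodging sales over the six-month horizon."""
    if len(monthly_sales) != 6:
        raise ValueError("expected six monthly forecasts")
    return sum(monthly_sales)


def average_employment(implan_jobs, months_covered=6):
    """Convert IMPLAN's annual job count to average jobs over the period.

    IMPLAN treats the revenue input as a full year of activity, so a
    six-month input yields about half the jobs actually present on
    average; scaling by 12 / months_covered (that is, by two) corrects this.
    """
    return implan_jobs * 12 / months_covered


# Made-up monthly sales forecasts (nominal US dollars) and IMPLAN job output.
sales_input = six_month_sales([30e6, 25e6, 20e6, 15e6, 35e6, 60e6])
jobs = average_employment(1500)
```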
Forecast Diagnostics
This section analyzes the results of the regression-based forecasts to test for accuracy.
First, we report the results of estimating Equation 1 for OR, ADR, and ISR.
OR
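Table 3: Results of OR Forecast Models

| Outcome: OR | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
|---|---|---|---|---|---|---|
| Intercept | 0.391*** | 0.761*** | 1.029*** | 1.156*** | 1.197*** | 1.355*** |
| | (0.022) | (0.024) | (0.066) | (0.088) | (0.123) | (0.151) |
| OR OTB 1 Month Prior | 0.935*** | | | | | |
| | (0.021) | | | | | |
| OR OTB 2 Months Prior | | 0.897*** | | | | |
| | | (0.040) | | | | |
| OR OTB 3 Months Prior | | | 0.833*** | | | |
| | | | (0.050) | | | |
| OR OTB 4 Months Prior | | | | 0.809*** | | |
| | | | | (0.050) | | |
| OR OTB 5 Months Prior | | | | | 0.781*** | |
| | | | | | (0.060) | |
| OR OTB 6 Months Prior | | | | | | 0.788*** |
| | | | | | | (0.080) |
| OR_{t-1} | 0.038 | | | | | |
| | (0.030) | | | | | |
| OR_{t-2} | 0.032 | 0.106*** | | | | |
| | (0.025) | (0.031) | | | | |
| OR_{t-3} | -0.024 | -0.073** | 0.000 | | | |
| | (0.019) | (0.035) | (0.041) | | | |
| OR_{t-4} | | -0.029 | -0.125*** | -0.090* | | |
| | | (0.028) | (0.042) | (0.053) | | |
| OR_{t-5} | | | 0.030 | -0.018 | 0.035 | |
| | | | (0.046) | (0.055) | (0.054) | |
| OR_{t-6} | | | | 0.106*** | 0.080* | 0.186*** |
| | | | | (0.026) | (0.045) | (0.064) |
| OR_{t-7} | | | | | 0.146*** | 0.032 |
| | | | | | (0.049) | (0.086) |
| OR_{t-8} | | | | | | 0.114** |
| | | | | | | (0.053) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R² | 0.989 | 0.979 | 0.969 | 0.955 | 0.941 | 0.917 |
| Adjusted R² | 0.988 | 0.976 | 0.963 | 0.947 | 0.931 | 0.902 |
| Residual Std. Error | 0.096 (df=26) | 0.132 (df=25) | 0.165 (df=24) | 0.190 (df=23) | 0.221 (df=22) | 0.268 (df=21) |
| F Statistic | 810.386*** (df=4; 26) | 347.136*** (df=4; 25) | 200.894*** (df=4; 24) | 187.882*** (df=4; 23) | 101.434*** (df=4; 22) | 45.440*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01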
ADR
Table 4: Results of ADR Forecast Models

| Outcome: ADR | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
|---|---|---|---|---|---|---|
| Intercept | -1717.462*** | -1726.164*** | -1827.525*** | -1508.951*** | -1345.316*** | -1120.137*** |
| | (97.869) | (223.950) | (109.877) | (111.805) | (163.990) | (238.880) |
| ADR OTB 1 Month Prior | 362.411*** | | | | | |
| | (14.829) | | | | | |
| ADR OTB 2 Months Prior | | 369.552*** | | | | |
| | | (30.779) | | | | |
| ADR OTB 3 Months Prior | | | 370.181*** | | | |
| | | | (16.406) | | | |
| ADR OTB 4 Months Prior | | | | 299.776*** | | |
| | | | | (20.962) | | |
| ADR OTB 5 Months Prior | | | | | 251.212*** | |
| | | | | | (30.634) | |
| ADR OTB 6 Months Prior | | | | | | 225.686*** |
| | | | | | | (40.497) |
| ADR_{t-1} | -0.042 | | | | | |
| | (0.032) | | | | | |
| ADR_{t-2} | -0.064 | -0.157*** | | | | |
| | (0.050) | (0.057) | | | | |
| ADR_{t-3} | -0.053 | -0.089** | -0.136** | | | |
| | (0.041) | (0.045) | (0.058) | | | |
| ADR_{t-4} | | -0.035 | -0.047 | -0.072** | | |
| | | (0.053) | (0.037) | (0.036) | | |
| ADR_{t-5} | | | 0.113** | 0.055 | 0.099 | |
| | | | (0.055) | (0.040) | (0.069) | |
| ADR_{t-6} | | | | 0.226*** | 0.251*** | 0.277*** |
| | | | | (0.049) | (0.052) | (0.067) |
| ADR_{t-7} | | | | | 0.168*** | 0.176*** |
| | | | | | (0.061) | (0.051) |
| ADR_{t-8} | | | | | | -0.095 |
| | | | | | | (0.076) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R² | 0.964 | 0.943 | 0.933 | 0.931 | 0.904 | 0.868 |
| Adjusted R² | 0.958 | 0.934 | 0.921 | 0.919 | 0.886 | 0.843 |
| Residual Std. Error | 27.201 (df=26) | 34.765 (df=25) | 38.415 (df=24) | 38.001 (df=23) | 44.127 (df=22) | 52.055 (df=21) |
| F Statistic | 253.388*** (df=4; 26) | 116.003*** (df=4; 25) | 154.073*** (df=4; 24) | 121.168*** (df=4; 23) | 67.347*** (df=4; 22) | 48.819*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01
ISR
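Table 5: Results of ISR Forecast Models

| Outcome: ISR | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
|---|---|---|---|---|---|---|
| Intercept | 1.207*** | 2.827*** | 5.027*** | 5.674*** | 3.229*** | 2.417** |
| | (0.342) | (0.651) | (0.712) | (0.465) | (0.672) | (1.025) |
| ISR OTB 1 Month Prior | 0.919*** | | | | | |
| | (0.010) | | | | | |
| ISR OTB 2 Months Prior | | 0.849*** | | | | |
| | | (0.014) | | | | |
| ISR OTB 3 Months Prior | | | 0.787*** | | | |
| | | | (0.020) | | | |
| ISR OTB 4 Months Prior | | | | 0.762*** | | |
| | | | | (0.024) | | |
| ISR OTB 5 Months Prior | | | | | 0.755*** | |
| | | | | | (0.031) | |
| ISR OTB 6 Months Prior | | | | | | 0.777*** |
| | | | | | | (0.043) |
| ISR_{t-1} | 0.009 | | | | | |
| | (0.022) | | | | | |
| ISR_{t-2} | 0.016 | 0.034** | | | | |
| | (0.020) | (0.016) | | | | |
| ISR_{t-3} | -0.005 | -0.020 | -0.001 | | | |
| | (0.011) | (0.017) | (0.022) | | | |
| ISR_{t-4} | | -0.013 | -0.064*** | -0.075*** | | |
| | | (0.013) | (0.023) | (0.025) | | |
| ISR_{t-5} | | | -0.000 | 0.015 | 0.019 | |
| | | | (0.025) | (0.021) | (0.026) | |
| ISR_{t-6} | | | | -0.012 | 0.018 | 0.031 |
| | | | | (0.019) | (0.026) | (0.040) |
| ISR_{t-7} | | | | | 0.055 | 0.050 |
| | | | | | (0.035) | (0.037) |
| ISR_{t-8} | | | | | | 0.048 |
| | | | | | | (0.031) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R² | 0.995 | 0.994 | 0.990 | 0.988 | 0.980 | 0.970 |
| Adjusted R² | 0.994 | 0.993 | 0.988 | 0.986 | 0.977 | 0.965 |
| Residual Std. Error | 0.069 (df=26) | 0.074 (df=25) | 0.098 (df=24) | 0.102 (df=23) | 0.133 (df=22) | 0.167 (df=21) |
| F Statistic | 3004.374*** (df=4; 26) | 1763.828*** (df=4; 25) | 616.427*** (df=4; 24) | 752.744*** (df=4; 23) | 472.406*** (df=4; 22) | 245.379*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01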
Note that the number of observations does not incorporate the full sample. This is
because the data used in the model are limited to the past 30 observations. The reason
for limiting the sample is discussed in Section 3.1.
OTB covariates are significant at the 1% level in every regression. Interestingly,
lagged variables are only occasionally significant. This indicates that advance bookings
data are much better at forecasting than are past data. The $R^2$ for ISR is very high; even the six-month-ahead forecast has an $R^2$ of 0.97. The $R^2$ for OR and ADR is not as high, but it remains high even for the six-month-ahead
forecast, at 0.92 and 0.87, respectively.
A high $R^2$ has two possible sources, which may work in conjunction: (1) the model explains
the data very well, or (2) there are too few degrees of freedom left in the model after
estimation. The second source of a high $R^2$ is commonly referred to as overfitting. A model that overfits the data is very good
at predicting in-sample values, but performs poorly when predicting out-of-sample
values. To ensure that the models used in forecasting are not overfit, we use cross
validation (CV) over multiple subsamples.
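One way to see the degrees-of-freedom effect is the standard adjusted $R^2$ formula, which penalizes $R^2$ for the number of regressors $p$ relative to the number of observations $n$. As a worked check using the six-month-ahead OR model in Table 3 ($n = 26$, $p = 4$, $R^2 = 0.917$):

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
          = 1 - (1 - 0.917)\,\frac{25}{21}
          \approx 0.901
```

This agrees with the reported adjusted $R^2$ of 0.902 up to rounding of the inputs. The penalty is small here, which suggests the fit is not mainly an artifact of lost degrees of freedom, though only out-of-sample testing can confirm that.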
Cross Validation
We split the data into consecutive, overlapping training subsamples. The subsamples
are constructed such that each subsequent subsample fully contains the previous one.
For example, subsample 2 would contain all observations in subsample 1 plus an additional
month. Subsample 3 contains all months in both subsamples 1 and 2 plus an additional
month. Subsamples contain a minimum of 24 observations. Each training subsample is
matched with a testing subsample that consists of the next six months of data. The
model is estimated on each subsample and used to forecast into each matched testing
set. The root mean squared forecast error (RMSFE) is calculated for each subsample.
If the model is overfitted, the RMSFE will be relatively poor even for subsamples
that include most of the data.
In addition to testing for overfitting, we use cross validation for model selection.
It may be the case that the relationship between advance bookings data and actual
OR, ADR and RevPAR changes over time such that including data from earlier dates actually
results in poorer forecast performance. If this is the case, a model that excludes
earlier dates will have a better RMSFE than one that includes the whole series. To
test this, we construct two sets of training subsets: one where each subsample contains
all data from previous subsamples (full history subsamples), and one where subsamples
have a maximum length of 30 (rolling subsamples). The model is estimated on each subsample
for each set of subsamples and the RMSFE calculated for each subsample.
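The construction of the two sets of subsamples can be sketched as index windows over the monthly series. The function names and the 40-month example length are assumptions. Each window would be passed to whatever routine estimates the model, which is not shown here.

```python
import math


def training_windows(n_obs, min_train=24, horizon=6, max_len=None):
    """Return (train_start, train_end, test_end) triples; ends are exclusive.

    max_len=None gives full history subsamples (every window starts at 0).
    max_len=30 gives rolling subsamples capped at 30 observations.
    The test set for each window is the `horizon` months after train_end.
    """
    windows = []
    for train_end in range(min_train, n_obs - horizon + 1):
        start = 0 if max_len is None else max(0, train_end - max_len)
        windows.append((start, train_end, train_end + horizon))
    return windows


def rmsfe(actual, forecast):
    """Root mean squared forecast error over a matched testing set."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical series of 40 monthly observations.
full_history = training_windows(40)
rolling = training_windows(40, max_len=30)
```

With 40 observations this yields 11 subsamples in each set. The two sets coincide until the training window exceeds 30 months, after which the rolling windows drop the earliest observations.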
Figure 1 shows the RMSFE for both full history and rolling subsamples when forecasting OR.
In both cases, the RMSFE starts high but begins to decline after the 4th subsample
(6th for full history subsamples). Likely this reflects the uncertainty around lodging
during the COVID-19 pandemic, as the earlier subsamples largely contain data from that
period. As travel became more consistent in later subsamples, the forecasts improve.
Both sets of subsamples lose some forecast accuracy in later forecasts, but the rolling
subsamples perform better. The average RMSFE for subsamples 8-18 is 0.156 for full
history subsamples and 0.133 for rolling subsamples. Thus, we choose rolling subsamples
as the superior forecasting model.
In addition, the RMSFE is relatively small for later subsamples (8-18). For comparison,
the standard deviation of OR in the data is 0.997, compared to the average RMSFE of
subsamples 8-18 of 0.133. Therefore, there is little evidence of model overfitting.
Note that similar results are obtained for ADR and ISR, though forecast accuracy is
lower for ADR. The RMSFE for ADR and ISR are in Table 6.
Figure 1: RMSFE for full history and rolling subsamples
Table 6 shows the RMSFE for each subsample and for each of OR, ADR, and ISR. Table 7 shows
the ratio of each RMSFE to the standard deviations of OR, ADR, and ISR. This
table shows that all RMSFEs are less than half the standard deviation for each variable.
Though this is more of a “rule-of-thumb” measure of forecast performance, it suggests
that the models do not suffer from overfitting.
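The rule-of-thumb check can be reproduced for OR from the figures in this section: the OR column of Table 6 and the standard deviation of 0.997 quoted above. The variable names are illustrative, and small differences from Table 7 are expected because the inputs here are already rounded to three decimals.

```python
# RMSFE for OR by subsample, copied from Table 6.
or_rmsfe = [
    0.165, 0.285, 0.336, 0.446, 0.465, 0.385, 0.355, 0.142, 0.077,
    0.075, 0.109, 0.136, 0.143, 0.097, 0.151, 0.168, 0.189, 0.181,
]
OR_STD_DEV = 0.997  # standard deviation of OR reported in the text

# Ratio of each RMSFE to the standard deviation of the series.
ratios = [value / OR_STD_DEV for value in or_rmsfe]

# Rule of thumb: every ratio should fall below one half.
passes_rule_of_thumb = all(ratio < 0.5 for ratio in ratios)
```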
Table 6: RMSFE for OR, ADR, and ISR for all subsamples

| Sample | OR | ADR | ISR |
|---|---|---|---|
| 1 | 0.165 | 42.289 | 0.214 |
| 2 | 0.285 | 38.754 | 0.244 |
| 3 | 0.336 | 24.700 | 0.182 |
| 4 | 0.446 | 31.812 | 0.169 |
| 5 | 0.465 | 42.980 | 0.110 |
| 6 | 0.385 | 30.042 | 0.128 |
| 7 | 0.355 | 23.500 | 0.111 |
| 8 | 0.142 | 44.937 | 0.096 |
| 9 | 0.077 | 21.962 | 0.056 |
| 10 | 0.075 | 25.974 | 0.070 |
| 11 | 0.109 | 43.935 | 0.066 |
| 12 | 0.136 | 50.250 | 0.071 |
| 13 | 0.143 | 42.575 | 0.097 |
| 14 | 0.097 | 50.321 | 0.046 |
| 15 | 0.151 | 33.260 | 0.042 |
| 16 | 0.168 | 30.887 | 0.085 |
| 17 | 0.189 | 33.427 | 0.088 |
| 18 | 0.181 | 36.846 | 0.069 |
Table 7: Ratio of RMSFE to standard deviation for OR, ADR, and ISR for all subsamples

| Sample | OR | ADR | ISR |
|---|---|---|---|
| 1 | 0.165 | 0.318 | 0.222 |
| 2 | 0.286 | 0.292 | 0.254 |
| 3 | 0.336 | 0.186 | 0.189 |
| 4 | 0.447 | 0.240 | 0.176 |
| 5 | 0.467 | 0.324 | 0.115 |
| 6 | 0.386 | 0.226 | 0.133 |
| 7 | 0.356 | 0.177 | 0.116 |
| 8 | 0.142 | 0.338 | 0.100 |
| 9 | 0.078 | 0.165 | 0.059 |
| 10 | 0.076 | 0.196 | 0.073 |
| 11 | 0.109 | 0.331 | 0.068 |
| 12 | 0.136 | 0.378 | 0.074 |
| 13 | 0.143 | 0.321 | 0.101 |
| 14 | 0.098 | 0.379 | 0.047 |
| 15 | 0.152 | 0.250 | 0.043 |
| 16 | 0.168 | 0.233 | 0.088 |
| 17 | 0.189 | 0.252 | 0.092 |
| 18 | 0.182 | 0.277 | 0.072 |
Taxes and Lodging Sector Sales
The time series for tax collections is too short to allow for useful cross validation. Therefore,
we only examine the $R^2$ for this variable. As more data become available, cross validation may become possible.
Table 8: Results of tax regression

| | Outcome: Lodging Tax Collections |
|---|---|
| Intercept | 1.740*** |
| | (0.514) |
| ISR_{t-2} | 0.719*** |
| | (0.031) |
| Observations | 30 |
| R² | 0.935 |
| Adjusted R² | 0.933 |
| Residual Std. Error | 0.180 (df=28) |
| F Statistic | 526.224*** (df=1; 28) |

Note: *p<0.1; **p<0.05; ***p<0.01
With an $R^2$ of 0.93, the tax regression seems to fit the data very well. Because 3% and 5% taxes
and county-wide lodging sector revenues are calculated directly from estimated 2%
taxes, no further validation methods are available for these indicators.
Economic Indicators
IMPLAN is an economic model, as opposed to a statistical model, which makes forecast
validation less straightforward. In addition, IMPLAN relies on several modeling assumptions
that must hold for forecasts to be accurate. Nevertheless, a simple comparison of
forecasts to historical data can be used to ensure that estimates are within reason.
Table 9: Economic Indicators, 2022

| Economic Indicator | 2022 Estimates from IMPLAN |
|---|---|
| Employment | 3,057 |
| Employee Compensation | $159,179,667 |
| Contribution to GCP | $342,542,549 |
| Total Output | $473,487,769 |
Table 9 shows lodging sector economic indicators from 2022, the latest data available in
IMPLAN. Forecasts for output, employee compensation, and contribution to gross county
product (GCP) should be approximately half their actual 2022 values, while employment should
be approximately the same as its 2022 value, although there will likely be variation
depending on seasonality, growth in the lodging sector, and inflation. Table 10 compares these
values with the forecasts in the main report: it shows the percent difference between 2022 values
and forecasted values for the low, medium, and high forecasts, which indicates that the forecasted
economic indicators are higher than expected, with the medium forecast about 11% greater than 2022 values.
Table 10: Percent difference of forecasted economic indicators from 2022 values
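A comparison of this kind can be sketched as below, using the 2022 values from Table 9. The benchmark rule (half the annual value for dollar-denominated indicators, the full value for employment) follows the reasoning in the text. Whether Table 10 was computed exactly this way is an assumption, as are the function names.

```python
# 2022 lodging sector estimates from Table 9.
ANNUAL_2022 = {
    "employment": 3_057,
    "employee_compensation": 159_179_667,
    "contribution_to_gcp": 342_542_549,
    "total_output": 473_487_769,
}


def six_month_benchmark(indicator, annual_value):
    """Expected six-month value implied by a full-year 2022 figure.

    Employment is an average level, so it is not halved; dollar flows
    accumulate over time, so six months is about half the annual total.
    """
    if indicator == "employment":
        return annual_value
    return annual_value / 2


def percent_difference(forecast, benchmark):
    """Percent by which a forecast exceeds (or falls short of) its benchmark."""
    return 100.0 * (forecast - benchmark) / benchmark


benchmarks = {
    name: six_month_benchmark(name, value) for name, value in ANNUAL_2022.items()
}
```

For instance, a hypothetical six-month output forecast would be compared against `benchmarks["total_output"]`, which is half of the 2022 total.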