Jackson Hole and Teton County Hospitality Forecasting

Methodology

Data

The data used in forecasting are provided by the JHCC. Because these data are proprietary, they are described here but not reproduced.

The JHCC collects data from participating resorts and other lodging providers and aggregates them to the destination level. For each month from 05-2020 to the present, the data contain the average monthly occupancy rate (OR), average daily rate (ADR), revenue per available room (RevPAR), and total revenue collected by all in-sample lodging providers (in-sample revenue [ISR]), along with other variables not used in this analysis. In addition, the data include “on-the-books” (OTB) values of OR, ADR, RevPAR, and ISR for each of the following six months. OTB values are calculated in the same way as the monthly values, but from the reservations already booked as of each month. Thus, for example, the month of 12-2023 has data on realized OR, ADR, RevPAR, and ISR as well as OTB data for each variable for 01-2024, 02-2024, 03-2024, 04-2024, 05-2024, and 06-2024.

OR is measured as the average proportion of total rooms available that are occupied (or reserved in the case of OTB OR) in a month. ISR and ADR are measured in nominal US dollars. Thus, adjustments for inflation may be necessary if using the forecasts for economic purposes.

The data are reshaped so that each month is an observation with values for realized OR, ADR, RevPAR, and ISR, plus OTB values for each variable from one, two, three, four, five, and six months prior. Table 1 gives an example of what OR looks like in the reshaped data:

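Table 1: Example data table for OR.

| Month | Actual OR | OTB OR one month prior | OTB OR two months prior | OTB OR three months prior | OTB OR four months prior | OTB OR five months prior | OTB OR six months prior |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10-2023 | # | # | # | # | # | # | # |
| 11-2023 | # | # | # | # | # | # | # |
| 12-2023 | # | # | # | # | # | # | # |
| 01-2024 | # | # | # | # | # | # | # |
| 02-2024 | NA | NA | # | # | # | # | # |
| 03-2024 | NA | NA | NA | # | # | # | # |
| 04-2024 | NA | NA | NA | NA | # | # | # |
| 05-2024 | NA | NA | NA | NA | NA | # | # |
| 06-2024 | NA | NA | NA | NA | NA | NA | # |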
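The reshaping step can be sketched as follows. This is only an illustration: the long-format record layout (booking month, target month, OTB value), the function names, and the numbers are assumptions made for the example and are not taken from the JHCC data.

```python
from collections import defaultdict


def months_between(earlier: str, later: str) -> int:
    """Number of months from `earlier` to `later`, both formatted MM-YYYY."""
    m1, y1 = map(int, earlier.split("-"))
    m2, y2 = map(int, later.split("-"))
    return (y2 - y1) * 12 + (m2 - m1)


def reshape_otb(actuals, otb_records):
    """Build one row per target month: the actual value plus OTB values by lead."""
    rows = defaultdict(
        lambda: {"actual": None, **{f"otb_{h}": None for h in range(1, 7)}}
    )
    for month, value in actuals.items():
        rows[month]["actual"] = value
    for booked_in, target, value in otb_records:
        lead = months_between(booked_in, target)
        if 1 <= lead <= 6:  # only one to six months in advance are kept
            rows[target][f"otb_{lead}"] = value
    return dict(rows)


# Illustrative numbers only (hypothetical, not JHCC data)
actuals = {"12-2023": 0.55}
otb_records = [
    ("12-2023", "01-2024", 0.40),  # booked one month before 01-2024
    ("12-2023", "02-2024", 0.30),  # booked two months before 02-2024
    ("11-2023", "01-2024", 0.25),  # booked two months before 01-2024
]
wide = reshape_otb(actuals, otb_records)
```

Months that have not yet occurred keep `None` for the actual value, which corresponds to the NA entries in Table 1.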

Methodology

The base forecasts are those of OR, ADR, and ISR. These three variables are forecasted directly from the data provided by the JHCC. Other variables incorporate these three base forecasts into their methodology.

OR, ADR, and Total Revenue

OR, ADR, and ISR are forecast using the equation

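$$y_t = \alpha_{h,0} + \sum_{j=1}^{3} \alpha_{h,j}\, y_{t-h-j+1} + \beta_h\, \mathrm{OTB}^{y}_{t,h} + \varepsilon_t \qquad (1)$$

where $y$ denotes the variable of interest, $h$ is the number of months in advance the forecast is made, $y_t$ is the value for outcome $y$ in month $t$, $\mathrm{OTB}^{y}_{t,h}$ is the OTB value from $h$ months before for month $t$, $\varepsilon_t$ is a random error term, and the $\alpha$’s and $\beta$’s are regression parameters. So, for example, the one-month-ahead forecast of OR is estimated as

$$\widehat{\mathrm{OR}}_t = \hat{\alpha}_{1,0} + \hat{\alpha}_{1,1}\,\mathrm{OR}_{t-1} + \hat{\alpha}_{1,2}\,\mathrm{OR}_{t-2} + \hat{\alpha}_{1,3}\,\mathrm{OR}_{t-3} + \hat{\beta}_1\,\mathrm{OTB}^{\mathrm{OR}}_{t,1}$$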

where hats denote estimated variables and parameters.
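As a rough sketch of how one forecast is assembled, the function below combines an intercept, the OTB value recorded h months earlier, and three realized values lagged h, h+1, and h+2 months, which is the regressor pattern shown in the regression tables of Section 3. The function name, argument layout, and all numbers are hypothetical.

```python
def h_step_forecast(y, otb_h, t, h, intercept, otb_coef, lag_coefs):
    """Forecast y[t] from the OTB value recorded h months earlier and three lags.

    y         : realized (transformed) values, one per month
    otb_h     : otb_h[t] is the OTB value for month t recorded h months before
    lag_coefs : coefficients on y[t-h], y[t-h-1], y[t-h-2]
    """
    if t - h - 2 < 0:
        raise ValueError("not enough history for three lags")
    lags = [y[t - h - j] for j in range(3)]
    lag_part = sum(c * v for c, v in zip(lag_coefs, lags))
    return intercept + otb_coef * otb_h[t] + lag_part


# Hypothetical series and coefficients, for illustration only
y = [0.2, 0.1, 0.3, 0.4, 0.5, None, None]
otb_1 = [None] * 5 + [0.6, None]  # OTB for month 5, recorded in month 4
forecast = h_step_forecast(
    y, otb_1, t=5, h=1,
    intercept=0.1, otb_coef=0.9, lag_coefs=[0.05, 0.02, -0.01],
)
```

For a longer horizon only `h` and the matching OTB column change; the three lags shift back accordingly, so a two-month-ahead forecast uses the values from two, three, and four months before the target month.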

Table 2 highlights the values that would be used in an example forecast for the month 04-2024 using the data available as of 01-2024.

Table 2: Example data table for OR. Highlighted values are those used to forecast OR for 04-2024.

| Month | Actual OR | OTB OR one month prior | OTB OR two months prior | OTB OR three months prior | OTB OR four months prior | OTB OR five months prior | OTB OR six months prior |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 10-2023 | # | # | # | # | # | # | # |
| 11-2023 | # | # | # | # | # | # | # |
| 12-2023 | # | # | # | # | # | # | # |
| 01-2024 | # | # | # | # | # | # | # |
| 02-2024 | NA | NA | # | # | # | # | # |
| 03-2024 | NA | NA | NA | # | # | # | # |
| 04-2024 | NA | NA | NA | NA | # | # | # |
| 05-2024 | NA | NA | NA | NA | NA | # | # |
| 06-2024 | NA | NA | NA | NA | NA | NA | # |

For OR, the data are first transformed using the logit function as the values of OR are bounded between 0 and 1 (average proportion of rooms occupied). ADR and total revenue are transformed using the natural log. In addition to forecasts, 95% confidence intervals are constructed using the standard errors for each forecast.
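A minimal sketch of the transformations and their inverses is given below. The interval construction assumes a normal approximation (point forecast plus or minus 1.96 standard errors on the transformed scale, then mapped back); the text states only that the standard errors are used, so this detail and all names and numbers are assumptions.

```python
import math


def logit(p: float) -> float:
    """Map a proportion in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Map a real number back to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def back_transformed_interval(point, std_error, inverse, z=1.96):
    """Return (lower, forecast, upper) on the original scale.

    `point` and `std_error` are on the transformed scale; `inverse` undoes
    the transformation (inv_logit for OR, math.exp for ADR and revenue).
    """
    return (
        inverse(point - z * std_error),
        inverse(point),
        inverse(point + z * std_error),
    )


# Hypothetical forecast of logit(OR) with a hypothetical standard error
or_low, or_mid, or_high = back_transformed_interval(logit(0.70), 0.10, inv_logit)

# Hypothetical forecast of log revenue
rev_low, rev_mid, rev_high = back_transformed_interval(math.log(1_000_000), 0.07, math.exp)
```

Working on the transformed scale keeps the back-transformed OR bounds inside the 0 to 1 range and the revenue bounds positive.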

One may wonder why Equation 1 only uses OTB data from a single month. For example, when forecasting OR for 04-2024 from Table 2, there is OTB data for bookings made four, five, and six months in advance. Why, then, does Equation 1 only use OTB data from four months in advance? The primary reason is that the data series used in forecasting is not long: as of 12-2024 there were only 56 months of data. This means that adding variables to the analysis risks overfitting the forecasting models to the data, leading to poorer forecast performance. Additionally, for forecasts with shorter time horizons (1-3 months in advance), the additional variables do not provide much extra forecasting power as the $R^2$ is already very high. More information on forecasting diagnostics is in Section 3.

An additional concern is stationarity. Equation 1 is an autoregressive model with three lags and an exogenous predictor, the OTB value. Ordinarily, when forecasting with an autoregressive model, care must be taken to test that the series is stationary. A stationary series is one generated by a process that has no overall increasing or decreasing trend and whose variance is finite and does not change over time. However, testing shows that the OTB values are very good predictors independent of any autoregressive terms. Thus, any forecasting bias introduced by nonstationarity is likely corrected by using the OTB values.

A more pressing concern for the forecasts of ADR, OR, and total revenue is selection bias. Resorts and other lodging providers opt in to providing data to the JHCC. Participants likely have commonalities that influence their decision to submit data, and likewise providers that did not submit data likely have traits in common. This means that the sample of lodging providers in the JHCC data is likely not representative of all lodging providers in the Jackson Hole or Teton County area. Individual hospitality providers should be aware that there may be significant differences between their operations and the operations of providers in the sample used to construct forecasts. It is recommended that individual lodging providers track their own OR, ADR, revenues, and RevPAR to determine how their own statistics differ from the forecasts presented here before relying on these forecasts to make internal business decisions. At minimum, the forecasts of OR, ADR, ISR, and RevPAR serve as a barometer for comparing an individual provider’s operations to providers in the JHCC data. Forecasts of tax collections and hospitality revenues for Teton County are corrected for selection bias as described in Section 2.3.

RevPAR

Since RevPAR is a calculated statistic, that is,

$$\mathrm{RevPAR} = \mathrm{ADR} \times \mathrm{OR} \qquad (2)$$

it is not independently forecast. Instead, the RevPAR forecast is constructed from forecasted ADR and OR following Equation 2. Confidence intervals for RevPAR are constructed by applying Equation 2 to the lower and upper bounds of the confidence intervals for OR and ADR.
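The RevPAR construction amounts to element-wise multiplication of the ADR and OR forecasts and their bounds. The sketch below uses hypothetical numbers and a hypothetical function name.

```python
def revpar_forecast(adr, occupancy):
    """Combine (lower, forecast, upper) triples for ADR and OR into RevPAR.

    Each element of the result is ADR times OR, so the lower RevPAR bound
    uses the lower bounds of both inputs and the upper bound uses both
    upper bounds.
    """
    return tuple(a * o for a, o in zip(adr, occupancy))


# Hypothetical forecasts: ADR in nominal US dollars, OR as a proportion
adr = (300.0, 350.0, 400.0)
occupancy = (0.5, 0.6, 0.7)
revpar_low, revpar_mid, revpar_high = revpar_forecast(adr, occupancy)
```

Because both lower bounds (or both upper bounds) are combined, the resulting band is best read as a conservative range rather than an exact 95% interval for RevPAR.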

Taxes and County-Wide Lodging Sales

A statewide 3% tax is levied on lodging transactions in Wyoming and is used to fund the Wyoming Office of Tourism. Teton County levies an additional 2% tax that is spent on various tourism initiatives and programs within Teton County. Forecasting in-sample tax collections would be a simple matter of multiplying ISR by 0.03 and by 0.02, respectively. Unfortunately, as mentioned previously, the JHCC sample is likely not representative of Teton County lodging providers. Therefore, the forecast obtained from this method would not be representative of Teton County.

However, in addition to data collected from lodging providers, the JHCC keeps a record of the county (2%) lodging taxes collected in Teton County each month from 07-2021 to the present. Using this data allows us to correct for selection bias in estimating tax collections for Teton County. Unfortunately, without more information about lodging providers in Jackson Hole, it is not possible to correct for sample bias in forecasting tax collections in Jackson Hole; therefore, we limit our attention to tax collections county-wide. The method outlined below also allows us to forecast monthly lodging sales for all of Teton County without selection bias.

Inspection of the data revealed that lodging tax collections lag sales by two months. Forecasting tax collections involves two steps. First, monthly tax collections are regressed on total revenues from the JHCC hospitality sample from two months prior:

$$\widehat{T}_t = \hat{\delta}_0 + \hat{\delta}_1\, \mathrm{ISR}_{t-2} \qquad (3)$$

where $\widehat{T}_t$ is the estimated tax collected in month $t$ and $\mathrm{ISR}_{t-2}$ are total lodging sales from the JHCC sample in month $t-2$. Notice that this model, unlike the models for OR, ADR, and ISR, has no autoregressive terms in it. The reason for this is that there is a known relationship between lodging revenues and tax collections, as a certain percentage of lodging sales must be collected as taxes in each month, irrespective of collections the month before. Any autoregressive components of lodging sector revenues would be irrelevant to the model.

Next, forecasted ISR values are plugged into Equation 3 to produce forecasted tax collections. To construct 95% confidence intervals for the tax forecasts, the upper and lower bounds of the 95% confidence interval for forecasted revenues are also plugged into Equation 3: the lower bound for revenues yields the lower bound for tax collections, and the upper bound is calculated similarly.
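The two steps can be sketched with a simple least-squares fit of tax collections on in-sample revenue lagged two months, followed by plugging in a revenue forecast and its bounds. The data are made up, the helper names are hypothetical, and any transformation applied to the variables in the actual model is not reproduced here.

```python
def lagged_pairs(isr, tax, lag=2):
    """Pair tax collected in month t with in-sample revenue from month t - lag."""
    return isr[:-lag], tax[lag:]


def fit_simple_ols(x, y):
    """Ordinary least squares with one regressor; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical monthly series (arbitrary units); the first two tax values
# are placeholders because no revenue from two months earlier is available.
isr = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
tax = [0.0, 0.0, 6.0, 7.0, 8.0, 9.0]

x, y = lagged_pairs(isr, tax)
intercept, slope = fit_simple_ols(x, y)

# Plug a hypothetical (lower, forecast, upper) revenue forecast into the fit
revenue_forecast = (18.0, 20.0, 22.0)
tax_forecast = tuple(intercept + slope * r for r in revenue_forecast)
```

The lower revenue bound produces the lower tax bound and the upper revenue bound the upper tax bound, as described above.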

Calculating the forecasted 3% (State) and 5% (Total) lodging taxes involves multiplying the forecasted 2% (County) lodging taxes by $3/2$ and $5/2$, respectively. Similarly, forecasted total lodging sales are calculated by multiplying forecasted 2% (County) lodging taxes by $1/0.02 = 50$.
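The scaling follows directly from the tax rates: dividing county collections by 0.02 recovers the taxable sales, and the other taxes are the corresponding share of those sales. A small sketch with a hypothetical county tax forecast:

```python
COUNTY_RATE = 0.02  # Teton County lodging tax
STATE_RATE = 0.03   # statewide lodging tax


def scale_from_county_tax(county_tax: float) -> dict:
    """Derive county-wide lodging sales and the other taxes from the 2% tax."""
    lodging_sales = county_tax / COUNTY_RATE
    return {
        "county_tax": county_tax,
        "state_tax": lodging_sales * STATE_RATE,
        "total_tax": lodging_sales * (COUNTY_RATE + STATE_RATE),
        "lodging_sales": lodging_sales,
    }


# Hypothetical forecast of county (2%) collections in nominal US dollars
result = scale_from_county_tax(100_000.0)
```

With this input the implied sales are 50 times the county tax, the state tax is 1.5 times the county tax, and the total tax is 2.5 times the county tax.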

Economic Indicators

The final set of forecasts is for economic indicators. Specifically, we forecast economic output, employee compensation, contribution to GDP (also known as value added), and employment supported by the lodging sector in Teton County for the next six months. Economic indicators are forecast using an economic impact analysis in IMPLAN, a regional input-output modeling tool.

Economic Impact Analysis

Economic impact analyses are a widely accepted research approach used to better comprehend how the operations of an economic entity impact the economy as a whole. They are also used to study how a new event or a change in an industry changes local and state economies. These analyses typically use input-output methodologies to re-create inter-industry linkages and calculate the impact on a regional economy. We used the IMpact Analysis for PLANning (IMPLAN) (version 3.1) software package to conduct our analysis.

An economic impact analysis calculates three kinds of effects from economic activity: direct, indirect, and induced impacts. Direct impacts are the economic activity in the sector under examination. Employees in the lodging sector, for example, are the direct employment impact of lodging sector revenues. Sales in the lodging sector are also spent on intermediate goods, that is, materials, supplies, and services from industries that support the lodging sector. Economic activity in supporting sectors is counted in indirect impacts. For example, if a hotel purchases cleaning supplies from a wholesaler in Teton County, the employees supported by this transfer of funds are counted in indirect impacts on employment. Indirect impacts also include impacts from suppliers of suppliers, so long as they are within the region being studied. Finally, induced impacts are the result of economic activity generated by workers all along the supply chain as they spend their wages in the local economy. For example, if an employee of a hotel supplier in Teton County eats a meal at a local restaurant, this induces economic activity in the restaurant sector.

Total forecasted lodging sector sales for the next six months are used as input to IMPLAN modeling. It is important to use total sales and not sales for each month because IMPLAN is constructed to produce annual impacts, not monthly impacts. Forecasts for individual months would need to be adjusted for seasonal effects that are not present in IMPLAN. Forecasts for the total six-month period are, therefore, more accurate in aggregate than forecasting for individual months.

Employment forecasts from IMPLAN must also be adjusted due to differences in the time horizon of forecasted lodging sector sales and IMPLAN modeling assumptions. A unit of employment in IMPLAN corresponds to one full-time job being present for a full year; therefore, the employment forecast directly obtained from IMPLAN is the number of jobs supported by revenue in the lodging industry if the revenue input is understood to be the total revenue in the lodging industry for the full year. Since the lodging sector sales forecast is for a six-month period, IMPLAN’s forecasted employment figure underreports the average number of employees by about half. To harmonize the forecast period, employment figures are multiplied by two. This results in a better estimate of total employment supported by the lodging sector on average over the six-month period.
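The two adjustments described in this subsection, aggregating the six monthly sales forecasts into one model input and doubling the employment output, can be written out as simple arithmetic. The function names and numbers below are hypothetical, and the IMPLAN run itself is not reproduced.

```python
def six_month_sales_input(monthly_sales_forecasts):
    """Total forecasted lodging sales over the six-month horizon."""
    if len(monthly_sales_forecasts) != 6:
        raise ValueError("expected exactly six monthly forecasts")
    return sum(monthly_sales_forecasts)


def average_employment(implan_jobs: float) -> float:
    """Convert job-years from a six-month revenue input to average jobs.

    A model job lasts a full year, so six months of revenue supports about
    half as many job-years as the average headcount over those six months.
    """
    return 2.0 * implan_jobs


# Hypothetical monthly sales forecasts in nominal US dollars
sales_input = six_month_sales_input([40e6, 35e6, 30e6, 45e6, 60e6, 70e6])

# Hypothetical employment figure returned by the model for that input
jobs_over_period = average_employment(1_500.0)
```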

Forecast Diagnostics

This section analyzes the results of the regression-based forecasts to test for accuracy. First, we report the results of estimating Equation 1 for OR, ADR, and ISR.

OR

Table 3: Results of OR Forecast Models

Outcome: OR

| | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 0.391*** (0.022) | 0.761*** (0.024) | 1.029*** (0.066) | 1.156*** (0.088) | 1.197*** (0.123) | 1.355*** (0.151) |
| OR OTB 1 month prior | 0.935*** (0.021) | | | | | |
| OR OTB 2 months prior | | 0.897*** (0.040) | | | | |
| OR OTB 3 months prior | | | 0.833*** (0.050) | | | |
| OR OTB 4 months prior | | | | 0.809*** (0.050) | | |
| OR OTB 5 months prior | | | | | 0.781*** (0.060) | |
| OR OTB 6 months prior | | | | | | 0.788*** (0.080) |
| OR_{t-1} | 0.038 (0.030) | | | | | |
| OR_{t-2} | 0.032 (0.025) | 0.106*** (0.031) | | | | |
| OR_{t-3} | -0.024 (0.019) | -0.073** (0.035) | 0.000 (0.041) | | | |
| OR_{t-4} | | -0.029 (0.028) | -0.125*** (0.042) | -0.090* (0.053) | | |
| OR_{t-5} | | | 0.030 (0.046) | -0.018 (0.055) | 0.035 (0.054) | |
| OR_{t-6} | | | | 0.106*** (0.026) | 0.080* (0.045) | 0.186*** (0.064) |
| OR_{t-7} | | | | | 0.146*** (0.049) | 0.032 (0.086) |
| OR_{t-8} | | | | | | 0.114** (0.053) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R2 | 0.989 | 0.979 | 0.969 | 0.955 | 0.941 | 0.917 |
| Adjusted R2 | 0.988 | 0.976 | 0.963 | 0.947 | 0.931 | 0.902 |
| Residual Std. Error | 0.096 (df=26) | 0.132 (df=25) | 0.165 (df=24) | 0.190 (df=23) | 0.221 (df=22) | 0.268 (df=21) |
| F Statistic | 810.386*** (df=4; 26) | 347.136*** (df=4; 25) | 200.894*** (df=4; 24) | 187.882*** (df=4; 23) | 101.434*** (df=4; 22) | 45.440*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01

ADR

Table 4: Results of ADR Forecast Models

Outcome: ADR

| | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -1717.462*** (97.869) | -1726.164*** (223.950) | -1827.525*** (109.877) | -1508.951*** (111.805) | -1345.316*** (163.990) | -1120.137*** (238.880) |
| ADR OTB 1 month prior | 362.411*** (14.829) | | | | | |
| ADR OTB 2 months prior | | 369.552*** (30.779) | | | | |
| ADR OTB 3 months prior | | | 370.181*** (16.406) | | | |
| ADR OTB 4 months prior | | | | 299.776*** (20.962) | | |
| ADR OTB 5 months prior | | | | | 251.212*** (30.634) | |
| ADR OTB 6 months prior | | | | | | 225.686*** (40.497) |
| ADR_{t-1} | -0.042 (0.032) | | | | | |
| ADR_{t-2} | -0.064 (0.050) | -0.157*** (0.057) | | | | |
| ADR_{t-3} | -0.053 (0.041) | -0.089** (0.045) | -0.136** (0.058) | | | |
| ADR_{t-4} | | -0.035 (0.053) | -0.047 (0.037) | -0.072** (0.036) | | |
| ADR_{t-5} | | | 0.113** (0.055) | 0.055 (0.040) | 0.099 (0.069) | |
| ADR_{t-6} | | | | 0.226*** (0.049) | 0.251*** (0.052) | 0.277*** (0.067) |
| ADR_{t-7} | | | | | 0.168*** (0.061) | 0.176*** (0.051) |
| ADR_{t-8} | | | | | | -0.095 (0.076) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R2 | 0.964 | 0.943 | 0.933 | 0.931 | 0.904 | 0.868 |
| Adjusted R2 | 0.958 | 0.934 | 0.921 | 0.919 | 0.886 | 0.843 |
| Residual Std. Error | 27.201 (df=26) | 34.765 (df=25) | 38.415 (df=24) | 38.001 (df=23) | 44.127 (df=22) | 52.055 (df=21) |
| F Statistic | 253.388*** (df=4; 26) | 116.003*** (df=4; 25) | 154.073*** (df=4; 24) | 121.168*** (df=4; 23) | 67.347*** (df=4; 22) | 48.819*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01

ISR

Table 5: Results of ISR Forecast Models

Outcome: ISR

| | Months ahead: 1 | Months ahead: 2 | Months ahead: 3 | Months ahead: 4 | Months ahead: 5 | Months ahead: 6 |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 1.207*** (0.342) | 2.827*** (0.651) | 5.027*** (0.712) | 5.674*** (0.465) | 3.229*** (0.672) | 2.417** (1.025) |
| ISR OTB 1 month prior | 0.919*** (0.010) | | | | | |
| ISR OTB 2 months prior | | 0.849*** (0.014) | | | | |
| ISR OTB 3 months prior | | | 0.787*** (0.020) | | | |
| ISR OTB 4 months prior | | | | 0.762*** (0.024) | | |
| ISR OTB 5 months prior | | | | | 0.755*** (0.031) | |
| ISR OTB 6 months prior | | | | | | 0.777*** (0.043) |
| ISR_{t-1} | 0.009 (0.022) | | | | | |
| ISR_{t-2} | 0.016 (0.020) | 0.034** (0.016) | | | | |
| ISR_{t-3} | -0.005 (0.011) | -0.020 (0.017) | -0.001 (0.022) | | | |
| ISR_{t-4} | | -0.013 (0.013) | -0.064*** (0.023) | -0.075*** (0.025) | | |
| ISR_{t-5} | | | -0.000 (0.025) | 0.015 (0.021) | 0.019 (0.026) | |
| ISR_{t-6} | | | | -0.012 (0.019) | 0.018 (0.026) | 0.031 (0.040) |
| ISR_{t-7} | | | | | 0.055 (0.035) | 0.050 (0.037) |
| ISR_{t-8} | | | | | | 0.048 (0.031) |
| Observations | 31 | 30 | 29 | 28 | 27 | 26 |
| R2 | 0.995 | 0.994 | 0.990 | 0.988 | 0.980 | 0.970 |
| Adjusted R2 | 0.994 | 0.993 | 0.988 | 0.986 | 0.977 | 0.965 |
| Residual Std. Error | 0.069 (df=26) | 0.074 (df=25) | 0.098 (df=24) | 0.102 (df=23) | 0.133 (df=22) | 0.167 (df=21) |
| F Statistic | 3004.374*** (df=4; 26) | 1763.828*** (df=4; 25) | 616.427*** (df=4; 24) | 752.744*** (df=4; 23) | 472.406*** (df=4; 22) | 245.379*** (df=4; 21) |

Note: *p<0.1; **p<0.05; ***p<0.01

Note that the number of observations does not incorporate the full sample. This is because the data used in the model are limited to the past 30 observations. The reason for limiting the sample is discussed in Section 3.1.

OTB covariates are significant at the 1% level in every regression. Interestingly, lagged variables are only occasionally significant. This indicates that advance bookings data are much better predictors than past realized values. The $R^2$ for ISR is very high; even the six-month-ahead forecast has an $R^2$ of 0.97. The $R^2$ for OR and ADR is not as high, but it remains high even for the six-month-ahead forecast, at 0.92 and 0.87, respectively.

A high $R^2$ has two possible sources, which may work in conjunction: 1) the model explains the data very well, or 2) there are too few degrees of freedom left in the model after estimation. The second source of a high $R^2$ is commonly referred to as overfitting. A model that overfits the data is very good at predicting in-sample values, but performs poorly when predicting out-of-sample values. To ensure that the models used in forecasting are not overfit, we use cross validation (CV) over multiple subsamples.

Cross Validation

We split the data into consecutive, overlapping training subsamples. The subsamples are constructed such that each subsequent subsample fully contains the previous one. For example, subsample 2 would contain all observations in subsample 1 plus an additional month. Subsample 3 contains all months in both subsamples 1 and 2 plus an additional month. Subsamples contain a minimum of 24 observations. Each training subsample is matched with a testing subsample that consists of the next six months of data. The model is estimated on each subsample and used to forecast into each matched testing set. The root mean squared forecast error (RMSFE) is calculated for each subsample. If the model is overfitted, the RMSFE will be relatively poor even for subsamples that include most of the data.
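The splitting scheme and the error measure can be sketched as below. The minimum training length of 24 and the six-month test horizon follow the description above; the function names and the toy inputs are assumptions.

```python
import math


def expanding_splits(n_obs, min_train=24, horizon=6):
    """Yield (train, test) index ranges where each training window contains
    the previous one and is followed by `horizon` months of test data."""
    for end in range(min_train, n_obs - horizon + 1):
        yield range(0, end), range(end, end + horizon)


def rmsfe(actual, forecast):
    """Root mean squared forecast error over a matched testing set."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# With 32 monthly observations there are three subsamples:
# training lengths 24, 25, and 26, each followed by a six-month test window
splits = list(expanding_splits(32))

# Toy check of the error measure with made-up values
example_error = rmsfe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

In practice the model would be re-estimated on each training range and the RMSFE computed on the matching test range.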

In addition to testing for overfitting, we use cross validation for model selection. It may be the case that the relationship between advance bookings data and actual OR, ADR and RevPAR changes over time such that including data from earlier dates actually results in poorer forecast performance. If this is the case, a model that excludes earlier dates will have a better RMSFE than one that includes the whole series. To test this, we construct two sets of training subsets: one where each subsample contains all data from previous subsamples (full history subsamples), and one where subsamples have a maximum length of 30 (rolling subsamples). The model is estimated on each subsample for each set of subsamples and the RMSFE calculated for each subsample.
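The difference between the two sets of training subsamples comes down to where each window starts. A minimal sketch, with hypothetical names and made-up RMSFE values used only to show the selection step:

```python
from statistics import mean


def training_window(end, max_len=None):
    """Indices of the training subsample that ends just before index `end`.

    max_len=None keeps the full history; max_len=30 gives a rolling window
    that drops the oldest months once more than 30 are available.
    """
    start = 0 if max_len is None else max(0, end - max_len)
    return range(start, end)


def pick_scheme(rmsfe_by_scheme):
    """Return the scheme whose average RMSFE across subsamples is lowest."""
    return min(rmsfe_by_scheme, key=lambda name: mean(rmsfe_by_scheme[name]))


full_history = training_window(40)           # months 0 through 39
rolling = training_window(40, max_len=30)    # months 10 through 39

# Made-up RMSFE values for three subsamples under each scheme
example = {
    "full_history": [0.20, 0.16, 0.15],
    "rolling": [0.18, 0.14, 0.12],
}
chosen = pick_scheme(example)
```

The two schemes coincide until more than 30 observations are available, so differences in RMSFE can only appear in later subsamples.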

Figure 1 shows the RMSFE for both full history and rolling subsamples when forecasting OR. In both cases, the RMSFE starts high but begins to decline after the 4th subsample (6th for full history subsamples). Likely this reflects the uncertainty around lodging during the COVID-19 pandemic as the earlier subsamples largely contain data from that period. As travel became more consistent in later subsamples, the forecasts improve.

Both sets of subsamples lose some forecast accuracy in later forecasts, but the rolling subsamples perform better. The average RMSFE for subsamples 8-15 is 0.156 for full history subsamples and 0.133 for rolling subsamples. Thus, we choose rolling subsamples as the superior forecasting model.

In addition, the RMSFE is relatively small for later subsamples (8-15). For comparison, the standard deviation of OR in the data is 0.997, compared to the average RMSFE of subsamples 8-15 of 0.133. Therefore, there is little evidence of model overfitting. Note that similar results are obtained for ADR and ISR, though forecast accuracy is lower for ADR. The RMSFE for ADR and ISR are in Table 6.

Figure 1: RMSFE for full history and rolling subsamples

Table 6 shows the RMSFE for each subsample and for each of OR, ADR, and ISR. Table 7 shows the ratio of each RMSFE to the standard deviation of OR, ADR, and ISR. This table shows that all RMSFEs are less than half the standard deviation of each variable. Though this is more of a “rule-of-thumb” measure of forecast performance, it suggests that the models do not suffer from overfitting.

Table 6: RMSFE for OR, ADR, and ISR for all subsamples

| Sample | OR | ADR | ISR |
| --- | --- | --- | --- |
| 1 | 0.165 | 42.289 | 0.214 |
| 2 | 0.285 | 38.754 | 0.244 |
| 3 | 0.336 | 24.700 | 0.182 |
| 4 | 0.446 | 31.812 | 0.169 |
| 5 | 0.465 | 42.980 | 0.110 |
| 6 | 0.385 | 30.042 | 0.128 |
| 7 | 0.355 | 23.500 | 0.111 |
| 8 | 0.142 | 44.937 | 0.096 |
| 9 | 0.077 | 21.962 | 0.056 |
| 10 | 0.075 | 25.974 | 0.070 |
| 11 | 0.109 | 43.935 | 0.066 |
| 12 | 0.136 | 50.250 | 0.071 |
| 13 | 0.143 | 42.575 | 0.097 |
| 14 | 0.097 | 50.321 | 0.046 |
| 15 | 0.151 | 33.260 | 0.042 |
| 16 | 0.168 | 30.887 | 0.085 |
| 17 | 0.189 | 33.427 | 0.088 |
| 18 | 0.181 | 36.846 | 0.069 |

Table 7: Ratio of RMSFE to standard deviation for OR, ADR, and ISR for all subsamples

| Sample | OR | ADR | ISR |
| --- | --- | --- | --- |
| 1 | 0.165 | 0.318 | 0.222 |
| 2 | 0.286 | 0.292 | 0.254 |
| 3 | 0.336 | 0.186 | 0.189 |
| 4 | 0.447 | 0.240 | 0.176 |
| 5 | 0.467 | 0.324 | 0.115 |
| 6 | 0.386 | 0.226 | 0.133 |
| 7 | 0.356 | 0.177 | 0.116 |
| 8 | 0.142 | 0.338 | 0.100 |
| 9 | 0.078 | 0.165 | 0.059 |
| 10 | 0.076 | 0.196 | 0.073 |
| 11 | 0.109 | 0.331 | 0.068 |
| 12 | 0.136 | 0.378 | 0.074 |
| 13 | 0.143 | 0.321 | 0.101 |
| 14 | 0.098 | 0.379 | 0.047 |
| 15 | 0.152 | 0.250 | 0.043 |
| 16 | 0.168 | 0.233 | 0.088 |
| 17 | 0.189 | 0.252 | 0.092 |
| 18 | 0.182 | 0.277 | 0.072 |
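The rule-of-thumb check can be reproduced for OR from the published figures. The sketch below uses the OR column of Table 6 and the standard deviation of OR reported in the text (0.997); because those inputs are rounded, the ratios may differ from Table 7 in the third decimal place.

```python
# RMSFE for OR, subsamples 1 through 18, copied from Table 6
or_rmsfe = [
    0.165, 0.285, 0.336, 0.446, 0.465, 0.385, 0.355, 0.142, 0.077,
    0.075, 0.109, 0.136, 0.143, 0.097, 0.151, 0.168, 0.189, 0.181,
]
OR_STD_DEV = 0.997  # standard deviation of OR reported in the text

# Ratio of each RMSFE to the standard deviation of the outcome
ratios = [value / OR_STD_DEV for value in or_rmsfe]

# Rule of thumb: every RMSFE should be below half the standard deviation
passes_rule_of_thumb = all(ratio < 0.5 for ratio in ratios)
largest_ratio = max(ratios)
```

The largest ratio comes from subsample 5 and stays below one half, in line with the statement that all RMSFEs are less than half the standard deviation.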

Taxes and Lodging Sector Sales

The time series for tax collections is too short to allow for useful cross validation. Therefore, we only examine the $R^2$ for this variable. As more data become available, cross validation may become possible.

Table 8: Results of tax regression

Outcome: Lodging Tax Collections

| | Estimate |
| --- | --- |
| Intercept | 1.740*** (0.514) |
| ISR_{t-2} | 0.719*** (0.031) |
| Observations | 30 |
| R2 | 0.935 |
| Adjusted R2 | 0.933 |
| Residual Std. Error | 0.180 (df=28) |
| F Statistic | 526.224*** (df=1; 28) |

Note: *p<0.1; **p<0.05; ***p<0.01

With an $R^2$ of 0.93, the tax regression seems to fit the data very well. Because 3% and 5% taxes and county-wide lodging sector revenues are calculated directly from estimated 2% taxes, no further validation methods are available for these indicators.

Economic Indicators

IMPLAN is an economic model, as opposed to a statistical model, which makes forecast validation less straightforward. In addition, IMPLAN relies on several modeling assumptions that must hold for forecasts to be accurate. Nevertheless, a simple comparison of forecasts to historical data can be used to ensure that estimates are within reason.

Table 9: Economic Indicators, 2022

| Economic Indicator | 2022 Estimates from IMPLAN |
| --- | --- |
| Employment | 3,057 |
| Employee Compensation | $159,179,667 |
| Contribution to GCP | $342,542,549 |
| Total Output | $473,487,769 |

Table 9 shows lodging sector economic indicators from 2022, the latest data available in IMPLAN. Forecasts for output, employee compensation, and contribution to gross county product (GCP) should be approximately half their actual 2022 values, while employment should be approximately the same as its 2022 value, although there will likely be variation depending on seasonality, growth in the lodging sector, and inflation. Table 10 compares these values with the forecasts in the main report, showing the percent difference between 2022 values and forecasted values for the low, medium, and high forecasts. The forecasted economic indicators are higher than expected, with the medium forecast about 11% greater than 2022 values.

Table 10: Percent difference of forecasted economic indicators from 2022 values

| Economic Indicator | Low Forecast | Medium Forecast | High Forecast |
| --- | --- | --- | --- |
| Employment | 5.80% | 10.94% | 17.22% |
| Employee Compensation | 5.12% | 10.14% | 16.28% |
| Contribution to GCP | 6.31% | 11.53% | 17.91% |
| Total Output | 6.31% | 11.53% | 17.91% |
 
 
 
 
 
 





