This study focuses on predicting the price of Airbnb rentals in Amsterdam. To model the price of a listing, we use its characteristics, such as its location, room type, and number of reviews. We use a random sample of 500 Airbnb listings from a larger dataset on Amsterdam Airbnb listings from the insideairbnb.com project led by Murray Cox (murray [at] murraycox.com). Every variable we use in our analysis is described in Table 1. To contextualize our analysis, we reviewed past research on the factors that might influence the price of Airbnb rentals.
Perez-Sanchez et al. examined which factors determine the price of Airbnb rentals in four Spanish cities. They examined 26 variables within four broad categories: location specifics, accommodation characteristics, surroundings, and advertising strategies. They used data from Airdna (airdna.co) and found that price decreases the further away the rental is from the shore, while the price increases the further the rental is from tourist zones. The second finding is particularly interesting for our analysis: in European cities the city center tends to be a touristy area, so we can check whether this finding holds in Amsterdam.
Zhang et al. studied Metro Nashville, Tennessee, and the factors that determine the price of the listings on Airbnb, weighted for their location. They looked at 6 variables: the price of the listing, number of reviews, distance from a highway, distance from the local convention center, number of months since the listing was published, and the rating of the listing. Their findings show that the number of reviews, distance to the convention center, and ratings are statistically significant predictors of the price of an Airbnb listing.
Both studies showed that the location of an Airbnb rental is a significant predictor of its price. The second one also showed evidence that the number of reviews and the rating are significant predictors of price. Hence, we decided to examine whether the factors identified in the above studies are significant for predicting the price of Airbnb rentals in Amsterdam, and to explore what other factors in the data set of Amsterdam’s Airbnb listings are significant for predicting price.
The data set comes from the Inside Airbnb website and contains summary information on listings in Amsterdam. The following table shows the definition of important variables:
Variable Name | Description |
---|---|
room_type | Three categories of rooms: private room, shared room, entire home or apartment |
price | The price of one night |
minimum_nights | The minimum number of nights a customer must stay in the Airbnb |
number_of_reviews | The number of reviews for each Airbnb on the website |
availability_365 | The number of days in one year on which the Airbnb is available |
longitude | Measurement of location, expressed in degrees |
latitude | Measurement of location, expressed in degrees |
To clean the data set of Amsterdam Airbnb listings, we eliminate all missing values and delete variables which are clearly unrelated to the price of an Airbnb rental, like the host’s ID number. In the process of EDA, we find that it is hard to fit a linear regression between the variables in the data set as they stand. Hence, we make some modifications to the dataset that help us explore the data and build models. We create several new indicator variables: ‘expensive’, ‘entirehome’, and ‘trusted’. The ‘expensive’ variable indicates whether the price of the rental is greater than 100 euros. The ‘entirehome’ variable indicates whether the rental is the entire home or whether any rooms/spaces are shared. The ‘trusted’ variable indicates whether the Airbnb has more than 10 reviews. We also create a new variable named ‘region’ by using the mean values of longitude and latitude to divide the whole city into four districts: South-west, South-east, North-west, and North-east.
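The derived variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the function name `add_derived_variables`, the room type label `"Entire home/apt"`, the sample rows, and the choice to assign listings lying exactly on a mean coordinate to the north or east side are all hypothetical.

```python
from statistics import mean

def add_derived_variables(listings):
    """Return copies of the listings with indicator and region variables added."""
    # The mean coordinates split the city into four quadrants.
    mean_lat = mean(row["latitude"] for row in listings)
    mean_lon = mean(row["longitude"] for row in listings)
    enriched = []
    for row in listings:
        new_row = dict(row)
        new_row["expensive"] = int(row["price"] > 100)              # more than 100 euros
        # Assumed label for entire-home listings; adjust to the actual data.
        new_row["entirehome"] = int(row["room_type"] == "Entire home/apt")
        new_row["trusted"] = int(row["number_of_reviews"] > 10)     # more than 10 reviews
        # Assumption: ties on the mean go to the north / east side.
        north_south = "North" if row["latitude"] >= mean_lat else "South"
        east_west = "east" if row["longitude"] >= mean_lon else "west"
        new_row["region"] = f"{north_south}-{east_west}"
        enriched.append(new_row)
    return enriched

# Made-up rows for illustration only.
sample = [
    {"price": 150, "room_type": "Entire home/apt", "number_of_reviews": 25,
     "latitude": 52.36, "longitude": 4.88},
    {"price": 60, "room_type": "Private room", "number_of_reviews": 3,
     "latitude": 52.39, "longitude": 4.92},
]
result = add_derived_variables(sample)
for row in result:
    print(row["expensive"], row["entirehome"], row["trusted"], row["region"])
```

Splitting on the mean coordinates of the sample means the quadrant boundaries depend on which listings were drawn, so a different random sample could shift a listing near the boundary into another region.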
After the EDA, we choose to use multiple linear and logistic regression as well as ANOVA to model the price of Airbnb listings and the odds that the price of Airbnb listings is higher than 100 euros.
In our analysis, we examined how different factors correlate with the price of an Airbnb listing. First, we explore the relationship between price and the minimum number of nights a customer is required to pay for in order to book the rental.
Figure 1 shows that the range of prices might become narrower as the log minimum number of nights increases. However, only a few Airbnb houses in Amsterdam require a large number of nights. At lower values of the log minimum number of nights, the prices of Airbnb houses are spread fairly evenly between 0 and 200 euros. Customers who stay at an Airbnb for less time will have more choice, but there does not seem to be a definitive relationship observable from Figure 1.
Next, we explored the relationship between the price of an Airbnb rental in Amsterdam and the number of reviews it has received. Based on our experience, we speculate that most people think of rentals with more reviews as more trustworthy, which is in turn good for the host and could drive a price increase. We define a binary variable indicating whether an Airbnb is expensive, which on a student budget we take to mean a price above 100 euros (roughly equal to US dollars).
Figure 2 shows no major difference in the number of reviews between the group of Airbnb listings priced below 100 euros and the group priced above 100 euros.
We also explored the relationship between price and the type of the rooms in the rental. To investigate this relationship, we decided to create a binary variable for whether the listing offers the entire housing unit like home or apartment, or whether there are some spaces shared with other Airbnb customers.
From Figure 3, we see that the price of an Airbnb is generally higher if the Airbnb’s room type is entire home, or in other words if no spaces are shared with other Airbnb customers.
Finally, we explored the relationship between the price of an Airbnb listing and the region of Amsterdam where the listing is located. While the dataset contains data on the Airbnb listing’s neighborhood, the number of neighborhoods is large. To simplify the exploration, we divided the city into four quadrants based on cardinal directions, using the mean values of longitude and latitude as the dividing lines.
In the plot of whether or not a listing is expensive against the region in which it is located, we see that Airbnb listings in the southwest and southeast areas of Amsterdam tend to be more expensive. Thus, we infer that region may be a factor in Airbnb prices in Amsterdam.
To investigate the impact of the region of an Airbnb listing on its price, we conduct an Analysis of Variance. Our null hypothesis is that region does not have an impact on the price, in other words that any differences observed are due only to random variation. Our alternative hypothesis is that region does have an influence on the price.
\[ H_{0} : \mu_{SW} = \mu_{SE} = \mu_{NW} = \mu_{NE} \]
\[ H_{A} : \mu_{i} \neq \mu_{j} \text{ for at least one pair } i, j \in \{ SW, SE, NW, NE \} \]
 | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
region | 3 | 0.410 | 0.138 | 0.533 | 0.665 |
Residuals | 496 | 130.220 | 0.262 | | |
The results of our ANOVA test are shown in Table 2. The p-value for region is 0.665, which is not significant at the 0.05 level, so we fail to reject the null hypothesis: we find no evidence that the region of Amsterdam has an impact on the pricing of the Airbnb. To explore this question further, we perform linear and logistic regression and discuss their results.
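To make the quantities in Table 2 concrete, the following is a minimal sketch of how the degrees of freedom, mean squares, and F value of a one-way ANOVA are computed. The function `one_way_anova` and the toy numbers are hypothetical and are not the study's data. With four regions and 500 listings, the same formulas give the 3 and 496 degrees of freedom reported in the table.

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ms_between, ms_within, F) for a dict of samples."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within

# Toy values standing in for a response variable grouped by region.
toy = {
    "SW": [1.0, 2.0, 3.0],
    "SE": [2.0, 3.0, 4.0],
    "NW": [3.0, 4.0, 5.0],
    "NE": [2.0, 3.0, 4.0],
}
df_b, df_w, ms_b, ms_w, f_value = one_way_anova(toy)
print(df_b, df_w, ms_b, ms_w, f_value)
```

The p-value is then the upper tail probability of the F distribution with these degrees of freedom, which the sketch leaves to a statistics package.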
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
Coefficient | 8 | 0.604 | 1.565 | -0.065 | -0.027 | 0.134 | 4.447 |
Lower-bound | 8 | 0.520 | 1.547 | -0.176 | -0.134 | 0.108 | 4.319 |
Upper-bound | 8 | 0.688 | 1.584 | -0.001 | 0.028 | 0.238 | 4.575 |
Significant | 8 | 0.500 | 0.535 | 0 | 0 | 1 | 1 |
Table 3 summarizes the results of our linear regression model, which uses the log price in euros as the dependent variable and trust, entire home, minimum nights, days of availability in a year, and region as the independent variables. Because the p-values of entire home, minimum number of nights, and availability of days per year are less than 0.05, these three variables are significant predictors of log price per night. The intercept is significant as well. It represents the expected log price of a baseline listing: one in the South-West region of Amsterdam (the reference group) that is not trusted, is not an entire home, and has zero minimum nights and zero available days. Exponentiating the intercept gives a baseline price of about 85.36 euros per night. A significant intercept only tells us that this baseline log price differs from zero. Since none of the region indicators is significant, the model gives no evidence that region itself predicts log price after accounting for availability, minimum nights, trust, and entire home rental.
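Interpretation of the coefficients, each after accounting for the other variables in the model (region, availability, minimum nights, trust, and entire home rental). Because the dependent variable is log price, each exponentiated coefficient is a multiplicative factor on the price rather than a change in euros:
On average, the price of an Airbnb in Amsterdam is about 1.7 times as high (roughly 70% higher) if the room type of the Airbnb is the entire home.
On average, the price of an Airbnb in Amsterdam is about 0.96 times as high if the number of reviews is larger than 10, but this difference is not statistically significant.
On average, the price of an Airbnb decreases by about 1% for every additional night in the required minimum stay.
On average, the price of an Airbnb increases by about 0.1% for every additional day in a year the listing is available for rent on the Airbnb website.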
We used logistic regression to determine how different factors affect whether or not an Airbnb is expensive. We set the threshold at 100 euros per night, based on a student budget, and considered anything above that value expensive. After accounting for trust, minimum nights, and region of Amsterdam, the only two statistically significant predictors of whether or not an Airbnb listing's price for one night is expensive were choosing the entire home and the number of days in a year the listing is available. Below is a summary of the exponentiated coefficients and their 95 percent confidence intervals.
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
Odds | 8 | 2.098 | 3.466 | 0.283 | 0.855 | 1.090 | 10.643 |
Lower-bound | 8 | 1.334 | 2.045 | 0.143 | 0.450 | 0.948 | 6.341 |
Upper-bound | 8 | 3.474 | 6.088 | 0.545 | 1.019 | 1.865 | 18.490 |
Significant | 8 | 0.375 | 0.518 | 0 | 0 | 1 | 1 |
Interpretation of the significant exponentiated coefficients:
Intercept: For an Airbnb listing in the South-West region that has zero available days, requires a zero-night minimum stay, and has shared rooms/spaces and 10 or fewer reviews, the odds of it being expensive are 0.28.
Odds ratio of entire home: Comparing Airbnb listings that rent the full home/unit to those with shared rooms/spaces, the odds of being expensive are multiplied by about 10.64 for an Airbnb with the entire home, after controlling for trust, minimum nights, the number of available days, and region.
Odds ratio of the number of available days in a year: For every one-day increase in the number of available days, the odds of being expensive are multiplied by about 1.003, a small but statistically significant increase of roughly 0.3% per day.
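Odds can be hard to read directly, so it may help to convert the fitted log-odds into a probability. The sketch below plugs the rounded logistic coefficients from Table 5 into the logistic formula; the function `probability_expensive` and the two example listings are hypothetical, and the outputs are only as precise as the rounded coefficients.

```python
import math

# Logistic regression coefficients as reported in Table 5, column (2).
LOGIT_COEFFICIENTS = {
    "Constant": -1.261,
    "trusted": 0.262,
    "entirehome": 2.365,
    "minimum_nights": -0.027,
    "availability_365": 0.003,
    "regionSE": 0.019,
    "regionNW": -0.460,
    "regionNE": -0.073,
}

def probability_expensive(trusted, entirehome, minimum_nights, availability_365, region="SW"):
    """Predicted probability that a listing costs more than 100 euros per night."""
    b = LOGIT_COEFFICIENTS
    log_odds = (
        b["Constant"]
        + b["trusted"] * trusted
        + b["entirehome"] * entirehome
        + b["minimum_nights"] * minimum_nights
        + b["availability_365"] * availability_365
    )
    if region != "SW":  # South-west is the reference group and has no coefficient
        log_odds += b["region" + region]
    odds = math.exp(log_odds)
    return odds / (1 + odds)

baseline = probability_expensive(trusted=0, entirehome=0, minimum_nights=0, availability_365=0)
entire_home = probability_expensive(trusted=0, entirehome=1, minimum_nights=0, availability_365=0)
print(f"baseline listing: {baseline:.3f}")
print(f"same listing as an entire home: {entire_home:.3f}")
```

For the baseline listing the odds of 0.28 correspond to a probability of about 0.22, and switching to an entire home raises it to about 0.75.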
Table 4 summarizes the confidence intervals. We are 95% confident that the true odds ratio for an Airbnb being expensive when the customer chooses to rent the entire home is between 6.34 and 18.49, after accounting for trust, minimum nights, and region of Amsterdam. Additionally, we are 95% confident that the true odds ratio for an Airbnb being expensive for a 1 day increase in availability during the year is between 1.0008548 and 1.0064249, after accounting for trust, minimum nights, and region of Amsterdam.
Table 5 shows the coefficients and significance levels for all of the variables in our linear and logistic regressions. In the logistic regression, the p-values of entire home rental and the number of available days are both less than 0.05, so entire home and the number of available days are significant predictors of an Airbnb rental being expensive.
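One common way to obtain a confidence interval for an odds ratio is the Wald approximation: take the coefficient plus or minus 1.96 standard errors on the log-odds scale and exponentiate both ends. The sketch below applies it to the entire home coefficient and standard error from Table 5; the function name is hypothetical. The result is close to, but not identical with, the interval quoted above, which suggests the reported interval was computed with a different method (for example a profile-likelihood interval; this is an assumption).

```python
import math

def wald_odds_ratio_interval(coefficient, std_error, z=1.96):
    """Return (odds_ratio, lower, upper) using a Wald interval on the log-odds scale."""
    lower = math.exp(coefficient - z * std_error)
    upper = math.exp(coefficient + z * std_error)
    return math.exp(coefficient), lower, upper

# Entire home: coefficient 2.365 with standard error 0.272 (Table 5, column 2).
odds_ratio, lower, upper = wald_odds_ratio_interval(2.365, 0.272)
print(f"odds ratio {odds_ratio:.2f}, 95% Wald interval ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the log-odds scale, it is not symmetric around the odds ratio once exponentiated.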
Dependent variable: | log(price) | expensive |
---|---|---|
 | OLS | logistic |
 | (1) | (2) |
trusted1 | -0.046 | 0.262 |
 | (0.042) | (0.218) |
entirehome | 0.531^{***} | 2.365^{***} |
 | (0.051) | (0.272) |
minimum_nights | -0.010^{**} | -0.027 |
 | (0.005) | (0.023) |
availability_365 | 0.001^{***} | 0.003^{**} |
 | (0.0002) | (0.001) |
regionSE | -0.007 | 0.019 |
 | (0.057) | (0.296) |
regionNW | -0.065 | -0.460 |
 | (0.057) | (0.287) |
regionNE | -0.021 | -0.073 |
 | (0.065) | (0.336) |
Constant | 4.447^{***} | -1.261^{***} |
 | (0.065) | (0.340) |
Observations | 500 | 500 |
R^{2} | 0.212 | |
Adjusted R^{2} | 0.201 | |
Log Likelihood | | -276.008 |
Akaike Inf. Crit. | | 568.016 |
Residual Std. Error | 0.457 (df = 492) | |
F Statistic | 18.901^{***} (df = 7; 492) | |
Note: | ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 | |
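The significance stars in the regression table can be roughly reproduced from a coefficient and its standard error. The sketch below uses a normal approximation for the two-sided p-value. That is an assumption: the OLS column would strictly use a t distribution with 492 degrees of freedom, which is very close to normal at this sample size. The function names are hypothetical, and because the table values are rounded, borderline cases may not match the published stars exactly.

```python
import math

def two_sided_p_value(coefficient, std_error):
    """Two-sided p-value for coefficient / std_error under a normal approximation."""
    z = coefficient / std_error
    return math.erfc(abs(z) / math.sqrt(2))

def stars(p_value):
    """Map a p-value to the star notation used in the table note."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""

# (coefficient, standard error) pairs taken from the regression table.
examples = {
    "entirehome (OLS)": (0.531, 0.051),
    "minimum_nights (OLS)": (-0.010, 0.005),
    "trusted1 (OLS)": (-0.046, 0.042),
    "regionNW (logistic)": (-0.460, 0.287),
}
labels = {name: stars(two_sided_p_value(b, se)) for name, (b, se) in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label or 'not significant'}")
```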
There are many factors that Airbnb customers take into account when determining which listing to rent. In our analysis, the type of listing (entire home vs. room in a home) and the number of days available are statistically significant in both models, after accounting for trust, minimum nights, and region of Amsterdam; the minimum number of nights is also significant in the linear model. From an economic perspective, the price of Airbnb listings is determined by the supply of and demand for Airbnb rentals. Rentals with more available days may become more frequented and perhaps more competitive, resulting in a higher price. This may be the reason why the number of days available is a significant predictor. As for entire home vs. shared space, we think that an entire home is perceived as a more comfortable and safer rental, a characteristic for which customers are willing to pay. Thus, whether or not the Airbnb is an entire home affects the price significantly. The intercept, which corresponds to a baseline listing in the South-west of Amsterdam, is also significant, but none of the region indicators is, so we cannot conclude that a location in the South-west by itself predicts price. The South-west region contains part of the city center and the museum quarter, which are likely touristy areas, and the estimated coefficients for the other three regions in the linear model are all slightly negative. This pattern runs in the opposite direction to the conclusion of Perez-Sanchez et al.’s study, but the differences are not statistically significant. We did not examine any interaction terms, because we did not suspect any interactions during the study.
Future studies might examine prices in Amsterdam using more contextualized information from other disciplines. Our approach of dividing the city into four regions based on cardinal directions is a concept that might work in cities in the US and other countries around the world, but it does not seem to reflect the structure of Amsterdam well.
Our dataset did not have enough information about which year the listings were from; a longitudinal study would be more appropriate for accurately modelling the factors that influence prices in Amsterdam. We are also relying on the authors of the dataset for the completeness and accuracy of the data.
Wickham, Hadley (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse
Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Perez-Sanchez, V. Raul, Serrano-Estrada, Leticia, Marti, Pablo, & Mora-Garcia, Raul-Tomas (2018). The What, Where, and Why of Airbnb Price Determinants. Sustainability, 10(12), 4596.
Zhang, Zhihua, Chen, Rachel J. C., Han, Lee D., & Yang, Lu (2017). Key Factors Affecting the Price of Airbnb Listings: A Geographically Weighted Approach. Sustainability, 9(9), 1635.