Explore how big data can improve residential property price monitoring in Indonesia, focusing on the case of Jakarta. Learn about challenges, solutions, and data preparation methods for creating an alternative Residential Property Price Index (RPPI) using hedonic method.
An alternative Hedonic RPPI for Indonesia using Big Data: The Case of Jakarta* Arief Noor Rachman, Real Sector Statistics Division, Bank Indonesia - Luxembourg, February 21, 2019 - *All views expressed are those of the author and do not necessarily represent the views of Bank Indonesia
Introduction • Monitoring property price dynamics is a necessary task for central banks in order to maintain financial stability in the economy. • Property statistics provide an early sign of economic cycle movements. Rising property prices often lead into an expansionary phase, whereas falling property prices indicate a contractionary phase. • Calculating an RPPI is tricky because houses are heterogeneous and infrequently sold. • Infrequent transactions and heterogeneity cause a quality problem: differences in characteristics across houses are hard to control for when transactions are few. • The hedonic method is widely known as a quality-mix adjustment method.
Current Condition: Bank Indonesia’s Existing Residential Property Price Index (RPPI)
Primary Market • Data Collection: Survey of major developers (purposive sampling) • Questionnaire questions: price, lot and building size, number of units built and sold, and qualitative questions • Methodology: Chain index • Coverage: 18 major cities • Published quarterly (6 weeks after the end of each quarter)
Secondary Market • Data Collection: Appraisal from a consultant; 1 house (sample) and 3 comparison houses for each sample area • Methodology: Appraisal method; house price = market price for land value + cost price for building value • Coverage: 10 major cities • Internal use only since 2013 (quarterly, 6 weeks after the end of each quarter)
Challenges: • Granularity in data collection • Inadequate time series of granular data following the questionnaire refinement • Data collection problems: issues in compiling sales data for the primary market and house price data for the secondary market • Limited/unpublished market prices for secondary market appraisal
Big Data • Digital technology has grown very fast and has created an extra-large amount of digital data known as “Big Data”. • Big data benefits: • creating new indicators • bridging time lags in the availability of existing official statistics • an advanced, faster and inexpensive source of data to produce official statistics • less burden for respondents • Big data challenges: • data quality concerns • legal access to the data • continuity of data access • technology and advanced skills requirements
Utilization of Big Data for Official Statistics • How can “big data” benefit Bank Indonesia in monitoring the residential property market? • Developed an alternative RPPI from property advertisement web portals to challenge the existing RPPI. • The new RPPI employs the hedonic method to calculate robust asking price indexes for secondary market property (used houses) in five districts in Jakarta. • Asking prices are still a feasible solution for monitoring purposes, especially in the absence of declared transaction data such as administrative data from the land registry or property tax records.
Data Sources • Property online ads from the 2 biggest online property websites • Data properties: • Ads ID • Title • Status of property: sell/rent • Type of property (house/apartment/villa/condotel/condominium) • Ads date: start & end date/sold date • Property price • Lot & building size • Number of bedrooms & bathrooms • Address • Description
Workflow: Portal's FTP/HTTP Server • Non-Disclosure Agreement (NDA) with the property websites for data acquisition • The property portals shared the data using FTPS/HTTPS. • Loaded into Hadoop: ≈ 2.2 million ads/month
Data Issues (1) Human error in data entry, e.g.: • Price = Rp. 0, or Price = Rp. 16 trillion ($1.2 billion) for a small property • Land size = 0 sqm, land size = 1 sqm • Typos in city/regency names (2) Non-standardized address data (free-text field) • District/sub-district, e.g.: Bogor, Bgr • Street name without district name, e.g.: Jl. Kesadaran Sukmajaya (3) Duplicate ads, caused by: • One property advertised by more than one seller on a single portal • One property advertised by one seller across portals • Ads re-posted after the expiration date
Data Preparation/Pre-Processing
City Detection • Resolve the addresses in the data to the city level. • Some portals do not provide city/district data. • Map district/sub-district to city/regency using the Statistics Indonesia city/regency list. • Map address to city/regency using the Google Maps Geocoding API. • Example: Kampung Rambutan → South Jakarta; Jl. Kesadaran Sukmajaya → Depok
Cleansing • Deletion of irrelevant characters, such as HTML tags, in the advertisement title and description.
Duplicates Removal • Advertisements are considered identical if: • They have the same attribute values for city/regency, land size, building size, number of bathrooms, and number of bedrooms. • Price difference ≤ 5% • String similarity score for address and ad title ≥ 0.8 (on a scale of 0 to 1) using Levenshtein distance.
Column Mapping • Different portals have distinct formats and column structures. • Column structure, column names, and delimiters need to be standardized across portals.
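The duplicate rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how the Levenshtein distance is normalized to a 0 to 1 score or what the 5% price difference is measured against, so the sketch assumes `1 - distance / longer_length` and a difference relative to the larger price. The field names (`city`, `land_size`, and so on) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus distance over the longer string length."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


# Hypothetical column names for the attributes that must match exactly.
EXACT_KEYS = ("city", "land_size", "building_size", "bathrooms", "bedrooms")


def is_duplicate(x: dict, y: dict, price_tol: float = 0.05, sim_min: float = 0.8) -> bool:
    """Apply the three conditions: equal attributes, close price, similar text."""
    if any(x[k] != y[k] for k in EXACT_KEYS):
        return False
    # Assumption: the 5% tolerance is taken relative to the larger of the two prices.
    if abs(x["price"] - y["price"]) > price_tol * max(x["price"], y["price"]):
        return False
    return (similarity(x["address"], y["address"]) >= sim_min
            and similarity(x["title"], y["title"]) >= sim_min)
```

Comparing every pair of ads is quadratic, so at roughly 2.2 million ads per month a real pipeline would presumably first group ads by the exact-match attributes and only compare within each group.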
Edits and Statistical Trimming • Removed records with missing values for price, lot size, building size, or number of bedrooms and bathrooms. • Removed spurious price values using a median absolute deviation (MAD) test on price per unit of property size and building size. • Based on a normal distribution assumption, we: • Cut records with a lot size greater than 600 square meters • Deleted records with more than 10 bedrooms • Deleted records with more than 8 bathrooms • Performed two regressions: the first to detect outliers, which were then removed using Cook's distance; the second to produce the indexes. • Data records:
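A minimal sketch of a MAD-based trimming step, assuming the common modified z-score form of the test. The slides do not give the cutoff or the exact statistic, so the 0.6745 scaling and the threshold of 3.5 are assumptions, and `mad_keep_mask` is a hypothetical helper name.

```python
from statistics import median


def mad_keep_mask(values, threshold=3.5):
    """Return True for values to keep, False for suspected outliers.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where 0.6745
    makes the MAD comparable to a standard deviation under normality.
    The threshold is an assumed default, not a value from the slides.
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values are identical; the score is undefined,
        # so this sketch keeps everything rather than guessing.
        return [True] * len(values)
    return [abs(0.6745 * (v - med) / mad) <= threshold for v in values]


# Illustrative (made-up) prices per square meter, in million rupiah.
price_per_sqm = [10, 11, 12, 11, 10, 12, 500]
mask = mad_keep_mask(price_per_sqm)
cleaned = [v for v, keep in zip(price_per_sqm, mask) if keep]
```

The median and MAD are used instead of the mean and standard deviation because extreme entries such as a Rp. 16 trillion price would otherwise inflate the very yardstick used to detect them.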
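Hedonic Model • Model: semi-log time dummy variable: $\ln p_n^t = \beta_0 + \sum_{\tau=1}^{T} \delta^{\tau} D_n^{\tau} + \sum_{k=1}^{K} \beta_k z_{nk}^t + \varepsilon_n^t$ Where: • $p_n^t$ is the price of property n at time t; • $z_{nk}^t$ is the k-th characteristic variable of property n at time t; • $\beta_0$ and $\beta_k$ are the intercept and the house characteristics parameters; • $\delta^{\tau}$ are the time dummy coefficients, with $D_n^{\tau}$ equal to 1 if property n is observed in period $\tau$ and 0 otherwise; • $\varepsilon_n^t$ is the error term. Variables • Price (natural log) • Building size • Lot size • Number of bedrooms (3 dummies): dummy 1&2, dummy 3, and dummy greater than 4 (>4); reference (4) • Number of bathrooms (3 dummies): dummy 1, dummy 2, and dummy greater than 3 (>3); reference (3) *Reference categories are the most frequent number of bedrooms and number of bathrooms. • Rolling window technique to calculate index: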
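As a hedged sketch of how a semi-log time dummy model is usually turned into an index: with the base-period dummy omitted, the exponentiated time dummy coefficient gives the quality-adjusted price change relative to the base period. The slides do not state the window length or the splicing rule used with the rolling window, so the window length $W$ and the movement-splice rule in the second line are assumptions; $\hat{\delta}^{t}_{[w]}$ denotes the estimate for period $t$ from the window $w$ covering periods $t-W+1,\dots,t$.

```latex
% Index for period t relative to base period 0 (base-period dummy omitted)
I^{0,t} = 100 \cdot \exp\!\left(\hat{\delta}^{t}\right)

% Assumed rolling-window update: re-estimate on the latest W periods
% and chain only the last period-on-period movement onto the index
I^{t} = I^{t-1} \cdot \exp\!\left(\hat{\delta}^{t}_{[w]} - \hat{\delta}^{t-1}_{[w]}\right)
```

Chaining only the latest movement leaves already published index values unrevised when a new period of ads is added.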
Regression Result • Result • Explanatory variables were statistically significant, stable, and in line with a priori expectations. • High explanatory power. • Heteroscedasticity issue: • Robust standard errors were used to obtain reliable t-statistics.
Conclusion and future work • Our hedonic index compilation showed promising results and has the potential to become the official RPPI in the future. • The regression outputs represent robust “baseline” models for index compilation. • Our web listing advertisement observations seem more homogeneous in nature, as indicated by the high explanatory power given the limited characteristics variables available. • Smoothing may be a better option for the published index in order to reduce short-term volatility. • For further development, extend the coverage to other large cities in Indonesia (depending on the suitability of the listing data and the relative importance of cities according to their national property market share). • Regularly review the model performance and update the weights. • Enhance the models by including a more granular spatial adjustment and other characteristics. • Concerns about data quality and long-term data continuity still exist.
Hedonic indexes (Smoothed) • We compiled the district indexes into a Total Jakarta index using individual mortgage value data as weights. • Mortgage data are used as a proxy for property transaction value in the absence of a more representative measure of property market structure, such as tax revenues from property transfer transactions.
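The aggregation step can be sketched as a weighted average of district indexes. The slides do not give the aggregation formula, so a weighted arithmetic mean with weights proportional to mortgage value is an assumption; the district labels and all numbers below are illustrative placeholders, not results.

```python
def aggregate_index(district_index: dict, mortgage_value: dict) -> float:
    """Weighted arithmetic mean of district indexes.

    Weights are each district's share of total mortgage value, used here
    as a proxy for its share of property transaction value.
    """
    total = sum(mortgage_value[d] for d in district_index)
    if total <= 0:
        raise ValueError("total mortgage value must be positive")
    return sum(district_index[d] * mortgage_value[d] / total for d in district_index)


# Illustrative placeholder inputs (not actual data).
district_index = {"district_a": 100.0, "district_b": 110.0}
mortgage_value = {"district_a": 1.0, "district_b": 3.0}
total_index = aggregate_index(district_index, mortgage_value)
```

Because the weights are shares that sum to one, the total index always lies between the lowest and highest district index.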