I'm working on a scraper that collects real estate data from a certain website. It works as a standalone .rb script that saves to a JSON file, but I want it to run on Heroku as a rake task and save the data to MongoDB.
Problem:
I keep getting the following errors when running the rake task:
rake aborted!
SyntaxError: /Users/user/Dropbox/Development/Rails/booyah/lib/tasks/properties_for_sale.rake:35: syntax error, unexpected tLABEL
street_name: @street_name,
^
/Users/user/Dropbox/Development/Rails/booyah/lib/tasks/properties_for_sale.rake:41: syntax error, unexpected tLABEL, expecting '='
bedrooms: @rooms[1],
^
/Users/user/Dropbox/Development/Rails/booyah/lib/tasks/properties_for_sale.rake:42: syntax error, unexpected tLABEL, expecting '='
number_of_floors: @number_of_floors,
^
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/engine.rb:654:in `load'
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/engine.rb:654:in `block in run_tasks_blocks'
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/engine.rb:654:in `each'
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/engine.rb:654:in `run_tasks_blocks'
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/application.rb:362:in `run_tasks_blocks'
/Users/user/.rvm/gems/ruby-2.1.1/gems/railties-4.1.1/lib/rails/engine.rb:449:in `load_tasks'
/Users/user/Dropbox/Development/Rails/booyah/Rakefile:6:in `<top (required)>'
/Users/user/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `eval'
/Users/user/.rvm/gems/ruby-2.1.1/bin/ruby_executable_hooks:15:in `<main>'
This is the code I'm using:
require 'mechanize'
namespace :properties_for_sale do
  desc "Scrape all properties currently for sale"
  task :start => :environment do
    a = Mechanize.new
    @a2 = Mechanize.new
    @i = 1
    BASE_URL = 'http://www.funda.nl'
    def scrape_objects_on_page(page)
      objects_on_page = page.search('//*[contains(concat( " ", @class, " " ), concat( " ", "object-street", " " ))]')
      objects_on_page.each do |object|
        @a2.get(BASE_URL + object[:href] + 'kenmerken/') do |page_2|
          break if page_2.title == '404 - Pagina niet gevonden'
          @street_name = page_2.search('//*[@id="main"]/div[1]/div/div/div/h1').text.strip
          @price = page_2.search('//*[@id="main"]/div[1]/div/div/div/p[2]/span/span').text.strip.gsub("€ ", "").gsub(".", "").to_i
          @url = page_2.uri.to_s
          @living_area = page_2.search('//*[@id="twwo13"]/td/span[1]').text.strip.gsub(" m²", "").to_i
          @content = page_2.search('//*[@id="twih12"]/td/span[1]').text.strip.gsub(" m³", "").to_i
          @rooms = page_2.search('//*[@id="aaka12"]/td/span[1]').text.strip.scan(/\d/).to_i
          @number_of_floors = page_2.search('//*[@id="twva12"]/td/span[1]').text.strip.to_i
          @year = page_2.search('//*[@id="boja12"]/td/span[1]').text.strip.to_i
          @broker = page_2.search('//*[contains(concat( " ", @class, " " ), concat( " ", "rel-info", " " ))]//h3').text.strip
          @city = page_2.search('//*[@id="nav-path"]/div/p[1]/span[4]/a/span').text.strip
          @district = page_2.search('//*[@id="nav-path"]/div/p[1]/span[5]/a/span').text.strip
          @province = page_2.search('//*[@id="nav-path"]/div/p[1]/span[3]/a/span').text.strip
          @type_of_property = page_2.search('//*[@id="soap12"]/td/span[1] | //*[@id="sowo12"]/td/span[1] | //*[@id="twsp12"]/td/span[1]').text.strip
          Property.create = (
            street_name: @street_name,
            price: @price,
            url: @url,
            living_area: @living_area,
            content: @content,
            rooms: @rooms[0],
            bedrooms: @rooms[1],
            number_of_floors: @number_of_floors,
            year: @year,
            broker: @broker,
            city: @city,
            district: @district,
            province: @province,
            type_of_property: @type_of_property
          )
          puts Property.last
        end
      end
    end
    loop do
      a.get("http://www.funda.nl/koop/rotterdam/sorteer-datum-af/p#{@i}/") do |page|
        @end = page.search('//h3').text == 'Geen koopwoningen gevonden die voldoen aan uw zoekopdracht' ? true : false
        scrape_objects_on_page(page) unless @end == true
        @i = @i + 1
      end
      break if @end
    end
    puts "==================================================================================="
    puts "# Done scraping #{@i - 1} pages and collected #{@all_objects_array.length} objects."
    puts "==================================================================================="
  end
end
This is what my Property model looks like (MongoMapper):
class Property
  include MongoMapper::Document
  key :street_name, String
  key :price, Integer
  key :url, String
  key :living_area, Integer
  key :content, Integer
  key :rooms, Integer
  key :bedrooms, Integer
  key :number_of_floors, Integer
  key :year, Integer
  key :broker, String
  key :city, String
  key :district, String
  key :province, String
  key :type_of_property, String
end
What am I doing wrong?
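You have a typo: remove the equals sign between Property.create and the opening parenthesis. With the equals sign, Ruby parses the line as an assignment whose right-hand side is a parenthesized expression, so the first key: value pair inside the parentheses is an unexpected label (the tLABEL in the error message). The call should look like this: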
Property.create(
  street_name: @street_name,
  price: @price,
  url: @url,
  living_area: @living_area,
  content: @content,
  rooms: @rooms[0],
  bedrooms: @rooms[1],
  number_of_floors: @number_of_floors,
  year: @year,
  broker: @broker,
  city: @city,
  district: @district,
  province: @province,
  type_of_property: @type_of_property
)
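To see the difference in isolation, here is a minimal sketch that parses both forms with eval. DemoModel and parses? are hypothetical names made up for this illustration, not part of MongoMapper or the code above; DemoModel only stands in for the model so the snippet runs without a database.

```ruby
# Hypothetical stand-in for the model; create just returns what it was given.
class DemoModel
  def self.create(attrs = {})
    attrs
  end
end

# Hypothetical helper: true if the snippet parses and runs,
# false if Ruby rejects it with a SyntaxError.
def parses?(source)
  eval(source)
  true
rescue SyntaxError
  false
end

# With "=", the parentheses wrap a plain expression, so "street_name:" is
# an unexpected label.
broken = parses?('DemoModel.create = (street_name: "Example Street 1")')

# Without "=", the parentheses delimit the method's arguments and the
# key: value pairs are collected into a hash.
fixed = parses?('DemoModel.create(street_name: "Example Street 1")')

puts "with '=' parses: #{broken}"
puts "without '=' parses: #{fixed}"
```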
Also, consider storing the return value of Property.create in a variable instead of calling Property.last afterwards. That way, you don't have to issue another query.
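A rough sketch of that second suggestion follows. FakeProperty is a hypothetical in-memory stand-in written only for this example, so that it runs without MongoDB; it assumes that create returns the newly created record (which is how I'd expect Property.create to behave) and counts calls to last to show that no extra lookup happens.

```ruby
# Hypothetical in-memory stand-in for the model (an assumption for this
# sketch, not MongoMapper itself). It counts how often last is called.
class FakeProperty
  @records = []
  @queries = 0

  class << self
    attr_reader :queries

    # Stores the record and returns it, mirroring the assumed behaviour
    # of create on the real model.
    def create(attrs = {})
      record = new(attrs)
      @records << record
      record
    end

    # Simulates the extra query that the stored variable avoids.
    def last
      @queries += 1
      @records.last
    end
  end

  attr_reader :attrs

  def initialize(attrs)
    @attrs = attrs
  end
end

# Keep the return value instead of looking the record up again.
property = FakeProperty.create(street_name: "Example Street 1", price: 250_000)
puts property.attrs[:street_name]
puts "lookups issued: #{FakeProperty.queries}"
```

In the rake task this would amount to assigning the result of Property.create to a local variable and passing that variable to puts.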