Technical Module

last updated 10/19/14

Slides

Slide show available here

Prototype

Scraping ACRIS with Python:

Sample HTML table obtained from ACRIS through a POST request.

Screen shot from QGIS showing study area in Prospect Lefferts Gardens.


Sample of data exported from Map-PLUTO using QGIS:

Final Python code used for ACRIS web scraping:

import sys
from sys import argv
import csv
import requests
import bs4
import time
import random

script, filename = argv

# f = open(filename)

url = "http://a836-acris.nyc.gov/DS/DocumentSearch/BBLResult"
headers = {'User-Agent' : 'Mozilla/5.0'}

count = 0

with open(filename, 'rb') as f:
    reader = csv.reader(f)
    next(reader, None)

    try:
        for row in reader:
            print row
            # example URL from ACRIS: http://a836-acris.nyc.gov/bblsearch/bblsearch.asp?borough=3&block=1306&lot=35
            block = row[1]
            lot = row[2]
            address = row[11]
            #print  "the block is %s and the lot is %s" % (block, lot)
            url2post = "http://a836-acris.nyc.gov/bblsearch/bblsearch.asp?borough=3&block=%s&lot=%s" % (block,lot)
            print "the address is %s and the url is: %s" % (address, url2post)

            data = {
            'hid_borough':'3', 
            'hid_borough_name':'BROOKLYN / KINGS', 
            'hid_block':block, 
            'hid_block_value': block, 
            'hid_lot':lot, 
            'hid_lot_value': lot,
            'hid_doctype_name':'All Document Classes',
            'hid_max_rows':'10',
            'hid_page':'1',
            'hid_SearchType':'BBL',
            'hid_ISIntranet':'N'
            }
            print data

            t = open(address + ".csv", 'w+')
            # write column headers
            t.write("Reel/Pg/File,CRFN, Lot, Partial, Doc Date, Recorded / Filed, Document Type, Pages, Party1, Party2, Party3 / Other, More Party 1/2 Names, Corrected / Remarks, Doc Amount\n")

            response = requests.post(url, headers=headers,data=data)
            soup = bs4.BeautifulSoup(response.text)
            print response

            table = soup.find(attrs={"cellspacing":"1","width":"100%"})

            # iterate over the table, skipping its first row
            for tr in table.find_all('tr')[1:]:
                wrote_cell = False
                # skip the first cell of each row
                for td in tr.find_all('td')[1:]:
                    for font in td.find_all('font'):
                        value = font.string
                        print value
                        try:
                            # drop embedded commas so they don't break the CSV
                            value = value.replace(',','')
                            t.write(value.strip())
                            t.write(',')
                        except Exception:
                            # the font tag had no single string (value is None)
                            t.write('*,')
                        wrote_cell = True
                # end the CSV line once per table row
                if wrote_cell:
                    t.write('\n')

            t.close()
            time.sleep(random.randrange(32,48))
    except csv.Error as e:
        sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
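
The script above strips commas from each value so that its hand-built CSV lines stay aligned. A minimal sketch of one alternative, assuming Python 3 and the standard-library csv module, which quotes such values instead of altering them. The form fields are the ones posted above; build_form_data and rows_to_csv are hypothetical helper names, and the sample row is made up for illustration.

```python
import csv
import io


def build_form_data(block, lot, borough="3", borough_name="BROOKLYN / KINGS"):
    """Return the POST fields used by the script above for one tax lot."""
    return {
        "hid_borough": borough,
        "hid_borough_name": borough_name,
        "hid_block": block,
        "hid_block_value": block,
        "hid_lot": lot,
        "hid_lot_value": lot,
        "hid_doctype_name": "All Document Classes",
        "hid_max_rows": "10",
        "hid_page": "1",
        "hid_SearchType": "BBL",
        "hid_ISIntranet": "N",
    }


def rows_to_csv(rows, out):
    """Write rows with csv.writer, which quotes values that contain commas."""
    writer = csv.writer(out)
    for row in rows:
        # None stands for a cell with no text; keep the script's '*' marker.
        writer.writerow(["*" if v is None else v.strip() for v in row])


# Made-up sample row, for illustration only (not real ACRIS data).
sample = [["2014000000000", " 35 ", None, "DEED", "DOE, JANE"]]
buf = io.StringIO()
rows_to_csv(sample, buf)
print(buf.getvalue())
```

With this approach a party name such as "DOE, JANE" keeps its comma and is simply quoted in the output file.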

Building on the ACRIS data scraping to design a map-based UI:

Landing page for the UI

Zooming in to select a lot

Hovering on a lot reveals basic info

Drawing a rectangle to select lots to retrieve transaction history
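
The rectangle selection could be resolved to a list of lots in several ways. A minimal sketch of one of them, assuming each lot carries a centroid coordinate and counts as selected when that centroid falls inside the drawn rectangle. The Lot type, its fields, lots_in_rectangle, and the coordinates are all hypothetical; the actual UI may test against the lot polygons instead.

```python
from typing import NamedTuple


class Lot(NamedTuple):
    block: str
    lot: str
    x: float  # centroid x coordinate (hypothetical field)
    y: float  # centroid y coordinate (hypothetical field)


def lots_in_rectangle(lots, corner_a, corner_b):
    """Return the lots whose centroid lies inside the rectangle.

    The rectangle is given by any two opposite corners, so the user can
    drag in any direction.
    """
    min_x, max_x = sorted((corner_a[0], corner_b[0]))
    min_y, max_y = sorted((corner_a[1], corner_b[1]))
    return [
        lot for lot in lots
        if min_x <= lot.x <= max_x and min_y <= lot.y <= max_y
    ]


# Made-up coordinates, for illustration only.
lots = [
    Lot("1306", "35", 2.0, 3.0),
    Lot("1306", "36", 5.0, 5.0),
    Lot("1307", "1", 9.0, 1.0),
]
selected = lots_in_rectangle(lots, (6.0, 6.0), (1.0, 2.0))
print([(l.block, l.lot) for l in selected])
```

The selected block and lot pairs are what the scraper needs to look up each transaction history.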

To Do:

  • Refine the UI, e.g., toggling the tax-lot selection, displaying retrieved tables, etc.
  • Implement the backend of the app, which stores the tables of lot transaction histories and retrieves data based on the user’s query.
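
A minimal sketch of what the backend item might look like, assuming Python 3 and the standard-library sqlite3 module. The table layout, the function names, and the sample rows are hypothetical and only cover a few of the scraped columns; nothing here is the app's actual design.

```python
import sqlite3


def init_db(conn):
    """Create one table holding a few of the scraped columns per lot."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transactions (
               borough TEXT, block TEXT, lot TEXT,
               doc_date TEXT, doc_type TEXT,
               party1 TEXT, party2 TEXT, doc_amount TEXT
           )"""
    )


def store_rows(conn, borough, block, lot, rows):
    """Store scraped rows (doc_date, doc_type, party1, party2, doc_amount)."""
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [(borough, block, lot) + tuple(r) for r in rows],
    )
    conn.commit()


def history_for_lots(conn, bbls):
    """Return stored transactions for each (borough, block, lot) queried."""
    results = []
    for borough, block, lot in bbls:
        cur = conn.execute(
            "SELECT block, lot, doc_date, doc_type, doc_amount "
            "FROM transactions "
            "WHERE borough = ? AND block = ? AND lot = ?",
            (borough, block, lot),
        )
        results.extend(cur.fetchall())
    return results


# Made-up rows, for illustration only (not real ACRIS data).
conn = sqlite3.connect(":memory:")
init_db(conn)
store_rows(conn, "3", "1306", "35", [("1/2/2001", "DEED", "A", "B", "100")])
print(history_for_lots(conn, [("3", "1306", "35")]))
```

A user's query then reduces to a list of borough, block, and lot triples passed to history_for_lots.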

Design Brief

A PDF of the design brief is available here