Does your application make use of third-party datasets that augment your own information or services? In many cases, those datasets come from APIs that aren’t real-time or are expensive to access. Using scriptr.io’s integrated storage and scheduler, we can build a system that caches that data and automatically refreshes it on a set interval. Read on for an example that uses NYC’s Open Data to illustrate.

NYC’s Open Data is an initiative to make as much of the city’s data publicly accessible as possible to power “smart city” applications. There are 1,300+ datasets available in a variety of formats, covering an enormous range of topics. For our example, we’ll use the 311 Service Request API, a simple REST endpoint that returns JSON. You don’t even need to sign up for light usage like ours.

You can view the raw payload in your browser; the URL includes a single filter that returns only requests routed to the Department of Health and Mental Hygiene (DOHMH). In scriptr.io, we can grab this easily using the http module:

var http = require('http'); // scriptr.io's built-in HTTP client module
var requestConfig = {
  // 311 Service Requests, filtered to DOHMH and capped at 20 records
  url: 'https://data.cityofnewyork.us/resource/fhrw-4uyv.json?agency=DOHMH&$limit=20'
};
var response = http.request(requestConfig);
console.log(response.body);

Once you’ve pasted the above into a new script (call it ‘cache’), save and run it, and you should see the JSON response in the console. Next, let’s convert the data into a string and save the whole thing:

storage.global.complaints = JSON.stringify(response.body);
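
One optional refinement (a sketch, not part of the original tutorial): since storage values are strings anyway, you can wrap the cached payload with a timestamp so consumers can tell how fresh the data is. The `buildCacheEntry` helper below is hypothetical; if you adopt it, the ‘complaints’ script would parse the entry and read its `data` field instead of the raw array.

```javascript
// Hypothetical helper: bundle the response body with a freshness timestamp.
// Storage values are strings, hence the JSON.stringify round-trip.
function buildCacheEntry(body, now) {
  return JSON.stringify({
    updatedAt: now, // e.g. new Date().toISOString()
    data: body      // the parsed 311 response
  });
}

// In the 'cache' script you would then store it like:
// storage.global.complaints = buildCacheEntry(response.body, new Date().toISOString());
```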

Here we’re using scriptr.io’s global storage object, which makes the data accessible from any other script. Next, create a new script called ‘complaints’. This will be our actual endpoint for accessing the cached data. We can grab it and return it in one line:

return storage.global.complaints;

Let’s make it a little more robust and allow a user to filter the response by borough. If you enable “Allow Anonymous Requests” and set up your sub-domain, a filtered request would be as simple as a GET to ‘http://yoursubdomain.scriptrapps.io/complaints?borough=QUEENS’. We’ll access those params and produce a filtered response with:

var complaintsArray = JSON.parse(storage.global.complaints);
if (request.parameters.borough) {
  return complaintsArray.filter(function(element) {
    return element.borough === request.parameters.borough;
  });
} else {
  return complaintsArray;
}
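
As a small refinement (a sketch, not part of the original scripts): borough values in the 311 dataset are upper-case (e.g. “QUEENS”), so normalizing the incoming parameter makes the filter more forgiving of caller input. The `filterByBorough` helper below is hypothetical.

```javascript
// Hypothetical helper: filter complaints by borough, tolerating
// lower-case or padded input like " queens ".
function filterByBorough(complaints, borough) {
  if (!borough) {
    return complaints; // no filter requested: return everything
  }
  var wanted = borough.trim().toUpperCase();
  return complaints.filter(function(element) {
    return element.borough === wanted;
  });
}
```

In the ‘complaints’ script, the body above would collapse to `return filterByBorough(complaintsArray, request.parameters.borough);`.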

You can easily make sure this works by running a test request with parameters right from the IDE. Try it with borough=QUEENS.

So now we’ve got a working API with a filter, and we can do all sorts of things with this data: accept a lat/lng in the params and return the closest complaint, or all complaints within a certain radius! Calculate metadata like total counts of various complaint types in certain neighborhoods! The sky is the limit, but to finish off we want to make sure our cached data stays fresh. To do that, we’re going to schedule the ‘cache’ script to run every day and pull the latest data. Find the ‘Schedule’ button above your script, set the interval, and you’re off!
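
Here are sketches of the two extensions suggested above (hypothetical helpers, not part of the original scripts). They assume each record exposes the 311 dataset’s `latitude`, `longitude` (strings), and `complaint_type` fields.

```javascript
// Great-circle distance in kilometers between two lat/lng points
// (haversine formula).
function distanceKm(lat1, lng1, lat2, lng2) {
  var toRad = function(d) { return d * Math.PI / 180; };
  var R = 6371; // mean Earth radius in km
  var dLat = toRad(lat2 - lat1);
  var dLng = toRad(lng2 - lng1);
  var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
          Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
          Math.sin(dLng / 2) * Math.sin(dLng / 2);
  return 2 * R * Math.asin(Math.sqrt(a));
}

// All complaints within radiusKm of a point (records without
// coordinates are skipped).
function complaintsNear(complaints, lat, lng, radiusKm) {
  return complaints.filter(function(c) {
    return c.latitude && c.longitude &&
      distanceKm(lat, lng, Number(c.latitude), Number(c.longitude)) <= radiusKm;
  });
}

// Metadata: total count of each complaint type.
function countsByType(complaints) {
  return complaints.reduce(function(counts, c) {
    var type = c.complaint_type || 'UNKNOWN';
    counts[type] = (counts[type] || 0) + 1;
    return counts;
  }, {});
}
```

Either helper could back its own request parameter on the ‘complaints’ endpoint, the same way the borough filter does.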

That’s pretty cool, but let’s imagine for a second that the Socrata platform, which NYC Open Data runs on, could fire a webhook any time updates to the dataset were published. We could simply set our ‘cache’ script as the webhook endpoint (http://yoursubdomain.scriptrapps.io/cache). Then, any time a notification came in, that script would run and refresh the cache. BOOM. The full program can be found here if you’d like to deploy this to your own scriptr.io account.