Over the past few weeks I’ve been building up Employbl and accomplished a few non-trivial things. They weren’t trivial for me at least! In this post I’ll go through some of the recent code I’ve pushed up for the Employbl project and remark a bit about the use case.
I hope that other Laravel developers find this post helpful. If nothing else it will be a way for me to reflect on what I’ve learned as a growing software engineer!
Greenhouse is a common Applicant Tracking System (ATS) that many Bay Area startups use to track job applications and store their job listing information. They offer a publicly available Job Board API to fetch job listings for a given company.
I wanted to match the companies that I have in my open source dataset with job listings, if the startup or tech company uses Greenhouse ATS.
The only information you need to get a company’s job listings via the Greenhouse public API is the company’s “Job Board URL Token”. Most of the time the job board token is the company’s name.
I created an artisan command that gets called by a database seeder. I call php artisan db:seed --force in the Laravel Forge deploy script when I deploy updates.
My function to get the job listings from the Greenhouse API is pretty straightforward. The function takes in a company name or board token and returns the response from the API call, whose body is the JSON list of jobs.
public function getGreenhouseListings($boardToken) {
    $boardToken = str_replace(' ', '', $boardToken);
    $client = new \GuzzleHttp\Client();
    $response = $client->request(
        'GET',
        'https://boards-api.greenhouse.io/v1/boards/'. $boardToken .'/jobs'
    );
    return $response;
}
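To make the shape of that JSON concrete, here is a rough sketch that decodes a response body and summarizes each job. The sample payload is made up and only includes the fields used later in this post (id, title, absolute_url, updated_at and location->name), and summarizeJobs is a hypothetical helper name, not something from Greenhouse or Guzzle.

```php
<?php

// Made-up sample payload, trimmed to the fields this post relies on.
$body = <<<'JSON'
{
  "jobs": [
    {
      "id": 3813,
      "title": "Product Manager, Growth",
      "absolute_url": "https://example.com/jobs/3813",
      "updated_at": "2019-05-01T12:00:00-04:00",
      "location": { "name": "San Francisco" }
    }
  ]
}
JSON;

/**
 * Hypothetical helper: turn a JSON response body into "title (location)" strings.
 * Returns an empty array when the body is not valid JSON or has no jobs.
 */
function summarizeJobs(string $json): array
{
    $decoded = json_decode($json);
    $summary = [];

    foreach ($decoded->jobs ?? [] as $job) {
        $summary[] = $job->title . ' (' . $job->location->name . ')';
    }

    return $summary;
}

print_r(summarizeJobs($body));
```

With a real Guzzle response, the string passed in would be (string) $response->getBody().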
I have over 700 companies in my open source dataset of Bay Area companies. I did not know in advance which companies used Greenhouse and which used other ATS systems. (The other popular one is called Lever. They have an API too. But I haven’t gathered job listings from Lever yet.)
For every company in my database I queried the Greenhouse API. If I got a valid JSON response rather than a 404 (no job board), I stored the job board for that company in a job_boards table.
One hiccup came up: the Greenhouse Job Board API returns ALL job listings for a company. Employbl is primarily a resource for companies and candidates in the Bay Area. Having all the job listings is fine for a small to mid-sized startup like Heap Analytics, Qualia or Eatsa, which may not have many locations. But over 200 of the 700 companies in my database use Greenhouse. That includes major enterprise companies like Lyft and Stripe. Those companies, and many others, hire candidates all over the world. I only want to display jobs that are remote or based in the San Francisco Bay Area.
To sort this out, I parsed the list of all the different locations and wrote them to a file. Then I uploaded that file to Airtable and picked out the winners.
For instance, Employbl pulls in job listings for the Greenhouse location “Headquarters — Mountain View” and “New York City OR San Francisco OR Seattle” because those are relevant for Bay Area candidates.
It won’t pull in listings for “Cape Town” or “New York”.
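As a rough sketch of that allow-list idea (not the exact code running on Employbl), the check boils down to normalizing whitespace and looking the location up in a hand-picked list. The helper names buildLocationLookup and isAllowedLocation are hypothetical, and the allowed list here is a single example entry.

```php
<?php

/**
 * Collapse runs of whitespace and trim both ends,
 * so "San  Francisco " compares equal to "San Francisco".
 */
function normalizeLocation(string $location): string
{
    return trim(preg_replace('/\s+/', ' ', $location));
}

/**
 * Hypothetical helper: index the hand-picked locations by normalized name.
 */
function buildLocationLookup(array $allowed): array
{
    // array_flip turns the names into keys, which makes isset() lookups cheap
    return array_flip(array_map('normalizeLocation', $allowed));
}

/**
 * Hypothetical helper: is this Greenhouse location on the hand-picked list?
 */
function isAllowedLocation(string $location, array $lookup): bool
{
    return isset($lookup[normalizeLocation($location)]);
}

$lookup = buildLocationLookup(['New York City OR San Francisco OR Seattle']);

var_dump(isAllowedLocation('New York City OR  San Francisco OR Seattle ', $lookup)); // bool(true)
var_dump(isAllowedLocation('Cape Town', $lookup));                                   // bool(false)
```

The match is exact after normalization, which is why “New York” on its own is rejected even though the allowed entry contains those words.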
Here’s the current state of the function that grabbed the location and saved the job boards for the companies:
/**
 * Save job board info for all companies that use Greenhouse
 * Write all locations for all of their job listings to a file
 */
public function getJobBoardsAndSaveAllLocations()
{
    $companies = Company::all();
    $locations = [];
    foreach ($companies as $company) {
        try {
            $boardToken = str_replace(' ', '', $company->name);
            $response = $this->getGreenhouseListings($boardToken);
            if($response->getStatusCode() == 200){
                // check if the job board exists
                $existingBoard = JobBoard::where('board_token', '=', $boardToken)->first();
                if(!$existingBoard){
                    // save the job board details
                    $jobBoard = new JobBoard();
                    $jobBoard->company_id = $company->id;
                    $jobBoard->board_token = $boardToken;
                    $jobBoard->provider = 'greenhouse';
                    $jobBoard->save();
                }
                // $decoded = json_decode($response->getBody());
                // lowercase the location, trim whitespace and strip all semicolons
                // if using in future. Semicolon used as delimiter later on...
                // foreach($decoded->jobs as $job){
                //     // collect all the locations for jobs in an array
                //
                //     if (!in_array($job->location->name, $locations)) {
                //         $locations[] = $job->location->name;
                //     }
                // }
            }
        } catch(\Exception $e){
            error_log($e->getMessage());
            continue;
        }
    }
    // write all locations to file so I can manually pick the ones I want
    // $locations_file = fopen(realpath(__DIR__ . '/../../../database/seed_data/greenhouse_locations_' . Carbon::now()->getTimestamp() .'.csv'), 'w');
    //
    // fputcsv($locations_file, $locations);
}
The next function is the handle function of my Artisan command to scrape job listings. This Artisan command gets run when I seed and deploy the codebase. It first sets all the current job listings to inactive, so they won’t show up on https://employbl.com/job-listings until I grab the job listing again. If a job listing is new, the command adds it to the database.
/**
 * Execute the console command.
 *
 * @return mixed
 */
public function handle()
{
    // Set all the jobs we currently have as inactive
    DB::table('job_listings')->where('active', '=', 1)
        ->update([ 'active' => 0 ]);
    $locations = file_get_contents(realpath(__DIR__ . '/../../../database/seed_data/greenhouse_locations.csv'));
    $locations = collect(explode(';', $locations))->map(function($location){
        return trim(preg_replace('/\s+/', ' ', $location));
    })->toArray();
    $locationsSet = new \Ds\Set($locations);
    $jobBoards = JobBoard::orderBy('board_token', 'asc')->get();
    foreach($jobBoards as $board) {
        try {
            $response = $this->getGreenhouseListings($board->board_token);
            if($response->getStatusCode() == 200){
                $decoded = json_decode($response->getBody());
                foreach($decoded->jobs as $jobPost){
                    // check that location for the job listing is legit
                    $jobLocation = '"' . trim(preg_replace('/\s+/', ' ', $jobPost->location->name)) . '"';
                    if($locationsSet->contains($jobLocation)){
                        $jobListing = JobListing::where('listing_id', '=', $jobPost->id)->first();
                        if(!$jobListing){
                            $jobListing = new JobListing();
                            echo 'Adding new job ' . $jobPost->title . " at " . $board->board_token . "\n";
                        } else {
                            echo "Updating " . $jobPost->title . " at " . $board->board_token . "\n";
                        }
                        $jobListing->title = $jobPost->title;
                        $jobListing->url = $jobPost->absolute_url;
                        $jobListing->listing_location = $jobPost->location->name;
                        $jobListing->listing_updated_at = Carbon::parse($jobPost->updated_at)->format('Y-m-d H:i:s');
                        $jobListing->listing_id = $jobPost->id;
                        $jobListing->company_id = $board->company_id;
                        $jobListing->listing_source = 'greenhouse';
                        $jobListing->active = 1;
                        $jobListing->save();
                    }
                }
            }
        } catch(\Exception $e){
            error_log($e->getMessage());
            continue;
        }
    }
}
Using this function makes sure that Employbl has the latest job listings for the 200+ companies and startups in the SF Bay Area that hire through Greenhouse as their internal Applicant Tracking System (ATS).
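The "mark everything inactive, then reactivate what the API still returns" pattern can be easier to see without the database. Here is a minimal in-memory sketch of it, assuming listings are plain arrays keyed by listing id. The function name syncListings and the sample rows are hypothetical and only there for illustration.

```php
<?php

/**
 * Hypothetical in-memory version of the sync.
 * $stored maps listing id => row; $fetched is the list of rows from the latest scrape.
 */
function syncListings(array $stored, array $fetched): array
{
    // Step 1: mark every listing we already have as inactive
    foreach ($stored as $id => $row) {
        $stored[$id]['active'] = 0;
    }

    // Step 2: insert or update each fetched listing and mark it active again
    foreach ($fetched as $row) {
        $stored[$row['id']] = ['title' => $row['title'], 'active' => 1];
    }

    return $stored;
}

$stored = [
    1 => ['title' => 'Engineer', 'active' => 1],
    2 => ['title' => 'Designer', 'active' => 1],
];
$fetched = [
    ['id' => 2, 'title' => 'Senior Designer'],
    ['id' => 3, 'title' => 'Recruiter'],
];

$result = syncListings($stored, $fetched);
print_r($result);
```

Listing 1 ends up inactive because it was not in the latest scrape, listing 2 is updated, and listing 3 is new. One consequence of this ordering is that a listing also stays inactive if its job board cannot be fetched during a run.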
That’s how I got Bay Area job listings into Employbl using the Greenhouse API, Laravel and PHP!
The next nifty piece I implemented with Employbl was changing the unique URL of each job listing. When I first updated the Employbl job listings to use Greenhouse, I added a unique page for each job.
That Laravel route looked like this:
Route::get('/job-listings/{jobListing}', 'JobListingController@show')->name('jobListings.show');
For anyone familiar with Laravel, Django or Ruby on Rails a statement like this in a routes file should look familiar.
One feature I like and currently take advantage of in Laravel is called Route Model Binding. By default, this route will match job listings on the id of the database record.
That can be changed on the JobListing model through a getter called getRouteKeyName. I’m currently doing that with blog posts:
public function getRouteKeyName() {
    return 'slug';
}
Instead of matching on /blog/123, my blog posts match on the value in the slug column of the blog posts table.
Then when sharing blog posts the url looks like this:
https://employbl.com/blog/companies-that-use-reactjs-in-bay-area
For job listings I wanted the job listing page to look like:
https://employbl.com/job-listings/product-manager-growth-opentable-san-francisco-3813
Instead of having it be /job-listings/3813. For job listings I didn’t really want to have a slug column either. I wanted the URL to be populated by the job title, company name and location. It was fine to include the id, in order to ensure uniqueness!
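Generating that URL segment could look something like the sketch below. This is an assumption about how such a slug might be built, not the code Employbl uses: slugify and jobListingSlug are hypothetical helpers, and the simple regex turns any non-ASCII characters into hyphens.

```php
<?php

/**
 * Hypothetical helper: lowercase the text and replace every run of
 * characters other than a-z and 0-9 with a single hyphen.
 */
function slugify(string $text): string
{
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($text));

    // Drop hyphens left over at either end
    return trim($slug, '-');
}

/**
 * Hypothetical helper: build "title-company-location-id".
 * The id goes last so it can be read back off the end of the string.
 */
function jobListingSlug(string $title, string $company, string $location, int $id): string
{
    return slugify($title . ' ' . $company . ' ' . $location) . '-' . $id;
}

echo jobListingSlug('Product Manager, Growth', 'OpenTable', 'San Francisco', 3813), "\n";
// product-manager-growth-opentable-san-francisco-3813
```

Inside a Laravel app, the built-in Str::slug helper could stand in for the hand-rolled slugify here.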
I found the solution right in the Laravel 5.8 docs: I could customize how Laravel retrieves the model for a bound route.
For the job listings model I added a new method called resolveRouteBinding:
/**
 * Retrieve the model for a bound value.
 *
 * @param mixed $value
 * @return \Illuminate\Database\Eloquent\Model|null
 */
public function resolveRouteBinding($value)
{
    // either get the model with the matching id
    // or get the model from the string ending in -id
    // Expected string: jobTitle-company-location-id
    $jobListingId = collect(explode('-', $value))->last();
    return $this->where('id', $jobListingId)->first();
}
When Laravel looks for the job listing that matches my url it will pull the id from the end of the string.
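The id extraction can be tried on its own, outside of Laravel. The sketch below uses a hypothetical extractListingId helper that does the same "take the last hyphen-separated segment" step, and additionally returns null when that segment is not a plain integer. That extra check is an addition for illustration and is not in the method above.

```php
<?php

/**
 * Hypothetical helper: pull the trailing id out of "title-company-location-id".
 * Returns null when the last segment is not made up of digits only.
 */
function extractListingId(string $value): ?int
{
    $segments = explode('-', $value);
    $last = end($segments);

    return preg_match('/\A\d+\z/', $last) === 1 ? (int) $last : null;
}

var_dump(extractListingId('product-manager-growth-opentable-san-francisco-3813')); // int(3813)
var_dump(extractListingId('3813'));                                                // int(3813)
var_dump(extractListingId('no-id-here'));                                          // NULL
```

A bare /job-listings/3813 still works because a string without hyphens is its own last segment.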
One nice side effect of this is that any URL anyone writes will resolve to the correct job listing page, as long as the id is at the end of the string.
This helps with SEO because the URL tells Google what is on the page. For the thousands of job listings I pull in and feature on Employbl I wanted descriptive URLs.
This is how I got them! Thanks for reading.