
I Like Kill Nerds

The blog of Australian Front End / Aurelia Javascript Developer & brewing aficionado Dwayne Charrington // Aurelia.io Core Team member.

My Experience Writing a Long-Running PHP Script to Parse News Content From the Associated Press News API

PHP · October 7, 2021

Filed under: super-specific use case with hints of generality for those wanting to write long-running PHP scripts.

For a little while now, I have been building a site with WordPress that consumes news content from the Associated Press news API and then stores the news content in WordPress.

My first iteration of the ingest engine with PHP worked quite well, but I encountered dreaded NGINX server timeouts and other issues.

With the Associated Press API, a feed request works a little like this:

  • Make request to feed endpoint
  • Some items might be returned in the first request
  • Parse items (if any) which requires making a separate request for an NITF file
  • If there is a next_page link property, make a request to it

The AP API is designed around long-running requests. Using long polling, a request stays open for about 15 seconds, after which you follow the next_page link (if one exists). The idea is that you stay perpetually connected to the API, which matters most when there is breaking news.
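The request flow can be sketched as a simple loop. This is only an illustration in shell, not the PHP ingest script itself: fetch_page is a hypothetical stand-in that returns canned data in an invented "items|next_page" format, where a real version would call the API and read next_page from the response.

```shell
#!/bin/bash
# Hypothetical stand-in for the HTTP request. It prints "<items>|<next_page>"
# for a given page; the format and page names are made up for this sketch.
fetch_page() {
    case "$1" in
        page1) echo "item-a item-b|page2" ;;
        page2) echo "|page3" ;;      # long poll ended with no new items
        page3) echo "item-c|" ;;     # no next_page, so the loop stops
        *)     echo "|" ;;
    esac
}

# Keep following next_page until a response has none.
follow_feed() {
    local url=$1 response items item
    while [ -n "$url" ]; do
        response=$(fetch_page "$url")
        items=${response%%|*}    # everything before the "|"
        url=${response#*|}       # everything after it, empty when no next page
        for item in $items; do
            echo "parsed $item"  # the real script fetches the NITF file here
        done
    done
}
```

Calling follow_feed page1 walks all three canned pages and stops once a response carries no next page.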

By their nature, PHP and web servers won’t let you just start a script and expect it to keep running reliably. PHP isn’t a language that lends itself to long-running processes, and servers like Apache or Nginx enforce request timeouts, but it can work with some patience. Many of the solutions you will find for this problem date back as far as 2010.

My solution involves creating four shell scripts for ultimate control and using native cron functionality.

  • A script to run our PHP application and create a process for it
  • A script to check on our PHP application to ensure it is running (start it if it’s not)
  • A script to reset our PHP application
  • A script to stop our PHP application

You don’t have to use shell scripts. You could call the commands directly from your cron jobs, but shell scripts keep the functionality easier to maintain and make it simple to add status checks and so on.

The only tool we need for our long-running PHP script is nohup. I am using Debian, where nohup is installed by default as part of GNU coreutils. If your system doesn’t have nohup, use your respective package manager to install it.

The first shell script we create is the one that starts everything.

#!/bin/bash
nohup /opt/bitnami/php/bin/php -q /opt/bitnami/wordpress/fetch-news.php >/dev/null 2>&1 &

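We are running our PHP script with nohup so it keeps running after the shell that started it exits, and the trailing & forces it into the background. The >/dev/null 2>&1 redirect discards both standard output and errors. The -q flag only suppresses HTTP header output, which matters for the CGI binary; it does not hide errors by itself.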
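If discarding all output makes failures hard to diagnose, one variation is to append the output to a log file and record the process ID. A minimal sketch, assuming log and PID file paths of your own choosing; start_background is an illustrative helper, not part of the original scripts.

```shell
#!/bin/bash
# start_background LOG_FILE PID_FILE COMMAND [ARGS...]
# Runs COMMAND under nohup in the background, appends its output and errors
# to LOG_FILE, and writes its process ID to PID_FILE.
start_background() {
    local log_file=$1 pid_file=$2
    shift 2
    nohup "$@" >>"$log_file" 2>&1 &
    echo $! >"$pid_file"
}

# Example call (the /tmp paths are hypothetical):
# start_background /tmp/fetch-news.log /tmp/fetch-news.pid \
#     /opt/bitnami/php/bin/php /opt/bitnami/wordpress/fetch-news.php
```

Because nohup replaces itself with the command it runs, the recorded ID is the ID of the PHP process itself.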

Now, let’s create a shell script that allows us to stop the process.

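#!/bin/bash
# Find the PID(s) of the running parser, excluding the grep process itself.
PID=$(ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}')
if [ -n "$PID" ]
then
    echo "killing $PID"
    # $PID is left unquoted on purpose so several matching PIDs are all killed.
    kill -9 $PID
else
    echo "not running"
fi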

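The script looks for our PHP script in the list of running processes. If it finds a match, it kills the process by its process ID using kill -9; otherwise, it reports that nothing is running. Note that kill -9 sends SIGKILL, which the PHP script cannot catch, so it gets no chance to finish the item it is working on.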
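A gentler variation is to ask the process to exit first and only force it after a timeout. This is a sketch under assumptions: the helper name and the 10 second default are made up for illustration, and it only helps if the PHP script exits promptly on SIGTERM.

```shell
#!/bin/bash
# stop_gracefully PID [TIMEOUT_SECONDS]
# Sends SIGTERM, waits up to TIMEOUT_SECONDS for the process to exit,
# then falls back to SIGKILL.
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    # If the signal cannot be delivered, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the stop script, this would replace the kill -9 line with a loop over the PIDs that were found, calling stop_gracefully for each one.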

Next, a script that checks whether our PHP script is running and starts it if it is not.

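#!/bin/bash
# Look for a running parser process, excluding the grep process itself.
PID=$(ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}')
if [ -n "$PID" ]
then
    echo "Parser running on $PID"
else
    echo "Not running, going to start it"
    # Bail out if the directory is missing rather than run from the wrong place.
    cd /opt/bitnami/wordpress/ || exit 1
    ./run-news-parser.sh
fi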

When I run my cron, this is the script that I call. It will check if my parser is running or not. If no process ID can be found, it’s not running and, therefore, needs to be started using our run-news-parser.sh script.
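For reference, wiring the check script into cron could look like the sketch below. The script name check-news-parser.sh and the every-minute schedule are assumptions, since neither is given here; add_cron_entry is an illustrative helper that appends the entry to an existing crontab only when it is missing.

```shell
#!/bin/bash
# Hypothetical cron entry: run the check script every minute, discard output.
CRON_ENTRY='* * * * * /opt/bitnami/wordpress/check-news-parser.sh >/dev/null 2>&1'

# Reads an existing crontab on stdin and prints it with CRON_ENTRY appended,
# unless an identical line is already present.
add_cron_entry() {
    local current
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -Fqx -- "$CRON_ENTRY"; then
        printf '%s\n' "$CRON_ENTRY"
    fi
}

# Install it for the current user:
# crontab -l 2>/dev/null | add_cron_entry | crontab -
```

Running the install line twice leaves a single entry, so it is safe to include in a deployment script.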

Some improvements here could be capping the number of start attempts before bailing altogether, and maybe sending yourself a notification that something went wrong (remote API went down, credentials expired or revoked, etc.).
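An attempt cap like that could be sketched with a small counter file. The file path, the limit of 5 and both function names are assumptions for illustration, and the notification itself (email, chat webhook, etc.) is left out.

```shell
#!/bin/bash
# Hypothetical state file holding the number of consecutive start attempts.
ATTEMPTS_FILE=${ATTEMPTS_FILE:-/tmp/news-parser-attempts}
MAX_ATTEMPTS=${MAX_ATTEMPTS:-5}

# Increments the counter. Succeeds while the count is within MAX_ATTEMPTS
# and fails once the cap has been exceeded.
record_start_attempt() {
    local attempts=0
    if [ -f "$ATTEMPTS_FILE" ]; then
        attempts=$(cat "$ATTEMPTS_FILE")
    fi
    attempts=$((attempts + 1))
    echo "$attempts" >"$ATTEMPTS_FILE"
    [ "$attempts" -le "$MAX_ATTEMPTS" ]
}

# Clears the counter; call this whenever the parser is found running.
reset_start_attempts() {
    rm -f "$ATTEMPTS_FILE"
}

# In the check script, the "not running" branch could become:
# if record_start_attempt; then
#     ./run-news-parser.sh
# else
#     echo "Too many failed starts, giving up"   # send a notification here
# fi
```

Calling reset_start_attempts in the "running" branch means only consecutive failures count toward the cap.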

And finally, a shell script that can restart our script.

#!/bin/bash
# Stop the parser if it is running (same lookup as the stop script).
PID=$(ps -eaf | grep '/opt/bitnami/wordpress/fetch-news.php' | grep -v grep | awk '{print $2}')
if [ -n "$PID" ]
then
    echo "killing $PID"
    kill -9 $PID
else
    echo "not running"
fi

echo "Starting again"
# Same location the check script starts it from.
/opt/bitnami/wordpress/run-news-parser.sh

This script looks similar to the ones that came before it. It’s a combination of the stop and start scripts: if it finds a process, it kills it, and either way it then starts the parser again.

This approach might not be pure and some might laugh at how simple it is, but it works and will continue to work well into the future. I am probably going to rewrite these scripts to use Node.js, but for now, it’s something I will keep using because it works (even if a little hacky).

Dwayne

Copyright © 2023 · Dwayne Charrington