
I Like Kill Nerds

The blog of Australian Front End / Aurelia Javascript Developer & brewing aficionado Dwayne Charrington // Aurelia.io Core Team member.


Stopping PHP From Stripping out Hyperlinks From a NITF XML Response While Parsing the XML

PHP · May 27, 2022

This is another one of those niche posts that might help one or two people out. If I can save you some time working with the News Industry Text Format (NITF) in PHP, I’ll be glad that you didn’t experience my frustration.

While working with the Associated Press API, I recently ran into a situation where hyperlinks in the content ingested from the NITF XML they supply were being stripped out in PHP.

The code in question looked like this:

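function download_story_nitf($nitf_href) {
    // Fetch the NITF XML for the story from the Associated Press API
    $nitf_file = file_get_contents($nitf_href . "&include=view_default&apikey=" . AP_API_KEY);
    // Strip newlines and tabs so the content arrives as a single line
    $nitf_file = str_replace(array("\n", "\r", "\t"), '', $nitf_file);

    // Parse the XML, then round-trip through JSON to get plain PHP objects
    $nitf_xml = simplexml_load_string($nitf_file);
    $nitf_json = json_encode($nitf_xml);

    return json_decode($nitf_json);
}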

The Associated Press API returns XML in which the story content sits inside the <block> element. The response looked fine in Postman and in the browser; however, the ingested content was missing its links, which broke the content.

I knew the API response was OK, so I set out to debug and realised that the simplexml_load_string call was stripping out the links inside of the content.

This is a snippet of what the XML looked like:

<block>
              <p>KEY DEVELOPMENTS IN THE RUSSIA-UKRAINE WAR:</p>
              <p>— <a href="https://apnews.com/article/russia-ukraine-kyiv-moscow-d01152d589a482b52f1072ce9886fbe1">Scars of war</a> seem to be everywhere in Ukraine after 3 months</p>
              <p>— <a href="https://apnews.com/article/russia-ukraine-government-and-politics-de1d3ccf3ef990a046cafd7209d4653d">Saving the children</a>: War closes in on eastern Ukrainian town</p>
              <p>— Sweden, Finland delegations go to Turkey for <a href="https://apnews.com/article/russia-ukraine-middle-east-turkey-98d9b2bf7de63b3044d118e833626b13">NATO talks</a></p>
              <p>— US to end <a href="https://apnews.com/article/russia-ukraine-janet-yellen-government-and-politics-20dbb506790dddc6f019fa7fdf265514">Russia's ability to pay</a> international investors</p>
              <p>— UK <a href="https://apnews.com/article/russia-ukraine-putin-roman-abramovich-mlb-politics-710a500504e940db9d60ce3e674da346">approves sale of Chelsea</a> soccer club by sanctioned Abramovich</p>
</block>
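
If you want to narrow down where the links go missing on your own setup, here is a minimal sketch. It assumes a made-up one-paragraph block and a placeholder example.com URL (neither comes from the AP response), and the variable names are mine. As far as I can tell, the parsed SimpleXML tree still holds the <a> element, and it is the json_encode() conversion of mixed text-and-element content that loses it, so treat this as a diagnostic aid rather than a definitive explanation.

```php
<?php
// Made-up sample in the shape of the NITF <block> above; the URL is a placeholder.
$sample = '<block><p>Sweden, Finland delegations go to Turkey for '
    . '<a href="https://example.com/nato">NATO talks</a></p></block>';

$block = simplexml_load_string($sample);

// Serialise the parsed <p> node back to XML to see what the tree contains.
$fromTree = $block->p->asXML();

// Run the same JSON conversion as the function above and keep the raw JSON.
$fromJson = json_encode($block);

var_dump(strpos($fromTree, '<a href=') !== false); // is the link in the parsed tree?
var_dump(strpos($fromJson, 'href') !== false);     // is the link in the JSON?
```

If the first check is true and the second is false, the link survived parsing and was dropped during the JSON conversion.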

The XML parsing step in PHP would remove those links. Either it saw them as invalid, or the conversion wasn’t accounting for child nodes mixed in with the text. It was a rather frustrating issue, and despite extensive Googling, I found no easy solution. In other use cases, many people suggested wrapping the links in CDATA markers.

In the end, that is what I did. Using a regular expression, I wrap every link in the response in CDATA markers before the XML is parsed.

function download_story_nitf($nitf_href) {
    $nitf_file = file_get_contents($nitf_href . "&include=view_default&apikey=" . AP_API_KEY);
    $nitf_file = str_replace(array("\n", "\r", "\t"), '', $nitf_file);

    // Match each <a ...>...</a> pair (non-greedy, case-insensitive)
    $pattern = "/<a (.*?)>(.*?)<\/a>/i";
    // Wrap the whole link in CDATA so it is treated as text, not a child element
    $nitf_file = preg_replace($pattern, "<![CDATA[<a $1>$2</a>]]>", $nitf_file);

    $nitf_xml = simplexml_load_string($nitf_file);
    $nitf_json = json_encode($nitf_xml);

    return json_decode($nitf_json);
}
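
The function above needs a live API key, so here is a small self-contained sketch of the same idea that runs against an inline string. The helper name wrap_links_in_cdata, the sample block, and the example.com URLs are all made up for illustration. The LIBXML_NOCDATA option is also my addition, not part of the original fix: it asks libxml to merge CDATA sections into ordinary text nodes, which I would expect to help when a paragraph starts with a link rather than with text.

```php
<?php
// Hypothetical helper; the pattern and replacement are the ones from the post.
function wrap_links_in_cdata(string $xml): string
{
    return preg_replace('/<a (.*?)>(.*?)<\/a>/i', '<![CDATA[<a $1>$2</a>]]>', $xml);
}

// Made-up sample: one paragraph ends with a link, the other starts with one.
$sample = '<block>'
    . '<p>Sweden, Finland delegations go to Turkey for <a href="https://example.com/nato">NATO talks</a></p>'
    . '<p><a href="https://example.com/children">Saving the children</a>: War closes in</p>'
    . '</block>';

// Assumption: LIBXML_NOCDATA is passed so CDATA sections are merged as text.
$xml = simplexml_load_string(wrap_links_in_cdata($sample), 'SimpleXMLElement', LIBXML_NOCDATA);
$story = json_decode(json_encode($xml));

// Each paragraph should now be a plain string with its link markup intact.
foreach ($story->p as $paragraph) {
    echo $paragraph, PHP_EOL;
}
```

One caveat with this approach: link text that itself contains the sequence ]]> would end the CDATA section early, so it is worth checking your feed for that before relying on it.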

I am not the world’s best coder, but this did the trick. Nothing else I tried worked. While this was for working with NITF XML, I assume this issue might crop up in other scenarios. So, this fix might work in your case too.

Dwayne

Copyright © 2022 · Dwayne Charrington