
I Like Kill Nerds

The blog of Australian Front End / Aurelia Javascript Developer & brewing aficionado Dwayne Charrington // Aurelia.io Core Team member.


Stopping PHP From Stripping out Hyperlinks From a NITF XML Response While Parsing the XML

PHP · May 27, 2022

This is another of those particular posts that might help one or two people out. If I can save you some time working with the News Industry Text Format in PHP, I’ll be glad that you didn’t experience my frustration.

While working with the Associated Press API, I recently ran into a situation where hyperlinks in the content ingested from the NITF format they supply were being stripped out in PHP.

The code in question looked like this:

function download_story_nitf($nitf_href) {
    $nitf_file = file_get_contents($nitf_href . "&include=view_default&apikey=" . AP_API_KEY);
    $nitf_file = str_replace(array("\n", "\r", "\t"), '', $nitf_file);

    $nitf_xml = simplexml_load_string($nitf_file);
    $nitf_json = json_encode($nitf_xml);

    return json_decode($nitf_json);
}

The Associated Press API returns XML, with the story content contained within the <block> element of the response. Inspecting the response in Postman and the browser showed nothing wrong; however, the ingested content was missing its links, which broke the content.

I knew the API response was OK, so I set out to debug and realised that the links were being lost somewhere between the simplexml_load_string call and the JSON conversion that follows it.

This is a snippet of what the response looked like:

<block>
              <p>KEY DEVELOPMENTS IN THE RUSSIA-UKRAINE WAR:</p>
              <p>— <a href="https://apnews.com/article/russia-ukraine-kyiv-moscow-d01152d589a482b52f1072ce9886fbe1">Scars of war</a> seem to be everywhere in Ukraine after 3 months</p>
              <p>— <a href="https://apnews.com/article/russia-ukraine-government-and-politics-de1d3ccf3ef990a046cafd7209d4653d">Saving the children</a>: War closes in on eastern Ukrainian town</p>
              <p>— Sweden, Finland delegations go to Turkey for <a href="https://apnews.com/article/russia-ukraine-middle-east-turkey-98d9b2bf7de63b3044d118e833626b13">NATO talks</a></p>
              <p>— US to end <a href="https://apnews.com/article/russia-ukraine-janet-yellen-government-and-politics-20dbb506790dddc6f019fa7fdf265514">Russia's ability to pay</a> international investors</p>
              <p>— UK <a href="https://apnews.com/article/russia-ukraine-putin-roman-abramovich-mlb-politics-710a500504e940db9d60ce3e674da346">approves sale of Chelsea</a> soccer club by sanctioned Abramovich</p>
</block>
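
To see the behaviour in isolation, here is a minimal reproduction. It is only a sketch: the sample paragraph and the example.com URL are made up for illustration and are not part of the AP response. It parses a single paragraph containing a link, then compares the XML that SimpleXML still holds with the result of the json_encode round trip.

```php
<?php
// Hypothetical sample: a cut-down <block> with one link (not real AP data).
$sample = '<block><p>See <a href="https://example.com/">the story</a> for more</p></block>';

$block = simplexml_load_string($sample);

// Serialise the parsed paragraph back to XML to inspect what the parser kept.
$as_xml = $block->p->asXML();

// The same encode/decode round trip the download function uses.
$decoded = json_decode(json_encode($block));

var_dump(strpos($as_xml, '<a href=') !== false);      // is the link still in the parsed XML?
var_dump(strpos($decoded->p, 'the story') !== false); // is the link text in the decoded value?
```

If the first check prints true and the second prints false, the link is surviving the parse and being lost when the paragraph is converted to a plain value.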

By the time the response had been through the XML parser call and the JSON conversion, those links were gone. My best guess is that the <a> tags are treated as child nodes rather than as part of the paragraph text, so they are dropped when each paragraph is flattened to a string. It was a rather frustrating issue, and despite extensive Googling, I found no easy solution. For similar problems, many people suggested wrapping the links in CDATA markers.

In the end, that is what I did. Using a regular expression, I wrapped every link in the response in CDATA markers.

function download_story_nitf($nitf_href) {
    $nitf_file = file_get_contents($nitf_href . "&include=view_default&apikey=" . AP_API_KEY);
    $nitf_file = str_replace(array("\n", "\r", "\t"), '', $nitf_file);

    // Wrap every <a ...>...</a> in CDATA so it is kept as text rather than parsed as a child element.
    $pattern = "/<a (.*?)>(.*?)<\/a>/i";
    $nitf_file = preg_replace($pattern, "<![CDATA[<a $1>$2</a>]]>", $nitf_file);

    $nitf_xml = simplexml_load_string($nitf_file);
    $nitf_json = json_encode($nitf_xml);

    return json_decode($nitf_json);
}
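
As a quick check of the approach above, the same pattern can be run against a small inline sample. This is a sketch only: the sample markup and URL are invented, and the helper name wrap_links_in_cdata is hypothetical, pulled out here so the regular expression can be tried without calling the API.

```php
<?php
// Hypothetical helper: the same regular expression as above, isolated for testing.
function wrap_links_in_cdata($xml) {
    return preg_replace('/<a (.*?)>(.*?)<\/a>/i', '<![CDATA[<a $1>$2</a>]]>', $xml);
}

// Made-up sample, not real AP content.
$sample = '<block><p>See <a href="https://example.com/">the story</a> for more</p></block>';

// Wrap the links, then run the same parse and JSON round trip as the real function.
$wrapped = wrap_links_in_cdata($sample);
$story = json_decode(json_encode(simplexml_load_string($wrapped)));

// Print the paragraph so the surviving link markup can be inspected.
echo $story->p, "\n";
```

Note that the pattern requires a space after `<a`, so a bare `<a>` with no attributes would not be wrapped.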

I am not the world’s best coder, but this did the trick. Nothing else I tried worked. While this was for working with NITF XML, I assume this issue might crop up in other scenarios. So, this fix might work in your case too.
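
One alternative to the regular expression, offered only as a sketch and not the approach used above, is to rebuild each paragraph's inner markup from its child nodes via the DOM extension. The helper name inner_markup and the sample XML are hypothetical; dom_import_simplexml() and DOMDocument::saveXML() are standard PHP functions, assuming the DOM extension is enabled. It has not been tried against the real AP feed.

```php
<?php
// Hypothetical helper: returns the markup inside an element, child elements included.
function inner_markup(SimpleXMLElement $element) {
    $node = dom_import_simplexml($element);
    $markup = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serialises just that node.
        $markup .= $node->ownerDocument->saveXML($child);
    }
    return $markup;
}

// Made-up sample, not real AP content.
$sample = '<block><p>See <a href="https://example.com/">the story</a> for more</p></block>';
$block = simplexml_load_string($sample);

// Collect each paragraph as a string of markup, links included.
$paragraphs = array();
foreach ($block->p as $p) {
    $paragraphs[] = inner_markup($p);
}

print_r($paragraphs);
```

This reads the paragraphs directly instead of going through json_encode, so the rest of the story would still need to be pulled out of the parsed XML separately.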

Dwayne
