
Answer by Charles for pulling xml feed and detecting changes/deletion php

Would there be an easy way to do a diff on the last downloaded feed and new one to then somehow remove all identical items?

Sure, in fact it should be pretty easy. It looks like these are real estate listings, right? If so, the name of the MLS provider and the identifier they issue for the listing together form a unique key:

<details>
    <!-- ... -->
    <mlsId>582649</mlsId>
    <mlsName>SFAR</mlsName>
    <provider-listingid>258136842</provider-listingid>
</details>
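
For example, with SimpleXML you could pull that pair out of each listing and glue it into a single key. This is only a sketch: the feed path and the $key format are placeholders, and it assumes the element names match the snippet above.

<?php
// Minimal sketch: build a composite key for each listing from the
// MLS provider name plus the MLS identifier.
$xml = simplexml_load_file('feed.xml'); // path is an assumption

foreach ($xml->xpath('//details') as $listing) {
    // e.g. "SFAR:582649", one key per provider + listing id
    $key = (string) $listing->mlsName . ':' . (string) $listing->mlsId;
    // ... look the key up in, or insert it into, your database
}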

Now that you can uniquely identify each listing, it should be pretty trivial to decide how you will detect changes. I'd personally mangle the XML into a multidimensional associative array, sort every level by key name, then serialize it and run it through a hash routine (say, md5), for that oh-so-attractive sloppy-but-it-works effect. In fact, you already had that idea, kind of:

I've seen a few people mention hashing the items and the whole feed to compare against the previously downloaded one. If there are many items, this could potentially take a long time.

By hashing each unique entry in the document, you avoid having to reimport the entire thing when a single entry changes. Stick the per-entry hash in your database alongside the rest of the data, including the information that makes up the unique key. When the stored hash no longer matches the hash of that entry in the new feed, the entry has changed and is worth re-importing.
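
A rough sketch of that mangle/sort/serialize/hash approach, assuming the feed is already parsed with SimpleXML; the helper and function names here are made up for illustration, not part of any existing API:

// Crude SimpleXML -> nested array conversion (loses attributes, which
// is usually fine for a change-detection fingerprint).
function listing_hash(SimpleXMLElement $listing) {
    $data = json_decode(json_encode($listing), true);
    ksort_recursive($data);           // canonical key order at every level
    return md5(serialize($data));     // sloppy-but-it-works fingerprint
}

function ksort_recursive(array &$arr) {
    ksort($arr);
    foreach ($arr as &$value) {
        if (is_array($value)) {
            ksort_recursive($value);
        }
    }
    unset($value);
}

If two runs produce the same hash for a key, you can skip that listing entirely; only mismatches need a full re-import.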

And again, once you have that unique key, it's amazingly easy to detect new listings. No matching key in the database? Import.

Likewise, it's amazingly easy to detect deleted listings. Key's in the database but isn't in the XML? Maybe it should be nuked.
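
Both checks boil down to comparing the set of keys found in the feed against the set of keys already in your database, something like the following. The $pdo handle, the listings table, and the listing_key column are assumptions for illustration only.

// $feedListings is assumed to be a key => hash map built while parsing the feed.
$feedKeys = array_keys($feedListings);
$dbKeys   = $pdo->query('SELECT listing_key FROM listings')
                ->fetchAll(PDO::FETCH_COLUMN);

// In the feed but not in the database: brand new, import it.
$newKeys = array_diff($feedKeys, $dbKeys);

// In the database but missing from the feed: probably delisted,
// nuke it (or flag it for review).
$deletedKeys = array_diff($dbKeys, $feedKeys);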

