Channel: pulling xml feed and detecting changes/deletion php - Stack Overflow

pulling xml feed and detecting changes/deletion php

I want to set up an XML feed polling system that downloads an XML feed from a given URL every hour and detects whether the feed has changed. If it has, a few things need to happen.

How can I accomplish this efficiently? The feed contains thousands of items, and each item may hold quite a bit of data.

I want to be able to detect any new data/item and save it to a database.
I want to be able to detect any modified data/item and update the database accordingly.
I want to be able to detect any deleted data/item and update the database accordingly.

The order of items doesn't matter to me, so if the order changes but nothing else does, then we can say the feeds are identical.
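One way to make "identical regardless of order" concrete is an order-independent fingerprint of the whole feed, which could serve as a cheap first check before any per-item work. The sketch below is only an illustration, not a tested solution: it is in Python for brevity, `feed_fingerprint` is a made-up name, and it assumes Python 3.8+ (for `ET.canonicalize`) and that every item is a `<property>` element as in the sample feed.

```python
import hashlib
import xml.etree.ElementTree as ET


def feed_fingerprint(xml_text):
    """Return a digest of the feed that ignores the order of <property> items.

    Hypothetical helper: assumes items are <property> elements.
    """
    root = ET.fromstring(xml_text)
    digests = []
    for prop in root.iter("property"):
        # Serialize one item and drop whitespace that trails its closing tag.
        raw = ET.tostring(prop, encoding="unicode").strip()
        # Canonicalize (C14N) and strip whitespace around text, so that
        # indentation differences alone do not change the digest.
        canonical = ET.canonicalize(raw, strip_text=True)
        digests.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    # Sorting the per-item digests makes the result independent of item order.
    combined = "".join(sorted(digests))
    return hashlib.sha256(combined.encode("ascii")).hexdigest()
```

If the fingerprint matches the one stored from the previous download, the feed can be treated as unchanged and the rest of the processing skipped.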

I've seen a few people mention hashing each item, and the whole feed, and comparing the hashes to those of the previously downloaded feed. With many items, this could take a long time.
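A rough sketch of how the per-item hashing idea might look, under some assumptions: each item carries a stable unique ID (here `<provider-listingid>` inside `<details>`, which is a guess based on the sample feed), and the previous run's ID-to-hash map has been stored somewhere such as a database column. It is written in Python for brevity, and the names `item_hash`, `index_feed` and `diff_feeds` are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def item_hash(elem):
    """Hash the tag names and whitespace-trimmed text of an item.

    Simplification: attributes and nesting depth are ignored.
    """
    h = hashlib.sha256()
    for node in elem.iter():
        text = (node.text or "").strip()
        h.update(f"{node.tag}={text}\n".encode("utf-8"))
    return h.hexdigest()


def index_feed(xml_text, id_path="details/provider-listingid"):
    """Map each item's ID to its hash. id_path is an assumed location."""
    index = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        item_id = (prop.findtext(id_path) or "").strip()
        if not item_id:
            raise ValueError("item without an ID cannot be tracked")
        index[item_id] = item_hash(prop)
    return index


def diff_feeds(old_index, new_index):
    """Return (added, modified, deleted) lists of item IDs."""
    added = sorted(new_index.keys() - old_index.keys())
    deleted = sorted(old_index.keys() - new_index.keys())
    modified = sorted(
        k for k in new_index.keys() & old_index.keys()
        if new_index[k] != old_index[k]
    )
    return added, modified, deleted
```

Because the comparison is keyed by ID, item order does not matter, and only the items reported as added or modified need to be parsed in full and written to the database. Hashing a few thousand items is a single linear pass over the feed.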

Would there be an easy way to diff the last downloaded feed against the new one and discard all identical items, and then go through only the items that are left to do the comparison?

I'm not sure what the right approach would be. Any suggestions would be greatly appreciated.

An example of a feed similar to the one I would be pulling:

<properties>
<property>
<location>
<unit-number>301</unit-number>
<street-address>123 Main St</street-address>
<city-name>San Francisco</city-name>
<zipcode>94123</zipcode>
<county>San Francisco</county>
<state-code>California</state-code>
<street-intersection>Broadway</street-intersection>
<parcel-id>359-02-4158</parcel-id>
<building-name>The Avalon</building-name>
<subdivision></subdivision>
<neighborhood-name>Marina</neighborhood-name>
<neighborhood-description>The Marina is a neighborhood on the Northern part of San
Francisco</neighborhood-description>
<elevation>10</elevation>
<longitude>-70.1200</longitude>
<latitude>30.0000</latitude>
<geocode-type>exact</geocode-type>
<display-address>yes</display-address>
<directions>Take 101 North to Lombard St. Make a left on Lombard and 3rd right
onto Main. 123 is at the end of the block on the right. </directions>
</location>
<details>
<listing-title>A great deal in the Marina</listing-title>
<price>725000</price>
<year-built>1928</year-built>
<num-bedrooms>3</num-bedrooms>
<num-full-bathrooms>2</num-full-bathrooms>
<num-half-bathrooms>1</num-half-bathrooms>
<num-bathrooms></num-bathrooms>
<lot-size>0.25</lot-size>
<living-area-square-feet>1720</living-area-square-feet>
<date-listed>2010-06-20</date-listed>
<date-available></date-available>
<date-sold></date-sold>
<sale-price></sale-price>
<property-type>condo</property-type>
<description>Newly remodeled condo in great location.</description>
<mlsId>582649</mlsId>
<mlsName>SFAR</mlsName>
<provider-listingid>258136842</provider-listingid>
</details>
<landing-page>
<lp-url>http://www.BrokerRealty.com/listing?id=123456&amp;source=Trulia</lp-url>
</landing-page>
<listing-type>resale</listing-type>
<status>for sale</status>
<foreclosure-status></foreclosure-status>
<site>
<site-url>http://www.BrokerRealty.com</site-url>
<site-name>Broker Realty</site-name>
</site>

etc..

