Jul. 18th, 2009

For the last 24 hours or so, my stomach's been really unhappy, and I'm not sure why. Presumably it was something I ate. I'll spare you a description of my symptoms, but suffice it to say I've spent almost all my time today either in bed or in the bathroom. I tried to eat some ramen today, the "safest" food in the house (which is really sad, but unfortunately true), and it didn't want to stick around either :-\

=======

What time I didn't spend feeling bleh today, I spent working on a PHP-based Twitter display widget. I know I just mentioned finding one I liked yesterday, and while it does its job relatively well for now, it has some drawbacks, mainly regarding its speed: since it uses RSS instead of the true API, and has to load the RSS file on every page load, it adds a lot of overhead. Nearly a full second, according to the script-time calculator on my site. And while that's borderline-tolerable on my personal site, it's unacceptable for another project I have in mind.

After a bit of research, I found out that the Zend Framework has a Twitter API built right in. And Lupinia's server just happens to have the Zend Framework installed! So all I had to do was include a file, and poof, I could retrieve every piece of data I could want: the full user timeline with much more per-post detail than the RSS feed provides, full user info, and, if I wanted them, functions for working with direct messages and following/unfollowing. Basically, everything the Twitter website can do, the API can do, and the PHP scripts provided by Zend make everything just a function call away. This will be easy, right?
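For the curious, the whole thing boils down to something like this. This is a minimal sketch based on my reading of the Zend_Service_Twitter docs; the username and password are obviously placeholders:

    <?php
    // Zend_Service_Twitter ships with the framework; including this one
    // file pulls in everything needed.
    require_once 'Zend/Service/Twitter.php';

    // The Twitter API uses plain username/password authentication.
    $twitter = new Zend_Service_Twitter('softpaw', 'not-my-real-password');

    // One call fetches the full user timeline, with far more per-post
    // detail than the RSS feed ever provided.
    $response = $twitter->status->userTimeline();

    // Individual statuses come back as object properties.
    foreach ($response->status as $tweet) {
        echo $tweet->created_at . ' - ' . $tweet->text . "\n";
    }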

Well, sort of. Working with the data live is a breeze, but there's a catch: the Twitter API limits each authenticated user or IP address to 150 calls per hour. On average, that's about one call every 24 seconds. That might sound like a lot, but on a busy website, it doesn't take long to blow through that many hits. So, to keep from hitting the limiter and getting blacklisted, I need to implement some sort of data caching. And thus arises the problem I've been trying to solve for the last five hours. And yes, I know there's a whitelist that greatly increases the limits, but I'm not on it yet, and even if/when I do try to get the server whitelisted, they really frown on applications that don't implement caching of some sort.
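For the record, the caching layer itself is dead simple. Here's roughly the shape of what I'm trying to build; the file path, function names, and 30-second lifetime are all just my own placeholders, and export_timeline_xml() is a hypothetical stand-in for the part I can't solve:

    <?php
    // Rough sketch of the cache wrapper I'm after. $twitter is the
    // Zend_Service_Twitter object from the earlier snippet.

    define('TWEET_CACHE', '/tmp/twitter_timeline.xml');
    define('CACHE_LIFETIME', 30); // seconds between live API hits

    function cached_timeline($twitter)
    {
        // Cache is fresh: parse straight from disk, no API call spent.
        if (file_exists(TWEET_CACHE)
                && time() - filemtime(TWEET_CACHE) < CACHE_LIFETIME) {
            return simplexml_load_file(TWEET_CACHE);
        }

        // Cache is stale or missing: spend one of the 150 hourly calls.
        $response = $twitter->status->userTimeline();

        // Stash the result for the next 30 seconds, then re-read it.
        // This only works if the export produced valid XML...
        file_put_contents(TWEET_CACHE, export_timeline_xml($response));
        return simplexml_load_file(TWEET_CACHE);
    }

    // Hypothetical: the function I wish I had. The rest of this post
    // is about why no version of it seems to work.
    function export_timeline_xml($response)
    {
        return $response->status->asXML(); // only exports the first tweet!
    }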

The API returns data in XML format. But not just raw XML; it goes one step further and creates a SimpleXML object containing it. If you're not familiar with SimpleXML, it's basically the Mac of XML parsing functions for PHP: it makes it stupidly easy to work with XML data, and is relatively good at its job, but god help you if you're trying to make it play nice with other functions or methods. The Zend Twitter API makes this even weirder by returning an object of its own, one that contains a SimpleXML object with no apparent name, which doesn't seem to be accessible by itself even though other SimpleXML methods work properly through it. It's the weirdest thing I've ever seen, and by my understanding of PHP, it really shouldn't work at all.
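As best I can tell from poking at it with var_dump, the wrapper behaves something like this toy version. To be clear, this is not Zend's actual code, just my mental model of the behavior I'm seeing:

    <?php
    // Toy model of the Zend response object: it holds a SimpleXMLElement
    // privately and forwards property reads to it via __get.
    class FakeResult
    {
        protected $_xml;

        public function __construct($xmlString)
        {
            $this->_xml = new SimpleXMLElement($xmlString);
        }

        public function __get($name)
        {
            // Property access falls through to the wrapped XML...
            return $this->_xml->{$name};
        }
    }

    $result = new FakeResult('<statuses><status><text>hi</text></status></statuses>');

    // ...so anything one step below the root works fine:
    echo $result->status->text; // "hi"

    // But the root itself is out of reach: $result is a FakeResult, not
    // a SimpleXMLElement, so none of SimpleXML's own methods exist on it.
    // $result->asXML(); // fatal error: call to undefined method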

In order to cache the data, I need to store it somewhere that can be accessed across sessions for at least 30 seconds. That leaves two options: storing the data to a file, or to a database. And since the volume of data would make it downright painful to shuttle back and forth to a database, it's pretty attractive to keep it in XML format and store it to a file. This is much easier said than done, apparently. While SimpleXML provides a handy method for exporting XML data to a file, it doesn't want to work properly with the weird abomination of an object I'm trying to work with. Here's a rundown of everything I've tried (a code sketch of the same failures follows the list).

-Attempting to dump the entire XML object from the object root returns an error, because the root is not technically a SimpleXML object.

-Attempting to access the actual root in any way returns null, because the only name I have for it (accessed via var_dump) is invalid.

-Accessing one step below root works just fine, and I can dump raw XML data from there. However, SimpleXML's export function can't seem to understand the structure of the element when accessed that way; it returns only the first element instead of all 20 tweets.

-Attempting to export multiple elements one by one works, but the result somehow becomes an invalid XML file in the process, making it impossible to re-import.

-Attempting to use Xpath to retrieve all elements and export them returns null, because I can't seem to get a valid absolute structure out of this monstrosity, and thus can't give it valid search criteria.
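In code, those attempts look roughly like this (reconstructed from memory; $response is the object from the first snippet, and the commented-out lines are the ones that die outright):

    <?php
    // 1. Export from the object root: fatal, because the root isn't
    //    actually a SimpleXML object.
    // $xml = $response->asXML();

    // 2. Reach the inner SimpleXML object by the only name var_dump
    //    shows for it: null, PHP doesn't accept it as a property name.
    // $inner = $response->{'name-from-var_dump'};

    // 3. One step below root works, but asXML() only returns the
    //    first element:
    $firstOnly = $response->status->asXML(); // 1 tweet, not 20

    // 4. Exporting every element and gluing the results together
    //    "works", but yields 20 root elements with no single parent,
    //    which isn't valid XML and can't be re-imported:
    $xml = '';
    foreach ($response->status as $tweet) {
        $xml .= $tweet->asXML();
    }
    $reimported = @simplexml_load_string($xml); // false

    // 5. XPath: without a valid absolute path to the root, there's no
    //    valid search expression to give it:
    // $all = $response->xpath('//status'); // null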

After finding no real way to retrieve cacheable data from the SimpleXML objects, I decided to go with plan B, serializing the whole mess and storing that to a text file. But that didn't work either, because SimpleXML objects are impossible to serialize, and PHP refuses to do anything about it (as blatantly stated in a bug report on the matter). Their advice is to use the export function and re-parse the data on reload, which I can't do in this case because I can't export the data! Gah!
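If anyone wants to see that failure for themselves, it reproduces with even a bare, hand-made SimpleXML object; nothing Twitter-specific required:

    <?php
    $sxml = new SimpleXMLElement('<statuses><status/></statuses>');

    try {
        $frozen = serialize($sxml);
    } catch (Exception $e) {
        // "Serialization of 'SimpleXMLElement' is not allowed"
        echo $e->getMessage();
    }

    // PHP's suggested workaround is an export/re-parse round trip,
    // which is fine here, where $sxml really is a SimpleXMLElement,
    // but useless when the export itself is what's broken:
    $xmlString = $sxml->asXML();
    $thawed = simplexml_load_string($xmlString);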

So, anyone have any advice? Using another PHP Twitter library would theoretically be good advice, except that I can't find one that doesn't require JavaScript, that uses XML instead of JSON (I'm really not familiar with the latter), and that is well-documented enough to be useful without spending a day reverse-engineering it.

Here's the object structure as returned by var_dump, if that helps. I trimmed out the middle of the array; there's no need to show 20 tweets in this post.
( Click for variable structures )
