As an organization, Chitika has wholeheartedly invested in building out the Chitika Insights research program through staffing, rigorous process management, and a quality data infrastructure. Additionally, our reports rely on our vast trove of ad impression data and our knowledgeable and skilled team of data scientists. It’s this impression-level data and attention to statistical detail that have ultimately led our research to be cited by The New York Times, Wall Street Journal, CNN, Bloomberg, and many others.
In our last edition of Logging Data, we introduced Cluster Map Reduce, or CMR. The new tool acts as an alternative to Hadoop and HDFS when paired with a POSIX-compliant clustered file system. It simplifies the movement of data through the analytical back end and helps minimize the dependencies and potential points where the data pull process may slow or stop altogether. Today, we’re proud to provide this tool to the world as a free, Open Source release!
Thus far, our Logging Data series has focused on the nuts and bolts of our network operations and data infrastructure. While we employ some terrific software and hardware, our proverbial secret sauce consists of the various customizations we employ using these tools. Nowhere was this more evident than during the transition from HDFS to Gluster, and the subsequent porting of Hadoop resources. The team here is well versed in working around issues, so after some brainstorming, the solution pretty much morphed into “Let’s just build something internally that fulfills our needs better than Hadoop.” Not an easy task, but one that our Operations and DI teams took on readily.
We’ve briefly mentioned our implementation of InfiniBand in both of the previous Logging Data posts without giving a thorough explanation of its function and capabilities within our architecture. In this latest installment, we’ll be doing just that, along with discussing our corresponding Hadoop framework.
The previous installment of our Logging Data series outlined how individual impressions move through our network. In this edition, we’ll discuss the storage considerations necessary for cataloguing all of these impressions effectively 24 hours a day, specifically focusing on the challenges that result from the requirements of ad network operations.
In this “Logging Data” series, we’ll provide some in-depth detail on the intricacies of data collection, infrastructure, and access here at Chitika, hopefully providing some useful lessons for both newcomers and veterans in the field. Our first post will focus on our logs – the baseline of our data collection – and the subsequent processes that coalesce the information they contain into more readily accessible formats for our data scientists.
As Chitika displays hundreds of millions of ads every day, a tremendous amount of data needs to be stored to fulfill business requirements. In our case, that figure amounts to roughly 1 to 1.2 TB per day. The Chitika Data Infrastructure and Engineering teams each have several Minecraft aficionados among them, and our recent side project visualizes roughly 10 terabytes of this data as large towers of 8-bit 3D-rendered blocks. We call it, aptly, the Great Wall of Data.
Remember that time we completely overhauled our PartnerCenter to speed up your reports, smooth out the payment process, and enhance mobile access? We like to think that if you’re not ahead of the game, then you’re not in the game. That’s why we recently upgraded all of our ad code to be better (and more ahead of the game) than ever!
While almost any company outing presents a great way to bond with coworkers, Chitika’s spring outing provided the added benefit of helping out our local community on top of the usual drinks and fun. Prior to hitting up Pinz in Milford for lunch, drinks, bowling, laser tag, and nostalgic video game competitions, our Westborough HQ employees teamed up with United Way to clean and maintain the Assabet River Rail Trail in central Massachusetts.
Earlier this week on Digital Point, Chitika publisher Chris discussed how he got his forum, DeadMansCrossZone.com, upgraded to Gold level while attracting a relatively small audience (tens of thousands of impressions per day). While we’ve discussed the major considerations in getting your site ready for a Gold account, here we’ll cover some of the unique considerations for smaller websites looking to achieve Gold account status as quickly as possible.