A CIKM '13 Tutorial

Real-time Bidding: A New Frontier of Computational Advertising Research

The slides can be downloaded from the SlideShare.net website.

Basic Information

Time and venue

In the afternoon, 28th Oct 2013, venue to be announced soon.

Target audience information and relation to CIKM areas

The content of the tutorial is at an intermediate level and is aimed at PhD students, general researchers, and practitioners in information retrieval and knowledge management.

Computational Advertising has been an important topical area in information retrieval and knowledge management. This tutorial focuses on real-time advertising, aka Real-Time Bidding (RTB), a fundamental shift in the field of computational advertising. It is strongly related to CIKM areas such as user log analysis and modelling, information retrieval, text mining, knowledge extraction and management, behaviour targeting, recommender systems, personalization, and data management platforms.

Prerequisite knowledge of audience

Basic knowledge of Information Retrieval and Data Mining, and good understanding of Probability and Statistics

Speakers and Affiliations

Dr Jun Wang is a Senior Lecturer (Associate Professor) at University College London. His main research interests are statistical modelling of information retrieval, collaborative filtering and recommender systems, and computational advertising. His team has been working on bringing financial methods into computational advertising and was the first to introduce options contracts for selling online ads. Jun has published over 70 research papers in leading journals and conference proceedings including ACM Trans. on Information Systems, IEEE Trans. on Multimedia, ACM Multimedia System Journal, WWW, CIKM, ACM SIGIR, SIGMM. He was a recipient of the Beyond Search – Semantic Computing and Internet Economics award in 2007; he also received the Best Doctoral Consortium award in ACM SIGIR06 for his work on collaborative filtering, the Best Paper Prize in ECIR09 for his work on applying Modern Portfolio Theory of Finance (Mean-variance Analysis) to document ranking in Information Retrieval, and the Best Paper Prize in ECIR12 for top-k retrieval modelling. Jun’s recent service includes (Senior) Program Committees for SIGIR, CIKM, and RecSys. He has extensive experience giving tutorials at top conferences; his recent tutorials on risk management and portfolio theory of information retrieval were given at CIKM2011 and ECIR2011. He gave a keynote on real-time bidding and financial methods for computational advertising at the SIGIR 2013 workshop on Internet Advertising: Theory and Practice.

Dr Xuehua Shen is CTO and co-founder of iPinYou. He received a BSc in Computer Science from Nanjing University, China, and a PhD in Computer Science from the University of Illinois at Urbana-Champaign, USA. His PhD thesis was on personalized search. After his PhD, he worked in the Google search quality team in Mountain View, CA, on personalized search and on a live experiment platform for search quality based on real user interactions. He then worked at BlueKai, the biggest data exchange and Data Management Platform (DMP) in Silicon Valley, using the Hadoop cloud-computing platform for personalized ads and predictive modelling.

Dr Samuel Seljan is a Quantitative Analyst at AppNexus. His work has focused on supply-side optimization, including the development of algorithms to improve the allocation of impressions across RTB and non-RTB markets and reserve price optimization. He obtained a PhD in Political Science from the University of California, San Diego in 2010, where he specialized in international political economy. His dissertation used prediction market and stock market data to estimate beliefs about the effect of international conflict across industries.

Shuai Yuan is completing his PhD thesis at University College London. He has been working on mathematical models of online advertising with a number of companies such as Advanced International Media, Bright, Dot.tk, and Miaozhen. He has a background in information retrieval, data mining, machine learning, and economic theories; his research interests in computational advertising have focused on RTB, statistical arbitrage, and supply side optimisation. Shuai Yuan has published several related research papers in top-tier conferences and workshops including CIKM and ADKDD. Among them, he published the first empirical study on RTB. He also contributes to an open advertising dataset project.

iPinYou is the largest Demand Side Platform (DSP) and the leader of audience targeting and real-time advertising in China. It makes intelligent decisions for more than 3 billion ad impressions each day. Over the past two years, iPinYou has pioneered programmatic buying of display media in China, and it organizes the annual RTB Summit. It has more than 150 employees, is headquartered in Beijing, and has offices in Shanghai, Guangzhou, and Silicon Valley. At present, iPinYou is organizing a Global RTB Bidding Algorithm Competition.

AppNexus is one of the largest real-time advertising platforms (exchanges). As of 2010, the company auctions more than four billion ads per day on a real-time basis. As an ad exchange, it offers one of the most powerful, open and customizable advertising technology platforms and serves many of the largest and most innovative buyers and sellers of online advertising, including Microsoft Advertising Exchange, Interactive Media (Deutsche Telekom), and Collective Exchange. Led by the pioneers of the Web’s original ad exchanges at Yahoo!’s Right Media and Google’s DoubleClick, AppNexus offers an advanced technology platform that empowers companies to build, manage, and optimize their entire online advertising businesses. Every day, hundreds of companies around the globe buy and sell billions of online ads using AppNexus' real-time ad serving technology, advanced yield management controls, optimization algorithms, and patented brand and safety monitoring.

Full Description


Information Retrieval, Text Mining, Language Models, Mathematical Models, Game Theory, Auction Theory, Relevance Models, Recommender Systems, Personalization, User Modelling, Behaviour Targeting, Computational Advertising, Real-Time Bidding, Programmatic Buying, Programmatic Reserve, Machine Learning, Optimisation, Ad Exchange, Demand-Side Platform, Supply-Side Platform, Data Management Platform, Campaign Management, Keywords Management, Budget Management, Yield Management


Online advertising is now one of the fastest advancing areas in the IT industry. In display and mobile advertising, the most significant development in recent years is the growth of Real-Time Bidding (RTB), which allows online display advertising to be sold and bought in real time, one ad impression at a time. Since its emergence, RTB has fundamentally changed the landscape of the digital media market by scaling the buying process across a large number of available inventories. It also encourages behaviour (re-)targeting, and marks a significant shift toward buying focused on user data rather than contextual data. A report from IDC shows that in 2011, global RTB-based display ad spend increased by 237% compared to 2010, with the U.S.’s $2.2 billion RTB display spend leading the way. The share of RTB-based spending in all display ad spending is projected to grow from 10% in 2011 to 27% in 2016, and its share of all indirect spending from 28% to 78%.

Scientifically, the further demand for automation, integration and optimization in RTB brings new research opportunities in the CIKM fields. For instance, RTB gives advertisers and agencies much greater flexibility to maximize the impact of their budgets through more optimised buys based on their own or third-party (user) data. This brings the online advertising market a step closer to the financial markets, where unification and interconnection are strongly promoted. The unification and interconnections across webpages, advertisers, and users require significant research on knowledge management, data mining, information retrieval, behaviour targeting and their links to game theory, economics and optimization.

Despite its rapid growth and huge potential, many aspects of RTB remain unknown to the research community for a variety of reasons. In this tutorial, with presenters from both industry and academia, we aim to bring insights from real-world systems, to bridge the gaps between industry and academia, and to provide an overview of the fundamental infrastructure, algorithms, and technical and research challenges of this new frontier of computational advertising.

Aims and Learning objectives

This tutorial aims to provide not only a comprehensive and systematic introduction to RTB and computational advertising in general, but also an overview of the emerging research challenges, research tools, and datasets in order to facilitate research. Compared to previous Computational Advertising tutorials in relevant top-tier conferences, this tutorial takes a fresh, neutral, and up-to-date look at the field and focuses on the fundamental changes brought by RTB.

We will begin by giving a brief overview of the history of online advertising and present the current eco-system in which RTB plays an increasingly important part. Based on our field study and the DSP optimisation contest organised by iPinYou, we analyse optimization problems both from the demand side (advertisers) and the supply side (publishers), as well as the auction mechanism design challenges for ad exchanges. We discuss how IR, DM and ML techniques have been applied to these problems. In addition, we discuss why game theory is important in this area and how it could be extended beyond auction mechanism design.

CIKM is an ideal venue for this tutorial because RTB is an area of multiple disciplines, including information retrieval, data mining, knowledge discovery and management, and game theory, most of which are traditionally the key themes of the conference. As an illustration of practical application in the real world, we shall cover algorithms in the iPinYou global DSP optimisation contest on a production platform; for the supply side, we also report experiments on inventory management, reserve price optimisation, etc. in production systems.

We expect the audience, after attending the tutorial, to understand the real-time online advertising mechanisms and the state of the art techniques, as well as to grasp the research challenges in this field. Our motivation is to help the audience acquire domain knowledge and obtain relevant datasets, and to promote research activities in RTB and computational advertising in general.

Description of topics

The demand side

Web advertising has long promised to provide marketers with a more objective measurement of its value than print or television marketing. Indeed, a large percentage of display and mobile ads are valued on the basis of how often users convert on an advertiser's offer, such as buying a product, signing up for a service, or downloading an app [39]. With the emergence of RTB, the calculations of the probability of such events have become more and more precise. Thus, it would appear that, online, advertisers only pay for campaigns and inventory that work. A natural challenge is therefore to allocate budgets to high-performing inventories and segments, which introduces a general exploration and exploitation dilemma [42]. Besides, since conversions occur separately from the display of the ads themselves, and because many ads can be seen before a conversion occurs, a choice must be made as to how to attribute conversions to an ad or among ads. Given the scale of RTB, simply assigning the conversion to the most recent associated ad (known as last touch attribution) is a common choice, even though it is known to be an imperfect option. In addition, some marketers prefer to split conversion attribution among all associated ads. Assigning attribution across ads presents additional analytical and technical challenges that the ad tech industry has yet to fully address [37].
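The two attribution choices mentioned above can be contrasted with a minimal sketch. It assumes a hypothetical impression log of `(timestamp, ad_id)` pairs for one converting user, and it uses an even split as the simplest possible multi-ad rule; the function names and the data are illustrative and are not taken from any production system.

```python
from collections import defaultdict


def last_touch(impressions):
    """Give the whole conversion to the most recent ad shown before it."""
    _, ad_id = max(impressions, key=lambda imp: imp[0])
    return {ad_id: 1.0}


def equal_split(impressions):
    """Split the conversion credit evenly across all associated impressions."""
    credit = defaultdict(float)
    share = 1.0 / len(impressions)
    for _, ad_id in impressions:
        credit[ad_id] += share
    return dict(credit)


# Hypothetical impression log for one converting user: (timestamp, ad_id)
path = [(1, "ad_a"), (2, "ad_b"), (3, "ad_a"), (4, "ad_c")]

print(last_touch(path))   # all credit goes to ad_c
print(equal_split(path))  # ad_a gets half, ad_b and ad_c a quarter each
```

Under last touch, `ad_a` receives nothing although it was shown twice, which illustrates why the rule is considered imperfect.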

One could argue that, from the perspective of advertisers, the value of an ad should be its causal influence on total sales (less its costs) [38]. However, few advertisers or networks today implement the kind of randomized controlled display of ads that would facilitate valid causal inference, or look at sales lift beyond conversions. In the abstract, the ideal design for a more valid and holistic assessment of marketing performance is known, and industry and researchers have collaborated to solve the problem in some instances. However, implementing an experimental methodology at scale (for many different types of advertisers and campaigns) while also optimizing individual-level response rates requires considerably more effort. Ad exchanges such as AppNexus are engaged in the initial steps of implementing experimental methodology at scale, but much work remains to be done and many opportunities for collaboration are available.
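As a rough illustration of what a randomized experiment measures, the sketch below compares the conversion rate of users who were randomly shown the ad (treated) with that of users who were randomly held out (control). The function name and all the counts are hypothetical, and a real analysis would also need a significance test, which is omitted here.

```python
def conversion_lift(treated_users, treated_conversions,
                    control_users, control_conversions):
    """Return (absolute_lift, relative_lift) between treated and control groups.

    relative_lift is None when the control group has no conversions,
    because the ratio is undefined in that case.
    """
    treated_rate = treated_conversions / treated_users
    control_rate = control_conversions / control_users
    absolute = treated_rate - control_rate
    relative = absolute / control_rate if control_rate > 0 else None
    return absolute, relative


# Hypothetical experiment: 10,000 users per group
absolute, relative = conversion_lift(10_000, 120, 10_000, 100)
print(f"absolute lift: {absolute:.4f}, relative lift: {relative:.1%}")
```

The difference between the two rates, rather than the treated conversion rate alone, is the quantity that reflects the causal influence of the ad.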

The supply side

In the online advertising eco-system, yield optimisation is the core problem of the supply side, i.e. publishers, and is commonly dealt with by managing revenue channels [28]; selecting relevant ads [36]; optimising reserve prices [25, 29]; controlling the advertising level [11], etc. The community has focused largely on inventory management, especially handling guaranteed contracts with overlapping targeting [9, 15], and balancing between these contracts and non-guaranteed deliveries [28]. In addition, some researchers have compared popular pricing models, such as Cost Per Mille (CPM), Cost Per Click (CPC) and Cost Per Acquisition (CPA) [18, 21, 19, 14] and offered suggestions for optimizing among them.

With the emergence of RTB, publishers are given more freedom to manage their inventories at a finer granularity. For instance, they can set optimal reserve prices (sometimes referred to as hard floor prices) dynamically as a function of private valuations based on the context, or even for a specific buyer [24, 25]. If the second price auction is used and the reserve price is between the first and second highest bids, the winner has to pay more than without a reserve price. On the other hand, soft floor prices are also popular in modern exchanges [40]. Unlike hard floor prices, soft floors are usually hidden. If the soft floor is higher than the highest bid, the publisher charges the winner exactly what they bid, without the risk of the auction failing. That is, by setting a high soft floor price, the publisher can easily switch from the second price auction to the first price one, which leads to the debate over which auction mechanism is superior in the real world [17, 3]. The Generalised Second Price auction (GSP) has long been considered the de facto standard; however, this does not always hold in the RTB scenario, where publishers have more control. In fact, in the auction dataset that we plan to report, 40% of auctions are first price auctions, and they account for 55% of advertisers’ total budgets [40].
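The interaction between hard and soft floors can be sketched for a single-impression auction as follows. The function name is hypothetical. The rule for a soft floor lying below the highest bid (the winner pays at least the soft floor) is an assumption about common practice that the text above does not spell out.

```python
def clear_auction(bids, hard_floor=0.0, soft_floor=0.0):
    """Return (winning_bid, price_paid), or None if the impression is unsold."""
    # Bids under the hard floor are rejected outright.
    eligible = sorted((b for b in bids if b >= hard_floor), reverse=True)
    if not eligible:
        return None  # no bid meets the hard floor, so the auction fails
    top = eligible[0]
    second = eligible[1] if len(eligible) > 1 else hard_floor
    if top < soft_floor:
        # Highest bid is under the soft floor: winner pays own bid (first price).
        return top, top
    # Otherwise second price, lifted to whichever floor is binding (assumption).
    return top, max(second, hard_floor, soft_floor)


bids = [5.0, 3.0]
print(clear_auction(bids))                  # plain second price: pays 3.0
print(clear_auction(bids, hard_floor=4.0))  # floor between the bids: pays 4.0
print(clear_auction(bids, hard_floor=6.0))  # floor above all bids: None
print(clear_auction(bids, soft_floor=6.0))  # soft floor above top bid: pays 5.0
```

The last two calls show the difference the text describes: a hard floor of 6.0 leaves the impression unsold, whereas a soft floor of 6.0 turns the auction into a first price one.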

Datasets and evaluation

To support evaluation and promote research in the field, we will also cover the iPinYou global DSP optimisation contest, which kicked off on April 1, 2013 and will end no later than December 31, 2013. The purpose of this competition is to improve DSP bidding algorithms, stimulate interest in the research and development of RTB algorithms across the whole data science research community, and speed up the growth of the RTB-enabled display advertising ecosystem. During online evaluation, iPinYou opens its DSP platform and embeds participants' algorithms to bid live on ad impressions for advertisers. So far, the competition has attracted several hundred participants from 7 countries, and the first milestone prize has been awarded to a team from the Chinese Academy of Sciences. Princeton University, University of California at Santa Cruz, and Beijing University have adopted the competition as a course project. To facilitate research in RTB, in this tutorial, we shall introduce the competition and examine the formal evaluation and datasets.

Research context

Before the emergence of RTB in 2009 (i.e., announcement of support by major ad exchanges [16]), the display advertising market was primarily divided between so-called premium and remnant sales channels. In the premium marketplace, which covered approximately 40% of impressions, publishers would, and still do, negotiate and make deals with advertisers directly. Advertisers propose to buy a certain amount of impressions from select placements, regardless of the identities of users, when and how many times they have seen the ad, and so on. From the advertisers’ perspective, the purchase of premium inventory is about the aggregate quality and quantity of impressions and relies on the reputation of the publisher and reported audience profiles [10, 5, 1]. Thus, publishers need to guarantee the delivery of the impressions that have been agreed upon; otherwise a penalty fee is incurred [28, 30]. The pricing model used in premium contracts is mostly Cost Per Mille (CPM). Since advertisers have no control over the inventories or users, it is more difficult to deploy goal-driven campaigns (e.g. booking a ticket) than branding ones (e.g. announcing a new product). These contracts are sometimes called guaranteed display advertising [4].

Remnant inventory, by contrast, was traditionally sold through ad networks. Publishers would register their placements with networks and offer impressions for sale. Networks would sell impressions largely via second price auctions among their registered advertisers. However, the impressions in ad networks are non-guaranteed, as opposed to premium contracts.

Thus, it was and still is the ad network’s responsibility to understand the webpage and the user, and to select advertisers based on their pre-defined targeting rules. The understanding of the webpage is usually referred to as contextual advertising, where ad networks crawl, parse, and extract keywords which summarise the target [41]. Advertisers bid on these keywords, which is very similar to sponsored search [27, 35, 2, 32]. A more advanced approach is to learn a model including various features of webpages, which could then be used to compute a relevance score against advertisers’ targeting criteria [20, 6, 7]. The understanding of the user is usually referred to as behavioural targeting, where ad networks utilise the browsing history of a specific user to infer their interests, as well as geographical location, local time, etc. for target matching [34, 26, 33].

In ad networks, advertisers largely adopt the Cost Per Click (CPC) or Cost Per Acquisition (CPA) pricing models, where they only pay when a certain goal is achieved. These choices reduce their risks and are thus well suited to goal-driven campaigns. It then becomes the ad networks’ responsibility to optimise for clicks or conversions. In order to take the measurement of performance into account, ad networks usually employ the Generalised Second Price auction (GSP) [12], which allows them to apply bid biases (e.g. the quality score) that usually weight the historical Click Through Rate (CTR) or Conversion Rate (CVR) heavily.
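A quality-weighted GSP can be sketched as below. It assumes a commonly described formulation in which advertisers are ranked by bid times quality score and each winner pays, per click, the smallest amount that would keep their rank. The function name, the advertisers, and the numbers are hypothetical, and real networks may compute the bias differently.

```python
def gsp_with_quality(bids, slots=1):
    """Rank CPC bids by bid * quality and price each slot GSP-style.

    bids: {advertiser: (cpc_bid, quality_score)}, where quality_score is,
    for example, a historical CTR estimate (an assumption for this sketch).
    Returns a list of (advertiser, price_per_click) for the filled slots.
    """
    ranked = sorted(bids.items(),
                    key=lambda kv: kv[1][0] * kv[1][1],
                    reverse=True)
    results = []
    for i, (name, (_, quality)) in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            next_bid, next_quality = ranked[i + 1][1]
            # Smallest CPC that still outranks the next advertiser.
            price = next_bid * next_quality / quality
        else:
            price = 0.0  # no competitor below; a reserve price would apply here
        results.append((name, price))
    return results


# Hypothetical bids: (CPC bid, quality score)
bids = {"a": (2.0, 0.05), "b": (3.0, 0.02), "c": (1.0, 0.04)}
print(gsp_with_quality(bids))  # "a" wins and pays about 1.2 per click
```

Advertiser `a` wins despite bidding less than `b`, because its higher quality score gives it the larger expected revenue per impression.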

For the supply side, an important research topic is how to allocate impressions between guaranteed contracts and non-guaranteed ad networks/exchanges [30, 28, 13]. When selling via guaranteed contracts, publishers need to decide which specific contract to fulfil when multiple contracts have overlapping targeting rules. It is possible that ad networks bring in more revenue, but if a publisher sends too many impressions to this revenue channel and fails to fulfil a contract, they need to pay a goodwill penalty.

As the number of ad networks grew, the excess impressions in some of them led to the birth of ad exchanges. It is preferable to have more demand than supply because intense competition leads to higher revenue for both ad networks and publishers. However, when plenty of impressions remain unsold, ad networks try hard to find buyers even at low prices. Besides, a common practice for advertisers was to register with multiple ad networks to find cheap inventories, or at least to find enough impressions within their budget constraints. However, they found managing numerous channels difficult and inefficient (e.g. deciding how to split the budget). Ad exchanges, like Google AdX, Yahoo! Right Media, Microsoft ad exchange, and AppNexus, were created to address this problem by connecting hundreds of ad networks together. Advertisers then have a higher chance of locating enough impressions with their preferred targeting rules. Publishers may receive higher profit too because more demand competes for their inventory.

Ad exchanges introduce new research problems. In the pioneering work [23], the author discusses several issues including the truthfulness of auctions [22], call-out optimisation [8], arbitrage bidding and risk analysis, etc. There are also noteworthy attempts to introduce concepts from financial markets [31], which would make the ad exchange more mature and attractive.

When advertisers want to take advantage of RTB, they work with ad exchanges through third-party platforms that are usually referred to as Demand Side Platforms (DSPs). DSPs are delegates of advertisers that answer bidding requests and optimise campaigns at the impression level. At the other end, the Supply Side Platform (SSP) was created to serve publishers. Similarly, SSPs provide a central management console with various tools for publishers’ ultimate goal: yield optimisation.

To give more examples of the potential research challenges, consider the following. SSPs normally do not query data exchanges for third-party user data. However, if publishers could understand and model specific users, they would be in a better position for yield optimisation (e.g. setting an optimal reserve price for an auction based on a forecast of bidding activity using user data), although such a query would incur a cost. Meanwhile, ad exchanges normally do not send impressions to each other even if they remain unsold. This is largely due to business considerations, and it leads to DSPs and SSPs registering with multiple ad exchanges. A unified and interconnected marketplace is what we find attractive, and it is certainly good for the whole eco-system.

Key References

[1] Abramson, M. Toward the attribution of web behavior. In 2012 IEEE Symposium on CISDA (2012).

[2] Anagnostopoulos, A., Broder, A. Z., Gabrilovich, E., Josifovski, V., and Riedel, L. Just-in-time contextual advertising. In Proceedings of the ACM CIKM 2007.

[3] Balseiro, S., Besbes, O., and Weintraub, G. Auctions for online display advertising exchanges: Approximations and design. In Proceedings of the ACM EC 2013.

[4] Bharadwaj, V., Ma, W., Schwarz, M., Shanmugasundaram, J., Vee, E., Xie, J., and Yang, J. Pricing guaranteed contracts in online display advertising. In Proceedings of the ACM CIKM 2010.

[5] Bilenko, M., and Richardson, M. Predictive client-side profiles for personalized advertising. In Proceedings of the ACM SIGKDD 2011.

[6] Broder, A., Fontoura, M., Josifovski, V., and Riedel, L. A semantic approach to contextual advertising. In Proceedings of the ACM SIGIR 2007.

[7] Chakrabarti, D., Agarwal, D., and Josifovski, V. Contextual advertising by combining relevance with click feedback. In Proceedings of the ACM WWW 2008.

[8] Chakraborty, T., Even-Dar, E., Guha, S., Mansour, Y., and Muthukrishnan, S. Selective call out and real time bidding. Internet and Network Economics (2010), 145–157.

[9] Chickering, D. M., and Heckerman, D. Targeted advertising on the web with inventory management. Interfaces (2003).

[10] De Bock, K., and Van den Poel, D. Predicting website audience demographics for web advertising targeting using multi-website clickstream data. Fundamenta Informaticae 98, 1 (2010), 49–70.

[11] Dewan, R. M., Freimer, M. L., and Zhang, J. Management and valuation of advertisement-supported web sites. Journal of Management Information Systems (2003).

[12] Edelman, B., Ostrovsky, M., and Schwarz, M. Internet advertising and the generalized second price auction: selling billions of dollars worth of keywords. Tech. rep., National Bureau of Economic Research, 2005.

[13] Feldman, J., Henzinger, M., Korula, N., Mirrokni, V. S., and Stein, C. Online stochastic packing applied to display ad allocation. In Algorithms–ESA 2010. Springer, 2010, pp. 182–194.

[14] Fjell, K. Online advertising: Pay-per-view versus pay-per-click with market power. Journal of Revenue and Pricing Management (2010).

[15] Fridgeirsdottir, K., and Asadolahi, S. Revenue management for online advertising: Impatient advertisers. Tech. rep., Working paper, London Business School, 2007.

[16] Google. The arrival of real-time bidding, 2011.

[17] Hoy, D., Jain, K., and Wilkens, C. A dynamic axiomatic approach to first-price auctions. In Proceedings of the ACM EC 2013.

[18] Hu, Y. J. Performance-based pricing models in online advertising. SSRN 501082 (2004).

[19] Kwon, C. Single-period balancing of pay-per-click and pay-per-view online display advertisements. Journal of Revenue and Pricing Management (2009).

[20] Lacerda, A., Cristo, M., Gonçalves, M. A., Fan, W., Ziviani, N., and Ribeiro-Neto, B. Learning to advertise. In Proceedings of the ACM SIGIR 2006.

[21] Mangani, A. Online advertising: Pay-per-view versus pay-per-click. Journal of Revenue and Pricing Management (2004).

[22] Muthukrishnan, S. Internet ad auctions: Insights and directions. In Automata, Languages and Programming. Springer, 2008, pp. 14–23.

[23] Muthukrishnan, S. Ad exchanges: Research issues. Internet and network economics (2009), 1–12.

[24] Myerson, R. Optimal auction design. Mathematics of operations research 6, 1 (1981), 58–73.

[25] Ostrovsky, M., and Schwarz, M. Reserve prices in internet advertising auctions: A field experiment.

[26] Provost, F., Dalessandro, B., Hook, R., Zhang, X., and Murray, A. Audience selection for on-line brand advertising: privacy-friendly social network targeting. In Proceedings of the ACM SIGKDD 2009.

[27] Ribeiro-Neto, B., Cristo, M., Golgher, P. B., and Silva de Moura, E. Impedance coupling in content-targeted advertising. In Proceedings of the ACM SIGIR 2005.

[28] Roels, G., and Fridgeirsdottir, K. Dynamic revenue management for online display advertising. Journal of Revenue & Pricing Management 8, 5 (2009), 452–466.

[29] Thompson, D. R. M., and Leyton-Brown, K. Revenue optimization in the generalized second-price auction. In Proceedings of the ACM EC 2013.

[30] Vee, E., Vassilvitskii, S., and Shanmugasundaram, J. Optimal online assignment with forecasts. In Proceedings of the ACM EC 2010.

[31] Wang, J., and Chen, B. Selling futures online advertising slots via option contracts. In Proceedings of the ACM WWW 2012.

[32] Wu, X., and Bolivar, A. Keyword extraction for contextual advertisement. In Proceedings of the ACM WWW 2008.

[33] Wu, X., Yan, J., Liu, N., Yan, S., Chen, Y., and Chen, Z. Probabilistic latent semantic user segmentation for behavioral targeted advertising. In Proceedings of the ADKDD 2009.

[34] Yan, J., Liu, N., Wang, G., Zhang, W., Jiang, Y., and Chen, Z. How much can behavioral targeting help online advertising? In Proceedings of the ACM WWW 2009.

[35] Yih, W.-t., Goodman, J., and Carvalho, V. R. Finding advertising keywords on web pages. In Proceedings of the ACM WWW 2006.

[36] Yuan, S., and Wang, J. Sequential selection of correlated ads by pomdps. In Proceedings of the ACM CIKM 2012.

[37] Jordan, Mahdian, Vassilvitskii, and Vee. The multiple attribution problem in pay-per-conversion advertising. In Algorithmic Game Theory (2011).

[38] Dalessandro, B., Perlich, C., Stitelman, O., and Provost, F. Causally motivated attribution for online advertising. In Proceedings of the ADKDD 2012.

[39] Rosales, R., Cheng, H., and Manavoglu, E. Post-click conversion modeling and analysis for non-guaranteed delivery display advertising. In Proceedings of the ACM WSDM 2012.

[40] Yuan, S., Wang, J., and Zhao, X. Real-time bidding for online advertising: measurement and analysis. In Proceedings of the ADKDD 2013.

[41] Yuan, S. and Wang, J. Adaptive Keywords Extraction with Contextual Bandits for Advertising on Parked Domains. In Proceedings of the IATP 2013.

[42] Feldman, J., Muthukrishnan, S., Pal, M., and Stein, C. Budget optimization in search-based advertising auctions. In Proceedings of the ACM EC 2007.

Last updated: 7/8/2013