How AI is transforming the stock photo library market
Removing cost, driving scale & growing market potential
There are currently 1Bn images for sale across the world’s leading stock image libraries. 75% of these images are held by the top 4 players. The gross size of any collection suggests credibility and offers peace-of-mind for users who therefore feel more confident of finding what they need. But the actual number of images regularly downloaded is a small proportion of those available. The large scale and diversity of leading collections are an entry barrier for new providers but create knock-on costs for processing, storage, and retrieval.
Having a large number of users (photographers and customers) underpins the value of monthly subscription models. Subscriptions are based on a low ASP (average selling price) per image running across a large recurring customer base. Attracting and retaining customers is critical and relies on the provider having a sufficiently diverse collection to cater for each user’s needs - real or perceived. However, once a photo collection reaches a significant size, with a relatively fixed level of demand, the time and cost of manual processing, storage, and retrieval leads to diminishing returns.
New content is needed to keep a collection fresh and to cover new fashions, styles and trends but it’s difficult to work out which new images are truly additive. Some older styles of content lose value over time but most generalist photography retains some value into the future. Estimating the lifetime value of each new image is not a simple task.
The underlying challenge is that the scale and speed of image capture growth is exponential while end-user demand is becoming more discerning. As more and more inventory is ingested, it becomes ever harder to surface the most relevant content for each user and to scale processes accordingly.
AI offers a way forwards.
AI is shifting photo library economics
Deep Learning, a subset of AI and Machine Learning, is now providing capabilities to rebase the cost structure of running a large-scale photo library and powering new tools to improve user access. Over the past 5 years the academic field of Deep Learning has transformed the detection and classification of language and vision. These tools can now be used as fundamental components of an image library’s ability to generate value.
Leading businesses in most vertical sectors have spent the last 1-2 years running experiments and pilots with new Deep Learning technologies, leading to a broader understanding of their applicability and future value across domains. In the stock photo market, major players are making acquisitions, building internal teams, and trialing solutions built around Deep Learning technologies.
At Pimloc we’ve been able to work with a range of commercial and heritage libraries to build a practical understanding of the value that Deep Learning can bring now and into the future. We see that Deep Learning is fundamentally changing the operations of photo libraries in two main ways:
1) Reducing operational costs: speed up and replace manual processing, enable use of a larger-scale platform
2) Increasing revenue: increase exposure of content in relevant channels and drive up sales conversion / usage
1 - Process automation / reducing costs
A high proportion of image ingestion and processing is still managed manually. Photo libraries invest resources to ensure the quality of data being brought into their systems: validating / editing metadata from photographers, checking image content and quality, reformatting, retouching, cropping, tagging and captioning. More content is being made available but the costs and time required for processing are increasing in proportion.
The unavoidable variability of manual annotation and image grading, often carried out by large offshore teams, depresses upstream asset value. The quality and comparability of metadata completely controls which images are later surfaced through search. Automation and augmentation of these processes not only allows images and video to be processed more quickly but also provides a more reliable and repeatable approach to metadata and search index creation.
Image libraries which adopt these new capabilities will build a competitive advantage in the medium term as well as the ability to maintain that position. Smaller providers with a niche, specialist offering will survive but it will become increasingly difficult to run a generalist offering without access to significant technology platform advantages.
Automated image processing can be used both to speed up ingestion of new imagery and to make a cost-effective clean-up of metadata in existing and legacy collections. Checks for duplicate / near-duplicate content, metadata quality, and content overlap apply equally to both tasks. The ability to process large collections quickly also opens up the opportunity to bring in new types of visual content from the growing range of capture devices and channels in the market.
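As a rough illustration of the near-duplicate check mentioned above, the sketch below uses a simple "average hash" rather than a Deep Learning model. It is an assumption made for illustration, not a description of any particular library's pipeline. Each image is assumed to have already been shrunk to a tiny grayscale grid; the function names and the distance threshold are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(hashes: Dict[str, int], max_distance: int = 2) -> List[Tuple[str, str]]:
    """Return every pair of image ids whose hashes differ by at most max_distance bits."""
    ids = sorted(hashes)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if hamming_distance(hashes[first], hashes[second]) <= max_distance:
                pairs.append((first, second))
    return pairs


# Hypothetical 2x2 grayscale thumbnails (real systems would use larger grids).
thumbnails = {
    "original": [[10, 200], [220, 15]],
    "brightened_copy": [[30, 210], [230, 35]],
    "different_image": [[200, 10], [15, 220]],
}
hashes = {image_id: average_hash(grid) for image_id, grid in thumbnails.items()}
duplicate_pairs = find_near_duplicates(hashes)
print(duplicate_pairs)  # [('brightened_copy', 'original')]
```

The pairwise comparison is quadratic in the number of images, so a collection of any real size would need an index or bucketing step in front of it. The sketch only shows the comparison itself.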
Using high volumes of annotated photo data is generally an advantage for training Deep Learning models. But model-builders need to be acutely aware of any inherent biases in their training sets (e.g. weighting of gender/age/ethnicity and tagging language/terms). The design, tuning, training, and testing of neural networks suited to each classification task requires expertise, experience, and time as well as balanced training datasets.
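One simple way to start looking for the kind of imbalance described above is to measure how each annotated attribute is distributed across the training set. The sketch below is a minimal example under stated assumptions: the record layout, the `age_group` attribute, and the 20% threshold are all hypothetical, and a low share is only a prompt for closer review, not proof of bias.

```python
from collections import Counter
from typing import Dict, Iterable, List


def attribute_shares(records: Iterable[dict], attribute: str) -> Dict[str, float]:
    """Return the fraction of annotated records carrying each value of an attribute."""
    counts = Counter(record[attribute] for record in records if attribute in record)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


def underrepresented(shares: Dict[str, float], min_share: float = 0.2) -> List[str]:
    """List the attribute values whose share falls below min_share."""
    return sorted(value for value, share in shares.items() if share < min_share)


# Hypothetical annotations for ten training images.
training_records = (
    [{"age_group": "18-34"}] * 6
    + [{"age_group": "35-54"}] * 3
    + [{"age_group": "55+"}] * 1
)
shares = attribute_shares(training_records, "age_group")
flagged = underrepresented(shares)
print(flagged)  # ['55+']
```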
Paying attention to how metadata is generated, checked, and stored pays dividends in the future. For example, locale is a significant issue. Users of the same collection, using the same text-tag language, nevertheless search and browse in different ways according to local culture.
As these new methods are introduced, the improvements in access and usability they can provide will become as much of a strategic asset as the content of the collection itself. For the library business, they save costs, improve the quality of service, and increase asset value. For customers, they provide a powerful connection with the whole collection and a strong reason to remain on site, finding and purchasing imagery. This is discussed in more detail in the next section.
2 - Service innovation / increasing revenue
DRIVING UP CONVERSION
Most image libraries see AI primarily as a way to reduce costs rather than for its potential to grow market share and ultimately the size of the overall market.
Let’s assume that, in the short term, demand for image and video content (ie. the number of people actively searching for photos to use each day) is relatively fixed. The most critical challenge for a photo library then becomes how to convert as many of its site’s visitors as possible into customers.
For a user searching for an image of a dog, returning a huge array of thumbnails (1.5m as per above search results) is not particularly helpful except perhaps to reassure the user that the image they need is probably somewhere there - if they could find it. Even if the dog’s breed is specified, the user’s challenge can still be overwhelming. Text-based tag searches rarely deliver the required image in the first few pages of results.
From our work developing systems in this area we know that a user’s search strategy needs to quickly switch from text to a direct interaction with the visual content of the collection’s images. Their search priorities may relate to details of certain objects in the image, the overall scene or context, colouration, style, layout, or tone. These details may be very difficult or impossible to express in terms of simple text.
In the future, most searches will still likely start with a text-based filter, but users will then quickly begin to interact more directly with images as a means of refining their search.
At Pimloc we’ve built Deep Learning applications that provide end user tools for rapid navigation through visual results based on the user’s direct interaction with image content. Tags are a useful and necessary anchor to start a search but quickly become redundant once a subset of images (e.g. hundreds, thousands, or even millions) is surfaced for review. To maximise on-site conversion photo libraries need to provide users with more flexible discovery tools that allow the fast (and enjoyable) exploration of dynamic subsets of images.
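To make the idea of navigating by image content more concrete, here is a minimal "more like this" sketch. It is not Pimloc's implementation. It assumes each image has already been converted into a numeric embedding by some vision model (the three-number vectors below are made up) and simply ranks the other images by cosine similarity to the one the user clicked.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(query_id: str, embeddings: Dict[str, Sequence[float]], top_k: int = 3) -> List[str]:
    """Return the ids of the top_k images most similar to the clicked image."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), image_id)
        for image_id, vector in embeddings.items()
        if image_id != query_id
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical embeddings; a real model would emit hundreds of dimensions.
embeddings = {
    "beach_sunset": [0.9, 0.1, 0.0],
    "beach_noon": [0.8, 0.2, 0.1],
    "harbour_dusk": [0.7, 0.3, 0.0],
    "office_desk": [0.0, 0.1, 0.9],
}
similar = more_like_this("beach_sunset", embeddings, top_k=2)
print(similar)  # ['beach_noon', 'harbour_dusk']
```

Because the comparison works on image content rather than tags, the user can refine a result set by clicking on examples instead of trying to describe style or layout in words.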
Once these tools are in place the main barrier to conversion becomes pricing and inventory – both of which are currently being managed through subscriptions and larger collection sizes.
Over time, Deep Learning systems can be set up to learn the distinct visual preferences of individual users, groups, and specific channels, based on their past usage and/or current published images. Curated recommendations can be made automatically within search results, providing a different ranking of initial results based on user preferences. This can gradually improve the performance of the initial stage of a search. But content-based tools will always be required to ensure providers are maximising conversion on all searches.
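The re-ranking step can be sketched as follows. This is a deliberately simplified stand-in for a learned preference model: it assumes each image carries a few style labels (hypothetical here, and in practice likely produced by a classifier), builds a preference profile from a user's past downloads, and blends that profile with the original text-search score. The field names and the `weight` parameter are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List


def style_preferences(past_downloads: List[dict]) -> Dict[str, float]:
    """Share of each style label across everything the user has downloaded."""
    counts = Counter(style for image in past_downloads for style in image["styles"])
    total = sum(counts.values())
    if total == 0:
        return {}
    return {style: count / total for style, count in counts.items()}


def rerank(results: List[dict], preferences: Dict[str, float], weight: float = 0.5) -> List[dict]:
    """Sort results by text score plus a weighted bonus for preferred styles."""
    def blended_score(image: dict) -> float:
        bonus = sum(preferences.get(style, 0.0) for style in image["styles"])
        return image["text_score"] + weight * bonus

    return sorted(results, key=blended_score, reverse=True)


# Hypothetical usage history and text-search results.
past_downloads = [
    {"styles": ["minimal", "monochrome"]},
    {"styles": ["minimal"]},
    {"styles": ["vivid"]},
]
results = [
    {"id": "a", "text_score": 0.80, "styles": ["vivid"]},
    {"id": "b", "text_score": 0.75, "styles": ["minimal", "monochrome"]},
    {"id": "c", "text_score": 0.70, "styles": ["documentary"]},
]
preferences = style_preferences(past_downloads)
ranked_ids = [image["id"] for image in rerank(results, preferences)]
print(ranked_ids)  # ['b', 'a', 'c']
```

Image "b" overtakes "a" despite a lower text score because its styles match what this user has downloaded before. Setting `weight` to 0 leaves the original ranking unchanged.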
DRIVING UP EXPOSURE
Once the tools are in place to maximise on-site conversion, the challenge switches to bringing more of the available market into contact with a provider’s images.
As the dominant search engine, Google image search has become a critical part of many demand generation activities. Google indexes the metadata associated with images on the web, including stock library sites. It picks up image captions/descriptions/tags that are displayed on each photo page and uses them to surface relevant images against user image search queries.
Google does not explicitly reveal its ranking algorithm but it is well known that it downweights duplicate content and tries to focus on providing relevant/quality results. Both are difficult to achieve when applied to collections with numerous identical or near-identical text tags and captions spread over hundreds of millions of images. Deep Learning techniques could provide an ability to auto-generate ‘uniquely relevant’ photo descriptions based on finer-grained classification of images to limit metadata duplication and ensure relevance.
Alongside image search engines, there are numerous content publishing, creative and editing platforms used by millions of people each day. Deep Learning systems can index image content types for specific platforms and use these to create profiles to serve subsets of their main collection into more specialist domains.
At Pimloc we have created a prototype system that analyses copy as it is being written and auto-suggests relevant images for insertion directly into the article or blog. This is a strategy that works well at small scale with manually curated image sets but needs automated systems to optimise across global channels. Injecting relevant image and video content directly into leading publishing platforms can remove the friction of using a separate search platform, driving up the incentive for users to add imagery to their publications.
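The general shape of "suggest images while the author writes" can be shown with a very small keyword-overlap sketch. This is not the prototype described above, which is not documented here. It is an assumed, simplified stand-in in which the draft text is reduced to keywords and matched against image tags. The stopword list, catalogue, and function names are all hypothetical.

```python
import re
from typing import Dict, List, Set

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "with", "is", "are", "our", "at"}


def keywords(text: str) -> Set[str]:
    """Lower-case the text and keep alphabetic words that are not stopwords."""
    return {word for word in re.findall(r"[a-z]+", text.lower()) if word not in STOPWORDS}


def suggest_images(draft: str, catalogue: Dict[str, List[str]], top_k: int = 2) -> List[str]:
    """Rank images by how many of their tags appear in the draft text."""
    terms = keywords(draft)
    scored = []
    for image_id, tags in catalogue.items():
        overlap = len(terms & set(tags))
        if overlap:
            scored.append((overlap, image_id))
    # Most overlapping tags first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical catalogue of image ids and their tags.
catalogue = {
    "img_lake": ["lake", "mountain", "sunrise"],
    "img_city": ["city", "skyline", "night"],
    "img_trail": ["mountain", "trail", "hiking"],
}
draft = "Our team hiked to the mountain lake at sunrise."
suggestions = suggest_images(draft, catalogue)
print(suggestions)  # ['img_lake', 'img_trail']
```

Exact word matching misses related forms ("hiked" does not match the tag "hiking"), which is one reason an automated system at scale would need richer language and image models than this sketch uses.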
Over time the winning photo providers are likely to be those which marry diverse photo inventory with the best tools to search and edit them.
If you are interested in learning more about image-based process automation and/or new forms of image-based search tools please get in contact: simon@pimloc.com.