Sari Azout

Posted on Oct 18, 2021

Re-Organizing the World’s Information: Why we need more Boutique Search Engines

For most queries, Google search is pretty underwhelming these days. Google is great at answering questions with an objective answer, like “# of billionaires in the world” or “What is the population of Iceland”. It’s pretty bad at answering questions that require judgment and context like “What do NFT collectors think about NFTs?”.

The evidence is everywhere. These days, I find myself suppressing the garbage Internet by searching on Google for “Substack + future of learning” to find the best takes on education. We hack Twitter with the “what is the best” posts over and over again. When I’m researching a new product, I type "X item reddit" into Google. I find enormous value in small, niche, often forgotten sites like Spaghetti Directory.

There’s an emergence of tools like Notion, Airtable, and Readwise where people are aggregating content and resources, reviving the curated web. But at the moment these are mostly solo affairs, hidden in private or semi-private corners of the Internet, fragmented, poorly indexed, and unavailable for public use. We haven't figured out how to make them multiplayer. In cases where we’ve made them public and collaborative - here is a great example - these projects are often short-lived and poorly maintained.

The stated mission of a company worth almost two trillion dollars is to “organize the world’s information” and yet the Internet remains poorly organized. Or, stated differently, in a world of infinite information, it’s no longer enough to organize the world’s information. It becomes important to organize the world’s trustworthy information.

How did we get here?

It’s hard to believe, but one of Google’s main problems, once they got going, was that there just wasn’t much to see online. Having a great search engine is useless if somebody types in “how to grow a herb garden” and the answer doesn’t exist online. With the advent of Google AdWords, it became profitable to put out shitty content that passed as informative and filled Google’s search engine results. In 1998, the first iteration of Google indexed 25 million pages. Today, Google’s index includes more than 100 trillion pages. AdWords was the catalyst for the explosion of content garbage manufactured for SEO purposes we see today. It’s also the reason why hidden gems, the kind of user-generated content we discover on Twitter all the time, are less likely to rank highly in search results. What started as a well-intentioned way to organize the world’s information has turned into a business that focuses most of its resources on monetizing clicks for advertisers rather than on the search experience for people.

The problem, now so drastically different from a decade ago, is not what to read/buy/eat/watch/etc but what is the best thing to read/buy/eat/watch/etc with my limited attention.

Audacious teams, like DuckDuckGo and Neeva, are trying to compete with Google head-on by building massive horizontal search engines. Rather than crawling and indexing things their own way, they sit on top of existing data sources and position themselves as privacy-focused alternatives to Google. But protection of privacy is not a compelling enough reason to leave Google. For the vast majority of people, allowing them to “control their data” is not a selling point, especially if it requires paying for something they’re used to getting for free.

I believe the opportunity in search is not to attack Google head-on with a massive, one-size-fits-all horizontal aggregator, but instead to build boutique search engines that index, curate, and organize things in new ways.

I realize I’ve just jumped to the conclusion but stick with me as I zoom out, make my case, and lay out the questions keeping me up at night.

Vertical Search Aggregators

Google is a great example of how the internet enabled scale and speed: every page on the web returned in an instant. But increasingly, we’re seeing this scale is at odds with a fundamental human need: relevance. Someone who wants to find the best freelance designer, or the best sushi restaurant, or the best NFT to buy will not find the answer on Google.

There is no search architecture that will work universally across all categories. It’s hard to imagine wanting the same UX for finding recipes as for finding freelancers. Whereas Google’s product begins and ends with a search bar, trading off functionality for simplicity, vertical search players like Yelp, Expedia, Zillow, and Behance emerged to fill functionality and relevancy gaps using structured data specific to their industries. With strong opinions on how to organize information, reflected in their choice of filters, vertical search aggregators have distinct advantages that horizontal software can never achieve.

But here too, relevance depends on the sociology of the current moment. For example, on Behance, the online creative community, school and location are featured prominently as filters, implying that where you live and where you went to school are important indicators of the quality of your design portfolio. In a world where talent is being decoupled from credentialism and geography, those filters are losing relevance.

As signals evolve, new ways of indexing and surface areas for innovation emerge. If Behance were designed today, I’d argue neither location nor school would be filters (if you want my opinion on what filters would replace them, DM me - I have an idea 😛).

On Yelp, a search for Electricians in Miami yields a page titled “The Best 10 Electricians in Miami, FL” with text underneath that tells me these are Sponsored Results. Do the top 10 electricians in Miami happen to be those that pay Yelp? Did some UX copywriter try to fool me into believing sponsored results are curated? Do they think I’m stupid? My mind is doing some serious mental gymnastics.

When you monetize via ads, curation takes a backseat to featuring advertisers - there is just less digital real estate available to curate your own recommendations - so these platforms end up making ethically dubious design choices that generate massive trust gaps.

Additionally, across vertical search aggregators like Yelp, Zillow, LinkedIn, and Behance, anyone can have a profile. A combination of irrelevant filters, ad-based business models, and unconstrained supply has overwhelmed consumers and made it hard to find signal in these platforms. Vertical search aggregators work when you know exactly what you want. But knowing what you want isn’t usually the starting point - which creates an opportunity to help the overwhelmed consumer with better discovery and curation along the funnel.

Curators, Curators

It has become popular to say we live in the information age, and we need curation to help us sort through the mess. But thus far, the conversation around “curation” has been too focused on the content and not enough on the structure. We seem to have accepted the job of the curator as providing a product review, a list of links, a song recommendation, all inside linear structures and chronological feeds designed to surface the ideas of the last 24 hours, not to accumulate and surface knowledge as needed.

A daily email with the top five Alibaba products feels fun and gimmicky as a side project, but it doesn’t help when you’re trying to find the best crib for your baby. Inevitably, you’ll want a way to search through a curator’s archives.

Curation, when thought of in the context of sharing bite-sized, isolated bits in feed-like architectures, is predominantly about entertainment, not utility. It’s not wrong to say there is a market for this kind of curation. What people miss, though, is that this market is already captured by Twitter, Facebook, and TikTok.

These entertainment giants offer curation that demands our attention, but they don’t offer curation on-demand. The opportunity is in moving curated content feeds away from their never-ending-now orientation and towards more goal-oriented interfaces. People should be able to find whatever content they want on their terms and not be beholden to when the curator decides to publish.

All curation grows until it requires search, and all search grows until it requires curation. - Ben Evans

Applying Ben Evans’ framework, it becomes clear that while the vertical search players have become too large and need curation, the curation feeds have become too long to browse and require search and structured data.

The solution is better search and better curation, all wrapped in a better business model - a combination I call boutique search engines.

Searchable, curated interfaces will help us move away from ephemeral, time-bound feeds, into contextual, high signal, trustworthy knowledge spaces. Because searchable interfaces are densely linked, an explorer can follow multiple trails through the content, rather than being dumped into a "most recent" feed.

With such a tight relationship between curation and search, the real question is not whether you need curation or search but at what point it comes in, and how?

Spotify doesn’t curate what songs make it to their platform. Instead, they take the entire universe of music and find endless ways to discover and search across their library, including a mix of manual curation (via playlists curated by their in-house team of curators and their users) and algorithms (like Discover Weekly).

Wirecutter does not review every product. They manually curate the top products, and then use search and other discovery tools to help you find what you need.

Thingtesting is not automatically scraping all CPG brands on the Internet - someone from their team or community went out of their way to add a brand to their database.

If you’re searching the On Deck member database, you know everyone has applied, been vetted, and paid a fee to participate in the program.

If you’re reading through transcripts on Tegus, you know the content comes from experts that have been handpicked by their team.

Across all of these examples, the value is in what they exclude as much as what they include. The friction in the supply side is what generates the signal.

On top of the signals, these businesses have built strong search engines. On Deck, for instance, has built an opinionated graph that allows you to discover talent in ways that are not possible on LinkedIn. For example, you can filter for people with “Software Engineering” skills whose current status is “Open to new ideas”. As a founder looking for engineering talent, I’ll take this curated dataset over LinkedIn’s any day.
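To make the idea of an opinionated filter concrete, here is a minimal Python sketch of that kind of query over a small, vetted dataset. It is an illustration only: the `Member` record, the `filter_members` helper, and the sample people are all invented for this example and say nothing about how On Deck actually stores or searches its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    # Hypothetical record shape, assumed for illustration only.
    name: str
    skills: frozenset[str]
    status: str


def filter_members(members, skill, status):
    """Return members who list `skill` and whose current status equals `status`."""
    return [m for m in members if skill in m.skills and m.status == status]


# Invented sample data standing in for a curated, vetted member list.
MEMBERS = [
    Member("Ada", frozenset({"Software Engineering", "Design"}), "Open to new ideas"),
    Member("Ben", frozenset({"Software Engineering"}), "Heads down"),
    Member("Cy", frozenset({"Marketing"}), "Open to new ideas"),
]

matches = filter_members(MEMBERS, "Software Engineering", "Open to new ideas")
print([m.name for m in matches])
```

The code itself is trivial, which is rather the point: the value comes from the choice of fields (skills, current status) and from the vetting that decides who is in the list at all.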

Unlike vertical search aggregators, boutique search engines feel less like yellow pages, and more like texting your friends to ask for a recommendation. They have constrained supply, which is the foundation for their biggest moat - trust. Importantly, boutique search engines introduce new business models that don’t rely on advertising.

At Startupy we’re building a boutique search engine for startup insights and the people and companies that have them. You can think of us as a digital playground where thinkers and creators curate, organize, map, and interconnect the world's most valuable insights and ideas.

There are tens of thousands of people sharing insights on a long-tail of topics, but their content is buried in the deep corners of the interwebs, found only by chance, and consumed in fleeting social media feeds that strip context and discourage reflection.

If you want to know what the smartest people have to say about the future of fandom, there is no single place to turn to. If you want to know who the experts in token design are, good luck. If you want to see what startups are doing interesting things in carbon offsetting, you’ll need hours to navigate a web of noise. If you want to find out who did the branding for your favorite DTC brand, you have to be an industry insider.

As information becomes more abundant, the connections drawn between disparate pieces are becoming increasingly important. Finding the next hot take on NFTs is easy; making sense of the crypto landscape is not. We have a firehose of information being created, and few people devoted to filtering, organizing, curating, and indexing that information. That’s what Startupy’s curators do.

Instead of aggregating everything or showing you curated content in a linear feed, Startupy is building a search engine indexed by people.

The reductive nature of this post might imply that building a boutique search engine is easy. On the contrary, it will require countless nuanced product choices. Getting the right mix of curation, search, algorithms, and business model will likely prove to be massively difficult and massively valuable. Here’s a non-exhaustive list of questions we’re pondering:

If the value proposition is signal over noise, how do we scale the signal? Over and over again, curation sites fall into an existential trap. They start with high-quality, curated recommendations. As they grow, they scale with crowdsourcing, often filling the gap with scraping. Over time, the content goes from great to good. At that point, vertical search aggregators like Yelp offer more utility. Yahoo for example became too big to be browsed, lost its signaling power, and reached the point where Google was better. The line between curator, compiler, and cataloguer is thin and there is a natural invisible asymptote – diminishing returns on more data over time. Mitigating this will require making highly nuanced product choices that trade off signal for scale.

What is the business model for this new wave of search engines? On the surface, “vertical search engines” are simple - content is the supply, and eyeballs are the demand. But the last wave of vertical search engines was built atop ad-based business models, which made things trickier. In an ad-driven marketplace, the eyeballs are on the supply side. Their attention is what the demand side, the advertisers, wants. The downside to this ad-driven model is that advertisers and content producers are competing for the same attention, which is why these sites end up feeling like marketing blogs. This is why subscriptions present an opportunity - they simplify the network effects to two sides: content as supply, a paying audience as demand. But subscription itself is not a panacea, especially when the use cases are not frequent enough. How often do you need to find a freelancer? An investor? If use cases are not frequent enough, the utility of a search engine won’t translate to a sustainable business model and you’ll have to come up with your own flavor of “come for the search, stay for something else”. Moreover, as my friend Joey points out in this article, any product you spend a lot of time using in incognito mode has a pretty big UX problem. How many New York Times accounts did you create before finally giving in to the paywall? With today’s subscription models, I have no real incentive to help the platform grow. Nascent token-based business models show early signs of promise. By giving ownership to its stakeholders and allowing subscribers to benefit from future upside, startups can overcome the cold start problem. Though appealing, a playbook for tokenized business models has not yet emerged. I suspect this will change in the coming years, and I’m excited to improve my understanding of the subject.

Who curates the curators? Platforms like Twitter delegate this responsibility to their users, who have to go through a long and arduous process of following a huge number of people to ultimately arrive at a self-curated timeline that mimics their interests. Some centralize their curation - at On Deck, you trust that they are doing the work of selecting who can join their network. Yet others stay away from curation in favor of more traditional crowdsourcing. The spectrum is wide. To ensure a quality foundation, I am initially granting access to Startupy curators on an individual basis. But it’d be silly to grow Startupy as if my taste and knowledge were definitive. I suspect scaling the number of curators without compromising signal is one of the hardest questions we’ll grapple with. Ideally, we move away from the wisdom of the crowds and towards the wisdom of communities. It should feel more like an orchestra than a mob, more like the compounding knowledge of groups and less like an averaging out. Mirror’s Token Race offers inspiration here, but only time will tell how to best answer this question.

How can we incentivize curators? People are driven to contribute for a complex web of reasons, including a desire to create something from which the larger community will benefit as well as the sheer joy of practicing a craft. Though intrinsic incentives should continue to be the primary reason people contribute, I’m excited to see what layering extrinsic incentives will unlock. As always, reality has a surprising amount of detail, and there are a zillion details to figure out. It’s very easy to launch an ERC-20 token. It’s much more complicated to construct a legally compliant token and set it up in a sustainable way.

How do you find the search engine in the first place? We started this piece arguing that Google needs to be unbundled. That’s a catchy headline, but in truth, I believe that until you can build habitual recall into your product, Google will be an important part of how your engine gets discovered in the first place. Zillow and Airbnb are examples of search companies that enjoy a good amount of direct traffic, but SEO was a big part of their early strategy. By being among the first to create the definitive page for a home, they benefited from an SEO land grab that has been hard to displace since.

We are far from achieving the grand vision of the Internet. The project of human knowledge, as it stands today, is a vast ocean of ephemeral and fragmented information and ideas, with the best sources near-impossible to find. We need more interfaces with a point of view on what information is missing, how it needs to be organized, and at what point of the value chain the curation has to happen.

It will be a long journey, but I am invigorated by the process of manifesting the answers into a tool that can help us use the explosion of information to harness our potential as a species, not to keep us scrolling.
