Content Optimizer Web App: A retrospective
AI Content Optimizer Web Application
The client needed a way to efficiently analyze a number of on-site metrics for their own SEO clients. The tool would consist of two halves: a content auditor for ranking websites and an FAQ content generator. They wanted a tool that achieved the following:
Content Optimization
- For a given keyword, find the most common n-grams present across the top 10 websites that rank for that keyword in Google
- For those top 10 ranking sites, identify the most salient terms/entities
- For both sets of phrases, compare their frequency against how often those terms appear on the site we wish to optimize (a rough counting sketch follows this list)
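As a rough illustration of the counting in the first and third bullets, here's a minimal sketch, assuming the page text has already been extracted. The function names and the once-per-page counting choice are mine, not necessarily the production logic.

```python
from collections import Counter
import re

def ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Tokenize crudely on word characters and emit n-grams."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def common_ngrams(pages: list[str], n: int = 2, top_n: int = 20):
    """Count n-grams across the top-ranking pages and keep the most frequent."""
    counts = Counter()
    for page_text in pages:
        # Count each n-gram once per page so one keyword-stuffed page can't dominate.
        counts.update(set(ngrams(page_text, n)))
    return counts.most_common(top_n)

def coverage_gap(client_text: str, pages: list[str]) -> list[tuple[str, ...]]:
    """Which common n-grams from the ranking pages are missing from the client's page?"""
    client_grams = set(ngrams(client_text, 2))
    return [gram for gram, _ in common_ngrams(pages) if gram not in client_grams]
```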
FAQ Generation
- For a given question, use AI to generate a sensible starter answer
- For a given webpage URL, extract its content and generate a series of question-and-answer pairs, so the client can produce FAQ content for their articles at scale (see the sketch after this list)
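To make the templated-prompt idea concrete, here's a minimal sketch using the current openai Python client. The prompt wording, model name, and pair count are placeholders, and the project itself predates this client version, so treat it as illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FAQ_PROMPT = (
    "You are an SEO assistant. Given the article text below, "
    "write {count} question-and-answer pairs suitable for an FAQ section.\n\n"
    "Article:\n{article}"
)

def generate_faqs(article_text: str, count: int = 5) -> str:
    """Fill the prompt template and ask the model for Q&A pairs."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not the one used in the project
        messages=[{
            "role": "user",
            "content": FAQ_PROMPT.format(count=count, article=article_text),
        }],
    )
    return response.choices[0].message.content
```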
Solution
The scope of this project was quite large, so I leaned on a number of libraries and managed services. I'll review the tech stack and the third-party tools used here.
Tech Stack
- Google Cloud Run: The processing would be done on a Cloud Run instance that could be invoked as needed and spun down when not in use
- Cloud Tasks: Since a lot of these jobs took 30 seconds to a minute to run, I built an enqueueing system using Cloud Tasks that posts the job request data to a task queue for processing (an enqueueing sketch follows this list)
- Flask: Python-based, so it interfaces nicely with the AI libraries I needed
- OpenAI: Templated prompts for the FAQ generation, sent through their API
- Google Cloud ML: Used for harvesting entities out of crawled content (an entity-extraction sketch follows this list)
- Cloud SQL: Easy to hook into Cloud Run. Since jobs are enqueued and the user may exit the application before they finish, results need to be persisted; all job runs and task statuses are stored here
- Auth0: Easy-bake auth oven. I walled the Flask application off behind Auth0
- SerpScale API: For Google search engine results. I didn't want to manage proxies and scrape Google myself
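To show what the enqueueing looks like in practice, here's a minimal sketch with the google-cloud-tasks client; the project ID, region, queue name, and worker URL are all placeholders.

```python
import json
from google.cloud import tasks_v2

def enqueue_job(payload: dict) -> str:
    """Post job request data to a Cloud Tasks queue; a Cloud Run worker picks it up."""
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "optimizer-jobs")  # placeholders
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://worker-abc123-uc.a.run.app/process",  # placeholder Cloud Run URL
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload).encode(),
        }
    }
    response = client.create_task(request={"parent": parent, "task": task})
    return response.name  # task name, handy to store alongside the job status in Cloud SQL
```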
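And for the entity harvesting, a sketch against the Cloud Natural Language API, which is my assumption for what "Google Cloud ML" refers to here; each entity comes back with a salience score that feeds the comparison described above.

```python
from google.cloud import language_v1

def extract_entities(text: str) -> list[tuple[str, float]]:
    """Return (entity name, salience) pairs for crawled page text."""
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_entities(request={"document": document})
    return [(entity.name, entity.salience) for entity in response.entities]
```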
This was a fun project. I built an API-based solution with a handful of Flask endpoints while also serving a front end rendered with Flask's Jinja templates (keep it simple). I tied the auth into the web app and pushed most of the processing onto the Google Cloud resources above.
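Here's a minimal sketch of how those pieces meet in Flask, reusing the hypothetical enqueue_job helper from the Cloud Tasks sketch; the routes and template name are illustrative, and the real app gates these behind Auth0.

```python
from flask import Flask, jsonify, render_template, request

from jobs import enqueue_job  # hypothetical module holding the Cloud Tasks helper above

app = Flask(__name__)

@app.route("/jobs", methods=["POST"])
def create_job():
    """Accept a job request, enqueue it, and return immediately."""
    payload = request.get_json()
    task_name = enqueue_job(payload)
    # 202 Accepted: the worker will finish later; results land in Cloud SQL.
    return jsonify({"task": task_name, "status": "queued"}), 202

@app.route("/")
def dashboard():
    # Jinja-rendered front end; the user can close this tab while jobs run.
    return render_template("dashboard.html")
```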
This project hits pretty much everything end-to-end: auth, task queues, scraping, NLP, persistence, and a front end.
Lessons Learned
Account for edge cases when web scraping. Assumptions that hold for your clean test pages will not survive contact with the open Internet. I racked up an immense GCP Stackdriver logging bill from a logic error: a loop kept waiting for an expected value that never occurred, thanks to an erroneous assumption about the contents that would get returned to me by default.
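A hypothetical reconstruction of that failure mode, not the actual code: pagination that assumed a "next" key would always appear, and a bounded version that caps both runtime and log volume.

```python
import logging

def paginate(fetch_page, max_pages: int = 50):
    """Walk a paginated result set defensively.

    The buggy version assumed every response contained a "next" key and
    looped (and logged) forever when it didn't. Bounding the loop and
    treating a missing key as "done" caps both runtime and log spend.
    """
    url = "https://example.com/results?page=1"  # placeholder
    for _ in range(max_pages):  # hard upper bound instead of `while True`
        data = fetch_page(url)
        yield data
        url = data.get("next")  # may legitimately be absent
        if not url:
            return
    logging.warning("hit max_pages=%d; stopping pagination early", max_pages)
```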
Parsing web content is difficult. Nobody uses consistent HTML tagging, or semantic tags for that matter. Sometimes people don't even use paragraph tags!
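Here's the sort of fallback that lesson forces, sketched with BeautifulSoup (my assumption; the post doesn't name a parser): prefer paragraph tags, but degrade gracefully when they're missing.

```python
from bs4 import BeautifulSoup

def extract_text(html: str) -> str:
    """Pull readable text from arbitrary HTML, paragraph tags or not."""
    soup = BeautifulSoup(html, "html.parser")
    # Strip script/style noise before extracting anything.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    if paragraphs:
        return "\n".join(paragraphs)
    # No <p> tags at all? Fall back to whatever text the page has.
    return soup.get_text(" ", strip=True)
```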
Overall, the client is happy. They have a web app their employees can log into to kick off jobs and let them run. It automates a lot of the research and manual work of optimizing websites for search engines, which adds up to real time savings.