Over roughly eight weeks I rebuilt the whole thing as a web app. I help run pickup soccer games in San Francisco, and one time we had Ice Cube playing from the speakers. So, naturally, the app’s name had to change to . . . Check Yo Shelf. I switched to Gemini Flash, with a fallback to GPT-4, after testing every vision-language model (VLM) I could get my hands on.
I ran every test photo through Gemini Flash, Gemini Pro, several GPT models, Qwen, and others, and scored their results on two criteria: the number of books they detected, and whether they got the book titles and author names right. Together, the two made up an accuracy score.
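A score like this could be computed along the following lines. This is only a sketch: the `Book` type, the `normalize` helper, the count-ratio formula, and the plain average used to combine the two criteria are all assumptions for illustration, not necessarily the exact formula behind the numbers in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as errors."""
    return " ".join(text.lower().split())


def score_model(expected: list[Book], detected: list[Book]) -> dict[str, float]:
    """Score one model's output for one photo against a hand-labeled answer key."""
    if not expected:
        raise ValueError("expected must contain at least one book")

    # Criterion 1: how close the detected count is to the true count (1.0 = exact).
    count_score = min(len(detected), len(expected)) / max(len(detected), len(expected))

    # Criterion 2: share of expected books whose title AND author both match.
    found = {(normalize(b.title), normalize(b.author)) for b in detected}
    correct = sum((normalize(b.title), normalize(b.author)) in found for b in expected)
    match_score = correct / len(expected)

    # Assumed combination: a plain average of the two criteria.
    return {
        "count": count_score,
        "match": match_score,
        "accuracy": (count_score + match_score) / 2,
    }
```

Running `score_model` for every model on every test photo and averaging the `accuracy` values per model would give one number per model to compare.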
Off the back of that process, the app’s accuracy rate shot up to over 95%. It was amazing how much better it worked.
Another benefit of going with a web app was that I knew much more about web apps than native iOS apps, so this time round I built an architecture that was more flexible and fault-tolerant. That, in turn, allowed for new features like reasonably accurate book covers.
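The Gemini Flash to GPT-4 fallback is one small example of that kind of fault tolerance. Here is a minimal sketch of the pattern, under some loud assumptions: `recognize_with_fallback` and the `Recognizer` type are hypothetical names, each model is represented by a plain function rather than a real API call, and treating an empty result as a failure is my guess at a sensible rule, not a description of the app's actual code.

```python
from typing import Callable, Sequence

# A recognizer takes image bytes and returns a list of (title, author) pairs.
# In a real app each one would wrap a call to a model API; none is shown here.
Recognizer = Callable[[bytes], list[tuple[str, str]]]


def recognize_with_fallback(
    image: bytes,
    recognizers: Sequence[tuple[str, Recognizer]],
) -> tuple[str, list[tuple[str, str]]]:
    """Try each recognizer in order and return the first usable result.

    Returns the name of the model that succeeded along with its books.
    Raises RuntimeError if every recognizer fails or returns nothing.
    """
    errors: list[str] = []
    for name, recognize in recognizers:
        try:
            books = recognize(image)
        except Exception as exc:  # any failure moves on to the next model
            errors.append(f"{name}: {exc}")
            continue
        if books:
            return name, books
        # Assumption: an empty result counts as a failure worth retrying elsewhere.
        errors.append(f"{name}: no books detected")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Passing the models as an ordered list keeps the primary and backup choice in one place, so swapping the order or adding a third model does not touch the retry logic.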








