Ok, I said I would be giving up on embedded searching, but I lied.
I found a pretty good solution that is justttt small enough to be flexible in many use cases, but large enough to do the main work reliably.
K.I.S.S.
If you can make a process simpler, do it. This is a lesson that is lost on those who over-engineer solutions everywhere.
I realized I was doing too much. I needed a DB that stays up to date with its vector counterpart, and an easy UX. The services I was relying on (OpenAI embeddings, Pinecone, PostgreSQL) would have to take care of the rest.
Enter the middleware.
Middleware 1 splits full documents into chunks that can fit into an OpenAI embedding request (currently ~2000 tokens).
Middleware 2 handles the relationship between PSQL and Pinecone. It takes care of upserts and embeddings, and its job is basically to listen for any changes in the database and forward them to Pinecone. This was made so much easier after I started using Peewee.
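To make the chunking step concrete, here is a minimal sketch of the idea, not the actual middleware. It assumes whitespace-separated words are a good enough stand-in for tokens (a real tokenizer would count differently), and the name split_into_chunks and its max_tokens parameter are hypothetical.

```python
def split_into_chunks(text, max_tokens=2000):
    """Greedily pack paragraphs into chunks of at most max_tokens "tokens".

    Assumption: one whitespace-separated word counts as one token. Swap in a
    real tokenizer if the limit has to be exact.
    """
    chunks, current, used = [], [], 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        # A paragraph longer than the limit is cut into max_tokens-sized pieces.
        for start in range(0, len(words), max_tokens):
            piece = words[start:start + max_tokens]
            # Close the current chunk if this piece would overflow it.
            if current and used + len(piece) > max_tokens:
                chunks.append(" ".join(current))
                current, used = [], 0
            current.extend(piece)
            used += len(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    document = "a b c\n\nd e\n\nf g h i"
    for chunk in split_into_chunks(document, max_tokens=4):
        print(repr(chunk))
```

Packing whole paragraphs first keeps related sentences together where possible. Only a paragraph that is too long on its own gets cut mid-way.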
Note to all devs: use an ORM! Interacting directly with your database will cost you an ORM and a leg!
# Middleware 2 upserting one document in psql AND pinecone
def upsert_one(chunk_id, chunk_text, chunk_source):
    # retrieve index
    index_res = upsert_pinecone_index(INDEX_NAME, 2048)
    # update chunk in psql
    update_chunk(chunk_id, chunk_text, chunk_source)
    # embed one document
    docs = embed_documents([chunk_text])
    # zip chunk_id and embedding
    zipped_res = zip([str(chunk_id)], docs)
    # upsert vectors in pinecone index
    upsert_many(INDEX_NAME, index_res, zipped_res)
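The general pattern here is a write-through layer, where every change to the database is mirrored to the vector index in the same call. It can be sketched without any external services. Everything below is a hypothetical stand-in: plain dicts play the roles of PSQL and Pinecone, and fake_embed replaces the real embedding call. It shows the shape of the idea only, not the real Peewee or Pinecone APIs.

```python
class ChunkSync:
    """Write-through layer: each write or delete hits the DB and the index together."""

    def __init__(self, embed):
        self.embed = embed  # callable: list of texts -> list of vectors
        self.db = {}        # stand-in for the PSQL chunk table
        self.index = {}     # stand-in for the Pinecone index

    def upsert_many(self, chunks):
        """chunks: iterable of (chunk_id, text, source) tuples."""
        chunks = list(chunks)
        if not chunks:
            return
        # 1. Write the rows to the database.
        for chunk_id, text, source in chunks:
            self.db[chunk_id] = {"text": text, "source": source}
        # 2. Embed all texts in one batch.
        vectors = self.embed([text for _, text, _ in chunks])
        # 3. Pair each string id with its vector and upsert into the index.
        for (chunk_id, _, _), vector in zip(chunks, vectors):
            self.index[str(chunk_id)] = vector

    def delete(self, chunk_id):
        # Remove from both stores so the index never holds stale vectors.
        self.db.pop(chunk_id, None)
        self.index.pop(str(chunk_id), None)


def fake_embed(texts):
    # Placeholder embedding: a one-dimensional vector holding the text length.
    return [[float(len(t))] for t in texts]


if __name__ == "__main__":
    sync = ChunkSync(fake_embed)
    sync.upsert_many([(1, "hello", "a.txt"), (2, "hi", "b.txt")])
    sync.delete(1)
    print(sync.db, sync.index)
```

Embedding in one batch rather than one call per chunk keeps the number of round trips down when a whole document changes at once.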
Other News
I really like this quote from Foundations of Computer Science (1992):
Though it is a new field, computer science already touches virtually every aspect of human endeavor. Its impact on society is seen in the proliferation of computers, information systems, text editors, spreadsheets, and all of the wonderful application programs that have been developed to make computers easier to use and people more productive. An important part of the field deals with how to make programming easier and software more reliable. But fundamentally, computer science is a science of abstraction — creating the right model for thinking about a problem and devising the appropriate mechanizable techniques to solve it. Every other science deals with the universe as it is. The physicist’s job, for example, is to understand how the world works, not to invent a world in which physical laws would be simpler or more pleasant to follow. Computer scientists, on the other hand, must create abstractions of real-world problems that can be understood by computer users and, at the same time, that can be represented and manipulated inside a computer.
TikTok payouts are capped. The more successful TikTok gets, the less money creators earn.
Power as defined by physics is different from power as defined by persuasion, but there seem to be commonalities: in both, currency buys tools (levers, pulleys) or people (experts, redundancy) to complete some physical work.
I changed my portfolio from nouns to verbs. Turns out verbs like “I do x to achieve y” are much stronger at getting your point across than “I am an x person”. People don’t really care about the second.
The everyday madness perpetuated by the internet is the madness of this architecture, which positions personal identity as the center of the universe. It’s as if we’ve been placed on a lookout that oversees the entire world and given a pair of binoculars that makes everything look like our own reflection. Through social media, many people have quickly come to view all new information as a sort of direct commentary on who they are.
Trick Mirror - Jia Tolentino
ars longa, vita brevis
Bram