Jason Liu - Instructor, Shipping LLMs to Production

Duration: 64m5s

Release date: 10/06/2024

This week we sit down with Jason Liu, a machine learning expert and the author of Instructor. We talk about what working with LLMs is like, how to ship them to production, and how to make them more accessible to everyone. We also talk about the future of prompt engineering and how to make it easier to build better prompts.

Episode sponsored by Clerk (https://clerk.com)

Become a paid subscriber on our Patreon, Spotify, or Apple Podcasts for the full episode.

Andrew

http://v0.dev

https://github.com/ChrisBuilds/terminaltexteffects

Justin

https://github.com/ThousandBirdsInc/chidori

https://www.cursorless.org/

Jason

https://betterdictation.com (code JASON20)

https://cursor.ai

https://one-sec.app/


A good AI product is a product that someone will actually think about and use on a regular basis.
I think a great AI product is one where I'm almost a little bit anxious when it's not around.
I don't really want systems that will try to do my job or do it better than me, because I don't believe that can happen.
Hello! Welcome to the devtools.fm podcast.
This is a podcast about developer tools and the people who make them.
I'm Andrew, and this is my co-host, Justin.
Hey everyone!
We're really excited to have Jason Liu with us.
Jason, you work as an independent consultant who is really focused on AI.
It's a topic we've touched on with a lot of different companies,
but I'm excited to talk with someone who is really focused on this space and working in it.
Before we dive in to talk more about what you do in the space and what you're working on,
could you just tell the listeners a little bit more about yourself?
Yeah!
I'm Jason. I've spent the last eight or nine years
very happily doing machine learning and building all kinds of different models.
At the start it was very classical machine learning,
then I moved into computer vision and then recommendation systems.
And now that language models are really popular,
I'm applying everything I learned deploying those systems,
back in maybe 2015, 2016, 2017,
to this new paradigm of text models
rather than things like images and recommendation systems.
And really, the biggest thing I've noticed is that
a lot of the work we're doing with agents or with RAG
has a lot in common with classical recommendation systems
and chatbot systems.
And so it's been really fun to bring that older work
into this new paradigm.
That's awesome!
It's interesting to think about how much the industry has shifted,
because not that long ago
we were spending a lot of time on NLP,
just natural language processing.
I was working for a food company
and we were building a chatbot for Amazon Alexa.
And it was all templates.
You wrote template text to try to match
the phrases people might say,
and people had to say the exact incantation.
And now, with LLMs and everything,
we have so much more flexibility in how we can call these things.
And it's just interesting to see how
just a little bit of extra research,
no, not a little bit,
a lot of extra research has really changed the dynamics
of how we approach and talk about these things.
It's incredible.
I feel like...
three or four years ago
I was pretty dismissive of NLP in general
because I thought it was really hard
and it wasn't going to take off.
I figured,
I'll just do my own work,
do recommendations, do vision.
And when ChatGPT came out,
I basically had to write an apology letter
to all my friends
who had been doing NLP work.
And I thought,
okay, this works now.
Let me go see how we can
actually get value out of it.
Cool.
So with that in mind,
let's set the stage a little bit.
There are a lot of new terms floating around this stuff,
and there are even new kinds of engineers,
the newest of which is the AI engineer.
So what is an AI engineer?
Yeah, in my opinion,
one of the biggest things
that separates the AI engineer
from the traditional machine learning engineer
is that the models are already smart.
There's no more training that needs to happen.
And so the tools you need to use
to leverage a large language model
are a little bit different.
I think an AI engineer
is able to do a lot more full-stack development,
because now you're given APIs and interfaces
for these language models.
But they still have to be more quantitative,
thinking about evals
and how to test different language models.
And also have writing skills,
because a lot of the work
ends up being prompting.
But if you think about this work
as an AI engineer,
compared to something like data science,
there has always been this mode of translation.
In data science, I think
we take a business problem
and turn it into metrics and evaluations.
In AI engineering,
we take those same metrics
and turn them into prompts.
And so, for the most part,
it's very similar to what data science
looked like in 2015, 2016.
I've heard a lot of back and forth
about the term prompt engineering online.
Some people say it's a real discipline
and that it isn't easy,
while a lot of other people are pretty dismissive of it
and treat the whole term as kind of poisonous.
What's your take?
I do think there's a real skill there.
At the end of the day, the point of prompt engineering
is that if I give a model a million tokens of context,
30 pages of financial reports,
a good prompt engineer,
someone who can describe the tools they want,
is going to get better results
than someone who just dumps a 200-page PDF on it.
So I've tried to code up some of this stuff myself,
and I generally end up in a lot of Python docs.
So what language do you think you should be coding in for these things?
Does it not really matter?
Or do you just need an interface to an LLM
and then you're off to the races?
I think for the most part,
something between TypeScript and Python
makes the most sense.
I think the only reason Python has an advantage right now
is because the existing machine learning community
was already in Python.
And so, in terms of the kind of tooling that exists by default,
it's still the case that something lands in Python first,
gets more traction,
and then gets ported to JavaScript
or whatever other languages are out there.
But I think, from a community perspective,
everything started in the Python world.
And then, in terms of JavaScript and TypeScript,
because a lot of people tend to build tools
that interact with clients and with front-ends,
I think there you have to make a decision
between having a Python backend with a React front-end
or being able to build the whole thing in React.
I think it really comes down
to what you want to build.
You can have a website built in JavaScript,
but if you want to do evaluations and fine-tuning
and use libraries for more agentic reasoning,
a lot of that just doesn't exist in JavaScript right now.
Yeah, that's something I've seen in my own experience.
What exists is pretty inconsistent.
And you have a lot of people hacking things together,
and there's a lot of good research
around things like RAG
and ReAct, the AI version,
not the UI version.
And it's a bit...
the ecosystem feels a little inconsistent,
at least on the TypeScript side.
But I find there's another challenge on top of this,
which is that you're always interfacing with a service.
It's like, people are very...
people are very familiar with OpenAI
and their offerings.
But there are a lot of LLM platforms.
There are a lot of hosted models
across the board,
from vision to voice,
and then we have the LLMs
and the other generative services.
So you're sorting through this pile
of offerings that are out there.
How do you figure out
which LLMs are good for which things?
And then, is it...
is it necessary
to try to build up a set
of services in your repertoire?
Or should you just pick one
and run with it?
What's your advice?
I think it should start with the problem you want to solve.
And I think for someone who's just tinkering, it makes sense
to try Opus, try Groq, try OpenAI.
But when it comes to building products that are actually in production,
you're going to be limited by the rate limits and by the platform.
Sometimes I'll literally just ask my friends at OpenAI
to give my client more credits.
In terms of what we're seeing right now,
I think there are really three interesting players.
There's Anthropic, with models like Haiku and Opus, which are really strong.
There's OpenAI with 4.0, with their new multimodal models
and the capabilities coming that will really push language models forward.
And then a third one that's very interesting is Groq,
where you might have tighter rate limits and fewer guarantees,
but just by being able to generate so many more tokens per second,
the way you build your product is going to be different
just because of that tokens-per-second velocity.
I think after that, playing with open source
and playing around with these other language models makes a lot of sense,
but for putting these models in production,
it's really between Anthropic and OpenAI.
And again, it's more important that you understand your business problem,
figure out how you can test which one is better,
and just turn that into a test suite,
rather than trying to guess which one is going to perform better or worse.
So, is Groq fundamentally different from the others?
Because, in my mind, it's just something you can swap in
for OpenAI and all the other things and get the same properties.
Yeah, so they can host a bunch of models like Llama 3 or Mistral.
But I think the unique property with Groq is just how fast it is.
When something is 10 times faster, 30 times faster,
you can build a different kind of application.
I think it's best thought of as another way of working with these systems.
We want to thank our sponsor for the week, Clerk.
Clerk offers user management out of the box so you can build your apps faster.
Nobody wants to stay stuck on auth; they want to build their actual app.
It doesn't matter if people say it's easy enough to just add a user to a table.
Auth quickly balloons, with so many different ways to implement it,
like multi-factor authentication and SSO.
You might even have to implement one of the harder ones, like enterprise logins.
The other thing I want to call out is their pricing,
because the worry with per-user pricing is always, what if a ton of people
log in to my app for a day and then I'm gonna have a huge bill at the end of the month.
What this pricing does is, if a user signs up, uses your app, only uses it for one day
and then leaves, you never actually pay for them because they're not really one of your
customers.
I really like this because it allows you to use the ease of Clerk while also not running
up the bill if you happen to go viral.
Super cool.
If you want to learn more about clerk, head over to clerk.com.
Or you can go back to episode 75 where we interview one of the co-founders, Braden.
Are you tired of hearing these ads?
Become a member on one of the various channels that we offer.
And if you're not quite up for that, you can support us by buying some of our merch.
Head over to shop.devtools.fm to see what we got.
And with that, let's get back to the episode.
So like a lot of these things are like providers right now and I really think the future is
like on device.
Like I don't want to have to pay for every single thing.
I ask a thing.
Like usage-based pricing for something I might use in my personal life is not a fun thing.
So do you think some of these models could come to a local device in the future or any
of them are moving that direction or is just like an LLM, something that's just kind of
too big in general to be run like maybe in like a browser?
Yeah, here it's really going to be around how specific you want these certain tasks.
Like you can very much imagine like a language model that is just in your keyboard that can
do better like text prediction, right?
Or using a small language model to improve how Siri works.
I have some bigger opinions on like whether or not you can have a smaller model be able
to reason more with a rag that doesn't need to have the knowledge of the world.
But I think everyone kind of recognizes that on device will ultimately be the way to go
and it's just a matter of how we can scale that and its capabilities in order to be actually
useful, right?
Because I think any language model of any size can probably run on your phone and help
you do better like auto correct and auto completion.
But you know, I don't know if like in the near future where I have a small model run
on your phone that is very like understanding of like medical records or legal documents,
we might need some, you know, offloading perspective.
Yeah, so I won't be generating code in my iMessage reply box.
Maybe some simple code, you know.
Yeah, I think it's interesting to think about.
So there's like, there's, I suppose like two sides of this.
So the sort of data that it takes to train the models, I'm sure there's like a certain
like compression rate that you can't beat.
I'm sure that there's some theoretical maximum for that.
So it's like, oh, you want to like stuff all the knowledge of the world in there, there's
probably going to be a ceiling to like how small that can actually be.
And then, you know, there's also like hardware improvements.
It's like making it like dedicated hardware that's like really, really, really good at
like either storing or evaluating these neural networks.
It's all sort of interesting to think about.
What do you think is the most important like area of unlock that we think maybe we'll hit
in the next few years?
Will it be like just continue to improve the models?
Like, oh, now we have like GPT-5 or will it be like hardware, like better evaluation
on like smaller machines?
What do you think the big unlock for the next step is?
I think it comes from basically what you said about that compression rate.

It's like, OK, if we want to compress all the world's knowledge, you know, there is
going to be some number of gigabytes.
This model has to be in order to have that.
But I think the more interesting question is, is that actually what we need?
Right.
So it seems to be the case that as these models get bigger and you give it the world's
knowledge, it's able to reason and read and reply.
But you can imagine a world where what if I take the this compressed knowledge of the
world and then subtract the world back out and try to only preserve the reasoning aspects.
Then you could hypothesize that what if I just had a system with longer context, doesn't have
to know everything about the universe, but is able to correctly take control of an iPhone.
There's like, this might be a little bit more reasonable.
I can just phrase a message.
And if someone asks me a question, it can go search that by understanding that it can
use a tool to do search and not have to necessarily remember every single fact or
every single medical condition that they have.
Right.
You can imagine training a model that is just able to reason, use tools and read.
But not necessarily remember, you know, every single thing in the encyclopedia.
There are a lot of different tools and a lot of different acronyms that come with being an
AI engineer.
There's RAG, there's LLMs, there's vector databases.
Which one of these things do you think is like an actually important thing to learn?
And what do you think is vaporware?
I would say right now, I mean, I have some opinions here, but I think for the
most part, things like RAG and vector databases will definitely be needed.
Right.
But only in conjunction with something like full-text search.
Right.
So if you look at how we test things, you know, we have had text search for a very long
time and we've spent a lot of effort and a lot of money making text search really,
really good.
And as a result, you know, I think we are already trained to understand something.
Like if we wrote the document, we know how to retrieve it by writing a search query.
With something like vector database, it lets you be a little bit fuzzier and search
things that are, you know, tangentially related.
I think if you combine the two, you get a pretty good, pretty good search system that
you can use for either RAG, right?
Which is basically, we find documents, we give them to a language model and the
language model can read that out and then try to give you an answer.
But also just for plain document search.
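To make that hybrid idea concrete, here is a minimal sketch of combining a full-text ranking with a vector ranking using reciprocal rank fusion; the document ids, the rankings, and the k constant are placeholders rather than anything discussed in the episode.

    # Minimal sketch: merge a keyword (full-text) ranking and a vector-similarity
    # ranking with reciprocal rank fusion. Both inputs are document ids ordered
    # from best to worst by their respective retrievers.
    def reciprocal_rank_fusion(keyword_ranked, vector_ranked, k=60):
        scores = {}
        for ranking in (keyword_ranked, vector_ranked):
            for rank, doc_id in enumerate(ranking):
                # Documents that rank highly in either list accumulate more score.
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    # Hypothetical usage: feed the top fused documents to the language model for RAG.
    fused = reciprocal_rank_fusion(
        keyword_ranked=["doc_3", "doc_1", "doc_7"],  # e.g. from full-text search
        vector_ranked=["doc_1", "doc_9", "doc_3"],   # e.g. from an embedding index
    )
    print(fused[:3])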
I think in terms of vaporware, like this is a very spicy opinion, but I think things
like knowledge graphs, for example, right?
I think anything that's not SQL in the long run will probably look like vaporware.
But that's, I think, a pretty hot take.
SQL's a safe bet, I think, in general.
But it does, it is an interesting parallel between how do you take concrete
data structures that are stored in some uniform way in a relational database or
whatever versus the more fuzzy, sometimes fake world of LLMs where you have relative
knowledge that's sort of accumulated over in a neural network.
Or, because it's processed a lot of information and connecting the bridge
between those is interesting.
I do wonder what the world will look like in a few years.
And especially if we can make more progress on correctness, which seems like a big thing.
So maybe this is a good next question.
It's like, how do you think about the hallucination issue with LLMs?
And it's like, is there a world in which we're able to wrangle this a little bit more,
like get closer to like a correctness threshold or something?
Yeah, I think the way that we'll sort of try to avoid this is by two things.
Right?
I think one language models try very hard to make the reader happy.
And so it will try to lie to satisfy the user.
I think one part will just be making sure that we can like down sample that kind of
behavior and be more comfortable with saying, like, no, I don't understand.
The second thing again, really goes down to this idea that like, because it has
knowledge of the world and it tries to be helpful at all costs, it might make things up.
But if we train models that have less knowledge and more reasoning, it might get us to a place
where we can only give the data that we want it to look at.
And it will try to reference and cite as much of it as possible to make sure
that things are a little bit more grounded.
And in practice, when we do these fine tuning tasks where, you know, if I have a question,
it will make a list of sentences and a list of citations.
And if we fine tune in a way where we say, OK, if every citation must validate the statement
that you're going to make, we do find better, you know, hallucination reduction rates.
Right?
But this is only because our hallucination task is very specific.
You might just be, I don't want you to write URLs that don't exist.
Right?
That's a very concrete, measurable way of saying, like, yes or no, something is correct.
But in the more general case, you know, it's hard to figure out what, what
does it actually mean to have a hallucination?
And is it always a bug versus, you know, a feature at times?
But I think ultimately that might just come down to having different
kind of models with different requirements on again, like, how happy do you want
to make the user versus how much knowledge do you have and versus how much do you want
to cite and, you know, be able to say no to things.
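As a rough sketch of the "every citation must validate the statement" idea mentioned above, here is a simple substring-based grounding check; the source text, field names, and example answers are made up for illustration.

    # Illustrative only: each generated statement carries a citation, and we only
    # accept statements whose citations actually appear verbatim in the source.
    SOURCE = "The contract renews on 2025-01-01 and may be cancelled with 30 days notice."

    answer = [
        {"statement": "The contract renews at the start of 2025.",
         "citation": "The contract renews on 2025-01-01"},
        {"statement": "Cancellation requires 90 days notice.",
         "citation": "may be cancelled with 90 days notice"},  # not in the source
    ]

    def split_grounded(items, source):
        # Ungrounded items are candidates for a re-ask or for rejection.
        ok, bad = [], []
        for item in items:
            (ok if item["citation"] in source else bad).append(item)
        return ok, bad

    ok, bad = split_grounded(answer, SOURCE)
    print(len(ok), "grounded,", len(bad), "ungrounded")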
Yeah, it's an interesting property of LLMs, where sometimes I have to go,
you're wrong, stop lying.
And they're all, I'm sorry, I'm sorry.
Like, so it seems seems like a hard problem to solve.
But like, I think one of your points there that will be able to, like, factor out
the reasoning, like that seems like kind of a spicy point in itself.
Just like, is, is there reasoning to these things?
Because like the way I've come to think about it, like, at first, I was like, oh,
maybe there is some reasoning, but in my mind now, it's like LLMs are mostly just
like, let's predict the next word that's most likely and it's right most of the time.
So do you actually think we can like factor reasoning out?
I think so.
Like, we have, we have models that can like, you know, play chess,
we have models that can play go, we have models that can like beat Dota
and win poker.
So there's definitely some reasoning aspect, right?
But I definitely also think that, like, all this world knowledge is kind of just
because we haven't solved reasoning reasonably well, like, we don't have
the data for reasoning, but we just have data about all of humanity.
So yeah, I definitely think there is a future where if we can figure out how to remove
the, the knowledge part, we could have a system that says, hey,
like, I don't really know what kind of libraries exist, but if you're giving
the documentation for these libraries, I will now write more correct code
because I'm never going to like hallucinate a library that doesn't exist
in order to generate code for you.
Yeah, it would be so useful because like when I'm using GPT, it like in my code
base, it like, sometimes we have a very complicated model package, like
not a, not a LLM model, like a data model package in our repo.
And it loves to come up with like just the perfect function.
And you're like, that's what I want.
And then you go look and it doesn't exist.
Yeah, it's like from library import solve my problem.
And then I just like, yeah, exactly.
And, you know, there's now a bunch of open source issues of just users who
had hallucinated methods in someone's library.
And the maintainers are just like, what the hell guys, this doesn't even exist.
Did you even check that?
That is a fascinating problem though.
There's like the second order effects of solution of like people taking it
seriously and they're trying to do things or like submitting things as
answers or like, you know, unfortunate misuses of like, you know, maybe
a, if you have like a doctor or a lawyer or like someone whose opinion
really matters, a civil engineer, you know, like you want them to be correct.
And if they get lazy and use an LLM, that's fraught.
Yeah, I think there it's going to be a lot around how do we build UIs that
allow humans to evaluate the correctness of systems, right?
Like if the EISystem is generating a legal statement, you know, if we ask
the human to read the entire legal statement, that might be a very difficult task.
But if we can create pairs of just, you know, like examples to show someone
evaluating maybe like three or four sentences at a time and referencing
some source material, that might be a task that takes, you know, 30 seconds
per label, whereas reading the entire contract might just be much, much harder.
So I think there's going to be a lot more research in figuring out what is
the best way of getting feedback from experts that we can use to then, you know,
maybe fine tune the model to be better or generally change the way people
will have to work, right?
Maybe in the future, there's only code reviewers, there's no code developers.
And, you know, that could be interesting, but, you know, does that mean
we're going to be shown multiple versions of a PR and told which one is correct?
Or are we going to be evaluating, like, is their only job going to be
building unit tests, you know? Unsure.
But there might be a different way of interacting with how we build
things and how we generate data.
Kind of speaking of like how we build things, I wanted to ask you a little
bit more of a, like, I guess an industry specific question.
You know, it definitely seems like the tech industry does tend to go through
fads, so we'll have like hype cycles, you know, we had like web three
and the pandemic was like a big hype cycle.
And now we have like, obviously AI is a huge hype cycle.
And we see a lot of companies that are just like sprinkling on, you know,
LLMs into their products is like, oh, we are now the blah, blah, blah for AI
or like the AI for blah, blah, blah.
I'm curious about how you think about that, especially as you're looking
for companies to engage and work opportunities with.
I'm sure you have to have some level of filter.
Like, does this actually make sense for their business use case?
And you sort of refer to this a few times.
So how do you sort through like valuable usages versus like marketing
speak in the AI world?
Yeah, I think the biggest thing is a good AI product is a product
someone will actually think about and use on a regular basis.
And I think a great product is one where I'm almost a little bit anxious
when it's not around anymore.

And the reason, there's really two reasons.
I really like AI products that do like blunder minimization.
So I don't like, I don't really want systems that will try to do my job
or do it better than me, because I don't believe that can happen.
But I can know for certain that when I have this AI co-pilot
around that, you know, there are going to be less mistakes.
That's a very simple example.
You know, a product that could make you a little bit more anxious is like,
I am pretty dependent on a good note-taking app now,
right, with things like limitless and with CircleBack.
If I'm doing a job interview or I am meeting with an investor
and I don't have the notes, I think like, oh, let me go
and invite this just in case, because I might miss something or I might do
something else, right?
And this is because, again, like in every meeting, I'm so used to it
just providing this blunder minimization that I feel very good
about using these products.
The second thing is going to be around, you know, who are you actually selling to?
Like, are you selling to a consumer?
Are you selling to someone who is just trying to save time?
Versus selling to someone who is using the outputs of these AI systems
to make better decisions, right?
I always tell my friends, like, if you sell to someone who's trying to save time,
they may be willing to spend like $100 a year, tops.
But if you sell to someone who's trying to make decisions, right, like if you sell
to an investor and by using AI, they can do research faster, right?
Again, minimising the mistakes means that I can, you know, maybe like source
more deals, but ultimately make better decisions.
And so those two, I think, are the biggest ones.
Like something to be memorable, something that makes you a little bit anxious
once it like leaves your life and has something that helps you
make better decisions and minimise mistakes rather than just saving time.
So I think the saving time is kind of the trap right now, which is like,
you got to assume that their time is valuable and they even value their time.
Yeah, I've seen some interesting takes on this.
So like Linear came out with a product feature,
very specifically for, you know, a particular area of their product.
And they were talking about, you know, they're very opinionated
on how they do product design and kind of the best of ways.
And they're like, you know, you can use AI as an enhancement for a feature,
but it shouldn't be like the sole purpose.
It's like a means to an end.
And I thought that they sort of handled it well.
And it's just interesting to see how people position themselves.
I've seen some companies that like market themselves as AI, and you're like,
I have no idea how that even like makes any sense at all.
Like, what, what are you using?
Like, and then, you know, obviously,
there's, there are some that are just like ChatGPT wrappers.
They're like, oh, we'll just like put a UI in front of OpenAI
and, you know, build another thing.
So it's interesting to see the gamut of like how people are
experimenting with this and where they're going.
But I think we still have a lot to learn as an industry, for sure.
Yeah.
But even there, I think a lot of it ends up being like,
where's the value actually being derived?
Like my favorite example around GPT wrappers is actually job boards.
Right.
Like, there's a ton of job boards, making a ton of money,
but it's really just a wrapper over like a MySQL database.
Right.
Like, what you're actually selling is just rows in a database
with like, you know, some link to a checkout page.
But because of the way you package it,
you know, because of the way you, you know, present it,
it's very cheap to, you know, pay $300
for a job listing, because a recruiter would take 10%
of their salary.
Right.
So again, because you're selling
to a decision maker.
It's really easy to capture that value.
Right.
You wouldn't think to yourself, OK,
well, the database cost me $10 a month.
So I should only be charging like 10 cents per job listing.
Right.
I think it's the same thing in the AI world, where we should be
pricing on that value rather than just saying
like, we will save some time, et cetera, et cetera.
Yeah, I like the way you put it of like the anxiety.
Like, if I think to all of the tools that I like to use,
that like have AI involved, like if you took them away from me,
yes, I'd be very anxious because like, they provide a lot of value to me.
Yeah.
Like now when I code, I will actually type something and then pause
and then wait for the next completion to happen.
And if the completion is slow, I go like, was I unclear?
What did I do wrong?
Right.
OK.
And I think you can just, you can tell when that is the case.
Yeah, I definitely have that same thing where it's like, oh, no,
they're not showing up.
I feel the dread of having to type so much more.
Yeah, exactly.
So there's a lot of people building a lot of things with LLMs.
But what do you think most people get wrong
when they start integrating LLMs into their product?
This kind of ties into what we were just talking about.
I think that the simplest thing, which I think almost every person
I've spoken to always has like made this mistake,
is thinking about things like fine tuning
a little bit too early.
And in particular, using the cheapest models first.
Right.
I think what you should be doing is using the most expensive model
you can find to figure out if it's even possible at all.
And once you prove out these concepts, reduce cost
if it actually makes sense to reduce cost.

You know, it takes maybe a couple of cents to call open AI.
And maybe if you have enough users,
that's going to cost a bunch of money.
But if you start doing things like fine tuning,
now you have to hire a machine learning engineer.
Now you're worrying about like, where do you get your GPUs?
You know, it ends up being kind of a nightmare.
And so I had this tweet that went pretty popular a couple of weeks ago.
That was just like, hey, like, if you're worried about your LLM costs,
I don't think your product is like that valuable.
And you should just charge more.
I can sort of figure out how to like save, you know,
2 cents on the dollar.
Solve a really valuable problem instead.
And so, yeah, I think the biggest mistake is around
trying to use these open models and trying to get fine tuning.
And you should really be trying to build a product
and use the best models you have access to.
And that just might be like 4.0 or Opus.
Yeah.
It seems odd to me that people, like, there isn't a,
like a nice UI web solution where I can go fine tune models,
like super easily.
Cause like, I did some explorations with like image generation
and I did my own LoRA on top of Stable Diffusion
that like kind of encoded in a certain anime style into it.
And I was actually really surprised at how easy the process was.
It was just like hidden in a Google collab notebook.
And I had to like put things in Google cloud.
And it just felt so like janky to me.
And it was like, this is an easy process, but it's like,
there's layers of this like FUD of like, oh, it's a hard thing to do.
And you have to set a lot of things up.
So do you think, are there any startups like that?
Or do you think there's like room for one?
I mean, there's like, OpenAI does it, I think Together does it.
And they also just host fine-tuned models.
But I think the difference between, again, it's like, it's the same as
getting a summary and getting a good summary
that people would pay for is very different.
And so if you think about the examples of like generating images,
you know, like generate cartoons, they're very straightforward.
But I have a friend who uses like generative AI
to generate fake images of MRIs with tumors in them.
And they use that to augment their model
to be able to better detect tumors, right?
That ends up being a very specific problem.
Where understanding what the right knobs are
that I need to turn to actually build images
that can improve my model, right?
And that just ends up being like data preparation
and model training.
And less about just like, did I have a folder of 30 pictures
that can just throw into some UI?
And I think that's the same thing with fine tuning.
It's really easy to fine tune a model.
But it's really hard to figure out
if that actually has resulted in, you know,
a better business outcome.
Yeah, it's really interesting.
There's, it seems like there's a lot of things to keep in mind
when you're trying to add something like this to your product.
So what do you think are the hardest parts
of bringing like some AI solution to production?
I think bringing something into production
is actually quite simple now
because it's all really APIs.
The harder part is when you go viral
and the AI doesn't work
or you lose faith in it because it's hallucinating.
How do you then debug how to improve these systems?

I think software engineers really feel like
if you have it in production and you've deployed it,
it means it worked, right?
But really, you get in this place
where actually when you deploy it, it's like 80%.
80% doesn't even mean that you shouldn't have deployed.
It's just where you're at.
And then going from 80
and actually being able to figure out what that number means
and how you improve that metric,
I think that's, that's really the hard part.
And that's where I think like most of my advisory work has been
is actually after deployment.
It's like, hey Jason, like, we have deployed the system,
we went viral, we have a bunch of users,
but now we're losing 20% of our users every month
because it's not citing things correctly
or it's not able to actually generate summaries
that are useful for people, right?
Like people are passing in three hour lectures
hoping to get, you know, study notes
and we get seven sentences that say,
like this video is a professor talking about
the importance of mathematics in the world.
But you know what I mean?
Like, and figuring out how to actually quantify that,
those end up being the harder problems.
This is funny because this is very topical
with like Google giving search recommendations
for like, oh, you should eat rocks for your health
or whatever, it's just like AI generated bull crap, right?
But it's like, how would you convert that into like any number
or any binary thing that you could say,
you know, selects, like select star from data
where like label greater than point five,
give me all the bad examples, let's fix that.
They can't do it, right?
And so it's really hard to actually go figure out
like how do you debug that whole process?
So what is the strategy that you suggest to these companies?
Cause like testing seems like a hard problem
where it's like in traditional testing,
it's like I have inputs, there are outputs,
they should always be this.
In most prompts, it's like how do you determine good?
Like here at Descript, we have,
I think there's like one or two workflows
that we have like tests for where we can be like,
oh, this got better, this got worse,
but that doesn't seem like it's the case for most problems.
Yeah, I mean, my solution to this
is very much from like my social networks background.
Like you would just, you should just launch the product.
Like if Google, for example,
I think they should have like launched up the product
in a much smaller English speaking population
that was not the US.
Right, just launch in New Zealand,
run it for like three or four weeks,
have a really important like feedback mechanisms,
collect that feedback and then figure out what's going on.
But even then at like Facebook, for example,
when we did that, it would still be the case
that if you were able to get New Zealand and Australia
to run these tests, when you then deploy to the US,
the Americans are still just like super unhinged
and it's still hard to red team
what exactly the US population will use
with these language models.
But I think that majority is like very much unsolved.
And I think that's why they took so long to deploy.
And even when they did take so long, they still messed up.
So part of me really feels like
at this point, you should just deploy earlier,
mess up quicker,
and then just sort of make sure
that the team is in place to iterate quickly.
Yeah, that's a really interesting point.
There's been a theme that has happened over,
that we've covered over the last several episodes.
But I think the most prominent like start of this
was when we talked to Dani Grant of Jam.dev,
talking about like how we just have a higher quality bar
for software products these days.
This is like, say, the 2010s era,
it was like MVP, if you're not embarrassed by it,
you shipped too late,
just like get it out, get it out, get it out,
get it in front of people.
And that seems to be kind of to your point,
we're seeing that more and more with the AI space.
And my hypothesis on that was largely
because everything is moving so fast
that people feel like we're gonna get left behind
if we just try to make this thing perfect.
But there is this tension of like,
people expect more from their software,
they expect it to be more correct,
more beautiful, more capable,
like whatever else, and have like less patience for it.
So do you think that,
that, I don't know,
people are gonna be fundamentally more patient
with like LLM generated things,
or like do you think that this will be an issue?
I'm just kind of curious to like contrast these areas.
I think it comes down again
to sort of selling to the wrong audience, right?
It's like, if we sell to these consumers
for these like $15 or like $5 a month apps,
you end up just getting kind of like
the cheapest, least patient person
that wants to like try something
to convince themselves they wanna save their time.
Whereas I find that when you actually sell to like,
you know, bigger,
like for example, when we sell to things
like executive coaches and do like, you know,
call summaries or consultants,
we don't ever run into the patience problem
because they just have other things they need to get done.
And this is something that they're using
to like unlock the productivity.
Whereas like, you know,
I think when these higher price point customers
end up having issues,
is around things like quality, right?
The consultant can say,
hey, I made 15 phone calls,
I know three or four people had an answer to this question
and you only pulled out two of them, right?
If something is wrong,
I don't process this anymore.
I think that's when you can then go back in
and because you build a very specific product,
you can go focus on that and improve it and measure it.
Whereas again, when you try to capture everybody,
it's really unclear what anybody really wants.
And as soon as you do any kind of improvements in the system,
you take it for granted basically the next day, right?
It's like day one and you get like wifi on the plane,
day two, you feel like it's too slow, right?
I think that's generally how, you know,
the average consumer feels.
Yeah, that echoes a sentiment
we heard from the creator of npm, Isaac.
He was like,
I'd rather sell to one person with a lot of money
than 10,000 people with not very much money.
It's a lot easier to keep that one person happy
than it is to keep 10,000 people happy.
Yeah, I mean,
cause I think patience is about having other things to do
and only busy people have other things to do.
And so if you have other busy people,
they understand that like,
you know, time is money, money is time
and they can sort of make those trade-offs
and recognize what they're getting out of it, right?
Like it is companies and managers
that can recognize, okay,
a junior engineer is going to be cheaper,
but I'm going to have to delegate more.
A senior engineer is going to be much more expensive,
but I can kind of tell them what I want.
And I know that a couple of weeks from now,
things are going to be okay, right?
And I think, like, the consumer base
hasn't really grasped that fact yet
as they're trying out different models.
So you've mentioned a few cool projects,
that meeting note thing that you mentioned.
I'm definitely going to check out
after this episode seems super useful.
But what are some other cool non-chat related products
slash projects that you've found really cool that use LLMs?
I think the biggest one would be cursor.
I don't know if anyone,
if you guys have tried it out,
but I think the way that they've built out
an experience that is better than Copilot
because they have this like next action prediction
has been like very ergonomic.
So the idea is that you can just select code,
press command K,
and then give instructions.
And they can figure out what in the context
you need to use to generate better outputs.
And so, I will go into 4.0 to write some code,
I'll then ask Opus to help me take that code
and turn it into a blog post, right?
And then because you can do this very interesting
like at-command,
as you're giving it commands,
I can add external documentation,
I can add other files
and really have a really very natural way
of writing code now, right?
I have, for example,
my own library documentation in cursor.
So when I ask it to write better documentation
or more documentation,
I'll do something that's like at this file.py,
write it just like the docs of at instructor.py, right?
And that just feels very, very natural.
And again, once I don't have that,
I get that little bit of anxiety
that I think is really important for these language models.
Yeah, that's really interesting.
I think that like,
it is interesting in the way that we're developing new habits
around these like tools.
I read a thing about like the millennial pause,
it's like this reaction where when you're starting to,
when like millennials are starting to record themselves,
they like pause for a few seconds before the recording starts,
whereas like Gen Z just like goes right into it or something.
So it's like, I wonder what,
what are LLMs gonna do to us in this way, right?
What ticks are we gonna develop?
I mean, with coding now, I basically have that pause.
Like when I go on my friend's computer and I type something,
I'll like start the name and I'll,
I kind of already know what the autocomplete would have given me,
but on a different computer, I just look like I'm extra slow.
Even the way that I write now,
like a lot of my coding is actually using speech-to-text.
So I both will use speech-to-text and Cursor at the same time.
And I'm kind of just very comfortable now
with like selecting code, talking a little bit,
selecting code, talking a little bit.
And now I definitely can't do that in Vim
or on all my friends' computers,
if I've ever had to like show them something.
Yes, dictation and LLMs go together very well
cause it's like sometimes when you're writing these prompts,
you're like, I just, I could just code this in less characters.
It might take me less key strokes to do it in the end,
but actually talking to my editor
seems like a really nice workflow.
It's been a, yeah, I've been using it for a year now.
It's pretty good.
So we have like iPad babies, you know,
who just like come up like with touchscreens.
Now I'm wondering what, like, what are LLM babies gonna be like?
They just like want to talk to every computer.
I think it's gonna be great because LLMs require,
at least right now, require to be very specific and intentional
in how you describe the requirements
of the prompts you want to solve.
And I definitely think using an LLM
has made me like a better manager of junior engineers
and being a manager of junior engineers
has made me a better prompt engineer, right?
Cause now you can't really just say like,
oh, solve this problem for me.
I'd go, no, no, no, no.
Solve this problem for me.
This is when you would know,
this is when you know you'll be successful.
Consider these three qualities as we build the system.
Make sure that like this piece of code
needs to be very well organized
because this is gonna be something to be open source.
But for this piece,
just do it as quickly as possible
and then you have to reason about whether
you want to use Opus or 4.0.
I think there is a bit of a skill and delegation
that people are developing because of language models.
Yeah, it's really fun to think about.
So let's transition over
and talk about some of the work that you've done.
So you have some open source tools
that you've been working on.
One is called Instructor.
What is Instructor and what does that help you with?
Yeah, so Instructor is basically types for LLMs.
Right now, when you post to an API call,
you make a request,
you send out some list of strings
and you get a string back out.
And if you want something that's structured,
you kind of hope that it's structured.
Maybe you're doing some regular expression
to parse out some JSON object.
Then you JSON loads
and you hope all the keys and values are in there.
And generally what you do is you might use Zod in JavaScript
or you might use Pydantic in Python
to validate that object.
And then once you have that validation
and you have that guarantee,
then you're in a place where you kind of have a type
that you can work with
and that boundary between the API call
and the language model
and the rest of your system
is gonna be safe, right?
And one of the things we do with that validation
is if anyone fails validation,
we have some prompts that can go and re-ask
the language model and say, hey,
you had these errors,
the date was not formatted correctly,
the phone number's not formatted correctly.
Also, this response doesn't pass
some content moderation rules,
regenerate the answer.
And so in production settings,
you have type safety at runtime.
And then because you might want to fine tune models later,
you can then fine tune models that say,
okay, given the input
and these two attempts at getting the right answer,
now I want you to give me the answer in a single attempt.
And so for these very specific Unix-like type boundaries,
you can then fine tune very small
task-specific models to do these jobs.
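For anyone who wants to see roughly what this looks like, here is a minimal sketch using instructor with Pydantic; treat the client setup, model name, and field names as assumptions rather than a canonical snippet from the project.

    # Sketch of "types for LLMs": ask for a typed object instead of a raw string.
    # Assumes the instructor, pydantic, and openai packages and an OPENAI_API_KEY.
    import instructor
    from openai import OpenAI
    from pydantic import BaseModel

    class Contact(BaseModel):
        name: str
        email: str
        phone: str

    client = instructor.from_openai(OpenAI())

    contact = client.chat.completions.create(
        model="gpt-4o",                # example model name
        response_model=Contact,        # the type boundary: validation happens here
        max_retries=2,                 # on validation errors, re-ask with the error text
        messages=[{"role": "user",
                   "content": "Extract the contact: Jane Doe, jane@example.com, 555-0100"}],
    )
    print(contact.model_dump())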
Yeah, it's really cool.
It reminds me of a library from Microsoft
called TypeChat.
I think that some of the TypeScript teams
responsible for building that.
So how it works is like,
it'll get wrong stuff back sometimes
and just like re-prompt the thing over and over again?
Yeah, so I mean,
these language models are pretty good now
that like if you can do it in one attempt,
it'll usually just work.
It's usually it's like zero or one.
And yeah, they're basically smart enough
that you can basically capture any kind of validation error
as if it was a regular error.
Like the way that you implement this
is no different than how you would just implement a form.
So on the same shape of code that has,
you register a validator that attaches to some attribute
and you say, okay, well, password one and password two
has to match and you must match some rejects.
You can do that
and it basically captures the exception messages
and passes them back to language model.
But what this means is you can just build
more sophisticated validators.
And so today it might just be,
the list must be greater than 10 items,
but tomorrow it might be the joke must be funny
and reference an animal, right?
Because you just might put LLMs in that loop again.
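Here is a hedged sketch of that validator idea, written with Pydantic field validators; the animal check stands in for a simple rule, and the "must be funny" validator is left as a stub where an LLM-as-judge call could go.

    # Sketch: validators attach to fields just like form validation, and any raised
    # error can be fed back to the model as part of a re-ask prompt.
    from pydantic import BaseModel, field_validator

    class Joke(BaseModel):
        setup: str
        punchline: str

        @field_validator("punchline")
        @classmethod
        def must_reference_an_animal(cls, value: str) -> str:
            animals = ("cat", "dog", "horse", "parrot")
            if not any(a in value.lower() for a in animals):
                raise ValueError("The punchline must reference an animal.")
            return value

        @field_validator("punchline")
        @classmethod
        def must_be_funny(cls, value: str) -> str:
            # Placeholder: in practice this could itself be an LLM call acting as
            # a judge that raises a ValueError when the joke falls flat.
            return value

    try:
        Joke(setup="Why did the developer cross the road?", punchline="To fix the bug.")
    except Exception as err:
        print(err)  # the validation error the model would be re-asked with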
That's really cool.
I mean, I think this is like,
this kind of tooling is like more of what we need
for the correctness, right?
Just like having more confidence
that it is doing the things that we want it to do.
And especially because if you're feeding it back
into some other process and some other system,
then it like, you wanna make sure
that is at least correct.
So that's cool.
Yeah, like in the linear case, for example,
it might make sense to say, okay,
given this call transcript,
give me the action items as this big markdown file,
but that might not be consistently useful.
Whereas if you could generate just the task list
that matches the schema of the API call
you wanna make to linear
and then also assign all these dependencies,
now you can have the structure
that is not just a linear ticket,
but it could be an entire project based on that call, right?
And because you're working with the data structures
and because you have this type safety,
the code you write ends up being much nicer, right?
If you just ask for JSON,
you still have to like parse it
and then hope every attribute is correct.
And you know, you still have to make sure that
if one ID depends on another ID,
that it's assigned correctly,
but the validation kinda captures all that for you.
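A small sketch of what "an entire project based on that call" could look like as a typed structure; the fields and the dependency check are illustrative and are not Linear's actual API.

    # Illustrative schema: a call transcript becomes a project of tasks whose
    # dependencies are checked at the type boundary instead of hoped-for in JSON.
    from pydantic import BaseModel, model_validator

    class Task(BaseModel):
        id: int
        title: str
        depends_on: list[int] = []

    class Project(BaseModel):
        name: str
        tasks: list[Task]

        @model_validator(mode="after")
        def dependencies_must_exist(self):
            ids = {task.id for task in self.tasks}
            for task in self.tasks:
                missing = [d for d in task.depends_on if d not in ids]
                if missing:
                    raise ValueError(f"Task {task.id} depends on unknown ids {missing}")
            return self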
So does it just work on the output end
or the input end also?
So if I'm asking something
and I like, does it like kinda load into context,
like it should kinda look like this
and then when it gets it out,
it validates that it kinda looks like that?
Yeah, so there's a couple of different implementations
of how other language models
can do the structured output.
Sometimes we do something like constrained sampling
where because we know the shape,
we can pick the,
we can basically say like,
given this current state,
I know you're not allowed to generate any of these tokens,
only generate tokens that are valid.
And so, you know, JSON world, for example,
allows you to do that.
But in other tools,
in other systems,
they have something called tool calling,
which again, you pass in a JSON schema
and you get back an instance of that JSON schema, right?
But JSON isn't necessarily enough.
And so that's where the validations come in
to take you to that final step of correctness
rather than just structured output.
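As a sketch of the tool-calling flavour of structured output described here, using the OpenAI chat completions API; the function name, schema, and model are placeholders, and the result would still go through validation as just discussed.

    # Sketch: the JSON schema travels with the request, and the model replies with
    # arguments that match it. Assumes the openai package and an OPENAI_API_KEY.
    import json
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": "Schedule lunch with Sam next Friday at noon."}],
        tools=[{
            "type": "function",
            "function": {
                "name": "create_event",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "start": {"type": "string", "description": "ISO 8601 datetime"},
                    },
                    "required": ["title", "start"],
                },
            },
        }],
        tool_choice={"type": "function", "function": {"name": "create_event"}},
    )
    arguments = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    print(arguments)  # structured, but still worth validating before you trust it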
Nice, that's really cool.
So, right now,
we discussed there's lots of different tools
around these things,
but as language models get better,
do you think we'll need like less of these tools?
Like in a future where language models are better,
do we have to be not as good at prompt engineering?
Does that discipline just kinda disappear
if the model's good enough?
Yeah, so, this is really two things, right?
I do think that as the language models get better,
we're gonna need less tools.
This is because right now,
you need to have the model like generate an answer,
reflect on its own answer, correct it, then try again.
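A rough sketch of that generate, reflect, and retry loop; call_model is a placeholder for whatever chat-completion call you use, not a real library function.

    # Sketch of generate -> reflect -> correct, looped a bounded number of times.
    def call_model(prompt: str) -> str:
        raise NotImplementedError("placeholder for an actual LLM call")

    def answer_with_reflection(question: str, max_rounds: int = 2) -> str:
        answer = call_model(f"Answer the question:\n{question}")
        for _ in range(max_rounds):
            critique = call_model(
                f"Question: {question}\nAnswer: {answer}\n"
                "List any factual or logical problems, or reply OK if there are none."
            )
            if critique.strip().upper() == "OK":
                break
            answer = call_model(
                f"Question: {question}\nPrevious answer: {answer}\n"
                f"Problems: {critique}\nWrite a corrected answer."
            )
        return answer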
And so, I think definitely that will become simpler.
And in the same sense,
prompt engineering will be simpler
because it's able to reason a lot more
about how you do things.
Like today, to do a good summary,
you might have to say like you are an expert,
you know, executive,
that's reading these notes,
generate a meeting summary that is actionable
and has good references to who is accountable
for what things based off of this framework
that our company uses to do meeting minutes,
return that in markdown.
In the future, you might just say like,
this is for me, I am an executive, right?
You can definitely believe that.
But by the same token,
just because things get easier,
doesn't mean we do less work.
It's often the case that as things get cheaper,
we have more demands of the system.
A simple example is like the battery on your phone.
The battery has been increasing in capacity
over the past 10 years,
but the battery life is not
because we just keep building more complicated applications.
And so, I think that's where the trade-offs will be.
I think that the simple cases will get simpler,
but this will allow us to actually do much more
sophisticated things in the future, right?
Today, we need to use an agent to write an email.
Maybe tomorrow the email is done in one shot,
but biology research still needs to have
like an agent in a loop, right?
So, I think that's kind of where we'll meet
as language models get smarter.
Oh, that's cool.
So, if you could make one priority decision
for all the LLM providers,
just like this feature,
I want you all to implement it, what would it be?
This is very biased
because my answer is structured outputs, right?
And the reason is because I think
even if these systems get smarter,
the pain really is on the processing layer, right?
There's a reason we don't send things as untyped JSON.
There's a reason we're not sending CSV files
over the internet, right?
We have serialization formats, we have protobufs
because they are more efficient in certain ways
and they're safer in certain ways.
I think as we take the code
that we write with language models more seriously,
they're gonna have a lot more requirements
on the safety of these models
in terms of just, again,
like how many runtime errors are we gonna have?
And this is one of these things
that will last no matter how smart these models are.
Like today we'll want structured
because we just want to write code that's not crazy.
Like, if someone made an API endpoint
and the return type was just string,
like I would be livid,
like I would never use that endpoint, right?
I would hope there was like some kind of like open API spec.
I'm hoping there's some example JSON
that I can look at.
That's because they make me feel bad
about consuming from these endpoints.
And so being able to specify these return types
in a much more opinionated way,
I think it's gonna be a really big step
in just making the adoption of language models
for systems higher.
Because for chatbots, it makes sense.
I send a message and I send a message out.
And now they're struggling with multimodal
because they realize that, okay,
with the text message, I can attach a picture,
I can add multiple pictures, multiple captions.
It could be a voice memo.
And so they're trying to solve that aspect.
But when it comes to systems,
now I kind of want like a protobuf
to send between systems.
Cool, with that, let's move on to tool tips.
So my first tool tip of the week
is a project that I already shared,
but it got better, so I'm gonna share it again.
v0.dev is a way to generate UI just from a prompt.
It does really well.
You can iterate on things,
but the update I really wanted to share here,
which I think is an interesting move for Vercel,
is that it's now built to generate UI
with shadcn/ui,
which I think is just a great story,
just like some kid made a thing
that everybody started using, and now he works at Vercel.
Now it's like the thing behind one of their new initiatives.
So what ShadCN UI is,
it's just a way to generate components into your code base.
And now you can combine them.
And so stuff you generate from v0
will actually be somewhat usable code
that you can plop down into your React app,
and it'll just work.
And assuming that you have all the ShadCN UI components installed,
you can even customize what the output will be.
So the stuff you're seeing in the app now
is not really what you might see when you put it in your app.
You're just seeing the structure that it produces instead,
which I think is a big step.
The next step I wanna see, of course, is not just ShadCN.
I wanna see this for my design system
so that engineers at my company could come in
and generate UI with our design system.
I think that would be pretty cool.
I do think it makes a lot of sense
to elevate the level of abstraction
that it's generating at,
because there's a lot of details
when you're thinking about UI,
it's just like accessibility,
and there's just a lot of fine grain details
that's gonna be really hard to generate correctly.
So going up an abstraction layer,
let's just have a solid base that we know is good.
But yeah, more support for more frameworks,
it'll be interesting.
Yeah, you just gotta go reach out to Vercel
about their white label service,
and you'll have them implement
the design framework you describe.
Yeah, at a low, low price,
it's probably way too much money.
Okay, next up, we have LLM Client.
Yeah, so this is a fun one that I found recently.
It's a TypeScript library.
Well, I'll come back to that in a second.
But it implements a lot of different things.
So it's got RAG, it's got ReAct,
not React the UI framework, but ReAct the AI pattern,
Chain of Thought, Function Calling,
it works over different providers.
It's got a relatively simple API,
and it also has open telemetry integration
if you wanna do tracing for it.
So Andrew, if you scroll down,
you can kinda see it'll give some interesting things.
So there was something that I didn't know.
It's like, part of how this is structured is,
there's like some research out of Stanford or something
about this like pretty simple syntax
of like describing questions and responses and types,
like very seriously, in a prompt.
It reminds me a little bit, Andrew, of SudoLang
when we talked about it a while back.
So it got some of that feel to it.
Anyway, it does a lot of stuff,
and it's kinda interesting.
I was like, I'm using it right now
to experiment on building a CLI agent,
just to be able to like describe,
hey, I want you to take action, like do these things.
Tell me when this file was created
and have it like give me a list of CLI commands
to run to be able to like do that.
You could do all that
with just OpenAI's API if you wanted to.
But anyway, it's kinda interesting to explore this library.
If anybody is wanting to do
some open source contributions,
the TypeScript types on this could be greatly improved.
There's some weird build stuff under the hood here.
So that's the only caveat for that,
but it's been pretty cool.
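For a rough sense of the CLI-agent idea Justin mentions, here is a hedged sketch using the raw OpenAI chat completions API with a tool definition, since he notes you could do the same thing with just OpenAI's API; this is not the library's actual TypeScript API, and the tool name and prompt are assumptions.

```python
# Rough sketch of the CLI-agent idea using the raw OpenAI API, not the
# LLM Client library itself. Tool name and prompt are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "run_shell_command",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Tell me when README.md was created."}],
    tools=tools,
)

# The model proposes shell commands; a real agent would execute them,
# feed the output back in, and loop until it can answer.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```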
Cool.
Next up, we have the combination of Cursor,
which has got a nice new pretty website
that I haven't seen yet, and Better Dictation.
Yeah, so like guys,
if anyone follows me on Twitter,
they kinda know that I've been sort of fighting
this like hand injury for the past like two and a half years.
So it's one of the reasons I don't code as much
as I used to.
And the tools I really use in combination a lot
are Better Dictation,
which basically uses an on-device language model.
So it basically uses like an on-device fast English Whisper
to do dictation.
And then I use Cursor to edit all my code.
And so a very simple pattern I do
is basically I'll select code, Command K,
Command L, to let it rewrite it.
And then I'm able to using my voice
generate some code.
Anyways, this is like a tool
that like my friend had built for me
and then we ended up building it out.
And so if anyone wants to try it out,
you can use the code JASON20 to get it for 20 bucks.
And yeah, all it does is it just loads
the Hugging Face model locally on your computer
and you get access for life.
And so if you use this with Ollama,
you can actually code on a plane.
And that is a crazy feeling
that blew my mind at one time when I tried it out.
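Better Dictation is a packaged app, but the on-device Whisper idea it's built on can be sketched with the open-source whisper package; the model size and file name below are assumptions, and this is the general idea rather than how Better Dictation is actually implemented.

```python
# Hedged sketch of on-device dictation with the open-source whisper package
# (pip install openai-whisper). Model size and audio file are assumptions.
import whisper

# Load a small English-only model entirely on the local machine.
model = whisper.load_model("base.en")

# Transcribe a recorded voice memo; nothing leaves the computer.
result = model.transcribe("voice_memo.wav")
print(result["text"])
```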
Yeah, I've been trying to,
I kind of got inspired by another coder.
Who's the guy?
Justin, we interviewed him.
Scott Hanselman.
Scott Hanselman is also very much
in the like dictate to code.
And then I got home and tried to start dictating
with Mac dictation.
Oh my God, how have they not made it better yet?
It's crazy.
I was using the Mac dictation for about a year and a half
and it completely changed the way that I spoke
because I would have to enunciate
every single word in order for it to get it right.
But whisper, obviously, it's a lot faster,
but it has its own funny behavior.
I don't mean this to be my tool tip,
but there's this other project that I've had my eye on,
which should be easy to remember.
It's called Cursorless.
So it's cursorless.org
and it's specifically for coding with your voice.
So yeah, they do a lot of really interesting stuff
around like providing like little visual indicators
directly in your IDE to like help you jump
to different points.
So you don't feel like you need to move your cursor around.
It's a really fascinating project.
When I was at the Recurse Center,
I was doing a little bit of research on,
or just like doing a thought experiment
is like what would happen if you tried to build a,
language for someone who is blind,
like just build a language for someone who is blind.
And that's like a hard thought experiment.
And then like I came to a conference
and I actually saw Cursorless.
And I mean, obviously you have to be sighted
to be able to use this,
but for people with like hand injuries and stuff,
I was like, I think this is pretty interesting.
Yeah, at some point,
the company I worked for
almost suggested I get a typist.
So I could keep coding, and I realized that, yeah,
coding was not just like editing a single file.
Like the thing that ended up driving me crazy
was like transitioning through large code bases, right?
Like if there was an on call,
like it's not one file that has like one typo,
it's like, okay, look at the error message.
Can you go to line 172, can you jump to that file?
Okay, can you just double-click,
okay, what's that function name called?
Okay, and then can you go back to the original?
It gets crazy, right?
And so that's why I think cursor
has like the magic of all the prompts inside.
And so it's able to do that.
But this is very cool.
I'm definitely gonna check it out, Cursorless.
Okay, so a coworker shared this this morning on Slack
and it is amazing.
So this is a CLI library to just do crazy text stuff.
And I have no clue how they're doing it.
Like some of the examples they do,
I can't even imagine a terminal rendering that.
So like if you just go through these,
there's like a bunch of like matrixy looking ones.
There's one where this,
this becomes like a circle out of a galaxy.
There's one where all the text on the screen turns into fire.
There's just so many cool things
that they've done with this library.
And I really wanna see some terminals integrated
cause this seems like the most polished terminal thing
I've ever seen.
Sure, like there's a library
where you can generate these like text image type things
that some people use,
but those pale in comparison
to what they're doing on this project.
So if you ever wanted to make a very pretty CLI with Python,
go check out, what's it called,
TerminalTextEffects, just terminal text effects.
Yeah.
I highly suggest you go look at the website
in the show notes cause it's a fun scroll through.
Next up, we have Chidori.
Yeah.
So,
Chidori is a framework by this guy named Colton
who I had met through mutual friends.
And it's an agent framework for LLMs essentially.
So there's a lot of problems in building agent flows.
And one of the things you think of is like,
say you have this chain of thought, this reasoning
and you have multiple steps that it has to go down.
If it gets to the wrong path or to the wrong conclusion,
a lot of times you just have to restart.
It's like, okay, sorry, that didn't go well.
Let's tweak the original thing
and go through all this reasoning again.
The really interesting thing about Chidori
is it does time travel debugging, kind of,
but you can reset to a different point
and say, actually, I wanna step back a few steps
and then retry from this point.
So Colton's working on this startup called 1000Birds.
It's like 1000Birds.ai
and then I think Chidori is like a part
of this larger ecosystem that he's building out.
But it's got a lot of really cool stuff in it.
It honestly reminds me a lot of the startup
that I'm working on, Membrane.
There are like some parallels
and the kinds of work that we're doing.
But because this is like really focused on agents and LLMs,
I thought it was an interesting thing
to share for this one.
This reminds me of last episode a little bit,
cause like with Dagger,
I can see what he was saying about LLMs
and Dagger working really well together
where you could create these big workflows
that are basically all cached
based on the inputs and outputs.
Pretty cool.
Last up, we have OneSec.
I mean, if anyone who follows my Twitter knows,
I post a ton.
And that is despite using OneSec.
Basically what OneSec does is it just makes you take
like a deep breath before you open any apps.
And it really helps manage a lot of distractions
when I'm building things.
And so I have it set to like,
every time I open up Twitter,
the next time I open up Twitter,
it takes like 0.1 seconds longer to open up the app.
And it's pretty funny,
but it actually makes a meaningful difference
in my productivity.
Yeah, I'm a fan.
Oh yeah, you use them?
Yeah.
So you use this and were still able to post that many times
on Twitter in the last year?

I just use the laptop.
Oh, you just use your laptop.
There you go.
You can get around any system.
Exact, exactly.
Well, I got like hand issues now,
so I just got to make sure I'm not on my phone.
OneSec is a lot better than Apple's
like built-in sort of screen time
because there's always a way to skip it.
And you just like build a habit.
It's like, oh, pop up, skip, you know.
And then this like, at least forces you,
it still lets you do the thing,
but it like forces you to like pause for a few seconds
and that usually that initial dopamine craving
that you get, you like get a chance to realize
like, oh, I'm monkey brain right now.
I can like not do this.
Yeah, just like processing it really.
Yeah, yeah, exactly.
Cool. Well, that wraps it up for tool tips
and for the episode.
Thanks for coming on, Jason,
and teaching us all about LLMs
and how to use them
and what all the acronyms mean.
So thanks again for coming on.
Thanks, man.
Oh, the last.
Thanks for having me.
Yeah, thanks, Jason.
This has been, this has been really interesting.
And yeah, you're doing a lot of cool work.
We'd love to see it as it develops.
