The One with AI and Todd Underwood

Duration: 43m23s

Release date: 04/06/2025

In this Google Prodcast episode, Todd Underwood, a reliability expert at Anthropic with prior experience at Google and OpenAI, joins the hosts to discuss the current state and future of AI and ML in production, particularly for SREs. Topics include the challenges of AIOps, the limits of current anomaly detection, the potential for AI in config authoring and troubleshooting, trade-offs between product velocity and reliability, the evolving role of SREs in an AI-driven world, and lessons from writing a technical book in a fast-moving field.

Hi everyone, welcome to the Prodcast,
Google's podcast about site reliability engineering and production software.
I'm your host, Steve McGee.
This season is all about our friends and guests, and what's coming in the SRE space: new technology,
modernized processes,
and of course the most important part, the people doing the work.
So welcome, everyone.
Hi everyone, welcome to the Prodcast.
This is a podcast from Google about SRE and production.
Today we have a special guest.
TMU — everyone knows what that means.
TMU: in real life, his name is probably Todd Underwood.
Maybe that's what he puts on official forms and things.
And of course, I'm here as always with my friend Matt.
How's life?
I'm doing well.

And I think the two of you, actually, Matt and Todd,
might even be in the same part of the world.
Is that even true?
Is that possible?
It's true.
You're, like, two miles apart.
Wow, that's wild.
I'm actually in his office right now.
Oh, even better.
It's just like that.
It's a weird experience.
It's a bit...
It's real.
Insulting, frustrating,
unnerving.
I don't like what you're up to.
It's fine.
It's fine.
I'll go with it.
It's fine.
Yeah.
I'm in his office.
I'm not at his desk.
I'm not touching his stuff.
Well, I am, actually.
I shouldn't say that.
I can see that you're busy.
That's...
That's the right word.
I am busy.
Todd, who are you?
What do you do?
Can you tell our listeners?
I'm Todd.
I work on reliability at Anthropic,
which is an AI company.
We work on developing
AI models
and the products around them.
Before that, I worked at OpenAI
for a little while,
and at Google for a very long time.
So I've been
doing reliability things
for most of my life,
and I've been doing machine learning things
in particular since 2009.
So that's who I am.
One piece of advice I have for everyone:
under no circumstances
should you ever
write a book.
I know that because I wrote a book
with some machine learning reliability people.
It was a catastrophic mistake,
really terrible.
No, the book is fine,
but it's really a lot more work
than you think it's going to be.
I wrote a book once too,
and it was hard.
At the end of the day it's just a Google Doc,
but a very long one.
It's a whole book.
Ah, man.
It takes so much time.
My favorite story from the book
is from near the end,
when you print the full draft for the first time
and you review it,
you read the whole thing through in order.
And my kid
saw it and said,
what is that?
And I'm like,
it's the book.
It's almost done.
And she said,
wait,
you were serious about that?
I'd been working on it
for two years.
She said,
oh, I thought
everyone was just "working on a book."
Everybody has one of those in the works.
And I said,
no, we were actually writing a book.
We wrote a book.
That's different.
That's true.
So what do you think?
Why was it an important book?
And what happened after the book came out?
I think what's interesting
to say
is that the subject is hard,
because
you think
you know what you're doing,
and then you talk
to people
who are doing the real thing.
Oh my God,
look at that,
you've got a copy.
Oh,
yes,
Reliable Machine Learning.
And also,
a funny side note
about this book
is
that Anthropic employees
are called
Ants.
Like the insect,
so I thought,
that's a strange coincidence
with the cover,
a little bit.
Yes,
it's very strange.
What was intentionally interesting and hard about this book
is that we deliberately tried to give it a bit more shelf life
than you normally get with technical books,
in part because we think this kind of thing changes so fast.
If you're trying to capture what people are doing today,
don't write a book.
Just put up a blog post, write some articles,
make a video, a tutorial or something.
But we wanted to try to aim at principles.
Things like how you think about managing data and metadata.
Or how you think about failure modes and fault tolerance
for services that have AI or ML
as a particularly important component
of their interfaces and applications.
And so, to do that, you have to be more generic and more principled,
which is hard, because you want to be concrete enough to be useful,
but not so concrete that it stops being useful two weeks later.
I think that was the hard part.
I think the reception has been good.
But also, and this sounds silly,
I'd say the number of times people have brought the book up to me
in the last month
is not that different from the month it came out,
which is really surprising for a technical book.
That's cool.
Yeah.
In recent news,
I wanted to tell you that I have spent
a lot of time vibe coding
for the first time in my life.
And that's a phrase now,
so this phrase may not hold up in perpetuity,
for the people who find this podcast 25 years from now.
I had AI write code for me,
and I just accepted whatever it gave me.
I was like,
yep, yep, yep, yep, yep.
I didn't do any debugging.
And all I did
was write tests
and then say,
now make the tests pass.
No, not like that.
Like this.
That was the level of my prompt engineering.
It wasn't very smart.
It was just like,
hey, man,
come on.
And it was,
it was pretty cool.
Even some production-y stuff.
It wasn't real production,
of course.
But it was like,
do this
so that I can
run it on a cloud,
and do that
so that it's
debuggable,
and
add logging,
and
blah blah blah blah blah,
words like that.
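For anyone who hasn't tried the workflow Steve is describing, here is a minimal, invented sketch of the "all I did was write tests" loop: the human writes the tests as the spec, and the assistant writes and rewrites the function until they pass. The names and behavior are made up for illustration, not taken from the episode.

```python
# A tiny example of the "tests as the spec" loop.
# slugify() stands in for whatever the assistant is asked to write;
# the human only writes the tests and keeps saying "make these pass."
import re

def slugify(text: str) -> str:
    """Implementation the assistant converges on after a few 'no, not like that' rounds."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)

def test_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

def test_strips_punctuation():
    assert slugify("Ship it, now!") == "ship-it-now"

def test_collapses_whitespace():
    assert slugify("  too   many   spaces ") == "too-many-spaces"

if __name__ == "__main__":
    test_lowercases_and_hyphenates()
    test_strips_punctuation()
    test_collapses_whitespace()
    print("all tests pass")
```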
So those are my excuses for talking about using AI in production,
because that's the topic of this episode.
That's the point.
And, let's be honest, most of this stuff hasn't worked very well so far.
It's really good on demos, but then you go use it on real code, and it's like,
I don't really know what's going on here. Maybe you could do a bunch of work to
make your code base look just like what I expected, and then I'll save you a little bit of time.
And, like, that's a trap, right. So that's where we've been, up until now.
A hundred years from now, I think we'd all agree that the computers are going to do
most of this stuff for themselves and for us. We're not going to be needed. So,
somewhere between now and a hundred years from now, things are going to change a lot.
So, I've seen limited things start to work, and it's mostly things
that are super adjacent to software engineering, like what you were describing, Steve. So,
like, you know, can you write Terraform for me? Yes, these things can write Terraform.
Can you produce a Helm chart? Yes. Like, can you help me troubleshoot
this, like, this Kubernetes command that isn't working? Yes, they can do
that. But then, when you say, hey, can you specify and architect a set of redundant services,
spread across three locations, with 20,000 nodes in one location and 2,000 nodes
in each of the other two locations — things that are pretty normal for a human — you ask,
hey, and they're like, no, they're not there yet. So, somewhere between, can you do Kubernetes for me,
and, like, build the whole service? That's the gap. It'll get there. That's fine. Yeah,
that's fine. And thinking about tests, like, one of the funniest
things, I have to say, a little side note, it's the thing I find
the most human about the coding. If you tell an AI system to make something pass the tests,
the most common thing it will often do is change the tests so that they pass. And I'm
like, well, that's a bit... but that's kind of how it is.
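A side note on that failure mode: one crude guard, sketched here under the assumption of a git checkout with the spec living under tests/, is to reject any agent-produced diff that edits the tests themselves. The paths and workflow are illustrative, not something discussed in the episode.

```python
import subprocess
import sys

# Hypothetical guard: reject an AI-generated change if it modified the tests
# that were supposed to stay fixed. Assumes a git repo and a tests/ directory.
PROTECTED_PREFIXES = ("tests/",)

def changed_files(base_ref: str = "origin/main") -> list[str]:
    """List files changed relative to the base branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def main() -> int:
    touched_tests = [f for f in changed_files() if f.startswith(PROTECTED_PREFIXES)]
    if touched_tests:
        print("Refusing change: the agent edited the spec (tests) instead of the code:")
        for f in touched_tests:
            print(f"  {f}")
        return 1
    print("OK: tests untouched; only implementation files changed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```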



But that's not an easy question to ask of your documentation.
It's not a web-search question.
It's an AI question.
But if you have a large pile of documents and you ask what's in them,
the big AI systems will answer that question for you.
That sounds a lot like toil reduction from this technology,
and also like a kind of skills refresher.
I know there's probably a better phrase for it.
But being able to know more about what we wrote down last time,
and more about the state of the system,
straight from the sources, maybe.
I think, on the human-augmentation piece,
all the systems where you can put a set of documents
into a project and then work against that project,
like NotebookLM from Google, or Projects from Anthropic,
both are like,
hey, link these documents,
either upload the documents,
or link your Google Drive, or whatever.
And now I just want to talk about these documents.
I want to ask questions about these documents.
And so, some cool examples I've seen.
This is less a production engineering point of view,
but I'm a manager.
One cool example I've heard
is to put all of your 1:1 docs
into a project.
And then you can ask questions
about your 1:1s, in aggregate.
You're like,
who's really doing well,
or who has had interesting questions,
or who haven't I met with in more than three months?
You could get those things other ways,
but you can also put in your team meeting docs,
or your incident reviews,
and ask,
what were the most contentious incident reviews
in the last six months?
Or you can ask,
which three documents
had the most comments,
the most open questions,
the most unresolved items
and follow-up items?
All of these documents are in this bucket
of helping humans,
of being more useful by augmenting humans.
And the computers aren't doing that on their own.
We're not at that point.
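The document-project pattern Todd describes can be approximated with any chat API by concatenating the docs into the prompt. The sketch below is only illustrative: ask_llm is a stand-in for whichever provider SDK you actually use (its name and signature are invented), and a real project would chunk or retrieve rather than concatenate everything.

```python
from pathlib import Path

def load_corpus(doc_dir: str, suffix: str = ".md") -> str:
    """Concatenate a folder of meeting notes / incident reviews into one context blob."""
    parts = []
    for path in sorted(Path(doc_dir).glob(f"*{suffix}")):
        parts.append(f"--- {path.name} ---\n{path.read_text()}")
    return "\n\n".join(parts)

def ask_llm(prompt: str) -> str:
    """Placeholder for a real provider SDK call (Anthropic, Gemini, ...)."""
    raise NotImplementedError("wire this up to your provider's chat API")

if __name__ == "__main__":
    corpus = load_corpus("incident-reviews/")
    question = (
        "Across these incident reviews from the last six months, which three were "
        "the most contentious, and which follow-up items are still unresolved? "
        "Cite the document name for every claim."
    )
    print(ask_llm(f"{corpus}\n\n{question}"))
```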
That touches on something that's really important to some people:
trust and safety,
knowing that what actually happened
and the decisions behind it
aren't getting lost,
but also that the system isn't saying
something that's potentially problematic.
What do you have in mind there?
I think the biggest improvement I've seen in systems
that are grounded in web documents,
or in documents you've stored,
or documents you've produced,
is citations.
Don't just tell me something is true;
tell me why you think it's true,
so that I can build trust.
It's the same with humans.
If somebody tells me the weather is going to be terrible,
I want to know why.
Why do you think that?
That's a perfectly normal question.
Where did you get that information?
I want to be able to assess it and understand it better.
That's a normal thing to do with another human being,
and it's also a normal thing to do
to build trust
in an AI.
Don't just tell me it's this number.
I'm sure it is a number,
but also tell me,
where did you get that information?
The citation is necessary.
The Anthropic systems,
or NotebookLM,
the ones that work over documents,
include links back to the sources.
They say, here's a table,
here's a paragraph,
and that's why I think
this is important,
or this is wrong,
or this is interesting,
or this thing exists.
I like that kind of answer.
That's something we've tried at Google too,
getting people to say,
and here is where
the authoritative information comes from.
Citation is a big part
of how we earn people's trust
for these kinds of answers.
I don't know how you solve all of that.
What I think is, when I think about the way I use,
for example, search, I think there are different use cases.
One of the use cases is,
is Cedar Rapids the capital of Minnesota?
No, no, it's not.
Okay, it's not the capital of Minnesota?
Tell me about Saint Paul.
Okay, I just wanted to know a fact.
The other kind of question is, how are U.S. state capitals selected?
Or why was the border between Canada and the U.S.
declared to be the 45th parallel?
Why did they do that?
That's not a one-fact answer.
I want the synthesized answer, but I also want to look at the citations behind it.
So I think, as we move from simple answers
to more complex stuff,
I think more and more people will be interested in those answers
and the links behind them.
There's a UX issue where early versions of,
for example, some AI search engines surfaced the answers prominently,
but the sources weren't surfaced very prominently.
Because you really want it to be like,
hey, here's what I think, with the links, the links, the links,
the links inline,
and then also a table of citations at the end.
It's UX, but I think those are the important things.
So, just focusing on your position today.
So, you're the reliability
czar, let's say,
no, you probably have a real title,
you're the reliability guy at a place.
Does quality fall under your purview?
That is, do you
own that stuff?
And, like, if so, how do you...
Like, this is where the hallucinations live.
So, somebody sits down with one of these things, you know,
and a hallucination can totally,
like, sneak into the result.
We know, like, you know,
you know, you're hitching your
reputation to a company that does this stuff.
Like, is there a way to put a graph on it?
Like, this isn't good?
You know, somebody said the answer wasn't correct,
or, like, we think that maybe
the model was being weird.
Like, does that,
does that just happen and we're like,
I don't know, it's too hard?
Or is there a way to get better at this over time?
Like, how do you get across this type of
quality issue,
like, it's up, but it's weird, you know?
So, it's funny, you've walked right into one of my tropes.
Like, I've given a number
of public presentations.
I thought you might walk into my trap,
but here we are, at an impasse.
I've given a number of presentations,
arguing that end-to-end model quality
is the only SLO that people working on
reliability for ML systems can have.
And I think it's easier, like,
hallucination kind of stuff is interesting
and some of the subtle cases are interesting.
But let's just talk about, like, a payment fraud system.
If the payment fraud system is running and the model is fresh
and the model just says yes,
every transaction is fraud,
the system is not running.
Like, it's not, you know,
it's not, you are no longer accepting any payments.
It's not an interesting case.
Like, the model just thinks that everything's a fraud.
Okay, is that okay?
Might as well be down.
Right, might as well be down, right.
And so I think similarly, like,
if you're Amazon and the recommendation systems
recommends a kitty litter robot
to everyone regardless of what they're purchasing,
Amazon's losing tens or hundreds of millions of dollars
until they fix that recommendation.
Because, like, you know,
I'm not saying no one would buy it,
but I don't have a cat,
so I'm not buying, like, a kitty litter robot, right.
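To make that concrete, a crude way to watch for the "model says everything is fraud" failure is to treat the flag rate itself as an end-to-end quality signal and alert when it leaves a plausible band. This is only a sketch of the idea; the class, thresholds, and window size below are invented, not anything from Anthropic or Google.

```python
from collections import deque

class FraudRateMonitor:
    """Crude end-to-end quality SLI: the share of recent decisions flagged as fraud.

    If a fresh model suddenly flags (nearly) everything or (nearly) nothing,
    the service is effectively down even though every RPC is succeeding.
    Bounds and window size are illustrative, not tuned values.
    """

    def __init__(self, window: int = 10_000, low: float = 0.001, high: float = 0.10):
        self.decisions = deque(maxlen=window)
        self.low = low
        self.high = high

    def record(self, flagged_as_fraud: bool) -> None:
        self.decisions.append(flagged_as_fraud)

    def healthy(self) -> bool:
        if len(self.decisions) < self.decisions.maxlen:
            return True  # not enough data yet to judge
        rate = sum(self.decisions) / len(self.decisions)
        return self.low <= rate <= self.high

# Usage: call record() for every scored transaction; page when healthy() flips to False.
```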
And so I think, so that's the first thing I would say.
And then the second thing is,
even in the modern systems,
one of the most common intersections
or one of the most obvious intersections
between model behavior and reliability
is, like, the trust and safety checking.
So you will do, like, at Anthropic,
we care desperately that the models are safe,
that the models are helpful,
and that the models don't let you do bad things.
We published this whole responsible scaling policy
about, like, you know, what we're going to prevent
the models from helping people to do.
We don't want models to help people, you know,
carry out acts of terrorism.
We don't want models, right.
So, for example, like, you want a model
that understands chemistry and biology,
but you don't want a model that helps you
make bio weapons that will kill people, right.
So that's tough, because, you know,
some of that's just chemistry and biology,
so that's a tough line.
So even after you test the models,
various people, sometimes, you know,
researchers are trying to get the models
to produce harmful content,
and there's some live checking that goes on.
If that live checking is bad,
it can just disable all of the sessions.
Be like, nope, nope, nope, nope, nope.
You still want to be able to make real fertilizer,
not fertilizer for bad things, yeah.
That's right.
Yeah, for example, like, it's completely reasonable
for somebody who's got a farm to be like,
I'm sick of paying for,
is there anything I can do to not have to pay for this stuff?
Yes, but also, yes,
there are bad uses for similar chemical compounds.
And so, I think what's interesting is that,
so what I would say is that,
like, there's a tendency for SREs
who work on this stuff
to be like, model quality, not my problem.
Like, well, literally,
the only reason you're running this system
is because the model does something.
Like, if the model didn't do something,
you wouldn't be asking it questions
in the middle of whatever you're doing,
whether it's targeting ads,
fixing spelling, correcting grammar,
having a conversation, identifying fraud.
Like, the model does that thing.
So, given that it does that thing,
if it stops doing that thing,
then you don't actually have a service anymore.
It's a complex issue though,
and I find it interesting because,
obviously, you know,
if I'm running a model, I didn't make the model.
But I think there's some good patterns
on how to do that.
So, one of the simplest patterns is,
if the model's brand new and nobody's ever used it,
and it's terrible, and like, Matt, you launched it,
you launched a bad model, take it back.
It's all your fault, I'm not gonna help you.
If the model's been working fine for two weeks,
four weeks, and nobody's touched it,
and all of a sudden it's garbage,
but so are five other models,
that's probably my fault, not Matt's fault, right?
Like, it's probably not all five models went bad at once.
So, you can do some, like, really simple correlation
to try to figure out, like,
is this ML problem that is particular
to the design of the model,
of the, you know, training of the model,
or is this some kind of a systems problem?
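That correlation idea is simple enough to sketch: if one model regressed, suspect the model; if several regressed at once, suspect the shared systems underneath. The function, metric names, and threshold below are invented for illustration.

```python
def classify_regression(error_rates: dict[str, float],
                        baselines: dict[str, float],
                        tolerance: float = 0.05) -> str:
    """Rough triage: did one model go bad, or did everything go bad at once?

    error_rates and baselines map model name -> current and historical error rate.
    tolerance is the absolute regression we care about (illustrative value).
    """
    regressed = [
        name for name, rate in error_rates.items()
        if rate - baselines.get(name, rate) > tolerance
    ]
    if not regressed:
        return "no regression"
    if len(regressed) == 1:
        return f"likely model problem: {regressed[0]}"
    return f"likely systems problem: {len(regressed)} models regressed together"

# Example: five models served from the same stack all degrade at 03:00 --
# probably not five bad models, probably one bad shared dependency.
print(classify_regression(
    {"m1": 0.21, "m2": 0.19, "m3": 0.22, "m4": 0.20, "m5": 0.18},
    {"m1": 0.02, "m2": 0.02, "m3": 0.03, "m4": 0.02, "m5": 0.02},
))
```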
Yeah, life is systems, man, for sure.
This makes me think about a core principle,
which is an SRE principle
of making conscious decisions
between product velocity and product reliability.
And I want to hear your thoughts
on where you see, like, industry-wide,
on AI, product releases coming
in terms of reliability versus velocity right now,
because these releases are coming hard and fast,
and they're coming hard and fast because,
like, the discoveries are coming hard and fast
and the models are coming out really quickly.
What is your stance? What are you seeing right now?
Where are we in this pendulum between
things coming out faster and things coming out better?
And what are we to do?
I think it's been an interesting question.
What's funny is, like, that question,
I dealt with when I was working on the Google Cloud AI team
back in, like, I don't know, 2022 or 2021 or something.
Like, this is a long-term question,
long-term for our industry.
Ones of years ago.
Ones of years ago, yes.
But so I think that the question,
the right way to reframe this,
the way we often do in reliability circles,
let's think about the end users and their preferences,
and just say, like, hey, I got a new thing,
or I can spend time making the existing thing better.
What do you want?
And right now, the market says,
hey, I just want the new thing.
I just want the new thing.
Mostly.
I think there are some cases where that's not the case,
but the feedback I've heard from most users is,
you know, similarly, you can often trade reliability
for capacity as well as you can trade reliability
for velocity.
You can be like, well,
I can run the service hotter,
but it's going to break more often.
That's another common thing.
And most users will be like, yeah, yeah, I'll take that.
I'll take it.
I would like twice as much quota at like one and a half
or two fewer nines of reliability.
I'm like, really?
Like right now, a lot of users are saying yes to that.
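For a rough sense of what "one and a half or two fewer nines" means in practice, here is a quick back-of-the-envelope calculation; the availability targets are illustrative, not figures quoted in the episode.

```python
# What "two fewer nines" means in yearly downtime budget (illustrative numbers only).
HOURS_PER_YEAR = 24 * 365

for availability in (0.9999, 0.999, 0.99):
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.2%} available -> ~{downtime_h:.1f} hours of downtime per year")

# 99.99% -> ~0.9 h/yr, 99.9% -> ~8.8 h/yr, 99% -> ~87.6 h/yr:
# trading two nines for double the quota is a very different service.
```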
I think that the key question, the same way,
the same way that cloud was like a toy until it wasn't,
like cloud was just kind of a thing people played with
and put some stuff in until you turned around and looked
and like 20 or 30% of all the like 911 services
in the country were running on somebody's cloud.
Oh, okay.
So, not a toy. We need to be very thoughtful
about how we do this stuff.
I think we're going to get to the same thing
with the public AI systems,
but I still don't think we are.
I think users want a better model.
If we're doing this really quick
because this is what the market is asking for
and this is what we can deliver and like, okay,
sure market, you say so.
What if we're wrong?
Like what's the backup plan?
Like what if that goes too fast?
The worry that I have
and this is pretty philosophical
is that like this is a highly leverageable technology,
right?
Like we can get it into the hands of literally billions
of people and if it's goofy or weird or wrong,
like it can have drastic effects real fast
and it can have like a lot of them potentially,
especially in the not this year,
but next several years mode,
presuming that we're right about that.
Are there thoughts about that
that you're aware of within the industry,
within, you know, things that you work on
in terms of like, how do we, you know,
if it's bad, what do we do?
You know, like, is there a roll it back big red button
in the meta sense, I guess.
I mean, I think when Matt was talking about this,
I was thinking about reliability
and the like, the model's fine,
but it's not working right.
Like the inferences are not returning.
I think in terms of launching new models,
one of the things that like Anthropic
is particularly opinionated about
is being very careful and thoughtful of that.
So, you know, we have this,
it's, I think, a great framework,
this responsible scaling policy,
and the responsible scaling policy just says,
if you have a model that can do these things,
then you need security controls that can do these things.
It's very explicit.
And before I started at Anthropic,
I thought like, ah,
this is the kind of thing that security orgs
do to market themselves.
When I got to Anthropic,
I asked like Jason Clinton,
who is the chief information security officer.
I was like, hey, how does security work here?
It's like, well, I mean,
we describe it publicly in the responsible scaling policy.
That's just what we do.
Every time we have a model,
we decide how capable the model is
and then we implement the controls
that are of that capability.
And so, that's what we have now.
And we're trying to encourage other organizations
to do similar sorts of thinking
because I think you're right.
Like, you know, there's,
the ways that humans use these things
are many and varied.
It's exciting.
It's wonderful.
But the potential for harm is pretty significant as well.
Right.
And that can be from,
I don't know if you all have read all these stories
and talked to friends who were like, oh yeah,
like I use chat GPT as a therapist.
Really?
Yeah.
The models are,
they're not suited to that.
Not that they couldn't be.
They, like, I can imagine a model
being a fantastic therapist,
but I'm not aware of any data that says
that we have models that are good at that
and safe to do that right now
because that's tricky.
So, yeah.
So what I would say is like, yeah,
everyone like,
anthropic is super careful about model releases
for these reasons
and we're hoping to sort of like
pressure the rest of the industry
into being careful about it as well.
In my experience,
and I think yours,
because we worked together at the same place
for a long time,
like SRE, historically,
at least, yeah,
I would say SRE in general,
like has kind of been at this weird intersection
where we're like,
you know, the fight for the user group
where we would say like,
no, we shouldn't do the thing,
right?
Like, even though,
you know,
some team was like,
well, you know,
this feature is going to be great.
Like,
we would hold this sign saying like,
no, it's actually not good.
Do you think that SRE, in the future
or currently, is in a similar position
to be able to say, like,
yeah, that's not cool,

this isn't going to work for the following reasons
or is it like,
are we still in that position
where we were looking at like the holistic picture
where maybe other people at the company are not,
you know,
and is this a bad idea
or is this like actually like an honorable thing
to try to do
at these types of companies
and like the origin behind that,
you know,
was always going back to like early days,
sys admins who are like,
I have access to all the secret data
and I take that responsibility
super seriously.
Like, a lot of sys admins I know
are as ethical
as physicians
when it comes to privacy.
Like,
of course,
I know what you do in your computer.
And of course,
I would never let anybody know that
without appropriate permission,
etc.
Etc.
I think that the difference here is that
the worrisome capabilities
are much more subtle.
They're not like,
I'm going to expose your credit card information online
or I'm going to,
you know,
give somebody access to your bank account.
It's more like,
I'm going to launch a model that is weird
and dangerous for a very small subset of people
under this very odd circumstance.
And so I think,
I guess what I'm getting to is like,
one of the things I really like about,
you know,
my current employer is that the whole company is organized
around trying to prevent this.
And I think it needs that because you need sophisticated
researchers.
You need a trust and safety team that are understanding
those.
You need red teaming.
You need people publishing about that stuff.
One of my co-workers, who until recently
was here in Pittsburgh,
did a whole,
you know,
paper of like,
Hey,
I used one of our models to do like end to end network
penetration.
I just,
you know,
it worked.
And so I figured out how it worked.
And I wanted to publish.
It was really easy and easy to do on everybody else's models
too.
And I was like,
can you publish that?
He's like,
yeah,
it was so easy.
I'm sure other people are already doing it like we publish
it so that people can see how this is done so that we can,
right?
So this is my point:
that took a PhD researcher four weeks,
which is a small amount of time,
but also not something an SRE is going to do on a Tuesday
before their on-call shift starts at noon.
So it takes more work.
Tell us a story,
an anecdote,
something something that really surprised you like something
either adjacent or outside your field since changing companies.
You know,
you meet new people,
new scenarios,
something you've learned since changing that really took you by surprise.
I will say here speaking directly to us as the audience and what I
believe is your audience.
Most of us,
we're pretty skeptical about these technologies.
Right?
Ah,
they're okay.
They fail a lot.
I don't know.
I don't really trust them.
I'm not sure.
Also,
AI is just a bubble.
Also,
these are just glorified zip.
Like it's not really,
it's just some glorified compression.
LLMs are not like,
okay,
I think those are widely held beliefs.
And when I was at OpenAI,
it was difficult to disprove those because OpenAI is such a consumer-oriented company.
Consumers are very susceptible.
We love us a fad,
right?

we just love whatever other people are doing,
especially like nerd land.
We're like,
if three other technical people are doing a thing,
I just have to do that thing.
I'll take two.

I'll take two, right?
Then I came to my current employer who really sells mostly to businesses,
and it's not that businesses are not susceptible to fads,
but all of them are doing proof of concepts,
proofs of concept.
That's better.
And they're buying a lot.
And so I think what surprised me is how much of a real market for these services there is and how diverse it is.
Like clearly,
the coding use case is caught wildfire.
People love this for coding.
It's very concrete.
It's super economically valuable.
The price points are right.
Like it's going to change our whole industry and like we can all already see it.
That one has appeared basically out of nowhere in the last two years where,
you know,
when GitHub launched co-pilot,
I think a lot of us were like cute story,
but it's not that useful in part because the models weren't quite good enough yet.
Like it was a good UI,
but the models weren't good enough.
But now like Steve,
your vibe coding and it's kind of working.
That's wild,
but also for marketing,
like product marketing,
someone was describing to me how one of their products is somebody writes a strategic product brief
and like outcomes,
the marketing segmentation plan,
the advertising plan,
the manufacturing plan,
what ingredients they should or should not use,
proposed names,
blah, blah, blah.
And I'm like,
oh,
is that good?
And they're like,
yeah,
we currently have between 112 and 120 people doing that for each product.
And if you can do that,
like those people could go do something else.
Like,
and it doesn't do it all by itself.
But anyway,
my point is like the thing that I have seen that has surprised me the most,
is the volume,
the intensity
and the diversity of real economically valuable use cases
that already exist in these models
that are far from where they're going to be in the near future.
So that's it.
I think the reasoning stuff,
the little sub point,
the reasoning technology that has appeared recently is fascinating.
I don't know if like,
if either of you watched Claude plays Pokemon,
that is mesmerizing.
So basically like,
they just point Claude, with reasoning turned on, at Pokemon.
And they say go
and it gives it an interface
so it can parse the screen and be like,
what do I see?
I think I'm in a room.
What can I do?
Are there any buttons I can press?
I will press this button.
That didn't do anything.
Should I press another button?
It's wild.
And you're like,
what is happening here?
And they can chart
how far each model has ever gotten playing Pokemon.
But it sort of shows you this, like, naive potential of what agents can do.
Like,
this is a very primitive environment.
It's a very constrained environment.
It's old school Pokemon,
but it shows you the potential of what agents might be able to do.
So I would love to have an agent and be able to say,
hey,
book us a vacation next spring.
And just, like,
have it know who "we" and "us" are, and what "vacation" and "spring" mean,
and come back with either no clarifying questions or a couple, and be like,
hey,
I got you and your kids and your partner plane tickets,
hotel,
activities booked.
Like,
I actually replace your luggage
because I know that you've broken it down over the last 15 years
and you need a new roller bag.
So I got that on the way and like,
you're good to go.
I feel like,
wow,
you can imagine that we're not there yet,
but I don't know how far off we are from that.
And so that's going to be exciting as well.
I hate to be bad cop here,
Todd,
but there's quite a lot of hope in your voice.
And I recall hope being a phrase in SRE land.
Something we're not supposed to do too much.
It's weird.
So people who think a lot about AI harms
are some of the people who are most optimistic about AI benefits.
And like the analogy I'll give you is homeopathy.
No one worries that a homeopathic medicine will cause you harm
because it doesn't do anything.
And since it doesn't do anything,
it can't hurt you.
It's just water,
right?
Just go for it.
Just go for it.
You can over... I mean,
you can overdo water,
but it's got to be, like, many, many liters, right?
Right.
With the AI,
it's the same thing.
Like if you think these things are incapable
and aren't getting much better very quickly,
then you didn't need to worry about the harms
because they're not any good.
And so that's one camp of people.
There's another camp of people
who are concerned about the human transition
to these technologies,
what the machines are capable,
what some of the harms might be
during that transition and after.
But those are the people
who are seeing the improvement curves
and saying like it's hard
in the ever present now
for us to take this seriously.
But cast your mind back 18 months ago,
like, what could the models do? Cast your mind back
36 months ago?
What could the models do?
What can the models do today?
Like even if you just look at like the model launches
that were in October of last year
and like the Gemini 2.5
and the Sonnet 3.7 launches
and you only think about coding,
you're like,
you know,
Sonnet 3.7 writes really good code,
and what writes really good code
as of last week
is the new Gemini launch.
Those write really good code
and they didn't four months ago.
So I don't know what happens
in four more months
and I don't know what happens
in 12 more months.
But like there has been no plateau
to the improvement
and therefore I think,
(a) it's appropriate to be optimistic,
because you're looking at a curve
and you're seeing where it goes,
but then (b) it's also appropriate
to be thoughtful and cautious
and say,
okay,
what are we going to do
to make sure that this doesn't hurt people?
This doesn't hurt societies.
It doesn't hurt economies.
It doesn't hurt the world.
Right.
All right.
I got a hard-hitting one for you, Todd.
All right.
Some of our audience is listening
and some of our also watching.
Some of those who are watching
only see the top of your shirt.
Explain your shirt.
Describe it out loud
and then describe the whole thing.
What does it say?
Yes.
So the shirt says
has like two goofy SRE logos
a dragon and a unicorn.
The dragon is the original Google SRE
logo, and the unicorn
is the SRE EDU.
How do you say it?
Like not
mascot.
Thank you.
I was like mascot.
That means pet
but we don't call it that anyway.
I got stuck in the middle there.
So the shirt says traitor and a scoundrel
and it's got a box to fill in.
Like, what are you?
What kind of SRE
are you a traitor and a scoundrel to?
And I am a traitor to ML SRE
as of 2023.
So what happened there?
What did what did we?
That's when I left.
And I think, like,
when people would leave my teams
when I was a manager at Google,
I would always call them a traitor and a scoundrel
because, like, I love that people would go do stuff.
Like, I thought it was great.
Like in any good technical environment,
people go try stuff
and then like if they think your team is great,
then they come back to something else
maybe for a different reason
and that's healthy and that's good.
And so sort of joking around with people
and then, like, I worked with Sky Wilson
and like we had the shirts made up.
She designed it and like just actually started giving them
to people, writing in, like, I am a traitor to...
I have one that says ML SRE
because I founded machine learning SRE at Google.
I have one that says Pittsburgh SRE
because I used to be the pit SRE site lead
and ultimately the pit site lead.
So yeah, I really enjoy those
and I think it's like it's a nice marker
of like where you've been.
Yeah.
I have another hard-hitting question.
Todd, now is your opportunity.
This is a... you can pass if you want.
Okay.
Would you like to be like Niall
and tear up your own book live on camera?
No.
No, but if you were to
if you were to write another book magically
it just happened,
how would you like change what you guys wrote
about what would be like radically different?
Like what are the diffs?
What's the addendum that you would like to exist
without actually doing the work of making it exist?
What do they call this a fast follow
or maybe just an errata?
Yeah.
Yeah, that one.
I think that the whole like your experience
with vibe coding Steve sort of
is where a lot of us are.
And I think that getting the timing would be difficult.
But what I would probably do is start.
No, but you don't don't let me talk myself into that.
I'm who if anybody's listening.
This was not a trick.
You're tricking yourself here.
This is not anyone is listening from O'Reilly.
I am not writing another book.
But if I were to,
I would spend this next like six
or twelve months watching that transition
and trying to plot out what is our work like
in a year to five years,
which is hard to do.
But I would say like what are the like domains of expertise?
How does like testing and rollout work?
Like think about rollout.
Like most places don't have something like a canary
analysis server or something like that.
Like, most people just have, like, primitive smoke tests.
But what you really want is to ask a model like
I rolled out a thing.
Does it work?
Is it good?
Like it'd be amazing to be able to say
and have some sense of what work and good mean.
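A primitive version of that "I rolled out a thing, is it good?" check can be approximated today by comparing a canary against the stable baseline on an error rate plus some scored quality signal. The metrics, thresholds, and the idea of a judge-model quality score below are assumptions for illustration, not an existing tool.

```python
from dataclasses import dataclass

@dataclass
class ReleaseStats:
    error_rate: float      # fraction of failed requests
    quality_score: float   # e.g. mean rating from a judge model or user feedback, 0..1

def canary_verdict(canary: ReleaseStats, baseline: ReleaseStats,
                   max_error_delta: float = 0.01,
                   max_quality_drop: float = 0.02) -> str:
    """Compare a canary rollout against the stable baseline (illustrative thresholds)."""
    if canary.error_rate - baseline.error_rate > max_error_delta:
        return "rollback: canary error rate regressed"
    if baseline.quality_score - canary.quality_score > max_quality_drop:
        return "rollback: canary answer quality regressed"
    return "proceed: canary looks at least as good as baseline"

print(canary_verdict(ReleaseStats(0.012, 0.84), ReleaseStats(0.010, 0.86)))
```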
I think there's going to be a bunch of stuff like that.
And so what I'd like to do is try to help understand
what is going to be the technical work that we will have
in the future,
where instead of doing all of this,
we direct the work of all of this.
And the same way like it used to be
like you wrote code by hand with your fingers
and an Emacs or vi window, like God intended.
And now you just say like,
Hey, could you unroll this loop?
Now, you know what unroll and what loop means.
Like, you don't have to unroll the loop.
It just unrolls the loop for you
or it makes the UI for you
and it plumbs them into your methods.
Right.
So, so I think like that's what I would try to do is
I would try to say like,
what is it like to do technical work in a world
where the execution becomes less and less important.
But the architecture and the purpose
and the design are still important
because I think that's what will be for a while.
Now, eventually we'll be past that,
but I just don't know how quickly that happens.
So that would be the book.
It would be like,
how do you direct the technical execution
of a production engineering environment
whose execution is managed by computers?
Well, thank you, Todd.
That was awesome.
Thank you for your time.
Is there anything else that you want to talk about
that we didn't hit?
Like, did you have a big gotcha
that you wanted to just drop on the world?
We covered great stuff.
It's going to be, it's a fascinating time.
One thing I will recommend people go look at
the closing keynote of SREcon,
which should be out on video
by the time you guys launch this.
Because it's Charity Majors,
of Honeycomb, who has been notably, like,
AI-cranky, an AI skeptic,
and who is now AI-bargaining.
So sort of interesting transition there.

We should include those in the show notes.
Good.
Yeah, we'll do.
Thank you.
Thank you, Todd.
It's been great.
Nice to see you.
Good to see you.
Till next time.
Bye.
You've been listening to the Prodcast,
Google's podcast on site reliability engineering.
Visit us on the web at sre.google,
where you can find papers, workshops,
videos and more about SRE.
The podcast is hosted by Steve McGee
with contributions from Jordan Greenberg,
Florian Rathgeber and Matt Siegler.
The podcast is produced by Paul Guglielmino,
Sunny Schau, and Salim Virji.
The podcast theme is Telebot by Javi Beltran.
Special thanks to MP English and Jen Petoff.


SRE Prodcast brings Google's experience with Site Reliability Engineering together with special guests and exciting topics to discuss the present and future of reliable production engineering!