
Embracing Complexity with Christina Schulman & Dr. Laura Maguire
Duration: 33m59s
Release date: 20/11/2024
In this episode of the Prodcast, we are joined by guests Christina Schulman (Staff SRE, Google) and Dr. Laura Maguire (Principal Engineer, Trace Cognitive Engineering). They emphasize the human element of SRE and the importance of fostering a culture of collaboration, learning, and resilience in managing complex systems. They touch upon topics such as the need for diverse perspectives and collaboration in incident response and the necessity of embracing complexity, and explore concepts such as aerodynamic stability.
Welcome to Season 3 of the Prodcast, Google's podcast about site reliability engineering and production software. I'm your host, Steve McGee.
This season we're going to focus on designing and building software in SRE. Our guests come from a variety of roles both inside and outside of Google.
Happy listening and remember, hope is not a strategy.
Hey, everyone, and welcome back to the Prodcast, Google's podcast about SRE and production software. I'm Steve McGee, your host.
This season we're focusing on software engineering in SRE. This is notably done by humans, often in teams. Wow, I know, so bold.
These teams build and manage complex systems that tend to grow and grow, and, you know, what could possibly go wrong with that?
Well, we have two great guests today with complementary approaches to big problems like this that we face in SRE.
Specifically, how do we deal with these big, complex, continuously changing systems? How should these teams work together? How do they work together?
And can one person even understand a big thing like this? And if not, like, then what? How do we even deal with this?
So with that, let's meet our truly awesome guests, Christina Schulman and Dr. Laura Maguire, who will introduce themselves. Christina.
I'm Christina Schulman. I am a software engineer focusing on reliability in Google Cloud.
Hey, my name is Laura Maguire, and I am a cognitive systems engineer, which is basically a fancy way of saying that I study how people do the thinking parts of their jobs.
And the cool thing about CSE is that we can study things like perception, attention, and reasoning capacities of people operating in all kinds of very cognitively demanding work settings.
And we can apply those patterns to software engineering. So I like to say that software engineers have more in common with astronauts and fighter pilots than they might originally think.
Cool. OK, so can we just start with this stuff is hard. These systems are big.
We as humans have trouble peering into each other's brains, so that also makes it a little bit difficult.
So we do have to like speak with words and like write things down and I don't know plan for things.
So first off, when we're talking about these complex systems, I've also heard the words, you know, complicated thrown around and like, it's big.
Like, what do these words even mean? Like, what do we mean by this complexity?
Like, does it matter the words that we use?
And if so, in what way should we be careful about how we describe these problems that we're trying to solve together?
I think it's honestly difficult for humans to wrap their heads around just how large and complicated a lot of these platforms are.
My mantra is that once a system won't fit on one whiteboard, you just don't understand anything that's outside the realm of that whiteboard.
The more whiteboards it takes up, the more you need people just to understand where everything is.
And nothing ever gets less complicated.
You just keep adding more and more complexity, because if you don't, your system will die and fall over.
Yes, to everything Christina said.
And I would say that we often tend to think about complexity in a strictly technical sense,
but complexity is a lot broader than that and it extends to the whole socio-technical system.
So I try to think about it in terms of like the levels of abstraction.
So what makes this work hard?
Well, it's cognitively demanding.
How do we notice what is happening, what's changing in the world around us?
How do we reason about that?
How do we apply our knowledge?
And then when we start thinking about the teaming aspects, as Christina said,
we need to start to bring other perspectives and other knowledge bases together.
That brings in a whole lot more social complexity.
And then organizational aspects like how are you managing trade-off decisions?
How are resources being allocated?
All of those things are stuff that software engineers deal with on a day-to-day basis.
It's not just the technical parts.
And then every time you add a new layer of abstraction to make things look simpler,
you've just added an enormous new layer of technical complexity that your end-users don't even know is there.
Yeah, absolutely.
And you also start adding things like automation and now you're looking at human-machine teaming
and how do we try to understand and coordinate with our automated counterparts?
So basically without putting some barriers in place, everybody winds up crying in a corner professionally.
It hurts.
I feel that so much. Part of our job as SREs is, of course, to try to, as we say, automate ourselves out of a job.
Of course, we know that that's not actually true.
The automation itself becomes a job dealing with the automation and interacting with the automation.
One thing that we've found within Google, having done SRE for a long time and having grown a lot,
one thing that we have pointed out, I think in one of the books,
is the idea that teams undergo what we call mitosis,
which is when, you know, the team is responsible for too much stuff.
And so we've got to split the team in half along some line.
I know from experience, having done this several times,
that finding out what that line is is real hard and it's real important.
And then coming up with the ability to judge, you know, who works on what
and making sure that they actually understand what it is they're working on.
Super, super important, not technical at all.
It's purely, as you said, Laura, socio-technical.
It is just who is doing what and how do they talk to each other and things like that.
OK, that's awesome.
So another thing that comes up when it comes to understanding complex systems
and the people that deal with them is like stuff breaks, right?
And one of the things we do in SRE is we deal with incidents.
You can talk about being on call.
You can talk about writing postmortems, you can talk about getting paged, right?
You can talk about all these kinds of things.
What should we think about here when it comes to like making sure we don't screw it all up?
Like what are the risks, first of all, when it comes to dealing with these complex systems
with a group of, you know, fallible humans that need, you know, to sleep and things like that.
Where should we start with that?
Let's start with Laura from the kind of humanistic side.
And then, Christina, you, I'm sure you'll have things to say as well on top of that.
So, to the point that Christina brought up about how there's a lot of different perspectives
that are needed to really understand, you know, how the system works, one of the risks is thinking
that we can try to solve these problems independently, or that we can try to solve
these problems without others. Knowing when and how to bring other people in
and how to bring them up to speed appropriately so that they can be useful
to the incident response effort.
That in and of itself is actually quite a sophisticated skill set,
especially when you're under a lot of time pressure, there's a lot of uncertainty.
All of the details matter and they matter more.
Well, I'd just add that we actually have people inside Google who specialize
in dealing with very large, very visible incidents, or, you know,
even if they're not super large and visible, but who have experience
just doing the coordination and the communication necessary to keep an incident
moving while the people who actually understand the systems that are probably involved
work on understanding what's going on and mitigating it.
I will say that ideally, nobody wants an incident in the first place,
but you're going to have them.
So being able to restrain the blast radius of an incident
and hopefully limit the potential effects of any particular failure in advance
is certainly something we would all like to do, although it's hard,
and it's very difficult to analyze, test, and guarantee.
One thing that I heard one of you talking about before we started this podcast
was the phrase that incident response is a team sport.
So what you're both kind of saying reminds me of that
and it helps to have in your team like broad perspectives,
whether it's just experience with the system at hand
or if it's experience with the other teams that are involved.
So one thing that I've found when working with customers is lots of times
teams that are adopting SRE or similar types of practices
hear things like you build it, you own it
and they come to believe that that means
they should have complete control over their entire destiny
and that means they need to own everything from the load balancer down to the CPU
for their particular service.
I don't think that scales and I understand the idea of it
is that you want to be able to have control over your own domain
but this doesn't really work when it gets really big.
So does this ring any bells in terms of how to manage autonomy under scale?
What are some ways of thinking about that?
Yeah, so I think from a cognitive systems engineering perspective
I think one of the sort of fundamental truths
is that when you get to a certain size in a system of work
no one person's mental model about how that work system operates
is going to be complete.
It's going to be wrong in ways that, you know, can be consequential.
It's going to be buggy in other ways.
And so we do need to be able to bring multiple diverse perspectives
to be able to respond to incidents.
I would say that from a practical standpoint
you can't be responsible for too much of your stack.
Among other reasons, somebody's going to need to switch out the software layers
and they aren't going to be able to do that
if you are clinging hard to the behavior of your load balancer.
There should be contracts around the behavior.
There should not be any promises around how it actually operates.
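To make that distinction concrete, here is a minimal sketch of a contract-versus-implementation boundary. This is a hypothetical Python example; the LoadBalancer protocol, the route method, and the RoundRobin class are invented for illustration and are not a real API.

```python
# A minimal sketch of depending on a contract rather than on implementation
# details; the names (LoadBalancer, route, RoundRobin) are hypothetical.
from typing import Protocol


class LoadBalancer(Protocol):
    """The contract: callers may rely on this behavior and nothing else."""

    def route(self, request_id: str) -> str:
        """Return the backend address chosen for this request."""
        ...


def handle(request_id: str, lb: LoadBalancer) -> str:
    # Depends only on the promised behavior (route returns a backend),
    # not on how the balancer picks one, so the implementation can be
    # swapped out without breaking callers.
    return lb.route(request_id)


class RoundRobin:
    """One possible implementation; callers must not assume this policy."""

    def __init__(self, backends: list[str]) -> None:
        self._backends = backends
        self._next = 0

    def route(self, request_id: str) -> str:
        backend = self._backends[self._next % len(self._backends)]
        self._next += 1
        return backend
```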
I also think that in order to make it safe,
psychologically safe for people to be on call
it always has to be okay for them to pull in the people
who do understand the systems that they think may be involved.
And that's very much a team culture issue.
You can shore that up with technical support
but there's no substitute for making it safe for people to get things wrong
and blunder their way into being good at things.
I do my best to get things wrong for my team as frequently as possible
to model that kind of behavior.
I think that is a really important point
because if all of our mental models are going to be partial and incomplete
in some ways, then we're all going to be wrong at some point.
So being able to say out loud in an incident response
I don't understand what's happening right now
or here's what I know, right?
Because it's easier to say here's what I know
and then here's what I don't know
but normalizing that ability to not understand something
and to be needing to ask or to help recognize
when you might have something wrong
and then being able to share that knowledge across each other
that is fundamentally what's going to make your incident response work.
I agree with you about psychological safety
and that social environment that allows you to just share knowledge.
It also makes it a lot easier to keep your rotation staffed.
And I think it reduces the anxiety of when you're going on call for the first time
because the stakes aren't as high
if you know that other people are willing to be pulled in,
they're willing to kind of tell you when they're wrong
and so it makes space for you to be wrong as well.
Totally.
So when there is an incident, and I've seen this with customers,
and personally I've experienced this on call, not at Google but elsewhere,
you're in an incident, like in the big room with the TVs on the wall,
and like there's a VP in the room
and like there is this idea of a chilling effect
and it is totally real.
Like it can be fine depending on you know the VP
and the culture and everything
but often I've heard and I've experienced that it's not fine.
It actually tends to really make people freeze up
and feel like they have to say the right thing
and they can't do what you're just describing
which is saying like I don't know, like let's look into it,
I'm not really sure who can help, blah blah blah blah blah.
So is there more to it than that?
One of you also used the phrase earlier,
armchair SREing is another kind of example of this type of thing.
So can you talk a little bit about that?
Like what effect does some other person in the room
have on this team that's trying to get something done?
Well, I think it's useful to also think
it can be a person in a position of leadership
and when there's sort of authority
and power imbalances in the room
then you know that makes sense
but it can also be someone that you really respect
that you don't want to be wrong in front of
and I think that the folks who may have that chilling effect
can, you know, as Christina was saying earlier,
be wrong themselves and sort of normalize being wrong,
but structuring your kind of incident response
so that the incident commander
or the coordinator, whoever is in charge
can sort of step that person out of the room
respectfully if they need to
and to be able to take a suggestion
that comes from a person in a position of authority
on par with a suggestion that comes from a junior engineer as well
they don't have to give it more weight
and more credence just because of who it comes from.
I think there's an interesting practice
within high reliability organizing
which kind of came out of looking at operations
on aircraft carriers
and one of the sort of principles
of high reliability organizing
is a deference to expertise
so just because the VP
you know may have 20 years or 30 years of experience
and they may have kind of a perspective
that may be a bit broader
you know your engineer who's only been on the team
for 2 months might be closest to the action
and they might have the most current
relevant knowledge to the situation
so you kind of dynamically shift
where the focus is relative to
who has the current expertise for that situation
I think it's worth bearing in mind though
that if it's a sufficiently large and visible outage
the VP is probably freaking out too
just as much as the junior engineers are
so having an incident commander
who's in a position to manage
the social aspects of both of those freak outs
is really useful and to be fair
it's the VP who's going to get yelled at in public
you know the junior engineer is unlikely
to get yelled at at all
I guess it depends on the organization for sure
but you're right I think it's really important
to recognize that like everyone within the system
has different kinds of pressures and constraints
and they are dealing with different kinds of goals
and priorities
and so that you know when we talk about
incident response being a team sport
that's actually a fundamental aspect
of being able to signal to others
when your goals and priorities are changing
and when your pressures and constraints are changing
so that they can proactively
anticipate or adapt or adjust
kind of the things that they're suggesting
or the way that they're carrying out their work
to kind of have this really smooth interaction
and reduce some of the friction there
So speaking of teams and reacting to things
we also design things as teams
and there's a law called Conway's law
which a lot of people might be familiar with
and maybe tell us a little bit about that
but how that might apply to the thing that you built
and how it exposes itself to things like failure domains
so like what's the deal, what's going on here,
why is it worth knowing about this,
and what can we do about it?
Christina you're grinning like you have something to say
Like you've heard of this before for sure
I mean Conway's law is most frequently quoted
I think in software companies
that you ship your organizational structure
you ship your org chart
which I have mixed feelings about
I don't want failure domains to look like my org chart
but at the same time in terms of as we keep saying
being able to understand the surface area
of the things you're responsible for
there's really good reasons for that
not to cross organizational boundaries
you can only understand so many things
you might as well understand the things that you're actually responsible for
I think that you need very strong agreements in place
in cases where the ways that you want your system to fail
cuts across organizational boundaries
and I think it's a very hard problem
Can you give us an example of that
like where you may have seen something similar to that
Well I work in dependency management
so if you give me an opening I'm always going to talk about isolating
where your RPCs go
and then I'll keep talking about it until you hit me with a stick
but fundamentally
you know
we're running a whole lot of different things
in lots of different geographical locations
that are subject to different rules
Gmail is subject to different rules than
let's say the various ML infrastructure pieces
both in terms of how they're allowed to fail
and how they're allowed to store data
when there's a failure
you want to constrain that geographically if possible
if somebody trips over a power cord
in South Carolina
you do not want that to affect jobs that are running in Europe
but in order to guarantee that that's the case
you need to be able to understand and test behavior
across a lot of different systems
and as we already said
since nobody can understand all of those systems in depth
or in breadth really
you can't do that without software controls in place
The tripping over the cable reminds me of Leslie Lamport's description
of a distributed system
which is any system where your computer can be affected by a computer
you've never heard of
and I think that's great, that really describes a lot of SRE pretty deeply to me
I've found actually that a lot of folks
kind of don't get this
there's a phrase that I think I learned from John Allspaw,
but it comes from some other folks, and it's that of reductive tendencies
and this is just give it to me simple
like with all the nerd stuff
just tell me what's going on
and often we lose a lot of the really important subtleties
when we reduce it too much
and this kind of ruins the whole idea of what it is that we're trying to accomplish here
by running a complex system
if we write our rules for Gmail
or whatever
in these reductive ways
often we'll lose track of
some of the constraints that are really important
that actually keep the whole thing running
does this ring any bells either?
this seems like it's kind of crossing the
the two streams here a little bit
yeah absolutely
the work you're referring to is
some folks from the Institute for Human and Machine Cognition
and from the Ohio State University
and they kind of
distilled I think 11 different
tenets or truisms about sort of managing or operating in complex systems
and that synopsis you gave, that we want to oversimplify things
because it's sort of easier to manage and control
and get our arms around,
is actually quite dangerous,
because when we're treating things that are,
you know, dynamic
and simultaneous
and running in parallel
as if they're linear
and cause-and-effect, sort of these oversimplified models,
we're solving the wrong problem
you know,
the same thing goes for a lot of these kinds of events,
these sort of failure events in complex systems:
they're non-linear,
they fail in surprising or unexpected ways,
and so if you are not sort of considering
your system as being complex and adaptive
your response will not be complex and adaptive
and so there's
the law of requisite variety,
which kind of basically states that
if your problems are all
highly variable and very dynamic
and kind of changing
then your responses to those problems
have to be similarly so
so
it kind of goes back to
you know what we said at the outset is like
how do we help people cope with complexity
and sometimes that's about
helping to sort of
build common ground
from the boundaries of where my part of the system
and what I'm responsible for
interact with kind of neighboring parts of the system
it's about helping people who have never worked together
you know to very quickly
try and ascertain who knows what
who's important, who do I need to
you know coordinate and collaborate with
and those kinds of like dynamic
reconfiguration
means you've all got to bring whatever skills
and knowledge you have to bear
in a situation that you've never seen before
with people you maybe never have worked with before
and so that's kind of the way in which
we shift away from this oversimplified view
of, like, we just write more rules,
we just have good process,
you know, we can draw these really firm boundaries,
toward saying surprises are going to happen,
so how do we help people cope with that most effectively?
I love that law of requisite variety
I've not heard of that before
I'm going to cite this now every time somebody asks
why I can't come up with a simple design
for a complex problem
Ashby 1956
All right, I'm going to read this now
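For reference, a rough statement of Ashby's law in its usual entropy form; the notation here is assumed for illustration, it is not from the episode.

```latex
% Ashby's law of requisite variety, stated roughly in entropy terms:
% the variety of outcomes can only be driven below the variety of
% disturbances by a regulator with at least that much variety of its own.
\[
  H(O) \;\geq\; H(D) - H(R)
\]
% H(D): variety of disturbances hitting the system,
% H(R): variety of responses the regulator can deploy,
% H(O): residual variety of outcomes.
% In short: only variety in the response can absorb variety in the problem.
```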
I will say that most of the really interesting
large outages that I have seen
in Google production infrastructure
have involved interactions
between systems that were very powerful
and very complex
and they had to be that complex
in order to handle just the enormity
and the heterogeneity of the system
and there's a point at which you simply can't
prevent these interactions from happening
you can't predict them, you can't prevent them
you just have to have really good systems
in place for containing and mitigating them
at some point I don't care what the root cause was
I just want to be able to make it stop
without spending three weeks understanding it
Stop the bleeding is a term that we tend to throw around
sure, we'll find what really happened someday
but for now let's mitigate
and let's bring the ball forward
But it's interesting too
because sometimes the right response is to let the bleeding
continue a little bit more so that you have more diagnostic information
So this kind of goes to those like
how do we manage these trade-offs
how do we make sure that that VP is in the room
so that their perspective can be included
in what types of actions we're trying to take
That's why we love observability
I know we just talked about how reducing a system
to a simple idea is actually not great
but there is a metaphor, if you will,
that comes up quite a bit when we're talking about these types of things
and that's the idea of aerodynamic stability
which comes from flight, from airplanes
and this is the idea that you want a system
that when you take your hands off the yoke
it kind of stabilizes itself,
it kind of brings itself back to level flight
This is a, I think this was coined by John Rhys
within Google at least, I'm sure it's been used elsewhere a million times
but how can we use this metaphor when it comes to
designing systems that don't require intensive
hands-on-keyboard work at all times
and watching all the screens constantly?
Is this a real thing? Is this something you can actually strive for
or is this just a pipe dream from GTR?
GTR, John Rhys, was specifically talking about
removing dependency cycles
from production when he wrote about that,
and essentially what you want to be able to do
is identify and remove dependency cycles from your system
A dependency cycle is when you have
System A depends on System B
System B depends on System C
System C depends on System A
This seems like it should be easy to
identify and prevent but when there are
300 systems involved in this cycle it's a lot harder to find
and it's particularly pernicious when you have
turn up cycles
Like, everything went down because somebody
tripped over a very large power cord
bringing it back up
is not going to be able to happen automatically
if you have these cycles
The problem is of course that
we keep saying this is a very large, complex system
so massively re-engineering things
to change across multiple systems
it can be done, it hurts like hell
so it's better to put systems in place where you control
how things are allowed to depend on other things.
Essentially the top of your stack is allowed to depend on
the bottom of your stack
The bottom of your stack should not have dependencies
on the top of your stack
That's going to make things a lot less painful
when something at the bottom of your stack
goes down, comes back up
and you want everything at the top of your stack
to recover without human interference
Just being able to speak in terms of directionality
and having a common understanding of
this is what we mean by up
and this is what we mean by down
and that there is this intent not to have dependencies in a particular direction,
that helps a lot
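As a concrete illustration of the kind of check being described here, the sketch below finds a dependency cycle in a "system depends on systems" map and flags dependencies that point the wrong way up the stack. It is a minimal, hypothetical Python example; the system names, layer numbers, and function names are invented for illustration, not real tooling.

```python
# A minimal sketch: given a map of "system -> systems it depends on",
# find a dependency cycle and flag dependencies that point from lower
# layers back up the stack. Names and layers are illustrative.
from typing import Dict, List, Optional, Tuple

Deps = Dict[str, List[str]]


def find_cycle(deps: Deps) -> Optional[List[str]]:
    """Return one dependency cycle as a list of systems, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:  # back edge: we walked into our own path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def upward_dependencies(deps: Deps, layer: Dict[str, int]) -> List[Tuple[str, str]]:
    """List (system, dependency) pairs where a lower layer depends on a higher one.

    Convention assumed here: a larger layer number means higher in the stack,
    and only higher layers may depend on lower ones.
    """
    return [
        (system, dep)
        for system, targets in deps.items()
        for dep in targets
        if layer.get(dep, 0) > layer.get(system, 0)
    ]


if __name__ == "__main__":
    deps = {"frontend": ["auth"], "auth": ["storage"], "storage": ["frontend"]}
    layers = {"frontend": 2, "auth": 1, "storage": 0}
    print(find_cycle(deps))                   # ['frontend', 'auth', 'storage', 'frontend']
    print(upward_dependencies(deps, layers))  # [('storage', 'frontend')]
```

At Google scale the graph would be far too large to eyeball, which is the point of the conversation: the check has to be automated and enforced, not remembered.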
Yeah, this may not be directly answering the question
but it kind of brought up a really interesting thing
I saw in some studies that I did watching
ongoing operations within large scale systems
and that is that people are
nearly continuously monitoring the systems
and providing small little course corrections
and they are preventing incidents
before they are even a seed of an incident
and so this is something that is subtle
but very non trivial because a lot of this work
is hidden, right?
All of the ways in which someone is sitting
in a co-located space
and they spot something on a dashboard
and they go and check some logs
or they kind of do some little action
to kind of make a change in the system there
This stuff is happening all the time
all around us and we don't typically tend to notice it
and so, I'm a paraglider,
and when I think about dynamic flight
under my wing,
it is these really small little course corrections
because you don't want to drive things into the place
where it is very clear that you need to make a correction
So the point I guess of bringing this up
is that this kind of hidden work
this sort of continuous monitoring
and continuous managing of the system
is work
and if we overload engineers
with task work or support work
or future development
whatever it is
you're going to lose a lot of this capacity
that's already happening
and so maintaining a little bit of slack in the system
or starting to notice
when and how people are doing these small little course corrections
can help to actually surface them
so that you can account for them
you can resource them
and you can try to engineer them into future systems
to prevent surprises
I'm currently working on a project that's very affected by making production guarantees
about a system that requires a lot of constant attention and babysitting,
and that's very dangerous,
because you really want to be building a system that doesn't need it.
It's very hidden work, and if you don't account for it,
you're going to have a very exciting time.
And I would say
that studies across all kinds of operational settings show
that this is a fundamental property of these systems.
Whether we want it or not, we need that ongoing human attention
to catch the things we couldn't imagine in our design,
or the ways that interactions, or pressures from the outside world,
influence the system in ways we didn't expect or intend.
So I understand it's not something we want in the system,
but it's also true that it's there,
and so being able to preserve that capacity,
to maintain it,
to reduce it where we can but not eliminate it entirely,
I think is really necessary.
I think you're probably right,
but I wish you weren't.
There's a good story that I think I can share;
I'll tell it anyway.
There was an SRE who was in charge of something near the edge of Google,
near the load balancers,
and he clicked the button and pushed the thing,
and all of a sudden everyone got paged at the same time.
He stepped out
and said,
okay, I fixed it,
and it showed up as a little dip
in the all-of-Google graph.
When I tell this story to customers,
I ask them: okay, if you're this person's manager,
what do you do?
Fire them on the spot?
Have a word with them and never speak of it again?
Do it right there,
or wait two weeks?
Those are all the wrong things you could do.
What happened at Google
is that they put this SRE
on stage and had him tell the story
to the entire company.
They showed that this thing happened,
and that his reaction was really good:
the fact that he was able
to adapt, to mitigate it,
and to say, okay, it's fine,
and I've also added procedures
so that I don't screw this up again in the future.
Instead of just hiding the problem,
or expecting people never to have this problem,
they broadcast the problem
to the entire company, and everyone came away thinking,
okay, not only can this happen
and I should think about this type of problem,
but if it did happen to me,
I wouldn't be fired for it.
So a lot of this is structural,
when it comes to incentives.
We talk about executives doing terrible things,
but they also do good things;
leadership can shape the culture.
That's another way of looking at this problem:
when you have a complex system
and these challenges emerge,
how you react to them is often a product
of how you interact with everyone
on your team,
and of how you see that modeled by others
as well.
I think it's really important
how the organization responds
to an error,
because if they had docked that person's pay
or taken punitive action,
that shuts down a lot of the reporting,
a lot of those conversations
of,
I thought it worked this way,
and then I found out in a very public,
embarrassing way that it didn't.
And so the more that these things
can come to the surface,
these sorts of gaps in our mental models,
and the more we can talk about them
and share that knowledge,
the more resilient your system is going to be.
I think that was a good response.
I think the most famous
intern incident
in the Pittsburgh office
was, and this was a long time ago,
one of the interns on the payments team
was testing
credit card processing validation,
and on her last day, the whole team was out
at her farewell lunch
when payments got paged,
because her test had broken
payments.
She was, of course, horrified,
and the team was delighted,
because her test had succeeded:
you know, she had found a real problem.
She ended up converting,
and I think she's a manager now,
but we tell that story
to all the incoming interns
so that they understand that, A, you're going to be working
on real production systems and you're going to have
a lot of power, and B, it's okay
if you cause a very impactful event.
Yeah, I think
Etsy has
the three-armed sweater award
that they present
for one of the most impressive
failures in production,
and it's a great way
of normalizing failure
and surfacing
things like that.
It's important.
Thank you both. I don't want to keep you all day,
and I know we could keep talking about this,
so we're going to have to cut it off at some point.
But before we go, if there's anything
you want to share with our audience,
where people can hear more from you on the internet,
or just a pithy truism
you'd like to share with the whole internet,
this is your chance.
I would just like the whole internet
to check its return values.
Excellent. Good advice.
You can connect
with me on the internet.
I can be reached by email
at laura
at TraceCognitive.com,
or I'm on Bluesky at
lauramaguire.bsky.social.
You can also find me
through the Cognitive Systems Engineering Lab
at the Ohio State University.
And the takeaway, I think, is
don't oversimplify:
lean into the complexity,
and you're going to end up with a lot more adaptive capacity
and resilience.
Cool. Thank you both,
and thanks to all our listeners.
And, as always,
may the queries flow and the pager stay silent.
So long. Bye.