Uncovering Hidden Risks

Raman Kalyan, Talhah Mir

Welcome to Uncovering Hidden Risks, a podcast series focused on identifying the various risks organizations face as they navigate the internal and external requirements they must comply with. We'll take you through a journey on insider risks to uncover some of the hidden security threats that Microsoft and organizations across the world are facing. We will bring to the surface some best-in-class technology and processes to help you protect your organization and employees from risks from trusted insiders. All in an open discussion with top-notch industry experts!

Episodes

Episode 8: Class is in session
26-05-2021
When Professor Kathleen Carley of Carnegie Mellon University agreed to talk with us about network analysis and its impact on insider risks, we scooched our chairs a little closer to our screens and leaned right in. In this episode of Uncovering Hidden Risks, Liz Willets and Christophe Fiessinger get schooled by Professor Carley about the history of network analysis and how social and dynamic networks affect the way that people interact with each other, exchange information and even manage social discord.

0:00 Welcome and recap
1:30 Meet our guest: Kathleen Carley, Professor at Carnegie Mellon University; Director of the Center for Computational Analysis of Social and Organizational Systems; and Director of IDeaS for Informed Democracy and Social Cybersecurity
3:00 Setting the story: understanding network analysis and its impact on company silos, insider threats, counterterrorism and social media
5:00 The science of social networks: how formal and informal relationships contribute to the spread of information and insider risks
7:00 The influence of dynamic networks: how locations, people and beliefs impact behavior and shape predictive analytics
13:30 Feelings vs. facts: using sentiment analysis to identify positive or negative sentiments in text
19:41 Calming the crowd: how social networks and secondary actors can stave off social unrest
22:00 Building a sentiment model from scratch: understanding the challenges and ethics of identifying offensive language and insider threats
26:00 Getting granular: how to differentiate between more subtle sentiments such as anger, disgust and disappointment
28:15 Staying relevant: the challenge of building training sets and ML models that stay current with social and language trends

Liz Willets: Well, hi, everyone.
Uh, welcome back to our podcast series Uncovering Hidden Risks, um, our podcast where we uncover insights from the latest trends, um, in the news and in research through conversations with some of the experts in the insider risk space. Um, so, my name's Liz Willets, and I'm here with my cohost, Christophe Fiessinger, to dis- just discuss and deep dive on some interesting topics. Um, so, Christophe, can you believe we're already on Episode 3? (laughs)

Christophe Fiessinger: No, and so much to talk about, and I'm just super excited about this episode today and, and our guest.

Liz Willets: Awesome. Yeah, no. I'm super excited. Um, quickly though, let's recap last week. Um, you know, we spoke with Christian Rudnick. He's from our Data Science, um, and Research team at Microsoft and really got his perspective, uh, a little bit more on the machine learning side of things. Um, so, you know, we talked about all the various signals, languages, um, content types, whether that's image, text, that we're really using ML to intelligently detect inappropriate communications. You know, we talked about how the keyword and lexicon approach just won't cut it, um, and, and kind of the value of machine learning there. Um, and then, ultimately, you know, just how to get a signal out of all of the noise, um, so super interesting, um, topic.

And I think today, we're gonna kind of change gears a bit. I'm really excited to have Kathleen Carley here. Uh, she's a professor across many disciplines at Carnegie Mellon University, um, you know, focused with your research around network analysis and computational social theory. Um, so, so, welcome, uh, Kathleen. Uh, we're super excited to have you here and, and would love to just hear a little bit about your background and really how you got into this space.

Professor Kathleen Carley: So, um, hello, Liz and Christophe, and I'm, I'm really thrilled to be here and excited to talk to you.
So, I'm a professor at Carnegie Mellon, and I'm also the director there of two different, uh, centers. One is Computational Analysis of Social and Organizational Systems, which is, you know, it brings computer science and social science together to look at everything from terrorism to insider threat to how to design your next organization. And then, I'm also the director of a new center that we just set up called IDeaS for Informed Democracy and Social Cybersecurity, which is all about disinformation, uh, hate speech, and extremism online.

Liz Willets: Wow.

Professor Kathleen Carley: Awesome.

Liz Willets: Sounds like you're (laughs) definitely gonna run the gamut over there (laughs) at, uh, CMU. Um, that's great to hear and definitely would love, um, especially for the listeners and even for my own edification, to kinda double-click on that network analysis piece, um, and l- learn a little bit more about what that is and kind of how it's developed over the past, um, couple years.

Professor Kathleen Carley: So, network analysis is the scientific field that actually started before World War II, and it's all about connecting things. And it's the idea that when you have a set of things, the way they're connected both constrains and enables them and makes different things possible.

When the field first started, it was called social networks. This is long before social media. And, um, people were doing things like watching kindergartners play with each other, and they realized that the way which kids played with which, which kids bashed each other over the head with the, their sand shovel, was really informative, really effective at telling how they would actually do in the various kinds of studies they needed to do. The same kind of thing was applied to hermit crabs and to deer and other kinds of animals to identify pecking orders and, from those groups, identify which animals had the best survival rate.
Today, of course, the field's grown up a lot, and we now, uh, talk about kind of networks+. So, we apply network science to everything from, you know, how your company ... Where are the silos in your company? Who should be talking to 'em? We also apply it to things like insider threat and look at it there to say, "Ah, well, maybe these two people should be talking, but they're not. That's a potential problem," and we apply it to things like counterterrorism. We apply it to social media and so on. So, people now look at really large networks and what are called high-dimensional or meta networks, such as who's talking to whom, who's talking about what, and how those ideas are connected to each other.

Liz Willets: Awesome. Yeah, I think, I know Christophe and I, we're very interested around that space and thinking about who should be talking to one another, um, you know, as we think about communication risks in an organization, especially in the (laughs) financial services industry. You've got things, um, that, you know, you're mandated by law to, um, kind of detect for, like collusion between two parties, whether it's your sales and trading group who just should not be, um, communicating with one another. So, I think that certainly applies, um, to your point earlier around the insider threat space.

Professor Kathleen Carley: Well, one of the great things in, in, uh, using social networks, especially depending what data you have access to, you may be able to find informal linkages. So, not just who's, uh, formally connected because they're, like, in an authority relationship, like you report to your boss, but, you know, who you're friends with or who you go to lunch with, you know, and all these kind of informal relationships. And we often find that those are as or more important for affecting, you know, how information goes through a group, how information gets traded, and even for such things as promotion and your health.
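[Editor's aside: the "these two people should be talking, but they're not" silo signal Professor Carley describes can be sketched from a toy meta-network. Everything below, the names, projects, link data, and the missing_links helper, is an illustrative assumption, not something from the episode or from any real tool.]

```python
from itertools import combinations

# Toy meta-network: who talks to whom (communication ties) and
# who works on what (a second mode of the network).
talks_to = {("ana", "ben"), ("ben", "caro"), ("caro", "dev")}
works_on = {
    "ana": {"pricing"},
    "ben": {"pricing", "audit"},
    "caro": {"audit"},
    "dev": {"pricing"},
}

def missing_links(talks_to, works_on):
    """Flag pairs who share a task but have no communication tie --
    a crude version of the 'silo' signal from meta-network analysis."""
    linked = {frozenset(pair) for pair in talks_to}
    return [
        (a, b)
        for a, b in combinations(sorted(works_on), 2)
        if works_on[a] & works_on[b] and frozenset((a, b)) not in linked
    ]

print(missing_links(talks_to, works_on))  # → [('ana', 'dev'), ('ben', 'dev')]
```

Real meta-network analyses add many more node types (beliefs, resources, locations) and track the links over time, but this bipartite people-by-tasks projection is the core idea.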
Christophe Fiessinger: And not only to, to add to, uh, what you were saying, Kathleen, is, like, the context is usually important to make an informed decision about what's going on in that network.

Professor Kathleen Carley: And then, cer-

Christophe Fiessinger: Isn't that how you think about it?

Professor Kathleen Carley: Yeah, certainly. In fact, the context is very important, and it's also important to realize that one context doesn't, um, capture all of s- somebody's interactions, right? So, for example, when Twitter started, people were trying to predict elections from, uh, interactions on Twitter among people. Well, the problem was not only was not everybody on Twitter, so you didn't have a full social network, but even with people who were on Twitter, that's not the only way they communicated with each other. They might have also gone to the bars together or, or whatever.

Liz Willets: Um, I was actually kinda reading through some of your research (laughs) as I was prepping for this interview and, um, read, um, some of your research around the difference between social network analysis and dynamic network analysis. And so, as you think about, kind of as we're talking, contexts and, you know, it's not just maybe the social connections, but it's adding in now the organization or the location or someone's beliefs. Um, I'd love if you could just kind of, you know, double-click there for us and tell us a little bit more about that.

Professor Kathleen Carley: Yeah. So, when, um, when the field started, right, people were really dealing with fairly small groups. And so, it was not unusual to, say, go into a small, like, startup company, and you would have maybe 20, 25 people. Um, for each one of 'em, you would know who was friends with who and who went to 'em for advice, and that was your data set, right? It was all people, and it was all just one or two types of links.
Technically, we call that one-mode data 'cause there's only one type of node, and there's two types of links. So, it's t- ... It's multiplex and one mode.

Um, but now what's happened, as the field has grown up in some sense, uh, we're dealing with much larger data sets, and you happen to have multiple modes of data. So, you'll have things like people, organizations, locations, beliefs, resources, tasks, et cetera, and when you have all of that, you have multiple modes of data. And in fact, this is great because you need multiple modes of data to be able to do things like predictive analytics, but in addition, you have lots of different kinds of ties. So, I not only have ties between people, I have ties of people to these things, like what resources they have, what knowledge they have, and so on. So, it's called bipartite data.

But then, I also have the connections among those things themselves, like words to words, and because you have all of that high-dimensional data and you have it through time, you now have kind of dynamic, high-dimensional networks. And so, the big difference here is that you've got more data, more kinds of data, and you've got it dynamically. And we even talk about it sometimes as geospatial because sometimes, you even have locations and you have to take into account, uh, both the distance physically as well as the distance socially.

Christophe Fiessinger: Interesting. And Kathleen, I, I, I can't resist-

Professor Kathleen Carley: Mm-hmm (affirmative).

Christophe Fiessinger: I mean, I got kids and, and, uh, uh, I'm originally from Europe, and the way my k- kids interact with their family members, grandmothers in Europe, is obviously very different than how I did it when I was growing up.
So, to your point on all those dimensions is you also see a difference where a person might talk one way on a channel or, uh, an app and talk another way in another app, and then layer that, you know, I would talk differently on a PC where I get a full form. I can be very verbose in my email or whatever versus my phone wherever I'm located. Are you seeing some of those patterns as well influence?

Professor Kathleen Carley: Absolutely. Yeah, and then they're ... Yeah. And you, you've probably even seen these in your own work lives because, for example, you'll communicate one way on LinkedIn. You'll communicate a different way on Facebook, a different way on Twitter, and a different way in person. So, it also matters what media you're on, and it also matters whether or what kind of others you surround yourself with. I mean, I know people who use different variants of their names on-

Christophe Fiessinger: Mm-hmm (affirmative).

Professor Kathleen Carley: ... different platforms to signal to themselves, "Oh, when I'm on this one, I don't talk about money," or, "When I'm on this one, I don't talk politics," you know? And so, people not only change how they talk, they change what they talk about, and they change who they talk to.

Christophe Fiessinger: Yeah. And I think the personas as well. I've seen my younger one who plays, uh, who does a lot of gaming.

Professor Kathleen Carley: Yep.

Christophe Fiessinger: Typically, they have their own persona, and, and then obviously, there's a different realm then of, of, of a different network, but they even put a different hat going into that mode of, of talking in the context of a game.

Professor Kathleen Carley: Well, and for there, it's just doing a game, right? But what we're actually seeing on social media is, you know, you do see adversarial actors-

Christophe Fiessinger: Uh-huh (affirmative).

Professor Kathleen Carley: ...
under fake personas doing things like trying to do phishing expeditions or trying, you know, trying to convince you that they're just one of the other people in the neighborhood-

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: ... and they really aren't, you know, and try, and trying to suck you into things.

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: So, we see a lot of that as well.

Christophe Fiessinger: Yeah.

Liz Willets: Grooming.

Christophe Fiessinger: I guess grooming is also not a new problem but also something that, that's present in those communities or anywhere.

Professor Kathleen Carley: Yeah.

Liz Willets: Definitely, and I think what we've seen especially with the pandemic is, yes, you might have these different personas, um, but now, like, your, your home has become your workplace. And so, how you might have typically behaved, um, you know, when you'd come home at the end of a long day versus now, you're in the context of work. Um, you know, I think we've seen a lot of organizations think about the risks that, that that could pose, um, in addition to all the other, um, you know, (laughs) stresses that people have on their day-to-day lives.

Um, but I think it's interesting, um, to your point earlier around, you know, having all the context. Um, you know, we're seeing signals come through from Teams, email, Zoom, uh, you know, social media, et cetera, and, uh, um, also detecting for things like repeated bullying, um, behavior. And so, it's not just, uh, a way, to your point, around using the analytics to predict something, but it's also to say, "Hey, this is a pattern, and, uh, you know, we should probably step in and do something about it."

Professor Kathleen Carley: Yeah, absolutely. And I think people are becoming more aware of these patterns themselves because they're actually not just seeing their own communication. They're actually seeing their kids' communication or their parents' communication or whatever.
And so, they're starting to realize that the people around them may be comm- communicating in ways that impact them, and so there's a variety of now new technologies that people are talking about trying to develop to try to help people manage this more collectively.

Liz Willets: Definitely. And I think, um, you know, another area that I'd love to explore with you is just around sentiment analysis. So, you know, you have all these signals, but, um, how do you know if someone's talking about something positively or negatively, um, and g- kind of would love to kind of hear if you've done any research in that space?

Professor Kathleen Carley: Oh, yes, we ... Yeah. I and my group, of course, we do a lot of work on sentiment. So, um, so, sentiment is one of those really tricky things when you're, uh, when you're not there because it depends on how many different modalities you have. Like, if you only have text, it's harder to detect than if you have text plus images, which is still harder than if you also have sound. So, the ... So, it's kind of tricky, and there's new techniques for all of those.

But let's just think about text for the moment. The way people often de- try to detect sentiment, and then where they started out, was just by, um, counting the number of positive versus negative words. Okay? And that's kinda okay, but it more tells you about overall, was the message kind of written in an upbeat or a downbeat kind of way. That's really all it really tells you, but people thought that that meant that if there was something they cared about, like, let's say I wanna know if it's about vaccines and are they happy about the vaccines or upset. Well, they would just say, "Here's a message. It has the word vaccine in it. Oh, there's more happy words than sad words, so it must be positive toward vaccines." No. Not even close.

Because locally, it coulda been, "I'm so happy I don't have to take the vaccine."
That woulda come out as overall positive, but it's really negative about the vaccine. So, then, people came up with local approaches. So, then, we work on local sentiment, but how do I tell for a particular word?

But the thing is, when I make a statement like that, that's out of context still because there could've been this whole dialogue discussion, right? And when we actually then looked at these kinds of sentences within the context of the discussion, over 50% of the time, we had to change our mind about what the sentiment really was in that particular case and what was really meant, you know?

And then, there's the issue of sarcasm and humor, which we were terrible at detecting, right?

Liz Willets: (laughs)

Professor Kathleen Carley: And so, one of the ways people start to detect that is by looking at what's written and then looking for an emoji or emoticon, and if it's at the opposite sentiment of what's written, you go, "Ah, this must be a joke." Okay?

Christophe Fiessinger: Or just sarcastic again.

Professor Kathleen Carley: Yeah. So, it goes cra- ... It goes on and on from there, but there's a couple of a ... There's ... That's kind of the classic line. And now, of course, we do all that with machine learning as opposed to just based on just keywords.

But there's two other things that are in the sentiment field that people often forget about. One is, um, these subconscious, almost subliminal cues that are in messages. So, when you write things and use images, your reader will pick up on things in it, and it will cause them to respond with particular emotional reactions.

So, for example, you've probably gotten an email or a text from someone where it was in all caps, and your, and your initial response is, "Oh, my gosh. They must be mad at me," right? Or, "What did I do wrong now?" It's like, "Oh, okay." But that's a subliminal cue, okay? It's like things like all caps, use of pronouns.
There're special words that people use that will evoke emotions in others, so we look for these subliminal cues also.

And, uh, an emergent field is looking for these in images, like the use of light versus dark images, the use of cute little kitties, right?

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: There's a whole bunch of things that people know now make them happy. And then, so, that's another aspect of it.

And then, the third aspect of it is that, um, sentiment is actually very tied to your social networks. Your emotional state is tied to your social networks. So, the more I can get you excited, either really happy or really angry, the more I can change your social network patterns. So, we can actually look for changes in social network patterns as a way of figuring out something about sentiment as well.

Liz Willets: Interesting. So, are you saying essentially that through your social networks, it kind of, like, reinforces or strengthens your connections with that group that you're identifying yourself with?

Professor Kathleen Carley: So, I'm saying that, well, it does. It's kind of a cycle because your mind likes to, um, maintain balance, okay? It likes to be emotionally balanced. You don't ... You really don't like to be overly excited in any direction, right? Most people don't. And so, if something's making you very uncomfortable, you will either ... If, like, your connection with someone's making you, uh, very uncomfortable, you will either change your opinion to be more like theirs so you're more comfortable, or you will drop your connection with that person. So, your affect, your emotional state, modulates your social networks, and your social networks affect what information and emotions come to you and modulate what emotions you have. So, it's kind of this cycle.
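[Editor's aside: the word-counting baseline, and its vaccine counterexample, from the discussion above can be sketched in a few lines. The word lists here are tiny illustrative assumptions; real sentiment lexicons contain thousands of scored terms.]

```python
# Tiny illustrative lexicons -- real sentiment lexicons are far larger.
POSITIVE = {"happy", "great", "love"}
NEGATIVE = {"sad", "angry", "hate"}

def naive_sentiment(text):
    """Classic bag-of-words sentiment: count positive vs. negative tokens."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# The counterexample from the episode: the message scores as upbeat overall,
# yet it is negative toward the vaccine itself -- which is why local,
# target-aware and context-aware methods replaced plain word counting.
print(naive_sentiment("I'm so happy I don't have to take the vaccine"))  # → positive
```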
Christophe Fiessinger: Then-

Professor Kathleen Carley: And so, we actually can watch this happening in groups where I can form them into ... I can prime groups to be ready to be emotionally triggered simply by building up social network connections among them. And then, I can emotionally trigger them, and the people in them will either get more involved in the group or they'll say, "I'm not really feeling comfortable anymore. I'm gonna leave."

Christophe Fiessinger: Mm-hmm (affirmative). I'm sure you've got a trove of data to research with COVID or with the recent election in the U.S. that would-

Liz Willets: (laughs)

Christophe Fiessinger: ... that would prove those theories of the relationship between your social network and h- your, your sentiment, right?

Professor Kathleen Carley: Yes. Yeah. Yeah.

Christophe Fiessinger: Well, actually, going back, tying this to, um, to what you were mentioning earlier, Kathleen, like, sometimes, we say that the conversations at the edges are, are the ones, um, the highest-risk ones, the ones that are happening on the fringes, and, you know ... And then, you add to that, like, something you mentioned earlier, which is also looking at how you are potentially detecting, like, social unrest and things like that. And because those are, like, at the fringes, it might start very small in a network with very few people, but it could definitely have a network effect very quickly. How do you find those needles that didn't exist before; a theory, a pattern, an opinion?

Professor Kathleen Carley: So, the short answer is it's really hard, and we're not good at it yet. (laughing)

Christophe Fiessinger: Okay.

Professor Kathleen Carley: Um, but there's a couple of techniques. First off, sometimes, you find 'em by luck. You just happened on 'em.
Sometimes, you find 'em just through, um, good journalistic forensics, but sometimes, we can aid and help that a bit by actually looking for, um, critical secondary actors.

Christophe Fiessinger: Sure.

Professor Kathleen Carley: And there's these kinda network metrics for finding these kinda critical secondary actors, and we look for those because those are the kind of actors that could emerge into leaders of these kinds of things. So, they're kind of ... It's not quite anomaly detection, but it's kind of like anomaly detection for networks.

Christophe Fiessinger: Oh. Is it kind of like that secondary actor is potentially a placebo that could flip, and you're trying to detect a change compared to that, that baseline?

Professor Kathleen Carley: I think that's probably the wrong, the wrong model of it.

Christophe Fiessinger: Okay.

Professor Kathleen Carley: Like, uh, a s- secondary actor is often someone who does things like brokerage relationships between two big actors, okay?

Christophe Fiessinger: Ah, okay.

Professor Kathleen Carley: Yeah.

Christophe Fiessinger: So, that person would potentially be more of a fire starter and would accelerate that.

Professor Kathleen Carley: Yeah. Exactly.

Christophe Fiessinger: Two people having a point of view to suddenly a wildfire spreading out across the entire network.

Professor Kathleen Carley: Exactly. Yeah.

Christophe Fiessinger: Okay, I get it. Thanks.

Liz Willets: Yeah, but back to your point around some of the challenges with, for example, detecting sarcasm, and is it an emoji? Um, would love to hear your thoughts on just some of the other challenges more generally if you're thinking about building, uh, a, a sentiment model from scratch, um, whether it's for, you know, threats or offensive language, um, or things like burnout and suicide.
Um, how do you go about doing that, and how do you go about doing that in an ethical, um, manner?

Professor Kathleen Carley: Okay. So, um, so, one of the challenges is culture and language because the way we express sentiment differs, even though there's, like, basic emotions that are, that are built in cognitively in our brain. The way we express those is socially, culturally defined.

Christophe Fiessinger: Mm-hmm (affirmative).

Professor Kathleen Carley: So, one of the big issues is making sure you understand the culture and the language that's associated with it. So, that's part of it.

The second, a second, uh, critical thing is the fact that, um, when people express themselves, if you're mainly using online data, um, people can go silent, in which case, you don't have any data. Your data could just be a sample. They could choose to enact one of their personas and be lying.

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: So, there's lots of ways in which your data-

Christophe Fiessinger: Mm-hmm (affirmative).

Professor Kathleen Carley: ... itself could be wrong, okay? And that's another big challenge in the area. So, those, I would say, are examples of some of the challenges, in addition to having to have the whole discussion and having to, you know, be careful what you're looking at sentiment around and so on.

So, from an ethical perspective, um, I would say that part of this is that when you're collecting data and trying to analyze it and create, like, a model for one of these issues, one of the biggest issues is making sure that you haven't over-focused on a certain class of people, like only focused on young white guys or only focused on, you know, um, aging, uh, Hispanic women. You wanna make sure that you're as much as possible balanced across the different kinds of publics you want to serve.
That's one of the challenges, or one of the kind of ethical guidelines and challenges at the same time.

Um, the other part is, if you were actually going to intervene, then you'd need to think about intervention from a, you know, what does the community consider appropriate ethically within that community for the way you intervene? And the answer may be very different if you're talking about, you know, intervening with children versus intervening with, uh, young adults versus intervening with people with autism. So, you need to look at it more from a community perspective. So, those are two I would raise.

Liz Willets: That's fine. Yeah. I think, um, you know, especially at Microsoft, we are committed to having unbiased, um, training data so that we aren't, you know, discriminating against someone because they have these, um, certain characteristics, um, and definitely keep that top of mind, um, as well as, you know, remediation, and, and how do you go about, now that you've identified that this person is at risk for whatever, uh, reason? Now, how do you reach out to them and give them the support they might need, or how do you alert, um, you know, someone who, who might need to step in? And so, I think that's been, um, a really interesting challenge that we're digging into on our end as well.

Um, and I think, to the first piece you were talking about, just more generally the challenges, um, I know you've done some research around control theory, um, and would love to get your perspective on, you know, especially, uh, some of these more granular sentiments. Like, how do you differentiate between anger, disgust, disappointment, um, and, and really, um, kind of define exactly what you're looking for in the communications to pull that out?

Professor Kathleen Carley: Yeah. So, um, basically, we, we start with what are thought to be the basic emotions, the ones that are built in cognitively.
So, we would take those ones, and those, you can distinguish fairly reasonably on the basis of the cues I was talking about, and they're kind of big swaths of things. Of course, most of the basic emotions are ones that are kind of more on the negative side, so it's really on the positive side, discriminating, you know, happy from ecstatic from mildly amused, that it's much harder, 'cause none of those are basic; just happiness is basic, right? All the others are variations of happiness.

So, we start with the basic emotions and try to discriminate into those categories, and to go further than that, we often find we don't need to. If we need to, um, then really, it's because the context demands that you have to pay attention to a parti- ... So, you're looking for something particular in a particular environment. And so, then, we let the context dictate what the difference is that it's interested in.

Um, so, for example, if I, if I was doing this for Disney for, you know, people's response to a new ride, for example, that context would dictate that what I really wanna focus on is not just happiness but their satisfaction and pull that out. And so, then, I would actually develop my technology around that, around the, the different people who fell into the different categories, and I might do it first by getting survey data or something like that.

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: But, you know, you said something that made me realize that I hadn't mentioned one of the major challenges-

Christophe Fiessinger: That was good.

Professor Kathleen Carley: ... that, um, people often overlook because we're so in love with machine learning, right? And we so think, "Training sets," right? Well, the trouble is, in a social space, your training sets are yesterday's news.

Christophe Fiessinger: Yeah.

Professor Kathleen Carley: They're never up-to-date.
They're always, they're always a mess, and a lot of things where you wanna use sentiment and wanna look at behavior of people, you don't have time to build a training set. So, this is an area where we really need new technologies, like match functions and things like that, or where you can just get the bare minimum training set and then do some kind of leapfrogging on it.

Christophe Fiessinger: Yeah. I think it-

Liz Willets: Yeah.

Christophe Fiessinger: I think this is to, to that point. I can relate to that. I think the ... And also, what you were s- saying early on, the key part where you look at demographics or what is that target audience with that pattern you're trying to detect is, even for that specific demographic, you did a good job on day zero. We know language is this constantly evolving function, and, to your point, you know, it was yesterday's data set. Just because you would put blood and sweat into a white paper to detect blah for those demographics.

Professor Kathleen Carley: Yep.

Christophe Fiessinger: That was great at that point in time, but I'm sure it already changed rapidly because of today's availability of social networks and things like that, you know. When I was visiting Europe, my, my nephew and niece speak English from what they've seen on YouTube and Netflix.

Professor Kathleen Carley: Yeah.

Christophe Fiessinger: So, it almost feels like language is even moving faster with that, uh, availability of all those tools worldwide, which I'm sure is making researchers' jobs even harder, to stay up-to-date.

Professor Kathleen Carley: Absolutely. (laughs) Yeah. The level of new jargon and new phrases out there, it's crazy.

Liz Willets: (laughs)

Christophe Fiessinger: Yeah.

Liz Willets: And that's not just in English, too, you know?

Professor Kathleen Carley: That's right. That's right.
Liz Willets: We were talking last week with Christian around languages (laughs) and, you know, how many languages there are in the world and how you have to kind of build your models to be trained to reason over, uh, you know, double-byte characters and, um, you know- Professor Kathleen Carley: Yep. Liz Willets: ... Japanese and Chinese characters. And so, it just (laughs) it's never ending. Christophe Fiessinger: Yeah. Professor Kathleen Carley: And sometimes, the fact that you have the multiple character sets and multiple languages can be diagnostic, right? So, like, when we look at, um, the response to, say, natural disasters in various areas, typically, people, when they communicate online, will communicate in one language with others in the same language. And there'll be a few people who will communicate in multiple languages, but they'll have different groups like, "Here's my English group. Here's my Spanish group." Okay?             But during a disaster, you'll actually see more messages come out where you've got mixed part English, part Spanish in the same message- Christophe Fiessinger: Mm-hmm (affirmative). Professor Kathleen Carley: ... and so it can be diagnostic of, "Oh, this is a bilingual community," for example. Liz Willets: Interesting. Christophe Fiessinger: Interesting. Liz Willets: Well, great. I know, um, Kathleen, I have certainly learned a lot and wanna thank you again for joining us today. Um, Christophe, I thought that was a great conversation. Christophe Fiessinger: Yeah. After that, I wish I was a student and could join, uh, CMU, be one of your students and write a PhD. It sounds like an infinite number of fascinating topics and research topics, so it sounds- Professor Kathleen Carley: Well- Christophe Fiessinger: ... very fascinating.
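Professor Carley's point that a single message mixing two languages can itself be a signal can be sketched in a few lines. Real systems use trained language-identification models; the tiny wordlists here are purely illustrative stand-ins:

```python
# Toy code-switching detector: a message drawing on two language wordlists
# at once is flagged as mixed-language, which (per the discussion above)
# can be diagnostic, e.g. during a disaster in a bilingual community.

ENGLISH = {"help", "water", "we", "need", "the", "is", "flooding"}
SPANISH = {"ayuda", "agua", "necesitamos", "la", "calle", "inundada"}

def languages_in(message):
    words = set(message.lower().split())
    langs = set()
    if words & ENGLISH:
        langs.add("en")
    if words & SPANISH:
        langs.add("es")
    return langs

def is_code_switched(message):
    return len(languages_in(message)) > 1

is_code_switched("we need ayuda the calle is inundada")  # True
is_code_switched("we need water")                        # False
```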
Episode 7: Say what you mean!
26-05-2021
Episode 7: Say what you mean!
Oh my gosh Oh my gosh, I’m dying. Oh my gosh, I’m dying.  That’s so funny! And in just three short lines our emotions boomeranged from intrigue, to panic, to intrigue again…and that illustrates the all-important concept of context! In this episode of Uncovering Hidden Risks, Liz Willets and Christophe Fiessinger sit down with Senior Data Scientist, Christian Rudnick to discuss how Machine Learning and sentiment analysis are helping to unearth the newest variants of insider risks across peer networks, pictures and even global languages. 0:00 Welcome and recap of 1:25 Meet our guest: Christian Rudnick, Senior Data Scientist, Microsoft Data Science and Research Team 2:00 Setting the story: Unpacking Machine Learning, sentiment analysis and the evolution of each 4:50 The canary in the coal mine: how machine learning detects unknown insider risks 9:35 Establishing intent: creating a machine learning model that understands the sentiment and intent of words 13:30 Steadying a moving target: how to improve your models and outcomes via feedback loops 19:00 A picture is worth a thousand words: how to prevent users from bypassing risk detection via Giphy’s and memes 23:30 Training for the future: the next big thing in machine learning, sentiment analysis and multi-language models   Liz Willets: Hi everyone. Welcome back to our podcast series, Uncovering Hidden Risks, um, our podcast where we cover insights from the latest news and research through conversations with thought leaders in the insider risk space. My name is Liz Willets and I'm joined here today by my cohost Christophe Fiessinger, um, to discuss some really interesting topics in the insider risk space. 
Um, so Christophe, um, you know, I know we spoke last week with Raman Kalyan and Talhah Mir, um, our crew from the insider risk space, just around, you know, insider risks that pose a threat to organizations, um, you know, all the various platforms, um, that bring in signals and indicators, um, and really what corporations need to think about when triaging or remediating some of those risks in their workflow. So I don't know about you, but I thought that was a pretty fascinating conversation. Christophe Fiessinger: No, that was definitely top of mind and, and definitely an exciting topic to talk about that's rapidly evolving. So definitely something we're pretty passionate to talk about. Liz Willets: Awesome. And yeah, I, I know today I'm, I'm super excited, uh, about today's guest and just kind of uncovering, uh, more about insider risk from a machine learning and data science perspective. Um, so joining us is Christian Rudnick, uh, senior data scientist on our security, uh, compliance and identity research team. So Christian, welcome. Uh, why don't you- Christian Rudnick: Thank you. Liz Willets: ... uh, just tell us a little bit about yourself and how you came into your role at Microsoft? Christian Rudnick: Uh, yeah. Hey, I'm Christian. Uh, I work in the compliance research team, and I just kinda slipped into it. Uh, we used to be the compliance research and email security team, and then email security moved to another team, so we were all focused on the compliance role. Uh, but at the end of the day, you know, it's just machine learning, so it's not much of a difference. Liz Willets: Awesome. And yeah, um, you know, I know machine learning and sentiment analysis are big topics to unpack. 
Um, why don't you just tell us a little bit, since you've worked so long in kinda the machine learning space, around, you know, how that has changed over the years, um, as well as some of the newer trends that you're seeing related to machine learning and sentiment analysis? Christian Rudnick: Yeah. In our space, the most significant progress that we've seen in the past years was moving towards more complex models, and also more complex ways of analyzing the text. So if you look at the models that were very common about 10 years ago, they basically would just look at words, like, uh, a set of words. Uh, so the order of words didn't matter at all, and that's changed. The modern algorithms will look at sentences as a sequence, and they will actually take the order of the words into account when they run analysis. The size of models has also increased dramatically over the years. So for example, I mentioned earlier that I worked in email security; the models that we had shipped were often in the magnitude of kilobytes, versus really modern techniques to analyze offensive language, which use deep neural nets, and those models can be the size of several gigabytes. Christophe Fiessinger: What's driving that evolution of the models? Uh, you know, I'm assuming a big goal is to make those models better and better, to really reduce the noise and things like false positives or misses. Is that what's driving some of those things? Christian Rudnick: Yeah. So at the end of the day, you know, model size translates into complexity. So you can think of it this way: smaller models basically have very few levers for how to modify their decision. If you have a very large model, it will just have that many more levers. If you wanna capture the variation that you have in your data set, you often need a lot of these levers, and new models provide them. 
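Christian's contrast between older bag-of-words models and newer order-aware models can be shown in a few lines. This is a toy illustration with a whitespace tokenizer and bigram features, not the production approach:

```python
# Bag-of-words ignores word order; sequence features (here, bigrams) keep it.
from collections import Counter

def bag_of_words(text):
    return Counter(text.lower().split())

def bigrams(text):
    words = text.lower().split()
    return Counter(zip(words, words[1:]))

a = "I will hurt you"
b = "you will hurt I"   # same words, different order

same_bag = bag_of_words(a) == bag_of_words(b)   # True: indistinguishable
same_seq = bigrams(a) == bigrams(b)             # False: order is preserved
```

Modern neural models go much further than bigrams, but the underlying point is the same: the representation has to encode the sequence before the model can reason about it.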
It's not just that. Uh, there's one thing I didn't mention explicitly about the newer models... So traditionally, old models were trained on a relatively small set of data that's split into two parts: the positive set and the negative set. And basically the machine learning model was kinda trying to draw a boundary between them.             The more modern models are actually trained differently. Uh, we do something called pre-training, which means that we train a model on neutral data, which is neither positive nor negative, to just capture elements of language. So once the model is loaded up with this huge, huge amount of neutral data, then we start feeding in the positives and negatives to draw the boundary, but it can use all this information that it gained from the general language to make that decision. Liz Willets: That's super interesting. Um, you know, when I think about technology and kind of leveraging, you know, the machine learning to get an early signal, um, you know, something like discovering a canary in a coal mine, um, it sounds like we're feeding positives and negatives on top of neutral data, but how do you go about finding, like, the unknown unknowns and, um, you know, maybe identifying risks that you may or may not have been aware of previously, um, with these types of models? Christian Rudnick: At the end of the day, it's the neutral data. So the way you can see it is that you feed it a few, say, known positives. And that gives you an idea of where, you know, we know the possible attacks are, but then what's happening is it's using all this language it's learned from the neutral data to consider, like, okay, we've got a starting point, but everything that is semantically close to that is most likely also something that we wanna target. And that's really, that's really the recipe. I mean, the ML that we're using, it doesn't have magical capabilities. 
It can't really detect patterns that it hasn't seen before. That is possible in other parts of the insider risk space, if you rely on anomaly detection. Um, so, anomaly detection, in some sense, is a negative approach.             So in our approach, we have the positives, and that's our starting point, and from the positives we're trying to see how far we can generalize to get a wider scope. Versus, um, what I mentioned, uh- Christophe Fiessinger: Anomaly detection. Christian Rudnick: ... anomaly detection, thank you so much, Christophe. It's kind of the opposite. You're trying to learn from the negatives. You're trying to understand what the typical state of the system is, and everything which deviates from it is an anomaly that you might wanna look into. So that has more ability to detect things which are completely unknown. Christophe Fiessinger: Yeah. Liz Willets: Love it. That's super interesting from both, both perspectives. Christophe Fiessinger: I think, just to step back and, and to make, um, the audience appreciate, um, the complexity: you know, take a simple sentence. Like if I sent a, a Teams message to Liz and said, I will hurt you. So first of all, there's no foul language. It's perfectly okay on its face, but those words, that sentence targeted at someone else, could mean a potential, uh, threat- Christian Rudnick: Right. Christophe Fiessinger: ... um, or harassment. And so for the audience, the challenge here is not to detect every occurrence of the word, uh, hurt, because hurt could be used in a perfectly acceptable context, but here, targeted at someone, uh, that set of words potentially could be a risk. And I think- Christian Rudnick: Right. Christophe Fiessinger: ... that's the, that's the journey you've been on, uh, as well as, uh, the rest of the research team. And that's where you can't just look at single words; you've got to look at the sentence, right, Christian? Christian Rudnick: Yes. 
That's exactly right. So older ML algorithms will just see the I, the will, and the hurt kind of independently, and then do a best guess based on the presence of any of these words. More modern algorithms will actually look at the sequence, I will hurt. They're perfectly capable of learning that the combination of these three words in that order is something that's relevant, versus if they come in a different order or, uh, you know, in a different context, then it might not be a positive. And let me pick up what Liz had mentioned earlier. So with modern algorithms, if you train on something like I will hurt you as a positive, it'll understand that there's a lot of words which are similar to hurt, which kind of have the same meaning. So it will also pick up on something like, I will kill you. Uh, I will crush you, even though you haven't fed those into the positive set. Christophe Fiessinger: But that all falls into that kind of threat, which- Christian Rudnick: Yes. Christophe Fiessinger: ... stepping back, is a risk. As soon as someone starts using that language, maybe they actually mean those things, and they're gonna escalate or transition to a physical threat. Christian Rudnick: That's a real possibility. Yes. Christophe Fiessinger: Okay. Liz Willets: Definitely. Yeah. I think it's interesting, 'cause I kinda feel like where you're headed with this is that you can't just use keywords to detect harassment. You know, it's kind of like thinking about overall sentiment, and, and tackling sentiment is not, um, you know, an easy thing to do; looking at keywords won't cut it. Um, and I'd love to get your perspective, Christian, you know, from an intelligence and modeling view, around identifying that intent versus just the keyword level. Um, you know, how do you get a string of words together that might indicate, uh, that someone's about to, you know, harm someone else? Christian Rudnick: Yeah. So first of all, you're right. 
Keywords by themselves are usually not sufficient to solve this problem. There are very narrow, very focused problems where keywords might get you a long way. Like, say, if you care just about prof- let's take the example of profanities. If you care just about profanities, there's a lot of words that you can put in a keyword filter that are gonna do a fine job, and this classifier is actually gonna do quite well. You're gonna start seeing borderline cases where it's gonna fail. So, you know, there are some words that are profanities in one context but are perfectly normal words in another context. Um, I mean, I don't wanna use profanities, but most of you might know that a donkey has a synonym which actually is a swear word.             So if you include it in your list, then obviously you will hit on the message every time, even when someone actually means donkey. But for profanity, you can get a long way. If you look at things like threats, it's pretty much what Christophe said earlier. Um, all four words, I will hurt you. Each of those words will appear most of the time in a perfectly, uh, normal context where no harassment or threat is present. Christophe Fiessinger: Right. Christian Rudnick: So you can't put any of those into your keyword list. You can say, okay, I can evolve my model from a keyword list to a key-phrase list. You can say, uh, I will actually take small phrases and put them into my list. So instead of just will, or just hurt, you will put in I will hurt you and I will kill you. But now the problem is that, you know, there's a lot of different ways in which you can combine seemingly normal words into a threat, and it's incredibly hard to enumerate all of them. And even if you were to enumerate all of them, you know, language evolves. It might be something that is good today, but in maybe half a year, your list will, you know, be out of date. 
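The progression Christian describes, from single keywords to phrase lists, and the failure mode of each, can be made concrete with a toy filter. The word and phrase lists below are illustrative:

```python
# Toy illustration: keyword filters over-trigger on benign uses, and even a
# more precise phrase list can't enumerate every variation of a threat.

KEYWORDS = {"hurt", "kill"}
PHRASES = {"i will hurt you", "i will kill you"}

def keyword_hit(message):
    return any(w in message.lower().split() for w in KEYWORDS)

def phrase_hit(message):
    return any(p in message.lower() for p in PHRASES)

keyword_hit("my feet hurt after the run")   # True  — a false positive
phrase_hit("my feet hurt after the run")    # False — phrases are more precise
phrase_hit("i will crush you")              # False — a real threat slips by
```

The last line is the gap an ML model trained on positives plus pre-trained language knowledge is meant to close: it can generalize from "hurt" and "kill" to "crush" without anyone enumerating the phrase.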
If you have ML models, this problem gets solved in a very convenient way.             So first of all, the model by default kinda understands variations of language, due to this pre-training. So it'll already capture a lot of variations that correspond to one of your input examples. And second of all, it's relatively easy to retrain these models based on new information that's coming in. So if you install, like, say, a feedback loop, you give customers the possibility of saying, okay, hey, look, this is another example, uh, that I've found that I would like to target. It can very easily be incorporated into the model, and then not only catch this, but a lot of additional variations of the new stuff that came up. Christophe Fiessinger: Yeah, I think, uh, what's important here is this is not static; it's a moving target. Because, like you say, Christian, language evolves. You know, there's always a new generation, there's new slang thanks to social media that spreads rapidly, and new ways to hurt or insult someone, or to harass, or whatever it is. Um, and it evolves. So I think you're right that it's a moving target. So it's all about the learning part of machine learning: to either, like you say, identify new patterns that didn't exist before because language evolves, or dismiss what we call false positives. So if I'm a seller and say, I will kill this quota, I mean, like, I'm gonna exceed my quota, and maybe the model caught that, and we need to say that's okay. That sentence, I'm gonna kill my quota, is okay. Uh, hurting someone else, not okay. Liz Willets: Yeah. And I'd love to learn a little bit more about this feedback loop you mentioned. Can you tell us a little bit about, behind the scenes, what that looks like? You know, how you might see, uh, a model improve based on those, um, feedback points that, um, you know, end users might be giving to the model? 
Christian Rudnick: Uh, I'll try my best (laughs). So, like, you know, think about the model being a lens, and the lens doesn't quite hit the target. If you feed a new item back, it will move this lens slightly closer to the target. And if you keep doing it, it's gonna do that until it actually hits the target. And not just the target; once again, the model can generalize, so it will hit everything that's kind of similar. Christophe Fiessinger: Yeah. Just to add to that, I think, um, in addition to the model, listeners gotta remember that it's an evolving target, and, as Christian said, you're seeded with data, and we do our best to have representative data. But again, the world of languages is so fascinating because the permutations are infinite. You know, we haven't even talked about multi-language support and globalization, but you can imagine that, uh, even within words, a lot of people might swap letters with, uh, symbols, just to try to get away with whatever, um, they are trying to do. But basically the point is, the combinations are infinite.             So the only way to tackle that is to continue to learn and evolve. And for us to learn, that's when we need feedback, not just from, let's say, one industry in one region, uh, but from all industries across the world: as much from a school district in the US as from a manufacturer in the UK, or wherever. Um, so it's definitely, uh, a fascinating field where, you know, we continue to invest. Liz Willets: Yeah. Christophe Fiessinger: What do you think, Christian? Christian Rudnick: Yeah, no, I completely agree. And at the end of the day, it's the same image. So the difference is you have a target which is moving, and you have your lens, which is kind of trying to catch up to it. It's a bit of a curse of the ML that you're always a bit behind. 
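Christian's "lens" analogy can be sketched with a deliberately simple model: a centroid of bag-of-words vectors that each feedback example nudges closer to the target, so that similar messages start scoring higher too. The vectorizer and cosine similarity here are toy choices, nothing like the deep models discussed above:

```python
# Minimal sketch of a feedback loop: each confirmed positive moves the
# model's notion of "risky" (a centroid) closer to the target phrasing.
import math
from collections import Counter

def vec(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class FeedbackModel:
    def __init__(self):
        self.centroid = Counter()

    def add_positive(self, text):
        self.centroid.update(vec(text))   # nudge the "lens" toward the target

    def score(self, text):
        return cosine(vec(text), self.centroid)

model = FeedbackModel()
model.add_positive("i will hurt you")
before = model.score("i will crush you")   # partial match only
model.add_positive("i will crush you")     # reviewer feedback arrives
after = model.score("i will crush you")    # higher: the lens moved closer
```

Note that `after` also lifts the score of near variations like "i will smash you," which is the generalization Christian describes.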
So you always have to rely on people giving you samples, which usually means violations which have already occurred. But at the same time, the retraining cycles are fairly short. So you can adapt quite quickly to new information and adjust to new items that you would like to catch with your model. Christophe Fiessinger: Yeah. Is it a good analogy, Christian, to draw from things we do on the security front, where malware, phishing, or viruses are an evolving target? Christian Rudnick: Oh, absolutely. Uh, [inaudible 00:16:22] the risks in cyber security, yeah, the overlap is massive, if you think about it. I mean, the way I like to think about it is that security kind of deals with the external attackers, whereas insider risk deals with internal attackers. So you can see that the overlap is, you know, very big. Almost everything we do in compliance, we do in security in a very similar way. So for example, we have a lot of ML models deployed into production. They get retrained on a regular basis with new data. But in security, you know, there's a lot of other features that you can use as attack vectors, and then we have a lot of models built around those. Christophe Fiessinger: Christian, how about one topic that I think we hear a lot: sure, you get valid feedback, but what if feedback is biased, and someone's trying to, instead of improving the detections, introduce bias, whether it's of a racial or, or sexual nature, whatever. How do you make sure you mitigate for that type of, I guess, junk feedback or biased feedback? Christian Rudnick: Yeah. Junk feedback is indeed a problem. There's a few things that you can do. Uh, first of all, we don't usually accept feedback from everyone; the feedback we accept is usually from admins, and with admins, our understanding is that they have a certain amount of knowledge that they can use to vet feedback. 
Christophe Fiessinger: Hmm. Christian Rudnick: And that's particularly true if they get the feedback they're looking at from end users. So usually, they won't just blindly trust them, but they will look at it, and then only if it's right- Christophe Fiessinger: And [inaudible 00:17:57] trash. Christian Rudnick: Right, tri- [inaudible 00:17:59] trash. Thank you. So that's one way. Um, then generally, we're not just rebuilding the model on the data and then automatically pushing it. There's actually a whole system which ensures that whatever new model we've built is better than the previous model. So if someone feeds in poor feedback, you would expect that the model gets worse, does worse on the test set. And in that case, we would not publish this model, and just discard the feedback and move on. That might slow down the process, but at the same time, it ensures that the models won't degrade and will actually get better. Christophe Fiessinger: So, again, you're saying we do have a rigorous process to make sure that- Christian Rudnick: Yes. Christophe Fiessinger: ... feedback doesn't blindly roll into production; we verify the quality along the way to make sure it's converging, not diverging. Christian Rudnick: Yes. Liz Willets: Definitely. Yeah. And I think having those responsible AI and ML practices is, again, to your point earlier, Christophe, something that's always top of mind for us, anything concerning privacy (laughs), uh, really in this day and age. Um, but to kinda just change gears a little bit here. Um, last week, when we spoke with Raman and Talhah, we got into the conversation around, like, GIPHYs and memes, et cetera. Um, and, you know, thinking about how we can prevent users from trying to bypass detection, um, whether it's putting inappropriate language into images, um, and, you know, trying to think about how you might extract that text from images. 
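The publish-or-discard guardrail Christian describes can be sketched as a simple gate: retrain on the new feedback, but only promote the candidate model if it does at least as well as the current one on a held-out test set. The models and metric below are illustrative stand-ins:

```python
# Sketch of a model-validation gate: poisoned feedback produces a worse
# candidate, which the gate refuses to publish.

def accuracy(model, test_set):
    """test_set: list of (message, expected_label) pairs."""
    correct = sum(1 for msg, label in test_set if model(msg) == label)
    return correct / len(test_set)

def maybe_publish(current, candidate, test_set):
    """Return whichever model should keep serving traffic."""
    if accuracy(candidate, test_set) >= accuracy(current, test_set):
        return candidate   # as good or better: publish the retrain
    return current         # worse: discard the feedback, keep serving

test_set = [("i will hurt you", True), ("great job team", False)]
current = lambda msg: "hurt" in msg
poisoned = lambda msg: True   # junk feedback taught it to flag everything

chosen = maybe_publish(current, poisoned, test_set)   # keeps `current`
```

As Christian notes, rejecting a retrain slows the loop down, but it guarantees the deployed model only moves forward on the held-out benchmark.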
Um, we'd love to hear if you can talk a little bit to that side of things. Christian Rudnick: Yeah. Um, I'm actually not an expert in the area, but, uh, image recognition is in general very well developed. It's actually a lot more involved than text processing. Almost everything we've done in text processing, we kinda stole from the people that had previously done it in image processing, like, for example, the pre-training that I mentioned earlier. And in particular, there are excellent models which, uh, can extract text from images. So I don't know what the Microsoft version is called, but it is very, very good. You can almost be guaranteed that if you have an image, we can extract the text that appears in the image, and then just process it through our regular, uh, channels.             So that's regarding text in images. When it comes to images themselves, that's something that our team doesn't do directly, but there are lots of models which, uh, target, let's say, problematic images. So what I've mostly seen is detection of adult images and gory images. Christophe Fiessinger: Yes. Christian Rudnick: And usually these classifiers actually operate in almost the same way as the ones [inaudible 00:20:47] I mentioned earlier. They're usually very big models. They start by pre-training on just any kind of images. So they use these huge collections of public images to train the model, and it just kinda learns patterns. And in this case, you know, patterns are literally visual patterns: it'll understand round shapes, square shapes. It will have a vague understanding of the shape of a human in all sorts of different configurations. And, you know, of course, it can also understand the different color shadings. 
So models like that will probably learn that if you have, uh, a human shape with a lot of red on it, then it's probably more likely that it's a gory image, as opposed to a human with a lot of purple or green on it. Liz Willets: That just kind of reminded me of something. You know, when you see those images and you're extracting that text, we're also still able to provide that feedback loop. Um, because I do remember we had one case where, you know, we were working with this school district, and they all of a sudden started seeing a lot of homework assignments, um, being flagged for gory images. And it came down to the fact that the teacher was using red pen to kind of, you know- Christian Rudnick: Yes. Liz Willets: ... mark up the students' tests or quizzes- Christophe Fiessinger: Yeah. Liz Willets: ... or whatnot. And so there's always, you know, that feedback loop top of mind. Christophe Fiessinger: Yes. I think that ties back to exactly what Christian was saying. Obviously, with the pandemic, now everything is online, and teachers are doing annotation of math exercises with a red pen. Uh, I guess the initial training set didn't take into account that type of data, like a school district, uh, using modern digital tools to do math assignments. And so that's a perfect case: yeah, it detected those as potentially gory, because there was a lot of red inking on a white background with formulas. Uh, but again, it gets back to what Christian was talking about. Then we pass that feedback. So pretty much like the text detection, that image detection of what is defined as gory needs to evolve, to ignore formulas with red annotations, and be refined to avoid that in the future, because that's what we would consider a false positive. 
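A deliberately crude version of the color cue Christian mentions makes the red-pen story easy to see: score an image by its fraction of red-dominant pixels, and a heavily marked-up white homework page trips the same heuristic a gory image would. Pixels are (r, g, b) tuples, and all thresholds are illustrative, nothing like a real trained classifier:

```python
# Toy "gory image" heuristic based only on red dominance. It illustrates why
# red-pen annotations on white homework can surface as a false positive.

def red_fraction(pixels):
    red = sum(1 for r, g, b in pixels if r > 150 and r > 2 * max(g, b))
    return red / len(pixels)

def looks_gory(pixels, threshold=0.2):
    return red_fraction(pixels) >= threshold

white = (255, 255, 255)
red_ink = (200, 20, 20)
homework = [white] * 70 + [red_ink] * 30   # heavily marked-up assignment

looks_gory(homework)   # True — the false positive the feedback loop corrects
```

A real model uses shape, texture, and context alongside color, which is exactly the refinement the feedback described above drives.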
So it equally applies to any model, whether it's text or image: there is always that virtuous cycle of constantly learning new patterns. And this one, that's a good example of a use case that we missed when we built those models. Liz Willets: Christian, um, you know, I'm certainly learning a lot today (laughs), um, through this conversation. Um, but I'd love to learn what's next. Um, you know, whether that's in your role or, um, just with regard to machine learning and sentiment analysis. Um, what do you think the next big thing will be? Christian Rudnick: That's a very good question (laughs). So, uh, from our perspective, our main effort is to get other features into the system, even when it comes to text processing. So, as you mentioned earlier, in, um, security, we have a much richer set of features that we've been using for quite a while now. We wanna take the same journey with our text models. So if you look at a communication, for example, you can deduce whether it should hit on a certain policy or not, but you actually get more powerful models if you look not just at that one message, but at the entire conversation, or at least, um, you know, the conversation which is near your target message. Like, for example, the language that is acceptable between students and the language that's acceptable between a student and a teacher could be different; it might not necessarily be the same. So there's a very rich set of, um, possibilities that arise from looking at all of this metadata surrounding a message. Christophe Fiessinger: Yeah. I mean, I'm glad you mentioned that, getting more context, because we did have a, uh, um, an example from the school district where, um, a student had literally said something like I will kill you in, in Teams. And that was detected. Then the next question was, what was the context around that? And sure enough, uh, the context was two students playing a video game. 
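The context point here can be sketched as a toy severity adjuster: the same flagged phrase gets a lower severity when the surrounding messages suggest gaming banter. The wordlists and severity labels are purely illustrative, not a description of any product's behavior:

```python
# Toy sketch: severity of a flagged message depends on conversation context.

THREAT_PHRASES = {"i will kill you"}
GAMING_WORDS = {"game", "respawn", "level", "player", "controller"}

def severity(message, context_messages):
    if message.lower() not in THREAT_PHRASES:
        return "none"
    context = set(" ".join(context_messages).lower().split())
    if GAMING_WORDS & context:
        return "low"    # likely trash talk in a video game
    return "high"       # no mitigating context: escalate for review

severity("i will kill you", ["one more level then bed"])   # "low"
severity("i will kill you", ["see you after class"])       # "high"
```

A real system would feed the surrounding conversation into the model itself rather than apply a wordlist after the fact, but the effect Christophe describes is the same: context turns a high alert into a low one.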
Um, so suddenly it went from a high alert, you know, the student is gonna- Christian Rudnick: Yeah. Christophe Fiessinger: ... hurt this other student, to, no, they're just having fun. So I definitely second that: just adding the couple of messages above and before that- Christian Rudnick: Right. Christophe Fiessinger: ... you see that they're just playing a video game. And even though that language might not be acceptable, it's definitely not as bad as, uh, an intent to hurt someone. It was, I don't wanna hurt that virtual character in the video game. So yeah, definitely, uh, second that: more context will definitely help really decide if this is really a high severity and, more important, what to do next in terms of remediation. Uh, one thing we only briefly touched on: we know that language is not just US English. What are we doing to cater to other languages that our customers speak worldwide? Christian Rudnick: Right. So we started all our efforts in English, but we're currently working on globalizing our models, which means that we want to provide the same protections for users in lots of other languages. We have, like, three tiers of languages, and we're currently very focused on the first tier, but eventually we plan to get to all three tiers. And in principle, you have two ways of approaching this problem. The simplest thing you can do is basically build one model per language, and that's something which works reasonably well. But in principle, what we aim for is models which can deal with all languages at once. So there's been a lot of research in this area; they're called multi-language models. They use the very same techniques that you use for, um, just English [inaudible 00:26:58], but then they have a few additions that make them suitable for applying in a context with a lot of languages.             
And basically what they're trying to do is... so there are very powerful models which can translate from one language to another. And we borrowed a few of the ideas from these models and incorporated them, which enables the model to basically, in some sense, relate all the languages, uh, to each other at once. So these models will understand, I mean, understand in a, in a machine learning way of thinking about it, that one word in English, as well as its translation into Greek or Spanish or French, that they all kind of are, are the same. And then this provides that opportunity. So in particular, it means that you can train models on, uh, say, a set of languages, and you'll actually get decent performance in the other languages, even though the model might have not seen samples, or generally very few samples, from those other languages. Liz Willets: Uh, the more- Christophe Fiessinger: That's great. Liz Willets: ... the more and more I listen, the more complex it gets. You know, you're using machine learning to, you know, look at different languages, uh, text versus images, ingesting things from different platforms. It's just mind-boggling (laughs) how much goes into this, um, and I really wanted to thank you, Christian, for taking the time to chat with us today. I don't know about you, Christophe, but I learned a lot. Christophe Fiessinger: Fascinating. Fascinating. Liz Willets: Awesome. Yes. Well, thank you so much, Christian. And, um, thank you to our listeners. Um, we have an exciting lineup of, um, podcast episodes coming your way. Uh, next time we'll be talking to Kathleen Carley, who's a professor in social behavior analysis at Carnegie Mellon University. So, um, definitely tune in.
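Christian's description of multi-language models, where a word and its translations land on the same internal representation so training in one language transfers to others, can be caricatured with a hand-built concept dictionary. Real multilingual models learn this alignment from data; everything below is an illustrative toy:

```python
# Toy stand-in for a multilingual model: words from several languages map to
# shared concept IDs, so a pattern "trained" only on English also fires on
# Spanish or French variants it has never seen.

CONCEPTS = {
    "kill": "HARM", "matar": "HARM", "tuer": "HARM",
    "you": "YOU", "te": "YOU",
    "i": "SELF", "yo": "SELF", "je": "SELF",
    "will": "FUTURE", "voy": "FUTURE", "vais": "FUTURE",
}

def concepts(message):
    return {CONCEPTS[w] for w in message.lower().split() if w in CONCEPTS}

TRAINED_PATTERN = {"HARM", "YOU"}   # learned from English examples only

def flags(message):
    return TRAINED_PATTERN <= concepts(message)

flags("i will kill you")   # True
flags("te voy a matar")    # True — Spanish, never in the training set
flags("nice work today")   # False
```

This is the transfer effect Christian describes: decent performance in a language with few or no training samples, because the shared representation does the bridging.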
Episode 6: Cracking down on communication risks
26-05-2021
Words matter. Intent matters.  And yes, most certainly, punctuation matters.  Don’t believe us? Just ask the person who spent the past five minutes eating a sleeve of cookies reflecting on which emotion “Sarah” was trying to convey when she ended her email with, “Thanks.” In this episode of Uncovering Hidden Risks, Raman Kalyan, Talhah Mir and new hosts Liz Willetts and Christophe Fiessinger come together to examine the awesomely complex and cutting-edge world of sentiment analysis and insider risks. From work comm to school chatter to social memes, our clever experts reveal how the manifestation of “risky” behavior can be detected.   0:00 Hello!: Meet your new Uncovering Hidden Risks hosts 2:00 Setting the story: The types and underlying risks of company communication 6:50 The trouble with identifying troublemakers: the link between code of conduct violations, sentiment analysis and risky behavior 10:00 Getting the full context: The importance of identifying questionable behavior across multiple platforms using language detection, pattern matching and AI 16:30 Illustrating your point: how memes and GIPHYs contribute to the conversation 19:30 Kids say the darndest things: the complexity of language choices within the education system 22:00 Words hurt: how toxic language erodes company culture 26:45 From their lips to our ears: customers stories about how communications have impacted culture, policy and perception Raman Kalyan: Hi everyone. My name is Raman Kalyan, I'm on the Microsoft 365 product marketing team, and I focus on insider risk management from Microsoft. I'm here today, joined by my colleagues, Talhah Mir, Liz Willetts, and Christophe Fiessinger. And we are excited to talk to you about hidden risks within your organization. Hello? We're back, man. Talhah Mir: Yeah, we're back, man.
It was super exciting, we got through a series of a, a couple of different podcasts, three great interviews, uh, spanned over multiple podcasts and just an amazing, amazing reaction to that, amazing conversations. I think we certainly learned a lot. Raman Kalyan: Mm-hmm (affirmative). I, I learned a lot. I mean, having Dawn Cappelli on the podcast was awesome, talked about different types of insider risks, and what I'm most excited about today, Talhah, is to have Liz and Christophe on the, on the show with us 'cause we're gonna talk about communication risk. Talhah Mir: Yeah, super exciting. It's a key piece for us to better understand sort of sentiment of a customer, but I think it's important to kind of understand that on its own, there's a lot of interesting risks that you can identify, uh, that are completely sort of outside of the purview of typical solutions that customers think about. So really excited about this conversation today. Raman Kalyan: Absolutely. Liz, Christophe, welcome. We'd love to take an opportunity to have you guys, uh, introduce yourselves. Liz Willetts: Awesome, yeah, thanks for having us. We're excited to kind of take the reins from you all and, and kick off our own, uh, version of our podcast, but yeah, I'm, I'm Liz Willetts. I am the product marketing manager on our compliance marketing team and work closely with y'all as well as Christophe on the PM side. Christophe Fiessinger: Awesome. Hello everyone, I'm, uh, Christophe Fiessinger and similar to Carla, I'm on the engineering team focusing on our insider risk, um, solution stack. Raman Kalyan: Cool. So there's a, there's a ton, breadth of communications out there. Liz, can you expand upon the different types of communications that organizations are using within their, uh, company to, to communicate? Liz Willetts: Yeah, definitely.
Um, and you know, kind of as we typically think about insider risks, you know, there's a perception around the fact that it's used, um, and related to things like stealing information or, um, you know, IP, sharing confidential information across the company, um, but in addition to some of those actions that they're taking, organizations really need to think about, you know, what might put the company, the brand, the reputation at risk. And so when you think about the communication platforms, um, you know, I think we're really looking to collaboration platforms, especially in this remote work environment- Raman Kalyan: Hmm. Liz Willetts: ... where employees, you know, have to have the tools to be enabled to do their best work at home. Um, so that's, you know, Teams, uh, Slack, Zoom, um, but then also, you know, just other forms of communication. Um, we're thinking about audio, video, um, those types of things to identify where there might be risks and, and how you can help an organization remediate what some of those risks might be. Raman Kalyan: Awesome. And Christophe, as we think about communications risk more broadly, what kind of threats have you started seeing, um, organizations being more concerned about? Christophe Fiessinger: Yeah, so exactly to what you just mentioned and, and Liz, so again, there's two, two main use cases; fulfilling regulatory compliance, and the regulators definitely have been putting more scrutiny and, and fining, uh, organizations large and small that don't abide by those, uh, laws, whether it's in the US, whether it's in Europe and Canada. So there's definitely an increase in enforcement, so definitely, you know, a common use case that we're seeing is, with the, uh, recent events and the pandemic, banks wanna enable their workforce to work remotely, and one of the tools that they need is the ability to do meetings and voice and, and chat.
As soon as you introduce a n- a new tool like Teams for productivity, you need to, uh, look at, uh, patterns that would, um... that fall under those regulations, things like insider trading and collusions.

So definitely, with the change in the workforce and, and being remote having accelerated adoption of Teams, certainly people want a, uh, a way to look at those behaviors and, and avoid getting fined. And then the parallel work stream, which is also what, uh, Liz was mentioning, is, you know, things have, um, changed significantly and that has naturally put some stress. Uh, it could be personal stress, you know, my kids are at home screaming or the dog or whatever, um, maybe I don't have a, uh, a nice room like here today where I can have a podcast, you know, maybe I'm, maybe I'm sitting in the kitchen and my young kids don't understand what it means to hush. So that puts personal stress on me.

Maybe I'm stressed because I don't know if I'm gonna have a, a job tomorrow, maybe I've already been [inaudible 00:05:15]. That potentially could trigger me to, to forget that the tool I'm using to get work done and to communicate with my peers, there are some rules of engagement, if you like, and there's things that are not acceptable per employee, uh, code of conduct. And again, all this stress and the fact that maybe I'm lying on my couch make... give me the false sense that it's casual, but now I'm having a meeting with Liz and Raman, and there's certain language that's just not acceptable at our organization.

So I think that's, that's a new trend that we're seeing that's also backed up by, by regulation in certain countries, um, to make sure there's no abuse over language. And the most common use case is, uh, in the world of education, where the, the district, the school, the principal are responsible, uh, if bullying or, or misbehavior is reported, and to really help mitigate it so it doesn't escalate in- into something bad.
So, uh, those are examples of what we're seeing this, eh, um- Talhah Mir: [inaudible 00:06:19], Christophe, um, you know, you and I have talked a lot about this sort of interplay and, and looking at, um, these communication risks, it's sentiment at the end of the day. And I know when we talk to our customers, it's, it's a very common ask around being able to understand, uh, these leading indicators. Now, Raman and I talk about insider risk management as a game of indicators, and, um, the, the more leading the indicator, the more impact it's gonna have on being able to help you proactively identify issues. So talk to me a little bit more about how some of these code of conduct violations are actually sentiment that can help you identify somebody who's a potential insider risk in the organization. Christophe Fiessinger: Yeah, so the, the high level is, uh, if we take a concrete example, let's say, you know, I say some, some... I use some profanities certainly with peers, and, and... or sexual content, but it's just not acceptable. And, again, assume that Christophe is stressed, just a bad day, kids are screaming, whatever, I'm just stressed in my personal life and I've crossed that line. Now, the question is, was it accidental, Christophe suddenly reached the tipping point and started using foul language, or no, Christophe, uh, did use foul language today, but he's been using foul language against Liz for the past 30 days. And not just over Teams or emails, over whatever, the... all the different communication channels that my employer has given.

So I think there's two things: is it accidental, and I think you, you guys talked about that, or is it [inaudible 00:08:02]? And most of the time, you know, we're humans and we've got good intent, a lot of the time it is accidental.
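Christophe's distinction between an accidental one-off and a 30-day pattern against the same person, across every channel the employer provides, maps naturally onto a simple aggregation over flagged events. The sketch below is a minimal illustration of that idea; the field names, window, and threshold are all illustrative assumptions, not a real product schema.

```python
# Sketch: decide whether a flagged message is a one-off or part of a repeated
# pattern against the same target, aggregated across channels.
from collections import Counter

WINDOW_DAYS = 30       # look-back window, as in the "past 30 days" example
PATTERN_THRESHOLD = 3  # this many flags against one target = a pattern

def classify(flags, sender, target, today):
    """Return ('pattern' | 'one-off', per-channel counts) for sender->target."""
    recent = [
        f for f in flags
        if f["sender"] == sender
        and f["target"] == target
        and today - f["day"] <= WINDOW_DAYS
    ]
    channels = Counter(f["channel"] for f in recent)
    verdict = "pattern" if len(recent) >= PATTERN_THRESHOLD else "one-off"
    return verdict, dict(channels)

flags = [
    {"sender": "christophe", "target": "liz", "channel": "teams", "day": 2},
    {"sender": "christophe", "target": "liz", "channel": "email", "day": 10},
    {"sender": "christophe", "target": "liz", "channel": "teams", "day": 28},
    {"sender": "raman", "target": "liz", "channel": "teams", "day": 28},
]

print(classify(flags, "christophe", "liz", today=30))
# ('pattern', {'teams': 2, 'email': 1})
print(classify(flags, "raman", "liz", today=30))
# ('one-off', {'teams': 1})
```

The per-channel counts matter for exactly the reason Christophe gives later: the first flag might be on Teams while the worse behavior happens over SMS, so the aggregation has to span channels.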
Uh, so it's just a matter of very quickly, hopefully, uh, seeing that behavior and notifying the [inaudible 00:08:14], whatever is your, your process of telling that person's manager that, "Hey, you stepped out of bounds, uh, first warning, you know, maybe retake the employee training, you reread the code of conduct, and all good then and, and move forward."

To your questions, so that's the scenario. What's hard is, because of the richness of the language, and we're humans and language keeps evolving, is just looking for specific profanities; there's some usual suspects that have no room in the workplace, but there's more patterns, like abuse and harassment, where I might not even use profanity, but the way I, I, I, um, criticize Liz or Raman clearly is way beyond constructive criticism. Talhah Mir: Mm-hmm (affirmative). Christophe Fiessinger: And then, so how do you detect that? Because it might be u- I might be using perfectly, uh, okay dictionary words, uh, but when you read it as a whole, the sentence is horrendous or is just not acceptable. Um, so that's... To the... your question, like to really get to the crux, which is the a- that intent, that sentiment, you need to certainly look at the context and the intent. You need to see, is it a one-off with Christophe against that person, or no, it has been a pattern of repeated, uh, uh, communication risk against the individual. And so that's where, um, the problem is a fascinating problem and ever evolving, because human language is this dynamic dimension that keeps evolving every day. And as you can see, I'm sure you have kids, with social media, whatever's the new buzzword, that certainly is part of the common language and guess what, we need to adapt to detect those new patterns. Raman Kalyan: Yeah. That's, that's fascinating, man. I think a couple of questions, one for you, and, and one for Liz. You mentioned a couple of things. One is that there's this accidental or inadvertent type of, "Hey, I...
Maybe I'm not meaning what, what you think I'm meaning." So I'd love to kind of tease that out in terms of like, how does, how do we deal with that in terms of like a privacy... from a privacy perspective, right? So, you know, um, don't... you don't assume that the individual is actually doing something wrong, you wanna investigate it further. And then... That's a question for Liz and then a question for you would be really around, okay, you talked about context, how has the technology evolved to be able to really sort of understand that context? Because I know there's a lot of tools out there that promise, you know, offensive language detection or like, you know, the sentiment analysis, but they really focus in on pattern matching. And I wanna try to contrast, you know, how are we approaching that from a, from a, uh, machine learning perspective or AI perspective. So maybe Liz, you can go first on the privacy side. Liz Willetts: Yeah, definitely. I think that's a great question. Um, you know, we at Microsoft always keep the privacy of our customers top of mind and so wanna ensure we're, um, you know, delivering solutions to our customers that really have those capabilities built in. 
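The privacy capabilities Liz describes — de-identifying who said what, and gating who can see it — can be sketched in a few lines. This is a minimal illustration of the general pattern, not Microsoft's implementation; the role names and the salting scheme are assumptions for the demo.

```python
# Sketch of two privacy controls: pseudonymizing the author of a flagged
# message, and gating re-identification behind a reviewer role.
import hashlib

SALT = "per-tenant-secret"  # in practice, a securely stored per-tenant secret

def pseudonym(user):
    """Stable, de-identified label for a user (same user -> same alias)."""
    digest = hashlib.sha256((SALT + user).encode()).hexdigest()[:8]
    return f"User-{digest}"

def view_flag(flag, role):
    """Investigators may re-identify; everyone else sees only the pseudonym."""
    shown = dict(flag)
    if role != "investigator":
        shown["author"] = pseudonym(flag["author"])
    return shown

flag = {"author": "liz@contoso.com", "policy": "offensive-language"}

analyst_view = view_flag(flag, "analyst")
investigator_view = view_flag(flag, "investigator")
print(analyst_view["author"].startswith("User-"))  # True: de-identified
print(investigator_view["author"])                 # liz@contoso.com
```

The stable alias is the useful property here: an analyst can still see that five flags trace back to the same (unnamed) person, which is exactly the one-off-versus-pattern question, without ever learning who that person is.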
So, you know, when we think about, um, you know, communications, we think about, um, you know, making sure that all of the, um, communications that organizations are seeing in their solution are pseudonymized, um, meaning that they are de-identified, and so, um, when you think too about, you know, the fact that this is on by default, um, you know, customers are opted into, um, then you have to think about those people who are actually reviewing, um, and scoping the policies out to their workers, their analysts, their investigators, and so we definitely also keep, um, role-based access control top of mind so that only the right people, um, within an organization are able to see, um, you know, certain policies, f- flagged violations, um, and then, you know, we, we have audit reports where we can ensure that those investigators and analysts aren't misusing the data that they have at hand.

But then also thinking about, you know, one of the, the more important differentiators is that insiders are actually in a position of trust. And so, you know, they're making use of privileges that have been granted to them to really perform their role, and if they are abusing them, um, you know, we definitely wanna make sure we're catching that while at the same time, ensuring that those privacy principles are in place. Raman Kalyan: Awesome, that's great. Uh, really, that's, uh, great to hear. And then Christophe, as we talk about the evolution of the technology, you know, and talk to me a little bit more about how we've evolved the technology to kind of talk about what you said, which was this context, this sentiment, like, how do we get to that? Christophe Fiessinger: Yeah. Actually, I don't wanna talk about technology. I just wanna talk about the problem we're trying to solve. Now that... Uh, leaving that aside, so yeah, it's all about context, because it's, it's already a challenge and I think we're...
one of the future podcasts will go into that. To detect negative sentiment, uh, is, is already a challenge in itself, but the question is, then you put that into context. Was it just the first time, Christophe just having a bad day, he crossed the line, he needs to be reminded that this is, uh, not acceptable and problem solved, and he never does it again? Or no, he crossed the line and guess what? Last Friday he put in his resignation and it looks like he started downloading a lot of documents that were marked as confidential. So suddenly you're getting language risks, you know, a code of conduct violation, but you add that with the fact that he's gonna leave and he's also been... downloaded things that could potentially signify, um, theft.

So certainly getting that whole context of that individual, at the end of the day, what, what all that context gives you is then your remediation action can be very specific versus just saying, "Christophe, stop using foul language." You know, suddenly we need to maybe pull in our compliance team or legal team or a security team or Christophe's manager versus just slapping him on the wrist for a foul language. So context is very... uh, is hugely important to help you deal with the proper remediation and the proper process based on that initial red flag, which was foul language for instance. And so obviously that's, that's the, you know, the ideal, the, the uber solution that, um, a lot of us are trying to solve, because the more context you have, then [inaudible 00:14:59], position to really find those needles in the haystack and then take the appropriate action versus dismissing foul language when this person is on the road to actually burn down the house. Raman Kalyan: Yeah, that's, that's actually a really important point.
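Christophe's point that context should drive the remediation — a reminder for a one-off, an escalation when the language flag co-occurs with a resignation and confidential downloads — can be sketched as a small routing function. The indicator names and routing rules below are illustrative assumptions, not product behavior.

```python
# Sketch: the same language flag routes very differently depending on the
# co-occurring risk signals, which is the "context" Christophe describes.
def remediation(indicators):
    """Map a set of risk indicators to a remediation route."""
    if "offensive_language" not in indicators:
        return "no action"
    # Foul language + departure + data exfiltration signals -> escalate.
    if {"resignation_filed", "confidential_downloads"} <= indicators:
        return "escalate: security + legal + manager"
    if "repeated_pattern" in indicators:
        return "notify manager + HR review"
    # Likely an accidental one-off: remind, retrain, move on.
    return "policy reminder + retake training"

print(remediation({"offensive_language"}))
# policy reminder + retake training
print(remediation({"offensive_language", "resignation_filed",
                   "confidential_downloads"}))
# escalate: security + legal + manager
```

The shape matters more than the specific rules: the initial red flag only selects which signals to look at; the surrounding context selects the response.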
I think the whole context, it's not even just the context of the communication, it's context of the sequence of events surrounding that communication and what might've happened before or might be happening after. Christophe Fiessinger: Yeah. And just to add to that, [inaudible 00:15:25], one thing I wanna be clear about to the audience, uh, we're fully aware at, at Microsoft that it's not just the way you communicate in, in 365 such as Yammer or on email and, and Teams, but we also potentially help you... Like I said, if you give a, a, a work phone to your employees and they have SMS or they have WhatsApp, or they use [inaudible 00:15:49], technology or professional apps like Instant Bloomberg- Raman Kalyan: Mm-hmm (affirmative). Christophe Fiessinger: ... you gotta be holistic, because again, you might see one thing in one channel, but it's actually probably hiding maybe the forest of abuse, or maybe my initial thing to Liz was on Teams, but the really bad behavior happened over SMS. So giving you the ability to look holistically and make sure you reduce the blind spots as much as possible is also something that's, uh, dear to our heart. Raman Kalyan: Yeah, so having that sort of one pane of glass, you don't have to have multiple solutions and platforms that- Christophe Fiessinger: Yeah. Raman Kalyan: ... you're trying to manage and manage workflows, manage integration, and signals, you can actually take one pane of glass and look across multiple communications and leverage the technology to identify the risks that are most important to you, right? Christophe Fiessinger: Yes. Talhah Mir: So, um, Christophe, you... you and I talk like multiple times a day and, and a lot of it is words, a lot of it is passionate words, but a lot of it is memes and GIPHYs that we send back and forth, so how do you think about in the context of, um, the communications and words and whatnot, how do you think about, uh, memes and GIPHYs?
'Cause some could be funny, but some could be crossing the line, right? Christophe Fiessinger: No, you're, you're spot on and, and it's definitely... Back to what Liz was mentioning, we know that communication is not just written anymore, right. And, and, you know, some of us have been in the workforce longer than others but... and some of us have kids and we've seen definitely the shift- Talhah Mir: Yeah. Christophe Fiessinger: ... that it's no longer just an email or a one page memo, uh, now we have a plethora of channels on how we can do work, but like you say, a lot of the form of how we communicate is not written. And so for the audience, what, uh, Talhah is referring to, it could be an image, and very commonly, a lot of people, um, will annotate on an image, will literally put text on an image, and that text could be a risk, could be very nasty, could be inappropriate, could be containing customer information, could be containing confidential information. Um, so how do we detect that if Christophe is just sending images in Teams all day or over email, where there's actually nothing written?

Um, so we're actually working on, on, on this problem and we have a number of solutions, because there's basically two patterns. First of all, there's the obvious image, you know, maybe is, is racist or adult or, or gory in nature, and that again has no place in the organization. So just recognizing, uh, the content of that image. But like we say, in addition to that, we're also working on doing, uh, what we call, uh, in technical jargon, optical character recognition. So extracting whatever the text is, whether it's a written sketch or, or typed on top of the image, and then once you get that extracted text, run that through our detections, we say, "Does it match a code of conduct violation? Does it match potential regulatory, uh, compliance violations?" And so forth.
So yes, we're absolutely looking at other forms of communication that are included in the tools we use day in, day out, uh, such as images. And you're probably thinking, how about video? And yes, this is also, uh, something we're, we're, um, working on in the future. The goal is to reduce as much as possible those blind spots. And that's what effectively we're doing, you know... If the end user thinks they can outsmart the system by just putting, whatever, some social security number from their favorite customer, or a bank account, or swear words in an image, uh... and not in written text, then we wanna mitigate that to, again, close all those blind spots. Liz Willetts: Yeah, and I would add there to that too, it's, it's not just... English isn't the only spoken language. So thinking about globalizing, um, some of that as well 'cause I know, um, we were talking to a customer in the EDU space and they were saying, "Hey, you know, students are trying to (laughing), bypass the system. They are writing... They are cyberbullying and, and writing harassing messages in Japanese, um, translating that through, you know, a translate app and sending that to their peers." And, um, you know, being able to detect things like that, not just in English, um, is certainly something that's also come, um, to the forefront for us. Christophe Fiessinger: Yeah, that's... Thi- this is a true story that Liz is telling and it was interesting for us. And that's when you learn so much from kids, whose, uh, creativity to abuse the system or be colorful is amazing and endless.
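The image pipeline Christophe outlines — OCR the image, then feed the extracted text through the same detections used for ordinary messages — can be sketched end to end. A real system would call an actual OCR engine; the `extract_text` stub below is a labeled stand-in (its canned return value just makes the sketch runnable), and the blocklist is an illustrative assumption.

```python
# Sketch: OCR an image, then apply the ordinary text detections to whatever
# text was extracted, so text pasted onto an image gets no free pass.
BLOCKLIST = {"ssn", "account-number", "confidential"}

def extract_text(image_bytes):
    """Stand-in for OCR. A real system would run an OCR engine here."""
    return "customer ssn 123-45-6789"  # pretend this was read off the image

def scan_image(image_bytes):
    """OCR the image, then run the same detections used for plain messages."""
    text = extract_text(image_bytes).lower()
    hits = sorted(term for term in BLOCKLIST if term in text)
    return {"text": text, "violations": hits}

result = scan_image(b"\x89PNG...")  # bytes are a placeholder, not a real image
print(result["violations"])  # ['ssn']
```

The design point is that OCR is just a front end: once text exists, the downstream code-of-conduct and regulatory detections don't care whether it arrived as a chat message or stamped on a meme.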
But yeah, this is a true story of a school district in the Midwest, and, and we're definitely, to Liz's point, being Microsoft, we know we, we wanna cater to, uh, customers worldwide, and we already had strong demand in Asia, which has laws to protect against harassment, so there's Japan and others, and we were wanting feedback from some customers, and in one of those customer interactions, we asked the school district, "Hey, we're looking at introducing, um, abuse detection, uh, in those languages, would you be interested? Including Asian languages."

And the customer, to our surprise, said, "Yeah, I'm very interested in that." It's like, how come a customer in the Midwest in the US is interested in, in Japanese and Korean and simplified Chinese? And to Liz's point, some students might not even be native in those languages but they can definitely use a search engine. And instead of saying what I think about Talhah in plain English, I'll translate it and put the translated version with the, with the Katakana or Kanji, which are the alphabets in Japan, and think I can get away with it because no one else besides Talhah will figure out that I'm, I'm being very nasty, and my school administrator is definitely not fluent in that language and will think it's harmless. So yeah- Talhah Mir: Now we gotta, gotta go back and search our chat history, man. Now, now, Japanese characters are making sense. I gotta go (laughing), translate them. Liz Willetts: (laughs). Christophe Fiessinger: I mean, it wasn't just in French. Talhah Mir: (laughs). Raman Kalyan: Now I have to look at my kid's chat history and be like, "What are you... What is that?" Christophe Fiessinger: Yeah, anytime you find some language you don't speak, question yourself. Uh, it might not be love words after all. Talhah Mir: (laughs). Liz Willetts: (laughs). Christophe Fiessinger: I'm just saying.
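One cheap first-pass signal for the evasion trick in the school-district story is simply noticing which writing systems appear in a message: a run of Japanese script in an otherwise English classroom channel deserves a second look (translation or review) even before any language model runs. The sketch below is a toy; the code-point ranges are minimal illustrative buckets, far from a complete script classifier.

```python
# Toy sketch: flag messages containing scripts outside a channel's expected
# set, e.g. Japanese kana in an English-language classroom channel.
EXPECTED_SCRIPTS = {"LATIN", "COMMON"}

def script_of(ch):
    # Minimal script buckets for the demo (real systems would use full
    # Unicode script properties): kana / CJK / hangul vs Latin vs other.
    cp = ord(ch)
    if 0x3040 <= cp <= 0x30FF:
        return "KANA"
    if 0x4E00 <= cp <= 0x9FFF:
        return "CJK"
    if 0xAC00 <= cp <= 0xD7AF:
        return "HANGUL"
    if ch.isascii():
        return "LATIN" if ch.isalpha() else "COMMON"
    return "OTHER"

def unexpected_scripts(text):
    """Scripts present in the text that fall outside the expected set."""
    return {script_of(c) for c in text if c.strip()} - EXPECTED_SCRIPTS

print(unexpected_scripts("see you at lunch"))  # set(): nothing unusual
print(unexpected_scripts("hey ばか"))           # {'KANA'}: route for review
```

This doesn't judge the content at all; it only routes the message to a detector (or a human) that can, which is the point of the story: the administrator's blind spot was the script, not the sentiment.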
Raman Kalyan: Well, as you know, one of the things that we've talked about is, uh, the importance of supporting company culture, right? And how toxic communications, um, can erode that, you know, culture and the trust in your organization. I'd love to talk a little bit more about that and, you know, get your perspective on that and also talk about how, you know, some of the remediation actions we have within, you know, this solution can help organizations really address, uh, or support a positive company culture. Liz Willetts: Yeah, definitely. I think there are a lot of cultural implications, um, for, uh, a corporation or, um, an organization and, and definitely having the ability to support their, um, company culture, but also to support their employees in times when, you know, they might be going through an external stress factor, you know, COVID being a great example. Um, you know, an organization that might be looking at, um, you know, their company culture impact in this day and age, they want their employees to have the tools and, and support to do their best work, whether that's webcams, computers, conference calls, um, and you know, now in the context of remote work, you know, you're in the privacy of your own home, um, and there are definitely distractions all around. And at the same time, you have to remember, "Hey, this is a work environment." Raman Kalyan: Mm-hmm (affirmative). Liz Willetts: Um, so there are definitely some things that you should, and shouldn't say in the context of work that might be okay in your personal life, um, but you know, in the workplace, there still is a code of conduct charter, you've signed it, um, you know, you take training, hopefully on the first day of work, um, and so in this context, how do you remind people, um, you know, that there is this change for remote work but the same standards still apply, um, you know, whether that's fostering diversity and inclusion within your company. 
Um, and, and you certainly wanna make sure that you're investigating and remediating something, um, that your employees know is, um, wrong, you know, something like sexual harassment, um, you know, lots of, kind of potential infractions, um, and to kind of...

One, from a brand reputation perspective, you know, this person might go off and write some social tweets or whatnot, um, and have a pretty big and bad impact for your organization. Um, so it's kind of one thing to have a code of conduct, a charter, um, but another is to really live by it and, and show your people that, um, you know, it's, it's really something that you're invested in. Um, and so I think also it's not all bad (laughs). Um, so, you know, we're under stress, job security concerns, scared of, um, you know, a loved one or a parent getting sick, and so maybe you're not intentionally trying to hurt your peers, um, but just, you know, perhaps used an inappropriate word or expressed your frustrations at work.

Um, and so I think that that's kind of where you can also come in and provide support. You know, maybe it's a little slap on the wrist, but just remind you what your company charter is, um, maybe, you know, encourage you to retake some of the trainings, um, and really just kind of making sure that all around, um, you know, employee wellbeing is, uh, kind of top of mind for the company. Talhah Mir: Yeah, and on that note, Liz, I know you talked to me about the fact that, you know, technology like this, solutions like these are not just about finding the bad, it's about, you know, uh, an organization using it as an opportunity to show a commitment towards a positive employee culture and saying, "We're gonna put money behind what we say is important to us, which is a positive company culture."
But some of the stories that I've heard from you were just amazing, where companies are looking to do, whether it's education or government or private, uh, sector, just being able to back that up and say, "We actually care, we're gonna look out for these things." And to your point, it's not just, "When we find something bad that we're gonna take some, you know, dramatic action." It's like, when we find something, it's an opportunity for us to educate and kind of uplift the culture. So I think that's a, that's a really important one for you to call out there. Liz Willetts: Exactly, yeah. And I think, um, you know, especially as you think, living and breathing your corporate culture and, and your principles, um, it's important 'cause, you know, other employees are expecting you to take action on, on certain things and, um, you kind of have to uphold your standards as well to, to match their expectations. Talhah Mir: Hmm. So what are some stories that you guys have heard or come across from customers? Something, uh... And then I don't know, I don't know which one of those you can actually talk about here, I don't... You guys have shared a lot of those offline and stuff, and I talked about quite a few, but what are some, some great examples of positive impact that you've seen that you're... that you guys can share? Christophe Fiessinger: Uh, I'll share one. I'm not gonna mention the customer, uh, due to sensitivity, but to your point on... and what Liz was saying that, you know, it doesn't take... You just look at the headlines in the newspaper and you can see there's potential regions, potential, uh, industries that, that had bad press, and, uh, probably for good reasons, because of, of not doing anything about those, um, abusive behaviors. Uh, so I, I've been involved with one customer, um, I'll just say North America, but it was exactly to get ahead of that.
They, they haven't been in the headlines, the industry has been in the headlines, and it's just a mandate from their leadership team to say, to your point, "We wanna be proactive, so we want a virtuous cy- uh, cycle of making sure we live by, to Li- to Liz's point, live by our code of conduct." So it's more like, "Le- I wanna get ahead of the game because I wanna show all my employees I've got their back and this is a healthy environment, please don't go to my competitor. Like we've got your back and let me prove it to you that we're, um, fostering that healthy environment."

The example, that example, I mentioned earlier, it's, it's not a company, but it's the same theme, where in Japan in April of 2020, a new law went into effect around, uh, what they call power harassment, and so the question is great, there's this new law that if your manager or your manager's manager is, is abusing you, uh, it's illegal, then the next question comes, uh, what are you gonna do about it, uh, as an employer? So in Japan, they, you know, because it, it takes time to put processes and, and solutions to look for that, initially it starts with the large corporations. I think it's like a three-year, four-year phase-in by the time it goes to, uh, small and me- medium size. Liz Willetts: Yeah. And I think one of my favorite, um, customer stories was one that really, in my mind, helped enable their creativity. Um, you know, we were talking to a sports league kind of right at the beginning of the pandemic. You know, they knew that it was gonna be a washout season, all games, everything was being canceled basically, except for golf at that point in time, um, and there was obviously a worry around, you know, contact sports and, and spreading of the virus. And so, um, we had this one sports league come to us and say, "Hey, you know, we've got these season ticket holders, they're huge fans. We feel like we're letting them down.
You know, they don't have a season to, to kind of, um, rally around this year. And so we're thinking about, um, you know, how can we get them to interact with players, coaches, um, you know, coaching staff, et cetera?" Um, and so they wanted to enable that sort of scenario but at the same time were concerned around, you know, "We need to moderate content to ensure there's no abusive language, either between fans, between players, um, staff, et cetera."

And so I think that was an interesting use case where, hey, yeah, you wanna detect certain things in communications and this might be completely out of your wheelhouse. Um, but being able to feel comfortable coming to a company like Microsoft and say, "You know, what can we do here?" Um, and so I thought that was, uh, an enlightening, uh, case for, um, us as well. Talhah Mir: This is terribly exciting stuff, man. I know the four of us have talked about this quite a bit, but to me, sentiment analysis is the holy grail of insider risk. Being in this space for a couple of years now, um, the sooner you detect these things, the more impactful you will be, and it's all about the behavior. And one of the, the first areas, the first sort of physical manifestation of a behavior is in the communication of an individual. So that's why I sent them an [inaudible 00:30:59], to such an amazing, amazing people. It's also incredibly difficult if you guys don't. So you guys are on the tip end sort of [inaudible 00:31:05], sphere as it comes to this stuff, but we're super excited about some of the opportunities that you guys are driving towards and how we can leverage that to kind of broaden our detection when it comes to identifying and managing insider risk [inaudible 00:31:18]. Thank you guys, this is very exciting stuff, looking forward to the rest of the podcast as well. Raman Kalyan: Yeah. And I was just gonna say, thank you so much for coming onto the show.
We really appreciate having you here, and, Liz and Christophe, we can't wait to hear the different podcasts you have coming up, uh, like Talhah said. Exciting space, definitely, uh, a space where there's a lot of innovation happening, and we're excited to see what you have coming up. So thank you again. Liz Willets: Awesome. Yeah, thanks. Thanks for having us on, and, um, we're excited to kind of... We've had the torch passed from y'all and have a great lineup of speakers, um, over the next couple of weeks. Um, Talhah, to your point, sentiment analysis is definitely an area where we're gonna deep dive with, um, Kathleen Carley, a professor at CMU. Talhah Mir: Thanks. Liz Willets: Um, we're gonna go deep on machine learning with one of our data scientists, Christian Rodnick, um, so definitely have some exciting, uh, conversations to come. Talhah Mir: Awesome, awesome. Raman Kalyan: And so thank you everyone for listening. Uh, this is another episode of the Uncovering Hidden Risks podcast. We've had, uh, some awesome guests on the, on the show today. Again, uh, Liz Willets and Christophe Fiessinger, and Talhah and I are excited to have you listen, uh, to their podcast as well. And if you haven't heard our, uh, previous podcasts, you can find them on your favorite, uh, YouTube channel... or favorite podcast channel, wherever you wanna see it.
Episode 5: Practitioner's guide to effectively managing insider risks
21-09-2020
Episode 5: Practitioner's guide to effectively managing insider risks
In this podcast we explore steps to take to set up and run an insider risk management program. We talk about specific organizations to collaborate with, and top risks to address first. We hear directly from an expert with three decades of experience setting up impactful insider risk management programs in government and the private sector. Episode Transcript: Introduction: Welcome to Uncovering Hidden Risks. Raman Kalyan: Hi, I'm Raman Kalyan, I'm with the Microsoft 365 Product Marketing Team. Talhah Mir: And I'm Talhah Mir, Principal Program Manager on the Security Compliance Team. Raman: Talhah, episode five, more time with Dawn Cappelli, CISO of Rockwell Automation. Today, we're gonna talk to her about, you know, how to set up an effective insider risk management program in your organization. Talhah: That's right. Getting a holistic view of what it takes to actually properly identify and manage that risk, and do it in a way so that it's aligned with your corporate culture and your corporate privacy requirements and legal requirements. Really looking forward to this, Raman. Let's just jump right into it. Talhah: Raman and I talk to a lot of customers now, and it's humbling to see how front and center insider risk, insider threat management, has become. But at the same time, customers are still asking, "How do I get started?" So what do you tell those customers, those peers of yours in the industry today? With the kind of landscape and the kind of technologies and processes and understanding we have about the space, what kind of guidance would you give them in terms of how to get started building out an effective program? Dawn: So first of all you need to get HR on board. I mean, that's essential. We have insider risk training that is specifically for HR. They have to take it every single year. So we have our security awareness training that every employee in the company has to take every year; HR in addition has to take specific insider risk training.
So in that way we know that globally we're covered. So that's where I started, was by training HR, and that way the serious behavioral issues come to us. I mean, IP theft is easier to detect, but sabotage is a serious issue, and it does happen. Dawn: I'm not going to say it happens in every company, but when you read about an insider cyber sabotage case, it's really scary, because this is where you have your very technical users who are very upset about something, they are angry with the company, and they have what the psychologists call personal predispositions that make them prone to actually take action. Because most people, no matter how angry you are, most people are not going to actually try to cause harm, it's just not in our human nature. Dawn: But like I said, I worked with psychologists from day one, and they said, "The people that commit sabotage, they have these personal predispositions. They don't get along with people well, they feel like they're above the rules, they don't take criticism well, you kind of feel like you have to walk on eggshells around them." And so I think a good place to start is by educating HR, so that if they see someone who has that personality and is very angry, very upset, and their behaviors are bad enough that someone came to HR to report it, HR needs to contact, even if you don't have an insider risk team, your IT security team and get legal involved, because you could have a serious issue on your hands. And so I think educating HR is a good place to start. Dawn: Of course, technical controls are a good place to start too. Think about how you can prevent insider threats. That's the best thing to do: lock things down so that, first of all, people can only access what they need to, and secondly, they can only move information where they need to be able to move it. So really think about those proactive technical controls. Dawn: And then third, take that look back, like we talked about, Talhah, take that look back.
Pick out just some key people, go to your key business segments and say, "Hey, who's left in the past..." I mean, as long as your logs go back, if they go back six months, you can go back six months. Just say, "Give me the name of someone who's left who had access to the crown jewels," and just take a look in all those logs and see what you see. And you might be surprised. Talhah: Yeah, and on this look back piece, Dawn, we're actually hearing that from our customers quite a bit, in that the way they kind of frame it is, "Why don't you give me an idea, with technology? Can you look through some of the logs I already have in the system, parse through that, to give me an insider risk profile, if you will, of what's happening, what looks like potential shenanigans in the environment, so I can get a better sense of where I need to focus and what kind of a case I need to make to my executive sponsor so I can get started?" So that's definitely something we're thinking about quite deeply, and hearing consistently from our customers as well. Dawn: Yeah, because the interesting thing we found in CERT, we expected that we would find very sophisticated ways of exfiltrating information, but what we found was, these are insiders, they don't have to do anything fancy. If they can use a USB drive, they're going to use a USB drive, especially if you don't have an insider risk program, and so they think they can get away with it. If it's a small amount of information, they'll email it to their personal email account. Or if you're an Office 365 user, they'll just go and download the information onto a personal computer if they can, or move it to a cloud site. Dawn: So we found there weren't a whole lot of really sophisticated theft of IP cases, and maybe that's because those people weren't caught.
But if you can get to the point where you have a mature insider risk program that's analytics based, then you have time to look at the more sophisticated ways of exfiltrating information. Raman: I had a conversation with a customer about a week and a half ago. And you talked about people who are sometimes doing things maliciously, they are also doing other things. Have you looked at things like sentiment analysis? So this customer was talking to me about, hey, communications, like people in communications actually saying things that they shouldn't be saying, maybe harassing people, and then that leading to other types of behaviors, to your point around sabotage. Would love to hear if that's something that you have either implemented yourself, or that you've heard about as part of the broader OSIT group, around the communications piece, the harassment, and all that kind of stuff. Dawn: Yeah, we did look at that when I was in CERT. Back then we found that the technologies just weren't mature enough, so we did not have any luck with it back then. And I don't know what Dan Costa said to you as far as what they're doing now, but in my experience, I have not found anything that really was effective. Dawn: I tried a little experiment at Rockwell, with legal approval, and just kind of looked for words like kill and die, you know, those kinds of words, and it came back like... IT uses those words all the time. Like, "The system died, I killed the process." It was like, oh, this just isn't working at all. And the other thing that made it really hard, just with sentiment analysis, people were very casual in their communications. So it was the informal communications that made it really difficult to really tell the sentiment. So yeah, I'd love to hear if the tools mature to that point, that would be great.
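Dawn's keyword experiment can be reproduced in a few lines. This is a hypothetical sketch (the watch list and messages are invented, not Rockwell's actual setup) showing exactly the failure mode she describes: benign IT jargon produces the same matches as a genuine threat.

```python
# A minimal illustration of why bare keyword matching fails for threat
# detection in communications: IT chatter legitimately uses words like
# "kill" and "die". All names and messages here are illustrative.

KEYWORDS = {"kill", "die", "killed", "died"}

def keyword_hits(message: str) -> set[str]:
    """Return which watch-list words appear in a message (naive tokenization)."""
    tokens = {t.strip(".,!?").lower() for t in message.split()}
    return tokens & KEYWORDS

messages = [
    "The build server died again, I killed the stuck process",  # benign IT talk
    "Restarting after the service died overnight",              # benign IT talk
    "I'm going to kill you if you ship that",                   # needs review
]

flagged = [m for m in messages if keyword_hits(m)]
# Every one of the three messages is flagged: the keyword list cannot tell
# benign jargon from a genuine threat, which is the overload Dawn ran into.
```

Every message trips the filter, so an analyst drowns in false positives, which is why the context-aware approach discussed next matters.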
Raman: So one of the things that we've been looking at is using Azure Cognitive Services to really start to think about natural language, to distinguish between "That product is killer" and "I'm going to kill you." To your point, initially it would be looking at keywords, and then you get overloaded with a ton of different alerts. Now, if you can distinguish the context in which the word kill was used, then you can start to highlight, again with a risk score type of thing, that this could be riskier communication than this other communication, and allow you to really prioritize and filter through it. Dawn: Hmm, interesting. Do you know if anyone has gone to the European works councils about that kind of technology? Raman: So we have been working with the works councils; we have customers in Europe that are using some of our solutions to start to look at communications, and they have been working with the various works councils to think about, for example, pseudonymization, which is key here, right? You want to anonymize the user before you go down the path of really investigating them. If you're just highlighting that this could be a possible violation, you want to do that in a way that doesn't really invite bias or discrimination. Raman: And so if you can do that upfront, then that would allow you to say, "Hey, okay, this might be something that's a challenge." And one of the things that we've seen recently, especially with COVID and all the different stressors that people are under, is that some customers are actually using machine learning classifiers that we have for threats, and really looking, not at me trying to threaten somebody else, but me maybe threatening myself. So suicide-type things, people under a lot of pressure, and we've seen a lot of organizations start to take that route.
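The pseudonymization step Raman mentions can be sketched with a keyed hash: alerts are triaged under a stable pseudonym, and the real identity is only resolved if an investigation is formally approved. The key handling and alert shape below are assumptions for illustration, not a description of any Microsoft product's internals.

```python
import hmac
import hashlib

# Placeholder secret -- in practice this would live in a vault and be
# access-controlled so analysts cannot reverse pseudonyms on their own.
SECRET_KEY = b"rotate-and-store-this-in-a-vault"

def pseudonymize(user_id: str) -> str:
    """Derive a stable, non-reversible token for a user identifier."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:12]

def raise_alert(user_id: str, signal: str) -> dict:
    """Emit an alert that carries only the pseudonym, never the raw identity."""
    return {"subject": pseudonymize(user_id), "signal": signal}

alert = raise_alert("dawn@example.com", "risky communication score > 0.9")
# The same user always maps to the same token, so analysts can correlate
# alerts over time without ever seeing who the person is.
assert pseudonymize("dawn@example.com") == alert["subject"]
```

Because the mapping is deterministic per user but keyed by a secret, repeated alerts about the same person correlate without exposing identity, which is the bias-avoidance property Raman describes.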
And also in education, where you have a lot of young folks who might be sharing inappropriate imagery or bullying, that kind of stuff. That's another area where we're seeing some activity around this. Dawn: Hmm. That's interesting. Yeah, I bring up the works councils just because when you're talking insider risk, it's a really important topic. If anybody watching this doesn't know what the works councils are and does business in Europe, you need to find out what they are, because basically they're there to protect the privacy of the employees in the company. And some of them have a lot of power. Like in Germany, they can just block you from using a new technology, and in other countries you simply have to inform them, but they can't stop you. Dawn: And so we're very careful about our works councils, and we have taken the approach that that's our bar. If we can't get something through the works councils, then we don't do it, because we feel like they're protecting the privacy of their employees, and all of our employees are entitled to that degree of privacy. So that's kind of how we approach it, and so it's kind of an all or nothing approach for us. But that's each company's decision to make, and it probably depends on how much business you do where and how global you really are, but it's something that everybody who's working in insider threat should look into. Talhah: So COVID-19 has been sort of a punch in the gut, the whole world having to react, personal lives, professional lives. And clearly we're starting to see from our customers this insider risk becoming more heightened in terms of awareness of it, and a need to manage it. Because you have work from home, and data's being moved all over the place. What have you seen work in this environment? With your experience, how have you kind of adjusted to this COVID reality? Have you done things differently with your program?
What kind of advice would you give to your peers in the industry on how to deal with it? Dawn: So we were fortunate. I know a lot of companies, from what I've been reading, a lot of companies, their employees use desktops at their office. And so when COVID struck, suddenly you have employees at home working on their personal computers. Fortunately we didn't have that. We've been using all laptops since I went to Rockwell in 2013, so it was easier for us, because our employees are just working at home now. They're off our network, but they're using the same computer they always have, with the same controls that we've always had. But we are seeing a big uptick in them downloading, and again, this is not malicious, but downloading a game that has malware in it, downloading pirated copies of software, things like that. Dawn: Because they're at home, and they're sitting at their desk, and I guess they figure, "Hey, I have my Rockwell computer here, I guess I'll play my games on here and not fight with the kids, because now they're home, they're trying to do schoolwork, they're trying to play games, they're trying to watch movies. And so I'm not going to compete for that computer, I'm going to use my Rockwell computer." So we're catching a lot of those things. And that's what I meant when I said that by using the analytics to give us more time, we're not doing all those manual audits. Dawn: Now we have time. The C-CERT, they used to catch those things and they would just kind of say, "Hey, you're not allowed to do that, get that off of there," or just block it. But now they come to us, because sometimes when you see someone downloading malware... We had an employee who downloaded malicious hacking tools, and our C-CERT contacted the insider risk team and said, "Hey, this is someone who's a developer and downloaded a hacking tool, and so we're going to hand it to you to investigate."
And we talked to their manager because we thought, oh, well, maybe this is a pen tester and so he needed the hacking tools. Dawn: Well, there was no reason that he needed the hacking tools, and the manager was very concerned. Like, "What is that guy doing?" And he was sophisticated; we have a secure development environment that protects that development environment with additional controls. And so he downloaded it to his Rockwell computer, and then he was trying to move it over into the secure development environment, so we saw what he was doing. And he had no good reason, but this is where we didn't rely on the human social behaviors to trigger the investigation; we were able to catch it quickly because of that technical indicator, and because of the partnership with the C-CERT. Dawn: So it's interesting just to see, as you talk about the evolution of technology for insider threat over the years, it's now to the point where we're not just looking at theft of IP, we're looking at those technical indicators that might indicate sabotage. So we're not so reliant on human behavior because, look at COVID. People are working at home, so are we really going to know when we have an employee who's really angry and really upset and getting worse and worse? I don't know. I don't know if we're going to be able to rely on those human behaviors so much. If you're in the office all day, people can see that, but if you're on a phone call here and there, you might not pick that up. Talhah: That's right. And this could lead to sabotage-type scenarios, which is why this ability we've worked on for our customers, to detect technical indicators which may indicate somebody downloading unwanted or malicious software, or somebody trying to tamper with security controls, is so important. Because these could be those leading indicators; similar to behavior indicators, these are technical indicators that could indicate an oncoming potential sabotage risk.
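The role-relative technical indicator Dawn and Talhah discuss (a hacking tool is expected from a pen tester but suspicious from a developer) could be sketched as a simple allow-list rule. The role names and tool categories here are invented for illustration.

```python
# A hypothetical rule sketch: a download is only anomalous relative to the
# user's role. Real systems would source roles from HR data and tool
# categories from endpoint telemetry; these mappings are illustrative.

EXPECTED_TOOLS_BY_ROLE = {
    "pen_tester": {"hacking_tool", "network_scanner"},
    "developer": {"compiler", "debugger"},
}

def is_anomalous_download(role: str, tool_category: str) -> bool:
    """Flag downloads that fall outside what the user's role normally needs."""
    return tool_category not in EXPECTED_TOOLS_BY_ROLE.get(role, set())

# The developer in Dawn's story would be flagged; the same download by a
# pen tester would not.
assert is_anomalous_download("developer", "hacking_tool") is True
assert is_anomalous_download("pen_tester", "hacking_tool") is False
```

Encoding the expectation per role is what turns a noisy "someone downloaded a tool" event into the kind of leading indicator Talhah describes.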
Dawn: We had a very interesting case, but I hate to talk about that one. Yeah, I hate to talk about that one, because this actual individual told me, "I don't want you to go out and talk about me in your conferences, Dawn, don't ever do that." Yeah, so actually I'm not going to talk about that one. I'll talk about a different one, though. We had a team, a test engineer team, that was under intense deadlines and really working long hours and weekends. And one day two of the employees on that team had a big, huge verbal argument. Just yelling at each other, not physical, but a very, very verbal argument. So bad that someone had to go get a manager to come in and break it up. So he broke it up. Next day the whole test environment goes down, and that's really bad. It took three days to rebuild the environment. Dawn: So when you're working nights and weekends to make a deadline, and now you lost three days, that's a huge deal. And the manager said, "When that first happened, I was thinking, 'Well, it went down, let's just get it back up and not worry about why until later.'" But then he said, "I thought about Dawn's insider risk presentations," because I communicate as widely as I can around the company to everyone, not just HR, about insider risk. And he said, "I thought about Dawn's presentation and the concerning behaviors, and I thought, hmm, I wonder if one of those two could have deliberately sabotaged the test environment." So he contacted us, we got legal approval to investigate, and sure enough, when we looked, one of those guys wrote a script to bring down the entire environment. Dawn: And when we talked to him he said, "Oh, well, I had in my goals and objectives, I had an objective that I had to write some automated scripts to maintain the environment. So I was testing it, and it just accidentally brought everything down." And we're like, "Wait a second," this was in like April, and his objective was due like September 30th, the end of our fiscal year.
So we just didn't buy it, and we ended up looking and we could see, he actually was executing these commands. He didn't write a script, he was executing the commands manually and brought down the test environment. Dawn: But it was a really good case where, if that manager hadn't thought to contact us, who knows what he would've done next. So I thought that was a really good, even though we did not avert the sabotage completely, he did commit the sabotage, three days impact, which was a big deal, but it could have been much, much worse because he could have done much worse, and that could have been the next step. Just another story, Talhah. Talhah: Love it. Raman: Yeah, I mean, that's just crazy. This has been a great conversation, Dawn. The stories that you've told, they've just been captivating, and the last thing that you just mentioned, which is really, within an organization, to have a successful insider risk program, you really need to educate all levels of the organization, all the different teams so people can sort of look and be on the lookout for these types of things. Dawn: Right. Raman: Not only to identify the risks, but to also maybe help support people who might be under intense pressure. Dawn: Yeah, and first of all, deterrence is huge. We talk very widely, we have an insider risk blog that we put out internally for employees. We talk about cases, we talk about what we find, because deterrence is a big thing, and I think that's why we're not catching as much malicious activity as we used to. Now we're finding, almost everything we're finding is unintentional, it's not malicious. Because I think word has gotten around, "Hey, if you try to do that, we're going to catch you." We tell people that all the time, don't even try, we're going to catch you. Talhah: Totally. Dawn: So yeah, you're absolutely right. Raman: I wanted to thank you for being on our podcast, it's been a great conversation. 
I have definitely enjoyed the time that we spent together today. Dawn: Well, thank you. I enjoyed it as well. Raman: Talhah, man, that was a great conversation with Dawn Cappelli, as always. Talhah: Yeah, super, super excited about that one. Like I said, she's a huge inspiration for me. She's a huge inspiration for everything we're doing in this product right now, what we did at Microsoft. So a wealth of knowledge, a wealth of experience. Glad she got a chance to share that with everybody. Really appreciate her. Raman: All right, well that does it for our podcast on insider risks. If you're interested in learning more, definitely visit aka.ms/insiderriskblog. And we're gonna have more podcasts available shortly, so definitely subscribe to "Uncovering Hidden Risks" wherever you're listening to this podcast. And thank you so much.
Episode 4: Insider risk programs have come a long way
21-09-2020
Episode 4: Insider risk programs have come a long way
In this podcast we discover the history of the practice of insider threat management; the role of technology, psychology, people, and cross-organizational collaboration in driving an effective insider risk program today; and things to consider as we look ahead and across an ever-changing risk landscape. Episode Transcript: Introduction: Welcome to Uncovering Hidden Risks. Raman Kalyan: Hi, I'm Raman Kalyan, I'm with the Microsoft 365 Product Marketing Team. Talhah Mir: And I'm Talhah Mir, Principal Program Manager on the Security Compliance Team. Raman: Talhah, this is episode four, where we're gonna talk about putting insider risk management into practice. Talhah: That's right, with Dawn Cappelli, somebody who's been a personal inspiration for me, especially as I undertook the effort to build the insider risk program at Microsoft. Somebody who I've admired very much for what she's done in this space, an amazing storyteller, and how she lands the value and importance of insider risk. Super excited to have her here with us today to share some of that with our customers abroad. So really looking forward to this conversation. Raman: Yeah, and Dawn is the CISO of Rockwell Automation, and I know that this is gonna be great. So let's do it, man. Talhah: Let's do it. Raman: So thank you, Dawn, for being on our podcast. Talhah and I started this about two years ago at Microsoft, where we started looking at insider risk management in Microsoft 365. He, of course, had been doing it a lot longer for Microsoft as part of our insider threat group, and he talked a lot about you, and so we're really excited to have you on the podcast. And the interesting thing is that everyone that we've had a conversation with thus far actually knows you. So I'm excited to finally meet you virtually. We met once before, but thank you again and very much appreciate it. Dawn: You're welcome, thank you for the invitation. Raman: Yeah, absolutely.
Just for people listening, it would be great to get your background: what is it that you do now, how did you get into insider threats, all that sort of stuff? Dawn: Okay, so right now I am the VP of Global Security and the Chief Information Security Officer for Rockwell Automation. We make industrial control system products. I came to Rockwell in 2013 as the Insider Risk Director. So I came to Rockwell to build our insider risk program, and at that time not many companies in the private sector had insider risk programs. Financial services did, the defense sector of course, they had counterintelligence, but not many other companies had insider risk programs. I came here from Carnegie Mellon, the CERT program, which, for those that don't know, was the very first cyber security organization in the world. It was formed in 1988 when the first internet worm hit and no one knew what it was or what to do about it, and Carnegie Mellon helped the Department of Defense to respond. So going back, I actually started my career as a software engineer, programming nuclear power plants for Westinghouse. From there, I went to Carnegie Mellon, again as a software engineer, but I became interested in security, and CERT was right there at Carnegie Mellon, so I tried to get a job there. Fortunately, they hired me. I didn't know anything about security, but I got a job there as a technical project manager so that I could get my foot in the door and learn security. So I was hired by CERT. CERT is a federally funded research and development center, so it's primarily federally funded. They had funding from the United States Secret Service to help them figure out how to incorporate cyber into their protective mission. So at this point, this was August 1st, 2001, when I started, the Secret Service's protective mission was gates, guards, guns. It was physical, and they knew they needed to incorporate cyber.
So my job was to run this program, and the first thing that we had to do was protect the Salt Lake City Olympics, which were in February 2002. So I thought, "How cool is this? I get to work with the Secret Service, protecting the Olympics, and I know nothing about security. How did I ever get this job?" And it was very cool. I thought this is the greatest thing. "I can't believe they're paying me for this." But then a month later, September 11th happened, and suddenly they thought the Olympics would be the next terrorist target. And so that cool, fun job became a very real, very scary job. And when we first went to Salt Lake City to talk to the Olympic Committee about how a terrorist could bring down the network or harm attendees, the security experts were looking at network diagrams and trying to figure this out, and someone just happened to say, "So have any network administrators or system administrators left on bad terms?" And they gave us a list of 20 people. So we're like, "Oh my gosh, these 20 people, they could get right into this network. They know what all the vulnerabilities are." So we decided we needed an insider threat team and an external threat team. I was intrigued by the insider threat team. You have people and you have technology; that really intrigued me. So I said I would take that team, and look where it led me. So yeah, that's how I got started. Talhah: So Dawn, one of the things that Raman and I talk about quite a bit is how influential your work at CERT, the book that you wrote, was in not only helping me get acclimated to insider risk, insider threat management, but also in what we did at Microsoft in terms of building a program and the solution that we're building. And one of the things that was big for me, coming from a traditional security background, where you have this tendency to think that we could pretty much contain and manage the risk from a SOC perspective.
When it comes to insider risk it's important to consider other business partners like HR and legal. I'd love to get your take on that. I know that's one of the things that was big in terms of my learning; how did you come about that, and what has your journey been building those partnerships at Rockwell? Dawn: So when I took over that insider threat team, or not took it over, but created it, we thought, "Okay, so what is an insider threat? How could they attack the Olympics? What could they do?" And what we decided was, we have the Secret Service as our customer here, let's ask them to collect every real case they can find. Let's look at what insiders have done in the past and learn from that. So they did, they collected 150 cases for us for the first batch, and we looked at: who does it? Why do they do it? What do they do, when, how, where, and how can you stop them? And what we came to realize was this is very different than an external adversary, who could be anywhere in the world and you have no idea who they are. These are insiders, they are in your company and they come to work every day. And the interesting thing about it was we actually partnered with the psychologists, the behavioral, I can't remember what the team was exactly called at the time at the Secret Service, but it was the behavioral psychologists, and we teamed with them on this effort because we realized this is very different. We need to look at security issues, technical issues, but we also need to look at the people issues, because they are people and we see them every day. And so we teamed with them and we looked through those cases, and we just created a big database of all of the different attributes of every single case that we wanted to catalog, and it was behavioral aspects of the case, organizational aspects of the case, and technical. So there were really three components.
And because we teamed with psychologists from the very start, it's always been a part of my DNA when it comes to insider threat. These are people. And what we did with those cases was we kept collecting them, because it was so valuable to collect them. By the time I left CERT, we had over 800 cases collected. And we started looking at the cases to look for patterns, because the attitude back then was: these are insiders. They have access. They come in to work every day. You'll never stop them; you'll never be able to stop them from doing something. They do what they do every day, they just do something bad. And fortunately, because the patterns were so distinct in these cases, we realized, "Yeah, you can. You can mitigate these by looking at social behaviors as well as technical behaviors." And that's where HR and legal come in, because I really realized, when I came to Rockwell and tried to put all of our theory into practice, I thought, "Well, okay, so someone who's going to commit cyber sabotage, insider cyber sabotage, we know that in almost every single case we had, and we have like 169 of them, there were behavioral indicators." Happy people do not commit sabotage. People that commit sabotage are angry, they're upset about something, and their behaviors get worse and worse over time. So here I am in a company like Rockwell, where we're in over a hundred countries around the world. So how can I possibly train every manager in the company on what to look for? And I came to realize that really what I need is HR. Because we used to say in CERT that an employee who's going to commit sabotage ends up on the HR radar, meaning their behaviors get bad enough that they come to the attention of HR. And so there it is: HR, they're my eyes and ears all around the world.
If we can train HR as part of their normal training, then we can rely on them to be the ones that notify us when there's a potential insider threat, and it works. It works amazingly well. Our HR department, they get it, they know when to contact us. But that's where legal comes in, because there's subjectivity when you're talking about someone's human social behaviors. So you can't just initiate an investigation because someone says, "Hey, Dawn's been acting really crappy lately." So legal is a very important part of that, to make sure that we really substantiate what we're being told, that we have multiple people who can attest to the behaviors, and that we aren't violating any privacy laws in that local part of the world, because they're different everywhere. So that's how the human and the HR and the legal partnership came to be. Raman: It makes a lot of sense. Actually, even here at Microsoft, as we were looking at the insider risk solution broadly, we really wanted to bring HR and legal into that conversation, such that organizations would have the ability to collaborate with those two teams to not only help ensure that they were meeting their regulatory requirements, or that they were compliant with employment laws and privacy laws, et cetera, but what we also ended up realizing was that there's this other side to the coin. We talked a lot here about malicious threats, but then there are the inadvertent risks as well. People being distracted, especially in this time with COVID and everything, doing things that maybe they didn't mean to. And what we've heard from HR and legal is, "Hey, how could we maybe use some of this insight to help support a stronger company culture, to help people do the right thing and feel like, I'm not going to always get slapped on the back of my hand because I did something wrong." Dawn: Right, yeah. Most of the insider risk cases that we have are unintentional.
It's people who are doing something they shouldn't; they're putting information somewhere they shouldn't, or downloading software that they shouldn't be, but they don't have malicious intent. But that has changed a lot over the years. When we first started, we did catch people who, it appeared, were trying to be malicious. When I started the program at Rockwell, and I always tell companies you can start very quickly, you don't have to go out and invest in technology, we started with nothing. That was one lesson learned: when you take a job in the private sector, you should ask, "Am I going to have people? Am I going to have a budget?" I took the job and then found out that I didn't have any people or a budget. But I built the program with nothing, just me, and just worked with IT. And so we approached theft of IP first. Theft of IP is much easier than sabotage, and it's very different from sabotage, by the way. In theft of IP, we have very nice, happy people that try to steal intellectual property. They're not disgruntled, they're ambitious. They're going on to their next job and they feel like, "What I created is mine. I'm going to take it with me." So they're not disgruntled. Well, sometimes they are. But the key in theft of IP is that they're leaving the company. Most people that steal intellectual property do it within 90 days of resignation. So I knew that going into Rockwell. The executive sponsor of our program is the Senior Vice President of HR, so I talked to her and I said, "Hey, can I just pick one team in the company that has access to the crown jewels, the most critical information we have, and just use them as a pilot? I'll work with their HR person, so she knows to be on the lookout and let me know when someone's leaving the company." And then I worked with IT and I said, "Hey, what kind of audits do you have? Or what kind of logs do you have? I need USB logs, cloud activity, email logs. Can I get access to them?
If I have a person's name, can I go in and just look up their activity?" And they said, "Sure, yeah, we can give you that access." So then I went to legal and I said, "Okay, here's what I want to do. One team, six months, I'm going to do a pilot. And here's how it'll work: HR will tell me someone's leaving, I'll go to IT, I'll look them up in the logs, and if we find something, then we'll investigate and I'll pull you in." And they said, "Sure, you can do that." So at that time I was educating HR globally about insider risk. But two weeks into my pilot, I get a phone call from an HR person in a totally different part of the company, and she said, "I know I'm not part of your pilot, but we just had four engineers all quit at the same time from the same team. They now have a competitive company and they're starting to try to take our customers away, and there is no way they could have built this capability in two weeks. We've invested millions of dollars in this capability over years, and they just now had to have taken it. There's no other way they could be competing with us. So can you do an audit?" So I went to legal and I said, "Well, they're not part of my pilot, but can I do an audit?" And they said, "Sure, go ahead and do an audit." So I did the audit and, sure enough, found they had taken all of the intellectual property that that team had created for years and were starting to try to take our customers. So we contacted law enforcement and took legal action against them. We ended up collecting royalties from them for like five years, every time they went to a customer that we already had. And so it ended up, I wouldn't say a good news story, but it certainly got the Insider Risk Program off the ground, because my six-month pilot was over after two weeks. They said, "You need to roll this out. We don't need a pilot, just roll it out." And here I am, one person and no money, but you can prove your value really quickly.
I remember talking to you about that, Talhah, about how to just take a backward look at people that have left your company over the past 90 days and see what you see, and when companies do that, they're always shocked.

Talhah: And I'm grateful for that. We followed a very similar approach where we didn't try to boil the ocean, Dawn. It was a very focused approach: decide what kind of scenarios we care about, what kind of risks we care about. Departing-employee data theft, for example, is one of the key ones that you educated us on. And just go look back, see what the data is showing you, and go from there. And this really is a big inspiration for us in terms of how we develop the solution at Microsoft now, where we're trying to take these key scenarios that customers care about, that we see a lot of data in the field around, and build detections to be able to identify those things. So in that same vein, you've been a great storyteller, which really inspired us and a lot of the folks at [inaudible] as well. I'd love to hear some of your other stories that really got you focused on different parts of insider risk, something you came across from working with your peers or in your own experience, other stories like this.

Dawn: Well, in November of 2014, we had an employee who resigned, and he was one of our senior firmware engineers. So he had access to the crown jewels, and we ran an audit, of course. Back then I had one or two people on my team, but it was still very manual. So this engineer was leaving the company and had access to all of our source code. We did an audit and found, oh my gosh, I'll never forget. They went to HR, and HR called me and said, "Dawn, this is the big one," and I was getting on a plane to go to Milwaukee, to our headquarters, and I said, "Okay, I'll be there in an hour." And it was the big one. It was an employee who had taken all of our source code.
He was from China, working in Milwaukee. He just took the information on a USB drive and was leaving the company, and so we met with him, just like we always do. We had a really good process down, and so we caught this pretty quickly. We met with him and said, "Look, we have logs. We know that you took this information. We just want to get it back." Because that was always our attitude. We don't want people leaving the company feeling like we distrust them or like they're a criminal. We always just say, "Look, we know you took it. We just need to get it back." And he wouldn't give it back. He was very, very resistant about giving it back, and so we contacted the FBI. You figure this is a lot of our source code, and this ended up going to federal court. So I'm not saying anything that isn't public knowledge or available in the public. So we went to federal court, and the interesting thing is that he was found not guilty. It's funny, because our company lawyers were really upset about that. They were like, "We never have lost a case. I can't believe we lost the case." And I said, "No, we did not lose the case," because our goal is to protect our intellectual property. We caught him fast enough that, after a year and a half of forensics by the FBI, they found no evidence that he had given that information to anyone else. So we caught it fast, we took action, law enforcement went in, and we got the information back. From my perspective, that was a success story for the Insider Risk Program, but yeah, the lawyers didn't really see it that way.

Raman: Totally. One of the things you just mentioned was that at this time your process was a little bit manual. We've spoken to an applied researcher here at Microsoft, and we spoke to Dan Costa from CMU last week, and a lot of what we talked about was around the technology.
As you've been in this space for so long, how has the technology evolved from your perspective, and how is it really helping you not only be more efficient, but catch things that you may not have caught before?

Dawn: Well, it's funny, because when I was at CERT, and we even said this in the book, what we really need is a technology that will let us pull in whatever logs we have. Every company's different; every company has different data sources. We need a tool that will let us bring in all of our logs, correlate them together, and create custom risk scoring algorithms based on the data in those logs. And the logs have to include HR data, because termination date, that's the key. That's the key trigger for theft of IP. So we kept saying that at CERT: "Why can't somebody do this? Why can't somebody do this?" That case happened in November of '14, and leading up to that case, there started to be some products on the market that did that. And so I had been saying, "Hey, there are these products; we can actually automate what we're doing so we wouldn't have this manual process, because it's really not scaling. It's getting too big." And they said, "Well no, we don't have the money for that." Well, the day that we caught that engineer, I get a phone call that night. The Senior Vice President of his business called our CSO at the time and said, "What's that technology Dawn wants? I'll pay for it. We need to get that in here." So it was fantastic, except I learned that being an early adopter can be very painful. This was early 2015, so that was five and a half years ago. Boy, that's a long time in technology terms, and so it was a bumpy road. But it's exciting to me that we had the idea at CERT, and then I came to Rockwell and was able to be one of the early adopters of that technology. It's a nice little road to go down.
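The idea Dawn describes, correlating activity logs with HR data and scoring the result, can be sketched in a few lines. This is purely an illustrative toy, not any vendor's actual algorithm: the event types, weights, and the 2.5x multiplier for departing employees are invented for the example, though the 90-day departure window reflects the research she cites.

```python
from datetime import date, timedelta

# Illustrative weights -- a real program would tune these to its own data.
WEIGHTS = {"usb_copy": 3.0, "cloud_upload": 2.0, "email_attachment": 1.0}
DEPARTURE_WINDOW = timedelta(days=90)  # most IP theft occurs within 90 days of resignation

def risk_score(events, termination_date, today):
    """Combine activity-log events with the HR termination date.

    events: list of (event_type, count) tuples from USB/cloud/email logs.
    termination_date: date the employee announced departure, or None.
    """
    base = sum(WEIGHTS.get(etype, 0.0) * count for etype, count in events)
    # HR context is the key trigger: amplify scores for departing employees.
    if termination_date is not None and termination_date - today <= DEPARTURE_WINDOW:
        base *= 2.5
    return base

# Analysts would then review users sorted by score, highest risk first.
users = {
    "user_a": risk_score([("usb_copy", 4), ("cloud_upload", 2)],
                         date(2021, 1, 15), date(2020, 12, 1)),
    "user_b": risk_score([("email_attachment", 1)], None, date(2020, 12, 1)),
}
ranked = sorted(users, key=users.get, reverse=True)
```

The point of the sketch is the shape of the pipeline: technical signals alone give you a flat list, and it is the HR context (the termination date) that reorders it into something an analyst can work top-down.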
Raman: That's awesome, because we've heard a lot from customers as well, like, "Hey, I want to get started, but I don't know where to start. I have all these different sources, and I just need something that I can quickly scope." To your point, scope the team I want to start looking at and see what happens. If you make it easy like that and allow people to say, "Hey, you don't have to spend a lot of time configuring things, bringing in a bunch of logs and scripts and things like that; you can get started by saying, 'I'm concerned about IP theft, I'm concerned about this particular group, and I want to just go,'" that's probably half the battle right there.

Dawn: Well yeah, I actually just put together a graph last night showing the number of manual audits that we have done from 2015 through 2020, just a bar graph looking at how the program has improved from an efficiency perspective. And it's just incredible, because all of those technologies have matured so much now. I shouldn't say we're totally automated; we still do some manual audits. If someone walks in and says, "I am leaving and I'm going to the competitor," HR picks up the phone and says, "I need an audit. We can't wait until tomorrow when the analytics runs the risk algorithm and says, 'Oh hey, that person is gone.'" But for like 90% of the cases, our insider risk analysts just come in, sit down at the dashboard, start at the top of the list, look at the highest-risk users first, and work their way down. And there are times, if there's a reduction in force, when you have a lot of people showing up on that dashboard. But the beauty of using analytics, where you have these risk scoring algorithms, is that they combine all of the factors and tell you where to start. And if you work your way down, what I tell the team is, "When it's time to go home, it's time to go home.
You don't have to work through the entire list, because as you get down and those risk scores are lower, it's okay if you can't get to everybody today. You got to the most important cases." And that way, people don't even show up on the dashboard unless they have some activity that warrants being on the dashboard. So where we used to look at every single person leaving the company, you had a bar like this; now we're just looking at the people that are leaving the company and have suspicious activity. That's a much smaller bar, and it enables us to focus on other things, like the more technical, sabotage kinds of cases and the serious security violations, which weren't even in our scope before.

Raman: That was an amazing conversation with Dawn. Like you said, man, she's a great storyteller, right?

Talhah: Yeah, and I've said this before, but I definitely gotta say it again: we need more time with Dawn.

Raman: Well, we have more time. We've got one more episode with Dawn. In this one, we talked about the history and origins of insider threats and insider risks. In the next one, we're gonna talk about how to really build an effective program. So definitely join us for that, and be sure to subscribe wherever you're listening to this podcast. We appreciate it.
Episode 3: Insider risks aren’t just a security problem
21-09-2020
In this podcast we explore how partnering with Human Resources can create a strong insider risk management program, a better workplace, and a more secure organization. We uncover the types of HR data that can be added to an insider risk management system, and how artificial intelligence can contextualize the data, all while respecting privacy and keeping in line with applicable policies.

Episode Transcript:

Introduction: Welcome to Uncovering Hidden Risks.

Raman Kalyan: Hi, I'm Raman Kalyan, I'm with the Microsoft 365 Product Marketing Team.

Talhah Mir: And I'm Talhah Mir, Principal Program Manager on the Security Compliance Team.

Raman: And this is episode three with Dan Costa, talking about how you bring HR, legal, privacy, and compliance into building an effective insider risk management program.

Talhah: Yeah, super important. This is not like security, where you can just take care of this in your SOC alone. You need collaboration, and he's gonna tell us more on why that's critical.

Raman: Yeah, it was awesome talking to Dan last week. So, let's do it.

Talhah: ...When you talk about these predispositions, these stressors, you gave a great example of an organizational stressor, like somebody being demoted or somebody being put on a performance improvement plan. You can also have personal stressors outside of work that you guys have talked about openly in a lot of your guidance. When you look at these, at least the organizational stressors, a lot of times they reside with your human resources department, right? So this is a place where you have to negotiate with them to be able to bring this data in. So talk to me about that. How do you guide the teams that are looking to establish these connections with their human resources department and negotiate this kind of data so that it's not just for... it's for insider risk management purposes.
So talk about that, and also talk about: are there opportunities where you could potentially infer sentiment by looking at, let's say, communication patterns or physical movement patterns or digital log-in patterns and things like that? How can you help to identify these early indicators, if you will?

Dan: Yeah. So let's start with how we bridge the gap between the insider threat program and stakeholders like human resources, because, Talhah, you're spot on. They're one of the key stakeholders for an insider threat program, really in two respects. One is they own a lot of the data that will allow us to gather the context that we can use to augment or supplement what we're seeing from our technical detection capabilities, to figure out whether that activity was appropriate for the job role and responsibilities of the individual associated with the activity. How can we pull left relative to an incident progression and find folks that might be experiencing these organizational stressors? That's data that our human resources stakeholders have and hold. We've seen insider threat programs over the years struggle with building relationships with stakeholders like human resource management. A lot of the challenges there, from what we've seen, come down to a lack of understanding of what it is that the insider threat program is actually trying to do. In many cases, the insider threat program isn't necessarily without fault in making that impression stick in the minds of human resources. So this goes back to the insider threat program not trying to be duplicative, or boil the ocean, or carve off too big a part of this broader enterprise-wide activity that needs to happen to manage insider risk. In that early relationship building and establishment, there's an education piece that has to happen. Human resources folks aren't spending all day every day thinking about how insiders can misuse their access like we are, right?
So much of it is: these are the threats that our critical assets are subject to, by the nature of our employees having authorized access to them. We understand that this isn't always the most comfortable subject to talk about, but here's a myriad of incident data that shows where vulnerabilities existed within a human resources process, or where a lack of information sharing between HR and IT enabled an insider to carry out their attack or to evade detection for some significant amount of time. So much of it just starts with education. Once we've got them aware of the fact that this is something that the organization has to consider as a part of its overarching security strategy, we need to help them understand the critical role that they play: understanding how we use contextual information, understanding how we don't use contextual information, and helping them understand that what an insider threat program is really designed to do is help them make better data-driven decisions faster, by giving them access to analysis that can only be conducted by folks that can take the data that they have and stitch it together with IT data, with SOC data, with information assurance data, with the risk register that's owned by our chief risk officer. They probably don't want to be spending all of their time writing analytics and building the relationships with IT and legal to facilitate some of that stuff. That's what the insider threat program is here for. So it's helping them understand that this is a mutually beneficial relationship: the data that they provide will help the organization more proactively manage insider risk, and they are a stakeholder in that they are potentially recipients of the results of the analysis that the insider threat program itself will conduct, helping them better understand how to make refinements, enhancements, or improvements to the onboarding and offboarding processes.
Helping them understand when it might be time to make a change to employee compensation strategies within the organization, or to how the employee performance management system is leveraged within the organization. The insider threat program, once it's up and running and bringing in all the different data and engaging with all the different stakeholders, can help highlight and emphasize where those processes are working and where they can be refined. It's hard for the stakeholders to do that work on their own. So there's a fine line to walk there too, right? You can't go in and say, "We think we can be doing your job better than you can because we have data scientists and all this other cool data." So a lot of effective insider threat program building, from a relationship-building perspective, really comes down to having an insider threat program manager who has that organizational savvy, who can find the right ways to build and establish these relationships within their organizations. It's not easy by any stretch of the imagination, but we're seeing lots of organizations be successful at helping their stakeholders understand the threat, helping their stakeholders understand the two-way street of, "This is the information we need. Here's why we need it from you. Here's why you're the only part of the organization that can help us with this, and here's how we think it can be beneficial to you and the organization more broadly."

Raman: Yeah. No, that's awesome. I think this is the kind of conversation that Talhah and I had when we first got into this space together. He'd already been in it, but I had come at it from a product perspective, as we think about helping our customers tackle these issues. One of the things we talked about early on was: hey, look, like you mentioned, Dan, this is a human problem. These are the employees that you're dealing with. These are people that are part of your organizational family, right?
You can't just set something up that starts investigating people and snooping on them and doing that sort of thing. You've got to take a little more holistic viewpoint here. This is why "insider risk" makes more sense than "insider threat," because as you think about HR, they're stewards of the corporate culture, right? They're the ones that are responsible for helping build a corporate culture of inclusion, of people feeling like they are wanted, they are rewarded, and they're building towards a positive outcome for the organization. For them, the program itself can highlight, to your point, the risks that are there that might impact that organizational health, and in a way it actually helps support a better, stronger organization by pointing out areas that are vulnerable [inaudible] that they can go after and build training around, where they can go out and say, "Hey, this is something that we should be doing," or, "People aren't feeling supported, and so they're doing things that they shouldn't be doing." Now, rather than treating the symptom, treat the underlying issue.

Dan: That's a great point, Raman. I would add to that and take it a step further: I think stakeholder parts of the organization like human resources don't intuitively or necessarily think about the things that they do as being influencers of increased security or resilience within the organization. So much of that education is helping them understand, "Look, organizations that have better management practices have higher degrees of employee engagement and higher degrees of perceived organizational support, amongst all the other great benefits that they experience. They also experience fewer security incidents." So what you're doing, these practices, are security controls, and we really have to start to help our organizations broaden their understanding of what constitutes a security control.
And, "Oh, by the way, HR, if you know that these are things that you'd like to be doing just to increase morale, we can amplify that ask and message up the chain when it comes time for budget requests, by saying, 'Hey, not only is this a good thing for us to do from a talent management perspective, but it's a key security strategy for our organization.'" That's another way for those two disparate parts of the organization to work together in some mutually beneficial way. A couple of years ago, we did a study called The Critical Role of Positive Incentives in Mitigating Insider Threats. It was looking at just this, Raman: could we establish a relationship between levels of connectedness at work, employee engagement, and perceived organizational support, and a reduction in the number of insider incidents that organizations experience? We actually leveraged our open-source insider threat information sharing group for a lot of that work to conduct surveys. What we saw was a positive correlation between increases across those dimensions of connectedness at work and perceived organizational support, and a decrease in the number of insider incidents that organizations were experiencing. So the key takeaway from that study was that better places to work ended up also being more secure organizations, particularly as it pertains to insider risk. Now we're trying to continue that work and drive it towards a causal model, really being able to show that these are the root causes: these management practices, these HR practices, by putting them in place, you cause a reduction in insider incidents. So it's an area of ongoing research, but intuitively it just makes sense, right? So much of what we're trying to do in 2020 with insider threat programs is help folks recharacterize what constitutes security controls and what constitutes valid response options for the things that insider threat programs should be on the lookout for.
Talhah: So on that, I assume there's an inverse correlation, then, between an organization being potentially disconnected, because of things like work from home and what's happening in today's environment, and an increase in potential insider risk activity. Is that a fair extrapolation to make?

Dan: Well, I think it's a fair hypothesis to consider testing, right? It's the opposite side of the coin. I think now would be a fantastic time to be making sure that we can collect evidence and data that would show those data points trending in maybe a different direction. As our organizations are experiencing unprecedented volumes of personal and professional stressors across our workforces, what's that doing to the rates of occurrence, or the rates of frequency, with which we're experiencing insider misuse cases? It's the kind of thing where it takes organizations a while to collect that data. So I don't know that we're going to know for sure until we're maybe a little bit further out, because these incidents tend not to evolve over the course of days and weeks, but usually on the order of several months, if not years, in most cases. So it's one of those things: I think there are going to be far-reaching implications, particularly from an insider threat perspective, that we'll be able to attribute to just how drastically everyone's normal changed over the past several months.

Talhah: So Dan, we talked about stressors, and a lot of times we hear customers talk about insider risk management. It really boils down to a game of indicators. When you have the right set of indicators and the ability to orchestrate over them and correlate over them, that's when you can start to do the first part of the whole problem, which is to identify them.
One of those indicators you talked about is these stressors, and you talked about the importance of partnering with your human resources organization. But how do you think about the potential to infer those stressors through communication channels, or other means of looking at certain indicators in an environment, to see if somebody is potentially disgruntled? We'd love to get your thoughts on that based on what you've researched.

Dan: Yeah, certainly. So leveraging text analytics and natural language understanding has been a hallmark of some of the research that we've done in this space. We've got a multi-part blog series that talks about how to apply text analytics to various stages of insider threat detection. What you'll find in there is a real strong emphasis and focus on the detection of the concerning behaviors, activities, and stressors that can precede attacks. The state of the practice a few years ago was keyword-based searches: big buckets of words that we associate with topics like money, or with code-named sensitive projects within organizations, and every time we see one of those words being used, let's generate an alert and have an analyst dig in there. Over the past several years, we've seen the state of the practice move past those simple keyword-based searches and start to leverage AI and ML to help with natural language understanding that can better contextualize that, deal with the nuances of electronic communication, and then form the foundation for features that comprise a broader model, really a model of models, that helps us understand the data that we're seeing in aggregate across our organizations, relative to what we constitute as our highest-priority risks to our critical assets. This goes back to our HR friends, and also our friends in legal and privacy, because this can be a tough pill to swallow from a legal, privacy, and civil liberties protection perspective for lots of organizations, right?
This is the next step: we're going to start reading our employees' electronic communications to figure out if they're talking about having money problems. This goes back to the need to educate those stakeholders in terms of what it is that we're actually trying to do, what we're not trying to do, who gets access to that analysis, and what the allowable response options are with regard to the end products or end results of that analysis. And helping them understand that, "No, no, we want to feed this back to you all so that we can help you and support you in your decision-making processes about what it is that we're seeing on our organization's networks and systems." So it's something that we're seeing lots of organizations start to incorporate after close consultation and collaboration with their legal, privacy, and civil liberties folks. Certainly, as you're considering this for large organizations that are operating outside of the United States, you're going to have to make sure that you're working within all of the different legal jurisdictions that your program might be operating in, to understand what is and is not allowable in those different jurisdictions, because the privacy protection rules obviously change depending on operating location.

Raman: Yeah, that's a great point. I mean, I had a conversation with a customer, I think back in springtime, and the CSO was all in on, "Hey, I want to go and identify these risks, use an automated tool," et cetera. About a week later, I get another phone call, which is, "Hey, now my chief privacy officer wants to have a conversation with you, and she needs to better understand how information is being protected, how the PII of individuals is being protected." Because, to your point, I could generate an alert, but that doesn't mean that person is necessarily doing something wrong. It's just an alert that's popping up. At that point, I'd want to protect their information.
I want to make it private, anonymized, so that there isn't this bias, this discrimination, that might happen at the analyst level. Once we went through all of the different ways to protect information and privacy, the anonymization pieces, that convinced them, and they said, "All right, great. We're going to roll this out worldwide across multiple divisions in multiple countries." So you bring that up; that's a great point there. The other point you made, Dan, was that there are multiple sources here. I think Talhah touched upon this earlier, which is that customers struggle to figure out how to get started, right? You talked about, "Hey, you need sentiment analysis. You need contextual information. You need data sources beyond just..." For example, a lot of organizations say that they have an insider risk program, but yet they've just implemented DLP, and that's it. It's like, "Well, that's only one piece of the puzzle, and that's going to create alert fatigue for you." When we talk to them and we say, "Hey, you need not only the endpoint indicators, but you need signals from sentiment. You need signals from maybe HR data and other sources," they're like, "Wow, that's a lot for me to try to figure out and [crosstalk 00:19:43], you know what I mean?" So I think getting started quickly and, to your point, scoping it to the risks that are most important, then quickly getting started on tackling those, scoping the right people and involving the cross-organizational parties, is probably the foundational step for most organizations...

Dan: Yeah. That "we have a DLP program, or we're trying to expand it" is a very common pattern that we've seen in industry. One of the places where we've helped organizations get started in that space, as that next data source to incorporate, was simply a list of known departing employees provided to us by human resources.
So just knowing who is departing from the organization at any given time gives us the opportunity to supplement or augment what we have in the data loss prevention tools. So if we see alerts or hits associated with folks that have announced their resignation, we've found that most theft of intellectual property cases that we've studied tend to occur within a 30 to 90 day window of an individual announcing their departure from the organization. So one of the earliest ways to address alert fatigue from something like a DLP is to just grab that tiny piece of context owned by human resources, right? It's focused. It's specific. We can point to data that just provides a rationale or justification as to why we think we need access to this information, how we'll use it, how we'll protect it. It gives us an opportunity to start small, but still make a big impact and show, "Look, HR, because we were able to incorporate this information, we've reduced our false-positive rate by X or Y percentage, and we were able to increase our ability to recover intellectual property as it was being targeted for exfiltration by our departing employees." So it's finding those use cases that are important to our organizations that you can back up with empirical data and starting small and taking those quick wins, high-impact solutions and finding ways to build on those successes to establish broader relationships. Talhah: Yeah. Another thing that I remember, a customer of mine brought this up when we had a conversation with them, they said that, "DLP is just one piece. Another way to think about that is because DLP is really about data loss, and if you're just focusing on that, as far as your insider risk program is concerned, you're automatically focusing just on the confidentiality type risks." What about fraud? What about sabotage? What about physical issues that you might come up with, right? 
So you have to take that holistic approach, and then from there start to prioritize and figure out what you want to try to target. Raman: Oh yeah, and also- Dan: And- Talhah: Sorry. Raman: I was going to say, also the other parts of it, which, like you mentioned, are around workplace harassment, right? You have other risks that are more human-oriented that DLP can't necessarily identify. The one thing that you talked about just now, which is what happened 60, 90, 180 days prior? That's not going to get picked up by a transactional tool that just looks at today's data. You need that historical data to go back and reason over, right? Dan: Yeah. I mean, so much about what we're doing with insider threat detection is about anomaly detection, right? An understanding of a deviation from a defined process or the ways that things normally happen as it pertains to authorized use of our organization's critical assets. So Raman, you're spot on. If we're trying to determine deviations from normal, we need to have the capability to have an understanding of what normal has looked like historically, right? So it's finding how long back is long enough to look to establish a pattern of normal. But as we've seen over the past couple of months, it's also having an understanding of knowing when normal is going to change and knowing how quickly it's going to take for us to establish what a new normal looks like. That's something that we saw a countless number of insider threat programs struggle with over the past few months, which is every baseline that they were relying on as the foundation for an anomaly detection strategy was completely turned upside down and onto its head and rendered almost ineffective or useless when everybody fundamentally changed their normal in the way that they normally conduct authorized access to their organization's critical assets. 
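The baseline-driven detection Dan describes can be caricatured with a trailing window and a z-score. This is a deliberately minimal sketch for intuition; the window length and threshold are illustrative, and using only a short trailing window is one crude way to let "normal" re-form quickly after a shift like the move to remote work:

```python
import statistics

def is_anomalous(history, today, min_history=14, z_threshold=3.0):
    """Flag `today` if it deviates sharply from the recent baseline.

    `history` is a list of recent daily counts for one user (e.g.
    files accessed). Until we have `min_history` observations, we
    refuse to call anything abnormal: no baseline, no anomaly.
    """
    if len(history) < min_history:
        return False
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return today != mean
    return (today - mean) / stdev > z_threshold

baseline = [20, 22, 19, 21, 20, 23, 18, 20, 21, 22, 19, 20, 21, 20]
assert not is_anomalous(baseline, 24)  # an ordinary day
assert is_anomalous(baseline, 80)      # a sudden spike in access volume
```

The fragility Dan points out falls straight out of the math: feed this function a pre-pandemic baseline and post-pandemic days, and nearly everything looks anomalous until the window rolls forward and the new normal becomes the baseline.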
So the last several months have really shone a light on the fact that we've got to get better at being able to find ways to articulate and describe what normal or expected is, in ways that might not necessarily have to rely on six months or years' worth of data. Where do we start there with policies and procedures? How do we make it easier for our technical detection strategies to mirror our policies and procedures when everybody changed their policies and procedures as it pertained to remote work and authorized use of information technology systems? Insider threat programs really struggled to catch up to those changes and to make sure that the detection strategies and the prevention strategies caught up. So there's lots of lessons learned over the past few months about how we do that and where our opportunities for improvement are as a community. Raman: Yeah. Talhah: Awesome. Raman: Sorry, Talhah. Go ahead, man. Talhah: No, I was just saying awesome. That's it. Sorry. That's a stupid line I just threw in there. Raman: Okay. One thing I wanted to just touch on there, Dan, is you're spot on, dude, in terms of the rapid change that COVID has brought along. In fact, we did a survey here at Microsoft with well over 200 CSOs focused on insider risk. One of the things that we found was 73% of them said to us that they're planning on spending more on insider risk technology now with COVID than they were before. I think this highlights the point that you just made, which is the systems and processes, if you even had them, that you were using nine months ago aren't necessarily relevant today, right? You need things that can... Because people are accessing data from endpoints that don't have agents on them, right? You have people that are working in new ways, sharing things with others in new mechanisms, right? I mean, just look at this particular podcast, videocast. We're doing it from our houses, right? I've got a courier coming to pick up a SanDisk card here. 
It's one of those things where it's challenging for organizations in this new world, right? Dan: Yeah, certainly. So now that we've got the spotlight on the insider threat problem, particularly with everything that's going on in the world, that highlights the need for organizations to be intentional about where they put that expenditure, right? This goes back to where we started this discussion. You've got to think through... Now that you've got evidence that suggests that maybe you've got some gap areas from an insider threat detection or prevention mechanism, how are you going to prioritize where your next security dollar goes for insider threats? To get that answer right, you've got to take a risk-management-based approach to this. You've got to have an understanding of what's currently in place, and you've got to also be a little bit forward-thinking here, in terms of when will things go back to the way that they were or something somewhat resembling the way that they were? What lessons learned are we going to incorporate from the last several months into that new normal? So I'm happy to hear that 73% of CSOs intend to spend more on insider threats, but also kind of terrified for them because I want to make sure that they understand what they're actually trying to do with that security investment and making sure that it's aligned with the actual risks to their organizations and being done in a way that is cognizant of their actual risk and what their current capabilities are and how that risk landscape might change and shift even within the next calendar year. So it's a hard problem to juggle and it's a continuously evolving process and a continuously evolving problem for organizations. Raman: Well hey, Dan, thank you so much. This was an awesome conversation today. You brought a lot of insights. How can people get more information about some of the research that you all are doing over there? Dan: Yeah, certainly. So thank you both, Raman and Talhah. 
This has been fantastic. For more information, check out our website, cert.org/insider-threat. You can also contact us at insider-threat-feedback@cert.org. In anticipation of National Insider Threat Awareness Month in September, we're going to be out and about a lot, trying to transition our research into as many broad communities of practice as we can. We'll be blogging to our Insider Threat blog about once a week in September and stay tuned for the seventh edition of The Common Sense Guide to Mitigating Insider Threats, which we're targeting for a late 2020 release. Raman: Awesome, man. Thank you so much. Talhah: Thank you, Dan. Dan: Thank you both, guys. Appreciate it. Raman: Talhah, man, that was awesome speaking with Dan. I mean, it's clear that in order to have an effective program, you need to have HR, legal involved in the integrated solution. Talhah: That's right. It's gotta be integrated and it's gotta be collaborative. So that's what we provide. That's what we learned from Dan Costa and all the customers that he worked with. Super valuable information. Raman: All right, thanks for listening. Definitely be sure to subscribe 'cause next time we're gonna be speaking with Dawn Cappelli, who's the CISO at Rockwell Automation.
Episode 2: Predicting your next insider risks
21-09-2020
In this podcast we explore the challenges of addressing insider threats and how organizations can improve their security posture by understanding the  conditions and triggers that precede a potentially harmful act.  And how technological advances in prevention and detection can help organizations stay safe and steps ahead of threats from trusted insiders.  Episode Transcript: Introduction: Welcome to Uncovering Hidden Risks. Raman Kalyan: Hi, I'm Raman Kalyan, I'm with the Microsoft 365 Product Marketing Team. Talhah Mir: And I'm Talhah Mir, Principal Program Manager on the Security Compliance Team. Raman: All right, Talhah, episode two. We're gonna be talking about insider threat challenges and where they come from, how to recognize them, what to do. And today we're talking to Dan Costa. Talhah: Dan Costa, that's right, the man who's got basically the brainpower of hundreds of organizations that he works with across the world, and given a chance to talk to him and distill this down in terms of what are some of the trends and what are some of the processes and procedures you can take to manage this risk. Super excited about this, man. Let's just get right into it. Talhah: Looking forward to this very much, man. And today we have Dan Costa. Dan, you want to just introduce yourself, give a little background on yourself and Carnegie Mellon and all that stuff? Dan: Yeah, sure thing. So Dan Costa, I'm the Technical Manager of the CERT National Insider Threat Center here at Carnegie Mellon University Software Engineering Institute. We're a federally funded research and development center solving long-term, enduring cybersecurity and software engineering challenges on behalf of the DOD. One of the unique things about the Software Engineering Institute is that we are chartered and encouraged to go out and engage with industry as well, solving those long-term cybersecurity and software engineering challenges. And my group leads kind of the SEI's insider threat research. 
So collecting and analyzing insider incident data to gain an understanding of how insider incidents tend to evolve over time, what vulnerabilities exist within our organizations that enable insiders to carry out their attacks, and what organizations can and should be doing to help better protect, prevent, detect, and respond to insider threats to their critical assets. Raman: Wow. Nice. That's awesome. Dan, how did you get into this space? Dan: Yeah, so I've been with the SEI since 2011. I came onboard actually to work on the insider threat team as a software engineer, developing some data collection and analysis capabilities for some of our early insider threat vulnerability assessment methodologies. And since 2011, have really gotten a chance to have my hand in nearly every phase of kind of the insider threat mitigation challenges that organizations experience, not only on the government side, but in the industry as well. So since 2011, I've been able to stand up insider threat programs within the government, within industry, help organizations measure their current security posture as it pertains to insider risk, and try to find ways that organizations can collect and aggregate data from disparate sources within their organization that can help them more proactively manage insider risk. So that's been work, rolling my sleeves up, working with insider threat analysts, spending lots of time with insider threat analysts in the early years, conducting numerous vulnerability assessments and program evaluations, helping organizations explain to their boards and their senior leadership team the scope and severity and the breadth of the insider threat problem, and help folks understand kind of what they already have in place that can form the foundation for an enterprise-wide insider risk management strategy. 
So I've been very fortunate since 2011 to really have a hand in almost every aspect of insider threat program building, assessment, justifying the need to have an insider threat program in the first place. Obviously since then had a lot to do with actually collecting and analyzing insider incident data, not only what we have access to publicly, but also learning from how we've collected and analyzed data here at the SEI over almost 20 years, and help organizations understand how they can use their own data collection and analysis capabilities to bolster their insider threat programs. Talhah: Awesome. Okay. So Dan, one of the things that Raman and I talked about quite a bit is my own journey in this space. I mean, I haven't been fortunate to be in the space as long as you have, but I remember when I came into this space a couple of years back, one of the first places I turned to was Carnegie Mellon. And specifically, CERT. And one of the places you pointed us towards was this treasure trove of knowledge that you have, that you then sort of complement with the OSIT Group to really drive awareness and learning, cross-learning across different subject matter experts. So I'd love to get your story of that journey of how OSIT came about, where was it, where is it going now, and what it looks like going forward. Raman: And for those listening, what does OSIT stand for? Dan: Yeah, so that's a good place to start. It's the Open Source Insider Threat Information Sharing Group. It's a community of interest of insider threat program practitioners in the private sector that are all trying to help their organizations more effectively manage insider risk. And the group really is kind of a grassroots activity that was started by the first director of the insider threat center here at CERT, Dawn Cappelli, who I hear you'll be talking to soon. When Dawn left the SEI to go kind of put her research into practice out in industry, she wanted to establish this community of interest. 
And this is something that Dawn had been working on even while she was here at CERT, which was "How do I establish kind of a community of people who are all kind of going down the same roads within their organizations? How can we learn from each other? How can we benchmark? How can we share challenges faced early on? And how are we getting past and around and through and over those challenges?" So in the beginning, the OSIT Group was really probably a handful or two of folks that were just in the earliest phases of getting insider threat programs off the ground. And over the past six to seven years, we've really seen the group blossom by word of mouth only, into an organization that currently boasts over 500 members representing about 220 organizations in industry, all building out their own insider threat programs. So because of the community building that was successful early on and finding time to get together and talk shop with folks that were going through the same things within their organizations, we've been able, over the years, to continue to grow that. And then to mine the knowledge and the experiences gained by the folks that are building their own insider threat programs and try to find ways to generalize those conversations into resources like our Common Sense Guide to Mitigating Insider Threats, a variety of other research projects where we've been able to leverage the expertise, insights, and real willingness to experiment and try new things that we're finding with those insider threat program practitioners. So we're really there kind of just as stewards of the community. It is governed by members of the group at large. We're there to kind of facilitate conversations and discussions, make connections, and do what we can to either bring research questions out from those conversations or find opportunities to apply the findings from our research into organizations that are currently working on these insider threat challenges. 
Raman: Wow, that's great. When you think about when things first started, the types of challenges that you were facing the beginning to the types of challenges you're facing now with regards to insiders, how have things evolved? Dan: Yeah, that's a great question. Raman: Is risk different, or what's evolved in your opinion? Dan: In the beginning, it was, "What do I call this thing? How do I convince the stakeholders within my organization that I need to work closely with for this to be successful? Information security, human resources, legal. How do I convince those folks to share their time, share their resources and partner with us to get this off the ground? How do we navigate successfully incorporating legal, privacy, civil liberties protections into our data collection and analysis efforts?" And those were really the challenges that a handful of years ago, folks were just starting to wrap their heads around how to address, particularly in the industry space. A little bit different for government insider threat program practitioners, because for cleared populations, not only do you have kind of different expectations for privacy, but you've also got a mandate and a requirement here in the United States to have an insider threat program. So in the absence of a requirement like that for industry, getting that initial buy-in without having to have had your organization experience a harmful or a loss event perpetrated by an insider was one of the earliest challenges. And now that was six, seven years ago. The conversations that are had within that group now are far beyond that. And certainly, as folks come to the group that are in organizations that are just getting insider threat programs off the ground, they're asking the same questions, because those are the natural questions they ask to get started. 
But for the folks that have been at this for several years now and are a little bit further down the road, it's really interesting to see how those conversations have evolved. Lots of organizations now are trying to think about how we most effectively integrate things like a security operations center, a team of insider threat analysts, our data loss prevention capabilities, our fraud detection capabilities. How do we make sure that those capabilities we have within our organizations are integrated, not duplicative? What's the right way to share information between them? How do we see the insider threat program being a force multiplier for managing the employee-employer relationship within the organization? How can we be more proactive in our response strategies to not necessarily figuring out how to recover stolen intellectual property? But how can we leverage what we have internal to the organization to address the concerning behaviors and activity that might precede that harmful or loss event? So it's really been a rapid and fascinating evolution over the past handful of years in terms of the types of challenges organizations are taking on within their programs. Talhah: So I was going to say, although it feels like there's clearly been an evolution in this space, at the same time, it feels like compared to combating external adversaries, we're still very much in the infancy of really getting our hands around insider risk management as an industry. So for those customers that are new, that are coming into this space, that understand that this is a problem, particularly in this day and age of COVID and work from home, what are some of the guidance or tips that you provide? The top three, five things they should worry about to start off on the right footing when it comes to establishing a robust insider risk management program? Dan: Yeah. Great question, Talhah. 
You bring up a good point, which is we've made progress kind of as a community, particularly on the industry side over the past several years, but we're still seeing organizations, and insider threat programs more broadly, struggle with an identity crisis. Which is that it's hard for organizations to pinpoint exactly what they mean by insider threats, what the insider threats to their critical assets are, what insider threats to their critical assets they're actually going to do something about compared to what they already have in place. And because the definition of insider threat is so expansive and overarching, our definition really opens up to just the potential for any misuse of an organization's authorized access to critical assets. So that can span theft of intellectual property, that can completely leave the cyber realm and branch out into workplace violence, that can incorporate things like fraud or theft of intellectual property, IT system sabotage, or even things that aren't necessarily conducted with malicious intent. Because of the scope of what the insider threat problem or challenge is, we see organizations use that word to refer to a lot of different things from organization to organization. So because the scope of the problem is so broad, we see organizations vary greatly in what chunk of this problem they decide to carve off and try to solve. And compounding that even further, even if we scope the program to one or more of those threat scenarios, let's take theft of intellectual property, for example, there is some prerequisite knowledge that has to be kind of understood within the organization to most effectively address that. What intellectual property are we worried about protecting? Who has authorized access to it? What does a normal pattern of access and use look like for that intellectual property? So where we tell organizations to start is know your critical assets. 
Know and understand what it is that you're trying to protect from insider misuse. And over the years, we've seen lots of insider threat programs make the mistake of trying to answer that question on their own, taking their best guess, their best educated guess within their organization, and not really reaching out to finding the folks that might have ground truth or the best answer for their organization. So trying to do these things in a bubble within an insider threat program is an early recipe for calamity, an early recipe for either duplicating effort, or not finding the best right answers for your organization. And also if you can't kind of articulate the scope of what it is that you're trying to protect, you're going to have a really hard time measuring whether or not you've actually been successful at doing the things that you were trying to do. So that's where we always tell folks to start. We have a Common Sense Guide to Mitigating Insider Threats. We're on the sixth edition currently, we're working on the seventh edition now. And there's 21 best practices in there currently that are the foundational things for building an insider threat program. They're ordered intentionally by importance, and the first is know your critical assets. Know what it is that you're trying to protect. And once you're there, work towards developing a formal insider threat program that engages all of the necessary stakeholders across the organization that can help you understand where your critical assets are, how they're currently being protected, where the gaps are, and how the organization is interested in investing to buy down risk to those critical assets in key areas. Talhah: I love that. I love it. And I know that's one of the educations that I certainly got, one of the things that I learned working in OSIT. And the way we frame that is a lot of companies make this mistake. We certainly tried that approach, which is try to boil the ocean. 
And it doesn't work, right? Learned the hard way. You got to be able to compartmentalize your problem space and say, "Out of this ocean of risks that you might have in your organization, what are the most critical ones? How do you prioritize that?" And once you prioritize that, the problem actually becomes a lot more tractable. Then you can kind of divide and conquer in terms of what your prioritized approaches are. In a lot of ways, this is risk management 101, if you think about it. It's like, identify your assets, identify your risks, and then put the processes and programs in place to go tackle it. So, yeah, it makes a ton of sense. Dan: Yeah. So the risk management thing is really interesting because I think it's either best practice three or four: make sure that insider threats are being addressed in organization-wide enterprise risk assessments. It's something that we've been saying for a really long time, and intuitively it makes sense. But in parallel with insider threat program maturity, we're seeing organizations start to get more serious about managing risks across the enterprise in a more structured and in a more data-driven way, in a way that engages the folks that own the business processes. So it's been fascinating to watch the two activities come up in parallel, when a lot of what the insider threat programs are having to do really depends on the organization having those enterprise risk management answers already established. So where we're struggling is when you go to talk to the folks that should know these answers, they don't have the right answers yet. So we're seeing organizations in parallel have to work these two activities, or try to find a way to get them to sync up and align better. And it's more pressing for insider threats as opposed to just broader cyber risk for lots of organizations, because our insiders are the ones that know where our crown jewels are. 
They're the ones that know the things that might not necessarily have externally the most value or the most tangible dollar value associated with impacts, but they know how and where to hurt organizations from an operational perspective. So when we're trying to figure out how bad one of these potential threat scenarios would be if it happened within the organization, those calculations, and figuring out the right answers for those scenarios, can be a lot harder for insider threat programs because we're having to consider the second and third order impacts associated with something like IT system sabotage or something like fraud. So it's been really interesting to watch those two bodies of research and practice grow in parallel. And a little bit of inside baseball, but those two bodies of research at the Software Engineering Institute are housed within the same part of CERT. So it makes intuitive sense to have those things laid out in terms of parallel bodies of research. And what we're seeing is advances in cyber risk management and enterprise risk management more broadly from a data collection and analysis perspective, really translating over nicely into insider threat program operations. Raman: Wow, that's great. One thing as you were talking, Dan, that occurred to me is that there's a lot of, well, not a lot, but a fair number of the insider challenges and issues actually stem from accidental behavior, people being distracted, which of course, with a work from home environment probably gets expanded even more so because there's so many distractions going on. How do companies think about that and how do you advise organizations? Because now as we've spoken to industry analysts and even customers, they're thinking about insider incidents less as threats in general and more as risks. So it encompasses both the malicious and the inadvertent side. And how do you think about that? Or how do you advise organizations in that area? 
Dan: Yeah, so we really buttered our bread on malicious insiders early on here at the Software Engineering Institute. About 2012, 2013, we conducted a foundational study on unintentional insider threats, where someone who wasn't necessarily motivated to cause harm to the organization, either through error or through being taken advantage of by an external threat actor, had their authorized access to the organization's critical assets misused. And a lot of what you'll find in that foundational study is when the motivation and intent differ, there are different response options that become what the organization can and should be pursuing. So once we figure out the intent associated with kind of some concerning behavior or activity that we're seeing, or even a harmful event once it's occurred, we can then figure out the most appropriate strategies to take in terms of response options. Is this someone who needs more training? Have we misconfigured access control, like this person shouldn't have even had authorized access to that asset to begin with in the first place? How do we better educate the workforce about their individual responsibilities to protect the authorized access to the critical assets that they've been given by the nature of their employment with the organization? So it requires kind of a broadening of the aperture of what you consider to be kind of response options for insider threat incidents. And almost even a re-characterization of how you declare an insider incident in the first place. So it's a worthwhile undertaking for organizations because the loss to your organization doesn't really care whether or not there was malicious intent. The bad thing happened and it caused harm to the organization. So what we need to do is understand that the impacts associated with malicious versus unintentional insider threats are relatively equivalent at high levels. 
And from there, broaden our aperture and understanding in terms of what response options the organization needs to take. Once we've been able to infer either we think that there's some malicious intent here or there was no malicious intent here. And that intent inference, that's where we need our human capital folks. That's where we need the contextual data that lives outside of the purview of our technical tools and capabilities. And our friends in the social and behavioral sciences to all be a part of our insider threat program teams and our insider risk mitigation efforts, to help us understand kind of the human aspects and elements of what we're seeing on the technical side of the house. One of the earliest findings that came out of our insider threat research here at the SEI was to take what we call a sociotechnical approach to insider threat mitigation. This is not just a bits and bytes problem. This is a people problem. We have to be able to collect and analyze data by using automated tools, to just deal with the scale and scope of this problem for larger organizations. But at the end of the day, we're talking about people that we brought into the organization, granted a position of trust. We hopefully screened them on their way in, and they were good folks when they started here and they've been experiencing things in their lives that are causing them to kind of go down a path, a path that might potentially lead them to cause harm to the organization. So early, early on, finding those proactive sociotechnical approaches to the problem was a hallmark of our research. And that was amplified as we and other folks started to kind of broaden the aperture to consider unintentional insider threats as a part of the scope of their insider threat programs and insider risk management strategies. Raman: Yeah. So the context is key here, right? And one of the things that you touched on is the sentiment. 
They started out as a good individual, but maybe they got distracted. Maybe they're not happy now, or something's happening, and that's causing them to do something that is causing risk to the organization. The other thing you brought up earlier, which I wanted to touch on, was the preemptive nature of this. One of the things that we discussed with Delinah, and have always talked to our customers about, is that once somebody has downloaded sensitive content from a repository onto their desktop, and then copied that to a USB, you're already like 80% out of the door. What were they doing prior to that? How could we identify that they may be going down this path? How do you all think about that? Because that's one of the questions that we continually get from customers. Dan: Yeah. So early on, when we were collecting and analyzing insider incident data to form the foundation of our understanding of how different types of insider incidents tend to evolve over time, we were looking at the incidents really from the beginning of the insider's relationship with the organization through the final resolution of the incident itself. And what we found in almost every case that we've collected and analyzed was the presence of concerning behaviors and activity that preceded the harmful act associated with the incident, things that, if the organization had either known about earlier or responded to differently, might have taken the insider down a different path that did not cause harm to the organization. So for those different types of insider incidents that we've studied, fraud, theft of intellectual property, and IT systems sabotage, we've developed models that we've mined from the incidents we've collected and analyzed for those particular incident types.
And those models capture not only how the insider attempted to evade detection or how they actually caused harm, but what their personal predispositions were, and what stressors they were experiencing that, combined with those personal predispositions, caused them to exhibit some concerning behaviors: detectable things, either from a technical perspective or from a behavioral perspective, that the organization responded to in some maladaptive way. Either by paying no attention to it, because they didn't think that was something that could lead someone down the path of causing harm, or because they didn't have a detection capability in place and simply didn't know about it. Or they zagged when they should've zigged. A good example of this is in our IT sabotage model, where we've found a pattern of disgruntled insiders being maladaptively responded to by their organizations, through things like sanctions: being demoted, being pulled off of important projects, having their access revoked. And those sanctions, those responses by the organization, led the insider to become even more disgruntled. And you see patterns of this: increased disgruntlement, another sanction, the insider gets more and more disgruntled, and at a certain point they reach the tipping point and decide that now it's time to strike back, motivated by revenge against the organization. Or they decide to leave the organization, and now they're going to take some intellectual property with them to benefit a competitor.
So it's in those kinds of feedback loops between concerning behaviors and maladaptive organizational responses where we found opportunities for organizations to improve their security posture as it pertains to insider risk: by gaining a better understanding of the conditions that precede the harmful act, and by considering a much broader array of response options that might not lead someone to be motivated to cause harm, but might let them feel like they are supported by the organization, that they understand their relationship, from a contractual perspective, to the intellectual property they're creating, and really a myriad of other nuances for those different incident types. So that's, again, something we established early on. It's these patterns of concerning behaviors and maladaptive organizational responses that exacerbate the threats and lead insiders to cause harm to the organization. The work is in finding those feedback loops, proposing different strategies, and then finding ways to measure the effectiveness of those alternative strategies. Raman: Talhah, that was an awesome conversation with Dan. We got to understand how to identify malicious versus unintentional insider risk. Talhah: Yeah, we need more time with him. Raman: Yeah, definitely. I think we have another episode coming up with Dan next, which is around how do you engage with other stakeholders within your organization, like HR, legal, privacy, compliance. Hugely important in terms of building an effective program, right? Talhah: That's right. Looking forward to that very much. Raman: Cool. Be sure to subscribe, and we look forward to seeing you next time.
Episode 1: Artificial intelligence hunts for insider risks
21-09-2020
In this podcast we explore how new advances in artificial intelligence and machine learning take on the challenge of hunting for insider risks within your organization. Insider risks aren't easy to find; however, with its ability to leverage the power of machine learning, artificial intelligence can uncover hidden risks that would otherwise be impossible to find. Episode Transcript: Introduction: Welcome to Uncovering Hidden Risks. Raman Kalyan: Hi, I'm Raman Kalyan, I'm with the Microsoft 365 Product Marketing Team. Talhah Mir: And I'm Talhah Mir, Principal Program Manager on the Security Compliance Team. Raman: All right, welcome to episode one, where we're talking about using artificial intelligence to hunt for insider risks within your organization. Talhah, we're gonna be talking to Robert McCann today. Talhah: Yeah, looking forward to this. Robert's been here for 15 years, crazy-smart guy. He's an applied researcher, a Principal Applied Researcher at Microsoft, and he's been like a core partner of ours, leading a lot of the work in the data science and the research space. So in this podcast, we'll go deeper into what are some of the challenges we're coming across, how we're planning to tackle some of those challenges, and what they mean in terms of driving impact with the product itself. Raman: I'm excited. Let's do it. Talhah: Let's get it. Raman: Robert has been focused on the insider risk space for us for, Robert, how long have you been in this space now? Robert: I've been doing science for about 15 years at Microsoft. The insider risk, about a year I think? Talhah? Something like that. Raman: Nice. What's your background? Robert: I am an applied researcher at Microsoft. I've been working on various forms of security for many years. You can see all the gray in here, it's from that. So I've done some communication security, like email filtering or email attachment filtering.
I've done some protecting of Microsoft accounts or users' accounts, a lot of reputation work. And then the last few years I've been on ATP products. So basically, babysitting corporate networks, looking to see if anybody had got through the security protections, post-breach stuff. So, that's a lot of machine learning models across that whole stack. The post-breach thing is a lot about looking for suspicious behaviors on networks or suspicious processes. And then the last year or so, I wanted to try to contribute to the insider threat space. Raman: What does it mean to be an inside ... or to be an applied researcher? Robert: An applied researcher, that's a propeller head. So we all know what propeller heads are. Basically, I get to go around and talk to product teams, figure out their problems, and then go try to do science on it and try to come up with technical solutions. AI is a big word. There's a lot of different things that we do under that umbrella. A lot of supervised learning, a lot of unsupervised learning to get insights and to ship detectors. I basically get to do experiments, see how things would work, and then try to tech-transfer it to a product. Raman: So, you said you spend most of your time in the external security space, [crosstalk]- Robert: That's right. Raman: ... things like phishing, ransomware, people trying to attack us from the outside. How is insider threat different? What did you find [crosstalk] like, "Wow, this isn't what I expected," or, "Here are some challenges," or, "Here's some cool stuff that I think I could apply"? Robert: Yeah. It's a very cool space. Number one, because it's very hard from a scientist's perspective, which I enjoy. So the first thing that you hit on, the fundamental first thing that makes it hard, is that they're already inside. They're already touching assets. People are doing their normal work, and the insider threat might not even be malicious. It might be inadvertent.
So it's a very challenging thing. It's different than trying to protect a perimeter. It's trying to watch all this normal behavior inside and look for any place where anybody might be doing anything that's concerning from an internal-assets perspective. Raman: So when you think about somebody doing something concerning, is it just like, hey, I've downloaded a bunch of files? Because today I might download a bunch of files. Tomorrow, I might just go back to my normal file activity. But if I look across an organization the size of a Microsoft, that's 200,000 people. That could probably produce a lot of noise, right? So how do you filter through that? Robert: So actually, the solutions that are right now in the product, and what we're trying to leverage to improve the product, are built on a lot of AI things. There's very sophisticated algorithms that try to take documents and classify what's in those documents, or customers might go and label documents, and then you try to use those labels to classify more documents. There's a lot of very sophisticated deep learning, natural language processing stuff that we leverage. And those are very strong signals to try to see, okay, this behavior over here, that's not so concerning, but this behavior right here, that's a big deal. Now we need to fire an alert. Or maybe it's a little more of a deal, but then I've got some sentiment based on how the person, the employee, is doing, and if I combine those things, now it becomes compelling. It's a very hard noise-reduction problem. Raman: As you were talking, Robert, one thing that occurred to me is I've had conversations with customers, and you mentioned this, around leveraging artificial intelligence and machine learning and helping the system learn. A lot of questions I get from customers are like, "What is artificial intelligence in this context? And how do I know that this is something that I should trust, or how is it different than maybe what I'm doing today?"
Robert: I've seen this play out time and time again, where a security team has tried to start leveraging AI to do smart detections. It's a very different game. It's not, "I have precise detection criteria, and if you satisfy that, then I understand what I did, and I understand the detection." It is a very statistical machine, and you have to assume it's going to make mistakes sometimes. So, one key thing you need to be able to do to trust that machine is to measure how well it's doing. You have to have a way to babysit the thing, basically. And you have to set your expectations to understand that errors are going to happen, but there's an error bar that has to be met. So that's basically what you're babysitting against. Robert: Another very key thing is when it fires a detection, that thing can't be opaque. It needs to explain how in the heck or why in the heck it thinks that this thing is a threat, right? So the deep learning folks, like for image classification or natural language processing work, they jumped on board real fast with the deep learning thrust without really worrying too much about being able to explain why that thing was classifying images the way it was. And they were ecstatic because they were getting so much better results than they'd gotten the decade before. Right? But then it came to the point where they started realizing, hey, I can game this thing, and I'll prove it to you. You take a picture and you change a few pixels, and I can make that thing classify the cat as something else. When you use a camera for detecting people, facial recognition and identity verification, that becomes a serious problem. Robert: So they've gone into this phase now, and it's very hot right now: can you do these sophisticated models where you can also explain why they did what they did? And so there's a ton of science and a ton of work trying to crack open the black boxes, right?
Those big sophisticated learners. But you don't have to go to that phase. There's all this other AI that works very, very well and is very effective, and I would say is probably the most common stuff that's used and delivers the most value in industry, that's not so opaque. And the models are simple enough, or I guess transparent enough, or explainable enough, that you can tell a customer, "I detected this threat because this, and this, and this happened." Right? So, explainability is very key to trying to trust AI. Talhah: That brings up another key question we get from customers a lot. This idea of transparency in the model, or explainability in the model, that is a key attribute, right? So it looks like we're learning from years and years of data science and research in this space to apply that into the models that we build. So can you talk about that a little bit? For insider risk, what do you think constitutes a good model? What kind of explainability should be in that model so we can help our customers make the right decision on whether something is bad or not? Robert: Well, you have to put on the customer hat, which sometimes is hard as a scientist. A scientist might be satisfied saying the explanation for some prediction by some model is, "Feature 32 was this far away from a margin." Okay? So there are technical explanations for why a classification might happen. But the customer, they just want to know, "What are the actual human actions that caused that?" You've got to have a model with simple enough features that you can boil it down and say, "This person's suspicious because they printed this document that's highly confidential, and then they did it again two days later, and then they did it again three days later, and then they did it again four days later." And you have to have that very human-intelligible output from your model, which is something that is very easy to skip if you don't have explainability top of mind.
You have to pick the appropriate technologies. Talhah: Because it's really about trying to abstract away all the science behind the scenes, right? We should just be able to easily explain to the customer, "Here's what we saw." How we detected it should be irrelevant to them. Here's what is happening with this potential actor. Let's go make the decision on how to manage that risk. Raman: Yeah. And I think that is the key here, right? As you think about it, there's the tech, which is how do I try to detect these things? And then there's the person consuming the output of the tech, right? And typically, the person consuming the output of the tech is somebody who may be in HR, or may be in legal, maybe they're ... Yeah, they could be a security analyst, but they have to interface with HR and legal. And they may not be as sophisticated. I'm technical, but I'm not as technical, obviously, as Robert and probably you. And so I don't want to go deep dive into some algorithm to try to figure out, "Well, what's going on here?" I want, "Hey, the risk score of this individual is high, and here's the related activity that the system found, and this is why you should believe it." Talhah: Yeah. In fact, we've seen this in our customers. We've seen this in our own experience, in that the people that have to make the timely and informed decision on how to manage insider risk are oftentimes the business or HR or legal. They don't want to get into the technical details behind the model that was used or this, that, or whatnot. They just need something that's easy to understand in business terms so they can make that determination on what needs to happen.
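The kind of human-intelligible alert output described here can be sketched roughly as follows. This is a minimal illustration, not the product's actual model: every feature name, weight, and threshold below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    user: str
    score: float
    reasons: list = field(default_factory=list)

def explain_alert(user, events, score_threshold=0.5):
    """Score a user's activity with simple, auditable features, keeping a
    plain-language reason for every point added, so a reviewer in HR or
    legal can see exactly why the alert fired."""
    score, reasons = 0.0, []
    conf_prints = [e for e in events
                   if e["action"] == "print" and e.get("label") == "confidential"]
    if len(conf_prints) >= 3:
        score += 0.4
        reasons.append(f"printed {len(conf_prints)} documents labeled confidential")
    if any(e["action"] == "usb_copy" for e in events):
        score += 0.3
        reasons.append("copied files to removable media")
    if any(e.get("sentiment", 0.0) < -0.5 for e in events):
        score += 0.2
        reasons.append("strongly negative sentiment in recent messages")
    return Alert(user, round(score, 2), reasons) if score >= score_threshold else None
```

Each point added to the score carries its own plain-language reason, so the alert can be triaged without cracking open the model.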
Rob and I were just on a call with a customer earlier this week, and they raised this question of why can't we do supervised learning for these detectors. So I'd love to get your thoughts on some of the challenges, or maybe some of the opportunities, or how you're looking at the types of learning models that you use for these detectors. Robert: One of the challenges is how much context it needs. If you want labels, you've got to be able to give that context to the customer when they have alerts, right? They need to be able to accurately say, "Hey, this alert's right, and it's easy for me to tell that, and I can do it in an efficient way because the product just gave me an explanation." Now, once you're able to explain yourself and give that to the customers so they can efficiently triage, now you're starting to crack open this virtuous cycle where they can start giving you labels, and you can pull them back in house, and you can start learning how to do supervised classification on this stuff. It's very key. You need this label-generation mechanism, right? Robert: So, that's key for opening up supervised learning. But it's also key in that insider threats can be very subjective. One tenant might want to see a certain activity, and another tenant might say, "Ah, that's not important to me. Don't tell me that, please. That's noise." Right? So now you've got to be able to do classification that's customized per tenant, right? And each tenant doesn't want to go in and fiddle with all your AI to make it work just right for them. An easier way for them to express what they want is to give you feedback. So we explain detections, they give us feedback, and now we can start learning. Okay, this supervised model works for these types of customers. This other supervised model works for these types of customers. And now we can get this customization game going as well.
But all of that, all of those supervised learning techniques, rely on labels, and you've got to do a good job explaining to your customers to get that feedback. Raman: One question, Robert, I also get is around ... Today, a lot of the tools, a lot of my detection capabilities, are reactionary. I got fired or I'm not happy, and I downloaded a bunch of stuff and I'm out of here. I resign. Right? But prior to that, maybe a month prior, or maybe it's four months prior, or even three weeks prior, there might've been some activity that was happening that might've indicated that I was about to do it. Well, can you help me predict? Can you help me be more proactive? And I think, again, I go back to this being a spectrum of things, right? We're not going to know today, is Talhah bad tomorrow? Probably not. Right? But it could be like, hey, review time's coming up. Didn't get the bonus he wanted. He's been working on insider risk for the last two years. And now it's like, "Okay, I'm out of here, man. I'm going to go somewhere else." So I guess the big question I want to ask is, how do we answer that for customers when they ask us that? What would be your answer? Robert: There's something here, and Raman, I think you sort of hinted at it: there's past behavior that we could look at, and we could say, "Okay, from our past experience, this sort of sequence, 10% of the time, ended up with something that we didn't like. So if we see that in the future, let's flag it." So actually, on the technical side, we're doing a lot of work on sequential pattern mining, and it boils down to just that. What are the sequences of activity, based on the type of context that Talhah mentioned, maybe sentiment, maybe something else, that tend to lead up to things that in hindsight we know were bad? Okay, so we're going to use that to predict in the future. But there's also stuff that maybe we didn't see before. So maybe we also look for ... here's some machinery that today ...
Here's sequences that are totally abnormal, but let's go get somebody on them, and let's look at that and let's start getting that labeling loop going, so we can understand if that sequence is good or bad, so in the future, we can protect other people with the same observations. But your question about pre-emptiveness is a good one. And I think the sequential mining aspect is very fun from a technical standpoint. And I think it'd be very valuable for our customers, for sure. Raman: Because I think this is highlighting for me, from a tech perspective ... You know, I'm a marketing guy, so I'm about selling it, selling the story. But as I think about this, what becomes very clear to me is that you can't just use one thing, one signal. It can't just be like, "Oh, somebody is on an endpoint and they tried to copy something to a USB, and that might be bad." There's multiple things going on, right? There's sentiment analysis. There might be other activity. It's who they're talking to, how many times they're trying to access stuff. Did they come into a building when they shouldn't have been in the building? Raman: All of these different elements can come into play, and to Talhah's earlier point, because we're dealing with employees, you can't assume that everybody is bad, right? It could be like, "Wow, I couldn't get my PC to turn on at home, so now I've got to go to the office and do it there." Maybe that was in the middle of the night. I don't know. But I think that's the big challenge in this space from my perspective: you just can't rely on one set of signals. It has to be multiple signals, and the machine learning is key to really driving an exposure of, this could be something that you might want to take a closer look at. You're always going to have a human element, I guess, right? Talhah: That's absolutely true.
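The sequential pattern mining Robert describes, counting which event sequences historically preceded incidents and using those counts to score new activity, can be sketched minimally like this. The event names and labeled histories are invented for illustration only.

```python
from collections import Counter

def mine_precursors(histories, n=3):
    """From labeled activity histories, count every length-n event sequence
    and measure how often it preceded a known-bad outcome."""
    bad, benign = Counter(), Counter()
    for events, was_incident in histories:
        target = bad if was_incident else benign
        for i in range(len(events) - n + 1):
            target[tuple(events[i:i + n])] += 1
    # Fraction of the time each observed sequence ended in an incident.
    return {seq: count / (count + benign.get(seq, 0))
            for seq, count in bad.items()}

def score_activity(events, risky, n=3):
    """Score a new activity stream by its riskiest length-n window."""
    windows = (tuple(events[i:i + n]) for i in range(len(events) - n + 1))
    return max((risky.get(w, 0.0) for w in windows), default=0.0)
```

A real system would weigh in the contextual signals mentioned in the conversation (sentiment, document labels, timing) rather than raw event names, but the shape of the idea is the same: learn which precursor sequences tend to end badly, then watch for them early.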
In fact, this reminds me, when we were establishing the program at the company, we had a whole virtual team put together, and we were trying to ground ourselves on a principle. One of the guys on the team actually proposed something that just stuck, which is: this program should be built on the principle of "assume positive intent, but maintain healthy skepticism." What that effectively means is you just follow the data. That's it. Don't start off thinking everybody's bad. Don't start off thinking you're going to catch bad guys. This is about looking at the data, as much of the data, as much of the context, to Rob's point. And just follow that until you get to a point where it's like, this looks odd. This looks potentially risky. And then you take that information and you surface it for the business, with the right context and the right explainability in the model, so that they can make the decision. Raman: I think presenting that in a way that allows you to make that informed decision does two things. One, it gives you the ability to say, "Hey, this might be bad for me," but two, it also allows you to filter out the noise, to say, "Hey, not everything is bad." Because what I also hear is, "I'm done with ..." Let's imagine using a data loss prevention tool to try to detect insider risk, right? That's challenging because, A, that's just one set of signals. It's a very siloed approach. And B, you're going to be overwhelmed with a ton of alerts, because it's very rules-based, right? It's not [crosstalk] using all this machine learning type of stuff. How do you prevent alert fatigue? And I think that's where you need this combination of signals, to not only look at what might be potentially problematic, but present it in a way that you can then make that informed decision. Raman: So, Rob, as we look forward, there's a number of different types of detections that we could potentially look at. One is sequential modeling.
That's an interesting one, and we'd love for you to explain that. The other one is around this concept of low and slow. From what I understand, it's not about this big burst of, "I come in today, I download a thousand files, and I'm out of here." It's more, "I'm now a little bit irritated, and over the next six months, I'm going to download a file here, a file there, 10 files here." I'd love for you to deep dive into that. Robert: Yeah. I mean, those are the really interesting cases, right? Those are the people that are being very stealthy, right? And the people that we want to try to detect. It's a little bit different of a game. Like you said, the bursty stuff, did they do something abnormal compared to their own baseline, or did they go over some globally agreed-upon threshold where this thing is just bad behavior, right? That's a different game than looking at somebody who's trying to stay under the radar and taking the long-term approach. You've got to model things a little differently. Number one, you've got to look at longer history. I'm not looking at bursts of daily activity. I'm looking at what they've done in the long term. So now you have engineering issues, because you've got to have the scale to look at everybody's rich, long history. But then after you get that, okay, you're monitoring somebody, and it's very hard to tell. I mean, you've looked at stock market charts where there's two very flat ... How do you tell the difference between two flat lines where one's a good investment and one's not a good investment? It's hard, because it's low and it's slow, right? The behavior is subtle. Robert: One thing that we're looking at is how we can tighten the screws when we do anomaly detection, right? It's easy to tighten anomaly detection to the level of detecting a burst. Okay? You can do that, right? Now we want to tighten anomaly detection to the point where we can pick out two flat lines and tell the difference between good behavior and bad behavior. Right? What does normal mean?
I mean, normal's got to be right in between those two. How do we find that normal, right? So the way that we're doing that is we're modeling people based upon what's normal for groups of similar employees, right? How tightly can we say what's normal behavior for devs, so that we can have a model that looks at low and slow normal work behavior for devs, and low and slow, a little bit worse than normal, behavior for devs, and pick those apart? So you've got to do tighter anomaly detection, and you've got to compare people to groups. That's going to give you a definition of normal behavior that's tight enough that, even though they're low and slow, you're going to be able to pick out the different behavior over a long period of time. Talhah: So Rob, just a couple of fun things on the side. Being a long-term researcher, what are some of the pet peeves, some of the things that have really annoyed you about maybe some of the product pitches you've seen, where they maybe over-promise, or the way they position AI, or ... I'd love to hear some of the stories that you have on what just gives you the shivers. Robert: As scientists, we have a community, and we go talk to each other, and you get to know people, and you figure out what's really behind that magic sauce. And it's not as impressive-sounding as the marketing. So that means the marketing is doing a good job, I guess. Right? But that's sort of a pet peeve from a scientist's standpoint. I mean, a good sign that you should see to prove that stuff out is scientific activity. If they say they're doing good science, they probably have scientists working for them. And if they have scientists working for them, then those scientists like to do things like publish, or file patents, or go out and ... You should see some scientific evidence happening there. I think that's a telltale sign. So that's one pet peeve: overselling how much is going on there.
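The peer-group idea Robert describes for low-and-slow detection, comparing each person's long-run rate to a baseline built from similar employees in the same role, can be sketched like this. The roles, rates, and threshold here are hypothetical illustrations.

```python
import statistics

def peer_zscore(user_rate, peer_rates):
    """How far a user's long-run activity rate sits from the mean of
    peers in the same role, in standard deviations."""
    mean = statistics.mean(peer_rates)
    stdev = statistics.stdev(peer_rates)
    return (user_rate - mean) / stdev if stdev else 0.0

def low_and_slow_flags(weekly_counts, role_groups, threshold=2.5):
    """Flag users whose flat-looking long-term rate is still abnormal
    for their peer group, even without any bursts."""
    flags = []
    for role, users in role_groups.items():
        # Long-run rate per user, e.g. mean files accessed per week.
        rates = {u: statistics.mean(weekly_counts[u]) for u in users}
        for user, rate in rates.items():
            peers = [rates[p] for p in users if p != user]
            if len(peers) >= 2 and abs(peer_zscore(rate, peers)) > threshold:
                flags.append((user, role, round(rate, 1)))
    return flags
```

The point of the sketch: a burst detector would never fire on a user whose rate is flat week to week; only the comparison against role peers makes the quietly elevated rate stand out.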
Robert: Another pet peeve is this idea that machine learning or AI is a magic bullet that you just throw stuff at and it magically gives you exactly what you want. It doesn't work that way. Computers are basically just big, really fast calculators, right? And we've figured out some algorithms where they can look at some data and pick out some patterns quickly, but that's what they are. They're pattern finders. The scientific community has been clever in how they take that big, fancy calculator and put it to work making some business decisions that are crucial, and stitching them together. Like we talked about, you know, here's a module that does sentiment analysis. Big, fancy calculator, right? Here's a module that classifies the confidentiality of a file. Big, fancy calculator. And then there's all this business stuff that comes in that has to stitch that together to make a good decision. It's not just the AI. It's the stitching together in the appropriate ways that solves your business problem that's really the magic sauce, right? So that's another pet peeve: you just throw stuff into AI and then you suddenly have a million-dollar business. It doesn't work that way. You've got to put these components together and work hard on them, because they're challenging, but you've got to stitch them together correctly. It's the whole ecosystem. Raman: And that's actually an interesting point, Robert. I like that, because in a way, what you're saying is, okay ... Imagine I'm creating clothing, right? And I've got different types of fabric, different types of zippers. And I stitch it together and I produce it, and it's like, "Hey, here you go. Here's your shirt." And somebody says, "I don't like it that way. I want to be able to stitch it in a different way." Or if new fabric comes out, I'm going to use that in new types of clothing.
And I think this is what to me is interesting about what you just said, which is you've got these different calculators that are looking at different parts of the puzzle, right? Taking different signals in. And then the secret sauce is how you stitch it together to produce something that you might want to consider as being an anomaly or abnormal behavior, but then be able to provide feedback back into that calculator to say, "Hey, I didn't like that," or, "This didn't work for me. Stitch it together somewhat differently." Robert: Yeah, you're right. I mean, like we've talked about before, how do you trust these black boxes? It's all that logic that babysits it. You've got to have some guardrails in there so the thing doesn't go off the rails and mess up everything else that you're stitching together. It's that business logic on top that's super, super valuable, and just as impressive to me as the AI under the hood, to tell you the truth. Raman: Robert, appreciate you being here today. This has been a great, great conversation on the tech. As you think about the future and where we go from here, where we see ourselves five years from now, what are your projections in terms of what might be different than what we have today? Robert: Yeah, that's a great question. I think some of the big things are solving these challenging pieces, like the multi-user scenarios Talhah mentioned. We solve multi-user. We get good enough anomaly detection that we can pick off the low and slow, and even differentiate that. I think one thing that would be super powerful, if you get to it, is if you get this feedback loop going, right? Because once you get this feedback loop going, then you crack open the AI door for all kinds of algorithms. There's a lot more supervised stuff that we could use and leverage that would make us even more powerful, which would give better detectors to people, which would give us more labels to get even more powerful.
And when you get that mutual synergy going, I think the detections skyrocket. Robert: And then one other thing the industry hasn't really done ... Like in the external attack space, industry has these threat matrices, right? And they have these benchmarks that they're trying to work against, and they're writing down simple rules to detect that, and they're using sophisticated AI targeted at known bad behaviors. I would love to see that sort of landscape roadmap start happening in the insider threat space as well. Because it's going to prioritize what we do from a product standpoint and also from a research standpoint, and it's going to be an input to our models. "Hey, this is known bad stuff. We better be able to detect that." Stitch things together to detect those sequences. Raman: Thanks again for being here. Talhah, always good to talk to you, man. And I know we have another episode coming up with, I think it's Dan Costa from Carnegie Mellon, right? Talhah: That's right, looking forward to that very much. And Robert, again, appreciate it, man. Raman: Yeah. Excellent. Thank you so much, Rob, thanks for- Robert: Thanks for inviting me. Tell Dan I said hi. Raman: Yeah, I will. Talhah, that was an amazing conversation with Robert. Talhah: Yeah. Raman: That guy is like surreal. Talhah: I told you. Raman: I know, it's amazing. Well, if you enjoyed this, we've got another podcast coming up with Dan Costa from the CERT Institute at Carnegie Mellon. Dan is engaging with a lot of players like the NSA and Secret Service. So it's gonna be great. Definitely subscribe and continue listening.