How to QA your ML models to deliver quality AI

Free on-demand webinar

You would never deploy software without rigorously testing it first. So why are we deploying ML models without rigorous testing?

Building a robust machine learning model is not an easy task. After all, most POCs don’t make it into production. And even if they do, you still need to monitor their performance.

Want to know how your team can build performant, tolerant, stable, predictive models that have known and fair biases? How can you make sure your models yield their value over time and stay performant after your team has deployed them? What are the current practices of model validation (or lack thereof), how are they flawed, and how could we improve them?

We’ll go through the reasons behind using an efficient validation framework that goes beyond the common metrics used by ML practitioners and why these tests matter when building high-quality models.

Simon Shaienks, Product Marketing Manager, and Simon Dagenais, Data Scientist, will guide you through our model validation framework and how you can leverage Snitch AI to validate your own models.

Webinar Transcript

Below you’ll find the webinar transcript. We’ve obviously used AI to generate it, so maybe you’ll see some incoherence. Let us know if you do! 🙂

Simon Shaienks

Simon, the floor is yours!  

The genesis of an AI system 

Simon Dagenais   

Thank you, Simon! Oh, yeah, we forgot to show our nice pictures. All right.  

So before getting into the meat of it, and talking more specifically about the tests that we can perform, I wanted to share more of a personal story of mine that led to why I think QA is so important today.

So before I was working at Snitch AI, I was actually a data science consultant working on various projects with different industries and themes and whatnot. One of my first projects was with a telecom. This telecom wanted to use machine learning to target customers. So a really classic churn example, right?

And so they wanted to use their internal system to better target customers and create advertisements, so that the customer would not change company during their contract. I was super excited: I was using data to do something really concrete, outside of the Kaggle environment and what we were used to doing when we were learning as data scientists.

So the team and I started, and we were so excited. We started working on the data and on this classification model, and we reached a really good AUC score, close to 94, and a really good accuracy. We were really stoked to show this to the marketing manager at the telecom. We felt like we were going to revolutionize their practices and really begin their journey of using machine learning to solve their problems.

But when we showed this ROC curve and this confusion matrix, rightfully so, the manager was a bit confused. She had never heard of all those technical metrics and that statistical language. But she decided to take this leap of faith and go ahead with this machine learning model, because the performance was really great and it seemed like a good thing to do. She was a bit confused, but still decided to go.

The failure of an AI system 

But unfortunately, the performance that we thought we were going to get, like on our test sets, did not exactly replicate itself once in production. Instead of a 94 AUC score and around 90% on the precision metrics, we were more around the 60s, something really underwhelming. That started a downward spiral of trust, and maybe something that would really slow down their adoption of new technologies or machine learning in the company.

People did not know why and how the model was underperforming so much. People were really sceptical of its ability to perform, because there was so much at stake, right? For every customer we were targeting, there were a lot of marketing expenses going to those potential churners, those potential customers that would leave.

So they were a bit reluctant to keep going in that sense and to continue adopting the model. Of course, as a data scientist, that’s a bit of a nightmare, right? We want to avoid that situation as much as we can.

The end of an AI system 

And I think the topic of the talk today is: how can we make sure that our models’ performance won’t degrade once in production? But also, from more of an AI adoption or transformational point of view, how can we make sure that the stakeholders surrounding the model trust the model and the technology that is being built, and trust the team that is putting all those efforts into crafting better models and better quality? That way they will also have more willingness to pursue their efforts and keep trying to adopt AI, even if there are bugs or a malfunction with the system. We also wish to understand why those things happened.

Existing QA approaches in Soft. Dev. 

So I think one thing we can do as data scientists is to take inspiration from the QA approaches that exist in software development. And let me drink a little water here. Software development has been around for years, right, like more than 30 years, while data science has been around for only a few years. I’m sure that most of the data scientists on the call here probably learned most of what they’re doing on the internet, or perhaps took a few classes at university. It’s something that is still evolving; it’s a bit of the Wild West.

But on the other side, many of our coworkers are software developers who have methods that have been around for so long and that, I think, can inspire us to deliver better quality consistently. So what do software developers do? Well, they perform tests when they create their code, when they create their software, such as unit tests, regression tests, or end-to-end tests. You’ve maybe heard those words before. All in all, what they’re trying to achieve is to put their programs into unlikely scenarios or regular scenarios and validate that the output is what they wanted. So they might simulate different user behaviors. They might put a big load on their system and make sure that their software responds in the appropriate way. In turn, what this does is guarantee quality. I’m sure you’ve experienced far fewer bugs when playing video games this year than in past years; there’s constant progress. It also builds accountability for developers. When they create some chunk of code, well, they have assured its quality, and they did most of what was in their power to make sure that once the software is in production, it won’t fail. It also allows them to catch costly mistakes before they happen.

So by troubleshooting their programs, they know what the worst possible scenario is and when the software will fail; they push their software to the limit. And that’s perhaps something we want to do as well with our machine learning models: in extreme scenarios, how will they respond? So why are there no such approaches in machine learning?

Why are there no systematic QA approaches in ML? 

So like I said, it’s also a really immature field that has not been around for long. But when we think of it, machine learning models are more often than not a complex set of rules that are just condensed into a few lines of code. When we create our model, its predict function can sometimes be a very tiny part of a software. But models are not so different from just a complex set of rules, right? Like regular software, they are also subject to unexpected inputs, and they have relationships with other components of the software. When we think of unexpected inputs, we can think of unexpected types of distribution. Our models look at different types of data and different types of phenomena, and the data that is fed to them can change depending on the nature of the problem that we’re observing.

I think most importantly, we expect the output of our models to be really consistent, reliable and usable. In my telecom case, the company and the stakeholders of the model expected the accuracy, the performance of my model, to stay the same no matter what kind of data was thrown at it, or whatever kind of phenomenon was happening with my data. We can also think of more sensitive machine learning use cases, such as the health system or self-driving cars. We do not expect our model to fail, because that can lead to really disastrous outcomes.

So just like with software, with machine learning models nothing should be left to chance, basically. So how then should we evaluate, or make sure, that there is nothing wrong and that, in production, our model will keep performing as we initially planned?

How should we perform QA on ML models? 

Well, I think there are three main areas to cover, to dig into, when we’re crafting a machine learning model. Within our ML cycle, there are three things we should look at with more attention.

First is to derive explainability, or to uncover and understand the model’s core functions. In order to assess whether the reasoning of our model actually makes sense, and that there are no flaws in that reasoning that could affect its performance down the line, we should try to see what those rules are. Maybe it sounds a bit cliché, but you open this black box and look at which features are the main contributors of our model. We should also gain insight into its response to alternate inputs. More often than not, not all the cases that are possible in nature are included in our training and testing sets, especially when we have small data sets, so our model is not trained on all the possible scenarios. We might therefore be interested in looking at how our model responds to scenarios slightly different from what it’s used to.

We want to make sure that its response won’t be totally ludicrous or erroneous. Something that’s also really important is what we feed our model. Before feeding it something completely different, we should validate that the examples our model is ingesting somehow resemble the ones it’s used to seeing.

To give a bit of an analogy here: it’s like a student in the classroom who has seen additions and learned how to compute really simple examples. We don’t want to expose them in the real world to multiplication or really complex algebra. Our model will be able to respond on things that it has learned in the past. And then you can go ahead.

An efficient framework for validation 

So here, I will propose this framework for validation, and I will dig more into the three areas that I just mentioned. Here’s a bit of the agenda of the tests that you can perform on your machine learning models before putting them into production.

So we’ll start by talking about feature explainability and what kinds of tests we can do with the data that it generates. We’ll also talk about robustness to random and targeted noise and about detecting data drift, and I’ll briefly mention some other tests that are out there and that any data scientist can perform.

Feature explainability related tests (1) 

Oh, yeah, before getting to feature explainability: for people who are maybe not familiar with what it is, there are frameworks that are model agnostic, which means they are transferable to any kind of model, and that allow the user to understand which features contribute more to our prediction.

In the little example shown on the screen, we can see that age and sex have a bigger contribution, or more importance, to the output, which is 0.4, than BMI or dp, whatever that means. When we use those frameworks on our models, which can be any type of model, such as random forests, gradient boosting or neural networks, we derive information that becomes very important in assessing the quality of our model. It can be used for really different purposes, which are really useful, and I will dig into that.

I encourage you to look into SHAP and LIME, which are some really popular frameworks to derive feature explainability. Here, I will give more details on the kinds of tests that we can perform and which risks to avoid. And to go back to the telecom example I was telling you about, we’ll try to find which risks we could have avoided to make sure that our model would still be performing once in production.

One risk that we all face as data scientists is having a really complex data pipeline. The more data sources we use, the greater the chance that there are problems with preprocessing, or errors coming from the data sources. Basically, our model relies on many more events that should all go correctly to create a relevant prediction. Let’s say a certain data source is treated differently upstream: it will definitely affect our model downstream. In my telecom example, we were connected to different parts of a CRM, and we didn’t know what happened to that data. Sometimes there can even be manual transformations of the data, especially in industries that do not rely on a completely automated cloud.

We should try to simplify our data pipeline as much as possible to avoid those kinds of mistakes. One thing we can do to simplify this pipeline is to look at our feature importance and basically take out the least important features. Our model may rely on 100 different features, and we may realize that, essentially, there are maybe 30 to 50 variables that have a marginal contribution to the output. The rest of the variables that we use are basically ignored, because the model does not perceive any predictive power in those features. By eliminating those features from our data sets or from our model, we can avoid a ton of mistakes and maybe get more stability in our predictions.

By using SHAP, we can identify those features and basically prune our model and our data sets. That’s a really simple thing to do that can avoid mistakes that we do not intend to make. Well, we never intend to make mistakes.
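The pruning step described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not the speaker’s tooling: it swaps the attribution framework for plain permutation importance (shuffle one feature and measure how much the error grows), and the synthetic data, the stand-in `model` and the `min_share` cutoff are all assumptions made for the example.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, seed=0):
    """Increase in error observed when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, shuffled, targets) - baseline)
    return importances


def features_to_keep(importances, min_share=0.01):
    """Indices of features holding at least `min_share` of the total importance."""
    total = sum(max(imp, 0.0) for imp in importances)
    if total == 0:
        return list(range(len(importances)))
    return [j for j, imp in enumerate(importances) if max(imp, 0.0) / total >= min_share]


# Hypothetical data: the target depends on features 0 and 1; feature 2 is ignored.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
targets = [3.0 * r[0] - 2.0 * r[1] for r in rows]


def model(row):
    """Stand-in for a trained model's predict function."""
    return 3.0 * row[0] - 2.0 * row[1]


importances = permutation_importance(model, rows, targets, n_features=3)
kept = features_to_keep(importances)
print("importances:", [round(i, 2) for i in importances], "kept:", kept)
```

With a real model the same idea would be applied to held-out data, and the cutoff is a judgment call: it is worth re-checking validation performance after pruning.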

Feature explainability related tests (2) 

The second one is when our model learns erroneous or non-replicable patterns. Here, I’m showing you a beautiful picture of a husky and a corgi.

We might create a model that learns that snow is a good predictor for the husky breed. Our model might use the snow in the background to create a prediction, and it might predict that this beautiful corgi is not a corgi but a husky. And of course, we don’t want that to happen.

In the case of the telecom, it might be that our model learns the importance of a certain promotion, for instance, but that promotion changed and is not something that will happen again in the future. That will basically bias the prediction of our model.

We should try to uncover the main predictors of our model, just to make sure that there’s a sense to what our model is doing and that the pattern it has learned will be replicable in the future. Sometimes, what we need is more subject matter experts to evaluate whether our model is making relevant connections between the data and the actual problem.

Something we can also do as data scientists that could reduce the potential impact of this problem is to perform adversarial training and data augmentation. When we perform adversarial training, we see that far fewer unimportant features become important. Data augmentation would allow us, in this case for instance, to transpose our husky onto different backgrounds, to change the color of the background, or to crop the image. Those transformations of the data would make sure that our model does not only learn from the background, but more from the corporal features of the husky, for instance its eyes, its nose or its nice little ears.

Feature explainability related tests (3) 

One other important risk that could happen with any machine learning model is concept drift. To give you a bit of an overview of what concept drift is, think of it as my problem changing: the pattern on which my model should base its prediction has changed. In our telecom example, we could think that previously, in my training data for instance, what led to customer churn was...

Oh, thank you. Yes, the talk will be recorded and sent to registered participants. Thank you, anonymous attendee, and sorry for cutting off. So the pattern may change, right? We might think that outages or failures in the network lead to customer churn. But in the future, the pattern that leads to churn is maybe the promotion of a certain competitor.

So we see that there’s definitely a transformation in which features should be important and in the way our model should create its predictions. By evaluating feature importance through time, we can distinguish which pattern leads to our outcome of interest. So feature importance is also something that should be constantly evaluated and monitored. Because if the pattern that leads to customer churn changes, then our past data may not be relevant anymore. Everything that we had learned may be thrown in the garbage, and we should maybe start over again.

And I think concept drift is something that might also have happened to a lot of your models during COVID-19, right? We can think of the amount of toilet paper that was bought at a certain point: the pattern behind customers buying toilet paper was not that they needed it, but more sheer panic, probably. So we can think of something that radically, or just slowly, changes the way we should create those predictions.

So the feature explainability I was showing earlier should also be monitored. Because if we see too much of a change in the system, then the model should either be retrained with more recent data, or we should maybe include features that might explain those changes. We should also maybe change the preprocessing that we do on our data to make sure that our data set includes more of the examples that are relevant to the concept we’re trying to learn in the future.
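One simple way to monitor feature importance through time is to compare each feature’s share of the total importance between a reference period and a recent one. The sketch below assumes the importances have already been computed per period (for example as mean absolute attributions); the feature names, the numbers and the 0.10 threshold are hypothetical and only echo the telecom story.

```python
def normalize(importances):
    """Convert raw importances into shares that sum to 1."""
    total = sum(abs(v) for v in importances.values()) or 1.0
    return {name: abs(v) / total for name, v in importances.items()}


def importance_shift(reference, current, threshold=0.10):
    """Features whose share of total importance moved by more than `threshold`."""
    ref, cur = normalize(reference), normalize(current)
    names = set(ref) | set(cur)
    shifts = {name: cur.get(name, 0.0) - ref.get(name, 0.0) for name in names}
    return {name: round(delta, 3) for name, delta in shifts.items() if abs(delta) > threshold}


# Hypothetical importances for the period used in training and for a recent period.
training_period = {
    "network_outages": 0.50,
    "monthly_bill": 0.30,
    "competitor_promo": 0.05,
    "tenure": 0.15,
}
recent_period = {
    "network_outages": 0.15,
    "monthly_bill": 0.25,
    "competitor_promo": 0.45,
    "tenure": 0.15,
}

flagged = importance_shift(training_period, recent_period)
print(flagged)
```

A large shift does not say what to do next; it is a prompt to investigate whether the model should be retrained or whether new features are needed.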

Robustness to random and targeted noise 

So, robustness to random and targeted noise. There’s a risk, with any data set actually, but I think especially with small data sets, that our model’s output will vary significantly when confronted with slightly different inputs. This especially happens with overfitting. But what I want to illustrate here is perhaps best shown by the picture on my screen, the panda and gibbon example.

So with these pictures of the panda, the picture on the left and the one on the right are indistinguishable to a human. If you were to look at them at random, you would not be able to guess which one is altered or perturbed, which one has noise in it. But the model on one side predicts a panda and on the other predicts a gibbon. Why is that? That’s perhaps because our model learned patterns that are not replicable in nature, so that even when we transform our data a little bit, everything fails. And if a model that I was building, like an image classifier, were to give this gibbon prediction, I would be really concerned about its ability to predict accurately on new scenarios that might just be different from what it has seen before.

What we want to target is more robustness from our models, so that with any kind of input our model will be able to predict the correct outcome. In order to evaluate whether the model I just trained is the type of model that would predict a gibbon when it’s clearly a panda, I should evaluate my model’s performance with random or targeted noise in the inputs. In this example, obviously, this is a targeted transformation. It means that, for the smallest change that we can make to our model’s input, how big will the loss be? How can we maximize the decrease in performance of our model?

So that’s something we should evaluate, because the example that we’re showing here might also happen in nature, right? There’s no guarantee that this gibbon picture won’t come up in your production data sets.
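The difference between random and targeted noise can be illustrated on a toy model. The sketch below assumes a hypothetical linear regressor (`WEIGHTS`, `BIAS`) and synthetic data; for such a model the direction that increases the squared error is known in closed form, so the targeted perturbation is a sign-of-the-gradient step, in the spirit of the fast gradient sign method, with the same per-feature budget `eps` as the random noise.

```python
import random

WEIGHTS = [2.0, -1.0, 0.5]  # hypothetical linear model standing in for a trained one
BIAS = 0.1


def predict(x):
    return sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS


def mse(rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def random_noise(rows, eps, rng):
    """Shift every feature by a uniform random amount in [-eps, eps]."""
    return [[v + rng.uniform(-eps, eps) for v in r] for r in rows]


def targeted_noise(rows, targets, eps):
    """Shift every feature by eps in the direction that increases the squared error.

    For a linear model, the gradient of the squared error with respect to
    feature j has the sign of residual * weight_j, so no autodiff is needed.
    """
    perturbed = []
    for r, t in zip(rows, targets):
        residual_sign = 1.0 if predict(r) - t >= 0 else -1.0
        perturbed.append([
            v + eps * residual_sign * (1.0 if w >= 0 else -1.0)
            for v, w in zip(r, WEIGHTS)
        ])
    return perturbed


rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in WEIGHTS] for _ in range(200)]
targets = [predict(r) + rng.gauss(0, 0.5) for r in rows]  # noisy synthetic labels

eps = 0.1
clean = mse(rows, targets)
loss_random = mse(random_noise(rows, eps, rng), targets)
loss_targeted = mse(targeted_noise(rows, targets, eps), targets)
print(f"clean={clean:.3f} random={loss_random:.3f} targeted={loss_targeted:.3f}")
```

For the same budget, the targeted perturbation is the worst case, so the gap between the two losses gives a rough sense of how much headroom an adversary, or an unlucky input, has.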

There are different ways, of course, to make sure that our models are more robust and that they will be able to create predictions that are more reliable in the future. It’s either by performing data augmentation, that is, creating those altered inputs when we’re training our models, or by creating those targeted transformations of the inputs during training, so that my model will not be producing those gibbon predictions.

For those who are curious, I’m sure maybe half of you have already looked up on Google what a gibbon looks like. But here’s one, which I thought looks very much like my colleague. The next thing we should also look at is what our model will ingest.

Data drift 

So there’s actually a big risk that there are differences in the distribution of the input that my model is expecting. Here the pattern might not change, like it does with concept drift; it’s just that our data is changing and our model has never seen that data in the past. Think of your model as using hundreds of different features that come from, let’s say, this telecom example. It might happen that all of a sudden the expenses or the age of my customers change drastically, and in this scenario the model is not able to respond accurately to this change in the data that it sees.

So with those unseen inputs, my model’s performance will decrease. Like on the top right graphic, with time, my model quality decreases because the data that is shown is changing. So what can we do to evaluate whether this is the case with our model? We can evaluate whether the distribution of incoming data is similar to the training data. There are tons of tests to perform to assess whether there is data drift: one might be the population stability index, the Kolmogorov-Smirnov test, a t-test, or just looking at the shape of the distribution. But we cannot assume that our model will perform at the expected level when it’s confronted with different inputs.
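As an illustration of one of the tests named above, here is a minimal sketch of the population stability index for a single numeric feature. The equal-width binning over the training range, the small floor for empty bins, and the synthetic `age` samples are assumptions made for the example; a commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but the cutoff is ultimately a judgment call.

```python
import math
import random


def population_stability_index(expected, actual, n_bins=10):
    """PSI between a reference sample and a new sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range fall into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical customer ages: training data, stable production data, shifted production data.
rng = random.Random(7)
training_age = [rng.gauss(40, 10) for _ in range(2000)]
production_same = [rng.gauss(40, 10) for _ in range(2000)]
production_shifted = [rng.gauss(55, 10) for _ in range(2000)]

psi_same = population_stability_index(training_age, production_same)
psi_shifted = population_stability_index(training_age, production_shifted)
print(f"stable: {psi_same:.3f}  shifted: {psi_shifted:.3f}")
```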

There are different actions that a data scientist can take to avoid this problem. Like we see here on the bottom graph, our model is retrained regularly, after some time, to make sure it is up to date with the new distribution. But we can also just cut the features that are drifting. Obviously, if those features are of great importance in the prediction of our model, then it’s something we would not want to do. But if a feature has a small contribution and is drifting a lot, then we might want to drop it, because it might just cause more erroneous predictions. And there’s a bunch more.

These are the tests I wanted to cover today. Of course, there are different tests that any data scientist should perform within the machine learning project cycle, one of which is model simplification. Like with the pipeline that I was mentioning earlier, which we want to simplify, our model should also aim to be as simple as possible for the use case we’re trying to create a model for. So we should test whether there are models that are simpler but obtain the same level of performance.
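The simplification check can be framed as a small selection rule: among the candidates whose validation score is close enough to the best one, prefer the least complex. The candidate names, complexity proxies, scores and the 0.01 tolerance below are made up for illustration.

```python
def simplest_adequate_model(candidates, tolerance=0.01):
    """Pick the least complex candidate whose score is within `tolerance` of the best.

    `candidates` is a list of (name, complexity, validation_score) tuples, where
    higher scores are better and `complexity` is any agreed-upon proxy
    (number of parameters, number of trees, number of features, ...).
    """
    best_score = max(score for _, _, score in candidates)
    adequate = [c for c in candidates if best_score - c[2] <= tolerance]
    return min(adequate, key=lambda c: c[1])


# Hypothetical validation results; the numbers are invented for the example.
candidates = [
    ("logistic_regression", 20, 0.81),
    ("small_gradient_boosting", 5_000, 0.885),
    ("large_gradient_boosting", 50_000, 0.89),
    ("deep_network", 2_000_000, 0.888),
]

chosen = simplest_adequate_model(candidates)
print(chosen[0])
```

Here the small boosted model is retained because it gives up only half a point of score for a tenth of the complexity of the best candidate.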

And we should test for data leakage. That’s when a feature contained in our training set and our test set is a perfect predictor that won’t necessarily be there at the time of prediction. That falsely boosts the performance of our model.
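A crude first screen for leakage is to look for any single feature that predicts the label almost perfectly on its own. The sketch below uses the Pearson correlation with a 0.95 threshold, which is an assumption and only catches simple linear leaks; the churn columns are hypothetical, with `cancellation_fee_charged` standing in for a value that is only known after the customer has left.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def suspicious_features(columns, target, threshold=0.95):
    """Names of features whose absolute correlation with the target exceeds `threshold`."""
    return sorted(
        name for name, values in columns.items()
        if abs(pearson(values, target)) > threshold
    )


# Hypothetical churn data set.
rng = random.Random(1)
churned = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
columns = {
    "monthly_bill": [rng.gauss(60 + 5 * c, 15) for c in churned],
    "support_calls": [rng.randint(0, 5) + c for c in churned],
    "cancellation_fee_charged": [float(c) for c in churned],  # leaks the label
}

flagged = suspicious_features(columns, churned)
print(flagged)
```

A flagged feature is not automatically a leak, and an unflagged one is not automatically safe; the question to ask for each feature remains whether it will really be available at prediction time.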

And we should also look at whether there’s overfitting, that is, whether our model learned a pattern in a really sensitive way and did not learn something closer to the true underlying function of our data. And, well, no one likes stories with sad endings.

The alternate fate of an AI system 

So I just wanted to give you an alternate ending to the story we were previously listening to. By performing all those tests, hopefully, the data science team is able to recognize that the incoming data is somehow different from what the model has seen during training, and they’re able to craft this more robust model, right?

On top of that, by performing all this QA and those tests, they’re able to derive feature explainability and to show the stakeholders how the data is moving through time. The stakeholders feel involved, and they are able to understand on which basis the model makes its predictions, right? It does not become this kind of scary black box, but more of a living environment that is really dynamic.

And that’s something we should expect in production, right? There’s no such thing as a nice Kaggle environment in the real world. They will also understand the associated risks of using this model and, most of all, they will know that proper due diligence and quality assurance was conducted by the data science team. Here I will leave the floor to Simon.

Simon Shaienks   

Thanks, Simon, I know what a Gibbon looks like now, so I’ve learned something useful today. Thanks. 

It’s now time for the second part of the webinar. Today, we’ll go through a quick tour of Snitch AI. But I want to first do a quick recap of what we discussed so far.

We started by looking at the existing QA approaches in machine learning, or lack thereof, I should say, as one of the main reasons why ML projects fail to get into production or, in the case of Simon, fail while actually in production. Then we saw how we could take inspiration from software engineering, which has mature testing practices that have greatly improved, over time, the quality of the code that we release.

And through those very cute corgi and panda examples (thank you for that, Simon), we saw what we can gain from a systematic and rigorous approach to testing in machine learning.

That’s all great. But the question now is: okay, how can I test all this? What are the solutions?

You can do what most people do, which is nothing or practically nothing. And we saw with Simon’s telecom example what that led to.

You can craft your own solution. And yes, there’s plenty of open source and academic literature on testing machine learning, but it requires a lot of time to investigate, run, interpret and maintain, and you’re not even sure from the start whether these are the right tests to begin with. And you still don’t have that systematic, scientifically backed approach to testing. Again, we can take inspiration from software developers: they don’t program their own testing tools, they use readily available tools. So that was a big introduction.

But that’s why we built Snitch AI, our model validation tool. We’ve basically taken the framework that Simon explained and integrated it into an easy-to-use tool for the data science team. You simply have to add your training and testing data sets and your trained model to our cloud or on-premise environment and let Snitch AI run a series of mathematical tests to assess its quality.

But I want to talk about that framework, because your model validation is only as good as your framework at the end of the day. So we’ve worked with two renowned research institutes here in Montreal, Montreal being one of the leading research clusters in AI in the world: Polytechnique Montreal and Ivado. Through our partnership, they evaluated and refined our model validation metrics to make sure that they are accurate. So basically, you don’t have to go through all the academic literature, including that famous gibbon example, like Simon had to.

Perfect, so I’ll go into Snitch AI; I’m going to share my screen. Instead of walking you through the tool itself, what I did is prepare four different use cases that are important when deploying or monitoring your models.

The first one: the Ops team wants to know if a model is ready to be put into production. The second one: the data science team manager wants to know if all the projects are properly audited. The third one: the data science team wants to know where and how they can improve the quality of their model. And finally, the data science team and manager want to know if a deployed model is still performing. So let’s go into Snitch AI for the first one, where the Ops team wants to know if a model is ready to be put into production.

I just logged into Snitch AI, and I have all of my projects here. Our team has worked on a model to predict the price of houses in Boston. They want to know if it’s ready to be put into production, and they validated it with Snitch AI, so I’ll go into that specific project, and I’ll see the model that they’ve put in here and validated. I can see right away that there’s a warning right here: the quality score was 75 for it, and there’s significant data drift per testing. So right away, I know that there are warnings and things that I should look into. I’m now in the report itself to get a better idea of what the warnings are. The recommendation that we have at the top is that the model has addressable flaws and that the test results should be addressed. This is kind of the executive summary of Snitch AI. And you can see that on one of the tests, which is extreme noise robustness, it failed. So this model is definitely not ready to be put into production.

Moving on to our second use case: the data science manager wants to know if all the projects are properly audited. I’m just going to go back to all of my projects here. Here I have an overview of all the projects that my team worked on, with the models inside those projects, and I can see all of their scores for the latest or in-production models. In the case of the project where we’re trying to predict the maple syrup price, I can see that I have one model in production, and I want to make sure it’s been validated accordingly. We can see right away that it has passed all the tests and has a significantly good quality score. And all of our reports, if we go to the end section, are signed with a seal of authenticity, so they’re all stamped for compliance and auditing purposes. You can see that they’re PDFs that can be shared and stored as well.

Third one: the data science team wants to know where and how they can improve the quality of their model. So if we go back to our first model, the one that was trying to predict the house price in Boston, well, you know, we had some warnings in there; in particular, we failed the extreme noise robustness test. The second part of our report goes into a lot more detail on these tests. So if I go to the extreme noise robustness section, I have an overview of the results, but I also have a detailed explanation of those tests. For instance, here we can see that we injected 5% random noise into our model, and that increased our loss by 11%, which is not too bad. But if we look at the extreme noise, where we injected 10% noise, we had an increase in the loss of 238%, which made us fail the test. And so for that one, we would recommend some adversarial training on the model to improve its resistance and, ultimately, its robustness once in production.
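To make the noise test concrete, here is a minimal sketch of the idea, not Snitch AI's actual implementation. It assumes the noise is Gaussian, is added to the inputs, and is scaled as a fraction of each feature's standard deviation, and that the loss is mean squared error; the webinar does not specify any of these. The function name `loss_increase_under_noise` and the toy linear model are hypothetical.

```python
import numpy as np

def loss_increase_under_noise(predict, X, y, noise_level, seed=0):
    """Relative increase in MSE when Gaussian noise is added to the inputs.

    noise_level is a fraction of each feature's standard deviation
    (an assumption; the webinar does not say how the noise is scaled).
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((predict(X) - y) ** 2)
    noise = rng.normal(0.0, 1.0, size=X.shape) * noise_level * X.std(axis=0)
    noisy_loss = np.mean((predict(X + noise) - y) ** 2)
    return (noisy_loss - base_loss) / base_loss

# Toy data and a least-squares linear model standing in for a real one.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=500)
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
predict = lambda data: data @ weights

# Compare a moderate and an "extreme" noise level, as in the report.
for level in (0.05, 0.10):
    increase = loss_increase_under_noise(predict, X, y, level)
    print(f"noise {level:.0%}: loss increases by {increase:.1%}")
```

A pass/fail rule would then compare each increase against a tolerance chosen by the team; the thresholds used in the report are not stated in the webinar.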

And our last use case: the data science team and manager want to know if a deployed model is still performant. So let's say that for this model, the Boston house price one, we actually fixed all the issues that we had on it, and it's been put into production. Now I want to generate a data drift report to see if there is any drift. We can see in this example that it has significant drift. So we have a second report specifically tailored for data drift, and we can see that this particular test has failed. We run a series of tests for that one: statistical tests such as Kolmogorov-Smirnov, outlier detection, and we do a PCA as well. The combination of all those tests gives us the exact features where we're seeing drift happening in our data. From there, as a data scientist or as a manager, we can investigate these features to see if we should possibly remove them or actually start retraining our model.
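As an illustration of the Kolmogorov-Smirnov part of such a drift check, the sketch below compares each feature's training distribution with its production distribution and lists the features whose KS statistic exceeds the large-sample 5% critical value. It is a simplified stand-in rather than the product's method: the function names and feature names are hypothetical, and no correction for testing several features at once is applied. In practice, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
import numpy as np

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    reference, current = np.sort(reference), np.sort(current)
    grid = np.concatenate([reference, current])
    cdf_ref = np.searchsorted(reference, grid, side="right") / len(reference)
    cdf_cur = np.searchsorted(current, grid, side="right") / len(current)
    return np.max(np.abs(cdf_ref - cdf_cur))

def drifted_features(X_ref, X_cur, names):
    """Names of features whose KS statistic exceeds the 5% critical value."""
    n, m = len(X_ref), len(X_cur)
    # Large-sample critical value at alpha = 0.05: 1.358 * sqrt((n + m) / (n * m)).
    critical = 1.358 * np.sqrt((n + m) / (n * m))
    return [name for j, name in enumerate(names)
            if ks_statistic(X_ref[:, j], X_cur[:, j]) > critical]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 3))
X_prod = rng.normal(size=(1000, 3))
X_prod[:, 1] += 1.0  # simulate a shift in the second feature only

# Hypothetical feature names for a house-price model.
print(drifted_features(X_train, X_prod, ["rooms", "age", "tax"]))
```

The shifted feature is reported as drifted; at a 5% significance level, an unshifted feature can occasionally be flagged too, which is one reason to combine several tests.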

So that concludes our quick tour. This was a really, really quick tour. But we do have a free trial on our website if you want to explore it. There's a demo in the trial if you don't have any model and just want to explore. You can also upload your own model and test it. It's fully featured. So have fun with it.

Question period  

We're going to open up the Q&A part of this webinar. At the bottom of your screen, you can type in your questions, and Simon and I will respond to them as they come. We had a question so far about whether the webinar was recorded. It is recorded, and you will get the recording no later than tomorrow from me. I will also include a link there to book Simon and me. We'd be more than happy to go into anything that we showed today on Snitch AI, on the models, and on the tests that you should potentially do to QA your models. We'd be more than happy to assist you guys.

Simon Dagenais   

Always happy to talk about quality assurance. 

Simon Shaienks   

Yes, Simon is very passionate about quality assurance. All right, well, we don't have

Simon Dagenais   

I have someone typing here. 

Simon Shaienks   

Have someone typing? Yeah. There we go. 

Simon Dagenais   

How would you detect data leakage? That’s a very good question. Simon, do you want to take this one? 

Simon Shaienks   

No, I’ll leave that 

Simon Dagenais   

I think feature importance, like deriving feature importance, is a good way to detect data leakage, because it will automatically pick up that one feature perhaps has 100% of the contribution, or that a really important part of the contribution is in one specific feature. So when you're creating those features, it should be a warning sign to see a very skewed distribution, where a few features are almost automatically predictive. And
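A small sketch of that idea follows: compute a permutation-based feature importance and flag any feature that carries almost all of it. This is one possible way to derive feature importance, not necessarily the one used in Snitch AI. The function name, the simulated leaky column, and the 90% threshold are all assumptions made for illustration.

```python
import numpy as np

def permutation_importance_shares(predict, X, y, seed=0):
    """Share of total permutation importance carried by each feature.

    Importance is the increase in MSE after shuffling one column.
    """
    rng = np.random.default_rng(seed)
    base_loss = np.mean((predict(X) - y) ** 2)
    increases = []
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
        shuffled_loss = np.mean((predict(X_shuffled) - y) ** 2)
        increases.append(max(shuffled_loss - base_loss, 0.0))
    increases = np.array(increases)
    return increases / increases.sum()

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
target = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)
# Hypothetical leak: a column recorded after the outcome, nearly equal to the target.
leaky = target + rng.normal(scale=0.01, size=400)
X_leaky = np.column_stack([X, leaky])

# Least-squares linear model standing in for a real one.
weights, *_ = np.linalg.lstsq(X_leaky, target, rcond=None)
shares = permutation_importance_shares(lambda d: d @ weights, X_leaky, target)

# Arbitrary warning threshold: one feature holding over 90% of the importance.
suspects = [j for j, share in enumerate(shares) if share > 0.9]
print(shares.round(3), suspects)
```

A flagged feature is only a prompt to investigate: a genuinely strong predictor can also dominate, so the next step is to check whether the feature would really be available at prediction time.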

Simon Shaienks   

I have a very good story about data leakage that happened to me as an end user that I can tack onto this. It was actually at our other company: our head of AI had built a lead scoring system, and he was very, very proud of its accuracy. It was, like, unseen accuracy. So we started using the lead scoring system, and it wasn't working. It was complete nonsense. What actually happened was that, when Olivier went in there and started digging into his data, he realized that one of the biggest predictors in his lead scoring was a specific email. And that email was the thank-you email you got when you bought your Sharegate license at the time,

Simon Dagenais   

so that email did not exist, or was just, like, a zero in their metrics, I

Simon Shaienks   

suppose it was. Obviously, if you're getting that email, you will be a very good lead, because ultimately you will be purchasing. So that's the story. We've got another question: how mature is the on-premise version of this tool in terms of deployment in QA, tubes and integration with the AP authentication?

Simon Dagenais   

Very good question. Simon, do you want to go ahead?

Simon Shaienks   

Yeah. So a bit more on the on-premise version. The way the on-premise version works is that we supply you with basically the heart of the solution, which is the model validation framework, in a Docker container that you can integrate directly into your machine learning pipeline. So there's no authentication, but since it's in a Docker container, it's very customizable in terms of integration or deployment.

Simon Dagenais   

And, of course, we provide support in this case to make sure that it's fully integrated into your ML pipeline.

Simon Shaienks   

There's a bit of documentation on our website, but if you need more clarification on it, we're more than happy to show you how it works and how it can integrate with your specific environment, for sure. Someone else wants to know: what data formats can the platform support? Images, text? Do you want to take that one, Simon? Yeah, of course.

Simon Dagenais   

So right now, the platform expects CSV or NumPy files, even 3D matrices. So let's say you have an LSTM, or more of a time series problem: the platform can ingest those data sets. So basically any kind of data can be fed into the platform, and you can get an output on all the features that were mentioned today. But to be totally transparent, I personally think the platform would return something that makes much more sense for tabular data, though time series can also be uploaded without any problem. The tests are somewhat specific, kind of different, for images and NLP. Oh, that's a good question. So yes, I did mention NumPy. That question will

Simon Shaienks   

Nobody sees the question, so we'll just read it out.

Simon Dagenais   

So the question was: you mentioned NumPy files; is the platform specific to models created with Python libraries? That's a very good question. Right now we support scikit-learn, XGBoost, and TensorFlow. You can upload a pandas DataFrame that directly matches what your model expects. The way it works is that, let's say your model expects a matrix that has maybe 12 features, then you basically have to upload that matrix and your model file, and the platform will read those files and perform the tests. So yes, right now the platform is specific to models created with Python libraries, more specifically XGBoost, scikit-learn, and TensorFlow. PyTorch is... right, PyTorch
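Before uploading, it can help to confirm that the data really has the shape the model expects. The check below is a hypothetical pre-upload helper, not part of the Snitch AI API. It assumes the features sit on the last axis, so tabular data is `(rows, features)` and time series data is `(samples, timesteps, features)`; the platform's actual layout requirements are not described in the webinar.

```python
import numpy as np

def check_input_matrix(data, expected_features):
    """Validate that an array matches the feature count a model expects.

    Accepts 2D tabular arrays (rows, features) and 3D time-series arrays
    (samples, timesteps, features); the last axis must hold the features.
    """
    data = np.asarray(data)
    if data.ndim not in (2, 3):
        raise ValueError(f"expected a 2D or 3D array, got {data.ndim}D")
    if data.shape[-1] != expected_features:
        raise ValueError(
            f"model expects {expected_features} features, got {data.shape[-1]}"
        )
    return data

# A model expecting 12 features accepts both layouts.
tabular = check_input_matrix(np.zeros((100, 12)), expected_features=12)
sequences = check_input_matrix(np.zeros((100, 30, 12)), expected_features=12)
print(tabular.shape, sequences.shape)
```

A pandas DataFrame can be passed through the same check, since `np.asarray` converts it to a 2D array.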

Simon Shaienks   

PyTorch is on the horizon. Perfect. That was our last question. We'll stay one more minute if anyone wants to put in a last question. But as mentioned, we're going to send you the recording, and you'll have our contact info in there. Feel free to book us or to send us an email with any of your questions. We're more than happy to assist with anything. So, perfect. Well, thank you. Thanks, everyone, for joining us. I hope you liked it, and see you soon, perhaps at another one of our webinars. Thanks.

Guillaume Petitclerc

Guillaume is Snitch AI's CMO. He's a seasoned digital marketing expert. He has been Marketing Director of multiple startups in the past, where he helped manage marketing strategies for products and build performing teams.
