Friday, June 29, 2018

Week 3

Transitioning away from the purely clinical focus of my previous weeks, I spent the majority of this 3rd week learning about my research project, which deals with pneumothorax (lung collapse) during percutaneous lung biopsies under CT guidance, and how we can try to predict the incidence of pneumothorax as a function of how the biopsy probe is inserted. This is an area of interest in the interventional radiology department, as these biopsies are among the most frequently performed procedures (this hospital sees hundreds of patients a year for lung biopsies).

This provided an opportunity for me both to use and to improve my quantitative skills. For a case like this (a binary outcome in response to a large set of parameters), a common method for inferring the importance of individual parameters is a logistic regression model. I’ve seen applications of this before but have never had the chance to explore the nitty-gritty details, so a couple of days this week were devoted to understanding some of the mathematical underpinnings. Already, in reviewing some of the papers attempting to explore similar issues with the same methodology, I’ve noted some errors in conceptual understanding, which I suspect arise when clinicians use statistics software without properly preparing their data. A common example is the misuse of categorical variables. When non-ordinal categorical data is used in this type of model, it should actually be separated into n (one-hot) or n-1 (dummy variable) binary variables. This removes the bias of an implied relative valuation – for example, if we imagine the categorical variable to be eye colour, assigning a value of 1 to brown and 2 to blue means we’ve now explicitly valued blue eyes at “twice” the value of brown eyes, which is a nonsensical relationship for what we’re interested in exploring.
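To make the encoding concrete, here’s a minimal sketch using the eye-colour example above (pandas’ `get_dummies` handles both the n-column and n-1-column versions; the data itself is made up for illustration):

```python
import pandas as pd

# Hypothetical non-ordinal categorical column; eye colour stands in
# for any categorical predictor in the biopsy data.
df = pd.DataFrame({"eye_colour": ["brown", "blue", "green", "brown"]})

# One-hot encoding: n binary columns, one per category.
one_hot = pd.get_dummies(df["eye_colour"], prefix="eye")

# Dummy coding: n-1 columns, dropping one category as the reference
# level (the usual choice in regression, to avoid collinearity with
# the intercept).
dummies = pd.get_dummies(df["eye_colour"], prefix="eye", drop_first=True)

print(one_hot.columns.tolist())  # ['eye_blue', 'eye_brown', 'eye_green']
print(dummies.columns.tolist())  # ['eye_brown', 'eye_green']
```

Either way, no category is numerically “worth” more than another; each just flips its own indicator on or off.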

In addition, I was able to explore some common tools from the machine learning toolbox – a particular example being gradient descent for minimizing a cost function. What I found neat was learning about this method through linear regression – an algorithm with an exact closed-form solution that we can compare gradient descent against, so as to see how the technique can be applied more generally.
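This comparison is easy to sketch: fit the same line once with the normal equation and once with batch gradient descent on the mean squared error, and check that they agree (the synthetic data and learning rate here are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear data: y = 2x + 1 plus Gaussian noise.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=200)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Closed-form (normal equation) solution: beta = (X'X)^-1 X'y.
beta_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the mean squared error cost.
beta = np.zeros(2)
lr = 0.01
for _ in range(20_000):
    grad = (2 / len(y)) * X.T @ (X @ beta - y)  # gradient of the MSE
    beta -= lr * grad

print(beta_exact)  # [intercept, slope], near [1, 2]
print(beta)        # should match the closed-form answer closely
```

The closed form tells us exactly where gradient descent should end up, which is what makes linear regression such a nice sandbox before moving to models like logistic regression, where no closed-form solution exists.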

All in all, while I don’t have access to the full data set yet, I’ve been able to get a model up and running using randomized data, so when I do get access, I’m hoping to learn something interesting with respect to previously undiscovered relationships!
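Since the real biopsy data isn’t available yet, here’s a minimal sketch of what that randomized-data setup can look like: two hypothetical predictors with made-up true coefficients, a simulated binary outcome, and a logistic regression fit by gradient descent (everything here is stand-in data, not the actual project variables):

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomized stand-in data: two hypothetical standardized predictors
# (imagine, say, needle path length and lesion depth) with invented
# true coefficients, including an intercept.
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([-1.0, 1.5, -0.8])

# Simulate the binary outcome from the logistic model.
p = 1 / (1 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

# Fit by gradient descent on the (averaged) negative log-likelihood;
# its gradient has the simple form X'(p_hat - y) / n.
beta = np.zeros(3)
lr = 0.5
for _ in range(5_000):
    p_hat = 1 / (1 + np.exp(-X @ beta))
    beta -= lr * X.T @ (p_hat - y) / n

print(beta)  # should land near the invented true coefficients
```

Swapping the simulated matrix for the real, properly encoded biopsy variables is then just a change of input, which is the point of getting the pipeline running early.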

Next week, I’ll be spending the majority of my time expanding the dataset we need for the project. This will involve compiling information on roughly 1000 lung biopsies! Given the scarcity of this type of data anywhere, this represents an incredible opportunity to work on something novel, but it’ll likely be the most monotonous work week I have, so I’ve scheduled a couple of other nice bits of clinical shadowing to break up the week. Of particular interest is a morbidity and mortality meeting, where I’ll be able to learn about how systems-level errors affect procedural outcomes.
