Saturday, February 16, 2019

Bully the brunt


I was avidly watching the movie “Gully Boy”. Watching that movie, and the acting of its male protagonist, Ranveer Singh, was a life-changing experience for me. It was a soul-searching, mirror-reflecting moment, because in more ways than one I could see the actor and the movie boomerang on my nonchalant nature, so much so that they created an upheaval in my mind and led to a gigantic clash of ideas and creative juices. I intend this blog post to be the first in a series of a few savoury yet succinct messages that are sure to stir your soul.
I would like to start this blog by recalling, ironically, two of the last messages shown in that movie: one spoken by Ranveer himself, and the other that I saw with both my eyes and that left a compelling, lingering effect on me. First, during a long, heated discussion with his father in which heavy slaps were exchanged, Ranveer told him that he had refused the job only because he wanted to pursue his passion of becoming a top rap singer in India, and that people in this world would remember his good deeds, especially as they look up to someone who has risen from a stinking slum in Mumbai. This act of selflessness seeped into my soul, shallow as it has been these days, and stirred it to life. All my life, I have adored and yearned to follow such humble personalities (mostly from the world of sports, like tennis and cricket) as Federer, Messi, Atal Bihari Vajpayee, Steve Waugh, Manoj Bajpayee, Anil Kumble, Rahul Dravid, AB de Villiers, Usain Bolt, Mother Teresa, Brian Lara, MS Dhoni, Yuvraj Singh, Martina Navratilova, Kumar Sangakkara, Shane Bond, and Stefan Edberg, and of course my father, who has now become the COO of a large European multinational firm. This is because all these people had, and still have, the diamond-like trait of having scaled the hardest heights and plumbed the dizzying depths while staying cool in the face of adversity, yet coming across as the most humble and approachable of people. Many of these now illustrious names were not innately talented, nor did they come from rich houses with ten rooms where four would have sufficed. They all came from not-so-gifted backgrounds, which my creative brain aptly pictures as a narrow, chock-a-block street, the kind that in the slums of Mumbai goes by the Hindi term “Gully”.
The other message, the one that I saw with both my eyes and that left a compelling, lingering effect on me, was the word “bblunt”, with the first ‘b’ occupying much more vertical space than the remaining letters of the word. Somehow, it clicked with me that I could carry forward that alliteration of the letter ‘b’ and find a three-word title for my blog. I quickly thought of “Bully the Brunt”, because a bull is more positive than a bear, and the word bully rhymes with the word gully.
I was brought up in a very ordinary place in Bihar, where I spent fourteen of my formative years next to the residence of MS Dhoni. I spent most of my time either playing sport or practising painting near streets that were often swamped with natural yet creepy creatures like crabs, dogs, scorpions, snakes, and snails. That experience left me with an ingrained feeling that, even though I face a lot of adversity and many obstacles on my road to success, there will come a time when people remember me only for the right reasons and for having created a lasting, lyrical message in their lives.
In the movie, there was a contest in which the judge spun a glass bottle on the floor, and the contestant toward whom the bottle finally pointed was given the chance to open the rap duel. Here, Ranveer had to bear the brunt of the other contestant, who came from a very rich background, rapped for a very long while and did not hesitate to make some very personal jibes at Ranveer. At this juncture, it was easy to imagine that Ranveer was being bullied. However, Ranveer picked up from that point and replied in such a macho manner that he took the contest by the scruff of its neck.
I am not an army man, but being passionately patriotic, I feel that I have come to the culmination of the first of a series of blogs in a manner that feels like a pyrrhic victory for me. I would like to end with the caption “My time and your time will surely come”. I say this in spite of not being a good time manager these days, because it is important to leave my readers with a poignant image in which they see only the positive elements in me.

Sunday, September 9, 2018

The still undercurrent


It was a scene from the famous movie ‘Saving Private Ryan’ in which Matt Damon was narrating to Tom Hanks (they are my all-time favorite actors, alongside Al Pacino) how he and two other friends were trying to save the life of the girlfriend of one of those friends as she was being attacked by a very violent mob. The gifted girl had climbed up a tall tree that was also fairly fat when compared to the swank, groggy lasso. Her boyfriend Adam was ruminating, because he had left her alone in the middle of the mobile mob, surrounded solely by soon-to-be-bygone bushes behind the backyard of their huge house. Terrified to the highest degree upon climbing to the apex of that tree, she swooped down to the ground in one still motion (not the Gradient Descent way!) but found her clothes and her innerwear shredded to pieces and flying in all directions in a gyrating motion, including toward where Matt was watching in a state of pensive yet constant shock. To add to the hysteria, she brushed against all the brown branches of that giant tree (she had no time to apply any pruning mechanism or to optimize her way down) before landing on the ground with a loud thud. There she lay like a still leaf in a rain-soaked pool of water with little undercurrent.
Sadly, Matt reckoned that the scene had transpired exactly two years earlier, and that it was the last time he and those friends were seen together.
Having heard this, Tom replied with ‘Oh, my’, and Matt quickly fell still and then let out a loud laugh. He was laughing from his gut. But before he could gather his wits, his head felt a huge undercurrent and his mind ricocheted in many directions that were not devoid of desolation and despair (we could call these weighted vectors and tensors if speaking in an academic institute of repute). It was a moment of melancholy that also resembled the lull before the storm, or, as I call it, the still before the undercurrent, because they were lying stranded in the middle of a fierce battleground with the sound of guns and artillery blaring in their ears as if for fun.
This scene was shown right after the interval of the movie.
Perhaps many other movies have similar moments that decide their fate. For instance, the movie ‘Sanju’ was veering towards a dull biopic of Sanjay Dutt until, after the interval, Kanhaiya, a never-married man of Gujarati origin from New York City, entered the movie, gave some sane advice to Sanjay Dutt, and stole the limelight from him with his demeanor and his easy-on-the-eyes acting, often juxtaposed with traces of wit and humor.
We should collect videos of such scenes that can potentially change the course of an entire movie, much like the undercurrents we often see in the business world! Can we cull out such patterns, spanning say ten minutes of an otherwise two-hundred-minute movie, and exclude those movies whose interest and plot tail off in the first half itself?
There can be other patterns in movies as well, such as the beginner's luck evident in 'Dev D', 'Vicky Donor', etc. But we can examine that pattern in a future analysis.
Definitions of undercurrent:
1.       A flow of water that moves below the surface of the ocean or a river.
2.       A hidden feeling or tendency that is usually different from the one that is easy to see or understand.
3.       Usage example: There is a strong undercurrent of Mergers and Acquisitions activity in India right now. Many conversations are happening, even if no deal has been announced yet.

Thursday, March 2, 2017

The Popping Pink



I was seated comfortably on the sofa, watching a sports show on my television. I had a fond and vivid memory of the three chocolates that I had munched and then gobbled, one after the other, in quick succession. Precisely three chocolate wrappers were making funny sounds in the left pocket of my pyjama.
Suddenly, the sun shone stupendously, and so vigorously that it drew my eyes out of their comfort zone and all the way to the center of the garden of our house. So perfect and precise were the rays that I was forced to fix my eyes on a very large pink flower. I was so surprised by the radius of that flower that it brought both a moment of happiness and tears of emotion to my face.
I instantly took out my phone and switched on its camera to take a snap of the popping pink flower. Having satisfied that desire, I decided to go to the front of the house and take a snap from there too. It was as if the entire flower were pellucid, a quality that compelled me to take an all-round view of it by following a peripatetic path.
When I told my wife about that flower, she asked me, “How old is it?”, and swiftly came my reply: “How is it old?”
The queer syndrome did not stop there. Later, I checked the pocket of my pyjamas and found, to my surprise, that the chocolate wrappers were brick red and white, colors which when mixed yield a bright pink hue just like the one painted on that popping pink flower.

Sunday, January 17, 2016

Variable Selection

First, apply simple steps to remove junk variables. For instance, drop variables that have too high a proportion of missing values, too low a coefficient of variation, etc.
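The junk-variable step above can be sketched as follows. This is a minimal illustration, assuming the data is held as a dict of column name to list of values with `None` marking a missing value; the function name `drop_junk_variables` and the thresholds `max_missing=0.5` and `min_cv=0.01` are hypothetical choices for the example, not recommended values.

```python
import statistics

def drop_junk_variables(columns, max_missing=0.5, min_cv=0.01):
    """Keep variables that pass a missing-value filter and a coefficient-of-variation filter.

    columns: dict mapping variable name -> list of values, with None marking a missing value.
    max_missing and min_cv are illustrative (hypothetical) thresholds.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        missing_share = 1 - len(present) / len(values)
        if missing_share > max_missing or len(present) < 2:
            continue  # too many missing values (or too few observations) to be useful
        mean = statistics.mean(present)
        if mean == 0:
            continue  # CV is undefined for a zero mean; treated as junk in this sketch
        cv = statistics.stdev(present) / abs(mean)
        if cv < min_cv:
            continue  # (almost) constant variable: carries no information
        kept.append(name)
    return kept

example = {
    "income": [30, 45, None, 60, 52],
    "flag_constant": [1, 1, 1, 1, 1],
    "mostly_missing": [None, None, None, 4, None],
}
print(drop_junk_variables(example))  # ['income']
```

The coefficient of variation is only really meaningful for positive-valued variables, which is why the zero-mean case is simply discarded here.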
Then apply Weight of Evidence and Information Value (IV).
Then drop variables whose IV is either too low (little predictive power) or too high (suspiciously strong, often a sign of leakage or overfitting).
Then choose those values (categories or bins) that have a high WoE.
You can do this directly for categorical variables. For continuous variables, first apply binning and then compute WoE and IV on the bins.
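A minimal sketch of the WoE/IV computation, including the binning step for a continuous variable. It assumes a 0/1 target where 1 is the event ("bad"), uses the common credit-scoring convention WoE = ln(share of non-events / share of events), and adds a small smoothing count `eps` to every cell so empty bins do not produce log(0); the smoothing and the equal-width helper `equal_width_bins` are assumptions of this example (quantile binning is a common alternative).

```python
import math
from collections import defaultdict

def woe_iv(categories, target, eps=0.5):
    """Weight of Evidence per category and total Information Value.

    categories: category labels (or bin indices for a binned continuous variable).
    target: 0/1 outcomes, 1 = event, 0 = non-event.
    eps: smoothing count added to every cell (an assumption of this sketch).
    """
    events, non_events = defaultdict(float), defaultdict(float)
    for c, y in zip(categories, target):
        if y == 1:
            events[c] += 1
        else:
            non_events[c] += 1
    labels = sorted(set(categories), key=str)
    total_e = sum(events[c] + eps for c in labels)
    total_n = sum(non_events[c] + eps for c in labels)
    woe, iv = {}, 0.0
    for c in labels:
        dist_e = (events[c] + eps) / total_e      # share of all events in this category
        dist_n = (non_events[c] + eps) / total_n  # share of all non-events in this category
        woe[c] = math.log(dist_n / dist_e)
        iv += (dist_n - dist_e) * woe[c]          # each term is >= 0, so IV >= 0
    return woe, iv

def equal_width_bins(values, n_bins=4):
    """Hypothetical helper: map each continuous value to an equal-width bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Continuous variable: bin first, then compute WoE and IV on the bins.
ages = [22, 25, 31, 38, 44, 51, 58, 63]
defaulted = [1, 1, 1, 0, 0, 0, 0, 1]
age_woe, age_iv = woe_iv(equal_width_bins(ages), defaulted)
```

With this sign convention a negative WoE marks a bin that is riskier than average; which end counts as "high" therefore depends on the convention you pick.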
Then apply VIF (Variance Inflation Factor) to remove variables with high multicollinearity.
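The VIF step can be sketched in plain Python. For variable j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing that variable on all the others; equivalently, it is the j-th diagonal element of the inverse correlation matrix, which is what this sketch computes. The backward-elimination loop `drop_high_vif` and its cut-off of 10 are assumptions for illustration (5 and 10 are both commonly quoted rules of thumb).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each variable = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

def drop_high_vif(columns, threshold=10.0):
    """Repeatedly drop the variable with the highest VIF until all are <= threshold."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = vif(cols)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del cols[worst]
    return list(cols)
```

Dropping one variable at a time matters: removing a single member of a correlated pair usually brings the other one's VIF back down.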

Addressing Multicollinearity

First, X and X^2, or for that matter X and XZ, cannot be considered collinear in a mathematical sense. The correlation coefficient, as commonly used to measure collinearity between two variables, is bound to give erroneous results here. So collinearity measured this way is not useful in such a case.

Use the correlation coefficient only if you are reasonably sure that the relationship between the two variables is linear.
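A small numeric check of why the correlation coefficient misleads for X versus X^2. In both cases below y = x^2 is an exact function of x, yet Pearson correlation reports zero for a range symmetric around 0 and a value close to 1 for a positive range. The two example ranges are chosen purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# y is an exact (non-linear) function of x in both cases.
xs_symmetric = [-3, -2, -1, 0, 1, 2, 3]
xs_positive = [1, 2, 3, 4, 5, 6, 7]
r_symmetric = pearson(xs_symmetric, [x ** 2 for x in xs_symmetric])  # zero: "no relationship"
r_positive = pearson(xs_positive, [x ** 2 for x in xs_positive])     # about 0.98: "almost linear"
```

Neither number describes the actual dependence, which is perfect in both cases.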

In the case of logistic regression, all predictors should follow a normal distribution, and normalization is done to make sure that predictors are unitless (and hence additive in the true sense). So if X and Z follow a normal distribution, then neither X^2, Z^2 nor XZ follows a normal distribution. One can argue that, by applying the CLT, XZ, X^2 or Z^2 will follow a normal distribution, but only in the limit (n -> inf).
Hence X^2, Z^2 or XZ do not qualify as predictors in logistic regression in the first place.
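The distributional claim above (X^2 and XZ are not normal even when X and Z are) can be checked by simulation. This sketch draws independent standard normals with a fixed seed and compares sample skewness and excess kurtosis; both are 0 for a normal distribution, while theory gives skewness sqrt(8), about 2.83, for X^2 (chi-square with 1 degree of freedom) and excess kurtosis 6 for the product of two independent N(0,1) variables. The sample size and seed are arbitrary choices.

```python
import random

def central_moment(values, k):
    m = sum(values) / len(values)
    return sum((v - m) ** k for v in values) / len(values)

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

rng = random.Random(42)  # fixed seed so the run is reproducible
x = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
x_sq = [v * v for v in x]
xz = [a * b for a, b in zip(x, z)]

print(round(skewness(x), 2), round(excess_kurtosis(x), 2))  # both near 0 for a normal
print(round(skewness(x_sq), 2))       # theory for chi-square(1): about 2.83
print(round(excess_kurtosis(xz), 2))  # theory for a product of independent N(0,1): 6
```

XZ is symmetric, so its skewness is near 0; its heavy tails show up in the kurtosis instead.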

The idea of any predictive model is to include all those predictors that have predictive value (that is, that carry information about the target variable). If X is a predictor and brings some information, the question one should consider is what more information X^2 can bring. In fact, by the nature of the function X^2, it conceals or confounds some of the information that X brings. The same is true of XZ. (I need not explain that there are better ways to handle the interaction effect of factors X and Z.)

From the three points above, it is clear that if you decide to include X and/or Z as predictors, you should not include X^2, Z^2 and XZ as predictors in the same model.

As for p-values, it is clear that the CLT is applied before they are calculated: these are asymptotic p-values. Once I assume that X^2 or XZ follows a normal distribution, it does not matter whether I standardize X^2 or XZ; the p-values are bound to be the same anyway. There is no magic.
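The invariance of the asymptotic p-value under standardization can be demonstrated with a single-predictor logistic regression. This sketch fits the model by plain Newton-Raphson (a standard method, not necessarily what any particular package does internally) on simulated data, once with the raw predictor and once with the standardized one, and computes the Wald z statistic and its two-sided normal p-value. The data-generating values (mean 50, sd 10, true logit -5 + 0.1x, n = 500, seed 7) are hypothetical.

```python
import math
import random

def _score_and_info(x, y, b0, b1):
    """Gradient and Fisher information of the log-likelihood at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic_1d(x, y, iters=30):
    """Fit P(y=1) = sigmoid(b0 + b1*x); return (b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (h00, h01, h11) = _score_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # Newton step: beta += I^{-1} g
        b1 += (h00 * g1 - h01 * g0) / det
    _, (h00, h01, h11) = _score_and_info(x, y, b0, b1)
    return b1, math.sqrt(h00 / (h00 * h11 - h01 * h01))  # sqrt of (I^{-1})_{11}

def wald_p_value(z):
    """Two-sided p-value from the asymptotic standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

rng = random.Random(7)
x = [rng.gauss(50.0, 10.0) for _ in range(500)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(-5.0 + 0.1 * xi))) else 0 for xi in x]
mean = sum(x) / len(x)
sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))
x_std = [(v - mean) / sd for v in x]

b_raw, se_raw = fit_logistic_1d(x, y)
b_std, se_std = fit_logistic_1d(x_std, y)
z_raw, z_std = b_raw / se_raw, b_std / se_std
print(z_raw, z_std, wald_p_value(z_raw), wald_p_value(z_std))
```

Standardizing multiplies both the coefficient and its standard error by sd (the intercept absorbs the centering), so z and the p-value do not move.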



How often do you see customers rebuild/refresh models?

I am interested in how frequently you see customers rebuilding/refreshing models they have deployed in production. Are they using our C&DS automation to schedule this, and do they refresh their models automatically, or do they have analysts do it manually?

This is what I see customers wanting:
-Champion challenger
-Self-improving models: models that learn as new data comes in (à la our naive Bayes 'self-learning' model).
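The champion-challenger idea in the list above can be sketched as follows. This is a minimal illustration, assuming every candidate model has already scored the same recent holdout rows with known outcomes; AUC as the comparison metric, the `min_gain` margin of 0.01 and the function name `pick_champion` are all hypothetical choices, not how C&DS does it.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def pick_champion(champion, holdout_scores, labels, min_gain=0.01):
    """Return (model to deploy, AUC per model).

    holdout_scores: dict model name -> scores on the same holdout rows.
    A challenger replaces the champion only if it wins by at least min_gain.
    """
    aucs = {name: auc(scores, labels) for name, scores in holdout_scores.items()}
    best = max(aucs, key=aucs.get)
    if best != champion and aucs[best] >= aucs[champion] + min_gain:
        return best, aucs
    return champion, aucs
```

The margin keeps the deployed model from flip-flopping on noise, which also limits how often the "thorough checking" mentioned below is triggered.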

This is what I see customers doing:
-use CADS to store models.
-manually score
-sometimes schedule scoring
-real-time deployment.
-Never: champion challenger. Reason: a model needs thorough checking when re-created. The interface in CADS is not up to par.
-Never: refresh. Reason: a model needs thorough checking. Refresh works well, but storing and replacing the model is too complex because it involves scripting and Modeler/CADS interplay.

This is what customers want to do:

Analytical reporting:
-being able to set up an experiment in CADS to keep track of model performance over the lifetime of the model. 
-having a comprehensive (=prebuilt) and configurable model evaluation dashboard.  

Operational reporting:
-being able to see what the model predicts (without knowing the outcomes yet, hence operational reporting).
-having a comprehensive and configurable model scoring dashboard.  

In addition: having both abilities work conveniently when one has many models in production. On top of this massive model deployment, having a way to quickly get insight into model trends and to raise alerts on the worst-performing models.
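Alerting on the worst-performing models could look like the sketch below. It assumes each deployed model has a history of a periodic evaluation metric (for example monthly AUC, computed once outcomes are known), oldest first, and treats the first value as the baseline measured at deployment; the `tolerance` of 0.05 and `worst_n` of 3 are hypothetical settings.

```python
def models_to_alert(history, tolerance=0.05, worst_n=3):
    """Return up to worst_n model names whose metric dropped by more than tolerance.

    history: dict model name -> list of periodic metric values, oldest first.
    The first value is treated as the deployment baseline (an assumption).
    """
    drops = {}
    for name, values in history.items():
        if len(values) < 2:
            continue  # not enough history to judge a trend
        drops[name] = values[0] - values[-1]
    flagged = [name for name, drop in drops.items() if drop > tolerance]
    flagged.sort(key=lambda name: drops[name], reverse=True)  # worst decline first
    return flagged[:worst_n]

history = {
    "churn": [0.80, 0.78, 0.70],
    "upsell": [0.75, 0.74, 0.74],
    "fraud": [0.90, 0.85, 0.60],
}
print(models_to_alert(history))  # ['fraud', 'churn']
```

Comparing against the deployment baseline rather than the previous period catches slow drift that never triggers a single large month-on-month drop.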


CLV (Customer Lifetime Value) Modeling for Retail (supermarket and grocery chain)

The first challenge is identifying customers across visits. Some entity analytics can be done here: you associate loyalty cards with credit card/bank account numbers, so you can even identify when the same customer changes cards. Don't expect 100% identification here. Divide your purchases into loose (anonymous) baskets and baskets of identified customers, and treat the two separately. For the former you can only provide margin details per basket.
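The entity-analytics step above can be sketched with a union-find (disjoint-set) structure: every basket that carries both a loyalty card and a payment account links the two, and all identifiers connected through such links end up as one customer. The input shape `(loyalty_card, payment_account)` with `None` for a missing identifier is an assumption of this example. Baskets with neither identifier stay loose, and a card shared within a household will merge several people, which is one reason 100% identification is not realistic.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, item):
        self.parent.setdefault(item, item)
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def link_customers(baskets):
    """baskets: iterable of (loyalty_card, payment_account); either may be None.

    Returns dict identifier -> cluster id; identifiers are tagged tuples so a card
    number can never collide with an account number.
    """
    uf = UnionFind()
    for card, account in baskets:
        ids = [i for i in (("card", card), ("acct", account)) if i[1] is not None]
        for i in ids:
            uf.find(i)  # register the identifier
        if len(ids) == 2:
            uf.union(*ids)  # same basket: same customer
    return {item: uf.find(item) for item in uf.parent}

baskets = [("L1", "B1"), ("L2", "B1"), ("L3", "B9"), (None, "B9"), (None, None)]
mapping = link_customers(baskets)
```

Here cards L1 and L2 are recognised as the same customer because both were paid with account B1.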

Revenue = easy: sum of purchases minus returns.
Costs = tricky. Use a top-down approach: get the details from finance and collaborate with finance on this. If the chain sells, say, fresh goods and electronics, finance likely has a separate P&L for each. If you find that costs at department level are 35% of total revenue, use that number for every product in that category. Try to go as deep as possible; you likely can't get to product level. An important distinction is own brand vs foreign brand, so you can have several factors to account for. You likely only have cost data for the main effects (own brand vs foreign brand, and category A vs B) rather than for each of the 4 combinations. Use raking (iterative proportional fitting) to get to the level of the 4 combinations. (Raking is used when reweighting surveys, to give the sample properties similar to those of the population.) It is available in SPSS.
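A minimal raking (iterative proportional fitting) sketch for the 2 x 2 case described above. The seed table is revenue per combination (rows: own brand, foreign brand; columns: category A, category B), and the targets are finance's cost totals per brand and per category. All numbers are hypothetical, and the two sets of targets must add up to the same grand total. Raking alternately rescales rows and columns until both sets of margins match, while preserving the seed's interaction structure (its odds ratio).

```python
def rake(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Iterative proportional fitting of a 2-D table to given row and column totals."""
    table = [list(row) for row in seed]
    for _ in range(iters):
        for i, target in enumerate(row_targets):  # match row totals
            s = sum(table[i])
            table[i] = [v * target / s for v in table[i]]
        for j, target in enumerate(col_targets):  # match column totals
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break  # rows still match after the column step: converged
    return table

revenue = [[400.0, 600.0],   # own brand:     category A, category B
           [800.0, 200.0]]   # foreign brand: category A, category B
cost_by_brand = [300.0, 700.0]     # hypothetical finance totals
cost_by_category = [750.0, 250.0]  # hypothetical finance totals
cost = rake(revenue, cost_by_brand, cost_by_category)
cost_ratio = [[c / r for c, r in zip(crow, rrow)] for crow, rrow in zip(cost, revenue)]
```

For these numbers the table converges to roughly [[150, 150], [600, 100]], and `cost_ratio` then gives a cost share of revenue for each of the 4 combinations.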

Now revenue minus cost is the margin. At product-group level this is very interesting to visualize (revenue vs margin %, revenue vs costs): spot the outliers, do some segmentation, and color the scatterplots by segment and by the various other characteristics you have.
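The revenue, cost and margin arithmetic per product group can be sketched as follows, assuming sales and returns are already aggregated per product group and that finance supplies a cost ratio per department (the top-down approach above). The dictionary shapes, group names and ratios are hypothetical.

```python
def product_group_margins(sales, returns, dept_of, dept_cost_ratio):
    """Revenue, cost, margin and margin % per product group.

    sales, returns: dict product group -> amount.
    dept_of: dict product group -> department.
    dept_cost_ratio: dict department -> costs as a share of revenue (from finance).
    """
    rows = {}
    for group, gross in sales.items():
        revenue = gross - returns.get(group, 0.0)  # revenue = purchases - returns
        cost = dept_cost_ratio[dept_of[group]] * revenue
        margin = revenue - cost
        rows[group] = {
            "revenue": revenue,
            "cost": cost,
            "margin": margin,
            "margin_pct": margin / revenue if revenue else 0.0,
        }
    return rows

table = product_group_margins(
    sales={"bread": 1000.0, "tv": 5000.0},
    returns={"tv": 500.0},
    dept_of={"bread": "fresh", "tv": "electronics"},
    dept_cost_ratio={"fresh": 0.35, "electronics": 0.80},
)
```

With a single ratio per department, margin % only varies across departments; the raked own-brand/foreign-brand ratios are what make the scatterplots within a department interesting.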

Now roll up to customer level. Use at least one year of purchases and provide numbers on a yearly basis. Again, show that customer margin is not just a matter of taking, say, 15% of revenue, but differs for everyone. Segment customers on margin, and for each resulting segment profile the product groups they buy from.
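Rolling margin up to customer level and segmenting on it might look like this sketch. It assumes purchase lines of the form `(customer_id, product_group, revenue)` and a margin share per product group from the previous step; scaling to 365 days and splitting into equal-size segments are illustrative choices.

```python
from collections import defaultdict

def customer_yearly_margin(purchases, margin_pct, days_observed):
    """Customer -> margin scaled to a one-year basis.

    purchases: list of (customer_id, product_group, revenue) over the observed period.
    margin_pct: dict product group -> margin as a share of revenue.
    """
    totals = defaultdict(float)
    for customer, group, revenue in purchases:
        totals[customer] += revenue * margin_pct[group]
    scale = 365.0 / days_observed
    return {c: m * scale for c, m in totals.items()}

def margin_segments(yearly_margin, n_segments=3):
    """Split customers into equal-size margin segments, 0 = lowest margin."""
    ranked = sorted(yearly_margin, key=yearly_margin.get)
    return {c: i * n_segments // len(ranked) for i, c in enumerate(ranked)}

margin_pct = {"fresh": 0.30, "electronics": 0.10}
purchases = [("A", "fresh", 100.0), ("B", "electronics", 100.0),
             ("C", "fresh", 50.0), ("C", "electronics", 50.0)]
yearly = customer_yearly_margin(purchases, margin_pct, days_observed=365)
segments = margin_segments(yearly)
```

All three customers spend the same 100, yet their margins are 30, 10 and 20, which is exactly the point that customer margin is not a flat share of revenue.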

Next, lifetime. Properly divide your available timeline into parts and use the purchase data from, say, months 1 to 6 to predict the purchase amount for the next six months. Validate this model by back-testing it on the data from the year before. Depending on the structure in the data, the model can try to predict 1) whether the customer returns (0/1), 2) the customer's revenue class, or 3) the customer's revenue. You can try to do the same for margin in order to get a sense of which customers are changing their buying habits.
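The time-window split above can be sketched as a function that turns raw transactions into one row per customer: features from an observation window and targets from the following prediction window. The transaction shape `(customer_id, date, revenue)` and the single feature (total observed revenue) are simplifying assumptions; a real model would derive many more features.

```python
from datetime import date

def build_window_dataset(transactions, obs_start, obs_end, pred_end):
    """Customer -> (observed revenue, returned 0/1, future revenue).

    Features come from [obs_start, obs_end), targets from [obs_end, pred_end).
    Only customers seen in the observation window get a row.
    """
    feats, targets = {}, {}
    for customer, day, revenue in transactions:
        if obs_start <= day < obs_end:
            feats[customer] = feats.get(customer, 0.0) + revenue
        elif obs_end <= day < pred_end:
            targets[customer] = targets.get(customer, 0.0) + revenue
    return {c: (f, int(c in targets), targets.get(c, 0.0)) for c, f in feats.items()}

tx = [("A", date(2015, 2, 1), 100.0), ("A", date(2015, 8, 1), 40.0),
      ("B", date(2015, 3, 1), 60.0), ("C", date(2015, 9, 1), 10.0)]
train = build_window_dataset(tx, date(2015, 1, 1), date(2015, 7, 1), date(2016, 1, 1))
```

Calling the same function with every date shifted one year earlier gives the back-test set, and the returned tuple carries both the 0/1 "returning" target and the revenue target (from which a revenue class can be derived).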

Now you can forecast the customer value. You could properly try to account for net present value etc., but I believe the models will never give you enough resolution to justify going through that additional logic.

Relate the future revenue, costs and margin to the current ones, and segment down to store level (store type, province, area characteristics, etc.).

You can use the results as follows:

1) predict overall revenue and margin for the next 6 months (interesting for finance in order to determine strategy, especially at store level)
2) spot customers who are trending upward or downward (interesting for campaigning purposes).
3) understand the effects of promotions on article categories in the light of these newly obtained KPIs (interesting for category managers).