When Harvard Business Review called it “The Sexiest Job of the 21st Century”, the term “data scientist” became a buzzword.
Dr. Usama M. Fayyad, a veteran data scientist and probably the first person ever to hold the title of Chief Data Officer, was in Mumbai this week to attend the largest data science summit in India, organized by the Data Science Congress. Putting it simply in his keynote speech at the event, Dr. Fayyad said, “a data scientist is someone who knows a lot more statistics than a software engineer, and a lot more software engineering than a statistician”.
He said that many companies are buried in data but are unable to take advantage of it. A data scientist helps companies leverage that data to gain insights. With more data available and the data ecosystem changing rapidly, the cost of acquiring, storing and analyzing data has come down, opening up many new ways to leverage data for key insights. Big data technologies like Hadoop and Spark help store and process huge volumes of structured as well as unstructured data more cost-effectively and efficiently. Dr. Fayyad mentioned that big data technologies can also simplify enterprise data storage by removing the need to keep multiple copies of data across the enterprise.
Dr. Fayyad also highlighted some of the challenges in adopting big data technologies for analytics. He mentioned that the landscape is fast-evolving and confusing. Dumping all data into a “data lake” may give some quick results, but maintaining a data lake in the long run can quickly turn into a nightmare. Unless sound principles of data governance and architecture are diligently applied, a data lake may sit there becoming toxic, with nobody knowing what data is inside it or how to use it.
Speaking on applications of data science, Dr. Fayyad gave the example of the “Know Your Customer (KYC)” process. “Customer intimacy was lost when banks scaled, and now banks are spending huge amounts of money to collect KYC data. Instead of considering the KYC data collection as just a regulatory compliance activity, if this data is used to understand the customer better, banks can hope to bring back the lost customer intimacy”. He also mentioned an analytical model implemented at Barclays that alerted customers when they were overspending. This model drew a 60% response and was an instant hit. He said that data scientists should conceptualize and build similarly innovative applications that are simple for end users to use, yet extremely effective for the business.