Today, the world of research and data science is undergoing one of its most substantial transformations. Big datasets generated as a byproduct of people’s online actions and choices are used in nearly every field of research. As we leave a digital footprint with every interaction, often unintentionally, we empower mighty artificial intelligence (AI) systems to analyze and generate insights from these large and complex data flows that come in many forms. These complex models have recently achieved great predictive and classification successes. While they certainly shine in their ability to capture complex, non-linear patterns that a human brain cannot comprehend, they often produce outcomes that miss important aspects of real life or sometimes simply don’t make sense.
An MIT paper found that AI applications for identifying and detecting fake news could be improved considerably by feeding the system with rarely occurring, but actual, cases of fake news released by reliable authors and sources. Another study from Microsoft and MIT on facial-analysis AI systems shows an error rate of only 0.8 percent for frequently occurring lighter-skinned men, but 34.7 percent for darker-skinned women, a segment far less frequently represented in the data these systems learn from. Similarly, self-driving cars very successfully recognize the most frequent objects: traffic lights, people, crosswalks. However, they often fail to identify less common road users, such as animals, and more dangerously, they fail to predict their possible reaction to a moving car, something humans can easily do. These and many similar stories are just an introduction to the countless commonsense, mental, and emotional aspects that today’s AI fails to capture.
As research and data science make use of this increasingly available residual data, which keeps growing in size and variety, the outcomes can become increasingly artificial. At the same time, they also tend to become less intelligent when the field focuses solely on fulfilling the big-data quantity requirements of these complex computational systems. This focus ignores the commonsense and learnings that only come with purposeful, smaller datasets, which are less prevalent but much more insightful. After all, the interpretations required to ascribe meaning to machine-generated outcomes come primarily through the learnings, knowledge, and commonsense of small but deep data. Omitting this vitally important piece has real implications for the actual use cases and benefits of AI systems. Seven out of ten companies investing in AI report minimal or no impact from their AI projects. Moreover, only 4 percent of AI applications are currently critical to businesses.
With the extensive use of readily available residual data, AI systems tend to become more artificial and less intelligent, unless we integrate the learnings and commonsense that come primarily from purposeful and small data.
Clearly, the most prevalent underlying issue that many AI systems suffer from is their inability to readily handle so-called ‘edge’ cases and the smaller bits of learning that come from smaller pieces of data. While big data is extremely powerful in complex classifications and categorizations, small and purposeful data is all about finding the greatest and most needed “WHY?” It is therefore not surprising that the majority of the biggest innovations of our time are based on small data. Successful outcomes in research and data science—which are competitive advantages for those who use them—will not come from off-the-shelf automation algorithms ready to munch big data chunks. Instead, successful outcomes will come from the tiny bits of learning, perception, emotion, expectation, creativity, and intelligence that smaller purposeful data brings when incorporated into these algorithms.
Many data scientists are now looking into augmenting AI systems with the aim of incorporating small-data learnings into AI as an ultimate path to a complete insights-generation strategy. Many are also talking about the need for small data to play a big role if the research and data science field is to succeed. Several technical solutions, such as few-shot, one-shot, or even less-than-one-shot (LO-shot) learning, as well as transfer learning techniques, are being developed and further enhanced to help quantity-centric AI systems utilize qualitative learnings from smaller datasets.
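To make the idea concrete, here is a minimal sketch of the intuition behind few-shot and one-shot learning. All names and numbers are hypothetical: a fixed random projection stands in for a feature extractor pretrained on abundant big data, and new classes are then recognized from a single labelled example each, by nearest-prototype matching in the feature space. Real few-shot systems use learned deep embeddings, not a random projection; this only illustrates the mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "pretrained" feature extractor: a fixed random projection
# playing the role of an embedding network learned from big data.
W = rng.normal(size=(2, 8))

def embed(x):
    # Map a raw 2-d input into an 8-d feature space.
    return np.tanh(x @ W)

# One labelled example per class: the "one-shot" support set (small data).
support = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
}
prototypes = {label: embed(x) for label, x in support.items()}

def classify(x):
    # Nearest-prototype rule: assign the label whose single support
    # example is closest in the embedding space.
    z = embed(x)
    return min(prototypes, key=lambda lbl: np.linalg.norm(z - prototypes[lbl]))

print(classify(np.array([0.9, 0.1])))  # near the "cat" support example
```

The point is the division of labor: big data builds the general-purpose feature space once, while a handful of purposeful examples is enough to teach the system a new category.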
The successful future of research and data science is where small, purposeful data plays a big role.
As technology gets streamlined and ready to embrace the real-life nuances and intelligence gathered through small data, the role of good-quality purposeful datasets will increase immensely. Of course, these types of datasets are hard to get, as they capture more sensitive, sometimes less evident, but important facets of our lives. They are most often people-oriented and question-specific, typically collected intentionally, with a clear purpose and deliberate data collection techniques, as they often explore off-grid opinions, trends, and populations. Collecting these kinds of granular insights from targeted subjects is certainly not an easy task for researchers, nor is it free of biases.
Here at FINCA, collecting purposeful, high-quality data is our expertise. While FINCA’s portfolio ranges from responsible financial services to social enterprises, it represents the same social mission: serving vulnerable and marginalized populations. Oftentimes, the voices of our customer segments are poorly represented in big data trends and outcomes. This is because they have a limited digital footprint and reside in remote communities all around the world. Nevertheless, they offer immensely deep and important perspectives as a distinct population segment, and we like blending these perspectives into all we do.
FINCA’s Research and Data Science team elevates these marginalized voices by tailoring surveys and collecting and analyzing purposeful data. To perform this fairly complicated research task with high quality, we use our proprietary data management platform, ValiData. The platform helps improve purposeful data collection practices through automated data validation rules and techniques. It screens datasets in real time for anomalies, outliers, and biases using advanced statistical techniques and machine learning processes.
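ValiData itself is proprietary, so as an illustration only, here is a minimal sketch of the kind of automated outlier screening described above. The dataset and threshold are hypothetical; the method shown is a standard robust z-score (median/MAD) check, chosen because, unlike mean/standard-deviation scores, it is not itself distorted by the outliers it is trying to catch.

```python
import statistics

def robust_z_scores(values):
    """Median/MAD-based z-scores: robust to the very outliers
    we are screening for, unlike mean/std-based scores."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0] * len(values)
    # 0.6745 rescales MAD to match the standard deviation
    # under a normal distribution.
    return [0.6745 * (v - med) / mad for v in values]

def flag_outliers(values, threshold=3.5):
    # Return the indices of records whose robust z-score is extreme.
    return [i for i, z in enumerate(robust_z_scores(values))
            if abs(z) > threshold]

# Hypothetical survey batch of reported monthly incomes; the 9000
# entry looks like a data-entry error (an extra zero).
incomes = [120, 135, 128, 140, 9000, 131, 126]
print(flag_outliers(incomes))  # flags index 4, the 9000 record
```

In a real pipeline, a rule like this would run at collection time, so a flagged record can be sent back to the field team for verification rather than silently skewing the analysis.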
Oftentimes, the voices of the marginalized are poorly represented in big data trends and outcomes. Nevertheless, they offer immensely deep and important perspectives as a distinct population segment, and we like blending these perspectives into all we do.
The learnings and unheard voices that we bring from all around the world using ValiData are the lifeblood of FINCA’s business, helping us create and improve our services and assess the impact of our programs. Every day, we observe the immense value that purposeful data, especially when collected to high quality standards, can bring to the field of research and data science and to informing the right business decisions.