Quality matters: for insightful quality advice, get to know your big data

Per Nymand-Andersen

This chapter outlines where data scientists often go wrong when using big data sets to describe relationships between variables. We discuss these problems and explain why data quality analysis should always be carried out before any other analysis takes place.

INTRODUCTION

Big data is a wonderful playground for statisticians and economists. It provides ample opportunities to test existing theories and to discover new causal relationships within large data sets. The resulting insights, if proven to be consistent over time, may well generate new theories, particularly in the social and behavioural sciences.

It is a wonderland both for those researchers who are curious at the micro level and for those with a macro-level oversight, as big data approaches offer the prospect of supplementing existing predictions and stimulating both academic and political debate. Indeed, using alternative data sets may create the opportunity to adjust our model-based theory and recognise the fragility of the assumptions imposed on our models in order to make our predictions more realistic, while describing the uncertainties surrounding our models and results. It may therefore give us the
