+1 vote
asked ago in General Economics Questions by (330 points)
I am looking for some (any, really) basic research into "data lakes" and their information architecture from the perspective of economists. I put the term "data lakes" in quotation marks because no universal definition exists: agencies such as Statistics Canada, the United Nations, and the OECD do not provide one. As an employee of a publicly funded, not-for-profit organization, I want to make sure we avoid investing in technology fads. Not a single publication on this topic shows up on the AEA's site or on that of its sister organization, AgEcon Search.
commented ago by (870 points)
Since, as you write, there is no universal definition of the term, could you please expand on what you mean by it and how you plan to work with it?
commented ago by (220 points)
To me, a data lake is an IT concept related to big data. See the following links: https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
https://azure.microsoft.com/en-us/solutions/data-lake/
commented ago by (330 points)
Thank you. Could you please add this as an answer so that it can be "upvoted"?
commented ago by (330 points)
Thanks for your answer. The question was left vague intentionally because it sought to generate an open discussion. One commenter directed me to Microsoft and Amazon. For these companies, data lakes and information architecture are sold as a product bundle. The snag is that their products come with a very high (prohibitive, dare I say?) switching cost. Also, as a not-for-profit, we do not have the financial capacity to buy into their products. I see that you are based in Europe: are data lakes of the type provided by Alphabet, Amazon, and Microsoft commonplace there?
commented ago by (870 points)
I am not expert enough to say what is commonplace concerning data lake usage in Europe. The research I see on the topic comes almost exclusively from computer science. I wondered why you came to an economics forum with this question. What kind of input do you hope to get from the economists' perspective? Are you interested in how economics-related organisations use data lakes, or in whether they do economic research on them? What kind of economic research do you think would benefit you as an employee of a not-for-profit organization? Isn't this more a question for business economists and computer engineers?

2 Answers

+1 vote
answered ago by (220 points)
In simple terms, a data lake is a data repository in which structured and unstructured (raw) data from many sources is stored. When you just accumulate data and do not manage it properly, your data lake turns into a data swamp. Just google these terms: Google lists many books on this subject with previews available. I encountered this notion in the context of the internet of things, smart cities and so on.
From an encyclopedia: “A data lake is a data repository in which datasets from multiple sources are stored in their original structures. It should provide functions to extract data and metadata from heterogeneous sources and to ingest them into a hybrid storage system. In addition, a data lake should offer a data transformation engine, in which datasets can be transformed, cleaned, and integrated with other datasets. Finally, interfaces to explore and to query the data and metadata of a data lake should be also available in a data lake system. The term “data lake” (DL) was first mentioned by James Dixon in 2010 in a blog post (https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/) where he put data marts on the same level as bottled water, which is cleansed, packaged, and structured for easy consumption. In contrast, a data lake manages the raw data as it is ingested from the data sources.”
Actually, the OECD, the World Bank and the United Nations do mention “data lake” on their sites. Just google these: “data lake” site:oecd.org; “data lake” site:worldbank.org; “data lake” site:un.org. Then also try these: “data lake” site:.gov; “data lake” site:.edu; “data lake” site:europa.eu; “data lake” site:ieee.org. So much information!
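To make the encyclopedia definition quoted above a bit more concrete, here is a minimal sketch in Python. It is only a toy, not how any vendor's product works: `ToyDataLake`, `Entry`, and the sample datasets are hypothetical names and values invented for illustration. It assumes an in-memory dictionary in place of real storage, and shows the three ideas from the definition: raw data is kept in its original form, metadata is recorded at ingestion, and structure is applied only when the data is read.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One raw dataset plus the metadata recorded at ingestion time."""
    raw: bytes   # stored exactly as received
    source: str  # where the data came from
    fmt: str     # original format, e.g. "csv" or "json"


@dataclass
class ToyDataLake:
    """Hypothetical in-memory stand-in for a data lake (illustration only)."""
    catalog: dict = field(default_factory=dict)

    def ingest(self, name, raw, source, fmt):
        # Keep the original bytes untouched; only record metadata about them.
        self.catalog[name] = Entry(raw=raw, source=source, fmt=fmt)

    def find(self, **criteria):
        # Query the metadata catalog, e.g. find(fmt="csv").
        return sorted(
            name for name, e in self.catalog.items()
            if all(getattr(e, k) == v for k, v in criteria.items())
        )

    def read(self, name):
        # Structure is applied only when the data is read.
        e = self.catalog[name]
        text = e.raw.decode("utf-8")
        if e.fmt == "csv":
            return list(csv.DictReader(io.StringIO(text)))
        if e.fmt == "json":
            return json.loads(text)
        return text  # unknown formats are returned as plain text


# Made-up sample data from three heterogeneous sources.
lake = ToyDataLake()
lake.ingest("prices", b"region,price\nON,2.5\nQC,2.1\n", source="survey", fmt="csv")
lake.ingest("sensors", b'[{"id": 1, "temp": 21.4}]', source="iot", fmt="json")
lake.ingest("notes", b"free-text field notes", source="survey", fmt="txt")
```

Calling `lake.find(source="survey")` lists the datasets ingested from that source, and `lake.read("prices")` parses the CSV only at that point. A "data swamp", in these terms, would be a catalog whose metadata is missing or wrong, so that `find` can no longer locate anything useful.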
+1 vote
answered ago by (220 points)
Please also take into account that, due to the current information revolution (Industry 4.0), companies and business schools have started to change how they formulate business strategies by including IT concepts. Please read the article “How Smart, Connected Products Are Transforming Companies” by Michael Porter, a famous professor at Harvard Business School: https://hbr.org/2015/10/how-smart-connected-products-are-transforming-companies. In this article, you will find your favorite term, “data lake”. And this is the link to the corresponding 1.5-hour lecture: https://www.youtube.com/watch?v=IioQ_jQjM3Q
commented ago by (330 points)
Thank you, @user_5o4c9e and @Jan_H._Höffler. I checked the references that you listed in the comments and answer sections. So building a data lake may be modelled as a fixed and/or sunk cost (depending on how the owner sets it up). Data lakes may scale the firm's production non-linearly. Data lakes may be viewed as heterogeneous production inputs for the owner. Data lakes may require highly specialized services (thinking of high switching costs: it is difficult, if not impossible, for a data lake owner to switch service providers). The data may be organized in a panel data format (translating the information from the discussion on the sites referenced above).
So there are several potential market power issues that may arise: storing the data (few large firms offer cloud storage), cleaning the data (there is a small number of highly specialized firms), and analysing and interpreting the information (again, a small number of large firms dominate the market, at least according to Bloomberg stats).
More importantly, a data lake requires specialized training, and the lack of certification and standards may generate information problems (a market for lemons, not in cars but in skills).
And then there is the role of government policy: for example, some data lake service providers are locked out of certain markets if their services violate residency or security requirements. Thank you again for the references and the high-quality discussion.
...