The rise of data centres
Cloud & Big Data
IDATE has published a report exploring the various technical and economic issues bound up with cloud computing and Big Data, giving readers a deeper understanding of these much-discussed concepts. Among the key points examined is the critical role that data centres play in implementing technical solutions. Readers will also find a detailed profile of the players, in both the public and private sectors, who are shaping the market.
“The development of cloud computing is bound up with challenges that lie primarily in the arena of infrastructure: powerful data centres and the IaaS sub-segment, which involves making distributed computing resources available on demand. This is the cloud computing market’s most diversified segment in terms of ecosystem, populated by all the players involved in infrastructure (not necessarily IT), as well as those that own their own networks (e.g. telcos) and/or server farms (hosting companies, CDNs) and massive databases (service operators, integrators)”, says Julien Gaudemer, Consultant at IDATE and project manager of this study.
“Across the board, these players are leaders in their field, as barriers to entry are made colossal by the investments required (real estate and servers, even allowing for the optimisation that virtualisation makes possible) and by the operating costs involved, especially power. Energy consumption is in fact a major source of concern. The role played by public authorities is also potentially a crucial one, both in their ability to stimulate market development through financing and in their regulation of sensitive issues such as data security and privacy.”
The cloud: big data facilitator
Until the concept of Big Data emerged, data were processed locally in data warehouses composed of several structured databases. Little by little, data sources have diversified, become increasingly heterogeneous in format and, above all, come to be located chiefly on the Web. Another important point is that they are now being produced continuously, at a steady pace.
The massive computing capabilities needed to exploit this trove of information and these data streams are often available only in large data centres. Cloud computing makes it possible to rent computing power, along with storage and hosting space, tailored to big data processing. Only a handful of players own the IT equipment needed to perform this kind of processing on their own infrastructure. The cloud will therefore make big data available to small and medium-sized businesses and to other players who have no expertise in data processing.
From a technological standpoint, because traditional datamining solutions were not adapted to these new types of data, innovative software solutions began to appear, and the concept of the unstructured database became increasingly common, as opposed to the traditionally employed structured database. Structured data refers to information that typically shares a common format, whereas unstructured data come in very different formats. The term NoSQL (not only SQL) covers the techniques associated with unstructured data. Google created MapReduce, a programming model that breaks a problem down into sub-problems solved in parallel on different servers. This model inspired Hadoop, an open-source implementation developed by the Apache Software Foundation that is now widely employed in the arena of big data. Parallel computing now underpins a great many big data architectures, reducing processing times considerably and bringing data processing close to real time.
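By way of illustration, here is a minimal Python sketch of the map and reduce steps behind a word count, the canonical MapReduce example. It uses the standard multiprocessing module in place of a server cluster, and the function names (map_words, reduce_counts) are ours for illustration, not part of Hadoop or Google's framework.

```python
# Illustrative toy of the MapReduce pattern: map tasks run in parallel,
# then a single reduce step merges their partial results.
# Function names are hypothetical; this is not Hadoop's API.
from collections import Counter
from multiprocessing import Pool

def map_words(document: str) -> Counter:
    """Map step: count word occurrences in one document."""
    return Counter(document.lower().split())

def reduce_counts(partials: list) -> Counter:
    """Reduce step: merge the partial counts from every mapper."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    documents = [
        "big data in the cloud",
        "the cloud makes big data available",
        "data centres power the cloud",
    ]
    # Each document is handled by a separate worker process, mirroring
    # how MapReduce distributes sub-problems across servers.
    with Pool() as pool:
        partials = pool.map(map_words, documents)
    print(reduce_counts(partials).most_common(3))
```

In a real Hadoop deployment the same two roles are played by mapper and reducer tasks distributed across a cluster, with the framework handling data placement and fault tolerance rather than a local process pool.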
There would seem to be a very large number of potential applications, given the relatively wide array of data sources and the growing number of open data initiatives that are making government data, and certain private sector companies’ data, available for public use. The Internet of Things (connected sensors, RFID chips, etc.) will also be fertile ground for potentially major applications, especially in marketing and business intelligence, and in vertical domains such as medicine and geology, which could be heavy users of big data solutions. Existing datamining applications are also affected, as they can be improved by the use of outside data and by employing big data processing techniques.
For now, however, these solutions are being offered, or explored, by only a handful of players that specialise in data processing, control their own infrastructure and are expert in classic datamining solutions. Big data requires a very specific set of skills that is not yet terribly common in the IT world. Developers who are not skilled in data processing generally need special training to be able to develop big data applications.
So big data is still only nascent, and its technical building blocks are not yet fully mature or widely available. In the medium term, big data will likely benefit those players who are capable of gathering massive amounts of data and processing them in a meaningful way: Google and Facebook are the first that come to mind, Google thanks to the crawling techniques and infrastructure that fuel its search engine, and Facebook thanks to the information it holds on its millions of users.