Is DNA the Future Solution to Big Data?


By MYBRANDBOOK




For more than a decade, Cisco has tracked and projected global Internet traffic growth and associated networking trends through the Visual Networking Index (VNI). From day one, the Zettabyte has been a benchmark that our analysts have targeted as a major networking milestone.

 

“The Zettabyte Era Officially Begins”: what is it all about?

 

When will global Internet traffic reach an annual run rate of one Zettabyte?

 

Well, that day has finally come. According to one estimate, the world’s collective Internet use reached the Zettabyte threshold for the calendar year on September 9, 2016. Consider that the Internet essentially began to scale for global consumer and business use in the early 1980s: it has taken nearly four decades of global innovation and expansion to reach this digital peak.

 

A Zettabyte is a whole lot of traffic. Here’s how big a Zettabyte is:

 

   *  A zettabyte is a measure of storage capacity equal to 10^21 bytes (1,000,000,000,000,000,000,000 bytes), or 1 sextillion bytes. The binary figure sometimes quoted, 2 to the 70th power bytes, is slightly larger (about 1.18 × 10^21 bytes).

   *  One Zettabyte is equal to a thousand Exabytes, a billion Terabytes, or a trillion Gigabytes.
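
The unit relationships above can be checked with a few lines of Python. This is only a sketch: it assumes decimal (SI) prefixes, and the constant and function names are illustrative.

```python
# Decimal (SI) byte units, as used for the Zettabyte figures in this article.
GIGABYTE = 10**9
TERABYTE = 10**12
EXABYTE = 10**18
ZETTABYTE = 10**21

# Binary counterpart (2 to the 70th power bytes), often confused with the Zettabyte.
ZEBIBYTE = 2**70


def zettabyte_equivalents():
    """Express one Zettabyte in smaller decimal units."""
    return {
        "exabytes": ZETTABYTE // EXABYTE,
        "terabytes": ZETTABYTE // TERABYTE,
        "gigabytes": ZETTABYTE // GIGABYTE,
    }


print(zettabyte_equivalents())
# Ratio between the binary and the decimal unit (a little above 1.18).
print(ZEBIBYTE / ZETTABYTE)
```

The integer division is exact here because every unit is a power of ten.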

    

Global Internet traffic as measured by Cisco had just exceeded 1 ZB in 2016 and is expected to exceed 3 ZB by 2021. But that traffic is still small compared with the data being generated, which exceeded 1 ZB as early as 2012: IDC, in its report Data Age 2025, showed that the threshold of 20 ZB was already exceeded this year and that this exponential growth would lead to breaking through 160 ZB by 2025.

 

 

We are generating an immense amount of data, and we are rapidly reaching the capacity limit of the current technology to handle it. Some might argue that a large part of the data generated is garbage that could easily be deleted without any problem, but it is difficult to understand today what might become relevant in the future, so this can certainly not be considered a solution.

 

Big Data is already a challenge in terms of computing capacity today, but it will soon become a challenge in terms of space with today’s technologies: SSD media have brought some performance improvement over magnetic hard disks, but when it comes to long-term storage we are still stuck with magnetic tapes.

 

In 2007, G. M. Skinner, K. Visscher, and M. Mansuripur published a fairly revolutionary paper in the Journal of Bionanoscience, titled Biocompatible Writing of Data into DNA, in which they used a simple DNA-based storage scheme. In this work, the group demonstrated that it is possible to “write” information in DNA strands and to read it back using a specific gel. The method was still rudimentary, but the way was paved.

 

Sequencing and Synthesis

 

The process of reading DNA, better known as “sequencing”, received a major boost from the work of the NHGRI within the scope of the Human Genome Project, which was completed in 2003.

 

DNA is made of 4 bases: Adenine, Guanine, Thymine, and Cytosine. The “trick” is that the only pairings allowed are between Adenine and Thymine, and between Cytosine and Guanine, thus allowing the reconstruction of the sequence by introducing one base at a time. The process is repeated millions of times. Now, by assigning a pair of bits to each base you get a 2-bit code: 00, 01, 10, 11. And voilà, we have a digitization scheme.
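
The 2-bit scheme described above can be sketched in a few lines of Python. The article does not say which base receives which code, so the mapping below (A=00, C=01, G=10, T=11) and the function names are assumptions made purely for illustration. Practical schemes add sequence constraints and error correction, which this sketch ignores.

```python
# Hypothetical assignment of 2-bit codes to bases; the article does not fix one.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes exactly four bases, so a strand produced by `encode` always has a length that is a multiple of four.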

 

Why DNA? The advantages are here...

 

    Density: DNA is above all incredibly dense. Already last year the threshold of 200 PetaBytes (1 PB = 1,000 TB) per gram was exceeded. It is estimated that all data on the Internet today could be easily contained on DNA in the space of a shoe box (!).

    Fidelity: data recovery can be virtually error-free due to the accuracy of DNA replication methods.

    Sustainability: the energy required to maintain DNA-encoded information is a small fraction of that required by modern data centers.

    Longevity: DNA is a stable molecule that can last for thousands of years without degrading.
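
To get a feel for the density figure in the list above, here is a back-of-the-envelope calculation in Python. It is only a sketch: it takes the 200 PetaBytes per gram quoted in the text at face value, assumes decimal units, and ignores any overhead for redundancy or packaging. The names are illustrative.

```python
PETABYTE = 10**15
ZETTABYTE = 10**21

# Density figure quoted in the article (assumed to be decimal PetaBytes per gram).
DENSITY_BYTES_PER_GRAM = 200 * PETABYTE


def grams_of_dna(num_bytes: int) -> float:
    """Mass of DNA needed to hold num_bytes at the quoted density."""
    return num_bytes / DENSITY_BYTES_PER_GRAM


print(grams_of_dna(ZETTABYTE))  # 5000.0 grams, i.e. 5 kg per Zettabyte
```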

 

Sequencing technologies are now very advanced: today there are even pocket-sized USB sequencers, and the most advanced devices can execute many runs in parallel.

 

Writing (or synthesizing) DNA instead requires “attaching” one base after another in a controlled environment, a very slow chemical process that dates back to 1981. However, given the huge market demand, companies like Twist Bioscience and DNA Script have developed innovative synthesis technologies, based respectively on silicon and on enzymatic synthesis, which promise volumes orders of magnitude higher than traditional ones. Moreover, just recently, two researchers at the Synthetic Biology Informatics department of JBEI presented a new synthesis methodology that could lead to the creation of 3D printers for DNA.

 

Since the work of Skinner and colleagues, research has made huge progress: in 2015, Microsoft and MISL of the University of Washington created the DNA Storage project, establishing a record in 2016 by storing and successfully recovering 200 MB in strands of DNA. In 2017, in another important work, Y. Erlich and D. Zielinski stored and recovered 2 MB of material with a density of over 200 PetaBytes per gram, approaching the theoretical limit postulated by Shannon, through the use of “fountain codes”.

 

To this day, the DNA synthesis/sequencing process is still expensive (a few thousand dollars per MB for writing and 200 dollars for reading), but the cost is bound to fall, both because of the rapid evolution of the sector, driven by the explosive demand for engineered DNA, and because data storage can use ad-hoc synthesized DNA instead of biological DNA. In this regard, the extensive use of editing technologies such as CRISPR/Cas9, TALEN and ZFN in genetic manipulation is expected to become the main driver of growth in this market.

 

The use of DNA for digitization therefore no longer belongs to science fiction: we are already starting to see the first prototype applications.

 

Encryption: Carverr, an American startup, has developed a method to encrypt data into DNA molecules and offers a DNA-based password encryption service for $1,000.

    

Cloud: Just last March Microsoft published a paper in Nature demonstrating the ability to read DNA through random access, dramatically increasing the efficiency of the sequencing process. Thanks to progress like this and the advances mentioned above, Microsoft seems to be starting to consider DNA for future cloud backup and is actively collaborating with Twist Bioscience. The costs are still very high, but people at Redmond are convinced that this obstacle will easily be overcome if there is sufficient demand from the computer industry.

 


Copyright www.mybrandbook.co.in @1999-2024 - All rights reserved.
Reproduction in whole or in part in any form or medium without express written permission of Kalinga Digital Media Pvt. Ltd. is prohibited.