Big data technologies are designed to handle the challenges of processing very large, fast-changing data. Tools in the Hadoop ecosystem such as MapReduce, Pig, Hive, HBase, Sqoop, Spark and Storm are popular for big data processing. NoSQL databases such as MongoDB are also used. A Distance MBA in AI and Machine Learning online course would cover some of these technologies along with analytics. SimpliDistance helps you find the best online courses.
Distance MBA SimpliDistance
Why Care for Big Data – Distance MBA in AI and ML - SimpliDistance
We live in a connected world where more than 3.7 billion people are connected to the internet. We use many digital gadgets daily. We communicate with our friends and the world by posting messages, likes and forwards on Facebook and Twitter, through emails, and through mobile apps like WhatsApp. Millennials record every event of their lives in photos and videos: what food they are eating, which movie they are watching, at which airport they are waiting for a flight, and so on.
With all these interactions we are generating a huge amount of data. Internet search engines are an inseparable part of our daily life. Google processes 3.5 billion searches a day!
Why should we care about this? The challenge of handling such a large amount of data seems daunting. Does it even make sense to embark on such an arduous task? Let us assume that we want to know how a particular disease is spreading across a country, because it could cause an epidemic and we want to act in time to prevent an outbreak. As part of the solution, one may try to contact all the hospitals and private medical practitioners to get a head start on the information. With this data, aid can be provided to crisis-hit areas. However, this would be a huge effort and would also be time-consuming.
Another way to approach this problem is with the help of search engines. Observe which search queries related to the disease or its medicines are being made on the search engine, and how many of them are as recent as one week or 15 days. Analysis of such data would be very useful in tracking how the disease is spreading. It would also be very quick, almost real time. Hence it makes a lot of sense to process such data.
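The idea of counting recent disease-related searches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the query log, the keyword "dengue" and the function name `count_recent_matches` are all hypothetical, and a real search engine would work on vastly larger logs with far more careful query matching.

```python
from datetime import date, timedelta

# Hypothetical query log: (date of search, query text).
# Real data would come from a search provider, not a hard-coded list.
QUERY_LOG = [
    (date(2024, 3, 1), "dengue fever symptoms"),
    (date(2024, 3, 9), "dengue treatment"),
    (date(2024, 3, 12), "cheap flights"),
    (date(2024, 3, 13), "Dengue fever symptoms"),
    (date(2024, 3, 14), "dengue medicine"),
]


def count_recent_matches(log, keyword, today, window_days):
    """Count queries mentioning `keyword` made in the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    return sum(
        1
        for day, text in log
        if cutoff < day <= today and keyword in text.lower()
    )


today = date(2024, 3, 15)
last_week = count_recent_matches(QUERY_LOG, "dengue", today, 7)
last_15_days = count_recent_matches(QUERY_LOG, "dengue", today, 15)
print(last_week, last_15_days)
```

Comparing the one-week count with the 15-day count gives a rough sense of whether interest in the disease is rising or fading.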
3 V’s of Big Data
Let us see what characteristics data needs to have before it can be called big data. Initially, three qualities were used to qualify anything as big data: volume, velocity and variety, popularly called the 3 V’s.
Volume
Very large volume is the first characteristic of big data. Here are some sample statistics on the number of users who produce large volumes of data every day through comments, posts, photos, videos, likes etc. on different social media platforms.
• Facebook: 2000 million users
• Google+: 111 million users
• Instagram: 1000 million users
Velocity
Velocity refers to the speed at which data is generated from human interactions with social media, mobile apps, websites etc. It is a distinctive characteristic of big data. One should be able to handle this velocity to get insights and use them for competitive advantage.
Variety
Variety refers to the different types of data. We generate many types of data, such as text files, PDFs, Excel sheets, emails, photos, databases, videos and data generated by sensors. This includes structured data like database records and unstructured data like comments, likes etc. Unstructured data cannot easily be retrieved using Structured Query Language (SQL). Hence, a different type of database, known as a NoSQL database, is used to handle such data. MongoDB and CouchDB are examples of such databases.
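The difference between structured records and the looser, document-style records that NoSQL databases store can be illustrated with plain Python data. This is a minimal sketch, not MongoDB's or CouchDB's actual API: the sample records and the helper `find` are hypothetical and only mimic the idea of querying documents whose fields vary.

```python
import json

# Structured data: every record has the same columns, as in a relational table.
COLUMNS = ("user_id", "name", "city")
rows = [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai")]

# Document-style data: each record describes itself, and fields may differ
# from one record to the next (a comment, a like, a photo with tags).
documents = [
    {"user_id": 1, "type": "comment", "text": "Great movie!"},
    {"user_id": 2, "type": "like", "target": "post-42"},
    {"user_id": 1, "type": "photo", "tags": ["food", "travel"]},
]


def find(docs, **criteria):
    """Return documents whose fields equal every given criterion.

    Hypothetical helper: documents lacking a field simply do not match.
    """
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


comments = find(documents, type="comment")
by_user_1 = find(documents, user_id=1)
print(json.dumps(comments))
```

Note that no fixed schema had to be declared before the photo record with its `tags` list could be stored next to the comment and the like.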
Veracity, Validity, Volatility
The dictionary meaning of the word ‘veracity’ is ‘conformity to facts’ or ‘accuracy’. In data processing, the bigger challenge is the cleanliness of the data. As data is gathered from multiple sources, the chances of getting a lot of noise or bad data are very high. If you use this so-called dirty data without cleaning it, predictions and analysis will not be useful and may sometimes be grossly wrong. Veracity refers to this accuracy or cleanliness of data. Along with veracity, the validity of the data in its context is equally important.
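A small cleaning step of the kind described above might look like the following sketch. The sample records, the field names `city` and `cases`, and the rules applied (drop unparseable counts, missing cities, negative counts and duplicates) are assumptions chosen for illustration; real cleaning rules depend on the data and the business context.

```python
# Hypothetical raw records gathered from several sources.
raw_records = [
    {"city": "Pune", "cases": "12"},
    {"city": " pune ", "cases": "12"},   # duplicate once the name is normalised
    {"city": "Mumbai", "cases": "n/a"},  # count cannot be parsed
    {"city": "", "cases": "5"},          # missing city
    {"city": "Delhi", "cases": "-3"},    # negative count: invalid in this context
    {"city": "Nagpur", "cases": "7"},
]


def clean(records):
    """Normalise records and drop ones that are unusable or duplicated."""
    seen = set()
    cleaned = []
    for rec in records:
        city = rec.get("city", "").strip().title()
        try:
            cases = int(rec.get("cases", ""))
        except ValueError:
            continue  # veracity: the value is not a number at all
        if not city or cases < 0:
            continue  # validity: the value makes no sense in this context
        key = (city, cases)
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append({"city": city, "cases": cases})
    return cleaned


cleaned_records = clean(raw_records)
print(cleaned_records)
```

Only the Pune and Nagpur records survive here; the other four are exactly the kind of noise that would distort an analysis if left in.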
Volatility refers to how long the data remains valid in the business context. For example, Facebook comments about a recently released movie may not have a lasting impact, because user sentiment may change every week. The data therefore needs to be processed quickly so that corrective action can be taken sooner.
Value
It makes sense to process big data only if it has value. Extracting value becomes difficult due to velocity and volatility. Big data technologies are designed to handle these challenges. Tools in the Hadoop ecosystem such as MapReduce, Pig, Hive, HBase, Sqoop, Spark and Storm are popular for big data processing.
NoSQL databases such as MongoDB are also used. A Distance MBA in AI and Machine Learning online course would cover some of these technologies along with analytics. After completing the distance MBA program, you would use these technologies in a data science role to overcome these challenges.
Conclusion
SimpliDistance uses technology to offer visitors career mapping and personalized guidance, and suggests suitable courses to upgrade their knowledge and skills so they can grow in their careers. SimpliDistance is the best Distance Learning Portal in India.