Introduction
The global community is experiencing rapid growth in the volume of data generated, much of it containing sensitive personal information (Madden, 2012). When data is generated this rapidly, data holders find it very challenging to manage every record, which leads to a lack of privacy for sensitive information. A data holder also faces a significant trade-off between hiding data and handling a wide variety of it. Big Data analytics is one of the advanced analytical technologies applied to large-scale datasets. Big Data plays a vital role in this field, and it can lead to data privacy breaches. Owing to rapid technological advancement, the volume of streamed data has grown enormously. Google, YouTube, Facebook, and WhatsApp collect users' personal and sensitive data, which these organizations then archive (Kavanaugh et al., 2012). In research, Big Data includes mobile data, healthcare data, traffic multimedia data, and aircraft data. Data generated by airline transportation is especially challenging for Big Data analytics. The resulting archives are analyzed for personal information to the profit of the organizations that hold them. The privacy of information is therefore very important for both private and public data, yet preserving the privacy of large datasets is cumbersome. As a result, many corporate organizations, customers, and end-users hesitate to rely on the Cloud, citing the insecurity of its virtual storage and its handling of large-scale datasets.
Anonymization
Anonymization refers to the removal of sensitive data with the intent of protecting privacy. Data anonymization helps data to be shared from a source server to a destination client across boundaries without exposure to side attacks. Anonymization based on k-anonymity is widely used for this purpose in data hiding and data sharing. With these structures, we combine the data processing categories in order to process large datasets efficiently. Two broad anonymization methods, bottom-up generalization (BUG) and top-down specialization (TDS), play a vital role in data privacy and in hiding the sensitive attributes of a dataset. The former, BUG, generalizes data from the bottom of the taxonomy upward (Wang et al., 2004), whereas the latter, TDS, specializes data from the top of the taxonomy downward (Fung et al., 2005). Nevertheless, both methods suit only traditional data; on large-scale data they lack efficiency and scalability. Approaches built on these two techniques are categorized as parallel BUG, hybrid BUG, TDS, two-way TDS, Mondrian TDS, and so on.
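To make the idea of k-anonymity and bottom-up generalization concrete, the following is a minimal Python sketch, not the algorithm of Wang et al. (2004). A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The taxonomy, the attribute names (`job`, `sex`, `disease`), and the helper functions are all hypothetical and chosen only for illustration. The sketch also generalizes a single attribute uniformly, whereas published BUG methods select each generalization step by an information-loss versus privacy-gain score.

```python
from collections import Counter

# Hypothetical taxonomy (an assumption for illustration): child -> parent.
TAXONOMY = {
    "Engineer": "Professional",
    "Lawyer": "Professional",
    "Dancer": "Artist",
    "Writer": "Artist",
    "Professional": "Any",
    "Artist": "Any",
}


def is_k_anonymous(records, quasi_ids, k):
    """Return True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())


def generalize(records, attribute, taxonomy):
    """Replace each value of `attribute` with its parent, one taxonomy level up."""
    return [{**r, attribute: taxonomy.get(r[attribute], r[attribute])} for r in records]


def bottom_up_generalize(records, quasi_ids, attribute, taxonomy, k):
    """Climb the taxonomy one level at a time until k-anonymity holds
    or no further generalization is possible (simplified, single attribute)."""
    current = records
    while not is_k_anonymous(current, quasi_ids, k):
        candidate = generalize(current, attribute, taxonomy)
        if candidate == current:  # already at the taxonomy root
            break
        current = candidate
    return current


# Illustrative records only; "disease" plays the role of the sensitive attribute.
records = [
    {"job": "Engineer", "sex": "M", "disease": "Flu"},
    {"job": "Lawyer", "sex": "M", "disease": "Cold"},
    {"job": "Dancer", "sex": "F", "disease": "Flu"},
    {"job": "Writer", "sex": "F", "disease": "Asthma"},
]

anonymized = bottom_up_generalize(records, ["job", "sex"], "job", TAXONOMY, k=2)
for row in anonymized:
    print(row)
```

In this toy table a single generalization step is enough: the four distinct job titles collapse into two groups of two records each, so the quasi-identifiers `job` and `sex` no longer single out any individual. A top-down specialization sketch would run in the opposite direction, starting from the most general value and refining it only while k-anonymity is preserved.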