An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

Document Type: Research Article

Authors

Computer Science Department, Shahid Bahonar University of Kerman, Kerman, Iran

Abstract

Data grid technology, which exploits the scale of the Internet to overcome storage limits for very large data sets, has become an active research topic. Data replication strategies are widely employed in distributed environments to copy frequently accessed data to suitable sites. Their primary purpose is to shorten file transfer distances by serving files from locations near the requesting sites, thereby minimizing retrieval time and bandwidth usage. In this paper, we propose a new replica selection strategy based on response time and security. Because the storage capacity of each Data Grid site is limited, replication must be used judiciously; we therefore also propose a new replica replacement strategy that considers file availability, access time, access frequency, and file size. Simulation results show that the proposed strategy improves mean job time, bandwidth consumption for data delivery, and data availability compared with the other tested algorithms.
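As a rough illustration of how a replacement policy might combine the four factors named above (file availability, access time, access frequency, and file size), the following Python sketch ranks stored replicas and picks eviction victims when a site runs out of space. The abstract does not give the paper's actual formula, so everything here is an assumption made for illustration: the `Replica` fields, the `retention_value` scoring function, and the choice to treat a file that is highly available elsewhere, rarely used, long idle, or large as a better eviction candidate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """A replica stored at one site (all fields are hypothetical)."""
    name: str
    size_mb: float       # file size in megabytes
    availability: float  # assumed chance in [0, 1] that the file is obtainable elsewhere
    last_access: float   # time of the most recent access, in simulation seconds
    access_count: int    # number of accesses observed so far


def retention_value(replica: Replica, now: float) -> float:
    """Illustrative score: higher means the replica is more worth keeping.

    This is NOT the formula from the paper. It only combines the four
    factors in one plausible way: frequent, recent, small, hard-to-obtain
    files score high.
    """
    idle_time = max(now - replica.last_access, 1.0)  # avoid division by zero
    scarcity = 1.0 - replica.availability
    return replica.access_count * scarcity / (idle_time * replica.size_mb)


def select_victims(replicas: List[Replica], needed_mb: float, now: float) -> List[str]:
    """Return names of replicas to evict, lowest retention value first.

    Returns an empty list if evicting everything still would not free
    enough space, in which case the new replica should not be stored.
    """
    victims: List[str] = []
    freed = 0.0
    for replica in sorted(replicas, key=lambda r: retention_value(r, now)):
        if freed >= needed_mb:
            break
        victims.append(replica.name)
        freed += replica.size_mb
    return victims if freed >= needed_mb else []


stored = [
    Replica("a", size_mb=100, availability=0.9, last_access=10, access_count=2),
    Replica("b", size_mb=50, availability=0.2, last_access=95, access_count=20),
    Replica("c", size_mb=200, availability=0.5, last_access=50, access_count=5),
]
to_evict = select_victims(stored, needed_mb=250, now=100)
```

In this toy setup, the rarely used and widely available file `a` is evicted first, followed by `c`, while the small, popular, and scarce file `b` is kept. A weighted sum or a different treatment of file size would be equally defensible; the sketch is only meant to show the shape of such a policy.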
