Research Article  Open Access
Ying Zou, Yanting Chai, Sha Shi, Lei Wang, Yunfeng Peng, Yuan Ping, Baocang Wang, "Improved CloudAssisted PrivacyPreserving ProfileMatching Scheme in Mobile Social Networks", Security and Communication Networks, vol. 2020, Article ID 4938736, 12 pages, 2020. https://doi.org/10.1155/2020/4938736
Improved CloudAssisted PrivacyPreserving ProfileMatching Scheme in Mobile Social Networks
Abstract
Due to the transparency of the wireless channel, users in multiplekey environment are vulnerable to eavesdropping during the process of uploading personal data and reencryption keys. Besides, there is additional burden of key management arising from multiple keys of users. In addition, profile matching using inner product between vectors cannot effectively filter out users with ulterior motives. To tackle the above challenges, we first improve a homomorphic reencryption system (HRES) to support a single homomorphic multiplication and arbitrarily many homomorphic additions. The public key negotiated by the clouds is used to encrypt the usersâ€™ data, thereby avoiding the issues of key leakage and key management, and the privacy of usersâ€™ data is also protected. Furthermore, our scheme utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring the usersâ€™ proximity. Thus, we can effectively improve the social experience of users.
1. Introduction
With the rapid development of Internet technology, mobile devices such as mobile phones and tablets have gradually become popular in peopleâ€™s daily life in recent years. Some social networks such as WeChat, Twitter, and Facebook are gradually integrated into peopleâ€™s life, and people would like to share some opinions, pictures, and videos with others. Therefore, people are more willing to find potential friends with similar interests in mobile social networks.
Profilematching is the most effective way to measure the proximity between usersâ€™ personal profiles. The userâ€™s personal profile is often defined as a vector in practical applications, and each dimension of the vector represents an attribute corresponding to a hobby, such as football, photography, and religion. Each attribute value is represented by an integer between 0 and 10 or a larger range. The attribute value indicates the degree of the interest. The value 0 means that the user has no interest in the item, and value 10 represents that the user is particularly fond of it. Moreover, social proximity is often defined as the inner product of two usersâ€™ vectors [1â€“3]. Since the inner product is the sum of the products of the corresponding attributes between the two vectors, it cannot accurately show the proximity between users. For example, the vector of user is , user is , and user is . User wants to find a person with a higher proximity between user and user . If the inner product of two vectors is chosen as a measurement, it is surprising that user meets higher requirements than user . To get more accurate social proximity results, it is preferred to adopt cosine proximity result between the two normalized vectors as the standard.
In addition, usersâ€™ profiles often contain sensitive data and they do not want to expose their personal information. One way to protect the usersâ€™ privacy is to encrypt the data by cryptographic technologies, but the structure of raw data will be essentially damaged after encryption, causing difficulties in reprocessing the data. The homomorphic encryption technology [4â€“7] has its unique advantages in encrypted data processing. In particular, partial homomorphic encryption technique [8] is more suitable for many realistic applications [9].
In many existing profilematching works [1, 10, 11], a large amount of interactions are often required between user mobile terminals to obtain matching results, which will bring heavy computational costs and communication overhead to users. In addition, users also need to stay online during the matching process. Fortunately, with the development of cloud computing technology [12, 13], the cloud platform can provide users with huge storage space and abundant computing resources. Outsourcing computing and storage to the cloud can effectively reduce the burden on the usersâ€™ mobile terminals. Gao et al. [2] transfer usersâ€™ work to the cloud with the help of two cooperative but noncollusive clouds, and users can go offline after uploading their profiles to the cloud. Gao et al.â€™s scheme utilizes an ElGamallike proxy reencryption [14] algorithm with additive homomorphic property, which leads to the issues of key management and the leakage of reencryption keys. Furthermore, the ElGamallike algorithm requires that the size of the plaintext cannot exceed 40â€‰bits in order to ensure the decryption efficiency; thus it cannot be applied to the scenarios requiring high data accuracy.
The main contributions of our work are as follows:(i)We improve the HRES algorithm [15] in order to avoid the drawbacks in [2]. The improved algorithm supports one homomorphic multiplication and arbitrarily many homomorphic additions, which can effectively avoid the key leakage and the key management issues caused during usersâ€™ uploading reencryption keys.(ii)This paper utilizes the homomorphic multiplication property of the improved HRES algorithm to compute the cosine result between the normalized vectors as the standard for measuring proximity. Our proposal can effectively ensure the accuracy of the matching results and improve the social experience of the users. Furthermore, the improved HRES algorithm can prove to be semantically secure, and the profilematching protocol is also secure in the sense that both clouds cannot get useful information about usersâ€™ data under the noncollusion security model.
1.1. Related Work
The research on privacypreserving profile matching can be mainly divided into two categories, coarsegrained [16â€“19] and finegrained [1, 2, 10] profilematching schemes.
1.1.1. CoarseGrained Scheme
In the coarsegrained profilematching schemes, the matching proximity is often defined as a set intersection or the cardinality of intersections of userâ€™s attribute sets, but this solution cannot further distinguish the specific relevance between users. In 2011, Li et al. [16] proposed two distributed privacypreserving profilematching protocols, which deploy the homomorphic property of Shamir secret sharing scheme [20] to calculate the intersections of usersâ€™ private sets without relying on a trusted third party. However, the coarsegrained profile matching cannot accurately measure the proximity of users.
1.1.2. FineGrained Scheme
For the finegrained profilematching schemes [1, 2, 10], userâ€™s preference or behavior pattern is usually regarded as a multidimensional vector, and the social proximity is usually measured by the inner product between two usersâ€™ vectors. For example, Zhang et al. [1] designed a finegrained profilematching protocol with three security levels. A user can initiate a matching query for their encrypted data with other users and finally obtains the proximity result by utilizing the homomorphic addition property of the Paillier cryptosystem [21]. However, it is required that both users stay online to perform multiple interactions during the execution of protocols, thus imposing a heavy communication and computational burden on mobile terminals. To tackle the above issues, Gao et al. [2] introduced a novel cloudassisted profilematching scheme under multiple keys. The cloud environment is composed of two cooperative but noncollusive clouds, and users in the social application could go offline after uploading their encrypted data. A friend finder can designate a target and initiate a matching query to the cloud; then, the two cloud servers return the matching result to the user through interactions. In their scheme, the clouds perform most of the computations, which effectively reduces the burden of users. And the data providers do not have to stay online all the time. However, the scheme [2] utilizes a secure ElGamallike [14] proxy reencryption algorithm to encrypt data, so each user needs to generate their secret keys and reencryption keys. Although schemes in a multiplekey environment could benefit from some existing technologies [22] that could create secure communicating groups within a secure group with exchange of small information, multiple keys increase additional burden of key management in our consideration. Moreover, it seems impossible to guarantee usersâ€™ privacy once the userâ€™s reencryption key is leaked in the process of uploading to the cloud. What is worse, Gao et al.â€™s scheme uses the inner product of two vectors to measure the matching degree. However, the ElGamallike algorithm requires the size of the plaintext to be less than 40â€‰bits to make sure that the ciphertext can be efficiently decrypted using Pollardâ€™s kangaroo algorithm [23] to solve the discrete logarithm problem for a relatively smaller integer. Therefore, the scheme fails to provide high precision computations on the usersâ€™ data.
1.2. Organization
The rest of the paper is organized as follows. Section 2 gives some notations and the main part of the HRES algorithm [15]. Section 3 introduces the system and adversary models. Section 4 provides the construction of the improved algorithm and the profilematching scheme. Section 5 presents the security proof of the improved HRES algorithm and the security analysis of the profilematching protocol. Section 6 draws conclusions.
2. Preliminaries
In this section, we briefly introduce some notations appearing in this paper. We also introduce a fundamental HRES algorithm with two cooperative and noncollusive clouds and a data normalization method.
2.1. Notation
In this paper, we write for assigning x a value X and denote the integer set as . In addition, indicates that is calculated by the algorithm . We use and to denote the Euler and the Carmichael function, respectively. The ring means the residue class ring of integers modulo , i.e., . We use bold lowercase letters to represent vectors, and the Euclidean norm for a vector is denoted as . In addition, the bit length of an integer is denoted as . We use the symbol to denote the ceiling function. Besides, means the inner product of the two vectors and , and we use to denote the encryption function with the public key of the improved HRES algorithm that will be introduced in the next section.
2.2. Homomorphic ReEncryption
Reencryption technology supports a proxy transferring decryption authority without decrypting ciphertext [24]. On this basis, homomorphic reencryption supports homomorphic operation on ciphertext, which is more suitable for data dissemination in networks with special needs. In 2017, Ding et al. proposed a HRES homomorphic reencryption scheme [15], which includes two noncollusive cloud servers, CA and CB, that jointly manage the ciphertext data. The data owner encrypts his data with the public key generated by negotiation between the two cloud servers, and the ciphertexts can be correctly decrypted only if the two cloud servers cooperate with each other. The HRES consists of the following algorithms.(i)KeyGen. Given a security parameter , let be a safe RSA modulus, where and are primes of the form of and , and and are primes of equal bit length. Let be an element of the maximal order in . The two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffieâ€“Hellman key .(ii)Enc. Given the Diffieâ€“Hellman key and a message , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can decrypt the plaintext as follows:where the function is defined as .
2.3. Normalization Method
This section mainly introduces a common data standardization method: Zscore standardization method, which standardizes data based on the mean and variance of the original data. The processed data obeys the normal distribution; that is, the mean value of the data is 0 and the standard deviation is 1. The conversion formula is as follows:where represents the processed data, is the mean of the samples and denotes the standard deviation of the samples.
In the proposed scheme, the raw data need to be normalized before processing. The reason for this operation is that the values of data from different sources are quite different. In order to eliminate the influence of different numerical range and make the data comparable, data need to be normalized. Hence, Zscore method is adopted to make the data at the same level, which is convenient for analyzing the data.
3. System and Threat Model
3.1. System Model
As shown in Figure 1, our system model contains three entities: a friend finder (Alice), the cloud environment, and other users. Each entity is described as follows: Alice is marked as a friend finder who wants to find friends that have similar interests with her in the social network. The cloud environment includes two cloud servers, CA and CB, which can provide users with enormous storage space for storing personal profiles and a large amount of computing resources. Through cooperating, CA and CB can help Alice calculate the proximities with other users, and the privacy of the two users will be not compromised. There are many other users in social networks who prefer to outsource the encrypted data representing their preferences to the cloud CA.
Each user registers a personal account in the mobile social network and then fills in the personal information. The personal information contains userâ€™s preferences that can be used as a measurement for profile matching. However, the user does not want to reveal his private data in order to avoid illegal use [25]. Personal profile is often defined as a multidimensional vector , and each dimension represents an attribute corresponding to a hobby, such as cooking and tourism. Each attribute value may be represented by an integer ranging from 0 to 10, or a larger range. The bigger the number is, the more the users like the item will be, and vice versa. To better measure the proximity between users, the cosine value of normalized vectors with Zscore method is taken in this paper.
3.2. Threat Model
In the honestbutcurious model, an external adversary and an internal adversary are considered. An external adversary mainly refers to an eavesdropper who can get some information (e.g., encrypted data) through the transparent channel by eavesdropping. An internal adversary is an honestbutcurious entity such that he faithfully follow the agreement but attempt to collect and reveal private information during the execution of the agreement. The friend finder may want to expose other usersâ€™ profiles, while the two clouds may want to reveal the usersâ€™ personal data in the social networks. Moreover, it is assumed that the two clouds will never collude with each other and the users will not deliberately attempt to guess the cosine result by adjusting the vector multiple times.
4. Our Construction
In this section, we give the improved HRES algorithm and our privacypreserving profilematching scheme.
4.1. Improved HRES Algorithm
In order to support homomorphic multiplication computations, a slight modification has been made on the original HRES algorithm. Here it is required that the two clouds CA and CB will not collude with each other and cooperate to perform decryption operations. The improved algorithm includes the following algorithms: KeyGen, Enc, Partial Dec1, Partial Dec2, and Evaluation.(i)KeyGen. Choose a large prime integer . Note that the multiplicative group has primitive roots of order , and hence the algorithm can randomly choose a generator with order from . Then the two cloud servers CA and CB, respectively, generate their own key pairs: and . Therefore, CA negotiates with CB to generate their Diffieâ€“Hellman key .(ii)Enc. Given the Diffieâ€“Hellman key and a plaintext message , where , output the ciphertext as follows: , where indicates the ciphertext encrypted with and is a random integer selected from .(iii)Partial Dec1. Given and a ciphertext , the cloud CA can transfer the above ciphertext into another ciphertext as , where indicates that the ciphertext can be decrypted with .(iv)Partial Dec2. Given and a ciphertext , the cloud CB can output the plaintext as follows:â€‰where the function is defined as .(v)Homomorphic evaluation. We first show that the improved scheme supports homomorphic additions. Given any ciphertexts, namely, for , with the underlying plaintext being for , we firstly show that our improved algorithm supports homomorphic additions.
Specifically, we compute . Note that
From the refreshed ciphertext , the cloud CA can use his secret key to partially decrypt the ciphertext as , where
The cloud CB can further decrypt the ciphertext using his secret key as follows. The cloud CB first computes
Accordingly, it is easy to verifywhere the cloud CB can obtain the plaintext
Note that should hold, i.e., , if is needed. Therefore, our scheme supports homomorphic additions on ciphertexts.
Now we show that the proposal can perform a homomorphic multiplication on two ciphertexts, and , where with and for . Specially, we compute . Note thatand . The cloud CA can partially decrypt the ciphertext as , where and . The cloud CB first computes
Noting that
The cloud CB computes
CB could obtain when . Then, CB calculateswhere the cloud CB obtains the results . Thus, our scheme supports a homomorphic multiplication operation on ciphertexts when .
4.2. PrivacyPreserving Profile Matching
In this part, the improved HRES algorithm is adopted to implement our privacypreserving profilematching scheme. Suppose that Aliceâ€™s vector is and that Bobâ€™s vector is . The cosine value of the two vectors can be calculated as
In particular, since and are nonnegative integers. Users can encrypt their vectors with the Diffieâ€“Hellman key and then outsource their encrypted data to CA. The procedure for the data outsourcing is presented in Algorithm 1 and the privacypreserving profilematching scheme is shown as Algorithm 2.


5. Correctness and Security
In this section, we firstly give the correctness analysis of our protocol and then prove that the improved HRES algorithm is semantically secure with rigorous method. At last, we prove that our profilematching scheme is secure under the semihonest model.
5.1. Correctness
In our scheme, CB can correctly decrypt and obtain the result of without revealing any useful information. The demonstration is shown as follows. Firstly, CA computes
We can verify that
Then, CA generates the random integers to obfuscate , and the final result can be obtained through the following formula:where . Hence, CB can obtain . Then, CB calculates when , and it cannot reveal information about and .
On the other hand, we will prove that Alice can obtain a more accurate cosine result as follows.
Once CA receives , it can remove the blinding values by computing , and then it computes the following formulas:(i).(ii).(iii).
Then, CA obtains with the blinding value . CB decrypts the ciphertext, computes , and then encrypts the ceiling result with . Thus, CA can compute and get . We find that when the pretreatment number is large, is closer to the cosine result .
5.2. Security Proof of Improved Algorithm
In this subsection, the semantic security of the improved HRES algorithm is proved through three theorems. We first prove that the DDH problem in (the cyclic group of modulo ) is hard to solve. Based on this conclusion, the improved HRES algorithm could be semantically secure.
Theorem 1. Let be the cyclic group of modulo , and be a generator of . The discrete logarithm problem (in ) is hard to solve.
Proof. For the sake of contradiction, it is assumed that the discrete logarithm problem (in ) is not difficult; i.e., there exists an oracle that can solve the discrete logarithm problem (in ) in polynomial time. For example, given an input , the oracle returns the index as output.(1)Note that, if is a generator of and satisfies , is also a generator of . Our strategy is as follows. Let , we can assume that there exists an integer that satisfies , such that the equation holds. Multiply simultaneously both sides of the equation by and the equation can be obtained. Thus, it has .â€‰Since , can be denoted as where is an integer. Thus, can be written as the following formula:â€‰We can obtain . Then, we query the oracle for solving the discrete logarithm problem in to get the result with as input. And of course can be solved out. This implies that, if the discrete logarithm problem in were not difficult, the discrete logarithm problem in turns out to be easy as well. It is contradictory obviously.(2)If is a generator of and satisfies , is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.
Theorem 2. Let be a large prime. Let be the cyclic group of modulo , and be a generator of . The Decisional Diffieâ€“Hellman problem (in ) is difficult.
Proof. For the sake of contradiction, it is assumed that the Decisional Diffieâ€“Hellman (in ) is not difficult. It means that there exists an adversary who can find a random integer such that . We assume that is a random quadruple and is a DH quadruple, where and .(1)Note that, if is a generator of and satisfies , is also a generator of . If can find a random integer that satisfies , can be derived. Hence, equation holds. Besides, since is a generator of , we can also get the equation . We thus state that if there exists an adversary who can distinguish the random quadruple from DH quadruple , can also distinguish and in . However, the DDH assumption in is difficult, so the original hypothesis does not hold. Thus, we can get a conclusion that DDH assumption (in ) is also difficult.(2)If is a generator of and satisfies , then is also the generator that belongs to both and . The proof process is similar to the above one and we will not describe it here.
Theorem 3. If Decisional Diffieâ€“Hellman assumption in holds, the improved HRES algorithm presented in Section 5.1 is semantically secure.
Proof. During the KeyGen phase, the cloud servers CA and CB negotiate with each other to generate their Diffieâ€“Hellman key. Due to the difficulty of discrete logarithm problem in , it is negligible to get any information about, , or for any adversaries.
For the sake of contradiction, it is assumed that the scheme is not semantically secure; i.e., there exists a polynomial time adversary which can break semantic security with nonnegligible probability . constructs a distinguisher that can solve the DDH problem in . The construction is as follows.
Given a challenge quadruple , where , the goal of the distinguisher is to determine or , where is a random integer in . The challenger sends the public key to the adversary . Then, the adversary submits two challenge messages of equal length to the challenger based on his prior knowledge, and then the challenger returns as a challenge ciphertext to the adversary , in which . Finally, the adversary outputs as the guess result. If , the challenger outputs ; otherwise, it outputs . The discussion is as follows:(1)If , then is a valid ciphertext and the probability of adversary guessing correctly is equal to .(2)If , then is independent of the encrypted message because the random value is uniformly and randomly distributed among . Therefore, the probability that the adversary guesses correctly is .As a result, if the distinguisher can break the scheme with a nonnegligible probability , the adversary can attack the DDH assumption in with the same advantage. For the reason that the DDH assumption in is difficult, our improved scheme is semantically secure.
5.3. Security Analysis of Our Protocol
The security analysis of our privacypreserving profilematching scheme under the semihonest model will be presented in this subsection with a real and ideal paradigm [2, 26]. For any adversaries who attack a real protocol execution, there exists an adversary who attacks an ideal execution, such that the input and output distributions of the adversary and participants in both the real and the ideal executions are fundamentally the same.
Theorem 4. Our profilematching scheme described in Section 5 can securely obtain the matching result through the calculations on ciphertexts under the semihonest and noncollusive adversaries.
Proof. In this scheme, there are mainly four parties: Alice, Bob, CA, and CB. We can construct four simulators against four types of adversaries that will corrupt the privacy of Alice, Bob, CA, and CB, respectively.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . Then, chooses a random integer and encrypts into . randomly picks a vector , computes , and sends the partial decryption result processed by its secret key to . The view of involves the normalized vector , the encrypted results set , and the decryption result . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: After receiving the normalized vector , encrypts and , respectively, to get and . After that, chooses a random integer and encrypts into . Then sends the encrypted results to . The view of involves the normalized vector and the encrypted result set . The view of in real and ideal executions is indistinguishable owing to the semantic security of the improved HRES scheme mentioned above.
simulates as follows: randomly picks two vectors and . Then, encrypts them as , , , and . generates two random integers , ; respectively, obtains the encrypted values ; and partially decrypts the above result. Finally, it generates random integers