How to encrypt data with bouncy castle while ensuring the result is deterministic
Problem
We want to encrypt personally identifiable information. It should not be readable. However, because the results will also be used for machine learning, each time a value (say "ABC") gets encrypted, the resulting data should be the same.
Most encryption ciphers include an initialization vector. This goes against what we need. To be clear, the data is supposed to be encrypted, yet this doesn't need to be bulletproof. The data is never transferred outside of the organization and this is simply done to adhere to GDPR.
Context
We have decided to use Bouncy Castle because it supports a large number of encryption modes, including the (apparently fast) ECC. Since we are talking about encrypting several TB a day, it would be nice to have good performance.
Solution issues
Although the Bouncy Castle library is well written, it seems difficult to find good documentation and usage examples for it. I am struggling to find my entry point. Do I have to look at the org.bouncycastle.crypto package, or org.bouncycastle.crypto.engines, or crypto.ec? I found the ZeroBytePadding class, which I believe should point me to a potential engine that does what I want, but I cannot find what I am looking for.
Goal
A class that has a set of methods similar to this:
class Anonomyzer
  def initialize(publicKey: String, privateKey: String): Unit
  def encode(data: Array[Byte]): Array[Byte]
  def decode(data: Array[Byte]): Array[Byte]
With it, the last line of the following code should evaluate to true:
Anonomyzer.initialize("PUBLIC", "PRIVATE")
val once = Anonomyzer.encode(data)
val twice = Anonomyzer.encode(data)
Arrays.equals(once, twice)
Edit:
I've read more on this and found that what I am looking for is called the Electronic Codebook (ECB) mode of operation. Although this is not perfectly secure, this is the best we can hope for AFAIK.
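The deterministic behaviour described in the edit can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's default provider rather than Bouncy Castle, a single symmetric AES key instead of the public/private pair from the Goal section, and a hard-coded demo key. The class name `EcbDeterminismDemo` and the key are hypothetical. ECB reveals which plaintexts are equal, which is exactly the property wanted here and also the reason it is not considered secure in general.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

final class EcbDeterminismDemo {
    // Hypothetical fixed 128-bit demo key; a real key would come from a key store.
    private static final byte[] DEMO_KEY =
            "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    static byte[] encode(byte[] data) {
        return run(Cipher.ENCRYPT_MODE, data);
    }

    static byte[] decode(byte[] data) {
        return run(Cipher.DECRYPT_MODE, data);
    }

    private static byte[] run(int mode, byte[] input) {
        try {
            // ECB takes no IV, so the same key and plaintext always yield the same ciphertext.
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            cipher.init(mode, new SecretKeySpec(DEMO_KEY, "AES"));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "ABC".getBytes(StandardCharsets.UTF_8);
        byte[] once = encode(data);
        byte[] twice = encode(data);
        System.out.println(Arrays.equals(once, twice));                        // prints true
        System.out.println(new String(decode(once), StandardCharsets.UTF_8)); // prints ABC
    }
}
```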
java scala encryption bouncycastle
Your requirements conflict with each other, and you don't seem to understand the difference between symmetric and asymmetric cryptography. Encryption is not necessarily equal to security, and encrypting identical plaintexts to identical ciphertexts is a security issue.
– James K Polk, Mar 28 at 16:55
They are not really my requirements, but our client's requirements. I'd be happy to improve this answer to "be correct" if you help me. The use case is rather common, I believe: encrypt personal data, yet still ensure that a phone number, for example, results in the same string on the output. I am aware of potential rainbow attacks.
– pascalwhoop, Mar 28 at 17:15
You don't need to "encrypt" using security libraries like Bouncy Castle. Just calculate a hash from the personal data; it's irreversible and deterministic, without using an initialization vector.
– Markus Appel, Mar 29 at 15:23
Well, unfortunately we do need to reverse the data. Once the derived models have been calculated, the predicted customer information gets fed into mailing campaigns etc., so the data needs to be decryptable. I wish it were different, but that's what the customer asks for.
– pascalwhoop, Apr 1 at 13:56
asked Mar 28 at 16:14 by pascalwhoop, edited Mar 28 at 17:49
1 Answer
> However, because the results will also be used for machine learning, each time a value (say "ABC") gets encrypted, the resulting data should be the same
You may have more options than that. It is still safer to properly encrypt data where it needs to be encrypted. You may have different datasets for different purposes.
Just suggestions:
- You may anonymize the learning dataset, stripping the PII from the data and aggregating it to a reasonable level that is still valuable for ML. I'd prefer this option because it is clean and carries no risk of breaching any rules or leaking protected information.
- You may hash PII (or categorical data), which provides a unique mapping without a reversible mapping (though there will always be a mapping from the original values).
- For quantitative data you may look up "order preserving encryption", which may not be trivial to do properly (that's one of the reasons why I'd go for the 1st option).
Taking shortcuts (using ECB or a static IV) may in some cases completely break the security of the encrypted data. So unless you really know what you are doing, you may shoot yourself in the foot.
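The hashing option from the list above could look roughly like the sketch below. It is an illustration, not a recommendation of a specific scheme: instead of a plain hash it uses a keyed hash (HMAC-SHA256) from the standard `javax.crypto` API, on the assumption that values such as phone numbers are easy to guess and a secret key makes precomputed lookups harder. The class name `PiiPseudonymizer`, the method name and the demo key are hypothetical. Unlike encryption, the output cannot be reversed.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

final class PiiPseudonymizer {
    // Maps a value to a fixed-length token: same key and value always give the same token.
    static byte[] pseudonymize(byte[] key, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical demo key; a real key would be generated randomly and kept secret.
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        byte[] first = pseudonymize(key, "+49 30 1234567");
        byte[] second = pseudonymize(key, "+49 30 1234567");
        // Both lines print the same Base64 token because the mapping is deterministic.
        System.out.println(Base64.getEncoder().encodeToString(first));
        System.out.println(Base64.getEncoder().encodeToString(second));
    }
}
```

Because the token is the same for equal inputs, it can still serve as a join key or categorical feature for ML, but the original value can only be recovered from a separately stored lookup table.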
> We have decided to use bouncy castle because it supports a large number of encryption modes, including the (apparently fast ECC)
I'd say you don't need the BC library. It is a very well written library, but in your case I don't see any specific need for it.
> (apparently fast ECC). Since we are talking about encrypting several TB a day, it would be nice to have good performance
ECC is still asymmetric encryption, usually used for hybrid encryption (encrypting a symmetric data encryption key). So if you aim for speed, you may check that your JVM and VM allow native AES-NI support, or use some fast cipher (Salsa, ...). Encryption is usually not the performance bottleneck if done properly.
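> I am struggling to find my entrypoint.
In most cases you can use the default Java crypto API with a specified provider:
import java.security.Security;
import javax.crypto.Cipher;
import org.bouncycastle.jce.provider.BouncyCastleProvider;
// Register Bouncy Castle once, then request ciphers from it via the provider name "BC"
Security.addProvider(new BouncyCastleProvider());
...
Cipher cipher = Cipher.getInstance("AES/OFB/NoPadding", "BC");
or
Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding", "BC");
Edit: fixed padding combinations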
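To show how such a `Cipher` instance is typically used end to end, here is a minimal round-trip sketch. It makes several assumptions: it omits the `"BC"` provider argument so that it runs on a plain JDK without the Bouncy Castle jar, it generates a throwaway key, and it prepends a fresh random IV to each ciphertext. The class and method names are hypothetical. Note that this is the "proper" non-deterministic variant: encrypting the same value twice gives different output.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class CbcRoundTrip {
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 16; // AES block size in bytes

    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext so that decrypt() can recover the IV.
    static byte[] encrypt(SecretKey key, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] body = cipher.doFinal(plaintext);
            byte[] message = new byte[IV_LENGTH + body.length];
            System.arraycopy(iv, 0, message, 0, IV_LENGTH);
            System.arraycopy(body, 0, message, IV_LENGTH, body.length);
            return message;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(SecretKey key, byte[] message) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            // The first IV_LENGTH bytes of the message are the IV written by encrypt().
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(message, 0, IV_LENGTH));
            return cipher.doFinal(message, IV_LENGTH, message.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] message = encrypt(key, "ABC".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(key, message), StandardCharsets.UTF_8)); // prints ABC
    }
}
```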
Why did you choose PKCS5Padding, if I may ask?
– pascalwhoop, Apr 5 at 11:23
@pascalwhoop you are right, OFB doesn't need any padding (fixed). Regardless, the important thing is that the easiest way to use Bouncy Castle is the default Java API with the provider parameter. However, for common use BC won't give you any advantage (on the contrary, I am not sure whether BC supports AES-NI; that needs to be checked, it did not about 2 years ago).
– gusto2, Apr 5 at 11:35
answered Apr 1 at 6:54 by gusto2, edited Apr 5 at 11:32