How to encrypt data with bouncy castle while ensuring the result is deterministic















Problem



We want to encrypt personally identifiable information (PII) so that it is not readable. However, because the results will also be used for machine learning, each time a value (say "ABC") gets encrypted, the resulting data should be the same.



Most cipher modes of operation use an initialization vector (IV), which goes against what we need. To be clear, the data is supposed to be encrypted, yet this doesn't need to be bulletproof. The data is never transferred outside of the organization, and this is simply done to adhere to GDPR.



Context



We have decided to use Bouncy Castle because it supports a large number of encryption modes, including the (apparently fast) ECC. Since we are talking about encrypting several TB a day, it would be nice to have good performance.



Solution issues



Although the Bouncy Castle library is well written, it seems difficult to find good documentation and usage examples for it. I am struggling to find my entry point. Do I have to look at the org.bouncycastle.crypto or org.bouncycastle.crypto.engines package? Or crypto.ec? I found the ZeroBytePadding class, which I believe should point me to a potential engine that does what I want, but I cannot find what I am looking for.



Goal



A class that has a set of methods similar to this:



// Desired interface only; the bodies are intentionally left unimplemented.
object Anonomyzer {
  def initialize(publicKey: String, privateKey: String): Unit = ???
  def encode(data: Array[Byte]): Array[Byte] = ???
  def decode(data: Array[Byte]): Array[Byte] = ???
}



The following check should then evaluate to true:



Anonomyzer.initialize("PUBLIC", "PRIVATE")
val once = Anonomyzer.encode(data)
val twice = Anonomyzer.encode(data)
Arrays.equals(once, twice)


Edit: I've read more on this and found that what I am looking for is called Electronic Codebook (ECB) mode of operation. Although this is not perfectly secure, it is the best we can hope for, AFAIK.




































  • Your requirements conflict with each other, and you don't seem to understand the difference between symmetric and asymmetric cryptography. Encryption is not necessarily equal to security, and encrypting identical plaintexts to identical ciphertexts is a security issue.

    – James K Polk
    Mar 28 at 16:55











  • They are not really my requirements, but our client's requirements. I'd be happy to improve this question to "be correct" if you help me. The use case is rather common, I believe: encrypt personal data, yet still ensure that a phone number, for example, results in the same string on the output. I am aware of potential rainbow table attacks.

    – pascalwhoop
    Mar 28 at 17:15











  • You don't need to "encrypt" using security libraries like Bouncy Castle. Just calculate a hash of the personal data; it's irreversible and deterministic, without using an initialization vector.

    – Markus Appel
    Mar 29 at 15:23












  • Well, we do need to reverse the data, unfortunately. Once the derived models have been calculated, the predicted customer information gets fed into mailing campaigns etc., so the data needs to be decryptable. I wish it was different, but that's what the customer asks for.

    – pascalwhoop
    Apr 1 at 13:56
























java scala encryption bouncycastle














edited Mar 28 at 17:49







pascalwhoop

















asked Mar 28 at 16:14









pascalwhoop












1 Answer

















However, because the results will also be used for machine learning, each time a value (say "ABC") gets encrypted, the resulting data should be the same




You may have more options than that. It is still safer to properly encrypt data where it needs to be encrypted. You may have different datasets for different purposes.



Just suggestions:



  • You may anonymize the learning dataset by stripping the PII from the data and aggregating it to a reasonable level that is still valuable for ML. I'd prefer this option because it's clean, with no risk of breaching any rules or leaking protected information.

  • You may hash PII (or categorical data), which provides a unique mapping that is not reversible (though there will always be a mapping from the original values).

  • For quantitative data, you may look up "order-preserving encryption", which may not be trivial to do properly (that's one of the reasons why I'd go for the first option).

Taking shortcuts (using ECB or a static IV) may in some cases completely break the security of the encrypted data. So unless you really know what you are doing, you may shoot yourself in the foot.
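
The hashing option from the list above could be sketched roughly as follows. Note that this swaps the plain hash for a keyed hash (HMAC-SHA256 from the standard JDK), which is an assumption of this example rather than something stated above; the idea is that guessing inputs gets harder without the key. The class name `Pseudonymizer` and the demo secret are hypothetical, and the result is not reversible, so it only fits the part of the pipeline that never needs the original value back.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
public class Pseudonymizer {
    private final SecretKeySpec key;

    public Pseudonymizer(byte[] secret) {
        // The secret must be stored safely; whoever holds it can recompute tokens.
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Same key and same input always produce the same 32-byte token.
    public byte[] pseudonymize(byte[] data) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(key);
        return mac.doFinal(data);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // Demo secret only (assumption); use a randomly generated key in practice.
        Pseudonymizer p = new Pseudonymizer(
                "demo-secret-change-me".getBytes(StandardCharsets.UTF_8));
        byte[] once = p.pseudonymize("ABC".getBytes(StandardCharsets.UTF_8));
        byte[] twice = p.pseudonymize("ABC".getBytes(StandardCharsets.UTF_8));
        System.out.println(Arrays.equals(once, twice)); // prints "true"
    }
}
```

Because equal inputs still map to equal tokens, frequency analysis on low-entropy fields (phone numbers, postcodes) remains possible, so this is a sketch of the mechanism, not a compliance guarantee.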




We have decided to use bouncy castle because it supports a large number of encryption modes, including the (apparently fast ECC)




I'd say you don't need the BC library. It is a very well-written library, but in your case I don't see any specific need for it.




apparently fast ECC). Since we are talking about encrypting several TB a day, it would be nice to have good performance




ECC is still asymmetric encryption, usually used for hybrid encryption (encrypting a symmetric data encryption key). So if you aim for speed, you may check that your JVM and VM allow native AES-NI support, or use a fast stream cipher (Salsa20, ...). Encryption is usually not the performance bottleneck if done properly.




I am struggling to find my entrypoint.




In most cases you can use the default Java crypto API with a specified provider:



Security.addProvider(new BouncyCastleProvider());
...
Cipher cipher = Cipher.getInstance("AES/OFB/NoPadding", "BC");


or



Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding", "BC");


Edit: fixed padding combinations
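
For the deterministic and reversible behaviour the question asks for, a minimal sketch using the same `Cipher` entry point might look like the following. It assumes a symmetric AES key and ECB mode, purely because ECB needs no IV; as warned above, ECB reveals which plaintext blocks are equal, so treat this as an illustration of the API and not as a recommendation. The class name `DeterministicAes` and the demo key are hypothetical. The sketch uses the JDK's default provider so it runs without extra jars; after registering `BouncyCastleProvider`, passing `"BC"` as the second argument to `Cipher.getInstance` should select Bouncy Castle instead.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of any library.
public class DeterministicAes {
    // ECB has no IV, so equal plaintexts give equal ciphertexts under one key.
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";
    private final SecretKeySpec key;

    public DeterministicAes(byte[] rawKey) {
        // rawKey must be 16, 24 or 32 bytes long for AES.
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    public byte[] encode(byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(data);
    }

    public byte[] decode(byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key);
        return cipher.doFinal(data);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        // 16-byte demo key (assumption); load a real key from a key store instead.
        DeterministicAes aes = new DeterministicAes(
                "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        byte[] data = "ABC".getBytes(StandardCharsets.UTF_8);
        byte[] once = aes.encode(data);
        byte[] twice = aes.encode(data);
        System.out.println(Arrays.equals(once, twice)); // prints "true"
        System.out.println(new String(aes.decode(once), StandardCharsets.UTF_8)); // prints "ABC"
    }
}
```

A new `Cipher` instance is created per call because `Cipher` objects are not thread-safe; for several TB a day it may be worth reusing one instance per worker thread instead.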

































  • Why did you choose PKCS5Padding if I may ask?

    – pascalwhoop
    Apr 5 at 11:23











  • @pascalwhoop you are right, OFB doesn't need any padding (fixed). Regardless of that, the important thing is that the easiest way to use BouncyCastle is the default Java API with the provider parameter. However, for common use BC won't give you any advantage (on the contrary, I am not sure whether BC supports AES-NI; it needs to be checked, and it did not about 2 years ago).

    – gusto2
    Apr 5 at 11:35






















edited Apr 5 at 11:32

























answered Apr 1 at 6:54









gusto2

5,869 reputation; 2 gold badges, 10 silver badges, 23 bronze badges















  • Why did you choose PKCS5Padding if I may ask?

    – pascalwhoop
    Apr 5 at 11:23











  • @pascalwhoop you are right, OFB doesn't need any padding (fixed). Regardless of that, the important thing is: to use BouncyCastle, the easiest option is just to use the default Java API with the provider parameter. However, for common use BC won't give you any advantage (on the contrary, I am not sure whether BC supports AES-NI; that needs to be checked, it did not about 2 years ago)

    – gusto2
    Apr 5 at 11:35




































