I have a project that requires us to store XML in Azure Blob Storage, and I have problems analysing those files
























Our project requires us to store XML files in Azure Blob Storage. In the backend, we need to analyse the XML files, select the ones that match a filter on the information stored inside them, and finally return the URLs of the matching files.

I have no idea how to achieve this. Could you help me if you have any ideas? Thank you very much.


































  • You will need some form of index for your files and their metadata. This is one of the big advantages of using a document-based service like Cosmos DB. I see a similar question here, and the answers may be helpful: stackoverflow.com/questions/14440506/…

    – Mike Oryszak
    Mar 22 at 20:05











  • You could use the Azure Data Lake Gen2 APIs (docs.microsoft.com/en-us/azure/storage/blobs/…) to analyze the blobs in Azure Blob Storage with the help of an analytics engine such as Hadoop or Spark, provided as part of HDInsight. As part of your analytics job, you would filter the XML files based on their content and write the URLs of the matching files to another blob, an Azure table, or Cosmos DB.

    – Vamshi
    Mar 22 at 21:34
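
To make the index idea from the first comment concrete, here is a minimal sketch, assuming the only field you filter on is the age of a person. It is an in-memory stand-in for a persistent index (Cosmos DB, an Azure table, and so on); the class name `AgeIndexSketch`, its methods, and the example URLs are hypothetical and not part of any Azure SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class AgeIndexSketch {

    // Maps an extracted field value (here: age) to the URLs of the blobs that contain it.
    private final NavigableMap<Integer, List<String>> urlsByAge = new TreeMap<>();

    // Call once per blob, for example right after uploading it, with the value parsed from its XML.
    public void add(int age, String blobUrl) {
        urlsByAge.computeIfAbsent(age, k -> new ArrayList<>()).add(blobUrl);
    }

    // Answers "age >= minAge" from the index alone, without downloading any blob.
    public List<String> atLeast(int minAge) {
        List<String> result = new ArrayList<>();
        for (List<String> urls : urlsByAge.tailMap(minAge, true).values()) {
            result.addAll(urls);
        }
        return result;
    }

    public static void main(String[] args) {
        AgeIndexSketch index = new AgeIndexSketch();
        // Hypothetical URLs, used only for illustration.
        index.add(30, "https://example.invalid/xmls/p1.xml");
        index.add(12, "https://example.invalid/xmls/p2.xml");
        index.add(45, "https://example.invalid/xmls/p3.xml");
        System.out.println(index.atLeast(30));
    }
}
```

The point of the design is that the XML is parsed once, when the blob is written, so a query never has to list and download every blob in the container.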

























xml filter azure-storage














asked Mar 22 at 15:51 by tiefu cai
edited Mar 22 at 15:57 by marc_s

























1 Answer














I created a simple sample that reads the XML files stored in Azure Blob Storage, parses them, and filters them by a condition to output a list of blob URLs. The sample uses the Azure Storage SDK v8.0.0 for Java and jsoup, an HTML parser for Java.

Here are the dependencies of my Maven project.



<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-storage -->
<dependency>
    <groupId>com.microsoft.azure</groupId>
    <artifactId>azure-storage</artifactId>
    <version>8.0.0</version>
</dependency>
<dependency>
    <!-- jsoup HTML parser library @ https://jsoup.org/ -->
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>


The XML content I used in my project looks like the following; there are 6 such files for testing.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
    <name>Peter Pan</name>
    <gender>Male</gender>
    <age>30</age>
</person>


And here is the code.



import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.sql.Date;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

public class FilterXMLFiles {

    private static final String storageConnectionString = "<your storage account connection string>";
    private static final String containerName = "xmls"; // The container that stores the XML files.

    private static CloudBlobClient serviceClient;

    public static void main(String[] args) throws InvalidKeyException, URISyntaxException, StorageException, MalformedURLException, IOException {
        CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
        serviceClient = account.createCloudBlobClient();
        CloudBlobContainer container = serviceClient.getContainerReference(containerName);
        // Generate a SAS token for reading the XML files in the container.
        // Note: this grants every permission over a four-year window; READ and a short expiry would be enough here.
        SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
        policy.setPermissions(EnumSet.allOf(SharedAccessBlobPermissions.class));
        policy.setSharedAccessStartTime(Date.valueOf(LocalDate.now().minusYears(2)));
        policy.setSharedAccessExpiryTime(Date.valueOf(LocalDate.now().plusYears(2)));
        String token = container.generateSharedAccessSignature(policy, null);
        // Get the list of blobs in the container.
        Iterator<ListBlobItem> blobs = container.listBlobs().iterator();
        // Create a list to store the filtered URLs.
        List<String> blobUrls = new ArrayList<>();
        while (blobs.hasNext()) {
            // Build the blob URL with the SAS token appended.
            String uri = blobs.next().getUri().toString();
            String urlWithSAS = String.format("%s?%s", uri, token);
            // System.out.println(urlWithSAS);
            // Download and parse the blob with jsoup, then read the <age> element.
            Document root = Jsoup.parse(new URL(urlWithSAS), 30 * 1000);
            int age = Integer.parseInt(root.selectFirst("age").text());
            if (age >= 30) { // The filter condition: age >= 30
                blobUrls.add(uri);
                // blobUrls.add(urlWithSAS);
            }
        }
        // Print one matching blob URL per line.
        System.out.println(String.join("\n", blobUrls));
    }
}





The result looks like this:



https://<my account name>.blob.core.windows.net/xmls/p1.xml
https://<my account name>.blob.core.windows.net/xmls/p3.xml
https://<my account name>.blob.core.windows.net/xmls/p5.xml


This sample is deliberately simple and only illustrates the idea. In a real application, where filter queries need to be more flexible, I think a better solution is to express the condition in XQuery, much as you would with SQL: for example, use Saxon (a third-party Java library) instead of jsoup and filter with an XQuery expression. For more details about XQuery, you can refer to the XQuery Tutorial and the Saxon documentation.
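
To show what a query-driven filter could look like, here is a rough sketch that passes the condition in as an expression string instead of hard-coding it. Note the substitution: it uses the XPath 1.0 engine built into the JDK (`javax.xml.xpath`), not Saxon or XQuery, so that it runs without extra dependencies. The class name `XPathFilterSketch`, the `filter` helper, and the inline XML strings (which stand in for downloaded blob contents and omit the DOCTYPE line) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

public class XPathFilterSketch {

    // Returns the keys (stand-ins for blob URLs) whose XML satisfies the boolean XPath condition.
    static List<String> filter(Map<String, String> xmlByUrl, String condition) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : xmlByUrl.entrySet()) {
            InputSource source = new InputSource(new StringReader(entry.getValue()));
            try {
                // Evaluate the condition against this document and keep the key if it holds.
                Boolean ok = (Boolean) xpath.evaluate(condition, source, XPathConstants.BOOLEAN);
                if (ok) {
                    matches.add(entry.getKey());
                }
            } catch (XPathExpressionException ex) {
                throw new IllegalArgumentException("Invalid condition or XML for " + entry.getKey(), ex);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Inline XML replaces the blob download so the sketch is self-contained.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("p1.xml", "<person><name>Peter Pan</name><gender>Male</gender><age>30</age></person>");
        docs.put("p2.xml", "<person><name>Wendy</name><gender>Female</gender><age>12</age></person>");
        // The condition is plain data, so it can come from a request parameter or a config file.
        System.out.println(filter(docs, "/person/age >= 30"));
    }
}
```

With Saxon the overall shape would be similar, with an XQuery expression taking the place of the XPath string.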





























  • @tiefucai Any update or concern?

    – Peter Pan
    Apr 2 at 7:26




































1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









0














I created a simple sample to read XML files stored in Azure Blob Storage and parse & filter them by a condition to output a list of blob urls. My sample is using Azure Storage SDK v8.0.0 for Java and a HTML parser jsoup in Java.



Here is the dependencies of my maven project.



<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-storage -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<!-- jsoup HTML parser library @ https://jsoup.org/ -->
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>


The XML content I used in my project is like as below, and there are 6 files for testing.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
<name>Peter Pan</name>
<gender>Male</gender>
<age>30</age>
</person>


And the code is as below.



import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.sql.Date;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

public class FilterXMLFiles

private static final String storageConnectionString = "<your storage account connection string>";
private static final String containerName = "xmls"; // It's my container to store these XML files.

private static CloudBlobClient serviceClient;

public static void main(String[] args) throws InvalidKeyException, URISyntaxException, StorageException, MalformedURLException, IOException
CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
serviceClient = account.createCloudBlobClient();
CloudBlobContainer container = serviceClient.getContainerReference(containerName);
// Generate a SAS token for reading XML files in the container
SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
policy.setPermissions(EnumSet.allOf(SharedAccessBlobPermissions.class));
policy.setSharedAccessStartTime(Date.valueOf(LocalDate.now().minusYears(2)));
policy.setSharedAccessExpiryTime(Date.valueOf(LocalDate.now().plusYears(2)));
String token = container.generateSharedAccessSignature(policy, null);
// Get the list of blobs in the container.
Iterator<ListBlobItem> blobs = container.listBlobs().iterator();
// Create a List object to store these filtered urls.
List<String> blobUrls = new ArrayList<>();
while(blobs.hasNext())
// Get the blob url with SAS token
String uri = blobs.next().getUri().toString();
String urlWithSAS = String.format("%s?%s",uri, token);
// System.out.println(urlWithSAS);
// Parse and filter by jsoup with the condition age >= 30
Document root = Jsoup.parse(new URL(urlWithSAS), 30*1000);
int age = Integer.parseInt(root.selectFirst("age").text());
if(age >= 30) // It's the condition age >=30
blobUrls.add(uri);
// blobUrls.add(urlWithSAS);


System.out.println(String.join("n", blobUrls));





The result looks like this:



https://<my account name>.blob.core.windows.net/xmls/p1.xml
https://<my account name>.blob.core.windows.net/xmls/p3.xml
https://<my account name>.blob.core.windows.net/xmls/p5.xml


The sample is so simple for explaining my idea. Of couse, in a real applicated scenario, considering for filter query flexibility, I think using XQuery like SQL to realize this is a better solution, such as using Saxon (a third party library in Java) instead of jsoup to filter by XQuery Expression as condition. For more details about XQuery, you can refer to Xquery Tutorial and the documents of Saxon.






share|improve this answer























  • @tiefucai Any update or concern?

    – Peter Pan
    Apr 2 at 7:26















0














I created a simple sample to read XML files stored in Azure Blob Storage and parse & filter them by a condition to output a list of blob urls. My sample is using Azure Storage SDK v8.0.0 for Java and a HTML parser jsoup in Java.



Here is the dependencies of my maven project.



<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-storage -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<!-- jsoup HTML parser library @ https://jsoup.org/ -->
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>


The XML content I used in my project is like as below, and there are 6 files for testing.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
<name>Peter Pan</name>
<gender>Male</gender>
<age>30</age>
</person>


And the code is as below.



import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.sql.Date;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

public class FilterXMLFiles

private static final String storageConnectionString = "<your storage account connection string>";
private static final String containerName = "xmls"; // It's my container to store these XML files.

private static CloudBlobClient serviceClient;

public static void main(String[] args) throws InvalidKeyException, URISyntaxException, StorageException, MalformedURLException, IOException
CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
serviceClient = account.createCloudBlobClient();
CloudBlobContainer container = serviceClient.getContainerReference(containerName);
// Generate a SAS token for reading XML files in the container
SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
policy.setPermissions(EnumSet.allOf(SharedAccessBlobPermissions.class));
policy.setSharedAccessStartTime(Date.valueOf(LocalDate.now().minusYears(2)));
policy.setSharedAccessExpiryTime(Date.valueOf(LocalDate.now().plusYears(2)));
String token = container.generateSharedAccessSignature(policy, null);
// Get the list of blobs in the container.
Iterator<ListBlobItem> blobs = container.listBlobs().iterator();
// Create a List object to store these filtered urls.
List<String> blobUrls = new ArrayList<>();
while(blobs.hasNext())
// Get the blob url with SAS token
String uri = blobs.next().getUri().toString();
String urlWithSAS = String.format("%s?%s",uri, token);
// System.out.println(urlWithSAS);
// Parse and filter by jsoup with the condition age >= 30
Document root = Jsoup.parse(new URL(urlWithSAS), 30*1000);
int age = Integer.parseInt(root.selectFirst("age").text());
if(age >= 30) // It's the condition age >=30
blobUrls.add(uri);
// blobUrls.add(urlWithSAS);


System.out.println(String.join("n", blobUrls));





The result looks like this:



https://<my account name>.blob.core.windows.net/xmls/p1.xml
https://<my account name>.blob.core.windows.net/xmls/p3.xml
https://<my account name>.blob.core.windows.net/xmls/p5.xml


The sample is so simple for explaining my idea. Of couse, in a real applicated scenario, considering for filter query flexibility, I think using XQuery like SQL to realize this is a better solution, such as using Saxon (a third party library in Java) instead of jsoup to filter by XQuery Expression as condition. For more details about XQuery, you can refer to Xquery Tutorial and the documents of Saxon.






share|improve this answer























  • @tiefucai Any update or concern?

    – Peter Pan
    Apr 2 at 7:26













0












0








0







I created a simple sample to read XML files stored in Azure Blob Storage and parse & filter them by a condition to output a list of blob urls. My sample is using Azure Storage SDK v8.0.0 for Java and a HTML parser jsoup in Java.



Here is the dependencies of my maven project.



<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-storage -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<!-- jsoup HTML parser library @ https://jsoup.org/ -->
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>


The XML content I used in my project is like as below, and there are 6 files for testing.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
<name>Peter Pan</name>
<gender>Male</gender>
<age>30</age>
</person>


And the code is as below.



import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.sql.Date;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

public class FilterXMLFiles

private static final String storageConnectionString = "<your storage account connection string>";
private static final String containerName = "xmls"; // It's my container to store these XML files.

private static CloudBlobClient serviceClient;

public static void main(String[] args) throws InvalidKeyException, URISyntaxException, StorageException, MalformedURLException, IOException
CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
serviceClient = account.createCloudBlobClient();
CloudBlobContainer container = serviceClient.getContainerReference(containerName);
// Generate a SAS token for reading XML files in the container
SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
policy.setPermissions(EnumSet.allOf(SharedAccessBlobPermissions.class));
policy.setSharedAccessStartTime(Date.valueOf(LocalDate.now().minusYears(2)));
policy.setSharedAccessExpiryTime(Date.valueOf(LocalDate.now().plusYears(2)));
String token = container.generateSharedAccessSignature(policy, null);
// Get the list of blobs in the container.
Iterator<ListBlobItem> blobs = container.listBlobs().iterator();
// Create a List object to store these filtered urls.
List<String> blobUrls = new ArrayList<>();
while(blobs.hasNext())
// Get the blob url with SAS token
String uri = blobs.next().getUri().toString();
String urlWithSAS = String.format("%s?%s",uri, token);
// System.out.println(urlWithSAS);
// Parse and filter by jsoup with the condition age >= 30
Document root = Jsoup.parse(new URL(urlWithSAS), 30*1000);
int age = Integer.parseInt(root.selectFirst("age").text());
if(age >= 30) // It's the condition age >=30
blobUrls.add(uri);
// blobUrls.add(urlWithSAS);


System.out.println(String.join("n", blobUrls));





The result looks like this:



https://<my account name>.blob.core.windows.net/xmls/p1.xml
https://<my account name>.blob.core.windows.net/xmls/p3.xml
https://<my account name>.blob.core.windows.net/xmls/p5.xml


The sample is so simple for explaining my idea. Of couse, in a real applicated scenario, considering for filter query flexibility, I think using XQuery like SQL to realize this is a better solution, such as using Saxon (a third party library in Java) instead of jsoup to filter by XQuery Expression as condition. For more details about XQuery, you can refer to Xquery Tutorial and the documents of Saxon.






share|improve this answer













I created a simple sample to read XML files stored in Azure Blob Storage and parse & filter them by a condition to output a list of blob urls. My sample is using Azure Storage SDK v8.0.0 for Java and a HTML parser jsoup in Java.



Here is the dependencies of my maven project.



<!-- https://mvnrepository.com/artifact/com.microsoft.azure/azure-storage -->
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-storage</artifactId>
<version>8.0.0</version>
</dependency>
<dependency>
<!-- jsoup HTML parser library @ https://jsoup.org/ -->
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.11.3</version>
</dependency>


The XML content I used in my project is like as below, and there are 6 files for testing.



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE person SYSTEM "person.dtd">
<person>
<name>Peter Pan</name>
<gender>Male</gender>
<age>30</age>
</person>


And the code is as below.



import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.InvalidKeyException;
import java.sql.Date;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.StorageException;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;
import com.microsoft.azure.storage.blob.SharedAccessBlobPermissions;
import com.microsoft.azure.storage.blob.SharedAccessBlobPolicy;

public class FilterXMLFiles {

    private static final String storageConnectionString = "<your storage account connection string>";
    private static final String containerName = "xmls"; // The container that stores the XML files.

    private static CloudBlobClient serviceClient;

    public static void main(String[] args) throws InvalidKeyException, URISyntaxException, StorageException, MalformedURLException, IOException {
        CloudStorageAccount account = CloudStorageAccount.parse(storageConnectionString);
        serviceClient = account.createCloudBlobClient();
        CloudBlobContainer container = serviceClient.getContainerReference(containerName);
        // Generate a SAS token for reading the XML files in the container.
        // Note: this grants every permission; READ alone should be enough for downloading.
        SharedAccessBlobPolicy policy = new SharedAccessBlobPolicy();
        policy.setPermissions(EnumSet.allOf(SharedAccessBlobPermissions.class));
        policy.setSharedAccessStartTime(Date.valueOf(LocalDate.now().minusYears(2)));
        policy.setSharedAccessExpiryTime(Date.valueOf(LocalDate.now().plusYears(2)));
        String token = container.generateSharedAccessSignature(policy, null);
        // Get the list of blobs in the container.
        Iterator<ListBlobItem> blobs = container.listBlobs().iterator();
        // Create a list to store the URLs that pass the filter.
        List<String> blobUrls = new ArrayList<>();
        while (blobs.hasNext()) {
            // Build the blob URL with the SAS token appended.
            String uri = blobs.next().getUri().toString();
            String urlWithSAS = String.format("%s?%s", uri, token);
            // System.out.println(urlWithSAS);
            // Download and parse with jsoup (30-second timeout), then filter on age >= 30.
            Document root = Jsoup.parse(new URL(urlWithSAS), 30 * 1000);
            int age = Integer.parseInt(root.selectFirst("age").text());
            if (age >= 30) { // The filter condition: age >= 30
                blobUrls.add(uri);
                // blobUrls.add(urlWithSAS);
            }
        }
        // Print one URL per line.
        System.out.println(String.join("\n", blobUrls));
    }
}

The result looks like this:



https://<my account name>.blob.core.windows.net/xmls/p1.xml
https://<my account name>.blob.core.windows.net/xmls/p3.xml
https://<my account name>.blob.core.windows.net/xmls/p5.xml


This sample is deliberately simple, just to explain the idea. In a real application, where filter queries need to be flexible, I think a better solution is to use XQuery, which plays a role for XML similar to that of SQL for relational data. For example, you could use Saxon (a third-party Java library) instead of jsoup and pass the filter condition as an XQuery expression. For more details about XQuery, you can refer to the XQuery Tutorial and the Saxon documentation.
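To illustrate what "the condition as a query expression" could look like, here is a minimal sketch. It does not use Saxon or XQuery: it swaps in the XPath 1.0 engine that ships with the JDK (javax.xml.xpath), so it runs without extra dependencies. The class name XPathFilterSketch, the filter helper, the in-memory sample documents, and the <person>/<age> element names are all assumptions made for illustration; in the real scenario the XML text would come from the blobs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical helper: filters XML documents with a boolean XPath 1.0 condition.
class XPathFilterSketch {

    // Returns the names of the documents for which the condition evaluates to true.
    static List<String> filter(Map<String, String> xmlByName, String condition) {
        List<String> matches = new ArrayList<>();
        try {
            // Compile the condition once and reuse it for every document.
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(condition);
            for (Map.Entry<String, String> entry : xmlByName.entrySet()) {
                InputSource source = new InputSource(new StringReader(entry.getValue()));
                Boolean keep = (Boolean) expr.evaluate(source, XPathConstants.BOOLEAN);
                if (keep) {
                    matches.add(entry.getKey());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath expression or XML input", e);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample documents (assumed structure); stand-ins for the blob contents.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("p1.xml", "<person><name>A</name><age>30</age></person>");
        docs.put("p2.xml", "<person><name>B</name><age>25</age></person>");
        docs.put("p3.xml", "<person><name>C</name><age>41</age></person>");

        // The filter is now plain data, so it can change without touching the code.
        System.out.println(String.join("\n", filter(docs, "//age >= 30")));
    }
}
```

Because the condition is just a string, it could be read from configuration or user input. A Saxon-based version would follow the same shape, with an XQuery expression in place of the XPath one.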







answered Mar 26 at 6:01









Peter Pan













  • @tiefucai Any update or concern?

    – Peter Pan
    Apr 2 at 7:26



































