Dajbych.net


Working with Azure Batch


Azure Batch is a practical and highly customizable service. It is one of the Azure Compute services, designed for compute-intensive or data-intensive tasks. Combined with Azure Storage, it forms a pair of workhorses that can handle unusual workloads. Setup is easy, and the programming interface is intuitive to code against. There is even an API for retrieving files from the working directory of your application.

What workloads Azure Batch is good for

Azure Batch is good for data transformation. It is ideal when you need a large amount of computing power for a short period of time. I probably would not recommend Azure Batch for high-availability scenarios; Service Fabric serves those much better. I don’t even know how to apply OS updates to Azure Batch compute nodes, because I never needed to: I allocated over a hundred virtual machines, computed what I needed, and deallocated them when the job was done.

Where input and output are stored

Accessing a single database from hundreds of machines at the same time is inefficient. Azure Batch is built to read data from and write data to Azure Blob Storage, so you don’t have to write this logic yourself. Azure Batch downloads the input files from storage to the compute node before a task starts. When the task finishes, it uploads the output files back to storage automatically (if a blob with the same name already exists, it is overwritten).
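To make the input/output flow concrete, here is a minimal sketch of a task definition for a single task, written as a plain Python dictionary. The field names follow the task body of the Batch REST API as I understand it (`resourceFiles`, `outputFiles`), so check them against the current API reference before relying on them. The storage account, the container URLs, the `<sas>` placeholders, and `transform.exe` are hypothetical.

```python
import json


def build_task(task_id, command_line, inputs, output_pattern, output_container_url):
    """Return a dict shaped like a Batch REST API task body (assumed field names)."""
    return {
        "id": task_id,
        "commandLine": command_line,
        # Files Batch downloads to the compute node before the command runs.
        "resourceFiles": [
            {"httpUrl": url, "filePath": path} for url, path in inputs
        ],
        # Files Batch uploads to Blob Storage after the command has ended.
        "outputFiles": [
            {
                "filePattern": output_pattern,
                "destination": {"container": {"containerUrl": output_container_url}},
                # Upload only when the command exits successfully.
                "uploadOptions": {"uploadCondition": "tasksuccess"},
            }
        ],
    }


# Hypothetical storage account, containers, and executable.
task = build_task(
    "transform-001",
    "cmd /c transform.exe input.csv output.csv",
    [("https://example.blob.core.windows.net/input/input.csv?<sas>", "input.csv")],
    "output.csv",
    "https://example.blob.core.windows.net/output?<sas>",
)
print(json.dumps(task, indent=2))
```

The application itself only reads `input.csv` and writes `output.csv` in its working directory; the transfer to and from Blob Storage is declared, not coded.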

How to code the logic that is being executed

The business logic must be packaged as a traditional executable (.exe) application, which is great because you are not restricted by any sandbox. The only limit is the boundary of the virtual machine. In other words, the application can even run with administrator privileges.

Application deployment & updates

The application is stored as a single ZIP archive in Azure Storage. From there, it is deployed to every compute node after the node is allocated. You can easily update the app, but a compute node must be restarted before the updated application is deployed to it automatically. You are not limited to a single application (although you cannot exceed 20 applications), so you can separate application dependencies into individual packages that can be updated independently.
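As a rough illustration, the sketch below builds a fragment of a pool definition that references two application packages, plus a command line that starts an executable from one of them. It is not a complete pool body. It assumes the `applicationPackageReferences` field of the Batch REST API and the Windows environment variable convention `AZ_BATCH_APP_PACKAGE_<id>#<version>` for the unpacked package folder; verify both against the official documentation. The package names, versions, and `transform.exe` are hypothetical.

```python
def pool_package_fragment(pool_id, packages):
    """Return the part of a pool definition that lists its application packages."""
    return {
        "id": pool_id,
        # Each referenced ZIP is unpacked on every node that joins the pool.
        "applicationPackageReferences": [
            {"applicationId": app_id, "version": version}
            for app_id, version in packages
        ],
    }


def package_command(app_id, version, exe_name, args=""):
    """Build a Windows command line that runs an .exe from an application package."""
    # Assumed convention: the unpacked folder is exposed through an
    # environment variable named AZ_BATCH_APP_PACKAGE_<id>#<version>.
    env_var = f"AZ_BATCH_APP_PACKAGE_{app_id}#{version}"
    return f"cmd /c %{env_var}%\\{exe_name} {args}".rstrip()


# Hypothetical packages: the application and its dependencies, versioned separately.
pool = pool_package_fragment("pool-a", [("transform", "1.0"), ("nativelibs", "2.3")])
command = package_command("transform", "1.0", "transform.exe", "input.csv output.csv")
```

Keeping the dependencies in their own package means a new version of `transform` can be uploaded without touching `nativelibs`.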

Organization

Virtual machine nodes are grouped into pools, and one Azure Batch account can contain multiple pools. Tasks of the same kind are grouped into jobs, and every job must be assigned to a pool. A task consists of two important parts: a unique name and a command that is executed on the command line and calls one of your applications on the compute node. A task can also declare dependencies on other tasks, which must all complete before it is started. Azure Batch itself is an orchestration service that holds the list of your jobs and takes care of the compute nodes in your pools.
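The relationships described above can be modeled in a few lines of plain Python. This is only an illustrative sketch using the standard library (`graphlib` requires Python 3.9 or later), not the Batch SDK; the `Task` and `Job` classes, the job and pool names, and the command lines are all hypothetical. It shows how declared dependencies constrain the order in which tasks may start.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Task:
    name: str            # unique within the job
    command_line: str    # command executed on the compute node
    depends_on: list = field(default_factory=list)  # names of prerequisite tasks


@dataclass
class Job:
    name: str
    pool: str            # the pool whose nodes run this job's tasks
    tasks: list = field(default_factory=list)

    def execution_order(self):
        """Return one valid order in which the tasks could be started."""
        graph = {t.name: set(t.depends_on) for t in self.tasks}
        return list(TopologicalSorter(graph).static_order())


job = Job("nightly-transform", pool="pool-a", tasks=[
    Task("split", "cmd /c split.exe"),
    Task("convert-1", "cmd /c convert.exe part1", depends_on=["split"]),
    Task("convert-2", "cmd /c convert.exe part2", depends_on=["split"]),
    Task("merge", "cmd /c merge.exe", depends_on=["convert-1", "convert-2"]),
])
order = job.execution_order()
print(order)
```

Here `split` always comes first and `merge` last, while the two `convert` tasks have no dependency on each other and could run in parallel on different nodes.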

Management

You can manage Azure Batch manually in the Azure Portal, with the Azure CLI, or in Batch Explorer. You can also manage it programmatically in Python or in .NET (Core) using the Azure.Batch NuGet package. A new pool can be created only when you authenticate with Active Directory credentials for the Batch account. A pool can be scaled manually or automatically by a custom script that has an overview of the pending tasks.
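The scaling decision such a script makes can be sketched as a small function. This is plain Python for illustration, not the syntax Azure Batch uses for its own scaling scripts; the function name, the parameters, and the limits are assumptions chosen for the example.

```python
import math


def target_node_count(pending_tasks, tasks_per_node=1, min_nodes=0, max_nodes=100):
    """Pick a pool size from the number of pending tasks, within fixed bounds."""
    # Enough nodes to run every pending task at once...
    needed = math.ceil(pending_tasks / tasks_per_node)
    # ...but never below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, needed))


# An empty queue lets the pool shrink to its minimum size.
idle = target_node_count(0)
# A large backlog is capped at the maximum pool size.
busy = target_node_count(250)
```

Setting `min_nodes` to zero matches the usage pattern described earlier: allocate many machines for a burst of work and release them all when the queue is empty.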

How to start experimenting with Azure Batch