Reference Guide  2.5.0
psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans Class Reference

Public Member Functions

def name(self)
def validate(self, node, options=None)
def apply(self, node, options=None)

Detailed Description

Switches on/off the generation of an OpenCL PSy layer for a given
InvokeSchedule. Additionally, it will generate OpenCL kernels for
each of the kernels referenced by the Invoke. For example:

>>> from psyclone.parse.algorithm import parse
>>> from psyclone.psyGen import PSyFactory
>>> from psyclone.domain.gocean.transformations.gocean_opencl_trans import GOOpenCLTrans
>>> API = "gocean1.0"
>>> FILENAME = "shallow_alg.f90" # examples/gocean/eg1
>>> ast, invoke_info = parse(FILENAME, api=API)
>>> psy = PSyFactory(API, distributed_memory=False).create(invoke_info)
>>> schedule = psy.invokes.get('invoke_0').schedule
>>> ocl_trans = GOOpenCLTrans()
>>> ocl_trans.apply(schedule)
>>> print(schedule.view())
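
The options documented for apply() can be supplied in the same call. The following is a minimal sketch, assuming a hypothetical helper `apply_opencl` (not part of PSyclone) that gathers the three documented flags into a dictionary. `RecordingTrans` is a stand-in so the sketch runs without PSyclone installed; in real use `trans` would be a GOOpenCLTrans instance and `schedule` the schedule obtained above.

```python
def apply_opencl(trans, schedule, enable_profiling=False,
                 out_of_order=False, end_barrier=True):
    """Hypothetical helper (not part of PSyclone): collect the three
    documented flags into an options dict and pass it to trans.apply()."""
    options = {
        "enable_profiling": enable_profiling,
        "out_of_order": out_of_order,
        "end_barrier": end_barrier,
    }
    # apply() validates the schedule and the options before transforming.
    trans.apply(schedule, options)
    return options


class RecordingTrans:
    """Stand-in for GOOpenCLTrans so the sketch runs without PSyclone."""

    def __init__(self):
        self.calls = []

    def apply(self, node, options=None):
        # Record what would have been transformed, and with which options.
        self.calls.append((node, dict(options or {})))


stub = RecordingTrans()
used = apply_opencl(stub, "invoke_0 schedule", enable_profiling=True)
```

Passing every flag explicitly keeps the OpenCL environment settings visible at the call site, which matters because all transformed invokes must agree on `enable_profiling` and `out_of_order`.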

Definition at line 60 of file gocean_opencl_trans.py.

Member Function Documentation

◆ apply()

def psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans.apply(self, node, options=None)
Apply the OpenCL transformation to the supplied GOInvokeSchedule. This
causes PSyclone to generate an OpenCL version of the corresponding
PSy-layer routine. The generated code makes use of the FortCL
library (https://github.com/stfc/FortCL) in order to manage the
OpenCL device directly from Fortran.

:param node: the InvokeSchedule to transform.
:type node: :py:class:`psyclone.psyGen.GOInvokeSchedule`
:param options: set of options to tune the OpenCL generation.
:type options: dict of str:value or None
:param bool options["enable_profiling"]: whether or not to set up the \
        OpenCL environment with the profiling option enabled.
:param bool options["out_of_order"]: whether or not to set up the \
        OpenCL environment with the out_of_order option enabled.
:param bool options["end_barrier"]: whether or not to add an OpenCL \
        barrier at the end of the transformed invoke.
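
How these options take effect can be sketched outside PSyclone. The function below is a hypothetical illustration (`resolve_options` is not a PSyclone routine) of the merging done at the start of apply(): a missing dictionary is treated as empty, `enable_profiling` and `out_of_order` only change when given, and `end_barrier` defaults to True. The initial value False for the two stored settings is an assumption, since the class attributes are not shown on this page.

```python
def resolve_options(options, enable_profiling=False, out_of_order=False):
    """Hypothetical sketch of the option handling in apply().

    The keyword arguments stand for the settings already stored on the
    transformation; their default of False is an assumption.
    Returns (enable_profiling, out_of_order, end_barrier).
    """
    if not options:
        options = {}
    # Stored settings are only overwritten when the key is present.
    if "enable_profiling" in options:
        enable_profiling = options["enable_profiling"]
    if "out_of_order" in options:
        out_of_order = options["out_of_order"]
    # The end barrier is added unless explicitly switched off.
    end_barrier = options.get("end_barrier", True)
    return enable_profiling, out_of_order, end_barrier


defaults = resolve_options(None)
custom = resolve_options({"out_of_order": True, "end_barrier": False})
```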

Reimplemented from psyclone.psyGen.Transformation.

Definition at line 222 of file gocean_opencl_trans.py.

222  def apply(self, node, options=None):
223      '''
224      Apply the OpenCL transformation to the supplied GOInvokeSchedule. This
225      causes PSyclone to generate an OpenCL version of the corresponding
226      PSy-layer routine. The generated code makes use of the FortCL
227      library (https://github.com/stfc/FortCL) in order to manage the
228      OpenCL device directly from Fortran.
229
230      :param node: the InvokeSchedule to transform.
231      :type node: :py:class:`psyclone.psyGen.GOInvokeSchedule`
232      :param options: set of options to tune the OpenCL generation.
233      :type options: dict of str:value or None
234      :param bool options["enable_profiling"]: whether or not to set up the \
235              OpenCL environment with the profiling option enabled.
236      :param bool options["out_of_order"]: whether or not to set up the \
237              OpenCL environment with the out_of_order option enabled.
238      :param bool options["end_barrier"]: whether or not to add an OpenCL \
239              barrier at the end of the transformed invoke.
240
241      '''
242      if not options:
243          options = {}
244
245      self.validate(node, options)
246      api_config = Config.get().api_conf("gocean1.0")
247
248      # Update class attributes
249      if 'enable_profiling' in options:
250          self._enable_profiling = options['enable_profiling']
251
252      if 'out_of_order' in options:
253          self._out_of_order = options['out_of_order']
254
255      self._transformed_invokes += 1
256
257      # Get end_barrier option
258      end_barrier = options.get('end_barrier', True)
259
260      # Update the maximum value that the queue_number can have.
261      for kernel in node.coded_kernels():
262          self._max_queue_number = max(self._max_queue_number,
263                                       kernel.opencl_options["queue_number"])
264
265      # Insert, if they don't already exist, the necessary OpenCL helper
266      # subroutines in the root Container.
267      psy_init = self._insert_opencl_init_routine(node.root)
268      init_grid = self._insert_initialise_grid_buffers(node.root)
269      write_grid_buf = self._insert_write_grid_buffers(node.root)
270      self._insert_ocl_read_from_device_function(node.root)
271      self._insert_ocl_write_to_device_function(node.root)
272      init_buf = self._insert_ocl_initialise_buffer(node.root)
273
274      for kern in node.coded_kernels():
275          self._insert_ocl_arg_setter_routine(node.root, kern)
276
277      # Insert fortcl, clfortran and iso_c_binding import statements
278      fortcl = ContainerSymbol("fortcl")
279      node.symbol_table.add(fortcl)
280      get_num_cmd_queues = RoutineSymbol(
281          "get_num_cmd_queues", interface=ImportInterface(fortcl))
282      get_cmd_queues = RoutineSymbol(
283          "get_cmd_queues", interface=ImportInterface(fortcl))
284      get_kernel_by_name = RoutineSymbol(
285          "get_kernel_by_name", interface=ImportInterface(fortcl))
286      node.symbol_table.add(get_num_cmd_queues)
287      node.symbol_table.add(get_cmd_queues)
288      node.symbol_table.add(get_kernel_by_name)
289      clfortran = ContainerSymbol("clfortran")
290      node.symbol_table.add(clfortran)
291      cl_finish = RoutineSymbol(
292          "clFinish", interface=ImportInterface(clfortran))
293      cl_launch = RoutineSymbol(
294          "clEnqueueNDRangeKernel",
295          interface=ImportInterface(clfortran))
296      node.symbol_table.add(cl_finish)
297      node.symbol_table.add(cl_launch)
298      iso_c_binding = ContainerSymbol("iso_c_binding")
299      node.symbol_table.add(iso_c_binding)
300      c_loc = RoutineSymbol(
301          "C_LOC", interface=ImportInterface(iso_c_binding))
302      c_null = DataSymbol(
303          "C_NULL_PTR", datatype=INTEGER_TYPE,
304          interface=ImportInterface(iso_c_binding))
305      node.symbol_table.add(c_loc)
306      node.symbol_table.add(c_null)
307
308      # Include the check_status subroutine if we are in debug_mode
309      if api_config.debug_mode:
310          ocl_utils = ContainerSymbol("ocl_utils_mod")
311          check_status = RoutineSymbol(
312              "check_status", interface=ImportInterface(ocl_utils))
313          node.symbol_table.add(ocl_utils)
314          node.symbol_table.add(check_status)
315
316      # Declare local variables needed by an OpenCL PSy-layer invoke
317      qlist = node.symbol_table.new_symbol(
318          "cmd_queues", symbol_type=DataSymbol,
319          datatype=UnsupportedFortranType(
320              "integer(kind=c_intptr_t), pointer, save :: cmd_queues(:)"),
321          tag="opencl_cmd_queues")
322      # 'first_time' needs to be an UnsupportedFortranType because it has
323      # the SAVE attribute and an initial value
324      first = DataSymbol("first_time",
325                         datatype=UnsupportedFortranType(
326                             "logical, save :: first_time = .true."))
327      node.symbol_table.add(first, tag="first_time")
328      flag = node.symbol_table.new_symbol(
329          "ierr", symbol_type=DataSymbol, datatype=INTEGER_TYPE,
330          tag="opencl_error")
331      global_size = node.symbol_table.new_symbol(
332          "globalsize", symbol_type=DataSymbol,
333          datatype=UnsupportedFortranType(
334              "integer(kind=c_size_t), target :: globalsize(2)"))
335      local_size = node.symbol_table.new_symbol(
336          "localsize", symbol_type=DataSymbol,
337          datatype=UnsupportedFortranType(
338              "integer(kind=c_size_t), target :: localsize(2)"))
339
340      # Bring all the boundaries at the beginning (since we are going to
341      # use them during the setup block - and they don't change)
342      boundary_vars = []
343      for tag, symbol in node.symbol_table.tags_dict.items():
344          if tag.startswith(("xstart_", "xstop_", "ystart_", "ystop_")):
345              boundary_vars.append(symbol)
346      cursor = 0
347      for assignment in node.walk(Assignment):
348          if assignment.lhs.symbol in boundary_vars:
349              node.children.insert(cursor, assignment.detach())
350              cursor += 1
351
352      # Create block of code to execute only the first time:
353      setup_block = IfBlock.create(Reference(first), [])
354      setup_block.preceding_comment = \
355          "Initialise OpenCL runtime, kernels and buffers"
356      node.children.insert(cursor, setup_block)
357      setup_block.if_body.addchild(Call.create(psy_init, []))
358
359      # Set up cmd_queues pointer
360      ptree = Fortran2003.Pointer_Assignment_Stmt(
361          f"{qlist.name} => {get_cmd_queues.name}()")
362      cblock = CodeBlock([ptree], CodeBlock.Structure.STATEMENT)
363      setup_block.if_body.addchild(cblock)
364
365      # Declare and assign kernel pointers
366      for kern in node.coded_kernels():
367          name = "kernel_" + kern.name
368          try:
369              kpointer = node.symbol_table.lookup_with_tag(name)
370          except KeyError:
371              pointer_type = UnsupportedFortranType(
372                  "INTEGER(KIND=c_intptr_t), TARGET, SAVE :: " + name)
373              kpointer = DataSymbol(name, datatype=pointer_type)
374              node.symbol_table.add(kpointer, tag=name)
375          setup_block.if_body.addchild(
376              Assignment.create(
377                  Reference(kpointer),
378                  Call.create(get_kernel_by_name,
379                              [Literal(kern.name, CHARACTER_TYPE)])))
380
381      # Traverse all arguments and make sure all the buffers are initialised
382      initialised_fields = set()
383      there_is_a_grid_buffer = False
384      for kern in node.coded_kernels():
385          for arg in kern.arguments.args:
386              if arg.argument_type == "field":
387                  field = node.symbol_table.lookup(arg.name)
388                  if field not in initialised_fields:
389                      # Call the init_buffer routine with this field
390                      call = Call.create(init_buf, [Reference(field)])
391                      setup_block.if_body.addchild(call)
392                      initialised_fields.add(field)
393              elif (arg.argument_type == "grid_property" and
394                    not arg.is_scalar):
395                  if not there_is_a_grid_buffer:
396                      # Call the grid init_buffer routine
397                      field = node.symbol_table.lookup(
398                          kern.arguments.find_grid_access().name)
399                      call = Call.create(init_grid, [Reference(field)])
400                      setup_block.if_body.addchild(call)
401                      there_is_a_grid_buffer = True
402              if not arg.is_scalar:
403                  # All buffers will be assigned to a local OpenCL memory
404                  # object to easily reference them, make sure this local
405                  # variable is declared in the Invoke.
406                  name = arg.name + "_cl_mem"
407                  try:
408                      node.symbol_table.lookup_with_tag(name)
409                  except KeyError:
410                      node.symbol_table.new_symbol(
411                          name, tag=name, symbol_type=DataSymbol,
412                          datatype=UnsupportedFortranType(
413                              "INTEGER(KIND=c_intptr_t) :: " + name))
414
415      # Now call all the set_args routines because in some platforms (e.g.
416      # in Xilinx FPGA) knowing which arguments each kernel is going to use
417      # allows the write operation to place the data into the appropriate
418      # memory bank.
419      first_statement_comment = False
420      kernel_names = set()
421      for kern in node.coded_kernels():
422          if kern.name not in kernel_names:
423              kernel_names.add(kern.name)
424              callblock = self._generate_set_args_call(kern, node.scope)
425              for child in callblock.pop_all_children():
426                  setup_block.if_body.addchild(child)
427                  if not first_statement_comment:
428                      child.preceding_comment = (
429                          "Do a set_args now so subsequent writes place the "
430                          "data appropriately")
431                      first_statement_comment = True
432
433      # Now we can insert calls to write_to_device method for each buffer
434      # and the grid writing call if there is one (in a new first time block)
435      first_statement_comment = False
436      for field in initialised_fields:
437          call = Call.create(
438              RoutineSymbol(field.name+"%write_to_device"), [])
439          setup_block.if_body.addchild(call)
440          if not first_statement_comment:
441              call.preceding_comment = "Write data to the device"
442              first_statement_comment = True
443
444      if there_is_a_grid_buffer:
445          fieldarg = node.coded_kernels()[0].arguments.find_grid_access()
446          field = node.symbol_table.lookup(fieldarg.name)
447          call = Call.create(write_grid_buf, [Reference(field)])
448          setup_block.if_body.addchild(call)
449
450      # We will just mark the nodes we are replacing, as deleting them inside
451      # the loop would break the PSy-layer backward_dependency method in the
452      # following iterations. We will detach all these nodes after the loop.
453      nodes_to_detach = []
454
455      # Transform each kernel call loop construct to its equivalent FortCL
456      # statements
457      for kern in node.coded_kernels():
458          outerloop = kern.ancestor(GOLoop).ancestor(GOLoop)
459
460          # Set up globalsize and localsize arrays
461          garg = node.coded_kernels()[0].arguments.find_grid_access()
462          num_x = api_config.grid_properties["go_grid_nx"].fortran\
463              .format(garg.name)
464          num_y = api_config.grid_properties["go_grid_ny"].fortran\
465              .format(garg.name)
466          assig = Assignment.create(
467              Reference(global_size),
468              Literal(f"(/{num_x}, {num_y}/)",
469                      ArrayType(INTEGER_TYPE, [2])))
470          node.children.insert(outerloop.position, assig)
471          local_size_value = kern.opencl_options['local_size']
472          assig = Assignment.create(
473              Reference(local_size),
474              Literal(f"(/{local_size_value}, 1/)",
475                      ArrayType(INTEGER_TYPE, [2])))
476          node.children.insert(outerloop.position, assig)
477
478          # Check that the global_size is a multiple of the local_size
479          if api_config.debug_mode:
480              fortran_reader = FortranReader()
481              global_size_expr = fortran_reader.psyir_from_expression(
482                  num_x, node.symbol_table)
483              self._add_divisibility_check(node, outerloop.position,
484                                           check_status, global_size_expr,
485                                           local_size_value)
486
487          # Retrieve kernel symbol
488          kernelsym = node.symbol_table.lookup_with_tag(
489              "kernel_" + kern.name)
490
491          # Choose the command queue number to which to dispatch this kernel.
492          # We have to deal with possible dependencies to kernels dispatched
493          # in different command queues as the order of execution is not
494          # guaranteed.
495          queue_number = kern.opencl_options['queue_number']
496          cmd_queue = ArrayReference.create(
497              qlist, [Literal(str(queue_number), INTEGER_TYPE)])
498          dependency = outerloop.backward_dependence()
499
500          # If the dependency is a loop containing a kernel, add a barrier if
501          # the previous kernels were dispatched in a different command queue
502          if dependency:
503              for kernel_dep in dependency.coded_kernels():
504                  previous_queue = kernel_dep.opencl_options['queue_number']
505                  if previous_queue != queue_number:
506                      # If the backward dependency is being executed in
507                      # another queue we add a barrier to make sure the
508                      # previous kernel has finished before this kernel
509                      # starts.
510                      barrier = Assignment.create(
511                          Reference(flag),
512                          Call.create(cl_finish, [
513                              ArrayReference.create(qlist, [
514                                  Literal(str(previous_queue),
515                                          INTEGER_TYPE)])]))
516                      node.children.insert(outerloop.position, barrier)
517
518          # If the dependency is something other than a kernel, currently we
519          # dispatch everything else to queue _OCL_MANAGEMENT_QUEUE, so add a
520          # barrier if this kernel is not on queue _OCL_MANAGEMENT_QUEUE.
521          if dependency and not dependency.coded_kernels() and \
522                  queue_number != self._OCL_MANAGEMENT_QUEUE:
523              barrier = Assignment.create(
524                  Reference(flag),
525                  Call.create(cl_finish, [
526                      ArrayReference.create(qlist, [
527                          Literal(str(self._OCL_MANAGEMENT_QUEUE),
528                                  INTEGER_TYPE)])]))
529              node.children.insert(outerloop.position, barrier)
530
531          # Check that everything has succeeded before the kernel launch
532          if api_config.debug_mode:
533              self._add_ready_check(node, outerloop.position, check_status,
534                                    kern.name, flag, cl_finish,
535                                    cmd_queue.copy())
536          callblock = self._generate_set_args_call(kern, node.scope)
537          for child in callblock.pop_all_children():
538              node.children.insert(outerloop.position, child)
539
540          # Then we call the clEnqueueNDRangeKernel
541          assig = Assignment.create(
542              Reference(flag),
543              Call.create(cl_launch, [
544                  # OpenCL Command Queue
545                  cmd_queue,
546                  # OpenCL Kernel object
547                  Reference(kernelsym),
548                  # Number of work dimensions
549                  Literal("2", INTEGER_TYPE),
550                  # Global offset (if NULL the global IDs start at
551                  # offset (0,0,0))
552                  Reference(c_null),
553                  # Global work size
554                  Call.create(c_loc, [Reference(global_size)]),
555                  # Local work size
556                  Call.create(c_loc, [Reference(local_size)]),
557                  # Number of events in wait list
558                  Literal("0", INTEGER_TYPE),
559                  # Event wait list that needs to be completed before
560                  # this kernel
561                  Reference(c_null),
562                  # Event that identifies this kernel completion
563                  Reference(c_null)]))
564          assig.preceding_comment = "Launch the kernel"
565          node.children.insert(outerloop.position, assig)
566          self._insert_kernel_code_in_opencl_file(kern)
567
568          # Add additional checks if we are in debug mode
569          if api_config.debug_mode:
570              self._add_kernel_check(node, outerloop.position, check_status,
571                                     kern.name, flag, cl_finish,
572                                     cmd_queue.copy())
573
574          nodes_to_detach.append(outerloop)
575
576      # If we execute the kernels asynchronously, we need to add wait
577      # statements before the halo exchanges to guarantee that the data
578      # has been updated
579      for possible_dependent_node in node.walk(HaloExchange):
580          # backward_dependence() returns the last Loop with a kernel
581          # that has a dependency with this halo exchange
582          dependency = possible_dependent_node.backward_dependence()
583          if dependency:
584              for kernel_dep in dependency.coded_kernels():
585                  previous_queue = kernel_dep.opencl_options['queue_number']
586                  if previous_queue != self._OCL_MANAGEMENT_QUEUE:
587                      # If the backward dependency is being executed in
588                      # another queue we add a barrier to make sure the previous
589                      # kernel has finished before this halo exchange starts.
590                      barrier = Assignment.create(
591                          Reference(flag),
592                          Call.create(cl_finish, [
593                              ArrayReference.create(qlist, [
594                                  Literal(str(previous_queue),
595                                          INTEGER_TYPE)])]))
596                      pos = possible_dependent_node.position
597                      node.children.insert(pos, barrier)
598
599      for node_to_detach in nodes_to_detach:
600          node_to_detach.detach()
601
602      if end_barrier:
603          self._add_end_barrier(node, flag, cl_finish, qlist)
604
605      # And at the very end always make sure that the first_time value is False
606      assign = Assignment.create(Reference(first),
607                                 Literal("false", BOOLEAN_TYPE))
608      assign.preceding_comment = "Unset the first time flag"
609      node.addchild(assign)
610
611      self._output_opencl_kernels_file()
612
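
The barrier decisions made before each kernel launch (listing lines 498 to 529) can be sketched as a pure function. This is a hypothetical illustration, not PSyclone code: `queues_to_finish` and its arguments are invented for the example, and the value 1 for the management queue is an assumption because `_OCL_MANAGEMENT_QUEUE` is not defined on this page.

```python
def queues_to_finish(queue_number, dependency_queues, management_queue=1):
    """Hypothetical sketch: which command queues need a clFinish barrier
    before launching a kernel on `queue_number`.

    dependency_queues is None when there is no backward dependency, an
    empty list when the dependency contains no kernels, and otherwise the
    queue numbers of the kernels in the dependency.
    management_queue=1 is an assumed value for _OCL_MANAGEMENT_QUEUE.
    """
    if dependency_queues is None:
        return []
    # One barrier per dependent kernel dispatched to a different queue.
    barriers = [q for q in dependency_queues if q != queue_number]
    # Non-kernel dependencies run on the management queue.
    if not dependency_queues and queue_number != management_queue:
        barriers.append(management_queue)
    return barriers


same_queue = queues_to_finish(2, [2])
other_queues = queues_to_finish(2, [1, 2, 3])
non_kernel = queues_to_finish(2, [])
```

Kernels that share a queue with their dependency need no barrier; only cross-queue dependencies force a wait.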

References psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._add_divisibility_check(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._add_end_barrier(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._add_kernel_check(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._add_ready_check(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._enable_profiling, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._generate_set_args_call(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_initialise_grid_buffers(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_kernel_code_in_opencl_file(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_ocl_arg_setter_routine(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_ocl_initialise_buffer(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_ocl_read_from_device_function(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_ocl_write_to_device_function(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_opencl_init_routine(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._insert_write_grid_buffers(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._kernels_file, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._max_queue_number, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._OCL_MANAGEMENT_QUEUE, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._out_of_order, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._output_opencl_kernels_file(), 
psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._transformed_invokes, psyclone.domain.lfric.kernel.lfric_kernel_metadata.LFRicKernelMetadata.validate(), psyclone.transformations.MoveTrans.validate(), psyclone.transformations.Dynamo0p3AsyncHaloExchangeTrans.validate(), psyclone.domain.common.transformations.alg_invoke_2_psy_call_trans.AlgInvoke2PSyCallTrans.validate(), psyclone.domain.common.transformations.alg_trans.AlgTrans.validate(), psyclone.domain.common.transformations.kernel_module_inline_trans.KernelModuleInlineTrans.validate(), psyclone.domain.common.transformations.raise_psyir_2_alg_trans.RaisePSyIR2AlgTrans.validate(), psyclone.domain.gocean.transformations.gocean_const_loop_bounds_trans.GOConstLoopBoundsTrans.validate(), psyclone.domain.gocean.transformations.gocean_move_iteration_boundaries_inside_kernel_trans.GOMoveIterationBoundariesInsideKernelTrans.validate(), psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans.validate(), psyclone.domain.gocean.transformations.raise_psyir_2_gocean_kern_trans.RaisePSyIR2GOceanKernTrans.validate(), psyclone.domain.lfric.transformations.lfric_alg_invoke_2_psy_call_trans.LFRicAlgInvoke2PSyCallTrans.validate(), psyclone.domain.lfric.transformations.raise_psyir_2_lfric_kern_trans.RaisePSyIR2LFRicKernTrans.validate(), psyclone.domain.nemo.transformations.create_nemo_invoke_schedule_trans.CreateNemoInvokeScheduleTrans.validate(), psyclone.domain.nemo.transformations.create_nemo_psy_trans.CreateNemoPSyTrans.validate(), psyclone.domain.nemo.transformations.nemo_allarrayrange2loop_trans.NemoAllArrayRange2LoopTrans.validate(), psyclone.domain.nemo.transformations.nemo_arrayrange2loop_trans.NemoArrayRange2LoopTrans.validate(), psyclone.domain.nemo.transformations.nemo_outerarrayrange2loop_trans.NemoOuterArrayRange2LoopTrans.validate(), psyclone.psyad.transformations.assignment_trans.AssignmentTrans.validate(), psyclone.psyGen.Transformation.validate(), 
psyclone.psyir.transformations.acc_update_trans.ACCUpdateTrans.validate(), psyclone.psyir.transformations.allarrayaccess2loop_trans.AllArrayAccess2LoopTrans.validate(), psyclone.psyir.transformations.arrayaccess2loop_trans.ArrayAccess2LoopTrans.validate(), psyclone.psyir.transformations.arrayrange2loop_trans.ArrayRange2LoopTrans.validate(), psyclone.psyir.transformations.chunk_loop_trans.ChunkLoopTrans.validate(), psyclone.psyir.transformations.fold_conditional_return_expressions_trans.FoldConditionalReturnExpressionsTrans.validate(), psyclone.psyir.transformations.hoist_local_arrays_trans.HoistLocalArraysTrans.validate(), psyclone.psyir.transformations.hoist_loop_bound_expr_trans.HoistLoopBoundExprTrans.validate(), psyclone.psyir.transformations.hoist_trans.HoistTrans.validate(), psyclone.psyir.transformations.inline_trans.InlineTrans.validate(), psyclone.psyir.transformations.intrinsics.array_reduction_base_trans.ArrayReductionBaseTrans.validate(), psyclone.psyir.transformations.intrinsics.dotproduct2code_trans.DotProduct2CodeTrans.validate(), psyclone.psyir.transformations.intrinsics.intrinsic2code_trans.Intrinsic2CodeTrans.validate(), psyclone.psyir.transformations.intrinsics.matmul2code_trans.Matmul2CodeTrans.validate(), psyclone.psyir.transformations.loop_swap_trans.LoopSwapTrans.validate(), psyclone.psyir.transformations.loop_tiling_2d_trans.LoopTiling2DTrans.validate(), psyclone.psyir.transformations.loop_trans.LoopTrans.validate(), psyclone.psyir.transformations.omp_task_trans.OMPTaskTrans.validate(), psyclone.psyir.transformations.omp_taskwait_trans.OMPTaskwaitTrans.validate(), psyclone.psyir.transformations.parallel_loop_trans.ParallelLoopTrans.validate(), psyclone.psyir.transformations.reference2arrayrange_trans.Reference2ArrayRangeTrans.validate(), psyclone.psyir.transformations.replace_induction_variables_trans.ReplaceInductionVariablesTrans.validate(), psyclone.transformations.OMPDeclareTargetTrans.validate(), 
psyclone.transformations.DynamoOMPParallelLoopTrans.validate(), psyclone.transformations.Dynamo0p3OMPLoopTrans.validate(), psyclone.transformations.GOceanOMPLoopTrans.validate(), psyclone.transformations.Dynamo0p3RedundantComputationTrans.validate(), psyclone.transformations.Dynamo0p3KernelConstTrans.validate(), psyclone.transformations.ACCRoutineTrans.validate(), psyclone.transformations.KernelImportsToArguments.validate(), psyclone.domain.gocean.transformations.gocean_loop_fuse_trans.GOceanLoopFuseTrans.validate(), psyclone.domain.lfric.transformations.lfric_loop_fuse_trans.LFRicLoopFuseTrans.validate(), psyclone.psyir.transformations.loop_fuse_trans.LoopFuseTrans.validate(), psyclone.domain.gocean.transformations.gocean_extract_trans.GOceanExtractTrans.validate(), psyclone.domain.lfric.transformations.lfric_extract_trans.LFRicExtractTrans.validate(), psyclone.psyir.transformations.extract_trans.ExtractTrans.validate(), psyclone.psyir.transformations.nan_test_trans.NanTestTrans.validate(), psyclone.psyir.transformations.read_only_verify_trans.ReadOnlyVerifyTrans.validate(), psyclone.transformations.ParallelRegionTrans.validate(), psyclone.transformations.OMPParallelTrans.validate(), psyclone.transformations.ACCParallelTrans.validate(), psyclone.transformations.ACCKernelsTrans.validate(), psyclone.transformations.ACCDataTrans.validate(), psyclone.psyir.transformations.psy_data_trans.PSyDataTrans.validate(), psyclone.psyir.transformations.region_trans.RegionTrans.validate(), and psyclone.transformations.ACCEnterDataTrans.validate().
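
The work-size set-up in the apply() listing above (lines 460 to 485) assigns `globalsize = (/nx, ny/)` and `localsize = (/local_size, 1/)`, and in debug mode adds a check that the first global dimension is a multiple of the local size. The sketch below mirrors that arithmetic in plain Python. `work_sizes` is a hypothetical helper; in PSyclone the check is emitted into the generated code rather than evaluated during the transformation.

```python
def work_sizes(nx, ny, local_size, check=True):
    """Hypothetical helper: return (globalsize, localsize) for a 2D launch.

    `check` plays the role of debug mode: it verifies that the first
    global dimension is a multiple of the local size.
    """
    if check and nx % local_size != 0:
        raise ValueError(
            f"global size {nx} is not a multiple of local size {local_size}")
    return (nx, ny), (local_size, 1)


global_size, local_size = work_sizes(64, 32, 16)
# Work-groups per dimension: global size divided by local size.
groups = (global_size[0] // local_size[0], global_size[1] // local_size[1])
```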

◆ name()

def psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans.name(self)
:returns: the name of this transformation.
:rtype: str

Reimplemented from psyclone.psyGen.Transformation.

Definition at line 100 of file gocean_opencl_trans.py.

100  def name(self):
101      '''
102      :returns: the name of this transformation.
103      :rtype: str
104      '''
105      return "GOOpenCLTrans"
106

◆ validate()

def psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans.validate(self, node, options=None)
Checks that the supplied InvokeSchedule is valid and that an OpenCL
version of it can be generated.

:param node: the Schedule to check.
:type node: :py:class:`psyclone.psyGen.InvokeSchedule`
:param options: a dictionary with options for transformations.
:type options: dict of str:value or None
:param bool options["enable_profiling"]: whether or not to set up the \
        OpenCL environment with the profiling option enabled.
:param bool options["out_of_order"]: whether or not to set up the \
        OpenCL environment with the out_of_order option enabled.
:param bool options["end_barrier"]: whether or not to add an OpenCL \
        barrier at the end of the transformed invoke.

:raises TransformationError: if the InvokeSchedule is not for the \
                             GOcean1.0 API.
:raises TransformationError: if any of the kernels have arguments \
                             which are passed as a literal.
:raises TransformationError: if any of the provided options is invalid.
:raises TransformationError: if any of the provided options is not \
                             compatible with a previous OpenCL
                             environment.
:raises TransformationError: if any kernel in this invoke has a \
                             global variable used by an import.
:raises TransformationError: if any kernel does not iterate over \
                             the whole grid.
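
The option-related errors listed above can be illustrated with a stand-alone sketch. Everything here is hypothetical: `OptionError` stands in for TransformationError, and `previous` models the environment settings recorded by an earlier transformed invoke (None if there was none).

```python
VALID_OPTIONS = ("end_barrier", "enable_profiling", "out_of_order")


class OptionError(Exception):
    """Stand-in for PSyclone's TransformationError."""


def check_options(options, previous=None):
    """Hypothetical sketch of the option checks described for validate()."""
    for key, value in options.items():
        if key not in VALID_OPTIONS:
            raise OptionError(f"unsupported OpenCL option '{key}'")
        # All current options are expected to be booleans.
        if not isinstance(value, bool):
            raise OptionError(f"option '{key}' should be a boolean")
    if previous is not None:
        # Environment-level options must match earlier transformed invokes.
        for key in ("enable_profiling", "out_of_order"):
            if key in options and options[key] != previous[key]:
                raise OptionError(
                    f"{key}={options[key]} conflicts with a previous invoke")


check_options({"end_barrier": False})  # accepted: valid key, boolean value
```

Only `enable_profiling` and `out_of_order` are compared against earlier invokes because they configure the shared OpenCL environment, whereas `end_barrier` applies to a single invoke.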

Reimplemented from psyclone.psyGen.Transformation.

Definition at line 107 of file gocean_opencl_trans.py.

107  def validate(self, node, options=None):
108      '''
109      Checks that the supplied InvokeSchedule is valid and that an OpenCL
110      version of it can be generated.
111
112      :param node: the Schedule to check.
113      :type node: :py:class:`psyclone.psyGen.InvokeSchedule`
114      :param options: a dictionary with options for transformations.
115      :type options: dict of str:value or None
116      :param bool options["enable_profiling"]: whether or not to set up the \
117              OpenCL environment with the profiling option enabled.
118      :param bool options["out_of_order"]: whether or not to set up the \
119              OpenCL environment with the out_of_order option enabled.
120      :param bool options["end_barrier"]: whether or not to add an OpenCL \
121              barrier at the end of the transformed invoke.
122
123      :raises TransformationError: if the InvokeSchedule is not for the \
124                                   GOcean1.0 API.
125      :raises TransformationError: if any of the kernels have arguments \
126                                   which are passed as a literal.
127      :raises TransformationError: if any of the provided options is invalid.
128      :raises TransformationError: if any of the provided options is not \
129                                   compatible with a previous OpenCL
130                                   environment.
131      :raises TransformationError: if any kernel in this invoke has a \
132                                   global variable used by an import.
133      :raises TransformationError: if any kernel does not iterate over \
134                                   the whole grid.
135      '''
136
137      if isinstance(node, InvokeSchedule):
138          if not isinstance(node, GOInvokeSchedule):
139              raise TransformationError(
140                  f"OpenCL generation is currently only supported for the "
141                  f"GOcean API but got an InvokeSchedule of type: "
142                  f"'{type(node).__name__}'")
143      else:
144          raise TransformationError(
145              f"Error in GOOpenCLTrans: the supplied node must be a (sub-"
146              f"class of) InvokeSchedule but got {type(node)}")
147
148      # Validate options map
149      valid_options = ['end_barrier', 'enable_profiling', 'out_of_order']
150      for key, value in options.items():
151          if key in valid_options:
152              # All current options should contain boolean values
153              if not isinstance(value, bool):
154                  raise TransformationError(
155                      f"InvokeSchedule OpenCL option '{key}' should be a "
156                      f"boolean.")
157          else:
158              raise TransformationError(
159                  f"InvokeSchedule does not support the OpenCL option "
160                  f"'{key}'. The supported options are: {valid_options}.")
161
162      # Validate that the options are valid with previously generated OpenCL
163      if self._transformed_invokes > 0:
164          if ('enable_profiling' in options and
165                  self._enable_profiling != options['enable_profiling']):
166              raise TransformationError(
167                  f"Can't generate an OpenCL Invoke with enable_profiling='"
168                  f"{options['enable_profiling']}' since a previous "
169                  f"transformation used a different value, and their OpenCL"
170                  f" environments must match.")
171
172          if ('out_of_order' in options and
173                  self._out_of_order != options['out_of_order']):
174              raise TransformationError(
175                  f"Can't generate an OpenCL Invoke with out_of_order='"
176                  f"{options['out_of_order']}' since a previous "
177                  f"transformation used a different value, and their OpenCL "
178                  f"environments must match.")
179
180      # Now we need to check that none of the invoke arguments is a literal
181      args = args_filter(node.args, arg_types=["scalar"])
182      for arg in args:
183          if arg.is_literal:
184              raise TransformationError(
185                  f"Cannot generate OpenCL for Invokes that contain kernel "
186                  f"arguments which are a literal, but found the literal "
187                  f"'{arg.name}' used as an argument in invoke "
188                  f"'{node.name}'.")
189
190      # Check that we can construct the PSyIR and SymbolTable of each of
191      # the kernels in this Schedule. Also check that none of them access
192      # any form of global data (that is not a routine argument).
193      for kern in node.kernels():
194          KernelModuleInlineTrans().validate(kern)
195          ksched = kern.get_kernel_schedule()
196          global_variables = ksched.symbol_table.imported_symbols
197          if global_variables:
198              raise TransformationError(
199                  f"The Symbol Table for kernel '{kern.name}' contains the "
200                  f"following symbols with 'global' scope: "
201                  f"{[sym.name for sym in global_variables]}. An OpenCL "
202                  f"kernel cannot call other kernels and all of the data it "
203                  f"accesses must be passed by argument. Use the "
204                  f"KernelImportsToArguments transformation to convert such "
205                  f"symbols to kernel arguments first.")
206
207      # In OpenCL all kernel loops should iterate the whole grid
208      for kernel in node.kernels():
209          inner_loop = kernel.ancestor(GOLoop)
210          outer_loop = inner_loop.ancestor(GOLoop)
211          if not (inner_loop.field_space == "go_every" and
212                  outer_loop.field_space == "go_every" and
213                  inner_loop.iteration_space == "go_all_pts" and
214                  outer_loop.iteration_space == "go_all_pts"):
215              raise TransformationError(
216                  f"The kernel '{kernel.name}' does not iterate over all "
217                  f"grid points. This is a necessary requirement for "
218                  f"generating the OpenCL code and can be done by applying "
219                  f"the GOMoveIterationBoundariesInsideKernelTrans to each "
220                  f"kernel before the GOOpenCLTrans.")
221
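
The whole-grid requirement checked at the end of validate() (listing lines 207 to 220) reduces to a predicate over the two loops that enclose each kernel. The sketch below is a hypothetical illustration in which a namedtuple stands in for GOLoop and carries only the two attributes that are compared; the value "go_internal_pts" is used purely as an example of a non-matching iteration space.

```python
from collections import namedtuple

# Stand-in for GOLoop: only the attributes used by the check.
Loop = namedtuple("Loop", ["field_space", "iteration_space"])


def iterates_whole_grid(inner_loop, outer_loop):
    """Hypothetical sketch: True when both loops cover every grid point."""
    return all(
        loop.field_space == "go_every" and
        loop.iteration_space == "go_all_pts"
        for loop in (inner_loop, outer_loop))


full = Loop("go_every", "go_all_pts")
partial = Loop("go_every", "go_internal_pts")
ok = iterates_whole_grid(full, full)
not_ok = iterates_whole_grid(partial, full)
```

A kernel for which this predicate is False would first need its iteration boundaries moved inside the kernel, as the error message in the listing suggests.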

References psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._enable_profiling, psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._out_of_order, and psyclone.domain.gocean.transformations.gocean_opencl_trans.GOOpenCLTrans._transformed_invokes.


The documentation for this class was generated from the following file: